12 January 2026

Generated Assets: Underground Cable

Summary

This post highlights how a major Australian electricity distributor transformed fragmented GIS data into consolidated, decision-ready assets by leveraging graph processing and spatial analytics in Databricks. The solution accelerates cost-benefit analysis and scales across the entire network.

Why It Matters

Traditional GIS splits a single physical cable into multiple segments whenever any installation detail changes. For example, a cable may be directly buried, run through ducting where it passes through a structure, and then return to direct lay. GIS records this as three separate assets, even though it is one physical cable. This over-segmentation slows down planning and makes the impact of a replacement hard to analyse and estimate.

Generated Assets resolves this by stitching those segments into a single physical-cable view, enabling clear costing, risk assessment, and replacement decisions.

Solution Overview

Using node-based connectivity, we modelled the electrical network as a graph in which nodes represent grid objects (such as supply points and substations) and edges represent the cable segments connecting them. A graph-based approach was a natural fit, because the core requirement was to trace every connected cable segment between predefined termination nodes.
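As a minimal sketch of this model, assume each GIS cable segment record carries the IDs of the nodes at its two ends. The field names (`segment_id`, `from_node`, `to_node`, `install`), the intermediate joint nodes `J1` and `J2`, and the sample records are all hypothetical. The graph is then just an adjacency list keyed on node ID, built without inspecting any geometry:

```python
from collections import defaultdict

# Hypothetical GIS records: one physical cable between substation S1 and supply
# point P1, split into three segments where the installation method changes.
# J1 and J2 are hypothetical joint nodes at those split points.
SEGMENTS = [
    {"segment_id": "seg-1", "from_node": "S1", "to_node": "J1", "install": "direct buried"},
    {"segment_id": "seg-2", "from_node": "J1", "to_node": "J2", "install": "ducted"},
    {"segment_id": "seg-3", "from_node": "J2", "to_node": "P1", "install": "direct buried"},
]


def build_graph(segments):
    """Return an undirected adjacency list: node ID -> [(neighbour, segment_id), ...].

    Connectivity comes purely from shared node IDs. The 'install' attribute is
    carried on the records but plays no part in connectivity, so a change of
    installation method does not break the chain.
    """
    adjacency = defaultdict(list)
    for seg in segments:
        a, b, sid = seg["from_node"], seg["to_node"], seg["segment_id"]
        adjacency[a].append((b, sid))
        adjacency[b].append((a, sid))
    return dict(adjacency)


if __name__ == "__main__":
    graph = build_graph(SEGMENTS)
    for node, edges in sorted(graph.items()):
        print(node, edges)
```

The three segments share nodes `J1` and `J2`, so they form one connected chain from `S1` to `P1` regardless of how each segment was installed.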

Initially, we used NetworkX (a Python library for creating, manipulating and analysing complex networks and graphs) to build and traverse the graphs. This worked well for validating the logic on a single feeder and allowed rapid experimentation with tracing behaviour. However, NetworkX holds each graph in memory on a single machine, so this approach did not scale efficiently to the full network, where millions of nodes and connections must be processed.

To address scalability, GraphFrames was introduced. GraphFrames is a Spark-native graph processing library designed for large-scale, distributed graph analytics, making it well-suited to our data volumes. Its strength lies in efficiently identifying connected components across the entire network. However, GraphFrames offers limited built-in support for complex tracing logic, whereas NetworkX provides rich and flexible traversal capabilities.
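In GraphFrames this step is `GraphFrame.connectedComponents()`, a distributed algorithm that labels every vertex with a component ID. The sketch below is not GraphFrames code: it is a single-machine union-find (disjoint-set) illustration of the same result on a hypothetical edge list, showing what splitting the network into independent sub-networks produces. The convention of using the smallest node ID as the component label is chosen here for readability and is not necessarily what GraphFrames returns.

```python
def connected_components(edges):
    """Label every node with a component ID using union-find (disjoint sets).

    edges: iterable of (node_a, node_b) pairs.
    Returns {node: component_id}, where component_id is the smallest node ID
    in that component (an illustrative convention, not GraphFrames' own).
    """
    parent = {}

    def find(x):
        # Path halving: point each visited node at its grandparent to keep trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the smaller ID as root, so each root is the minimum of its set.
            if rb < ra:
                ra, rb = rb, ra
            parent[rb] = ra

    for a, b in edges:
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        union(a, b)

    return {node: find(node) for node in parent}


if __name__ == "__main__":
    # Hypothetical edges forming two independent sub-networks.
    EDGES = [("A1", "A2"), ("A2", "A3"), ("B1", "B2")]
    print(connected_components(EDGES))
```

Because no edge crosses between components, each labelled group can be handed to a separate worker and traced without reference to the rest of the network.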

We therefore adopted a hybrid design that combines the scalability of Spark and GraphFrames with the flexibility of NetworkX, enabling efficient, large-scale network tracing without sacrificing analytical capability. The process works as follows:

  • GraphFrames is used first to identify connected components, effectively breaking the entire network into smaller, independent sub-networks.
  • Each connected component is then processed independently using Spark’s applyInPandas, which runs custom Python logic on each group of a grouped Spark DataFrame (here, one group per component) while still benefiting from distributed execution.
  • Within each component, NetworkX is used to construct an in-memory graph.
  • Start and end (termination) nodes are identified within that graph.
  • Tracing is then performed to determine all connected segments and paths between those termination points.
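The per-component steps can be sketched without Spark or NetworkX. In PySpark, `df.groupBy("component").applyInPandas(func, schema)` calls `func` once per group with a pandas DataFrame and expects a pandas DataFrame back; in this stdlib-only sketch, `trace_network` plays the grouping role and `trace_component` stands in for `func`, taking a plain list of edges instead. Several assumptions are hypothetical: termination nodes are recognised by a node-type attribute, the component labels are hard-coded as if produced by the previous step, and nodes between terminations have degree 2, so each trace is a simple chain. Real networks with branches need the richer traversal that NetworkX provides.

```python
from collections import defaultdict

# Hypothetical node types treated as termination points of a generated asset.
TERMINATION_TYPES = {"substation", "supply_point"}


def trace_component(edges, node_types):
    """Trace cable runs between termination nodes within ONE connected component.

    edges: list of (segment_id, from_node, to_node) tuples for this component.
    Assumes intermediate nodes have degree 2; a walk that hits a dead end or a
    branch stops there and is reported as ending at that intermediate node.
    Returns [(start_node, end_node, [segment_ids]), ...], each run reported once.
    """
    adjacency = defaultdict(list)
    for sid, a, b in edges:
        adjacency[a].append((b, sid))
        adjacency[b].append((a, sid))

    terminations = sorted(n for n in adjacency if node_types.get(n) in TERMINATION_TYPES)
    runs, seen = [], set()
    for start in terminations:
        for first_node, first_sid in adjacency[start]:
            if first_sid in seen:
                continue  # already traced from the other end of this run
            path, node = [first_sid], first_node
            seen.add(first_sid)
            # Follow intermediate nodes until reaching another termination node.
            while node_types.get(node) not in TERMINATION_TYPES:
                onward = [(n, s) for n, s in adjacency[node] if s not in seen]
                if len(onward) != 1:
                    break  # dead end or branch: outside the degree-2 assumption
                node, next_sid = onward[0]
                path.append(next_sid)
                seen.add(next_sid)
            runs.append((start, node, path))
    return runs


def trace_network(edges, component_of, node_types):
    """Group edges by component ID and trace each group independently."""
    groups = defaultdict(list)
    for sid, a, b in edges:
        groups[component_of[a]].append((sid, a, b))
    return {comp: trace_component(group, node_types) for comp, group in sorted(groups.items())}


# Hypothetical inputs: component C1 is one cable split into three GIS segments.
EDGES = [("seg-1", "S1", "J1"), ("seg-2", "J1", "J2"), ("seg-3", "J2", "P1"), ("seg-4", "S2", "P2")]
COMPONENT_OF = {"S1": "C1", "J1": "C1", "J2": "C1", "P1": "C1", "S2": "C2", "P2": "C2"}
NODE_TYPES = {"S1": "substation", "S2": "substation", "P1": "supply_point",
              "P2": "supply_point", "J1": "joint", "J2": "joint"}

if __name__ == "__main__":
    for comp, runs in trace_network(EDGES, COMPONENT_OF, NODE_TYPES).items():
        print(comp, runs)
```

Each returned run groups the GIS segments that make up one physical cable between two termination points, which is the stitched view that a generated asset represents.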

Key Benefits

  • Significantly faster than the legacy FME solution (FME, the Feature Manipulation Engine from Safe Software, is a platform for reading, writing and transforming spatial data)
  • Eliminates the need for complex and expensive spatial joins
  • Uses node-based connectivity rather than spatial proximity
  • Reuses unified data already available in Unity Catalog (UC)

Is your network data limiting confident asset decisions?

Connect with us to explore how graph analytics and Databricks can turn fragmented GIS data into decision-ready assets.