How does BlueConduit use an algorithm to find lead pipes?
BlueConduit’s algorithm is based on statistical best practices. The model uses information that is known to help make predictions about information that is not known with certainty. It combines existing service line data (historical and recently verified), the results of inspections at a representative set of service lines of unknown material, and other relevant parcel data to generate home-by-home probabilities that a lead service line (LSL) is present.
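As a rough illustration of using known records to estimate probabilities for unknown ones, the sketch below estimates the chance of lead at each unverified parcel from the lead rate among verified parcels built in the same decade, with Laplace smoothing. This is a deliberately simple stand-in chosen for illustration, not BlueConduit’s model. The field names (`parcel_id`, `year_built`, `is_lead`) and the decade grouping are hypothetical.

```python
from collections import defaultdict


def decade(year_built: int) -> int:
    """Group construction years into decades (an illustrative feature only)."""
    return (year_built // 10) * 10


def lead_probabilities(verified, unknown, alpha=1.0):
    """Estimate P(lead) for unknown parcels from verified parcels.

    verified: list of dicts with hypothetical keys 'year_built' and 'is_lead'.
    unknown:  list of dicts with hypothetical keys 'parcel_id' and 'year_built'.
    alpha:    Laplace smoothing strength; decades with few or no verified
              records are pulled toward 0.5 instead of 0% or 100%.
    """
    lead = defaultdict(int)
    total = defaultdict(int)
    for rec in verified:
        d = decade(rec["year_built"])
        total[d] += 1
        lead[d] += int(rec["is_lead"])

    probs = {}
    for rec in unknown:
        d = decade(rec["year_built"])
        probs[rec["parcel_id"]] = (lead[d] + alpha) / (total[d] + 2 * alpha)
    return probs
```

With two lead and one non-lead verified line from the 1920s, an unverified 1922 parcel gets (2 + 1) / (3 + 2) = 0.6, while a parcel from a decade with no verified records gets 0.5. A production model would weigh many features at once rather than a single grouping.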
Gathering inspection data from a representative set of unknown service lines allows BlueConduit to reduce potential data biases and improve the model’s efficacy. This practice is a key tenet for how BlueConduit approaches its work of building predictive algorithms that promote equity. The relative predictiveness of different features describing a service line and a parcel varies from community to community. We work to get as much data as we can about a city, train the model on representative data, and allow the algorithm’s feature selection to determine which features have the greatest predictive weight.
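One common way to draw a representative set of service lines of unknown material is a stratified random sample. The sketch below picks a fixed fraction of unknown parcels at random within each neighborhood, with at least one per neighborhood so that no area goes uninspected. The neighborhood stratification, the `fraction` parameter, and the minimum of one are illustrative assumptions, not a description of BlueConduit’s sampling design.

```python
import random
from collections import defaultdict


def stratified_sample(unknown_parcels, fraction, seed=0):
    """Randomly choose inspection sites within each neighborhood.

    unknown_parcels: list of (parcel_id, neighborhood) tuples for service
                     lines of unknown material (hypothetical input format).
    fraction:        share of each neighborhood's unknown parcels to inspect.
    seed:            fixed seed so the draw can be reproduced and audited.
    """
    rng = random.Random(seed)
    by_area = defaultdict(list)
    for parcel_id, area in unknown_parcels:
        by_area[area].append(parcel_id)

    sample = []
    for area in sorted(by_area):
        ids = by_area[area]
        # Proportional allocation, but never zero inspections in an area.
        k = min(len(ids), max(1, round(fraction * len(ids))))
        sample.extend(rng.sample(ids, k))
    return sample
```

Because parcels are drawn at random within every area, the inspected set is not skewed toward the neighborhoods that already have the most records.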
BlueConduit highlights the importance of using representative data to build and train a predictive model. What are the risks associated with using non-representative data?
There’s a common phrase about statistical models: “garbage in, garbage out.” A statistical model will generate biased, unhelpful probabilities and recommendations if it is built using non-representative data.
BlueConduit’s data science team evaluated data from different LSL replacement programs and found that a statistical model built on non-representative data performs poorly on recall, the proportion of all LSLs in a system that the model correctly identifies. A biased model may be reasonably accurate about what it knows, but it doesn’t know what it doesn’t know.
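The gap between being accurate about what a model knows and finding every LSL is the difference between precision and recall. The minimal sketch below computes both from sets of parcel IDs; the parcel IDs in the usage example are hypothetical.

```python
def recall_precision(predicted_lead, actual_lead):
    """Compare parcels predicted to have LSLs with field-verified LSLs.

    recall    = share of actual LSLs that the model flagged
    precision = share of flagged parcels that actually had lead
    """
    predicted, actual = set(predicted_lead), set(actual_lead)
    true_pos = len(predicted & actual)
    recall = true_pos / len(actual) if actual else 0.0
    precision = true_pos / len(predicted) if predicted else 0.0
    return recall, precision


# A model trained only on one neighborhood ("A") flags three parcels, all lead,
# but never flags the three LSLs in neighborhood "B" that it has no data on.
recall, precision = recall_precision(
    ["A1", "A2", "A3"], ["A1", "A2", "A3", "B1", "B2", "B3"]
)
```

Here precision is 1.0, yet recall is only 0.5: every prediction the model made was correct, but half of the LSLs were never flagged.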
In some instances, biased data could result in over- or underestimated LSL counts. In other instances, biased data could exclude high-risk neighborhoods from LSL replacement efforts altogether. For example, if one part of a town has a high concentration of LSLs but the dataset contains no verified data points from that area, the predictive model risks overlooking that neighborhood in the replacement effort. A model based on representative data helps mitigate challenges such as these.
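A simple safeguard against the blind spot described above is to check, before training, whether any area has parcels but no verified records. The sketch below does that; the `neighborhood` and `verified` fields are hypothetical names for whatever a city’s data actually uses.

```python
from collections import Counter


def uncovered_neighborhoods(parcels):
    """Return neighborhoods that contain parcels but no verified service lines.

    parcels: list of dicts with hypothetical keys 'neighborhood' and
             'verified' (True if the material was confirmed in the field).
    """
    all_areas = {p["neighborhood"] for p in parcels}
    verified_counts = Counter(p["neighborhood"] for p in parcels if p["verified"])
    return sorted(area for area in all_areas if verified_counts[area] == 0)
```

Any neighborhood this returns is a candidate for additional field inspections before the model’s predictions for that area are trusted.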
Using representative sample data is a critical component of ethical data science and should serve as the foundation of analysis for any data-driven decision making process.
What data does a city need to provide BlueConduit?
BlueConduit uses the following types of data to create LSL predictions:
- Verified service line material records
- Historical service line material records
- Codified water ordinances
- Real estate, parcel, or structure data
  - Parcel ID
  - Year built
  - Coordinates or geometries
  - Home value
  - Acreage or lot size
  - Property class or zoning
- Recent surveys
- Construction records related to service line replacements, plumbing work, renovations, and/or new construction
- Water test results
- Water main size and material data
- Fire hydrant locations and attributes
- Water billing information
- Census data
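Most of these sources can be linked through the Parcel ID. The sketch below shows one way to combine parcel attributes with historical and verified material records, assuming (as an illustration, not a documented BlueConduit rule) that a field-verified material overrides a historical record and that parcels with neither stay "unknown". The dictionary layouts are hypothetical.

```python
def build_service_line_table(parcels, historical, verified):
    """Join data sources on Parcel ID.

    parcels:    {parcel_id: {attribute: value, ...}} from parcel/structure data
    historical: {parcel_id: material} from historical records
    verified:   {parcel_id: material} from field verification
    """
    table = {}
    for pid, attrs in parcels.items():
        if pid in verified:
            material, source = verified[pid], "verified"
        elif pid in historical:
            material, source = historical[pid], "historical"
        else:
            material, source = "unknown", None
        # Keep the parcel attributes alongside the best available material.
        table[pid] = {**attrs, "material": material, "source": source}
    return table
```

Keeping the `source` field makes it easy to separate training labels (verified and historical records) from the parcels the model must predict.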
What is the key metric of success for the application of machine learning to the identification of LSLs?
Hit rate over time is the key metric of success for the application of machine learning to the identification of LSLs. Hit rate is the percentage of service line material predictions that prove accurate in the field. In practice, it is the number of LSLs identified divided by the number of attempted replacements, regardless of what material was discovered. The higher the hit rate in the field, the better for all.
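The hit rate definition above reduces to a single division. In this minimal sketch, each attempted replacement is recorded as the material found; the material labels are illustrative, and treating only "lead" as a hit is an assumption.

```python
def hit_rate(excavations):
    """Share of attempted replacements that found a lead service line.

    excavations: list of materials found at each dig, e.g. "lead", "copper",
                 "galvanized". Every dig counts in the denominator,
                 regardless of the material discovered.
    """
    if not excavations:
        raise ValueError("no excavations to evaluate")
    lead_found = sum(1 for material in excavations if material == "lead")
    return lead_found / len(excavations)
```

For example, 8 LSLs found in 10 attempted replacements gives a hit rate of 0.8 (80%).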
Hit rate can be computed for an entire region or broken down by geography or time period. It also applies to any replacement project, whether or not a predictive model was used. However it is scoped, hit rate must be evaluated over time, because it is likely to decline as LSLs are excavated from the properties most likely to have lead, leaving less certain properties for later phases.
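Breaking hit rate down by geography or time is a grouped version of the same calculation. The sketch below groups dig records by any field, such as a month or neighborhood; the record layout and field names are hypothetical.

```python
from collections import defaultdict


def hit_rate_by_group(digs, key):
    """Compute hit rate separately for each group.

    digs: list of dicts with a hypothetical 'material' key plus grouping
          fields such as 'month' or 'neighborhood'.
    key:  name of the field to group on.
    """
    found = defaultdict(int)
    total = defaultdict(int)
    for dig in digs:
        group = dig[key]
        total[group] += 1
        found[group] += int(dig["material"] == "lead")
    return {group: found[group] / total[group] for group in sorted(total)}
```

If a project finds 9 LSLs in its first 10 digs in one month and 6 in 10 digs several months later, the grouped result shows the drop from 0.9 to 0.6 that a single project-wide number would hide.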
BlueConduit regularly achieves hit rates of more than 80% in its replacement projects across the United States, which translates into
- verified lead service line inventories that comply with EPA requirements and
- lead service line replacement efforts that are efficient, cost-effective, equitable, and community-focused.
For example, a higher hit rate reduces the number of exploratory digs conducted in a city. At an estimated cost of $2,000 per excavation, a city would save $50,000 by avoiding 25 unnecessary excavations. See more information about hit rates and cost savings in our Flint case study [link to case study].
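The savings arithmetic above can be tied back to hit rate. Under the simplifying assumption that the hit rate stays constant across a project (in practice it changes over time), finding N LSLs at a hit rate h takes roughly N / h excavations. The scenario below (60 LSLs, hit rates of 60% and 80%) is hypothetical and chosen only to reproduce the 25 avoided digs and $50,000 figure.

```python
import math


def excavations_needed(lsls_to_replace: int, hit_rate_pct: int) -> int:
    """Digs needed to find a given number of LSLs at a constant hit rate (percent)."""
    if not 0 < hit_rate_pct <= 100:
        raise ValueError("hit rate must be in (0, 100]")
    return math.ceil(lsls_to_replace * 100 / hit_rate_pct)


def avoided_digs_and_savings(lsls_to_replace, low_pct, high_pct, cost_per_dig=2_000):
    """Digs avoided, and dollars saved, by raising the hit rate from low_pct to high_pct."""
    avoided = excavations_needed(lsls_to_replace, low_pct) - excavations_needed(
        lsls_to_replace, high_pct
    )
    return avoided, avoided * cost_per_dig


# Hypothetical: 60 LSLs take 100 digs at a 60% hit rate but 75 digs at 80%.
avoided, saved = avoided_digs_and_savings(60, 60, 80)
```

Here `avoided` is 25 and `saved` is 50,000, matching the example in the text.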
How does BlueConduit’s platform help facilitate community engagement and build public trust?
If decision makers don’t welcome resident perspectives or gain community buy-in, LSL inventory and replacement programs will fail even before they start. Residents need to be given the opportunity to share their needs, concerns, and priorities. They need a forum where they can ask questions and gain a greater understanding of what local officials are doing to address the public health crisis caused by lead pipes.
The BlueConduit platform can be used to distribute and track resident communications. The platform can also generate maps and statistics that demonstrate the scope of the LSL problem and communicate progress toward LSL removal and replacement.
How can utilities best communicate the progress of LSL inventory and replacement programs to their customers?
The EPA’s Lead and Copper Rule (LCR), as amended by the Lead and Copper Rule Revisions, requires water systems to make their service line inventories publicly accessible. BlueConduit’s platform creates a public-facing LSL map that water systems can use to help meet this requirement.
The cities of Flint (see the Flint Pipe Map) and Toledo (see the Toledo Map) are already using BlueConduit LSL maps as a critical component of their public education campaigns. The maps empower residents with information about
- the known or likely service line material at their address (e.g., lead or copper),
- the status of their city’s pipe replacement project at an individual residence and neighborhood level, and
- the action steps they can take to minimize the risk of lead exposure.