Turning unknowns into action: insights from an AWWA webinar

Alice Berners-Lee, PhD

November 11, 2025

Alice Berners-Lee, PhD, is the VP of Data Science at BlueConduit. She leads the Data Science team in building tools that deliver machine learning (ML) models across water assets. She has a background in neuroscience and over a decade of experience leveraging large-scale data analysis to advance rigorous scientific progress on challenging problems. Alice is passionate about using science to help the environment and communities and to empower people to gain insights from their data.

As VP of Data Science at BlueConduit, one of the things I enjoy most is connecting with people across the water industry to explore how AI and machine learning can help solve real problems. I was lucky enough to be invited to do just that a few weeks ago at an AWWA (American Water Works Association) webinar.

The webinar, “Using Predictive Modeling to Meet LCRI Inventory Compliance,” featured five partner presentations, and two of them shared BlueConduit’s data science work. The City of Bloomington, IN and Arcadis presented our modeling work in a low-lead environment. The City of Goshen, IN and I presented our statistical analysis in a no-lead environment, as well as our modeling for galvanized steel in a high-galvanized environment. Together, these case studies highlight some key aspects of our data science work.

Reduce unknowns quickly

In both cases, the utilities used modeling to reduce unknowns quickly and rigorously. The results arrive fast, yet they remain defensible and meet the high bar required for classification.

Bloomington suspected it had some legacy lead but had very few confirmed examples for us to train our models on (only 5 at first). Nevertheless, our AI/ML models allowed Bloomington to classify over 5,000 unknowns as non-lead.

In Goshen, where they suspected there was no lead at all, our recommended inspections and no-lead statistical analysis allowed for classification of the entire inventory as non-lead. This reduced their unknowns to zero.
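To make the no-lead logic concrete, here is a minimal sketch of the textbook calculation behind this kind of claim. It is not BlueConduit's actual procedure. It assumes a simple random sample in which every inspected line turns out to be non-lead, and it asks how many such inspections are needed to be 95% confident that the true lead rate sits below a chosen tolerance. The function names and the 1% tolerance are illustrative assumptions.

```python
import math

def inspections_needed(max_lead_rate: float, confidence: float = 0.95) -> int:
    """Smallest number of lead-free random inspections that rules out a
    lead rate of `max_lead_rate` or higher at the given confidence.

    Assumption: binomial (sampling with replacement) approximation, which is
    conservative for a finite inventory sampled without replacement.
    """
    if not 0 < max_lead_rate < 1 or not 0 < confidence < 1:
        raise ValueError("max_lead_rate and confidence must be in (0, 1)")
    # If the true rate were p, the chance of seeing zero lead in n draws is
    # (1 - p)^n. We need that chance to be at most 1 - confidence.
    return math.ceil(math.log(1 - confidence) / math.log(1 - max_lead_rate))

def upper_bound_lead_rate(n_inspected: int, confidence: float = 0.95) -> float:
    """Upper confidence bound on the lead rate after `n_inspected`
    random inspections that all came back non-lead."""
    if n_inspected <= 0:
        raise ValueError("n_inspected must be positive")
    return 1 - (1 - confidence) ** (1 / n_inspected)

# Hypothetical tolerance: rule out a lead rate of 1% or more.
n = inspections_needed(0.01)
bound = upper_bound_lead_rate(n)
print(n, round(bound, 4))
```

The calculation only holds if the sample is truly random and representative, which is why representativeness checks matter as much as the sample size itself.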

Choose the right tools

These examples also show how important it is to pick the right tool for your reality. The table below summarizes the three AI/ML/statistical approaches that went into the work for these two cities.

Scenario	System	Context & Challenge	Techniques Used	Three Biggest Positive Impacts / Outcomes
“No-Lead”	City of Goshen, IN	Utility staff and records indicated no historical lead, but documentation was insufficient to prove it for compliance.	– Representative statistical sampling of both public and private service lines to achieve 95% confidence that no lead existed. – Demographic and geographic representativeness checks to avoid bias. – Statistical analysis rather than ML. Since no lead data exists to “train” on, this is the appropriate approach.	✅ Allowed state certification as non-lead, eliminating unnecessary replacement and notification requirements. ✅ Freed up funding and time to focus on galvanized connectors instead. ✅ Completed full inventory within 2 years despite limited initial records.
“Low-Lead”	City of Bloomington, IN	System known to have very limited lead (5 confirmed lines) among ~26,000+ services, creating difficulty for traditional statistical validation.	– Built predictive machine learning models. – Applied thresholding methods to classify service lines as likely non-lead, unknown, or likely lead. – Collaborated with regulators to set acceptable false-negative rates.	✅ Targeted field investigations efficiently found 17 additional lead lines and reduced unknowns by ~40%. ✅ Strengthened GIS and records data through integrated modeling workflows. ✅ Established an iterative, defensible modeling process for future LCRI updates.
“High-Galvanized”	City of Goshen, IN (second phase)	After confirming no lead, Goshen still had galvanized lines with legacy lead connectors, which it wanted to prioritize for replacement.	– Developed machine learning models focused on galvanized lines. – Important features included structure year, annexation year, distance from town center/hydrant, and water-main installation year. – Combined model results with replacement planning & funding strategy.	✅ Identified 458 galvanized service lines with lead connectors for prioritized replacement. ✅ Enabled targeted neighborhood-level applications to access funds. ✅ Gave the utility explainable results for transparent communication with ratepayers.
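The thresholding step listed for Bloomington in the table above can be pictured with a short sketch. This is an illustration under stated assumptions, not the cutoffs or code used in the project: the `low` and `high` thresholds, the function names, and the verified records are all made up. The false-negative rate here means the share of verified lead lines that the lower threshold would have labeled likely non-lead; a regulator may define it differently.

```python
from typing import List, Tuple

def classify(prob_lead: float, low: float = 0.05, high: float = 0.80) -> str:
    """Map a predicted lead probability to one of three classes.
    The cutoffs are hypothetical placeholders."""
    if prob_lead <= low:
        return "likely non-lead"
    if prob_lead >= high:
        return "likely lead"
    return "unknown"

def false_negative_rate(verified: List[Tuple[float, bool]], low: float) -> float:
    """Share of verified lead lines whose predicted probability falls at or
    below `low`, i.e. lead lines that would be called likely non-lead."""
    lead_probs = [p for p, is_lead in verified if is_lead]
    if not lead_probs:
        raise ValueError("need at least one verified lead line")
    missed = sum(1 for p in lead_probs if p <= low)
    return missed / len(lead_probs)

# Made-up verified records: (predicted probability of lead, found lead?)
verified = [(0.02, False), (0.03, True), (0.10, False),
            (0.40, True), (0.85, True), (0.90, True)]

labels = [classify(p) for p, _ in verified]
fnr = false_negative_rate(verified, low=0.05)
print(labels)
print(fnr)
```

In practice the lower threshold would be tuned until the false-negative rate on verified lines stays within the tolerance agreed with the regulator, and lines between the two thresholds stay unknown until more evidence arrives.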

Statistical analysis vs machine learning

These three case studies, which involve differing amounts of potentially unsafe material (lead or galvanized), fall into two categories of analysis: statistical analysis and machine learning. A utility can choose one technique or use both in tandem, depending on the need. The two share some key similarities and differences, some of which are highlighted below.

Similarities	Differences
They both: – Necessitate representative sampling – Provide risk assessment at each home (i.e. probability of lead/galvanized) – Provide confidence (e.g. 95%) – Can be updated with more information	– Machine Learning necessitates examples of safe and unsafe materials (e.g. lead and non-lead) – Machine Learning provides prioritization of risk (i.e. ranking homes from high to low probability of risk) – Machine Learning provides explainability of why risk may occur (aspects in the environment that are important in discriminating risk)

The importance of iteration

BlueConduit’s work with each of these cities also shows the importance of iterating through multiple rounds of modeling. Both utilities fed the results of field work back into the analysis to improve accuracy and credibility, and building the process around iteration supercharged those efforts.

In Bloomington, BlueConduit modeling drove targeted verifications, prioritizing digs where the model saw the most plausible lead signal. As a result, the number of confirmed lead lines more than quadrupled (from 5 to 22).

In Goshen, BlueConduit recommended many iterations of inspections, starting with a random and representative sample and moving to targeted inspections after there was enough signal for a preliminary model. Continued rounds of small batches of inspections allowed for a more dynamic model-building process, saving time and money on digs.
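One round of such an iterative process might be sketched as follows. This is a simplified illustration, not the selection logic used in Goshen or Bloomington. It assumes a preliminary model already provides a risk score per service line, and it fills each small batch with the highest-risk uninspected lines plus a few random ones so the sample keeps some representativeness. The function name, the `random_share` parameter, and the service line IDs are hypothetical.

```python
import random
from typing import Dict, List, Set

def next_inspection_batch(
    predicted_risk: Dict[str, float],
    inspected: Set[str],
    batch_size: int,
    random_share: float = 0.5,
    seed: int = 0,
) -> List[str]:
    """Pick the next batch of service lines to inspect.

    Targeted picks come first (highest predicted risk), followed by random
    picks drawn from the remaining uninspected lines.
    """
    candidates = [sid for sid in predicted_risk if sid not in inspected]
    batch_size = min(batch_size, len(candidates))
    n_random = int(batch_size * random_share)
    n_targeted = batch_size - n_random

    # Rank uninspected lines from highest to lowest predicted risk.
    ranked = sorted(candidates, key=lambda sid: predicted_risk[sid], reverse=True)
    targeted = ranked[:n_targeted]

    # Draw the random picks from whatever the targeted picks left behind.
    rng = random.Random(seed)
    randoms = rng.sample(ranked[n_targeted:], n_random)
    return targeted + randoms

# Made-up risk scores from a hypothetical preliminary model.
risk = {"A": 0.90, "B": 0.70, "C": 0.20, "D": 0.10, "E": 0.05, "F": 0.60}
batch = next_inspection_batch(risk, inspected={"A"}, batch_size=4)
print(batch)
```

After each batch is inspected, the results would be added to the training data, the model refit, and the function called again with the updated scores and inspected set.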

Our playbook in one checklist

☐ Start with representative sampling (this is vital if you want to use any of these AI/ML/Statistics tools)
☐ Align method to question
☐ Establish regulatory alignment
☐ Prioritize explainability
☐ Institutionalize iteration

Discover your solutions

At BlueConduit, we’re method-agnostic and evidence-first. If your history points to no lead, we’ll help you finish fast with statistics. If you’ve got some lead, or galvanized/connector questions, we’ll put predictive modeling to work so every field hour moves the meter. Reach out to our team to find out what route will close unknowns fastest for your system.