Editor’s note: BlueConduit recently hosted a webinar on using predictive modeling to classify unknown service lines. In this series, we answer some of the questions we received.
The short answer is: no, and it depends. Let’s explore what matters.
There are a lot of terms out there: predictive modeling, statistical analysis, machine learning, and AI, to name a few. They are often used interchangeably, but they’re not the same. Let’s look at predictive modeling first. Predictive modeling looks for patterns in existing data and uses those patterns to predict how likely each service line is to be lead. For example, a model might predict that a particular service line is 92% likely to be lead.
Predictive modeling requires known lead or galvanized requiring replacement (GRR) service lines in order to predict how likely other service lines are to be lead or GRR. Why? Without known lead, the model has no information about where lead is likely to be found.
A history of lead pipes also counts as ‘known lead.’ For systems that don’t know of any lead/GRR currently in the ground, but have records of where those pipes have been found in the past, predictive modeling can use that data to generate predictions for lines still in the ground today.
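To make the "no known lead, no predictions" point concrete, here is a deliberately simple, hypothetical stand-in for a predictive model. It is not BlueConduit’s model (the source does not describe its internals); it just estimates lead likelihood as the share of verified lines that were lead among lines sharing the same attribute, assuming a hypothetical grouping key such as construction decade. Historical replacement records can be fed in as extra "lead" rows, which is why a history of lead pipes counts as known lead.

```python
from collections import defaultdict


def fit_group_rates(verified):
    """Estimate lead likelihood per group from verified service lines.

    verified: iterable of (group_key, is_lead) pairs. The grouping key
    (e.g. construction decade) is a hypothetical feature for illustration.
    Historical replacement records can be included as (group_key, True).
    """
    counts = defaultdict(lambda: [0, 0])  # group_key -> [lead_count, total_count]
    for group, is_lead in verified:
        counts[group][0] += int(is_lead)
        counts[group][1] += 1
    if not any(lead for lead, _ in counts.values()):
        # With zero lead examples every estimate would be 0: the data
        # carries no information about where lead is likely to be found.
        raise ValueError("no known lead in training data")
    return {group: lead / total for group, (lead, total) in counts.items()}


def lead_likelihood(rates, group):
    """Return the estimated lead likelihood, or None for an unseen group."""
    return rates.get(group)


if __name__ == "__main__":
    # Illustrative rows only, not real inventory data.
    verified = [("1930s", True), ("1930s", True), ("1930s", False), ("1980s", False)]
    rates = fit_group_rates(verified)
    print(lead_likelihood(rates, "1930s"))  # about 0.67
```

Real predictive models combine many features and use machine learning rather than a single lookup table, but the dependency is the same: without lead examples, there is no pattern to learn.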
So what do you do if you don’t have any known lead pipes, current or historical, in the system?
Having no known lead is not the same as having no lead. So first, you need better data to confirm that lead pipes are unlikely to be in the ground.
To do this, you must physically verify a representative sample of service lines. If you find lead, then you’re able to use predictive modeling to generate lead likelihood predictions for the rest of the service lines with unknown materials.
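One common way to make a field sample representative is a stratified random sample: split the unknown lines into groups and draw randomly from each in proportion to its size. The sketch below assumes each line can be assigned a hypothetical stratum (for example, neighborhood or construction era); state requirements and BlueConduit’s own sampling design may differ.

```python
import random
from collections import defaultdict


def stratified_sample(lines, stratum_of, n, seed=None):
    """Draw roughly n lines, allocated to strata in proportion to their size.

    lines: service line IDs (or records) whose material is unknown.
    stratum_of: hypothetical function mapping a line to its stratum key.
    Rounding can make the total differ slightly from n.
    """
    rng = random.Random(seed)  # seeded generator for a reproducible draw
    strata = defaultdict(list)
    for line in lines:
        strata[stratum_of(line)].append(line)
    total = sum(len(members) for members in strata.values())
    sample = []
    for key in sorted(strata):  # fixed order so a given seed gives the same draw
        members = strata[key]
        k = min(len(members), round(n * len(members) / total))
        sample.extend(rng.sample(members, k))
    return sample


if __name__ == "__main__":
    unknown_ids = list(range(1000))
    # Hypothetical strata: four equally sized groups.
    picked = stratified_sample(unknown_ids, lambda i: i % 4, n=100, seed=42)
    print(len(picked))
```

Random selection matters because lines chosen for convenience (for example, only easy-to-reach curb stops) may not reflect the system as a whole.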
If you don’t find lead in this representative sample, then you can’t use predictive modeling. However, water systems without lead can instead use statistical analysis to validate the absence of lead in their system.
This statistical approach uses field-verified data to draw statistical conclusions about what hasn’t been found. For example, if a water system completed 300 field investigations within a representative sample and didn’t find any lead lines, statistical analysis can show that the system is unlikely to have lead service lines, or that any undetected lead lines would make up only a very small share of the system. For purposes of service line inventories, most states will accept statistical analysis, accompanied by a non-lead validation report, to allow the water system to reclassify its entire system as non-lead.
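As a rough illustration of the kind of calculation involved, a standard textbook result bounds the lead share when zero of n randomly inspected lines are lead: solve (1 − p)^n = 1 − confidence for p. This is a generic binomial sketch, not BlueConduit’s method, and it assumes a simple random sample from a large system.

```python
def lead_share_upper_bound(n_inspected, confidence=0.95):
    """One-sided upper confidence bound on the share of lead lines when
    none of n_inspected randomly sampled lines turned out to be lead.

    Solves (1 - p) ** n = 1 - confidence for p (exact binomial, zero events).
    """
    if n_inspected <= 0:
        raise ValueError("need at least one inspection")
    alpha = 1 - confidence
    return 1 - alpha ** (1 / n_inspected)


if __name__ == "__main__":
    print(f"{lead_share_upper_bound(300):.2%}")  # 0.99%
```

With 300 lead-free inspections, this puts the lead share below roughly 1% at 95% confidence, matching the familiar "rule of three" approximation (3/300). A finite-population correction would tighten the bound slightly for small systems.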
An example of using statistical analysis with no known lead: Goshen, Indiana
A real-life example is the city of Goshen, Indiana. After classifying roughly 70 percent of its system as non-lead through historical records and field verifications, Goshen used BlueConduit’s statistical analysis tools to show that its remaining service lines had a lead likelihood of less than 0.5 percent on the utility side and less than 0.05 percent on the customer side.
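For a sense of scale, the same zero-event binomial reasoning can be inverted to ask how many lead-free random inspections a simple bound would need to support thresholds like these. This is a hypothetical back-of-the-envelope sketch; the source does not say how Goshen’s figures were computed or how many lines were inspected, and a model-based analysis can use other data besides inspections.

```python
import math


def inspections_needed(max_lead_share, confidence=0.95):
    """Smallest number of lead-free random inspections whose zero-event
    binomial upper bound is at or below max_lead_share."""
    if not 0 < max_lead_share < 1:
        raise ValueError("max_lead_share must be between 0 and 1")
    alpha = 1 - confidence
    # (1 - p) ** n <= alpha  <=>  n >= log(alpha) / log(1 - p)
    return math.ceil(math.log(alpha) / math.log1p(-max_lead_share))


if __name__ == "__main__":
    for target in (0.005, 0.0005):
        print(target, inspections_needed(target))
```

Under these assumptions, a 0.5 percent bound needs 598 lead-free inspections and a 0.05 percent bound needs 5,990, which shows why tighter thresholds usually call for more data than field sampling alone.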
“For us, using predictive modeling was more than just a cost savings – it helped us reduce our unknowns much more quickly, thereby reducing our LCRR notification and, eventually, our replacement requirements,” said Mattie Lehman, GIS Coordinator with Goshen.
Get more information about predictive modeling, statistical analysis, and machine learning by watching our webinar on demand, or contact us to learn more.