If you’re a water system with unknown service lines in your inventory, you’ve probably heard that you should use predictive modeling to classify those lines. (You should.) But how, exactly, do predictions influence whether a line is classified as lead or non-lead, or remains unknown? In this post, we share what we do at BlueConduit to help customers use the predictions we generate to classify their unknown materials as lead, galvanized requiring replacement (GRR), or non-lead.
Getting started
To start, you need high trust in your predictive modeling results. High-quality predictive models should provide segment-level lead likelihood predictions and full line classification, be dynamic, provide explainability, offer custom thresholds for lead likelihood classification, and have trustworthy performance metrics. Anything less, and you risk falling short of compliance requirements and making decisions based on low-quality data.
At BlueConduit, once a prediction is generated, our data scientists (who are humans, btw) work with each customer to define custom high and low lead likelihood thresholds that account for state guidance, unique local needs and data best practices, so they can classify as many unknowns as possible. Let’s talk a little more about what thresholding means, exactly.
Using thresholding
A prediction gives an exact probability for each line, for example, “Service Line A is 92 percent likely to be lead,” but the important question is whether that number is high enough to classify the line as lead. Conversely, the model may predict the likelihood of Service Line A being lead as only 8 percent. Is that low enough to be considered non-lead? These limits, or thresholds, are set based on three things: any existing state guidance, where applicable (North Carolina, for example, has specific thresholds for classifying a line as lead/GRR or non-lead); the estimated performance of the model at different thresholds across a range of metrics; and the water system’s own risk tolerance. For example, Water System A may not feel comfortable classifying any line whose probability of being lead falls between 10 and 70 percent, while Water System B may be fine with it.
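To make the idea concrete, here is a minimal sketch of two-sided thresholding in Python. It is an illustration only, not BlueConduit’s implementation: the function name, the sample probabilities, and the choice to treat the boundaries as inclusive are all assumptions, and the 70 percent and 10 percent cutoffs are borrowed from the Water System A example rather than from any state guidance. It also leaves out GRR, which depends on more than a lead likelihood.

```python
from collections import Counter

# Hypothetical thresholds for illustration only. Real values come from
# state guidance, model performance, and the water system's risk tolerance.
HIGH_THRESHOLD = 0.70  # at or above this probability: classify as lead
LOW_THRESHOLD = 0.10   # at or below this probability: classify as non-lead


def classify_line(lead_probability, low=LOW_THRESHOLD, high=HIGH_THRESHOLD):
    """Map a predicted lead probability (0.0 to 1.0) to a classification."""
    if not 0.0 <= lead_probability <= 1.0:
        raise ValueError("lead_probability must be between 0 and 1")
    if low >= high:
        raise ValueError("low threshold must be below high threshold")
    if lead_probability >= high:
        return "lead"
    if lead_probability <= low:
        return "non-lead"
    # Anything between the two thresholds is not confident enough either way.
    return "unknown"


# Made-up predictions for three service lines.
predictions = {
    "Service Line A": 0.92,
    "Service Line B": 0.08,
    "Service Line C": 0.40,
}

classified = {name: classify_line(p) for name, p in predictions.items()}
summary = Counter(classified.values())

for name, label in classified.items():
    print(f"{name}: {label}")
print(dict(summary))
```

With these made-up numbers, Service Line A lands in the lead group, Service Line B in the non-lead group, and Service Line C stays unknown. A water system with a different risk tolerance could pass its own `low` and `high` values, for example `classify_line(0.40, low=0.05, high=0.90)`, and see how many lines would remain unknown under each choice.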
Setting thresholds allows water systems to quickly classify their unknowns as lead or non-lead, efficiently reducing their unknowns and the LCRI compliance burdens associated with them. It also saves time and money that would otherwise have been spent on digging up lines to visually verify them.
Predictive modeling is an effective tool for water systems to quickly and efficiently classify their unknown service lines, but it’s critical to have the right teams and regulatory support to ensure you’re using your predictions as effectively as possible. Not all models are the same – a good model will include segment-level predictions and full line classification, be dynamic, and be tested across many trustworthy performance metrics, among other factors.
Reduce your unknowns and save time and money with BlueConduit’s LSL Solutions. Contact us to learn more.