The American Water Works Association’s M77 manual “Condition Assessment of Water Mains” defines risk as:
Risk = (Likelihood of failure) × (Consequence of failure)
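To make the M77 formula concrete, here is a minimal Python sketch that scores and ranks a few mains. It assumes LoF and CoF are each scored on a 1 to 5 scale; that scale, the main IDs, and the scores are hypothetical illustrations, not values from the M77.

```python
# Minimal sketch of the M77 risk definition: Risk = LoF x CoF.
# Assumption (hypothetical, not from M77): LoF and CoF are each scored 1-5.


def risk_score(lof: float, cof: float) -> float:
    """Risk as the product of likelihood of failure and consequence of failure."""
    return lof * cof


# Hypothetical mains: (LoF, CoF)
mains = {
    "main_A": (4.0, 2.0),  # likely to fail, modest consequence
    "main_B": (2.0, 5.0),  # unlikely to fail, severe consequence
    "main_C": (3.0, 3.0),
}

risks = {main_id: risk_score(lof, cof) for main_id, (lof, cof) in mains.items()}
ranked = sorted(risks, key=risks.get, reverse=True)  # highest risk first

for main_id in ranked:
    print(main_id, risks[main_id])
```

Note that main_B, the least likely of the three to fail, ranks highest because its consequence of failure is severe.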
For a likelihood of failure (LoF) analysis, the manual recommends using a weighted model. The manual also notes that predictive modeling, which we define as machine learning models, can be used to assess LoF and risk, particularly where a pipe’s current condition is unknown, or where the pipe is inaccessible or too costly to inspect.
So what’s the difference between a weighted model and predictive modeling? And which approach is best for assessing a water main’s likelihood of failure?
Weighted model
A weighted model is a simple, subjective method for prioritizing options based on weights assigned to different criteria. Simple weighted models are a good tool for making quick, relatively simple decisions when the relative importance of different factors is clearly understood, like selecting a vendor based on price, quality, and delivery time.
The M77 shows an example of a simple weighted model that focuses on three core criteria for water main LoF analysis: pipe age, soil corrosivity, and history of breaks.
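Here is a minimal sketch of what a three-criterion weighted model like the M77 example might look like in Python. The 0 to 1 criterion scores and the weights are hypothetical; choosing those weights is exactly the subjective step discussed next.

```python
# Minimal sketch of a simple weighted LoF model on the three M77 criteria.
# Assumptions (hypothetical): each criterion is pre-scored 0-1, and the
# weights are illustrative judgment calls that sum to 1.

WEIGHTS = {
    "pipe_age": 0.4,
    "soil_corrosivity": 0.3,
    "break_history": 0.3,
}


def weighted_lof(scores: dict[str, float]) -> float:
    """Combine normalized criterion scores (0-1) into one LoF score (0-1)."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical criterion scores for two mains
mains = {
    "main_A": {"pipe_age": 0.9, "soil_corrosivity": 0.2, "break_history": 0.1},
    "main_B": {"pipe_age": 0.3, "soil_corrosivity": 0.8, "break_history": 0.7},
}

for main_id, scores in mains.items():
    print(main_id, round(weighted_lof(scores), 2))
```

Change the weights (say, to 0.8 for age and 0.1 for each of the others) and main_A overtakes main_B, even though nothing about the pipes has changed.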
However, water main LoF assessment is not simple, and most utilities don’t have a clear understanding of the relative importance of different factors. Relying on subjective judgments by the utility or its engineering firm amounts to a best guess about what matters; it can easily bias the results and obscure other sources of risk.
To address the subjective nature of simple weighted models, the M77 recommends using statistics, such as regression analysis, to objectively assign weights to the different variables in the model. This leads to the next approach to consider: a weighted model supported by regression analysis.
Weighted model with regression analysis
Multivariable regression analysis uses statistics to identify linear relationships between variables, which lets utilities use data to decide how each variable should be weighted (that is, how much it affects LoF) in a weighted model.
Regression analysis is a great tool when looking for linear relationships between variables and making predictions based on those relationships, especially when data is well-structured and the relationships are relatively straightforward.
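As a rough illustration, the sketch below fits an ordinary least squares regression using only the Python standard library and reads the fitted coefficients as candidate weights. The feature names and the six-main dataset are hypothetical, and the target values were constructed from a known linear rule so the recovered coefficients are easy to check; real break data will never fit this cleanly.

```python
# Minimal sketch: multivariable ordinary least squares (OLS) to derive weights.
# Assumptions (hypothetical): the features and data below; the target is built
# from an exactly linear rule so the fit can be checked.


def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve the small linear system a @ x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))  # partial pivoting
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(features: list[list[float]], target: list[float]) -> list[float]:
    """Fit target ~ intercept + features via the normal equations (X^T X) w = X^T y."""
    X = [[1.0] + row for row in features]  # prepend an intercept column
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(X, target)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical mains: [age in decades, soil corrosivity 0-1, prior breaks]
features = [
    [1.0, 0.2, 0.0],
    [3.0, 0.8, 1.0],
    [5.0, 0.5, 2.0],
    [7.0, 0.1, 0.0],
    [9.0, 0.9, 3.0],
    [4.0, 0.4, 1.0],
]
# Constructed as 0.5 + 0.2*age + 1.5*corrosivity + 0.8*breaks
target = [0.5 + 0.2 * a + 1.5 * c + 0.8 * b for a, c, b in features]

intercept, *weights = ols(features, target)
print([round(w, 3) for w in weights])
```

Raw coefficients depend on each feature's units, so features are usually standardized before the coefficients are compared or used as relative weights.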
While regression analysis adds objectivity to a weighted model, it is not well suited to a high-accuracy LoF analysis. Why? Because water main LoF data is complex, and the relationships between variables are often non-linear and do not exist in isolation. Let’s explore these ideas further.
Linear regression assumes linear relationships between variables, but water main break data doesn’t always follow them. Take pipe age, for example. Industry averages show pipes degrading linearly over time. While that may be a common pattern, many factors influence LoF: we often hear from customers about mains more than 100 years old that have never broken, alongside newer mains that break regularly.
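The sketch below illustrates the problem with a hypothetical U-shaped relationship, in which both young and old pipes break more often than mid-life pipes. A straight-line fit of break rate against age returns a slope of essentially zero, suggesting age does not matter, even though age fully determines the break rate in this made-up data.

```python
from statistics import fmean

# Hypothetical U-shaped data: break rates are high for young and old pipes
# and lowest at mid-life (age 50).
ages = [10, 20, 30, 40, 50, 60, 70, 80, 90]
break_rate = [(age - 50) ** 2 / 100 for age in ages]


def linear_fit(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (slope, intercept) of the ordinary least squares line through the points."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, my - slope * mx


slope, intercept = linear_fit(ages, break_rate)
# The fitted slope is essentially zero: the line is flat at the average rate.
print(f"slope={slope:.4f}, intercept={intercept:.2f}")
```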
The second challenge of using regression for LoF analysis is that the relevant variables influence each other. A typical regression analysis estimates how each variable affects LoF in isolation, holding the others constant. For example, a regression analysis can show how a water main’s LoF shifts up or down as soil corrosivity increases. But in reality, the impact of soil corrosivity likely also depends on water main material, among other factors. Unless the analyst anticipates a specific interaction and adds it to the model as an explicit term, a regression cannot assess what happens when the variables combine and interact in different ways.
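The interaction problem can be shown with a few hypothetical numbers. In the sketch below, the break rates are invented, and treating PVC as unaffected by soil corrosivity is an illustrative assumption. The code compares the corrosivity effect within each material to the single pooled effect you get when material is ignored.

```python
from statistics import fmean

# Hypothetical break rates (breaks per 100 miles per year) for small groups of
# mains, keyed by (pipe material, whether the surrounding soil is corrosive).
observations = {
    ("cast_iron", False): [10.0, 12.0, 11.0],
    ("cast_iron", True): [30.0, 34.0, 32.0],
    ("pvc", False): [5.0, 6.0, 4.0],
    ("pvc", True): [5.0, 4.0, 6.0],
}


def corrosivity_effect(material: str) -> float:
    """Change in mean break rate from non-corrosive to corrosive soil, for one material."""
    return fmean(observations[(material, True)]) - fmean(observations[(material, False)])


def pooled_corrosivity_effect() -> float:
    """The same comparison with material ignored: one effect applied to every main."""
    corrosive, non_corrosive = [], []
    for (_material, is_corrosive), rates in observations.items():
        (corrosive if is_corrosive else non_corrosive).extend(rates)
    return fmean(corrosive) - fmean(non_corrosive)


print("cast iron:", corrosivity_effect("cast_iron"))
print("pvc:", corrosivity_effect("pvc"))
print("pooled:", pooled_corrosivity_effect())
```

In this balanced example, the pooled effect of 10.5 is what a single corrosivity weight would capture: it overstates the effect for PVC mains and understates it for cast iron mains by half. A regression can recover the difference through an explicit material × corrosivity interaction term, but only if the analyst thinks to add one.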
Last but not least, adding regression analysis to a weighted model makes it more complex, and the resulting risk estimates are static over time. Ultimately, this analysis adds important data to a weighted model, but it sacrifices the model’s ease of use and still cannot provide a highly accurate, dynamic view of LoF in water main infrastructure.
Predictive modeling
Machine learning, often referred to as predictive modeling, is a statistical approach that can handle complex, non-linear relationships between variables using advanced algorithms such as decision trees, random forests, and neural networks, allowing for more nuanced predictions.
Machine learning models lean on the same data used for a regression analysis, including historical break data. The models, supported by data scientists, can then manage tens or hundreds of additional factors and identify complex patterns across the dataset, including an understanding of how factors combine and interact with one another.
Machine learning models can then determine how each factor influences risk main by main, rather than on average across the full dataset. This produces LoF predictions for each water main that reflect that main’s specific conditions: while break history may be a key driver of risk for one water main, another may have a risk profile driven by soil type, weather, the weight of the load above the pipe, and so on.
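To show how a decision tree, one of the algorithms named above, can find this kind of main-by-main pattern without being told where to look, here is a deliberately tiny regression tree in plain Python. The features, the eight data rows, and the depth limit are all hypothetical, and this is a teaching sketch rather than how a production model would be built; real projects would typically use a mature library and far more data.

```python
from statistics import fmean, pvariance

# Hypothetical training data.
# Features: (is_cast_iron 0/1, corrosive_soil 0/1, age_in_decades) -> break rate
ROWS = [
    ((1, 0, 6), 11.0), ((1, 0, 8), 12.0), ((1, 1, 6), 31.0), ((1, 1, 8), 33.0),
    ((0, 0, 6), 5.0), ((0, 0, 8), 5.0), ((0, 1, 6), 5.0), ((0, 1, 8), 6.0),
]


def sse(targets: list[float]) -> float:
    """Sum of squared errors around the mean of a group of targets."""
    return pvariance(targets) * len(targets) if len(targets) > 1 else 0.0


def best_split(rows):
    """Return the (feature, threshold) split that most reduces squared error, or None."""
    best, best_score = None, sse([y for _, y in rows])
    for f in range(len(rows[0][0])):
        for t in sorted({x[f] for x, _ in rows}):
            left = [y for x, y in rows if x[f] <= t]
            right = [y for x, y in rows if x[f] > t]
            if not left or not right:
                continue  # a split must send rows both ways
            score = sse(left) + sse(right)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build(rows, depth: int):
    """Grow a regression tree: internal nodes are (feature, threshold, left, right),
    leaves are the mean break rate of the rows that reach them."""
    split = best_split(rows) if depth > 0 else None
    if split is None:
        return fmean([y for _, y in rows])
    f, t = split
    left_rows = [r for r in rows if r[0][f] <= t]
    right_rows = [r for r in rows if r[0][f] > t]
    return (f, t, build(left_rows, depth - 1), build(right_rows, depth - 1))


def predict(node, x) -> float:
    """Follow splits from the root down to a leaf."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


tree = build(ROWS, depth=2)
print(predict(tree, (1, 1, 7)))  # cast iron main in corrosive soil
print(predict(tree, (0, 1, 7)))  # PVC main in corrosive soil
```

The tree splits first on material and then on soil corrosivity, so it predicts a high break rate for a cast iron main in corrosive soil and a low one for a PVC main in the same soil: the interaction emerges from the data rather than from a term the analyst specified.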
The major challenge of machine learning models is explainability: this is a more complex analysis than a simple weighted model or regression, so it can be harder to understand the factors driving the LoF predictions. However, with the right tools and analysis, this is an easy problem to solve. When selecting a vendor for machine learning analysis, make sure to focus on explainability. Ask how they communicate and display the features driving predictions in their models and output. Don’t forget to ask about both global and local explainability; you want to understand what most influences LoF across your complete system as well as for each individual water main.
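Global and local explainability can be sketched with a small example. The code below assumes a model has already produced per-main feature contributions (how far each feature pushed a main’s LoF above or below the system average); the features and values are hypothetical. Global importance is computed as the mean absolute contribution across mains, which is one common convention among several.

```python
from statistics import fmean

# Hypothetical per-main feature contributions, as an explainability tool
# might report them for a fitted model.
contributions = {
    "main_A": {"break_history": 0.30, "soil_type": 0.05, "weather": -0.02},
    "main_B": {"break_history": 0.02, "soil_type": 0.25, "weather": 0.04},
    "main_C": {"break_history": -0.05, "soil_type": 0.01, "weather": 0.20},
}


def local_top_driver(main_id: str) -> str:
    """Local view: the feature with the largest absolute effect on one main."""
    feats = contributions[main_id]
    return max(feats, key=lambda f: abs(feats[f]))


def global_importance() -> dict[str, float]:
    """Global view: mean absolute contribution of each feature across all mains."""
    features = next(iter(contributions.values())).keys()
    return {f: fmean([abs(c[f]) for c in contributions.values()]) for f in features}


ranking = sorted(global_importance().items(), key=lambda kv: kv[1], reverse=True)
print("Global:", [f for f, _ in ranking])
for main_id in contributions:
    print(main_id, "->", local_top_driver(main_id))
```

In this toy example, break history is the top driver system-wide, yet it is the top driver for only one of the three mains, which is why both views are worth asking for.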
Hands down, a high-quality machine learning model will provide the most complete, accurate view of LoF across your water main infrastructure, enabling high-quality, data-driven decision making and proactive intervention, from physical condition assessment to maintenance to replacement.
Want to learn more about using machine learning models for water main risk and condition assessment? Sign up for our upcoming webinar.