Statement for the record provided by BlueConduit to the U.S. House of Representatives Committee on Science, Space, and Technology, Subcommittee on Investigations and Oversight
October 15, 2019
Dear Chairwoman Sherrill and Ranking Member Norman:
Ending the ongoing public health crisis created by lead service pipes requires identifying the pipes most likely to pose a risk in order to replace them. We currently do not have enough information to identify these pipes in a way that responds to the immediacy of the crisis and reduces the time people live with lead pipes. A data-driven approach, like the predictive model we developed for and employed in Flint, Michigan, helps identify lead service pipes in a sustainable manner that can be replicated across communities.
Please accept the following statement for the record regarding the Committee’s October 15th hearing, “Addressing the Lead Crisis through Innovation & Technology.” We applaud the Committee for holding this hearing and for recognizing the urgent need to respond to this public health crisis. Our hope is that this hearing will demonstrate how data-driven approaches can help provide safe drinking water for millions of Americans in a financially sustainable manner.
In response to the Flint Water Crisis in 2016, we developed a data-driven approach to help the city discover and replace its most harmful lead service lines, within its budgetary constraints, in order to stop its residents’ ongoing exposure to lead. This same approach, strengthened through insights gained in Flint, can be replicated nationwide, as our team has started to do. Our team is led by Dr. Eric Schwartz, Assistant Professor of Marketing at the University of Michigan, and Dr. Jacob Abernethy, Assistant Professor of Machine Learning at Georgia Institute of Technology, who co-authored related peer-reviewed research [1] and collaborated with Brigadier General (ret.) Michael McDaniel, the coordinator for Flint’s service line replacement program in 2016-17. Dr. Schwartz and Dr. Abernethy also co-founded BlueConduit, LLC, a social venture created to help other water systems replace lead service lines quickly and efficiently.
We respectfully offer our perspective to the Committee and the public in the hope that public-health concerns and data will drive decision making in lead service-line replacement efforts across the country. The following are the key takeaways from our work in Flint that we believe communities can and should incorporate as they structure their service-line replacement and inventory programs.
A. The Number of Lead Service Lines Is Currently Unknown
1. National and regional estimates are unreliable because local data is unavailable. Unfortunately, community water systems, some of which were installed over a century ago, have not maintained good records about the composition of service lines. In Flint, as in other cities across the country, outdated and incomplete records make it difficult to identify the quantity and location of hazardous pipes. Only 10% of Flint’s historical records indicated a lead service line. Now, after excavations at more than 20,000 homes, the data show that almost 60% of Flint had lead pipes. As a result, the original budget allocation was highly insufficient.
2. The federal rules are difficult to enforce. Current rules require local governments to replace a specified percentage of lead service lines. But because localities do not know how many lead service lines they have, enforcing a percentage of an unknown number is infeasible and can discourage localities from identifying lead service lines. In addition to requiring lead replacement, rules could also aim to locate all of the lead by requiring the inspection of a certain percentage of all service lines with unverified materials.
3. When the challenge of identifying homes with lead lines is ignored, stated costs can both underestimate and overestimate the real costs of replacement. Service line replacement costs are underestimated when they fail to incorporate the costs of discovering lead-tainted service lines. For example, a typical estimate for the cost of replacing a single home’s water service line is $5,000. Yet this quote ignores the excess cost of the same crews digging at a home only to discover it does not need a replacement. Such an unnecessary excavation typically costs around $2,500. If a city digs like this at 300 homes, replacing lead service lines at 100 homes and discovering copper at the other 200 homes, the total cost is 100 × $5,000 + 200 × $2,500 = $1,000,000, so the average cost of a successful replacement is $10,000. Likewise, using wholesale approaches to dig everywhere, instead of data-driven approaches to prioritize efforts, inflates costs by requiring excess spending on service line discovery, while also diverting resources from successful service line replacement.
4. Even existing verified service lines may not be representative of the whole system. Identifying lead service lines based on previous discoveries of lead service lines is also not sufficient. In Flint, the first 171 attempted replacement excavations found lead pipes more than 96% of the time [2], well above the almost 60% rate later observed across the city. In other cities, existing data on verified service lines are commonly collected in settings like water main breaks. While this information is helpful, the occurrence of lead service lines found in such settings may not be representative of the whole community.
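The cost arithmetic in item 3 can be reproduced with a short calculation. The Python sketch below is illustrative only: the function name and parameters are hypothetical, and the default unit costs are simply the $5,000 and $2,500 figures quoted in this statement, not values drawn from any particular city's contracts.

```python
def cost_per_successful_replacement(
    homes_excavated: int,
    lead_lines_found: int,
    replacement_cost: float = 5_000.0,  # quoted cost of one replacement
    dry_dig_cost: float = 2_500.0,      # quoted cost of digging and finding no lead
) -> float:
    """Average total spend per lead service line actually replaced."""
    if not 0 < lead_lines_found <= homes_excavated:
        raise ValueError("lead_lines_found must be between 1 and homes_excavated")
    non_lead_digs = homes_excavated - lead_lines_found
    total_cost = lead_lines_found * replacement_cost + non_lead_digs * dry_dig_cost
    return total_cost / lead_lines_found


# The example from item 3: 300 excavations, 100 lead lines replaced.
print(cost_per_successful_replacement(300, 100))  # 10000.0
```

Under the same assumed unit costs, a hypothetical program that found lead at 240 of 300 excavated homes would spend $5,625 per successful replacement, which illustrates how strongly the hit rate drives the real cost.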
B. Identifying Lead Service Lines Using Statistics and Machine Learning
5. There are inexpensive methods to inspect a home’s service line materials. The cost of digging under the street and into a home’s yard with a backhoe can be thousands of dollars. But the pipe material can be inspected just as accurately for only hundreds of dollars by combining an in-home inspection and an excavation method with a hydro-vacuum. A hydro-vacuum truck can be dispatched around a city to perform inspections quickly and at more homes for the same budget.
6. Local governments should begin service line replacement programs by inspecting materials at a representative set of homes. Best practice in statistics suggests that localities perform inspections at a uniformly random set of homes, representing the whole community, in order to provide the best estimate of the total number of lead service lines. This provides a dataset that then serves as a reliable basis for other analyses, such as estimates for funding requests. In Flint, we recommended that crews conduct inspections with a hydro-vacuum truck at a statistically representative set of homes to get a better estimate of the number of lead pipes in the city. Based on a few hundred inspections in late 2016, this estimate [3] proved to be accurate three years later.
7. The initial data enables statistical models to guide local governments in targeting areas most likely to have lead. The representative set of homes in Flint, combined with other publicly available information about homes, enabled Schwartz and Abernethy to use a statistical machine-learning model, which produced home-by-home predictions of the probability of lead service lines for every home in the city. The probabilities allowed General McDaniel’s team to prioritize homes with the highest likelihood of lead and direct the crews where to go next for replacing lead. The approach’s success led a 2019 federal court settlement agreement to require its use in Flint.
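Items 6 and 7 describe two steps that can be sketched in a few lines of Python: estimating the share of lead lines from a uniformly random sample, and then ranking homes by predicted probability. This is a minimal illustration and not the model used in Flint. The Wilson score interval is one standard choice that we assume here for the sampling estimate, and the sample counts, home identifiers, and probabilities are made-up placeholders standing in for real inspection results and model output.

```python
import math


def wilson_interval(lead_found: int, inspected: int, z: float = 1.96):
    """Wilson score interval for the share of lead lines in a random sample.

    z = 1.96 corresponds to an approximate 95% confidence level.
    """
    if inspected <= 0 or not 0 <= lead_found <= inspected:
        raise ValueError("need 0 <= lead_found <= inspected and inspected > 0")
    p = lead_found / inspected
    denom = 1 + z**2 / inspected
    center = (p + z**2 / (2 * inspected)) / denom
    half_width = (
        z * math.sqrt(p * (1 - p) / inspected + z**2 / (4 * inspected**2)) / denom
    )
    return center - half_width, center + half_width


def prioritize(predicted_probability: dict, crew_capacity: int) -> list:
    """Return the homes with the highest predicted probability of lead."""
    ranked = sorted(
        predicted_probability.items(), key=lambda item: item[1], reverse=True
    )
    return [home for home, _ in ranked[:crew_capacity]]


# Hypothetical sample: 180 lead lines found in 300 randomly selected inspections.
low, high = wilson_interval(lead_found=180, inspected=300)
print(f"Estimated share of lead lines: {low:.1%} to {high:.1%}")

# Hypothetical model output for four homes; crews can visit two of them next.
scores = {"home_a": 0.92, "home_b": 0.15, "home_c": 0.71, "home_d": 0.40}
print(prioritize(scores, crew_capacity=2))  # ['home_a', 'home_c']
```

The interval is only meaningful when the inspected homes are chosen at random, as item 6 recommends. Applying it to homes that were dug up because lead was already suspected would overstate the citywide share.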
C. Being Transparent with Data and Clarifying Stakeholders’ Incentives Will Help Align Financial and Public Health Goals
8. Statistical models need transparency in implementation to best serve the public. Data-driven models are not sufficient on their own. They require transparent implementation and the collection of representative data, which in turn requires recognizing the competing interests of stakeholders. For example, the proper collection of data requires a division between crews inspecting service pipes and those replacing the pipes. Data must also be easily accessible to the public, and in turn, the community should be able to participate in data collection. On a broader level, Flint has illustrated how empowering local communities is necessary to promote public health.
The nation’s aging infrastructure presents grave risks to public health. Making data-driven decisions by applying established statistical approaches, as we have in Flint, can help other communities remove lead from their drinking water systems and maximize any budget’s impact on public health.
Eric M. Schwartz, Ph.D.
Assistant Professor of Marketing, Stephen M. Ross School of Business, University of Michigan
Jacob Abernethy, Ph.D.
Assistant Professor of Computer Science, College of Computing, Georgia Institute of Technology
Managing Director, BlueConduit
Chief Data Scientist, BlueConduit
Michael McDaniel, Brigadier General (Ret.)
Professor, Director of LL.M. in Homeland and National Security Law, and Associate Dean, Cooley Law School, Western Michigan University
Director of Government and Customer Relations, BlueConduit
[1] Abernethy, Jacob D., Alex Chojacki, Arya Farahi, Eric M. Schwartz, and Jared Webb (2018). ActiveRemediation: The Search for Lead Pipes in Flint, Michigan. KDD 2018, Proceedings of SIGKDD Conference on Knowledge Discovery and Data Mining, London, England, U.K.