Understanding accuracy, confidence intervals, and “low-lead” statistical analyses
As water systems work to comply with Lead and Copper Rule Improvements (LCRI), many are turning to statistically valid sampling and analysis to better characterize remaining unknown service lines. A common and important question arises in this process:
If lead is found at one of the randomized inspection locations, how do we know whether that finding is accurate, and whether it represents a one-time occurrence or a broader issue? The question is especially pressing when your system has no history of lead and you did not expect to find any during your investigation.
The short answer: finding lead at a randomized location does not automatically invalidate a statistical analysis. In fact, when properly designed, these analyses are built to account for that possibility.
Below, we walk through how no-lead and low-lead statistical analyses work, how confidence levels and confidence intervals are interpreted, and what utilities should do when lead is discovered during randomized inspections.
How no-lead and low-lead statistical analyses work
When a representative set of unknown service lines is inspected—on both the public and private sides—the resulting data can be used to characterize uncertainty across the entire system.
If none of the inspected service lines contain lead, utilities can make statistically defensible statements such as:
“We are 95% confident that there are fewer than X lead service lines on the public side of the system (x% of the system) and fewer than Y lead service lines on the private side of the system (y% of the system).”
To produce these results, data scientists rely on two inputs:
- The number of inspected service lines with no lead found, and
- The desired confidence level (commonly 95%).
From these inputs, the analysis calculates the maximum percentage of lead service lines that could reasonably exist, given the observed data.
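As a rough illustration of how those two inputs produce an upper bound, the sketch below assumes a simple binomial model in which every unknown line has the same chance of being lead and inspections are independent random draws. An actual analysis may use a different method (a finite-population or Bayesian approach, for instance), and the function names and example figures (300 inspections, 5,000 unknown lines) are hypothetical.

```python
import math


def zero_lead_upper_bound(n_inspected: int, confidence: float = 0.95) -> float:
    """Upper bound on the lead rate when n_inspected random inspections found no lead.

    Assumption: binomial model. Solves (1 - p) ** n_inspected = 1 - confidence for p.
    """
    if n_inspected <= 0:
        raise ValueError("n_inspected must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    alpha = 1.0 - confidence
    return 1.0 - alpha ** (1.0 / n_inspected)


def max_lead_lines(upper_rate: float, n_unknown: int) -> int:
    """Convert an upper-bound rate into a count of lines, rounded up."""
    return math.ceil(upper_rate * n_unknown)


# Hypothetical inputs, for illustration only.
rate = zero_lead_upper_bound(n_inspected=300, confidence=0.95)
count = max_lead_lines(rate, n_unknown=5000)
print(f"Upper bound on lead rate: {rate:.2%}")
print(f"Upper bound on lead service lines: {count}")
```

Under these assumptions, roughly 300 inspections with no lead found are enough to bound the lead rate just under 1% at 95% confidence.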
Example result
A system’s analysis might conclude:
We are 95% confident that there are fewer than 75 lead service lines on the public side (<1% of the system) and fewer than 50 lead service lines on the private side (<1% of the system).
In this example, the upper bounds are 75 lead service lines on the public side and 50 lead service lines on the private side.
This represents a conservative, protective estimate—not a guarantee that zero lead exists.
Why 100% certainty is rare (without 100% inspection)
With the exception of inspecting every single service line, absolute certainty that zero lead exists is not achievable. Statistical approaches instead focus on bounding uncertainty.
That means:
- Discovering one or a few lead service lines does not automatically contradict the analysis.
- The key question becomes whether those findings align with the expected upper bound of the statistical results.
When limited lead findings fall within that expected range, the system is generally characterized as “low-lead,” not misclassified.
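To see why a single finding need not contradict the analysis, it helps to ask how likely a lead discovery would be if the true rate sat somewhere below the reported upper bound. The sketch below assumes independent random inspections and a hypothetical true lead rate of 0.5% (inside a "<1%" bound). Both figures and the function names are illustrative, not taken from any specific system.

```python
def prob_at_least_one_lead(true_rate: float, n_inspections: int) -> float:
    """Chance that n_inspections random inspections turn up one or more lead lines."""
    return 1.0 - (1.0 - true_rate) ** n_inspections


def expected_lead_findings(true_rate: float, n_inspections: int) -> float:
    """Average number of lead lines found across n_inspections random inspections."""
    return true_rate * n_inspections


# Hypothetical scenario: true rate of 0.5%, 200 further randomized inspections.
TRUE_RATE = 0.005
N_INSPECTIONS = 200

p_any = prob_at_least_one_lead(TRUE_RATE, N_INSPECTIONS)
expected = expected_lead_findings(TRUE_RATE, N_INSPECTIONS)
print(f"Chance of at least one lead finding: {p_any:.0%}")
print(f"Expected number of lead findings: {expected:.1f}")
```

In this hypothetical, a system well inside its "<1%" bound would still have roughly a 63% chance of encountering at least one lead line, so a finding of that kind is compatible with a low-lead result.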
Understanding 95% confidence levels and confidence intervals
What a 95% confidence level means
A 95% confidence level means that if the same randomized inspection process were repeated many times, about 95 out of every 100 resulting intervals would be expected to contain the true system-wide proportion of lead service lines.
Importantly:
- This confidence applies to the system estimate, not to any individual service line.
- It does not mean there is a 95% chance a specific service line is lead or non-lead.
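The "repeated many times" interpretation can be checked with a small simulation. The sketch below assumes a hypothetical system with a true lead rate of 1%, draws 300 random inspections per trial, and computes a one-sided exact binomial (Clopper-Pearson) upper bound each time. The bound method, the figures, and the function names are assumptions chosen for illustration.

```python
import math
import random


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(at most k lead findings in n inspections) at lead rate p."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def upper_bound(k: int, n: int, confidence: float = 0.95) -> float:
    """One-sided exact binomial upper bound on the lead rate, found by bisection."""
    if k >= n:
        return 1.0
    alpha = 1.0 - confidence
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if binom_cdf(k, n, mid) > alpha:
            lo = mid  # rate still too plausible; the bound lies higher
        else:
            hi = mid
    return hi


def simulate_coverage(true_rate: float, n: int, trials: int, seed: int = 0) -> float:
    """Share of repeated samples whose upper bound is at or above the true rate."""
    rng = random.Random(seed)
    bounds = {}  # cache: number of lead findings -> upper bound
    covered = 0
    for _ in range(trials):
        k = sum(rng.random() < true_rate for _ in range(n))
        if k not in bounds:
            bounds[k] = upper_bound(k, n)
        covered += true_rate <= bounds[k]
    return covered / trials


# Hypothetical system: 1% true lead rate, 300 inspections per sample.
coverage = simulate_coverage(true_rate=0.01, n=300, trials=2000)
print(f"Share of samples whose interval contains the true rate: {coverage:.1%}")
```

The share should land close to 95%. Each individual sample either contains the true rate or does not; the 95% describes the procedure, not any single service line.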
What a confidence interval represents
A confidence interval is the range of plausible values for lead prevalence, reflecting sampling uncertainty.
For example:
- If results estimate that about 1% of unknown service lines may be lead, and
- The 95% confidence interval extends 5 percentage points on either side of that estimate,
Then the interval runs from 0% to 6%: the lower end is cut off at 0% because prevalence cannot be negative.
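The arithmetic behind that example is a simple truncation. The sketch below treats the estimate as exactly 1% and the margin as 5 percentage points, which is an assumption made for illustration; the function name is hypothetical.

```python
def clamp_interval(estimate: float, margin: float) -> tuple[float, float]:
    """Return (lower, upper) for estimate +/- margin, kept within 0% to 100%."""
    lower = max(0.0, estimate - margin)
    upper = min(1.0, estimate + margin)
    return lower, upper


# Assumed inputs: 1% estimate, 5 percentage point margin.
lower, upper = clamp_interval(estimate=0.01, margin=0.05)
print(f"Interval: {lower:.0%} to {upper:.0%}")
```

Symmetric plus-or-minus intervals tend to behave poorly when the estimate is close to zero, which is one reason a one-sided upper bound is often reported instead.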
Why regulators focus on the upper bound
The upper bound of a confidence interval represents a conservative, worst-case estimate that is still consistent with the observed data.
Regulators often rely on the upper bound because it:
- Accounts for uncertainty,
- Prioritizes public health protection, and
- Supports defensible planning decisions.
In lead service line identification, the upper bound can be used to:
- Inform replacement planning and prioritization,
- Establish compliance strategies under LCRI or state guidance, and
- Set conservative assumptions when full physical verification is not feasible.
If lead is found at a randomized inspection location
Finding lead during randomized inspections is an expected and acceptable outcome within a statistically valid framework. Each inspection is one observation drawn from a population with some unknown level of lead prevalence.
Utilities typically follow a structured process to interpret the result.
1. Verification of the observation
- Confirm material type using approved methods (e.g., potholing, meter set inspection, records review).
- Ensure inspection protocols were followed correctly.
2. Assessment within the confidence interval
- Determine whether the finding is consistent with the existing confidence interval.
- A lead discovery that is consistent with the interval’s upper bound does not indicate a flaw in the analysis.
3. Model and interval re-evaluation
- Incorporate the new data point into the dataset.
- Recalculate the estimated lead rate and confidence interval.
- Evaluate whether the upper bound meaningfully changes.
4. Targeted follow-up (if needed)
- If findings materially increase the upper bound or suggest clustering, utilities may conduct additional randomized or targeted inspections to reduce uncertainty.
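Step 3 can be sketched as follows, assuming a one-sided exact binomial (Clopper-Pearson) upper bound solved by bisection. This is one common choice, not necessarily the method used in any particular state-approved analysis, and the sample size, planning threshold, and function names are hypothetical.

```python
import math


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(at most k lead findings in n inspections) at lead rate p."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def upper_bound(k: int, n: int, confidence: float = 0.95) -> float:
    """One-sided exact binomial upper bound on the lead rate, found by bisection."""
    if k >= n:
        return 1.0
    alpha = 1.0 - confidence
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if binom_cdf(k, n, mid) > alpha:
            lo = mid  # rate still too plausible; the bound lies higher
        else:
            hi = mid
    return hi


# Hypothetical comparison: 0 lead findings in 300 inspections versus 1 in 300.
N_INSPECTED = 300
PLANNING_THRESHOLD = 0.02  # assumed acceptable upper bound for planning purposes

before = upper_bound(0, N_INSPECTED)
after = upper_bound(1, N_INSPECTED)
still_within = after <= PLANNING_THRESHOLD

print(f"Upper bound with no lead found:  {before:.2%}")
print(f"Upper bound with one lead found: {after:.2%}")
print(f"Within planning threshold: {still_within}")
```

If the recalculated bound exceeds the threshold a utility or regulator has set, that is the signal to move on to targeted follow-up in step 4.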
Demonstrating that a finding is isolated
A lead finding can reasonably be characterized as a one-time or isolated occurrence when:
- It does not materially shift the estimated lead prevalence,
- The upper bound remains stable and within acceptable planning thresholds, and
- Additional inspections do not reveal spatial or material clustering.
In these cases, the overall classification and conclusions of the statistical analysis remain valid.
The bottom line
Confidence levels and confidence intervals provide a transparent, defensible way to quantify uncertainty in lead service line identification. When properly applied, they allow utilities and regulators to make informed, health-protective decisions—even when individual lead findings occur.
As more states adopt no-lead and low-lead statistical analyses, these frameworks are becoming a critical tool for responsibly reclassifying unknown service lines, prioritizing replacement efforts, and advancing compliance under LCRI without requiring unnecessary or impractical levels of excavation.
If you’d like help documenting a statistically defensible approach for regulatory submission, or aligning your analysis with state-specific guidance, a plan can be developed and tailored to your system’s needs.
If your system is navigating no-lead or low-lead statistical analyses—and you want confidence that your approach will stand up to regulatory review—BlueConduit can help.
Our team works with utilities nationwide to design statistically defensible inspection plans, interpret findings within confidence intervals, and align results with state-specific LCRI guidance.
Reach out to BlueConduit to discuss how we can support your LCRI compliance efforts.