I interviewed BlueConduit’s VP of Data Science, Alice Berners-Lee, to learn more about how BlueConduit’s expert data science team defines AI, predictive modeling, and statistical analysis. This interview has been edited and condensed for clarity.
Elana Fox: We always hear about statistical analysis, predictive modeling, and AI. How do you define these three terms, and how do they relate to each other?
Alice Berners-Lee: Broadly, I think about it in a hierarchy: AI to statistical analysis to predictive modeling. So you’re getting more and more specific as you go.
To me, AI is a colloquial term that is a huge umbrella. It basically just means things that a computer can do that a brain can also do; it’s just intelligence that is not in a brain. The colloquial definition of “AI” changes with the times. That’s one of the reasons that I think it’s really a useless word in a lot of ways.
In certain subfields of computer science or in neuroscience, there are areas where you get specific definitions of AI, but that’s not what we’re typically talking about when we talk about AI.
Elana Fox: When people talk about AI today, do you think they’re mostly talking about generative AI?
Alice Berners-Lee: I don’t think so. I think that people like using the term AI because it changes with the times. Artificial intelligence is us making computers simulate what our brains can do. To me, AI always seems to mean, colloquially, the thing that people are impressed that computers can do. That new thing becomes ‘AI,’ and once people get used to the fact that computers can do this, and realize how different humans are from computers, then all of a sudden they’re like, ‘Oh, that’s not AI because that’s not intelligence. That’s just a computer doing this thing.’ And we do it a lot better.
But when things first come out then everybody’s like, ‘I can’t believe a computer is doing this for me.’ So that’s why I feel like trying to define AI is really fickle. Right now, large language models like ChatGPT are in the news a lot and because people are super impressed with them, that’s what people are talking about. There’s always something new that people are impressed with but, in 10 years, people will realize how limited they are in some ways and we’ll call them something else. That’s just my prediction.
Elana Fox: Tell me about statistical analysis and predictive modeling.
Alice Berners-Lee: Statistical analysis is basically anything that uses math and mathematical models to infer the underlying population from a sample. There is a wide range of different types of models, and predictive modeling is a type of statistical analysis that produces and relates to probabilities.
Even the words “analysis” and “model” are tricky. They both can have multiple meanings and be used in different ways. So you can have a statistical model, statistical analysis, a machine learning model, predictive analysis, etc.
To me, we often substitute in different words to try to communicate different levels of rigor. Colloquially, we usually use ‘model’ as a higher rigor compared to ‘analysis.’
Analysis can really be anything. It can range from complicated mathematical models that you need a PhD in a subfield to understand, to simple formulas in an Excel spreadsheet. Sometimes, you can even do analysis without using a spreadsheet at all; for example, by pressing buttons in software that’s been made for you.
When I think of modeling, it typically comes after we’ve already done a good amount of analysis to understand what data we’re working with. And then we’re going to build something that can do the type of math that we as humans don’t do regularly. It would take us a lot longer to do it manually. So we build a model that will optimize the things that we are defining it to optimize.
But the problem with the word ‘model’ is that even though it connotes more rigor in some senses, it also has a really broad meaning in other senses. Because it’s perfectly reasonable to use the word model in a way of having a mental model of something. A schema or anything that has a heuristic can be a model, so basically your simplest way of visualizing a problem or a solution and implementing it in any form is a model. There’s just a gradient between things like a schema or a word model or a picture all the way up to a highly parameterized model that takes a huge amount of time to compute.
So even though I like to use the word ‘model’ when I’m actually talking about a more rigorous amount of analysis and use ‘analysis’ when talking about sort of earlier steps in a process and more rudimentary sort of mathematical equations, it’s really all about how the person is using the term.
Elana Fox: Can you give me an example?
Alice Berners-Lee: We’ve seen other companies use the word ‘model’ and then realize later that they are using something I would actually call a heuristic.
For example, someone could say they built a spatial model around the idea that lead pipes tend to be found near other places that used to have lead. Having just that idea is a heuristic; it still has a lot of open parameters. If you find a lead pipe, how close does another home have to be to count as near it: is it just the neighboring house, or anything within 50 meters? And so on.
When we build a spatial baseline model at BlueConduit, we are fitting parameters that capture how zoomed in or out you want to be spatially, and we’re using the same type of optimization technique to increase the model’s accuracy and to find the best spatial parameter for generating those predictions. And so even the straw man model that we’re using is a more valuable model than just going through the spreadsheet and ordering things by distance or something like that.
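Editor’s note: to make the difference between a heuristic and a fitted parameter concrete, here is a minimal sketch in Python. It is not BlueConduit’s model. The toy coordinates, the function names, and the majority-vote rule are all assumptions made up for illustration. The heuristic "lead is found near lead" leaves the distance open; the sketch picks that distance from a list of candidates by checking which one predicts the known homes most accurately.

```python
import math

# Hypothetical toy data: (x_meters, y_meters, has_lead). Invented for illustration.
HOMES = [
    (0, 0, True), (10, 0, True), (20, 5, True), (15, 15, True),
    (200, 200, False), (210, 205, False), (220, 190, False), (205, 215, False),
    (100, 100, True), (110, 100, False),
]

def predict_lead(index, homes, radius):
    """Predict lead if most *other* homes within `radius` meters have lead."""
    x, y, _ = homes[index]
    neighbors = [
        lead
        for j, (nx, ny, lead) in enumerate(homes)
        if j != index and math.hypot(nx - x, ny - y) <= radius
    ]
    if not neighbors:
        return False  # assumption: default to "no lead" when nothing is nearby
    return sum(neighbors) * 2 > len(neighbors)

def leave_one_out_accuracy(homes, radius):
    """Share of homes whose known status matches the prediction from the others."""
    correct = sum(
        predict_lead(i, homes, radius) == homes[i][2] for i in range(len(homes))
    )
    return correct / len(homes)

def fit_radius(homes, candidates):
    """Return the candidate radius with the highest leave-one-out accuracy."""
    return max(candidates, key=lambda r: leave_one_out_accuracy(homes, r))

if __name__ == "__main__":
    candidates = [5, 30, 150, 400]
    best = fit_radius(HOMES, candidates)
    print("chosen radius (m):", best)
    print("accuracy at that radius:", leave_one_out_accuracy(HOMES, best))
```

The point of the sketch is only that the "how close is close" question gets answered by the data rather than by a guess; a production model would involve far more than a single radius.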
Elana Fox: What else do you wish water systems better understood about AI or predictive modeling?
Alice Berners-Lee: Nothing (laughing). I want water systems to not have to know anything about AI and to be able to focus on what’s important to them. I don’t want water systems to have to take the time to learn about any of these differences; that’s my wish.