Why do “data” and “AI” always go hand in hand?
Data and AI. AI and data.
You almost always hear the two terms spoken in the same breath. Why is that?
If you’re a founder trying to better understand these topics, whether to improve your workflows, your products, or some aspect of your operations, here’s a business owner’s primer on what people mean when they insist on saying the two together.
AI needs data to do anything.
At its core, AI is an algorithm, which in plain language is a process that takes inputs and produces outputs. Just like your car, which is just a piece of metal sitting in the garage until it has fuel to run on, an algorithm with no data to process cannot do anything useful. In fact, it can’t do anything at all.
This means that if you want your business to benefit from AI, the first task is to gather and shape your data. This can be a real stumbling block, according to Phong Nguyen, founder of data science consultancy Partners in Company. “According to clients we’ve worked with and spoken to, the barriers to greater data orientation are usually the basics of having clean, consistent data and having it centralized and secure,” she says.
This usually means either pulling your data out of spreadsheets or bringing it together from multiple platforms, like a customer relationship management (CRM) platform and a marketing platform, into a centralized repository where it can be combined and compared for analysis. Typically, it will still need to be cleaned up and normalized in various ways to make sure it’s consistent and in the right form before data teams can draw correct conclusions and then build on the data with AI.
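To make the centralizing step concrete, here is a minimal Python sketch. It assumes two hypothetical exports, one from a CRM and one from a marketing platform, that share an email column. The field names and sample rows are invented for illustration, and real platforms will look different.

```python
def normalize_email(raw):
    """Trim whitespace and lowercase so the same address always matches."""
    return raw.strip().lower()


def centralize(crm_rows, marketing_rows):
    """Merge rows from both sources into one record per customer email."""
    combined = {}
    for source_rows in (crm_rows, marketing_rows):
        for row in source_rows:
            key = normalize_email(row["email"])
            record = combined.setdefault(key, {"email": key})
            for field, value in row.items():
                # Skip empty values so a blank in one source never
                # overwrites real data from the other.
                if field != "email" and value not in (None, ""):
                    record[field] = value
    return combined


# Hypothetical sample exports (assumed field names).
crm_rows = [{"email": " Ana@Example.com ", "name": "Ana", "plan": "pro"}]
marketing_rows = [
    {"email": "ana@example.com", "last_campaign": "spring_sale", "name": ""}
]

customers = centralize(crm_rows, marketing_rows)
print(customers["ana@example.com"])
```

The two rows end up as a single customer record because the email is cleaned before it is used as the key. That is the kind of consistency work the paragraph above describes.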
Also, most AI systems need large amounts of data to produce reliable results, for the same reason you need a large sample of anything to make a reasonable judgment. We’re all familiar with political polls, where professionals typically claim greater than 95% accuracy on how the wider population plans to vote in an election by sampling around 300 people.
This is for a simple choice between two options. If you’re trying to create more complex predictions, such as differentiating between types of customer behavior in your marketing data, you’ll want to start with several thousand samples. Often you will use far more to have high confidence in your results.
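For a rough feel of how sample size affects reliability, the sketch below computes the textbook margin of error for a yes/no question. It assumes a simple random sample, the normal approximation, and a 95% confidence level (z = 1.96). Reading the “95%” in the polling example as a confidence level is an assumption here, not something the pollsters’ methods confirm.

```python
import math


def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Approximate margin of error for a two-option question.

    proportion=0.5 is the worst case; z=1.96 corresponds to 95% confidence.
    """
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


for n in (300, 1000, 5000):
    print(f"{n} respondents: +/- {100 * margin_of_error(n):.1f} points")
```

Under these assumptions, 300 respondents give roughly plus or minus 5.7 percentage points, 1,000 give about 3.1, and 5,000 give about 1.4. More data narrows the uncertainty, but with diminishing returns.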
How much data are we talking about? Proper statistical analysis can give you an accurate number for what you’re trying to do, but as a rule of thumb, hundreds of thousands of rows is usually the low end for machine-learning-based analyses. “I’m not used to working with anything less than a million rows,” says Chantel Perry, veteran data scientist at large corporations and author of the book Data Newbie to Guru.
And for something like marketing analytics, where the customer trends you’re trying to understand might vary from day to day and month to month, you also want to gather data over a long enough period to make useful predictions: “You want to be in business for at least six months and collect data on your customers for at least six months,” says Perry.
So now you understand why AI needs data. This dependence also goes the other way. The truth is, you can’t have one without the other.
A lot of data comes out of AI
Just as AI algorithms need data as input, their output is often a form of data.
Let’s say your marketing data is analyzed and you discover that you have eight major customer groups. You may also find that different groups should receive different types of pitches or advertisements. Those outputs are data that you can feed into another algorithm, one that uses the group labels to predict which cluster a future customer will belong to, followed by an automated process that assigns that customer the placements or ads likely to be most effective.
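Here is one way that hand-off could look, as a minimal Python sketch. It assumes an earlier analysis has already labeled existing customers with a group, and it uses a simple nearest-centroid rule as a stand-in for whatever predictive model a data team would actually choose. The two groups (instead of eight), the features, and the ad names are all hypothetical.

```python
import math


def centroids_by_group(labeled_customers):
    """Average the feature vectors of each labeled group."""
    sums, counts = {}, {}
    for features, group in labeled_customers:
        totals = sums.setdefault(group, [0.0] * len(features))
        for i, value in enumerate(features):
            totals[i] += value
        counts[group] = counts.get(group, 0) + 1
    return {g: [t / counts[g] for t in totals] for g, totals in sums.items()}


def predict_group(features, centroids):
    """Pick the group whose average customer is closest to this one."""
    return min(centroids, key=lambda g: math.dist(features, centroids[g]))


# Output of a hypothetical earlier analysis:
# (monthly visits, average order value) -> group label.
# A real pipeline would usually rescale features before measuring distance.
labeled_customers = [
    ((2, 20.0), "bargain_hunters"),
    ((3, 25.0), "bargain_hunters"),
    ((12, 90.0), "loyal_regulars"),
    ((10, 110.0), "loyal_regulars"),
]
ads_by_group = {
    "bargain_hunters": "discount_banner",
    "loyal_regulars": "new_arrivals_email",
}

centroids = centroids_by_group(labeled_customers)
new_customer = (11, 95.0)
group = predict_group(new_customer, centroids)
print(group, ads_by_group[group])
```

The labels produced by the first analysis are the training data for the second step, which is the point of the paragraph above: one algorithm’s output becomes another’s input.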
When you think about it, all data exists as a result of an algorithm-like process, often one that involves AI. Sometimes AI powers this data collection process, sometimes it doesn’t, and sometimes the distinction isn’t so clear. Take, for example, data on average incomes and spending habits in a geographic area you are targeting: it may come from a combination of surveys, government data, and data processed by credit card companies and merchants, which is then aggregated into a single number for a single census block. Your marketing algorithms could then use that number to help you target different customers in different ways.
There’s a common saying that I often refer to when talking about data science: “No one believes a model except the person who wrote it, and everyone believes a data set except the person who assembled it.” Noodle on that for a minute.
We tend to believe that data is necessarily true and does not depend on any human or AI process to be what it is. But this is often wrong. If you want to achieve meaningful results, you need to look at the data that feeds your models, as well as the models that produced that data.
“The biggest thing I’m struggling with is data quality,” Perry says. “Everything that goes into the decision-making process needs to be checked for cleanliness, bias, and other issues, especially with machine learning models.”
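As a rough illustration of what such checks can look like, the Python sketch below counts missing values and exact duplicate rows, and shows how the rows are split across one field as a first hint of skew. The field names and sample rows are hypothetical, and real cleanliness and bias checks go well beyond this.

```python
from collections import Counter


def quality_report(rows, required_fields):
    """Count missing required values and exact duplicate rows."""
    missing = {field: 0 for field in required_fields}
    seen = set()
    duplicates = 0
    for row in rows:
        for field in required_fields:
            if row.get(field) in (None, ""):
                missing[field] += 1
        # Two rows with identical fields and values count as duplicates.
        fingerprint = tuple(sorted(row.items()))
        if fingerprint in seen:
            duplicates += 1
        seen.add(fingerprint)
    return {"rows": len(rows), "missing": missing, "duplicates": duplicates}


def segment_shares(rows, field):
    """Share of rows per value of a field, a crude check for lopsided data."""
    counts = Counter(row.get(field) for row in rows)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


# Hypothetical sample data (assumed field names).
rows = [
    {"email": "ana@example.com", "region": "north", "spend": 120},
    {"email": "ana@example.com", "region": "north", "spend": 120},
    {"email": "", "region": "north", "spend": 80},
    {"email": "bo@example.com", "region": "south", "spend": None},
]

report = quality_report(rows, ["email", "spend"])
shares = segment_shares(rows, "region")
print(report)
print(shares)
```

In this made-up sample, one row is an exact duplicate, two required values are missing, and three quarters of the rows come from a single region. Each of these would be worth questioning before the data feeds a model.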
Understanding this back-and-forth between data and AI and their feedback loop will help you avoid relying on analytics that aren’t as good as they might first appear.