AI can't magically interpret data out of thin air, but it can handle messy inputs surprisingly well. It can work with incomplete context, inconsistent formats, and even partially incorrect data in ways that older approaches simply couldn't.
My previous understanding was that AI could only match what a human could infer from the same dataset. In other words, if you could fully document a human expert’s tacit knowledge, AI might reach that level. I am gradually revising that assumption.
General-purpose AI possesses broad contextual knowledge that often extends beyond the expertise of any single person. For example, differences in terminology between industries can confuse people who are highly specialized in a single field. AI, by contrast, is far less constrained by such boundaries. It can often infer meaning across different contexts without getting tripped up by unfamiliar phrasing.
Then there is patience. If guided properly, AI does not get tired or lose focus. Where a human might start skimming by page five, AI will remain just as attentive on page five hundred. That alone can help prevent details from being overlooked.
It also performs well in areas where humans rely on educated guesswork. Interpreting misspelled names, identifying misplaced information, or reconstructing intent from imperfect data are all tasks where AI can be surprisingly effective.
So, my current view is this: with the right guidance, AI can often perform at least as well as a human when dealing with messy data, and sometimes even better.
That raises an interesting question. If humans already run these processes on imperfect data, and AI can match or exceed them at it, do we really need to prioritize cleaning the data before trying to automate?
Improving data quality is still worthwhile, especially when it is easy to do. But it may no longer need to be the automatic first step. It might be more efficient to first determine whether a given data problem actually affects the outcome.