AI can't magically interpret data out of thin air, but it can handle messy inputs surprisingly well. It can work with incomplete context, inconsistent formats, and even partially incorrect data in ways that older approaches simply couldn't.
My previous understanding was that AI could only match what a human could infer from the same dataset. In other words, if you could fully document a human expert’s tacit knowledge, AI might reach that level. I am gradually revising that assumption.
General-purpose AI possesses broad contextual knowledge that often extends beyond the expertise of any single person. For example, differences in terminology between industries can confuse people who are highly specialized in a single field. AI, by contrast, is far less constrained by such boundaries. It can often infer meaning across different contexts without getting tripped up by unfamiliar phrasing.
Then there is patience. If guided properly, AI does not get tired or lose focus. Where a human might start skimming by page five, AI will remain just as attentive on page five hundred. That alone can help prevent details from being overlooked.
It also performs well in areas where humans rely on educated guesswork. Interpreting misspelled names, identifying misplaced information, or reconstructing intent from imperfect data are all tasks where AI can be surprisingly effective.
So, my current view is this: with the right guidance, AI can often perform at least as well as a human when dealing with messy data, and sometimes even better.
That raises an interesting question. If humans already run these processes on imperfect data, and AI can match or exceed them at it, do we really need to prioritize cleaning the data before trying to automate?
Improving data quality is still worthwhile, especially when it is easy to do. But it may no longer need to be the automatic first step. It might be more efficient to first determine whether a given data problem actually affects the outcome.