Data: the backbone of AI
Artificial intelligence is only as good as the data behind it — and that’s a big problem. A recent survey shows that only about half of executives believe their data is ready to meet the demands of AI.
More than half of executives at companies adopting AI (54%) are worried about the reliability and quality of their data, according to the Dun & Bradstreet survey. The findings are based on an on-site survey of executives attending the AI Summit New York, held in December.
Related AI concerns all have a common thread — data. These include data security (46%), data privacy violations (43%), sensitive or proprietary information disclosure (42%) and data’s amplification of bias (26%).
Problems with data quality, timeliness, and consistency have been slowing technology progress for decades, from the emergence of business intelligence tools in the 1980s, through the data analytics revolution of the early 2000s, to today’s AI activity.
Observers across the industry agree that actionable data is still too scarce for the AI world. As a result, trust is lacking in today’s AI projects, said Kunju Kashalikar, senior director of product management with Pentaho. “Organizations don’t have enough visibility into their data – even with the basics of who owns it, its source, or who has modified it.”
Untrustworthiness in data “means possibly feeding proprietary or biased data into machine models, likely breaching IP and data protection rules,” said Kashalikar. “It also makes it difficult to establish accountability for regulatory compliance. Data must be catalogued at the source with easily understandable terminology so it can flow through various projects like AI with the ability to have streamlined discovery.”
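To make the idea of cataloguing data at the source more concrete, the following is a minimal sketch of a catalog record that tracks the basics Kashalikar mentions: who owns a dataset, where it came from, and who has modified it. The `CatalogEntry` class, its field names, and the sample values are all hypothetical and illustrative; they do not represent any particular product's schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class CatalogEntry:
    """Hypothetical catalog record for one dataset."""
    name: str
    owner: str          # team or person accountable for the data
    source: str         # system the data originated from
    description: str    # plain-language summary to aid discovery
    modifications: list = field(default_factory=list)

    def record_change(self, user: str, note: str) -> None:
        # Append a timestamped entry so changes stay attributable.
        self.modifications.append((datetime.now(timezone.utc), user, note))

    def last_modified_by(self):
        # Return the most recent editor, or None if never modified.
        return self.modifications[-1][1] if self.modifications else None


# Illustrative values only.
entry = CatalogEntry(
    name="customer_accounts",
    owner="finance-data-team",
    source="crm_export",
    description="One row per active customer account",
)
entry.record_change("jdoe", "normalized country codes")
```

With a record like this attached to each dataset, an AI project can check ownership and modification history before the data is used for training.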
There are security implications to AI running on untrustworthy data as well. “AI, from a security perspective, is founded on data trust,” said David Brauchler, technical director at NCC Group. “The quality, quantity, and nature of data are all paramount. For training purposes, data quality and quantity have a direct impact on the resultant model.”
AI-based applications “cannot be implemented securely without knowledge of proper access controls applied to the data in question,” he added. “What stands between an attacker and the CEO’s emails, for example, is the quality of the organization’s labeling and access control measures.”
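Brauchler's point about labeling and access control can be illustrated with a small sketch in which documents are filtered by sensitivity label before an AI application is allowed to read them on a user's behalf. The labels, roles, and the `readable_documents` function are assumptions made for illustration, not a description of any real system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    label: str   # hypothetical sensitivity label
    text: str


# Hypothetical mapping from role to the labels that role may read.
ROLE_CLEARANCE = {
    "employee": {"public", "internal"},
    "executive": {"public", "internal", "confidential"},
}


def readable_documents(role, documents):
    """Return only the documents the given role is cleared to read."""
    allowed = ROLE_CLEARANCE.get(role, set())  # unknown roles get nothing
    return [doc for doc in documents if doc.label in allowed]


docs = [
    Document("d1", "public", "Press release"),
    Document("d2", "internal", "Team roadmap"),
    Document("d3", "confidential", "CEO email thread"),
]
visible_to_employee = [d.doc_id for d in readable_documents("employee", docs)]
```

The filter is only as good as the labels: a confidential document mislabeled as internal would pass straight through, which is why label quality matters as much as the check itself.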
Overall, data integration is more important than ever, as data silos and data fragmentation stand as major roadblocks to generative AI. Few enterprises “have achieved meaningful, enterprise-wide impact from generative AI. Having a seamless and integrated data environment is crucial for achieving the full potential of AI,” said Mary Hamilton, managing director and global lead for Accenture’s Innovation Center Network.
To move forward with AI, it’s critical that data is well-prepared and integrated, Hamilton added. “This includes making all relevant data accessible to AI agents in real-time, including unstructured data, through APIs or microservices.”
That foundation also needs to effectively incorporate semi- and unstructured data, such as documents, images, and video. Add to this real-time data APIs, as “many emerging AI capabilities require nearly real-time access to data across functions and systems to feed the model daily,” said Alex Baldenko, head of data science at MassMutual.
Automation is another essential element for bringing massive amounts of data in line for AI applications. “Often we see companies struggling with AI still doing most of the heavy lifting through manual data management,” said Kashalikar. “This means engineers are rolling up their sleeves to build and maintain data pipelines, standardize and classify data and find and fix problems. This is time consuming, inefficient and unreliable because it introduces delays in supply that leave AI to work with inaccurate or inconsistent data – and produces results that cannot be trusted.”
It’s time to remove as many of these manual processes as possible, Kashalikar advocated. “Automation replaces manual procedures such as finding, onboarding, tracking and managing data with policies that are written as code and implemented as software-driven processes.”
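As a rough illustration of policies written as code, the sketch below expresses two data-management rules as plain functions and runs them automatically against dataset metadata. The rule names, metadata fields, and sample datasets are hypothetical; a real deployment would draw them from its own catalog and compliance requirements.

```python
def require_owner(dataset):
    # Policy: every dataset must have a registered owner.
    if not dataset.get("owner"):
        return "dataset has no registered owner"
    return None


def require_classified_pii(dataset):
    # Policy: datasets containing personal data must be marked restricted.
    if dataset.get("contains_pii") and dataset.get("classification") != "restricted":
        return "PII dataset must be classified as restricted"
    return None


# The policy set is ordinary code, so it can be versioned and reviewed.
POLICIES = [require_owner, require_classified_pii]


def evaluate(dataset):
    """Run every policy and collect the violation messages."""
    violations = []
    for policy in POLICIES:
        message = policy(dataset)
        if message is not None:
            violations.append(message)
    return violations


# Illustrative metadata only.
compliant = {"name": "web_logs", "owner": "platform-team",
             "contains_pii": False, "classification": "internal"}
non_compliant = {"name": "customer_emails", "owner": "",
                 "contains_pii": True, "classification": "internal"}
```

Calling `evaluate` during onboarding would let a pipeline reject or flag a dataset without anyone inspecting it by hand.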
An automated data management approach will help “catalog data, which assists with streamlined discovery,” said Kashalikar. “Also, deploy capabilities in order to observe data as it changes, which leads to accountability. With these systems in place, leaders can set and enforce data-level policy for a systematic approach to security and compliance.”

