By Insight Editor / 25 Oct 2024 / Topics: Artificial Intelligence (AI), Digital transformation, Analytics, IT modernization

As more businesses embrace the transformative power of generative AI, they’re coming to appreciate the importance of data integrity to producing quality results. Data and AI go hand in hand, and it’s never been more important to develop best practices around the data we’re feeding to LLMs. In this post, we’re taking a close look at the kinds of considerations you’ll need to make when using data for generative AI.
New Gartner research suggests that at least 30% of generative AI projects will be abandoned after proof of concept by the end of 2025. Early adopters often struggle with escalating costs, and deployment costs can range from $5M to $20M.
With so much on the line, ensuring that AI and GenAI projects succeed is imperative for organizations that want to drive efficiency and generate new revenue streams. The challenges facing any business that hopes to take advantage of the promise of GenAI include collecting, storing, organizing, and managing the data that informs today’s models.
One cautionary example of how data can negatively impact the outcomes of AI projects is the case of IBM Watson and the MD Anderson Cancer Center.
In 2013, MD Anderson Cancer Center partnered with IBM Watson to use AI to help oncologists diagnose cancer and recommend treatments for patients. By 2017, after several years and over $60M invested, the project was halted due to data issues. Because medical records were stored in unstructured formats across multiple systems and locations, it was difficult to unify the data in a way that Watson could use effectively. As a result, Watson struggled to produce accurate results, and the project was ultimately considered too costly.
This is just one example of how the best-laid plans can go awry when the integrity of data isn’t attended to from the beginning of a major initiative. It’s one thing to ask whether an organization is ready to take on an AI project. A better question is whether the data itself is ready.
According to a new RAND Corporation report, an estimated 80% of AI projects fail to achieve their intended goals, with data quality emerging as the second most pressing hurdle.
Data quality and data infrastructure are two major reasons why AI and GenAI projects come up short. Because GenAI models require large volumes of high-quality data for training, inaccurate or incomplete data degrades their results. With LLMs, many users have seen hallucinations: erroneous answers to questions that could easily be answered from common knowledge. Let’s look at two examples of what can go wrong when AI output is unreliable.
In May 2023, a lawyer used ChatGPT to help with legal research and ended up submitting a brief with six completely made-up cases. The judge was not impressed, to say the least, and the lawyer faced potential sanctions. This incident highlighted the dangers of relying on AI without double-checking its output, especially in high-stakes situations.
In another case, ChatGPT falsely claimed that an Australian mayor had been imprisoned for bribery. This incident demonstrated how AI hallucinations can damage reputations and spread false information about individuals, leading to troubling real-world consequences.
Hallucinations are just one example of the inconsistent or inaccurate results that can follow when an AI model is fed inaccurate data.
AI may be evolving at breakneck speed, but many of the core rules around data that professionals have relied upon for decades have remained the same. Ten years ago, big data technology was rapidly transforming how organizations understood and used data.
The 3 V’s (volume, velocity, and variety) were one of the primary frameworks defining the properties of big data. These rules still apply today, especially when it comes to data for generative AI. Each of the 3 V’s can adversely impact today’s AI and GenAI models if not managed correctly.
Google played a major role in defining the modern space of data management, data operations, analytics, and scalable storage solutions. Its solutions are built to support AI and GenAI applications. When it comes to managing the 3 V’s of data, Google Cloud brings a set of solutions designed to handle complex data architectures and to integrate with AI platforms and models such as Vertex AI and Gemini.
The path to AI-driven transformation begins with preparing your data. Insight’s expertise includes a track record of identifying, cataloging, and examining the data sources organizations need to achieve desired business outcomes. What steps does Insight take to make your data AI-ready?
Oftentimes, organizations need a clarified data strategy to avoid data sprawl and disorganization. Insight collaborates with your team to locate and organize your data, ensuring that everything is accounted for.
Sometimes data needs to be relocated. For example, you might have legacy data sitting in outdated systems like Teradata or Hadoop, or scattered across aging databases in your data center. Insight frequently migrates such data to the cloud for optimal processing. If your data resides in another cloud platform, you’ll have several options, from leaving it in place to summarizing, hashing, or moving it to Google Cloud.
Insight combines your domain expertise with AI and data expertise to resolve issues at the data’s source. Your dedicated Insight team ensures that the corrected data flows accurately to AI training systems. If data is missing, your Insight team will help you fill in those gaps. In other instances, corrections call for methods such as third-party APIs for address verification, deduplication, cleaning up formatting issues, or fixing erroneous records. These processes are automated whenever possible.
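To make the cleanup steps above more concrete, here is a minimal Python sketch of automated formatting cleanup, deduplication, and gap detection. It is an illustration only, not Insight’s actual tooling: the field names (name, email, phone), the function names, and the choice of email as the duplicate key are all assumptions made for the example.

```python
import re


def normalize_record(record):
    """Return a cleaned copy: whitespace collapsed, email lowercased, phone digits only."""
    # Collapse runs of whitespace and trim both ends of every field.
    cleaned = {key: " ".join(str(value).split()) for key, value in record.items()}
    # The "email" and "phone" field names are hypothetical.
    cleaned["email"] = cleaned.get("email", "").lower()
    cleaned["phone"] = re.sub(r"\D", "", cleaned.get("phone", ""))
    return cleaned


def deduplicate(records, key_fields=("email",)):
    """Keep the first record seen for each key; records with an empty key are kept as-is."""
    seen = set()
    unique = []
    for record in records:
        cleaned = normalize_record(record)
        key = tuple(cleaned.get(field, "") for field in key_fields)
        if all(key):  # only compare records whose key fields are all present
            if key in seen:
                continue
            seen.add(key)
        unique.append(cleaned)
    return unique


def find_incomplete(records, required=("name", "email")):
    """Return records missing any required field, so the gaps can be filled in."""
    return [r for r in records if any(not r.get(field) for field in required)]


if __name__ == "__main__":
    raw = [
        {"name": "  Ada   Lovelace ", "email": "Ada@Example.com ", "phone": "(555) 010-0001"},
        {"name": "Ada Lovelace", "email": "ada@example.com", "phone": "555-010-0001"},
        {"name": "Alan Turing", "email": "", "phone": "555 010 0002"},
    ]
    clean = deduplicate(raw)
    print(len(clean), "unique records")
    print(len(find_incomplete(clean)), "records with missing fields")
```

Normalizing before comparing matters here: the two Ada Lovelace rows only match once case and spacing differences are removed. A real pipeline would likely use fuzzier matching and domain-specific rules than this exact-key comparison.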
For more insights into how to harness your operational databases for enterprise GenAI apps, we recommend the new guide from Google Cloud, Accelerating generative AI-driven transformation with databases.
When you’re ready to talk to an Insight AI and data expert, sign up for Insight’s GenAI Journey Accelerator program to get your customized blueprint for GenAI success.