Generative AI Hallucinations Explained
Imagine a world where Microsoft has a problem with math. Its market-leading Excel spreadsheet application can’t do basic arithmetic, at least not reliably. When asked to sum a large column of numbers, the program will randomly – and inexplicably – produce erroneous results. Now imagine that all of Microsoft’s competitors suffer from the same problem. What would customers do?
- Would they try to determine which digital tool had the lowest error rate and adopt that program?
- Would they spot-check the digital results hoping to eliminate the most consequential errors?
- Finally, realizing the inherent inadequacies of the first two solutions, would they abandon their electronic aids in favor of Bob Cratchit-style manual calculations?
It sounds crazy – and it is – but that’s the dilemma facing “generative AI” users working with a technology that’s prone to occasional falsehoods or “hallucinations”.
What Is Generative AI?
A subset of artificial intelligence (or, perhaps more precisely, machine learning), generative AI (GAI) refers to programs that can generate unique business, literary, and artistic content, creating brand new digital images, video, audio, text, and computer code.
The technology, which burst onto the scene in November 2022 with the release of OpenAI’s ChatGPT, is wildly popular with consumers and businesses, especially as a vehicle for researching and writing reports and other documents.
What Are GAI Hallucinations?
One of the primary problems with ChatGPT (and other GAI “chatbots”) is their propensity to make things up – to fabricate or exaggerate – even in situations where the fabrication should be easy to spot. Unfortunately, in many, if not most, cases a GAI deception is not readily discernible, rendering a chatbot’s output unreliable, even harmful, depending on its context.
Even if not damaging, the effect can be embarrassing, as Google discovered when introducing its chatbot, Bard. When asked, “What new discoveries from the James Webb Space Telescope can I tell my 9-year-old about?”, Bard offered several correct replies.
In one case, however, the chatbot claimed that the JWST “took the very first pictures of a planet outside of our own solar system.” But, as several astronomers were quick to point out, the first image of an “exoplanet” was taken in 2004, 17 years before the JWST was launched.1
How Can Hallucinations Occur?
Part of the answer lies in the inflated rhetoric surrounding artificial intelligence. AI systems – generative or otherwise – are not really intelligent. They have no awareness (or consciousness) of their surroundings or the nature of their work. They are basically statistical engines, drawing programmed inferences from massive amounts of structured and unstructured data.
As reporter Cade Metz explains, “Chatbots like ChatGPT are driven by a technology called a large language model, or LLM, which learns its skills by analyzing enormous amounts of digital text, including books, Wikipedia articles, and online chat logs. By pinpointing patterns in all that data, an LLM learns to do one thing in particular: guess the next word in a sequence of words.
“Because the internet is filled with untruthful information, these systems repeat the same untruths. They also rely on probabilities: What is the mathematical chance that the next word is ‘playwright’? From time to time, they guess incorrectly.”2
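To make “guess the next word” concrete, here is a toy sketch (purely illustrative, not drawn from any production system) that builds a miniature next-word model from a single sentence and then samples the next word in proportion to how often it has followed the current one. Real LLMs learn these statistics from billions of words with neural networks, but the probabilistic guessing is the same in principle.

```python
import random

# Toy illustration of "guess the next word in a sequence": a miniature model
# that has only seen one sentence and records, for each word, how often each
# other word followed it.
corpus = "the telescope took the first image of the planet".split()

counts: dict[str, dict[str, int]] = {}
for current, nxt in zip(corpus, corpus[1:]):
    counts.setdefault(current, {})
    counts[current][nxt] = counts[current].get(nxt, 0) + 1

def next_word(word: str) -> str:
    """Sample the next word in proportion to how often it followed `word`."""
    options = counts.get(word, {"<end>": 1})
    words, weights = zip(*options.items())
    return random.choices(words, weights=weights)[0]

# "the" has been followed by "telescope", "first", and "planet", so the model
# picks one of them at random -- a probabilistic guess, not a statement of fact.
print(next_word("the"))
```

Because the choice is sampled from a probability distribution, the same question can yield different answers on different runs, which is exactly the behavior that produces plausible-sounding but wrong guesses.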
The Problem with Hallucinations
Hallucinations Are Surprisingly Common
The issue is not that GAI hallucinations happen at all; every software system experiences errors. The real dilemma for AI firms is the frequency and unpredictability with which hallucinations apparently occur.
As the New York Times recently reported, research from a new start-up called Vectara suggested that:
- “OpenAI’s technologies had the lowest rate, around 3 percent.
- “Systems from Meta, which owns Facebook and Instagram, hovered around 5 percent.
- “The Claude 2 system offered by Anthropic, an OpenAI rival …, topped 8 percent.
- “A Google system, Palm chat, had the highest rate at 27 percent.”3
While the hallucination phenomenon might be expected to discourage interest in generative AI, the technology’s overall promise appears to be outweighing concerns about accuracy.
A new report from Foundry Research found that:
- Nearly seven in ten public sector respondents (69 percent) have already adopted or intend to adopt externally created LLMs, and
- Nearly six in ten private sector respondents (57 percent) have done the same.4
How to Prevent Hallucinations
While hallucinations – just like cyber attacks – may not be totally preventable (at least given our current understanding of generative AI), enterprise users can reduce their exposure by:
- Selecting AI vendors with a low incidence of hallucinations, as reported by the trade press.
- Employing high-quality training data, as better input produces better output.
- Limiting the model’s possible outputs through careful prompting. Analyst Ben Lutkevich offers the following guidance (illustrated in the prompt-design sketch following this list):
  - “Use clear and specific prompts. Additional context can help guide the model toward the intended output. Some examples of this include:
    - “Limiting the possible outputs.
    - “Providing the model with relevant data sources.
    - “Giving the model a role to play. For example, ‘You are a writer for a technology website. Write an article about x, y, and z.’
  - “[Use] filtering and ranking strategies. LLMs often have parameters that users can tune. One example is the temperature parameter, which controls output randomness. When the temperature is set higher, the outputs created by the language model are more random.
  - “[Use] multi-shot prompting. Provide several examples of the desired output format or context to help the model recognize patterns.”5
- Relying on human oversight (see the review-gate sketch following this list). According to IBM, “Making sure a human being is validating and reviewing AI outputs is a final backstop measure to prevent hallucination. Involving human oversight ensures that, if the AI hallucinates, a human will be available to filter and correct it. A human reviewer can also offer subject matter expertise that enhances their ability to evaluate AI content for accuracy and relevance to the task.”6
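The prompt-design sketch below shows roughly how Lutkevich’s advice might look in practice. It is a minimal, hypothetical example: the call_llm function, its signature, and the message format are assumptions standing in for whichever chat-completion API an enterprise actually uses.

```python
# A minimal sketch of prompt-design practices (role, source data, a worked
# example, low temperature). `call_llm` is a hypothetical placeholder, not a
# real SDK call; substitute your vendor's actual chat-completion API.
def call_llm(messages: list[dict], temperature: float = 0.2) -> str:
    """Stub standing in for a call to a hosted large language model."""
    return "(model reply would appear here)"

# Relevant source data supplied directly in the prompt limits what the model
# can talk about.
source_notes = "Vendor X released version 2.1, which patches a privilege-escalation flaw."

messages = [
    # Giving the model a role to play, plus instructions that constrain output.
    {"role": "system",
     "content": "You are a writer for a technology website. "
                "Answer only from the provided source notes; "
                "if the notes do not cover a point, say so."},
    # Multi-shot prompting: one worked example of the desired output format.
    {"role": "user", "content": "Summarize: 'Vendor Y now encrypts backups by default.'"},
    {"role": "assistant", "content": "- Vendor Y enabled default backup encryption"},
    # The actual request: clear, specific, and tied to the supplied data.
    {"role": "user",
     "content": f"Source notes:\n{source_notes}\n\n"
                "Write a two-bullet summary of the notes above."},
]

# A low temperature reduces output randomness, one of the tunable parameters
# Lutkevich mentions.
print(call_llm(messages, temperature=0.2))
```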
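The review-gate sketch below is a bare-bones version of the human-oversight backstop IBM describes: nothing the model drafts is released until a person has checked it. The generate_draft, publish, and human_approves functions are hypothetical placeholders for an enterprise’s own tooling.

```python
# Minimal human-in-the-loop sketch: a draft is generated, routed to a
# reviewer, and published only if approved. All functions are illustrative
# stand-ins, not part of any real library.
def generate_draft(prompt: str) -> str:
    """Stub standing in for a generative AI call."""
    return "Draft text produced by the model."

def publish(text: str) -> None:
    print("PUBLISHED:", text)

def human_approves(draft: str) -> bool:
    """Route the draft to a subject-matter expert and record the verdict."""
    answer = input(f"Review this draft for accuracy:\n{draft}\nApprove? [y/N] ")
    return answer.strip().lower() == "y"

draft = generate_draft("Summarize this quarter's security incidents.")
if human_approves(draft):
    publish(draft)
else:
    print("Draft rejected; revise or escalate to a human writer.")
```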
Hallucinations and the Law
Analyst Mike Leone predicts that “national and global AI regulation will come fast” in 2024:
“As of the end of 2023, we already have the EU AI Act. We’re also seeing early stages of US government involvement with the Biden administration’s recent executive order. 2024 will be the year of establishing clear laws to govern AI, and I believe this will happen at both a national and global level. Enterprises will need to show a new level of agility and adaptability as they balance productivity and efficiency gains with compliance and security concerns.”7
Obviously, the publication of false or misleading information produced by a chatbot hallucination could expose enterprises to expensive litigation, even criminal sanctions, especially if enterprise efforts to detect and deter hallucinatory text are deemed insufficient.
Beyond the legal issues, enterprise executives must grapple with the ethics of employing software with known – and hard-to-correct – vulnerabilities.
Recommendations
Turn a Defect into a Virtue
While acknowledging that hallucination is, in most cases, “an unwanted outcome,” IBM contends that hallucination “presents a range of intriguing use cases that can help [enterprises] leverage [their] creative potential in positive ways. Examples include:
“Art and Design – AI hallucination offers a novel approach to artistic creation, providing artists, designers and other creatives a tool for generating visually stunning and imaginative imagery. With the hallucinatory capabilities of artificial intelligence, artists can produce surreal and dream-like images that can generate new art forms and styles.
“Data Visualization and Interpretation – AI hallucination can streamline data visualization by exposing new connections and offering alternative perspectives on complex information. This can be particularly valuable in fields like finance, where visualizing intricate market trends and financial data facilitates more nuanced decision-making and risk analysis.
“Gaming and Virtual Reality (VR) – AI hallucination also enhances immersive experiences in gaming and VR. Employing AI models to hallucinate and generate virtual environments can help game developers and VR designers imagine new worlds that take the user experience to the next level.”8
Use Generative AI to Create Outlines
While ChatGPT and other GAI programs may be unreliable writers, they have a real gift for creating comprehensive outlines. With access to huge amounts of source data, GAI programs can not only produce obvious responses to user questions or requests but also surface non-obvious ones.
For example, ask a cybersecurity analyst to identify methods for preventing or mitigating a cyber attack and he or she might come up with five responses. Ask a GAI program and the result might be ten or more responses, including items that did not occur to the analyst or were beyond the analyst’s present knowledge or understanding. While all the responses – both analyst and GAI – require verification, the combination of human and artificial intelligence expands the enterprise knowledge base.
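As a purely hypothetical illustration, the sketch below asks a model for a comprehensive outline and then routes every suggested item to a verification step. The call_llm helper and its canned reply are stand-ins for whatever chat-completion API is actually in use.

```python
# Hedged sketch of using a chatbot as an outline generator rather than a
# finished writer. `call_llm` is an illustrative placeholder, and its canned
# reply is fabricated for demonstration only.
def call_llm(prompt: str) -> str:
    """Stub standing in for a generative AI call."""
    return "1. Patch management\n2. Network segmentation\n3. Offline backups"

outline_prompt = (
    "List every method an enterprise could use to prevent or mitigate a "
    "ransomware attack. Return a numbered outline, one method per line, and "
    "include less obvious options, not just the standard controls."
)

# Each suggested item still needs human verification before it enters a report.
for item in call_llm(outline_prompt).splitlines():
    print("TO VERIFY:", item)
```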
Proofread All Generative AI-Produced Output
Until all GAI hallucinations can be prevented or automatically eliminated, all GAI-produced output must be proofread by human beings for accuracy and reliability. There is no other reasonable – or justifiable – course of action.
Web Links
International Organization for Standardization: https://www.iso.org/
OpenAI: https://www.openai.com/
US National Institute of Standards and Technology: https://www.nist.gov/
References
1 James Vincent. “Google’s AI Chatbot Bard Makes Factual Error in First Demo.” The Verge | Vox Media, LLC. February 8, 2023.
2 Cade Metz. “Chatbots May ‘Hallucinate’ More Often Than Many Realize.” The New York Times. November 16, 2023.
3 Ibid.
4 Shweta Sharma. “AI Enters Production Systems Even As ‘Trust’ Emerges As a Growing Concern.” CSOonline.com | Foundry, IDG, Inc. December 14, 2023.
5 Ben Lutkevich. “AI Hallucination.” TechTarget. June 2023.
6 “What Are AI Hallucinations?” IBM. 2023.
7 Mike Leone. “Six Generative AI Predictions for 2024.” TechTarget. December 13, 2023.
8 “What Are AI Hallucinations?” IBM. 2023.