
An AI practitioner is using a large language model (LLM) to create content for marketing campaigns. The generated content sounds plausible and factual but is incorrect. Which problem is the LLM having?
A. Data leakage
B. Hallucination
C. Overfitting
D. Underfitting
Explanation:
Hallucination (option B) is the correct answer: it is a common problem with large language models in which they generate content that sounds plausible and factual but is actually incorrect or fabricated.
Let's break down each option:
A. Data leakage - This occurs when sensitive information from the training data unintentionally appears in the model's outputs. While that is a real concern, it does not describe content that sounds plausible and factual yet is incorrect.
B. Hallucination - This is exactly what the question describes. LLMs can generate text that appears factual and coherent but contains false information, fabrications, or inaccuracies. This happens because LLMs are trained to predict the next most likely token from patterns in their training data, not to verify factual accuracy (see the generation sketch after this list).
C. Overfitting - This occurs when a model learns the training data too well, including its noise and random fluctuations, so it performs poorly on new, unseen data. It is a problem of generalization rather than of generating incorrect but plausible content; the second sketch after this list contrasts it with underfitting.
D. Underfitting - This happens when a model is too simple to capture the underlying patterns in the data, resulting in poor performance on both training and test data. This doesn't describe the specific issue of generating plausible but incorrect content.
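To see why hallucination happens, here is a minimal sketch of plain next-token generation. It assumes the Hugging Face transformers library and the small public gpt2 checkpoint, neither of which appears in the question; a real marketing workload would use a larger model, but the mechanism is the same.

```python
# Minimal sketch: LLM text generation is next-token prediction only.
# Assumes the Hugging Face `transformers` library (with a backend such as
# PyTorch installed) and the public gpt2 checkpoint.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = "Our new smartwatch was recently awarded"
# The model continues the prompt with plausible-sounding text, but no step in
# the decoding loop checks whether the award, statistic, or claim exists.
result = generator(prompt, max_new_tokens=40, num_return_sequences=1)
print(result[0]["generated_text"])
```

Because the continuation is chosen by likelihood alone, fluent but invented details are a natural failure mode, which is exactly the behavior the question describes.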
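To contrast options C and D with the scenario in the question, the sketch below shows how overfitting and underfitting show up as gaps in training versus test accuracy, not as confidently wrong generated text. It assumes scikit-learn and a synthetic dataset; the tree depths and dataset sizes are arbitrary choices for illustration.

```python
# Minimal sketch contrasting overfitting and underfitting, assuming scikit-learn.
# A depth-1 tree is too simple to fit either split well (underfitting); an
# unrestricted tree memorizes the training set and degrades on new data (overfitting).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for name, depth in [("underfit (too simple)", 1), ("overfit (too complex)", None)]:
    model = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
    print(name,
          "| train acc:", round(model.score(X_train, y_train), 2),
          "| test acc:", round(model.score(X_test, y_test), 2))
```

An overfit model looks great on data it has memorized and worse on unseen data; an underfit model does poorly on both. Neither behavior matches generating fluent but false marketing copy, which is why hallucination is the better answer.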
Key takeaway: Hallucination is a well-known limitation of LLMs where they confidently present false information as fact, which is particularly problematic for applications like marketing content generation where accuracy matters.