OpenAI's latest reasoning models, o3 and o4-mini, hallucinate more frequently than their predecessors, according to the company's own internal tests. On the PersonQA benchmark, o3 hallucinated 33% of the time, while o4-mini reached 48%. The trend is concerning because hallucinations undermine the reliability of these models in business applications: despite gains on coding and math tasks, the higher hallucination rates limit their practical utility. OpenAI acknowledges that more research is needed to understand the cause, especially as the industry shifts its focus toward reasoning models in pursuit of better performance.