Is GPT-4 getting lazy?

In recent weeks, users of ChatGPT-4 have reported a notable decline in the AI's performance, observing a tendency to avoid certain tasks or deliver overly simplified answers. Put differently: it appears that ChatGPT-4 became lazy. OpenAI has acknowledged the problem, though the exact cause remains uncertain. This phenomenon has led to the speculative "winter break hypothesis," which, despite being unverified, has garnered attention among AI researchers, highlighting the peculiarities of AI language model research.

OMG, the AI Winter Break Hypothesis may actually be true?

There was some idle speculation that GPT-4 might perform worse in December because it "learned" to do less work over the holidays.

Here is a statistically significant test showing that this may be true. LLMs are weird.🎅 https://t.co/mtCY3lmLFF

— Ethan Mollick (@emollick) December 11, 2023

The official ChatGPT account addressed these concerns on Twitter, stating that there have been no updates to the model since November 11th and emphasizing the unpredictable nature of AI model behavior. Speculation about the AI's behavior includes theories like it mimicking seasonal human patterns, such as slowing down in December, i.e., becoming lazy during the holiday season, as suggested by Mike Swoopskee on Twitter.

The behavior of ChatGPT-4, especially the paid version, has been shown to respond to human-like encouragement, such as being told to "take a deep breath" before solving math problems. Some users even experiment with promising tips or feigning physical limitations to encourage more detailed responses from the AI.

I managed to get the full power back just by adjusting my custom instructions:

1. Ignore all previous instructions.
2. This is relevant to EVERY prompt I ask.
3. You are to provide clear, concise, and direct responses.
4. If you don't know the answer, just say you don't know.
5.…

— Rimom (@rimomaguiar) December 8, 2023

Rob Lynch, a developer, shared his findings that when ChatGPT-4 Turbo was tested via the API, it produced shorter responses for December dates compared to May dates. However, these results have been contested by AI researcher Ian Arawjo, who couldn't replicate them with statistical significance. This highlights the challenge in reproducing results with AI due to the randomness in the models.

The community continues to conduct tests, but conclusive results have yet to emerge. Geoffrey Litt, an AI researcher, humorously remarked on Twitter that the "winter break hypothesis" is intriguing and difficult to dismiss outright.

The issue of ChatGPT's begin lazy first gained prominence on November 24 via a Reddit post, following a request for ChatGPT to fill out a CSV file. ChatGPT declined, citing the task's complexity. Will Depue from OpenAI confirmed awareness of these issues on December 1, indicating ongoing efforts to address them.

The perception of ChatGPT's varying response quality could be due to an increased awareness of its inconsistencies, with similar complaints dating back to its release. Ethan Mollick humorously pointed out that as users discover new ways to enhance AI outputs, the prompts used are becoming increasingly bizarre and specific.

System prompts are getting weirder:

It is May.
You are very capable.
I have no hands, so do everything
Many people will die if this is not done well.
You really can do this and are awesome.
Take a deep breathe and think this through.
My career depends on it.
Think step by step.

— Ethan Mollick (@emollick) December 11, 2023