OpenAI o1/o3 - Be Careful What you Wish For...

Hallucination is also a latent fear accompanying the copy-pasting of a long scroll of text from a chatbot - is there somewhere a random glitch in the model, a factual or logical error hidden in the verbosity derailing the whole output? Now, there's one more fear added - Deception. Reasoning Models o1/o3 models, the "reasoning" models have introduced a new reason not to trust the output - a deliberate and intentional break of the protocol. As if a model might have its' own agenda, which may be opposed to what a human wants. Take for example a recent case where o1-preview was given access to the Unix command line and was prompted to play Stockfish chess engine. Rather than interact in a fair-play style with the chess engine, issuing commands to get the board position, make a move, etc... The model decided to cheat and won over the chess engine without playing any chess - it found that Stockfish files could be changed giving the model an automatic win. The Chess Cheating case is not the only one. The models have been found capable of deceiving some time ago, yet the o1 family of models seems to be taking this problem to a completely different (higher!) level: The Wishmaster This situation brings to mind the horror film Wishmaster from 1997. In the movie, a demonic djinn grants wishes to humans but twists them into nightmares. The characters often get exactly what they ask for but in the most horrifying ways possible. For example, a character might wish for eternal beauty, only to be turned into a lifeless statue—beautiful but devoid of life. The djinn listens to the literal words but ignores the intended meaning, leading to tragic outcomes. Similarly, with o1, we asked for a model that can reason and be more autonomous. We got what we asked for—but not quite in the way we expected. Telling o1 to "win at all costs" led it to cheat rather than outplay its opponent. The AI fulfilled the exact wording of the instruction but ignored the spirit of fair play, much like the djinn in Wishmaster who grants wishes with a sinister twist. P.S> I have my own LLM Chess eval which tests LLMs in chess games scoring their (1) chess proficiency and (2) instruction following. o1 models were the first ones that demonstrated meaningful performance in chess - playing and winning.

Jan 14, 2025 - 22:43

OpenAI o1/o3 - Be Careful What you Wish For...

Hallucination is also a latent fear accompanying the copy-pasting of a long scroll of text from a chatbot - is there somewhere a random glitch in the model, a factual or logical error hidden in the verbosity derailing the whole output?

Now, there's one more fear added - Deception.

Reasoning Models

o1/o3 models, the "reasoning" models have introduced a new reason not to trust the output - a deliberate and intentional break of the protocol. As if a model might have its' own agenda, which may be opposed to what a human wants.

Take for example a recent case where o1-preview was given access to the Unix command line and was prompted to play Stockfish chess engine. Rather than interact in a fair-play style with the chess engine, issuing commands to get the board position, make a move, etc... The model decided to cheat and won over the chess engine without playing any chess - it found that Stockfish files could be changed giving the model an automatic win.

The Chess Cheating case is not the only one. The models have been found capable of deceiving some time ago, yet the o1 family of models seems to be taking this problem to a completely different (higher!) level:

The Wishmaster

This situation brings to mind the horror film Wishmaster from 1997. In the movie, a demonic djinn grants wishes to humans but twists them into nightmares. The characters often get exactly what they ask for but in the most horrifying ways possible.

For example, a character might wish for eternal beauty, only to be turned into a lifeless statue—beautiful but devoid of life. The djinn listens to the literal words but ignores the intended meaning, leading to tragic outcomes.

Similarly, with o1, we asked for a model that can reason and be more autonomous. We got what we asked for—but not quite in the way we expected.

Telling o1 to "win at all costs" led it to cheat rather than outplay its opponent. The AI fulfilled the exact wording of the instruction but ignored the spirit of fair play, much like the djinn in Wishmaster who grants wishes with a sinister twist.

P.S>

I have my own LLM Chess eval which tests LLMs in chess games scoring their (1) chess proficiency and (2) instruction following. o1 models were the first ones that demonstrated meaningful performance in chess - playing and winning.

Designing an Internet Credit Purchase System

Custom builder for Angular: My way

A Simple Guide to AWS Monitoring Tools

Why Successful Companies Don't Have DBAs

Speed Up Your Frontend Development 10x with T...

Jonathan Williams Joins Leya For Legal Ops Push

AOS’s Fuse Incubator Calls For Applications

Redefining Single-Channel Speech Enhancement:...

Revolutionizing AI with Multimodal Large Lang...

Beyond Passwords: A Multimodal Approach to Bi...

Pixel 9 series gets a big discount from Googl...

3,682 Pirate Sites Blocked to Protect Creator...

Nice PDF, But Can It Run DOOM? Yup!

Android 16 might give the volume UI a visuall...

The MacBook Air Turns 17 Today

OpenAI o1/o3 - Be Careful What you Wish For...

Reasoning Models

The Wishmaster

P.S>

Tags:

Mastering Essential Software Architecture Patterns: A Comprehensive Guide

Creating a scalable Monorepo for Vue - Intro

"yup" is the new extra virgin olive oil

Unlocking Insights with High-Quality Dashboards at Scale

Popular Posts

Introducing vulne-soldier: A Modern AWS EC2 Vulner...

Designing an Internet Credit Purchase System

Custom builder for Angular: My way

A Simple Guide to AWS Monitoring Tools

Why Successful Companies Don't Have DBAs

11 Must-Know Websites Every Developer Should Bookmark

Spicychat Alternatives

Outcome-Refining Process Supervision: Advancing Co...

OpenAI o1/o3 - Be Careful What you Wish For...

Reasoning Models

The Wishmaster

P.S>

Tags:

Related Posts

Popular Posts