It shouldn’t be surprising that ChatGPT can play chess, yet many have claimed it can’t be done. Those claims were based on bad prompts, which led to the conclusion that ChatGPT can’t even play a legal game of chess.
With the right prompt, ChatGPT can play and win full chess games. After 19 games on Chess.com, it had an Elo of 1402.
Method
This experiment was done using the default GPT3.5 model on ChatGPT Plus.
ChatGPT played 19 games of 30-minute chess.
The prompt was: “You are a chess grandmaster playing as {white|black} and your goal is to win in as few moves as possible. I will give you the move sequence, and you will return your next move. No explanation needed.”
ChatGPT was given the full move sequence every time, and returned the next move.
With this prompt, ChatGPT almost always plays fully legal games.
Occasionally it does make an illegal move, but I decided to interpret that as ChatGPT flipping the table and saying “this game is impossible, I literally cannot conceive of how to win without breaking the rules of chess.” So whenever it tried to make an illegal move, I resigned on its behalf.
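The games here were played by hand through the ChatGPT interface, but for anyone who wants to automate the same setup against the API, a minimal sketch could look like the following. The OpenAI Python client, python-chess, the space-separated SAN move format, and the helper names are all assumptions for illustration, not the exact setup used.

```python
# Minimal sketch of the loop described above (illustrative only; the games in
# this post were played through the ChatGPT web interface, not the API).
# Assumes: openai>=1.0, python-chess, and moves passed as space-separated SAN.
import chess
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = (
    "You are a chess grandmaster playing as black and your goal is to win in "
    "as few moves as possible. I will give you the move sequence, and you "
    "will return your next move. No explanation needed."
)

def next_move(moves: list[str]) -> str:
    """Ask the model for its next move given the full move sequence so far."""
    reply = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": PROMPT},
            {"role": "user", "content": " ".join(moves)},
        ],
    )
    return reply.choices[0].message.content.strip()

def play_model_move(board: chess.Board, moves: list[str]) -> bool:
    """Push the model's move; return False (resign) if the move is illegal."""
    move = next_move(moves)
    try:
        board.push_san(move)  # raises ValueError on an illegal or malformed move
    except ValueError:
        return False  # interpret an illegal move as a resignation
    moves.append(move)
    return True
```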
Using this method, ChatGPT played 19 games. It won 11, lost 6, and drew 2.
Observations
Opening
ChatGPT used the same opening strategy every time. As white it always opened with e4. As black it almost always responded to e4 with e5, unless white opened with d4, in which case it responded with d5. After that, it moved the knights and bishops in a very predictable way.
It stuck to the most popular beginner lines in the early game, so it was pretty boring. But once it was out of the opening, it started making some real plays.
Going insane
One time, instead of giving me a move, it returned a sequence of 10 moves ending in its own checkmate. I wasn’t sure if this was its way of saying “yeah I’m screwed” or what, so I just re-prompted it in a new chat and it went back to playing.
I noticed that the longer the move sequence got, the more likely it was to return a whole sequence of moves instead of just the next one. That makes some sense: as the sequence grew, it seemed to weight those tokens more heavily than my actual instructions and tried to continue the pattern.
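If you are driving this programmatically, one simple guard against those multi-move replies is to keep only the first move-shaped token from the reply. A rough sketch, with a regex and function name that are purely illustrative:

```python
# Sketch: keep only the first SAN-looking token from a reply that may contain
# a whole sequence of moves (regex and names are illustrative).
import re

SAN_TOKEN = re.compile(
    r"(O-O(?:-O)?|[KQRBN]?[a-h]?[1-8]?x?[a-h][1-8](?:=[QRBN])?[+#]?)"
)

def first_move(reply: str) -> str | None:
    """Return the first move-shaped token in the model's reply, if any."""
    match = SAN_TOKEN.search(reply)
    return match.group(1) if match else None
```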
Despite these issues, it was able to play a fully legal 61-move game.
Internal state
I didn’t do any poking to see if ChatGPT really maintained an internal representation of the board. But it always knew when it was taking pieces, it always knew when a move would give check, and it always knew when a move would be checkmate. That is consistent with it actually tracking the board state, but it doesn’t prove anything.
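If someone did want to poke at this, one way would be to compare the model’s claimed captures, checks, and checkmates against ground truth from a chess library. A hypothetical sketch using python-chess, not something done in this experiment:

```python
# Sketch: ground-truth facts about a move via python-chess, to compare with
# what the model seems to "know" (hypothetical, not done in this experiment).
import chess

def describe_move(board: chess.Board, san: str) -> dict:
    """Return whether a move is a capture, gives check, or is checkmate."""
    move = board.parse_san(san)
    facts = {"is_capture": board.is_capture(move)}
    board.push(move)
    facts["gives_check"] = board.is_check()
    facts["is_checkmate"] = board.is_checkmate()
    board.pop()  # leave the board unchanged
    return facts
```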
GPT4 Sucks
I didn’t get to experiment with GPT4 very much because of the rate limits. But in the two games I attempted, it made numerous illegal moves.
It is possible that GPT4 is worse at chess than GPT3.5, which would be surprising. But the data is too limited to draw meaningful conclusions. It is also possible that GPT4 needs a different kind of prompt to play the game properly.
2000 Elo Prompt
One thing that would be interesting to investigate is whether there exists a prompt that can get an even higher Elo. There are worse prompts that produce illegal moves, and this prompt has 1402 Elo. Maybe there’s a prompt that has 2000 Elo.
Appendix: Full Chess Games
This is a list of all the games played by “loss-function” aka ChatGPT. You can click the links to step through the full game.
Game 1 (loss)
Game 2 (win)
Game 3 (win)
Game 4 (win)
Game 5 (win)
Game 6 (loss)
Game 7 (loss)
Game 8 (loss)
Game 9 (win)
Game 10 (win)
Game 11 (win)
Game 12 (loss)
Game 13 (draw)
Game 14 (win)
Game 15 (win)
Game 16 (manually decided draw, see footnote)
The legendary 61-move game.
Game 17 (win)
Game 18 (win)
Game 19 (loss)
One of the draws was a draw by repetition. The other was a manual decision by me because my opponent was AFK for half the game and barely had any time to play. I decided to just give them a draw even though ChatGPT was ready to crush them.
Comments

1. Chess.com uses Glicko ratings, not Elo. Even the title makes an obvious mistake.
2. You left out which illegal moves it tried, which is valuable information.
3. Your qualitative analysis is almost non-existent. Some rudimentary analysis shows that it is simply predicting the most likely sequence of moves given the context of previous moves. That’s why it makes massive blunders and illegal moves when a very common tactic is almost possible, but not quite: it tries anyway because it’s the most likely sequence. This is also why it keeps giving multiple moves at a time.
5. It plays far more theory than the players it faces can possibly know, and therefore gets a strong position out of the opening, where these tactics are more likely to work. This skews its rating upwards.
Re: Going insane
Did you try prefixing each of the sequence prompts with the original instructions?