ChatGPT Fails The Coding Interview

It's good at coming up with plausible but buggy code.

Feb 22, 2023

ChatGPT performance on the most recent 100 LeetCode questions.

If you were planning to use ChatGPT to pass your coding interview, I have bad news for you. ChatGPT can only solve 28% of LeetCode questions. And that’s being generous.

Real interviews use medium and hard questions. So while ChatGPT can solve 77% of easy questions, it can only solve 16% of mediums, and 0% of hards.

The Experiment

This data comes from my own experiment, where I tested ChatGPT on the most recent 100 LeetCode questions.

ChatGPT performs well on old, popular interview questions, but that’s a biased sample.

The answers to popular interview questions were undoubtedly in the ChatGPT training data, so that performance doesn’t tell us how it would do in a real interview with new questions.

So I tested it on questions it definitely hasn’t seen before, and I gave it 3 tries at solving each problem.
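For readers who want to run a similar tally, here is a minimal sketch of how per-difficulty solve rates could be computed under a three-tries rule. The function names, the data layout, and the sample data are all assumptions made for illustration; they are not my actual scripts or results.

```python
from collections import defaultdict

MAX_TRIES = 3  # the experiment allowed three attempts per problem


def solved_within(attempts, max_tries=MAX_TRIES):
    """True if any of the first `max_tries` attempts was accepted."""
    return any(attempts[:max_tries])


def solve_rate_by_difficulty(results):
    """results: list of (difficulty, [accepted?, ...]) tuples.

    Returns {difficulty: fraction of problems solved}.
    """
    solved = defaultdict(int)
    total = defaultdict(int)
    for difficulty, attempts in results:
        total[difficulty] += 1
        if solved_within(attempts):
            solved[difficulty] += 1
    return {d: solved[d] / total[d] for d in total}


# Made-up sample data for illustration only, NOT the results from this post.
sample = [
    ("easy", [True]),
    ("easy", [False, True]),
    ("medium", [False, False, False]),
    ("medium", [False, False, True]),
    ("hard", [False, False, False]),
]
rates = solve_rate_by_difficulty(sample)
```

A problem counts as solved if any one of its attempts is accepted, so a fourth or later attempt is ignored even if it succeeds.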

Assisted Solutions

If we gave it a high-level plan of how to solve the problem, would that help?

Kind of.

Providing a high-level solution helped ChatGPT go from a wrong answer to the right answer 19% of the time. That’s better than nothing, but not as much as I hoped.

Lower The Bar

There are different degrees of solving a LeetCode problem. The above stats were for getting an accepted solution, which involves passing numerous test cases.

What if we lower the bar a bit?

If we use the extremely low bar of “can it pass more than one test case”, then it performs well.

If we lower the bar by a lot, ChatGPT isn’t so bad after all
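To make the two bars concrete, here is a rough sketch of how they differ. The helper names and the toy test cases are hypothetical, and LeetCode's real judge is of course more involved than this.

```python
def passed_count(solution, test_cases):
    """Count test cases where solution(*args) returns the expected value."""
    passed = 0
    for args, expected in test_cases:
        try:
            if solution(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crash counts as a failed test case
    return passed


def is_accepted(solution, test_cases):
    """High bar: every test case must pass."""
    return passed_count(solution, test_cases) == len(test_cases)


def clears_low_bar(solution, test_cases):
    """Low bar: passes more than one test case."""
    return passed_count(solution, test_cases) > 1


# Toy example: a plausible but buggy absolute-value function.
def buggy_abs(x):
    return x  # forgets to negate negative inputs


toy_tests = [((3,), 3), ((0,), 0), ((-2,), 2)]
```

Here `buggy_abs` passes two of the three toy cases, so it clears the low bar but would not be accepted.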

Plausible But Buggy

Overall, ChatGPT is good at coming up with plausible but buggy code. It often knows the general direction of the solution (e.g., that it’s a dynamic programming problem), but then implements it with bugs and fails test cases.

The most interesting fail was an easy question that it got wrong. It explained its thought process and plan for solving the problem, which was correct. Then it wrote the code, and managed to forget an edge case that it specifically mentioned.

ChatGPT coming up with the correct reasoning before writing buggy code

It figured out the general formula, then wrote this code that ignored the n=1 edge case.

The correct code just needed 2 more lines.
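As a purely hypothetical illustration of this kind of bug (this is a made-up stand-in, not the actual LeetCode problem and not ChatGPT's actual output), imagine a task whose answer is `n - 1` for every `n >= 2` but `1` when `n = 1`. The function names are invented for the sketch.

```python
def buggy_answer(n):
    # Applies the general formula everywhere, including where it breaks.
    return n - 1


def fixed_answer(n):
    # The two extra lines: handle the n = 1 edge case explicitly.
    if n == 1:
        return 1
    return n - 1
```

The two versions agree on every input except `n = 1`, which is exactly the kind of case a thorough test suite tends to include.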

Conclusion

ChatGPT isn’t ready to crack the coding interview just yet. But it is good at coming up with a high-level plan, and writing a buggy implementation.

If GPT-4 is able to write more accurate code, then it could be the thing that finally kills the algorithmic coding interview.

(All of the ChatGPT-generated solutions are available on GitHub if you’re curious.)
