ChatGPT Fails The Coding Interview
It's good at coming up with plausible but buggy code.
If you were planning to use ChatGPT to pass your coding interview, I have bad news for you. ChatGPT can only solve 28% of LeetCode questions. And that’s being generous.
Real interviews use medium and hard questions. So while ChatGPT can solve 77% of easy questions, it can only solve 16% of mediums, and 0% of hards.
This data comes from my own experiment, where I tested ChatGPT on the most recent 100 LeetCode questions.
ChatGPT performs well on old, popular interview questions, but that’s a biased sample.
The answers to popular interview questions were undoubtedly in the ChatGPT training data. It doesn’t tell us how it would perform in a real interview with new questions.
So I tested it on questions it definitely hasn’t seen before, and I gave it 3 tries at solving each problem.
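The scoring rule is simple: a problem counts as solved if any of its first 3 attempts is accepted. Below is a minimal sketch of how such a tally could be computed per difficulty. The `Submission` record, `solve_rates` function and `MAX_TRIES` constant are hypothetical names for illustration, not the actual harness used in this experiment.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical record of one ChatGPT submission to one problem.
@dataclass
class Submission:
    problem_id: int
    difficulty: str   # "Easy", "Medium" or "Hard"
    accepted: bool    # True if the judge accepted this submission

MAX_TRIES = 3  # each problem gets at most 3 attempts

def solve_rates(submissions: list[Submission]) -> dict[str, float]:
    """Fraction of problems per difficulty solved within MAX_TRIES attempts."""
    attempts = defaultdict(list)  # problem_id -> accepted flags, in submission order
    difficulty = {}
    for s in submissions:
        attempts[s.problem_id].append(s.accepted)
        difficulty[s.problem_id] = s.difficulty

    solved = defaultdict(int)
    total = defaultdict(int)
    for pid, flags in attempts.items():
        level = difficulty[pid]
        total[level] += 1
        # Only the first MAX_TRIES attempts count toward "solved".
        if any(flags[:MAX_TRIES]):
            solved[level] += 1

    rates = {level: solved[level] / total[level] for level in total}
    n = sum(total.values())
    rates["Overall"] = sum(solved.values()) / n if n else 0.0
    return rates
```

Counting per problem rather than per submission keeps a problem that needed three tries from weighing more than one solved on the first try.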
If we gave it a high-level plan of how to solve the problem, would that help?
Providing a high-level solution helped ChatGPT go from a wrong answer to the right answer 19% of the time. That’s better than nothing, but not as much as I had hoped.
Lower The Bar
There are different degrees of solving a LeetCode problem. The above stats were for getting an accepted solution, which involves passing numerous test cases.
What if we lower the bar a bit?
If we use the extremely low bar of “can it pass more than one test case”, then it performs well.
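The two bars differ only in how many test cases must pass. Here is a minimal sketch that grades one submission's per-test pass/fail results against both bars and computes the fraction of submissions meeting each. The function names `meets_bars` and `bar_rates` are hypothetical, chosen for illustration.

```python
def meets_bars(test_results: list[bool]) -> dict[str, bool]:
    """Grade one submission, given pass/fail for each test case."""
    passed = sum(test_results)  # True counts as 1
    return {
        # Accepted: every test case passes (and there is at least one).
        "accepted": bool(test_results) and passed == len(test_results),
        # Extremely low bar: more than one test case passes.
        "more_than_one": passed > 1,
    }

def bar_rates(all_results: list[list[bool]]) -> dict[str, float]:
    """Fraction of submissions that clear each bar."""
    graded = [meets_bars(r) for r in all_results]
    n = len(graded)
    return {bar: sum(g[bar] for g in graded) / n
            for bar in ("accepted", "more_than_one")}
```

A submission that passes most tests but fails a single edge case clears the low bar while still failing the accepted bar.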
Plausible But Buggy
Overall, ChatGPT is good at coming up with plausible but buggy code. It often knows the general direction for the solution (e.g., it’s a dynamic programming problem), but then it implements it with bugs and fails test cases.
The most interesting fail was an easy question that it got wrong. It explained its thought process and plan for solving the problem, which was correct. Then it wrote the code, and managed to forget an edge case that it specifically mentioned.
It figured out the general formula, then wrote code that ignored the n=1 edge case.
The correct code just needed 2 more lines.
ChatGPT isn’t ready to crack the coding interview just yet. But it is good at coming up with a high level plan, and writing a buggy implementation.
If GPT-4 is able to write more accurate code, then it could be the thing that finally kills the algorithmic coding interview.
(All of the ChatGPT-generated solutions are available on GitHub if you’re curious.)