29 Comments
Feb 14, 2023 · Liked by Dmitri Brereton

Hello from Mexico City! I have some additional color on the bizarre nightlife recs.

Cecconi's bar is at a mall. I'd never heard of it before. It's out of the way for most first-time visitors, who would not stay around this area.

Primer Nivel now seems to be called Señor Grill and is across the street from El Almacén. It's a grill and live music venue. Based on what it looks like from the outside, I'd be hesitant to go there for security reasons, and I definitely would not send any foreign tourists there! The recommendation must be based on its one TripAdvisor review from 2014.

El Marra is the nickname of the long-standing Marrakech Salon. It's also a gay bar!! It's a lot of fun, but it often has naked male strippers, which would be good to know.

Guadalajara de Noche, as you say, is accurate, but is overshadowed by its larger, more storied, way more popular neighbor Salón Tenampa.

This was fun. Thanks for your article.

Expand full comment
author
Feb 14, 2023 · edited Feb 14, 2023

Thanks for adding more details on these! It was pretty hard to find any detailed information on these nightclubs online.

Expand full comment
Feb 14, 2023 · Liked by Dmitri Brereton

The first one is debatable, as there is actually a corded version of that vacuum cleaner, which has a 16-foot power cord…

https://www.amazon.com/Bissell-Eraser-Handheld-Vacuum-Corded/dp/B001EYFQ28

I think the majority of people realize that these AI models still have a number of limitations today. Users beware.

Expand full comment
author

Nice find! So I guess it is based on *something*, though I cannot find an HGTV article that mentions this corded version of the vacuum, and that's the source it's claiming to use.

Maybe it's merging its knowledge of the corded version with the cordless version mentioned in the article.

Expand full comment
Feb 14, 2023 · Liked by Dmitri Brereton

Yeah, probably. It's a subtle but obviously relevant distinction and probably difficult for the model given both products have the same name (which I suspect isn't that common).

Expand full comment

I think it's more common than people realize; it happens all the time. For example, my Samsung tablet has exactly the same name as a previous version, and the only distinction is the year of release listed with it. I know this because looking up information on it requires specifying the year: most articles about it, and forum posts asking for help with issues, use the year rather than the specific model ID. Same with my phone. It's a specific Kyocera model, but subtle distinctions make it hard to find the right information if you don't know exactly what you're looking for.

Expand full comment

The same is true of the Bard JWST claim, which articles have been presenting as completely invented, but in fact the NASA website says that JWST took the first "direct" images of an exoplanet; Bard just omitted the word "direct". (Credit to Marco Fonseca, who pointed this out on Twitter: https://twitter.com/MarcoVFonseca/status/1623685670203019266.) There is usually some basis for these weird claims, but it's not always possible to tell from the sources what it is (and the AI doesn't "know" that what it's claiming is totally different).

I also tried out some of Bing's pre-programmed questions, including one about car models, and the recommended models appeared nowhere in the articles the AI was citing. The info may have been correct, but I don't know where it came from. Does the AI just invent a plausible-looking reference, too?

Expand full comment
Feb 15, 2023 · Liked by Dmitri Brereton

The example of the Ikea loveseat is completely wrong, for several reasons.

It compares volume, not linear dimensions. It uses the assembled dimensions, not the flat-pack dimensions. And it does not know that Ikea sells flat-pack rather than assembled furniture.

You can fit up to 4 flat-packs of this particular couch in the back of that Honda Odyssey. That would weigh 400+ lbs.
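
To make the volume-versus-dimensions point concrete, here is a minimal Python sketch. The numbers are placeholders for illustration, not the real Klippan or Odyssey measurements: a box can have less total volume than a cargo area and still not fit, so the only meaningful check is dimension by dimension, against the flat-pack size.

from itertools import permutations
def fits_by_volume(box, cargo):
    # Naive check: compares total volume only (what the demo answer appears to do).
    return box[0] * box[1] * box[2] <= cargo[0] * cargo[1] * cargo[2]
def fits_by_dimensions(box, cargo):
    # Real check: some orientation of the box must fit inside the cargo space.
    return any(all(b <= c for b, c in zip(orientation, cargo))
               for orientation in permutations(box))
# Hypothetical numbers in inches, NOT the actual product measurements.
assembled = (71, 35, 26)   # assembled loveseat: long in one direction
flat_pack = (46, 34, 10)   # flat-pack box: much shallower
cargo = (48, 40, 36)       # cargo area
print(fits_by_volume(assembled, cargo))      # True: volume alone says "it fits"
print(fits_by_dimensions(assembled, cargo))  # False: the 71" side doesn't fit anywhere
print(fits_by_dimensions(flat_pack, cargo))  # True: the flat-pack does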

Expand full comment

Hello, the article you posted is completely plagiarized from our blog, published 4 hours before it. As academic researchers, we sincerely hope that you'll respect our intellectual property. https://medium.com/@chiayewken/mistakes-of-microsofts-new-bing-can-chatgpt-like-generative-models-guarantee-factual-accuracy-5ec82a7453f4

Expand full comment
author

Hello, I'm glad to hear that you had the same idea to research the claims made in the Bing demo. It seems like we have some overlap and some differences in what we looked at.

For example, you seem to have not noticed the Cecconi's Bar mistake, while I didn't realize the issue with Japanese poets.

It is unfortunate that you're quick to claim plagiarism instead of going with the more reasonable assumption that we all had the same idea to fact-check the Bing demo.

Anyway, I am happy for more people doing valuable research into the accuracy of LLM search engines, and I wish you all the best.

Expand full comment

Thank you for the reply. The conference came out a week ago, while your 'coincidental overlapping research' went out 4 hours after our blog. (As a group of student researchers, four of us spent the whole weekend verifying every slide detail.) Moreover, it seems that before this day, you have not published any blogs in the technological domain. We have also noticed a click from your domain on our Medium blog before yours came out. Moreover, the phrasing seems to be exactly the same, such as:

1. Ours: "Regarding net sales, the new Bing’s summary claimed “growth in the low double digits”, while the original report stated that “net sales could be down mid-single digits”.

Yours: "No…they don’t expect net sales growth in the low double digits. They expect net sales to be down mid-single digits."

2. Ours: "In addition to the generated figures which conflicted with actual figures in the source report, we observe that the new Bing may also produce hallucinated facts that do not exist in the source. In the new Bing’s generated summary, the “operating margin of about 7% and diluted earnings per share of $1.60 to $1.75” are nowhere to be found in the source report."

Yours: "And I didn’t see anything else in this document about the future outlook for operating margin, or diluted earnings per share. So Bing AI either got that from a separate document, or made it up completely."

3. Ours: "Unfortunately, the situation worsened when the new Bing was instructed to “compare this with Lululemon in a table”. The financial comparison table generated by the new Bing contained numerous mistakes:"

Yours: "But it gets worse. Now we’re going to compare Gap to Lululemon"

4. Ours: "This table, in fact, is half wrong. Out of all the numbers, 3 out of 6 figures are wrong in the column for Gap Inc., and same for Lululemon."

Yours: "The Lululemon data is about as accurate as the Gap data."

...

And the list goes on. The resemblance is uncanny. Hopefully you can cite your sources more properly next time. We will now move on from this debate and continue our research into knowledge-enhanced LLMs.

Expand full comment
author
Feb 15, 2023 · edited Feb 15, 2023

My friend, I am very sorry that we chose to write about the same topic and my article got attention while yours did not. But making a plagiarism accusation is really unnecessary, and I wish you had just messaged me in good faith and in a spirit of collaboration instead.

If you had sent a kind message like "Hey, we had the same idea! Check out my article, I found something you missed in yours.", this would have been a much nicer interaction.

Anyway, here's my final response to you.

> Moreover, it seems that before this day, you have not published any blogs in the technological domain.

Around 1 year ago, I published what is now the #11 most upvoted Hacker News post of all time, entitled "Google Search Is Dying", which was referenced by every major news outlet. So yeah, I've published search engine articles before.

> As a group of student researchers, four of us spent the whole weekend verifying every slide detail.

Nice, me too! I spent all weekend researching this. It was quite painful doing it alone, so I only checked some of the things, which is why I missed the poet thing that you found. Though it seems like you missed the pet vacuum thing that I found.

> Moreover, the phrasing seems to be exactly the same...

The "phrasing" is not the same, but the gist of what we are saying is the same because...we are literally talking about the same thing. There is nothing surprising about that.

If you still legitimately believe that I copied something from you, send me an email (dmitribrereton@gmail.com) and I will happily send you my entire Google doc history from the weekend when I worked on this.

I really do wish you the best with your research moving forward. I think you're working on some important topics. Perhaps we will be able to collaborate in the future.

Expand full comment

Thank you for the reply. Please share your Google Doc with zrc.esther@gmail.com instead of sending a PDF of the history.

Expand full comment

As I have not received anything from you, I'll assume that this is a plagiarized version of our blog.

Expand full comment

Checked the link, and this accusation seems true. Since this 'dkb.blog' article is the one going viral, I feel pretty sad about this.

Expand full comment
Feb 15, 2023 · Liked by Dmitri Brereton

The more obvious explanation is that more than one person had the idea to fact check the demo. Accusing them of plagiarism is ridiculous.

Expand full comment

Please see my list above as reference.

Expand full comment

Liar

Expand full comment
Feb 14, 2023 · edited Feb 14, 2023

It's fine. We have a loooong way to go.

Unfortunately, because of this AI race, we don't have much time to finish it, hence the haste.

But this haste is good. I have observed planet-changing discoveries amid the most chaotic races, again and again in history.

Maybe the fact that AI generates its own data is step 1 toward AI consciousness, or something even bigger. Who can tell?

Let's all be patient. Let's all be curious.

Expand full comment

The best part of this AI nonsense that has taken over the internet is that all the lemmings rush to defend the AI against any criticism, even when presented with evidence of its failings. Many people say it will eventually automate white-collar workers out of their jobs and will soon replace internet search, yet we have already easily proven that no one can trust its words. These AIs are incapable of separating fact from fiction and could be used to spread misinformation to those who buy into the hype. I think it reeks of desperation that Bing has raced to implement this feature. They know how unreliable it is, but they are desperate to pull some of ChatGPT's 100 million users over to Bing.

Expand full comment

It will just serve to remind us all that the world is full of Fuzzers. None but the primary sources can be trusted.

Expand full comment

Fake it till you make it. Apparently Bing knows that if you say something with confidence, people will believe it.

Expand full comment

I liked that it generated a quiz where the right answer was A) every time. Somewhere there's a:

// TODO Randomize answer order
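
For what it's worth, that TODO is only a few lines. Here's a hypothetical Python sketch of the idea (the question is a placeholder, and this is obviously not Bing's actual code):

import random
def shuffle_answers(correct, distractors):
    # Put the correct answer in a random slot and return labeled options plus the key.
    options = [correct] + list(distractors)
    random.shuffle(options)
    labels = "ABCD"
    key = labels[options.index(correct)]
    return list(zip(labels, options)), key
# Placeholder question, not from the actual demo quiz.
choices, key = shuffle_answers("Paris", ["Lyon", "Marseille", "Nice"])
for label, text in choices:
    print(f"{label}) {text}")
print("Answer:", key)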

Expand full comment
author

Interesting, I didn't notice that.

Expand full comment

I was playing with ChatGPT via the API, and it makes stuff up, so you always need to fact-check its output.
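
For context, "via the API" means something like the sketch below. This uses the pre-1.0 OpenAI Python client, and the model name and prompt are assumptions, not what was actually run. Nothing in the response marks which claims are true, so every answer still has to be checked against a primary source.

import openai
openai.api_key = "YOUR_API_KEY"  # placeholder
resp = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",  # assumed model; the comment doesn't say which one was used
    messages=[{"role": "user", "content": "When did JWST take its first exoplanet image?"}],
)
# The API returns fluent text with no citations or confidence attached.
print(resp["choices"][0]["message"]["content"])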

Expand full comment

Nothing involving a third party can really be trusted, especially anything with dragnet, NSA, and DoD links. The majority of corporate operations are carried out in complete secrecy; that alone is a red flag. You can't trust tech companies to be truthful or even to respect security and privacy advocacy. AI in general is a dangerous playground; it's really all about cloud control and getting people to subscribe through distractions like chatbots.

Expand full comment

Dmitri, excellent work!

Expand full comment

Quick note — I think you have a small typo here:

The operating margin including impairment is 4.6% and excluding impairment is 3.9%.

^^ need to flip the words including and excluding.

This example made me think about the importance of human coding in AI. The genius of Google Search is that it uses human eyeballs plus a lot of other data to determine which websites are best. What is the equivalent, scalable check on large language models?

Expand full comment

Remember how Bill Gates famously said, "I choose a lazy person to do a hard job, because a lazy person will find an easy way to do it."

That's why we were stuck with years of blue screens of death and other shitty MS crapware. This is what hiring lazy people will do. Some things never change.

Expand full comment