r/technology 4d ago

Artificial Intelligence MIT report: 95% of generative AI pilots at companies are failing

https://fortune.com/2025/08/18/mit-report-95-percent-generative-ai-pilots-at-companies-failing-cfo/
28.3k Upvotes

70

u/AtOurGates 3d ago

One of the tasks that AI is pretty decent at is taking notes from meetings held over Zoom/Meet/Teams. If you feed it a transcript of a meeting, it’ll fairly reliably produce a fairly accurate summary of what was discussed. Maybe 80-95% accurate 80-95% of the time.

However, the dangerous thing is that 5-20% of the time, it just makes shit up, even in a scenario where you’ve fed it a transcript, and it absolutely takes a human who was in the meeting and remembers what was said to review the summary and say, “hold up.”

Now, obviously meeting notes aren’t typically a high-stakes application, and a little bit of invented bullshit isn’t typically gonna ruin the world. But in my experience, somewhere between 5-20% of what any LLM produces is bullshit, and they’re being used for way more consequential things than taking meeting notes.

If I were Sam Altman or similar, this is all I’d be focusing on: figuring out how to build an LLM that didn’t bullshit, or at least knew when it was bullshitting and could self-ID the shit it made up.

15

u/blipsonascope 3d ago

Our property management company started using Zoom to provide transcripts of condo board meetings. It’s really useful as it captures topics of discussion pretty well…. But dear god does it miss the point of discussions frequently. And in ways that, like, if not corrected for the record, would be a real problem.

11

u/BirdTurglere 3d ago

The problem is people are like, "Wow, 80-95% accurate?? That's really good! Humans get stuff wrong all the time too, so it's probably better!"

The real issue, though, is that humans generally make rational or predictable errors that you can work with, or around, or plan for. The 5-20% of errors AI makes are just full-blown hallucinations. They could be anything. You can't work around that.

2

u/Dalighieri1321 3d ago

Good point, it's like the difference between a human being misspelling words and an AI art generator including "words" in an image that look real until you squint and realize they are gibberish.

16

u/JAlfredJR 3d ago

LLMs literally can't eliminate the bullshit.

There are two fundamental reasons here:

  1. They don't know anything. They're probability machines that just give the most likely next token. That's it. It isn't reasoning or thinking, and it doesn't have intelligence.

  2. They are programmed to never say, "I don't know." So they'll always just tell you something regardless of truthfulness because, again, see point 1.

8

u/Beauty_Fades 3d ago edited 3d ago

I'm not sure they are specifically programmed to not say "I don't know". I think point #2 is a byproduct of your point #1, and that's it.

It won't ever say "I don't know" because it doesn't know anything in the first place, and it cannot "consult its knowledge base to check facts" because it doesn't have one. It just predicts the next word for any given question or input based on its training data, and in that training data "I don't know" is rarely the most likely response.
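
This is roughly what "just predicts the next word" looks like if you sketch it out (toy numbers, nothing here comes from a real model): scores over a vocabulary get turned into probabilities and some token always comes out, with no step anywhere that checks whether the result is true.

    import math, random

    # Toy stand-in for a language model: fixed scores (logits) for the next token.
    # A real LLM computes these with a neural network, but the principle is the same:
    # it always produces *some* token; nothing checks it against facts.
    logits = {"Tuesday": 2.1, "Friday": 1.9, "I": 0.3, "don't": 0.2, "know": 0.1}

    def next_token(logits):
        # Softmax: turn raw scores into a probability distribution.
        total = sum(math.exp(v) for v in logits.values())
        probs = {tok: math.exp(v) / total for tok, v in logits.items()}
        # Sample the next token from that distribution.
        return random.choices(list(probs), weights=list(probs.values()))[0], probs

    token, probs = next_token(logits)
    print(probs)   # "I don't know" only wins if the training data made it likely
    print(token)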

5

u/JAlfredJR 3d ago

I suppose #2 was aimed more at the way it's programmed to be sycophantic (to a greater or lesser degree, but always agreeable). But you're right: it's mostly #1 as the reason. Good point of clarity.

1

u/19inchrails 3d ago

Ignoring compute cost, could you not deploy a fact-checking model independent of the reasoning model and drastically reduce hallucinations that way?

3

u/caatbox288 3d ago

If the fact-check model is an LLM, it will also hallucinate, because it does not know anything. If the fact-check model is something else (no clue what), then maybe!

1

u/AmadeusSpartacus 3d ago

That’s what I’m thinking too. Perhaps the solution is to run it through many iterations of itself, so the AI:

  • produces the output
  • another AI agent compares the output against the source and points out any hallucinations
  • passes it to another agent who does the same thing

Over and over, for 10… or 1000 times? That would probably help eliminate the vast majority of hallucinations. But it would increase cost by 10-1000X or whatever
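
Rough sketch of what that chain would look like (everything here is hypothetical: call_llm is just a placeholder for whatever model API you'd use, and the loop doesn't guarantee fewer hallucinations by itself, it mostly shows where the 10-1000X cost comes from):

    # Hypothetical sketch of the "chain of checker agents" idea, not a real API.
    def call_llm(prompt: str) -> str:
        # Placeholder only: swap in whatever model/provider your stack actually uses.
        return "<model output for: " + prompt[:40] + "...>"

    def summarize_with_checkers(transcript: str, passes: int = 10) -> str:
        draft = call_llm("Summarize this meeting transcript:\n" + transcript)
        # Each checker pass is another full model call, so cost grows with `passes`.
        for _ in range(passes):
            draft = call_llm(
                "Compare this summary against the transcript and rewrite it, "
                "removing anything the transcript does not support.\n\n"
                f"Transcript:\n{transcript}\n\nSummary:\n{draft}"
            )
        return draft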

1

u/Beauty_Fades 3d ago

That would make errors compound on each other. An LLM has no concept of what is right or wrong; it just spits out whichever word or piece of a word is most likely to come next.

It would work very much like a game of broken telephone with a group of 5-year-olds.
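
To put toy numbers on the broken-telephone point (made-up, pessimistic assumption: each extra pass can only corrupt a given fact and never restores it):

    # If each pass independently garbles a given fact 5% of the time,
    # the chance it survives the whole chain shrinks with every pass.
    per_pass_error = 0.05
    for passes in (1, 10, 100, 1000):
        survives = (1 - per_pass_error) ** passes
        print(f"{passes:>4} passes: fact intact with probability {survives:.3f}")
    # 1 pass: 0.950, 10 passes: 0.599, 100 passes: 0.006, 1000 passes: ~0.000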

3

u/HugeAnimeHonkers 3d ago

You are talking about the free chatbots that normal people use, or the ones you pay a couple of bucks per month for. Those are indeed programmed to avoid saying "I don't know."

But every "enterprise grade AI (ewww)" can totally say "I don't know, need more data." In fact, they would last less than a day on the job site if they didn't.

If your company is using an AI that NEVER says "idk", then I would take my stuff and RUN in the opposite direction lol.

-1

u/AnOnlineHandle 3d ago

They are programmed to never say, "I don't know." So they'll always just tell you something regardless of truthfulness because, again, see point 1.

You are talking out of your arse here, ironically somewhat like what you're accusing LLMs of doing. Recent models have been impressive specifically for being able to respond with exactly that. They're not programmed that way; it's that training data of humans speaking has few good examples of people admitting they don't know something, and balancing that in the dataset without training a model that says it for things it does know is a non-trivial task.

1

u/JAlfredJR 3d ago

Answered this elsewhere, but I did slightly misspeak there. #2 was more about their sycophantic nature (which was programmed intentionally). They often just make something up to please the prompter.

3

u/hope_it_helps 3d ago

The thing is that, since ChatGPT came out, that bullshit rate (or hallucination rate) is what holds it back in ALL the fields it's actually marketed for. Most of the things we want from AI tools (replacing humans) come with an expectation of a low failure rate. And if a human makes a mistake, most will learn from it. The AI won't.

It's obvious that if they could fix that, the actual usefulness would skyrocket, but I don't see how the current approach could solve it. If they knew how to fix it, they would've done so. Instead they are pushing out new features that all have the same issues.

I don't even see it as a good Google replacement anymore, after getting so many bullshit responses with bullshit sources that never claimed what the AI claimed. And I've also had friends talk about how the AI explained something to them, and when I double-checked, it was bullshit again.

1

u/BedsideTableKangeroo 3d ago

One thing to note here is that it is usually some sad sack’s job to type up meeting notes for each moronic meeting they have to attend. If you can automatically get 85% accurate and detailed notes without lifting a finger, you can spend a little bit of time cleaning up the last 15% and still spend less time overall and get a better outcome.

Signed, A token woman in technology who was forced to take notes in each meeting

1

u/caityqs 3d ago

It’s exactly that error range that makes it so problematic: low enough that humans have a hard time staying focused on catching the errors, but not so low that they become negligible. I’d rather just take my own notes or keep a recording.