People who use slop keep saying stuff like “my productivity grew by leaps and bounds” and such. Yet I’ve never seen substantiation of productivity gains with LLMs: not with colleagues who gave it a try (at my boss’ behest until he realized it was idiotic), nor with any actual formal management studies.
Are there any peer-reviewed studies showing productivity going “off the charts” by use of LLMs? Or is this all anecdote from people who were never productive in the first place and can thus make any claim they like about their productivity? (After all 1000× nothing is … nothing.)
Related: https://hbr.org/2025/09/ai-generated-workslop-is-destroying-productivity
It’s possible they’re getting more stuff written and more emails sent, but the lower signal to noise ratio means this isn’t actually helping.
Yeah, that’s the kind of thing I’ve been seeing, meaning that so far all actual studies directly contradict the anecdotes. And directly contradict my personal experience watching people trying to use it before just giving up and going back to working like actual people.
I guess if you measure productivity by word count instead of work quality and utility, L. Ron Hubbard is one of the most productive writers in all of history.
I cannot fathom people using AI to write an email. Anyone not competent enough to quickly express themselves in writing is a failure to begin with. Even if you suck at writing, you should still be able to get something down.
I have limited experience, but ChatGPT was great for getting me over a hump when I was writing PowerShell. I was at a dev shop at the time and most of them seemed to feel the same way.
Vibe coding is a dead end, and no replacement for actually knowing the job, but an LLM can give you ideas and inspiration when you’re stuck.
One example: I was trying to build a Google Sheets/Calendar integration. Very thin documentation and almost no examples online. I had ChatGPT write it. Of course it didn’t function as-is, but I picked out a couple of lines that got me to the finish. I had spent hours banging my head against it when I could have cut to the chase.
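To give a sense of scale, the whole integration is only a few lines of glue. Here’s a minimal sketch in Python (hypothetical, not my actual script; the sheet ID, range, and column layout are placeholder assumptions):

```python
# Hypothetical sketch of a Sheets -> Calendar sync, not the exact script from the post.
# Assumes OAuth credentials already exist in token.json and that the sheet has
# columns title, start, end (RFC 3339 timestamps) with a header row.
from google.oauth2.credentials import Credentials
from googleapiclient.discovery import build

SHEET_ID = "your-spreadsheet-id"  # placeholder

creds = Credentials.from_authorized_user_file("token.json")
sheets = build("sheets", "v4", credentials=creds)
calendar = build("calendar", "v3", credentials=creds)

# Read event rows from the sheet, skipping the header via the A2 range.
rows = (
    sheets.spreadsheets()
    .values()
    .get(spreadsheetId=SHEET_ID, range="Events!A2:C")
    .execute()
    .get("values", [])
)

# Create one calendar event per row.
for title, start, end in rows:
    calendar.events().insert(
        calendarId="primary",
        body={
            "summary": title,
            "start": {"dateTime": start},
            "end": {"dateTime": end},
        },
    ).execute()
```

Even a sketch like this is exactly where an LLM’s first attempt tends to half-work: the API calls are roughly right, but the details still need fixing by hand.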
So how could we study effectiveness when that depends on how it’s used?
It’s really good at giving you hints that steer you to the right path. Sometimes it has given me answers that were exactly what I was looking for but didn’t know existed. But I’ve also seen it spit out outdated answers, incomplete answers, and solutions an experienced dev wouldn’t use. It’s also limited because it doesn’t understand the context: a better solution might have been available, but you’d never know.
Overall, I’d give it a 7/10 as a study aid. It’s a great place to start, but don’t rely on it.
We design studies around claimed beneficial workflows and measure.
Nobody said this is easy. It isn’t. Social sciences in particular are hard to design good studies for. But without measurement (and precise definitions of what “effectiveness” means for purposes of these measurements) all claims about “productivity through the roof” are sus as all fuck.
I haven’t seen anything yet. At work, the people who praise AI the loudest correlate very strongly with the people who lack a good understanding of the core concepts: the ones who just float around cargo-culting and looking busy by making noise.
That said, LLMs are still useful tools that are widely misused. What they’re useful for is a lot less than most people think. The user needs to be the expert. If they’re not, they’d be better off reading a book on the matter (and the way things are looking, it might have to be one written before LLMs came out).
Exactly this…
Also, checking the work of an AI is a lot harder than checking the work of a junior employee.
A junior employee makes predictable mistakes and improves over time.
These models improve all right, at least on benchmarks, but they fundamentally will never know right from wrong, and the whole point of being a professional is being able to tell the difference: catching it when you’re wrong and correcting course, and standing your ground when you know you’re right…
These systems can’t do any of it.
Oh, do I ever feel this!
In my experience it can only competently do the 80% of the work that takes 20% of the effort: brainstorming, planning, and gathering data. And even that you have to sift through for accuracy, so maybe it saves 10% of the work in the best situations.
The biggest benefit I’ve gotten from AI is that it sometimes enables me to do things I never would have been able to do with a small budget and a small team. But that didn’t save me any time; it may even have required more.
I guess “allowed me to do something at all, even if slower and not as high quality” is a benefit. But that seems an underwhelming benefit for something touted as “Ph.D. level assistance”…
Well I didn’t say lower quality, but I wouldn’t say better quality either. My field is pretty subjective so while I was pretty excited about what it enabled, I’m sure there are people who would have preferred the pre-AI version.
It could just be someone lazy or bad at their job going from below average to average using LLMs. Going from shit productivity to standard productivity can be described as “leaps and bounds”.
Again, however, there’s no evidence beyond the anecdotal that this is a thing. The only formal studies I’ve seen have found the opposite: that LLMs reduce productivity.
I’d like to see formal studies stating otherwise before I take seriously any claim of huge performance gains, 'cause when I see below average people using LLMs for writing, their writing gets worse, not better, because they lack the ability to assess the LLM output for accuracy.
Oh absolutely, I’d love to see some real studies on this as well. I was just saying that I suspect a lot of these numbers are artificially inflated by things like what I mentioned. The people whose productivity AI pushes up by leaps and bounds (if they even exist) are probably the people who could just be fully replaced by AI: people whose bar is already set so low that an AI-aided uptick could be seen as a massive improvement. Leaps and bounds is one thing, but leaps and bounds in a meaningful way is a whole other metric haha.
I asked for peer-reviewed studies, not anecdotal opinion. I’m full up to the brim with anecdotal opinions that don’t seem to match reality in any measurable way.
Especially because coding is a thinking task, and current LLMs have the thinking capacity of a three-year-old. Here is the “PhD-level intelligence,” as OpenAI calls it: https://chatgpt.com/share/68e749ba-b984-800c-a90a-240d17485b68
LLMs are knowledgeable because of the amount of data they are trained on, but they are extremely dumb.
I’m not even sure that they can match a three-year-old.
I’m not sure they can match a newborn.
This claim also varies greatly depending on the coding language, the frameworks used, and the task at hand.