Why your AI transformation has become an AI slop factory
Thought Leadership · 7 Minute Read
Neel Doshi
Co-Founder and Director
Neel is the cofounder of Vega Factor and co-author of the NYT bestseller Primed to Perform, published in fall 2015 by HarperBusiness. Previously, Neel was a Partner at McKinsey & Company, founding member of an award-winning tech startup, and employee of several mega-institutions. He studied engineering at MIT, and received his MBA from Wharton.
According to a study at MIT, 95% of corporate generative AI projects are failing.
However, MIT also found that when projects were built externally (i.e., by vendors or tech companies), they were three times more likely to succeed. And in our personal experience, we've all witnessed the magic and power of generative AI.
So how can we make sense of this disconnect?
In the last two weeks alone, nine C-suite leaders shared with me some version of “We gave everyone access to AI, but all we’re getting is more AI slop. Where’s the productivity?"
It’s not just them. If you’re leading an AI transformation, you’ve probably felt it too. The dashboards say “AI adoption is up.” But when you look at the actual work—emails, reports, even code—something’s off. There's more stuff but less value.
This is the first article in our series on why so many AI transformations and projects are failing. Across this series, we're going to work through the three biggest issues we're seeing in practice, and what you can do about them.
1. AI produces slop if you're not managing it well.
2. Grafted-on AI has far less value than AI-native processes.
3. AI still requires humans, which means AI adoption requires transformational change.
Today, we're going to dip our toe into AI slop.
Slop Everywhere
You’re not imagining it. The world is drowning in what’s now called “AI slop”—a flood of low-value content generated by large language models. It’s everywhere: in your LinkedIn feed, your inbox, in the ads you see from brands that should know better, and even in legal filings. There’s a Wikipedia page for AI slop. It's in the dictionary. And it might just be destroying the internet.
If you’re tired of seeing your AI investment turn into a slop factory, you’re in the right place. And just so you know, you'll see a bunch of em-dashes in this article, and each has been hand-typed by me, not an AI. I didn't memorize the shortcut "option+shift+dash" only to have AI paranoia take away my beloved em-dash.
What is “AI slop” in a corporate setting?
Early in my career as a consultant, a mentor told me that when you're sharing an idea, make sure it is insightful. An insight is an idea that is both valuable and new to the other person. My mentor didn't use the term, but he was essentially saying that "consultant slop" is non-insightful chatter.
AI slop is no different than consultant slop.
Wikipedia defines AI slop as "low quality" content and "digital clutter." In corporate productivity, we'd go one step further and add to the definition:
Content that produces bad feedback loops, like overly sycophantic AI that just agrees with everything you're saying
Content that is overly wordy
Content that is not insightful
What many organizations don't realize is their approach to using AI often produces this slop.
Three critical factors of AI quality
AI can help create motivated, adaptive, high-performing organizations. We're seeing it for ourselves. When wielded well, AI can create a golden age of work. When wielded poorly, it can destroy much of what humanity most holds dear.
To wield AI well, organizations must understand three different factors of AI quality:
Model quality – Using an analogy, if the education system determines the supply of knowledgeable and skilled workers, model quality is like the quality of the employees you hire. Models vary in quality, even from a single provider. For example, Google's Gemini comes in three models of different quality levels – Pro, Flash, and Flash-Lite. The upside of using a lower-quality model is that it costs less and is faster.
Prompt quality – Continuing that analogy, think of prompt quality as a leader's ability to ask that employee the ideal questions to help that employee perform at their best. Prompts can vary in quality, from asking simple questions to providing detailed step-by-step instructions on how to think through a problem.
Context quality – And closing out this analogy, think of context quality as a leader's ability to give that employee just the right amount of context (not too much, not too little) to do their work adaptively and to a high level of excellence. Context can vary from not enough to truly understand a problem to too much, causing confusion.
These three factors are sometimes treated as engineering disciplines in their own right: model engineering, prompt engineering, and context engineering.
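To make the three factors concrete, here is a minimal sketch in Python of how a single AI request is assembled from a model choice, a prompt, and injected context. The tier names, prompt text, and context fields are hypothetical illustrations, not Factor.ai's actual prompts or payloads:

```python
# Illustrative only: model names, prompt text, and context fields are
# hypothetical, not Factor.ai's actual implementation.

MODELS = {"medium": "gemini-flash", "high": "gemini-pro"}  # model quality

PROMPTS = {  # prompt quality: from a bare question to step-by-step guidance
    "low": "Suggest strategies for this team.",
    "high": (
        "Think step by step: 1) restate the team's goal, "
        "2) list its biggest obstacles, 3) propose three prioritized "
        "strategies, each naming the problem it solves and why it's insightful."
    ),
}

def build_request(model_tier: str, prompt_tier: str, context: dict) -> dict:
    """Assemble one request: the three quality factors are simply the
    three inputs to this function."""
    context_block = "\n".join(f"{k}: {v}" for k, v in context.items())
    return {
        "model": MODELS[model_tier],
        "prompt": f"{PROMPTS[prompt_tier]}\n\nContext:\n{context_block}",
    }

request = build_request(
    "high", "high",
    {"team": "frontline support", "goal": "cut response time 30%"},
)
```

The point of the sketch: upgrading any one of the three inputs changes the request, and therefore the output quality, independently of the other two.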
Because we are deep believers in the promise of AI and deeply opposed to the destructiveness of AI slop, we continually run multi-factor experiments on AI output quality while developing the Factor.ai platform. Below is one such example.
The experiment is to see how effectively Factor.ai can guide a frontline team to build high-performing and motivating strategies. This is important to us because we've found that teams that can quickly and effectively strategize will be much more motivated to perform, and thus will perform significantly better. In practice, we've found that the majority of teams struggle to strategize, as evidenced by poor prioritization and high burnout.
In this experiment, we're going to run a 2 × 3 × 3 design. (Note: just because I use a proper "×" multiplication symbol instead of a lowercase x doesn't mean I'm an AI. It means I love fonts.)
Our experiment varied model, context, and prompt quality. As you'll see, differences in prompt and context had the highest impact on quality (models have more or less asymptoted on performance).
Here's how the three factors were varied:
Parameter 1: Model quality.
Medium: Half of the experiments use Google's Gemini Flash (without reasoning). This is Google's mid-tier model with its "thinking mode" turned off.
High: The other half of the experiments use Google's Gemini Pro. This model is a "reasoning" model, meaning that it takes extra steps to deeply think about what it is doing. This is Google's flagship model.
We also test with Gemini Flash Lite, but it often failed our tests on this type of output, so we excluded it from this test setup. We test with many other models too, including GPT and Claude, but for this article, let's keep it simple.
Parameter 2: Prompt quality.
Low: A basic prompt that is about the level a typical employee would use in ChatGPT.
Medium: A prompt that provides more detailed instructions on how to structure output well and what specifically would be helpful.
High: A prompt that provides detailed instructions on how to think through the output, step-by-step.
Parameter 3: Context quality.
Low: The amount of context the average user would typically include in ChatGPT.
Medium: More context, to make sure the model would know what is differentiating.
High: A significant amount of context about a team, its members, their skills, and broader organizational strategic context.
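The design above can be enumerated programmatically. A quick sketch of the 2 × 3 × 3 grid (tier labels are shorthand for the parameter levels described above):

```python
from itertools import product

model_tiers = ["medium", "high"]            # Flash (no reasoning), Pro
prompt_tiers = ["low", "medium", "high"]
context_tiers = ["low", "medium", "high"]

# Every combination of levels is one experimental cell: 2 x 3 x 3 = 18 runs.
cells = [
    {"model": m, "prompt": p, "context": c}
    for m, p, c in product(model_tiers, prompt_tiers, context_tiers)
]
print(len(cells))  # 18
```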
The outputs of this experiment were strategies (problems to solve and ideas) for a specific frontline team at a specific mid-sized company. Each output was graded blindly on its quality, including its level of value and insight.
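Blind grading simply means graders can't see which cell produced which output. A minimal sketch of one way to do it (the anonymization scheme here is illustrative, not our exact tooling):

```python
import random

def blind(outputs: dict, seed: int = 0):
    """Replace cell labels with anonymous IDs and shuffle the order, so
    graders can't tell which model/prompt/context produced which output.
    Returns the anonymized samples plus a held-back key for un-blinding."""
    rng = random.Random(seed)
    items = list(outputs.items())
    rng.shuffle(items)
    key = {f"sample-{i}": cell for i, (cell, _) in enumerate(items)}
    anonymized = [(f"sample-{i}", text) for i, (_, text) in enumerate(items)]
    return anonymized, key

# Graders see only `anonymized`; after scoring, map results back with `key`:
#   scores_by_cell = {key[sample_id]: score for sample_id, score in graded}
```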
When you look at each dimension singly in this experiment, prompt quality had the strongest impact on output quality, followed by context. Model was about a wash. For the sake of accuracy, across many experiments we see similar patterns, namely:
Sometimes context comes out ahead of prompt.
Over time, as models have gotten better, the difference in model has gotten less pronounced.
Lastly, when our scoring criteria includes writing quality, generally the highest-end models show better results. In this experiment, we were much more focused on the quality and insightfulness of generated strategies.
[Table: output quality, cost, and generation time for each model × prompt × context combination]
When we look at different factors together (see above exhibit), a slightly more nuanced pattern emerges. For example, when prompt and context quality are both high, output hits an acceptable 94% in quality.
Note that the average user's use of your corporate branded ChatGPT is typically low-quality prompt and low-quality context. At 52% quality, this is slop.
Lastly, in this particular test you can see that, overall, the combination of the best model, context, and prompt produced the best results. However, the next two test cells were close enough to clear a high-quality bar. It is also worth noticing that the best option had by far the largest total cost and took over twice as long to generate.
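One way to see the per-factor pattern is to average output quality over every cell at each level of each factor (its marginal mean). A sketch with purely illustrative scores, not our actual experimental data:

```python
from statistics import mean

# Purely illustrative quality scores (0-100) for a 2 x 2 x 2 slice of
# (model, prompt, context) cells; NOT the experiment's actual results.
scores = {
    ("medium", "low", "low"): 50,   ("high", "low", "low"): 54,
    ("medium", "high", "low"): 70,  ("high", "high", "low"): 74,
    ("medium", "low", "high"): 66,  ("high", "low", "high"): 70,
    ("medium", "high", "high"): 92, ("high", "high", "high"): 96,
}

def marginal_mean(factor_index: int, level: str) -> float:
    """Average quality across all cells where the given factor
    (0=model, 1=prompt, 2=context) is at `level`."""
    return mean(s for cell, s in scores.items() if cell[factor_index] == level)

# Effect of each factor = mean at its high level minus mean at its low level.
prompt_effect = marginal_mean(1, "high") - marginal_mean(1, "low")
context_effect = marginal_mean(2, "high") - marginal_mean(2, "low")
model_effect = marginal_mean(0, "high") - marginal_mean(0, "medium")
```

With scores shaped like these, the effects come out ordered prompt > context > model, which is the pattern described above; the interaction (high prompt plus high context scoring disproportionately well) shows up in the top-right cells.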
How should this impact your AI transformation?
Again, I want to remind you that this was one of many experiments we've run. But the pattern we see across them is extremely clear: prompt and context are critical to going from slop to genuine performance gains.
When we see AI transformations or projects in organizations, not nearly enough thought goes into getting the context or the prompt right. In Factor.ai, every AI experience is obsessed over so that its prompt + context + model combination results in the highest possible levels of quality.
If you aim to get AI right, you need to make sure your tools get all three factors—prompts, context, and model—right for the specific use case of that tool. Moreover, we've found that both prompts and context are not perfected through engineering experience alone. Instead, they need to also be perfected by deep subject matter experts in the domains of those tools.
For example, if you're making an AI tool to automate auditing, the prompts and context need to be developed by AI experts who are also coupled with audit experts. Every prompt and context injection in Factor.ai, which automates leader workflows and saves leaders three months of work per year, was developed by AI experts and the world's leading experts in those workflows.
If this is not how you're building your tools or driving your AI transformation, it is unlikely to succeed.
Going forward
Many of you may know our work from the book Primed to Perform. In it, we unpack the science of human performance. Since then, our mission has been to create a world where everyone is able to feel the highest levels of fulfillment and impact in their work.
Over the past decade, we've worked with the who's who of companies, big and small. And if there's one lesson we've learned, even before AI was prevalent, it's that how we prompt people and how we give them context is ultimately what separates the motivated from the rest. The same is true with AI.
If this article resonates with your challenge, let us help you sort it out!