We've Been Having the Wrong Conversation About AI

The Biggest Gains Aren't in Big Tech. They're on the Shop Floor, in the Test Lab, and Buried in Paper

May 2026 · 14 min

I use AI tools several hours a day. I build agent workflows, I write evaluation harnesses, I ship software with LLMs embedded in the process. I say this up front because what follows might sound like it’s coming from someone on the outside looking in. It isn’t; I’m deep in the tooling. And from where I sit, we’ve been having the wrong conversation.

The dominant AI narrative right now is a big-tech story: record profits alongside mass layoffs, product managers shipping demos overnight that used to take dev teams weeks, and the “10x engineer” framing. The discourse is almost entirely about software companies using AI to do the same work with fewer people. I don’t work in big tech, but I keep up. I have plenty of connections in that world, I follow the trends, and what I see doesn’t sit well with me. Posting record profits while cutting the people who built them is not innovation; it’s extraction.

So, I’m someone who uses the tools, but I don’t like the narrative around them or how they’re being used to justify cutting the workforce. What I think is far more interesting is that the capability gains in that arena pale in comparison to what’s possible in industries that barely get mentioned in the AI conversation: manufacturing, test labs, quality engineering, and regulatory compliance, along with traditional engineering firms where the shop floor still runs on paper. These are the places where AI could be transformative, not by replacing people, but by closing decades of digital debt and freeing teams to focus on work that actually matters.

Disconnected nodes transitioning to a connected network

The Scale of the Problem

The gap between where manufacturing is and where it could be is massive, and the numbers back that up. Seventy-four percent of manufacturing and engineering companies still rely on legacy systems and spreadsheets for daily operations.^[1] Roughly 70% of manufacturers who have started Industry 4.0 pilots are stuck in what McKinsey calls “pilot purgatory,” unable to scale beyond proof-of-concept.^[2] Meanwhile, 60–80% of manufacturing IT budgets go to maintaining old systems, not building new ones.^[3]

These aren’t abstract numbers. I’ve lived in this world. In a previous role as a materials and process engineer focused on polymer extrusion, most of the shop floor was managed by paper. Work orders and routing information were printed, slipped into a plastic folio, and sent through the process, often passing through ten to twelve workstations. Before any of that happened, planners, buyers, and sales had already put in significant effort with almost no digital connectivity between them.

Your extrusion operator on step eight had to get the shop order physically delivered to them, walk over to the plastics warehouse, have a warehouse worker pull the correct resin, walk it back to their station, set up the tooling, start heating the machine, and wait, often two to three hours, for everything to reach temperature before they could even begin dialing in the process. If something went wrong, the process engineer came out to review. If the deviation was significant, the material went into non-conforming quarantine until the appropriate team could convene and disposition it.

A lot of these bottlenecks would be remedied by introducing basic digital systems that reduce paperwork, mix-ups, and unnecessary hand-holding through the process. But the real opportunity sits on top of that foundation, and this is where AI and ML start to change the math.

Paper-based vs. connected and AI-augmented shop floor workflow

What AI Actually Does on the Shop Floor

Consider the extrusion floor I just described, but with connectivity in place. An AI scheduling system reads tomorrow’s production sequence and pre-stages materials at each workstation before the first shift clocks in. It knows Machine 3 is running HDPE at 7 AM, so it triggers the warehouse pull at 5:30 and flags the operator’s queue. That thirty-to-forty-five-minute wait-for-materials bottleneck, repeated across every changeover, every shift, disappears.
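
The pre-staging step is mostly deterministic bookkeeping, and a minimal sketch shows how little logic it needs. Everything here is illustrative: the `Run` record, the `warehouse_pulls` function, the second run in the schedule, and the calendar date are hypothetical. The 90-minute lead time is simply the gap between the 5:30 pull and the 7 AM run in the example above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Run:
    """One scheduled production run (hypothetical record layout)."""
    machine: str
    resin: str
    start: datetime


def warehouse_pulls(runs, lead_time=timedelta(minutes=90)):
    """Return (pull_time, machine, resin) tuples, earliest pull first."""
    pulls = [(run.start - lead_time, run.machine, run.resin) for run in runs]
    return sorted(pulls)


# Tomorrow's sequence; the date and the second run are made up for illustration.
schedule = [
    Run("Machine 3", "HDPE", datetime(2026, 5, 4, 7, 0)),
    Run("Machine 1", "PP", datetime(2026, 5, 4, 6, 30)),
]

for pull_time, machine, resin in warehouse_pulls(schedule):
    print(f"{pull_time:%H:%M}  pull {resin} for {machine}")
```

A real system would read the sequence from the scheduling or ERP system and push the pulls to the warehouse queue; the point of the sketch is that the hard part is the connectivity, not the arithmetic.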

Layer ML on top and the gains compound. The extruder is running and the melt pressure starts drifting. Today, a good operator catches it because they’ve seen this before. They’ve got years of tribal knowledge, and they start troubleshooting on their own: checking their temps, reviewing the screen pack and resin lot. They only call the process engineer if they’ve exhausted what they can do at the machine. These folks are incredibly capable and most of the time they solve the problem themselves. But with a model trained on historical process data, the system can flag the drift even earlier, cross-reference it against known failure modes across the entire production history (“last time melt pressure trended this way on this resin lot, the screen pack was 80% blocked due to gels”), and surface that context to the operator in real time. Now the operator’s tribal knowledge is augmented by pattern-matching across thousands of runs they weren’t personally there for. The engineer still gets called when the situation demands it, but fewer situations require it because the operator has better information at their fingertips. That’s a hybrid system doing exactly what it should: giving capable people better tools, not replacing their judgment.

Melt pressure drift detection with ML-flagged anomaly and historical pattern match
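
To make the drift-flagging idea concrete, here is a deliberately simple stand-in. It is not a trained model: it is a control-chart-style check that compares a short rolling mean against a band learned from the start of the run. The function name, the window sizes, the 3-sigma limit, and the synthetic pressure values (nominally in bar) are all assumptions for illustration.

```python
from statistics import mean, stdev


def first_drift_index(readings, baseline_n=20, window=5, z_limit=3.0):
    """Return the index where the rolling mean first leaves the baseline band.

    The band is the baseline mean +/- z_limit standard errors of a
    window-sized average. Returns None if the signal never leaves the band.
    """
    baseline = readings[:baseline_n]
    center, spread = mean(baseline), stdev(baseline)
    limit = z_limit * spread / window ** 0.5
    for end in range(baseline_n + window, len(readings) + 1):
        recent = mean(readings[end - window:end])
        if abs(recent - center) > limit:
            return end - 1  # index of the newest reading in the window
    return None


# Synthetic melt pressure: 30 stable samples, then a slow upward ramp.
stable = [150 + (0.5 if i % 2 else -0.5) for i in range(30)]
ramp = [150 + 0.4 * k for k in range(1, 16)]
readings = stable + ramp

print(first_drift_index(readings))
```

On this synthetic series the ramp begins at index 30 and the check fires at index 33, before any single reading looks alarming on its own. The cross-referencing against past runs described above would sit on top of a detector like this, matching the flagged window against labeled historical events.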

The two-to-three-hour warmup window is another target. A scheduling system that knows the run sequence could start pre-heating machines in the right order so they’re at temperature when operators arrive. Layer on ML to optimize the sequence based on energy costs, changeover time, and material thermal sensitivity, and you’ve moved from “catching up” to “leaping forward.” The distinction matters. Basic digitization is the floor, not the ceiling.

The Test Lab

This is where it gets personal for me. Say you’re running a test lab with five scientists. Your team handles a wide variety of work: some tests take thirty minutes, some run for over ten weeks (corrosion studies, accelerated aging). The report burden varies just as widely. Some reports take twenty minutes: pull a PDF of the equipment output, write a summary, include calibration records and methods. Others take ten to twelve hours: process raw data, build charts and tabular summaries, organize photos and microscopy images, then write three to four thousand words of technical narrative.

Now, I don’t think the strategy is to throw everything into a context window with a prompt and pray. But if you build out agent skills, hooks, and reference data, with deterministic steps where they belong, you can get strong, consistent results. The key is that the human is always in the loop. An experienced scientist walks through the process section by section with the LLM.

To keep things consistent, maybe you’ve got a Python script that fires every time there’s tensile testing data. It parses the raw output, summarizes key values, and builds plots with standardized formatting so your team isn’t reinventing the wheel every report. You’ve given the LLM concrete examples of what your reports look like: your team’s writing style, how figures are labeled, what kinds of tables you use. The LLM has read access to your calibration database. It knows your Instron gets used for tensile tests, can pull the serial number and calibration dates for the load cell and test frame, and it checks the data to see whether the extensometer was used without being prompted.

Agent workflow pipeline for structured technical authoring
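
As one example of a deterministic step, here is a minimal sketch of the kind of tensile-data script described above. The CSV layout (columns `strain` and `stress_mpa`), the function name, the 0.002 strain cutoff for the modulus fit, and the sample curve are all hypothetical; real frame exports differ by vendor and method. Plotting is left out to keep the sketch short.

```python
import csv
import io


def summarize_tensile(csv_text, modulus_strain_max=0.002):
    """Summarize one stress-strain curve from CSV text.

    Expects 'strain' (mm/mm) and 'stress_mpa' columns (assumed layout).
    """
    rows = [(float(r["strain"]), float(r["stress_mpa"]))
            for r in csv.DictReader(io.StringIO(csv_text))]
    # Ultimate tensile strength is the highest stress on the curve.
    uts_strain, uts = max(rows, key=lambda row: row[1])
    # Least-squares slope over the initial, assumed-linear region.
    linear = [(e, s) for e, s in rows if e <= modulus_strain_max]
    e_mean = sum(e for e, _ in linear) / len(linear)
    s_mean = sum(s for _, s in linear) / len(linear)
    slope = (sum((e - e_mean) * (s - s_mean) for e, s in linear)
             / sum((e - e_mean) ** 2 for e, _ in linear))
    return {
        "uts_mpa": uts,
        "strain_at_uts": uts_strain,
        "modulus_mpa": slope,
        "n_points": len(rows),
    }


# Made-up curve for illustration only.
raw = """strain,stress_mpa
0.0000,0.0
0.0005,0.5
0.0010,1.0
0.0015,1.5
0.0020,2.0
0.0500,18.0
0.1000,24.0
0.1500,22.0
"""

summary = summarize_tensile(raw)
print(summary)
```

Because the numbers come from code rather than from the LLM, the model only has to write around values it is handed, which keeps the report consistent from one scientist to the next.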

I’ve been building tools in this direction as part of my own work: agent workflows designed for exactly this kind of structured, human-in-the-loop technical authoring. The results are real. What used to take ten to twelve hours can come down to under an hour when the tooling is purpose-built for the domain.

Run the math on a team of five scientists processing 250 test requests per year between them, averaging 4.5 hours per report. Drop that to 45 minutes with a well-built tool and you’ve saved your team roughly 940 hours over the year, nearly 190 hours freed per scientist. Think about the impact: five scientists now have an extra month each to spend on the work that actually requires their expertise: designing better experiments, investigating anomalies, pushing the boundaries of what your materials can do.
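
The arithmetic behind those figures is short enough to check directly. The only number below that is not in the text is the 40-hour work week, which is an assumption used to turn hours into weeks.

```python
reports_per_year = 250   # team total
hours_before = 4.5       # average per report today
hours_after = 0.75       # 45 minutes with the tool
team_size = 5
hours_per_week = 40      # assumption, not from the text

saved_total = reports_per_year * (hours_before - hours_after)
saved_each = saved_total / team_size
weeks_each = saved_each / hours_per_week

print(saved_total)  # 937.5 hours across the team
print(saved_each)   # 187.5 hours per scientist
print(weeks_each)   # 4.6875 working weeks per scientist
```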

The Quality Engineer

Consider a quality engineer who’s an active resource on ten NPI projects spanning every phase of the product lifecycle: stage gate reviews, design reviews, supplier qualifications, process validations. Keeping all of this together is a genuinely challenging exercise in context management.

Now give that engineer an LLM that does pre-reads of incoming design review packages, flags obvious gaps against your work instructions and quality policies, and surfaces the items that need real attention. At four to five stage gates per project across those ten projects, dropping review prep from three to four hours down to one hour per gate saves on the order of 100 to 150 hours per year.

Add regulatory monitoring on top. Keeping up with REACH, RoHS, incoming legislation, and EPA changes takes three to five hours a week of manual tracking for a diligent compliance engineer. Replace that with curated, AI-delivered digests that help you prioritize and tackle the high-impact items first, and you’re saving another 150-plus hours annually. Combined, that’s 250 to 300 hours back, six to seven weeks of working time, for a single engineer. Across a quality team, the numbers scale accordingly.

This isn’t theoretical. Augmentir, a connected worker platform, helped a battery manufacturer cut onboarding time by 40% and boost productivity by 17% by giving frontline workers AI-guided instructions, not by reducing headcount.^[4] The gains came from closing information gaps, not cutting people.

The Hybrid System

Dr. Michael Jordan (the UC Berkeley professor, not the basketball player) put this better than I could in a recent conversation on Machine Learning Street Talk. His framing: AI isn’t about building a super-intelligent oracle that replaces human judgment. It’s about closing the gaps where human systems were never going to scale: information flow, uncertainty, decisions made on incomplete signals.^[5]

Jordan draws an analogy to aviation. Modern flight is safe not because autopilots replaced pilots, but because the hybrid system, automation handling what humans aren’t built for, humans handling judgment and edge cases, is dramatically better than either alone. Safety is a property of the whole system, not any one component.

He extends the argument further: don’t model AI on autonomous vehicles, model it on air traffic control. We didn’t build chemical factories by first creating an artificial chemist. We built engineering disciplines with principles of analysis and design. AI needs the same approach: infrastructure that channels and amplifies distributed human expertise, not a single system pretending to know everything.

This maps directly to every example I’ve described. The extrusion operator whose tribal knowledge is augmented by pattern-matching across thousands of historical runs. The scientist walking through a report with an LLM that knows the calibration database. The quality engineer reviewing AI-flagged gaps instead of reading every document from scratch. In each case, the system is better because both participants are doing what they’re good at.

Caitlin Kalinowski, who built hardware teams at Apple, Meta, and most recently OpenAI, made a complementary point on a recent episode of Lenny’s Podcast: AI’s next frontier is the physical world. Hardware, robotics, manufacturing, the sensing layer. She notes that AI literally cannot do real CAD yet; models don’t understand friction, weight, contact pressure, or surface texture.^[6] Manufacturing and traditional engineering are largely untouched. That’s a huge opportunity sitting on the table. The workers who hold institutional knowledge about how things actually get built become more critical, not less, as these tools mature.

The Investment Question

None of this is free. Building a purpose-built agent workflow for your test lab takes real effort. The tool needs type hinting, linting, version control, a test suite, an evaluation harness, guardrails, and logging, all the things you’d expect from production software. These are not skillsets you’d typically expect of a lab manager or a process engineer.
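
To give a flavor of what a guardrail can look like, here is a minimal sketch, assuming the numeric values for a report come from a deterministic script and the LLM only drafts the narrative. The function name, the sample draft, and the allowed values are hypothetical, and the string matching is intentionally strict: it ignores signs and treats “24” and “24.0” as different tokens.

```python
import re

# Unsigned integers and decimals; a simplification for illustration.
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def unsupported_numbers(draft: str, allowed: set[str]) -> list[str]:
    """Return numeric tokens in the draft that are not in the allowed set."""
    return [tok for tok in NUMBER.findall(draft) if tok not in allowed]


# Values produced by the deterministic data step (made up here).
allowed = {"24.0", "0.10", "1000"}

draft = (
    "The specimen reached an ultimate tensile strength of 24.0 MPa "
    "at 0.10 strain, with a modulus of 1050 MPa."
)

print(unsupported_numbers(draft, allowed))  # ['1050']
```

A check like this does not prove a draft is right, but it catches a common failure cheaply and gives the reviewing scientist a specific sentence to look at. An evaluation harness is largely a collection of checks of this kind run against a set of known-good reports.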

There are two paths here, and I think organizations should consider both.

The first is direct investment in training. A lab manager or process engineer who knows the domain better than anyone learns to build the tools their team needs. A frontier LLM subscription costs around $100 a month, and with that, a motivated technical professional can build a functional tool in less than a week. I’m walking this path myself, dumping the hours into learning software development practices because I know what my domain needs and I don’t want to wait for IT to get around to it. The ramp isn’t trivial; you need PR reviews from someone with the right software engineering background, and the learning curve is real. But once you’re there, you’re the person who knows both the domain and the tooling.

The second is the software advocate model. Maybe your organization has a developer who’s been freed up from routine work by their own productivity gains, and now they partner with domain experts to build the tools those teams need. The lab manager doesn’t have to learn git; the software advocate doesn’t have to learn ASTM test methods. Each brings what they’re good at. This is the hybrid system applied to the build process itself.

Either way, the people closest to the problem have to be deeply involved in building the solution. An IT department working from a requirements doc written by someone three organizational layers removed from the shop floor will not build a tool that understands which extensometer was used on a tensile test. Now that code isn’t the bottleneck for these problems, we have to recognize that domain expertise is.

And the people exploring these tools need access. Not everyone needs a full enterprise license, but let your teams experiment. Let people get creative with the technology. The best tools I’ve seen have come from people who weren’t waiting for permission.

The Horizon

Everything I’ve described so far is about catching up, closing digital debt, getting to where we arguably should have been years ago. But these tools are evolving fast, and the next generation is going to make this current wave look incremental.

Kalinowski’s observation that AI can’t do real CAD yet isn’t a permanent state. When designers start talking to their CAD tools about what assemblies should look like, getting a complex model roughed out in hours instead of weeks, then sending it straight to a 3D printer for prototyping, the cycle time from concept to physical part compresses by an order of magnitude. The World Economic Forum projects a net gain of 78 million jobs by 2030, with the future of work framed as tasks “nearly evenly divided between human, machine, and hybrid approaches.”^[7] Meanwhile, 1.9 million manufacturing jobs risk going unfilled by 2033.^[8] We don’t have a surplus of workers to replace. We have a shortage of workers to empower.

Even the “catching up” I’ve proposed here is near-sighted in many ways. The tools are going to get better, the integrations deeper, the friction lower. But the principle doesn’t change.

The Point

There is an enormous amount of low-hanging fruit in this world: boundless opportunities in places that never make the AI discourse because they aren’t venture-backed startups in San Francisco. Think of the shop floor running on paper folios, the test lab where five scientists spend half their time formatting reports instead of doing science, the quality engineer drowning in stage gate reviews with no tooling to triage the workload, or the regulatory team manually tracking legislative changes across three jurisdictions.

We’re not replacing these people. We’re giving them tools to close the gaps that were never going to close on their own: the information asymmetries, the manual processes, the institutional knowledge trapped in one person’s head. Whether that means a team becomes 30% more productive or an organization shifts to four six-hour workdays (my preference) is a choice. But the point of these tools, to me, is to give people back their time to focus on what’s important, not make people obsolete.

LLMs shouldn’t be pulling you away from your kids or making you lose your job. They should be freeing people up for what matters.

References

[1] The Manufacturer. “74% of Manufacturers Held Back by Disconnected Data.” themanufacturer.com

[2] McKinsey & Company. “Capturing the True Value of Industry 4.0.” mckinsey.com

[3] Code District. “Manufacturers Ditching Legacy Systems.” codedistrict.com

[4] Augmentir. “Connected Worker Platform Case Studies.” augmentir.com

[5] Machine Learning Street Talk. “Intelligence Is Collective, Not Artificial: Prof. Michael I. Jordan.” YouTube

[6] Lenny’s Podcast. “Why We’re at the Beginning of the AI Hardware Boom: Caitlin Kalinowski.” lennysnewsletter.com

[7] World Economic Forum. “Future of Jobs Report 2025.” weforum.org

[8] Deloitte & The Manufacturing Institute. “Supporting US Manufacturing Growth Amid Workforce Challenges.” deloitte.com