When AI Wrote Our Specs in One Minute and Fixing Them Took Six Months

As a project manager in a large enterprise, I recently inherited a task from my colleagues that was supposed to take just ten days (and had been sold to the client as such) but had turned into a six-month saga. It looked simple: create a template for designing better APIs. The problem was that they had reached for AI to get a head start.

The 20-page monster

The first AI-generated draft was a twenty-page Word document. It was articulate, well structured, and professional-looking. That was exactly the problem. Behind the polished surface, the document had almost nothing in common with our actual environment. It discussed abstract API design concepts, listed every conceivable edge case, and referenced patterns that simply made no sense in our stack.

What followed was predictable in hindsight. Instead of becoming a foundation we could build on, the document became instant legacy: something everyone had to work around. Our most senior engineers were pulled in, and each of them, rightly, wanted to challenge every section. Weeks went into debating the importance of individual paragraphs, whether a given topic deserved a place in the template, and whether a generic caching discussion applied to our services at all.

There was another issue. Because the document read as impersonal, almost robotic, nobody truly took ownership of it. It belonged to no one. And a deliverable that belongs to no one moves very slowly through an organization. Every step of validation required finding someone willing to put their name on content they had not written.

Finally, after spending a few evenings on it, I got it done. When management looked at the bill, the conclusion was uncomfortable: they had essentially paid people to debate generated text. A simple template based on our existing internal specifications would have covered 80% of our design needs in a few days.

When AI-generated documentation does real harm

This was not an isolated accident. It is a pattern I now see across Confluence pages, design docs, and internal reports at our company. Many of these pages are AI-assisted, and a growing share of them carry what I would call toxic documentation: content that looks coherent but contradicts other documents in the system, has no verifiable link to how things actually work, or manufactures a false sense of consistency where facts are missing.

An expert normally operates inside a living network of people, decisions, and process context. An LLM does not hold that context. A generated document can be internally consistent and still be wrong the moment it touches the rest of the organization. Worse, once it is published, downstream reports, procedures, and code start relying on it. The contamination spreads quietly.

Two public cases, at Amazon and Deloitte, point in the same direction:

Amazon introduced mandatory review of AI-generated code after a six-hour outage: https://habr.com/ru/news/1008860/
Errors in Deloitte reports involving AI-generated sources: https://oecd.ai/en/incidents/2025-11-26-f9ab

What I now do differently

After this experience, my team and I changed how we use AI in documentation. A few practical principles stand out:

Use contextual prompts or agents with tightly defined knowledge.
A generic chat session with a generic model will produce a generic document. If you want output that fits your company, you have to feed it your company: existing standards, your architecture, your naming conventions, the specific systems involved. The less context you provide, the more the model will invent plausible-sounding filler that nobody in the room actually needs.
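To make "feed it your company" concrete, here is a minimal sketch of a prompt builder that refuses to run without internal sources and tells the model to flag gaps instead of filling them. It is an illustration under assumptions, not our actual tooling: `ContextDoc`, `build_prompt`, the section headings, and the "UNKNOWN: needs SME input" marker are all hypothetical, and the example values are placeholders rather than real standards. It only assembles text; sending it to a model is left to whatever assistant or agent you use.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ContextDoc:
    """One internal source the model is allowed to rely on (hypothetical structure)."""
    title: str
    body: str


def build_prompt(task: str, docs: list[ContextDoc], glossary: dict[str, str] | None = None) -> str:
    """Assemble a prompt grounded in company-specific context (illustrative sketch).

    Refuses to run without context, because a context-free prompt is exactly
    what produces generic, plausible-sounding filler.
    """
    if not docs:
        raise ValueError("no internal sources supplied; refusing to build a generic prompt")

    # Each internal source becomes its own titled block.
    source_blocks = [f"### {doc.title}\n{doc.body.strip()}" for doc in docs]
    # Company meanings for terms the model would otherwise read in their everyday sense.
    glossary_lines = [f"- {term}: {meaning}" for term, meaning in sorted((glossary or {}).items())]

    parts = [
        "Use ONLY the internal sources and glossary below.",
        "If a topic is not covered, write 'UNKNOWN: needs SME input' instead of guessing.",
        "## Internal sources",
        *source_blocks,
        "## Glossary (company meanings override common usage)",
        "\n".join(glossary_lines) if glossary_lines else "(none)",
        "## Task",
        task.strip(),
    ]
    return "\n\n".join(parts)


# Placeholder content only, not real company standards.
prompt = build_prompt(
    "Draft an outline (headings only) for our API design template.",
    [ContextDoc("API naming conventions", "<paste the internal naming standard here>")],
    {"<internal acronym>": "<what it means in our company>"},
)
print(prompt)
```

Asking for an outline rather than a full document is deliberate: it keeps the first output small enough for an expert to review before anyone becomes attached to it.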

Have AI-generated content validated by subject-matter experts (SMEs), early.
Not at the end, when twenty pages exist and every section carries political weight, but at the outline stage, when throwing things away still costs nothing. SME review should happen before the document becomes something people feel they need to debate line by line.

The grapevine effect.
When AI output becomes the input for the next AI task, which becomes the input for a human summary, which becomes the input for another prompt, the final artifact retains very little of the original source. It is the same dynamic as a rumor traveling through a team: each individual step feels reasonable, but the end state is not.

Corporate language is not everyday language.
An LLM is trained on the general internet, not on your company. A term that carries a precise meaning in your meetings will be read by the model in its common, everyday sense. Acronyms, internal product names, domain-specific jargon, and team nicknames can all silently shift meaning in generated text, and you will not always notice until a downstream reader acts on the wrong interpretation.

The 30% AI content rule.
The “30% AI rule” is a simple guideline used in education: no more than 30% of a piece of work should come directly from AI tools. It is mostly applied to keep university essays honest, but the principle translates well to enterprise documentation. Applied to our case, it would likely have saved us months of iterations.
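One way to make the rule checkable is to tag each chunk of a document by origin and compare word counts. The sketch below assumes that tagging exists (it usually does not by default) and uses word count as a crude proxy; the guideline itself does not prescribe a measurement. `Section`, `ai_share`, and `passes_30_percent_rule` are hypothetical names chosen for illustration, and the sample draft is invented.

```python
from __future__ import annotations

from typing import NamedTuple


class Section(NamedTuple):
    """A chunk of a document tagged with where its text came from (hypothetical tagging)."""
    origin: str  # "ai" = pasted directly from an AI tool; anything else = written or rewritten by a person
    text: str


def ai_share(sections: list[Section]) -> float:
    """Fraction of words that came directly from AI tools (0.0 for an empty document)."""
    total = sum(len(s.text.split()) for s in sections)
    if total == 0:
        return 0.0
    ai_words = sum(len(s.text.split()) for s in sections if s.origin == "ai")
    return ai_words / total


def passes_30_percent_rule(sections: list[Section], limit: float = 0.30) -> bool:
    """True if no more than `limit` of the words came directly from AI."""
    return ai_share(sections) <= limit


# Invented sample: 10 human-written words, 6 AI-generated words.
draft = [
    Section("human", "Scope: internal REST services only, owned by the platform team."),
    Section("ai", "Consider caching strategies for high-traffic endpoints."),
]
print(f"AI share: {ai_share(draft):.1%}, passes: {passes_30_percent_rule(draft)}")
```

The number matters less than the habit: a draft that fails the check is a signal that someone still needs to rewrite it in their own words, and in doing so take ownership of it.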

The real cost

The most important lesson is about cost asymmetry. Generating plausible text is essentially free. Verifying it, reconciling it with the rest of the company’s knowledge, and cleaning it up once it has spread costs far more.

AI is genuinely useful in documentation work. It speeds up drafting and removes blank-page paralysis. But if you do not verify sources, and if you lack a clear policy on documentation review and approval, you will not save time with AI. You will create a legacy that pollutes your organization, spreading false information through every document that builds on it.