AI Content at Scale Tends Toward the Average. Here Is Why, and What to Do About It.
On signal architecture, ADK skill design, and the engineering behind content that survives compression.
Something broke in how most organisations are thinking about AI content. The framing that keeps circulating is a human vs. AI binary: use AI for production speed, use humans for authenticity, keep the "crown jewels" away from the machines. It is a clean narrative. It is also missing the actual problem.
The actual problem is not where the humans are in the loop. It is what the humans put into the system before the AI starts generating. The teams producing generic, interchangeable AI content are not generating too much. They are not using the wrong model. They have not automated too aggressively. They have built pipelines that receive generic inputs and are surprised to get generic outputs.
This post is about how that failure maps to concrete architectural decisions in Google ADK, and what it looks like to build a content system that does not have this problem. We will get specific about skill file structure, signal encoding in references/, and model routing with Gemini 3, because most conversations about this stay at an abstraction level too high to act on.
The Amplifier Property
Language models are statistical amplifiers. Given a rich, specific, opinionated input, they produce rich, specific, opinionated output. Given a generic input, they produce a synthesis of the training distribution adjacent to that input, which in 2026 is an enormous corpus of enterprise marketing content that has converged toward a very small vocabulary of safe, interchangeable expressions.
This is not a flaw. It is exactly what the model is supposed to do. The model cannot differentiate your output from your competitors' output if your input does not differentiate your position from your competitors' position.
The practical implication: the return on improving your input signal is higher than the return on upgrading your model. We have seen teams get dramatically better output by rewriting their skill instruction structure than by switching between frontier models. The question is not whether the model is good enough. It is whether the model has enough to work with.
Why System Prompts Are Not Enough
Most teams answer the signal problem by putting everything in the system prompt. Brand guidelines, audience descriptions, tone notes, output constraints. One large block of context at session initialisation.
This fails for two reasons.
First, the system prompt is static within a session. A content pipeline processing multiple campaigns, audiences, and formats in a single session cannot differentiate signal per task through a fixed session-level instruction. Every task gets the same base context. Everything drifts toward the same output.
Second, signal in the system prompt competes with everything else that accumulates in the context window. In a long-running multi-agent session, the specific owned claims that were meant to ground the output gradually become background noise diluted by conversation history, tool outputs, and intermediate results. The model is still following instructions, but the instructions are being crowded out.
The right architecture treats signal injection as a skill-level operation, not a session-level configuration. ADK's Skills feature exists precisely to solve this.
ADK Skills: The Three-Level Architecture
In ADK, a Skill is a self-contained unit of functionality loaded on demand by an agent. It is distinct from a tool (a callable Python function). A Skill is a file-based knowledge and instruction package organised into three levels. This incremental loading model is the structural property that makes Skills the right place to encode signal.
my_agent/
  agent.py
  skills/
    campaign_brief/
      SKILL.md              # L1 metadata + L2 instructions (required)
      references/
        brand_signal.md     # L3: owned claims, checksum, vocabulary
        audience.md         # L3: belief states per audience segment
        brief_guide.md      # L3: what a valid brief contains
      assets/
        brief_schema.json   # L3: structured output format
    content_generator/
      SKILL.md
      references/
        voice_guide.md
        anti_patterns.md
    compliance_check/
      SKILL.md
      references/
        brand_signal.md     # same source of truth, referenced, not duplicated
Progressive Skill Loading
The three levels load progressively:
- L1 (metadata): The frontmatter of SKILL.md. Name and description only. Loaded at agent startup for skill discovery. Negligible context cost.
- L2 (instructions): The body of SKILL.md. The primary instructions the agent follows when this skill activates. Loaded when the skill triggers.
- L3 (resources): Everything in references/ and assets/. Detailed signal documents, templates, schemas. Loaded on demand as the skill needs them during execution.
Your brand_signal.md does not occupy context window space during tasks that do not require it. Your audience.md does not load during a compliance check. The signal is available when needed and absent when it is not. This is why Skills are architecturally superior to system prompts for signal encoding: the loading is scoped to the task.
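ADK performs this loading for you, but the discipline is easy to see in miniature. Here is a standalone sketch, not the ADK implementation, with deliberately naive frontmatter parsing, of reading each level separately so that no level pays for the others:

```python
from pathlib import Path

def load_l1_metadata(skill_dir: Path) -> dict:
    """L1: read only the frontmatter of SKILL.md (name, description)."""
    text = (skill_dir / "SKILL.md").read_text()
    meta, in_frontmatter = {}, False
    for line in text.splitlines():
        if line.strip() == "---":
            if in_frontmatter:
                break  # closing fence: stop before the body is ever read
            in_frontmatter = True
            continue
        if in_frontmatter and ":" in line:
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
    return meta

def load_l2_instructions(skill_dir: Path) -> str:
    """L2: the body of SKILL.md after the frontmatter, read on activation."""
    parts = (skill_dir / "SKILL.md").read_text().split("---", 2)
    return parts[2].strip() if len(parts) == 3 else parts[0].strip()

def load_l3_reference(skill_dir: Path, name: str) -> str:
    """L3: one references/ document, read on demand mid-task."""
    return (skill_dir / "references" / name).read_text()
```

The point of the sketch is the separation: startup touches frontmatter only, activation touches the instruction body, and each signal document is an explicit on-demand read rather than a permanent occupant of the context window.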
What Good Signal Actually Looks Like
Before going into the skill structure, it is worth being precise about what "signal" means and what it does not mean.
Signal is specific, falsifiable, differentiated claims. Signal is: we deployed a multi-agent compliance system in an NDMO-classified data environment in Saudi Arabia. We reduced false positive escalations by 62%. Our co-founder built NLP systems for cancer genomics AI at Cambridge. These are not interchangeable with your competitors' claims.
Signal is not brand voice guidelines. "Bold, innovative, human-centric" is not signal. It is aspiration described with adjectives. The model cannot use it to make any real output decision because it cannot distinguish your adjectives from the identical adjectives your competitors use.
The test: if you can swap your company name with a direct competitor and the claim still reads true, it is not signal. It is category description. Category description produces category-average output.
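The swap test can be approximated mechanically. The heuristic below is my own sketch, not ADK machinery, and the anti-vocabulary list is illustrative: a claim that carries signal usually contains a specific marker (a number, a proper noun) and none of the category-average adjectives.

```python
import re

# Illustrative anti-vocabulary; the real list lives in references/brand_signal.md.
ANTI_VOCABULARY = {
    "innovative", "cutting-edge", "seamless", "robust",
    "scalable", "industry-leading", "next-generation",
}

def looks_like_signal(claim: str) -> bool:
    """Rough proxy for the swap test.

    Passes if the claim carries a specific marker (a digit, or a proper noun
    after the first word) and uses no category-average adjectives.
    """
    if any(w.strip(".,;") in ANTI_VOCABULARY for w in claim.lower().split()):
        return False
    has_number = bool(re.search(r"\d", claim))
    has_proper_noun = bool(re.search(r"(?<=\s)[A-Z][a-z]+", claim))
    return has_number or has_proper_noun

looks_like_signal("We reduced false positive escalations by 62%")   # specific: True
looks_like_signal("We deliver innovative, scalable AI solutions")   # generic: False
```

This is a linting aid, not a substitute for editorial judgment; it catches the obvious category descriptions, not the subtle ones.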
The SKILL.md: Where Instruction Lives
The SKILL.md file is the L2 layer. It is the instruction the agent loads when it decides to use this skill. The quality of what you write here, and the specificity of what it asks the agent to load from references/, determines output quality.
Here is what a SKILL.md for a campaign brief generator looks like when signal architecture is taken seriously:
---
name: campaign-brief-generator
description: >
Generates structured campaign briefs grounded in brand signal and audience
belief state. Must be run before any content generation task.
---
## Purpose
Produce a brief that forces specificity before content generation begins.
The brief is not a form. It is the primary signal document passed to the
content generation skill. Every field must be specific enough that it could
not apply to a competitor.
## Steps
Step 1: Load `references/brand_signal.md`. Identify:
- The owned claims this brand has earned the right to make
- The brand checksum: 15-25 words that must survive any compression
- The anti-vocabulary: words that appear in every competitor's content
Step 2: Load `references/audience.md`. Identify:
- The audience segment relevant to this campaign
- Their current belief about this problem or solution category
- The specific belief change this campaign must produce
Step 3: Select a proof point. A proof point is a specific, verifiable claim
from `references/brand_signal.md` that supports the campaign's single truth.
Numbers, named outcomes, and specific capabilities qualify.
Generic claims ("industry-leading", "innovative") do not.
Step 4: Complete the brief using `assets/brief_schema.json` as the output
schema. Every field must be filled with content specific enough that it
could not be lifted into a competitor's brief unchanged.
Step 5: Before returning, apply these checks:
- Can the objective be met without the proof point? If yes, the proof point
is too weak. Replace it.
- Could this brief have been written six months ago? If yes, the context
field is not specific enough. Rewrite it.
- Does the single truth contain an unsupported superlative? If yes,
replace it with the proof point directly.
The brand_signal.md: What the References Layer Contains
The references/ directory is where your brand signal lives as structured markdown documents. These are not exported brand guidelines PDFs. They are documents written for AI consumption: structured, specific, testable.
A brand_signal.md that works looks different from one that does not.
Does not work: A brand voice document that says "We are bold, innovative, and human-centric. We speak plainly and avoid jargon. We lead with outcomes." The model cannot use any of this to make a real output decision.
Works:
# Brand Signal Document
## Owned claims
Claims we have earned the right to make. Each has evidence behind it.
Do not generate content that makes claims not on this list.
- We deploy multi-agent AI systems in NDMO-classified data environments
- We have shipped production systems across three regulated industries
- Our co-founder built NLP systems for cancer genomics AI at Cambridge
- We reduced client false positive escalations by 62% in a compliance context
- We operate under PDPL, NCA ECC-2:2024, and NDMO standards in Gulf deployments
## Brand checksum
These words or their semantic equivalents must survive any compression
of content that represents this brand. If an AI summarises our content
and the checksum disappears, the summary has failed.
> AI-native. Built for regulated environments most vendors avoid.
> Run by practitioners who have shipped the systems, not described them.
## Vocabulary
Words and phrases that signal credibility to our audience:
- "production" (not "prototype", not "pilot")
- "regulated environment" (not "enterprise" or "complex organisation")
- "multi-agent" (not "AI solution" or "AI platform")
- "we built" / "we shipped" (first person, past tense, specific)
## Anti-vocabulary
Do not use. These appear in every competitor's content and carry no signal.
- cutting-edge, state-of-the-art, next-generation
- empower, transform, unlock potential
- seamless, robust, scalable, innovative
- "AI-powered" as a standalone modifier with no technical specificity
Signal Architecture: Checksum and Anti-Vocabulary
Two things are worth calling out. The checksum concept is directly practical: it is a short string that encodes your positioning at maximum compression. If you take all your marketing content and ask a language model to summarise who you are in 20 words, and those words do not match your checksum, your signal is failing somewhere in the generation layer. The checksum is a testable standard, not a branding aspiration.
The anti-vocabulary list matters as much as the vocabulary list. Most brand documents tell the model what to say. A document that also tells the model what not to say, and implicitly why (because it appears in every competitor's content), produces substantially more differentiated output. The model has a direction to move away from as well as toward.
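Both properties are mechanically checkable. A sketch of the compliance pass (the term lists here are illustrative; the real ones come from brand_signal.md):

```python
def compliance_check(text: str, checksum_terms: list[str],
                     anti_vocabulary: list[str]) -> dict:
    """Validate one piece of output against the signal document.

    checksum_terms: terms that must survive compression.
    anti_vocabulary: phrases that must never appear.
    Returns PASS or FAIL with specific notes, never a judgment call.
    """
    lowered = text.lower()
    missing = [t for t in checksum_terms if t.lower() not in lowered]
    banned = [t for t in anti_vocabulary if t.lower() in lowered]
    return {
        "status": "PASS" if not missing and not banned else "FAIL",
        "missing_checksum_terms": missing,
        "anti_vocabulary_hits": banned,
    }

result = compliance_check(
    "An AI-native team shipping production systems for regulated environments.",
    checksum_terms=["AI-native", "regulated environments", "production"],
    anti_vocabulary=["cutting-edge", "seamless", "unlock potential"],
)
# result["status"] == "PASS"
```

One caveat: this is literal matching, while the checksum definition allows semantic equivalents. Catching a paraphrased checksum requires an LLM judge or embedding similarity on top of this baseline.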
Loading Skills and Routing Models
With the skill directories built, loading them into an ADK agent is straightforward:
import pathlib
from google.adk import Agent
from google.adk.skills import load_skill_from_dir
from google.adk.tools import skill_toolset
skills_path = pathlib.Path(__file__).parent / "skills"
brief_skill = load_skill_from_dir(skills_path / "campaign_brief")
content_skill = load_skill_from_dir(skills_path / "content_generator")
compliance_skill = load_skill_from_dir(skills_path / "compliance_check")
Model Routing with Gemini 3
The model routing question is separate from skill loading. Different tasks have genuinely different requirements, and treating model selection as a system-level decision is expensive, slow, and produces inconsistent quality. In ADK, each agent has its own model parameter.
A note on gemini-3-flash-preview: Gemini 3 Flash replaces 2.5 Flash as the default production model for speed-optimised tasks. Its tool-use and reasoning performance at Flash latency is substantially better than its predecessor's. For agentic pipelines where the bottleneck is generation throughput rather than reasoning depth, it is the right choice. For reasoning-heavy orchestration tasks like brief generation, gemini-3.1-pro-preview is the current recommended model following the deprecation of Gemini 3 Pro Preview in March 2026.
# Brief generation: reasoning-heavy. Use the most capable model.
brief_agent = Agent(
model="gemini-3.1-pro-preview",
name="brief_agent",
description="Generates campaign briefs grounded in brand signal.",
instruction=(
"Use the campaign-brief-generator skill. Load brand_signal.md and "
"audience.md before making any output decisions. Apply all validation "
"checks before returning the brief."
),
tools=[skill_toolset.SkillToolset(skills=[brief_skill])],
)
# Long-form generation: quality-critical. Gemini 3 Flash.
longform_agent = Agent(
model="gemini-3-flash-preview",
name="longform_agent",
description="Generates long-form brand content from a campaign brief.",
instruction=(
"Use the content-generator skill. Treat the campaign brief as the "
"primary signal document. Do not generate content that makes claims "
"not present in the brief's proof point or brand signal."
),
tools=[skill_toolset.SkillToolset(skills=[content_skill])],
)
# Social copy: higher volume, shorter outputs, faster turnaround.
social_agent = Agent(
model="gemini-3-flash-preview",
name="social_agent",
description="Generates social copy from a campaign brief.",
instruction=(
"Use the content-generator skill for social formats. "
"The brand checksum must survive in every output, even at short lengths."
),
tools=[skill_toolset.SkillToolset(skills=[content_skill])],
)
# Compliance check: structured validation, not generation.
compliance_agent = Agent(
model="gemini-3-flash-preview",
name="compliance_agent",
description="Validates content against brand signal requirements.",
instruction=(
"Use the compliance-check skill. Validate checksum presence, "
"anti-vocabulary absence, and claim coverage against owned claims. "
"Return PASS or FAIL with specific notes, never a judgment call."
),
tools=[skill_toolset.SkillToolset(skills=[compliance_skill])],
)
The Orchestrator: Enforcing Sequence
The most important structural decision is enforcing that the brief runs before content generation. In ADK, this is done through the orchestrator agent's instruction layer.
The instruction is doing something specific: it makes the brief mandatory, not optional. The operator cannot skip it. The compliance check is not advisory. This structural enforcement is what separates a content system that maintains signal quality over time from one that degrades as operators find shortcuts.
orchestrator = Agent(
model="gemini-3-flash-preview",
name="content_orchestrator",
description="Routes content production tasks through the brief-first pipeline.",
instruction=(
"You coordinate content production. The sequence is fixed:\n"
"1. Always run brief_agent first to produce a campaign brief.\n"
"2. Pass the brief as input to the appropriate content agent "
" (longform_agent for articles and blog posts, social_agent for "
" social formats).\n"
"3. Pass all generated content through compliance_agent before "
" returning it to the user.\n"
"Do not generate content without a brief. Do not return content that "
"has not passed the compliance check."
),
agents=[brief_agent, longform_agent, social_agent, compliance_agent],
)
The Compression Test
There is a practical test worth running on any content pipeline before declaring it production-ready. Take a representative sample of output, feed it to three different language models, and ask each one the same question: based solely on this content, what does this company do, what specific claims does it make, and how is it different from competitors in its category?
If the answers are specific and consistent, the signal is working. The brand checksum is surviving generation. The owned claims are present. The anti-vocabulary is absent. The system is doing what it is supposed to do.
If the answers are generic, or if different models produce incompatible descriptions of the same brand, the signal is failing. The owned claims are not making it through the generation layer. The brief is not grounding the output tightly enough. The references/ documents need more specificity.
This test is not a one-time diagnostic. Run it on a sample of output every few weeks. Brand drift in AI content systems is gradual and hard to notice from inside the system. The output continues to look fine. It just stops meaning anything.
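The model-calling half of this test depends on your environment, but the scoring half does not. Assuming you have already collected each model's answer as a string, here is one way the scoring could look (a hypothetical helper, not part of ADK; Jaccard overlap is a crude but serviceable agreement proxy):

```python
def compression_test_score(answers: list[str], checksum_terms: list[str]) -> dict:
    """Score compression-test answers collected from several models.

    checksum_coverage: fraction of checksum terms present in every answer.
    model_agreement: mean pairwise Jaccard overlap of the answers' words,
    a rough proxy for whether the models describe the same brand.
    """
    lowered = [a.lower() for a in answers]
    surviving = [t for t in checksum_terms
                 if all(t.lower() in a for a in lowered)]
    coverage = len(surviving) / len(checksum_terms)

    word_sets = [set(a.split()) for a in lowered]
    pairs = [(i, j) for i in range(len(word_sets))
             for j in range(i + 1, len(word_sets))]
    agreement = sum(
        len(word_sets[i] & word_sets[j]) / len(word_sets[i] | word_sets[j])
        for i, j in pairs
    ) / len(pairs)
    return {"checksum_coverage": coverage, "model_agreement": round(agreement, 2)}
```

Low coverage points at the generation layer (the checksum is not surviving); low agreement with high coverage points at the brief (the claims survive but do not cohere into one description). Tracking both numbers over time is what turns the periodic re-run into a drift detector.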
What Good Architecture Produces
A content system with strong signal architecture produces output that, when you strip the logo and the formatting, reads as something a specific organisation would say. Not something any organisation in the category could say.
The work that produces this is upstream of the model and unglamorous relative to the speed of generation. Writing a brand_signal.md with owned claims that have evidence behind them, not adjectives describing aspiration. Writing a campaign_brief skill that enforces proof point quality before a single piece of content is generated. Keeping a single brand_signal.md as a shared reference across the brief generator, content generator, and compliance check skills, so there is one source of truth and no drift between them.
ADK's skill structure makes this work concrete and maintainable. The SKILL.md is the instruction layer. The references/ directory is the signal library. The incremental loading model means signal documents occupy context window space only when the task requires them.
When these layers are built with discipline, the model has something to amplify. When they are not, it amplifies the median of everything it has seen before.
The models are not the constraint. The skill files are.
Bayseian builds AI-native content and marketing systems for enterprise clients across the UK, GCC, and Asia. We specialise in multi-agent architectures on Google Cloud and Vertex AI.
If you are building an AI content pipeline and want to talk about signal architecture, reach out.