AI Summary
TLDR: The AI training dataset licensing market is projected to reach 22.6 billion dollars by 2034, according to MarketIntelo. Meta alone signed 7 publisher deals in 2025. The economics are seductive, but the cannibalisation risk is real: content licensed to train LLMs feeds the same answer engines that displace publisher traffic. This opinion piece argues that publishers should license selectively and with hard structural protections, rather than sign blanket exclusive deals.
The deal flow accelerated in 2025
A Digiday timeline of 2025 publisher AI deals documented dozens of named agreements involving OpenAI, Anthropic, Meta, Microsoft, and Google. Meta alone signed 7 publisher licensing deals during the year. Headline values ranged from low seven figures to over 250 million dollars across multi-year terms.
Market analysts at MarketIntelo project the dataset licensing market to reach 22.6 billion dollars by 2034, growing at a CAGR of roughly 25% from 2025. The pool of buyers is small (frontier labs and a handful of enterprises), but the cheque sizes are large.
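The projection implies a starting market size that is not stated here. A minimal sketch of the back-calculation, assuming growth of exactly 25% a year and nine compounding years (2025 to 2034); both are simplifications of the "roughly 25%" figure, and the function name is illustrative:

```python
def implied_base(future_value: float, cagr: float, years: int) -> float:
    """Back out the starting value implied by an end value and a CAGR."""
    return future_value / (1 + cagr) ** years


# Assumptions: 22.6 (billion dollars) in 2034, exactly 25% CAGR, 9 years from 2025.
base_2025 = implied_base(22.6, 0.25, 9)
print(f"Implied 2025 market size: about {base_2025:.1f} billion dollars")
```

Under those assumptions the implied 2025 base is roughly 3 billion dollars. A slightly different growth rate or start year shifts that figure noticeably, so treat it as an order-of-magnitude check rather than a sourced number.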
The cannibalisation concern is not hypothetical
Every article fed into a training dataset improves an LLM's ability to answer questions on that topic without sending the user to the original source. Publishers who license blindly are subsidising the construction of the very answer engines that suppress their referral traffic.
- The optimistic view: Licensing revenue replaces declining ad revenue, and visible attribution preserves brand equity.
- The pessimistic view: Even with attribution, AI engines absorb the answer and click-through rates on cited content collapse by 60 to 90%.
- The honest view: Both are partially true. Outcome depends entirely on deal structure and content type.
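One way to make the honest view concrete is a break-even check that sets the licence fee against the referral revenue put at risk. The sketch below is illustrative only: the fee, the at-risk referral revenue, and the function name are hypothetical, the 60% and 90% loss rates are simply the pessimistic range quoted above rather than measured values, and it assumes the whole traffic loss is attributable to the licensed use.

```python
def net_licensing_effect(annual_fee: float,
                         referral_revenue: float,
                         traffic_loss_rate: float) -> float:
    """Licence fee minus the referral revenue lost to answer engines."""
    if not 0.0 <= traffic_loss_rate <= 1.0:
        raise ValueError("traffic_loss_rate must be between 0 and 1")
    return annual_fee - referral_revenue * traffic_loss_rate


# Hypothetical publisher: a 500,000 fee against 700,000 of at-risk referral revenue.
for loss in (0.6, 0.9):
    net = net_licensing_effect(500_000, 700_000, loss)
    print(f"traffic loss {loss:.0%}: net effect {net:+,.0f}")
```

With these made-up inputs the same deal is a net gain at the low end of the range and a net loss at the high end, which is why the outcome turns on deal structure and content type rather than on the headline fee.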
When licensing makes sense (and when it does not)
Strong candidates for licensing
- Archive content older than 2 years, where future referral traffic value is low.
- Reference and reference-adjacent material (encyclopedic, definitional) where the marginal traffic value is already declining due to AI Overviews.
- Content in categories where you have already been displaced and licensing revenue is purely additive.
Poor candidates for licensing
- Original investigative reporting, where being the canonical source still drives meaningful direct traffic and subscriptions.
- Time-sensitive content where freshness is the moat.
- Premium subscriber-only content, where licensing may erode the paywall value proposition.
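The two lists above can be turned into a rough triage rule for a content inventory. This is a sketch under stated assumptions, not a definitive policy: the field names are hypothetical, and the choice to let any poor-candidate signal override a strong-candidate signal is an added assumption that the lists themselves do not specify.

```python
from dataclasses import dataclass


@dataclass
class ContentItem:
    """Hypothetical inventory record; field names are illustrative."""
    age_years: float
    is_reference: bool = False
    already_displaced: bool = False
    is_investigative: bool = False
    is_time_sensitive: bool = False
    is_paywalled: bool = False


def licensing_candidate(item: ContentItem) -> str:
    """Return 'poor', 'strong', or 'review' based on the criteria listed above."""
    # Assumption: poor-candidate signals win, so traffic and paywall value are protected first.
    if item.is_investigative or item.is_time_sensitive or item.is_paywalled:
        return "poor"
    if item.age_years > 2 or item.is_reference or item.already_displaced:
        return "strong"
    # Nothing matched either list, so the item needs a human decision.
    return "review"


print(licensing_candidate(ContentItem(age_years=3.0)))
print(licensing_candidate(ContentItem(age_years=5.0, is_paywalled=True)))
```

Items that match neither list fall into "review", since the criteria here do not cover every content type.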
How to structure a defensible licensing deal
If you decide to license, the contract terms matter as much as the cheque size. Minimum protections to negotiate:
- Non-exclusivity by default. Exclusive deals lock you out of competing buyers and depress future market price.
- Mandatory attribution with linkback. The model must surface a clickable source citation, not a vague mention.
- Carve-outs for premium tiers. Exclude paywalled or member-only content from training corpora.
- Renegotiation triggers. If the buyer’s market cap or AI revenue grows beyond a threshold, your fee escalates.
- Right to audit usage. The buyer must report which prompts surface your content and at what frequency.
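A renegotiation trigger is easier to negotiate when it is written as an explicit rule. The sketch below shows one possible shape, a single step-up once the buyer's AI revenue passes a threshold. The 25% uplift, the threshold, and the function name are hypothetical placeholders; a real clause might use tiers, caps, or market cap instead.

```python
def escalated_fee(base_fee: float,
                  buyer_ai_revenue: float,
                  revenue_threshold: float,
                  uplift: float = 0.25) -> float:
    """Apply a single step-up to the fee once buyer revenue exceeds the threshold."""
    if base_fee < 0 or revenue_threshold <= 0 or uplift < 0:
        raise ValueError("fee and uplift must be non-negative, threshold positive")
    if buyer_ai_revenue > revenue_threshold:
        return base_fee * (1 + uplift)
    return base_fee


# Hypothetical terms: 1,000,000 base fee, trigger at 1 billion of buyer AI revenue.
print(escalated_fee(1_000_000, 2_000_000_000, 1_000_000_000))
print(escalated_fee(1_000_000, 500_000_000, 1_000_000_000))
```

Writing the trigger this way also makes clear which figure the buyer has to report, which ties the clause to the audit right.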
Track the downstream impact using the GEO/AEO Tracker. Publishers who license without monitoring downstream citation behaviour cannot tell whether the deal helped or hurt total brand visibility.
Frequently Asked Questions
Are smaller publishers being offered licensing deals too?
Does blocking AI crawlers via robots.txt strengthen my negotiating position?
What is a fair per-article licensing rate?