Does this decoder still work for AI Mode citation URLs?

No. AI Mode dropped its text fragment coverage from 70.9% in March 2026 to 0% in May 2026. AI Mode citation URLs no longer carry a #:~:text= fragment, so the decoder returns None for all of them. The decoder is now specifically a Gemini tool.

What percentage of Gemini citation URLs carry a text fragment?

84.1% of Gemini citation URLs carry a text fragment, based on our analysis of 13,487 Gemini citation URLs in the May 2026 study of 153,425 total citations across six platforms. The remaining 15.9% of Gemini citations give you the URL only, with no sentence-level data.

How long are the sentences Gemini typically cites?

Cited sentences in our May 2026 dataset average 9.27 words, with a median of 10 words. The 6-10 word range accounts for 45.2% of all Gemini citations, and no cited sentence in the dataset exceeds 18 words.

Where on the page does Gemini typically cite sentences from?

The mean cited sentence position is 37% through the document. 74.9% of cited sentences appear in the first half of the page. If your most important claims are buried past the midpoint, Gemini is less likely to cite them.

Do other AI platforms like ChatGPT or Perplexity use text fragments?

No. ChatGPT, Perplexity, Copilot, and Grok have never used text fragments in their citation URLs. For those platforms, citation analysis is limited to domain frequency and cross-platform overlap. Sentence-level extraction from citation URLs only works for Gemini.

How to Decode Gemini Citations: Text Fragments

AI Summary

Gemini is now the only major AI platform that encodes cited sentences in citation URLs via text fragments, with 84.1% fragment coverage across 13,487 citation URLs in our May 2026 study of 153,425 citations. AI Mode dropped from 70.9% to 0% over the same period. The decoder uses urllib.parse.unquote and urlparse from the Python standard library. Cited sentences average 9.27 words; the 6-10 word range accounts for 45.2% of citations. Mean cited sentence position is 37% through the document; 74.9% of cited sentences are in the first half of the page. Readability is bimodal: 22.9% Very Easy (Flesch 90+) and 20.5% Very Confusing (under 30), with a dead zone at Flesch 50-59 (2.6%). The batch script handles prefix/suffix components and outputs a decoded CSV ready for positional analysis.

Gemini encodes the exact sentence it cites from a source page inside the citation URL itself. With a few lines of Python you can decode any Gemini citation that carries a text fragment, audit which of your sentences are getting picked up, and run a batch audit against a full CSV of URLs. This tutorial covers the spec, the decoder, a batch audit script, and three analytical use cases.

What changed: AI Mode fragments are dead, Gemini fragments are everywhere

In our March 2026 study of 42,971 citations, Google AI Mode carried a #:~:text= fragment in 70.9% of its citation URLs. Gemini was at 51.8%. By May 2026, that picture had completely reversed. Our analysis of 153,425 citations across six platforms found AI Mode fragment coverage had dropped to 0%. Google silently removed the mechanism between March and May 2026. Gemini went the other direction: 84.1% of Gemini’s 13,487 citation URLs now carry a text fragment, up from 51.8% two months earlier.

If you were monitoring AI Mode citation URLs for fragment data, your scripts have been returning empty strings since May 2026. Pivot that tooling to Gemini. Gemini is now the only major AI platform that encodes, at the sentence level, exactly what it cited from your page.

Fragment coverage by platform (May 2026, 153,425 citations)

Platform	Fragment coverage	Change vs March 2026
Gemini	84.1%	Up from 51.8%
AI Mode	0%	Down from 70.9%
ChatGPT	0%	No change
Perplexity	0%	No change
Copilot	0%	No change
Grok	0%	No change

ChatGPT, Perplexity, Copilot, and Grok have never used text fragments. For those four platforms you can analyse domain frequency and citation overlap, but you cannot decode the cited sentence from the URL alone.
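
To measure fragment coverage on your own citation exports, a minimal sketch along these lines may help. The function names and the example.com URLs are illustrative assumptions, not part of the study tooling; the check only looks for a text directive in the URL fragment.

```python
from urllib.parse import urlparse

def has_text_fragment(url: str) -> bool:
    """True if the URL's fragment contains a text directive."""
    return ":~:text=" in urlparse(url).fragment

def fragment_coverage(urls: list[str]) -> float:
    """Share of URLs (0.0 to 1.0) that carry a text fragment."""
    if not urls:
        return 0.0
    return sum(has_text_fragment(u) for u in urls) / len(urls)

# Hypothetical sample: two URLs with a text fragment, two without
sample = [
    "https://example.com/a#:~:text=First%20sentence",
    "https://example.com/b",
    "https://example.com/c#intro",
    "https://example.com/d#:~:text=prefix-,Second%20sentence",
]
print(f"{fragment_coverage(sample):.1%}")  # 50.0%
```

Run it once per platform and compare the resulting percentages against the table above.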

What is a #:~:text= fragment?

Text Fragments are a WICG specification (first shipped in Chromium and documented at web.dev) that lets a URL encode an exact passage to highlight on the destination page. The format is:

#:~:text=[prefix-,]textStart[,textEnd][,-suffix]

textStart: the cited sentence (the primary extraction target)
textEnd: optional range end for multi-sentence spans
prefix- and -suffix: disambiguating context when the same sentence appears more than once on the page

Browsers use these fragments to scroll the user to the cited passage and visually highlight it. Gemini repurposes the same mechanism to encode the exact sentence it extracted, which is what makes sentence-level citation analysis possible. This is the technical foundation of our atomic sentence research: cited sentences in our May 2026 dataset average 9.27 words, with a median of 10 words, and none exceed 18 words. The 6-10 word range accounts for 45.2% of all cited sentences.
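
To make the four components concrete, here is a rough sketch that splits the value after text= into prefix, textStart, textEnd, and suffix. The TextDirective class and parse_text_directive function are hypothetical names introduced for illustration, and the sketch assumes literal commas and hyphens inside the text are percent-encoded, as the format requires.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import unquote

@dataclass
class TextDirective:
    text_start: str
    text_end: Optional[str] = None
    prefix: Optional[str] = None
    suffix: Optional[str] = None

def parse_text_directive(value: str) -> TextDirective:
    """Split the part after 'text=' into its components (illustrative helper)."""
    prefix = suffix = None
    middle = []
    for part in value.split(","):
        if part.endswith("-"):        # prefix- carries a trailing hyphen
            prefix = unquote(part[:-1])
        elif part.startswith("-"):    # -suffix carries a leading hyphen
            suffix = unquote(part[1:])
        else:                         # textStart, then optional textEnd
            middle.append(unquote(part))
    if not middle:
        raise ValueError("text directive has no textStart")
    text_end = middle[1] if len(middle) > 1 else None
    return TextDirective(middle[0], text_end, prefix, suffix)

d = parse_text_directive("is%20an-,eating%20pattern,of%20fasting,-and%20eating")
print(d.prefix, "|", d.text_start, "|", d.text_end, "|", d.suffix)
```

For single-sentence citations only text_start is populated; the other fields stay None.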

The Python decoder

The decoder uses only the Python standard library. No dependencies to install:

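from urllib.parse import unquote, urlparse

def decode_gemini_citation(url):
    """Return the decoded textStart of a text-fragment URL, or None if there is none."""
    frag = urlparse(url).fragment
    if not frag.startswith(":~:text="):
        return None
    # Components are comma-separated; literal commas in the text are percent-encoded
    parts = frag.replace(":~:text=", "").split(",")
    # Drop prefix- (trailing hyphen) and -suffix (leading hyphen) components
    text = [p for p in parts if not p.endswith("-") and not p.startswith("-")]
    return unquote(text[0]) if text else None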

Pass any Gemini citation URL into decode_gemini_citation() and you get back the exact sentence Gemini cited. For example:

decode_gemini_citation(
    "https://www.healthline.com/nutrition/intermittent-fasting-guide"
    "#:~:text=Intermittent%20fasting%20is%20an%20eating%20pattern"
    "%20that%20cycles%20between%20periods%20of%20fasting%20and%20eating"
)
# Returns: "Intermittent fasting is an eating pattern that cycles between periods of fasting and eating"

Mental walkthrough of each line:

urlparse(url).fragment extracts everything after the #. For a text fragment URL this gives you :~:text=Intermittent%20fasting...
The guard clause if not frag.startswith(":~:text=") returns None for citation URLs without a fragment (15.9% of Gemini citations carry no fragment).
.replace(":~:text=", "") strips the :~:text= marker, leaving the encoded text parameters.
.split(",") breaks the fragment into its components, then the list comprehension drops any prefix- (ends with a hyphen) and -suffix (starts with a hyphen) parts, leaving textStart as the first surviving component.
unquote() decodes percent-encoding back to readable text.
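
The walkthrough above can be checked against a few edge cases. The sketch below repeats the decoder so it runs on its own and feeds it made-up example.com URLs; whether Gemini ever emits a fragment that starts with an element ID (the last case) is not something our data speaks to, so treat that case as a hypothetical.

```python
from urllib.parse import unquote, urlparse

def decode_gemini_citation(url):
    # Same logic as the decoder above, repeated so this block is self-contained
    frag = urlparse(url).fragment
    if not frag.startswith(":~:text="):
        return None
    parts = frag.replace(":~:text=", "").split(",")
    text = [p for p in parts if not p.endswith("-") and not p.startswith("-")]
    return unquote(text[0]) if text else None

# Hypothetical URLs covering the main shapes a citation URL can take
cases = {
    "plain": "https://example.com/p#:~:text=Short%20cited%20sentence",
    "prefix_suffix": "https://example.com/p#:~:text=before-,Short%20cited%20sentence,-after",
    "no_fragment": "https://example.com/p",
    "anchor_first": "https://example.com/p#faq:~:text=Short%20cited%20sentence",
}
results = {name: decode_gemini_citation(url) for name, url in cases.items()}
for name, sentence in results.items():
    print(f"{name}: {sentence!r}")
```

The first two cases decode to the same sentence, and the URL without a fragment returns None. The last case also returns None because the startswith guard is strict: a fragment that begins with an element ID before :~: is skipped by this decoder.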

[Figure: The citation URL decoding flow]

Three use cases for the decoder

Audit your own citations. Run a Gemini query for your target keywords, copy the citation URLs, decode the fragments, and see exactly which of your sentences Gemini chose. Cross-reference with the top-35% positional rule: in our May 2026 data, the mean cited sentence sits at 37% through the document, with 74.9% of cited sentences in the first half of the page. If Gemini is citing your footer, something is wrong.
Reverse-engineer competitor strategy. Decode citations from competitor pages on the same query and look for patterns: do their cited sentences sit in the intro, the FAQ, or the conclusion? Are they 8 words or 14? Plain or technical? Our bimodal readability research shows Gemini cites both very easy content (Flesch 90+, 22.9% of citations) and very confusing technical content (Flesch under 30, 20.5% of citations). The dead zone is Flesch 50-59, where just 2.6% of cited sentences land. If your competitor pages all live in that middle register, that is an opening.
Build a citation tracker. Schedule a weekly Gemini scrape of your top 50 keywords, decode all fragments, and store sentence plus URL plus query in a database. Over time you build a corpus of which sentence patterns earn citations on your domain. This is the foundation of serious AI brand visibility tracking.
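
For the citation tracker, one possible storage layer is SQLite from the standard library. This is a sketch under assumptions: the table name, column names, and helper functions are invented for illustration, and the scraping step that produces the query, URL, and decoded sentence is left out.

```python
import sqlite3
from datetime import date

def open_tracker(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a tracker database with a hypothetical citations table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS citations (
               query    TEXT NOT NULL,
               url      TEXT NOT NULL,
               sentence TEXT NOT NULL,
               seen_on  TEXT NOT NULL,
               UNIQUE (query, url, sentence, seen_on)
           )"""
    )
    return conn

def record_citation(conn, query, url, sentence, seen_on=None):
    """Store one decoded citation; repeats on the same day are ignored."""
    seen_on = seen_on or date.today().isoformat()
    conn.execute(
        "INSERT OR IGNORE INTO citations (query, url, sentence, seen_on) "
        "VALUES (?, ?, ?, ?)",
        (query, url, sentence, seen_on),
    )
    conn.commit()

conn = open_tracker()  # in-memory for the demo; pass a file path to persist
row = ("intermittent fasting", "https://example.com/guide",
       "Fasting is an eating pattern.", "2026-05-04")
record_citation(conn, *row)
record_citation(conn, *row)  # duplicate, ignored by the UNIQUE constraint
total = conn.execute("SELECT COUNT(*) FROM citations").fetchone()[0]
print(total)  # 1
```

Keeping seen_on in the uniqueness key means a sentence cited in several weekly runs produces one row per run, which is what lets you count how persistently a sentence earns citations.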

Batch audit script

If you want to process a CSV of Gemini citation URLs, here is the complete loop with prefix/suffix handling:

import csv
from urllib.parse import unquote, urlparse

def decode_fragment(url):
    """Return the decoded textStart of a text-fragment URL, or "" if absent."""
    frag = urlparse(url).fragment
    if ":~:text=" not in frag:
        return ""
    # Keep only the first text directive; further directives are joined with "&"
    text_part = frag.split(":~:text=", 1)[1].split("&", 1)[0]
    components = text_part.split(",")
    # Strip prefix (ends with -) and suffix (starts with -)
    text_components = [c for c in components if not c.endswith("-") and not c.startswith("-")]
    if not text_components:
        return ""
    return unquote(text_components[0])

# The csv module expects files opened with newline=""
with open("citations.csv", newline="", encoding="utf-8") as f, \
        open("decoded.csv", "w", newline="", encoding="utf-8") as out:
    reader = csv.DictReader(f)
    writer = csv.writer(out)
    writer.writerow(["url", "cited_sentence"])
    for row in reader:
        sent = decode_fragment(row["url"])
        writer.writerow([row["url"], sent])

The script expects a CSV with a url column. It writes decoded.csv with the original URL and the decoded cited sentence side by side. Rows without a fragment get an empty string. A production-ready version with source-page matching and sentence-boundary chunking analysis is in the grounding-citation-analysis repo.

What to do with decoded sentences

Decoding is the easy part. The analytical work is matching decoded sentences back to your page and drawing conclusions. For each decoded sentence, check three things:

Position on page: Is it in the top third? The mean cited sentence sits at 37% through the document in our May 2026 dataset. Sentences cited from the last quarter of a page are unusual and worth investigating.
Word count: The 6-10 word range accounts for 45.2% of all Gemini citations, and no cited sentence in our dataset exceeds 18 words. If your cited sentences are consistently short, that pattern is worth replicating in new content.
Structural context: Is the cited sentence from a list item, a table cell, or prose? See our schema markup research for how structure signals affect AI retrieval.
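
The first two checks can be sketched in a few lines. This is an assumption-laden illustration: it measures position as the character offset of the sentence within the page's plain text, which may differ from how position was measured in the study, and it expects you to supply the page text yourself. The function names are hypothetical.

```python
import re

def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so small formatting differences still match."""
    return re.sub(r"\s+", " ", text).strip().lower()

def sentence_metrics(page_text: str, sentence: str):
    """Return relative position (0.0 to 1.0) and word count, or None if not found."""
    page = normalize(page_text)
    target = normalize(sentence)
    if not page or not target:
        return None
    idx = page.find(target)
    if idx == -1:
        return None
    return {
        "position": idx / len(page),          # character-offset based (assumption)
        "word_count": len(sentence.split()),
    }

# Made-up page text with the cited sentence near the top
page = (
    "Intro line. " * 3
    + "Fasting cycles between eating and not eating. "
    + "Filler text. " * 10
)
metrics = sentence_metrics(page, "Fasting cycles between eating and not eating.")
print(metrics)
```

A None result usually means the page changed since the citation was generated or the extracted page text differs from what Gemini saw, which is worth logging rather than discarding.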

For AI Mode-specific signals now that AI Mode no longer emits fragments, our Google AI Mode optimization playbook covers the updated approach. For Gemini specifically, the complete Gemini optimization guide ties fragment-level findings to on-page changes with the highest citation impact. To track Gemini citation share alongside ChatGPT and Perplexity in a unified dashboard, our open-source GEO/AEO Tracker handles multi-platform monitoring at no cost.

The practical optimization loop

In our client work, we run a weekly cycle: collect Gemini citation URLs for target queries, decode all fragments, match each decoded sentence to its position on the source page, and update a running spreadsheet of sentence patterns. Over 4-6 weeks, clear patterns emerge. Sentences in the top 40% of the page that are 7-12 words and state a single declarative fact dominate the cited-sentence corpus. That is unlikely to be a coincidence: it is consistent with how Gemini’s retrieval pipeline appears to score and select sentences.
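
As a rough illustration of the weekly summary step, the sketch below aggregates per-sentence records into the same kinds of figures quoted in this post. The record layout (a position between 0 and 1 plus a word count) and the sample values are assumptions made up for the example.

```python
from statistics import mean

def summarize(records):
    """Aggregate per-sentence records into corpus-level pattern statistics."""
    if not records:
        return {}
    n = len(records)
    return {
        "mean_position": mean(r["position"] for r in records),
        "share_first_half": sum(r["position"] < 0.5 for r in records) / n,
        "share_6_to_10_words": sum(6 <= r["word_count"] <= 10 for r in records) / n,
        "max_word_count": max(r["word_count"] for r in records),
    }

# Hypothetical records, one per decoded and matched sentence
records = [
    {"position": 0.10, "word_count": 8},
    {"position": 0.30, "word_count": 12},
    {"position": 0.40, "word_count": 9},
    {"position": 0.80, "word_count": 15},
]
summary = summarize(records)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Comparing these numbers week over week, per domain or per page template, is one way to see whether content changes shift which sentences get cited.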

The decoder is a diagnostic tool, not a magic fix. What it gives you is ground truth about which sentences Gemini selected, so you stop guessing and start measuring. Run a GEO audit alongside the decoder output to identify the structural and semantic signals your highest-cited pages share. Apply those patterns to pages with strong organic traffic but low Gemini citation share. For a full picture of how fragment data fits into a sentence-level GEO strategy, the 153,425-citation May study post covers readability, positional bias, and platform-by-platform citation playbooks in detail.
