
How to Decode AI Mode Citations: Text Fragments Tutorial (with Python)

By Daniel Shashko · 3 min read
AI Summary
Google AI Mode and Gemini citation URLs include a #:~:text= fragment that encodes the exact sentence cited from a source page. A 5-line Python script can decode these fragments, so you can audit your own citations or reverse-engineer competitor strategies. The method only works when a fragment is present: in the author's 42,971-citation dataset, that was 70.9% of AI Mode citations and 51.8% of Gemini citations.

TLDR: Google AI Mode and Gemini citation URLs often end in a #:~:text= fragment that encodes the exact sentence Google cited from the source page. With 5 lines of Python you can decode any citation that carries such a fragment, audit which of your sentences are getting picked up, and reverse-engineer competitor strategies. This tutorial walks through the spec, the Python decoder, a batch audit script, and three practical use cases for SEO and content teams.

What is a #:~:text= fragment?

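Text Fragments are a web specification (URL Fragment Text Directives, incubated in the WICG, first shipped in Chromium and documented at web.dev) that lets a URL encode an exact passage for the browser to scroll to and highlight on the destination page. The format is: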

#:~:text=[prefix-,]textStart[,textEnd][,-suffix]

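  • textStart: the cited sentence (the primary extraction target); when textEnd is present, it marks the start of the cited range
  • textEnd: optional range end for multi-sentence spans
  • prefix- and -suffix: disambiguating context when the same sentence appears multiple times on the page
  • Delimiters: literal "-", "," and "&" inside any part are percent-encoded, and several directives can be chained with "&", as in #:~:text=first&text=second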
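To make the four parts concrete, here is a minimal parser sketch that splits every text directive in a URL into prefix, textStart, textEnd and suffix. The function name parse_text_directives is hypothetical, and it assumes the URL follows the spec's encoding rules (delimiter characters inside the text are percent-encoded), so it is safe to split on raw commas before decoding.

```python
from urllib.parse import unquote, urlparse


def parse_text_directives(url):
    """Split every text directive in a URL fragment into its four parts.

    Returns a list of dicts with keys prefix, textStart, textEnd, suffix
    (missing parts are None), or an empty list if the fragment has no ':~:'.
    """
    # Anything before ':~:' is an ordinary anchor such as '#intro'
    _, sep, directives = urlparse(url).fragment.partition(':~:')
    if not sep:
        return []
    results = []
    # Multiple directives are joined with '&'; keep only the text= ones
    for directive in directives.split('&'):
        if not directive.startswith('text='):
            continue
        # Split on raw commas: literal commas in the text are encoded as %2C
        parts = directive[len('text='):].split(',')
        prefix = suffix = None
        if len(parts) > 1 and parts[0].endswith('-'):
            prefix = unquote(parts.pop(0)[:-1])   # drop the trailing '-'
        if len(parts) > 1 and parts[-1].startswith('-'):
            suffix = unquote(parts.pop()[1:])     # drop the leading '-'
        results.append({
            'prefix': prefix,
            'textStart': unquote(parts[0]),
            'textEnd': unquote(parts[1]) if len(parts) > 1 else None,
            'suffix': suffix,
        })
    return results
```

For example, parse_text_directives("https://example.com/page#:~:text=the-,quick%20brown,lazy%20dog,-today") returns one dict with prefix "the", textStart "quick brown", textEnd "lazy dog" and suffix "today".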

Browsers use these fragments to scroll the user to the cited passage and visually highlight it. Google’s AI Mode and Gemini repurpose the same mechanism to encode the exact sentence they extracted, which is what makes sentence-level citation analysis possible.
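Seeing the encoding side helps when testing a decoder. The sketch below builds a text-fragment URL from plain strings; make_text_fragment_url is a hypothetical helper for generating test inputs, not how Google constructs its citation links. It assumes the spec's rule that "-", "," and "&" must be percent-encoded inside the text.

```python
from urllib.parse import quote


def make_text_fragment_url(page_url, text_start, text_end=None, prefix=None, suffix=None):
    """Build a '#:~:text=' URL for a passage (hypothetical test helper)."""
    def enc(s):
        # quote(safe='') encodes ',' and '&' but leaves '-' alone,
        # so encode '-' by hand to keep it from looking like a delimiter
        return quote(s, safe='').replace('-', '%2D')

    parts = [enc(text_start)]
    if text_end:
        parts.append(enc(text_end))
    if prefix:
        parts.insert(0, enc(prefix) + '-')   # prefix is marked by a trailing '-'
    if suffix:
        parts.append('-' + enc(suffix))      # suffix is marked by a leading '-'
    return f"{page_url}#:~:text={','.join(parts)}"
```

For instance, make_text_fragment_url("https://example.com/a", "well-known fact, really") produces https://example.com/a#:~:text=well%2Dknown%20fact%2C%20really, which pastes into a Chromium address bar as a highlight link.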

The 5-line Python decoder

Here is the minimal decoder:

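from urllib.parse import unquote, urlparse
def decode(url):
    frag = urlparse(url).fragment
    # The directive may follow a normal anchor, e.g. "#intro:~:text=..."
    if ':~:text=' not in frag:
        return None
    # Keep the first text directive only; ',' '&' '-' inside the text are
    # percent-encoded, so splitting before unquote() is safe
    parts = frag.split(':~:text=', 1)[1].split('&')[0].split(',')
    # Skip an optional leading "prefix-" so we return textStart, not the prefix
    start = parts[1] if len(parts) > 1 and parts[0].endswith('-') else parts[0]
    return unquote(start)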

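Pass an AI Mode or Gemini citation URL that carries a text fragment into decode() and you get back the exact sentence Google cited (strictly, the textStart part of the fragment); URLs without a fragment return None. For example: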

decode("https://www.healthline.com/nutrition/intermittent-fasting-guide#:~:text=Intermittent%20fasting%20is%20an%20eating%20pattern%20that%20cycles%20between%20periods%20of%20fasting%20and%20eating")

returns: "Intermittent fasting is an eating pattern that cycles between periods of fasting and eating"

Three use cases for the decoder

  1. Audit your own citations. Run an AI Mode query for your target keywords, copy the citation URLs, decode the fragments, and see exactly which of your sentences Google chose. This tells you which writing patterns work on your domain.
  2. Reverse-engineer competitor strategy. Decode citations from competitor pages on the same query and look for patterns: do their cited sentences sit in the intro, the FAQ, or the conclusion? Are they 8 words or 14? Plain or technical? This is competitive intelligence at the sentence level.
  3. Build a citation tracker. Schedule a weekly AI Mode scrape of your top 50 keywords, decode all fragments, and store sentence + URL + query in a database. Over time you build a corpus of which sentence patterns earn citations on your domain.
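Use cases 2 and 3 both come down to storing decoded sentences and comparing them. Here is a minimal sketch using Python's built-in sqlite3; the table layout, the store() and avg_words_by_domain() helpers, and the word-count metric are our assumptions for illustration, and the sentence is expected to be already decoded (for example with decode()).

```python
import sqlite3
from datetime import date
from urllib.parse import urlparse

# Hypothetical schema: one row per cited sentence per weekly scrape
SCHEMA = """CREATE TABLE IF NOT EXISTS citations (
    scraped_on TEXT, query TEXT, url TEXT, sentence TEXT, word_count INTEGER
)"""


def store(conn, query, url, sentence, scraped_on=None):
    """Insert one decoded citation; word_count answers '8 words or 14?'."""
    conn.execute(
        "INSERT INTO citations VALUES (?, ?, ?, ?, ?)",
        (scraped_on or date.today().isoformat(), query, url, sentence,
         len(sentence.split())),
    )
    conn.commit()


def avg_words_by_domain(conn):
    """Average cited-sentence length (in words) per source domain."""
    totals = {}
    for url, words in conn.execute("SELECT url, word_count FROM citations"):
        host = urlparse(url).netloc
        count, total = totals.get(host, (0, 0))
        totals[host] = (count + 1, total + words)
    return {host: total / count for host, (count, total) in totals.items()}
```

Open a file-backed database with sqlite3.connect('citations.db'), run conn.execute(SCHEMA) once, then call store() for each decoded citation in the weekly job. Comparing avg_words_by_domain() for your domain against competitors' is one simple sentence-level pattern to track.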

Batch audit script

To process a CSV of citation URLs (one URL per row, in a column named url), use this loop:

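import csv
from urllib.parse import unquote, urlparse

def cited_sentence(url):
    # Extract textStart from the first text directive, skipping an optional "prefix-"
    frag = urlparse(url).fragment
    if ':~:text=' not in frag:
        return ''
    parts = frag.split(':~:text=', 1)[1].split('&')[0].split(',')
    return unquote(parts[1] if len(parts) > 1 and parts[0].endswith('-') else parts[0])

# citations.csv needs a 'url' column; URLs without a fragment get an empty sentence
with open('citations.csv', newline='', encoding='utf-8') as f, \
        open('decoded.csv', 'w', newline='', encoding='utf-8') as out:
    reader = csv.DictReader(f)
    writer = csv.writer(out)
    writer.writerow(['url', 'cited_sentence'])
    for row in reader:
        writer.writerow([row['url'], cited_sentence(row['url'])])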

You can find a production-ready version with prefix/suffix handling, multi-fragment URLs, and source-page matching in the grounding-citation-analysis repo.
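The repo's source-page matching is not reproduced here. As a rough idea of what that step involves, the sketch below checks whether a decoded sentence actually appears in a page's visible text, using only the standard library's html.parser. The helper names are hypothetical, and case-insensitive, whitespace-collapsed substring matching is our simplification of the spec's matching rules.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ('script', 'style'):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ('script', 'style') and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def _normalize(text):
    # Collapse whitespace runs and ignore case so markup line breaks don't matter
    return re.sub(r'\s+', ' ', text).strip().lower()


def sentence_on_page(sentence, html):
    """True if the decoded sentence appears in the page's visible text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    page = _normalize(' '.join(parser.chunks))
    return _normalize(sentence) in page
```

Joining text nodes with a space keeps words in separate block elements apart, but it also splits a word broken by inline markup (e.g. <b>fast</b>ing), so a miss here deserves a manual look rather than an automatic "not cited".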

Coverage caveats

Not every citation URL contains a fragment. From our 42,971-citation dataset:

  • AI Mode: 70.9% of citations have a fragment
  • Gemini: 51.8% of citations have a fragment
  • ChatGPT, Perplexity, Copilot, Grok: 0% (these platforms do not use text fragments)
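To measure the same coverage on your own scrape, count the share of URLs per platform whose fragment contains :~:text=. This is a minimal sketch; the fragment_coverage name and the platform/url column layout are assumptions, and the percentages above come from the author's dataset, not from this code.

```python
from collections import Counter
from urllib.parse import urlparse


def fragment_coverage(rows):
    """Percent of citation URLs per platform that carry a ':~:text=' fragment.

    rows: iterable of dicts with 'platform' and 'url' keys (hypothetical layout).
    """
    total, with_fragment = Counter(), Counter()
    for row in rows:
        total[row['platform']] += 1
        if ':~:text=' in urlparse(row['url']).fragment:
            with_fragment[row['platform']] += 1
    # Counter returns 0 for platforms that never had a fragment
    return {p: round(100 * with_fragment[p] / total[p], 1) for p in total}
```

With a CSV that has platform and url columns, pass csv.DictReader(f) straight in as rows.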

Citations without fragments give you the URL but not the cited sentence. You can still scrape the source page and try to infer which passage was likely cited (we used token overlap with the answer text), but it is messier than the fragment approach.
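The article does not spell out the token-overlap method, so the following is only one plausible version: split the scraped page text into sentences and return the one whose tokens overlap most with the AI answer. The tokenizer, the naive sentence splitter and the score (shared tokens divided by the page sentence's token count) are all our assumptions, as is the likely_cited_sentence name.

```python
import re


def _tokens(text):
    # Lowercased alphanumeric word tokens; a deliberately crude tokenizer
    return set(re.findall(r'[a-z0-9]+', text.lower()))


def likely_cited_sentence(page_text, answer_text):
    """Return (sentence, score) for the page sentence best covered by the answer.

    score = shared tokens / tokens in the page sentence (a hypothetical choice).
    """
    answer = _tokens(answer_text)
    # Naive split: a sentence ends at '.', '!' or '?' followed by whitespace
    sentences = [s.strip() for s in re.split(r'(?<=[.!?])\s+', page_text) if s.strip()]
    best, best_score = None, 0.0
    for sentence in sentences:
        toks = _tokens(sentence)
        if not toks:
            continue
        score = len(toks & answer) / len(toks)
        if score > best_score:
            best, best_score = sentence, score
    return best, best_score
```

Normalizing by the page sentence's length favors short sentences, so in practice you may want a minimum length or a different score; treat the result as a likely candidate, not a confirmed citation.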

Frequently Asked Questions

Do all browsers support text fragments?
Chromium-based browsers (Chrome, Edge, Brave) support them natively. Safari added support in version 16.1, and Firefox has supported them by default since version 131. The fragment is part of the URL string whether or not a browser supports it, so the decoder works on any URL.
Why do some fragments have prefix and suffix?
When the cited sentence appears multiple times on the source page, the prefix and suffix disambiguate which occurrence to highlight. For citation analysis you can usually ignore them and just extract textStart.
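If you do need the exact occurrence, for example to see whether Google cited your intro or your FAQ copy of a repeated sentence, the decoded prefix and suffix pick it out. Below is a simplified sketch with a hypothetical find_occurrence helper; the spec also enforces word boundaries, which this version skips, and it only tolerates whitespace between the prefix, the text and the suffix.

```python
def find_occurrence(page_text, text_start, prefix=None, suffix=None):
    """Index of the text_start occurrence whose neighbours match prefix/suffix.

    Simplified matching: only whitespace may separate prefix, text and suffix.
    Returns -1 if no occurrence matches.
    """
    i = page_text.find(text_start)
    while i != -1:
        before = page_text[:i].rstrip()
        after = page_text[i + len(text_start):].lstrip()
        if ((prefix is None or before.endswith(prefix))
                and (suffix is None or after.startswith(suffix))):
            return i
        i = page_text.find(text_start, i + 1)   # try the next occurrence
    return -1
```

With page text "Step one: mix well. Step two: mix well. Done.", find_occurrence(page, "mix well", prefix="two:") skips the first match and returns the position of the second.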
Can I use this to track competitor citations?
Yes. Run AI Mode queries for shared target keywords, capture competitor citation URLs, decode their fragments, and analyse the patterns. This is one of the highest-signal competitive intelligence techniques in 2026.
Is there a no-code tool for this?
Not yet for the general public. The grounding-citation-analysis repo is the closest thing to an open-source toolkit. Several SEO platforms are building citation-fragment dashboards but adoption is patchy as of Q2 2026.
