TLDR: Many Google AI Mode and Gemini citation URLs end in a #:~:text= fragment that encodes the exact sentence Google cited from the source page. With 5 lines of Python you can decode those citations, audit which of your sentences are getting picked up, and reverse-engineer competitor strategies. This tutorial walks through the spec, the Python decoder, a batch audit script, and three practical use cases for SEO and content teams.
What is a #:~:text= fragment?
Text Fragments are a web specification (drafted in the WICG, first shipped in Chromium, and documented at web.dev) that lets a URL encode an exact passage to highlight on the destination page. The format is:
#:~:text=[prefix-,]textStart[,textEnd][,-suffix]
- textStart: the cited sentence, or the start of the cited range when textEnd is present (the primary extraction target)
- textEnd: optional range end for multi-sentence spans
- prefix- and -suffix: disambiguating context when the same sentence appears multiple times on the page
Browsers use these fragments to scroll the user to the cited passage and visually highlight it. Google’s AI Mode and Gemini repurpose the same mechanism to encode the exact sentence they extracted, which is what makes sentence-level citation analysis possible.
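Building a fragment URL by hand is a quick way to test a decoder or to preview the highlight a reader sees. The sketch below is a minimal illustration: text_fragment_url is a hypothetical helper, and it assumes the spec's rule that "-", "," and "&" inside a term must be percent-encoded because they act as delimiters.

```python
from urllib.parse import quote


def text_fragment_url(page_url, text, prefix=None, suffix=None):
    """Build a URL whose #:~:text= fragment highlights `text` on page_url (hypothetical helper)."""

    def enc(term):
        # quote(..., safe='') escapes ',' and '&'; '-' is unreserved, so escape it explicitly.
        return quote(term, safe='').replace('-', '%2D')

    terms = [enc(text)]
    if prefix:
        terms.insert(0, enc(prefix) + '-')  # trailing hyphen marks the prefix term
    if suffix:
        terms.append('-' + enc(suffix))     # leading hyphen marks the suffix term
    base = page_url.split('#', 1)[0]        # drop any existing fragment
    return f"{base}#:~:text={','.join(terms)}"


print(text_fragment_url("https://example.com/a#old", "fasting, then eating"))
```

Pasting the printed URL into a supporting browser should scroll to and highlight the passage, assuming it exists on the page.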
The 5-line Python decoder
Here is the minimal decoder:
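```python
from urllib.parse import unquote, urlparse

def decode(url):
    frag = urlparse(url).fragment  # everything after '#'
    if not frag.startswith(':~:text='):
        return None  # this citation carries no text fragment
    # First comma-separated term = textStart (unless a prefix- term comes first);
    # splitting before unquote() is safe because commas in the text arrive as %2C.
    return unquote(frag.replace(':~:text=', '').split(',')[0])
```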
Pass any AI Mode citation URL into decode() and you get back the exact sentence Google cited. For example:
```python
decode("https://www.healthline.com/nutrition/intermittent-fasting-guide#:~:text=Intermittent%20fasting%20is%20an%20eating%20pattern%20that%20cycles%20between%20periods%20of%20fasting%20and%20eating")
# returns: "Intermittent fasting is an eating pattern that cycles between periods of fasting and eating"
```
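The minimal decoder keeps only the first comma-separated term. If a fragment starts with a prefix- term, it returns that prefix instead of the cited text, and it drops textEnd and -suffix. A fuller parser for the format described earlier could look like the sketch below. parse_text_directives is a hypothetical helper (not the grounding-citation-analysis code), and it assumes well-formed fragments in which literal commas, hyphens, and ampersands are percent-encoded.

```python
from urllib.parse import unquote, urlparse


def parse_text_directives(url):
    """Split every text= directive in a URL fragment into prefix/start/end/suffix.

    Returns a list of dicts; absent terms are None. Returns [] when there is no directive.
    """
    frag = urlparse(url).fragment
    if ':~:' not in frag:
        return []
    # Anything before ':~:' (e.g. '#intro:~:text=...') is an ordinary anchor; skip it.
    directives = frag.split(':~:', 1)[1]
    results = []
    for directive in directives.split('&'):  # several directives are joined with '&'
        if not directive.startswith('text='):
            continue  # ignore non-text directives
        # Split on raw commas first; commas inside the text are percent-encoded.
        terms = directive[len('text='):].split(',')
        prefix = suffix = None
        if len(terms) > 1 and terms[0].endswith('-'):
            prefix = unquote(terms.pop(0)[:-1])   # 'prefix-' term
        if len(terms) > 1 and terms[-1].startswith('-'):
            suffix = unquote(terms.pop()[1:])     # '-suffix' term
        start = unquote(terms[0])
        end = unquote(terms[1]) if len(terms) > 1 else None
        results.append({'prefix': prefix, 'start': start, 'end': end, 'suffix': suffix})
    return results


print(parse_text_directives(
    "https://example.com/#:~:text=Step%201-,cut%20the,onions,-then%20fry&text=Second"
))
```

For sentence-level auditing you would usually read the start value, and treat start plus end as a range when end is present.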
Three use cases for the decoder
- Audit your own citations. Run an AI Mode query for your target keywords, copy the citation URLs, decode the fragments, and see exactly which of your sentences Google chose. This tells you which writing patterns work on your domain.
- Reverse-engineer competitor strategy. Decode citations from competitor pages on the same query and look for patterns: do their cited sentences sit in the intro, the FAQ, or the conclusion? Are they 8 words or 14? Plain or technical? This is competitive intelligence at the sentence level.
- Build a citation tracker. Schedule a weekly AI Mode scrape of your top 50 keywords, decode all fragments, and store sentence + URL + query in a database. Over time you build a corpus of which sentence patterns earn citations on your domain.
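The citation tracker in the last bullet needs little more than one table. Below is a minimal sketch assuming SQLite and a schema of our own choosing (seen_on, query, url, sentence). record_citations and cited_sentence are hypothetical helpers, and the AI Mode scraping step itself is not shown.

```python
import sqlite3
from datetime import date
from urllib.parse import unquote, urlparse


def cited_sentence(url):
    """Same minimal logic as decode(): first comma-separated term, or None."""
    frag = urlparse(url).fragment
    if not frag.startswith(':~:text='):
        return None
    return unquote(frag.replace(':~:text=', '').split(',')[0])


def record_citations(conn, query, urls, seen_on=None):
    """Store one row per citation URL scraped for `query` on a given day."""
    seen_on = (seen_on or date.today()).isoformat()
    conn.executemany(
        "INSERT INTO citations (seen_on, query, url, sentence) VALUES (?, ?, ?, ?)",
        [(seen_on, query, u, cited_sentence(u)) for u in urls],
    )
    conn.commit()


conn = sqlite3.connect(':memory:')  # pass a file path instead for a persistent tracker
conn.execute(
    "CREATE TABLE IF NOT EXISTS citations "
    "(seen_on TEXT, query TEXT, url TEXT, sentence TEXT)"
)
```

Once a few weeks of data are stored, a query such as SELECT sentence, COUNT(*) FROM citations WHERE sentence IS NOT NULL GROUP BY sentence ORDER BY COUNT(*) DESC lists the sentences cited most often.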
Batch audit script
If you want to process a CSV of citation URLs, here is the loop:
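```python
import csv
from urllib.parse import unquote, urlparse

# Expects citations.csv to have a header row with a 'url' column.
with open('citations.csv', newline='', encoding='utf-8') as f, \
        open('decoded.csv', 'w', newline='', encoding='utf-8') as out:
    reader = csv.DictReader(f)
    writer = csv.writer(out)
    writer.writerow(['url', 'cited_sentence'])
    for row in reader:
        frag = urlparse(row['url']).fragment
        # Same logic as decode(): first comma-separated term, or '' when there is no fragment.
        sent = unquote(frag.replace(':~:text=', '').split(',')[0]) if ':~:text=' in frag else ''
        writer.writerow([row['url'], sent])
```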
You can find a production-ready version with prefix/suffix handling, multi-fragment URLs, and source-page matching in the grounding-citation-analysis repo.
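Source-page matching can be approximated once you have a page's visible text. The sketch below is not the repo's implementation: locate_sentence and section_bucket are hypothetical names, fetching and HTML-to-text extraction are not shown, and the 0.2/0.8 cut-offs are arbitrary assumptions. It normalizes whitespace and case, finds the decoded sentence, and reports its relative position, which gives a rough intro/body/conclusion answer for the competitor use case.

```python
import re


def _normalize(text):
    # Collapse whitespace and lowercase so markup-induced line breaks don't block a match.
    return re.sub(r'\s+', ' ', text).strip().lower()


def locate_sentence(page_text, sentence):
    """Relative position of `sentence` in the page (0.0 = top, ~1.0 = bottom), or None."""
    page, target = _normalize(page_text), _normalize(sentence)
    if not page or not target:
        return None
    idx = page.find(target)
    return None if idx == -1 else idx / len(page)


def section_bucket(position, intro=0.2, outro=0.8):
    """Map a relative position to a coarse page region (thresholds are arbitrary)."""
    if position is None:
        return 'not found'
    if position < intro:
        return 'intro'
    if position >= outro:
        return 'conclusion'
    return 'body'
```

Exact matching fails when Google's cited text differs from the rendered text (for example, when it spans element boundaries); fuzzy matching is the usual next step.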
Coverage caveats
Not every citation URL contains a fragment. From our dataset of 42,971 citations:
- AI Mode: 70.9% of citations have a fragment
- Gemini: 51.8% of citations have a fragment
- ChatGPT, Perplexity, Copilot, Grok: 0% (these platforms do not use text fragments)
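To compare your own scrape against these figures, count how many citation URLs per platform carry a text directive. This sketch assumes (platform, url) pairs, for example read from a CSV with those two columns. fragment_coverage is a hypothetical helper, and it treats a citation as fragment-bearing when its fragment contains :~:text=.

```python
from collections import defaultdict
from urllib.parse import urlparse


def fragment_coverage(rows):
    """Share of citation URLs per platform whose fragment contains ':~:text='.

    `rows` is an iterable of (platform, url) pairs.
    """
    totals = defaultdict(int)
    with_fragment = defaultdict(int)
    for platform, url in rows:
        totals[platform] += 1
        if ':~:text=' in urlparse(url).fragment:
            with_fragment[platform] += 1
    return {p: with_fragment[p] / totals[p] for p in totals}


print(fragment_coverage([
    ("AI Mode", "https://a.com/#:~:text=x"),
    ("AI Mode", "https://b.com/"),
    ("Gemini", "https://c.com/"),
]))
```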
Citations without fragments give you the URL but not the cited sentence. You can still scrape the source page and try to infer which passage was likely cited (we used token overlap with the answer text), but it is messier than the fragment approach.
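The token-overlap fallback can be sketched as follows. The exact scoring is not spelled out here, so this is one assumed variant: split the page into sentences with a naive regex, tokenize, and score each sentence by the share of its tokens that also occur in the AI answer. likely_cited and tokens are hypothetical names.

```python
import re


def tokens(text):
    """Lowercased word tokens as a set (a deliberately simple tokenizer)."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def likely_cited(page_text, answer_text, top_n=3):
    """Rank page sentences by the share of their tokens that also appear in the answer."""
    answer = tokens(answer_text)
    # Naive split on '.', '!' or '?' followed by whitespace; real pages need a better splitter.
    sentences = [s.strip() for s in re.split(r'(?<=[.!?])\s+', page_text) if s.strip()]
    scored = []
    for s in sentences:
        toks = tokens(s)
        if toks:
            scored.append((len(toks & answer) / len(toks), s))
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep page order
    return scored[:top_n]
```

Scoring by the sentence's own token count favours short sentences that are fully contained in the answer; a Jaccard score or a minimum-length filter are common alternatives.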
Frequently Asked Questions
Do all browsers support text fragments?
Why do some fragments have prefix and suffix?
Can I use this to track competitor citations?
Is there a no-code tool for this?
Want this implemented for your brand?
I help growth-stage companies own their category in AI search. Build a custom citation tracker for your domain.