How ChatGPT Actually Picks Sources (I Read The Network Traffic, Not The Outputs)

Jun 30, 2026 08:30 PM

I keep getting the same question from clients and SEOs (GEOs?).

“How do we show up in ChatGPT?”

The answer is always the same. Write good content, do listicles, comment on Reddit.

The usual.

But how do we actually know any of that works? Most of it gets repeated on faith, one expert quoting the last.

So, instead of taking it on trust, I spent a few days reading what ChatGPT sends my browser underneath the answer. The raw network traffic, in readable JSON.

This is a walk-through of what I found, roughly in the order I found it.

Before you quote a number from this, read this. It’s one person, one logged-in Pro account, a few days of traffic, not a population study. I logged about 1,240 source records across a few dozen searches. The structural findings, the fields ChatGPT uses and how they behave, are firm, because you only need to see a field once to know it’s real, and I saw them again and again. The numbers and percentages are a different matter. They come from a small batch of mostly SaaS and tech queries, so treat them as direction, not measurement. I flag which is which throughout.

How This Differs From The Big Visibility Studies, And What You Can Take To The Bank

There are two ways to do a study like this, and they point in opposite directions.

The big studies, the ones the platforms and the well-funded tools run, fire thousands of prompts, record which brands appear in the answers, and roll that up into share-of-voice reports. Large sample, but black box. They only ever see the finished answer, so they have to infer the machinery underneath from the output.

This is the other way round. I read the network traffic, the JSON the engine sends to my own browser, and pull out the engine’s own internal labels: the result_source it stamps on each result, the turn_use_case it files each query under, the vendor names, the search queries it wrote, the model it actually ran. I’m not measuring how often something happens across a population. I’m documenting that the system has a thing, and what the system calls it.

That difference decides what you can trust here, so I am going to be blunt about it.

Two Confidence Levels, Do Not Mix Them Up

Structural Facts (High Confidence)

That result_source exists and carries serp, labrador, bright, oxylabs. That bright is Bright Data and oxylabs is Oxylabs. That there are six turn_use_case values. That text queries skip the web entirely. That Thinking fires dozens of site: and price-verification sub-queries. These are read straight off the wire. One clean capture proves a field exists and what it is named, and a prompt-sampling study, however enormous, cannot see any of it.

Frequency Observations (Directional Only)

Anything with a percentage or a ranking, “70% bright,” “Reddit is the most cited domain,” “YouTube never gets cited,” comes from tens of queries on a single account, and my own query choice skews it. I picked SaaS and tech, which is exactly why Reddit and the tech review hubs lead here; a batch of health or lifestyle queries would crown different ones. Read these as the shape of the thing, not the measurement. Where a direction has a mechanical logic behind it (Reddit is text so it gets quoted, YouTube is video metadata so it does not), trust the direction and ignore the exact number.

First, The Boring Truth About ‘Packet Analysis’

Skip this section if you don’t want to get into the nitty-gritty technical details.

My first instinct was wrong. You cannot sniff packets and read queries, because the payload is TLS-encrypted, so a capture hands you scrambled ciphertext for the actual messages. What the capture does leak is the metadata.

The destination hostname, the IPs, and the fact that the ChatGPT app talks over QUIC (HTTP/3), not plain TCP. That is why, in the screenshot below, Wireshark can still show “openai” in the handshake. It reads the unencrypted server name, not the conversation. QUIC protects its first packet with keys anyone can derive, from a salt published in the spec and a connection ID sent in the clear, so a tool can unwrap that opening packet to show the ClientHello.
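The ClientHello trick works because QUIC’s Initial-packet keys are public by design: RFC 9001 derives them from a version-specific salt printed in the spec plus the Destination Connection ID the client sends in the clear. Here is a minimal Node.js sketch of that derivation, assuming QUIC version 1 and covering only the client-side keys (no header-protection removal or decryption). The function names are my own; the check value is the RFC’s published test vector.

```javascript
const { createHmac } = require("node:crypto");

// QUIC v1 initial salt (RFC 9001, Section 5.2). It is public, so anyone can derive these keys.
const INITIAL_SALT_V1 = Buffer.from("38762cf7f55934b34d179ae6a4c80cadccbb7f0a", "hex");

const hmac = (key, data) => createHmac("sha256", key).update(data).digest();

// HKDF-Extract (RFC 5869): PRK = HMAC-SHA256(salt, input keying material).
const hkdfExtract = (salt, ikm) => hmac(salt, ikm);

// HKDF-Expand (RFC 5869): T(i) = HMAC(PRK, T(i-1) || info || i), concatenated and truncated.
function hkdfExpand(prk, info, length) {
  let out = Buffer.alloc(0);
  let prev = Buffer.alloc(0);
  for (let counter = 1; out.length < length; counter++) {
    prev = hmac(prk, Buffer.concat([prev, info, Buffer.from([counter])]));
    out = Buffer.concat([out, prev]);
  }
  return out.subarray(0, length);
}

// TLS 1.3 HKDF-Expand-Label (RFC 8446, Section 7.1) with an empty context, as QUIC uses it.
function hkdfExpandLabel(secret, label, length) {
  const fullLabel = Buffer.from("tls13 " + label, "ascii");
  const info = Buffer.concat([
    Buffer.from([length >> 8, length & 0xff]), // uint16 output length
    Buffer.from([fullLabel.length]),           // label length byte
    fullLabel,
    Buffer.from([0]),                          // zero-length context
  ]);
  return hkdfExpand(secret, info, length);
}

// Derives the client's Initial packet key, IV and header-protection key from the
// Destination Connection ID visible in the first packet.
function clientInitialKeys(dcid) {
  const initialSecret = hkdfExtract(INITIAL_SALT_V1, dcid);
  const clientSecret = hkdfExpandLabel(initialSecret, "client in", 32);
  return {
    key: hkdfExpandLabel(clientSecret, "quic key", 16).toString("hex"),
    iv: hkdfExpandLabel(clientSecret, "quic iv", 12).toString("hex"),
    hp: hkdfExpandLabel(clientSecret, "quic hp", 16).toString("hex"),
  };
}

// DCID from RFC 9001, Appendix A.1.
const keys = clientInitialKeys(Buffer.from("8394c8f03e515708", "hex"));
console.log(keys.key); // 1f369613dd76d5467730efcbe3b1a22d
```

Wireshark does the equivalent internally and then decrypts the Initial packet with these keys. Everything after the handshake uses keys this derivation never sees, which is why the conversation itself stays unreadable.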

Image Credit: Suganthan Mohanadasan

The actual request and response bodies sit in later protected packets that stay unreadable. So the readable layer is the browser itself, after decryption, in the Network panel.

That’s where the queries, the answers, and all the metadata live as JSON.

This is HTTP inspection, not packet sniffing, and it’s worth saying because half the people who try this start with Wireshark and give up. (I know I did lol.)

Two things that did not work, so you don’t repeat them.

  1. Driving a clean automated Chrome got me hard-blocked by Cloudflare within a few queries on a different engine: the “verifying you are human” wall just loops forever in an automated browser, so I moved to my real Chrome with my real sessions.
  2. On ChatGPT, the answer never showed up in my capture at first, because it streams over a long-lived connection opened at page load that a hook installed mid-session cannot see. More on both later.
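To see why a late hook misses the stream, here is a minimal sketch of the kind of fetch tap I mean, assuming the answer arrives as a streamed fetch response in the page (if it is a WebSocket instead, the same timing problem applies to wrapping that). The tap only sees requests made after it swaps in its wrapper; a connection the page opened earlier keeps using the original fetch. `installFetchTap` and `onChunk` are hypothetical names, and injecting it early enough (for example from an extension content script that runs at document_start in the page’s main world) is an assumption, not what I ended up doing.

```javascript
// Wraps target.fetch so every response body is also passed, as decoded text
// chunks, to onChunk. Only requests issued AFTER installation go through the
// wrapper: a stream the page opened earlier still holds the original fetch.
function installFetchTap(target, onChunk) {
  const originalFetch = target.fetch;
  target.fetch = async (...args) => {
    const response = await originalFetch.apply(target, args);
    // clone() tees the body, so the page and the tap each read a full copy.
    const copy = response.clone();
    if (copy.body) {
      // Read the copy in the background so the page's own read is not delayed.
      (async () => {
        const reader = copy.body.pipeThrough(new TextDecoderStream()).getReader();
        for (;;) {
          const { value, done } = await reader.read();
          if (done) break;
          onChunk(value);
        }
      })().catch(() => {}); // a failure in the tap must never break the page
    }
    return response;
  };
  // Returns an uninstall function that restores the original fetch.
  return () => {
    target.fetch = originalFetch;
  };
}

// In the page, as early as possible (hypothetical usage):
// installFetchTap(window, (text) => console.log(text));
```

Cloning rather than rebuilding the Response keeps the page’s original object (status, headers, URL) untouched, which matters when the page itself is still consuming the stream.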

The Field That Labels Every Source

I opened DevTools, turned on Preserve log, ran a normal query, and searched the responses for anything that looked like a label.

The field that came back was result_source. It sits on every web result ChatGPT pulls; you never see it in the answer, and it takes one of four values.

Mark Williams-Cook shared that he had found three of these; I came across the fourth. I then saw Metehan’s post, and it looks like he may have already found it too. But honestly, this is not really about who found what first. It is more about sharing what we are seeing, comparing notes, and learning from each other.

Image Credit: Suganthan Mohanadasan

Here’s one source from the traffic, trimmed to the fields that matter.

{ "attribution": "TechRadar", "url": "https://www.techradar.com/best/...", "snippet": "...", "pub_date": "2026-05-09", "result_source": "labrador" }

The four values it uses:

result_source | What it is
serp | The open web baseline, mostly seen on news (Yahoo, StreetInsider)
labrador | An allowlist of established publishers: Reuters, The Guardian, the WSJ, the FT, Wikipedia, even arXiv. Snippets run to ~1,080 characters, essentially full-article extracts
bright | Bright Data, a commercial web scraper. Dominant for shopping, finance, weather, local
oxylabs | Oxylabs, a rival scraper. Regional and local press, some open web

labrador looks like a licensed tier; several of those publishers have signed content deals with OpenAI, and it isn’t one you get into unless you own a national newspaper.

bright and oxylabs are the interesting pair. The names point at Bright Data and Oxylabs, two commercial scraping firms that happen to be direct rivals. I can’t see a contract in the traffic, so I won’t claim ChatGPT pays them, but its open web fetching runs through both, and the field tells you which one fetched each result. (We’ve been Oxylabs customers for a long time for our SaaS, Keyword Insights.)

Across everything I logged, bright did the bulk of the fetching, especially on commercial, shopping, finance and weather queries. oxylabs skewed regional and local, labrador stayed on news and reference, and serp mostly turned up on news. To put names to the tiers, labrador carried Reuters, the WSJ, Wikipedia and TechRadar, bright pulled Reddit, Forbes and rtings, and oxylabs brought the Gulf press like Khaleej Times and Gulf News.

I even caught the split within one weather query, bright taking the global weather sites like the Met Office while oxylabs handled the local Gulf press. (I live in Dubai, by the way.) In that one query, the breakdown came out like this.

Source | Pipeline
metoffice.gov.uk | bright
accuweather.com | bright
timeanddate.com | bright
khaleejtimes.com | oxylabs
gulfnews.com | oxylabs
whatson.ae | oxylabs
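Figures like “bright did the bulk of the fetching” are just a count per pipeline over rows like these. A minimal sketch, assuming {source, pipeline} rows (the same shape the console script later in this post prints); `pipelineShares` is a hypothetical helper name, and the result is only as representative as the sample you feed it.

```javascript
// Counts how many sources each result_source pipeline fetched and each
// pipeline's share of the total, rounded to one decimal place.
function pipelineShares(rows) {
  const counts = {};
  for (const { pipeline } of rows) counts[pipeline] = (counts[pipeline] || 0) + 1;
  const shares = {};
  for (const [pipeline, n] of Object.entries(counts)) {
    shares[pipeline] = { count: n, percent: Math.round((1000 * n) / rows.length) / 10 };
  }
  return shares;
}

// The weather query above.
const weather = [
  { source: "metoffice.gov.uk", pipeline: "bright" },
  { source: "accuweather.com", pipeline: "bright" },
  { source: "timeanddate.com", pipeline: "bright" },
  { source: "khaleejtimes.com", pipeline: "oxylabs" },
  { source: "gulfnews.com", pipeline: "oxylabs" },
  { source: "whatson.ae", pipeline: "oxylabs" },
];
console.log(pipelineShares(weather));
```

For this one query both pipelines come out at three sources, 50% each; the lopsided split only shows up when you pool many queries.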

The AI SEO/GEO Takeaway

You’re mostly competing in the scraped tier, so be cleanly scrapable. Put your facts and numbers in plain HTML text, never behind a script or inside a PDF or an image. The licensed tier is mostly closed, so the levers you’ve got are third-party coverage, PR, brand mentions, links, and Reddit, to land on the pages the scrapers actually reach.

The Queries That Never Reach The Web

The next thing I noticed was that some queries produced no web search at all. Before ChatGPT searches, it files your question into a bucket, in a field called turn_use_case. I saw six of them across the questions I tried: instant search, shopping, text, local, thinking, and image generation.

Image Credit: Suganthan Mohanadasan

The one to care about is text. When ChatGPT files your question as text, it doesn’t search. It answers from its training corpus and stops.

The obvious cases end up here: “how do I change a flat tyre,” “write a Python function to merge two sorted lists,” and “translate this into four languages” all came back text with an empty web tab.

Image Credit: Suganthan Mohanadasan

The one that should worry you is that “latest current guidelines for type 2 diabetes” also came back text, a current, high-stakes question you’d assume it researches. It didn’t; it answered from training. No E-E-A-T here. Oops!

Of 10 deliberately current questions I tried, three were handled this way, with no search at all.

The wording decides the bucket, not the topic.

“best coffee near me” flips to the local pipeline, “best 4K TVs to buy” turns on shopping, but “best 4K TVs with reviews” stayed a normal search.

A maths question quietly jumped to a reasoning model under thinking, while “Tesla stock price this week” stayed instant search.

Keep in mind, these are results from my limited testing. I will run more tests when I find some more time.

The AI SEO/GEO Takeaway

Before you spend a penny on a page, check that the query even searches. If it’s a how-to or a definition, it may be answered from training, where no page can get in, however good it is. Spend your effort where it actually fetches.

If you want to be mentioned for such queries, you’d have to spend a lot of time building authority and wait for your brand to be included in future training data. (For example, make sure crawlers like Common Crawl can see your site.)
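A minimal sketch of that check, assuming you have already read the turn_use_case label for each target query out of the traffic by hand. The record shape and the `answeredFromTraining` name are hypothetical, the label strings in the raw JSON may be spelled differently from how I write them here, and treating the text bucket as “no web search” reflects what I observed, not a documented rule.

```javascript
// Each record pairs a query you care about with the turn_use_case label it came back with.
// Returns the queries filed as "text" (answered from training, no fetch) and their share.
function answeredFromTraining(turns) {
  const noSearch = turns.filter((t) => t.turnUseCase === "text");
  return {
    queries: noSearch.map((t) => t.query),
    share: turns.length ? noSearch.length / turns.length : 0,
  };
}

// Three of the queries from my tests.
const checked = [
  { query: "how do I change a flat tyre", turnUseCase: "text" },
  { query: "latest current guidelines for type 2 diabetes", turnUseCase: "text" },
  { query: "Tesla stock price this week", turnUseCase: "instant search" },
];
console.log(answeredFromTraining(checked).queries);
```

Anything in that list is a query where page-level optimization won’t move the answer.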

How One Question Fans Out Into Dozens Of Searches (Fan-Out Queries)

ChatGPT also exposes the searches it runs for you, if you pull the full conversation back from its own API. On the fast model, it’s minimal: one reworded query and done, possibly optimized for speed over depth. On the reasoning model, asked to compare a few products, it ran about 15 to 40 sub-queries off the single question. (The number depended on the complexity of the question.)

Image Credit: Suganthan Mohanadasan

Here’s a slice of what it actually ran for one comparison task.

"Profound AI search visibility pricing AI engines tracked 2026"
"AthenaHQ pricing AI search visibility tool"
"site:peec.ai/pricing Peec AI Starter Pro Advanced 50 prompts 150 prompts"
"Peec AI pricing $95 $245 $495 official" (a guessed price, then searched to confirm)
"Scrunch AI pricing" (not in my prompt, found mid-research)
...around 40 of these for one comparison

Three things stand out in there. It fires site: probes straight at vendor pricing pages.

It guesses a price and then searches to confirm it. And it keeps widening as it goes, picking up tools you never named and chasing their pricing, too.
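If you pull a fan-out list like the one above, the first two patterns are easy to separate mechanically. A minimal sketch, assuming you already have the sub-queries as plain strings; `classifyFanOut` is a hypothetical helper, and “contains a currency amount” is my own heuristic for spotting a guessed price, not a label ChatGPT attaches. Spotting tools you never named would also need the original prompt, so it is left out.

```javascript
// Sorts fan-out sub-queries into site: probes, price checks (queries that already
// contain a currency amount, i.e. a guess being verified), and everything else.
function classifyFanOut(queries) {
  const out = { siteProbes: [], priceChecks: [], other: [] };
  for (const q of queries) {
    if (/(^|\s)site:\S+/.test(q)) out.siteProbes.push(q);
    else if (/[$€£]\s?\d/.test(q)) out.priceChecks.push(q);
    else out.other.push(q);
  }
  return out;
}

// The sub-queries quoted above.
const fanOut = [
  "Profound AI search visibility pricing AI engines tracked 2026",
  "AthenaHQ pricing AI search visibility tool",
  "site:peec.ai/pricing Peec AI Starter Pro Advanced 50 prompts 150 prompts",
  "Peec AI pricing $95 $245 $495 official",
  "Scrunch AI pricing",
];
console.log(classifyFanOut(fanOut).priceChecks);
```

Run over a full trace, the siteProbes bucket tells you which of your URLs it went straight for, and the priceChecks bucket shows the numbers it expected to find there.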

It doesn’t only search, either; the page-reading is just as literal. It ran find for $, €, 99 and even “Agency,” then used the browsing tool’s own open and click commands to pull up the results it wanted, all run server-side, not by an agent on your screen.

The same happens to your own site. Ask it “keyword insights pricing,” and it runs a site:keywordinsights.ai/pricing probe, guesses something like “Starter $58, Pro $145, Advanced $299,” then opens the page and reads the HTML for the currency symbol to confirm.
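Reading the HTML for the currency symbol only works if the number is in the HTML the fetcher receives. Here is a rough approximation you can run against your own markup. It is not the browsing tool’s code, which never reaches the browser: it strips scripts, styles and tags, then looks for currency amounts. `pricesInStaticHtml` is a hypothetical name, the sample markup is made up, and the regex ignores HTML entities such as `&euro;`.

```javascript
// Returns every currency amount left in the visible text of raw HTML.
// Prices injected later by JavaScript are not in the raw HTML, so a
// JS-rendered pricing table comes back empty.
function pricesInStaticHtml(html) {
  const text = html
    .replace(/<script[\s\S]*?<\/script>/gi, " ") // drop inline scripts entirely
    .replace(/<style[\s\S]*?<\/style>/gi, " ")
    .replace(/<[^>]+>/g, " ");                   // drop remaining tags, keep their text
  return text.match(/[$€£]\s?\d+(?:,\d{3})*(?:\.\d{2})?/g) || [];
}

const staticPage =
  "<table><tr><td>Lite</td><td>$129</td></tr>" +
  "<tr><td>Standard</td><td>$249</td></tr>" +
  "<tr><td>Advanced</td><td>$449</td></tr></table>";
const jsPage = '<div id="pricing"></div><script>renderPricing({ lite: "$129" })</script>';

console.log(pricesInStaticHtml(staticPage)); // [ '$129', '$249', '$449' ]
console.log(pricesInStaticHtml(jsPage));     // []
```

To test a live page, feed it the raw response body from a plain HTTP fetch of your pricing URL, not the DOM after your scripts have run.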

The AI SEO/GEO Takeaway

Put your key numbers and data in plain HTML text, never inside an image, because in this case, with pricing, it greps the page for $ and € and can’t read a graphic. Also, make sure your page passes a site:yourdomain.com/pricing probe in this use case, and write for the cleaned-up query it actually runs, not the messy phrasing a user types. Avoid JavaScript-based toggles and dynamic data loading.

Fetched, Cited, And Mentioned Aren’t The Same

This is the distinction people muddle most, so it’s worth being exact. Three different things can happen to a source.

  • Fetched. The model pulls your page into context. This is the result_source object. The reader never sees it.
  • Cited. It attaches your page as the source behind a specific sentence, the footnote you can click.
  • Mentioned. Your brand name appears in the answer, often linking to your site, but your page isn’t the source of the claim.

They’re three separate outcomes, and you can win or lose each one on its own.
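Scoring the three separately is simple once you have the pieces out of the traffic. A minimal sketch, assuming you have already extracted the fetched domains, the cited domains and the visible answer text for one answer; `outcomesFor`, the input shape and the ExampleTool brand are all hypothetical.

```javascript
// Reports which of the three outcomes a brand got in one answer.
// fetched / cited: arrays of domains; answerText: the visible answer.
function outcomesFor(brand, domain, { fetched, cited, answerText }) {
  return {
    fetched: fetched.includes(domain),
    cited: cited.includes(domain),
    mentioned: answerText.toLowerCase().includes(brand.toLowerCase()),
  };
}

// Hypothetical answer: the vendor page was read, a review site got the citation,
// and the brand name still appears in the text.
const answer = {
  fetched: ["exampletool.com", "g2.com", "reddit.com"],
  cited: ["g2.com"],
  answerText: "ExampleTool is a solid pick for small teams.",
};
console.log(outcomesFor("ExampleTool", "exampletool.com", answer));
// { fetched: true, cited: false, mentioned: true }
```

Plain substring matching on the brand name is the crudest possible mention check; real brand names with common words in them would need something stricter.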

To see the gap between them, I took a batch of commercial and recommendation queries and split what ChatGPT fetched from what it cited.

This is the small, tech-skewed sample, so read what follows as a pattern, not a number to bank on.

Across that batch, Reddit and YouTube were both fetched heavily, 278 and 201 times. But Reddit was cited 11 times and YouTube not once.
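Put as a per-fetch citation rate, using only the counts from my batch, the gap looks like this. `citationRate` is just a name for the division; this is a rate within my small sample, not a figure comparable to other studies that count differently.

```javascript
// Times cited divided by times fetched, from my batch.
const counts = {
  "reddit.com": { fetched: 278, cited: 11 },
  "youtube.com": { fetched: 201, cited: 0 },
};

function citationRate({ fetched, cited }) {
  return fetched ? cited / fetched : 0; // guard against domains never fetched
}

for (const [domain, c] of Object.entries(counts)) {
  console.log(domain, (100 * citationRate(c)).toFixed(1) + "%");
}
// reddit.com 4.0%
// youtube.com 0.0%
```

Roughly one Reddit fetch in 25 turned into a citation; none of the YouTube fetches did.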

I think the logic is mechanical. A citation has to bind to text the model actually pulled, and when it fetches a YouTube page in search, it gets the metadata, not the actual video transcript.

A Reddit thread is all there in the page. This isn’t just my sample, either. Ahrefs, across 1.4 million ChatGPT prompts, found Reddit cited at 1.93% against YouTube’s 0.51%, and Profound found the same gap.

Image Credit: Suganthan Mohanadasan

A few other patterns, same caveat on sample size. Reddit was the single most-cited domain, narrowly, and after that no one ran away with it. The citations spread thin across review hubs like rtings and TechRadar and vendor pages cited for their own specs.

Here’s the top of the cited list across that batch.

Image Credit: Suganthan Mohanadasan

Vendor pages get cited too, but for their own facts, the pricing and specs. Zoho, Semrush, and the VPNs earned citations that way. The verdict on which one is best still gets cited to a third party. You can be mentioned without being cited, and cited without being mentioned.

Two mechanics sit underneath this. Citations bind to a specific sentence, not the whole answer, so being topically relevant isn’t enough; you have to be the best support for a precise claim.

And results are deduped by domain, so 20 thin pages from your site collapse into one.

One strong page per claim beats a pile of weak ones.

So, don’t go creating thousands of low-quality, thin pages to address every fan-out query.
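A minimal sketch of what domain-level dedup does to a pile of thin pages. I saw results collapse by domain, but not which page survives, so “keep the first one seen” is an assumption; `dedupeByDomain` and the example.com URLs are hypothetical, and stripping a leading www. is my own normalization choice.

```javascript
// Keeps only the first result for each hostname (leading "www." ignored).
function dedupeByDomain(results) {
  const seen = new Set();
  return results.filter((r) => {
    const host = new URL(r.url).hostname.replace(/^www\./, "");
    if (seen.has(host)) return false;
    seen.add(host);
    return true;
  });
}

// Twenty thin pages from one site plus one review page.
const thin = Array.from({ length: 20 }, (_, i) => ({ url: `https://www.example.com/page-${i}` }));
const deduped = dedupeByDomain([...thin, { url: "https://www.rtings.com/headphones" }]);
console.log(deduped.length); // 2
```

Nineteen of the twenty pages add nothing, so whichever one survives had better be the strong one.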

The AI SEO/GEO Takeaway

You can’t cite yourself. The claim about you gets sourced from someone else, so earn third-party coverage on review sites and Reddit, win on text rather than video, and put one strong page behind each claim, because it dedupes by domain.

The Model Explains Its Own Strategy

I went looking for a hidden ranking score first and found nothing. That kind of logic (a domain authority number, a trust weight, a formula) never reaches your browser, because it stays on OpenAI’s servers.

So, anyone selling you “ChatGPT’s ranking factors” is selling you snake oil.

What the traffic does have is the reasoning model’s chain of thought, saved in the conversation, where it describes its own sourcing in plain words.

Image Credit: Suganthan Mohanadasan

For facts, the pricing and the specs, it goes to the official page first, and it says so.

Comparing Ahrefs, it reads the official page, notes it “lists Lite at $129, Standard at $249, and Advanced at $449,” and decides “pricing page seems more current, so I should cite that.” It wants the source it trusts, and the current one.

Then it hits the wall this whole post is about.

On Profound, it reasons that “the pricing isn’t showing up directly in the search result, possibly because it’s loaded with JavaScript.” Same on Peec, where “the pricing doesn’t show up directly, possibly hidden with JavaScript.”

So, it stops trying to read them and falls back. “I can quote third-party sources since the official page is hard to parse and doesn’t show prices,” it writes, and it notes it should “use citations from G2 where appropriate.”

That’s the whole game in one trace. The model wanted Profound’s and Peec’s own numbers. Their pricing sat behind JavaScript, so it couldn’t read them, and it cited G2 instead. Your facts, someone else’s page, because yours wouldn’t parse.

Those quotes are the model’s own, from the saved reasoning, not mine.

The AI SEO/GEO Takeaway

Own your facts, in plain HTML. Your pricing and spec numbers have to be in crawlable text, not loaded by JavaScript and not baked into an image, because the model reads the page itself and gives up when it can’t. A JavaScript pricing table doesn’t just rank badly; it hands your numbers to G2.

The opinion you earn separately, through reviews, Reddit, and honest comparison content, which is where the recommendation gets cited from. A clean, readable pricing page with no third-party coverage gets your facts read and someone else recommended.

What I Could Not See

There’s no visible ranking logic, as above, so why one source beats another, beyond the model’s own narration, stays server-side.

Personalization is existent and selective.

On a query that overlapped my own work, ChatGPT pulled in my past conversations, with the sources listed as personal_sources: ["convo_search", "gmail", "files"].

It used one of my old chats inside a generic “best tools” answer, but only in one of the three conversations I checked, the one that matched my history.

So, part of some answers is built from a user’s private data you can never optimize for, which is one reason two people get different answers and visibility scores wobble.

Local is capped. There’s a config value, local_results_limit, set to 2.

Image Credit: Suganthan Mohanadasan

Ask for the best coffee near you, and ChatGPT returns two places, not a top 10. For local, you’re in the top two, or you aren’t there.

One thing I genuinely can’t call yet. My read on shopping comes from a single shopping query, and it flatly contradicts what Mark saw on his single query, so the shopping structure is unsettled until someone runs a proper batch.

And the wider caveat, said plainly. The structure I’m sure of, because I saw it across about 1,240 records. The percentages come from a small batch of commercial queries, mostly SaaS and tech, so they need a bigger run across real verticals before anyone banks on them.

That run is the next piece.

Run It Yourself

None of this needs special access or requires you to be plugged into the Matrix and become an operator, just your own browser.

Image Credit: Suganthan Mohanadasan

Open ChatGPT, press Cmd+Option+I (Ctrl+Shift+I on Windows) for DevTools, open Network, tick Preserve log, run a query, then press Cmd+Option+F (Ctrl+Shift+F) and search the responses for result_source.

That alone shows you the pipeline behind each link.

For the rest, the fan-out and the citations and the reasoning, open the Console, type allow pasting once, and run this against a conversation that searched the web.

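// Get this session's access token, then fetch the full conversation JSON for the chat in the address bar.
const t = (await (await fetch('/api/auth/session')).json()).accessToken;
const id = location.pathname.split('/c/')[1];
const c = await (await fetch('/backend-api/conversation/' + id, {headers: {Authorization: 'Bearer ' + t}})).json();
// JSON.stringify visits every nested value; the replacer keeps anything stamped with result_source.
const rows = [];
JSON.stringify(c, (k, v) => {
  if (v && v.result_source) {
    // Prefer the URL so the table shows a bare domain; fall back to the attribution label.
    const d = (v.url || v.attribution || '?').toString();
    rows.push({source: d.replace(/^https?:\/\//, '').replace(/^www\./, '').split('/')[0], pipeline: v.result_source});
  }
  return v;
});
console.table(rows);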

It reads only your own session, so nothing leaves your machine. The output is a plain table of each source and the pipeline that fetched it.

source | pipeline
techradar.com | labrador
whathifi.com | labrador
soundguys.com | bright
rtings.com | bright
khaleejtimes.com | oxylabs
streetinsider.com | serp

Change what the loop collects, and you can pull the searches, the citations, and the reasoning the same way.
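A minimal sketch of a more general collector, as an alternative to editing the replacer each time: an explicit recursive walk that returns every value stored under a given key, at any depth. `collectByKey` is a hypothetical helper. result_source, turn_use_case and personal_sources are field names I saw in the traffic; the keys that hold the search queries, citations and reasoning you will have to find by inspecting the JSON yourself, since I’m not naming them here.

```javascript
// Walks a parsed JSON value and returns every value stored under `key`, at any depth.
function collectByKey(value, key, found = []) {
  if (Array.isArray(value)) {
    for (const item of value) collectByKey(item, key, found);
  } else if (value && typeof value === "object") {
    for (const [k, v] of Object.entries(value)) {
      if (k === key) found.push(v);
      collectByKey(v, key, found); // keep descending: the key can also appear deeper
    }
  }
  return found;
}

// In the console, after the script above has loaded the conversation into `c`:
// collectByKey(c, "turn_use_case");
// collectByKey(c, "personal_sources");
```

Parsed JSON can’t contain cycles, so the plain recursion is safe here.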

An Extension Now Captures Most Of This

If pasting scripts into your own console isn’t your thing, there’s now an easier route. Olivier de Segonzac already ran a free Chrome extension that pulls ChatGPT’s search and fan-out data.

He read this analysis and extended it to capture three of the signals I picked apart above.

  • The turn_use_case bucket. The intent label ChatGPT files each turn under, so you can see when a query flips to shopping, local, or text before it even answers.
  • The reference-type mix. How many of the answer’s citations were products versus search results, news, or images, parsed straight from the reference tokens.
  • The result_source pipeline. The scraper behind each cited result, charted per conversation, so the Bright Data, Oxylabs, Labrador, and SERP split shows up without you reading a line of JSON.

It runs locally on your own session and exports straight to Excel. It’s on the Chrome Web Store, and Olivier has written up the update separately.

Image Credit: Suganthan Mohanadasan

So, back to the question we opened with. Does the usual advice hold up? Mostly. Reddit earns citations and topped my cited list. Listicles and review sites make up most of the rest. Good content still matters, but only the half the model can actually read. The rest it reads off someone else’s page.

Which is the real lesson. ChatGPT isn’t a search engine, so stop optimizing for one.

It reads your own page for the facts, if it can parse them, and everyone else’s for the opinion, and only when the question is worth a search. Build for that.

And treat all of this, mine included, as a snapshot of a system that changes by the week. The structure holds. The numbers move.

While I was in the traffic, I also found a pile of things with nothing to do with sourcing: the bot wall that stops you scripting it, a hidden shopping engine, and 573 live experiments running on the account. Those will be published separately.

I’ve also done similar research on Perplexity, Gemini, and others, so I’ll be sharing those soon.


This post was originally published on Suganthan.


Featured Image: Viktoriia_M/Shutterstock
