The point in 20 seconds

Your articles are on the first page of Google, yet ChatGPT, Claude, and Perplexity do not cite them. The problem is not the content. The content is trapped in technical debt, and this can be paid off surprisingly quickly.

More and more business leaders sit down with us and say the same sentence: “I asked ChatGPT who the best supplier in my market is — and it listed my competitors. Not me.”

The almost automatic reaction at this point is panic: surely the content is weak, we need twenty more blog posts. In reality, the content is often perfectly good. The problem is that AI physically cannot see part of the content. It does not rank it poorly — it simply cannot access it.

This is where the key concept most marketers miss comes in: technical data debt. This is the accumulated, invisible burden — JavaScript waiting to be rendered, locked documents, misconfigured access — that makes an otherwise excellent website appear as an empty space to generative search engines. The good news: this is not a creative problem, but an engineering problem, and therefore it can be solved.

Why does Google see something that ChatGPT does not?

Most misconceptions start here: “If Google can read my website, AI can too.” In 2026, this is simply not true, and the difference costs money.

Googlebot works with an advanced two-phase system: using a headless Chrome browser, it executes the page’s JavaScript, waits until the dynamic content loads, and only then indexes it. AI bots — GPTBot, ClaudeBot, PerplexityBot — do not do this. They download the raw HTML, read what is inside it, and move on. No rendering, no waiting, no second attempt.

A joint Vercel and MERJ analysis covering more than half a billion GPTBot requests was clear: zero evidence of JavaScript execution. Bots do download JS files from time to time — Claude’s crawler in nearly a quarter of requests, ChatGPT’s in roughly one tenth — but they never execute them. Anything that would appear only after the code runs does not exist for them.

Crawler | What it powers | Does it render JavaScript?
Googlebot | Google Search, AI Overviews, Gemini | Yes (with delay)
Bingbot | Bing, partly ChatGPT search | Partly
GPTBot / OAI-SearchBot | ChatGPT | No
ClaudeBot | Claude | No
PerplexityBot | Perplexity | No

Source: Vercel & MERJ crawler analysis, plus independent technical SEO audits, 2026.

The JavaScript trap — the most expensive mistake

A large share of modern websites is built with client-side rendering (CSR): React, Vue, or Angular, where the browser assembles the content at runtime. A human sees a perfect page. The bot, however, receives an empty skeleton: a navigation menu, an empty container, and a few script references. The product description, prices, comparison table, and FAQ would all appear only after the script runs, which the bot does not wait for.

And here there is no such thing as “partly visible.” Either the content is in the raw HTML, or it is not. If the skeleton is empty, then for AI the entire page does not exist — it is not performing weakly, it is missing.

On top of that, AI bots do not scroll. Content that is fetched only on scroll through “lazy load” techniques (lower sections, text below images, infinite lists) therefore regularly remains invisible, because it is never part of the initial HTML response.

What the bot sees (CSR)

An empty skeleton. Zero sentences, zero schema, zero citable facts.

What it should see (SSR)

Finished text already in the source code (“Heat pump heating...”, then price, benefit, FAQ, schema...). Citable, indexable, quotable.
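To make the contrast concrete, here is a minimal Python sketch of what a non-rendering crawler can read. It uses only the standard library. The two HTML snippets, the product text, and the names VisibleTextExtractor and visible_text are hypothetical illustrations, not taken from any real bot.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects the text that is readable in raw HTML without running scripts."""

    # Content inside these tags is code or inert markup, not readable text.
    SKIPPED_TAGS = {"script", "style", "template"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def visible_text(raw_html: str) -> str:
    """Return the text a crawler gets if it never executes JavaScript."""
    parser = VisibleTextExtractor()
    parser.feed(raw_html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical client-side rendered shell: menu, empty container, script.
CSR_SHELL = """<!doctype html>
<html><head><title>Shop</title></head>
<body>
  <nav><a href="/">Home</a></nav>
  <div id="root"></div>
  <script src="/static/app.js"></script>
</body></html>"""

# Hypothetical server-rendered page: the text is already in the response.
SSR_PAGE = """<!doctype html>
<html><head><title>Shop</title></head>
<body>
  <nav><a href="/">Home</a></nav>
  <main>
    <h1>Example product</h1>
    <p>Price from 100 EUR. Installed in two days.</p>
  </main>
</body></html>"""
```

Calling visible_text(CSR_SHELL) yields only the title and the menu label, while visible_text(SSR_PAGE) also returns the heading and the price sentence. That difference is the whole JavaScript trap.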

How to fix it

  • Server-side rendering (SSR) or static generation (SSG): Next.js, Nuxt, Astro, or Angular Universal — the essential content must already be present in the server response.
  • If an SSR migration is not feasible right now, use prerendering to serve bots.
  • The 10-second test: open the page, turn off JavaScript (or check “View Source”). What you cannot see this way, the bot cannot see either.
  • JSON-LD schema must also be in the raw HTML, not injected later with JavaScript.
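The last point in the list can be checked mechanically. The sketch below is one possible approach, assuming you already have the raw HTML as a string (for example, copied from “View Source”). The class and function names and both sample pages are hypothetical.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects JSON-LD blocks that exist in the raw HTML response."""

    def __init__(self):
        super().__init__()
        self._in_json_ld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # malformed JSON-LD is ignored in this sketch


def json_ld_types(raw_html: str) -> list:
    """Return the @type values of all JSON-LD blocks found in raw HTML."""
    parser = JsonLdExtractor()
    parser.feed(raw_html)
    parser.close()
    types = []
    for block in parser.blocks:
        items = block if isinstance(block, list) else [block]
        for item in items:
            if isinstance(item, dict) and "@type" in item:
                types.append(item["@type"])
    return types


# Hypothetical page where the schema is part of the server response.
PAGE_WITH_SCHEMA = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "FAQPage", "mainEntity": []}
</script>
</head><body><h1>FAQ</h1></body></html>"""

# Hypothetical page where a script would inject the schema later.
PAGE_WITHOUT_SCHEMA = """<html><head>
<script src="/static/inject-schema.js"></script>
</head><body><div id="root"></div></body></html>"""
```

An empty result for a page that shows rich results in a browser-based testing tool is a strong hint that the schema is injected by JavaScript and is missing from the raw HTML.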

Locked PDFs — trapped knowledge

The most valuable professional content — case studies, white papers, price lists, research findings — often lives in PDFs. And here, an important distinction must be made, because not every PDF is the same in the eyes of bots.

  • Text-based PDF: the content can be extracted from it, and AI can process it. This is the good case.
  • Scanned, image-based PDF: the bot only sees an image. Text can only be extracted through OCR, which is never perfect — names, numbers, accented characters, and tables get distorted or fall apart.
  • Encrypted / password-protected PDF: the text cannot be extracted at all. It cannot be indexed or fed into an AI model. Complete silence.
  • Untagged PDF: structural information is missing, so the reading order may become scrambled — multi-column layouts or tables can turn into an unreadable mass.

The practical consequence is painful: if you have locked your best arguments inside a beautifully designed but image-exported brochure, that knowledge does not exist for AI. The competitor who also published the same information on a simple HTML page appears in the answers. You do not.

How to fix it

  • HTML-first principle: key content — prices, benefits, FAQ, case study summary — should live on a proper web page. The PDF should be the downloadable “bonus,” not the only source.
  • If a PDF is required, make it a real text-based PDF, not an image PDF, with proper tagging and, in the case of scanned material, an OCR layer.
  • Simple check: open the PDF and try to select the text with your mouse. If you cannot select it, the bot cannot read it either.
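For a whole folder of PDFs, the mouse test does not scale. As a rough first filter, the sketch below flags files that declare encryption. It is a byte-level heuristic and an assumption of this example, not a full PDF parser: it only looks for the /Encrypt key that encrypted PDFs carry in their trailer, and it cannot tell whether a file has a usable text layer. The function name and the two byte strings are hypothetical, and the byte strings are simplified fragments rather than valid documents.

```python
def looks_encrypted(pdf_bytes: bytes) -> bool:
    """Heuristic: report whether the file declares an /Encrypt entry.

    This can produce false positives if the literal text "/Encrypt"
    appears elsewhere in the file, so treat a hit as "inspect manually".
    """
    # The PDF header is expected near the start of the file.
    if b"%PDF-" not in pdf_bytes[:1024]:
        raise ValueError("this does not look like a PDF file")
    return b"/Encrypt" in pdf_bytes


# Simplified, synthetic fragments for illustration only.
OPEN_PDF = (
    b"%PDF-1.7\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
    b"trailer\n<< /Root 1 0 R >>\n%%EOF"
)
LOCKED_PDF = (
    b"%PDF-1.7\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
    b"trailer\n<< /Root 1 0 R /Encrypt 5 0 R >>\n%%EOF"
)
```

In practice you would read each file with open(path, "rb").read() and pass the bytes in. Anything flagged deserves a manual check, and scanned image PDFs still need the select-the-text test or a proper PDF library.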

The missing llms.txt — and the nuanced truth

llms.txt is a simple Markdown file placed at the root of a domain (https://example.com/llms.txt) that shows your most important pages in one clean list — a kind of “content map” for language models. Since late 2024, the GEO industry has been selling it almost as a miracle cure. Here comes the part that most marketing agencies leave out.

The reality in 2026: according to SE Ranking’s study of 300,000 domains, llms.txt adoption is roughly 10%, and it is not growing quickly. Even more importantly: Limy’s analysis based on more than half a billion AI bot events found that search-oriented bots — GPTBot, ClaudeBot, PerplexityBot, OAI-SearchBot — typically do not even open llms.txt, but crawl the HTML directly instead. Several independent SEO studies have also measured no detectable traffic or citation increase after implementation. John Mueller from Google has also urged caution: in his view, it strongly resembles the keywords meta tag, which Google has not used for a long time.

So is it useless? No — it just does not work where everyone is looking for its effect. Its real value is in the agent-based web. Coding and research AI agents — Cursor, Windsurf, Claude Code, GitHub Copilot — routinely request llms.txt when directed to a documentation site. This is a “business-to-agent” (B2A) interface: the first standardized way for your brand to publish a clean, machine-readable surface that AI agents can attach to.

In 2026, llms.txt is a cheap, low-yield bet that still buys an option on the future. It is worth creating not because it will double your citations tomorrow, but because it is infrastructure for the direction the web is heading.

How to do it well

  • Create it — but with realistic expectations. It is cheap, fast, and does no harm. But do not expect an SEO miracle from it.
  • Keep it fresh. A common disease of manually written files is “staleness”: 404 links, old product names. An outdated llms.txt is worse than nothing.
  • Align it with robots.txt. The two files should not contradict each other — treat them as one coordinated package.
  • On WordPress, Yoast SEO and Rank Math can both generate llms.txt without coding.
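If you are not on WordPress, generating the file from your own page list is one way to fight staleness, because the file is rebuilt whenever the list changes. The sketch below assumes the commonly described llms.txt layout: a title line, a short summary as a blockquote, then sections with Markdown link lists. The Page class, build_llms_txt, and all sample pages are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Page:
    title: str
    url: str
    note: str = ""  # optional one-line description


def build_llms_txt(site_name: str, summary: str, sections: dict) -> str:
    """Render a Markdown llms.txt from a mapping of section name to pages."""
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, pages in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for page in pages:
            suffix = f": {page.note}" if page.note else ""
            lines.append(f"- [{page.title}]({page.url}){suffix}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical site content, using the example.com domain from the text.
SAMPLE_SECTIONS = {
    "Products": [
        Page("Pricing", "https://example.com/pricing", "Current price list"),
        Page("FAQ", "https://example.com/faq"),
    ],
    "Case studies": [
        Page("Case study summary", "https://example.com/case-studies"),
    ],
}

LLMS_TXT = build_llms_txt(
    "Example Co",
    "Key pages of example.com for language models and AI agents.",
    SAMPLE_SECTIONS,
)
```

Writing LLMS_TXT to the web root as llms.txt during each deploy keeps it in step with the site, as long as the page list itself comes from your CMS or sitemap rather than from memory.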

The 10-minute self-diagnosis

Before hiring anyone, you can complete these four steps yourself today:

  1. Source test. Open your most important page, right click → “View Source.” Can you find your main texts? If not, you are in the JavaScript trap.
  2. PDF test. Try selecting the text in your most important PDF. If you cannot, bots cannot read it either.
  3. robots.txt test. Check whether you have accidentally blocked access for GPTBot, ClaudeBot, or PerplexityBot. Conscious exclusion is fine; accidental exclusion is not.
  4. Schema test. Is the JSON-LD structured data present in the source code? If it only appears after rendering, the bot does not see it.
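The robots.txt step is the easiest of the four to automate. The sketch below uses Python's standard urllib.robotparser to list which of the three bots named above are blocked for a given URL. The sample robots.txt, the blocked_bots helper, and the example.com URLs are hypothetical. The standard-library parser applies its own matching rules, so real crawlers may interpret edge cases differently.

```python
from urllib.robotparser import RobotFileParser

AI_BOTS = ("GPTBot", "ClaudeBot", "PerplexityBot")


def blocked_bots(robots_txt: str, url: str, bots=AI_BOTS) -> list:
    """Return the bots that this robots.txt content forbids from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in bots if not parser.can_fetch(bot, url)]


# Hypothetical robots.txt: GPTBot is shut out entirely, everyone else
# is only kept away from the admin area.
SAMPLE_ROBOTS = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/
"""

PRICING_BLOCKED = blocked_bots(SAMPLE_ROBOTS, "https://example.com/pricing")
ADMIN_BLOCKED = blocked_bots(SAMPLE_ROBOTS, "https://example.com/admin/login")
```

Here PRICING_BLOCKED contains only GPTBot, which is exactly the kind of exclusion worth confirming was a conscious decision. To test a live site, paste the content of your own robots.txt into the function instead of the sample.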

Frequently asked questions

If I rank well on Google, AI sees me too, right?

Not necessarily. Googlebot executes JavaScript, but most AI bots — GPTBot, ClaudeBot, PerplexityBot — do not. This is why a page can rank first on Google while being practically empty for ChatGPT or Claude. A good Google ranking is a necessary, but not sufficient, condition for AI visibility.

Do AI bots render JavaScript in 2026?

Search-oriented AI bots — GPTBot, ClaudeBot, PerplexityBot — generally do not. They only download the raw HTML. Vercel and MERJ’s study covering more than half a billion GPTBot requests found zero JavaScript execution. Bots do download JS files from time to time, but they do not execute them. Only Googlebot renders comprehensively, while Bing does so partly.

Do I really need an llms.txt file?

It is worth creating, but with realistic expectations. Search-oriented AI bots still largely ignore it today, and studies have not measured any detectable SEO advantage from it. Its real value lies in the agent-based web: coding and research AI agents — Cursor, Claude Code, Copilot — routinely request it. It is cheap, fast, and does no harm — but keep it fresh, because an outdated file is worse than none.

Why do AI bots not see my PDFs?

Text-based PDFs are generally readable. The problem is with scanned, image-based PDFs and encrypted or password-protected documents: text can only be extracted from the former using error-prone OCR, while from the latter it cannot be extracted at all. Simple test: if you cannot select the PDF text with your mouse, the bot cannot read it either.

What is “technical data debt”?

It is the accumulated, invisible technical burden — JavaScript waiting to be rendered, locked documents, misconfigured access — that makes otherwise valuable content inaccessible to generative search engines. It is not a creative problem, but an engineering problem, and therefore it can be fixed at the system level.

The content is not missing. Visibility is.

The good news is that technical data debt can be assessed and paid back — often much faster than writing another twenty articles. A thorough AI visibility audit shows exactly where your content falls out of the bots’ field of view.

Data sources (2026): Vercel & MERJ crawler analysis (JavaScript rendering); SE Ranking llms.txt study of 300,000 domains; Limy AI bot traffic analysis; and independent technical SEO audits. The values are informational and may change as AI crawler behavior evolves.

