dizzysoft

web development for search engine optimization

Crawler Record Plugin for WordPress

Before your customers can find your website in search results, the search engine needs to be able to see your website. This is true whether your customers are searching in an LLM (such as ChatGPT, Claude, or AI Mode) or a traditional search engine. When was the last time one of these systems viewed your website? That’s what this plugin can tell you.

This plugin has not yet been released to the public.

What is a User Agent and why should I care?

A User Agent is any piece of software that accesses your site, such as an individual's web browser or a search engine spider. Each User Agent sends a unique identifier (a string in the request headers) telling the web host (or CDN) who it is.

If a User Agent has viewed your website, it has the potential to add your site to its index. Once you're in that system's index, it can draw upon what it's discovered from the pages on your site and potentially show your site to people looking for what you have to offer.

Of course, a crawl (a bot viewing your pages) doesn't mean these systems will show your site, only that they can access it. Getting them to show your site is called "search engine optimization" (SEO) and involves much more than this plugin can do for you. What this data demonstrates is that the first step of SEO, being accessible, is possible.

Since the goal of this plugin is to identify search and LLM spiders, it only checks for these particular User Agents:

  • Googlebot (including AI Mode, AI Overviews, and Gemini)
  • Bingbot (including Copilot)
  • OpenAI
  • Claude (from Anthropic)
  • Perplexity
  • DuckDuckGo (including their AI system)
  • dizzysoft bot (my own bot, which I created to test my tools). I only run this on demand.

What this tells you about your site

  1. The last time a particular User Agent visited your website. If it has accessed your site, you might be in that tool's index. This plugin only tracks bots after you've installed it, so you might have to wait several days to see any information.
  2. If a particular User Agent has not visited your website, check your robots.txt file or robots meta tag to make sure your site allows that User Agent. This plugin will inform you if something on your site is preventing the agent from accessing it, but it cannot determine whether your host or CDN is blocking the access. NOTE: This tool does not check any llms.txt file, since no User Agents in search or LLMs actually obey it.
  3. The last page that the User Agent visited on your website.
  4. When you view a particular page of your site, this plugin will tell you the last time each of these agents viewed that specific page.
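The robots.txt check mentioned in item 2 can be sketched with Python's standard-library parser (this is an illustration, not the plugin's implementation; the rules and bot names are hypothetical):

```python
# Minimal sketch of a robots.txt check using Python's standard library.
# The rules below are a hypothetical example, not a recommendation.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-agent: GPTBot",   # this example blocks one LLM crawler...
    "Disallow: /",
    "",
    "User-agent: *",        # ...while allowing everyone else
    "Allow: /",
])

print(rp.can_fetch("GPTBot", "https://example.com/page"))     # blocked
print(rp.can_fetch("Googlebot", "https://example.com/page"))  # allowed
```

In practice you'd fetch the live file with `rp.set_url(...)` and `rp.read()` rather than parsing an inline list.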

What this doesn’t tell you about your site

  1. That your site is "optimized" for SEO, LLMs, GEO, AIO, SAO, AEO (or whatever you want to call it). Just because a platform's User Agent has visited a page on your site does not mean that page is in its index, only that it could be. If the page is not in the index, you likely need to improve it.
  2. That people are visiting your website. This information reflects search engines and chatbots looking for opportunities to learn about the information on your site so they can send people to it. Just because a bot came, however, doesn't mean people will. That's what Google Analytics is for. The only exception: some bots (ChatGPT-User and Claude-User) are triggered by a person's request, so their visits are closely connected to people visiting.
  3. Whether your host or CDN is preventing a User Agent from accessing your site; this plugin only shows whether or not an agent has accessed it. If it's been a while (weeks) since one of these agents accessed your site (and there's nothing in your robots.txt/meta preventing it), you should investigate further.

How is this different from Google Search Console or Bing Webmaster Tools?

If I’m concerned with how Google and Bing are viewing my site, that’s where I would turn before using this plugin. Those two tools (Search Console and Bing Webmaster Tools) are excellent, free tools you should use.

For now, however, LLMs do not have this type of data available to website owners. This plugin hopes to fill that gap.

Can’t you get this information (and more) from your web server’s logs?

Yes, you can. Unfortunately, most web hosts that support WordPress don't allow access to those logs. This plugin stands in for that missing information. However, if you can get into your server's logs, you'll learn a lot more than this plugin can tell you!
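As an illustration of what a server log offers, here's a Python sketch that pulls a crawler visit out of a single combined-format access-log line (the sample line, tokens, and regex are assumptions about a typical Apache/Nginx setup, not anything this plugin does):

```python
# Hedged sketch: extract crawler visits from one line of a combined-format
# access log. The log line and bot tokens below are illustrative.
import re

BOT_TOKENS = ("Googlebot", "bingbot", "GPTBot", "ClaudeBot", "PerplexityBot")

# A sample combined-format line: IP, date, request, status, size, referrer, UA
line = ('66.249.66.1 - - [12/May/2025:10:15:32 +0000] "GET /about/ HTTP/1.1" '
        '200 5123 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; '
        '+http://www.google.com/bot.html)"')

match = re.match(
    r'\S+ \S+ \S+ \[([^\]]+)\] "(?:GET|POST|HEAD) (\S+)[^"]*" '
    r'\d+ \S+ "[^"]*" "([^"]*)"',
    line,
)
if match:
    when, path, ua = match.groups()
    for token in BOT_TOKENS:
        if token in ua:
            print(f"{token} fetched {path} at {when}")
```

Looping the same match over a whole log file gives you the full visit history, not just the most recent hit per agent.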

How you might use this plugin:

  1. Are search systems able to view your website? If so, great: you're capable of being served by a search engine or LLM, provided the pages on your website are deemed good enough quality. If not, you need to fix the problem.
  2. When was the last time a search engine visited your website? This information is valuable after launching a new website, to learn how quickly it is going to get re-indexed.
  3. Are search systems consistently viewing your website? How often should they visit? It depends: more popular sites are visited more frequently. If a specific User Agent hasn't visited in over a month, you might look into it.
  4. Are search systems that you don't want viewing your website visiting it anyway? For example, websites that run ads (and generate revenue from page views) may not want a User Agent from an LLM system to index their content. Unfortunately, it's known that some LLM agents come anyway.
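If you fall into that last camp, the standard (if not always honored) opt-out is robots.txt. A hypothetical example, using OpenAI's GPTBot and Anthropic's ClaudeBot as stand-ins for crawlers you'd rather keep out:

```
# Illustrative robots.txt rules: ask two LLM crawlers to stay out while
# leaving traditional search bots alone. As noted above, not every agent
# obeys these directives.
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
```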

Plugin Features:

  1. From the admin, in the left column, select “Crawler Record” to learn:
    1. The last time and last page each of these bots/user agents has accessed.
    2. If a robot directive is preventing one (or all) of these agents from accessing your site.
    3. Click on each user agent's manager (the company that runs the bot) to learn more about these bots.
  2. When logged in, visit any page and learn:
    1. If you’re viewing a page or post from the front-end, hover over “Crawler Record” to see all managers of relevant user agents. Hover over a manager to learn the last time each of these specific user agents accessed this particular page.
    2. Edit the page and you will be able to access a list of all monitored user agents, sorted by the last time they accessed this particular page or post. From this, you will also learn if any robot directives are preventing one of these bots from accessing this specific page.
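Conceptually, the record behind "Last Seen" and "Last Page" can be pictured as a small per-agent map, updated on each matching request. This Python sketch is an assumption about the general idea, not the plugin's actual storage or code:

```python
# Conceptual sketch (not the plugin's implementation) of the data behind
# "Last Seen" and "Last Page": one entry per tracked agent, overwritten
# on each matching request.
from datetime import datetime, timezone

crawler_record: dict[str, dict[str, str]] = {}

def record_visit(agent: str, page: str) -> None:
    """Store the most recent visit time and page for this agent."""
    crawler_record[agent] = {
        "last_seen": datetime.now(timezone.utc).isoformat(),
        "last_page": page,
    }

record_visit("Googlebot", "/about/")
record_visit("GPTBot", "/blog/hello-world/")
print(crawler_record["Googlebot"]["last_page"])  # -> /about/
```

Because only the latest visit is kept, the record is tiny, which is exactly why it can't tell you anything about visits before the plugin was installed.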

Please note:

  • The information about each user agent (both "Last Seen" and "Last Page") is only valid from when you installed this plugin. It cannot determine visits from any user agent before that time.
  • This plugin detects whether or not your WordPress install is blocking one of these agents, but not whether your host or CDN is blocking the user agent.

Need Help?

If you need support for technical difficulties in the plugin, please reach out through the plugin page on WordPress.org (accessible through the plugin).

If you need help understanding why bots aren’t coming to your site- or optimizing your site for search engines or LLM chats, please reach out to my agency: Reliable Acorn has expertise in search engine optimization.
