Where Is Your Site Ranked In Terms of Google Crawl Rate?

Filed Under (SEOmeter Announcements) by SEOmaster on February 10, 2008

SEO Meter now monitors more than 1,000 websites’ crawling activities, and the number is still growing as of this writing. These websites include many high profile commercial/non-commercial websites manually added by us (30%), as well as user-submitted websites and blogs (70%).

The average crawl cycle of these 1,000 or so websites comes out at 2.19 days. So on average, Google updates its cache for these websites roughly once every two days. Comparing your own stat against this average can give you a rough idea of where your website ranks in terms of Google’s crawling activity.

For those folks who are more into statistics, the following scary-looking figure can give you a more accurate way of figuring out your (relative) crawl rank on the web.

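[Figure: cdf.gif. CDF of Google’s crawl cycle across the monitored websites; x-axis: crawl cycle, y-axis: relative crawl ranking.]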

The red curve in the figure is the so-called cumulative distribution function (CDF) of Google’s crawl cycle, obtained from our listed websites. Putting aside the nitty-gritty details of CDFs, this plot basically tells you your relative crawl ranking (shown on the y-axis) given your crawl cycle (shown on the x-axis). For example, suppose your crawl cycle is 1.0. Then check the y-value of the curve at x=1.0, which is approximately 0.15. This means your site belongs to the top 15%. Suppose instead your crawl cycle is 3.0. Then from the figure, you can see that your site belongs to the top 75% (i.e., y=0.75 at x=3.0). Hope I did a good job deciphering this stuff.
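
If you would like to try the same lookup on your own numbers, here is a minimal Python sketch of an empirical CDF. The values in `sample_cycles` are made up for illustration and are not SEO Meter data, and `crawl_rank` is just a hypothetical helper name.

```python
from bisect import bisect_right


def crawl_rank(cycles, my_cycle):
    """Return the fraction of sites whose crawl cycle is <= my_cycle.

    This is the empirical CDF evaluated at my_cycle, i.e. the y-value
    you would read off the curve at x = my_cycle.
    """
    ordered = sorted(cycles)
    if not ordered:
        raise ValueError("need at least one crawl cycle")
    return bisect_right(ordered, my_cycle) / len(ordered)


# Made-up crawl cycles (in days) for ten hypothetical sites.
sample_cycles = [0.5, 1.0, 1.5, 2.0, 2.0, 2.5, 3.0, 3.5, 4.0, 6.0]

print(crawl_rank(sample_cycles, 1.0))  # share of sites crawled at least this often
print(crawl_rank(sample_cycles, 3.0))
```

A smaller result means a better rank: a value of 0.2 would put a site in the top 20% of this made-up sample.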

One caveat about this CDF is that it was generated from the 1,000 or so websites we monitor, so the resulting ranking distribution may not be an accurate snapshot of the entire web. Only Google knows the accurate figures. :) Anyway, it will be interesting to see how the stats change as our database grows over time.


So for now, feel free to use this toy metric to infer your relative crawl rank on the web.

You don’t know the crawl cycle of your website? Then feel free to add your site to SEO Meter for free, and start to have your crawl cycle monitored by us.

Blogger Responses to SEO Meter

Filed Under (SEOmeter Announcements) by SEOmaster on January 29, 2008

It has been about a month since we opened SEO Meter to the public. So far, I’ve received many positive and useful comments about the tool. Here is some of what bloggers have said about it:

Thumbs up:

All and all SEOmeter is a pretty cool website. I use compete.com a lot so I know I will be using this website a lot too.

SEO Meter is a brilliant name for a brand and I can see this site growing into a very valuable service.

This is the kind of site that I would love to start. It has a good idea, feels a need, and is very brandable.

I love automation of process. All of this information about a site can be obtained by doing a search or looking into Google Webmater tools, but wouldn’t be easier to just set it and forget it? In this case I say yes.

If you run a blog, news site, or very large site, then the fee will probably become worth it…. All in all. Pretty gosh darn neat if you ask me.

I’m just more fascinated on graphs and how much the trends change over time. This gives you different things to check your experiments. Maybe you start posting more or less. It could affect your search engine referrals.

The SEOMeter.com site is easy to use and it explains in greater detail all of the benefits of using this type of Search Engine Optimization Tool.

I must say, it’s a very useful tool, one that every webmaster on earth should take advantage of…Some newer webmasters might not see the need for it for a while, but it’s nice to know that there is something out there to do this efficiently for you.

Seometer is a valuable addition to any blogger or sites owners toolkit and with its ease of use something that won’t take more than a few minutes to get up and running with.

Thumbs down:

Is the system foolproof and 100% accurate? It shows that ReadWriteWeb is visited more often than Engadget, while Engadget is a lot more popular.

after a blog reaches a certain point (becomes really popular), the graphing loses value as every single blogs graph would eventually plateau at the same point

you can have your very own Google Crawl Cycle chart but I’m afraid all this did was make me yawn. “Not more stats to pour over,” I thought.

The graph provided by the SEOmeter SEO tool looks an awful lot like the graphs that you see on Alexa.

It is ridiculous to charge price for a service that is completely dependent on Google’s crawling policy and if google, like yahoo and msn, suddenly decides to “not publicise” their crawl timestamp, this service will just go bust.

Suggestions:

Maybe a paid version were it tracks your spider traffic accurately by putting code on your site or server.

Ok the first thing I would do is upgrade the method used to determine Google’s crawl rate… I would use the Statcounter approach, giving people a script or a file to place on their site. They would then come back to SEO Meter to log in and check the results. This would be a lot more accurate method to test the rate and would bring more repeat traffic back to SEO Meter.

I’ll get back to some of these thumbs down comments and suggestions later. :)

SEOmeter Is a Free Tool Now!

Filed Under (SEOmeter Announcements) by SEOmaster on January 10, 2008

After receiving a good deal of feedback from our users, we decided to make one important change to our tool: making SEOmeter free for everyone!

There are a couple of points that I’d like to highlight for our new free SEOmeter tool:

  • Google’s cache history is limited to the latest three months. Any history older than three months will be discarded and won’t be accessible.
  • Free submission is limited to top-level domains or sub-domains only. Internal URL submission is still subject to annual payment.
  • Websites under construction, or with incomplete content or poor design, will not be accepted.

If you have any question or suggestion, feel free to add it here.

If you have a website to submit, get onboard! :)

“Noarchive” Meta Tag To Disable Search Engine Caching

Filed Under (Search Engine Crawling) by SEOmaster on January 01, 2008

The “noarchive” meta tag is used when you want to prevent a search engine from caching your pages, or to have existing cached copies removed. This meta tag is known to work for all major search engines, including Google, Yahoo, MSN and Ask.com. It is also known that the “noarchive” meta tag does NOT affect your search engine rankings or indexing; it only determines whether or not search engines will cache the crawled content.

The “noarchive” meta tag can be used when a website publisher charges a fee for access to its content and thus wants to prevent content theft, but would still like the content to be indexed and ranked by search engines. It’s also useful when the content changes frequently and it’s not desirable to keep outdated, stale content cached by search engines for human access. This meta tag is sometimes exploited by blackhat SEOs to hide their cloaking techniques.

Sites that currently use the “noarchive” meta tag to protect against search engine caching include:

  • http://www.webmasterworld.com
  • http://www.nytimes.com

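These sites are indexed in Google but not cached because they carry the “noarchive” meta tag for all robots. The tag looks like this (the first form targets Google’s crawler only, the second targets all robots):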

<META NAME="GOOGLEBOT" CONTENT="NOARCHIVE">
<META NAME="ROBOTS" CONTENT="NOARCHIVE">
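
If you want to check whether a page carries this tag, here is a rough Python sketch using only the standard library. The names `NoarchiveFinder` and `noarchive_targets` are hypothetical, and the sample page is made up; this only inspects the HTML and says nothing about how any particular search engine actually treats it.

```python
from html.parser import HTMLParser


class NoarchiveFinder(HTMLParser):
    """Collect the NAME of every meta tag whose CONTENT lists 'noarchive'."""

    def __init__(self):
        super().__init__()
        self.targets = []  # e.g. ["googlebot", "robots"]

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser already lowercases tag and attribute names.
        attr = {name: (value or "") for name, value in attrs}
        directives = [d.strip().lower() for d in attr.get("content", "").split(",")]
        if "noarchive" in directives:
            self.targets.append(attr.get("name", "").lower())


def noarchive_targets(html):
    """Return the robot names for which the page requests no caching."""
    finder = NoarchiveFinder()
    finder.feed(html)
    return finder.targets


# Made-up sample page containing both forms of the tag.
page = """<html><head>
<META NAME="GOOGLEBOT" CONTENT="NOARCHIVE">
<META NAME="ROBOTS" CONTENT="NOARCHIVE">
</head><body>Hello</body></html>"""

print(noarchive_targets(page))
```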

SEOmeter Is Now Accepting 100 Websites For Free

Filed Under (SEOmeter Announcements) by SEOmaster on December 29, 2007

To celebrate the launch of SEOmeter.com, we are accepting 100 websites into our monitoring system for free. Yes, you heard it right. Free admission with no strings attached! To be eligible for free admission, you should add a comment to this blog post and include your site URL in the comment.

Here are other requirements for free admission:

  1. Only one site is allowed for each IP address. If there are multiple entries for a given IP, the one mentioned in the earliest comment will be accepted, and the rest will be automatically rejected.
  2. Only top-level domains or sub-domains are allowed. No internal page or folder is allowed.
  3. The site should have been cached by Google within the two weeks before your comment. Please check with Google by using the cache: command.

This post/offer will be closed once 100 eligible entries have been chosen.

Top-20 Most Crawled Sites

Filed Under (SEOmeter Announcements) by SEOmaster on December 28, 2007

We have made some internal changes to our database. All the websites we monitor are now categorized by topic internally; the classification is made by our editor, not by the submitter. On the top-20 page, we then list the top-20 sites in each category in terms of their average crawl cycles. The top-20 rankers in each category are automatically updated as their stats change over time. The top-20 list will give you an idea of what kinds of sites are favored by search engines in each category. You should be proud of yourself if your site makes the list. :)

What Determines Search Engine’s Crawl Rate?

Filed Under (Search Engine Crawling) by SEOmaster on December 27, 2007

The web is an ever-growing and dynamically changing world. Given the gigantic scale and dynamic nature of the web, it’s a nearly insurmountable task to maintain an up-to-date search engine index for the entire web. Therefore, one of the most important tasks of a search engine is to determine which parts of the web are important and thus worth crawling more often, and to reflect any updated content in its index in a timely manner. From a webmaster’s point of view, our mission is then to become a search engine’s favorite, and to make search engine robots visit our site more often than others. Here is a (non-exhaustive) list of factors that can affect a search engine’s crawl rate.

1. Relevant and Authoritative Backlinks
It’s a well-known fact that backlinks help major search engines’ crawlers find your site and can give your site greater visibility in their search results. Links from relevant content and authoritative sources, in particular, are considered a more powerful vote by search engines, and are therefore more likely to bring search engine robots to your website. Submitting your site to reputable and well-categorized web directories or major social networking sites helps your site get more exposure to crawlers.

2. Content Update and Pinging

Regular and frequent content updates are another important factor that attracts search engine robots. For example, the purpose of Google’s fresh crawl is to detect content updates and reflect the changes in the search engine results immediately.

If your site is a blog, you can try existing pinging services such as pingomatic.com or Google’s Blog Search pinging service to proactively inform search engine robots of new posts and content changes.
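
Most blog software sends these pings for you, but as a rough illustration, here is what a ping request could look like in Python. It assumes the service accepts the common `weblogUpdates.ping` XML-RPC call; the endpoint URL in the comments, the blog name and the blog URL are all assumptions for the example, so check the pinging service’s own documentation before relying on them. The helper name `build_ping_request` is hypothetical.

```python
import xmlrpc.client


def build_ping_request(blog_name, blog_url):
    """Return the XML-RPC request body for a weblogUpdates.ping call."""
    return xmlrpc.client.dumps((blog_name, blog_url), methodname="weblogUpdates.ping")


# Placeholder blog details, for illustration only.
body = build_ping_request("My Blog", "http://www.example.com/")
print(body)

# Actually sending the ping needs network access. The endpoint below is an
# assumption; replace it with the one documented by your pinging service.
# server = xmlrpc.client.ServerProxy("http://rpc.pingomatic.com/")
# print(server.weblogUpdates.ping("My Blog", "http://www.example.com/"))
```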

3. Internal Link Structure
Another factor that affects search engine’s crawling rate is how the current page of a website is linked from other pages within the same website domain. Search engines determine the relative importance of the current page on a website based on the site’s overall internal link structure. Pages that are heavily linked to internally (e.g., site-wide pages) are considered important by search engines, and therefore receive more frequent visits from spiders.
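
To get a feel for which of your pages are “heavily linked to internally”, you can simply count internal in-links. The sketch below does that for a tiny made-up site. A raw in-link count is only a crude proxy chosen for illustration; how search engines actually weigh internal links is not public, and `count_internal_inlinks` is a hypothetical helper.

```python
from collections import Counter


def count_internal_inlinks(site_links):
    """Count how many distinct pages on the site link to each page.

    site_links maps a page path to the list of internal paths it links to.
    """
    inlinks = Counter()
    for source, targets in site_links.items():
        for target in set(targets):   # count each source page once per target
            if target != source:      # ignore self-links
                inlinks[target] += 1
    return inlinks


# Made-up link structure: every page links to /contact (a site-wide link).
site_links = {
    "/": ["/about", "/blog", "/contact"],
    "/about": ["/", "/contact"],
    "/blog": ["/", "/blog/post-1", "/contact"],
    "/blog/post-1": ["/", "/blog", "/contact"],
}

for page, count in count_internal_inlinks(site_links).most_common():
    print(page, count)
```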

4. Sitemap and Robots.txt

Creating a search engine sitemap for your site helps your site get indexed more deeply as well as more frequently. With Google, you can create an XML- or TXT-formatted sitemap and submit it through your Google webmaster tools account. A typical sitemap contains a list of URLs for crawlers to retrieve. If the sitemap is formatted in XML, you can specify extra information for crawlers, such as the frequency of content change, the last modification date, or the relative importance of a page.
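
As a rough sketch, an XML sitemap with those extra fields can be generated in a few lines of Python. The element names (`urlset`, `url`, `loc`, `lastmod`, `changefreq`, `priority`) follow the sitemaps.org protocol; the URLs, dates and values are placeholders, and `build_sitemap` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build a sitemap XML string from (loc, lastmod, changefreq, priority) tuples."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod, changefreq, priority in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc                # page URL
        ET.SubElement(url, "lastmod").text = lastmod        # last modification date
        ET.SubElement(url, "changefreq").text = changefreq  # how often it changes
        ET.SubElement(url, "priority").text = priority      # relative importance
    return ET.tostring(urlset, encoding="unicode")


# Placeholder entries for illustration.
entries = [
    ("http://www.example.com/", "2008-02-10", "daily", "1.0"),
    ("http://www.example.com/about", "2008-01-01", "monthly", "0.5"),
]

sitemap_xml = build_sitemap(entries)
print(sitemap_xml)
```

When saving the result to a file, add an XML declaration (`<?xml version="1.0" encoding="UTF-8"?>`) at the top.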

While a sitemap informs crawlers which pages to retrieve, robots.txt does the opposite. That is, robots.txt prevents spiders from retrieving all or part of your website that would otherwise be publicly accessible to humans. As webmasters become more SEO-savvy, they start to make use of robots.txt more actively (e.g., to eliminate duplicate content). But at the same time, this increases the chance that they will fumble robots.txt and unwittingly block search engine spiders. To prevent any costly mistake, always arm yourself with the up-to-date robots.txt syntax recommended by major search engines such as Google and Yahoo, and look out for Google’s crawl error reports.
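
One cheap safeguard is to test a robots.txt draft against the URLs you care about before uploading it. Here is a minimal sketch using Python’s standard `urllib.robotparser`. The robots.txt content and URLs are made up, `blocked_urls` is a hypothetical helper, and Python’s parser may not interpret every rule exactly as each search engine does, so treat the result as a sanity check only.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt, user_agent, urls):
    """Return the URLs that the given robots.txt text disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Made-up robots.txt that hides printer-friendly duplicates and a private area.
robots_txt = """\
User-agent: *
Disallow: /print/
Disallow: /private/
"""

urls = [
    "http://www.example.com/",
    "http://www.example.com/print/post-1",
    "http://www.example.com/blog/post-1",
]

print(blocked_urls(robots_txt, "Googlebot", urls))
```

If a page you want crawled shows up in the output, the draft is blocking more than you intended.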

5. Server Speed

So as not to interfere with a search engine’s crawling, the web server where your site is hosted should respond to requests in a reasonable time. A fast response time offers visitors a good surfing experience, and the same logic applies to search engine robots as well. Given that a search engine’s primary role is to provide users with a good search experience, having your website hosted on a fast web server helps your site get indexed faster and updated more frequently by search engines.

6. Set Crawl Rate Feature in Google Webmaster Tools

In your Google webmaster account, you can choose among three crawl speeds for your website: faster, normal, and slower. The set crawl rate option is available only for top-level domains or sub-domains, not for internal pages or folders. A requested crawl rate needs to be renewed every 90 days. However, it’s reported that this feature does not guarantee an immediate effect on Google’s crawl rate.

Website Submission and AJAX Search Are Available

Filed Under (SEOmeter Announcements) by SEOmaster on December 22, 2007

Two updates from SEOmeter.com:

First, a website submission page has been added. We decided to charge an annual (but modest) fee for each site submitted for monitoring. The reason for introducing the annual fee is to cover the cost of our server and database resources, which will increase over time as we monitor more sites.

Second, an AJAX comparison menu has been added to the site information page (see the screenshot below). You can enter two more URLs in the comparison form to compare the current website against two other sites in terms of crawling history.

[Screenshot: ajax.jpg. The AJAX comparison form on the site information page.]

There was one known issue with the AJAX comparison form. For whatever reason, AJAX refused to retrieve results on any page served through mod_rewrite. So for example, the comparison form worked fine on http://www.seometer.com/index.php?search=google.com, but not on http://www.seometer.com/cache/google.com . Update: the issue has been fixed.

SEOmeter.com Open To The Public!

Filed Under (SEOmeter Announcements) by SEOmaster on December 20, 2007

SEOmeter.com is open to the public. Our site is still in beta, and we are still working to make it fully functional, including the website submission procedure. In the meantime, you can still get a glimpse of our site. We will get back to our blog in a few days and describe in more detail what our site is about and what motivated us to develop SEOmeter.com.
Please tune in!