What I learnt today - robots.txt parsing order

For those looking at SEO conundrums.

So what I learnt today is that if you do:

#

User-agent: *
Disallow: /a
Disallow: /b

User-agent: Googlebot
Disallow: /c

#

Google’s search crawler will ignore the lines for User-agent: * and only read the ones addressed specifically to it. So /a and /b get crawled and /c does not.

However:

#

User-agent: Googlebot
Disallow: /c

User-agent: *
Disallow: /a
Disallow: /b

#

Then Google seemed to correctly avoid spidering /a, /b and /c.

I could find no mention of this ordering rule anywhere, though.

* UPDATE *

OK, I’ve since noted that Google does actually document this; moreover, if you have a bot-specific block then that bot will ignore the rules meant for all bots.
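To poke at that rule without waiting for a crawl, here is a minimal sketch using Python's standard-library urllib.robotparser. It is not Google's parser, so it only illustrates the "most specific block wins" idea; the agent name SomeOtherBot and the helper allowed() are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# The two orderings from this post, one directive per line.
GENERIC_FIRST = """\
User-agent: *
Disallow: /a
Disallow: /b

User-agent: Googlebot
Disallow: /c
"""

SPECIFIC_FIRST = """\
User-agent: Googlebot
Disallow: /c

User-agent: *
Disallow: /a
Disallow: /b
"""


def allowed(robots_txt, agent, path):
    """Return True if `agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


for text in (GENERIC_FIRST, SPECIFIC_FIRST):
    # A bot with its own block only obeys that block.
    print("Googlebot   ", [allowed(text, "Googlebot", p) for p in ("/a", "/b", "/c")])
    # A bot without its own block falls back to the * block.
    print("SomeOtherBot", [allowed(text, "SomeOtherBot", p) for p in ("/a", "/b", "/c")])
```

With this parser both orderings give the same answer: Googlebot may fetch /a and /b but not /c, and the unnamed bot is the other way round.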

So I’ve had to go and paste the shared rules into each and every bot-specific section.
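Pasting by hand gets old quickly, so here is a rough sketch of generating the file instead. The names SHARED_RULES, BOT_RULES and build_robots_txt() are all invented for illustration, and the paths are just the /a, /b, /c placeholders from above.

```python
# Rules every crawler should obey, and extra rules per named crawler.
SHARED_RULES = ["/a", "/b"]
BOT_RULES = {"Googlebot": ["/c"]}


def _group(agent, paths):
    """Build one user-agent block with no blank lines inside it."""
    lines = [f"User-agent: {agent}"]
    lines += [f"Disallow: {path}" for path in paths]
    return "\n".join(lines)


def build_robots_txt(shared, per_bot):
    """Repeat the shared rules in every bot-specific block."""
    groups = [_group("*", shared)]
    for agent, extra in per_bot.items():
        groups.append(_group(agent, shared + extra))
    # One blank line between blocks, none within a block.
    return "\n\n".join(groups) + "\n"


print(build_robots_txt(SHARED_RULES, BOT_RULES))
```

Each named bot ends up with the shared rules plus its own, so it no longer matters that it skips the * block.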

But what I have learnt today (1 day later) is that if you leave blank lines inside a user-agent block then some engines (Yandex, for one) will disregard that instruction. I also had fun reading translated Russian webmaster guidelines.
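I can't run Yandex's parser, but Python's standard-library urllib.robotparser also treats a blank line as the end of a block, so it makes a handy stand-in for seeing the effect. This is only a sketch of that one parser's behaviour, not a claim about how Yandex implements it; can_fetch_as_yandex() is a made-up helper.

```python
from urllib.robotparser import RobotFileParser

# A stray blank line in the middle of the block.
WITH_BLANK_LINE = """\
User-agent: Yandex
Disallow: /a

Disallow: /b
"""

# The same rules with the blank line removed.
WITHOUT_BLANK_LINE = """\
User-agent: Yandex
Disallow: /a
Disallow: /b
"""


def can_fetch_as_yandex(robots_txt, path):
    """Ask Python's parser whether the agent 'Yandex' may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("Yandex", path)


# The rule after the blank line is orphaned, so /b is not blocked.
print(can_fetch_as_yandex(WITH_BLANK_LINE, "/a"), can_fetch_as_yandex(WITH_BLANK_LINE, "/b"))
# With the blank line gone, both paths are blocked.
print(can_fetch_as_yandex(WITHOUT_BLANK_LINE, "/a"), can_fetch_as_yandex(WITHOUT_BLANK_LINE, "/b"))
```

In this parser the Disallow after the blank line no longer belongs to any block and is silently dropped, which is the kind of surprise worth checking for before uploading.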


About this Archive

This page is an archive of entries from April 2010 listed from newest to oldest.
