Use of the robots.txt file
A “robots.txt” file tells search engines which parts of your site they
may access and therefore crawl. The file must be named “robots.txt” and
placed in the root directory of your site. It’s worth having a
robots.txt file present even if it restricts nothing, because some
crawlers behave conservatively and may delay or skip crawling a site
when their request for robots.txt fails.
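To see how crawlers interpret the file, Python's standard urllib.robotparser module can parse robots.txt rules and answer can_fetch queries. The rules and URLs below are hypothetical examples, fed to the parser directly rather than fetched from a live site:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules, parsed directly instead of fetched
rules = """
User-agent: *
Disallow: /private/
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# A URL outside the disallowed path is crawlable; one inside it is not
print(parser.can_fetch("*", "https://example.com/index.html"))  # True
print(parser.can_fetch("*", "https://example.com/private/a"))   # False
```

This mirrors the check a well-behaved bot performs before requesting each URL.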
Files and Directories
Be sure to declare which files and directories you don’t want robots
to crawl. Most bots recognize the common directives; visit Google’s
Webmaster tools for the list of supported directives.
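A minimal robots.txt along these lines might look as follows; the directory names are hypothetical examples:

```
# Allow all crawlers, but keep them out of non-public areas
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /private/

# Block one specific bot entirely (hypothetical bot name)
User-agent: BadBot
Disallow: /
```

Each User-agent group applies to the named crawler, and an empty Disallow (or no match) means the URL may be crawled.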
Session IDs
Allow search bots to crawl your site without session IDs or URL
parameters that track their path through the site. These techniques are
useful for tracking individual user behavior, but the access pattern of
bots is entirely different. Using them may result in incomplete
indexing of your site, because bots may not be able to tell that URLs
which look different actually point to the same page.
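If your site does append a session parameter to URLs, one option is to ask crawlers to skip those duplicate URLs in robots.txt. The parameter name below (sessionid) is a hypothetical example, and the * wildcard is an extension honored by major crawlers such as Googlebot rather than part of the original standard:

```
User-agent: *
# Skip any URL carrying a (hypothetical) session-ID query parameter
Disallow: /*?sessionid=
Disallow: /*&sessionid=
```

A cleaner long-term fix is to serve bots session-free URLs in the first place, so every page has a single canonical address.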