Bots, Scrapers, Harvesters And Crawlers – Tools Or Misuse?


Bots and Crawlers (on web hosted sites)

Bots run the gamut, so it’s hard to label them all as good or bad. For instance, search engines like Google rely on bots, known as spiders, to crawl the Internet and add your content to their indexes. If you want visitors to be able to find your site through Google, you will want to allow the search engine’s bots to access your site. On the other hand, your site might contain a few pages or a directory that you do not want indexed. In this case, you can edit your .htaccess file to block Google’s or Yahoo’s bots from reaching that content. You can only do this if your web hosting plan gives you access to this file.
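Another common way to express this kind of rule is a robots.txt file, which well-behaved crawlers consult before fetching pages. It is an alternative to .htaccess rules, not something this article's setup is known to use. The Python sketch below uses the standard library's urllib.robotparser to check which URLs a given bot may fetch under a hypothetical set of rules. The rules, the example.com URLs, and the is_allowed helper are illustrative assumptions.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: keep every crawler out of /private/,
# and keep Googlebot out of /drafts/ as well.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/
Disallow: /drafts/

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    checks = [
        ("Googlebot", "https://example.com/blog/post.html"),
        ("Googlebot", "https://example.com/drafts/post.html"),
        ("Bingbot", "https://example.com/drafts/post.html"),
        ("Bingbot", "https://example.com/private/notes.html"),
    ]
    for agent, url in checks:
        verdict = "allowed" if is_allowed(ROBOTS_TXT, agent, url) else "blocked"
        print(f"{agent:10} {url} -> {verdict}")
```

Keep in mind that robots.txt is advisory: reputable search engine bots honor it, but a scraper can simply ignore it.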

Scrapers and Harvesters

Scrapers and harvesters, by contrast, can seem a lot more nefarious, because they often violate copyrights. These tools crawl the Internet for content in order to display that content on other sites. Often, scrapers will grab content from RSS or Atom feeds, which have become commonplace. These feeds are a great tool for allowing your readers to access your content from all of their devices, without visiting your site in their browser. However, if you’ve ever discovered a site that is using your content word-for-word, it may be using a scraper.
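If you suspect a site is republishing your posts, a rough overlap check can help confirm it before you take action. The Python sketch below compares two pieces of text by the five-word sequences they share. The function names and the five-word window are illustrative assumptions, and fetching the suspect page's text is left out.

```python
import re


def shingles(text, size=5):
    """Return the set of overlapping word sequences (shingles) in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def copied_fraction(original, suspect, size=5):
    """Fraction of the original's shingles that also appear in the suspect text."""
    original_shingles = shingles(original, size)
    if not original_shingles:
        # Text shorter than one shingle: nothing to compare.
        return 0.0
    shared = original_shingles & shingles(suspect, size)
    return len(shared) / len(original_shingles)


if __name__ == "__main__":
    my_post = "Bots run the gamut, so it is hard to label them all as good or bad."
    scraped = "Sponsored links! " + my_post + " Click here for more."
    unrelated = "Tomato plants need steady watering and plenty of sun in spring."

    print("scraped page:  ", copied_fraction(my_post, scraped))
    print("unrelated page:", copied_fraction(my_post, unrelated))
```

A value near 1.0 suggests the page reproduces your text almost verbatim, while short quotes or coincidental phrases produce much lower values.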

Some site owners think they can scrape or harvest content from legitimate sites, install a free theme, throw up some ads and reap the benefits without any effort on their part. If you think it’s a pretty shady practice, you’re not alone. These scraper sites may even include a link back to your post, but it’s often not clear who wrote the content. Unsurprisingly, these sites often contain a lot of gibberish and may not even have an apparent theme or subject.

Sometimes legitimate sites use a type of scraper. For example, you may have signed up for a listing site for your blog’s niche, and this site may use your feed to show snippets of your recently added content with a link that allows visitors to view your site.
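To make the idea concrete, here is a minimal Python sketch of how a listing site might turn a feed into short snippets with links back to the source. The sample feed, the shorten and feed_snippets helpers, and the 40-character limit are all hypothetical. A real aggregator would fetch the feed over HTTP and handle Atom as well as RSS.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed with a single post.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>https://example.com/</link>
    <item>
      <title>First Post</title>
      <link>https://example.com/first-post</link>
      <description>This is the full text of the first post, which runs on for a while.</description>
    </item>
  </channel>
</rss>
"""


def shorten(text, max_chars):
    """Trim text to roughly max_chars, cutting at a word boundary and adding an ellipsis."""
    text = " ".join(text.split())  # collapse runs of whitespace
    if len(text) <= max_chars:
        return text
    return text[:max_chars].rsplit(" ", 1)[0] + "..."


def feed_snippets(feed_xml, max_chars=40):
    """Return a title, link, and short snippet for each item in an RSS feed."""
    root = ET.fromstring(feed_xml)
    snippets = []
    for item in root.iter("item"):
        snippets.append({
            "title": item.findtext("title", default="").strip(),
            "link": item.findtext("link", default="").strip(),
            "snippet": shorten(item.findtext("description", default=""), max_chars),
        })
    return snippets


if __name__ == "__main__":
    for entry in feed_snippets(SAMPLE_FEED):
        print(f"{entry['title']}: {entry['snippet']} ({entry['link']})")
```

Showing only a short excerpt with a visible link to the original is what separates this kind of listing from a site that republishes whole posts.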

What Can You Do About These Programs?

If you’ve noticed your content on other sites, you have a right to ask them to stop. Contacting the site owner is one of the first steps you can take. Several sites offer free cease and desist templates that you can use to get started.

If contacting the site owner is not possible, you may be able to contact the host, alerting them to the copyright infringement on their servers. A WHOIS search will often reveal who hosts the site. Hosts within the United States are bound to obey the Digital Millennium Copyright Act (DMCA), and you should be prepared to show evidence that the copyrighted content is yours.

Google has a dedicated process for reporting copyright infringement on its services, including Google Image Search and Blogger. Simply fill out the form to file your complaint. Similarly, Yahoo! and MSN have pages dedicated to copyright infringement, allowing you to request the removal of the infringing content from search results.

Web Hosting Help Lady

Juliana is a webmaster curating all the cool webmaster tools for all your Virtual Private Server needs. You can find her @InMotionHosting based in Los Angeles touting all her web-pro research on Twitter @JulianaPayson.
