What Is Content Scraping


by Tom

Content scraping is the practice of automatically copying content from websites, often through their RSS feeds, and republishing it on other websites or blogs.

I have seen content scraping happen on this blog as well. I installed a WordPress plugin called “RSS Footer” that adds an extra line of content to the articles in my RSS feed.

Now every article in the feed says “This post is originally posted at The Home Business Archive”, followed by a link back to my post. At least I get a link back :) Other than that, I can't think of a way to prevent content scraping.
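If you're curious what a footer like this does to the feed, here is a minimal Python sketch. It is not the RSS Footer plugin's actual code. It assumes a plain RSS 2.0 feed whose `<description>` elements carry HTML as escaped text, and the function name `add_rss_footer` and the footer wording are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical footer template; the real plugin's wording and markup may differ.
FOOTER = '<p>This post was originally posted at <a href="{link}">{site}</a>.</p>'


def add_rss_footer(feed_xml: str, site_name: str) -> str:
    """Append an attribution footer to every <item> description in an RSS 2.0 feed."""
    root = ET.fromstring(feed_xml)
    for item in root.iter("item"):
        # Each item's own permalink becomes the link back to the original post.
        link = item.findtext("link", default="")
        desc = item.find("description")
        if desc is None:
            desc = ET.SubElement(item, "description")
        # The description holds HTML as (escaped) text, so we append to that text.
        footer = FOOTER.format(link=escape(link), site=escape(site_name))
        desc.text = (desc.text or "") + footer
    return ET.tostring(root, encoding="unicode")
```

A scraper that republishes the feed verbatim carries the footer and its link along with the post, which is exactly why it only gets you a link back rather than stopping the copying.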

These bloggers discuss content scraping:

Why blog content scraping is a pain. Why you shouldn’t ignore it …
Why blog content scraping and plagiarism is a pain, why you shouldn’t ignore it & what to do. A look at a recent example of our logo blog getting ‘scraped’ and why this growing plagiarism trend is an issue in the design community.

Google Maps Scraping for Content « Online Radio / Podcast Network
Writing in his Understanding Google Maps and Local Search blog, Blumenthal takes a deep, technical look at Google’s new practice of scraping content from local news, review and event websites and applying that content to business …

Content Scraping: Is Someone Ripping Your Content AND Good Name?
Content scrapers are simple programs that scrape content on topic from sites or blogs for posting to the content scraper’s site. The sole purpose of content scrapers is to rip content, post it to a series of junk sites that are slathered with PPC and paid advertising, and make money on clickthroughs. …

Do you have any good ideas of how to prevent content scraping? Leave a comment.



5 comments

Zara from Zombie Halloween Costumes April 4, 2010 at 6:52 am

Hey, I'm just getting started with WordPress and obviously have a lot to learn. Thanks for the info about scraping. I'm going to read more on the blogs you've recommended.


Carlos from Engine Tuning April 30, 2010 at 6:02 am

You just have to insert a bot trap that logs and blocks IP addresses. Hide a link to the bot trap in your menu, then dynamically block any visitor that follows it from the rest of your site. This is also really effective at blocking spam comments and other malicious bot activity.
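Here's a minimal Python sketch of the bot-trap idea, written as a framework-neutral request check rather than a real plugin. The trap URL `/bot-trap/` and the `BotTrap` class are hypothetical. It also assumes (this part is not in the comment) that the trap URL is disallowed in robots.txt, so search engines that obey robots.txt never follow the hidden link and get themselves blocked.

```python
from dataclasses import dataclass, field

# Hypothetical trap URL: linked invisibly from the menu and (assumed) listed
# under "Disallow:" in robots.txt so well-behaved crawlers skip it.
TRAP_PATH = "/bot-trap/"


@dataclass
class BotTrap:
    blocked: set[str] = field(default_factory=set)               # banned IPs
    log: list[tuple[str, str]] = field(default_factory=list)     # (ip, path) hits

    def handle(self, ip: str, path: str) -> int:
        """Return the HTTP status code to send for this request."""
        if ip in self.blocked:
            return 403  # already caught: refuse every page on the site
        if path.startswith(TRAP_PATH):
            self.log.append((ip, path))  # record who followed the hidden link
            self.blocked.add(ip)         # block them from now on
            return 403
        return 200
```

Because the block is keyed on the IP rather than the trap page, the bot is shut out of the rest of the site as soon as it follows the link. In a real setup the blocked set would need to be persisted (a file, database, or firewall rule) instead of living in memory.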

It is also possible to block repeated requests from certain IPs. Most users only read 4-20 pages per day, while bots grab many more than that, so a simple log check with a whitelist for search engine bots works well.
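The per-IP limit can be sketched like this, again as a minimal Python example under stated assumptions. The limit of 50 pages per day is a hypothetical value chosen to sit well above the 4-20 pages the comment mentions, and the whitelist IP is a placeholder from the documentation range, not a real crawler address. Real search-engine bots should be verified (for example with a reverse DNS lookup) rather than trusted from a hard-coded list.

```python
from collections import defaultdict
from datetime import date

# Hypothetical ceiling: comfortably above the 4-20 pages a typical reader views.
DAILY_LIMIT = 50
# Placeholder whitelist (192.0.2.0/24 is a documentation range, not a real bot).
WHITELIST = {"192.0.2.10"}


class DailyPageLimiter:
    """Count page views per IP per day and refuse IPs that exceed the limit."""

    def __init__(self, limit: int = DAILY_LIMIT, whitelist=WHITELIST):
        self.limit = limit
        self.whitelist = set(whitelist)
        # (ip, day) -> pages requested; old days are never pruned in this sketch.
        self.counts: dict[tuple[str, date], int] = defaultdict(int)

    def allow(self, ip: str, day: date) -> bool:
        if ip in self.whitelist:
            return True  # whitelisted search engine bots are never counted
        key = (ip, day)
        self.counts[key] += 1
        return self.counts[key] <= self.limit
```

Keying the counter on the calendar day means the count resets at midnight, which mirrors "pages per day" from the comment; a sliding time window would be stricter but needs timestamps instead of dates.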


Emma from Scrap Car Collection May 16, 2010 at 1:44 pm

Is it true that changing the content on your website frequently can help you keep more pages indexed? I have recently launched a website and am in the process of getting it indexed, but the indexed pages seem to drop out.


Racheal from Kermit Costume June 28, 2010 at 10:53 am

Hmmm, I don't like the sound of content scraping. It sounds like a lot of websites would end up with a lot of duplicate content, and their value would decrease massively…


Heather from H Miracle July 2, 2010 at 9:49 am

The websites that have the best long-term success are always those built around solid content, not scraped content. A lot of people with low-quality websites have seen their rankings dive with recent changes to the Google algorithm; quality websites haven't had any problems, however…

