Wednesday 19 November 2014

Web Scraping for SEO with these Open-Source Scrapers

When conducting Search Engine Optimization (SEO), we’re required to scrape websites for data, our campaigns, and reports for our clients. At the lowest level we utilize scraping to keep track of rankings on search engines like Google, Bing, and Yahoo, even keep a track of links on websites to know when it’s completed its lifespan. Then we’ve used them to help us aggregate data from APIs, RSS feeds, and websites to conduct some of our data mining to find patterns to help us become more competitive. 

So scraping is a function majority of companies (SEOmoz, Raventools, and Google) have to do to either save money, protect intellectual property, track trends, etc… Businesses can find infinite uses with scraping tools, it just depends if you’re an printed circuit board manufacturer looking for ideas on your e-mail marketing campaign or a Orange County based business trying to keep an eye out on the competition. which is why we’ve created a comprehensive list of open source scrapers out there to help all the businesses out there. Just keep in mind we haven’t used all of them!

Words of caution, web scrapers require knowledge specific to the language such as PHP & cURL. Take into considerations issues like cookie management, fault tolerance, organizing the data properly, not crashing the website being scraped, and making sure the website doesn’t prohibit scraping.

If you’re ready, here’s the list…

Erlang

    eBot

Java

    Heritrix
    Nutch
    Piggy Bank
    WebSPHINX
    WebHarvest

PHP

    PHPCrawl
    Snoopy
    SpiderMonkey

Python

    BeautifulSoap
    HarvestMan
    Scrape.py
    Scrapemark
    Scrapy **
    Mechanize

Ruby

    Anemone
    scRUBYt

We’ll come back and update this list as we encounter more! If you would like to submit a solution we missed, feel free. Also we’re looking for guides related to each of these, so if you know of any or would be interested in guesting blogging about one, let us know!

Source:http://www.annexcore.com/blog/web-scraping-for-seo-with-these-open-source-scrapers/

No comments:

Post a Comment