Data Page Content Scraping

Published on: 14-May 08:01am

I shared this in the RFR group last week. It scrapes the Title, Description, H1, and text from each page. If you want to scrape H2 through H6, just copy the H1 formula into a new column and change h1 to h2, etc. I have scraped 100 URLs at once before; it just takes a bit longer. Make a copy to your Google Drive.

 

It's not much different from Andrew's share, except you get the text from the page as well. I made it when Alan Smith (Thank you SO MUCH for all the ideas and information, my thinking has expanded a thousandfold) started talking about stuffing images, last September, I think.

 

You add the links you want to scrape. I search Google for keywords, use LinkClump to grab the URLs, add them to the sheet, and that's it.

 

This is a basic scraper in Google Sheets that I made a while ago. It's not perfect, but it is simple. Add URLs to the URL column and that's it. Make sure to delete what is there first.

 

It scrapes the on-page Title, Description, H1, and text. The downside is that it usually scrapes all the text on the page, so the output has to be cleaned up.
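
For anyone who wants to see the same idea outside of Sheets, here is a rough Python sketch of pulling those four fields out of a page. It is not the formula from the sheet, just an illustration using the standard library. The names PageFields, scrape_html, and fetch are made up for this example. Missing items come back empty instead of throwing an error, and the heading argument plays the same role as changing h1 to h2 in the formula.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class PageFields(HTMLParser):
    """Collects title, meta description, one heading level, and visible text."""

    SKIP = {"script", "style"}  # tags whose contents are never visible text

    def __init__(self, heading="h1"):
        super().__init__()
        self.heading = heading   # swap for "h2" .. "h6" as needed
        self.title = ""
        self.description = ""
        self.headings = []
        self.text_parts = []
        self._stack = []         # open tags we are tracking

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = (attrs.get("content") or "").strip()
        if tag in self.SKIP or tag in ("title", self.heading):
            self._stack.append(tag)
            if tag == self.heading:
                self.headings.append("")

    def handle_endtag(self, tag):
        if self._stack and self._stack[-1] == tag:
            self._stack.pop()

    def handle_data(self, data):
        current = self._stack[-1] if self._stack else None
        if current in self.SKIP:
            return
        if current == "title":
            self.title += data
            return
        if current == self.heading:
            self.headings[-1] += data
        if data.strip():
            self.text_parts.append(data.strip())


def scrape_html(html, heading="h1"):
    """Return the scraped fields for one page of HTML."""
    parser = PageFields(heading)
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title.strip(),
        "description": parser.description,
        heading: [h.strip() for h in parser.headings],
        "text": " ".join(parser.text_parts),
    }


def fetch(url):
    """Download a page as text (assumes the page decodes as UTF-8)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling scrape_html(fetch(url)) for each URL in a list gives one row per page. Like the sheet, the text field picks up everything visible on the page (menus, footers, and so on), so it still needs cleaning up.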

 

It also sometimes throws errors. When that happens, you usually just have to delete the URLs and add them again. For YouTube videos, it will pull the title and only part of the description.

https://docs.google.com/spreadsheets/d/1fYb7iv_9kYiCvY_nKGBnusdXCQiONbzaWepWiT9hz2M/edit?usp=sharing 

 

A Chrome extension I use all the time to get URLs off of Google, YouTube, etc. is LinkClump. Here is a video that shows what it does and how to set it up (not mine).

 

 

Another thing I do is search other countries' versions of Google with my keywords in that language (e.g. dentista at google.it), take the articles, and translate them in Sheets. You can get some decent content that passes Copyscape. (See the Translate to English tab in the sheet.)

 

Also, sometimes the errors are simply because something is not on the page.

One thing I've been doing with the first sheet I shared is scraping sites with demographic info about cities, surrounding cities, and neighborhoods.

I make a Google Site for each page, then pyramid them: main city <- surrounding cities <- neighborhoods.

The main city Google site is linked to a Google site with general local info which then links to money sites.

After a bit, the money site really starts moving for surrounding cities and neighborhood search terms.
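
To make the link layout concrete, here is a small Python sketch that lists which site links to which. It assumes the arrows above mean "links to" (neighborhood sites link to their surrounding city site, which links to the main city site). The function name pyramid_links and the place names in the usage note are made up for illustration.

```python
def pyramid_links(main_city, surrounding, hub="general local info", money_sites=()):
    """Return (source, target) link pairs for the pyramid layout.

    surrounding maps each surrounding city to a list of its neighborhoods.
    """
    links = []
    for city, neighborhoods in surrounding.items():
        for hood in neighborhoods:
            links.append((hood, city))    # neighborhood site -> surrounding city site
        links.append((city, main_city))   # surrounding city site -> main city site
    links.append((main_city, hub))        # main city site -> general local info site
    for site in money_sites:
        links.append((hub, site))         # general local info site -> money site
    return links
```

For example, pyramid_links("Springfield", {"Shelbyville": ["Old Town"]}, money_sites=["example.com"]) gives four links: Old Town to Shelbyville, Shelbyville to Springfield, Springfield to the general info site, and the general info site to example.com.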

 

YMMV
