Imagescraper

From Bitpost wiki
Revision as of 22:05, 15 December 2009 by M (talk | contribs)

Use this to immediately pull down the full-size pictures from Google Images searches. WARNING: scrapers go out of date when the target site changes, and this one is currently in need of an update.


Imagescraper in action



This script combs Google's image results for the link to the "full" image, rather than just the thumbnail. As each original full image location is determined, it is placed in the HTML result. Once you submit a search, you should start to get full images right away in the results. Be careful: simple searches that return a large number of hits can pull down LOTS of large images!
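The released code is not reproduced on this page, so here is only a minimal sketch of the idea, not the actual script. It assumes each result link carries the original image location in an `imgurl` query parameter (as in a link of the form `/imgres?imgurl=...&imgrefurl=...`); that format is an assumption and may not match what Google serves today. The helper name `full_image_url` is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use URI;

# Hypothetical helper: pull the original image location out of a result link.
# ASSUMPTION: the link looks like /imgres?imgurl=<full image>&imgrefurl=<page>.
sub full_image_url {
    my ($href) = @_;
    my %param = URI->new($href)->query_form;    # decoded key/value pairs
    return $param{imgurl};    # undef when the link has no imgurl parameter
}

# Only touch the network when a results-page URL is given on the command line.
if (@ARGV) {
    require WWW::Mechanize;
    require HTML::Entities;

    my $mech = WWW::Mechanize->new( autocheck => 1 );
    $mech->get( $ARGV[0] );

    $| = 1;    # unbuffered output, so each image is sent as soon as it is found
    for my $link ( $mech->find_all_links( url_regex => qr/imgurl=/ ) ) {
        my $full = full_image_url( $link->url_abs ) or next;
        printf qq{<img src="%s">\n}, HTML::Entities::encode_entities($full);
    }
}
```

Printing each `<img>` tag inside the loop, with output buffering turned off, is what lets full images appear while the rest of the results are still being processed.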

There's a secondary function as well. Each resulting image is wrapped in HTML so that clicking the image itself runs an imagescrape of the site where the image is located.
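One way that wrapping could look, as a rough sketch rather than the script's real markup: each image becomes a link back to the scraper, with the originating page passed as a query parameter. The script path `/imagescrape/scrape.pl`, the parameter name `site`, and the helper name `wrap_image` are all made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use URI::Escape    qw(uri_escape);
use HTML::Entities qw(encode_entities);

# Hypothetical helper: wrap one full image in a link that re-runs the scraper
# against the page the image came from.
# ASSUMPTIONS: the script path and the "site" parameter name are invented.
sub wrap_image {
    my ( $image_url, $site_url, $script ) = @_;
    my $href = $script . '?site=' . uri_escape($site_url);
    return sprintf '<a href="%s"><img src="%s"></a>',
        encode_entities($href), encode_entities($image_url);
}

print wrap_image(
    'http://example.com/pics/cat.jpg',
    'http://example.com/gallery.html',
    '/imagescrape/scrape.pl',
), "\n";
```

On the receiving end, WWW::Mechanize's `find_all_images` method is one plausible way to list the images on the clicked site's page.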

The script uses my favorite Perl module, WWW::Mechanize.

Someone asked me to release the Perl code under the GPL. It's just a quick sloppy hack, so no promises, but here it is. One word of warning: it's tied to the format of Google's image search and results pages. If they change, the script will need to be updated. That being said, it's been working as-is for a long time now - for years, actually, without change. Also of note: I got into this based on some article discussion somewhere - we stand on the shoulders of giants - keep on sharing!  :>

Imagescrape Perl code

Just extract the files to a path accessible from your Perl-enabled, Apache-hosted website. If you're using mod_deflate, you'll want to disable it on the results page, because compression buffers the output and the streaming results become less responsive; see this post for details.
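As a rough illustration of the mod_deflate step, one common approach is to set the `no-gzip` environment variable, which mod_deflate honors, for requests under the script's path. The `/imagescrape/` path is an assumption based on the URL below; adjust it to wherever you extracted the files.

```apache
# Skip compression for the scraper so results are sent as they are produced.
# ASSUMPTION: the script is served from /imagescrape/.
SetEnvIf Request_URI "^/imagescrape/" no-gzip
```

This needs mod_setenvif loaded, and it goes in the server or virtual host configuration (or an .htaccess file, if overrides allow it).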

Try out Imagescraper here:

http://thedigitalmachine.com/imagescrape/