Imagescraper
Latest revision as of 22:05, 15 December 2009
Use this to pull down all the full-size pictures from a Google Image search right away. WARNING: scrapers go out of date when the target site changes, and this one is currently in need of an update.
This script combs Google's image results for the link to the "full" image, rather than just the thumbnail. As the original full image locations are determined, they are placed in the HTML result. Once you submit a search, you should start to get full images right away in the results. Be careful: simple searches that return a large number of hits can pull down LOTS of large images!
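As a rough illustration of the "find the full image, not the thumbnail" step, here is a minimal sketch. It is written in Python rather than the script's Perl, and it is not the actual Imagescraper code. It assumes each result link carries the full image location in an `imgurl` query parameter (as in `/imgres?imgurl=...`); that format is an assumption about the results page and will break whenever the page changes. The sample HTML and all names are made up for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse, parse_qs


class FullImageLinkParser(HTMLParser):
    """Collect full-size image URLs from the anchors of a results page."""

    def __init__(self):
        super().__init__()
        self.full_image_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # ASSUMPTION: the full image location is carried in an "imgurl"
        # query parameter on the result link. Links without it are skipped.
        query = parse_qs(urlparse(href).query)
        self.full_image_urls.extend(query.get("imgurl", []))


def extract_full_image_urls(results_html):
    """Return the full-size image URLs found in a results page, in order."""
    parser = FullImageLinkParser()
    parser.feed(results_html)
    return parser.full_image_urls


# Made-up sample of a results page: one image hit and one unrelated link.
sample_html = """
<a href="/imgres?imgurl=http://example.com/photos/cat.jpg&amp;imgrefurl=http://example.com/cats.html">
  <img src="http://thumbs.example.net/cat_thumb.jpg">
</a>
<a href="/about">About</a>
"""

print(extract_full_image_urls(sample_html))
```

The thumbnail's own `src` is ignored on purpose; only the link target is inspected, since that is where the full image location is assumed to live.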
There's a secondary function as well. Each resulting image is wrapped in HTML so that clicking on the image itself runs an imagescrape of the site where that image is located.
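The wrapping described above could look something like the following sketch, again in Python rather than the script's Perl. The endpoint path `/imagescrape/scrape.cgi` and the `site` parameter name are hypothetical, invented for the example; the real script may build its links quite differently.

```python
from html import escape
from urllib.parse import urlencode

# ASSUMPTION: hypothetical endpoint and parameter name, for illustration only.
SCRAPER_ENDPOINT = "/imagescrape/scrape.cgi"


def wrap_image(full_image_url, source_page_url):
    """Return an <img> wrapped in a link that scrapes the image's home site."""
    # URL-encode the source page so it survives as a single query value.
    scrape_link = SCRAPER_ENDPOINT + "?" + urlencode({"site": source_page_url})
    # Escape both values before placing them inside HTML attributes.
    return '<a href="{}"><img src="{}"></a>'.format(
        escape(scrape_link, quote=True),
        escape(full_image_url, quote=True),
    )


print(wrap_image("http://example.com/photos/cat.jpg",
                 "http://example.com/cats.html"))
```

Clicking the resulting image would then send the source page's address back to the scraper, which can repeat the same extraction against that site.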
The script uses my favorite Perl module, WWW::Mechanize (http://search.cpan.org/dist/WWW-Mechanize/lib/WWW/Mechanize.pm).
Someone asked me to release the Perl code under the GPL. It's just a quick sloppy hack, so no promises, but here it is. One word of warning: it's tied to the format of Google's image search and results pages. If they change, the script will need to be updated. That being said, it's been working as-is for a long time now - for years, actually, without change. Also of note: I got into this based on some article discussion somewhere - we stand on the shoulders of giants - keep on sharing! :>
Imagescrape perl code: http://thedigitalmachine.com/files/imagescrape.tar.gz
Just extract the files to a path accessible from your Perl-enabled, Apache-hosted website. If you're using mod_deflate, you'll want to disable it on the results page for better responsiveness with the streaming results; see this post for details: http://news.thedigitalmachine.com/2008/11/01/mod_deflate-p0wn3d/
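For the mod_deflate part, one common way to switch compression off for a single page is the `no-gzip` environment variable, set here with mod_setenvif. This is only a sketch: the path `/imagescrape/results.cgi` is a hypothetical name for the results page, so substitute whatever your install actually serves, and check the linked post for the approach the author used.

```apache
# Hypothetical results-page path; adjust to match your own install.
# no-gzip tells mod_deflate not to compress matching responses, so the
# streamed results reach the browser as they are produced.
# dont-vary keeps Apache from adding a Vary header for these requests.
SetEnvIfNoCase Request_URI "^/imagescrape/results\.cgi" no-gzip dont-vary
```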
Try out Imagescraper here: http://thedigitalmachine.com/imagescrape/