
  
Spidering Hacks
By Tara Calishain, Kevin Hemenway
 
Publisher: O'Reilly
Pub Date: October 2003
ISBN: 0-596-00577-6
Pages: 424
 


   Copyright
   Credits
      About the Authors
      Contributors
   Preface
      Why Spidering Hacks?
      How This Book Is Organized
      How to Use This Book
      Conventions Used in This Book
      How to Contact Us
      Got a Hack?
   Chapter 1.  Walking Softly
      Hacks #1-7
      Hack 1.  A Crash Course in Spidering and Scraping
      Hack 2.  Best Practices for You and Your Spider
      Hack 3.  Anatomy of an HTML Page
      Hack 4.  Registering Your Spider
      Hack 5.  Preempting Discovery
      Hack 6.  Keeping Your Spider Out of Sticky Situations
      Hack 7.  Finding the Patterns of Identifiers
   Chapter 2.  Assembling a Toolbox
      Hacks #8-32
      Perl Modules
      Resources You May Find Helpful
      Hack 8.  Installing Perl Modules
      Hack 9.  Simply Fetching with LWP::Simple
      Hack 10.  More Involved Requests with LWP::UserAgent
      Hack 11.  Adding HTTP Headers to Your Request
      Hack 12.  Posting Form Data with LWP
      Hack 13.  Authentication, Cookies, and Proxies
      Hack 14.  Handling Relative and Absolute URLs
      Hack 15.  Secured Access and Browser Attributes
      Hack 16.  Respecting Your Scrapee's Bandwidth
      Hack 17.  Respecting robots.txt
      Hack 18.  Adding Progress Bars to Your Scripts
      Hack 19.  Scraping with HTML::TreeBuilder
      Hack 20.  Parsing with HTML::TokeParser
      Hack 21.  WWW::Mechanize 101
      Hack 22.  Scraping with WWW::Mechanize
      Hack 23.  In Praise of Regular Expressions
      Hack 24.  Painless RSS with Template::Extract
      Hack 25.  A Quick Introduction to XPath
      Hack 26.  Downloading with curl and wget
      Hack 27.  More Advanced wget Techniques
      Hack 28.  Using Pipes to Chain Commands
      Hack 29.  Running Multiple Utilities at Once
      Hack 30.  Utilizing the Web Scraping Proxy
      Hack 31.  Being Warned When Things Go Wrong
      Hack 32.  Being Adaptive to Site Redesigns
   Chapter 3.  Collecting Media Files
      Hacks #33-42
      Hack 33.  Detective Case Study: Newgrounds
      Hack 34.  Detective Case Study: iFilm
      Hack 35.  Downloading Movies from the Library of Congress
      Hack 36.  Downloading Images from Webshots
      Hack 37.  Downloading Comics with dailystrips
      Hack 38.  Archiving Your Favorite Webcams
      Hack 39.  News Wallpaper for Your Site
      Hack 40.  Saving Only POP3 Email Attachments
      Hack 41.  Downloading MP3s from a Playlist
      Hack 42.  Downloading from Usenet with nget
   Chapter 4.  Gleaning Data from Databases
      Hacks #43-89
      Hack 43.  Archiving Yahoo! Groups Messages with yahoo2mbox
      Hack 44.  Archiving Yahoo! Groups Messages with WWW::Yahoo::Groups
      Hack 45.  Gleaning Buzz from Yahoo!
      Hack 46.  Spidering the Yahoo! Catalog
      Hack 47.  Tracking Additions to Yahoo!
      Hack 48.  Scattersearch with Yahoo! and Google
      Hack 49.  Yahoo! Directory Mindshare in Google
      Hack 50.  Weblog-Free Google Results
      Hack 51.  Spidering, Google, and Multiple Domains
      Hack 52.  Scraping Amazon.com Product Reviews
      Hack 53.  Receive an Email Alert for Newly Added Amazon.com Reviews
      Hack 54.  Scraping Amazon.com Customer Advice
      Hack 55.  Publishing Amazon.com Associates Statistics
      Hack 56.  Sorting Amazon.com Recommendations by Rating
      Hack 57.  Related Amazon.com Products with Alexa
      Hack 58.  Scraping Alexa's Competitive Data with Java
      Hack 59.  Finding Album Information with FreeDB and Amazon.com
      Hack 60.  Expanding Your Musical Tastes
      Hack 61.  Saving Daily Horoscopes to Your iPod
      Hack 62.  Graphing Data with RRDTOOL
      Hack 63.  Stocking Up on Financial Quotes
      Hack 64.  Super Author Searching
      Hack 65.  Mapping O'Reilly Best Sellers to Library Popularity
      Hack 66.  Using All Consuming to Get Book Lists
      Hack 67.  Tracking Packages with FedEx
      Hack 68.  Checking Blogs for New Comments
      Hack 69.  Aggregating RSS and Posting Changes
      Hack 70.  Using the Link Cosmos of Technorati
      Hack 71.  Finding Related RSS Feeds
      Hack 72.  Automatically Finding Blogs of Interest
      Hack 73.  Scraping TV Listings
      Hack 74.  What's Your Visitor's Weather Like?
      Hack 75.  Trendspotting with Geotargeting
      Hack 76.  Getting the Best Travel Route by Train
      Hack 77.  Geographic Distance and Back Again
      Hack 78.  Super Word Lookup
      Hack 79.  Word Associations with Lexical Freenet
      Hack 80.  Reformatting Bugtraq Reports
      Hack 81.  Keeping Tabs on the Web via Email
      Hack 82.  Publish IE's Favorites to Your Web Site
      Hack 83.  Spidering GameStop.com Game Prices
      Hack 84.  Bargain Hunting with PHP
      Hack 85.  Aggregating Multiple Search Engine Results
      Hack 86.  Robot Karaoke
      Hack 87.  Searching the Better Business Bureau
      Hack 88.  Searching for Health Inspections
      Hack 89.  Filtering for the Naughties
   Chapter 5.  Maintaining Your Collections
      Hacks #90-93
      Hack 90.  Using cron to Automate Tasks
      Hack 91.  Scheduling Tasks Without cron
      Hack 92.  Mirroring Web Sites with wget and rsync
      Hack 93.  Accumulating Search Results Over Time
   Chapter 6.  Giving Back to the World
      Hacks #94-100
      Hack 94.  Using XML::RSS to Repurpose Data
      Hack 95.  Placing RSS Headlines on Your Site
      Hack 96.  Making Your Resources Scrapable with Regular Expressions
      Hack 97.  Making Your Resources Scrapable with a REST Interface
      Hack 98.  Making Your Resources Scrapable with XML-RPC
      Hack 99.  Creating an IM Interface
      Hack 100.  Going Beyond the Book
   Colophon
   Index