Preface
When the Web began, it was a pretty small place. It
didn't take much to keep abreast of new sites, and
with subject indexes like the fledgling Yahoo! and
NCSA's "What's New" page, you could actually give keeping
up with newly added pages the old college try.
Now, even the biggest search engines—yes, even
Google—admit they don't index the entire Web.
It's simply not possible. At the same time, the Web
is more compelling than ever. More information is being put online at
a faster clip—be it up-to-the-minute data or large collections
of old materials finding an online home. The Web is more browsable,
more searchable, and more useful than it ever was when it was still
small. That said, we, its users, can only go so fast when searching,
processing, and taking in information.
Thankfully, spidering allows us to bring a bit of sanity to the
wealth of information available. Spidering is
the process of automating the grabbing and sifting of information on
the Web, saving us the trouble of having to browse it all manually.
Spiders range in complexity from the simplest script to grab the
latest weather information from a web page, to the armies of complex
spiders working in concert with one another, searching, cataloging,
and indexing the Web's more than three billion
resources for a search engine like Google.
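To make the "grabbing and sifting" idea concrete, here is a minimal sketch in Python of the simplest kind of spider described above. It is an illustration only, not a method taken from this book: the names LinkSifter, sift_links, and grab are hypothetical, and the sketch assumes pages can be decoded as UTF-8. The grab step fetches a page, and the sift step pulls out just the link targets.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class LinkSifter(HTMLParser):
    """Collects the href value of every <a> tag it encounters."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def sift_links(html):
    """Sift: return the link targets found in a string of HTML."""
    sifter = LinkSifter()
    sifter.feed(html)
    return sifter.links


def grab(url):
    """Grab: fetch a page and return its text (assumes UTF-8 content)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # http://example.com/ is a placeholder; substitute a page you may fetch.
    for link in sift_links(grab("http://example.com/")):
        print(link)
```

A real spider would also follow the links it finds, respect each site's robots.txt, and pause between requests; this sketch stops at a single page.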
This book teaches you the methodologies and algorithms behind spiders
and the variety of ways that spiders can be used. Hopefully, it will
inspire you to come up with some useful spiders of your own.