Hack 38 Archiving Your Favorite Webcams
 
Got a number of scenic or strategically placed
webcams you watch daily? Or would you like to ensure that your coworkers
are actually doing the work you've assigned them?
Keep on top of your pictorial problems with Python.
Keeping track of a large number of active
webcams is a thankless task: half the
time the images haven't changed, and the rest of the
time it takes just as long to go through refreshing them all or
waiting for them to refresh as it does to look at and mentally
process the images themselves.
This hack alleviates your grief by automatically
checking your webcams, one every 15 seconds, and
saving an image only if it has been updated,
so that we don't waste bandwidth.
It's also the only Python script in the entire book
and, as such, earns special recognition.
To tell the program which URLs to download, we have to put them in a
file, one per line. The program looks for this list by default at
URIs.txt, but this can be changed both in the
source and on the command line.
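As a rough illustration of the expected input, the sketch below parses such a list. It is written for Python 3 rather than the Python 2 of the hack itself, the function name parse_uri_list is hypothetical, and skipping blank lines is an assumption: the script simply calls splitlines() on the file's contents.

```python
def parse_uri_list(text):
    """Return the non-empty lines of a URL list, stripped of whitespace."""
    return [line.strip() for line in text.splitlines() if line.strip()]


# Example contents of URIs.txt: one image URL per line.
sample = """\
http://example.org/webcams/someplace.jpg
http://example.org/webcams/phenomic.jpg
"""

for uri in parse_uri_list(sample):
    print(uri)
```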
The program puts each picture in its own file, after producing an
index file (which defaults to webcams.html) so
that we can quickly and easily browse all of the downloaded images in
one go.
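The filenames are derived from the URLs themselves by percent-encoding them. The following sketch, in Python 3 rather than the hack's Python 2, shows roughly what that looks like; the helper names uri_to_filename and filename_to_img_src are hypothetical and exist only for this illustration.

```python
from urllib.parse import quote


def uri_to_filename(uri):
    # Percent-encode every reserved character, including "/" and ":",
    # so the whole URL becomes a single legal filename.
    return quote(uri, safe="")


def filename_to_img_src(filename):
    # The filename itself contains "%" characters. In a link they are
    # written as "%25"; otherwise the escapes would be decoded when the
    # link is resolved and would no longer match the name on disk.
    return filename.replace("%", "%25")


uri = "http://example.org/webcams/someplace.jpg"
name = uri_to_filename(uri)
print(name)                       # http%3A%2F%2Fexample.org%2Fwebcams%2Fsomeplace.jpg
print(filename_to_img_src(name))  # http%253A%252F%252Fexample.org%252Fwebcams%252Fsomeplace.jpg
```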
The Code
Save the following code as getcams.py:
#!/usr/bin/python
"""
getcams.py - Archiving Your Favorite Web Cams
Sean B. Palmer, <http://purl.org/net/sbp/>, 2003-07.
License: GPL 2; share and enjoy!
Usage:
python getcams.py [ <filename> ]
<filename> defaults to URIs.txt
"""
import urllib2, time
from urllib import quote
from email.Utils import parsedate
# # # # # # # # # # # # # # # # #
# Configurable stuff
#
# download how often, in seconds
seconds = 15
# what file we should write to
index = 'webcams.html'
# End of configurable stuff!
# # # # # # # # # # # # # # # # #
def quoteURI(uri):
    # Turn a URI into a filename.
    return quote(uri, safe='')
def makeHTML(uris):
    # Create an HTML index so that we
    # can look at the archived piccies.
    print "Creating a webcam index at", index
    f = open(index, 'w')
    print >> f, '<html xmlns="http://www.w3.org/1999/xhtml" >'
    print >> f, '<head><title>My Webcams</title></head>'
    print >> f, '<body>'
    for uri in uris:
        # We use the URI of the image for the filename, but we have
        # to hex encode it first so that our operating systems are
        # happy with it. In the index, each '%' in that filename is
        # escaped again as '%25' so that the link matches the file.
        link = quoteURI(uri).replace('%', '%25')
        # Now we make the image, and provide a link to the original.
        print >> f, '<p><img src="%s" alt=" " /><br />' % link
        print >> f, '-<a href="%s">%s</a></p>' % (uri, uri)
    print >> f, '</body>'
    print >> f, '</html>'
    f.close()
    print "Done creating the index!\n"
metadata = {}

def getURI(uri):
    print "Trying", uri
    # Try to open the URI--we're not downloading it yet.
    try: u = urllib2.urlopen(uri)
    except Exception, e: print " ...failed:", e
    else:
        # Get some information about the URI; we do this
        # to find out whether it's been updated yet.
        info = u.info()
        meta = (info.get('last-modified'), info.get('content-length'))
        print " ...got metadata:", meta
        if metadata.get(uri) == meta:
            print " ...not downloading: no update yet"
        else:
            # The image has been updated, so let's download it.
            metadata[uri] = meta
            print " ...downloading; type: %s; size: %s" % \
                (info.get('content-type', '?'), info.get('content-length', '?'))
            data = u.read()
            open(quoteURI(uri), 'wb').write(data)
            print " ...done! %s bytes" % len(data)
            # Save an archived version for later. If the server sent
            # no usable Last-Modified header, use the current time.
            lastmod = info.get('last-modified')
            t = (lastmod and parsedate(lastmod)) or time.gmtime()
            archv = quoteURI(uri) + '-' + \
                time.strftime('%Y%m%dT%H%M%S', t) + '.jpg'
            open(archv, 'wb').write(data)
        u.close()
def doRun(uris):
    for uri in uris:
        startTime = time.time()
        getURI(uri)
        finishTime = time.time()
        timeTaken = finishTime - startTime
        print "This URI took", timeTaken, "seconds\n"
        timeLeft = seconds - timeTaken # time until the next run
        if timeLeft > 0: time.sleep(timeLeft)
def main(argv):
    # We need a list of URIs to download. We require them to be
    # in a file; the next line defaults the filename to URIs.txt
    # if it can't gather one from the command line.
    fn = (argv + [None])[0] or 'URIs.txt'
    data = open(fn).read()
    uris = data.splitlines()
    # Now make an index, and then
    # continuously download the piccies.
    makeHTML(uris)
    while 1: doRun(uris)
if __name__=="__main__":
    import sys
    # If the user asks for help, give it to them!
    # Otherwise, just run the program as usual.
    if sys.argv[1:] in (['--help'], ['-h'], ['-?']):
        print __doc__
    else: main(sys.argv[1:])
Running the Hack
Here's a typical run, invoked from the command line:
% python getcams.py
Creating a webcam index at webcams.html
Done creating the index!
Trying http://example.org/webcams/someplace.jpg
...got metadata: ('Thu, 10 Jul 2003 15:50:38 GMT', None)
...downloading; type: image/jpeg; size: ?
...done! 32594 bytes
This URI took 8.2480000257 seconds
Trying http://example.org/webcams/phenomic.jpg
...got metadata: ('Thu, 10 Jul 2003 11:35:51 GMT', None)
...not downloading: no update yet
This URI took 1.30099999905 seconds
The code, complicated though it looks, consists of only a few stages:
1. Open the list of URLs of each of the webcams.
2. Create an HTML index so that we can view the downloaded webcam images.
3. For each URL in our list, check to see if the image has been updated
   or not. If it has, download it. In the event that it took under 15
   seconds to download, wait for the remainder of the time in an attempt
   to respect the server resources of others.
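The third stage is the heart of the script, so here is a minimal sketch of its two decisions in isolation: whether an image needs downloading, and how long to pause afterward. It is written for Python 3 rather than the hack's Python 2, the function names needs_download and remaining_delay are hypothetical, and it reads the standard Content-Length header for the size.

```python
def needs_download(seen, uri, headers):
    """Return True if the headers differ from the last ones seen for uri.

    `seen` maps each URI to the (Last-Modified, Content-Length) pair
    recorded on the previous check; it is updated in place.
    A plain dict stands in for the response headers here, so the
    lookups are case-sensitive, unlike a real header object.
    """
    meta = (headers.get("Last-Modified"), headers.get("Content-Length"))
    if seen.get(uri) == meta:
        return False
    seen[uri] = meta
    return True


def remaining_delay(interval, elapsed):
    """Seconds still to wait so that each check takes at least `interval`."""
    return max(0.0, interval - elapsed)


seen = {}
uri = "http://example.org/webcams/someplace.jpg"
headers = {"Last-Modified": "Thu, 10 Jul 2003 15:50:38 GMT"}
print(needs_download(seen, uri, headers))  # True: never seen before
print(needs_download(seen, uri, headers))  # False: headers unchanged
print(remaining_delay(15, 8.25))           # 6.75
```

Keeping the comparison separate from the network code makes it easy to try out against canned headers before pointing it at a real server.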
Hacking the Hack
The code has a number of limitations:
- We have to know the URL of each picture for downloading. So, if we
  don't know the URL or if it changes a lot, we have a
  problem. But really, the biggest problem here is that
  it's just a bit of an inconvenience to get the
  actual URL of each picture that we want.
- If a web site goes down, the script hangs. We could get around this
  problem by using Python's asyncore
  module, but this would add quite a bit of complexity.
- People have been known to fake Last-Modified HTTP
  headers, so the metadata that we use to ascertain whether a picture
  has been updated isn't absolutely reliable. However,
  most Last-Modified headers are faked to force
  people to use fresh rather than cached versions, so if
  they're that passionate about it, we may as well let
  them.
- If you have any files in your directory that have the same names as
  the quoted versions of the URLs you're trying to
  download, the program will overwrite them.
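Some of these limitations can be softened without much extra machinery. The sketch below is one possible approach, not part of the original script, and is written for Python 3: it passes a timeout to urlopen so a dead site cannot stall the loop forever, compares a hash of the image bytes instead of trusting headers, and refuses to overwrite existing files. The function names fetch, content_changed, and save_new, and the 10-second timeout, are assumptions made for illustration.

```python
import hashlib
import os
import tempfile
import urllib.request


def fetch(uri, timeout=10):
    """Download uri, giving up after `timeout` seconds instead of hanging."""
    with urllib.request.urlopen(uri, timeout=timeout) as response:
        return response.read()


def content_changed(digests, uri, data):
    """Compare a SHA-256 digest of the image bytes with the previous one."""
    digest = hashlib.sha256(data).hexdigest()
    if digests.get(uri) == digest:
        return False
    digests[uri] = digest
    return True


def save_new(path, data):
    """Write data to path, refusing to overwrite an existing file."""
    try:
        with open(path, "xb") as f:  # mode "x" fails if the file exists
            f.write(data)
    except FileExistsError:
        return False
    return True


# Demo without touching the network: the bytes stand in for an image.
digests = {}
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "someplace.jpg")
    image = b"pretend these are JPEG bytes"
    if content_changed(digests, "http://example.org/webcams/someplace.jpg", image):
        print("saved:", save_new(path, image))
    print("second write allowed:", save_new(path, image))
```

Hashing means every image is downloaded in full on each pass, so this trades the bandwidth saving for reliability; which matters more depends on the webcams being watched.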
Other than these limitations, the code is safe.
—Sean B. Palmer