Saving webpages for offline use using Firefox 3.0 + wget

July 18th, 2009

Several times I’ve had to save authenticated wiki content to my phone to view it on long flights (across the US). I found wget + Firefox 3.0 (Export Cookies extension) to be quite sufficient. The extension page explains a bit about how to use wget.

The basic strategy is:

  1. Log in to the website, then export the cookies and session information to a text file (“cookies.txt”).
  2. Run wget and work out a reasonable download rate. Some websites don’t like bots, so you may need to look through the wget options for ways to crawl the site slowly. Sessions can expire partway through a crawl, so you may also need to re-export your cookies or session info to the text file periodically. Don’t be too greedy.
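Since step 2 depends on the exported cookies still being valid, here is a minimal sketch of a check for expired entries. It assumes the export is in the tab-separated Netscape cookies.txt layout (domain, subdomain flag, path, secure flag, expiry as a Unix timestamp, name, value). The function name expired_cookies is made up for illustration.

```shell
# List cookies in a Netscape-format cookies.txt whose expiry time has passed.
# Assumed layout: 7 tab-separated fields, with the expiry timestamp in field 5.
# Session cookies (expiry 0) are not reported, since they carry no expiry time.
expired_cookies() {
    file=$1
    now=${2:-$(date +%s)}   # optional second argument overrides "now"
    awk -F '\t' -v now="$now" '
        /^#/ || NF != 7 { next }                 # skip comments and malformed lines
        ($5 + 0) > 0 && ($5 + 0) < (now + 0) {   # expired, non-session cookie
            print $6 " (" $1 ")"
        }
    ' "$file"
}
```

Running expired_cookies cookies.txt before a long crawl prints one line per stale cookie. If anything shows up, it is probably time to log in again and re-export.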

As usual, you may want to look into the following wget options:

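-I (directories to include), -r (recursive download), -l (recursion depth), -p (page requisites such as images and stylesheets), -k (convert links for local viewing), -np (don’t ascend to the parent directory)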

You may need to play around with the timing, as websites generally don’t like being spidered. It may also be illegal, so you should check the site’s Terms of Service.
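As a starting point for the timing, here is a rough sketch that wraps wget with a fixed wait, randomized jitter, and a bandwidth cap. The 2-second wait and 50k limit are guesses of mine rather than tested values, and polite_wget is a hypothetical helper name. With DRY_RUN=1 it only prints the command, so you can inspect it before hitting the site.

```shell
# Print (DRY_RUN=1) or run a slow, rate-limited recursive wget.
# --wait=2 pauses about 2 seconds between requests, --random-wait varies that
# pause, and --limit-rate=50k caps bandwidth. All three values are
# illustrative assumptions; tune them per site.
polite_wget() {
    url=$1
    set -- wget --wait=2 --random-wait --limit-rate=50k \
        --load-cookies=cookies.txt --keep-session-cookies \
        -r -l 2 -p -k -np "$url"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"      # show the command without touching the network
    else
        "$@"
    fi
}
```

For example, DRY_RUN=1 polite_wget 'http://wiki.example.com/pages/' prints the full command line. The hostname here is a placeholder.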

The following is a sample command line used to save webpages and attachments to a directory for offline use. I’ve noticed that Firefox on the Mac has trouble opening the saved pages, because pages from wiki sites are often saved without a file extension.

Example:

wget --random-wait --load-cookies=cookies.txt --save-cookies=cookies.txt --keep-session-cookies -r -l 2 -p -k -I /pages/ 'http://***:8080/pages/viewpage.action?pageId=xxxxxxx' download/attachments
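To see which saved files are likely to trip up a browser because of the missing extension, here is a small sketch that lists files that look like HTML but don’t end in .html or .htm. The function name list_html_without_extension is made up, and searching for an html tag is only a rough heuristic.

```shell
# List files under a download directory that contain an <html tag
# but whose names do not end in .html or .htm.
list_html_without_extension() {
    find "$1" -type f ! -name '*.html' ! -name '*.htm' \
        -exec grep -l -i '<html' {} +
}
```

wget also has a -E option that appends .html to such files as it saves them. I’d check the man page for your wget version before relying on it, since the long name of that option has changed between releases.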
Categories: Education