WWWOFFLE Hints And Tips Page

This page gives more information than the FAQ, in a more user-friendly format, to help you use WWWOFFLE as effectively as possible.

Configuration File Help

DontGet Options

The DontGet section of the configuration file allows you to stop WWWOFFLE from fetching certain types of URLs.

They can be specified as a filename extension that is not to be fetched; in the example below no .zip files will ever be fetched.

DontGet
{
 *://*/*.zip
}

They can also be specified as paths on a server that are not to be fetched; in the example below no files in the /dontget subdirectory on any server are fetched.

DontGet
{
 *://*/dontget/
}

I have available a list of entries for the DontGet section that will stop many unnecessary images / pages from being displayed. The easiest way to use this is to copy this file to wwwoffle.DontGet.conf in the same directory as the wwwoffle.conf file and include this file in the main configuration file.

DontGet
[
 wwwoffle.DontGet.conf
]

CensorHeader Options

The CensorHeader section allows you to control what information is sent about you to the server or what is received from the server. Some of the header information is important and cannot be censored, but some items can safely be removed.

The principal reason for modifying the headers when browsing is to preserve some privacy. In this case it is important to remove as much personal information as possible from the requests that are sent. A second level of privacy can be obtained by hiding information about the page that you came from.

Some browsers send your username with the requests that they make. This is the most important item to remove. This is sent in a From field.

CensorHeader
{
 From =
}

There is also information about the browser that you are using. This is sent in a User-Agent field. If you do censor this then it will stop most servers from working out what browser you are using. There are other ways to detect what browser is being used, for example what other headers are sent and in what order. Also Javascript can be used to detect the browser and this cannot be blocked using these options.

If you do censor this header then it is possible that you will be denied access to certain sites or pages. This is not the fault of WWWOFFLE, but of the site designer for being so selective. In WWWOFFLE you can supply your own information for this (and any other) header, so my browsing shows up with WWWOFFLE/2.9 as the browser.

CensorHeader
{
 User-Agent = WWWOFFLE/2.9
}

The other information that can easily and usefully be removed is Cookies. These can provide information about how often you visit a site or what pages you have viewed on it. If you don't want cookies being sent to servers then remove the Cookie header; if you don't want to receive cookies from servers then remove the Set-Cookie header.

The options in this section of the configuration file can be configured so that they only apply to some URLs and not to others. This is most useful to allow cookies to be sent to some servers and not others. For example, if you want to deny cookies to all servers except one called www.trusted-with-cookies.com then you need two Cookie options here: one to allow cookies to the specified server and one to deny cookies to all other servers. In the example below, yes means that the header is censored (removed) and no means that it is left unchanged.

CensorHeader
{
 <http://www.trusted-with-cookies.com/*> Cookie = no
 Cookie = yes

 <http://www.trusted-with-cookies.com/*> Set-Cookie = no
 Set-Cookie = yes
}

Online Options

The online options section of the configuration file allows for many options to be set on a URL-by-URL basis. These options can be used to control the way that WWWOFFLE decides which pages are to be fetched again when requested and which ones are to use the cached version.

Pages that can usefully be cached for a long time are static pages, mainly images. These might be the icons that appear all over pages on the same server. These can be preserved in the WWWOFFLE cache for a long time and only requested infrequently since they change rarely. The following example shows the changes that could be made to reduce the bandwidth to one particular set of static images (these URL specific options need to go before the generic options in the section).

OnlineOptions
{
 <http://images*.slashdot.org> request-changed = 4w
 <http://*slashdot.org> request-changed-once = yes
}

Purge
{
 <http://images*.slashdot.org> age = 6w
 <http://*slashdot.org> age = 4w
}

I have a list of some entries for the OnlineOptions section that will help reduce bandwidth; it is based on the example that is given above.

Another feature that some web-servers find useful is to force the browser to keep reloading the same page. This can be done in a number of ways and there are many ways in WWWOFFLE to ignore these requests. Using the request-changed or request-changed-once options in the OnlineOptions section will mean that WWWOFFLE will not make another request for a cached page until it has reached a certain age.

OnlineOptions
{
 request-changed = 10m
 request-changed-once = yes
}

The request-expired and request-no-cache options can be set to no so that even pages that the server says have expired are not requested again.

OnlineOptions
{
 request-expired = no
 request-no-cache = no
}

Purge Options

The Purge Options allow control over what files in the cache are to be purged. The purging is done based on the timestamp of the file that stores the page in the cache.

The first choice to make is whether to keep pages based on when you fetched them or based on when you last viewed them. I choose to use access (viewing) time rather than modification (fetching) time. This means that pages that I revisit often don't get removed too soon. This selection is made using the use-mtime option. To purge based on viewing time set use-mtime = no, to use the time of fetching set use-mtime = yes. The one problem with using access time is that anything that reads every page in the cache (e.g. running grep WWWOFFLE /var/spool/wwwoffle/http/*/*) updates the access time of all of them and so stops the pages from being purged.

Purge
{
 use-mtime = no
}
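
To see which timestamp a purge would look at for a particular cached file, both times can be printed side by side. This is only a minimal sketch: it assumes GNU coreutils stat (the -c option is not available in BSD stat), and the function name show_times is made up for this illustration.

```shell
# show_times FILE...
# Print the access (viewing) and modification (fetching) times of each file.
# Assumes GNU coreutils stat; show_times is a hypothetical helper name.
show_times() {
    for f in "$@"; do
        # %X = access time, %Y = modification time (seconds since the epoch)
        stat -c 'atime=%X mtime=%Y %n' "$f"
    done
}
```

For example, show_times /var/spool/wwwoffle/http/*/* lists every cached file using the same path pattern as the grep command above.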

The next choice is whether to set a maximum size for the cache or to let it grow. The maximum size that you set is not enforced automatically; it only takes effect when you run wwwoffle -purge. The size that you specify is used to calculate an age that should be used in the purge. If the default age for purging is 28 days, but using 25 days would keep to the specified size, then 25 days is used instead. This is a two-stage process: once with the default ages, then once with the newly calculated age.

Purge
{
 max-size  = 0
}
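
The way that a size limit turns into an age can be illustrated with a small calculation. The sketch below is not WWWOFFLE's own code and only approximates the idea described above: given one line per cached page holding its age in days and its size in kilobytes, it prints the largest age for which all pages of that age or younger still fit within the limit. The function name age_for_size, the input format and the units are all assumptions made for the illustration.

```shell
# age_for_size MAX_KB
# Reads "AGE_DAYS SIZE_KB" lines on stdin and prints the largest age (in days)
# such that all pages of that age or younger total no more than MAX_KB.
# Prints 0 if even the youngest pages do not fit.
# Illustration only; age_for_size and its input format are hypothetical.
age_for_size() {
    sort -n | awk -v max="$1" '
        BEGIN { best = 0; total = 0 }
        # On reaching a new age, record the previous age if it still fits.
        NR == 1 || $1 != cur {
            if (NR > 1 && total <= max) best = cur
            cur = $1
        }
        { total += $2 }
        END { if (NR > 0 && total <= max) best = cur; print best }
    '
}
```

With pages of 100, 200 and 500 kilobytes aged 10, 25 and 28 days and a limit of 400 kilobytes, the 28 day old page does not fit, so the calculated age is 25 days.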

Finally there is the option to make some of the sites in the cache last for different amounts of time. This can be longer or shorter than the default or can be set never to purge. The ages are all measured in days (unless a longer suffix is used) and the value -1 is used to indicate a site that is never purged. These can now be specified using pathnames to allow parts of a server to be purged at different ages to other parts.

Purge
{
 # Don't purge this part of this site ever :-).
 <http://www.gedanken.org.uk/software/wwwoffle/> age = -1

 # Default to 4 weeks for http and only 1 week for ftp.
 <http://*/> age = 4w
 <ftp://*/>  age = 1w

 # You must have this if you want to purge by URL rather than just by host.
 use-url = yes
}

If you have a DontGet section in the configuration file that contains a lot of entries and is updated often then it is useful to have the purge function remove these pages. This can be done with the del-dontget option.

Purge
{
 del-dontget = yes

 # You must have this if you want to purge by URL rather than just by host.
 use-url = yes
}

Making WWWOFFLE Run Automatically When Required

WWWOFFLE is the type of program that, when it is running perfectly, the user should not notice at all. It can be fully automatic, so that it starts when the computer is booted and changes mode when you go online and come back offline.

This can all be achieved by using the example scripts that are supplied with WWWOFFLE. Below is just a simple introduction to what is required; for more detail and better scripts you should look at the contrib directory of the WWWOFFLE source code.

Booting

If you have installed WWWOFFLE from a binary distribution, for example a Linux distribution like Debian, RedHat, Suse or others then this will be done automatically. If you have not then the information below may be of help.

If you have BSD style startup scripts then the file /etc/rc.local or some other file of similar name will contain commands to run at boot time. This is the easiest case and all that is needed is to add the command to run WWWOFFLE. The safest place to add this is to the end of the file.

/usr/local/sbin/wwwoffled -c /var/spool/wwwoffle/wwwoffle.conf

When using an SVR4 style of startup scripts there will be many scripts in various directories in /etc. Typically there will be two copies of the same script, one called /etc/rc2.d/S90wwwoffle and one called /etc/rc0.d/K90wwwoffle.

case "$1" in
   start)
      /usr/local/sbin/wwwoffled -c /var/spool/wwwoffle/wwwoffle.conf
      ;;
   stop)
      /usr/local/bin/wwwoffle -kill -c /var/spool/wwwoffle/wwwoffle.conf
      ;;
esac
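
Putting the two copies in place can be sketched as a small helper. This is only an illustration: the function name install_rc_script is made up, and the optional ROOT prefix is an assumption added so that the function can be tried in a scratch directory before touching the real /etc.

```shell
# install_rc_script SCRIPT [ROOT]
# Copy one start/stop script to both SVR4 locations named above.
# ROOT is a hypothetical prefix (default: empty, meaning the real /etc).
install_rc_script() {
    script=$1
    root=${2:-}
    for dest in "$root/etc/rc2.d/S90wwwoffle" "$root/etc/rc0.d/K90wwwoffle"; do
        mkdir -p "$(dirname "$dest")" &&
        cp "$script" "$dest" &&
        chmod 755 "$dest" || return 1
    done
}
```

Running install_rc_script wwwoffle.init /tmp/rctest first shows what would be created, where wwwoffle.init stands for whatever file holds the case statement above.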

Going Online and Offline

If you are using PPP to make the network connection then there are scripts that are run automatically by pppd when the connection is made and when it is broken.

To automate the connect process you will need to edit /etc/ppp/ip-up and add the following to the end of the file.

   /usr/local/bin/wwwoffle -online -c /var/spool/wwwoffle/wwwoffle.conf
   /usr/local/bin/wwwoffle -fetch -c /var/spool/wwwoffle/wwwoffle.conf &

To automate the disconnect process you will need to edit /etc/ppp/ip-down and add the following to the end of the file.

   /usr/local/bin/wwwoffle -offline -c /var/spool/wwwoffle/wwwoffle.conf

One problem with this is that pppd will not wait for the WWWOFFLE fetch that is started in /etc/ppp/ip-up before the network connection is broken. This means that it is quite possible to interrupt the fetch process. WWWOFFLE will try to handle this gracefully, but this is no substitute for monitoring the fetch progress before breaking the connection.
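
One way to make a manual hang-up wait for the fetch is to record the process ID of the background command in /etc/ppp/ip-up and to wait for that process to exit before breaking the connection. The sketch below assumes that the wwwoffle -fetch command does not return until the fetch has finished (which would explain why it is run in the background above). The PID file location and the function name wait_for_pid_file are made up for this example.

```shell
# In /etc/ppp/ip-up the fetch line would become something like:
#   /usr/local/bin/wwwoffle -fetch -c /var/spool/wwwoffle/wwwoffle.conf &
#   echo $! > /var/run/wwwoffle-fetch.pid     # hypothetical PID file

# wait_for_pid_file PIDFILE [TIMEOUT_SECONDS]
# Return 0 once the process recorded in PIDFILE has exited (or the file is
# missing), or 1 if it is still running after TIMEOUT_SECONDS (default 600).
wait_for_pid_file() {
    pidfile=$1
    remaining=${2:-600}
    [ -r "$pidfile" ] || return 0
    pid=$(cat "$pidfile")
    # kill -0 sends no signal; it only tests whether the process exists.
    while kill -0 "$pid" 2>/dev/null; do
        [ "$remaining" -gt 0 ] || return 1
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}
```

A hang-up script could then call wait_for_pid_file /var/run/wwwoffle-fetch.pid before running whatever command you use to break the connection.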

Not Using Syslog

The default option for WWWOFFLE is that error messages get reported using syslog. This is not always convenient if you don't want all of the WWWOFFLE error messages mixed up with the other ones.

There is an alternative, which is to use WWWOFFLE's normal output messages instead of syslog or as well as syslog. When you start WWWOFFLE you can specify that it is not to disconnect from the terminal. This has the effect of causing error messages to be printed to the terminal. These can then be redirected to a separate log file.

If you find your WWWOFFLE startup script it will contain a line like the following:

/usr/local/sbin/wwwoffled -c /var/spool/wwwoffle/wwwoffle.conf

All that you need to do is to change this line so that it looks like this:

/usr/local/sbin/wwwoffled -c /var/spool/wwwoffle/wwwoffle.conf -d 3 >> /var/log/wwwoffle.log 2>&1 &

This will start WWWOFFLE and direct all messages to /var/log/wwwoffle.log. The -d 3 option sets the level of logging; this is equal to log-level=important in the config file. You can make the number smaller for less logging or bigger (up to 6) for too much logging.

If you do this then you will need to have some way of rotating the log file so that it does not grow uncontrollably. Also doing this will keep sending the error messages to syslog as well as to the new logfile. You may want to reduce the level of reporting in syslog to log-level=warning so that only important messages are reported.
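
A minimal rotation sketch follows, assuming that the log is written with >> as in the command above. Because wwwoffled keeps the file open, renaming the file would not make it start a new one, so the contents are copied aside and the original is emptied in place. The function name rotate_log, the size threshold and the choice of keeping a single old copy are assumptions for this example, and any message written between the copy and the truncation is lost.

```shell
# rotate_log FILE [MAX_BYTES]
# If FILE is larger than MAX_BYTES (default 1048576), copy it to FILE.1 and
# empty FILE in place so that a process appending to it can carry on.
# rotate_log is a hypothetical helper, not part of WWWOFFLE.
rotate_log() {
    log=$1
    max=${2:-1048576}
    [ -f "$log" ] || return 0
    # The arithmetic expansion strips any padding that wc adds to its output.
    size=$(($(wc -c < "$log")))
    if [ "$size" -gt "$max" ]; then
        cp "$log" "$log.1" && : > "$log"
    fi
}
```

This could be run regularly (for example from cron) as rotate_log /var/log/wwwoffle.log.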