06 July 2010

How to use your unlimited quota 2AM-8AM

Fun with BASH. Disclaimer: this may contain typos, thinkos, etc. Use (or don’t use) entirely at your own risk. The intent here is to be educational, not to offer something to be accepted chapter-and-verse without question. If the clock on the computer running this is wrong by more than five minutes, part of the downloading may count against your monthly quota. In that case, install rdate and aim it at a suitable time server, then use hwclock to update the PC’s battery-backed clock.

Behold, the grabqueue.sh file:

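#!/bin/sh
# Fetch every URL listed in each file under pending/, then delete that file.
for list in pending/*; do
    [ -f "$list" ] || continue    # nothing queued: the glob matched no file
    # Read one line at a time; the || test keeps a last line with no newline.
    while IFS= read -r L || [ -n "$L" ]; do
        if [ "$L" != "" ]; then
            wget -c "$L" </dev/null    # keep wget away from the list on stdin
        fi
    done <"$list"
    rm -f "$list"
done
rm -f /tmp/grabqueue.pid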


Run this (in cron) at 02:05 (a little clock slackness):

#!/bin/sh
# Start the queue in the background and record its PID for the 07:55 job.
cd /path/to/download/directory || exit 1
sh grabqueue.sh &
echo $! >/tmp/grabqueue.pid


Run this (in cron) at 07:55 (again, clock slackness):

#!/bin/sh
# Stop the queue if it is still running.
if [ -f /tmp/grabqueue.pid ]; then
    kill "$(cat /tmp/grabqueue.pid)"
    rm -f /tmp/grabqueue.pid
fi
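
For anyone who has not scheduled cron jobs before, here is a rough sketch of what the two entries might look like. The script names start-grabqueue.sh and stop-grabqueue.sh, the /path/to location and the grabqueue.cron file name are all placeholders of mine, not anything fixed by the scripts above.

```shell
#!/bin/sh
# Write the two schedule lines to a file (all names here are placeholders).
cat >grabqueue.cron <<'EOF'
# m  h  dom mon dow  command
5    2  *   *   *    /path/to/start-grabqueue.sh
55   7  *   *   *    /path/to/stop-grabqueue.sh
EOF
cat grabqueue.cron
```

Running "crontab grabqueue.cron" would install these lines, but it replaces your whole existing crontab, so if you already have cron jobs it is safer to paste the two lines in via "crontab -e".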


Discussion: put the URLs of whatever you wish to fetch within text files inside the pending subdirectory within your downloads directory, one URL per line. Empty lines within these files are ignored.
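
As a small illustration of the above, this is one way a queue file could be created. The file name tonight.txt and the example.com URLs are made up for the example; any file name inside pending will do.

```shell
#!/bin/sh
# Run this from inside your downloads directory.
mkdir -p pending
# One URL per line; the empty line in the middle is harmless.
printf '%s\n' \
    'http://example.com/files/first.iso' \
    '' \
    'http://example.com/files/second.mp4' \
    >pending/tonight.txt
# Show how many non-empty lines (URLs) are queued.
grep -c . pending/tonight.txt
```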

The “-c” option to wget makes it continue a partial download, or start a fresh one if the file does not yet exist locally. Attempting to download a file you already have therefore takes a fraction of a second and uses little or no traffic.

The script deletes each file of URLs once wget has been run on the last URL in it.

In BASH, $! expands to the PID (Process ID) of the most recently started background command. The first cron job launches the script in the background, then records its PID in a temporary file. The second cron job, if that file still exists, kills the process so listed, then deletes the temporary file. Anything not completely downloaded at that point is resumed the next morning, and so on until each download is complete. Note that kill signals only the script's own shell, so a wget that is already running may carry on until its current file finishes.
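
To see the $! and PID-file mechanism in isolation, here is a minimal sketch that uses a long sleep as a stand-in for the download script. The temporary file from mktemp plays the part of /tmp/grabqueue.pid; nothing here touches the real queue.

```shell
#!/bin/sh
pidfile=$(mktemp)            # stand-in for /tmp/grabqueue.pid
sleep 300 &                  # stand-in for "sh grabqueue.sh &"
echo $! >"$pidfile"          # record the PID of the background job

pid=$(cat "$pidfile")
kill "$pid"                  # what the 07:55 job does
status=0
wait "$pid" || status=$?     # collect the exit status of the killed job
rm -f "$pidfile"
echo "background job exited with status $status"
```

A job ended by kill's default signal (SIGTERM, number 15) is reported with status 128 + 15 = 143, which is an easy way to confirm the kill took effect.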

Within the script, $list is the name of the URL-list file currently being processed. $L is the line containing a URL which is being downloaded.

To download videos, ensure that you have the URL for the video file itself (which typically ends with .mpg or .flv or .mp4), rather than the URL of the web page upon which the video is presented (so typically not ending with .html or .asp or .htm or .php).
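
If you want a quick sanity check before queueing a URL, something like the following could help. It is only a heuristic based on the extensions mentioned above, and the function name looks_like_media and the example.com URLs are inventions for this sketch.

```shell
#!/bin/sh
# Heuristic only: succeed if the URL path ends in a media-file extension.
looks_like_media() {
    path=${1%%\?*}           # drop any ?query string before testing
    case "$path" in
        *.mpg|*.flv|*.mp4) return 0 ;;
        *) return 1 ;;
    esac
}

for u in 'http://example.com/clips/talk.mp4' 'http://example.com/watch.php?v=123'; do
    if looks_like_media "$u"; then
        echo "queue: $u"
    else
        echo "skip:  $u"
    fi
done
```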

4 comments:

kundip@hotmail.com said...

Here is my attempt
#!/bin/sh
for list in /home/k~p/grabqueue/*; do
    N=$(wc -l $list | sed -e 's/ .*$//')
    for url in $(seq $N); do
        L=$(tail -n +$url $list | head -n 1 -)
        if [ "$L" != "" ]; then
            wget -c "$L"
        fi
    done
    rm -f $list
done
rm -f /tmp/grabqueue.pid

I put a file named "list" in /home/k~p/grabqueue
Do I need
rm -f /home/k~p/grabqueue/list
as last line so it does not download over and over each night?

Leon RJ Brooks said...

Brett, $list is a variable name, not a filename. Within the “for” loop, it represents each text file in the /home/k~p/grabqueue directory, one file at a time.

Each file is read, each non-empty line in the file is treated as a URL & downloaded.

YouTube URLs will only fetch the page around the video, rather than the video file itself, hence the need for the grabyoutube.sh mucking around.

This process is only useful if your ISP has a no-quota download period (which ExeTel does from 02:00 to 08:00).

kundip said...

Thanks. I have 20 gig off-peak, 95% unused to date. I usually use the "at" command and wget with a text file, which is prone to running past 8am if links are slow to download. The best for me would be if each line of the text file were deleted as it is done, and then at 8am something killed wget. Then I could grab the rest on following nights.

Leon RJ Brooks said...

Yo, that’s what the second cron job does: kills off the downloading process.

One amendment needed for the grabyoutube.sh script: ${RANDOM} fails (it resolves to an empty string), so it needs to be $RANDOM instead (which resolves to a random number).