How to use your unlimited quota 2AM-8AM

Fun with BASH. Disclaimer: this may contain typos, thinkos, etc. Use (or don’t use) entirely at your own risk. The intent is to be educational, not to hand down chapter-&-verse instructions for unquestioning acceptance. If the clock on the computer running this is wrong by more than five minutes, part of the downloading may count against your monthly quota (in which case install rdate & aim it at a suitable time server, then use hwclock to update the PC’s battery-backed clock).

Behold, the grabqueue.sh file:

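#!/bin/sh
# Work through every URL-list file in the pending subdirectory.
for list in pending/*; do
    # N = number of lines in this list file
    N=$(wc -l "$list" | sed -e 's/ .*$//')
    for url in $(seq $N); do
        # L = line number $url of the list file
        L=$(tail -n +$url "$list" | head -n 1 -)
        if [ "$L" != "" ]; then
            wget -c "$L"
        fi
    done
    rm -f "$list"
done
rm -f /tmp/grabqueue.pid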

Run this (in cron) at 02:05 (a little clock slackness):

#!/bin/sh
cd /path/to/download/directory
sh grabqueue.sh &
echo $! >/tmp/grabqueue.pid

Run this (in cron) at 07:55 (again, clock slackness):

#!/bin/sh
if [ -f /tmp/grabqueue.pid ]; then
    kill $(cat /tmp/grabqueue.pid)
    rm -f /tmp/grabqueue.pid
fi
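
For the two cron schedules, the crontab lines might look something like the following. This is only a sketch: startgrab.sh & stopgrab.sh are made-up names for the two wrapper scripts above, and /path/to is a placeholder. The function merely prints the lines, so they can be eyeballed before being pasted in via crontab -e.

```shell
#!/bin/sh
# Print two crontab lines. Field order: minute, hour, day-of-month, month,
# day-of-week, command. startgrab.sh and stopgrab.sh are hypothetical names
# for the 02:05 and 07:55 wrapper scripts.
print_schedule() {
    cat <<'EOF'
5 2 * * * sh /path/to/startgrab.sh
55 7 * * * sh /path/to/stopgrab.sh
EOF
}

print_schedule
```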

Discussion: put the URLs of whatever you wish to fetch within text files inside the pending subdirectory within your downloads directory, one URL per line. Empty lines within these files are ignored.
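
Queueing a URL can be as simple as appending a line to a file in pending. A minimal sketch, assuming it is run from the downloads directory; queue_url and the default file name "list" are illustrative only:

```shell
#!/bin/sh
# Append one URL to a list file inside ./pending (created if missing).
# queue_url and the default list name are hypothetical, not from the post.
queue_url() {
    url=$1
    listfile=${2:-list}
    mkdir -p pending || return 1
    printf '%s\n' "$url" >>"pending/$listfile"
}
```

For example, queue_url http://example.com/big.iso adds that URL as a new line at the end of pending/list.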

The “-c” option (to wget) causes it to continue an existing download (or start a fresh one if the file does not yet (locally) exist). This means that attempting to download a file you already have will occupy a fraction of a second, plus little or zero traffic.

The script deletes each file full of URLs after the last URL in the file has been downloaded from.

The $! phrase in BASH is replaced with the PID (Process ID) of the most recently spawned sub-process. The first cron job launches the script, then records the PID within a temporary file. The second cron job (if said file still exists) kills the process so listed, then deletes the temporary file. Anything not completely downloaded at that point will be resumed during the next morning, until each download is completed.
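
One caveat, as far as I can tell: kill signals only the PID it is given, so a wget which the script has already started may carry on until its current file is finished. A possible way around that (a sketch, not part of the original scripts) is to have the script pass the signal on to its child. run_stoppable is a hypothetical helper name:

```shell
#!/bin/sh
# Run one command in the background, remember its PID via $!, and forward
# a TERM signal to it, so that killing this script also stops the transfer.
run_stoppable() {
    "$@" &
    child=$!
    # On TERM: stop the child, reap it, then exit with 128+15 (killed by TERM).
    trap 'kill "$child" 2>/dev/null; wait "$child" 2>/dev/null; exit 143' TERM
    wait "$child"
    status=$?
    trap - TERM
    return "$status"
}
```

Inside grabqueue.sh the download line would then read run_stoppable wget -c "$L".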

Within the script, $list is the name of the URL-list file currently being processed. $L is the line containing a URL which is being downloaded.
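
The same loop can also be written with read instead of counting lines with wc & picking them out with tail and head. This is an alternative sketch rather than the script above; FETCH is a made-up override hook (defaulting to wget -c) so that the loop can be dry-run with echo:

```shell
#!/bin/sh
# Walk every list file in a directory, fetch each non-empty line,
# then delete the list. process_lists and FETCH are hypothetical names.
process_lists() {
    dir=${1:-pending}
    for list in "$dir"/*; do
        # Skip anything that is not a regular file (including the
        # unexpanded pattern when the directory is empty).
        [ -f "$list" ] || continue
        # "|| [ -n "$L" ]" also catches a last line with no trailing newline.
        while IFS= read -r L || [ -n "$L" ]; do
            if [ "$L" != "" ]; then
                # </dev/null stops the fetch command from reading the list itself.
                ${FETCH:-wget -c} "$L" </dev/null
            fi
        done <"$list"
        rm -f "$list"
    done
}
```

Setting FETCH=echo before calling process_lists prints each URL instead of downloading it (and still deletes the list files, so try it on copies).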

To download videos, ensure that you have the URL for the video file itself (which typically ends with .mpg or .flv or .mp4), rather than the URL of the web page upon which the video is presented (so typically not ending with .html or .asp or .htm or .php).
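
A rough check for page-style URLs could look like this. It is only a guess based on the endings listed above, and looks_like_page is a hypothetical helper name:

```shell
#!/bin/sh
# Guess from the URL's ending whether it points at a web page rather
# than at a file. Returns 0 (true) for page-like endings, 1 otherwise.
looks_like_page() {
    path=${1%%\?*}      # drop any query string before checking the ending
    case $path in
        *.html|*.htm|*.asp|*.php) return 0 ;;
        *) return 1 ;;
    esac
}
```

Within the download loop, something like looks_like_page "$L" && echo "probably a page: $L" >&2 would flag suspect lines without stopping anything.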

Comments

kundip@hotmail.com said…

Here is my attempt
#!/bin/sh
for list in /home/k~p/grabqueue/*; do
N=$(wc -l $list | sed -e 's/ .*$//')
for url in $(seq $N); do
L=$(tail -n +$url $list | head -n 1 -)
if [ "$L" != "" ]; then
wget -c "$L"
fi
done
rm -f $list
done
rm -f /tmp/grabqueue.pid

I put a file named "list" in /home/k~p/grabqueue
Do I need
rm -f /home/k~p/grabqueue/list
as last line so it does not download over and over each night?

11 July, 2010 09:42

Leon RJ Brooks said…

Brett, $list is a variable name, not a filename. Within the “for” loop, it represents each text file in the /home/k~p/grabqueue directory, one file at a time.

Each file is read, each non-empty line in the file is treated as a URL & downloaded.

YouTube URLs will only fetch the page around the video, rather than the video file itself, hence the need for the grabyoutube.sh mucking around.

This process is only useful if your ISP has a no-quota download period (which ExeTel does from 02:00 to 08:00).

11 July, 2010 20:29

Unknown said…

Thanks. I have 20 gig off-peak, 95% unused to date. I usually use the "at" command and wget with a text file, which is prone to running past 8am if links are slow to download. The best for me would be if each line of the text file were deleted as it is done, then something killed wget at 8am. Then I could grab the rest on following nights.

11 July, 2010 22:10

Leon RJ Brooks said…

Yo, that’s what the second cron job does: kills off the downloading process.

One amendment needed for the grabyoutube.sh script: ${RANDOM} fails (resolves to an empty string); it needs to be $RANDOM instead (to resolve to a random number).

12 July, 2010 06:58

Plantagenet Penguinista
