Wget

Introduction

The latest version of wget can be downloaded from http://www.christopherlewis.com/WGet/WGetFiles.htm

Downloading newer files

Here's how to download a list of files, and have wget fetch only the ones that are newer than the local copies:

rem Fetch a fresh copy of the file list
del update.txt
wget http://www.acme.com/download/update.txt
rem Download each file named in the list, but only if the server copy is newer (-N)
wget -N -i update.txt -B http://www.acme.com/download/

Not sure how reliable the -N switch is: timestamps can change when files are uploaded to an FTP server, and a file can have changed even though its size stayed the same. Still, I didn't find a way to force wget to overwrite files every time (-r creates a directory tree); a crude workaround is sketched after the list of failed attempts below.

Failed attempts to overwrite files:

rem Is the date/time on the FTP server reliable? wget reports:
rem "Server file no newer than local file `bla.exe' -- not retrieving."
rem wget -i update.txt -B http://www.acme.com/download/ -m -nd
 
rem -r creates a subdirectory!
rem wget -r -i update.txt -B http://www.acme.com/download/
 
rem -r -nc creates a subdirectory!
rem wget -r -nc -i update.txt -B http://www.acme.com/download/
 
rem reliable?
rem wget -N -i update.txt -B http://www.acme.com/download/
 
rem no better: saves the new download with a .1 suffix instead of overwriting
rem wget -nd -i update.txt -B http://www.acme.com/download/
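One workaround that does force fresh copies: delete the local files first, then re-download. A minimal sketch, assuming update.txt holds one bare filename per line (an assumption about its layout) and that this runs from a .bat file (use %f instead of %%f at an interactive prompt):

rem Delete each file named in the list, then re-download everything
for /F "usebackq delims=" %%f in ("update.txt") do del "%%f" 2>nul
wget -i update.txt -B http://www.acme.com/download/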

Downloading a subdirectory over FTP without ascending

wget -m -np ftp://jdoe:mypasswd@ftp.acme.com/somedir

... where -m = mirror (a shortcut for a bunch of switches: recursion, timestamping, and infinite depth), and -np = do not ascend to the parent directory

Downloading over FTP in passive mode

--passive-ftp
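For example, combined with the mirror command from the previous section (same hypothetical host and credentials):

wget --passive-ftp -m -np ftp://jdoe:mypasswd@ftp.acme.com/somedir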

Downloading a website for offline browsing

wget -mpk http://www.acme.com

Some variants, e.g. for a site served on the local host:

wget -np -I /mysite -m http://localhost

wget -c -N -r -l NUMBER -L http://localhost

where...

-m : mirror, a shortcut for recursive retrieval with timestamping and infinite depth
-p : get all images, etc. needed to display the HTML page
-k : convert the links in downloaded pages so they work locally
-np : do not ascend to the parent directory
-I /mysite : only follow links under the /mysite directory
-c : continue getting partially-downloaded files
-N : only retrieve files that are newer than the local copies (timestamping)
-r : recursive retrieval
-l NUMBER : maximum recursion depth
-L : follow relative links only

Note that -nc (no-clobber, i.e. never overwrite an existing local file) cannot be combined with -N: wget refuses to timestamp and no-clobber at the same time.

Should a login/password be required, use --http-user=USER --http-passwd=PASS
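For example, with made-up credentials:

wget --http-user=jdoe --http-passwd=mypasswd -mpk http://www.acme.com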

If downloading files through FTP instead of HTTP, you can use the --passive-ftp option in case you are behind a firewall or NAT device that blocks the incoming data connection that active FTP requires.

If a site has a robots.txt and wget fails to mirror the site, try the -e "robots=off" switch. If it still doesn't work, have wget pretend it's a different user agent, using e.g. -U "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 4.0)" or -U "Mozilla/3.01Gold (Win95; I)".
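Putting the two together, e.g.:

wget -e "robots=off" -U "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 4.0)" -mpk http://www.acme.com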

Downloading a website directory structure

... without actually downloading any file:

--spider                 don't download anything.
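For example (whether --spider actually recurses varies between wget releases; older builds may only check the top-level URL):

wget -r -np --spider http://www.acme.com/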

Uploading to a remote FTP server (POSSIBLE?)

No: wget is strictly a downloader and has no upload support, so another tool is needed for the upload leg.
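For instance, curl can upload a single file over FTP (reusing this page's hypothetical host and credentials; somefile.zip is a made-up name):

curl -T somefile.zip ftp://jdoe:mypasswd@ftp.acme.com/somedir/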

Q&A

I need to download multiple files

Wget fetches URLs one at a time; for simultaneous downloads, try MultiGet: http://multiget.sourceforge.net
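For a plain sequential batch, wget's own -i switch (used earlier on this page) is enough; files.txt is a made-up list of URLs, one per line:

wget -i files.txt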

Wget doesn't seem able to work with web servers that listen to a port other than the standard TCP 80

You are using an older release of wget. Upgrade to at least release 1.8.x.
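Once upgraded, the port simply goes into the URL, e.g. (8080 being a made-up port number):

wget http://www.acme.com:8080/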

I use Wget to download a site from a server running on my host, and then upload the result to a remote FTP server, but all links refer to http://localhost!

Add the -k (convert links) switch, as in the -mpk example above: it rewrites the links in the downloaded pages as relative ones that work wherever the copy ends up, instead of absolute http://localhost URLs.
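For example, the earlier localhost command with -k added:

wget -np -I /mysite -m -k http://localhost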
