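The wget command downloads files on Linux. The usage below is taken from its GNU help documentation. Since that documentation is entirely in English, I tried each command myself and recorded the key points for future reference:
Simple Usage: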
1、Say you want to download a URL. Just type:
wget http://fly.srk.fer.hr/  Note: the simplest form; it downloads the URL directly.
2、But what will happen if the connection is slow, and the file is lengthy? The connection will probably fail before the whole file is retrieved, more than once. In this case, Wget will try getting the file until it either gets the whole of it, or exceeds the default number of retries (this being 20). It is easy to change the number of tries to 45, to ensure that the whole file will arrive safely:
wget --tries=45 http://fly.srk.fer.hr/jpg/flyweb.jpg  Note: --tries=45 allows up to 45 attempts when the connection fails or the server does not respond; 0 (or inf) means retry indefinitely.
Now let's leave Wget to work in the background, and write its progress to the log file log. It is tiring to type --tries, so we shall use -t.
wget -t 45 -o log http://fly.srk.fer.hr/jpg/flyweb.jpg &  Note: -o log writes wget's messages to the file log instead of the screen. Any file name works; for example, wget -t 45 -o log.txt http://www.wcsky.com writes them to log.txt.
The ampersand at the end of the line makes sure that Wget works in the background. To remove the limit on the number of retries, use -t inf.
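Here is a minimal sketch of the background pattern above, assuming bash. The helper name bg_fetch and the variable BG_FETCH_PID are hypothetical and are not part of wget. The function stores the PID in a global variable rather than printing it. Capturing it with $(...) would start the job inside a subshell, and the calling shell cannot wait on a job that belongs to a subshell.

```shell
#!/usr/bin/env bash
# bg_fetch URL LOGFILE [TRIES]
# Hypothetical helper: start wget in the background with a retry limit
# and a log file, and remember the job's PID in BG_FETCH_PID.
bg_fetch() {
  local url=$1 log=$2 tries=${3:-45}
  # -t sets the number of tries; -o sends wget's messages to $log
  wget -t "$tries" -o "$log" "$url" &
  # $! is the PID of the job just started; keep it for a later `wait`
  BG_FETCH_PID=$!
}
```

For example, run bg_fetch http://fly.srk.fer.hr/jpg/flyweb.jpg log inf and continue with other work. Later, wait "$BG_FETCH_PID" returns wget's exit status, where 0 means success.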
The usage of FTP is as simple. Wget will take care of login and password.
wget ftp://gnjilux.srk.fer.hr/welcome.msg
Advanced Usage
1、Create a five levels deep mirror image of the GNU web site, with the same directory structure the original has, with only one try per document, saving the log of the activities to gnulog:
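wget -r -t1 http://www.gnu.org/ -o gnulog  Note: -r recursively downloads a mirror of http://www.gnu.org/ (five levels deep by default), and -t1 tries each document only once.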
The same as the above, but convert the links in the HTML files to point to local files, so you can view the documents off-line:
wget --convert-links -r http://www.gnu.org/ -o gnulog  Note: downloads a mirror of http://www.gnu.org/ and rewrites the links in the downloaded pages so they point to the local copies.
2、Retrieve only one HTML page, but make sure that all the elements needed for the page to be displayed, such as inline images and external style sheets, are also downloaded. Also make sure the downloaded page references the downloaded links.
wget -p --convert-links http://www.server.com/dir/page.html
The HTML page will be saved to www.server.com/dir/page.html, and the images, stylesheets, etc., somewhere under www.server.com/, depending on where they were on the remote server.
3、The same as the above, but without the www.server.com/ directory. In fact, I don't want all those random server directories anyway; just save all the files under a download/ subdirectory of the current directory.
wget -p --convert-links -nH -nd -Pdownload http://www.server.com/dir/page.html
wget -Pdownload http://www.wcsky.com  Note: downloads the content of http://www.wcsky.com into the download folder, and creates that folder automatically if it does not exist.
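You can wrap item 3 in a small bash function so the flags do not have to be retyped. This is a sketch only. The name save_page_offline is hypothetical, and the default directory download simply follows the example above.

```shell
#!/usr/bin/env bash
# save_page_offline URL [DIR]
# Hypothetical wrapper: fetch one page plus its images and stylesheets,
# rewrite its links for offline viewing, and put every file directly in DIR.
save_page_offline() {
  local url=$1 dir=${2:-download}
  # -p               also fetch page requisites (inline images, CSS)
  # --convert-links  make the saved HTML reference the local copies
  # -nH              no www.server.com/ host directory
  # -nd              no remote directory hierarchy at all
  # -P               save under $dir (wget creates it if missing)
  wget -p --convert-links -nH -nd -P "$dir" "$url"
}
```

For example, save_page_offline http://www.server.com/dir/page.html puts page.html and its images and stylesheets directly in ./download.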
4、Retrieve the index.html of www.lycos.com, showing the original server headers:
wget -S http://www.lycos.com/  Note: -S prints the headers sent by the server.
5、Save the server headers with the file, perhaps for post-processing.
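wget --save-headers http://www.lycos.com/  Note: downloads the URL and writes the server headers into the saved file, ahead of the content. Older wget releases also accepted -s for this, but current manuals list only the long form --save-headers.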
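The GNU manual describes a file written with --save-headers as the server headers, then an empty line, then the actual content. Assuming that layout, a post-processing sketch in bash and awk can split the two parts. The name split_saved_headers is hypothetical. Because awk works line by line, this approach suits text bodies such as HTML, not binary files.

```shell
#!/usr/bin/env bash
# split_saved_headers FILE HEADERS_OUT BODY_OUT
# Hypothetical helper for files saved with --save-headers: everything before
# the first empty line is the HTTP header block, the rest is the body.
split_saved_headers() {
  local in=$1 hdr=$2 body=$3
  : > "$hdr"; : > "$body"   # create both outputs even if one part is empty
  awk -v hdr="$hdr" -v body="$body" '
    in_body { print > body; next }       # past the separator: copy body lines
    { line = $0; sub(/\r$/, "", line) }  # header lines usually end in CRLF
    line == "" { in_body = 1; next }     # first empty line ends the headers
    { print line > hdr }                 # header line without its CR
  ' "$in"
}
```

For example, after wget --save-headers http://www.lycos.com/, running split_saved_headers index.html headers.txt body.html separates the two parts.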
6、Retrieve the first two levels of wuarchive.wustl.edu, saving them to /tmp.
wget -r -l2 -P/tmp ftp://wuarchive.wustl.edu/
7、You want to download all the GIFs from a directory on an HTTP server. You tried wget http://www.server.com/dir/*.gif, but that didn’t work because HTTP retrieval does not support globbing. In that case, use:
wget -r -l1 --no-parent -A.gif http://www.server.com/dir/
More verbose, but the effect is the same. -r -l1 means to retrieve recursively (see Recursive Retrieval), with maximum depth of 1. --no-parent means that references to the parent directory are ignored (see Directory-Based Limits), and -A.gif means to download only the GIF files. -A "*.gif" would have worked too.
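-A also accepts a comma-separated list of suffixes or patterns, so the same command can collect several image types in one run. Here is a sketch in bash. The name fetch_by_suffix is a hypothetical helper.

```shell
#!/usr/bin/env bash
# fetch_by_suffix DIR_URL DEST SUFFIXES
# Hypothetical wrapper for item 7: recurse one level, never climb to the
# parent directory, and keep only files whose names end in one of SUFFIXES
# (a comma-separated list such as "gif,jpg,png"). Files land under DEST.
fetch_by_suffix() {
  local url=$1 dest=$2 suffixes=$3
  wget -r -l1 --no-parent -A "$suffixes" -P "$dest" "$url"
}
```

For example, fetch_by_suffix http://www.server.com/dir/ images gif,jpg,png. During a recursive run, wget still downloads HTML pages to discover links. It then deletes any pages that do not match -A.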
8、Suppose you were in the middle of downloading, when Wget was interrupted. Now you do not want to clobber the files already present. It would be:
wget -nc -r http://www.gnu.org/
wget -c http://www.wcsky.com/1.mp3  Note: -c resumes an interrupted download from where it stopped. Extremely useful!
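-c only helps when the server supports resuming (HTTP servers that honor the Range header, or FTP servers). If the server does, a bash sketch can rerun wget -c until the file is complete, so each run continues the partial file instead of starting over. The name fetch_with_resume is hypothetical, and its defaults of 5 runs with a 10-second pause are arbitrary choices.

```shell
#!/usr/bin/env bash
# fetch_with_resume URL [MAX_RUNS] [PAUSE_SECONDS]
# Hypothetical helper: rerun `wget -c` until it succeeds or MAX_RUNS is used up.
# Each run resumes the partial file left by the previous run.
fetch_with_resume() {
  local url=$1 max=${2:-5} pause=${3:-10} run
  for ((run = 1; run <= max; run++)); do
    if wget -c "$url"; then
      return 0                      # download finished
    fi
    echo "wget run $run/$max failed; resuming in ${pause}s" >&2
    sleep "$pause"
  done
  return 1                          # gave up after MAX_RUNS runs
}
```

For example, fetch_with_resume http://www.wcsky.com/1.mp3 10 30 allows up to ten resumed runs, 30 seconds apart.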
9、If you want to encode your own username and password to HTTP or FTP, use the appropriate URL syntax (see URL Format).
wget ftp://hniksic:mypassword@unix.server.com/.emacs
Note, however, that this usage is not advisable on multi-user systems because it reveals your password to anyone who looks at the output of ps.
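One way to keep the password out of ps is wget's support for a ~/.netrc file, which holds machine/login/password entries. The bash sketch below reads the password from standard input, so it never appears in a process argument list, and appends an entry to the file. The name add_netrc_entry is a hypothetical helper. The sketch also assumes the password contains no whitespace, because this simple netrc line format cannot hold unquoted spaces.

```shell
#!/usr/bin/env bash
# add_netrc_entry HOST LOGIN [NETRC_FILE]
# Hypothetical helper: read the password from stdin (never from argv) and
# append a .netrc entry for HOST. Assumes the password has no whitespace.
add_netrc_entry() {
  local host=$1 login=$2 file=${3:-$HOME/.netrc} pass
  # -s: no echo on a terminal; -p prompt is shown only when reading a terminal
  IFS= read -r -s -p 'Password: ' pass || return 1
  # make sure the file exists and is readable only by its owner
  ( umask 077; touch "$file" )
  chmod 600 "$file"
  # printf is a bash builtin, so the password is not exposed via ps
  printf 'machine %s login %s password %s\n' "$host" "$login" "$pass" >> "$file"
}
```

After running add_netrc_entry unix.server.com hniksic and typing the password, wget ftp://unix.server.com/.emacs can log in with the stored credentials. As an alternative, wget --user=hniksic --ask-password ftp://unix.server.com/.emacs prompts for the password interactively.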
10、You would like the output documents to go to standard output instead of to files?
wget -O - http://jagor.srce.hr/ http://www.srce.hr/
You can also combine the two options and make pipelines to retrieve the documents from remote hotlists:
wget -O - http://cool.list.com/ | wget --force-html -i -
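The pipeline above passes HTML to the second wget on standard input, so any relative links in the hotlist have no page to resolve against. wget's --base option supplies that reference URL. Here is a sketch in bash. The name fetch_hotlist is hypothetical, and -q on the first wget only keeps its progress messages off the terminal.

```shell
#!/usr/bin/env bash
# fetch_hotlist LIST_URL
# Hypothetical helper: download the hotlist page to stdout, then let a second
# wget parse it as HTML (--force-html -i -) and fetch every link it contains.
# --base resolves relative links in the list against LIST_URL.
fetch_hotlist() {
  local list_url=$1
  wget -q -O - "$list_url" | wget --force-html --base="$list_url" -i -
}
```

For example, fetch_hotlist http://cool.list.com/ fetches every page the list links to, including links written as relative paths.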