How to download and mirror a website with wget
In an increasingly connected digital world, being able to access web content even when offline is a significant convenience. One of the most useful techniques for downloading and mirroring an entire website is using the wget tool. This powerful and versatile command allows you to obtain a complete copy of a website, including its pages, images, CSS files, and JavaScript, so you can browse the site offline or create a backup.
What is wget?
wget (World Wide Web Get) is a command-line tool used to download content from the Internet. It is widely used on Unix, Linux, and macOS systems but can also be installed on Windows. wget is especially useful for downloading entire websites or individual files in an automated manner while maintaining the original structure of the links between pages.
How to use wget to mirror a website
If you need to mirror a website, meaning download it completely for offline browsing or backup, you can use the wget command with some specific options.
Basic Command
The command to download a complete website with wget is as follows:
wget --mirror --convert-links --adjust-extension --page-requisites --no-parent -P /path/to/save https://www.your-site.com
Explanation of the options:
- --mirror: turns on recursive downloading with unlimited depth and time-stamping, so wget follows links through the whole site.
- --convert-links: once the download has finished, rewrites the links in the downloaded pages to point at the local copies so that they can be navigated offline.
- --adjust-extension: appends .html (or .css) to the local filename of HTML and CSS files whose URLs do not already end in that extension.
- --page-requisites: downloads all resources needed to properly display the pages (e.g., CSS, images, scripts).
- --no-parent: never ascends to the parent directory of the starting URL, so the download stays within that path.
- -P /path/to/save: specifies the directory under which the website will be saved.
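Each of the long options above also has a short form. The following is a minimal sketch that wraps the same command in a shell function using those short forms; the function name mirror_site and the default destination are assumptions made for illustration, not part of wget.

```shell
#!/bin/sh
# mirror_site URL [DEST_DIR]
# Hypothetical helper: runs wget with the short forms of the options above.
#   -m  = --mirror            -k  = --convert-links
#   -E  = --adjust-extension  -p  = --page-requisites
#   -np = --no-parent         -P  = --directory-prefix
mirror_site() {
    url=$1
    dest=${2:-.}   # assumption: default to the current directory
    wget -m -k -E -p -np -P "$dest" "$url"
}

# Usage (left commented so the snippet can be sourced without starting a download):
# mirror_site https://www.your-site.com /path/to/save
```

The long forms are easier to read in scripts and documentation; the short forms are mostly convenient when typing the command by hand.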
Practical Example
Suppose you want to download the website https://www.your-site.com to the folder /Users/vincenzo/Downloads/site. The command would be:
wget --mirror --convert-links --adjust-extension --page-requisites --no-parent -P /Users/vincenzo/Downloads/site https://www.your-site.com
This command will create a complete copy of the website in the specified folder, with all the necessary files for offline viewing.
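With default settings, a recursive wget download places the files in a subdirectory named after the host, inside the folder given with -P. The sketch below derives that location from the URL; mirror_root is a hypothetical helper name, and it assumes a URL without a port or user name and that --no-host-directories is not in use.

```shell
#!/bin/sh
# mirror_root URL DEST: print the directory where wget is expected to place the site.
# Assumes default settings and a URL without a port or user:password part.
mirror_root() {
    url=$1
    dest=$2
    host=${url#*://}     # strip the scheme, e.g. "https://"
    host=${host%%/*}     # strip any path after the host name
    printf '%s/%s\n' "$dest" "$host"
}

mirror_root https://www.your-site.com /Users/vincenzo/Downloads/site
# prints /Users/vincenzo/Downloads/site/www.your-site.com
```

The site's start page is then normally saved as index.html inside that directory, which is the file to open in a browser for offline viewing.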
Why use wget to mirror a website?
- Offline browsing: You can access the website without an Internet connection, which is great when travelling or in places with limited connectivity.
- Website backups: You can create a backup of the website, which is useful if you want to archive a static version of it.
- Speed and automation: wget is fast and can be easily automated to download large sites.
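As one possible way to automate the download, the mirror command can be wrapped in a small script and run on a schedule. This is a sketch under stated assumptions: the function name scheduled_mirror, the log file name mirror.log, and the script path in the crontab comment are all hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper for unattended runs (e.g. from cron).
# Names, paths, and the log location are assumptions for illustration.
scheduled_mirror() {
    url=$1
    dest=$2
    log=$dest/mirror.log

    # Fail early with a clear message if wget is not installed.
    command -v wget >/dev/null 2>&1 || {
        echo "wget not found" >&2
        return 1
    }

    mkdir -p "$dest" || return 1

    # -nv keeps the log terse; -a appends to the log instead of overwriting it.
    wget --mirror --convert-links --adjust-extension --page-requisites \
         --no-parent -nv -a "$log" -P "$dest" "$url"
}

# Example crontab entry, assuming this file is saved as /usr/local/bin/mirror.sh
# and ends with a call to scheduled_mirror; it would run every day at 03:00:
# 0 3 * * * /usr/local/bin/mirror.sh
```

Writing to a log file rather than the terminal makes it possible to check afterwards whether an unattended run completed.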
Important Considerations
- Respect for copyright: Ensure that you have permission to download and use the content of a website. Some sites disallow automated crawling in their robots.txt, which wget honours by default during recursive downloads.
- Disk space: A full website, especially one rich in images and multimedia files, can take up a lot of space. Make sure you have enough room on your hard drive.
- Download time: Depending on the size of the site and your Internet connection, the mirroring process might take some time.
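The considerations above can be partly addressed from the command line. The sketch below checks free disk space before starting and adds options that slow the crawl and cap its size; free_kb and polite_mirror are hypothetical helper names, and the specific limits are example values rather than recommendations.

```shell
#!/bin/sh
# free_kb DIR: print the free space on DIR's filesystem in kilobytes.
# df -P gives the portable POSIX layout; column 4 of the data row is "Available".
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# polite_mirror URL DEST: hypothetical helper; the limits are example values.
polite_mirror() {
    url=$1
    dest=$2
    # --wait=1 / --random-wait: pause about a second (randomised) between requests
    # --limit-rate=500k:        cap bandwidth at roughly 500 KB/s
    # --quota=2000m:            stop fetching new files once about 2000 MB are stored
    wget --mirror --convert-links --adjust-extension --page-requisites \
         --no-parent --wait=1 --random-wait --limit-rate=500k --quota=2000m \
         -P "$dest" "$url"
}

# Usage (commented so the snippet can be sourced without starting a download):
# free_kb /path/to/save
# polite_mirror https://www.your-site.com /path/to/save
```

Waiting between requests and limiting the rate make the download take longer, but they reduce the load placed on the server being mirrored.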
Conclusion
Using wget to download an entire website is a powerful and flexible technique. If you need to browse a site offline or create a backup, wget is one of the best tools at your disposal. With the right options, you can easily customise the command to suit your needs.