How to download and mirror a website with wget

A local copy of a website is useful when you need to read it offline or keep a backup. The wget tool can download and mirror an entire site, including its pages, images, CSS files, and JavaScript files, so that the copy can be browsed without an Internet connection.

What is wget?

wget (the name combines "World Wide Web" and "get") is a non-interactive command-line tool for downloading content from the Internet. It is widely used on Unix, Linux, and macOS systems and can also be installed on Windows. It is especially useful for downloading individual files or entire websites in an automated manner while preserving the site's directory structure.
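
Before going further, it can help to confirm that wget is actually installed. The following is a minimal sketch: the helper name have_command is an illustrative assumption, and the package managers named in the messages are common choices rather than the only ones.

```shell
#!/bin/sh
# have_command is a hypothetical helper: it succeeds if the named program is on PATH.
have_command() {
  command -v "$1" >/dev/null 2>&1
}

if have_command wget; then
  # Print only the first line of the version banner.
  wget --version | head -n 1
else
  echo "wget not found; install it with your package manager" >&2
  echo "(for example: apt install wget, or brew install wget on macOS with Homebrew)" >&2
fi
```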

How to use wget to mirror a website

If you need to mirror a website, meaning download it completely for offline browsing or backup, you can use the wget command with some specific options.

Basic Command

The command to download a complete website with wget is as follows:

wget --mirror --convert-links --adjust-extension --page-requisites --no-parent -P /path/to/save https://www.your-site.com

Explanation of the options:

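  • --mirror: turns on recursive downloading with time-stamping and unlimited recursion depth (it is shorthand for -r -N -l inf --no-remove-listing), so wget follows links from the starting page and fetches every page it can reach on the site.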
  • --convert-links: converts all links within the downloaded pages so that they can be navigated offline without issues.
  • --adjust-extension: appends .html to downloaded HTML pages (and .css to stylesheets) whose URLs do not already end in that extension, so that they open correctly from disk.
  • --page-requisites: downloads all resources needed to properly display the pages (e.g., CSS, images, scripts).
  • --no-parent: never ascends above the directory of the starting URL, which keeps the download confined to that part of the site.
  • -P /path/to/save: specifies the directory under which the website will be saved; wget creates a subfolder named after the host inside it.
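
For reference, each long option above has a short form. The sketch below wraps the same command in a small function. The function name mirror_site and the WGET override variable are illustrative assumptions, added so that the command can be previewed without touching the network.

```shell
#!/bin/sh
# mirror_site DEST URL: hypothetical wrapper around the mirroring command.
# Short forms: -m = --mirror, -k = --convert-links, -E = --adjust-extension,
#              -p = --page-requisites, -np = --no-parent.
mirror_site() {
  # WGET is an assumed override; it defaults to the real wget binary.
  # It is left unquoted on purpose so that "echo wget" splits into two words.
  ${WGET:-wget} -m -k -E -p -np -P "$1" "$2"
}

# Dry run in a subshell: print the command instead of executing it.
( WGET="echo wget"; mirror_site /path/to/save https://www.your-site.com )
```

Calling mirror_site without the override runs the real download.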

Practical Example

Suppose you want to download the website https://www.your-site.com to the folder /Users/vincenzo/Downloads/site. The command would be:

wget --mirror --convert-links --adjust-extension --page-requisites --no-parent -P /Users/vincenzo/Downloads/site https://www.your-site.com

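This command creates a copy of the website under the specified folder, in a subfolder named after the host (www.your-site.com), with the files needed for offline viewing. Because wget does not execute JavaScript, content that a page loads dynamically in the browser may be missing from the copy.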
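
To find where the start page ends up, note that a recursive download stores files under a folder named after the host. The helper below is a sketch under that assumption. The name local_root is hypothetical, and the function is meant only for plain http:// and https:// URLs without credentials.

```shell
#!/bin/sh
# local_root DEST URL: print the folder wget is expected to create for URL under DEST.
local_root() {
  host=${2#*://}     # drop the scheme, e.g. "https://"
  host=${host%%/*}   # drop everything after the host name
  printf '%s/%s\n' "${1%/}" "$host"
}

local_root /Users/vincenzo/Downloads/site https://www.your-site.com
# prints /Users/vincenzo/Downloads/site/www.your-site.com
```

Opening index.html inside that folder in a browser is usually the way to start browsing the copy.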

Why use wget to mirror a website?

  • Offline browsing: You can access the website without an Internet connection, which is great when travelling or in places with limited connectivity.
  • Website backups: You can create a backup of the website, which is useful if you want to archive a static version of it.
  • Automation: wget runs non-interactively, so a mirror job can be scripted or scheduled and left to download large sites unattended.
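
As an illustration of the automation point, a mirror job can be wrapped in a script and scheduled. This is a minimal sketch: the function name run_mirror, the log file naming, the WGET override, and the cron schedule in the comment are all assumptions. The -o option is wget's flag for writing its messages to a log file.

```shell
#!/bin/sh
# run_mirror DEST URL: hypothetical wrapper that mirrors URL into DEST
# and writes wget's messages to a dated log file inside DEST.
run_mirror() {
  dest=$1
  url=$2
  log="$dest/mirror-$(date +%Y-%m-%d).log"
  mkdir -p "$dest" || return 1
  # WGET is an assumed override for dry runs; it defaults to the real wget.
  ${WGET:-wget} --mirror --convert-links --adjust-extension \
    --page-requisites --no-parent -o "$log" -P "$dest" "$url"
}

# Run only when the script is given a destination and a URL.
if [ "$#" -eq 2 ]; then
  run_mirror "$1" "$2"
fi

# Example crontab entry (assumed schedule: every day at 03:00):
# 0 3 * * * /path/to/mirror.sh /path/to/save https://www.your-site.com
```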

Important Considerations

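  • Respect for copyright and site policy: Ensure that you have permission to download and use the content of a website. Some sites restrict automated downloading through their robots.txt file, which wget honours by default during recursive downloads, or through their terms of use.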
  • Disk space: A full website, especially one rich in images and multimedia files, can take up a lot of space. Make sure you have enough room on your hard drive.
  • Download time: Depending on the size of the site and your Internet connection, the mirroring process might take some time.
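
The considerations above map onto a few additional wget options. The sketch below is one possible combination and is not part of the original command: --wait and --random-wait pause between requests, --limit-rate caps bandwidth, and --quota stops fetching new files once a total size has been downloaded. The specific values (1 second, 500k, 2000m), the function name polite_mirror, and the WGET override are assumptions to adjust.

```shell
#!/bin/sh
# polite_mirror DEST URL: hypothetical variant of the mirror command with throttling.
polite_mirror() {
  # WGET is an assumed override for dry runs; it defaults to the real wget.
  ${WGET:-wget} --mirror --convert-links --adjust-extension \
    --page-requisites --no-parent \
    --wait=1 --random-wait --limit-rate=500k --quota=2000m \
    -P "$1" "$2"
}

# Check free space on the filesystem that holds the current directory.
df -h .

# After a run, report how much space the mirror occupies:
# du -sh /path/to/save
```

Slower, rate-limited runs take longer but place less load on the server being mirrored.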

Conclusion

Using wget to download an entire website is a powerful and flexible technique. If you need to browse a site offline or create a backup, wget is one of the best tools at your disposal. With the right options, you can easily customise the command to suit your needs.