Harvest datasets in a batch with WGET or a Browser Plugin



WGET
  • This command-line program can download datasets in batches, or only the XML files.
  • Some recipes to try:
  • Download all zipped files from an FTP site: wget -r --no-parent -A.zip [site URL]
  • Download only the XML files from an online folder: wget -r -l 1 -np -A '*.xml' [site URL]
  • To download all ZIP files listed in a DCAT JSON file:
    • Create a CSV from the JSON file (see above section called Harvesting Metadata from ArcGIS Open Data or Socrata portals)
    • Save a copy with only the download links (ZIP files)
    • Pass this file to wget, which reads one URL per line: wget -i filename.csv
    • Troubleshooting Note: ArcGIS portals create the downloadable shapefiles on the fly, so downloads may time out or fail, whether requested in batches or one at a time.
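
The DCAT JSON steps above can also be scripted instead of done in a spreadsheet. The following is a minimal Python sketch, not a definitive implementation. It assumes the catalog follows the common DCAT layout with a top-level "dataset" list whose entries carry a "distribution" list, and that each distribution stores its link under "downloadURL" or "accessURL" with the type under "format" or "mediaType". Those field names vary between portals, so check them against the actual file. The function names, the file names, and the example.org URLs are hypothetical.

```python
import json


def zip_links(catalog):
    """Return the unique ZIP download URLs in a DCAT catalog dict, in order."""
    links = []
    for dataset in catalog.get("dataset", []):
        for dist in dataset.get("distribution", []):
            # Assumption: the link sits under "downloadURL" or "accessURL".
            url = dist.get("downloadURL") or dist.get("accessURL") or ""
            fmt = str(dist.get("format", "")).lower()
            media = str(dist.get("mediaType", "")).lower()
            # Treat an entry as a ZIP if the URL, format, or media type says so.
            is_zip = (
                url.lower().endswith(".zip")
                or fmt == "zip"
                or media == "application/zip"
            )
            if url and is_zip and url not in links:
                links.append(url)
    return links


def write_url_list(catalog_path, out_path):
    """Read a DCAT JSON file and write one ZIP URL per line for wget -i."""
    with open(catalog_path, encoding="utf-8") as f:
        catalog = json.load(f)
    links = zip_links(catalog)
    with open(out_path, "w", encoding="utf-8") as f:
        f.write("\n".join(links) + ("\n" if links else ""))
    return len(links)


# A small made-up catalog that shows the assumed structure.
sample = {
    "dataset": [
        {"title": "Parcels", "distribution": [
            {"format": "ZIP", "accessURL": "https://example.org/parcels.zip"},
            {"format": "CSV", "accessURL": "https://example.org/parcels.csv"},
        ]},
        {"title": "Roads", "distribution": [
            {"mediaType": "application/zip",
             "downloadURL": "https://example.org/roads/download"},
        ]},
    ]
}
print(zip_links(sample))
```

With a real catalog, calling write_url_list("data.json", "zip_urls.txt") would produce a file that can be passed to wget: wget -i zip_urls.txt. Because ArcGIS downloads may time out, wget's --tries and --timeout options may be worth adding to that command.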


Browser Plugins

  • A browser plugin is another way to download datasets. This option may work if WGET fails.
  • As of this writing, a useful one for Chrome is Tab Save, which lets the user paste a list of URLs into a window.