Showing posts from December, 2017

Harvesting from CKAN and Sorting Adjacent Key Value Columns

Many open data portals are built on the open source application CKAN. Metadata can be harvested from these portals using the CKAN API. Many CKAN items include numerous resource URLs, including download links in varying formats, landing page links, web services, and applications. Sorting through this myriad of links can be challenging. This post describes how to:

1. Harvest metadata with the ckan-exporter script
2. Use OpenRefine to sort the resource URLs

Part 1: Harvest metadata with the ckan-exporter script

The CKAN API has a number of calls that return information such as a list of items, tags, or organizations. It also returns an item's metadata; the API calls refer to an item as a package. The ckan-exporter script lets the user define a list of desired metadata elements to harvest from the command line. The readme file documents how to set up the harvest files, and examples are included. The BTAA GDP fork of the ckan-expo