
Posts

Scraping Portal Discovery Metadata and Merging it with Standards Documentation Metadata

Summary: This post describes a technique for scraping Portal Discovery Metadata from a custom site and merging it with Standards Documentation Metadata in accompanying XML files. The example portal used is PASDA, but the technique could be modified for other repositories. Background: The BTAA GDP aggregates metadata to provide a catalog of geospatial resources from public data providers. There are generally two types of sources for the metadata: Portal Discovery Metadata: This is found within the data provider's portal application and may include minimal elements, such as title, date, description, and links. Several structured data portal applications, such as ArcGIS Hub and Socrata, provide this through their APIs as DCAT. Other portals, such as CKAN, have APIs that expose the Portal Discovery Metadata in a custom schema. Standards Documentation Metadata: This is a file that accompanies the dataset and includes much more detail, such as ...
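The merge step can be illustrated with a minimal Python sketch. It is not the post's actual workflow: it assumes the scraped portal record is already a dict, that the accompanying XML file is an FGDC CSDGM record, and that "portal value wins when present, otherwise fall back to the XML" is an acceptable merge policy. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

def read_fgdc(xml_text):
    """Pull a few discovery fields out of an FGDC CSDGM record."""
    root = ET.fromstring(xml_text)  # root element is <metadata>

    def text(path):
        # findtext returns None when the element is absent
        value = root.findtext(path)
        return value.strip() if value else None

    bbox = [text(f"idinfo/spdom/bounding/{tag}")
            for tag in ("westbc", "eastbc", "northbc", "southbc")]
    return {
        "title": text("idinfo/citation/citeinfo/title"),
        "description": text("idinfo/descript/abstract"),
        # only emit a bounding box when all four coordinates are present
        "bbox": None if None in bbox else ",".join(bbox),
    }

def merge_records(portal, standards):
    """Keep non-empty portal values; fill the gaps from the standards file."""
    merged = dict(standards)
    for key, value in portal.items():
        if value:  # assumption: the portal record wins for any non-empty field
            merged[key] = value
    return merged
```

Under this policy, links and titles scraped from the portal are kept, while fields the portal page lacks (such as the bounding box) come from the XML.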
Recent posts

Geo4LibCamp2018 & Index Maps

January saw the third annual Geo4LibCamp, a hands-on "unconference" meeting to share best practices, solve common problems, and address technical issues with integrating geospatial data into a repository and associated services. This year's event, hosted at Stanford University, included nearly 50 attendees from 26 academic institutions and organizations. Highlights from Day 1 included the Keynote Address from the always inspiring Stace Maples, who described how the Stanford Geospatial Center has been able to aid international public health projects, and the plenary panel, Getting started and keeping momentum, with speakers from five institutions that have successfully set up geospatial repositories and/or discovery applications. The featured speaker on Day 2 was none other than David Rumsey himself, who provided a wonderful walkthrough of the history of the San Francisco Bay Area with maps spanning hundreds of years. Rumsey als...

Harvesting from CKAN and Sorting Adjacent Key Value Columns

Many open data portals are built on the open-source application CKAN. Metadata can be harvested from these portals using the CKAN API. Many CKAN items include numerous resource URLs, including download links in various formats, landing page links, web services, and applications. Sorting through this myriad of links can be challenging. This post describes how to harvest metadata with the ckan-exporter script and how to use OpenRefine to sort the resource URLs. Part 1: Harvest metadata with the ckan-exporter script. The CKAN API has a number of calls that return information such as a list of items, tags, or organizations. It will also return an item's metadata, which the API calls a package. The ckan-exporter script allows the user to define a list of desired metadata elements that can be harvested via the command line. The readme file documents how to set up the harvest files and includes examples. The BTAA GDP fork o...
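To show the raw API output that the harvest works with, here is a minimal Python sketch; it is not part of ckan-exporter or of the OpenRefine workflow. It builds a CKAN Action API `package_show` URL and groups a package's resource URLs by their `format` field, a scripted alternative to sorting the adjacent format/URL columns by hand. The helper names are hypothetical, formats are upper-cased so `shp` and `SHP` match, and resources without a format are labeled `UNKNOWN` by assumption.

```python
import json
from urllib.parse import urlencode

def package_show_url(base_url, package_id):
    """Build a CKAN Action API request for one package (item)."""
    return (f"{base_url.rstrip('/')}/api/3/action/package_show?"
            + urlencode({"id": package_id}))

def resources_by_format(package_json):
    """Group a package's resource URLs by their declared format."""
    payload = json.loads(package_json)
    if not payload.get("success"):
        raise ValueError("CKAN API call did not succeed")
    grouped = {}
    for res in payload["result"].get("resources", []):
        # normalize case so "shp" and "SHP" land in the same group
        fmt = (res.get("format") or "UNKNOWN").upper()
        grouped.setdefault(fmt, []).append(res.get("url", ""))
    return grouped
```

Each group can then be written to its own column (for example, one column of shapefile downloads and one of landing pages), which is the end state the OpenRefine step works toward.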

Exporting Metadata from Omeka to CSV

February 2018 Update: There is now a plugin for Omeka that allows users to export to CSV via the interface. The Omeka digital collections platform features many easy-to-use plugins to facilitate editing and sharing metadata. Oddly, there isn't a built-in option or even a plugin available yet that allows a user to export the metadata directly to a spreadsheet. This post lists a step-by-step method to do this using a PHP script written by an Omeka developer. Note: This script will export all items that have been marked Public. It will also export all of the elements, even if they are empty. 1. Clone or download OmekaApiToCsv. This is a version of the original script with an added pipe delimiter for multivalued elements. 2. Upload and extract this same set of scripts to the same web server as the Omeka installation. 3. Open the PHP file within the OmekaApi...
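The export behavior described in the note and in step 1 (Public items only, every element written even when empty, repeated values joined with a pipe) can be mirrored in a short Python sketch. It is not the OmekaApiToCsv script. It assumes item records shaped like Omeka Classic REST API output that have already been fetched, where each entry in `element_texts` has a `text` value and an `element` object with a `name`; the function name and the element list passed in are hypothetical.

```python
import csv
import io

def items_to_csv(items, element_names):
    """Write public items to CSV, joining repeated elements with a pipe."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["id"] + element_names)
    for item in items:
        if not item.get("public"):
            continue  # mirror the script: only Public items are exported
        values = {name: [] for name in element_names}
        for entry in item.get("element_texts", []):
            name = entry["element"]["name"]
            if name in values:
                values[name].append(entry["text"])
        # every element gets a column, even when it is empty
        writer.writerow([item["id"]] + ["|".join(values[n]) for n in element_names])
    return out.getvalue()
```

The pipe delimiter keeps multivalued elements such as Subject in a single cell, so they can later be split apart in a spreadsheet or OpenRefine.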

Geospatial Metadata Contact Types and Roles

Geospatial metadata standards include multiple elements for contacts: persons or organizations that play some kind of role in the creation or maintenance of the resource. Although many metadata profiles require listing multiple contact types/roles, they often refer to the same entity. This post lists the main contact types in use and their locations in the ISO 19139 and FGDC CSDGM standards. Point of Contact: This is the person/organization that the user should contact for questions. This contact type should include an address, email, and phone number. Their role in the creation or maintenance of the resource is not specified here; it is just an all-purpose contact for the resource. ISO ⇨ MD_DataIdentification.gmd:pointOfContact; FGDC ⇨ 1.9-10. Metadata Point of Contact: This is the person/organization that is responsible for the metadata. They are not necessarily part of the publisher or distributor organization. They are the perso...
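To see where these contact types live in an actual ISO 19139 record, the sketch below uses Python's standard-library ElementTree to list the point of contact (under `MD_DataIdentification`) and the metadata contact (`gmd:contact` directly under `MD_Metadata`), each with its role code. It reads only the organization name and role; the function name is hypothetical, and individual names, addresses, and other contact elements are left out.

```python
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

# For FGDC CSDGM the rough equivalents are idinfo/ptcontac and metainfo/metc.
CONTACT_PATHS = {
    "Point of contact": "gmd:identificationInfo/gmd:MD_DataIdentification"
                        "/gmd:pointOfContact/gmd:CI_ResponsibleParty",
    "Metadata contact": "gmd:contact/gmd:CI_ResponsibleParty",
}

def list_contacts(xml_text):
    """Return (contact type, organisation name, role code) tuples."""
    root = ET.fromstring(xml_text)  # expected to be gmd:MD_Metadata
    found = []
    for label, path in CONTACT_PATHS.items():
        for party in root.findall(path, NS):
            org = party.findtext("gmd:organisationName/gco:CharacterString",
                                 default="", namespaces=NS)
            role = party.find("gmd:role/gmd:CI_RoleCode", NS)
            code = role.get("codeListValue") if role is not None else None
            found.append((label, org, code))
    return found
```

Running this over a record makes it easy to spot when the same organization has been entered under several contact types.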

ArcGIS Metadata Toolbox Guide

These charts list the most useful metadata tools/models, when to use them, and problems they may cause. For more information, see the ArcGIS online documentation. Name: Synchronize. Type: Tool. When to Use: When there is no metadata file yet or the dataset has changed. Description: Uses the dataset to create or update technical metadata. This script will automatically run when you open the dataset in ArcCatalog in the Description tab (this is set in the ArcCatalog Options, Metadata tab). It will create or update many fields, including extent, coordinate system, geometry, format, size, and attribute field names. Since this script can run automatically when opening the item in ArcCatalog, it is not often necessary to call the script manually. However, if you have turned off the option to run automatically, it can be called to update the item. It is helpful for batch creation or updating of technical metadata with an ArcPy script that iterates ...
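A batch run of Synchronize might look like the following sketch for ArcGIS Desktop (the ArcCatalog era), where the tool is exposed as `arcpy.SynchronizeMetadata_conversion(source, synchronization)`. The folder walk, the shapefile-only filter, the helper names, and the default `"ALWAYS"` mode are assumptions; check the tool documentation for the synchronization options your version supports. The `arcpy` import is guarded so the file-discovery part can run outside ArcGIS.

```python
import os

try:
    import arcpy  # only available in an ArcGIS Desktop Python install
except ImportError:
    arcpy = None

def find_shapefiles(folder):
    """Recursively collect shapefile paths under a folder."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(folder):
        for name in filenames:
            if name.lower().endswith(".shp"):
                paths.append(os.path.join(dirpath, name))
    return sorted(paths)

def synchronize_all(paths, mode="ALWAYS", dry_run=False):
    """Run the Synchronize Metadata tool on each dataset path."""
    done = []
    for path in paths:
        if not dry_run:
            if arcpy is None:
                raise RuntimeError("arcpy is not available in this Python environment")
            # updates extent, coordinate system, attribute fields, etc.
            arcpy.SynchronizeMetadata_conversion(path, mode)
        done.append(path)
    return done
```

A dry run (`dry_run=True`) lists what would be synchronized without touching any metadata, which is a cheap check before processing a large folder.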

Adding Place keywords from GeoNames to map records

The GeoBlacklight plugin for Omeka includes a custom feature in the Spatial Coverage field. A user can type in a place term, which will query GeoNames and produce a dropdown list of options. The user selects a value from the list, and this will pull in the GeoNames URI. The user can select multiple place names using multiple inputs. A second feature of the plugin is that it can pull the bounding box coordinates from the official GeoNames record. This is triggered during export to the GeoBlacklight schema JSON file if the Bounding Box element for the item is empty. Although users can add multiple place names, only the first input will be queried for this function. When you begin typing in this input, it will query the GeoNames API and pull suggestions for defined place names. NOTE: This feature is constrained by the fact that our site is served over HTTPS, but the GeoNames API query is made over HTTP. This means that your brows...
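The two GeoNames lookups described above can be sketched in Python. This is not the plugin's code, only an illustration that assumes the GeoNames `searchJSON` web service for suggestions, the `bbox` object (with `west`, `east`, `north`, `south` keys) returned by the `getJSON` service for a single feature, the `https://sws.geonames.org/{id}/` URI pattern, and GeoBlacklight's `ENVELOPE(W, E, N, S)` geometry syntax. A GeoNames username is required; the function names are hypothetical, and network calls are left out so the functions only build requests and parse responses.

```python
from urllib.parse import urlencode

def search_url(place, username, max_rows=10):
    """Build a GeoNames search request like the type-ahead query."""
    query = urlencode({"q": place, "maxRows": max_rows, "username": username})
    return "http://api.geonames.org/searchJSON?" + query

def geonames_uri(geoname_id):
    """Linked-data URI stored once a user picks a suggestion."""
    return f"https://sws.geonames.org/{geoname_id}/"

def envelope_from_record(record):
    """Turn a getJSON record's bbox into a GeoBlacklight ENVELOPE string."""
    box = record.get("bbox")
    if not box:
        return None  # not every GeoNames feature carries a bounding box
    # extra keys in bbox (e.g. accuracyLevel) are ignored by str.format
    return "ENVELOPE({west}, {east}, {north}, {south})".format(**box)
```

As in the plugin, only the first place name would be passed to `envelope_from_record`, and only when the item's Bounding Box element is empty.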