Showing posts from January, 2017

Using sentence-case for keywords in OpenRefine

Issue

Capitalization and pluralization of ingested keywords vary. Our keyword list in GeoBlacklight is somewhat messy and contains near-duplicates.

Challenge

Our instance of Solr for GeoBlacklight indexes Dog, dog, and dogs as separate keywords.

Solution

Use OpenRefine to normalize keywords before importing them into Solr.

Description

As we aggregate metadata records from multiple sources, we have found that the keywords need attention. The GIS records have keyword groups that may or may not come from a thesaurus, but they frequently come from the TAGS field in ArcGIS Open Data Portals. As a result, the keywords are frequently just regional acronyms or abbreviations and often have many spelling variants. We also anticipate combining our metadata records with those made at other institutions outside of the Big Ten Academic Alliance Geospatial Data Project. After reviewing records from other universities and consulting the RDA rules on capitalization, we decided to convert theme