
Deleting duplicate keywords in OpenRefine

We are planning to perform some keyword remediation on the Big Ten Academic Alliance Geoportal records starting in 2017. This process includes normalizing the spelling, capitalization, and pluralization of keyword values.

ISSUE: Duplicate keywords in metadata
SOLUTION: Use a GREL expression in OpenRefine



One of the challenges our project has encountered is duplicated keywords.  Thanks to the code provided on the Free Your Metadata site, fixing this is an easy process with OpenRefine.  Here are the steps:

1. Create a project in OpenRefine from a CSV file.
2. The keywords should be combined in one cell per row. Take note of the separator character (usually a comma, but our CSV files often use triple hash marks: ###).
3. Click the dropdown arrow next to the column name and select Edit cells > Transform.



4. Enter this GREL expression: value.split(", ").uniques().join(", ")



Note: The character between the quotes in the expression needs to match the separator used in your file.  For example, for the ### separator, the expression would be: value.split("###").uniques().join("###")
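
To make clear what the expression does to each cell, here is a rough Python sketch of the same split, de-duplicate, join logic. It is only an illustration and not how OpenRefine implements it: the function name dedupe_keywords and the sample keywords are made up for this example, and keeping the first occurrence of each keyword in its original order is an assumption of the sketch, not a documented guarantee of uniques().

```python
def dedupe_keywords(cell: str, sep: str = ", ") -> str:
    """Split a cell on sep, drop repeated keywords, and join it back together.

    Hypothetical helper for illustration only. It keeps the first occurrence
    of each keyword in its original order (an assumption of this sketch).
    """
    seen = set()
    unique = []
    for keyword in cell.split(sep):
        if keyword not in seen:
            seen.add(keyword)
            unique.append(keyword)
    return sep.join(unique)


# Comma-and-space separator, as in step 4.
print(dedupe_keywords("Lakes, Rivers, Lakes"))                # Lakes, Rivers

# Triple hash separator, as in the note.
print(dedupe_keywords("Lakes###Rivers###Lakes", sep="###"))   # Lakes###Rivers

# The comparison is exact, so differently capitalized keywords both survive.
print(dedupe_keywords("Lakes, lakes"))                        # Lakes, lakes
```

The last call shows why the separator and the capitalization both matter: the match is exact, so keywords that differ only in case or stray spacing are not treated as duplicates in this sketch.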