This post provides a description of the ten facets found in GeoBlacklight and how they are populated in the Big Ten Academic Alliance geoportal. Note: This information is subject to change as the geoportal project evolves.
What is Faceting?
Faceted searches help users narrow down information by selecting from predefined metadata subcategories. Faceting is a cross between browsing and searching: users build custom searches by combining the selectable values that each facet offers. Faceting has become increasingly common in library catalogs, and its introduction into the GeoBlacklight application reflects this trend.
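Combining facets amounts to keeping only the records that match every selected value, then recounting the values that remain in each facet. The sketch below illustrates that idea in plain Python. The records and field names are hypothetical, and GeoBlacklight itself delegates faceting to its Solr index rather than filtering in application code.

```python
from collections import Counter

# Hypothetical toy records; real GeoBlacklight records carry many more fields.
RECORDS = [
    {"id": "a", "Institution": "Minnesota", "Data type": "Polygon", "Year": 2015},
    {"id": "b", "Institution": "Minnesota", "Data type": "Raster", "Year": 2015},
    {"id": "c", "Institution": "Indiana", "Data type": "Polygon", "Year": 2014},
    {"id": "d", "Institution": "Minnesota", "Data type": "Polygon", "Year": 2014},
]


def apply_facets(records, selected):
    """Keep records that match every selected facet value (AND across facets)."""
    return [r for r in records if all(r.get(f) == v for f, v in selected.items())]


def facet_counts(records, facet):
    """Count how many of the remaining records carry each value of one facet."""
    return Counter(r[facet] for r in records if facet in r)


if __name__ == "__main__":
    hits = apply_facets(RECORDS, {"Institution": "Minnesota", "Data type": "Polygon"})
    print([r["id"] for r in hits])            # ['a', 'd']
    print(dict(facet_counts(hits, "Year")))   # {2015: 1, 2014: 1}
```

Each additional selection can only shrink the result set, so a record whose value is spelled differently from its neighbors simply drops out of the combined search.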
One important issue for search interfaces that rely on faceting is metadata quality. Faceting puts a greater onus on cataloguers to provide consistent and complete metadata, because variations in spelling and element usage prevent similar items from being grouped together. This can not only keep users from discovering items but also lead them to conclude, wrongly, that an item does not exist.
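The fragmentation problem is easy to reproduce. In this hypothetical example, five records refer to the same (invented) agency, but because a facet groups on exact strings, minor variations become separate facet entries. A simple normalization step, assumed here for illustration, collapses case and whitespace differences, but a typo still survives, which is why consistent cataloguing matters.

```python
from collections import Counter

# Hypothetical Publisher values from five records that all mean the same agency.
values = [
    "Example County GIS",
    "Example County GIS",
    "example county gis",
    "Example County GIS ",  # trailing space
    "Example Conty GIS",    # typo
]

# A facet groups on exact strings, so each variant becomes its own entry.
raw_facet = Counter(values)


def normalize(value):
    """Illustrative cleanup: trim whitespace and ignore case. Typos survive."""
    return value.strip().casefold()


clean_facet = Counter(normalize(v) for v in values)

if __name__ == "__main__":
    print(len(raw_facet), "facet entries before cleanup")   # 4
    print(len(clean_facet), "facet entries after cleanup")  # 2
```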
This post will not delve into a literature review, but searching for “library catalog facet” in Google Scholar will turn up many recent articles on the topic.
The Facets in GeoBlacklight
Q. Which parts of the metadata do the facets come from?
Members of the Big Ten Academic Alliance Geospatial Data Project Task Force create GeoBlacklight metadata in one of two ways. For GIS datasets, the metadata is edited as ISO 19139, and an XSLT transforms it into the GeoBlacklight schema. For digitized maps, the process is more direct: the metadata is edited as Dublin Core with the GeoBlacklight extension elements. The following list, current as of September 2016, describes which metadata elements each facet draws from and briefly comments on their meaning.
Institution
This identifies which university within the Big Ten Academic Alliance has taken responsibility as steward of the record in the geoportal. Note that it does not correspond to ownership of the data: most of the records in the geoportal point to data hosted on a state or municipal clearinghouse. This element is not present in the ISO 19139 metadata; for GIS dataset records, it is batch-added when the metadata is transformed into the GeoBlacklight schema.
Publisher
For historical maps and atlases, the publisher is straightforward, as it is likely to be printed directly on the item. For GIS datasets, however, the act of “publishing” a dataset is not always known or obvious. The flexibility of the ISO 19115 metadata standard leads to different interpretations of the publisher field. As a result, this is one of the most ambiguous fields in this project’s metadata:
- Some collections do not use the publisher field.
- Some groups may consider the publisher to be the same as the distributor or clearinghouse entity.
- Some collections set the publisher field to the originator or author by default.
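Because these three interpretations coexist, it may help to audit a collection before relying on the Publisher facet. The following is a hypothetical check, not part of the project’s documented workflow; the keys `publisher`, `author`, and `distributor` are assumed stand-ins for values pulled from the ISO 19139 metadata.

```python
from collections import Counter


def publisher_pattern(record):
    """Classify how a record fills Publisher, following the three cases above.

    Keys are hypothetical stand-ins for values extracted from ISO 19139.
    """
    publisher = (record.get("publisher") or "").strip()
    if not publisher:
        return "missing"
    if publisher == (record.get("author") or "").strip():
        return "same as author"
    if publisher == (record.get("distributor") or "").strip():
        return "same as distributor"
    return "distinct"


# Hypothetical sample records, one per interpretation.
SAMPLE = [
    {"publisher": "", "author": "County GIS"},
    {"publisher": "State Clearinghouse", "author": "County GIS",
     "distributor": "State Clearinghouse"},
    {"publisher": "County GIS", "author": "County GIS"},
]

if __name__ == "__main__":
    print(Counter(publisher_pattern(r) for r in SAMPLE))
```

A collection dominated by one pattern can then be documented, or normalized, as a unit.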
Subject
This facet pulls from keywords entered by scores of different agencies. For a GIS dataset record, the subject list combines its ISO Topic Category (listed first) with keywords whose type is theme. At this time, the geoportal does not treat capitalized and lowercase forms as equivalent, so “Boundaries” and “boundaries” appear as separate facet values. For a digitized map, the subject list pulls from the Subject field in Dublin Core, which typically holds a Library of Congress subject heading.
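The project performs this step in XSLT; the Python sketch below only mirrors the described logic for a GIS record, assuming a typical ISO 19139 layout (topic category codes and `gmd:MD_Keywords` blocks typed with `MD_KeywordTypeCode`). The sample record is invented. Note how the topic category “boundaries” and the keyword “Boundaries” both survive as distinct values.

```python
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

# Invented, minimal ISO 19139 record for illustration.
SAMPLE = """<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"
    xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:identificationInfo>
    <gmd:MD_DataIdentification>
      <gmd:descriptiveKeywords>
        <gmd:MD_Keywords>
          <gmd:keyword><gco:CharacterString>Boundaries</gco:CharacterString></gmd:keyword>
          <gmd:keyword><gco:CharacterString>counties</gco:CharacterString></gmd:keyword>
          <gmd:type><gmd:MD_KeywordTypeCode codeListValue="theme"/></gmd:type>
        </gmd:MD_Keywords>
      </gmd:descriptiveKeywords>
      <gmd:descriptiveKeywords>
        <gmd:MD_Keywords>
          <gmd:keyword><gco:CharacterString>Minnesota</gco:CharacterString></gmd:keyword>
          <gmd:type><gmd:MD_KeywordTypeCode codeListValue="place"/></gmd:type>
        </gmd:MD_Keywords>
      </gmd:descriptiveKeywords>
      <gmd:topicCategory>
        <gmd:MD_TopicCategoryCode>boundaries</gmd:MD_TopicCategoryCode>
      </gmd:topicCategory>
    </gmd:MD_DataIdentification>
  </gmd:identificationInfo>
</gmd:MD_Metadata>"""


def iso_subjects(xml_text):
    """Return topic categories first, then theme keywords, with case preserved."""
    root = ET.fromstring(xml_text)
    ident = root.find("gmd:identificationInfo/gmd:MD_DataIdentification", NS)
    topics = [e.text.strip() for e in ident.findall(
        "gmd:topicCategory/gmd:MD_TopicCategoryCode", NS) if e.text]
    themes = []
    for block in ident.findall("gmd:descriptiveKeywords/gmd:MD_Keywords", NS):
        code = block.find("gmd:type/gmd:MD_KeywordTypeCode", NS)
        # Only keyword blocks typed "theme" feed the Subject facet.
        if code is not None and code.get("codeListValue") == "theme":
            themes += [e.text.strip() for e in block.findall(
                "gmd:keyword/gco:CharacterString", NS) if e.text]
    return topics + themes


if __name__ == "__main__":
    print(iso_subjects(SAMPLE))  # ['boundaries', 'Boundaries', 'counties']
```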
Author
The author facet is an optional field in the GeoBlacklight schema. It represents the Originator in the ISO metadata and the Creator in Dublin Core.
Place
This facet allows the user to search for places by text. The element comes from ISO keywords with the type Place or from the spatial coverage element in Dublin Core. Ideally, these values are taken from a thesaurus such as GeoNames, but this practice is challenging to implement because its benefit is not yet known. In the GeoBlacklight interface, the user clicks only on text, so any matching text in the Place Keywords field is grouped under one facet value. If that text is associated with a GeoNames URI, the URI does not display in the interface. This introduces ambiguity: the word “Paris” by itself could mean different places, such as Paris, France, or Paris, Texas. However, full place name strings in the Place Keyword facet may become unwieldy. For example, to incorporate city, county, and state, a group of three place keywords would have to read: “Paris, Texas, United States; Lamar County, Texas, United States; Texas, United States.” Currently, the metadata records in this project do not conform to this layout, and a practical normalization scheme is still needed to take full advantage of this facet.
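The trade-off can be shown in a few lines. In this sketch the URIs are placeholders (not real GeoNames identifiers), and `hierarchy_keywords` is a hypothetical helper that produces the fully qualified layout described above. Grouping on display text merges two different places named Paris, while grouping on the hidden URI would keep them apart.

```python
from collections import Counter

# Hypothetical place keywords; the URIs are placeholders, not GeoNames IDs.
keywords = [
    {"text": "Paris", "uri": "https://example.org/place/paris-france"},
    {"text": "Paris", "uri": "https://example.org/place/paris-texas"},
]

# The facet groups on display text only, so two different places merge.
text_facet = Counter(k["text"] for k in keywords)
# Grouping on the (undisplayed) URI would keep them apart.
uri_facet = Counter(k["uri"] for k in keywords)


def hierarchy_keywords(city, county, state, country="United States"):
    """Build one fully qualified keyword per level: city, county, state."""
    return [
        f"{city}, {state}, {country}",
        f"{county}, {state}, {country}",
        f"{state}, {country}",
    ]


if __name__ == "__main__":
    print(dict(text_facet))  # {'Paris': 2}
    print(len(uri_facet))    # 2
    print("; ".join(hierarchy_keywords("Paris", "Lamar County", "Texas")))
```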
Collection
This facet is labeled “Is Part Of” in the Dublin Core-GeoBlacklight schema. For this project, we use it to refer to the data source repository, such as “Minnesota Geospatial Commons.” One of the most common misconceptions about the Big Ten Academic Alliance Geoportal concerns where the data is hosted. This project is solely an aggregator of metadata records that link to the original source; our geoportal does not host any data. This field serves to cite the source and to help users find groups of records from the same clearinghouse or library. Like the Institution field, it is not part of the ISO 19139 metadata and is batch-added upon transformation.
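Batch-adding Institution and Collection can be sketched as stamping the same constants onto every record from one harvest. This is an illustration in Python rather than the project’s XSLT; the values are hypothetical, and the field names assume the GeoBlacklight 1.0 schema (`dct_provenance_s` for Institution, `dct_isPartOf_sm` for Collection).

```python
import copy

# Constants for one harvest. Values are hypothetical; field names assume
# the GeoBlacklight 1.0 schema.
BATCH_FIELDS = {
    "dct_provenance_s": "Minnesota",                      # Institution facet
    "dct_isPartOf_sm": ["Minnesota Geospatial Commons"],  # Collection facet
}


def add_batch_fields(record, batch_fields=BATCH_FIELDS):
    """Return a copy of a transformed record with Institution and Collection set.

    Values are deep-copied so that records never share one list object.
    """
    out = dict(record)
    for field, value in batch_fields.items():
        out[field] = copy.deepcopy(value)
    return out


if __name__ == "__main__":
    transformed = [{"layer_slug_s": "example-1"}, {"layer_slug_s": "example-2"}]
    for rec in map(add_batch_fields, transformed):
        print(rec)
```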
Year
There are two date elements in the GeoBlacklight schema: Date Issued and Temporal Extent. For this project, this facet uses Date Issued. (Technical note: the element that actually drives this facet in the GeoBlacklight metadata schema is Solr year, but it is hidden in our metadata editor and simply copied from the Date Issued element upon export.) This value is required. For ISO 19139 records, this element comes from Publication Date. For Dublin Core records, it comes from Date Issued, which is part of the Dublin Core Extended Metadata schema. Most of the digitized map records are imported into this project with the publication year in the Dublin Core Date field, so the year needs to be moved into the Date Issued field.
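The two steps above, moving a digitized map’s Date into Date Issued and copying a year out of Date Issued for the facet, might look like the following sketch. The key `dc_date` is a hypothetical name for the incoming Dublin Core Date; `dct_issued_s` and `solr_year_i` assume GeoBlacklight 1.0 field names. Taking the first four-digit number as the year is also an assumption.

```python
import re

FOUR_DIGIT_YEAR = re.compile(r"\b(\d{4})\b")


def fix_dates(record):
    """Move Date into Date Issued when Date Issued is empty, then derive
    the integer year used by the Year facet.

    "dc_date" is hypothetical; the other keys assume GeoBlacklight 1.0.
    """
    out = dict(record)
    if not out.get("dct_issued_s") and out.get("dc_date"):
        out["dct_issued_s"] = out.pop("dc_date")
    match = FOUR_DIGIT_YEAR.search(out.get("dct_issued_s") or "")
    if match is None:
        # Date Issued is required, so a record without a year is an error.
        raise ValueError("Date Issued is required, but no four-digit year was found")
    out["solr_year_i"] = int(match.group(1))
    return out


if __name__ == "__main__":
    print(fix_dates({"dc_date": "1885"}))
    print(fix_dates({"dct_issued_s": "2015-06-01"}))
```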
The temporal extent is not part of the faceting. It does show up on the Item View page, whereas the Date Issued field does not.
Access
This is a straightforward element whose value is either “Public” or “Restricted.” At this point, the scope of the Big Ten Academic Alliance Geoportal includes only public records. Any other access or rights information from the original record must be appended to the description to display in the geoportal.
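A minimal sketch of that rule: validate the two allowed Access values and append any remaining rights text to the description. The field names assume the GeoBlacklight 1.0 schema (`dc_rights_s`, `dc_description_s`), and `extra_rights` is a hypothetical parameter for rights text carried over from the original record.

```python
ALLOWED_ACCESS = {"Public", "Restricted"}


def apply_access(record, extra_rights=None):
    """Validate Access and append other rights text to the description.

    Field names assume GeoBlacklight 1.0; extra_rights is hypothetical.
    """
    out = dict(record)
    if out.get("dc_rights_s") not in ALLOWED_ACCESS:
        raise ValueError(f"Access must be one of {sorted(ALLOWED_ACCESS)}")
    if extra_rights:
        description = (out.get("dc_description_s") or "").rstrip()
        out["dc_description_s"] = f"{description} {extra_rights.strip()}".strip()
    return out


if __name__ == "__main__":
    rec = {"dc_rights_s": "Public", "dc_description_s": "County parcels."}
    print(apply_access(rec, "Use constraints: none.")["dc_description_s"])
```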
Data type
Although this facet is called Data type, it is derived from a field called Geometry type in both the ISO 19139 and GeoBlacklight schema metadata. For GIS vector datasets, this field is ideally generated automatically by an application such as ArcCatalog. However, ArcGIS labels many datasets generically as “Composite” or “Complex,” which is translated as “Mixed,” the most common entry in the facet.
All of the digitized maps are designated as Raster. GIS raster datasets should be as well, but they may be mislabeled in the geoportal because they lack a Geometry type value in the ISO metadata. Our current process uses an XSLT that looks for various elements in the ISO metadata to produce this value, so if the metadata is incomplete, the value may be incorrect. This value has been challenging to identify in batch processing, and we have questioned its usefulness for this particular project.
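The kind of inference the XSLT performs can be sketched as a small crosswalk. Only the “composite”/“complex” to “Mixed” rule comes from this post; the other entries assume the standard ISO geometric object type codes and common GeoBlacklight values, and the input dict is a hypothetical simplification of what would be pulled from the ISO XML.

```python
# Hypothetical crosswalk from ISO geometric object type codes to
# GeoBlacklight Data type values.
GEOMETRY_TO_DATA_TYPE = {
    "point": "Point",
    "curve": "Line",
    "surface": "Polygon",
    "composite": "Mixed",
    "complex": "Mixed",
}


def data_type(iso_info):
    """Derive a Data type value from a simplified dict of ISO findings.

    Hypothetical keys: "has_grid_representation" stands in for finding a grid
    spatial representation; "geometric_object_type" for the vector code.
    Returns None when nothing usable is found, i.e. the record needs review.
    """
    if iso_info.get("has_grid_representation"):
        return "Raster"
    code = (iso_info.get("geometric_object_type") or "").strip().lower()
    return GEOMETRY_TO_DATA_TYPE.get(code)


if __name__ == "__main__":
    print(data_type({"geometric_object_type": "Composite"}))  # Mixed
    print(data_type({"has_grid_representation": True}))       # Raster
    print(data_type({}))                                      # None
```

Returning None instead of guessing is one design choice; a batch process that must always emit a value has to pick a default, which is one way a raster dataset with incomplete metadata can end up mislabeled.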
Data types and their icons in the GeoBlacklight facet.
Format
For GIS dataset records, this value is derived from the Distribution Format field. This field is only required if the metadata includes a download link. The name of the format will then show up in the Download button on the item entry page.
The GeoBlacklight schema specifies a short controlled vocabulary for this element, but the Big Ten Academic Alliance geoportal links to many different kinds of file formats. When creating GeoBlacklight metadata using our Dublin Core editor, any value can be entered. However, the XSLT that transforms ISO 19139 metadata currently recognizes only a fixed list: Shapefile, Geodatabase, GeoTIFF, and ArcGRID. This list is incomplete, and other values may need to be added manually to the XSLT transformation file. To accommodate unanticipated or unspecified formats, any ISO metadata record without one of these values is simply transformed as “File” in GeoBlacklight.
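The fallback behavior can be sketched as a lookup with a default. The four recognized values come from this post; the case-insensitive exact match is an assumption, and the XSLT’s actual matching rules may differ.

```python
# The four formats the ISO-to-GeoBlacklight transformation currently recognizes.
KNOWN_FORMATS = ["Shapefile", "Geodatabase", "GeoTIFF", "ArcGRID"]
_LOOKUP = {name.casefold(): name for name in KNOWN_FORMATS}


def gbl_format(distribution_format):
    """Map an ISO Distribution Format name to a GeoBlacklight Format value.

    Matching is case-insensitive and exact (an assumption). Anything
    unrecognized, including a missing value, becomes "File".
    """
    key = (distribution_format or "").strip().casefold()
    return _LOOKUP.get(key, "File")


if __name__ == "__main__":
    for value in ["shapefile", " GeoTIFF ", "KML", None]:
        print(repr(value), "->", gbl_format(value))
```

In this sketch, supporting a new format means adding one entry to `KNOWN_FORMATS`, the counterpart of manually editing the XSLT.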