RUBRIC Toolkit: Metadata Overview

Metadata is used:

  • to describe and provide the location of a resource to assist with discovery

  • to give information about its conditions of use

  • to govern technical and display issues

  • to contain long term preservation information

  • to maintain version information

  • to manage administrative aspects of the data

Much metadata is machine-generated by the repository, but it is important for repository managers to understand the extent to which metadata in their repositories can be customized for their own purposes. Refer to the section on Metadata and Entering Metadata: Guides and Tools for Repositories for more detailed information on the topic.

Resource discovery and IR interoperability are inseparable issues and both are tied to the effective use of metadata.

Choosing the Right Metadata Standard

Dublin Core is the simplest metadata schema that ensures interoperability for search and retrieval among repositories and their harvesters. Whilst it is not the only available standard, it is the baseline metadata standard for the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). The Open Archives Initiative (OAI) develops the protocol to promote interoperability among repositories, and OAI repositories will normally be configured to generate Dublin Core records by default.

While the simple Dublin Core schema provides a basic level of interoperability, the Open Archives Initiative additionally encourages the use of more granular schema. Repositories should include other metadata schema to describe their resources and conditions of access more fully for the benefit of their users. Examples of other metadata schema include MARC and an extended version of Dublin Core known as Qualified Dublin Core. Other metadata schema are discussed in the Metadata section.

Metadata Schemes Points of Comparison is one of many useful articles that describe what to look for when comparing the range of standard schema available for specialised purposes.

Choosing a Metadata Standard For Research Discovery by UKOLN includes a checklist for selecting the right standard for your purpose.

The basic criteria to use are:

  • interoperability

  • extensibility and growth

  • sustainability

  • granularity

  • ease of use and existing skills

Images and videos present additional preservation, sustainability and rights issues compared with other resource types. There are specialist metadata schema for such non-text materials, and these are also discussed briefly in the comprehensive Metadata section.

Metadata for Harvesting

The ARROW Discovery Service has produced a Harvesting Guide (currently being updated) that recommends different levels of metadata content for harvesting. The ARROW Discovery Service is an OAI compliant national harvester managed by the National Library of Australia.

It liaises with a range of international service providers to manage the harvesting conditions of Australian material, including:

The ARROW Discovery Service's Public Funding, Public Knowledge, Public Access explains the agreements with Google and the OAIster service to secure higher rankings for the resources it harvests from university repositories. OAIster's agreement with Yahoo! and Google is summarized at http://www.oaister.org/sru.html.

IR managers may liaise independently with the same providers or they may choose to register with the ARROW Discovery Service which can negotiate on their behalf.

The following Dublin Core elements are used by OAI service providers for harvesting metadata records. Each element is repeatable and optional. The terms in bold type are the most essential:

  • title

  • subject (includes keywords and controlled vocabularies)

  • description (includes abstract or other summary)

  • type (e.g. journal article, conference paper, thesis)

  • source

  • relation

  • coverage

  • creator

  • publisher

  • contributor

  • rights

  • date

  • identifier

  • language

  • format
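
As a rough illustration of how the elements above appear in a harvested record, the following Python sketch serializes a set of values as simple (unqualified) Dublin Core in the oai_dc container used by OAI-PMH. The function name to_oai_dc and the sample record are hypothetical, and repository software normally generates this XML itself; the sketch only shows the shape of the result.

```python
import xml.etree.ElementTree as ET

# Namespaces of the oai_dc container and of simple Dublin Core.
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# The fifteen simple Dublin Core elements listed above.
DC_ELEMENTS = {
    "title", "subject", "description", "type", "source",
    "relation", "coverage", "creator", "publisher", "contributor",
    "rights", "date", "identifier", "language", "format",
}


def to_oai_dc(record):
    """Serialize {element: [values]} as an oai_dc XML string (illustrative helper)."""
    ET.register_namespace("oai_dc", OAI_DC_NS)
    ET.register_namespace("dc", DC_NS)
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for element, values in record.items():
        if element not in DC_ELEMENTS:
            raise ValueError(f"not a simple Dublin Core element: {element}")
        for value in values:  # every element is optional and repeatable
            child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
            child.text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical record, invented for illustration only.
sample_record = {
    "title": ["An Example Thesis"],
    "creator": ["Citizen, Jane"],
    "subject": ["metadata", "institutional repositories"],
    "type": ["thesis"],
    "identifier": ["http://repository.example.edu/handle/123/456"],
}
xml_text = to_oai_dc(sample_record)
print(xml_text)
```

Because every element is repeatable, the sketch stores each element's values in a list, so the two subject keywords become two separate dc:subject elements.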


The comprehensive Metadata section explains that all of the above elements describe the document or article deposited in the repository, with one exception. At least one identifier must be a resolvable link (URI) that points to the repository's metadata page describing and linking to that resource. Otherwise a harvesting service will send users straight to the resource itself and bypass the repository.
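
A repository manager could spot-check records for this requirement with something like the Python sketch below. It assumes a record is held as a dictionary of element names to lists of values; the function name and sample values are hypothetical. Note that it only tests whether an identifier is syntactically an http(s) URI. It cannot confirm that the link leads to the repository's metadata page rather than to the file itself.

```python
from urllib.parse import urlparse


def resolvable_identifiers(record):
    """Return the identifier values that look like http(s) URIs (illustrative helper)."""
    found = []
    for value in record.get("identifier", []):
        parts = urlparse(value)
        if parts.scheme in ("http", "https") and parts.netloc:
            found.append(value)
    return found


# Hypothetical record: one non-resolvable identifier and one link.
record = {
    "title": ["An Example Article"],
    "identifier": [
        "ISBN 0-00-000000-0",
        "http://repository.example.edu/view/123",
    ],
}
links = resolvable_identifiers(record)
if not links:
    print("Warning: no resolvable identifier; harvesters cannot link back to the repository")
```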

The National Science Digital Library (NSDL) advises that best practice to ensure your IR is harvested is to register with the official OAI Registry.

Connecting with the Harvesters explains who to contact and how to register.

Australasian Digital Theses

Australasian Digital Theses (ADT) metadata and harvesting requirements differ slightly from those described above, even if this material is co-located with other repository material. ADT will still expect to be able to harvest those theses separately from the rest of the repository archive.

The ADT harvester only processes a limited range of Dublin Core elements:

dc.title

dc.creator

dc.subject

dc.description

dc.date

dc.language

dc.publisher

dc.rights

dc.identifier

dc.type

If there are any other DC elements in an ADT record (e.g. dc.format), they will be ignored by the ADT harvester for normal processing and indexing purposes.
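
To preview what the ADT harvester would keep from a record, the element list above can be applied as a simple filter. The Python sketch below is only an approximation of that behaviour under the assumption that a record is a dictionary of element names (without the "dc." prefix) to lists of values; the names adt_view and ADT_ELEMENTS are hypothetical.

```python
# Elements the ADT harvester processes, as listed above (without the "dc." prefix).
ADT_ELEMENTS = (
    "title", "creator", "subject", "description", "date",
    "language", "publisher", "rights", "identifier", "type",
)


def adt_view(record):
    """Split a record into the elements ADT processes and those it ignores."""
    kept = {name: values for name, values in record.items() if name in ADT_ELEMENTS}
    ignored = sorted(name for name in record if name not in ADT_ELEMENTS)
    return kept, ignored


# Hypothetical thesis record containing one element ADT does not process.
thesis = {
    "title": ["An Example Thesis"],
    "creator": ["Citizen, Jane"],
    "format": ["application/pdf"],
    "type": ["Australasian Digital Thesis"],
}
kept, ignored = adt_view(thesis)
print("ignored by ADT:", ignored)
```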

Open Archives searching requires that ADT records in a repository be grouped and harvested as a discrete set of items, separately from the other records in the repository. Sets Guidelines for Repository Implementers on the Open Archives Initiative website provides technical details for constructing Sets.

It is recommended that the dc.type or dc.relation element be used for the SetName for ADT harvesting. Enter "Australasian Digital Thesis" as the value for dc.type or "Australasian Digital Thesis Program" as the value for dc.relation.

Example:

dc.type Australasian Digital Thesis
dc.relation Australasian Digital Thesis Program
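
One way to picture how these values define the set is a membership test over repository records, sketched below in Python. This assumes records are dictionaries of element names to lists of values and that matching is exact; how a given repository platform actually builds its sets will differ, and the function name is hypothetical.

```python
ADT_TYPE = "Australasian Digital Thesis"
ADT_RELATION = "Australasian Digital Thesis Program"


def is_adt_record(record):
    """True if the record carries either recommended ADT marker value."""
    return (
        ADT_TYPE in record.get("type", [])
        or ADT_RELATION in record.get("relation", [])
    )


# Hypothetical repository contents.
records = [
    {"title": ["A Thesis"], "type": ["Australasian Digital Thesis"]},
    {"title": ["Another Thesis"], "relation": ["Australasian Digital Thesis Program"]},
    {"title": ["A Journal Article"], "type": ["journal article"]},
]
adt_set = [r for r in records if is_adt_record(r)]
```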

For ADT to harvest a repository according to OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting) standards, repository managers will need to inform ADT of:

  • the URL of your server

  • the SetSpec of the ADT records to be harvested

  • the SetName for the ADT records to be harvested (i.e. dc.type Australasian Digital Thesis)

Examples:

DSpace record:

  • URL: http://researchspace.auckland.ac.nz/dspace-oai/request

  • SetSpec: hdl_2292_2

  • SetName: PhD Theses

VITAL record without Collections:

  • URL: http://repository.usq.edu.au/oaiprovider

  • SetSpec: Australasian Digital Thesis

  • SetName: Australasian Digital Thesis

VITAL record with ADT items in a Collection:

  • URL: http://repository.usq.edu.au/oaiprovider

  • SetSpec: rubric:299

  • SetName: Australasian Digital Thesis
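
The URL and SetSpec are the two values a harvester needs to form a selective OAI-PMH request. The Python sketch below builds a ListRecords request from them using the standard verb, metadataPrefix and set arguments of the protocol. The helper name is hypothetical, and the endpoints are the examples quoted above, which may no longer respond.

```python
from urllib.parse import urlencode


def list_records_url(base_url, set_spec, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request limited to one set (illustrative helper)."""
    query = urlencode({
        "verb": "ListRecords",
        "metadataPrefix": metadata_prefix,  # oai_dc is simple Dublin Core
        "set": set_spec,
    })
    return f"{base_url}?{query}"


# Values taken from the DSpace example above.
url = list_records_url(
    "http://researchspace.auckland.ac.nz/dspace-oai/request",
    "hdl_2292_2",
)
print(url)
```

Opening such a URL in a browser is a quick way to confirm that the set returns only thesis records before passing the details to ADT.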

Further Guides to Metadata

Guides that have most relevance for repositories in higher education institutions:

Criteria for evaluating a metadata schema for a digital repository can be found at:

RUBRIC Toolkit: Metadata Overview produced July 2007

Copyright 2007 RUBRIC