Entering Metadata: Guides and Tools for Repositories
Introduction to Metadata Training
Institutional Repository editors and library cataloguers will find the following metadata crosswalk and data entry guidelines useful as an introduction to applying metadata and to understanding key principles underlying the way metadata is treated in repositories.
Key differences: Library vs Repository
The different ways libraries and repositories collect and expose work affect the way metadata is managed. For example:
Libraries collect works that are compilations of articles by different authors and treat the collected work as the main item. Repositories archive each authored article as the primary item and the host publication is treated as a related item.
A library collection draws information in to the institution for the use of the institution. A repository collection is established to expose its resources openly to the entire world and to promote the work of an institution.
AACR2 is the common data entry standard for libraries. Its limitations in the face of rapid technological developments have led to the development of Resource Description and Access (RDA).
Repository metadata management has an even greater need to review the rationales behind past standards, and to revise or replace them where repositories present a whole new set of rationales.
Crosswalks between MARC, MODS and Dublin Core are included in the following guidelines. These crosswalks are intended to:
assist technical staff with spreadsheets mapping data to Dublin Core
assist those working with metadata entry and quality control to understand the underlying principles
demonstrate how DC metadata is the key to making repositories compliant with Open Archive standards and internationally accessible
Online Metadata Tools
Crosswalks
The Library of Congress MARC Standards website provides access to complete crosswalk or mapping guides across MARC, MODS and Dublin Core, including:
MARCXML Conversion to MODS and Dublin Core Stylesheets
MODS Conversion to MARC and Dublin Core Stylesheets
DCMI resources
The Dublin Core Metadata Initiative (DCMI) has a tools and software page of resources for:
creating metadata templates
changing metadata templates
automatic extraction and production of metadata
conversion between metadata formats
Schema Conversion
The MarcEdit tool automatically converts records from one metadata schema to another.
Repository Ingest Tools: Survey and ARROW practice
CAIRO (Complex Archive Ingest for Repository Objects), funded by JISC's Repositories and Preservation program, has conducted a study comparing a wide range of metadata and other extraction tools used in repositories with related common open source licenses:
Within the Australian ARROW community (using the VITAL repository) there are members who have customized the VALET ingest tool for their repository. These can be contacted through the community.
Data Entry Guidelines
The following guidelines provide main headings for repository metadata requirements, including Dublin Core. These guidelines are generic (most are Dublin Core terms) and can be easily related to the requirements of specific repositories.
Headings which do not state “DC term” are not Dublin Core elements but are still integral parts of repository metadata.
In some repositories, the creation of Dublin Core fields will be a default part of the repository software. The following guidelines cover both the data entry principles that apply across all repositories and the mapping of metadata to Dublin Core where that is required as part of the initial repository configuration.
Beneath each main heading there is:
a scope note explaining the definition and limitations of the term
a note on related terms closely associated with the main term.
notes are entered under MARC, MODS and data entry best practice and standards headings. These are critical parts of the guidelines, explaining:
expected practice in repositories
exceptions
differences from practices normally found in a traditional library.
All Dublin Core elements are repeatable, with one exception: the ARROW Discovery Service harvester recommends that there be only one resource type in the Simple Dublin Core field.
It is not necessary to display all fields in the portal display of a repository record. For example, a list of RFCD codes may be hidden from the main record display but exposed for a browse list to be indexed for searching. The extent and ease of configurability in repositories will vary.
When deciding how much metadata to enter, consider the following points:
records should not be cluttered with unused information
not all metadata needs to be displayed: some may be useful for searching, for authentication of the integrity of records and archived resources or for audit purposes
it is better to add a little more rather than a little less metadata because granularity of data entry is a strength and potentially facilitates its use and value into the future
Abbreviations used:
DC term
Dublin Core term
SN
Scope note
RT
Related term
The Detailed Description of MODS elements page provides more detailed explanations for the use of the MODS elements.
title (DC term)
SN | A name given to the deposited resource, not the parent publication of the resource. Thus the title of a deposited book chapter or journal article will be mapped to this DC element, and not the title of the book or journal in which they appear. |
RT | Relation |
Crosswalks to Dublin Core
MARC | MARC: 245 $a $b $p $n |
| MARC: 246 $a $b $p $n |
Notes | Other MARC title fields do not apply to the title of the deposited resource so only the above fields should be mapped to the dc.title |
MODS | MODS: <titleInfo><title> |
Notes | MODS allows <titleInfo> subelements to be parsed: |
Data entry: best practice and standards
Notes | The main title should be the title of the resource at the time it is published. Other variant titles (e.g. a preprint title) can be added as alternative titles or as notes. If the title is not published enter the title as it appears on the resource. |
| Enter titles in full, including initial articles. If there is no punctuation separating the main title from the subtitle (e.g. the two are separated only by a line space), use a colon to separate the two parts. If there is no title provided with the resource, supply one. |
Rationales | A resource can be known by multiple titles. Different versions (e.g. preprints and postprints) can contain title variations. Running titles, acronyms within titles, advertised titles etc can vary from the title on the resource. The resource is most likely to be known and recognized by the publication title. In some repositories the title will be used in a citation in the form in which it is entered in the record. Title is an essential element required by harvesters. |
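The title rules above can be illustrated with a minimal Python sketch. It assumes the MARC 245 subfields have already been extracted into a plain dictionary keyed by subfield code; the function name `build_dc_title` and that dictionary layout are hypothetical and do not come from any repository software.

```python
def build_dc_title(subfields):
    """Join MARC 245 $a (title) and $b (subtitle) into one dc.title value.

    `subfields` is a hypothetical dict such as {"a": "...", "b": "..."}.
    Initial articles are kept, as the guidelines require.
    """
    main = subfields.get("a", "").strip()
    sub = subfields.get("b", "").strip()
    # Trailing " :" or " /" in MARC is cataloguing punctuation, not part of the title.
    main = main.rstrip(" /:;")
    sub = sub.rstrip(" /")
    if not main:
        raise ValueError("no title found: supply one before mapping to dc.title")
    if not sub:
        return main
    # Add a colon only when the title does not already end with punctuation.
    separator = " " if main[-1] in "?!." else ": "
    return main + separator + sub


print(build_dc_title({"a": "The repository handbook :", "b": "a guide for editors /"}))
# prints: The repository handbook: a guide for editors
```

The sketch raises an error rather than inventing a title, because a supplied title is an editorial decision.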
creator (DC term)
SN | The person or persons responsible for the intellectual content of the deposited resource, not its presentation. |
RT | contributor |
Crosswalks to Dublin Core
MARC | MARC: 100 $a $q |
Notes | In a repository archiving the scholarly output of an institution, all creators will be personal names, so do not use MARC 110, 111, 710 or 711 tags. Each deposited resource will represent the work of a personal author or authors, even if it is published as part of a compilation under a corporate authorship. Repository records will have entries for editors of conference publications, supervisors of theses, and names of submitters of resources to the repository who are not authors of those resources. These names should be stored in a MARC 720 tag, with $e to indicate their relationship to the resource (submitter, editor, supervisor, etc.). Some names in the 100 and 700 MARC tags that are mapped to the dc.creator will also use a $e relator subfield to indicate their role (e.g. submitter). This subfield is for internal administrative or authentication purposes and should not be mapped to a Simple Dublin Core element. |
MODS | MODS: <name type="personal"><namePart> |
Notes | MODS allows <name> subelements to be parsed: <namePart>, <displayForm>, <affiliation>, <role>, <description> MODS subelements should be concatenated in Dublin Core, separated by a space or other form of punctuation. |
Data entry: best practice and standards
Data entry | Enter the names of multiple authors of a resource in the same order in which they appear on the resource, even if this results in the name of an author not belonging to the repository's institution being entered first. Maintain an authority file of personal names entered in the repository and always enter the same author with the same name format. The nature of this authority file will depend on the staff and time resources available in the institution. Use of normal library authority standards such as LC name authorities, National Library authorities, AACR2 standards for forms of foreign names, etc. is discouraged in repository authority list creation. An authority list may be compiled from the forms of names appearing in the institution's formal staff directory or even from the form of the name when it is first encountered by a repository editor. MODS also allows for a name variation to be nested with the standard form of a name with its <displayForm> element. |
Rationales | The order in which names appear on some multi-authored articles can have significance. Some authors will have different forms of their name appearing across different publications, and a repository author index should contain one entry for each name. There are no “see” or “see also” functions in most repositories at present and standardized name authorities such as those of the Library of Congress can sometimes be obscure without this functionality. Even though the repository record will have a standardized form of an author's name, the form of name as it appears on the resource will still be displayed for users on the resource. If the difference between the formats of name is significant an explanatory note can be added to the record. |
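As an illustration of concatenating MODS `<namePart>` subelements into dc.creator values, the following is a minimal Python sketch using only the standard library. The function name `mods_names_to_dc_creators`, the choice of a comma and space as separator, and the sample names are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}

# Invented sample record, for illustration only.
SAMPLE = """
<mods xmlns="http://www.loc.gov/mods/v3">
  <name type="personal">
    <namePart type="family">Nguyen</namePart>
    <namePart type="given">Anh</namePart>
  </name>
  <name type="personal">
    <namePart>Smith, Jo</namePart>
  </name>
  <name type="corporate">
    <namePart>Example University</namePart>
  </name>
</mods>
"""


def mods_names_to_dc_creators(mods_xml):
    """Return one dc.creator string per personal <name>, in record order."""
    root = ET.fromstring(mods_xml)
    creators = []
    for name in root.findall("mods:name[@type='personal']", MODS_NS):
        parts = [
            part.text.strip()
            for part in name.findall("mods:namePart", MODS_NS)
            if part.text and part.text.strip()
        ]
        if parts:
            # Concatenate the parsed subelements with punctuation.
            creators.append(", ".join(parts))
    return creators


print(mods_names_to_dc_creators(SAMPLE))
# prints: ['Nguyen, Anh', 'Smith, Jo']
```

Record order is preserved because the order of names on the resource can be significant; corporate names are skipped in line with the note on personal names above.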
Affiliation
SN | An institution with which the author is associated. Typically this will be the university of the submitting author. This is not a Dublin Core element but is an important identification of the author in repositories. |
RT | creator |
Do not crosswalk to Dublin Core
MARC | MARC: 100 $u |
Notes | $u in MARC can be an affiliation or address (e.g. email address) of a name. In repositories, however, always use the institution to which the author belongs. $u is not repeatable in the same MARC field. (Some repositories map affiliation to become part of the name in dc.creator. There are trade-offs to be made when and wherever it is mapped in DC. Keeping Dublin Core Simple explains the problem of mapping this to DC.) |
MODS | MODS: <name> <affiliation> |
Notes | MODS allows <name> subelements to be parsed: <namePart>, <displayForm>, <affiliation>, <role>, <description> MODS subelements should be concatenated in Dublin Core, separated by a space or other form of punctuation. |
Data entry: best practice and standards
Note | Institutions may opt to enter values only for authors from their own institution. Be aware, however, that an author may have belonged to another institution at the time the resource was created and published. “Affiliation” may be used for in-house and display purposes only. It should not be mapped to Simple Dublin Core for harvesting. |
Rationale | There is no scope in Simple Dublin Core for the “affiliation” of the author. |
Role
SN | A term that describes the relationship between the name and the resource. This is not a Dublin Core element but DC does use Relator terms in Qualified Dublin Core. These do not apply to Simple Dublin Core. |
RT | creator |
Do not crosswalk to Dublin Core
MARC | MARC: 100 $e |
Notes | $e in MARC is a relator term that describes the relationship between the name and the resource (e.g. submitter, editor, supervisor). $e is repeatable in the same MARC field. |
MODS | MODS: <name> <role> |
Notes | MODS puts all names in a repeated <name> with type of contribution indicated in <role>. It does not make the explicit distinction between creator and contributor in terms of primary vs. secondary roles. An application may wish to designate use of Creator or Contributor for all MODS names or use the role value to determine which DC element is used. MODS allows <name> subelements to be parsed: <namePart>, <displayForm>, <affiliation>, <role>, <description>. MODS subelements should be concatenated in Dublin Core, separated by a space or other form of punctuation. |
Data entry: best practice and standards
Notes | Institutions may opt to enter values only for authors from their own institution. Be aware, however, that an author may have belonged to another institution at the time the resource was created and published. “Role” may be used for in-house search and display purposes only. It should not be mapped to Simple Dublin Core for harvesting. |
Rationale | It is important for audit purposes that the “submitter” (role term) of a resource be recorded. It may be desirable in certain cases to display an “editor” (role term) of a conference paper, or a “supervisor” (role term) of a thesis that is a deposited resource. |
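One way to use the role value to decide which DC element a name is mapped to is sketched below in Python. The role table `ROLE_TO_DC`, the function name `route_names` and the sample names are assumptions for illustration; each repository would define its own table, and here any role not listed (such as submitter) is kept in-house and not exposed in Simple Dublin Core.

```python
# Hypothetical role table: adjust to local repository policy.
ROLE_TO_DC = {
    "author": "creator",
    "editor": "contributor",
    "supervisor": "contributor",
}


def route_names(names):
    """Split (name, role) pairs by DC element, preserving their original order."""
    routed = {"creator": [], "contributor": [], "in_house": []}
    for name, role in names:
        # Roles without a DC mapping stay in the local record only.
        element = ROLE_TO_DC.get(role.strip().lower(), "in_house")
        routed[element].append(name)
    return routed


record = [
    ("Nguyen, Anh", "author"),
    ("Smith, Jo", "Supervisor"),
    ("Lee, Kim", "submitter"),
]
print(route_names(record))
# prints: {'creator': ['Nguyen, Anh'], 'contributor': ['Smith, Jo'], 'in_house': ['Lee, Kim']}
```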
contributor (DC term)
SN | Use the “contributor” term for persons responsible for making contributions to the resource (e.g. thesis supervisors, editors) but who are not also responsible for creating the resource. Contributor is used as the default for “author” in some repositories, e.g. DSpace. However, the Dublin Core “creator” term is repeatable for multiple authors and by DCMI definition is intended for persons responsible for the creation of the resource, so “creator”, not “contributor”, should be used for authors. |
RT | creator |
Crosswalks to Dublin Core
MARC | MARC: 720 $a $q $e |
Notes | Repository records will have entries for editors of conference publications, supervisors of theses, and names of submitters of resources to the repository who are not authors of those resources. These names should be stored in a MARC 720 tag, with $e to indicate their relationship to the resource (submitter, editor, supervisor, etc.). Do not enter these in a MARC 700 tag in order to avoid confusing them with creators when mapped to DC. |
MODS | MODS: <name><namePart> |
Notes | MODS puts all names in a repeated <name> with type of contribution indicated in <role>. It does not make the explicit distinction between creator and contributor in terms of primary vs. secondary roles. |
Data entry: best practice and standards
Note | Be careful about implications of entering all contributors to a resource, and of making them available to DC mapping. It is not compulsory to have a dc.contributor field and it may be omitted altogether with some resource types. Enter the names of multiple contributors of a resource in the same order in which they appear on the resource. |
Rationale | In the case of supervisors of theses, for example, supervisors change throughout the creation of a thesis, belong to different institutions, have different statuses and degrees of involvement with the author. Relationships among any of these can also sometimes be a sensitive issue. The order in which names appear on a resource can have significance. |
date (DC term)
SN | The Dublin Core definition of the “date” term is “a date associated with the life cycle of the resource”. This means that a resource can have multiple dates associated with it (e.g. date of creation, of submission to a publisher, of publication, of accession to repository, of a subsequent modification of the repository record). |
RT | coverage |
Crosswalks to Dublin Core
MARC | MARC: 008/00-05 MARC: 260 $c |
Notes | Only crosswalk to Dublin Core dates that are important for discovery. The subfield $c can be repeated for multiple dates. However MARC does not allow for the potential range of dates associated with a resource in a repository. Accession to repository and subsequent modification dates, for example, should ideally be covered automatically as part of the versioning metadata updates. In some cases it may be necessary or desirable for authentication and audit records to maintain date notes in a MARC 5XX tag. MARC 260 $c is normally used for date of publication. |
MODS | MODS: <originInfo><dateIssued> |
Notes | Only crosswalk to Dublin Core dates that are important for discovery. |
Data entry: best practice and standards
Notes | Dates should be formatted in the Dublin Core according to the W3C encoding rules for dates and times. Accordingly be careful not to map any accompanying letters with a date (e.g. “ca.”) to the dc.date element. Dates required for versioning, authentication and reporting purposes in repositories:
If a work was “first” published on two different dates (e.g. online and print versions being published in different years) use the date the work first became publicly available, regardless of format. In the case of unpublished theses, use the date the degree was awarded rather than the date the thesis was completed. To locate the date of publication:
|
Rationale | The intention of the Date of Issue is to indicate when the article was first published or, if not published, first made public. |
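The rule about W3C date formatting can be sketched in Python as follows. This is a minimal illustration, assuming dates arrive as free text such as "ca. 2007" or "[2007]"; the function name `to_w3cdtf` is hypothetical, and only the YYYY, YYYY-MM and YYYY-MM-DD forms of the W3C profile are handled.

```python
import re
from datetime import date

# A four-digit year, optionally followed by -MM and -DD, not embedded in a longer number.
_W3C_DATE = re.compile(r"(?<!\d)(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?(?!\d)")


def to_w3cdtf(raw):
    """Return YYYY, YYYY-MM or YYYY-MM-DD found in `raw`, or None if there is no date."""
    match = _W3C_DATE.search(raw)
    if match is None:
        return None
    year, month, day = match.groups()
    if month is None or not 1 <= int(month) <= 12:
        return year
    if day is None:
        return f"{year}-{month}"
    try:
        date(int(year), int(month), int(day))  # rejects impossible days
    except ValueError:
        return f"{year}-{month}"
    return f"{year}-{month}-{day}"


for text in ["ca. 2007", "[2007]", "2007-05-14", "n.d."]:
    print(repr(text), "->", to_w3cdtf(text))
# prints:
# 'ca. 2007' -> 2007
# '[2007]' -> 2007
# '2007-05-14' -> 2007-05-14
# 'n.d.' -> None
```

Accompanying letters and brackets are dropped from the mapped value only; the original wording can remain in the local record or a note.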
description (DC term)
SN | A description is an account of the content of the resource. The Dublin Core definition includes abstracts, tables of contents, or any free text summary or account of the content. |
RT | subject |
Crosswalks to Dublin Core
MARC | MARC 520 $a (Either leave the MARC indicators ## or modify according to the type of summary: e.g. 3# for abstract) MARC 5XX $a |
Notes | Any number of 5XX tags can in theory be mapped to a dc.description element. Thus if one chose to use, say, an inhouse 599 tag to describe the peer-review status of a work, then this peer review value could be mapped from that 599 tag to a dc.description element. It is best practice to always populate a MARC 520 tag to map to a dc.description element. |
MODS | MODS: <abstract> |
Notes | -- |
Data entry: best practice and standards
Notes | If no abstract is available, enter the summary. If no abstract or summary accompanies the resource, enter descriptive sentences from the introduction, conclusion, title or table of contents, or briefly summarize the content in your own words. If there is no other alternative, repeat the main portion of the title. Do not leave this field blank. Precede entries that are not abstracts with a bracketed indicator of the nature or source of the entry (e.g. [Conclusion]:). |
Rationale | Even though “abstract” is the default heading for a description in some repositories, the data is typically mapped to dc.description which is broader in scope than strict “abstracts”. This is one of the most useful fields for both users and harvesters. (Some harvesters, e.g. OAIster, even have display problems if this field is not populated in a record.) |
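The fallback order for populating dc.description can be sketched as a short Python function. The function name `choose_description`, its parameters and the labels other than "[Conclusion]:" are assumptions made for illustration.

```python
def choose_description(abstract=None, summary=None, conclusion=None, title=None):
    """Pick a dc.description value, labelling anything that is not an abstract."""
    if abstract and abstract.strip():
        return abstract.strip()
    # Fallbacks in order of preference, each with a bracketed indicator of its source.
    fallbacks = [("Summary", summary), ("Conclusion", conclusion), ("Title", title)]
    for label, text in fallbacks:
        if text and text.strip():
            return f"[{label}]: {text.strip()}"
    raise ValueError("dc.description must not be left blank")


print(choose_description(conclusion="Repositories need their own entry rules."))
# prints: [Conclusion]: Repositories need their own entry rules.
```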
subject (DC term)
SN | The topic or content of the resource. Subjects will include keywords, keyword phrases and controlled vocabularies. |
RT | abstract |
Crosswalks to Dublin Core
MARC | MARC: 650 $a $b $x $y $z $2 (controlled vocabularies) MARC: 653 $a $a $a . . . . (keywords) Also: 600, 610, 611, 630, 651 |
Notes | In Australian and New Zealand repositories the 650 tag will include the RFCD or Marsden code (both number and descriptive label). A repository may be configured so as to hide this from the main portal display page of a record if desired, while still retaining it for browse-indexing and search purposes. In other subject entries it is best practice to include the $2 subfield to indicate the source of any controlled vocabularies used. The repository is to be potentially accessible to a wider community than traditional libraries so it will not always be obvious to service providers how to interpret a value in this field unless it is explained in the $2 subfield (e.g., $2 LCSH). Keywords and keyword phrases will be entered in the 653 tag, each contained within a separate $a subfield within the one 653 tag. |
MODS | MODS: <subject> |
Notes | MODS also designates the authority for the controlled vocabulary: <mods:subject authority="lcsh"><mods:topic> |
Data entry: best practice and standards
Notes | Multiple RFCD codes may be entered. It is not best practice to use RFCD or Marsden codes for the general search function. It is not best practice to rely entirely on keywords for subject entries. Editors should monitor keywords used and add additional ones if appropriate to the repository record. This will not affect the keywords as chosen by the author on the resource, but may be advisable for more effective search and retrieval purposes. When mapping keywords for display, configure them so that each keyword or keyword phrase is separated by a semicolon. Maintain a common standard regarding capitalization or noncapitalization of keywords within the repository. |
Rationales | Cross-disciplinary research makes multiple RFCD codes obligatory. RFCD and Marsden codes are designed for government administration, research and reporting purposes, and not for topic searching. They are useful among academics within the same research fields who report their research to the same national jurisdiction. These academics know and use the codes more than other users of the repository. Keywords can sometimes be chosen according to the transient fashion of the day and suffer from limited long-term value. Some authors also choose keywords that have very narrow applicability within their specialist field with the result that a broader topic more useful for search and retrieval purposes can be omitted altogether. Semicolons are becoming the standard practice in repositories such as EPrints and DSpace since they potentially cause less confusion for users who often see commas used to separate multiple parts of a single name or topic name. A common standard of format within the repository enhances its professional image. |
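The keyword display rules (semicolon separators and one capitalization standard) can be sketched in Python. The function name `format_keywords` is hypothetical, and the policy chosen here, capitalizing only the first letter of each keyword phrase, is just one possible house standard.

```python
def format_keywords(keywords):
    """Return keywords as one display string separated by semicolons."""
    seen = set()
    cleaned = []
    for keyword in keywords:
        text = " ".join(keyword.split())  # collapse stray whitespace
        if not text:
            continue
        # House standard assumed here: capitalize the first letter only.
        text = text[0].upper() + text[1:]
        if text.lower() in seen:
            continue  # skip repeats that differ only in case
        seen.add(text.lower())
        cleaned.append(text)
    return "; ".join(cleaned)


print(format_keywords(["metadata", " Dublin Core ", "Metadata", "open  access"]))
# prints: Metadata; Dublin Core; Open access
```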
coverage (DC term)
SN | DCMI definition: The spatial or temporal topic of the resource, the spatial applicability of the resource, or the jurisdiction under which the resource is relevant. |
RT | description |
Crosswalks to Dublin Core
MARC | MARC: 033 $a (Formatted date/time and/or coded place of creation, capture, or broadcast associated with an event.) MARC: 513 $b (period covered by a report) MARC: 650 $y $z (chronological and geographic subdivisions) |
Notes | Recommended best practice by DCMI is to use controlled vocabularies for this value. Hence the MARC 033 (date = dc.coverage.temporal) and MARC 043 (geographic area code = dc.coverage.spatial) – although both temporal and spatial coverage in Simple Dublin Core sit in the (repeatable) dc.coverage element. Not all MARC fields need to be mapped to DC. Tag 651 can be mapped to both dc.subject and dc.coverage if other MARC tags listed above are not used. |
MODS | MODS: <subject> |
Data entry: best practice and standards
Notes | It is best practice to ensure temporal and spatial values are mapped to dc.coverage elements. |
Rationale | Populating the Dublin Core “coverage” field eliminates the risk of a harvester supplying default values. |
language (DC term)
SN | The language of the intellectual content of the resource. |
RT | -- |
Crosswalks to Dublin Core
MARC | MARC: 008/35-37 MARC: 041 $a |
Notes | Note, it is not best practice to use the MARC 546 language note tag to map to dc.language. This MARC tag is a free text note. Coded values for languages are entered in 041 and/or the fixed 008/35-37 fields. A three letter MARC code for language is used in the 008/35-37 field but this can be mapped to a default DC language value en-aus. |
MODS | MODS: <language> |
Notes | e.g. <mods:language authority="rfc3066">en</mods:language> |
Data entry: best practice and standards
Notes | The recommended best practice by DCMI is to use a controlled vocabulary such as RFC 3066 which, in conjunction with ISO 639, defines two- and three-letter primary language tags with optional subtags. These guidelines represent Australian English as en-aus, made up of the primary language tag en (from ISO 639) and a subtag for Australia. Strictly, RFC 3066 draws two-letter region subtags from ISO 3166, which gives en-AU. |
Rationale | OAI harvesters typically search for standard codes to indicate language. |
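Mapping the coded language in MARC 008/35-37 to a standard primary language tag can be sketched in Python. The lookup table below is a small illustrative subset, and the function name `marc_language_to_dc` is hypothetical; any default regional subtag would be applied separately as part of the repository configuration.

```python
# Illustrative subset of MARC three-letter codes and their two-letter ISO 639 tags.
MARC_TO_PRIMARY_TAG = {
    "eng": "en",
    "fre": "fr",
    "ger": "de",
    "spa": "es",
    "chi": "zh",
}


def marc_language_to_dc(field_008):
    """Return the primary language tag for positions 35-37 of a MARC 008 field.

    Returns None when the code is blank or not in the table, so that an
    editor can resolve it instead of a guessed value being harvested.
    """
    code = field_008[35:38].strip().lower()
    return MARC_TO_PRIMARY_TAG.get(code)


# Placeholder 008 field: only positions 35-37 matter for this sketch.
sample_008 = " " * 35 + "eng" + " d"
print(marc_language_to_dc(sample_008))
# prints: en
```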
publisher (DC term)
SN | DC definition: An entity responsible for making the resource available. A publisher can be the author's institution or a commercial publisher. |
RT | Place of publication |
Crosswalks to Dublin Core
MARC | MARC: 260 $b MARC: 773 $d |
Notes | 773 $d is the subelement of the host item of the resource, and includes the place, publisher, date of publication as a text string in a single subfield (e.g. $d [Berlin], Elsevier, 2007) |
MODS | MODS: <originInfo><publisher> |
Notes | -- |
Data entry: best practice and standards
Notes | Enter the publisher in full, except for Pty and Ltd. Where there is a hierarchy in the publishing organization enter the broadest umbrella institution first followed by successive narrower institutions. In case of unpublished works (e.g. theses) that emanate from the institution, enter the institution as the value in the publisher field. |
Rationale | The Dublin Core definition of publisher, and therefore the DC expectation in this element, is any entity that is responsible for making the resource available. Hence a university can appear as “publisher” of a thesis even though the thesis is not strictly “published” in a commercial sense. |
Place of publication
SN | The geographic location of the publisher of a resource. |
RT | publisher |
Do not crosswalk to Dublin Core
MARC | MARC: 260 $a MARC: 773 $d |
Notes | 773 $d is the subelement of the host item of the resource, and includes the place, publisher, date of publication as a text string in a single subfield (e.g. $d [Berlin], Elsevier, 2007) |
MODS | MODS: <originInfo> <place> <placeTerm type="text"> MODS: <originInfo> <publisher> |
Notes | The latter MODS entry is the equivalent of the MARC 773 $d host item subelement. |
Data entry: best practice and standards
Notes | Enter the place of publication value. |
Rationale | The place of publication may be expected for bibliographic citation purposes. Some repositories collate data including place of publication to generate standard bibliographic citations of the resource. |
rights (DC term)
SN | DC definition: Information about rights held in and over the resource. |
RT | -- |
Crosswalks to Dublin Core
MARC | MARC: 506 $a (restrictions on access note) MARC: 540 $a $u (terms governing use and reproduction note) |
Notes | The 540 tag will normally have the official copyright statement from the copyright owner and/or publisher of the resource. Where a link to an online copyright statement is required use the $u subfield for the URI. The 506 tag will explain in further detail any access limitations on the resource. |
MODS | MODS: <accessCondition> |
Notes | MODS combines the values of the two MARC tags into the single <accessCondition> element, and can optionally distinguish between them by “type”: <accessCondition type="restrictionOnAccess"> |
Data entry: best practice and standards
Notes | DCMI Glossary comment: Typically a Rights element will contain a rights management statement for the resource, or reference a service providing such information. Rights information often encompasses Intellectual Property Rights (IPR), Copyright, and various Property Rights. If the rights element is absent, no assumptions can be made about the status of these and other rights with respect to the resource. |
Rationale | Publishers will often require a link to their page or a standard copyright statement. These can be pasted into this field. |
relation (DC term)
SN | Dublin Core relation is a related resource. For a conference paper or journal article or book chapter this would mean the conference publication, the journal title and the book title respectively. Conference names and series titles are also “relations” of resources. |
RT | title |
Crosswalks to Dublin Core
MARC | MARC: 440 $a MARC: 530 $a MARC: 710 $a MARC: 830 $a MARC: 856 $u |
Notes | Do not use 710 $a for a creator's affiliation. Creator affiliations are covered by $u in the 100 and 700 tags. The host item entry (MARC 773) can contain complete bibliographic information for the host item of the resource: $t title (e.g. journal title). The MARC 787 tag is a nonspecific relationship entry and may contain other types of data apart from a host item for a resource. Check this field in the case of a batch upload. Be careful to distinguish bibliographic data pertaining to the resource from data pertaining to its related title. An ISSN applies to a journal publication and belongs in the 773 tag, not in the 022 tag that would refer to the main title entry for the journal article. Use the MARC 856 $u field and subfield, with $q for the format (e.g. application/pdf), for the offsite DOI or other offsite URI to the article. |
MODS | MODS: <relatedItem> |
Notes | <relatedItem> data is parsed into subelements in MODS (any MODS element may be used). For example, if giving a reference to a resource fully described in MODS relatedItem, one could use: |
Data entry: best practice and standards
Notes | Do not use this for links to other instances of the same resource, or to links or bibliographic references to the same resource. “Relation” also strictly includes other versions of the resource (e.g. a preprint of a published version) but there is no scope at this stage to monitor this level of relationship in repositories. |
Rationale | Relation is defined by DCMI to mean a related resource. Do not use for another instance of, or link to, the same resource. |
identifier (DC term)
SN | DCMI definition: An unambiguous reference to the resource within a given context. Recommended best practice is to identify the resource by means of a string or number conforming to a formal identification system. Examples of formal identification systems include the Uniform Resource Identifier (URI) (including the Uniform Resource Locator (URL)), the Digital Object Identifier (DOI) and the International Standard Book Number (ISBN). DCMI guidelines: This element can also be used for local identifiers (e.g. ID numbers or call numbers) assigned by the Creator of the resource to apply to a particular item. It should not be used for identification of the metadata record itself. |
RT | -- |
Crosswalks to Dublin Core
MARC | MARC: 013 $a MARC: 020 $a MARC: 852 $u |
Notes | Only crosswalk identifiers of the resource to Dublin Core, not the identifiers of the host publishing title. The MARC fields in italics above will typically represent identifiers of host items only and should not be mapped to Dublin Core. Note for OAI harvesting: It is expected that a service provider (harvester) will direct users initially to the resource's metadata page in the repository. This is indicated in the MARC 852 field. (From there users can navigate to the full text of the resource.) In order for service providers to direct users to the repository's metadata page for the resource, the identifier in the OAI data provider's Dublin Core record must be the identifier of the metadata page itself, not the full text of the resource. This identifier will normally be machine generated. All other DC values in this DC record will relate to the resource, not the metadata page with the link to the resource. |
MODS | MODS: <identifier> |
Notes | The identifier type (e.g. <identifier> with type="doi") should be retained and associated with the identifier value. Follow standards for entry of identifiers. With ISSN and ISBN follow a single standard for data entry: e.g. hyphen in an ISSN and all digits and letters in an ISBN together without spaces or dashes and capitalized. Local identifiers will be machine generated: |
Data entry: best practice and standards
Notes | Do not map local identifiers (e.g. PIDs, the persistent identifiers that are unique within the local repository) to Dublin Core. |
Rationales | Local identifiers do not have significance outside the repository institution. |
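The single entry standard for ISSN and ISBN values described above can be sketched in Python. The function names are hypothetical and the sample numbers are arbitrary; the sketch checks only length and characters, not the check digit.

```python
import re


def _digits_and_x(raw):
    """Keep only digits and the letter X, in upper case."""
    return re.sub(r"[^0-9Xx]", "", raw).upper()


def normalize_issn(raw):
    """Return an ISSN as eight characters with a hyphen after the fourth."""
    chars = _digits_and_x(raw)
    if not re.fullmatch(r"\d{7}[\dX]", chars):
        raise ValueError(f"not an ISSN: {raw!r}")
    return f"{chars[:4]}-{chars[4:]}"


def normalize_isbn(raw):
    """Return a 10- or 13-character ISBN without spaces or dashes."""
    chars = _digits_and_x(raw)
    if not re.fullmatch(r"\d{9}[\dX]|\d{13}", chars):
        raise ValueError(f"not an ISBN: {raw!r}")
    return chars


print(normalize_issn("1234 5678"))          # prints: 1234-5678
print(normalize_isbn("0-1234-5678-x"))      # prints: 012345678X
print(normalize_isbn("978 0 1234 5678 9"))  # prints: 9780123456789
```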
source (DC term)
Crosswalks to Dublin Core
MARC | -- |
Notes | Do not crosswalk manual data entries to the Dublin Core element “source”. |
MODS | -- |
Notes | Do not crosswalk manual data entries to the Dublin Core element “source”. |
Data entry: best practice and standards
Notes | This value may be machine generated to indicate the name of the file from which the metadata is generated. |
Rationales | DSpace and EPrints recommend that it not be used for data entry. ARROW Discovery Service does not scan for “dc.source”. |
type (DC term)
SN | DCMI definition: The nature or genre of the content of the resource. DCMI recommended best practice is to use a controlled vocabulary such as the DCMI Type Vocabulary [DCMITYPE]. This vocabulary, however, is limited for repository purposes, so other controlled lists have been created for repositories. To describe the file format, physical medium, or dimensions of the resource, use the Format element. |
RT | -- |
Crosswalks to Dublin Core
MARC | MARC: 655 $a $2 |
Notes | Use a controlled thesaurus and indicate its source (e.g. $2 LCSH). |
MODS | MODS: <typeOfResource> MODS: <genre> |
Notes | Use separate instances of Type for each MODS element value. If converting MODS typeOfResource values to Dublin Core Resource Type values, see conversion details below. If MODS <genre> contains authority="dct", that may be used in dc:type and typeOfResource dropped. |
Conversion of MODS typeOfResource values to DC Resource Type vocabulary
MODS typeOfResource | DC Type value |
typeOfResource collection="yes" | Collection (use in addition to specific value below) |
software and mods:genre="database" | Dataset |
cartographic material | Image |
multimedia | InteractiveResource |
moving image | MovingImage |
three-dimensional object | PhysicalObject |
software and mods:genre="online system or service" | Service |
sound recording, sound recording-musical, sound recording-nonmusical | Sound |
still image | StillImage |
software | Software |
text, notated music | Text |
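The conversion table above can be expressed as a small lookup. The sketch below is illustrative only: `dc_types` and its parameters are names invented for this example, and it assumes that a matching software genre replaces the generic "Software" value, which the table does not state explicitly.

```python
# MODS typeOfResource value -> DC Type value, as listed in the table above.
TYPE_OF_RESOURCE_TO_DC = {
    "cartographic material": "Image",
    "multimedia": "InteractiveResource",
    "moving image": "MovingImage",
    "three-dimensional object": "PhysicalObject",
    "sound recording": "Sound",
    "sound recording-musical": "Sound",
    "sound recording-nonmusical": "Sound",
    "still image": "StillImage",
    "software": "Software",
    "text": "Text",
    "notated music": "Text",
}

# Genres that refine typeOfResource "software".
SOFTWARE_GENRE_TO_DC = {
    "database": "Dataset",
    "online system or service": "Service",
}


def dc_types(type_of_resource, genre=None, collection=False):
    """Return the DC Type values for one MODS typeOfResource element."""
    types = []
    if collection:
        # collection="yes": "Collection" is used in addition to the
        # specific value.
        types.append("Collection")
    if type_of_resource == "software" and genre in SOFTWARE_GENRE_TO_DC:
        # Assumption: the genre-specific value replaces "Software".
        types.append(SOFTWARE_GENRE_TO_DC[genre])
    elif type_of_resource in TYPE_OF_RESOURCE_TO_DC:
        types.append(TYPE_OF_RESOURCE_TO_DC[type_of_resource])
    return types
```

Unknown typeOfResource values yield an empty list, so a caller can flag them for manual review.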
Data entry: best practice and standards
- Notes
- Do not repeat this element. (All other elements may be repeated.)
- RUBRIC is working with MACAR on guidelines towards a national standard resource type vocabulary for repositories. As of 5th October 2007 no final decision has been made on this list, but one is expected before the end of the year.
- In the meantime, the ARROW Discovery Service harvesting guide states as a rule that a type value “must be one of the ARROW list of recognized types”.
- These are:
arc project report
article
book
book chapter
collection
conference paper
email
reading list
multi-media object
research dataset
research paper
rich media (non-text)
still image
technical report
thesis
working/discussion paper
- The guide also states that the Discovery Service administrator should be contacted if other type values are to be used. Many repositories do use terms that vary from those above. (See “Current thesauri for resource types” below.)
- MACAR is considering further types, such as software, musical compositions and datasets, and some of the above terms may change. A more complete “standard” list should be available before the end of the year.
- Rationale
- Although the ARROW Discovery Service harvesting guide says multiple “type” elements may be supported (presumably for a single document), not all repository solutions currently support multiple resource types for one document. A document with multiple resource types could therefore cause preservation difficulties if its data is later migrated to another repository system.
- It also needs to be kept in mind that the ARROW Discovery Service harvesting guide is currently under review and is expected to be revised soon. The work of MACAR may influence its revision.
- Unfortunately there has been little consistency among the many thesauri in use, and some confuse formats or types of resources with genres of resources. Beware confusion between types (genres) and formats (MIME types).
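A record's type value can be checked against the ARROW list quoted above before harvesting. The sketch below is a hedged illustration rather than part of the ARROW harvesting guide: the function name is hypothetical, and the case-insensitive comparison is an assumption.

```python
# The ARROW list of recognized types, as quoted in the notes above.
ARROW_TYPES = frozenset({
    "arc project report", "article", "book", "book chapter",
    "collection", "conference paper", "email", "reading list",
    "multi-media object", "research dataset", "research paper",
    "rich media (non-text)", "still image", "technical report",
    "thesis", "working/discussion paper",
})


def check_type(values):
    """Return a list of problems found in a record's dc:type values."""
    problems = []
    # The data entry rule above: do not repeat the type element.
    if len(values) != 1:
        problems.append(
            f"expected exactly one type value, found {len(values)}"
        )
    for value in values:
        # Assumption: matching ignores case and surrounding whitespace.
        if value.strip().lower() not in ARROW_TYPES:
            problems.append(f"unrecognized type: {value!r}")
    return problems
```

An empty result means the record has a single type that appears on the list. Any other result is a prompt for review, since the guide allows other values by arrangement with the Discovery Service administrator.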
Current thesauri for resource types (See Appendix B for a more complete set of comparisons)
ARROW-VITAL types: book | DSpace 1.4 default types: Animation | Fez 1.1 default types:
| EPrints default types: Article |
format (DC term)
SN | DCMI definition: The file format, physical medium, or dimensions of the resource. DCMI comment: Examples of dimensions include size and duration. Recommended best practice is to use a controlled vocabulary such as the list of Internet Media Types [MIME]. In repositories the file format metadata value will typically be machine generated. |
RT | -- |
Crosswalks to Dublin Core
MARC | MARC: 856 $q (electronic file format type: e.g. $q application/pdf). Other possible format crosswalks: 245 $h (medium); 300 $a (physical description); 533 $e (physical description of a reproduction) |
Notes | In repositories the file format metadata value of the archived resource will typically be machine generated. The above crosswalks are for additional format values that a repository policy may choose to add to a record. |
MODS | MODS: <physicalDescription> MODS: <internetMediaType> MODS: <extent> MODS: <form> |
Notes | Use separate instances of Format for each MODS element value. |
Data entry: best practice and standards
Notes | Format information of a resource has a different function in a repository from what it has in traditional libraries. Repository format is principally for machine information. MARC non-electronic format information is principally descriptive data for the user. |
Rationale | In repositories the file format metadata value of the archived resource will typically be machine generated. |
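As an illustration of a machine-generated format value, the sketch below derives an Internet Media Type from a file name using Python's standard `mimetypes` module. It is not how any particular repository platform works. The function name `dc_format` and the fallback to `application/octet-stream` are assumptions made for the example.

```python
import mimetypes


def dc_format(filename: str) -> str:
    """Guess the Internet Media Type (MIME type) for a file name."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    # Assumption: fall back to the generic binary type when the
    # extension is missing or not recognized.
    return mime_type or "application/octet-stream"
```

For a file named `thesis.pdf` this gives `application/pdf`, the same kind of value that MARC 856 $q carries.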
Index of the MARC fields referred to in the above Guidelines
| MARC field | MARC subfield | DC element |
Language | 008 | eng | language |
Patent control | 013 |




