Requirements for Collection Management Systems

Living Standard

This version:
https://netwerk-digitaal-erfgoed.github.io/requirements-collection-management-systems/
Issue Tracking:
GitHub
Editors:
(Netwerk Digitaal Erfgoed)
(Netwerk Digitaal Erfgoed)

Abstract

This document describes requirements for collection management systems. By following these requirements, software developers can make their systems NDE-compatible.

1. Introduction

This document helps vendors of collection management systems to make their systems NDE compliant and ready for use in the Dutch heritage sector. Vendors only offering linked data publication services must meet the requirements for publishing Linked Open Data (see § 2.2 Publication of Linked Open Data) and registration with the NDE Dataset Register (see § 2.4 Registration with the NDE Dataset Register). In that case, the other requirements must be fulfilled by the system that provides the information to the publication service.

The guidelines in this document are required for any institution in the Dutch cultural sector that receives funding through the NDE Versnellingsprogramma (Acceleration Program). Software vendors help these organisations to meet the requirements by implementing them as features in their systems.

Organisations that provide datasets or collections according to these guidelines are considered Data Providers (Bronhouders) as defined in the [DERA].

This specification is normative and builds on the informative Implementation Guidelines, which were published in an earlier stage of the NDE programme.

2. Requirements

2.1. Persistent Identifiers

Every information resource needs a [URI] that identifies it to the world. Users must be able to trust these identifiers: they must be unique and reliably serve the same resource over time. URIs must be resolvable and point the user to valid linked data.

Persistent Identifiers (PIDs) are the part of a URI that uniquely identifies a resource. Heritage institutions must use PIDs for their resources to guarantee reliability. With a PID, the link to a resource keeps working even when the organisation, or the object’s location, has changed.

2.1.1. PID Implementations

There are multiple PID systems such as Archival Resource Keys (ARKs), Digital Object Identifiers (DOIs), Handle, and others. You may choose any implementation, even your own system based on the internet domain of your organisation. Each system has its own particular properties, strengths, and weaknesses. Our PID Guide helps you make a choice.

As a heritage institution you must choose a PID system. You may be limited by the PID systems supported by your software vendor. But when you change vendor, your PID system migrates with you.

Regardless of your choice of PID system, you are required to anchor it structurally in your organisation. You must document the formal processes and policies that describe how PIDs are sustainably implemented and commit to them over time.

As a vendor you must implement at least one PID system in your software. You may support more. When implementing the PID system you must link resources to your customer (the cultural heritage institution), not to you as the vendor. Persistence must be guaranteed when resources are migrated to another vendor in the future.
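For example, a minimal sketch (in Python; the domain and URI pattern are hypothetical) of minting an opaque identifier under the institution's own domain rather than the vendor's:

```python
import uuid

def mint_object_uri(institution_domain: str) -> str:
    """Mint a persistent, vendor-independent URI for a new object.

    The URI lives under the institution's own (hypothetical) domain, never
    under the vendor's domain, so it can survive a migration to another
    collection management system.
    """
    return f"https://{institution_domain}/id/object/{uuid.uuid4()}"

print(mint_object_uri("data.example-museum.nl"))
# e.g. https://data.example-museum.nl/id/object/0f8fad5b-d9cb-469f-a165-70867728950e
```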

2.1.2. Linking to removed resources

Removed resources are all objects that have been deleted, made invisible, or that for any other reason are no longer structurally available. You must maintain persistence for removed resources. To handle requests for them you must use standard HTTP response status codes. You may use 'HTTP 301 Moved Permanently' to redirect the URI of a removed resource to a new resource that replaces it, mark a removed resource as 'obsolete' by returning 'HTTP 410 Gone' and serving its last incarnation, or choose another solution.

Unless an actual error occurs, you must not return a client error (4xx) or server error (5xx) response for requests for removed resources.
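For example, a minimal sketch of such status handling (in Python with Flask; the URIs and the registry of removed resources are hypothetical):

```python
from flask import Flask, Response, redirect

app = Flask(__name__)

# Hypothetical registry of removed resources: the value is the URI of a
# replacement resource, or None when the resource is permanently obsolete.
REMOVED = {
    "object/123": "https://data.example-museum.nl/id/object/456",  # replaced
    "object/789": None,                                            # obsolete
}

@app.route("/id/<path:local_id>")
def resolve(local_id):
    if local_id in REMOVED:
        replacement = REMOVED[local_id]
        if replacement is not None:
            # HTTP 301 Moved Permanently: point to the resource that replaces it.
            return redirect(replacement, code=301)
        # HTTP 410 Gone: the resource is obsolete; its last incarnation could
        # be served in the response body instead of this plain message.
        return Response("This resource has been removed.", status=410)
    # Otherwise serve the live resource as usual (placeholder here).
    return Response(f"Representation of {local_id}", status=200)
```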

2.2. Publication of Linked Open Data

2.2.1. Serving data

The Implementation Guidelines mentioned above distinguish three levels of publishing linked data:

  1. Web compliance level: resolvable URIs

  2. Basic level: data dumps

  3. Advanced level: queryable Linked Data

NDE requires linked data to be published at both level 1 and level 2.

To publish web-compliant linked data (level 1), URIs (see § 2.1 Persistent Identifiers) must resolve to the individual linked data resources contained in your data. To handle requests, the HTTP content negotiation protocol must be supported. Content negotiation offers users the choice to consume the data in the serialisation of their choice, e.g. as HTML, RDF, or other formats you support.
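For example, a client can verify content negotiation by requesting different serialisations of the same resource (Python sketch using the requests library; the resource URI is hypothetical):

```python
import requests

# A hypothetical resource URI; a level 1 compliant server honours the Accept
# header (HTTP content negotiation) and returns the requested serialisation.
uri = "https://data.example-museum.nl/id/object/123"

for accept in ("text/html", "text/turtle", "application/ld+json"):
    response = requests.get(uri, headers={"Accept": accept})
    print(accept, "->", response.status_code,
          response.headers.get("Content-Type"))
```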

To publish linked data as a data dump (level 2) you should use RDF. We recommend the N-Triples serialisation for flexibility, but any other RDF serialisation (Turtle, JSON-LD, RDF/XML) is valid. You must serve the data dump through a URI with a persistent identifier.
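For example, a data dump can be produced by re-serialising an existing RDF export as N-Triples (Python sketch using rdflib; the file names are hypothetical):

```python
from rdflib import Graph

# Load a hypothetical RDF export of the collection and re-serialise it as an
# N-Triples data dump.
graph = Graph()
graph.parse("collection-export.ttl", format="turtle")
graph.serialize(destination="collection-dump.nt", format="nt")
```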

We strongly recommend that vendors also implement level 3 in their products. This allows users to also query linked data collections. For compliance with level 3 you must provide a SPARQL endpoint.
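For example, a level 3 endpoint can be queried as follows (Python sketch using SPARQLWrapper; the endpoint URI is hypothetical and the query assumes data is also published in the generic schema.org model, see § 2.2.2):

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Hypothetical SPARQL endpoint of a collection published at level 3.
sparql = SPARQLWrapper("https://data.example-museum.nl/sparql")
sparql.setQuery("""
    PREFIX sdo: <https://schema.org/>
    SELECT ?object ?name WHERE {
        ?object a sdo:CreativeWork ;
                sdo:name ?name .
    }
    LIMIT 10
""")
sparql.setReturnFormat(JSON)

results = sparql.query().convert()
for binding in results["results"]["bindings"]:
    print(binding["object"]["value"], binding["name"]["value"])
```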

Note: When you comply with levels 1 and 2 you allow users to access the individual linked data objects and the full dataset, while avoiding the linked data infrastructure that you would need to maintain for querying. The drawback of this approach is that you move the burden of facilitating querying to the user. For this reason we view compliance with levels 1 and 2 as a minimal requirement. We strongly urge you to go further and implement level 3. Compliance with level 3 will very likely play a large role in the allocation of future NDE funding.

2.2.2. Data modelling

We distinguish between the publication of data in "generic data models" and "domain data models". The purpose of generic data models is to integrate the data into the cultural heritage network and make it more visible. Domain models are usually more richly populated and provide users with more possibilities for further processing in dienstplatformen (Service Platforms).

You must use the generic model schema.org when you choose to serve linked data in only one model.
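For example, a minimal schema.org description of a single object could look like this (Python sketch using rdflib; all URIs and values are hypothetical placeholders):

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

SDO = Namespace("https://schema.org/")

graph = Graph()
graph.bind("sdo", SDO)

# Hypothetical PID of an object (see § 2.1 Persistent Identifiers).
obj = URIRef("https://data.example-museum.nl/id/object/123")

graph.add((obj, RDF.type, SDO.CreativeWork))
graph.add((obj, SDO.name, Literal("Example painting", lang="en")))
graph.add((obj, SDO.creator, URIRef("https://data.example-museum.nl/id/person/42")))
graph.add((obj, SDO.dateCreated, Literal("1660")))

print(graph.serialize(format="turtle"))
```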

We recommend you also serve data in one or more domain specific models. Domain models make reuse of data much more likely and may contain a wealth of information. Depending on the type of data, you may consider the following examples:

You should publish documentation on how your data model maps to other well-known models such as the Europeana Data Model (EDM). We recommend that you publish this documentation in a machine-readable format. NDE can support you in creating these mappings.

To support the reuse of the collection data by the research community, the FAIR principles should be taken into account, preferably by stating the FAIRness of the data through a FAIR Implementation Profile. Please contact the NDE team for more information on the latest developments in this area.

2.3. Implementation of the NDE Network of Terms

To connect the linked data that you publish with other data sources, heritage institutions should use terms. Terms are descriptions of concepts or entities, such as subjects, people or places, and are managed in terminology sources, such as thesauri, reference lists or classification systems. Examples are AAT, GeoNames or WO2 Thesaurus (in Dutch).

Heritage institutions may use their own terminology sources. However, this reduces their ability to connect to sources from different institutions when using terms that are not part of (inter)nationally standardised terminology sources. We recommend using as many standardised terminology sources as possible. We developed the Network-of-Terms service to give you easy access to many different terminology sources.

For vendors we require the implementation of the NDE Network-of-Terms API in the collection management system. This gives your customers easy access to the terminology sources made available through the Network-of-Terms. You must make terms configurable for fields in your heritage application; you may choose the specific fields. If you allow users of your application to configure the data type of a field, you should add a data type for terms, so that users are not limited in the (number of) fields that they want to configure for certain sets of terms.
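For example, a collection management system could query the Network-of-Terms as sketched below. Note that the endpoint URL, the shape of the GraphQL query, and the source identifier are indicative assumptions only; the authoritative endpoint and schema are documented with the Network-of-Terms API.

```python
import requests

# Indicative only: endpoint, query shape, and source identifier are
# assumptions; consult the Network-of-Terms API documentation.
ENDPOINT = "https://termennetwerk-api.netwerkdigitaalerfgoed.nl/graphql"

QUERY = """
{
  terms(sources: ["https://vocab.getty.edu/aat/sparql"], query: "schilderij") {
    source { name }
    result {
      ... on Terms {
        terms { uri prefLabel }
      }
    }
  }
}
"""

response = requests.post(ENDPOINT, json={"query": QUERY})
print(response.json())
```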

If a heritage institution is considered an authority in a domain, vendors should publish its terminology sources for use by others and make them available through the Network-of-Terms. Please refer to our Implementation Guidelines on publishing terminology sources.

2.4. Registration with the NDE Dataset Register

To increase the findability of the datasets of heritage institutions, you must publish the metadata of any dataset you serve. The metadata contains the description of the dataset and must be modelled in schema.org. We created a data model for the description of datasets and adopted the class https://schema.org/Dataset. You must comply with the model as documented in the NDE Requirements for Datasets [NDE-DATASETS], specifically the section Dataset information.
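For example, a minimal dataset description in schema.org could look like this (expressed as JSON-LD from Python; all URIs, names, and the license are hypothetical placeholders, and the authoritative list of required properties is given in the Requirements for Datasets):

```python
import json

# Minimal, illustrative dataset description modelled with schema.org.
dataset_description = {
    "@context": "https://schema.org/",
    "@id": "https://data.example-museum.nl/id/dataset/paintings",
    "@type": "Dataset",
    "name": "Paintings collection",
    "description": "Metadata of the paintings in the Example Museum collection.",
    "license": "https://creativecommons.org/publicdomain/zero/1.0/",
    "publisher": {
        "@type": "Organization",
        "@id": "https://data.example-museum.nl/id/organisation/example-museum",
        "name": "Example Museum",
    },
    "distribution": [
        {
            "@type": "DataDownload",
            "contentUrl": "https://data.example-museum.nl/dump/paintings.nt",
            "encodingFormat": "application/n-triples",
        }
    ],
}

print(json.dumps(dataset_description, indent=2))
```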

You should serve the metadata through a URI with a persistent identifier (see § 2.1 Persistent Identifiers).

We have created the NDE Dataset Register to help users in the cultural heritage sector find relevant datasets to link with. You must use this register to register the metadata of the datasets you serve.

Although the register provides a manual registration option for users, as a vendor you are required to use the Dataset Register API to implement an automated registration feature in your software. The API is described with an OpenAPI specification, available via the Swagger UI. The manual service may be used for publishing updates and changes.
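For example, an automated registration could be sketched as follows. The exact path and payload shape are assumptions here; the authoritative description is the OpenAPI specification exposed through the Swagger UI.

```python
import requests

# Indicative only: path and payload shape are assumptions; consult the
# Dataset Register's OpenAPI specification for the actual API.
REGISTER_API = "https://datasetregister.netwerkdigitaalerfgoed.nl/api/datasets"

# The register is pointed at the URL where the schema.org dataset description
# is published (hypothetical here); it fetches and validates that description.
payload = {"@id": "https://data.example-museum.nl/id/dataset/paintings"}

response = requests.post(REGISTER_API, json=payload)
print(response.status_code)
```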

You must publish the metadata with an open data license. You may choose the most suitable license.

The actual access to digital objects, such as images, audio files, or linked data resources, may be restricted. You must specify additional license statements for the use and reuse of these objects in the metadata. Use the schema:license property to relate a license to an information resource.

Remember that you must relate a license to the dataset, the object metadata, and the reproduction. These licenses may differ.

2.5. Serving images through IIIF

You must implement the IIIF Image API to publish images contained in your system. The Image API supports requests for specific parts, sizes and orientations of images based on a URL scheme. It gives viewers the ability to request only those tiles needed for the current viewport and zoom level, instead of forcing them to load the complete high-resolution image. You may choose the highest zoom level, size, and number of parts of any image you serve, since these depend on the available storage in your system, the results of the digitisation process at the institution, and other customer wishes.
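For example, the Image API URL scheme ({scheme}://{server}{/prefix}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}) can be illustrated as follows (Python sketch; the base URL and identifier are hypothetical):

```python
# Build IIIF Image API request URLs from the standard URL scheme.
def iiif_image_url(base, identifier, region="full", size="max",
                   rotation="0", quality="default", fmt="jpg"):
    return f"{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{fmt}"

BASE = "https://images.example-museum.nl/iiif/3"  # hypothetical image server

# The full image at the largest size the server allows:
print(iiif_image_url(BASE, "object-123-scan-1"))

# A single 512x512 tile, scaled to 256 pixels wide, as a viewer would request
# it for deep zoom:
print(iiif_image_url(BASE, "object-123-scan-1",
                     region="1024,1024,512,512", size="256,"))
```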

We use the IIIF Image API Validator to test compliance.

While the Image API provides access to individual images, it does not describe the relationship between groups of images, such as the individual scans that make up the pages of a book. Besides the IIIF Image API you must therefore also publish an IIIF manifest through the IIIF Presentation API. An IIIF manifest is a JSON-LD document that describes the structure and metadata of a digital object or collection of objects, such as images, audio, or video. It comprises several key components, including pages, canvases, table of contents (ranges), and annotations. You should serve the IIIF manifest through a URI with a persistent identifier (see § 2.1 Persistent Identifiers).
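For example, a minimal Presentation API 3.0 manifest with a single canvas and one painting annotation could look like this (expressed from Python; all URIs and labels are hypothetical placeholders):

```python
import json

# Minimal, illustrative IIIF Presentation API 3.0 manifest.
manifest = {
    "@context": "http://iiif.io/api/presentation/3/context.json",
    "id": "https://data.example-museum.nl/iiif/object-123/manifest",
    "type": "Manifest",
    "label": {"en": ["Example object"]},
    "items": [{
        "id": "https://data.example-museum.nl/iiif/object-123/canvas/1",
        "type": "Canvas",
        "height": 3000,
        "width": 2000,
        "items": [{
            "id": "https://data.example-museum.nl/iiif/object-123/page/1",
            "type": "AnnotationPage",
            "items": [{
                "id": "https://data.example-museum.nl/iiif/object-123/annotation/1",
                "type": "Annotation",
                "motivation": "painting",
                "target": "https://data.example-museum.nl/iiif/object-123/canvas/1",
                "body": {
                    "id": "https://images.example-museum.nl/iiif/3/object-123-scan-1/full/max/0/default.jpg",
                    "type": "Image",
                    "format": "image/jpeg",
                    "service": [{
                        "id": "https://images.example-museum.nl/iiif/3/object-123-scan-1",
                        "type": "ImageService3",
                        "profile": "level1",
                    }],
                },
            }],
        }],
    }],
}

print(json.dumps(manifest, indent=2))
```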

We use the IIIF Presentation API Validator to test compliance.

Conformance

Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in the normative parts of this document are to be interpreted as described in RFC 2119. However, for readability, these words do not appear in all uppercase letters in this specification.

All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes. [RFC2119]

Examples in this specification are introduced with the words “for example” or are set apart from the normative text with class="example", like this:

This is an example of an informative example.

Informative notes begin with the word “Note” and are set apart from the normative text with class="note", like this:

Note, this is an informative note.

References

Normative References

[DERA]
Digital Heritage Reference Architecture. URL: https://dera.netwerkdigitaalerfgoed.nl
[NDE-DATASETS]
David de Boer; Netwerk Digitaal Erfgoed. Requirements for Datasets. URL: https://netwerk-digitaal-erfgoed.github.io/requirements-datasets/
[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://datatracker.ietf.org/doc/html/rfc2119
[URI]
T. Berners-Lee; R. Fielding; L. Masinter. Uniform Resource Identifier (URI): Generic Syntax. January 2005. Internet Standard. URL: https://www.rfc-editor.org/rfc/rfc3986#section-1.1.3

Informative References

[RDF-PRIMER]
Frank Manola; Eric Miller. RDF Primer. URL: https://w3c.github.io/rdf-primer/spec/
[SPARQL11-OVERVIEW]
The W3C SPARQL Working Group. SPARQL 1.1 Overview. 21 March 2013. REC. URL: https://www.w3.org/TR/sparql11-overview/