Please don’t rely on this document yet as it’s still under discussion and development.
1. Introduction
1.1. Goal
This document prescribes the generic data model to be used when publishing linked data in the heritage network. The model consists of a minimal set of classes and properties. It is based on:
- the current state of datasets in the heritage network, as observed in the Dataset Knowledge Graph, particularly its property partitions analysis;
- the needs of service platform builders for understanding, processing and presenting data.
By adhering to this model, dataset publishers ensure that their data is visible and can be consumed and combined with other datasets in the network.
1.2. Scope
These requirements are restricted in three ways:
- they apply only to the way published data is expressed, not how it is stored or managed internally;
- they prescribe a generic data model and leave the use of domain data models up to dataset publishers;
- they bear upon datasets, not their descriptions; for the latter see [NDE-DATASETS].
1.3. Examples
While RDF examples in this document are in the [JSON-LD] RDF serialization, publishers MAY use any RDF serialization format, such as [Turtle] or [N3].
2. Definitions
- Data model: Set of classes and their properties that defines how data is expressed.
- Generic data model: A simple, shared data model; the scope of this document. See also [NDE-ALIGNMENT]. Can be used alongside domain data models.
- Domain data model: A domain-specific data model, such as CIDOC-CRM, Linked Art, RiC-O or RDA. Can be used alongside a generic data model. Adds precision at the cost of complexity. Out of this document’s scope.
3. Data model
3.1. Generic and domain data models
The purpose of generic data models is to integrate data in the heritage network and make it more visible. Domain models are usually more richly populated and provide consumers with more possibilities for further processing, for example in service platforms.
This document is limited to a set of classes and properties that together form the generic data model. For most datasets, this generic data model expresses only a subset of data properties that are available. If done well, the generic data invites consumers to explore the data in more depth. To facilitate further exploration, publishers MAY use domain data models of their choosing alongside the generic data model. Examples are:
- CIDOC-CRM and its derivative Linked Art for museum collections and catalogues;
- RiC-O for archives;
- PiCo for biographical data;
- RDA for libraries.
3.2. Publication principles
With RDF, it’s perfectly fine to express the same data in multiple ways. Therefore, the generic and domain data models MAY coexist in the same information resource.
{
  "@context": {
    "schema": "https://schema.org/",
    "edm": "http://www.europeana.eu/schemas/edm/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "dcterms": "http://purl.org/dc/terms/"
  },
  "@id": "https://literatuurmuseum.nl/id/123456789",
  "@type": ["schema:CreativeWork", "schema:VisualArtwork"],
  "schema:name": "Het fluitketeltje en andere versjes",
  "rdfs:label": "Het fluitketeltje en andere versjes",
  "schema:creator": {
    "@type": "schema:Person",
    "@id": "http://data.rkd.nl/artists/8342"
  },
  "dcterms:creator": {
    "@type": "dcterms:Agent",
    "@id": "http://data.rkd.nl/artists/8342"
  }
}
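As an illustration of how the two models can coexist, the following minimal Python sketch adds generic properties next to existing domain properties. The mapping table and the function name `add_generic_properties` are hypothetical; the two property pairs simply mirror those in the example above. The sketch works on plain JSON-LD dictionaries and performs no JSON-LD processing.

```python
import copy

# Hypothetical mapping from domain properties to generic (Schema.org) properties,
# mirroring the pairs that appear in the example above.
DOMAIN_TO_GENERIC = {
    "rdfs:label": "schema:name",
    "dcterms:creator": "schema:creator",
}


def add_generic_properties(resource: dict, mapping=DOMAIN_TO_GENERIC) -> dict:
    """Return a copy of the resource with generic properties added next to domain ones."""
    result = copy.deepcopy(resource)
    for domain_prop, generic_prop in mapping.items():
        if domain_prop in resource and generic_prop not in resource:
            value = copy.deepcopy(resource[domain_prop])
            # Keep only the identifier of a referenced node; its domain-specific
            # @type is not copied to the generic property.
            if isinstance(value, dict) and "@id" in value:
                value = {"@id": value["@id"]}
            result[generic_prop] = value
    return result


domain_only = {
    "@id": "https://literatuurmuseum.nl/id/123456789",
    "rdfs:label": "Het fluitketeltje en andere versjes",
    "dcterms:creator": {
        "@type": "dcterms:Agent",
        "@id": "http://data.rkd.nl/artists/8342",
    },
}
combined = add_generic_properties(domain_only)
```

The domain properties are left untouched, so consumers of either model can read the same information resource.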
Alternatively, publishers MAY separate the generic data model by using profile-based content negotiation (see [DX-PROF-CONNEG]). To do so, publish a profile with URI https://netwerk-digitaal-erfgoed.github.io/requirements-data/:
# Get the list of profiles.
GET /resource/a?profile=alt HTTP/1.1

# Server responds with a list of profiles that includes the NDE generic data model.
HTTP/1.1 200 OK
Content-Type: application/json

{
  "resource": "http://example.org/resource/a",
  "profiles": [
    {
      "token": "nde",
      "uri": "https://netwerk-digitaal-erfgoed.github.io/requirements-data/",
      "media_types": ["application/ld+json", "text/turtle"]
    },
    ...
  ]
}
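A consumer could look up the NDE profile in such a response as sketched below. This is a minimal Python sketch that assumes the response body has exactly the JSON shape shown above; the function name `find_profile_token` is hypothetical, and the sample body leaves out the elided entries.

```python
import json

# Profile URI as given in the text above.
NDE_PROFILE_URI = "https://netwerk-digitaal-erfgoed.github.io/requirements-data/"


def find_profile_token(alt_response_body: str, profile_uri: str = NDE_PROFILE_URI):
    """Return the token of the profile with the given URI, or None if it is not listed."""
    listing = json.loads(alt_response_body)
    for profile in listing.get("profiles", []):
        if profile.get("uri") == profile_uri:
            return profile.get("token")
    return None


# Sample body modelled on the response shown above.
sample_body = json.dumps({
    "resource": "http://example.org/resource/a",
    "profiles": [
        {
            "token": "nde",
            "uri": NDE_PROFILE_URI,
            "media_types": ["application/ld+json", "text/turtle"],
        }
    ],
})

token = find_profile_token(sample_body)
```

With the sample body, `token` is `"nde"`; a server that does not offer the profile yields `None`.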
3.3. Schema.org vocabulary
The generic data model presented in this document is designed as a [SCHEMA-ORG] application profile. The choice for Schema.org is substantiated in Implementation guidelines for NDE alignment § generic-data-model.
While the Schema.org website considers “both 'https://schema.org' and 'http://schema.org' (...) fine”, mixing the namespaces makes it harder to consume datasets.
Therefore, publishers MUST use the https://schema.org/ (HTTPS) namespace for Schema.org, not http://schema.org/ (HTTP).
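Publishers with existing data in the HTTP namespace could normalize it along the following lines. This is a minimal Python sketch, not a full JSON-LD processor: it assumes the data is held as plain dictionaries and lists, the function name `normalize_schema_namespace` is hypothetical, and it would also rewrite any literal value that happens to start with `http://schema.org/`.

```python
HTTP_NS = "http://schema.org/"
HTTPS_NS = "https://schema.org/"


def normalize_schema_namespace(node):
    """Recursively rewrite http://schema.org/ IRIs to https://schema.org/ in keys and values."""
    if isinstance(node, dict):
        return {
            normalize_schema_namespace(key): normalize_schema_namespace(value)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [normalize_schema_namespace(item) for item in node]
    if isinstance(node, str) and node.startswith(HTTP_NS):
        return HTTPS_NS + node[len(HTTP_NS):]
    # Numbers, booleans, None and unrelated strings are returned unchanged.
    return node


mixed = {
    "@context": {"schema": "http://schema.org/"},
    "@type": "http://schema.org/Person",
    "http://schema.org/name": "Vincent van Gogh",
}
normalized = normalize_schema_namespace(mixed)
```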
3.4. Classes
Publishers MUST type each published resource as one or more of the following classes.
Publishers SHOULD use more fine-grained classes alongside these top-level classes.
{
  "@context": "https://schema.org/",
  "@id": "https://example.com/dataset1/resource1",
  "@type": ["CreativeWork", "Photograph"]
}
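The typing requirement could be checked with a sketch like the one below. It is a minimal Python illustration that takes the class names from the subsections of this section; it assumes types are written either as bare Schema.org terms, with a `schema:` prefix, or as full `https://schema.org/` IRIs, and it does not perform JSON-LD expansion. The function names are hypothetical.

```python
# Top-level classes listed in the subsections of this section.
TOP_LEVEL_CLASSES = {
    "CreativeWork", "Event", "MediaObject", "Organization", "Person", "Place",
}
# Assumed spellings of the Schema.org namespace in @type values.
SCHEMA_PREFIXES = ("https://schema.org/", "schema:")


def local_name(type_value: str) -> str:
    """Strip a known Schema.org prefix; other values are returned unchanged."""
    for prefix in SCHEMA_PREFIXES:
        if type_value.startswith(prefix):
            return type_value[len(prefix):]
    return type_value


def has_top_level_class(resource: dict) -> bool:
    """Return True if at least one @type value is one of the top-level classes."""
    types = resource.get("@type", [])
    if isinstance(types, str):
        types = [types]
    return any(local_name(t) in TOP_LEVEL_CLASSES for t in types)


photo = {"@id": "https://example.com/dataset1/resource1",
         "@type": ["CreativeWork", "Photograph"]}
typed_ok = has_top_level_class(photo)
```

A resource typed only as `Photograph` would fail this check, because the fine-grained class is meant to accompany a top-level class, not replace it.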
3.4.1. CreativeWork
3.4.2. Event
3.4.3. MediaObject
Should we only have MediaObject or the more specific types VideoObject, AudioObject, ImageObject etc., too?
3.4.4. Organization
3.4.5. Person
3.4.6. Place
{
  "@context": "https://schema.org/",
  "@id": "https://example.com/dataset/place",
  "@type": ["Place"],
  "name": "Anne Frank Huis",
  "address": {
    "@type": "PostalAddress",
    "streetAddress": "Prinsengracht 263",
    "postalCode": "1016 GV",
    "addressLocality": "Amsterdam",
    "addressRegion": "Noord-Holland",
    "addressCountry": "Netherlands"
  }
}
3.5. Properties
This section describes how to express the data using a minimal set of properties (RDF predicates), and their ranges. The range of an RDF predicate is the set of allowed values for that predicate.
Publishers MUST express data with these properties.
3.5.1. Overview
| Property | Description | Range | Cardinality | Usage |
|---|---|---|---|---|
| schema:name | Name of a person, title of a book etc. | Language-tagged string | 1 | Required |
| schema:description | A description of the resource. | Language-tagged string | 0..n | Required (if available) |
| schema:image | An image of the resource. | URL? | 0..n | Required (if available) |
| schema:creator | The creator of the resource. | URI | 0..n | Required (if available) |
| schema:geo | The geographic location of the resource. | GeoCoordinates or GeoShape | 0..n | Required (if available) |
How to describe dates? Require only very weak schema:temporal?
3.5.2. Language
For each property with a literal value, the value’s language MUST be specified. The language MUST be expressed as a language code from [BCP47], such as ‘nl’ or ‘nl-NL’.
{
  "@context": "https://schema.org/",
  "@id": "https://example.com/dataset1/resource1",
  "@type": ["CreativeWork"],
  "name": {
    "@language": "nl-NL",
    "@value": "Het fluitketeltje en andere versjes"
  }
}
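The language requirement could be checked roughly as follows. This minimal Python sketch makes several assumptions: literal values are written as JSON-LD value objects as in the example above, a default `@language` in the context is not taken into account, and the regular expression is only a loose approximation of [BCP47] syntax. The function name `untagged_literals` and the default property list are hypothetical.

```python
import re

# Loose approximation of BCP 47 syntax: a primary language subtag of 2-3 letters,
# followed by optional subtags of 1-8 alphanumeric characters.
LANGUAGE_TAG = re.compile(r"^[A-Za-z]{2,3}(-[A-Za-z0-9]{1,8})*$")


def untagged_literals(resource: dict, properties=("name", "description")) -> list:
    """Return the properties that have at least one value without a well-formed @language."""
    problems = []
    for prop in properties:
        values = resource.get(prop, [])
        if not isinstance(values, list):
            values = [values]
        for value in values:
            tag = value.get("@language") if isinstance(value, dict) else None
            if not (isinstance(tag, str) and LANGUAGE_TAG.match(tag)):
                problems.append(prop)
                break  # One report per property is enough.
    return problems


tagged = {
    "@type": ["CreativeWork"],
    "name": {"@language": "nl-NL", "@value": "Het fluitketeltje en andere versjes"},
}
issues = untagged_literals(tagged)
```

For the example resource `issues` is empty; a plain string such as `"name": "Het fluitketeltje"` would be reported.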
3.5.3. name (required)
Use schema:name and/or rdfs:label?
Should we require a language-tagged string? E.g. person (Vincent van Gogh) or organization names (Van Gogh Museum) can be considered to be untagged.
3.5.4. description
3.5.5. image
3.5.6. license
Does license make sense on the level of individual resources? Or perhaps only on certain types of resources, such as media?
3.5.7. author
Use schema:author or schema:creator?
Where available, reference a person from the Network of Terms.
3.5.8. geo
{
  "@context": "https://schema.org",
  "@type": "Place",
  "geo": {
    "@type": "GeoCoordinates",
    "latitude": "40.75",
    "longitude": "-73.98"
  }
}
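Before publishing, coordinate values could be sanity-checked with a sketch like the following. It is a minimal Python illustration that assumes WGS 84 decimal degrees (latitude between -90 and 90, longitude between -180 and 180) and accepts both strings and numbers; the function name `valid_geo_coordinates` is hypothetical.

```python
def valid_geo_coordinates(geo: dict) -> bool:
    """Return True if latitude and longitude are present, numeric and within WGS 84 ranges."""
    try:
        latitude = float(geo["latitude"])
        longitude = float(geo["longitude"])
    except (KeyError, TypeError, ValueError):
        # Missing keys or values that cannot be read as numbers.
        return False
    return -90.0 <= latitude <= 90.0 and -180.0 <= longitude <= 180.0


geo = {"@type": "GeoCoordinates", "latitude": "40.75", "longitude": "-73.98"}
geo_ok = valid_geo_coordinates(geo)
```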
4. Formal definition
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix schema: <https://schema.org/> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

_:CreativeWorkShape a sh:NodeShape ;
  sh:targetClass schema:CreativeWork ;
  sh:property _:NameProperty, _:DescriptionProperty, _:CreatorProperty .

_:NameProperty a sh:PropertyShape ;
  sh:path schema:name ;
  sh:datatype rdf:langString ;
  sh:minCount 1 .

_:DescriptionProperty a sh:PropertyShape ;
  sh:path schema:description ;
  sh:datatype rdf:langString ;
  sh:minCount 1 .

_:ImageProperty a sh:PropertyShape ;
  sh:path schema:image ;
  sh:class schema:ImageObject ;
  sh:minCount 0 .

# Creators are resources, so they are constrained by class, not by datatype.
_:CreatorProperty a sh:PropertyShape ;
  sh:path schema:creator ;
  sh:or ( [ sh:class schema:Person ] [ sh:class schema:Organization ] ) ;
  sh:minCount 1 .

_:GeoCoordinatesShape a sh:NodeShape ;
  sh:targetClass schema:GeoCoordinates ;
  sh:property [
    sh:path schema:latitude ;
    sh:datatype xsd:float ;
    sh:minCount 1 ;
    sh:maxCount 1 ;
  ] , [
    sh:path schema:longitude ;
    sh:datatype xsd:float ;
    sh:minCount 1 ;
    sh:maxCount 1 ;
  ] .

_:PlaceShape a sh:NodeShape ;
  sh:targetClass schema:Place ;
  sh:property [
    sh:path schema:geo ;
    sh:or ( [ sh:class schema:GeoCoordinates ] [ sh:class schema:GeoShape ] ) ;
    sh:minCount 0 ;
    sh:maxCount 1 ;
  ] .