Schema.org Application Profile

Living Standard

This version:
https://netwerk-digitaal-erfgoed.github.io/schema-profile/
Issue Tracking:
GitHub
Inline In Spec
Editors:
David de Boer (Netwerk Digitaal Erfgoed)

Abstract

This document specifies the generic data model to be used when publishing linked data in the heritage network. The model enables dataset consumers, including software developers building data-consuming applications, to use, understand and combine datasets from multiple sources, thus fulfilling the promise of linked data.

Note: Please don’t rely on this document yet as it’s still under discussion and development.

1. Introduction

1.1. Goal

This document prescribes the generic data model to be used when publishing linked data in the heritage network. The model consists of a minimal set of classes and properties. It is based on:

By adhering to this model, dataset publishers ensure that their data is visible and can be consumed and combined with other datasets in the network.

1.2. Scope

These requirements are restricted in three ways:

  1. they apply only to the way published data is expressed, not how it is stored or managed internally;

  2. they prescribe a generic data model and leave the use of domain data models up to dataset publishers;

  3. they bear upon datasets, not their descriptions; for the latter see [NDE-DATASETS].

1.3. Examples

While RDF examples in this document are in the [JSON-LD] RDF serialization, publishers MAY use any RDF serialization format, such as [Turtle] or [N3].

2. Definitions

Data model

Set of classes and their properties that defines how data is expressed.

Generic data model

A simple, shared data model; the scope of this document. See also [NDE-ALIGNMENT]. Can be used alongside domain data models.

Domain data model

A domain-specific data model, such as CIDOC-CRM, Linked Art, RiC-O or RDA. Can be used alongside a generic data model. Adds precision at the cost of complexity. Out of this document’s scope.

Metadata record

An RDF resource that expresses one of the top-level classes in the § 4 Data model.

Term

A word, name, acronym, phrase or other symbol with a formal definition, published in the Network of Terms.

3. General considerations

3.1. Generic and domain data models

The purpose of generic data models is to integrate data in the heritage network and make it more visible. Domain models are usually more richly populated and provide consumers with more possibilities for further processing, for example in service platforms.

This document is limited to a set of classes and properties that together form the generic data model. For most datasets, the generic data model expresses only a subset of data properties that are available. This document’s purpose, therefore, is not a complete and correct expression of the source data, but an easily understandable and usable one.

If done well, the generic data invites consumers to explore the data in more depth using the domain data models. So to facilitate further exploration, publishers MAY use domain data models of their choosing alongside the generic data model. Examples are:

3.2. Vocabulary

The generic data model presented in this document is designed as a [SCHEMA-ORG] application profile. The choice for Schema.org is substantiated in Implementation guidelines for NDE alignment § generic-data-model.

While the Schema.org website considers “both 'https://schema.org' and 'http://schema.org' (...) fine”, mixing the namespaces makes it harder to consume datasets.

Therefore, publishers MUST use the https://schema.org/ (HTTPS) namespace for Schema.org, not http://schema.org/ (HTTP).
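
As an illustration of the namespace rule, the following minimal Python sketch rewrites Schema.org IRIs from the HTTP namespace to the HTTPS namespace. The function name to_https_namespace is hypothetical and not part of this profile; the sketch assumes IRIs are available as plain strings.

```python
HTTP_NS = "http://schema.org/"
HTTPS_NS = "https://schema.org/"


def to_https_namespace(iri: str) -> str:
    """Rewrite a Schema.org IRI in the HTTP namespace to the HTTPS namespace.

    IRIs outside the Schema.org HTTP namespace are returned unchanged.
    """
    if iri.startswith(HTTP_NS):
        return HTTPS_NS + iri[len(HTTP_NS):]
    return iri


print(to_https_namespace("http://schema.org/CreativeWork"))
```

A publisher could apply such a rewrite to predicate and class IRIs before serializing a dataset; it does not touch instance URIs hosted elsewhere.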

3.3. Language

For each property with a literal value, the value’s language MUST be specified. The language MUST be expressed as a language code from [BCP47], such as ‘nl’ or ‘nl-NL’.

Specifying the language of the name property:
{
  "@context": "https://schema.org/",
  "@id": "https://n2t.net/ark:/123456/1",
  "@type": "CreativeWork",
  "name": [
    {
      "@language": "nl",
      "@value": "De Sterrennacht"
    },
    {
      "@language": "en",
      "@value": "The Starry Night"
    }
  ]
}

Even if only one language is available, the language MUST be specified.

Specify the language property even for a single value:
{
  "@context": "https://schema.org/",
  "name": {
    "@language": "nl",
    "@value": "De Sterrennacht"
  }
}
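
The language requirement can be checked mechanically. The sketch below is a minimal, hypothetical helper (untagged_values is not defined by this profile) that assumes a property value has already been loaded from JSON-LD as Python dicts, lists or strings, and that no default @language is set in the context. The regular expression only checks the rough shape of a tag such as ‘nl’ or ‘nl-NL’; it is not a full [BCP47] validator.

```python
import re

# Loose shape check for tags such as "nl" or "nl-NL"; not a full BCP 47 validator.
LANGUAGE_TAG = re.compile(r"[A-Za-z]{2,3}(-[A-Za-z0-9]{1,8})*")


def untagged_values(value) -> list:
    """Return the literal values that lack a plausible @language tag."""
    values = value if isinstance(value, list) else [value]
    problems = []
    for item in values:
        if isinstance(item, dict) and "@value" in item:
            tag = str(item.get("@language", ""))
            if not LANGUAGE_TAG.fullmatch(tag):
                problems.append(item["@value"])
        elif isinstance(item, str):
            # A bare JSON string carries no language tag at all.
            problems.append(item)
    return problems


print(untagged_values([{"@language": "nl", "@value": "De Sterrennacht"}, "The Starry Night"]))
```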

3.4. Publication method

3.4.1. Combined

With RDF, it’s perfectly fine to express the same data in multiple ways. Therefore, the generic and domain data models MAY coexist in the same information resource.

Combine generic (Schema.org) modelling with properties from other vocabularies (here RDFS and Dublin Core Terms) in the same resource:
 {
   "@context": {
     "schema": "https://schema.org/",
     "edm": "http://www.europeana.eu/schemas/edm/",
     "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
     "dcterms": "http://purl.org/dc/terms/"
   },
   "@id": "https://literatuurmuseum.nl/id/123456789",
   "@type": ["schema:CreativeWork", "schema:VisualArtwork"],
   "schema:name": "Het fluitketeltje en andere versjes",
   "rdfs:label": "Het fluitketeltje en andere versjes",
   "schema:creator": {
     "@type": "schema:Person",
     "@id": "http://data.rkd.nl/artists/8342"
   },
   "dcterms:creator": {
     "@type": "dcterms:Agent",
     "@id": "http://data.rkd.nl/artists/8342"
   }
}

3.4.2. Separate profiles

Alternatively, publishers MAY separate the generic data model by using profile-based content negotiation (see [DX-PROF-CONNEG]). To do so, publish a profile with URI https://netwerk-digitaal-erfgoed.github.io/schema-profile/.

Expose the generic data model in its own content-negotiated profile.
# Get the list of profiles.
GET /resource/a?profile=alt HTTP/1.1

# Server responds with a list of profiles that includes the NDE generic data model.
HTTP/1.1 200 OK
Content-Type: application/json

{
    "resource": "http://example.org/resource/a",
    "profiles": [
        {
            "token": "nde",
            "uri": "https://netwerk-digitaal-erfgoed.github.io/schema-profile/",
            "media_types": ["application/ld+json", "text/turtle"]
        },
        ...
    ]
}
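
On the consumer side, a client could inspect such a listing to find out in which media types the generic data model is offered. The following Python sketch assumes the server returns a JSON body shaped like the example above (that shape is illustrative, not prescribed by this document); the function name media_types_for_profile is hypothetical.

```python
import json

NDE_PROFILE = "https://netwerk-digitaal-erfgoed.github.io/schema-profile/"


def media_types_for_profile(listing_json: str, profile_uri: str = NDE_PROFILE) -> list:
    """Return the media types offered for the given profile URI, or an empty list."""
    listing = json.loads(listing_json)
    for profile in listing.get("profiles", []):
        if profile.get("uri") == profile_uri:
            return profile.get("media_types", [])
    return []


# Example listing, mirroring the response shown above.
example_listing = json.dumps({
    "resource": "http://example.org/resource/a",
    "profiles": [
        {
            "token": "nde",
            "uri": NDE_PROFILE,
            "media_types": ["application/ld+json", "text/turtle"],
        }
    ],
})

print(media_types_for_profile(example_listing))
```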

4. Data model

This section describes the classes and properties that MUST be used to publish metadata records in the heritage network.

Each record MUST be typed as one of the following classes:

For each of these classes, the sections below list the REQUIRED and OPTIONAL properties.

4.1. CreativeWork

The most generic kind of item created by humans, i.e. heritage objects.

Candidate properties [Issue #3]

4.1.1. Subclasses

Publishers SHOULD use more fine-grained classes alongside the top-level class CreativeWork. Examples include:

A painting is typed as both top-level CreativeWork and the more specific Painting:
{
  "@context": "https://schema.org/",
  "@id": "https://n2t.net/ark:/123456/1",
  "@type": ["CreativeWork", "Painting"]
}
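
A publisher that already types records with a specific class can add the top-level class programmatically. This is a minimal sketch under the assumption that a record is a Python dict loaded from JSON-LD with compact type names; types_with_top_level is a hypothetical helper name.

```python
def types_with_top_level(resource: dict, top_level: str = "CreativeWork") -> list:
    """Return the resource's types as a list that includes the top-level class."""
    types = resource.get("@type", [])
    if isinstance(types, str):
        types = [types]
    # Keep existing types and put the top-level class first when it is missing.
    return list(types) if top_level in types else [top_level, *types]


print(types_with_top_level({"@type": "Painting"}))
```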

4.1.2. URI (required)

Each CreativeWork MUST be identified by a persistent URI. Blank nodes MUST NOT be used for CreativeWorks.

Specify the URI in the JSON-LD @id property:
{
  "@context": "https://schema.org/",
  "@id": "https://example.com/dataset1/resource1",
  "@type": "CreativeWork"
}
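
The URI requirement can be partly verified in code. The sketch below (has_usable_uri is a hypothetical name) rejects blank node identifiers and anything that is not an absolute HTTP(S) URI; restricting to HTTP(S) is an assumption of this sketch, and persistence itself cannot be checked this way.

```python
from urllib.parse import urlparse


def has_usable_uri(resource: dict) -> bool:
    """Check that @id is an absolute HTTP(S) URI and not a blank node identifier."""
    identifier = resource.get("@id", "")
    if not isinstance(identifier, str) or identifier.startswith("_:"):
        return False
    parsed = urlparse(identifier)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


print(has_usable_uri({"@id": "https://example.com/dataset1/resource1"}))
```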

Do we need identifier alongside URI? Not from a web perspective (where we care only about URIs) but perhaps identifier is useful to reference physical objects, e.g. in a museum.

4.1.3. name (required)

A REQUIRED property to indicate the CreativeWork’s name, assigned either by its creator or by others. The name MUST be a language-tagged string:

A language-tagged name:
{
  "@context": "https://schema.org/",
  "@id": "https://example.com/dataset1/resource1",
  "@type": "CreativeWork",
  "name": [
    {
      "@language": "nl",
      "@value": "De Sterrennacht"
    },
    {
      "@language": "en",
      "@value": "The Starry Night"
    }
  ]
}

4.1.4. creator (required)

A REQUIRED property that identifies the person(s) or organization that created the CreativeWork. If a term is available, that MUST be referenced. If not, a Person or Organization resource MUST be used instead.

Van Gogh’s painting The Starry Night:
{
  "@context": "https://schema.org",
  "@type": ["CreativeWork", "Painting"],
  "@id": "http://www.wikidata.org/entity/Q45585",
  "creator": {
    "@id": "https://data.rkd.nl/artists/32439",
    "@type": "Person",
    "name": "Vincent van Gogh"
  }
}

Even where more specific properties, applicable to CreativeWork’s subtypes, are available in Schema.org, such as artist, composer and director, the creator property MUST be used for consistency.
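
Source data that only uses the more specific properties can be mapped to creator before publication. The following is a minimal sketch, assuming records are Python dicts with compact Schema.org property names; with_creator is a hypothetical helper, and whether every such property should be treated as a creator is a judgment left to the publisher.

```python
# Specific Schema.org properties named in the text above.
SPECIFIC_CREATOR_PROPERTIES = ("artist", "composer", "director")


def with_creator(resource: dict) -> dict:
    """Return a copy of the resource with `creator` filled from specific properties."""
    result = dict(resource)
    if "creator" in result:
        return result
    creators = []
    for prop in SPECIFIC_CREATOR_PROPERTIES:
        value = result.get(prop)
        if value is None:
            continue
        creators.extend(value if isinstance(value, list) else [value])
    if creators:
        result["creator"] = creators[0] if len(creators) == 1 else creators
    return result


print(with_creator({"@type": "Painting", "artist": {"@id": "https://data.rkd.nl/artists/32439"}}))
```

The specific property is kept alongside creator, so no information from the source is lost.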

4.1.5. isPartOf (required)

A REQUIRED property that points to the dataset(s) that the CreativeWork is part of. Note that a CreativeWork may be part of multiple datasets. The dataset MUST be typed as a Dataset.

{
  "@context": "https://schema.org/",
  "@id": "https://n2t.net/ark:/123456/1",
  "@type": "CreativeWork",
  "isPartOf": {
    "@id": "https://organization.com/dataset1",
    "@type": "Dataset"
  }
}

4.1.6. associatedMedia (required)

Or use specialized properties schema:image, schema:video, schema:audio alongside or without schema:associatedMedia?

A media object that represents the CreativeWork. This property is REQUIRED if applicable, i.e. if at least one media object is available for the metadata record.

A IIIF image representation of The Starry Night:
{
  "@context": "https://schema.org/",
  "@id": "http://www.wikidata.org/entity/Q45585",
  "@type": "CreativeWork",
  "associatedMedia": {
    "@id": "https://demo.limb-gallery.com/idurl/1/25290",
    "@type": "ImageObject",
    "contentUrl": "https://demo.limb-gallery.com/iiif/25290/manifest",
    "encodingFormat": "application/ld+json"
  }
}

See MediaObject for this property’s allowed values.

How to refer to media that is not part of the dataset, such as external images that are used not as unique representations but as illustrations of the CreativeWork?

4.1.7. description

An OPTIONAL property that describes the CreativeWork in one sentence. The description MUST be free of jargon and abbreviations so it can be understood by others. The value MUST be a language-tagged string.

A one-sentence description:
{
  "@context": "https://schema.org/",
  "@id": "https://example.com/dataset1/resource1",
  "description": [
    {
      "@language": "nl",
      "@value": "Olieverfschilderij van het uitzicht uit Van Goghs ziekenhuiskamer in Saint-Rémy-de-Provence, vlak voor zonsopkomst."
    },
    {
      "@language": "en",
      "@value": "Oil-on-canvas painting depicting the view from Van Gogh’s asylum room at Saint-Rémy-de-Provence, just before sunrise."
    }
  ]
}

4.1.8. abstract

An OPTIONAL property that provides a longer summarizing description of the CreativeWork.

{
  "@context": "https://schema.org/",
  "@id": "https://example.com/dataset1/resource1",
  "abstract": [
    {
      "@language": "nl",
      "@value": "Het schilderij is een nachttafereel met gele sterren boven een kleine stad met heuvels. Het is een uitzicht vanuit een denkbeeldig punt over een dorp met kerktoren, met links een vlammende cipres en rechts olijfbomen tegen de heuvels op."
    }
  ]
}

4.1.9. license

Does license make sense on the level of individual resources or should we delegate to the level of the dataset? Or perhaps only on certain types of resources, such as media?

4.1.10. contentLocation

An OPTIONAL property that indicates the location depicted or described in the CreativeWork. For example, the location in a photograph or painting.

If available, a term MUST be referenced. If not, a Place resource MUST be used.

Van Gogh’s painting The Starry Night:
 {
   "@context": "https://schema.org",
   "@id": "http://www.wikidata.org/entity/Q45585",
   "contentLocation": {
     "@id": "http://www.wikidata.org/entity/Q221507",
     "@type": "Place",
     "name": "Saint-Rémy-de-Provence"
  }
}

4.1.11. locationCreated

An OPTIONAL property that indicates the location where the CreativeWork was created (which may be different from its contentLocation).

If available, a term MUST be referenced. If not, a Place resource MUST be used.

Van Gogh’s painting The Starry Night:
 {
   "@context": "https://schema.org",
   "@id": "http://www.wikidata.org/entity/Q45585",
   "locationCreated": {
     "@id": "http://www.wikidata.org/entity/Q221507",
     "@type": "Place",
     "name": "Saint-Rémy-de-Provence"
  }
}

4.1.12. dateCreated

An OPTIONAL property that indicates the date the CreativeWork was created.

The value MUST be in [ISO8601] format. Partial dates MAY be used if the exact date is unknown.

Van Gogh painted The Starry Night in June 1889:
 {
   "@context": "https://schema.org",
   "@id": "http://www.wikidata.org/entity/Q45585",
   "dateCreated": "1889-06"
}
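
Date values, including partial dates, can be validated with a simple pattern. The sketch below accepts only the calendar forms YYYY, YYYY-MM and YYYY-MM-DD; that restriction is an assumption of this sketch, since [ISO8601] allows more forms (for example times and week dates), and it does not check the number of days per month. The name is_iso_date is hypothetical.

```python
import re

# Year, optionally followed by month, optionally followed by day.
ISO_DATE = re.compile(r"\d{4}(-(0[1-9]|1[0-2])(-(0[1-9]|[12]\d|3[01]))?)?")


def is_iso_date(value: str) -> bool:
    """Return True for YYYY, YYYY-MM or YYYY-MM-DD strings."""
    return ISO_DATE.fullmatch(value) is not None


for candidate in ("1889", "1889-06", "1889-06-18", "June 1889"):
    print(candidate, is_iso_date(candidate))
```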

4.1.13. about

An OPTIONAL property to indicate the subject matter of the CreativeWork: for example, the subjects depicted in a painting or photograph, or the subjects that a story is about.

The value MUST reference terms.

If the subject is a location, it MUST be listed under contentLocation instead.

The Starry Night depicts ‘starry sky’ and ‘Moon’.
 {
   "@context": "https://schema.org",
   "@id": "http://www.wikidata.org/entity/Q45585",
   "about": [
     {
       "@id": "http://www.wikidata.org/entity/Q149908",
       "@type": "DefinedTerm"
     },
     {
       "@id": "http://www.wikidata.org/entity/Q405",
       "@type": "DefinedTerm"
     }
   ]
}

Whereas schema:about has range schema:Thing, schema:material and other properties do not. This means we can use schema:DefinedTerm for schema:about but not for schema:material. Should we drop schema:DefinedTerm completely?

4.1.14. material

An OPTIONAL property that indicates the material(s) that the CreativeWork is made from, e.g. leather, wool, cotton, paper. The value MUST reference terms.

The Starry Night is made from ‘oil paint’ and ‘canvas’:
{
  "@context": "https://schema.org",
  "@id": "http://www.wikidata.org/entity/Q45585",
  "material": [
    {
      "@id": "http://vocab.getty.edu/aat/300015050"
    },
    {
      "@id": "http://vocab.getty.edu/aat/300014078"
    }
  ]
}

4.1.15. genre

An OPTIONAL property that indicates the genre(s) of the CreativeWork, for example art movements or periods.

The value MUST reference a term.

The Starry Night belongs to the Post-Impressionist art movement:
{
  "@context": "https://schema.org",
  "@id": "http://www.wikidata.org/entity/Q45585",
  "genre": {
    "@id": "http://vocab.getty.edu/aat/300021508"
  }
}

4.2. Person

If a metadata record is a person, it MUST be typed as Person. If a term is available for the person, that MUST be referenced. If not, the person MUST be defined by the required properties listed below.

The objective for the Person model is not to fully describe all aspects of a person, but to easily identify and distinguish between similar persons.

Consider candidate properties nationality, description, familyName, givenName.

4.2.1. name (required)

A REQUIRED property that indicates the Person’s full name in its preferred display form:

Person with a language-tagged name:
{
  "@context": "https://schema.org",
  "@type": "Person",
  "@id": "https://n2t.net/ark:/123456/2",
  "name": {
    "@language": "nl-NL",
    "@value": "Pluk van de Petteflat"
  }
}

Does it make sense to require person names to be language-tagged? Think about languages that show names in a different format, such as ZH.

4.2.2. birthDate

An OPTIONAL property that indicates the person’s date of birth in [ISO8601] format.

4.2.3. birthPlace

An OPTIONAL property that references the person’s place of birth. The value MUST reference a term.

{
  "@context": "https://schema.org",
  "@id": "https://n2t.net/ark:/123456/2",
  "birthPlace": {
    "@id": "https://sws.geonames.org/2745912/"
  }
}

4.2.4. deathDate

An OPTIONAL property that indicates the person’s date of death in [ISO8601] format.

4.2.5. deathPlace

An OPTIONAL property that references the person’s place of death. The value MUST reference a term.

{
  "@context": "https://schema.org",
  "@id": "https://n2t.net/ark:/123456/2",
  "deathPlace": {
    "@id": "http://www.wikidata.org/entity/Q131153786"
  }
}

4.2.6. hasOccupation

An OPTIONAL property that indicates the person’s occupation. The value MUST reference a term.

A carpenter:
{
  "@context": "https://schema.org",
  "@id": "https://n2t.net/ark:/123456/2",
  "hasOccupation": {
    "@id": "http://vocab.getty.edu/aat/300025008"
  }
}

4.3. Organization

4.3.1. name (required)

A REQUIRED property that indicates the Organization’s full name in its preferred display form.

Do we need more properties for Organization?

4.4. MediaObject

In case of image, video or audio objects, the relevant subclass MUST be used:

In case of other types of media, the generic class MediaObject MUST be used.

4.4.1. ImageObject

An ImageObject MUST have a contentUrl property that points to a IIIF Presentation API manifest.

See [Issue #2]

Should we support non-IIIF clients/users?

A simple image of a painting.
{
  "@context": "https://schema.org/",
  "@id": "https://example.com/image1",
  "@type": "ImageObject",
  "contentUrl": "https://example.com/image1/manifest.json",
  "encodingFormat": "application/ld+json"
}

4.4.2. AudioObject

TODO

4.4.3. VideoObject

TODO

4.5. Place

For properties that reference locations, if no term is available, a custom Place resource MUST be used.

4.5.1. address (required)

A property that indicates the Place’s address, REQUIRED if known.

REQUIRED address properties are:

A place with an address:
{
  "@context": "https://schema.org/",
  "@id": "https://example.com/dataset/place",
  "@type": "Place",
  "address": {
    "@type": "PostalAddress",
    "streetAddress": "Street 123",
    "postalCode": "1234 AB",
    "addressLocality": "City",
    "addressRegion": "Noord-Holland",
    "addressCountry": "NL"
  }
}

4.5.2. geo (required)

A property that indicates the Place’s geo coordinates, REQUIRED if known.

A place with coordinates:
{
  "@context": "https://schema.org/",
  "@id": "https://example.com/dataset/place",
  "@type": "Place",
  "geo": {
    "@type": "GeoCoordinates",
    "latitude": "37.42242",
    "longitude": "-122.08585"
  }
}
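
Coordinates can be sanity-checked before publication. This minimal sketch (valid_coordinates is a hypothetical name) assumes the geo value is a Python dict with latitude and longitude given as numbers or numeric strings, in decimal degrees.

```python
def valid_coordinates(geo: dict) -> bool:
    """Check that latitude and longitude are numeric and within valid ranges."""
    try:
        latitude = float(geo["latitude"])
        longitude = float(geo["longitude"])
    except (KeyError, TypeError, ValueError):
        return False
    return -90.0 <= latitude <= 90.0 and -180.0 <= longitude <= 180.0


print(valid_coordinates({"latitude": "37.42242", "longitude": "-122.08585"}))
```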

5. Example

A full example of a metadata record:

Add full example.

{
  "@context": "https://schema.org/",
  "@id": "https://literatuurmuseum.nl/id/123456789",
  "@type": "CreativeWork",
  "name": {
    "@language": "nl",
    "@value": "Het fluitketeltje en andere versjes"
  },
  "creator": {
    "@type": "Person",
    "@id": "http://data.rkd.nl/artists/8342"
  },
  "material": {
    "@id": "https://data.cultureelerfgoed.nl/term/id/cht/2d28d9aa-77e8-40ab-b0fe-f04d99f57955"
  },
  "dateCreated": "1950"
}
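
Before publishing, a record can be checked against the properties that § 4.1 marks as required. The sketch below is a simplified, hypothetical check (missing_required is not part of this profile) on a record loaded as a Python dict; it leaves out associatedMedia because that property is only required when media is available. The sample record and its URIs are made up for illustration.

```python
# Properties that section 4.1 marks as required for every CreativeWork.
REQUIRED_CREATIVE_WORK_PROPERTIES = ("@id", "name", "creator", "isPartOf")


def missing_required(record: dict) -> list:
    """Return the required properties that are absent or empty in the record."""
    return [prop for prop in REQUIRED_CREATIVE_WORK_PROPERTIES if not record.get(prop)]


# Hypothetical record that lacks a dataset reference.
sample_record = {
    "@id": "https://example.com/dataset1/resource1",
    "@type": "CreativeWork",
    "name": {"@language": "nl", "@value": "De Sterrennacht"},
    "creator": {"@id": "https://data.rkd.nl/artists/32439"},
}

print(missing_required(sample_record))
```

A full validation would use the SHACL shapes in § 6; this check only covers the presence of properties.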

6. Formal definition

This SHACL file does not yet reflect all changes in the text above.

A formal definition of the generic data model in [SHACL].
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix schema: <https://schema.org/> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

_:CreativeWorkShape
    a sh:NodeShape ;
    sh:targetClass schema:CreativeWork ;
    sh:property
        _:NameProperty ,
        _:DescriptionProperty ,
        _:CreatorProperty .

_:NameProperty
    a sh:PropertyShape ;
    sh:path schema:name ;
    sh:datatype rdf:langString ;
    sh:minCount 1 .

_:DescriptionProperty
    a sh:PropertyShape ;
    sh:path schema:description ;
    sh:datatype rdf:langString ;
    sh:minCount 0 .

_:ImageProperty
    a sh:PropertyShape ;
    sh:path schema:image ;
    sh:class schema:ImageObject ;
    sh:minCount 0 .

_:CreatorProperty
    a sh:PropertyShape ;
    sh:path schema:creator ;
    sh:or (
              [ sh:class schema:Person ]
              [ sh:class schema:Organization ]
          ) ;
    sh:minCount 1 .

_:GeoCoordinatesShape
    a sh:NodeShape ;
    sh:targetClass schema:GeoCoordinates ;
    sh:property [
        sh:path schema:latitude ;
        sh:datatype xsd:float  ;
        sh:minCount 1 ;
        sh:maxCount  1 ;
    ] ,
    [
        sh:path schema:longitude  ;
        sh:datatype xsd:float ;
        sh:minCount 1 ;
        sh:maxCount  1 ;
    ] .

_:PlaceShape
    a sh:NodeShape ;
    sh:targetClass schema:Place ;
    sh:property [
        sh:path schema:geo ;
        sh:or (
            [ sh:class schema:GeoCoordinates ]
            [ sh:class schema:GeoShape ]
        ) ;
        sh:minCount 0 ;
        sh:maxCount 1 ;
    ] .

Conformance

Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in the normative parts of this document are to be interpreted as described in [RFC2119].

All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes.

Examples in this specification are introduced with the words “for example” or are set apart from the normative text with class="example", like this:

This is an example of an informative example.

Informative notes begin with the word “Note” and are set apart from the normative text with class="note", like this:

Note, this is an informative note.

References

Normative References

[BCP47]
A. Phillips, Ed.; M. Davis, Ed. Tags for Identifying Languages. September 2009. Best Current Practice. URL: https://www.rfc-editor.org/rfc/rfc5646
[DX-PROF-CONNEG]
Lars G. Svensson; Rob Atkinson; Nicholas Car. Content Negotiation by Profile. URL: https://w3c.github.io/dx-connegp/connegp/
[ISO3166-1]
Codes for the representation of names of countries and their subdivisions — Part 1: Country code. August 2020. Published. URL: https://www.iso.org/standard/72482.html
[ISO8601]
Representation of dates and times. ISO 8601:2004. 2004. URL: http://www.iso.org/iso/catalogue_detail?csnumber=40874
[JSON-LD]
Manu Sporny; Gregg Kellogg; Markus Lanthaler. JSON-LD 1.0. 3 November 2020. REC. URL: https://www.w3.org/TR/json-ld/
[N3]
Tim Berners-Lee; Dan Connolly. Notation3 (N3): A readable RDF syntax. 14 January 2008. W3C Team Submission. URL: https://www.w3.org/TeamSubmission/2008/SUBM-n3-20080114/
[NDE-ALIGNMENT]
Sjors de Valk; Ivo Zandhuis; Bob Coret. Implementation guidelines for NDE alignment. URL: https://netwerk-digitaal-erfgoed.github.io/cm-implementation-guidelines/
[NDE-DATASETS]
David de Boer; Bob Coret. NDE Requirements for Datasets. Living Specification. URL: https://netwerk-digitaal-erfgoed.github.io/requirements-datasets/
[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://datatracker.ietf.org/doc/html/rfc2119
[SCHEMA-ORG]
W3C Schema.org Community Group. Schema.org. 6.0. URL: https://schema.org/
[SHACL]
Holger Knublauch; Dimitris Kontokostas. Shapes Constraint Language (SHACL). URL: https://w3c.github.io/data-shapes/shacl/
[Turtle]
Eric Prud'hommeaux; Gavin Carothers. RDF 1.1 Turtle. URL: https://w3c.github.io/rdf-turtle/spec/
