Linked Open Data

Last Updated 14 January 2026

DESCRIPTION

Linked Open Data (LOD) is data that has been structured so that machines can read it and that is openly accessible online to anyone with an internet connection (the 'open' in the name). The data should be in a non-proprietary format, readable by both humans and machines, and released under an open licence such as CC-BY (Meyers, n.d.). LOD builds on the Linked Data model created by Tim Berners-Lee, known as the inventor of the World Wide Web. It must be structured using recognised standards so that computers can interpret the data and perform reliable analysis (Blaney, 2017), and it should follow the set of best practices known as the Linked Data Principles. These establish two primary ideas: first, the use of clear standards for representing and accessing data on the internet; second, the application of these principles to create reliable hyperlinks, or connections, between data from different sources. These hyperlinks connect Linked Data together into what is called a 'global data graph', allowing computerised analysis of a related body of information (Bizer, Vidal & Skaf-Molli, 2018).

All elements must be assigned unique identifiers so that similar data can be disambiguated, and these identifiers should be drawn, where possible, from existing, widely used, mature and trusted databases. The elements identified may include individuals and places, for instance. Once identifiers have been assigned, the relationships between the elements must be described semantically. This is done via what are called 'triples', which are composed of three pieces of information: the subject, the predicate, and the object. The predicate states the relationship between the other two data elements, which have already been assigned unique, disambiguating identifiers. LOD must consist of these triples, and where possible their component elements and the ways relationships are described should follow formal standards. An important way to ensure that the correct individual or element is referred to is to use a URI, or Uniform Resource Identifier, which points to the element in the exact database it has been drawn from (Blaney, 2017).
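The subject, predicate and object structure described above can be sketched in a few lines of Python. This is a minimal illustration, not a recommended toolchain: the `Triple` class, the `to_ntriples` helper and every `http://example.org/` identifier are assumptions made up for the example, while `http://purl.org/dc/terms/creator` is the Dublin Core term for a creator. A real project would draw its identifiers from an established authority database instead.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement: a subject, a predicate and an object, each a URI."""
    subject: str
    predicate: str
    obj: str


# Hypothetical namespace for illustration only.
EX = "http://example.org/"
# Dublin Core 'creator' term, reused rather than inventing a new predicate.
DCT_CREATOR = "http://purl.org/dc/terms/creator"

triples = [
    # "Frankenstein was created by Mary Shelley."
    Triple(EX + "work/frankenstein", DCT_CREATOR, EX + "person/mary-shelley"),
    # "Mary Shelley was born in London." (hypothetical 'bornIn' predicate)
    Triple(EX + "person/mary-shelley", EX + "vocab/bornIn", EX + "place/london"),
]


def to_ntriples(triple: Triple) -> str:
    """Write a triple of three URIs as one line in N-Triples syntax."""
    return f"<{triple.subject}> <{triple.predicate}> <{triple.obj}> ."


ntriples_lines = [to_ntriples(t) for t in triples]
for line in ntriples_lines:
    print(line)
```

Because each element is a URI and not a bare name, two different people who happen to share a name would receive different identifiers and could not be confused.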

LOD can 'answer' complex questions by the way it organises knowledge. When triples are linked together, they form what is called a conceptual graph, which can then be queried using a query language such as SPARQL to retrieve answers from the data. Models of the way information in a particular area of study is connected, and of how these connections can be represented, are called ontologies. Selecting from existing ontological vocabularies relevant to the subject or discipline for use in an LOD project is accepted as best practice, rather than attempting to author one's own. While LOD provides similar results to relational databases, it can advance the query potential and increase the data available by linking datasets that are not already linked and were not necessarily intended to be linked. LOD uses a standard called the Resource Description Framework (RDF), a recognised data model that describes 'how data is structured on a theoretical level' (Blaney, 2017).
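As a rough sketch of how linked triples can be queried, the Python below merges two small, separately authored datasets and answers a question that neither could answer alone. It is a toy stand-in for a real RDF store and query language, under stated assumptions: the `match` and `works_by_creators_born_in` functions, the two datasets and all `http://example.org/` identifiers are hypothetical.

```python
# Hypothetical namespace for illustration only.
EX = "http://example.org/"

# Dataset A: a catalogue that records who created each work.
catalogue = {
    (EX + "frankenstein", EX + "creator", EX + "mary-shelley"),
    (EX + "dracula", EX + "creator", EX + "bram-stoker"),
}

# Dataset B: a biographical source that records birthplaces.
biographies = {
    (EX + "mary-shelley", EX + "bornIn", EX + "london"),
    (EX + "bram-stoker", EX + "bornIn", EX + "dublin"),
}

# Because both datasets use the same identifiers for the same people,
# linking them is a plain set union.
graph = catalogue | biographies


def match(graph, s=None, p=None, o=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def works_by_creators_born_in(graph, place):
    """Join two patterns: people born in `place`, then works they created."""
    born_there = {s for s, _, _ in match(graph, p=EX + "bornIn", o=place)}
    return sorted(
        work
        for work, _, creator in match(graph, p=EX + "creator")
        if creator in born_there
    )


result = works_by_creators_born_in(graph, EX + "dublin")
print(result)
```

The join works only because the catalogue and the biographical source share identifiers, which is the practical reason for reusing existing URIs and vocabularies.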

Linked Open Data has long been used in computer science and information technology, but in recent years it has increasingly been applied in the humanities, particularly the digital humanities. With increasing amounts of information becoming available in large-scale databases across the world, LOD querying allows relevant information to be discovered on a scale that would be prohibitive for individual researchers to achieve manually. LOD has been adopted in fields such as linguistics, history, literature, and cultural heritage studies, among others.

Antopolsky (2022) reviewed existing LOD projects in the digital humanities and found that in the 'social sciences and the humanities, linked open data technology is one of the most promising directions for the integration of information resources' (119). Highlighted projects include LINCS, an LOD project that surfaces Canadian humanities and cultural data; LODI4DH (Linked Open Data Infrastructure for Digital Humanities in Finland); the WarSampo Portal, which provides linked historical data on Finland's involvement in WWII; artresearch.net, originally cited as Pharosartsresearch.org, which provides LOD access to PHAROS: The International Association of Photo Archives, including 'millions of photographs of works of art, and over a century of art historical documentation'; and PatrickAloud, an LOD project documenting the history and cultural context of political dissident and human rights researcher Patrick Zaki, who was arrested and held in an Egyptian jail in early 2020 and presently remains in detention.

Kudera et al. (2024, 49) introduce an LOD project based at Trier University in Germany: LODinG – Linked Open Data in the Humanities, with several diverse work packages focusing on 'effective methods of collecting, modeling, linking, releasing and analyzing machine-readable information relevant to (digital) humanities research in the form of LOD'. The authors also point to existing projects at the intersection of literary studies and LOD, such as the GOLEM project (Graphs and Ontologies for Literary Evolution Models) at Groningen University and the MEDIATE project at Radboud University, which provides data on the literary production systems in Europe during the 18th century.

Other applications include QLIT (Queer Literature Indexing Thesaurus), a bibliographic database and thesaurus covering LGBTQ+ themes in fiction in Sweden from the 7th century through today (see Matsson & Kriström, 2023), and the Pelagios Network, which links important databases, including geospatial data concerning the ancient world, to provide a resource for academics and non-academics interested in classical studies. There are also projects on poetry, such as POSTDATA (Poetry Standardization and Linked Open Data), which aims to 'shorten the digital gap between technology and poetry' by transforming traditional scholarship on poetry into a digital humanities research environment, and DISCO (Diachronic Spanish Sonnet Corpus), a corpus of over 4000 sonnets written in Spanish between the 15th and the 19th centuries, which was created to provide 'quantitative evidence on the evolution of sonnets in Spanish' (Fabo et al., 2021). LOD also offers increased potential for interdisciplinary applications, including 'unanticipated reuse from cross-discipline studies as well as [improving] intra-disciplinary collaborations' (Meyers, n.d.).

Bandini and Quochi (2025, 290) consider the application of Linguistic Linked Open Data (LLOD) in the digital humanities, going beyond 'the representation of cataloguing metadata and conceptual-semantic knowledge, e.g., vocabularies, taxonomies and ontologies' to include 'other textual data including primary and secondary data sources', 'such as artefacts containing texts and images, possibly with annotations'. However, the authors found a 'limited number of initiatives' and a 'relative immaturity of the field', with 'comparatively few' 'projects that experiment with representing texts themselves as LOD' (306). Nevertheless, there may be potential in further mapping textual outputs semantically for humanities disciplines focused on language.

References

Antopolsky, A.B. (2022) 'Linked Open Data in the Digital Humanities (Review of Publications)', Scientific and Technical Information Processing, 49(2), 119–126. https://doi.org/10.3103/S014768822202006X

Bandini, M. and Quochi, V. (2025) 'A Systematic Literature Review on the Representation of Texts as Linguistic Linked Open Data', Umanistica Digitale, (20), 289–315. https://doi.org/10.6092/issn.2532-8816/21195

Bizer, C., Vidal, M.-E. and Skaf-Molli, H. (2018) 'Linked Open Data', in Encyclopedia of Database Systems. Springer, New York, NY, pp. 2096–2101. https://doi.org/10.1007/978-1-4614-8265-9_80603

Blaney, J. (2017) 'Introduction to the Principles of Linked Open Data', Programming Historian [Preprint]. https://programminghistorian.org/en/lessons/intro-to-linked-data [accessed 14 October 2025]

Fabo, P.R. et al. (2021) 'The Diachronic Spanish Sonnet Corpus: TEI and Linked Open Data Encoding, Data Distribution, and Metrical Findings', Digital Scholarship in the Humanities, 36(Supplement_1), i68–i80. https://doi.org/10.1093/llc/fqaa035

Kudera, J. et al. (2024) 'LODinG: Linked Open Data in the Humanities', in C. Chiarcos et al. (eds) Proceedings of the 9th Workshop on Linked Data in Linguistics @ LREC-COLING 2024. Torino, Italia: ELRA and ICCL, pp. 49–54. https://aclanthology.org/2024.ldl-1.7/ [accessed 18 November 2025]

Matsson, A. and Kriström, O. (2023) 'Building and Serving the Queerlit Thesaurus as Linked Open Data', Digital Humanities in the Nordic and Baltic Countries Publications, 5(1), 29–39. https://doi.org/10.5617/dhnbpub.10648

Meyers, K. (n.d.) 'An Introduction to Linked Open Data'. https://www.insidehighered.com/blogs/gradhacker/introduction-linked-open-data [accessed 14 October 2025]

Catalogue of Open Research Practices in the Arts, Humanities, and Social Sciences
