2011-11-01

On Ontology creation

Last week I gave a presentation at the Kirjastoverkkopäivät event in Helsinki about the stages of ontology creation. Instead of posting the slides here, I decided to elaborate on the topics of the presentation.
The need to create an ontology often arises from having to agree on a formal terminology. Formal terminologies help to share understanding and to reduce ambiguity, and creating one requires a conceptual framework to build on. This is where ontologies come in.
An ontology is a formal representation of knowledge as a set of concepts within a domain, and the relationships between those concepts.

Ontology standards

To be able to share ontologies and to reuse other ontologies in part or in full, a common agreement on the building blocks of ontologies and their meaning is needed. The Semantic Web effort of the World Wide Web Consortium (W3C) has been working towards this goal and has published several standards to tackle the problem.
The most relevant ontology standards today are RDF Schema, OWL 1.0 & 2.0 and SKOS.
  • RDF Schema
    published in 1998, standardized in 2004
  • OWL
    • 1.0 (2004)
      divided into the dialects Lite, DL (Description Logic) and Full, extends RDF Schema
    • 2.0 (2009)
      adds the profiles EL, QL and RL
  • SKOS
    ontology for vocabularies and thesauri
RDF Schema and OWL are ontology frameworks that provide a type system of classes, properties and instances, while SKOS can be seen as an ontology that defines concept types and relations between concepts.
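Each of these standards lives in its own W3C namespace, so a prefixed name such as rdfs:subClassOf is shorthand for a full URI. The following Python sketch expands such prefixed names using the published namespace URIs of RDF, RDF Schema, OWL and SKOS. The `expand` helper and its error handling are illustrative assumptions, not part of any standard API.

```python
# Published W3C namespace URIs of the vocabularies listed above.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}


def expand(curie: str) -> str:
    """Expand a prefixed name such as 'rdfs:subClassOf' into a full URI (hypothetical helper)."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"unknown prefix in {curie!r}")
    return PREFIXES[prefix] + local


print(expand("rdfs:subClassOf"))  # http://www.w3.org/2000/01/rdf-schema#subClassOf
print(expand("skos:broader"))     # http://www.w3.org/2004/02/skos/core#broader
```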
The following sections will show examples to illustrate the differences and similarities of the presented standards.

RDF Schema

The Wine ontology is a popular RDF Schema example, as it presents many RDF Schema building blocks in compact form. The boxes in the image represent classes, predicates and instances, while the lines represent properties. For example, the ElyseZinfandel box is connected to the Zinfandel box via the rdf:type relation, which means that the resource ElyseZinfandel has the type Zinfandel. It is also connected to the Elyse box via the hasMaker relation and to the Red box via the hasColor relation.
The predicates rdfs:subClassOf and rdfs:subPropertyOf are used to declare subclass and subproperty relations. rdfs:domain declares the subject type of a property, i.e. the class of resources the property applies to, while rdfs:range declares the object type of a property, i.e. the value range of the property.
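To make the meaning of these predicates concrete, the following Python sketch applies a subset of the RDFS entailment rules (subclass and subproperty transitivity, type inheritance, domain and range typing) to the Wine example by naive forward chaining. The instance triples come from the description above; the schema triples (the RedWine, Wine, Winery and WineColor classes and the domain and range declarations) are assumptions added for illustration, not quoted from the Wine ontology.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

TYPE, SUBCLASS, SUBPROP = "rdf:type", "rdfs:subClassOf", "rdfs:subPropertyOf"
DOMAIN, RANGE = "rdfs:domain", "rdfs:range"


def rdfs_closure(triples: Set[Triple]) -> Set[Triple]:
    """Apply a subset of the RDFS entailment rules until no new triples appear."""
    graph = set(triples)
    while True:
        new = set()
        for s, p, o in graph:
            for s2, p2, o2 in graph:
                # rdfs:subClassOf and rdfs:subPropertyOf are transitive
                if p in (SUBCLASS, SUBPROP) and p2 == p and s2 == o:
                    new.add((s, p, o2))
                # instances of a subclass are instances of its superclass
                if p == TYPE and p2 == SUBCLASS and s2 == o:
                    new.add((s, TYPE, o2))
                # a statement with a subproperty also holds for the superproperty
                if p2 == SUBPROP and s2 == p:
                    new.add((s, o2, o))
                # rdfs:domain types the subject, rdfs:range types the object
                if p2 == DOMAIN and s2 == p:
                    new.add((s, TYPE, o2))
                if p2 == RANGE and s2 == p:
                    new.add((o, TYPE, o2))
        if new <= graph:
            return graph
        graph |= new


wine = {
    # instance data as described in the text
    ("ElyseZinfandel", TYPE, "Zinfandel"),
    ("ElyseZinfandel", "hasMaker", "Elyse"),
    ("ElyseZinfandel", "hasColor", "Red"),
    # assumed schema triples, for illustration only
    ("Zinfandel", SUBCLASS, "RedWine"),
    ("RedWine", SUBCLASS, "Wine"),
    ("hasMaker", DOMAIN, "Wine"),
    ("hasMaker", RANGE, "Winery"),
    ("hasColor", RANGE, "WineColor"),
}
inferred = rdfs_closure(wine)
print(("ElyseZinfandel", TYPE, "Wine") in inferred)  # True
print(("Elyse", TYPE, "Winery") in inferred)         # True
```

The naive double loop is quadratic per pass, which is fine for a figure-sized example; a real triple store would index triples by predicate instead.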

OWL

The BBC Programmes ontology defines various concepts of the media domain. This excerpt highlights the owl:disjointWith relation of OWL. For example, the declaration ProgrammeItem owl:disjointWith Brand states that ProgrammeItem instances cannot be Brand instances and vice versa. Both ProgrammeItem and Brand are subclasses of the Programme class.
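The consequence of owl:disjointWith can be sketched as a consistency check: after inheriting superclasses, no instance may carry both disjoint types. The following Python sketch uses the class axioms from the excerpt; the Episode subclass and the instances ep1 and bad1 are hypothetical examples, and the check is a simplification of what an OWL reasoner does.

```python
from typing import Dict, List, Set, Tuple

# Class axioms from the excerpt, plus an assumed Episode subclass (single parent per class).
SUBCLASS_OF: Dict[str, str] = {
    "ProgrammeItem": "Programme",
    "Brand": "Programme",
    "Episode": "ProgrammeItem",  # assumption for illustration
}
DISJOINT = {frozenset({"ProgrammeItem", "Brand"})}


def all_types(classes: Set[str]) -> Set[str]:
    """Add every superclass reachable through SUBCLASS_OF."""
    result = set(classes)
    frontier = list(classes)
    while frontier:
        parent = SUBCLASS_OF.get(frontier.pop())
        if parent and parent not in result:
            result.add(parent)
            frontier.append(parent)
    return result


def violations(types: Dict[str, Set[str]]) -> List[Tuple[str, str, str]]:
    """Return (instance, classA, classB) for each instance typed with two disjoint classes."""
    found = []
    for instance, classes in types.items():
        expanded = all_types(classes)
        for pair in DISJOINT:
            if pair <= expanded:
                a, b = sorted(pair)
                found.append((instance, a, b))
    return found


instances = {
    "ep1": {"Episode"},            # hypothetical, consistent
    "bad1": {"Episode", "Brand"},  # hypothetical, violates the disjointness axiom
}
print(violations(instances))  # [('bad1', 'Brand', 'ProgrammeItem')]
```

Note that bad1 is caught even though it is never typed ProgrammeItem directly: the disjointness is inherited through the Episode subclass.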

SKOS

The magazine ontology of ONKI defines concepts for categorizing magazines. For example, Living has the subcategory Antiques, while Antiques has the subcategory Antique stores. The skos:broader relation is comparable to the rdfs:subClassOf relation, but while rdfs:subClassOf is transitive, skos:broader isn't. For transitive relations, the skos:broaderTransitive relation needs to be used.
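The difference between the two SKOS relations can be sketched in a few lines of Python: the stated skos:broader links are only the direct ones, while skos:broaderTransitive covers everything reachable through them. The concept labels follow the magazine example above; representing the concept scheme as a plain dictionary and the `broader_transitive` helper are assumptions made for illustration.

```python
from typing import Dict, Set

# Direct skos:broader links from the magazine example (narrower -> broader).
BROADER: Dict[str, Set[str]] = {
    "Antique stores": {"Antiques"},
    "Antiques": {"Living"},
}


def broader_transitive(concept: str) -> Set[str]:
    """Collect all concepts reachable via skos:broader, i.e. its skos:broaderTransitive values."""
    seen: Set[str] = set()
    stack = list(BROADER.get(concept, ()))
    while stack:
        current = stack.pop()
        if current not in seen:  # the seen set also guards against cycles
            seen.add(current)
            stack.extend(BROADER.get(current, ()))
    return seen


print(BROADER["Antique stores"])             # {'Antiques'}: direct link only
print(broader_transitive("Antique stores"))  # contains both 'Antiques' and 'Living'
```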
For a more detailed presentation of the W3C Ontology and Vocabulary formats visit here.

Creation of your own ontology

Sometimes the available ontologies don't cover your problem domain or define semantics that are not compatible with your needs. Defining your own ontology is then a valid option.
Ontology creation begins with informal sketching of the concepts of the problem domain. Paper and pen, mind maps or any other sketching tool can be used for this. After that, the main modeling dimensions should be defined. Do the concepts form a type system or a categorization system? Are subclass or subconcept relations dominant, or are temporal relations or something else more important? The answers to these questions will guide you to the appropriate modeling standard. As mentioned before, RDF Schema and OWL should be picked to model type systems and SKOS for categorizations.
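Once the modeling dimension is decided, the informal sketch can be translated mechanically. The following Python sketch turns a parent-to-children mind map into triples, either as an RDF Schema class hierarchy (for type systems) or as SKOS concepts linked with skos:broader (for categorizations). The `sketch_to_triples` function, its `kind` parameter and the sample mind-map content (including the Gardening category) are hypothetical.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def sketch_to_triples(sketch: Dict[str, List[str]], kind: str) -> List[Triple]:
    """Turn a parent -> children sketch into triples.

    kind="types" models the sketch as an RDF Schema class hierarchy,
    kind="categories" as SKOS concepts linked with skos:broader.
    """
    if kind == "types":
        node_type, link = "rdfs:Class", "rdfs:subClassOf"
    elif kind == "categories":
        node_type, link = "skos:Concept", "skos:broader"
    else:
        raise ValueError("kind must be 'types' or 'categories'")

    # Every node in the sketch becomes a class or a concept.
    nodes = set(sketch) | {child for children in sketch.values() for child in children}
    triples: List[Triple] = [(node, "rdf:type", node_type) for node in sorted(nodes)]
    # Each sketch edge points from the child up to its parent.
    for parent, children in sketch.items():
        for child in children:
            triples.append((child, link, parent))
    return triples


# Hypothetical mind-map content for a magazine categorization.
sketch = {"Living": ["Antiques", "Gardening"], "Antiques": ["Antique stores"]}
for triple in sketch_to_triples(sketch, "categories"):
    print(*triple)
```

The same sketch passed with kind="types" would instead yield rdfs:subClassOf links, which is why settling the modeling dimension first matters more than the drawing tool.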

Relation of your own ontology to others

Sometimes most of your problem domain has already been formalized appropriately in other ontologies. In such a case, you can link your own ontology to these external ones. Your own classes, predicates and concepts can be specializations or generalizations of appropriate external entities. Another common way is to declare equivalence between entities via the owl:sameAs relation.
The owl:sameAs relation is widely used in the Linked Open Data movement to connect heterogeneous datasets.
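Because owl:sameAs is symmetric and transitive, a chain of sameAs links across several datasets means that all the linked URIs denote the same resource. The following Python sketch groups URIs into such identity sets with a union-find structure; the dataset prefixes a:, b: and c: and the resource names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def same_as_groups(links: List[Tuple[str, str]]) -> List[Set[str]]:
    """Group URIs connected by owl:sameAs links into identity sets (union-find)."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    # Each sameAs link merges the two identity sets.
    for a, b in links:
        parent[find(a)] = find(b)

    groups: Dict[str, Set[str]] = {}
    for uri in list(parent):
        groups.setdefault(find(uri), set()).add(uri)
    return list(groups.values())


links = [
    ("a:Helsinki", "b:Helsinki"),
    ("b:Helsinki", "c:Helsinki"),  # a:Helsinki and c:Helsinki are linked only indirectly
    ("a:Espoo", "b:Espoo"),
]
for group in same_as_groups(links):
    print(sorted(group))
```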

Where to go next?

Maintenance, further development and distribution need to be taken into account as well. Having the facilities in house to tackle these aspects is a great benefit. Otherwise, a public ontology host such as ONKI is probably the better choice.

ONKI

ONKI is a central ontology host of the Finnish Semantic Web scene and is maintained by the SeCo team at Aalto University. ONKI hosts 72 ontologies and vocabularies altogether, including 47 OWL ontologies, 2 RDF Schema ontologies and 22 SKOS vocabularies. Most of these are freely available under a liberal license. The SeCo team provides support for the ONKI service via email and announces service changes and problems via Twitter.

Using an ONKI ontology

ONKI provides several tools for using its ontologies:
  • ONKI Browser - a faceted browser to explore ontology content
  • SOAP interface - for service integration
  • ONKI Selector widget - a JavaScript widget for web application integration
  • Downloads - in RDF/XML form to use the data in your own applications

ONKI Browser

This screenshot shows a part of the ONKI Browser tool, a faceted search for the ONKI content. The ONKI Browser provides free-text search for ontologies and concepts and a navigable tree visualization of the concept hierarchy.

SOAP interface

The following methods are available in the ONKI SOAP interface:
  • getAvailableLanguages - languages used in concept names in an ontology.
  • getAvailableTypeUris - supported concept type URIs of an ontology.
  • search - searching for ontological concepts.
  • getLabel - fetching a label for the concept with a given URI.
  • expandQuery - expanding concept(s) with given URI(s) (e.g. to subconcepts) for queries.
  • getProperties - fetching properties for the concept with a given URI.
  • getConceptTree - fetching concept hierarchy for the concept with a given URI.
A REST/JSON interface is in beta and still subject to change.

ONKI Selector widget

The ONKI Selector widget is a JavaScript component that can easily be embedded into modern web applications. It provides autocomplete-based selection of ontology concepts. The integration is fairly easy, but the widget returns only the label and URI of the selected concept; superconcepts need to be fetched separately.
The displayed form fields let the user choose the ontology to search (a specific one or all of them), enter the search query and select the language of the labels.

Using an ONKI ontology in your own service

While ONKI is a general ontology host, most of its ontologies are connected to the central YSO ontology, a general Finnish-language ontology. The following picture shows the umbrella function of the YSO ontology:
The benefit of this approach is that an ONKI ontology gives you a rich foundation to start with. The downside is that the number of concepts is huge: the YSO ontology alone contains 20,000 concepts. Avoiding clashes and semantic mismatches can be difficult with so many concepts to take into account.
As YSO is based on YSA, a central Finnish-language glossary, it should probably be treated more as a vocabulary than as an ontology.

Conclusion

This article presented an overview of the ontology standards RDF Schema, OWL and SKOS with examples, and a short introduction to the toolset of the ONKI service. If you have experience with the ONKI service that you'd like to share, please comment on this post.