On Ontology creation

Last week I held a presentation at the Kirjastoverkkopäivät event in Helsinki and talked about the stages of Ontology creation. Instead of posting the slides here I decided to elaborate more on the topics of the presentation.
The need for ontology creation arises often from having to agree on a formal terminology. Formal terminologies are important to share understanding and to reduce ambiguity. And to create a formal terminology a conceptual framework is needed to build on. This is where ontologies come in.
An ontology is a formal representation of knowledge as a set of concepts within a domain, and the relationships between those concepts.

Ontology standards

To be able to share ontologies and to reuse other ontologies in fragments or full form a common agreement on the building blocks of ontologies and their meaning is needed. The Semantic Web effort of the World Web Foundation has been dealing with this goal and has published several standards to tackle the problem.
The most relevant ontology standards today are RDF Schema, OWL 1.0 & 2.0 and SKOS.
  • RDF Schema example
    published in 1998, standardized in 2004
  • OWL example
    • 1.0 (2004)
      divided into the dialects Light, DL (Description Logic) and Full, extends RDF Schema
    • 2.0 (2009)
      divided into the profiles EL, QL and RL
  • SKOS example
    ontology for vocabularies and thesauri
RDF Schema and OWL are ontology frameworks which provide a type system of classes, properties and instances and SKOS can be seen as a ontology which defines concept types and relations between concepts.
The following sections will show examples to illustrate the differences and similarities of the presented standards.

RDF Schema

The Wine ontology is a popular RDF Schema example as it presents in compact form a lot of RDF schema building blocks. The boxes of the images present classes, predicates and instances while the lines present properties. The ElyseZinfandel box for example is connected to the Zinfandel box via the rdf:type relation which means the resource ElyseZinfandel has the type Zinfandel. It is also connected to the Elyse box, but via the hasMaker relation, and to the Red box, via the hasColor relation.
The predicates rdfs:subClassOf and rdfs:subPropertyOf are used to declare subclass and subproperty relations. rdfs:domain declares the subject range of a property, the type this predicate applies to while rdfs:range declares the object range of a property, i.e. the value range of the property.


The BBC Programmes ontology defines various concepts of the Media domain. This excerpt highlights the the owl:disjointWith relation of the OWL format.The declaration ProgrammeItem owl:disjointWith Brand for example declares that ProgrammeItem instances cannot be Brand instances and vice versa. Both ProgrammeItem and Brand are sub classes of the Programme class.


The magazine ontology of ONKI defines concepts to be used for categorization of magazines. For example Living has the Antiques subcategory while Antiques has the subcategory Antique stores. The skos:broader relation is comparable with the rdfs:subClassOf relation, but while rdfs:subClassOf is transitive, skos:broader isn't. For transitive relations the skos:broaderTransitive relation needs to be used.
For a more detailed presentation of the W3C Ontology and Vocabulary formats visit here.

Creation of your own ontology

Sometimes the available ontologies don't cover your problem domain or define semantics that are not compatible with your needs. Defining your own ontology is then a valid option.
Ontology creation begins of with informal sketching of the concepts of the problem domain. Paper and pen, mind maps or any other sketching tool can be used for this. After that the main modeling dimensions should be defined. Do the concepts define a type or categorization system? Are subclass or subconcept relations the dominating relations or maybe temporal ones or something else? The answers to these questions will guide you to the selection of the appropriate modeling standard. As mentioned before RDF Schema and OWL should be picked to model type systems and SKOS for categorizations.

Relation of your own ontology to others

Sometimes most of your problem domain has already been formalized appropriately in other ontologies. In such a case you can link your own ontology to these external ones. Your own classes, predicates and concepts can be specializations and generalizations of appropriate external entities. Another common way is to define equivalence between entities via the the owl:sameAs relation.
The owl:sameAs relation is used much in the Linked Open Data movement to connect heterogenous datasets.

Where to go next?

Maintenance, further development and distribution need to be taken into account as well. Having the facilities in house to tackle these aspects is a great benefit. Otherwise probably a public host for ontologies such as ONKI should be picked.


ONKI is a central ontology host of the finnish Semantic Web scene and is maintained by the SeCo team of the Aalto University. ONKI hosts 72 ontologies and vocabularies altogether: 47 OWL ontologies, 2 RDF Schema ontologies and 22 SKOS vocabularies. Most of these are freely available with a liberal license. The SeCo provides support for the ONKI service via email and notifices via Twitter of service changes and problems.

Using an ONKI ontology

ONKI provides several tools to use ontologies :
  • ONKI Browser - a faceted browser to explore ontology content
  • SOAP interface - for service integration
  • ONKI Selector widget - a JavaScript widget for web application integration
  • Downloads - in RDF/XML form to use the data in your own applications

ONKI Browser

This screenshot shows a part of the ONKI Browser tool, a faceted search for the ONKI content. The ONKI Browser provides free text search for ontologies and concepts and provides a navigatable tree visualization of the concept inheritance.

SOAP interface

The following methods are available in the ONKI SOAP interface:
  • getAvailableLanguages - languages used in concept names in an ontology.
  • getAvailableTypeUris - supported concept type URIs of an ontology.
  • search - searching for ontological concepts.
  • getLabel - fetching a label for the concept with a given URI.
  • expandQuery - expanding concept(s) with given URI(s) (e.g. to subconcepts) for queries.
  • getProperties - fetching properties for the concept with a given URI.
  • getConceptTree - fetching concept hierarchy for the concept with a given URI.
A REST/JSON interface is in beta phase, but the interface is still subject to change.

ONKI Selector wdiget

The ONKI Selector widget is a JavaScript script which can easily be embedded into modern web applications. It provides autocomplete based selection of ontology concepts. It is a fairly easy integration, but provides only label and URI of the selected concept, super concepts need to be fetched separately.
The displayed form fields offer the selection of a specific ontology to be targeted in the search or all, the search query and the language of the labels.

Using an ONKI ontology in your own service

While ONKI is a general ontology host, most of the ontologies are connected to the central YSO ontology, a general finnish language ontology. The following picture shows the umbrella function of the YSO ontology:
The benefit of this approach is that when using an ONKI ontology you have a rich foundation to start with, the downside is that the amount of concepts is huge, the YSO ontology alone hosts 20.000 concepts. Avoiding clashes and semantic mismatches can be difficult with so many concepts to be taken into account.
As the background of YSO is YSA, a central Finnish language glossary, it should probably be treated more as a vocabulary than ontology.


This article presented an overview of the ontology standards RDF Schema, OWL and SKOS with examples and a small introduction into the toolset of the ONKI service. If you have experiences with the ONKI service you'd like to share please comment on this post.


Accessing Helsinki region statistics from your webapp

In this post I'd like to describe how the statistical data from the Semantic.hfi.fi service can be accessed from your web application via the demo app Areastats. Semantic.hri.fi offers several JSON/P web services which can directly be used from any website.

Areastats displays statistics when accessed from within the Finnish capital region. It uses internally the

  • HTML 5 Geolocation API to obtain the access location as WGS 85 coordinates,
  • a simple webservice to convert the coordinates into a code for the area the location resides in,
  • the Semantic.hri.fi Faceted Search API to access statistics for this area and
  • the jQuery API for DOM manipulation and Ajax

The geo-location-js API is used as an adapter for the incompatible Geolocation APIs used in various browsers. It provides a convenient access point to register callbacks for action when the location could be obtained and when not:

if (geo_position_js.init()) {
    geo_position_js.getCurrentPosition(geo_success, geo_error);

The geo_success callback then retrieves the area code from a web service via Ajax and uses the fetch_stats function to process the result:

function geo_success(p) {
    url: "polygon", 
    data: { "latitude": p.coords.latitude, "longitude": p.coords.longitude}, 
    dataType: "json", 
    success: fetch_stats,
    error: geo_error

The fetch_stats then makes an Ajax call to obtain the results:

function fetch_stats(d) {
  var alue = "alue:_" + d.KUNTA + "_" + d.PIEN + "_" + d.Nimi;
    url: "http://semantic.hri.fi/search",
    data: { 
      "include": ["items"], "limit":50, "offset":0, 
      "value": [
        "dataset:A025_HKI_Vakiluku1962"] },
    dataType: "jsonp",
    success: fetch_stats_success,
    error: fetch_stats_error

The query defines that we are interested in statistical items from the year 2010, with the unit persons (Henkilöä) and from the dataset dataset:A025_HKI_Vakiluku1962. For details on how to use the service of Semantic.hri.fi, please read the embedded documentation.

The fetch_stats_success callback then has the final results and draws a pie chart based on them:

function fetch_stats_success(d) {
  var age_index = -1;
  for (var i = 0; i < d.headers.length; i++) {
    if (d.headers[i] == "dimension:Ikäryhmä") {
      age_index = i;

  for (var i = 0; i < d.items.length; i++) {
    var key = d.items[i].values[age_index];
    key = key.substring(9);


The fetch_stats_success function iterates the headers of the result to find the index of the column we are interested in and the iterates over the result items and maps the column data into the DOM of the local HTML document. After that the drawpie function is invoked and the stats can be shown.

This demo is related to the Helsinki Region Infoshare project, which aims to provide data of the Finnish capital region in Linked Open Data form.

Feel free to contact us if you need more info on the demo or the Semantic.hri.fi services.


Querydsl in the wild

From time to time we observe how the Querydsl framework is used outside of Mysema to find out how Querydsl meets the needs of our user base. In this blog post I'd like to share some Querydsl usage patterns we found.

Expression factory class

As an alternative to Expression construction based on the core primitives reocurring constraints can be bundled as factory methods grouped by the used root types.

This example shows Expression factory methods for the User domain type

package org.springframework.data.jpa.example.repository;

import org.joda.time.LocalDate;
import org.springframework.data.jpa.example.domain.QUser;

import com.mysema.query.types.expr.BooleanExpression;

class UserExpressions {

    public static BooleanExpression goldMember() {
        return adult().and(createdAtleastOneYearAgo());
    public static BooleanExpression adult() {
        return QUser.user.age.goe(18);
    public static BooleanExpression createdAtleastOneYearAgo() {
        return QUser.user.creationDate.loe(new LocalDate().minusYears(1));
    public static BooleanExpression differentFrontThanLastName() {
        return QUser.user.firstname.eq(QUser.user.lastname).not();
    public static BooleanExpression usernameLongerThanSix() {
        return QUser.user.username.length().gt(6);

Source: GitHub

$ as variable name for default entity path

In cases where Querydsl is used in combination with the Repository pattern it can be convenient to use a general variable name for the default entity path to be used. The dollar-sign is a valid candidate for this:

package reevent.dao;

import com.mysema.query.jpa.JPQLQuery;
import org.springframework.stereotype.Repository;
import org.springframework.transaction.annotation.Transactional;
import reevent.domain.QUser;
import reevent.domain.User;

public class UserDaoJpa extends EntityDaoBase<User> implements UserDao {
    private static final QUser $ = QUser.user;

    public User findByUsername(String username) {
        return queryByUsername(username).uniqueResult($);

    public boolean usernameExists(String username) {
        return queryByUsername(username).exists();

    private JPQLQuery queryByUsername(String username) {
        return query().from($).where($.username.eq(username));

Source: GitHub

Drawbacks with this approach are that some code inspection tools might complain about the unconventional variable name usage and that the dollar-sign is also used by the Querydsl Alias-functionality.

Hamcrest integration

Querydsl expressions can also be used for other tasks beside querying. A fine example is Hamcrest integration:

package org.hamcrest;

import static org.hamcrest.Matchers.equalTo;
import static org.hamcrest.PathMatcher.valueOf;
import static org.junit.Assert.assertThat;

import org.hamcrest.domain.Address;
import org.hamcrest.domain.Customer;
import org.hamcrest.domain.QCustomer;
import org.junit.Test;

public class PathMatcherTest {

    public void testPathEvaluate() {
        Customer customer = new Customer();
        Address address = new Address();
        address.setStreet("blaway 142");

            equalTo("blaway 142")));

Hamcrest matchers can be used in unit tests for complex assertion construction.

Jeroen van Schagen, the author of the Hamcrest example has bundled his Querydsl extensions into the querydsl-support module.

Supplying filter criteria to general find methods

This pattern is extensively used in the Spring Data repositories. Here is a simple example:

QCustomer customer = QCustomer.customer;
LocalDate today = new LocalDate();
BooleanExpression customerHasBirthday = customer.birthday.eq(today);
BooleanExpression isLongTermCustomer = 

Advanced Spring Data JPA – Specifications and Querydsl

If you have innovative Querydsl usage yourself, please share it with us!


Julkinen data seminaari - To 7.4.2011, Helsinki

Hallitus on tehnyt periaatepäätöksen (3.3.2011) julkishallinnon digitaalisten tietoaineistojen saatavuudesta. Tietoaineistojen tulee olla avoimesti saatavilla ja uudelleenkäytettävissä yhtenäisin, selkein ja kaikille tasapuolisin ehdoin. Periaatepäätös koskee ensisijaisesti sellaisenaan luovutettavissa olevia julkisia aineistoja, joiden käsittelyä ei lainsäädäntö rajoita.

Mutta mitä se tarkoittaa käytännössä? Miten julkisia tietoaineistoja tarjotaan?

Julkinen data seminaari torstaina 7.4.2011 Aalto yliopiston Chydenina-talossa Helsingissä tarjoaa tietoa julkisten tietovarantojen avaamiseen liittyen eri näkökulmilta. Seminaari on tarkoitettu kaikille tiedon parissa työskenteleville asiantuntijoille ja päättäjille kuntien ja valtion tehtävissä.

Tilaisuudessa esitellään:

  • mitä Julkinen data on
  • miten valtion periaatepäätöksellä pyritään edistämään julkisen hallinnon tietovarantojen saatavuutta ja käyttöä.
  • konkreettisia esimerkkejä mitä avoimilla tietovarantojen avulla on saatu aikaiseksi meillä ja maailmalla.
  • Case Helsingin kaupungin aluesarjat-tilastotietojen avaaminen ja teknisen ratkaisun rakentaminen

Tilaisuuden ohjelma:

8:30 Rekisteröityminen ja aamukahvit

9:00 Avaus - missä mennään julkisten tietovarantojen avauksessa, Taru Rastas, Liikenne ja Viestintäministeriö

9:15 Julkinen data - mitä kaikkea se onkaan? Tapio Nurminen, FloApps Oy

10:00 Kahvitauko

10:15 Case esittely - Aluesarjat-tilastot avoimeksi, Pekka Vuori, Yliaktuaari, Helsingin Kaupungin Tietokeskus

10:45 Q&A + Aluesarjat Demot by FloApps & Mysema

11.30 Lounas

12.30 Julkista dataa hyödyntäviä esimerkkisovelluksia ja -palveluita Suomessa ja maailmalla. Tapio Nurminen, Toimitusjohtaja, FloApps Oy

13:00 Datan avaaminen käytännössä-teknisen ratkaisun rakentaminen, Samppa Saarela, Mysema Oy

13:30 Avoin tieto laadukkaaksi data managementin avulla, Nino Ilveskero, Talent Base Oy

14:00 Kahvitauko

14:15 Liiketoimintamallit julkisen datan avulla, Juho Lindman, Aalto Econ. CKIR

14:45-15:30 Yhteenveto + kysymykset - Tapio Nurminen ja Samppa Saarela

Tilaisuuden hinta on 75 € + alv. 23%

Ilmoittaudu ystävällisesti tilaisuuteen 25.3. mennessä kahvi- ja lounasjärjestelyjen johdosta ennakkoon osoitteessa:


Seminaari järjestetään: torstaina 7. huhtikuuta, klo 8.30 - 15.30 Aalto-Yliopiston Chydenia-talossa osoitteessa Runeberginkatu 22-24, Helsinki.

Järjestäjät ovat Flo Apps Oy ja Mysema Oy,
yhteistyössä Helsinki Regional Infoshare, Aalto Yliopiston Econ. osaston ja Talent Base Oy:n kanssa.


Querying in SQL with Querydsl

Querydsl provides a typesafe querying layer on top of JPA, JDO, JDBC and other backends. This blog post presents a simple tutorial on how to get started with querying in SQL using Querydsl.

Querydsl for SQL provides a typesafe view of the relational schema and makes expressing database queries in Java as intuitive as possible.

Getting started

To get started with Querydsl for SQL using a Maven 2 based build environment, follow these steps.

Add the following dependencies to your Maven project and make sure that the Maven 2 repo of Mysema Source is accessible from your POM :



The snippet to include the Mysema Source Maven repository is :


Set the variable querydsl.version to the latest Querydsl version available.

If you are not using Maven you can download the latest builds from here

Creating the Querydsl query types

To get started with creating Querydsl SQL queries you need to create Java classes for your relational schema. This can be done either in Java, via Maven or via Ant.

In Java

To get started export your schema into Querydsl query types like this :

java.sql.Connection conn = ...; 
MetaDataExporter exporter = new MetaDataExporter();
exporter.setTargetFolder(new File("src/main/java"));

This declares that the database schema is to be mirrored into the com.myproject.domain package in the src/main/java folder.

The generated types have the table name transformed to mixed case as the class name and a similar mixed case transformation applied to the columns which are available as property paths in the query type.

In Maven

This functionality is also available as a Maven plugin. The presented example can be declared like this in the POM :


Use the goal test-export to add the targetFolder as a test compile source root instead of a compile source root.


The ANT task com.mysema.query.sql.ant.AntMetaDataExporter of the querydsl-sql module provides the same functionality as an ANT task. The configuration parameters of the task are jdbcDriverClass, dbUrl, dbUserName, dbPassword, namePrefix, targetPackage, targetSourceFolder, schemaPattern, tableNamePattern, exportBeans and innerClassesForKeys.

Using query types

To create queries with Querydsl you need to instantiate variables and Query implementations. We will start with the variables.

For each table in your mirrored schema a query type is created. For example for the table customer Querydsl will generate a query type with the simple name QCustomer into the folder and package declared in the code generation setup. QCustomer can be used as a statically typed variable in Querydsl queries as a representative for the customer table.

QCustomer has a default instance variable which can be accessed as a static field :

QCustomer customer = QCustomer.customer;

Alternatively you can define your own QCustomer instances like this :

QCustomer customer = new QCustomer("c");


Querying with Querydsl SQL is as simple as this :

QCustomer customer = new QCustomer("c"); // alias for the CUSTOMER table

SQLTemplates dialect = new HSQLDBTemplates(); // SQL-dialect
SQLQuery query = new SQLQueryImpl(connection, dialect); 
List<String> lastNames = query.from(customer)

which is transformed into the following sql query, assuming that the related table name is customer and the columns first_name and last_name :

SELECT c.last_name 
FROM customer c 
WHERE c.first_name = 'Bob'

Internally Querydsl SQL uses prepared statements.

General usage

Use the the cascading methods of the com.mysema.query.sql.SQLQuery interface like this

from : Define the query sources here.

innerJoin, join, leftJoin, fullJoin, on : Define join elements using these constructs. For the join methods the first argument is the join source and the second the target (alias).

where : Define the query filters, either in varargs form separated via commas or cascaded via the and-operator.

groupBy : Define the group by arguments in varargs form.

having : Define the having filter of the "group by" grouping as an varags array of Predicate expressions.

orderBy : Define the ordering of the result as an varargs array of order expressions. Use asc() and desc() on numeric, string and other comparable expression to access the OrderSpecifier instances.

limit, offset, restrict : Define the paging of the result. Limit for max results, offset for skipping rows and restrict for defining both in one call.


The syntax for declaring ordering is

    .orderBy(customer.lastName.asc(), customer.firstName.asc())
    .list(customer.firstName, customer.lastName);

which is equivalent to the following native SQL

SELECT c.first_name, c.last_name
FROM customer c 
ORDER BY c.last_name ASC, c.first_name ASC


Grouping can be done in the following form


which is equivalent to the following native SQL

SELECT c.last_name 
FROM customer c 
GROUP BY c.last_name


To create a subquery you create a com.mysema.query.sql.SQLSubQuery instance, define the query parameters via from, where etc and use unique or list to create a subquery, which is just a type-safe Querydsl expression for the query. unique is used for a unique (single) result and list for a list result.

  customer.status.eq(new SQLSubQuery().from(customer2).unique(customer2.status.max()))

Another example

  customer.status.in(new SQLSubQuery().from(status).where(status.level.lt(3)).list(status.id))

Query extension support

Custom query extensions to support engine specific syntax can be created by subclassing com.mysema.query.sql.AbstractSQLQuery and adding flagging methods like in the given MySQLQuery example :

public class MySQLQuery extends AbstractSQLQuery<MySQLQuery>{
    public MySQLQuery(Connection conn) {
        this(conn, new MySQLTemplates(), new DefaultQueryMetadata());
    public MySQLQuery(Connection conn, SQLTemplates templates) {
        this(conn, templates, new DefaultQueryMetadata());
    protected MySQLQuery(Connection conn, SQLTemplates templates, QueryMetadata metadata) {
        super(conn, new Configuration(templates), metadata);
    public MySQLQuery bigResult(){
        return addFlag(Position.AFTER_SELECT, "SQL_BIG_RESULT ");
    public MySQLQuery bufferResult(){
        return addFlag(Position.AFTER_SELECT, "SQL_BUFFER_RESULT ");
    // ...   

The flags are custom SQL snippets that can be inserted at specific points in the serialization. The supported positions are the enums of the com.mysema.query.QueryFlag.Position enum class.

Using DDL commands

CREATE TABLE commands can be used in fluent form via the com.mysema.query.sql.ddl.CreateTableClause class. Here are some examples :

new CreateTableClause(conn, templates, "language")
 .column("id", Integer.class).notNull()
 .column("text", String.class).size(256).notNull()

new CreateTableClause(conn, templates, "symbol")
 .column("id", Long.class).notNull()
 .column("lexical", String.class).size(1024).notNull()
 .column("datatype", Long.class)
 .column("lang", Integer.class)

new CreateTableClause(conn, templates, "statement")
 .column("model", Long.class)
 .column("subject", Long.class).notNull()
 .column("predicate", Long.class).notNull()
 .column("object", Long.class).notNull()

The constructor of CreateTableClause takes the connection, the templates and the table name. The rest is declared via column, primaryKey and foreignKey invocations.

Here are the corresponding CREATE TABLE clauses as they are executed.

CREATE TABLE language (
  text VARCHAR(256) NOT NULL,

 CREATE TABLE symbol (
  lexical VARCHAR(1024) NOT NULL,
  datatype BIGINT,
  lang INTEGER,
  intval BIGINT,
  floatval DOUBLE,
  datetimeval TIMESTAMP,

 CREATE TABLE statement (
  model BIGINT,
  subject BIGINT NOT NULL,
  predicate BIGINT NOT NULL,

Using DML commands

All the com.mysema.query.dml.DMLClause implementations in the Querydsl SQL module take three parameters, the java.sql.Connection instance, the com.mysema.query.sql.SQLTemplates instance used in the queries and the main entity the DMLClause is bound to.

Insert examples :

// with columns
new SQLInsertClause(conn, dialect, survey)
    .columns(survey.id, survey.name)
    .values(3, "Hello").execute();

// without columns
new SQLInsertClause(conn, dialect, survey)
    .values(4, "Hello").execute();

// with subquery
new SQLInsertClause(conn, dialect, survey)
    .columns(survey.id, survey.name)
    .select(new SQLSubQuery().from(survey2).list(survey2.id.add(1), survey2.name))

// with subquery, without columns
new SQLInsertClause(conn, dialect, survey)
    .select(new SQLSubQuery().from(survey2).list(survey2.id.add(10), survey2.name))

Update examples :

// update with where
new SQLUpdateClause(conn, dialect, survey)
    .set(survey.name, "S")
// update without where
new SQLUpdateClause(conn, dialect, survey)
    .set(survey.name, "S")

Delete examples :

// delete with where
new SQLDelecteClause(conn, dialect, survey)
// delete without where
new SQLDelecteClause(conn, dialect, survey)

Bean class generation

To create JavaBean DTO types for the tables of your schema use the com.mysema.query.sql.MetaDataExporter like this :

java.sql.Connection conn = ...; 
MetaDataExporter exporter = new MetaDataExporter();
exporter.setTargetFolder(new File("src/main/java"));       
exporter.setBeanSerializer(new BeanSerializer());

Now you can use the bean types as arguments to the populate method in DML clauses and you can project directly to bean types in queries. Here is a simple example :

QEmployee e = new QEmployee("e");
// Insert
Employee employee = new Employee();
Integer id = insert(e).populate(employee).executeWithKey(e.id);

// Update
assertEquals(1l, update(e).populate(employee).where(e.id.eq(employee.getId())).execute());

// Query
Employee smith = query().from(e).where(e.lastname.eq("Smith")).uniqueResult(e);
assertEquals("John", smith.getFirstname());
// Delete 
assertEquals(1l, delete(e).where(e.id.eq(employee.getId())).execute());

Where to go next

See the related section of the latest Reference docs for more information.