2013-11-28

Querydsl user survey results

A few weeks ago we opened the Querydsl user survey to gather information on how Querydsl is used and in which direction it would make sense to take it. Some of the results were a little surprising, but overall they confirmed that the decisions we made in the past were mostly right and that most of our users are extremely happy with the product.

Here is a summary of the results; the figures are percentages:

How have you heard of Querydsl?

Which IDEs are you using?

Which build tools are you using with Querydsl?

Which Querydsl modules are you using?

What other databases would you like to see supported in Querydsl?

What JDK version/s are you using?

What is your overall satisfaction with Querydsl?

What are the major benefits in using Querydsl?

How have you participated in the Querydsl community?

What commercial Querydsl services would you be interested to try?

Do you have anything else to tell us?

This question was not multiple-choice, so only the most common topics are listed below:

  • Better docs, more examples
  • Roadmap of coming features
  • Donation page
  • Better readability for Querydsl SQL queries
  • Dedicated Gradle support
  • More advanced SQL functionality
  • Continue with the current licensing model
  • Better control of code generation target packages
  • Better docs for Hibernate Search and Collections modules

Conclusion

Based on these promising results we will take the following actions:

  • Expand our commercial offers
  • Invest in Querydsl SQL, since it is the second most popular Querydsl module
  • Improve the general documentation
  • Provide a public Roadmap for Querydsl development

Feedback on Querydsl and its development model is of course also welcome in any other form. Let us know what features you are missing and what use cases should be supported better.

2013-11-21

Hibernate challenges: Entity persistence

Hibernate has been the most used persistence technology in our Java projects. We have optimized Hibernate usage via our own tooling, such as Querydsl JPA for querying, as well as via best practices and conventions for other areas such as entity lifecycle handling.

In this blog post I’d like to address Hibernate issues related to entity persistence and loading. Some of these challenges we have faced ourselves in customer projects and others are issues our customers have come across.

Usage of the right session-level methods

Hibernate provides multiple methods for persisting entities, with slightly different semantics. To understand these methods better, it helps to understand the different states that Hibernate-managed entities go through.

The starting point for new entities is the transient state. Transient entities are unknown to any Hibernate session and have no database identity attached. Transient instances can be made persistent via Session.save() and Session.saveOrUpdate(). save() will assign the database identity value to the entity’s id property, but might not yet execute any SQL INSERT. Session.save() should be seen as the way to register an entity with Hibernate.

The persistent state is the state loaded and queried entities are in. Transient entities also become persistent when saved via Session.save() or saveOrUpdate(). Persistent instances are tracked by the Hibernate session and changes are persisted to the database at session flush time.

Persistent entities become detached via Session.evict(), Session.clear() and Session.close() calls. Detached entities have a database identity, but are not attached to a running session. Detached entities are typically used when entities are needed in the view rendering phase of a web application after the session has been closed, or in long-running conversations spanning multiple sessions.

Persistent entities can also be deleted via Session.delete(). The deletions at the database level happen at session flush time.

There is also a path back from detached entities to persistent entities via Session.saveOrUpdate(), Session.update() and Session.merge().
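
As a minimal sketch (reusing the Person entity from the examples below; detachedPerson is assumed to have been loaded in an earlier, already closed session), reattachment could look like this:

Session session = sessionFactory.openSession();
session.beginTransaction();
session.update(detachedPerson);  // reattach the detached instance as-is
// or copy its state onto a managed instance instead:
// Person managed = (Person) session.merge(detachedPerson);
session.getTransaction().commit();
session.close();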

A common problem with Hibernate session usage is that the methods are not used in the right way or that unnecessary session-level method calls are made. The following two cases are quite common examples of wrong API usage.

It is quite common to see calls that persist entities which are already managed by Hibernate:

Person person = new Person();
session.save(person);
// modifications to person
session.update(person);

In this case the update call at the end is a no-op, since the person instance is already persistent.

Another common antipattern is to control the session size with overly fine-grained calls:

Person person = new Person();
// modifications to person
session.save(person);
session.flush();
session.clear(); // or session.evict(person)

The code above makes the person instance persistent, assigns an id, saves it to the database on flush and removes it from the session on clear.
There are several problems with this approach:

  • JDBC batching of INSERTs and UPDATEs can’t be utilized, since flushes happen too often
  • other entities are also removed from the session, which might break surrounding code; evict helps in this case
  • the person instance is detached after this code, which needs to be taken into account in its further usage

Explicit flush and evict calls are often necessary for batch operations, but in other cases the implicit flush at transaction end should be relied on.

Mixing session-level and SQL-level manipulations

Hibernate provides the possibility to mix session-level operations on entities with direct updates to the database via DML clauses. When these get mixed, things get complicated:

List<Customer> customers = session.createQuery("select c from Customer c").list();

String hql = "update Customer c set c.name = :newName where c.name = :oldName";
int updatedEntities = session.createQuery(hql)
        .setString("newName", newName)
        .setString("oldName", oldName)
        .executeUpdate();

for (Customer customer : customers) {
    // modifications to customers
}

In this case the modifications to the customers at the Java level override the changes made via the DML clause, even if different properties were modified. Also the updates executed via the DML statement are not reflected in the second-level cache.

A good solution to this problem is to keep DML and session-level operations in different transactions. If that is not possible, one should be very careful not to create inconsistent end results.
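
If they do have to share a session, one option is to re-read the affected entities after the bulk update so that stale in-memory state does not get flushed over the DML changes. A minimal sketch, reusing the Customer example above:

session.createQuery("update Customer c set c.name = :newName where c.name = :oldName")
        .setString("newName", newName)
        .setString("oldName", oldName)
        .executeUpdate();

for (Customer customer : customers) {
    session.refresh(customer); // reload the row so later modifications start from fresh state
}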

Efficient batch inserts and updates

The commonly promoted approach to batch-insert entities via session-level methods is:

for (int i = 0; i < 100000; i++) {
    Customer customer = new Customer(.....);
    session.save(customer);
    if (i % 20 == 0) { //20, same as the JDBC batch size
        //flush a batch of inserts and release memory:
        session.flush();
        session.clear();
    }
}

This is a good approach if used in an isolated session. If other things happen before and after this code in the same session, a safer approach could be:

Set<Customer> batch = new HashSet<Customer>();
for (int i = 0; i < 100000; i++) {
    Customer customer = new Customer(.....);
    session.save(customer);
    batch.add(customer);
    if (batch.size() == 20) { //20, same as the JDBC batch size
        //flush the batch of inserts first, then evict only the batched instances
        session.flush();
        for (Customer c : batch) {
            session.evict(c);
        }
        batch.clear();
    }
}

The same applies to batch updates. Using session.clear() in a more complex context is always risky.
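
For reference, a batch update in an isolated session could follow the same flush/clear rhythm as the insert example above, for instance via a scrollable result set (a sketch; the name modification and the Customer accessors are just assumptions for illustration):

ScrollableResults customers = session.createQuery("from Customer")
        .setCacheMode(CacheMode.IGNORE)
        .scroll(ScrollMode.FORWARD_ONLY);
int count = 0;
while (customers.next()) {
    Customer customer = (Customer) customers.get(0);
    customer.setName(customer.getName().toUpperCase()); // example modification
    if (++count % 20 == 0) { //20, same as the JDBC batch size
        session.flush();
        session.clear();
    }
}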

Efficient loading

Efficient loading of entities and collections involves tuning the loading behaviour when querying.

It is usually good practice to favor query-scoped optimization of loading and fetching instead of using annotations extensively, because this gives you more flexibility and emphasizes use-case-specific optimizations instead of general behaviour.

Collections are loaded lazily by default, but they can be fetched in a single select via the fetch keyword:

select c from Company c
left join fetch c.departments

As an annotation inside Company it could be declared like this:

@Entity
public class Company {

    @OneToMany
    @Fetch(FetchMode.JOIN)
    private Set<Department> departments;
}

It is usually good to define the desired behaviour for loading via session.get() or session.load() as annotations, and the optimizations for loading multiple entities via query flags.
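
As an illustration of such a query-scoped override, a Criteria query could request join fetching for just one use case (a sketch; note that this FetchMode is org.hibernate.FetchMode, not the annotation enum used above):

List<Company> companies = session.createCriteria(Company.class)
        .setFetchMode("departments", FetchMode.JOIN)
        .list();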

Hibernate or JPA API?

Usage of the JPA 2 API is usually promoted in favor of Hibernate’s own Session API. Even the Hibernate team itself pushes the adoption of the JPA API (http://www.theserverside.com/news/2240186700/The-JPA-20-EntityManager-vs-the-Hibernate-Session-Which-one-to-use).

In one of our projects we migrated from the Hibernate Session API to the JPA API, and the biggest pain points were the methods for persisting entities and the related cascading operations. Queries were easy to port since they were already handled by Querydsl, which supports both APIs.
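
For example, the same Querydsl query body works against both APIs (a sketch assuming a generated QPerson metamodel and the Querydsl 3 JPA module):

QPerson person = QPerson.person;

// via the Hibernate Session API
List<Person> viaSession = new HibernateQuery(session)
        .from(person)
        .where(person.name.startsWith("A"))
        .list(person);

// via the JPA API
List<Person> viaEntityManager = new JPAQuery(entityManager)
        .from(person)
        .where(person.name.startsWith("A"))
        .list(person);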

The main methods in Hibernate for persisting objects are session.save() and session.update(); in JPA they are entityManager.persist() and entityManager.merge(). Hibernate additionally provides the saveOrUpdate() convenience method, which uses either save or update depending on the state of the object. A saveOrUpdate() equivalent is not available in the JPA API and entityManager.merge() is not a direct replacement for session.update(), so replacing saveOrUpdate() usage is not a trivial job when migrating to the JPA API.

The challenges we faced were:

  • no saveOrUpdate() replacement is available, and our own implementation proved not to be fully compatible with the previous behaviour
  • Hibernate cascade annotations didn’t work reliably anymore when EntityManager persistence methods were used.

We ended up replacing the Hibernate cascade annotations with JPA cascades and replacing saveOrUpdate() calls with a custom persistOrMerge method which uses the JPA API.
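
A minimal sketch of such a helper (hypothetical; the caller passes the value of the entity’s id property so that the method can decide between persist and merge):

public <T> T persistOrMerge(EntityManager em, T entity, Object id) {
    if (id == null) {
        em.persist(entity);   // new instance: register it with the persistence context
        return entity;
    }
    return em.merge(entity);  // detached instance: copy its state onto a managed copy
}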

Having a strong DAO/Service division helps with migrating from one persistence API to another, but since the transaction boundary is often at the Service layer, the persistence API behaviour often leaks into the service layer.

For a more in-depth discussion of this problem, read this blog post: http://blog.xebia.com/2009/03/23/jpa-implementation-patterns-saving-detached-entities/

2013-10-31

Querydsl commercial offers

We are frequently asked about our future plans regarding Querydsl and in which direction we intend to take it commercially. With the recent announcement that the jOOQ project is changing its licensing model from a pure OSS licensing model to dual licensing, our users are wondering if we might follow suit.

We have no intention of changing the licensing model of Querydsl and will keep on using a pure open source license for the foreseeable future. We feel that middleware should be transparent, and open source is one of the best models to guarantee both transparency and an interactive development model.

Our commercial model for Querydsl is open source software and commercial services. Querydsl is free to use and you will get free public support for it, but for private support, bigger additions and closed source extensions we are available for consulting and development at reasonable hourly rates.

Training

Querydsl training is available in packages and can also be customized for special needs. The available packages are:

Introduction

One day of training to get an overview of the general functionality of Querydsl, with usage examples for its most popular modules, JPA and SQL.

Querydsl & JPA/Hibernate

Two days of training to get an overview of Querydsl and how it is best used with JPA, including best practices for service/DAO design with Querydsl.

Querydsl & SQL

Two days of training to get a general overview of Querydsl and how it is best used with JDBC/SQL.

Project development

Development services for projects involving Querydsl and related technologies are also available. We have worked with small startups and big global corporations to provide solutions using open source Java and Web technologies.

See our references for some examples of projects we’ve done in the past.

In addition, we have consulted for a major Finnish software company on Hibernate performance problems, and later given training on Hibernate and Querydsl usage to their development team in St. Petersburg.

Extensions

Querydsl extensions in both open source and closed source form can be provided by us via development projects.

Examples of such extensions are:
  • support for SQL/NoSQL technologies that are not yet supported by Querydsl (e.g. DB2 or Cassandra)
  • higher level abstractions for Querydsl that fit into your IT infrastructure
  • integration of Querydsl into other middleware frameworks or libraries

Contact us at querydsl@mysema.com if any of these options sound attractive to you. And if you haven’t yet participated in the Querydsl user survey, please do so.

2013-10-07

HelMet integration for Linked Events

The most recent addition to Linked Events, the event information aggregator developed by Mysema, is the integration of events taking place at libraries throughout the Helsinki region. This information is kindly provided by HelMet.

Adding new sources of data to an existing system is always a challenge. For example, the format of the new data might be incompatible with the existing data. This is true for event information in particular since there isn’t any generally accepted format, although some work is being done in that direction, see for example schema.org, EventsML or CitySDK. Other difficulties can arise from incompatibilities at the API level - even though a machine-friendly API is provided, it isn’t necessarily cut out for all possible use cases that application developers might come up with.

Open data has become a priority for the public sector in many countries. The idea is enticing - make your data open and let people innovate with it, developing applications that the public sector itself does not have the resources to implement. New York has NYC BigApps, Helsinki loves developers and there is even an Apps4Pirkanmaa competition, to name just a few. The HelMet case is a good example of a public organization’s willingness to expose data for public use - information about events taking place in HelMet’s network of libraries is provided using a machine-friendly API, which we decided to leverage in the Linked Events portal.

The challenges of open data

Despite the easy-to-use API, we still encountered a few problems while integrating our new data source, problems that probably are rather typical for developers trying to leverage open data. In our case, several vital pieces of information were missing from the event data that the API provides - location information, translations and so on. All of these issues would certainly be possible to fix, but it would probably take a long time for the data provider, a large public sector organization, to go through the hoops of procurement, IT development outsourcing and so forth. Because of time constraints, we ended up scraping a website for the missing pieces of information, an unnecessarily brittle step in what could otherwise have been a very simple integration.

Developers of applications that utilize open data are usually individuals or small, agile companies. When the providers of the open data are large organizations that cannot respond quickly to developers’ needs, there is going to be friction. Here are some tips for organizations that want developers to be able to smoothly leverage their open data sets:

  • Prefer open source components and universal data formats for developing APIs. This avoids provider lock-in and makes maintenance easier.
  • Outsource carefully and have someone in your organization with sufficient technical know-how oversee the project. Facilitate knowledge transfer by bringing consultants in-house and having them collaborate with your own people instead of doing development off-site.
  • Designate someone to be the “API maintainer”. This person should coordinate development, answer questions from developers, monitor API usage and watch out for potential problems. Provide this person’s contact information to developers. 
  • Open a public issue queue on, for example, GitHub. This provides a channel for developers to report issues, to discover what problems other developers have run into and to see what progress has been made on resolving them.
  • Transparency is key - provide documentation, examples, a list of known bugs and so on. Consider open-sourcing your API. 

I think the push for making data open and accessible is a commendable effort by the public sector. There have already been many beautiful applications built using open data - take, for example, BlindSquare or our own Linked Events aggregator. To make open data even more useful, data providers should seek to become more agile and responsive to developers’ needs. A public-facing API can’t be implemented and then forgotten about; it is something that needs to be supported and refined continuously.

2013-09-25

Querydsl in Finnish eDemocracy and eServices projects

Querydsl, our open source database query library, is used in multiple business sectors to make querying databases easier and more productive. Besides various private sector projects, Querydsl is used in at least two large-scale public sector projects of the Action Programme on eServices and eDemocracy (SADe) of the Finnish Ministry of Finance here in Finland.

The main goal of the SADe programme is to develop public services for individuals, companies and other entities. These customer-oriented and interoperable services are meant to increase the quality and efficiency of public sector services. The SADe programme consists of eight projects, which are managed by the Ministry of Finance and other ministries.

Querydsl is used in at least two of these projects (source code will be open by the end of this year):

In the Finnish eParticipation Environment of the Ministry of Justice, or more specifically in the Kansalaisaloite.fi portal, Querydsl is used on top of SQL databases. We have found that handwritten SQL is usually hard to maintain, error-prone, and slow. Querydsl basically solves all of these problems.
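
As a rough illustration of what querying looks like with Querydsl SQL instead of handwritten SQL strings, here is a sketch along the lines of the Querydsl SQL documentation (assuming a generated QCustomer metamodel class and an open JDBC connection):

QCustomer customer = new QCustomer("c");
SQLTemplates templates = new MySQLTemplates(); // dialect-specific templates
SQLQuery query = new SQLQuery(connection, templates);
List<String> lastNames = query.from(customer)
        .where(customer.firstName.eq("Bob"))
        .list(customer.lastName);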

Learners’ Online Services of the Ministry of Education and Culture is also using Querydsl. Their use case is a bit different: they have both SQL and NoSQL databases, and Querydsl keeps the developers productive when switching between the two thanks to its consistent query API. Mysema has provided consulting to the Learners’ Online Services developers on how to get the most out of Querydsl.

We used Querydsl on top of Apache Lucene when developing Suomen Museot Online. This site provides a browsable database of almost 200,000 museum items. It gathers data from nearly 50 Finnish museums and makes the data available to Europeana.

If you are interested in Mysema’s consulting services, please contact sales@mysema.com.

2013-09-03

The European Union’s data protection legislation is being reformed

(This post was originally published in Finnish)

Thanks to Edward Snowden, information security and the protection of personal data have recently become topical subjects. The scope of the United States’ PRISM programme has of course come as a surprise, but it does not directly affect how we design and implement software here in Finland. Another, much more significant matter, which at least so far has not received much publicity, is the European Union’s new data protection legislation.

The new data protection regulation is currently being drafted in the European Union and may come into force in roughly two years. As a regulation it will automatically take effect in the member states after a certain transition period, unlike for example a directive, which a member state can still adapt before it enters into force. In Finland the new regulation will replace the Personal Data Act of 1999.

The discussion around the new data protection legislation has so far focused on large companies, for example Facebook and Google, which process vast amounts of personal data. It should be remembered, however, that the new legislation will apply to all parties, companies and organizations that handle personal data in any way. It concerns the Finnish state, municipalities, parishes and even video rental shops just as much as international Internet companies. Under the new regulation, negligence in data protection matters can lead to heavy fines.

Complying with the new regulation requires that the protection and proper use of personal data are taken into account from the very beginning in the planning and implementation of software projects. Data protection is of course not a new requirement in software development projects, but the technical challenges will increase considerably with the new regulation. It has been proposed, for example, that a person whose data is used in a system should be able to have all of their own data removed. It must also be possible to export the data from the system “in a commonly used format” to ease migration from one service provider to another. A company the size of Facebook has enough resources to implement such features in its systems. For a small municipality’s building committee or a music school, however, the new requirements can be very challenging.

The new regulation also creates new obligations for organizations that process personal data. For example, potential data protection breaches must be reported to the data protection authority and to the users. The organization must be able to show that personal data is not stored without a purpose and that it is deleted automatically when it is no longer needed.

It is likely that the requirements of the new data protection regulation, especially the technical ones, will come as a surprise to many purchasers of information systems. At Mysema we follow the development of data protection legislation closely and, when necessary, raise the issue well in advance so that our customers can be sure they comply with the applicable legislation.

The goal of the data protection regulation proposed by the European Union is to create a culture in organizations where the processing and protection of personal data is considered more broadly and thoroughly than today. Organizations that take the regulation’s requirements into account well in advance will have a head start on the competition.

2013-08-30

clojuTRE 2013

We attended clojuTRE 2013 this week and it was brilliant, thanks to Metosin and Nitor for organizing.

The presentations:

  • ZenRobotics, who can probably claim to have deployed Clojure into production for the longest here in Finland, had Jouni Seppänen and Joel Kaasinen talk about the challenges with resource handling and how to integrate with OpenCV.
  • Tero Parviainen gave a talk on Pedestal, an über hipster web development framework by Relevance. Tero's opinion was that it is still evolving and probably not something you would use for customer projects just yet.
  • International Clojure superstars Sam Aaron, Christophe Grand, Edmund Jackson and Meikel Brandmeyer from Lambda Next gave a presentation on core.async. The integration of a technical presentation with a comedy act was hilarious. For this topic it worked perfectly, even though there were a few deadlocks.
  • Last but not least was a weapons demonstration by Jari Länsiö from Metosin. This super enthusiastic performance showed how to control an (almost) military grade drone with Clojure.

All in all the presentations were excellent and easily matched those at more established conferences.

After-party

After the official program most of us headed to the after-party. It was very interesting to share thoughts with other Finnish Clojurians on how to convince customers that Clojure and its ecosystem are production ready.

Final thoughts

We deployed our first Clojure project over a year ago, and we have luckily been successful in proving to our customers that Clojure is a viable option, so that the majority of our new projects are now Clojure-based. We found that some of the best arguments for Clojure are development speed, a clear paradigm and a consistent style. The last two lessen vendor lock-in, which is always a worry for the customer.