2013-11-21

Hibernate challenges: Entity persistence

Hibernate has been the most widely used persistence technology in our Java projects. We have optimized our Hibernate usage with our own tooling, such as Querydsl JPA for querying, and with best practices and conventions for other areas such as entity lifecycle handling.

In this blog post I’d like to address Hibernate issues related to entity persistence and loading. Some of these challenges we have faced ourselves in customer projects and others are issues our customers have come across.

Usage of the right session-level methods

Hibernate provides multiple methods for persisting entities, with slightly different semantics. To use these methods correctly it helps to understand the different states a Hibernate-managed entity goes through.

The starting point for new entities is the transient state. Transient entities are unknown to any Hibernate session and have no database identity attached. Transient instances can be made persistent via Session.save() and Session.saveOrUpdate(). save() assigns the database identity value to the entity's id property, but might not yet issue an SQL INSERT. Session.save() should be seen as the way to register an entity with Hibernate.

The persistent state is the state loaded and queried entities are in; transient entities also become persistent when saved via Session.save() or saveOrUpdate(). Persistent instances are tracked by the Hibernate session, and changes to them are written to the database at session flush time.

Persistent entities become detached via Session.evict(), Session.clear() and Session.close() calls. Detached entities do have a database identity, but are no longer attached to a running session. Detached entities are typically used when entities are rendered in the view phase of a web application after the session has been closed, or in long-running conversations spanning multiple sessions.

Persistent entities can also be deleted via Session.delete(). The deletion on the database level happens at session flush time.

There is also a path back from the detached state to the persistent state, via Session.saveOrUpdate(), Session.update() and Session.merge().
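The transitions above can be summarized in one place. This is a hedged sketch, not code from the original post; the Person class and an open session are assumed from the surrounding examples:

```java
Person person = new Person();  // transient: unknown to any session, no database identity

session.save(person);          // persistent: id assigned, instance tracked by the session

session.evict(person);         // detached: still has an id, but no longer tracked
person.setName("changed");     // this change is NOT tracked while detached

session.update(person);        // persistent again: the change will be written at flush time
// alternatively: Person managed = (Person) session.merge(person);
```

Note the difference on the way back: update() reattaches the given instance itself, while merge() copies its state onto a managed instance and returns that copy.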

A common problem with Hibernate session usage is that these methods are used in the wrong way, or unnecessary session-level calls are made. The following two cases are quite common examples of incorrect API usage.

It is quite common to see calls that persist entities which are already managed by Hibernate.

Person person = new Person();
session.save(person);
// modifications to person
session.update(person);

In this case the final update() call is a no-op, since the person instance is already persistent and its changes will be flushed automatically.

Another common antipattern is to control the session size with too fine-grained calls.

Person person = new Person();
// modifications to person
session.save(person);
session.flush();
session.clear(); // or session.evict(person)

The code above makes the person instance persistent, assigns an id, writes it to the database on flush() and detaches it on clear().
There are several problems with this approach:

  • JDBC batching of INSERTs and UPDATEs can't be utilized, since flushes happen too often
  • other entities are also removed from the session, which might break surrounding code; evict() helps in this case
  • the person instance is detached after this code, which needs to be taken into account in its further usage

Explicit flush and evict calls are often necessary for batch operations, but in other cases the implicit flush at transaction end should be relied on.
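For non-batch code it is usually enough to let the transaction commit trigger the flush. A sketch of that shape, assuming a sessionFactory configured elsewhere:

```java
Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();
try {
    Person person = new Person();
    session.save(person);  // person becomes persistent; no explicit flush needed
    // further modifications to person are tracked by the session
    tx.commit();           // implicit flush: pending INSERTs/UPDATEs are written here
} catch (RuntimeException e) {
    tx.rollback();
    throw e;
} finally {
    session.close();
}
```

In a Spring or Java EE managed environment the transaction handling is typically declarative, but the same principle applies: the flush happens at commit, not via manual flush() calls.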

Mixing session level and SQL level manipulations

Hibernate makes it possible to mix session-level operations on entities with direct updates to the database via DML statements. When these get mixed, things get complicated:

List<Customer> customers = session.createQuery("select c from Customer c").list();

String hql = "update Customer c set c.name = :newName where c.name = :oldName";
int updatedEntities = session.createQuery(hql)
        .setString("newName", newName)
        .setString("oldName", oldName)
        .executeUpdate();

for (Customer customer : customers) {
    // modifications to customers
}


In this case the modifications to the customers on the Java level override the modifications made via the DML statement: the entities were loaded before the bulk update and still carry the old state, so flushing them silently overwrites the DML changes, even if only other properties were modified in Java. The updates executed via the DML statement are also not reflected in the second-level cache.

A good solution to this problem is to keep DML statements and session-level operations in separate transactions. If that is not possible, one should be very careful not to produce inconsistent end results.

Efficient batch inserts and updates

The approach promoted in the Hibernate documentation for batch inserting entities via session-level methods is

for (int i = 0; i < 100000; i++) {
    Customer customer = new Customer(.....);
    session.save(customer);
    if (i > 0 && i % 20 == 0) { //20, same as the JDBC batch size
        //flush a batch of inserts and release memory:
        session.flush();
        session.clear();
    }
}

This is a good approach if used in an isolated session. If other things happen before and after this code in the same session, a safer approach could be

Set<Customer> batch = new HashSet<Customer>();
for (int i = 0; i < 100000; i++) {
    Customer customer = new Customer(.....);
    session.save(customer);
    batch.add(customer);
    if (batch.size() == 20) { //20, same as the JDBC batch size
        session.flush();
        for (Customer c : batch) {
            session.evict(c);
        }
        batch.clear();
    }
}

Note that the flush() must come before the evict() calls: evicting an entity cancels its pending insert. The same pattern applies to batch updates. Usage of session.clear() in a more complex context is always risky.
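The batch-update variant follows the same shape as the insert loop, with the entities coming from a scrolled query instead of being created; this is a sketch along the lines of the pattern shown in the Hibernate batch processing documentation:

```java
// Scroll through the customers instead of loading them all into memory at once
ScrollableResults customers = session.createQuery("from Customer")
        .scroll(ScrollMode.FORWARD_ONLY);
int count = 0;
while (customers.next()) {
    Customer customer = (Customer) customers.get(0);
    // modifications to customer
    if (++count % 20 == 0) { //20, same as the JDBC batch size
        session.flush();  // write the batch of UPDATEs
        session.clear();  // release memory; safe only in an isolated session
    }
}
```

As with inserts, session.clear() is only safe when nothing else in the session depends on the entities loaded so far.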

Efficient loading

Efficient loading of entities and collections involves tuning the loading behaviour when querying.

It is usually good practice to favor query-scoped optimization of loading and fetching instead of using annotations extensively, because this gives more flexibility and emphasizes use-case-specific optimizations over general behaviour.

Collections are loaded lazily by default, but they can be fetched in a single select via the fetch keyword:

select c from Company c
left join fetch c.departments

As an annotation inside Company it could be declared like this:

    @Entity
    public class Company {

        @OneToMany
        @Fetch(FetchMode.JOIN)
        private Set<Department> departments;
    }

It is usually good to define via annotations the desired behaviour for loading single entities via session.get() or session.load(), and via query flags the optimizations for loading multiple entities via a query.

Hibernate or JPA API?

Usage of the JPA 2 API is usually promoted in favor of Hibernate’s own Session API. Even the Hibernate team itself pushes the adoption of the JPA API http://www.theserverside.com/news/2240186700/The-JPA-20-EntityManager-vs-the-Hibernate-Session-Which-one-to-use.

In one of our projects we migrated from Hibernate Session usage to the JPA API, and the biggest pain points were related to the persistence of entities and the cascading of those operations. Queries were easily ported, since they were already expressed via Querydsl, which supports both APIs.

The main methods for persisting objects are session.save() and session.update() in Hibernate, and entityManager.persist() and entityManager.merge() in JPA. Hibernate additionally provides the saveOrUpdate() convenience method, which uses either save or update depending on the state of the object. A saveOrUpdate() equivalent is not available in the JPA API, and entityManager.merge() is not a direct replacement for session.update(), so replacing saveOrUpdate() usage is not a trivial job when migrating to the JPA API.

The challenges we faced were

  • no saveOrUpdate() replacement was available, and our own implementation proved not to be fully compatible with the previous behaviour
  • Hibernate cascade annotations didn't work reliably anymore when EntityManager persistence methods were used

We ended up replacing the Hibernate cascade annotations with JPA cascades, and replacing saveOrUpdate() calls with a custom persistOrMerge method built on the JPA API.
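Such a persistOrMerge helper could look roughly like the following. This is a reconstruction of the idea, not the actual project code; the Identifiable interface and the null-id check are assumptions about how the entities expose their identity:

```java
import javax.persistence.EntityManager;

// Hypothetical interface: our real entities exposed their identity differently.
interface Identifiable {
    Long getId();
}

public final class PersistenceHelper {

    public static <T extends Identifiable> T persistOrMerge(EntityManager em, T entity) {
        if (entity.getId() == null) {
            em.persist(entity);      // transient: schedules an INSERT, entity stays managed
            return entity;
        }
        return em.merge(entity);     // detached: state is copied onto a managed instance
    }
}
```

Unlike saveOrUpdate(), the merge() branch returns a different managed instance than the one passed in, so callers must continue working with the returned object; this was one of the incompatibilities we ran into.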

Having a strong DAO / Service division helps when migrating from one persistence API to another, but since the transaction boundary usually sits on the service layer, persistence API behaviour often leaks into it.

For a more in-depth discussion of this problem, read this blog post: http://blog.xebia.com/2009/03/23/jpa-implementation-patterns-saving-detached-entities/
