Why I finally ditched Hibernate native APIs for JPA

If you're looking for the short answer, go check out the Spring Data JPA project.  This is an incredible product that offers a tremendous productivity boost for projects using JPA.  If you're interested in the more gory details, read on below :)

I've been a Hibernate user since 2005 and have used JPA + Hibernate Annotations since 2006 when 1.0 of the JPA spec was released.  Unlike many others I did not immediately jump to the JPA APIs (EntityManager, PersistenceContext, etc) and continued using the native Hibernate APIs (Session, SessionFactory, etc).  JPA was still missing quite a few useful features such as a Criteria API and I wasn't ready to give that up just to use a "standard API."  When JPA 2.0 was released in late 2009 the feature sets of the two products were generally comparable making the decision a little tougher.  But since I've never really bought into the "vendor portability" promise of JPA, I continued happily on with native Hibernate APIs to much success.

Over the past month or two, I've come to the decision it's time to fully embrace JPA.

JPA has grown beyond its original purpose as an object-relational mapping framework into a more generic persistence API.  NoSQL / data grid solutions have become incredibly important and popular over the past few years.  Several JPA-based implementations for these solutions have already been developed, including Google App Engine / Big Table and Hibernate's own Object/Grid Mapper (OGM).  I started to experiment with GAE about a month ago and was surprised how quickly I could be productive with its JPA implementation.  While JPA likely isn't the best fit for the diverse range of NoSQL implementations out there, the ease of use for JPA developers is undeniable.

While industry trends are important, I've finally found the killer app for JPA: the Spring Data JPA project.  At its core, SDJ is about generating JPQL at runtime so you don't have to write tedious queries.  Some of the awesome features include:

  • Out-of-the-box support for data pagination and sorting.
  • Query creation from method names.  A method signature of findByEmailAddressAndLastName(String emailAddress, String lastName) generates a backing query that does exactly what you'd expect.
  • Specification API to define and combine predicates in a manner similar to the Criteria API.
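To give a feel for what "generating JPQL from method names" means, here's a toy sketch of the concept in plain Java.  This is emphatically not Spring Data JPA's actual parser (which handles many more keywords and edge cases); it just shows how a finder method name can be mechanically translated into a query:

```java
// Toy illustration: derive a JPQL query string from a finder method name.
// NOT Spring Data JPA's real implementation, just the core idea.
public class QueryDerivation {

    static String deriveJpql(String entity, String methodName) {
        // Strip the "findBy" prefix, then split the remainder on "And"
        String criteria = methodName.replaceFirst("^findBy", "");
        String[] props = criteria.split("And");

        StringBuilder jpql = new StringBuilder("select e from " + entity + " e where ");
        for (int i = 0; i < props.length; i++) {
            // Lower-case the first letter to recover the bean property name
            String p = Character.toLowerCase(props[i].charAt(0)) + props[i].substring(1);
            if (i > 0) {
                jpql.append(" and ");
            }
            jpql.append("e.").append(p).append(" = ?").append(i + 1);
        }
        return jpql.toString();
    }

    public static void main(String[] args) {
        System.out.println(deriveJpql("User", "findByEmailAddressAndLastName"));
        // select e from User e where e.emailAddress = ?1 and e.lastName = ?2
    }
}
```

Spring Data JPA does this kind of derivation (plus parameter binding, pagination and more) for you at runtime, against your actual entity metamodel.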

Check out the reference documentation for Spring Data JPA.  The project just dropped its first 1.0 release candidate.  I'm excited to see what they'll come up with in future releases.

HFCD for Flash Builder: Build Your Flex App 2-3x Faster

HFCD is an extension for Flash/Flex Builder that delegates compilation of your Flex application to a special "compiler daemon" which can run locally or on a remote machine.  The goal of the project is simple: faster builds!  HFCD is the brainchild of Clement Wong, the former compiler engineering lead on the Flex SDK team.  Here are a few useful things to understand about HFCD:

  • HFCD installs as a Flex Builder plugin which will delegate compilation to a separate OS-level process running either locally or on a remote machine.
  • The compiler daemon process is persistent, meaning it continues to run across multiple builds.  This allows the Java virtual machine to optimize compilation execution each time that a build is run.  The JVM is *very* good at this.
  • The Flex Builder plugin watches for file modifications and immediately pushes these changes to the compiler daemon process.  The daemon has an internal representation of the project file system and will launch internal incremental builds automatically when files change.

So, how fast is it really?  I benchmarked HFCD on two different machines.  I used the Flex 3.4.1 SDK for compilation and ran clean builds of my application each time.  My test project was a real-world Flex app currently in development consisting of about 15 modules and 350 MXML files and ActionScript classes.

2006 Intel Macbook Pro, Core Duo 2.16 GHz, 2 GB RAM, 7200 RPM HD, Leopard 10.5, 32-bit Java 5

  • Stock: 135 seconds average
  • HFCD (1st run): 155 seconds
  • HFCD (successive): 75 seconds average

Intel Core i7 920 @ 3.2 GHz, HT off, 12 GB RAM, dual 7200 RPM HD's in RAID 0, Vista 64-bit, 32-bit Java 6

  • Stock: 65 seconds average
  • HFCD (1st run): 52 seconds
  • HFCD (successive): 21 seconds average (!!!)

As indicated in the documentation, the performance of HFCD increases dramatically after the first build due to the numerous optimizations in HellFire and the JVM itself.  The Macbook was nearly 2x faster while the Windows box was just over 3x faster.  Very impressive and a real time saver!

I hope to post some new benchmarks soon.  I need to do some more research to get HFCD running on a 64-bit JVM as that isn't supported out of the box.  Also I'd like to configure my Macbook to delegate the compilation to my Windows box, especially to ascertain what kind of impact the network topology has on build performance.

Mixing and Matching Spring JdbcTemplate and HibernateTemplate

The JdbcTemplate and HibernateTemplate convenience classes from Spring really make working with the respective APIs a breeze. Unfortunately getting both of these classes to work together within a single Transaction is not straightforward. This comes up very frequently in JUnit tests where you want to verify Hibernate is working with the database in the way you expect, either by inserting data and letting Hibernate load it or by checking to see that Hibernate creates the data you expect. The same will hold true in application code where you need to add JDBC code alongside Hibernate code to meet various requirements. The testing scenarios are simple and illustrative so let's explore those.

One common use case is to persist an object with HibernateTemplate and then verify the data was inserted correctly using JdbcTemplate. Usually Hibernate will not flush the data out to the DB until the transaction commits, meaning that the query done by JdbcTemplate won't be able to see the new data. This one isn't hard to work around: just call HibernateTemplate.flush() to execute the SQL on demand so that subsequent calls to JdbcTemplate will see the new data.
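In test code, the pattern looks roughly like this (a sketch, not a complete test class; the hibernateTemplate and jdbcTemplate fields and the Account mapping are hypothetical):

```java
// Save via Hibernate, then verify via straight JDBC in the same transaction.
hibernateTemplate.save(new Account("alice"));

// Force Hibernate to issue the INSERT now rather than at commit time,
// so the JDBC query below can observe the row.
hibernateTemplate.flush();

int rows = jdbcTemplate.queryForInt(
        "select count(*) from account where username = ?", "alice");
assertEquals(1, rows);
```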

The second use case is a lot trickier: let's say you want to create some data with JdbcTemplate and then make sure that calls to HibernateTemplate will see that data. By default this will not work. You can actually insert with JdbcTemplate, make a call to load the data with HibernateTemplate (it won't find it) and then make another call to JdbcTemplate which will show that the data is there. The problem is that since JdbcTemplate is injected with a DataSource it doesn't really have any knowledge of the transactions from HibernateTransactionManager; thus operations from the two templates are isolated from one another.

Fortunately Spring offers a solution in the TransactionAwareDataSourceProxy class. Just like the name implies, this class acts as a wrapper for an existing DataSource so that all collaborators will participate in Spring-managed transactions. Configuration of this class is trivial:

<bean id="dataSource" class="org.springframework.jdbc.datasource.TransactionAwareDataSourceProxy">
    <property name="targetDataSource">
        <bean class="com.mchange.v2.c3p0.ComboPooledDataSource" destroy-method="close">
            ...
        </bean>
    </property>
</bean>

Note: you may or may not want to define the "real" DataSource as an inner bean that doesn't get registered in the ApplicationContext itself. If you are autowiring your DataSource purely by type, having two different implementations of DataSource will be a problem for you. Workarounds include autowiring using @Qualifier or using @Resource to inject the bean by name.
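With the proxy in place, the earlier failing scenario works as expected. A sketch of what that looks like in a test (field names and the Account mapping are hypothetical):

```java
// Insert via plain JDBC...
jdbcTemplate.update(
        "insert into account (id, username) values (?, ?)", 1L, "bob");

// ...and Hibernate now sees the row, because both templates are working
// against the same Spring-managed transaction and connection.
Account found = (Account) hibernateTemplate.get(Account.class, 1L);
assertNotNull(found);
```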

On Hibernate, Spring, Sessions and Transactions

I was recently working with Spring and Hibernate on a pet project and ran into some issues with Session and Transaction management that proved to be pretty interesting in the end. The following assumes a working knowledge of Hibernate and Spring...

I was in the midst of writing some JUnit 4.x tests using SpringJUnit4ClassRunner and the @TransactionConfiguration / @Transactional annotations for automatic rollback of @Test methods. I wanted to do some manipulation of the database prior to my tests using a separate method annotated with @Before. What I was reminded of very quickly is that Spring's class runner will not apply a transactional aspect to this method since it's not actually a @Test. This isn't a problem if you are using HibernateTemplate / HibernateCallback, since it ultimately has a reference back to your TransactionManager to handle transactions. But if you want to work with the raw Hibernate APIs it can be problematic.

There are two things to keep in mind: (1) SessionFactory.getCurrentSession() will only work if you have configured the SessionFactory appropriately, and (2) depending on the configuration, you may have to manage Transactions explicitly. The configuration property in question is "hibernate.current_session_context_class" and it is commonly configured one of three different ways:

1. Omitted from the configuration

Hibernate will throw an exception on calls to getCurrentSession() complaining that there is no CurrentSessionContext configured.

2. Configured with 'thread'

hibernate.current_session_context_class=thread

Hibernate will bind the Session returned from getCurrentSession() to the current thread and you must manage transactions programmatically. Generally all that's required is to call Session.beginTransaction(). You can also invoke Transaction.commit() or rollback() if you wish.
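For example, a @Before method working against the raw APIs under the 'thread' setting would look something like this (a sketch; sessionFactory is assumed to be injected, and Account is a hypothetical mapped class):

```java
// With hibernate.current_session_context_class=thread, the Session is
// bound to the current thread but transactions are managed by hand.
Session session = sessionFactory.getCurrentSession();
session.beginTransaction();

session.save(new Account("carol"));

// Committing flushes the pending SQL and closes the thread-bound session.
session.getTransaction().commit();
```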

3. Configured with SpringSessionContext

hibernate.current_session_context_class=org.springframework.orm.hibernate3.SpringSessionContext

Hibernate will assume it is executing inside of a Spring transactional context (i.e. through a Spring transactional aspect) and Spring will now manage your transaction for you. However if you call getCurrentSession() outside of such a context, Hibernate will throw an exception complaining that no Session is bound to the thread.

What does all this mean?

  1. Use SpringSessionContext if your operations will be done through classes that are invoked through a Spring-managed transactional context or if you can introduce HibernateTemplate and/or HibernateCallback wherever you need it.
  2. Use "thread" if you need to work with the raw Hibernate Session/Transaction API and remember that you'll need to manage transactions programmatically.

Book Review: The Definitive Guide to Terracotta

Terracotta is a “transparent clustering technology” that allows you to make data structures available across a cluster of machines in a highly-scalable and robust manner. Unlike many other clustering solutions (including the very popular memcached), it doesn’t expose an API that a developer leverages to push data structures in and out of a big distributed container. Rather, it’s a library that’s bootstrapped into your JVM while the behavior is driven by an XML config file. This allows for sharing data in the fields of a class across the cluster as well as synchronized access to objects, just like in any multithreaded application. Terracotta is able to do this through some very interesting decoration of bytecode as Java classes are loaded into the JVM. What this ultimately allows for is something like a large memory heap shared by all JVMs which can survive JVM crashes since all data is also written to disk. Additionally, since Terracotta doesn’t use a peer-to-peer approach to data replication, it’s easier to achieve linear scalability.

Sound interesting? Learn more at the Terracotta web site.

This is an excellent book. The prose is well-written and engaging and the book flows very well from section to section. There are massive amounts of Java code and configuration such that you very rarely have to picture anything in your mind, you can just read it there on the page. There are helpful diagrams where appropriate. It’s unlikely that a reader with a good understanding of Java will become confused at any point during the book. It’s informative and provides some excellent examples of real world use, including chapters on integration with Spring, Hibernate, session replication and more. There is also an extensive chapter on using Terracotta to create a Master/Worker compute grid. If you’re looking to learn more about Terracotta I really can’t recommend this book enough: it helped fill so many gaps that I had after skimming some of the documentation and reading a few of the white papers.

The only real negative is that the book is slow to get started. The first two chapters (~40 pages) serve as an introduction to and history of the technology, respectively, but taken together, it’s just a very lengthy introduction that rehashes a lot of the same concepts. Maybe this was tedious for me given that I’d already read a lot of documentation on Terracotta but it just seemed like the intro could have been a bit shorter.

I’ll now run down the other chapters in the book and detail some that I found particularly interesting.

Chapter 3 is a quick jump into the framework and some tooling while Chapter 4 gets into the nitty-gritty details of POJO clustering. This is important to read to understand how Terracotta does what it does. Chapter 5 talks about how to do caching and this is where you start to understand the real world problems that can be solved using the tool. Your database will thank you!

Chapter 6 is where it gets really interesting. Here you will learn how to use Terracotta as a 2nd-level cache provider for Hibernate to significantly boost performance over using something like Ehcache. More startling than that is a proposed architecture where the notion of POJO clustering is used to cluster the data structures that hold detached objects (those not attached to an active Hibernate Session). You are shown how to change application code that uses the Hibernate API in a “typical” fashion to achieve performance increases measured in orders of magnitude. This is truly an eye-opener.

Chapter 7 shows you how to cluster HTTP Sessions and how you can be freed from some of the annoying restrictions of the Servlet container Session API, such as implementing Serializable and religiously using setAttribute(). This is the sort of thing you can plug into an existing application very quickly and realize enormous scalability gains.

Chapter 8 is about clustering Spring beans. Spring and Terracotta follow a very similar philosophy in that they are non-invasive frameworks. As such, they complement each other very nicely. This chapter shows how easy it is to cluster Spring beans: even easier perhaps than clustering POJOs as in Chapter 4. At this point, if you are a user of Spring and Hibernate, you’re starting to see how easy it can be to achieve serious scalability and performance improvements.

Chapter 9 talks about Terracotta Integration Modules, which are packages that provide additional features on top of the Terracotta core: this is how integration with Hibernate and Spring is achieved. Chapter 10 gives an extended treatment of thread coordination, showing how well-written multithreaded code can be used with Terracotta to achieve thread coordination across multiple JVMs. Chapter 11 takes this further to detail the Master/Worker pattern for computing grids. Chapter 12 rounds things out by showing the visualization tools that can be used to monitor and debug an app using Terracotta.

As I said before, this is a great book. If you’re interested in scaling out enterprise Java applications, you owe it to yourself to check out Terracotta and this book.