Using Apache Ignite as a Hibernate second-level cache
Introduction
In the Java world, Hibernate is a well-known ORM and JPA provider. In short, Hibernate facilitates database access, and it is an often-used solution for Java web applications. Each time a web request arrives, the database is queried and updated through this framework. Some optimisations are possible, which brings us to the subject of Hibernate caching.
Hibernate caching
Hibernate supports several levels of caching: the first-level cache (enabled by default), the second-level cache, and the query cache (both optional). Before discussing these levels, we need to say a little about Hibernate's architecture.
Full Hibernate documentation can be found here; the relevant point is the existence of a class, Session (or EntityManager, its JPA equivalent). Database access is done through this class.
The Session object is a single-threaded, short-lived object, usually associated with a session (for example a web session, mapping a web request). So, there is one Session per web request. The first-level cache is associated with this object. Once an item is retrieved from the DB, it is stored here. The next time the same object is requested in the same web session, it is retrieved from the first-level cache and not from the DB. But if a different web request arrives, or the same web request arrives later, the database will be called again. To prevent this behavior, the concept of a second-level cache was born.
While the first-level cache is implemented by Hibernate internally, the second-level cache is optional and is provided by a separate caching solution. Hibernate needs to be configured both to enable the second-level cache and to use a specific cache provider; how to do this is explained later in this article. On top of the second-level cache, a query cache can be added. The query cache is optional, and depends on the second-level cache.
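The lookup order described above can be illustrated with a toy model in plain Java. This is only a sketch of the idea, not Hibernate code: SketchSession, FakeDatabase and selectsFor are hypothetical names, the first-level cache is a map that lives and dies with one session, and the second-level cache is a map shared by all sessions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Toy model of the lookup order: session cache, then shared cache, then database. */
final class CacheLevelsSketch {

    /** Stands in for the database and counts how often it is queried. */
    static final class FakeDatabase {
        final AtomicInteger selects = new AtomicInteger();

        String selectItemName(long id) {
            selects.incrementAndGet();
            return "item-" + id;
        }
    }

    /** Hypothetical stand-in for a Hibernate Session: one instance per web request. */
    static final class SketchSession {
        private final Map<Long, String> firstLevel = new HashMap<>(); // discarded with the session
        private final Map<Long, String> secondLevel;                  // shared by all sessions, may be null
        private final FakeDatabase db;

        SketchSession(Map<Long, String> secondLevel, FakeDatabase db) {
            this.secondLevel = secondLevel;
            this.db = db;
        }

        String find(long id) {
            String value = firstLevel.get(id);
            if (value != null) {
                return value; // first-level hit
            }
            if (secondLevel != null) {
                value = secondLevel.get(id); // second-level lookup
            }
            if (value == null) {
                value = db.selectItemName(id); // cache miss: go to the database
                if (secondLevel != null) {
                    secondLevel.put(id, value);
                }
            }
            firstLevel.put(id, value);
            return value;
        }
    }

    /** Simulates web requests, one session each, and returns the number of database selects. */
    static int selectsFor(int requests, boolean useSecondLevel) {
        FakeDatabase db = new FakeDatabase();
        Map<Long, String> shared = useSecondLevel ? new ConcurrentHashMap<>() : null;
        for (int i = 0; i < requests; i++) {
            SketchSession session = new SketchSession(shared, db);
            session.find(4L);
            session.find(4L); // same session: served by the first-level cache
        }
        return db.selects.get();
    }

    public static void main(String[] args) {
        System.out.println("selects without second-level cache: " + selectsFor(2, false));
        System.out.println("selects with second-level cache: " + selectsFor(2, true));
    }
}
```

In this model, two requests for the same id cost two selects when only the per-session map exists, and a single select once the shared map is in place.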
A demo application was created for illustrating the concepts described in this article.
Demo application
The demo application, named ElectronicStore, manages an electronic store. It’s a REST API that retrieves data from the following database:
The database models an electronic store. The store has several branches, saved in the stores table (with id, city, address). Items to be sold are saved in the items table (with id, serial number, name, description). Each item can have a review, saved in the reviews table (with id, item_id (the item for which the review is created), nr_stars, comment). To indicate that an item can be found in a specific branch, a further table was created, items_location (with id, item_id (the item placed in the branch), store_id (the branch where the item is placed)).
There cannot be an electronic store without a logo, so here we have:
Project setup
The project uses a PostgreSQL database, is implemented with the Spring Framework, and uses embedded Jetty as the web server. Maven is the build tool. A new database was configured as in the following tutorial; then, the tables were created using this script. Because Jetty runs in embedded mode, the build produces a .jar file that can be run easily. Here is the project repository. As you can see in the repository, we have several controllers; one of them saves items, updates items, or retrieves them by id. Of course, each request that retrieves an item by id has to access the DB. To see what DB calls the application makes, we will set the property hibernate.show_sql to true.
The project exposes the following API:
GET /item/{id}, retrieves an item according to its id. Example:
POST /item/save, saves an item. Example:
PUT /item/update, updates an item. Example:
GET /store/{id}, gets a store by id. Example:
POST /store/save, saves a store. Example:
PUT /store/update, updates a store. Example:
GET /store/list/{city}, lists all branches from a city. Example:
GET /review/{id}, gets a review by id. Example:
POST /review/save, saves a review. Example:
GET /review/all/item/{itemId}, gets all reviews for a given item. Example:
PUT /review/update/commentsbyrating, updates comments given a rating; used for illustrating cache invalidation when data is updated. Example:
PUT /review/update/commentsbyrating/native, updates comments given a rating, this time using a native query; used for illustrating cache invalidation when data is updated. Example:
To better understand the need for a second-level cache, let’s first run the project without caching. We will access the ‘GET /item/{id}’ endpoint twice, each time with the same id.
At the first access, we will see the following in the logs:
So, as we can see, the database was hit. And, at the second call, we can see the following:
Unsurprisingly, the database is hit again. Two web requests that need the same data and arrive almost at the same time each make a database call. The second-level cache exists to avoid exactly this.
Configure second-level cache
To use second-level cache, we must configure Hibernate to use second-level cache, and integrate a caching solution.
To configure Hibernate to use second-level cache, we just have to set several properties:
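As a minimal sketch, the properties involved could be assembled in Java as shown below. The property names are standard Hibernate and JPA settings, but the class name SecondLevelCacheSettings is hypothetical, and how the properties reach Hibernate (Spring configuration, persistence.xml, and so on) is an assumption that depends on the project setup. We assume the ENABLE_SELECTIVE shared cache mode here, which caches only the entities explicitly marked as cacheable.

```java
import java.util.Properties;

/** Hypothetical helper that collects the Hibernate settings discussed in this section. */
final class SecondLevelCacheSettings {

    static Properties hibernateProperties() {
        Properties props = new Properties();
        // Turn the second-level cache on.
        props.setProperty("hibernate.cache.use_second_level_cache", "true");
        // Assumption: cache only entities that are explicitly marked as cacheable.
        props.setProperty("javax.persistence.sharedCache.mode", "ENABLE_SELECTIVE");
        // Log SQL statements so that cache hits and misses are visible in the logs.
        props.setProperty("hibernate.show_sql", "true");
        return props;
    }

    public static void main(String[] args) {
        hibernateProperties().forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```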
For each entity that we want to cache, we will have to use the @Cacheable and @Cache annotations. For example:
The @Cache annotation specifies a cache concurrency strategy. More details can be found here or here.
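For illustration, a cached entity might be mapped as in the fragment below. It is a sketch under assumptions, not the demo project's actual class: the field list is reduced, the READ_WRITE strategy is only one of the available options, and the fragment needs the JPA and Hibernate libraries on the classpath to compile.

```java
import javax.persistence.Cacheable;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Table;

import org.hibernate.annotations.Cache;
import org.hibernate.annotations.CacheConcurrencyStrategy;

// Sketch of a cacheable entity; the real Item class in the demo project may differ.
@Entity
@Table(name = "items")
@Cacheable                                          // JPA: this entity may be stored in the shared cache
@Cache(usage = CacheConcurrencyStrategy.READ_WRITE) // Hibernate: concurrency strategy (assumed choice)
public class Item {

    @Id
    private Long id;

    private String name;

    private String description;

    protected Item() {
        // no-argument constructor required by JPA
    }

    public Long getId() {
        return id;
    }

    public String getName() {
        return name;
    }

    public String getDescription() {
        return description;
    }
}
```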
Hibernate stores cached entities using the id as the key. So, repeated calls to entityManager.find(Item.class, id) for the same id will retrieve the item from the DB the first time, and from the cache afterwards.
In addition, we can configure query cache too. For this, we need to configure Hibernate:
And, we have to inform Hibernate about each query that we want to cache. For example:
Query caches are useful for queries that are frequently executed with the same parameter values, over entities that rarely change. Our example is a good fit for the query cache: we search for branches by city, this search is used quite frequently, and it is quite likely that different requests will ask for all stores in the same city. Also, we very seldom introduce new branches or change their address. Once a store query is cached, the cached result is invalidated each time we add or update a store.
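A hedged sketch of the two pieces involved is shown below: the Hibernate property that enables the query cache, and the hint that marks an individual query as cacheable. The class QueryCacheSettings is hypothetical; org.hibernate.cacheable is Hibernate's hint name, and we assume the hint map would be applied to a JPA query through Query.setHint(name, value) or a @QueryHint on a named query.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Properties;

/** Hypothetical helper that collects the query cache settings. */
final class QueryCacheSettings {

    /** Hibernate's query hint that marks a single query as cacheable. */
    static final String CACHEABLE_HINT = "org.hibernate.cacheable";

    static Properties hibernateProperties() {
        Properties props = new Properties();
        // The query cache depends on the second-level cache, so both are enabled.
        props.setProperty("hibernate.cache.use_second_level_cache", "true");
        props.setProperty("hibernate.cache.use_query_cache", "true");
        return props;
    }

    /** Hints to apply to each query whose result should be cached. */
    static Map<String, Object> cacheableHints() {
        return Collections.singletonMap(CACHEABLE_HINT, Boolean.TRUE);
    }

    public static void main(String[] args) {
        System.out.println(hibernateProperties());
        System.out.println(cacheableHints());
    }
}
```

Enabling the property alone is not enough: only queries that carry the hint are cached.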
Now that we have configured Hibernate, we need to integrate a caching provider. There are several cache solutions out there: Infinispan, Ehcache, Apache Ignite, Hazelcast, and so on. Since there are many caching libraries in the Java world, a standardization effort was made, which resulted in the JSR 107 (JCache) specification. Since Hibernate 5.2.0, a new Hibernate module, hibernate-jcache, is available for integrating a JCache provider as a second-level cache. After adding hibernate-jcache, we only need to include a JCache provider in our project and then configure Hibernate to use that provider:
We integrated Apache Ignite as a JCache provider. If we wanted to use, let’s say, Ehcache as a JCache provider, we would only have to use the following configuration:
This is the power of standardization: to integrate a different solution, we only have to change some configuration. Of course, it is not always this easy; for example, each solution may need its own provider-specific configuration.
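The provider switch can be sketched as follows. The class JCacheProviderSettings is hypothetical. The region factory class name shown is the one shipped with hibernate-jcache in Hibernate 5.2; we assume that version here, and newer Hibernate versions may expect a different value (for example the short name jcache), so check the documentation of the version in use.

```java
import java.util.Properties;

/** Hypothetical helper that selects the JCache provider used by Hibernate. */
final class JCacheProviderSettings {

    /** JCache provider class of Apache Ignite. */
    static final String IGNITE_PROVIDER = "org.apache.ignite.cache.CachingProvider";

    /** JCache provider class of Ehcache 3. */
    static final String EHCACHE_PROVIDER = "org.ehcache.jsr107.EhcacheCachingProvider";

    static Properties hibernateProperties(String providerClass) {
        Properties props = new Properties();
        // Region factory of hibernate-jcache (class name as of Hibernate 5.2, an assumption).
        props.setProperty("hibernate.cache.region.factory_class",
                "org.hibernate.cache.jcache.JCacheRegionFactory");
        // The JCache provider that hibernate-jcache should load.
        props.setProperty("hibernate.javax.cache.provider", providerClass);
        return props;
    }

    public static void main(String[] args) {
        System.out.println(hibernateProperties(IGNITE_PROVIDER));
        System.out.println(hibernateProperties(EHCACHE_PROVIDER));
    }
}
```

Only the provider class changes between the two setups; the region factory stays the same.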
Testing second-level cache
Now that we have configured the Hibernate second-level cache, we can test our configuration. We will start two instances of our ElectronicStore application. In two different terminal windows, execute the following (after the project has been compiled):
We started two instances of ElectronicStore, one running on port 8080 and the other on 8081. Without any other configuration, each instance has its own Apache Ignite node, running in server mode, and the nodes do not communicate with each other. So the cache is not shared between instances. This is what the cluster looks like now:
We can easily see this, first of all, when we start the instances: in the logs, we see messages like this:
To see this further, we will call the GET /item/4 endpoint twice on each instance. In the logs, we will see
for the first instance and
for the second one.
As we can see, cache works, but it is not shared between instances.
Luckily, Apache Ignite can run in cluster mode, so we can configure the Ignite cache of each instance to run in client mode and to connect to an Ignite instance running in server mode. To do this, we will write a configuration file for the Ignite cache, specifying that we want the cache to run in client mode:
Next, we will instruct Hibernate to use this configuration file:
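As a sketch, pointing Hibernate at the Ignite configuration file could look like the code below. The property hibernate.javax.cache.uri is the hibernate-jcache setting that is passed on to the JCache provider; the class name IgniteClientConfigSettings and the file name ignite-client.xml are hypothetical, and the real file name and location depend on the project.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

/** Hypothetical helper that tells hibernate-jcache where the provider configuration lives. */
final class IgniteClientConfigSettings {

    static Properties hibernateProperties(Path igniteConfigFile) {
        Properties props = new Properties();
        // The URI is handed to the JCache provider when the cache manager is created.
        props.setProperty("hibernate.javax.cache.uri", igniteConfigFile.toUri().toString());
        return props;
    }

    public static void main(String[] args) {
        // "ignite-client.xml" is a placeholder file name.
        System.out.println(hibernateProperties(Paths.get("ignite-client.xml")));
    }
}
```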
The application is ready. We also have to prepare the Apache Ignite server. For this, we will download Apache Ignite as specified here, unzip it, and open a terminal window in the folder where we extracted the archive. To start the Apache Ignite server:
Once the server is started we will see the following in logs:
Apache Ignite also has a command-line manager. To use it, we will open a new terminal window, go to the same folder, and start the manager:
Next, with the source code updated, we will recompile it and restart both instances. We will now see the following in the logs:
when we start the first instance and
when we start the second one. In the Apache Ignite server logs, we see the following:
So, when we started the ElectronicStore instances, the Ignite clients connected to the Ignite server.
Now, the cluster looks like this:
Now we are ready to repeat the experiment. We will see:
in the first instance’s logs and
in the second one. So, as we can see, the cached data is now shared between the two instances of ElectronicStore.
We can inspect the cache content with the console manager. Following the instructions, we will see this:
We can see both the key used for the cache (4 in our example) and the cached data, in a disassembled state (Keystone 16 GB stick., Keystone stick, s-103 U, version=null).
Let’s play with the cache some more. If we update the same item, will the cache be invalidated?
This is what we see in the logs:
So, we can see that the item was updated in the database and the cached entry was invalidated (as expected). We can also see with the console manager that the cache now holds the updated data. A further GET call confirms that the updated data is indeed retrieved from the cache, and not from the database.
Next, let’s play with the query cache a little. The endpoint GET /store/list/{city} uses a cached query behind the scenes.
We will hit the endpoint twice, with the same value of city each time. This is what we see in the logs:
Basically, the query result was cached, and only one select statement was performed. From the console manager, we can see that there is a query results cache region, where the ids of the results are stored.
Another experiment is to create a new store. We will see that the query cache is invalidated (as expected).
Another endpoint has a cached query behind the scenes: GET /review/all/item/{itemId}. First, let’s hit this endpoint to cache the result; then let’s update the reviews and see what happens to the stores and reviews query caches.
First, we will update using PUT /review/update/commentsbyrating; this updates the DB using an HQL query. We can see that the stores cache is still valid, but the reviews cache is invalidated.
Next, we will update using PUT /review/update/commentsbyrating/native; this updates the DB using a native SQL query. Now, the entire second-level cache is invalidated. Hibernate cannot know which entities are affected by a native query, so it invalidates everything.
To stop a native query from invalidating the entire cache, there is a solution implemented in Hibernate but not described by the JPA standard. The solution is described here: nativeQuery.unwrap(org.hibernate.SQLQuery.class).addSynchronizedEntityClass(Foo.class).
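The difference between the two update paths can be pictured with a toy model in plain Java. It is not Hibernate's implementation, only a sketch of the rule observed above: InvalidationSketch and its methods are hypothetical names, regions are named after tables for simplicity, an update that declares which regions it touches evicts only those, and an update that declares nothing evicts everything.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Toy model of region-based cache invalidation after a bulk update. */
final class InvalidationSketch {

    /** Region name (here: the table name) mapped to the cached entries of that region. */
    private final Map<String, Map<Object, Object>> regions = new HashMap<>();

    void put(String region, Object key, Object value) {
        regions.computeIfAbsent(region, r -> new HashMap<>()).put(key, value);
    }

    int size(String region) {
        Map<Object, Object> entries = regions.get(region);
        return entries == null ? 0 : entries.size();
    }

    /**
     * Applies the invalidation rule for a bulk update.
     * An empty set models a native query with no declared entities.
     */
    void bulkUpdate(Set<String> affectedRegions) {
        if (affectedRegions.isEmpty()) {
            // Nothing is known about the update, so every region is cleared.
            regions.values().forEach(Map::clear);
            return;
        }
        for (String region : affectedRegions) {
            Map<Object, Object> entries = regions.get(region);
            if (entries != null) {
                entries.clear();
            }
        }
    }

    public static void main(String[] args) {
        InvalidationSketch cache = new InvalidationSketch();
        cache.put("items", 4L, "Keystone stick");
        cache.put("reviews", 1L, "a review");

        // HQL update, or native query synchronized on the reviews entity.
        cache.bulkUpdate(Collections.singleton("reviews"));
        System.out.println("items=" + cache.size("items") + ", reviews=" + cache.size("reviews"));

        // Native query with no declared entities.
        cache.bulkUpdate(Collections.emptySet());
        System.out.println("items=" + cache.size("items") + ", reviews=" + cache.size("reviews"));
    }
}
```

In this model, declaring the affected entity on a native query plays the role of addSynchronizedEntityClass: it turns the second case back into the first.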
Conclusions
We described caching in Hibernate and the need for a second-level cache. We configured the second-level cache using hibernate-jcache and a JCache provider. Then, we performed some experiments with the second-level cache. We noticed that a native query invalidates the entire second-level cache (and the query cache), unless we use a Hibernate-specific solution that is not part of the JPA standard.