Monday, June 29, 2009

Terracotta's Hibernate Integration

This post is a re-post of my earlier write-behind post, but from a different perspective: Terracotta's Hibernate integration in 3.1.

With version 3.1, Terracotta has implemented its own cache provider for Hibernate second-level caching. Earlier, Terracotta's Hibernate integration approach was to cluster EHCACHE: with Terracotta's JVM clustering ability it was easy to cluster any POJO structure. So before 3.1 you might have used EHCACHE as the Hibernate second-level cache provider, with tim-hibernate and tim-ehcache to cluster the second-level cache. From version 3.1 onwards Terracotta has its own cache, backed by map-evictor and a concurrent string map. Apart from this, the new Hibernate integration has lots of additions, like a cache admin console and a read-write cache. The cache is always up-to-date and coherent.

But I feel the Terracotta platform is far more capable, and the following additional features could be added to make applications more scalable. These are just cool ideas.

Cache Warm-up feature
It would be a nice feature to refresh or load the cache whenever the application or application cluster starts up. This could easily be implemented with some sort of CacheLoader interface that Terracotta calls back when faulting cache objects from the Terracotta server during first access. But such a warm-up is only required on a full cluster restart; otherwise a lot of meaningful cache entries will get overwritten.
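To make the warm-up idea a bit more concrete, here is a minimal sketch in plain Java. Everything in it is my own assumption: the CacheLoader interface and the WarmableCache class are hypothetical names, not part of any Terracotta or Hibernate API, and a local ConcurrentHashMap stands in for the clustered cache. The point it illustrates is that a bulk warm-up should only fill in missing keys, so entries that are already live in the cluster are never overwritten.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical callback interface; not part of any Terracotta or Hibernate API. */
interface CacheLoader<K, V> {
    /** Loads a single entry from the system of record on a cache miss. */
    V load(K key);

    /** Loads the entries that should be present after a full restart. */
    Map<K, V> loadAll();
}

/** Local stand-in for a clustered cache that supports warm-up. */
class WarmableCache<K, V> {
    private final ConcurrentHashMap<K, V> store = new ConcurrentHashMap<>();
    private final CacheLoader<K, V> loader;

    WarmableCache(CacheLoader<K, V> loader) {
        this.loader = loader;
    }

    /**
     * Bulk warm-up. Uses putIfAbsent so that entries already in the cache
     * (for example, updated by another node) are never overwritten.
     * Returns the number of entries that were actually added.
     */
    int warmUp() {
        int added = 0;
        for (Map.Entry<K, V> e : loader.loadAll().entrySet()) {
            if (store.putIfAbsent(e.getKey(), e.getValue()) == null) {
                added++;
            }
        }
        return added;
    }

    /** Read-through on first access: a miss calls back into the loader. */
    V get(K key) {
        return store.computeIfAbsent(key, loader::load);
    }

    void put(K key, V value) {
        store.put(key, value);
    }
}
```

In a real cluster you would still need some way to decide that this is a full restart (for example, the cache being empty when the first node joins); this sketch does not try to solve that part.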

Write-Behind Caching
When you think of caching you arrive at these strategies: Read-Through Caching, Write-Through Caching and Write-Behind Caching. The Hibernate Second Level Cache (H2LC) is a Read-Write-Through cache: if a cache miss occurs, the entity is read from the database and then handed over to the cache for subsequent access. But H2LC is not a Write-Behind cache. With Terracotta's disk persistence and asynchronous module it would be really efficient, for certain use cases, to implement write-behind. Currently Hibernate just writes directly to the database. If instead it were modified to write to the second-level cache and a persistent async database queue, this would decrease latency and increase throughput dramatically, because the database write would no longer sit on the request path. Imagine if you could schedule all your database writes in non-business hours using tim-async. I find write-behind is certainly the best way to reduce pressure on the database, and with Terracotta's cluster-wide coherent persistent datastore it is practically possible. Terracotta would be your database guard, taking all your querying as well as database inserts on its shoulders.
But this model would require certain changes in the way Hibernate works, especially the query cache. Since Terracotta will now hold the latest snapshot of your System of Record, queries have to be executed against the cache and not the database. Thus it cannot be a generic solution. You can implement write-behind only in cases where your business use case permits it. On the other hand, to solve the query problem, the Querymap that I discussed in my previous posts can be used to query certain types of data. So if your business use case permits it, write-behind plus Querymap can give you a very fast database accelerator. In one of my previous jobs I was working on a financial application where a certain set of objects was modified at a very high rate and also queried against. For such an application a classic replicated H2LC does not bring any value; instead it will degrade performance due to the overhead of frequent cluster-wide updates. But Terracotta will make it scalable, forwarding updates only to the nodes on which the cache entry exists and updating the object cluster-wide, so that when the AsyncProcessor picks it up it will contain all the changes made. It's Terracotta's DSO magic.
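Here is a rough sketch of the write-behind idea in plain Java, just to show the moving parts. It is not how Terracotta or tim-async implements it: WriteBehindCache and DatabaseWriter are hypothetical names I made up, the pending queue lives in local memory rather than in a persistent clustered store, and flush() is called by hand instead of by an async processor. What it does show is that writes only touch the cache and a pending set, that repeated updates to the same key are coalesced, and that the flusher sees the latest state of each object when it finally writes to the database.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sink standing in for the real database write (e.g. a JDBC update). */
interface DatabaseWriter<K, V> {
    void write(K key, V value);
}

/** Local, in-memory stand-in for a write-behind cache. */
class WriteBehindCache<K, V> {
    private final Map<K, V> cache = new HashMap<>();
    // Keys waiting to be written; a later put for the same key replaces the value.
    private final Map<K, V> pending = new LinkedHashMap<>();
    private final DatabaseWriter<K, V> writer;

    WriteBehindCache(DatabaseWriter<K, V> writer) {
        this.writer = writer;
    }

    /** Updates the cache and records the change; does not touch the database. */
    synchronized void put(K key, V value) {
        cache.put(key, value);
        pending.put(key, value);
    }

    /** Reads are served from the cache, which always has the latest value. */
    synchronized V get(K key) {
        return cache.get(key);
    }

    /**
     * Writes all pending changes to the database and returns how many
     * entries were written. The batch is copied under the lock and the
     * database calls happen outside it, so puts are not blocked by slow writes.
     */
    int flush() {
        Map<K, V> batch;
        synchronized (this) {
            batch = new LinkedHashMap<>(pending);
            pending.clear();
        }
        for (Map.Entry<K, V> e : batch.entrySet()) {
            writer.write(e.getKey(), e.getValue());
        }
        return batch.size();
    }
}
```

To delay writes until a quiet period you could call flush() from a java.util.concurrent.ScheduledExecutorService. Note that this sketch loses pending writes if the JVM dies or if a database write fails midway, which is exactly the gap that a persistent, clustered queue would have to close.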

The advantage here is that you don't have to make the religious shift of Killing Your Database Totally. The database is still your System of Record (SOR). With a Terracotta Hibernate Accelerator you are only delaying updates to the SOR, not replacing it.

Currently I am going through the Hibernate source code and learning how the Hibernate event mechanism works. My guess is that write-behind can be implemented with Hibernate events. If not, I may try to modify the source code to add write-behind and H2LC querying capability. Hibernate Search is similar: instead of the classic session you get an indexing-aware session.

With Terracotta FX (assuming your application requires more than 4000 write operations per second, the average throughput of one un-tuned Terracotta server) your write throughput will increase linearly, which is not possible with any RDBMS on any type of hardware.

I hope Terracotta will add these features in coming versions. The Terracotta 3.1 Hibernate integration is just the start.

1 comment:

mx said...

Interesting...

So is it fair to say that using write-behind caching would be possible if we completely got rid of querying i.e. no querying whatsoever? And only loaded objects directly.

On the face of it this seems possible. Write behind caching seems to be a very powerful mechanism for speeding up database access though I suppose you'd sit with other problems from database concurrency contention.