Monday, April 23, 2012

What does it mean to be JBoss Data Grid ??

Red Hat recently announced beta release of JBoss Data Grid (JDG) , a supported product based on Infinispan. Infinispan was first announced in 2009 and its first GA release was made in February, 2010. So it took more than three years for us, folks at JBoss, to announce a supported beta release. In case you are wondering why it took so long, here are few things to consider. Besides introducing several handy features such as virtual nodes, grouping API, deadlock detection and optimization, improvements with transactions, we wanted to make sure that it
  • can scale in a large cluster. Scalability is a very generic term and its meaning is highly contextual. For JDG, we narrowed down it to two : client scalability and data scalability.
    • Client Scalabilty:  Given a  percentage of data in virtual heap, does JDG handle more client requests as more nodes are added to the cluster? We measure throughput, 99 percentile response  time under read heavy, write heavy scenarios to verify this assertion.
    • Data Scalability: Given a fix number of clients, how much data can be filled in virtual heap while performance metrics such as throughput and 99 percentile response time are still under user defined values. This is again tested under read heavy, write heavy scenarios. 
  • is elastic. Does the cluster scale out and scale in? As nodes are added and removed, how long does it take for cluster to stabilize? This stability is measured by comparing data distribution before and after nodes leave and join.
  • is resilient. While cluster is in active use and node/s crash, what's the impact on throughput, response time? Is there any data inconsistency? How long does it take to stabilize?
  • is stable. Tests are run for long time and monitored for CPU, memory usage, number of full GC, average GC time.
This matrix gets quite complicated if you add various configurations that are available in JDG such as three different end points (Hot Rod, memcached, REST), partially versus fully replicated modes aslo known as "DISTRIBUTED" and  "REPLICATED" mode respectively,  synchronous vs asynchronous replication, number of virtual nodes, READ heavy vs WRITE heavy, L1 cache enabled vs disabled, File Cache Store, JDBC cache store, and so on. 

I have only touched the performance and scalability aspects in details. There are various other functional verification which gets quite complicated as well if you think in terms of OS+JVM+File System + DB combinations we have to certify. 

I can proudly said that we have come a long way and come GA availability, our customers will have in their hands an extremely scalable and highly available data grid platform. And I should mention linear scalability as well!! Wait should not be too long. ;-)

No comments: