Bothersome Ad
Something strange happens whenever I google for 'apache ignite': a Hazelcast advertisement bubbles up to the top of the results, claiming that Hazelcast is up to 50% faster than Apache Ignite.
The first suspicious thing you notice right after clicking the link is that Hazelcast compares itself to Apache Ignite 1.5, which was released more than a year ago! It's fine to boast about your success stories for a while, but it's funny to see this continue for a whole year without the benchmarking results on the landing page being updated.
Well, this seems to be an oversight on the part of Hazelcast's marketing team. It happens. So let's help the team get back to reality and show the present state of affairs by comparing the latest versions of Apache Ignite and Hazelcast.
General Benchmarking
The simplest way to benchmark a distributed platform like Apache Ignite or Hazelcast is to launch a cluster of several machines and run a client process that produces the load and gathers the benchmarking results. For the general benchmark, a cluster of 4 server (data) nodes was prepared on AWS, and the load came from a single client machine (i.e., an application). Yardstick was used as the benchmarking framework. All the parameters and instructions are listed below:
AWS EC2 Configuration
| EC2 Instance | r4.2xlarge |
| CPU | 8 |
| RAM | 61 GB |
| OS | Ubuntu 16.04 |
| Java | Java(TM) SE Runtime Environment 1.8.0_121-b13, Oracle Corporation, Java HotSpot(TM) 64-Bit Server VM 25.121-b13 |
Yardstick Configuration
| Nodes | 1 Client, 4 Servers |
| Threads | 64 |
| Backups | 1, Synchronous |
| Running Yardstick on Amazon | https://github.com/apacheignite/yardstick-ignite#running-on-amazon |
| Yardstick and Clusters Configurations | https://github.com/gridgain/yardstick/tree/master/results/HZ-3.8.1-vs-IGNITE-1.9-c-1-s-4-sm-FULL_SYNC-b-1 |
By following the "Running Yardstick on Amazon" instructions with the provided configurations, we can reproduce these numbers:
Complete results: https://github.com/gridgain/yardstick/tree/master/results/HZ-3.8.1-vs-IGNITE-1.9-c-1-s-4-sm-FULL_SYNC-b-1
The results show that Apache Ignite 1.9 significantly outperforms Hazelcast 3.8.1 in most of the basic operations, pulling ahead by up to 160% in some scenarios.
At the same time, Hazelcast performs better in some atomic operations, leading Apache Ignite by up to 4%. Honestly, it's great to know that there is still room for performance improvements in Apache Ignite and that Hazelcast doesn't make life easy for Ignite's performance engineers.
However, after that performance loss was spotted, we decided to run the same set of benchmarks under a higher load that is more relevant to production scenarios: the load was generated by 8 client machines (i.e., applications) rather than by a single one. The results were surprising and uplifting, as the next section shows.
Put More Load
Yardstick Configuration
| Nodes | 8 Clients, 4 Servers |
| Yardstick and Clusters Configurations | https://github.com/gridgain/yardstick/tree/master/results/HZ-3.8.1-vs-IGNITE-1.9-c-8-s-4-sm-FULL_SYNC-b-1 |
After the total number of client machines was increased from 1 to 8, the following numbers were reproduced:
Complete results: https://github.com/gridgain/yardstick/tree/master/results/HZ-3.8.1-vs-IGNITE-1.9-c-8-s-4-sm-FULL_SYNC-b-1
These numbers are taken from one of the client machines. To get the total number of operations per second, we just need to sum the numbers from all of the clients. In any case, the results show that Apache Ignite beats Hazelcast in every benchmark, even under the higher load.
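The accumulation step described above can be sketched as follows. This is only an illustration: the per-client values are hypothetical placeholders rather than measured results, and the function name `total_throughput` is invented for the example. It also assumes that every client reports its throughput over the same time window.

```python
# Sketch: combine per-client Yardstick throughput into a cluster-wide figure.
# The sample values below are hypothetical placeholders, not measured results.

def total_throughput(per_client_ops_per_sec):
    """Sum the operations per second reported by each client machine."""
    return sum(per_client_ops_per_sec)

# Eight client machines, each reporting a made-up throughput (ops/sec).
per_client = [10_000.0] * 8

print(f"Total: {total_throughput(per_client):.0f} ops/sec")
```

A plain sum is only meaningful when all clients ran the same benchmark concurrently against the same cluster, which is how the 8-client run above is described.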
For instance, Apache Ignite's ANSI-99 SQL engine now outperforms Hazelcast's predicate-based querying engine by 200%, while in the single-client scenario the difference was only around 80%.
Moreover, Apache Ignite took the lead in all the atomic benchmarks, going from the 4% it lost to Hazelcast before to a victorious 42% in the atomic-put-get-bs-6 scenario.
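The post does not spell out how these percentages are computed. One plausible reading, shown here purely as an assumption, is the relative difference in throughput measured against the slower system. The function name `lead_percent` and the sample numbers are hypothetical.

```python
# Sketch: relative lead of one system over another, in percent.
# Assumption: the lead is expressed relative to the slower system's throughput.

def lead_percent(faster_ops_per_sec, slower_ops_per_sec):
    """Return how far ahead the faster system is, as a percentage of the slower one."""
    return 100.0 * (faster_ops_per_sec - slower_ops_per_sec) / slower_ops_per_sec

# Hypothetical throughputs: 300 ops/sec versus 100 ops/sec.
print(lead_percent(300.0, 100.0))
```

Under this reading, a 200% lead means three times the throughput, and a 42% lead means 1.42 times the throughput.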
It would be interesting to repeat your benchmarks, but now with the PRIMARY CacheAtomicWriteOrderMode. Clock was the default in 1.9, but it has been replaced by Primary:
https://issues.apache.org/jira/browse/IGNITE-4587
Apart from switching to Primary (since Clock is broken), I would also suggest switching to Hazelcast lite members, which are effectively the same as Ignite clients.
Hey, it's good to hear from you.
The clock mode might have affected only specific operations related to continuous queries and entry processors. It was safe to use for the rest of the operations, like the ones in the posted benchmarks. However, after we decided to discontinue it altogether, we validated that there is no performance drop:
https://issues.apache.org/jira/browse/IGNITE-4587?focusedCommentId=15948419&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15948419
In any case, thanks for pointing out the Hazelcast lite member. We will share the benchmarking results using the primary write order mode on the Ignite side and the lite member on the Hazelcast side.
The big performance advantage of Clock compared to Primary is that with Clock the maximum number of network hops is 2, since the modification can be sent to all replicas in parallel. With Primary you have 3 network hops. So it is unlikely that you won't see a performance degradation, since it is hard to beat physics.
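The hop-count argument above can be made concrete with a back-of-the-envelope model. This is a deliberately crude sketch: it assumes every hop costs the same one-way network latency and ignores processing time, batching, and any protocol optimizations. The latency value and the function name `update_latency_us` are hypothetical.

```python
# Sketch: idealized update latency as (number of network hops) x (one-way latency).
# Hop counts come from the comment above; the latency value is a made-up example.

CLOCK_HOPS = 2    # update sent to all replicas in parallel, then acknowledged
PRIMARY_HOPS = 3  # update goes through the primary before reaching the backup

def update_latency_us(hops, one_way_latency_us):
    """Return the idealized latency of one update in microseconds."""
    return hops * one_way_latency_us

one_way_latency_us = 100.0  # hypothetical one-way latency between two nodes

clock = update_latency_us(CLOCK_HOPS, one_way_latency_us)
primary = update_latency_us(PRIMARY_HOPS, one_way_latency_us)
print(f"Primary/Clock latency ratio: {primary / clock}")
```

In this model the ratio depends only on the hop counts, so real measurements can differ substantially once processing costs and protocol changes are taken into account.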
Something else you might want to have a look at is how your cores behave on EC2: run a simple map.get benchmark, install htop, and look for the red core. This core is not busy with Java code but is bottlenecking performance.
That is why physical hardware is more reliable for benchmarking: there is no artificial bottleneck.
Along with the removal of the clock mode, we optimized the internal cache update protocol. This is why the primary write order mode became comparable to the clock one.
As for EC2, it's obvious that the results might fluctuate there because of specifics like the one you pointed out above. But I don't see much sense in publishing benchmarking results obtained on private hardware if nobody can reproduce them.
A similar thing can be said about publishing benchmark information based on an environment that has a serious performance bottleneck. Easily 50 to 75% of the performance is lost due to this issue, and the more powerful the hardware you throw at the problem, the more obvious it becomes.
It is like putting a Ferrari and a Lamborghini on a dirt track to see how fast they can go.
pveentjer, Ignite was launched with the PRIMARY write order mode. You can check this here: https://github.com/gridgain/yardstick/blob/master/results/HZ-3.8.1-vs-IGNITE-1.9-c-1-s-4-sm-FULL_SYNC-b-1/ignite/config/benchmark.properties. Pay attention to the "-wom PRIMARY" parameter.
To verify that the parameter was applied properly, I checked the logs from the run. They contain the following entry:
<14:36:13> Cache configured with the following parameters: CacheConfiguration [name=atomic, ..., startSize=1500000, nearCfg=null, writeSync=FULL_SYNC, ..., cacheMode=PARTITIONED, atomicityMode=ATOMIC, atomicWriteOrderMode=PRIMARY, backups=1, ..., memMode=ONHEAP_TIERED, ..., readFromBackup=true, ...]
(I replaced the irrelevant pairs with "...")
We will also check the Hazelcast lite member.