Bothersome Ad

A strange thing happens every time I try to google for 'apache ignite' - Hazelcast's advertisement bubbles up to the top of the results, claiming that Hazelcast is up to 50% faster than Apache Ignite:

The first suspicious thing you notice right after clicking the link is that Hazelcast compares itself to Apache Ignite 1.5, which was released more than a year ago! Secondly, while it's fine to boast about your success stories for some period of time, it looks odd when this goes on for a year without the benchmarking results on the landing page ever being updated.

Well, this looks like an oversight by Hazelcast's marketing team. It happens. So let's help the team get back to reality and show the present state of affairs by comparing the latest versions of Apache Ignite and Hazelcast.

General Benchmarking

The simplest way to benchmark a distributed platform like Apache Ignite or Hazelcast is to launch a cluster of several machines and run a client process that produces the load and gathers the benchmarking results. For the sake of general benchmarking, a cluster of 4 server/data nodes was prepared on AWS, and the load came from a single client machine (i.e., the application). Yardstick was used as the benchmarking framework. All the parameters and instructions are listed below:


AWS EC2 Configuration
EC2 Instance: r4.2xlarge
CPU: 8 vCPUs
RAM: 61 GB
OS: Ubuntu 16.04
Java: Java(TM) SE Runtime Environment 1.8.0_121-b13, Oracle Corporation Java HotSpot(TM) 64-Bit Server VM 25.121-b13



Yardstick Configuration
Nodes: 1 client, 4 servers
Threads: 64
Backups: 1, synchronous
Running Yardstick on Amazon: https://github.com/apacheignite/yardstick-ignite#running-on-amazon
Yardstick and cluster configurations: https://github.com/gridgain/yardstick/tree/master/results/HZ-3.8.1-vs-IGNITE-1.9-c-1-s-4-sm-FULL_SYNC-b-1

Following "Running Yardstick on Amazon" instruction with provided configurations we can reproduce these numbers:


Complete results: https://github.com/gridgain/yardstick/tree/master/results/HZ-3.8.1-vs-IGNITE-1.9-c-1-s-4-sm-FULL_SYNC-b-1

It's obvious that Apache Ignite 1.9 significantly outperforms Hazelcast 3.8.1 in most of the basic operations, pulling ahead by up to 160% in some scenarios.

At the same time, we can see that Hazelcast performs better in some atomic operations, leading Apache Ignite by up to 4%. Honestly, it's good to know that there is still room for performance improvements in Apache Ignite and that Hazelcast doesn't make life easy for Ignite's performance engineers.

However, after that performance loss was spotted, we decided to run the same set of benchmarks under a higher load that is more relevant to production scenarios: the load was generated by 8 client machines (i.e., applications) rather than a single one. The results were surprising and uplifting, as the next section shows.

Put More Load

This is the only part of the previously provided Yardstick configuration that was modified:


Yardstick Configuration
Nodes: 8 clients, 4 servers
Yardstick and cluster configurations: https://github.com/gridgain/yardstick/tree/master/results/HZ-3.8.1-vs-IGNITE-1.9-c-8-s-4-sm-FULL_SYNC-b-1

After the total number of client machines was increased from 1 to 8, the following numbers were reproduced:


Complete results: https://github.com/gridgain/yardstick/tree/master/results/HZ-3.8.1-vs-IGNITE-1.9-c-8-s-4-sm-FULL_SYNC-b-1

These numbers are taken from one of the client machines; to get the total number of operations per second, we simply need to sum the results across all of them. In any case, looking at the results we now see that Apache Ignite beats Hazelcast in every benchmark, even under the higher load.

For instance, Apache Ignite's ANSI-99 SQL engine now outperforms Hazelcast's predicate-based query engine by 200%, while in the single-client scenario the difference was only around 80%.

What's more, Apache Ignite took the lead in all the atomic benchmarks, going from the 4% it lost to Hazelcast before to a 42% win in the atomic-put-get-bs-6 scenario.

The Upshot

It's always up to you to decide which product to use in production. But the golden rule is that you shouldn't blindly trust official numbers or data prepared by a vendor. Use all that information as a baseline, then get to know a product and test it against your own scenario. Only this way will you find out which product suits your case better.



  1. In the previous article, we reviewed several practical performance hints that let us exploit SSDs at their peak. Today you will see how to go further and improve a database's random write performance by replacing regular SSDs with Intel Optane SSDs powered by cutting-edge 3D XPoint technology.

    What might be wrong with sustained random write workloads on good old SSDs? Those who read the previous article will remember that regular SSDs have to perform garbage collection routines endlessly, erasing blocks with stale data. Since any kind of garbage collection inevitably leads to performance degradation, SSD manufacturers came up with the over-provisioning technique, which reserves some amount of space for cleaning needs. However, this space is limited and can be exhausted pretty quickly depending on the workload.

    Thus, it is common to get top I/O numbers from regular SSDs in the first hours of operation and then suddenly hit a significant performance drop even though the workload stays the same. The picture below shows this exact tendency:


    Picture 1.

    Indeed, the Apache Ignite community was able to reproduce a similar curve for its native persistence, watching SSDs deliver up to 100 MB/s in the first 5 minutes of usage and then slow down to 20 MB/s in a matter of 2 hours under a sustained random write workload.

    Intel Optane SSDs (P4800X Series) to the Rescue

    Once Intel Optane SSDs hit the market, Ignite performance geeks ran the same set of benchmarks, which put a sustained write workload on the cluster for 10 hours. In these benchmarks, Ignite persistence used Intel Optane SSDs instead of regular SSDs. When the benchmarks were over, we were staring at the graph below in (almost) disbelief:
       
    Picture 2.

    The graph shows that the random write throughput (red curve) did not fall dramatically even over 10 hours of sustained workload. To be more specific, a ~20% decline was spotted after 6 hours of execution. That's negligible in comparison to the drops observed on regular SSDs under the same conditions, which can be as big as 70%.

    Furthermore, the benchmarks generated a sustained random read workload (blue curve) during the 2-to-5-hour and 7-to-10-hour intervals. The graph suggests that the reads do not notably affect the throughput of the writes.

    Generally speaking, the benchmarks assured us that Optane SSDs do not suffer from the garbage collection routines that require spare space to operate efficiently (over-provisioning). We were told that there is no spare area on an Intel Optane SSD because the media doesn't require it. Also, Intel engineers shared with us that memory management is far simpler with this media, only a fraction as complex as with the NAND-based NVM that everyone else uses. Thus, when it comes to writes, you can expect to see a greater balance of IOPS and better overall quality-of-service performance.

    It is worth mentioning that the Intel Optane SSDs were mounted as standard disk drives during the benchmarking and accessed using generic Java file I/O APIs. We're curious to see what the benchmarks will show once Apache Ignite supports Intel Optane SSDs more natively. By the way, there are even more performance improvements coming to the Java I/O APIs in Java 10 (scheduled for March 2018).

  2. As a software guy, I was always curious to know how things work at the hardware level and how to apply that knowledge to more advanced optimizations in applications. Take the Java Memory Model, for instance. The model grounds its memory consistency and visibility properties on keywords such as volatile or synchronized. But these are just language keywords, and you start looking into how JVM engineers bring the model to life. At some point, you breathe out, discovering that the model relies on low-level instructions for mutexes and memory barriers at the very bottom of the software pie running on physical machines. Nice, these are the instructions a CPU understands, but curiosity drives you further, because it is still vague how all the memory consistency guarantees can be satisfied on multi-CPU machines with several CPU registers and caches. Well, the hardware guys took care of this by supporting the cache coherence protocol. And finally you, as a software guy, can develop highly performant applications that halt CPUs and invalidate their caches only on purpose, with all these volatile, synchronized and final keywords.
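
    To make this concrete, here is a minimal, hypothetical illustration (not taken from the JVM or Ignite sources) of the visibility guarantee that the volatile keyword buys you; without it, the worker thread could legitimately cache the flag and spin forever:

    class Worker implements Runnable {
        // 'volatile' forces every read to observe the latest write made by any thread.
        // The JVM backs this up with memory barriers and the CPU cache coherence protocol.
        private volatile boolean stopped;

        @Override public void run() {
            while (!stopped) {
                // do useful work
            }
        }

        void stop() {
            stopped = true; // this write becomes visible to the spinning thread
        }
    }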

    Apache Ignite veterans tapped into the knowledge above and, undoubtedly, delivered one of the fastest in-memory database and computing platforms. Presently, the same people are optimizing Ignite Native Persistence - Ignite's distributed and transactional persistence layer. Being a part of that community, let me share some tips about solid-state drives (SSDs) that you, as a software guy, can exploit in Ignite or other disk-based database deployments.

    SSD Level Garbage Collection

    The term garbage collection (GC) is used not only by Java developers to describe the process of purging dead objects from the Java heap residing in RAM. Hardware guys use the same term for the same purpose, but in relation to SSDs.

    In simple words, an SSD stores data in pages. Pages are grouped into blocks (usually 128/256 pages per block). The SSD controller can write data directly into an empty page but can erase only whole blocks. Thus, to reclaim the space occupied by invalid data, all the valid data from one block first has to be copied into empty pages of another block. Once this happens, the controller purges all the data from the first block, freeing space for new data arriving from your applications.
    This process happens in the background and is called by a familiar term - garbage collection (GC).

    So, if you suddenly observe a performance drop under a steady load, like the one shown in Figure 1 below, don't fall into the trap of blaming your application or Apache Ignite. The drop might be caused by the SSD's GC routines.


    Figure 1.


    Let me give you several hints on how to decrease the impact of the SSD GC on the performance of applications.

    Separate Disk Devices for WAL and Data/Index Files

    Apache Ignite arranges data and indexes in special partition files on disk. This type of architecture does not require you to have all the data in RAM; if something is missing there, Apache Ignite will find it in these files on disk.

    Figure 2.

    However, referring to Figure 2, every update (1) received by an Apache Ignite cluster node will be stored in RAM and persisted (2) in the write-ahead log (WAL) first. This is done for performance reasons: once the update is in the WAL, your application gets the acknowledgment and can proceed with its logic. Then, in the background, the checkpointing process updates the partition files by copying dirty pages from RAM to disk (4). Older WAL files are archived over time and can be safely removed, because all their data is already in the partition files.

    So, what's the performance hint here? Consider using separate SSDs for the partition files and the WAL. Apache Ignite actively writes to both places, so by having a separate physical disk device for each you may double the overall write throughput. See how to tweak the configuration for that.
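
    As an illustration, here is a minimal sketch of such a split, assuming Ignite 2.x's DataStorageConfiguration API (the mount points are, of course, hypothetical):

    IgniteConfiguration cfg = new IgniteConfiguration();
    
    DataStorageConfiguration storageCfg = new DataStorageConfiguration();
    
    // Enable Ignite Native Persistence for the default data region.
    storageCfg.getDefaultDataRegionConfiguration().setPersistenceEnabled(true);
    
    // Partition and index files go to the first SSD.
    storageCfg.setStoragePath("/mnt/ssd1/ignite/storage");
    
    // The WAL and its archive go to the second SSD.
    storageCfg.setWalPath("/mnt/ssd2/ignite/wal");
    storageCfg.setWalArchivePath("/mnt/ssd2/ignite/wal/archive");
    
    cfg.setDataStorageConfiguration(storageCfg);
    
    // Start the node with both write paths on separate physical devices.
    Ignition.start(cfg);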


    SSD Over-provisioning

    Like the Java heap, an SSD requires free space to perform efficiently and to avoid significant performance drops due to GC. All SSD manufacturers reserve some amount of space for that purpose. This is called over-provisioning.

    Here you, as a software guy, should keep in mind that random write performance on a 50%-filled disk is much better than on a 90%-filled disk because of SSD over-provisioning and GC. Consider buying SSDs with a higher over-provisioning rate, and make sure the manufacturer provides tools to adjust it.

    Anything else?

    That's enough for a start. If you are the sort of person who wants to get the most out of the hardware by tweaking the page size or swap settings, refer to this tuning page maintained by the Apache Ignite community.



  3. Bothersome Ad


    1. It would be interesting to repeat your benchmarks but now with PRIMARY CacheAtomicWriteOrderMode. Clock was the default in 1.9, but has been replaced by Primary.

      https://issues.apache.org/jira/browse/IGNITE-4587

    2. Apart from switching to Primary (since Clock is broken); I would also suggest switching to Hazelcast lite member, which effectively are the same as Ignite clients.

      1. Hey, it's good to hear from you.

        The clock mode might have affected only specific operations related to continuous queries and entry processors. It was safe to use it for the rest of the operations, like the ones from the posted benchmarks. However, after we made the decision to discontinue it altogether, we validated that there is no performance drop:
        https://issues.apache.org/jira/browse/IGNITE-4587?focusedCommentId=15948419&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15948419

        In any case, thanks for referring to Hazelcast lite member. We will share the benchmarking results using the primary clock mode on Ignite side and the lite member on Hazelcast.

      2. The big performance advantage of clock compared to primary is that with Clock the maximum number of network-hops is 2 since the modification can send to all replica's in parallel. With primary you have 3 network-hops. So it is unlikely that you won't be seeing a performance degradation since it is hard to beat physics.

      3. Something else you might want to have a look at is how your cores behave on EC2; run a simple map.get benchmark, install htop and look for the red core. This core is not busy with Java code but is bottlenecking performance.


        That's why physical hardware for benchmarking is more reliable, since there is no artificial bottleneck.

      4. Along with the removal of the clock mode we optimized an internal cache update protocol. This is why the primary write order mode became comparable to the clock one.

        As for EC2, it's obvious that the results might fluctuate there because of specificities like the one you pointed out above. But I don't see much sense in publishing benchmarking results obtained on private hardware if nobody can reproduce them.

      5. A similar thing can be said about publishing benchmark information based on an environment that has a serious performance bottleneck. Easily 50/75% of performance is lost due to this issue; and the more powerful the hardware is you throw at the problem, the more obvious it becomes.

        It is like putting a Ferrari and Lamborghini on a dirt-track and see how fast they can go.

      6. pveentjer, Ignite was launched with PRIMARY write order mode. You can check this - https://github.com/gridgain/yardstick/blob/master/results/HZ-3.8.1-vs-IGNITE-1.9-c-1-s-4-sm-FULL_SYNC-b-1/ignite/config/benchmark.properties. Pay attention to "-wom PRIMARY" parameter.

        In order to check that parameter has been properly applied I have checked this in the logs we have got from the run. There is the following entry:
        <14:36:13> Cache configured with the following parameters: CacheConfiguration [name=atomic, ..., startSize=1500000, nearCfg=null, writeSync=FULL_SYNC, ..., cacheMode=PARTITIONED, atomicityMode=ATOMIC, atomicWriteOrderMode=PRIMARY, backups=1, ..., memMode=ONHEAP_TIERED, ..., readFromBackup=true, ...]

        (I replaced the irrelevant pairs with "...")

        We will also check Hazelcast lite client.


  4. Apache Ignite 1.7.0 was rolled out recently, and among the new changes you can find a killer one that many Apache Ignite users and customers have been awaiting for a long time - support for non-collocated distributed joins in SQL queries. So this post will be fully dedicated to this feature, and I'll try to shed some light on how non-collocated distributed joins work and how they differ from the traditional (affinity collocation based) joins available in Apache Ignite.

    Affinity Collocation Based Joins

    Historically, Apache Ignite allowed executing SQL queries with joins across different tables but it required collocating the data of the caches that are being joined in a query. In fact, in Ignite, collocation can be enabled easily by using the affinity key concept where the data of one business entity is stored on the same node where the other business entity resides.

    For example, let's say you have two business entities - Organization and Person, and an Organization ID is used as an affinity key for Persons from that Organization. Then, Ignite will make sure to place all the Persons data on the same node where their Organization data resides. This simple concept allows executing a whole range of imaginable SQL queries that are ANSI-99 compliant, including joins between multiple caches.

    Basically, the execution flow of a SQL query that uses a join is exactly the same as that of a query without one.

    Let's have a look at the flow of one of the basic queries using Organizations and Persons business entities defined in the following way:
    • Organization(id, address) entity - where id is literally an Organization ID and its value will be used as a cache key at the time an Organization is put into the cache. The key that is used as a cache key is treated as a primary key at the Ignite SQL engine's layer. Keep this in mind till you get to the end of the blog post! 
    • Person(name, salary) entity - will be located in the Persons cache, and as a cache key we will use AffinityKey(id, orgId), where AffinityKey is a special kind of object in Ignite that allows defining a Person's unique ID (the first parameter) as well as its affinity key (the second parameter). Here, the Organization ID (orgId) has been chosen as the Person's affinity key, which means that Persons will be located on the same node where their Organizations reside (see the sketch right after this list). 
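
    Here is a minimal sketch of that keying scheme in code (assuming the standard Ignite Java API; the Organization and Person classes and their constructors are illustrative, and the SQL field configuration is omitted):

    IgniteCache<Long, Organization> orgCache = ignite.getOrCreateCache("Organizations");
    IgniteCache<AffinityKey<Long>, Person> personCache = ignite.getOrCreateCache("Persons");
    
    long orgId = 1;
    orgCache.put(orgId, new Organization(orgId, "London"));
    
    // The first argument is the Person's own ID, the second one is the affinity key
    // (the Organization ID). Ignite places this entry on the node that owns Organization 1.
    long personId = 100;
    personCache.put(new AffinityKey<>(personId, orgId), new Person("John", 1000));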

    After defining these business entities and preloading caches with data, we are free to execute an SQL query like the one below. Since the Persons are affinity collocated with their Organization, we're guaranteed to receive a complete result set.

    SELECT * FROM Organization as org JOIN Person as p ON org.id = p.orgId

    The execution flow of this query, depicted on Picture 1 below, will be the following:
    1. The query initiating node (mapper & reducer) sends the query to all the nodes where cached data resides (Phase Q).
    2. All the nodes that receive the query from the reducer will execute it locally, performing the join using local data only (Phase E(Q)).
    3. The nodes respond to the reducer with their portion of the result set (Phases R1, R2 and R3).
    4. The reducer will eventually reduce the result sets received from all the remote nodes and provide a final aggregated result to your code (Phase R).
    Picture 1. Collocated SQL Query


    Non-collocated Distributed Joins


    If the same query were executed on non-affinity-collocated data, you would get an incomplete and inconsistent result. The reason is that Apache Ignite versions earlier than 1.7.0 perform the query only on the local data (as described in step [2] of the flow above).

    However, this is no longer true in Apache Ignite 1.7.0 and later versions, which provide support for non-collocated distributed joins. These joins no longer force you to collocate your data.

    Now we can use a Person's actual id as is, instead of the AffinityKey(id, orgId), as a cache key, and add an orgId field to the Person object itself in order to be able to perform joins between these two caches. Even after these modifications, we will still get a complete result despite the fact that Persons are no longer collocated with their Organizations. This is because in the latest version of Ignite the execution flow of the same query (mentioned above), now depicted on Picture 2 below, will be the following:
    1. The query initiating node (mapper & reducer) sends the query to all the nodes where cached data resides (Phase Q).
    2. All the nodes that receive the query from the reducer will execute it locally (Phase E(Q)), performing the join using both the local data and any data requested from the remote nodes (Phase D(Q)).
    3. The nodes respond to the reducer with their portion of the result set (Phases R1, R2 and R3).
    4. The reducer will eventually reduce the result sets received from all the remote nodes and provide a final aggregated result to your code (Phase R).
    Picture 2. Non-collocated SQL Query

    One important thing to note here is that, due to the nature of the query, a node will send broadcast requests into the cluster asking for the missing data in step [2]. However, even now there is a way to help the optimizer and the SQL engine switch from broadcast to unicast requests for certain join types, and for the example query, the following modification may enable the unicast mode: 

    SELECT * FROM Organization as org JOIN Person as p ON org._key = p.orgId
    With this query, if the SQL engine decides to execute the query against the Persons cache first, joining with Organizations on the go, then the engine will send unicast requests to the nodes that store the Organizations with the requested org._key(s), where _key is a special keyword used in Ignite SQL queries that refers to an object's cache key/primary key. Basically, this works because the engine can easily find the node that stores an entry knowing its cache key/primary key. The same is true for affinity keys that are used to join caches.
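
    For completeness, here is a hedged sketch of running such a query from Java; SqlFieldsQuery.setDistributedJoins(true) is what switches the engine into the non-collocated mode (the cache reference follows the example above):

    SqlFieldsQuery qry = new SqlFieldsQuery(
        "SELECT * FROM Organization as org JOIN Person as p ON org._key = p.orgId");
    
    // Tell the SQL engine that the joined data is not collocated.
    qry.setDistributedJoins(true);
    
    // Execute the query through the Persons cache and iterate over the joined rows.
    try (QueryCursor<List<?>> cursor = personCache.query(qry)) {
        for (List<?> row : cursor)
            System.out.println(row);
    }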

    Last word

    Undoubtedly, Ignite's non-collocated distributed joins functionality makes it possible for applications to execute very complex analytics and operational queries in cases where it's not feasible to collocate all the data. However, I would advise against overusing this approach in practice, because the performance of these joins is worse than that of the affinity collocation based joins, since there will be many more network round-trips and much more data movement between the nodes to fulfill a query. 

    In reality, there is very little chance that you will be able to collocate all your business entities in such a way that you can execute 100% of your SQL over locally cached data. Usually, it's possible to collocate the data for 95% of the queries, which will be executed in the fastest and most efficient manner, and use the non-collocated distributed joins for the remaining 5%. The latter may not be as efficient, but this will eventually let you execute 100% of all your queries in Apache Ignite.


  5. In the previous post we talked about the situation when an Ignite cluster is deployed in a Google Compute Engine network and we need a node auto-discovery mechanism.

    Well, Apache Ignite has a Compute Engine-specific solution to fill this gap, but what about the many other cloud platforms? What if I want to use AWS, Rackspace, GoGrid, or another cloud provider?

    Do you need to switch back to the annoying and fragile static IP configuration? Luckily, you don't!

    Since the 1.1.0-incubating release, Ignite has a so-called TcpDiscoveryCloudIpFinder that supports a variety of cloud providers out of the box. Under the hood, this IP finder is integrated with the well-known Apache jclouds multi-cloud toolkit. Using jclouds, the IP finder retrieves the IP addresses of all virtual machines in a cloud, talks to them, and forms a list of active Ignite nodes (the machines with running Ignite instances). Every node stores a copy of this up-to-date list. All this lets the nodes discover each other automatically.

    With this in mind, let's set up and use TcpDiscoveryCloudIpFinder. As an example I'll keep using the Google Compute Engine platform, but you can use any provider from this list.

    TcpDiscoverySpi spi = new TcpDiscoverySpi();
    
    TcpDiscoveryCloudIpFinder ipFinder = new TcpDiscoveryCloudIpFinder();
    
    // Configuration for Google Compute Engine.
    ipFinder.setProvider("google-compute-engine");
    ipFinder.setIdentity(yourServiceAccountEmail);
    ipFinder.setCredential(pathToYourPemFile);
    ipFinder.setZones(Arrays.asList("us-central1-a", "asia-east1-a"));
    
    spi.setIpFinder(ipFinder);
    
    IgniteConfiguration cfg = new IgniteConfiguration();
     
    // Override default discovery SPI.
    cfg.setDiscoverySpi(spi);
     
    // Start Ignite node.
    Ignition.start(cfg);
    

    The first three parameters passed to the IP finder instance in the code above are set according to jclouds requirements:
    • The provider name is the value stored in the "Maven Artifact ID" column of the following table. Compute Engine's value is "google-compute-engine", which is why it's passed to ipFinder.setProvider();
    • Both the identity and the credential are provider-specific. To get provider-specific instructions, click on a provider name in the "Provider" column of the same table. Google Compute Engine is an exception; its authentication method is covered on this page.
    The last parameter (zones) set on the IP finder is optional. However, I highly recommend not ignoring it if at all possible in your case. The reason is simple: if zones are not set, TcpDiscoveryCloudIpFinder will check every zone the provider has, looking for Ignite nodes. This can degrade performance significantly.

    In my example, you can easily form the list of zones by going to the Google Developer Console and checking where your VMs are located. This simple step can improve performance many times over.

    When you have finished with the parameters, enable the Google Compute Engine API in the Google Developer Console and benefit from this new IP finder!

  6. I want to dedicate this post to one useful feature of Apache Ignite that I had a chance to contribute to the project and that has been part of it since the 1.1.0-incubating release.

    Actually, this is my first post about this Apache project, and for those who are unfamiliar with it I recommend looking over the good quick-start documentation that can be found here.

    The feature I'm going to cover relates to the node discovery process in a cluster when the cluster is a set of virtual machines deployed on Google Compute Engine. Historically, Apache Ignite offered two ways to handle discovery: leveraging the multicast protocol or pre-configuring the nodes' IP addresses. Unfortunately, neither way works well for Google VM (virtual machine) instances: multicast is not supported in the Compute Engine network at all, and static IP configuration is not flexible (a machine's IP address may change from time to time, machines can be removed from or added to a cluster, etc.).

    However, there is a solution to this problem called TcpDiscoveryGoogleStorageIpFinder.

    This IP finder does exactly what its name says. It lets nodes located in the Compute Engine network discover each other automatically using Google Cloud Storage, where Apache Ignite's kernel keeps track of all the cluster's nodes.

    All you need to do to activate this feature is set TcpDiscoveryGoogleStorageIpFinder as the IP finder on a node.

    TcpDiscoverySpi spi = new TcpDiscoverySpi();
    
    TcpDiscoveryGoogleStorageIpFinder ipFinder = new TcpDiscoveryGoogleStorageIpFinder();
    
    ipFinder.setServiceAccountId(yourServiceAccountId);
    ipFinder.setServiceAccountP12FilePath(pathToYourP12Key);
    ipFinder.setProjectName(yourGoogleCloudPlatformProjectName);
    
    ipFinder.setBucketName("your_bucket_name");
    
    spi.setIpFinder(ipFinder);
    
    IgniteConfiguration cfg = new IgniteConfiguration();
     
    // Override default discovery SPI.
    cfg.setDiscoverySpi(spi);
     
    // Start Ignite node.
    Ignition.start(cfg);
    

    That's all on the code side. Easy enough, right?
    Now let's go through the parameters that have to be set in order to make the IP finder work.

    Assuming you're setting everything up from scratch, perform the following steps:
    • Create a new project, open its "Overview" section, find the "Project Number" and pass it to the ipFinder.setProjectName() method call;
    • Activate the Google Cloud Storage API by referring to this page and the Google Cloud Storage JSON API by going to the "APIs and auth"->"APIs" section in your console;
    • Open the "APIs and auth"->"Credentials" section and create a service account by pressing the "Create new Client ID" button. Pass the account's email address to ipFinder.setServiceAccountId(). Generate and download the new account's P12 key and pass the full path to its location using ipFinder.setServiceAccountP12FilePath();
    • Pass a unique Google Storage bucket name to ipFinder.setBucketName(). Note that the name must be unique across the whole Google Cloud Platform, not just within your project.

    So these are all the steps you need to follow if you want to run a cluster of Ignite nodes in the Compute Engine network with node auto-discovery enabled.

    In the next post I will share another, more generic solution that enables node auto-discovery not only for the Compute Engine network but for many other cloud platforms. Stay tuned!
