1. In the previous article, we reviewed several practical performance hints that let us exploit SSDs at their peak. Today you will see how to go further and improve the random write performance of a database by replacing regular SSDs with Intel Optane SSDs powered by the cutting-edge 3D XPoint technology.

    What might be wrong with sustained random write workloads on good old SSDs? Those who read the previous article will remember that regular SSDs have to run garbage collection routines endlessly, erasing blocks with stale data. Since any kind of garbage collection inevitably leads to performance degradation, SSD manufacturers came up with the over-provisioning technique, which reserves some amount of space for cleaning needs. However, this space is limited and can be exhausted pretty quickly, depending on the workload.

    Thus, it is quite common to get top I/O numbers on regular SSDs during the first hours of operation and then suddenly hit a significant performance drop while the workload stays the same. The picture below shows this exact tendency:


    Picture 1.

    Indeed, the Apache Ignite community was able to reproduce a similar curve for its native persistence, witnessing how SSDs ran at 100 MB/s for the first 5 minutes of usage and slowed down to 20 MB/s in a matter of 2 hours under a sustained random write workload.

    Intel Optane SSDs (P4800X Series) to the Rescue

    Once Intel Optane SSDs hit the market, Ignite performance geeks ran the same set of benchmarks, which put a sustained write workload on the cluster for 10 hours. In these benchmarks, Ignite persistence used Intel Optane SSDs instead of regular SSDs. When the benchmarks were over, we were staring at the graph below in (almost) disbelief:
       
    Picture 2.

    The graph shows that the random write throughput (red curve) did not fall dramatically even after 10 hours of sustained workload. To be more specific, a ~20% decline was spotted after 6 hours of execution. That's negligible in comparison to the drops observed on regular SSDs under the same conditions, which can be as large as 70%.

    Furthermore, the benchmarks generated a sustained random read workload (blue curve) during the 2-to-5-hour and 7-to-10-hour intervals. The graph suggests that the reads do not notably affect the throughput of the writes.

    Generally speaking, the benchmarks assured us that Optane SSDs do not suffer from the garbage collection routines that require spare memory (over-provisioning) to operate efficiently. We were told that there is no spare area on an Intel Optane SSD; the media doesn't require it. Also, Intel engineers shared with us that memory management is far simpler with this media, only a fraction as complex as with the NAND-based NVM that everyone else uses. Thus, when it comes to writes, you can expect to see a greater balance of IOPS and overall quality-of-service performance.

    It is worth mentioning that the Intel Optane SSDs were mounted as standard disk drives during the benchmarking and accessed using generic Java file I/O APIs. It will be curious to see what the benchmarks show once Apache Ignite supports Intel Optane SSDs more natively. By the way, there are even more performance improvements coming to the Java I/O APIs in Java 10 (scheduled for March 2018).
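
    For the curious, this kind of sustained random write pattern is easy to reproduce with the very same generic Java file I/O APIs. Below is a minimal, hypothetical sketch (the mount point, file size, and page size are invented for illustration) that hammers an SSD with page-aligned random writes and prints the running throughput:

        import java.io.IOException;
        import java.nio.ByteBuffer;
        import java.nio.channels.FileChannel;
        import java.nio.file.Paths;
        import java.nio.file.StandardOpenOption;
        import java.util.concurrent.ThreadLocalRandom;

        public class RandomWriteProbe {
            public static void main(String[] args) throws IOException {
                long fileSize = 4L * 1024 * 1024 * 1024; // 4 GB test file
                int pageSize = 4096;                     // write 4 KB pages
                ByteBuffer page = ByteBuffer.allocateDirect(pageSize);

                // DSYNC makes every write hit the device, not just the page cache.
                try (FileChannel ch = FileChannel.open(Paths.get("/mnt/optane/probe.bin"),
                        StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                        StandardOpenOption.DSYNC)) {
                    long written = 0;
                    long start = System.nanoTime();

                    while (true) { // run until interrupted; watch for the throughput drop
                        // Pick a random page-aligned offset within the file.
                        long offset = ThreadLocalRandom.current()
                            .nextLong(fileSize / pageSize) * pageSize;

                        page.clear();
                        written += ch.write(page, offset);

                        if (written % (256L * 1024 * 1024) == 0) { // report every 256 MB
                            double sec = (System.nanoTime() - start) / 1e9;
                            System.out.printf("%.0f s: %.1f MB/s%n", sec, written / 1e6 / sec);
                        }
                    }
                }
            }
        }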


2. As a software guy, I was always curious to know how things work at the hardware level and how to apply that knowledge for more advanced optimizations in applications. Take the Java Memory Model, for instance. The model grounds its memory consistency and visibility properties in keywords such as volatile or synchronized. But these are just language keywords, and you start looking around to find out how JVM engineers managed to bring the model to life. At some point, you will breathe out, discovering that the model relies on a low-level instruction set for mutexes and memory barriers at the very bottom of the software pie running on physical machines. Nice, these are the instructions a CPU understands, but curiosity drives you further because it is still vague how all the memory consistency guarantees can be satisfied on multi-CPU machines with several CPU registers and caches. Well, the hardware guys took care of this by supporting a cache coherence protocol. And finally you, as a software guy, can develop high-performance applications that halt CPUs and invalidate their caches only on purpose, with all these volatile, synchronized, and final keywords.
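
    To feel the difference these keywords make, here is the classic visibility example (a hypothetical snippet, not taken from Ignite's code base). Remove the volatile modifier, and the reader thread may spin forever because nothing forces its CPU to refresh the cached value:

        public class VisibilityDemo {
            // The volatile write/read pair creates a happens-before edge,
            // making the update visible across CPU caches.
            private static volatile boolean ready = false;
            private static int payload = 0;

            public static void main(String[] args) throws InterruptedException {
                Thread reader = new Thread(() -> {
                    while (!ready) { /* spin until the write becomes visible */ }
                    // Guaranteed to print 42: the ordinary write to 'payload'
                    // is published by the volatile write to 'ready'.
                    System.out.println("payload = " + payload);
                });
                reader.start();

                payload = 42; // ordinary write
                ready = true; // volatile write flushes it to other CPUs too
                reader.join();
            }
        }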

    Apache Ignite veterans tapped into the knowledge above and, undoubtedly, it helped them deliver one of the fastest in-memory database and computational platforms. Presently, the same people are optimizing Ignite Native Persistence - Ignite's distributed and transactional persistence layer. Being a part of that community, let me share some tips about solid-state drives (SSDs) that you, as a software guy, can exploit in Ignite or other disk-based database deployments.

    SSD Level Garbage Collection

    The term garbage collection (GC) is used not only by Java developers to describe the process of purging dead objects from the Java heap residing in RAM. Hardware guys use the same term, for the same purpose, in relation to SSDs.

    In simple words, an SSD stores data in pages. Pages are grouped into blocks (usually 128/256 pages per block). The SSD controller can write data directly into an empty page but can erase only whole blocks. Thus, to reclaim the space occupied by stale data, all the valid data from one block first has to be copied into empty pages of another block. Once this happens, the controller purges all the data from the first block, freeing up space for new data arriving from your applications. This process happens in the background and is called by a familiar term - garbage collection (GC).
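
    These mechanics are easy to model in code. The toy sketch below (class and constant names are invented purely for illustration) shows why erasing a single block on a nearly full disk costs the controller dozens of extra page writes - the effect known as write amplification:

        // Toy model of SSD GC: before a block can be erased, all of its
        // still-valid pages must be relocated to a fresh block.
        public class SsdGcModel {
            static final int PAGES_PER_BLOCK = 128;

            static class Block {
                // true = page holds valid data, false = stale data
                boolean[] valid = new boolean[PAGES_PER_BLOCK];
            }

            // Returns how many extra page writes erasing 'victim' costs.
            static int collect(Block victim, Block spare) {
                int moved = 0;
                for (int i = 0; i < PAGES_PER_BLOCK; i++) {
                    if (victim.valid[i])
                        spare.valid[moved++] = true; // copy the valid page out
                    victim.valid[i] = false;         // the whole block is erased
                }
                return moved;
            }

            public static void main(String[] args) {
                Block victim = new Block();
                // Suppose 90% of the block is still valid - a nearly full disk.
                for (int i = 0; i < PAGES_PER_BLOCK * 9 / 10; i++)
                    victim.valid[i] = true;

                System.out.println("Erasing one block took "
                    + collect(victim, new Block()) + " extra page writes"); // 115
            }
        }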

    So, if you suddenly observe a performance drop under a steady load, like the one shown in Figure 1 below, do not fall into the trap of blaming your application or Apache Ignite. The drop might be caused by the SSD's GC routines.


    Figure 1.


    Let me give you several hints on how to decrease the impact of SSD GC on the performance of your applications.

    Separate Disk Devices for WAL and Data/Index Files

    Apache Ignite arranges data and indexes in special partition files on disk. This type of architecture does not require you to keep all the data in RAM; if something is missing there, Apache Ignite will find it on disk in these files.

    Figure 2.

    However, referring to Figure 2, every update (1) that is received by an Apache Ignite cluster node will be stored in RAM and persisted (2) in a write-ahead log (WAL) first. This is done for performance reasons, and once the update is in the WAL, your application gets the acknowledgment and can go on executing its logic. Then, in the background, the checkpointing process updates the partition files by copying dirty pages from RAM to disk (4). Specific WAL files get archived over time and can be safely removed because all their data is already in the partition files.

    So, what's the performance hint here? Consider using separate SSDs for the partition files and the WAL. Apache Ignite actively writes to both places; thus, by having a separate physical disk device for each, you may double the overall write throughput. See how to tweak the configuration for that in the sketch below.
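
    Here is a minimal sketch of such a setup using Ignite's DataStorageConfiguration API (available since Ignite 2.3; older versions use PersistentStoreConfiguration with slightly different property names). The /mnt/ssd1 and /mnt/ssd2 mount points are assumptions standing in for two separate physical SSDs:

        import org.apache.ignite.Ignite;
        import org.apache.ignite.Ignition;
        import org.apache.ignite.configuration.DataStorageConfiguration;
        import org.apache.ignite.configuration.IgniteConfiguration;

        public class SeparateWalSetup {
            public static void main(String[] args) {
                DataStorageConfiguration storageCfg = new DataStorageConfiguration();

                // Enable Ignite Native Persistence for the default data region.
                storageCfg.getDefaultDataRegionConfiguration().setPersistenceEnabled(true);

                // Partition and index files go to the first SSD...
                storageCfg.setStoragePath("/mnt/ssd1/ignite/storage");

                // ...while the WAL and its archive live on the second one.
                storageCfg.setWalPath("/mnt/ssd2/ignite/wal");
                storageCfg.setWalArchivePath("/mnt/ssd2/ignite/wal-archive");

                IgniteConfiguration cfg = new IgniteConfiguration()
                    .setDataStorageConfiguration(storageCfg);

                Ignite ignite = Ignition.start(cfg);
                ignite.cluster().active(true); // persistent clusters start deactivated
            }
        }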


    SSD Over-provisioning

    Like the Java heap, an SSD requires free space to perform efficiently and to avoid significant performance drops due to GC. All SSD manufacturers reserve some amount of space for that purpose. This is called over-provisioning.

    Here you, as a software guy, should keep in mind that the performance of random writes on a 50% filled disk is much better than on a 90% filled disk because of the SSD's over-provisioning and GC. Consider buying SSDs with a higher over-provisioning rate, and make sure the manufacturer provides tools to adjust it.
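
    The back-of-the-envelope math here is simple: effective over-provisioning is commonly expressed as (physical capacity - user-visible capacity) / user-visible capacity. A tiny sketch with made-up drive sizes:

        public class OverProvisioningMath {
            // Spare capacity relative to what the user can fill up.
            static double opRatio(double physicalGb, double userGb) {
                return (physicalGb - userGb) / userGb;
            }

            public static void main(String[] args) {
                // A drive with 512 GB of raw flash sold as 480 GB:
                System.out.printf("Factory OP:  %.1f%%%n", 100 * opRatio(512, 480)); // 6.7%

                // Leave a fifth of the user space unpartitioned (480 -> 384 GB),
                // and the controller gets far more room for GC:
                System.out.printf("Adjusted OP: %.1f%%%n", 100 * opRatio(512, 384)); // 33.3%
            }
        }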

    Anything else?

    That's enough for the beginning. If you are the sort of guy who wants to get the most out of the hardware by tweaking the page size or swapping settings, refer to this tuning page maintained by the Apache Ignite community.



  3. Bothersome Ad

    A strange thing keeps happening whenever I google for 'apache ignite' - Hazelcast's advertisement bubbles up to the top of the list, suggesting that Hazelcast is up to 50% faster than Apache Ignite:

    The first suspicious thing to note right after you click on the link is that Hazelcast compares itself to Apache Ignite 1.5, which was released more than a year ago! Secondly, I totally agree that it's fine to boast about your success stories for some period of time, but it's funny to see this continue for a whole year without the benchmarking results on the targeted page ever being updated.

    Well, this seems to be an oversight on the part of Hazelcast's marketing team. It happens. So, let's help the team get back to reality and show the present state of affairs by comparing the latest versions of Apache Ignite and Hazelcast.

    General Benchmarking

    The simplest way to benchmark a distributed platform like Apache Ignite or Hazelcast is to launch a cluster of several machines and run a client process that produces the load and gathers the benchmarking results. For the sake of general benchmarking, a cluster of 4 server/data nodes was prepared on AWS, and the load was coming from a single client machine (a.k.a. the application). Yardstick was used as the benchmarking framework. All the parameters and instructions are listed below:


    AWS EC2 Configuration
    EC2 Instance: r4.2xlarge
    CPU: 8
    RAM: 61 GB
    OS: Ubuntu 16.04
    Java: Java(TM) SE Runtime Environment 1.8.0_121-b13, Oracle Corporation Java HotSpot(TM) 64-Bit Server VM 25.121-b13



    Yardstick Configuration
    Nodes: 1 Client, 4 Servers
    Threads: 64
    Backups: 1, Synchronous
    Running Yardstick on Amazon: https://github.com/apacheignite/yardstick-ignite#running-on-amazon
    Yardstick and Clusters Configurations: https://github.com/gridgain/yardstick/tree/master/results/HZ-3.8.1-vs-IGNITE-1.9-c-1-s-4-sm-FULL_SYNC-b-1

    By following the "Running Yardstick on Amazon" instructions with the provided configurations, anyone can reproduce these numbers:


    Complete results: https://github.com/gridgain/yardstick/tree/master/results/HZ-3.8.1-vs-IGNITE-1.9-c-1-s-4-sm-FULL_SYNC-b-1

    It's obvious that Apache Ignite 1.9 significantly outperforms Hazelcast 3.8.1 in most of the basic operations, pulling ahead by up to 160% in some of the scenarios.

    At the same time, we can see that Hazelcast performs better in some atomic operations, getting ahead of Apache Ignite by up to 4%. Honestly, it's great to know that there is still room for performance improvements in Apache Ignite and that Hazelcast doesn't make the life of Ignite's performance engineers easier.
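
    For context, the atomic benchmarks boil down to operations of the following shape, sketched here with the plain Ignite cache API (the cache name and key range are invented for illustration; Yardstick wraps such loops with warmup and throughput accounting):

        import java.util.concurrent.ThreadLocalRandom;
        import org.apache.ignite.Ignite;
        import org.apache.ignite.IgniteCache;
        import org.apache.ignite.Ignition;

        public class AtomicPutGetSketch {
            public static void main(String[] args) {
                // Starts a node with the default configuration; a real benchmark
                // client would join the existing cluster in client mode.
                Ignite ignite = Ignition.start();
                IgniteCache<Integer, String> cache = ignite.getOrCreateCache("atomic");

                // The measured loop: a random put followed by a random get.
                for (int i = 0; i < 1_000_000; i++) {
                    int key = ThreadLocalRandom.current().nextInt(100_000);
                    cache.put(key, "value-" + key);
                    cache.get(ThreadLocalRandom.current().nextInt(100_000));
                }
            }
        }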

    However, after that performance loss was spotted, we decided to run the same set of benchmarks under a higher load that is more relevant to production scenarios - the load was generated by 8 client machines (a.k.a. applications) rather than by a single one. The results were surprising and uplifting, as we can see in the next section.

    Put More Load

    This is the only part of the previously provided Yardstick configuration that was modified:


    Yardstick Configuration
    Nodes: 8 Clients, 4 Servers
    Yardstick and Clusters Configurations: https://github.com/gridgain/yardstick/tree/master/results/HZ-3.8.1-vs-IGNITE-1.9-c-8-s-4-sm-FULL_SYNC-b-1

    All in all, after the total number of client machines was increased from 1 to 8, the following numbers were reproduced:


    Complete results: https://github.com/gridgain/yardstick/tree/master/results/HZ-3.8.1-vs-IGNITE-1.9-c-8-s-4-sm-FULL_SYNC-b-1

    These are the numbers taken from one of the client machines. To get the total number of operations per second, we just need to sum the numbers across all of them. In any case, looking at the results now, we see that Apache Ignite beats Hazelcast in every benchmark, even under the higher load.

    For instance, the Apache Ignite ANSI-99 SQL engine now outperforms Hazelcast's predicates-based querying engine by 200%, while in the 1-client-machine scenario the difference was only around 80%.
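
    The two engines are programmed quite differently, which is worth keeping in mind when reading the numbers. Roughly (the Person model and its fields are invented for illustration), the competing queries look like this:

        import java.util.Collection;
        import java.util.List;
        import com.hazelcast.core.IMap;
        import com.hazelcast.query.Predicates;
        import org.apache.ignite.IgniteCache;
        import org.apache.ignite.cache.query.SqlFieldsQuery;

        public class QueryStyles {
            static class Person { int id; int salary; } // hypothetical model

            // Ignite: full ANSI-99 SQL over an indexed cache.
            static List<List<?>> igniteSide(IgniteCache<Integer, Person> cache) {
                return cache.query(new SqlFieldsQuery(
                    "SELECT id, salary FROM Person WHERE salary BETWEEN ? AND ?")
                    .setArgs(1_000, 2_000)).getAll();
            }

            // Hazelcast 3.x: predicate-based filtering over an IMap.
            static Collection<Person> hazelcastSide(IMap<Integer, Person> map) {
                return map.values(Predicates.between("salary", 1_000, 2_000));
            }
        }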

    Even more, Apache Ignite took the lead in all the atomic benchmarks, jumping from the 4% it lost to Hazelcast before to a victorious 42% in the atomic-put-get-bs-6 scenario.

    The Upshot

    It's always up to you to decide what kind of product to use in production. But the golden rule is that you shouldn't blindly trust official numbers or data prepared by a vendor. Use all that information as a starting point, then get to know a product and test it against your own scenario. Only this way will you find out which product suits your case better.


