Testistanbul 2016 - Keynote: "Performance Testing of Big Data" by Roland Leusden


Agile, Continuous Integration, DevOps, and Big Data are no longer buzzwords but part of the day-to-day work of everyone involved in software development and delivery. To cope with applications that need to be deployed to production almost the moment they are created, software development has changed, impacting the way of working for everyone in the team. In this talk, Roland will discuss the challenges performance testers face with Big Data applications and how architecture, Agile, Continuous Integration, and DevOps come together to create solutions.


  1. Performance Testing of Big Data, 26 April 2016
  2. (image slide)
  3. (image slide)
  4. (image slide)
  5. (image slide)
  6. Big Data refers to data that, because of its size, speed, or format (that is, its volume, velocity, or variety), cannot be easily stored, manipulated, or analyzed with traditional methods such as spreadsheets, relational databases, or common statistical software.
  7. (image slide)
  8. Production-like Big Data cluster; test data (terabytes/petabytes); test data management?; load-generating cluster.
  9. (image slide)
  10. Corporate Data Architecture – data is fast before it's big. Data often comes into data systems in streams, with events happening hundreds to tens of thousands of times a second. The things we do with fast data: • Ingest – get millions of events per second into the system • Decide – make a data-driven decision on each event • Analyze in real time – provide visibility into operational trends of the events. (A minimal sketch of this ingest/decide/analyze pattern follows after the slide list.)
  11. Lambda
  12. Kappa
  13. Component Performance Testing: these systems are made up of multiple components, and it is essential to test each of these components in isolation.
  14. (image slide)
  15. Storm is a distributed real-time computation system for processing large volumes of high-velocity data. Storm is extremely fast, with the ability to process over a million records per second per node on a cluster of modest size. The logic for a real-time application is packaged into a Storm topology, which is analogous to a MapReduce job. A spout is a source of streams in a topology. Streams are composed of tuples; the tuple is Storm's main data structure, a named list of values where each value can be of any type. Bolts can do anything from filtering, functions, and aggregations to joins and talking to databases. (See the Storm topology sketch after the slide list.)
  16. (image slide)
  17. Due to a lack of real-world streaming benchmarks, we developed one to compare Apache Flink, Apache Storm, and Apache Spark Streaming; it has been released as open source. Related tools: Storm Benchmark, authored by Taylor Goetz; Storm Benchmark, authored by Manu Zhang.
  18. Apache distribution benchmarks: • TestDFSIO – read and write test for HDFS • TeraSort – sorts 1 TB of data (or any other amount) as fast as possible; a benchmark that combines testing the HDFS and MapReduce layers of a Hadoop cluster • NNBench – used for load testing the NameNode hardware and configuration • MRBench – checks whether small jobs are responsive and running efficiently on your cluster. HiBench is a Hadoop benchmark suite consisting of both micro-benchmarks and real-world applications.
  19. Monitoring – Chukwa is an open source data collection system for monitoring and analyzing large distributed systems. It is built on top of Hadoop and includes a powerful and flexible toolkit for monitoring, analyzing, and viewing results. Many components of Chukwa are pluggable, allowing easy customization and enhancement.
  20. Dr. Elephant is a performance monitoring and tuning tool for Hadoop and Spark. It automatically gathers all the metrics, runs analysis on them, and presents them in a simple way for easy consumption. Open sourced by LinkedIn on 08-04-2016.
  21. Thinking Scalability – scalability is the ability of the software to keep up its performance under increasing load by adding resources linearly. But achieving scalability requires more than just adding resources and tuning performance; one needs to think holistically about software design, quality, maintainability, and performance. Necessary conditions for scalability: • the software has a sound architecture and high quality • the software is easy to release, monitor, and tweak • the software's performance can keep up with additional load by adding resources linearly.
  22. (image slide)
  23. (image slide)
  24. Q & A. Praegus B.V. - Experts in Testing & Test Automation.
  25. (image slide)
  26. Docker lets you limit a container's CPU resources with the --cpu-shares flag. Example: DataBase @1024 ~66% and WebServer @512 ~33% (total shares 1536); adding ApplicationServer @2048 changes this to DataBase ~28%, WebServer ~14%, ApplicationServer ~57% (total shares 3584). CPU shares differ from memory limits in that they are enforced only when there is contention for time on the CPU; if other processes and containers are idle, a container may burst well beyond its share. (See the CPU-share calculation sketch after the slide list.)
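
A minimal Java sketch of the ingest / decide / analyze pattern from slide 10. This is not code from the talk: the queue capacity, the 95.0 decision threshold, and the class name are hypothetical, and a real fast-data system would use a streaming platform rather than a single in-process queue.

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.ThreadLocalRandom;
    import java.util.concurrent.atomic.LongAdder;

    public class FastDataPipelineSketch {

        // Ingest buffer: decouples the event producer from the decide/analyze path.
        private static final BlockingQueue<Double> buffer = new ArrayBlockingQueue<>(100_000);

        // Real-time analysis: live counters giving visibility into the stream.
        private static final LongAdder processed = new LongAdder();
        private static final LongAdder flagged = new LongAdder();

        public static void main(String[] args) throws InterruptedException {
            // Ingest: a producer pushing synthetic events into the buffer.
            // offer() silently drops events when the buffer is full (crude backpressure).
            Thread producer = new Thread(() -> {
                for (int i = 0; i < 1_000_000; i++) {
                    buffer.offer(ThreadLocalRandom.current().nextDouble(0, 100));
                }
            });

            // Decide: make a data-driven decision on each event as it arrives.
            Thread consumer = new Thread(() -> {
                try {
                    while (true) {
                        double value = buffer.take();
                        processed.increment();
                        if (value > 95.0) {   // hypothetical per-event decision rule
                            flagged.increment();
                        }
                    }
                } catch (InterruptedException ignored) {
                    // stop consuming when interrupted
                }
            });

            producer.start();
            consumer.start();
            producer.join();
            Thread.sleep(1_000);              // let the consumer drain most of the buffer
            consumer.interrupt();

            // Analyze: report the operational trend counters.
            System.out.printf("processed=%d flagged=%d%n", processed.sum(), flagged.sum());
        }
    }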
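
Slide 15's vocabulary (spout, stream, tuple, bolt, topology) maps onto Storm's Java API roughly as below. This is a hedged sketch, not material from the talk: it assumes the Storm 1.x API (org.apache.storm packages), and the component names, threshold, and parallelism hints are invented.

    import org.apache.storm.Config;
    import org.apache.storm.LocalCluster;
    import org.apache.storm.spout.SpoutOutputCollector;
    import org.apache.storm.task.TopologyContext;
    import org.apache.storm.topology.BasicOutputCollector;
    import org.apache.storm.topology.OutputFieldsDeclarer;
    import org.apache.storm.topology.TopologyBuilder;
    import org.apache.storm.topology.base.BaseBasicBolt;
    import org.apache.storm.topology.base.BaseRichSpout;
    import org.apache.storm.tuple.Fields;
    import org.apache.storm.tuple.Tuple;
    import org.apache.storm.tuple.Values;

    import java.util.Map;
    import java.util.concurrent.ThreadLocalRandom;

    public class MinimalStormTopology {

        // Spout: the source of the stream, emitting tuples (named lists of values).
        public static class RandomNumberSpout extends BaseRichSpout {
            private SpoutOutputCollector collector;

            @Override
            public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
                this.collector = collector;
            }

            @Override
            public void nextTuple() {
                // Emit one tuple with a single named field "value".
                collector.emit(new Values(ThreadLocalRandom.current().nextInt(0, 100)));
            }

            @Override
            public void declareOutputFields(OutputFieldsDeclarer declarer) {
                declarer.declare(new Fields("value"));
            }
        }

        // Bolt: here it only filters, but bolts can also aggregate, join, or write to databases.
        public static class ThresholdFilterBolt extends BaseBasicBolt {
            @Override
            public void execute(Tuple input, BasicOutputCollector collector) {
                int value = input.getIntegerByField("value");
                if (value > 90) {
                    collector.emit(new Values(value));
                }
            }

            @Override
            public void declareOutputFields(OutputFieldsDeclarer declarer) {
                declarer.declare(new Fields("highValue"));
            }
        }

        public static void main(String[] args) throws Exception {
            // Topology: wires the spout to the bolt, analogous to defining a MapReduce job.
            TopologyBuilder builder = new TopologyBuilder();
            builder.setSpout("numbers", new RandomNumberSpout(), 1);
            builder.setBolt("filter", new ThresholdFilterBolt(), 2).shuffleGrouping("numbers");

            // Run in-process for a quick local test; submit to a cluster for real load.
            LocalCluster cluster = new LocalCluster();
            cluster.submitTopology("demo", new Config(), builder.createTopology());
            Thread.sleep(10_000);
            cluster.shutdown();
        }
    }

For component performance testing (slide 13), a bolt like ThresholdFilterBolt can be exercised on its own before the full topology is put under load.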
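
The percentages on slide 26 follow from one rule: under CPU contention, each container gets roughly its --cpu-shares weight divided by the sum of all weights. A small Java sketch of that arithmetic (the container names and weights are simply the slide's example):

    import java.util.LinkedHashMap;
    import java.util.Map;

    // Sketch of the --cpu-shares arithmetic: under contention each container
    // receives roughly weight / sum(weights) of the available CPU time.
    public class CpuShareCalc {

        static void print(Map<String, Integer> shares) {
            int total = shares.values().stream().mapToInt(Integer::intValue).sum();
            shares.forEach((name, weight) ->
                    System.out.printf("%-18s @%d  ~%.0f%%%n", name, weight, 100.0 * weight / total));
            System.out.println("Total shares: " + total + "\n");
        }

        public static void main(String[] args) {
            Map<String, Integer> twoContainers = new LinkedHashMap<>();
            twoContainers.put("database", 1024);
            twoContainers.put("webserver", 512);
            print(twoContainers);                 // ~66% / ~33% of 1536

            Map<String, Integer> threeContainers = new LinkedHashMap<>(twoContainers);
            threeContainers.put("applicationserver", 2048);
            print(threeContainers);               // ~28% / ~14% / ~57% of 3584
        }
    }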