
Making Hadoop and Cassandra Work Together

On August 21-23, 2012, Altoros took part in NoSQL Now!, where the company’s CEO, Renat Khasanshyn, presented the session “Making Hadoop and Cassandra Work Together to Process 5+ TB of Data Daily.”

The session described real data-intensive projects the company has developed. Renat presented the business challenges the customers wanted to solve and explained how a NoSQL database accelerated by Hadoop enabled them to achieve strong performance in high-load applications.

Published in: Technology, Business

  1. Making Hadoop & Cassandra work together. © Altoros Systems, Inc.
  2. About Altoros
     – Software delivery acceleration specialist for big data application implementation services
     – 200+ employees globally (US, Eastern Europe, UK, Denmark, Norway)
     – Big data practice areas: automated device analytics, advertising analytics, big data warehouse
     – Customers, Partners, Implementation Partner
  3. The Product
  4. The Problem: Data is Big
     – 10-20 sensors per house
     – Ability to support tens of thousands of households
     – 1 sensor: ~1.1 MB/day
     – 1,000 households: 11 GB/day
     – 500,000 households: 5 TB/day
  5. The Dashboard
  6. Full Visibility
  7. The Problem: Performance
     – MySQL showed slow performance under intensive writes
     – Target throughput isn’t scalable
     – Disk performance is a bottleneck
     – Monitoring with iostat -dmx
     – Old-fashioned single-threaded batch processing is slow
     – Make it parallel!
  8. Requirements
     – Highly responsive system with parallel processing
     – Reliable
        – Partial failure is acceptable
        – Node and data recoverability
     – Scalable
        – Load capacity
        – Max throughput
     – Total cost of ownership
        – Data compression
  9. NoSQL Database Requirements
     – Fast writes are critical
     – Querying by column and range of keys
     – Secondary indices
     – Good map/reduce compatibility using Apache Hadoop
  10. (no slide text)
  11. Why Cassandra
      – Good overall balance of features, scalability, reliability
      – We wanted BigTable-like features: columns, column families
      – Well suited for large streams of non-transactional data
      – Provides good, consistent write throughput
      – Tunable trade-offs for distribution and replication (N, R, W)
  12. 12. File system HDFS – Is a file system behind our Cassandra implementation – Data coherency: write-once-read-many access © Altoros Systems, Inc.
  13. Cassandra: Best Used When…
      – You write more than you read (logging)
      – Every component of the system must be in Java
      – You have, or may have in the future, complex configuration requirements
  14. Cassandra Challenges
      – High, unpredictable write volume
      – Varying schema, variable message size
      – Two types of series: data and lookups
      – All time-series, even metadata; no supplemental DB
  15. No Cassandra Compression?
      – Built-in Cassandra compression claims to compress across columns with identical names.
      – All our data columns are timestamped, so no two will ever have identical names.
  16. Numbers
      – “Benchmark” Cassandra node
      – LZO Compression
  17. Lessons Learned
      – Consider hybrid: RDBMS + NoSQL + Hadoop
      – Hadoop
         – Is for offline processing and analysis
         – Is NOT for random reading and writing records
      – Cassandra complements Hadoop with querying capabilities
  18. Thank you! @renatkhasanshyn, @altoros, renat.k@altoros.com
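
The volume figures on slide 4 can be rechecked with a few lines of arithmetic. The sketch below is not from the talk; the function names are hypothetical and decimal units (1 GB = 1,000 MB) are assumed. It uses only the numbers stated on the slide: 10-20 sensors per house and roughly 1.1 MB per sensor per day.

```python
# Rough check of the slide 4 data-volume figures.
# Assumptions: decimal units (1 GB = 1,000 MB, 1 TB = 1,000,000 MB);
# all names below are illustrative, not taken from the presentation.

MB_PER_SENSOR_PER_DAY = 1.1  # "1 sensor ~1.1 MB/day" (slide 4)


def daily_volume_mb(households: int, sensors_per_house: int) -> float:
    """Total data produced per day, in megabytes."""
    return households * sensors_per_house * MB_PER_SENSOR_PER_DAY


def format_volume(mb: float) -> str:
    """Render a megabyte count using the largest fitting decimal unit."""
    if mb >= 1_000_000:
        return f"{mb / 1_000_000:.1f} TB"
    if mb >= 1_000:
        return f"{mb / 1_000:.1f} GB"
    return f"{mb:.1f} MB"


if __name__ == "__main__":
    for households in (1_000, 500_000):
        low = format_volume(daily_volume_mb(households, 10))
        high = format_volume(daily_volume_mb(households, 20))
        print(f"{households:>7,} households: {low} to {high} per day")
```

With 10 sensors per house the result matches the slide's 11 GB/day for 1,000 households and comes to about 5.5 TB/day for 500,000, in line with the "5+ TB" in the session title. With 20 sensors per house both figures double.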

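Slide 11 mentions tunable trade-offs (N, R, W) without defining the letters. Assuming the usual Dynamo-style meaning (N replicas per key, R replicas that must answer a read, W replicas that must acknowledge a write), a read is guaranteed to touch at least one replica that holds the latest acknowledged write when R + W > N. The sketch below only illustrates that rule; the function names are hypothetical and it does not call any Cassandra API.

```python
# Illustration of the N/R/W trade-off, assuming Dynamo-style semantics:
#   n = number of replicas per key
#   r = replicas that must respond to a read
#   w = replicas that must acknowledge a write
# Function names are made up for this example.


def replicas_overlap(n: int, r: int, w: int) -> bool:
    """True when every read set must intersect every write set."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must each be between 1 and n")
    return r + w > n


def quorum(n: int) -> int:
    """Smallest majority of n replicas."""
    return n // 2 + 1


if __name__ == "__main__":
    n = 3
    for r, w in [(1, 1), (1, n), (quorum(n), quorum(n))]:
        status = "overlap" if replicas_overlap(n, r, w) else "no overlap"
        print(f"N={n} R={r} W={w}: {status}")
```

For a write-heavy workload such as the one described in the talk, a low W keeps writes fast, at the cost of reads that may return stale data unless R is raised to compensate.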