Making Hadoop and Cassandra Work Together

On Aug 21-23, 2012, Altoros took part in NoSQL Now! and the company’s CEO, Renat Khasanshyn, presented a session “Making Hadoop and Cassandra Work Together to Process 5+ TB of Data Daily.”

The session described real data-intensive projects the company has developed. Renat presented the business challenges the customers wanted to solve and explained how a NoSQL database accelerated by Hadoop enabled them to achieve high performance in high-load applications.

Published in: Technology, Business

  1. Making Hadoop & Cassandra work together © Altoros Systems, Inc.
  2. About Altoros
       • Software delivery acceleration specialist for big data application implementation services
       • 200+ employees globally (US, Eastern Europe, UK, Denmark, Norway)
       • Big data practice areas: automated device analytics, advertising analytics, big data warehouse
       • [Logo panels: Customers, Partners, Implementation Partner]
  3. The Product
  4. The Problem: Data is Big
       • 10-20 sensors per house
       • Ability to support tens of thousands of households
       • 1 sensor: ~1.1 MB/day
       • 1,000 households: 11 GB/day
       • 500,000 households: 5 TB/day
  5. The Dashboard
  6. Full Visibility
  7. The Problem: Performance
       • MySQL showed slow performance under intensive writes
       • Target throughput isn’t scalable
       • Disk performance is a bottleneck (monitored with iostat -dmx)
       • Old-fashioned single-threaded batch processing is slow: make it parallel!
  8. Requirements
       • Highly responsive system with parallel processing
       • Reliable: partial failure is acceptable; node and data recoverability
       • Scalable: load capacity, max throughput
       • Total cost of ownership: data compression
  9. NoSQL Database Requirements
       • Fast writes are critical
       • Querying by column and range of keys
       • Secondary indices
       • Good map/reduce compatibility using Apache Hadoop
  10. (no slide text)
  11. Why Cassandra
       • Good overall balance of features, scalability, reliability
       • We wanted BigTable-like features: columns, column families
       • Well suited for large streams of non-transactional data
       • Provides good, consistent write throughput
       • Tunable trade-offs for distribution and replication (N, R, W)
  12. File system: HDFS
       • Is a file system behind our Cassandra implementation
       • Data coherency: write-once-read-many access
  13. Cassandra: Best Used When…
       • You write more than you read (logging)
       • Every component of the system must be in Java
       • You have, or may have in the future, complex configuration requirements
  14. Cassandra Challenges
       • High, unpredictable write volume
       • Varying schema, variable message size
       • 2 types of series: data, lookups
       • All time-series, even metadata: no supplemental DB
  15. No Cassandra Compression?
       • Built-in Cassandra compression claims to compress across columns with identical names.
       • All our data columns are timestamped, so no two will ever have identical names.
  16. Numbers
       • [Chart labels: “Benchmark”, Cassandra node, LZO Compression]
  17. Lessons Learned
       • Consider hybrid RDBMS + NoSQL + Hadoop
       • Hadoop
            • Is for offline processing and analysis
            • Is NOT for random reading and writing records
       • Cassandra complements Hadoop with querying capabilities
  18. Thank you!
       • @renatkhasanshyn
       • @altoros
       • renat.k@altoros.com
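
The sizing figures on slide 4 can be reproduced with a little arithmetic. The sketch below is an illustration, not code from the talk: it assumes decimal units (1 GB = 1,000 MB) and that every household has the same number of sensors, and the helper names `daily_volume_mb` and `format_volume` are hypothetical. Under these assumptions, 10 sensors per house gives 11 GB/day for 1,000 households and about 5.5 TB/day for 500,000 households, in line with the slide's rounded 5 TB/day; at 20 sensors per house the volume doubles.

```python
# Rough sizing check for the figures on slide 4.
# Assumptions (not from the talk): decimal units, every household has the
# same number of sensors, and each sensor emits ~1.1 MB/day.

MB_PER_SENSOR_PER_DAY = 1.1


def daily_volume_mb(households: int, sensors_per_house: int) -> float:
    """Return the estimated raw data volume in MB per day."""
    return households * sensors_per_house * MB_PER_SENSOR_PER_DAY


def format_volume(mb: float) -> str:
    """Format an MB value using decimal GB/TB units."""
    if mb >= 1_000_000:
        return f"{mb / 1_000_000:.1f} TB/day"
    if mb >= 1_000:
        return f"{mb / 1_000:.1f} GB/day"
    return f"{mb:.1f} MB/day"


if __name__ == "__main__":
    for households in (1_000, 500_000):
        for sensors in (10, 20):
            volume = daily_volume_mb(households, sensors)
            print(f"{households:>7} households x {sensors} sensors: "
                  f"{format_volume(volume)}")
```

The range between the 10-sensor and 20-sensor cases is what the storage and write path would have to absorb, before any replication or compression is taken into account.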

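
Slide 11 mentions tunable trade-offs through N, R and W: the replication factor, the number of replicas consulted on a read, and the number of replicas that must acknowledge a write. A commonly cited rule for quorum-replicated stores is that every read overlaps the latest acknowledged write when R + W > N. The sketch below only checks that inequality; the helper names are hypothetical, the example values are assumptions rather than settings from the talk, and no Cassandra client API is involved.

```python
# Quorum overlap check for a replicated store with tunable N, R, W.
# N = replication factor, R = replicas consulted per read,
# W = replicas that must acknowledge a write.
# Hypothetical helper names; this does not talk to a Cassandra cluster.


def quorum(n: int) -> int:
    """Smallest majority of n replicas."""
    return n // 2 + 1


def reads_see_latest_write(n: int, r: int, w: int) -> bool:
    """True when every read set must intersect every write set (R + W > N)."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("R and W must be between 1 and N")
    return r + w > n


if __name__ == "__main__":
    n = 3  # example replication factor (an assumption)
    settings = {
        "read one, write one": (1, 1),
        "quorum reads and writes": (quorum(n), quorum(n)),
        "read all, write one": (n, 1),
    }
    for label, (r, w) in settings.items():
        overlap = reads_see_latest_write(n, r, w)
        print(f"N={n} R={r} W={w} ({label}): overlap guaranteed = {overlap}")
```

With N = 3, writing to a single replica keeps writes cheap, which matches the "fast writes are critical" requirement on slide 9, but reads then need to consult all replicas to be sure of overlapping the latest write.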