Cassandra conference

Issues and Tips for Big Data on Cassandra, by Shotaro Kamio, Rakuten.
2011/10/05, Cassandra Conference

Transcript

  • 1. Issues and Tips for Big Data on Cassandra. Shotaro Kamio, Architecture and Core Technology dept., DU, Rakuten, Inc.
  • 2. Contents
    1. Big Data Problem in Rakuten
    2. Contributions to Cassandra Project
    3. System Architecture
    4. Details and Tips
    5. Conclusion
  • 4. Big Data Problem in Rakuten
    • User data increases exponentially.
      – More than 1 billion records.
      – Doubles in size every second year.
    • We need a scalable solution to handle this big data.
    [Chart: total size by month-year, Jun-97 to Dec-10; the size doubles (x2) every 2 years.]
  • 5. Importance of Data Store in Rakuten
    • Rakuten has a lot of data.
      – User data, item data, reviews, etc.
    • Connectivity to Hadoop is expected.
    • High-performance, fault-tolerant, scalable storage is necessary → Cassandra
    [Diagram labels: Service A, Service B, Service C, …; Data A, Data B]
  • 6. Performance of New System (Cassandra)
    • Store all data in 1 day.
      – Achieved 15,000 updates/sec with quorum.
      – 50 times faster than DB.
    • Good read throughput.
      – Handles more than 100 read threads at a time.
    [Chart: DB vs. New; New reaches 15,000 updates/sec, x50 the DB.]
  • 8. Contributions to Cassandra Project
    • Tested 0.7.x - 0.8.x.
    • Bug reports / feedback to JIRA
      – CASSANDRA-2212, 2297, 2406, 2557, 2626 and more.
      – Bugs related to specific conditions, secondary indexes, and large datasets.
    • Contributed patches
      – Discussed in later slides.
  • 9. JIRA: Overflow in bytesPastMark(..)
    • https://issues.apache.org/jira/browse/CASSANDRA-2297
    • Hit the error on a row larger than 60GB.
      – The row is in a column family of super column type.
    • The bytesPastMark method was fixed to return a long value.
  • 10. JIRA: Stack overflow while compacting
    • https://issues.apache.org/jira/browse/CASSANDRA-2626
    • A long series of compactions causes a stack overflow.
      ← This occurs with a large dataset.
    • We helped with debugging.
  • 11. Challenges in OSS
    • Not well tested with real big data.
      → Rakuten can give a lot of feedback to the community.
      – Bug reports, patches, and communication.
    • The OSS becomes much more stable.
  • 12. Contribution of Patches
    • Column name aliasing
      – Encodes column names in a compact way.
      – Useful to reduce data size for structured (relational) data.
      – Reduces SSTable size by 15%.
    • Variable-length quantity (VLQ) compression
      – Reduces encoding overhead in columns.
      – Reduces SSTable size by 17%.
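The slides do not show how the aliasing patch works internally. As a minimal sketch of the general idea only, assuming a per-column-family dictionary that maps each full column name to a small integer, something like the following could be used. The class `ColumnAliasTable` and its methods are hypothetical and are not part of Cassandra or of the actual patch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical alias dictionary: long column names are stored once, rows carry small ids. */
final class ColumnAliasTable {
    private final Map<String, Integer> aliasByName = new HashMap<>();
    private final List<String> nameByAlias = new ArrayList<>();

    /** Returns the alias for a column name, registering the name on first use. */
    int aliasFor(String columnName) {
        Integer alias = aliasByName.get(columnName);
        if (alias == null) {
            alias = nameByAlias.size(); // next free id
            aliasByName.put(columnName, alias);
            nameByAlias.add(columnName);
        }
        return alias;
    }

    /** Resolves an alias back to the full column name when reading. */
    String nameFor(int alias) {
        return nameByAlias.get(alias);
    }
}
```

With structured (relational-style) data the same few column names repeat in every row, which is why replacing each name with a short id can shrink the stored data noticeably.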
  • 13. VLQ Compression Patch
    • The serializer is changed to use VLQ encoding.
    • A typical column has fixed-length fields of:
      – 2 bytes for column name length
      – 1 byte for flag
      – 8 bytes for TTL, deletion time
      – 8 bytes for timestamp
      – 4 bytes for length of value.
    • Those encoding overheads are reduced.
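The fixed fields listed on this slide add up to 2 + 1 + 8 + 8 + 4 = 23 bytes per column, regardless of how small the actual values are. The sketch below shows a generic unsigned base-128 variable-length encoding (little-endian groups, as in LEB128) to illustrate why small lengths shrink to a single byte. It is an assumption for illustration; the byte layout used by the actual patch is not given in the slides, and the class name `Vlq` is hypothetical.

```java
import java.io.ByteArrayOutputStream;

/** Illustrative unsigned base-128 codec; not the format used by the actual patch. */
final class Vlq {
    private Vlq() {}

    /** Encodes a non-negative value, 7 bits per byte, least-significant group first. */
    static byte[] encode(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("value must be non-negative");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while (value >= 0x80) {
            out.write((int) (value & 0x7F) | 0x80); // high bit set: more bytes follow
            value >>>= 7;
        }
        out.write((int) value); // last byte has the high bit clear
        return out.toByteArray();
    }

    /** Decodes a value produced by encode(). */
    static long decode(byte[] bytes) {
        long result = 0;
        int shift = 0;
        for (byte b : bytes) {
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
            shift += 7;
        }
        throw new IllegalArgumentException("truncated input");
    }
}
```

Under this encoding a value length of 10 takes 1 byte instead of the fixed 4, and any value below 128 fits in a single byte. Large values such as microsecond timestamps gain little or nothing, so the savings come mainly from the length and flag-like fields.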
  • 15. System Architecture
    [Diagram components: DB … DB, Data feeder, Cassandra 1, Cassandra 2, Services, Batch, Backup]
  • 17. Planning: Schema Design
    • Data modeling is a key to scalability.
    • Design the schema.
      – Query patterns for super columns and normal columns.
    • Think about queries based on use cases.
      – Use batch operations to reduce the number of requests, because Thrift has communication overhead.
    • Secondary index
      – We used it to find updated data.
    • Choose the partitioner appropriately.
      – One partitioner per cluster.
  • 18. Secondary Index
    • Pros
      – Useful to query based on a column value.
      – It can reduce consistency problems.
      – For example, to query updated data based on update-time.
    • Cons
      – Performance of a complex query depends on the data. E.g., Year == 2011 and Price < 100
  • 19. A Bit Detail of Secondary Index
    Works like a hash + filters.
    1. Pick up a row which has a key for the index (hash).
    2. Apply filters.
      – Collect the result if all filters are matched.
    3. Repeat until the requested number of rows is obtained.
    E.g., Year == 2011 and Price < 100
      Key1: Year = 2011
      Key2: Year = 2011, Price = 1,000
      Key3: Year = 2011, Price = 10
      Key4: Year = 2011, Price = 10,000
      Key5: Year = 2011, Price = 200
    Many keys with Year = 2011, but only a few results.
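The "hash + filters" behaviour can be made concrete with a toy model. The sketch below is an in-memory simulation using the five example keys from this slide; it does not use Cassandra's classes or API, and all names (`IndexScanSketch`, `Row`, `ScanResult`) are hypothetical. It counts how many indexed rows have to be examined to satisfy a request.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of "hash + filters"; it does not mirror Cassandra's real classes. */
final class IndexScanSketch {
    static final class Row {
        final String key;
        final int year;
        final Integer price; // null when the row has no Price column

        Row(String key, int year, Integer price) {
            this.key = key;
            this.year = year;
            this.price = price;
        }
    }

    static final class ScanResult {
        final List<String> keys;
        final int examined; // rows fetched through the index, matched or not

        ScanResult(List<String> keys, int examined) {
            this.keys = keys;
            this.examined = examined;
        }
    }

    /** Builds the "hash" part: Year value -> rows carrying that value. */
    static Map<Integer, List<Row>> indexByYear(List<Row> rows) {
        Map<Integer, List<Row>> index = new HashMap<>();
        for (Row row : rows) {
            index.computeIfAbsent(row.year, y -> new ArrayList<>()).add(row);
        }
        return index;
    }

    /** Year == year AND Price < maxPrice, stopping once limit rows are collected. */
    static ScanResult query(Map<Integer, List<Row>> index, int year, int maxPrice, int limit) {
        List<String> keys = new ArrayList<>();
        int examined = 0;
        for (Row row : index.getOrDefault(year, new ArrayList<>())) {
            if (keys.size() >= limit) {
                break; // step 3: requested number of rows obtained
            }
            examined++; // step 1: pick up a row through the index
            if (row.price != null && row.price < maxPrice) {
                keys.add(row.key); // step 2: all filters matched
            }
        }
        return new ScanResult(keys, examined);
    }
}
```

With the slide's data, asking for one row with Year == 2011 and Price < 100 examines three rows before finding Key3, and asking for two rows examines all five and still returns only Key3. The work grows with the number of index hits, not with the number of results.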
  • 20. A Bit Detail of Secondary Index (2)
    • Consider the frequency of results for the query.
      – Very few results in a large data set → the query might time out.
    • Careful data/query design is necessary at present.
    • An improvement is under discussion: CASSANDRA-2915
  • 21. Planning: Data Size Estimation
    • Estimate future data volume.
    • Serialization overhead: x 3 - 4
      – Big overhead for small data.
      – We improved this with custom patches and compression code.
        • Cassandra 1.0 can use Snappy/Deflate compression.
    • Replication: x 3 (depends on your decision)
    • Compaction: x 2 or above
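The multipliers on this slide compound, which is easy to underestimate. The sketch below multiplies them out and also projects raw volume under the "doubles every second year" growth mentioned earlier in the deck. The class `DiskEstimate`, its method names, and the 1 TB input are hypothetical and only illustrate the arithmetic.

```java
/** Back-of-the-envelope disk sizing; all inputs are assumptions to be replaced by measurements. */
final class DiskEstimate {
    private DiskEstimate() {}

    /** Raw volume after the given number of years if data doubles every two years. */
    static double projectedRawBytes(double currentRawBytes, double years) {
        return currentRawBytes * Math.pow(2.0, years / 2.0);
    }

    /** Cluster-wide disk needed: raw x serialization x replication x compaction headroom. */
    static double requiredBytes(double rawBytes,
                                double serializationFactor,
                                int replicationFactor,
                                double compactionHeadroom) {
        return rawBytes * serializationFactor * replicationFactor * compactionHeadroom;
    }
}
```

For example, 1 TB of raw data with a serialization factor of 3.5, replication of 3, and compaction headroom of 2 gives 1 x 3.5 x 3 x 2 = 21 TB across the cluster, before any space for snapshots or backups.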
  • 22. Other Factors for Data Size
    • Obsolete SSTables
      – Disk usage may stay high after compaction.
      – Cassandra 0.8.x relies on GC to remove obsolete SSTables.
      – Improved in 1.0.
    • How to balance data distribution
      – Disk usage can be unbalanced (ByteOrderedPartitioner).
      – Partitioning, key design, initial token assignment.
      – It is very helpful if you know the data in advance.
    • The backup scheme affects disk space.
      – Backup space is needed.
      – Discussed later.
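As one illustration of initial token assignment, the sketch below computes evenly spaced tokens over the RandomPartitioner token range of 0 to 2^127. This is an assumption for illustration: the slide discusses ByteOrderedPartitioner, where tokens have to be chosen from the actual key distribution instead, which is why knowing the data in advance helps. The class `InitialTokens` is hypothetical.

```java
import java.math.BigInteger;

/** Evenly spaced initial tokens, assuming RandomPartitioner's 0..2^127 token range. */
final class InitialTokens {
    private InitialTokens() {}

    static BigInteger[] evenlySpaced(int nodeCount) {
        if (nodeCount <= 0) {
            throw new IllegalArgumentException("nodeCount must be positive");
        }
        BigInteger range = BigInteger.ONE.shiftLeft(127); // 2^127
        BigInteger[] tokens = new BigInteger[nodeCount];
        for (int i = 0; i < nodeCount; i++) {
            // token_i = i * 2^127 / nodeCount
            tokens[i] = range.multiply(BigInteger.valueOf(i)).divide(BigInteger.valueOf(nodeCount));
        }
        return tokens;
    }
}
```

Each resulting value would go into the corresponding node's initial token setting before the node first joins the cluster.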
  • 23. Configuration
    • We adopted Cassandra 0.8.x + custom patches.
    • Without mmap
      – No noticeable difference in performance.
      – Easier to monitor and debug memory usage and GC-related issues.
    • ulimit
      – Avoid file descriptor shortage. Need more than the number of db files. Bug??
      – “memlock unlimited” for JNA
      – Create /etc/security/limits.d/cassandra.conf (Red Hat)
  • 24. JVM / GC
    • Full GC has to be avoided at all times.
    • The JVM cannot utilize a large heap over 15G.
      – Slow GC. Can be unstable.
      – Don’t put too much data/cache into the heap.
      – Off-heap cache is available in 0.8.1.
    • Cassandra may use more memory than the heap size.
      – ulimit -d 25000000 (max data segment size)
      – ulimit -v 75000000 (max virtual memory size)
    • Benchmarking is needed to find appropriate parameters.
  • 25. Parameter Tuning for Failure Detector
    • Cassandra uses the Phi Accrual Failure Detector.
      – The Φ Accrual Failure Detector [SRDS04]
    • Failure detection errors occur when a node is under heavy access and/or GC is running.
    • The appropriate setting depends on the number of nodes:
      – Larger cluster, larger number.
    Code shown on the slide:
        double phi(long tnow)
        {
            int size = arrivalIntervals_.size();
            double log = 0d;
            if ( size > 0 )
            {
                // time elapsed since the last heartbeat arrived
                double t = tnow - tLast_;
                double probability = p(t);
                log = (-1) * Math.log10( probability );
            }
            return log;
        }

        double p(double t)
        {
            // exponential model based on the mean heartbeat interval
            double mean = mean();
            double exponent = (-1)*(t)/mean;
            return Math.pow(Math.E, exponent);
        }
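The slide's code is a fragment that depends on fields of Cassandra's failure detector. A self-contained sketch of the same formula, with the mean interval passed in explicitly, makes the tuning trade-off easier to see. The class `PhiSketch`, the method `isSuspected`, and the threshold parameter are hypothetical names; in Cassandra the threshold is, as far as I know, the `phi_convict_threshold` setting, which the slide does not name.

```java
/** Standalone version of the slide's formula: phi = -log10(exp(-t / mean)). */
final class PhiSketch {
    private PhiSketch() {}

    /** Suspicion level after t ms of silence, given the mean heartbeat interval in ms. */
    static double phi(double millisSinceLastHeartbeat, double meanIntervalMillis) {
        double probability = Math.exp(-millisSinceLastHeartbeat / meanIntervalMillis);
        return -Math.log10(probability);
    }

    /** A node is suspected once phi exceeds the configured threshold. */
    static boolean isSuspected(double millisSinceLastHeartbeat,
                               double meanIntervalMillis,
                               double threshold) {
        return phi(millisSinceLastHeartbeat, meanIntervalMillis) > threshold;
    }
}
```

Because the formula simplifies to phi = t / (mean x ln 10), each additional unit of threshold tolerates about 2.3 mean intervals of extra silence. With a mean interval of 1 second and a threshold of 8, a node is suspected after roughly 18.4 seconds without a heartbeat, so a long GC pause can trigger a false detection unless the threshold is raised.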
  • 26. Hardware
    • Benchmarking is important when deciding on hardware.
      – Requirements for performance, data size, etc.
      – Cassandra is good at utilizing CPU cores.
    • Network ports will be a bottleneck when scaling out…
      – A large number of low-spec servers, or
      – A small number of high-spec servers.
    • Our case:
      – High-spec CPUs and SSD drives
      – 2 clusters (active and test cluster)
  • 28. Customize Hector Library
    • A query can time out on Cassandra:
      – When Cassandra is temporarily under high load.
      – When a large result set is requested.
      – When a secondary index query times out.
    • Hector retries forever when a query times out.
    • The client cannot detect the infinite loop.
    • Customization:
      – After 3 timeouts, return an exception to the client.
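The slides do not show the modified Hector code. As a generic sketch of the behaviour described (give up after a fixed number of timeouts and surface the error), the wrapper below bounds the retries around any call. It deliberately avoids Hector's API; `BoundedRetry` and its parameters are hypothetical, and `java.util.concurrent.TimeoutException` stands in for whatever timeout exception the client library raises.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeoutException;

/** Generic bounded retry; a stand-in for the customization, not Hector's actual code. */
final class BoundedRetry {
    private BoundedRetry() {}

    /** Runs the query, retrying on timeout, and rethrows after maxTimeouts timeouts. */
    static <T> T call(Callable<T> query, int maxTimeouts) throws Exception {
        if (maxTimeouts < 1) {
            throw new IllegalArgumentException("maxTimeouts must be at least 1");
        }
        TimeoutException last = null;
        for (int attempt = 1; attempt <= maxTimeouts; attempt++) {
            try {
                return query.call();
            } catch (TimeoutException e) {
                last = e; // remember the failure and try again
            }
        }
        throw last; // the client now sees the timeout instead of waiting forever
    }
}
```

A call such as `BoundedRetry.call(() -> runQuery(), 3)` (where `runQuery` is a placeholder for the real client call) returns the result if any of the three attempts succeeds and throws otherwise, so the caller can decide whether to back off, shrink the request, or report the failure.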
  • 30. Testing: Data Consistency Check Tool
    • We wanted to make sure data is not corrupted within Cassandra.
    • We made a tool to check data consistency.
    [Diagram: input data (insert, update, delete) comes in periodically; Process A inserts, updates, and deletes data in Cassandra and in another database; Process B compares the data with that in Cassandra.]
  • 31. Testing: Data Consistency Check Tool (2)
    • Compares only the keys of the data, not the contents.
    • Useful for diagnosing which part is wrong in the test phase.
    • We found another team's bug as well.
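The tool itself is not shown in the slides. A minimal sketch of a key-only comparison, assuming both sides can be read out as sets of row keys, might look like the following. `KeyDiff` and its field names are hypothetical.

```java
import java.util.Set;
import java.util.TreeSet;

/** Key-only comparison between the expected key set and the keys found in the store. */
final class KeyDiff {
    final Set<String> missing;    // expected keys that the store does not have
    final Set<String> unexpected; // stored keys that should not exist (e.g. deleted rows)

    private KeyDiff(Set<String> missing, Set<String> unexpected) {
        this.missing = missing;
        this.unexpected = unexpected;
    }

    static KeyDiff compare(Set<String> expectedKeys, Set<String> storedKeys) {
        Set<String> missing = new TreeSet<>(expectedKeys);
        missing.removeAll(storedKeys);
        Set<String> unexpected = new TreeSet<>(storedKeys);
        unexpected.removeAll(expectedKeys);
        return new KeyDiff(missing, unexpected);
    }

    boolean isConsistent() {
        return missing.isEmpty() && unexpected.isEmpty();
    }
}
```

Comparing keys only keeps the check cheap on large data sets. It catches lost inserts and deletes that did not take effect, but not rows whose contents differ.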
  • 32. Repair
    • Some types of query don’t trigger read repair.
    • Nodetool repair is tricky on big data.
      – Disk usage
      – Time consuming
      → Read all data afterward: read repair
    • Discussion of improvements is ongoing:
      – CASSANDRA-2699
  • 34. Backup Scheme
    Backup might be required to shorten recovery time.
    1. Snapshot to local disk
      – Plan the disk size at the server estimation phase.
    2. Full backup of input data
      – We ran a full data feed several times for various reasons, e.g., logic change, schema change, data corruption, etc.
    [Diagram labels: DB … DB, Incoming data, Cassandra, Backup, Snapshot (x3)]
  • 36. Conclusion
    • Rakuten uses Cassandra with big data.
    • We’ll continue contributing to OSS.
  • 37. Finally... please allow me a little advertising...
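  • 38. We are hiring! We are actively recruiting mid-career hires!
    • Rakuten's Mission: to empower people and society (through the Internet), and to transform and enrich society through our own success.
    • Rakuten's GOAL: To become No.1 Internet Service Company in the World
    • If you share Rakuten's Mission & GOAL, please get in touch!
      tech-career@mail.rakuten.com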