Issues and Tips for Big Data
       on Cassandra



                     Shotaro Kamio
Architecture and Core Technology dept., DU, Rakuten, Inc.   1
Contents


1   Big Data Problem in Rakuten


2   Contributions to Cassandra Project


3   System Architecture


4   Details and Tips


5   Conclusion




                                         2

                                      
                                                                                                         
Big Data Problem in Rakuten


• User data increases exponentially.
   – More than 1 billion records.
   – Double its size every second year.
• We need a scalable solution to handle this big data.

[Figure: total data size by month-year, Jun-97 to Dec-10; annotation: x2 every 2 years]
4
Importance of Data Store in Rakuten


• Rakuten has a lot of data
   – User data, item data, reviews, etc.
• Expect connectivity to Hadoop
• High-performance, fault-tolerant, scalable
  storage is necessary → Cassandra


             Service A           Service B   Service C   …



             Data A                Data B


                                                             5
Performance of New System (Cassandra)


• Store all data in 1 day
     – Achieved 15,000 updates/sec with quorum.
     – 50 times faster than DB.
• Good read throughput
     – Handle more than 100 read threads at a time.

[Figure: bar chart comparing update throughput of DB and New (Cassandra): 15,000 updates/sec, x 50]


                                                              6
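As a rough check of the "store all data in 1 day" claim, the sketch below multiplies the stated 15,000 updates/sec by the seconds in a day, and computes the quorum size for a replication factor of 3 (the factor mentioned later in the deck). The class and method names are hypothetical; sustained throughput over a full day is an assumption.

```java
final class ThroughputSketch {
    /** Updates that fit in one day at a sustained rate (assumes no idle time). */
    static long updatesPerDay(long updatesPerSec) {
        return updatesPerSec * 24L * 60L * 60L;
    }

    /** Number of replicas that must acknowledge a quorum operation. */
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    public static void main(String[] args) {
        System.out.println(updatesPerDay(15_000)); // 1296000000
        System.out.println(quorum(3));             // 2
    }
}
```

At this rate a day covers about 1.3 billion updates, which is in line with the "more than 1 billion records" figure.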
Contributions to Cassandra Project


• Tested 0.7.x - 0.8.x

• Bug reports / feedback to JIRA
   – CASSANDRA-2212, 2297, 2406, 2557, 2626 and more
   – Bugs related to specific conditions, secondary indexes, and large
     datasets.
• Contributed patches
   – Discussed in later slides.




                                                                     8
JIRA: Overflow in bytesPastMark(..)


•   https://issues.apache.org/jira/browse/CASSANDRA-2297


• Hit the error on a row larger than 60 GB
     – The row is in a column family of super column type


• The bytesPastMark method was fixed to return a long value.
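The sketch below illustrates why an offset inside a 60 GB row cannot be carried in a Java int. It assumes, as the fix implies, that the earlier method returned an int; the method bodies and parameter names are hypothetical stand-ins, not Cassandra's actual code.

```java
final class BytesPastMarkSketch {
    /** Hypothetical pre-fix shape: the difference is narrowed to 32 bits. */
    static int bytesPastMarkInt(long current, long mark) {
        return (int) (current - mark);
    }

    /** Post-fix shape: the full 64-bit difference is returned. */
    static long bytesPastMarkLong(long current, long mark) {
        return current - mark;
    }

    public static void main(String[] args) {
        long mark = 0L;
        long current = 60L << 30; // 60 GiB past the mark
        System.out.println(bytesPastMarkInt(current, mark));  // 0 (low 32 bits only)
        System.out.println(bytesPastMarkLong(current, mark)); // 64424509440
    }
}
```

An int holds at most about 2 GiB, so any row past that size yields a wrong (often negative) byte count after narrowing.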




                                                           9
JIRA: Stack overflow while compacting


•   https://issues.apache.org/jira/browse/CASSANDRA-2626


• A long series of compactions causes a stack overflow.
← This occurs with a large dataset.

• Helped with debugging.




                                                           10
Challenges in OSS


• Not well tested with real big data.
→ Rakuten can give a lot of feedback to the community.
   – Bug reports, patches, and communication.
• OSS becomes much more stable.



                    Feedback




                                               11
Contribution of Patches


• Column name aliasing
  – Encodes column names in a compact way.
  – Useful for reducing data size for structured (relational)
    data.
  – Reduces SSTable size by 15%.
• Variable-length quantity (VLQ) compression
  – Reduces encoding overhead in columns.
  – Reduces SSTable size by 17%.
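The slides do not describe how the aliasing patch encodes names, so the following is only a minimal sketch of the general idea: map each distinct column name to a small integer and store that instead of the full name. The class, the 1-byte alias size, and the sample column names are all assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

final class ColumnAliasSketch {
    private final Map<String, Integer> aliasByName = new LinkedHashMap<>();

    /** Returns the alias for a column name, assigning the next free number on first use. */
    int aliasOf(String columnName) {
        Integer alias = aliasByName.get(columnName);
        if (alias == null) {
            alias = aliasByName.size();
            aliasByName.put(columnName, alias);
        }
        return alias;
    }

    /** Bytes saved per stored column if the name is replaced by a 1-byte alias (assumed size). */
    static int bytesSaved(String columnName) {
        return columnName.getBytes(StandardCharsets.UTF_8).length - 1;
    }

    public static void main(String[] args) {
        ColumnAliasSketch aliases = new ColumnAliasSketch();
        System.out.println(aliases.aliasOf("item_name"));   // 0
        System.out.println(aliases.aliasOf("review_text")); // 1
        System.out.println(aliases.aliasOf("item_name"));   // 0 (reused)
        System.out.println(bytesSaved("review_text"));      // 10
    }
}
```

This pays off for structured data because the same few column names repeat in every row.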




                                                             12
VLQ Compression Patch


• The serializer is changed to use VLQ encoding.
• A typical column has fixed-length fields:
   –   2 bytes for column name length
   –   1 byte for flag
   –   8 bytes for TTL, deletion time
   –   8 bytes for timestamp
   –   4 bytes for length of value.
• This encoding overhead is reduced.
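The fixed fields listed above add up to 23 bytes per column. A variable-length quantity stores small numbers in fewer bytes by using 7 payload bits per byte plus a continuation bit. The sketch below uses a common layout (least-significant group first); the byte layout of the actual patch is not given in the slides, so treat this as an illustration only.

```java
import java.io.ByteArrayOutputStream;

final class VlqSketch {
    /** Encodes a non-negative value, 7 bits per byte, high bit set on all but the last byte. */
    static byte[] encode(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("non-negative values only");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while (value >= 0x80) {
            out.write((int) (value & 0x7F) | 0x80); // more bytes follow
            value >>>= 7;
        }
        out.write((int) value); // final byte, continuation bit clear
        return out.toByteArray();
    }

    /** Decodes a value produced by encode. */
    static long decode(byte[] bytes) {
        long value = 0;
        int shift = 0;
        for (byte b : bytes) {
            value |= (long) (b & 0x7F) << shift;
            shift += 7;
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(encode(10).length);   // 1 byte instead of a 2-byte name length
        System.out.println(encode(300).length);  // 2 bytes instead of a 4-byte value length
        System.out.println(decode(encode(300))); // 300
    }
}
```

Short names and small values dominate structured data, which is where the saving on the length fields comes from.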



                                               13
System Architecture




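[Figure: architecture diagram. Components shown: a source DB, two Batch jobs, a Data feeder, two Cassandra clusters (Cassandra 1 and Cassandra 2, each drawn as a group of DB nodes), Services, and Backup.]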

                                                   15
Planning: Schema Design


• Data modeling is key to scalability.
• Design the schema
   – Query patterns for super columns and normal columns.
• Think about queries based on use cases.
   – Batch operations to reduce the number of requests, because Thrift has
     communication overhead.
• Secondary index
   – We used it to find updated data.
• Choose the partitioner appropriately.
   – One partitioner per cluster.




                                                                       17
Secondary Index


• Pros
   – Useful for querying based on a column value.
   – It can reduce consistency problems.
   – For example, querying updated data based on update time.
• Cons
   – Performance of a complex query depends on the data.
      E.g., Year == 2011 and Price < 100




                                                                18
A Bit of Detail on Secondary Index


• Works like a hash + filters.
    1. Pick up a row which has a key for the index (hash).
    2. Apply filters.
        – Collect the result if all filters are matched.
    3. Repeat until the requested number of rows is obtained.

E.g., Year == 2011 and Price < 100

Key1     Year = 2011
Key2     Year = 2011       Price = 1,000
Key3     Year = 2011       Price = 10
Key4     Year = 2011       Price = 10,000
Key5     Year = 2011       Price = 200

Many keys have Year = 2011, but only a few match the query.
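The three steps can be sketched on the five sample rows as follows. This is a simplified in-memory model, not Cassandra's implementation: the index lookup is simulated by a scan, and the `Row` class and `query` method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class IndexScanSketch {
    /** One row: an indexed Year column and an optional Price column (null when absent). */
    static final class Row {
        final String key;
        final int year;
        final Integer price;

        Row(String key, int year, Integer price) {
            this.key = key;
            this.year = year;
            this.price = price;
        }
    }

    /** The five rows from the slide. */
    static List<Row> sampleRows() {
        return Arrays.asList(
            new Row("Key1", 2011, null),
            new Row("Key2", 2011, 1000),
            new Row("Key3", 2011, 10),
            new Row("Key4", 2011, 10000),
            new Row("Key5", 2011, 200));
    }

    /** Evaluates "Year == year AND Price < maxPrice", returning at most limit keys. */
    static List<String> query(List<Row> rows, int year, int maxPrice, int limit) {
        List<String> result = new ArrayList<>();
        for (Row row : rows) {
            if (result.size() >= limit) break;  // step 3: enough rows collected
            if (row.year != year) continue;     // step 1: index lookup (simulated by a scan)
            if (row.price != null && row.price < maxPrice) { // step 2: apply the filter
                result.add(row.key);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(query(sampleRows(), 2011, 100, 10)); // [Key3]
    }
}
```

All five rows pass the index step, but only one survives the filter, so the cost of the query is driven by the number of index hits rather than the number of results.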

                                                                                 19
A Bit of Detail on Secondary Index (2)


• Consider the frequency of results for the query
     – Very few results in a large data set → the query might time
       out.
• Careful data/query design is necessary at this moment.
• Improvement is being discussed: CASSANDRA-2915




                                                             20
Planning: Data Size Estimation


• Estimate future data volume
• Serialization overhead: x 3 - 4
   – Big overhead for small data.
   – We improved this with custom patches and compression code
      • Cassandra 1.0 can use Snappy/Deflate compression.
• Replication: x 3 (depends on your decision)
• Compaction: x 2 or above
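One way to read these factors is as multipliers on the raw data volume. The sketch below applies them to 1 TB of raw data; multiplying the factors together is an assumption about how they combine, and the method name is hypothetical.

```java
final class DiskEstimateSketch {
    /** Disk needed = raw size x serialization overhead x replication x compaction headroom. */
    static double requiredTb(double rawTb, double serialization, int replication, double compaction) {
        return rawTb * serialization * replication * compaction;
    }

    public static void main(String[] args) {
        // 1 TB of raw data with the low and high serialization factors from the slide
        System.out.println(requiredTb(1.0, 3.0, 3, 2.0)); // 18.0
        System.out.println(requiredTb(1.0, 4.0, 3, 2.0)); // 24.0
    }
}
```

Under these assumptions, each terabyte of raw data needs roughly 18 to 24 TB of cluster disk before compression is applied.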




                                                            21
Other Factors for Data Size


• Obsolete SSTables
   – Disk usage may stay high after compaction.
   – Cassandra 0.8.x relies on GC to remove obsolete SSTables.
   – Improved in 1.0.

• How to balance data distribution
   – Disk usage can be unbalanced (ByteOrderedPartitioner).
   – Partitioning, key design, initial token assignment.
   – Very helpful if you know data in advance.



• Backup scheme affects disk space
   – Need backup space.
   – Discussed later.
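For the initial token assignment mentioned above, a common approach is to space tokens evenly around the ring. The sketch below does this for RandomPartitioner, whose token range is 0 to 2^127. Note the swap: with ByteOrderedPartitioner, which the slide names, tokens are key bytes and have to be chosen from the known key distribution, which this sketch does not cover.

```java
import java.math.BigInteger;

final class TokenSketch {
    /** Size of the RandomPartitioner token ring: 2^127. */
    static final BigInteger RING = BigInteger.ONE.shiftLeft(127);

    /** Evenly spaced initial tokens: token i = i * 2^127 / nodes. */
    static BigInteger[] initialTokens(int nodes) {
        BigInteger[] tokens = new BigInteger[nodes];
        for (int i = 0; i < nodes; i++) {
            tokens[i] = RING.multiply(BigInteger.valueOf(i)).divide(BigInteger.valueOf(nodes));
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (BigInteger token : initialTokens(4)) {
            System.out.println(token);
        }
    }
}
```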
                                                                 22
Configuration


• We adopted Cassandra 0.8.x + custom patches.
• Without mmap
   – No noticeable difference in performance
   – Easier to monitor and debug memory usage and GC-related
     issues
• ulimit
   – Avoid file descriptor shortage. Need more than the number of db
     files. Bug??
   – “memlock unlimited” for JNA
   – Create /etc/security/limits.d/cassandra.conf (Red Hat)




                                                                   23
JVM / GC


• Full GC must be avoided at all times.
• The JVM cannot make good use of a heap larger than 15 GB.
   – Slow GC. Can be unstable.
   – Don’t put too much data/cache into the heap.
   – Off-heap cache is available in 0.8.1
• Cassandra may use more memory than the heap size.
   – ulimit -d 25000000 (max data segment size)
   – ulimit -v 75000000 (max virtual memory size)
• Benchmarking is needed to find appropriate parameters.




                                                    24
Parameter Tuning for Failure Detector


•   Cassandra uses the Phi Accrual Failure Detector
     – The Φ Accrual Failure Detector [SRDS'04]
•   Failure detection errors occur when a node is under too much
    access load and/or GC is running
•   The parameter depends on the number of nodes:
     – Larger cluster, larger number.

        // phi grows with the time elapsed since the last heartbeat arrived
        double phi(long tnow)
        {
            int size = arrivalIntervals_.size();
            double log = 0d;
            if ( size > 0 )
            {
                double t = tnow - tLast_;
                double probability = p(t);
                log = (-1) * Math.log10( probability );
            }
            return log;
        }

        // probability that an arrival interval exceeds t,
        // for exponentially distributed intervals with the observed mean
        double p(double t)
        {
            double mean = mean();
            double exponent = (-1)*(t)/mean;
            return Math.pow(Math.E, exponent);
        }
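Because p(t) is a plain exponential, phi reduces to t / (mean × ln 10), so it grows linearly with the silence since the last heartbeat. The standalone sketch below reproduces that calculation and shows how long a node must stay silent before phi reaches a given threshold. The threshold of 8 and the 1-second mean interval are example values, and the method names are hypothetical.

```java
final class PhiSketch {
    /** phi for the exponential model used by the phi()/p() pair shown above. */
    static double phi(double millisSinceLast, double meanIntervalMillis) {
        double probability = Math.exp(-millisSinceLast / meanIntervalMillis);
        return -Math.log10(probability);
    }

    /** Silence (ms) after which phi reaches the threshold: threshold * ln(10) * mean. */
    static double silenceForThreshold(double threshold, double meanIntervalMillis) {
        return threshold * Math.log(10) * meanIntervalMillis;
    }

    public static void main(String[] args) {
        System.out.printf("%.3f%n", phi(1000, 1000));              // 0.434
        System.out.printf("%.0f%n", silenceForThreshold(8, 1000)); // 18421
    }
}
```

With these example values a node is suspected after roughly 18 seconds without a heartbeat, so a long GC pause or an overloaded node can trip the detector; a larger threshold tolerates longer pauses.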

                                                                                    25
Hardware


• Benchmarking is important when deciding on hardware.
   – Requirements for performance, data size, etc.
   – Cassandra is good at utilizing CPU cores.
• Network ports will be a bottleneck when scaling out…
   – Large number of low-spec servers or
   – Small number of high-spec servers.



     Our case:
     • High-spec CPU and SSD drives
     • 2 clusters (active and test cluster)



                                                     26
Customize Hector Library


• A query can time out on Cassandra:
   – When Cassandra is temporarily under high load
   – When a large result set is requested
   – When a secondary index query times out
• Hector retries forever when a query times out.
• The client cannot detect the infinite loop.
• Customization:
   – After 3 timeouts, return an exception to the client.
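The customization can be pictured as a bounded retry loop. The sketch below is generic Java and does not use Hector's API: the `BoundedRetry` class is hypothetical, and the query is modeled as a `Callable` that throws `TimeoutException` when it times out.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeoutException;

final class BoundedRetry {
    static final int MAX_TIMEOUTS = 3;

    /** Runs the query, retrying on timeout, and gives up after MAX_TIMEOUTS timeouts. */
    static <T> T call(Callable<T> query) throws Exception {
        for (int attempt = 1; attempt <= MAX_TIMEOUTS; attempt++) {
            try {
                return query.call();
            } catch (TimeoutException e) {
                // timed out: retry until the limit is reached
            }
        }
        throw new TimeoutException("query timed out " + MAX_TIMEOUTS + " times");
    }

    public static void main(String[] args) {
        try {
            call(() -> { throw new TimeoutException("simulated timeout"); });
        } catch (Exception e) {
            System.out.println("gave up: " + e.getMessage());
        }
    }
}
```

Surfacing the exception lets the calling service decide whether to back off, shrink the request, or fail fast, instead of blocking indefinitely.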




                                                     28
Testing: Data Consistency Check Tool


   • We wanted to make sure data is not corrupted within
      Cassandra.
   • Made a tool to check the data consistency.

[Figure: components shown: input data (periodically comes in) with insert, update, and delete operations; Process A (insert, update, and delete data); Process B (compare data with that in Cassandra); another database; Cassandra.]
                                                                    30
Testing: Data Consistency Check Tool (2)


• Compares only the keys of data, not the contents.
• Useful for diagnosing which part is wrong in the test phase.
• We also found another team’s bug.
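A key-only comparison amounts to two set differences. The sketch below is a minimal in-memory version; the real tool reads keys from Cassandra and from the other database, which is not shown here, and the class and sample keys are hypothetical.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

final class KeyDiffSketch {
    /** Keys present in expected but absent from actual. */
    static Set<String> missing(Set<String> expected, Set<String> actual) {
        Set<String> diff = new TreeSet<>(expected);
        diff.removeAll(actual);
        return diff;
    }

    public static void main(String[] args) {
        Set<String> reference = new TreeSet<>(Arrays.asList("k1", "k2", "k3"));
        Set<String> cassandra = new TreeSet<>(Arrays.asList("k2", "k3", "k4"));
        System.out.println(missing(reference, cassandra)); // [k1]: lost insert or extra delete
        System.out.println(missing(cassandra, reference)); // [k4]: lost delete or extra insert
    }
}
```

Comparing keys alone keeps the check cheap on a large data set while still catching lost inserts and deletes.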




                                                            31
Repair


• Some types of queries don’t trigger read repair.
• Nodetool repair is tricky on big data.
   – Disk usage
   – Time consuming
→ Read all data afterward to trigger read repair

• Discussion of improvements is ongoing:
   – CASSANDRA-2699




                                                     32
Backup Scheme

• Backup might be required to shorten recovery time.
1. Snapshot to local disk
    – Plan disk size at the server estimation phase.
2. Full backup of input data
    – We ran a full data feed several times for various reasons:
       e.g., logic change, schema change, data corruption, etc.

[Figure: components shown: incoming data, Backup, the Cassandra cluster (a group of DB nodes), and multiple Snapshots.]

                                                                  34
Conclusion


• Rakuten uses Cassandra with big data.
• We’ll continue contributing to OSS.




                                          36
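Finally...




Please allow me a brief advertisement...




                   37
We are hiring! We are actively recruiting mid-career hires!

Rakuten's Mission

Empower people and society (through the Internet),
and transform and enrich society through our own success
Rakuten's GOAL
              To become No.1
         Internet Service Company
                in the World
If you share Rakuten's Mission & GOAL, please get in touch!

       tech-career@mail.rakuten.com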
                                         38

Six Myths about Ontologies: The Basics of Formal Ontology
 
الأمن السيبراني - ما لا يسع للمستخدم جهله
الأمن السيبراني - ما لا يسع للمستخدم جهلهالأمن السيبراني - ما لا يسع للمستخدم جهله
الأمن السيبراني - ما لا يسع للمستخدم جهله
 
CORS (Kitworks Team Study 양다윗 발표자료 240510)
CORS (Kitworks Team Study 양다윗 발표자료 240510)CORS (Kitworks Team Study 양다윗 발표자료 240510)
CORS (Kitworks Team Study 양다윗 발표자료 240510)
 
Introduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMIntroduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDM
 
Navigating the Large Language Model choices_Ravi Daparthi
Navigating the Large Language Model choices_Ravi DaparthiNavigating the Large Language Model choices_Ravi Daparthi
Navigating the Large Language Model choices_Ravi Daparthi
 
Intro to Passkeys and the State of Passwordless.pptx
Intro to Passkeys and the State of Passwordless.pptxIntro to Passkeys and the State of Passwordless.pptx
Intro to Passkeys and the State of Passwordless.pptx
 
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
 
Simplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptxSimplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptx
 
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
 
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
 
Vector Search @ sw2con for slideshare.pptx
Vector Search @ sw2con for slideshare.pptxVector Search @ sw2con for slideshare.pptx
Vector Search @ sw2con for slideshare.pptx
 
UiPath manufacturing technology benefits and AI overview
UiPath manufacturing technology benefits and AI overviewUiPath manufacturing technology benefits and AI overview
UiPath manufacturing technology benefits and AI overview
 
JavaScript Usage Statistics 2024 - The Ultimate Guide
JavaScript Usage Statistics 2024 - The Ultimate GuideJavaScript Usage Statistics 2024 - The Ultimate Guide
JavaScript Usage Statistics 2024 - The Ultimate Guide
 
Design and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data ScienceDesign and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data Science
 
How we scaled to 80K users by doing nothing!.pdf
How we scaled to 80K users by doing nothing!.pdfHow we scaled to 80K users by doing nothing!.pdf
How we scaled to 80K users by doing nothing!.pdf
 
ChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps ProductivityChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps Productivity
 
Top 10 CodeIgniter Development Companies
Top 10 CodeIgniter Development CompaniesTop 10 CodeIgniter Development Companies
Top 10 CodeIgniter Development Companies
 
The Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightThe Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and Insight
 

Cassandra conference

  • 1. Issues and Tips for Big Data on Cassandra. Shotaro Kamio, Architecture and Core Technology dept., DU, Rakuten, Inc.
  • 2. Contents 1 Big Data Problem in Rakuten 2 Contributions to Cassandra Project 3 System Architecture 4 Details and Tips 5 Conclusion 2
  • 4. Big Data Problem in Rakuten • User data increases exponentially. – More than 1 billion records. – Doubles in size every two years. • We need a scalable solution to handle this big data. [Chart: total size by month-year, Jun-97 to Dec-10, annotated "x2 in 2 years".]
  • 5. Importance of Data Store in Rakuten • Rakuten has a lot of data – User data, item data, reviews, etc. • Connectivity to Hadoop is expected • High-performance, fault-tolerant, scalable storage is necessary → Cassandra [Diagram: Service A, Service B, Service C, …; Data A, Data B.]
  • 6. Performance of New System (Cassandra) • Store all data in 1 day – Achieved 15,000 updates/sec with quorum. – 50 times faster than DB. • Good read throughput – Handles more than 100 read threads at a time. [Chart: DB vs. New, x 50, 15,000 updates/sec.]
  • 7. Contents 1 Big Data Problem in Rakuten 2 Contributions to Cassandra Project 3 System Architecture 4 Details and Tips 5 Conclusion 7
  • 8. Contributions to Cassandra Project • Tested 0.7.x - 0.8.x • Bug reports / Feedback to JIRA – CASSANDRA-2212, 2297, 2406, 2557, 2626 and more – Bugs related to specific conditions, secondary indexes, and large datasets. • Contributed patches – Discussed in later slides.
  • 9. JIRA: Overflow in bytesPastMark(..) • https://issues.apache.org/jira/browse/CASSANDRA-2297 • We hit the error on a row larger than 60 GB – The row has column families of super column type • The bytesPastMark method was fixed to return a long value.
  • 10. JIRA: Stack overflow while compacting • https://issues.apache.org/jira/browse/CASSANDRA-2626 • A long series of compactions causes a stack overflow. ← This occurs with large datasets. • We helped with debugging.
  • 11. Challenges in OSS • Not well tested with real big data. → Rakuten can give a lot of feedback to the community. – Bug reports, patches, and communication. • OSS becomes much more stable.
  • 12. Contribution of Patches • Column name aliasing – Encodes column names in a compact way. – Useful for reducing data size for structured (relational) data. – Reduces SSTable size by 15%. • Variable-length quantity (VLQ) compression – Reduces encoding overhead in columns. – Reduces SSTable size by 17%.
  • 13. VLQ Compression Patch • The serializer is changed to use VLQ encoding. • A typical column has fixed-length fields of: – 2 bytes for column name length – 1 byte for flag – 8 bytes for TTL, deletion time – 8 bytes for timestamp – 4 bytes for length of value. • These encoding overheads are reduced.
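The fixed-length fields listed on slide 13 add up to 23 bytes per column (2 + 1 + 8 + 8 + 4). The slides do not describe the patch's byte layout, so the following is only a minimal sketch of a generic variable-length quantity encoding (7 value bits per byte, most significant group first, high bit set on every byte except the last). The class name `VlqSketch` and its methods are hypothetical and are not taken from the patch.

```java
// Generic VLQ sketch; NOT the actual Cassandra patch, whose format is not shown in the slides.
final class VlqSketch {
    /** Encodes a non-negative value as 7-bit groups, most significant group first. */
    static byte[] encode(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("value must be non-negative");
        }
        byte[] groups = new byte[9]; // 63 value bits need at most 9 groups
        int n = 0;
        do {
            groups[n++] = (byte) (value & 0x7F); // collect groups, least significant first
            value >>>= 7;
        } while (value != 0);
        byte[] out = new byte[n];
        for (int i = 0; i < n; i++) {
            int group = groups[n - 1 - i];
            // The high bit means "more bytes follow"; it is clear on the last byte.
            out[i] = (byte) (i < n - 1 ? group | 0x80 : group);
        }
        return out;
    }

    /** Decodes a value produced by encode(). */
    static long decode(byte[] bytes) {
        long value = 0;
        for (byte b : bytes) {
            value = (value << 7) | (b & 0x7F);
            if ((b & 0x80) == 0) {
                break; // last byte of the quantity
            }
        }
        return value;
    }
}
```

With an encoding like this, small numbers such as a value length of 10 take 1 byte instead of a fixed 4 bytes, which is where the savings for short columns would come from.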
  • 14. Contents 1 Big Data Problem in Rakuten 2 Contributions to Cassandra Project 3 System Architecture 4 Details and Tips 5 Conclusion 14
  • 15. System Architecture [Diagram: DBs, batch jobs, data feeder, Cassandra 1, Cassandra 2, services, backup.]
  • 16. System Architecture (same diagram as slide 15)
  • 17. Planning: Schema Design • Data modeling is key to scalability. • Design schema – Query patterns for super column and normal column. • Think about queries based on use cases. – Batch operations to reduce the number of requests, because Thrift has communication overhead. • Secondary Index – We used it to find updated data. • Choose the partitioner appropriately. – One partitioner per cluster.
  • 18. Secondary Index • Pros – Useful for querying based on a column value. – It can reduce consistency problems. – For example, to query updated data based on update-time. • Cons – Performance of a complex query depends on the data. E.g., Year == 2011 and Price < 100
  • 19. A Bit More Detail on Secondary Index • Works like a hash + filters. 1. Pick up a row which has a key for the index (hash). 2. Apply filters. – Collect the result if all filters are matched. 3. Repeat until the requested number of rows is obtained. • E.g., Year == 2011 and Price < 100: many keys with Year = 2011, but few results. – Key1: Year = 2011 – Key2: Year = 2011, Price = 1,000 – Key3: Year = 2011, Price = 10 – Key4: Year = 2011, Price = 10,000 – Key5: Year = 2011, Price = 200
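The hash + filter behavior on slide 19 can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not Cassandra's implementation: the class `IndexScanSketch`, its methods, and the use of plain maps are all hypothetical. The rows are the five keys from the slide; Key1 has no Price on the slide, so it is modeled here as a row without that column.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// In-memory model of "index lookup, then filter"; names are illustrative only.
final class IndexScanSketch {
    static final class Result {
        final List<String> keys = new ArrayList<>();
        int rowsExamined; // candidate rows read while looking for matches
    }

    /** The five example rows from the slide, in slide order. */
    static Map<String, Map<String, Integer>> slideRows() {
        Map<String, Map<String, Integer>> rows = new LinkedHashMap<>();
        rows.put("Key1", Map.of("Year", 2011));
        rows.put("Key2", Map.of("Year", 2011, "Price", 1000));
        rows.put("Key3", Map.of("Year", 2011, "Price", 10));
        rows.put("Key4", Map.of("Year", 2011, "Price", 10000));
        rows.put("Key5", Map.of("Year", 2011, "Price", 200));
        return rows;
    }

    /** Builds the hash part: Year value -> keys of rows having that value. */
    static Map<Integer, List<String>> buildYearIndex(Map<String, Map<String, Integer>> rows) {
        Map<Integer, List<String>> index = new HashMap<>();
        for (Map.Entry<String, Map<String, Integer>> e : rows.entrySet()) {
            Integer year = e.getValue().get("Year");
            if (year != null) {
                index.computeIfAbsent(year, y -> new ArrayList<>()).add(e.getKey());
            }
        }
        return index;
    }

    /** Year == year AND Price < maxPriceExclusive, stopping once 'limit' rows are collected. */
    static Result query(Map<String, Map<String, Integer>> rows,
                        Map<Integer, List<String>> yearIndex,
                        int year, int maxPriceExclusive, int limit) {
        Result result = new Result();
        for (String key : yearIndex.getOrDefault(year, List.of())) {
            if (result.keys.size() >= limit) {
                break; // requested number of rows obtained
            }
            result.rowsExamined++;
            Integer price = rows.get(key).get("Price");
            if (price != null && price < maxPriceExclusive) {
                result.keys.add(key); // all filters matched
            }
        }
        return result;
    }
}
```

In this model, asking for one row examines three candidates before Key3 matches, and asking for two rows examines all five and still returns only Key3. The work grows with the number of index hits, not with the number of results.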
  • 20. A Bit More Detail on Secondary Index (2) • Consider how often the query produces results. – Very few results in a large data set → the query might time out. • Careful data/query design is necessary at this moment. • An improvement is being discussed: CASSANDRA-2915
  • 21. Planning: Data Size Estimation • Estimate future data volume • Serialization overhead: x 3 - 4 – Big overhead for small data. – We improved this with custom patches and compression code • Cassandra 1.0 can use Snappy/Deflate compression. • Replication: x 3 (depends on your decision) • Compaction: x 2 or above
  • 22. Other Factors for Data Size • Obsolete SSTables – Disk usage may stay high after compaction. – Cassandra 0.8.x relies on GC to remove obsolete SSTables. – Improved in 1.0. • How to balance data distribution – Disk usage can be unbalanced (ByteOrderedPartitioner). – Partitioning, key design, initial token assignment. – Very helpful if you know the data in advance. • Backup scheme affects disk space – Backup space is needed. – Discussed later.
  • 23. Configuration • We adopted Cassandra 0.8.x + custom patches. • Without mmap – No noticeable difference in performance – Easier to monitor and debug memory usage and GC-related issues • ulimit – Avoid file descriptor shortage. More descriptors are needed than the number of db files. Bug?? – "memlock unlimited" for JNA – Create /etc/security/limits.d/cassandra.conf (Red Hat)
  • 24. JVM / GC • Full GC must be avoided at all times. • The JVM cannot make good use of a heap larger than 15 GB. – Slow GC. Can be unstable. – Don't put too much data/cache into the heap. – Off-heap cache is available in 0.8.1 • Cassandra may use more memory than the heap size. – ulimit -d 25000000 (max data segment size) – ulimit -v 75000000 (max virtual memory size) • Benchmarking is needed to find appropriate parameters.
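The multipliers on slide 21 compound. A minimal worked example, assuming a hypothetical 1 TB of raw data and assuming the factors simply multiply (the slide does not state the formula; the class and method names are illustrative):

```java
// Worked arithmetic for the slide's multipliers; the 1 TB input is a made-up example.
final class DiskEstimateSketch {
    /** Raw size times serialization overhead, replication factor, and compaction headroom. */
    static double requiredDisk(double rawSize, double serializationFactor,
                               int replicationFactor, double compactionHeadroom) {
        return rawSize * serializationFactor * replicationFactor * compactionHeadroom;
    }

    public static void main(String[] args) {
        double low = requiredDisk(1.0, 3.0, 3, 2.0);  // serialization x3
        double high = requiredDisk(1.0, 4.0, 3, 2.0); // serialization x4
        System.out.println("Cluster-wide disk for 1 TB raw: " + low + " to " + high + " TB");
    }
}
```

Under these assumptions, 1 TB of raw data needs roughly 18 to 24 TB of disk across the cluster, before accounting for snapshots or obsolete SSTables.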
  • 25. Parameter Tuning for Failure Detector • Cassandra uses Phi Accrual Failure Detector – The Φ Accrual Failure Detector [SRDS'04] • Failure detection errors occur when a node is handling too much access and/or running GC • The parameter depends on the number of nodes: – Larger cluster, larger number. • Code shown on the slide:
      double phi(long tnow)
      {
          int size = arrivalIntervals_.size();
          double log = 0d;
          if ( size > 0 )
          {
              double t = tnow - tLast_;
              double probability = p(t);
              log = (-1) * Math.log10( probability );
          }
          return log;
      }

      double p(double t)
      {
          double mean = mean();
          double exponent = (-1)*(t)/mean;
          return Math.pow(Math.E, exponent);
      }
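In the code on slide 25, p(t) = e^(-t/mean) and phi = -log10(p(t)), so phi = t / (mean × ln 10): it grows linearly with the time since the last heartbeat. A heartbeat delayed by load or a GC pause therefore raises phi even though the node is alive. The sketch below restates that calculation as a standalone class. The names `PhiSketch` and `silenceToReach`, and the threshold of 8 used in the example, are assumptions for illustration and are not taken from the slides.

```java
// Standalone restatement of the slide's phi calculation; names and threshold are illustrative.
final class PhiSketch {
    /** phi for a node last heard from t ms ago, given the mean heartbeat interval in ms. */
    static double phi(double millisSinceLastHeartbeat, double meanIntervalMillis) {
        double probability = Math.exp(-millisSinceLastHeartbeat / meanIntervalMillis);
        return -Math.log10(probability);
    }

    /** Silence (ms) after which phi reaches the given threshold: threshold * mean * ln(10). */
    static double silenceToReach(double threshold, double meanIntervalMillis) {
        return threshold * meanIntervalMillis * Math.log(10.0);
    }

    public static void main(String[] args) {
        double mean = 1000.0; // hypothetical mean heartbeat interval of 1 s
        System.out.println("phi after 1 s of silence: " + phi(1000.0, mean));
        System.out.println("ms of silence to reach phi = 8: " + silenceToReach(8.0, mean));
    }
}
```

With a 1 s mean interval, a threshold of 8 corresponds to about 18.4 s of silence, so a pause longer than that would be reported as a failure. Raising the threshold tolerates longer pauses at the cost of slower detection.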
  • 26. Hardware • Benchmarking is important for deciding on hardware. – Requirements for performance, data size, etc. – Cassandra is good at utilizing CPU cores. • Network ports will be a bottleneck when scaling out… – Large number of low-spec servers or – Small number of high-spec servers. • Our case: – High-spec CPU and SSD drives – 2 clusters (active and test cluster)
  • 27. System Architecture (same diagram as slide 15)
  • 28. Customize Hector Library • A query can time out on Cassandra: – When Cassandra is temporarily under high load. – When a large result set is requested. – When a secondary index query times out. • Hector retries forever when a query times out. • The client cannot detect the infinite loop. • Customization: – After 3 timeouts, return an exception to the client.
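The customization on slide 28 amounts to a bounded retry. The slides do not show the actual change to Hector, so the sketch below does not use any Hector API: `BoundedRetrySketch`, `QueryTimeoutException`, and `callWithTimeoutLimit` are hypothetical names, and the query is modeled as a plain `Supplier`.

```java
import java.util.function.Supplier;

// Generic bounded-retry sketch; it does not reproduce Hector's classes or the authors' patch.
final class BoundedRetrySketch {
    /** Hypothetical stand-in for whatever timeout error the client library raises. */
    static final class QueryTimeoutException extends RuntimeException {
        QueryTimeoutException(String message) {
            super(message);
        }
    }

    /** Runs the query, retrying on timeout, and gives up after maxTimeouts timeouts. */
    static <T> T callWithTimeoutLimit(Supplier<T> query, int maxTimeouts) {
        if (maxTimeouts < 1) {
            throw new IllegalArgumentException("maxTimeouts must be at least 1");
        }
        QueryTimeoutException last = null;
        for (int attempt = 1; attempt <= maxTimeouts; attempt++) {
            try {
                return query.get();
            } catch (QueryTimeoutException e) {
                last = e; // remember the cause and try again
            }
        }
        // Surface the failure to the caller instead of looping forever.
        throw new IllegalStateException("query timed out " + maxTimeouts + " times", last);
    }
}
```

A caller would wrap each read in `callWithTimeoutLimit(..., 3)` and handle the resulting exception, for example by reducing the requested result size.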
  • 29. System Architecture (same diagram as slide 15)
  • 30. Testing: Data Consistency Check Tool • We wanted to make sure data is not corrupted within Cassandra. • We made a tool to check data consistency. [Diagram: input data (insert, update, delete) comes in periodically; Process A inserts, updates, and deletes data in the Cassandra database; another Process B compares the data with that in Cassandra.]
  • 31. Testing: Data Consistency Check Tool (2) • Compares only the keys of the data, not the contents. • Useful for diagnosing which part is wrong in the test phase. • We found another team's bug as well.
  • 32. Repair • Some types of queries don't trigger read repair. • Nodetool repair is tricky on big data. – Disk usage – Time consuming → Read all data afterward: Read repair • Discussion of improvements is ongoing: – CASSANDRA-2699
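A key-only comparison like the one described on slides 30 and 31 can be expressed as two set differences. The tool itself is not shown in the slides, so this is only a sketch: `KeyDiffSketch` and its methods are hypothetical, and both key sets are assumed to be already loaded into memory.

```java
import java.util.Set;
import java.util.TreeSet;

// Key-only comparison sketch; the real tool's design is not described in the slides.
final class KeyDiffSketch {
    /** Keys that the input data says should exist but the store does not return. */
    static Set<String> missingFromStore(Set<String> expectedKeys, Set<String> storeKeys) {
        Set<String> missing = new TreeSet<>(expectedKeys);
        missing.removeAll(storeKeys);
        return missing;
    }

    /** Keys the store returns that the input data does not account for (e.g. failed deletes). */
    static Set<String> unexpectedInStore(Set<String> expectedKeys, Set<String> storeKeys) {
        Set<String> unexpected = new TreeSet<>(storeKeys);
        unexpected.removeAll(expectedKeys);
        return unexpected;
    }
}
```

Comparing keys only keeps the check cheap. It would catch lost inserts and failed deletes, but not a row whose contents were corrupted.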
  • 33. System Architecture (same diagram as slide 15)
  • 34. Backup Scheme • Backup might be required to shorten recovery time. 1. Snapshot to local disk – Plan disk size at the server estimation phase. 2. Full backup of input data – We had to feed the full data several times for various reasons, e.g., logic change, schema change, data corruption, etc. [Diagram: DBs, incoming data, Cassandra, backup, snapshots.]
  • 35. Contents 1 Big Data Problem in Rakuten 2 Contributions to Cassandra Project 3 System Architecture 4 Details and Tips 5 Conclusion 35
  • 36. Conclusion • Rakuten uses Cassandra with Big data. • We’ll continue contributing to OSS. 36
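  • 38. We are hiring! We are actively recruiting mid-career engineers! Rakuten's Mission: Empower people and society (through the Internet), and transform and enrich society through our own success. Rakuten's GOAL: To become No.1 Internet Service Company in the World. If you share Rakuten's Mission & GOAL, please contact us! tech-career@mail.rakuten.com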