Cassandra



Tuesday, February 22, 2011               1
Operational Data Store: Initial Requirements (Late 2007)

• One big data security aggregator for multiple sources, keyed by the Morningstar global security identifier
• Highly scalable, both horizontally and vertically
• Easy to distribute computation
• Easy to store various types of data
MySQL: Initial Implementation (2008)

• One database on one big database server
• Very simple data model: one table per source with a simple key (Morningstar ID and date)
• Tables were replicated manually, with complicated logic
• Tables stored data as binary blobs
• No indexes on the tables other than the primary key(s)
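
The table-per-source layout described above can be sketched as follows. This is a minimal illustration, not the original schema: Python's built-in sqlite3 module stands in for MySQL, and the table name `source_a`, the column names, and the JSON encoding of the blob are all assumptions made for the example.

```python
import json
import sqlite3
from datetime import date


def create_source_table(conn, source):
    # One table per source, keyed by (Morningstar ID, date), with no other indexes.
    # `source` is interpolated into SQL, so it must be a trusted identifier.
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {source} ("
        " morningstar_id TEXT NOT NULL,"
        " as_of_date TEXT NOT NULL,"
        " payload BLOB NOT NULL,"
        " PRIMARY KEY (morningstar_id, as_of_date))"
    )


def put(conn, source, morningstar_id, as_of, record):
    # The record is serialized to an opaque blob; the database cannot query inside it.
    blob = json.dumps(record).encode("utf-8")
    # INSERT OR REPLACE is SQLite syntax (assumed stand-in for a MySQL upsert).
    conn.execute(
        f"INSERT OR REPLACE INTO {source} VALUES (?, ?, ?)",
        (morningstar_id, as_of.isoformat(), blob),
    )


def get(conn, source, morningstar_id, as_of):
    # Lookups always go through the full primary key.
    row = conn.execute(
        f"SELECT payload FROM {source}"
        " WHERE morningstar_id = ? AND as_of_date = ?",
        (morningstar_id, as_of.isoformat()),
    ).fetchone()
    return json.loads(row[0]) if row else None


conn = sqlite3.connect(":memory:")
create_source_table(conn, "source_a")
put(conn, "source_a", "ID0001", date(2008, 1, 2), {"close": 10.5})
print(get(conn, "source_a", "ID0001", date(2008, 1, 2)))
```

Because the payload is a blob, any filtering on its contents has to happen in application code after the row is fetched by key.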


MySQL Tables




What worked?

• Great interface to query the data
• Very stable system
• Simple data model meant high efficiency for queries
• Great memory usage
What did not work

• Hard to implement MapReduce
• Hard to increase capacity as the data grew
• Multi-site replication was slow and somewhat complicated
• Limited number of columns and rows per table
  - Partitioned tables manually to keep each one under 2 million records
  - Used one table per source to keep the column count down and to avoid sparsely populated rows
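
One way to picture the manual partitioning is a routing function that maps each key to a fixed partition table. The slides do not say how rows were actually assigned to partitions, so the hash-based scheme, the `headroom` parameter, and the table naming below are all assumptions for illustration.

```python
import math
import zlib

MAX_ROWS_PER_TABLE = 2_000_000  # ceiling mentioned in the slide


def partitions_needed(expected_rows, headroom=0.5):
    # Aim well below the ceiling so an uneven hash spread does not push
    # any single table over it (headroom is a hypothetical tuning knob).
    target = int(MAX_ROWS_PER_TABLE * headroom)
    return max(1, math.ceil(expected_rows / target))


def partition_table(source, morningstar_id, n_partitions):
    # crc32 is stable across processes, unlike Python's built-in hash() on strings.
    bucket = zlib.crc32(morningstar_id.encode("utf-8")) % n_partitions
    return f"{source}_p{bucket:03d}"


n = partitions_needed(10_000_000)
print(n, partition_table("source_a", "ID0001", n))
```

The weakness of any such scheme is the one the slide points at: when the data outgrows the chosen partition count, rows have to be redistributed by hand.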




Cassandra: Current Implementation (2010)

• 5-machine cluster
  • In-house VMs on a blade farm
  • 4 cores and 8 GB RAM per node
• Column families organized by access type, not by source
• Manual indexing from data unit type to key(s)
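
The manual index from data unit type to key(s) can be sketched with plain dictionaries standing in for column families. This is a simplified in-memory model, not Cassandra client code; the names `data_cf`, `index_cf`, and the sample keys and data unit types are hypothetical.

```python
# Each "column family" is modeled as {row_key: {column_name: value}}.
data_cf = {}   # row key -> {data unit type: value}
index_cf = {}  # data unit type -> {row key: ""}  (the manual index)


def write(row_key, data_unit_type, value):
    data_cf.setdefault(row_key, {})[data_unit_type] = value
    # Keep the index in step with the data: the row key is stored as a
    # column name under the data unit type, and the column value is unused.
    index_cf.setdefault(data_unit_type, {})[row_key] = ""


def keys_with(data_unit_type):
    # Answers "which rows have this data unit type?" without scanning data_cf.
    return sorted(index_cf.get(data_unit_type, {}))


write("ID0001", "price", 10.5)
write("ID0002", "price", 11.0)
write("ID0002", "rating", 4)
print(keys_with("price"))
```

Every write touches two column families, which is the cost of maintaining the index in the application rather than in the database.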

Cassandra Column Families: Data




Cassandra Column Families: Time Series Data
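
The diagram from this slide is not available in the text, so the following is only a generic sketch of how time series are commonly laid out in a column family: one wide row per series, with dates as sorted column names. The row key format `ID0001:price`, the class, and the function names are assumptions, not the layout shown in the original slide.

```python
from bisect import bisect_left, bisect_right, insort


class TimeSeriesRow:
    """One wide row: column names are ISO dates kept in sorted order."""

    def __init__(self):
        self._dates = []   # sorted column names
        self._values = {}  # column name -> value

    def put(self, day, value):
        if day not in self._values:
            insort(self._dates, day)
        self._values[day] = value

    def slice(self, start, end):
        # Inclusive range over column names; ISO dates sort chronologically.
        lo = bisect_left(self._dates, start)
        hi = bisect_right(self._dates, end)
        return [(d, self._values[d]) for d in self._dates[lo:hi]]


time_series_cf = {}  # hypothetical column family: row key -> TimeSeriesRow


def put_point(row_key, day, value):
    time_series_cf.setdefault(row_key, TimeSeriesRow()).put(day, value)


put_point("ID0001:price", "2010-03-02", 10.7)
put_point("ID0001:price", "2010-03-01", 10.5)
put_point("ID0001:price", "2010-04-01", 11.2)
print(time_series_cf["ID0001:price"].slice("2010-03-01", "2010-03-31"))
```

With this shape, reading a date range for one series is a single-row column slice, which matches the "query when the keys are known" access pattern.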




What works?

• Very easy to query when the keys are known (the normal use case)
• Very scalable: just add more nodes, even at a later point in time
• Multi-site replication is easy
• Basically unlimited number of columns per column family
• Unlimited number of rows per column family
• Sparse rows don’t waste space
• Disaster recovery is taken care of automatically by multi-site redundancy
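
Adding nodes is cheap largely because Cassandra places rows on a token ring using consistent hashing, so a new node takes over only one slice of the key space. The sketch below is a deliberately simplified model (one token per node, no replication, SHA-256 as the hash); the node names and key format are hypothetical, and it is not Cassandra's actual partitioner.

```python
import hashlib
from bisect import bisect_left


def token(value):
    # Stable 64-bit token derived from a hash of the name or key.
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class Ring:
    def __init__(self, nodes):
        pairs = sorted((token(name), name) for name in nodes)
        self._tokens = [t for t, _ in pairs]
        self._names = [name for _, name in pairs]

    def owner(self, key):
        # A key belongs to the first node whose token is >= the key's token,
        # wrapping around to the first node at the end of the ring.
        i = bisect_left(self._tokens, token(key)) % len(self._tokens)
        return self._names[i]


keys = [f"ID{i:05d}" for i in range(10_000)]
before = Ring([f"node{i}" for i in range(5)])
after = Ring([f"node{i}" for i in range(6)])  # node5 joins the cluster

moved = [k for k in keys if before.owner(k) != after.owner(k)]
print(f"{len(moved)} of {len(keys)} keys moved")
```

In this model every key that moves lands on the new node, and all of them come from a single existing node; the remaining nodes are untouched.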



What is hard

• Arbitrary queries are difficult
  • Had to create our own indexes to go from data unit type back to key (can’t select where != NULL)
  • Need to add extra indexes and/or denormalized column families whenever we think of a new way to query the data
• Monitoring a cluster is harder than monitoring one server
• Getting the memory usage settings right so that nodes don’t die with OOM errors
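
The cost of supporting a new query can be sketched as a backfill: every existing row has to be scanned once to build the new index before the query becomes cheap. The model below uses plain dictionaries in place of column families, and the `exchange` column and sample rows are hypothetical.

```python
def build_index(rows, column_name):
    """Scan every row once and map each value of column_name to its row keys."""
    index = {}
    for row_key, columns in rows.items():
        if column_name in columns:  # sparse rows: the column may be absent
            index.setdefault(columns[column_name], set()).add(row_key)
    return index


# Hypothetical sample data: row key -> {column name: value}
rows = {
    "ID0001": {"price": 10.5, "exchange": "NYSE"},
    "ID0002": {"price": 11.0, "exchange": "LSE"},
    "ID0003": {"price": 9.8},
    "ID0004": {"exchange": "NYSE"},
}

by_exchange = build_index(rows, "exchange")
print(sorted(by_exchange["NYSE"]))
```

After the backfill, the write path also has to update the new index on every insert, so each additional query pattern adds ongoing write work as well as a one-time scan.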


Future Plans

• Upgrade to Cassandra 0.7
• Expand the cluster to multiple data centers around the globe





Cassandra at Morningstar (Feb 2011)
