Apache Cassandra, part 2 – data model example, machinery
V. Data model example - Twissandra
Twissandra Use Cases
- Get the friends of a username
- Get the followers of a username
- Get a timeline of a specific user’s tweets
- Create a tweet
- Create a user
- Add friends to a user
Twissandra – DB User
User: id, user_name, password
Twissandra – DB Followers
User: id, user_name, password
Followers: user_id, follower_id (links two User rows)
Twissandra – DB Following
User: id, user_name, password
Following: user_id, following_id (links two User rows)
Twissandra – DB Tweets
User: id, user_name, password
Tweet: id, user_id, body, timestamp
Twissandra column families: User, Username, Friends, Followers, Tweet, Userline, Timeline
Twissandra – Users CF
<<CF>> User: <<RowKey>> userid; columns: username, password
<<CF>> Username: <<RowKey>> username; columns: userid
Twissandra – Friends and Followers CFs
<<CF>> Friends: <<RowKey>> userid; column name: friendid, column value: timestamp
<<CF>> Followers: <<RowKey>> userid; column name: followerid, column value: timestamp
Twissandra – Tweet CF
<<CF>> Tweet: <<RowKey>> tweetid; columns: userid, body, timestamp
Twissandra – Userline and Timeline CFs
<<CF>> Userline: <<RowKey>> userid; column name: timestamp, column value: tweetid
<<CF>> Timeline: <<RowKey>> userid; column name: timestamp, column value: tweetid
Cassandra QL – User creation (batch)
    BEGIN BATCH
      INSERT INTO User (KEY, username, password) VALUES ('id', 'konstantin', '******')
      INSERT INTO Username (KEY, userid) VALUES ('konstantin', 'id')
    APPLY BATCH
Cassandra QL – following a friend (batch)
    BEGIN BATCH
      INSERT INTO Friends (KEY, friendid) VALUES ('userid', 'friendid')
      INSERT INTO Followers (KEY, userid) VALUES ('friendid', 'userid')
    APPLY BATCH
Cassandra QL – Tweet creation (batch)
    BEGIN BATCH
      INSERT INTO Tweet (KEY, userid, body, timestamp) VALUES ('tweetid', 'userid', '@ericflo thanks for Twissandra, it helps!', 123656459847)
      INSERT INTO Userline (KEY, 123656459847) VALUES ('userid', 'tweetid')
      INSERT INTO Timeline (KEY, 123656459847) VALUES ('userid', 'tweetid')
      …
      INSERT INTO Timeline (KEY, 123656459847) VALUES ('followerid', 'tweetid')
      …
    APPLY BATCH
Cassandra QL – Getting user tweets
    SELECT * FROM Userline WHERE KEY = 'userid'
    SELECT * FROM Tweet WHERE KEY IN ('tweetid1', 'tweetid2', 'tweetid3', …, 'tweetidn')
Cassandra QL – Getting user timeline
    SELECT * FROM Timeline WHERE KEY = 'userid'
    SELECT * FROM Tweet WHERE KEY IN ('tweetid1', 'tweetid2', 'tweetid3', …, 'tweetidn')
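The write fan-out and two-step read in the statements above can be sketched in memory. This is only an illustration, not Twissandra's code: each column family is modelled as a Python dict of rows, and the function names (`follow`, `create_tweet`, `get_timeline`) are hypothetical.

```python
from collections import defaultdict

# Each "column family" is modelled as: row key -> {column name: value}.
tweet = {}                      # tweetid -> {userid, body, timestamp}
friends = defaultdict(dict)     # userid -> {friendid: timestamp}
followers = defaultdict(dict)   # userid -> {followerid: timestamp}
userline = defaultdict(dict)    # userid -> {timestamp: tweetid}
timeline = defaultdict(dict)    # userid -> {timestamp: tweetid}


def follow(user_id, friend_id, ts):
    """Record the relationship in both directions, as the batch does."""
    friends[user_id][friend_id] = ts
    followers[friend_id][user_id] = ts


def create_tweet(tweet_id, user_id, body, ts):
    """Store the tweet once, then fan its id out to every timeline."""
    tweet[tweet_id] = {"userid": user_id, "body": body, "timestamp": ts}
    userline[user_id][ts] = tweet_id
    timeline[user_id][ts] = tweet_id
    for follower_id in followers[user_id]:
        timeline[follower_id][ts] = tweet_id


def get_timeline(user_id):
    """Two-step read: tweet ids from Timeline, then bodies from Tweet."""
    ids = [tid for _, tid in sorted(timeline[user_id].items(), reverse=True)]
    return [tweet[tid]["body"] for tid in ids]
```

The cost of a tweet grows with the number of followers, while reading a timeline stays a single-row lookup plus a multi-key fetch; that is the trade-off the data model makes.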
Design patterns
- Materialized View: create a second column family to represent additional queries
- Valueless Column: use column names for values
- Aggregate Key: if you need to find a sub-item, use a composite key
Indexes
<<CF>> Item_Properties: <<RowKey>> item_id; column name: property_name, column value: property_value
<<CF>> Container_Items: <<RowKey>> container_id; column name: item_id, column value: insertion_timestamp
Indexes
<<CF>> Container_Items_Property_Index: <<RowKey>> container_id + property_name; column name: composite(property_value, item_id, entry_timestamp), column value: item_id
Comparator: compositecomparer.CompositeType
Problem with eventual consistency
When we update a value, we should add the new value to the index and remove the old one. However, eventual consistency and the lack of transactions make it impossible to do this reliably.
Solution
<<CF>> Container_Item_Property_Index_Entries: <<RowKey>> container_id + item_id + property_name; column name: entry_timestamp, column value: property_value
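One way to read the solution: every index column written for an item property is also recorded in the entries column family, so an update can delete the stale index columns by name without first reading the index. A minimal in-memory sketch under that assumption follows; the dicts stand in for the two column families and `set_property` and `find_items` are hypothetical helpers.

```python
from collections import defaultdict

# Container_Items_Property_Index:
#   (container_id, property_name) -> {(value, item_id, entry_ts): item_id}
index = defaultdict(dict)
# Container_Item_Property_Index_Entries:
#   (container_id, item_id, property_name) -> {entry_ts: value}
entries = defaultdict(dict)


def set_property(container, item, prop, value, ts):
    """Update a property and keep the index free of stale columns."""
    entry_row = entries[(container, item, prop)]
    # 1. Every previously indexed value for this item property is listed here.
    for old_ts, old_value in list(entry_row.items()):
        # 2. Rebuild the composite column name and delete it from the index.
        index[(container, prop)].pop((old_value, item, old_ts), None)
        del entry_row[old_ts]
    # 3. Write the new index column and remember that we wrote it.
    index[(container, prop)][(value, item, ts)] = item
    entry_row[ts] = value


def find_items(container, prop, value):
    """Return the items in a container whose property equals value."""
    columns = index[(container, prop)]
    return sorted({item for (v, item, _ts) in columns if v == value})
```

Because the entries row is keyed by timestamp, two concurrent updates both leave a record of what they indexed, and whichever update runs next can clean up after both.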
VI. Architecture
Partitioners
Partitioners decide where a key maps onto the ring.
[Diagram: Key 1, Key 2, Key 3 and Key 4 placed on the ring]
Partitioners
- RandomPartitioner
- OrderPreservingPartitioner
- ByteOrderedPartitioner
- CollatingOrderPreservingPartitioner
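To make "where a key maps onto the ring" concrete, here is a small sketch in the spirit of RandomPartitioner, which derives tokens from an MD5 hash of the key. The ring size, the modulo step and the `owner` helper are simplifying assumptions for illustration, not Cassandra's implementation.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 127  # assumed token space for this sketch


def token_for(key: str) -> int:
    """Hash the key to a position on the ring."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def owner(key: str, node_tokens: dict) -> str:
    """Return the first node whose token is >= the key's token, wrapping around.

    node_tokens maps a node name to the token assigned to that node.
    """
    ring = sorted((tok, name) for name, tok in node_tokens.items())
    tokens = [tok for tok, _ in ring]
    i = bisect.bisect_left(tokens, token_for(key))
    return ring[i % len(ring)][1]
```

With hashed tokens, adjacent keys land on unrelated nodes, which balances load but gives up range scans over keys; the order-preserving partitioners make the opposite trade.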
Replication
Replication is controlled by the replication_factor setting in the keyspace definition. The actual placement of replicas in the cluster is determined by the replica placement strategy.
Placement Strategies SimpleStrategy - returns the nodes that are next to each other on the ring.
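As a rough sketch of "nodes that are next to each other on the ring": starting from the node that owns the key, take the following nodes in token order until the replication factor is met. The function name and the list-based ring are assumptions made for the example.

```python
def simple_strategy(primary_index, ring, replication_factor):
    """Pick replicas by walking the ring clockwise from the primary node.

    ring is a list of node names in token order; primary_index is the
    position of the node that owns the key.
    """
    count = min(replication_factor, len(ring))
    return [ring[(primary_index + i) % len(ring)] for i in range(count)]
```

For a four-node ring `["A", "B", "C", "D"]`, a key owned by `C` with a replication factor of 3 is stored on `C`, `D` and `A`. Rack and data center boundaries play no part in the choice.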
Placement Strategies OldNetworkTopologyStrategy - places one replica in a different data center while placing the others on different racks in the current data center.
Placement Strategies NetworkTopologyStrategy - Allows you to configure the number of replicas per data center as specified in the strategy_options.
Snitches
Snitches give Cassandra information about the network topology of the cluster.
- Endpoint snitch: gives information about the network topology.
- Dynamic snitch: monitors read latencies.
Endpoint Snitch Implementations
SimpleSnitch (default) - can be efficient for locating nodes in clusters limited to a single data center.
Endpoint Snitch Implementations
RackInferringSnitch - extrapolates the topology of the network by analyzing IP addresses.
- 192.168.191.71 and 192.168.191.21: in the same rack
- 192.168.191.71 and 192.168.171.21: in the same datacenter
- 192.78.19.71 and 192.18.11.21: in different datacenters
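The three address pairs follow from the snitch's convention that the second octet of a node's IP address identifies the data center and the third octet identifies the rack. A short sketch of that rule, with hypothetical helper names:

```python
def infer_location(ip):
    """Return (datacenter, rack) taken from the 2nd and 3rd octets."""
    octets = ip.split(".")
    return octets[1], octets[2]


def relation(ip_a, ip_b):
    """Describe how two nodes relate under the octet convention."""
    dc_a, rack_a = infer_location(ip_a)
    dc_b, rack_b = infer_location(ip_b)
    if dc_a != dc_b:
        return "different datacenters"
    if rack_a != rack_b:
        return "same datacenter"
    return "same rack"
```

The scheme only works when the address plan really mirrors the physical layout.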
Endpoint Snitch Implementations PropertyFileSnitch - determines the location of nodes by referring to a user-defined description of the network details located in the property file cassandra-topology.properties.
Memtables, SSTables, Commit Logs
- Commit Log: sequential writes only
- Memtable
- SSTable: indexes
Write properties
- No reads
- No seeks
- Fast
- Atomic within a ColumnFamily
- Always writable
Read properties
- Reads multiple SSTables
- Slower than writes (but still fast)
- Seeks can be mitigated with more RAM
- Scales to billions of rows
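These properties follow from the storage path: a write only appends to the commit log and updates the in-memory memtable, while a read may have to consult the memtable and several flushed SSTables. The toy model below illustrates this; the class, its `flush_threshold` parameter and the dict-based tables are assumptions for the sketch, not Cassandra's structures.

```python
class Node:
    """Toy storage engine: commit log + memtable + flushed SSTables."""

    def __init__(self, flush_threshold=2):
        self.commit_log = []   # append-only, sequential writes
        self.memtable = {}     # in memory: key -> (timestamp, value)
        self.sstables = []     # immutable flushed tables, oldest first
        self.flush_threshold = flush_threshold

    def write(self, key, value, ts):
        """A write never reads: append to the log, then update memory."""
        self.commit_log.append((key, value, ts))
        self.memtable[key] = (ts, value)
        if len(self.memtable) >= self.flush_threshold:
            # Flush the memtable as a sorted, immutable table.
            self.sstables.append(dict(sorted(self.memtable.items())))
            self.memtable = {}

    def read(self, key):
        """A read checks the memtable and every SSTable, newest value wins."""
        tables = [self.memtable, *self.sstables]
        candidates = [t[key] for t in tables if key in t]
        return max(candidates)[1] if candidates else None
```

The more SSTables accumulate, the more places a read has to look.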
Commit Log durability
Durability settings mirror those of PostgreSQL.
- Periodic sync of the commit log: there is a possibility of data loss.
- Batch sync of the commit log: a write is acknowledged only after the commit log has been flushed to disk. A separate device for the commit log is strongly recommended in this case.
Gossip protocol
- Intra-ring communication
- Runs periodically
- Supports failure detection, hinted handoff and the exchange of node state
Gossip protocol
- Implemented in org.apache.cassandra.gms.Gossiper
- Keeps the list of nodes that are alive and dead
- Chooses a random node and starts a “chat” with it
- One gossip round requires three messages
- Failure detection uses a suspicion level to decide whether a node is alive or dead
Hinted handoff
Cassandra is always available for writes.
[Diagram labels: Write, Hint]
Consistency level
Tombstones
- Data is not deleted immediately
- Deleted values are marked with a tombstone
- Tombstones are removed by a later compaction, once GCGraceSeconds has passed
- GCGraceSeconds: the number of seconds the server waits before garbage-collecting a tombstone
Compaction
Merging SSTables into one:
- merging keys
- combining columns
- creating a new index
Main aims:
- Free up space
- Reduce the number of required seeks
Compaction
Minor:
- Triggered when at least N SSTables have been flushed to disk (N is tunable, 4 by default)
- Merges SSTables of similar size
Major:
- Merges all SSTables
- Run manually through nodetool compact
- Discards tombstones
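The merge step and the tombstone rule can be sketched together: keep the newest version of each key across the input SSTables, and drop a tombstone only when it is older than the grace period. The dict-based tables, the `TOMBSTONE` marker and the `compact` function are assumptions for illustration.

```python
TOMBSTONE = object()  # marker stored in place of a deleted value


def compact(sstables, now, gc_grace_seconds):
    """Merge SSTables (dicts of key -> (timestamp, value)) into one.

    The newest timestamp wins for each key. A tombstone survives the
    merge until it is at least gc_grace_seconds old.
    """
    merged = {}
    for table in sstables:
        for key, (ts, value) in table.items():
            if key not in merged or ts > merged[key][0]:
                merged[key] = (ts, value)
    return {
        key: (ts, value)
        for key, (ts, value) in sorted(merged.items())
        if not (value is TOMBSTONE and now - ts >= gc_grace_seconds)
    }
```

Keeping young tombstones around gives replicas that missed the delete time to learn about it. If a tombstone were dropped too early, a stale replica could bring the deleted value back.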
Replica synchronization
- Anti-entropy
- Read repair
Anti-entropy
- During major compaction the node exchanges Merkle trees (hashes of its data) with other nodes
- If the trees don’t match, the differing data is repaired
- Nodes maintain a timestamp index and exchange only the most recent updates
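A Merkle tree lets two replicas compare a single root hash and look further only when the roots differ. The sketch below hashes individual rows as leaves, which is a simplification (it also assumes both replicas hold the same keys in the same order); the function names are hypothetical.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_levels(rows):
    """Build the tree bottom-up; rows is a key-sorted list of (key, value).

    Returns a list of levels, leaves first and the root level last.
    """
    level = [_h(f"{k}={v}".encode("utf-8")) for k, v in rows] or [_h(b"")]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:  # duplicate the last hash on odd-sized levels
            level = level + [level[-1]]
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def out_of_sync(rows_a, rows_b):
    """Return the leaf positions that differ, or [] if the roots match."""
    levels_a, levels_b = merkle_levels(rows_a), merkle_levels(rows_b)
    if levels_a[-1] == levels_b[-1]:
        return []
    leaves = zip(levels_a[0], levels_b[0])
    return [i for i, (x, y) in enumerate(leaves) if x != y]
```

When the replicas agree, only the root hash has to be compared, which keeps the check cheap in the common case.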
Read repair
During a read operation, replicas with stale values are brought up to date.
- Weak consistency level (ONE): after the data is returned
- Strong consistency level (QUORUM, ALL): before the data is returned
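The repair step itself amounts to picking the version with the newest timestamp and writing it back to every replica that returned something else. A minimal sketch, with replicas modelled as dicts and `read_with_repair` as a hypothetical helper (it does not model the before/after timing of the consistency levels):

```python
def read_with_repair(replicas, key):
    """Read key from all replicas and bring stale ones up to date.

    Each replica is a dict of key -> (timestamp, value).
    """
    versions = [r[key] for r in replicas if key in r]
    if not versions:
        return None
    newest = max(versions, key=lambda version: version[0])
    for replica in replicas:
        if replica.get(key) != newest:
            replica[key] = newest  # repair the stale or missing copy
    return newest[1]
```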
Bloom filters
- A bit array
- Tests whether a value is a member of a set
- Reduces disk access (improves performance)
Bloom filters
On write:
- several hashes are generated per key
- the bit for each hash is set
On read:
- hashes are generated for the key
- if all bits for these hashes are set, the key may exist in the SSTable
- if at least one bit is not set, the key has never been written to the SSTable
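Bloom filters
[Diagram: on write, Key1 is passed through Hash1, Hash2 and Hash3 and the matching bits of the SSTable's bit array are set; on read, Key2 is passed through the same hashes and the bits are checked.]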
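The write and read steps translate directly into code. In this sketch the filter size, the number of hashes and the use of salted SHA-256 digests as hash functions are all assumptions chosen for brevity, not what Cassandra uses.

```python
import hashlib


class BloomFilter:
    """Bit array plus several hash functions; no false negatives."""

    def __init__(self, size=64, hashes=3):
        self.size = size
        self.hashes = hashes
        self.bits = [0] * size

    def _positions(self, key):
        # Derive one bit position per hash function by salting the key.
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        """On write: set the bit for every hash of the key."""
        for position in self._positions(key):
            self.bits[position] = 1

    def might_contain(self, key):
        """On read: False means definitely absent, True means maybe present."""
        return all(self.bits[p] for p in self._positions(key))
```

A `False` answer lets the read skip the SSTable entirely, and a `True` answer still requires checking the SSTable because different keys can set the same bits.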

FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 

Apache Cassandra, part 2 – data model example, machinery

  • 1. Apache Cassandra, part 2 – data model example, machinery
  • 2. V. Data model example - Twissandra
  • 3. Twissandra use cases: get the friends of a username; get the followers of a username; get a timeline of a specific user’s tweets; create a tweet; create a user; add friends to a user.
  • 4. Twissandra – DB User. Table User: id, user_name, password.
  • 5. Twissandra – DB Followers. Tables: User (id, user_name, password); Followers (user_id, follower_id), linking two User rows.
  • 6. Twissandra – DB Following. Tables: User (id, user_name, password); Following (user_id, following_id), linking two User rows.
  • 7. Twissandra – DB Tweets. Tables: User (id, user_name, password); Tweet (id, user_id, body, timestamp).
  • 8. Twissandra column families: User, Username, Friends, Followers, Tweet, Userline, Timeline.
  • 9. Twissandra – Users CF. <<CF>> User: <<RowKey>> userid; columns username, password. <<CF>> Username: <<RowKey>> username; column userid.
  • 10. Twissandra – Friends and Followers CFs. <<CF>> Friends: <<RowKey>> userid; column name friendid, value timestamp. <<CF>> Followers: <<RowKey>> userid; column name followerid, value timestamp.
  • 11. Twissandra – Tweet CF. <<CF>> Tweet: <<RowKey>> tweetid; columns userid, body, timestamp.
  • 12. Twissandra – Userline and Timeline CFs. <<CF>> Userline: <<RowKey>> userid; column name timestamp, value tweetid. <<CF>> Timeline: <<RowKey>> userid; column name timestamp, value tweetid.
  • 13. Cassandra QL – User creation BATCH BEGIN BATCH INSERT INTO User (KEY, username, password) VALUES (‘id’, ‘konstantin’, ‘******’) INSERT INTO Username (KEY, userid) VALUES (‘konstantin’, ‘id’) APPLY BATCH
  • 14. Cassandra QL – following a friend BATCH BEGIN BATCH INSERT INTO Friends (KEY, friendid) VALUES (‘userid’, ‘friendid’) INSERT INTO Followers (KEY, userid) VALUES (‘friendid’, ‘userid’) APPLY BATCH
  • 15. Cassandra QL – Tweet creation BATCH BEGIN BATCH INSERT INTO Tweet (KEY, userid, body, timestamp) VALUES (‘tweetid’, ‘userid’, ‘@ericflo thanks for Twissandra, it helps!’, 123656459847) INSERT INTO Userline (KEY, 123656459847) VALUES (‘userid’, ‘tweetid’) INSERT INTO Timeline (KEY, 123656459847) VALUES (‘userid’, ‘tweetid’) …….. INSERT INTO Timeline (KEY, 123656459847) VALUES (‘followerid’, ‘tweetid’) …… APPLY BATCH
  • 16. Cassandra QL – Getting user tweets SELECT * FROM Userline WHERE KEY = ‘userid’ SELECT * FROM Tweet WHERE KEY IN (‘tweetid1’, ‘tweetid2’, ‘tweetid3’, …., ‘tweetidn’)
  • 17. Cassandra QL – Getting user timeline SELECT * FROM Timeline WHERE KEY = ‘userid’ SELECT * FROM Tweet WHERE KEY IN (‘tweetid1’, ‘tweetid2’, ‘tweetid3’, …., ‘tweetidn’)
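The write and read paths from slides 13 to 17 can be sketched in memory. This is only an illustration, assuming each column family is a dictionary of rows and each row a dictionary of columns; the helper names (`create_user`, `follow`, `post_tweet`, `read_line`) are hypothetical and not part of Twissandra or Cassandra.

```python
from collections import defaultdict

# Each column family is modelled as {row_key: {column_name: value}}.
user = {}
username = {}
friends = defaultdict(dict)
followers = defaultdict(dict)
tweet = {}
userline = defaultdict(dict)
timeline = defaultdict(dict)


def create_user(user_id, name, password):
    # Two writes, as in the user-creation batch: the row plus the reverse lookup.
    user[user_id] = {"username": name, "password": password}
    username[name] = {"userid": user_id}


def follow(user_id, friend_id, ts):
    # The relationship is stored once per direction.
    friends[user_id][friend_id] = ts
    followers[friend_id][user_id] = ts


def post_tweet(tweet_id, user_id, body, ts):
    tweet[tweet_id] = {"userid": user_id, "body": body, "timestamp": ts}
    userline[user_id][ts] = tweet_id
    timeline[user_id][ts] = tweet_id
    # Fan-out on write: one Timeline column per follower.
    for follower_id in followers[user_id]:
        timeline[follower_id][ts] = tweet_id


def read_line(line_cf, user_id):
    # Step 1: read tweet ids from Userline or Timeline, newest first.
    row = line_cf.get(user_id, {})
    tweet_ids = [row[ts] for ts in sorted(row, reverse=True)]
    # Step 2: fetch the tweets themselves by key.
    return [tweet[t]["body"] for t in tweet_ids]
```

The design trades extra writes for cheap reads: a timeline is served from a single row followed by one multi-key lookup in Tweet.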
  • 18. Design patterns. Materialized View: create a second column family to serve an additional query. Valueless Column: use column names to hold the values. Aggregate Key: if you need to find a sub-item, use a composite key.
  • 19. Indexes. <<CF>> Item_Properties: <<RowKey>> item_id; column name property_name, value property_value. <<CF>> Container_Items: <<RowKey>> container_id; column name item_id, value insertion_timestamp.
  • 20. Indexes. <<CF>> Container_Items_Property_Index: <<RowKey>> container_id + property_name; column name composite(property_value, item_id, entry_timestamp), value item_id. Comparator: compositecomparer.CompositeType
  • 21. Problem with eventual consistency. When we update a value, we should add the new value to the index and remove the old one. However, eventual consistency and the lack of transactions make it impossible to do both atomically.
  • 22. Solution. <<CF>> Container_Item_Property_Index_Entries: <<RowKey>> container_id + item_id + property_name; column name entry_timestamp, value property_value.
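One way the three column families from slides 19 to 22 might cooperate on an update is sketched below. It is an in-memory illustration under stated assumptions: column families are plain dictionaries, every update carries a distinct timestamp, and `set_property` and `find_items` are hypothetical helpers. The point is that stale index columns are located through the item's own list of index entries rather than by trusting a read of the current property value.

```python
from collections import defaultdict

# item_id -> {property_name: property_value}
item_properties = defaultdict(dict)
# (container_id, property_name) -> {(property_value, item_id, ts): item_id}
property_index = defaultdict(dict)
# (container_id, item_id, property_name) -> {ts: property_value}
index_entries = defaultdict(dict)


def set_property(container_id, item_id, name, value, ts):
    # Assumption: ts is distinct for every update of the same property.
    entries_key = (container_id, item_id, name)
    index_key = (container_id, name)

    # 1. Read every index entry previously written for this property.
    previous = dict(index_entries[entries_key])

    # 2. Write the new value, its index column and its entry record.
    item_properties[item_id][name] = value
    property_index[index_key][(value, item_id, ts)] = item_id
    index_entries[entries_key][ts] = value

    # 3. Remove the index columns and entry records found in step 1.
    for old_ts, old_value in previous.items():
        property_index[index_key].pop((old_value, item_id, old_ts), None)
        index_entries[entries_key].pop(old_ts, None)


def find_items(container_id, name, value):
    # Scan the index row for columns whose composite name starts with value.
    columns = property_index[(container_id, name)]
    return sorted({item for (v, item, _ts) in columns if v == value})
```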
  • 24. Partitioners. Partitioners decide where a key maps onto the ring. [Figure: Key 1 to Key 4 placed on the ring]
  • 25. Partitioners: RandomPartitioner, OrderPreservingPartitioner, ByteOrderedPartitioner, CollatingOrderPreservingPartitioner.
  • 26. Replication. Replication is controlled by the replication_factor setting in the keyspace definition. The actual placement of replicas in the cluster is determined by the replica placement strategy.
  • 27. Placement Strategies SimpleStrategy - returns the nodes that are next to each other on the ring.
  • 28. Placement Strategies OldNetworkTopologyStrategy - places one replica in a different data center while placing the others on different racks in the current data center.
  • 29. Placement Strategies NetworkTopologyStrategy - Allows you to configure the number of replicas per data center as specified in the strategy_options.
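Key placement on the ring and the neighbour-based replica choice described for SimpleStrategy can be sketched as follows. This is a simplification, not Cassandra's code: it assumes an MD5-derived token reduced into the range 0 to 2^127 in the spirit of RandomPartitioner, and `token_for` and `replicas_for` are hypothetical names.

```python
import bisect
import hashlib


def token_for(key):
    # Hash the key and reduce it into the token range of the ring.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** 127)


def replicas_for(key, ring, replication_factor):
    """ring is a list of (token, node_name) pairs sorted by token."""
    tokens = [token for token, _ in ring]
    # First node whose token is >= the key's token, wrapping around the ring.
    start = bisect.bisect_left(tokens, token_for(key)) % len(ring)
    count = min(replication_factor, len(ring))
    # The remaining replicas are the next nodes clockwise on the ring.
    return [ring[(start + i) % len(ring)][1] for i in range(count)]
```

With four evenly spaced nodes and a replication factor of 3, every key lands on three consecutive nodes. The topology-aware strategies differ only in how they pick the additional replicas.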
  • 30. Snitches give Cassandra information about the network topology of the cluster. Endpoint snitch: gives information about the network topology. Dynamic snitch: monitors read latencies.
  • 31. Endpoint Snitch Implementations. SimpleSnitch (default) - can be efficient for locating nodes in clusters limited to a single data center.
  • 32. Endpoint Snitch Implementations. RackInferringSnitch - extrapolates the topology of the network by analyzing IP addresses. Examples: 192.168.191.71 and 192.168.191.21 are in the same rack; 192.168.191.71 and 192.168.171.21 are in the same datacenter; 192.78.19.71 and 192.18.11.21 are in different datacenters.
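The address comparison behind RackInferringSnitch can be sketched as follows. The sketch assumes that the second octet identifies the data center and the third octet the rack, which is consistent with the three examples on the slide; `infer_proximity` is a hypothetical helper, not a Cassandra API.

```python
def infer_proximity(ip_a, ip_b):
    """Classify two IPv4 addresses by comparing their octets."""
    a = ip_a.split(".")
    b = ip_b.split(".")
    if a[1] != b[1]:
        # Second octet differs: assumed to mean different data centers.
        return "different datacenters"
    if a[2] != b[2]:
        # Same data center, but the third octet (rack) differs.
        return "same datacenter"
    return "same rack"
```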
  • 33. Endpoint Snitch Implementations PropertyFileSnitch - determines the location of nodes by referring to a user-defined description of the network details located in the property file cassandra-topology.properties.
  • 37. Write properties: no reads; no seeks; fast; atomic within a ColumnFamily; always writable.
  • 38. Write/Read properties. Read properties: reads touch multiple SSTables; slower than writes (but still fast); seeks can be mitigated with more RAM; scales to billions of rows.
  • 39. Commit log durability. Durability settings mirror those of PostgreSQL. Periodic sync of the commit log: carries a risk of data loss. Batch sync of the commit log: a write is acknowledged only once the commit log has been flushed to disk. A separate device for the commit log is strongly recommended in this case.
  • 40. Gossip protocol. Intra-ring communication. Runs periodically. Used for failure detection, hinted handoffs and node exchange.
  • 41. Gossip protocol. org.apache.cassandra.gms.Gossiper has the list of nodes that are alive and dead. It chooses a random node and starts a “chat” with it. One gossip round requires three messages. Failure detection uses a suspicion level to decide whether the node is alive or dead.
  • 42. Hinted handoff. Cassandra is always available for writes. [Figure labels: Write, Hint]
  • 44. Tombstones. The data is not deleted immediately. Deleted values are marked with tombstones. Tombstones are removed during the next compaction. GCGraceSeconds: the number of seconds the server waits before garbage-collecting a tombstone.
  • 45. Compaction. Merging SSTables into one: merging keys, combining columns, creating a new index. Main aims: free up space; reduce the number of required seeks.
  • 46. Compaction. Minor: triggered when at least N SSTables have been flushed to disk (N is tunable, 4 by default); merges SSTables of similar size. Major: merges all SSTables; run manually through nodetool compact; discards tombstones.
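The merge step of compaction (merging keys, combining columns, discarding tombstones) can be sketched in memory. This is an illustration only: it assumes an SSTable is a dictionary of rows, each column holds a `(value, timestamp)` pair with timestamps in seconds, and a tombstone is the `TOMBSTONE` marker; `compact` is a hypothetical function.

```python
TOMBSTONE = object()  # marker stored in place of a deleted value


def compact(sstables, now, gc_grace_seconds):
    """Merge SSTables shaped as {row_key: {column: (value, timestamp)}}."""
    merged = {}
    for table in sstables:
        for key, columns in table.items():
            row = merged.setdefault(key, {})
            for name, (value, ts) in columns.items():
                # Combine columns: the newest timestamp wins.
                if name not in row or ts > row[name][1]:
                    row[name] = (value, ts)

    result = {}
    for key in sorted(merged):  # the output SSTable is sorted by key
        live = {
            name: (value, ts)
            for name, (value, ts) in merged[key].items()
            # Drop tombstones only after the grace period has elapsed.
            if not (value is TOMBSTONE and now - ts >= gc_grace_seconds)
        }
        if live:
            result[key] = live
    return result
```

A tombstone younger than the grace period is kept so that replicas that missed the delete can still learn about it.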
  • 48. Anti-entropy. During major compaction the node exchanges Merkle trees (hashes of its data) with other nodes. If the trees don’t match, the data is repaired. Nodes maintain a timestamp index and exchange only the most recent updates.
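Comparing Merkle trees lets two nodes find the ranges that differ without shipping the data itself. The sketch below is a generic binary hash tree, not Cassandra's implementation. It assumes each leaf is the serialized content of one key range and that the number of leaves is a power of two; `build_tree` and `differing_leaves` are hypothetical names.

```python
import hashlib


def _hash(data):
    return hashlib.sha256(data).digest()


def build_tree(leaves):
    """Return the tree as a list of levels: leaf hashes first, root last."""
    level = [_hash(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        # Each parent is the hash of its two children concatenated.
        level = [_hash(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def differing_leaves(tree_a, tree_b):
    """Walk down from the root, descending only into mismatching subtrees."""
    top = len(tree_a) - 1
    suspects = [0] if tree_a[top][0] != tree_b[top][0] else []
    for depth in range(top - 1, -1, -1):
        suspects = [
            child
            for parent in suspects
            for child in (2 * parent, 2 * parent + 1)
            if tree_a[depth][child] != tree_b[depth][child]
        ]
    return suspects
```

Only the ranges returned by `differing_leaves` need to be streamed between the replicas.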
  • 49. Read repair. During a read operation, replicas with stale values are brought up to date. Weak consistency level (ONE): after the data is returned. Strong consistency level (QUORUM, ALL): before the data is returned.
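The core of read repair, picking the newest value and pushing it to stale replicas, can be sketched as below. It is a simplified, synchronous illustration closest to the strong-consistency case, where the repair happens before the value is returned. It assumes each replica is a dictionary mapping a key to a `(value, timestamp)` pair; `read_with_repair` is a hypothetical helper.

```python
def read_with_repair(replicas, key):
    """replicas: list of dicts mapping key -> (value, timestamp)."""
    answers = [replica[key] for replica in replicas if key in replica]
    if not answers:
        return None
    # The value with the highest timestamp is considered current.
    newest = max(answers, key=lambda pair: pair[1])
    # Bring stale or missing replicas up to date.
    for replica in replicas:
        if replica.get(key) != newest:
            replica[key] = newest
    return newest[0]
```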
  • 50. Bloom filters. A bit array. Tests whether a value is a member of a set. Reduces disk access (improves performance).
  • 51. Bloom filters. On write: several hashes are generated per key; the bit for each hash is set. On read: hashes are generated for the key; if all bits for these hashes are set, the key may exist in the SSTable; if at least one bit is not set, the key has never been written to the SSTable.
  • 52. Bloom filters. [Figure: Key1 and Key2 are each mapped by Hash1, Hash2 and Hash3 onto a bit array on the write and read paths, in front of the SSTable]
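The write and read steps from slide 51 can be sketched as a small Bloom filter. This is a generic illustration, not Cassandra's implementation: the hash functions are derived from SHA-256 with a numeric prefix, and the class name, sizes and method names are assumptions chosen for the example.

```python
import hashlib


class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = [False] * size_bits

    def _positions(self, key):
        # Derive several hash values per key by prefixing a hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        # On write: set the bit for each hash of the key.
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key):
        # On read: a single unset bit proves the key was never added.
        # All bits set means "maybe", since other keys may have set them.
        return all(self.bits[pos] for pos in self._positions(key))
```

A filter like this never reports a written key as absent, so a negative answer lets a read skip the SSTable entirely; a positive answer still requires the disk lookup.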
  • 53. Resources. Home of the Apache Cassandra project: http://cassandra.apache.org/ Apache Cassandra Wiki: http://wiki.apache.org/cassandra/ Documentation provided by DataStax: http://www.datastax.com/docs/0.8/ Good explanation of creating secondary indexes: http://www.anuff.com/2010/07/secondary-indexes-in-cassandra.html Eben Hewitt, “Cassandra: The Definitive Guide”, O’Reilly, 2010, ISBN: 978-1-449-39041-9
  • 54. Authors. Lev Sivashov - lsivashov@gmail.com; Andrey Lomakin - lomakin.andrey@gmail.com, twitter: @Andrey_Lomakin, LinkedIn: http://www.linkedin.com/in/andreylomakin ; Artem Orobets - enisher@gmail.com, twitter: @Dr_EniSh; Anton Veretennik - tennik@gmail.com

Editor's Notes

  1. Endpoint snitch can be wrapped with a dynamic snitch, which will monitor read latencies and avoid reading from hosts that have slowed (due to compaction, for instance)