Cassandra Intro -- TheEdge2012
This is an introductory presentation to Cassandra, the database of choice for high availability and insane scalability.
I gave this talk at TheEdge conference.

Published in: Technology

Notes
  • Your application has gone viral; the number of users doubles every week.
  • Sparse nested hashtables
  • Keywords: the columns are sorted.
  • Columns are stored in rows. Rows are indexed by row id; this is the primary index in Cassandra. Keywords: the column is the main tool for storing data. Up to 2 billion columns.
  • All the imports are from java.sql; you only need to keep in mind that your SQL starts with a C (that is, CQL).
  • 128 bit = 16 bytes. Natural sharding of the data.
  • Transcript

    • 1. #theedge2012: Practical Introduction To Cassandra. Sonia Margulis, @robosonia, March 2012
    • 2. Your Application
    • 3. Gone Viral
    • 4. Best Hardware Money Can Buy
    • 5. Improve Reads
    • 6. Sharding RDBMS – A Nightmare
    • 7. Cassandra’s Sweet Spot: many concurrent users; linear scalability; high volumes of operations; distributed, inherently clustered
    • 8. The Road to Mastership: Introduction to Cassandra (roadmap: Introduction to Cassandra, Running a Server, Data Model, Modeling Data, Communicating with the Server, Growing a Cluster)
    • 9. » A non-relational database » Values availability » Scales out, not up » Open source » Active community
    • 10. Always Available
    • 11. Who Uses It?
    • 12. Use Case: Social & Timelines
    • 13. Use Case: Statistics & Logs (image: Logs by Rick Payette)
    • 14. The Road to Mastership: Running a Server
    • 15. The Cassandra Project » Project » Runs on: » Apache License » Current release: 1.0.8 You are here sonia@hiro:~/apache-cassandra-1.0.8$
    • 16. Running a Server sonia@hiro:~/apache-cassandra-1.0.8$ bin/cassandra -f .... Now serving reads. localhost/127.0.0.1:9160
    • 17. Connecting to Our Server: the Cassandra command line interface (CLI) tool sonia@hiro:~/apache-cassandra-1.0.8$ bin/cassandra-cli --host 127.0.0.1 --port 9160 Connected to: "Test Cluster" on localhost/9160 Welcome to Cassandra CLI version 1.0.8
    • 18. Creating a Keyspace: Cassandra’s equivalent of an RDBMS database [default@unknown] create keyspace demo; Let’s start using it [default@unknown] use demo; [default@demo]
    • 19. Creating a Column Family: a column family holds data, much like a table in an RDBMS. [default@demo] create column family user; Start adding data [default@demo] set user[1][a]=utf8('foo'); [default@demo] set user[2][b]=utf8('bar'); [default@demo] set user[2][c]=utf8('test');
    • 20. Retrieving Data Retrieving columns by user key [default@demo] get user[2]; (column=b, value=bar) (column=c, value=test) Returned 2 results.
    • 21. The Road to Mastership: Data Model
    • 22. Column: column name, value
    • 23. Column: name = Peter Parker
    • 24. Row: spiderman (icon, name = Peter Parker, residence = New York)
    • 25. Row: row id = spiderman; columns: icon, name = Peter Parker, residence = New York
    • 26. Column Family: spiderman (icon, name = Peter P, residence = New York); batman (icon, name = Bruce W, residence = Gotham); hulk (icon, name = Bruce B, residence = New York)
    • 27. Column Family: set user['spiderman']['name'] = 'Peter Parker' (column family: user; row id: spiderman; column: name; value: Peter Parker)
    • 28. The Allies Column Family: batman (Robin, Alfred); spiderman (Iceman, Firestar, Iron Man, Storm)
    • 29. Published Issues Column Family: spiderman (~2600 columns: 1/8/1962, ..., 1/3/2012, 8/3/2012); batman (~3800 columns: 1/5/1939, ..., 2/3/2012, 9/3/2012); column values shown as ###
    • 30. Model Flexibility: Flexible Data Model (image: photostock / FreeDigitalPhotos.net)
    • 31. Keyspace » Like RDBMS database » A container for column families [default@unknown] create keyspace demo; » One keyspace per application, in most cases
    • 32. Expiring Columns – TTL: spiderman (icon, name = Peter P, passwd_reminder = abcd, residence = New York) set users['spiderman']['passwd_reminder'] = 'abcd' with ttl = 7200; (7200 s = 2 hours)
    • 33. Distributed Counters: javaedge.com (speakers = 1035, sessions = 3402) incr page_views['javaedge.com']['speakers'] by 1; get page_views['javaedge.com']['speakers'];
    • 34. The Road to Mastership: Communication with the Server: Clients
    • 35. Cassandra Query Language » Looks a lot like SQL INSERT INTO users (KEY, name, universe) VALUES ('hulk', 'Bruce', 'marvel') » Mostly valid SQL SELECT name, universe FROM users WHERE KEY = 'hulk'
    • 36. Advantages of using CQL » Run ad-hoc queries » Very familiar, easier to use » Stable interface ▪ For library developers ▪ For users
    • 37-39. CQL Example: SELECT name, residence FROM users; SELECT 01/1/2011 .. 1/1/2012 FROM published_issues WHERE KEY = 'spiderman'; SELECT FIRST 5 FROM allies WHERE KEY = 'spiderman'
    • 40. Cassandra JDBC Driver import java.sql.*; Class.forName( "org.apache.cassandra.cql.jdbc.CassandraDriver"); Connection con = DriverManager.getConnection( "jdbc:cassandra://localhost:9160/keyspace");
    • 41. Cassandra JDBC Driver Statement stmt = con.createStatement(); ResultSet rs = stmt.executeQuery( "SELECT name, residence FROM users WHERE KEY = '" + key + "'");
    • 42. Cassandra JDBC Driver JDBC
    • 43. Hector SliceQuery<...> query = HFactory.createSliceQuery(keyspace, ...); query.setRange(startDate, endDate, false, 100) .setColumnFamily("published_issues") .setKey("spiderman"); QueryResult<ColumnSlice<Date, String>> result = query.execute();
    • 44. Hector: Advanced Features » Failover support » Connection pooling » Load balancing » JMX counters » Object mapper
    • 45. Maven plugin mvn cassandra:start Run your tests mvn cassandra:cql-exec mvn cassandra:stop
    • 46. The Road to Mastership: Modeling Data
    • 47. Queries First » Use the same Column Family for data that should be fetched together ▪ Reduces IO » Consider filtering and ordering
    • 48. Denormalize » Fewer seeks, faster reads » Storing redundant data ▪ Manually handling data integrity » Disk space is cheaper than seek time
    • 49. Secondary Index » Requirement: Find all superheroes that live in New York: spiderman (icon, name = Peter Parker, residence = New York)
    • 50. Secondary Index » Requirement: Find all superheroes that live in New York: spiderman (icon, name = Peter Parker, residence = New York) create column family users ... and column_metadata= [{column_name: residence, index_type: KEYS}]; » Good for indexes with low cardinality SELECT name FROM users WHERE residence = 'New York'
    • 51. Manually Managed Index » Requirement: Find a superhero by name
    • 52. Manually Managed Index » Requirement: Find a superhero by name » Manually maintain an inverted index: the row id is the search term and the columns are keys in the users CF, e.g. Bruce (hulk, batman), Peter (spiderman)
    • 53. Bucketing: all issues, split by month: hulk_jan_2012 (1/1/2012 = Issue-1, 2/1/2012 = Issue-2, 4/1/2012 = Issue-3); hulk_feb_2012 (2/2/2012 = Issue-4, 28/2/2012 = Issue-5, 29/2/2012 = Issue-6)
    • 54. The Road to Mastership: Cassandra Cluster
    • 55. Virtual Ring: nodes at tokens 10, 40, 60, 75, 90
    • 56. Node Token: node 10 holds keys 91-10; node 40 holds keys 11-40; node 60 holds keys 41-60; node 75 holds keys 61-75; node 90 holds keys 76-90
    • 57. Node Token: MD5’(hulk) = 20
    • 58. Node Token: MD5’(hulk) = 20, so hulk is stored on node 40
    • 59. Node Token: MD5’(thor) = 42
    • 60. Node Token: MD5’(thor) = 42, so thor is stored on node 60
    • 61. Inter-Node Communication » Gossip » Failure Detection
    • 62. Fault Tolerance » Replication factor » Hinted Handoff
    • 63. Replication Factor » Replication factor » Hinted Handoff (ring diagram: replication factor = 3, with three copies each of hulk and thor)
    • 64. Fault Tolerance » Replication factor » Hinted Handoff
    • 65. Hinted Handoff » Replication factor » Hinted Handoff
    • 66. Hinted Handoff » Replication factor » Hinted Handoff
    • 67. Client Requests: Coordinator, Write Request (ring diagram)
    • 68. Consistency Level: Consistency level = ONE (Write Request)
    • 69. Consistency Level: Consistency level = ALL (Write Request)
    • 70. The Road to Mastership: Summary
    • 71. Where Do You Sign? » Cassandra ▪ http://cassandra.apache.org ▪ http://www.datastax.com/ • Docs, tutorials & videos ▪ IRC: #cassandra on freenode » Hector ▪ https://github.com/rantav/hector ▪ https://github.com/zznate/hector-examples
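
The token ring walked through in slides 55 to 63 can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not Cassandra's implementation: the class `TokenRing` and its methods are hypothetical names, tokens are the simplified 0 to 100 values from the slides rather than real 128-bit MD5 hashes, and replicas are assumed to go to the next nodes clockwise on the ring.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical sketch of the slides' virtual ring; not Cassandra code. */
final class TokenRing {
    // Node tokens kept sorted so we can walk the ring clockwise.
    private final TreeSet<Integer> tokens = new TreeSet<>();

    /** Assumes at least one node token is supplied. */
    TokenRing(int... nodeTokens) {
        for (int token : nodeTokens) {
            tokens.add(token);
        }
    }

    /** First node whose token is >= the key's token, wrapping to the lowest token. */
    int primaryNode(int keyToken) {
        Integer owner = tokens.ceiling(keyToken);
        return owner != null ? owner : tokens.first();
    }

    /** The primary node plus the next nodes clockwise, up to the replication factor. */
    List<Integer> replicas(int keyToken, int replicationFactor) {
        List<Integer> result = new ArrayList<>();
        int wanted = Math.min(replicationFactor, tokens.size());
        Integer node = primaryNode(keyToken);
        while (result.size() < wanted) {
            result.add(node);
            node = tokens.higher(node);
            if (node == null) {
                node = tokens.first(); // wrap around the ring
            }
        }
        return result;
    }

    public static void main(String[] args) {
        TokenRing ring = new TokenRing(10, 40, 60, 75, 90);
        System.out.println(ring.primaryNode(20));  // hulk, token 20: prints 40
        System.out.println(ring.replicas(42, 3));  // thor, token 42: prints [60, 75, 90]
    }
}
```

With the ring from slide 56, a key token of 95 wraps around to node 10, which matches the "91-10" range shown for that node.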
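
The bucketing idea from slide 53 (one row per hero per month instead of one very wide row) comes down to how the row key is built. The sketch below derives keys in the `hulk_jan_2012` shape shown on the slide. The class `IssueBuckets` and method `rowKey` are hypothetical names, and the use of `java.time` (Java 8 or later) is an assumption made for brevity; the talk itself does not show this code.

```java
import java.time.LocalDate;

/** Hypothetical helper that builds per-month row keys as on the Bucketing slide. */
final class IssueBuckets {
    // Fixed abbreviations keep the key format independent of JVM locale data.
    private static final String[] MONTHS = {
        "jan", "feb", "mar", "apr", "may", "jun",
        "jul", "aug", "sep", "oct", "nov", "dec"
    };

    private IssueBuckets() {}

    /** Returns a key of the form hero_month_year, e.g. hulk_jan_2012. */
    static String rowKey(String hero, LocalDate published) {
        String month = MONTHS[published.getMonthValue() - 1];
        return hero + "_" + month + "_" + published.getYear();
    }

    public static void main(String[] args) {
        System.out.println(rowKey("hulk", LocalDate.of(2012, 1, 4)));   // prints hulk_jan_2012
        System.out.println(rowKey("hulk", LocalDate.of(2012, 2, 29)));  // prints hulk_feb_2012
    }
}
```

A reader that wants all issues for a date range would compute the key for each month in the range and query those rows, which keeps every row bounded in size.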
