
Slide presentation pycassa_upload


  1. 1. PYCON INDIA 2012 • Pycassa – Python Cassandrified • 28th-30th September 2012 • Ramesh Rajini, Infosys Limited, Education & Research, Bangalore • Dharmaram Vidya Kshetram, Bangalore, Karnataka
  2. 2. Session Plan • Need & Introduction to NoSQL DB • Cassandra Introduction • Data model creation • Pycassa in action
  3. 3. Heard of NoSQL? • Stands for Not Only SQL • Class of non-relational data storage systems • No fixed table schema • No joins! • Relaxes one or more of the ACID properties in favour of BASE, within the trade-offs of the CAP theorem
  4. 4. Do we “REALLY” need them? • RDBMS… so strong • so crisp • so vast • And WE know it well!
  5. 5. Trends shrends! – Gartner's 10 key IT trends for 2012 • Unstructured data will grow some 80% over the course of the next five years
  6. 6. What made some apps go No-SQLized? • Explosion of social media sites with large data needs • Open-source community • Upsurge of cloud-based solutions • Migration to dynamically-typed languages
  7. 7. RDBMS.. hmmm • Normalization => joins => slow queries / complications • Consistency => locks / transactions => performance issues in distributed environments • Scalability becomes a mess as our apps grow in size and demand
  8. 8. Current Approach to Scalability • Add hardware • Upgrade hardware • More machines • Turn off unwanted services • Caching • De-normalize…
  9. 9. RDBMS.. tends to • Massive [terabytes] • Elastic scalability • Easily achieve fault tolerance • Tunable consistency
  10. 10. But Why.. • ACID – transactions slow down under heavy load – in a distributed / replicated environment, ACID means two-phase commit => either a node or the coordinator can end up waiting indefinitely
  11. 11. But RDBMS is still holding up!! • Yes.. it is • Will continue to co-exist with NoSQL • What if data is no longer a problem for me? • What new problems would I like to have?
  12. 12. Seeds of NoSQL • Three major papers – BigTable (Google) – Dynamo (Amazon): gossip protocol (discovery and error detection), distributed key-value data store, eventual consistency – CAP Theorem
  13. 13. Brewer’s CAP Theorem • Properties of a system: – Consistency – Availability – Partition tolerance
  14. 14. Brewer’s CAP Theorem • You can have it good, you can have it fast, you can have it cheap: pick two
  15. 15. BASE vs ACID - Eventual Consistency • If no new updates arrive for long enough => all earlier updates eventually propagate through the system => all the nodes become consistent • For any accepted update and any given node, eventually either the update reaches the node or the node is removed from service • Known as BASE (Basically Available, Soft state, Eventual consistency)
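
The propagation described on this slide can be sketched as a toy simulation. This is only an illustration under stated assumptions: the `Replica` class, the push-style `gossip_round` and the last-write-wins rule are hypothetical simplifications chosen here, not Cassandra's actual replication code.

```python
import random


class Replica:
    """One node holding a single value plus the timestamp of its last write."""

    def __init__(self, name):
        self.name = name
        self.value = None
        self.timestamp = -1

    def apply(self, value, timestamp):
        # Last-write-wins (an assumption of this sketch): keep the newest update.
        if timestamp > self.timestamp:
            self.value, self.timestamp = value, timestamp


def gossip_round(replicas, rng):
    """Each replica pushes its current state to one randomly chosen peer."""
    for source in replicas:
        target = rng.choice([r for r in replicas if r is not source])
        target.apply(source.value, source.timestamp)


def converge(replicas, rng, max_rounds=100):
    """Gossip until every replica holds the same state; return rounds used."""
    rounds = 0
    while len({(r.value, r.timestamp) for r in replicas}) > 1 and rounds < max_rounds:
        gossip_round(replicas, rng)
        rounds += 1
    return rounds


rng = random.Random(42)
replicas = [Replica("node%d" % i) for i in range(5)]
replicas[0].apply("v1", timestamp=1)  # an older accepted update
replicas[3].apply("v2", timestamp=2)  # a newer accepted update
rounds = converge(replicas, rng)
print(rounds, [r.value for r in replicas])
```

Once writes stop, every replica ends up holding the newest value, which is the "eventually all the nodes will be consistent" behaviour in miniature.
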
  16. 16. What kinds of NoSQL • 2 Major areas: – Key/Value or 'the big hash table'. • Dynamo • Voldemort • Scalaris – Schema-less • column-based, document-based or graph-based. – Cassandra (column-based) – CouchDB (document-based) – Neo4J (graph-based) – HBase (column-based)
  17. 17. Any users?
  18. 18. Cassandra to the Rescue! • Open source • Distributed, decentralized • Elastically scalable • Highly available / fault-tolerant • Tunably consistent • Column-oriented database • Automatic sharding • Gossip architecture
  19. 19. Distributed and Decentralized • Distributed: can run on multiple machines while appearing to users as a single instance • Decentralized: there is no single point of failure; all the nodes in the cluster function exactly the same [server symmetry]
  20. 20. Elastic Scalability • Vertical scaling: – more hardware capacity / memory • Horizontal scaling: – more machines that have all or some of the data – so that no machine is bearing the complete load
  21. 21. Elastic Scalability, No Single Point of Failure • Elastic scalability: – Cluster will be able to scale up & down • Master-slave issue
  22. 22. Scale UP & Scale down • Add nodes and they can start serving clients! – NO server restart / NO query change / NO balancing – JUST add another machine. • To scale down, just unplug the system. – Since Cassandra keeps multiple copies of the same data on more than one node [configurable], there won't be any loss of data.
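
Why unplugging a node need not lose data can be sketched with a toy hash ring. This is a minimal sketch, assuming a hypothetical `Ring` class, MD5-based tokens and a "next distinct nodes clockwise" placement rule; real Cassandra partitioners and replication strategies are configurable and more involved.

```python
import bisect
import hashlib


def token(text):
    """Map a string to a ring position (MD5 is only an illustrative choice)."""
    return int(hashlib.md5(text.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy ring: every key is stored on `replication_factor` consecutive nodes."""

    def __init__(self, nodes, replication_factor=2):
        self.replication_factor = replication_factor
        self.tokens = sorted((token(n), n) for n in nodes)

    def add_node(self, node):
        bisect.insort(self.tokens, (token(node), node))

    def remove_node(self, node):
        self.tokens.remove((token(node), node))

    def replicas_for(self, key):
        """Walk clockwise from the key's token and take the next nodes."""
        start = bisect.bisect(self.tokens, (token(key), ""))
        count = min(self.replication_factor, len(self.tokens))
        return [self.tokens[(start + i) % len(self.tokens)][1] for i in range(count)]


ring = Ring(["node-a", "node-b", "node-c"], replication_factor=2)
before = ring.replicas_for("010053")
ring.remove_node(before[0])          # "just unplug" the first replica
after = ring.replicas_for("010053")
print(before, after)
```

The node that was the second replica before the removal becomes the first one afterwards, so a copy of the row is still reachable.
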
  23. 23. High Availability and Fault Tolerance • High availability + central server based system = problem – Internal hardware redundancy – Sounds cool but extremely costly
  24. 24. High Availability and Fault Tolerance – Cassandra lets you: • replace failed nodes with no downtime • replicate data to multiple data centers to prevent downtime [automatic]
  25. 25. Tunable Consistency • Consistency: all reads return the most recently written value – Cassandra follows an “eventually consistent” model by default.
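
One common way to reason about tunable consistency is the overlap rule R + W > N: if the number of replicas contacted on read (R) plus the number acknowledged on write (W) exceeds the replication factor (N), every read set overlaps the latest write set. The helper names below are hypothetical and the sketch ignores failures and timing; it only checks the arithmetic.

```python
def quorum(replication_factor):
    """Smallest majority of the replicas."""
    return replication_factor // 2 + 1


def reads_see_latest_write(replication_factor, write_replicas, read_replicas):
    """True when every read set must overlap every write set (R + W > N)."""
    return read_replicas + write_replicas > replication_factor


N = 3
one_one = reads_see_latest_write(N, write_replicas=1, read_replicas=1)
quorum_quorum = reads_see_latest_write(N, quorum(N), quorum(N))
all_one = reads_see_latest_write(N, write_replicas=N, read_replicas=1)
print(one_one, quorum_quorum, all_one)
```

Writing and reading at ONE trades that guarantee for latency, which is the eventually consistent end of the dial; QUORUM on both sides, or ALL on one side, restores the overlap.
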
  26. 26. But then! • Amazon, Facebook, Google and Twitter all use this model. – DATA is their main sales item – High performance!
  27. 27. Setting up Apache Cassandra • From the DataStax community project – www.datastax.com/download • From the Apache Cassandra project: – http://cassandra.apache.org/ • Believe it.. It's easy to install & set up!
  28. 28. Keyspace & Column Family creation • Column family 1 – Key1: ColumnName1 = Value, ColumnName2 = Value – Key2: ColumnName1 = Value, ColumnName2 = Value – Key3: ColumnName1 = Value, ColumnName2 = Value, ColumnName3 = Value • Column family 2 – Key1: ColumnName1 = Value, ColumnName2 = Value, ColumnName3 = Value
  29. 29. Data makes sense.. • Column family “Close Friends” – 010051: Mail id = Ramesh_Rajini, tweets = Hello – 010052: Mail id = Vinz_Raj, tweets = I'm logged in! – 010053: Mail id = Ragh_Rao, tweet1 = Hey, how r u ?, tweet2 = Movie.. • Column family “Colleagues” – 020061: Mail id = Puru_lal, City = Bangalore, Likes = Ladoos!
  30. 30. Cassandra Data Structure • key space • column family – Ex: Colony Name, UserIDs, EmpIDs • column – Ex: Address, Tweets, Likes, Skill Set – each column holds a name, a value and a timestamp
  31. 31. Key-in the Key space..
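
The nesting on this slide (key space, column family, row key, column with name, value and timestamp) can be modelled in plain Python. This is only a rough in-memory sketch: the `Column` tuple, the `insert` helper and the microsecond timestamp convention are assumptions made for illustration, and the sample row is borrowed from the "Colleagues" example.

```python
import time
from collections import OrderedDict, namedtuple

# A column is a (name, value, timestamp) triple.
Column = namedtuple("Column", ["name", "value", "timestamp"])

# key space: column family name -> row key -> ordered columns
keyspace = {"Colleagues": {}}


def insert(column_family, row_key, columns, timestamp=None):
    """Store each column with a timestamp (microseconds, an assumed convention)."""
    if timestamp is None:
        timestamp = int(time.time() * 1e6)
    row = keyspace[column_family].setdefault(row_key, OrderedDict())
    for name, value in columns.items():
        row[name] = Column(name, value, timestamp)


insert("Colleagues", "020061", {"City": "Bangalore", "Likes": "Ladoos!"})
city = keyspace["Colleagues"]["020061"]["City"]
print(city.name, city.value, city.timestamp)
```
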
  32. 32. Pycassa in action!
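
The later slides call methods on a `col_fam` object without showing where it comes from. A plausible setup, assuming pycassa is installed, Cassandra is listening on the Thrift port `localhost:9160`, and the keyspace `FriendsInfo` with column family `closefriends` already exists, might look like this. The wrapper function `open_column_family` is a hypothetical helper, not part of pycassa.

```python
def open_column_family(keyspace="FriendsInfo", column_family="closefriends",
                       servers=("localhost:9160",)):
    """Return a pycassa ColumnFamily handle (needs pycassa and a live cluster)."""
    # Imported lazily so this sketch can be loaded without pycassa installed.
    from pycassa.pool import ConnectionPool
    from pycassa.columnfamily import ColumnFamily

    pool = ConnectionPool(keyspace, server_list=list(servers))
    return ColumnFamily(pool, column_family)
```

With a cluster available, `col_fam = open_column_family()` would give the handle used in the slides that follow.
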
  33. 33. Multi-level Dictionary • {"FriendsInfo": {"closefriends": {"010053": OrderedDict([("MailId", "Ragh_Rao"), ("tweet1", "Hey, how r u ?"), ("tweet2", "Movie..")]), ..}}} • "FriendsInfo" is the keyspace • "closefriends" is the column family • "010053" is the row key • each OrderedDict holds the columns, mapping column keys to column values
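
The multi-level dictionary view can be written out as runnable Python. This is a plain in-memory illustration of the shape of the data, reusing the sample rows from the earlier slide; it does not talk to Cassandra.

```python
from collections import OrderedDict

friends_info = {                      # keyspace "FriendsInfo"
    "closefriends": {                 # column family
        "010053": OrderedDict([       # row key -> ordered columns
            ("MailId", "Ragh_Rao"),
            ("tweet1", "Hey, how r u ?"),
            ("tweet2", "Movie.."),
        ]),
        "010051": OrderedDict([
            ("MailId", "Ramesh_Rajini"),
            ("tweets", "Hello"),
        ]),
    }
}

row = friends_info["closefriends"]["010053"]
print(list(row.keys()))   # column keys, in order
print(row["tweet2"])      # one column value
```
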
  34. 34. Can I insert in bulk? • Yes, luckily, as a dict of rows.. col_fam.batch_insert({'010054': {'Name': 'Vinayak', 'Id': '9308'}, '010057': {'Name': 'Poorvi'}}) • Or column by column in a loop: for i in range(1000, 1010): col_fam.insert('EmpIDs', {str(i): 'Hello'})
  35. 35. Is the data stored? • With the key, get all details: col_fam.get('010052') => OrderedDict([('MailId', 'Vinz_Raj'), ('tweets', "I'm logged in!")]) • With the key, get specific details: col_fam.get('010053', columns=['MailId', 'tweet2']) => OrderedDict([('MailId', 'Ragh_Rao'), ('tweet2', 'Movie..')]) • Specifying start & end columns: col_fam.get('EmpIDs', column_start='1002', column_finish='1006') => OrderedDict([('1002', 'Hello'), ('1003', 'Hello'), ('1004', 'Hello'), ('1005', 'Hello'), ('1006', 'Hello')])
  36. 36. Can the columns be sliced? • In reverse order: col_fam.get('EmpIDs', column_reversed=True, column_count=3) => OrderedDict([('1009', 'Hello'), ('1008', 'Hello'), ('1007', 'Hello')]) • Fetching multiple rows: col_fam.multiget(['010053', '010051']) => OrderedDict([('010053', OrderedDict([('MailId', 'Ragh_Rao'), ('tweet1', 'Hey, how r u ?'), ('tweet2', 'Movie..')])), ('010051', OrderedDict([('MailId', 'Ramesh_Rajini'), ('tweets', 'Hello')]))])
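
To experiment with the insert, get and slice semantics above without a running cluster, a small in-memory stand-in can help. `ToyColumnFamily` is a hypothetical class written for this sketch, not pycassa: it sorts column names as plain strings, raises `KeyError` for a missing row, and only mimics the keyword arguments shown on the slides.

```python
from collections import OrderedDict


class ToyColumnFamily:
    """In-memory stand-in that keeps each row's columns sorted by name."""

    def __init__(self):
        self.rows = {}

    def insert(self, key, columns):
        self.rows.setdefault(key, {}).update(columns)

    def batch_insert(self, rows):
        for key, columns in rows.items():
            self.insert(key, columns)

    def get(self, key, columns=None, column_start="", column_finish="",
            column_reversed=False, column_count=100):
        row = self.rows[key]
        names = sorted(row, reverse=column_reversed)
        if columns is not None:
            # Explicit column list: keep only the requested names.
            names = [n for n in names if n in columns]
        else:
            lo, hi = column_start, column_finish
            if column_reversed:
                lo, hi = hi, lo  # reversed slices run from start down to finish
            names = [n for n in names
                     if (not lo or n >= lo) and (not hi or n <= hi)]
        return OrderedDict((n, row[n]) for n in names[:column_count])

    def multiget(self, keys, **kwargs):
        return OrderedDict((k, self.get(k, **kwargs))
                           for k in keys if k in self.rows)


col_fam = ToyColumnFamily()
col_fam.batch_insert({"010054": {"Name": "Vinayak", "Id": "9308"},
                      "010057": {"Name": "Poorvi"}})
for i in range(1000, 1010):
    col_fam.insert("EmpIDs", {str(i): "Hello"})

sliced = col_fam.get("EmpIDs", column_start="1002", column_finish="1006")
last_three = col_fam.get("EmpIDs", column_reversed=True, column_count=3)
print(list(sliced), list(last_three))
```

The stand-in makes the slice rules easy to poke at: both bounds are inclusive, and a reversed slice with `column_count=3` returns the three highest column names first.
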
  37. 37. Counting.. • get_count() – Counts the number of columns in the row with the given key. • multiget_count() – Performs a column count in parallel on a set of rows. – Takes similar parameters to multiget(), with a list of keys. – Returns a dictionary of the form {key: int}.
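
The shape of the two counting results can be illustrated with plain dictionaries. The functions below are hypothetical stand-ins that only reuse the pycassa method names; they run sequentially over an in-memory `rows` dict rather than counting in parallel on a cluster.

```python
def get_count(rows, key):
    """Number of columns in the row stored under `key` (0 if the row is absent)."""
    return len(rows.get(key, {}))


def multiget_count(rows, keys):
    """Column counts for several rows, as a {key: int} dictionary."""
    return {key: get_count(rows, key) for key in keys}


rows = {
    "010053": {"MailId": "Ragh_Rao", "tweet1": "Hey, how r u ?", "tweet2": "Movie.."},
    "010051": {"MailId": "Ramesh_Rajini", "tweets": "Hello"},
}
single = get_count(rows, "010053")
several = multiget_count(rows, ["010053", "010051"])
print(single, several)
```
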
  38. 38. What Next? • Explore more on Pycassa modules.. – http://pycassa.github.com/pycassa/api/index.html • Start using it.. I'm sure you'll enjoy it because it is simply superb!
  39. 39. Recap • Need & Introduction to NoSQL DB • Cassandra Introduction • Data model creation • Pycassa in action
  40. 40. References • Eben Hewitt, Cassandra: The Definitive Guide, O'Reilly • http://www.datastax.com/ • http://pycassa.github.com/pycassa/ • https://github.com/twissandra/twissandra • https://groups.google.com/forum/?fromgroups#!forum/pycassa-discuss
  41. 41. Time for R&R? - Requests & Responses
  42. 42. Thank you! - R&R • Ramesh Rajini • Disclaimer: All logos and images belong to the creators and companies which own them
