• Share
  • Email
  • Embed
  • Like
  • Save
  • Private Content
Slide presentation pycassa_upload

Slide presentation pycassa_upload






Total Views
Views on SlideShare
Embed Views



1 Embed 124

http://in.pycon.org 124



Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
Post Comment
Edit your comment

    Slide presentation pycassa_upload Slide presentation pycassa_upload Presentation Transcript

    • PYCON INDIA 2012 Pycassa – Python Cassandrified28-30th September 2012 Ramesh Rajini Dharmaram Vidya Infosys Limited, Kshetram Education & Research, Bangalore Bangalore, Karnataka
    • Session Plan• Need & Introduction to NoSQL DB• Cassandra Introduction• Data model creation• Pycassa in action
    • Heard of NO - SQL?• Stands for Not Only SQL• Class of non-relational data storage systems• No fixed table schema• No Joins!• Relax one or more of the ACID properties & will implement BASE & CAP Theorem!
    • Do we “REALLY” need them ? • RDBMS …So strong • so crisp • so vast • And WE know it well!
    • Trends shrends! – Gartner‟s 10 key IT trends for 2012 • unstructured data will grow some 80% over the course of the next five years 5
    • What made some apps go No-SQLized?• Explosion of social media sites with large data needs• Open-source community• Upsurge of cloud-based solutions• Migration to dynamically-typed languages
    • RDBMS..hmmm• Normalization => Joins => Slow Queries /Complications• Consistency => locks /transactions => Performance issues in distributed environments• Scalability becomes a mess as our apps grow in size and demand
    • Current Approach to Scalability• Add hardware• Upgrade hardware• More machines• Turn off unwanted services• Caching• De-normalize…
    • RDBMS ..tends to Massive [terabytes] Elastic scalability Easily achieve Fault tolerance Tunable Consistency
    • But Why.. • ACID • - transaction slow under heavy load • - in distributed /replicated environment = 2 phase commit => infinite wait by either NODE or Coordinator
    • But RDBMS is still holding up!!• Yes..it is• Will continue to Co-exist with NOSQL• What if data is no more a problem to me!• What new problems will I like to have?
    • Seeds of NoSQL• Three major papers – BigTable (Google) – Dynamo (Amazon) • Gossip protocol (discovery and error detection) • Distributed key-value data store • Eventual consistency – CAP Theorem
    • Brewer’s CAP Theorem• Properties of a system: – Consistency – Availability – Partitions
    • Brewer’s CAP Theorem• You can have it good, you can have it fast, you can have it cheap: pick two 14
    • BASE Vs ACID - Eventual Consistency• No updates for a long duration => eventually all updates will propagate through the system => all the nodes will be consistent• Any given accepted update and a given node, eventually either the update reaches the node or the node is removed from service• Known as BASE (Basically Available, Soft state, Eventual consistency)
    • What kinds of NoSQL• 2 Major areas: – Key/Value or „the big hash table‟. • Dynamo • Voldemort • Scalaris – Schema-less • column-based, document-based or graph-based. – Cassandra (column-based) – CouchDB (document-based) – Neo4J (graph-based) – HBase (column-based)
    • Any users?
    • Cassandra to the Rescue! – , source, Open Distributed, Decentralized, Elastically scalable Highly available / fault-tolerant Tune ably consistent Column-oriented database Automatic sharding Gossip Architecture 18
    • Distributed and Decentralized Can be running Decentralized on multiple • that there is no single machines point of failure. • appearing to users as • All the nodes in single instance cluster function exactly the same [server symmetry] 19
    • Elastic Scalability• Vertical scaling : – more hardware capacity /memory• Horizontal scaling : • More machines that have all or some of the data • So that no machine is bearing the complete load 20
    • Elastic Scalability , No single point failure• Elastic scalability : – Cluster will be able to scale up & down• Master Slave issue 21
    • Scale UP & Scale down• Add nodes and they can start serving clients! – NO server restart / NO query change / NO balancing – JUST add an another machine.• Just unplug the system. – Since cassandra has multiple copies of the same data in more than one node [configurable] there wont be any loss of data.
    • High Availability and Fault Tolerance• High availability + central server based system = problem – Internal Hard ware redundancy – Sounds cool but Extremely Costly 23
    • High Availability and Fault Tolerance – Cassandra allows to : • replace failed nodes in with no downtime • replicate data to multiple data centers to prevent downtime [automatic]
    • Tuneable Consistency• Consistency : All Reads return the most recently written value – Cassandra is “eventually consistent” model by default. 25
    • But then! • Amazon, Facebook, Google, Twitter which uses this model. – DATA is their main sales item – High performance!
    • Setting up Apache Cassandra• From the DataStax community Project – www.datastax.com/download• From the Apache Cassandra project: – http://cassandra.apache.org/ Believe it.. It‟s easy to install & set up!
    • Keyspace & Column Family creation Column family 1Key1 ColumnName1 ColumnName2 Value ValueKey2 ColumnName1 ColumnName2 Value ValueKey3 ColumnName1 ColumnName2 ColumnName3 Value Value Value Column family 2 Key1 ColumnName1 ColumnName2 ColumnName3 Value Value Value
    • Data makes sense.. Column family Close Friends 010051 Mail id tweets Ramesh_Rajini Hello 010052 Mail id tweets Vinz_Raj I‟m logged in! 010053 Mail id tweet1 tweet2 Ragh_Rao Hey, how r u ? Movie.. Column family Colleagues 020061 Mail id City Likes Puru_lal Bangalore Ladoos!
    • Cassandra Data Structure key space Ex: column family Colony Name, UserIDs, Ex: Address, column EmpIDs Tweets, Likes, name value timestamp Skill Set
    • Key-in the Key space.. 31
    • Pycassa in action!
    • Multi-level Dictionary {“FriendsInfo”: Keyspace {“closefriends”: Column Family Key {010053: OrderedDict( [(“MailId”:“Ragh_Rao”), Columns (“tweet1”:“Hey, how r u ?”), (“tweet2”: “Movie..”)]) OrderedDict( .. }} ColumnKeys ColumnValues
    • Can I insert in bulk?• Yes, luckily as an ordered dict.. col_fam.batch_insert({010054: {Name: Vinayak, Id: „9308}, 010057: {Name: Poorvi}})__________________________________for i in range(1000, 1010):... col_fam.insert(EmpIDs, {str(i): Hello}) 34
    • Is the data stored?• With Key , get all details: col_fam.get(010052) OrderedDict ([(Maild, Vinz_Raj), (tweets, Im loggedin!)])• With Key, get specific details: col_fam.get(010053, columns=[MaiID, tweet2]) OrderedDict([(tweet2, Movie..)])• Specifying start & end columns: col_fam.get(EmpIDs, column_start=1002, column_finish=1006) OrderedDict([(1002, Hello), (1003, Hello), (1004, Hello), (1005, Hello), (1006, Hello)]) 35
    • Can the columns be sliced?• Specifying the reverse way col_fam.get(EmpIDs, column_reversed=True, column_count=3) OrderedDict([(1009, Hello), (1008, Hello), (1007, Hello)])• Fetching multiple rows col_fam.multiget([010053, 010051]) OrderedDict( [(010053, OrderedDict([(Maild, Ragh_Rao), (tweet1, Hey, how r u?), (tweet2, Movie..)])), (010051, OrderedDict([(Mailid, Ramesh_Rajini), (tweets, Hello)]))]) 36
    • Counting..• get_count()  Count the number of columns in the row with key .• multiget_count()  Perform a column count in parallel on a set of rows.  Similar parameters as for multiget(), except that a list of keys may be used.  A dictionary of the form {key: int} is returned. 37
    • What Next?• Explore more on Pycassa modules.. – http://pycassa.github.com/pycassa/api/index.html• Start using it.. I‟m sure you‟ll enjoy because it is simply superb! 38
    • Recap• Need & Introduction to NoSQL DB• Cassandra Introduction• Data model creation• Pycassa in action 39
    • References• Cassandra, The Definitive Guide – O‟reilly Publication,Eben Hewitt• http://www.datastax.com/• http://pycassa.github.com/pycassa/• https://github.com/twissandra/twissandra• https://groups.google.com/forum/?fromgroups#!forum/py cassa-discuss 40
    • Time for R&R? - Requests & Responses
    • Thank you! - R&R Ramesh RajiniDisclaimer : All logos and images belong to the creator and companies which own them