Infovision Anand S _ no sql workshop

NoSQL, Non relational databases

  1. NoSQL: Non Relational Databases. Data Scientist, Gramener.com
  2. UPDATE ANOMALY
     Name     Address                  Phone
     Anand    10 Mount Rd, Chennai     98765 43210
     Bala     15, Janpath, New Delhi   90123 45678
     Chandra  20, Marine Dr, Mumbai    91234 56780
     Chandra  20, Marine Dr, Mumbai    91234 56781

     INSERT ANOMALY
     Professor  Hired on     Course
     Anand      10 Jan 2012  Maths
     Bala       15 Jan 2012  Physics
     Chandra    20 Jan 2012  Chemistry
     Dileep     25 Jan 2012  ???

     DELETE ANOMALY
     Professor  Hired on     Course
     Anand      10 Jan 2012  Maths
     Bala       15 Jan 2012  Physics
     Chandra    20 Jan 2012  Chemistry
     Dileep     25 Jan 2012  Biology
  3. WHY NOW?
     • DATA VOLUME IS GROWING
     • DATA IS INCREASINGLY NETWORKED
     • SEMI-STRUCTURED DATA
     • DISTRIBUTED ARCHITECTURE
     [Chart: values 988, 623, 397, 253, 161; years 2006 to 2010]
  4. A POLL
     • How many programmers?
     • … who’ve programmed NoSQL DBs?
     • How many non-IT folks?
  5. Key-value stores; data is stored in TABLES; Document databases; Graph databases
  6. Brewer’s CAP Theorem: Pick Two
     • C: Consistency
     • A: Availability
     • P: Partition Tolerance
  7. Key-value stores; Columnar databases; Document databases; Graph databases
  8. KEY VALUE STORES: Redis, Cassandra, Memcache, Voldemort, Dynamo, Tokyo Cabinet
     DOCUMENT DATABASES: CouchDB, MongoDB, SimpleDB, Riak, Terrastore, Lotus Domino
     COLUMNAR DATABASES: Cassandra, BigTable, Hypertable, HBase, Vertica, InfiniDB
     GRAPH DATABASES: Neo4j, FlockDB, GraphDB, OrientDB, InfiniteGraph, AllegroGraph
  9. (Repeats the database list from slide 8.)
  10. The first time round, the mistakes were around scalability. I used a SQL “ORDER BY RAND()” statement to return the next page to review. I knew this was an inefficient operation, but I assumed that it wouldn’t matter since the button would only be clicked occasionally. Something like 90% of our database load turned out to be caused by that one SQL statement, and it only got worse as we loaded more pages in to the system. This caused multiple site slow downs and crashes.

      The second time round I turned to my new favourite in-memory data structure server, redis, and its SRANDMEMBER command (a feature I requested a while ago with this exact kind of project in mind). The system maintains a redis set of all IDs that needed to be reviewed for an assignment to be complete, and a separate set of IDs of all pages that had been reviewed. It then uses redis set difference (the SDIFFSTORE command) to create a set of unreviewed pages for the current assignment and then SRANDMEMBER to pick one of those pages.
  11. CouchDB
  12. s.anand@gramener.com, gramener.com, s-anand.net, @sanand0 on Twitter, +91 9741 552 552
  13. EXERCISE: DESIGN THE SSLC MARKS DATABASE
      Each student has an ID.
      There are 11 languages and 92 non-language subjects in total.
      Students usually write 3 language and 3 non-language exams. For example:
      • (English, Hindi, Sanskrit), (Maths, Physics, Chemistry)
      • (Kannada, Urdu, Marathi), (Commerce, Accountancy, Economics)
      You need to record their marks in all 6 subjects, and the total.
  14. EXERCISE: DESIGN THE SSLC MARKS DATABASE
      Common queries:
      • Who scored the highest in Maths?
      • Which subject had the highest fail %?
      • How many failed in 1 subject?
      Some scenarios:
      • Access from multiple locations
      • Real-time marks updates
      • Guarantee of correctness
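
The Redis approach quoted in slide 10 (keep one set of pages to review and one set of pages already reviewed, store their difference, then pick a random member) can be sketched in plain Python. This is only an illustration under stated assumptions: the `store` dictionary stands in for a Redis server, the key names (`assignment1:pages`, `reviewed`) are hypothetical, and the helpers `sdiffstore` and `srandmember` merely imitate the behaviour of the Redis commands of the same name rather than calling a real client.

```python
import random


def sdiffstore(store, dest, first_key, *other_keys):
    """Imitate Redis SDIFFSTORE: save (first set minus the other sets) under dest.

    Returns the size of the resulting set, as the Redis command does.
    """
    result = set(store.get(first_key, set()))
    for key in other_keys:
        result -= store.get(key, set())
    store[dest] = result
    return len(result)


def srandmember(store, key, rng=random):
    """Imitate Redis SRANDMEMBER: one random member, or None for an empty or missing set."""
    members = store.get(key)
    if not members:
        return None
    # Sort first so the choice is reproducible when a seeded rng is passed in.
    return rng.choice(sorted(members))


def next_page_to_review(store, assignment, rng=random):
    """Pick a random page of the assignment that has not been reviewed yet."""
    unreviewed_key = f"{assignment}:unreviewed"  # hypothetical key name
    sdiffstore(store, unreviewed_key, f"{assignment}:pages", "reviewed")
    return srandmember(store, unreviewed_key, rng)


# Hypothetical data: four pages in the assignment, two already reviewed.
store = {
    "assignment1:pages": {101, 102, 103, 104},
    "reviewed": {102, 104},
}
page = next_page_to_review(store, "assignment1")
print(page)  # one of the unreviewed page IDs
```

Unlike `ORDER BY RAND()`, which orders every candidate row before returning one, this design does the set arithmetic once and then draws a member directly from the stored result.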
