Writing Space and the Cassandra NoSQL DBMS

By: Brian King at Pearson Education

Published in: Technology


1. Writing Space and the Cassandra NoSQL DBMS
   Brian King (with thanks to Michael Aillon)

2. Writing Space

3. Why A Writing Space?
   “Writing is one of the most effective tools available to develop a student’s critical thinking.”

4. The Business Needs
  •  Efficient Administration Of Writing Assignments
  •  Scalable Classrooms (500+)
  •  Workflow Optimization / Automation
  •  Integrated Access to Assessment Tools
      o  Grammar Checking
      o  Auto-Scoring
      o  Plagiarism Detection (Source Check)
  •  Grading Rubrics
  •  Online Editing and Document Upload
  •  Peer Review
  •  Group Projects

5. The Technical Goals
  •  Highly "Internet" Scalable
  •  Global Presence
  •  Continuous Availability (Fault Tolerance)
  •  Broad OS And Browser Support
  •  Mobile Device Support - "Mobile First"
  •  Low Cost (Systems, Maintenance, Integration)
  •  Write Once, Integrate "Anywhere"
  •  Gain Experience With Modern NoSQL Technologies
  •  REST Service-Based Architecture
  •  Model UI

6. Writing Space - Instructor

7. Writing Space - Student

8. Cassandra

9. What We Like
  •  Highly Scalable
  •  Easy Multi-Data Center Support
  •  Performance
  •  Distributed Ring Configuration (Master-less)
  •  Dynamic Schema, “Schema-less”
  •  Slice Queries
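
A slice query reads a contiguous range of columns from a single row, which works because Cassandra keeps a row's columns sorted by name. The sketch below models that behaviour in plain Python; the class and method names (`WideRow`, `insert`, `slice`) and the version-style column names are hypothetical illustrations, not the Cassandra or DataStax client API, and reversed slices are simplified.

```python
from bisect import bisect_left, bisect_right


class WideRow:
    """Toy model of one Cassandra row whose column names stay sorted,
    as a column comparator keeps them. Illustration only."""

    def __init__(self):
        self._names = []   # column names, kept in sorted order
        self._values = {}  # column name -> value

    def insert(self, name, value):
        # New column names are placed in sorted position; existing ones
        # are simply overwritten (last write wins).
        if name not in self._values:
            self._names.insert(bisect_left(self._names, name), name)
        self._values[name] = value

    def slice(self, start, finish, reversed_=False, count=None):
        """Return (name, value) pairs with start <= name <= finish."""
        lo = bisect_left(self._names, start)
        hi = bisect_right(self._names, finish)
        names = self._names[lo:hi]
        if reversed_:
            names = names[::-1]
        if count is not None:
            names = names[:count]
        return [(n, self._values[n]) for n in names]


# Hypothetical: one document's drafts stored as columns of a single row.
doc = WideRow()
for v in ["v0001", "v0002", "v0003", "v0004"]:
    doc.insert(v, f"draft {v}")

print(doc.slice("v0002", "v0003"))  # [('v0002', 'draft v0002'), ('v0003', 'draft v0003')]
print(doc.slice("v0000", "v9999", reversed_=True, count=1))  # [('v0004', 'draft v0004')]
```

Because the range lookup is two binary searches over sorted names, fetching "the latest version" or "versions 2 through 3" never scans the whole row.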

10. What Challenged Us
  •  Eventual / Tunable Consistency
  •  Key-Name-Value Data Store (Column Based)
  •  Data Modeling Based On Core Queries
  •  All Rows In A Column Family (CF) Typically Don’t Live On One Server
  •  However, All Columns For A Row Do
  •  RDBMS Mindset
  •  No Ad Hoc Queries
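
The rows-versus-columns point follows from how a master-less ring places data: the row key, not the column name, is hashed to a token, so different rows land on different servers while every column of one row lands on the same replica set. This is a simplified sketch, assuming an MD5 token (in the spirit of the older RandomPartitioner), evenly spaced node tokens, and SimpleStrategy-style placement on the next nodes clockwise; the `Ring` class and node names are hypothetical, and virtual nodes, racks, and data centers are ignored.

```python
import hashlib
from bisect import bisect_left


def token(row_key: str) -> int:
    # Hash the row key to a 128-bit token (MD5, like the older
    # RandomPartitioner; current Cassandra defaults to Murmur3Partitioner).
    return int.from_bytes(hashlib.md5(row_key.encode("utf-8")).digest(), "big")


class Ring:
    """Simplified token ring: no vnodes, racks, or data centers."""

    def __init__(self, node_tokens: dict[str, int], replication_factor: int):
        # Order nodes by the token each one is assigned on the ring.
        self._ring = sorted((t, node) for node, t in node_tokens.items())
        self._tokens = [t for t, _ in self._ring]
        self.rf = replication_factor

    def replicas(self, row_key: str) -> list[str]:
        """The first node whose token is >= the key's token owns the row
        (wrapping past the highest token back to the start); the next
        rf - 1 nodes clockwise hold the other replicas."""
        start = bisect_left(self._tokens, token(row_key)) % len(self._ring)
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(self.rf)]


# Hypothetical 4-node cluster with evenly spaced tokens and RF = 3.
SPACE = 2**128 // 4
ring = Ring({f"node{i}": i * SPACE for i in range(4)}, replication_factor=3)

# Different rows usually map to different replica sets, but every column
# written under one row key is stored on that key's replicas.
for key in ("essay:1", "essay:2", "essay:3"):
    print(key, ring.replicas(key))
```

Since placement depends only on the row key, a query that needs data from many rows may touch many servers, which is one reason models are built around core queries.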

11. What Is Consistency?
  •  Write Consistency: Number Of Replicas That Must Acknowledge A Write
  •  Read Consistency: Number Of Replicas That Must Answer A Read
  •  Replication Factor: Number Of Replicas Kept For Each Row
  •  Quorum Consistency Level (Read And Write):
      o  One Option When Specifying Read/Write Consistency
      o  floor(Replication_Factor / 2) + 1 Replicas, i.e. A Majority
      o  Ensures Strong Consistency When Used For Both Reads And Writes
      o  While Maintaining High Availability
  •  With 4 Servers, Writing Space Uses:
      o  Replication Factor = 3
      o  Read and Write Quorum Consistency
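
The quorum rule is integer arithmetic. With a replication factor of 3, QUORUM means 2 replicas; because a quorum read and a quorum write (2 + 2 = 4 > 3) must share at least one replica, a read sees the latest acknowledged write even while one replica is down. A minimal sketch of that arithmetic; the function names are hypothetical helpers, not part of any Cassandra API.

```python
def quorum(replication_factor: int) -> int:
    # Integer division rounds down, so this is a strict majority of replicas.
    return replication_factor // 2 + 1


def is_strongly_consistent(read_replicas: int, write_replicas: int,
                           replication_factor: int) -> bool:
    # Reads see the latest write when the read set and write set
    # are guaranteed to overlap in at least one replica.
    return read_replicas + write_replicas > replication_factor


def replicas_that_may_fail(level: int, replication_factor: int) -> int:
    # How many replicas can be unavailable while requests at `level` still succeed.
    return replication_factor - level


rf = 3                                     # Writing Space's replication factor
q = quorum(rf)
print(q)                                   # 2
print(is_strongly_consistent(q, q, rf))    # True
print(replicas_that_may_fail(q, rf))       # 1
```

By contrast, reading and writing at ONE (1 + 1 = 2, not greater than 3) gives only eventual consistency.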

12. What’s Not In Cassandra...
  Typical RDBMS Features Not Available (Yet):
  •  Referential Integrity Constraints / Foreign Keys
  •  Commit / Rollback
  •  Stored Procedures
  •  Joins
  •  Views
  •  Triggers
  •  Functions
  •  Security Privileges
  •  Rules
  •  Partitioned Table Definitions
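
Without joins, a common approach is to denormalize: write the same fact once per query the application must answer, keyed by that query's lookup value. A minimal in-memory sketch, assuming two hypothetical query-specific column families (`submissions_by_assignment`, `submissions_by_student`) with hypothetical IDs; it is not the Writing Space schema. Since there is no commit/rollback, the two writes are not one transaction, and this sketch does not handle a failure between them.

```python
from collections import defaultdict

# Hypothetical query-specific "column families": row key -> {column name: value}.
submissions_by_assignment = defaultdict(dict)
submissions_by_student = defaultdict(dict)


def save_submission(assignment_id: str, student_id: str, doc_id: str) -> None:
    """Write the same fact twice, once per query it must serve.
    Not atomic: a crash between the two writes leaves them out of sync."""
    submissions_by_assignment[assignment_id][student_id] = doc_id
    submissions_by_student[student_id][assignment_id] = doc_id


save_submission("essay-1", "alice", "doc-17")
save_submission("essay-1", "bob", "doc-18")
save_submission("essay-2", "alice", "doc-21")

# Each query is now a single-row read instead of a join.
print(submissions_by_assignment["essay-1"])  # {'alice': 'doc-17', 'bob': 'doc-18'}
print(submissions_by_student["alice"])       # {'essay-1': 'doc-17', 'essay-2': 'doc-21'}
```

The trade-off is extra storage and write work in exchange for reads that each hit one row.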

13. Cassandra In Writing Space

14. Document Versioning...

15. How We Modeled Our Data...
  •  Storage Strategy: Document-oriented
  [Diagram: data model with 1:M and 1:1 relationships]

16. The Writing Space DB Infrastructure

17. The Hardware
  •  Many Inexpensive Servers (Actually 4 + 1)
  •  Our Configuration:
      o  Processor: Xeon E5630, 2.53 GHz, 4 Cores
      o  Memory: 96 GB
      o  Storage:
          o  Two Mirrored Spinning Disks For OS / Binaries
          o  Three Striped 480 GB Solid State Drives (Providing 1.3 TB Local DB Storage)
  •  Peer to Peer Ring
  •  Hot Swappable - Fault Tolerant
  •  "What’s Your Insurance Company?"
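
The storage figure is simple arithmetic: three striped 480 GB drives give 1,440 GB per server, which is about 1.3 TiB in binary units. The sketch below also estimates how much unique data the ring could hold, assuming the 4 database servers and replication factor 3 from the consistency slide; it ignores compaction headroom, commit logs, and other overhead, so real usable capacity is lower.

```python
DRIVES_PER_NODE = 3
DRIVE_BYTES = 480 * 10**9   # a "480GB" SSD, in decimal gigabytes
NODES = 4                   # assumption: the 4 database servers
REPLICATION_FACTOR = 3

# RAID 0 striping adds capacities together but has no redundancy:
# losing any one SSD loses the whole striped volume on that node.
node_bytes = DRIVES_PER_NODE * DRIVE_BYTES

print(node_bytes / 10**12)            # 1.44  (decimal TB)
print(round(node_bytes / 2**40, 2))   # 1.31  (binary TiB, the "1.3 TB" figure)

# Each row is stored REPLICATION_FACTOR times, so the unique data the
# ring can hold is roughly total raw space / RF (before any overhead).
cluster_unique_bytes = NODES * node_bytes // REPLICATION_FACTOR
print(cluster_unique_bytes / 10**12)  # 1.92  (decimal TB)
```

Striping maximizes per-node capacity and throughput; Cassandra's replication, not RAID, is what keeps the data available when a node's volume fails.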

18. Why DataStax Cassandra?
  •  A Certified, Production Ready Version Of Cassandra
  •  24/7 World Class Support
  •  Integration With Hadoop
  •  Integration With Solr
  •  OpsCenter (Multi-Data Center Management Tool)

19. Performance
  •  Doc Store and UI
  •  Load: 3x Anticipated Load
  •  Total Time Of Run: 1.75 hours
  •  Max Document Size: 10k (25k, 50k and 75k DS)
  •  Results:
      o  Average Response Time: < 300 ms
      o  Maximum Running Vusers (Virtual Users): 684
      o  Total Throughput (bytes): 7,176,727,121
      o  Average Throughput (bytes/sec): 1,993,535
      o  Total Hits: 342,833
      o  Average Hits per Second: 95
      o  DB Server CPU < 0.3%

20. Performance
  •  Document Store Only
  •  Load: 100x Anticipated Load
  •  Total Time Of Run: 1 hour
  •  Document Size: 25k, 50k and 75k
  •  Results:
      o  Average Response Time: < 100 ms
      o  Maximum Running Vusers (Virtual Users): 2,200
      o  Total Throughput (bytes): 2,291,522,553
      o  Average Throughput (bytes/sec): 565,808
      o  Total Hits: 834,640
      o  Average Hits per Second: 206
      o  DB Server CPU < 1%

21. Wrapping It Up

22. Cloud Decision Points
  •  Cost Savings
  •  Continuous Availability
  •  Performance / Dynamic (Elastic) Scalability
  •  Global Distribution Of Access Points
  •  Redundancy
  •  Disaster Recovery
  •  Resiliency To Node / Connectivity Losses Is A Must

23. What Would We Do Differently?
  •  Think About Reporting Up Front
  •  Data Analytics – Hadoop and Solr Are Heavy Duty
  •  More Expensive Hardware?
  •  Different RAID Configuration (Not Striping)
  •  Get Training – Especially About Schema Design

24. Final Thoughts...
  Consider The Human Element...
  •  Mind Shift For RDBMS Folks
  •  Need To “Let Go” Of The Idea That Data Must Be Normalized
  •  Experience Of Operations Team
  •  Netflix - 4 People Managing 800+ Nodes
  Global Enterprise
  •  Global Presence
  •  Disaster Recovery
  •  Internet Scale

25. Writing Space and the Cassandra NoSQL DBMS
   Thank you!
   Questions?