Writing Space and the Cassandra NoSQL DBMS
By: Brian King at Pearson Education

Writing Space and the Cassandra NoSQL DBMS Presentation Transcript

  • 1. Writing Space and the Cassandra NoSQL DBMS
      Brian King (with thanks to Michael Aillon)
  • 2. Writing Space
  • 3. Why A Writing Space?
      “Writing is one of the most effective tools available to develop a student's critical thinking.”
  • 4. The Business Needs
      • Efficient Administration Of Writing Assignments
      • Scalable Classrooms (500+)
      • Workflow Optimization / Automation
      • Integrated Access to Assessment Tools
          o Grammar Checking
          o Auto-Scoring
          o Plagiarism Detection (Source Check)
      • Grading Rubrics
      • Online Editing and Document Upload
      • Peer Review
      • Group Projects
  • 5. The Technical Goals
      • Highly "Internet" Scalable
      • Global Presence
      • Continuous Availability (Fault Tolerance)
      • Broad OS And Browser Support
      • Mobile Device Support - "Mobile First"
      • Low Cost (Systems, Maintenance, Integration)
      • Write Once, Integrate “Anywhere”
      • Gain Experience With Modern NoSQL Technologies
      • REST Service-Based Architecture
      • Model UI
  • 6. Writing Space - Instructor
  • 7. Writing Space - Student
  • 8. Cassandra
  • 9. What We Like
      • Highly Scalable
      • Easy Multi-Data Center Support
      • Performance
      • Distributed Ring Configuration (Master-less)
      • Dynamic Schema, “Schema-less”
      • Slice Queries
  • 10. What Challenged Us
      • Eventual / Tunable Consistency
      • Key-Name-Value Data Store (Column Based)
      • Data Modeling Based On Core Queries
      • All Rows in a CF (Column Family) Typically Don't Live On 1 Server
      • However, All Columns For a Row Do
      • RDBMS Mindset
      • No Ad Hoc Queries
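
To make the key-name-value model and slice queries from the two slides above concrete, here is a minimal in-memory sketch in Python. It is a toy, not Cassandra's API: the class `ToyColumnFamily`, its methods, and the `v0001`-style column names are all hypothetical and chosen only for illustration. It shows why keeping all columns of a row together, sorted by name, makes a range ("slice") over column names cheap.

```python
from bisect import bisect_left, bisect_right


class ToyColumnFamily:
    """Toy stand-in for a column family: row key -> columns sorted by name.

    Hypothetical illustration only; this is not how Cassandra is accessed.
    """

    def __init__(self):
        # row_key -> (sorted list of column names, dict of name -> value)
        self._rows = {}

    def insert(self, row_key, column_name, value):
        names, values = self._rows.setdefault(row_key, ([], {}))
        if column_name not in values:
            # Keep column names sorted so a slice is a contiguous range.
            names.insert(bisect_left(names, column_name), column_name)
        values[column_name] = value

    def slice(self, row_key, start, finish):
        """Return (name, value) pairs with start <= name <= finish."""
        names, values = self._rows.get(row_key, ([], {}))
        lo = bisect_left(names, start)
        hi = bisect_right(names, finish)
        return [(name, values[name]) for name in names[lo:hi]]


cf = ToyColumnFamily()
cf.insert("doc-42", "v0003", "third draft")
cf.insert("doc-42", "v0001", "first draft")
cf.insert("doc-42", "v0002", "second draft")

# One row key, one contiguous range of column names.
print(cf.slice("doc-42", "v0002", "v0003"))
```

Because the lookup is always "one row key, then a range of names", the set of supported queries has to be decided when the data is modeled, which is the "Data Modeling Based On Core Queries" point above.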
  • 11. What Is Consistency?
      • Write Consistency: Number Of Replicas Written To
      • Read Consistency: Number Of Replicas Queried
      • Replication Factor: Number Of Replicas For A Row
      • Quorum Consistency Level (Read And Write):
          o Option In Specifying Read/Write Consistency
          o (Replication_Factor / 2) + 1
          o Ensures Strong Consistency
          o While Maintaining High Availability
      • With 4 Servers, Writing Space uses:
          o Replication Factor = 3
          o Read and Write Quorum Consistency
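
The quorum arithmetic on this slide can be sketched in a few lines of Python. The function names are hypothetical. The division is taken as integer division (rounded down), and the overlap rule "read replicas + write replicas > replication factor" is the standard condition under which a read is guaranteed to touch at least one replica that received the latest write.

```python
def quorum(replication_factor: int) -> int:
    """Quorum size: (replication_factor / 2) + 1, with the division rounded down."""
    return replication_factor // 2 + 1


def read_overlaps_write(replication_factor: int, read_replicas: int, write_replicas: int) -> bool:
    """True when every read set must share at least one replica with every write set."""
    return read_replicas + write_replicas > replication_factor


RF = 3          # replication factor from the slide
Q = quorum(RF)  # replicas that must acknowledge a quorum read or write

print("quorum size:", Q)
print("quorum reads see quorum writes:", read_overlaps_write(RF, Q, Q))
print("replicas that can be down per row:", RF - Q)
print("ONE/ONE overlaps:", read_overlaps_write(RF, 1, 1))
```

With a replication factor of 3, a quorum is 2 replicas, so reads and writes keep working with one replica of a row unavailable. That is the trade-off the slide describes as strong consistency while maintaining high availability.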
  • 12. What's Not In Cassandra...
      Typical RDBMS Features Not Available (Yet):
      • Referential Integrity Constraints / Foreign Keys
      • Commit / Rollback
      • Stored Procedures
      • Joins
      • Views
      • Triggers
      • Functions
      • Security Privileges
      • Rules
      • Partitioned Table Definitions
  • 13. Cassandra in Writing Space
  • 14. Document Versioning...
  • 15. How We Modeled Our Data...
      Storage Strategy: Document-oriented
      (Relationship labels on the slide: 1:M, 1:1)
  • 16. The Writing Space DB Infrastructure
  • 17. The Hardware
      • Many Inexpensive Servers (Actually 4 + 1)
      • Our Configuration:
          o Processor: Xeon E5630, 2.53GHz, 4 Cores
          o Memory: 96 GB
          o Storage:
              Two Mirrored Spinning Disks For OS / Binaries
              Three Striped 480GB Solid State Drives
              (Providing 1.3 TB Local DB Storage)
      • Peer to Peer Ring
      • Hot Swappable - Fault Tolerant
      • "What's Your Insurance Company?"
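
As a rough back-of-the-envelope check of the storage figures on this slide, the Python sketch below multiplies out the drive sizes and applies the replication factor of 3 from the consistency slide. It assumes decimal gigabytes for the drives, a 4-node ring (ignoring the "+ 1" server), and no allowance for filesystem overhead or compaction headroom, so the results are upper bounds rather than the team's actual usable capacity.

```python
DRIVES_PER_NODE = 3            # three striped SSDs per server (from the slide)
DRIVE_BYTES = 480 * 10**9      # assumption: 480 GB in decimal gigabytes
NODES = 4                      # assumption: the 4-node ring, ignoring the "+ 1"
REPLICATION_FACTOR = 3         # from the consistency slide

raw_per_node = DRIVES_PER_NODE * DRIVE_BYTES
raw_per_node_tib = raw_per_node / 2**40       # same amount in binary terabytes
cluster_raw = NODES * raw_per_node
unique_data = cluster_raw / REPLICATION_FACTOR

print(f"raw per node: {raw_per_node / 10**12:.2f} TB ({raw_per_node_tib:.2f} TiB)")
print(f"raw across ring: {cluster_raw / 10**12:.2f} TB")
print(f"unique data at RF=3: {unique_data / 10**12:.2f} TB")
```

Striping adds the three drives together (1.44 TB decimal, about 1.31 in binary units, consistent with the slide's 1.3 TB), while storing every row three times divides the ring's raw capacity by three.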
  • 18. Why DataStax Cassandra?
      • A Certified, Production Ready Version Of Cassandra
      • 24/7 World Class Support
      • Integration With Hadoop
      • Integration With Solr
      • OpsCenter (Multi-Data Center Management Tool)
  • 19. Performance
      • Doc Store and UI
      • Load: 3x Anticipated Load
      • Total Time Of Run: 1.75 hours
      • Max Document Size: 10k (25k, 50k and 75k DS)
      Results
          o Average Response Time: < 300ms
          o Maximum Running Vusers: 684
          o Total Throughput (bytes): 7,176,727,121
          o Average Throughput (bytes/sec): 1,993,535
          o Total Hits: 342,833
          o Average Hits per Second: 95
          o DB Server CPU < 0.3%
  • 20. Performance
      • Document Store only
      • Load: 100x Anticipated Load
      • Total Time Of Run: 1 hour
      • Document Size: 25k, 50k and 75k
      Results
          o Average Response Time: < 100ms
          o Maximum Running Vusers: 2,200
          o Total Throughput (bytes): 2,291,522,553
          o Average Throughput (bytes/sec): 565,808
          o Total Hits: 834,640
          o Average Hits per Second: 206
          o DB Server CPU < 1%
  • 21. Wrapping It Up
  • 22. Cloud Decision Points
      • Cost Savings
      • Continuous Availability
      • Performance / Dynamic (Elastic) Scalability
      • Global Distribution Of Access Points
      • Redundancy
      • Disaster Recovery
      • Resiliency To Node / Connectivity Losses A Must
  • 23. What Would We Do Differently?
      • Think About Reporting Up Front
      • Data Analytics – Hadoop and Solr Are Heavy Duty
      • More Expensive Hardware?
      • Different RAID Configuration (Not Striping)
      • Get Training – Especially About Schema Design
  • 24. Final Thoughts...
      Consider The Human Element...
      • Mind Shift For RDBMS Folks
      • Need To “Let Go” Of The Idea That Data Must Be Normalized
      • Experience Of Operations Team
      • Netflix - 4 People Managing 800+ Nodes
      Global Enterprise
      • Global Presence
      • Disaster Recovery
      • Internet Scale
  • 25. Writing Space and the Cassandra NoSQL DBMS
      Thank you! Questions?
      Brian.King@Pearson.com