Writing Space and the Cassandra NoSQL DBMS

By: Brian King at Pearson Education

Published in: Technology

Transcript

  • 1. Writing Space and the Cassandra NoSQL DBMS
      Brian King (with thanks to Michael Aillon)
  • 2. Writing Space
  • 3. Why A Writing Space?
      “Writing is one of the most effective tools available to develop a student’s critical thinking.”
  • 4. The Business Needs
      •  Efficient Administration Of Writing Assignments
      •  Scalable Classrooms (500+)
      •  Workflow Optimization / Automation
      •  Integrated Access to Assessment Tools
          o  Grammar Checking
          o  Auto-Scoring
          o  Plagiarism Detection (Source Check)
      •  Grading Rubrics
      •  Online Editing and Document Upload
      •  Peer Review
      •  Group Projects
  • 5. The Technical Goals
      •  Highly "Internet" Scalable
      •  Global Presence
      •  Continuous Availability (Fault Tolerance)
      •  Broad OS And Browser Support
      •  Mobile Device Support - "Mobile First"
      •  Low Cost (Systems, Maintenance, Integration)
      •  Write Once, Integrate “Anywhere”
      •  Gain Experience With Modern NoSQL Technologies
      •  REST Service-Based Architecture
      •  Model UI
  • 6. Writing Space - Instructor
  • 7. Writing Space - Student
  • 8. Cassandra
  • 9. What We Like
      •  Highly Scalable
      •  Easy Multi-Data Center Support
      •  Performance
      •  Distributed Ring Configuration (Master-less)
      •  Dynamic Schema, “Schema-less”
      •  Slice Queries
  • 10. What Challenged Us
      •  Eventual / Tunable Consistency
      •  Key-Name-Value Data Store (Column Based)
      •  Data Modeling Based On Core Queries
      •  All Rows in a Column Family (CF) Typically Don't Live On 1 Server
      •  However, All Columns For a Row Do
      •  RDBMS Mindset
      •  No Ad Hoc Queries
  • 11. What Is Consistency?
      •  Write Consistency: Number Of Replicas Written To
      •  Read Consistency: Number Of Replicas Queried
      •  Replication Factor: Number Of Replicas For A Row
      •  Quorum Consistency Level (Read And Write):
          o  Option In Specifying Read/Write Consistency
          o  floor(Replication_Factor / 2) + 1 Replicas
          o  Quorum Reads Plus Quorum Writes Ensure Strong Consistency (Read Replicas + Write Replicas > Replication Factor)
          o  While Maintaining High Availability
      •  With 4 Servers, Writing Space uses:
          o  Replication Factor = 3 (So Quorum = 2)
          o  Read and Write Quorum Consistency
  • 12. What's Not In Cassandra...
      Typical RDBMS Features Not Available (Yet):
      •  Referential Integrity Constraints / Foreign Keys
      •  Commit / Rollback
      •  Stored Procedures
      •  Joins
      •  Views
      •  Triggers
      •  Functions
      •  Security Privileges
      •  Rules
      •  Partitioned Table Definitions
  • 13. Cassandra In Writing Space
  • 14. Document Versioning...
  • 15. How We Modeled Our Data...
      •  Storage Strategy: Document-oriented
      [Diagram: document relationships labeled 1:M and 1:1]
  • 16. The Writing Space DB Infrastructure
  • 17. The Hardware
      •  Many Inexpensive Servers (Actually 4 + 1)
      •  Our Configuration:
          o  Processor: Xeon E5630, 2.53 GHz, 4 Cores
          o  Memory: 96 GB
          o  Storage:
              Two Mirrored Spinning Disks For OS / Binaries
              Three Striped 480 GB Solid State Drives (Providing 1.3 TB Local DB Storage)
      •  Peer to Peer Ring
      •  Hot Swappable - Fault Tolerant
      •  "What's Your Insurance Company?"
  • 18. Why DataStax Cassandra?
      •  A Certified, Production Ready Version Of Cassandra
      •  24/7 World Class Support
      •  Integration With Hadoop
      •  Integration With Solr
      •  OpsCenter (Multi-Data Center Management Tool)
  • 19. Performance
      •  Doc Store and UI
      •  Load: 3x Anticipated Load
      •  Total Time Of Run: 1.75 hours
      •  Max Document Size: 10k (25k, 50k and 75k DS)
      Results
          o  Average Response Time: < 300 ms
          o  Maximum Running Vusers: 684
          o  Total Throughput (bytes): 7,176,727,121
          o  Average Throughput (bytes/sec): 1,993,535
          o  Total Hits: 342,833
          o  Average Hits per Second: 95
          o  DB Server CPU < 0.3%
  • 20. Performance
      •  Document Store only
      •  Load: 100x Anticipated Load
      •  Total Time Of Run: 1 hour
      •  Document Size: 25k, 50k and 75k
      Results
          o  Average Response Time: < 100 ms
          o  Maximum Running Vusers: 2,200
          o  Total Throughput (bytes): 2,291,522,553
          o  Average Throughput (bytes/sec): 565,808
          o  Total Hits: 834,640
          o  Average Hits per Second: 206
          o  DB Server CPU < 1%
  • 21. Wrapping It Up
  • 22. Cloud Decision Points
      •  Cost Savings
      •  Continuous Availability
      •  Performance / Dynamic (Elastic) Scalability
      •  Global Distribution Of Access Points
      •  Redundancy
      •  Disaster Recovery
      •  Resiliency To Node / Connectivity Losses A Must
  • 23. What Would We Do Differently?
      •  Think About Reporting Up Front
      •  Data Analytics: Hadoop and Solr Are Heavy Duty
      •  More Expensive Hardware?
      •  Different RAID Configuration (Not Striping)
      •  Get Training, Especially About Schema Design
  • 24. Consider The Human Element...
      •  Mind Shift For RDBMS Folks
      •  Need To “Let Go” Of The Idea That Data Must Be Normalized
      •  Experience Of Operations Team
      •  Netflix: 4 People Managing 800+ Nodes
      Final Thoughts...
      Global Enterprise:
      •  Global Presence
      •  Disaster Recovery
      •  Internet Scale
  • 25. Writing Space and the Cassandra NoSQL DBMS
      Thank you!
      Questions?
      Brian.King@Pearson.com
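
Slide 9 lists slice queries among the things the team liked. Because Cassandra keeps the columns of a row sorted by column name, a slice (a contiguous range of column names within one row) is a cheap, ordered read. This Python sketch models a single row in memory to show the idea; the `WideRow` class, its `slice` method, and the one-row-per-student layout are hypothetical illustrations, not Writing Space's schema or a Cassandra client API.

```python
from bisect import bisect_left, bisect_right


class WideRow:
    """Hypothetical in-memory stand-in for one Cassandra row.

    Column names are kept sorted, mirroring how Cassandra stores a row's
    columns in comparator order; that ordering makes range reads cheap.
    """

    def __init__(self, columns):
        self.values = dict(columns)       # column name -> value
        self.names = sorted(self.values)  # sorted column names

    def slice(self, start=None, finish=None, count=100, reverse=False):
        """Return up to `count` (name, value) pairs with start <= name <= finish.

        None means "unbounded" on that side. `reverse` returns matching
        columns from the high end first (simplified: bounds are not swapped).
        """
        lo = 0 if start is None else bisect_left(self.names, start)
        hi = len(self.names) if finish is None else bisect_right(self.names, finish)
        names = self.names[lo:hi]
        if reverse:
            names = names[::-1]
        return [(name, self.values[name]) for name in names[:count]]


# Hypothetical layout: one row per student, one column per submission date.
row = WideRow({
    "2013-01-15": "essay-1",
    "2013-02-10": "essay-2",
    "2013-03-05": "essay-3",
    "2013-04-20": "essay-4",
})
print(row.slice("2013-02-01", "2013-03-31"))
# [('2013-02-10', 'essay-2'), ('2013-03-05', 'essay-3')]
```

Calling `row.slice(count=1, reverse=True)` returns only the newest column, which is the usual way to ask for "the latest item" in a time-ordered row.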
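
Slide 10's point that the rows of a column family are spread across servers, while all columns of one row stay together, follows from how the master-less ring of slide 9 places data: the row key is hashed to a token, and that token alone chooses the replicas. This is a minimal sketch assuming evenly spaced node tokens, an MD5-based partitioner in the spirit of Cassandra's RandomPartitioner, and SimpleStrategy-style placement on the next nodes clockwise; `token`, `Ring`, and the node names are hypothetical.

```python
import hashlib
from bisect import bisect_left

RING_SIZE = 2 ** 127  # token space, as with RandomPartitioner-style hashing


def token(row_key: str) -> int:
    """Simplified partitioner: MD5 of the row key folded into the token range."""
    digest = hashlib.md5(row_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % RING_SIZE


class Ring:
    """Hypothetical master-less ring with evenly spaced tokens, one per node."""

    def __init__(self, nodes, replication_factor):
        self.nodes = list(nodes)
        step = RING_SIZE // len(self.nodes)
        self.tokens = [i * step for i in range(len(self.nodes))]
        self.rf = replication_factor

    def replicas(self, row_key: str):
        """Nodes that hold the entire row (every one of its columns).

        The first replica is the first node whose token is >= the key's token,
        wrapping around the ring; the others are the next rf - 1 nodes clockwise.
        """
        first = bisect_left(self.tokens, token(row_key)) % len(self.nodes)
        return [self.nodes[(first + k) % len(self.nodes)] for k in range(self.rf)]


ring = Ring(["node1", "node2", "node3", "node4"], replication_factor=3)
print(ring.replicas("doc-42"))  # three distinct nodes holding all of row "doc-42"
```

With four nodes and a replication factor of 3, each row lands on three of the four servers, so losing any single node still leaves at least two replicas of every row.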
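
Slide 11's quorum formula uses integer division, so with Writing Space's replication factor of 3 a quorum is 2 replicas. Quorum reads plus quorum writes give strong consistency because 2 + 2 > 3: every read set overlaps the most recent write set in at least one replica. The function names in this sketch are illustrative and not part of any Cassandra driver.

```python
def quorum(replication_factor: int) -> int:
    """Replicas needed for QUORUM: floor(replication_factor / 2) + 1."""
    return replication_factor // 2 + 1


def reads_see_latest_write(read_replicas: int, write_replicas: int,
                           replication_factor: int) -> bool:
    """True when every read set must overlap every write set (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor


def replicas_that_may_fail(replication_factor: int, replicas_needed: int) -> int:
    """How many replicas of a row can be down while requests still succeed."""
    return replication_factor - replicas_needed


RF = 3  # replication factor from the slide
q = quorum(RF)
print(q, reads_see_latest_write(q, q, RF), replicas_that_may_fail(RF, q))  # 2 True 1
```

By contrast, reading and writing at a consistency level of one replica each (1 + 1 = 2, not greater than 3) trades that guarantee for lower latency, which is the "tunable" part of slide 10's eventual / tunable consistency.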
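
Slide 12 notes that Cassandra has no joins, views, or commit/rollback, and slide 10 says data modeling starts from the core queries. A common response is to denormalize: write each fact once per query that needs it, so every read is a single-row lookup. This sketch uses plain Python dictionaries in place of column families; the two table names are hypothetical examples modeled on the instructor and student views, not Writing Space's actual schema.

```python
from collections import defaultdict

# Hypothetical query tables, one per core query, keyed by what the query looks up.
submissions_by_assignment = defaultdict(dict)  # assignment_id -> {student_id: doc_id}
submissions_by_student = defaultdict(dict)     # student_id -> {assignment_id: doc_id}


def record_submission(assignment_id: str, student_id: str, doc_id: str) -> None:
    """Write the same fact into every query table (denormalization instead of a join).

    There is no commit/rollback across the two writes. Each write is an upsert
    that overwrites the same cell, so the application can safely retry the
    whole function if either write fails.
    """
    submissions_by_assignment[assignment_id][student_id] = doc_id
    submissions_by_student[student_id][assignment_id] = doc_id


record_submission("essay-1", "alice", "doc-101")
record_submission("essay-1", "bob", "doc-102")
record_submission("essay-2", "alice", "doc-103")

# Each screen is now a single-row read, with no join at query time.
print(submissions_by_assignment["essay-1"])  # instructor view of one assignment
print(submissions_by_student["alice"])       # student view of their own work
```

The cost is duplicated data and more writes; the benefit is that reads never need a join, which Cassandra does not provide.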
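
Slides 14 and 15 mention document versioning and a document-oriented storage strategy with 1:1 and 1:M relationships, but the transcript does not include the schema itself. One hypothetical way to combine the two ideas is a row per document with one column per version, each column holding the whole document (including its embedded 1:1 and 1:M data) as JSON. The layout and the `save_version` and `load` helpers are assumptions for illustration only.

```python
import json

# Hypothetical document-oriented layout: row key = document id, one column per
# version, column value = the whole document as JSON (1:1 and 1:M data embedded).
documents = {}  # doc_id -> {version_number: json_text}


def save_version(doc_id: str, document: dict) -> int:
    """Append a new version column and return its version number."""
    versions = documents.setdefault(doc_id, {})
    version = max(versions, default=0) + 1
    versions[version] = json.dumps(document)
    return version


def load(doc_id: str, version=None) -> dict:
    """Read one version of a document; defaults to the latest."""
    versions = documents[doc_id]
    if version is None:
        version = max(versions)
    return json.loads(versions[version])


save_version("essay-42", {"title": "Draft", "rubric": {"points": 10}, "comments": []})
save_version("essay-42", {"title": "Final", "rubric": {"points": 10},
                          "comments": [{"by": "peer-1", "text": "Clear thesis"}]})
print(load("essay-42")["title"], load("essay-42", 1)["title"])  # Final Draft
```

Computing the next version number by reading the current maximum is only safe with a single writer. Since Cassandra offers no transactions, a real schema would more likely use a time-based (TimeUUID) column name so that concurrent saves cannot collide.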
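
The storage figure on slide 17 can be checked with a little arithmetic: three striped 480 GB drives give 1.44 × 10¹² bytes, about 1.31 TiB, which matches the quoted 1.3 TB. Combined with the replication factor of 3 from slide 11, that bounds how much unique data the ring can hold. The sketch assumes the four servers from slide 11 are the ring members and that the "+1" server does not store Cassandra data.

```python
DRIVES_PER_NODE = 3
DRIVE_BYTES = 480 * 10**9   # 480 GB, decimal units as drives are sold
NODES = 4                   # assumption: the four ring members (the "+1" left out)
REPLICATION_FACTOR = 3      # from slide 11

# Striping (RAID 0) keeps the full capacity of all three drives.
raw_per_node_tib = DRIVES_PER_NODE * DRIVE_BYTES / 2**40
# Every row is stored REPLICATION_FACTOR times across the ring.
unique_data_tib = raw_per_node_tib * NODES / REPLICATION_FACTOR

print(f"{raw_per_node_tib:.2f} TiB per node")       # 1.31 TiB per node
print(f"{unique_data_tib:.2f} TiB of unique data")  # 1.75 TiB of unique data
```

This is an upper bound: compaction temporarily needs free disk space, so the usable capacity is noticeably lower than the raw figure.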