Why A Writing Space?
“Writing is one of the most effective tools available to develop a student's critical thinking.”
The Business Needs
• Efficient Administration Of Writing Assignments
• Scalable Classrooms (500+)
• Workflow Optimization / Automation
• Integrated Access to Assessment Tools
  o Grammar Checking
  o Auto-Scoring
  o Plagiarism Detection (Source Check)
• Grading Rubrics
• Online Editing and Document Upload
• Peer Review
• Group Projects
The Technical Goals
• Highly "Internet" Scalable
• Global Presence
• Continuous Availability (Fault Tolerance)
• Broad OS And Browser Support
• Mobile Device Support - "Mobile First"
• Low Cost (Systems, Maintenance, Integration)
• Write Once, Integrate “Anywhere”
• Gain Experience With Modern NoSQL Technologies
• REST Service-Based Architecture
• Model UI
What We Like
• Highly Scalable
• Easy Multi-Data Center Support
• Performance
• Distributed Ring Configuration (Master-less)
• Dynamic Schema, “Schema-less”
• Slice Queries
What Challenged Us
• Eventual / Tunable Consistency
• Key-Name-Value Data Store (Column Based)
• Data Modeling Based On Core Queries
• All Rows in a CF Typically Don't Live On 1 Server
• However, All Columns For a Row Do
• RDBMS Mindset
• No Ad Hoc Queries
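As a conceptual aid for the data-model points above, here is a minimal in-memory sketch of a column family. Each row key maps to columns kept sorted by name, all columns of a row sit on one node, and rows are spread across nodes by hashing the row key. Everything in it (the ToyColumnFamily class, the 4-node default, the md5-based placement, the sample keys and values) is a hypothetical illustration; it is not Cassandra's API or its actual partitioner.

```python
import bisect
import hashlib


class ToyColumnFamily:
    """Toy model only: row key -> columns sorted by name, one node per row."""

    def __init__(self, node_count: int = 4):
        # One dict per node: row_key -> (sorted column names, values by name)
        self.nodes = [dict() for _ in range(node_count)]

    def _node_for(self, row_key: str) -> int:
        # Assumed placement rule: hash the row key, so different rows
        # usually land on different nodes.
        digest = hashlib.md5(row_key.encode("utf-8")).hexdigest()
        return int(digest, 16) % len(self.nodes)

    def insert(self, row_key: str, name: str, value) -> None:
        node = self.nodes[self._node_for(row_key)]
        names, values = node.setdefault(row_key, ([], {}))
        if name not in values:
            bisect.insort(names, name)  # keep column names ordered
        values[name] = value

    def slice(self, row_key: str, start: str, end: str):
        """Return (name, value) pairs with start <= name <= end, in name order."""
        node = self.nodes[self._node_for(row_key)]
        names, values = node.get(row_key, ([], {}))
        lo = bisect.bisect_left(names, start)
        hi = bisect.bisect_right(names, end)
        return [(n, values[n]) for n in names[lo:hi]]


# Hypothetical row designed around one core query: "all essays for a student".
cf = ToyColumnFamily()
cf.insert("student:42", "essay:2013-01-10", "draft 1")
cf.insert("student:42", "essay:2013-02-03", "draft 2")
cf.insert("student:42", "profile:name", "Ada")
print(cf.slice("student:42", "essay:", "essay:~"))
# [('essay:2013-01-10', 'draft 1'), ('essay:2013-02-03', 'draft 2')]
```

The slice only works because the column names were chosen up front to match the query, which is why there is no equivalent of an ad hoc query in this model.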
What Is Consistency?
• Write Consistency: Number Of Replicas Written To
• Read Consistency: Number Of Replicas Queried
• Replication Factor: Number Of Replicas For A Row
• Quorum Consistency Level (Read And Write):
  o Option In Specifying Read/Write Consistency
  o (Replication_Factor / 2) + 1, Rounded Down
  o Ensures Strong Consistency
  o While Maintaining High Availability
• With 4 Servers, Writing Space uses:
  o Replication Factor = 3
  o Read and Write Quorum Consistency
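The quorum arithmetic above can be checked with a short sketch. The function names are hypothetical and this is not a Cassandra driver API; it assumes the division in the quorum formula is integer (floor) division, and it uses the common rule that reads see the latest write when read replicas plus write replicas exceed the replication factor.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must acknowledge a QUORUM read or write."""
    return replication_factor // 2 + 1


def is_strongly_consistent(read_replicas: int, write_replicas: int,
                           replication_factor: int) -> bool:
    """Every read set overlaps the latest write set when R + W > RF."""
    return read_replicas + write_replicas > replication_factor


def tolerated_replica_failures(replication_factor: int) -> int:
    """Replicas of a row that can be down while QUORUM still succeeds."""
    return replication_factor - quorum(replication_factor)


rf = 3  # the replication factor stated for Writing Space
q = quorum(rf)
print(q)                                  # 2
print(is_strongly_consistent(q, q, rf))   # True
print(tolerated_replica_failures(rf))     # 1
```

With a replication factor of 3, quorum reads and writes each touch 2 replicas, so one replica of any row can be unavailable without failing requests or returning stale data.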
What's Not In Cassandra...
Typical RDBMS Features Not Available (Yet):
• Referential Integrity Constraints / Foreign Keys
• Commit / Rollback
• Stored Procedures
• Joins
• Views
• Triggers
• Functions
• Security Privileges
• Rules
• Partitioned Table Definitions
The Hardware
• Many Inexpensive Servers (Actually 4 + 1)
• Our Configuration:
  Processor: Xeon E5630, 2.53GHz, 4 Cores
  Memory: 96 GB
  Storage:
    Two Mirrored Spinning Disks For OS / Binaries
    Three Striped 480GB Solid State Drives
    (Providing 1.3 TB Local DB Storage)
• Peer to Peer Ring
• Hot Swappable - Fault Tolerant
• "What's Your Insurance Company?"
Why DataStax Cassandra?
• A Certified, Production Ready Version Of Cassandra
• 24/7 World Class Support
• Integration With Hadoop
• Integration With Solr
• OpsCenter (Multi-Data Center Management Tool)
Performance
• Doc Store and UI
• Load: 3x Anticipated Load
• Total Time Of Run: 1.75 hours
• Max Document Size: 10k (25k, 50k and 75k DS)
Results
Average Response Time: < 300ms
Maximum Running Vusers: 684
Total Throughput (bytes): 7,176,727,121
Average Throughput (bytes/sec): 1,993,535
Total Hits: 342,833
Average Hits per Second: 95
DB Server CPU < 0.3%
Performance
• Document Store only
• Load: 100x Anticipated Load
• Total Time Of Run: 1 hour
• Document Size: 25k, 50k and 75k
Results
Average Response Time: < 100ms
Maximum Running Vusers: 2,200
Total Throughput (bytes): 2,291,522,553
Average Throughput (bytes/sec): 565,808
Total Hits: 834,640
Average Hits per Second: 206
DB Server CPU < 1%
Cloud Decision Points
• Cost Savings
• Continuous Availability
• Performance / Dynamic (Elastic) Scalability
• Global Distribution Of Access Points
• Redundancy
• Disaster Recovery
• Resiliency To Node / Connectivity Losses A Must
What Would We Do Differently?
• Think About Reporting Up Front
• Data Analytics – Hadoop and Solr Are Heavy Duty
• More Expensive Hardware?
• Different RAID Configuration (Not Striping)
• Get Training – Especially About Schema Design
Final Thoughts...
Consider The Human Element...
• Mind Shift For RDBMS Folks
• Need To “Let Go” That Data Needs To Be Normalized
• Experience Of Operations Team
• Netflix - 4 People Managing 800+ Nodes
Global Enterprise
• Global Presence
• Disaster Recovery
• Internet Scale