DataStax


Published in: Technology

Transcript

  • 1. Extreme Data Velocity • Continuous Availability • Operational Simplicity. Michael Shaler, Senior Director, Business Development. ©2013 DataStax Confidential. Do not distribute without consent.
  • 2. What is Big Data’s payoff?
  • 3. DataStax: CRN’s “10 Coolest Big Data Startups” • Cassandra: InfoWorld’s Technology of the Year • 1,000+ production deployments and 300 customers • $84M in funding from industry-leading investors
  • 4. We are the first viable alternative to Oracle for modern online applications. We seek to be the first and best choice in databases.
  • 5. No, Seriously…
  • 6. Real-world Use Cases
  • 7. Internet of Things Database Requirements • “UTC subject predicate”: Time series data and metadata are the lingua franca of sensor/device data communications • FAST AND ALWAYS ON: High-velocity ingest rates from geographically dispersed inputs with variable schemas/data models are the norm, and unless you tell them to do so, sensors never, ever sleep… • HOT AND COLD: Real-time data and analytics needs differ from data reservoir/data factory needs. • DHTs: Wide-row, column-oriented distributed hash tables are the optimal home for IoT operational datastores • AND: Other key functionality needed includes indexed search, along with both batch and real-time analytics, with data-in-flight and data-at-rest security an emerging need • SPOILER ALERT: DataStax Enterprise supports all of the above
  • 8. Time Series Analytics: 70B readings. Smart Grid Proof of Concept: Analyze 2 years of Smart Meter data for 1M households. Improvements in demand forecasting could yield EBITDA > $100M per GW saved.
      • $5M CAPEX • 10 man-months delivery (Deploy, DevOps, Tuning) • Ongoing OPEX of > $1M
      • $450K OPEX • 2 DevOps running 15 AWS nodes • Faster performance in 2 weeks • …All in the cloud
  • 9. Major Changes: The Evolving Data Center [Diagram labels: LOB App (three instances); Data Warehouse; Oracle, MySQL, SQL Server, Teradata/Exadata; “What’s Happening?” (Hyper Velocity, Transactional); “What Happened?” (Massive Volume, Bit Bucket); NoSQL; Hadoop]
  • 10. The Application World *HAS* Changed
  • 11. Common Use Cases • Web product searches • Internal document search (law firms, etc.) • Real estate/property searches • Social media match ups • Web & application log management / analysis • Big data OLTP and write intensive systems • Time series data management • High velocity device data consumption and analysis • Healthcare systems input and analysis • Media streaming (music, movies, etc.) • Online Web retail (shopping carts, user transactions, etc.) • Online gaming (real-time messaging, etc.) • Real time data analytics • Social media input and analysis • Web click-stream analysis • Buyer event and behavior analytics • Fraud detection and analysis • Risk analysis and management • Supply chain analytics
  • 12. Continuous Availability Commentary
  • 13. Cassandra: Architecture as Foundation [Map: Virginia, Santa Clara, London, Sydney]
  • 14. The New DR: Simian Army “Dystopia as a Service” [Map: Virginia, London, Santa Clara, Sydney]
  • 15. Heterogeneous Workloads: Active Everywhere [Map: Virginia, London, Santa Clara, Sydney; workload labels: Read, Analyze, Write, Search]
  • 16. Our Product Solution • DataStax Enterprise powers the big data apps that transform business. • Extreme Data Velocity • Continuous Availability • Operational Simplicity
  • 17. Operational Simplicity • 33M streaming customers • 2T API calls/year • ~1,200 servers • 55 AWS clusters • 12 developers • 4 operators • 0 new data centers • “Our primary operational data store is now Cassandra, not Oracle.” ©2012 DataStax
  • 18. Performance: NoSQL Leadership. Cassandra vs. HBase: • 10x more read throughput • 100x faster read latency • 8x more write throughput • 8x faster scan latency • 4x more scan throughput. Source: “Solving Big Data Challenges for Enterprise Application Performance Management,” Tilmann Rabl, University of Toronto, et al., VLDB 2012 (August 2012, Istanbul)
  • 19. Performance: NoSQL Leadership [Charts: YCSB Load Process; YCSB Read-mostly; YCSB Read-write mix; YCSB Write-mostly]
  • 20. From STB to the Scalable Cloud Message Bus Use Case: X1 Sports App. Even in a preproduction environment prior to tuning, achieved near-linear scalability. [Chart: API/sec (axis 0 to 18,000) vs. ring size (4 to 24)] Enabling a richer active consumer experience across multiple devices, multiple platforms
  • 21. Instagram Scales Engaged Networks • Transitioned from Redis (in-memory cache) to Cassandra in Amazon Web Services EC2 • Doubled the cluster, and then doubled it again, to support 150MM users on new infrastructure • Continue to scale in spite of Justin Bieber storms, video formats, new features, new markets. [Embedded slide: CASSANDRA AT INSTAGRAM, Rick Branson, Infrastructure Engineer, @rbranson. commit acb02daea57dca889c2aa45963754a271fa51566 / Author: Rick Branson / Date: Sun Feb 10 20:36:34 2013 -0800 / Doubled C* cluster. 2013 Cassandra Summit, #cassandra13, June 12, 2013, San Francisco, CA]
  • 22. Our Vision: DataStax is driving Cassandra to be the first viable alternative to the Oracle database for companies who are transforming the way they interact with customers. Getting ahead of exploding growth: Sign big, new contracts all the time (ESPN) • 200M unique users per month • 40TB of data • Flexible architecture • “Couldn’t shoehorn RDBMS technology”. Very small operations team: • 3 people • 20 clusters • 100s of nodes
  • 23. Why We Exist Today’s applications must be always available and lightning fast as they scale to previously unimaginable levels. Cassandra delivers both with a beautifully simple and elegant architecture. “We need a real-time, massively scalable architecture, where no one node is a single point of failure, that can easily span multiple data centers and cloud availability zones, and that’s Cassandra.”
  • 24. What We Do Best: Cassandra was designed to do things that are impossible in other databases when it comes to availability and performance. Forget about losing a machine here or there: Cassandra delivers a world where you can lose an entire datacenter and still perform as your customers expect. “We have to be ready for disaster recovery all the time. It’s really great that Cassandra allows for active-active multiple data centers where we can read and write anywhere.” Jay Patel, Technical Architect at eBay (describing why they switched from a legacy relational architecture)
  • 25. The Modern “Application”
  • 26. The Modern “Application” Fraud Detection and Prevention
  • 27. What It Means In Real Life
  • 28. What It Means In Real Life
  • 29. Cassandra Summit SF 2013
  • 30. Real Growth In Production
  • 31. We are the first viable alternative to Oracle for modern online applications.
  • 32. Thank You. We power the big data apps that transform business.
  • 33. DataStax OpsCenter 4.0
  • 34. DataStax OpsCenter 4.0
  • 35. DataStax OpsCenter 4.0
  • 36. DataStax OpsCenter 4.0
  • 37. DataStax OpsCenter 4.0
  • 38. Security in Cassandra (features and benefits)
      • Internal Authentication: manages login IDs and passwords inside the database
          + Ensures only authorized users can access a database system using internal validation
          + Simple to implement and easy to understand
          + No learning curve from the relational world
      • Object Permission Management: controls who has access to what and who can do what in the database
          + Provides granular control over who can add/change/delete/read data
          + Uses familiar GRANT/REVOKE from relational systems
          + No learning curve
      • Client to Node Encryption: protects data in flight to and from a database cluster
          + Ensures data cannot be captured/stolen en route to a server
          + Data is safe both in flight from/to a database and on the database; complete coverage is ensured
  • 39. Advanced Security in DataStax Enterprise (features and benefits)
      • External Authentication: uses external security software packages to control security
          + Only authorized users have access to a database system using external validation
          + Uses most trusted external security packages (Kerberos, LDAP), mainstays in government and finance
          + Single sign on to all data domains
      • Transparent Data Encryption: encrypts data at rest
          + Protects sensitive data at rest from theft and from being read at the file system level
          + No changes needed at application level
          + Can encrypt both Cassandra and Hadoop data
      • Data Auditing: provides trail of who did and looked at what/when
          + Supplies admins with an audit trail of all accesses and changes
          + Granular control to audit only what’s needed
          + Uses log4j interface to ensure performance and efficient audit operations
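
Note on slide 7: the "wide-row" time-series layout the slide calls for can be pictured as one partition per sensor per day, with readings kept sorted by timestamp inside it. The sketch below is a toy in-memory model of that idea only. It is not Cassandra or DataStax code, and the class name, the (sensor, day) partitioning choice, and the sample meter ID are all assumptions made for illustration.

```python
from bisect import insort
from collections import defaultdict
from datetime import datetime, timezone


class WideRowStore:
    """Toy in-memory model of a wide-row time-series layout (hypothetical)."""

    def __init__(self):
        # Partition key (sensor_id, day) -> list of (timestamp, predicate, value),
        # kept sorted by timestamp so range reads scan readings in time order.
        self._partitions = defaultdict(list)

    def write(self, sensor_id, ts, predicate, value):
        """Append one reading to the sensor's partition for that UTC day."""
        key = (sensor_id, ts.date().isoformat())
        insort(self._partitions[key], (ts, predicate, value))

    def read_range(self, sensor_id, day, start, end):
        """Return readings with start <= timestamp < end from one partition."""
        rows = self._partitions.get((sensor_id, day), [])
        return [row for row in rows if start <= row[0] < end]


def at_hour(hour):
    """Helper: a UTC timestamp on an arbitrary sample day."""
    return datetime(2013, 6, 12, hour, 0, tzinfo=timezone.utc)


store = WideRowStore()
# Writes arrive out of order; the partition still ends up time-sorted.
store.write("meter-42", at_hour(10), "kwh", 1.5)
store.write("meter-42", at_hour(8), "kwh", 0.9)
store.write("meter-42", at_hour(9), "kwh", 1.1)

rows = store.read_range("meter-42", "2013-06-12", at_hour(8), at_hour(10))
print([value for _, _, value in rows])
```

Each reading carries a timestamp, a subject (the sensor) and a predicate (what was measured), which mirrors the "UTC subject predicate" shape named on the slide.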

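Note on slide 8: the slide does not say how the 70B figure is derived. One plausible reading, offered here as an assumption rather than the presenter's stated method, is one meter reading every 15 minutes for 1M households over 2 years. The arithmetic can be checked with a few lines:

```python
HOUSEHOLDS = 1_000_000      # from the slide: 1M households
YEARS = 2                   # from the slide: 2 years of smart meter data
DAYS_PER_YEAR = 365         # ignoring leap days
READINGS_PER_DAY = 24 * 4   # ASSUMPTION: one reading every 15 minutes

total_readings = HOUSEHOLDS * YEARS * DAYS_PER_YEAR * READINGS_PER_DAY
print(f"{total_readings:,}")  # prints 70,080,000,000
```

Under that assumed interval the total comes to roughly 70 billion, which matches the order of magnitude quoted on the slide.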
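
Note on slide 38: object permission management with GRANT/REVOKE amounts to keeping, for each user and resource, the set of operations that user may perform. The sketch below models only that bookkeeping in plain Python. It is not how Cassandra implements authorization, and the class, user, table, and permission names are hypothetical.

```python
from collections import defaultdict


class PermissionTable:
    """Toy model of GRANT/REVOKE bookkeeping (hypothetical, not Cassandra's)."""

    def __init__(self):
        # (user, resource) -> set of permission names currently granted.
        self._grants = defaultdict(set)

    def grant(self, user, permission, resource):
        """Record that `user` may perform `permission` on `resource`."""
        self._grants[(user, resource)].add(permission)

    def revoke(self, user, permission, resource):
        """Withdraw a permission; revoking one never granted is a no-op."""
        self._grants[(user, resource)].discard(permission)

    def is_allowed(self, user, permission, resource):
        """Deny by default: only explicitly granted permissions pass."""
        return permission in self._grants.get((user, resource), set())


acl = PermissionTable()
acl.grant("analyst", "SELECT", "readings")

can_read = acl.is_allowed("analyst", "SELECT", "readings")
can_write = acl.is_allowed("analyst", "MODIFY", "readings")

acl.revoke("analyst", "SELECT", "readings")
can_read_after_revoke = acl.is_allowed("analyst", "SELECT", "readings")

print(can_read, can_write, can_read_after_revoke)
```

The deny-by-default check is a design choice in this sketch: a user with no recorded grant is refused, which is the behaviour the "only authorized users" benefit on the slide implies.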