Webinar: Dyn + DataStax - helping companies deliver exceptional end-user experience

Dyn delivers exceptional Internet Performance. Delivering high-quality services requires data centers around the globe, and to manage those services customers need timely insight collected from all over the world. Dyn uses DataStax Enterprise (DSE) to deploy complex clusters across multiple data centers, enabling sub-50 ms query responses over hundreds of billions of data points. From granular DNS traffic data to aggregated counts for a variety of report dimensions, DSE at Dyn has been up since 2013 and has shone through upgrades, data center migrations, DDoS attacks, and hardware failures. In this webinar, Principal Engineers Tim Chadwick and Rick Bross cover the requirements that led them to choose DSE as their go-to Big Data solution, the path that led to Spark, and the lessons they have learned in the process.

Published in: Technology

  1. 1. Dyn + DataStax: Helping Companies Deliver Exceptional End-User Experience. May 17, 2016. Tim Chadwick, Principal Engineer, Infrastructure, Dyn; Rick Bross, Principal Engineer, Scalability, Dyn
  2. 2. Journey to DataStax Enterprise: The Story at Dyn; The Road to Production; Lessons and Direction
  3. 3. The Story at Dyn: Dyn is a cloud-based Internet Performance Management (IPM) company that provides unrivaled visibility and control into cloud and public Internet resources. Dyn’s platform monitors, controls and optimizes applications and infrastructure through Data, Analytics, and Traffic Steering, ensuring traffic gets delivered faster, safer, and more reliably than ever. http://techcrunch.com/2016/05/10/dyn-series-b/
  4. 4. DNS Overview
  5. 5. Dyn Global: 20+ Data Centers
  6. 6. Who Needs These Data?
     Build a sustainable system that can track usage by customer and zone (domain). The consumers are our customers, our billing department, and Chris Baker.
     tchadwick@piedmont:~$ dig SOA ifc.com | grep -A 1 "ANSWER SECT"
     ;; ANSWER SECTION:
     ifc.com. 7175 IN SOA ns1.p28.dynect.net. postmaster.ifc.com. 2016042900 3600 600 604800 1800
  7. 7. Traffic Telemetry
     For each five-minute interval of an invoice period, determine the queries per second (QPS), then sort the intervals in descending order. Discard the top 5%; the maximum remaining value is the customer’s 95th percentile, or monthly bill rate.
     http://dyn.com/blog/the-95th-percentile-burstable-billing-model-managed-dns/
     https://en.wikipedia.org/wiki/Burstable_billing#95th_percentile
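The 95th-percentile rule on slide 7 can be sketched in a few lines of Python. This is a minimal illustration, not Dyn's billing code: the function name, the input shape (one query count per five-minute interval), and the choice to round the number of discarded intervals down are all assumptions.

```python
INTERVAL_SECONDS = 5 * 60  # each sample covers one five-minute interval


def billing_rate_95th(query_counts):
    """Return the 95th-percentile QPS for one invoice period.

    query_counts: total queries observed in each five-minute interval
    (hypothetical input shape).
    """
    if not query_counts:
        raise ValueError("need at least one interval")
    # Convert each interval's count to QPS and sort in descending order.
    qps = sorted((count / INTERVAL_SECONDS for count in query_counts), reverse=True)
    # Discard the top 5% of intervals (rounded down, an assumption);
    # integer arithmetic avoids floating-point surprises.
    discard = len(qps) * 5 // 100
    # The largest remaining value is the bill rate.
    return qps[discard]
```

With 100 intervals whose rates are 1 to 100 QPS, the five busiest intervals are dropped and the function returns 95.0. A single traffic spike inside the discarded 5% does not affect the result.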
  8. 8. Priorities that Led to DataStax Enterprise
     1. Operations
        - Flexible Topology
        - Resilient Clusters
        - Visibility and Administration
     2. Data Model
        - Idempotent Writes
        - Low Concurrency
        - Application Redundancy
     Oh, and it must perform well.
     Benchmarking Cassandra Scalability on AWS - Over a million writes per second
  9. 9. Consult the Experts
  10. 10. Oh Baby!
  11. 11. One sec, new priority...
  12. 12. Fidelity Customer Enterprise Requirement Ahead!
  13. 13. Detailed DNS Query Log - DSE Cluster
      Sunnyvale (USSNN1), North Bergen (USNBN1)
      CREATE KEYSPACE qld WITH replication = { 'class': 'NetworkTopologyStrategy', ... };
      USE qld;
      CREATE TABLE qld_logs (
        key text,
        row_seq bigint,
        logline text,
        PRIMARY KEY ((key), row_seq)
      ) WITH COMPACT STORAGE AND ... compaction={'class': 'SizeTieredCompactionStrategy'} AND ...
  14. 14. Success!
  15. 15. Back to our original goal....
  16. 16. I Want More From You....
      • Customers
      • Zones (Domains)
      • Zone Record Types
      • Fully Qualified Domain Names (qnames)
      • Regions (ANYCAST)
      • Data Centers
      • Nameservers
      • “Top 10s”
      Many, many more customers. Many, many more dimensions.
  17. 17. DataStax Enterprise Provided the Tools
  18. 18. The Working Solution
      North Bergen, Sunnyvale
      CREATE TABLE "QueryCountSummaryCF"
      CREATE TABLE "QueryZoneCountCF"
      CREATE TABLE "QueryHostCountCF"
      CREATE TABLE "QueryCountSummaryRollupsCF"
      CREATE TABLE "QueryZoneCountRollupsCF"
      CREATE TABLE "QueryHostCountRollupsCF"
      CREATE TABLE "QueryPlatformCountCF"
      WITH DEFAULT_TIME_TO_LIVE = 31536000;  -- 365 days
      WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 2 };
  19. 19. Factoids
      Throughput:
      ○ > 12k writes/s
      ○ 99th percentile < 5 ms
      ○ Avg read latency < 10 ms
      Size:
      ○ 200 GB -> 1.2 TB, steady
      ○ ~12B data points
  20. 20. How DataStax Enterprise Provided Value
      ● Support in Every Phase
        ○ Proof of Concept
        ○ Design
        ○ Operations
        ○ Optimization
      ● Integrated Toolkit
        ○ OpsCenter
        ○ Spark
      We get the value of many, many people at the cost of about 1/2 FTE.
  21. 21. Lessons Learned
  22. 22. Top Lessons Learned
      1. Include all teams in planning, deployment and implementation.
      2. Consult knowledgeable people before making decisions and “optimizations”.
      3. Understand compaction strategies to immediately eliminate those that are not a fit.
      4. Ensure that client load balancing policies and consistency levels match DC topology and schema replication factors.
      5. Model and understand all failure scenarios.
      6. Use Spark to aggregate data in order to save storage and improve performance.
  23. 23. #1: Include all teams
      ● Product management
      ● Application engineering
      ● DBAs
      ● Operations
      ● Network engineering
      ● System engineering
      ● Finance and Management
  24. 24. #2: Consult knowledgeable people . . .
      ● Schema
      ● Cluster topology and tuning
      ● Tuning
      ● Compaction algorithms
      ● Client interaction
      Talk to DataStax! They’ve probably seen it before!
  25. 25. #3: Understand Compaction Strategies!
      DTCS was our first choice. It didn’t work . . . .
      Tim Goodaire, September 02, 2015 17:10: “We have changed the compaction strategy, concurrent_compactors, compaction_throughput, and heap size. It took a while for the cluster to complete the compactions, but it's done now. The cluster is up and appears to be healthy. Today, we've been adding a few more nodes and resetting the heap size back to 8 GB.”
  26. 26. #4: Ensure client and cluster settings match
      Load balancing policies, read and write consistency, schema replication factor, cluster topology . . .
  27. 27. #5: Model Failure Scenarios
      What happens when a node fails? Two? The DC? Will the client fail? How will queries be satisfied?
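One way to start modeling the failure questions on slide 27 is to check whether a consistency level can still be met when replicas are down. The sketch below is illustrative only: the deck does not say which consistency levels Dyn uses, the function names are hypothetical, and only two levels are covered. It relies on the standard Cassandra rule that a quorum is floor(RF / 2) + 1 replicas.

```python
def required_replicas(level, replication_factor):
    """Replicas that must respond in the local DC for a consistency level."""
    if level == "LOCAL_ONE":
        return 1
    if level == "LOCAL_QUORUM":
        # Standard quorum size: a strict majority of the replicas.
        return replication_factor // 2 + 1
    raise ValueError(f"level not modeled in this sketch: {level}")


def can_satisfy(level, replication_factor, replicas_down):
    """True if enough replicas remain up to meet the consistency level."""
    alive = replication_factor - replicas_down
    return alive >= required_replicas(level, replication_factor)
```

For example, with a replication factor of 3, `LOCAL_QUORUM` survives one replica being down. With a replication factor of 2, a quorum is both replicas, so one failed replica is enough to fail quorum requests, while `LOCAL_ONE` still succeeds.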
  28. 28. #6: Use DSE Spark to aggregate
      700 rows for a single 5-minute interval; 20 rows for an hour interval.
      Daily billing went from 14 hours, to 2 hours on DSE/C*, to 12 minutes with DSE/Spark.
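The roll-up idea behind slide 28, collapsing many fine-grained rows into fewer hourly rows, can be sketched in plain Python. This is a stand-in for illustration, not the Spark job from the talk: the row shape `(customer, epoch_seconds, count)` and the function name are assumptions.

```python
from collections import defaultdict

HOUR_SECONDS = 3600


def rollup_hourly(rows):
    """Sum per-interval counts into one total per (customer, hour).

    rows: iterable of (customer, epoch_seconds, count) tuples, one per
    five-minute interval (hypothetical row shape).
    """
    totals = defaultdict(int)
    for customer, timestamp, count in rows:
        # Truncate the timestamp to the start of its hour.
        hour_start = timestamp - timestamp % HOUR_SECONDS
        totals[(customer, hour_start)] += count
    return dict(totals)
```

Reads that only need hourly totals then touch one row per customer per hour instead of twelve or more, which is the kind of storage and latency saving lesson #6 refers to.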
  29. 29. What’s Next?
      ● Rely on best practices to support more analytical use cases across products.
      ● Complete development of generic C* solution, for quicker time to market, and greater scale in our hybrid cloud.
      ● Consider new opportunities for relying on DSE for products delivering services.
      © DataStax, All Rights Reserved.
  30. 30. Contacts and Thanks!
      Tim Chadwick, Principal Engineer, Infrastructure. https://www.linkedin.com/in/timjchadwick tchadwick@dyn.com @DynData
      Rick Bross, Principal Engineer, Data Analytics. https://www.linkedin.com/in/rickbross rbross@dyn.com
      Dyn, Inc., 150 Dow St – Tower Two, Manchester, NH 03101, 603-668-4998
  31. 31. Coming Soon!
      ● June 8: How to Half Hour - Building Data Pipelines with SMACK: Storage Strategy using Cassandra and DSE
      ● July 6: How to Half Hour - Building Data Pipelines with SMACK: Analyzing Data with Spark
      ● For the latest schedule of webinars, check out our Webinars page: http://www.datastax.com/resources/webinars.
  32. 32. Appendix
  33. 33. Client/Cluster Settings - Must Work Together!
      Client
      ● Client cluster and session object configuration
        ○ Cluster seeds (DCAwareRoundRobinPolicy implications)
        ○ Other load balancing policies to wrap
        ○ Read and write consistency setting
        ○ # connections per host
        ○ # requests per connection
        ○ Pool timeout
      ● Client query settings
        ○ Read and write consistency (may override default for specific query)
        ○ Batches (rarely if ever should be used)
        ○ Stored procedures (usually best practice for groups of queries - e.g. we use them for high velocity inserts)
        ○ Sync or Async? Depends on the specific query, but usually best practice with stored procedures.
        ○ Write with a consistent TTL per table.
        ○ How many threads should share the client session object? We’ve found that balancing the DC capabilities, client latency, and a (native) thread pool turbocharges inserts.
      Cassandra Cluster
      ● Network topology
        ○ Colocated latency? Inter DC latency?
        ○ Replication factor per DC per schema
      ● Schema
        ○ Don’t mix schemas with different use cases!
        ○ Dyn’s usage pattern
          ■ Optimize INSERTs.
          ■ Ensure READs succeed.
          ■ Avoid UPDATEs (“out of order” TTLs)
          ■ Ban DELETEs (turn off the repair service)
        ○ Attempt to have all (voluminous) tables use the same compaction strategy.
        ○ Use consistent TTLs for writes. If you override the default, always override with the same value.
      ● Compaction algorithms
        ○ With ✓ time series data, ✓ no deletes, ✓ no updates and ✓ consistent TTLs: you can use DTCS, which will simply drop old sstables.
