
Successful Architectures for Fast Data


Managing large volumes of data isn’t trivial and needs a plan. Fast Data is how we describe the nature of data in a heavily consumer-driven world: fast in, fast out. Is your data infrastructure ready? You will learn some important reference architectures for large-scale data problems, covering three main areas:

Organize - Manage the incoming data stream and ensure it is processed correctly and on time. No data left behind.
Process - Analyze the data you receive in near real time or in batches, ready for fast serving in your application.
Store - Reliably store data in data models that support your application. Never accept downtime or slow response times.



  1. Successful Architectures for Fast Data. Patrick McFadin, VP Developer Relations, DataStax (@PatrickMcFadin)
  2. Who am I? • VP Developer Relations at DataStax • Solution Architect for DataStax • Chief Architect at Hobsons • Building and deploying web apps since 1996
  3. THE CHALLENGE: Fast Data. When do I know I have Fast Data vs. Big Data? • Big Data is all about processing at scale • Fast Data can be any size • Fast Data is the biggest challenge in data management today
  4. The Problem: Your Magical App
  5. Danger: here be dragons
  6. The Great Unknown (diagram: every surrounding component labeled "Scary parts")
  7. ?
  8. Distributed? Sad! (diagram: a client talks to an app server, which fans out to shards 1 through 4)
  9. Uptime: 0% chance of 100% uptime in one data center (diagram: racks grouped into two data centers)
  10. Uptime, cloud version: 0% chance of 100% uptime in one region (diagram: availability zones grouped into two regions)
  11. Remember AWS:Reboot?
  12. Can you withstand failure?
  13. 133 ms. "Looks like you want to go faster than light. Can I help?" Yes / No
  14. Solutions? Sad!
  15. Time to slay the dragons
  16. Macro Architecture for Success: Organize, Process, Store
  17. SMACK
  18. Spark, Mesos, Akka, Cassandra, Kafka
  19. Organize: Kafka. Process: Akka and Spark. Store: Cassandra. All scaled out across many instances on Mesos.
  20. Organize: Kafka. Process: Akka and Spark. Store: Cassandra.
  21. Kafka
  22. Kafka decouples data pipelines
  23. The problem (diagram: customers ordering from one kitchen: "Hamburger please", "Meat disk on bread please")
  24. Scale (diagram: a producer writes orders 1 through 5 to the topics "Hamburgers", "Pizza", and "Food"; consumers read from each topic)
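
  To make the topic model concrete, here is a minimal producer sketch in Scala using the Kafka Java client. This is an illustration, not code from the talk: the broker address, the topic name "temperature" (borrowed from the following slides), and the station key are all assumptions.

    import java.util.Properties
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

    object TemperatureProducer {
      def main(args: Array[String]): Unit = {
        val props = new Properties()
        props.put("bootstrap.servers", "localhost:9092") // assumed broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
        props.put("acks", "all") // wait for all in-sync replicas before acknowledging

        val producer = new KafkaProducer[String, String](props)
        // Keying by station id sends every reading from one station to the same
        // partition, which is what preserves per-station ordering
        producer.send(new ProducerRecord[String, String]("temperature", "station-1", "22.5"))
        producer.close()
      }
    }
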
  25. Kafka (diagram: a Collection API producer writes to the "Temperature" and "Precipitation" topics on a broker; Temperature Processor and Precipitation Processor consumers read them)
  26. Kafka (same diagram: each topic holds its messages in a single partition, Partition 0)
  27. Kafka (same diagram: the "Temperature" topic now has two partitions, Partition 0 and Partition 1, each consumed by its own Temperature Processor)
  28. Kafka (same diagram across two brokers: both topics are created with Replication Factor = 2, so each partition has a replica on the other broker)
  29. Kafka (same diagram: multiple Temperature Processors and Precipitation Processors consume the replicated, partitioned topics)
  30. Guarantees
      Order:
      • Messages are ordered as they are sent by the producer
      • Consumers see messages in the order they were inserted by the producer
      Durability:
      • Messages are delivered at least once
      • With a replication factor of N, up to N-1 server failures can be tolerated without losing committed messages
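
  A matching consumer sketch, again a rough illustration rather than the talk's code; the consumer group name is an assumption. The printed offsets show the per-partition ordering that backs the guarantees above.

    import java.time.Duration
    import java.util.{Collections, Properties}
    import org.apache.kafka.clients.consumer.KafkaConsumer
    import scala.collection.JavaConverters._

    object TemperatureConsumer {
      def main(args: Array[String]): Unit = {
        val props = new Properties()
        props.put("bootstrap.servers", "localhost:9092")
        props.put("group.id", "temperature-processor") // assumed consumer group
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")

        val consumer = new KafkaConsumer[String, String](props)
        consumer.subscribe(Collections.singletonList("temperature"))
        while (true) {
          // Offsets are strictly increasing within a partition: producer order is preserved
          for (rec <- consumer.poll(Duration.ofMillis(500)).asScala)
            println(s"partition=${rec.partition} offset=${rec.offset} value=${rec.value}")
        }
      }
    }
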
  31. Akka
  32. Akka in a nutshell: • Highly concurrent • Reactive • Fully distributed • Completely elastic and resilient (diagram: actors, each with its own mailbox)
  33. Temperature high/low stream (diagram: weather stations feed a Receive API producer; a consumer hands each reading to a pool of TemperatureActors)
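
  A minimal classic-Akka sketch of this pipeline stage. TemperatureActor is named on the slide, but the message type and the high/low logic here are assumptions for illustration.

    import akka.actor.{Actor, ActorSystem, Props}

    final case class Temperature(stationId: String, celsius: Double)

    // Tracks a running high/low for the readings it receives (the logic is an assumption)
    class TemperatureActor extends Actor {
      private var high = Double.NegativeInfinity
      private var low  = Double.PositiveInfinity

      def receive: Receive = {
        case Temperature(id, c) =>
          high = math.max(high, c)
          low  = math.min(low, c)
          println(s"$id high=$high low=$low")
      }
    }

    object WeatherApp extends App {
      val system = ActorSystem("weather")
      // In the slide's pipeline, a Kafka consumer would forward each reading here
      val temps = system.actorOf(Props[TemperatureActor], "temperature")
      temps ! Temperature("station-1", 22.5)
    }
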
  34. Cassandra
  35. Cluster (diagram: four servers in a ring, each owning one token range of 0-100: 0-25, 26-50, 51-75, 76-100)
  36. Replication, DC1 with RF=1: each node holds only its primary range. 10.0.0.1: 00-25; 10.0.0.2: 26-50; 10.0.0.3: 51-75; 10.0.0.4: 76-100
  37. Replication, DC1 with RF=2: each node holds its primary range plus a replica of its neighbor's. 10.0.0.1: 00-25 and 76-100; 10.0.0.2: 26-50 and 00-25; 10.0.0.3: 51-75 and 26-50; 10.0.0.4: 76-100 and 51-75
  38. Replication, DC1 with RF=3: each node holds its primary range plus two replicas. 10.0.0.1: 00-25, 76-100, 51-75; 10.0.0.2: 26-50, 00-25, 76-100; 10.0.0.3: 51-75, 26-50, 00-25; 10.0.0.4: 76-100, 51-75, 26-50
  39. Consistency (same RF=3 ring: a client writes to partition 15, which falls in range 00-25 and is therefore replicated on 10.0.0.1, 10.0.0.2, and 10.0.0.3)
  40. Consistency level: how many replicas must acknowledge. One: a single node. Quorum: 51% of replicas (two of three at RF=3)
  41. Consistency (same diagram: the client writes partition 15 with CL=One)
  42. Consistency (same diagram, CL=One: the write is acknowledged as soon as a single replica responds)
  43. Consistency (same diagram, CL=Quorum: two of the three replicas must acknowledge the write)
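
  To show how a client requests these consistency levels, here is a hedged sketch with the DataStax Java driver (3.x-era API) from Scala, writing to the raw_weather_data table that appears on slide 50. The keyspace name is an assumption.

    import com.datastax.driver.core.{Cluster, ConsistencyLevel, SimpleStatement}

    object QuorumWrite extends App {
      val cluster = Cluster.builder().addContactPoint("127.0.0.1").build()
      val session = cluster.connect("isd_weather_data") // assumed keyspace name

      val insert = new SimpleStatement(
        "INSERT INTO raw_weather_data (wsid, year, month, day, hour, temperature) " +
          "VALUES (?, ?, ?, ?, ?, ?)",
        "station-1", Int.box(2017), Int.box(6), Int.box(1), Int.box(12), Double.box(22.5))

      // At RF=3, QUORUM waits for two of the three replicas to acknowledge the write
      insert.setConsistencyLevel(ConsistencyLevel.QUORUM)
      session.execute(insert)
      cluster.close()
    }
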
  44. Multi-datacenter (diagram: DC1 and DC2, four nodes each, both with RF=3; a client writes partition 15 to DC1)
  45. Multi-datacenter (same diagram: the write is forwarded to DC2)
  46. Multi-datacenter (same diagram: the write reaches all three replicas in each data center)
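
  In a multi-datacenter deployment you typically pin writes to the local data center with LOCAL_QUORUM and let Cassandra replicate to the other DC asynchronously. A variation of the sketch above, with the same caveats (driver 3.x API, assumed keyspace name):

    import com.datastax.driver.core.{Cluster, ConsistencyLevel, SimpleStatement}
    import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy

    object LocalQuorumWrite extends App {
      // Route requests to nodes in DC1; replication to DC2 happens asynchronously
      val cluster = Cluster.builder()
        .addContactPoint("10.0.0.1")
        .withLoadBalancingPolicy(DCAwareRoundRobinPolicy.builder().withLocalDc("DC1").build())
        .build()
      val session = cluster.connect("isd_weather_data") // assumed keyspace name

      val update = new SimpleStatement(
        "UPDATE raw_weather_data SET temperature = ? " +
          "WHERE wsid = ? AND year = ? AND month = ? AND day = ? AND hour = ?",
        Double.box(22.5), "station-1", Int.box(2017), Int.box(6), Int.box(1), Int.box(12))

      // LOCAL_QUORUM waits for two of the three replicas in the local DC only,
      // so a WAN hiccup between data centers does not fail the write
      update.setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM)
      session.execute(update)
      cluster.close()
    }
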
  47. Spark
  48. Great combo: store a ton of data, analyze a ton of data
  49. Great combo: Spark Streaming (near real-time), Spark SQL (structured data), MLlib (machine learning), GraphX (graph analysis)
  50. Great combo: Spark Streaming (near real-time), Spark SQL (structured data), MLlib (machine learning), GraphX (graph analysis), tied together by the Spark Cassandra Connector. The example table:

      CREATE TABLE raw_weather_data (
        wsid text,
        year int,
        month int,
        day int,
        hour int,
        temperature double,
        dewpoint double,
        pressure double,
        wind_direction int,
        wind_speed double,
        sky_condition int,
        sky_condition_text text,
        one_hour_precip double,
        six_hour_precip double,
        PRIMARY KEY ((wsid), year, month, day, hour)
      ) WITH CLUSTERING ORDER BY (year DESC, month DESC, day DESC, hour DESC);
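
  A minimal sketch of reading that table through the Spark Cassandra Connector's RDD API; the keyspace name and the max-temperature aggregation are assumptions for illustration.

    import com.datastax.spark.connector._
    import org.apache.spark.{SparkConf, SparkContext}

    object WeatherAnalysis extends App {
      val conf = new SparkConf()
        .setAppName("weather-analysis")
        .set("spark.cassandra.connection.host", "127.0.0.1")
      val sc = new SparkContext(conf)

      // Scan raw_weather_data, fetching only the two columns we need
      val maxTempByStation = sc.cassandraTable("isd_weather_data", "raw_weather_data")
        .select("wsid", "temperature")
        .map(row => (row.getString("wsid"), row.getDouble("temperature")))
        .reduceByKey(math.max)

      maxTempByStation.take(10).foreach(println)
    }
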
  51. (diagram: a Spark master coordinates a worker on each server; each worker runs executors)
  52. (diagram: a Spark master and four workers, each co-located with a node owning one token range of 0-100: 0-24, 25-49, 50-74, 75-99; each worker says "I will only analyze 25% of the data.")
  53. Spark Streaming: micro-batching
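
  A hedged micro-batching sketch using the spark-streaming-kafka-0-10 direct stream; the 5-second batch interval, topic name, and group id are assumptions.

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

    object TemperatureStream extends App {
      val conf = new SparkConf().setAppName("temperature-stream")
      val ssc = new StreamingContext(conf, Seconds(5)) // each micro-batch covers 5 seconds

      val kafkaParams = Map[String, Object](
        "bootstrap.servers" -> "localhost:9092",
        "key.deserializer" -> classOf[StringDeserializer],
        "value.deserializer" -> classOf[StringDeserializer],
        "group.id" -> "temperature-stream") // assumed consumer group

      val stream = KafkaUtils.createDirectStream[String, String](
        ssc,
        LocationStrategies.PreferConsistent,
        ConsumerStrategies.Subscribe[String, String](Seq("temperature"), kafkaParams))

      // Each RDD in the DStream is one micro-batch of Kafka records
      stream.map(_.value).print()
      ssc.start()
      ssc.awaitTermination()
    }
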
  54. DSE 5.1. Download today! https://academy.datastax.com/downloads
  55. Mesos
  56. (diagram: many Kafka, Spark, Akka, and Cassandra instances shout "I need CPU!!" and "I need memory!!"; Mesos answers "Got you covered")
  57. (diagram: Kafka, Akka, and Spark instances packed onto a shared Mesos cluster)
  58. (same diagram, rearranged: Mesos places the workloads wherever resources are available)
  59. Ready to go! Mesosphere DC/OS
  60. One Last Dragon
  61. Will the cloud save you?
  62. Macro Architecture for Success: Organize, Process, Store (with "Rent" labels on the cloud-hosted pieces)
  63. Be multi-cloud. Be happy: AWS, GCP, Azure
  64. Did you see this?
  65. How are you going to build your next amazing app?
  66. Let us help you! www.datastax.com, academy.datastax.com, @PatrickMcFadin
