Fast Data Architecture

How Knoldus tackles complex projects with Scala, Akka, Kafka, Spark and Cassandra

Published in: Software

  1. (c) 2015-16, No reproduction without permission. Fast Data Architecture. Gurgaon, December 2016
  2. Data is doubling every 2 years: from 4.4 zettabytes in 2013 to 44 zettabytes in 2020
  3. Fast Data is what eventually becomes Big Data, arriving at speed
  4. (slide has no text)
  5. Data Value Chain
  6. Big Data Architecture: 3 Major Pieces
     ● A scalable and available storage mechanism, such as a distributed filesystem or database
     ● A distributed compute engine, for processing and querying the data at scale
     ● Tools to manage the resources and services used to implement these systems
  7. CAP theorem: For the always-on Internet, it often made sense to accept eventual consistency in exchange for greater availability.
  8. Classic Batch Architecture
  9. Batch Mode Characteristics
  10. Challenge: Batch mode does not work for today's streaming needs. Example: breaking news on Google!
  11. Combination
  12. Fast Data: Fast data comes into data systems in streams; they are fire hoses. These streams consist of observations, log records, interactions, sensor readings, clicks, game play, etc., arriving hundreds to millions of times a second.
  13. Advantages of Fast Data
      ● Better insight
      ● Better personalization
      ● Better fraud detection
      ● Better customer engagement
      ● Better freemium conversion
      ● Better game play interaction
      ● Better alerting and interaction
  14. Fast Data Architecture
  15. Parts
      1. Streams of data: IoT, firehose, etc.
      2. REST calls
      3. Microservices, Reactive Platform
      4. ZooKeeper for consensus and state management
      5. Kafka Connect for persistence
      6. Low-latency stream processing with Flink or Gearpump as Beam runners
      7. Stream processing results persisted back
  16. Parts
      9. Pseudo-streaming with Spark
      10. Batch mode processing and interactive analytics
      11. Deployed to Mesos, YARN, or the cloud
  17. Stream Data Characteristics
  18. Core Principles
      ● Event logs: almost everything is an event, e.g. DB CRUD transactions, telemetry from IoT devices, clickstreams
      ● Event logs enable Event Sourcing (ES) and CQRS
      ● Message queues are the core integration tool
      ● Message delivery guarantees: At Most Once, At Least Once, and Exactly Once
  19. Kafka
      ● Provides the benefits of the event log
      ● Does not delete messages once they have been read, so they can be re-read and consumed by more than one consumer
      ● Partitions can be replicated for durability and resiliency
  20. Reactive Systems
      ● Responsive: The system can always respond in a timely manner, even when it has to respond that full service isn't available due to some failure.
      ● Resilient: The system is resilient against the failure of any one component, such as server crashes, hard drive failures, or network partitions. Replication prevents data loss and lets a service keep going on the remaining instances. Isolation prevents cascading failures.
  21. Reactive Systems
      ● Elastic: You can expect the load to vary considerably over the lifetime of a service. It's essential to implement dynamic, automatic scalability, both up and down, based on load.
      ● Message driven: While fast data architectures are obviously focused on data, here we mean that all services respond to directed commands an…
  22. Lambda Architecture
  23. Sample Application
  24. Copyright (c) 2016-17 Knoldus Software LLP. This training material is only intended to be used by people attending the Knoldus training. Unauthorized reproduction, redistribution, or use of this material is strictly prohibited.
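
The event log and delivery-guarantee ideas from the Core Principles and Kafka slides can be sketched in a few lines of Python. This is a minimal in-memory illustration, not Kafka's API and not code from the talk: `EventLog`, `Consumer`, `BalanceProjection`, and `run_demo` are hypothetical names introduced here. It assumes each event carries a unique `id`. The sketch shows that reading never deletes records, that committing offsets only after processing gives at-least-once delivery, and that an idempotent handler turns redelivery into an effectively-once result.

```python
from dataclasses import dataclass, field


@dataclass
class EventLog:
    """Hypothetical append-only log: reads never remove records."""
    records: list = field(default_factory=list)

    def append(self, event: dict) -> int:
        self.records.append(event)
        return len(self.records) - 1  # offset of the new record

    def read(self, offset: int, max_records: int = 10) -> list:
        # Return (offset, event) pairs starting at the given offset.
        return list(enumerate(self.records))[offset:offset + max_records]


class Consumer:
    """Processes first, commits the offset afterwards: at-least-once delivery."""

    def __init__(self, log: EventLog, handler):
        self.log = log
        self.handler = handler
        self.committed = 0  # next offset to read

    def poll(self, fail_before_commit: bool = False) -> None:
        batch = self.log.read(self.committed)
        for _offset, event in batch:
            self.handler(event)
        if fail_before_commit:
            return  # simulated crash: nothing committed, batch is redelivered
        self.committed += len(batch)


class BalanceProjection:
    """Idempotent handler: remembers event ids so duplicates are ignored."""

    def __init__(self):
        self.balance = 0
        self.seen = set()

    def apply(self, event: dict) -> None:
        if event["id"] in self.seen:
            return
        self.seen.add(event["id"])
        self.balance += event["amount"]


def run_demo():
    log = EventLog()
    log.append({"id": 1, "amount": 100})
    log.append({"id": 2, "amount": -30})

    # Consumer with an idempotent handler.
    projection = BalanceProjection()
    safe = Consumer(log, projection.apply)
    safe.poll(fail_before_commit=True)  # first attempt "crashes"
    safe.poll()                         # same batch is delivered again

    # Consumer with a naive handler that counts duplicates twice.
    naive = {"total": 0}

    def add(event):
        naive["total"] += event["amount"]

    unsafe = Consumer(log, add)
    unsafe.poll(fail_before_commit=True)
    unsafe.poll()

    # The log still holds every record for any later reader.
    retained = len(log.read(0))
    return projection.balance, naive["total"], retained


if __name__ == "__main__":
    print(run_demo())
```

Because the log keeps its records after they are read, both consumers above see the same events independently, and a new consumer could start from offset 0 at any time. The naive handler double-counts the redelivered batch, which is the usual reason at-least-once pipelines pair redelivery with idempotent processing.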
