(c) 2015-16, No reproduction without permission
Fast Data Architecture
Gurgaon, December 2016
Data is doubling every 2 years
2013 (4.4 zettabytes) to 2020 (44 zettabytes)
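As a quick sanity check, the doubling time implied by the two figures above can be worked out directly. This is only arithmetic on the slide's numbers, assuming steady exponential growth; the function name is illustrative.

```python
import math

def implied_doubling_time(start_size, end_size, years):
    """Years needed to double, assuming steady exponential growth."""
    growth_factor = end_size / start_size
    return years * math.log(2) / math.log(growth_factor)

# Figures from the slide: 4.4 ZB in 2013, 44 ZB in 2020.
doubling_years = implied_doubling_time(4.4, 44.0, 2020 - 2013)
print(round(doubling_years, 2))
```

A tenfold increase over seven years works out to a doubling time of a little over two years, which is consistent with the "doubling every 2 years" headline.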
Fast Data is eventually Big Data, at speed
Data Value Chain
Big Data Architecture
3 Major Pieces
● A scalable and available storage mechanism,
such as a distributed filesystem or database
● A distributed compute engine, for processing
and querying the data at scale
● Tools to manage the resources and services
used to implement these systems
CAP theorem
For the always-on Internet, it often made sense
to accept eventual consistency in exchange for
greater availability.
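The trade-off can be illustrated with a toy sketch. It assumes a last-write-wins merge rule and caller-supplied timestamps, neither of which comes from the slides; real stores use a variety of conflict-resolution strategies.

```python
class Replica:
    """A toy key-value replica using last-write-wins (LWW) timestamps."""

    def __init__(self):
        self.data = {}  # key -> (timestamp, value)

    def write(self, key, value, timestamp):
        current = self.data.get(key)
        # Keep the incoming value only if it is newer than what we hold.
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None

    def sync_from(self, other):
        """Pull the other replica's entries, keeping the newest of each."""
        for key, (timestamp, value) in other.data.items():
            self.write(key, value, timestamp)


a, b = Replica(), Replica()

# During a network partition both replicas stay available and accept writes.
a.write("cart", ["book"], timestamp=1)
b.write("cart", ["book", "pen"], timestamp=2)
stale = a.read("cart")  # replica a still serves its older value

# Once the partition heals, the replicas exchange state and converge.
a.sync_from(b)
b.sync_from(a)
```

While partitioned, a reader of replica `a` sees a stale value; after the sync both replicas agree. That temporary disagreement is the price paid for staying available.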
Classic Batch Architecture
Batch Mode Characteristics
Challenge
Batch mode does not work for today's streaming
needs.
Example: breaking news on Google!
Combination
Fast Data
Fast data comes into data systems in streams;
they are fire hoses.
These streams consist of
observations, log records, interactions, sensor
readings, clicks, game play, etc.,
arriving hundreds to millions of times a
second.
Advantages of Fast Data
● Better insight
● Better personalization
● Better fraud detection
● Better customer engagement
● Better freemium conversion
● Better game play interaction
● Better alerting and interaction
Fast Data Architecture
Parts
1. Streams of data: IoT, firehose, etc.
2. REST calls
3. Microservices, Reactive Platform
4. ZooKeeper for consensus and state management
5. Kafka Connect for persistence
6. Low-latency stream processing, as runners in
Beam with Flink, Gearpump
7. Stream processing results persisted back
Parts
9. Pseudo-streaming (micro-batches) with Spark
10. Batch-mode processing and interactive
analytics
11. Deployed to Mesos, YARN, cloud
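The difference between low-latency stream processing and "pseudo stream" (micro-batch) processing in the parts listed above can be sketched in plain Python. This uses no Beam, Flink, or Spark API; the function names are hypothetical, and batching by a fixed count is a simplification (Spark Streaming batches by time interval).

```python
from collections import Counter

def process_per_event(events):
    """True streaming: update the running result as each event arrives."""
    counts = Counter()
    for event in events:
        counts[event] += 1
        yield dict(counts)  # a result is available after every event

def process_micro_batches(events, batch_size):
    """Micro-batching: buffer events, then process each small batch at once."""
    counts = Counter()
    batch = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            counts.update(batch)
            batch = []
            yield dict(counts)  # a result is available once per batch
    if batch:  # flush the final partial batch
        counts.update(batch)
        yield dict(counts)

clicks = ["home", "cart", "home", "pay", "home"]
per_event_results = list(process_per_event(clicks))
micro_batch_results = list(process_micro_batches(clicks, batch_size=2))
```

Both approaches end with the same counts. The per-event version emits an updated result for every input, while the micro-batch version emits fewer, later results in exchange for processing events in bulk.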
Stream Data characteristics
Core Principles
● Event logs – almost everything is an event:
DB CRUD transactions, telemetry from IoT
devices, clickstreams, etc.
● Event logs enable Event Sourcing (ES) and CQRS
● Message queues are the core integration tool
● Message delivery guarantees: at most once, at
least once, and exactly once
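The three delivery guarantees in the last bullet can be compared with a small simulation. It is a sketch under stated assumptions: the `lost_ids` and `unacked_ids` parameters are hypothetical stand-ins for network failures, and "exactly once" is approximated here by at-least-once delivery plus a consumer that deduplicates by message id.

```python
def at_most_once(messages, lost_ids):
    """Send each message once and never retry: losses are possible."""
    return [m for m in messages if m["id"] not in lost_ids]

def at_least_once(messages, unacked_ids):
    """Retry until acknowledged: a lost ack causes a duplicate delivery."""
    delivered = []
    for m in messages:
        delivered.append(m)
        if m["id"] in unacked_ids:  # the ack was lost, so the sender retries
            delivered.append(m)
    return delivered

def effectively_once(deliveries):
    """At-least-once delivery plus a consumer that deduplicates by id."""
    seen, processed = set(), []
    for m in deliveries:
        if m["id"] not in seen:
            seen.add(m["id"])
            processed.append(m)
    return processed

payments = [
    {"id": 1, "amount": 10},
    {"id": 2, "amount": 20},
    {"id": 3, "amount": 30},
]

total_at_most = sum(m["amount"] for m in at_most_once(payments, lost_ids={2}))
redelivered = at_least_once(payments, unacked_ids={2})
total_at_least = sum(m["amount"] for m in redelivered)
total_effectively = sum(m["amount"] for m in effectively_once(redelivered))
```

With message 2 affected in each case, at-most-once undercounts the total, at-least-once overcounts it, and the deduplicating consumer recovers the correct figure.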
Kafka
● Provides the benefits of the event log
● Does not delete messages once they have
been read, so consumers can re-read them.
Partitions can also be replicated
for durability and resiliency
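The log behaviour described above can be modelled in a few lines. This is a toy in-memory model, not the Kafka client API; the class and method names are illustrative, and it omits partitions, replication, and retention limits (real Kafka eventually discards old records according to its retention settings).

```python
class EventLog:
    """Toy append-only log: reading never removes a record."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1  # offset of the new record

    def read(self, offset, max_records=10):
        return self._records[offset:offset + max_records]


class Consumer:
    """Each consumer tracks its own position (offset) in the log."""

    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self):
        records = self.log.read(self.offset)
        self.offset += len(records)
        return records

    def seek(self, offset):
        self.offset = offset


log = EventLog()
for event in ["signup", "login", "purchase"]:
    log.append(event)

billing, analytics = Consumer(log), Consumer(log)
first_read = billing.poll()
second_read = analytics.poll()  # an independent reader sees the same records
billing.seek(0)
replayed = billing.poll()       # rewinding the offset replays history
```

Because consumption only moves a consumer's offset, several consumers can read the same records independently, and any of them can rewind to reprocess earlier events.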
Reactive Systems
● Responsive: The system can always respond in a
timely manner, even when it must respond
that full service isn’t available due to some failure.
● Resilient: The system is resilient against failure of
any one component, such as server crashes, hard
drive failures, network partitions, etc. Replication
prevents data loss and enables a service
to keep going using the remaining instances.
Isolation prevents cascading failures.
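One common way to get both properties is a circuit breaker with a fallback. The slides do not name this pattern, so treat the following as an illustrative sketch: the class, the threshold, and the example service are all hypothetical, and a production breaker would also reopen the circuit after a timeout.

```python
class CircuitBreaker:
    """Simplified breaker: opens after N consecutive failures, never resets on its own."""

    def __init__(self, failure_threshold=3):
        self.failure_threshold = failure_threshold
        self.failures = 0

    @property
    def is_open(self):
        return self.failures >= self.failure_threshold

    def call(self, operation, fallback):
        if self.is_open:
            # Fail fast: stay responsive without touching the broken dependency.
            return fallback()
        try:
            result = operation()
        except Exception:
            self.failures += 1
            return fallback()
        self.failures = 0  # a success clears the failure count
        return result


attempts = []  # records every real call made to the failing service

def broken_recommendations():
    attempts.append(1)
    raise ConnectionError("recommendation service is down")

def default_recommendations():
    return ["bestsellers"]

breaker = CircuitBreaker(failure_threshold=3)
responses = [
    breaker.call(broken_recommendations, default_recommendations)
    for _ in range(5)
]
```

Every caller still gets a timely (degraded) answer, and once the breaker opens the failing service is no longer called, which keeps its failure isolated from the rest of the system.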
Reactive Systems
● Elastic: You can expect the load to vary
considerably over the lifetime of a service. It’s
essential to implement dynamic, automatic
scalability, both up and down, based on load.
● Message driven: While fast data architectures
are obviously focused on data, here we mean
that all services respond to directed
commands an
Lambda Architecture
Sample Application
Copyright (c) 2016-17 Knoldus
Software LLP
This training material is only intended to be used by
people attending the Knoldus training. Unauthorized
reproduction, redistribution, or use
of this material is strictly prohibited.
