Responsive and Scalable Real-time Data Analytics for SHPE 2017 - Cecilia Ye

Designing a system that can extract immediate insights from large amounts of data in real time requires a particular way of thinking. This talk presents a “reactive” approach to designing real-time, responsive, and scalable data applications that continuously compute analytics on the fly. It also presents a case study as an example of reactive design in action.

  1. Responsive and Scalable Real-time Data Analytics
     www.twosigma.com
     Cecilia Ye
     Presented to SHPE, 11/2/2017
     September 13, 2018
  2. Disclaimer
     This document is being distributed for informational and educational purposes only and is not an offer to sell or the solicitation of an offer to buy any securities or other instruments. The information contained herein is not intended to provide, and should not be relied upon for, investment advice. The views expressed herein are not necessarily the views of Two Sigma Investments, LP or any of its affiliates (collectively, “Two Sigma”). Such views reflect the assumptions of the author(s) of the document and are subject to change without notice. The document may employ data derived from third-party sources. No representation is made by Two Sigma as to the accuracy of such information and the use of such information in no way implies an endorsement of the source of such information or its validity. The copyrights and/or trademarks in some of the images, logos or other material used herein may be owned by entities other than Two Sigma. If so, such copyrights and/or trademarks are most likely owned by the entity that created the material and are used purely for identification and comment as fair use under international copyright and/or trademark laws. Use of such image, copyright or trademark does not imply any association with such organization (or endorsement of such organization) by Two Sigma, nor vice versa.
  3. About Me
     Engineer at Two Sigma
     Lead a team that builds analytics engines and data dashboard platforms that provide real-time monitoring
  4. Agenda
     What is streaming analytics?
     Reactive principles: a framework for building real-time analytics
     Case study: a real-time data analytics engine
  5. Data at Rest vs. Data in Motion
     Data at rest: analytics done after the data-creating events have occurred
     Data in motion: analytics happens in real time as events take place
  6. Batch Oriented vs. Stream Oriented
     Batch oriented: data is captured in data warehouses and processed some time later in a scheduled batch job
     Stream oriented: continuous computation that extracts information as soon as data arrives
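To make the contrast concrete, here is a minimal Python sketch (not from the talk) that computes the same statistic both ways: a batch function that runs once over the captured data, and a streaming generator that emits an updated result as each event arrives. The function names and the running mean as the example metric are assumptions chosen for illustration; the last streaming value equals the batch value.

```python
from typing import Iterable, Iterator


def batch_mean(values: list[float]) -> float:
    """Batch-oriented (hypothetical helper): computed once, after all data is captured."""
    return sum(values) / len(values)


def streaming_mean(events: Iterable[float]) -> Iterator[float]:
    """Stream-oriented (hypothetical helper): emits an updated mean per arriving event."""
    count = 0
    total = 0.0
    for value in events:
        count += 1
        total += value
        yield total / count  # result is available immediately, no scheduled job


prices = [10.0, 12.0, 11.0, 13.0]  # hypothetical event values
running = list(streaming_mean(prices))
print(running)             # [10.0, 11.0, 11.0, 11.5]
print(batch_mean(prices))  # 11.5
```

The streaming version keeps only a count and a sum, so its memory use does not grow with the length of the stream.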
  7. Real-time analytics is valuable to use cases in many fields:
     Monitor financial markets and trading systems
     Detect fraudulent credit card activity as it happens
     Identify anomalies in telemetry collected from home automation systems
  8. Key Considerations
     Fast: respond instantly (or near instantly) to new information
     Scalable: handle varying incoming workloads
     Resilient: handle various failure conditions gracefully
     Responsive: respond to users in a timely fashion
  9. Agenda
     What is streaming analytics?
     Reactive principles: a framework for building real-time analytics
     Case study: a real-time data analytics engine
  10. Reactive: readily responsive to a stimulus (Merriam-Webster)
  11. Key Considerations Revisited
      Fast (react to events): respond instantly, or near instantly, to new information
      Scalable (react to load): handle varying incoming workloads
      Resilient (react to failures): handle various failure conditions gracefully
      Responsive (react to users): respond to users in a timely fashion
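One common way to "react to load" is backpressure: bound the buffer between a fast producer and a slower consumer so the producer waits instead of building an unbounded backlog. The sketch below uses Python's `asyncio.Queue` with an assumed `maxsize` of 8; the producer/consumer names are hypothetical, and this illustrates the principle only, not how the talk's Akka-based engine handles load.

```python
import asyncio


async def producer(queue: asyncio.Queue, n_events: int) -> None:
    for i in range(n_events):
        # put() suspends while the queue is full, so a fast producer is
        # slowed to the consumer's pace instead of growing an unbounded backlog.
        await queue.put(i)
    await queue.put(None)  # sentinel: end of stream


async def consumer(queue: asyncio.Queue, processed: list[int]) -> int:
    deepest = 0
    while True:
        deepest = max(deepest, queue.qsize())  # largest backlog observed
        item = await queue.get()
        if item is None:
            return deepest
        await asyncio.sleep(0.001)  # stand-in for per-event processing work
        processed.append(item)


async def main() -> tuple[list[int], int]:
    queue: asyncio.Queue = asyncio.Queue(maxsize=8)  # assumed buffer size
    processed: list[int] = []
    _, deepest = await asyncio.gather(producer(queue, 100), consumer(queue, processed))
    return processed, deepest


processed, deepest = asyncio.run(main())
print(len(processed), deepest <= 8)  # 100 True
```

The bound trades latency at the producer for predictable memory use downstream; choosing the size is a tuning decision.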
  16. Actor Model
      A model of concurrent computation
      Provides an abstraction for supporting reactive principles
  17. Actor
      The primitive of concurrent computation
      Can hold and modify its own private state, but has no shared mutable state
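The actor idea can be illustrated without any framework. In this Python sketch (the `Actor` and `CounterActor` classes are hypothetical and are not Akka's API), each actor owns a mailbox and a single thread that handles one message at a time, so its private state is only ever touched by that thread and needs no locks.

```python
import queue
import threading
from typing import Any

_STOP = object()  # private sentinel that tells an actor to shut down


class Actor:
    """Hypothetical minimal actor: a private mailbox drained by one dedicated thread."""

    def __init__(self) -> None:
        self._mailbox: queue.Queue = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def tell(self, message: Any) -> None:
        """Asynchronous send: enqueue the message and return immediately."""
        self._mailbox.put(message)

    def stop(self) -> None:
        """Process everything already queued, then stop and wait for the thread."""
        self._mailbox.put(_STOP)
        self._thread.join()

    def _run(self) -> None:
        while True:
            message = self._mailbox.get()
            if message is _STOP:
                return
            self.receive(message)  # messages are handled one at a time

    def receive(self, message: Any) -> None:
        raise NotImplementedError


class CounterActor(Actor):
    """Keeps a running total that only its own thread ever mutates."""

    def __init__(self) -> None:
        self.total = 0  # private state; initialized before the thread starts
        super().__init__()

    def receive(self, message: Any) -> None:
        self.total += message


counter = CounterActor()
for amount in [5, 10, 20]:
    counter.tell(amount)  # non-blocking sends
counter.stop()            # drain the mailbox and join the thread
print(counter.total)      # 35 (safe to read: the actor's thread has finished)
```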
  18. How do actors communicate?
      A real-life analogy: send to a friend …
  19. How do actors communicate?
      A real-life analogy: the communication is asynchronous
  20. Actors use messages to communicate
      Actor A sends message M to actor B
      Decouples the sending and receiving of messages
      Actor B may or may not have to respond to actor A
      Non-blocking response
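The two interaction styles on this slide can be sketched with plain Python threads and queues: a one-way send where B never answers, and a request where A includes its own mailbox as a return address so B can reply later without A blocking on the send. The `Message` type and its `reply_to` field are illustrative assumptions; Akka expresses similar ideas through its tell and ask patterns, with a different API.

```python
import queue
import threading
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    payload: int
    reply_to: Optional[queue.Queue] = None  # hypothetical: sender's mailbox, only if a reply is wanted


def actor_b(mailbox: queue.Queue, seen: list[int]) -> None:
    """Actor B: doubles each payload and replies only when asked to."""
    while True:
        msg = mailbox.get()
        if msg is None:  # shutdown signal
            return
        seen.append(msg.payload)
        if msg.reply_to is not None:
            msg.reply_to.put(msg.payload * 2)


b_mailbox: queue.Queue = queue.Queue()
a_mailbox: queue.Queue = queue.Queue()  # actor A's own mailbox
seen: list[int] = []
b_thread = threading.Thread(target=actor_b, args=(b_mailbox, seen), daemon=True)
b_thread.start()

# One-way send: A does not expect (or wait for) any response.
b_mailbox.put(Message(payload=1))
# Request: A includes its mailbox as a return address and keeps going.
b_mailbox.put(Message(payload=21, reply_to=a_mailbox))

# ... A is free to do other work here ...

# A real actor would handle the reply as just another message; the
# blocking get() below only keeps the demo short.
reply = a_mailbox.get(timeout=1)
b_mailbox.put(None)
b_thread.join()
print(reply, seen)  # 42 [1, 21]
```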
  21. Reactive Key Traits
      Data-flow focused: data flows respond automatically to propagating changes
      Event-based: availability of new information drives the logic forward
      Non-blocking: emphasizes asynchronous techniques and non-blocking execution
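The data-flow trait, where downstream results update automatically when upstream values change, can be shown with a tiny push-based "cell" in Python. `Cell` and `derived` are hypothetical names for this sketch; it is single-threaded and only illustrates change propagation, not the asynchronous, non-blocking execution an actor system adds.

```python
from typing import Any, Callable


class Cell:
    """Hypothetical cell: holds a value and pushes every change to its subscribers."""

    def __init__(self, value: Any) -> None:
        self._value = value
        self._subscribers: list[Callable[[Any], None]] = []

    @property
    def value(self) -> Any:
        return self._value

    def set(self, value: Any) -> None:
        self._value = value
        for notify in self._subscribers:
            notify(value)  # propagate the change downstream

    def subscribe(self, callback: Callable[[Any], None]) -> None:
        self._subscribers.append(callback)


def derived(sources: list[Cell], fn: Callable[..., Any]) -> Cell:
    """Creates a cell that is recomputed whenever any of its sources changes."""
    out = Cell(fn(*(s.value for s in sources)))

    def recompute(_new_value: Any) -> None:
        out.set(fn(*(s.value for s in sources)))

    for source in sources:
        source.subscribe(recompute)
    return out


price = Cell(100.0)     # hypothetical inputs
quantity = Cell(3.0)
notional = derived([price, quantity], lambda p, q: p * q)
alert = derived([notional], lambda n: n > 500)

price.set(200.0)  # the change propagates: price -> notional -> alert
print(notional.value, alert.value)  # 600.0 True
```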
  22. Agenda
      What is streaming analytics?
      Reactive principles: a framework for building real-time analytics
      Case study: a real-time data analytics engine
  23. Design Considerations
      Real-time and throughput guarantees: minimize latency between new information and output of results, even under high loads
      Correctness guarantees: streaming analysis must be accurate and consistent with the results batch processing would produce
      Complex transformations: customizable, business-specific analytics functions; handle different data formats
      Out-of-order or late data: keep track of late-arriving data and manage the ordering correctly
      Reliability: resilient to failures, including problems with upstream data sources
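One widely used way to handle out-of-order and late data is a watermark: buffer events, release them in event-time order once no earlier event is expected, and divert anything that arrives after its point has passed. The sketch below is a generic illustration under assumed parameters (integer timestamps, a fixed `allowed_lateness`); `Event` and `ReorderBuffer` are hypothetical names, and the talk does not say which technique its engine uses.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Event:
    timestamp: int                    # event time (assumed integer units)
    value: str = field(compare=False)


class ReorderBuffer:
    """Hypothetical buffer: emits events in timestamp order once the watermark passes them.

    The watermark is the largest timestamp seen so far minus allowed_lateness.
    An event older than the current watermark is too late to be reordered and
    is set aside in `late` for separate handling (e.g. a correction).
    """

    def __init__(self, allowed_lateness: int) -> None:
        self.allowed_lateness = allowed_lateness
        self.watermark = float("-inf")
        self.late: list[Event] = []
        self._pending: list[Event] = []  # min-heap ordered by timestamp

    def push(self, event: Event) -> list[Event]:
        if event.timestamp < self.watermark:
            self.late.append(event)
            return []
        heapq.heappush(self._pending, event)
        self.watermark = max(self.watermark, event.timestamp - self.allowed_lateness)
        ready = []
        while self._pending and self._pending[0].timestamp <= self.watermark:
            ready.append(heapq.heappop(self._pending))
        return ready

    def flush(self) -> list[Event]:
        """Releases everything still buffered, e.g. at end of stream."""
        ready = []
        while self._pending:
            ready.append(heapq.heappop(self._pending))
        return ready


buffer = ReorderBuffer(allowed_lateness=2)
arrivals = [1, 3, 2, 6, 4, 5, 0, 8]  # event times, in arrival order
emitted: list[int] = []
for ts in arrivals:
    emitted += [e.timestamp for e in buffer.push(Event(ts, f"event-{ts}"))]
emitted += [e.timestamp for e in buffer.flush()]
print(emitted)                             # [1, 2, 3, 4, 5, 6, 8]
print([e.timestamp for e in buffer.late])  # [0]
```

A larger `allowed_lateness` accepts more stragglers at the cost of holding results back longer, which is the latency-versus-correctness trade-off listed on the slide.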
  28. Implementation
      • Uses Akka, a toolkit that supports building actor systems on the JVM
      • Clean separation between “plumbing and wiring” and data transformation logic
      • Lets us focus more on functionality and analytics, and less on the low-level wiring of asynchronous programming
  29. Example Data Flow
      Real-time data flows from sources, through transformations and analysis, to sinks:
      Sources: Trade Data Publisher Actor A, Market Data Publisher Actor A, Trade Data Publisher Actor B
      Transformations & Analysis: Join Function Actor, Aggregation Function Actor, Bespoke Analysis Actor, Filter Actor
      Sinks: In-Memory Cache Actor, MMapped Cache Actor, DB Writer Actor
  30. Sources
      Real-time data can come from many sources (Trade Data Publisher Actor A, Market Data Publisher Actor A, Trade Data Publisher Actor B)
      Sources could be unbounded flows of data
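Bringing several independent feeds into one stream is a fan-in. This asyncio sketch models each publisher as an async generator (finite here so the demo ends, though a real source may be unbounded) and forwards all of them concurrently into one queue, so downstream code sees each event as soon as it arrives. The feed names, counts, and delays are hypothetical.

```python
import asyncio
from typing import AsyncIterator


async def publisher(name: str, n_events: int, delay: float) -> AsyncIterator[tuple[str, int]]:
    """Hypothetical stand-in for a live feed; a real source may never end."""
    for seq in range(n_events):
        await asyncio.sleep(delay)  # simulated gap between arrivals
        yield name, seq


async def forward(source: AsyncIterator[tuple[str, int]], out: asyncio.Queue) -> None:
    async for event in source:
        await out.put(event)


async def merge(sources: list[AsyncIterator[tuple[str, int]]], out: asyncio.Queue) -> None:
    """Fan-in: every source feeds the same queue concurrently."""
    await asyncio.gather(*(forward(source, out) for source in sources))
    await out.put(None)  # sentinel: all sources are exhausted


async def main() -> list[tuple[str, int]]:
    out: asyncio.Queue = asyncio.Queue()
    sources = [
        publisher("trades-A", 3, 0.010),
        publisher("market-A", 5, 0.005),
        publisher("trades-B", 2, 0.020),
    ]
    merger = asyncio.create_task(merge(sources, out))
    received: list[tuple[str, int]] = []
    while (event := await out.get()) is not None:
        received.append(event)  # downstream handles each event as it lands
    await merger
    return received


events = asyncio.run(main())
print(len(events))  # 10
```

The interleaving across feeds depends on timing, but each feed's own order is preserved.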
  31. Transformations & Analysis
      New information flows through the system as messages between actors (Join Function, Aggregation Function, Bespoke Analysis, and Filter actors)
      Continuously calculates statistics and metrics on the fly from live streams of data
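As an illustration of one transformation step, here is a hypothetical join stage that pairs each trade with the most recent market quote for the same symbol, keeping the latest prices as private state the way an actor would. The `Trade` and `Quote` fields and the join-on-latest-quote rule are assumptions; the talk names a Join Function Actor but does not describe its semantics.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Trade:            # hypothetical trade message
    symbol: str
    quantity: int


@dataclass
class Quote:            # hypothetical market-data message
    symbol: str
    price: float


class JoinStage:
    """Hypothetical join: each trade is matched with the latest quote for its symbol."""

    def __init__(self) -> None:
        self._latest_price: dict[str, float] = {}  # private state, like an actor's

    def receive(self, msg: Union[Trade, Quote]) -> Optional[tuple[str, int, float]]:
        if isinstance(msg, Quote):
            self._latest_price[msg.symbol] = msg.price  # update state, emit nothing
            return None
        price = self._latest_price.get(msg.symbol)
        if price is None:
            return None  # no quote yet; a real engine might buffer the trade instead
        return msg.symbol, msg.quantity, msg.quantity * price


stage = JoinStage()
stream = [Quote("XYZ", 10.0), Trade("XYZ", 5), Trade("ABC", 2), Quote("XYZ", 11.0), Trade("XYZ", 1)]
outputs = []
for msg in stream:  # messages arrive one at a time, interleaved across feeds
    result = stage.receive(msg)
    if result is not None:
        outputs.append(result)
print(outputs)  # [('XYZ', 5, 50.0), ('XYZ', 1, 11.0)]
```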
  32. Transformations & Analysis: Composable Workflows
      Analysis is decomposed into multiple discrete steps, each represented by an actor (Join Function, Aggregation Function, Bespoke Analysis, Filter)
      Composable workflows: chain together a composition of functions to form a data analysis pipeline
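The idea of chaining discrete steps into a pipeline can be sketched with plain function composition. Here each hypothetical stage maps an iterator to an iterator, and `pipeline` composes them left to right; because generators are lazy, events are pulled through the chain as results are requested. This swaps actors for generators to keep the example short, so it shows the composition idea, not the talk's actor wiring; the trade fields are assumptions.

```python
from functools import reduce
from typing import Any, Callable, Iterable, Iterator

Stage = Callable[[Iterable[Any]], Iterator[Any]]


def filter_stage(predicate: Callable[[Any], bool]) -> Stage:
    def run(items: Iterable[Any]) -> Iterator[Any]:
        return (item for item in items if predicate(item))
    return run


def map_stage(fn: Callable[[Any], Any]) -> Stage:
    def run(items: Iterable[Any]) -> Iterator[Any]:
        return (fn(item) for item in items)
    return run


def running_sum_stage() -> Stage:
    def run(items: Iterable[Any]) -> Iterator[Any]:
        total = 0.0
        for item in items:
            total += item
            yield total  # emit an updated aggregate per input
    return run


def pipeline(*stages: Stage) -> Stage:
    """Composes stages left to right into a single reusable stage."""
    def run(items: Iterable[Any]) -> Iterator[Any]:
        return reduce(lambda stream, stage: stage(stream), stages, iter(items))
    return run


analyze = pipeline(
    filter_stage(lambda trade: trade["qty"] > 0),           # Filter step
    map_stage(lambda trade: trade["qty"] * trade["price"]),  # Transform step
    running_sum_stage(),                                     # Aggregation step
)
trades = [{"qty": 2, "price": 10.0}, {"qty": 0, "price": 11.0}, {"qty": 1, "price": 12.0}]
print(list(analyze(trades)))  # [20.0, 32.0]
```

Because a composed pipeline is itself a `Stage`, it can be reused as one step inside a larger pipeline.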
  33. Transformations & Analysis: Reusable and Custom Logic
      A vocabulary of reusable functional transformations offers solutions to most analytics problems
      Custom logic encapsulated in an actor construct solves problems that are more business-specific
  34. Sinks
      The results can have many destinations: dashboards and visualization, and data storage
      Sink actors: In-Memory Cache Actor, MMapped Cache Actor, DB Writer Actor, …
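Delivering each result to several destinations is a fan-out. In this sketch two hypothetical sinks share a minimal `write` interface: one keeps only the latest value per key (the kind of view a live dashboard reads), and one keeps the full history as a stand-in for a database writer. The keys and values are made up for illustration.

```python
from typing import Protocol


class Sink(Protocol):
    def write(self, key: str, value: float) -> None: ...


class InMemoryCache:
    """Hypothetical sink: keeps only the latest value per key."""

    def __init__(self) -> None:
        self.latest: dict[str, float] = {}

    def write(self, key: str, value: float) -> None:
        self.latest[key] = value


class HistoryWriter:
    """Hypothetical stand-in for a database writer: appends every result."""

    def __init__(self) -> None:
        self.rows: list[tuple[str, float]] = []

    def write(self, key: str, value: float) -> None:
        self.rows.append((key, value))


def publish(sinks: list[Sink], key: str, value: float) -> None:
    """Fans one result out to every destination."""
    for sink in sinks:
        sink.write(key, value)


cache, history = InMemoryCache(), HistoryWriter()
for key, value in [("pnl", 1.0), ("pnl", 2.5), ("exposure", 7.0)]:
    publish([cache, history], key, value)
print(cache.latest)       # {'pnl': 2.5, 'exposure': 7.0}
print(len(history.rows))  # 3
```

Here the sinks run inline, so a slow database write would delay the cache update; giving each sink its own actor and mailbox, as the separate sink actors on the slide suggest, would isolate them from one another.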
  35. Analytics Engine Capabilities and Performance
      Hardware and configuration: one VM with 15 vCPUs, 96 GB memory, Linux Debian Wheezy OS

      Metric                                            | Size and units
      Typical load                                      | 4k-20k events per second
      Peak capability                                   | 150k events per second
      Number of actors                                  | 7,000+
      Typical time between data arrival and processing  | Milliseconds under typical load; seconds under high load
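As a back-of-the-envelope reading of these figures (derived here, not stated in the talk): if the peak load were spread perfectly evenly over all 15 vCPUs, the average CPU budget per event, and the ratio of peak capability to the top of the typical load range, would be

```latex
\frac{15\ \text{vCPU}\cdot\text{s per s}}{150{,}000\ \text{events per s}}
  = 10^{-4}\ \text{vCPU}\cdot\text{s per event}
  = 100\ \mu\text{s of CPU time per event},
\qquad
\frac{150{,}000\ \text{events per s}}{20{,}000\ \text{events per s}} = 7.5
```

In practice, scheduling, I/O, and garbage-collection overhead would likely leave less than this idealized budget for the analytics themselves.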
  36. Thank you
