1. Next Gen Decision Making in <2ms
3. VS.
7. [Chart: X (predictor) = spend amount; Y (response) = likelihood of millionaire; series: Simple, Velocity, Advanced]
12.
Hard Metrics   Goal
Latency        < 40 ms; ideally < 16 ms
Throughput     2,000 events / second
Durability     No loss; every message gets exactly one response
Availability   99.5% uptime (downtime of 1.83 days / year); ideally 99.999% uptime (downtime of 5.26 minutes / year)
Scalability    Can add resources and still meet latency requirements
Integration    Transparently connected to existing systems – hardware, messaging, HDFS

Soft Metrics   Goal
Open Source    All components licensed as open source
Extensibility  Rules can be updated; the model is regularly refreshed
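The availability targets translate directly into downtime budgets, and the figures on this slide can be sanity-checked with a few lines of arithmetic. A minimal sketch (plain arithmetic, not from the deck):

```python
# Downtime budget implied by an uptime percentage (non-leap year).
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

def downtime_minutes_per_year(uptime_pct: float) -> float:
    """Allowed downtime in minutes per year for a given uptime percentage."""
    return MINUTES_PER_YEAR * (1 - uptime_pct / 100)

# 99.5%   -> ~2628 min/year (~1.83 days), matching the slide
# 99.999% -> ~5.26 min/year, matching the slide
for pct in (99.5, 99.999):
    print(f"{pct}% uptime allows {downtime_minutes_per_year(pct):.2f} min of downtime/year")
```

The function name is illustrative; the point is simply that "five nines" leaves roughly five minutes of outage per year.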
14. Onyx
15. Enterprise Readiness, Roadmap, Performance, Community
26.
Performance
• Avg. 0.25 ms @ 70k records/sec, with 600 GB RAM
• Thread-local on ~54M events – percentiles (in ms):
Throughput   Count        Avg (ms)   90%   95%   99%   99.9%   4 9's   5 9's   6 9's
70k/sec      54,126,122   0.19       1     1     1     2       2       5       6
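Percentile columns like these are typically computed from the raw per-event latencies with a nearest-rank percentile. A minimal sketch using synthetic data (the actual measurements and methodology are not in the deck, and the nearest-rank definition is an assumption):

```python
import random

def percentile(sorted_samples, q):
    """Nearest-rank percentile: the smallest sample covering q% of the data."""
    n = len(sorted_samples)
    rank = max(1, -(-q * n // 100))  # ceiling division via negation
    return sorted_samples[int(rank) - 1]

# Synthetic per-event latencies (ms); the real runs above had ~54M events.
random.seed(7)
latencies = sorted(random.expovariate(1 / 0.2) for _ in range(100_000))

for q in (90, 95, 99, 99.9, 99.99, 99.999):
    print(f"p{q}: {percentile(latencies, q):.2f} ms")
```

The high-nines percentiles matter most here: the SLA is driven by the slowest events, not the average.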
27. Durability
• Two physically independent pipelines on the same cluster processing identical data
• For the same tuple, we take the best-case time between the two pipelines
– 39 records out of 5.2M exceeded 16 ms
– 173 out of 5.2M exceeded 16 ms in one pipeline but succeeded in the other
• 99.99925% success rate – "five nines"
• Average latency of 0.0981 ms
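The best-case measurement described above reduces to a few lines of code. A hedged sketch with illustrative names (not the actual test harness):

```python
def best_case_success_rate(lat_a, lat_b, budget_ms=16.0):
    """For each tuple, take the faster of the two independent pipelines
    and report the fraction of tuples that met the latency budget."""
    misses = sum(1 for a, b in zip(lat_a, lat_b) if min(a, b) > budget_ms)
    return 1 - misses / len(lat_a)

# The slide's arithmetic: 39 misses out of 5.2M tuples
print(f"{1 - 39 / 5_200_000:.7%}")  # 99.9992500%
```

This is why redundant parallel pipelines improve availability: a tuple only misses its budget when *both* pipelines miss.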
30. Appendix
31. Streaming Technologies Evaluated
• Spark Streaming
• Samza
• Storm
• Feedzai
• InfoSphere Streams
• Flink
• Ignite
• VoltDB
• Cassandra
• Apex
• Of all the technologies evaluated, Apache Apex is the only one ready to bring the decision-making solution to production, based on:
– Maturity
– Fault-tolerance
– Enterprise-readiness
– Performance
• Focus on open source
• Drive Roadmap
• Competitive Advantage for C1
32. Stream Processing – Apache Storm
• An open-source, distributed, real-time computation system
– Logical operators (spouts and bolts) form statically parallelizable topologies
– Very high throughput of messages with very low latency
– Can provide <10 ms end-to-end latency under normal operation
• Basic abstractions provide an at-least-once processing guarantee
Limitations
• Nimbus is a single point of failure
– Rectified by Hortonworks, but not yet available to the public (no timeline for release)
• Upstream bolt/spout failure triggers re-computation of the entire tuple tree
– Parallel independent streams can only be created via separate, redundant topologies
• Bolts/spouts share a JVM – hard to debug
• Failed tuples cannot be replayed sooner than 1 s
• No dynamic topologies
• Cannot add or remove applications without service interruption
33. Stream Processing – Apache Flink
• An open-source, distributed, real-time computation system
– Logical operators are compiled into a DAG of tasks executed by Task Managers
– Supports streaming, micro-batch, batch compute
– Supports aggregate operations on streams (reduce, join, groupBy)
– Capable of <10 ms end-to-end latency with streaming under normal operation
• Can provide exactly-once processing guarantees
Limitations
• Failures trigger reset of ALL operators to last checkpoint
– Depends on upstream message broker to track state
• Operators share JVM
– Failure in one brings down all tasks sharing that JVM
– Hard to debug
• No dynamic topologies
• Young community, young product
34. Stream Processing – Apache Apex
• An open-source, distributed, real-time computation system on YARN
• Apex is the core system powering DataTorrent, released under ASF
• Demonstrated high throughput with low latency running a next-generation
C1 model (avg. 0.25ms, max 2ms, @ 70k records/sec) w/ 600GB RAM
• True YARN application developed on the principles of Hadoop and YARN
at Yahoo!
• Mature product
– Core principles of Apex derive from a proven solution at Yahoo Finance and Yahoo Hadoop
– Operability is a first-class citizen in Apex, with a focus on enterprise capabilities
• DataTorrent (Apex) is executing on production clusters at Fortune 100
companies.
35. Stream Processing – Apache Apex
Maturity
• Designed to process and manage global data for Yahoo! Finance
– Primary focus is on stability, fault-tolerance and data management
– The only OSS streaming technology considered that was designed explicitly for the financial world
• Data or computation could never be lost or duplicated
• Architecture had to never go down
• Goal was to make it rock-solid and enterprise-ready before worrying about performance
• Data flows across countries – perfect for a use case that requires cross-cluster interaction
Enterprise Readiness
• Advanced support for:
– Encryption, authentication, compression, administration, and monitoring
– Deployment at scale in the cloud and on-prem – AWS, Google Cloud, Azure
• Integrates with huge set of existing tools:
– HDFS, Kafka, Cassandra, MongoDB, Redis, ElasticSearch, CouchDB, Splunk, etc.
36. Apex Platform – Summary
• Apex Architecture
– Networks of physically independent, parallelizable operators that scale dynamically
– Dynamic topology modification and deployment
– Self-healing, fault tolerant, & recoverable
• Durable messaging queues between operators, check-pointed in memory and on disk
• Resource manager is a replicated YARN process, monitors and restarts downed operators
– No single point of failure, highly modular design
– Can specify locality of execution (avoids network and inter-process latency)
• Guarantees at-least-once, at-most-once, or exactly-once processing
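The difference between these guarantees is easiest to see with a toy model: at-least-once delivery may replay tuples after a failure, and an ID-based deduplicating sink makes the observable effects exactly-once. This is a framework-agnostic sketch, not Apex's actual mechanism:

```python
class ExactlyOnceSink:
    """Turns at-least-once delivery into exactly-once effects by
    remembering which tuple IDs have already been applied."""
    def __init__(self):
        self.seen = set()
        self.total = 0

    def apply(self, tuple_id, value):
        if tuple_id in self.seen:   # duplicate redelivery: ignore
            return False
        self.seen.add(tuple_id)
        self.total += value
        return True

sink = ExactlyOnceSink()
# At-least-once delivery may replay tuples after a failure:
deliveries = [(1, 10), (2, 20), (2, 20), (3, 30), (1, 10)]
for tid, val in deliveries:
    sink.apply(tid, val)
print(sink.total)  # 60: each tuple counted once despite redeliveries
```

At-most-once, by contrast, would simply drop unacknowledged tuples instead of replaying them, trading completeness for simplicity.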
[Diagram: a directed acyclic graph (DAG) of operators connected by tuple streams, terminating in an output stream]
37. Apex Platform – Overview
38. Apex Platform – Malhar
39. Apex Platform – Cluster View
[Diagram: a Hadoop edge node hosts the DT RTS management server, reachable via CLI and REST API; Hadoop nodes run YARN containers – one for the RTS App Master, the others as streaming containers whose threads (Thread1 … Thread-N) host operators Op1, Op2, Op3. Part of Community Edition.]
40. Apex Platform – Operators
• Operators can be dynamically scaled
• Flexible stream configuration
• Parallel Redis / HDHT DAGs
• Separate visualization DAG
• Parallel partitioning
• Durability of data
• Scalability
• Organization for in-memory store
• Unifiers
– Combine statistics from physical partitions
41. Dynamic Topology Modification
• Can redeploy new operators and models at run-time!
• Can reconfigure settings on the fly
42. Apex Platform – Failure Recovery
• Physical independence of partitions is critical
• Redundant STRAMs
• Configurable window size and heartbeat for low-latency recovery
• Downstream failures do not affect upstream components
– Snapshotting only depends on previous operator, not all previous operators
– Can deploy parallel DAGs with the same point of origin (simpler from a hardware and deployment perspective)
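Per-operator recovery can be sketched framework-agnostically: each operator checkpoints its own state, and on failure only that operator rolls back, leaving the rest of the DAG untouched. A toy illustration (not Apex's actual implementation):

```python
import copy

class Operator:
    """Toy stateful operator with an independent checkpoint."""
    def __init__(self, fn):
        self.fn = fn
        self.state = 0        # example state: running sum of outputs
        self._snapshot = 0

    def process(self, x):
        y = self.fn(x)
        self.state += y
        return y

    def checkpoint(self):
        self._snapshot = copy.deepcopy(self.state)

    def recover(self):
        # Only THIS operator rolls back; upstream operators keep running.
        self.state = copy.deepcopy(self._snapshot)

double = Operator(lambda x: 2 * x)
for x in (1, 2, 3):
    double.process(x)
double.checkpoint()   # snapshot state (2 + 4 + 6 = 12)
double.process(4)     # state grows to 20
double.recover()      # simulate a failure: roll back this operator only
print(double.state)   # 12
```

The contrast with a global-checkpoint design (as in Flink's described behavior) is that there, a single failure would reset *every* operator to its last snapshot.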
43. Apex Platform – Windowing
• Sliding and tumbling windows
• Windows based on checkpoints
• No artificial latency
• Used for stats measurement
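The two window types can be illustrated with count-based windows; a minimal sketch (the deck's windows are checkpoint-based, so this is only an analogy):

```python
def tumbling(events, size):
    """Non-overlapping windows: each event falls into exactly one window."""
    return [events[i:i + size] for i in range(0, len(events), size)]

def sliding(events, size, step=1):
    """Overlapping windows that advance by `step` events at a time."""
    return [events[i:i + size] for i in range(0, len(events) - size + 1, step)]

events = [1, 2, 3, 4, 5, 6]
print(tumbling(events, 2))   # [[1, 2], [3, 4], [5, 6]]
print(sliding(events, 3))    # [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]]
```

"No artificial latency" refers to not buffering events into micro-batches before processing; windows here are a bookkeeping construct, not a delay.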
44. Enterprise Readiness
• Apex
– Great UI to monitor, debug, and control system performance
– Fault tolerance and recovery out of the box – no additional setup or improvement needed
• YARN is still a single point of failure; a NameNode failure can still impact the system
– Built-in support for dynamic and automatic scaling to handle larger throughputs
– Native integration with Hadoop, YARN, and Kafka – the next-gen standard at C1
– Mature product
• Principles derived from years at Yahoo Finance and Yahoo Hadoop
• Built and planned by deep Hadoop and streaming experts
– Proven performance in production at Fortune 100 companies
45. Enterprise Readiness
• Storm
– Widely used but abandoned by creators at Twitter for Heron in production
• Storm debuggability – topology components are bundled in one process
• Resource demands
– Need dedicated hardware
– Can’t scale on demand or share usage
• Topology creation/tear-down is expensive, topologies can’t share cluster resources
– Have to manually isolate & de-commission machines
– Performance in failure scenarios is insufficient for this use-case
• Flink
– Operational performance has not been proven
• Only one company (ResearchGate) officially uses Flink in production
– Architecture shares Storm's fundamental limitations around dynamically scaling operators & topologies, and around debuggability
– Performance in failure scenarios is insufficient for this use-case
46. Performance
• Storm
– Meets latency and throughput requirements only when no failures occur.
– Resilience to failures only possible by running fully independent clusters
– Difficult to debug and operationalize complex systems (due to shared JVM and poor
resource management)
• Flink
– Broader toolset than Storm or Apex – ML, batch processing, and SQL-like queries
– Meets latency and throughput requirements only when no failures occur.
– Failures reset ALL operators back to the source – resilience only possible across
clusters
– Difficult to debug and operationalize complex systems (due to shared JVM)
• Apex
– Supports redundant parallel pipelines within the same cluster
– Outstanding latency and throughput even in failure scenarios
– Self-healing independent operators (simple to isolate failures)
– Only framework to provide fine-grained control over data and compute locality
47. Roadmap – Storm
• Commercial support from Hortonworks, but limited code contributions
• Twitter - Storm’s largest user - has completely abandoned Storm for Heron
• Business Continuity
– Enhance Storm’s enterprise readiness with high availability (HA) and failover to standby
clusters
– Eliminate Nimbus as a single point of failure
• Operations
– Apache Ambari support for Nimbus HA node setup
– Elastic topologies via YARN and Apache Slider.
– Incremental improvements to Storm UI to easily deploy, manage and monitor
topologies.
• Enterprise readiness
– Declarative writing of spouts, bolts, and data-sources into topologies
48. Roadmap – Flink
• Fine-grained fault tolerance (avoid rollback to data source) – Q2 2015
• SQL on Flink – Q3/Q4 2015
• Integrate with distributed memory storage – No ECD
• Use off-heap memory – Q1 2015
• Integration with Samoa, Tez, Mahout DSL – No ECD
49. Roadmap – Apex
• Roadmap for next 6 months
• Support creation of reusable pluggable modules (topologies)
• Add additional operators to connect to existing technology
– Databases
– Messaging
– Modeling systems
• Add additional SQL-like operations
– Join
– Filter
– GroupBy
– Caching
• Add ability to create cycles in graph
– Allows re-use of data for ML algorithms (similar to Spark’s caching)
50. Roadmap Comparison
• Storm
– The roadmap is intended to bring Storm to enterprise readiness; Storm is not enterprise-ready today, according to Hortonworks
• Flink
– Roadmap brings Flink up to par with Spark and Apex, does not create new capabilities
relative to either
– Spark is more mature for batch-processing and micro-batch and Apex is more mature
from a streaming standpoint.
• Apex
– No need to improve core architecture, focus is instead on adding functionality
• Better support for ML
• Better support for wide variety of business use cases
• Better integration with existing tools
– Stated commitment to letting the community dictate direction. From incubator proposal:
• “DataTorrent plans to develop new functionality in an open, community-driven way”
51. Community
• Vendor and community involvement drive roadmap and project growth
• Storm
– Limited improvements to core components of Storm in recent months
– Few focused and active committers
– Actively promoted and supported in public by Hortonworks
• Flink
– Some adoption in Europe, growing response in U.S.
– 11 active committers, 10 are from Data Artisans (company behind Flink)
– Community is very young, but there is substantial interest
• Apex
– Wide support network around Apex due to its evolution alongside Hadoop and YARN
– Young but actively growing community: http://incubator.apache.org/projects/apex.html
– Opportunity for C1 to drive growth and define the direction of this product
52. Streaming Solutions Comparison
• Apex
– Ideal for this use case; meets all performance requirements and is ready for out-of-the-box enterprise deployment
– Committer status from C1 allows us to collaboratively drive roadmap and product
evolution to fit our business need.
• Storm
– Great for many streaming use cases but not the right fit for this effort
– Performance in failure scenarios does not meet our requirements
– Community involvement is waning and there is a limited road map for substantial
product growth
• Flink
– Poised to compete with Spark in the future based on community activity and roadmap
– Not ready for enterprise deployment:
• Technical limitations around fault-tolerance and failure recovery
• Lack of broad community involvement
• Roadmap only brings it up to par with existing frameworks
53. New Capabilities Provided by Proposed Architecture
• Millisecond Level Streaming Solution
• Fault Tolerant & Highly Available
• Parallel Model Scoring for Arbitrary Number of Models
• Quick Model Generation & Execution
• Dynamic Scalability based on Latency or Throughput
• Live Model Refresh
• A/B Testing of Models in Production
• System is Self Healing upon failure of components (**)
54. Decisioning System Architecture – Strengths
• Internal
– Capital One software, running on Capital One hardware, designed by Capital One
• Open source
– Internally maintainable code
• Living Model
– Can be re-trained on current data & updated in minutes, not years
– Offline models can expanded and re-developed and deployed to production at will
• Extensible
– Modular architecture with swappable components
• A/B Model Testing in Production
• Dynamic Deployment / Refresh of Models
55. Hardware
MDC Hardware Specifications
• Server quantity – 15
• Server model – Supermicro
• CPU – Intel Xeon E5-2695v2, 2.4 GHz, 12 cores
• Memory – 256 GB
• HDD – (5) 4 TB Seagate SATA
• Network switch – Cisco Nexus 6001 10Gb
• NIC – 2-port SFP+ 10GbE
MDC Software Specifications
• Hadoop – v2.6.0
• YARN – v2.6.0
• Apache Apex – v3.0
• Linux OS – RHEL v6.7
• Linux kernel – 2.6.32-573.7.1.el6.x86_64
56. Performance Comparison – Redis vs. Apex-HDHT
Apex-HDHT, thread-local, ~2M events – percentiles (in ms):
Throughput   Count        Avg (ms)   90%   95%   99%   99.9%   4 9's   5 9's   6 9's
70k/sec      1,807,283    0.253      1     1     1     2       2       2       2

Apex-HDHT, thread-local, ~54M events – percentiles (in ms):
Throughput   Count        Avg (ms)   90%   95%   99%   99.9%   4 9's   5 9's   6 9's
70k/sec      54,126,122   0.19       1     1     1     2       2       5       6

Apex-HDHT, no locality, ~2M events – percentiles (in ms):
Throughput   Count        Avg (ms)   90%   95%   99%   99.9%   4 9's   5 9's   6 9's
40k/sec      2,214,777    51.651     98    126   381   489     494     495     495

Redis, thread-local, ~2M events – percentiles (in ms):
Throughput   Count        Avg (ms)   90%   95%   99%   99.9%   4 9's   5 9's   6 9's
8.5k/sec     2,018,057    13.654     16    18    20    21      22      22      22
----- Speaker Notes -----
• Spark Streaming is missing from the comparison – a nonstarter due to micro-batching and its lack of dynamic DAG reconfiguration.
• Storm: fast, easy to use, mature. But failures are not independent, Nimbus is a single point of failure, there are no dynamic topologies, failed tuples cannot be replayed sooner than 1 s, and resource usage is heavy. The community is stagnating, with only Hortonworks behind it, and the roadmap is still about bringing it to enterprise readiness (YARN integration, elastic topologies, high availability).
----- Meeting Notes (10/27/15 10:14) -----
• Flink: easy to use, fast, with support for SQL-like queries. But failures reset processing to the upstream data source, operators share a JVM, and there are no dynamic topologies. The community is young. Roadmap: fine-grained fault tolerance, in-memory store integration, off-heap memory, full SQL.
• Apex: built by veterans from Yahoo! Finance and Hadoop, for enterprise stability and durability before performance. Dynamic topologies; downstream components do not affect upstream ones; fine-grained control of locality; no single point of failure; operators and partitions are independent; auto-scaling on throughput and latency; batch, micro-batch, and true streaming.