Hi, this is
simonsuo
“To provide a conceptual framework
for designing a dispatch engine that
reacts to a request by gathering
various inputs, dispatching requests
with the inputs to some pricing
engines, then reassembling the results
into a form the original requestor can
comprehend.” – Andrei
Introducing
PHOBOS & DEIMOS
DEIMOS: short-term goal
Debugging:
store more, log faster
DEIMOS: long-term goal
Service-oriented performance profiling
DEIMOS: high-level architecture
Ingestion / Buffering: Apache Kafka
Computation / Indexing: Apache Storm
Storage: Apache HBase
A bunch of cool toys
Kafka: concept
“Distributed publish-subscribe
message queue”
Kafka: message queue
Kafka: Storm integration
Storm: concept
“Distributed real-time
computation graph”
Storm: topology
Example topology: a Number Spout emits a stream of integers [1, 2, 3, 4, …]; an Even Bolt receives [2, 4, …] and an Odd Bolt receives [1, 3, …]; a Pair Bolt joins the two streams into pairs [(1, 2), (3, 4), …]; and a Log Bolt writes the results to a data store.
Storm: parallelism
Storm: tuning guidelines
• 1 worker per node per topology
• 1 executor per core for CPU-bound tasks
• 1-10 executors per core for IO-bound tasks
• Compute the total parallelism possible and distribute it amongst slow and fast tasks: high parallelism for slow tasks, low for fast tasks.
Storm: topology code
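The code from this slide is not captured in the transcript. As a stand-in, here is a minimal sketch of how such a topology might be wired up with the backtype.storm / storm-kafka API of that era; the baslog topic name, the rootRequestId|… wire format, and the ParseBolt / LogBolt bodies are assumptions for illustration, not the original code.

```java
import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.spout.SchemeAsMultiScheme;
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;
import storm.kafka.KafkaSpout;
import storm.kafka.SpoutConfig;
import storm.kafka.StringScheme;
import storm.kafka.ZkHosts;

public class DeimosTopology {

    /** Parses a raw log line and emits (rootRequestId, event). The wire format is assumed. */
    public static class ParseBolt extends BaseBasicBolt {
        @Override
        public void execute(Tuple input, BasicOutputCollector collector) {
            String raw = input.getString(0);
            String rootRequestId = raw.split("\\|", 2)[0]; // assumed "rootRequestId|..." format
            collector.emit(new Values(rootRequestId, raw));
        }
        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("rootRequestId", "event"));
        }
    }

    /** Stand-in for the LogBolt of the architecture slide; the real one writes to HBase. */
    public static class LogBolt extends BaseBasicBolt {
        @Override
        public void execute(Tuple input, BasicOutputCollector collector) {
            System.out.println("store: " + input.getStringByField("event"));
        }
        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) { }
    }

    public static void main(String[] args) throws Exception {
        SpoutConfig spoutConf = new SpoutConfig(
                new ZkHosts("zk1:2181"), "baslog", "/deimos", "deimos-spout");
        spoutConf.scheme = new SchemeAsMultiScheme(new StringScheme());

        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("kafka-spout", new KafkaSpout(spoutConf), 4);
        builder.setBolt("parse", new ParseBolt(), 4).shuffleGrouping("kafka-spout");
        // Group by rootRequestId so every event of one request chain lands on the same task.
        builder.setBolt("log", new LogBolt(), 8)
               .fieldsGrouping("parse", new Fields("rootRequestId"));

        Config conf = new Config();
        conf.setNumWorkers(4); // one worker per node per topology, per the guidelines above
        new LocalCluster().submitTopology("deimos", conf, builder.createTopology());
    }
}
```

The fieldsGrouping on rootRequestId mirrors the ingestion design described later in the deck: routing all events of one request chain to the same bolt task keeps the chain together.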
HBase: concept
“Distributed, non-relational,
key/value store based on HDFS”
HBase: schema design
Raw Table: "logTable"
• Row key: rootRequestId|requestId (e.g. rootRequestId0|requestId0, rootRequestId0|requestId1, rootRequestId1|requestId0, …)
• Column families: "rrh", "rrb", "mh", "mb"
• Qualifiers within each family: entryId0, entryId1, …
Uuid/LogTime Index Table: "indexTable"
• Row key: logTime|uuid (e.g. logTime0|uuid0, logTime0|uuid1, logTime1|uuid0, …)
• Column family: "rrh"
• Qualifiers: rootRequestId0, rootRequestId1, …
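To make the schema concrete, here is a hedged sketch of writing one request's data with the HBase client API of that era (HTable/Put); the payloads are placeholders, and the deck does not spell out what the family names "rrh", "rrb", "mh", "mb" stand for.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class LogWriter {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();

        // Raw table: row key is rootRequestId|requestId, one family per content type.
        HTable logTable = new HTable(conf, "logTable");
        Put log = new Put(Bytes.toBytes("rootRequestId0|requestId0"));
        log.add(Bytes.toBytes("rrh"), Bytes.toBytes("entryId0"),
                Bytes.toBytes("<placeholder payload>"));
        log.add(Bytes.toBytes("rrb"), Bytes.toBytes("entryId0"),
                Bytes.toBytes("<placeholder payload>"));
        logTable.put(log);

        // Index table: row key is logTime|uuid, one qualifier per rootRequestId.
        HTable indexTable = new HTable(conf, "indexTable");
        Put idx = new Put(Bytes.toBytes("logTime0|uuid0"));
        idx.add(Bytes.toBytes("rrh"), Bytes.toBytes("rootRequestId0"), new byte[0]);
        indexTable.put(idx);

        logTable.close();
        indexTable.close();
    }
}
```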
HBase: storm integration
HBase: retrieval
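The retrieval slide's code is not in the transcript either. Below is a sketch of the two-step lookup the schema implies: scan the index table over a log-time range to find root request ids, then prefix-scan the raw table for each request chain. The row-key boundaries are assumptions.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class LogReader {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable indexTable = new HTable(conf, "indexTable");
        HTable logTable = new HTable(conf, "logTable");

        // 1. Scan the index over a log-time range to collect root request ids.
        Scan byTime = new Scan(Bytes.toBytes("logTime0|"), Bytes.toBytes("logTime1|"));
        ResultScanner hits = indexTable.getScanner(byTime);
        for (Result hit : hits) {
            for (byte[] rootRequestId : hit.getFamilyMap(Bytes.toBytes("rrh")).keySet()) {
                // 2. Prefix-scan the raw table for every request in that chain.
                Scan chain = new Scan(
                        Bytes.add(rootRequestId, Bytes.toBytes("|")),
                        Bytes.add(rootRequestId, Bytes.toBytes("~"))); // '|' < '~' in ASCII
                ResultScanner rows = logTable.getScanner(chain);
                for (Result row : rows) {
                    System.out.println(row);
                }
                rows.close();
            }
        }
        hits.close();
        indexTable.close();
        logTable.close();
    }
}
```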
DEIMOS: detailed implementation
Kafka cluster: MARS & pricing tasks (via the marsloggerclient library) and BAS send log events to xaplog, which publishes them to Kafka with the Kafka producer library.
Storm cluster: a KafkaSpout consumes the queue and feeds an IndexBolt and a LogBolt.
HBase cluster: the bolts write to the Index and Storage tables; mttsvc (Java, using the HBase client library) reads them back and serves the MTT MLOG (Terminal) and MTT WEB (PHP) front ends over BAS.
DEIMOS: extension
HBase cluster: the Index and Storage tables feed a Hadoop Archive Job that writes to an Archive, and feed Elasticsearch / Solr for interactive analytics.
Stuff I learned the hard way
• Debugging is difficult (dbxtool > ddd)
• Always check the version numbers of open-source
libraries
• The right balance between planning and doing
• Use bcpc if you want to test things
• BASO is great
• Reading a book might be better than googling
Q&A
Scalable Logging
To Assess and Improve Performance
Problem Statement
A customer is shouting at me!
How do I find what happened quickly?
How do I prevent it next time?
How can I anticipate entirely new problems?
Use Cases
(needed today)
• Debugging
– Goal: Investigate complaints by looking at the inputs that
went into a specific request.
– What needs to be fixed: We do not log everything, so a lot of time is wasted trying to reproduce customer problems instead of having the data already there.
– Motivation: We once spent a week tracking down reproduction data because the logging subsystem cannot handle full selective BAEL logging in production.
Use Cases
(planning for tomorrow)
• Automated Request Audit
– Goal: Know the exact inputs, the path a request took through the input system, and the outputs provided (all based on the logs received).
– What needs to be fixed: We have no way to analyze the requests we receive except manually, one at a time. We cannot go back in time to perform hypothesis testing and automatic auditing of requests according to rules.
– Motivation: Recent malformed requests caused one of our daemons to throw an exception and crash because the number of scenarios did not match the number of dates in the input. It is not possible to see how many malformed requests we got in the past, or to detect this condition in production, without deploying new code into the system itself.
Use Cases
(planning for tomorrow)
• Aggregation of end-to-end trends
– Goal: Anomaly (spike/dip) detection (define a window and build a historical
distribution for the data).
– What needs to be fixed: Need to establish expected SLAs for each kind of request
received based on input sizes and estimations of downstream system performance.
– Motivation: The MARS team received a complaint about processing being too slow. We had no baseline, so we had to use trial and error to determine what could be pushed through the system. A lot of guesswork.
• Operational analysis of the dependent systems
– Goal: Capacity planning and performance optimization.
– What needs to be fixed: Problem detection by analyzing deviation from
historical trends for:
• Processing rates, error rates, and response times.
– Motivation: When the downstream mortgage services started throwing errors, it took a lot of manual reproduction attempts to figure out the cause.
The Challenge
We are reactive instead of proactive
Need More Data
Data-driven Evolution In Making Operational Substitutions
Definitions
• A log is some arbitrary sequence of events ordered in time representing state that
we want to preserve for later retrieval
• An event is a tuple representing an occurrence of
– Input system (system type + specific instance)
– Event time (start and end time)
– Event ID and Parent Event ID (to establish causation)
– Location (OS and Bloomberg process/task/machine name)
– Privilege information (UUID)
– Event data – can be an arbitrary object (the input system provides direction on how to interpret the event data)
• Conceptually, the events are stored as a directed acyclic graph with a start node, where each node represents an event (see the MTT tool as an example)
• Input systems
– Other systems that provide the event stream
– Two main input system types:
• BAEL entries
• BAS requests
– The only currently targeted input instance is MARS
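The event tuple defined above maps naturally onto a small value type. A sketch with illustrative field names (none of these names come from the deck):

```java
import java.time.Instant;
import java.util.Map;

/** One log event, mirroring the tuple in the definitions above. Field names are illustrative. */
public final class LogEvent {
    public final String systemType;      // input system type, e.g. "BAEL" or "BAS"
    public final String systemInstance;  // specific instance, e.g. "MARS"
    public final Instant startTime;      // event start time
    public final Instant endTime;        // event end time
    public final String eventId;
    public final String parentEventId;   // establishes causation; null for the root
    public final String location;        // OS and Bloomberg process/task/machine name
    public final long uuid;              // privilege information
    public final Map<String, Object> data; // arbitrary object, interpreted by the input system

    public LogEvent(String systemType, String systemInstance, Instant startTime,
                    Instant endTime, String eventId, String parentEventId,
                    String location, long uuid, Map<String, Object> data) {
        this.systemType = systemType;
        this.systemInstance = systemInstance;
        this.startTime = startTime;
        this.endTime = endTime;
        this.eventId = eventId;
        this.parentEventId = parentEventId;
        this.location = location;
        this.uuid = uuid;
        this.data = data;
    }
}
```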
Overall Architecture
Event feed – take responsibility for logging events
• MARS daemons – Send the actual log events to xaplog instances.
• xaplog instances – Receive log events and forward them to the Kafka instance.
• Kafka
– Middleware to queue messages; it is scalable and durable.
– Once Kafka accepts an event, the associated xaplog instance is freed of any further obligations.
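The xaplog-to-Kafka handoff could look like the following sketch. It uses the current Java producer client rather than whatever library xaplog actually used, and the topic name, key choice, and payload format are assumptions. acks=all approximates the durability handoff described above: xaplog is released only once the queue owns the event.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class XaplogForwarder {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka1:9092");
        props.put("acks", "all"); // only acknowledge once the queue durably owns the event
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Key by rootRequestId so one request chain stays in one partition.
            producer.send(new ProducerRecord<>("baslog", "rootRequestId0",
                    "rootRequestId0|requestId0|<event payload>"));
        }
    }
}
```

Keying by rootRequestId keeps each request chain in a single partition, which preserves per-chain ordering for the downstream grouping step.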
Ingestion – group related
events together
• Kafka – Collects events into two main
queues.
– First queue: BAS messages
– Second queue: BAEL messages
– Log events are persisted onto disk.
– Serves as a shock absorber to handle
bursts in log event traffic (since it just
stores the messages, it doesn’t have to
process them).
• The rest of the system should be
designed to handle the average load
case.
• Storm Ingestion Topology – Groups
event stream by root request.
• Partitioner – Holds grouped events together.
Encoding – efficiently code
the event stream at the
binary level
• Partitioner – Writes the same request
chain under the same rows in HBase.
– The data is split into three main content
types:
• BAS/BAEL headers
• BAS string data (XML)
• BAEL string data (trace information)
• Storm Encoding Topology – Writes each
group of events as one BLOB – with
special coding tailored to data type (i.e.
header data, XML, text).
• Log warehouse – Encoded blobs are
written to different tables for longer-
term archiving.
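The deck only says the encoding is "special coding tailored to data type". As one plausible shape for the XML content type, here is a sketch that length-prefixes each payload in a request chain and gzips the group into a single blob; gzip is a stand-in, not the actual codec.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.zip.GZIPOutputStream;

public class BlobEncoder {
    /** Concatenates one request chain's XML payloads and compresses them as a single blob. */
    public static byte[] encodeXmlGroup(List<String> xmlPayloads) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            for (String xml : xmlPayloads) {
                byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
                // Length-prefix each entry so the group can be split apart on read.
                gz.write(intToBytes(bytes.length));
                gz.write(bytes);
            }
        }
        return bos.toByteArray();
    }

    private static byte[] intToBytes(int v) {
        return new byte[] { (byte) (v >>> 24), (byte) (v >>> 16), (byte) (v >>> 8), (byte) v };
    }
}
```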
Indexing – speed up
access to relevant fields
for interactive querying
• Log warehouse – By storing similar
data together with specialized
encoding, it can significantly reduce
storage costs.
• Storm Indexing Topology – Extracts
the relevant subset of data to feed
the indexes.
• Indexes – Underlying
implementation of the indexes. Basic
ones can be stored in HBase. More
complicated ones can be stored in
ElasticSearch/Solr.
Querying – let users
lookup the event stream
• Indexes / log warehouse
– User queries hit the indexes first.
– If additional data is needed and is not
available in an index, the query falls
through to the warehouse.
• xapqrylg – New daemons to marshal
requests from the UIs.
• MTT UIs – Would be unchanged.
More improvements can be added
later.
Phase I tasks
Replace MTT backend
• Code in xaplog to send events to Kafka queue
– Kafka & Storm will live on BCPC for proof-of-concept, need to see about production
– See if we can reuse what the pricing history team did?
• Maybe not; it should just be a simple push.
• Design Kafka queue layout (partitioning and topics)
– Two topics: BAS and BAEL
• Maybe three later: BAS lite, BAS xml, and BAEL – to decouple the ingestion rates if better latency is needed???
– Look at the best settings and make sure DRQS 54369477 doesn’t apply
• Storm Ingestion topology & HBase schema (in Java)
– Write each header-data row separately and let the encoding aggregate them.
– Blobs do not need any ingestion right now, they can be written to target table directly.
• Storm Encoding topology & HBase schema (in Java)
– Keeping it simple for now. Split up XML blobs from rest of data.
– Store all non-blob data grouped by root request id (protobuf??)
– For blob data do some basic XML to binary, and as part of key order responses and requests together.
– How do we ensure that if the same log data is fed more than once it only gets written once? (see the sketch after this list)
• Storm Indexing topology & HBase schema (in Java)
– A few simple indexes will live in HBase to allow query by UUID, date range, pricing #, and security.
– How to keep indexes synchronized with the warehouse tables?
• Xapqrylg – read HBase indexes and storage tables
– Reuse Kirill’s work on mttweb where it makes sense.
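One plausible answer to the write-once question above falls out of the schema itself: derive the HBase row key and qualifier deterministically from the event's identifiers, so replaying the same log data rewrites the same cell instead of creating a duplicate. A hedged sketch:

```java
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class IdempotentWrites {
    /** Builds a Put whose coordinates depend only on the event's identity. Feeding the
     *  same event twice rewrites the same cell; HBase versioning surfaces the latest,
     *  identical value, so the replay is a no-op in effect. */
    public static Put putFor(String rootRequestId, String requestId,
                             String family, String entryId, byte[] payload) {
        Put put = new Put(Bytes.toBytes(rootRequestId + "|" + requestId));
        put.add(Bytes.toBytes(family), Bytes.toBytes(entryId), payload);
        return put;
    }
}
```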
Q&A
"Go ahead, make my day.“ -Harry
Key Properties
…of a useful event stream logging
system
Required Properties
1. Ownership - It accepts logging data and takes responsibility so that input systems are freed from
offering any guarantees after handoff (logging is not the main task of input systems, just a side
effect)
1. Makes it easy to generate IDs to link events in a tree
2. Two main causal link models can be considered (the explicit one is preferred):
1. Explicitly, by having each event carry a parent event id as well as its own event id
2. Implicitly, by having a root request id and then ordering by event time and ingestion order
2. Durability - reduce chances of data loss, especially in the event of crashes
3. Idempotence - It correctly handles the same input log data if sent into the system more than once
1. Due to failures, input systems might send the same data twice – client-side problems are easy to handle: just send the data again
2. To support batch input of the data from other sources (“bulk import”) – to stand up another instance of the system
or migration from other systems in a consistent fashion
3. Replaying existing log data to simplify re-indexing and related side-effects
4. Time-invariance - Does not expect the event stream to be time ordered (even though it usually will be); the outputs may differ while data is still arriving, but once the exact same overall data has been fed to the system, the outputs should be the same
5. Avoiding Lock-in - Allows easy export of data in bulk into a neutral form
1. for exporting into other systems or into another instance
2. don’t want the data to be stranded
6. Scalable – scales as close to linearly as possible, so performance improves by just adding more machines.
Required Properties (cont’d)
7. High Availability – have some form of redundancy so that if machines in the
system fail the system can still operate, maybe in a degraded state (performance-
wise).
8. Manageable - Export metrics to support decisions on the operation of the system
9. Schema-agnostic - Is as schema-less as possible
1. requires only knowledge of the fields it needs to index on
2. otherwise shouldn’t care about the data being in a specific format
3. the input format should be akin to a nested JSON object
4. but with a parent id to correlate to a parent, and then ordered by time
10. Space-efficient - Ability to optimize binary storage to …
1. reduce disk space taken
2. improve read times
3. …at the expense of increased complexity and CPU cost when writing the data
Why Current Solutions Are Inadequate
• APDX (and TRACK – a functional subset of APDX)
– Collects only numerical metrics, with no ability to store arbitrary event data or causal relationships between events. It just counts events.
– It can be used in parallel, but it does not come close to meeting our needs.
• Splunk
– Lightweight analysis done based on:
• {TEAM MOB2:SPLUNK TUTORIAL<GO>}
• http://rndx.prod.bloomberg.com/questions/9584/how-should-we-do-distributed-
logging
– Main points that discourage further research:
• Splunk expects log lines only, with no arbitrary data.
– Hard to save space
• Cost is per log volume (uncompressed) – we expect to easily exceed 100GiB of raw
logging volume a day (supposedly that will be a one-time cost of $110k).
• Better suited as a higher level tool that we could maybe use on top.
Editor's Notes
  1. Hello everyone, my name is Simon Suo. I am a co-op student from the University of Waterloo and I have been working with Andrei on some exciting stuff over the past four months.
  2. This presentation is meant to showcase everything that I was told to do, what I actually did, and what I should have done.
  3. So what exactly was I told to do? There’s no better way to present this than showing the exact quote from the project proposal document I received. So here it is, in its glorious entirety.
  4. Upon arrival, I was told that there were more urgent matters to tend to before the more grandiose plan could be executed. I will be focusing on the DEIMOS project instead of the PHOBOS project. For us mere mortals who do not possess the extraordinary sense of humor that Andrei does: PHOBOS stands for “proving how our bottleneck opposes speed”, and DEIMOS stands for “data-driven evolution in making operational substitutions”. In plain English, they refer to the dispatcher redesign project and the scalable logging project, respectively. For those who are not aware, Phobos and Deimos are the names of the two moons of Mars. So it is quite clever, actually. Good job, Andrei.
  5. Let’s look at the high level architecture of such a scalable logging system. There are three major components to this system: data ingestion and buffering, computation and indexing, and finally storage.
  6. To achieve the performance and scalability we need, we explored many cool new technologies and evaluated their effectiveness.
  7. List of technologies I got to play with: Apache Kafka, Apache Storm, Apache HBase, Apache Cassandra, ZeroMQ, Cap’n Proto, Google Protocol Buffers, Google FlatBuffers.
  8. Key computation graph terminology: topology, spout, bolt.
  9. Key parallelism terminology: worker process, executors (threads), tasks. Settings: parallelism hint, max spout pending, time out.
  10. Obtained from a guideline published in a Spotify Labs blog.
  11. Defining a Storm topology
  12. Defining a Storm bolt
  13. HBase and Java BAS service code