SlideShare a Scribd company logo
TSAR
A TimeSeries AggregatoR
Anirudh Todi | Twitter | @anirudhtodi | TSAR
What is TSAR?
What is TSAR?
TSAR is a framework and service
infrastructure for specifying,
deploying and operating timeseries
aggregation jobs.

TimeSeries Aggregation at Twitter
TimeSeries Aggregation at Twitter
A common problem	

⇢Data products (analytics.twitter.com, embedded
analytics, internal dashboards)	

⇢Business metrics	

⇢Site traffic, service health, and user engagement
monitoring	

!
Hard to do at scale	

⇢10s of billions of events/day in real time	

!
Hard to maintain aggregation services once deployed - 	

⇢Complex tooling is required

TimeSeries Aggregation at Twitter
TimeSeries Aggregation at Twitter
Many time-series applications look similar	

⇢ Common types of aggregations	

⇢ Similar service stacks	

!
Multi-year effort to build general solutions	

⇢Summingbird - abstraction library for generalized
distributed computation	

!
TSAR - an end-to-end aggregation service built on
Summingbird	

⇢Abstracts away everything except application's data
model and business logic
A typical aggregation
Event logs
Extract
Group
Measure
Store
[ (“/“ , 300, iPhone,…)
, (“/favorites”, 300, iPhone,…)
, (“/replies” , 300, Anroid,…)
, (“/“ , 200, Web,…)
, (“/favorites”, 200, Web,…)
, (“/“ , 200, iPhone,…)
]
[ (“/“ , 300)
, (“/favorites”, 300)
, (“/replies” , 300)
, (“/“ , 200)
, (“/favorites”, 200)
, (“/“ , 200)
]
[ (“/“
, [ (“/“, 300)
, (“/“, 200 )
, (“/“, 200 )
]
)
, (“/favorites”
, [ (“/favorites”, 300)
, (“/favorites”, 300)
]
)
, (“/replies”
, [ (“/replies”, 300)
]
)
]
[ (“/“ , 3)
, (“/favorites”, 2)
, (“/replies” , 1)
]
key-value
Vertica
Example: API aggregates
Example: API aggregates
!
⇢Bucket each API call 	

!
⇢Dimensions - endpoint, datacenter, client
application ID 	

!
⇢Compute - total event count, unique users, mean
response time etc	

!
⇢Write the output toVertica
Example: Impressions by Tweet
Example: Impressions by Tweet
!
⇢Bucket each impressions by tweet ID	

!
⇢Compute total count, unique users	

!
⇢Write output to a key-value store	

!
⇢Expose output via a high-SLA query service	

!
⇢Write sample of data toVertica for cross-validation
Problems
Problems!
⇢Service interruption: Can we retrieve lost data?	

!
⇢Data schema coordination: Store output as log
data, in a key-value data store, in cache, and in
relational databases	

!
⇢Flexible schema change	

!
⇢Easy to backfill and update/repair historical data	

!
Most important: Solve these problems in a general
way
TSAR’s design principles
TSAR’s design principles
1) Hybrid computation: Build on Summingbird, process
each event twice - in real time & in batch (at a later
time)	

!
⇢Gives stability and reproducibility of batch	

⇢Streaming (recency) of realtime	

!
Leverage the Summingbird ecosystem:	

⇢Abstraction framework over computing platforms	

⇢Rich library of approximation monoids (Algebird)	

⇢Storage abstractions (Storehaus)	

!
!
TSAR’s design principles
TSAR’s design principles
2) Separate event production from event aggregation	

!
User specifies how to extract events from source data	

!
Bucketing and aggregating events is managed by TSAR	

!
!
TSAR’s design principles
TSAR’s design principles
3) Unified data schema: 	

⇢Data schema specified in datastore-independent way	

⇢Managed schema evolution & data transformation	

!
Store data on:	

⇢HDFS	

⇢Manhattan (key-value)	

⇢Vertica/MySQL	

⇢Cache 	

!
Easily extensible to other schemas (Cassandra, HBase, etc)
Data Schema Coordination
Event	

Log
Kafka	

Spout
Intermediate	

Aggregates
Manhattan
Vertica
• Schema consistent across stores	

• Update stores when schema evolves
TSAR’s design principles
TSAR’s design principles
4) Integrated service toolkit	

!
⇢One-stop deployment tooling	

!
⇢Data warehousing	

!
⇢Query capability	

!
⇢Automatic observability and alerting	

!
⇢Automatic data integrity checks
Tweet Impressions in TSAR
Tweet Impressions in TSAR
⇢Annotate each tweet with an impression count	

!
⇢Count = unique users who saw that tweet	

!
⇢Massive scalability challenge:	

• > 500MM tweets/day	

• tens of billions of impressions	

!
⇢Want realtime updates	

!
⇢Production ready and robust
A minimal Tsar project
Scala Tsar job Configuration File
Thrift IDL
struct TweetAttributes
{
1: optional i64 tweet_id
}
Thrift IDL
Tweet Impressions Example
Scala Tsar job
Tweet Impressions Example
aggregate {	

onKeys(	

(TweetId)	

) produce (	

Count	

) sinkTo (Manhattan)	

} fromProducer {	

ClientEventSource(“client_events”)	

.filter { event => isImpressionEvent(event) }	

.map { event =>	

val impr = ImpressionAttributes(event.tweetId)	

(event.timestamp, impr)	

}	

}	

Scala Tsar job
Tweet Impressions Example
Scala Tsar job
Tweet Impressions Example
aggregate {	

onKeys(	

(TweetId)	

) produce (	

Count	

) sinkTo (Manhattan)	

} fromProducer {	

ClientEventSource(“client_events”)	

.filter { event => isImpressionEvent(event) }	

.map { event =>	

val impr = ImpressionAttributes(event.tweetId)	

(event.timestamp, impr)	

}	

}
Scala Tsar job
Tweet Impressions Example
aggregate {	

onKeys(	

(TweetId)	

) produce (	

Count	

) sinkTo (Manhattan)	

} fromProducer {	

ClientEventSource(“client_events”)	

.filter { event => isImpressionEvent(event) }	

.map { event =>	

val impr = ImpressionAttributes(event.tweetId)	

(event.timestamp, impr)	

}	

}
Scala Tsar job
Tweet Impressions Example
aggregate {	

onKeys(	

(TweetId)	

) produce (	

Count	

) sinkTo (Manhattan)	

} fromProducer {	

ClientEventSource(“client_events”)	

.filter { event => isImpressionEvent(event) }	

.map { event =>	

val impr = ImpressionAttributes(event.tweetId)	

(event.timestamp, impr)	

}	

}
Dimensions
for job
aggregation
Scala Tsar job
Tweet Impressions Example
Scala Tsar job
Tweet Impressions Example
aggregate {	

onKeys(	

(TweetId)	

) produce (	

Count	

) sinkTo (Manhattan)	

} fromProducer {	

ClientEventSource(“client_events”)	

.filter { event => isImpressionEvent(event) }	

.map { event =>	

val impr = ImpressionAttributes(event.tweetId)	

(event.timestamp, impr)	

}	

}	

!
Scala Tsar job
Tweet Impressions Example
aggregate {	

onKeys(	

(TweetId)	

) produce (	

Count	

) sinkTo (Manhattan)	

} fromProducer {	

ClientEventSource(“client_events”)	

.filter { event => isImpressionEvent(event) }	

.map { event =>	

val impr = ImpressionAttributes(event.tweetId)	

(event.timestamp, impr)	

}	

}	

!
Scala Tsar job
Tweet Impressions Example
aggregate {	

onKeys(	

(TweetId)	

) produce (	

Count	

) sinkTo (Manhattan)	

} fromProducer {	

ClientEventSource(“client_events”)	

.filter { event => isImpressionEvent(event) }	

.map { event =>	

val impr = ImpressionAttributes(event.tweetId)	

(event.timestamp, impr)	

}	

}	

!
Metrics to
compute
Scala Tsar job
Tweet Impressions Example
Scala Tsar job
Tweet Impressions Example
aggregate {	

onKeys(	

(TweetId)	

) produce (	

Count	

) sinkTo (Manhattan)	

} fromProducer {	

ClientEventSource(“client_events”)	

.filter { event => isImpressionEvent(event) }	

.map { event =>	

val impr = ImpressionAttributes(event.tweetId)	

(event.timestamp, impr)	

}	

}	

!
Scala Tsar job
Tweet Impressions Example
aggregate {	

onKeys(	

(TweetId)	

) produce (	

Count	

) sinkTo (Manhattan)	

} fromProducer {	

ClientEventSource(“client_events”)	

.filter { event => isImpressionEvent(event) }	

.map { event =>	

val impr = ImpressionAttributes(event.tweetId)	

(event.timestamp, impr)	

}	

}	

!
What
datastores
to write to
Scala Tsar job
Tweet Impressions Example
Scala Tsar job
Tweet Impressions Example
aggregate {	

onKeys(	

(TweetId)	

) produce (	

Count	

) sinkTo (Manhattan)	

} fromProducer {	

ClientEventSource(“client_events”)	

.filter { event => isImpressionEvent(event) }	

.map { event =>	

val impr = ImpressionAttributes(event.tweetId)	

(event.timestamp, impr)	

}	

}	

!
Scala Tsar job
Tweet Impressions Example
aggregate {	

onKeys(	

(TweetId)	

) produce (	

Count	

) sinkTo (Manhattan)	

} fromProducer {	

ClientEventSource(“client_events”)	

.filter { event => isImpressionEvent(event) }	

.map { event =>	

val impr = ImpressionAttributes(event.tweetId)	

(event.timestamp, impr)	

}	

}	

!
Scala Tsar job
Tweet Impressions Example
aggregate {	

onKeys(	

(TweetId)	

) produce (	

Count	

) sinkTo (Manhattan)	

} fromProducer {	

ClientEventSource(“client_events”)	

.filter { event => isImpressionEvent(event) }	

.map { event =>	

val impr = ImpressionAttributes(event.tweetId)	

(event.timestamp, impr)	

}	

}	

!
Summingbird
fragment to
describe event
production.
Scala Tsar job
Tweet Impressions Example
aggregate {	

onKeys(	

(TweetId)	

) produce (	

Count	

) sinkTo (Manhattan)	

} fromProducer {	

ClientEventSource(“client_events”)	

.filter { event => isImpressionEvent(event) }	

.map { event =>	

val impr = ImpressionAttributes(event.tweetId)	

(event.timestamp, impr)	

}	

}	

!
Summingbird
fragment to
describe event
production.
Scala Tsar job
Tweet Impressions Example
aggregate {	

onKeys(	

(TweetId)	

) produce (	

Count	

) sinkTo (Manhattan)	

} fromProducer {	

ClientEventSource(“client_events”)	

.filter { event => isImpressionEvent(event) }	

.map { event =>	

val impr = ImpressionAttributes(event.tweetId)	

(event.timestamp, impr)	

}	

}	

!
Summingbird
fragment to
describe event
production.
There is no
aggregation logic
specified here
Scala Tsar job
Config(
base = Base(
namespace = 'tsar-examples',
name = 'tweets',
user = 'tsar-shared',
thriftAttributesName = 'TweetAttributes',
origin = '2014-05-15 00:00:00 UTC',
!
jobclass = 'com.twitter.platform.analytics.examples.TweetJob',
!
outputs = [
Output(sink = Sink.IntermediateThrift, width = 1 * Day),
Output(sink = Sink.Manhattan1, width = 1 * Day)
],
...
Configuration File
What has been specified?
What has been specified?
⇢Our event schema (in thrift)	

!
⇢How to produce these events	

!
⇢Dimensions to aggregate on	

!
⇢Time granularities to aggregate on	

!
⇢Sinks (Manhattan / MySQL) to use	

!
What do you not specify?
⇢How to represent the aggregated data	

!
⇢How to represent the schema in MySQL /
Manhattan	

!
⇢How to perform the aggregation	

!
⇢How to locate and connect to underlying services
(Hadoop, Storm, MySQL, Manhattan, …)	

!
Operational simplicity
End-to-end service infrastructure with a single command	

$ tsar deploy --env=prod
⇢Launch Hadoop jobs	

⇢Launch Storm jobs	

⇢Launch Thrift query service	

⇢Launch loader processes to load data into MySQL / Manhattan	

⇢Mesos configs for all of the above	

⇢Alerts for the batch & storm jobs and the query service	

⇢Observability for the query service	

⇢Auto-create tables and views in MySQL orVertica	

⇢Automatic data regression and data anomaly checks
Bird’s eye view of the TSAR pipeline
Seamless Schema evolution
Scala Tsar job
Seamless Schema evolution
Break down impressions by the client application 	

(Twitter for iPhone, Twitter for Android etc)	

!
aggregate {	

onKeys(	

(TweetId),	

(TweetId, ClientApplicationId)	

) produce (	

Count	

) sinkTo (Manhattan)	

} fromProducer {	

ClientEventSource(“client_events”)	

.filter { event => isImpressionEvent(event) }	

.map { event =>	

val impr = ImpressionAttributes(event.client, event.tweetId)	

(event.timestamp, impr)	

}	

}	

Scala Tsar job
Seamless Schema evolution
Break down impressions by the client application 	

(Twitter for iPhone, Twitter for Android etc)	

!
aggregate {	

onKeys(	

(TweetId),	

(TweetId, ClientApplicationId)	

) produce (	

Count	

) sinkTo (Manhattan)	

} fromProducer {	

ClientEventSource(“client_events”)	

.filter { event => isImpressionEvent(event) }	

.map { event =>	

val impr = ImpressionAttributes(event.client, event.tweetId)	

(event.timestamp, impr)	

}	

}	

Scala Tsar job
Seamless Schema evolution
Break down impressions by the client application 	

(Twitter for iPhone, Twitter for Android etc)	

!
aggregate {	

onKeys(	

(TweetId),	

(TweetId, ClientApplicationId)	

) produce (	

Count	

) sinkTo (Manhattan)	

} fromProducer {	

ClientEventSource(“client_events”)	

.filter { event => isImpressionEvent(event) }	

.map { event =>	

val impr = ImpressionAttributes(event.client, event.tweetId)	

(event.timestamp, impr)	

}	

}	

New aggregation
dimension
Scala Tsar job
Backfill tooling
Backfill tooling
But what about historical data?
Backfill tooling
But what about historical data?
tsar backfill —start=<start> —end=<end>
Backfill tooling
But what about historical data?
tsar backfill —start=<start> —end=<end>
Backfill runs parallel to the production job	

!
Useful for repairing historical data as well
Aggregating on different granularities
Configuration File
Aggregating on different granularities
We have been computing only daily aggregates 	

We now wish to add alltime aggregates
Configuration File
Aggregating on different granularities
We have been computing only daily aggregates 	

We now wish to add alltime aggregates
Output(sink = Sink.Manhattan, width = 1 * Day)	

Output(sink = Sink.Manhattan, width = Alltime)
Configuration File
Aggregating on different granularities
We have been computing only daily aggregates 	

We now wish to add alltime aggregates
Output(sink = Sink.Manhattan, width = 1 * Day)	

Output(sink = Sink.Manhattan, width = Alltime)
Configuration File
Aggregating on different granularities
We have been computing only daily aggregates 	

We now wish to add alltime aggregates
Output(sink = Sink.Manhattan, width = 1 * Day)	

Output(sink = Sink.Manhattan, width = Alltime)
New aggregation granularity
Configuration File
Automatic metric computation
Scala Tsar job
Automatic metric computation
So far, only total view counts. 	

Now, add # unique users viewing each tweet	

!
aggregate {	

onKeys(	

(TweetId),	

(TweetId, ClientApplicationId)	

) produce (	

Count,	

Unique(UserId)	

) sinkTo (Manhattan)	

} fromProducer {	

ClientEventSource(“client_events”)	

.filter { event => isImpressionEvent(event) }	

.map { event =>	

val impr = ImpressionAttributes(	

event.client, event.userId, event.tweetId	

)	

(event.timestamp, impr)	

}	

Scala Tsar job
Automatic metric computation
So far, only total view counts. 	

Now, add # unique users viewing each tweet	

!
aggregate {	

onKeys(	

(TweetId),	

(TweetId, ClientApplicationId)	

) produce (	

Count,	

Unique(UserId)	

) sinkTo (Manhattan)	

} fromProducer {	

ClientEventSource(“client_events”)	

.filter { event => isImpressionEvent(event) }	

.map { event =>	

val impr = ImpressionAttributes(	

event.client, event.userId, event.tweetId	

)	

(event.timestamp, impr)	

}	

Scala Tsar job
Automatic metric computation
So far, only total view counts. 	

Now, add # unique users viewing each tweet	

!
aggregate {	

onKeys(	

(TweetId),	

(TweetId, ClientApplicationId)	

) produce (	

Count,	

Unique(UserId)	

) sinkTo (Manhattan)	

} fromProducer {	

ClientEventSource(“client_events”)	

.filter { event => isImpressionEvent(event) }	

.map { event =>	

val impr = ImpressionAttributes(	

event.client, event.userId, event.tweetId	

)	

(event.timestamp, impr)	

}	

New metric
Scala Tsar job
Support for multiple sinks
Configuration File
Support for multiple sinks
So far, only persisting data to Manhattan	

!
Persist data to MySQL as well	

Configuration File
Support for multiple sinks
So far, only persisting data to Manhattan	

!
Persist data to MySQL as well	

Output(sink = Sink.Manhattan, width = 1 * Day)	

Output(sink = Sink.Manhattan, width = Alltime)	

Output(sink = Sink.MySQL, width = Alltime)
Configuration File
Support for multiple sinks
So far, only persisting data to Manhattan	

!
Persist data to MySQL as well	

Output(sink = Sink.Manhattan, width = 1 * Day)	

Output(sink = Sink.Manhattan, width = Alltime)	

Output(sink = Sink.MySQL, width = Alltime)
Configuration File
Support for multiple sinks
So far, only persisting data to Manhattan	

!
Persist data to MySQL as well	

Output(sink = Sink.Manhattan, width = 1 * Day)	

Output(sink = Sink.Manhattan, width = Alltime)	

Output(sink = Sink.MySQL, width = Alltime)
New sink
Configuration File
Tsar Workflow
Create Job
Deploy Job
Modify Job
Optional:	

Backfill Job
TSAR Optimizations
Cache TSAR Pipeline
Logs
Intermediate Thrift
Sink
Total Thrift
Cache filtered events
Cache aggregation results
SinkSink aggregate {
onKeys (
(TweetIdField), // Total
(ClientApplicationIdField), // Total
(TweetIdField, ClientApplicationIdField) // Intermediate
) produce (
...
) sinkTo (
...
)
Covering Template
aggregate {
!
onKeys (
(TweetIdField),
(ClientApplicationIdField),
(TweetIdField, ClientApplicationIdField) // Covering Template
) produce (
...
) sinkTo (
...
)
Intermediate Sinks
Config(
base = Base(
namespace = 'tsar-examples',
name = 'tweets',
user = 'tsar-shared',
thriftAttributesName = 'TweetAttributes',
origin = '2014-05-15 00:00:00 UTC',
jobclass = 'com.twitter.platform.analytics.examples.TweetJob',
outputs = [
Output(sink = Sink.IntermediateThrift, width = 1 * Day),
Output(sink = Sink.TotalThrift, width = 1 * Day),
Output(sink = Sink.Manhattan1, width = 1 * Day),
Output(sink = Sink.Vertica, width = 1 * Day)
],
…
Trade space for time
Logs
Sink
Logs
Intermediate Thrift
Sink
Total Thrift
Logs
Intermediate Thrift
Sink SinkSinkSinkSinkSinkSink
Manhattan - Key Clustering
Reduces Manhattan queries for large time ranges
Manhattan - Key Clustering
2014-01-01 12:00:00
2014-01-01 13:00:00
2014-01-02 12:00:00
2014-01-02 13:00:00
2014-01-03 12:00:00
2014-01-03 12:00:00
2014-01-01
!
2014-01-02
!
2014-01-03
!
1 Day Clustering
Manhattan - Key Clustering
2014-01-01 12:00:00
2014-01-01 13:00:00
2014-01-02 12:00:00
2014-01-02 13:00:00
2014-01-03 12:00:00
2014-01-03 12:00:00
!
!
2014-01-01 to 2014-01-07
!
!
!
7 Day Clustering
Value Packing - Key Indexer
Some keys are always queried together
Value Packing - Key Indexer
!
Naive approach:
!
(CampaignIdField, CountryField) —-> Impressions
!
Would have to query:
!
(CampaignIdField, USA) —-> Impressions_USA
(CampaignIdField, UK) —-> Impressions_UK
…
!
Better approach:
!
(CampaignIdField) —> Map[CountryField -> Impressions]
!
Would have to query:
!
(CampaignIdField) —> Map[USA -> Impressions_USA,
UK -> Impressions_UK,…]
⇢Limits key fanout	

!
⇢Implicit index on CountryField - don’t need to
know which countries to query for
TSAR Visualization
TSAR Visualization
Conclusion: Three basic problems
Conclusion: Three basic problems
⇢Computation management	

Describe and execute computational logic	

Specify aggregation dimensions, metrics and
time granularities	

⇢Dataset management	

Define, deploy and evolve data schemas	

Coordinate data migration, backfill and
recovery	

⇢Service management	

Define query services, observability, alerting,
regression checks, coordinate deployment
across all underlying services	

TSAR gives you all of the above
!
Key Takeaway
Key Takeaway
“The end-to-end management of the
data pipeline is TSAR’s key feature.
The user concentrates on the
business logic.”	

!
!
Thank you!
!
Questions?
@anirudhtodi
ani @ twitter

More Related Content

What's hot

(APP304) AWS CloudFormation Best Practices | AWS re:Invent 2014
(APP304) AWS CloudFormation Best Practices | AWS re:Invent 2014(APP304) AWS CloudFormation Best Practices | AWS re:Invent 2014
(APP304) AWS CloudFormation Best Practices | AWS re:Invent 2014
Amazon Web Services
 
Airbnb - StreamAlert
Airbnb - StreamAlertAirbnb - StreamAlert
Airbnb - StreamAlert
Amazon Web Services
 
Crossing the streams viktor gamov
Crossing the streams viktor gamovCrossing the streams viktor gamov
Crossing the streams viktor gamov
confluent
 
Cloudwatch - The In's and Out's
Cloudwatch - The In's and Out'sCloudwatch - The In's and Out's
Cloudwatch - The In's and Out'sbeaknit
 
Containers and the Evolution of Computing
Containers and the Evolution of ComputingContainers and the Evolution of Computing
Containers and the Evolution of Computing
Amazon Web Services
 
Hands-on Lab: Data Lake Analytics
Hands-on Lab: Data Lake AnalyticsHands-on Lab: Data Lake Analytics
Hands-on Lab: Data Lake Analytics
Amazon Web Services
 
AWS May Webinar Series - Deep Dive: Infrastructure as Code
AWS May Webinar Series - Deep Dive: Infrastructure as CodeAWS May Webinar Series - Deep Dive: Infrastructure as Code
AWS May Webinar Series - Deep Dive: Infrastructure as Code
Amazon Web Services
 
Apache Eagle at Hadoop Summit 2016 San Jose
Apache Eagle at Hadoop Summit 2016 San JoseApache Eagle at Hadoop Summit 2016 San Jose
Apache Eagle at Hadoop Summit 2016 San Jose
Hao Chen
 
AWS Summit Tel Aviv - Startup Track - Data Analytics & Big Data
AWS Summit Tel Aviv - Startup Track - Data Analytics & Big DataAWS Summit Tel Aviv - Startup Track - Data Analytics & Big Data
AWS Summit Tel Aviv - Startup Track - Data Analytics & Big DataAmazon Web Services
 
Barga IC2E & IoTDI'16 Keynote
Barga IC2E & IoTDI'16 KeynoteBarga IC2E & IoTDI'16 Keynote
Barga IC2E & IoTDI'16 KeynoteRoger Barga
 
AWS CloudTrail to Track AWS Resources in Your Account (SEC207) | AWS re:Inven...
AWS CloudTrail to Track AWS Resources in Your Account (SEC207) | AWS re:Inven...AWS CloudTrail to Track AWS Resources in Your Account (SEC207) | AWS re:Inven...
AWS CloudTrail to Track AWS Resources in Your Account (SEC207) | AWS re:Inven...
Amazon Web Services
 
Deep Dive into Concepts and Tools for Analyzing Streaming Data on AWS
Deep Dive into Concepts and Tools for Analyzing Streaming Data on AWS Deep Dive into Concepts and Tools for Analyzing Streaming Data on AWS
Deep Dive into Concepts and Tools for Analyzing Streaming Data on AWS
AWS Germany
 
Mastering Access Control Policies (SEC302) | AWS re:Invent 2013
Mastering Access Control Policies (SEC302) | AWS re:Invent 2013Mastering Access Control Policies (SEC302) | AWS re:Invent 2013
Mastering Access Control Policies (SEC302) | AWS re:Invent 2013
Amazon Web Services
 
AWS Analytics Immersion Day - Build BI System from Scratch (Day1, Day2 Full V...
AWS Analytics Immersion Day - Build BI System from Scratch (Day1, Day2 Full V...AWS Analytics Immersion Day - Build BI System from Scratch (Day1, Day2 Full V...
AWS Analytics Immersion Day - Build BI System from Scratch (Day1, Day2 Full V...
Sungmin Kim
 

What's hot (14)

(APP304) AWS CloudFormation Best Practices | AWS re:Invent 2014
(APP304) AWS CloudFormation Best Practices | AWS re:Invent 2014(APP304) AWS CloudFormation Best Practices | AWS re:Invent 2014
(APP304) AWS CloudFormation Best Practices | AWS re:Invent 2014
 
Airbnb - StreamAlert
Airbnb - StreamAlertAirbnb - StreamAlert
Airbnb - StreamAlert
 
Crossing the streams viktor gamov
Crossing the streams viktor gamovCrossing the streams viktor gamov
Crossing the streams viktor gamov
 
Cloudwatch - The In's and Out's
Cloudwatch - The In's and Out'sCloudwatch - The In's and Out's
Cloudwatch - The In's and Out's
 
Containers and the Evolution of Computing
Containers and the Evolution of ComputingContainers and the Evolution of Computing
Containers and the Evolution of Computing
 
Hands-on Lab: Data Lake Analytics
Hands-on Lab: Data Lake AnalyticsHands-on Lab: Data Lake Analytics
Hands-on Lab: Data Lake Analytics
 
AWS May Webinar Series - Deep Dive: Infrastructure as Code
AWS May Webinar Series - Deep Dive: Infrastructure as CodeAWS May Webinar Series - Deep Dive: Infrastructure as Code
AWS May Webinar Series - Deep Dive: Infrastructure as Code
 
Apache Eagle at Hadoop Summit 2016 San Jose
Apache Eagle at Hadoop Summit 2016 San JoseApache Eagle at Hadoop Summit 2016 San Jose
Apache Eagle at Hadoop Summit 2016 San Jose
 
AWS Summit Tel Aviv - Startup Track - Data Analytics & Big Data
AWS Summit Tel Aviv - Startup Track - Data Analytics & Big DataAWS Summit Tel Aviv - Startup Track - Data Analytics & Big Data
AWS Summit Tel Aviv - Startup Track - Data Analytics & Big Data
 
Barga IC2E & IoTDI'16 Keynote
Barga IC2E & IoTDI'16 KeynoteBarga IC2E & IoTDI'16 Keynote
Barga IC2E & IoTDI'16 Keynote
 
AWS CloudTrail to Track AWS Resources in Your Account (SEC207) | AWS re:Inven...
AWS CloudTrail to Track AWS Resources in Your Account (SEC207) | AWS re:Inven...AWS CloudTrail to Track AWS Resources in Your Account (SEC207) | AWS re:Inven...
AWS CloudTrail to Track AWS Resources in Your Account (SEC207) | AWS re:Inven...
 
Deep Dive into Concepts and Tools for Analyzing Streaming Data on AWS
Deep Dive into Concepts and Tools for Analyzing Streaming Data on AWS Deep Dive into Concepts and Tools for Analyzing Streaming Data on AWS
Deep Dive into Concepts and Tools for Analyzing Streaming Data on AWS
 
Mastering Access Control Policies (SEC302) | AWS re:Invent 2013
Mastering Access Control Policies (SEC302) | AWS re:Invent 2013Mastering Access Control Policies (SEC302) | AWS re:Invent 2013
Mastering Access Control Policies (SEC302) | AWS re:Invent 2013
 
AWS Analytics Immersion Day - Build BI System from Scratch (Day1, Day2 Full V...
AWS Analytics Immersion Day - Build BI System from Scratch (Day1, Day2 Full V...AWS Analytics Immersion Day - Build BI System from Scratch (Day1, Day2 Full V...
AWS Analytics Immersion Day - Build BI System from Scratch (Day1, Day2 Full V...
 

Similar to TSAR (TimeSeries AggregatoR) Tech Talk

Data Platform at Twitter: Enabling Real-time & Batch Analytics at Scale
Data Platform at Twitter: Enabling Real-time & Batch Analytics at ScaleData Platform at Twitter: Enabling Real-time & Batch Analytics at Scale
Data Platform at Twitter: Enabling Real-time & Batch Analytics at Scale
Sriram Krishnan
 
Rapid prototyping of eclipse rcp applications - Eclipsecon Europe 2017
Rapid prototyping of eclipse rcp applications - Eclipsecon Europe 2017Rapid prototyping of eclipse rcp applications - Eclipsecon Europe 2017
Rapid prototyping of eclipse rcp applications - Eclipsecon Europe 2017
Patrik Suzzi
 
Ceilometer + Heat = Alarming
Ceilometer + Heat = Alarming Ceilometer + Heat = Alarming
Ceilometer + Heat = Alarming
Nicolas (Nick) Barcet
 
Working with data using Azure Functions.pdf
Working with data using Azure Functions.pdfWorking with data using Azure Functions.pdf
Working with data using Azure Functions.pdf
Stephanie Locke
 
Social media analytics using Azure Technologies
Social media analytics using Azure TechnologiesSocial media analytics using Azure Technologies
Social media analytics using Azure Technologies
Koray Kocabas
 
Streaming Solr - Activate 2018 talk
Streaming Solr - Activate 2018 talkStreaming Solr - Activate 2018 talk
Streaming Solr - Activate 2018 talk
Amrit Sarkar
 
Building Analytics Applications with Streaming Expressions in Apache Solr - A...
Building Analytics Applications with Streaming Expressions in Apache Solr - A...Building Analytics Applications with Streaming Expressions in Apache Solr - A...
Building Analytics Applications with Streaming Expressions in Apache Solr - A...
Lucidworks
 
Small pieces loosely joined
Small pieces loosely joinedSmall pieces loosely joined
Small pieces loosely joined
ennui2342
 
Introducing the WSO2 Complex Event Processor
Introducing the WSO2 Complex Event ProcessorIntroducing the WSO2 Complex Event Processor
Introducing the WSO2 Complex Event ProcessorWSO2
 
Getting Started with Real-Time Analytics
Getting Started with Real-Time AnalyticsGetting Started with Real-Time Analytics
Getting Started with Real-Time Analytics
Amazon Web Services
 
WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Da...
WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Da...WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Da...
WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Da...
WSO2
 
The use case of a scalable architecture
The use case of a scalable architectureThe use case of a scalable architecture
The use case of a scalable architecture
Toru Wonyoung Choi
 
Building Streaming Applications with Streaming SQL
Building Streaming Applications with Streaming SQLBuilding Streaming Applications with Streaming SQL
Building Streaming Applications with Streaming SQL
Mohanadarshan Vivekanandalingam
 
Timeseries - data visualization in Grafana
Timeseries - data visualization in GrafanaTimeseries - data visualization in Grafana
Timeseries - data visualization in Grafana
OCoderFest
 
Spark Streaming - Meetup Data Analysis
Spark Streaming - Meetup Data AnalysisSpark Streaming - Meetup Data Analysis
Spark Streaming - Meetup Data Analysis
Sushmanth Sagala
 
Michiel Overeem (AFAS) - Enterprise software schaalbaar maken met Service Fab...
Michiel Overeem (AFAS) - Enterprise software schaalbaar maken met Service Fab...Michiel Overeem (AFAS) - Enterprise software schaalbaar maken met Service Fab...
Michiel Overeem (AFAS) - Enterprise software schaalbaar maken met Service Fab...
AFAS Software
 
Apache Spark Streaming: Architecture and Fault Tolerance
Apache Spark Streaming: Architecture and Fault ToleranceApache Spark Streaming: Architecture and Fault Tolerance
Apache Spark Streaming: Architecture and Fault Tolerance
Sachin Aggarwal
 
Accumulo Summit 2015: Building Aggregation Systems on Accumulo [Leveraging Ac...
Accumulo Summit 2015: Building Aggregation Systems on Accumulo [Leveraging Ac...Accumulo Summit 2015: Building Aggregation Systems on Accumulo [Leveraging Ac...
Accumulo Summit 2015: Building Aggregation Systems on Accumulo [Leveraging Ac...
Accumulo Summit
 
Building and deploying microservices with event sourcing, CQRS and Docker (QC...
Building and deploying microservices with event sourcing, CQRS and Docker (QC...Building and deploying microservices with event sourcing, CQRS and Docker (QC...
Building and deploying microservices with event sourcing, CQRS and Docker (QC...
Chris Richardson
 
The Fine Art of Time Travelling: implementing Event Sourcing
The Fine Art of Time Travelling: implementing Event SourcingThe Fine Art of Time Travelling: implementing Event Sourcing
The Fine Art of Time Travelling: implementing Event Sourcing
Andrea Saltarello
 

Similar to TSAR (TimeSeries AggregatoR) Tech Talk (20)

Data Platform at Twitter: Enabling Real-time & Batch Analytics at Scale
Data Platform at Twitter: Enabling Real-time & Batch Analytics at ScaleData Platform at Twitter: Enabling Real-time & Batch Analytics at Scale
Data Platform at Twitter: Enabling Real-time & Batch Analytics at Scale
 
Rapid prototyping of eclipse rcp applications - Eclipsecon Europe 2017
Rapid prototyping of eclipse rcp applications - Eclipsecon Europe 2017Rapid prototyping of eclipse rcp applications - Eclipsecon Europe 2017
Rapid prototyping of eclipse rcp applications - Eclipsecon Europe 2017
 
Ceilometer + Heat = Alarming
Ceilometer + Heat = Alarming Ceilometer + Heat = Alarming
Ceilometer + Heat = Alarming
 
Working with data using Azure Functions.pdf
Working with data using Azure Functions.pdfWorking with data using Azure Functions.pdf
Working with data using Azure Functions.pdf
 
Social media analytics using Azure Technologies
Social media analytics using Azure TechnologiesSocial media analytics using Azure Technologies
Social media analytics using Azure Technologies
 
Streaming Solr - Activate 2018 talk
Streaming Solr - Activate 2018 talkStreaming Solr - Activate 2018 talk
Streaming Solr - Activate 2018 talk
 
Building Analytics Applications with Streaming Expressions in Apache Solr - A...
Building Analytics Applications with Streaming Expressions in Apache Solr - A...Building Analytics Applications with Streaming Expressions in Apache Solr - A...
Building Analytics Applications with Streaming Expressions in Apache Solr - A...
 
Small pieces loosely joined
Small pieces loosely joinedSmall pieces loosely joined
Small pieces loosely joined
 
Introducing the WSO2 Complex Event Processor
Introducing the WSO2 Complex Event ProcessorIntroducing the WSO2 Complex Event Processor
Introducing the WSO2 Complex Event Processor
 
Getting Started with Real-Time Analytics
Getting Started with Real-Time AnalyticsGetting Started with Real-Time Analytics
Getting Started with Real-Time Analytics
 
WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Da...
WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Da...WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Da...
WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Da...
 
The use case of a scalable architecture
The use case of a scalable architectureThe use case of a scalable architecture
The use case of a scalable architecture
 
Building Streaming Applications with Streaming SQL
Building Streaming Applications with Streaming SQLBuilding Streaming Applications with Streaming SQL
Building Streaming Applications with Streaming SQL
 
Timeseries - data visualization in Grafana
Timeseries - data visualization in GrafanaTimeseries - data visualization in Grafana
Timeseries - data visualization in Grafana
 
Spark Streaming - Meetup Data Analysis
Spark Streaming - Meetup Data AnalysisSpark Streaming - Meetup Data Analysis
Spark Streaming - Meetup Data Analysis
 
Michiel Overeem (AFAS) - Enterprise software schaalbaar maken met Service Fab...
Michiel Overeem (AFAS) - Enterprise software schaalbaar maken met Service Fab...Michiel Overeem (AFAS) - Enterprise software schaalbaar maken met Service Fab...
Michiel Overeem (AFAS) - Enterprise software schaalbaar maken met Service Fab...
 
Apache Spark Streaming: Architecture and Fault Tolerance
Apache Spark Streaming: Architecture and Fault ToleranceApache Spark Streaming: Architecture and Fault Tolerance
Apache Spark Streaming: Architecture and Fault Tolerance
 
Accumulo Summit 2015: Building Aggregation Systems on Accumulo [Leveraging Ac...
Accumulo Summit 2015: Building Aggregation Systems on Accumulo [Leveraging Ac...Accumulo Summit 2015: Building Aggregation Systems on Accumulo [Leveraging Ac...
Accumulo Summit 2015: Building Aggregation Systems on Accumulo [Leveraging Ac...
 
Building and deploying microservices with event sourcing, CQRS and Docker (QC...
Building and deploying microservices with event sourcing, CQRS and Docker (QC...Building and deploying microservices with event sourcing, CQRS and Docker (QC...
Building and deploying microservices with event sourcing, CQRS and Docker (QC...
 
The Fine Art of Time Travelling: implementing Event Sourcing
The Fine Art of Time Travelling: implementing Event SourcingThe Fine Art of Time Travelling: implementing Event Sourcing
The Fine Art of Time Travelling: implementing Event Sourcing
 

Recently uploaded

Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
AbhimanyuSinha9
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
ahzuo
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
jerlynmaetalle
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
John Andrews
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
ewymefz
 
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
2023240532
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
Roger Valdez
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
74nqk8xf
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
axoqas
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
slg6lamcq
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
v3tuleee
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Subhajit Sahu
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
axoqas
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
Oppotus
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
u86oixdj
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
pchutichetpong
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
javier ramirez
 

Recently uploaded (20)

Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
 
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
 

TSAR (TimeSeries AggregatoR) Tech Talk

  • 1. TSAR A TimeSeries AggregatoR Anirudh Todi | Twitter | @anirudhtodi | TSAR
  • 3. What is TSAR? TSAR is a framework and service infrastructure for specifying, deploying and operating timeseries aggregation jobs.

  • 5. TimeSeries Aggregation at Twitter A common problem ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards) ⇢Business metrics ⇢Site traffic, service health, and user engagement monitoring ! Hard to do at scale ⇢10s of billions of events/day in real time ! Hard to maintain aggregation services once deployed - ⇢Complex tooling is required

  • 7. TimeSeries Aggregation at Twitter Many time-series applications look similar ⇢ Common types of aggregations ⇢ Similar service stacks ! Multi-year effort to build general solutions ⇢Summingbird - abstraction library for generalized distributed computation ! TSAR - an end-to-end aggregation service built on Summingbird ⇢Abstracts away everything except application's data model and business logic
  • 8. A typical aggregation Event logs Extract Group Measure Store
  • 9. [ (“/“ , 300, iPhone,…) , (“/favorites”, 300, iPhone,…) , (“/replies” , 300, Anroid,…) , (“/“ , 200, Web,…) , (“/favorites”, 200, Web,…) , (“/“ , 200, iPhone,…) ]
  • 10. [ (“/“ , 300) , (“/favorites”, 300) , (“/replies” , 300) , (“/“ , 200) , (“/favorites”, 200) , (“/“ , 200) ]
  • 11. [ (“/“ , [ (“/“, 300) , (“/“, 200 ) , (“/“, 200 ) ] ) , (“/favorites” , [ (“/favorites”, 300) , (“/favorites”, 300) ] ) , (“/replies” , [ (“/replies”, 300) ] ) ]
  • 12. [ (“/“ , 3) , (“/favorites”, 2) , (“/replies” , 1) ]
  • 15. Example: API aggregates ! ⇢Bucket each API call ! ⇢Dimensions - endpoint, datacenter, client application ID ! ⇢Compute - total event count, unique users, mean response time etc ! ⇢Write the output toVertica
  • 17. Example: Impressions by Tweet ! ⇢Bucket each impressions by tweet ID ! ⇢Compute total count, unique users ! ⇢Write output to a key-value store ! ⇢Expose output via a high-SLA query service ! ⇢Write sample of data toVertica for cross-validation
  • 19. Problems! ⇢Service interruption: Can we retrieve lost data? ! ⇢Data schema coordination: Store output as log data, in a key-value data store, in cache, and in relational databases ! ⇢Flexible schema change ! ⇢Easy to backfill and update/repair historical data ! Most important: Solve these problems in a general way
  • 21. TSAR’s design principles 1) Hybrid computation: Build on Summingbird, process each event twice - in real time & in batch (at a later time) ! ⇢Gives stability and reproducibility of batch ⇢Streaming (recency) of realtime ! Leverage the Summingbird ecosystem: ⇢Abstraction framework over computing platforms ⇢Rich library of approximation monoids (Algebird) ⇢Storage abstractions (Storehaus) ! !
  • 22.
  • 24. TSAR’s design principles 2) Separate event production from event aggregation ! User specifies how to extract events from source data ! Bucketing and aggregating events is managed by TSAR ! !
  • 26. TSAR’s design principles 3) Unified data schema: ⇢Data schema specified in datastore-independent way ⇢Managed schema evolution & data transformation ! Store data on: ⇢HDFS ⇢Manhattan (key-value) ⇢Vertica/MySQL ⇢Cache ! Easily extensible to other schemas (Cassandra, HBase, etc)
  • 27. Data Schema Coordination Event Log Kafka Spout Intermediate Aggregates Manhattan Vertica • Schema consistent across stores • Update stores when schema evolves
  • 29. TSAR’s design principles 4) Integrated service toolkit ! ⇢One-stop deployment tooling ! ⇢Data warehousing ! ⇢Query capability ! ⇢Automatic observability and alerting ! ⇢Automatic data integrity checks
  • 31. Tweet Impressions in TSAR ⇢Annotate each tweet with an impression count ! ⇢Count = unique users who saw that tweet ! ⇢Massive scalability challenge: • > 500MM tweets/day • tens of billions of impressions ! ⇢Want realtime updates ! ⇢Production ready and robust
  • 32. A minimal Tsar project Scala Tsar job Configuration File Thrift IDL
  • 33. struct TweetAttributes { 1: optional i64 tweet_id } Thrift IDL
  • 35. Tweet Impressions Example aggregate { onKeys( (TweetId) ) produce ( Count ) sinkTo (Manhattan) } fromProducer { ClientEventSource(“client_events”) .filter { event => isImpressionEvent(event) } .map { event => val impr = ImpressionAttributes(event.tweetId) (event.timestamp, impr) } } Scala Tsar job
  • 37. Tweet Impressions Example aggregate { onKeys( (TweetId) ) produce ( Count ) sinkTo (Manhattan) } fromProducer { ClientEventSource(“client_events”) .filter { event => isImpressionEvent(event) } .map { event => val impr = ImpressionAttributes(event.tweetId) (event.timestamp, impr) } } Scala Tsar job
  • 38. Tweet Impressions Example aggregate { onKeys( (TweetId) ) produce ( Count ) sinkTo (Manhattan) } fromProducer { ClientEventSource(“client_events”) .filter { event => isImpressionEvent(event) } .map { event => val impr = ImpressionAttributes(event.tweetId) (event.timestamp, impr) } } Scala Tsar job
  • 39. Tweet Impressions Example aggregate { onKeys( (TweetId) ) produce ( Count ) sinkTo (Manhattan) } fromProducer { ClientEventSource(“client_events”) .filter { event => isImpressionEvent(event) } .map { event => val impr = ImpressionAttributes(event.tweetId) (event.timestamp, impr) } } Dimensions for job aggregation Scala Tsar job
  • 41. Tweet Impressions Example aggregate { onKeys( (TweetId) ) produce ( Count ) sinkTo (Manhattan) } fromProducer { ClientEventSource(“client_events”) .filter { event => isImpressionEvent(event) } .map { event => val impr = ImpressionAttributes(event.tweetId) (event.timestamp, impr) } } ! Scala Tsar job
  • 42. Tweet Impressions Example aggregate { onKeys( (TweetId) ) produce ( Count ) sinkTo (Manhattan) } fromProducer { ClientEventSource(“client_events”) .filter { event => isImpressionEvent(event) } .map { event => val impr = ImpressionAttributes(event.tweetId) (event.timestamp, impr) } } ! Scala Tsar job
  • 43. Tweet Impressions Example aggregate { onKeys( (TweetId) ) produce ( Count ) sinkTo (Manhattan) } fromProducer { ClientEventSource(“client_events”) .filter { event => isImpressionEvent(event) } .map { event => val impr = ImpressionAttributes(event.tweetId) (event.timestamp, impr) } } ! Metrics to compute Scala Tsar job
  • 45. Tweet Impressions Example aggregate { onKeys( (TweetId) ) produce ( Count ) sinkTo (Manhattan) } fromProducer { ClientEventSource(“client_events”) .filter { event => isImpressionEvent(event) } .map { event => val impr = ImpressionAttributes(event.tweetId) (event.timestamp, impr) } } ! Scala Tsar job
  • 46. Tweet Impressions Example aggregate { onKeys( (TweetId) ) produce ( Count ) sinkTo (Manhattan) } fromProducer { ClientEventSource(“client_events”) .filter { event => isImpressionEvent(event) } .map { event => val impr = ImpressionAttributes(event.tweetId) (event.timestamp, impr) } } ! What datastores to write to Scala Tsar job
  • 48. Tweet Impressions Example aggregate { onKeys( (TweetId) ) produce ( Count ) sinkTo (Manhattan) } fromProducer { ClientEventSource(“client_events”) .filter { event => isImpressionEvent(event) } .map { event => val impr = ImpressionAttributes(event.tweetId) (event.timestamp, impr) } } ! Scala Tsar job
  • 49. Tweet Impressions Example aggregate { onKeys( (TweetId) ) produce ( Count ) sinkTo (Manhattan) } fromProducer { ClientEventSource(“client_events”) .filter { event => isImpressionEvent(event) } .map { event => val impr = ImpressionAttributes(event.tweetId) (event.timestamp, impr) } } ! Scala Tsar job
  • 50. Tweet Impressions Example aggregate { onKeys( (TweetId) ) produce ( Count ) sinkTo (Manhattan) } fromProducer { ClientEventSource(“client_events”) .filter { event => isImpressionEvent(event) } .map { event => val impr = ImpressionAttributes(event.tweetId) (event.timestamp, impr) } } ! Summingbird fragment to describe event production. Scala Tsar job
  • 51. Tweet Impressions Example aggregate { onKeys( (TweetId) ) produce ( Count ) sinkTo (Manhattan) } fromProducer { ClientEventSource(“client_events”) .filter { event => isImpressionEvent(event) } .map { event => val impr = ImpressionAttributes(event.tweetId) (event.timestamp, impr) } } ! Summingbird fragment to describe event production. Scala Tsar job
  • 52. Tweet Impressions Example aggregate { onKeys( (TweetId) ) produce ( Count ) sinkTo (Manhattan) } fromProducer { ClientEventSource(“client_events”) .filter { event => isImpressionEvent(event) } .map { event => val impr = ImpressionAttributes(event.tweetId) (event.timestamp, impr) } } ! Summingbird fragment to describe event production. There is no aggregation logic specified here Scala Tsar job
  • 53. Config( base = Base( namespace = 'tsar-examples', name = 'tweets', user = 'tsar-shared', thriftAttributesName = 'TweetAttributes', origin = '2014-05-15 00:00:00 UTC', ! jobclass = 'com.twitter.platform.analytics.examples.TweetJob', ! outputs = [ Output(sink = Sink.IntermediateThrift, width = 1 * Day), Output(sink = Sink.Manhattan1, width = 1 * Day) ], ... Configuration File
  • 54. What has been specified?
  • 55. What has been specified? ⇢Our event schema (in thrift) ! ⇢How to produce these events ! ⇢Dimensions to aggregate on ! ⇢Time granularities to aggregate on ! ⇢Sinks (Manhattan / MySQL) to use !
  • 56. What do you not specify? ⇢How to represent the aggregated data ! ⇢How to represent the schema in MySQL / Manhattan ! ⇢How to perform the aggregation ! ⇢How to locate and connect to underlying services (Hadoop, Storm, MySQL, Manhattan, …) !
  • 57. Operational simplicity End-to-end service infrastructure with a single command $ tsar deploy --env=prod ⇢Launch Hadoop jobs ⇢Launch Storm jobs ⇢Launch Thrift query service ⇢Launch loader processes to load data into MySQL / Manhattan ⇢Mesos configs for all of the above ⇢Alerts for the batch & storm jobs and the query service ⇢Observability for the query service ⇢Auto-create tables and views in MySQL orVertica ⇢Automatic data regression and data anomaly checks
  • 58. Bird’s eye view of the TSAR pipeline
  • 60. Seamless Schema evolution Break down impressions by the client application (Twitter for iPhone, Twitter for Android etc) ! aggregate { onKeys( (TweetId), (TweetId, ClientApplicationId) ) produce ( Count ) sinkTo (Manhattan) } fromProducer { ClientEventSource(“client_events”) .filter { event => isImpressionEvent(event) } .map { event => val impr = ImpressionAttributes(event.client, event.tweetId) (event.timestamp, impr) } } Scala Tsar job
  • 61. Seamless Schema evolution Break down impressions by the client application (Twitter for iPhone, Twitter for Android etc) ! aggregate { onKeys( (TweetId), (TweetId, ClientApplicationId) ) produce ( Count ) sinkTo (Manhattan) } fromProducer { ClientEventSource(“client_events”) .filter { event => isImpressionEvent(event) } .map { event => val impr = ImpressionAttributes(event.client, event.tweetId) (event.timestamp, impr) } } Scala Tsar job
  • 62. Seamless Schema evolution Break down impressions by the client application (Twitter for iPhone, Twitter for Android etc) ! aggregate { onKeys( (TweetId), (TweetId, ClientApplicationId) ) produce ( Count ) sinkTo (Manhattan) } fromProducer { ClientEventSource(“client_events”) .filter { event => isImpressionEvent(event) } .map { event => val impr = ImpressionAttributes(event.client, event.tweetId) (event.timestamp, impr) } } New aggregation dimension Scala Tsar job
  • 64. Backfill tooling But what about historical data?
  • 65. Backfill tooling But what about historical data? tsar backfill —start=<start> —end=<end>
  • 66. Backfill tooling But what about historical data? tsar backfill —start=<start> —end=<end> Backfill runs parallel to the production job ! Useful for repairing historical data as well
  • 67. Aggregating on different granularities Configuration File
  • 68. Aggregating on different granularities We have been computing only daily aggregates We now wish to add alltime aggregates Configuration File
  • 69. Aggregating on different granularities We have been computing only daily aggregates We now wish to add alltime aggregates Output(sink = Sink.Manhattan, width = 1 * Day) Output(sink = Sink.Manhattan, width = Alltime) Configuration File
  • 70. Aggregating on different granularities We have been computing only daily aggregates We now wish to add alltime aggregates Output(sink = Sink.Manhattan, width = 1 * Day) Output(sink = Sink.Manhattan, width = Alltime) Configuration File
  • 71. Aggregating on different granularities We have been computing only daily aggregates We now wish to add alltime aggregates Output(sink = Sink.Manhattan, width = 1 * Day) Output(sink = Sink.Manhattan, width = Alltime) New aggregation granularity Configuration File
  • 73. Automatic metric computation So far, only total view counts. Now, add # unique users viewing each tweet ! aggregate { onKeys( (TweetId), (TweetId, ClientApplicationId) ) produce ( Count, Unique(UserId) ) sinkTo (Manhattan) } fromProducer { ClientEventSource(“client_events”) .filter { event => isImpressionEvent(event) } .map { event => val impr = ImpressionAttributes( event.client, event.userId, event.tweetId ) (event.timestamp, impr) } Scala Tsar job
  • 74. Automatic metric computation So far, only total view counts. Now, add # unique users viewing each tweet ! aggregate { onKeys( (TweetId), (TweetId, ClientApplicationId) ) produce ( Count, Unique(UserId) ) sinkTo (Manhattan) } fromProducer { ClientEventSource(“client_events”) .filter { event => isImpressionEvent(event) } .map { event => val impr = ImpressionAttributes( event.client, event.userId, event.tweetId ) (event.timestamp, impr) } Scala Tsar job
  • 75. Automatic metric computation So far, only total view counts. Now, add # unique users viewing each tweet ! aggregate { onKeys( (TweetId), (TweetId, ClientApplicationId) ) produce ( Count, Unique(UserId) ) sinkTo (Manhattan) } fromProducer { ClientEventSource(“client_events”) .filter { event => isImpressionEvent(event) } .map { event => val impr = ImpressionAttributes( event.client, event.userId, event.tweetId ) (event.timestamp, impr) } New metric Scala Tsar job
  • 76. Support for multiple sinks Configuration File
  • 77. Support for multiple sinks So far, only persisting data to Manhattan ! Persist data to MySQL as well Configuration File
  • 78. Support for multiple sinks So far, only persisting data to Manhattan ! Persist data to MySQL as well Output(sink = Sink.Manhattan, width = 1 * Day) Output(sink = Sink.Manhattan, width = Alltime) Output(sink = Sink.MySQL, width = Alltime) Configuration File
  • 79. Support for multiple sinks So far, only persisting data to Manhattan ! Persist data to MySQL as well Output(sink = Sink.Manhattan, width = 1 * Day) Output(sink = Sink.Manhattan, width = Alltime) Output(sink = Sink.MySQL, width = Alltime) Configuration File
  • 80. Support for multiple sinks So far, only persisting data to Manhattan ! Persist data to MySQL as well Output(sink = Sink.Manhattan, width = 1 * Day) Output(sink = Sink.Manhattan, width = Alltime) Output(sink = Sink.MySQL, width = Alltime) New sink Configuration File
  • 81. Tsar Workflow Create Job Deploy Job Modify Job Optional: Backfill Job
  • 83. Cache TSAR Pipeline Logs Intermediate Thrift Sink Total Thrift Cache filtered events Cache aggregation results SinkSink aggregate { onKeys ( (TweetIdField), // Total (ClientApplicationIdField), // Total (TweetIdField, ClientApplicationIdField) // Intermediate ) produce ( ... ) sinkTo ( ... )
  • 84. Covering Template aggregate { ! onKeys ( (TweetIdField), (ClientApplicationIdField), (TweetIdField, ClientApplicationIdField) // Covering Template ) produce ( ... ) sinkTo ( ... )
  • 85. Intermediate Sinks Config( base = Base( namespace = 'tsar-examples', name = 'tweets', user = 'tsar-shared', thriftAttributesName = 'TweetAttributes', origin = '2014-05-15 00:00:00 UTC', jobclass = 'com.twitter.platform.analytics.examples.TweetJob', outputs = [ Output(sink = Sink.IntermediateThrift, width = 1 * Day), Output(sink = Sink.TotalThrift, width = 1 * Day), Output(sink = Sink.Manhattan1, width = 1 * Day), Output(sink = Sink.Vertica, width = 1 * Day) ], …
  • 86. Trade space for time Logs Sink Logs Intermediate Thrift Sink Total Thrift Logs Intermediate Thrift Sink SinkSinkSinkSinkSinkSink
  • 87. Manhattan - Key Clustering Reduces Manhattan queries for large time ranges
  • 88. Manhattan - Key Clustering 2014-01-01 12:00:00 2014-01-01 13:00:00 2014-01-02 12:00:00 2014-01-02 13:00:00 2014-01-03 12:00:00 2014-01-03 12:00:00 2014-01-01 ! 2014-01-02 ! 2014-01-03 ! 1 Day Clustering
  • 89. Manhattan - Key Clustering 2014-01-01 12:00:00 2014-01-01 13:00:00 2014-01-02 12:00:00 2014-01-02 13:00:00 2014-01-03 12:00:00 2014-01-03 12:00:00 ! ! 2014-01-01 to 2014-01-07 ! ! ! 7 Day Clustering
  • 90. Value Packing - Key Indexer Some keys are always queried together
  • 91. Value Packing - Key Indexer ! Naive approach: ! (CampaignIdField, CountryField) —-> Impressions ! Would have to query: ! (CampaignIdField, USA) —-> Impressions_USA (CampaignIdField, UK) —-> Impressions_UK … ! Better approach: ! (CampaignIdField) —> Map[CountryField -> Impressions] ! Would have to query: ! (CampaignIdField) —> Map[USA -> Impressions_USA, UK -> Impressions_UK,…] ⇢Limits key fanout ! ⇢Implicit index on CountryField - don’t need to know which countries to query for
  • 95. Conclusion: Three basic problems ⇢Computation management Describe and execute computational logic Specify aggregation dimensions, metrics and time granularities ⇢Dataset management Define, deploy and evolve data schemas Coordinate data migration, backfill and recovery ⇢Service management Define query services, observability, alerting, regression checks, coordinate deployment across all underlying services TSAR gives you all of the above !
  • 97. Key Takeaway “The end-to-end management of the data pipeline is TSAR’s key feature. The user concentrates on the business logic.” !