Real-Time Applications At
Terabyte Scale
Isaac Mosquera
VP Engineering, Data & Insights
You’ve probably seen our sharing
tools...
But that’s not all we do...
WE MAKE SOCIAL DATA ACTIONABLE
Over 1B social signals
are processed monthly by
the ShareThis Social
Intelligence Platform™ to
generate insights about
your brand, industry and
events.
ENGAGEMENT
Users consume and
share content across
web and mobile
TARGETING
Desktop and mobile
targeting at scale
INSIGHTS
Actionable cross-device
insights
DATA
1B+ first party
Social Actions
Monthly
ENGAGEMENT
TARGETING
INSIGHTS
DATA
• Lookalike Audiences
• Audience Segments
“Wow small SUVs are fuel efficient!”
User #12345
• Automotive Study
• Car Buying Infographic
Why Is Real-Time Important?
Time
Sharing Interest Decays With Time
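One way to picture the decay on this slide is a half-life model. The sketch below is illustrative only: the exponential form and the 6-hour half-life are assumptions, not figures from the talk.

```python
def interest_weight(age_hours: float, half_life_hours: float = 6.0) -> float:
    """Relative value of a share signal that is `age_hours` old.

    Assumes exponential decay; the half-life is a hypothetical parameter.
    """
    if age_hours < 0:
        raise ValueError("age_hours must be non-negative")
    # The weight halves every `half_life_hours`.
    return 0.5 ** (age_hours / half_life_hours)
```

Under these assumptions, a signal acted on a day later carries only a small fraction of its original weight, which is the case for processing it in real time.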
The Previous
Architecture
Previous Architecture Problems
Duplicated Data
[Diagram: separate Query Engines for Share Data Insights, Ad Tech, Consumer Engagement, and Data Science]
Fragmented & Siloed Data Sources
[Diagram: separate Query Engines for Share Data Insights, Ad Tech, Consumer Engagement, and Data Science; labeled data sets: Campaign, RTB, Conversion, Summarization, 3rd Party, Trends, Studies]
Generating Reports From Old Platform
Raw Data
Pre-Aggregation
Staged Data
Results
Consumers
Query
REST API
New Report Type
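The flow on this slide (raw data, pre-aggregation, staged data, then queries through a REST API) can be sketched roughly as follows. This is a toy model, not the actual platform: the event fields, report names, and functions are all hypothetical. It shows why each new report type needed its own pre-aggregation step before anyone could query it.

```python
from collections import Counter

# Hypothetical raw share events; field names are illustrative only.
RAW_EVENTS = [
    {"domain": "a.com", "channel": "twitter"},
    {"domain": "a.com", "channel": "facebook"},
    {"domain": "b.com", "channel": "twitter"},
]

def pre_aggregate(events, key):
    """Batch step: collapse raw events into a staged table of counts per `key`."""
    return Counter(event[key] for event in events)

# Each report type is served from its own staged table.
STAGED = {"shares_by_domain": pre_aggregate(RAW_EVENTS, "domain")}

def query(report_type):
    """Stand-in for the REST API: it can only read tables that were staged."""
    if report_type not in STAGED:
        raise KeyError(f"{report_type!r} needs a new pre-aggregation job first")
    return dict(STAGED[report_type])
```

In this model, asking for `shares_by_channel` fails until someone adds and runs another pre-aggregation job, even though the raw data already contains the answer.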
Why Focus On These Problems?
Faster Iterations
Data Science
New Applications
Business Value
Targeting
The Birth of a New Team
Data Team’s Mission
Making our data easily accessible
Our Data
Vision
Centralize Data Sources
Data Quality & Trust
Reliable Infrastructure
Real Time All The Things
Raw Social Data
DLX
Geo
Device Mappings
Sentiment
Social Keywords
Downstream Applications
Kafka Architecture
Producers: Application, Logs
Brokers
Consumers: Data Loaders, Annotations, Filters
Destinations: Big Query, Social, Ad Tech
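The producer, broker, consumer, and destination flow in this diagram can be sketched as below. To keep the example self-contained, an in-memory class stands in for a Kafka topic; it is not the Kafka client API. The annotation and filter rules are hypothetical examples of what those consumer stages might do.

```python
from collections import deque

class Topic:
    """In-memory stand-in for a Kafka topic (not the Kafka client API)."""

    def __init__(self):
        self._log = deque()

    def produce(self, message):
        self._log.append(message)

    def consume(self):
        # Yield messages in arrival order until the log is drained.
        while self._log:
            yield self._log.popleft()

def annotate(event):
    """Hypothetical annotation step: tag each event with a device type."""
    device = "mobile" if "Mobile" in event.get("user_agent", "") else "desktop"
    return {**event, "device": device}

def keep(event):
    """Hypothetical filter step: drop events that carry no URL."""
    return bool(event.get("url"))

def run_consumer(topic, destination):
    """Consume, annotate, filter, then load into the destination."""
    for event in topic.consume():
        enriched = annotate(event)
        if keep(enriched):
            destination.append(enriched)

topic = Topic()
topic.produce({"url": "http://example.com/a", "user_agent": "Mobile Safari"})
topic.produce({"url": "", "user_agent": "Chrome"})
warehouse = []  # stand-in for a destination such as Big Query
run_consumer(topic, warehouse)
```

Because producers only write to the topic and consumers only read from it, annotation and filter stages can be added or changed without touching the producing applications.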
Integrate Campaign
Social Data
DLX
Geo
Device Mappings
Sentiment
Social Keywords
RTB Bid Data
Campaign Data
Downstream Applications
Build An Active Warehouse
3 Trillion Row Interactive
Query Engine
Share Data
Data
Science
Ad Tech
Consumer
Engagement
Sales
Strategy
Insights
RTB Impressions & Clicks + RT
External
Data
Science
Data
Science
ATDs
Data
Science
DMPs DSPs
Internal
Google Big Query
Reliability
Add redundancy and robustness to our data pipeline to protect against data loss.
Unified Monitoring
Centralizing monitoring gives us a single definition of “data quality”.
Monitoring Infrastructure
Consumer App
Metrics Library
Producer App
Metrics Library
Graphite
Slack
Dev Team
Seyren
Dashboards
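As a rough idea of what the metrics library inside each producer and consumer app might emit, the sketch below formats a metric in Graphite's plaintext protocol (`path value timestamp`, one per line) and sends it over TCP. The metric name and host are placeholders, and the real library's interface is not shown in the slides.

```python
import socket
import time

def graphite_line(path, value, timestamp=None):
    """Format one metric as a Graphite plaintext line: path, value, timestamp."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"

def send_metric(path, value, host="graphite.example.internal", port=2003):
    """Send one metric over TCP.

    The host is a placeholder; 2003 is Graphite's default plaintext port.
    """
    with socket.create_connection((host, port), timeout=2) as conn:
        conn.sendall(graphite_line(path, value).encode("ascii"))
```

With every app reporting to the same Graphite instance, Seyren can alert on thresholds over those metrics and post to the dev team's Slack channel, as the diagram suggests.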
Defining Data Quality
Expected Field Distribution
Data Loss
Business KPIs
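Two of these quality checks, data loss and expected field distribution, can be expressed as simple functions. This is a minimal sketch under assumed definitions: the talk does not say how either check is computed, and the function names and inputs are hypothetical.

```python
from collections import Counter

def data_loss_ratio(produced, loaded):
    """Fraction of produced messages that never reached the destination."""
    if produced == 0:
        return 0.0
    return max(produced - loaded, 0) / produced

def distribution_drift(events, field, expected):
    """Largest absolute gap between observed and expected share of each value.

    `expected` maps each field value to its expected fraction of all events.
    """
    counts = Counter(event.get(field) for event in events)
    total = sum(counts.values()) or 1
    values = set(expected) | set(counts)
    return max(
        (abs(counts.get(v, 0) / total - expected.get(v, 0.0)) for v in values),
        default=0.0,
    )
```

Both functions return a single number, so each can be published as a metric and alerted on with a threshold.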
What’s Next?
Dynamic Stream Filter
You Want This
But You Get This
Stream Sources
Filter Application
Data Filter UI
Filter
Definitions
Data Stream Filter Prototype
Real Time Pipeline
• Shares from top 100 domains
• User actions in northeast region
• Users who recently bought a car
• Users likely to buy a car soon
• Actions from user IDs in (1234, 5432, 9999)
Data Science
External Customers
Internal Teams
Predictive Algorithms
Dynamically create filters based on customers’ needs. These can be created instantly, on demand.
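The idea of filter definitions that a filter application applies to the stream could look roughly like this. The definition format, field names, and operator set are assumptions made for illustration; the prototype's actual schema is not described in the slides.

```python
# Placeholder for a "top 100 domains" list.
TOP_DOMAINS = {"example.com", "news.example.org"}

# Filters are plain data, so a UI could create them without a code deploy.
FILTER_DEFINITIONS = {
    "top_domain_shares": {"field": "domain", "op": "in", "value": TOP_DOMAINS},
    "northeast_actions": {"field": "region", "op": "eq", "value": "northeast"},
    "selected_users": {"field": "user_id", "op": "in", "value": {1234, 5432, 9999}},
}

OPS = {
    "eq": lambda actual, expected: actual == expected,
    "in": lambda actual, expected: actual in expected,
}

def matches(event, definition):
    """Evaluate one filter definition against one event."""
    actual = event.get(definition["field"])
    return OPS[definition["op"]](actual, definition["value"])

def route(event, definitions):
    """Return the names of every filtered stream this event belongs to."""
    return sorted(name for name, d in definitions.items() if matches(event, d))
```

Adding an entry to `FILTER_DEFINITIONS` at runtime creates a new filtered stream immediately, which is the on-demand behavior the slide describes.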
Questions?
Isaac Mosquera
twitter: imosquera
e-mail: isaac@sharethis.com

Real-time pipeline at terabyte scale