Named leader in report.
Founded in 2009
Acquired by AOL in 2014
Using Big Data stack since 2009
75 people - 30 R&D
1.5T of daily data
>100 data sources
Helping marketers optimize their spend
Across: Channels | Devices
Online + Offline
Convertro
ü Clear
ü Actionable
ü Great UX/I
Successful Dashboard
Considerations: Rendering time | Storage | Cost | UX/I | Processing time
  
Insights | Comparison | Over time | Explore | “Sticky” | Configuration | RT metrics | Integrate
  
USE CASE #1
Speed | Batch
Materializing everything into one table is too costly (belated massive updates)
Leverage Vertica’s sorted data structure
Join data at run time (O(n))
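
The run-time join above depends on both inputs already being sorted on the join key, which is what makes a single linear pass possible. A minimal sketch in Python, assuming two hypothetical row lists (`spend_rows`, `revenue_rows`) sorted by a unique key; it illustrates the general merge-join idea, not Vertica's internal implementation.

```python
def merge_join(left, right):
    """Join two lists of (key, value) tuples, each sorted by a unique key.

    Each input is scanned once, so the cost is O(len(left) + len(right)).
    """
    joined = []
    i = j = 0
    while i < len(left) and j < len(right):
        left_key, left_value = left[i]
        right_key, right_value = right[j]
        if left_key == right_key:
            joined.append((left_key, left_value, right_value))
            i += 1
            j += 1
        elif left_key < right_key:
            i += 1  # no match for this left row; advance left
        else:
            j += 1  # no match for this right row; advance right
    return joined


# Hypothetical sample data: (channel, amount), sorted by channel.
spend_rows = [("display", 120.0), ("email", 15.0), ("search", 300.0)]
revenue_rows = [("display", 200.0), ("search", 950.0), ("social", 40.0)]

print(merge_join(spend_rows, revenue_rows))
```

Because neither input is re-sorted or hashed, late-arriving updates only need to land in their own sorted table rather than in one large materialized table.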
[Diagram: Query over Spend, Batch, Revenue, Speed; Query* over a Merged Structure of Spend, Batch, Revenue, Speed]
* λ architecture
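
The λ (lambda) architecture noted above answers a query by combining a precomputed batch view with a small, recent speed view. A rough sketch, assuming hypothetical in-memory dictionaries for the two views and an additive metric; the actual stores and merge logic are not described in the slides.

```python
from collections import Counter


def query_merged(batch_view, speed_view):
    """Combine batch and speed views of an additive metric, per key."""
    merged = Counter(batch_view)
    merged.update(speed_view)  # Counter.update adds values instead of replacing
    return dict(merged)


# Hypothetical views: revenue per channel.
batch_revenue = {"search": 950, "display": 200}  # recomputed periodically
speed_revenue = {"search": 30, "social": 5}      # events since the last batch

print(query_merged(batch_revenue, speed_revenue))
```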
USE CASE #2
Different metrics with 1:N relationships
Avoid joins at query time (if possible)
Pre-join and aggregate by dimensions
Pre-joining does not necessarily explode your data store
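
One way to read the last point: if each metric is first aggregated to the shared dimensions, the join yields at most one row per dimension value, regardless of the 1:N relationships in the raw events. A sketch with hypothetical event lists keyed by a single `channel` dimension:

```python
from collections import defaultdict


def aggregate(events):
    """Sum event counts per dimension value: [(dim, count), ...] -> {dim: total}."""
    totals = defaultdict(int)
    for dim, count in events:
        totals[dim] += count
    return totals


def prejoin(visits, conversions, impressions):
    """Aggregate each metric first, then join the aggregates on the dimension."""
    v, c, i = aggregate(visits), aggregate(conversions), aggregate(impressions)
    dims = sorted(set(v) | set(c) | set(i))
    return [
        {"channel": d, "visits": v.get(d, 0), "conversions": c.get(d, 0),
         "impressions": i.get(d, 0)}
        for d in dims
    ]


# Hypothetical raw events: (channel, count).
visits = [("search", 3), ("search", 2), ("email", 1)]
conversions = [("search", 1)]
impressions = [("search", 100), ("display", 40), ("display", 60)]

table = prejoin(visits, conversions, impressions)
for row in table:
    print(row)
```

Seven raw event rows collapse into three pre-joined rows here, one per channel.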
[Diagram: Visits, Conversions, and Impressions are joined (⨝) and aggregated (Σ) into one table: Visits, Conversions, Impressions]
USE CASE #3
Many dimensions
Limit the number of records returned to the screen: visualize the most significant data
Allow dumping data with a different QoS
Allow choosing up to X dimensions, not all
For each page, allow choosing different relevant dimensions
Build different data structures for different pages
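
Limiting the records returned to the screen can be as simple as keeping the top N rows by a chosen metric. The sketch below also folds the remainder into a single "Other" row so totals still add up; that roll-up is an assumption for illustration, not something the slides specify.

```python
import heapq


def top_n_with_other(rows, n, other_label="Other"):
    """Keep the n largest (label, value) rows; sum the rest into one row."""
    top = heapq.nlargest(n, rows, key=lambda row: row[1])
    if len(rows) > n:
        remainder = sum(value for _, value in rows) - sum(value for _, value in top)
        top.append((other_label, remainder))
    return top


# Hypothetical rows: (channel, revenue).
rows = [("search", 950), ("display", 200), ("email", 15), ("social", 40)]

print(top_n_with_other(rows, 2))
```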
USE CASE #4
Same data, different rendering
Query locality caching
Backend does data rendering
Shared configuration across widgets
MPP databases have limited query schedulers
  
[Diagram: Table → Query → Cache → Σ → Widget 1, Widget 2]
USE CASE #5
Real-time data points ticker
Sometimes you don’t have to be 100% accurate or consistent.
Try using:
Extrapolation
Sampling
Different data stores
Heuristics
[Diagram: logs → Speed layer (every X minutes) → Ticker (real-time extrapolation)]
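
A sketch of the real-time extrapolation idea: the speed layer delivers an exact count only every X minutes, and between deliveries the ticker projects forward from the last two snapshots. The class name, the linear model, and the sample numbers are assumptions.

```python
class ExtrapolatingTicker:
    """Estimates a running count between periodic exact snapshots."""

    def __init__(self):
        self.prev = None  # (timestamp_seconds, count)
        self.last = None

    def add_snapshot(self, timestamp, count):
        self.prev, self.last = self.last, (timestamp, count)

    def estimate(self, now):
        if self.last is None:
            return 0
        if self.prev is None:
            return self.last[1]  # no rate yet; show the last exact value
        (t0, c0), (t1, c1) = self.prev, self.last
        rate = (c1 - c0) / (t1 - t0)  # events per second
        return round(c1 + rate * (now - t1))


ticker = ExtrapolatingTicker()
ticker.add_snapshot(0, 1000)
ticker.add_snapshot(300, 1600)  # exact snapshot every 5 minutes
print(ticker.estimate(450))     # halfway to the next snapshot
```

The displayed value is approximate by design and is corrected whenever the next exact snapshot arrives.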
  
Hydro – Data Rendering Service
[Diagram: Hydro pipeline: Extract | Transform | Render. Connected components: ETL, Web/App Server, API, DB1, DB2]
  
Connects to any data source
Multi-level caching and invalidation
Applies data transformation and rendering
Logic sharing
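
Hydro's internals are not shown in the slides, so the following is only an illustrative sketch of an extract, transform, render pipeline with a cache in front of the extract step (a single cache level, for brevity). Every name here (`RenderingService`, `load_revenue`, the transform and render callables) is hypothetical.

```python
class RenderingService:
    """Toy extract -> transform -> render pipeline with a per-source cache."""

    def __init__(self, sources):
        self.sources = sources  # name -> zero-argument callable returning rows
        self.cache = {}

    def extract(self, source_name):
        if source_name not in self.cache:
            self.cache[source_name] = self.sources[source_name]()
        return self.cache[source_name]

    def invalidate(self, source_name):
        self.cache.pop(source_name, None)

    def serve(self, source_name, transform, render):
        return render(transform(self.extract(source_name)))


calls = []  # records each hit on the simulated data source


def load_revenue():
    calls.append("db1")
    return [("display", 200), ("search", 950)]


def top_first(rows):
    return sorted(rows, key=lambda row: row[1], reverse=True)


def as_csv(rows):
    return "\n".join(f"{label},{value}" for label, value in rows)


def as_total(rows):
    return sum(value for _, value in rows)


service = RenderingService({"db1": load_revenue})
print(service.serve("db1", top_first, as_csv))
print(service.serve("db1", top_first, as_total))
print("source calls:", len(calls))
```

The transform step is shared by both renderings, which is one reading of "logic sharing".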
Understand the requirements
One technology doesn't fit all
One data structure doesn't fit all
Good UX takes data and technology considerations into account
yaniv@convertro.com
Data Processing and Mining
Analytics DB - Vertica
Built for analytics
Storage / Query engine / Optimizer
Column-oriented store
Sorted
True MPP
Deals well with high cardinality and sparse data
* Not open source
Real Time metrics
Web Stack
Server: Pandas | Hydro
Client (architecture and visualization): Backbone | Marionette | RequireJS | Handlebars | Highcharts | D3 | Underscore | Twitter Bootstrap | SlickGrid | ...
  

"Interactive Deep Analytics" Dashboard

    Web Stack Server       Pandas   Hydro   Client     Backbone   marioneVe   RequireJs   handlebars       highcharts   d3     underscore   TwiVer  Bootstrap   SlickGrid   ...   Architecture   Visualizaion