Tapping the real-time stream with SQL
Wix’s SQL-on-Storm Platform
May, 2015
Gregory Bondar, gregoryb@wix.com
Igal Shilman, igals@wix.com
  
Wix Company
•  Wix.com is the world’s leading cloud-based web development platform that enables users to create professional HTML5 websites using online "Drag & Drop" tools
•  Wix was founded in 2006 and is headquartered in Tel Aviv
•  Wix has around 65M registered users and growing…
  
Wix’s Data Services: building blocks
•  Batch-Oriented Data Processing:
-  Hadoop ecosystem: Cloudera CDH4, HBase, Pig, Oozie, etc.
•  SQL-on-Hadoop interfaces:
-  Facebook’s Presto with “home-made” Parquet, HBase and MS SQL connectors
•  Real-time Stream-Oriented Analytics:
-  Storm, Esper, etc.
•  And more:
-  Microsoft SQL Server 2012
-  Google Cloud (AppEngine, Datastore, Pub/Sub, Dataflow, etc.)
-  Sharded Redis cluster
  
Major limitations that pushed us onto the Data Stream journey
•  Latency, latency, laaaaaaaaatency…
–  Event ingestion latency (10-20 minutes on average)
–  Hadoop is optimized for batch-oriented processing of historical data
–  Latency of analytic job results (up to dozens of minutes)
–  Unpredictable consumption of Hadoop cluster resources by on-demand analytic jobs
  
Use Cases that require Real-Time Data Stream Analytics
•  Product personalization
•  Analysis of user behavior trends and anomalies
•  Operational analytics (monitoring, security, etc.)
•  Machine learning models applied to user activity to predict user behavior
  
Wix Data Stream Tube
Let’s assume that all of Wix’s events flow through a single tube named “events”
  
SQL-like query language

SQL-like query language (Cont.)
  
Wix’s SQL-on-Storm: requirements
•  Democratizing data: self-service to access and utilize as much data as legally possible
•  User-friendly interface for SQL patriots
•  Flexibility to execute any kind of query
•  Ability to output the query results to external services
•  Support for on-demand and long-running queries
•  Knowledge sharing: “ready-to-use” query templates
•  High throughput and maximum uptime
  
Integrated usage of Storm and Esper
  
Esper - http://www.espertech.com/esper/
•  Esper: a lightweight Java library for complex event processing (CEP) and event series analysis
•  Why Esper?
–  Offers a rich SQL-like event processing language (EPL) supporting very complex event streaming analytics
–  Easy to integrate and use
–  Very stable, with high performance metrics
–  Actively developed
–  Open source, well documented
  
Storm topology reuse by correct partition key
•  Accepts events from log collectors
•  Converts them to enriched objects
•  Hash-partitions objects by key (e.g., user id, request id)
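
The hash-partitioning step can be illustrated with a minimal Python sketch. It is not the Wix implementation: the function names, the `user_id` field, and the choice of CRC32 as the hash are assumptions made for illustration. The point it shows is that a stable hash of the key, taken modulo the number of compute tasks, sends every event for the same key to the same task.

```python
import zlib


def partition_for(key: str, num_tasks: int) -> int:
    """Map a partition key to a task index using a stable hash."""
    # crc32 gives the same value in every process, unlike Python's
    # built-in hash() for strings, which is randomized per process.
    return zlib.crc32(key.encode("utf-8")) % num_tasks


def route(events, key_field, num_tasks):
    """Group events by the task index that their partition key hashes to."""
    buckets = {i: [] for i in range(num_tasks)}
    for event in events:
        task = partition_for(str(event[key_field]), num_tasks)
        buckets[task].append(event)
    return buckets


# Hypothetical sample events; field names are illustrative only.
events = [
    {"user_id": "u1", "type": "click"},
    {"user_id": "u2", "type": "view"},
    {"user_id": "u1", "type": "publish"},
]
buckets = route(events, "user_id", num_tasks=4)
```

Because both `u1` events land in the same bucket, a query that tracks per-user state only has to run on one task. In Storm this kind of routing is usually expressed with a fields grouping; the slides do not say which grouping the Wix topology uses.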
  
Compute Bolt
•  Manages Esper engine instances
•  Deploys/un-deploys queries on demand
•  Routes query results to the action / aggregation layers
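
The three responsibilities listed for the Compute Bolt can be sketched in Python as a toy model. This is only an assumption-laden illustration: `ComputeTask` is a hypothetical class, a "query" is reduced to a plain filter function rather than an Esper EPL statement, and an "action" is any callable that receives the result.

```python
from typing import Callable, Dict, List, Tuple

Event = dict
Predicate = Callable[[Event], bool]
Action = Callable[[str, Event], None]


class ComputeTask:
    """Toy stand-in for one Compute Bolt task (hypothetical, not Esper)."""

    def __init__(self) -> None:
        # query id -> (filter standing in for a query, action to route to)
        self._queries: Dict[str, Tuple[Predicate, Action]] = {}

    def deploy(self, query_id: str, predicate: Predicate, action: Action) -> None:
        """Register a query while the task keeps running."""
        self._queries[query_id] = (predicate, action)

    def undeploy(self, query_id: str) -> None:
        """Remove a query; unknown ids are ignored."""
        self._queries.pop(query_id, None)

    def on_event(self, event: Event) -> None:
        """Evaluate every deployed query and route matches to its action."""
        for query_id, (predicate, action) in list(self._queries.items()):
            if predicate(event):
                action(query_id, event)


sent: List[Tuple[str, str]] = []  # records what the action layer received
task = ComputeTask()
task.deploy(
    "publishes",
    lambda e: e["type"] == "publish",
    lambda qid, e: sent.append((qid, e["user_id"])),
)
task.on_event({"user_id": "u1", "type": "publish"})  # matches, routed
task.on_event({"user_id": "u2", "type": "view"})     # does not match
task.undeploy("publishes")
task.on_event({"user_id": "u3", "type": "publish"})  # query already removed
```

Keeping the query registry separate from event handling is what lets queries come and go without restarting the topology, which matches the on-demand requirement stated earlier in the deck.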
  
Actions
•  Personalization Services
•  Graphite
•  Database
•  New Relic
•  Email
•  UDP and HTTP output
  
Wix SQL-on-Storm Dashboard: Demo
  
Aggregation Bolt
•  Special action type that aggregates partial results of Compute Bolts
•  In other words: a Map-Reduce paradigm implementation for streaming
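
A small Python sketch of the Map-Reduce analogy, under stated assumptions: because events are partitioned across compute tasks, each task sees only part of the stream, so a global aggregate has to be assembled from per-task partial results. The function names and the per-type count are illustrative choices, not taken from the Wix platform.

```python
from collections import Counter


def partial_counts(events):
    """'Map' side: one compute task counts the events it received, per type."""
    return Counter(e["type"] for e in events)


def merge(partials):
    """'Reduce' side: the aggregation step sums the partial results."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts together
    return total


# Hypothetical events as seen by two different compute tasks.
task_a = [{"type": "click"}, {"type": "view"}]
task_b = [{"type": "click"}, {"type": "click"}]
totals = merge([partial_counts(task_a), partial_counts(task_b)])
```

Counts merge cleanly because addition is associative and commutative. Aggregates such as averages would need to carry a sum and a count in each partial result instead of the average itself.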
  
Wix SQL-on-Storm - Aggregation Queries: Demo
  
Wix SQL-on-Storm: Architecture Summary
  
Any Questions?!
  
