The Big Data Journey at Connexity
Will Gage
wgage@connexity.com
@gapjump
Connexity
Shopping powers our marketing platforms
•  Paid Search & Marketplace
Performance-based marketing that finds in-market shoppers and delivers conversions at lower cost
•  Bizrate Insights
A reporting and ratings platform that captures the power of the consumer voice.
•  Display Media
An audience activation platform that integrates retail data and programmatic buying.
Connexity History
Don’t worry - there is no test later
Connexity Technology
The Pre-Big Data Era
Connexity Technology
The Big Data Explosion
Lessons Learned



“There’s a funny thing about regret... It’s better to regret
something you have done, than something you haven’t.”
– Gibby Haynes
Keep It Edgy
It is better to be closer to the bleeding edge than behind
the curve
Case Study: Riak in SEM Keyword Service
o  Online access to metadata for keywords marketed through SEM channels
o  Used in-line with handling end-user traffic from search engines – revenue impacting
o  Handled 1.2 billion keywords at the time of this project
o  Projected 2x growth in 12 months
o  Needed to create a system that could run in an external cloud data center
o  Existing system scaled via a proprietary memory grid cache
Keep It Edgy
Case Study: Riak in SEM Keyword Service
o  Prototyped several solutions: Redis, MongoDB, MySQL
o  Chose Riak for scalability, stability, unfussiness
o  Hardware:
6 nodes @ 16GB RAM, 4 cores, Ubuntu VMs on KVM, RAID 5 array shared across chassis
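A rough back-of-the-envelope sizing from the case-study figures might look like the sketch below. The replication factor of 3 is an assumption (the deck does not state what was configured), "16GB RAM" is read here as per node, and the function name is purely illustrative.

```python
# Back-of-the-envelope sizing using the figures quoted on the slides.
KEYWORDS_NOW = 1_200_000_000         # 1.2 billion keywords (from the deck)
GROWTH_FACTOR = 2                    # projected 2x growth in 12 months (from the deck)
NODES = 6                            # cluster size (from the deck)
RAM_PER_NODE_BYTES = 16 * 1024 ** 3  # ASSUMPTION: 16 GB is per node
REPLICAS = 3                         # ASSUMPTION: replication factor not given in the deck


def replicas_per_node(keywords: int, replicas: int = REPLICAS, nodes: int = NODES) -> int:
    """Stored key replicas per node if data is spread evenly across the cluster."""
    return keywords * replicas // nodes


now = replicas_per_node(KEYWORDS_NOW)
projected = replicas_per_node(KEYWORDS_NOW * GROWTH_FACTOR)
ram_bytes_per_replica = RAM_PER_NODE_BYTES / projected

print(f"replicas per node today:        {now:,}")
print(f"replicas per node in 12 months: {projected:,}")
print(f"RAM bytes per stored replica:   {ram_bytes_per_replica:.1f}")
```

Under these assumptions, each node would have only a handful of bytes of RAM per stored replica after the projected growth, which suggests the data set could not be assumed to fit entirely in memory.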
R & D
10% time: Give all engineers the opportunity to experiment
A few examples that graduated to production
o  Use of Cassandra within Inventory systems
o  SitePerf: in-house availability monitoring tool
o  Several different customer-facing advertising products
o  Hadoop implementations of core bidding platform
o  Mock Service: like WireMock, with persistence to MySQL
o  Numerous internal tools for managing our systems
Quality Assurance
Any new technology choice should improve or maintain
test automation coverage
Case Study: Hadoop + Solr + BDD
Existing Technologies
Reasons to stay with an older technology
1.  It works well
2.  Your business depends on it
3.  Your team is very knowledgeable in its operation
4.  It fits your budget
New Technologies
Reasons to use a new technology
1. It makes new things possible or very difficult things easier
•  Hadoop / MapReduce
•  Auto-sharding distributed key-value data stores (Cassandra, HBase, VoltDB, Riak, etc.)
•  Distributed stream-processing systems (Storm)
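To illustrate what "auto-sharding" buys you, here is a minimal sketch of consistent hashing, one common way such stores spread keys across nodes. It is a toy in-memory illustration, not the actual implementation of any product listed above; the class name, node names, and virtual-node count are hypothetical.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a point on the ring (a large integer)."""
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring: each node owns several points on the ring."""

    def __init__(self, nodes, vnodes=64):
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key: str) -> str:
        """Return the node owning the first ring point at or after the key's hash."""
        idx = bisect.bisect_left(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


nodes = [f"node-{n}" for n in range(1, 7)]  # hypothetical six-node cluster
ring = HashRing(nodes)
keys = [f"keyword-{i}" for i in range(10_000)]
before = {k: ring.node_for(k) for k in keys}

# Adding a node moves only the keys that the new node takes over.
bigger = HashRing(nodes + ["node-7"])
moved = sum(1 for k in keys if bigger.node_for(k) != before[k])
print(f"keys moved after adding a node: {moved} of {len(keys)}")
```

The point of the design is that growing the cluster relocates only a fraction of the keys, and every relocated key lands on the new node, so the application does not need to re-partition data by hand.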
New Technologies
Reasons to use a new technology
2. It will save your company money
•  Hardware
•  Software Licensing
•  Bandwidth
•  Power Consumption
New Technologies
Reasons to use a new technology: saving money
New Technologies
Reasons to use a new technology
3. It will save you time
•  Time to market
•  Time spent on operational complexity
•  Time fighting fires
•  Compute time
New Technologies
Reasons to use a new technology: saving time
Example: FastTrack
New Technologies
Reasons to use a new technology
4. It brings you in line with industry standards
•  Moving from home-grown frameworks to Hadoop, Solr
•  Where possible, running on JVM-based systems
Future Trends
o  Like yours, the data we work with is only growing
o  We are consolidating the number and variety of NoSQL solutions that we use
o  We’re looking at better abstractions for Java MapReduce programming: Crunch, Cascading, …
o  We have dipped our toes in the water with Storm, but expect heavier stream-processing needs soon
o  Still looking for a bulletproof way of importing data from various sources into Hadoop: LinkedIn’s Gobblin shows some promise there
o  Big data technologies are becoming more distributed across our organization
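For readers unfamiliar with what MapReduce abstractions wrap, the map, shuffle, and reduce phases can be sketched in a few lines of plain Python. This is an in-memory illustration only; it is not Hadoop, Crunch, or Cascading code, and the sample records and function names are hypothetical.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Run mapper over every record, group values by key, then reduce each group."""
    groups = defaultdict(list)
    for record in records:                  # map phase
        for key, value in mapper(record):
            groups[key].append(value)       # shuffle: group values by key
    return {key: reducer(key, values) for key, values in groups.items()}  # reduce phase


def click_mapper(line):
    """Emit (keyword, 1) for each hypothetical 'keyword<TAB>source' log line."""
    keyword, _source = line.split("\t")
    yield keyword, 1


def sum_reducer(_key, values):
    """Add up the counts emitted for one keyword."""
    return sum(values)


log = ["red shoes\tsearch", "blue hat\tsearch", "red shoes\tdisplay"]
counts = map_reduce(log, click_mapper, sum_reducer)
print(counts)  # {'red shoes': 2, 'blue hat': 1}
```

Higher-level libraries let you express the same job as a pipeline of such functions instead of hand-written mapper and reducer classes.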
In Closing
You should:
o  Stay within walking distance of the bleeding edge
o  Empower your engineers to experiment
o  Always move in the direction of better automated testing
o  Keep using the old technologies that are awesome
o  Make new things possible
o  Save your company money
o  Save your company time
o  Stay in line with industry standards
o  Call your family once in a while
… and you can do all of these things on your own big data journeys.

The Big Data Journey at Connexity - Big Data Day LA 2015
