The Big Data Journey
at Connexity

Will Gage
wgage@connexity.com
@gapjump

Connexity
Shopping powers our marketing platforms

•  Paid Search & Marketplace
   Performance-based marketing that finds in-market shoppers and delivers conversions at lower cost.
•  Bizrate Insights
   A reporting and ratings platform that captures the power of the consumer voice.
•  Display Media
   An audience activation platform that integrates retail data and programmatic buying.

Connexity History
Don’t worry: there is no test later.

Connexity Technology
The Pre-Big Data Era

Connexity Technology
The Big Data Explosion

Lessons Learned

“There’s a funny thing about regret... It’s better to regret something you have done, than something you haven’t.” – Gibby Haynes

Keep It Edgy
It is better to be closer to the bleeding edge than behind the curve.
Case Study: Riak in SEM Keyword Service

o  Online access to metadata for keywords marketed through SEM channels
o  Used in-line with handling end-user traffic from search engines (revenue impacting)
o  Handled 1.2 billion keywords at the time of this project
o  Projected 2x growth in 12 months
o  Needed to create a system that could run in an external cloud data center
o  Existing system scaled via a proprietary in-memory grid cache

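The sizing requirement can be made concrete with a little arithmetic. A minimal sketch, assuming growth compounds smoothly month over month (the slide states only the 12-month endpoint); the function names are hypothetical:

```python
def projected_keywords(current: float, months: float, doubling_months: float = 12.0) -> float:
    """Project the keyword count, assuming steady exponential growth that doubles every `doubling_months`."""
    return current * 2 ** (months / doubling_months)


def monthly_growth_rate(doubling_months: float = 12.0) -> float:
    """Compounded monthly growth rate implied by doubling every `doubling_months` months."""
    return 2 ** (1 / doubling_months) - 1


if __name__ == "__main__":
    current = 1.2e9  # keywords handled at the time of the project (from the slide)
    for m in (0, 6, 12):
        print(f"month {m:2d}: {projected_keywords(current, m) / 1e9:.2f} billion keywords")
    print(f"implied monthly growth: {monthly_growth_rate():.2%}")
```

Under that assumption the 2x target works out to roughly 6% growth per month, which is the rate a capacity plan would have to stay ahead of.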
Keep It Edgy
Case Study: Riak in SEM Keyword Service

o  Prototyped several solutions: Redis, MongoDB, MySQL
o  Chose Riak for its scalability, stability, and unfussiness
o  Hardware:
   6 nodes @ 16 GB RAM, 4 cores, Ubuntu VMs on KVM, RAID 5 array shared across chassis

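Riak spreads keys across a cluster like this one with consistent hashing: the 160-bit SHA-1 hash space is split into equal partitions (vnodes) claimed by the physical nodes, and each key is replicated to the next `n_val` partitions around the ring. A simplified sketch of that placement scheme, not Riak's code: round-robin claiming is a simplification, 64 partitions and `n_val=3` are chosen to mirror Riak's usual defaults, and the node names are hypothetical.

```python
import hashlib
from collections import Counter

RING_BITS = 160  # SHA-1 digest size; the ring spans [0, 2**160)


class Ring:
    """Simplified consistent-hashing ring with fixed-size partitions (vnodes).

    Partitions are claimed by physical nodes round-robin. A key's preference
    list is found by walking clockwise from the key's partition and taking the
    first `n_val` partitions owned by nodes not already chosen, so replicas
    land on distinct nodes.
    """

    def __init__(self, nodes, num_partitions=64, n_val=3):
        self.nodes = list(nodes)
        if not 0 < n_val <= len(self.nodes) <= num_partitions:
            raise ValueError("need 0 < n_val <= len(nodes) <= num_partitions")
        self.num_partitions = num_partitions
        self.n_val = n_val
        self.partition_size = 2 ** RING_BITS // num_partitions
        # owner[i] is the physical node that claims partition i.
        self.owner = [self.nodes[i % len(self.nodes)] for i in range(num_partitions)]

    def partition_for(self, key: str) -> int:
        """Hash the key onto the ring and return the partition it falls in."""
        h = int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest(), "big")
        # min() guards the top edge when num_partitions does not divide 2**160.
        return min(h // self.partition_size, self.num_partitions - 1)

    def preference_list(self, key: str) -> list:
        """Return the distinct nodes that should hold replicas of `key`."""
        start = self.partition_for(key)
        chosen = []
        for step in range(self.num_partitions):
            node = self.owner[(start + step) % self.num_partitions]
            if node not in chosen:
                chosen.append(node)
                if len(chosen) == self.n_val:
                    break
        return chosen


if __name__ == "__main__":
    ring = Ring([f"node{i}" for i in range(1, 7)])  # six nodes, as on the slide
    print(ring.preference_list("keyword:running shoes"))
    # Rough balance check: how many replicas each node would hold for 60,000 keys.
    load = Counter(n for i in range(60_000) for n in ring.preference_list(f"kw{i}"))
    print(sorted(load.items()))
```

Riak's real claim algorithm also hands partitions off when nodes join or leave; this sketch keeps ownership fixed to show only how a key finds its replicas.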
R & D
10% time: Give all engineers the opportunity to experiment

A few examples that graduated to production:

o  Use of Cassandra within inventory systems
o  SitePerf: in-house availability monitoring tool
o  Several different customer-facing advertising products
o  Hadoop implementations of the core bidding platform
o  Mock Service: like WireMock, with persistence to MySQL
o  Numerous internal tools for managing our systems

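The Mock Service item describes a small but useful architecture: canned HTTP responses keyed by request, kept in a database so they survive restarts. A minimal sketch of the stub registry at its core, assuming exact matching on method and path; the class and method names are hypothetical, and the standard-library `sqlite3` stands in for MySQL only to keep the example self-contained.

```python
import json
import sqlite3


class StubStore:
    """Hypothetical stub registry for a WireMock-like mock service.

    Stubs map (method, path) to a canned (status, headers, body) response and
    are persisted in a SQL table. sqlite3 stands in for MySQL in this sketch.
    """

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        self.conn.execute(
            """CREATE TABLE IF NOT EXISTS stubs (
                   method  TEXT NOT NULL,
                   path    TEXT NOT NULL,
                   status  INTEGER NOT NULL,
                   headers TEXT NOT NULL,
                   body    TEXT NOT NULL,
                   PRIMARY KEY (method, path))"""
        )

    def register(self, method, path, status, body, headers=None):
        """Upsert a stub: re-registering the same (method, path) replaces the old one."""
        self.conn.execute(
            "INSERT OR REPLACE INTO stubs VALUES (?, ?, ?, ?, ?)",
            (method.upper(), path, status, json.dumps(headers or {}), body),
        )
        self.conn.commit()

    def match(self, method, path):
        """Return (status, headers, body) for a request, or a 404 when no stub matches."""
        row = self.conn.execute(
            "SELECT status, headers, body FROM stubs WHERE method = ? AND path = ?",
            (method.upper(), path),
        ).fetchone()
        if row is None:
            return 404, {}, "no stub matched"
        status, headers, body = row
        return status, json.loads(headers), body


if __name__ == "__main__":
    store = StubStore(sqlite3.connect(":memory:"))
    store.register("get", "/offers/123", 200, '{"price": 19.99}', {"Content-Type": "application/json"})
    print(store.match("GET", "/offers/123"))
    print(store.match("GET", "/offers/999"))
```

WireMock itself can match on URL patterns, headers, and request bodies; exact method-and-path lookup keeps the sketch focused on the persistence idea.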
Quality Assurance
Any new technology choice should improve or maintain test automation coverage
Case Study: Hadoop + Solr + BDD

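The slide does not say which BDD tooling was used alongside Hadoop and Solr. A minimal sketch of the idea, assuming a hypothetical `to_solr_doc` step that prepares product records for the search index; the Given/When/Then structure is written as comments inside plain `unittest` tests, and the field names are illustrative only.

```python
import unittest


def to_solr_doc(record: dict) -> dict:
    """Hypothetical map step of an indexing job: turn a raw product record into a
    flat document ready to send to Solr. Field names are illustrative only."""
    title = " ".join(record.get("title", "").split())  # collapse stray whitespace
    return {
        "id": str(record["product_id"]),
        "title": title,
        "price": round(float(record["price_cents"]) / 100, 2),
        "in_stock": record.get("quantity", 0) > 0,
    }


class IndexingJobBehavior(unittest.TestCase):
    """Scenario: a product record is prepared for the search index."""

    def test_well_formed_record_becomes_a_document(self):
        # Given a raw product record with messy whitespace in the title
        record = {"product_id": 42, "title": "  Trail   Running Shoes ", "price_cents": 8999, "quantity": 3}
        # When the indexing step converts it
        doc = to_solr_doc(record)
        # Then the document has a clean title, a decimal price, and a stock flag
        self.assertEqual(doc, {"id": "42", "title": "Trail Running Shoes", "price": 89.99, "in_stock": True})

    def test_record_without_quantity_is_out_of_stock(self):
        # Given a record with no quantity field
        record = {"product_id": 7, "title": "Tent", "price_cents": 15000}
        # When it is converted
        doc = to_solr_doc(record)
        # Then it is marked out of stock
        self.assertFalse(doc["in_stock"])


if __name__ == "__main__":
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(IndexingJobBehavior)
    unittest.TextTestRunner(verbosity=2).run(suite)
```

Keeping the transformation a pure function is what makes it testable this way: the same logic can run inside a MapReduce job without the tests needing a cluster or a live Solr instance.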
Existing Technologies
Reasons to stay with an older technology

1.  It works well
2.  Your business depends on it
3.  Your team is very knowledgeable in its operation
4.  It fits your budget

New Technologies
Reasons to use a new technology

1. It makes new things possible or very difficult things easier
•  Hadoop / MapReduce
•  Auto-sharding distributed key-value data stores (Cassandra, HBase, VoltDB, Riak, etc.)
•  Distributed stream-processing systems (Storm)

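What Hadoop MapReduce makes possible is running one simple program shape (map, then shuffle, then reduce) across a whole cluster. A single-process sketch of that shape, counting keywords in hypothetical search-log lines; it illustrates the model only and is not Hadoop's API.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record: str):
    """Mapper: emit (keyword, 1) for every whitespace-separated token in a log line."""
    for token in record.lower().split():
        yield token, 1


def shuffle(pairs):
    """Shuffle/sort: group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key, values):
    """Reducer: sum the counts for one key."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce over an in-memory list of records."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())


if __name__ == "__main__":
    logs = ["running shoes", "trail running shoes", "tent"]
    print(run_job(logs))  # {'running': 2, 'shoes': 2, 'tent': 1, 'trail': 1}
```

Because the mapper and reducer only see one record or one key at a time, the framework is free to spread them over many machines; that independence is what turns "very difficult" jobs into routine ones.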
New Technologies
Reasons to use a new technology

2. It will save your company money
•  Hardware
•  Software licensing
•  Bandwidth
•  Power consumption

New Technologies
Reasons to use a new technology: saving money

New Technologies
Reasons to use a new technology

3. It will save you time
•  Time to market
•  Time spent on operational complexity
•  Time fighting fires
•  Compute time

New Technologies
Reasons to use a new technology: saving time

Example: FastTrack

New Technologies
Reasons to use a new technology

4. It brings you in line with industry standards
•  Moving from home-grown frameworks to Hadoop, Solr
•  Where possible, running on JVM-based systems

Future Trends

o  Like yours, the data we work with keeps growing
o  We are consolidating the number and variety of NoSQL solutions that we use
o  We’re looking at better abstractions for Java MapReduce programming: Crunch, Cascading, …
o  We have dipped our toes in the water with Storm, but expect heavier stream-processing needs soon
o  We are still looking for a bulletproof way of importing data from various sources into Hadoop; LinkedIn’s Gobblin shows some promise there
o  Big data technologies are becoming more distributed across our organization

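Gobblin's documentation breaks ingestion into stages such as a source and extractor, converters, quality checkers, and a writer. A minimal sketch of that staged shape in plain Python, to show why it helps with "various sources": each stage is swappable. The class and function names are hypothetical, this is not Gobblin's API, and a list stands in for an HDFS writer.

```python
import csv
import io
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class IngestionJob:
    """Hypothetical staged ingestion job; stage names loosely follow Gobblin's concepts.

    extract:       yields raw records from one source
    converters:    applied in order to every record
    quality_check: records that fail are set aside instead of written
    write:         receives each record that passed
    """
    extract: Callable[[], Iterable[dict]]
    converters: List[Callable[[dict], dict]]
    quality_check: Callable[[dict], bool]
    write: Callable[[dict], None]
    rejected: List[dict] = field(default_factory=list)

    def run(self) -> int:
        """Run every stage over every record; return the number of records written."""
        written = 0
        for record in self.extract():
            for convert in self.converters:
                record = convert(record)
            if self.quality_check(record):
                self.write(record)
                written += 1
            else:
                self.rejected.append(record)
        return written


def to_int_clicks(record: dict) -> dict:
    """Example converter: parse the 'clicks' column, using None when it is not a number."""
    out = dict(record)
    out["clicks"] = int(out["clicks"]) if out["clicks"].isdigit() else None
    return out


if __name__ == "__main__":
    raw = "merchant_id,clicks\n17,120\n18,not-a-number\n"
    sink = []  # a plain list stands in for an HDFS writer
    job = IngestionJob(
        extract=lambda: csv.DictReader(io.StringIO(raw)),
        converters=[to_int_clicks],
        quality_check=lambda rec: rec["clicks"] is not None,
        write=sink.append,
    )
    print(job.run(), sink, job.rejected)
```

Adding a new source then means writing a new extractor (and perhaps a converter) rather than a new end-to-end import script, while quality checks and writing stay shared.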
In Closing

You should:

o  Stay within walking distance of the bleeding edge
o  Empower your engineers to experiment
o  Always move in the direction of better automated testing
o  Keep using the old technologies that are awesome
o  Make new things possible
o  Save your company money
o  Save your company time
o  Stay in line with industry standards
o  Call your family once in a while

… and you can do all of these things on your own big data journeys.

The Big Data Journey at Connexity - Big Data Day LA 2015

  • 1. The Big Data Journey at Connexity. Will Gage, wgage@connexity.com, @gapjump
  • 2. Connexity: Shopping powers our marketing platforms
    • Paid Search & Marketplace: Performance-based marketing that finds in-market shoppers and delivers conversions at lower cost
    • Bizrate Insights: A reporting and ratings platform that captures the power of the consumer voice
    • Display Media: An audience activation platform that integrates retail data and programmatic buying
  • 3. Connexity History: Don’t worry, there is no test later.
  • 4. Connexity Technology: The Pre-Big Data Era
  • 5. Connexity Technology: The Big Data Explosion
  • 6. Lessons Learned: “There’s a funny thing about regret... It’s better to regret something you have done, than something you haven’t.” – Gibby Haynes
  • 7. Keep It Edgy: It is better to be closer to the bleeding edge than behind the curve.
    Case Study: Riak in SEM Keyword Service
    o Online access to metadata for keywords marketed through SEM (search engine marketing) channels
    o Used in-line with handling end-user traffic from search engines, so it is revenue-impacting
    o Handled 1.2 billion keywords at the time of this project
    o Projected 2x growth in 12 months
    o Needed to create a system that could run in an external cloud data center
    o Existing system scaled via a proprietary in-memory grid cache
  • 8. Keep It Edgy, Case Study: Riak in SEM Keyword Service
    o Prototyped several solutions: Redis, MongoDB, MySQL
    o Chose Riak for scalability, stability, and unfussiness
    o Hardware: 6 nodes @ 16 GB RAM, 4 cores, Ubuntu VMs on KVM, RAID 5 array shared across chassis
  • 9. R & D 10% time: Give all engineers the opportunity to experiment. A few examples that graduated to production:
    o Use of Cassandra within Inventory systems
    o SitePerf: in-house availability monitoring tool
    o Several different customer-facing advertising products
    o Hadoop implementations of the core bidding platform
    o Mock Service: like WireMock, with persistence to MySQL
    o Numerous internal tools for managing our systems
  • 10. R & D 10% time: Give all engineers the opportunity to experiment.
  • 11. Quality Assurance: Any new technology choice should improve or maintain test automation coverage.
    Case Study: Hadoop + Solr + BDD
  • 12. Existing Technologies: Reasons to stay with an older technology
    1. It works well.
    2. Your business depends on it.
    3. Your team is very knowledgeable in its operation.
    4. It fits your budget.
  • 13. New Technologies: Reasons to use a new technology
    1. It makes new things possible or very difficult things easier:
       • Hadoop / MapReduce
       • Auto-sharding distributed key-value data stores (Cassandra, HBase, VoltDB, Riak, etc.)
       • Distributed stream-processing systems (Storm)
  • 14. New Technologies: Reasons to use a new technology
    2. It will save your company money:
       • Hardware
       • Software licensing
       • Bandwidth
       • Power consumption
  • 15. New Technologies: Reasons to use a new technology: saving money
  • 16. New Technologies: Reasons to use a new technology
    3. It will save you time:
       • Time to market
       • Time spent on operational complexity
       • Time fighting fires
       • Compute time
  • 17. New Technologies: Reasons to use a new technology: saving time. Example: FastTrack
  • 18. New Technologies: Reasons to use a new technology
    4. It brings you in line with industry standards:
       • Moving from home-grown frameworks to Hadoop and Solr
       • Where possible, running on JVM-based systems
  • 19. Future Trends
    o Like yours, the data we work with is only growing.
    o We are consolidating the number and variety of NoSQL solutions that we use.
    o We’re looking at better abstractions for Java MapReduce programming: Crunch, Cascading, …
    o We have dipped our toes in the water with Storm, but expect heavier stream-processing needs soon.
    o We are still looking for a bulletproof way of importing data from various sources into Hadoop; LinkedIn’s Gobblin shows some promise there.
    o Big data technologies are becoming more distributed across our organization.
  • 20. In Closing. You should:
    o Stay within walking distance of the bleeding edge
    o Empower your engineers to experiment
    o Always move in the direction of better automated testing
    o Keep using the old technologies that are awesome
    o Make new things possible
    o Save your company money
    o Save your company time
    o Stay in line with industry standards
    o Call your family once in a while
    … and you can do all of these things on your own big data journeys.
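The Riak case study (slides 7 and 8) and the "auto-sharding distributed key-value data stores" on slide 13 do not show how such a store decides which node holds a key. Below is a minimal sketch of the consistent-hashing ring idea that Riak is built on. It assumes a 64-partition ring (Riak's default `ring_size`), three replicas per key (Riak's default `n_val`), and the six nodes from the case study. The bucket name `keywords`, the helper functions, and the round-robin partition claim are hypothetical simplifications for illustration. They are not Riak's actual claim algorithm or client API.

```python
import hashlib
from collections import Counter

RING_BITS = 160  # SHA-1 digests are 160 bits, the size of the ring's hash space
RING_SIZE = 64   # assumed partition count (Riak's default ring_size)
N_VAL = 3        # assumed number of replicas per key


def key_hash(bucket: str, key: str) -> int:
    """Map a bucket/key pair to a point on the 160-bit ring.

    Simplification: Riak hashes an Erlang-encoded {Bucket, Key} term;
    this sketch hashes a plain "bucket/key" string instead.
    """
    digest = hashlib.sha1(f"{bucket}/{key}".encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def build_ring(nodes: list[str], ring_size: int = RING_SIZE) -> list[str]:
    """Hypothetical round-robin claim: partition i is owned by nodes[i % len(nodes)]."""
    return [nodes[i % len(nodes)] for i in range(ring_size)]


def preference_list(ring: list[str], bucket: str, key: str, n: int = N_VAL) -> list[tuple[int, str]]:
    """Return the (partition, node) pairs that hold the n replicas of a key.

    The key falls into exactly one partition; the replicas go to that
    partition and the next n - 1 partitions clockwise around the ring.
    """
    # Scale the hash into [0, len(ring)) without floating point.
    first = (key_hash(bucket, key) * len(ring)) >> RING_BITS
    partitions = [(first + i) % len(ring) for i in range(n)]
    return [(p, ring[p]) for p in partitions]


if __name__ == "__main__":
    nodes = [f"riak{i}" for i in range(1, 7)]  # six nodes, as in the case study
    ring = build_ring(nodes)
    print(preference_list(ring, "keywords", "running shoes"))
    # 64 partitions over 6 nodes: four nodes own 11 partitions, two own 10.
    print(Counter(ring))
```

In this model, a key's partition depends only on its hash. Growing the cluster therefore means handing whole partitions to new nodes rather than rehashing every key. That property is what makes this family of stores attractive for a keyword set projected to double. The real Riak claim algorithm also works harder than simple round-robin to keep replicas on distinct physical nodes.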