Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Visualizing AutoTrader Traffic in Near Real-Time with Spark Streaming-(Jon Gregg, AutoTrader)

3,764 views

Published on

Presentation at Spark Summit 2015

Published in: Data & Analytics
  • If you’re struggling with your assignments like me, check out ⇒ www.HelpWriting.net ⇐. My friend sent me a link to to tis site. This awesome company. After I was continuously complaining to my family and friends about the ordeals of student life. They wrote my entire research paper for me, and it turned out brilliantly. I highly recommend this service to anyone in my shoes. ⇒ www.HelpWriting.net ⇐.
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • 16,000 woodworking Projects. Decks, Sheds, Greenhouses, Chairs & Tables, File Cabinets and Much More! #1 Recommended. Download Your Plans NOW! ➜➜➜ https://url.cn/ktFCrsHZ
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • Hello! Get Your Professional Job-Winning Resume Here - Check our website! https://vk.cc/818RFv
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here

Visualizing AutoTrader Traffic in Near Real-Time with Spark Streaming-(Jon Gregg, AutoTrader)

  1. 1. Visualizing 
 Autotrader Traffic Using Spark Streaming Jon Gregg, Cox Automotive
  2. 2. Overview • Cox Automotive and Hadoop • Spark Streaming application • Spark roadmap at Cox Automotive
  3. 3. Why we’re using Hadoop
  4. 4. $45B+ vehicle values sold annually through Manheim AutoTrader.com has 18M unique visitors each month and lists an average of 4M cars monthly Kelley Blue Book provides values for 290M cars annually and has 18M+ unique visitors monthly vAuto has over 2.3M active vehicles in inventory and started 1.65M vehicle appraisals in Oct 265K vehicles sold per month (on average) through a VinSolutions CRM Cox Automotive The Journey
  5. 5. Cox Automotive $45B+ vehicle values sold annually through Manheim AutoTrader.com has 18M unique visitors each month and lists an average of 4M cars monthly Kelley Blue Book provides values for 290M cars annually and has 18M+ unique visitors monthly vAuto has over 2.3M active vehicles in inventory and started 1.65M vehicle appraisals in Oct 265K vehicles sold per month (on average) through a VinSolutions CRM Cox Automotive The Journey • Over 25 companies
 (and growing)
 
 • Facilitate joining data, analyst collaboration • Hadoop cluster, dedicated ingest team
  6. 6. Use Hadoop where it makes sense • Joining data from across several companies • Large amounts of data (Querying and Reporting) • Build out business logic so it’s shareable
  7. 7. That’s all great… … but we also have to showcase what Hadoop can do
  8. 8. Autotrader’s “Autobowl”
  9. 9. Autobowl: Goal Find which Big Game car commercial led to the greatest Autotrader traffic increase, as a proxy for influence on consumers?
  10. 10. Two solutions • Hive on MapReduce
 Mature, supported product
 Shows SQL’s capabilities on Hadoop • Spark
 Started as a POC, no expectations
 How would it work with YARN, Kerberos?
  11. 11. Autobowl Hourly Data Make Model Hour VDPs Searches … Kia Sedona 9pm 300 290 Kia Sedona 10pm 310 320 Kia Sorento 3pm 220 240 Kia Sorento 4pm 210 220 Kia Sorento 5pm 350 380 … +70%!
  12. 12. Comparison: Hive vs. Spark
  13. 13. Hive vs. Spark: Processing Time Hive Spark 1.5 min 18 min Minutes to Process 1hr of Site Activity Data (And a month 
 to spare!)
  14. 14. Spark Streaming for Near Realtime Visualization of Traffic
  15. 15. Autobowl Hourly Data Make Model Hour VDPs Searches … Kia Sedona 9pm 300 290 Kia Sedona 10pm 310 320 Kia Sorento 3pm 220 240 Kia Sorento 4pm 210 220 Kia Sorento 5pm 350 380 … How about a visualization using Spark Streaming?
  16. 16. High-level architecture diagram web server web server web server web server emitter kafka Hadoop (Spark) AWS
  17. 17. Video
  18. 18. (Screenshot from Video)
  19. 19. What’s next?
  20. 20. • Detecting anomalies in Autotrader metrics after a site update Other Visualization use cases
  21. 21. • Detecting anomalies in Autotrader metrics after a site update • Executive dashboards • Visualizations for A/B testing Other Visualization use cases
  22. 22. Gaining BI Adoption
  23. 23. • Most BI users use Hive or point-and-click app • But there’s been a shift - Spark is in use by analyst teams within Autotrader, KBB using Python • Spark used by developers at Autotrader, KBB, Mannheim, NextGear Gaining BI Adoption
  24. 24. • Speed improvements with Dataframes, 
 Kafka integration • “Easier sell” than Java/Scala
 Scripting, visualization+analytics packages • Onboarding users
 Central repository with best practices
 Individual support for BI Spark champions
 Guidelines on setting Spark parameters
 Python: our primary Spark language

×