What Drives the Car Business: Moving from Anecdotes to Data

Published in: Technology, Business

  1. WHAT DRIVES THE CAR BUSINESS: MOVING FROM ANECDOTES TO DATA
  2. WHO WE ARE
     • TrueCar’s mission is to prove that truth and transparency is a more profitable way of doing business, starting with automotive.
     • The TrueCar Platform allows for data to be dissected and transformed into easily digestible and usable purchasing tools for the consumer. A first-time car buyer does not have to be an expert to understand the difference between a bad price, a fair price and a great price.
     • www.TrueCar.com, TRUE.com, NASDAQ: TRUE
  3. ABOUT US
     • JOHN WILLIAMS, SVP PLATFORM OPERATIONS: John Williams is the SVP, Platform Operations of TrueCar. John has over 20 years of experience designing, building and operating large scale Internet infrastructure. John joined TrueCar in March 2011. John is responsible for the technology, security and operations strategy that facilitates explosive growth while still meeting strict requirements for performance, security and reliability. Before joining TrueCar, John was retained as a consultant by numerous world-class technology, financial services, entertainment, military and government organizations. Previously, John was the CTO and co-founder of Preventsys (acquired by McAfee) where he created the world’s first automated security policy compliance system for large enterprise networks. Prior to that he founded and led the network penetration testing team for Internet security pioneer Trusted Information Systems. At the start of his career, John co-founded and built one of New York City’s first Internet Service Providers.
     • RUSSELL FOLTZ-SMITH, VP DATA PLATFORM: Russ is the VP of Data Platform at TrueCar.com, where he creates the intelligence systems driving TrueCar’s innovative interactive product set. Prior to TrueCar, he held executive, product and technical leadership positions at category leaders like IAC, Grind Networks, and Wolfram|Alpha. Russ holds a degree in mathematics from the University of Chicago and currently lives in Marina Del Rey, CA with his wife and two daughters.
  4. OUR CORE SERVICE
     • CONSUMERS: Provide Interactive Transaction Guidance to Consumers via Web, Mobile
     • INDUSTRY: Provide Interactive Transaction Tools to OEMs, Dealers via Web, Mobile
     • Revenue Model: PAY PER SALE
  5. OUR PARTNERS
  6. THE SITUATION: INCREASING DATA APPETITE; GROWING TECH DIVERSITY; MORE PRODUCTS; Data Movement Pressure; Too much time keeping it together; SQL Wizardry =
  7. DATA FLOW (numbers are all approximate)
     • MULTIPLE DATA WAREHOUSES
     • 100s of enrichment processes
     • 1,000+ Inbound Data Feeds
     • 7,500+ Dealers
     • 1,500,000+ TC Dealer Vehicles Tracked Daily
     • 8,000,000+ Industry Wide Vehicles Tracked Daily
     • 400+ Websites Powered
     • 1,000,000+ Cars Sold
     • 20,000,000+ Customers Serviced
     • Industry Leading Analytic Products
     • 250,000,000+ Vehicle Images
     • And More…
     • FEEDBACK LOOPS
  8. WHOLESALE SHIFT NEEDED: It’s not just an economics exercise. WE NEED NEW CAPABILITIES.
  9. FUNDAMENTAL ROLE TRANSFORMATION
     • NOT THIS: SQL but Faster
     • YES, THIS: Data Scientists, Database Developers, Programmers, Analysts → INTELLIGENCE ENGINEERS
  10. FOCUS ON MAKING THINGS
      INTELLIGENCE ENGINEERS should not have to worry about:
      • COMPUTE CYCLES
      • STORAGE
      • SYSTEM SCALE
      • MOVING DATA
      THEY SHOULD BE MAKING SMARTER THINGS
  11. DATA then APPs
      • EXISTING DEVELOPMENT MODEL IS BROKEN & LIMITING: Define app → Create highly tuned DB for specific app → Load specific data
      • NEW MODEL: GET ALL THE DATA YOU CAN → HDFS → Make and Remake apps
  12. PHILOSOPHY
      • DON’Ts: DELETE DATA; MOVE DATA
      • DO’s: LEARN MAP REDUCE WELL; USE NATIVE COMPONENTS; TAKE SHORTCUTS
  13. NO PROOF OF CONCEPTS
      • POCs are: TOO SMALL, TOO SIMPLE, TOO EASY
      • The ONLY WAY TO BUILD the LHC is to BUILD the LHC
  14. OUR DATA EVOLUTION: 12-month execution path for data platform capabilities
      • JUNE ‘13: Initiate Hadoop execution
      • JULY ‘13: Partner with Hortonworks
      • AUG. ‘13: Training & dev begins
      • NOV. ‘13: 60-node, 2 PB production cluster live
      • DEC. ‘13: 3 production apps launch
      • JAN. ‘14: 40% of dev staff proficient
      • FEB. ‘14: 3 more production apps launch
      • MAY ‘14: IPO
      We addressed our data platform capabilities strategically as a precursor to IPO.
  15. OUR SETUP
      • TrueCar Hadoop Cluster: 60 nodes, 2.55 PB usable HDFS, 960 Xeon CPU cores, 7.7 TB RAM, 10GbE networking, 3 racks, HDP 2.1
      • Final price point: $0.23/GB hardware & software/support; $0.003/GB/mo space/power/cooling
  16. SOME OF OUR HADOOP BASED SYSTEMS
      • Vehicle Data Systems
      • Intelligent Image Processing
      • And of course… better BI
  17. EXAMPLE SYSTEM 1: VEHICLE DATA
      The Situation:
      • We keep track of 8,000,000+ new and used vehicles in inventory in the marketplace every day
      • We enrich and use vehicle data to power our market reports, Live Offers, value/pricing systems, industry data products and more
      • Previous non-Hadoop system took 6-24 hours to complete a full processing run
      The Goal with Hadoop:
      • Scale up to allow reprocessing of the 50 years of inventory/vehicle record data available to us
      • Enable attaching additional enrichment data and processing without a massive overhaul (plug and play)
      • Complete a full processing run of daily inbound data in 1 hour, and support speedy one-off/small-batch CRUD operations
  18. EXAMPLE SYSTEM 1: VEHICLE INVENTORY DATA
      1. Dealer Data Feeds
         • Provide daily snapshot of raw vehicle inventory
      2. MapReduce – Data Loader
         • Normalize into a standard record
         • Filter out bad records
         • Validate fields
      3. MapReduce – VIN Decoder
         • Identify trim/options for each vehicle
      4. Hive – Data Enhancer
         • Join against other data sources to enrich the vehicle information
      5. MapReduce – CRUD
         • Decide which entries are new, updated or should be deleted
         • Put entries in a queue for exporting to SQL
      [Diagram: DEALER INVENTORY FEEDS → HADOOP (HDFS → MR – FILTER/VERIFY → MR – VIN DECODE → Hive Enrich → MR – Rabbit/CRUD) → Message Queue → Queue Service → Database]
  19. EXAMPLE SYSTEM 1: VEHICLE DATA VIN DECODER
      • Challenge: Understand EXACTLY what options are on all cars.
      • Inputs: Inventory or transaction data from dealers (HDFS); VIN decode rules (general & make-specific)
      • Pre-staged in memory: Canonical vehicle color data (HDFS); Canonical vehicle trim/style data (HDFS)
      • Mapper: Compute F1 score for matches. Used to compute similarity between inventory and canonical data (http://en.wikipedia.org/wiki/F1_score)
      • Output: Vehicle trim & probability
      • Hadoop Components: Just a MAPPER; Avro format for I/O
  20. EXAMPLE SYSTEM 2: INTELLIGENT IMAGE PROCESSING
      The Situation:
      • 250,000,000+ vehicle images currently under asset management for live data
      • 1,000,000,000+ images have passed through system
      • 1,000,000+ images processed daily (and growing)
      • Original system for processing images could take up to 1 day to fully process all daily images
      The Goal with Hadoop:
      • Scale to storing 1,000,000,000+ images online
      • Allow for advanced image recognition, OCR
      • Process a full run of the latest images in less than 2 hours, and allow for speedy one-off/small-batch real-time CRUD operations
  21. EXAMPLE SYSTEM 2: IMAGE DOWNLOADER
      Pulls Images From Providers into HDFS
      • Downloads multiple images simultaneously
      • Downloads from multiple providers simultaneously
      • Download times scale with cluster size
  22. EXAMPLE SYSTEM 2: IMAGE BUNDLER
      BUNDLES MILLIONS OF DAILY IMAGES INTO SINGLE HDFS FILE
      [Diagram: Hadoop holding Image Bundle May 30, 2014 and Image Bundle May 31, 2014]
      • Uses HIPI (http://hipi.cs.virginia.edu) to store multiple images in an HDFS sequence file
      • Instead of millions of small daily image files (<< block size), have 1 large daily file with all images bundled inside (>> block size)
      • We tag images with metadata, permanently linking images to our vehicle database (e.g., VIN, Make, Model, Model Year, etc.)
  23. EXAMPLE SYSTEM 2: IMAGE PROCESSOR
      PROCESSES IMAGE BUNDLE THROUGH HADOOP
      • Thumbnailing: builds thumbnail library
      • Vehicle Locator: finds vehicle in image
      • Color Decoder: determines vehicle RGB color code (slide example: COCOCO)
      • Orientation: determines image orientation (slide example: Driver Side)
      Implementation notes:
      • Image bundles can be processed through multiple Java MapReduce routines
      • Thumbnailing is done with ImageJ
      • Vehicle locator will be done with OpenCV, using edge detection and shape-based features
      • Average color will be determined from pixel value ratios in the RGB layers of the jpeg
      • Orientation will be determined with shape-based features and gradient algorithms (see Rybski, Huber, Morris, and Hoffman 2010)
  24. EXAMPLE SYSTEM 3: ADVANCED BUSINESS INTELLIGENCE
      The Situation:
      • 8 years of web/app behavior
      • 25,000+ data fields
      • 50,000,000+ configured vehicles
      • 1,000,000+ TrueCar car transactions
      • Previous approaches spread data across 4+ data warehouses, kept only a small portion of the data online and available for query, and required extensive data movement pipelines to integrate
      The Goal with Hadoop:
      • All behavioral data for all time available for analytics
      • Data injected at least once per day, with most coming in near real time
      • Remove worry from analysts and DBAs regarding deletion or offline archive
      • Reduce data warehouses, consolidate analytic tooling
  25. EXAMPLE SYSTEM 3: BI GROWTH
      [Chart: ACCELERATING BI DATA GROWTH; axis in millions, 0 to 1,800]
  26. EXAMPLE SYSTEM 3: MULTI-DIMENSIONAL BI
  27. WAS IT WORTH IT?
      ECONOMIC
      • Storage costs, compute costs: from $19.00/GB to $0.23/GB
      • Elimination of expensive proprietary tools
      FUNCTIONALITY
      • Development effort of complex data applications reduced by 3x
      • Automated Trend Hunting
      • Consolidation of data into immediately computable, searchable infrastructure
      • Unified ETL and Storage system: near zero data movement environment
      • Functional Programming Approach
  28. FUTURE PREVIEW: COMPREHENSIVE DATA REAL TIME MARKET SIMULATION REAL TIME TRANSACTION PROCESSING PRESCRIPTIVE MOBILE REAL TIME TOOLS TOTAL AUTO MARKETPLACE
  29. THANK YOU.
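
The CRUD step on slide 18 (deciding which entries are new, updated or should be deleted) can be pictured as a diff between two daily inventory snapshots. The following is a minimal sketch under stated assumptions, not TrueCar's implementation: keying records by VIN, comparing whole records for equality, the function name `classify_changes`, and the sample data are all hypothetical.

```python
def classify_changes(previous, current):
    """Compare two inventory snapshots keyed by VIN (assumed key choice).

    Returns three sorted lists of VINs: (new, updated, deleted).
    """
    # Present today but not yesterday: a newly listed vehicle.
    new = sorted(vin for vin in current if vin not in previous)
    # Present yesterday but not today: should be deleted downstream.
    deleted = sorted(vin for vin in previous if vin not in current)
    # Present in both snapshots with different field values: an update.
    updated = sorted(
        vin for vin in current
        if vin in previous and current[vin] != previous[vin]
    )
    return new, updated, deleted


# Hypothetical sample snapshots for illustration only.
yesterday = {
    "VIN_A": {"price": 21000, "color": "silver"},
    "VIN_B": {"price": 18500, "color": "red"},
    "VIN_C": {"price": 30250, "color": "black"},
}
today = {
    "VIN_A": {"price": 20500, "color": "silver"},  # price changed
    "VIN_B": {"price": 18500, "color": "red"},     # unchanged
    "VIN_D": {"price": 27900, "color": "white"},   # newly listed
}

new, updated, deleted = classify_changes(yesterday, today)
print(new, updated, deleted)
```

In the deck's pipeline the resulting entries are put on a queue for export to SQL; unchanged records (such as `VIN_B` here) produce no work at all, which is one way a daily run can stay small relative to total inventory.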

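Slide 19 states that the VIN decoder mapper computes an F1 score to measure similarity between inventory data and canonical data. A minimal sketch of that idea follows, assuming each vehicle and each canonical trim is represented as a set of option strings; that representation, the trim names, and the helper names `f1_score` and `best_trim` are assumptions for illustration and do not come from the deck. The score returned here is a similarity, not a calibrated probability.

```python
def f1_score(observed, canonical):
    """F1 similarity between two sets of option labels."""
    overlap = len(observed & canonical)
    if overlap == 0:
        return 0.0
    precision = overlap / len(observed)   # share of observed options that match
    recall = overlap / len(canonical)     # share of canonical options recovered
    return 2 * precision * recall / (precision + recall)


def best_trim(observed, canonical_trims):
    """Return (trim_name, score) for the highest-scoring canonical trim."""
    scored = ((name, f1_score(observed, options))
              for name, options in canonical_trims.items())
    return max(scored, key=lambda pair: pair[1])


# Hypothetical canonical trims and one dealer record.
canonical_trims = {
    "Base": {"cloth", "i4"},
    "Sport": {"sunroof", "v6", "cloth"},
    "Limited": {"sunroof", "leather", "v6", "navigation"},
}
observed = {"sunroof", "leather", "v6"}

trim, score = best_trim(observed, canonical_trims)
print(trim, round(score, 3))
```

Because each record is scored independently against reference data held in memory, this shape of computation fits the map-only job the slide describes.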
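Slide 23 says the vehicle's average color will be determined from pixel values in the RGB layers of the JPEG. The sketch below shows one simple reading of that: a per-channel integer mean formatted as a hex color code. It uses a plain mean rather than the "pixel value ratios" the slide mentions, and it assumes the pixels have already been decoded and cropped to the vehicle; the function name `average_color` and the sample pixels are hypothetical.

```python
def average_color(pixels):
    """Average a list of (r, g, b) tuples into a six-digit hex color code."""
    if not pixels:
        raise ValueError("at least one pixel is required")
    count = len(pixels)
    # Integer mean per channel keeps each value in the 0-255 range.
    red = sum(p[0] for p in pixels) // count
    green = sum(p[1] for p in pixels) // count
    blue = sum(p[2] for p in pixels) // count
    return "{:02X}{:02X}{:02X}".format(red, green, blue)


# Two hypothetical pixels from a light grey vehicle body.
sample = [(200, 190, 180), (184, 194, 204)]
print(average_color(sample))
```

A production version would read pixels from the image bundle inside a MapReduce task; the averaging step itself is independent of how the pixels are obtained.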