2011 x.commerce Innovate Data Alchemy



The New Alchemy: Turning Data into Gold

Developers are leading the charge to turn consumer behavior into profitable solutions. By accessing and analyzing the explosion of data from consumer activities, any developer can create the personalized, relevant products and services that customers demand and merchants urgently need. We will discuss how to acquire, store, and mine information, and how to design analytics-focused software and build data-driven software engines.

Published in: Technology


  1. 1. !!!
  2. 2. Every Second – in over 50,000 Categories
  3. 3. eBay Analytics: >50 TB/day new data; >100k data elements; >100 trillion pairs of information; >150 PB/day processed; >50k chains of logic; >7,500 business users & analysts; structured/unstructured; turning over a TB every second; 24x7x365; always online; millions of queries/day; 99.98+% availability; near-real-time
  4. 4. Big
  5. 5. Detail
  6. 6. Designing for the Unknown: >85% of analytical workload is new and unknown. The metrics you know are cheap. The metrics you don’t know are expensive, but high in potential ROI. Exploration and testing are core pillars of an analytics-driven organization.
  7. 7. DATA. Volume: incremental storage
  8. 8. DATA. Volume: incremental storage. Velocity: processing change
  9. 9. DATA. Volume: incremental storage. Velocity: processing change. Variety: structured, semi-structured, un-structured
  10. 10. Value > Cost: $’s per year in incremental revenue (www.wallpapertimes.com)
  11. 11. Data Growing Faster
  12. 12. Impact
  13. 13. Data questions later; structure later ($0.04/GB, $80/2TB); single HDFS instances >50PB; Value > Cost
  14. 14. Synonyms derived from top queries in item query clusters: texas instruments ba ii plus = / ba ii plus; brighton handbag = brighton purse; lenovo x200 = thinkpad x200; king bedspread = king coverlet; rockabilly dress = swing dress; 1963 ford falcon = 63 falcon; jessica simpson hair extensions = jessica simpson hairdo. Abbreviations/acronyms derived from query transitions: stanford ky = stanford kentucky; dc sub = dc subwoofer; snowboard helmet l = snowboard helmet large; motorcycle cam = motorcycle camera; diamond amp = diamond amplifier.
  15. 15. Toys and Hobbies; ATC > Artist trading card in ART; ATC > Automatic Tool Change in Business and Industrial
  16. 16. Offline Online Clients Editorial Service Search Code Selling Small Data Others… Behavioral Logs Big Data Store Document Data NoSQL Human Judgment; <3 milliseconds per query; 1.2 billion queries per day; 1,000’s of queries per second per machine
  17. 17. German Compound Words • German compound words can be arbitrarily created and extremely long: Adidastrainingsanzug (Adidas track suit); Rindfleischetikettierungsüberwachungsaufgabenübertragungsgesetz (beef labeling regulation & delegation of supervision law). • Syntactically, words can be combined and split in many ways. • Some words shouldn’t be de-compounded: beiden (both) vs. bei (at) + den (the). • Too many candidates for Granitpflastersteine (granite paving stones): Granit (granite) + pflastersteine (cobblestones), or Granit (granite) + pflaster (paving/band-aid) + steine (stones). • Binding characters: Hochzeitsschuhe (grammatically correct, 593 hits on ebay.de) vs. Hochzeitschuhe (129 hits on ebay.de).
  18. 18. From Analyze & Report to Discover & Explore. Structured: SQL; Production Data Warehousing; Large Concurrent User-base; Data Warehouse; Enterprise-class System; 8+PB. Semi-Structured: SQL++; Contextual-Complex Analytics; Deep, Seasonal, Consumable Data Sets; Data Warehouse + Behavioral; Low End Enterprise-class System; 60+PB. Unstructured: Java/C++/Pig/Hive; Structure the Unstructured; Detect Patterns; Hadoop; Commodity Hardware System; 40+PB.
  19. 19. Brian knows the satisfaction and importance of good search results, and his team is responsible for ensuring that the millions of queries entered onto the eBay website provide just that. The words “Did you mean…?” are incredibly meaningful to Brian as he combs through a universe of queries altered by synonyms, acronyms, attributes, and expansions. He’s been doing this sort of work since he joined eBay nine years ago. Brian has loved technology ever since junior high school, when he played the game “Lunar Lander” on a paper teletype before video games existed, and pulled pranks in the local Radio Shack. When Brian gets outside, he goes backpacking on Mount Whitney, enters triathlons, and walks on water (barefoot water skiing).
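
The de-compounding problems listed on slide 17 (several candidate splits, words that should not be split, binding characters) can be illustrated with a small dictionary-based splitter. This is only a sketch under stated assumptions, not the method eBay uses: the lexicon, the do-not-split list, and the function name decompound are hypothetical, and the only binding character handled is an optional "s" between parts.

```python
from functools import lru_cache

# Hypothetical toy lexicon; a real system would need a large dictionary
# and some way to rank the candidates it produces.
LEXICON = {"granit", "pflaster", "steine", "pflastersteine",
           "hochzeit", "schuhe", "bei", "den"}

# Hypothetical exception list: words that look like compounds but are not.
DO_NOT_SPLIT = {"beiden"}

# Assumption: the only binding character is an optional "s" between parts.
BINDERS = ("", "s")


def decompound(word):
    """Return every split of `word` into lexicon entries (lower-cased)."""
    word = word.lower()
    if word in DO_NOT_SPLIT:
        return [[word]]

    @lru_cache(maxsize=None)
    def splits(start):
        results = []
        for end in range(start + 1, len(word) + 1):
            part = word[start:end]
            if part not in LEXICON:
                continue
            if end == len(word):
                # The part reaches the end of the word: a complete split.
                results.append([part])
                continue
            for binder in BINDERS:
                if not word.startswith(binder, end):
                    continue
                nxt = end + len(binder)
                if nxt == len(word):
                    continue  # a binding character cannot end the word
                for rest in splits(nxt):
                    results.append([part] + rest)
        return results

    found = splits(0)
    # Fall back to the unsplit word when no split is found.
    return found if found else [[word]]


for w in ("Granitpflastersteine", "Hochzeitsschuhe", "Hochzeitschuhe", "beiden"):
    print(w, decompound(w))
```

With this toy lexicon, Granitpflastersteine yields two candidate splits, both spellings of Hochzeit(s)schuhe reduce to the same two parts, and beiden is left whole because it is on the exception list. Choosing among candidates (for example by hit counts, as the slide's ebay.de numbers suggest) is left out of the sketch.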