CloudCon Data Mining Presentation

  1. CloudCon, Tuesday, October 2nd, 11am
  2. Every Second – in over thousands of Categories
  3. Value > Cost: $’s per year in incremental revenue (www.wallpapertimes.com)
  4. Big
  5. Detail
  6. DATA: Volume, Variety, Velocity. Incremental storage, processing, change. Structured, semi-structured, un-structured.
  7. Analyze & Report / Discover & Explore
     Structured | Semi-Structured | Unstructured
     SQL | SQL++ | Java/C++/Pig/Hive
     Production Data Warehousing | Contextual-Complex Analytics | Structure the Unstructured
     Large Concurrent User-base | Deep, Seasonal, Consumable Data Sets | Detect Patterns
     Data Warehouse | Data Warehouse + Behavioral | Hadoop
     Enterprise-class System | Low End Enterprise-class System | Commodity Hardware System
     8+PB | 60+PB | 40+PB
  8. Data Growing Faster
  9. Data: questions later, structure later (<$0.04/GB, <$80/2TB). Single HDFS instances >50PB. Value > Cost
  10. Designing for the Unknown
      • >85% of analytical workload is NEW & Unknown
      • The metrics you know are cheap
      • The metrics you don’t know are expensive, but high in potential ROI
      • Exploration & Testing are core pillars of an analytics-driven organization
  11. Impact
  12. Site | Key | Expansion | Top Query | Note
      US | diaries | diary
      US | baggies | baggy
      US | cranberries | cranberry
      US | jogging | jog
      US | fishing sticker | fish stickers
      UK | panels | panelling
      UK | protection | protecter
      UK | lining | lined
      UK | animation | animated
      UK | trucks | trucking
      UK | edging | edges
      UK | nets | netting
  13. Site | Key | Expansion | Top Query | Note
      US | diaries | diary | vampire diaries |
      US | baggies | baggy | patagonia baggies | good for patagonia baggy; not good alone
      US | cranberries | cranberry | the cranberries |
      US | jogging | jog | jogging stroller |
      US | fishing sticker | fish stickers | fishing sticker | sports vs. kids rooms
      UK | panels | panelling | fence panels |
      UK | protection | protecter | mcafee total protection 2012 | screen protecter is top US query
      UK | lining | lined | pink lining changing bag |
      UK | animation | animated | animation cel |
      UK | trucks | trucking | corgi trucks |
      UK | edging | edges | garden edging |
      UK | nets | netting | purse nets |
  14. Value > Cost: $’s per year in incremental revenue (www.wallpapertimes.com)
  15. Toys and Hobbies
      • ATC > Artist trading card in ART
      • ATC > Automatic Tool Change in Business and Industrial
  16. German Compound Words
      • German compound words can be arbitrarily created and extremely long: Adidastrainingsanzug (Adidas track suit), Rindfleischetikettierungsüberwachungsaufgabenübertragungsgesetz (beef labeling regulation & delegation of supervision law)
      • Syntactically, words can be combined and split in many ways.
      • Some words shouldn’t be de-compounded: beiden (both), not bei (at) + den (the)
      • Too many candidates for Granitpflastersteine (granite paving stones): Granit (granite) + pflastersteine (cobblestones), or Granit (granite) + pflaster (paving/band-aid) + steine (stones)
      • Binding characters: Hochzeitsschuhe (grammatically correct, 593 hits on ebay.de) vs. Hochzeitschuhe (129 hits on ebay.de)
  17. Synonyms derived from top queries in item query clusters
      • texas instruments ba ii plus > ba ii plus
      • brighton handbag > brighton purse
      • lenovo x200 > thinkpad x200
      • king bedspread > king coverlet
      • rockabilly dress > swing dress
      • 1963 ford falcon > 63 falcon
      • jessica simpson hair extensions > jessica simpson hairdo
      Abbreviations/acronyms derived from query transitions
      • stanford ky > stanford kentucky
      • dc sub > dc subwoofer
      • snowboard helmet l > snowboard helmet large
      • motorcycle cam > motorcycle camera
      • diamond amp > diamond amplifier
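
The German compound-word slide lists the difficulties (multiple candidate splits, words that must not be split, binding characters) without showing a method. The following is a minimal sketch of one common approach, dictionary-based decompounding, and is not taken from the talk. The lexicon, the do-not-split list, and the function name `split_compound` are hypothetical, and only the binding character "s" is handled.

```python
from typing import List, Set

# Hypothetical toy lexicon; a real system would need a large German dictionary.
LEXICON: Set[str] = {
    "granit", "pflaster", "steine", "pflastersteine",
    "hochzeit", "schuhe", "bei", "den",
}
# Assumed exception list: words whose parts are in the lexicon but must stay whole.
DO_NOT_SPLIT: Set[str] = {"beiden"}
# Binding characters allowed between two parts ("" means no binder).
BINDERS = ("", "s")


def split_compound(word: str) -> List[List[str]]:
    """Return every split of `word` into lexicon words, allowing a binder between parts."""
    word = word.lower()
    if word in DO_NOT_SPLIT:
        return [[word]]
    results: List[List[str]] = []

    def walk(rest: str, parts: List[str]) -> None:
        for end in range(1, len(rest) + 1):
            head, tail = rest[:end], rest[end:]
            if head not in LEXICON:
                continue
            if not tail:
                # The whole remainder is a known word: one complete split found.
                results.append(parts + [head])
                continue
            for binder in BINDERS:
                # Skip the binder (if present) and keep splitting what follows.
                if tail.startswith(binder) and len(tail) > len(binder):
                    walk(tail[len(binder):], parts + [head])

    walk(word, [])
    return results


print(split_compound("Granitpflastersteine"))
print(split_compound("Hochzeitsschuhe"))
print(split_compound("beiden"))
```

The sketch only enumerates candidates. Choosing among them (for example, preferring Granit + pflastersteine over the three-part split) would need extra signals such as query or hit counts, which the slide hints at with the ebay.de hit numbers.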

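The last slide says abbreviations are derived from query transitions but gives only example pairs. As a rough illustration, and not the speaker's actual algorithm, the sketch below treats a transition as an abbreviation candidate when two consecutive queries differ in exactly one token and the shorter token is an in-order subsequence of the longer one with the same first letter. The function names, the matching rule, and the `min_support` threshold are all assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def is_subsequence(short: str, long_: str) -> bool:
    """True if the characters of `short` appear in `long_` in the same order."""
    remaining = iter(long_)
    return all(ch in remaining for ch in short)


def abbreviation_candidate(before: str, after: str) -> Optional[Tuple[str, str]]:
    """Return (abbreviation, expansion) if the two queries differ by one such token."""
    a, b = before.lower().split(), after.lower().split()
    if len(a) != len(b):
        return None
    diffs = [(x, y) for x, y in zip(a, b) if x != y]
    if len(diffs) != 1:
        return None
    short, long_ = diffs[0]
    if len(short) < len(long_) and short[0] == long_[0] and is_subsequence(short, long_):
        return short, long_
    return None


def mine_abbreviations(
    transitions: Iterable[Tuple[str, str]], min_support: int = 2
) -> Dict[Tuple[str, str], int]:
    """Count candidate pairs over observed transitions; keep those seen often enough."""
    counts: Counter = Counter()
    for before, after in transitions:
        pair = abbreviation_candidate(before, after)
        if pair is not None:
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


print(abbreviation_candidate("stanford ky", "stanford kentucky"))
print(abbreviation_candidate("brighton handbag", "brighton purse"))
```

Under this rule the slide's abbreviation pairs (ky, sub, l, cam, amp) qualify, while a synonym pair such as brighton handbag and brighton purse does not, which matches the slide's split between the two lists.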