Sidi Chang
Insight Data Science Data Engineering Fellow
Jul 2016
JustBid
Sealed/blind second price auction
Item
Bidder
• Demo
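A minimal sketch of how a sealed second-price auction could be settled, assuming bids arrive as a mapping from bidder name to amount. The function name `second_price_winner`, the tie-breaking rule, and the single-bidder rule are illustrative assumptions, not taken from JustBid itself.

```python
from typing import Dict, Optional, Tuple


def second_price_winner(bids: Dict[str, float]) -> Optional[Tuple[str, float]]:
    """Return (winner, price) for a sealed-bid second-price auction, or None if no bids."""
    if not bids:
        return None
    # Rank bidders by amount, highest first; ties are broken by name (an assumption).
    ranked = sorted(bids.items(), key=lambda kv: (-kv[1], kv[0]))
    winner, top_bid = ranked[0]
    # The winner pays the second-highest bid. With a single bidder we assume
    # they pay their own bid, since no reserve price is given in the slides.
    price = ranked[1][1] if len(ranked) > 1 else top_bid
    return winner, price


result = second_price_winner({"alice": 120.0, "bob": 95.0, "carol": 110.0})
print(result)
```

Here the highest bidder wins but pays the runner-up's bid, which is the defining property of the second-price format.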
Data Pipeline
Simulated Data
Data
• 10K bidders
• Nearly 15 million bids
Recommendation—Jaccard Similarity
Jaccard Similarity:
D_i = user_i
C_i = items(user_i)
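With the definitions above, the standard Jaccard similarity between users i and j is J(D_i, D_j) = |C_i ∩ C_j| / |C_i ∪ C_j|. A minimal sketch, assuming each user's items are held in a Python set; the item names and the value returned for two empty sets are illustrative assumptions.

```python
from typing import Hashable, Set


def jaccard(items_a: Set[Hashable], items_b: Set[Hashable]) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two item sets."""
    union = items_a | items_b
    if not union:
        return 0.0  # assumed convention when both users have no items
    return len(items_a & items_b) / len(union)


# Hypothetical item sets C_1 and C_2 for two users.
c1 = {"watch", "camera", "bike"}
c2 = {"camera", "bike", "guitar", "lamp"}
sim = jaccard(c1, c2)  # 2 shared items out of 5 distinct items
print(sim)
```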
Recommendation
For N = 10 million, it takes more than a year (AWS m4.large cluster)…
We will then need the MinHash algorithm, which can be easily distributed…
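A rough sketch of why exact all-pairs comparison does not scale, assuming N counts users and every pair is compared once. The throughput of one million comparisons per second is a purely hypothetical figure chosen for illustration, not a measurement from the cluster.

```python
def pair_count(n: int) -> int:
    """Number of unordered pairs among n users: n * (n - 1) / 2."""
    return n * (n - 1) // 2


n_users = 10_000_000
pairs = pair_count(n_users)

# Hypothetical throughput; the slides do not state the real per-pair cost.
comparisons_per_second = 1_000_000
seconds = pairs / comparisons_per_second
years = seconds / (365 * 24 * 3600)

print(f"{pairs:.3e} user pairs, about {years:.1f} years at the assumed rate")
```

The pair count grows quadratically in N, which is what motivates replacing exact comparison with short MinHash signatures.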
Do an unbiased estimation by Chernoff bounds and Markov's inequality:
The expected error is
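For reference, the textbook result for a MinHash estimator built from k hash functions is sketched below. It assumes the hash functions are independent and behave like random permutations; the symbols k, m, and ε are introduced here and may differ from the notation on the original slide.

```latex
% Estimator: fraction of the k hash functions whose minimum agrees for users i and j.
\hat{J} = \frac{1}{k} \sum_{m=1}^{k}
  \mathbf{1}\!\left[\min_{x \in C_i} h_m(x) = \min_{x \in C_j} h_m(x)\right],
\qquad
\mathbb{E}[\hat{J}] = J(D_i, D_j)

% Each indicator is Bernoulli with parameter J, so
\operatorname{Var}[\hat{J}] = \frac{J(1 - J)}{k} \le \frac{1}{4k},
\qquad
\mathbb{E}\,\bigl|\hat{J} - J\bigr| \le \sqrt{\operatorname{Var}[\hat{J}]} \le \frac{1}{2\sqrt{k}}

% Chernoff-Hoeffding tail bound for the average of k bounded indicators.
\Pr\!\left(\bigl|\hat{J} - J\bigr| \ge \varepsilon\right) \le 2 \exp\!\left(-2 k \varepsilon^{2}\right)
```

Under these assumptions the error shrinks as O(1/√k), so quadrupling the number of hash functions roughly halves it.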
MinHash Example
Item  Row  User 1  User 2  User 3  User 4  (x+1) mod 5  (3x+1) mod 5
1     0    1       0       0       1       1            1
2     1    0       0       1       0       2            4
3     2    0       1       0       1       3            2
4     3    1       0       1       1       4            0
5     4    0       0       1       0       0            3

Signature matrix
        U1  U2  U3  U4
Hash 1  1   3   0   1
Hash 2  0   2   0   0
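The signature values in the example above can be reproduced with a short sketch. It assumes each user is represented by the set of row indices where their column holds a 1, and that every user has at least one item; the helper names are illustrative.

```python
from typing import Callable, Dict, List, Set

# The two hash functions from the example, applied to the row index x.
hash_funcs: List[Callable[[int], int]] = [
    lambda x: (x + 1) % 5,
    lambda x: (3 * x + 1) % 5,
]

# Rows (0-based) where each user's column contains a 1.
users: Dict[str, Set[int]] = {
    "U1": {0, 3},
    "U2": {2},
    "U3": {1, 3, 4},
    "U4": {0, 2, 3},
}


def minhash_signature(rows: Set[int]) -> List[int]:
    """For each hash function, keep the minimum hash value over the user's rows."""
    return [min(h(r) for r in rows) for h in hash_funcs]


def estimated_similarity(sig_a: List[int], sig_b: List[int]) -> float:
    """Fraction of signature positions on which two users agree."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


signatures = {user: minhash_signature(rows) for user, rows in users.items()}
print(signatures)
print(estimated_similarity(signatures["U1"], signatures["U4"]))
```

With only two hash functions the estimate is coarse: U1 and U4 share identical signatures (estimate 1.0) although their exact Jaccard similarity is 2/3.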
Performance
Challenges
• MinHash algorithm implemented in a distributed system
• Jaccard similarity tested in a distributed system
• Use the right data structures for faster computation
• Use both Scala and Python
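One reason MinHash distributes well is that taking a minimum is associative, so partial signatures computed on separate partitions can be merged with an element-wise minimum. The sketch below illustrates that idea in plain Python on the small example data; the partition layout and function names are assumptions, and this is not the implementation used in the project.

```python
from functools import reduce
from typing import Dict, Iterable, List, Tuple

HASHES = [lambda x: (x + 1) % 5, lambda x: (3 * x + 1) % 5]
Signature = List[int]


def partial_signatures(pairs: Iterable[Tuple[str, int]]) -> Dict[str, Signature]:
    """Map step: signature per user from the (user, row) pairs in one partition."""
    out: Dict[str, Signature] = {}
    for user, row in pairs:
        hashed = [h(row) for h in HASHES]
        if user in out:
            out[user] = [min(a, b) for a, b in zip(out[user], hashed)]
        else:
            out[user] = hashed
    return out


def merge(left: Dict[str, Signature], right: Dict[str, Signature]) -> Dict[str, Signature]:
    """Reduce step: combine two partial results with an element-wise minimum."""
    merged = dict(left)
    for user, sig in right.items():
        if user in merged:
            merged[user] = [min(a, b) for a, b in zip(merged[user], sig)]
        else:
            merged[user] = sig
    return merged


# Hypothetical split of the example's (user, row) pairs across two partitions.
partitions = [
    [("U1", 0), ("U4", 0), ("U3", 1), ("U2", 2)],
    [("U4", 2), ("U1", 3), ("U3", 3), ("U4", 3), ("U3", 4)],
]
combined = reduce(merge, (partial_signatures(p) for p in partitions))
print(combined)
```

The merged result matches the signatures computed on the full data, regardless of how the pairs are split across partitions.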
About me
• MS in CS and Operations Research

Sidi chang demo