Sidi Chang demo
1.
Sidi Chang
Insight Data Science, Data Engineering Fellow, Jul 2016
JustBid
2.
Sealed-bid (blind) second-price auction
[Diagram: item and bidders]
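In a sealed-bid second-price (Vickrey) auction, each bidder submits one hidden bid; the highest bidder wins but pays the second-highest bid. The sketch below resolves a single auction under that rule. It is not JustBid's code: the function name `resolve_second_price`, the optional reserve price, and the tie-break (smallest bidder id wins) are hypothetical choices made for illustration.

```python
from typing import Dict, Optional, Tuple


def resolve_second_price(
    bids: Dict[str, float], reserve: float = 0.0
) -> Optional[Tuple[str, float]]:
    """Resolve one sealed-bid second-price auction.

    Returns (winner, price_paid), or None if no bid meets the reserve.
    Hypothetical helper: the reserve price and the tie-break (smallest
    bidder id wins) are illustrative assumptions.
    """
    # Bids below the reserve are ignored.
    eligible = {bidder: amount for bidder, amount in bids.items() if amount >= reserve}
    if not eligible:
        return None
    # Highest bid first; equal bids ordered by bidder id for determinism.
    ranked = sorted(eligible.items(), key=lambda kv: (-kv[1], kv[0]))
    winner = ranked[0][0]
    # The winner pays the second-highest eligible bid, or the reserve if alone.
    price = ranked[1][1] if len(ranked) > 1 else reserve
    return winner, price


if __name__ == "__main__":
    print(resolve_second_price({"alice": 120.0, "bob": 95.0, "carol": 110.0}))
    # ('alice', 110.0)
```

Because the price does not depend on the winner's own bid, bidding one's true value is a weakly dominant strategy, which is the usual argument for this auction format.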
3.
• Demo
4.
Data Pipeline
[Pipeline diagram; input: simulated data]
5.
Data
• 10K bidders
• Nearly 15 million bids
6.
Recommendation: Jaccard Similarity
Jaccard similarity between users i and j, where D_i = user_i and C_i = items(user_i):
$$J(D_i, D_j) = \frac{|C_i \cap C_j|}{|C_i \cup C_j|}$$
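Jaccard similarity between two users is |C_i ∩ C_j| / |C_i ∪ C_j|: the items both have bid on, divided by the items either has bid on. A minimal sketch of exact Jaccard plus a simple user-based recommender follows. The scoring rule (sum the similarities of the `top_k` most similar users for each item the target user has not bid on) and the names `jaccard` and `recommend` are illustrative assumptions, not the deck's stated method. The toy data reuses users U1 to U4 and items 1 to 5 from the deck's MinHash example.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """|A ∩ B| / |A ∪ B|, taken as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(user: str, items_by_user: Dict[str, Set[str]], top_k: int = 2) -> List[str]:
    """Hypothetical user-based recommender built on exact Jaccard similarity.

    Unseen items are scored by the summed similarity of the top_k most
    similar other users; this scoring rule is an illustrative assumption.
    """
    mine = items_by_user[user]
    # Exact comparison against every other user: O(N) here, O(N^2) for all users.
    neighbours = sorted(
        ((jaccard(mine, items), other) for other, items in items_by_user.items() if other != user),
        reverse=True,
    )[:top_k]
    scores: Dict[str, float] = {}
    for sim, other in neighbours:
        if sim == 0.0:
            continue  # users with no overlap contribute nothing
        for item in items_by_user[other] - mine:
            scores[item] = scores.get(item, 0.0) + sim
    # Highest score first; ties broken by item id.
    return sorted(scores, key=lambda item: (-scores[item], item))


if __name__ == "__main__":
    # Users U1..U4 and items 1..5 from the deck's MinHash example.
    data = {"U1": {"1", "4"}, "U2": {"3"}, "U3": {"2", "4", "5"}, "U4": {"1", "3", "4"}}
    print(jaccard(data["U1"], data["U4"]))  # 0.666...
    print(recommend("U1", data))  # ['3', '2', '5']
```

The all-pairs comparison is what becomes infeasible at scale: every user must be compared with every other user.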
7.
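Recommendation
For N = 10 million users, computing exact Jaccard similarity for all pairs takes more than a year (on an AWS m4.large cluster). We therefore need the MinHash algorithm, which is easy to distribute. MinHash gives an unbiased estimate of Jaccard similarity; by Chernoff bounds and Markov's inequality, the expected error with k hash functions is $O(1/\sqrt{k})$.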
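A short worked derivation of the two quantitative claims on this slide. It assumes N counts users and that each of the k hash functions behaves like an independent random permutation of the items; k, $h_{\min}$, and the estimator $\hat{J}$ are notation introduced here, not taken from the deck. With cheap hash functions such as $(ax + b) \bmod p$ the collision probability is only approximately equal to J. The last line is Hoeffding's inequality, a Chernoff-type bound, shown as one standard way to make "expected error" precise; it may not be the exact bound used in the talk.

$$\binom{N}{2} = \frac{N(N-1)}{2} \approx 5 \times 10^{13} \quad \text{pairs for } N = 10^{7}$$

$$h_{\min}(C) = \min_{x \in C} h(x), \qquad \Pr\left[h_{\min}(C_i) = h_{\min}(C_j)\right] = \frac{|C_i \cap C_j|}{|C_i \cup C_j|} = J(D_i, D_j)$$

$$\hat{J} = \frac{1}{k} \sum_{t=1}^{k} \mathbf{1}\left[h^{(t)}_{\min}(C_i) = h^{(t)}_{\min}(C_j)\right], \qquad \mathbb{E}[\hat{J}] = J(D_i, D_j)$$

$$\operatorname{Var}(\hat{J}) = \frac{J(1 - J)}{k} \le \frac{1}{4k} \quad\Longrightarrow\quad \sqrt{\operatorname{Var}(\hat{J})} \le \frac{1}{2\sqrt{k}} = O\!\left(\frac{1}{\sqrt{k}}\right)$$

$$\Pr\left[\,|\hat{J} - J| \ge \varepsilon\,\right] \le 2\exp\!\left(-2k\varepsilon^{2}\right)$$

Here J abbreviates $J(D_i, D_j)$. Doubling accuracy therefore costs four times as many hash functions, but the cost no longer grows with the size of each user's item set.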
8.
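MinHash Example
Item  Row x  User 1  User 2  User 3  User 4  x+1 mod 5  3x+1 mod 5
1     0      1       0       0       1       1          1
2     1      0       0       1       0       2          4
3     2      0       1       0       1       3          2
4     3      1       0       1       1       4          0
5     4      0       0       1       0       0          3
Hash 1 is $h_1(x) = (x + 1) \bmod 5$ and Hash 2 is $h_2(x) = (3x + 1) \bmod 5$, applied to the row index x.
Signature matrix (rows Hash 1 and Hash 2, columns U1 to U4): empty at this step.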
9.
MinHash Example (animation step; same table as slide 8, signature matrix still empty)
10.
MinHash Example (animation step; same table as slide 8, signature matrix still empty)
11.
MinHash Example (animation step; same table as slide 8)
The signature matrix now shows its first entry, 1, in the Hash 1 row.
12.
MinHash Example (same table as slide 8), final signature matrix:
        U1  U2  U3  U4
Hash 1  1   3   0   1
Hash 2  0   2   0   0
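The row-by-row MinHash procedure walked through in slides 8 to 12 can be sketched as follows: scan each row of the 0/1 user-item matrix, hash the row index with every hash function, and for each user who has that item keep the smaller value. Using the deck's matrix and its two hash functions, the sketch reproduces the final signature matrix. The function names are hypothetical; the matrix and hash functions come from the slides.

```python
from typing import Callable, List, Sequence

INF = float("inf")


def minhash_signatures(
    matrix: Sequence[Sequence[int]],
    hash_funcs: Sequence[Callable[[int], int]],
) -> List[List[float]]:
    """Row-by-row MinHash over a 0/1 characteristic matrix.

    matrix[r][c] == 1 means user c has item (row) r. Returns sig[h][c], the
    smallest hash value h(r) over rows r where column c has a 1.
    """
    n_cols = len(matrix[0])
    sig = [[INF] * n_cols for _ in hash_funcs]
    for r, row in enumerate(matrix):
        hashed = [h(r) for h in hash_funcs]  # hash the row index once per function
        for c, bit in enumerate(row):
            if bit:
                for i, hv in enumerate(hashed):
                    if hv < sig[i][c]:
                        sig[i][c] = hv
    return sig


def estimate_jaccard(sig: List[List[float]], a: int, b: int) -> float:
    """Fraction of hash functions on which columns a and b agree."""
    return sum(row[a] == row[b] for row in sig) / len(sig)


# Characteristic matrix from the example: rows 0-4 (items 1-5), columns U1-U4.
MATRIX = [
    [1, 0, 0, 1],
    [0, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
]
HASHES = [lambda x: (x + 1) % 5, lambda x: (3 * x + 1) % 5]

if __name__ == "__main__":
    sig = minhash_signatures(MATRIX, HASHES)
    print(sig)  # [[1, 3, 0, 1], [0, 2, 0, 0]]
    print(estimate_jaccard(sig, 0, 3))  # U1 vs U4: 1.0 (exact Jaccard is 2/3)
```

With only two hash functions the estimate is coarse (1.0 against an exact 2/3 for U1 and U4); more hash functions tighten it, at the cost of a longer signature per user.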
13.
Performance
14.
Challenges
• Implementing the MinHash algorithm in a distributed system
• Testing Jaccard similarity in a distributed system
• Using the right data structures to speed up computation
• Using both Scala and Python
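One reason MinHash distributes well: the minimum over a union equals the minimum of the per-part minima, so each worker can compute partial signatures over its own slice of the bid log, and a reducer merges them with an element-wise min. The sketch below simulates that map/reduce step in plain Python. The `(user, item_id)` record format, the hash family parameters in `PARAMS`, and the function names are hypothetical; the deck's actual Scala/Python implementation is not shown in the source.

```python
from functools import reduce
from typing import Dict, Iterable, List, Tuple

# Hypothetical hash family h_t(x) = (a_t * x + b_t) mod P over integer item ids.
P = 2_147_483_647  # a Mersenne prime, larger than any item id used here
PARAMS: List[Tuple[int, int]] = [(3, 1), (7, 11), (13, 5), (17, 29)]

Signature = List[int]


def partial_signatures(records: Iterable[Tuple[str, int]]) -> Dict[str, Signature]:
    """Map step: one partition's (user, item_id) bids -> per-user partial signatures."""
    out: Dict[str, Signature] = {}
    for user, item in records:
        hashed = [(a * item + b) % P for a, b in PARAMS]
        sig = out.setdefault(user, [P] * len(PARAMS))  # P exceeds every hash value
        for t, hv in enumerate(hashed):
            if hv < sig[t]:
                sig[t] = hv
    return out


def merge(x: Dict[str, Signature], y: Dict[str, Signature]) -> Dict[str, Signature]:
    """Reduce step: element-wise min, since min over a union = min of the minima."""
    merged = {user: list(sig) for user, sig in x.items()}
    for user, sig in y.items():
        if user in merged:
            merged[user] = [min(p, q) for p, q in zip(merged[user], sig)]
        else:
            merged[user] = list(sig)
    return merged


if __name__ == "__main__":
    bids = [("u1", 1), ("u2", 3), ("u1", 4), ("u3", 2), ("u3", 4),
            ("u4", 1), ("u4", 3), ("u4", 4), ("u3", 5)]
    partitions = [bids[:4], bids[4:]]  # pretend these live on two workers
    distributed = reduce(merge, map(partial_signatures, partitions))
    assert distributed == partial_signatures(bids)  # same result as one machine
```

Because the merge is associative and commutative, the same logic fits a `reduceByKey`-style aggregation in a cluster framework, and the result does not depend on how bids are partitioned.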
15.
About me
• MS in CS and Operations Research