Overview of Research in WG IV: Representative Points for Small-Data and Big-Data Problems
V. Roshan Joseph and Simon Mak
Supported by NSF DMS 1712642
Big-Data Problems
• Reduce big data to reduce future computational cost
[Diagram: Big data → representative points]
Small-Data Problems
• Obtain expensive small data at minimum cost
[Diagram: Expensive data-generating mechanism → representative points]
Big Data: An Example
Kernel Ridge Regression
• Inputs: 90 song features (e.g., loudness, pitch, timbre)
• Output: song release date (e.g., 2007)
• Data: N = 515,345 songs (https://archive.ics.uci.edu/ml/datasets/YearPredictionMSD)
• Computation: $O(N^3)$; storage: $O(N^2)$
• Mak and Joseph (2017)
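The sketch below makes the cost figures concrete. It is a minimal pure-Python implementation of kernel ridge regression, written under our own assumptions; it is not the authors' code. The Gaussian kernel, the `lengthscale` and `lam` defaults, and all function names are hypothetical choices. Building the N × N kernel matrix is the $O(N^2)$ storage step, and the Cholesky factorization is the $O(N^3)$ computation step.

```python
import math


def gaussian_kernel(x, y, lengthscale=1.0):
    """Squared-exponential kernel exp(-||x - y||^2 / (2 * lengthscale^2))."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * lengthscale ** 2))


def cholesky(a):
    """Return lower-triangular L with a = L L^T (O(N^3) operations)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def cholesky_solve(L, b):
    """Solve (L L^T) x = b by forward substitution, then back substitution."""
    n = len(L)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def krr_fit(X, y, lam=1e-6, lengthscale=1.0):
    """Return weights alpha solving (K + lam * I) alpha = y."""
    n = len(X)
    # Dense N x N kernel matrix: O(N^2) memory.
    K = [[gaussian_kernel(X[i], X[j], lengthscale) for j in range(n)] for i in range(n)]
    for i in range(n):
        K[i][i] += lam
    # The Cholesky factorization dominates: O(N^3) time.
    return cholesky_solve(cholesky(K), y)


def krr_predict(X, alpha, x_new, lengthscale=1.0):
    """Prediction sum_i alpha_i * k(x_i, x_new)."""
    return sum(a * gaussian_kernel(xi, x_new, lengthscale) for a, xi in zip(alpha, X))


def dense_kernel_bytes(n, bytes_per_entry=8):
    """Memory needed for a dense n x n matrix of 64-bit floats."""
    return n * n * bytes_per_entry
```

Consider the song data. `dense_kernel_bytes(515345)` is 515,345² × 8 ≈ 2.1 × 10¹² bytes, about 2 TB, before any factorization even starts. That is why reducing N first is attractive.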
How to Reduce Big Data?
• Stratified sampling (Dalenius 1950; Cox 1957)
• Principal points (Flury 1990)
• Quantizers (Lloyd 1957; Max 1960)
• MSE-rep points (Fang and Wang 1994)
– K-means clustering
– Cannot produce a reduced point set that retains the original distribution → not a “true” representative point set!
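K-means clustering (Lloyd's algorithm) computes the MSE-rep points listed above, so a minimal sketch shows what these methods actually optimize. This is a generic pure-Python illustration. The function name `lloyd_kmeans`, the random initialization, and the stopping rule are our own assumptions. Each center becomes the mean of its cell, so the centers minimize squared quantization error rather than matching the data distribution. A standard result from quantization theory makes this precise: asymptotically, MSE-optimal points in $d$ dimensions have density proportional to $f^{d/(d+2)}$ rather than $f$. That gap is one way to see why these point sets are not “true” representative points.

```python
import math
import random


def lloyd_kmeans(data, k, max_iter=100, seed=0):
    """Lloyd's algorithm: alternate nearest-center assignment and mean update."""
    rng = random.Random(seed)
    dim = len(data[0])
    centers = [list(p) for p in rng.sample(data, k)]  # start at k random data points
    for _ in range(max_iter):
        # Assignment step: each point joins the cell of its nearest center.
        cells = [[] for _ in range(k)]
        for p in data:
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            cells[nearest].append(p)
        # Update step: move each center to the mean of its cell.
        new_centers = []
        for c, members in enumerate(cells):
            if members:
                new_centers.append([sum(m[t] for m in members) / len(members) for t in range(dim)])
            else:
                new_centers.append(centers[c])  # leave an empty cell's center in place
        if new_centers == centers:
            break  # centers stopped moving
        centers = new_centers
    return centers
```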
Support Points
• Can be optimized efficiently as a difference-of-convex program.
• Mak and Joseph (2018)
Support Points (continued)
Reducing Big Data
Example: $N_2(0,1)$ (bivariate standard normal)
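The slides say support points can be optimized as a difference-of-convex program, but they do not spell out the iteration. The following is a hedged pure-Python sketch for a bivariate standard normal sample. It assumes the convex-concave (majorization-minimization) fixed-point update we understand Mak and Joseph (2018) to use. In that update, each point moves to a distance-weighted average of the data (attraction) plus a repulsion term from the other points. The function names, the jittered random start, and the iteration count are our own choices; this is not the authors' implementation. `energy_distance` is a hypothetical helper that computes the sample energy distance, the quantity support points minimize.

```python
import math
import random


def energy_distance(points, data):
    """Sample energy distance between two point sets (Euclidean norm)."""
    n, N = len(points), len(data)
    cross = sum(math.dist(x, y) for x in points for y in data) / (n * N)
    within_x = sum(math.dist(a, b) for a in points for b in points) / (n * n)
    within_y = sum(math.dist(a, b) for a in data for b in data) / (N * N)
    return 2.0 * cross - within_x - within_y


def support_points(data, n, iters=30, seed=0):
    """Reduce `data` to n points with the assumed fixed-point update
    x_i <- [sum_m y_m/||x_i - y_m|| + (N/n) sum_{j != i} (x_i - x_j)/||x_i - x_j||] / q_i,
    where q_i = sum_m 1/||x_i - y_m||. All points are updated simultaneously."""
    rng = random.Random(seed)
    N, dim = len(data), len(data[0])
    # Start from a jittered random subsample so no point coincides with a data point.
    pts = [[v + rng.gauss(0.0, 1e-3) for v in p] for p in rng.sample(data, n)]
    for _ in range(iters):
        new_pts = []
        for i, xi in enumerate(pts):
            num = [0.0] * dim
            q = 0.0
            for y in data:  # attraction toward the data
                d = math.dist(xi, y)
                if d > 0.0:
                    q += 1.0 / d
                    for t in range(dim):
                        num[t] += y[t] / d
            for j, xj in enumerate(pts):  # repulsion from the other points
                d = math.dist(xi, xj)
                if j != i and d > 0.0:
                    for t in range(dim):
                        num[t] += (N / n) * (xi[t] - xj[t]) / d
            new_pts.append([v / q for v in num])
        pts = new_pts
    return pts
```

For example, draw a few hundred $N_2(0,1)$ points with `random.gauss`, reduce them to 20 support points, and compare `energy_distance` against a random subsample of the same size. The support points should give a noticeably smaller value.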
Small Data: An Example
[Figure panels: Physical experiment; FEM experiment]
• Friction drilling (Miller and Shih 2007)
Model Calibration
Bayesian Model
• One evaluation of the unnormalized posterior takes about 15.4 seconds on a 3.2 GHz laptop ⇒ 10,000 MCMC samples would take about 43 hours!
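The 43-hour estimate follows directly from the per-evaluation cost, assuming one unnormalized-posterior evaluation per MCMC sample:

```latex
\[
10{,}000 \times 15.4\ \text{s} = 154{,}000\ \text{s}
= \frac{154{,}000}{3600}\ \text{h} \approx 42.8\ \text{h}.
\]
```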
Minimum Energy Design
• Joseph, Dasgupta, Tuo, and Wu (2015)
• Posterior density: $f(x) = \frac{1}{C}\,h(x)$
• Normalizing constant C is not needed!
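The MED charge function shows why C drops out. As we understand the MED papers, the charge at $x$ is $q(x) = 1/f(x)^{1/(2p)}$ for a $p$-dimensional input. Writing $f = h/C$ gives $q(x) = C^{1/(2p)}\,h(x)^{-1/(2p)}$. Every pairwise energy term $q(x_i)q(x_j)/d(x_i,x_j)$ is therefore multiplied by the same constant $C^{1/p}$, and the minimizing design does not change.

The sketch below is a simplified greedy MED over a finite candidate set, using the criterion $\sum_i (q(x)q(x_i)/d(x,x_i))^k$. The candidate set, the default `k = 4`, the rule of starting at the highest-$h$ candidate, and the name `med_greedy` are our assumptions; this is not the published algorithm.

```python
import math


def med_greedy(candidates, h, n_points, k=4):
    """Greedy minimum energy design over a finite candidate set.

    h is the *unnormalized* density and must be positive on the candidates;
    n_points should not exceed len(candidates). The charge
    q(x) = h(x)^(-1/(2p)) differs from f(x)^(-1/(2p)) only by the constant
    factor C^(-1/(2p)), which rescales every energy term equally.
    """
    p = len(candidates[0])  # input dimension

    def charge(x):
        return h(x) ** (-1.0 / (2 * p))

    design = [max(candidates, key=h)]  # start at the highest-density candidate

    def energy(x):
        # Sum over already-selected points of (q(x) q(x_i) / d(x, x_i))^k.
        qx = charge(x)
        total = 0.0
        for xi in design:
            d = math.dist(x, xi)
            if d == 0.0:
                return math.inf  # never select the same point twice
            total += (qx * charge(xi) / d) ** k
        return total

    while len(design) < n_points:
        design.append(min(candidates, key=energy))
    return design
```

As a check, passing `h` or `lambda x: 10.0 * h(x)` should select the same points, because the scale factor plays the role of an unknown $1/C$.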
• Number of evaluations: 654 (Joseph, Wang, Gu, Lv, and Tuo 2017)
Marginal Distribution
MED+MCMC
• Approximate the log-unnormalized posterior with a Gaussian process, then run MCMC on that cheap approximation
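Here is a toy 1-D version of the MED+MCMC idea. It fits a Gaussian process interpolator to a handful of $\log h$ evaluations, then runs random-walk Metropolis on the cheap surrogate instead of the expensive posterior. The sketch rests on our own assumptions: a zero-mean GP with a squared-exponential kernel, a fixed lengthscale, and a small nugget. Proposals outside a design box are rejected so that the zero-mean surrogate cannot pull the chain into regions with no data. The function names are hypothetical, and this is not the authors' implementation. Metropolis only uses differences of $\log h$, so the unknown C is not needed here either.

```python
import math
import random


def se_kernel(a, b, lengthscale=1.0):
    return math.exp(-((a - b) ** 2) / (2.0 * lengthscale ** 2))


def fit_gp_mean(xs, ys, lengthscale=1.0, nugget=1e-8):
    """Return the posterior-mean function of a zero-mean GP fitted to (xs, ys)."""
    n = len(xs)
    K = [[se_kernel(xs[i], xs[j], lengthscale) + (nugget if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    # Cholesky factorization K = L L^T.
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][m] * L[j][m] for m in range(j))
            L[i][j] = math.sqrt(K[i][i] - s) if i == j else (K[i][j] - s) / L[j][j]
    # Solve K w = ys by forward and back substitution.
    z = [0.0] * n
    for i in range(n):
        z[i] = (ys[i] - sum(L[i][m] * z[m] for m in range(i))) / L[i][i]
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (z[i] - sum(L[m][i] * w[m] for m in range(i + 1, n))) / L[i][i]
    return lambda x: sum(wi * se_kernel(xi, x, lengthscale) for wi, xi in zip(w, xs))


def metropolis(log_target, x0, n_steps, step=1.0, bounds=(-math.inf, math.inf), seed=0):
    """Random-walk Metropolis; only differences of log_target are used, so C cancels."""
    rng = random.Random(seed)
    lo, hi = bounds
    x, lx = x0, log_target(x0)
    samples = []
    for _ in range(n_steps):
        prop = x + rng.gauss(0.0, step)
        if lo <= prop <= hi:  # target treated as zero outside the design region
            lp = log_target(prop)
            if rng.random() < math.exp(min(0.0, lp - lx)):
                x, lx = prop, lp
        samples.append(x)
    return samples
```

In the setting of the slides, `xs` and `ys` would be the MED design points and their expensive log-unnormalized-posterior values. A cheap stand-in such as `ys = [-0.5 * x * x for x in xs]` is enough to exercise the code.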
Support Points
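$$\min_{x_1,\ldots,x_n}\; \frac{2}{nC}\sum_{i=1}^{n}\int \lVert x_i - x\rVert\, h(x)\,dx \;-\; \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n}\lVert x_i - x_j\rVert$$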
• Normalizing constant C doesn’t factor out!
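A toy calculation (our own illustration, not from the slides) shows why this matters for the support-points objective on this slide. Take one dimension, $n = 2$, and the symmetric design $x_1 = -t$, $x_2 = t$. Let the target be a standard normal $\phi$ known only up to scale, $h = s\,\phi$ with unknown $s > 0$, so the true constant is $C = s$. Suppose we plug in the guess $C = 1$. Using $\frac{d}{dt}\,\mathbb{E}\lvert t - Y\rvert = 2\Phi(t) - 1$ for $Y \sim N(0,1)$, the objective and its stationarity condition are:

```latex
\[
g_s(t) = 2s\,\mathbb{E}\lvert t - Y\rvert - t,
\qquad
g_s'(t) = 2s\,\bigl(2\Phi(t) - 1\bigr) - 1 = 0
\;\Longrightarrow\;
\Phi(t^\ast) = \frac{1}{2} + \frac{1}{4s}.
\]
```

Since $g_s''(t) = 4s\,\phi(t) > 0$, $t^\ast$ is the unique minimizer. With the correct constant ($s = 1$), $t^\ast = \Phi^{-1}(0.75) \approx 0.674$; with $s = 2$, $t^\ast = \Phi^{-1}(0.625) \approx 0.319$. A wrong C therefore moves the support points, whereas in MED it only rescales every energy term by the same factor.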
Research Questions
• Can we do fast Gaussian process approximation with big data?
• Can we adaptively estimate the normalizing constant?
– Simon’s talk!
Thanks
• Lulu Kang
• Lester Mackey
• Fred Hickernell
• Mac Hyman
• Scott Schmidler
• Joe Marion
• Raaz Dwivedi
• Kan Zhang
• Cheng Cheng
• Matthias Sachs
References
Support points
• Mak, S. and Joseph, V. R. (2018). “Support Points.” Annals of Statistics, to appear. https://arxiv.org/abs/1609.01811
• Mak, S. and Joseph, V. R. (2017). “Projected Support Points: A New Method for High-Dimensional Data Reduction.” Under review. https://arxiv.org/abs/1708.06897
Minimum energy designs
• Joseph, V. R., Dasgupta, T., Tuo, R., and Wu, C. F. J. (2015). “Sequential Exploration of Complex Surfaces Using Minimum Energy Designs.” Technometrics, 57, 64–74.
• Joseph, V. R., Wang, D., Gu, L., Lv, S., and Tuo, R. (2017). “Deterministic Sampling of Expensive Posteriors Using Minimum Energy Designs.” https://arxiv.org/abs/1712.08929

QMC: Transition Workshop - Overview of Research in Working Group 4: Representative Points for Small Data and Big Data Problems - V. Roshan Joseph & Simon Mak, May 7, 2018
