Computational and Statistical Tradeoffs
Venkat Chandrasekaran
Caltech
Joint with Michael Jordan (Berkeley), Yong Sheng Soh (Caltech), Quentin Berthet (Cambridge)
Time-Constrained Inference
o Require decision after a fixed (usually small) amount of time
o E.g., ad exchange
Classical Analysis in Statistics
o Previously, key bottleneck: amount of data
o Consider “best” estimator without much regard for computational considerations
o Minimax analysis
o More recently, time/memory/other infrastructure is bottleneck
o Data is plentiful in several domains
o Need to incorporate these additional constraints formally
A Thought Experiment
o The standard inferential tradeoff
o Process 5MB of data in 1 hour, error = 0.1
o Process 5TB in 1 month, error = 0.0001
o Given 5TB of data, happy with error = 0.1
o Don’t care about small improvements in error
o Statistical models are only approximations of reality
o More data useful for less computation?
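The two operating points above happen to be consistent with error shrinking like 1/sqrt(n). That rate is an assumption made for this sketch, not something the slide states, and the helper name `scaled_error` is hypothetical. Under that assumption, going from 5MB to 5TB (a factor of 10^6 in data) reduces the error by a factor of 10^3, from 0.1 to roughly 0.0001:

```python
import math

def scaled_error(base_error, base_bytes, new_bytes):
    """Error after changing the dataset size, ASSUMING error ~ 1/sqrt(n)."""
    return base_error * math.sqrt(base_bytes / new_bytes)

MB = 10**6    # decimal units (an assumption; the slide does not specify)
TB = 10**12

# Start from the slide's first operating point: 5MB of data, error 0.1.
err_5tb = scaled_error(0.1, 5 * MB, 5 * TB)
print(err_5tb)
```

The question on the slide is the reverse trade: if an error of 0.1 is acceptable, can the extra data be spent on cheaper computation instead of on accuracy?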
Computer Science vs. Statistics
[Figure: axes labeled Error, Amount of data, and Runtime / Space / …; regions labeled Computer Science, Statistics, and Computation / Statistics tradeoffs]
Time-Data Tradeoffs
o Consider an inference problem with fixed (desired) error
o Inference procedures viewed as points in plot
[Plot: runtime vs. number of samples; inference procedures A, B, C, D, E, F shown as points]
o Minimax risk lower bound
- well understood
o Computational lower bound
- poorly understood
- depends on computational model
Time-Data Tradeoffs
o Consider an inference problem with fixed (desired) error
o Need “weaker” algorithms for larger datasets to reduce runtime
o At some stage, throw away data
[Plot: runtime vs. number of samples; inference procedures A, B, C, D, E, F shown as points]
An Estimation Problem
o Signal from known (bounded) set
o Noise
o Observation model
o Observe i.i.d. samples
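The formulas on this slide did not survive extraction. A standard Gaussian denoising setup that is consistent with the four bullets is sketched below; it is an assumption about what the slide showed, and the symbols $x^\star$, $\mathcal{S}$, $\sigma$ and $z_i$ are notational choices made here ($p$ and $n$ follow the usage elsewhere in the slides).

```latex
\begin{align*}
\text{Signal:} \quad & x^\star \in \mathcal{S} \subset \mathbb{R}^p,
    \quad \mathcal{S} \text{ known and bounded} \\
\text{Noise:} \quad & z \sim \mathcal{N}(0, I_p) \\
\text{Observation model:} \quad & y = x^\star + \sigma z \\
\text{Samples:} \quad & y_i = x^\star + \sigma z_i,
    \quad i = 1, \dots, n, \quad z_i \text{ i.i.d.}
\end{align*}
```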
Convex Programming Estimator
o Sample mean is sufficient statistic
o Natural M-estimator
o Convex programming M-estimator
o Constraint is a convex set that contains the signal set (an outer approximation)
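A minimal sketch of the two-step estimator described above: compute the sample mean, then project it onto a convex constraint set. The choice of the unit L1 ball as the constraint set, the Gaussian noise, and all function names are assumptions for illustration; the slides do not fix a particular set at this point.

```python
import math
import random

def sample_mean(samples):
    """Coordinate-wise mean of n samples in R^p (costs about n*p operations)."""
    n = len(samples)
    p = len(samples[0])
    return [sum(y[j] for y in samples) / n for j in range(p)]

def project_l1_ball(v, radius=1.0):
    """Euclidean projection of v onto {x : ||x||_1 <= radius} (sort-based)."""
    if sum(abs(x) for x in v) <= radius:
        return list(v)
    u = sorted((abs(x) for x in v), reverse=True)
    running_sum = 0.0
    theta = 0.0
    for j, u_j in enumerate(u, start=1):
        running_sum += u_j
        candidate = (running_sum - radius) / j
        if u_j - candidate > 0:
            theta = candidate  # keep the threshold from the last valid j
    # Soft-threshold every coordinate by theta, preserving signs.
    return [math.copysign(max(abs(x) - theta, 0.0), x) for x in v]

def estimate(samples, radius=1.0):
    """Projection of the sample mean onto the (assumed) constraint set."""
    return project_l1_ball(sample_mean(samples), radius)

random.seed(0)
x_star = [1.0, 0.0, 0.0, 0.0]   # hypothetical signal inside the unit L1 ball
sigma, n = 0.5, 200             # hypothetical noise level and sample count
samples = [[x_j + sigma * random.gauss(0.0, 1.0) for x_j in x_star]
           for _ in range(n)]
x_hat = estimate(samples)
print(x_hat)
```

Swapping `project_l1_ball` for a projection onto a larger convex set gives a cheaper but statistically weaker estimator, which is the lever the rest of the talk pulls.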
Convex Programming Estimator
o Long history of shrinkage estimation in statistics
o James, Stein (1961)
o Donoho, Johnstone (early 1990s)
o Shrinkage onto convex sets for tractability
o Many surprises in high dimensions, i.e., large p
o More recently
o L1 norm, trace norm, max norm, …
Statistical Performance of Estimator
o Defn 1: The cone of feasible directions into a convex set C is
defined as
o Defn 2: The Gaussian (squared) complexity of a cone T is defined
as
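The two defining formulas are missing from the extracted slide. The standard forms of these objects are given below as an assumption about what the slide showed; $x^\star$ denotes the true signal, $w$ a standard Gaussian vector, and $B_{\ell_2}$ the unit Euclidean ball, all symbols introduced here.

```latex
\begin{align*}
\text{Defn 1:} \quad T_C(x^\star) &= \operatorname{cone}\{\, b - x^\star : b \in C \,\} \\
\text{Defn 2:} \quad g(T) &= \mathbb{E}_{w \sim \mathcal{N}(0, I_p)}
    \Big[ \sup_{a \in T \cap B_{\ell_2}} \langle a, w \rangle^2 \Big]
\end{align*}
```

For example, a $k$-dimensional subspace $T$ has $g(T) = k$, and the nonnegative orthant in $\mathbb{R}^p$ has $g(T) = p/2$.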
Statistical Performance of Estimator
o Thm: The risk of the estimator is
o Proof: Apply optimality conditions
o Intuition: Only consider error in feasible cone
o Many improvements possible to this theorem
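The risk formula is missing from the extracted slide. Assuming it has the form risk ≤ (σ²/n) · g(T), where T is the cone of feasible directions from Defn 1 and g is the Gaussian squared complexity from Defn 2, the bound can be checked numerically in a case where everything is explicit. The sketch below takes the constraint set to be the nonnegative orthant and the true signal to be 0, so T is the orthant itself and g(T) = p/2. The setup and all names are illustrative assumptions.

```python
import random

def project_orthant(v):
    """Euclidean projection onto the nonnegative orthant."""
    return [max(x, 0.0) for x in v]

def monte_carlo_risk(p, n, sigma, trials, rng):
    """Average squared error of the projected sample mean when x* = 0.

    The sample mean of n Gaussian samples is drawn directly as a Gaussian
    with standard deviation sigma / sqrt(n) instead of averaging n draws.
    """
    scale = sigma / n ** 0.5
    total = 0.0
    for _ in range(trials):
        y_bar = [scale * rng.gauss(0.0, 1.0) for _ in range(p)]
        x_hat = project_orthant(y_bar)
        total += sum(x * x for x in x_hat)  # error is measured against x* = 0
    return total / trials

rng = random.Random(0)
p, n, sigma = 20, 50, 2.0                  # hypothetical problem sizes
risk = monte_carlo_risk(p, n, sigma, trials=20000, rng=rng)
bound = (sigma ** 2 / n) * (p / 2)         # (sigma^2 / n) * g(T), g(T) = p/2
print(risk, bound)
```

In this particular case the simulated risk should land close to the bound, since only the positive part of the noise survives the projection.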
Weakening via Convex Relaxation
o Thm: The risk of the estimator is
o Corr: To obtain risk of at most 1,
o Monotonic in C
o If we have access to larger n, can use larger C
o ⇒ Obtain “weaker” estimation algorithm
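A worked instance of the monotonicity, under the assumption that the corollary has the form $n \ge \sigma^2\, g(T_C(x^\star))$ (the formula itself is missing from the extracted slide). Here $T_C(x^\star)$ is the cone of feasible directions from Defn 1, $g$ is the Gaussian squared complexity from Defn 2, and $x^\star$ and $\sigma$ are symbols introduced here for the true signal and the noise level.

```latex
\begin{align*}
C \subseteq C'
  \;&\Longrightarrow\; T_C(x^\star) \subseteq T_{C'}(x^\star)
  \;\Longrightarrow\; g\big(T_C(x^\star)\big) \le g\big(T_{C'}(x^\star)\big) \\
\text{e.g. } x^\star = 0: \quad
  & C = \mathbb{R}^p_{\ge 0} \;\Rightarrow\; g = p/2,
  \qquad C' = \mathbb{R}^p \;\Rightarrow\; g = p
\end{align*}
```

In this example the weaker set $C'$ needs about twice as many samples as $C$ to reach the same risk, in exchange for a trivial projection step.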
Hierarchy of Convex Relaxations
o If the signal set is “algebraic”, then one can obtain a family of outer convex approximations
o Polyhedral, semidefinite, hyperbolic, relative entropy relaxations (Sherali-Adams, Boyd, Parrilo, Lasserre, Renegar, C., Shah)
o Sets ordered by computational complexity
o Central role played by lift-and-project
Before we get to examples …
o How do we calculate runtime?
o Total runtime = np + # ops for projection
- np: computing sample mean; with more data, this increases
- # ops for projection: subsequent processing; with more data, this decreases
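The runtime accounting above can be sketched as a one-line cost model. The operation counts below are hypothetical numbers chosen only to show the shape of the tradeoff; they are not taken from the talk.

```python
def total_runtime(n, p, projection_ops):
    """Total runtime = n*p (computing the sample mean) + ops for projection."""
    return n * p + projection_ops

p = 1000

# Fewer samples, but an expensive projection onto a tight relaxation.
tight = total_runtime(n=2_000, p=p, projection_ops=10**9)

# Ten times more samples, but a cheap projection onto a weaker relaxation.
weak = total_runtime(n=20_000, p=p, projection_ops=10**7)

print(tight, weak)
```

With these made-up counts the weaker procedure is faster overall even though it touches ten times more data, because the drop in projection cost outweighs the growth of the np term.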
Example 1
o Signal set consists of cut matrices
o E.g., collaborative filtering, clustering
Example 2
o Signal set consists of all perfect matchings in complete graph
o E.g., network inference
Example 3
o Banding estimators for covariance matrices
o Bickel-Levina [2007]; Cai, Zhang, Zhou [2010]
o Assume known variable ordering
o Stylized problem: let M be known tridiagonal matrix
o Signal set
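For readers unfamiliar with banding, the basic operation is to keep the entries of a sample covariance matrix that lie within k positions of the diagonal and zero out the rest, which relies on the known variable ordering. The sketch below shows only this generic operation on a made-up matrix; it is not the signal set or the relaxation used in the talk, and the function name `band` is hypothetical.

```python
def band(matrix, k):
    """Zero every entry more than k positions away from the main diagonal."""
    size = len(matrix)
    return [[matrix[i][j] if abs(i - j) <= k else 0.0 for j in range(size)]
            for i in range(size)]

# Made-up symmetric 4x4 "sample covariance" for illustration.
sample_cov = [
    [2.0, 0.8, 0.3, 0.1],
    [0.8, 2.0, 0.7, 0.2],
    [0.3, 0.7, 2.0, 0.9],
    [0.1, 0.2, 0.9, 2.0],
]

tridiagonal = band(sample_cov, 1)   # k = 1 keeps a tridiagonal pattern
for row in tridiagonal:
    print(row)
```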
Example 4
o Signal set consists of all adjacency matrices of graphs with only a clique on square-root of the nodes
o E.g., sparse PCA, gene expression patterns
o Kolar et al. (2010)
Example 4
o What if we use an even weaker relaxation?
o E.g., just take the sample mean as the estimate
o
o In this case, makes sense to throw away data …
Recall Plot …
o At some stage, throw away data
o Several caveats
‒ Need computational lower bounds
‒ Depends on computational architecture
‒ Parallel vs. serial, …
[Plot: runtime vs. number of samples; inference procedures A, B, C, D, E, F shown as points]
Beyond Denoising Problems …
o Consider a high-dimensional time series
o Goal: reliably estimate change-points
Change-Point Estimation
o Suppose we sample more frequently, but number of changes remains the same
o Can again use convex relaxation to obtain time-data tradeoffs
o Joint work with Y.S. Soh (Caltech)
Soh, C., “High-Dimensional Changepoint Estimation: Combining Filtering and Convex Optimization,” ACHA, 2016
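To make the filtering idea concrete, here is a minimal univariate sketch: score each time index by the gap between the mean of the window after it and the mean of the window before it, then pick the largest score. This is a simplification for illustration only. It is not the high-dimensional method of the cited paper, it omits the convex-optimization step entirely, and the function names are hypothetical.

```python
def window_mean(xs, start, stop):
    """Mean of xs[start:stop]."""
    return sum(xs[start:stop]) / (stop - start)

def changepoint_scores(xs, w):
    """Score index t by |mean(xs[t:t+w]) - mean(xs[t-w:t])| wherever both fit."""
    return {t: abs(window_mean(xs, t, t + w) - window_mean(xs, t - w, t))
            for t in range(w, len(xs) - w + 1)}

# Noise-free toy series with a single change at index 20.
series = [0.0] * 20 + [3.0] * 20
scores = changepoint_scores(series, w=5)
best = max(scores, key=scores.get)
print(best)
```

Sampling more frequently lengthens the series between changes, so wider windows (or cheaper downstream processing) become affordable at the same accuracy.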
Heterogeneous Data
o Preceding setups assume unrealistically that data are “stationary” (same type, quality, etc.)
o How do we optimally deploy computational and other resources to process data from disparate sources?
o Preliminary efforts (with Q. Berthet), but a lot remains to be done
[Figure: geo-monitoring stations around California + cellphone seismometers in Pasadena = ?]
Berthet, C., “Resource Allocation for Statistical Estimation,” Proc. of the IEEE, 2016
Summary
o Many action items…
o Large-scale practical demonstrations
o Other types of inference problems, e.g., covariance computation
o Mathematically codify network / communication / computing constraints?
o Ultimately inform data-gathering exercises?
users.cms.caltech.edu/~venkatc

CLIM Program: Remote Sensing Workshop, Computational and Statistical Trade-offs in Data Analysis - Venkat Chandrasekaran, Feb 12, 2018
