Slides from Prof. Galit Shmueli's talk at the University of Toronto's Rotman School of Management, March 4, 2016. The talk was part of Rotman's Big Data Expert Speaker Series.
https://www.rotman.utoronto.ca/ProfessionalDevelopment/Events/UpcomingEvents/20160304GalitShmueli.aspx
Big Data - To Explain or To Predict? Talk at U Toronto's Rotman School of Management
1. Big Data - To Explain or To Predict?
Big Data Experts Speaker Series
Rotman School of Management, U Toronto, March 2016
Galit Shmueli
2. Galit Shmueli (徐茉莉)
www.galitshmueli.com
❶ 1994-2000
Israel Institute of Technology
MSc + PhD, Statistics
❷ 2000-2002
Carnegie Mellon Univ.
Visiting Assistant Prof., Dept. of Statistics
❸ 2002-2012
Univ. of Maryland College Park
Assistant then Associate Prof. of Statistics & Management Science
R H Smith School of Business
2008-2014
Rigsum Institute (Bhutan)
Co-Director, Rigsum Research Lab
❹ 2011-2014
Indian School of Business
SRITNE Chaired Prof. of Data Analytics, Associate Prof. of Statistics & Info Systems
2014-... NTHU
Institute of Service Science
Director, Center for Service Innovation & Analytics
3. Research in Data Analytics
"Entrepreneurial" statistical & data mining modeling (for today's problems)
Interdisciplinary modeling
Statistical Strategy
To Explain or To Predict?
Information Quality
Regression with Big Data
8. Statistical modeling in
social sciences &
management research
Purpose: test causal theory ("explain")
Association-based statistical models
Prediction nearly absent
9. Start with a causal theory
Generate causal hypotheses on constructs
Operationalize constructs → Measurable variables
Fit statistical model
Statistical inference → Causal conclusions
Classic journal paper
10. In the social sciences, data analysis is mainly used for testing causal theory.
"If it explains, it predicts"
11. "Empirical prediction alone is un-scientific"
Some statisticians share this view:
The two goals in analyzing data... I prefer to describe as "management" and "science". Management seeks profit... Science seeks truth.
- Parzen, Statistical Science 2001
12. Prediction in top research journals in
Information Systems
Predictive goal?
Predictive modeling?
Predictive assessment?
1990-2006
18. Philosophy of Science
"Explanation and prediction have the same logical structure"
Hempel & Oppenheim, 1948
"It becomes pertinent to investigate the possibilities of predictive procedures autonomous of those used for explanation"
Helmer & Rescher, 1959
"Theories of social and human behavior address themselves to two distinct goals of science: (1) prediction and (2) understanding"
Dubin, Theory Building, 1969
20. Explanatory Model:
Test/quantify causal effect for "average" record in population
Predictive Model:
Predict new individual
observations
Different Scientific Goals
Different generalization
25. Predict ≠ Explain
+ ?
"we tried to benefit from an extensive set of attributes describing each of the movies in the dataset. Those attributes certainly carry a significant signal and can explain some of the user behavior. However... they could not help at all for improving the [predictive] accuracy."
Bell et al., 2008
26. Explain ≠ Predict
The FDA considers two products bioequivalent if the 90% CI of the relative mean of the generic to brand formulation is within 80%-125%
"We are planning to... develop predictive models for bioavailability and bioequivalence"
Lester M. Crawford, 2005
Acting Commissioner of Food & Drugs
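A minimal sketch of the decision rule quoted on this slide, not an official FDA procedure: the check passes only when the whole 90% confidence interval for the generic-to-brand ratio of means lies inside 80%-125%. The function name, the inclusive treatment of the limits, and the example intervals are assumptions made for illustration.

```python
def is_bioequivalent(ci_low, ci_high, lower=0.80, upper=1.25):
    """Return True when the whole 90% CI for the generic/brand
    ratio of means lies inside [lower, upper].

    Treating the limits as inclusive is an assumption of this sketch.
    """
    if ci_low > ci_high:
        raise ValueError("ci_low must not exceed ci_high")
    return lower <= ci_low and ci_high <= upper


# Hypothetical intervals, not real study results.
inside = is_bioequivalent(0.92, 1.10)   # interval fully inside the limits
outside = is_bioequivalent(0.78, 1.05)  # lower end falls below 0.80
print(inside, outside)
```

The rule is a statement about an average ratio in the population. It says nothing about how an individual patient will respond, which appears to be the gap the slide title points at.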
27. "For a long time, we thought that Tamoxifen was roughly 80% effective for breast cancer patients.
But now we know much more: we know that it's 100% effective in 70%-80% of the patients, and ineffective in the rest."
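The contrast in this quote can be sketched numerically. The toy populations below are invented for illustration (1,000 hypothetical patients, with an 80% responder share picked from the quoted 70%-80% range); they are not clinical data. Both populations have the same average effect, yet the average is a poor prediction for any individual in the second one.

```python
from statistics import mean

n = 1000  # hypothetical number of patients

# Picture A: every patient benefits to the same partial degree.
uniform_effect = [0.8] * n
# Picture B: 80% of patients respond fully, the rest not at all.
responder_effect = [1.0] * 800 + [0.0] * 200

avg_a = mean(uniform_effect)
avg_b = mean(responder_effect)


def mse_of_average(effects):
    """Mean squared error when the population average is used as the
    prediction for every individual patient."""
    avg = mean(effects)
    return mean((e - avg) ** 2 for e in effects)


mse_a = mse_of_average(uniform_effect)
mse_b = mse_of_average(responder_effect)
print(avg_a, avg_b)  # same average effect in both pictures
print(mse_a, mse_b)  # individual-level error differs
```

An explanatory analysis of the average effect cannot tell the two pictures apart; predicting individual outcomes requires knowing which patients belong to the responding group.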
29. Study design & data collection
Hierarchical data
Observational or experiment?
Primary or secondary data?
Instrument (reliability+validity vs. meas. accuracy)
How much data?
How to sample?
34. Evaluation, Validation
& Model Selection
Training data
Empirical model
Holdout data
Predictive power
Over-fitting analysis
Theoretical
model
Empirical
model
Data
Validation
Model fit ≠
Explanatory power
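The difference between fit on the training data and predictive power on holdout data can be illustrated with a small simulation. This is a sketch under assumptions: the data-generating process (y = 2x plus Gaussian noise), the sample sizes, and the two models (a least-squares line and a 1-nearest-neighbour "memorizer") are all invented here and do not come from the talk.

```python
import random
from math import sqrt

random.seed(0)


def make_data(n):
    """Simulate (x, y) pairs with y = 2x + noise (assumed process)."""
    xs = [random.uniform(0, 10) for _ in range(n)]
    return [(x, 2.0 * x + random.gauss(0, 3.0)) for x in xs]


train, holdout = make_data(50), make_data(50)


def fit_line(data):
    """Ordinary least squares for a single predictor."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    sxy = sum((x - mx) * (y - my) for x, y in data)
    sxx = sum((x - mx) ** 2 for x, _ in data)
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def fit_1nn(data):
    """Predict the y of the closest training x (memorizes the data)."""
    return lambda x: min(data, key=lambda p: abs(p[0] - x))[1]


def rmse(model, data):
    return sqrt(sum((y - model(x)) ** 2 for x, y in data) / len(data))


models = {"line": fit_line(train), "1-NN": fit_1nn(train)}
results = {name: (rmse(m, train), rmse(m, holdout)) for name, m in models.items()}

for name, (train_err, holdout_err) in results.items():
    print(f"{name}: training RMSE={train_err:.2f}, holdout RMSE={holdout_err:.2f}")
```

The memorizing model reproduces the training data perfectly, so its training error is zero, but that says nothing about its error on new records. Comparing training and holdout error in this way is one simple form of over-fitting analysis.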
35. Inference
Model Use: Industry
Identify causal
factors
generate
predictions for
new data
Predictive performance
Over-fitting analysis
Null hypothesis
Naïve/baseline
36. Inference
Model Use (Science)
test causal theory
generate new theory
develop measures
compare theories
improve theory
assess relevance
Evaluate predictability
Predictive performance
Over-fitting analysis
Null hypothesis
Naïve/baseline
40. The predictive power of an
explanatory model has important
scientific value
Relevance, reality check, predictability
41. Current state in academia
(social sciences and management)
"While the value of scientific prediction... is beyond question... the inexact sciences [do not] have... the use of predictive expertise well in hand."
Helmer & Rescher, 1959
Distinction blurred
Unfamiliarity with predictive
modeling/assessment
Prediction underappreciated