Slides from keynote presentation at 3rd Taiwan Summer Workshop in Information Management (TSWIM) by Galit Shmueli on "To Explain or To Predict? Predictive Analytics in Information Systems Research"
Big Data - To Explain or To Predict? Talk at U Toronto's Rotman School of Ma... (Galit Shmueli)
Slide from Prof. Galit Shmueli's talk at University of Toronto's Rotman School of Management, March 4, 2016. This talk is part of Rotman's Big Data Expert Speaker Series.
https://www.rotman.utoronto.ca/ProfessionalDevelopment/Events/UpcomingEvents/20160304GalitShmueli.aspx
Presentation at special event "To Explain or To Predict?" at Tel Aviv University, July 9, 2012. Event co-organized by the Israel Statistical Association and Tel Aviv University's Department of Statistics and OR.
Repurposing Classification & Regression Trees for Causal Research with High-D... (Galit Shmueli)
Keynote at WOMBAT 2019 (Monash University) https://www.monash.edu/business/wombat2019
Abstract:
Studying causal effects and structures is central to research in management, social science, economics, and other areas, yet typical analysis methods are designed for low-dimensional data. Classification & Regression Trees ("trees") and their variants are popular predictive tools used in many machine learning applications and predictive research, as they are powerful in high-dimensional predictive scenarios. Yet trees are not commonly used in causal-explanatory research. In this talk I will describe adaptations of trees that we developed for tackling two causal-explanatory issues: self-selection and confounder detection. For self-selection, we developed a novel tree-based approach that adjusts for observable self-selection bias in intervention studies, thereby creating a useful tool for analyzing observational impact studies, as well as for post-analysis of experimental data, that scales to big data. For tackling confounders, we repurpose trees for automated detection of potential Simpson's paradoxes in data with few or many potential confounding variables, even with very large samples. I'll also show insights revealed when applying these trees to applications in eGov, labor economics, and healthcare.
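The tree-based confounder detection described in this abstract can be illustrated concretely. The sketch below is a hypothetical toy, not the method from the talk: it generates synthetic data exhibiting a Simpson's paradox and uses a one-level regression-tree split (a stump) on a candidate confounder. The aggregate slope of y on x is negative, while within each subgroup found by the split it is positive. All variable names and the data-generating process are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic Simpson's paradox: within each group y rises with x, but the
# high-x group has a much lower baseline, so the aggregate trend reverses.
n = 500
group = rng.integers(0, 2, n)                 # candidate confounder (0/1)
x = rng.normal(loc=4.0 * group, scale=1.0)
y = x - 8.0 * group + rng.normal(scale=0.5, size=n)

def slope(x, y):
    """Ordinary least-squares slope of y on x."""
    return np.cov(x, y, bias=True)[0, 1] / np.var(x)

def stump_split(z, y):
    """One-level regression tree: threshold on z minimizing within-node SSE."""
    best_t, best_sse = None, np.inf
    for t in np.unique(z)[:-1]:
        left, right = y[z <= t], y[z > t]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_t, best_sse = t, sse
    return best_t

agg = slope(x, y)                             # aggregate trend: negative
t = stump_split(group, y)                     # stump recovers the confounder split
sub_lo = slope(x[group <= t], y[group <= t])  # subgroup trends: positive
sub_hi = slope(x[group > t], y[group > t])
print(agg, sub_lo, sub_hi)
```

The stump's split on the candidate confounder partitions the data into the subgroups whose trends contradict the aggregate one, which is exactly the kind of signal an automated paradox detector looks for.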
Slides accompanying Malcolm Moore’s 2014 webcast on statistical and predictive modelling where he demonstrates JMP as an effective tool for exploratory data analysis, and JMP Pro as an expert modelling tool that scales to any number of Xs and Ys, is effective with messy data, and reduces the risk of selecting the wrong model. Watch the webcasts at http://www.jmp.com/uk/about/events/webcasts/
Causal Inference in Data Science and Machine Learning (Bill Liu)
Event: https://learn.xnextcon.com/event/eventdetails/W20042010
Video: https://www.youtube.com/channel/UCj09XsAWj-RF9kY4UvBJh_A
Modern machine learning techniques are able to learn highly complex associations from data, which has led to amazing progress in computer vision, NLP, and other predictive tasks. However, there are limits to inference from purely probabilistic or associational information. Without understanding causal relationships, ML models are unable to provide actionable recommendations, perform poorly in new but related environments, and suffer from a lack of interpretability.
In this talk, I provide an introduction to the field of causal inference, discuss its importance in addressing some of the current limitations in machine learning, and provide some real-world examples from my experience as a data scientist at Brex.
International Journal of Mathematics and Statistics Invention (IJMSI) is an international journal intended for professionals and researchers in all fields of computer science and electronics. IJMSI publishes research articles and reviews across the whole field of Mathematics and Statistics, including new teaching methods, assessment, validation, and the impact of new technologies, and it will continue to provide information on the latest trends and developments in this ever-expanding subject. Papers are selected through double peer review to ensure originality, relevance, and readability. The articles published in our journal can be accessed online.
PERFORMANCE ANALYSIS OF HYBRID FORECASTING MODEL IN STOCK MARKET FORECASTING (IJMIT JOURNAL)
This paper presents a performance analysis of a hybrid model, comprising concordance measures and Genetic Programming (GP), for forecasting financial markets, compared against some existing models. The scheme can be used for in-depth analysis of the stock market. Different measures of concordance, such as Kendall's Tau, Gini's Mean Difference, Spearman's Rho, and a weak interpretation of concordance, are used to search for patterns in the past that look similar to the present. Genetic Programming is then used to match the past trend to the present trend as closely as possible, and the Genetic Program estimates what will happen next based on what happened next in the past. The concept is validated using financial time series data (S&P 500 and NASDAQ indices) as sample data sets. The forecasted result is then compared with the standard ARIMA model and other models to analyse its performance.
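As a rough, dependency-free illustration of the concordance-matching idea (not the paper's hybrid GP model), the sketch below implements Kendall's tau-a directly, slides over a toy price series to find the past window most concordant with the present one, and uses the value that followed the best match as a naive forecast. The series and window length are made up.

```python
import numpy as np

def kendall_tau(a, b):
    """Kendall's tau-a: (concordant - discordant) pairs over all pairs."""
    n = len(a)
    s = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            s += np.sign(a[i] - a[j]) * np.sign(b[i] - b[j])
    return s / (n * (n - 1) / 2)

# Made-up price series; the last w points are the "present" pattern.
series = np.array([10, 11, 12, 11, 13, 14, 13, 15, 16, 15, 17,
                   10, 11, 12, 11, 13], dtype=float)
w = 5
present = series[-w:]

# Slide over the past, keeping the window most concordant with the present
# (leaving room for the observation that followed it).
best_tau, best_start = -2.0, None
for start in range(len(series) - 2 * w):
    tau = kendall_tau(series[start:start + w], present)
    if tau > best_tau:
        best_tau, best_start = tau, start

# Naive forecast: whatever followed the best-matching past window.
forecast = series[best_start + w]
print(best_start, best_tau, forecast)
```

In the paper, GP takes over from here to refine the match between past and present trends; this sketch stops at the raw concordance search.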
Exploratory Factor Analysis; Concepts and Theory (Hamed Taherdoost)
Exploratory factor analysis is a complex multivariate statistical technique commonly employed in information systems, social science, education, and psychology. This presentation intends to provide a simplified collection of information for researchers and practitioners undertaking exploratory factor analysis (EFA), and to support decisions about best practice in EFA. In particular, the objective of this presentation is to provide practical and theoretical guidance on decisions about sample size, extraction method, the number of factors to retain, and rotation methods.
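One of the retention decisions mentioned above, how many factors to keep, is often made with the Kaiser criterion: retain factors whose correlation-matrix eigenvalue exceeds 1. A minimal sketch on synthetic two-factor data (the loadings and sample size are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: two latent factors drive six observed variables.
n = 1000
factors = rng.normal(size=(n, 2))
loadings = np.array([[0.9, 0.0], [0.8, 0.1], [0.85, 0.0],
                     [0.0, 0.9], [0.1, 0.8], [0.0, 0.85]])
X = factors @ loadings.T + 0.4 * rng.normal(size=(n, 6))

# Kaiser criterion: retain factors whose correlation-matrix eigenvalue > 1.
R = np.corrcoef(X, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]
n_retain = int((eigvals > 1.0).sum())
print(eigvals.round(2), n_retain)
```

Here the first two eigenvalues stand well above 1 while the rest fall below it, so the criterion recovers the two factors built into the data; in practice the Kaiser rule is usually cross-checked against a scree plot or parallel analysis.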
Recognizing the needs and acceptance of individuals is the starting point of any business, and this understanding helps chart the path of future development; academics are therefore interested in the factors that drive users' acceptance or rejection of technologies. A number of models and frameworks have been developed to explain user adoption of new technologies, and these models introduce factors that can affect user acceptance. This presentation provides an overview of theories and models regarding user acceptance of technology.
Would you like greater confidence that the models you build are genuinely useful and can drive rational decisions? This slideshow will show how to build the most useful models that fully exploit all the information in your data, simply and easily.
Join us for an upcoming live webcast to learn more about using JMP: http://www.jmp.com/uk/about/events/webcasts/
And if you'd like to try JMP, here's how: http://www.jmp.com/uk/software/try-jmp.shtml?product=jmp&ref=top
MIS (Management Information System) in Fashion & Textile Industry (Anuradha Sajwan)
The presentation has been prepared by the students of MFM (Master of Fashion Management), New Delhi, as part of a study of the role of information systems in the fashion & textile industry.
Research methods and paradigms is a topic from the subject Methods of Research (FC 402) of the Master of Arts in Educational Management degree, covering quantitative research (descriptive, survey, developmental, correlational, causal-comparative, experimental, true experimental, quasi-experimental), qualitative research, and mixed methods research.
This presents an overview of the relevance and significance of statistics as a valid tool for enhancing the quality of research. It also touches upon some misuses and abuses of statistics.
Chao Wrote Some trends that influence human resource are, Leade.docx (sleeperharwell)
Chao Wrote:
Some trends that influence human resources are Leadership Development and Learning Opportunities, Data and Analytics, Compliance and Regulation, Controlling and Containing Costs, and More Competition for Talent. The one I consider most important is leadership development and learning opportunities, because here companies give employees the opportunity to learn and grow through leadership training, which shows employees that the company wants them to be more engaged. This kind of program can also help nurture leadership abilities and professional development. The other trend that I think plays a very important role is knowing the compliance and regulations, because compliance and regulation change all the time and companies need to be proactive and make changes as new compliance requirements or regulations arrive. For this, many companies turn to technology solutions to minimize the costs and resources devoted to this task, freeing up HR professionals to focus on other aspects of their work. Some strategic resource examples include recruitment, learning and development, compensation, and performance appraisal.
Quane Wrote:
Hi Dr. Clark and Classmates,
Through my assigned reading for week 1, I've learned that one-third of large U.S. businesses selected non-Human Resources managers to operate in top-tier executive positions. Still, the most successful Human Resources executives do have prior Human Resources experience, so for the select few managers without a Human Resources background, the opportunity to serve as a Human Resources executive will increase their probability of successful career progression. The tentative new transition for businesses is to outsource the majority of their Human Resources operational needs to large Human Resources firms that service multiple businesses. Many frequently utilized services will be offered to employees online in order to address the increased demand for specialized Human Resources services as well as shorten response times and increase efficiency.
Strategic Human Resource Management is the process of determining ways to evaluate an organization's unique Human Resources needs and create a plan that facilitates the establishment and maintenance of efficient personnel management systems supporting the short-term and long-term functionality and sustained growth of an organization.
Exercise 8 - Case Study Research
Develop a hypothetical research scenario that would warrant the application of the case study.
What type of approach within the qualitative method would be used? Why or why not?
Exercise 9 - Perspectives in Qualitative Methods
Develop a hypothetical research scenario that would warrant the application of the ethnographic, narrative or phenomenological approach.
What type of design would be best utilized along with this approach?
Exercise 10 - Factors in Mixed Methods Research
What are the strengths.
Statistical Techniques for Processing & Analysis of Data Part 9.pdf (AdebisiAdetayo1)
The present book has been written with two clear objectives, viz., (i) to enable researchers, irrespective of their discipline, to develop the most appropriate methodology for their research studies; and (ii) to make them familiar with the art of using different research methods and techniques. It is hoped that the humble effort made in the form of this book will assist in the accomplishment of exploratory as well as result-oriented research studies.
Difference Between Qualitative and Quantitative Research.docx (zekfeker)
Literature search tools (Zekarias Tilaye)
Hints:
These tools help researchers to find and collect relevant scholarly literature, such as academic journals, books, and conference proceedings. Some examples of literature search tools include Google Scholar, PubMed, and Scopus.
Therefore, please provide us with clear information on this topic.
A Framework for Statistical Simulation of Physiological Responses (SSPR) (Waqas Tariq)
The problem of selecting, from a large number of variables, those that predict certain important dependent variables has been of interest to both applied statisticians and researchers in applied physiology, and various statistical techniques have been developed for this purpose. This framework embeds various statistical techniques of sampling and resampling, and helps in statistical simulation of physiological responses under different environmental conditions. The population generation and other statistical calculations are based on inputs provided by the user: a mean vector, a covariance matrix, and the data. The framework is developed so that it can work with original data as well as with simulated data generated by the software. Approach: The mean vector and covariance matrix are sufficient statistics when the underlying distribution is multivariate normal. The framework uses these two inputs and is able to generate a simulated multivariate normal population for any number of variables. The software turns the manual operation into a computer-based system to automate the study and provide efficiency, accuracy, timeliness, and economy. Result: A complete framework that can statistically simulate any type and any number of responses or variables. If the simulated data are analyzed using statistical techniques, the results of such analysis will match those obtained from the original data. The system also helps when data are missing for some of the variables. Conclusion: The proposed system makes it possible to carry out physiological studies and statistical calculations even when actual data are not available.
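The core simulation step described above, generating a multivariate normal population from a user-supplied mean vector and covariance matrix, can be sketched in a few lines of NumPy. The physiological variables and numbers below are invented for illustration, not taken from the framework:

```python
import numpy as np

rng = np.random.default_rng(42)

# User-supplied sufficient statistics (hypothetical): e.g. heart rate,
# systolic blood pressure, body temperature.
mu = np.array([70.0, 120.0, 36.8])
sigma = np.array([[25.0, 10.0, 0.5],
                  [10.0, 64.0, 0.8],
                  [0.5,  0.8,  0.09]])

# Generate a simulated multivariate normal population; analyzing it
# recovers (approximately) the statistics that generated it.
sim = rng.multivariate_normal(mu, sigma, size=20_000)
mu_hat = sim.mean(axis=0)
sigma_hat = np.cov(sim, rowvar=False)
print(mu_hat.round(2))
print(sigma_hat.round(2))
```

This is the sense in which the mean vector and covariance matrix are sufficient under multivariate normality: any analysis of the simulated population reproduces, up to sampling error, what the original data would have given.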
Research methods can generally be divided into two main categories: Quantitative and Qualitative. This webinar will provide an overview of quantitative methods with a brief distinction between quantitative and qualitative methods. We will focus on when and how to use quantitative research and discuss type of variables and statistical analysis.
Presentation will be led by Dr. Carlos Cardillo.
About CORE:
The Culture of Research and Education (C.O.R.E.) webinar series is spearheaded by Dr. Bernice B. Rumala, CORE Chair & Program Director of the Ph.D. in Health Sciences program in collaboration with leaders and faculty across all academic programs.
This innovative and wide-ranging series is designed to provide continuing education, skills-building techniques, and tools for academic and professional development. These sessions will provide a unique chance to build your professional development toolkit through presentations, discussions, and workshops with Trident’s world-class faculty.
For further information about CORE or to present, you may contact Dr. Bernice B. Rumala at Bernice.rumala@trident.edu
Research Using Behavioral Big Data: A Tour and Why Mechanical Engineers Shoul... (Galit Shmueli)
Keynote address by Galit Shmueli at 2016 Israeli Conference on Mechanical Engineering (ICME), Technion, Israel (Nov 23, 2016). http://icme2016.net.technion.ac.il/
E.SUN Academic Award presentation (Jan 2016) (Galit Shmueli)
This is my presentation at the 2016 Chinese New Year Banquet of NTHU's College of Technology Management. In this 15-min presentation, I describe my entrepreneurial approach to analytics, and the two papers that won me the E.SUN Academic Award.
On Information Quality: Can Your Data Do The Job? (SCECR 2015 Keynote) (Galit Shmueli)
Slides by Galit Shmueli for keynote presentation at 2015 Statistical Challenges in eCommerce Research (SCECR) symposium, Addis Ababa, Ethiopia (www.scecr.org)
Introducing the NTHU-EZTABLE Kaggle Contest (Predicting Repeat Restaurant Boo... (Galit Shmueli)
Prof. Galit Shmueli introduces and describes the NTHU-EZTABLE data mining contest on Kaggle.com (talk at Taiwan's National Tsing Hua University, Oct 29, 2014). https://inclass.kaggle.com/c/predict-repeat-restaurant-bookings
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag... (sameer shah)
"Join us for STATATHON, a dynamic 2-day event dedicated to exploring statistical knowledge and its real-world applications. From theory to practice, participants engage in intensive learning sessions, workshops, and challenges, fostering a deeper understanding of statistical methodologies and their significance in various fields."
Adjusting primitives for graph : SHORT REPORT / NOTES (Subhajit Sahu)
Graph algorithms, like PageRank, operate on graph representations such as Compressed Sparse Row (CSR), an adjacency-list based format.
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
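The storage-type experiments above are C++/CUDA benchmarks. As a rough NumPy analogue (NumPy has no bfloat16, so float16 stands in), the sketch below shows why both the storage type and the accumulator type matter for a vector element sum:

```python
import numpy as np

# float16 stands in for bfloat16 here. Summing 10,000 copies of 0.1:
x64 = np.full(10_000, 0.1, dtype=np.float64)
x16 = x64.astype(np.float16)            # low-precision storage (~0.09998)

exact = x64.sum()                       # float64 storage and accumulator

# Sequential accumulation in float16: once the running sum is large,
# adding ~0.1 rounds to zero and the sum stalls far below the true value.
acc = np.float16(0.0)
for v in x16:
    acc = acc + v                       # float16 + float16 stays float16

# Same low-precision storage, but a wide (float64) accumulator: only the
# one-time storage rounding error remains.
promoted = x16.astype(np.float64).sum()

print(exact, float(acc), promoted)
```

The low-precision storage costs only a small, bounded rounding error, while a low-precision accumulator can be catastrophically wrong, which is why reductions over narrow storage types normally promote to a wider accumulator.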
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf (GetInData)
Recently we have observed the rise of open-source Large Language Models (LLMs) that are community-driven or developed by the AI market leaders, such as Meta (Llama3), Databricks (DBRX) and Snowflake (Arctic). On the other hand, there is a growth in interest in specialized, carefully fine-tuned yet relatively small models that can efficiently assist programmers in day-to-day tasks. Finally, Retrieval-Augmented Generation (RAG) architectures have gained a lot of traction as the preferred approach for LLMs context and prompt augmentation for building conversational SQL data copilots, code copilots and chatbots.
In this presentation, we will show how we built upon these three concepts a robust Data Copilot that can help to democratize access to company data assets and boost performance of everyone working with data platforms.
Why do we need yet another (open-source) Copilot?
How can we build one?
Architecture and evaluation
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Round table discussion of vector databases, unstructured data, ai, big data, real-time, robots and Milvus.
A lively discussion with NJ Gen AI Meetup Lead, Prasad and Procure.FYI's Co-Found
Adjusting OpenMP PageRank : SHORT REPORT / NOTES (Subhajit Sahu)
For massive graphs that fit in RAM, but not in GPU memory, it is possible to take advantage of a shared-memory system with multiple CPUs, each with multiple cores, to accelerate PageRank computation. If the NUMA architecture of the system is properly taken into account with good vertex partitioning, the speedup can be significant. To take steps in this direction, experiments are conducted to implement PageRank in OpenMP using two different approaches, uniform and hybrid. The uniform approach runs all primitives required for PageRank in OpenMP mode (with multiple threads). On the other hand, the hybrid approach runs certain primitives in sequential mode (i.e., sumAt, multiply).
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake (Walaa Eldin Moustafa)
Dynamic policy enforcement is becoming an increasingly important topic in today’s world where data privacy and compliance is a top priority for companies, individuals, and regulators alike. In these slides, we discuss how LinkedIn implements a powerful dynamic policy enforcement engine, called ViewShift, and integrates it within its data lake. We show the query engine architecture and how catalog implementations can automatically route table resolutions to compliance-enforcing SQL views. Such views have a set of very interesting properties: (1) They are auto-generated from declarative data annotations. (2) They respect user-level consent and preferences (3) They are context-aware, encoding a different set of transformations for different use cases (4) They are portable; while the SQL logic is only implemented in one SQL dialect, it is accessible in all engines.
#SQL #Views #Privacy #Compliance #DataLake
Techniques to optimize the PageRank algorithm usually fall into two categories: reducing the work per iteration, and reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, which share the same in-links, helps avoid duplicate computations and thus could reduce iteration time. Road networks often have chains which can be short-circuited before PageRank computation to improve performance, since the final ranks of chain nodes can be easily calculated; this could reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order, which could reduce the iteration time and the number of iterations, and also enable multi-iteration concurrency in PageRank computation. The combination of all of the above methods is the STICD algorithm [sticd]. For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
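The first optimization mentioned, skipping computation on vertices that have already converged, can be sketched as follows. This is a toy Python version on a hypothetical four-vertex graph with one dangling node, not the parallel C++ discussed above:

```python
import numpy as np

# Hypothetical four-vertex graph as in-link lists; vertex 3 is dangling.
in_links = {0: [2], 1: [0], 2: [0, 1], 3: [2]}
out_deg = {0: 2, 1: 1, 2: 2, 3: 0}
n, d, tol = 4, 0.85, 1e-10

rank = np.full(n, 1.0 / n)
active = set(range(n))
iters = 0
while active:
    iters += 1
    new_rank = rank.copy()
    # Dangling mass is redistributed uniformly.
    dangling = d * sum(rank[u] for u in range(n) if out_deg[u] == 0) / n
    for v in list(active):
        s = sum(rank[u] / out_deg[u] for u in in_links[v])
        new_rank[v] = (1.0 - d) / n + d * s + dangling
        if abs(new_rank[v] - rank[v]) < tol:
            active.discard(v)   # converged: skip this vertex from now on
    rank = new_rank

print(iters, rank.round(3))
```

Note the trade-off this illustrates: a vertex can be marked converged while its in-neighbors are still changing, so real implementations choose the tolerance (or re-activation policy) carefully to keep the final ranks accurate.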
Predictive analytics in Information Systems Research (TSWIM 2015 keynote)
1. To Explain or To Predict?
Predictive Analytics in IS Research
3rd Taiwan Summer Workshop
on Information Management
July 2015
Galit Shmueli
2. Galit Shmueli (徐茉莉)
www.galitshmueli.com
❶ 1994-2000 Israel Institute of Technology
MSc + PhD, Statistics
❷ 2000-2002 Carnegie Mellon Univ.
Visiting Assistant Prof., Dept. of Statistics
❸ 2002-2012 Univ. of Maryland College Park
Assistant then Associate Prof. of Statistics & Management Science, R H Smith School of Business
2008-2014 Rigsum Institute (Bhutan)
Co-Director, Rigsum Research Lab
❹ 2011-2014 Indian School of Business
SRITNE Chaired Prof. of Data Analytics, Associate Prof. of Statistics & Info Systems
2014-… NTHU, Institute of Service Science
Director, Center for Service Innovation & Analytics
3. Research in Data Analytics
www.galitshmueli.com
• Statistical strategy
• ‘Entrepreneurial’ statistical & data mining modeling (new conditions & environments)
• Business analytics
In progress…
9. Statistical modeling in MIS research
Purpose: test causal theory (“explain”)
Association-based statistical models
Prediction nearly absent
10. Explanatory modeling à-la MIS
Start with a causal theory
→ Generate causal hypotheses on constructs
→ Operationalize constructs into measurable variables
→ Fit statistical model
→ Statistical inference → Causal conclusions
11. In MIS,
data analysis is mainly used for testing
causal theory.
“If it explains, it predicts”
12. “Empirical prediction alone
is un-scientific”
Some statisticians share this view:
The two goals in analyzing data... I prefer to describe
as “management” and “science”. Management seeks
profit... Science seeks truth.
- Parzen, Statistical Science 2001
13. Prediction in top research journals in
Information Systems
Predictive goal?
Predictive modeling?
Predictive assessment?
1990-2006
15. generate new theory
develop measures
compare theories
improve theory
assess relevance
evaluate predictability
Why Predict? for Scientific Research
Shmueli & Koppius, “Predictive Analytics in IS Research”
MIS Quarterly, 2011
16. “A good explanatory model will also
predict well”
“You must understand the underlying
causes in order to predict”
17. Philosophy of Science
“Explanation and prediction have the
same logical structure”
Hempel & Oppenheim, 1948
“It becomes pertinent to investigate the
possibilities of predictive procedures
autonomous of those used for explanation”
Helmer & Rescher, 1959
“Theories of social and human behavior
address themselves to two distinct goals of
science: (1) prediction and (2) understanding”
Dubin, Theory Building, 1969
19. Explanatory Model:
Test/quantify causal effect for
“average” record in population
Predictive Model:
Predict new individual
observations
Different Scientific Goals
Different generalization
25. Four aspects
1. Theory - Data
2. Causation – Association
3. Retrospective – Prospective
4. Bias - Variance
Y=F(X)
Y=f(X)
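The bias-variance point on this slide can be illustrated with a small simulation (all numbers here are made up for illustration): with few, noisy observations, a misspecified linear model — the estimated f(X) — can out-predict the true quadratic F(X) on new data, because its lower estimation variance outweighs its bias.

```python
import numpy as np

rng = np.random.default_rng(0)

def design(x, degree):
    """Polynomial design matrix [1, x, ..., x^degree]."""
    return np.vander(x, degree + 1, increasing=True)

def fit_predict(X_train, y_train, X_test):
    """Least-squares fit on training data, predictions on test points."""
    beta, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
    return X_test @ beta

n_train, sigma, reps = 20, 3.0, 200            # small sample, high noise
x_test = np.linspace(0, 5, 500)
f_test = 0.5 * x_test + 0.1 * x_test**2        # true (quadratic) mean function

mse = {1: [], 2: []}                            # degree -> test MSEs per repetition
for _ in range(reps):
    x = rng.uniform(0, 5, n_train)
    y = 0.5 * x + 0.1 * x**2 + rng.normal(0, sigma, n_train)
    for deg in (1, 2):
        pred = fit_predict(design(x, deg), y, design(x_test, deg))
        mse[deg].append(np.mean((pred - f_test) ** 2))

print(f"avg test MSE, misspecified linear model: {np.mean(mse[1]):.3f}")
print(f"avg test MSE, true quadratic model:      {np.mean(mse[2]):.3f}")
```

On average across repetitions the "wrong" linear model predicts the true mean function better: the variance cost of estimating the extra quadratic coefficient from 20 noisy points exceeds the squared bias from omitting it.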
26. Predict ≠ Explain
+ ?
“we tried to benefit from an extensive
set of attributes describing each of the
movies in the dataset. Those attributes
certainly carry a significant signal and
can explain some of the user behavior.
However… they could not help at all
for improving the [predictive]
accuracy.”
Bell et al., 2008
28. Explain ≠ Predict
The FDA considers two products bioequivalent if the 90% CI for the ratio of the generic formulation's mean response to the brand formulation's mean lies within 80%-125%
“We are planning to… develop predictive models for bioavailability
and bioequivalence”
Lester M. Crawford, 2005
Acting Commissioner of Food & Drugs
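The 80%-125% rule on this slide can be sketched numerically. In practice the FDA criterion is applied on the log scale of a measure such as AUC; the sketch below uses made-up measurements and a normal approximation for the 90% confidence interval (a real analysis would use a t-based two one-sided tests procedure):

```python
from statistics import NormalDist, mean, stdev
from math import exp, sqrt

# Hypothetical log-scale measurements (e.g. log AUC); numbers are made up.
generic = [4.61, 4.70, 4.53, 4.66, 4.58, 4.72, 4.49, 4.63]
brand   = [4.64, 4.69, 4.60, 4.71, 4.55, 4.75, 4.57, 4.68]

diff = mean(generic) - mean(brand)             # log-scale difference of means
se = sqrt(stdev(generic)**2 / len(generic) + stdev(brand)**2 / len(brand))
z = NormalDist().inv_cdf(0.95)                 # 90% two-sided CI, normal approx.

lo, hi = exp(diff - z * se), exp(diff + z * se)   # back-transform to ratio scale
bioequivalent = 0.80 <= lo and hi <= 1.25
print(f"90% CI for generic/brand ratio: ({lo:.3f}, {hi:.3f}); "
      f"bioequivalent: {bioequivalent}")
```

Note this is an explanatory-style criterion: it quantifies an average effect (the ratio of means), which is exactly why predictive models for individual bioavailability are a separate undertaking.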
29. “For a long time, we thought that
Tamoxifen was roughly 80%
effective for breast cancer
patients.
But now we know much more:
we know that it’s 100% effective
in 70%-80% of the patients, and
ineffective in the rest.”
31. Study design & data collection
Observational or experiment?
Primary or secondary data?
Hierarchical data
Instrument (reliability + validity vs. measurement accuracy)
How much data?
How to sample?
36. Evaluation, Validation & Model Selection
Predictive: Training data → Empirical model → Holdout data → Predictive power; over-fitting analysis
Explanatory: Theoretical model + Data → Empirical model → Validation; model fit ≠ explanatory power
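The predictive evaluation flow on this slide — fit on training data, assess on holdout data, check for over-fitting — can be sketched with a toy simulation (data and model family are made up): as polynomial degree grows, training error keeps shrinking while holdout error eventually worsens.

```python
import numpy as np

rng = np.random.default_rng(1)

def poly_design(x, degree):
    """Polynomial design matrix [1, x, ..., x^degree]."""
    return np.vander(x, degree + 1, increasing=True)

# Hypothetical data: noisy sine; 15 training points, 200 holdout points.
f = lambda x: np.sin(2 * np.pi * x)
x_tr = rng.uniform(0, 1, 15);  y_tr = f(x_tr) + rng.normal(0, 0.3, 15)
x_ho = rng.uniform(0, 1, 200); y_ho = f(x_ho) + rng.normal(0, 0.3, 200)

train_mse, holdout_mse = [], []
for deg in range(1, 10):
    beta, *_ = np.linalg.lstsq(poly_design(x_tr, deg), y_tr, rcond=None)
    train_mse.append(np.mean((poly_design(x_tr, deg) @ beta - y_tr) ** 2))
    holdout_mse.append(np.mean((poly_design(x_ho, deg) @ beta - y_ho) ** 2))

best = 1 + int(np.argmin(holdout_mse))          # model selection on holdout data
print("degree with lowest holdout MSE:", best)
print("train MSE by degree:  ", np.round(train_mse, 3))
print("holdout MSE by degree:", np.round(holdout_mse, 3))
```

Training fit alone would always pick the most complex model; only the holdout comparison reveals over-fitting — which is the slide's point that model fit is not the same as predictive power.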
37. Model Use & Evaluation
Explanatory uses (test causal theory, generate new theory, develop measures, compare theories, improve theory, assess relevance): inference, vs. null hypothesis
Predictive use (evaluate predictability): predictive performance, over-fitting analysis, vs. naïve/baseline
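The naïve/baseline benchmark mentioned on this slide plays the role that the null hypothesis plays in inference. A minimal sketch with made-up data: compare a fitted model's holdout RMSE against the naïve rule "always predict the training mean".

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical regression data: y depends linearly on x, plus noise.
n = 100
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(0, 1.0, n)
x_tr, y_tr, x_ho, y_ho = x[:70], y[:70], x[70:], y[70:]

# Naive baseline: ignore x, always predict the training mean of y.
baseline_rmse = np.sqrt(np.mean((y_ho - y_tr.mean()) ** 2))

# Simple linear model fit by least squares on the training set.
slope, intercept = np.polyfit(x_tr, y_tr, 1)
model_rmse = np.sqrt(np.mean((y_ho - (slope * x_ho + intercept)) ** 2))

print(f"baseline RMSE: {baseline_rmse:.3f}, model RMSE: {model_rmse:.3f}")
```

A model only demonstrates predictability if it beats such a baseline on holdout data; raw RMSE by itself, like a p-value without a null, has no reference point.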
41. The predictive power of an
explanatory model has important
scientific value
Relevance, reality check, predictability
42. Current State in Social Sciences
(and MIS)
“While the value of scientific prediction… is beyond
question… the inexact sciences [do not] have…the
use of predictive expertise well in hand.”
Helmer & Rescher, 1959
Distinction blurred
Unfamiliarity with predictive
modeling/assessment
Prediction underappreciated
45. How does this impact an
organization’s actions?
…and our lives?
46. What can be done?
Acknowledge difference
Learn/teach prediction
Leverage prediction in research
BUT
focus on its scientific uses:
47. generate new theory
develop measures
compare theories
improve theory
assess relevance
evaluate predictability
Why Predict? for Scientific Research
Shmueli & Koppius, “Predictive Analytics in IS Research”
MIS Quarterly, 2011
48. Shmueli (2010) “To Explain or To Predict?”, Statistical Science
Shmueli & Koppius (2011) “Predictive Analytics in IS Research”, MISQ