2. Disclaimer
• I have no conflicts of interest to report
• The opinions presented are those of the author
and do not necessarily reflect those of the
University of West Florida
3. Learning Objectives
Upon completion of the presentation participants
should be able to:
• Summarize the characteristics of data science
• Summarize the skill sets for data scientists
• Compare and contrast predictive analytics using
statistics vs. machine learning
• Enumerate features of IBM Watson Analytics
(IBMWA)
• Enumerate features of WEKA machine learning
• List the challenges facing data science
6. Definitions
• Data science is “the scientific study of the
creation, validation and transformation of data to
create meaning.” 1 Because data science is
relatively new, definitions are still evolving. Data
science is a good “umbrella” term.
• Analytics is “the discovery and communication
of meaningful patterns in data.” While some
would argue for separating data analytics from
data mining and knowledge discovery from data
(KDD), we will use the terms interchangeably. 2
8. Critical need for data scientists with:
• Domain expertise (example: healthcare)
• In-depth statistical knowledge
• Computer science expertise
• Machine learning expertise
• Programming expertise: R, SQL and Python
languages
• Relational database management system (RDBMS) knowledge
• Comfort level with “Big Data”
9. Historical Background
• While all industries (including sports) are
incorporating analytics and data science, the
business world was first.
• Businesses benefitted from knowing which
customers were likely to unsubscribe (churn) and
whether a customer who purchased item A would
also purchase item B (market basket analysis).
• As far back as the 1960s a small group of
statisticians suggested their field should be
broadened to handle more volume and variety of
data.
• In the 1990s computer scientists developed and
promoted machine learning software.
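The market basket question on this slide (if a customer bought item A, will they also buy item B?) is usually summarized with two ratios, support and confidence. The following is a minimal sketch of that arithmetic only; the function name `rule_stats` and the toy baskets are hypothetical and do not come from the presentation.

```python
def rule_stats(baskets, a, b):
    """Support and confidence for the rule 'bought a -> also bought b'.

    support    = share of all baskets that contain both a and b
    confidence = share of baskets containing a that also contain b
    """
    n = len(baskets)
    with_a = sum(1 for items in baskets if a in items)
    with_both = sum(1 for items in baskets if a in items and b in items)
    support = with_both / n if n else 0.0
    confidence = with_both / with_a if with_a else 0.0
    return support, confidence


# Hypothetical toy transactions, for illustration only.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread"},
    {"milk"},
]
support, confidence = rule_stats(baskets, "bread", "butter")
```

Here two of the four baskets contain both items, and two of the three baskets with bread also contain butter.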
10. Historical Background
• There is evidence that many healthcare workers
lack training in statistics and machine learning. 3
• There is also evidence that statistics is not easy to
teach to non-statisticians and difficult to retain. 4
• Statisticians recommend knowledge of calculus
and linear algebra, which are not routinely studied
by healthcare workers. Statisticians also often prefer
that statistical formulas be calculated longhand.
12. Historical Background
• As a result, many workers are not comfortable
with statistics and data analytics
• This observation flies in the face of a health data
explosion and a shortage of data scientists
• The data explosion is fueled by genomic, EHR-
related, wearable technology and social media
data.
• About 75% of healthcare data is unstructured
(free text), so difficult to analyze
• Enter the Big Data era to further confuse matters
13. Big Data Definition
• #1 Too much data to analyze on one computer
• #2 The Five V’s
• Volume: massive amounts of data are being generated
• Velocity: data is being generated so rapidly that it needs
to be analyzed without placing it in a database
• Variety: roughly 80% of data in existence is unstructured
so it won’t fit into a database or spreadsheet.
• Veracity: current data can be “messy” with missing data
and other challenges.
• Value: data scientists now have the capability to turn
large volumes of unstructured data into something
meaningful. 5
14. Data Science is part of the federal
vision of a healthcare system
• Learning health system: “an ecosystem where all
stakeholders can securely, effectively and efficiently
contribute, share and analyze data” 6 (the PDCA
cycle)
• Precision medicine: “identifying which approaches
will be effective for which patients based on genetic,
environmental, and lifestyle factors.” Clearly this
initiative requires a big data approach to integrate
these data. 7
• Population health requires data analytics
• Value based care requires data
15. Types of analytics (Gartner)
Gartner describes predictive analytics as having four
attributes:
1. An emphasis on prediction
2. Rapid analysis measured in hours or
days
3. An emphasis on the business
relevance of the resulting insights (no
ivory tower analyses)
4. An emphasis on ease of use, thus
making the tools accessible to business
users. 8
16. Predictive Analytics
• It could be argued that predictive analytics is the
most important aspect of data science, where an
outcome of importance is predicted based on
multiple factors influencing the outcome. This is
the area I will focus on
• Use cases will be discussed in the next slide
• I will not cover:
• Text mining with natural language processing (NLP),
which is very important for mining unstructured data
• Data visualization software, such as Tableau and
QlikSense, used for descriptive analytics
• Deep Learning based on artificial intelligence (AI)
17. Predictive Analytics Use Cases
• Predict poor patient outcomes (morbidity)
• Sepsis prediction 9
• Impending renal failure 10
• Predict death (mortality)
• Predict readmissions: in fiscal 2016 only 24%
(799/3400) of reporting hospitals will not receive a
penalty (0.1%-3% range) for too many
readmissions. 11-12
• Predict high cost patients for population health
care management: 5% of Medicare/Medicaid
patients use 50% of resources. 13-14
18. Predictive Modeling Approaches
1. Modeling with statistics
2. Modeling with Machine Learning
3. Modeling using the R or Python programming
languages (not covered)
21. Predictive Analytics
• The most common approach is to use
classification where you predict an outcome
(dependent variable) that is categorical data (e.g.
lived, died) with multiple predictors (independent
variables). For example, you have a data set of
pregnant women with Zika virus. Some have
children with micro-encephalopathy and others
don’t. You run a classification model to see if
factors such as age, trimester of infection, fever,
symptoms, etc. predict micro-encephalopathy
• If the outcome is numerical then you would use
linear regression
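A minimal sketch of the classification idea above, assuming a single centered numeric predictor and an outcome coded 0/1. Logistic regression is used here as one common classifier; the function names (fit_logistic, predict), the training settings and the data values are made up for illustration and are not real patient data.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit a one-predictor logistic regression by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # predicted probability minus label
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x, threshold=0.5):
    """Return the predicted category (1 or 0) for a predictor value."""
    return 1 if sigmoid(w * x + b) >= threshold else 0


# Made-up, centered predictor values and categorical outcomes (1 = event, 0 = no event).
xs = [-4, -3, -2, -1, 1, 2, 3, 4]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = fit_logistic(xs, ys)
print(predict(w, b, 2.5), predict(w, b, -2.5))
```

With several predictors the same idea applies, with one weight per predictor. For a numerical outcome the model would instead fit a line, which is the linear regression case mentioned above.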
22. Need for better data analytical tools
• We would benefit from more user friendly tools and
some degree of automation
• MS Excel with the Analysis ToolPak add-in is a
possibility but implies you know which stats tests to use
• There are also multiple statistical packages, such as
SPSS and SAS, also associated with a steep learning
curve
23. Need for better data analytical tools
• Tool #1: IBM Watson Analytics: automated
predictive, descriptive and visualization analytics
• Tool #2 WEKA: open source machine learning
platform
24. IBM Watson Analytics
• New program offered in 2015 that is not related to Watson
Health (cognitive computing). Business oriented
• Program is based on SPSS-based statistical tests. Covers
regression, classification, decision trees, chi-square, t-
tests, etc.
• Program can automatically convert nominal data to
numerical and vice versa
• Versions
• Free
• Professional (Academic)
25. IBM Watson Analytics Academic
program
• Free for universities to use for teaching (non-commercial)
purposes. Includes 100 students/professor/year
• University of West Florida has used the program for about
12 months in a Health Informatics graduate course and a
Data Mining (computer science) course
• IBM did an on-site visit for training
• Multiple videos on YouTube
• PDF user guide available
26. IBM Watson analytics features
• IBMWA is completely online
• Accepts Excel and CSV input, as well as feeds from
most relational database management systems (RDBMSs)
• 100 GB storage
• Limits: 500 columns and 10 million rows
• Twitter Feed analysis
28. IBMWA versions
• About the time we submitted our detailed analysis of Watson
Analytics, IBM had created Watson-2 that combined
“Explore” and “Predict” into “Discover.” Watson-1 will be
retired shortly.
• Watson-2 includes statistical details about prediction when
the target/outcome is numerical. They are working on
adding the statistical details for categorical
targets/outcomes.
• Watson-2 includes a “data quality” score but doesn’t point
out missing or skewed data and outliers.
38. Confusion matrix is created but not explained
(degree of malignancy and no recurrence)
                          Predicted
                          No recurrence   Recurrence   Total
Actual   No recurrence    TN = 161        FP = 40      201
         Recurrence       FN = 40         TP = 45      85
         Total            201             85           286
Accuracy = (TP + TN)/Total = 72%
Sensitivity (recall) = TP/(TP + FN) = 53%
Specificity = TN/(TN + FP) = 80%
Precision = TP/(TP + FP) = 53%
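The four measures can be recomputed directly from the confusion matrix counts. This is a minimal sketch with the grouping of each formula made explicit; the function name confusion_metrics is hypothetical, and negative predictive value (NPV) is added as an extra because it is a standard companion to precision (PPV).

```python
def confusion_metrics(tp, tn, fp, fn):
    """Compute common outcome measures from confusion matrix counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,    # all correct predictions / all cases
        "sensitivity": tp / (tp + fn),    # recall: actual positives found
        "specificity": tn / (tn + fp),    # actual negatives found
        "precision": tp / (tp + fp),      # positive predictive value (PPV)
        "npv": tn / (tn + fn),            # negative predictive value
    }


# Counts taken from the confusion matrix on this slide.
metrics = confusion_metrics(tp=45, tn=161, fp=40, fn=40)
for name, value in metrics.items():
    print(f"{name}: {value:.0%}")
```

Rounded to whole percentages, this reproduces the 72%, 53%, 80% and 53% shown on the slide.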
39. Create your Display
• Display can be shared by email, hyperlink, Tweet or downloaded
• Display is interactive
40. IBMWA limitations
• Business oriented, so not aligned perfectly with healthcare
data analytics. Predictive strength is good, but we are used
to sensitivity, specificity, PPV, NPV, ROC curves, etc.
• No choice of statistical tests
• IBMWA does not perform unsupervised learning
• This approach (results first, stats second) may not appeal to
purists
• Sample dataset I used was of excellent quality, therefore not
typical of many datasets
41. questions
• Process to apply for the academic program is easy. Apply at:
https://www.ibm.com/blogs/watson-analytics/calling-all-
academics-have-we-got-a-watson-analytics-for-you/
• IBM Contact information: Randy Messina at
randymessina@us.ibm.com
• My contact information: rhoyt@uwf.edu
IBMWA Application Process
42. Machine Learning (ML)
• Machine learning was developed by computer
scientists and is largely based on mathematics,
like statistics
• While some ML algorithms are difficult to
understand (e.g. neural networks), others are
easier, such as decision trees and regression
• Modeling is like baking: you decide what you
want to bake and then select the best recipe
(algorithm) to accomplish it. Optimally, you select
multiple recipes and compare the results!
Example: you want a model to decide what is
spam email. You test many algorithms for best
results and determine the best combination of
predictors
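The "compare several recipes" step can be sketched as follows. This is an illustration only: the three hand-written rules stand in for trained algorithms, and the messages, labels and function names are all made up. The point is the comparison loop, where every candidate is scored on the same held-out examples and the best one is kept.

```python
def always_ham(message):
    """Baseline: never flags anything as spam."""
    return "ham"


def keyword_rule(message):
    """Flags a message if it contains a suspicious word."""
    suspicious = {"free", "winner", "prize"}
    words = message.lower().split()
    return "spam" if any(word in suspicious for word in words) else "ham"


def exclamation_rule(message):
    """Flags a message with two or more exclamation marks."""
    return "spam" if message.count("!") >= 2 else "ham"


def accuracy(classifier, examples):
    """Fraction of labeled examples the classifier gets right."""
    correct = sum(classifier(text) == label for text, label in examples)
    return correct / len(examples)


# Made-up held-out examples: (message, true label).
held_out = [
    ("Claim your free prize now", "spam"),
    ("You are a winner!! Reply today!", "spam"),
    ("Meeting moved to 3 pm", "ham"),
    ("Lunch tomorrow?", "ham"),
    ("Free parking is available at the clinic", "ham"),
    ("Act now!!! Limited offer", "spam"),
]

candidates = [always_ham, keyword_rule, exclamation_rule]
results = {clf.__name__: accuracy(clf, held_out) for clf in candidates}
best_name = max(results, key=results.get)
print(results, "best:", best_name)
```

In practice the candidates would be real learning algorithms and accuracy would be only one of several measures compared.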
43. Algorithm Types
• Supervised learning
• Classification for categorical data (spam vs. not spam)
• Regression for numerical data ($, mortality rate)
• Unsupervised learning
• Association rules: an example would be market basket
analysis
• Clustering: when you don’t know the data categories
and you are looking for patterns in large data sets.
Used extensively with genetic data sets
• What ML has in common with the statistical approach
• Both will perform linear regression, logistic regression
and decision trees
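As a small illustration of association rules, market basket analysis usually reports the support and confidence of a rule such as "bread -> milk". The sketch below uses made-up shopping baskets and hypothetical function names; it shows only the two basic measures, not a full rule-mining algorithm.

```python
# Made-up baskets: each set holds the items bought in one transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in the itemset."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Of the baskets containing the antecedent, the fraction that also contain the consequent."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)


rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
print(f"support = {rule_support:.2f}, confidence = {rule_confidence:.2f}")
```

Here bread and milk appear together in 3 of 5 baskets (support 0.60), and 3 of the 4 baskets with bread also contain milk (confidence 0.75).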
44. Open Source Free ML Programs
• WEKA 15
• Pentaho Community 16
• RapidMiner Community 17
• KNIME 18
• Orange 19
46. WEKA
• Named after a bird in New Zealand and stands
for Waikato Environment for Knowledge
Analysis
• Free software is associated with a free ML
course and a low cost textbook
• Software works on all operating systems
• WEKA is the only ML program mentioned that
does not require moving around widgets or
operators
49. Outcome Measurements
Accuracy is hitting the bull’s-eye every time.
Precision is hitting the same place each time, even if it is not the place you aimed for.
53. Predictive Analytics Report Card
• Many risk prediction models yield mediocre
results at this point (C-statistic 0.56-0.80), but we
are early in the game.
• Ideally, models need to work in real time
• Some risk models are used in healthcare
organizations that might not fit your patient
demographic, such as safety-net hospitals, etc.
• It is helpful to identify patients at risk for
morbidity and mortality but you still have to have
an intervention team, ready to apply additional
resources to high risk patients
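The C-statistic quoted above can be read as the probability that a randomly chosen patient who had the outcome received a higher risk score than a randomly chosen patient who did not. A minimal sketch of that pairwise calculation follows; the function name c_statistic and the risk scores are made up for illustration.

```python
def c_statistic(scores, labels):
    """Share of (positive, negative) pairs where the positive case has the higher score.

    Ties count as half a concordant pair. Labels are 1 (outcome occurred) or 0.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    concordant = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(positives) * len(negatives))


# Made-up predicted risks and observed outcomes for six patients.
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
labels = [1, 1, 1, 0, 0, 0]
c = c_statistic(scores, labels)
print(f"C-statistic = {c:.2f}")
```

In this toy example 8 of the 9 positive-negative pairs are ranked correctly, giving about 0.89. A model that assigns every patient the same score gets 0.5, which is no better than chance.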
54. Data Science Education Stats
• Certificate (82); Bachelor (24); Masters (259);
Doctorate (14)
• 37% of courses are offered online
• 101 programs were related to business schools,
40 related to mathematics and statistics
departments, 39 related to computer science
departments and 9 related to new data science
departments. The remainder were from a wide
variety of college and university departments.
55. Data Science Centers
• Multiple universities and medical centers have
created “data science centers” to provide the right
environment for data analysis and research
• They tend to be multi-disciplinary and not just
relegated to the computer science department
• Every industry seems to have interest in data
science and analytics, hence the need to create
a central hub
56. Data Science Resources
• My web site www.informaticseducation.org has a
resource center with Chapter 23: Data Science
resources:
• Data sets: health and non-health related
• Free data science courses
• Free statistics resources
• Free visualization software
• Free programming tutorials
• Other helpful stuff
• Chapter 23 is available thru Lulu.com for $2.99
57. ONC Sponsored Free Courses
• Healthcare Data Analytics
• Bellevue College (limited to Veterans Administration
Staff Only)
• Columbia University
• Normandale College
• Oregon Health & Science University
• University of Alabama at Birmingham
• University of Texas Health Science Center at Houston
58. Machine Learning Resources
• I would recommend beginning with Jason
Brownlee’s eBooks:
• Machine Learning Algorithms $27 (163 pages)
• Machine Learning Mastery with WEKA $27 (248 pages)
• www.machinelearningmastery.com
60. Data Science Challenges
• Not enough data scientists; it is estimated that we
will need 140,000 by 2018 20
• Not enough data science training programs
• Expensive to build big data and data science
centers
• Privacy and security concerns
• Hype. Adverse unintended consequences (AUCs)
• Medical data is heterogeneous and complex,
compared to other industries 21
• Correlation does not equal causation
• 80% of the time spent with data analysis is spent
preparing the data for analysis 22
61. Data Science Challenges
• Difficult to find patient-level data
• It has been stated that clinical medicine accounts
for only 20% of population health; 80% is due to
psycho-social-environmental-behavioral-
economic factors that are beyond the control of
the healthcare system. Therefore, interventions
based on good data can result in no impact 23
• Just because you have technology and
voluminous data doesn’t mean it changes patient
outcomes. Example: fitness devices affecting
behavior 24
62. Make data science part of patient care
Not everyone will be able to afford a robust
analytics platform overlaid on a clinical data
warehouse and the ability to handle Big Data.
But we can start the educational process to learn
more about data science
64. Conclusions
• Data science is a new information science that
serves as an umbrella for data creation,
manipulation, analysis and research
• Data scientists are in high demand and it will
take years before we can educate enough
scientists to meet the demand
• Data science is a team sport; it will require teams
with individual skill sets to accomplish robust
data science
• New tools such as Watson and WEKA likely
represent the beginning of analysis automation
65. Conclusions
• I encourage everyone to increase their
knowledge in data science areas, such as
predictive analytics
• There are a myriad of free and affordable
courses now available online (mentioned in my
blog and on the resource page)
• I encourage academic centers and HIT vendors
to expand their data science offerings at multiple
levels
1 Data Science Association. www.datascienceassn.org Accessed September 12, 2016
2 Analytics. Wikipedia. www.wikipedia.org Accessed January 16, 2016
3 Wegwarth O, Schwartz LM, Woloshin S et al. Do physicians understand cancer screening statistics? A national survey of primary care physicians in the United States. Ann Intern Med 2012;156(5):340-9
4 Manrai AK, Bhatia G, Strymish J et al. Medicine’s uncomfortable relationship with math. Research Letter. June 2014. JAMA Intern Med 2014;174(6):991-993
5 IBM Big Data and Analytics Hub http://www.ibmbigdatahub.com/blog/why-only-one-5-vs-big-data-really-matters
6 ONC definition of learning health system. Connecting Health and Care for the Nation. A Shared Nationwide Interoperability Roadmap. October 2015
7 National Research Council. Towards Precision Medicine: Building a Network for Biomedical Research and a new Taxonomy of Disease. National Academies Press. 2011
8 Gartner IT Glossary: http://www.gartner.com/it-glossary/predictive-analytics/
9 Desautels T, Calvert J, Hoffman J et al. Prediction of sepsis in the ICU with minimal EHR data: a machine learning approach. JMIR Medical Informatics 2016;4(3):e28
10 Echouffo-Tcheugui JB, Kengne AP. Risk models to predict chronic kidney disease and its progression: a systematic review. PLOS Medicine. November 20, 2012. Journals.plos.org
11 Most hospitals face 30 day readmissions penalty in fiscal 2016. August 3, 2015 www.modernhealthcare.com
12 Amarasingham R, Patel P, Tolo K et al. Allocating scarce resources in real-time to reduce heart failure readmissions: a prospective controlled trial. BMJ Quality Safety. July 31 2013
13 Stanton M. The high concentration of US Health Care expenditures. Research in Action. Issue 19. 2002. AHRQ Archive. https://archive.ahrq.gov
14 Chechulin Y, Nazerian A, Rais S. Predicting patients with high risk of becoming high cost healthcare users in Ontario. Healthcare Policy. 2014;9(3):68-79
From Doing Data Science by O’Neil and Schutt. O’Reilly Media. 2014
15 WEKA: http://www.cs.waikato.ac.nz/ml/weka/
16 Pentaho Community www.community.pentaho.com
17 RapidMiner Community www.community.rapidminer.com
18 KNIME www.knime.org
19 Orange data mining www.orange.biolab.si
C-statistic (used to compare logistic regression models): The probability that predicting the outcome is better than chance. Used to compare the goodness of fit of logistic regression models, values for this measure range from 0.5 to 1.0. A value of 0.5 indicates that the model is no better than chance at making a prediction of membership in a group and a value of 1.0 indicates that the model perfectly identifies those within a group and those not. Models are typically considered reasonable when the C-statistic is higher than 0.7 and strong when C exceeds 0.8 (Hosmer & Lemeshow, 2000; Hosmer & Lemeshow, 1989). http://mchp-appserv.cpe.umanitoba.ca/viewDefinition.php?definitionID=104234
Area under the curve: based on prediction rules with true positives plotted against false positives (1-specificity). The closer to 1, the better. 0.5 is essentially worthless. http://gim.unmc.edu/dxtests/roc3.htm
20 Manyika J, Chui M, Brown B et al. Big Data: The Next Frontier for Innovation, Competition and Productivity. http://www.mckinsey.com/business-functions/digital-mckinsey/our-insights/big-data-the-next-frontier-for-innovation
21 Cios KJ, Moore GW. Uniqueness of medical data mining. Art Int Med 2002;26:1-24
22 Press G. Cleaning big data: most time consuming, least enjoyable data science task, survey says. Forbes. March 23 2016 www.forbes.com
23 Jacobsen RM, Isham GJ, Rutten LJF. Population Health as a means for health care organizations to deliver value. Mayo Clinic Proceedings November 2015;90(11):1465-1470
24 Jakicic JM, Davis KK, Rogers RJ et al. Effect of wearable technology combined with a lifestyle intervention on long-term weight loss. The IDEA RCT. JAMA 2016;316(11):1161-1171
Parikh RB, Obermeyer Z, Bates DW. Making predictive analytics a routine part of patient care. Harvard Business Review. April 21, 2016. https://hbr.org