Introduction to
Data Science
- Big Data & Data Analytics -
Yasas Senarath
Graduate Assistant Researcher at DataSEARCH
University of Moratuwa
Outline
● Introduction to Big Data and Data Science
● Data Driven Decision Making / D3M
● Importance of Big Data in Telehealth Services
● Data to Knowledge Process
● Techniques and Tools
Data is the new science.
Big data holds the answers.
-Pat Gelsinger, CEO, VMware
What is Big Data?
“Big data is a term used to refer to data sets that are too large or complex for traditional data-processing application software to adequately deal with.”
--Wikipedia
How to identify Big Data?
● Attributes that define big data (the 4 V’s)
○ Volume
○ Velocity
○ Variety
○ Veracity
Where does Big Data come from?
● Mobile Devices
● Internet of Things (IoT)
● Social Media
● Satellite Imagery
Data Science
● Emerging Discipline
● No exact definition (different definitions exist from different perspectives)
“Data science is the empirical synthesis of actionable knowledge from raw data through the data lifecycle process”
-NIST†
†National Institute of Standards and Technology
Why Data Science?
● Extract new value, insights and hypotheses
● Derive new knowledge from existing data
● Understand customers’ behaviour
● Help suppliers understand market demand
● Build recommender systems
● Build predictive systems
Data-driven decision making (DDDM/D3M)
Data-driven decision making (DDDM) involves making decisions that are backed up by hard data rather than making decisions that are intuitive or based on observation alone.
MIT Sloan School of Management professors Andrew McAfee and Erik Brynjolfsson explain in a Wall Street Journal article that companies that were mostly data-driven had 4% higher productivity and 6% higher profits than the average.
4 Stages of Data Analytics Maturity
Big Data in Telehealth Services
● Predict Admission Rates
○ Big data is helping to forecast patient admissions, at least at a few hospitals in Paris
○ A Forbes article† details how four hospitals which are part of the Assistance
Publique-Hôpitaux de Paris have been using data from a variety of sources to
come up with daily and hourly predictions of how many patients are expected
to be at each hospital
● Electronic Health Records (EHRs)
○ Trigger warnings and reminders when a patient should get a new lab test or
track prescriptions to see if a patient has been following doctors’ orders
○ Hospitals adopting EHR?
†https://bit.ly/2FSzTZk
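A minimal sketch of an hourly admission forecast, assuming the only input is a history of (hour of day, admissions) observations. The data and the function name hourly_profile are hypothetical, and averaging by hour of day is only a simple baseline, not the method used by the Paris hospitals.

```python
from collections import defaultdict
from statistics import mean


def hourly_profile(history):
    """Average admissions per hour of day from (hour, count) observations."""
    by_hour = defaultdict(list)
    for hour, count in history:
        by_hour[hour].append(count)
    # The average for an hour serves as the forecast for that hour.
    return {hour: mean(counts) for hour, counts in by_hour.items()}


# Hypothetical observations: (hour of day, admissions during that hour).
history = [(9, 12), (9, 14), (9, 16), (14, 20), (14, 22), (22, 5), (22, 7)]
profile = hourly_profile(history)

for hour in sorted(profile):
    print(f"{hour:02d}:00 expected admissions: {profile[hour]}")
```

A production forecast would normally add further signals (day of week, weather, local events), which is where data from a variety of sources comes in.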
Big Data in Telehealth Services
● Real-Time Alerting
○ Wearables collect data from patients and send it to the cloud
○ React whenever the results are alarming
[Diagram: Send data periodically, Alert the doctor, Administer measures, Analyze, Better treatment plans]
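The alerting step can be sketched as a simple range check on each incoming reading. The function name check_reading and the thresholds below are illustrative assumptions, not clinical guidance; a real system would use limits set by clinicians for each patient.

```python
def check_reading(heart_rate_bpm, low=50, high=120):
    """Return an alert message if the reading is outside [low, high], else None."""
    if heart_rate_bpm < low:
        return f"ALERT: heart rate {heart_rate_bpm} bpm is below {low}"
    if heart_rate_bpm > high:
        return f"ALERT: heart rate {heart_rate_bpm} bpm is above {high}"
    return None


# Hypothetical readings sent periodically by a wearable device.
readings = [72, 75, 131, 68, 44]

alerts = []
for reading in readings:
    message = check_reading(reading)
    if message is not None:
        alerts.append(message)  # in practice: notify the doctor

for message in alerts:
    print(message)
```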
Big Data in Telehealth Services
● Patient Satisfaction Monitoring
○ Collect data on patients’ sentiment toward the doctor / hospital
○ For example,
■ Whether the doctor explained the treatment understandably
■ Whether the patient had confidence and trust in the treating physician
○ Analyze and use it to improve the quality of health services
● Minimizing Waiting Time
○ Predict when the patient should be available to see the doctor
Big Data in Business
● Sentiment / Opinion Analysis
○ Analyze Social Media Posts and forums
○ Learn how customers feel about your products
○ Give attention where required
● Understanding, Targeting And Serving Customers
○ Analyze usage patterns and understand the customer base (e.g. demographics)
○ Targeted Advertising
○ Improved service
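As a rough illustration of sentiment analysis on social media posts, the sketch below counts positive and negative words from a tiny hand-made lexicon. The word lists, posts, and the function name sentiment_score are hypothetical; practical systems usually rely on trained classifiers rather than word counts.

```python
# Tiny hand-made lexicons, for illustration only.
POSITIVE = {"good", "great", "helpful", "love", "fast"}
NEGATIVE = {"bad", "slow", "rude", "hate", "broken"}


def sentiment_score(text):
    """Positive-word count minus negative-word count (above 0 means positive)."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


posts = [
    "Great service, very helpful staff!",
    "The app is slow and the support was rude.",
    "Delivery arrived on Tuesday.",
]

for post in posts:
    print(sentiment_score(post), post)
```

Posts with a negative score are candidates for the "give attention where required" step.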
Data to Knowledge Process
Data to Knowledge Process [contd...]
[Diagram: Data Acquisition → Data Storage → Data Cleaning → Data Manipulation → Analytics → Communication & Visualization]
● Electronic Medical Records (EMRs)
● User-generated data (Fitbit, Apple Watch)
● Doctor Channelling Records
● System Logs
● Patient Details
...
● Data acquisition and data formats
● Privacy and ethical issues
Data to Knowledge Process [contd...]
● Big Data
● CSV, TSV, Excel (XLS/XLSX)
● Databases (MySQL, NoSQL)
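CSV and TSV differ only in the delimiter, so one reader can handle both. The sketch below uses Python's standard csv module on in-memory text; the column names, values, and the helper read_rows are made up for illustration.

```python
import csv
import io

# Hypothetical file contents; real data would come from files on disk.
csv_text = "patient_id,age\n1,34\n2,58\n"
tsv_text = "patient_id\tage\n1\t34\n2\t58\n"


def read_rows(text, delimiter):
    """Parse delimited text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


csv_rows = read_rows(csv_text, ",")
tsv_rows = read_rows(tsv_text, "\t")

# Values are read as strings; convert types explicitly where needed.
ages = [int(row["age"]) for row in csv_rows]
print(csv_rows == tsv_rows, ages)
```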
Data to Knowledge Process [contd...]
● Missing Values
● Outliers
● Human Error
● Machine Error
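Two of the cleaning problems above, missing values and outliers, can be sketched with the standard library. Filling gaps with the median and flagging outliers with the 1.5 × IQR rule are common conventions chosen here as assumptions; the readings and function names are hypothetical.

```python
from statistics import median, quantiles


def impute_missing(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical heart-rate readings with one gap and one suspicious value.
raw = [72, None, 75, 70, 74, 190, 73]
cleaned = impute_missing(raw)
outliers = iqr_outliers(cleaned)
print(cleaned, outliers)
```

Whether a flagged value is a human error, a machine error, or a genuine extreme still needs a domain check before it is dropped.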
Data to Knowledge Process [contd...]
[Diagram: Data Manipulation → Analytics → Communication & Visualization, with Analytics covering Exploratory Data Analysis, Dependency and Relationship, and Machine Learning]
● Descriptive Statistics
● Clustering
● Looking for patterns
● Hypothesis testing
● Data tendency
● Groups, subgroups
● Looking for abnormality
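Descriptive statistics are usually the first step of exploratory analysis. A minimal sketch with Python's statistics module, using made-up waiting times in minutes:

```python
from statistics import mean, median, stdev

# Hypothetical patient waiting times, in minutes.
waits = [10, 12, 11, 13, 12, 14, 12]

summary = {
    "n": len(waits),
    "mean": mean(waits),
    "median": median(waits),
    "stdev": stdev(waits),  # sample standard deviation
    "min": min(waits),
    "max": max(waits),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```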
Data to Knowledge Process [contd...]
● Association
- Do changes in X (seem to) coincide with changes in Y?
● Correlation
- How to quantify the association between X and Y?
● Agreement
- Do X and Y agree?
● Causation
- Do changes in X cause changes in Y?
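One common way to quantify the association between X and Y is Pearson's correlation coefficient, sketched below in plain Python with made-up data. A value near +1 or -1 indicates a strong linear association, but it says nothing about whether X causes Y.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's r for two equal-length sequences of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical paired measurements.
x = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]
falling = [10, 8, 6, 4, 2]

print(pearson(x, rising), pearson(x, falling))
```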
Data to Knowledge Process [contd...]
Techniques and Tools
● Plotting (Scatter Plot, Bar Chart)
● Correlation
○ Pearson’s correlation
○ Spearman’s rank correlation
● Agreement
○ Cohen’s Kappa Coefficient
● Regression
○ Linear Regression
○ Logistic Regression
● Classification
○ SVM
○ Decision Trees
● Java / Scala (Production)
○ Apache Hadoop (Distributed Computing)
○ Apache Spark (Unified Analytics Engine)
● Python (Research / Production)
○ scikit-learn (Machine Learning)
○ Keras (Deep Learning)
● Weka Package (Beginner)
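Two of the techniques listed above, Cohen's kappa and simple linear regression, are small enough to sketch in plain Python. The function names and data are hypothetical; in practice a library such as scikit-learn would normally be used instead of hand-written versions.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Agreement between two raters, corrected for chance agreement.

    Undefined (division by zero) when chance agreement is exactly 1.
    """
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)


def fit_line(x, y):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical binary labels from two raters on the same eight cases.
rater_a = [1, 1, 0, 0, 1, 0, 1, 0]
rater_b = [1, 1, 0, 1, 1, 0, 0, 0]
kappa = cohen_kappa(rater_a, rater_b)

# Hypothetical points lying on a straight line.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(kappa, slope, intercept)
```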
Q & A
“Hiding within those mounds of data is knowledge that could change the life of a patient, or change the world.”
--Atul Butte, Stanford
ysenarath wayasas wayasas ypsenarath
Challenges
● Privacy and Security
● Data collection and management
○ Complex Data
○ Noisy Data
○ Distributed Data
○ Data Integration
● Performance
● Background Knowledge

Data science / Big Data

  • 1. Introduction to Data Science - Big Data & Data Analytics - Yasas Senarath Graduate Assistant Researcher at DataSEARCH University of Moratuwa
  • 2. Outline ● Introduction to Big Data and Data Science ● Data Driven Decision Making / D3M ● Importance of Big Data in Telehealth Services ● Data to Knowledge Process ● Techniques and Tools
  • 3. Data is the new science. Big data holds the answers. -Pat Gelsinger, CEO, VMware
  • 4. What is Big Data? Big data is a term used to refer to data sets that are too large or complex for traditional data-processing application software to adequately deal with. --Wikipedia “
  • 5. ● Attributes that define big data (the 4 V’s) How to identify Big Data? Volume Velocity Variety Veracity
  • 6. ● Mobile Devices ● Internet of Things (IOT) ● Social Media ● Satellite Imagery Where does Big Data come from?
  • 7. <iframe width="640" height="480" src="https://ytcropper.com/embed/W_5c4d6496b5141/loop/noautoplay/" frameborder="0" allowfullscreen></iframe><a href="/" target="_blank">via ytCropper</a>
  • 8. ● Emerging Discipline ● No exact definition (Different definitions exist from different perspectives) Data Science †National Institute of Standards and Technology Data science is the empirical synthesis of actionable knowledge from raw data through the data lifecycle process -NIST† “
  • 9. Why Data Science? ● Exact new values, Insights and Hypothesis ● Derive new knowledge from existing data ● Understand customers’ behaviour ● Facilitate the demand market to suppliers ● Build Recommender systems ● Build predictive systems.
  • 10. Data-driven decision making (DDDM) involves making decisions that are backed up by hard data rather than making decisions that are intuitive or based on observation alone. MIT Sloan School of Management professors Andrew McAfee and Erik Brynjolfsson explain in a Wall Street Journal article that companies that were mostly data-driven had 4% higher productivity and 6% higher profits than the average. Data-driven decision making (DDDM/D3M)
  • 11. 4 Stages of Data Analytics Maturity
  • 12. Big Data in Telehealth Services ● Predict Admission Rates ○ Big data is helping to solve this problem, at least at a few hospitals in Paris ○ A Forbes article† details how four hospitals which are part of the Assistance Publique-Hôpitaux de Paris have been using data from a variety of sources to come up with daily and hourly predictions of how many patients are expected to be at each hospital ● Electronic Health Records (EHRs) ○ Trigger warnings and reminders when a patient should get a new lab test or track prescriptions to see if a patient has been following doctors’ orders ○ Hospitals adopting EHR? †https://bit.ly/2FSzTZk
  • 13. Big Data in Telehealth Services ● Real-Time Alerting ○ Wearables collect data from patients and send it to the cloud ○ The system reacts whenever the results are alarming ○ Diagram labels: Send data periodically, Alert the Doctor, Administer measures, Analyze, Better Treatment Plans
  • 14. Big Data in Telehealth Services ● Patient Satisfaction Monitoring ○ Collect data on the patient's sentiment towards the doctor / hospital ○ For example, ■ Whether the doctor explained the treatment understandably ■ Whether the patient had confidence and trust in the treating physician ○ Analyze and use it to improve the quality of health services ● Minimizing Waiting Time ○ Predict the time at which the patient should be available to see the doctor
  • 15. Big Data in Business ● Sentiment / Opinion Analysis ○ Analyze social media posts and forums ○ Learn how customers feel about your products ○ Give attention where required ● Understanding, Targeting and Serving Customers ○ Analyze usage patterns and understand the customer base (e.g. demographics) ○ Targeted advertising ○ Improved service
  • 16. Data to Knowledge Process
  • 17. Data to Knowledge Process [contd...] Data Manipulation Analytics Communication & Visualization Data Acquisition Data Storage Data Cleaning ● Electronic Medical Records (EMRs) ● User-generated data (Fitbit, iWatch) ● Doctor Channelling Records ● System Logs ● Patient Details ... ● Data acquisition and data formats ● Privacy and ethical issues
  • 18. Data to Knowledge Process [contd...] Data Manipulation Analytics Communication & Visualization Data Acquisition Data Storage Data Cleaning ● Big Data ● CSV, TSV, XL ● Databases (MySQL, NoSQL)
  • 19. Data to Knowledge Process [contd...] Data Manipulation Analytics Communication & Visualization Data Acquisition Data Storage Data Cleaning ● Missing Values ● Outliers ● Human Error ● Machine Error
  • 20. Data to Knowledge Process [contd...] Data Manipulation Analytics Communication & Visualization Exploratory Data Analysis Dependency and Relationship Machine Learning ● Descriptive Statistics ● Clustering ● Looking for patterns ● Hypothesis testing ● Data tendency ● Groups, subgroups ● Looking for abnormality
  • 21. Data to Knowledge Process [contd...] Data Manipulation Analytics Communication & Visualization Exploratory Data Analysis Dependency and Relationship Machine Learning ● Association - Do changes in X (seem to) coincide with changes in Y? ● Correlation - How to quantify the association between X and Y? ● Agreement - Do X and Y agree? ● Causation - Do changes in X cause changes in Y?
  • 22. Data to Knowledge Process [contd...] Data Manipulation Analytics Communication & Visualization Exploratory Data Analysis Dependency and Relationship Machine Learning
  • 23. Techniques and Tools ● Plotting (Scatter Plot, Bar Chart) ● Correlation ○ Pearson’s correlation ○ Spearman’s rank correlation ● Agreement ○ Cohen’s Kappa Coefficient ● Regression ○ Linear Regression ○ Logistic Regression ● Classification ○ SVM ○ Decision Trees ● Java / Scala (Production) ○ Apache Hadoop (Distributed Computing) ○ Apache Spark (Unified Analytics Engine) ● Python (Research / Production) ○ Scikit-Learn (Machine Learning) ○ Keras (Deep Learning) ● Weka Package (Beginner)
  • 24. Q & A “Hiding within those mounds of data is knowledge that could change the life of a patient, or change the world.” --Atul Butte, Stanford ysenarath wayasas wayasas ypsenarath
  • 25. Challenges ● Privacy and Security ● Data collection and management ○ Complex Data ○ Noisy Data ○ Distributed Data ○ Data Integration ● Performance ● Background Knowledge
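
The correlation and agreement measures named on slides 21 and 23 (Pearson’s correlation, Spearman’s rank correlation, Cohen’s kappa) can be sketched in plain Python. This is only a minimal illustration and not the presenter's code: the function names and the sample data are assumptions made for this example, and in practice a library such as Scikit-Learn (listed on slide 23) would normally be preferred.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's correlation coefficient for two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rank correlation: Pearson's coefficient computed on ranks."""
    return pearson(ranks(x), ranks(y))


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two lists of categorical labels of equal length.

    Undefined (division by zero) when chance agreement is exactly 1.
    """
    n = len(rater_a)
    labels = set(rater_a) | set(rater_b)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    expected = sum(
        (rater_a.count(label) / n) * (rater_b.count(label) / n)
        for label in labels
    )
    return (observed - expected) / (1 - expected)


# Hypothetical data: y grows monotonically but not linearly with x.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print("Pearson :", round(pearson(x, y), 3))
print("Spearman:", round(spearman(x, y), 3))

# Hypothetical labels given by two raters to the same six cases.
rater_a = ["yes", "yes", "no", "no", "yes", "no"]
rater_b = ["yes", "no", "no", "no", "yes", "yes"]
print("Kappa   :", round(cohen_kappa(rater_a, rater_b), 3))
```

On this sample data the rank correlation is higher than the linear correlation because the relationship is monotonic but curved, which is the usual reason for reaching for Spearman’s coefficient when normality or linearity cannot be assumed.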

Editor's Notes

  1. Data veracity: uncertain or imprecise data. Data veracity is the degree to which data is accurate, precise and trusted. Data is often viewed as certain and reliable. In reality, problem spaces, data sets and operational environments often yield data that is uncertain, imprecise and difficult to trust.
  2. National Institute of Standards and Technology
  3. https://www.wsj.com/articles/SB10001424052748704547804576260781324726782#articleTabs%3Darticle
  4. https://www.gartner.com/doc/3818364/itscore-data-analytics Hindsight: understanding of a situation or event only after it has happened or developed.
  5. https://www.datapine.com/blog/big-data-examples-in-healthcare/
  6. https://www.datapine.com/blog/big-data-examples-in-healthcare/ Example: identifying asthma trends both at the individual level and across larger populations.
  7. DS: Descriptive statistics aims to summarize a sample, rather than use the data to learn about the population that the sample of data is thought to represent. For example, in papers reporting on human subjects, typically a table is included giving the overall sample size, sample sizes in important subgroups (e.g., for each treatment or exposure group), and demographic or clinical characteristics such as the average age, the proportion of subjects of each sex, the proportion of subjects with related comorbidities, etc.
  8. Scatter plots; Pearson’s correlation coefficient for two MC data types (assumed normal); Spearman’s rank correlation coefficient when either or both variables are ordinal (normality not assumed); Cohen’s kappa coefficient; linear regression; structural equation modelling.
  9. Scatter plots; Pearson’s correlation coefficient for two MC data types (assumed normal); Spearman’s rank correlation coefficient when either or both variables are ordinal (normality not assumed); Cohen’s kappa coefficient; linear regression; structural equation modelling.
  10. Logistic regression is used to describe data and to explain the relationship between one dependent binary variable and one or more nominal, ordinal, interval or ratio-level independent variables.
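
The relationship described in note 10, one binary dependent variable explained by an independent variable, can be illustrated with a small sketch. It fits a single-feature logistic regression by plain gradient descent on the log loss. Everything here is an assumption made for illustration: the function names, the learning rate, the number of epochs and the data (a made-up risk score against a 0/1 outcome). A library such as Scikit-Learn would normally be used for real work.

```python
from math import exp


def sigmoid(z):
    """Logistic function mapping any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + exp(-z))


def predict_proba(x, weight, bias):
    """Estimated probability that the binary outcome is 1 for feature value x."""
    return sigmoid(weight * x + bias)


def fit_logistic(xs, ys, learning_rate=0.1, epochs=10000):
    """Fit weight and bias by batch gradient descent on the mean log loss."""
    weight, bias = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction errors (p - y) drive both gradients.
        errors = [predict_proba(x, weight, bias) - y for x, y in zip(xs, ys)]
        grad_weight = sum(e * x for e, x in zip(errors, xs)) / n
        grad_bias = sum(errors) / n
        weight -= learning_rate * grad_weight
        bias -= learning_rate * grad_bias
    return weight, bias


# Hypothetical data: a risk score (independent variable) and whether the
# event occurred (binary dependent variable). The classes overlap slightly.
scores = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
outcomes = [0, 0, 0, 1, 0, 1, 1, 1]

weight, bias = fit_logistic(scores, outcomes)
for score in (1.0, 3.5):
    print(f"score={score}: P(outcome=1) = {predict_proba(score, weight, bias):.2f}")
```

The fitted weight is positive on this data, so higher scores map to higher predicted probabilities; the model outputs a probability, and a threshold (commonly 0.5) turns it into a class label.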