SlideShare a Scribd company logo
1 of 31
Download to read offline
Introduction to Data
Science and Analytics
Summer School 2015
Srinath Perera
VP Research
WSO2 Inc.
What is Data Science?
Extraction of
knowledge from large
volumes of data that are
structured or
unstructured.
It is a continuation of the
fields data mining and
predictive analytics
Data Science Pipeline
Example ( Road.lk) traffic Feed
1. Data as tweets
2. Extract time,
location, and traffic
level using NLP
3. Explore data
4. Model based on
time, and it is a
holiday
5. Predict traffic given
a time and location.
Real data is messay, often needs to cleaned up before
useful.
o Bad formats - ignore or treat like missing data
o Missing Data - extrapolate or remove data line
o Useless variables - remove
o Wrong data - e.g. aaa, bbb, joe, some might be
deliberate lie, or 99 may be a code for N/A
Data Cleanup
o Transform variables ( date formats, String to int)
o Create derived variables
o Derive country from IP
o age from ID card number
o Normalize strings
o e.g. stemm or use phonetic sounds
o different spellings and nicknames ( William->Bill)
o Feature value rescaling (e.g. most ML algorithms
needs value to rescaled to 0-1 range).
o Enrich (e.g. lookup and add age from profile)
Data Cleanup (Contd.)
Understand, and get a feel for what is expected
(models => densities, constraints) and
unexpected/ residuals (errors, outliers)
o think what this is data about? domain, background,
how it is collected, what each fields mean and range
of values.
o head, tail, count, all descriptives (Mean, Max, median,
percentiles .. ) - Five number Summary. Min. 1st Qu.
Median Mean 3rd Qu. Max.
o run a bunch of count/group-by statements to gauge if
I think it's corrupt.
Data Exploration
o Plot - take random sample and explore ( scatter plot)
o e.g. Draw scatter plot or Trellis Plot
o Find Dependencies between fields
o Calculate Correlation
o Dimensionality reduction
o Cluster and look visualize clusters
o Look at frequency distribution of each field and try to
find a known distribution if possible.
Data Exploration (Contd.)
Data Exploration (Contd.)
Feature Engineering
o Feature engineering is the art of finding feature that leads
simplest decision algorithm. ( Good features allow a
simple model to beat a complex model.)
o Best features may be a subset, or a combination, or
transformed version of the features.
How to do Feature Engineering?
o Manually pick by domain experts and trial and error.
o Search the possible combinations by training and
combining subsets (e.g. Random Forest)
o Use statistical concepts like correlation and
information criteria
o Reduce the features to a low dimension space using
techniques like PCA.
o Automatic Feature Learning though Deep Learning
o ...
Analysis
o Goal of analysis is to extract knowledge
o This knowledge usually come in one of the two forms
o KPI (Key Performance Indicators)
■ Describe key measurement for what is being
measured. (e.g. revenue per year, profit margin,
revenue for sqft in retail, revenue per employer)
o Models to describe or predict the data
■ e.g. Machine Learning models or Statistical models
4 Analysis types by time to decision
o Hindsight ( what happened?)
o Done using Batch Analytics like MapReduce
o Oversight ( what is happening?)
o Done using Realtime Analytics technologies like CEP
o Insight ( why things happening?)
o Done with Data Mining and Unsupervised learning
algorithms like Clustering
o Foresight ( what will happen?)
o Done by building models using Machine learning or
one of other techniques
Data Analytics Tools Landscape
Batch Analytics: SparkSQL
Realtime Analytics: Complex Event
Processing
Interactive Analytics
o Define Indexes on Collected
data ( Streams)
o Issue, dynamic queries and get
results right away. ( Powered
by Apache Lucene)
o Shows multiples events from
same activity together using
custom defined activity IDs
o Useful for data exploration
o Powered by Apache Lucene,
with support for Index
Sharding
Predictive Analytics
o Build models and use them
with WSO2 CEP, BAM and
ESB using WSO2 Machine
Learner Product ( 2015 Q3)
o Build model using R, export
them as PMML, and use
within WSO2 CEP
WSO2 Machine Learner
o Sample, explore, and
understand data
through visualizations
o A wizard to configure,
train machine learning
models, and select the
best model
o Find and use those
models with WSO2
CEP, BAM and ESB
o Powered by Apache
Spark MLLib
Building Decision Models
A model describe how a system behave when inputs
changes. There are many ways to build models.
see https://icrunchdatanews.com/what-are-predictive-models/
o Regression models and ML Models
Time series models
o Statistical models
o Physical Models - based on physical
phenomena. They include 6-DoF flight
models, space flight models Weather
models.
o Mathematical Models
Verification
o All is good, now you have a
model. You must verify that it
is correct before using it in the
real world.
o Prediction can be verified by
waiting for events to occur
o Relationships like causality (e.g.
having free shipping leads a
customer to buy more) must be
verified with A/B testing
o Let’s look at few of pitfalls
Pitfalls: Experiment vs Observation
o If you follow scientific method,
you would do experiments, and
they have control sets ( A/B) tests.
o Bigdata does not have a control set,
it is rather observations. ( we
observe the world as it happens)
o So what we can tell are limited.
o Correlation does not imply
Causality!!
o Send a book home example [1]
o All big buyers have free
shipping
Causality: What can we do?
o Option 1: We can act on
correlation if we can verify the
guess or if correctness is not
critical (Start Investigation,
Check for a disease, Marketing )
o Option 2: We verify correlations
using A/B testing or propensity
analysis
http://www.fastcodesign.com/1671172/how-a-story-from-world-war-ii-shapes-facebook-today, Pic from http:
//www.phibetaiota.net/2011/09/defdog-the-importance-of-selection-bias-in-statistics/
Pitfalls: Think about the Missing Data
o WW II, Returned Aircrafts and data on where they
were hit?
o How would you add Armour?
Abraham
Wald
o Dashboard give an “Overall idea” in a glance (e.g. car
dashboard)
o Support for personalization, you can build your own dashboard.
o Also the entry point for Drill down
o How to build?
o WSO2 DAS supports a gadget generation WIzard
o Or you can write your own Gadgets using D3 and Javascript.
Communicate: Dashboards
Communicate: Alerts
o Detecting conditions can
be done via CEP Queries.
Key is the “Last Mile”.
o Email
o SMS
o Push notifications to a
UI
o Pager
o Trigger physical Alarm
o How?
o Select Email sender “Output Adaptor” from CEP, or
send from CEP to ESB, and ESB has lot of
connectors
o How?
o Write data to a database from CEP event tables
o Build Services via WSO2 Data Service
o Expose them as APIs via API Manager
Communicate: APIs
o With mobile Apps, most data
are exposed and shared as
APIs (REST/Json ) to end
users.
o Need to expose analytics
results as API
o Following are some challenges
o Security and Permissions
o API Discovery, Billing,
throttling, quotas & SLA
Communicate: Realtime Soccer Analytics
Watch at: https://www.youtube.com/watch?
v=nRI6buQ0NOM
Data Science Pipeline
Conclusion
o Data Science is extracting
knowledge by analyzing
data
o Discussed the pipeline and
tools you can use to do
that
o Rest of summer school
will look at different
aspects in detail.
o All tools discussed are
available free under
Apache Licence.

More Related Content

What's hot

Data Modeling for Microservices with Cassandra and Spark
Data Modeling for Microservices with Cassandra and SparkData Modeling for Microservices with Cassandra and Spark
Data Modeling for Microservices with Cassandra and SparkJeffrey Carpenter
 
WSO2 Data Analytics Server - Product Overview
WSO2 Data Analytics Server - Product OverviewWSO2 Data Analytics Server - Product Overview
WSO2 Data Analytics Server - Product OverviewWSO2
 
Building a data pipeline to ingest data into Hadoop in minutes using Streamse...
Building a data pipeline to ingest data into Hadoop in minutes using Streamse...Building a data pipeline to ingest data into Hadoop in minutes using Streamse...
Building a data pipeline to ingest data into Hadoop in minutes using Streamse...Guglielmo Iozzia
 
Five Data Models for Sharding | Nordic PGDay 2018 | Craig Kerstiens
Five Data Models for Sharding | Nordic PGDay 2018 | Craig KerstiensFive Data Models for Sharding | Nordic PGDay 2018 | Craig Kerstiens
Five Data Models for Sharding | Nordic PGDay 2018 | Craig KerstiensCitus Data
 
Monitoring @ scale over diverse data sources @ PayPal - Druid, TSDB, Hadoop
Monitoring @ scale over diverse data sources @ PayPal  - Druid, TSDB, HadoopMonitoring @ scale over diverse data sources @ PayPal  - Druid, TSDB, Hadoop
Monitoring @ scale over diverse data sources @ PayPal - Druid, TSDB, HadoopSenthil Pandurangan
 
Voxxed days thessaloniki 21/10/2016 - Streaming Engines for Big Data
Voxxed days thessaloniki 21/10/2016 - Streaming Engines for Big DataVoxxed days thessaloniki 21/10/2016 - Streaming Engines for Big Data
Voxxed days thessaloniki 21/10/2016 - Streaming Engines for Big DataStavros Kontopoulos
 
Introduction to Apache Apex by Thomas Weise
Introduction to Apache Apex by Thomas WeiseIntroduction to Apache Apex by Thomas Weise
Introduction to Apache Apex by Thomas WeiseBig Data Spain
 
Protecting privacy in practice
Protecting privacy in practiceProtecting privacy in practice
Protecting privacy in practiceLars Albertsson
 
Webinar: How Banks Use MongoDB as a Tick Database
Webinar: How Banks Use MongoDB as a Tick DatabaseWebinar: How Banks Use MongoDB as a Tick Database
Webinar: How Banks Use MongoDB as a Tick DatabaseMongoDB
 
Using Spark-Solr at Scale: Productionizing Spark for Search with Apache Solr...
 Using Spark-Solr at Scale: Productionizing Spark for Search with Apache Solr... Using Spark-Solr at Scale: Productionizing Spark for Search with Apache Solr...
Using Spark-Solr at Scale: Productionizing Spark for Search with Apache Solr...Databricks
 
MongoDB as a Data Warehouse: Time Series and Device History Data (Medtronic)
MongoDB as a Data Warehouse: Time Series and Device History Data (Medtronic)MongoDB as a Data Warehouse: Time Series and Device History Data (Medtronic)
MongoDB as a Data Warehouse: Time Series and Device History Data (Medtronic)MongoDB
 
A primer on building real time data-driven products
A primer on building real time data-driven productsA primer on building real time data-driven products
A primer on building real time data-driven productsLars Albertsson
 
Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Ana...
Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Ana...Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Ana...
Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Ana...Big Data Spain
 
xGem Data Stream Processing
xGem Data Stream ProcessingxGem Data Stream Processing
xGem Data Stream ProcessingJorge Hirtz
 
Building a Distributed Reservation System with Cassandra (Andrew Baker & Jeff...
Building a Distributed Reservation System with Cassandra (Andrew Baker & Jeff...Building a Distributed Reservation System with Cassandra (Andrew Baker & Jeff...
Building a Distributed Reservation System with Cassandra (Andrew Baker & Jeff...DataStax
 
Top 5 Things to Know About Integrating MongoDB into Your Data Warehouse
Top 5 Things to Know About Integrating MongoDB into Your Data WarehouseTop 5 Things to Know About Integrating MongoDB into Your Data Warehouse
Top 5 Things to Know About Integrating MongoDB into Your Data WarehouseMongoDB
 
Data pipelines from zero to solid
Data pipelines from zero to solidData pipelines from zero to solid
Data pipelines from zero to solidLars Albertsson
 
Succinct Spark: Fast Interactive Queries on Compressed RDDs by Rachit Agarwal
Succinct Spark: Fast Interactive Queries on Compressed RDDs by Rachit AgarwalSuccinct Spark: Fast Interactive Queries on Compressed RDDs by Rachit Agarwal
Succinct Spark: Fast Interactive Queries on Compressed RDDs by Rachit AgarwalSpark Summit
 
Advanced Analytics for Any Data at Real-Time Speed
Advanced Analytics for Any Data at Real-Time SpeedAdvanced Analytics for Any Data at Real-Time Speed
Advanced Analytics for Any Data at Real-Time Speeddanpotterdwch
 
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data LakesWebinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data LakesMongoDB
 

What's hot (20)

Data Modeling for Microservices with Cassandra and Spark
Data Modeling for Microservices with Cassandra and SparkData Modeling for Microservices with Cassandra and Spark
Data Modeling for Microservices with Cassandra and Spark
 
WSO2 Data Analytics Server - Product Overview
WSO2 Data Analytics Server - Product OverviewWSO2 Data Analytics Server - Product Overview
WSO2 Data Analytics Server - Product Overview
 
Building a data pipeline to ingest data into Hadoop in minutes using Streamse...
Building a data pipeline to ingest data into Hadoop in minutes using Streamse...Building a data pipeline to ingest data into Hadoop in minutes using Streamse...
Building a data pipeline to ingest data into Hadoop in minutes using Streamse...
 
Five Data Models for Sharding | Nordic PGDay 2018 | Craig Kerstiens
Five Data Models for Sharding | Nordic PGDay 2018 | Craig KerstiensFive Data Models for Sharding | Nordic PGDay 2018 | Craig Kerstiens
Five Data Models for Sharding | Nordic PGDay 2018 | Craig Kerstiens
 
Monitoring @ scale over diverse data sources @ PayPal - Druid, TSDB, Hadoop
Monitoring @ scale over diverse data sources @ PayPal  - Druid, TSDB, HadoopMonitoring @ scale over diverse data sources @ PayPal  - Druid, TSDB, Hadoop
Monitoring @ scale over diverse data sources @ PayPal - Druid, TSDB, Hadoop
 
Voxxed days thessaloniki 21/10/2016 - Streaming Engines for Big Data
Voxxed days thessaloniki 21/10/2016 - Streaming Engines for Big DataVoxxed days thessaloniki 21/10/2016 - Streaming Engines for Big Data
Voxxed days thessaloniki 21/10/2016 - Streaming Engines for Big Data
 
Introduction to Apache Apex by Thomas Weise
Introduction to Apache Apex by Thomas WeiseIntroduction to Apache Apex by Thomas Weise
Introduction to Apache Apex by Thomas Weise
 
Protecting privacy in practice
Protecting privacy in practiceProtecting privacy in practice
Protecting privacy in practice
 
Webinar: How Banks Use MongoDB as a Tick Database
Webinar: How Banks Use MongoDB as a Tick DatabaseWebinar: How Banks Use MongoDB as a Tick Database
Webinar: How Banks Use MongoDB as a Tick Database
 
Using Spark-Solr at Scale: Productionizing Spark for Search with Apache Solr...
 Using Spark-Solr at Scale: Productionizing Spark for Search with Apache Solr... Using Spark-Solr at Scale: Productionizing Spark for Search with Apache Solr...
Using Spark-Solr at Scale: Productionizing Spark for Search with Apache Solr...
 
MongoDB as a Data Warehouse: Time Series and Device History Data (Medtronic)
MongoDB as a Data Warehouse: Time Series and Device History Data (Medtronic)MongoDB as a Data Warehouse: Time Series and Device History Data (Medtronic)
MongoDB as a Data Warehouse: Time Series and Device History Data (Medtronic)
 
A primer on building real time data-driven products
A primer on building real time data-driven productsA primer on building real time data-driven products
A primer on building real time data-driven products
 
Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Ana...
Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Ana...Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Ana...
Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Ana...
 
xGem Data Stream Processing
xGem Data Stream ProcessingxGem Data Stream Processing
xGem Data Stream Processing
 
Building a Distributed Reservation System with Cassandra (Andrew Baker & Jeff...
Building a Distributed Reservation System with Cassandra (Andrew Baker & Jeff...Building a Distributed Reservation System with Cassandra (Andrew Baker & Jeff...
Building a Distributed Reservation System with Cassandra (Andrew Baker & Jeff...
 
Top 5 Things to Know About Integrating MongoDB into Your Data Warehouse
Top 5 Things to Know About Integrating MongoDB into Your Data WarehouseTop 5 Things to Know About Integrating MongoDB into Your Data Warehouse
Top 5 Things to Know About Integrating MongoDB into Your Data Warehouse
 
Data pipelines from zero to solid
Data pipelines from zero to solidData pipelines from zero to solid
Data pipelines from zero to solid
 
Succinct Spark: Fast Interactive Queries on Compressed RDDs by Rachit Agarwal
Succinct Spark: Fast Interactive Queries on Compressed RDDs by Rachit AgarwalSuccinct Spark: Fast Interactive Queries on Compressed RDDs by Rachit Agarwal
Succinct Spark: Fast Interactive Queries on Compressed RDDs by Rachit Agarwal
 
Advanced Analytics for Any Data at Real-Time Speed
Advanced Analytics for Any Data at Real-Time SpeedAdvanced Analytics for Any Data at Real-Time Speed
Advanced Analytics for Any Data at Real-Time Speed
 
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data LakesWebinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
 

Viewers also liked

Icse15 Tech-briefing Data Science
Icse15 Tech-briefing Data ScienceIcse15 Tech-briefing Data Science
Icse15 Tech-briefing Data ScienceCS, NcState
 
Top Data Science Trends for 2015
Top Data Science Trends for 2015Top Data Science Trends for 2015
Top Data Science Trends for 2015VMware Tanzu
 
Mapping (big) data science (15 dec2014)대학(원)생
Mapping (big) data science (15 dec2014)대학(원)생Mapping (big) data science (15 dec2014)대학(원)생
Mapping (big) data science (15 dec2014)대학(원)생Han Woo PARK
 
Data Science ATL Meetup - Risk I/O Security Data Science
Data Science ATL Meetup - Risk I/O Security Data ScienceData Science ATL Meetup - Risk I/O Security Data Science
Data Science ATL Meetup - Risk I/O Security Data ScienceMichael Roytman
 
Data Science in the Real World: Making a Difference
Data Science in the Real World: Making a Difference Data Science in the Real World: Making a Difference
Data Science in the Real World: Making a Difference Srinath Perera
 
Pivotal Digital Transformation Forum: Accelerate Time to Market with Business...
Pivotal Digital Transformation Forum: Accelerate Time to Market with Business...Pivotal Digital Transformation Forum: Accelerate Time to Market with Business...
Pivotal Digital Transformation Forum: Accelerate Time to Market with Business...VMware Tanzu
 
[2A7]Linkedin'sDataScienceWhyIsItScience
[2A7]Linkedin'sDataScienceWhyIsItScience[2A7]Linkedin'sDataScienceWhyIsItScience
[2A7]Linkedin'sDataScienceWhyIsItScienceNAVER D2
 
Pivotal Digital Transformation Forum: Becoming a Data Driven Enterprise
Pivotal Digital Transformation Forum: Becoming a Data Driven EnterprisePivotal Digital Transformation Forum: Becoming a Data Driven Enterprise
Pivotal Digital Transformation Forum: Becoming a Data Driven EnterpriseVMware Tanzu
 
저성장 시대 데이터 경제만이 살길이다
저성장 시대 데이터 경제만이 살길이다저성장 시대 데이터 경제만이 살길이다
저성장 시대 데이터 경제만이 살길이다eungjin cho
 
Pivotal Digital Transformation Forum: Data Science
Pivotal Digital Transformation Forum: Data Science Pivotal Digital Transformation Forum: Data Science
Pivotal Digital Transformation Forum: Data Science VMware Tanzu
 
Data Science Driven Malware Detection
Data Science Driven Malware DetectionData Science Driven Malware Detection
Data Science Driven Malware DetectionVMware Tanzu
 
Putting the Magic in Data Science
Putting the Magic in Data SciencePutting the Magic in Data Science
Putting the Magic in Data ScienceSean Taylor
 
The Science of a Great Career in Data Science
The Science of a Great Career in Data ScienceThe Science of a Great Career in Data Science
The Science of a Great Career in Data ScienceKate Matsudaira
 
Data Science in Digital Health
Data Science in Digital HealthData Science in Digital Health
Data Science in Digital HealthNeal Lathia
 
Python for Data Science - TDC 2015
Python for Data Science - TDC 2015Python for Data Science - TDC 2015
Python for Data Science - TDC 2015Gabriel Moreira
 
Intro to Data Science for Non-Data Scientists
Intro to Data Science for Non-Data ScientistsIntro to Data Science for Non-Data Scientists
Intro to Data Science for Non-Data ScientistsSri Ambati
 
[FAST CAMPUS] 1강 data science overview
[FAST CAMPUS] 1강 data science overview [FAST CAMPUS] 1강 data science overview
[FAST CAMPUS] 1강 data science overview chanyoonkim
 
Google Dev Summit Extended Seoul - TensorFlow: Tensorboard & Keras
Google Dev Summit Extended Seoul - TensorFlow: Tensorboard & KerasGoogle Dev Summit Extended Seoul - TensorFlow: Tensorboard & Keras
Google Dev Summit Extended Seoul - TensorFlow: Tensorboard & KerasTaegyun Jeon
 
An Overview of AI on the AWS Platform - February 2017 Online Tech Talks
An Overview of AI on the AWS Platform - February 2017 Online Tech TalksAn Overview of AI on the AWS Platform - February 2017 Online Tech Talks
An Overview of AI on the AWS Platform - February 2017 Online Tech TalksAmazon Web Services
 

Viewers also liked (20)

Icse15 Tech-briefing Data Science
Icse15 Tech-briefing Data ScienceIcse15 Tech-briefing Data Science
Icse15 Tech-briefing Data Science
 
Top Data Science Trends for 2015
Top Data Science Trends for 2015Top Data Science Trends for 2015
Top Data Science Trends for 2015
 
Mapping (big) data science (15 dec2014)대학(원)생
Mapping (big) data science (15 dec2014)대학(원)생Mapping (big) data science (15 dec2014)대학(원)생
Mapping (big) data science (15 dec2014)대학(원)생
 
Data Science ATL Meetup - Risk I/O Security Data Science
Data Science ATL Meetup - Risk I/O Security Data ScienceData Science ATL Meetup - Risk I/O Security Data Science
Data Science ATL Meetup - Risk I/O Security Data Science
 
Data Science in the Real World: Making a Difference
Data Science in the Real World: Making a Difference Data Science in the Real World: Making a Difference
Data Science in the Real World: Making a Difference
 
Pivotal Digital Transformation Forum: Accelerate Time to Market with Business...
Pivotal Digital Transformation Forum: Accelerate Time to Market with Business...Pivotal Digital Transformation Forum: Accelerate Time to Market with Business...
Pivotal Digital Transformation Forum: Accelerate Time to Market with Business...
 
[2A7]Linkedin'sDataScienceWhyIsItScience
[2A7]Linkedin'sDataScienceWhyIsItScience[2A7]Linkedin'sDataScienceWhyIsItScience
[2A7]Linkedin'sDataScienceWhyIsItScience
 
Pivotal Digital Transformation Forum: Becoming a Data Driven Enterprise
Pivotal Digital Transformation Forum: Becoming a Data Driven EnterprisePivotal Digital Transformation Forum: Becoming a Data Driven Enterprise
Pivotal Digital Transformation Forum: Becoming a Data Driven Enterprise
 
저성장 시대 데이터 경제만이 살길이다
저성장 시대 데이터 경제만이 살길이다저성장 시대 데이터 경제만이 살길이다
저성장 시대 데이터 경제만이 살길이다
 
Pivotal Digital Transformation Forum: Data Science
Pivotal Digital Transformation Forum: Data Science Pivotal Digital Transformation Forum: Data Science
Pivotal Digital Transformation Forum: Data Science
 
Data Science Driven Malware Detection
Data Science Driven Malware DetectionData Science Driven Malware Detection
Data Science Driven Malware Detection
 
Putting the Magic in Data Science
Putting the Magic in Data SciencePutting the Magic in Data Science
Putting the Magic in Data Science
 
The Science of a Great Career in Data Science
The Science of a Great Career in Data ScienceThe Science of a Great Career in Data Science
The Science of a Great Career in Data Science
 
Data Science in Digital Health
Data Science in Digital HealthData Science in Digital Health
Data Science in Digital Health
 
Python for Data Science - TDC 2015
Python for Data Science - TDC 2015Python for Data Science - TDC 2015
Python for Data Science - TDC 2015
 
Intro to Data Science for Non-Data Scientists
Intro to Data Science for Non-Data ScientistsIntro to Data Science for Non-Data Scientists
Intro to Data Science for Non-Data Scientists
 
[FAST CAMPUS] 1강 data science overview
[FAST CAMPUS] 1강 data science overview [FAST CAMPUS] 1강 data science overview
[FAST CAMPUS] 1강 data science overview
 
Google Dev Summit Extended Seoul - TensorFlow: Tensorboard & Keras
Google Dev Summit Extended Seoul - TensorFlow: Tensorboard & KerasGoogle Dev Summit Extended Seoul - TensorFlow: Tensorboard & Keras
Google Dev Summit Extended Seoul - TensorFlow: Tensorboard & Keras
 
An Overview of AI on the AWS Platform - February 2017 Online Tech Talks
An Overview of AI on the AWS Platform - February 2017 Online Tech TalksAn Overview of AI on the AWS Platform - February 2017 Online Tech Talks
An Overview of AI on the AWS Platform - February 2017 Online Tech Talks
 
What Is the Future of Data Sharing?
What Is the Future of Data Sharing?What Is the Future of Data Sharing?
What Is the Future of Data Sharing?
 

Similar to Wso2datasciencesummerschool20151 150714180825-lva1-app6892

Model evaluation in the land of deep learning
Model evaluation in the land of deep learningModel evaluation in the land of deep learning
Model evaluation in the land of deep learningPramit Choudhary
 
Data Science Job ready #DataScienceInterview Question and Answers 2022 | #Dat...
Data Science Job ready #DataScienceInterview Question and Answers 2022 | #Dat...Data Science Job ready #DataScienceInterview Question and Answers 2022 | #Dat...
Data Science Job ready #DataScienceInterview Question and Answers 2022 | #Dat...Rohit Dubey
 
Implementing a data_science_project (Python Version)_part1
Implementing a data_science_project (Python Version)_part1Implementing a data_science_project (Python Version)_part1
Implementing a data_science_project (Python Version)_part1Dr Sulaimon Afolabi
 
Cloudera Data Science Challenge
Cloudera Data Science ChallengeCloudera Data Science Challenge
Cloudera Data Science ChallengeMark Nichols, P.E.
 
Data Science Challenge presentation given to the CinBITools Meetup Group
Data Science Challenge presentation given to the CinBITools Meetup GroupData Science Challenge presentation given to the CinBITools Meetup Group
Data Science Challenge presentation given to the CinBITools Meetup GroupDoug Needham
 
Machine learning at b.e.s.t. summer university
Machine learning  at b.e.s.t. summer universityMachine learning  at b.e.s.t. summer university
Machine learning at b.e.s.t. summer universityLászló Kovács
 
The Machine Learning Workflow with Azure
The Machine Learning Workflow with AzureThe Machine Learning Workflow with Azure
The Machine Learning Workflow with AzureIvo Andreev
 
Data Science for Dummies - Data Engineering with Titanic dataset + Databricks...
Data Science for Dummies - Data Engineering with Titanic dataset + Databricks...Data Science for Dummies - Data Engineering with Titanic dataset + Databricks...
Data Science for Dummies - Data Engineering with Titanic dataset + Databricks...Rodney Joyce
 
Exploring the Data science Process
Exploring the Data science ProcessExploring the Data science Process
Exploring the Data science ProcessVishal Patel
 
Machine Learning for Incident Detection: Getting Started
Machine Learning for Incident Detection: Getting StartedMachine Learning for Incident Detection: Getting Started
Machine Learning for Incident Detection: Getting StartedSqrrl
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data AnalyticsOsman Ali
 
data-science-lifecycle-ebook.pdf
data-science-lifecycle-ebook.pdfdata-science-lifecycle-ebook.pdf
data-science-lifecycle-ebook.pdfDanilo Cardona
 
Machine learning 101
Machine learning 101Machine learning 101
Machine learning 101AmmarChalifah
 
01VD062009003760042.pdf
01VD062009003760042.pdf01VD062009003760042.pdf
01VD062009003760042.pdfSunilMatsagar1
 
ACM DEBS 2015: Realtime Streaming Analytics Patterns
ACM DEBS 2015: Realtime Streaming Analytics PatternsACM DEBS 2015: Realtime Streaming Analytics Patterns
ACM DEBS 2015: Realtime Streaming Analytics PatternsSrinath Perera
 
DEBS 2015 Tutorial : Patterns for Realtime Streaming Analytics
DEBS 2015 Tutorial : Patterns for Realtime Streaming AnalyticsDEBS 2015 Tutorial : Patterns for Realtime Streaming Analytics
DEBS 2015 Tutorial : Patterns for Realtime Streaming AnalyticsSriskandarajah Suhothayan
 
From DBA to DE: Becoming a Data Engineer
From DBA to DE:  Becoming a Data Engineer From DBA to DE:  Becoming a Data Engineer
From DBA to DE: Becoming a Data Engineer Jim Czuprynski
 
Business Applications of Predictive Modeling at Scale
Business Applications of Predictive Modeling at ScaleBusiness Applications of Predictive Modeling at Scale
Business Applications of Predictive Modeling at ScaleSongtao Guo
 
Human in the loop: Bayesian Rules Enabling Explainable AI
Human in the loop: Bayesian Rules Enabling Explainable AIHuman in the loop: Bayesian Rules Enabling Explainable AI
Human in the loop: Bayesian Rules Enabling Explainable AIPramit Choudhary
 

Similar to Wso2datasciencesummerschool20151 150714180825-lva1-app6892 (20)

Model evaluation in the land of deep learning
Model evaluation in the land of deep learningModel evaluation in the land of deep learning
Model evaluation in the land of deep learning
 
Data Science Job ready #DataScienceInterview Question and Answers 2022 | #Dat...
Data Science Job ready #DataScienceInterview Question and Answers 2022 | #Dat...Data Science Job ready #DataScienceInterview Question and Answers 2022 | #Dat...
Data Science Job ready #DataScienceInterview Question and Answers 2022 | #Dat...
 
Implementing a data_science_project (Python Version)_part1
Implementing a data_science_project (Python Version)_part1Implementing a data_science_project (Python Version)_part1
Implementing a data_science_project (Python Version)_part1
 
Cloudera Data Science Challenge
Cloudera Data Science ChallengeCloudera Data Science Challenge
Cloudera Data Science Challenge
 
Data Science Challenge presentation given to the CinBITools Meetup Group
Data Science Challenge presentation given to the CinBITools Meetup GroupData Science Challenge presentation given to the CinBITools Meetup Group
Data Science Challenge presentation given to the CinBITools Meetup Group
 
Machine learning at b.e.s.t. summer university
Machine learning  at b.e.s.t. summer universityMachine learning  at b.e.s.t. summer university
Machine learning at b.e.s.t. summer university
 
The Machine Learning Workflow with Azure
The Machine Learning Workflow with AzureThe Machine Learning Workflow with Azure
The Machine Learning Workflow with Azure
 
Data Science for Dummies - Data Engineering with Titanic dataset + Databricks...
Data Science for Dummies - Data Engineering with Titanic dataset + Databricks...Data Science for Dummies - Data Engineering with Titanic dataset + Databricks...
Data Science for Dummies - Data Engineering with Titanic dataset + Databricks...
 
Exploring the Data science Process
Exploring the Data science ProcessExploring the Data science Process
Exploring the Data science Process
 
AI meets Big Data
AI meets Big DataAI meets Big Data
AI meets Big Data
 
Machine Learning for Incident Detection: Getting Started
Machine Learning for Incident Detection: Getting StartedMachine Learning for Incident Detection: Getting Started
Machine Learning for Incident Detection: Getting Started
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data Analytics
 
data-science-lifecycle-ebook.pdf
data-science-lifecycle-ebook.pdfdata-science-lifecycle-ebook.pdf
data-science-lifecycle-ebook.pdf
 
Machine learning 101
Machine learning 101Machine learning 101
Machine learning 101
 
01VD062009003760042.pdf
01VD062009003760042.pdf01VD062009003760042.pdf
01VD062009003760042.pdf
 
ACM DEBS 2015: Realtime Streaming Analytics Patterns
ACM DEBS 2015: Realtime Streaming Analytics PatternsACM DEBS 2015: Realtime Streaming Analytics Patterns
ACM DEBS 2015: Realtime Streaming Analytics Patterns
 
DEBS 2015 Tutorial : Patterns for Realtime Streaming Analytics
DEBS 2015 Tutorial : Patterns for Realtime Streaming AnalyticsDEBS 2015 Tutorial : Patterns for Realtime Streaming Analytics
DEBS 2015 Tutorial : Patterns for Realtime Streaming Analytics
 
From DBA to DE: Becoming a Data Engineer
From DBA to DE:  Becoming a Data Engineer From DBA to DE:  Becoming a Data Engineer
From DBA to DE: Becoming a Data Engineer
 
Business Applications of Predictive Modeling at Scale
Business Applications of Predictive Modeling at ScaleBusiness Applications of Predictive Modeling at Scale
Business Applications of Predictive Modeling at Scale
 
Human in the loop: Bayesian Rules Enabling Explainable AI
Human in the loop: Bayesian Rules Enabling Explainable AIHuman in the loop: Bayesian Rules Enabling Explainable AI
Human in the loop: Bayesian Rules Enabling Explainable AI
 

More from WSO2

Accelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with PlatformlessAccelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with PlatformlessWSO2
 
How to Create a Service in Choreo
How to Create a Service in ChoreoHow to Create a Service in Choreo
How to Create a Service in ChoreoWSO2
 
Ballerina Tech Talk - May 2023
Ballerina Tech Talk - May 2023Ballerina Tech Talk - May 2023
Ballerina Tech Talk - May 2023WSO2
 
Platform Strategy to Deliver Digital Experiences on Azure
Platform Strategy to Deliver Digital Experiences on AzurePlatform Strategy to Deliver Digital Experiences on Azure
Platform Strategy to Deliver Digital Experiences on AzureWSO2
 
GartnerITSymSessionSlides.pdf
GartnerITSymSessionSlides.pdfGartnerITSymSessionSlides.pdf
GartnerITSymSessionSlides.pdfWSO2
 
[Webinar] How to Create an API in Minutes
[Webinar] How to Create an API in Minutes[Webinar] How to Create an API in Minutes
[Webinar] How to Create an API in MinutesWSO2
 
Modernizing the Student Journey with Ethos Identity
Modernizing the Student Journey with Ethos IdentityModernizing the Student Journey with Ethos Identity
Modernizing the Student Journey with Ethos IdentityWSO2
 
Choreo - Build unique digital experiences on WSO2's platform, secured by Etho...
Choreo - Build unique digital experiences on WSO2's platform, secured by Etho...Choreo - Build unique digital experiences on WSO2's platform, secured by Etho...
Choreo - Build unique digital experiences on WSO2's platform, secured by Etho...WSO2
 
CIO Summit Berlin 2022.pptx.pdf
CIO Summit Berlin 2022.pptx.pdfCIO Summit Berlin 2022.pptx.pdf
CIO Summit Berlin 2022.pptx.pdfWSO2
 
Delivering New Digital Experiences Fast - Introducing Choreo
Delivering New Digital Experiences Fast - Introducing ChoreoDelivering New Digital Experiences Fast - Introducing Choreo
Delivering New Digital Experiences Fast - Introducing ChoreoWSO2
 
Fueling the Digital Experience Economy with Connected Products
Fueling the Digital Experience Economy with Connected ProductsFueling the Digital Experience Economy with Connected Products
Fueling the Digital Experience Economy with Connected ProductsWSO2
 
A Reference Methodology for Agile Digital Businesses
 A Reference Methodology for Agile Digital Businesses A Reference Methodology for Agile Digital Businesses
A Reference Methodology for Agile Digital BusinessesWSO2
 
Workflows in WSO2 API Manager - WSO2 API Manager Community Call (12/15/2021)
Workflows in WSO2 API Manager - WSO2 API Manager Community Call (12/15/2021)Workflows in WSO2 API Manager - WSO2 API Manager Community Call (12/15/2021)
Workflows in WSO2 API Manager - WSO2 API Manager Community Call (12/15/2021)WSO2
 
Lessons from the pandemic - From a single use case to true transformation
 Lessons from the pandemic - From a single use case to true transformation Lessons from the pandemic - From a single use case to true transformation
Lessons from the pandemic - From a single use case to true transformationWSO2
 
Adding Liveliness to Banking Experiences
Adding Liveliness to Banking ExperiencesAdding Liveliness to Banking Experiences
Adding Liveliness to Banking ExperiencesWSO2
 
Building a Future-ready Bank
Building a Future-ready BankBuilding a Future-ready Bank
Building a Future-ready BankWSO2
 
WSO2 API Manager Community Call - November 2021
WSO2 API Manager Community Call - November 2021WSO2 API Manager Community Call - November 2021
WSO2 API Manager Community Call - November 2021WSO2
 
[API World ] - Managing Asynchronous APIs
[API World ] - Managing Asynchronous APIs[API World ] - Managing Asynchronous APIs
[API World ] - Managing Asynchronous APIsWSO2
 
[API World 2021 ] - Understanding Cloud Native Deployment
[API World 2021 ] - Understanding Cloud Native Deployment[API World 2021 ] - Understanding Cloud Native Deployment
[API World 2021 ] - Understanding Cloud Native DeploymentWSO2
 
[API Word 2021] - Quantum Duality of “API as a Business and a Technology”
[API Word 2021] - Quantum Duality of “API as a Business and a Technology”[API Word 2021] - Quantum Duality of “API as a Business and a Technology”
[API Word 2021] - Quantum Duality of “API as a Business and a Technology”WSO2
 

More from WSO2 (20)

Accelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with PlatformlessAccelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with Platformless
 
How to Create a Service in Choreo
How to Create a Service in ChoreoHow to Create a Service in Choreo
How to Create a Service in Choreo
 
Ballerina Tech Talk - May 2023
Ballerina Tech Talk - May 2023Ballerina Tech Talk - May 2023
Ballerina Tech Talk - May 2023
 
Platform Strategy to Deliver Digital Experiences on Azure
Platform Strategy to Deliver Digital Experiences on AzurePlatform Strategy to Deliver Digital Experiences on Azure
Platform Strategy to Deliver Digital Experiences on Azure
 
GartnerITSymSessionSlides.pdf
GartnerITSymSessionSlides.pdfGartnerITSymSessionSlides.pdf
GartnerITSymSessionSlides.pdf
 
[Webinar] How to Create an API in Minutes
[Webinar] How to Create an API in Minutes[Webinar] How to Create an API in Minutes
[Webinar] How to Create an API in Minutes
 
Modernizing the Student Journey with Ethos Identity
Modernizing the Student Journey with Ethos IdentityModernizing the Student Journey with Ethos Identity
Modernizing the Student Journey with Ethos Identity
 
Choreo - Build unique digital experiences on WSO2's platform, secured by Etho...
Choreo - Build unique digital experiences on WSO2's platform, secured by Etho...Choreo - Build unique digital experiences on WSO2's platform, secured by Etho...
Choreo - Build unique digital experiences on WSO2's platform, secured by Etho...
 
CIO Summit Berlin 2022.pptx.pdf
CIO Summit Berlin 2022.pptx.pdfCIO Summit Berlin 2022.pptx.pdf
CIO Summit Berlin 2022.pptx.pdf
 
Delivering New Digital Experiences Fast - Introducing Choreo
Delivering New Digital Experiences Fast - Introducing ChoreoDelivering New Digital Experiences Fast - Introducing Choreo
Delivering New Digital Experiences Fast - Introducing Choreo
 
Fueling the Digital Experience Economy with Connected Products
Fueling the Digital Experience Economy with Connected ProductsFueling the Digital Experience Economy with Connected Products
Fueling the Digital Experience Economy with Connected Products
 
A Reference Methodology for Agile Digital Businesses
 A Reference Methodology for Agile Digital Businesses A Reference Methodology for Agile Digital Businesses
A Reference Methodology for Agile Digital Businesses
 
Workflows in WSO2 API Manager - WSO2 API Manager Community Call (12/15/2021)
Workflows in WSO2 API Manager - WSO2 API Manager Community Call (12/15/2021)Workflows in WSO2 API Manager - WSO2 API Manager Community Call (12/15/2021)
Workflows in WSO2 API Manager - WSO2 API Manager Community Call (12/15/2021)
 
Lessons from the pandemic - From a single use case to true transformation
 Lessons from the pandemic - From a single use case to true transformation Lessons from the pandemic - From a single use case to true transformation
Lessons from the pandemic - From a single use case to true transformation
 
Adding Liveliness to Banking Experiences
Adding Liveliness to Banking ExperiencesAdding Liveliness to Banking Experiences
Adding Liveliness to Banking Experiences
 
Building a Future-ready Bank
Building a Future-ready BankBuilding a Future-ready Bank
Building a Future-ready Bank
 
WSO2 API Manager Community Call - November 2021
WSO2 API Manager Community Call - November 2021WSO2 API Manager Community Call - November 2021
WSO2 API Manager Community Call - November 2021
 
[API World ] - Managing Asynchronous APIs
[API World ] - Managing Asynchronous APIs[API World ] - Managing Asynchronous APIs
[API World ] - Managing Asynchronous APIs
 
[API World 2021 ] - Understanding Cloud Native Deployment
[API World 2021 ] - Understanding Cloud Native Deployment[API World 2021 ] - Understanding Cloud Native Deployment
[API World 2021 ] - Understanding Cloud Native Deployment
 
[API Word 2021] - Quantum Duality of “API as a Business and a Technology”
[API Word 2021] - Quantum Duality of “API as a Business and a Technology”[API Word 2021] - Quantum Duality of “API as a Business and a Technology”
[API Word 2021] - Quantum Duality of “API as a Business and a Technology”
 

Wso2datasciencesummerschool20151 150714180825-lva1-app6892

  • 1. Introduction to Data Science and Analytics Summer School 2015 Srinath Perera VP Research WSO2 Inc.
  • 2. What is Data Science? Extraction of knowledge from large volumes of data that are structured or unstructured. It is a continuation of the fields data mining and predictive analytics
  • 4. Example ( Road.lk) traffic Feed 1. Data as tweets 2. Extract time, location, and traffic level using NLP 3. Explore data 4. Model based on time, and it is a holiday 5. Predict traffic given a time and location.
  • 5. Real data is messay, often needs to cleaned up before useful. o Bad formats - ignore or treat like missing data o Missing Data - extrapolate or remove data line o Useless variables - remove o Wrong data - e.g. aaa, bbb, joe, some might be deliberate lie, or 99 may be a code for N/A Data Cleanup
  • 6. o Transform variables ( date formats, String to int) o Create derived variables o Derive country from IP o age from ID card number o Normalize strings o e.g. stemm or use phonetic sounds o different spellings and nicknames ( William->Bill) o Feature value rescaling (e.g. most ML algorithms needs value to rescaled to 0-1 range). o Enrich (e.g. lookup and add age from profile) Data Cleanup (Contd.)
  • 7. Understand, and get a feel for what is expected (models => densities, constraints) and unexpected/ residuals (errors, outliers) o think what this is data about? domain, background, how it is collected, what each fields mean and range of values. o head, tail, count, all descriptives (Mean, Max, median, percentiles .. ) - Five number Summary. Min. 1st Qu. Median Mean 3rd Qu. Max. o run a bunch of count/group-by statements to gauge if I think it's corrupt. Data Exploration
  • 8. o Plot - take random sample and explore ( scatter plot) o e.g. Draw scatter plot or Trellis Plot o Find Dependencies between fields o Calculate Correlation o Dimensionality reduction o Cluster and look visualize clusters o Look at frequency distribution of each field and try to find a known distribution if possible. Data Exploration (Contd.)
  • 10. Feature Engineering o Feature engineering is the art of finding feature that leads simplest decision algorithm. ( Good features allow a simple model to beat a complex model.) o Best features may be a subset, or a combination, or transformed version of the features.
  • 11. How to do Feature Engineering? o Manually pick by domain experts and trial and error. o Search the possible combinations by training and combining subsets (e.g. Random Forest) o Use statistical concepts like correlation and information criteria o Reduce the features to a low dimension space using techniques like PCA. o Automatic Feature Learning though Deep Learning o ...
  • 12. Analysis o Goal of analysis is to extract knowledge o This knowledge usually come in one of the two forms o KPI (Key Performance Indicators) ■ Describe key measurement for what is being measured. (e.g. revenue per year, profit margin, revenue for sqft in retail, revenue per employer) o Models to describe or predict the data ■ e.g. Machine Learning models or Statistical models
  • 13. 4 Analysis types by time to decision o Hindsight ( what happened?) o Done using Batch Analytics like MapReduce o Oversight ( what is happening?) o Done using Realtime Analytics technologies like CEP o Insight ( why things happening?) o Done with Data Mining and Unsupervised learning algorithms like Clustering o Foresight ( what will happen?) o Done by building models using Machine learning or one of other techniques
  • 14. Data Analytics Tools Landscape
  • 15.
  • 17. Realtime Analytics: Complex Event Processing
  • 18. Interactive Analytics o Define Indexes on Collected data ( Streams) o Issue, dynamic queries and get results right away. ( Powered by Apache Lucene) o Shows multiples events from same activity together using custom defined activity IDs o Useful for data exploration o Powered by Apache Lucene, with support for Index Sharding
  • 19. Predictive Analytics o Build models and use them with WSO2 CEP, BAM and ESB using WSO2 Machine Learner Product ( 2015 Q3) o Build model using R, export them as PMML, and use within WSO2 CEP
  • 20. WSO2 Machine Learner o Sample, explore, and understand data through visualizations o A wizard to configure, train machine learning models, and select the best model o Find and use those models with WSO2 CEP, BAM and ESB o Powered by Apache Spark MLLib
  • 21. Building Decision Models A model describe how a system behave when inputs changes. There are many ways to build models. see https://icrunchdatanews.com/what-are-predictive-models/ o Regression models and ML Models Time series models o Statistical models o Physical Models - based on physical phenomena. They include 6-DoF flight models, space flight models Weather models. o Mathematical Models
  • 22. Verification o All is good, now you have a model. You must verify that it is correct before using it in the real world. o Prediction can be verified by waiting for events to occur o Relationships like causality (e.g. having free shipping leads a customer to buy more) must be verified with A/B testing o Let’s look at few of pitfalls
  • 23. Pitfalls: Experiment vs Observation o If you follow scientific method, you would do experiments, and they have control sets ( A/B) tests. o Bigdata does not have a control set, it is rather observations. ( we observe the world as it happens) o So what we can tell are limited. o Correlation does not imply Causality!! o Send a book home example [1] o All big buyers have free shipping
  • 24. Causality: What can we do? o Option 1: We can act on correlation if we can verify the guess or if correctness is not critical (Start Investigation, Check for a disease, Marketing ) o Option 2: We verify correlations using A/B testing or propensity analysis
  • 25. http://www.fastcodesign.com/1671172/how-a-story-from-world-war-ii-shapes-facebook-today, Pic from http: //www.phibetaiota.net/2011/09/defdog-the-importance-of-selection-bias-in-statistics/ Pitfalls: Think about the Missing Data o WW II, Returned Aircrafts and data on where they were hit? o How would you add Armour? Abraham Wald
  • 26. o Dashboard give an “Overall idea” in a glance (e.g. car dashboard) o Support for personalization, you can build your own dashboard. o Also the entry point for Drill down o How to build? o WSO2 DAS supports a gadget generation WIzard o Or you can write your own Gadgets using D3 and Javascript. Communicate: Dashboards
  • 27. Communicate: Alerts o Detecting conditions can be done via CEP Queries. Key is the “Last Mile”. o Email o SMS o Push notifications to a UI o Pager o Trigger physical Alarm o How? o Select Email sender “Output Adaptor” from CEP, or send from CEP to ESB, and ESB has lot of connectors
  • 28. o How? o Write data to a database from CEP event tables o Build Services via WSO2 Data Service o Expose them as APIs via API Manager Communicate: APIs o With mobile Apps, most data are exposed and shared as APIs (REST/Json ) to end users. o Need to expose analytics results as API o Following are some challenges o Security and Permissions o API Discovery, Billing, throttling, quotas & SLA
  • 29. Communicate: Realtime Soccer Analytics Watch at: https://www.youtube.com/watch? v=nRI6buQ0NOM
  • 31. Conclusion o Data Science is extracting knowledge by analyzing data o Discussed the pipeline and tools you can use to do that o Rest of summer school will look at different aspects in detail. o All tools discussed are available free under Apache Licence.