Data Science and Business Analytics
Dr.M.Inbavalli
Vice Principal & Head Research Department of Computer Science
Marudhar Kesari Jain College for Women
Vaniyambadi-635751
Overview
• Evolution of Data
• Data Science
• Business Analytics
• Applications
• AI, ML, DL, Data science – Relationship
• Tools for Data Science
• Life cycle of data science with case study
• Algorithms for Data Science
• Data Science Research Areas
• Future of Data Science
Data All Around
• Data has become the most abundant thing today
• Explosion of data, in pretty much every domain
• Lots of data is being collected
and warehoused
• Web data, e-commerce
• Financial transactions, bank/credit transactions
• Online trading and purchasing
• Social Network
Data All Around (cont.)
• Sensing devices and sensor networks that can monitor everything 24/7 from
temperature to pollution to vital signs
• Increasingly sophisticated smart phones
• The Internet and social networks make it easy to publish data
• Scientific experiments and simulations produce astronomical volumes of data
• Internet of Things (IoT)
• Datafication: taking all aspects of life and turning them into data (e.g., what you
like/enjoy has been turned into a stream of your "likes")
• Data Science – Why all the excitement?
• How Much Data Do We have?
• Data volumes are expected to grow much larger
• Over 2.5 quintillion bytes of data are created every single day.
How Much Data Do We have?
What can you do with the Traffic Prediction data?
Crowdsourcing + physical modeling + sensing + data assimilation
From Institute for Transportation Studies
• How to handle that data?
• Data is just like crude oil. It is valuable, but if unrefined it cannot really be
used. Oil has to be changed into gas, plastic, chemicals, etc. to create a
valuable entity that drives profitable activity; likewise, data must be broken
down and analyzed for it to have value.
• How to extract interesting actionable insights and scientific knowledge?
Data Science – Why the excitement?
• Data Science is the science that uses computer science, statistics, machine learning,
visualization, and human-computer interaction to collect, clean, integrate, analyze,
visualize, and interact with data to create data products.
• Turn data into data products.
Data Science – Why the excitement? (cont.)
Theories and techniques from many fields and disciplines are used to
investigate and analyze large amounts of data to help decision
makers in many domains such as science, engineering, economics,
politics, finance, and education.
• Computer Science: pattern recognition, visualization, data warehousing,
high-performance computing, databases, AI
• Mathematics: mathematical modeling
• Statistics: statistical and stochastic modeling, probability
Data science (DS) is a multidisciplinary field of study with the goal
of addressing the challenges of big data.
Data Science – Why the excitement? (cont.)
• Data Science is a blend of tools, algorithms, and machine learning principles with the goal of
discovering hidden patterns in raw data.
• It focuses on statistical modeling, machine learning, management and analysis of data sets, and
data acquisition.
• Data Science makes use of several statistical procedures
• These procedures range from data transformations, data modeling, statistical operations
(descriptive and inferential statistics) and machine learning modeling.
• In order to gain predictive responses from the models, it is an essential requirement to understand
the underlying patterns of the data model. Furthermore, optimization techniques can be utilized to
meet the business requirements of the user.
Data Science – Why the excitement? (cont.)
• Using various statistical tools, a Data Scientist has to develop models. With the help of
these models, they help their clients in the decision-making process. Furthermore,
these models support demand generation initiatives.
Data Science also covers:
• Data Integration.
• Distributed Architecture.
• Automating Machine learning.
• Data Visualization.
• Dashboards and BI.
• Data Engineering.
• Deployment in production mode
• Automated, data-driven decisions.
Example Search
• Google earns around $50 bn/year from marketing, 97% of the company's
revenue.
• Sponsored search uses an auction – a pure competition for marketers trying to
win access to consumers.
• In other words, a competition for models of consumers – their likelihood of
responding to the ad – and of determining the right bid for the item.
• There are around 30 billion search requests a month, and perhaps a trillion events
of history between search providers.
• Google AdWords and AdSense
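The idea of competing on models of consumers can be sketched in a few lines. This is only an illustrative assumption, not a description of how any real ad platform works: each candidate ad is scored by its bid multiplied by a modeled click probability, and ads are ranked by that expected value. The advertiser names, bids, and probabilities are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Ad:
    advertiser: str   # hypothetical advertiser name
    bid: float        # amount offered per click
    p_click: float    # modeled probability that the consumer responds


def expected_value(ad: Ad) -> float:
    """Expected revenue per impression: bid times predicted click probability."""
    return ad.bid * ad.p_click


def rank_ads(ads):
    """Order candidate ads from highest to lowest expected value."""
    return sorted(ads, key=expected_value, reverse=True)


candidates = [
    Ad("advertiser_a", bid=2.00, p_click=0.01),
    Ad("advertiser_b", bid=0.50, p_click=0.10),
    Ad("advertiser_c", bid=1.00, p_click=0.03),
]
ranking = [ad.advertiser for ad in rank_ads(candidates)]
print(ranking)
```

In this sketch the lowest bidder ranks first because its modeled response rate is much higher, which is why a better consumer model can matter as much as a larger bid.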
Data Science Applications
• Transaction Databases → Recommender systems (Netflix), Fraud Detection
(Security and Privacy)
• Wireless Sensor Data → Smart Home, Real-time Monitoring, Internet of Things
• Text Data, Social Media Data → Product Review and Consumer Satisfaction
(Facebook, Twitter, LinkedIn), E-discovery
• Software Log Data → Automatic Troubleshooting (Splunk)
• Genotype and Phenotype Data → Epic, 23andMe, Patient-Centered Care,
Personalized Medicine
Other Applications
• Banking: make smarter decisions through fraud detection, management of
customer data, risk modeling, real-time predictive analytics, customer
segmentation, etc.
• Fraud detection covers credit card, insurance, and accounting fraud.
• Banks are able to analyze customers' investment patterns and cycles and suggest
offers that suit each customer.
• Risk modeling through data science lets banks assess their
overall performance.
• In real-time and predictive analytics, banks use machine learning algorithms to
improve their analytics strategy.
Other Applications (cont.)
• Customer sentiment analysis techniques
can boost social media interaction, improve feedback, and analyze
customer reviews.
• Manufacturing: IoT has
enabled companies to predict potential problems, monitor systems,
and analyze continuous streams of data.
• Uber is using data science for price optimization and for providing better
experiences to its customers.
Using powerful predictive tools, it predicts the price based
on parameters such as weather patterns, availability of transport,
customers, etc.
Data
• Measurable units of information gathered or captured from the activity of people, places,
and things.
• Data is generated from different sources like financial logs, text files, multimedia forms,
sensors, and instruments.
• We need to understand
• which data to use
• how to organize the data, and so on.
• Prepare the structured and unstructured data to be used by the analytics team for
model building.
• Types of Data
• Relational Data (Tables/Transaction/Legacy Data)
• Text Data (Web)
• Semi-structured Data (XML)
• Graph Data
• Social Network, Semantic Web (RDF), …
• Streaming Data
What do we do with the Data?
• Aggregation and Statistics
• Data warehousing and OLAP
• Indexing, Searching, and Querying
• Keyword based search
• Pattern matching (XML/RDF)
• Knowledge discovery
• Data Mining
• Statistical Modeling
• Example –Data Science
• Companies learn your secrets, shopping patterns, and preferences
• E.g., can we know if a child likes animation games, even if they don't
want us to know?
• Building and maintaining a data warehouse is a key skill which a Data
Engineer must have.
• Data Engineers build pipelines which extract data from multiple sources and then manipulate
it to make it usable.
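A minimal sketch of such a pipeline, assuming two hypothetical sources (a CSV export and a JSON export) with different field names. The in-memory dictionary stands in for a warehouse table; a real pipeline would write to an actual data store.

```python
import csv
import io
import json

# Hypothetical raw inputs standing in for two different source systems.
CSV_SOURCE = "customer_id,amount\n1,250.00\n2,99.50\n"
JSON_SOURCE = '[{"customer": 3, "total": "40"}, {"customer": 1, "total": "10.5"}]'


def extract_csv(text):
    """Extract: read rows from a CSV export."""
    return list(csv.DictReader(io.StringIO(text)))


def extract_json(text):
    """Extract: read records from a JSON export."""
    return json.loads(text)


def transform(csv_rows, json_rows):
    """Transform: map both sources onto one schema (customer_id: int, amount: float)."""
    unified = [(int(r["customer_id"]), float(r["amount"])) for r in csv_rows]
    unified += [(int(r["customer"]), float(r["total"])) for r in json_rows]
    return unified


def load(rows):
    """Load: accumulate totals per customer into a dict standing in for a warehouse table."""
    table = {}
    for customer_id, amount in rows:
        table[customer_id] = table.get(customer_id, 0.0) + amount
    return table


totals = load(transform(extract_csv(CSV_SOURCE), extract_json(JSON_SOURCE)))
print(totals)
```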
• Business analytics (BA) is the practice of iterative, methodical exploration of an
organization's data, with an emphasis on statistical analysis.
Business analytics is used by companies committed to data-driven decision-making.
• BA activities must be anchored to a strategically relevant business question to be
answered by using data analysis.
• Data Science and Business Analytics
• Data science or analytics is the process of deriving insights from data in order to
make optimal decisions.
• Data science and analytics techniques include basic statistics, regression, simulation
and optimization modeling, data mining and machine learning, text analytics,
artificial intelligence, and visualization.
• Data science focuses on data modelling and data warehousing to track the ever-growing
data set. The information extracted through data science applications is
used to guide business processes and reach organisational goals.
Features | Databases | Data Science
Data Volume | Modest | Massive
Examples | Bank records, personnel records, census, medical records | Online clicks, GPS logs, tweets, building sensor readings
Priorities | Consistency, error recovery, auditability | Speed, availability, query richness
Structured | Strongly (schema) | Weakly or none (text)
Properties | Transactions, ACID* | CAP* theorem (2/3), eventual consistency
Realizations | SQL | NoSQL: MongoDB, CouchDB, HBase, Cassandra, Riak, Memcached, Apache River, …
Features | Business Intelligence (BI) | Data Science
Data Sources | Structured (usually SQL, often data warehouse) | Both structured and unstructured (logs, cloud data, SQL, NoSQL, text)
Approach | Statistics and visualization | Statistics, machine learning, graph analysis, natural language processing (NLP)
Focus | Past and present | Present and future
Tools | Pentaho, Microsoft BI, QlikView, R | RapidMiner, BigML, Weka, R
Features | Data Science | ML | AI
Tools | SAS, Tableau, Apache Spark, MATLAB, SQL | Amazon Lex, IBM Watson Studio, Microsoft Azure ML Studio | TensorFlow, Scikit-learn, Keras, Amazon Lex, Google Cloud Platform, DataRobot
Approach | Deals with structured and unstructured data | Uses statistical models | Uses logic and decision trees
Popular examples | Fraud detection and healthcare analysis | Recommendation systems such as Spotify, and facial recognition | Chatbots and voice assistants
Main applications | Credit card fraud, ATM theft, disease prediction, pattern identification, etc. | Online recommender systems, Google search algorithms, Facebook auto friend-tagging suggestions, etc. | Siri, customer support using chatbots, expert systems, online game playing, intelligent humanoid robots, etc.
• Relationship between Data Science, Artificial Intelligence and Machine
Learning
• Machine Learning for Predictive Reporting
• used to study transactional data and make valuable predictions.
• Also known as supervised learning
• implemented to suggest the most effective courses of action for any company.
Machine Learning for Pattern Discovery
• used to set parameters in various data reports
• relies on unsupervised learning, where there are no pre-decided parameters.
Artificial Intelligence can be represented as a loop of perception, planning,
action, and feedback of perception.
Perception > Planning > Action > Feedback of Perception
Data Science uses different parts of this pattern or loop to solve specific
problems.
• For instance, in the first step, i.e. perception,
data scientists try to identify patterns with the help of the data.
• In planning, there are two aspects:
• Finding all possible solutions
• Finding the best solution among all solutions
• Machine learning cannot be taken as a standalone subject; it is best understood in the
context of its environment.
AI is the tool that helps data science get results and solutions for specific
problems. However, machine learning is what helps in achieving that goal.
Example: Google's search engine is a product of data science.
It uses predictive analysis, a system used by artificial intelligence, to deliver
intelligent results to users.
• Tools for Data Science
• Reporting and Business Intelligence
• Predictive Modelling and Machine Learning
• Artificial Intelligence
• Data Science Tools for Big Data (Volume)
• Data from 1 GB to 10 GB: traditional tools such as Excel, Access, SQL, etc.
• More than 10 GB: Hadoop, Hive
• Tools for Handling Variety
• Voluminous
• customer feedback may vary in length, sentiments, and other factors.
• Examples of SQL databases are Oracle, MySQL, and SQLite, whereas NoSQL includes popular
databases like MongoDB, Cassandra, etc.
• These NoSQL databases are seeing huge adoption because of their ability
to scale and handle dynamic data.
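The contrast between a fixed schema and dynamic data can be sketched with the Python standard library alone. This is an illustration under stated assumptions: SQLite plays the SQL side, and plain JSON strings stand in for the documents a NoSQL store such as MongoDB would hold. The feedback records are hypothetical.

```python
import json
import sqlite3

# Relational side: the schema (columns and types) is declared up front.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE feedback (customer_id INTEGER, rating INTEGER)")
conn.executemany("INSERT INTO feedback VALUES (?, ?)", [(1, 5), (2, 2)])
avg_rating = conn.execute("SELECT AVG(rating) FROM feedback").fetchone()[0]
conn.close()

# Document side: each record is a free-form JSON document, so feedback that
# varies in length and content needs no schema change.
documents = [
    json.dumps({"customer_id": 1, "text": "Great service", "tags": ["fast"]}),
    json.dumps({"customer_id": 2, "text": "Late delivery, box damaged", "photos": 2}),
]
fields_seen = sorted({key for doc in documents for key in json.loads(doc)})

print(avg_rating)
print(fields_seen)
```

The second document carries a field the first one lacks, which is the kind of variety a fixed table would need a schema migration to absorb.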
• Tools for Handling Velocity
• speed at which the data is captured.
• includes both real-time and non-real-time data.
• Examples of real-time data
• sensor data collected by self-driving cars, which triggers automatic actions
• CCTV
• Stock trading
• Fraud detection for credit card transactions
• Network data: social media (Facebook, Twitter, etc.)
Tools
• Apache Kafka: real-time data pipelines.
• Apache Storm: can process up to 1 million tuples per second and is highly scalable.
• Amazon Kinesis: licensed and powerful.
• Apache Flink: high performance, fault tolerance, and efficient memory
management.
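To make velocity concrete, here is a minimal sketch of processing records one at a time as they arrive, rather than loading everything first. It does not use Kafka, Storm, Kinesis, or Flink; a Python generator stands in for the stream. The rule (flag an amount more than three times the average of the last three transactions) and the amounts are assumptions chosen for illustration.

```python
from collections import deque


def flag_unusual(amounts, window=3, factor=3.0):
    """Yield (amount, is_unusual) for each transaction as it arrives."""
    recent = deque(maxlen=window)  # keeps only the last `window` amounts
    for amount in amounts:
        if len(recent) == window:
            baseline = sum(recent) / window
            unusual = amount > factor * baseline
        else:
            unusual = False  # not enough history yet to judge
        yield amount, unusual
        recent.append(amount)


stream = [20.0, 25.0, 30.0, 400.0, 22.0]  # hypothetical card transactions
flags = list(flag_unusual(stream))
print(flags)
```

Because only a small window is held in memory, the same loop works whether the stream has five records or five billion.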
• Reporting and BI tools: Excel, QlikView, Tableau, MicroStrategy, Power BI,
Google Analytics, Dundas, Sisense, etc.
• Predictive analytics and machine learning tools: Python, R, Apache Spark,
Julia, Jupyter Notebooks
• Frameworks for deep learning: TensorFlow, PyTorch, Keras, and Caffe
• AI tools: AutoKeras, Google Cloud AutoML, IBM Watson, DataRobot,
H2O's Driverless AI, and Amazon's Lex
• Licensed tools: SAS, SPSS, MATLAB
Lifecycle of Data Science
• Role of Data Scientist
• Identifying the data-analytics problems that offer the greatest opportunities to the
organization
• Determining the correct data sets and variables
• Collecting large sets of structured and unstructured data from disparate sources
• Cleaning and validating the data to ensure accuracy, completeness, and uniformity
• Devising and applying models and algorithms to mine the stores of big data
• Analyzing the data to identify patterns and trends
• Interpreting the data to discover solutions and opportunities
• Communicating findings to stakeholders using visualization and other means
• Phase 1—Discovery
• various specifications, requirements, priorities and required budget.
• the ability to ask the right questions.
• need to frame the business problem and formulate initial hypotheses (IH) to test.
• Phase 2—Data preparation
• data cleaning, transformation, and visualization. This will help you to spot the outliers
and establish a relationship between the variables. R can be used for this phase.
• Phase 3—Model planning
• methods and techniques to draw the relationships between variables
• These relationships will set the base for the algorithms in next phase
• apply Exploratory Data Analysis (EDA) using various statistical formulas and
visualization tools.
• R has a complete set of modeling capabilities and provides a good environment for
building interpretive models.
• SQL Analysis Services can perform in-database analytics using common data mining
functions and basic predictive models.
• SAS/ACCESS can be used to access data from Hadoop and is used for creating
repeatable and reusable model flow diagrams.
• Phase 4—Model building
• develop datasets for training and testing purposes
• various learning techniques like classification, association and clustering to build the
model.
Example :
1. Classification (decision trees)
2. Clustering (K-means, Fuzzy C-means, Hierarchical Clustering, DBSCAN)
3. Association rules
4. Advanced supervised machine learning algorithms (Naive Bayes, k-NN, SVM)
5. Intro to ensemble learning algorithms (Random Forest, Gradient Boosting)
• Phase 5—Operationalize
• Analyzing the data to identify patterns and trends
• Interpreting results
• deliver final reports, briefings, code and technical documents
• pilot project
• Phase 6—Communicate results
• identify all the key findings, communicate to the stakeholders and determine if the
results of the project are a success or a failure
• Basic statistics
• 1. Random variables, sampling
• 2. Distributions and statistical measures
• 3. Hypothesis testing
Overview of linear algebra
1. Linear algebra and matrix computations
2. Functions, derivatives, convexity
Modeling techniques: regression
1. Mathematical modeling process 2. Linear regression 3. Logistic regression
• Data visualization and visual analytics
• 1. Visual analytics 2. Visualizations in Python and visual analytics in IBM Watson Analytics
• Data mining and machine learning
• 1. Classification (decision trees) 2. Clustering (K-means, Fuzzy C-means,
Hierarchical Clustering, DBSCAN) 3. Association rules 4. Advanced supervised
machine learning algorithms (Naive Bayes, k-NN, SVM) 5. Intro to ensemble
learning algorithms (Random Forest, Gradient Boosting)
• Simulation modeling 1. Random number generation 2. Monte Carlo simulations 3.
Simulation in IPython
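A classic small example ties random number generation and Monte Carlo simulation together: estimating pi by sampling random points in the unit square and counting how many fall inside the quarter circle. The function name, sample count, and seed are choices made for this sketch.

```python
import random


def estimate_pi(n_samples, seed=0):
    """Estimate pi from the share of random points that land inside the quarter circle."""
    rng = random.Random(seed)  # seeded generator so the run is reproducible
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    # Area of quarter circle / area of unit square = pi / 4
    return 4.0 * inside / n_samples


pi_estimate = estimate_pi(100_000)
print(pi_estimate)
```

The estimate improves slowly: the error shrinks roughly with the square root of the number of samples.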
• Real time example
• Case Study: Diabetes Prevention
• What if we could predict the occurrence of diabetes and take appropriate
measures beforehand to prevent it?
• 1. You can refer to the sample data below.
• Step 1: Discovery
• Attributes:
• npreg – Number of times pregnant
• glucose – Plasma glucose concentration
• bp – Blood pressure
• skin – Triceps skinfold thickness
• bmi – Body mass index
• ped – Diabetes pedigree function
• age – Age
• income – Income
• Step 2 Data Preparation
• Once we have the data, we need to clean and prepare it for
analysis.
• The data has a lot of inconsistencies, such as missing values, blank columns,
abrupt values, and incorrect data formats, which need to be cleaned.
• We have organized the data into a single table under different
attributes, making it look more structured.
• Step 2 (cont.)
• This data has a lot of inconsistencies.
• In the column npreg, “one” is written in words, whereas it should be in numeric
form, like 1.
• In the column bp, one of the values is 6600, which is impossible (at least for humans), as bp
cannot reach such a huge value.
• The income column is blank and also makes no sense in predicting diabetes.
• Therefore, it is redundant and should be removed from the table.
• We clean and preprocess this data by removing the outliers, filling in the null values, and
normalizing the data types. This is data preprocessing.
• Finally, we get the clean data which can be used for analysis.
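The cleaning steps above can be sketched as follows. Only the word "one" in npreg, the bp value 6600, and the blank income column come from the case study; every other value, the upper limit of 300 for bp, and the choice to fill gaps with the column median are assumptions made for illustration.

```python
from statistics import median

WORD_TO_NUMBER = {"one": 1}  # extend if other numbers appear spelled out

# Hypothetical raw rows; all values arrive as text.
raw_rows = [
    {"npreg": "6", "glucose": "148", "bp": "72", "income": ""},
    {"npreg": "one", "glucose": "85", "bp": "66", "income": ""},
    {"npreg": "8", "glucose": "183", "bp": "6600", "income": ""},
    {"npreg": "1", "glucose": "", "bp": "64", "income": ""},
]


def parse_int(value):
    """Normalize the data type: blank becomes None, words and digits become int."""
    value = value.strip().lower()
    if value == "":
        return None
    if value in WORD_TO_NUMBER:
        return WORD_TO_NUMBER[value]
    return int(value)


def clean(rows, max_bp=300):
    """Drop the income column, remove bp outliers, and fill gaps with the column median."""
    cleaned = []
    for row in rows:
        record = {k: parse_int(v) for k, v in row.items() if k != "income"}
        if record["bp"] is not None and record["bp"] > max_bp:
            record["bp"] = None  # treat an impossible reading as missing
        cleaned.append(record)
    for column in ("npreg", "glucose", "bp"):
        known = [r[column] for r in cleaned if r[column] is not None]
        fill = median(known)
        for r in cleaned:
            if r[column] is None:
                r[column] = fill
    return cleaned


clean_rows = clean(raw_rows)
print(clean_rows)
```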
• Step 3 Model Planning
• Load the data into the analytical sandbox and apply various statistical functions.
• R has functions like describe (available in add-on packages such as Hmisc), which gives
us the number of missing values and unique values.
• We can also use the summary function which will give us statistical information like
mean, median, range, min and max values.
• Then, we use visualization techniques like histograms, line graphs, box plots to get a
fair idea of the distribution of data.
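The slides use R for this step; as a rough Python equivalent, the sketch below computes the same kind of summary (missing and unique counts, min, max, range, mean, median) with the standard library. The glucose readings are hypothetical.

```python
from statistics import mean, median


def summarize(values):
    """Return basic descriptive statistics, ignoring missing (None) entries."""
    present = [v for v in values if v is not None]
    return {
        "missing": len(values) - len(present),
        "unique": len(set(present)),
        "min": min(present),
        "max": max(present),
        "range": max(present) - min(present),
        "mean": mean(present),
        "median": median(present),
    }


glucose = [148, 85, None, 183, 89, 137]  # hypothetical column with one gap
summary = summarize(glucose)
print(summary)
```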
• Step 4 Model Building
• supervised learning technique to build a model here.
• Step 5 Deliver the Model
• Check with sample data.
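Steps 4 and 5 can be sketched together: fit a supervised model on labeled records, then check it against held-out sample data. The slides do not say which technique was used, so k-nearest neighbours is an assumption here, chosen because it fits in a few lines. The records are hypothetical, and the features are left unscaled for brevity.

```python
from collections import Counter
from math import dist

# Hypothetical labeled records: (glucose, bmi) -> diabetic "yes" / "no".
train = [
    ((85, 22.0), "no"), ((90, 24.5), "no"), ((100, 26.0), "no"),
    ((160, 33.0), "yes"), ((170, 35.5), "yes"), ((180, 31.0), "yes"),
]
# Held-out sample data used only for checking the model.
test = [((95, 23.0), "no"), ((165, 34.0), "yes")]


def predict(features, training_data, k=3):
    """Classify by majority vote among the k closest training records."""
    nearest = sorted(training_data, key=lambda item: dist(item[0], features))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


predictions = [predict(x, train) for x, _ in test]
accuracy = sum(p == y for p, (_, y) in zip(predictions, test)) / len(test)
print(predictions, accuracy)
```

With real data the features would normally be scaled first, since glucose and bmi are measured on very different ranges.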
• Data
○ Data tables and data types
○ Operations on tables
○ Basic plotting
○ Tidy data / the ER model
○ Relational Operations
○ SQL
• Wrangling
○ Data acquisition (load and scrape)
○ EDA Vis / grammar of graphics
○ Data cleaning (text, dates)
○ EDA: Summary statistics
○ Data analysis with optimization (derivatives)
○ Data transformations
○ Missing data
• Modeling
○ Univariate probability and statistics
○ Hypothesis testing
○ Multivariate probability and statistics (joint and conditional probability, Bayes'
theorem)
○ Data Analysis with geometry (vectors, inner products, gradients and matrices)
○ Linear regression
○ Logistic regression
○ Gradient descent (batch and stochastic)
○ Trees and random forests
○ K-NN
○ Naïve Bayes
○ Clustering
○ PCA
• Sample Algorithms for Data Science analytics
Regression
• The most popular technique for this algorithm is least squares. This method
calculates the best-fitting line.
• Based on historical data
Example :
• Weather forecasting
• Assessing risk
Tools
• TensorFlow and PyTorch
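For a single input variable, the least-squares line has a closed form and needs no framework. A minimal sketch, with made-up points that lie exactly on y = 2x + 1:

```python
def fit_line(xs, ys):
    """Ordinary least squares for one variable: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]  # hypothetical historical data
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```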
• Logistic Regression
• Logistic regression is similar to linear regression, but it is used when the output is
binary (i.e. when outcome can have only two possible values). The prediction for this
final output will be a non-linear S-shaped function called the logistic function, g().
• Graph of a logistic regression curve showing probability of passing an exam versus
hours studying
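The S-shaped logistic function can be written directly. In the sketch below the intercept and slope for the exam example are hypothetical values picked for illustration, not fitted coefficients.

```python
import math


def logistic(z):
    """The logistic function g(z) = 1 / (1 + e^(-z)), which maps any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def pass_probability(hours, intercept=-4.0, slope=1.5):
    """Probability of passing given hours studied (coefficients are assumed)."""
    return logistic(intercept + slope * hours)


for hours in (1, 2, 3, 4):
    print(hours, round(pass_probability(hours), 3))
```

A binary prediction then follows from a threshold, commonly 0.5: predict "pass" when the probability is at least the threshold.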
• Decision Trees
• Decision Trees can be used for both regression and classification tasks.
• Categorical variable decision tree: predict whether a customer will pay the
renewal premium with an insurance company (yes/no).
• Continuous variable decision tree: predict customer income based on occupation,
product, and various other variables.
• Examples: C4.5, CART
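Tree learners choose each split by measuring how mixed the resulting groups are. CART-style classification trees commonly use Gini impurity for this; the sketch below computes it for a hypothetical yes/no split and is not a full tree implementation.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0.0 for a pure group, 0.5 for an even two-class mix."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gini(left, right):
    """Impurity of a split: size-weighted average of the two child groups."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


# Hypothetical renewal outcomes before and after a candidate split.
parent = ["yes"] * 4 + ["no"] * 4
print(gini(parent))
print(split_gini(["yes"] * 4, ["no"] * 4))
```

The tree keeps the candidate split with the lowest weighted impurity and repeats the process inside each child group.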
• Naive Bayes
• classification technique
• It measures the probability of each class, and the conditional probability of each
class given the values of x. This algorithm is used for classification problems, often to
reach a binary yes/no outcome.
Example:
Text classification/ Spam Filtering/ Sentiment Analysis
Recommendation System
Types
Gaussian Naive Bayes
Multinomial Naive Bayes
Bernoulli
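The rule behind all three variants can be written compactly. Here y denotes a class and x_1, …, x_n the feature values (symbols introduced for this sketch); the "naive" assumption is that the features are conditionally independent given the class:

```latex
P(y \mid x_1, \dots, x_n) \;\propto\; P(y) \prod_{i=1}^{n} P(x_i \mid y),
\qquad
\hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)
```

The variants differ only in how P(x_i | y) is modeled: a normal distribution for Gaussian, word counts for Multinomial, and present/absent indicators for Bernoulli.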
SVM
k-NN
K-means
Dimensionality Reduction
• ANN
• Feed-forward networks: multilayer perceptrons
• Convolutional neural networks: hierarchical object extractors used for classification,
object detection, or even image segmentation
What do Data Scientists do?
• National Security
• Cyber Security
• Business Analytics
• Engineering
• Healthcare
• And more ….
A Data Scientist must possess
• Mathematics and Applied
Mathematics
• Applied Statistics/Data Analysis
• Solid Programming Skills (R,
Python, Julia, SQL)
• Data Mining
• Data Base Storage and
Management
• Machine Learning and
discovery
• Data Science Research Areas
• machine learning.
• artificial intelligence.
• Deep learning
• databases.
• statistics.
• optimization.
• natural language processing.
• computer vision.
• speech processing.
• Privacy
• Ethics
• Energy consumption
• Cloud computing
• IoT
• Social Media
• Blockchain, etc.
• Future of Data Science and Analytics
Thank You
?

 
MS4 level being good citizen -imperative- (1) (1).pdf
MS4 level   being good citizen -imperative- (1) (1).pdfMS4 level   being good citizen -imperative- (1) (1).pdf
MS4 level being good citizen -imperative- (1) (1).pdfMr Bounab Samir
 
Active Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfActive Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfPatidar M
 
How to Make a Duplicate of Your Odoo 17 Database
How to Make a Duplicate of Your Odoo 17 DatabaseHow to Make a Duplicate of Your Odoo 17 Database
How to Make a Duplicate of Your Odoo 17 DatabaseCeline George
 
How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17Celine George
 
Using Grammatical Signals Suitable to Patterns of Idea Development
Using Grammatical Signals Suitable to Patterns of Idea DevelopmentUsing Grammatical Signals Suitable to Patterns of Idea Development
Using Grammatical Signals Suitable to Patterns of Idea Developmentchesterberbo7
 
Expanded definition: technical and operational
Expanded definition: technical and operationalExpanded definition: technical and operational
Expanded definition: technical and operationalssuser3e220a
 
Sulphonamides, mechanisms and their uses
Sulphonamides, mechanisms and their usesSulphonamides, mechanisms and their uses
Sulphonamides, mechanisms and their usesVijayaLaxmi84
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management SystemChristalin Nelson
 
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptxDecoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptxDhatriParmar
 

Recently uploaded (20)

Mythology Quiz-4th April 2024, Quiz Club NITW
Mythology Quiz-4th April 2024, Quiz Club NITWMythology Quiz-4th April 2024, Quiz Club NITW
Mythology Quiz-4th April 2024, Quiz Club NITW
 
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
 
Reading and Writing Skills 11 quarter 4 melc 1
Reading and Writing Skills 11 quarter 4 melc 1Reading and Writing Skills 11 quarter 4 melc 1
Reading and Writing Skills 11 quarter 4 melc 1
 
Q-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITWQ-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITW
 
How to Manage Buy 3 Get 1 Free in Odoo 17
How to Manage Buy 3 Get 1 Free in Odoo 17How to Manage Buy 3 Get 1 Free in Odoo 17
How to Manage Buy 3 Get 1 Free in Odoo 17
 
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxQ4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
 
ClimART Action | eTwinning Project
ClimART Action    |    eTwinning ProjectClimART Action    |    eTwinning Project
ClimART Action | eTwinning Project
 
Paradigm shift in nursing research by RS MEHTA
Paradigm shift in nursing research by RS MEHTAParadigm shift in nursing research by RS MEHTA
Paradigm shift in nursing research by RS MEHTA
 
Scientific Writing :Research Discourse
Scientific  Writing :Research  DiscourseScientific  Writing :Research  Discourse
Scientific Writing :Research Discourse
 
ICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdf
 
MS4 level being good citizen -imperative- (1) (1).pdf
MS4 level   being good citizen -imperative- (1) (1).pdfMS4 level   being good citizen -imperative- (1) (1).pdf
MS4 level being good citizen -imperative- (1) (1).pdf
 
Active Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfActive Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdf
 
How to Make a Duplicate of Your Odoo 17 Database
How to Make a Duplicate of Your Odoo 17 DatabaseHow to Make a Duplicate of Your Odoo 17 Database
How to Make a Duplicate of Your Odoo 17 Database
 
How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17
 
prashanth updated resume 2024 for Teaching Profession
prashanth updated resume 2024 for Teaching Professionprashanth updated resume 2024 for Teaching Profession
prashanth updated resume 2024 for Teaching Profession
 
Using Grammatical Signals Suitable to Patterns of Idea Development
Using Grammatical Signals Suitable to Patterns of Idea DevelopmentUsing Grammatical Signals Suitable to Patterns of Idea Development
Using Grammatical Signals Suitable to Patterns of Idea Development
 
Expanded definition: technical and operational
Expanded definition: technical and operationalExpanded definition: technical and operational
Expanded definition: technical and operational
 
Sulphonamides, mechanisms and their uses
Sulphonamides, mechanisms and their usesSulphonamides, mechanisms and their uses
Sulphonamides, mechanisms and their uses
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management System
 
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptxDecoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
 

Data science and business analytics

  • 1. Data science and business analytics. Dr. M. Inbavalli, Vice Principal & Head, Research Department of Computer Science, Marudhar Kesari Jain College for Women, Vaniyambadi-635751
  • 2. Overview • Evolution of Data • Data Science • Business Analytics • Applications • AI, ML, DL, Data science – Relationship • Tools for Data Science • Life cycle of data science with case study • Algorithms for Data Science • Data Science Research Areas • Future of Data Science
  • 3. Data All Around • Data has become the most abundant thing today • Explosion of data, in pretty much every domain • Lots of data is being collected and warehoused • Web data, e-commerce • Financial transactions, bank/credit transactions • Online trading and purchasing • Social Network
  • 4. • Data All Around • Sensing devices and sensor networks that can monitor everything 24/7, from temperature to pollution to vital signs • Increasingly sophisticated smartphones • The Internet and social networks make it easy to publish data • Scientific experiments and simulations produce astronomical volumes of data • Internet of Things (IoT) • Datafication: taking all aspects of life and turning them into data (e.g., what you like/enjoy has been turned into a stream of your "likes")
  • 5. • Data Science – Why all the excitement?
  • 8. • How Much Data Do We Have? • Data volumes are expected to grow much larger • Over 2.5 quintillion bytes of data are created every single day.
  • 9. How Much Data Do We Have? • What can you do with the Traffic Prediction data? • Crowdsourcing + physical modeling + sensing + data assimilation (from the Institute for Transportation Studies)
  • 10. • How to handle that data? • Data is just like crude oil. It's valuable, but if unrefined it cannot really be used. Oil has to be changed into gas, plastic, chemicals, etc. to create a valuable entity that drives profitable activity; likewise, data must be broken down and analyzed for it to have value. • How to extract interesting actionable insights and scientific knowledge?
  • 11. • Data Science – Why all the excitement? • Data Science is the science which uses computer science, statistics, machine learning, visualization and human-computer interaction to collect, clean, integrate, analyze, visualize, and interact with data in order to create data products. • Turn data into data products.
  • 12. • Data Science – Why all the excitement? • Theories and techniques from many fields and disciplines are used to investigate and analyze a large amount of data to help decision makers in many industries such as science, engineering, economics, politics, finance, and education. • Computer Science: pattern recognition, visualization, data warehousing, high performance computing, databases, AI • Mathematics: mathematical modeling • Statistics: statistical and stochastic modeling, probability • Data science (DS) is a multidisciplinary field of study whose goal is to address the challenges in big data.
  • 13. • Data Science – Why all the excitement? (cont.) • Data Science is a blend of tools, algorithms, and machine learning principles with the goal of discovering hidden patterns in raw data. • It focuses on statistical modeling, machine learning, management and analysis of data sets, and data acquisition. • Data Science makes use of several statistical procedures. • These procedures range from data transformations and data modeling to statistical operations (descriptive and inferential statistics) and machine learning modeling. • To gain predictive responses from the models, it is essential to understand the underlying patterns of the data model. Furthermore, optimization techniques can be utilized to meet the business requirements of the user.
  • 14. • Data Science – Why all the excitement? (cont.) • Using various statistical tools, a Data Scientist develops models. With these models, they help their clients in the decision-making process. Furthermore, these models support demand generation initiatives. • Data Science also covers: • Data integration • Distributed architecture • Automating machine learning • Data visualization • Dashboards and BI • Data engineering • Deployment in production mode • Automated, data-driven decisions
  • 15. Example: Search • Google earns around $50 bn/year from marketing, 97% of the company's revenue. • Sponsored search uses an auction: a pure competition among marketers trying to win access to consumers. • In other words, a competition between models of consumers (their likelihood of responding to the ad) and of determining the right bid for the item. • There are around 30 billion search requests a month, and perhaps a trillion events of history between search providers. • Google AdWords and AdSense
  • 16. Data Science Applications • Transaction Databases → Recommender systems (Netflix), Fraud Detection (Security and Privacy) • Wireless Sensor Data → Smart Home, Real-time Monitoring, Internet of Things • Text Data, Social Media Data → Product Review and Consumer Satisfaction (Facebook, Twitter, LinkedIn), E-discovery • Software Log Data → Automatic Troubleshooting (Splunk) • Genotype and Phenotype Data → Epic, 23andMe, Patient-Centered Care, Personalized Medicine
  • 17. • Other Applications • Banking: make smarter decisions through fraud detection, management of customer data, risk modeling, real-time predictive analytics, customer segmentation, etc. • Fraud detection covers credit card, insurance, and accounting fraud. • Banks are able to analyze the investment patterns and cycles of customers and suggest offers that suit them accordingly. • Risk modeling through data science lets banks assess their overall performance. • In real-time and predictive analytics, banks use machine learning algorithms to improve their analytics strategy.
  • 18. Other Applications • Customer sentiment analysis techniques can boost social media interaction, boost feedback, and analyze customer reviews. • Manufacturing: IoT has enabled companies to predict potential problems, monitor systems and analyze the continuous stream of data. • Uber uses data science for price optimization and for providing better experiences to its customers. Using powerful predictive tools, it predicts the price based on parameters like weather patterns, availability of transport, customers, etc.
  • 19. Data • Measurable units of information gathered or captured from the activity of people, places and things. • Data is generated from different sources like financial logs, text files, multimedia forms, sensors, and instruments. • We need to understand: • which data to use • how to organize the data, and so on. • Prepare the structured and unstructured data to be used by the analytics team for model building. • Types of Data • Relational Data (Tables/Transaction/Legacy Data) • Text Data (Web) • Semi-structured Data (XML) • Graph Data • Social Network, Semantic Web (RDF), … • Streaming Data
  • 20. What do we do with the Data? • Aggregation and Statistics • Data warehousing and OLAP • Indexing, Searching, and Querying • Keyword-based search • Pattern matching (XML/RDF) • Knowledge discovery • Data Mining • Statistical Modeling • Example: Data Science • Companies learn your secrets, shopping patterns, and preferences • E.g., can we know if a child likes animation games, even if they don't want us to know? • Building and maintaining a data warehouse is a key skill that a Data Engineer must have.
  • 21. • Data engineers build pipelines which extract data from multiple sources and then manipulate it to make it usable. • Business analytics (BA) is the practice of iterative, methodical exploration of an organization's data, with an emphasis on statistical analysis. Business analytics is used by companies committed to data-driven decision-making. • BA activities must be anchored to a strategically relevant business question to be answered by using data analysis.
  • 22. • Data Science and Business Analytics • Data science or analytics is the process of deriving insights from data in order to make optimal decisions. • Data science and analytics techniques include basic statistics, regressions, simulation and optimization modeling, data mining and machine learning, text analytics, artificial intelligence and visualizations. • Data science focuses on data modelling and data warehousing to track the ever-growing data set. The information extracted through data science applications is used to guide business processes and reach organisational goals.
  • 24. Databases vs. Data Science
      Aspect | Databases | Data Science
      Data Volume | Modest | Massive
      Examples | Bank records, Personnel records, Census, Medical records | Online clicks, GPS logs, Tweets, Building sensor readings
      Priorities | Consistency, Error recovery, Auditability | Speed, Availability, Query richness
      Structured | Strongly (Schema) | Weakly or none (Text)
      Properties | Transactions, ACID* | CAP* theorem (2/3), eventual consistency
      Realizations | SQL | NoSQL: MongoDB, CouchDB, HBase, Cassandra, Riak, Memcached, Apache River, …
  • 25. Business Intelligence (BI) vs. Data Science
      Features | Business Intelligence (BI) | Data Science
      Data Sources | Structured (usually SQL, often Data Warehouse) | Both Structured and Unstructured (logs, cloud data, SQL, NoSQL, text)
      Approach | Statistics and Visualization | Statistics, Machine Learning, Graph Analysis, Natural Language Processing (NLP)
      Focus | Past and Present | Present and Future
      Tools | Pentaho, Microsoft BI, QlikView, R | RapidMiner, BigML, Weka, R
  • 28. Data Science vs. ML vs. AI
      Data Science | ML | AI
      Tools: SAS, Tableau, Apache Spark, MATLAB, SQL | Tools: Amazon Lex, IBM Watson Studio, Microsoft Azure ML Studio | Tools: TensorFlow, Scikit Learn, Keras, Amazon Lex, Google Cloud Platform, DataRobot
      Data Science deals with structured and unstructured data. | Machine Learning uses statistical models. | Artificial Intelligence uses logic and decision trees.
      Fraud detection and healthcare analysis are popular examples of Data Science. | Recommendation systems such as Spotify, and facial recognition, are popular examples. | Chatbots and voice assistants are popular applications of AI.
      The main applications of Data Science are credit card fraud, ATM theft, disease prediction, pattern identification, etc. | The main applications of machine learning are online recommender systems, Google search algorithms, Facebook auto friend tagging suggestions, etc. | The main applications of AI are Siri, customer support using chatbots, expert systems, online game playing, intelligent humanoid robots, etc.
  • 29. • Relationship between Data Science, Artificial Intelligence and Machine Learning • Machine Learning for Predictive Reporting • Studies transactional data to make valuable predictions. • Also known as supervised learning. • Implemented to suggest the most effective courses of action for any company. • Machine Learning for Pattern Discovery • Sets parameters in various data reports. • Unsupervised learning, where there are no pre-decided parameters. • Artificial Intelligence represents a loop of perception, planning, action, and feedback: Perception > Planning > Action > Feedback of Perception • Data Science uses different parts of this pattern or loop to solve specific problems.
  • 30. • For instance, in the first step, i.e. Perception, data scientists try to identify patterns with the help of the data. • In planning, there are two aspects: • Finding all possible solutions • Finding the best solution among all solutions • Machine learning is often treated as a standalone subject, but it is best understood in the context of its environment. • AI is the tool that helps data science get results and solutions for specific problems. However, machine learning is what helps in achieving that goal. • Example: Google's search engine is a product of data science. It uses predictive analysis, a system used by artificial intelligence, to deliver intelligent results to the users.
  • 32. • Tools for Data Science • Reporting and Business Intelligence • Predictive Modelling and Machine Learning • Artificial Intelligence • Data Science Tools for Big Data (Volume) • Data of 1 GB to 10 GB: traditional tools such as Excel, Access, SQL, etc. • >10 GB: Hadoop, Hive • Tools for Handling Variety
  • 33. • Voluminous • Customer feedback may vary in length, sentiment, and other factors. • Examples of SQL databases are Oracle, MySQL, and SQLite, whereas popular NoSQL databases include MongoDB, Cassandra, etc. • These NoSQL databases are seeing huge adoption because of their ability to scale and handle dynamic data.
  • 34. • Tools for Handling Velocity • Velocity is the speed at which the data is captured. • It includes both real-time and non-real-time data. • Examples of real-time data: • Sensor data collected by self-driving cars (automatic actions) • CCTV • Stock trading • Fraud detection for credit card transactions • Network data: social media (Facebook, Twitter, etc.) • Tools: • Apache Kafka: real-time data pipelines • Apache Storm: can process up to 1 million tuples per second and is highly scalable • Amazon Kinesis: licensed and powerful • Apache Flink: high performance, fault tolerance, and efficient memory management
  • 35. • Reporting and BI Tools: Excel, QlikView, Tableau, MicroStrategy, Power BI, Google Analytics, Dundas, Sisense, etc. • Predictive Analytics and Machine Learning Tools: Python, R, Apache Spark, Julia, Jupyter Notebooks • Frameworks for Deep Learning: TensorFlow, PyTorch, Keras and Caffe • AI Tools: AutoKeras, Google Cloud AutoML, IBM Watson, DataRobot, H2O's Driverless AI, and Amazon's Lex • Licensed: SAS, SPSS, MATLAB
  • 36. Lifecycle of Data Science
  • 37. • Role of Data Scientist • Identifying the data-analytics problems that offer the greatest opportunities to the organization • Determining the correct data sets and variables • Collecting large sets of structured and unstructured data from disparate sources • Cleaning and validating the data to ensure accuracy, completeness, and uniformity • Devising and applying models and algorithms to mine the stores of big data • Analyzing the data to identify patterns and trends • Interpreting the data to discover solutions and opportunities • Communicating findings to stakeholders using visualization and other means
  • 38. • Phase 1: Discovery • Understand the various specifications, requirements, priorities and required budget. • This requires the ability to ask the right questions. • Frame the business problem and formulate initial hypotheses (IH) to test. • Phase 2: Data preparation • Data cleaning, transformation, and visualization. This will help you to spot the outliers and establish a relationship between the variables. Tool: R. • Phase 3: Model planning • Determine the methods and techniques to draw the relationships between variables. • These relationships will set the base for the algorithms in the next phase. • Apply Exploratory Data Analysis (EDA) using various statistical formulas and visualization tools.
  • 39. • R has a complete set of modeling capabilities and provides a good environment for building interpretive models. • SQL Analysis services can perform in-database analytics using common data mining functions and basic predictive models. • SAS/ACCESS can be used to access data from Hadoop and is used for creating repeatable and reusable model flow diagrams.
  • 40. • Phase 4: Model building • Develop datasets for training and testing purposes. • Use various learning techniques like classification, association and clustering to build the model. • Examples: 1. Classification (decision trees) 2. Clustering (K-means, Fuzzy C-means, Hierarchical Clustering, DBSCAN) 3. Association rules 4. Advanced supervised machine learning algorithms (Naive Bayes, k-NN, SVM) 5. Intro to ensemble learning algorithms (Random Forest, Gradient Boosting)
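The "develop datasets for training and testing" step of Phase 4 can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name `train_test_split`, the 25% test fraction, and the stand-in records are assumptions for the example and do not come from the slides.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and return (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


records = list(range(20))  # stand-in for 20 labeled examples (hypothetical)
train, test = train_test_split(records, test_fraction=0.25, seed=42)
```

The model is fitted on `train` only, and `test` is held back to check how well it generalizes to data it has not seen.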
  • 42. • Phase 5—Operationalize • Analyzing the data to identify patterns and trends • Interpreting results • deliver final reports, briefings, code and technical documents • pilot project • Phase 6—Communicate results • identify all the key findings, communicate to the stakeholders and determine if the results of the project are a success or a failure
  • 43. • Basic statistics: 1. Random variables, sampling 2. Distributions and statistical measures 3. Hypothesis testing • Overview of linear algebra: 1. Linear algebra and matrix computations 2. Functions, derivatives, convexity • Modeling techniques, regression: 1. Mathematical modeling process 2. Linear regression 3. Logistic regression • Data visualization and visual analytics: 1. Visual analytics 2. Visualizations in Python and visual analytics in IBM Watson Analytics
  • 44. • Data visualization and visual analytics: 1. Visual analytics 2. Visualizations in Python and visual analytics in IBM Watson Analytics • Data mining and machine learning: 1. Classification (decision trees) 2. Clustering (K-means, Fuzzy C-means, Hierarchical Clustering, DBSCAN) 3. Association rules 4. Advanced supervised machine learning algorithms (Naive Bayes, k-NN, SVM) 5. Intro to ensemble learning algorithms (Random Forest, Gradient Boosting) • Simulation modeling: 1. Random number generation 2. Monte Carlo simulations 3. Simulation in IPython
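As an illustration of the "random number generation" and "Monte Carlo simulations" items, here is a small sketch that estimates pi by sampling random points in the unit square. The choice of pi as the target quantity and the name `estimate_pi` are assumptions made for the example; the slides do not specify a particular simulation.

```python
import random


def estimate_pi(n_samples, seed=0):
    """Estimate pi from the share of random points inside the quarter circle."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    # The quarter circle covers pi/4 of the unit square.
    return 4.0 * inside / n_samples


approx = estimate_pi(100_000)
```

The estimate gets closer to pi as `n_samples` grows, with the error shrinking roughly in proportion to one over the square root of the sample count.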
  • 45. • Real time example • Case Study: Diabetes Prevention • What if we could predict the occurrence of diabetes and take appropriate measures beforehand to prevent it? • 1. You can refer to the sample data below. • Step 1: Discovery • Attributes: • npreg – Number of times pregnant • glucose – Plasma glucose concentration • bp – Blood pressure • skin – Triceps skinfold thickness • bmi – Body mass index • ped – Diabetes pedigree function • age – Age • income – Income
  • 46. • Step 2: Data Preparation • Once we have the data, we need to clean and prepare it for analysis. • The data has a lot of inconsistencies like missing values, blank columns, abrupt values and incorrect data formats, which need to be cleaned. • We have organized the data into a single table under different attributes, making it look more structured.
  • 47. • Step 2 (cont.) • This data has a lot of inconsistencies. • In the column npreg, "one" is written in words, whereas it should be in numeric form, like 1. • In column bp, one of the values is 6600, which is impossible (at least for humans), as bp cannot go up to such a huge value. • The income column is blank and also makes no sense in predicting diabetes. • Therefore, it is redundant and should be removed from the table. • Clean and preprocess this data by removing the outliers, filling in the null values and normalizing the data types. This is data preprocessing. • Finally, we get clean data which can be used for analysis.
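The cleaning steps described for the diabetes table can be sketched as follows. This is only an illustration under stated assumptions: the sample rows are made up, the plausibility ceiling `BP_MAX` is a guess rather than a clinical threshold, and implausible readings are treated as missing and then filled with the column median, which is one of several reasonable choices.

```python
from statistics import median

WORD_TO_NUMBER = {"zero": 0, "one": 1, "two": 2, "three": 3}  # illustrative subset
BP_MAX = 300  # assumed plausibility ceiling for bp, not taken from the slides


def clean(rows):
    """Return cleaned copies of rows: numeric npreg, plausible bp, no income."""
    cleaned = []
    for row in rows:
        # Drop the blank income column, which is not used for prediction.
        row = {key: value for key, value in row.items() if key != "income"}
        npreg = row.get("npreg")
        if isinstance(npreg, str):
            text = npreg.strip().lower()
            # Unrecognized words become None and are filled below.
            row["npreg"] = int(text) if text.isdigit() else WORD_TO_NUMBER.get(text)
        bp = row.get("bp")
        if bp is not None and not 0 < bp <= BP_MAX:
            row["bp"] = None  # treat an impossible reading as missing
        cleaned.append(row)
    # Fill missing values with the median of the observed values per column.
    for column in ("npreg", "bp"):
        observed = [r[column] for r in cleaned if r.get(column) is not None]
        fill = median(observed)
        for r in cleaned:
            if r.get(column) is None:
                r[column] = fill
    return cleaned


raw = [  # made-up rows that mimic the problems listed above
    {"npreg": 6, "bp": 72, "income": None},
    {"npreg": "one", "bp": 66, "income": None},
    {"npreg": 8, "bp": 6600, "income": None},
    {"npreg": 1, "bp": None, "income": None},
]
tidy = clean(raw)
```

Dropping the offending rows entirely would be an equally valid reading of "removing the outliers"; filling with the median keeps every patient record at the cost of inventing a plausible value.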
  • 48. • Step 3 Model Planning • load the data into the analytical sandbox and apply various statistical functions • R has functions like describe which gives us the number of missing values and unique values. • We can also use the summary function which will give us statistical information like mean, median, range, min and max values. • Then, we use visualization techniques like histograms, line graphs, box plots to get a fair idea of the distribution of data.
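The slide refers to R's describe and summary functions. A rough Python counterpart, using only the standard library, might look like the sketch below. The helper name `summarize` and the glucose readings are hypothetical and serve only to show which figures such a summary reports.

```python
from statistics import mean, median


def summarize(values):
    """Report missing and unique counts plus basic statistics for one column."""
    present = [v for v in values if v is not None]
    return {
        "n_missing": len(values) - len(present),
        "n_unique": len(set(present)),
        "min": min(present),
        "max": max(present),
        "range": max(present) - min(present),
        "mean": mean(present),
        "median": median(present),
    }


glucose = [148, 85, None, 89, 137, 85]  # made-up readings with one missing value
stats = summarize(glucose)
```

A gap between the mean and the median, as in this sample, is a quick hint that the distribution is skewed and worth plotting as a histogram or box plot.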
  • 49. • Step 4: Model Building • Use a supervised learning technique to build the model here.
  • 51. • Step 5: Deliver the Model • Check with sample data. • Data ○ Data tables and data types ○ Operations on tables ○ Basic plotting ○ Tidy data / the ER model ○ Relational Operations ○ SQL wrangling ○ Data acquisition (load and scrape) ○ EDA Vis / grammar of graphics ○ Data cleaning (text, dates) ○ EDA: Summary statistics ○ Data analysis with optimization (derivatives) ○ Data transformations ○ Missing data
  • 52. • Modeling ○ Univariate probability and statistics ○ Hypothesis testing ○ Multivariate probability and statistics (joint and conditional probability, Bayes' theorem) ○ Data Analysis with geometry (vectors, inner products, gradients and matrices) ○ Linear regression ○ Logistic regression ○ Gradient descent (batch and stochastic) ○ Trees and random forests ○ K-NN ○ Naïve Bayes ○ Clustering ○ PCA
  • 53. • Sample Algorithms for Data Science analytics • Regression • The most popular technique for regression is least squares. This method calculates the best-fitting line. • Predictions are based on historical data. • Examples: • Weather forecasting • Assessing risk • Tools: • TensorFlow and PyTorch
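For a single input variable, the least squares best-fitting line has a closed-form solution: the slope is the covariance of x and y divided by the variance of x, and the intercept makes the line pass through the point of means. The sketch below shows this in plain Python; the function name `fit_line` and the sample points are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least squares line through the points."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Points that lie exactly on y = 2x + 1, so the fit should recover that line.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
```

With noisy historical data the points will not fall on a line, and the same formulas return the line that minimizes the sum of squared vertical distances.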
  • 54. • Logistic Regression • Logistic regression is similar to linear regression, but it is used when the output is binary (i.e. when outcome can have only two possible values). The prediction for this final output will be a non-linear S-shaped function called the logistic function, g(). • Graph of a logistic regression curve showing probability of passing an exam versus hours studying
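The S-shaped logistic function mentioned above maps any real number to a value between 0 and 1, which is why it can be read as a probability. Here is a minimal sketch applied to the exam example; the intercept and slope are hypothetical values chosen for illustration, not coefficients fitted to real data.

```python
import math


def logistic(z):
    """Logistic (sigmoid) function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def pass_probability(hours, intercept=-4.0, slope=1.5):
    """Probability of passing after `hours` of study (hypothetical coefficients)."""
    return logistic(intercept + slope * hours)


p_short = pass_probability(1)  # little study time
p_long = pass_probability(4)   # more study time
```

To turn the probability into a binary outcome, a threshold such as 0.5 is usually applied: predict "pass" when the probability is at or above it, "fail" otherwise.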
  • 55. • Decision Trees • Decision Trees can be used for both regression and classification tasks. • Categorical Variable Decision Tree: predicts whether a customer will pay the renewal premium to an insurance company (yes/no). • Continuous Variable Decision Tree: predicts customer income based on occupation, product, and various other variables. • Examples: C4.5, CART • Naive Bayes • A classification technique. • It measures the probability of each class, and the conditional probability of each class given each value of x. This algorithm is used for classification problems, for example to reach a binary yes/no outcome.
  • 56. • Examples: Text classification, Spam Filtering, Sentiment Analysis, Recommendation System • Types: Gaussian Naive Bayes, Multinomial Naive Bayes, Bernoulli Naive Bayes • SVM • KNN • K-means • Dimensionality Reduction
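To make the Naive Bayes idea concrete for spam filtering, the following sketch implements a small multinomial Naive Bayes classifier over word counts. It combines the class probability with the conditional probability of each word given the class, as described above. The four training messages, the labels "spam" and "ham", and the use of add-one (Laplace) smoothing are assumptions for the example.

```python
import math
from collections import Counter, defaultdict


def train(docs):
    """docs is a list of (words, label) pairs; returns the fitted counts."""
    class_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, label in docs:
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab


def predict(model, words):
    """Return the label with the highest log posterior for the given words."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior of the class
        total_words = sum(word_counts[label].values())
        for word in words:
            if word not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothing avoids zero probabilities for unseen pairs.
            likelihood = (word_counts[label][word] + 1) / (total_words + len(vocab))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


training = [  # tiny made-up corpus
    ("win money now".split(), "spam"),
    ("free money offer".split(), "spam"),
    ("meeting schedule today".split(), "ham"),
    ("project meeting notes".split(), "ham"),
]
model = train(training)
label_a = predict(model, ["free", "money"])
label_b = predict(model, ["meeting", "notes"])
```

Working with log probabilities rather than multiplying raw probabilities keeps the scores numerically stable when messages contain many words.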
  • 57. • ANN (Artificial Neural Networks) • Feed-forward networks: multilayer perceptrons • Convolutional neural networks: classification, object detection, or even image segmentation • Hierarchical object extractors
  • 58. What do Data Scientists do? • National Security • Cyber Security • Business Analytics • Engineering • Healthcare • And more ….
  • 59. A Data Scientist must possess • Mathematics and Applied Mathematics • Applied Statistics/Data Analysis • Solid Programming Skills (R, Python, Julia, SQL) • Data Mining • Database Storage and Management • Machine Learning and discovery
  • 60. • Data Science Research Areas • Machine learning • Artificial intelligence • Deep learning • Databases • Statistics • Optimization • Natural language processing • Computer vision • Speech processing • Privacy • Ethics • Energy consumption • Cloud computing • IoT • Cloud • Social Media • Blockchain, etc.
  • 62. • Future of Data Science and Analytics