This document provides an introduction to data science. It discusses why data science is important and covers key techniques like statistics, data mining, and visualization. It also reviews popular tools and platforms for data science like R, Hadoop, and real-time systems. Finally, it discusses how data science can be applied across different business domains such as financial services, telecom, retail, and healthcare.
4. Volume
Data is being acquired from a variety of sources:
● EFT in banks, credit card payments
● Cell phones
● Sensors attached to a variety of equipment
● Surveillance cameras, CCTV
● Social media updates
● Blogs
● Websites
prithwis mukerjee, ph.d.
5. Variety / Velocity
Variety
● Numeric data
● Structured text data
● Unstructured text data
● Images
● Sound and video recordings
● Graph nodes
○ Social media “friends”
○ Websites linked to each other
Velocity
Data is being generated fast and is becoming obsolete or useless equally fast.
● Real-time (or near-real-time) data from sensors, cameras
● Website traffic
● Social media “trends”
6. So what is Big Data?
● Volume
● Velocity
● Variety?
A new term coined by IT vendors to push new technology like
● Map Reduce
● Hadoop
● NoSQL
A new way to
● collect
● store
● manage
● analyse
● visualise data
7. Big Data is like Crude Oil {not new Oil}
Think of data as crude oil!
Big Data is like extracting the crude oil, transporting it in mega tankers, pumping it through pipelines and storing it in massive silos.
But what about refining?
8. The Science (and Art) of Data
Think of data as crude oil!
Big Data is like extracting the crude oil, transporting it in mega tankers, pumping it through pipelines and storing it in massive silos.
Data Science = Refining
● Discovering what we do not know about the data
● Obtaining predictive, actionable insight
● Creating data products that have business impacts
● Communicating relevant business stories
10. 10 Things {most} Data Scientists Do ...
1. Ask good questions: What is what? We do not know! We would like to know.
2. Define and test hypotheses, run experiments
3. Scoop, scrape, sample business data
4. Wrestle and tame data
5. Play with data, discover unknowns
6. Create models, algorithms
7. Understand data relationships
8. Tell the machine how to learn from the data
9. Create data products that deliver actionable insights
10. Tell relevant business stories from data
11. Statistics - World of Data
● Data comes in various types
○ Nominal - colour, gender, PIN code
○ Ordinal - scale of 1-10, {high, medium, low}
○ Interval - dates, temperature (Centigrade)
○ Ratio - length, weight, count
● Data comes in various structures
○ Structured data - nominal, ordinal, interval, ratio
○ Unstructured text - email, tweets, reviews
○ Images, voice prints
○ Graphs, networks - social media friendships, likes
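The following minimal Python sketch makes the four measurement levels concrete. It uses small, made-up samples, and every value in it is hypothetical. It shows which summary each level supports:
● the mode for nominal data
● the median for ordinal data, once the labels are ranked
● differences and means for interval data
● genuine ratios only for ratio data

The Celsius-to-kelvin conversion illustrates why 20 °C is not "twice as hot" as 10 °C.

```python
from statistics import mean, median, mode

# Hypothetical sample data, one list per measurement level
colours = ["red", "blue", "red", "green"]            # nominal: unordered labels
feedback = ["low", "high", "medium", "high", "low"]  # ordinal: ordered labels
temps_c = [10.0, 20.0, 30.0]                         # interval: arbitrary zero point
weights_kg = [2.0, 4.0, 8.0]                         # ratio: true zero point

# Nominal: only equality is meaningful, so the mode is the natural summary
most_common_colour = mode(colours)

# Ordinal: labels can be ranked, so the median rank is meaningful (the mean is not)
rank = {"low": 1, "medium": 2, "high": 3}
label_of = {r: label for label, r in rank.items()}
median_feedback = label_of[median(rank[f] for f in feedback)]

# Interval: differences and means are meaningful ...
mean_temp_c = mean(temps_c)
warmer_by_c = temps_c[1] - temps_c[0]
# ... but ratios are not: converting to kelvin (which has a true zero)
# shows 20 C is only about 3.5% "hotter" than 10 C, not twice as hot
kelvin_ratio = (temps_c[1] + 273.15) / (temps_c[0] + 273.15)

# Ratio: a true zero makes "twice as heavy" a meaningful statement
weight_ratio = weights_kg[1] / weights_kg[0]

print(most_common_colour, median_feedback, mean_temp_c, warmer_by_c,
      round(kelvin_ratio, 3), weight_ratio)
```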
13. Statistics: The Path Ahead
Probability, Distributions → Testing of Hypothesis → Regression, Testing → Predictive Analysis
14. Data Mining / Machine Learning
Is the process of obtaining
● novel
● valid
● potentially useful
● understandable
patterns in data.
Typical tasks are
● classification
● clustering
● association rules
● sequential patterns
● regression
● deviation detection
15. Some definitions
Instance (an item or record)
● an observation that is characterised by a number of attributes
○ person - with attributes like age, salary, qualification
○ sale - with product, quantity, price
Attribute
● a measurable characteristic of an instance
Class
● a grouping of instances into categories, such as
○ acceptable, not acceptable
○ mammal, fish, bird
Nominal
● colour, PIN code, state
Ordinal
● ranking: tall, medium, short, or feedback on a scale of 1-10
Ratio
● length, price, duration, quantity
Interval
● date, temperature
16. Data Mining: Classification
Classification
● Which loan applicant will not default on the loan?
● Which potential customer will respond to a mailer campaign?
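The slide does not name a classification algorithm. The sketch below uses k-nearest neighbours purely as an illustration of the loan-default question. The applicant records, the two attributes (income and debt-to-income ratio) and the outcome labels are all hypothetical. Attributes are min-max scaled first so that income, whose numbers are much larger, does not dominate the distance.

```python
import math
from collections import Counter

# Hypothetical past applicants: (income in $1000s, debt-to-income ratio) -> outcome
history = [
    ((85.0, 0.10), "repaid"),
    ((62.0, 0.25), "repaid"),
    ((70.0, 0.20), "repaid"),
    ((30.0, 0.60), "defaulted"),
    ((25.0, 0.55), "defaulted"),
    ((48.0, 0.45), "defaulted"),
]

def scale(features, lows, highs):
    """Min-max scale each attribute to [0, 1] so no attribute dominates the distance."""
    return tuple((f - lo) / (hi - lo) for f, lo, hi in zip(features, lows, highs))

def classify(applicant, history, k=3):
    """Label a new applicant by majority vote of its k nearest past applicants."""
    columns = list(zip(*(features for features, _ in history)))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    target = scale(applicant, lows, highs)
    by_distance = sorted(
        history,
        key=lambda item: math.dist(scale(item[0], lows, highs), target),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

print(classify((78.0, 0.15), history))  # repaid
print(classify((28.0, 0.58), history))  # defaulted
```

With real data one would normally hold out a test set to measure accuracy before trusting such a classifier.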
18. Data Mining: Clustering
Given a set of unclassified data points, how do we find a natural grouping within them?
● Can we segment the market in some way that is not yet known?
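k-means is one widely used way to find such natural groupings, although the slide itself does not prescribe an algorithm. The minimal sketch below assumes numeric points and a chosen number of clusters k. The points here are hypothetical customers described by monthly visits and average basket size. For reproducibility it seeds the centroids with the first k points; practical implementations usually use smarter seeding such as k-means++.

```python
import math

def kmeans(points, k, max_iterations=100):
    """Plain k-means: assign each point to its nearest centroid, then move
    each centroid to the mean of its assigned points, until nothing changes."""
    centroids = [tuple(p) for p in points[:k]]  # deterministic seeding: first k points
    clusters = []
    for _ in range(max_iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        new_centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: assignments no longer move centroids
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical customers: (visits per month, average basket in $10s)
customers = [(1.0, 9.0), (8.0, 2.0), (1.5, 8.0), (2.0, 9.5), (9.0, 1.0), (8.5, 2.5)]
centroids, clusters = kmeans(customers, k=2)
for centre, members in zip(centroids, clusters):
    print([round(c, 2) for c in centre], members)
```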
19. Example of Document Clustering
Clustering points: 3204 articles from the Los Angeles Times
Similarity measure: how many words the documents have in common (after excluding some common words)
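The word-overlap similarity measure described above can be sketched in a few lines of Python. The stop-word list and the three headlines are hypothetical. The slide does not state the exact tokenisation or stop-word list used in the original study.

```python
import re

# A tiny, hypothetical stop-word list; real systems use much larger lists
STOP_WORDS = {"the", "a", "an", "and", "of", "in", "to", "is", "on", "for"}

def content_words(text):
    """Lower-case the text, split it into words and drop common stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return {w for w in words if w not in STOP_WORDS}

def similarity(doc_a, doc_b):
    """Number of distinct non-stop words the two documents share."""
    return len(content_words(doc_a) & content_words(doc_b))

headline_a = "The Lakers won the game in overtime"
headline_b = "Overtime win for the Lakers in a close game"
headline_c = "The city council approved the new budget"

print(similarity(headline_a, headline_b))  # 3 (lakers, game, overtime)
print(similarity(headline_a, headline_c))  # 0
```

A raw overlap count favours long documents. Normalised measures such as Jaccard or cosine similarity are common alternatives.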
20. Clustering of S&P Stock Data
● Observe stock movements every day.
● Clustering points: Stock{UP/DOWN}
● Similarity measure: two points are more similar if the events described by them frequently happen together on the same day.
● We used association rules to quantify a similarity measure.
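The slide does not say exactly how the association rules were turned into a similarity score. One simple choice, assumed here, is to average the confidence of the rules x → y and y → x. The confidence of x → y is the fraction of days with event x on which event y also occurred. The event names (A_UP, B_UP and so on) and the four trading days are hypothetical.

```python
# Hypothetical trading days: each set holds the events observed that day
days = [
    {"A_UP", "B_UP", "C_DOWN"},
    {"A_UP", "B_UP", "C_UP"},
    {"A_DOWN", "B_DOWN", "C_UP"},
    {"A_UP", "B_DOWN", "C_DOWN"},
]

def confidence(x, y, days):
    """Confidence of the rule x -> y: share of days with event x that also have event y."""
    days_with_x = [d for d in days if x in d]
    if not days_with_x:
        return 0.0
    return sum(y in d for d in days_with_x) / len(days_with_x)

def event_similarity(x, y, days):
    """Assumed similarity: the average confidence of the rules x -> y and y -> x."""
    return (confidence(x, y, days) + confidence(y, x, days)) / 2

print(round(event_similarity("A_UP", "B_UP", days), 3))  # 0.833
print(round(event_similarity("A_UP", "C_UP", days), 3))  # 0.417
```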
21. Regression
● Predict the value of a given continuous-valued variable based on the values of other variables, assuming a linear or nonlinear model of dependency.
○ Extensively studied in the statistics and neural network fields.
● Examples:
○ Predicting sales amounts of a new product based on advertising expenditure.
○ Predicting wind velocities as a function of temperature, humidity, air pressure, etc.
○ Time series prediction of stock market indices.
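The simplest case is one predictor and a linear model, for which ordinary least squares has a closed form:
slope = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)²
intercept = ȳ - slope · x̄

The sketch below applies it to the sales-versus-advertising example. The figures are hypothetical, with both advertising and sales in $1000s.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (intercept, slope) of y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical history: advertising spend and sales of a product, both in $1000s
advertising = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.1, 4.9, 7.2, 8.8, 11.0]

intercept, slope = fit_line(advertising, sales)
predicted = intercept + slope * 6.0  # forecast sales for a $6000 advertising budget
print(round(intercept, 2), round(slope, 2), round(predicted, 2))  # 1.09 1.97 12.91
```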
22. Data Mining: Association Rules Mining
Association Rules
● which products should be kept along with other products
● which two products should never be discounted together
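Association rules are usually scored by two measures:
● support: how often the items appear together
● confidence: how often the right-hand item appears when the left-hand item does

The minimal sketch below works over hypothetical shopping baskets and considers only single-item rules between pairs of products. Full algorithms such as Apriori extend the same counting to larger itemsets. The thresholds 0.4 and 0.7 are arbitrary choices for the example.

```python
from itertools import combinations

# Hypothetical shopping baskets
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "cereal"},
    {"bread", "butter", "jam"},
]

def support(itemset, baskets):
    """Fraction of baskets containing every item in the itemset."""
    return sum(itemset <= b for b in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Estimated P(consequent | antecedent) = support(A and C) / support(A)."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)

def pair_rules(baskets, min_support=0.4, min_confidence=0.7):
    """List rules {a} -> {b} between item pairs that clear both thresholds."""
    items = sorted(set().union(*baskets))
    rules = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b}, baskets)
        if pair_support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            conf = confidence({lhs}, {rhs}, baskets)
            if conf >= min_confidence:
                rules.append((lhs, rhs, pair_support, conf))
    return rules

for lhs, rhs, supp, conf in pair_rules(baskets):
    print(f"{lhs} -> {rhs}: support={supp:.2f}, confidence={conf:.2f}")
```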
25. Definitions
Data Mining
● Is the process of extracting unknown, valid and actionable information from large databases and using this to make business decisions
● Non-trivial process of identifying valid, novel, potentially useful and understandable / explainable patterns in data
Data Science is a rare combination of multiple skills that include
● Technology: obviously!
but also
● Curiosity - a desire to go below the surface and discover a hypothesis that can be tested
● Storytelling - create a business story around the data
● Cleverness - again obviously, to look at the problem from different angles
30. Map Reduce
● Input: a set of (key, value) pairs
● User supplies two functions
○ Map (k, v) => List(k1, v1)
○ Reduce (k1, list(v1)) => v2
● Output is the set of (k1, v2) pairs
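The Map and Reduce signatures above can be simulated in a single Python process with the classic word-count example. This is only a sketch of the data flow (map, group by key, reduce). It is not the Hadoop API, and a real framework would distribute each step across many machines.

```python
from collections import defaultdict

def map_fn(doc_id, text):
    """Map (k, v) => List(k1, v1): emit (word, 1) for every word in a document."""
    return [(word, 1) for word in text.lower().split()]

def reduce_fn(word, counts):
    """Reduce (k1, list(v1)) => v2: add up the counts for one word."""
    return sum(counts)

def map_reduce(inputs, mapper, reducer):
    """Run map, group-by-key ("shuffle") and reduce sequentially in one process."""
    grouped = defaultdict(list)
    for key, value in inputs:
        for k1, v1 in mapper(key, value):
            grouped[k1].append(v1)
    return {k1: reducer(k1, v1s) for k1, v1s in grouped.items()}

documents = [("doc1", "big data is big"), ("doc2", "data science is refining data")]
print(map_reduce(documents, map_fn, reduce_fn))
# {'big': 2, 'data': 3, 'is': 2, 'science': 1, 'refining': 1}
```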
31. Hadoop
A programming framework that allows you to run Map-Reduce jobs on a distributed cluster of low-cost machines without having to bother about anything except
● the Map and Reduce functions
● loading data into HDFS
1. HIVE
a. A plug-in that allows one to use SQL-like queries that are converted into map-reduce jobs
2. PIG
a. A scripting language for writing long queries
3. HBASE
a. A non-relational DBMS
4. SQOOP
a. Moves data to and from HDFS
35. Conclusion
● Why data science?
● Techniques
○ Statistics
○ Data Mining
○ Visualisation
● Tools & Platforms
○ R
○ Hadoop / MapReduce
○ Real Time Systems
● Business Domains
37. Thank You
Contact
Prithwis Mukerjee
Professor, Praxis Business School
http://blog.yantrajaal.com
prithwis@praxis.ac.in
This presentation is accessible at the blog, at the following URL: http://bit.ly/pm-datascience