What is Data Science?
• From the ACM Task Force on Data Science Curricula: data science draws from many different disciplines.
• Statistics
– a central component of data science. Why?
– Statistics studies how to draw robust conclusions from incomplete
information.
• Computing
– is a central component because
– programming allows us to apply analysis techniques to the large and diverse
data sets that arise in real-world applications:
– not just numbers, but text, images, videos, and sensor readings.
• Domain Knowledge
– Through understanding a particular domain, data scientists learn to ask
appropriate questions about their data and correctly interpret the answers
provided by our inferential and computational tools.
Data science is all of these things, but it is more than the sum of its parts
because of the applications.
What is Foundations of Data Science?
• Drawing useful conclusions from data using computation
• Exploration
– Identifying patterns in information
– Uses visualizations
• Inference
– Quantifying whether those patterns are reliable
– Uses randomization
– Quantifying our degree of certainty: will the patterns we found
also appear in new observations? How accurate are our predictions?
• Prediction
– Making informed guesses
– Uses machine learning
• A foundation in DS requires
– not just understanding statistical and computational techniques,
– but also recognizing how they apply to real scenarios.
• Whatever aspect of the world we wish to study (Earth’s weather, markets, polls),
– the data we collect typically offer an incomplete description of the subject at hand.
– A central challenge of data science is to draw reliable conclusions from partial information.
• In this endeavor, we combine two essential tools: computation and randomization.
– Computing skills
• allow us to use all available information to draw conclusions.
• Rather than focusing only on the average temperature of a region, we will consider the
whole range of temperatures together to construct a more nuanced analysis.
– Randomness
• allows us to consider the many different ways in which incomplete information might be
completed.
• Rather than assuming that temperatures vary in a particular way, we will learn to use
randomness as a way to imagine many possible scenarios that are all consistent with the
data we observe.
Data Science
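One way to "imagine many possible scenarios" consistent with observed data is bootstrap resampling: draw new samples, with replacement, from the observations and see how much a statistic such as the mean varies. The sketch below is only an illustration, not the course's own code; the temperature values, the function names, and the choice of 1000 resamples are all made up for this example.

```python
import random
import statistics


def bootstrap_means(observations, n_resamples=1000, seed=0):
    """Return the mean of each resample drawn with replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(observations)
    return [
        statistics.mean(rng.choices(observations, k=n))
        for _ in range(n_resamples)
    ]


def percentile_interval(values, lower=2.5, upper=97.5):
    """Return the values found at the given percentiles of the sorted list."""
    ordered = sorted(values)
    lo_index = int(len(ordered) * lower / 100)
    hi_index = min(int(len(ordered) * upper / 100), len(ordered) - 1)
    return ordered[lo_index], ordered[hi_index]


# Hypothetical daily temperatures (degrees Celsius), invented for illustration.
temperatures = [12.1, 14.3, 9.8, 15.2, 11.7, 13.9, 10.4, 16.0, 12.8, 14.1]

means = bootstrap_means(temperatures)
low, high = percentile_interval(means)
print("observed mean:", statistics.mean(temperatures))
print("range covering the middle 95% of resampled means:", low, "to", high)
```

Each resampled mean is one "possible scenario"; the spread of those means gives a rough sense of how uncertain the observed average is.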
Statistical Techniques
• The discipline of statistics has long addressed the same
fundamental challenge as data science: how to draw robust
conclusions about the world using incomplete information.
• An important contribution of statistics: a consistent and
precise vocabulary for describing the relationship between
observations and conclusions.
• Following the same tradition, we focus on a set of core inferential
problems from statistics:
– testing hypotheses,
– estimating confidence, and
– predicting unknown quantities.
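As a small illustration of the first of these problems, testing hypotheses, here is a minimal sketch of a permutation test for a difference in group means. It is an assumption-laden example rather than the course's method: the two groups of measurements are invented, and the function name and number of shuffles are arbitrary choices.

```python
import random
import statistics


def permutation_test(group_a, group_b, n_permutations=5000, seed=0):
    """Estimate how often shuffled labels give a mean gap at least as large as observed."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign group labels at random
        gap = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        if gap >= observed:
            at_least_as_extreme += 1
    return observed, at_least_as_extreme / n_permutations


# Hypothetical measurements for two groups, invented for illustration.
group_a = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2]
group_b = [12.0, 11.8, 12.3, 12.1, 11.9, 12.2]

observed_gap, p_value = permutation_test(group_a, group_b)
print("observed gap in means:", observed_gap)
print("estimated p-value:", p_value)
```

A small estimated p-value suggests that a gap this large would rarely arise if the group labels were unrelated to the measurements.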
Data Science…Goes beyond Statistics
• Data science extends the field of statistics by taking full advantage of
– computing,
– data visualization,
– machine learning,
– optimization, and
– access to information.
• The combination of fast computers and the Internet gives anyone the ability to
access and analyze vast datasets:
– millions of news articles,
– full encyclopedias,
– databases for any domain, and
– massive repositories of music, photos, and video.
• Challenge: real data often do not follow regular patterns or match standard
equations.
• The interesting variation in real data can be lost by focusing too much
attention on simplistic summaries such as average values.
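A quick sketch of why an average alone can mislead: the two made-up sets of readings below share the same mean but differ widely in spread. The data and the `summarize` helper are hypothetical and exist only for this illustration.

```python
import statistics

# Hypothetical temperature readings for two regions, invented for illustration.
region_a = [19, 20, 20, 21, 20]
region_b = [5, 35, 10, 30, 20]


def summarize(values):
    """Return the mean together with simple measures of spread."""
    return {
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


summary_a = summarize(region_a)
summary_b = summarize(region_b)
print("region A:", summary_a)
print("region B:", summary_b)
```

Both regions report the same mean, yet region B swings across a far wider range, which the mean by itself hides.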
Tools
• Recommended: the Anaconda distribution, which packages
together the Python 3 language interpreter, IPython libraries,
and the Jupyter notebook environment.
• A complete introduction to all of these computational tools is included.
• You will learn to write programs, generate images from data, and
work with real-world data sets that are published online.
Course Structure
FOUNDATIONS OF DATA SCIENCE 3-0-0-3
• Unit-1
Introduction, Causality and Experiments, Data Preprocessing: Data cleaning, Data reduction, Data
transformation, Data discretization. Visualization and Graphing: Visualizing Categorical Distributions, Visualizing
Numerical Distributions, Overlaid Graphs, plots, and summary statistics of exploratory data analysis,
Randomness, Probability, Introduction to Statistics, Sampling, Sample Means and Sample Sizes.
• Unit-2
Descriptive statistics – Central tendency, dispersion, variance, covariance, kurtosis, five-point summary,
Distributions, Bayes’ Theorem, Error Probabilities, Permutation Testing, Statistical Inference, Hypothesis Testing,
Assessing Models, Decisions and Uncertainty, Comparing Samples, A/B Testing, P-Values, Causality.
• Unit-3
Estimation, Prediction, Confidence Intervals, Inference for Regression, Classification, Graphical Models,
Updating Predictions.
TEXT BOOKS/ REFERENCES:
• Ani Adhikari and John DeNero, “Computational and Inferential Thinking: The Foundations of Data Science”,
e-book.
• Galit Shmueli, Peter C. Bruce, Inbal Yahav, Nitin R. Patel and Kenneth C. Lichtendahl Jr., “Data Mining for
Business Analytics: Concepts, Techniques and Applications in R”, Wiley India, 2018.
• Rachel Schutt and Cathy O’Neil, “Doing Data Science”, O’Reilly, First Edition, 2013.
Course Outcomes
• CO1: Understand the statistical foundations of data science.
• CO2: Learn techniques to pre-process raw data so as to enable further analysis.
• CO3: Conduct exploratory data analysis and create insightful visualizations to identify patterns.
• CO4: Introduce machine learning algorithms for prediction/classification and to derive insights.
• CO5: Analyze the degree of certainty of predictions using statistical tests and models.
Evaluation Pattern (70:30)
• Internal – 70
– PT1 & PT2 (max 50 marks) - weightage 30 marks (15 each)
• Theory - 25 marks; lab exercise - 25 marks
• Exam in lab (2-hour test + 1 hour for evaluation)
– Continuous Evaluation (weightage 40 marks)
• Labs – 4 evaluations (10 marks each)
• Final Examination – 30
– End Semester Examinations (max 50 marks) - weightage 30 marks
• Theory - 25 marks; lab exercise - 25 marks
• Exam in lab (2-hour test + 1 hour for evaluation)
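The weightages above can be turned into a small worked calculation. This is only a sketch under stated assumptions: it assumes PT1, PT2 and the end-semester exam are each marked out of 50 and scaled linearly to their weightage, and that the four lab evaluations are each marked directly out of 10. The function name `final_score` is hypothetical.

```python
def final_score(pt1, pt2, lab_scores, end_semester):
    """Combine component marks into a total out of 100 (assumes linear scaling)."""
    if len(lab_scores) != 4:
        raise ValueError("expected exactly 4 lab evaluations")
    periodical = (pt1 / 50) * 15 + (pt2 / 50) * 15  # 15 marks each, 30 in total
    continuous = sum(lab_scores)                    # 4 labs x 10 marks = 40
    final_exam = (end_semester / 50) * 30           # 30 marks
    return periodical + continuous + final_exam


# Hypothetical marks, invented for illustration.
total = final_score(pt1=40, pt2=35, lab_scores=[8, 9, 7, 10], end_semester=42)
print("total out of 100:", total)
```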
Instructions to Students
• FoDS will be handled in a flipped-classroom
format.
• Students are expected to bring their laptops to
the classroom.
Let’s get started!
Explore statistics for two classic novels
• The Adventures of Huckleberry Finn by Mark Twain, and Little
Women by Louisa May Alcott.
• Books published before 1923 are currently in the public
domain. Project Gutenberg is a website that publishes public
domain books online.
• Using Python, we can load the text of these books directly from the
web.
• First, read the text of both books into lists of chapters,
called huck_finn_chapters and little_women_chapters.
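The loading step described above might look roughly like the sketch below. The slide does not give the URLs or the chapter marker, so the two URLs, the "CHAPTER " marker, and the helper names `read_url`, `split_chapters` and `load_books` are assumptions; a real file may also need the `skip` argument to drop table-of-contents entries.

```python
import re
from urllib.request import urlopen

# Assumed locations of the plain-text books (not stated on the slide).
HUCK_FINN_URL = "https://www.inferentialthinking.com/data/huck_finn.txt"
LITTLE_WOMEN_URL = "https://www.inferentialthinking.com/data/little_women.txt"


def read_url(url):
    """Download a text file and collapse each run of whitespace to one space."""
    with urlopen(url) as response:
        raw = response.read().decode("utf-8")
    return re.sub(r"\s+", " ", raw)


def split_chapters(text, marker="CHAPTER ", skip=0):
    """Split on the chapter marker, dropping the front matter and `skip` further pieces."""
    pieces = text.split(marker)
    return pieces[1 + skip:]


def load_books():
    """Fetch both novels (needs network access) and return their chapter lists."""
    huck_finn_chapters = split_chapters(read_url(HUCK_FINN_URL))
    little_women_chapters = split_chapters(read_url(LITTLE_WOMEN_URL))
    return huck_finn_chapters, little_women_chapters
```

Calling `load_books()` in a notebook would return the two lists; checking the first few entries is a good way to decide whether a non-zero `skip` is needed for a particular file.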
