DATA SCIENCE: The Highest-Paid Job of the Future
Presented By:
Dr. Hemant Kumar Singh
Associate Professor & Head
Deptt. of Computer Science & Engineering
DATA SCIENCE
• Data Science is a new term, but only in the sense that the continent Columbus discovered was "NEW": it existed long before it got that name.
• A multi-disciplinary field that uses scientific methods,
processes, algorithms and systems to extract knowledge and
insights from structured and unstructured data.
• Google knows more about us than our parents do.
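To make "structured" versus "unstructured" data concrete, here is a minimal Python sketch. The sample CSV and review text are hypothetical. The structured data can be summarized directly by column. The unstructured text has to be split into words and filtered first, and only then can it be counted.

```python
import csv
import io
from collections import Counter

# Structured data: rows and columns with a fixed schema (hypothetical sample).
SALES_CSV = """region,amount
North,120
South,80
North,200
"""

# Unstructured data: free text with no schema (hypothetical sample).
REVIEWS = "Great phone. Great camera, but the battery is weak. Camera is great!"

# A tiny, hypothetical stop-word list so common filler words are not counted.
STOPWORDS = {"the", "is", "but", "a"}


def total_by_region(csv_text):
    """Sum the 'amount' column for each region in the CSV text."""
    totals = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["region"]] = totals.get(row["region"], 0) + int(row["amount"])
    return totals


def top_words(text, n=2):
    """Return the n most frequent words, lowercased, without punctuation or stop words."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    return Counter(w for w in words if w and w not in STOPWORDS).most_common(n)


print(total_by_region(SALES_CSV))  # {'North': 320, 'South': 80}
print(top_words(REVIEWS))          # [('great', 3), ('camera', 2)]
```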
DATA ALL AROUND
Lots of data is being collected and warehoused
• Scientific experiments
• Internet of Things
• Web data, e-commerce
• Financial transactions, bank/credit transactions
• Online trading and purchasing
• Social networks
BIG DATA?
[Figure: data volume growing about 50 times between 2010 and 2020, to 35 ZB]
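The chart's two numbers imply a yearly growth rate. The sketch below makes two simplifying assumptions that the slide does not state: the 50 times factor covers the whole 2010 to 2020 period, and growth compounded evenly each year.

```python
import math


def implied_growth(final_zb, factor, years):
    """Back out the start volume, compound annual growth rate and doubling time."""
    start_zb = final_zb / factor                 # volume at the start of the period
    annual_rate = factor ** (1 / years) - 1      # compound annual growth rate
    doubling_years = math.log(2) / math.log(1 + annual_rate)
    return start_zb, annual_rate, doubling_years


start, rate, doubling = implied_growth(35, 50, 2020 - 2010)
print(f"2010 volume: {start:.1f} ZB")          # 2010 volume: 0.7 ZB
print(f"Annual growth: {rate:.1%}")             # Annual growth: 47.9%
print(f"Doubling time: {doubling:.1f} years")   # Doubling time: 1.8 years
```

Under these assumptions, the world's data roughly doubles every 21 months.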
SOURCES OF BIG DATA
• Black box data
• IoT
• Social media
• Transactions
• Call record data
• ...many more!
• 12+ TBs of tweet data every day
• 25+ TBs of log data every day
• ? TBs of data every day
• 2+ billion people on the Web by the end of 2011
• 30 billion RFID tags today (1.3B in 2005)
• 4.6 billion camera phones worldwide
• 100s of millions of GPS-enabled devices sold annually
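The daily volumes above are easier to picture as continuous rates. A minimal sketch, assuming decimal (SI) units, with 1 TB = 10^12 bytes and 1 MB = 10^6 bytes:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds
TB = 10 ** 12                   # decimal terabyte (assumption: SI units)
MB = 10 ** 6                    # decimal megabyte


def per_second_mb(tb_per_day):
    """Convert a daily volume in TB into an average rate in MB per second."""
    return tb_per_day * TB / SECONDS_PER_DAY / MB


for label, tb in [("tweet data", 12), ("log data", 25)]:
    print(f"{label}: {per_second_mb(tb):.1f} MB/s")
# tweet data: 138.9 MB/s
# log data: 289.4 MB/s
```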
MEASUREMENTS
Unit of Measure | Approximate Size (bytes) | Mathematical Representation (bytes)
Kilobyte (KB)   | 10^3                     | 2^10
Megabyte (MB)   | 10^6                     | 2^20
Gigabyte (GB)   | 10^9                     | 2^30
Terabyte (TB)   | 10^12                    | 2^40
Petabyte (PB)   | 10^15                    | 2^50
Exabyte (EB)    | 10^18                    | 2^60
Zettabyte (ZB)  | 10^21                    | 2^70
Yottabyte (YB)  | 10^24                    | 2^80
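The two size columns drift further apart as the units get larger. This sketch computes both values for each unit in the table and shows how much larger the power-of-two value is:

```python
UNITS = ["KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def unit_sizes(unit):
    """Return (approximate decimal bytes, power-of-two bytes) for a unit."""
    step = UNITS.index(unit) + 1           # KB is step 1, MB is step 2, ...
    return 10 ** (3 * step), 2 ** (10 * step)


for unit in UNITS:
    decimal_bytes, binary_bytes = unit_sizes(unit)
    gap = (binary_bytes - decimal_bytes) / decimal_bytes * 100
    print(f"{unit}: binary value is {gap:.1f}% larger")
# KB 2.4%, MB 4.9%, GB 7.4%, TB 10.0%, PB 12.6%, EB 15.3%, ZB 18.1%, YB 20.9%
```

The gap is why storage vendors (decimal) and operating systems (often binary) report different sizes for the same disk. The IEC standard names the power-of-two units separately (kibibyte, KiB = 2^10 bytes; mebibyte, MiB = 2^20 bytes, and so on).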
SOME FACTS ABOUT DATA
• About 2.5 quintillion bytes of data were generated daily in 2020, and people are projected to generate 463 exabytes of data each day by 2025.
• There were 4.66 billion active internet users around the world in January 2021.
• There were 319 million new internet users in 2020.
• By the end of 2021, Google could handle two trillion searches.
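The first fact mixes two units: "quintillion bytes" and "exabytes". Because a (short-scale) quintillion is 10^18 and a decimal exabyte is 10^18 bytes, the two figures can be compared directly. A minimal sketch of that arithmetic on the quoted figures:

```python
QUINTILLION = 10 ** 18  # short-scale quintillion
EB = 10 ** 18           # decimal exabyte, in bytes

daily_2020_bytes = 2.5 * QUINTILLION
daily_2020_eb = daily_2020_bytes / EB    # 2.5 EB per day in 2020
daily_2025_eb = 463                      # projected EB per day in 2025
growth = daily_2025_eb / daily_2020_eb   # ratio between the two quoted figures

print(f"2020: {daily_2020_eb} EB/day")             # 2020: 2.5 EB/day
print(f"2025 projection is {growth:.1f}x larger")  # 2025 projection is 185.2x larger
```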
SOME FACTS ABOUT DATA
• About 500 million new Tweets were posted every day in 2020.
• Facebook generated four petabytes of data every day in 2020.
• Almost 70% of global GDP was expected to be digitized by 2022.
• China had 3.17 billion IoT devices in 2020.
• Cloud data storage around the world is projected to exceed 200 zettabytes by 2025.
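To put the Facebook figure in perspective, the sketch below converts four petabytes per day into an average per-second rate and a yearly total. It assumes decimal units and a 365-day year, and it treats the daily volume as constant across the year, which is a simplification.

```python
PB = 10 ** 15   # decimal petabyte, in bytes
EB = 10 ** 18   # decimal exabyte
GB = 10 ** 9    # decimal gigabyte

daily_bytes = 4 * PB
per_second_gb = daily_bytes / 86_400 / GB   # average rate over a 24-hour day
per_year_eb = daily_bytes * 365 / EB        # assumes a constant daily volume

print(f"{per_second_gb:.1f} GB per second")  # 46.3 GB per second
print(f"{per_year_eb:.2f} EB per year")      # 1.46 EB per year
```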
AI-ML-DL-DS
• AI enables machines to think. A self-driving car is an AI application.
• ML provides statistical tools to explore data, giving machines the capability to learn.
• DL creates neural networks that mimic the human brain.
• DS applies ML/DL tools and mathematical models to data.
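As an illustration of the "statistical tools" that let ML learn from data, here is an ordinary least-squares line fit, one of the simplest such tools. The choice of method is ours, not the slide's, and the hours-versus-score data are hypothetical. The model "learns" a slope and an intercept from examples and then predicts an unseen case.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y divided by the variance of x gives the slope.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data: hours studied vs exam score.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

slope, intercept = fit_line(hours, scores)
print(f"score = {slope:.1f} * hours + {intercept:.1f}")       # score = 4.1 * hours + 47.7
print(f"predicted score for 6 hours: {slope * 6 + intercept:.1f}")  # 72.3
```

Deep learning replaces this single straight line with many stacked, nonlinear layers, but the idea of fitting parameters to data is the same.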
PILLARS OF DATA SCIENCE
• Data science (DS) is a multidisciplinary field of study whose goal is to address the challenges of big data.
• Data science principles apply to all data, big and small.
WHY LEARN DATA SCIENCE?
According to Simon Quinton, “If Analytics is the Engine, then Data is the Fuel of the 21st century.” Without data, businesses could not uncover the insights that help them streamline their operations.
HOW IS DATA SCIENCE DIFFERENT FROM COMPUTER SCIENCE?
• Typical computer science job titles include software development engineer, software developer, Java developer, systems engineer, and network engineer.
• Those who work in data science may have titles such as data scientist, data architect, data engineer, business analyst, and data analyst.
Some typical computer science-related job duties include:
• Testing, documenting and debugging code
• Creating or modifying software and mobile apps
• Designing components of an application and integrating
them into a larger overall product
• Collaborating with a team of programmers to build and
optimize code
Some typical data science-related job duties include:
• Collecting, “cleaning” and organizing data sets
• Building data models
• Asking and answering questions through large-scale data analysis
• Creating data visualizations and presenting findings to
stakeholders
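The "cleaning" duty above usually means normalizing messy records before any analysis. Here is a minimal Python sketch. The records and the rules are hypothetical: trim whitespace, normalize capitalization, drop rows with a missing name or non-numeric age, and remove duplicates that only differed in formatting.

```python
def clean_records(raw_rows):
    """Normalize (name, city, age) rows and drop incomplete or duplicate ones."""
    cleaned = []
    seen = set()
    for name, city, age in raw_rows:
        # Trim stray whitespace and normalize capitalization.
        name, city, age = name.strip().title(), city.strip().title(), age.strip()
        if not name or not age.isdigit():
            continue  # drop rows with a missing name or a non-numeric age
        record = (name, city, int(age))
        if record in seen:
            continue  # drop rows that are duplicates once normalized
        seen.add(record)
        cleaned.append(record)
    return cleaned


# Hypothetical raw input with inconsistent formatting.
raw = [
    ("  alice ", "delhi", "34"),
    ("ALICE", "Delhi ", "34"),
    ("bob", "mumbai", ""),
    ("carol", "pune", "29"),
]
print(clean_records(raw))  # [('Alice', 'Delhi', 34), ('Carol', 'Pune', 29)]
```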
WHAT DO DATA SCIENTISTS DO?
• A data scientist is a data expert who extracts insights from large data sets to help organizations solve complex problems. To do so, data scientists combine computer science, mathematics, statistics, and modeling with a strong understanding of their business and industry to unlock new opportunities and strategies.
Any Queries?