Data Science
Dr. Sunil Kr Pandey
Professor & Director (IT & UG)
Institute of Technology & Science
Mohan Nagar, Ghaziabad
Evolution of Databases
Data, data everywhere…
There's certainly a lot of it!
Data produced each year (chart on a logarithmic scale):
• 2002: 5 EB
• 2006: 161 EB
• 2009: 800 EB
• 2011: 1.8 ZB
• 2015: 8.0 ZB
For comparison:
• Human brain's capacity: 14 PB
• 100 years of HD video + audio: 60 PB
• 120 PB (marked on the chart without a label)
Units: 1 TB = 1000 GB; 1 Petabyte = 1000 TB; the chart axis also marks 1 Exabyte and 1 Zettabyte.
References
(brain) 14 PB: http://www.quora.com/Neuroscience-1/How-much-data-can-the-human-brain-store
(2002) 5 EB: http://www2.sims.berkeley.edu/research/projects/how-much-info-2003/execsum.htm
(2006) 161 EB: http://www.emc.com/collateral/analyst-reports/expanding-digital-idc-white-paper.pdf
(2009) 800 EB: http://www.emc.com/collateral/analyst-reports/idc-digital-universe-are-you-ready.pdf
(2011) 1.8 ZB: http://www.emc.com/leadership/programs/digital-universe.htm
(2015) 8 ZB: http://www.emc.com/collateral/analyst-reports/idc-extracting-value-from-chaos-ar.pdf
(life in video) 60 PB: in 4320p resolution, extrapolated from 16 MB for 1:21 of 640x480 video (w/sound); almost certainly a gross overestimate, as sleep can be compressed significantly!
Data has become a resource that needs to be carefully stored, processed,
analyzed, visualized and presented securely wherever it is required.
Growing Need for Analytics
DATA HARNESSING
• Data is generated. Companies store each piece of information generated during the business operations and customer interactions.
• Data is analyzed. Learning from the data is used in the decision making and process optimization.
DATA VOLUMES (in Trillion GB)
• 2010: 1.2
• 2012: 2.4
• 2015: 7.9
DID YOU KNOW?
Generation of Large Amount of Data from Business Transactions
• Number of transactions every year: 4 Billion
• Number of Stores: 900
• Number of SKUs: 10,000 to 1 lakh (100,000)
Growth in Data Volume 2010-2025 (Projections)
Year    Data Volume in Zettabytes
2010    2
2011    5
2012    6.5
2013    9
2014    12.5
2015    15.5
2016    18
2017    26
2018    33
2019    41
2020    50.5
2021    64.5
2022    79.5
2023    101
2024    129.5
2025    175
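The year-over-year growth implied by these projections can be checked with a few lines of Python. This is only a minimal sketch: the dictionary restates the projected volumes from the table, and the function name `year_over_year_growth` is an illustrative choice, not something from the original deck.

```python
# Projected data volume in zettabytes, restated from the 2010-2025 table.
VOLUME_ZB = {
    2010: 2, 2011: 5, 2012: 6.5, 2013: 9, 2014: 12.5, 2015: 15.5,
    2016: 18, 2017: 26, 2018: 33, 2019: 41, 2020: 50.5, 2021: 64.5,
    2022: 79.5, 2023: 101, 2024: 129.5, 2025: 175,
}


def year_over_year_growth(volumes):
    """Return {year: percent change versus the previous year}."""
    years = sorted(volumes)
    return {
        year: (volumes[year] - volumes[prev]) / volumes[prev] * 100
        for prev, year in zip(years, years[1:])
    }


if __name__ == "__main__":
    for year, pct in year_over_year_growth(VOLUME_ZB).items():
        print(f"{year}: {pct:+.1f}% -> {VOLUME_ZB[year]} ZB")
```

The first year has no predecessor, so it is left out of the result.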
Fourth Paradigm of Science
Turing Award winner Jim Gray imagined data science as a "fourth paradigm" of science:
• Thousands of years
  • Empirical
• Few hundreds of years
  • Theoretical
• Last fifty years
  • Computational
  • “Query the world”
• Last twenty years
  • eScience (Data Science)
  • “Download the world”
What is Data Science
• Data Science is a multi-disciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data.
• Data Science is a "concept to unify statistics, data analysis, machine learning and their related methods" in order to "understand and analyze actual phenomena" with data. It employs techniques and theories drawn from many fields within the context of mathematics, statistics, computer science, and information science.
• The availability of high-capacity networks, low-cost computers and storage devices, as well as the widespread adoption of hardware virtualization, service-oriented architecture, and autonomic and utility computing, has led to growth in cloud computing.
Data Science – A Visual Definition
Data Science : A Definition
Data Science is the science which uses computer science, statistics and
machine learning, visualization and human-computer interactions to:
1. Collect
2. Clean
3. Integrate
4. Analyze
5. Visualize
6. Interact
with data to create data products.
Objective of Data Science is to “Turn Data into Data Products”.
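The six steps above can be sketched as a toy pipeline. Everything here is hypothetical and for illustration only: the sales records, the two "sources", and the function names are assumptions, and a real data product would use proper storage, cleaning rules and plotting tools.

```python
from collections import defaultdict

# Hypothetical raw records from two made-up sources.
STORE_SALES = [
    {"store": "A", "amount": "120.5"},
    {"store": "B", "amount": "n/a"},  # dirty value, removed in the clean step
    {"store": "A", "amount": "80"},
]
ONLINE_SALES = [{"store": "B", "amount": "200"}]


def collect():
    """1. Collect: gather raw records from every source."""
    return STORE_SALES + ONLINE_SALES


def clean(records):
    """2. Clean: drop records whose amount is not a number."""
    cleaned = []
    for rec in records:
        try:
            cleaned.append({"store": rec["store"], "amount": float(rec["amount"])})
        except ValueError:
            continue
    return cleaned


def integrate(records):
    """3. Integrate: combine records from all sources per store."""
    by_store = defaultdict(list)
    for rec in records:
        by_store[rec["store"]].append(rec["amount"])
    return dict(by_store)


def analyze(by_store):
    """4. Analyze: total sales per store."""
    return {store: sum(amounts) for store, amounts in by_store.items()}


def visualize(totals, width=20):
    """5. Visualize: a plain-text bar chart, one line per store."""
    top = max(totals.values())
    return "\n".join(
        f"{store} {'#' * round(width * total / top)} {total:.1f}"
        for store, total in sorted(totals.items())
    )


def interact(totals, store):
    """6. Interact: answer a question about one store."""
    return totals.get(store, 0.0)


totals = analyze(integrate(clean(collect())))
print(visualize(totals))
```

Each step takes the previous step's output, which is what lets the chain at the bottom read in the same order as the numbered list.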
Traditionally, the data we had was mostly structured and small in size,
so it could be analyzed with simple BI tools. Today, by contrast, most
data is unstructured or semi-structured. Data trend projections indicated
that by 2020 more than 80% of data would be unstructured.
Data Science Team
• Business Analyst
• Data & Analytics Manager
• Data Analyst
• Database Administrator
• Data Scientist
• Statistician
• Data Engineer
• Data Architect
Role of Business Analyst
What is Analytics?
Data on its own is useless unless you can make sense of it!
Analytics is the scientific process of transforming data into insight for making
better decisions, offering new opportunities for a competitive
advantage.
Types of Analytics
• Descriptive analytics: insight into the past
  Mining data to provide business insights.
  What has happened?
• Predictive analytics: understanding the future
  Predicting the future based on historical patterns.
  What could happen?
• Prescriptive analytics: advice on possible outcomes
  Enabling smart decisions based on data.
  What should we do?
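The three questions can be made concrete with a small example. This is a sketch under stated assumptions: the monthly sales figures, the straight-line trend model and the reorder rule are all hypothetical, chosen only to show how the three types build on each other.

```python
from statistics import mean


def describe(sales):
    """Descriptive: what has happened?"""
    return {"mean": mean(sales), "min": min(sales), "max": max(sales)}


def predict_next(sales):
    """Predictive: what could happen? Fit a least-squares line and extend it one step."""
    n = len(sales)
    xs = range(n)
    x_bar, y_bar = mean(xs), mean(sales)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, sales)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return intercept + slope * n


def prescribe(forecast, stock_on_hand):
    """Prescriptive: what should we do? A hypothetical reorder rule."""
    shortfall = forecast - stock_on_hand
    return f"order {shortfall:.0f} units" if shortfall > 0 else "no order needed"


monthly_sales = [100, 110, 120, 130]  # made-up units sold per month
summary = describe(monthly_sales)
forecast = predict_next(monthly_sales)
action = prescribe(forecast, stock_on_hand=100)
print(summary, forecast, action)
```

The prescriptive step consumes the forecast, and the forecast is fitted to the same history that the descriptive step summarizes.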
Why do airline prices
change every hour?
How do grocery cashiers
know to hand you coupons
you might actually use?
How does Netflix
frequently recommend
just the right movie?
Business Intelligence (BI) vs. Data Science
Data Sources
  BI: Structured (usually SQL, often Data Warehouse)
  Data Science: Both structured and unstructured (logs, cloud data, SQL, NoSQL, text)
Approach
  BI: Statistics and Visualization
  Data Science: Statistics, Machine Learning, Graph Analysis, Natural Language Processing (NLP)
Focus
  BI: Past and Present
  Data Science: Present and Future
Tools
  BI: Pentaho, Microsoft BI, QlikView, R
  Data Science: RapidMiner, BigML, Weka, R
Scope of Business Intelligence techniques employed in 2018.
Interest in the term “Data Science” since December 2013
(source: Google Trends)
Hype bag-of-words. Let’s not focus on buzzwords, but on what the
underlying technologies can actually solve.
Lifecycle of Data Science
Contrast: Databases
Data Value
  Databases: “Precious”
  Data Science: “Cheap”
Data Volume
  Databases: Modest
  Data Science: Massive
Examples
  Databases: Bank records, Personnel records, Census, Medical records
  Data Science: Online clicks, GPS logs, Tweets, Building sensor readings
Priorities
  Databases: Consistency, Error recovery, Auditability
  Data Science: Speed, Availability, Query richness
Structure
  Databases: Strongly structured (Schema)
  Data Science: Weakly structured or none (Text)
Properties
  Databases: Transactions, ACID*
  Data Science: CAP* theorem (2/3), eventual consistency
Realizations
  Databases: SQL
  Data Science: NoSQL: MongoDB, CouchDB, HBase, Cassandra, Riak, Memcached, Apache River, …
* ACID = Atomicity, Consistency, Isolation and Durability
* CAP = Consistency, Availability, Partition Tolerance
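To make the "Transactions, ACID" property concrete, here is a minimal sketch of atomicity using Python's built-in `sqlite3` module. The `accounts` table, the account names and the `transfer` helper are hypothetical and are not taken from the deck; the point is only that a failed transfer leaves no partial update behind.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` between two accounts; either both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes the block
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False


def balances(conn):
    """Return {account name: balance} for every account."""
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


# Hypothetical in-memory database with two accounts.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

ok = transfer(conn, "alice", "bob", 30)        # succeeds
rejected = transfer(conn, "alice", "bob", 500)  # would overdraw, so it is rolled back
print(ok, rejected, balances(conn))
```

Stores built around eventual consistency typically relax this all-or-nothing guarantee in exchange for availability, which is the trade-off the CAP row refers to.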
Contrast: Machine Learning
Data Science
• Explore many models, build and tune hybrids
• Understand empirical properties of models
• Develop/use tools that can handle massive datasets
• Take action!
Machine Learning
• Develop new (individual) models
• Prove mathematical properties of models
• Improve/validate on a few, relatively clean, small datasets
• Publish a paper
the companies are expanding as fast as the data!
The first war: Terminology
• Analyzing data has a long history!
• There have been many terms that have been used to describe such
endeavors:
• Statistics
• Artificial Intelligence
• Machine learning
• Data analytics
• Since I happen to work in a “Data Science” program perhaps I may be
allowed the indulgence of using that terminology…
The Case for Business Analytics
BUSINESS NEED
• The business environment today is more complex than ever before.
• Businesses are expected to be diligently responsive to the increasing demands of customers, various stakeholders and even regulators.
SOLUTION
• Organizations have been turning to the use of analytics.
• More than 83% of global CIOs surveyed by IBM in 2010 singled out Business Intelligence and Analytics as one of their visionary plans for enhancing competitiveness.
GOAL
In most cases the primary objective of an organization that seeks to turn to analytics is:
• Revenue/Profit growth
• Optimize expenditure
Data Analysis Has Been Around for a While…
R.A. Fisher
Howard Dresner
Peter Luhn
W.E. Deming
Experiments, observations, and numerical simulations in many
areas of science and business are currently generating terabytes of
data, and in some cases are on the verge of generating petabytes
and beyond. Analyses of the information contained in these data
sets have already led to major breakthroughs in fields ranging from
genomics to astronomy and high-energy physics and to the
development of new information-based industries.
- Frontiers in Massive Data Analysis, National Research Council of the National Academies
Given a large mass of data, we can by judicious selection
construct perfectly plausible unassailable theories—all of
which, some of which, or none of which may be right.
- Paul Arnold Srere
The ability to take data—to be able to understand it, to process it, to
extract value from it, to visualize it, to communicate it—that’s going
to be a hugely important skill in the next decades, not only at the
professional level but even at the educational level for elementary
school kids, for high school kids, for college kids. Because now we
really do have essentially free and ubiquitous data. So the
complementary scarce factor is the ability to understand that data
and extract value from it.
-Hal Varian, Google's Chief Economist, http://www.mckinsey.com/insights/innovation/hal_varian_on_how_the_web_challenges_managers
My personal goal: Getting students to be able to
think critically about data.
What is Big Data?
There are many examples of "data", but what makes some of it “big”? The classic
definition revolves around the three V’s: volume, velocity, and variety.
• Volume: There is just a lot of it being generated all the time. Things get
interesting and “big” when you can’t fit it all on one computer anymore.
Why? There are many ideas here, such as MapReduce, Hadoop, etc., that all
revolve around being able to process data that goes from Terabytes, to
Petabytes, to Exabytes.
• Velocity: Data is being generated very quickly. Can you even store it all? If
not, then what do you get rid of and what do you keep?
• Variety: The data types you mention all take different shapes. What does it
mean to store them so that you can play with or compare them?
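The MapReduce idea mentioned under Volume can be sketched in a single process. This is an illustration of the programming model only, not the Hadoop API: the function names are assumptions, and in a real cluster each chunk would be mapped on a different machine and the shuffle would move data across the network.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: turn one chunk of text into (word, 1) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values of each key into one result."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(chunks):
    """Run map on every chunk independently, then shuffle and reduce."""
    pairs = [pair for chunk in chunks for pair in map_phase(chunk)]
    return reduce_phase(shuffle(pairs))


# Each string stands in for a chunk of data stored on a different machine.
counts = word_count(["big data", "data science", "Big Data"])
print(counts)
```

Because `map_phase` looks at one chunk at a time, the chunks never need to fit on one computer together; only the grouped keys have to be brought together for the reduce step.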
BIG DATA
Data that is TOO LARGE & TOO COMPLEX for conventional data tools to capture, store and analyze.
• Shares traded on US stock markets each day: 7 Billion
• Data generated in one flight from NY to London: 10 Terabytes
• Number of tweets per day on Twitter: 400 Million
• Number of ‘Likes’ each day on Facebook: 3 Billion
The 3V’s of Big Data: VOLUME, VARIETY, VELOCITY
90% OF THE WORLD’S DATA WAS GENERATED IN THE LAST TWO YEARS
Big Data Everywhere!
Is Big Data the same as Data Science?
• Are Big Data and Data Science the same thing?
• I wouldn't say so...
• Data Science can be done on small data sets.
• And not everything done using Big Data would necessarily be called Data Science.
• But there certainly is a substantial overlap!
(Diagram: two overlapping circles labeled Big Data and Data Science)
Perspective Of Big Data's Growth
• Worldwide Big Data market revenues for software and services are projected to
increase from $42B in 2018 to $103B in 2027, attaining a Compound Annual
Growth Rate (CAGR) of 10.48% according to Wikibon.
• According to an Accenture study, 79% of enterprise executives agree that
companies that do not embrace Big Data will lose their competitive position and
could face extinction. Even more, 83%, have pursued Big Data projects to seize a
competitive edge.
• Forrester predicts the global Big Data software market will be worth $31B this
year, growing 14% from the previous year. The entire global software market is
forecast to be worth $628B in revenue, with $302B from applications.
• 59% of executives say Big Data at their company would be improved through the
use of AI according to PwC.
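The Compound Annual Growth Rate quoted from Wikibon follows from the standard formula CAGR = (end / start)^(1 / years) - 1. A minimal sketch that plugs in the figures from the first bullet ($42B in 2018, $103B in 2027); the function name `cagr` is an illustrative choice:

```python
def cagr(start_value, end_value, years):
    """Compound Annual Growth Rate as a fraction (0.10 means 10% per year)."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures from the Wikibon projection: $42B in 2018 to $103B in 2027.
rate = cagr(42, 103, 2027 - 2018)
print(f"{rate:.2%}")
```

Nine yearly growth periods separate 2018 and 2027, which is why `years` is 9 rather than 10.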
Future Trends
Technologies and industries to watch in the near future:
• Progressive Web Apps (PWAs): A mixture of mobile and web apps.
• Blockchain & Fintech: Meta-model building, reliable trading & credit scoring.
• Healthcare: Diagnosis by medical imaging (computer vision & ML).
• AR/VR: Sport analysis, business cards (image tracking), real-life gaming (Hado).
• AI speech assistants, smarter chatbot integrations.
• Smart Supply Chain: Digital twins (IoT sensors).
• 5G: Big data, mobile cloud computing, scalable IoT & network function virtualisation (NFV).
• 3D Printing: Prefabrication efficiency, defect detection, predictive ML maintenance.
• Dark Data: Information that is yet to become available in digital format.
• Quantum Computing: Cutting data processing times into fractions.
Thank You!
Dr. Sunil Kr Pandey
Professor & Director (IT & UG)
Institute of Technology & Science
Mohan Nagar, Ghaziabad
Email: sunilpandey@its.edu.in

More Related Content

What's hot

Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data science
Sampath Kumar
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
ANOOP V S
 
Big Data and Data Science: The Technologies Shaping Our Lives
Big Data and Data Science: The Technologies Shaping Our LivesBig Data and Data Science: The Technologies Shaping Our Lives
Big Data and Data Science: The Technologies Shaping Our Lives
Rukshan Batuwita
 
Intro to Data Science Concepts
Intro to Data Science ConceptsIntro to Data Science Concepts
Intro to Data Science Concepts
University of Washington
 
Lecture #01
Lecture #01Lecture #01
Lecture #01
Konpal Darakshan
 
Data+Science : A First Course
Data+Science : A First CourseData+Science : A First Course
Data+Science : A First Course
Arnab Majumdar
 
Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Data Science London
 
The Evolution of Data Science
The Evolution of Data ScienceThe Evolution of Data Science
The Evolution of Data Science
Kenny Daniel
 
Pistoia Alliance Webinar Demystifying AI: Centre of Excellence for AI Webina...
Pistoia Alliance Webinar Demystifying AI: Centre of Excellence for AI  Webina...Pistoia Alliance Webinar Demystifying AI: Centre of Excellence for AI  Webina...
Pistoia Alliance Webinar Demystifying AI: Centre of Excellence for AI Webina...
Pistoia Alliance
 
Data science
Data scienceData science
Data science
Sreejith c
 
Data Science
Data ScienceData Science
Data Science
Amit Singh
 
Pistoia Alliance Demystifying AI & ML part 2
Pistoia Alliance Demystifying AI & ML part 2Pistoia Alliance Demystifying AI & ML part 2
Pistoia Alliance Demystifying AI & ML part 2
Pistoia Alliance
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
Edureka!
 
Public Data and Data Mining Competitions - What are Lessons?
Public Data and Data Mining Competitions - What are Lessons?Public Data and Data Mining Competitions - What are Lessons?
Public Data and Data Mining Competitions - What are Lessons?
Gregory Piatetsky-Shapiro
 
Creating a Data Science Ecosystem for Scientific, Societal and Educational Im...
Creating a Data Science Ecosystem for Scientific, Societal and Educational Im...Creating a Data Science Ecosystem for Scientific, Societal and Educational Im...
Creating a Data Science Ecosystem for Scientific, Societal and Educational Im...
Ilkay Altintas, Ph.D.
 
25 June 2013 - Advanced Data Analytics - an Introduction - Paul kennedy Power...
25 June 2013 - Advanced Data Analytics - an Introduction - Paul kennedy Power...25 June 2013 - Advanced Data Analytics - an Introduction - Paul kennedy Power...
25 June 2013 - Advanced Data Analytics - an Introduction - Paul kennedy Power...BigData AAI
 
Data science
Data scienceData science
Data science
9diov
 
A Practical-ish Introduction to Data Science
A Practical-ish Introduction to Data ScienceA Practical-ish Introduction to Data Science
A Practical-ish Introduction to Data Science
Mark West
 
Analytics Education in the era of Big Data
Analytics Education in the era of Big DataAnalytics Education in the era of Big Data
Analytics Education in the era of Big Data
Gregory Piatetsky-Shapiro
 
What is Data Science
What is Data ScienceWhat is Data Science
What is Data Science
Ioannis Kourouklides
 

What's hot (20)

Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data science
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Big Data and Data Science: The Technologies Shaping Our Lives
Big Data and Data Science: The Technologies Shaping Our LivesBig Data and Data Science: The Technologies Shaping Our Lives
Big Data and Data Science: The Technologies Shaping Our Lives
 
Intro to Data Science Concepts
Intro to Data Science ConceptsIntro to Data Science Concepts
Intro to Data Science Concepts
 
Lecture #01
Lecture #01Lecture #01
Lecture #01
 
Data+Science : A First Course
Data+Science : A First CourseData+Science : A First Course
Data+Science : A First Course
 
Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?
 
The Evolution of Data Science
The Evolution of Data ScienceThe Evolution of Data Science
The Evolution of Data Science
 
Pistoia Alliance Webinar Demystifying AI: Centre of Excellence for AI Webina...
Pistoia Alliance Webinar Demystifying AI: Centre of Excellence for AI  Webina...Pistoia Alliance Webinar Demystifying AI: Centre of Excellence for AI  Webina...
Pistoia Alliance Webinar Demystifying AI: Centre of Excellence for AI Webina...
 
Data science
Data scienceData science
Data science
 
Data Science
Data ScienceData Science
Data Science
 
Pistoia Alliance Demystifying AI & ML part 2
Pistoia Alliance Demystifying AI & ML part 2Pistoia Alliance Demystifying AI & ML part 2
Pistoia Alliance Demystifying AI & ML part 2
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Public Data and Data Mining Competitions - What are Lessons?
Public Data and Data Mining Competitions - What are Lessons?Public Data and Data Mining Competitions - What are Lessons?
Public Data and Data Mining Competitions - What are Lessons?
 
Creating a Data Science Ecosystem for Scientific, Societal and Educational Im...
Creating a Data Science Ecosystem for Scientific, Societal and Educational Im...Creating a Data Science Ecosystem for Scientific, Societal and Educational Im...
Creating a Data Science Ecosystem for Scientific, Societal and Educational Im...
 
25 June 2013 - Advanced Data Analytics - an Introduction - Paul kennedy Power...
25 June 2013 - Advanced Data Analytics - an Introduction - Paul kennedy Power...25 June 2013 - Advanced Data Analytics - an Introduction - Paul kennedy Power...
25 June 2013 - Advanced Data Analytics - an Introduction - Paul kennedy Power...
 
Data science
Data scienceData science
Data science
 
A Practical-ish Introduction to Data Science
A Practical-ish Introduction to Data ScienceA Practical-ish Introduction to Data Science
A Practical-ish Introduction to Data Science
 
Analytics Education in the era of Big Data
Analytics Education in the era of Big DataAnalytics Education in the era of Big Data
Analytics Education in the era of Big Data
 
What is Data Science
What is Data ScienceWhat is Data Science
What is Data Science
 

Similar to Data Science - An emerging Stream of Science with its Spreading Reach & Impact

intro to data science Clustering and visualization of data science subfields ...
intro to data science Clustering and visualization of data science subfields ...intro to data science Clustering and visualization of data science subfields ...
intro to data science Clustering and visualization of data science subfields ...
jybufgofasfbkpoovh
 
Real-time applications of Data Science.pptx
Real-time applications  of Data Science.pptxReal-time applications  of Data Science.pptx
Real-time applications of Data Science.pptx
shalini s
 
00-01 DSnDA.pdf
00-01 DSnDA.pdf00-01 DSnDA.pdf
00-01 DSnDA.pdf
SugumarSarDurai
 
Making an impact with data science
Making an impact  with data scienceMaking an impact  with data science
Making an impact with data scienceJordan Engbers
 
DataScience_introduction.pdf
DataScience_introduction.pdfDataScience_introduction.pdf
DataScience_introduction.pdf
SouravBiswas747273
 
Data Science and AI in Biomedicine: The World has Changed
Data Science and AI in Biomedicine: The World has ChangedData Science and AI in Biomedicine: The World has Changed
Data Science and AI in Biomedicine: The World has Changed
Philip Bourne
 
The Analytics and Data Science Landscape
The Analytics and Data Science LandscapeThe Analytics and Data Science Landscape
The Analytics and Data Science Landscape
Philip Bourne
 
Top 10 data science takeaways for executives
Top 10 data science takeaways for executivesTop 10 data science takeaways for executives
Top 10 data science takeaways for executives
Dylan Erens
 
Data_Science_Applications_&_Use_Cases.pptx
Data_Science_Applications_&_Use_Cases.pptxData_Science_Applications_&_Use_Cases.pptx
Data_Science_Applications_&_Use_Cases.pptx
wahiba ben abdessalem
 
Data_Science_Applications_&_Use_Cases.pptx
Data_Science_Applications_&_Use_Cases.pptxData_Science_Applications_&_Use_Cases.pptx
Data_Science_Applications_&_Use_Cases.pptx
ssuser1a4f0f
 
From Rocket Science to Data Science
From Rocket Science to Data ScienceFrom Rocket Science to Data Science
From Rocket Science to Data Science
Sanghamitra Deb
 
Data science and business analytics
Data  science and business analyticsData  science and business analytics
Data science and business analytics
Inbavalli Valli
 
Göteborg university(condensed)
Göteborg university(condensed)Göteborg university(condensed)
Göteborg university(condensed)
Zenodia Charpy
 
Datascience
DatascienceDatascience
A New Paradigm on Analytic-Driven Information and Automation V2.pdf
A New Paradigm on Analytic-Driven Information and Automation V2.pdfA New Paradigm on Analytic-Driven Information and Automation V2.pdf
A New Paradigm on Analytic-Driven Information and Automation V2.pdf
ArmyTrilidiaDevegaSK
 
Colloquium(7)_DataScience:ShivShaktiGhosh&MohitGarg
Colloquium(7)_DataScience:ShivShaktiGhosh&MohitGargColloquium(7)_DataScience:ShivShaktiGhosh&MohitGarg
Colloquium(7)_DataScience:ShivShaktiGhosh&MohitGarg
Shiv Shakti Ghosh
 
Workshop_Presentation.pptx
Workshop_Presentation.pptxWorkshop_Presentation.pptx
Workshop_Presentation.pptx
RUDRAPRASADSABAR
 
Data_Science_Applications_&_Use_Cases.pdf
Data_Science_Applications_&_Use_Cases.pdfData_Science_Applications_&_Use_Cases.pdf
Data_Science_Applications_&_Use_Cases.pdf
vishal choudhary
 
Information entanglement
Information entanglementInformation entanglement
Information entanglement
Willard Van De Bogart
 
Data science
Data scienceData science

Similar to Data Science - An emerging Stream of Science with its Spreading Reach & Impact (20)

intro to data science Clustering and visualization of data science subfields ...
intro to data science Clustering and visualization of data science subfields ...intro to data science Clustering and visualization of data science subfields ...
intro to data science Clustering and visualization of data science subfields ...
 
Real-time applications of Data Science.pptx
Real-time applications  of Data Science.pptxReal-time applications  of Data Science.pptx
Real-time applications of Data Science.pptx
 
00-01 DSnDA.pdf
00-01 DSnDA.pdf00-01 DSnDA.pdf
00-01 DSnDA.pdf
 
Making an impact with data science
Making an impact  with data scienceMaking an impact  with data science
Making an impact with data science
 
DataScience_introduction.pdf
DataScience_introduction.pdfDataScience_introduction.pdf
DataScience_introduction.pdf
 
Data Science and AI in Biomedicine: The World has Changed
Data Science and AI in Biomedicine: The World has ChangedData Science and AI in Biomedicine: The World has Changed
Data Science and AI in Biomedicine: The World has Changed
 
The Analytics and Data Science Landscape
The Analytics and Data Science LandscapeThe Analytics and Data Science Landscape
The Analytics and Data Science Landscape
 
Top 10 data science takeaways for executives
Top 10 data science takeaways for executivesTop 10 data science takeaways for executives
Top 10 data science takeaways for executives
 
Data_Science_Applications_&_Use_Cases.pptx
Data_Science_Applications_&_Use_Cases.pptxData_Science_Applications_&_Use_Cases.pptx
Data_Science_Applications_&_Use_Cases.pptx
 
Data_Science_Applications_&_Use_Cases.pptx
Data_Science_Applications_&_Use_Cases.pptxData_Science_Applications_&_Use_Cases.pptx
Data_Science_Applications_&_Use_Cases.pptx
 
From Rocket Science to Data Science
From Rocket Science to Data ScienceFrom Rocket Science to Data Science
From Rocket Science to Data Science
 
Data science and business analytics
Data  science and business analyticsData  science and business analytics
Data science and business analytics
 
Göteborg university(condensed)
Göteborg university(condensed)Göteborg university(condensed)
Göteborg university(condensed)
 
Datascience
DatascienceDatascience
Datascience
 
A New Paradigm on Analytic-Driven Information and Automation V2.pdf
A New Paradigm on Analytic-Driven Information and Automation V2.pdfA New Paradigm on Analytic-Driven Information and Automation V2.pdf
A New Paradigm on Analytic-Driven Information and Automation V2.pdf
 
Colloquium(7)_DataScience:ShivShaktiGhosh&MohitGarg
Colloquium(7)_DataScience:ShivShaktiGhosh&MohitGargColloquium(7)_DataScience:ShivShaktiGhosh&MohitGarg
Colloquium(7)_DataScience:ShivShaktiGhosh&MohitGarg
 
Workshop_Presentation.pptx
Workshop_Presentation.pptxWorkshop_Presentation.pptx
Workshop_Presentation.pptx
 
Data_Science_Applications_&_Use_Cases.pdf
Data_Science_Applications_&_Use_Cases.pdfData_Science_Applications_&_Use_Cases.pdf
Data_Science_Applications_&_Use_Cases.pdf
 
Information entanglement
Information entanglementInformation entanglement
Information entanglement
 
Data science
Data scienceData science
Data science
 

More from Dr. Sunil Kr. Pandey

Cloud Security, Standards and Applications
Cloud Security, Standards and ApplicationsCloud Security, Standards and Applications
Cloud Security, Standards and Applications
Dr. Sunil Kr. Pandey
 
Virtualization for Cloud Environment
Virtualization for Cloud EnvironmentVirtualization for Cloud Environment
Virtualization for Cloud Environment
Dr. Sunil Kr. Pandey
 
Collaborating Using Cloud Services
Collaborating Using Cloud ServicesCollaborating Using Cloud Services
Collaborating Using Cloud Services
Dr. Sunil Kr. Pandey
 
Cloud Services: Types of Cloud
Cloud Services: Types of CloudCloud Services: Types of Cloud
Cloud Services: Types of Cloud
Dr. Sunil Kr. Pandey
 
Cloud Computing - Introduction
Cloud Computing - IntroductionCloud Computing - Introduction
Cloud Computing - Introduction
Dr. Sunil Kr. Pandey
 
Future Skills & Career Opportunities in POST COVID-19
Future Skills & Career Opportunities in POST COVID-19Future Skills & Career Opportunities in POST COVID-19
Future Skills & Career Opportunities in POST COVID-19
Dr. Sunil Kr. Pandey
 
Digital India: Use of Technology For Transforming Society
Digital India: Use of Technology For Transforming SocietyDigital India: Use of Technology For Transforming Society
Digital India: Use of Technology For Transforming Society
Dr. Sunil Kr. Pandey
 
Mobile Technology – Historical Evolution, Present Status & Future Directions
Mobile Technology – Historical Evolution, Present Status & Future DirectionsMobile Technology – Historical Evolution, Present Status & Future Directions
Mobile Technology – Historical Evolution, Present Status & Future Directions
Dr. Sunil Kr. Pandey
 
Mobile Technology – Historical Evolution, Present Status & Future Directions
Mobile Technology – Historical Evolution, Present Status & Future DirectionsMobile Technology – Historical Evolution, Present Status & Future Directions
Mobile Technology – Historical Evolution, Present Status & Future Directions
Dr. Sunil Kr. Pandey
 
Green Commputing - Paradigm Shift in Computing Technology, ICT & its Applicat...
Green Commputing - Paradigm Shift in Computing Technology, ICT & its Applicat...Green Commputing - Paradigm Shift in Computing Technology, ICT & its Applicat...
Green Commputing - Paradigm Shift in Computing Technology, ICT & its Applicat...
Dr. Sunil Kr. Pandey
 
Digital India MIssion - An oveview
Digital India MIssion - An oveviewDigital India MIssion - An oveview
Digital India MIssion - An oveview
Dr. Sunil Kr. Pandey
 
Business Analysis, Query Tools, Dm unit-3
Business Analysis, Query Tools, Dm unit-3Business Analysis, Query Tools, Dm unit-3
Business Analysis, Query Tools, Dm unit-3Dr. Sunil Kr. Pandey
 
Data Warehousing & Basic Architectural Framework
Data Warehousing & Basic Architectural FrameworkData Warehousing & Basic Architectural Framework
Data Warehousing & Basic Architectural Framework
Dr. Sunil Kr. Pandey
 
Introduction to Data Warehousing
Introduction to Data WarehousingIntroduction to Data Warehousing
Introduction to Data Warehousing
Dr. Sunil Kr. Pandey
 

More from Dr. Sunil Kr. Pandey (15)

Data Science - An emerging Stream of Science with its Spreading Reach & Impact

  • 1. Data Science Dr. Sunil Kr Pandey Professor & Director (IT & UG) Institute of Technology & Science Mohan Nagar, Ghaziabad
  • 5. Data, data everywhere… There's certainly a lot of it! (Figure: data volumes on a logarithmic scale; 1 TB = 1000 GB, 1 Petabyte == 1000 TB. Scale markers: 1 Petabyte, 1 Exabyte, 1 Zettabyte; a 120 PB point is also marked.)
      Human brain's capacity: 14 PB
      100 years of HD video + audio: 60 PB
      Data produced each year: 5 EB (2002), 161 EB (2006), 800 EB (2009), 1.8 ZB (2011), 8.0 ZB (2015)
      References
      (brain) 14 PB: http://www.quora.com/Neuroscience-1/How-much-data-can-the-human-brain-store
      (2002) 5 EB: http://www2.sims.berkeley.edu/research/projects/how-much-info-2003/execsum.htm
      (2006) 161 EB: http://www.emc.com/collateral/analyst-reports/expanding-digital-idc-white-paper.pdf
      (2009) 800 EB: http://www.emc.com/collateral/analyst-reports/idc-digital-universe-are-you-ready.pdf
      (2011) 1.8 ZB: http://www.emc.com/leadership/programs/digital-universe.htm
      (2015) 8 ZB: http://www.emc.com/collateral/analyst-reports/idc-extracting-value-from-chaos-ar.pdf
      (life in video) 60 PB: in 4320p resolution, extrapolated from 16 MB for 1:21 of 640x480 video (with sound); almost certainly a gross overestimate, as sleep can be compressed significantly!
  • 6. Data has become a resource that needs to be carefully stored, processed, analyzed, visualized, and securely presented wherever it is required.
  • 7. Growing Need for Analytics
      DATA HARNESSING: Companies store each piece of information generated during business operations and customer interactions.
      Data is generated. Data is analyzed. Learning from the data is used in decision making and process optimization.
      DATA VOLUMES (in trillion GB): 1.2 (2010), 2.4 (2012), 7.9 (2015)
      DID YOU KNOW? Generation of Large Amount of Data from Business Transactions: 4 billion transactions every year; 900 stores; 10,000 to 1 lakh (100,000) SKUs.
  • 8. Growth in Data Volume 2010-2025 (Projections)
      Year | Data Volume in Zettabytes
      2010 | 2
      2011 | 5
      2012 | 6.5
      2013 | 9
      2014 | 12.5
      2015 | 15.5
      2016 | 18
      2017 | 26
      2018 | 33
      2019 | 41
      2020 | 50.5
      2021 | 64.5
      2022 | 79.5
      2023 | 101
      2024 | 129.5
      2025 | 175
      (Chart: Data Volume Growth from 2010 to 2025; x-axis: Year, y-axis: Data Volume.)
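The projections on slide 8 are easier to read as growth rates. The following is a minimal sketch, assuming the yearly figures are copied exactly as listed on the slide; the helper name `year_over_year_growth` is hypothetical and is not part of the original deck.

```python
# Data volumes in zettabytes by year, copied from the table on slide 8.
volumes_zb = {
    2010: 2, 2011: 5, 2012: 6.5, 2013: 9, 2014: 12.5, 2015: 15.5,
    2016: 18, 2017: 26, 2018: 33, 2019: 41, 2020: 50.5, 2021: 64.5,
    2022: 79.5, 2023: 101, 2024: 129.5, 2025: 175,
}


def year_over_year_growth(series):
    """Return {year: percent change relative to the previous year}."""
    years = sorted(series)
    return {
        year: (series[year] / series[prev] - 1) * 100
        for prev, year in zip(years, years[1:])
    }


growth = year_over_year_growth(volumes_zb)
for year, pct in growth.items():
    print(f"{year}: {pct:+.1f}%")

# Overall multiple between the first and last year in the table.
overall_multiple = volumes_zb[2025] / volumes_zb[2010]
print(f"2010 to 2025: {overall_multiple:.1f}x")
```

The overall multiple summarizes the whole period, while the per-year percentages show that the projected growth is not uniform from year to year.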
  • 10. Fourth Paradigm of Science: Turing Award winner Jim Gray imagined data science as a "fourth paradigm" of science. • Thousands of years: Empirical • Last few hundred years: Theoretical • Last fifty years: Computational (“Query the world”) • Last twenty years: eScience (Data Science) (“Download the world”)
  • 12. What is Data Science? • Data Science is a multi-disciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data. • Data Science is a "concept to unify statistics, data analysis, machine learning and their related methods" in order to "understand and analyze actual phenomena" with data. It employs techniques and theories drawn from many fields within the context of mathematics, statistics, computer science, and information science. • The availability of high-capacity networks, low-cost computers and storage devices, as well as the widespread adoption of hardware virtualization, service-oriented architecture, and autonomic and utility computing, has led to growth in cloud computing.
  • 13. Data Science – A Visual Definition
  • 14. Data Science: A Definition. Data Science is the science that uses computer science, statistics, machine learning, visualization and human-computer interaction to: 1. Collect, 2. Clean, 3. Integrate, 4. Analyze, 5. Visualize, and 6. Interact with data to create data products. The objective of Data Science is to “Turn Data into Data Products”.
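The first five steps of the definition on slide 14 can be sketched as a tiny pipeline. This is only an illustration under stated assumptions: the sales records, the store-to-region mapping, and every function name below are hypothetical, and the sixth step (interacting with the data) is left out.

```python
from statistics import mean

# 1. Collect: hypothetical raw records, as they might arrive from a source system.
raw_sales = [
    {"store": "A", "amount": "120.5"},
    {"store": "B", "amount": ""},  # missing value
    {"store": "A", "amount": "80"},
    {"store": "B", "amount": "200"},
]
# A second, hypothetical data source used in the integration step.
store_regions = {"A": "North", "B": "South"}


def clean(records):
    """2. Clean: drop rows with a missing amount and parse the rest as floats."""
    return [{**r, "amount": float(r["amount"])} for r in records if r["amount"]]


def integrate(records, regions):
    """3. Integrate: attach the region from the second source to each record."""
    return [{**r, "region": regions[r["store"]]} for r in records]


def analyze(records):
    """4. Analyze: average sale amount per region."""
    by_region = {}
    for r in records:
        by_region.setdefault(r["region"], []).append(r["amount"])
    return {region: mean(values) for region, values in by_region.items()}


def visualize(summary, scale=20):
    """5. Visualize: print a plain-text bar chart, one '#' per `scale` units."""
    for region, value in sorted(summary.items()):
        print(f"{region:<6} {'#' * int(value / scale)} {value:.2f}")


summary = analyze(integrate(clean(raw_sales), store_regions))
visualize(summary)
```

Each step is a separate function so that a stage can be swapped out (for example, a different cleaning rule) without touching the others.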
  • 15. Traditionally, the data that we had was mostly structured and small in size, and could be analyzed using simple BI tools. Unlike data in traditional systems, most of today's data is unstructured or semi-structured. Data trends (shown in the slide image) indicate that by 2020, more than 80% of the data will be unstructured.
  • 16. Data Science Team • Business Analyst • Data & Analytics Manager • Data Analyst • Database Administrator • Data Scientist • Statistician • Data Engineer • Data Architect
  • 19. Role of Business Analyst
  • 22. What is Analytics? Data on its own is useless unless you can make sense of it! Analytics is the scientific process of transforming data into insight for making better decisions, offering new opportunities for a competitive advantage.
  • 24. Types of Analytics • Descriptive analytics: mining data to provide business insights. What has happened? • Predictive analytics: predicting the future based on historical patterns. What could happen? • Prescriptive analytics: enabling smart decisions based on data. What should we do?
  • 25. Types of Analytics • Prescriptive Analytics: advice on possible outcomes • Predictive Analytics: understanding the future • Descriptive Analytics: insight into the past. Why do airline prices change every hour? How do grocery cashiers know to hand you coupons you might actually use? How does Netflix frequently recommend just the right movie?
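The three types of analytics can be contrasted on one small series. The sketch below is illustrative only: the demand figures, the stock level, and the reorder rule are all hypothetical, and a straight-line least-squares fit stands in for whatever predictive model a real project would use.

```python
from statistics import mean

# Hypothetical monthly demand (units sold) for one product.
monthly_demand = [100, 110, 125, 130, 145, 150]

# Descriptive: what has happened? Summarize the history.
average_demand = mean(monthly_demand)

# Predictive: what could happen? Fit a least-squares line and extrapolate one month.
xs = list(range(len(monthly_demand)))
x_bar, y_bar = mean(xs), mean(monthly_demand)
slope = (
    sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, monthly_demand))
    / sum((x - x_bar) ** 2 for x in xs)
)
intercept = y_bar - slope * x_bar
forecast = intercept + slope * len(monthly_demand)

# Prescriptive: what should we do? A hypothetical rule: order enough to cover the forecast.
current_stock = 120
order_quantity = max(0, round(forecast) - current_stock)

print(f"Average demand so far: {average_demand:.1f}")
print(f"Forecast for next month: {forecast:.1f}")
print(f"Suggested order quantity: {order_quantity}")
```

The descriptive step only looks backward, the predictive step extends the pattern forward, and the prescriptive step turns the forecast into an action.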
  • 26. Business Intelligence (BI) vs. Data Science
      Features | Business Intelligence (BI) | Data Science
      Data Sources | Structured (usually SQL, often a data warehouse) | Both structured and unstructured (logs, cloud data, SQL, NoSQL, text)
      Approach | Statistics and visualization | Statistics, machine learning, graph analysis, Natural Language Processing (NLP)
      Focus | Past and present | Present and future
      Tools | Pentaho, Microsoft BI, QlikView, R | RapidMiner, BigML, Weka, R
  • 28. Interest in the term “Data Science” since December 2013 (source: Google Trends). Hype bag-of-words: let's focus not on buzzwords, but on what the underlying technologies can actually solve.
  • 29. Lifecycle of Data Science
  • 30. Contrast: Databases
      Aspect | Databases | Data Science
      Data Value | “Precious” | “Cheap”
      Data Volume | Modest | Massive
      Examples | Bank records, personnel records, census, medical records | Online clicks, GPS logs, tweets, building sensor readings
      Priorities | Consistency, error recovery, auditability | Speed, availability, query richness
      Structured | Strongly (schema) | Weakly or none (text)
      Properties | Transactions, ACID* | CAP* theorem (2/3), eventual consistency
      Realizations | SQL | NoSQL: MongoDB, CouchDB, HBase, Cassandra, Riak, Memcached, Apache River, …
      * ACID = Atomicity, Consistency, Isolation and Durability; CAP = Consistency, Availability, Partition Tolerance
  • 31. Contrast: Machine Learning
      Data Science: • Explore many models, build and tune hybrids • Understand empirical properties of models • Develop/use tools that can handle massive datasets • Take action!
      Machine Learning: • Develop new (individual) models • Prove mathematical properties of models • Improve/validate on a few, relatively clean, small datasets • Publish a paper
  • 32. The companies are expanding as fast as the data!
  • 33. The first war: Terminology • Analyzing data has a long history! • There have been many terms that have been used to describe such endeavors: • Statistics • Artificial Intelligence • Machine learning • Data analytics • Since I happen to work in a “Data Science” program perhaps I may be allowed the indulgence of using that terminology…
  • 34. The Case for Business Analytics • The business environment today is more complex than ever before. • Businesses are expected to be diligently responsive to the increasing demands of customers, various stakeholders and even regulators. • Organizations have been turning to the use of analytics. • More than 83% of global CIOs surveyed by IBM in 2010 singled out Business Intelligence and Analytics as one of their visionary plans for enhancing competitiveness. In most cases the primary objective of an organization that seeks to turn to analytics is: • Revenue/profit growth • Optimized expenditure
  • 35. Data Analysis Has Been Around for a While… R.A. Fisher, Howard Dresner, Peter Luhn, W.E. Deming
  • 36. “Experiments, observations, and numerical simulations in many areas of science and business are currently generating terabytes of data, and in some cases are on the verge of generating petabytes and beyond. Analyses of the information contained in these data sets have already led to major breakthroughs in fields ranging from genomics to astronomy and high-energy physics and to the development of new information-based industries.” (Frontiers in Massive Data Analysis, National Research Council of the National Academies)
      “Given a large mass of data, we can by judicious selection construct perfectly plausible unassailable theories, all of which, some of which, or none of which may be right.” (Paul Arnold Srere)
  • 37. The ability to take data—to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it—that’s going to be a hugely important skill in the next decades, not only at the professional level but even at the educational level for elementary school kids, for high school kids, for college kids. Because now we really do have essentially free and ubiquitous data. So the complimentary scarce factor is the ability to understand that data and extract value from it. -Hal Varian, Google's Chief Economist, http://www.mckinsey.com/insights/innovation/hal_varian_on_how_the_web_challenges_managers My personal goal: Getting students to be able to think critically about data.
  • 38. What is Big Data? There are many examples of "data", but what makes some of it “big”? The classic definition revolves around the three V's: volume, velocity, and variety.
      • Volume: There is just a lot of it being generated all the time. Things get interesting, and “big”, when you can't fit it all on one computer anymore. Why? There are many ideas here, such as MapReduce, Hadoop, etc., that all revolve around being able to process data that goes from terabytes, to petabytes, to exabytes.
      • Velocity: Data is being generated very quickly. Can you even store it all? If not, then what do you get rid of and what do you keep?
      • Variety: The data types you mention all take different shapes. What does it mean to store them so that you can play with or compare them?
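The MapReduce idea mentioned under Volume can be sketched on a single machine as a word count. This is a toy illustration, not Hadoop's API: the function names and the two input strings are hypothetical, and in a real cluster the map and reduce calls would run on many machines, with the framework doing the grouping step in between.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)


# Toy input; each string stands in for a document stored on a different machine.
documents = ["big data is big", "data science uses data"]

pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
print(counts)
```

Because each map call sees only one document and each reduce call sees only one key, both phases can be spread across machines, which is what lets the approach scale past the data that fits on one computer.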
  • 39. Big Data Everywhere! BIG DATA: data that is too large and too complex for conventional data tools to capture, store and analyze. The 3 V's of Big Data: Volume, Variety, Velocity. • Shares traded on US stock markets each day: 7 billion • Data generated in one flight from NY to London: 10 terabytes • Number of tweets per day on Twitter: 400 million • Number of ‘Likes’ each day on Facebook: 3 billion • 90% of the world's data was generated in the last two years. (www.imarticus.org)
  • 42. Is Big Data the same as Data Science? • Are Big Data and Data Science the same thing? • I wouldn't say so... • Data Science can be done on small data sets. • And not everything done using Big Data would necessarily be called Data Science. • But there certainly is a substantial overlap! (Diagram: overlapping circles labeled Big Data and Data Science.)
  • 43. Perspective of Big Data's Growth
      • Worldwide Big Data market revenues for software and services are projected to increase from $42B in 2018 to $103B in 2027, attaining a Compound Annual Growth Rate (CAGR) of 10.48%, according to Wikibon.
      • According to an Accenture study, 79% of enterprise executives agree that companies that do not embrace Big Data will lose their competitive position and could face extinction. Even more, 83%, have pursued Big Data projects to seize a competitive edge.
      • Forrester predicts the global Big Data software market will be worth $31B this year, growing 14% from the previous year. The entire global software market is forecast to be worth $628B in revenue, with $302B from applications.
      • 59% of executives say Big Data at their company would be improved through the use of AI, according to PwC.
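The CAGR quoted from Wikibon can be checked from the two endpoint figures on the slide. This is a minimal sketch, assuming nine compounding periods between 2018 and 2027; the function name `cagr` is hypothetical.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate as a fraction (0.10 means 10% per year)."""
    return (end_value / start_value) ** (1 / years) - 1


# Endpoints quoted on the slide: $42B in 2018 and $103B in 2027.
rate = cagr(42, 103, 2027 - 2018)
print(f"CAGR: {rate:.2%}")
```

Under that assumption the computed rate rounds to the 10.48% stated on the slide, so the three quoted numbers are consistent with each other.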
  • 51. Future Trends: technologies and industries to watch in the near future
      • Progressive Web Apps (PWAs): a mixture of mobile and web apps.
      • Blockchain & Fintech: meta-model building, reliable trading and credit scoring.
      • Healthcare: diagnosis by medical imaging (computer vision and ML).
      • AR/VR: sport analysis, business cards (image tracking), real-life gaming (Hado).
      • AI speech assistants, smarter chatbot integrations.
      • Smart Supply Chain: digital twins (IoT sensors).
      • 5G: big data, mobile cloud computing, scalable IoT and network function virtualisation (NFV).
      • 3D Printing: prefabrication efficiency, defect detection, predictive ML maintenance.
      • Dark Data: information that is yet to become available in digital format.
      • Quantum Computing: cutting data processing times into fractions.
  • 53. Thank You! Dr. Sunil Kr Pandey Professor & Director (IT & UG) Institute of Technology & Science Mohan Nagar, Ghaziabad Email: sunilpandey@its.edu.in