INTRODUCTION TO BIG DATA 
SITARAM KOTNIS P.
DATE: 20/10/2014
COMPUTING SYSTEMS EVOLUTION 
Flat Files 
Mainframes 
RDBMS 
Into Data Explosion 
Data Warehousing
What is BIG DATA?
Volume: Scale of Data
Veracity: Accuracy of Data
Velocity: Analysis of Streaming Data
Variety: Different Forms of Data
“data sets that are too large and complex to manipulate or interrogate with standard methods or tools.”
The challenges include analysis, capture, curation, search, sharing, storage, transfer, visualization, and privacy violations.
Factors Contributing to BIG DATA
Smartphones
Sensor Enabled Devices
Online banking/sales
Social Computing
Cloud computing
SOME STATISTICS…
Every day, 247 billion emails are exchanged worldwide, of which 80% are spam.
YouTube users upload 48 hours of new video every minute of the day.
Up to 2003, we had stored 5 exabytes of data. Today we generate more than 5 EB every day.
There are 30 billion pieces of content shared on Facebook every month.
People wishing each other a happy new year generated 80 TB of data in 2011.
The number of webpages Google indexes is more than 55 billion.
BIG DATA IS A BIG DEAL!
Data is growing at an accelerating pace day by day.
Revolutionary changes in statistical and computational methods.
Everyone wants to find insights in the pile of data.
Some of the patterns discovered in BIG DATA would not be seen with small sets of data.
Sentiment analysis, market studies, automated traffic control, system-monitored health care and many more.
WHERE IS IT USED?
Science and Research – meteorology, genomic studies, LHC, NASA etc.
Government – NSA, Aadhaar
Private Enterprises – Google, Facebook, Twitter, Yahoo, eBay and more
Politics
Why is it used? To make decisions, i.e. Analytics
Descriptive Analytics: vanilla BI reports
Predictive Analytics: LinkedIn recommendations, credit card ratings
Prescriptive Analytics: health care, oil and gas exploration
TYPES OF APPLICATIONS
Applications range from days/weeks of data latency to nanosecond data latency.
TYPE 1: MAPREDUCE 
MapReduce is a programming model designed for processing large 
volumes of data in parallel by dividing the work into a set of 
independent tasks. 
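The model described above can be illustrated with a minimal single-process sketch in Python. It counts words: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are hypothetical and chosen only for illustration. A real framework would run the map calls on different nodes; here they simply run one after another.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # Each split is mapped independently of the others, which is what
    # would allow a cluster to process the splits in parallel.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input: two small "splits" of a larger data set.
splits = ["big data is big", "data is everywhere"]
counts = word_count(splits)
print(counts)
```

Because no map call depends on another and each key is reduced on its own, both phases consist of independent tasks, which is the property the definition above relies on.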
HADOOP
Apache implemented the HADOOP framework on the basis of the MapReduce programming model.
A "NoSQL" approach to data.
Tools are available to interact with SQL-based traditional databases.
It is designed to scale up with a very high degree of fault tolerance on commodity servers.
HADOOP components:
MapReduce: assigns and manages work in cluster nodes.
HDFS: distributes data between nodes; provides fault tolerance.
Ecosystem: PIG, HIVE, SQOOP, ZOOKEEPER
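How distributing data between nodes can also provide fault tolerance is easier to see in a toy sketch. The Python below is not HDFS code and does not use any Hadoop API: the block size, node names, replication factor and round-robin placement are all assumptions made for illustration, and the placement policy of a real cluster is more elaborate.

```python
def split_into_blocks(data, block_size):
    """Cut a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks, nodes, replication):
    """Assign every block to `replication` distinct nodes, round-robin."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds the number of nodes")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


def readable_after_failure(placement, failed_node):
    """True if every block still has at least one replica on a live node."""
    return all(
        any(node != failed_node for node in replicas)
        for replicas in placement.values()
    )


# Hypothetical toy values: a 10-byte "file", 4-byte blocks, 4 nodes, 2 replicas.
blocks = split_into_blocks(b"x" * 10, block_size=4)
nodes = ["node1", "node2", "node3", "node4"]
placement = place_blocks(len(blocks), nodes, replication=2)
print(placement)
print(readable_after_failure(placement, "node2"))
```

With two replicas per block, losing any single node leaves every block readable; with a replication factor of 1 the same failure would lose data.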
TYPE 2: REAL-TIME ANALYTICS WITH ADVANCED FEATURES IN RDBMS/NOSQL DATABASES
In-memory data and computing
Columnar data
Write (insert) operations into Delta
Partitioning and many more
In-memory (or main-memory) database
A database management system that primarily relies on main memory for data storage.
SAP HANA, IBM Netezza, HP Vertica
Columnar databases
Enable faster throughput for aggregations, selections, calculations.
Values of a column are stored contiguously in memory.
Columnar database tables are better suited to OLAP.
Suitable for NoSQL databases too.
Memory layout, organized by row: A 10 € | B 35 $ | C 2 € | D 40 € | E 12 $
Memory layout, organized by column: A B C D E | 10 35 2 40 12 | € $ € € $
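The two layouts can be compared with a short Python sketch built on the slide's five-row example. The currency codes stand in for the € and $ symbols, the currency of row E is an assumption, and plain Python lists only imitate a memory layout; they say nothing about the speed of a real column store.

```python
# The slide's example table as (id, amount, currency) rows.
# Assumption: row E is in USD; the slide does not show it clearly.
rows = [
    ("A", 10, "EUR"),
    ("B", 35, "USD"),
    ("C", 2, "EUR"),
    ("D", 40, "EUR"),
    ("E", 12, "USD"),
]
n_rows = len(rows)
n_cols = 3

# Organized by row: all fields of one record sit next to each other.
row_major = [value for row in rows for value in row]

# Organized by column: all values of one column sit next to each other.
column_major = [row[c] for c in range(n_cols) for row in rows]

# Reading the "amount" column from the row layout touches every third cell...
amounts_from_rows = row_major[1::n_cols]
# ...while the column layout holds it as one contiguous run.
amounts_from_columns = column_major[n_rows:2 * n_rows]

total = sum(amounts_from_columns)
print(row_major)
print(column_major)
print(total)
```

An aggregation such as a sum needs only the contiguous run in the column layout, which is the reason given above for the better fit with OLAP workloads.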
Insert into Delta
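The slide gives only a title here, so the following Python sketch rests on an assumption about what it illustrates: a column store that appends new values to a small write-optimized delta area and periodically merges them into a sorted, read-optimized main area. The class and method names are hypothetical and do not correspond to any product's API.

```python
class DeltaColumn:
    """Toy column with a read-optimized main part and a write-optimized delta."""

    def __init__(self):
        self.main = []   # kept sorted; rebuilt only during a merge
        self.delta = []  # append-only; cheap to write to

    def insert(self, value):
        # Writes never touch the main part, so they stay cheap.
        self.delta.append(value)

    def total(self):
        # Reads must consult both parts to see all the data.
        return sum(self.main) + sum(self.delta)

    def merge_delta(self):
        # Fold the delta into the main part and start a fresh delta.
        self.main = sorted(self.main + self.delta)
        self.delta = []


column = DeltaColumn()
for value in (40, 10, 35):
    column.insert(value)
before_merge = (list(column.main), list(column.delta), column.total())
column.merge_delta()
after_merge = (list(column.main), list(column.delta), column.total())
print(before_merge)
print(after_merge)
```

Query results are the same before and after the merge; only the place where the values live changes.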
TYPE 3: COMPLEX EVENT PROCESSING
The most real-time of all, with nanosecond data latency.
Smart trade solutions
Fraud detection systems
Driverless cars
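As a rough illustration of the fraud detection case, the Python sketch below evaluates one rule on each event as it arrives: raise an alert when a card produces more than a given number of transactions within a sliding time window. The rule, its thresholds, the class name `VelocityRule` and the sample events are all hypothetical, and a Python loop is nowhere near nanosecond latency; the sketch shows only the event-at-a-time pattern.

```python
from collections import defaultdict, deque


class VelocityRule:
    """Flag a card that exceeds `max_events` transactions per time window."""

    def __init__(self, max_events, window_seconds):
        self.max_events = max_events
        self.window_seconds = window_seconds
        self.recent = defaultdict(deque)  # card id -> recent timestamps

    def on_event(self, card_id, timestamp):
        window = self.recent[card_id]
        window.append(timestamp)
        # Drop timestamps that have fallen out of the sliding window.
        while timestamp - window[0] > self.window_seconds:
            window.popleft()
        return len(window) > self.max_events


# Hypothetical event stream of (card id, time in seconds), ordered by time.
events = [
    ("card-1", 0.0),
    ("card-1", 1.0),
    ("card-2", 1.5),
    ("card-1", 2.0),
    ("card-1", 2.5),
    ("card-1", 30.0),
]
rule = VelocityRule(max_events=3, window_seconds=10.0)
alerts = [(card, ts) for card, ts in events if rule.on_event(card, ts)]
print(alerts)
```

Each event is judged the moment it arrives, using only a small amount of per-card state, rather than being stored first and analysed later in a batch.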
CONCLUSION
The BIG DATA concept is old but has caught attention only in recent years.
Private enterprises, public sector agencies and financial institutions are eagerly looking to take advantage of BIG DATA solutions.
Various products, frameworks and solutions are available in the market, and their number keeps growing.
Huge market opportunities for IT services and analytics firms.
