Overview of Big Data & Apache Hadoop
Presented by: Sunny
Objectives
• What is Big Data
• Big Data challenges
• Sources of Big Data
• Categories of 'Big Data'
• Characteristics of 'Big Data'
• Live example
• Introduction to Hadoop
Big Data - Definition and Concepts
• Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools.
• Traditionally, “Big Data” = massive volumes of data
– E.g., the volume of data at CERN, NASA, Google, …
• Where does Big Data come from?
– Everywhere! Web logs, GPS systems, sensor networks, social networks, Internet-based text documents, Internet search indexes, call detail records, astronomy, atmospheric science, biology, nuclear physics, biochemical experiments, medical records, scientific research, military surveillance, multimedia archives, …
Data Explosion
• 2.5 billion gigabytes of data were generated every day in 2012.
• 40,000 search queries are made every second.
• 300 hours of video are uploaded every minute.
• 31.25 million messages are sent and 2.77 million videos are viewed.
• By 2020, all data will pass through the cloud.
Technology Insights 6.1
The Data Size Is Getting Big, Bigger, …
• Hadron Collider - 1 PB/sec
• Boeing jet - 20 TB/hr
• Facebook - 500 TB/day
• YouTube - 1 TB/4 min
• The proposed Square Kilometer Array telescope (the world’s proposed biggest telescope) - 1 EB/day
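The rates above use different time units, which makes them hard to compare. The following is a minimal sketch that converts each quoted figure to terabytes per day. It assumes decimal units (1 PB = 1,000 TB, 1 EB = 1,000,000 TB) and treats every rate as sustained around the clock, which the slide does not state.

```python
# Normalize the quoted data rates to terabytes per day.
# Assumption: decimal units and rates sustained for a full 24 hours.
TB_PER_PB = 1_000
TB_PER_EB = 1_000_000
SECONDS_PER_DAY = 24 * 60 * 60
MINUTES_PER_DAY = 24 * 60
HOURS_PER_DAY = 24

rates_tb_per_day = {
    "Hadron Collider (1 PB/sec)": 1 * TB_PER_PB * SECONDS_PER_DAY,
    "Boeing jet (20 TB/hr)": 20 * HOURS_PER_DAY,
    "Facebook (500 TB/day)": 500,
    "YouTube (1 TB/4 min)": MINUTES_PER_DAY / 4,
    "Square Kilometer Array (1 EB/day)": 1 * TB_PER_EB,
}

# Print the sources from smallest to largest daily volume.
for source, tb in sorted(rates_tb_per_day.items(), key=lambda kv: kv[1]):
    print(f"{source:36s} {tb:>14,.0f} TB/day")
```

On a common scale the figures span more than five orders of magnitude, from a few hundred terabytes per day to tens of millions.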
Characteristics Of 'Big Data'
Big Data Challenges
The challenges include:
• Capture
• Storage
• Search
• Sharing
• Transfer
• Analysis
• Visualization
Categories Of 'Big Data'
Big data can be found in three forms:
1. Structured
2. Unstructured
3. Semi-structured
Structured
Any data that can be stored, accessed, and processed in a fixed format is termed 'structured' data.
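As an illustration of a fixed format, here is a minimal sketch using Python's built-in `sqlite3` module. The table name `atm_withdrawals`, its columns, and the sample rows are hypothetical and chosen only for this example; the point is that every row must fit the same declared schema, which is what makes the data easy to query.

```python
import sqlite3

# An in-memory database; the schema below is a hypothetical example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE atm_withdrawals ("
    "id INTEGER PRIMARY KEY, branch TEXT NOT NULL, amount REAL NOT NULL)"
)

# Every row has exactly the same fields, in the same types.
conn.executemany(
    "INSERT INTO atm_withdrawals (branch, amount) VALUES (?, ?)",
    [("North", 120.0), ("South", 80.0), ("North", 40.0)],
)

# Because the format is fixed, aggregation is a one-line query.
totals = dict(
    conn.execute(
        "SELECT branch, SUM(amount) FROM atm_withdrawals GROUP BY branch"
    )
)
print(totals)
conn.close()
```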
Semi-structured Data
• In semi-structured data, entities belonging to the same class may have different attributes even though they are grouped together.
• Example: A Word document is generally considered to be unstructured data. However, you can add metadata tags, in the form of keywords and other metadata that represent the document content, to make the document easier to find when people search for those terms. The data is then semi-structured. Nevertheless, the document still lacks the complex organization of a database, so it falls short of being fully structured data.
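The idea of entities in the same class carrying different attributes can be sketched with JSON-like records. The documents, field names, and the helper `find_by_keyword` below are hypothetical and serve only as an illustration: some records have keyword tags, some have an author, and one has neither, yet all can be stored and searched together.

```python
import json

# Records of the same class ("document") with differing attributes.
documents = [
    {"title": "Q1 report", "keywords": ["finance", "quarterly"], "author": "A. Example"},
    {"title": "Meeting notes", "keywords": ["planning"]},
    {"title": "Scan 0042"},  # no metadata at all
]

def find_by_keyword(docs, keyword):
    """Return titles of documents tagged with the keyword.

    Missing "keywords" fields are treated as empty, since the
    format does not guarantee that every record has them.
    """
    return [d["title"] for d in docs if keyword in d.get("keywords", [])]

print(json.dumps(documents[1]))
print(find_by_keyword(documents, "finance"))
```

The metadata makes the tagged documents findable, but nothing forces a record to carry it, which is the difference from a fixed schema.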
Semi-structured
• Semi-structured data can contain both forms of data: structured and unstructured.
Unstructured
• Any data whose form or structure is unknown is classified as unstructured data.
• Examples: Word documents, PDF files, text, media logs.
Live Example
A bank manager is assigned the task of finding the best location to set up an ATM.
Distributed System
• A model in which components located on networked computers communicate with one another.
• A distributed system uses multiple machines for a single job.
How does a distributed system work?
• 1 machine: data = 1 terabyte, processing time = 45 min.
• 100 machines: data = 1 terabyte, processing time = 47 sec.
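The two timings can be turned into a speedup figure with a little arithmetic. The sketch below uses only the numbers quoted above; the interpretation of the gap as coordination overhead is an assumption, since the slide does not say where the extra time goes.

```python
# Timings quoted on the slide.
single_machine_seconds = 45 * 60   # 45 minutes on 1 machine
cluster_seconds = 47               # 47 seconds on 100 machines
machines = 100

# Speedup: how many times faster the cluster finishes the job.
speedup = single_machine_seconds / cluster_seconds

# Efficiency: the fraction of the ideal 100x speedup actually achieved.
efficiency = speedup / machines

# With perfect scaling the job would take 1/100 of the original time.
ideal_seconds = single_machine_seconds / machines

# Assumption: the remainder is overhead (coordination, data transfer, etc.).
overhead_seconds = cluster_seconds - ideal_seconds

print(f"speedup:    {speedup:.1f}x")
print(f"efficiency: {efficiency:.0%}")
print(f"ideal time: {ideal_seconds:.0f} s, overhead: {overhead_seconds:.0f} s")
```

So 100 machines give roughly a 57-fold speedup rather than a 100-fold one, which is why the challenges of distribution matter.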
Challenges of Distributed Systems
1. Multiple computers are used.
2. High chance of system failure.
3. Limited bandwidth.
4. Complex programming.
The solution to all of these is Hadoop.
Big Data Technologies
• MapReduce …
• Hadoop …
• Hive
• Pig
• HBase
• Flume
• Oozie
• Ambari
• Avro
• Mahout, Sqoop, HCatalog, …
Hadoop
• Doug Cutting is the creator of Hadoop.
• The name Hadoop has no meaning; it is a made-up name coined by his child.
Hadoop
• Hadoop is a framework that allows for distributed processing of large data sets across clusters of commodity computers using simple programming models.
• Hadoop is an open-source, Java-based programming framework that supports the processing and storage of extremely large data sets in a distributed computing environment.
Why Hadoop
1. Runs applications that involve petabytes of data.
2. Has a distributed file system, called HDFS, which enables fast data transfer among the nodes or servers.
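HDFS spreads a large file across nodes by cutting it into fixed-size blocks. The sketch below estimates how many blocks a file occupies. The 128 MiB block size is an assumption taken from the default of recent Hadoop versions (it is configurable per cluster), and `block_count` is a hypothetical helper, not part of any Hadoop API.

```python
# Assumption: 128 MiB blocks, the default in recent Hadoop versions.
BLOCK_SIZE_BYTES = 128 * 1024**2

def block_count(file_size_bytes, block_size_bytes=BLOCK_SIZE_BYTES):
    """Number of blocks needed to hold a file of the given size.

    The last block may be only partly filled, so round up.
    """
    if file_size_bytes < 0:
        raise ValueError("file size must be non-negative")
    return (file_size_bytes + block_size_bytes - 1) // block_size_bytes

one_tebibyte = 1024**4
print(block_count(one_tebibyte))
```

Each block can be read and processed by a different node, which is what lets many machines work on one file at the same time.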
Big Data Technologies: Hadoop
• Hadoop Technical Components
– Hadoop Distributed File System (HDFS)
– NameNode (primary facilitator)
– Secondary NameNode (periodically checkpoints the NameNode's metadata; despite the name, it is not a standby replacement for the NameNode)
– JobTracker
– Slave nodes (the grunts of any Hadoop cluster)
– Additionally, the Hadoop ecosystem is made up of a number of complementary sub-projects: NoSQL (Cassandra, HBase), DW (Hive), …
• NoSQL = not only SQL
Hadoop Characteristics
1. Economical - Ordinary computers can be used for data processing.
2. Reliable - Stores copies of the data on different machines and is resistant to hardware failure.
3. Scalable - A Hadoop cluster can be extended simply by adding nodes to the cluster.
4. Flexible - Can store a lot of data now and decide how to use it later.
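The reliability point, copies of the data on different machines, can be sketched as follows. This is a simplified round-robin placement and not the actual HDFS placement policy, which is also rack-aware. The replication factor of 3 matches the HDFS default; the node names, block names, and both helper functions are hypothetical.

```python
def place_replicas(block_ids, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin.

    A simplification for illustration; real HDFS placement also
    considers rack topology.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    n = len(nodes)
    return {
        block: [nodes[(i + k) % n] for k in range(replication)]
        for i, block in enumerate(block_ids)
    }

def readable_blocks(placement, failed_nodes):
    """Blocks that still have at least one replica on a healthy node."""
    return {
        block
        for block, holders in placement.items()
        if any(h not in failed_nodes for h in holders)
    }

nodes = ["node1", "node2", "node3", "node4"]
blocks = ["blk_0", "blk_1", "blk_2", "blk_3", "blk_4"]
placement = place_replicas(blocks, nodes)

# With three replicas per block, losing any two nodes loses no data.
survivors = readable_blocks(placement, {"node2", "node3"})
print(survivors == set(blocks))
```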
Hadoop Core Components
Big Data Technologies: MapReduce
How does MapReduce work?
[Figure: Raw Data → Map Function → Reduce Function]
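The flow from raw data through a map function to a reduce function can be sketched in plain Python with a word count, the customary introductory example. This runs in a single process and only mimics the programming model; it does not use Hadoop, and the function names and sample lines are hypothetical.

```python
from collections import defaultdict

def map_function(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key.

    In a real cluster the framework does this between the two phases.
    """
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_function(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

raw_data = [
    "big data needs hadoop",
    "hadoop stores big data",
    "data data",
]

# Each line could be mapped on a different machine, independently.
mapped = [pair for line in raw_data for pair in map_function(line)]

# Each key could likewise be reduced on a different machine.
counts = dict(reduce_function(k, v) for k, v in shuffle(mapped).items())
print(counts)
```

Because the map calls do not depend on each other, and neither do the reduce calls, the work can be spread across many nodes without the programmer managing the distribution.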
Top 10 Big Data Vendors with Primary Focus on Hadoop
[Figure: bar chart; value axis from $0 to $70]
Stream Analytics Applications
• e-Commerce
• Telecommunication
• Law Enforcement and Cyber Security
• Power Industry
• Financial Services
• Health Services
• Government
Thank You








