BIG DATA
A PROBLEM
BIG DATA
 Big data is a term for data sets that are so large or complex that traditional data processing application software is inadequate to deal with them.
 Challenges include capture, storage, analysis, search, sharing, transfer, visualization, querying, updating and information privacy.
 The term "big data" often refers simply to the use of predictive analytics, user behavior analytics, or certain other advanced data analytics methods that extract value from data, and seldom to a particular size of data set.
WHY BIG DATA
 Over 2.5 exabytes (2.5 billion gigabytes) of data are generated every day. Some of the sources of this huge volume of data are:
 A typical large stock exchange captures more than 1 TB of data every day.
 There are around 5 billion mobile phones (including 1.75 billion smartphones) in the world.
 YouTube users upload more than 48 hours of video every minute.
 Large social networks such as Twitter and Facebook capture more than 10 TB of data daily.
 There are more than 30 million networked sensors in the world.
4V’s BY IBM
 Volume
 Velocity
 Variety
 Veracity
IBM’s definition
 Volume: Big data is always large in volume. It doesn't have to be a certain number of petabytes to qualify: if your store of old data and new incoming data has grown so large that you are having difficulty handling it, that's big data. Remember that it's going to keep getting bigger.
IBM’s definition
 Velocity: Velocity, or speed, refers to how fast the data is coming in, but also to how fast we need to be able to analyze and use it. If we have one or more business processes that require real-time data analysis, we have a velocity challenge. Solving this issue might mean expanding our private cloud into a hybrid model that allows bursting for additional compute power as needed for data analysis.
IBM’s definition
 Variety: Variety refers to the number of sources, or incoming vectors, feeding our databases. These might be embedded sensor data, phone conversations, documents, video uploads or feeds, social media, and much more. Variety in data means variety in databases: we will almost certainly need to add a non-relational database if we haven't already done so.
IBM’s definition
 Veracity: Veracity is probably the toughest nut to crack. If we can't trust the data itself, the source of the data, or the processes we are using to identify which data points are important, we have a veracity problem. One of the biggest problems with big data is the tendency for errors to snowball: user entry errors, redundancy, and corruption all affect the value of data. We must clean our existing data and put processes in place to reduce the accumulation of dirty data going forward.
Types of Data
Structured data:
 Data which is represented in a tabular format
 E.g.: Databases
Semi-structured data:
 Data which does not have a formal data model
 E.g.: XML files
Unstructured data:
 Data which does not have a pre-defined data model
 E.g.: Text files
Structured Data
 Structured data refers to kinds of data with a
high level of organization, such as
information in a relational database.
 When information is highly structured and
predictable, search engines can more easily
organize and display it in creative ways.
 Structured data markup is a text-based
organization of data that is included in a file
and served from the web.
Semi-structured data
 It is a form of structured data that does not conform to the formal structure of data models associated with relational databases or other forms of data tables.
 It nonetheless contains tags or other markers to separate semantic elements and to enforce hierarchies of records and fields within the data. Therefore, it is also known as a self-describing structure.
 In semi-structured data, entities belonging to the same class may have different attributes even though they are grouped together, as the sketch below illustrates.
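A minimal sketch of this "self-describing" quality, using a small invented XML fragment (not taken from the slides): two employee records of the same class carry different fields, and the tags themselves describe each record. The standard Java DOM parser is used only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class SemiStructuredExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical semi-structured data: record 1 has an email, record 2 a phone.
        String xml =
            "<employees>" +
            "<employee id='1'><name>Asha</name><email>asha@example.com</email></employee>" +
            "<employee id='2'><name>Ravi</name><phone>555-0101</phone></employee>" +
            "</employees>";

        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        NodeList employees = doc.getElementsByTagName("employee");
        for (int i = 0; i < employees.getLength(); i++) {
            Element e = (Element) employees.item(i);
            System.out.print("employee " + e.getAttribute("id") + ":");
            // Each record lists whatever fields it happens to have.
            NodeList fields = e.getChildNodes();
            for (int j = 0; j < fields.getLength(); j++) {
                if (fields.item(j) instanceof Element) {
                    Element f = (Element) fields.item(j);
                    System.out.print(" " + f.getTagName() + "=" + f.getTextContent());
                }
            }
            System.out.println();
        }
    }
}
```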
Unstructured data
 It refers to information that either does not have a
pre-defined data model or is not organized in a pre-
defined manner.
 It is typically text-heavy, but may contain data such
as dates, numbers, and facts as well.
 These irregularities and ambiguities make it difficult to process with traditional programs, compared with data stored in fielded form in databases or annotated (semantically tagged) in documents.
WHAT IS HADOOP
 Hadoop is an open source, Java-based programming
framework that supports the processing and storage of
extremely large data sets in a distributed computing
environment. It is part of the Apache project sponsored
by the Apache Software Foundation.
 It consists of computer clusters built from commodity
hardware.
 All the modules in Hadoop are designed with a
fundamental assumption that hardware failures are
common occurrences and should be automatically
handled by the framework.
 The core of Apache Hadoop consists of a storage part, known as the Hadoop Distributed File System (HDFS), and a processing part based on the MapReduce programming model.
WHAT IS HADOOP
 Hadoop splits files into large blocks and distributes
them across nodes in a cluster.
 It then transfers packaged code to the nodes so the data can be processed in parallel. This approach takes advantage of data locality – nodes manipulating the data they already hold – to allow the dataset to be processed faster and more efficiently than in a more conventional supercomputer architecture that relies on a parallel file system, where computation and data are distributed via high-speed networking. The sketch below shows how a client can inspect this block placement.
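A minimal client-side sketch, assuming the standard Hadoop Java API: it asks HDFS where the blocks of a file live, which is exactly the placement information the framework exploits for data locality. The namenode address and file path are hypothetical, not from the slides.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocationsSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical namenode address; on a real cluster this normally comes
        // from core-site.xml rather than being hard-coded.
        conf.set("fs.defaultFS", "hdfs://namenode:9000");

        try (FileSystem fs = FileSystem.get(conf)) {
            Path file = new Path("/data/input/logs.txt");   // hypothetical file
            FileStatus status = fs.getFileStatus(file);
            // One BlockLocation per block: its offset, length, and the hosts
            // (datanodes) that hold a replica of it.
            BlockLocation[] blocks =
                    fs.getFileBlockLocations(status, 0, status.getLen());
            for (BlockLocation b : blocks) {
                System.out.println("offset=" + b.getOffset()
                        + " length=" + b.getLength()
                        + " hosts=" + String.join(",", b.getHosts()));
            }
        }
    }
}
```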
Core components of Hadoop
 Hadoop Distributed File System (HDFS) – a
distributed file-system that stores data on commodity
machines, providing very high aggregate bandwidth
across the cluster.
 Hadoop MapReduce – an implementation of
the MapReduce programming model for large scale
data processing.
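The two components listed above are usually seen working together in a job such as word count. The sketch below, assuming the standard Hadoop MapReduce Java API and hypothetical input/output paths, reads text from HDFS, has the Mapper emit (word, 1) pairs, and has the Reducer sum the counts.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);   // emit (word, 1)
            }
        }
    }

    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            context.write(key, new IntWritable(sum));   // emit (word, total)
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path("/data/input"));    // hypothetical
        FileOutputFormat.setOutputPath(job, new Path("/data/output")); // hypothetical
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```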
HISTORY OF HADOOP
 The genesis of Hadoop came from the Google File
System paper that was published in October 2003.
 This paper spawned another research paper from
Google – MapReduce: Simplified Data Processing
on Large Clusters.
 Development started on the Apache Nutch project,
but was moved to the new Hadoop subproject in
January 2006.
 Doug Cutting, who was working at Yahoo! at the
time, named it after his son's toy elephant.
HADOOP-ECOSYSTEM
CORE COMPONENTS-HDFS
 The Hadoop Distributed File System (HDFS) is based on a distributed file system design.
 It runs on commodity hardware.
 Unlike many other distributed file systems, HDFS is highly fault tolerant and is designed to run on low-cost hardware.
 HDFS holds very large amounts of data and provides easy access to it.
 To store such huge data, files are stored across multiple machines.
 HDFS also makes data available to applications for parallel processing.
Features of HDFS
 It is suitable for distributed storage and processing.
 Hadoop provides a command-line interface to interact with HDFS.
 The built-in servers of the namenode and datanodes help users easily check the status of the cluster.
 It provides streaming access to file system data.
 HDFS provides file permissions and authentication, as the client sketch below shows.
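A short client sketch, assuming the Hadoop Java API and a hypothetical directory: it lists the entries of an HDFS path together with their permissions and owners, the same status information the built-in servers expose.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListPermissionsSketch {
    public static void main(String[] args) throws Exception {
        // Connection settings are taken from the client's core-site.xml / hdfs-site.xml.
        try (FileSystem fs = FileSystem.get(new Configuration())) {
            for (FileStatus s : fs.listStatus(new Path("/data"))) {   // hypothetical path
                System.out.println(s.getPermission() + " "
                        + s.getOwner() + " "
                        + s.getGroup() + " "
                        + s.getPath().getName());
            }
        }
    }
}
```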
HDFS Architecture
NAMENODE
HDFS follows a master-slave architecture and has the following elements.
 Namenode
 The namenode is the commodity hardware that contains an operating system and the namenode software.
 It is software that can be run on commodity hardware.
NAMENODE
 The system hosting the namenode acts as the master server and performs the following tasks:
 Manages the file system namespace.
 Regulates clients' access to files.
 Executes file system operations such as renaming, closing, and opening files and directories (see the client sketch after this list).
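A brief client sketch, assuming the Hadoop Java API and hypothetical paths, of the namespace operations just listed: every call below is resolved by the namenode, while the file contents themselves are served by datanodes.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class NamespaceOpsSketch {
    public static void main(String[] args) throws Exception {
        try (FileSystem fs = FileSystem.get(new Configuration())) {
            Path dir = new Path("/data/reports");                // hypothetical paths
            fs.mkdirs(dir);                                      // create a directory
            fs.rename(new Path("/data/raw.csv"),                 // rename (move) a file
                      new Path("/data/reports/raw.csv"));
            try (FSDataInputStream in =
                         fs.open(new Path("/data/reports/raw.csv"))) {  // open a file
                System.out.println("first byte: " + in.read());
            }                                                    // closed automatically
        }
    }
}
```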
DATANODE
 A datanode is commodity hardware with an operating system and the datanode software. For every node (commodity hardware/system) in a cluster, there is a datanode. These nodes manage the data storage of their system.
 Datanodes perform read and write operations on the file system, as per client requests.
 They also perform operations such as block creation, deletion, and replication according to the instructions of the namenode.
BLOCK
 Generally, user data is stored in the files of HDFS. A file in the file system is divided into one or more segments, which are stored on individual datanodes. These file segments are called blocks; in other words, a block is the minimum amount of data that HDFS can read or write. The default block size is 64 MB in Hadoop 1 (128 MB in Hadoop 2 and later), and it can be changed in the HDFS configuration as needed, as the sketch below shows.
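A small sketch, assuming the Hadoop Java API and a hypothetical path, showing that the block size is actually a per-file setting: the file below is written with a 128 MB block size instead of the cluster default. The cluster-wide default is the dfs.blocksize property in hdfs-site.xml.

```java
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CustomBlockSize {
    public static void main(String[] args) throws Exception {
        try (FileSystem fs = FileSystem.get(new Configuration())) {
            Path out = new Path("/data/big-file.txt");   // hypothetical path
            long blockSize = 128L * 1024 * 1024;         // 128 MB block size for this file
            short replication = 3;                       // typical replication factor
            int bufferSize = 4096;
            try (FSDataOutputStream stream =
                         fs.create(out, true, bufferSize, replication, blockSize)) {
                stream.write("sample payload".getBytes(StandardCharsets.UTF_8));
            }
        }
    }
}
```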