Big Data
Presented by: SHIVAM SHUKLA
Contents
• What is Big Data?
• History
• Three V’s
• Why is Big Data important?
• Technologies related to Big Data
  - Hadoop
  - Why Hadoop?
  - HBase
  - Why HBase?
  - Some features of HBase
  - Hive
  - About
  - Points to remember
  - Sqoop
  - Working
  - Difference
What is Big Data?
• Big data is a term that describes the large volume of data, whether
  a) structured,
  b) unstructured, or
  c) semi-structured,
  that inundates a business on a day-to-day basis.
• But it’s not the amount of data that’s important. It’s what
  organizations do with the data that matters.
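To make the three kinds of data concrete, here is a small Python sketch. The records, field names, and values are hypothetical; the point is only how much structure a program can rely on in each case.

```python
import csv
import io
import json

# Structured: every record fits one fixed schema (hypothetical columns id, name, city).
structured_csv = "id,name,city\n1,Asha,Pune\n2,Ravi,Delhi\n"
rows = list(csv.DictReader(io.StringIO(structured_csv)))

# Semi-structured: self-describing JSON records whose fields can differ.
semi_structured = [
    json.loads('{"id": 3, "name": "Meera", "tags": ["vip"]}'),
    json.loads('{"id": 4, "name": "Karan", "address": {"city": "Mumbai"}}'),
]

# Unstructured: free text; any structure has to be extracted by the program.
unstructured = "Customer Asha called about a delayed order from Pune."

print([row["city"] for row in rows])                     # ['Pune', 'Delhi']
print([rec.get("tags", []) for rec in semi_structured])  # [['vip'], []]
print(len(unstructured.split()))                         # 9
```

Structured rows allow direct column access, semi-structured records need defaults for optional fields, and the free text only supports generic processing (here a word count) until something parses it.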
History
• While the term “big data” is relatively new, the act of gathering and
  storing large amounts of information for eventual analysis is ages old.
• The concept gained momentum in the early 2000s, when industry
  analyst Doug Laney articulated the now-mainstream definition of
  big data as the three Vs:
  - Volume
  - Velocity
  - Variety
Three V’s:
• Volume: the huge amount of data produced each day by
  organizations around the world.
• Velocity: the speed at which data is generated, analyzed, and
  reprocessed.
• Variety: the diversity of data and data sources.
Additional V’s
Over time, new V’s of big data have been introduced:
• Validity: the guarantee of data quality. A closely related V,
  Veracity, refers to the authenticity and credibility of the data.
• Value: the added value for companies. Many companies have
  recently established their own data platforms, filled their data pools,
  and invested a lot of money in infrastructure. The question now is how
  to generate business value from those investments.
Why is Big Data important?
• The importance of big data doesn’t revolve around how much data
  you have, but what you do with it.
• You can take data from any source and analyze it to find answers
  that enable:
  - Cost reduction
  - Time reduction
  - Smart decision making
Some Technologies related to Big Data
• Hadoop framework
• HBase
• Hive
• Sqoop
Hadoop
• Hadoop was developed by Doug Cutting and Michael J. Cafarella.
• Hadoop is an Apache open-source framework designed for:
  - Managing the data
  - Processing the data
  - Analyzing the data
  - Storing the data
• Hadoop is written in Java and is not an OLAP (online analytical
  processing) system.
• It is used for batch (offline) processing.
• The logo for Hadoop is a YELLOW ELEPHANT.
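Hadoop processes data with the MapReduce model. The sketch below simulates the three phases of a word count (map, shuffle/sort, reduce) in a single Python process. It is not Hadoop's Java API, and the function names are hypothetical; on a real cluster the framework runs many mappers and reducers in parallel and performs the shuffle over the network.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle/sort: group all values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key, values):
    """Reducer: combine all values that share one key."""
    return key, sum(values)


def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


print(word_count(["big data big ideas", "data tools"]))
# {'big': 2, 'data': 2, 'ideas': 1, 'tools': 1}
```

Because each mapper only sees its own line and each reducer only sees one key, both phases can be spread across many machines independently.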
Why Hadoop?
• Fast: in HDFS (Hadoop Distributed File System), data is distributed
  over the cluster and mapped to the nodes that hold it, which helps in
  faster retrieval.
• Scalable: a Hadoop cluster can be extended simply by adding nodes
  to the cluster.
• Cost-effective: Hadoop is open source and uses commodity hardware
  to store data, so it is very cost-effective compared with a traditional
  relational database management system.
• Resilient to failure: HDFS replicates data across the network, so if
  one node is down or some other network failure happens, Hadoop uses
  another copy of the data.
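The replication idea behind "resilient to failure" can be sketched in a few lines of Python. This is a toy model under stated assumptions: the block size is shrunk to 4 bytes so the example stays readable (HDFS defaults to 128 MB blocks in Hadoop 2 and later), the replication factor of 3 matches the HDFS default, and the node names and round-robin placement are hypothetical simplifications of HDFS's rack-aware placement policy.

```python
BLOCK_SIZE = 4    # bytes; tiny for the demo (HDFS defaults to 128 MB)
REPLICATION = 3   # matches the HDFS default replication factor


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut a file into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def store(data: bytes, nodes: list[str], replication: int = REPLICATION):
    """Place each block on `replication` distinct nodes (simple round-robin).
    Assumes replication <= len(nodes) so the replicas land on different nodes."""
    placement = {}                          # block index -> nodes holding a replica
    storage = {node: {} for node in nodes}  # node -> {block index: bytes}
    for b, block in enumerate(split_blocks(data)):
        holders = [nodes[(b + r) % len(nodes)] for r in range(replication)]
        placement[b] = holders
        for node in holders:
            storage[node][b] = block
    return placement, storage


def read(placement, storage, failed=frozenset()) -> bytes:
    """Rebuild the file, skipping failed nodes and using any surviving replica."""
    out = bytearray()
    for b in sorted(placement):
        for node in placement[b]:
            if node not in failed:
                out += storage[node][b]
                break
        else:
            raise RuntimeError(f"all replicas of block {b} are lost")
    return bytes(out)


nodes = ["node1", "node2", "node3", "node4"]
placement, storage = store(b"hello big data world", nodes)
print(read(placement, storage, failed={"node2"}))  # b'hello big data world'
```

Reading still succeeds with node2 down because every block has two other replicas; losing all three nodes that hold the same block makes the read fail.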
HBase
• HBase is an open-source framework provided by Apache. It is a
  sorted map datastore built on Hadoop.
• It is column-oriented and horizontally scalable.
• It has a set of tables that keep data in key-value format.
• It is a type of database designed mainly for managing
  unstructured data.
• The logo for Apache HBase is an ORCA, a member of the dolphin family.
Why HBase?
• An RDBMS gets dramatically slower as the data becomes very large.
• It expects data to be highly structured, i.e. able to fit a
  well-defined schema.
• Any change in schema might require downtime.
• For sparse datasets, there is too much overhead in maintaining NULL
  values.
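The NULL-overhead point can be illustrated with a hypothetical contact table in Python: a fixed-schema row reserves a slot for every column, while an HBase-style store keeps only the cells that actually exist. The column names and data are made up, and counting NULL slots is only a rough proxy for storage cost.

```python
columns = ["name", "email", "phone", "fax", "twitter", "linkedin"]
people = [
    {"name": "Asha", "email": "asha@example.com"},
    {"name": "Ravi", "twitter": "@ravi"},
    {"name": "Meera", "phone": "555-0100", "linkedin": "meera"},
]

# RDBMS-style: each row reserves a slot for every column; missing values are NULL (None).
dense_rows = [[person.get(col) for col in columns] for person in people]
null_slots = sum(value is None for row in dense_rows for value in row)

# HBase-style: only the cells that exist are stored, keyed by (row key, column).
sparse_cells = {(person["name"], col): value
                for person in people for col, value in person.items()}

print(null_slots, len(sparse_cells))  # 11 7
```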
Some features of HBase
• Horizontally scalable: capacity grows by adding machines to the
  cluster, and new columns can be added at any time (within existing
  column families).
• Often referred to as a key-value store or a column-family-oriented
  database, or as storing versioned maps of maps.
• Fundamentally, it's a platform for storing and retrieving data with
  random access.
• It doesn't care about data types (you can store an integer in one row
  and a string in another for the same column).
• There is only one data type: the byte array.
• It doesn't enforce relationships within your data.
• It is designed to run on a cluster of computers.
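The "versioned maps of maps" description can be sketched as a toy in-memory model in Python. This is not the HBase client API: the class `ToyTable`, its methods, and the `max_versions=3` limit are hypothetical. It shows sorted row keys, column families fixed when the table is created, byte-array values, several timestamped versions per cell, and a row-range scan.

```python
import bisect


class ToyTable:
    """Toy model of the HBase data model (not the HBase client API):
    row key -> column family -> qualifier -> {timestamp: value}, values as bytes."""

    def __init__(self, families, max_versions=3):
        self.families = set(families)  # column families are fixed when the table is created
        self.max_versions = max_versions
        self.rows = {}
        self.row_keys = []             # row keys kept in sorted order

    def put(self, row: bytes, family: str, qualifier: bytes, value: bytes, ts: int):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family!r}")
        if row not in self.rows:
            bisect.insort(self.row_keys, row)
            self.rows[row] = {}
        # Qualifiers (columns) inside a family are created on first write.
        versions = self.rows[row].setdefault(family, {}).setdefault(qualifier, {})
        versions[ts] = value
        for old_ts in sorted(versions)[:-self.max_versions]:  # keep only the newest versions
            del versions[old_ts]

    def get(self, row, family, qualifier, ts=None):
        """Return the newest value, or the newest one at or before `ts`."""
        versions = self.rows.get(row, {}).get(family, {}).get(qualifier, {})
        candidates = [v_ts for v_ts in versions if ts is None or v_ts <= ts]
        return versions[max(candidates)] if candidates else None

    def scan(self, start: bytes, stop: bytes):
        """Row keys in [start, stop), found by binary search over the sorted keys."""
        lo = bisect.bisect_left(self.row_keys, start)
        hi = bisect.bisect_left(self.row_keys, stop)
        return self.row_keys[lo:hi]


table = ToyTable(families=["info"])
table.put(b"user#002", "info", b"name", b"Ravi", ts=1)
table.put(b"user#001", "info", b"name", b"Asha", ts=1)
table.put(b"user#001", "info", b"age", (30).to_bytes(4, "big"), ts=1)  # an integer, as bytes
table.put(b"user#002", "info", b"age", b"thirty", ts=1)                 # a string, same column
table.put(b"user#001", "info", b"name", b"Asha S.", ts=2)              # a newer version

print(table.get(b"user#001", "info", b"name"))        # b'Asha S.'
print(table.get(b"user#001", "info", b"name", ts=1))  # b'Asha'
print(table.scan(b"user#000", b"user#002"))           # [b'user#001']
```

Rows inserted out of order still come back sorted, and the same `age` column holds an integer in one row and a string in another, because the store only ever sees bytes.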
Hive
• Hive is a data warehouse infrastructure tool for processing structured
  data in Hadoop.
• It runs SQL-like queries called HQL (Hive Query Language), which are
  internally converted into MapReduce jobs (newer Hive versions can
  also run them on Tez or Spark).
• Hive was initially developed by Facebook; later the Apache Software
  Foundation took it up and developed it further as an open-source
  project under the name Apache Hive.
• Hive supports Data Definition Language (DDL), Data Manipulation
  Language (DML), and user-defined functions.
• The logo for Hive is a yellow and black BEE.
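To show what "converted into MapReduce jobs" means, the Python sketch below hand-translates one GROUP BY query into a map step, a shuffle, and a reduce step. The `employees` table, its columns, and the functions are hypothetical, and the code runs in a single process; it illustrates the idea rather than the exact plan Hive's compiler would generate.

```python
from collections import defaultdict

# Hypothetical table employees(dept, salary) and the HQL query being mimicked:
#   SELECT dept, COUNT(*), SUM(salary) FROM employees GROUP BY dept;
employees = [("sales", 100), ("hr", 80), ("sales", 120), ("it", 150), ("hr", 90)]


def mapper(row):
    """Map: key each row by the GROUP BY column; the value carries partial aggregates."""
    dept, salary = row
    return dept, (1, salary)


def reducer(dept, values):
    """Reduce: finish COUNT(*) and SUM(salary) for one group."""
    count = sum(c for c, _ in values)
    total = sum(s for _, s in values)
    return dept, count, total


def run_group_by(rows):
    groups = defaultdict(list)
    for row in rows:
        key, value = mapper(row)
        groups[key].append(value)  # shuffle: collect every value for a key in one place
    return [reducer(key, values) for key, values in sorted(groups.items())]


print(run_group_by(employees))
# [('hr', 2, 170), ('it', 1, 150), ('sales', 2, 220)]
```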
Hive is not:
• A relational database
• Designed for Online Transaction Processing (OLTP)
• A language for real-time queries and row-level updates
• Fast on small data: even with a small amount of data, its response
  time cannot match that of an RDBMS.
Points to remember about Hive
• Hive Query Language is similar to SQL and gets reduced to MapReduce
  jobs in the backend.
• Hive's default metastore database is (embedded) Apache Derby.
• It is sometimes loosely grouped with NoSQL tools, even though it
  offers a SQL-like query language.
• It provides a SQL-like language for querying called HiveQL or HQL.
• It is designed for OLAP (Online Analytical Processing).
Sqoop
• Sqoop is a tool designed to transfer data between Hadoop and
  relational database servers.
• It is used to import data from relational databases such as MySQL and
  Oracle into Hadoop HDFS, and to export data from the Hadoop file
  system to relational databases.
• It is provided by the Apache Software Foundation.
• Sqoop: “SQL to Hadoop and Hadoop to SQL”
Working of Sqoop
Difference
Sqoop Import
• The import tool imports individual tables from an RDBMS to HDFS.
• Each row in a table is treated as a record in HDFS.
• All records are stored as text data in text files, or as binary data
  in Avro and Sequence files.
Sqoop Export
• The export tool exports a set of files from HDFS back to an RDBMS.
• The files given as input to Sqoop contain records, which are called
  rows in the table.
• These are read and parsed into a set of records, delimited with a
  user-specified delimiter.
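Both directions can be mimicked in Python, with the standard-library `sqlite3` module standing in for the RDBMS and a delimited string standing in for a text file in HDFS. This is a hypothetical sketch, not Sqoop itself: the functions `import_table` and `export_table` and the `emp` table are made up, while the real tool is run from the command line (`sqoop import` / `sqoop export`) over a JDBC connection. Comma-separated fields with one record per line mirror Sqoop's default text format.

```python
import csv
import io
import sqlite3


def import_table(conn, table, delimiter=","):
    """RDBMS -> text: each table row becomes one delimited record per line.
    Note: csv quotes fields that contain the delimiter, while Sqoop's defaults
    do not enclose fields, so this is only an approximation. The table name is
    trusted here; it is interpolated into SQL directly."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    for row in conn.execute(f"SELECT * FROM {table}"):
        writer.writerow(row)
    return out.getvalue()


def export_table(conn, table, text, delimiter=","):
    """Text -> RDBMS: parse each delimited record and insert it as one row."""
    records = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if records:
        placeholders = ",".join("?" * len(records[0]))
        conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", records)
        conn.commit()
    return len(records)


source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE emp (id INTEGER, name TEXT)")
source.executemany("INSERT INTO emp VALUES (?, ?)", [(1, "Asha"), (2, "Ravi")])
text = import_table(source, "emp")
print(repr(text))  # '1,Asha\n2,Ravi\n'

target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE emp (id INTEGER, name TEXT)")
count = export_table(target, "emp", text)
print(count, target.execute("SELECT * FROM emp ORDER BY id").fetchall())
# 2 [(1, 'Asha'), (2, 'Ravi')]
```

The exported values arrive as strings; SQLite's INTEGER column affinity converts `"1"` back to `1`, whereas a stricter database would need explicit type handling, which is part of what Sqoop's generated record classes take care of.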
Thank you
Any queries?

