Big data, Hadoop and Hive
Big Data – Srinath & Arjun

Agenda
• The BIG-DATA
• Hadoop
• Hadoop Components
• Hadoop Eco Systems

The BIG-DATA

The Context
• Man on the moon with 32KB (1969); my laptop had 2GB RAM (2009)
• Google collected 270PB of data in a month (2007) and processed about 20PB (20,000TB) a day (2008)
• 2010 census data is expected to be a huge gold mine of information
• Data mining huge amounts of data collected in a wide range of domains
from astronomy to healthcare has become essential for planning and
performance.

The Context
• We are in a knowledge economy.
– Data is an important asset to any organization
– Discovery of knowledge; enabling discovery; annotation of data
• We are looking at newer
– programming models, and
– supporting algorithms and data structures.

Myths about Big Data
• Big Data is new
• Big Data is only about massive data volume
• Big Data means Hadoop
• Big Data needs a Data Warehouse
• Big Data means unstructured data
• Big Data is for social media and data mining analyses

Big Data is…
It is all about better analytics on a broader spectrum of data, and
therefore represents an opportunity to create even more differentiation
among industries.

Where is the data coming from?
• 12+ TBs of tweet data every day
• 25+ TBs of log data every day
• ? TBs of data every day
• 2+ billion people on the Web by end 2011
• 30 billion RFID tags today (1.3B in 2005)
• 4.6 billion camera phones worldwide
• 100s of millions of GPS-enabled devices sold annually
• 76 million smart meters in 2009… 200M by 2014

Example of Big Data Generation
Facebook
• 4.5 billion Facebook likes every day
• 350 million photos uploaded on a daily basis
• 250 billion photos stored by Facebook
• 10 billion messages sent every day
• 1 trillion posts in Facebook’s graph search database
• 500 TB of data processed daily
• 100 PB of data stored in Facebook’s Hadoop disk cluster (1 PB = 1,000 TB = 1,000,000 GB)

Example of Big Data Generation
Flights
• 1 Boeing plane engine generates 20TB of data for every hour of flying
• How much data do all the flights in the world generate every year if
there are 100,000 two-engine flights daily?
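The flights question can be worked through with a few lines of arithmetic. The slide does not say how long a flight lasts, so the sketch below takes the average flight duration as a parameter; that parameter and the function name are assumptions added for illustration.

```python
# Figures quoted on the slide.
TB_PER_ENGINE_HOUR = 20
ENGINES_PER_FLIGHT = 2
FLIGHTS_PER_DAY = 100_000
DAYS_PER_YEAR = 365
TB_PER_EB = 1_000_000  # 1 EB = 1,000 PB = 1,000,000 TB


def yearly_flight_data_tb(avg_flight_hours):
    """Total engine data per year in TB.

    avg_flight_hours is an assumption: the slide gives no flight duration.
    """
    tb_per_day = (TB_PER_ENGINE_HOUR * ENGINES_PER_FLIGHT
                  * FLIGHTS_PER_DAY * avg_flight_hours)
    return tb_per_day * DAYS_PER_YEAR


if __name__ == "__main__":
    for hours in (1, 2, 5):
        tb = yearly_flight_data_tb(hours)
        print(f"{hours} h per flight: {tb:,.0f} TB/year "
              f"({tb / TB_PER_EB:,.0f} EB/year)")
```

With an assumed average of one hour per flight this comes to 1,460,000,000 TB (1,460 EB) per year, and the total scales linearly with the assumed flight duration.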

What comes under Big data?
• Black Box Data
• Social Media Data
• Stock Exchange Data
• Power Grid Data
• Transport Data
• Search Engine Data

Big Data Challenges
• Capturing Data
• Storage
• Searching
• Sharing
• Transfer
• Analysis
• Presentation

Characteristics of Big Data
• Volume: 12+ terabytes of Tweets created daily.
• Velocity: 5+ million trade events per second.
• Variety: 100’s of different types of data.
• Veracity: Only 1 in 3 decision makers trust their information.

Types of Data
• Structured data: Relational Data
• Semi-structured data: XML data
• Unstructured data: Word, PDF, Text, Media Logs
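As a rough illustration of the three types, the sketch below holds the same made-up record as a relational-style row, as XML, and as free text. The record, the column names, and the helper functions are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Structured: fixed columns, as in one row of a relational table.
COLUMNS = ("id", "name", "city")
ROW = (1, "Asha", "Chennai")

# Semi-structured: self-describing tags; fields may differ per record.
XML_RECORD = "<user id='1'><name>Asha</name><city>Chennai</city></user>"

# Unstructured: free text with no schema at all.
TEXT = "Asha from Chennai wrote in to say the app is great."


def from_row(row):
    """Pair each value with its column name."""
    return dict(zip(COLUMNS, row))


def from_xml(xml_string):
    """Read the fields out of the tags and attributes."""
    root = ET.fromstring(xml_string)
    record = {"id": int(root.get("id"))}
    for child in root:
        record[child.tag] = child.text
    return record


def mentions(text, word):
    """Without a schema, about all we can do cheaply is search the text."""
    return word.lower() in text.lower()


if __name__ == "__main__":
    print(from_row(ROW))
    print(from_xml(XML_RECORD))
    print(mentions(TEXT, "Chennai"))
```

The structured and semi-structured forms can be turned into the same dictionary mechanically, while the free text needs searching or further analysis to extract anything.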

The Data Explosion
• 2.5 quintillion bytes of data created each day
• 90% of the data in the world was created in the last two years

Hadoop
• Open Source Software Framework
• Inspired by Google’s MapReduce programming model and the Google File System (GFS)
• Originally written for the Nutch search engine project
• Written in Java
• Efficiently processes large volumes of Data
• Breaks up Big data into multiple parts
• Two key parts
– HDFS
– MapReduce

History of Hadoop

Hadoop Architecture

Hadoop Components

HDFS – Hadoop Distributed File System
• A file system designed for storing very large files, running on clusters of
commodity hardware
• Highly fault-tolerant, distributed, reliable, scalable file system for data
storage
• Splits files into blocks (default block size 64MB) and stores multiple copies
of each block on different nodes
• Typically has a single NameNode and a number of DataNodes that form the HDFS
cluster
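To make the block and copy figures concrete, here is a minimal sketch of how many blocks and how much raw disk a file would take. The 64MB block size is the default quoted on the slide; the replication factor of 3 is an assumption (the slide only says "multiple copies"), and the function name is hypothetical.

```python
import math

BLOCK_SIZE_MB = 64       # default block size quoted on the slide
REPLICATION_FACTOR = 3   # assumption: the slide does not give a number


def hdfs_footprint(file_size_mb, block_size_mb=BLOCK_SIZE_MB,
                   replication=REPLICATION_FACTOR):
    """Return (blocks, block replicas, raw storage in MB) for one file."""
    blocks = math.ceil(file_size_mb / block_size_mb)
    replicas = blocks * replication
    # The last block only occupies the bytes it actually holds,
    # so raw storage is the file size times the number of copies.
    raw_mb = file_size_mb * replication
    return blocks, replicas, raw_mb


if __name__ == "__main__":
    blocks, replicas, raw_mb = hdfs_footprint(1000)
    print(f"1000 MB file: {blocks} blocks, {replicas} replicas, {raw_mb} MB raw")
```

Under these assumptions a 1,000MB file is split into 16 blocks, stored as 48 block replicas, and takes 3,000MB of raw disk across the cluster.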

HDFS Architecture
• Two types of Nodes
– Master or NameNode
– Slave or DataNode
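The division of labour between the two node types can be sketched with a toy in-memory model. This is not the Hadoop API: the classes, the tiny block size, and the round-robin placement are all simplifications invented for illustration. In a real cluster the client also streams block data directly to and from DataNodes, and the NameNode only answers metadata queries.

```python
from itertools import cycle


class DataNode:
    """Slave node: holds the actual block contents."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block id -> bytes


class NameNode:
    """Master node: keeps only metadata (file -> blocks -> locations)."""

    def __init__(self, datanodes, block_size=4, replication=2):
        self.datanodes = datanodes
        self.block_size = block_size    # tiny on purpose, for the demo
        self.replication = replication  # must not exceed len(datanodes)
        self.files = {}                 # path -> [(block id, [node names])]

    def write(self, path, data):
        ring = cycle(self.datanodes)    # naive round-robin placement
        entries = []
        for start in range(0, len(data), self.block_size):
            block_id = f"{path}#{start // self.block_size}"
            targets = [next(ring) for _ in range(self.replication)]
            for node in targets:
                node.blocks[block_id] = data[start:start + self.block_size]
            entries.append((block_id, [node.name for node in targets]))
        self.files[path] = entries

    def read(self, path, failed=frozenset()):
        nodes_by_name = {node.name: node for node in self.datanodes}
        chunks = []
        for block_id, locations in self.files[path]:
            alive = [name for name in locations if name not in failed]
            if not alive:
                raise IOError(f"all replicas of {block_id} are unavailable")
            chunks.append(nodes_by_name[alive[0]].blocks[block_id])
        return b"".join(chunks)


if __name__ == "__main__":
    nodes = [DataNode("dn1"), DataNode("dn2"), DataNode("dn3")]
    namenode = NameNode(nodes)
    namenode.write("/demo.txt", b"hello hdfs!")
    # The file is still readable when one DataNode is treated as failed.
    print(namenode.read("/demo.txt", failed={"dn1"}))
```

Because every block lives on two different nodes in this toy setup, losing any single DataNode leaves at least one copy of each block readable.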

Read a File

Write a File

Hadoop Cluster Modes
• Standalone Mode
• Pseudo-Distributed Mode
• Fully-Distributed Mode

MapReduce
Programming model designed for processing large volumes of data in
parallel by dividing the work into a set of independent tasks
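The model can be illustrated with the usual word-count example, written here as a single-process Python sketch rather than with the Hadoop API. The function names are hypothetical; the point is only that each map call and each reduce call is independent of the others, which is what lets a cluster run them in parallel.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values collected for one key."""
    return key, sum(values)


def word_count(documents):
    # Each document stands in for an independent map task.
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle(mapped)
    # Each key stands in for an independent reduce task.
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "big hadoop", "hadoop and hive"]))
```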

Terminology
• Job
• Task
• Task Attempt
• NameNode
• MasterNode
• SlaveNode
• Clusters
• Commodity Hardware

Components
• Master Nodes
• Slave Nodes

Workflow

Example

Closer Look

Input Formats
• Text Input Format
• Sequence File Input Format
• Key Value Text Input Format
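The difference between the text and key-value text formats lies in the (key, value) records handed to the mapper. The Python sketch below imitates that behaviour on an in-memory byte string; it is not Hadoop code, and the function names are hypothetical. It assumes the commonly documented behaviour: the text format uses the byte offset of each line as the key, and the key-value text format splits each line at the first tab.

```python
def text_input_records(data):
    """Text input style: key = byte offset of the line, value = the line."""
    offset = 0
    for line in data.splitlines(keepends=True):
        yield offset, line.rstrip(b"\r\n").decode("utf-8")
        offset += len(line)


def key_value_text_records(data, separator="\t"):
    """Key-value text style: split each line at the first separator.

    A line without the separator becomes the key, with an empty value.
    """
    for _, line in text_input_records(data):
        key, _, value = line.partition(separator)
        yield key, value


if __name__ == "__main__":
    sample = b"k1\tv1\nk2\tv2\n"
    print(list(text_input_records(sample)))
    print(list(key_value_text_records(sample)))
```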

NoSQL
• NoSQL means “not only SQL”
• This includes key-value stores, document-oriented databases, graph
databases, Bigtable-style structures, and caching data stores
• E.g. MongoDB, Cassandra

Hadoop Eco Systems

What is HIVE?
• Data Warehousing Infrastructure
• Data Summarization, ad-hoc querying and analysis of large
volumes of data

HiveQL
• HiveQL is the Hive query language; it is SQL-like.
• Hive doesn’t support transactions (true of early releases; later releases added limited ACID support).
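The kind of summarization a HiveQL query expresses can be sketched in plain Python. The query in the comment, the table name `access_logs`, the column name `status`, and the sample log lines are all made up for illustration; in Hive the same aggregation would be run over files stored in HDFS rather than over an in-memory list.

```python
from collections import Counter

# Hypothetical HiveQL that this sketch mirrors (names are invented):
#   SELECT status, COUNT(*) FROM access_logs GROUP BY status;

LOG_LINES = [
    "2014-01-01 GET /index.html 200",
    "2014-01-01 GET /missing.html 404",
    "2014-01-02 GET /index.html 200",
]


def count_by_status(lines):
    """Group log lines by their last field (the status) and count them."""
    return dict(Counter(line.split()[-1] for line in lines))


if __name__ == "__main__":
    print(count_by_status(LOG_LINES))
```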

Hive Applications
• Log Processing
• Text Mining
• Document indexing
• Customer-facing business intelligence (e.g. Google Analytics)
• Predictive modelling, hypothesis testing

Thank You….