Vision
To become a leading consulting and training provider in the fields of Data Analytics, Machine Learning, and Big Data in India and overseas.
Mission
To create value for our customers by providing consulting services and to impart high-quality training and skill-enhancement programs for employability.
About Us
Emerging India is promoted by professionals from IITs and IIMs, MBAs, and experts from the education and IT industries. We are one of India's fastest-growing analytics and IT consulting and training companies. We offer services in both the consulting and training domains, including NASSCOM-certified professional programs (designed to bridge the gap between academia and industry) and Data Analytics/ Cyber Security/ IoT/ Robotics/ AI/ Blockchain consulting solutions. We are also a proud NASSCOM member and a NASSCOM SSC Licensed Training Partner for the northern region of India. As a NASSCOM licensed training partner, Emerging India is taking NASSCOM SSC initiatives to the next level in the field of Data Analytics to enhance the technical skills of students and working professionals.
Speakers:
Mrs. Rakhi Singh, Delivery Head (NASSCOM Certified Trainer)
Mr. Mayank Jain, Big Data Developer and Analyst
Mr. Kapil Sharma, Center-Head cum Trainer (Certified by North-western University)
What is HDFS
• HDFS stands for Hadoop Distributed File System
• Built on top of the native file system of each node (for example Ext3/Ext4)
• Designed to store large amounts of data reliably and efficiently
• Aims for high data availability through block replication and High Availability cluster setups
• Does not permit in-place update operations (files are written once and read many times)
• Built for OLAP, not for OLTP
Architecture of HDFS
[Figure: HDFS sits on top of the Ext3/Ext4 file system. Master: Name Node, with a Secondary Name Node beside it. Slaves: twelve Data Nodes.]
MR Operation
[Figure: Client1 submits the file a.txt (100 MB) together with code1; a.txt appears twice inside the cluster diagram.]
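The slide does not say how a 100 MB file such as a.txt is laid out on the Data Nodes. The sketch below is one way to picture it, under assumptions that are not in the slides: a 64 MB block size (chosen only because the later slides show two blocks; recent Hadoop releases default to 128 MB), a replication factor of 3, and a simple round-robin placement. Real HDFS placement is rack-aware, and the function names here are hypothetical.

```python
def split_into_blocks(file_size_mb, block_size_mb=64):
    """Return the sizes (in MB) of the blocks that a file would occupy."""
    if file_size_mb < 0 or block_size_mb <= 0:
        raise ValueError("sizes must be positive")
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    # Only the last block may be smaller than the block size.
    return [block_size_mb] * full_blocks + ([remainder] if remainder else [])


def place_replicas(num_blocks, data_nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin.

    This is an illustrative policy only, not the HDFS placement algorithm.
    """
    if replication > len(data_nodes):
        raise ValueError("not enough data nodes for the replication factor")
    return {
        block: [data_nodes[(block + r) % len(data_nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


blocks = split_into_blocks(100)            # a.txt, 100 MB
nodes = ["dn1", "dn2", "dn3", "dn4"]       # hypothetical Data Node names
placement = place_replicas(len(blocks), nodes)
print(blocks)
print(placement)
```

With these assumptions the file becomes one full block and one partial block, and each block lives on three different Data Nodes, so losing a single node does not lose data.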
MR Framework
Mapper <Key_In, Val_In, Key_Out, Val_Out>    Reducer <Key_In, Val_In, Key_Out, Val_Out>
Stages shown: Input, Partitioner, Output
Input file a.txt: Hi, Hello, Hi, Hadoop, World, Hive, Hadoop, Hive, Hi, Hello, World
block1 (Mapper1 input as key,value): 1,Hi  2,Hello  3,Hi  4,Hadoop  5,World
block2 (Mapper2 input as key,value): 1,Hive  2,Hadoop  3,Hive  4,Hi  5,Hello  6,World
Mapper output sent to Reducer1: Hi,1  Hello,1  Hi,1  Hadoop,1  World,1  Hive,1  Hadoop,1  Hive,1  Hi,1  Hello,1  World,1
MR Framework
Reducer1 input (key,value): Hi,1  Hello,1  Hi,1  Hadoop,1  World,1  Hive,1  Hadoop,1  Hive,1  Hi,1  Hello,1  World,1
Grouped by key: Hi,<1,1,1>  Hello,<1,1>  Hadoop,<1,1>  World,<1,1>  Hive,<1,1>
Reducer1 output: Hi,3  Hello,2  Hadoop,2  World,2  Hive,2
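The word-count flow on these two slides can be reproduced in a few lines of plain Python. This is a single-process sketch of the map, group-by-key, and reduce steps, not Hadoop's API; the function names are made up for illustration, and the two blocks hold the words shown for block1 and block2.

```python
from collections import defaultdict

BLOCK1 = ["Hi", "Hello", "Hi", "Hadoop", "World"]
BLOCK2 = ["Hive", "Hadoop", "Hive", "Hi", "Hello", "World"]


def mapper(records):
    """Take (line_number, word) pairs and emit (word, 1) pairs."""
    for _line_number, word in records:
        yield word, 1


def shuffle(pairs):
    """Group values by key, e.g. ("Hi", [1, 1, 1])."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reducer(key, values):
    """Sum the list of ones for a single word."""
    return key, sum(values)


def word_count(blocks):
    pairs = []
    for block in blocks:                      # one mapper per block
        records = enumerate(block, start=1)   # (1, "Hi"), (2, "Hello"), ...
        pairs.extend(mapper(records))
    grouped = shuffle(pairs)
    return dict(reducer(key, values) for key, values in grouped.items())


counts = word_count([BLOCK1, BLOCK2])
print(counts)
# {'Hi': 3, 'Hello': 2, 'Hadoop': 2, 'World': 2, 'Hive': 2}
```

In a real cluster the mappers run in parallel on the nodes that hold each block, and the grouping step is carried out by the framework between the map and reduce phases.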
HIVE
[Figure: Hive architecture. The client submits SQL to the Driver; the Compiler/Translator converts the SQL to MapReduce; the Execution Engine is MapReduce; the Metastore is backed by Derby.]
Hive
• Database
• Table types: Internal (managed table) and External
• Internal (default): a drop operation deletes both the data and the metadata
• External table: a drop operation deletes only the metadata; the data stays in place
• Optimization techniques
• Partitioning
• Bucketing
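To make the two optimization techniques concrete, here is a small Python sketch of how rows could be routed. It assumes a hypothetical table with columns (user_id, country), partitioned by country and bucketed by user_id into 4 buckets. Partitioning places rows in a directory named column=value; bucketing picks a file by hashing the key modulo the bucket count. The hash used here (the integer itself) is a simplification, since Hive's actual hash depends on the column type and version, and the warehouse path is invented for the example.

```python
from collections import defaultdict


def partition_path(table_dir, partition_col, value):
    """Directory for one partition, in the column=value naming style."""
    return f"{table_dir}/{partition_col}={value}"


def bucket_for(key, num_buckets):
    """Bucket index for an integer key (assumption: hash(key) == key)."""
    return key % num_buckets


def layout(rows, table_dir, partition_col, num_buckets):
    """Map each partition directory to {bucket index: [user ids]}."""
    files = defaultdict(lambda: defaultdict(list))
    for user_id, country in rows:
        path = partition_path(table_dir, partition_col, country)
        files[path][bucket_for(user_id, num_buckets)].append(user_id)
    return {path: dict(buckets) for path, buckets in files.items()}


ROWS = [(1, "IN"), (2, "US"), (5, "IN"), (8, "IN")]   # hypothetical sample rows
TABLE_LAYOUT = layout(ROWS, "/warehouse/users", "country", 4)
print(TABLE_LAYOUT)
```

A query that filters on country only needs to read one directory, and a join or sample on user_id can work bucket by bucket instead of scanning the whole table.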
Introduction to HBase
HBase is a NoSQL, non-relational, distributed, column-oriented database that runs on top of Hadoop.
NoSQL: NoSQL databases are databases that do not use a SQL engine as their query engine.
HBase Daemons
Daemons are services that run on individual machines and communicate with each other.
HMaster: Master server of HBase; coordinates the cluster and handles metadata operations such as region assignment.
HRegionServer: Slave server of HBase; stores and serves the actual data.
HQuorumPeer: ZooKeeper daemon that provides the coordination service.
Advantages of using HBase
Provides a highly scalable database that integrates natively with Hadoop.
Nodes can be added on the fly.
HBase (LSM Tree)
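The slide gives only the title, so the following is a toy illustration of the log-structured merge (LSM) idea rather than HBase's implementation. Writes go to an in-memory table (HBase calls it the MemStore), full tables are flushed as sorted immutable segments (HFiles in HBase), reads check the newest data first, and compaction merges segments. The class name, method names, and the tiny flush limit are assumptions made for the example.

```python
import bisect


class ToyLSM:
    def __init__(self, memtable_limit=2):
        self.memtable = {}        # recent writes, held in memory
        self.segments = []        # flushed sorted segments, oldest first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Write the memtable out as one sorted, immutable segment."""
        if self.memtable:
            self.segments.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        """Look in memory first, then in segments from newest to oldest."""
        if key in self.memtable:
            return self.memtable[key]
        for segment in reversed(self.segments):
            keys = [k for k, _ in segment]
            i = bisect.bisect_left(keys, key)
            if i < len(keys) and keys[i] == key:
                return segment[i][1]
        return None

    def compact(self):
        """Merge all segments into one; newer values overwrite older ones."""
        merged = {}
        for segment in self.segments:
            merged.update(segment)
        self.segments = [sorted(merged.items())] if merged else []


db = ToyLSM(memtable_limit=2)
db.put("a", 1)
db.put("b", 2)      # memtable is full, so it is flushed to a segment
db.put("a", 3)      # a newer value for "a" now sits in memory
print(db.get("a"), db.get("b"))
```

Because writes only touch memory and sequential flushes, this design favors write-heavy workloads; the cost is that a read may have to consult several segments until compaction merges them.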
Normalization vs Denormalization
HBase Data Model
Introduction to Spark
Key Features
• RDD
• DAG
• DataFrame
• Lazy evaluation
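Lazy evaluation is the least self-explanatory item in this list, so here is a plain-Python sketch of the idea. It is not the PySpark API: the LazyDataset class is hypothetical and only mimics the pattern in which transformations (map, filter) are recorded as a chain of steps, loosely like a lineage or DAG, and nothing runs until an action (collect) is called.

```python
class LazyDataset:
    def __init__(self, source, steps=()):
        self._source = source
        self._steps = tuple(steps)   # recorded transformations, in order

    def map(self, fn):
        # Transformation: returns a new dataset, runs nothing yet.
        return LazyDataset(self._source, self._steps + (("map", fn),))

    def filter(self, predicate):
        # Transformation: returns a new dataset, runs nothing yet.
        return LazyDataset(self._source, self._steps + (("filter", predicate),))

    def collect(self):
        # Action: only now are the recorded steps applied to the data.
        items = iter(self._source)
        for kind, fn in self._steps:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return list(items)


calls = []


def double(x):
    calls.append(x)      # record when the function really runs
    return x * 2


dataset = LazyDataset(range(5)).map(double).filter(lambda x: x > 4)
calls_before_action = list(calls)   # still empty: nothing has executed
result = dataset.collect()
print(calls_before_action, result)
# [] [6, 8]
```

Deferring execution this way lets an engine see the whole chain of steps before running it, which is what makes planning and optimization across steps possible.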
Questions & Feedback!
Get in Touch with Us
We would be glad to hear from you!
Our Location
H-196, 304, IInd Floor, Sector 63, Noida - 201301
Our Phone
+91 120-4169097
+91 8860599698
Email / Website
info@emergingindiagroup.com
https://www.emergingindiagroup.com

Big data technologies by Emerging India Analytics
