Summit v4 dave wolcott


This deck summarizes the Hadoop stack and its components for storing and analyzing big data: file storage with HDFS, data processing with MapReduce, data access through Hive and Pig, security with Kerberos, and monitoring with Nagios, Ganglia, and Ambari. An analogy to a conventional file system explains HDFS: the name node keeps metadata that maps each file to its blocks, and the blocks are replicated across data nodes for fault tolerance.


Hadoop Stack
3
Extract Load Transform
HDFS
Propagate
RDBMS/Files
API Access
Business Intelligence
Staging
Consumable Data
Security
Kerberos
Data Chronology
Hive/Pig
HBase
Meta Data
HCatalog
Job Scheduling
Oozie
Extract to RDBMS
Sqoop
Monitoring Tools
Nagios, Ganglia, Ambari
Direct Access to Raw Data
Hue
Data Serialization
Avro
Governance
Hadoop Stack and Data Access
Data Extraction
Flume
Google Analytics
Data Movement
Map Reduce
lyndaLogs
User Sessions
Services and APIs

4
HDFS Analogy
File metadata
/user/dave/data -> 1,2,3
/user/subash/data -> 4,5
[Diagram: copies of blocks 1 through 5 spread across the data nodes]
Name Node
Data Node
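
The analogy above can be sketched in a few lines of Python. The file-to-block mapping is taken from the slide; the data node names (`dn1` to `dn5`), the exact placement of replicas, and the helper functions `locate` and `is_readable` are assumptions made for illustration, since the diagram's layout did not survive extraction. This is a toy model, not the HDFS API.

```python
# Name node: file path -> ordered list of block IDs (from the slide).
FILE_BLOCKS = {
    "/user/dave/data": [1, 2, 3],
    "/user/subash/data": [4, 5],
}

# Data nodes: node name -> block IDs stored there.
# Hypothetical placement in which every block has three replicas.
DATA_NODES = {
    "dn1": {1, 3, 4},
    "dn2": {2, 3, 5},
    "dn3": {1, 2, 5},
    "dn4": {1, 4, 5},
    "dn5": {2, 3, 4},
}


def locate(path, failed=frozenset()):
    """Return [(block_id, [live nodes holding it])] for one file."""
    result = []
    for block in FILE_BLOCKS[path]:
        holders = sorted(
            node
            for node, blocks in DATA_NODES.items()
            if block in blocks and node not in failed
        )
        result.append((block, holders))
    return result


def is_readable(path, failed=frozenset()):
    """A file stays readable while every block has a live replica."""
    return all(holders for _, holders in locate(path, failed))


print(locate("/user/dave/data"))
print(is_readable("/user/dave/data", failed={"dn1", "dn2"}))
```

In this sketch, losing two data nodes still leaves at least one copy of every block, which is the fault-tolerance point the slide makes; losing every node that holds a given block makes the file unreadable.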

5
File Meta Data:
File Name
File Type
Access Rights
Owners
Timestamps
Size
Data block Pointers
[Diagram: pointers 1 through 12 lead to data blocks; one further entry is labeled Indirect]
HDFS Analogy
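
As a rough illustration of the file metadata listed on this slide, the record below models it as a Python dataclass. It assumes the slide depicts the classic file-system layout of twelve direct block pointers plus an indirect block holding further pointers; the class name `FileMetadata`, the field names, and the block addresses in the usage lines are hypothetical.

```python
from dataclasses import dataclass, field

DIRECT_POINTERS = 12  # the slide numbers the pointers 1 through 12


@dataclass
class FileMetadata:
    """Per-file metadata, mirroring the fields named on the slide."""
    name: str
    file_type: str
    access_rights: str
    owner: str
    size: int
    modified: float = 0.0                          # timestamp
    direct: list = field(default_factory=list)     # up to 12 block addresses
    indirect: list = field(default_factory=list)   # addresses kept in the indirect block

    def block_address(self, index):
        """Map a logical block index (0-based) to a stored block address."""
        if index < 0:
            raise IndexError("block index must be non-negative")
        if index < DIRECT_POINTERS:
            return self.direct[index]
        # Beyond the direct pointers, look inside the indirect block.
        return self.indirect[index - DIRECT_POINTERS]


meta = FileMetadata(
    name="data",
    file_type="regular",
    access_rights="rw-r--r--",
    owner="dave",
    size=14 * 4096,
    direct=list(range(100, 112)),   # hypothetical addresses
    indirect=[500, 501],
)
print(meta.block_address(0), meta.block_address(12))
```

The name node plays the same role for HDFS that this record plays for a local file system: it answers "which blocks make up this file" without holding the data itself.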

6
File Meta Data:
File Name
File Type
Access Rights
Owners
Timestamps
Size
Data block Pointers
[Diagram: ten data blocks referenced by the pointers]
HDFS Analogy

9
HDFS Analogy
File metadata
/user/dave/data -> 1,2,3
/user/subash/data -> 4,5
[Diagram: copies of blocks 1 through 5 spread across the data nodes]
Name Node
Data Node

Toronto, 20
Whitby, 25
New York, 22
Rome, 32
Toronto, 4
Rome, 33
New York, 18
New York, 18
Toronto, 18
Whitby, 27
New York, 32
Rome, 37
Toronto, 32
Whitby, 20
New York, 33
Rome, 38
Toronto, 22
Whitby, 19
Rome, 32
Whitby, 22
New York, 20
Rome, 31
New York, 17
Toronto, 31
Whitby, 22
New York, 19
Toronto, 28
Rome, 30
New York, 32
Toronto, 18
Whitby, 23
New York, 32
Rome, 37
Toronto, 22
Whitby, 20
New York, 31
Toronto, 32
Whitby, 27
New York, 33
Rome, 38
Toronto, 20
Whitby, 25
New York, 22
Rome, 33
Map Reduce
Five Parallel Processes
Reduce the Five to One
Toronto, 32
Whitby, 27
New York, 33
Rome, 37
Toronto, 22
Whitby, 22
New York, 20
Rome, 38
Toronto, 31
Whitby, 22
New York, 19
Rome, 31
Toronto, 22
Whitby, 23
New York, 32
Rome, 37
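
The numbers above appear to show each parallel process keeping the highest reading per city for its share of the input, with a final step merging those partial results into one. A minimal Python sketch of that idea follows, using two runs of readings copied from the slide. The function names (`map_max`, `reduce_max`, `run_job`), the split into two chunks, and the use of a thread pool to stand in for parallel processes are assumptions; in Hadoop itself the per-chunk maximum would typically be done by a combiner or the reducer.

```python
from concurrent.futures import ThreadPoolExecutor


def map_max(chunk):
    """Per-chunk step: keep the highest value seen for each city."""
    best = {}
    for city, value in chunk:
        if city not in best or value > best[city]:
            best[city] = value
    return best


def reduce_max(partials):
    """Final step: merge the per-chunk maxima into a single result."""
    merged = {}
    for partial in partials:
        for city, value in partial.items():
            if city not in merged or value > merged[city]:
                merged[city] = value
    return merged


def run_job(chunks):
    """Process every chunk in parallel, then reduce to one answer."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(map_max, chunks))
    return partials, reduce_max(partials)


# Two runs of readings taken from the slide.
CHUNKS = [
    [("Rome", 31), ("New York", 17), ("Toronto", 31), ("Whitby", 22),
     ("New York", 19), ("Toronto", 28), ("Rome", 30)],
    [("New York", 32), ("Toronto", 18), ("Whitby", 23), ("New York", 32),
     ("Rome", 37), ("Toronto", 22), ("Whitby", 20)],
]

partials, result = run_job(CHUNKS)
print(partials)
print(result)
```

Because taking a maximum is associative, the partial results can be merged in any order, which is what allows the work to be split across processes in the first place.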


1. Big Data Summit, Dave Wolcott

2. High Level


7. HDFS Analogy (diagram with eight items marked X)

8. HDFS Analogy (diagram with eight items marked X)
