Hadoop Demystified
What is it? How does Microsoft fit in?
and… of course… some demos!
Presentation for ATL .NET User Group
(July, 2014)
Lester Martin
Page 1
Agenda
• Hadoop 101
–Fundamentally, What is Hadoop?
–How is it Different?
–History of Hadoop
• Components of the Hadoop Ecosystem
• MapReduce, Pig, and Hive Demos
–Word Count
–Open Georgia Dataset Analysis
Page 2
Connection before Content
• Lester Martin
• Hortonworks – Professional Services
• lmartin@hortonworks.com
• http://about.me/lestermartin (links to blog, github, twitter, LI, FB, etc)
Page 3
© Hortonworks Inc. 2012
What is Core Apache Hadoop?
Scalable, Fault Tolerant, Open Source Data Storage and Processing
Page 4
Flexibility to Store and Mine Any Type of Data
 Ask questions that were previously impossible to ask or solve
 Not bound by a single, fixed schema
Excels at Processing Complex Data
 Scale-out architecture divides workloads across multiple nodes
 Eliminates ETL bottlenecks
Scales Economically
 Deployed on “commodity” hardware
 Open source platform guards against vendor lock-in
Core components:
–Scale-Out Storage: HDFS
–Scale-Out Resource Mgt: YARN
–Scale-Out Processing: MapReduce
The Need for Hadoop
• Store and use all types of data
• Process ALL the data; not just a sample
• Scalability to 1000s of nodes
• Commodity hardware
Page 5
Relational Database vs. Hadoop
Relational                           Hadoop
Schema required on write             Schema required on read
Reads are fast                       Writes are fast
Standards and structured governance  Loosely structured
Limited or no data processing        Processing coupled with data
Structured data types                Multi- and unstructured data

Best-fit use:
Interactive OLAP analytics           Data discovery
Complex ACID transactions            Processing unstructured data
Operational data store               Massive storage/processing
Fundamentally, a Simple Algorithm
1. Review stack of quarters
2. Count each year that ends
in an even number
Page 7
Processing at Scale
Page 8
Distributed Algorithm – Map:Reduce
Page 9
Map
(total number of quarters)
Reduce
(sum each person’s total)
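The quarters analogy maps directly onto code: each "person" runs the Map step over their own stack, and the Reduce step sums the per-person totals into one answer. A minimal Python sketch (the stacks and years are made-up illustration data):

```python
# Map step: each person counts the quarters in their own stack
# whose year ends in an even number.
def count_even_years(stack):
    return sum(1 for year in stack if year % 2 == 0)

# Three "people", each holding their own stack of quarter years.
stacks = [
    [1998, 2001, 2004],   # person 1 -> 2 even years
    [1999, 2003],         # person 2 -> 0 even years
    [2000, 2002, 2005],   # person 3 -> 2 even years
]

per_person_totals = [count_even_years(s) for s in stacks]  # the Map output

# Reduce step: sum each person's total into the final answer.
grand_total = sum(per_person_totals)
print(grand_total)  # 4
```

The point of the analogy: the per-stack counts happen in parallel and independently, and only the tiny per-person totals need to be brought together at the end.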
A Brief History of Apache Hadoop
Page 10
2005: Hadoop created at Yahoo!
2006: Apache project established; Yahoo! begins to operate at scale (focus on OPERATIONS)
2008: Yahoo! team extends focus to operations to support multiple projects & growing clusters (STABILITY)
2011: Hortonworks created to focus on “Enterprise Hadoop”; starts with 24 key Hadoop engineers from Yahoo!
2013: Focus on INNOVATION – Hortonworks Data Platform (Enterprise Hadoop)
(timeline: 2004 – 2006 – 2008 – 2010 – 2012)
HDP / Hadoop Components
Page 11
HDP: Enterprise Hadoop Platform
Page 12
Hortonworks
Data Platform (HDP)
• The ONLY 100% open source
and complete platform
• Integrates full range of
enterprise-ready services
• Certified and tested at scale
• Engineered for deep
ecosystem interoperability
HORTONWORKS DATA PLATFORM (HDP)
• HADOOP CORE: HDFS, YARN, MapReduce, Tez
• DATA SERVICES: Hive & HCatalog, Pig, HBase; load & extract via Sqoop, Flume, NFS, WebHDFS, Knox*
• OPERATIONAL SERVICES: Oozie, Ambari, Falcon*
• PLATFORM SERVICES: enterprise readiness – High Availability, Disaster Recovery, Rolling Upgrades, Security and Snapshots
• Deployable on OS/VM, Cloud, or Appliance
Typical Hadoop Cluster
Page 13
HDFS - Writing Files
(diagram: Rack1 … RackN, each node running a DataNode (DN) and NodeManager (NM))
• The Hadoop Client requests a write from the Name Node, which returns the target DNs, etc.
• The client writes blocks directly to DataNodes spread across racks
• DataNodes send block reports back to the Name Node
• A Backup NN stays in sync (fs sync) and performs checkpoints for the Name Node
Hive
• Data warehousing package built on top of Hadoop
• Bringing structure to unstructured data
• Query petabytes of data with HiveQL
• Schema on read
Hive: SQL-Like Interface to Hadoop
• Provides basic SQL functionality using MapReduce to
execute queries
• Supports standard SQL clauses
INSERT INTO
SELECT
FROM … JOIN … ON
WHERE
GROUP BY
HAVING
ORDER BY
LIMIT
• Supports basic DDL
CREATE/ALTER/DROP TABLE, DATABASE
Page 17
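Hive itself needs a cluster, but the clause set listed above is plain SQL, so the shape of a HiveQL query can be sketched against SQLite as an illustrative stand-in (table and column names here are invented, and SQLite of course does not compile queries to MapReduce the way Hive does):

```python
import sqlite3

# In-memory database standing in for a Hive table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10.0), ("east", 20.0), ("west", 5.0), ("west", 1.0), ("north", 50.0)],
)

# SELECT ... GROUP BY ... HAVING ... ORDER BY ... LIMIT -- the same
# clauses Hive supports (in Hive, each of these would become MapReduce work).
rows = conn.execute(
    """
    SELECT region, SUM(amount) AS total
    FROM sales
    GROUP BY region
    HAVING SUM(amount) > 10
    ORDER BY total DESC
    LIMIT 2
    """
).fetchall()
print(rows)  # [('north', 50.0), ('east', 30.0)]
```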
Hortonworks Investment
in Apache Hive
Batch AND Interactive SQL-IN-Hadoop
Stinger Initiative
A broad, community-based effort to
drive the next generation of HIVE
Page 18
Goals:
• Speed – improve Hive query performance by 100X to allow for interactive query times (seconds)
• Scale – the only SQL interface to Hadoop designed for queries that scale from TB to PB
• SQL – support the broadest range of SQL semantics for analytic applications running against Hadoop

Stinger Phase 1:
• Base Optimizations
• SQL Types
• SQL Analytic Functions
• ORCFile Modern File Format

Stinger Phase 2:
• SQL Types
• SQL Analytic Functions
• Advanced Optimizations
• Performance Boosts via YARN

Stinger Phase 3:
• Hive on Apache Tez
• Query Service (always on)
• Buffer Cache
• Cost Based Optimizer (Optiq)

…70% complete in 6 months…all IN Hadoop
Stinger: Enhancing SQL Semantics
Page 19
Hive SQL Datatypes:
INT • TINYINT/SMALLINT/BIGINT • BOOLEAN • FLOAT • DOUBLE • STRING • TIMESTAMP • BINARY • DECIMAL • ARRAY, MAP, STRUCT, UNION • CHAR • VARCHAR • DATE

Hive SQL Semantics:
SELECT, LOAD, INSERT from query • expressions in WHERE and HAVING • GROUP BY, ORDER BY, SORT BY • sub-queries in the FROM clause • CLUSTER BY, DISTRIBUTE BY • ROLLUP and CUBE • UNION • LEFT, RIGHT and FULL INNER/OUTER JOIN • CROSS JOIN, LEFT SEMI JOIN • windowing functions (OVER, RANK, etc.) • INTERSECT, EXCEPT, UNION DISTINCT • sub-queries in HAVING • sub-queries in WHERE (IN/NOT IN, EXISTS/NOT EXISTS)

(These features were introduced incrementally across Hive 0.10, 0.11, 0.12, and 0.13.)
Pig
• Pig was created at Yahoo! to analyze data in HDFS without writing
Map/Reduce code.
• Two components:
– SQL like processing language called “Pig Latin”
– PIG execution engine producing Map/Reduce code
• Popular uses:
– ETL at scale (offloading)
– Text parsing and processing to Hive or HBase
– Aggregating data from multiple sources
Pig
Sample Code to find dropped call data:
-- aliases renamed: Pig relation names cannot start with a digit
Data_4G = LOAD '/archive/FDR_4G.txt' USING TextLoader();
Customer_Master = LOAD 'masterdb.customer_data' USING HCatLoader();
Data_4G_Full = JOIN Data_4G BY customerID, Customer_Master BY customerID;
X = FILTER Data_4G_Full BY State == 'call_dropped';
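The Pig script above is just a join followed by a filter. The same dataflow can be sketched in plain Python with toy records (the customerID and State field names come from the slide; the data values are invented):

```python
# Toy stand-ins for the two LOADed relations.
fdr_4g = [
    {"customerID": 1, "State": "call_dropped"},
    {"customerID": 2, "State": "call_ok"},
    {"customerID": 1, "State": "call_ok"},
]
customer_master = [
    {"customerID": 1, "name": "Ann"},
    {"customerID": 2, "name": "Bob"},
]

# JOIN ... BY customerID: pair up records sharing the same key.
master_by_id = {c["customerID"]: c for c in customer_master}
joined = [
    {**rec, **master_by_id[rec["customerID"]]}
    for rec in fdr_4g
    if rec["customerID"] in master_by_id
]

# FILTER ... BY State == 'call_dropped'
dropped = [rec for rec in joined if rec["State"] == "call_dropped"]
print(dropped)  # [{'customerID': 1, 'State': 'call_dropped', 'name': 'Ann'}]
```

At cluster scale, Pig compiles exactly this join-then-filter plan into MapReduce jobs instead of in-memory loops.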
Typical Data Analysis Workflow
Powering the Modern Data Architecture
HADOOP 1.0 – Single Use System (batch apps):
• HDFS 1 (redundant, reliable storage)
• MapReduce (distributed data processing & cluster resource management)
• Data processing frameworks on top (Hive, Pig, Cascading, …)

HADOOP 2.0 – Multi Use Data Platform (batch, interactive, online, streaming, …):
interact with all data in multiple ways simultaneously
• Redundant, reliable storage: HDFS 2
• Cluster resource management: YARN
• Batch processing: MapReduce
• Interactive processing: Tez
• Standard SQL processing: Hive
• Online data processing: HBase, Accumulo
• Real-time stream processing: Storm
• others…
Page 23
Word Counting Time!!
Hadoop’s “Hello Whirled” Example
A quick refresher of core elements of
Hadoop and then code walk-thrus with
Java MapReduce and Pig
Page 25
Core Hadoop Concepts
• Applications are written in high-level code
–Developers need not worry about network programming, temporal
dependencies or low-level infrastructure
• Nodes talk to each other as little as possible
–Developers should not write code which communicates between
nodes
–“Shared nothing” architecture
• Data is spread among machines in advance
–Computation happens where the data is stored, wherever possible
– Data is replicated multiple times on the system for increased
availability and reliability
Page 26
Hadoop: Very High-Level Overview
• When data is loaded in the system, it is split into
“blocks”
–Typically 64MB or 128MB
• Map tasks (first part of MapReduce) work on relatively
small portions of data
–Typically a single block
• A master program allocates work to nodes such that a
Map tasks will work on a block of data stored locally
on that node whenever possible
–Many nodes work in parallel, each on their own part of the overall
dataset
Page 27
Fault Tolerance
• If a node fails, the master will detect that failure and
re-assign the work to a different node on the system
• Restarting a task does not require communication
with nodes working on other portions of the data
• If a failed node restarts, it is automatically added back
to the system and assigned new tasks
• If a node appears to be running slowly, the master
can redundantly execute another instance of the same
task
–Results from the first to finish will be used
–Known as “speculative execution”
Page 28
Hadoop Components
• Hadoop consists of two core components
–The Hadoop Distributed File System (HDFS)
–MapReduce
• Many other projects based around core Hadoop (the
“Ecosystem”)
–Pig, Hive, HBase, Flume, Oozie, Sqoop, Datameer, etc.
• A set of machines running HDFS and MapReduce is
known as a Hadoop Cluster
–Individual machines are known as nodes
–A cluster can have as few as one node, as many as several
thousand
– More nodes = better performance!
Page 29
Hadoop Components: HDFS
• HDFS, the Hadoop Distributed File System, is
responsible for storing data on the cluster
• Data is split into blocks and distributed across
multiple nodes in the cluster
–Each block is typically 64MB (the default) or 128MB in size
• Each block is replicated multiple times
–Default is to replicate each block three times
–Replicas are stored on different nodes
– This ensures both reliability and availability
Page 30
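The block-and-replica math above can be sketched numerically. Below is a hypothetical helper (not a real HDFS API) that splits a file into 128 MB blocks and spreads three replicas of each block across distinct nodes with a toy round-robin policy; real HDFS placement is rack-aware:

```python
import math
from itertools import cycle, islice

BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB block size
REPLICATION = 3                 # HDFS default replication factor

def place_blocks(file_size, nodes):
    """Return one (block_id, [replica_nodes]) entry per block, putting
    each replica on a different node (toy round-robin placement)."""
    num_blocks = math.ceil(file_size / BLOCK_SIZE)
    placements = []
    node_cycle = cycle(nodes)
    for block_id in range(num_blocks):
        replicas = list(islice(node_cycle, REPLICATION))
        placements.append((block_id, replicas))
    return placements

# A 300 MB file needs ceil(300/128) = 3 blocks, so 9 replicas in total.
layout = place_blocks(300 * 1024 * 1024, ["node1", "node2", "node3", "node4"])
print(len(layout))  # 3
```

Losing any single node leaves at least two live copies of every block, which is what buys the reliability and availability the slide describes.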
HDFS Replicated Blocks Visualized
Page 31
HDFS *is* a File System
• Screenshot for “Name Node UI”
Page 32
Accessing HDFS
• Applications can read and write HDFS files directly via
a Java API
• Typically, files are created on a local filesystem and
must be moved into HDFS
• Likewise, files stored in HDFS may need to be moved
to a machine’s local filesystem
• Access to HDFS from the command line is achieved
with the hdfs dfs command
–Provides various shell-like commands as you find on Linux
–Replaces the hadoop fs command
• Graphical tools available like the Sandbox’s Hue File
Browser and Red Gate’s HDFS Explorer
Page 33
hdfs dfs Examples
• Copy file fooLocal.txt from local disk to the user’s home directory
in HDFS
–This will copy the file to /user/username/fooHDFS.txt
• Get a directory listing of the user’s home directory in
HDFS
• Get a directory listing of the HDFS root directory
Page 34
hdfs dfs -put fooLocal.txt fooHDFS.txt
hdfs dfs -ls
hdfs dfs -ls /
hdfs dfs Examples (continued)
• Display the contents of a specific HDFS file
• Move that file back to the local disk
• Create a directory called input under the user’s home
directory
• Delete the HDFS directory input and all its contents
Page 35
hdfs dfs -cat /user/fred/fooHDFS.txt
hdfs dfs -get /user/fred/fooHDFS.txt barLocal.txt
hdfs dfs -mkdir input
hdfs dfs -rm -r input
Hadoop Components: MapReduce
• MapReduce is the system used to process data in the
Hadoop cluster
• Consists of two phases: Map, and then Reduce
–Between the two is a stage known as the shuffle and sort
• Each Map task operates on a discrete portion of the
overall dataset
–Typically one HDFS block of data
• After all Maps are complete, the MapReduce system
distributes the intermediate data to nodes which
perform the Reduce phase
–Source code examples and live demo coming!
Page 36
Features of MapReduce
• Hadoop attempts to run tasks on nodes which hold
their portion of the data locally, to avoid network
traffic
• Automatic parallelization, distribution, and fault-
tolerance
• Status and monitoring tools
• A clean abstraction for programmers
–MapReduce programs are usually written in Java
– Can be written in any language using Hadoop Streaming
– All of Hadoop is written in Java
–With “housekeeping” taken care of by the framework, developers
can concentrate simply on writing Map and Reduce functions
Page 37
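Since the slide mentions Hadoop Streaming, here is a minimal sketch of the streaming contract in Python: a mapper turns input lines into tab-separated key/value records, and a reducer sums values for already-sorted keys. (Simplified; a real job would ship separate mapper and reducer scripts to the Hadoop Streaming jar, and Hadoop would do the sort between them.)

```python
from itertools import groupby

def mapper(lines):
    """Streaming mapper: emit one 'word\t1' record per word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(records):
    """Streaming reducer: input arrives sorted by key, so consecutive
    records with the same key can be summed with groupby."""
    parsed = (rec.split("\t") for rec in records)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        yield word, sum(int(v) for _, v in group)

# Simulate map -> sort -> reduce; the sort stands in for Hadoop's shuffle.
mapped = sorted(mapper(["to be or not to be"]))
counts = dict(reducer(mapped))
print(counts)  # {'be': 2, 'not': 1, 'or': 1, 'to': 2}
```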
MapReduce Visualized
Page 38
Detailed Administrative Console
• Screenshot from “Job Tracker UI”
Page 39
MapReduce: The Mapper
• The Mapper reads data in the form of key/value pairs
(KVPs)
• It outputs zero or more KVPs
• The Mapper may use or completely ignore the input
key
–For example, a standard pattern is to read a line of a file at a time
– The key is the byte offset into the file at which the line starts
– The value is the contents of the line itself
– Typically the key is considered irrelevant with this pattern
• If the Mapper writes anything out, it must be in the form
of KVPs
–This “intermediate data” is NOT stored in HDFS (local storage only
without replication)
Page 40
MapReduce: The Reducer
• After the Map phase is over, all the intermediate
values for a given intermediate key are combined
together into a list
• This list is given to a Reducer
–There may be a single Reducer, or multiple Reducers
–All values associated with a particular intermediate key are
guaranteed to go to the same Reducer
–The intermediate keys, and their value lists, are passed in sorted
order
• The Reducer outputs zero or more KVPs
–These are written to HDFS
–In practice, the Reducer often emits a single KVP for each input
key
Page 41
MapReduce Example: Word Count
• Count the number of occurrences of each word in a
large amount of input data
Page 42
map(String input_key, String input_value)
  foreach word in input_value:
    emit(word, 1)

reduce(String output_key, Iter<int> intermediate_vals)
  set count = 0
  foreach val in intermediate_vals:
    count += val
  emit(output_key, count)
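That pseudocode translates directly into runnable Python, with an explicit shuffle-and-sort step between the two functions (the step Hadoop normally performs for you):

```python
from collections import defaultdict

def map_fn(input_key, input_value):
    """Mapper: emit (word, 1) per word; input_key (byte offset) ignored."""
    return [(word, 1) for word in input_value.split()]

def reduce_fn(output_key, intermediate_vals):
    """Reducer: sum the list of counts for one word."""
    return (output_key, sum(intermediate_vals))

# Input records as (byte offset, line) pairs, as on the next slides.
records = [
    (8675, "I will not eat green eggs and ham"),
    (8709, "I will not eat them Sam I am"),
]

# Shuffle & sort: group all values emitted for the same key.
grouped = defaultdict(list)
for key, value in records:
    for word, one in map_fn(key, value):
        grouped[word].append(one)

result = dict(reduce_fn(w, vals) for w, vals in sorted(grouped.items()))
print(result["I"], result["eat"], result["Sam"])  # 3 2 1
```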
MapReduce Example: Map Phase
Page 43
• Input to the Mapper
• Ignoring the key
– It is just an offset
• Output from the Mapper
• No attempt is made to optimize
within a record in this example
– This is a great use case for a
“Combiner”
(8675, ‘I will not eat
green eggs and ham’)
(8709, ‘I will not eat
them Sam I am’)
(‘I’, 1), (‘will’, 1),
(‘not’, 1), (‘eat’, 1),
(‘green’, 1), (‘eggs’, 1),
(‘and’, 1), (‘ham’, 1),
(‘I’, 1), (‘will’, 1),
(‘not’, 1), (‘eat’, 1),
(‘them’, 1), (‘Sam’, 1),
(‘I’, 1), (‘am’, 1)
MapReduce Example: Reduce Phase
Page 44
• Input to the Reducer
• Notice keys are sorted and
associated values for same key
are in a single list
– Shuffle & Sort did this for us
• Output from the Reducer
• All done!
(‘I’, [1, 1, 1])
(‘Sam’, [1])
(‘am’, [1])
(‘and’, [1])
(‘eat’, [1, 1])
(‘eggs’, [1])
(‘green’, [1])
(‘ham’, [1])
(‘not’, [1, 1])
(‘them’, [1])
(‘will’, [1, 1])
(‘I’, 3)
(‘Sam’, 1)
(‘am’, 1)
(‘and’, 1)
(‘eat’, 2)
(‘eggs’, 1)
(‘green’, 1)
(‘ham’, 1)
(‘not’, 2)
(‘them’, 1)
(‘will’, 2)
Code Walkthru & Demo Time!!
• Word Count Example
–Java MapReduce
–Pig
Page 45
Additional Demonstrations
A Real-World Analysis Example
Compare/contrast solving the same
problem with Java MapReduce, Pig,
and Hive
Page 46
Dataset: Open Georgia
• Salaries & Travel Reimbursements
–Organization
– Local Boards of Education
– Several Atlanta-area districts; multiple years
– State Agencies, Boards, Authorities and Commissions
– Dept of Public Safety; 2010
Page 47
Format & Sample Data
Page 48
NAME (String)        | TITLE (String)                     | SALARY (float) | ORG TYPE (String) | ORG (String)                      | YEAR (int)
ABBOTT,DEEDEE W      | GRADES 9-12 TEACHER                | 52,122.10      | LBOE              | ATLANTA INDEPENDENT SCHOOL SYSTEM | 2010
ALLEN,ANNETTE D      | SPEECH-LANGUAGE PATHOLOGIST        | 92,937.28      | LBOE              | ATLANTA INDEPENDENT SCHOOL SYSTEM | 2010
BAHR,SHERREEN T      | GRADE 5 TEACHER                    | 52,752.71      | LBOE              | COBB COUNTY SCHOOL DISTRICT       | 2010
BAILEY,ANTOINETTE R  | SCHOOL SECRETARY/CLERK             | 19,905.90      | LBOE              | COBB COUNTY SCHOOL DISTRICT       | 2010
BAILEY,ASHLEY N      | EARLY INTERVENTION PRIMARY TEACHER | 43,992.82      | LBOE              | COBB COUNTY SCHOOL DISTRICT       | 2010
CALVERT,RONALD MARTIN| STATE PATROL (SP)                  | 51,370.40      | SABAC             | PUBLIC SAFETY, DEPARTMENT OF      | 2010
CAMERON,MICHAEL D    | PUBLIC SAFETY TRN (AL)             | 34,748.60      | SABAC             | PUBLIC SAFETY, DEPARTMENT OF      | 2010
DAAS,TARWYN TARA     | GRADES 9-12 TEACHER                | 41,614.50      | LBOE              | FULTON COUNTY BOARD OF EDUCATION  | 2011
DABBS,SANDRA L       | GRADES 9-12 TEACHER                | 79,801.59      | LBOE              | FULTON COUNTY BOARD OF EDUCATION  | 2011
E'LOM,SOPHIA L       | IS PERSONNEL - GENERAL ADMIN       | 75,509.00      | LBOE              | FULTON COUNTY BOARD OF EDUCATION  | 2012
EADDY,FENNER R       | SUBSTITUTE                         | 13,469.00      | LBOE              | FULTON COUNTY BOARD OF EDUCATION  | 2012
EADY,ARNETTA A       | ASSISTANT PRINCIPAL                | 71,879.00      | LBOE              | FULTON COUNTY BOARD OF EDUCATION  | 2012
Simple Use Case
• For all loaded State of Georgia salary information
–Produce statistics for each specific job title
– Number of employees
– Salary breakdown
– Minimum
– Maximum
– Average
–Limit the data to investigate
– Fiscal year 2010
– School district employees
Page 49
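Before the demos, the expected shape of the output can be sketched directly: group the records by title and compute count/min/max/average, limited to fiscal year 2010 school district (LBOE) rows. A Python sketch over a handful of rows from the sample-data slide (illustrative only; the demos run over the full dataset):

```python
from collections import defaultdict

# (title, salary, org_type, year) tuples taken from the sample-data slide.
rows = [
    ("GRADES 9-12 TEACHER",                52122.10, "LBOE",  2010),
    ("SPEECH-LANGUAGE PATHOLOGIST",        92937.28, "LBOE",  2010),
    ("GRADE 5 TEACHER",                    52752.71, "LBOE",  2010),
    ("SCHOOL SECRETARY/CLERK",             19905.90, "LBOE",  2010),
    ("EARLY INTERVENTION PRIMARY TEACHER", 43992.82, "LBOE",  2010),
    ("STATE PATROL (SP)",                  51370.40, "SABAC", 2010),
    ("GRADES 9-12 TEACHER",                41614.50, "LBOE",  2011),
]

# Limit the data: fiscal year 2010, school district (LBOE) employees only.
salaries_by_title = defaultdict(list)
for title, salary, org_type, year in rows:
    if year == 2010 and org_type == "LBOE":
        salaries_by_title[title].append(salary)

# Per-title statistics: employee count, min, max, and average salary.
stats = {
    title: (len(s), min(s), max(s), sum(s) / len(s))
    for title, s in salaries_by_title.items()
}
print(stats["GRADES 9-12 TEACHER"])  # (1, 52122.1, 52122.1, 52122.1)
```

The Java MapReduce, Pig, and Hive versions in the demos each express this same group-filter-aggregate logic in their own idiom.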
Code Walkthru & Demo; Part Deux!
• Word Count Example
–Java MapReduce
–Pig
–Hive
Page 50
Demo Wrap-Up
• All code, test data, wiki pages, and blog posting can
be found, or linked to, from
–https://github.com/lestermartin/hadoop-exploration
• This deck can be found on SlideShare
–http://www.slideshare.net/lestermartin
• Questions?
Page 51
Thank You!!
• Lester Martin
• Hortonworks – Professional Services
• lmartin@hortonworks.com
• http://about.me/lestermartin (links to blog, github, twitter, LI, FB, etc)
Page 52
 
Big Data Hoopla Simplified - TDWI Memphis 2014
Big Data Hoopla Simplified - TDWI Memphis 2014Big Data Hoopla Simplified - TDWI Memphis 2014
Big Data Hoopla Simplified - TDWI Memphis 2014
 
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
 
Topic 9a-Hadoop Storage- HDFS.pptx
Topic 9a-Hadoop Storage- HDFS.pptxTopic 9a-Hadoop Storage- HDFS.pptx
Topic 9a-Hadoop Storage- HDFS.pptx
 
Hadoop jon
Hadoop jonHadoop jon
Hadoop jon
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and Hadoop
 
Hadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log ProcessingHadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log Processing
 
Big Data in the Microsoft Platform
Big Data in the Microsoft PlatformBig Data in the Microsoft Platform
Big Data in the Microsoft Platform
 
Introduction to HDFS and MapReduce
Introduction to HDFS and MapReduceIntroduction to HDFS and MapReduce
Introduction to HDFS and MapReduce
 
Cloud Austin Meetup - Hadoop like a champion
Cloud Austin Meetup - Hadoop like a championCloud Austin Meetup - Hadoop like a champion
Cloud Austin Meetup - Hadoop like a champion
 
Big Data Developers Moscow Meetup 1 - sql on hadoop
Big Data Developers Moscow Meetup 1  - sql on hadoopBig Data Developers Moscow Meetup 1  - sql on hadoop
Big Data Developers Moscow Meetup 1 - sql on hadoop
 
Discover.hdp2.2.storm and kafka.final
Discover.hdp2.2.storm and kafka.finalDiscover.hdp2.2.storm and kafka.final
Discover.hdp2.2.storm and kafka.final
 

Recently uploaded

1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
Tiktokethiodaily
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
ukgaet
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
AbhimanyuSinha9
 
Tabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflowsTabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflows
alex933524
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
vcaxypu
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
nscud
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
John Andrews
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
Oppotus
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
axoqas
 
Business update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMIBusiness update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMI
AlejandraGmez176757
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
ewymefz
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
NABLAS株式会社
 
Jpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization SampleJpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization Sample
James Polillo
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
ewymefz
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
axoqas
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
ArpitMalhotra16
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
Opendatabay
 
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
correoyaya
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 

Recently uploaded (20)

1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
 
Tabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflowsTabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflows
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
 
Business update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMIBusiness update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMI
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
 
Jpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization SampleJpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization Sample
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
 
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 

Hadoop Demystified + MapReduce (Java and C#), Pig, and Hive Demos

  • 9. Distributed Algorithm – Map:Reduce – Map (total number of quarters), then Reduce (sum each person’s total) Page 9
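The quarters analogy maps directly onto the two phases; a minimal sketch in Python (the stacks and years below are made-up sample data for illustration, not from the deck):

```python
def map_phase(stack):
    """Each helper counts the even-year quarters in their own stack."""
    return sum(1 for year in stack if year % 2 == 0)

def reduce_phase(partial_counts):
    """One person sums every helper's partial total."""
    return sum(partial_counts)

# The full pile of quarters, split among four helpers in advance.
stacks = [
    [1999, 2000, 2004],   # helper 1 -> 2 even years
    [2001, 2002],         # helper 2 -> 1
    [2010, 2012, 2013],   # helper 3 -> 2
    [1995],               # helper 4 -> 0
]

partials = [map_phase(s) for s in stacks]  # on a cluster these run in parallel
total = reduce_phase(partials)
print(total)  # 5
```

The point of the split is that no helper ever needs to see another helper's stack; only the small partial counts travel to the reducer.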
  • 10. A Brief History of Apache Hadoop Page 10
–2005: Hadoop created at Yahoo!; focus on INNOVATION
–2008: Apache Project established; Yahoo! team extends focus to OPERATIONS to support multiple projects & growing clusters; Yahoo! begins to operate at scale
–2011: Hortonworks created to focus on “Enterprise Hadoop” and STABILITY; starts with 24 key Hadoop engineers from Yahoo!
–2012 onward: Hortonworks Data Platform delivers Enterprise Hadoop
  • 11. HDP / Hadoop Components Page 11
  • 12. HDP: Enterprise Hadoop Platform Page 12 Hortonworks Data Platform (HDP) • The ONLY 100% open source and complete platform • Integrates full range of enterprise-ready services • Certified and tested at scale • Engineered for deep ecosystem interoperability
–Deployment options: OS/VM, Cloud, Appliance
–Hadoop Core: HDFS, YARN, MapReduce, Tez
–Data Services: Hive & HCatalog, Pig, HBase; load & extract via Sqoop, Flume, NFS, WebHDFS, Knox*
–Operational Services: Oozie, Ambari, Falcon*
–Platform Services (Enterprise Readiness): High Availability, Disaster Recovery, Rolling Upgrades, Security and Snapshots
  • 14. HDFS – Writing Files Page 14 [Diagram: a Hadoop client requests a write from the Name Node and gets back the target Data Nodes; the client writes blocks directly to Data Nodes (DN | NM) spread across racks; Data Nodes send block reports and fs sync back to the Name Node; a Backup NN takes periodic checkpoints of the Name Node]
  • 15. Hive • Data warehousing package built on top of Hadoop • Brings structure to unstructured data • Query petabytes of data with HiveQL • Schema on read
  • 16. Hive: SQL-Like Interface to Hadoop • Provides basic SQL functionality using MapReduce to execute queries • Supports standard SQL clauses INSERT INTO SELECT FROM … JOIN … ON WHERE GROUP BY HAVING ORDER BY LIMIT • Supports basic DDL CREATE/ALTER/DROP TABLE, DATABASE Page 17
  • 17. Hortonworks Investment in Apache Hive Page 18 Stinger Initiative: a broad, community-based effort to drive the next generation of Hive – batch AND interactive SQL-in-Hadoop
–Goals: Speed (improve Hive query performance by 100X to allow for interactive query times, in seconds); Scale (the only SQL interface to Hadoop designed for queries that scale from TB to PB); SQL (support the broadest range of SQL semantics for analytic applications running against Hadoop) …70% complete in 6 months…all IN Hadoop
–Stinger Phase 1: Base Optimizations • SQL Types • SQL Analytic Functions • ORCFile Modern File Format
–Stinger Phase 2: SQL Types • SQL Analytic Functions • Advanced Optimizations • Performance Boosts via YARN
–Stinger Phase 3: Hive on Apache Tez • Query Service (always on) • Buffer Cache • Cost Based Optimizer (Optiq)
  • 18. Stinger: Enhancing SQL Semantics Page 19
–Hive SQL Datatypes: INT • TINYINT/SMALLINT/BIGINT • BOOLEAN • FLOAT • DOUBLE • STRING • TIMESTAMP • BINARY • DECIMAL • ARRAY, MAP, STRUCT, UNION • CHAR • VARCHAR • DATE
–Hive SQL Semantics: SELECT, LOAD, INSERT from query • Expressions in WHERE and HAVING • GROUP BY, ORDER BY, SORT BY • Sub-queries in FROM clause • CLUSTER BY, DISTRIBUTE BY • ROLLUP and CUBE • UNION • LEFT, RIGHT and FULL INNER/OUTER JOIN • CROSS JOIN, LEFT SEMI JOIN • Windowing functions (OVER, RANK, etc.) • INTERSECT, EXCEPT, UNION DISTINCT • Sub-queries in HAVING • Sub-queries in WHERE (IN/NOT IN, EXISTS/NOT EXISTS)
–Features added incrementally across Hive 0.10, 0.11, 0.12, and 0.13 (complete subset)
  • 19. Pig • Pig was created at Yahoo! to analyze data in HDFS without writing Map/Reduce code. • Two components: – SQL-like processing language called “Pig Latin” – Pig execution engine producing Map/Reduce code • Popular uses: – ETL at scale (offloading) – Text parsing and processing to Hive or HBase – Aggregating data from multiple sources
  • 20. Pig Sample Code to find dropped call data:
Data_4G = LOAD '/archive/FDR_4G.txt' USING TextLoader();
Customer_Master = LOAD 'masterdb.customer_data' USING HCatLoader();
Data_4G_Full = JOIN Data_4G BY customerID, Customer_Master BY customerID;
X = FILTER Data_4G_Full BY State == 'call_dropped';
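What that script computes can be sketched in plain Python – a join followed by a filter – using hypothetical in-memory records standing in for the HDFS file and the HCatalog table:

```python
# Hypothetical stand-ins for the 4G call records and the customer master table.
calls_4g = [
    {"customerID": 1, "State": "call_dropped"},
    {"customerID": 2, "State": "call_ok"},
    {"customerID": 1, "State": "call_ok"},
]
customer_master = {1: {"name": "Ann"}, 2: {"name": "Bob"}}

# JOIN ... BY customerID: attach customer attributes to each call record.
joined = [
    {**call, **customer_master[call["customerID"]]}
    for call in calls_4g
    if call["customerID"] in customer_master
]

# FILTER ... BY State == 'call_dropped': keep only the dropped calls.
dropped = [row for row in joined if row["State"] == "call_dropped"]
print(dropped)
```

Pig generates the equivalent Map/Reduce jobs for you, so the same two-line join-and-filter logic runs across the whole cluster instead of one process's memory.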
  • 22. Powering the Modern Data Architecture Page 23
–HADOOP 1.0 (Single Use System: Batch Apps): HDFS 1 (redundant, reliable storage) + MapReduce (distributed data processing & cluster resource management), with data processing frameworks (Hive, Pig, Cascading, …) layered on top
–HADOOP 2.0 (Multi Use Data Platform: Batch, Interactive, Online, Streaming, …): interact with all data in multiple ways simultaneously – HDFS 2 (redundant, reliable storage) and YARN (cluster resource management), with multiple engines on top: standard SQL processing (Hive), batch (MapReduce), interactive (Tez), online data processing (HBase, Accumulo), real-time stream processing (Storm), and others
  • 23. Word Counting Time!! Hadoop’s “Hello Whirled” Example A quick refresher of core elements of Hadoop and then code walk-thrus with Java MapReduce and Pig Page 25
  • 24. Core Hadoop Concepts • Applications are written in high-level code –Developers need not worry about network programming, temporal dependencies or low-level infrastructure • Nodes talk to each other as little as possible –Developers should not write code which communicates between nodes –“Shared nothing” architecture • Data is spread among machines in advance –Computation happens where the data is stored, wherever possible – Data is replicated multiple times on the system for increased availability and reliability Page 26
  • 25. Hadoop: Very High-Level Overview • When data is loaded in the system, it is split into “blocks” –Typically 64MB or 128MB • Map tasks (first part of MapReduce) work on relatively small portions of data –Typically a single block • A master program allocates work to nodes such that a Map task will work on a block of data stored locally on that node whenever possible –Many nodes work in parallel, each on their own part of the overall dataset Page 27
  • 26. Fault Tolerance • If a node fails, the master will detect that failure and re-assign the work to a different node on the system • Restarting a task does not require communication with nodes working on other portions of the data • If a failed node restarts, it is automatically added back to the system and assigned new tasks • If a node appears to be running slowly, the master can redundantly execute another instance of the same task –Results from the first to finish will be used –Known as “speculative execution” Page 28
  • 27. Hadoop Components • Hadoop consists of two core components –The Hadoop Distributed File System (HDFS) –MapReduce • Many other projects are based around core Hadoop (the “Ecosystem”) –Pig, Hive, HBase, Flume, Oozie, Sqoop, Datameer, etc. • A set of machines running HDFS and MapReduce is known as a Hadoop Cluster –Individual machines are known as nodes –A cluster can have as few as one node or as many as several thousand – More nodes = better performance! Page 29
  • 28. Hadoop Components: HDFS • HDFS, the Hadoop Distributed File System, is responsible for storing data on the cluster • Data is split into blocks and distributed across multiple nodes in the cluster –Each block is typically 64MB (the default) or 128MB in size • Each block is replicated multiple times –Default is to replicate each block three times –Replicas are stored on different nodes – This ensures both reliability and availability Page 30
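The block-and-replica arithmetic follows directly from those defaults; a small illustration (the file size is an assumed example, the 128MB/3x defaults are from the slide):

```python
import math

def hdfs_blocks(file_size_mb, block_size_mb=128, replication=3):
    """How many blocks a file splits into, and how many block replicas
    the cluster stores in total (default 3x replication)."""
    blocks = math.ceil(file_size_mb / block_size_mb)
    return blocks, blocks * replication

# A 1 GB (1024 MB) file with 128 MB blocks and 3x replication:
print(hdfs_blocks(1024))  # (8, 24)
```

So a 1 GB file becomes 8 blocks, and the cluster holds 24 block replicas spread across different nodes, which is what buys the reliability and availability.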
  • 29. HDFS Replicated Blocks Visualized Page 31
  • 30. HDFS *is* a File System • Screenshot for “Name Node UI” Page 32
  • 31. Accessing HDFS • Applications can read and write HDFS files directly via a Java API • Typically, files are created on a local filesystem and must be moved into HDFS • Likewise, files stored in HDFS may need to be moved to a machine’s local filesystem • Access to HDFS from the command line is achieved with the hdfs dfs command –Provides various shell-like commands as you find on Linux –Replaces the hadoop fs command • Graphical tools available like the Sandbox’s Hue File Browser and Red Gate’s HDFS Explorer Page 33
  • 32. hdfs dfs Examples Page 34
• Copy file fooLocal.txt from local disk to the user’s home directory in HDFS (this copies the file to /user/username/fooHDFS.txt): hdfs dfs -put fooLocal.txt fooHDFS.txt
• Get a directory listing of the user’s home directory in HDFS: hdfs dfs -ls
• Get a directory listing of the HDFS root directory: hdfs dfs -ls /
  • 33. hdfs dfs Examples (continued) Page 35
• Display the contents of a specific HDFS file: hdfs dfs -cat /user/fred/fooHDFS.txt
• Move that file back to the local disk: hdfs dfs -get /user/fred/fooHDFS.txt barLocal.txt
• Create a directory called input under the user’s home directory: hdfs dfs -mkdir input
• Delete the HDFS directory input and all its contents: hdfs dfs -rm -r input
  • 34. Hadoop Components: MapReduce • MapReduce is the system used to process data in the Hadoop cluster • Consists of two phases: Map, and then Reduce –Between the two is a stage known as the shuffle and sort • Each Map task operates on a discrete portion of the overall dataset –Typically one HDFS block of data • After all Maps are complete, the MapReduce system distributes the intermediate data to nodes which perform the Reduce phase –Source code examples and live demo coming! Page 36
  • 35. Features of MapReduce • Hadoop attempts to run tasks on nodes which hold their portion of the data locally, to avoid network traffic • Automatic parallelization, distribution, and fault-tolerance • Status and monitoring tools • A clean abstraction for programmers –MapReduce programs are usually written in Java – Can be written in any language using Hadoop Streaming – All of Hadoop is written in Java –With “housekeeping” taken care of by the framework, developers can concentrate simply on writing Map and Reduce functions Page 37
  • 37. Detailed Administrative Console • Screenshot from “Job Tracker UI” Page 39
  • 38. MapReduce: The Mapper • The Mapper reads data in the form of key/value pairs (KVPs) • It outputs zero or more KVPs • The Mapper may use or completely ignore the input key –For example, a standard pattern is to read a line of a file at a time – The key is the byte offset into the file at which the line starts – The value is the contents of the line itself – Typically the key is considered irrelevant with this pattern • If the Mapper writes anything out, it must be in the form of KVPs –This “intermediate data” is NOT stored in HDFS (local storage only, without replication) Page 40
  • 39. MapReduce: The Reducer • After the Map phase is over, all the intermediate values for a given intermediate key are combined together into a list • This list is given to a Reducer –There may be a single Reducer, or multiple Reducers –All values associated with a particular intermediate key are guaranteed to go to the same Reducer –The intermediate keys, and their value lists, are passed in sorted order • The Reducer outputs zero or more KVPs –These are written to HDFS –In practice, the Reducer often emits a single KVP for each input key Page 41
  • 40. MapReduce Example: Word Count • Count the number of occurrences of each word in a large amount of input data Page 42
map(String input_key, String input_value)
  foreach word in input_value:
    emit(word, 1)

reduce(String output_key, Iter<int> intermediate_vals)
  set count = 0
  foreach val in intermediate_vals:
    count += val
  emit(output_key, count)
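A runnable sketch of that pseudocode in Python, with the shuffle-and-sort step made explicit (this simulates the framework in a single process; a real job would run the same per-record logic via Hadoop, e.g. through Hadoop Streaming):

```python
from collections import defaultdict

def map_fn(offset, line):
    """Mapper: ignore the key (byte offset), emit (word, 1) for each word."""
    return [(word, 1) for word in line.split()]

def shuffle_sort(pairs):
    """Group all values for each intermediate key; keys come out sorted."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))

def reduce_fn(key, values):
    """Reducer: sum the 1s for each word."""
    return (key, sum(values))

# The sample records from the next slide: (byte offset, line contents).
records = [(8675, "I will not eat green eggs and ham"),
           (8709, "I will not eat them Sam I am")]

intermediate = [pair for k, v in records for pair in map_fn(k, v)]
counts = dict(reduce_fn(k, vs) for k, vs in shuffle_sort(intermediate).items())
print(counts["I"], counts["eat"], counts["Sam"])  # 3 2 1
```

The next two slides show exactly these intermediate pairs and grouped lists, so you can trace each stage of this sketch against them.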
  • 41. MapReduce Example: Map Phase Page 43
• Input to the Mapper (the key is just a byte offset, and is ignored):
(8675, ‘I will not eat green eggs and ham’)
(8709, ‘I will not eat them Sam I am’)
• Output from the Mapper (no attempt is made to optimize within a record in this example – this is a great use case for a “Combiner”):
(‘I’, 1), (‘will’, 1), (‘not’, 1), (‘eat’, 1), (‘green’, 1), (‘eggs’, 1), (‘and’, 1), (‘ham’, 1), (‘I’, 1), (‘will’, 1), (‘not’, 1), (‘eat’, 1), (‘them’, 1), (‘Sam’, 1), (‘I’, 1), (‘am’, 1)
  • 42. MapReduce Example: Reduce Phase Page 44
• Input to the Reducer (keys are sorted and all values for the same key are in a single list; Shuffle & Sort did this for us):
(‘I’, [1, 1, 1]) (‘Sam’, [1]) (‘am’, [1]) (‘and’, [1]) (‘eat’, [1, 1]) (‘eggs’, [1]) (‘green’, [1]) (‘ham’, [1]) (‘not’, [1, 1]) (‘them’, [1]) (‘will’, [1, 1])
• Output from the Reducer (all done!):
(‘I’, 3) (‘Sam’, 1) (‘am’, 1) (‘and’, 1) (‘eat’, 2) (‘eggs’, 1) (‘green’, 1) (‘ham’, 1) (‘not’, 2) (‘them’, 1) (‘will’, 2)
  • 43. Code Walkthru & Demo Time!! • Word Count Example –Java MapReduce –Pig Page 45
  • 44. Additional Demonstrations A Real-World Analysis Example Compare/contrast solving the same problem with Java MapReduce, Pig, and Hive Page 46
  • 45. Dataset: Open Georgia • Salaries & Travel Reimbursements –Organization – Local Boards of Education – Several Atlanta-area districts; multiple years – State Agencies, Boards, Authorities and Commissions – Dept of Public Safety; 2010 Page 47
  • 46. Format & Sample Data Page 48
Columns: NAME (String) | TITLE (String) | SALARY (float) | ORG TYPE (String) | ORG (String) | YEAR (int)
ABBOTT,DEEDEE W | GRADES 9-12 TEACHER | 52,122.10 | LBOE | ATLANTA INDEPENDENT SCHOOL SYSTEM | 2010
ALLEN,ANNETTE D | SPEECH-LANGUAGE PATHOLOGIST | 92,937.28 | LBOE | ATLANTA INDEPENDENT SCHOOL SYSTEM | 2010
BAHR,SHERREEN T | GRADE 5 TEACHER | 52,752.71 | LBOE | COBB COUNTY SCHOOL DISTRICT | 2010
BAILEY,ANTOINETTE R | SCHOOL SECRETARY/CLERK | 19,905.90 | LBOE | COBB COUNTY SCHOOL DISTRICT | 2010
BAILEY,ASHLEY N | EARLY INTERVENTION PRIMARY TEACHER | 43,992.82 | LBOE | COBB COUNTY SCHOOL DISTRICT | 2010
CALVERT,RONALD MARTIN | STATE PATROL (SP) | 51,370.40 | SABAC | PUBLIC SAFETY, DEPARTMENT OF | 2010
CAMERON,MICHAEL D | PUBLIC SAFETY TRN (AL) | 34,748.60 | SABAC | PUBLIC SAFETY, DEPARTMENT OF | 2010
DAAS,TARWYN TARA | GRADES 9-12 TEACHER | 41,614.50 | LBOE | FULTON COUNTY BOARD OF EDUCATION | 2011
DABBS,SANDRA L | GRADES 9-12 TEACHER | 79,801.59 | LBOE | FULTON COUNTY BOARD OF EDUCATION | 2011
E'LOM,SOPHIA L | IS PERSONNEL - GENERAL ADMIN | 75,509.00 | LBOE | FULTON COUNTY BOARD OF EDUCATION | 2012
EADDY,FENNER R | SUBSTITUTE | 13,469.00 | LBOE | FULTON COUNTY BOARD OF EDUCATION | 2012
EADY,ARNETTA A | ASSISTANT PRINCIPAL | 71,879.00 | LBOE | FULTON COUNTY BOARD OF EDUCATION | 2012
  • 47. Simple Use Case • For all loaded State of Georgia salary information –Produce statistics for each specific job title – Number of employees – Salary breakdown – Minimum – Maximum – Average –Limit the data to investigate – Fiscal year 2010 – School district employees Page 49
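That use case reduces to a filter plus a group-by with aggregates; a Python sketch of the logic (the rows are trimmed from the sample-data slide, with one made-up salary added so min/max/average differ):

```python
from collections import defaultdict

# (title, salary, org_type, year) tuples; filter keeps FY2010 school districts.
rows = [
    ("GRADES 9-12 TEACHER", 52122.10, "LBOE", 2010),
    ("GRADES 9-12 TEACHER", 49000.00, "LBOE", 2010),   # made-up row for illustration
    ("GRADE 5 TEACHER",     52752.71, "LBOE", 2010),
    ("GRADES 9-12 TEACHER", 41614.50, "LBOE", 2011),   # dropped: wrong fiscal year
    ("STATE PATROL (SP)",   51370.40, "SABAC", 2010),  # dropped: not a school district
]

salaries_by_title = defaultdict(list)
for title, salary, org_type, year in rows:
    if year == 2010 and org_type == "LBOE":
        salaries_by_title[title].append(salary)

# Per-title statistics: employee count, min, max, average salary.
for title, sals in salaries_by_title.items():
    print(title, len(sals), min(sals), max(sals), round(sum(sals) / len(sals), 2))
```

The demos that follow express this same group-and-aggregate three ways: hand-rolled in Java MapReduce, and more compactly in Pig and Hive.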
  • 48. Code Walkthru & Demo; Part Deux! • Open Georgia Analysis Example –Java MapReduce –Pig –Hive Page 50
  • 49. Demo Wrap-Up • All code, test data, wiki pages, and blog posting can be found, or linked to, from –https://github.com/lestermartin/hadoop-exploration • This deck can be found on SlideShare –http://www.slideshare.net/lestermartin • Questions? Page 51
  • 50. Thank You!! • Lester Martin • Hortonworks – Professional Services • lmartin@hortonworks.com • http://about.me/lestermartin (links to blog, github, twitter, LI, FB, etc) Page 52

Editor's Notes

  1. Hadoop fills several important needs in your data storage and processing infrastructure Store and use all types of data: Allows semi-structured, unstructured and structured data to be processed in a way to create new insights of significant business value. Process all the data: Instead of looking at samples of data or small sections of data, organizations can look at large volumes of data to get new perspective and make business decisions with higher degree of accuracy. Scalability: Reducing latency in business is critical for success. The massive scalability of Big Data systems allow organizations to process massive amounts of data in a fraction of the time required for traditional systems. Commodity hardware: Self-healing, extremely scalable, highly available environment with cost-effective commodity hardware.
  2. KEY CALLOUT: Schema on Read IMPORTANT NOTE: Hadoop is not meant to replace your relational database. Hadoop is for storing Big Data, which is often the type of data that you would otherwise not store in a database due to size or cost constraints You will still have your database for relational, transactional data.
  3. I can’t really talk about Hortonworks without first taking a moment to talk about the history of Hadoop. What we now know as Hadoop really started back in 2005, when the team at Yahoo! started work on a project to build a large-scale data storage and processing technology that would allow them to store and process massive amounts of data to underpin Yahoo!’s most critical application, Search. The initial focus was on building out the technology – the key components being HDFS and MapReduce – that would become the Core of what we think of as Hadoop today, and continuing to innovate it to meet the needs of this specific application. By 2008, Hadoop usage had greatly expanded inside of Yahoo!, to the point that many applications were now using this data management platform, and as a result the team’s focus extended to include a focus on Operations: now that applications were beginning to propagate around the organization, sophisticated capabilities for operating it at scale were necessary. It was also at this time that usage began to expand well beyond Yahoo!, with many notable organizations (including Facebook and others) adopting Hadoop as the basis of their large-scale data processing and storage applications, necessitating a focus on operations to support what was by now a large variety of critical business applications. In 2011, recognizing that more mainstream adoption of Hadoop was beginning to take off and with an objective of facilitating it, the core team left – with the blessing of Yahoo! – to form Hortonworks. The goal of the group was to facilitate broader adoption by addressing the Enterprise capabilities that would enable a larger number of organizations to adopt and expand their usage of Hadoop. [note: if useful as a talk track, Cloudera was formed in 2008, well BEFORE the operational expertise of running Hadoop at scale was established inside of Yahoo!]
  4. SQL is a query language: declarative (what, not how); oriented around answering a question; requires a uniform schema; requires metadata; known by everyone. A great choice for answering queries, building reports, and use with automated tools.
  5. With Hive and Stinger we are focused on enabling the SQL ecosystem and to do that we’ve put Hive on a clear roadmap to SQL compliance. That includes adding critical datatypes like character and date types as well as implementing common SQL semantics seen in most databases.
  6. “hdfs dfs” is the *new* “hadoop fs”; a blank path argument acts like the user’s home directory (~)
  7. These two slides were just to make folks feel at home with CLI access to HDFS
  8. See https://martin.atlassian.net/wiki/x/FwAvAQ for more details Surely not the typical Volume/Velocity/Variety definition of “Big Data”, but gives us a controlled environment to do some simple prototyping and validating with
  9. See https://martin.atlassian.net/wiki/x/NYBmAQ for more details
  10. See https://martin.atlassian.net/wiki/x/FwAvAQ for more information