A NEW PLATFORM FOR A NEW ERA
Hadoop and Pivotal HD
April 23, 2013
About the speakers
Adam Shook
– Technical Architect for Pivotal
– 2+ years Hadoop experience
– Instructor for Hadoop-based courses
Mark Pollack
– Spring committer since 2003
– Founder of Spring.NET
– Lead of the Spring Data family of projects
Agenda
What is Hadoop?
Pivotal HD
HAWQ
Spring for Apache Hadoop
Questions
What is Hadoop?
Why Is Hadoop Important?
Delivers performance and scalability at low cost
Handles large amounts of data
Stores data in native format
Resilient in case of infrastructure failures
Transparent application scalability
Hadoop Overview
Open-source Apache project out of Yahoo! in 2006
Distributed fault-tolerant data storage and batch processing
Linear scalability on commodity hardware
Hadoop Overview
Great at
– Reliable storage for huge data sets
– Batch queries and analytics
– Changing schemas
Not so great at
– Changes to files (can't do it…)
– Low-latency responses
– Analyst usability
HDFS Overview
Hierarchical UNIX-like file system for data storage
– sort of
Splitting of large files into blocks
Distribution and replication of blocks to nodes
Two key services
– Master NameNode
– Many DataNodes
Secondary/Checkpoint Node
How HDFS Works - Writes
(Diagram: Client, NameNode, and DataNodes A–D with blocks A1–A4)
1. Client contacts the NameNode to write data
2. NameNode says write it to these nodes
3. Client sequentially writes blocks to the DataNodes
4. DataNodes replicate the data blocks, orchestrated by the NameNode
How HDFS Works - Reads
(Diagram: Client, NameNode, and DataNodes with replicated blocks A1–A4)
1. Client contacts the NameNode to read data
2. NameNode says you can find it here
3. Client sequentially reads blocks from the DataNodes
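This client-side flow is what Hadoop's FileSystem API drives under the covers. A minimal sketch of both the write and read paths (the path and file contents are illustrative; assumes fs.default.name points at the cluster):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadWrite {
    public static void main(String[] args) throws Exception {
        // Picks up fs.default.name (e.g. hdfs://localhost:9000) from the config
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Write: the NameNode chooses the DataNodes; the stream writes blocks to them
        FSDataOutputStream out = fs.create(new Path("/demo/hello.txt"));
        out.writeUTF("hadoop is fun");
        out.close();

        // Read: the NameNode returns block locations; the stream reads from DataNodes
        FSDataInputStream in = fs.open(new Path("/demo/hello.txt"));
        System.out.println(in.readUTF());
        in.close();

        fs.close();
    }
}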
Hadoop MapReduce 1.x
Moves the code to the data
JobTracker
– Master service to monitor jobs
TaskTracker
– Multiple services to run tasks
– Same physical machine as a DataNode
A job contains many tasks
A task contains one or more task attempts
How MapReduce Works
(Diagram: Client, JobTracker, and TaskTrackers A–D co-located with DataNodes A–D; input blocks A1–A4, output blocks B1–B4)
1. Client submits job to the JobTracker
2. JobTracker submits tasks to the TaskTrackers
3. Job output is written to DataNodes with replication
4. JobTracker reports metrics
MapReduce Paradigm
Data processing system with two key phases
Map
– Perform a map function on key/value pairs
Reduce
– Perform a reduce function on key/value groups
Groups created by sorting map output
Word count dataflow (Map Tasks 0–2, Reduce Tasks 0–1):

Map Input:
(0, "hadoop is fun")   (52, "I love hadoop")   (104, "Pig is more fun")

Map Output:
("hadoop", 1) ("is", 1) ("fun", 1)
("I", 1) ("love", 1) ("hadoop", 1)
("Pig", 1) ("is", 1) ("more", 1) ("fun", 1)

SHUFFLE AND SORT

Reducer Input Groups:
("hadoop", {1,1}) ("is", {1,1}) ("fun", {1,1}) ("love", {1}) ("I", {1}) ("Pig", {1}) ("more", {1})

Reducer Output:
("hadoop", 2) ("fun", 2) ("love", 1) ("I", 1) ("is", 2) ("Pig", 1) ("more", 1)
Word Count

Count the number of times each word is used in a body of text
Map input is a line of text
Reduce output is a word and its count

map(byte_offset, line)
    foreach word in line
        emit(word, 1)

reduce(word, counts)
    sum = 0
    foreach count in counts
        sum += count
    emit(word, sum)
Mapper Code
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {

    private final static IntWritable ONE = new IntWritable(1);
    private Text word = new Text();

    // Called once per input line; emits (word, 1) for each token
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            context.write(word, ONE);
        }
    }
}
Reducer Code
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class IntSumReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {

    // Called once per word; sums its counts and emits the total
    public void reduce(Text key, Iterable<IntWritable> values,
            Context context) throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
Pivotal HD
Pivotal HD
World's first true SQL processing for enterprise-ready Hadoop
100% Apache Hadoop-based platform
Virtualization- and cloud-ready with VMware and Isilon
Pivotal HD Architecture
Apache Hadoop components: HDFS, MapReduce, YARN (resource management & workflow), HBase, Pig, Hive, Mahout, Sqoop, Flume, ZooKeeper
Pivotal HD Enterprise additions: Command Center (deploy, configure, monitor, manage), Hadoop Virtualization (HVE), Data Loader, Spring
HAWQ – Advanced Database Services: Xtension Framework, Catalog Services, Query Optimizer, Dynamic Pipelining, ANSI SQL + Analytics
HAWQ
HAWQ: The Crown Jewel of Greenplum
– SQL compliant
– World-class query optimizer
– Interactive query
– Horizontal scalability
– Robust data management
– Common Hadoop formats
– Deep analytics
HAWQ
Query Processing
– Interactive and true ANSI SQL support
– Multi-petabyte horizontal scalability
– Cost-based parallel query optimizer
– Programmable analytics

Database Services and Management
– Scatter-gather data loading
– Row and column storage
– Workload management
– Multi-level partitioning
– 3rd-party tool & open client interfaces
10+ Years MPP Database R&D to Hadoop
PRODUCT FEATURES
– Multi-level fault tolerance
– Shared-nothing MPP
– Parallel query optimizer
– Polymorphic Data Storage™

CLIENT ACCESS & TOOLS
– Client access: ODBC, JDBC, OLEDB, MapReduce, etc.
– 3rd-party tools: BI tools, ETL tools, data mining, etc.
– Admin tools: Command Center, Package Manager
– Language support: comprehensive SQL (SQL 92, 99, 2003), OLAP extensions, analytics extensions

MPP ARCHITECTURE
– Parallel Dataflow Engine
– Software interconnect
– Scatter/Gather Streaming™ data loading
– Online system expansion, workload management

ADAPTIVE SERVICES
– Loading & external access: petabyte-scale loading, trickle micro-batching, anywhere data access
– Storage & data access: hybrid storage & execution (row- and column-oriented), in-database compression, multi-level partitioning
Query Optimizer
Physical plan contains scans, joins, sorts, aggregations, etc.
Cost-based optimization looks for the most efficient plan
Global planning avoids sub-optimal "SQL pushing" to segments
Directly inserts "motion" nodes for inter-segment communication

Execution plan (for a query such as SELECT s.beer, s.price FROM Bars b JOIN Sells s ON b.name = s.bar WHERE b.city = 'San Francisco'):
Scan Bars b → Filter b.city = 'San Francisco' → Motion Redist(b.name)
Scan Sells s
→ HashJoin b.name = s.bar → Project s.beer, s.price → Motion Gather
Dynamic Pipelining™
A supercomputing-based "soft-switch"
Core execution technology, borrowed from GPDB, that lets HAWQ run complex jobs without materializing intermediate results
Efficiently pumps streams of data between motion nodes during query-plan execution
Delivers messages, moves data, collects results, and coordinates work among the segments in the system
Xtension Framework
Enables intelligent query integration with filter pushdown to HBase, Hive, and HDFS
Supports common data formats such as Avro, Protocol Buffers, and SequenceFiles
Provides an extensible framework for connectivity to other data sources
(Diagram: Xtension Framework between HAWQ and HDFS, HBase, Hive)
HAWQ Deployment
(Diagram)
Master servers & NameNodes – query planning & dispatch, accessed via an ODBC/JDBC driver
Segment servers & DataNodes – query processing & data storage on HDFS, linked by Dynamic Pipelining
External sources – loading, streaming, etc.
How HAWQ Works
(diagram-only slide sequence)
Spring for Apache
Hadoop
Simplify developing Hadoop Applications
Developer observations on Hadoop
Hadoop has a poor out-of-the-box programming model
Non-trivial applications often become a collection of scripts calling Hadoop command-line applications
Spring aims to simplify developing Hadoop applications
– Leverage several Spring ecosystem projects
Spring For Apache Hadoop - Features
Consistent programming and declarative configuration model
– Create, configure, and parameterize Hadoop connectivity and all job types
– Environment profiles – easily move an application from dev to QA to production
Developer productivity
– Create well-formed applications, not spaghetti script applications
– Simplify HDFS access and FsShell API with support for JVM scripting
– Runner classes for MR/Pig/Hive/Cascading for small workflows
– Helper "Template" classes for Pig/Hive/HBase
Spring For Apache Hadoop – Use Cases
Apply across a wide range of use cases
– Ingest: Events/JDBC/NoSQL/Files to HDFS
– Orchestrate: Hadoop Jobs
– Export: HDFS to JDBC/NoSQL
Spring Integration and Spring Batch make this possible
Counting Words – Configuring M/R
• Standard Hadoop APIs
Configuration conf = new Configuration();
Job job = new Job(conf, "wordcount");
job.setJarByClass(WordCountMapper.class);
job.setMapperClass(WordCountMapper.class);
job.setReducerClass(IntSumReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
job.waitForCompletion(true);
Configuring Hadoop with Spring
applicationContext.xml:

<context:property-placeholder location="hadoop-dev.properties"/>

<hdp:configuration>
    fs.default.name=${hd.fs}
</hdp:configuration>

<hdp:job id="word-count-job"
    input-path="${input.path}"
    output-path="${output.path}"
    jar="hadoop-examples.jar"
    mapper="examples.WordCount.WordMapper"
    reducer="examples.WordCount.IntSumReducer"/>

<hdp:job-runner id="runner" job-ref="word-count-job"
    run-at-startup="true"/>

hadoop-dev.properties:

input.path=/wc/input/
output.path=/wc/word/
hd.fs=hdfs://localhost:9000

The job definition automatically determines the output key and value classes.
Injecting Jobs
Use DI to obtain reference to Hadoop Job
– Perform additional runtime configuration and submit
public class WordService {
@Autowired
private Job mapReduceJob;
public void processWords() {
mapReduceJob.submit();
}
}
Streaming Jobs and Environment Configuration

bin/hadoop jar hadoop-streaming.jar \
    -input /wc/input -output /wc/output \
    -mapper /bin/cat -reducer /bin/wc \
    -files stopwords.txt

<context:property-placeholder location="hadoop-${env}.properties"/>

<hdp:streaming id="wc" input-path="${input.path}" output-path="${output.path}"
    mapper="${cat}" reducer="${wc}"
    files="classpath:stopwords.txt"/>

hadoop-dev.properties:

input.path=/wc/input/
output.path=/wc/word/
hd.fs=hdfs://localhost:9000

env=dev java -jar SpringLauncher.jar applicationContext.xml
Streaming Jobs and Environment Configuration

bin/hadoop jar hadoop-streaming.jar \
    -input /wc/input -output /wc/output \
    -mapper /bin/cat -reducer /bin/wc \
    -files stopwords.txt

<context:property-placeholder location="hadoop-${env}.properties"/>

<hdp:streaming id="wc" input-path="${input.path}" output-path="${output.path}"
    mapper="${cat}" reducer="${wc}"
    files="classpath:stopwords.txt"/>

hadoop-qa.properties:

input.path=/gutenberg/input/
output.path=/gutenberg/word/
hd.fs=hdfs://darwin:9000

env=qa java -jar SpringLauncher.jar applicationContext.xml
HDFS and Hadoop Shell as APIs
• Access all "bin/hadoop fs" commands through Spring's FsShell helper class
– mkdir, chmod, test
class MyScript {

    @Autowired FsShell fsh;

    @PostConstruct void init() {
        String outputDir = "/data/output";
        // Clean up any previous output directory
        if (fsh.test(outputDir)) {
            fsh.rmr(outputDir);
        }
    }
}
HDFS and Hadoop Shell as APIs
FsShell is designed to support JVM scripting languages
// use the shell (made available under variable fsh)
if (!fsh.test(inputDir)) {
fsh.mkdir(inputDir);
fsh.copyFromLocal(sourceFile, inputDir);
fsh.chmod(700, inputDir)
}
if (fsh.test(outputDir)) {
fsh.rmr(outputDir)
}
copy-files.groovy
HDFS and Hadoop Shell as APIs
Reference script and supply variables in application
configuration
<script id="setupScript" location="copy-files.groovy">
    <property name="inputDir" value="${wordcount.input.path}"/>
    <property name="outputDir" value="${wordcount.output.path}"/>
    <property name="sourceFile" value="${localSourceFile}"/>
</script>
appCtx.xml
Small workflows
Often need the following steps
– Execute HDFS operations before job
– Run MapReduce Job
– Execute HDFS operations after job completes
Spring's JobRunner helper class sequences these steps
– Can reference multiple scripts with comma-delimited names

<hdp:job-runner id="runner" run-at-startup="true"
    pre-action="setupScript"
    job="wordcountJob"
    post-action="tearDownScript"/>
Runner classes
Similar runner classes available for Hive and Pig
Implement JDK callable interface
Easy to schedule for simple needs using Spring
Can later 'graduate' to use Spring Batch for more complex workflows
– Start simple and grow, reusing existing configuration
<hdp:job-runner id="runner" run-at-startup="false"
    pre-action="setupScript"
    job="wordcountJob"
    post-action="tearDownScript"/>

<task:scheduled-tasks>
    <task:scheduled ref="runner" method="call" cron="3/30 * * * * ?"/>
</task:scheduled-tasks>
Spring's PigRunner
Execute a small Pig workflow
<pig-factory job-name="analysis" properties-location="pig-server.properties"/>

<script id="hdfsScript" location="copy-files.groovy">
    <property name="sourceFile" value="${localSourceFile}"/>
    <property name="inputDir" value="${inputDir}"/>
    <property name="outputDir" value="${outputDir}"/>
</script>

<pig-runner id="pigRunner" pre-action="hdfsScript" run-at-startup="true">
    <script location="wordCount.pig">
        <arguments>
            inputDir=${inputDir}
            outputDir=${outputDir}
        </arguments>
    </script>
</pig-runner>
PigTemplate - Configuration
Helper class that simplifies the programmatic use of Pig
– Common tasks are one-liners
Similar template helper classes for Hive and HBase
<pig-factory id="pigFactory" properties-location="pig-server.properties"/>

<pig-template pig-factory-ref="pigFactory"/>
PigTemplate – Programmatic Use
public class PigPasswordRepository implements PasswordRepository {
@Autowired
private PigTemplate pigTemplate;
@Autowired
private String outputDir;
private String pigScript = "classpath:password-analysis.pig";
public void processPasswordFile(String inputFile) {
Properties scriptParameters = new Properties();
scriptParameters.put("inputDir", inputFile);
scriptParameters.put("outputDir", outputDir);
pigTemplate.executeScript(pigScript, scriptParameters);
}
}
Big Data problems are also integration problems
(Diagram: Collect → Transform → RT Analysis → Ingest → Batch Analysis → Distribute → Use, fed by Twitter Search & Gardenhose and built with Spring Integration & Data, Spring Hadoop + Batch, Spring MVC, Redis, and GemFire (CQ))
Spring Integration
Implementation of Enterprise Integration Patterns
– Mature, since 2007
– Apache 2.0 license
Separates integration concerns from processing logic
– Framework handles message reception and method invocation
• e.g. polling vs. event-driven
– Endpoints written as POJOs
• Increases testability
Pipes and Filters Architecture
Endpoints are connected through Channels and exchange Messages

$> cat foo.txt | grep the | while read l; do echo $l ; done

(Diagram: producer endpoint → channel → consumer endpoint, with adapters such as File, JMS, and TCP and routing in between)
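A minimal sketch of that shell pipeline in Spring Integration XML (the lineSource and lineConsumer beans are hypothetical stand-ins for the cat and echo stages):

<int:inbound-channel-adapter channel="lines" ref="lineSource" method="nextLine">
    <int:poller fixed-rate="1000"/>
</int:inbound-channel-adapter>

<int:channel id="lines"/>

<!-- the grep stage: only pass lines containing "the" -->
<int:filter input-channel="lines" output-channel="matched"
    expression="payload.contains('the')"/>

<int:channel id="matched"/>

<!-- the consuming endpoint, a plain POJO method -->
<int:service-activator input-channel="matched" ref="lineConsumer" method="handle"/>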
Spring Batch
Framework for batch processing
– Basis for JSR-352
Born out of collaboration with
Accenture in 2007
Features
– parsers, mappers, readers, writers
– automatic retries after failure
– periodic commits
– synchronous and asynchronous processing
– parallel processing
– partial processing (skipping records)
– non-sequential processing
– job tracking and restart
Spring Integration and Batch for Hadoop
Ingest/Export
Event Streams – Spring Integration
– Examples
▪ Consume syslog events, transform and write to HDFS
▪ Consume twitter search results and write to HDFS
Batch – Spring Batch
– Examples
▪ Read log files on local file system, transform and write to HDFS
▪ Read from HDFS, transform and write to JDBC, HBase, MongoDB,…
Spring Data, Integration, & Batch for Analytics
Realtime Analytics – Spring Integration & Data
– Examples – Service Activator that
▪ Increments counters in Redis or MongoDB using Spring Data helper libraries
▪ Create Gemfire Continuous Queries using Spring Gemfire
Batch Analytics – Spring Batch
– Orchestrate Hadoop-based workflows with Spring Batch
– Also orchestrate non-Hadoop-based workflows
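As an illustrative sketch of the counter case (names are hypothetical; assumes Spring Data Redis on the classpath), a service-activator POJO can increment a Redis-backed counter for each message it receives:

import org.springframework.data.redis.connection.RedisConnectionFactory;
import org.springframework.data.redis.support.atomic.RedisAtomicLong;

public class HashtagCounterService {

    private final RedisAtomicLong counter;

    public HashtagCounterService(RedisConnectionFactory connectionFactory) {
        // Counter lives in Redis under the key "tweets:hashtag:count"
        this.counter = new RedisAtomicLong("tweets:hashtag:count", connectionFactory);
    }

    // Invoked by Spring Integration as a service activator, once per message
    public void onHashtag(String hashtag) {
        counter.incrementAndGet();
    }
}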
Ingesting – Syslog into HDFS
Use SI's syslog adapter
Perform transformation on the data
Route to specific channels based on category
One route leads to HDFS; filtered data is stored in Redis
Ingesting – Multi-node syslog into HDFS
Syslog collection across multiple machines
Break the processing chain at channel boundaries
Use SI's TCP adapters to forward events
– Or other SI middleware adapters
Hadoop Analytical Workflow Managed by Spring Batch
Reuse the same Batch infrastructure and knowledge to manage Hadoop workflows
A step can be any Hadoop job type or HDFS script
Spring Batch Configuration for Hadoop
<job id="job1">
    <step id="import" next="wordcount">
        <tasklet ref="import-tasklet"/>
    </step>
    <step id="wordcount" next="pig">
        <tasklet ref="wordcount-tasklet"/>
    </step>
    <step id="pig" next="parallel">
        <tasklet ref="pig-tasklet"/>
    </step>
    <split id="parallel" next="hdfs">
        <flow>
            <step id="mrStep">
                <tasklet ref="mr-tasklet"/>
            </step>
        </flow>
        <flow>
            <step id="hive">
                <tasklet ref="hive-tasklet"/>
            </step>
        </flow>
    </split>
    <step id="hdfs">
        <tasklet ref="hdfs-tasklet"/>
    </step>
</job>
Exporting HDFS to JDBC
• Use Spring Batch's
– MultiResourceItemReader
– JdbcBatchItemWriter

<step id="step1">
    <tasklet>
        <chunk reader="flatFileItemReader" processor="itemProcessor"
            writer="jdbcItemWriter"
            commit-interval="100" retry-limit="3"/>
    </tasklet>
</step>
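The itemProcessor referenced above is just a bean implementing Spring Batch's ItemProcessor interface. A hypothetical sketch (the record type and field layout are assumptions) that turns a tab-delimited line read from HDFS into an object the JDBC writer can bind:

import org.springframework.batch.item.ItemProcessor;

// Hypothetical record type matching the target JDBC table
class WordCount {
    final String word;
    final int count;
    WordCount(String word, int count) {
        this.word = word;
        this.count = count;
    }
}

public class WordCountLineProcessor implements ItemProcessor<String, WordCount> {

    // Each chunk item is one line from an HDFS output file, e.g. "hadoop\t2"
    @Override
    public WordCount process(String line) {
        String[] fields = line.split("\t");
        return new WordCount(fields[0], Integer.parseInt(fields[1]));
    }
}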
Relationship between Spring Projects
Next Steps – Spring XD
New open source umbrella project to support common big
data use cases
– High throughput distributed data ingestion into HDFS
▪ From a variety of input sources
– Real-time analytics at ingestion time
▪ Gathering metrics, counting values, Gemfire CQ…
– On and off Hadoop workflow orchestration
– High throughput data export
▪ From HDFS to a RDBMS or NoSQL database.
XD = eXtreme Data, or y = mx + b
Next Steps – Spring XD
Consistent model that spans the 4 use-case categories
Move beyond delivering a set of libraries
– Provide an out-of-the-box executable server
– High level DSL to configure flows and jobs
▪ http | hdfs
– Pluggable module system
See blog post for more information
– Github: http://github.com/springsource/spring-xd
Get involved!
Resources
Pivotal
– goPivotal.com
Spring Data
– http://www.springsource.org/spring-data
– http://www.springsource.org/spring-hadoop
Spring Data Book - http://bit.ly/sd-book
– Part III on Big Data
Example code – https://github.com/SpringSource/spring-data-book
Spring XD – http://github.com/springsource/spring-xd
A NEW PLATFORM FOR A NEW ERA

More Related Content

What's hot

AWS Summit Sydney 2014 | Secure Hadoop as a Service - Session Sponsored by Intel
AWS Summit Sydney 2014 | Secure Hadoop as a Service - Session Sponsored by IntelAWS Summit Sydney 2014 | Secure Hadoop as a Service - Session Sponsored by Intel
AWS Summit Sydney 2014 | Secure Hadoop as a Service - Session Sponsored by IntelAmazon Web Services
 
Keynote: Getting Serious about MySQL and Hadoop at Continuent
Keynote: Getting Serious about MySQL and Hadoop at ContinuentKeynote: Getting Serious about MySQL and Hadoop at Continuent
Keynote: Getting Serious about MySQL and Hadoop at ContinuentContinuent
 
Where does hadoop come handy
Where does hadoop come handyWhere does hadoop come handy
Where does hadoop come handyPraveen Sripati
 
Hive and Apache Tez: Benchmarked at Yahoo! Scale
Hive and Apache Tez: Benchmarked at Yahoo! ScaleHive and Apache Tez: Benchmarked at Yahoo! Scale
Hive and Apache Tez: Benchmarked at Yahoo! ScaleDataWorks Summit
 
Big Data Performance and Capacity Management
Big Data Performance and Capacity ManagementBig Data Performance and Capacity Management
Big Data Performance and Capacity Managementrightsize
 
Hadoop Summit Dublin 2016: Hadoop Platform at Yahoo - A Year in Review
Hadoop Summit Dublin 2016: Hadoop Platform at Yahoo - A Year in Review Hadoop Summit Dublin 2016: Hadoop Platform at Yahoo - A Year in Review
Hadoop Summit Dublin 2016: Hadoop Platform at Yahoo - A Year in Review Sumeet Singh
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to HadoopGiovanna Roda
 
Basics of big data analytics hadoop
Basics of big data analytics hadoopBasics of big data analytics hadoop
Basics of big data analytics hadoopAmbuj Kumar
 
MADlib Architecture and Functional Demo on How to Use MADlib/PivotalR
MADlib Architecture and Functional Demo on How to Use MADlib/PivotalRMADlib Architecture and Functional Demo on How to Use MADlib/PivotalR
MADlib Architecture and Functional Demo on How to Use MADlib/PivotalRPivotalOpenSourceHub
 
20131205 hadoop-hdfs-map reduce-introduction
20131205 hadoop-hdfs-map reduce-introduction20131205 hadoop-hdfs-map reduce-introduction
20131205 hadoop-hdfs-map reduce-introductionXuan-Chao Huang
 
Integration of HIve and HBase
Integration of HIve and HBaseIntegration of HIve and HBase
Integration of HIve and HBaseHortonworks
 
Real time hadoop + mapreduce intro
Real time hadoop + mapreduce introReal time hadoop + mapreduce intro
Real time hadoop + mapreduce introGeoff Hendrey
 
Introduction to Spark on Hadoop
Introduction to Spark on HadoopIntroduction to Spark on Hadoop
Introduction to Spark on HadoopCarol McDonald
 
Hadoop Summit Amsterdam 2014: Capacity Planning In Multi-tenant Hadoop Deploy...
Hadoop Summit Amsterdam 2014: Capacity Planning In Multi-tenant Hadoop Deploy...Hadoop Summit Amsterdam 2014: Capacity Planning In Multi-tenant Hadoop Deploy...
Hadoop Summit Amsterdam 2014: Capacity Planning In Multi-tenant Hadoop Deploy...Sumeet Singh
 
The Future of Hadoop Security
The Future of Hadoop SecurityThe Future of Hadoop Security
The Future of Hadoop SecurityDataWorks Summit
 
Hadoop project design and a usecase
Hadoop project design and  a usecaseHadoop project design and  a usecase
Hadoop project design and a usecasesudhakara st
 
Hbase status quo apache-con europe - nov 2012
Hbase status quo   apache-con europe - nov 2012Hbase status quo   apache-con europe - nov 2012
Hbase status quo apache-con europe - nov 2012Chris Huang
 
Hadoop Summit San Jose 2015: What it Takes to Run Hadoop at Scale Yahoo Persp...
Hadoop Summit San Jose 2015: What it Takes to Run Hadoop at Scale Yahoo Persp...Hadoop Summit San Jose 2015: What it Takes to Run Hadoop at Scale Yahoo Persp...
Hadoop Summit San Jose 2015: What it Takes to Run Hadoop at Scale Yahoo Persp...Sumeet Singh
 

What's hot (20)

AWS Summit Sydney 2014 | Secure Hadoop as a Service - Session Sponsored by Intel
AWS Summit Sydney 2014 | Secure Hadoop as a Service - Session Sponsored by IntelAWS Summit Sydney 2014 | Secure Hadoop as a Service - Session Sponsored by Intel
AWS Summit Sydney 2014 | Secure Hadoop as a Service - Session Sponsored by Intel
 
Keynote: Getting Serious about MySQL and Hadoop at Continuent
Keynote: Getting Serious about MySQL and Hadoop at ContinuentKeynote: Getting Serious about MySQL and Hadoop at Continuent
Keynote: Getting Serious about MySQL and Hadoop at Continuent
 
Where does hadoop come handy
Where does hadoop come handyWhere does hadoop come handy
Where does hadoop come handy
 
Hive and Apache Tez: Benchmarked at Yahoo! Scale
Hive and Apache Tez: Benchmarked at Yahoo! ScaleHive and Apache Tez: Benchmarked at Yahoo! Scale
Hive and Apache Tez: Benchmarked at Yahoo! Scale
 
Big Data Performance and Capacity Management
Big Data Performance and Capacity ManagementBig Data Performance and Capacity Management
Big Data Performance and Capacity Management
 
Enabling R on Hadoop
Enabling R on HadoopEnabling R on Hadoop
Enabling R on Hadoop
 
Hadoop Summit Dublin 2016: Hadoop Platform at Yahoo - A Year in Review
Hadoop Summit Dublin 2016: Hadoop Platform at Yahoo - A Year in Review Hadoop Summit Dublin 2016: Hadoop Platform at Yahoo - A Year in Review
Hadoop Summit Dublin 2016: Hadoop Platform at Yahoo - A Year in Review
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
Hadoop
HadoopHadoop
Hadoop
 
Basics of big data analytics hadoop
Basics of big data analytics hadoopBasics of big data analytics hadoop
Basics of big data analytics hadoop
 
MADlib Architecture and Functional Demo on How to Use MADlib/PivotalR
MADlib Architecture and Functional Demo on How to Use MADlib/PivotalRMADlib Architecture and Functional Demo on How to Use MADlib/PivotalR
MADlib Architecture and Functional Demo on How to Use MADlib/PivotalR
 
20131205 hadoop-hdfs-map reduce-introduction
20131205 hadoop-hdfs-map reduce-introduction20131205 hadoop-hdfs-map reduce-introduction
20131205 hadoop-hdfs-map reduce-introduction
 
Integration of HIve and HBase
Integration of HIve and HBaseIntegration of HIve and HBase
Integration of HIve and HBase
 
Real time hadoop + mapreduce intro
Real time hadoop + mapreduce introReal time hadoop + mapreduce intro
Real time hadoop + mapreduce intro
 
Introduction to Spark on Hadoop
Introduction to Spark on HadoopIntroduction to Spark on Hadoop
Introduction to Spark on Hadoop
 
Hadoop Summit Amsterdam 2014: Capacity Planning In Multi-tenant Hadoop Deploy...
Hadoop Summit Amsterdam 2014: Capacity Planning In Multi-tenant Hadoop Deploy...Hadoop Summit Amsterdam 2014: Capacity Planning In Multi-tenant Hadoop Deploy...
Hadoop Summit Amsterdam 2014: Capacity Planning In Multi-tenant Hadoop Deploy...
 
The Future of Hadoop Security
The Future of Hadoop SecurityThe Future of Hadoop Security
The Future of Hadoop Security
 
Hadoop project design and a usecase
Hadoop project design and  a usecaseHadoop project design and  a usecase
Hadoop project design and a usecase
 
Hbase status quo apache-con europe - nov 2012
Hbase status quo   apache-con europe - nov 2012Hbase status quo   apache-con europe - nov 2012
Hbase status quo apache-con europe - nov 2012
 
Hadoop Summit San Jose 2015: What it Takes to Run Hadoop at Scale Yahoo Persp...
Hadoop Summit San Jose 2015: What it Takes to Run Hadoop at Scale Yahoo Persp...Hadoop Summit San Jose 2015: What it Takes to Run Hadoop at Scale Yahoo Persp...
Hadoop Summit San Jose 2015: What it Takes to Run Hadoop at Scale Yahoo Persp...
 

Similar to Pivotal HD and Spring for Apache Hadoop

Analysis of historical movie data by BHADRA
Analysis of historical movie data by BHADRAAnalysis of historical movie data by BHADRA
Analysis of historical movie data by BHADRABhadra Gowdra
 
Pivotal: Hadoop for Powerful Processing of Unstructured Data for Valuable Ins...
Pivotal: Hadoop for Powerful Processing of Unstructured Data for Valuable Ins...Pivotal: Hadoop for Powerful Processing of Unstructured Data for Valuable Ins...
Pivotal: Hadoop for Powerful Processing of Unstructured Data for Valuable Ins...EMC
 
Basic of Big Data
Basic of Big Data Basic of Big Data
Basic of Big Data Amar kumar
 
What it takes to run Hadoop at Scale: Yahoo! Perspectives
What it takes to run Hadoop at Scale: Yahoo! PerspectivesWhat it takes to run Hadoop at Scale: Yahoo! Perspectives
What it takes to run Hadoop at Scale: Yahoo! PerspectivesDataWorks Summit
 
Sql saturday pig session (wes floyd) v2
Sql saturday   pig session (wes floyd) v2Sql saturday   pig session (wes floyd) v2
Sql saturday pig session (wes floyd) v2Wes Floyd
 
Hadoop: An Industry Perspective
Hadoop: An Industry PerspectiveHadoop: An Industry Perspective
Hadoop: An Industry PerspectiveCloudera, Inc.
 
Presentation sreenu dwh-services
Presentation sreenu dwh-servicesPresentation sreenu dwh-services
Presentation sreenu dwh-servicesSreenu Musham
 
Hadoop past, present and future
Hadoop past, present and futureHadoop past, present and future
Hadoop past, present and futureCodemotion
 
CCD-410 Cloudera Study Material
CCD-410 Cloudera Study MaterialCCD-410 Cloudera Study Material
CCD-410 Cloudera Study MaterialRoxycodone Online
 
How pig and hadoop fit in data processing architecture
How pig and hadoop fit in data processing architectureHow pig and hadoop fit in data processing architecture
How pig and hadoop fit in data processing architectureKovid Academy
 
Big Data and Hadoop Guide
Big Data and Hadoop GuideBig Data and Hadoop Guide
Big Data and Hadoop GuideSimplilearn
 
Cloud Austin Meetup - Hadoop like a champion
Cloud Austin Meetup - Hadoop like a championCloud Austin Meetup - Hadoop like a champion
Cloud Austin Meetup - Hadoop like a championAmeet Paranjape
 
Overview of big data & hadoop version 1 - Tony Nguyen
Overview of big data & hadoop   version 1 - Tony NguyenOverview of big data & hadoop   version 1 - Tony Nguyen
Overview of big data & hadoop version 1 - Tony NguyenThanh Nguyen
 
Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1Thanh Nguyen
 
Overview of big data & hadoop v1
Overview of big data & hadoop   v1Overview of big data & hadoop   v1
Overview of big data & hadoop v1Thanh Nguyen
 
Web Briefing: Unlock the power of Hadoop to enable interactive analytics
Web Briefing: Unlock the power of Hadoop to enable interactive analyticsWeb Briefing: Unlock the power of Hadoop to enable interactive analytics
Web Briefing: Unlock the power of Hadoop to enable interactive analyticsKognitio
 
Hadoop and mysql by Chris Schneider
Hadoop and mysql by Chris SchneiderHadoop and mysql by Chris Schneider
Hadoop and mysql by Chris SchneiderDmitry Makarchuk
 

Similar to Pivotal HD and Spring for Apache Hadoop (20)

Analysis of historical movie data by BHADRA
Analysis of historical movie data by BHADRAAnalysis of historical movie data by BHADRA
Analysis of historical movie data by BHADRA
 
Pivotal: Hadoop for Powerful Processing of Unstructured Data for Valuable Ins...
Pivotal: Hadoop for Powerful Processing of Unstructured Data for Valuable Ins...Pivotal: Hadoop for Powerful Processing of Unstructured Data for Valuable Ins...
Pivotal: Hadoop for Powerful Processing of Unstructured Data for Valuable Ins...
 
Basic of Big Data
Basic of Big Data Basic of Big Data
Basic of Big Data
 
What it takes to run Hadoop at Scale: Yahoo! Perspectives
What it takes to run Hadoop at Scale: Yahoo! PerspectivesWhat it takes to run Hadoop at Scale: Yahoo! Perspectives
What it takes to run Hadoop at Scale: Yahoo! Perspectives
 
Hackathon bonn
Hackathon bonnHackathon bonn
Hackathon bonn
 
Sql saturday pig session (wes floyd) v2
Sql saturday   pig session (wes floyd) v2Sql saturday   pig session (wes floyd) v2
Sql saturday pig session (wes floyd) v2
 
Hadoop: An Industry Perspective
Hadoop: An Industry PerspectiveHadoop: An Industry Perspective
Hadoop: An Industry Perspective
 
Presentation sreenu dwh-services
Presentation sreenu dwh-servicesPresentation sreenu dwh-services
Presentation sreenu dwh-services
 
Hadoop past, present and future
Hadoop past, present and futureHadoop past, present and future
Hadoop past, present and future
 
CCD-410 Cloudera Study Material
CCD-410 Cloudera Study MaterialCCD-410 Cloudera Study Material
CCD-410 Cloudera Study Material
 
How pig and hadoop fit in data processing architecture
How pig and hadoop fit in data processing architectureHow pig and hadoop fit in data processing architecture
How pig and hadoop fit in data processing architecture
 
Big data overview by Edgars
Big data overview by EdgarsBig data overview by Edgars
Big data overview by Edgars
 
Big Data and Hadoop Guide
Big Data and Hadoop GuideBig Data and Hadoop Guide
Big Data and Hadoop Guide
 
Cloud Austin Meetup - Hadoop like a champion
Cloud Austin Meetup - Hadoop like a championCloud Austin Meetup - Hadoop like a champion
Cloud Austin Meetup - Hadoop like a champion
 
What is hadoop
What is hadoopWhat is hadoop
What is hadoop
 
Overview of big data & hadoop version 1 - Tony Nguyen
Overview of big data & hadoop   version 1 - Tony NguyenOverview of big data & hadoop   version 1 - Tony Nguyen
Overview of big data & hadoop version 1 - Tony Nguyen
 
Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1
 
Overview of big data & hadoop v1
Overview of big data & hadoop   v1Overview of big data & hadoop   v1
Overview of big data & hadoop v1
 
Web Briefing: Unlock the power of Hadoop to enable interactive analytics
Web Briefing: Unlock the power of Hadoop to enable interactive analyticsWeb Briefing: Unlock the power of Hadoop to enable interactive analytics
Web Briefing: Unlock the power of Hadoop to enable interactive analytics
 
Hadoop and mysql by Chris Schneider
Hadoop and mysql by Chris SchneiderHadoop and mysql by Chris Schneider
Hadoop and mysql by Chris Schneider
 

Recently uploaded

My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 

Recently uploaded (20)

My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 

Pivotal HD and Spring for Apache Hadoop

  • 1. A NEW PLATFORM FOR A NEW ERA
  • 2. 2© Copyright 2013 Pivotal. All rights reserved. 2© Copyright 2013 Pivotal. All rights reserved. Hadoop and Pivotal HD April 23, 2013
  • 3. 3© Copyright 2013 Pivotal. All rights reserved. About the speakers Adam Shook – Technical Architect for Pivotal – 2+ years Hadoop experience – Instructor for Hadoop-based courses Mark Pollack – Spring committer since 2003 – Founder of Spring.NET – Lead Spring Data family of projects
  • 4. 4© Copyright 2013 Pivotal. All rights reserved. Agenda What is Hadoop? Pivotal HD HAWQ Spring for Apache Hadoop Questions
  • 5. 5© Copyright 2013 Pivotal. All rights reserved. 5© Copyright 2013 Pivotal. All rights reserved. 5© Copyright 2013 Pivotal. All rights reserved. What is Hadoop?
  • 6. 6© Copyright 2013 Pivotal. All rights reserved. Why Hadoop is Important? Delivers performance and scalability at low cost Handles large amounts of data Stores data in native format Resilient in case of infrastructure failures Transparent application scalability
  • 7. 7© Copyright 2013 Pivotal. All rights reserved. Hadoop Overview Open-source Apache project out of Yahoo! in 2006 Distributed fault-tolerant data storage and batch processing Linear scalability on commodity hardware
  • 8. 8© Copyright 2013 Pivotal. All rights reserved. Hadoop Overview Great at – Reliable storage for huge data sets – Batch queries and analytics – Changing schemas Not so great at – Changes to files (can‟t do it…) – Low-latency responses – Analyst usability
  • 9. 9© Copyright 2013 Pivotal. All rights reserved. HDFS Overview Hierarchical UNIX-like file system for data storage – sort of Splitting of large files into blocks Distribution and replication of blocks to nodes Two key services – Master NameNode – Many DataNodes Secondary/Checkpoint Node
  • 10. 10© Copyright 2013 Pivotal. All rights reserved. How HDFS Works - Writes DataNode A DataNode B DataNode C DataNode D NameNode 1 Client 2 A1 3 A2 A3 A4 Client contacts NameNode to write data NameNode says write it to these nodes Client sequentially writes blocks to DataNode
  • 11. 11© Copyright 2013 Pivotal. All rights reserved. How HDFS Works - Writes DataNode A DataNode B DataNode C DataNode D NameNodeClient A1 A2 A3 A4 A1A1 A2A2 A3A3A4 A4 DataNodes replicate data blocks, orchestrated by the NameNode
  • 12. 12© Copyright 2013 Pivotal. All rights reserved. How HDFS Works - Reads DataNode A DataNode B DataNode C DataNode D NameNodeClient A1 A2 A3 A4 A1A1 A2A2 A3A3A4 A4 1 2 3 Client contacts NameNode to read data NameNode says you can find it here Client sequentially reads blocks from DataNode
  • 13. 13© Copyright 2013 Pivotal. All rights reserved. Hadoop MapReduce 1.x Moves the code to the data JobTracker – Master service to monitor jobs TaskTracker – Multiple services to run tasks – Same physical machine as a DataNode A job contains many tasks A task contains one or more task attempts
  • 14. 14© Copyright 2013 Pivotal. All rights reserved. How MapReduce Works DataNode A A1 A2 A4 A2 A1 A3 A3 A2 A4 A4 A1 A3 JobTracker 1 Client 4 2 B1 B3 B4 B2 B3 B1 B3 B2 B4 B4 B1 B2 3 DataNode B DataNode C DataNode D TaskTracker A TaskTracker B TaskTracker C TaskTracker D Client submits job to JobTracker JobTracker submits tasks to TaskTrackers Job output is written to DataNodes w/replication JobTracker reports metrics
  • 15. 15© Copyright 2013 Pivotal. All rights reserved. MapReduce Paradigm Data processing system with two key phases Map – Perform a map function on key/value pairs Reduce – Perform a reduce function on key/value groups Groups created by sorting map output
  • 16. 16© Copyright 2013 Pivotal. All rights reserved. Reduce Task 0 Reduce Task 1 Map Task 0 Map Task 1 Map Task 2 (0, "hadoop is fun") (52, "I love hadoop") (104, "Pig is more fun") ("hadoop", 1) ("is", 1) ("fun", 1) ("I", 1) ("love", 1) ("hadoop", 1) ("Pig", 1) ("is", 1) ("more", 1) ("fun", 1) ("hadoop", {1,1}) ("is", {1,1}) ("fun", {1,1}) ("love", {1}) ("I", {1}) ("Pig", {1}) ("more", {1}) ("hadoop", 2) ("fun", 2) ("love", 1) ("I", 1) ("is", 2) ("Pig", 1) ("more", 1) SHUFFLE AND SORT Map Input Map Output Reducer Input Groups Reducer Output
  • 17. 17© Copyright 2013 Pivotal. All rights reserved. Word Count Count the number of times each word is used in a body of text Map input is a line of text Reduce output a word and the count map(byte_offset, line) foreach word in line emit(word, 1) reduce(word, counts) sum = 0 foreach count in counts sum += count emit(word, sum)
  • 18. 18© Copyright 2013 Pivotal. All rights reserved. Mapper Code public class WordMapper extends Mapper<LongWritable, Text, Text, IntWritable> { private final static IntWritable ONE = new IntWritable(1); private Text word = new Text(); public void map(LongWritable key, Text value, Context context) { String line = value.toString(); StringTokenizer tokenizer = new StringTokenizer(line); while (tokenizer.hasMoreTokens()) { word.set(tokenizer.nextToken()); context.write(word, ONE); } } }
  • 19. 19© Copyright 2013 Pivotal. All rights reserved. Reducer Code public class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> { public void reduce(Text key, Iterable<IntWritable> values, Context context) { int sum = 0; for (IntWritable val : values) { sum += val.get(); } context.write(key, new IntWritable(sum)); } }
  • 20. 20© Copyright 2013 Pivotal. All rights reserved. 20© Copyright 2013 Pivotal. All rights reserved. 20© Copyright 2013 Pivotal. All rights reserved. Pivotal HD
  • 21. 21© Copyright 2013 Pivotal. All rights reserved. Pivotal HD World‟s first true SQL processing for enterprise-ready Hadoop 100% Apache Hadoop-based platform Virtualization and cloud ready with VMWare and Isilon
  • 22. 22© Copyright 2013 Pivotal. All rights reserved. Pivotal HD Architecture HDFS HBase Pig, Hive, Mah out MapReduce Sqoop Flume Resource Management & Workflow Yarn ZooKeeper Deploy, Configure, Monitor, Manage Command Center Hadoop Virtualization (HVE) Data Loader Pivotal HD Enterprise Apache Pivotal HD Enterprise HAWQ Xtension Framework Catalog Services Query Optimizer Dynamic Pipelining ANSI SQL + Analytics HAWQ – Advanced Database Services Spring
  • 23. 23© Copyright 2013 Pivotal. All rights reserved. 23© Copyright 2013 Pivotal. All rights reserved. 23© Copyright 2013 Pivotal. All rights reserved. HAWQ
  • 24. 24© Copyright 2013 Pivotal. All rights reserved. HAWQ: The Crown Jewel of Greenplum  SQL compliant  World-class query optimizer  Interactive query  Horizontal scalability  Robust data management  Common Hadoop formats  Deep analytics
  • 25. 25© Copyright 2013 Pivotal. All rights reserved. HAWQ Query Processing – Interactive and true ANSI SQL support – Multi-petabyte horizontal scalability – Cost-based parallel query optimizer – Programmable analytics Database Services and Management – Scatter-gather data loading – Row and column storage – Workload management – Multi-level partitioning – 3rd-party tool & open client interfaces
  • 26. 26© Copyright 2013 Pivotal. All rights reserved. 10+ Years MPP Database R&D to Hadoop PRODUCT FEATURES CLIENT ACCESS & TOOLS Multi-Level Fault Tolerance Shared-Nothing MPP Parallel Query Optimizer Polymorphic Data Storage™ CLIENT ACCESS ODBC, JDBC, OLEDB, MapReduce, etc. MPP ARCHITECTURE Parallel Dataflow Engine Software Interconnect Scatter/Gather Streaming™ Data Loading Online System Expansion Workload Management ADAPTIVE SERVICES LOADING & EXT. ACCESS Petabyte-Scale Loading Trickle Micro-Batching Anywhere Data Access STORAGE & DATA ACCESS Hybrid Storage & Execution (Row- & Column-Oriented) In-Database Compression Multi-Level Partitioning LANGUAGE SUPPORT Comprehensive SQL SQL 92, 99, 2003 OLAP Extensions Analytics Extensions 3rd PARTY TOOLS BI Tools, ETL Tools Data Mining, etc ADMIN TOOLS Command Center Package Manager
  • 27. 27© Copyright 2013 Pivotal. All rights reserved. Query Optimizer Physical plan contains scans, joins, sorts, aggregations, etc. Cost-based optimization looks for the most efficient plan Global planning avoids sub- optimal “SQL pushing” to segments Directly inserts “motion” nodes for inter-segment communication Execution Plan ScanBars b HashJoinb.name =s.bar ScanSells s Filterb.city ='SanFrancisco' Projects.beer, s.price MotionGather MotionRedist(b.name)
  • 28. 28© Copyright 2013 Pivotal. All rights reserved. Dynamic PipeliningTM A supercomputing-based “soft-switch” Core execution technology, borrowed from GPDB, allows us to run complex job without materializing intermediate results. Efficiently pumping streams of data between motion nodes during query-plan execution Delivers messages, moves data, collects results, and coordinates work among the segments in the system Dynamic PipeliningTM
  • 29. 29© Copyright 2013 Pivotal. All rights reserved. Xtension Framework Enables Intelligent query integration with filter pushdown to HBase, Hive, and HDFS Supports common data formats such Avro, Protocol Buffers and Sequence Files Provides extensible framework for connectivity to other data sourcesHDFS HBase Hive Xtension Framework
  • 30. 30© Copyright 2013 Pivotal. All rights reserved. HAWQ Deployment Dynamic Pipelining ... ... ...... Master Servers & Name Nodes Query planning & dispatch Segment Servers & Data Nodes Query processing & data storage External Sources Loading, streami ng, etc. HDFS ODBC/JDBC Driver
  • 31. 31© Copyright 2013 Pivotal. All rights reserved. How HAWQ Works
  • 32. 32© Copyright 2013 Pivotal. All rights reserved. How HAWQ Works
  • 33. 33© Copyright 2013 Pivotal. All rights reserved. How HAWQ Works
  • 34. 34© Copyright 2013 Pivotal. All rights reserved. How HAWQ Works
  • 35. 35© Copyright 2013 Pivotal. All rights reserved. How HAWQ Works
  • 36. 36© Copyright 2013 Pivotal. All rights reserved. How HAWQ Works
  • 37. 37© Copyright 2013 Pivotal. All rights reserved. 37© Copyright 2013 Pivotal. All rights reserved. 37© Copyright 2013 Pivotal. All rights reserved. Spring for Apache Hadoop Simplify developing Hadoop Applications
  • 38. 38© Copyright 2013 Pivotal. All rights reserved. Developer observations on Hadoop Hadoop has a poor out of the box programming model Non trivial applications often become a collection of scripts calling Hadoop command line applications Spring aims to simplify developer Hadoop applications – Leverage several Spring eco-system projects
  • 39. 39© Copyright 2013 Pivotal. All rights reserved. Spring For Apache Hadoop - Features Consistent programming and declarative configuration model – Create, configure, and parameterize Hadoop connectivity and all job types – Environment profiles – easily move application from dev to qa to production Developer productivity – Create well-formed applications, not spaghetti script applications – Simplify HDFS access and FsShell API with support for JVM scripting – Runner classes for MR/Pig/Hive/Cascading for small workflows – Helper “Template” classes for Pig/Hive/HBase
  • 40. 40© Copyright 2013 Pivotal. All rights reserved. Spring For Apache Hadoop – Use Cases Apply across a wide range of use cases – Ingest: Events/JDBC/NoSQL/Files to HDFS – Orchestrate: Hadoop Jobs – Export: HDFS to JDBC/NoSQL Spring Integration and Spring Batch make this possible
  • 41. 41© Copyright 2013 Pivotal. All rights reserved. • Standard Hadoop APIs Counting Words – Configuring M/R Configuration conf = new Configuration(); Job job = new Job(conf, "wordcount"); Job.setJarByClass(WordCountMapper.class); job.setMapperClass(WordCountMapper.class); job.setReducerClass(IntSumReducer.class); job.setOutputKeyClass(Text.class); job.setOutputValueClass(IntWritable.class); FileInputFormat.addInputPath(job, new Path(args[0])); FileOutputFormat.setOutputPath(job, new Path(args[1])); job.waitForCompletion(true);
  • 42. 42© Copyright 2013 Pivotal. All rights reserved. Configuring Hadoop with Spring <context:property-placeholder location="hadoop-dev.properties"/> <hdp:configuration> fs.default.name=${hd.fs} </hdp:configuration> <hdp:job id="word-count-job" input-path="${input.path}" output-path="${output.path}" jar="hadoop-examples.jar" mapper="examples.WordCount.WordMapper" reducer="examples.WordCount.IntSumReducer"/> <hdp:job-runner id="runner" job-ref="word-count-job" run-at-startup="true"/> input.path=/wc/input/ output.path=/wc/word/ hd.fs=hdfs://localhost:9000 applicationContext.xml hadoop-dev.properties Automatically determines output key and value classes
  • 43. 43© Copyright 2013 Pivotal. All rights reserved. Injecting Jobs Use DI to obtain reference to Hadoop Job – Perform additional runtime configuration and submit public class WordService { @Autowired private Job mapReduceJob; public void processWords() { mapReduceJob.submit(); } }
  • 44. 44© Copyright 2013 Pivotal. All rights reserved. Streaming Jobs and Environment Configuration bin/hadoop jar hadoop-streaming.jar -input /wc/input -output /wc/output -mapper /bin/cat -reducer /bin/wc -files stopwords.txt <context:property-placeholder location="hadoop-${env}.properties"/> <hdp:streaming id="wc" input-path="${input}" output-path="${output}" mapper="${cat}" reducer="${wc}" files="classpath:stopwords.txt"> </hdp:streaming> env=dev java -jar SpringLauncher.jar applicationContext.xml input.path=/wc/input/ output.path=/wc/word/ hd.fs=hdfs://localhost:9000 hadoop-dev.properties
  • 45. 45© Copyright 2013 Pivotal. All rights reserved. Streaming Jobs and Environment Configuration bin/hadoop jar hadoop-streaming.jar -input /wc/input -output /wc/output -mapper /bin/cat -reducer /bin/wc -files stopwords.txt <context:property-placeholder location="hadoop-${env}.properties"/> <hdp:streaming id="wc" input-path="${input}" output-path="${output}" mapper="${cat}" reducer="${wc}" files="classpath:stopwords.txt"> </hdp:streaming> env=qa java -jar SpringLauncher.jar applicationContext.xml input.path=/gutenberg/input/ output.path=/gutenberg/word/ hd.fs=hdfs://darwin:9000 hadoop-qa.properties
  • 46. 46© Copyright 2013 Pivotal. All rights reserved. HDFS and Hadoop Shell as APIs • Access all "bin/hadoop fs" commands through Spring's FsShell helper class – mkdir, chmod, test class MyScript { @Autowired FsShell fsh; @PostConstruct void init() { String outputDir = "/data/output"; if (fsh.test(outputDir)) { fsh.rmr(outputDir); } } }
  • 47. 47© Copyright 2013 Pivotal. All rights reserved. HDFS and Hadoop Shell as APIs FsShell is designed to support JVM scripting languages // use the shell (made available under variable fsh) if (!fsh.test(inputDir)) { fsh.mkdir(inputDir); fsh.copyFromLocal(sourceFile, inputDir); fsh.chmod(700, inputDir) } if (fsh.test(outputDir)) { fsh.rmr(outputDir) } copy-files.groovy
  • 48. 48© Copyright 2013 Pivotal. All rights reserved. HDFS and Hadoop Shell as APIs Reference the script and supply variables in the application configuration <script id="setupScript" location="copy-files.groovy"> <property name="inputDir" value="${wordcount.input.path}"/> <property name="outputDir" value="${wordcount.output.path}"/> <property name="sourceFile" value="${localSourceFile}"/> </script> appCtx.xml
  • 49. 49© Copyright 2013 Pivotal. All rights reserved. Small workflows Often need the following steps – Execute HDFS operations before job – Run MapReduce Job – Execute HDFS operations after job completes Spring's JobRunner helper class sequences these steps – Can reference multiple scripts with comma-delimited names <hdp:job-runner id="runner" run-at-startup="true" pre-action="setupScript" job="wordcountJob" post-action="tearDownScript"/>
  • 50. 50© Copyright 2013 Pivotal. All rights reserved. Runner classes Similar runner classes available for Hive and Pig Implement the JDK Callable interface Easy to schedule for simple needs using Spring Can later 'graduate' to use Spring Batch for more complex workflows – Start simple and grow, reusing existing configuration <hdp:job-runner id="runner" run-at-startup="false" pre-action="setupScript" job="wordcountJob" post-action="tearDownScript"/> <task:scheduled-tasks> <task:scheduled ref="runner" method="call" cron="3/30 * * * * ?"/> </task:scheduled-tasks>
  • 51. 51© Copyright 2013 Pivotal. All rights reserved. Spring's PigRunner Execute a small Pig workflow <pig-factory job-name="analysis" properties-location="pig-server.properties"/> <script id="hdfsScript" location="copy-files.groovy"> <property name="sourceFile" value="${localSourceFile}"/> <property name="inputDir" value="${inputDir}"/> <property name="outputDir" value="${outputDir}"/> </script> <pig-runner id="pigRunner" pre-action="hdfsScript" run-at-startup="true"> <script location="wordCount.pig"> <arguments> inputDir=${inputDir} outputDir=${outputDir} </arguments> </script> </pig-runner>
  • 52. 52© Copyright 2013 Pivotal. All rights reserved. PigTemplate – Configuration Helper class that simplifies the programmatic use of Pig – Common tasks are one-liners Similar template helper classes for Hive and HBase <pig-factory id="pigFactory" properties-location="pig-server.properties"/> <pig-template pig-factory-ref="pigFactory"/>
  • 53. 53© Copyright 2013 Pivotal. All rights reserved. PigTemplate – Programmatic Use public class PigPasswordRepository implements PasswordRepository { @Autowired private PigTemplate pigTemplate; @Autowired private String outputDir; private String pigScript = "classpath:password-analysis.pig"; public void processPasswordFile(String inputFile) { Properties scriptParameters = new Properties(); scriptParameters.put("inputDir", inputFile); scriptParameters.put("outputDir", outputDir); pigTemplate.executeScript(pigScript, scriptParameters); } }
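A caller can then treat Pig analysis as an ordinary service method. A minimal usage sketch (PasswordAnalysisMain, the context file name, and the input path are hypothetical; PigPasswordRepository and its beans are assumed to be wired as in the slides above):

import org.springframework.context.ApplicationContext;
import org.springframework.context.support.ClassPathXmlApplicationContext;

public class PasswordAnalysisMain {
    public static void main(String[] args) {
        // Bootstraps the Spring container that defines pigTemplate and the repository
        ApplicationContext ctx =
                new ClassPathXmlApplicationContext("applicationContext.xml");
        PigPasswordRepository repo = ctx.getBean(PigPasswordRepository.class);
        // Runs password-analysis.pig against the given input file
        repo.processPasswordFile("/data/passwords/input.txt");
    }
}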
  • 54. 54© Copyright 2013 Pivotal. All rights reserved. Big Data problems are also integration problems [Pipeline diagram: Collect → Transform → RT Analysis → Ingest → Batch Analysis → Distribute → Use; built with Spring Integration & Data, Spring Hadoop + Batch, and Spring MVC; example sources and sinks: Twitter Search & Gardenhose, Redis, Gemfire (CQ)]
  • 55. 55© Copyright 2013 Pivotal. All rights reserved. Spring Integration  Implementation of Enterprise Integration Patterns – Mature, since 2007 – Apache 2.0 License  Separates integration concerns from processing logic – Framework handles message reception and method invocation • e.g. Polling vs. Event-driven – Endpoints written as POJOs • Increases testability (a POJO endpoint sketch follows the next slide)
  • 56. 56© Copyright 2013 Pivotal. All rights reserved. Pipes and Filters Architecture Endpoints are connected through Channels and exchange Messages $> cat foo.txt | grep the | while read l; do echo $l ; done [Diagram: Producer → Channel → Endpoint → Channel → Endpoint → Consumer, with File, JMS, TCP, and routing adapters]
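To make the POJO-endpoint point concrete, here is a minimal sketch of an annotated endpoint (class and channel names are hypothetical; XML wiring is an equivalent alternative):

import org.springframework.integration.annotation.ServiceActivator;

// The framework receives a Message from "inputChannel", invokes this method
// with its payload, and sends the return value on to "outputChannel" -
// the processing logic itself is plain, unit-testable Java.
public class ToUpperCaseEndpoint {

    @ServiceActivator(inputChannel = "inputChannel", outputChannel = "outputChannel")
    public String handle(String payload) {
        return payload.toUpperCase();
    }
}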
  • 57. 57© Copyright 2013 Pivotal. All rights reserved. Spring Batch Framework for batch processing – Basis for JSR-352 Born out of collaboration with Accenture in 2007 Features – parsers, mappers, readers, writers – automatic retries after failure – periodic commits – synchronous and asynchronous processing – parallel processing – partial processing (skipping records) – non-sequential processing – job tracking and restart
  • 58. 58© Copyright 2013 Pivotal. All rights reserved. Spring Integration and Batch for Hadoop Ingest/Export Event Streams – Spring Integration – Examples ▪ Consume syslog events, transform and write to HDFS ▪ Consume Twitter search results and write to HDFS Batch – Spring Batch – Examples ▪ Read log files on local file system, transform and write to HDFS ▪ Read from HDFS, transform and write to JDBC, HBase, MongoDB, …
  • 59. 59© Copyright 2013 Pivotal. All rights reserved. Spring Data, Integration, & Batch for Analytics Realtime Analytics – Spring Integration & Data – Examples – Service Activator that ▪ Increments counters in Redis or MongoDB using Spring Data helper libraries ▪ Creates Gemfire Continuous Queries using Spring Gemfire Batch Analytics – Spring Batch – Orchestrate Hadoop-based workflows with Spring Batch – Also orchestrate non-Hadoop-based workflows
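As an illustration of the counter-incrementing pattern, a sketch of a service activator backed by Spring Data Redis (a hypothetical endpoint, not the deck's code; assumes a StringRedisTemplate bean and messages carrying hashtag strings):

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.integration.annotation.ServiceActivator;

// Counts hashtag occurrences in Redis as tweets flow through the pipeline
public class HashtagCounter {

    @Autowired
    private StringRedisTemplate redisTemplate;

    @ServiceActivator(inputChannel = "hashtagChannel")
    public void count(String hashtag) {
        // Redis increments are atomic, so concurrent consumers stay correct
        redisTemplate.opsForValue().increment("hashtags:" + hashtag, 1);
    }
}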
  • 60. 60© Copyright 2013 Pivotal. All rights reserved. Ingesting – Syslog into HDFS Use SI's syslog adapter Perform transformations on the data Route to specific channels based on category One route leads to HDFS, with filtered data stored in Redis
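The HDFS leg of such a route can be a simple endpoint that appends each event to a file through Hadoop's FileSystem API. A minimal sketch (class, channel, path, and NameNode URI are assumptions; real ingest would batch events and roll files, which is the kind of plumbing Spring XD later packages up):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.springframework.integration.annotation.ServiceActivator;

// Writes each syslog event as one line in an HDFS file
public class HdfsSyslogWriter {

    private final FSDataOutputStream out;

    public HdfsSyslogWriter(String pathInHdfs) throws IOException {
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://localhost:9000"); // hypothetical NameNode
        FileSystem fs = FileSystem.get(conf);
        out = fs.create(new Path(pathInHdfs)); // e.g. /syslog/events.log
    }

    @ServiceActivator(inputChannel = "hdfsChannel")
    public void write(String event) throws IOException {
        out.writeBytes(event + "\n");
        out.hflush(); // flush so the data becomes visible to readers
    }
}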
  • 61. 61© Copyright 2013 Pivotal. All rights reserved. Ingesting – Multi-node syslog into HDFS Syslog collection across multiple machines Break the processing chain at channel boundaries Use SI's TCP adapters to forward events – Or other SI middleware adapters
  • 62. 62© Copyright 2013 Pivotal. All rights reserved. Hadoop Analytical workflow managed by Spring Batch  Reuse same Batch infrastructure and knowledge to manage Hadoop workflows  Step can be any Hadoop job type or HDFS script
  • 63. 63© Copyright 2013 Pivotal. All rights reserved. Spring Batch Configuration for Hadoop <job id="job1"> <step id="import" next="wordcount"> <tasklet ref="import-tasklet"/> </step> <step id="wordcount" next="pig"> <tasklet ref="wordcount-tasklet"/> </step> <step id="pig" next="parallel"> <tasklet ref="pig-tasklet"/> </step> <split id="parallel" next="hdfs"> <flow><step id="mrStep"> <tasklet ref="mr-tasklet"/> </step></flow> <flow><step id="hive"> <tasklet ref="hive-tasklet"/> </step></flow> </split> <step id="hdfs"> <tasklet ref="hdfs-tasklet"/> </step> </job>
  • 64. 64© Copyright 2013 Pivotal. All rights reserved. Exporting HDFS to JDBC • Use Spring Batch's – MultiResourceItemReader / FlatFileItemReader – JdbcBatchItemWriter <step id="step1"> <tasklet> <chunk reader="flatFileItemReader" processor="itemProcessor" writer="jdbcItemWriter" commit-interval="100" retry-limit="3"/> </tasklet> </step>
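The jdbcItemWriter bean referenced in the chunk above could be built programmatically along these lines (a sketch with an assumed target table and value object; the equivalent XML bean definition works just as well):

import javax.sql.DataSource;
import org.springframework.batch.item.database.BeanPropertyItemSqlParameterSourceProvider;
import org.springframework.batch.item.database.JdbcBatchItemWriter;

// Builds a writer that batch-inserts exported word counts into an RDBMS;
// the "words" table and Word bean are illustrative names.
public class ExportConfig {

    public JdbcBatchItemWriter<Word> jdbcItemWriter(DataSource dataSource) {
        JdbcBatchItemWriter<Word> writer = new JdbcBatchItemWriter<Word>();
        writer.setDataSource(dataSource);
        // Named parameters bind to Word's getters via the provider below
        writer.setSql("INSERT INTO words (word, count) VALUES (:word, :count)");
        writer.setItemSqlParameterSourceProvider(
                new BeanPropertyItemSqlParameterSourceProvider<Word>());
        return writer; // Spring calls afterPropertiesSet() when this is a bean
    }

    // Simple value object whose properties match the SQL's named parameters
    public static class Word {
        private String word;
        private long count;
        public String getWord() { return word; }
        public void setWord(String word) { this.word = word; }
        public long getCount() { return count; }
        public void setCount(long count) { this.count = count; }
    }
}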
  • 65. 65© Copyright 2013 Pivotal. All rights reserved. Relationship between Spring Projects
  • 66. 66© Copyright 2013 Pivotal. All rights reserved. Next Steps – Spring XD New open-source umbrella project to support common big data use cases – High-throughput distributed data ingestion into HDFS ▪ From a variety of input sources – Real-time analytics at ingestion time ▪ Gathering metrics, counting values, Gemfire CQ… – On- and off-Hadoop workflow orchestration – High-throughput data export ▪ From HDFS to an RDBMS or NoSQL database XD = eXtreme Data or y = mx + b
  • 67. 67© Copyright 2013 Pivotal. All rights reserved. Next Steps – Spring XD Consistent model that spans the four use-case categories Move beyond delivering a set of libraries – Provide an out-of-the-box executable server – High-level DSL to configure flows and jobs ▪ http | hdfs – Pluggable module system See blog post for more information – Github: http://github.com/springsource/spring-xd Get involved!
  • 68. 68© Copyright 2013 Pivotal. All rights reserved. Resources Pivotal – goPivotal.com Spring Data – http://www.springsource.org/spring-data – http://www.springsource.org/spring-hadoop Spring Data Book - http://bit.ly/sd-book – Part III on Big Data Example Code https://github.com/SpringSource/spring-data-book Spring XD http://github.com/springsource/spring-xd
  • 69. A NEW PLATFORM FOR A NEW ERA

Editor's Notes

  1. Client contacts the NameNode with a request to write some data. The NameNode responds and says okay, write it to these DataNodes. The client connects to each DataNode and writes out four blocks, one per node.
  2. After the file is closed, the DataNodes traffic data around to replicate each block in triplicate, all orchestrated by the NameNode. In the event of a node failure, data can be accessed on other nodes, and the NameNode will move data blocks to other nodes.
  3. Client contacts the NameNode with a request to read some data. The NameNode responds with the locations of the blocks. The client connects to each DataNode and reads the blocks sequentially.
  4. Uses key/value pairs as input and output to both phases. A highly parallelizable paradigm – a very easy choice for data processing on a Hadoop cluster.
  5. Advanced Database Services (HAWQ) – high-performance, "True SQL" query interface running within the Hadoop cluster. Xtensions Framework – support for ADS interfaces on external data providers (HBase, Avro, etc.). Advanced Analytics Functions (MADlib) – ability to access parallelized machine-learning and data-mining functions at scale. Unified Storage Services (USS) and Unified Catalog Services (UCS) – support for tiered storage (hot, warm, cold) and integration of multiple data provider catalogs into a single interface.
  6. HDFS: Delimited Text, Sequence File, GPDB Writable Format, Protocol Buffer, Avro. HBase: Predicate Pushdown. Hive: RCFile, Text File, Sequence File.