Pivotal HD and Spring for Apache Hadoop

In this webinar we introduce the concepts of Hadoop and dive into some details unique to the Pivotal HD distribution, namely HAWQ, which brings ANSI-compliant SQL to Hadoop.

We also introduce the Spring for Apache Hadoop project, which simplifies developing Hadoop applications by providing a unified configuration model and easy-to-use APIs for HDFS, MapReduce, Pig, Hive, and HBase. It also provides integration with other Spring ecosystem projects such as Spring Integration and Spring Batch, enabling you to develop solutions for big data ingest/export and Hadoop workflow orchestration. The new Spring XD umbrella project is also introduced.

Speaker notes:
  • The client contacts the NameNode with a request to write some data. The NameNode responds with a set of DataNodes to write to. The client connects to each DataNode and writes out four blocks, one per node.
  • After the file is closed, the DataNodes traffic data around to replicate each block to a triplicate, all orchestrated by the NameNode. In the event of a node failure, data can be accessed on other nodes, and the NameNode will move data blocks to other nodes.
  • The client contacts the NameNode with a request to read some data. The NameNode responds with the block locations. The client connects to each DataNode and reads the blocks sequentially.
  • Uses key/value pairs as input and output to both phases. A highly parallelizable paradigm – a very easy choice for data processing on a Hadoop cluster.
  • Advanced Database Services (HAWQ) – high-performance, "True SQL" query interface running within the Hadoop cluster. Xtensions Framework – support for ADS interfaces on external data providers (HBase, Avro, etc.). Advanced Analytics Functions (MADlib) – ability to access parallelized machine-learning and data-mining functions at scale. Unified Storage Services (USS) and Unified Catalog Services (UCS) – support for tiered storage (hot, warm, cold) and integration of multiple data provider catalogs into a single interface.
  • Supported formats – HDFS: delimited text, SequenceFile, GPDB Writable format, Protocol Buffers, Avro. HBase: predicate pushdown. Hive: RCFile, text file, SequenceFile.
Transcript – Pivotal HD and Spring for Apache Hadoop

    1. A NEW PLATFORM FOR A NEW ERA
    2. Hadoop and Pivotal HD – April 23, 2013
    3. About the speakers
       Adam Shook – Technical Architect for Pivotal; 2+ years of Hadoop experience; instructor for Hadoop-based courses
       Mark Pollack – Spring committer since 2003; founder of Spring.NET; lead of the Spring Data family of projects
    4. Agenda
       What is Hadoop?
       Pivotal HD
       HAWQ
       Spring for Apache Hadoop
       Questions
    5. What is Hadoop?
    6. Why Is Hadoop Important?
       Delivers performance and scalability at low cost
       Handles large amounts of data
       Stores data in native format
       Resilient in case of infrastructure failures
       Transparent application scalability
    7. Hadoop Overview
       Open-source Apache project out of Yahoo! in 2006
       Distributed, fault-tolerant data storage and batch processing
       Linear scalability on commodity hardware
    8. Hadoop Overview
       Great at
       – Reliable storage for huge data sets
       – Batch queries and analytics
       – Changing schemas
       Not so great at
       – Changes to files (can't do it...)
       – Low-latency responses
       – Analyst usability
    9. HDFS Overview
       Hierarchical UNIX-like file system for data storage – sort of
       Splitting of large files into blocks
       Distribution and replication of blocks to nodes
       Two key services
       – Master NameNode
       – Many DataNodes
       Secondary/Checkpoint Node
    10. How HDFS Works – Writes
       (diagram: Client, NameNode, DataNodes A–D, blocks A1–A4)
       1. Client contacts NameNode to write data
       2. NameNode says write it to these nodes
       3. Client sequentially writes blocks to the DataNodes
    11. How HDFS Works – Writes
       (diagram: blocks A1–A4 replicated across DataNodes A–D)
       DataNodes replicate data blocks, orchestrated by the NameNode
    12. How HDFS Works – Reads
       (diagram: Client, NameNode, DataNodes A–D with replicated blocks)
       1. Client contacts NameNode to read data
       2. NameNode says you can find it here
       3. Client sequentially reads blocks from the DataNodes
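       From the client's point of view, both the write and read paths are hidden behind Hadoop's Java FileSystem API. A minimal sketch (not part of the slides; the path and file contents are illustrative):

           import java.io.BufferedReader;
           import java.io.InputStreamReader;
           import org.apache.hadoop.conf.Configuration;
           import org.apache.hadoop.fs.FSDataOutputStream;
           import org.apache.hadoop.fs.FileSystem;
           import org.apache.hadoop.fs.Path;

           public class HdfsReadWrite {
               public static void main(String[] args) throws Exception {
                   // fs.default.name in the Configuration points at the NameNode
                   Configuration conf = new Configuration();
                   FileSystem fs = FileSystem.get(conf);

                   // Write: the NameNode picks target DataNodes; the client
                   // streams blocks to them through this output stream
                   Path file = new Path("/demo/hello.txt");
                   FSDataOutputStream out = fs.create(file);
                   out.writeBytes("hello hdfs\n");
                   out.close();

                   // Read: the NameNode returns block locations; the client
                   // reads the blocks directly from the DataNodes
                   BufferedReader in = new BufferedReader(
                           new InputStreamReader(fs.open(file)));
                   System.out.println(in.readLine());
                   in.close();
               }
           }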
    13. Hadoop MapReduce 1.x
       Moves the code to the data
       JobTracker – master service to monitor jobs
       TaskTracker – multiple services to run tasks; same physical machine as a DataNode
       A job contains many tasks; a task contains one or more task attempts
    14. How MapReduce Works
       (diagram: Client, JobTracker, TaskTrackers A–D co-located with DataNodes A–D)
       1. Client submits job to JobTracker
       2. JobTracker submits tasks to TaskTrackers
       3. Job output is written to DataNodes with replication
       4. JobTracker reports metrics
    15. MapReduce Paradigm
       Data processing system with two key phases
       Map – perform a map function on key/value pairs
       Reduce – perform a reduce function on key/value groups
       Groups created by sorting map output
    16. Word Count dataflow (Map Tasks 0–2, Reduce Tasks 0–1)
       Map input: (0, "hadoop is fun") | (52, "I love hadoop") | (104, "Pig is more fun")
       Map output: ("hadoop", 1) ("is", 1) ("fun", 1) | ("I", 1) ("love", 1) ("hadoop", 1) | ("Pig", 1) ("is", 1) ("more", 1) ("fun", 1)
       – SHUFFLE AND SORT –
       Reducer input groups: ("hadoop", {1,1}) ("is", {1,1}) ("fun", {1,1}) ("love", {1}) ("I", {1}) ("Pig", {1}) ("more", {1})
       Reducer output: ("hadoop", 2) ("fun", 2) ("love", 1) ("I", 1) ("is", 2) ("Pig", 1) ("more", 1)
    17. Word Count
       Count the number of times each word is used in a body of text
       Map input is a line of text; reduce output is a word and the count
       map(byte_offset, line)
           foreach word in line
               emit(word, 1)
       reduce(word, counts)
           sum = 0
           foreach count in counts
               sum += count
           emit(word, sum)
    18. Mapper Code
       public class WordMapper
               extends Mapper<LongWritable, Text, Text, IntWritable> {

           private final static IntWritable ONE = new IntWritable(1);
           private Text word = new Text();

           public void map(LongWritable key, Text value, Context context)
                   throws IOException, InterruptedException {
               String line = value.toString();
               StringTokenizer tokenizer = new StringTokenizer(line);
               while (tokenizer.hasMoreTokens()) {
                   word.set(tokenizer.nextToken());
                   context.write(word, ONE);
               }
           }
       }
    19. Reducer Code
       public class IntSumReducer
               extends Reducer<Text, IntWritable, Text, IntWritable> {

           public void reduce(Text key, Iterable<IntWritable> values,
                   Context context) throws IOException, InterruptedException {
               int sum = 0;
               for (IntWritable val : values) {
                   sum += val.get();
               }
               context.write(key, new IntWritable(sum));
           }
       }
    20. Pivotal HD
    21. Pivotal HD
       World's first true SQL processing for enterprise-ready Hadoop
       100% Apache Hadoop-based platform
       Virtualization and cloud ready with VMware and Isilon
    22. Pivotal HD Architecture
       (architecture diagram: Pivotal HD Enterprise)
       Apache Hadoop components: HDFS, HBase, Pig, Hive, Mahout, MapReduce, Sqoop, Flume, YARN resource management & workflow, ZooKeeper
       Pivotal HD Enterprise additions: Command Center (deploy, configure, monitor, manage), Hadoop Virtualization (HVE), Data Loader
       HAWQ – Advanced Database Services: ANSI SQL + analytics, query optimizer, Dynamic Pipelining, Xtension Framework, Catalog Services
       Spring
    23. HAWQ
    24. HAWQ: The Crown Jewel of Greenplum
       SQL compliant
       World-class query optimizer
       Interactive query
       Horizontal scalability
       Robust data management
       Common Hadoop formats
       Deep analytics
    25. HAWQ
       Query Processing
       – Interactive and true ANSI SQL support
       – Multi-petabyte horizontal scalability
       – Cost-based parallel query optimizer
       – Programmable analytics
       Database Services and Management
       – Scatter-gather data loading
       – Row and column storage
       – Workload management
       – Multi-level partitioning
       – 3rd-party tool & open client interfaces
    26. 10+ Years of MPP Database R&D Brought to Hadoop
       Product features: multi-level fault tolerance, shared-nothing MPP, parallel query optimizer, Polymorphic Data Storage™
       MPP architecture: parallel dataflow engine, software interconnect, Scatter/Gather Streaming™ data loading, online system expansion, workload management
       Adaptive services – loading & external access: petabyte-scale loading, trickle micro-batching, anywhere data access
       Storage & data access: hybrid storage & execution (row- and column-oriented), in-database compression, multi-level partitioning
       Language support: comprehensive SQL (SQL-92, 99, 2003), OLAP extensions, analytics extensions
       Client access & tools: ODBC, JDBC, OLE DB, MapReduce, etc.; 3rd-party BI, ETL, and data-mining tools; admin tools – Command Center, Package Manager
    27. Query Optimizer
       Physical plan contains scans, joins, sorts, aggregations, etc.
       Cost-based optimization looks for the most efficient plan
       Global planning avoids sub-optimal "SQL pushing" to segments
       Directly inserts "motion" nodes for inter-segment communication
       (example execution plan: Scan Bars b with Filter b.city = 'San Francisco', HashJoin b.name = s.bar with Scan Sells s via Redistribute Motion(b.name), Project s.beer, s.price, Gather Motion)
    28. Dynamic Pipelining™
       A supercomputing-based "soft-switch"
       Core execution technology, borrowed from GPDB, that allows us to run complex jobs without materializing intermediate results
       Efficiently pumps streams of data between motion nodes during query-plan execution
       Delivers messages, moves data, collects results, and coordinates work among the segments in the system
    29. Xtension Framework
       Enables intelligent query integration with filter pushdown to HBase, Hive, and HDFS
       Supports common data formats such as Avro, Protocol Buffers, and SequenceFiles
       Provides an extensible framework for connectivity to other data sources
    30. HAWQ Deployment
       (deployment diagram)
       Master servers & NameNodes – query planning & dispatch
       Segment servers & DataNodes – query processing & data storage, connected by Dynamic Pipelining over HDFS
       External sources – loading, streaming, etc.
       Client access via ODBC/JDBC driver
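       As a hedged illustration of that client path: HAWQ descends from Greenplum, which speaks the PostgreSQL wire protocol, so a JDBC client could plausibly look like the sketch below (hostname, port, database, credentials, and the query are made-up placeholders, not from the slides):

           import java.sql.Connection;
           import java.sql.DriverManager;
           import java.sql.ResultSet;
           import java.sql.Statement;

           public class HawqQuery {
               public static void main(String[] args) throws Exception {
                   // Assumed: HAWQ master reachable as hawq-master:5432
                   Connection conn = DriverManager.getConnection(
                           "jdbc:postgresql://hawq-master:5432/demo", "gpadmin", "secret");
                   Statement stmt = conn.createStatement();
                   // The master plans the query; segments execute it in parallel
                   ResultSet rs = stmt.executeQuery(
                           "SELECT word, count(*) FROM words GROUP BY word");
                   while (rs.next()) {
                       System.out.println(rs.getString(1) + " -> " + rs.getLong(2));
                   }
                   conn.close();
               }
           }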
    31.–36. How HAWQ Works (diagram sequence across six slides)
    37. Spring for Apache Hadoop
       Simplify developing Hadoop applications
    38. Developer observations on Hadoop
       Hadoop has a poor out-of-the-box programming model
       Non-trivial applications often become a collection of scripts calling Hadoop command-line applications
       Spring aims to simplify developing Hadoop applications by leveraging several Spring ecosystem projects
    39. Spring for Apache Hadoop – Features
       Consistent programming and declarative configuration model
       – Create, configure, and parameterize Hadoop connectivity and all job types
       – Environment profiles – easily move applications from dev to QA to production
       Developer productivity
       – Create well-formed applications, not spaghetti-script applications
       – Simplify HDFS access and the FsShell API with support for JVM scripting
       – Runner classes for MR/Pig/Hive/Cascading for small workflows
       – Helper "Template" classes for Pig/Hive/HBase
    40. Spring for Apache Hadoop – Use Cases
       Apply across a wide range of use cases
       – Ingest: events/JDBC/NoSQL/files to HDFS
       – Orchestrate: Hadoop jobs
       – Export: HDFS to JDBC/NoSQL
       Spring Integration and Spring Batch make this possible
    41. Counting Words – Configuring M/R with the standard Hadoop APIs
       Configuration conf = new Configuration();
       Job job = new Job(conf, "wordcount");
       job.setJarByClass(WordCountMapper.class);
       job.setMapperClass(WordCountMapper.class);
       job.setReducerClass(IntSumReducer.class);
       job.setOutputKeyClass(Text.class);
       job.setOutputValueClass(IntWritable.class);
       FileInputFormat.addInputPath(job, new Path(args[0]));
       FileOutputFormat.setOutputPath(job, new Path(args[1]));
       job.waitForCompletion(true);
    42. Configuring Hadoop with Spring
       applicationContext.xml:
       <context:property-placeholder location="hadoop-dev.properties"/>
       <hdp:configuration>
           fs.default.name=${hd.fs}
       </hdp:configuration>
       <hdp:job id="word-count-job"
           input-path="${input.path}"
           output-path="${output.path}"
           jar="hadoop-examples.jar"
           mapper="examples.WordCount.WordMapper"
           reducer="examples.WordCount.IntSumReducer"/>
       <hdp:job-runner id="runner" job-ref="word-count-job"
           run-at-startup="true"/>
       hadoop-dev.properties:
       input.path=/wc/input/
       output.path=/wc/word/
       hd.fs=hdfs://localhost:9000
       The output key and value classes are determined automatically.
    43. Injecting Jobs
       Use DI to obtain a reference to a Hadoop Job
       – Perform additional runtime configuration and submit
       public class WordService {
           @Autowired
           private Job mapReduceJob;

           public void processWords() {
               mapReduceJob.submit();
           }
       }
    44. Streaming Jobs and Environment Configuration
       Command-line equivalent:
       bin/hadoop jar hadoop-streaming.jar -input /wc/input -output /wc/output
           -mapper /bin/cat -reducer /bin/wc -files stopwords.txt
       applicationContext.xml:
       <context:property-placeholder location="hadoop-${env}.properties"/>
       <hdp:streaming id="wc" input-path="${input}" output-path="${output}"
           mapper="${cat}" reducer="${wc}"
           files="classpath:stopwords.txt"/>
       Launch with env=dev: java -jar SpringLauncher.jar applicationContext.xml
       hadoop-dev.properties:
       input.path=/wc/input/
       output.path=/wc/word/
       hd.fs=hdfs://localhost:9000
    45. Streaming Jobs and Environment Configuration
       Same applicationContext.xml, launched with env=qa:
       java -jar SpringLauncher.jar applicationContext.xml
       hadoop-qa.properties:
       input.path=/gutenberg/input/
       output.path=/gutenberg/word/
       hd.fs=hdfs://darwin:9000
    46. HDFS and Hadoop Shell as APIs
       Access all "bin/hadoop fs" commands through Spring's FsShell helper class
       – mkdir, chmod, test, ...
       class MyScript {
           @Autowired FsShell fsh;

           @PostConstruct void init() {
               String outputDir = "/data/output";
               if (fsh.test(outputDir)) {
                   fsh.rmr(outputDir);
               }
           }
       }
    47. HDFS and Hadoop Shell as APIs
       FsShell is designed to support JVM scripting languages
       copy-files.groovy:
       // use the shell (made available under variable fsh)
       if (!fsh.test(inputDir)) {
           fsh.mkdir(inputDir);
           fsh.copyFromLocal(sourceFile, inputDir);
           fsh.chmod(700, inputDir)
       }
       if (fsh.test(outputDir)) {
           fsh.rmr(outputDir)
       }
    48. HDFS and Hadoop Shell as APIs
       Reference the script and supply variables in the application configuration
       appCtx.xml:
       <script id="setupScript" location="copy-files.groovy">
           <property name="inputDir" value="${wordcount.input.path}"/>
           <property name="outputDir" value="${wordcount.output.path}"/>
           <property name="sourceFile" value="${localSourceFile}"/>
       </script>
    49. Small workflows
       Often need the following steps
       – Execute HDFS operations before the job
       – Run the MapReduce job
       – Execute HDFS operations after the job completes
       Spring's JobRunner helper class sequences these steps
       – Can reference multiple scripts with comma-delimited names
       <hdp:job-runner id="runner" run-at-startup="true"
           pre-action="setupScript"
           job="wordcountJob"
           post-action="tearDownScript"/>
    50. Runner classes
       Similar runner classes available for Hive and Pig
       Implement the JDK Callable interface
       Easy to schedule for simple needs using Spring
       Can later 'graduate' to Spring Batch for more complex workflows
       – Start simple and grow, reusing existing configuration
       <hdp:job-runner id="runner" run-at-startup="false"
           pre-action="setupScript"
           job="wordcountJob"
           post-action="tearDownScript"/>
       <task:scheduled-tasks>
           <task:scheduled ref="runner" method="call" cron="3/30 * * * * ?"/>
       </task:scheduled-tasks>
    51. Spring's PigRunner
       Execute a small Pig workflow
       <pig-factory job-name="analysis" properties-location="pig-server.properties"/>
       <script id="hdfsScript" location="copy-files.groovy">
           <property name="sourceFile" value="${localSourceFile}"/>
           <property name="inputDir" value="${inputDir}"/>
           <property name="outputDir" value="${outputDir}"/>
       </script>
       <pig-runner id="pigRunner" pre-action="hdfsScript" run-at-startup="true">
           <script location="wordCount.pig">
               <arguments>
                   inputDir=${inputDir}
                   outputDir=${outputDir}
               </arguments>
           </script>
       </pig-runner>
    52. PigTemplate – Configuration
       Helper class that simplifies the programmatic use of Pig
       – Common tasks are one-liners
       Similar template helper classes for Hive and HBase
       <pig-factory id="pigFactory" properties-location="pig-server.properties"/>
       <pig-template pig-factory-ref="pigFactory"/>
    53. PigTemplate – Programmatic Use
       public class PigPasswordRepository implements PasswordRepository {
           @Autowired
           private PigTemplate pigTemplate;

           @Autowired
           private String outputDir;

           private String pigScript = "classpath:password-analysis.pig";

           public void processPasswordFile(String inputFile) {
               Properties scriptParameters = new Properties();
               scriptParameters.put("inputDir", inputFile);
               scriptParameters.put("outputDir", outputDir);
               pigTemplate.executeScript(pigScript, scriptParameters);
           }
       }
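       One possible way to drive this repository, sketched under the assumption that the beans are defined in the XML shown earlier (the main class name and input path are illustrative, not from the slides):

           import org.springframework.context.ApplicationContext;
           import org.springframework.context.support.ClassPathXmlApplicationContext;

           public class PasswordAnalysisMain {
               public static void main(String[] args) {
                   // Bootstrap the Spring context defined in the earlier slides
                   ApplicationContext ctx =
                           new ClassPathXmlApplicationContext("applicationContext.xml");
                   // Bean type from slide 53; the input path is a made-up example
                   PasswordRepository repo = ctx.getBean(PigPasswordRepository.class);
                   repo.processPasswordFile("/data/passwd/input");
               }
           }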
    54. Big Data problems are also integration problems
       (diagram: Collect → Transform → RT Analysis → Ingest → Batch Analysis → Distribute → Use;
       Spring Integration & Data, Spring Hadoop + Batch, Spring MVC, Twitter Search & Gardenhose, Redis, Gemfire (CQ))
    55. Spring Integration
       Implementation of Enterprise Integration Patterns
       – Mature, since 2007
       – Apache 2.0 license
       Separates integration concerns from processing logic
       – Framework handles message reception and method invocation, e.g. polling vs. event-driven
       – Endpoints written as POJOs, which increases testability (see the sketch below)
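       As a minimal sketch of that last point: an endpoint can be nothing more than the following plain class, with the framework (not the class) deciding how messages reach it. The class and method names here are illustrative, not from the slides:

           // A plain Java object usable as a Spring Integration endpoint:
           // no framework imports, so it is trivial to unit test in isolation
           public class LogLineEnricher {
               public String transform(String syslogLine) {
                   // prepend a receive timestamp before the message moves on
                   return System.currentTimeMillis() + " " + syslogLine;
               }
           }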
    56. Pipes and Filters Architecture
       Endpoints are connected through channels and exchange messages
       $> cat foo.txt | grep the | while read l; do echo $l ; done
       (diagram: producer endpoint → channel → consumer endpoint; File, JMS, TCP, Route)
    57. Spring Batch
       Framework for batch processing
       – Basis for JSR-352
       Born out of collaboration with Accenture in 2007
       Features
       – Parsers, mappers, readers, writers
       – Automatic retries after failure
       – Periodic commits
       – Synchronous and asynchronous processing
       – Parallel processing
       – Partial processing (skipping records)
       – Non-sequential processing
       – Job tracking and restart
    58. Spring Integration and Batch for Hadoop Ingest/Export
       Event streams – Spring Integration
       – Examples: consume syslog events, transform, and write to HDFS; consume Twitter search results and write to HDFS
       Batch – Spring Batch
       – Examples: read log files on the local file system, transform, and write to HDFS; read from HDFS, transform, and write to JDBC, HBase, MongoDB, …
    59. Spring Data, Integration, & Batch for Analytics
       Real-time analytics – Spring Integration & Data
       – Examples: a Service Activator that increments counters in Redis or MongoDB using Spring Data helper libraries, or creates Gemfire Continuous Queries using Spring GemFire
       Batch analytics – Spring Batch
       – Orchestrate Hadoop-based workflows with Spring Batch
       – Also orchestrate non-Hadoop-based workflows
    60. Ingesting – Syslog into HDFS
       Use SI's syslog adapter
       Perform transformation on the data
       Route to specific channels based on category
       One route leads to HDFS, with filtered data stored in Redis
    61. Ingesting – Multi-node syslog into HDFS
       Syslog collection across multiple machines
       Break the processing chain at channel boundaries
       Use SI's TCP adapters to forward events
       – Or other SI middleware adapters
    62. Hadoop analytical workflow managed by Spring Batch
       Reuse the same Batch infrastructure and knowledge to manage Hadoop workflows
       A step can be any Hadoop job type or HDFS script
    63. Spring Batch Configuration for Hadoop
       <job id="job1">
           <step id="import" next="wordcount">
               <tasklet ref="import-tasklet"/>
           </step>
           <step id="wordcount" next="pig">
               <tasklet ref="wordcount-tasklet"/>
           </step>
           <step id="pig" next="parallel">
               <tasklet ref="pig-tasklet"/>
           </step>
           <split id="parallel" next="hdfs">
               <flow>
                   <step id="mrStep">
                       <tasklet ref="mr-tasklet"/>
                   </step>
               </flow>
               <flow>
                   <step id="hive">
                       <tasklet ref="hive-tasklet"/>
                   </step>
               </flow>
           </split>
           <step id="hdfs">
               <tasklet ref="hdfs-tasklet"/>
           </step>
       </job>
    64. Exporting HDFS to JDBC
       Use Spring Batch's
       – MultiResourceItemReader / FlatFileItemReader
       – JdbcBatchItemWriter
       <step id="step1">
           <tasklet>
               <chunk reader="flatFileItemReader" processor="itemProcessor"
                   writer="jdbcItemWriter" commit-interval="100" retry-limit="3"/>
           </tasklet>
       </step>
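       The "itemProcessor" bean referenced in the chunk could be as small as the following sketch; the String-to-String transformation is an illustrative assumption, and any ItemProcessor implementation fits in that slot:

           import org.springframework.batch.item.ItemProcessor;

           // Transforms each line read from HDFS before the JDBC writer persists it
           public class UpperCaseProcessor implements ItemProcessor<String, String> {
               @Override
               public String process(String item) throws Exception {
                   return item.toUpperCase();
               }
           }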
    65. Relationship between Spring Projects
       (diagram)
    66. Next Steps – Spring XD
       New open-source umbrella project to support common big data use cases
       – High-throughput distributed data ingestion into HDFS, from a variety of input sources
       – Real-time analytics at ingestion time: gathering metrics, counting values, Gemfire CQ, …
       – On- and off-Hadoop workflow orchestration
       – High-throughput data export, from HDFS to an RDBMS or NoSQL database
       XD = eXtreme Data, or y = mx + b
    67. Next Steps – Spring XD
       Consistent model that spans the four use-case categories
       Move beyond delivering a set of libraries
       – Provide an out-of-the-box executable server
       – High-level DSL to configure flows and jobs, e.g. http | hdfs
       – Pluggable module system
       See the blog post for more information
       – GitHub: http://github.com/springsource/spring-xd
       Get involved!
    68. Resources
       Pivotal – goPivotal.com
       Spring Data
       – http://www.springsource.org/spring-data
       – http://www.springsource.org/spring-hadoop
       Spring Data Book – http://bit.ly/sd-book (Part III on Big Data)
       Example code – https://github.com/SpringSource/spring-data-book
       Spring XD – http://github.com/springsource/spring-xd
    69. A NEW PLATFORM FOR A NEW ERA
