Cloudera Development Kit (CDK): Hadoop Application Development Made Easier

A set of best practices has emerged for building applications on top of Hadoop, thanks to the broad adoption of Apache Hadoop across various industries. However, for many developers, particularly those who are relatively new to Hadoop, it's a challenge to learn what those best practices are and how to apply them.

Cloudera has created a new open source project, the Cloudera Development Kit (CDK), to help these developers get new projects off the ground more easily. The CDK is both a framework and a long-term initiative for documenting proven development practices and providing documentation and APIs that make Hadoop application development as easy as possible.

This on-demand webinar will teach you:
- About the current CDK release and its targeted use cases
- How the CDK will be managed and extended over time
- Why the CDK will have a long-term impact on Hadoop adoption

Transcript

  • 1. Cloudera Development Kit: Hadoop Application Development Made Easier. E. Sammer | Engineering Manager | May 2013
  • 2. “[I]t’s not enough to just build a scalable and stable system; the system also has to be easy enough for thousands of internal developers of all types and all skill levels to use.” http://gigaom.com/data/how-disney-built-a-big-data-platform-on-a-startup-budget/
  • 3. Hadoop is incredibly powerful
  • 4. Hadoop is incredibly flexible
  • 5. Hadoop is incredibly low-level
  • 6. Hadoop is incredibly complex
  • 7. A typical system (zoom 100:1)
  • 8. A typical system (zoom 10:1)
  • 9. A typical system (zoom 5:1)
  • 10. What you actually care about:
    - Getting data from A to B
    - Using it later
  • 11. Infrastructure details:
    - Serialization, file formats, and compression
    - Metadata capture and maintenance
    - Dataset organization and partitioning
    - Durability and delivery guarantees
    - Well-defined failure semantics
    - Performance and health instrumentation
  • 12. Cloudera Development Kit:
    - Make Hadoop accessible to the enterprise developer
    - Codify expert patterns and practices
    - Make the “right thing” easy and obvious
    - Address the most common cases
    - Let developers focus on business logic, not infrastructure
  • 13. Cloudera Development Kit:
    - An open source set of libraries, guides, and examples for building data-oriented systems and applications
    - Provides higher level APIs atop existing components of CDH
    - Supports piecemeal adoption via loosely coupled modules
  • 14. CDK Data Module:
    - High level APIs for interacting with datasets in HDFS
    - Configuration-based format and schema management
    - Consistent data model and serialization semantics
    - Metadata system integration and support
    - Automatic dataset partitioning and file management
  • 15. Code:

        DatasetRepository repo = new FileSystemDatasetRepository.Builder()
            .fileSystem(FileSystem.get(new Configuration()))
            .directory(new Path("/data"))
            .get();

        Dataset events = repo.create("events",
            new DatasetDescriptor.Builder()
                .schema(new File("event.avsc"))
                .partitionStrategy(
                    new PartitionStrategy.Builder().hash("userId", 53).get())
                .get());

        DatasetWriter<GenericRecord> writer = events.getWriter();
        writer.open();
        writer.write(new GenericRecordBuilder(schema)
            .set("userId", 1)
            .set("timeStamp", System.currentTimeMillis())
            .build());
        writer.close();

    Data:

        /data/events/
          .metadata/
            schema.avsc
            descriptor.properties
          userId=0/
            10000000.avro
            10000001.avro
          userId=1/
            20000000.avro
          userId=2/
            30000000.avro
  • 16. Under development:
    - Configuration-based record transformation and filtering engine
    - Data pipeline deployment, discovery, and management
    - Working with customers, partners, and the community on new modules and features
  • 17. Getting started:
    - CDK code repo: github.com/cloudera/cdk
    - CDK example repo: github.com/cloudera/cdk-examples
    - Binary artifacts available from Cloudera’s Maven repository
    - Mailing list: groups.google.com/a/cloudera.org/d/forum/cdk-dev
  • 18. Closing:
    - Submit questions in the Q&A panel
    - Watch this webinar on-demand at http://cloudera.com
    - Follow Cloudera @Cloudera
    - Follow Cloudera Engineering @ClouderaEng
    - Thank you for attending!
    - Learn more about the CDK: http://cloudera.com/cdk
    - CDK on GitHub: http://cloudera.github.io/cdk/docs/0.2.0/
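
Slide 15 declares a hash partition strategy on `userId` with 53 buckets and shows the resulting `userId=N` directories. The Java sketch below illustrates the general idea of hash partitioning only: it maps a key to a bucket and derives a directory name from it. It does not use the CDK, and everything in it is an assumption made for illustration: the class name `HashPartitionSketch`, the methods `bucketFor` and `partitionPath`, and the use of `hashCode()` with `Math.floorMod`. The CDK's actual hash function and directory naming may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of hash partitioning; not part of the CDK.
public class HashPartitionSketch {

    // Map a partition key to a bucket in the range [0, buckets).
    // Assumption: hashCode() plus floorMod; floorMod keeps the result
    // non-negative even when the hash code is negative.
    public static int bucketFor(Object key, int buckets) {
        return Math.floorMod(key.hashCode(), buckets);
    }

    // Build the directory a record with this key would be written to,
    // following the "field=bucket" naming shown on the slide.
    public static String partitionPath(String datasetRoot, String field,
                                       Object key, int buckets) {
        return datasetRoot + "/" + field + "=" + bucketFor(key, buckets);
    }

    public static void main(String[] args) {
        // Group a few sample user IDs by the directory they land in.
        Map<String, List<Integer>> layout = new TreeMap<>();
        for (int userId : new int[] {1, 2, 54, 107}) {
            String dir = partitionPath("/data/events", "userId", userId, 53);
            layout.computeIfAbsent(dir, d -> new ArrayList<>()).add(userId);
        }
        layout.forEach((dir, ids) -> System.out.println(dir + " <- " + ids));
    }
}
```

With 53 buckets, user IDs 1, 54, and 107 share a directory in this sketch because they differ by multiples of 53, while user ID 2 lands in its own. Records with the same key always go to the same partition, so a reader looking for one user only needs to scan one directory.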