Strata+Hadoop World | New York, NY | Sept 29-Oct 1, 2015
About the talk:
Data lakes represent a powerful new data architecture, providing enterprises with the scale and flexibility required for big data: unbounded storage for unbounded questions. Hadoop is the de facto standard for implementing data lakes, but significant expertise, time, and effort are still required for organizations to deliver one. Today, enterprises building their own data lakes on Hadoop are effectively implementing their own internal platforms from a collection of individual open source technologies.
The many projects provided by open source and commercial Hadoop distributions must be integrated with each other, integrated with the existing environment, and operationalized into new and existing processes. With no established best practices or standards, each organization is left to find their own way and rely on expensive, external experts. Data lake proof of concepts can take months.
This talk introduces Cask Hydrator, a new open source data lake framework included in the latest release of the Cask Data App Platform (CDAP). Hydrator is a self-service data ingestion and ETL framework with a drag-and-drop user interface and JSON-based pipeline configurations. Enforcing best practices and providing out-of-the-box functionality, Hydrator enables enterprises to build data lakes in a matter of days. Integrations are included with open source and traditional data sources, from Kafka and Flume to Oracle and Teradata. Completely open source, Cask Hydrator is highly extensible and can be easily integrated with new data sources and sinks, and extended with custom transformations and validations.
Attendees will learn about data lakes, the different approaches and architectures enterprises are utilizing, the benefits and challenges associated with them, and how Cask Hydrator can enable the rapid creation of data lakes and dramatically decrease the complexity in operationalizing them.
This session is sponsored by Cask
Jonathan Gray, founder and CEO of Cask, is an entrepreneur and software engineer with a background in startups, open source, and all things data. Prior to founding Cask, Jonathan was a software engineer at Facebook where he helped drive HBase engineering efforts, including Facebook Messages and several other large-scale projects, from inception to production.
An open source evangelist, Jonathan was responsible for helping build the Facebook engineering brand through developer outreach and refocusing the open source strategy of the company. Prior to Facebook, Jonathan founded Streamy.com, where he became an early adopter of Hadoop and HBase and is now a core contributor and active committer in the community.
Jonathan holds a bachelor’s degree in electrical and computer engineering from Carnegie Mellon University.