Gobblin What's New

1. Agenda
Big Data Meetup: Data Integration, Management & Compliance (Apache Gobblin, Dali and friends …), 25th Jan, 2018
- Talk #1: Apache Gobblin: The Latest [Abhishek Tiwari / Apache]
- Talk #2: How We Gobble Data at Prezi [Tamas Nemeth / Prezi]
- Talk #3: Foundations for a Data-Driven Marketing Engine [Michael Dreibelbis / Machine Zone]
- Talk #4: Data Democracy + Data Privacy at LinkedIn [Eric Ogren, Anthony Hsu / LinkedIn]

2. Gobblin - What’s New?
Latest and greatest from the world of Gobblin.
https://gobblin.apache.org
Abhishek Tiwari, Apache PPMC, Committer

3. Gobblin is a distributed data integration framework that simplifies common aspects of big data integration, such as ingestion, replication, organization, and lifecycle management, for both streaming and batch data ecosystems.
Mission: Build a highly scalable platform that simplifies data integration and management for small and large data ecosystems.
Vision: Enable data to appear anywhere you need it, in the right form.

4. Incubation
- Entered the Apache Incubator in February 2017
- Code donation and Apache infrastructure setup completed by November 2017
- New website: https://gobblin.apache.org
- New mailing lists: https://gobblin.apache.org/mailing-lists/
- New issue tracking: https://issues.apache.org/jira/projects/GOBBLIN/
- New wiki: https://cwiki.apache.org/confluence/display/GOBBLIN/Home
- Design documents now open source: https://cwiki.apache.org/confluence/display/GOBBLIN/Design+Docs
- New real-time communication channel: https://gitter.im/gobblin/Lobby
- Proposed new process for major initiatives: https://cwiki.apache.org/confluence/display/GOBBLIN/Gobblin+Improvement+Proposals
- First external Apache committer voted in: Joel Baranick
- Apache Gobblin release 0.12.0 in progress

5. Multiple execution modes
- Standalone / Embedded (./gobblin.sh): single box / JVM with tasks running in threads
  - Supports batch, streaming and also embedded mode
  - Low scale
  - Quick start
- MapReduce mode (./gobblin-mapreduce.sh): runs as a MapReduce application with tasks running in mappers
  - Supports batch mode only
  - Huge scale
  - Runs on Hadoop as an MR application
- Yarn, in progress: Mesos (./gobblin-yarn.sh): standalone cluster with master and workers
  - Supports batch and streaming modes
  - Huge scale
  - Runs on Yarn / Mesos / etc.
- Cloud, in progress: Azure (./gobblin-aws.sh): standalone cluster with master and workers
  - Supports batch and streaming modes
  - Huge scale
  - Auto-scaling / elastic
  - Runs on AWS / Azure / etc.
Two of these modes are marked NEW on the slide.
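The mode comparison can be restated as a small capability table. This is a hypothetical sketch, not part of Gobblin: `ExecutionMode`, `MODES`, and `pick_mode` are illustrative names, the boolean flags only restate the slide's columns (streaming support, scale, elasticity), and the launch script names are the ones listed on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExecutionMode:
    name: str
    launcher: str      # launch script named on the slide
    streaming: bool    # supports streaming in addition to batch
    large_scale: bool  # "Huge scale" on the slide
    elastic: bool      # "Auto-Scaling / Elastic" on the slide


# Hypothetical encoding of the slide, in the order the slide lists the modes.
MODES = [
    ExecutionMode("Standalone / Embedded", "./gobblin.sh", True, False, False),
    ExecutionMode("MapReduce", "./gobblin-mapreduce.sh", False, True, False),
    ExecutionMode("Yarn", "./gobblin-yarn.sh", True, True, False),
    ExecutionMode("Cloud (AWS)", "./gobblin-aws.sh", True, True, True),
]


def pick_mode(need_streaming: bool, need_large_scale: bool,
              need_elastic: bool = False) -> ExecutionMode:
    """Return the first mode (in slide order) that meets every requirement."""
    for mode in MODES:
        if need_streaming and not mode.streaming:
            continue
        if need_large_scale and not mode.large_scale:
            continue
        if need_elastic and not mode.elastic:
            continue
        return mode
    raise ValueError("no execution mode satisfies the requirements")


print(pick_mode(need_streaming=True, need_large_scale=True).launcher)
```

Because the list follows the slide's order, the helper prefers the simplest mode that qualifies: a small batch job lands on `./gobblin.sh`, while large-scale streaming lands on `./gobblin-yarn.sh`.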

9. Gobblin Service
Gobblin as a Service, which itself runs as a cluster for HA:
- REST API / UI
- Authentication
- Authorization
- Flow Management
- Flow Orchestration
- Topology Management
- Monitoring
The diagram shows the service setting up jobs on Gobblin on Hadoop 1 (Gobblin MR application), Gobblin on Hadoop 2 (Gobblin MR application) and Gobblin on AWS (standalone cluster):
- Gobblin ingest job: read / pull from Salesforce, write (Avro) to HDFS 1
- Gobblin data format conversion job: read (Avro), write (ORC)
- Gobblin replication job: read from HDFS 1, write to HDFS 2
Benefits:
- Platform as a Service for Gobblin
- Self serve
- Optimal resource use
- Seamless failovers / upgrades
- Global state
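Flow orchestration amounts to running a flow's jobs in dependency order. The sketch below uses Python's standard `graphlib` to group the three jobs from the diagram into stages; it is not the Gobblin Service API. The job names are hypothetical, and the assumption that replication waits for the ORC conversion is ours, since the slide does not state the dependencies.

```python
from graphlib import TopologicalSorter

# Hypothetical flow: each job maps to the set of jobs it depends on.
FLOW = {
    "ingest_salesforce_to_hdfs1": set(),                     # Salesforce -> HDFS 1 (Avro)
    "convert_avro_to_orc": {"ingest_salesforce_to_hdfs1"},   # Avro -> ORC
    "replicate_hdfs1_to_hdfs2": {"convert_avro_to_orc"},     # HDFS 1 -> HDFS 2 (assumed order)
}


def execution_stages(flow):
    """Group jobs into stages; jobs in the same stage do not depend on each other."""
    sorter = TopologicalSorter(flow)
    sorter.prepare()  # raises graphlib.CycleError if the flow has a cycle
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)  # mark the whole stage finished before asking for more
    return stages


for stage in execution_stages(FLOW):
    print(stage)
```

Returning stages rather than a flat order shows where an orchestrator could run independent jobs in parallel, for example on different clusters.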

10. Global Throttling Service
Clients, both Gobblin and generic applications, each embed a limiter and acquire permits over RestLI from the global throttling service before accessing shared resources. Throttled operations in the diagram include read / write to Kafka, read (Avro) / write (ORC), Namenode RPC calls, and read / write to Espresso.
- Bound total global QPS of applications
- Ensure fair distribution of QPS
- Different policy configurations
- Audit access patterns
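Bounding QPS by handing out permits is commonly done with a token bucket. The sketch below is an in-process illustration of that technique under our own assumptions, not Gobblin's throttling service or its policy configuration: `TokenBucketLimiter`, `qps`, `burst`, and the injectable `clock` are hypothetical names. A real global limiter would live behind a remote (RestLI) endpoint shared by all clients.

```python
import threading
import time
from typing import Callable


class TokenBucketLimiter:
    """Hypothetical limiter: up to `qps` permits per second, bursts of at most `burst`."""

    def __init__(self, qps: float, burst: float,
                 clock: Callable[[], float] = time.monotonic):
        if qps <= 0 or burst <= 0:
            raise ValueError("qps and burst must be positive")
        self.qps = qps
        self.burst = burst
        self._clock = clock
        self._tokens = burst  # start with a full bucket
        self._last = clock()
        self._lock = threading.Lock()

    def try_acquire(self, permits: int = 1) -> bool:
        """Non-blocking: take `permits` tokens if available, otherwise refuse."""
        with self._lock:
            now = self._clock()
            # Refill in proportion to elapsed time, capped at the burst size.
            self._tokens = min(self.burst, self._tokens + (now - self._last) * self.qps)
            self._last = now
            if self._tokens >= permits:
                self._tokens -= permits
                return True
            return False


limiter = TokenBucketLimiter(qps=100, burst=10)
print(limiter.try_acquire())
```

A single shared bucket enforces the global bound; fair distribution between applications would need an extra layer, such as per-client buckets that each receive a share of the global rate.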

11. Other Enhancements
- Improved and stabilized gobblin-cluster
- Enhanced stream processing
- New sources: RegexPartitionedAvroFileSource, GoogleAnalyticsSource, GoogleDriveSource, GoogleWebmasterSource
- New extractors: PostgresqlExtractor, EnvelopePayloadExtractor
- New converters: JsonToParquet, GrokToJson, JsonToAvro
- New writers: ParquetHdfsDataWriter, SalesforceWriter
- Eventually consistent FS support
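In Gobblin, converters such as GrokToJson or JsonToAvro run in sequence between the extractor and the writer, and each converter may emit zero, one, or several records per input. The Python sketch below illustrates only that chaining idea; it is not Gobblin's Java converter API. `chain`, `parse_log_line`, and `keep_errors` are hypothetical, and the simple regex stands in for real Grok patterns.

```python
import re
from typing import Callable, Iterable, Iterator, List

# A converter takes one record and returns zero or more output records.
Converter = Callable[[object], Iterable[object]]


def chain(converters: List[Converter], records: Iterable[object]) -> Iterator[object]:
    """Pass every record through each converter in order."""
    for record in records:
        outputs: Iterable[object] = [record]
        for convert in converters:
            # Flatten: each intermediate record may fan out or be dropped.
            outputs = [out for item in outputs for out in convert(item)]
        yield from outputs


# Hypothetical stand-in for a Grok pattern: "LEVEL message".
LOG_LINE = re.compile(r"(?P<level>[A-Z]+) (?P<message>.*)")


def parse_log_line(line):
    """Turn a raw log line into a dict; drop lines that do not match."""
    match = LOG_LINE.fullmatch(line)
    return [match.groupdict()] if match else []


def keep_errors(record):
    """Hypothetical filter converter: emit the record only for ERROR lines."""
    return [record] if record["level"] == "ERROR" else []


lines = ["INFO started", "ERROR disk full", "garbage"]
print(list(chain([parse_log_line, keep_errors], lines)))
```

Dropping unparseable lines inside the converter, rather than raising, keeps one bad record from failing the whole task; whether that is acceptable depends on the pipeline's data-quality requirements.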

12. Get Involved
Visit us at: https://gobblin.apache.org
Mailing lists: https://gobblin.apache.org/mailing-lists/
Gitter: https://gitter.im/gobblin/Lobby
