Agenda
Talk #1: Apache Gobblin: The Latest
[Abhishek Tiwari / Apache]
Talk #2: How We Gobble Data at Prezi
[Tamas Nemeth / Prezi]
Talk #3: Foundations for a Data-Driven Marketing Engine
[Michael Dreibelbis / Machine Zone]
Talk #4: Data Democracy + Data Privacy at LinkedIn
[Eric Ogren, Anthony Hsu / LinkedIn]
Big Data Meetup: Data Integration, Management & Compliance
Apache Gobblin, Dali and friends …
25th Jan, 2018
Gobblin - What’s New?
Latest and greatest from the
world of Gobblin.
https://gobblin.apache.org
Abhishek Tiwari
Apache PPMC, Committer
Gobblin is a distributed data integration framework that simplifies common aspects of
big data integration, such as ingestion, replication, organization, and lifecycle
management, for both streaming and batch data ecosystems.
Mission
Build a highly scalable platform that simplifies data integration and
management for small and large data ecosystems
Vision
Enable data to appear anywhere you need it, in the right form
Incubation
- Incubated in Apache in February 2017
- Code donation, Apache Infrastructure setup by November 2017
- New website: https://gobblin.apache.org
- New mailing lists: https://gobblin.apache.org/mailing-lists/
- New issue tracking: https://issues.apache.org/jira/projects/GOBBLIN/
- New wiki: https://cwiki.apache.org/confluence/display/GOBBLIN/Home
- Design documents now open source:
https://cwiki.apache.org/confluence/display/GOBBLIN/Design+Docs
- New real time communication channel: https://gitter.im/gobblin/Lobby
- Proposed new process for major initiatives:
https://cwiki.apache.org/confluence/display/GOBBLIN/Gobblin+Improvement+Proposals
- First external Apache committer voted in: Joel Baranick
- Apache Gobblin Release 0.12.0 in progress
Multiple execution modes

Standalone / Embedded
./gobblin.sh
Single box / JVM with tasks running in threads
- Supports batch, streaming and also embedded mode
- Low scale
- Quick start

MapReduce Mode
./gobblin-mapreduce.sh
As MapReduce application with tasks running in Maps
- Supports batch only mode
- Huge scale
- Runs on Hadoop as MR application

Yarn (In progress: Mesos)
./gobblin-yarn.sh
Standalone Cluster with Master and Workers
- Supports batch, streaming modes
- Huge scale
- Runs on Yarn / Mesos / etc

Cloud (In progress: Azure)
./gobblin-aws.sh
Standalone Cluster with Master and Workers
- Supports batch, streaming modes
- Huge scale
- Auto-Scaling / Elastic
- Runs on AWS / Azure / etc

Two of the modes are marked NEW on the slide.
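
As a rough illustration of the standalone / embedded idea (a single process with tasks running in threads), the sketch below runs a batch of tasks on a thread pool. It is not Gobblin code: `run_task`, `run_job_standalone`, and the dictionary-shaped work units are hypothetical names chosen for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def run_task(work_unit):
    """Hypothetical task: 'extract' the records of one work unit and count them."""
    records = list(range(work_unit["low"], work_unit["high"]))
    return {"id": work_unit["id"], "records": len(records)}


def run_job_standalone(work_units, max_threads=4):
    """Run every work unit as a task on a thread pool inside this one process."""
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        # pool.map returns results in the same order as the submitted work units.
        return list(pool.map(run_task, work_units))


# Eight hypothetical work units of 100 records each.
work_units = [{"id": i, "low": i * 100, "high": (i + 1) * 100} for i in range(8)]
results = run_job_standalone(work_units)
total_records = sum(r["records"] for r in results)
print(total_records)
```

The other modes on the slide keep the same task abstraction but place the tasks elsewhere (in map tasks, or on the workers of a cluster), which is why the single-process version is the one described as a quick start at low scale.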
Gobblin as a Service

Gobblin Service: run as a cluster itself for HA
- REST API / UI
- Authentication
- Authorization
- Flow Management
- Flow Orchestration
- Topology Management
- Monitoring

[Diagram: the Gobblin Service sets up a Gobblin ingest job (read / pull from Salesforce, write Avro to HDFS 1), a Gobblin data format conversion job (read Avro, write ORC), and a Gobblin replication job (read, write to HDFS 2). The execution clusters shown are Gobblin on Hadoop 1 (Gobblin MR application), Gobblin on AWS (standalone cluster), and Gobblin on Hadoop 2 (Gobblin MR application).]

- Platform as a Service for Gobblin
- Self Serve
- Optimal Resource Use
- Seamless Failovers / Upgrades
- Global State
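
One way to picture flow orchestration is as ordering jobs by their dependencies. The sketch below is only an illustration under assumptions: the job names are made up, and the chain ingest, then conversion, then replication is assumed from the diagram labels rather than stated on the slide. It uses Python's standard `graphlib` module, not any Gobblin API.

```python
from graphlib import TopologicalSorter

# Hypothetical flow: each job maps to the set of jobs it depends on.
flow = {
    "ingest_salesforce_to_avro": set(),
    "convert_avro_to_orc": {"ingest_salesforce_to_avro"},
    "replicate_to_hdfs2": {"convert_avro_to_orc"},
}


def plan(flow_graph):
    """Return the jobs in an order that respects every dependency."""
    return list(TopologicalSorter(flow_graph).static_order())


execution_order = plan(flow)
print(execution_order)
```

A real orchestrator would also decide which cluster runs each job and track job state; this sketch covers only the ordering step.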
Global Throttling
- Bound total global QPS of applications
- Ensure fair distribution of QPS
- Different policy configurations
- Audit access patterns

[Diagram: Gobblin and generic apps each embed a Limiter and acquire permits from the Global Throttling Service over RestLI. The throttled operations shown are read / write to Espresso, read / write to Kafka, read (Avro), write (ORC), and Namenode RPC calls.]
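
The permit idea can be sketched with a small in-memory limiter. This is an assumption-laden illustration, not the Gobblin throttling service: `GlobalLimiter`, `acquire`, and `tick` are hypothetical names, the policy is a simple fixed one-second window with equal shares per application, and the audit log is a plain list.

```python
class GlobalLimiter:
    """Hypothetical central limiter: a global per-second budget split equally."""

    def __init__(self, global_qps, apps):
        self.global_qps = global_qps
        # Fair-share policy (an assumption): each app gets an equal slice.
        self.share = global_qps // len(apps)
        self.used = {app: 0 for app in apps}
        self.audit_log = []  # (app, permits, granted) tuples for later inspection

    def acquire(self, app, permits=1):
        """Grant the permits only if the app stays within its share this window."""
        granted = self.used[app] + permits <= self.share
        if granted:
            self.used[app] += permits
        self.audit_log.append((app, permits, granted))
        return granted

    def tick(self):
        """Start a new one-second window by resetting the usage counters."""
        for app in self.used:
            self.used[app] = 0


limiter = GlobalLimiter(global_qps=100, apps=["gobblin", "generic-app"])
# "gobblin" asks for 80 single permits but is capped at its share of 50.
granted_to_gobblin = sum(limiter.acquire("gobblin") for _ in range(80))
# "generic-app" still has its own share available.
granted_to_other = limiter.acquire("generic-app", permits=10)
print(granted_to_gobblin, granted_to_other)
```

Because every application asks the same limiter before touching a shared system, the sum of granted permits never exceeds the global budget, and one busy client cannot consume another's share. Swapping the equal-share rule for a different one would correspond to the "different policy configurations" bullet.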
Other Enhancements
- Improved and stabilized gobblin-cluster
- Enhanced stream processing
- New Sources: RegexPartitionedAvroFileSource, GoogleAnalyticsSource,
GoogleDriveSource, GoogleWebmasterSource
- New Extractors: PostgresqlExtractor, EnvelopePayloadExtractor
- New Converters: JsonToParquet, GrokToJson, JsonToAvro
- New Writers: ParquetHdfsDataWriter, SalesforceWriter
- Eventually consistent FS support
Get Involved
Visit us at : https://gobblin.apache.org
Mailing lists : https://gobblin.apache.org/mailing-lists/
Gitter : https://gitter.im/gobblin/Lobby