SlideShare a Scribd company logo
1 of 22
HUG France - 22 Sept 2014 - @Criteo 
Data Management platform for Hadoop 
Cédric Carbone, Talend CTO 
@carbone 
Jean-Baptiste Onofré, Falcon Committer 
@jbonofre 
Ce support est mis à disposition selon les termes de la Licence Creative 
Commons Attribution - Pas d’Utilisation Commerciale - Pas de Modification 2.0 
France. - http://creativecommons.org/licenses/by-nc-nd/2.0/fr/ 
© Talend 2014 1
Overview 
• Falcon is a Data Management solution for Hadoop 
• Falcon in production at InMobi since 2012 
• InMobi gave Falcon to ASF in April 2013 
• Falcon is in Apache incubation 
• Falcon embedded per default inside HDP 
• Falcon leverages a lot of Apache components 
- Oozie, Ambari, ActiveMQ, HCat, Sqoop… 
• Committer/PPMC/IPMC: 
- #8 InMobi 
- #5 Hortonworks 
- #1 Talend 
© Talend 2014 2
Why Falcon? 
© Talend 2014 3
Why Falcon? 
© Talend 2014 4
What is Falcon? 
• Data Motion Import, Export, CDC 
• Policy-based Lifecycle Management Retention, Replication, 
Archival, Anonymization of PII data 
• Process orchestration and scheduling Late data handling, 
reprocessing, dependency checking, etc. Multi-cluster management to 
support Local/Global Aggregations, Rollups, etc. 
• Data Governance Lineage, Audit, SLA 
© Talend 2014 5
Falcon - The Solution! 
• Introduces a higher layer of abstraction – Data Set Decouples a 
data location and its properties from workflows Understanding the life-time 
of a feed will allow for implicit validation of the processing rules 
• Provides the key services for data processing apps Common data 
services are simple directives, No need to define them verbosely in 
each job Allows process owners to keep their processing specific to 
their application logic Sits in the execution path, intercepts to handle 
OOB data / retries etc. 
• Promotes Polyglot Programming Does not do any heavy lifting but 
delegates to tools with in the Hadoop ecosystem 
© Talend 2014 6
Falcon Basic Concepts : Data Pipelines 
• Cluster: : Represents the Hadoop cluster 
• Feed: Defines a “dataset” 
• Process: Consumes feeds, invokes processing logic & produces feeds 
• Entity Actions: submit, list, dependency, schedule, suspend, resume, status, 
definition, delete, update 
© Talend 2014 7
Falcon Entity Relationships 
© Talend 2014 8
Cluster Entity 
<?xml version="1.0"?> 
<cluster colo=”talend-datacenter" description="" name=”prod-cluster"> 
<interfaces> 
Needed by distcp for replications 
Writing to HDFS 
<interface type="readonly" endpoint="hftp://nn:50070" version="2.2.0" /> 
<interface type="write" endpoint="hdfs://nn:8020" version="2.2.0" /> 
<interface type="execute" endpoint=”rm:8050" version="2.2.0" /> 
<interface type="workflow" endpoint="http://os:11000/oozie/" version="4.0.0" /> 
<interface type=”registry" endpoint=”thrift://hms:9083" version=”0.12.0" /> 
<interface type="messaging" endpoint="tcp://mq:61616?daemon=true" version="5.1.6" /> 
</interfaces> 
<locations> 
Used to submit processes as MR 
Submit Oozie jobs 
<location name="staging" path="/apps/falcon/prod-cluster/staging" /> 
<location name="temp" path="/tmp" /> 
<location name="working" path="/apps/falcon/prod-cluster/working" /> 
</locations> 
</cluster> 
Hive metastore to register/ 
deregister partitions and 
get events on partition availability 
Used For alerts 
HDFS directories used by Falcon server 
© Talend 2014 9
Feed Entity 
Feed run frequency in mins/hrs/days/mths 
<?xml version="1.0"?> 
<feed description=“" name=”testFeed” xmlns="uri:falcon:feed:0.1"> 
<frequency>hours(1)</frequency> 
<late-arrival cut-off="hours(6)”/> 
<groups>churnAnalysisFeeds</groups> 
<clusters> 
Late arrival cutoff 
<cluster name=”cluster-primary" type="source"> 
Feeds can belong to multiple groups 
<validity start="2012-01-01T00:00Z" end="2099-12-31T00:00Z"/> 
<retention limit="days(2)" action="delete"/> 
One or more source & target clusters for retention & replication 
</cluster> 
<cluster name=”cluster-secondary" type="target"> 
<validity start="2012-01-01T00:00Z" end="2099-12-31T00:00Z"/> 
<location type="data” path="/churn/weblogs/${YEAR}-${MONTH}-${DAY}-${HOUR} "/> 
<retention limit=”days(7)" action="delete"/> 
</cluster> 
</clusters> 
<locations> 
Global location across clusters - HDFS paths or Hive tables 
<location type="data” path="/weblogs/${YEAR}-${MONTH}-${DAY}-${HOUR} "/> 
</locations> 
<ACL owner=”hdfs" group="users" permission="0755"/> 
<schema location="/none" provider="none"/> 
</feed> 
Access Permissions 
© Talend 2014 10
Process Entity 
<process name="process-test" xmlns="uri:falcon:process:0.1”> 
<clusters> 
<cluster name="cluster-primary"> 
Which cluster should the process run on and when 
<validity start="2011-11-02T00:00Z" end="2011-12-30T00:00Z" /> 
</cluster> 
How frequently does the process run , how many instances can be run in parallel and in what order 
</clusters> 
<parallel>1</parallel> 
<order>FIFO</order> 
<frequency>days(1)</frequency> 
<inputs> 
<input end="today(0,0)" start="today(0,0)" feed="feed-clicks-raw" name="input" /> 
</inputs> 
<outputs> 
Input & output feeds for process 
<output instance="now(0,2)" feed="feed-clicks-clean" name="output" /> 
</outputs> 
<workflow engine="pig" path="/apps/clickstream/clean-script.pig" /> 
<retry policy="periodic" delay="minutes(10)" attempts="3"/> 
<late-process policy="exp-backoff" delay="hours(1)"> 
<late-input input="input" workflow-path="/apps/clickstream/late" /> 
</late-process> 
</process> 
The processing logic. 
Retry policy on failure Handling late input feeds 
© Talend 2014 11
High Level Architecture 
© Talend 2014 12
Demo #1 : my first feed 
• Start Falcon on a single cluster 
• Submission of one simple feed 
• Check into Oozie the generated job 
© Talend 2014 13
Replication & Retention 
• Sophisticated retention policies expressed in one place 
• Simplify data retention for audit, compliance, or for data re-processing 
© Talend 2014 14
Demo #2: processes notification 
 Motivation: being able to be notified about processes activity “outside” 
of the cluster 
 Falcon manages workflow, send JMS notification, and a Camel route 
react to the notification 
 Result: trigger an action (Camel route) when a notification is sent by 
Falcon (eviction, late-arrival, process execution, …) 
 Mix BigData/Hadoop and ESB technologies 
© Talend 2014 15
Demo #2: workflow 
Falcon ActiveMQ 
Camel 
2014-03-19 11:25:43,273 | INFO | LCON.my-process] | process-listener | rg.apache.camel.util.CamelLogger 
176 | 74 - org.apache.camel.camel-core - 2.13.0.SNAPSHOT | Exchange[ExchangePattern: InOnly, BodyType: 
java.util.HashMap, Body: {brokerUrl=tcp://localhost:61616, timeStamp=2014-03-19T10:24Z, status=SUCCEEDED, 
logFile=hdfs://localhost:8020/falcon/staging/falcon/workflows/process/my-process/logs/instancePaths-2013-11-15-06- 
05.csv, feedNames=output, runId=0, entityType=process, nominalTime=2013-11-15T06:05Z, brokerTTL=4320, 
workflowUser=null, entityName=my-process, feedInstancePaths=hdfs://localhost:8020/data/output, 
operation=GENERATE, logDir=null, workflowId=0000026-140319105443372-oozie-jbon-W, cluster=local, 
brokerImplClass=org.apache.activemq.ActiveMQConnectionFactory, topicName=FALCON.my-process}] 
© Talend 2014 16
Demo #2 : Data Notification 
• All notification will be in ActiveMQ 
• Subscribers : Camel routes 
• Add some files and see the notification system working! 
• http://blog.nanthrax.net/2014/03/hadoop-cdc-and-processes-notification-with- 
apache-falcon-apache-activemq-and-apache-camel/ 
© Talend 2014 17
Data Pipeline Tracing 
© Talend 2014 18
Topologies 
• STANDALONE 
– Single Data Center 
– Single Falcon Server 
– Hadoop jobs and relevant processing involves only one cluster 
• DISTRIBUTED 
– Multiple Data Centers 
– Falcon Server per DC 
– Multiple instances of hadoop clusters and workflow schedulers 
© Talend 2014 19
Multi cluster failover 
 Motivation: replicate subset of data from one cluster to another, 
guarantee the different eviction depending of the data subset 
 Falcon manages workflow on primary cluster, and replication on failover 
cluster 
 Result: support business continuity without requiring full data 
reprocessing 
© Talend 2014 20
Roadmap 
 Improvement on the late-arrival messages in FALCON.ENTITY.TOPIC 
(ActiveMQ) 
 Feed implicit processes (no need of process) providing “native” CDC 
 Straight forward MR usage (without pig, oozie workflow XML, …) 
 More data acquisition 
 Monitoring/Management/Designer dashboard 
 TLP !!! 
© Talend 2014 21
Questions? 
Data Management platform for Hadoop 
Cédric Carbone, Talend CTO 
@carbone 
Jean-Baptiste Onofré, Falcon Committer 
@jbonofre 
Ce support est mis à disposition selon les termes de la Licence Creative 
Commons Attribution - Pas d’Utilisation Commerciale - Pas de Modification 2.0 
France. - http://creativecommons.org/licenses/by-nc-nd/2.0/fr/ 
© Talend 2014 22

More Related Content

What's hot

Debugging Apache Hadoop YARN Cluster in Production
Debugging Apache Hadoop YARN Cluster in ProductionDebugging Apache Hadoop YARN Cluster in Production
Debugging Apache Hadoop YARN Cluster in ProductionXuan Gong
 
Updated Power of the AWR Warehouse, Dallas, HQ, etc.
Updated Power of the AWR Warehouse, Dallas, HQ, etc.Updated Power of the AWR Warehouse, Dallas, HQ, etc.
Updated Power of the AWR Warehouse, Dallas, HQ, etc.Kellyn Pot'Vin-Gorman
 
Docker based Hadoop provisioning - anywhere
Docker based Hadoop provisioning - anywhereDocker based Hadoop provisioning - anywhere
Docker based Hadoop provisioning - anywhereDataWorks Summit
 
Understanding oracle rac internals part 2 - slides
Understanding oracle rac internals   part 2 - slidesUnderstanding oracle rac internals   part 2 - slides
Understanding oracle rac internals part 2 - slidesMohamed Farouk
 
How to Use EXAchk Effectively to Manage Exadata Environments
How to Use EXAchk Effectively to Manage Exadata EnvironmentsHow to Use EXAchk Effectively to Manage Exadata Environments
How to Use EXAchk Effectively to Manage Exadata EnvironmentsSandesh Rao
 
Hive 3 New Horizons DataWorks Summit Melbourne February 2019
Hive 3 New Horizons DataWorks Summit Melbourne February 2019Hive 3 New Horizons DataWorks Summit Melbourne February 2019
Hive 3 New Horizons DataWorks Summit Melbourne February 2019alanfgates
 
TFA Collector - what can one do with it
TFA Collector - what can one do with it TFA Collector - what can one do with it
TFA Collector - what can one do with it Sandesh Rao
 
A Second Look at Oracle RAC 12c
A Second Look at Oracle RAC 12cA Second Look at Oracle RAC 12c
A Second Look at Oracle RAC 12cLeighton Nelson
 
Oracle RAC - A Safe Investment into the Future of Your IT
Oracle RAC - A Safe Investment into the Future of Your ITOracle RAC - A Safe Investment into the Future of Your IT
Oracle RAC - A Safe Investment into the Future of Your ITMarkus Michalewicz
 
Expert performance tuning tips for Oracle RAC
Expert performance tuning tips for Oracle RACExpert performance tuning tips for Oracle RAC
Expert performance tuning tips for Oracle RACSolarWinds
 
Hive Performance Dataworks Summit Melbourne February 2019
Hive Performance Dataworks Summit Melbourne February 2019Hive Performance Dataworks Summit Melbourne February 2019
Hive Performance Dataworks Summit Melbourne February 2019alanfgates
 
Optimizing the Enterprise Manager 12c
Optimizing the Enterprise Manager 12cOptimizing the Enterprise Manager 12c
Optimizing the Enterprise Manager 12cKellyn Pot'Vin-Gorman
 
Apache Accumulo Overview
Apache Accumulo OverviewApache Accumulo Overview
Apache Accumulo OverviewBill Havanki
 
Database as a Service, Collaborate 2016
Database as a Service, Collaborate 2016Database as a Service, Collaborate 2016
Database as a Service, Collaborate 2016Kellyn Pot'Vin-Gorman
 
Oracle RAC 12c Release 2 - Overview
Oracle RAC 12c Release 2 - OverviewOracle RAC 12c Release 2 - Overview
Oracle RAC 12c Release 2 - OverviewMarkus Michalewicz
 
Christo kutrovsky oracle rac solving common scalability problems
Christo kutrovsky   oracle rac solving common scalability problemsChristo kutrovsky   oracle rac solving common scalability problems
Christo kutrovsky oracle rac solving common scalability problemsChristo Kutrovsky
 

What's hot (20)

Debugging Apache Hadoop YARN Cluster in Production
Debugging Apache Hadoop YARN Cluster in ProductionDebugging Apache Hadoop YARN Cluster in Production
Debugging Apache Hadoop YARN Cluster in Production
 
Updated Power of the AWR Warehouse, Dallas, HQ, etc.
Updated Power of the AWR Warehouse, Dallas, HQ, etc.Updated Power of the AWR Warehouse, Dallas, HQ, etc.
Updated Power of the AWR Warehouse, Dallas, HQ, etc.
 
Docker based Hadoop provisioning - anywhere
Docker based Hadoop provisioning - anywhereDocker based Hadoop provisioning - anywhere
Docker based Hadoop provisioning - anywhere
 
Understanding oracle rac internals part 2 - slides
Understanding oracle rac internals   part 2 - slidesUnderstanding oracle rac internals   part 2 - slides
Understanding oracle rac internals part 2 - slides
 
MySQL Fabric
MySQL FabricMySQL Fabric
MySQL Fabric
 
How to Use EXAchk Effectively to Manage Exadata Environments
How to Use EXAchk Effectively to Manage Exadata EnvironmentsHow to Use EXAchk Effectively to Manage Exadata Environments
How to Use EXAchk Effectively to Manage Exadata Environments
 
Hive 3 New Horizons DataWorks Summit Melbourne February 2019
Hive 3 New Horizons DataWorks Summit Melbourne February 2019Hive 3 New Horizons DataWorks Summit Melbourne February 2019
Hive 3 New Horizons DataWorks Summit Melbourne February 2019
 
TFA Collector - what can one do with it
TFA Collector - what can one do with it TFA Collector - what can one do with it
TFA Collector - what can one do with it
 
A Second Look at Oracle RAC 12c
A Second Look at Oracle RAC 12cA Second Look at Oracle RAC 12c
A Second Look at Oracle RAC 12c
 
Future of Apache Storm
Future of Apache StormFuture of Apache Storm
Future of Apache Storm
 
Oracle RAC - A Safe Investment into the Future of Your IT
Oracle RAC - A Safe Investment into the Future of Your ITOracle RAC - A Safe Investment into the Future of Your IT
Oracle RAC - A Safe Investment into the Future of Your IT
 
Expert performance tuning tips for Oracle RAC
Expert performance tuning tips for Oracle RACExpert performance tuning tips for Oracle RAC
Expert performance tuning tips for Oracle RAC
 
Hive Performance Dataworks Summit Melbourne February 2019
Hive Performance Dataworks Summit Melbourne February 2019Hive Performance Dataworks Summit Melbourne February 2019
Hive Performance Dataworks Summit Melbourne February 2019
 
Optimizing the Enterprise Manager 12c
Optimizing the Enterprise Manager 12cOptimizing the Enterprise Manager 12c
Optimizing the Enterprise Manager 12c
 
Apache Accumulo Overview
Apache Accumulo OverviewApache Accumulo Overview
Apache Accumulo Overview
 
Database as a Service, Collaborate 2016
Database as a Service, Collaborate 2016Database as a Service, Collaborate 2016
Database as a Service, Collaborate 2016
 
Maximizing Oracle RAC Uptime
Maximizing Oracle RAC UptimeMaximizing Oracle RAC Uptime
Maximizing Oracle RAC Uptime
 
AWR & ASH Analysis
AWR & ASH AnalysisAWR & ASH Analysis
AWR & ASH Analysis
 
Oracle RAC 12c Release 2 - Overview
Oracle RAC 12c Release 2 - OverviewOracle RAC 12c Release 2 - Overview
Oracle RAC 12c Release 2 - Overview
 
Christo kutrovsky oracle rac solving common scalability problems
Christo kutrovsky   oracle rac solving common scalability problemsChristo kutrovsky   oracle rac solving common scalability problems
Christo kutrovsky oracle rac solving common scalability problems
 

Similar to Apache Falcon _ Hadoop User Group France 22-sept-2014

Apache Falcon - Simplifying Managing Data Jobs on Hadoop
Apache Falcon - Simplifying Managing Data Jobs on HadoopApache Falcon - Simplifying Managing Data Jobs on Hadoop
Apache Falcon - Simplifying Managing Data Jobs on HadoopDataWorks Summit
 
Get most out of Spark on YARN
Get most out of Spark on YARNGet most out of Spark on YARN
Get most out of Spark on YARNDataWorks Summit
 
Applications on Hadoop
Applications on HadoopApplications on Hadoop
Applications on Hadoopmarkgrover
 
Discover HDP 2.1: Apache Falcon for Data Governance in Hadoop
Discover HDP 2.1: Apache Falcon for Data Governance in HadoopDiscover HDP 2.1: Apache Falcon for Data Governance in Hadoop
Discover HDP 2.1: Apache Falcon for Data Governance in HadoopHortonworks
 
How does Apache DolphinScheduler (Incubator) support scheduling 100,000-level...
How does Apache DolphinScheduler (Incubator) support scheduling 100,000-level...How does Apache DolphinScheduler (Incubator) support scheduling 100,000-level...
How does Apache DolphinScheduler (Incubator) support scheduling 100,000-level...li David
 
Driving Enterprise Data Governance for Big Data Systems through Apache Falcon
Driving Enterprise Data Governance for Big Data Systems through Apache FalconDriving Enterprise Data Governance for Big Data Systems through Apache Falcon
Driving Enterprise Data Governance for Big Data Systems through Apache FalconDataWorks Summit
 
Modern infrastructure for business data lake
Modern infrastructure for business data lakeModern infrastructure for business data lake
Modern infrastructure for business data lakeEMC
 
Zero to Manageability in 60 Minutes: Building a Solid Foundation for Oracle E...
Zero to Manageability in 60 Minutes: Building a Solid Foundation for Oracle E...Zero to Manageability in 60 Minutes: Building a Solid Foundation for Oracle E...
Zero to Manageability in 60 Minutes: Building a Solid Foundation for Oracle E...Courtney Llamas
 
Data Governance in Apache Falcon - Hadoop Summit Brussels 2015
Data Governance in Apache Falcon - Hadoop Summit Brussels 2015 Data Governance in Apache Falcon - Hadoop Summit Brussels 2015
Data Governance in Apache Falcon - Hadoop Summit Brussels 2015 Seetharam Venkatesh
 
Cloud Roundtable | Pivoltal: Agile platform
Cloud Roundtable | Pivoltal: Agile platformCloud Roundtable | Pivoltal: Agile platform
Cloud Roundtable | Pivoltal: Agile platformCodemotion
 
Pivotal: Operationalizing 1000 Node Hadoop Cluster - Analytics Workbench
Pivotal: Operationalizing 1000 Node Hadoop Cluster - Analytics WorkbenchPivotal: Operationalizing 1000 Node Hadoop Cluster - Analytics Workbench
Pivotal: Operationalizing 1000 Node Hadoop Cluster - Analytics WorkbenchEMC
 
Architecting a Heterogeneous Data Platform Across Clusters, Regions, and Clouds
Architecting a Heterogeneous Data Platform Across Clusters, Regions, and CloudsArchitecting a Heterogeneous Data Platform Across Clusters, Regions, and Clouds
Architecting a Heterogeneous Data Platform Across Clusters, Regions, and CloudsAlluxio, Inc.
 
Apache Falcon at Hadoop Summit Europe 2014
Apache Falcon at Hadoop Summit Europe 2014Apache Falcon at Hadoop Summit Europe 2014
Apache Falcon at Hadoop Summit Europe 2014Seetharam Venkatesh
 
Oracle Coherence Strategy and Roadmap (OpenWorld, September 2014)
Oracle Coherence Strategy and Roadmap (OpenWorld, September 2014)Oracle Coherence Strategy and Roadmap (OpenWorld, September 2014)
Oracle Coherence Strategy and Roadmap (OpenWorld, September 2014)jeckels
 
DataTorrent Presentation @ Big Data Application Meetup
DataTorrent Presentation @ Big Data Application MeetupDataTorrent Presentation @ Big Data Application Meetup
DataTorrent Presentation @ Big Data Application MeetupThomas Weise
 
Transforming Data Architecture Complexity at Sears - StampedeCon 2013
Transforming Data Architecture Complexity at Sears - StampedeCon 2013Transforming Data Architecture Complexity at Sears - StampedeCon 2013
Transforming Data Architecture Complexity at Sears - StampedeCon 2013StampedeCon
 
Database failover from client perspective
Database failover from client perspectiveDatabase failover from client perspective
Database failover from client perspectivePriit Piipuu
 
Building Data Streaming Platforms using OpenShift and Kafka
Building Data Streaming Platforms using OpenShift and KafkaBuilding Data Streaming Platforms using OpenShift and Kafka
Building Data Streaming Platforms using OpenShift and KafkaNenad Bogojevic
 

Similar to Apache Falcon _ Hadoop User Group France 22-sept-2014 (20)

Apache Falcon - Simplifying Managing Data Jobs on Hadoop
Apache Falcon - Simplifying Managing Data Jobs on HadoopApache Falcon - Simplifying Managing Data Jobs on Hadoop
Apache Falcon - Simplifying Managing Data Jobs on Hadoop
 
Get most out of Spark on YARN
Get most out of Spark on YARNGet most out of Spark on YARN
Get most out of Spark on YARN
 
Applications on Hadoop
Applications on HadoopApplications on Hadoop
Applications on Hadoop
 
Discover HDP 2.1: Apache Falcon for Data Governance in Hadoop
Discover HDP 2.1: Apache Falcon for Data Governance in HadoopDiscover HDP 2.1: Apache Falcon for Data Governance in Hadoop
Discover HDP 2.1: Apache Falcon for Data Governance in Hadoop
 
How does Apache DolphinScheduler (Incubator) support scheduling 100,000-level...
How does Apache DolphinScheduler (Incubator) support scheduling 100,000-level...How does Apache DolphinScheduler (Incubator) support scheduling 100,000-level...
How does Apache DolphinScheduler (Incubator) support scheduling 100,000-level...
 
Driving Enterprise Data Governance for Big Data Systems through Apache Falcon
Driving Enterprise Data Governance for Big Data Systems through Apache FalconDriving Enterprise Data Governance for Big Data Systems through Apache Falcon
Driving Enterprise Data Governance for Big Data Systems through Apache Falcon
 
Modern infrastructure for business data lake
Modern infrastructure for business data lakeModern infrastructure for business data lake
Modern infrastructure for business data lake
 
Zero to Manageability in 60 Minutes: Building a Solid Foundation for Oracle E...
Zero to Manageability in 60 Minutes: Building a Solid Foundation for Oracle E...Zero to Manageability in 60 Minutes: Building a Solid Foundation for Oracle E...
Zero to Manageability in 60 Minutes: Building a Solid Foundation for Oracle E...
 
Data Governance in Apache Falcon - Hadoop Summit Brussels 2015
Data Governance in Apache Falcon - Hadoop Summit Brussels 2015 Data Governance in Apache Falcon - Hadoop Summit Brussels 2015
Data Governance in Apache Falcon - Hadoop Summit Brussels 2015
 
Cloud Roundtable | Pivoltal: Agile platform
Cloud Roundtable | Pivoltal: Agile platformCloud Roundtable | Pivoltal: Agile platform
Cloud Roundtable | Pivoltal: Agile platform
 
Pivotal: Operationalizing 1000 Node Hadoop Cluster - Analytics Workbench
Pivotal: Operationalizing 1000 Node Hadoop Cluster - Analytics WorkbenchPivotal: Operationalizing 1000 Node Hadoop Cluster - Analytics Workbench
Pivotal: Operationalizing 1000 Node Hadoop Cluster - Analytics Workbench
 
Architecting a Heterogeneous Data Platform Across Clusters, Regions, and Clouds
Architecting a Heterogeneous Data Platform Across Clusters, Regions, and CloudsArchitecting a Heterogeneous Data Platform Across Clusters, Regions, and Clouds
Architecting a Heterogeneous Data Platform Across Clusters, Regions, and Clouds
 
Apache Falcon at Hadoop Summit Europe 2014
Apache Falcon at Hadoop Summit Europe 2014Apache Falcon at Hadoop Summit Europe 2014
Apache Falcon at Hadoop Summit Europe 2014
 
Oracle Coherence Strategy and Roadmap (OpenWorld, September 2014)
Oracle Coherence Strategy and Roadmap (OpenWorld, September 2014)Oracle Coherence Strategy and Roadmap (OpenWorld, September 2014)
Oracle Coherence Strategy and Roadmap (OpenWorld, September 2014)
 
DataTorrent Presentation @ Big Data Application Meetup
DataTorrent Presentation @ Big Data Application MeetupDataTorrent Presentation @ Big Data Application Meetup
DataTorrent Presentation @ Big Data Application Meetup
 
OpenStack Murano
OpenStack MuranoOpenStack Murano
OpenStack Murano
 
Apache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and FutureApache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and Future
 
Transforming Data Architecture Complexity at Sears - StampedeCon 2013
Transforming Data Architecture Complexity at Sears - StampedeCon 2013Transforming Data Architecture Complexity at Sears - StampedeCon 2013
Transforming Data Architecture Complexity at Sears - StampedeCon 2013
 
Database failover from client perspective
Database failover from client perspectiveDatabase failover from client perspective
Database failover from client perspective
 
Building Data Streaming Platforms using OpenShift and Kafka
Building Data Streaming Platforms using OpenShift and KafkaBuilding Data Streaming Platforms using OpenShift and Kafka
Building Data Streaming Platforms using OpenShift and Kafka
 

More from Modern Data Stack France

Talend spark meetup 03042017 - Paris Spark Meetup
Talend spark meetup 03042017 - Paris Spark MeetupTalend spark meetup 03042017 - Paris Spark Meetup
Talend spark meetup 03042017 - Paris Spark MeetupModern Data Stack France
 
Paris Spark Meetup - Trifacta - 03_04_2017
Paris Spark Meetup - Trifacta - 03_04_2017Paris Spark Meetup - Trifacta - 03_04_2017
Paris Spark Meetup - Trifacta - 03_04_2017Modern Data Stack France
 
Hadoop meetup : HUGFR Construire le cluster le plus rapide pour l'analyse des...
Hadoop meetup : HUGFR Construire le cluster le plus rapide pour l'analyse des...Hadoop meetup : HUGFR Construire le cluster le plus rapide pour l'analyse des...
Hadoop meetup : HUGFR Construire le cluster le plus rapide pour l'analyse des...Modern Data Stack France
 
HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...
HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...
HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...Modern Data Stack France
 
Hadoop France meetup Feb2016 : recommendations with spark
Hadoop France meetup  Feb2016 : recommendations with sparkHadoop France meetup  Feb2016 : recommendations with spark
Hadoop France meetup Feb2016 : recommendations with sparkModern Data Stack France
 
HUG France - 20160114 industrialisation_process_big_data CanalPlus
HUG France -  20160114 industrialisation_process_big_data CanalPlusHUG France -  20160114 industrialisation_process_big_data CanalPlus
HUG France - 20160114 industrialisation_process_big_data CanalPlusModern Data Stack France
 
HUG France : HBase in Financial Industry par Pierre Bittner (Scaled Risk CTO)
HUG France : HBase in Financial Industry par Pierre Bittner (Scaled Risk CTO)HUG France : HBase in Financial Industry par Pierre Bittner (Scaled Risk CTO)
HUG France : HBase in Financial Industry par Pierre Bittner (Scaled Risk CTO)Modern Data Stack France
 
Apache Flink par Bilal Baltagi Paris Spark Meetup Dec 2015
Apache Flink par Bilal Baltagi Paris Spark Meetup Dec 2015Apache Flink par Bilal Baltagi Paris Spark Meetup Dec 2015
Apache Flink par Bilal Baltagi Paris Spark Meetup Dec 2015Modern Data Stack France
 
Datalab 101 (Hadoop, Spark, ElasticSearch) par Jonathan Winandy - Paris Spark...
Datalab 101 (Hadoop, Spark, ElasticSearch) par Jonathan Winandy - Paris Spark...Datalab 101 (Hadoop, Spark, ElasticSearch) par Jonathan Winandy - Paris Spark...
Datalab 101 (Hadoop, Spark, ElasticSearch) par Jonathan Winandy - Paris Spark...Modern Data Stack France
 
Record linkage, a real use case with spark ml - Paris Spark meetup Dec 2015
Record linkage, a real use case with spark ml  - Paris Spark meetup Dec 2015Record linkage, a real use case with spark ml  - Paris Spark meetup Dec 2015
Record linkage, a real use case with spark ml - Paris Spark meetup Dec 2015Modern Data Stack France
 
June Spark meetup : search as recommandation
June Spark meetup : search as recommandationJune Spark meetup : search as recommandation
June Spark meetup : search as recommandationModern Data Stack France
 
Spark ML par Xebia (Spark Meetup du 11/06/2015)
Spark ML par Xebia (Spark Meetup du 11/06/2015)Spark ML par Xebia (Spark Meetup du 11/06/2015)
Spark ML par Xebia (Spark Meetup du 11/06/2015)Modern Data Stack France
 
Paris Spark meetup : Extension de Spark (Tachyon / Spark JobServer) par jlamiel
Paris Spark meetup : Extension de Spark (Tachyon / Spark JobServer) par jlamielParis Spark meetup : Extension de Spark (Tachyon / Spark JobServer) par jlamiel
Paris Spark meetup : Extension de Spark (Tachyon / Spark JobServer) par jlamielModern Data Stack France
 

More from Modern Data Stack France (20)

Stash - Data FinOPS
Stash - Data FinOPSStash - Data FinOPS
Stash - Data FinOPS
 
Vue d'ensemble Dremio
Vue d'ensemble DremioVue d'ensemble Dremio
Vue d'ensemble Dremio
 
From Data Warehouse to Lakehouse
From Data Warehouse to LakehouseFrom Data Warehouse to Lakehouse
From Data Warehouse to Lakehouse
 
Talend spark meetup 03042017 - Paris Spark Meetup
Talend spark meetup 03042017 - Paris Spark MeetupTalend spark meetup 03042017 - Paris Spark Meetup
Talend spark meetup 03042017 - Paris Spark Meetup
 
Paris Spark Meetup - Trifacta - 03_04_2017
Paris Spark Meetup - Trifacta - 03_04_2017Paris Spark Meetup - Trifacta - 03_04_2017
Paris Spark Meetup - Trifacta - 03_04_2017
 
Hadoop meetup : HUGFR Construire le cluster le plus rapide pour l'analyse des...
Hadoop meetup : HUGFR Construire le cluster le plus rapide pour l'analyse des...Hadoop meetup : HUGFR Construire le cluster le plus rapide pour l'analyse des...
Hadoop meetup : HUGFR Construire le cluster le plus rapide pour l'analyse des...
 
HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...
HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...
HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...
 
Hadoop France meetup Feb2016 : recommendations with spark
Hadoop France meetup  Feb2016 : recommendations with sparkHadoop France meetup  Feb2016 : recommendations with spark
Hadoop France meetup Feb2016 : recommendations with spark
 
Hug janvier 2016 -EDF
Hug   janvier 2016 -EDFHug   janvier 2016 -EDF
Hug janvier 2016 -EDF
 
HUG France - 20160114 industrialisation_process_big_data CanalPlus
HUG France -  20160114 industrialisation_process_big_data CanalPlusHUG France -  20160114 industrialisation_process_big_data CanalPlus
HUG France - 20160114 industrialisation_process_big_data CanalPlus
 
Hugfr SPARK & RIAK -20160114_hug_france
Hugfr  SPARK & RIAK -20160114_hug_franceHugfr  SPARK & RIAK -20160114_hug_france
Hugfr SPARK & RIAK -20160114_hug_france
 
HUG France : HBase in Financial Industry par Pierre Bittner (Scaled Risk CTO)
HUG France : HBase in Financial Industry par Pierre Bittner (Scaled Risk CTO)HUG France : HBase in Financial Industry par Pierre Bittner (Scaled Risk CTO)
HUG France : HBase in Financial Industry par Pierre Bittner (Scaled Risk CTO)
 
Apache Flink par Bilal Baltagi Paris Spark Meetup Dec 2015
Apache Flink par Bilal Baltagi Paris Spark Meetup Dec 2015Apache Flink par Bilal Baltagi Paris Spark Meetup Dec 2015
Apache Flink par Bilal Baltagi Paris Spark Meetup Dec 2015
 
Datalab 101 (Hadoop, Spark, ElasticSearch) par Jonathan Winandy - Paris Spark...
Datalab 101 (Hadoop, Spark, ElasticSearch) par Jonathan Winandy - Paris Spark...Datalab 101 (Hadoop, Spark, ElasticSearch) par Jonathan Winandy - Paris Spark...
Datalab 101 (Hadoop, Spark, ElasticSearch) par Jonathan Winandy - Paris Spark...
 
Record linkage, a real use case with spark ml - Paris Spark meetup Dec 2015
Record linkage, a real use case with spark ml  - Paris Spark meetup Dec 2015Record linkage, a real use case with spark ml  - Paris Spark meetup Dec 2015
Record linkage, a real use case with spark ml - Paris Spark meetup Dec 2015
 
Spark dataframe
Spark dataframeSpark dataframe
Spark dataframe
 
June Spark meetup : search as recommandation
June Spark meetup : search as recommandationJune Spark meetup : search as recommandation
June Spark meetup : search as recommandation
 
Spark ML par Xebia (Spark Meetup du 11/06/2015)
Spark ML par Xebia (Spark Meetup du 11/06/2015)Spark ML par Xebia (Spark Meetup du 11/06/2015)
Spark ML par Xebia (Spark Meetup du 11/06/2015)
 
Spark meetup at viadeo
Spark meetup at viadeoSpark meetup at viadeo
Spark meetup at viadeo
 
Paris Spark meetup : Extension de Spark (Tachyon / Spark JobServer) par jlamiel
Paris Spark meetup : Extension de Spark (Tachyon / Spark JobServer) par jlamielParis Spark meetup : Extension de Spark (Tachyon / Spark JobServer) par jlamiel
Paris Spark meetup : Extension de Spark (Tachyon / Spark JobServer) par jlamiel
 

Recently uploaded

VIP Kolkata Call Girl Kestopur 👉 8250192130 Available With Room
VIP Kolkata Call Girl Kestopur 👉 8250192130  Available With RoomVIP Kolkata Call Girl Kestopur 👉 8250192130  Available With Room
VIP Kolkata Call Girl Kestopur 👉 8250192130 Available With Roomdivyansh0kumar0
 
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
VIP Kolkata Call Girl Salt Lake 👉 8250192130 Available With Room
VIP Kolkata Call Girl Salt Lake 👉 8250192130  Available With RoomVIP Kolkata Call Girl Salt Lake 👉 8250192130  Available With Room
VIP Kolkata Call Girl Salt Lake 👉 8250192130 Available With Roomishabajaj13
 
VIP 7001035870 Find & Meet Hyderabad Call Girls LB Nagar high-profile Call Girl
VIP 7001035870 Find & Meet Hyderabad Call Girls LB Nagar high-profile Call GirlVIP 7001035870 Find & Meet Hyderabad Call Girls LB Nagar high-profile Call Girl
VIP 7001035870 Find & Meet Hyderabad Call Girls LB Nagar high-profile Call Girladitipandeya
 
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)Christopher H Felton
 
Low Rate Call Girls Kolkata Avani 🤌 8250192130 🚀 Vip Call Girls Kolkata
Low Rate Call Girls Kolkata Avani 🤌  8250192130 🚀 Vip Call Girls KolkataLow Rate Call Girls Kolkata Avani 🤌  8250192130 🚀 Vip Call Girls Kolkata
Low Rate Call Girls Kolkata Avani 🤌 8250192130 🚀 Vip Call Girls Kolkataanamikaraghav4
 
Gram Darshan PPT cyber rural in villages of india
Gram Darshan PPT cyber rural  in villages of indiaGram Darshan PPT cyber rural  in villages of india
Gram Darshan PPT cyber rural in villages of indiaimessage0108
 
Russian Call Girls in Kolkata Samaira 🤌 8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Samaira 🤌  8250192130 🚀 Vip Call Girls KolkataRussian Call Girls in Kolkata Samaira 🤌  8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Samaira 🤌 8250192130 🚀 Vip Call Girls Kolkataanamikaraghav4
 
VIP Kolkata Call Girls Salt Lake 8250192130 Available With Room
VIP Kolkata Call Girls Salt Lake 8250192130 Available With RoomVIP Kolkata Call Girls Salt Lake 8250192130 Available With Room
VIP Kolkata Call Girls Salt Lake 8250192130 Available With Roomgirls4nights
 
定制(CC毕业证书)美国美国社区大学毕业证成绩单原版一比一
定制(CC毕业证书)美国美国社区大学毕业证成绩单原版一比一定制(CC毕业证书)美国美国社区大学毕业证成绩单原版一比一
定制(CC毕业证书)美国美国社区大学毕业证成绩单原版一比一3sw2qly1
 
FULL ENJOY Call Girls In Mayur Vihar Delhi Contact Us 8377087607
FULL ENJOY Call Girls In Mayur Vihar Delhi Contact Us 8377087607FULL ENJOY Call Girls In Mayur Vihar Delhi Contact Us 8377087607
FULL ENJOY Call Girls In Mayur Vihar Delhi Contact Us 8377087607dollysharma2066
 
Call Girls Service Adil Nagar 7001305949 Need escorts Service Pooja Vip
Call Girls Service Adil Nagar 7001305949 Need escorts Service Pooja VipCall Girls Service Adil Nagar 7001305949 Need escorts Service Pooja Vip
Call Girls Service Adil Nagar 7001305949 Need escorts Service Pooja VipCall Girls Lucknow
 
Call Girls In Mumbai Central Mumbai ❤️ 9920874524 👈 Cash on Delivery
Call Girls In Mumbai Central Mumbai ❤️ 9920874524 👈 Cash on DeliveryCall Girls In Mumbai Central Mumbai ❤️ 9920874524 👈 Cash on Delivery
Call Girls In Mumbai Central Mumbai ❤️ 9920874524 👈 Cash on Deliverybabeytanya
 
VIP Call Girls Kolkata Ananya 🤌 8250192130 🚀 Vip Call Girls Kolkata
VIP Call Girls Kolkata Ananya 🤌  8250192130 🚀 Vip Call Girls KolkataVIP Call Girls Kolkata Ananya 🤌  8250192130 🚀 Vip Call Girls Kolkata
VIP Call Girls Kolkata Ananya 🤌 8250192130 🚀 Vip Call Girls Kolkataanamikaraghav4
 
VIP Kolkata Call Girl Dum Dum 👉 8250192130 Available With Room
VIP Kolkata Call Girl Dum Dum 👉 8250192130  Available With RoomVIP Kolkata Call Girl Dum Dum 👉 8250192130  Available With Room
VIP Kolkata Call Girl Dum Dum 👉 8250192130 Available With Roomdivyansh0kumar0
 

Recently uploaded (20)

VIP Kolkata Call Girl Kestopur 👉 8250192130 Available With Room
VIP Kolkata Call Girl Kestopur 👉 8250192130  Available With RoomVIP Kolkata Call Girl Kestopur 👉 8250192130  Available With Room
VIP Kolkata Call Girl Kestopur 👉 8250192130 Available With Room
 
sasti delhi Call Girls in munirka 🔝 9953056974 🔝 escort Service-
sasti delhi Call Girls in munirka 🔝 9953056974 🔝 escort Service-sasti delhi Call Girls in munirka 🔝 9953056974 🔝 escort Service-
sasti delhi Call Girls in munirka 🔝 9953056974 🔝 escort Service-
 
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝
 
VIP Kolkata Call Girl Salt Lake 👉 8250192130 Available With Room
VIP Kolkata Call Girl Salt Lake 👉 8250192130  Available With RoomVIP Kolkata Call Girl Salt Lake 👉 8250192130  Available With Room
VIP Kolkata Call Girl Salt Lake 👉 8250192130 Available With Room
 
VIP 7001035870 Find & Meet Hyderabad Call Girls LB Nagar high-profile Call Girl
VIP 7001035870 Find & Meet Hyderabad Call Girls LB Nagar high-profile Call GirlVIP 7001035870 Find & Meet Hyderabad Call Girls LB Nagar high-profile Call Girl
VIP 7001035870 Find & Meet Hyderabad Call Girls LB Nagar high-profile Call Girl
 
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
 
Low Rate Call Girls Kolkata Avani 🤌 8250192130 🚀 Vip Call Girls Kolkata
Low Rate Call Girls Kolkata Avani 🤌  8250192130 🚀 Vip Call Girls KolkataLow Rate Call Girls Kolkata Avani 🤌  8250192130 🚀 Vip Call Girls Kolkata
Low Rate Call Girls Kolkata Avani 🤌 8250192130 🚀 Vip Call Girls Kolkata
 
Call Girls Service Dwarka @9999965857 Delhi 🫦 No Advance VVIP 🍎 SERVICE
Call Girls Service Dwarka @9999965857 Delhi 🫦 No Advance  VVIP 🍎 SERVICECall Girls Service Dwarka @9999965857 Delhi 🫦 No Advance  VVIP 🍎 SERVICE
Call Girls Service Dwarka @9999965857 Delhi 🫦 No Advance VVIP 🍎 SERVICE
 
Rohini Sector 26 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
Rohini Sector 26 Call Girls Delhi 9999965857 @Sabina Saikh No AdvanceRohini Sector 26 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
Rohini Sector 26 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
 
Gram Darshan PPT cyber rural in villages of india
Gram Darshan PPT cyber rural  in villages of indiaGram Darshan PPT cyber rural  in villages of india
Gram Darshan PPT cyber rural in villages of india
 
Russian Call Girls in Kolkata Samaira 🤌 8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Samaira 🤌  8250192130 🚀 Vip Call Girls KolkataRussian Call Girls in Kolkata Samaira 🤌  8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Samaira 🤌 8250192130 🚀 Vip Call Girls Kolkata
 
VIP Kolkata Call Girls Salt Lake 8250192130 Available With Room
VIP Kolkata Call Girls Salt Lake 8250192130 Available With RoomVIP Kolkata Call Girls Salt Lake 8250192130 Available With Room
VIP Kolkata Call Girls Salt Lake 8250192130 Available With Room
 
定制(CC毕业证书)美国美国社区大学毕业证成绩单原版一比一
定制(CC毕业证书)美国美国社区大学毕业证成绩单原版一比一定制(CC毕业证书)美国美国社区大学毕业证成绩单原版一比一
定制(CC毕业证书)美国美国社区大学毕业证成绩单原版一比一
 
FULL ENJOY Call Girls In Mayur Vihar Delhi Contact Us 8377087607
FULL ENJOY Call Girls In Mayur Vihar Delhi Contact Us 8377087607FULL ENJOY Call Girls In Mayur Vihar Delhi Contact Us 8377087607
FULL ENJOY Call Girls In Mayur Vihar Delhi Contact Us 8377087607
 
Call Girls Service Adil Nagar 7001305949 Need escorts Service Pooja Vip
Call Girls Service Adil Nagar 7001305949 Need escorts Service Pooja VipCall Girls Service Adil Nagar 7001305949 Need escorts Service Pooja Vip
Call Girls Service Adil Nagar 7001305949 Need escorts Service Pooja Vip
 
Model Call Girl in Jamuna Vihar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in  Jamuna Vihar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in  Jamuna Vihar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Jamuna Vihar Delhi reach out to us at 🔝9953056974🔝
 
Call Girls In Mumbai Central Mumbai ❤️ 9920874524 👈 Cash on Delivery
Call Girls In Mumbai Central Mumbai ❤️ 9920874524 👈 Cash on DeliveryCall Girls In Mumbai Central Mumbai ❤️ 9920874524 👈 Cash on Delivery
Call Girls In Mumbai Central Mumbai ❤️ 9920874524 👈 Cash on Delivery
 
VIP Call Girls Kolkata Ananya 🤌 8250192130 🚀 Vip Call Girls Kolkata
VIP Call Girls Kolkata Ananya 🤌  8250192130 🚀 Vip Call Girls KolkataVIP Call Girls Kolkata Ananya 🤌  8250192130 🚀 Vip Call Girls Kolkata
VIP Call Girls Kolkata Ananya 🤌 8250192130 🚀 Vip Call Girls Kolkata
 
young call girls in Uttam Nagar🔝 9953056974 🔝 Delhi escort Service
young call girls in Uttam Nagar🔝 9953056974 🔝 Delhi escort Serviceyoung call girls in Uttam Nagar🔝 9953056974 🔝 Delhi escort Service
young call girls in Uttam Nagar🔝 9953056974 🔝 Delhi escort Service
 
VIP Kolkata Call Girl Dum Dum 👉 8250192130 Available With Room
VIP Kolkata Call Girl Dum Dum 👉 8250192130  Available With RoomVIP Kolkata Call Girl Dum Dum 👉 8250192130  Available With Room
VIP Kolkata Call Girl Dum Dum 👉 8250192130 Available With Room
 

Apache Falcon _ Hadoop User Group France 22-sept-2014

  • 1. HUG France - 22 Sept 2014 - @Criteo Data Management platform for Hadoop Cédric Carbone, Talend CTO @carbone Jean-Baptiste Onofré, Falcon Committer @jbonofre Ce support est mis à disposition selon les termes de la Licence Creative Commons Attribution - Pas d’Utilisation Commerciale - Pas de Modification 2.0 France. - http://creativecommons.org/licenses/by-nc-nd/2.0/fr/ © Talend 2014 1
  • 2. Overview • Falcon is a Data Management solution for Hadoop • Falcon in production at InMobi since 2012 • InMobi gave Falcon to ASF in April 2013 • Falcon is in Apache incubation • Falcon embedded per default inside HDP • Falcon leverages a lot of Apache components - Oozie, Ambari, ActiveMQ, HCat, Sqoop… • Committer/PPMC/IPMC: - #8 InMobi - #5 Hortonworks - #1 Talend © Talend 2014 2
  • 3. Why Falcon? © Talend 2014 3
  • 4. Why Falcon? © Talend 2014 4
  • 5. What is Falcon? • Data Motion Import, Export, CDC • Policy-based Lifecycle Management Retention, Replication, Archival, Anonymization of PII data • Process orchestration and scheduling Late data handling, reprocessing, dependency checking, etc. Multi-cluster management to support Local/Global Aggregations, Rollups, etc. • Data Governance Lineage, Audit, SLA © Talend 2014 5
  • 6. Falcon - The Solution! • Introduces a higher layer of abstraction – Data Set Decouples a data location and its properties from workflows Understanding the life-time of a feed will allow for implicit validation of the processing rules • Provides the key services for data processing apps Common data services are simple directives, No need to define them verbosely in each job Allows process owners to keep their processing specific to their application logic Sits in the execution path, intercepts to handle OOB data / retries etc. • Promotes Polyglot Programming Does not do any heavy lifting but delegates to tools with in the Hadoop ecosystem © Talend 2014 6
  • 7. Falcon Basic Concepts : Data Pipelines • Cluster: : Represents the Hadoop cluster • Feed: Defines a “dataset” • Process: Consumes feeds, invokes processing logic & produces feeds • Entity Actions: submit, list, dependency, schedule, suspend, resume, status, definition, delete, update © Talend 2014 7
  • 8. Falcon Entity Relationships © Talend 2014 8
  • 9. Cluster Entity <?xml version="1.0"?> <cluster colo=”talend-datacenter" description="" name=”prod-cluster"> <interfaces> Needed by distcp for replications Writing to HDFS <interface type="readonly" endpoint="hftp://nn:50070" version="2.2.0" /> <interface type="write" endpoint="hdfs://nn:8020" version="2.2.0" /> <interface type="execute" endpoint=”rm:8050" version="2.2.0" /> <interface type="workflow" endpoint="http://os:11000/oozie/" version="4.0.0" /> <interface type=”registry" endpoint=”thrift://hms:9083" version=”0.12.0" /> <interface type="messaging" endpoint="tcp://mq:61616?daemon=true" version="5.1.6" /> </interfaces> <locations> Used to submit processes as MR Submit Oozie jobs <location name="staging" path="/apps/falcon/prod-cluster/staging" /> <location name="temp" path="/tmp" /> <location name="working" path="/apps/falcon/prod-cluster/working" /> </locations> </cluster> Hive metastore to register/ deregister partitions and get events on partition availability Used For alerts HDFS directories used by Falcon server © Talend 2014 9
  • 10. Feed Entity Feed run frequency in mins/hrs/days/mths <?xml version="1.0"?> <feed description=“" name=”testFeed” xmlns="uri:falcon:feed:0.1"> <frequency>hours(1)</frequency> <late-arrival cut-off="hours(6)”/> <groups>churnAnalysisFeeds</groups> <clusters> Late arrival cutoff <cluster name=”cluster-primary" type="source"> Feeds can belong to multiple groups <validity start="2012-01-01T00:00Z" end="2099-12-31T00:00Z"/> <retention limit="days(2)" action="delete"/> One or more source & target clusters for retention & replication </cluster> <cluster name=”cluster-secondary" type="target"> <validity start="2012-01-01T00:00Z" end="2099-12-31T00:00Z"/> <location type="data” path="/churn/weblogs/${YEAR}-${MONTH}-${DAY}-${HOUR} "/> <retention limit=”days(7)" action="delete"/> </cluster> </clusters> <locations> Global location across clusters - HDFS paths or Hive tables <location type="data” path="/weblogs/${YEAR}-${MONTH}-${DAY}-${HOUR} "/> </locations> <ACL owner=”hdfs" group="users" permission="0755"/> <schema location="/none" provider="none"/> </feed> Access Permissions © Talend 2014 10
  • 11. Process Entity <process name="process-test" xmlns="uri:falcon:process:0.1”> <clusters> <cluster name="cluster-primary"> Which cluster should the process run on and when <validity start="2011-11-02T00:00Z" end="2011-12-30T00:00Z" /> </cluster> How frequently does the process run , how many instances can be run in parallel and in what order </clusters> <parallel>1</parallel> <order>FIFO</order> <frequency>days(1)</frequency> <inputs> <input end="today(0,0)" start="today(0,0)" feed="feed-clicks-raw" name="input" /> </inputs> <outputs> Input & output feeds for process <output instance="now(0,2)" feed="feed-clicks-clean" name="output" /> </outputs> <workflow engine="pig" path="/apps/clickstream/clean-script.pig" /> <retry policy="periodic" delay="minutes(10)" attempts="3"/> <late-process policy="exp-backoff" delay="hours(1)"> <late-input input="input" workflow-path="/apps/clickstream/late" /> </late-process> </process> The processing logic. Retry policy on failure Handling late input feeds © Talend 2014 11
  • 12. High Level Architecture © Talend 2014 12
  • 13. Demo #1 : my first feed • Start Falcon on a single cluster • Submission of one simple feed • Check into Oozie the generated job © Talend 2014 13
  • 14. Replication & Retention • Sophisticated retention policies expressed in one place • Simplify data retention for audit, compliance, or for data re-processing © Talend 2014 14
  • 15. Demo #2: processes notification  Motivation: being able to be notified about processes activity “outside” of the cluster  Falcon manages workflow, send JMS notification, and a Camel route react to the notification  Result: trigger an action (Camel route) when a notification is sent by Falcon (eviction, late-arrival, process execution, …)  Mix BigData/Hadoop and ESB technologies © Talend 2014 15
  • 16. Demo #2: workflow Falcon ActiveMQ Camel 2014-03-19 11:25:43,273 | INFO | LCON.my-process] | process-listener | rg.apache.camel.util.CamelLogger 176 | 74 - org.apache.camel.camel-core - 2.13.0.SNAPSHOT | Exchange[ExchangePattern: InOnly, BodyType: java.util.HashMap, Body: {brokerUrl=tcp://localhost:61616, timeStamp=2014-03-19T10:24Z, status=SUCCEEDED, logFile=hdfs://localhost:8020/falcon/staging/falcon/workflows/process/my-process/logs/instancePaths-2013-11-15-06- 05.csv, feedNames=output, runId=0, entityType=process, nominalTime=2013-11-15T06:05Z, brokerTTL=4320, workflowUser=null, entityName=my-process, feedInstancePaths=hdfs://localhost:8020/data/output, operation=GENERATE, logDir=null, workflowId=0000026-140319105443372-oozie-jbon-W, cluster=local, brokerImplClass=org.apache.activemq.ActiveMQConnectionFactory, topicName=FALCON.my-process}] © Talend 2014 16
  • 17. Demo #2 : Data Notification • All notification will be in ActiveMQ • Subscribers : Camel routes • Add some files and see the notification system working! • http://blog.nanthrax.net/2014/03/hadoop-cdc-and-processes-notification-with- apache-falcon-apache-activemq-and-apache-camel/ © Talend 2014 17
  • 18. Data Pipeline Tracing © Talend 2014 18
  • 19. Topologies • STANDALONE – Single Data Center – Single Falcon Server – Hadoop jobs and relevant processing involves only one cluster • DISTRIBUTED – Multiple Data Centers – Falcon Server per DC – Multiple instances of hadoop clusters and workflow schedulers © Talend 2014 19
  • 20. Multi cluster failover  Motivation: replicate subset of data from one cluster to another, guarantee the different eviction depending of the data subset  Falcon manages workflow on primary cluster, and replication on failover cluster  Result: support business continuity without requiring full data reprocessing © Talend 2014 20
  • 21. Roadmap  Improvement on the late-arrival messages in FALCON.ENTITY.TOPIC (ActiveMQ)  Feed implicit processes (no need of process) providing “native” CDC  Straight forward MR usage (without pig, oozie workflow XML, …)  More data acquisition  Monitoring/Management/Designer dashboard  TLP !!! © Talend 2014 21
  • 22. Questions? Data Management platform for Hadoop Cédric Carbone, Talend CTO @carbone Jean-Baptiste Onofré, Falcon Committer @jbonofre Ce support est mis à disposition selon les termes de la Licence Creative Commons Attribution - Pas d’Utilisation Commerciale - Pas de Modification 2.0 France. - http://creativecommons.org/licenses/by-nc-nd/2.0/fr/ © Talend 2014 22