#PARISDATAENG’ MEETUP
CHANGE DATA CAPTURE WITH
DATA COLLECTOR
HUGO LARCHER
DATA & SOFTWARE ENGINEER @OVH
@hugoch
DIMITRI CAPITAINE
SENIOR DEVOPS BIG DATA @OVH
@pirion
Big data common pattern
Message bus, Twitter feed, website statistics, …
Big data Cluster
Software (Hadoop)
Compute (CPU/RAM)
Storage
Data « at rest »
Data « in motion »
CSV, JSON, Database dump, …
Better understanding
Better decisions
Collect & store various data
Perform massive operations
Analyze, exploit data
Data + OVH = ❤
OVH Data Collector
Cloudera platform (fully managed)
Analytics Data Platform
Apache Spark as a Service
Machine Learning
NVIDIA NGC catalog
Collect data | Store data | Process, analyze | Learn, predict
Free Lab
Free Lab
Free Lab
Object storage
Block storage
Storage dedicated servers
File storage
Managed databases
Logs & Metrics
Data @ OVH
OVH Service
?
OVH SI
Data @ OVH
WHY :
• Centralize Data
• Extract value from data
• Enrich knowledge
• Set up data-driven processes
Datawarehouse
Datalake
Data @ OVH
How to get data without impacting the production database?
Datalake
Data @ OVH
Just Query ?
Datalake
Select *
A lightweight data replication tool
Data Collector
Data Collector client
Kafka
Data Source
Data Collector
OVH Cloud
Sink
Source
Data Collector client
Performance
• 300 000 events/s in "Query" Mode
• ~40 000 events/s in "Change data capture" mode
Reliability
• Failure tolerant
• Encrypted
• Source Filter
Simplicity
• Remote control by API
• Modular
A lightweight data replication tool
Data @ OVH
How to get data from Kafka to the datalake?
Datalake
Kafka
A Distributed data replication Job
Data collector Ingest
Data Ingest
Kafka
Kafka Data Ingest
OVH Cloud
Sink
Source
Datalake
Data Ingest
Capability
• Auto Scalable
• Distributed
• Streaming
• Flink Powered
Performance
• > 5 000 000 events/s
Reliability
• Failure tolerant
A Distributed data replication Job
Datalake
Data Collector Suite
Why use Kafka between collector and ingest?
Private Network
Data Collector
Private Network
Datalake
Data Ingest
Data Pipeline
Stream fail & recover
Kafka
Datalake
Data Collector
Binary log
JSON Event
JSON Event
Data Ingest
SQL
Database or agent fails
Agent offset is stored for recovery
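The recovery idea on this slide can be sketched as follows. This is a minimal illustration and not the agent's actual implementation: `OffsetStore`, `run_agent`, the JSON file layout and the `crash_at` switch are all hypothetical names invented for the example, and the "binary log" is just a Python list.

```python
import json
import os
import tempfile


class OffsetStore:
    """Persists the next position to read in a small JSON file (hypothetical format)."""

    def __init__(self, path):
        self.path = path

    def load(self):
        # No file yet means the agent has never run: start from the beginning.
        if not os.path.exists(self.path):
            return 0
        with open(self.path) as f:
            return json.load(f)["offset"]

    def save(self, offset):
        # Write to a temporary file, then rename, so a crash never leaves
        # a half-written offset file behind.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"offset": offset}, f)
        os.replace(tmp, self.path)


def run_agent(log, store, sink, crash_at=None):
    """Forward log entries to the sink, resuming from the stored offset."""
    offset = store.load()
    while offset < len(log):
        if crash_at is not None and offset == crash_at:
            raise RuntimeError("simulated agent failure")
        sink.append(log[offset])   # send the event
        offset += 1
        store.save(offset)         # record progress only after the send
    return offset


if __name__ == "__main__":
    log = [f"event-{i}" for i in range(5)]
    sink = []
    store = OffsetStore(os.path.join(tempfile.mkdtemp(), "offset.json"))
    try:
        run_agent(log, store, sink, crash_at=3)
    except RuntimeError:
        pass                       # the agent "died" before event-3
    run_agent(log, store, sink)    # restart: resumes at the stored offset
    print(sink)
```

Because the offset is saved after the send, a crash between the two steps would resend one event on restart: the sketch favours duplicates over losses.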
Kafka server fails
Agent waits
Stops sending events
Flink – Phoenix fails
Job fails
Offset not committed = replay
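A small in-memory simulation of "offset not committed = replay" is sketched below. It uses no Kafka or Flink API: `Partition`, `run_job` and `fail_on_batch` are hypothetical names, and the sink is a plain dictionary written by key. Writing by key is an assumption made here so that replayed events overwrite themselves instead of creating duplicates.

```python
class Partition:
    """A stand-in for one Kafka partition plus its committed consumer offset."""

    def __init__(self, events):
        self.events = list(events)
        self.committed = 0  # offset of the next event to read after a restart


def run_job(partition, sink, batch_size=2, fail_on_batch=None):
    """Read from the committed offset; commit only once a batch is fully written."""
    batch_no = 0
    pos = partition.committed
    while pos < len(partition.events):
        batch = partition.events[pos:pos + batch_size]
        for key, value in batch:
            sink[key] = value  # upsert by key, so a replay is harmless
            if fail_on_batch == batch_no:
                # Fail after a partial write, before the offset is committed.
                raise RuntimeError("simulated sink failure")
        pos += len(batch)
        partition.committed = pos  # commit only after the whole batch succeeded
        batch_no += 1
    return partition.committed


if __name__ == "__main__":
    events = [(1, "a"), (2, "b"), (3, "c"), (4, "d"), (5, "e")]
    partition, sink = Partition(events), {}
    try:
        run_job(partition, sink, fail_on_batch=1)
    except RuntimeError:
        pass                    # job failed; committed offset still points at event 3
    run_job(partition, sink)    # restart: events 3 and 4 are read again
    print(partition.committed, sink)
```

The restarted job re-reads the batch that failed, which gives at-least-once delivery; the keyed write is what keeps the end state identical to a run without failure.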
Remote Control & Query mode
Kafka
Datalake
Data Collector
Statements
JSON Event
JSON Event
Data Ingest
SQL
API
Bus gRPC
The replication solution
Data Collector suite @OVH
> 200 agents deployed
Private Network
Hive
Data Ingest
Private Network SI
Marathon
X 100
Team infra
Cloud
X 13
FLIGHT TRACKING WITH
DATACOLLECTOR
HUGO LARCHER
DATA & SOFTWARE ENGINEER @OVH
@hugoch
Use case : Aeronautics industry
ADS-B
GPS position
Speed
Heading
Altitude
~0.5 msg/s
RECEIVE & STORE, COMPUTE, VISUALIZE
Plane tracking
ADS-B message transmission
ADS-B message structure
Field | DF | CA | ICAO | DATA | PI
Bits | 5 | 3 | 24 | 56 | 24
DF: Downlink format (Mode S → DF=17)
CA: Capability
ICAO: Aircraft unique registration
DATA: Data
PI: Parity for checksum
Example raw message: 8D4840D6202CC371C32CE0576098
Field type | Hexadecimal | Binary | Decimal
DF | 8D (shared with CA) | 10001 | 17
CA | 8D (shared with DF) | 101 | 5
ICAO | 4840D6 | 010010000100000011010110 | ...
[TC] DATA | 202CC371C32CE0 | [00100]000001011001100001101110001110000110010110011100000 | [4] ...
PARITY | 576098 | 010101110110000010011000 | ...
Source https://mode-s.org/decode/adsb/introduction.html
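The field layout above (5 + 3 + 24 + 56 + 24 bits) can be checked against the example message with a few lines of Python. This is only a sketch of the bit slicing: `parse_adsb` and the dictionary keys are names chosen for the example, and the parity bits are extracted but not verified.

```python
def parse_adsb(hex_msg):
    """Split a 112-bit ADS-B frame, given as 28 hex characters, into its fields."""
    if len(hex_msg) != 28:
        raise ValueError("expected 28 hex characters (112 bits)")
    bits = int(hex_msg, 16)

    def field(start, length):
        # Extract `length` bits starting `start` bits from the most significant bit.
        shift = 112 - start - length
        return (bits >> shift) & ((1 << length) - 1)

    return {
        "df": field(0, 5),                    # downlink format
        "ca": field(5, 3),                    # capability
        "icao": f"{field(8, 24):06X}",        # aircraft address
        "tc": field(32, 5),                   # type code: first 5 bits of DATA
        "data": f"{field(32, 56):014X}",      # 56-bit payload
        "parity": f"{field(88, 24):06X}",     # parity bits, not checked here
    }


if __name__ == "__main__":
    print(parse_adsb("8D4840D6202CC371C32CE0576098"))
```

For the example message this yields DF 17, CA 5, ICAO 4840D6 and type code 4, matching the table.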
Let’s scan the sky...
ADS-B USB receiver → Raspberry Pi v2 → dump1090 → OVH Data Collector with custom source → Kafka
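What a custom source has to do between dump1090 and Kafka can be sketched as below. The slides do not show the real source, so everything here is an assumption: the sketch supposes dump1090 is emitting raw frames as text lines of the form `*<hex>;`, and `raw_line_to_event`, the `receiver` field and the `rpi-1` identifier are hypothetical.

```python
import json
import re

# A raw Mode S frame is 56 bits (14 hex characters) or 112 bits (28 hex characters).
RAW_FRAME = re.compile(r"^\*([0-9A-Fa-f]{14}|[0-9A-Fa-f]{28});$")


def raw_line_to_event(line, receiver_id="rpi-1"):
    """Turn one raw text line into a JSON event string, or None if it is not a frame."""
    match = RAW_FRAME.match(line.strip())
    if match is None:
        return None  # ignore noise and partial lines
    return json.dumps({"receiver": receiver_id, "raw": match.group(1).upper()})


if __name__ == "__main__":
    print(raw_line_to_event("*8D4840D6202CC371C32CE0576098;\n"))
    print(raw_line_to_event("not a frame"))
```

Each returned string is the kind of JSON event that would then be published to a Kafka topic; the publishing step is left out because it depends on the client library in use.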
...and push to Kafka...
OVH Data Collector
1 day
...with small footprint
Plane tracking
Plane tracking... at scale
17,000+ receivers
200k flights/day
105,000,000 pts/hour
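To put the figures above in per-second terms, here is a quick back-of-the-envelope calculation. It assumes the quoted rates are uniform averages, which the slide does not state; the function names are illustrative.

```python
POINTS_PER_HOUR = 105_000_000
FLIGHTS_PER_DAY = 200_000


def points_per_second(points_per_hour):
    """Average ingest rate implied by an hourly point count."""
    return points_per_hour / 3600


def points_per_flight(points_per_hour, flights_per_day):
    """Average number of position points recorded per flight."""
    return points_per_hour * 24 / flights_per_day


if __name__ == "__main__":
    print(round(points_per_second(POINTS_PER_HOUR)))            # about 29,167 points/s
    print(points_per_flight(POINTS_PER_HOUR, FLIGHTS_PER_DAY))  # 12,600 points per flight
```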
ANALYTICS
DATA PLATFORM
For this demo:
3 master nodes
3 compute nodes
2 × 2 TB NVMe per node
8 vCores at 2.4 GHz per node
80 GB RAM per node
Welcome Analytics Data Platform!
From zero…
… to production!
Flexible infra, flexible payment
On top of OVH Public Cloud
Competitive pricing
Ready to use Hadoop cluster
Secured and configured
Performance
Soon : High-speed storage instances (NVMe)
Within 1 hour
Analyzing a flight dataset
Archive data
3 months ~3.5TB
Raspberry Pi
OVH DATA COLLECTOR
ANALYTICS DATA
PLATFORM
+
Aggregating flightpaths
Aggregating routes
YUL-CDG
Coverage map
Source: https://eng.uber.com/h3/
DEMO TIME
Useful URLs
• Lab Data Collector: https://labs.ovh.com/ovh-data-collector
• Data Collector Agent GitHub: https://github.com/Pirionfr/lookatch-agent
• Lab Spark as a Service: https://labs.ovh.com/analytics-data-compute
• Big Data Analytics Data Platform offer: https://www.ovh.com/fr/platform/big-data/analytics-data-platform.xml
• Big Data Cloudera offer: https://www.ovh.com/fr/platform/big-data/managed-cluster.xml
• AI solutions: https://www.ovh.com/fr/platform/ai-machine-learning.xml
• NVIDIA NGC: https://www.ovh.com/fr/public-cloud/instances/gpu-tesla.xml
• Lab Machine Learning: https://labs.ovh.com/machine-learning-platform
• Lab Premium Databases: https://labs.ovh.com/ha-database
Thanks all!
✈

一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理
 
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
 
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
 
Overview IFM June 2024 Consumer Confidence INDEX Report.pdf
Overview IFM June 2024 Consumer Confidence INDEX Report.pdfOverview IFM June 2024 Consumer Confidence INDEX Report.pdf
Overview IFM June 2024 Consumer Confidence INDEX Report.pdf
 
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
 
Econ3060_Screen Time and Success_ final_GroupProject.pdf
Econ3060_Screen Time and Success_ final_GroupProject.pdfEcon3060_Screen Time and Success_ final_GroupProject.pdf
Econ3060_Screen Time and Success_ final_GroupProject.pdf
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
 
一比一原版莱斯大学毕业证(rice毕业证)如何办理
一比一原版莱斯大学毕业证(rice毕业证)如何办理一比一原版莱斯大学毕业证(rice毕业证)如何办理
一比一原版莱斯大学毕业证(rice毕业证)如何办理
 
一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理
一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理
一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理
 
A gentle exploration of Retrieval Augmented Generation
A gentle exploration of Retrieval Augmented GenerationA gentle exploration of Retrieval Augmented Generation
A gentle exploration of Retrieval Augmented Generation
 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证如何办理
一比一原版(曼大毕业证书)曼尼托巴大学毕业证如何办理一比一原版(曼大毕业证书)曼尼托巴大学毕业证如何办理
一比一原版(曼大毕业证书)曼尼托巴大学毕业证如何办理
 
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
 

Change Data Capture with Data Collector @OVH

  • 1. #PARISDATAENG’ MEETUP CHANGE DATA CAPTURE WITH DATA COLLECTOR HUGO LARCHER DATA & SOFTWARE ENGINEER @OVH @hugoch DIMITRI CAPITAINE SENIOR DEVOPS BIG DATA @OVH @pirion
  • 2. Big data common pattern Message bus, Twitter feed, website statistics, … Big data Cluster Software (Hadoop) Compute (CPU/RAM) Storage Data « at rest » Data « in motion » CSV, JSON, Database dump, … Better understanding Better decisions Analyze, Exploit data Collect & store various data Perform massive operations
  • 3. Data + OVH = ❤ OVH Data collector Cloudera platform (fully managed) Analytics Data Platform Apache Spark as a Service Machine Learning NVIDIA NGC catalog Collect data Store data Process, Analyze Learn, predict Free Lab Free Lab Free Lab Object storage Block storage Storage dedicated servers File storage Managed databases Logs & Metrics
  • 4. Data @ OVH OVH Service ? OVH SI
  • 5. Data @ OVH WHY: • Centralize data • Extract value from data • Enrich knowledge • Set up data-driven processes Datawarehouse Datalake
  • 6. Data @ OVH How to get data without impacting the production database? Datalake
  • 7. Data @ OVH Just query? Datalake Select *
  • 8. A lightweight data replication tool Data Collector
  • 9. Data Collector client A lightweight data replication tool Kafka Data Source Data Collector OVH Cloud Sink Source
  • 10. Data Collector client Performance • 300 000 events/s in "Query" Mode • ~40 000 events/s in "Change data capture" Mode Reliability • Failure tolerant • Encrypted • Source Filter Simplicity • Remote control by API • Modular A lightweight data replication tool
  • 11. Data @ OVH How to get data from Kafka to the datalake? Datalake Kafka
  • 12. A Distributed data replication Job Data collector Ingest
  • 13. Data Ingest A Distributed data replication Job Kafka Kafka Data Ingest OVH Cloud SinkSource Datalake
  • 14. Data Ingest Capability • Auto Scalable • Distributed • Streaming • Flink Powered Performance • > 5 000 000 events/s Reliability • Failure tolerant A Distributed data replication Job Datalake
  • 15. Data Collector Suite Why use Kafka between collector and ingest?
  • 16. Data Collector suite Why use Kafka between collector and ingest? Private Network Data Collector Private Network Datalake Data Ingest
  • 17. Stream fail & recover Data Pipeline
  • 18. Data Pipeline Stream fail & recover Kafka Datalake Data Collector Binary log JSON Event JSON Event Data Ingest SQL database or Agent fails Agent offset stored for recovery
  • 19. Data Pipeline Stream fail & recover Kafka Datalake Data Collector Binary log JSON Event JSON Event Data Ingest SQL Server-Kafka fails Agent waits Stops sending events
  • 20. Data Pipeline Stream fail & recover Kafka Datalake Data Collector Binary log JSON Event JSON Event Data Ingest SQL Flink – Phoenix fails Job fails Offset not committed = Replay
  • 21. Remote Control & Query mode
  • 22. Remote Control & Query mode Kafka Datalake Data Collector statements JSON Event JSON Event Data Ingest SQL API Bus gRPC
  • 23. The replication solution Data collector suite @OVH
  • 24. Data Collector suite @OVH > 200 agents deployed Private Network Hive Data Ingest Private Network SI Marathon X 100 Team infra Cloud X 13
  • 25. FLIGHT TRACKING WITH DATACOLLECTOR HUGO LARCHER DATA & SOFTWARE ENGINEER @OVH @hugoch
  • 26. Use case: Aeronautics industry
  • 28. ADS-B GPS position Speed Heading Altitude ~0.5 msg/s 1 RECEIVE & STORE 2 COMPUTE 3 VISUALIZE 4 Plane tracking
  • 30. ADS-B message structure (field widths in bits): DF 5 | CA 3 | ICAO 24 | DATA 56 | PI 24. DF: Downlink format, Mode S → DF=17. CA: Capability. ICAO: Aircraft unique registration. DATA: Data. PI: Parity for checksum. Example raw message: 8D4840D6202CC371C32CE0576098. Hexadecimal: 8D 4840D6 202CC371C32CE0 576098. Binary: 10001 101 010010000100 000011010110 [00100]0000010110011 00001101110001110000 110010110011100000 010101110110 000010011000. Decimal: 17 5 [4] ................................ Field type: DF CA ICAO [TC] DATA PARITY. Source: https://mode-s.org/decode/adsb/introduction.html
  • 31. Let’s scan the sky... ADSB USB receiver Raspberry Pi v2 dump1090 OVH Data Collector with custom source Kafka
  • 32. ...and push to Kafka... OVH Data Collector 1 day
  • 35. Plane tracking... at scale 17,000+ receivers 200k flights/day 105,000,000 pts/hour ANALYTICS DATA PLATFORM For this demo: 3 master nodes 3 compute nodes 2x NVMe 2 TB per node 2.4 GHz 8 vCores per node 80 GB RAM per node
  • 36. Welcome Analytics Data Platform! From zero… to production! Within 1 hour. Flexible infra, flexible payment. On top of OVH Public Cloud. Competitive pricing. Ready to use Hadoop cluster. Secured and configured. Performance. Soon: High-speed storage instances (NVMe).
  • 37. Analyzing a flight dataset Archive data 3 months ~3.5TB Raspberry Pi OVH DATA COLLECTOR ANALYTICS DATA PLATFORM +
  • 42. Useful URLs ✓ Lab Data Collector: https://labs.ovh.com/ovh-data-collector ✓ Data Collector Agent Github: https://github.com/Pirionfr/lookatch-agent ✓ Lab Spark as a Service: https://labs.ovh.com/analytics-data-compute ✓ Big data Analytics Data Platform offer: https://www.ovh.com/fr/platform/big-data/analytics-data-platform.xml ✓ Big Data Cloudera offer: https://www.ovh.com/fr/platform/big-data/managed-cluster.xml ✓ AI solutions: https://www.ovh.com/fr/platform/ai-machine-learning.xml ✓ NVIDIA NGC: https://www.ovh.com/fr/public-cloud/instances/gpu-tesla.xml ✓ Lab Machine Learning: https://labs.ovh.com/machine-learning-platform ✓ Lab Premium Databases: https://labs.ovh.com/ha-database
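
The field layout on slide 30 (DF 5 bits, CA 3 bits, ICAO 24 bits, DATA 56 bits, PI 24 bits) can be illustrated with a small Python sketch that splits the example raw message into its fields. This is not code from the talk: the names `AdsbFrame` and `split_adsb_frame` are hypothetical, and the sketch only slices the frame. It does not verify the parity or decode position, speed, or altitude from the DATA field.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdsbFrame:
    """Fields of a 112-bit ADS-B frame (hypothetical container, not from the talk)."""
    df: int       # downlink format, 5 bits (17 for ADS-B over Mode S)
    ca: int       # capability, 3 bits
    icao: str     # aircraft address, 24 bits, kept as 6 hex characters
    tc: int       # type code, first 5 bits of the DATA field
    data: str     # DATA field, 56 bits, kept as 14 hex characters
    parity: str   # parity field, 24 bits, kept as 6 hex characters


def split_adsb_frame(raw_hex: str) -> AdsbFrame:
    """Split a 28-character hex string into the fields listed on slide 30."""
    raw_hex = raw_hex.strip().upper()
    if len(raw_hex) != 28:
        raise ValueError("expected 28 hex characters (112 bits)")
    bits = int(raw_hex, 16)           # raises ValueError on non-hex input
    df = bits >> 107                  # top 5 of 112 bits
    ca = (bits >> 104) & 0b111        # next 3 bits
    icao = raw_hex[2:8]               # bits 9-32
    data = raw_hex[8:22]              # bits 33-88
    tc = int(data[:2], 16) >> 3       # first 5 bits of DATA
    parity = raw_hex[22:28]           # bits 89-112
    return AdsbFrame(df, ca, icao, tc, data, parity)


frame = split_adsb_frame("8D4840D6202CC371C32CE0576098")
print(frame)
```

For the example message from the slide, this yields DF=17, CA=5, ICAO=4840D6 and TC=4, matching the Decimal row of the slide's table.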