Structured data migration between Hadoop and RDBMS, by Louis Rabiet (Squid Solutions)
By extracting data stored in a relational database with an advanced BI tool, and by sending that data to Tachyon via Kafka, several Spark sessions can work on the same dataset while limiting duplication. This gives us controlled-cost communication between the original database and Spark, which makes it possible to dynamically re-inject data modified with MLlib while still working on up-to-date data. Preliminary results will be shared during this presentation.
2. Who am I?
• Full-stack engineer at Squid Solutions.
• Specialised in big data.
• Fun fact: I sleep alone in my tent on top of some of the world's highest mountains.
3. What do I do?
• Development of an analytics toolbox.
• No setup. No SQL. No compromise.
• Generate SQL with a REST API (sketched below).
It is open source!
https://github.com/openbouquet
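As a flavour of the API, here is a hypothetical sketch of asking a Bouquet-style REST endpoint for the SQL behind a selection; the host, path and query parameter are illustrative, not the documented Open Bouquet API.

// Hypothetical sketch only: endpoint and parameters are illustrative.
import java.net.{HttpURLConnection, URL}
import scala.io.Source

object BouquetSqlSketch {
  def main(args: Array[String]): Unit = {
    val url = new URL("http://localhost:8080/analytics/myProject/query?style=SQL")
    val conn = url.openConnection().asInstanceOf[HttpURLConnection]
    conn.setRequestMethod("GET")
    conn.setRequestProperty("Accept", "application/json")
    // The interesting part of the response is the generated SQL, not rows.
    println(Source.fromInputStream(conn.getInputStream).mkString)
    conn.disconnect()
  }
}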
4. Topic of today
• You need scalability?
• You need a machine learning toolbox?
Hadoop is the solution.
• But you still need structured data?
Our tool provides a solution.
=> We need both!
5. What does that mean?
• Creation of a dataset in Bouquet
• Send the dataset to Spark
• Enrich inside Spark (sketched below)
• Re-inject into the original database
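A minimal Spark 1.5-era sketch of the middle of this pipeline, assuming the dataset has landed in Tachyon as Avro files and that the spark-avro package is on the classpath; paths, column names and the "enrichment" itself (a derived ratio standing in for MLlib work) are illustrative.

// Sketch under assumptions: dataset delivered to Tachyon as Avro,
// spark-avro available; all names and paths are illustrative.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions._

object EnrichSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("bouquet-enrich"))
    val sqlContext = new SQLContext(sc)
    // 1) The dataset created in Bouquet, delivered through Kafka.
    val df = sqlContext.read.format("com.databricks.spark.avro")
      .load("tachyon://tachyon-master:19998/topics/artist_gender")
    // 2) Enrich: a derived ratio column as a stand-in for MLlib work.
    val total = df.agg(sum("count")).first().getLong(0)
    val enriched = df.withColumn("share", col("count") / total.toDouble)
    // 3) Persist the enriched dataset for the re-injection step.
    enriched.write.parquet("tachyon://tachyon-master:19998/datasets/artist_gender_enriched")
    sc.stop()
  }
}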
6. How do we do it?
[Diagram: user input drives Bouquet, which connects the relational DB to Spark]
8. How does it work?
[Architecture diagram: Bouquet, relational DB, Kafka, Spark, HDFS/Tachyon, Hive Metastore]
The user selects the data; Bouquet generates the corresponding SQL code.
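For illustration, this is the kind of SQL such a selection might expand to; the table and column names are hypothetical, chosen to match the ArtistGender example used later in the deck.

// Illustrative only: hypothetical generated SQL for the ArtistGender dataset.
val generatedSql =
  """SELECT a.gender AS gender,
    |       COUNT(*) AS count
    |FROM artists a
    |GROUP BY a.gender""".stripMargin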
9. How does it work?
Data is read from the SQL database.
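A hedged sketch of that read with plain JDBC (connection details are illustrative; the query stands in for the SQL generated in the previous step):

// Sketch: reading the dataset over JDBC. Credentials, host and table
// names are illustrative.
import java.sql.DriverManager

object JdbcReadSketch {
  def main(args: Array[String]): Unit = {
    val conn = DriverManager.getConnection(
      "jdbc:postgresql://db-host:5432/music", "user", "secret")
    val stmt = conn.createStatement()
    // The query is the SQL that Bouquet generated on the previous step.
    val rs = stmt.executeQuery(
      "SELECT gender, COUNT(*) AS count FROM artists GROUP BY gender")
    while (rs.next()) {
      println(s"${rs.getString("gender")} -> ${rs.getLong("count")}")
    }
    rs.close(); stmt.close(); conn.close()
  }
}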
10. How does it work?
The BI tool creates an Avro schema and sends the data to Kafka.
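A hedged sketch of that step, assuming the Confluent schema-registry serializer that the deck mentions later; broker, registry URL, topic name and values are illustrative.

// Sketch: producing one Avro record to Kafka through the Confluent
// schema-registry serializer. All endpoints and values are illustrative.
import java.util.Properties
import org.apache.avro.Schema
import org.apache.avro.generic.GenericData
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object AvroProducerSketch {
  val schemaJson =
    """{"type":"record","name":"ArtistGender","fields":[
      |{"name":"count","type":"long"},{"name":"gender","type":"string"}]}""".stripMargin

  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "broker:9092")
    props.put("key.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer")
    props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer")
    props.put("schema.registry.url", "http://registry:8081")
    val producer = new KafkaProducer[AnyRef, AnyRef](props)
    val schema = new Schema.Parser().parse(schemaJson)
    val record = new GenericData.Record(schema)
    record.put("count", 42L)
    record.put("gender", "female")
    producer.send(new ProducerRecord[AnyRef, AnyRef]("artist_gender", record))
    producer.close()
  }
}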
11. How does it work?
The Kafka broker(s) receive the data.
12. How does it work?
The Hive metastore is updated and the HDFS connector writes into HDFS.
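A hedged sketch of a Kafka Connect HDFS sink configuration with Hive integration turned on, assuming the Confluent kafka-connect-hdfs connector; all names and URLs are illustrative.

# Sketch: HDFS sink with Hive integration (illustrative values).
name=bouquet-hdfs-sink
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
tasks.max=1
topics=artist_gender
hdfs.url=hdfs://namenode:8020
flush.size=1000
# Keep the Hive metastore in sync with what lands in HDFS.
hive.integration=true
hive.metastore.uris=thrift://metastore-host:9083
schema.compatibility=BACKWARD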
13. How to keep the data structured?
Use a schema registry (Avro in Kafka): each schema has a corresponding Kafka topic and a distinct Hive table.
{
  "type": "record",
  "name": "ArtistGender",
  "fields": [
    {"name": "count", "type": "long"},
    {"name": "gender", "type": "string"}
  ]
}
14. Challenges
- Auto-creation of topics/tables in Hive for each dataset coming from Bouquet.
- JDBC reads are too slow for something like Kafka.
- Type conversion issues: for example, null is not supported in all cases (issue 272 on schema-registry).
- Versions: Kafka 0.9.0, Tachyon 0.7.1, Spark 1.5.2 with HortonWorks 2.3.4 (Dec 2015).
- Hive: setting the warehouse directory.
- In Tachyon: setting up the hostname (both settings sketched below).
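For reference, a hedged sketch of those last two settings; the values are illustrative, and the Tachyon entry assumes the 0.7-era environment variable.

# hive-site.xml property (shown as key=value for brevity):
# where Hive writes managed tables.
hive.metastore.warehouse.dir=/user/hive/warehouse

# conf/tachyon-env.sh: the master address every Tachyon client must use.
export TACHYON_MASTER_ADDRESS=tachyon-master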
15. Tachyon?
• Use it as an in-memory filesystem to replace HDFS.
• Interact with Spark using the HDFS plugin (sketched below).
• Transparent from the user's point of view.
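A minimal sketch of what "transparent" means in practice: from Spark's point of view, Tachyon is just another Hadoop-compatible filesystem, and only the URI scheme changes (host, port and path are illustrative).

// Sketch: reading from Tachyon exactly as from HDFS, only the scheme differs.
import org.apache.spark.{SparkConf, SparkContext}

object TachyonReadSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("tachyon-read"))
    val lines = sc.textFile("tachyon://tachyon-master:19998/datasets/artist_gender")
    println(lines.count())
    sc.stop()
  }
}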
17. Re-injection
Two solutions (the write-back itself is sketched below):
• The Spark user notifies Bouquet that the data has changed (using a custom function).
• Bouquet pulls the data from Spark.
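Whichever option is used, the enriched data has to land back in the original database. A hedged Spark 1.5-era sketch of that write-back over JDBC; paths, table name and connection details are illustrative.

// Sketch: writing the enriched dataset back into the relational database.
import java.util.Properties
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object ReinjectionSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("reinjection"))
    val sqlContext = new SQLContext(sc)
    // The enriched dataset produced earlier (path is illustrative).
    val enriched = sqlContext.read.parquet(
      "tachyon://tachyon-master:19998/datasets/artist_gender_enriched")
    val props = new Properties()
    props.put("user", "user")
    props.put("password", "secret")
    enriched.write.jdbc("jdbc:postgresql://db-host:5432/music",
      "artist_gender_enriched", props)
    sc.stop()
  }
}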
18. We use it for real!
We are collaborating with La Poste to use Spark and the re-injection mechanism together with Bouquet and a geographical visualisation.
19. In the future
• Notebook integration.
• We have a DSL for the Bouquet API; we may want built-in Spark support.
• Improve scalability (bulk unload and Kafka fine-tuning).