Open-Source Analytics Stack on MongoDB, with Schema, Pierre-Alain Jachiet and Aurélien Gervasi
PyParis 2017
http://pyparis.org


  1. OPEN SOURCE ANALYTICS on MongoDB, with Schema. By OCTO & The Refiners: Pierre-Alain Jachiet, Aurélien Gervasi
  2. Pierre-Alain Jachiet, Aurélien Gervasi: DATA SCIENTIST
  3. Data strategist; applied mathematician; analysts, with developer skills: DATA SCIENTIST
  4. DATA PROCESSOR: data strategist; applied mathematician; analysts, with developer skills
  5. “The major activity in the data science process is identifying, accessing and preparing data for analysis”
  6. From MongoDB data … to Superset colors
  7. OCTO TECHNOLOGY > THERE IS A BETTER WAY. So, what's the point of MongoDB?
  8. MongoDB - The Leading NoSQL Database. Also shown: Cassandra, Redis, HBase
  9. MongoDB - A NoSQL database in the big leagues of RDBMS. [Chart: popularity score by db-engines.com, 2013 to 2017, https://db-engines.com/en/ranking]
  10. Why MongoDB? Semi-structured data? Performance? Scalability? Yes! And more generally because it is “natural for developers”, “a pleasure to use from the developer perspective”, “MongoDB is fast to get started”
  11. Developers speak JSON (= document with schema) … the modern data exchange format … [Chart: XML vs JSON, 2008 to 2017]
  12. Developers speak JSON (= document with schema) … the modern data exchange format … and MongoDB eats JSON. [Chart: XML vs JSON, 2008 to 2017]
  13. MongoDB, a common technology to store data
  14. So far, so good
  15. So far, so good. But time goes on and data comes in. And, one day, someone has a dream… AI
  16. Please! An analyst for this data! Hey!
  17. Please! An analyst for this data! But NoSQL / JSON data is not natural for analysts?
  18. Analysts use SQL. MongoDB: aggregation framework
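
Slide 18 contrasts SQL with the MongoDB aggregation framework. As a minimal sketch of that gap, the snippet below asks the same question ("how many restaurants per cuisine?") both ways. The table, field names and sample rows are hypothetical, and SQLite stands in for the analyst's SQL database so the example runs without a server. The pipeline is only defined, not executed; with pymongo it would be passed to `collection.aggregate(PIPELINE)`.

```python
import sqlite3

# Hypothetical sample rows: (restaurant name, cuisine).
ROWS = [("Wendy's", "Hamburgers"), ("Riviera", "American"), ("Tov", "American")]

# The analyst's version: one declarative SQL statement.
SQL = (
    "SELECT cuisine, COUNT(*) AS n FROM restaurants "
    "GROUP BY cuisine ORDER BY n DESC, cuisine"
)

# The MongoDB version: a list of stage documents for the aggregation framework.
PIPELINE = [
    {"$group": {"_id": "$cuisine", "n": {"$sum": 1}}},
    {"$sort": {"n": -1, "_id": 1}},
]


def count_by_cuisine(rows):
    """Run the SQL query against an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE restaurants (name TEXT, cuisine TEXT)")
    conn.executemany("INSERT INTO restaurants VALUES (?, ?)", rows)
    result = conn.execute(SQL).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(count_by_cuisine(ROWS))
```

Both forms express the same grouping, but the pipeline is a nested JSON-like structure that has to be learned separately from SQL.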
  19. Analyst land: analysts work with tables … and relations. Developer land: developers like JSON … and nesting
  20. Relational database. Application: code, model layer (= API to data, map + contract). Data schema. Developer land: developers have a data model in the code. Analyst land: analysts work with a data schema
  21. MongoDB. Application: code, model layer (= API to data, map + contract). Data schema. Developer land: developers have a data model in the code. Analyst land: analysts work with a data schema > But MongoDB is schema-less
  22. The usual reaction… Hack a pipeline to flatten the MongoDB data (MongoDB; Pymongo + scripts; Python notebooks; CSV file; Excel, Access, SAS). Difficulties: ☉ Hard job for the analyst ☉ Batch / no real time ☉ Not robust to changes => Difficult to industrialize
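
To make the "usual reaction" of slide 22 concrete, here is a rough sketch of such an ad hoc flattening script. It is an illustration, not code from the talk: the function names are hypothetical, documents are plain Python dicts (as pymongo would return them), and arrays are simply dumped as JSON strings.

```python
import csv
import io
import json


def flatten(doc, prefix=""):
    """Flatten nested dicts into dotted keys; lists are dumped as JSON strings."""
    flat = {}
    for key, value in doc.items():
        name = prefix + key
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        elif isinstance(value, list):
            flat[name] = json.dumps(value)
        else:
            flat[name] = value
    return flat


def to_csv(docs):
    """Write a batch of documents to CSV text."""
    rows = [flatten(d) for d in docs]
    # The columns are whatever fields happened to appear in this batch.
    columns = sorted({key for row in rows for key in row})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_csv([{"_id": "12", "f4": {"sf1": "pyparis"}, "f5": ["fu", "bar"]}]))
```

The sketch shows the listed difficulties directly: it runs as a batch, the column set changes whenever a new field appears, and arrays are not usable from a spreadsheet.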
  23. MongoDB enterprise solution
  24. Mongo BI Connector: developed for integration with SQL-based BI tools. An SQL compatibility layer to MongoDB. Diagram labels: MongoDB, JSON data, data model, Mongo SQLD, Mongo DRDL* - SQL translator, post-processor, data table, MySQL wire, Tableau. * DRDL = Document - Relational Definition Language
  25. Mongo BI Connector - Pros & Cons. Pros: ☉ Official & supported ☉ Install it and go. Cons: ☉ Commercial → MongoDB Enterprise license ☉ Closed-source → black box ☉ Limited performance? ☉ Mandatory use of SQL wire protocol
  26. An open-source solution?
  27. Open-source bricks put together! Streaming data from MongoDB to PostgreSQL. Components: MongoDB, Mongo Connector, Doc-manager (PostgreSQL), Pymongo-Schema, PostgreSQL
  28. Mongo Connector: Connect them all! Synchronize a MongoDB database with another database: ☉ MongoDB ☉ Solr ☉ Elasticsearch. Developed by MongoDB Labs. Python 2.6, 2.7, 3.3+. MongoDB 2.4, 2.6, 3.0, 3.2, and 3.4. Apache License 2.0. https://github.com/mongodb-labs/mongo-connector
  29. Mongo Connector: Connect them all! Diagram: changes in the DB write new events to the oplog file; (differential) replication from the primary to the secondaries; Mongo Connector propagates the changes to the other DB
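
The oplog mechanism on slide 29 can be sketched as replaying a stream of change events onto a target store. The example below is a simplified illustration, not Mongo Connector's code. It assumes oplog-like entries with the fields `op` (`i`, `u`, `d`), `ns`, `o` and `o2`, handles only `$set` / `$unset` updates, and uses an in-memory dict as a hypothetical target instead of a real database.

```python
def apply_oplog(entries, target):
    """Replay simplified oplog entries onto `target`, a dict {namespace: {_id: doc}}."""
    for entry in entries:
        collection = target.setdefault(entry["ns"], {})
        op = entry["op"]
        if op == "i":
            # Insert: `o` is the full document.
            collection[entry["o"]["_id"]] = dict(entry["o"])
        elif op == "u":
            # Update: `o2` identifies the document, `o` describes the change.
            doc_id = entry["o2"]["_id"]
            doc = collection.setdefault(doc_id, {"_id": doc_id})
            doc.update(entry["o"].get("$set", {}))
            for field in entry["o"].get("$unset", {}):
                doc.pop(field, None)
        elif op == "d":
            # Delete: `o` carries the _id of the removed document.
            collection.pop(entry["o"]["_id"], None)
        # Other entry types (commands, no-ops) are ignored in this sketch.
    return target


if __name__ == "__main__":
    events = [
        {"op": "i", "ns": "test.restaurants", "o": {"_id": 1, "cuisine": "Bakery"}},
        {"op": "u", "ns": "test.restaurants", "o2": {"_id": 1},
         "o": {"$set": {"cuisine": "Pizza"}}},
    ]
    print(apply_oplog(events, {}))
```

Because only the differences are replayed, the target can follow the source continuously instead of being rebuilt by a batch export.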
  30. Doc-manager: Do you speak PostgreSQL? ☉ Translate a modification request from Mongo Connector to the target database ☉ Speak the target database language. Developed by Hopwork. Python 2.7, 3.4+. PostgreSQL 9.5. Apache License 2.0. https://github.com/Hopwork/mongo-connector-postgresql
  31. Doc-manager: Do you speak PostgreSQL? MongoDB world: { _id: “12”, f1: “fu”, f2: true, f3: 42, f4: { sf1: “pyparis”, sf2: 2017 }, f5: [ “fu”, “bar”, “fubar” ] }. SQL world: columns (_id, f1, f2, f3) with row (12, “fu”, true, 42); columns (f4.sf1, f4.sf2) with row (‘pyparis’, 2017); columns (_id, value, id_parent) with rows (1, ‘fu’, 12), (2, ‘bar’, 12), (3, ‘fubar’, 12)
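
The document-to-tables translation of slide 31 can be sketched as follows. This is an illustration under stated assumptions, not the doc-manager's implementation: sub-document fields become dotted columns of the parent row, each array of scalars becomes a child table whose rows point back through `id_parent`, and the child table name (`<table>__<field>`) and the function name `to_rows` are hypothetical.

```python
def to_rows(doc, table):
    """Split one document into {table_name: [row, ...]}."""
    main = {}
    tables = {table: [main]}

    def walk(value, prefix):
        for key, item in value.items():
            name = prefix + key
            if isinstance(item, dict):
                # Sub-document: its fields become dotted columns of the main row.
                walk(item, name + ".")
            elif isinstance(item, list):
                # Array: one child row per element, linked to the parent _id.
                child = table + "__" + name.replace(".", "__")
                tables[child] = [
                    {"_id": i, "value": element, "id_parent": doc["_id"]}
                    for i, element in enumerate(item, start=1)
                ]
            else:
                main[name] = item

    walk(doc, "")
    return tables


if __name__ == "__main__":
    example = {
        "_id": "12", "f1": "fu", "f2": True, "f3": 42,
        "f4": {"sf1": "pyparis", "sf2": 2017},
        "f5": ["fu", "bar", "fubar"],
    }
    for name, rows in to_rows(example, "docs").items():
        print(name, rows)
```

Arrays of sub-documents would need a further level of recursion, which this sketch leaves out.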
  32. Pymongo Schema: A mapping to rule them all. ☉ Scan the entire database to define its data model schema ☉ Generate a mapping file flattening the MongoDB schema into an SQL-compatible schema. “Homemade”. Python 2.7. Apache License 2.0. https://github.com/pajachiet/pymongo-schema
  33. Demo
  34. MongoDB example dataset: restaurants in New York > Address & coordinates > Cuisine type > List of grades ☉ Nested data structure
  36. EXTRACT: read the entire database to extract its data model schema. Returns: ☉ Field name and field nesting ☉ Field completion (frequency and ratio) ☉ Field type
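
The EXTRACT step on slide 36 amounts to scanning every document and recording, per field, how often it occurs and which types it takes. The sketch below illustrates that idea on plain Python dicts. It is not pymongo-schema's implementation or output format: the function name is hypothetical, it reports Python type names rather than BSON types, and it does not descend into arrays.

```python
from collections import Counter


def extract_schema(docs):
    """Return {dotted_field: {"count", "ratio", "types"}} for a list of documents."""
    counts = Counter()
    types = {}

    def walk(doc, prefix):
        for key, value in doc.items():
            name = prefix + key
            counts[name] += 1
            types.setdefault(name, set()).add(type(value).__name__)
            if isinstance(value, dict):
                # Nested sub-document: record its fields under a dotted name.
                walk(value, name + ".")

    for doc in docs:
        walk(doc, "")
    total = len(docs)
    return {
        name: {"count": n, "ratio": n / total, "types": sorted(types[name])}
        for name, n in counts.items()
    }


if __name__ == "__main__":
    sample = [
        {"name": "A", "address": {"zipcode": "10001"}},
        {"name": "B", "cuisine": "Thai"},
    ]
    for field, stats in sorted(extract_schema(sample).items()):
        print(field, stats)
```

A ratio below 1.0 flags an optional field, and more than one entry in `types` flags a field whose type varies between documents, both of which matter when choosing SQL columns.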
  38. TOSQL: read a schema to generate a MongoDB/SQL mapping. Returns: ☉ Mapping file used by the doc-manager
  39. Same table: column “cuisine”. New table: “restaurants__address__coord”
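
Slides 38 and 39 describe turning a schema into a table layout: scalar fields stay in the collection's table, while an array such as `address.coord` gets its own table named `restaurants__address__coord`. The sketch below reproduces only that naming rule. The input format, the function name and the child-table columns are assumptions, and the real mapping file used by the doc-manager is not reproduced here.

```python
def schema_to_tables(collection, schema):
    """Derive {table_name: [column, ...]} from {dotted_field: type_name}."""
    tables = {collection: []}
    for field, type_name in sorted(schema.items()):
        if type_name == "dict":
            # A sub-document is only a container; its leaves are listed separately.
            continue
        if type_name == "list":
            # Arrays get their own table, named with double underscores.
            child = collection + "__" + field.replace(".", "__")
            tables[child] = ["_id", "value", "id_parent"]  # assumed child columns
        else:
            tables[collection].append(field)
    return tables


if __name__ == "__main__":
    restaurant_schema = {
        "cuisine": "str",
        "address": "dict",
        "address.zipcode": "str",
        "address.coord": "list",
    }
    print(schema_to_tables("restaurants", restaurant_schema))
```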
  40. Mongo Connector: check for updates in the oplog file, send update commands with data. Doc-manager: translate commands and make SQL requests
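
The last step on slide 40, translating a command into an SQL request, could look roughly like the sketch below. It builds a parameterised PostgreSQL upsert for one flattened row. This is an assumption about the general approach, not the doc-manager's code: the function name is hypothetical, `%s` placeholders follow the psycopg2 convention, and the table name is assumed to be trusted.

```python
def upsert_statement(table, row, key="_id"):
    """Return (sql, params) for a PostgreSQL upsert of one flattened row."""
    columns = list(row)
    # Double-quote identifiers so dotted column names such as "f4.sf1" are valid.
    quoted = ['"{}"'.format(c.replace('"', '""')) for c in columns]
    placeholders = ", ".join(["%s"] * len(columns))
    updates = ", ".join(
        "{0} = EXCLUDED.{0}".format(q) for c, q in zip(columns, quoted) if c != key
    )
    action = "DO UPDATE SET " + updates if updates else "DO NOTHING"
    sql = 'INSERT INTO "{}" ({}) VALUES ({}) ON CONFLICT ("{}") {}'.format(
        table, ", ".join(quoted), placeholders, key, action
    )
    return sql, [row[c] for c in columns]


if __name__ == "__main__":
    statement, params = upsert_statement("restaurants", {"_id": "12", "cuisine": "Thai"})
    print(statement)
    print(params)
```

`INSERT ... ON CONFLICT` is available from PostgreSQL 9.5, the version listed on the doc-manager slide; whether the doc-manager relies on it is not stated in the slides.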
  41. Diagram: MongoDB, Mongo Connector, Doc-manager, Pymongo-Schema, PostgreSQL
  42. Time to play with your analytics tools
  43. Adding an open-source BI tool... Components: MongoDB, Mongo Connector, Doc-manager (PostgreSQL), Pymongo-Schema, PostgreSQL
  44. Now, in Superset colors! “Superset is a data exploration platform designed to be visual, intuitive and interactive.” Developed by Airbnb. Python 2.7, 3.4, 3.5. Apache License 2.0. https://github.com/airbnb/superset
  45. SQL Lab
  46. Wrap up
  47. Take-home message ☉ Issues for analysts with NoSQL frameworks > Developer-oriented languages > Nested data structure > Schema-less ☉ An open-source stack to unlock analysis of MongoDB data > Extract a MongoDB schema > Normalize the data model > Real-time synchronization to PostgreSQL ☉ Currently running in production environments
  48. Come, use and contribute! :) pajachiet@octo.com agervasi@octo.com https://github.com/mongodb-labs/mongo-connector https://github.com/Hopwork/mongo-connector-postgresql https://github.com/pajachiet/pymongo-schema https://github.com/airbnb/superset
  49. Remember to stress that this is an open-source stack ☉ Collaborative ☉ Free. But hey! It's Open-Source!
  50. “I analyse my data to understand myself” “I automatically learn to perform complex tasks from data” “I equip myself with advanced tools that enable complex, interactive analyses” Dataviz, Search, Statistics, Learning. Data-driven organisation
  51. MongoDB popularity. https://db-engines.com/en/ranking
  52. Analysts use SQL. The Mongo way: aggregation framework
  53. Superset: architecture of the visualisations. PostgreSQL tables, Datasource (SQLA tables), Visualisations, Dashboard
  54. Please! An analyst for this data! But analysts don't speak JSON… Should we call the developer?