EuroPython 2017 - Bonono - Simple ETL in python 3.5+

Romain Dorgueil
Romain DorgueilMaker & Technical Advisor at MakerSquad
bonobo
Simple ETL in Python 3.5+
Romain Dorgueil
@rdorgueil
CTO/Hacker in Residence
Technical Co-founder
(Solo) Founder
Eng. Manager
Developer
L’Atelier BNP Paribas
WeAreTheShops
RDC Dist. Agency
Sensio/SensioLabs
AffiliationWizard
Felt too young in a Linux Cauldron
Dismantler of Atari computers
Basic literacy using a Minitel
Guitars & accordions
Off by one baby
Inception
STARTUP ACCELERATION PROGRAMS
NO HYPE, JUST BUSINESS
launchpad.atelier.net
bonobo
Simple ETL in Python 3.5+
• History of Extract Transform Load
• Concept ; Existing tools ; Related tools ; Ignition
• Practical Bonobo
• Tutorial ; Under the hood ; Demo ; Plugins & Extensions ; More demos
• Wrap up
• Present & future ; Resources ; Sprint ; Feedback
Plan
Once upon a time…
Extract Transform Load
• Not new. Popular concept in the 1970s [1] [2]
• Everywhere. Commerce, websites, marketing, finance, …
[1] https://en.wikipedia.org/wiki/Extract,_transform,_load
[2] https://www.sas.com/en_us/insights/data-management/what-is-etl.html
Extract Transform Load
foo
bar
baz
Extract Transform Load
Extract Transform Load
foo
bar
baz
Extract
Transform Load
Transform

more
Join


DB
HTTP POST
log?
Data Integration Tools
• Pentaho Data Integration (IDE/Java)
• Talend Open Studio (IDE/Java)
• CloverETL (IDE/Java)
Talend Open Studio
EuroPython 2017 - Bonono - Simple ETL in python 3.5+
EuroPython 2017 - Bonono - Simple ETL in python 3.5+
Data Integration Tools
• Java + IDE based, for most of them
• Data transformations are blocks
• IO flow managed by connections
• Execution
GUI first, eventually code :-(
In the Python world …
• Bubbles (https://github.com/stiivi/bubbles)
• PETL (https://github.com/alimanfoo/petl)
• (insert a few more here)
• and now… Bonobo (https://www.bonobo-project.org/)
You can also use amazing libraries including 

Joblib, Dask, Pandas, Toolz, 

but ETL is not their main focus.
Other scales…
Small Automation Tools
• Mostly aimed at simple recurring tasks.
• Cloud / SaaS only.

Big Data Tools
• Can do anything. And probably more. Fast.
• Either needs an infrastructure, or cloud based.
Story time
Partner 1 Data Integration
WE GOT DEALS !!!
Partner 1 Partner 2 Partner 3 Partner 4 Partner 5
Partner 6 Partner 7 Partner 8 Partner 9 …
Tiny bug there…
Can you fix it ?
EuroPython 2017 - Bonono - Simple ETL in python 3.5+
My need
• A data integration / ETL tool using code as configuration.
• Preferably Python code.
• Something that can be tested (I mean, by a machine).
• Something that can use inheritance.
• Fast & cheap install on laptop, thought for servers too.
And that’s Bonobo
It is …
• A framework to write ETL jobs in Python 3 (3.5+)
• Using the same concepts as the old ETLs.
• You can use OOP!
Code first. Eventually a GUI will come.
It is NOT …
• Pandas / R Dataframes
• Dask (but will probably implement a dask.distributed
strategy someday)
• Luigi / Airflow
• Hadoop / Big Data / Big Query / …
• A monkey (spoiler : it’s an ape, damnit french language…)
Let’s see…
Create a project
~ $ pip install bonobo
~ $ bonobo init europython/tutorial
~ $ bonobo run europython/tutorial
TEMPLATE
~ $ bonobo run .
…demo
Write our own
import bonobo
def extract():
yield 'euro'
yield 'python'
yield '2017'
def transform(s):
return s.title()
def load(s):
print(s)
graph = bonobo.Graph(
extract,
transform,
load,
)
EXAMPLE_1
~ $ bonobo run .
…demo
EXAMPLE_1
~ $ bonobo run first.py
…demo
Under the hood…
graph = bonobo.Graph(…)
BEGIN
CsvReader(
'clients.csv'
)
InsertOrUpdate(
'db.site',
'clients',
key='guid'
)
update_crm
retrieve_orders
Graph…
class Graph:
def __init__(self, *chain):
self.edges = {}
self.nodes = []
self.add_chain(*chain)
def add_chain(self, *nodes, _input=None, _output=None):
# ...
bonobo.run(graph)
or in a shell…
$ bonobo run main.py
BEGIN
CsvReader(
'clients.csv'
)
InsertOrUpdate(
'db.site',
'clients',
key='guid'
)
update_crm
retrieve_orders
BEGIN
CsvReader(
'clients.csv'
)
InsertOrUpdate(
'db.site',
'clients',
key='guid'
)
update_crm
retrieve_orders
Context
+
Thread
Context
+
Thread
Context
+
Thread
Context
+
Thread
Context…
class GraphExecutionContext:
def __init__(self, graph, plugins, services):
self.graph = graph
self.nodes = [
NodeExecutionContext(node, parent=self)
for node in self.graph
]
self.plugins = [
PluginExecutionContext(plugin, parent=self)
for plugin in plugins
]
self.services = services
Strategy…
class ThreadPoolExecutorStrategy(Strategy):
def execute(self, graph, plugins, services):
context = self.create_context(graph, plugins, services)
executor = self.create_executor()
for node_context in context.nodes:
executor.submit(
self.create_runner(node_context)
)
while context.alive:
self.sleep()
executor.shutdown()
return context
</ implementation details >
Transformations
a.k.a
nodes in the graph
Functions
def get_more_infos(api, **row):
more = api.query(row.get('id'))
return {
**row,
**(more or {}),
}
Generators
def join_orders(order_api, **row):
for order in order_api.get(row.get('customer_id')):
yield {
**row,
**order,
}
Iterators
extract = (
'foo',
'bar',
'baz',
)
extract = range(0, 1001, 7)
Classes
class RiminizeThis:
def __call__(self, **row):
return {
**row,
'Rimini': 'Woo-hou-wo...',
}
Anything, as long as it’s callable().
Configurable classes
from bonobo.config import Configurable, Option, Service
class QueryDatabase(Configurable):
table_name = Option(str, default=‘customers')
database = Service('database.default')
def call(self, database, **row):
customer = database.query(self.table_name, customer_id=row['clien
return {
**row,
'is_customer': bool(customer),
}
Configurable classes
from bonobo.config import Configurable, Option, Service
class QueryDatabase(Configurable):
table_name = Option(str, default=‘customers')
database = Service('database.default')
def call(self, database, **row):
customer = database.query(self.table_name, customer_id=row['clien
return {
**row,
'is_customer': bool(customer),
}
Configurable classes
from bonobo.config import Configurable, Option, Service
class QueryDatabase(Configurable):
table_name = Option(str, default=‘customers')
database = Service('database.default')
def call(self, database, **row):
customer = database.query(self.table_name, customer_id=row['clien
return {
**row,
'is_customer': bool(customer),
}
Configurable classes
from bonobo.config import Configurable, Option, Service
class QueryDatabase(Configurable):
table_name = Option(str, default=‘customers')
database = Service('database.default')
def call(self, database, **row):
customer = database.query(self.table_name, customer_id=row['clien
return {
**row,
'is_customer': bool(customer),
}
Configurable classes
query_database = QueryDatabase(
table_name='test_customers',
database='database.testing',
)
Services
Define as names
class QueryDatabase(Configurable):
database = Service('database.default')
def call(self, database, **row):
return { … }
Runtime injection
import bonobo
graph = bonobo.Graph(...)
def get_services():
return {
‘database.default’: MyDatabaseImpl()
}
Bananas!
Library
bonobo.FileReader(…)
bonobo.CsvReader(…)
bonobo.JsonReader(…)
bonobo.PickleReader(…)
bonobo.ExcelReader(…)
bonobo.XMLReader(…)
… more to come
bonobo.FileWriter(…)
bonobo.CsvWriter(…)
bonobo.JsonWriter(…)
bonobo.PickleWriter(…)
bonobo.ExcelWriter(…)
bonobo.XMLWriter(…)
… more to come
Library
bonobo.Limit(limit)
bonobo.PrettyPrinter()
bonobo.Filter(…)
… more to come
Extensions & Plugins
Console Plugin
Jupyter Plugin
SQLAlchemy Extension
bonobo_sqlalchemy.Select(
query,
*,
pack_size=1000,
limit=None
)
bonobo_sqlalchemy.InsertOrUpdate(
table_name,
*,
fetch_columns,
insert_only_fields,
discriminant,
…
)
PREVIEW
Docker Extension
$ pip install bonobo[docker]
$ bonobo runc myjob.py
PREVIEW
Dev Kit
PREVIEW
https://github.com/python-bonobo/bonobo-devkit
More examples
?
EXAMPLE_1 -> EXAMPLE_2
…demo
• Use filesystem service.
• Write to a CSV
• Also write to JSON
EXAMPLE_3
Rimini open data
~/bdk/demos/europython2017
Europython attendees
featuring…
jupyter notebook

selenium & firefox
~/bdk/demos/sirene
French companies registry
featuring…
docker
postgresql
sql alchemy
Wrap up
Young
• First commit : December 2016
• 23 releases, ~420 commits, 4 contributors
• Current « stable » 0.4.3
• Target : 1.0 early 2018
Python 3.5+
• {**}
• async/await
• (…, *, …)
• GIL :(
1.0
• 100% Open-Source.
• Light & Focused.
• Very few dependencies.
• Comprehensive standard library.
• The rest goes to plugins and extensions.
Small scale
• 1 minute to install
• Easy to deploy
• NOT : Big Data, Statistics, Analytics …
• IS : Lean manufacturing for data
Interwebs are crazy
Data Processing for Humans
www.bonobo-project.org
docs.bonobo-project.org
bonobo-slack.herokuapp.com
github.com/python-bonobo
Let me know what you think!
Sprint
• Sprints at Europython are amazing
• Nice place to learn about Bonobo, basics, etc.
• Nice place to contribute while learning.
• You’re amazing.
Thank you!
@monkcage @rdorgueil
https://goo.gl/e25eoa
bonobo
@monkcage
1 of 82

Recommended

Simple ETL in python 3.5+ with Bonobo - PyParis 2017 by
Simple ETL in python 3.5+ with Bonobo - PyParis 2017Simple ETL in python 3.5+ with Bonobo - PyParis 2017
Simple ETL in python 3.5+ with Bonobo - PyParis 2017Romain Dorgueil
1.7K views82 slides
Simple ETL in Python 3.5+ - PolyConf Paris 2017 - Lightning Talk (10 minutes) by
Simple ETL in Python 3.5+ - PolyConf Paris 2017 - Lightning Talk (10 minutes)Simple ETL in Python 3.5+ - PolyConf Paris 2017 - Lightning Talk (10 minutes)
Simple ETL in Python 3.5+ - PolyConf Paris 2017 - Lightning Talk (10 minutes)Romain Dorgueil
410 views37 slides
Simple Data Engineering in Python 3.5+ — Pycon.DE 2017 Karlsruhe — Bonobo ETL by
Simple Data Engineering in Python 3.5+ — Pycon.DE 2017 Karlsruhe — Bonobo ETLSimple Data Engineering in Python 3.5+ — Pycon.DE 2017 Karlsruhe — Bonobo ETL
Simple Data Engineering in Python 3.5+ — Pycon.DE 2017 Karlsruhe — Bonobo ETLRomain Dorgueil
617 views45 slides
Happy Go Programming Part 1 by
Happy Go Programming Part 1Happy Go Programming Part 1
Happy Go Programming Part 1Lin Yo-An
5.1K views92 slides
Introduction to Programming in Go by
Introduction to Programming in GoIntroduction to Programming in Go
Introduction to Programming in GoAmr Hassan
579 views105 slides
Streams for (Co)Free! by
Streams for (Co)Free!Streams for (Co)Free!
Streams for (Co)Free!John De Goes
2.1K views35 slides

More Related Content

What's hot

Python Coroutines, Present and Future by
Python Coroutines, Present and FuturePython Coroutines, Present and Future
Python Coroutines, Present and Futureemptysquare
21.8K views39 slides
JVMLS 2016. Coroutines in Kotlin by
JVMLS 2016. Coroutines in KotlinJVMLS 2016. Coroutines in Kotlin
JVMLS 2016. Coroutines in KotlinAndrey Breslav
3K views38 slides
Docopt by
DocoptDocopt
DocoptRené Ribaud
600 views14 slides
GoFFIng around with Ruby #RubyConfPH by
GoFFIng around with Ruby #RubyConfPHGoFFIng around with Ruby #RubyConfPH
GoFFIng around with Ruby #RubyConfPHGautam Rege
865 views39 slides
A tour of Python by
A tour of PythonA tour of Python
A tour of PythonAleksandar Veselinovic
1.1K views42 slides
Explaining ES6: JavaScript History and What is to Come by
Explaining ES6: JavaScript History and What is to ComeExplaining ES6: JavaScript History and What is to Come
Explaining ES6: JavaScript History and What is to ComeCory Forsyth
1.7K views50 slides

What's hot(20)

Python Coroutines, Present and Future by emptysquare
Python Coroutines, Present and FuturePython Coroutines, Present and Future
Python Coroutines, Present and Future
emptysquare21.8K views
JVMLS 2016. Coroutines in Kotlin by Andrey Breslav
JVMLS 2016. Coroutines in KotlinJVMLS 2016. Coroutines in Kotlin
JVMLS 2016. Coroutines in Kotlin
Andrey Breslav3K views
GoFFIng around with Ruby #RubyConfPH by Gautam Rege
GoFFIng around with Ruby #RubyConfPHGoFFIng around with Ruby #RubyConfPH
GoFFIng around with Ruby #RubyConfPH
Gautam Rege865 views
Explaining ES6: JavaScript History and What is to Come by Cory Forsyth
Explaining ES6: JavaScript History and What is to ComeExplaining ES6: JavaScript History and What is to Come
Explaining ES6: JavaScript History and What is to Come
Cory Forsyth1.7K views
Intro to PHP for Beginners by mtlgirlgeeks
Intro to PHP for BeginnersIntro to PHP for Beginners
Intro to PHP for Beginners
mtlgirlgeeks3.8K views
Imugi: Compiler made with Python by Han Lee
Imugi: Compiler made with PythonImugi: Compiler made with Python
Imugi: Compiler made with Python
Han Lee931 views
The Kotlin Programming Language by intelliyole
The Kotlin Programming LanguageThe Kotlin Programming Language
The Kotlin Programming Language
intelliyole9.4K views
7 Common Mistakes in Go (2015) by Steven Francia
7 Common Mistakes in Go (2015)7 Common Mistakes in Go (2015)
7 Common Mistakes in Go (2015)
Steven Francia23.5K views
Ruby on rails tips by BinBin He
Ruby  on rails tipsRuby  on rails tips
Ruby on rails tips
BinBin He1.1K views
Python高级编程(二) by Qiangning Hong
Python高级编程(二)Python高级编程(二)
Python高级编程(二)
Qiangning Hong9.7K views
eMan Dev Meetup: Kotlin - A Language we should know it exists (part 02/03) 18... by eMan s.r.o.
eMan Dev Meetup: Kotlin - A Language we should know it exists (part 02/03) 18...eMan Dev Meetup: Kotlin - A Language we should know it exists (part 02/03) 18...
eMan Dev Meetup: Kotlin - A Language we should know it exists (part 02/03) 18...
eMan s.r.o.412 views
Beauty and Power of Go by Frank Müller
Beauty and Power of GoBeauty and Power of Go
Beauty and Power of Go
Frank Müller3.3K views
«Python на острие бритвы: PyPy project» Александр Кошкин, Positive Technologies by it-people
«Python на острие бритвы: PyPy project» Александр Кошкин, Positive Technologies«Python на острие бритвы: PyPy project» Александр Кошкин, Positive Technologies
«Python на острие бритвы: PyPy project» Александр Кошкин, Positive Technologies
it-people200 views
Command line arguments that make you smile by Martin Melin
Command line arguments that make you smileCommand line arguments that make you smile
Command line arguments that make you smile
Martin Melin1.1K views
Take Flight - Using Fly with the Play Framework by Asher Glynn
Take Flight - Using Fly with the Play FrameworkTake Flight - Using Fly with the Play Framework
Take Flight - Using Fly with the Play Framework
Asher Glynn846 views

Similar to EuroPython 2017 - Bonono - Simple ETL in python 3.5+

Simple ETL in python 3.5+ with Bonobo, Romain Dorgueil by
Simple ETL in python 3.5+ with Bonobo, Romain DorgueilSimple ETL in python 3.5+ with Bonobo, Romain Dorgueil
Simple ETL in python 3.5+ with Bonobo, Romain DorgueilPôle Systematic Paris-Region
1.7K views82 slides
Introduction of Pharo 5.0 by
Introduction of Pharo 5.0Introduction of Pharo 5.0
Introduction of Pharo 5.0Masashi Umezawa
2K views21 slides
Drupal 8: A story of growing up and getting off the island by
Drupal 8: A story of growing up and getting off the islandDrupal 8: A story of growing up and getting off the island
Drupal 8: A story of growing up and getting off the islandAngela Byron
923 views83 slides
FI MUNI 2012 - iOS Basics by
FI MUNI 2012 - iOS BasicsFI MUNI 2012 - iOS Basics
FI MUNI 2012 - iOS BasicsPetr Dvorak
960 views80 slides
Apache Calcite (a tutorial given at BOSS '21) by
Apache Calcite (a tutorial given at BOSS '21)Apache Calcite (a tutorial given at BOSS '21)
Apache Calcite (a tutorial given at BOSS '21)Julian Hyde
2.5K views88 slides
Django at Scale by
Django at ScaleDjango at Scale
Django at Scalebretthoerner
8.4K views43 slides

Similar to EuroPython 2017 - Bonono - Simple ETL in python 3.5+(20)

Drupal 8: A story of growing up and getting off the island by Angela Byron
Drupal 8: A story of growing up and getting off the islandDrupal 8: A story of growing up and getting off the island
Drupal 8: A story of growing up and getting off the island
Angela Byron923 views
FI MUNI 2012 - iOS Basics by Petr Dvorak
FI MUNI 2012 - iOS BasicsFI MUNI 2012 - iOS Basics
FI MUNI 2012 - iOS Basics
Petr Dvorak960 views
Apache Calcite (a tutorial given at BOSS '21) by Julian Hyde
Apache Calcite (a tutorial given at BOSS '21)Apache Calcite (a tutorial given at BOSS '21)
Apache Calcite (a tutorial given at BOSS '21)
Julian Hyde2.5K views
MFF UK - Introduction to iOS by Petr Dvorak
MFF UK - Introduction to iOSMFF UK - Introduction to iOS
MFF UK - Introduction to iOS
Petr Dvorak807 views
Living With Legacy Code by Rowan Merewood
Living With Legacy CodeLiving With Legacy Code
Living With Legacy Code
Rowan Merewood25.3K views
Ml goes fruitful by Preeti Negi
Ml goes fruitfulMl goes fruitful
Ml goes fruitful
Preeti Negi415 views
Python introduction by Roger Xia
Python introductionPython introduction
Python introduction
Roger Xia1.1K views
Discover GraphQL with Python, Graphene and Odoo by Odoo
Discover GraphQL with Python, Graphene and OdooDiscover GraphQL with Python, Graphene and Odoo
Discover GraphQL with Python, Graphene and Odoo
Odoo1.1K views
Django dev-env-my-way by Robert Lujo
Django dev-env-my-wayDjango dev-env-my-way
Django dev-env-my-way
Robert Lujo941 views
The goodies of zope, pyramid, and plone (2) by Dylan Jay
The goodies of zope, pyramid, and plone (2)The goodies of zope, pyramid, and plone (2)
The goodies of zope, pyramid, and plone (2)
Dylan Jay11.9K views
Dependency Injection Why is it awesome and Why should I care? by ColdFusionConference
Dependency Injection Why is it awesome and Why should I care?Dependency Injection Why is it awesome and Why should I care?
Dependency Injection Why is it awesome and Why should I care?
Python and Oracle : allies for best of data management by Laurent Leturgez
Python and Oracle : allies for best of data managementPython and Oracle : allies for best of data management
Python and Oracle : allies for best of data management
Laurent Leturgez192 views

Recently uploaded

SPICE PARK DEC2023 (6,625 SPICE Models) by
SPICE PARK DEC2023 (6,625 SPICE Models) SPICE PARK DEC2023 (6,625 SPICE Models)
SPICE PARK DEC2023 (6,625 SPICE Models) Tsuyoshi Horigome
24 views218 slides
2023Dec ASU Wang NETR Group Research Focus and Facility Overview.pptx by
2023Dec ASU Wang NETR Group Research Focus and Facility Overview.pptx2023Dec ASU Wang NETR Group Research Focus and Facility Overview.pptx
2023Dec ASU Wang NETR Group Research Focus and Facility Overview.pptxlwang78
53 views19 slides
DevOps-ITverse-2023-IIT-DU.pptx by
DevOps-ITverse-2023-IIT-DU.pptxDevOps-ITverse-2023-IIT-DU.pptx
DevOps-ITverse-2023-IIT-DU.pptxAnowar Hossain
9 views45 slides
What is Unit Testing by
What is Unit TestingWhat is Unit Testing
What is Unit TestingSadaaki Emura
23 views25 slides
Instrumentation & Control Lab Manual.pdf by
Instrumentation & Control Lab Manual.pdfInstrumentation & Control Lab Manual.pdf
Instrumentation & Control Lab Manual.pdfNTU Faisalabad
5 views63 slides
SUMIT SQL PROJECT SUPERSTORE 1.pptx by
SUMIT SQL PROJECT SUPERSTORE 1.pptxSUMIT SQL PROJECT SUPERSTORE 1.pptx
SUMIT SQL PROJECT SUPERSTORE 1.pptxSumit Jadhav
13 views26 slides

Recently uploaded(20)

2023Dec ASU Wang NETR Group Research Focus and Facility Overview.pptx by lwang78
2023Dec ASU Wang NETR Group Research Focus and Facility Overview.pptx2023Dec ASU Wang NETR Group Research Focus and Facility Overview.pptx
2023Dec ASU Wang NETR Group Research Focus and Facility Overview.pptx
lwang7853 views
Instrumentation & Control Lab Manual.pdf by NTU Faisalabad
Instrumentation & Control Lab Manual.pdfInstrumentation & Control Lab Manual.pdf
Instrumentation & Control Lab Manual.pdf
NTU Faisalabad 5 views
SUMIT SQL PROJECT SUPERSTORE 1.pptx by Sumit Jadhav
SUMIT SQL PROJECT SUPERSTORE 1.pptxSUMIT SQL PROJECT SUPERSTORE 1.pptx
SUMIT SQL PROJECT SUPERSTORE 1.pptx
Sumit Jadhav 13 views
Effect of deep chemical mixing columns on properties of surrounding soft clay... by AltinKaradagli
Effect of deep chemical mixing columns on properties of surrounding soft clay...Effect of deep chemical mixing columns on properties of surrounding soft clay...
Effect of deep chemical mixing columns on properties of surrounding soft clay...
AltinKaradagli6 views
Advances in micro milling: From tool fabrication to process outcomes by Shivendra Nandan
Advances in micro milling: From tool fabrication to process outcomesAdvances in micro milling: From tool fabrication to process outcomes
Advances in micro milling: From tool fabrication to process outcomes
Introduction to CAD-CAM.pptx by suyogpatil49
Introduction to CAD-CAM.pptxIntroduction to CAD-CAM.pptx
Introduction to CAD-CAM.pptx
suyogpatil495 views
Investigation of Physicochemical Changes of Soft Clay around Deep Geopolymer ... by AltinKaradagli
Investigation of Physicochemical Changes of Soft Clay around Deep Geopolymer ...Investigation of Physicochemical Changes of Soft Clay around Deep Geopolymer ...
Investigation of Physicochemical Changes of Soft Clay around Deep Geopolymer ...
AltinKaradagli9 views
MSA Website Slideshow (16).pdf by msaucla
MSA Website Slideshow (16).pdfMSA Website Slideshow (16).pdf
MSA Website Slideshow (16).pdf
msaucla68 views

EuroPython 2017 - Bonono - Simple ETL in python 3.5+