Managing Transactions
on Ethereum
with Apache Airflow
By Michael Ghen (@mikeghen)
October 2020
Current:
● Mining Pool Operator
● Ph.D. Student at Drexel University
Previous:
● Data Architect at BeneïŹts Data Trust
● Data Platform Engineer at Cohealo
● Systems Engineer at Brandeis University
Agenda:
● Introduction to Ethereum
● Introduction to Apache Airflow
○ Core Ideas
● Airflow in Action
○ Complete Example
● Journey to Airflow
Ethereum is a Public Computing Platform
● Ethereum can be viewed as a transaction-based state machine
● Begin with a genesis state and incrementally execute
transactions to morph it into some final state
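The state-machine view can be illustrated with a toy sketch. It is a deliberate simplification: the world state is assumed to be only a mapping from account names to balances, and every transaction is a plain transfer. Real Ethereum state also tracks nonces, contract code, and storage, and execution charges gas. The names `Transaction`, `apply_transaction`, and `GENESIS` are hypothetical.

```python
from functools import reduce
from typing import Dict, NamedTuple

State = Dict[str, int]  # toy world state: account name -> balance (in wei)


class Transaction(NamedTuple):
    sender: str
    recipient: str
    value: int


def apply_transaction(state: State, tx: Transaction) -> State:
    """Toy state transition function: returns a NEW state and leaves the input untouched."""
    if state.get(tx.sender, 0) < tx.value:
        raise ValueError(f"{tx.sender} has insufficient balance")
    new_state = dict(state)
    new_state[tx.sender] -= tx.value
    new_state[tx.recipient] = new_state.get(tx.recipient, 0) + tx.value
    return new_state


# Hypothetical genesis state and transaction history.
GENESIS: State = {"alice": 100, "bob": 0}
TRANSACTIONS = [
    Transaction("alice", "bob", 30),
    Transaction("bob", "carol", 10),
]

# Start from genesis and fold every transaction into the state, in order.
final_state = reduce(apply_transaction, TRANSACTIONS, GENESIS)
print(final_state)  # {'alice': 70, 'bob': 20, 'carol': 10}
```

Because each step returns a new state instead of mutating the old one, any intermediate state can be recomputed by replaying the transactions from genesis.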
Ether (ETH) is the currency for purchasing resources
Ether is meant to be used to pay for running smart contracts,
which are computer programs that run on an emulated computer
called the Ethereum Virtual Machine (EVM)
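Resources on the EVM are metered in gas, and the sender pays for that gas in Ether. Under the fee model in use when this talk was given (before EIP-1559), the fee is gas used multiplied by gas price. A plain ETH transfer costs 21,000 gas. The 50 gwei gas price below is an assumed example value, because real gas prices move with network demand.

```python
from decimal import Decimal

WEI_PER_GWEI = 10**9
WEI_PER_ETH = 10**18

TRANSFER_GAS = 21_000              # gas consumed by a plain ETH transfer
gas_price_wei = 50 * WEI_PER_GWEI  # assumed example gas price: 50 gwei


def transaction_fee_wei(gas_used: int, gas_price: int) -> int:
    """Legacy (pre-EIP-1559) fee: gas used x gas price, in wei."""
    return gas_used * gas_price


fee_wei = transaction_fee_wei(TRANSFER_GAS, gas_price_wei)
fee_eth = Decimal(fee_wei) / Decimal(WEI_PER_ETH)  # Decimal avoids float rounding
print(fee_wei, fee_eth)  # 1050000000000000 0.00105
```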
Apache Airflow is a Workflow Management System
● A DAG (Directed Acyclic Graph) is a collection of all the tasks you want
to run, organized in a way that reflects their relationships and dependencies
● While DAGs describe how to run a workflow, Operators determine what
actually gets done
● Once an operator is instantiated, it is referred to as a task
Airflow is a platform to programmatically author, schedule and
monitor workflows. Workflows are authored using Python.
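The DAG idea can be modelled without Airflow. The sketch below is a stand-in for Airflow's API, not a use of it. It relies on Python's standard-library `graphlib` (Python 3.9+) to declare dependencies between hypothetical task names, derive an execution order that respects them, and reject a cycle, which is the case the "acyclic" in DAG rules out. In a real Airflow DAG file, the same edges would be declared between operator instances, for example with `>>`.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical workflow: each key lists the tasks it depends on (its upstream tasks).
dependencies = {
    "check_balance": set(),
    "transfer_eth": {"check_balance"},
    "send_report": {"transfer_eth"},
}

# static_order() yields every task only after all of its upstream tasks.
order = list(TopologicalSorter(dependencies).static_order())
print(order)  # ['check_balance', 'transfer_eth', 'send_report']


def is_acyclic(graph: dict) -> bool:
    """Return True if the dependency graph contains no cycles."""
    try:
        TopologicalSorter(graph).prepare()  # prepare() raises CycleError on a cycle
        return True
    except CycleError:
        return False


# Making the first task depend on the last one closes a loop, so this is not a valid DAG.
cyclic = {**dependencies, "check_balance": {"send_report"}}
print(is_acyclic(dependencies), is_acyclic(cyclic))  # True False
```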
Apache Airflow
Core Ideas
DAGs
Operators (and Sensors)
Hooks
Tasks and Task Instances
Core Ideas: DAG
● A DAG describes how you want
to carry out your workflow
● DAGs are defined in standard
Python files that are placed in
Airflow’s DAG_FOLDER
● You can have as many DAGs as
you want, each describing an
arbitrary number of tasks
● In general, each one should
correspond to a single logical
workflow.
https://airflow.apache.org/concepts.html#core-ideas
Core Ideas: Operators
● An operator describes a single task
in a workflow
● Describes what a task does
● In general, if two operators need to
share information, like a filename or
a small amount of data, you should
consider combining them into a
single operator
● Airflow does have a feature for
operator cross-communication
called XCom
https://airflow.apache.org/concepts.html#core-ideas
BashOperator - executes a bash command
PythonOperator - calls an arbitrary Python function
EmailOperator - sends an email
SimpleHttpOperator - sends an HTTP request
MySqlOperator, SqliteOperator, PostgresOperator,
MsSqlOperator, OracleOperator, JdbcOperator, etc. -
executes a SQL command
Sensor - waits for a certain time, file, database row, S3 key, etc.
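The split between an operator (what to do), a task (an operator instantiated with specific arguments), and XCom (a small key-value channel between tasks) can be sketched in plain Python. This is a toy model under assumed names (`ToyOperator`, `ToyPythonOperator`, `xcom`), not Airflow's classes. It borrows two things from Airflow: a `PythonOperator` pushes its callable's return value to XCom under the key `return_value`, and `ds` is the run date string found in the task context.

```python
from typing import Any, Callable, Dict, Tuple

# Toy XCom store: (task_id, key) -> small value shared between tasks.
xcom: Dict[Tuple[str, str], Any] = {}


class ToyOperator:
    """Describes WHAT a task does; instantiating it with arguments yields a task."""

    def __init__(self, task_id: str):
        self.task_id = task_id

    def execute(self, context: Dict[str, Any]) -> Any:
        raise NotImplementedError


class ToyPythonOperator(ToyOperator):
    """Calls an arbitrary Python function and pushes its return value to the toy XCom."""

    def __init__(self, task_id: str, python_callable: Callable[..., Any]):
        super().__init__(task_id)
        self.python_callable = python_callable

    def execute(self, context: Dict[str, Any]) -> Any:
        result = self.python_callable(**context)
        xcom[(self.task_id, "return_value")] = result
        return result


def choose_filename(**context: Any) -> str:
    # Share a small value (a filename), not the data itself.
    return f"balances_{context['ds']}.csv"


def report(**context: Any) -> str:
    filename = xcom[("choose_filename", "return_value")]  # pull the upstream value
    return f"report built from {filename}"


# Two tasks: instances of the same operator class with different arguments.
t1 = ToyPythonOperator("choose_filename", choose_filename)
t2 = ToyPythonOperator("report", report)

context = {"ds": "2020-10-01"}  # assumed run date
for task in (t1, t2):           # run in dependency order
    task.execute(context)

print(xcom[("report", "return_value")])  # report built from balances_2020-10-01.csv
```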
Core Ideas: Hooks
● Hooks implement a common
interface when possible, and
act as a building block for
operators
● Hooks keep authentication
code and information out of
pipelines, centralized in the
metadata database
https://airflow.apache.org/concepts.html#core-ideas
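The benefit of a hook is that pipeline code refers to a connection by ID, and the credentials live somewhere else. The sketch below models that with a plain dictionary standing in for Airflow's metadata database. `ToyConnection`, `CONNECTIONS`, and `ToyHook` are hypothetical names, and the host, login, and password are placeholders. Real Airflow hooks work in a similar spirit and resolve a `conn_id` through `BaseHook.get_connection`.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class ToyConnection:
    host: str
    login: str = ""
    password: str = field(default="", repr=False)  # keep the secret out of reprs and logs


# Stand-in for the metadata database: conn_id -> connection details (placeholders).
CONNECTIONS: Dict[str, ToyConnection] = {
    "eth_node": ToyConnection(host="https://node.example.invalid", login="svc", password="s3cret"),
}


class ToyHook:
    """Resolves a conn_id to stored credentials so pipeline code never embeds them."""

    def __init__(self, conn_id: str):
        self.conn_id = conn_id

    def get_conn(self) -> ToyConnection:
        try:
            return CONNECTIONS[self.conn_id]
        except KeyError:
            raise KeyError(f"connection {self.conn_id!r} is not defined") from None


# Pipeline code only mentions the ID; rotating the password means editing CONNECTIONS, not the DAG.
conn = ToyHook("eth_node").get_conn()
print(conn)  # ToyConnection(host='https://node.example.invalid', login='svc')
```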
Core Ideas: Tasks and Task Instances
● Once an operator is instantiated, it is referred to
as a “task”
● The instantiation defines specific values when
calling the abstract operator, and the
parameterized task becomes a node in a DAG.
● A task instance represents a specific run of a
task and is characterized as the combination of
a DAG, a task, and a point in time
● Task instances also have an indicative state,
which could be “running”, “success”, “failed”,
“skipped”, “up for retry”, etc.
https://airflow.apache.org/concepts.html#core-ideas
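A task instance is identified by the combination of DAG, task, and point in time, and it carries its own state. In the minimal sketch below, the same task scheduled for two dates produces two independent instances that end in different states. The class and the `aggregate_eth` / `check_balance` IDs are hypothetical. The state strings follow the list above, plus an initial `scheduled` state. This is not Airflow's `TaskInstance` class.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Tuple

# States from the slide, plus an initial "scheduled" state (not exhaustive).
STATES = {"scheduled", "running", "success", "failed", "skipped", "up_for_retry"}


@dataclass
class ToyTaskInstance:
    dag_id: str
    task_id: str
    execution_date: date
    state: str = "scheduled"

    @property
    def key(self) -> Tuple[str, str, date]:
        """Identity of the instance: (DAG, task, point in time)."""
        return (self.dag_id, self.task_id, self.execution_date)

    def set_state(self, state: str) -> None:
        if state not in STATES:
            raise ValueError(f"unknown state {state!r}")
        self.state = state


# One task, two points in time -> two distinct task instances.
instances: Dict[Tuple[str, str, date], ToyTaskInstance] = {}
for day in (date(2020, 10, 1), date(2020, 10, 2)):
    ti = ToyTaskInstance("aggregate_eth", "check_balance", day)
    instances[ti.key] = ti

instances[("aggregate_eth", "check_balance", date(2020, 10, 1))].set_state("success")
instances[("aggregate_eth", "check_balance", date(2020, 10, 2))].set_state("failed")
print(sorted(ti.state for ti in instances.values()))  # ['failed', 'success']
```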
Centralized Monitoring, Alerting, and Logging
● Airflow is an improvement over running
tasks with cron because it has features
to support task monitoring, alerting, and
logging
● Task failures can be retried automatically
● Failures can trigger email alerts (or Slack,
Datadog, etc.)
● Logs generated from tasks can be stored
in an S3 or Google Cloud Storage bucket
● Task failures can be easily identified,
investigated, and resolved
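Automatic retries and failure alerts come down to a loop around the task's callable. The sketch below is a toy runner. Its parameter names echo Airflow's `retries`, `retry_delay`, and `on_failure_callback` task arguments, but it is not Airflow's scheduler. The flaky function and the `alerts` list are hypothetical, and the delay is left at zero so the example runs instantly.

```python
import time
from datetime import timedelta
from typing import Callable, List, Optional


def run_with_retries(
    fn: Callable[[], object],
    retries: int = 2,
    retry_delay: timedelta = timedelta(seconds=0),
    on_failure_callback: Optional[Callable[[Exception], None]] = None,
) -> List[str]:
    """Run fn up to retries + 1 times and return the sequence of states it went through."""
    states: List[str] = []
    for attempt in range(retries + 1):
        states.append("running")
        try:
            fn()
            states.append("success")
            return states
        except Exception as exc:
            if attempt < retries:
                states.append("up_for_retry")
                time.sleep(retry_delay.total_seconds())
            else:
                states.append("failed")
                if on_failure_callback:
                    on_failure_callback(exc)  # e.g. send an email or Slack alert
    return states


attempts = {"n": 0}


def flaky_rpc_call() -> None:
    """Hypothetical task that fails on its first attempt, then succeeds."""
    attempts["n"] += 1
    if attempts["n"] < 2:
        raise ConnectionError("node unavailable")


def always_fails() -> None:
    raise ConnectionError("node unavailable")


alerts: List[str] = []
first = run_with_retries(flaky_rpc_call)
second = run_with_retries(always_fails, retries=1, on_failure_callback=lambda e: alerts.append(str(e)))
print(first)   # ['running', 'up_for_retry', 'running', 'success']
print(second)  # ['running', 'up_for_retry', 'running', 'failed']
print(alerts)  # ['node unavailable']
```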
Example: Aggregate ETH to Centralized Wallet
DAG Example: Aggregate ETH
Python Operator Example: Check Balance
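A check-balance task is an ordinary Python callable. In the sketch below, the balance lookup is passed in as a function so the example runs without an Ethereum node. With web3.py, something like `w3.eth.get_balance` could fill that role, but here a dictionary of hypothetical addresses stands in. Three choices are assumptions: the threshold (the fee for a plain transfer at an example 50 gwei), the function name `check_balances`, and returning the wallets worth sweeping so a downstream transfer task can pull them from XCom.

```python
from typing import Callable, Dict, List

# Assumed fee for a plain transfer: 21,000 gas at an example price of 50 gwei.
TRANSFER_FEE_WEI = 21_000 * 50 * 10**9


def check_balances(
    addresses: List[str],
    get_balance: Callable[[str], int],
    min_balance_wei: int = TRANSFER_FEE_WEI,
) -> List[str]:
    """Return the addresses whose balance exceeds the fee needed to sweep them."""
    return [addr for addr in addresses if get_balance(addr) > min_balance_wei]


# Hypothetical wallets and balances (in wei) standing in for on-chain data.
fake_balances: Dict[str, int] = {
    "0xWalletA": 2 * 10**18,  # 2 ETH
    "0xWalletB": 10**14,      # 0.0001 ETH: below the fee, not worth sweeping
    "0xWalletC": 5 * 10**17,  # 0.5 ETH
}

to_sweep = check_balances(list(fake_balances), fake_balances.__getitem__)
print(to_sweep)  # ['0xWalletA', '0xWalletC']
```

In a DAG, a function like this would typically be wrapped in a `PythonOperator` through its `python_callable` and `op_kwargs` arguments.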
Custom Operators
Hooks Example: Ethereum Wallet Management
Hooks Example: Web3 Connection Management
Custom Operator Example: Ethereum Transfer
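The main logic in a custom transfer operator is deciding how much to send and building the transaction. The sketch below covers only that pure part. It reserves the gas fee, computes the remaining amount, and returns an unsigned legacy transaction dictionary with the usual `to`, `value`, `gas`, `gasPrice`, `nonce`, and `chainId` fields. The class name, its arguments, and the injected balance and nonce lookups are hypothetical. Signing and broadcasting are left out on purpose; in practice they might use web3.py and a key retrieved through a wallet hook. The class is a plain class with an `execute(context)` method so it runs without Airflow, whereas a real operator would subclass `BaseOperator`.

```python
from typing import Any, Callable, Dict, Optional

GAS_LIMIT = 21_000  # gas for a plain ETH transfer


class EthereumTransferOperator:
    """Hypothetical operator: sweep a wallet's balance, minus the fee, to a central wallet."""

    def __init__(
        self,
        task_id: str,
        from_address: str,
        to_address: str,
        gas_price_wei: int,
        get_balance: Callable[[str], int],
        get_nonce: Callable[[str], int],
        chain_id: int = 1,  # assumed: Ethereum mainnet
    ):
        self.task_id = task_id
        self.from_address = from_address
        self.to_address = to_address
        self.gas_price_wei = gas_price_wei
        self.get_balance = get_balance
        self.get_nonce = get_nonce
        self.chain_id = chain_id

    def execute(self, context: Dict[str, Any]) -> Optional[Dict[str, Any]]:
        fee = GAS_LIMIT * self.gas_price_wei
        balance = self.get_balance(self.from_address)
        if balance <= fee:
            return None  # the fee would consume the whole balance, so send nothing
        return {
            "to": self.to_address,
            "value": balance - fee,
            "gas": GAS_LIMIT,
            "gasPrice": self.gas_price_wei,
            "nonce": self.get_nonce(self.from_address),
            "chainId": self.chain_id,
        }


# Hypothetical balances and nonces standing in for an Ethereum node.
balances = {"0xWalletA": 2 * 10**18, "0xWalletB": 10**14}
nonces = {"0xWalletA": 7, "0xWalletB": 0}

op = EthereumTransferOperator(
    task_id="transfer_wallet_a",
    from_address="0xWalletA",
    to_address="0xCentralWallet",
    gas_price_wei=50 * 10**9,  # assumed 50 gwei
    get_balance=balances.__getitem__,
    get_nonce=nonces.__getitem__,
)
tx = op.execute(context={})
print(tx["value"])  # 1998950000000000000
```

Keeping the arithmetic separate from signing makes the operator easy to unit-test without a node or a private key.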
Relevant
Alternatives
● Apache NiFi
● Apache Beam
● Apache Camel
● Spotify’s Luigi
● Many other awesome projects
Airflow is not a data streaming
solution. Tasks do not pass data
to one another easily.
Streaming and Batching
Apache Airflow
for IT Stakeholders
1. Integrate with any Information
System using Python
2. Automate the Development of
Workflows (Config as Code)
3. Centralize Workflow
Monitoring, Alerting, Logging
Thank you!
Michael Ghen, @mikeghen