MAXIME BEAUCHEMIN / MAY 2017
Apache Superset
A modern, enterprise-ready business intelligence web application
• Cereal Entrepreneur: Creative. Embraces constraints. Solution-oriented. Tenacious.
• Be a Host: Collaborative. Anticipates the needs of others. Prepared. Authentic. Listens.
• Embrace the Adventure: Flexible. Risk tolerant. Always learning. Curious. Open-minded.
• Simplify: Prioritizes. Distills a problem to its essence. Makes and communicates clear decisions.
• Champion the Mission: Passionate. Committed. Optimistic. Puts the Airbnb community first.
• Every Frame Matters: Thinks holistically. Rigorous about quality. Appreciates the details and prioritizes the right ones.
Maxime Beauchemin
* explore your data!
* create interactive dashboards
* share discoveries
A modern, enterprise-ready business intelligence web application
The Superset Vision
Any visualization
Any database
Microscopic amount of work
[Diagram: data infrastructure. Labeled components: Event Logs, MySQL Dumps, Kafka, Sqoop, Gold Hive Cluster (HDFS), Silver Hive Cluster (HDFS), Replication, Spark Cluster, Presto Cluster, Druid, S3, Airflow Scheduling, Airpal, Tableau, Superset!]
* Data is too strategic to depend on vendors!
* Tableau doesn’t support Presto & Druid
* Tableau extracts don’t scale well
* Buying means lock-in and increasing costs
* We need deep integration with our stack
* We’re builders not buyers!
Live demo!
The Stack
* Python backend
  * Flask AppBuilder (authentication, roles, CRUD, …)
  * Pandas for rich analytics
  * SQLAlchemy (multi-dialect SQL toolkit)
* JavaScript frontend
  * React / Redux
  * ES6 / Webpack / npm
  * d3.js!
  * nvd3.org
Security
* Provided by Flask AppBuilder (Python web framework)
* Easily integrates with OpenID, LDAP, REMOTE_USER, OAuth, or the built-in database
* Ships with 3 roles:
  * Admin (all access)
  * Alpha (all access but cannot alter permissions)
  * Gamma (per-datasource / table access)
* Fine-grained controls to create new roles
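
The three built-in roles can be pictured with a toy model. This is a minimal sketch of the access rules as the slide describes them, not Flask AppBuilder's actual API; the `ROLES` table, the `User` class, and both helper functions are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical role table mirroring the slide: Admin and Alpha see every
# datasource, only Admin may alter permissions, Gamma needs explicit grants.
ROLES = {
    "Admin": {"all_datasources": True, "can_alter_permissions": True},
    "Alpha": {"all_datasources": True, "can_alter_permissions": False},
    "Gamma": {"all_datasources": False, "can_alter_permissions": False},
}


@dataclass
class User:
    name: str
    role: str
    # Datasources explicitly granted to this user (only matters for Gamma).
    granted_datasources: set = field(default_factory=set)


def can_access_datasource(user: User, datasource: str) -> bool:
    """Return True if the user's role or explicit grants allow access."""
    role = ROLES[user.role]
    return role["all_datasources"] or datasource in user.granted_datasources


def can_alter_permissions(user: User) -> bool:
    """Return True only for roles allowed to change permissions."""
    return ROLES[user.role]["can_alter_permissions"]
```

A new fine-grained role would, in this toy model, be one more entry in `ROLES` plus a set of granted datasources per user.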
A thin Semantic Layer
* Verbose names and long descriptions for columns and metrics
* Add calculated fields and metrics as SQL expressions
* Set how individual columns are exposed
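
To make the idea of a thin semantic layer concrete, here is a small sketch of how verbose names, a calculated column, and metrics defined as SQL expressions could be turned into a query. It is not Superset's implementation: the `COLUMNS` and `METRICS` dictionaries, the `build_query` helper, and the `bookings` table are all assumptions made for the example, and SQLite stands in for a real warehouse.

```python
import sqlite3

# Hypothetical semantic-layer metadata: every name here is illustrative.
COLUMNS = {
    "country": {"verbose_name": "Country", "expression": None, "groupable": True},
    "price_bucket": {
        "verbose_name": "Price bucket",
        # Calculated field defined as a SQL expression.
        "expression": "CASE WHEN price >= 100 THEN 'high' ELSE 'low' END",
        "groupable": True,
    },
    "price": {"verbose_name": "Nightly price", "expression": None, "groupable": False},
}
METRICS = {
    "total_revenue": {"verbose_name": "Total revenue", "expression": "SUM(price * nights)"},
    "booking_count": {"verbose_name": "Number of bookings", "expression": "COUNT(*)"},
}


def build_query(table, groupby, metrics):
    """Compose a SELECT from column and metric definitions."""
    group_exprs = []
    select = []
    for name in groupby:
        col = COLUMNS[name]
        if not col["groupable"]:
            raise ValueError(f"column {name!r} is not exposed for grouping")
        expr = col["expression"] or name
        group_exprs.append(expr)
        select.append(f'{expr} AS "{name}"')
    for name in metrics:
        select.append(f'{METRICS[name]["expression"]} AS "{name}"')
    sql = f"SELECT {', '.join(select)} FROM {table}"
    if group_exprs:
        grouping = ", ".join(group_exprs)
        sql += f" GROUP BY {grouping} ORDER BY {grouping}"
    return sql


def demo():
    """Run the generated query against a tiny in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE bookings (country TEXT, price REAL, nights INTEGER)")
    conn.executemany(
        "INSERT INTO bookings VALUES (?, ?, ?)",
        [("FR", 120.0, 2), ("FR", 80.0, 1), ("US", 200.0, 3)],
    )
    sql = build_query("bookings", ["country"], ["total_revenue", "booking_count"])
    return conn.execute(sql).fetchall()
```

The `groupable` flag is one way to model "how individual columns are exposed": asking to group by `price` raises an error instead of producing a query.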
Caching!
* Provided by flask-cache
* Backends: memcached, Redis, filesystem, memory, …
* Cascading timeout configuration
* UI is upfront about staleness
* Users can force a refresh
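
The caching behavior listed above can be sketched in plain Python. This example does not use flask-cache; it is a dictionary-backed stand-in, and it assumes the timeout cascade runs from the most specific setting to the least specific one (chart, then datasource, then database, then a global default). `resolve_timeout`, `ResultCache`, and `CACHE_DEFAULT_TIMEOUT` are hypothetical names chosen for the illustration.

```python
import time

# Hypothetical global fallback, in seconds.
CACHE_DEFAULT_TIMEOUT = 86400


def resolve_timeout(chart_timeout=None, datasource_timeout=None,
                    database_timeout=None, default=CACHE_DEFAULT_TIMEOUT):
    """Return the first timeout that is set, from most to least specific."""
    for candidate in (chart_timeout, datasource_timeout, database_timeout):
        if candidate is not None:
            return candidate
    return default


class ResultCache:
    """In-memory cache that reports staleness and supports force-refresh."""

    def __init__(self, clock=time.time):
        self._store = {}
        self._clock = clock  # injectable so the behavior can be tested

    def get_or_compute(self, key, compute, timeout, force=False):
        now = self._clock()
        entry = self._store.get(key)
        fresh = entry is not None and now - entry["cached_at"] < timeout
        if fresh and not force:
            # Serve from cache and tell the caller how old the result is.
            return {"value": entry["value"], "is_cached": True,
                    "age_seconds": now - entry["cached_at"]}
        value = compute()
        self._store[key] = {"value": value, "cached_at": now}
        return {"value": value, "is_cached": False, "age_seconds": 0.0}
```

Returning `is_cached` and `age_seconds` alongside the value is what would let a UI show how stale a chart is, and `force=True` corresponds to the force-refresh action.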
Open Source!
What’s next?
* Grow a community! Joining the ASF
* UX: smooth out common flows
* Improve SQL Lab
* Ship visualizations & controls as React.js components
* DSL for the semantic layer
github.com/airbnb/superset
Q?

Apache Superset at Airbnb