
Apache Flink's Table & SQL API - unified APIs for batch and stream processing



SQL is undoubtedly the most widely used language for data analytics. It is declarative and can be optimized and efficiently executed by most query processors. Therefore, the community has made an effort to add relational APIs to Apache Flink: a standard SQL API and a language-integrated Table API.

Both APIs are semantically compatible and share the same optimization and execution path based on Apache Calcite. Since Flink supports both stream and batch processing and many use cases require both kinds of processing, we aim for a unified relational layer.

In this talk we will look at the current API capabilities, find out what's under the hood of Flink's relational APIs, and give an outlook on future features such as dynamic tables: Flink's way of converting streams into tables and vice versa by leveraging the stream-table duality.



  1. Timo Walther, Apache Flink PMC, @twalthr
     With slides from Fabian Hueske
     Flink Meetup @ Amsterdam, March 2nd, 2017
     Table & SQL API: unified APIs for batch and stream processing
  2. Original creators of Apache Flink®
     Providers of the dA Platform, a supported Flink distribution
  3. Motivation
  4. DataStream API is not for Everyone
     § Writing DataStream programs is not easy
       • Stream processing technology spreads rapidly
     § Requires knowledge & skill
       • Stream processing concepts (time, state, windows, ...)
       • Programming experience (Java / Scala)
     § Program logic goes into UDFs
       • Great for expressiveness
       • Bad for optimization: needs manual tuning
  5. Why not a Relational API?
     § Relational APIs are declarative
       • User says what is needed
       • System decides how to compute it
     § Users do not specify the implementation
     § Queries are efficiently executed
     § "Everybody" knows SQL!
  6. Goals
     § Flink is a platform for distributed stream and batch data processing
     § Relational APIs as a unifying layer
       • Queries on batch tables terminate and produce a finite result
       • Queries on streaming tables run continuously and produce a result stream
     § Same syntax & semantics for both kinds of queries
  7. Table API & SQL
  8. Table API & SQL
     § Flink features two relational APIs
       • Table API: LINQ-style API for Java & Scala (since Flink 0.9.0)
       • SQL: standard SQL (since Flink 1.1.0)
     § Equivalent feature set (at the moment)
       • Table API and SQL can be mixed
     § Both are tightly integrated with Flink's core APIs
       • DataStream
       • DataSet
  9. Table API Example

     val sensorData: DataStream[(String, Long, Double)] = ???

     // convert DataStream into Table
     val sensorTable: Table = sensorData
       .toTable(tableEnv, 'location, 'time, 'tempF)

     // define query on Table
     val avgTempCTable: Table = sensorTable
       .window(Tumble over 1.day on 'rowtime as 'w)
       .groupBy('location, 'w)
       .select('w.start as 'day,
               'location,
               (('tempF.avg - 32) * 0.556) as 'avgTempC)
       .where('location like "room%")
  10. SQL Example

     val sensorData: DataStream[(String, Long, Double)] = ???

     // register DataStream
     tableEnv.registerDataStream(
       "sensorData", sensorData, 'location, 'time, 'tempF)

     // query registered Table
     val avgTempCTable: Table = tableEnv.sql("""
       SELECT FLOOR(rowtime() TO DAY) AS day,
              location,
              AVG((tempF - 32) * 0.556) AS avgTempC
       FROM sensorData
       WHERE location LIKE 'room%'
       GROUP BY location, FLOOR(rowtime() TO DAY)
     """)
  11. Architecture

     2 APIs [SQL, Table API] * 2 backends [DataStream, DataSet]
     = 4 different translation paths?
  12. Architecture
  13. Architecture
     § Table API and SQL queries are translated into a common logical plan representation.
     § Logical plans are translated and optimized depending on the execution backend.
     § Plans are transformed into DataSet or DataStream programs.
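The shared-plan idea can be sketched in plain Scala. This is a conceptual toy, not Flink's actual planner classes: `LogicalPlan`, `translate`, and the backend names are all illustrative. Both front ends produce the same plan tree, and each backend walks that one tree.

```scala
// Toy "logical plan" shared by two front ends and translated by two backends.
sealed trait LogicalPlan
case class Scan(table: String) extends LogicalPlan
case class Filter(input: LogicalPlan, predicate: String) extends LogicalPlan
case class Project(input: LogicalPlan, fields: Seq[String]) extends LogicalPlan

// Both the "Table API" and the "SQL" front end yield the same tree...
val fromTableApi: LogicalPlan =
  Project(Filter(Scan("sensorData"), "location LIKE 'room%'"), Seq("location"))
val fromSql: LogicalPlan = fromTableApi // parsing the equivalent SQL gives the same plan

// ...and each backend translates the one shared tree.
def translate(plan: LogicalPlan, backend: String): String = plan match {
  case Scan(t)        => s"$backend-source($t)"
  case Filter(in, p)  => s"${translate(in, backend)}.filter($p)"
  case Project(in, f) => s"${translate(in, backend)}.map(${f.mkString(", ")})"
}

println(translate(fromTableApi, "DataStream"))
// DataStream-source(sensorData).filter(location LIKE 'room%').map(location)
```

With one shared plan representation, adding a front end or a backend costs one translation step instead of a full new path, which is how 2 × 2 avoids becoming 4 separate translators.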
  14. Translation to Logical Plan

     sensorTable
       .window(Tumble over 1.day on 'rowtime as 'w)
       .groupBy('location, 'w)
       .select('w.start as 'day,
               'location,
               (('tempF.avg - 32) * 0.556) as 'avgTempC)
       .where('location like "room%")
  15. Translation to Optimized Plan
  16. Translation to Flink Program
  17. Current State (in master)
     § Batch SQL & Table API support
       • Selection, Projection, Sort, Inner & Outer Joins, Set operations
       • Windows for Slide, Tumble, Session
     § Streaming Table API support
       • Selection, Projection, Union
       • Windows for Slide, Tumble, Session
     § Streaming SQL
       • Selection, Projection, Union, Tumble, but …
  18. Use Cases for Streaming SQL
     § Continuous ETL & Data Import
     § Live Dashboards & Reports
     § Ad-hoc Analytics & Exploration
  19. Outlook: Dynamic Tables
  20. Dynamic Tables
     § Dynamic tables change over time
     § Dynamic tables are treated like static batch tables
       • Dynamic tables are queried with standard SQL
       • A query returns another dynamic table
     § Stream ←→ Dynamic Table conversions without information loss
       • "Stream / Table Duality"
  21. Stream to Dynamic Tables
     § Append
     § Replace by key
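The two conversion modes above can be sketched in plain Scala (illustrative only, not Flink API): append mode keeps every stream record as a new row, while replace-by-key upserts the latest value per key.

```scala
// Illustrative sketch of the two stream-to-table conversion modes.
case class Event(key: String, value: Int)

val stream = Seq(Event("a", 1), Event("b", 2), Event("a", 3))

// Append mode: every stream record becomes a new row of the table.
val appendTable: Seq[Event] = stream
// appendTable keeps 3 rows, including both versions of key "a"

// Replace-by-key mode: the latest record per key wins (upsert).
val upsertTable: Map[String, Int] =
  stream.foldLeft(Map.empty[String, Int]) { (table, e) => table + (e.key -> e.value) }
// upsertTable == Map("a" -> 3, "b" -> 2)
```

Which mode applies depends on whether the stream carries independent events (append) or updates identified by a key (replace).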
  22. Querying Dynamic Tables
     § Dynamic tables change over time
       • A[t]: Table A at time t
     § Dynamic tables are queried with regular SQL
       • The result of a query changes as the input table changes
       • q(A[t]): evaluate query q on table A at time t
     § The query result is continuously updated as t progresses
       • Similar to maintaining a materialized view
       • t is the current event time
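The q(A[t]) semantics can be simulated in plain Scala (all names illustrative): re-running the same query on growing prefixes of the input yields the successive states of the result, like a continuously maintained materialized view.

```scala
// A[t]: the table is the prefix of the stream up to time t.
case class Reading(location: String, tempF: Double)

val stream = Seq(Reading("room1", 70.0), Reading("room1", 74.0), Reading("room2", 68.0))

// q: average temperature per location (a simple grouped aggregate).
def q(table: Seq[Reading]): Map[String, Double] =
  table.groupBy(_.location).map { case (loc, rs) => loc -> rs.map(_.tempF).sum / rs.size }

// q(A[t]) for t = 1, 2, 3: the result evolves as the table grows.
val results = (1 to stream.size).map(t => q(stream.take(t)))
// results(0) == Map("room1" -> 70.0)
// results(1) == Map("room1" -> 72.0)
// results(2) == Map("room1" -> 72.0, "room2" -> 68.0)
```

A real engine does not recompute from scratch; it updates the previous result incrementally, but the observable sequence of results is the same.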
  23. Querying Dynamic Tables
  24. Querying Dynamic Tables
     § Can we run any query on dynamic tables? No!
     § State may not grow infinitely as more data arrives
       • Set a clean-up timeout or key constraints.
     § Input may only trigger partial re-computation
     § Queries with possibly unbounded state or computation are rejected
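Why unbounded state is a problem can be illustrated in plain Scala (a sketch, not Flink's actual planner check): a global COUNT(DISTINCT ...) must remember every key it has ever seen, while a per-window variant can discard its state once a window closes.

```scala
// Distinct-user counting: unbounded vs. windowed state.
case class Click(user: String, window: Int)

val clicks = Seq(Click("u1", 0), Click("u2", 0), Click("u1", 1), Click("u3", 1))

// Unbounded: COUNT(DISTINCT user) over the whole stream.
// The state is every user ever seen; on a real stream it only grows.
val globalState: Set[String] = clicks.map(_.user).toSet

// Bounded: COUNT(DISTINCT user) per tumbling window.
// Each window's state can be dropped once the window is complete.
val perWindow: Map[Int, Int] =
  clicks.groupBy(_.window).map { case (w, cs) => w -> cs.map(_.user).toSet.size }
// perWindow == Map(0 -> 2, 1 -> 2)
```

This is the intuition behind rejecting queries with possibly unbounded state: the windowed form gives the engine a point at which state becomes safe to evict.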
  25. Dynamic Tables to Stream
     § Update
  26. Dynamic Tables to Stream
     § Add/Retract
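The update and add/retract emission modes on the last two slides can be sketched in plain Scala (names illustrative): for an updating per-key aggregate, update mode emits one new value per change, while add/retract mode first retracts the old value and then adds the new one.

```scala
// Emitting changes of a per-key count, in update vs. add/retract mode.
sealed trait Change
case class Upsert(key: String, count: Int) extends Change
case class Add(key: String, count: Int) extends Change
case class Retract(key: String, count: Int) extends Change

def emit(stream: Seq[String]): (Seq[Change], Seq[Change]) = {
  var counts = Map.empty[String, Int]
  val updates = Seq.newBuilder[Change]
  val retracts = Seq.newBuilder[Change]
  for (key <- stream) {
    val old = counts.getOrElse(key, 0)
    counts += key -> (old + 1)
    updates += Upsert(key, old + 1)            // update mode: one message per change
    if (old > 0) retracts += Retract(key, old) // add/retract mode: retract old value...
    retracts += Add(key, old + 1)              // ...then add the new one
  }
  (updates.result(), retracts.result())
}

val (updates, retracts) = emit(Seq("a", "a", "b"))
// updates  == Seq(Upsert("a",1), Upsert("a",2), Upsert("b",1))
// retracts == Seq(Add("a",1), Retract("a",1), Add("a",2), Add("b",1))
```

Retract mode is more verbose but works for downstream consumers that cannot address rows by key.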
  27. Result computation & refinement
  28. Contributions welcome!
     § Huge interest and many contributors
       • Adding more window operators
       • Introducing dynamic tables
     § And there is a lot more to do
       • New operators and features for streaming and batch
       • Performance improvements
       • Tooling and integration
     § Try it out, give feedback, and start contributing!
  29. One day of hands-on Flink training, one day of conference. Tickets are on sale.
     Please visit our website: http://sf.flink-forward.org
     Follow us on Twitter: @FlinkForward
  30. We are hiring! data-artisans.com/careers
  31. Thank you! @twalthr @ApacheFlink @dataArtisans
