*This talk was first presented at http://www.meetup.com/Bay-Area-Apache-Flink-Meetup/events/225673273/*
Enterprise users today demand the ability to glean insights from disparate data spread across varied transactional and analytics sources. To meet that demand, analytics application developers need to connect to a variety of data and compute engines such as Spark, Flink, and Cassandra.
A key pain point for developers is the lack of a uniform API across data and compute engines, a limitation that hurts developer productivity and restricts dataflow across engines. DDF (Distributed DataFrame) is a simple but powerful API above and across multiple engines. Using DDF, developers reap significant benefits, including (1) a unified and highly productive API for data/compute access, (2) the ability to process data at-source, bypassing the absolute requirement for a Hadoop data lake, and (3) future-proofing against the rapidly shifting economics of specific data engines.
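The idea of one API "above and across" engines can be sketched as a thin abstraction layer whose operations dispatch to whichever engine actually holds the data. The sketch below is illustrative only: the class and method names (`Engine`, `UniformFrame`, `ToySparkEngine`, `ToyFlinkEngine`) are hypothetical stand-ins, not DDF's actual API.

```python
# Illustrative sketch of the uniform-API pattern; names are hypothetical,
# not DDF's real interface. One DataFrame-style front end delegates to
# pluggable engine backends, so application code is engine-agnostic.

from abc import ABC, abstractmethod


class Engine(ABC):
    """A compute engine that can evaluate a query against its own data."""

    @abstractmethod
    def run(self, predicate):
        ...


class ToySparkEngine(Engine):
    def __init__(self, rows):
        self.rows = rows

    def run(self, predicate):
        # A real backend would translate the query into a Spark job.
        return [r for r in self.rows if predicate(r)]


class ToyFlinkEngine(Engine):
    def __init__(self, rows):
        self.rows = rows

    def run(self, predicate):
        # A real backend would translate the query into a Flink job.
        return [r for r in self.rows if predicate(r)]


class UniformFrame:
    """One API, many engines: callers never see which engine does the work."""

    def __init__(self, engine):
        self.engine = engine

    def filter(self, predicate):
        return self.engine.run(predicate)


if __name__ == "__main__":
    rows = [{"city": "SF", "sales": 10}, {"city": "NY", "sales": 3}]
    # The same application code runs unchanged against either backend.
    for engine in (ToySparkEngine(rows), ToyFlinkEngine(rows)):
        frame = UniformFrame(engine)
        print(frame.filter(lambda r: r["sales"] > 5))
```

Because the front end is decoupled from the backends, swapping engines is a one-line change in application code, which is the productivity and future-proofing benefit the abstract describes.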
To date, DDF has been implemented on Spark, Flink, and other engines. In this talk we demonstrate, for the first time, a business-analyst-friendly real-time data exploration and visualization system working directly with Flink. We will show how business users can ask natural-language questions of their data and get real-time answers from Flink, in the form of visual charts and tables. We’ll also show interaction with the DDF-on-Flink API at the developer level, share our experience of the challenges and lessons learned in realizing this vision on Flink, and compare and contrast that with the same experience on Spark.
Christopher Nguyen, Co-Founder and CEO, Arimo
Rohit Rai, Founder and CEO, Tuplejump