Integrate Apache Flink with Apache Hive
Xuefu Zhang
-- Senior Staff Engineer, Alibaba
-- Hive PMC, Apache Member
Bowen Li
-- Senior Engineer, Alibaba
Agenda
● Background
● Goals
● Technical Overview
● Current Progress
● Demo
● Q&A
Background
● Flink has achieved impressive success in stream processing
● Its scalability and potential have been proven and pushed further by Blink, now
part of Flink
● At Alibaba, Flink is used to process extremely large amounts of data at an
unprecedented scale
1.7B Events/sec | EB Total | PB Everyday | 1T Events/Day
Streaming SQL
● The majority of stream analytics can be expressed in SQL
● Streaming SQL gives users a non-programming way of
writing and deploying streaming jobs
● SQL needs metadata: sources, sinks, UDFs, views, etc.
● The metadata needs a store
Streaming SQL (cont’d)
● Currently, Flink stores metadata in memory
● The metadata is ill-organized and scattered across different components
● Poor usability, interoperability, productivity, and manageability
● Problem #1: Flink lacks a well-organized, persistent store for its metadata
Batch and SQL
● Stream analytics users usually also have offline, batch analytics
● ETL is still an important use case for big data
● AI/ML is a major driving force behind both real-time and batch analytics
○ Gathering data to train and test a model, deploying it in stream processing
● SQL is the main tool for batch processing of big data
● Unfortunately, users need a different engine for non-stream processing
Batch and SQL (cont’d)
● Flink has shown clear advantages over other solutions for
heavy-volume stream processing
● In Blink, we systematically explored Flink’s capabilities in batch processing,
and it shows great potential
● Flink is the fastest due to its pipelined execution
● Tez and Spark do not overlap 1st and 2nd stages
● MapReduce is slow despite overlapping stages
Source: A Comparative Performance Evaluation of Flink, Dongwon Kim, POSTECH, Flink Forward 2015
Batch and SQL (cont’d)
● Batch demands more SQL capability
● Batch also demands stronger metadata management
● Hive is the de facto standard for big data/batch processing on Hadoop
● The Hive Metastore is the center of the big data ecosystem
● Problem #2: Flink lacks seamless access to Hive’s metadata and data
Heterogeneous Sources/Sinks
● Whether batch or streaming, Flink usually needs to access many data systems
○ Hive
○ MySQL
○ Key-Value stores
○ Kafka stream
● Different data catalogs
● Problem #3: Flink needs a unified interface to interact with different data catalogs
Beyond Flink
● Batch has a larger use case than streaming
● Many Hive users are not Flink users
● We would like Hive users to benefit from Flink’s batch capabilities
● Problem #4: Flink needs a story for Hive users
Four Goals
● Define a unified catalog API
● Implement an in-memory catalog and a persistent catalog for Flink metadata
● Implement Hive catalog, enabling deep integration with Hive
● Provide Flink as Hive’s new execution engine (long-term)
Technical Overview
● Define unified catalog APIs (FLIP-30)
● Three implementations
○ Generic in-memory catalog
○ Generic persistent catalog (based on Hive metastore)
○ Hive catalog
● Hive data access
● Hive on Flink is not yet planned
Architecture
[Diagram layers: Flink Deployment, Flink Runtime, Query processing & optimization, Table API and SQL, SQL Client/Zeppelin, Catalog APIs]
Catalog APIs and Implementations
[Diagram; arrows denote inheritance and reference. Components shown:]
● ReadableCatalog, ReadableWritableCatalog
● GenericInMemoryCatalog, GenericHiveMetastoreCatalog, HiveCatalog, HiveCatalogBase
● Shim Layer: HiveMetastoreClient
● CatalogManager, TableEnvironment, SQL Client
● Hive Metastore
Hive Data Connector
[Diagram: Flink connector interfaces paired with their Hive counterparts, reading from and writing to Hive data]
● BatchTableFactory: HiveTableFactory
● BatchTableSource: HiveTableSource
● InputFormat: HiveTableInputFormat (Read)
● BatchTableSink: HiveTableSink
● OutputFormat: HiveTableOutputFormat (Write)
Current Progress, Development Plan, and Demo
Bowen Li
Integrating Flink with Hive
This is a major change, so the work needs to be broken into parts
Part 1. Unified Catalog APIs (FLIP-30, FLINK-11275)
Part 2. Integrate Flink with Hive (FLINK-10556)
● for metadata through Hive Metastore (FLINK-10744)
● for data (FLINK-10729)
Part 3. Support a complete set of SQL DDL/DML in Flink (FLINK-10232)
1 - Unified Catalog APIs
Flink current status:
○ Barely any catalog support
○ Has separate function catalog
Our highlighted improvements:
○ Introduced new catalog APIs and framework, and connected them to Calcite
● ReadableCatalog and ReadableWritableCatalog
● Meta-Objects: Database, Table, View, Partition, Functions, Stats, etc.
● Operations: Create/Alter/Rename/Drop/Get/List/Exist
○ Unified the function catalog with the new catalog APIs and added support for persisting functions
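To make the split between ReadableCatalog and ReadableWritableCatalog concrete, here is a minimal Python sketch of the idea, not the actual Flink (Java) interfaces. The method names, the dict-based schema, and the InMemoryCatalog class are illustrative assumptions; only the two interface names and the operation list come from the slide.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class ReadableCatalog(ABC):
    """Read-only operations: Get / List / Exist (method names are hypothetical)."""

    @abstractmethod
    def list_tables(self, db: str) -> List[str]:
        ...

    @abstractmethod
    def get_table(self, db: str, table: str) -> Dict[str, str]:
        ...

    def table_exists(self, db: str, table: str) -> bool:
        return table in self.list_tables(db)


class ReadableWritableCatalog(ReadableCatalog):
    """Adds the mutating operations: Create / Rename / Drop."""

    @abstractmethod
    def create_table(self, db: str, table: str, schema: Dict[str, str]) -> None:
        ...

    @abstractmethod
    def rename_table(self, db: str, old: str, new: str) -> None:
        ...

    @abstractmethod
    def drop_table(self, db: str, table: str) -> None:
        ...


class InMemoryCatalog(ReadableWritableCatalog):
    """Non-persistent catalog: all metadata lives in a dict for the session."""

    def __init__(self) -> None:
        # database name -> table name -> {column name: type name}
        self._dbs: Dict[str, Dict[str, Dict[str, str]]] = {"default": {}}

    def list_tables(self, db: str) -> List[str]:
        return sorted(self._dbs[db])

    def get_table(self, db: str, table: str) -> Dict[str, str]:
        return self._dbs[db][table]

    def create_table(self, db: str, table: str, schema: Dict[str, str]) -> None:
        if table in self._dbs[db]:
            raise ValueError(f"table {db}.{table} already exists")
        self._dbs[db][table] = dict(schema)

    def rename_table(self, db: str, old: str, new: str) -> None:
        if new in self._dbs[db]:
            raise ValueError(f"table {db}.{new} already exists")
        self._dbs[db][new] = self._dbs[db].pop(old)

    def drop_table(self, db: str, table: str) -> None:
        del self._dbs[db][table]
```

A read-only source (for example, a schema registry) would implement only the first interface, while a catalog that can persist new objects implements the second.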
1 - Unified Catalog APIs
Flink current status:
○ No well-structured hierarchy yet to manage metadata
○ Needs better SQL user experience when referencing metadata
Our highlighted improvements:
● Introduced two-level management structure: <catalog>.<db>.<meta-object>
● Added CatalogManager to resolve object name
select * from defaultCatalog.defaultDb.Tbl => select * from Tbl
● Made Flink case-insensitive to object names, similar to Hive, MySQL, Oracle
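The name resolution described on this slide can be sketched as follows. This is a simplified Python illustration, not the real CatalogManager: the NameResolver class and its methods are hypothetical, and case-insensitivity is modeled by lower-casing every name part.

```python
from typing import Tuple


class NameResolver:
    """Expands a partial object name to (catalog, db, object)."""

    def __init__(self, current_catalog: str, current_db: str) -> None:
        # Names are normalized to lower case to model case-insensitivity.
        self.current_catalog = current_catalog.lower()
        self.current_db = current_db.lower()

    def resolve(self, name: str) -> Tuple[str, str, str]:
        parts = [p.strip().lower() for p in name.split(".")]
        if not all(parts) or len(parts) > 3:
            raise ValueError(f"cannot resolve object name: {name!r}")
        if len(parts) == 1:
            # "Tbl" -> current catalog and current database
            return (self.current_catalog, self.current_db, parts[0])
        if len(parts) == 2:
            # "db.Tbl" -> current catalog
            return (self.current_catalog, parts[0], parts[1])
        return (parts[0], parts[1], parts[2])


resolver = NameResolver("defaultCatalog", "defaultDb")
short_name = resolver.resolve("Tbl")
full_name = resolver.resolve("defaultCatalog.defaultDb.Tbl")
```

With the current catalog and database set to the defaults, the short and the fully qualified name resolve to the same object, which is why the two queries on the slide are equivalent.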
1 - Unified Catalog APIs
Flink current status:
No production-ready catalogs
Our highlighted improvements:
Developed three production-ready catalogs
■ GenericInMemoryCatalog - in-memory non-persistent, per session
■ HiveCatalog - compatible with Hive, read/write Hive meta-objects
■ GenericHiveMetastoreCatalog - persist Flink streaming and batch meta-objects
1 - Unified Catalog APIs
Catalogs are pluggable and open opportunities to build catalogs for
○ Streams and MQ
● Kafka (Confluent Schema Registry), Kinesis, RabbitMQ, Pulsar, etc.
○ Structured Data
● RDBMS like MySQL, etc.
○ Semi-Structured Data
● Elasticsearch, HBase, Cassandra, etc.
○ Your other favorite data management systems
● …...
2 - Flink-Hive Integration - Metadata - HiveCatalog
Our highlighted improvements:
Developed HiveCatalog, via which Flink can
● read Hive meta-objects, like tables, views, functions, stats
● create and write Hive meta-objects to Hive Metastore so that Hive can consume them
Flink can read and write Hive metadata through HiveCatalog
2 - Flink-Hive Integration - Metadata - GenericHiveMetastoreCatalog
Our highlighted improvements:
● Persisted Flink’s metadata (both streaming and batch) by using Hive Metastore purely
as storage
HiveCatalog vs. GenericHiveMetastoreCatalog
HiveCatalog
● for Hive batch metadata
● Hive can understand
GenericHiveMetastoreCatalog
● for any streaming and batch metadata
● Hive may not understand
Both are backed by Hive Metastore
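One hypothetical way to picture the difference is sketched below in Python. The slides do not say how the two catalogs lay out their entries, so the `is_generic` flag, the `flink.` key prefixes, and both function names are assumptions made only for illustration.

```python
from typing import Dict

GENERIC_FLAG = "is_generic"  # hypothetical marker key, not confirmed by the slides


def to_metastore_entry(
    name: str,
    columns: Dict[str, str],
    properties: Dict[str, str],
    generic: bool,
) -> Dict[str, object]:
    """Builds a metastore-style entry for a table."""
    if generic:
        # Metastore used purely as storage: the schema and connector options
        # are flattened into string properties that only Flink interprets.
        props = {GENERIC_FLAG: "true"}
        props.update({f"flink.prop.{k}": v for k, v in properties.items()})
        props.update({f"flink.col.{c}": t for c, t in columns.items()})
        return {"name": name, "columns": {}, "properties": props}
    # Hive-compatible entry: columns are stored natively so Hive can read them.
    return {"name": name, "columns": dict(columns), "properties": dict(properties)}


def hive_can_understand(entry: Dict[str, object]) -> bool:
    return entry["properties"].get(GENERIC_FLAG) != "true"


hive_table = to_metastore_entry("sales", {"id": "BIGINT"}, {}, generic=False)
kafka_table = to_metastore_entry(
    "clicks", {"user_id": "BIGINT"}, {"connector": "kafka"}, generic=True
)
```

In this picture both entries sit in the same store, but only the first exposes a schema that Hive itself could query.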
2. Flink-Hive Integration - Data
Our highlighted improvements:
Connector:
○ Developed source and sink to read/write partitioned/non-partitioned tables and views
○ Supported partition pruning
Data Types:
○ Supported all Hive simple and complex (array, map, struct) data types
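As a rough illustration of partition pruning, the Python sketch below drops partitions that cannot match a query's equality filters before any data is read. The partition layout, the example paths, and the prune_partitions function are hypothetical and do not reflect the connector's actual code.

```python
from typing import Dict, List

Partition = Dict[str, object]


def prune_partitions(
    partitions: List[Partition], filters: Dict[str, str]
) -> List[Partition]:
    """Keeps only partitions whose spec matches every equality filter."""
    return [
        p
        for p in partitions
        if all(p["spec"].get(key) == value for key, value in filters.items())
    ]


# Hypothetical partitions of a table partitioned by (dt, region).
all_partitions: List[Partition] = [
    {"spec": {"dt": "2019-04-01", "region": "us"}, "location": "/warehouse/t/dt=2019-04-01/region=us"},
    {"spec": {"dt": "2019-04-01", "region": "eu"}, "location": "/warehouse/t/dt=2019-04-01/region=eu"},
    {"spec": {"dt": "2019-04-02", "region": "us"}, "location": "/warehouse/t/dt=2019-04-02/region=us"},
]

# WHERE dt = '2019-04-01' only needs to scan two of the three partitions.
to_scan = prune_partitions(all_partitions, {"dt": "2019-04-01"})
```

The benefit is that the source never opens files under partitions the predicate rules out.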
2. Flink-Hive Integration - User-defined functions and Version Compatibility
● Hive user-defined functions
■ Supported Hive UDF
■ Working on supporting Hive GenericUDF, UDTF, UDAF
● Hive versions
■ Currently supports Hive 2.3.4 and 1.2.2 via shimming
■ Relies on Hive’s backward compatibility for 2.x and 1.x
● Working on direct support for more Hive versions, e.g. 2.1.1, 1.2.1
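The shimming approach can be pictured with the Python sketch below: one shim per Hive major line, built against a single version and reused for other versions of that line on the strength of Hive's backward compatibility. The class names and load_shim are hypothetical; only the two version numbers come from the slide.

```python
class HiveShimV1:
    """Shim for the Hive 1.x line (version number taken from the slide)."""

    built_against = "1.2.2"


class HiveShimV2:
    """Shim for the Hive 2.x line (version number taken from the slide)."""

    built_against = "2.3.4"


# Hypothetical registry keyed by Hive major version.
_SHIMS = {"1": HiveShimV1, "2": HiveShimV2}


def load_shim(hive_version: str):
    """Picks a shim by major version, relying on backward compatibility."""
    major = hive_version.split(".")[0]
    try:
        return _SHIMS[major]()
    except KeyError:
        raise ValueError(f"unsupported Hive version: {hive_version}") from None


shim = load_shim("2.1.1")
```

Direct support for more versions would then amount to registering additional shims under finer-grained keys.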
Timeline
First Targeted Flink release - 1.9.0, June 2019
Demo with Flink SQL CLI
• Query Hive Metadata
• Create Hive Source/Sink with HiveCatalog to read/write data
• Create CSV Source/Sink with GenericHiveMetastoreCatalog to read/write data
This tremendous amount of work could not have happened without help and support
Shout out to everyone in the community and our team
who has been helping us with designs, code, feedback, etc.!
Conclusions
● Flink is good at stream processing, but batch processing is equally important
● Flink has shown its potential in batch processing
● Flink/Hive integration benefits both communities
● This is a big effort
● We are taking a phased approach
● Your contributions are very welcome and appreciated!
Call for sponsors
Flink Forward China, Beijing, Dec 2019!
All major Chinese tech companies will attend.
Expected Attendees: 3,000+
Reach out to flink-forward-china@list.alibaba-inc.com for details!
Thanks!

Railway Signalling Principles Edition 3.pdf
TeeVichai
 
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdfAKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
SamSarthak3
 

Recently uploaded (20)

Gen AI Study Jams _ For the GDSC Leads in India.pdf
Gen AI Study Jams _ For the GDSC Leads in India.pdfGen AI Study Jams _ For the GDSC Leads in India.pdf
Gen AI Study Jams _ For the GDSC Leads in India.pdf
 
Water Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdfWater Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdf
 
Student information management system project report ii.pdf
Student information management system project report ii.pdfStudent information management system project report ii.pdf
Student information management system project report ii.pdf
 
ethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.pptethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.ppt
 
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdfHybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
 
Investor-Presentation-Q1FY2024 investor presentation document.pptx
Investor-Presentation-Q1FY2024 investor presentation document.pptxInvestor-Presentation-Q1FY2024 investor presentation document.pptx
Investor-Presentation-Q1FY2024 investor presentation document.pptx
 
Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024
 
MCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdfMCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdf
 
weather web application report.pdf
weather web application report.pdfweather web application report.pdf
weather web application report.pdf
 
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptxCFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
 
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
 
space technology lecture notes on satellite
space technology lecture notes on satellitespace technology lecture notes on satellite
space technology lecture notes on satellite
 
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
 
Planning Of Procurement o different goods and services
Planning Of Procurement o different goods and servicesPlanning Of Procurement o different goods and services
Planning Of Procurement o different goods and services
 
Immunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary AttacksImmunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary Attacks
 
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdf
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdfGoverning Equations for Fundamental Aerodynamics_Anderson2010.pdf
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdf
 
road safety engineering r s e unit 3.pdf
road safety engineering  r s e unit 3.pdfroad safety engineering  r s e unit 3.pdf
road safety engineering r s e unit 3.pdf
 
Cosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdfCosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdf
 
Railway Signalling Principles Edition 3.pdf
Railway Signalling Principles Edition 3.pdfRailway Signalling Principles Edition 3.pdf
Railway Signalling Principles Edition 3.pdf
 
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdfAKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
 

Integrating Flink with Hive - Flink Forward SF 2019

  • 1. Integrate Apache Flink with Apache Hive Xuefu Zhang, -- Senior Staff Engineer, Alibaba -- Hive PMC, Apache Member Bowen Li -- Senior Engineer, Alibaba
  • 2. Agenda ● Background ● Goals ● Technical Overview ● Current Progress ● Demo ● Q&A
  • 3. Background ● Flink has achieved impressive success in stream processing ● Its scalability and potential have been proven and pushed further by Blink, now part of Flink ● At Alibaba, Flink is used to process extremely large amounts of data at an unprecedented scale
  • 4. 1.7B Events/sec, EB Total, PB Everyday, 1T Event/Day
  • 5. Streaming SQL ● The majority of stream analytics can be expressed in SQL ● Streaming SQL gives users a non-programming way of writing and deploying streaming jobs ● SQL needs metadata: sources, sinks, UDFs, views, etc. ● The metadata needs a store
  • 6. Streaming SQL (cont’d) ● Currently, Flink stores metadata in memory ● The metadata is ill-organized and scattered across different components ● Poor usability, interoperability, productivity, and manageability ● Problem #1: Flink lacks a well-organized, persistent store for its metadata
  • 7. Batch and SQL ● Stream analytics users usually also have offline, batch analytics ● ETL is still an important use case for big data ● AI/ML is a major driving force behind both real-time and batch analytics ○ Gathering data to train and test a model, deploying it in stream processing ● SQL is the main tool for processing big data in batch ● Unfortunately, users have to use a different engine for non-stream processing
  • 8. Batch and SQL (cont’d) ● Flink has shown prevailing advantages over other solutions for heavy-volume stream processing ● In Blink, we systematically explored Flink’s capabilities in batch processing, and it shows great potential
  • 9. Flink is the fastest due to its pipelined execution. Tez and Spark do not overlap 1st and 2nd stages. MapReduce is slow despite overlapping stages. (A Comparative Performance Evaluation of Flink, Dongwon Kim, POSTECH, Flink Forward 2015)
  • 10. Batch and SQL (cont’d) ● Batch demands more SQL capability ● It demands even stronger metadata management ● Hive is the de facto standard for big data/batch processing on Hadoop ● The center of the big data ecosystem is the Hive metadata store ● Problem #2: Flink lacks seamless access to Hive’s metadata and data
  • 11. Heterogeneous Sources/Sinks ● Whether batch or streaming, Flink usually needs to access many data systems ○ Hive ○ MySQL ○ Key-Value stores ○ Kafka stream ● Different data catalogs ● Problem #3: Flink needs a unified interface to interact with different data catalogs
  • 12. Beyond Flink ● Batch has a larger use case than streaming ● Many Hive users are not Flink users ● We would like Hive users to benefit from Flink’s batch capabilities ● Problem #4: Flink needs a story for Hive users
  • 13. Four Goals ● Define Unified catalog API ● Implement In-Memory catalog and persistent catalog for Flink metadata ● Implement Hive catalog, enabling deep integration with Hive ● Provide Flink as Hive’s new execution engine (long-term)
  • 14. Technical Overview ● Define unified catalog APIs (FLIP-30) ● Three implementations ○ Generic in-memory catalog ○ Generic persistent catalog (based on Hive metastore) ○ Hive catalog ● Hive data access ● Hive on Flink is not yet planned
  • 15. Architecture Flink Deployment Flink Runtime Query processing & optimization Table API and SQL SQL Client/Zeppelin Catalog APIs
  • 16. Catalog APIs and Implementations [class diagram showing inheritance and reference relationships among: SQL Client, TableEnvironment, CatalogManager, Catalog APIs (ReadableCatalog, ReadableWritableCatalog), GenericInMemoryCatalog, HiveCatalogBase, GenericHiveMetastoreCatalog, HiveCatalog, Shim Layer: HiveMetastoreClient, Hive Metastore]
  • 18. Current Progress, Development Plan, and Demo Bowen Li
  • 19. Integrating Flink with Hive This is a major change, so the work needs to be broken into parts Part 1. Unified Catalog APIs (FLIP-30, FLINK-11275) Part 2. Integrate Flink with Hive (FLINK-10556) ● for metadata through Hive Metastore (FLINK-10744) ● for data (FLINK-10729) Part 3. Support a complete set of SQL DDL/DML in Flink (FLINK-10232)
  • 20. 1 - Unified Catalog APIs Flink current status: ○ Barely any catalog support ○ Has a separate function catalog Our highlighted improvements: ○ Introduced new catalog APIs and framework and connected them to Calcite ● ReadableCatalog and ReadableWritableCatalog ● Meta-Objects: Database, Table, View, Partition, Functions, Stats, etc. ● Operations: Create/Alter/Rename/Drop/Get/List/Exist ○ Unified the function catalog with the new catalog APIs and supported persisting functions
  • 21. 1 - Unified Catalog APIs Flink current status: ○ No well-structured hierarchy yet to manage metadata ○ Needs a better SQL user experience when referencing metadata Our highlighted improvements: ● Introduced two-level management structure: <catalog>.<db>.<meta-object> ● Added CatalogManager to resolve object names: select * from defaultCatalog.defaultDb.Tbl => select * from Tbl ● Made Flink case-insensitive to object names, similar to Hive, MySQL, Oracle
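The name resolution described on slide 21 can be illustrated with a small standalone sketch. This is not Flink's CatalogManager; the `NameResolver` and `ObjectPath` names are hypothetical, and lowercasing is only one assumed way to get case-insensitive matching. It shows how a short name and a fully qualified `<catalog>.<db>.<meta-object>` name could resolve to the same object once defaults are filled in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectPath:
    """Fully qualified name: <catalog>.<db>.<meta-object>."""
    catalog: str
    database: str
    name: str


class NameResolver:
    """Hypothetical stand-in for the name-resolution role of a catalog manager."""

    def __init__(self, default_catalog: str, default_database: str):
        # Assumption: case-insensitivity is modeled by normalizing to lowercase.
        self.default_catalog = default_catalog.lower()
        self.default_database = default_database.lower()

    def resolve(self, identifier: str) -> ObjectPath:
        parts = [p.lower() for p in identifier.split(".")]
        if len(parts) == 1:
            # Only the object name: fill in default catalog and database.
            return ObjectPath(self.default_catalog, self.default_database, parts[0])
        if len(parts) == 2:
            # <db>.<object>: fill in the default catalog.
            return ObjectPath(self.default_catalog, parts[0], parts[1])
        if len(parts) == 3:
            return ObjectPath(parts[0], parts[1], parts[2])
        raise ValueError(f"expected at most three name parts, got {identifier!r}")


resolver = NameResolver("defaultCatalog", "defaultDb")
short = resolver.resolve("Tbl")
full = resolver.resolve("defaultCatalog.defaultDb.Tbl")
```

Here `short` and `full` compare equal, which mirrors the slide's `select * from defaultCatalog.defaultDb.Tbl => select * from Tbl` example.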
  • 22. 1 - Unified Catalog APIs Flink current status: No production-ready catalogs Our highlighted improvements: Developed three production-ready catalogs ■ GenericInMemoryCatalog - in-memory non-persistent, per session ■ HiveCatalog - compatible with Hive, read/write Hive meta-objects ■ GenericHiveMetastoreCatalog - persist Flink streaming and batch meta-objects
  • 23. 1 - Unified Catalog APIs Catalogs are pluggable and open opportunities to build catalogs for ○ Streams and MQ ● Kafka (Confluent Schema Registry), Kinesis, RabbitMQ, Pulsar, etc. ○ Structured Data ● RDBMS like MySQL, etc. ○ Semi-Structured Data ● ElasticSearch, HBase, Cassandra, etc. ○ Your other favorite data management systems ● …
  • 24. 2 - Flink-Hive Integration - Metadata - HiveCatalog Our highlighted improvements: Developed HiveCatalog, via which Flink can ● read Hive meta-objects, like tables, views, functions, stats ● create and write Hive meta-objects to Hive Metastore such that Hive can consume them Flink can read and write Hive metadata through HiveCatalog
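The pluggable design from slides 20 to 23 can be sketched as a pair of abstract interfaces with one concrete in-memory implementation. The interface names follow the slides (ReadableCatalog, ReadableWritableCatalog), but every method name and signature below is an illustrative assumption and not the FLIP-30 API; `InMemoryCatalog` is likewise a hypothetical toy, not GenericInMemoryCatalog.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class ReadableCatalog(ABC):
    """Read-only catalog operations (method names are illustrative)."""

    @abstractmethod
    def list_tables(self, database: str) -> List[str]:
        ...

    @abstractmethod
    def get_table(self, database: str, table: str) -> Dict[str, str]:
        ...

    def table_exists(self, database: str, table: str) -> bool:
        return table in self.list_tables(database)


class ReadableWritableCatalog(ReadableCatalog):
    """Adds mutating operations on top of the read-only interface."""

    @abstractmethod
    def create_table(self, database: str, table: str, schema: Dict[str, str]) -> None:
        ...

    @abstractmethod
    def drop_table(self, database: str, table: str) -> None:
        ...


class InMemoryCatalog(ReadableWritableCatalog):
    """Non-persistent catalog: contents live only as long as this object."""

    def __init__(self) -> None:
        self._databases: Dict[str, Dict[str, Dict[str, str]]] = {}

    def list_tables(self, database: str) -> List[str]:
        return sorted(self._databases.get(database, {}))

    def get_table(self, database: str, table: str) -> Dict[str, str]:
        return self._databases[database][table]

    def create_table(self, database: str, table: str, schema: Dict[str, str]) -> None:
        tables = self._databases.setdefault(database, {})
        if table in tables:
            raise ValueError(f"table {database}.{table} already exists")
        tables[table] = dict(schema)  # copy so later caller edits do not leak in

    def drop_table(self, database: str, table: str) -> None:
        del self._databases[database][table]


catalog = InMemoryCatalog()
catalog.create_table("sales", "orders", {"id": "INT", "amount": "DOUBLE"})
```

A catalog for another system (a message queue, an RDBMS) would, under this sketch, implement the same interfaces and be swapped in without changes to the callers.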
  • 25. 2 - Flink-Hive Integration - Metadata - GenericHiveMetastoreCatalog Our highlighted improvements: ● Persisted Flink’s metadata (both streaming and batch) by using Hive Metastore purely as storage
  • 26. HiveCatalog vs. GenericHiveMetastoreCatalog HiveCatalog: ● for Hive batch metadata ● Hive can understand GenericHiveMetastoreCatalog: ● for any streaming and batch metadata ● Hive may not understand Both are backed by Hive Metastore
  • 27. 2. Flink-Hive Integration - Data Our highlighted improvements: Connector: ○ Developed source and sink to read/write partitioned/non-partitioned tables and views ○ Supported partition pruning Data Types: ○ Supported all Hive simple and complex (array, map, struct) data types
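Partition pruning, mentioned on slide 27, means skipping partitions that cannot match a query's filter before any data is read. The sketch below is a generic illustration and not the Flink Hive connector; the `prune_partitions` helper, the equality-only predicate, and the sample partition keys are all assumptions made for the example.

```python
from typing import Dict, List

Partition = Dict[str, str]  # partition column name -> partition value


def prune_partitions(partitions: List[Partition], predicate: Dict[str, str]) -> List[Partition]:
    """Keep only partitions whose values equal every key in the predicate."""
    return [
        p for p in partitions
        if all(p.get(column) == value for column, value in predicate.items())
    ]


partitions = [
    {"dt": "2019-04-01", "country": "us"},
    {"dt": "2019-04-01", "country": "cn"},
    {"dt": "2019-04-02", "country": "us"},
]

# Equivalent in spirit to: WHERE dt = '2019-04-01'
selected = prune_partitions(partitions, {"dt": "2019-04-01"})
```

Only the two `dt = 2019-04-01` partitions remain in `selected`, so a source following this idea would never open files under the third partition.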
  • 28. 2. Flink-Hive Integration - User defined functions and Version Compatibility ● Hive user defined functions ■ Supported Hive UDF ■ Working on supporting Hive GenericUDF, UDTF, UDAF ● Hive versions ■ Currently supports Hive 2.3.4 and 1.2.2 via shimming ■ Relies on Hive’s backward compatibility for 2.x and 1.x ● Working on direct support for more Hive versions, e.g. 2.1.1, 1.2.1
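The shimming approach on slide 28 can be pictured as choosing one adapter per Hive major version and relying on backward compatibility within that line. The sketch below is only a conceptual illustration: the `HiveShim` class, the `SHIMS` table, and `load_shim` are hypothetical names, and the real shim layer is not shown in the slides.

```python
class HiveShim:
    """Hypothetical adapter hiding version-specific metastore client calls."""

    def __init__(self, target_version: str):
        self.target_version = target_version


# Assumption: one shim per major line, built against the versions named on the slide.
SHIMS = {
    "1": HiveShim("1.2.2"),
    "2": HiveShim("2.3.4"),
}


def load_shim(hive_version: str) -> HiveShim:
    """Pick the shim for the cluster's Hive version by major version number."""
    major = hive_version.split(".")[0]
    try:
        return SHIMS[major]
    except KeyError:
        raise ValueError(f"unsupported Hive version: {hive_version}") from None


shim_for_2_1_1 = load_shim("2.1.1")
shim_for_1_2_1 = load_shim("1.2.1")
```

Under this sketch, a 2.1.1 cluster is served by the 2.3.4 shim through Hive's backward compatibility; direct support for more versions would amount to adding finer-grained entries.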
  • 29. Timeline First Targeted Flink release - 1.9.0, June 2019
  • 30. Demo with Flink SQL CLI • Query Hive Metadata • Create Hive Source/Sink with HiveCatalog to read/write data • Create CSV Source/Sink with GenericHiveMetastoreCatalog to read/write data
  • 31. This tremendous amount of work could not happen without help and support. Shout out to everyone in the community and our team who has been helping us with designs, code, feedback, etc.!
  • 32. Conclusions ● Flink is good at stream processing, but batch processing is equally important ● Flink has shown its potential in batch processing ● Flink/Hive integration benefits both communities ● This is a big effort ● We are taking a phased approach ● Your contribution is greatly welcome and appreciated!
  • 33. Call for sponsors Flink Forward China, Beijing, Dec 2019! All major Chinese tech companies will attend. Expected Attendees: 3,000+ Reach out to flink-forward-china@list.alibaba-inc.com for details!