Flink and Hive Integration
- Unifying Enterprise Data Processing Systems
Bowen Li
Committer@Flink, Senior Engineer@Alibaba
Flink Forward Europe, Oct 2019
Agenda
● Background
● Motivations and Impacts
● Flink 1.9 - State of the Union
● Flink 1.10 - What’s coming next?
● Q&A
Background
Flink aims at unifying data processing for streaming and batch use cases
● Batch is a special case of Streaming
○ bounded vs. unbounded data streams
● Unified and simpler tech stack, deployment and operation
● Smaller learning cost for end users
○ developers, data scientists, analysts, etc
Why integrate with Hive?
● Hive is the de facto standard for batch processing (ETL, analytics, etc) in enterprises
● Hive is widely adopted and has a huge user base
● Hive Metastore is the center of the big data ecosystem
● Hive users want lower latency and a near-real-time data warehouse
● Streaming users usually have a Hive deployment and need to access Hive data/metadata
Motivations and Impacts
● Strengthen Flink’s lead in stream processing by enhancing its metadata stack
● Advance Flink’s batch capabilities
● Provide a unified solution for stream and batch processing using SQL
● Enrich and extend Flink’s ecosystem
● Promote Flink’s adoption
Platform Level Integration
● Not another “Hive on Xxx”
● The integration is fully in Flink repo
● Released as part of Flink
Flink 1.9 - State of the Union
See the official documentation for more details:
https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/hive/
Integrate with Hive Metadata
We developed brand-new Catalog APIs to
● integrate Flink with Hive Metadata/Metastore
● completely reshape Flink’s metadata stack for both streaming and batch
FLIP-30 - Unified Catalog APIs
(https://cwiki.apache.org/confluence/display/FLINK/FLIP-30%3A+Unified+Catalog+APIs)
Catalog APIs
Catalog
● Meta-Objects
○ Database, Table, View, Partition, Functions, TableStats, and PartitionStats
● Operations
○ Get/List/Exist/Create/Alter/Rename/Drop
Namespace of Catalog Table/View/Function
● fully qualified namespace as <catalog_name>.<db_name>.<object_name>
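As a rough illustration of the meta-object operations listed above, here is a minimal in-memory sketch in Java. `TinyCatalog` and all of its methods are hypothetical names invented for this example; Flink's real `Catalog` interface covers more meta-objects and uses different signatures.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical, simplified stand-in for a catalog; not Flink's Catalog interface. */
final class TinyCatalog {
    private final String name;
    // db name -> (table name -> opaque schema string); TreeMap keeps listings sorted
    private final Map<String, Map<String, String>> databases = new TreeMap<>();

    TinyCatalog(String name) {
        this.name = name;
    }

    void createDatabase(String db) {
        if (databases.containsKey(db)) {
            throw new IllegalStateException("Database already exists: " + db);
        }
        databases.put(db, new TreeMap<>());
    }

    void createTable(String db, String table, String schema) {
        Map<String, String> tables = tablesOf(db);
        if (tables.containsKey(table)) {
            throw new IllegalStateException("Table already exists: " + table);
        }
        tables.put(table, schema);
    }

    boolean tableExists(String db, String table) {
        return databases.containsKey(db) && databases.get(db).containsKey(table);
    }

    void renameTable(String db, String oldName, String newName) {
        Map<String, String> tables = tablesOf(db);
        if (!tables.containsKey(oldName) || tables.containsKey(newName)) {
            throw new IllegalStateException("Cannot rename " + oldName + " to " + newName);
        }
        tables.put(newName, tables.remove(oldName));
    }

    void dropTable(String db, String table) {
        tablesOf(db).remove(table);
    }

    List<String> listTables(String db) {
        return new ArrayList<>(tablesOf(db).keySet());
    }

    /** Builds the fully qualified name <catalog_name>.<db_name>.<object_name>. */
    String fullName(String db, String table) {
        return name + "." + db + "." + table;
    }

    private Map<String, String> tablesOf(String db) {
        Map<String, String> tables = databases.get(db);
        if (tables == null) {
            throw new IllegalArgumentException("No such database: " + db);
        }
        return tables;
    }
}
```

The sketch only covers databases and tables; views, partitions, functions and statistics would follow the same Get/List/Exist/Create/Alter/Rename/Drop pattern.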
Roadmaps of Catalog APIs
Catalogs are pluggable and open opportunities for
● Catalog for Hive
● Catalog for Streams and MQ
○ Pulsar Catalog is in review
○ Kafka (Confluent Schema Registry), RabbitMQ, RocketMQ, etc
● Catalog for structured data
○ RDBMS like MySQL, etc
● Catalog for semi-structured data
○ ElasticSearch, HBase, Cassandra, etc
CatalogManager
● manage all registered catalogs and resolve objects
● default to current catalog and current database in parsing queries
select * from currentCatalog.currentDb.myTable
>>>=== can be simplified as ===>>>
select * from myTable
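The defaulting rule above can be sketched in a few lines of Java. `NameResolver` and `qualify` are hypothetical names for this illustration, not Flink's `CatalogManager` API, and the sketch assumes identifiers contain no escaped dots.

```java
/** Hypothetical helper that expands a table name to <catalog>.<db>.<table>. */
final class NameResolver {
    private final String currentCatalog;
    private final String currentDatabase;

    NameResolver(String currentCatalog, String currentDatabase) {
        this.currentCatalog = currentCatalog;
        this.currentDatabase = currentDatabase;
    }

    /** Fills in the current catalog and/or database for partially qualified names. */
    String qualify(String name) {
        String[] parts = name.split("\\.");
        switch (parts.length) {
            case 1: // myTable
                return currentCatalog + "." + currentDatabase + "." + parts[0];
            case 2: // myDb.myTable
                return currentCatalog + "." + parts[0] + "." + parts[1];
            case 3: // already fully qualified
                return name;
            default:
                throw new IllegalArgumentException("Too many name parts: " + name);
        }
    }
}
```

With `currentCatalog` and `currentDb` as the session defaults, `qualify("myTable")` yields `currentCatalog.currentDb.myTable`, which is why the two queries above are equivalent.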
Architecture
(Diagram components: SQL Client/Zeppelin; Table API and SQL; Catalog APIs; Query processing & optimization; Flink Runtime)
Catalogs
Flink 1.9 provides two catalog implementations out of the box.
● GenericInMemoryCatalog
○ in-memory non-persistent, per session, used by default
● HiveCatalog
○ compatible with multiple Hive versions
○ supports most Hive data types
○ can read/write Hive meta-objects
○ can persist Flink non-Hive streaming and batch meta-objects to Hive Metastore
■ e.g. kafka/pulsar tables
HiveCatalog
Flink-Hive interoperability: Flink can read/write Hive metadata through HiveCatalog
Flink can persist non-Hive metadata using Hive Metastore as storage via HiveCatalog
Design
Supported Hive Versions
● Officially support Hive 2.3.4 and 1.2.1
● Rely on Hive’s own compatibility for other 2.x and 1.x
Support Hive UDF
● Supports all Hive UDF interfaces via HiveCatalog
○ UDF
○ GenericUDF
○ UDTF
○ UDAF
○ GenericUDAFResolver
Hive Source and Sink
● Can read/write non-partitioned Hive tables
● Can read partitioned Hive tables
● Supports partition-pruning
● Supports text, SequenceFile, ORC, Parquet
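To make the partition-pruning point concrete, the idea can be sketched as filtering partition specs before any data is read. This is a conceptual illustration only: `PartitionPruning`, `prune` and `spec` are hypothetical names, the partition columns `dt` and `country` are made up, and the actual Hive source works on Hive partition metadata rather than plain maps.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical sketch of partition pruning; not Flink's Hive source implementation. */
final class PartitionPruning {

    /** Keeps only partitions whose spec satisfies the filter; the others are never scanned. */
    static List<Map<String, String>> prune(
            List<Map<String, String>> partitions,
            Predicate<Map<String, String>> filter) {
        List<Map<String, String>> kept = new ArrayList<>();
        for (Map<String, String> spec : partitions) {
            if (filter.test(spec)) {
                kept.add(spec);
            }
        }
        return kept;
    }

    /** Builds a partition spec for the assumed partition columns dt and country. */
    static Map<String, String> spec(String dt, String country) {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("dt", dt);
        m.put("country", country);
        return m;
    }
}
```

A query with a predicate such as `dt = '2019-10-08'` would correspond to calling `prune` with `s -> "2019-10-08".equals(s.get("dt"))`, so only the matching partitions are handed to the reader.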
Example - Table API
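import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.catalog.Catalog;
import org.apache.flink.table.catalog.ObjectPath;
import org.apache.flink.table.catalog.hive.HiveCatalog;

TableEnvironment tEnv = ...
// Register a HiveCatalog (catalog name, default database, Hive conf dir, Hive version)
tEnv.registerCatalog("myHive", new HiveCatalog("myHive", "myDb", "/opt/hive-conf/", "2.3.4"));
tEnv.useCatalog("myHive");
tEnv.useDatabase("myDb");
// Read Hive meta-objects
Catalog myHive = tEnv.getCatalog("myHive").get();
myHive.listDatabases();
myHive.listTables("myDb");
ObjectPath myTablePath = new ObjectPath("myDb", "myHiveTable");
myHive.getTable(myTablePath);
myHive.listPartitions(myTablePath);
// Query Hive data; the result is a Table that can be transformed further or written to a sink
Table result = tEnv.sqlQuery("select * from myHiveTable");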
SQL Client Example
// Register catalogs in sql-client-defaults.yaml
SQL CLI Example (cont’)
Flink SQL> SHOW CATALOGS;
myhive1
default_catalog
Flink SQL> SHOW DATABASES;
myDb
Flink SQL> USE myhive1.myDb;
Flink SQL> SHOW TABLES;
myTable
Flink SQL> DESCRIBE myHiveTable;
...
Flink SQL> SELECT * FROM myHiveTable;
...
Summary
● Integration with Hive was released in Flink 1.9 as a beta feature
● It lays the foundation for Flink’s integration with Hive
● This initiative led to extensive development and enhancement of Flink’s SQL stack and metadata management capabilities
Flink 1.10 - What’s coming next?
Support More Hive Versions
Supports all Hive 1.2, 2.0, 2.1, 2.2, 2.3, and 3.1 versions
● 1.2.0, 1.2.1
● 2.0.0, 2.0.1
● 2.1.0, 2.1.1
● 2.2.0
● 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.3.4, 2.3.5, 2.3.6
● 3.1.0, 3.1.1, 3.1.2
Hive Source and Sink Improvements
● HiveTableSource supports
○ projection pushdown
○ reading Hive views (in-progress)
● HiveTableSink supports
○ “INSERT OVERWRITE”
○ inserting into partitions, both dynamic and static
FLIP-57 Rework FunctionCatalog
● Problems to solve
○ Clarify and complete function categories
■ the current definition of temp functions is ambiguous
○ Enable referencing functions with qualified names across catalogs and databases
○ Redefine function resolution order
https://cwiki.apache.org/confluence/display/FLINK/FLIP-57%3A+Rework+FunctionCatalog
Introducing Temp System Function vs. Temp Catalog Function
Similarity:
● volatile, with a lifespan limited to a session
Differences:
● Temp System Functions
○ have no namespace, can be referenced anywhere by function name
○ can override system/built-in functions
○ “CREATE TEMPORARY SYSTEM FUNCTION …”
● Temp Catalog Functions
○ have catalog/db namespaces
○ can override catalog functions
○ “CREATE TEMPORARY FUNCTION …”
Introducing Ambiguous vs. Precise Function References
● Ambiguous function reference
○ with only function name
■ SELECT <func>(col) FROM T
○ new resolution order:
■ 1. temp system function
■ 2. system (built-in) function
■ 3. temp catalog function in current cat/db
■ 4. catalog function in current cat/db
● Precise function reference (NEW!)
○ with fully or partially qualified name
■ SELECT <cat>.<db>.<func>(col) FROM T
■ SELECT <db>.<func>(col) FROM T
○ enables cross-catalog/db function reference
○ resolution order:
■ 1. temp catalog function
■ 2. catalog function
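The two resolution orders can be sketched as plain map lookups. `FunctionResolver` and its fields are hypothetical names for this illustration and do not mirror Flink's `FunctionCatalog`; function implementations are represented by simple string labels.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch of the resolution orders described above. */
final class FunctionResolver {
    // Functions without a namespace, keyed by plain function name.
    final Map<String, String> tempSystem = new HashMap<>();
    final Map<String, String> system = new HashMap<>();
    // Functions with a namespace, keyed by "<cat>.<db>.<func>".
    final Map<String, String> tempCatalog = new HashMap<>();
    final Map<String, String> catalog = new HashMap<>();

    private final String currentCatalog;
    private final String currentDb;

    FunctionResolver(String currentCatalog, String currentDb) {
        this.currentCatalog = currentCatalog;
        this.currentDb = currentDb;
    }

    /** Ambiguous reference: only the function name is given. */
    Optional<String> resolveAmbiguous(String func) {
        String key = currentCatalog + "." + currentDb + "." + func;
        if (tempSystem.containsKey(func)) return Optional.of(tempSystem.get(func)); // 1
        if (system.containsKey(func)) return Optional.of(system.get(func));         // 2
        if (tempCatalog.containsKey(key)) return Optional.of(tempCatalog.get(key)); // 3
        return Optional.ofNullable(catalog.get(key));                               // 4
    }

    /** Precise reference: <cat>.<db>.<func>, or <db>.<func> within the current catalog. */
    Optional<String> resolvePrecise(String qualified) {
        String[] parts = qualified.split("\\.");
        String key;
        if (parts.length == 3) {
            key = qualified;
        } else if (parts.length == 2) {
            key = currentCatalog + "." + qualified;
        } else {
            throw new IllegalArgumentException("Expected a qualified name: " + qualified);
        }
        if (tempCatalog.containsKey(key)) return Optional.of(tempCatalog.get(key)); // 1
        return Optional.ofNullable(catalog.get(key));                               // 2
    }
}
```

Note how a precise reference never consults the system functions, which is what makes cross-catalog/db references unambiguous.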
FLIP-68 Extend Core Table System with Pluggable Modules
● Motivations:
○ Enable users to integrate Flink with cores and built-in objects of other systems
■ Supports Hive built-in functions via pluggable modules
■ To name a few more upcoming - Geo and Machine Learning modules
○ Empower users to write code and do customized development for Flink table core
https://cwiki.apache.org/confluence/display/FLINK/FLIP-68%3A+Extend+Core+Table+System+with+Modular+Plugins
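One way to picture pluggable modules is as named bundles of built-in functions that are consulted in turn. The Java sketch below is hypothetical throughout: `SketchModule`, `MapBackedModule` and `SketchModuleManager` are invented names, and the rule that the first loaded module wins on a name clash is an assumption of this example rather than something stated on the slides.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical module abstraction: a named bundle of built-in functions. */
interface SketchModule {
    String name();
    Optional<String> getFunction(String functionName);
}

/** Module backed by a map from function name to an implementation label. */
final class MapBackedModule implements SketchModule {
    private final String name;
    private final Map<String, String> functions;

    MapBackedModule(String name, Map<String, String> functions) {
        this.name = name;
        this.functions = new HashMap<>(functions);
    }

    @Override
    public String name() {
        return name;
    }

    @Override
    public Optional<String> getFunction(String functionName) {
        return Optional.ofNullable(functions.get(functionName));
    }
}

/** Keeps modules in load order and resolves a function from the first module that has it. */
final class SketchModuleManager {
    private final List<SketchModule> loaded = new ArrayList<>();

    void load(SketchModule module) {
        for (SketchModule m : loaded) {
            if (m.name().equals(module.name())) {
                throw new IllegalStateException("Module already loaded: " + module.name());
            }
        }
        loaded.add(module);
    }

    Optional<String> resolve(String functionName) {
        for (SketchModule m : loaded) {
            Optional<String> hit = m.getFunction(functionName);
            if (hit.isPresent()) {
                return hit; // assumption: earlier-loaded modules take precedence
            }
        }
        return Optional.empty();
    }
}
```

Under this sketch, loading a Hive module after the core one would expose Hive built-ins such as `get_json_object` without shadowing core functions of the same name.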
Design of Pluggable Modules
FLIP-69 DDL Enhancement in SQL and Table APIs
Full set of DDLs and other commands supported via **Unified** SQL parser
● CREATE/DROP/ALTER/RENAME
○ CATALOG/DATABASE/FUNCTION
○ TABLE
■ CREATE TABLE AS SELECT ...
User Facing Points
● SQL Client - already supports a few, but not through the SQL parser
● Table APIs - TableEnvironment#sqlUpdate("...") and sqlQuery("...")
https://cwiki.apache.org/confluence/display/FLINK/FLIP+69+-+Flink+SQL+DDL+Enhancement
Beyond Flink 1.10
● More Hive SQL compatibilities to minimize migration efforts
● More user custom objects, like serdes, storage handlers
● Performance optimization
● Feature parity with Hive (bucketing, etc)
● Enterprise readiness - security, governance
● Regular maintenance and releases
Thanks to Other Contributors
Xuefu Zhang, Rui Li, Terry Wang, Timo Walther, Dawid Wysakowicz, Kurt Young, Jingsong Lee, Jark Wu, etc
Conclusions
Flink’s integration with Hive
● helps Flink to realize its potential in batch processing
● is a critical step for Flink towards unified data processing
● reshapes Flink’s metadata management capabilities
● enhances Flink’s SQL stack
● brings a massive user base and lays the foundation for enterprise adoption
Thanks!
Twitter: @Bowen__Li

More Related Content

What's hot

It's Time To Stop Using Lambda Architecture | Yaroslav Tkachenko, Shopify
It's Time To Stop Using Lambda Architecture | Yaroslav Tkachenko, ShopifyIt's Time To Stop Using Lambda Architecture | Yaroslav Tkachenko, Shopify
It's Time To Stop Using Lambda Architecture | Yaroslav Tkachenko, ShopifyHostedbyConfluent
 
Flink Forward Berlin 2017 Keynote: Ferd Scheepers - Taking away customer fric...
Flink Forward Berlin 2017 Keynote: Ferd Scheepers - Taking away customer fric...Flink Forward Berlin 2017 Keynote: Ferd Scheepers - Taking away customer fric...
Flink Forward Berlin 2017 Keynote: Ferd Scheepers - Taking away customer fric...Flink Forward
 
Stream Processing using Apache Flink in Zalando's World of Microservices - Re...
Stream Processing using Apache Flink in Zalando's World of Microservices - Re...Stream Processing using Apache Flink in Zalando's World of Microservices - Re...
Stream Processing using Apache Flink in Zalando's World of Microservices - Re...Zalando Technology
 
Streaming in the Wild with Apache Flink
Streaming in the Wild with Apache FlinkStreaming in the Wild with Apache Flink
Streaming in the Wild with Apache FlinkKostas Tzoumas
 
Flink Forward San Francisco 2019: The Trade Desk's Year in Flink - Jonathan ...
Flink Forward San Francisco 2019: The Trade Desk's Year in Flink -  Jonathan ...Flink Forward San Francisco 2019: The Trade Desk's Year in Flink -  Jonathan ...
Flink Forward San Francisco 2019: The Trade Desk's Year in Flink - Jonathan ...Flink Forward
 
Virtual Flink Forward 2020: Production-Ready Flink and Hive Integration - wha...
Virtual Flink Forward 2020: Production-Ready Flink and Hive Integration - wha...Virtual Flink Forward 2020: Production-Ready Flink and Hive Integration - wha...
Virtual Flink Forward 2020: Production-Ready Flink and Hive Integration - wha...Flink Forward
 
Flink Forward San Francisco 2019: Massive Scale Data Processing at Netflix us...
Flink Forward San Francisco 2019: Massive Scale Data Processing at Netflix us...Flink Forward San Francisco 2019: Massive Scale Data Processing at Netflix us...
Flink Forward San Francisco 2019: Massive Scale Data Processing at Netflix us...Flink Forward
 
Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...
Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...
Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...Flink Forward
 
Kostas Tzoumas_Stephan Ewen - Keynote -The maturing data streaming ecosystem ...
Kostas Tzoumas_Stephan Ewen - Keynote -The maturing data streaming ecosystem ...Kostas Tzoumas_Stephan Ewen - Keynote -The maturing data streaming ecosystem ...
Kostas Tzoumas_Stephan Ewen - Keynote -The maturing data streaming ecosystem ...Flink Forward
 
Counting Elements in Streams
Counting Elements in StreamsCounting Elements in Streams
Counting Elements in StreamsJamie Grier
 
Kafka Tiered Storage | Satish Duggana and Sriharsha Chintalapani, Uber
Kafka Tiered Storage | Satish Duggana and Sriharsha Chintalapani, UberKafka Tiered Storage | Satish Duggana and Sriharsha Chintalapani, Uber
Kafka Tiered Storage | Satish Duggana and Sriharsha Chintalapani, UberHostedbyConfluent
 
Case Study: Stream Processing on AWS using Kappa Architecture
Case Study: Stream Processing on AWS using Kappa ArchitectureCase Study: Stream Processing on AWS using Kappa Architecture
Case Study: Stream Processing on AWS using Kappa ArchitectureJoey Bolduc-Gilbert
 
Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...
Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...
Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...HostedbyConfluent
 
Maximilian Michels - Flink and Beam
Maximilian Michels - Flink and BeamMaximilian Michels - Flink and Beam
Maximilian Michels - Flink and BeamFlink Forward
 
It's Time To Stop Using Lambda Architecture
It's Time To Stop Using Lambda ArchitectureIt's Time To Stop Using Lambda Architecture
It's Time To Stop Using Lambda ArchitectureYaroslav Tkachenko
 
Streaming your Lyft Ride Prices - Flink Forward SF 2019
Streaming your Lyft Ride Prices - Flink Forward SF 2019Streaming your Lyft Ride Prices - Flink Forward SF 2019
Streaming your Lyft Ride Prices - Flink Forward SF 2019Thomas Weise
 
Scaling stream data pipelines with Pravega and Apache Flink
Scaling stream data pipelines with Pravega and Apache FlinkScaling stream data pipelines with Pravega and Apache Flink
Scaling stream data pipelines with Pravega and Apache FlinkTill Rohrmann
 
Flink Forward Berlin 2017: Zohar Mizrahi - Python Streaming API
Flink Forward Berlin 2017: Zohar Mizrahi - Python Streaming APIFlink Forward Berlin 2017: Zohar Mizrahi - Python Streaming API
Flink Forward Berlin 2017: Zohar Mizrahi - Python Streaming APIFlink Forward
 
Flink Forward San Francisco 2019: Building production Flink jobs with Airstre...
Flink Forward San Francisco 2019: Building production Flink jobs with Airstre...Flink Forward San Francisco 2019: Building production Flink jobs with Airstre...
Flink Forward San Francisco 2019: Building production Flink jobs with Airstre...Flink Forward
 
2018-01 Seattle Apache Flink Meetup at OfferUp, Opening Remarks and Talk 2
2018-01 Seattle Apache Flink Meetup at OfferUp, Opening Remarks and Talk 22018-01 Seattle Apache Flink Meetup at OfferUp, Opening Remarks and Talk 2
2018-01 Seattle Apache Flink Meetup at OfferUp, Opening Remarks and Talk 2Ververica
 

What's hot (20)

It's Time To Stop Using Lambda Architecture | Yaroslav Tkachenko, Shopify
It's Time To Stop Using Lambda Architecture | Yaroslav Tkachenko, ShopifyIt's Time To Stop Using Lambda Architecture | Yaroslav Tkachenko, Shopify
It's Time To Stop Using Lambda Architecture | Yaroslav Tkachenko, Shopify
 
Flink Forward Berlin 2017 Keynote: Ferd Scheepers - Taking away customer fric...
Flink Forward Berlin 2017 Keynote: Ferd Scheepers - Taking away customer fric...Flink Forward Berlin 2017 Keynote: Ferd Scheepers - Taking away customer fric...
Flink Forward Berlin 2017 Keynote: Ferd Scheepers - Taking away customer fric...
 
Stream Processing using Apache Flink in Zalando's World of Microservices - Re...
Stream Processing using Apache Flink in Zalando's World of Microservices - Re...Stream Processing using Apache Flink in Zalando's World of Microservices - Re...
Stream Processing using Apache Flink in Zalando's World of Microservices - Re...
 
Streaming in the Wild with Apache Flink
Streaming in the Wild with Apache FlinkStreaming in the Wild with Apache Flink
Streaming in the Wild with Apache Flink
 
Flink Forward San Francisco 2019: The Trade Desk's Year in Flink - Jonathan ...
Flink Forward San Francisco 2019: The Trade Desk's Year in Flink -  Jonathan ...Flink Forward San Francisco 2019: The Trade Desk's Year in Flink -  Jonathan ...
Flink Forward San Francisco 2019: The Trade Desk's Year in Flink - Jonathan ...
 
Virtual Flink Forward 2020: Production-Ready Flink and Hive Integration - wha...
Virtual Flink Forward 2020: Production-Ready Flink and Hive Integration - wha...Virtual Flink Forward 2020: Production-Ready Flink and Hive Integration - wha...
Virtual Flink Forward 2020: Production-Ready Flink and Hive Integration - wha...
 
Flink Forward San Francisco 2019: Massive Scale Data Processing at Netflix us...
Flink Forward San Francisco 2019: Massive Scale Data Processing at Netflix us...Flink Forward San Francisco 2019: Massive Scale Data Processing at Netflix us...
Flink Forward San Francisco 2019: Massive Scale Data Processing at Netflix us...
 
Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...
Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...
Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...
 
Kostas Tzoumas_Stephan Ewen - Keynote -The maturing data streaming ecosystem ...
Kostas Tzoumas_Stephan Ewen - Keynote -The maturing data streaming ecosystem ...Kostas Tzoumas_Stephan Ewen - Keynote -The maturing data streaming ecosystem ...
Kostas Tzoumas_Stephan Ewen - Keynote -The maturing data streaming ecosystem ...
 
Counting Elements in Streams
Counting Elements in StreamsCounting Elements in Streams
Counting Elements in Streams
 
Kafka Tiered Storage | Satish Duggana and Sriharsha Chintalapani, Uber
Kafka Tiered Storage | Satish Duggana and Sriharsha Chintalapani, UberKafka Tiered Storage | Satish Duggana and Sriharsha Chintalapani, Uber
Kafka Tiered Storage | Satish Duggana and Sriharsha Chintalapani, Uber
 
Case Study: Stream Processing on AWS using Kappa Architecture
Case Study: Stream Processing on AWS using Kappa ArchitectureCase Study: Stream Processing on AWS using Kappa Architecture
Case Study: Stream Processing on AWS using Kappa Architecture
 
Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...
Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...
Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...
 
Maximilian Michels - Flink and Beam
Maximilian Michels - Flink and BeamMaximilian Michels - Flink and Beam
Maximilian Michels - Flink and Beam
 
It's Time To Stop Using Lambda Architecture
It's Time To Stop Using Lambda ArchitectureIt's Time To Stop Using Lambda Architecture
It's Time To Stop Using Lambda Architecture
 
Streaming your Lyft Ride Prices - Flink Forward SF 2019
Streaming your Lyft Ride Prices - Flink Forward SF 2019Streaming your Lyft Ride Prices - Flink Forward SF 2019
Streaming your Lyft Ride Prices - Flink Forward SF 2019
 
Scaling stream data pipelines with Pravega and Apache Flink
Scaling stream data pipelines with Pravega and Apache FlinkScaling stream data pipelines with Pravega and Apache Flink
Scaling stream data pipelines with Pravega and Apache Flink
 
Flink Forward Berlin 2017: Zohar Mizrahi - Python Streaming API
Flink Forward Berlin 2017: Zohar Mizrahi - Python Streaming APIFlink Forward Berlin 2017: Zohar Mizrahi - Python Streaming API
Flink Forward Berlin 2017: Zohar Mizrahi - Python Streaming API
 
Flink Forward San Francisco 2019: Building production Flink jobs with Airstre...
Flink Forward San Francisco 2019: Building production Flink jobs with Airstre...Flink Forward San Francisco 2019: Building production Flink jobs with Airstre...
Flink Forward San Francisco 2019: Building production Flink jobs with Airstre...
 
2018-01 Seattle Apache Flink Meetup at OfferUp, Opening Remarks and Talk 2
2018-01 Seattle Apache Flink Meetup at OfferUp, Opening Remarks and Talk 22018-01 Seattle Apache Flink Meetup at OfferUp, Opening Remarks and Talk 2
2018-01 Seattle Apache Flink Meetup at OfferUp, Opening Remarks and Talk 2
 

Similar to Flink and Hive integration - unifying enterprise data processing systems

Integrating Flink with Hive - Flink Forward SF 2019
Integrating Flink with Hive - Flink Forward SF 2019Integrating Flink with Hive - Flink Forward SF 2019
Integrating Flink with Hive - Flink Forward SF 2019Bowen Li
 
Integrating Flink with Hive, Seattle Flink Meetup, Feb 2019
Integrating Flink with Hive, Seattle Flink Meetup, Feb 2019Integrating Flink with Hive, Seattle Flink Meetup, Feb 2019
Integrating Flink with Hive, Seattle Flink Meetup, Feb 2019Bowen Li
 
Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Flink Forward
 
MOUG17: SQLT Utility for Tuning - Practical Examples
MOUG17: SQLT Utility for Tuning - Practical ExamplesMOUG17: SQLT Utility for Tuning - Practical Examples
MOUG17: SQLT Utility for Tuning - Practical ExamplesMonica Li
 
MOUG17: DB Security; Secure your Data
MOUG17: DB Security; Secure your DataMOUG17: DB Security; Secure your Data
MOUG17: DB Security; Secure your DataMonica Li
 
Migrating to Spark 2.0 - Part 2
Migrating to Spark 2.0 - Part 2Migrating to Spark 2.0 - Part 2
Migrating to Spark 2.0 - Part 2datamantra
 
Fast federated SQL with Apache Calcite
Fast federated SQL with Apache CalciteFast federated SQL with Apache Calcite
Fast federated SQL with Apache CalciteChris Baynes
 
Flink 2.0: Navigating the Future of Unified Stream and Batch Processing
Flink 2.0: Navigating the Future of Unified Stream and Batch ProcessingFlink 2.0: Navigating the Future of Unified Stream and Batch Processing
Flink 2.0: Navigating the Future of Unified Stream and Batch ProcessingHostedbyConfluent
 
Intro to InfluxDB 2.0 and Your First Flux Query by Sonia Gupta
Intro to InfluxDB 2.0 and Your First Flux Query by Sonia GuptaIntro to InfluxDB 2.0 and Your First Flux Query by Sonia Gupta
Intro to InfluxDB 2.0 and Your First Flux Query by Sonia GuptaInfluxData
 
Flink SQL & TableAPI in Large Scale Production at Alibaba
Flink SQL & TableAPI in Large Scale Production at AlibabaFlink SQL & TableAPI in Large Scale Production at Alibaba
Flink SQL & TableAPI in Large Scale Production at AlibabaDataWorks Summit
 
Corporate-informatica-training-in-mumbai
Corporate-informatica-training-in-mumbaiCorporate-informatica-training-in-mumbai
Corporate-informatica-training-in-mumbaiUnmesh Baile
 
Corporate-informatica-training-in-mumbai
Corporate-informatica-training-in-mumbaiCorporate-informatica-training-in-mumbai
Corporate-informatica-training-in-mumbaiUnmesh Baile
 
ELK Ruminating on Logs (Zendcon 2016)
ELK Ruminating on Logs (Zendcon 2016)ELK Ruminating on Logs (Zendcon 2016)
ELK Ruminating on Logs (Zendcon 2016)Mathew Beane
 
Dataweave Libraries and ObjectStore
Dataweave Libraries and ObjectStoreDataweave Libraries and ObjectStore
Dataweave Libraries and ObjectStoreVikalp Bhalia
 
Kettle: Pentaho Data Integration tool
Kettle: Pentaho Data Integration toolKettle: Pentaho Data Integration tool
Kettle: Pentaho Data Integration toolAlex Rayón Jerez
 
MLflow with Databricks
MLflow with DatabricksMLflow with Databricks
MLflow with DatabricksLiangjun Jiang
 
Mlflow with databricks
Mlflow with databricksMlflow with databricks
Mlflow with databricksLiangjun Jiang
 
KubeFlow + GPU + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + PyTo...
KubeFlow + GPU + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + PyTo...KubeFlow + GPU + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + PyTo...
KubeFlow + GPU + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + PyTo...Chris Fregly
 
OSMC 2021 | Monitoring Open Infrastructure Logs – With Real Life Examples
OSMC 2021 | Monitoring Open Infrastructure Logs – With Real Life ExamplesOSMC 2021 | Monitoring Open Infrastructure Logs – With Real Life Examples
OSMC 2021 | Monitoring Open Infrastructure Logs – With Real Life ExamplesNETWAYS
 

Similar to Flink and Hive integration - unifying enterprise data processing systems (20)

Integrating Flink with Hive - Flink Forward SF 2019
Integrating Flink with Hive - Flink Forward SF 2019Integrating Flink with Hive - Flink Forward SF 2019
Integrating Flink with Hive - Flink Forward SF 2019
 
Integrating Flink with Hive, Seattle Flink Meetup, Feb 2019
Integrating Flink with Hive, Seattle Flink Meetup, Feb 2019Integrating Flink with Hive, Seattle Flink Meetup, Feb 2019
Integrating Flink with Hive, Seattle Flink Meetup, Feb 2019
 
Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...
 
MOUG17: SQLT Utility for Tuning - Practical Examples
MOUG17: SQLT Utility for Tuning - Practical ExamplesMOUG17: SQLT Utility for Tuning - Practical Examples
MOUG17: SQLT Utility for Tuning - Practical Examples
 
MOUG17: DB Security; Secure your Data
MOUG17: DB Security; Secure your DataMOUG17: DB Security; Secure your Data
MOUG17: DB Security; Secure your Data
 
Migrating to Spark 2.0 - Part 2
Migrating to Spark 2.0 - Part 2Migrating to Spark 2.0 - Part 2
Migrating to Spark 2.0 - Part 2
 
Fast federated SQL with Apache Calcite
Fast federated SQL with Apache CalciteFast federated SQL with Apache Calcite
Fast federated SQL with Apache Calcite
 
Flink 2.0: Navigating the Future of Unified Stream and Batch Processing
Flink 2.0: Navigating the Future of Unified Stream and Batch ProcessingFlink 2.0: Navigating the Future of Unified Stream and Batch Processing
Flink 2.0: Navigating the Future of Unified Stream and Batch Processing
 
Intro to InfluxDB 2.0 and Your First Flux Query by Sonia Gupta
Intro to InfluxDB 2.0 and Your First Flux Query by Sonia GuptaIntro to InfluxDB 2.0 and Your First Flux Query by Sonia Gupta
Intro to InfluxDB 2.0 and Your First Flux Query by Sonia Gupta
 
Flink SQL & TableAPI in Large Scale Production at Alibaba
Flink SQL & TableAPI in Large Scale Production at AlibabaFlink SQL & TableAPI in Large Scale Production at Alibaba
Flink SQL & TableAPI in Large Scale Production at Alibaba
 
Corporate-informatica-training-in-mumbai
Corporate-informatica-training-in-mumbaiCorporate-informatica-training-in-mumbai
Corporate-informatica-training-in-mumbai
 
Corporate-informatica-training-in-mumbai
Corporate-informatica-training-in-mumbaiCorporate-informatica-training-in-mumbai
Corporate-informatica-training-in-mumbai
 
Java one2013
Java one2013Java one2013
Java one2013
 
ELK Ruminating on Logs (Zendcon 2016)
ELK Ruminating on Logs (Zendcon 2016)ELK Ruminating on Logs (Zendcon 2016)
ELK Ruminating on Logs (Zendcon 2016)
 
Dataweave Libraries and ObjectStore
Dataweave Libraries and ObjectStoreDataweave Libraries and ObjectStore
Dataweave Libraries and ObjectStore
 
Kettle: Pentaho Data Integration tool
Kettle: Pentaho Data Integration toolKettle: Pentaho Data Integration tool
Kettle: Pentaho Data Integration tool
 
MLflow with Databricks
MLflow with DatabricksMLflow with Databricks
MLflow with Databricks
 
Mlflow with databricks
Mlflow with databricksMlflow with databricks
Mlflow with databricks
 
KubeFlow + GPU + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + PyTo...
KubeFlow + GPU + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + PyTo...KubeFlow + GPU + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + PyTo...
KubeFlow + GPU + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + PyTo...
 
OSMC 2021 | Monitoring Open Infrastructure Logs – With Real Life Examples
OSMC 2021 | Monitoring Open Infrastructure Logs – With Real Life ExamplesOSMC 2021 | Monitoring Open Infrastructure Logs – With Real Life Examples
OSMC 2021 | Monitoring Open Infrastructure Logs – With Real Life Examples
 

More from Bowen Li

Apache Flink 101 - the rise of stream processing and beyond
Apache Flink 101 - the rise of stream processing and beyondApache Flink 101 - the rise of stream processing and beyond
Apache Flink 101 - the rise of stream processing and beyondBowen Li
 
How to contribute to Apache Flink @ Seattle Flink meetup
How to contribute to Apache Flink @ Seattle Flink meetupHow to contribute to Apache Flink @ Seattle Flink meetup
How to contribute to Apache Flink @ Seattle Flink meetupBowen Li
 
Community update on flink 1.9 and How to Contribute to Flink
Community update on flink 1.9 and How to Contribute to FlinkCommunity update on flink 1.9 and How to Contribute to Flink
Community update on flink 1.9 and How to Contribute to FlinkBowen Li
 
AthenaX - Unified Stream & Batch Processing using SQL at Uber, Zhenqiu Huang,...
AthenaX - Unified Stream & Batch Processing using SQL at Uber, Zhenqiu Huang,...AthenaX - Unified Stream & Batch Processing using SQL at Uber, Zhenqiu Huang,...
AthenaX - Unified Stream & Batch Processing using SQL at Uber, Zhenqiu Huang,...Bowen Li
 
Community and Meetup Update, Seattle Flink Meetup, Feb 2019
Community and Meetup Update, Seattle Flink Meetup, Feb 2019Community and Meetup Update, Seattle Flink Meetup, Feb 2019
Community and Meetup Update, Seattle Flink Meetup, Feb 2019Bowen Li
 
Status Update of Seattle Flink Meetup, Jun 2018
Status Update of Seattle Flink Meetup, Jun 2018Status Update of Seattle Flink Meetup, Jun 2018
Status Update of Seattle Flink Meetup, Jun 2018Bowen Li
 
Streaming at Lyft, Gregory Fee, Seattle Flink Meetup, Jun 2018
Streaming at Lyft, Gregory Fee, Seattle Flink Meetup, Jun 2018Streaming at Lyft, Gregory Fee, Seattle Flink Meetup, Jun 2018
Streaming at Lyft, Gregory Fee, Seattle Flink Meetup, Jun 2018Bowen Li
 
Approximate queries and graph streams on Flink, theodore vasiloudis, seattle...
Approximate queries and graph streams on Flink, theodore vasiloudis,  seattle...Approximate queries and graph streams on Flink, theodore vasiloudis,  seattle...
Approximate queries and graph streams on Flink, theodore vasiloudis, seattle...Bowen Li
 
Stream processing with Apache Flink @ OfferUp
Stream processing with Apache Flink @ OfferUpStream processing with Apache Flink @ OfferUp
Stream processing with Apache Flink @ OfferUpBowen Li
 
Apache Flink @ Alibaba - Seattle Apache Flink Meetup
Apache Flink @ Alibaba - Seattle Apache Flink MeetupApache Flink @ Alibaba - Seattle Apache Flink Meetup
Apache Flink @ Alibaba - Seattle Apache Flink MeetupBowen Li
 
Opening - Seattle Apache Flink Meetup
Opening - Seattle Apache Flink MeetupOpening - Seattle Apache Flink Meetup
Opening - Seattle Apache Flink MeetupBowen Li
 


Flink and Hive integration - unifying enterprise data processing systems

  • 1. Flink and Hive Integration - Unifying Enterprise Data Processing Systems Bowen Li Committer@Flink, Senior Engineer@Alibaba Flink Forward Europe, Oct 2019
  • 2. Agenda ● Background ● Motivations and Impacts ● Flink 1.9 - State of the Union ● Flink 1.10 - What’s upcoming next? ● Q&A
  • 3. Background Flink aims at unifying data processing for streaming and batch use cases ● Batch is a special case of Streaming ○ bounded vs. unbounded data streams ● Unified and simpler tech stack, deployment and operation ● Smaller learning cost for end users ○ developers, data scientists, analysts, etc
  • 4. Why integrate with Hive? ● Hive is the de facto standard for batch processing (ETL, analytics, etc) in enterprises ● Hive is widely adopted and has a huge user base ● Hive metastore is the center of the big data ecosystem ● Hive users want lower latency and a near-real-time data warehouse ● Streaming users usually have a Hive deployment and need to access Hive data/metadata
  • 5. Motivations and Impacts ● Strengthen Flink’s lead in stream processing by enhancing its metadata stack ● Advance Flink’s batch capabilities ● Provide a unified solution for stream and batch processing using SQL ● Enrich and extend Flink’s ecosystem ● Promote Flink’s adoption
  • 6. Platform Level Integration ● Not another “Hive on Xxx” ● The integration is fully in Flink repo ● Released as part of Flink
  • 7. Flink 1.9 - State of the Union See the official documentation for more details: https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/hive/
  • 8. Integrate with Hive Metadata We developed brand-new Catalog APIs to ● integrate Flink with Hive Metadata/Metastore ● completely reshape Flink’s metadata stack for both streaming and batch FLIP-30 - Unified Catalog APIs (https://cwiki.apache.org/confluence/display/FLINK/FLIP-30%3A+Unified+Catalog+APIs)
  • 9. Catalog APIs Catalog ● Meta-Objects ○ Database, Table, View, Partition, Functions, TableStats, and PartitionStats ● Operations ○ Get/List/Exist/Create/Alter/Rename/Drop Namespace of Catalog Table/View/Function ● fully qualified namespace as <catalog_name>.<db_name>.<object_name>
  • 10. Roadmap of Catalog APIs Catalogs are pluggable and open opportunities for ● Catalog for Hive ● Catalog for Streams and MQ ○ Pulsar Catalog is in review ○ Kafka (Confluent Schema Registry), RabbitMQ, RocketMQ, etc ● Catalog for structured data ○ RDBMS like MySQL, etc ● Catalogs for semi-structured data ○ ElasticSearch, HBase, Cassandra, etc
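As a rough illustration of what "pluggable" means here, the sketch below defines a tiny catalog interface and one in-memory implementation. The names `MiniCatalog` and `InMemoryMiniCatalog` and their methods are hypothetical simplifications made up for this example; Flink's real `Catalog` interface covers many more meta-objects and operations.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified stand-in for a catalog API (not Flink's real interface).
interface MiniCatalog {
    void createDatabase(String db);
    void createTable(String db, String table);
    boolean tableExists(String db, String table);
    void dropTable(String db, String table);
    List<String> listDatabases();
    List<String> listTables(String db);
}

// In-memory, non-persistent implementation: its contents vanish with the session.
class InMemoryMiniCatalog implements MiniCatalog {
    // database name (sorted) -> table names in creation order
    private final Map<String, List<String>> databases = new TreeMap<>();

    @Override
    public void createDatabase(String db) {
        databases.putIfAbsent(db, new ArrayList<>());
    }

    @Override
    public void createTable(String db, String table) {
        List<String> tables = requireDatabase(db);
        if (!tables.contains(table)) {
            tables.add(table);
        }
    }

    @Override
    public boolean tableExists(String db, String table) {
        List<String> tables = databases.get(db);
        return tables != null && tables.contains(table);
    }

    @Override
    public void dropTable(String db, String table) {
        requireDatabase(db).remove(table);
    }

    @Override
    public List<String> listDatabases() {
        return new ArrayList<>(databases.keySet());
    }

    @Override
    public List<String> listTables(String db) {
        return new ArrayList<>(requireDatabase(db));
    }

    private List<String> requireDatabase(String db) {
        List<String> tables = databases.get(db);
        if (tables == null) {
            throw new IllegalArgumentException("Database does not exist: " + db);
        }
        return tables;
    }

    public static void main(String[] args) {
        MiniCatalog catalog = new InMemoryMiniCatalog();
        catalog.createDatabase("myDb");
        catalog.createTable("myDb", "myTable");
        System.out.println(catalog.listTables("myDb"));
    }
}
```

A Hive-backed or message-queue-backed catalog would implement the same interface against an external metadata store, which is the point of keeping the interface separate from the storage.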
  • 11. CatalogManager ● manages all registered catalogs and resolves objects ● defaults to the current catalog and current database when parsing queries ○ select * from currentCatalog.currentDb.myTable can be simplified to select * from myTable
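The defaulting rule can be sketched as a small name resolver. `NameResolver` is a hypothetical helper written for this example, not a Flink class, and it ignores quoting and escaping of identifiers.

```java
// Hypothetical helper: expands a partially qualified object name using the
// session's current catalog and current database.
class NameResolver {
    private final String currentCatalog;
    private final String currentDatabase;

    NameResolver(String currentCatalog, String currentDatabase) {
        this.currentCatalog = currentCatalog;
        this.currentDatabase = currentDatabase;
    }

    // Accepts "obj", "db.obj", or "cat.db.obj" and returns "cat.db.obj".
    String resolve(String identifier) {
        String[] parts = identifier.split("\\.");
        switch (parts.length) {
            case 1:
                return String.join(".", currentCatalog, currentDatabase, parts[0]);
            case 2:
                return String.join(".", currentCatalog, parts[0], parts[1]);
            case 3:
                return identifier;
            default:
                throw new IllegalArgumentException(
                        "Expected at most three name parts: " + identifier);
        }
    }

    public static void main(String[] args) {
        NameResolver resolver = new NameResolver("currentCatalog", "currentDb");
        System.out.println(resolver.resolve("myTable"));
    }
}
```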
  • 12. Architecture ● Flink Runtime ● Query processing & optimization ● Table API and SQL ● Catalog APIs ● SQL Client/Zeppelin
  • 13. Catalogs Flink 1.9 provides two catalog implementations out of the box. ● GenericInMemoryCatalog ○ in-memory, non-persistent, per session, used by default ● HiveCatalog ○ compatible with multiple Hive versions ○ supports most Hive data types ○ can read/write Hive meta-objects ○ can persist Flink non-Hive streaming and batch meta-objects to Hive Metastore ■ e.g. Kafka/Pulsar tables
  • 14. HiveCatalog Flink-Hive interoperability: ● Flink can read/write Hive metadata through HiveCatalog ● Flink can persist non-Hive metadata using Hive Metastore as storage via HiveCatalog
  • 16. Supported Hive Versions ● Officially supports Hive 2.3.4 and 1.2.1 ● Relies on Hive’s own compatibility for other 2.x and 1.x versions
  • 17. Support Hive UDF ● Supports all Hive UDF interfaces via HiveCatalog ○ UDF ○ GenericUDF ○ UDTF ○ UDAF ○ GenericUDAFResolver
  • 18. Hive Source and Sink ● Can read/write non-partitioned Hive tables ● Can read partitioned Hive tables ● Supports partition pruning ● Supports text, SequenceFile, ORC, Parquet
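Partition pruning means the source skips partitions whose partition-column values cannot match the query's filter, so only the matching partitions are read. The sketch below shows the idea for simple equality filters. `PartitionPruner` is a hypothetical class for illustration and is not how Flink's Hive source is implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of partition pruning with equality filters only.
class PartitionPruner {
    // Keeps only the partitions whose spec contains every key/value pair of the filter.
    static List<Map<String, String>> prune(
            List<Map<String, String>> partitions, Map<String, String> equalityFilter) {
        List<Map<String, String>> kept = new ArrayList<>();
        for (Map<String, String> spec : partitions) {
            if (spec.entrySet().containsAll(equalityFilter.entrySet())) {
                kept.add(spec);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, String> day1 = new HashMap<>();
        day1.put("dt", "2019-10-01");
        Map<String, String> day2 = new HashMap<>();
        day2.put("dt", "2019-10-02");

        // Corresponds to a query such as: SELECT ... WHERE dt = '2019-10-02'
        Map<String, String> filter = Collections.singletonMap("dt", "2019-10-02");
        System.out.println(prune(Arrays.asList(day1, day2), filter));
    }
}
```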
  • 19. Example - Table API
        TableEnvironment tEnv = ...
        // Register a HiveCatalog and make it the current catalog
        tEnv.registerCatalog(new HiveCatalog("myHive", "/opt/hive-conf/"));
        tEnv.useCatalog("myHive");
        tEnv.useDatabase("myDb");
        // Read Hive meta-objects
        Catalog myHive = tEnv.getCatalog("myHive").get();
        myHive.listDatabases();
        myHive.listTables("myDb");
        ObjectPath myTablePath = new ObjectPath("myDb", "myHiveTable");
        myHive.getTable(myTablePath);
        myHive.listPartitions(myTablePath);
        // Query Hive data
        tEnv.sqlQuery("select * from myHiveTable").print();
  • 20. SQL Client Example: register catalogs in sql-client-defaults.yaml
  • 21. SQL CLI Example (cont’d)
        Flink SQL> SHOW CATALOGS;
        myhive1
        default_catalog
        Flink SQL> SHOW DATABASES;
        myDb
        Flink SQL> USE myhive1.myDb;
        Flink SQL> SHOW TABLES;
        myHiveTable
        Flink SQL> DESCRIBE myHiveTable;
        ...
        Flink SQL> SELECT * FROM myHiveTable;
        ...
  • 22. Summary ● Integration with Hive was released in Flink 1.9 as Beta ● It lays the foundation for Flink’s integration with Hive ● This initiative led us to vast development and enhancement of Flink’s SQL stack and metadata management capabilities
  • 23. Flink 1.10 - What’s upcoming next?
  • 24. Support More Hive Versions Supports all Hive 1.2, 2.0, 2.1, 2.2, 2.3, and 3.1 versions ● 1.2.0, 1.2.1 ● 2.0.0, 2.0.1 ● 2.1.0, 2.1.1 ● 2.2.0 ● 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.3.4, 2.3.5, 2.3.6 ● 3.1.0, 3.1.1, 3.1.2
  • 25. Hive Source and Sink Improvements ● HiveTableSource supports ○ projection pushdown ○ reading Hive views (in progress) ● HiveTableSink supports ○ “INSERT OVERWRITE” ○ inserting into partitions, both dynamic and static
  • 26. FLIP-57 Rework FunctionCatalog ● Problems to solve ○ Clarify and complete function categories ■ the current definition of temp functions is ambiguous ○ Enable referencing functions with qualified names across catalogs and databases ○ Redefine function resolution order https://cwiki.apache.org/confluence/display/FLINK/FLIP-57%3A+Rework+FunctionCatalog
  • 27. Introducing Temp System Function vs. Temp Catalog Function Similarity: ● volatile, with a lifespan within a session Differences: ● Temp System Functions ○ have no namespace, can be referenced anywhere by function name ○ can override system/built-in functions ○ “CREATE TEMPORARY SYSTEM FUNCTION …” ● Temp Catalog Functions ○ have catalog/db namespaces ○ can override catalog functions ○ “CREATE TEMPORARY FUNCTION …”
  • 28. Introducing Ambiguous vs. Precise Function References ● Ambiguous function reference ○ with only function name ○ SELECT <func>(col) FROM T ○ new resolution order: 1. temp system function 2. system (built-in) function 3. temp catalog function in current cat/db 4. catalog function in current cat/db ● Precise function reference (NEW!) ○ with fully or partially qualified name ○ SELECT <cat>.<db>.<func>(col) FROM T ○ SELECT <db>.<func>(col) FROM T ○ enables cross-catalog/db function reference ○ resolution order: 1. temp catalog function 2. catalog function
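The two resolution orders on this slide can be sketched as a chain of lookups. `FunctionResolver` and its four maps are hypothetical names for this example. The real FunctionCatalog works on function definitions rather than string labels, and details such as name normalization are left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of the FLIP-57 resolution orders; values are plain labels.
class FunctionResolver {
    final Map<String, String> tempSystemFunctions = new HashMap<>();  // key: func
    final Map<String, String> systemFunctions = new HashMap<>();      // key: func
    final Map<String, String> tempCatalogFunctions = new HashMap<>(); // key: cat.db.func
    final Map<String, String> catalogFunctions = new HashMap<>();     // key: cat.db.func

    // Ambiguous reference: only the function name is given.
    Optional<String> resolveAmbiguous(String name, String currentCatalog, String currentDb) {
        if (tempSystemFunctions.containsKey(name)) {
            return Optional.of(tempSystemFunctions.get(name)); // 1. temp system function
        }
        if (systemFunctions.containsKey(name)) {
            return Optional.of(systemFunctions.get(name));     // 2. system (built-in) function
        }
        // 3. and 4. fall back to the current catalog and database.
        return resolvePrecise(currentCatalog + "." + currentDb + "." + name);
    }

    // Precise reference: the name is already qualified as cat.db.func.
    Optional<String> resolvePrecise(String qualifiedName) {
        if (tempCatalogFunctions.containsKey(qualifiedName)) {
            return Optional.of(tempCatalogFunctions.get(qualifiedName)); // temp catalog function
        }
        return Optional.ofNullable(catalogFunctions.get(qualifiedName)); // catalog function
    }

    public static void main(String[] args) {
        FunctionResolver resolver = new FunctionResolver();
        resolver.systemFunctions.put("upper", "built-in upper");
        resolver.catalogFunctions.put("myCat.myDb.upper", "catalog upper");
        System.out.println(resolver.resolveAmbiguous("upper", "myCat", "myDb"));
        System.out.println(resolver.resolvePrecise("myCat.myDb.upper"));
    }
}
```

Note how a precise reference reaches a catalog function even when a built-in function with the same name would win an ambiguous lookup.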
  • 29. FLIP-68 Extend Core Table System with Pluggable Modules ● Motivations: ○ Enable users to integrate Flink with cores and built-in objects of other systems ■ Supports Hive built-in functions via pluggable modules ■ To name a few more upcoming: Geo and Machine Learning modules ○ Empower users to write code and do customized development for Flink table core https://cwiki.apache.org/confluence/display/FLINK/FLIP-68%3A+Extend+Core+Table+System+with+Modular+Plugins
  • 31. FLIP-69 DDL Enhancement in SQL and Table APIs Full set of DDLs and other commands supported via a unified SQL parser ● CREATE/DROP/ALTER/RENAME ○ CATALOG/DATABASE/FUNCTION ○ TABLE ■ CREATE TABLE AS SELECT ... User Facing Points ● SQL Client - already supported a few, but not through the SQL parser ● Table APIs - TableEnvironment#sqlUpdate("...") and sqlQuery("...") https://cwiki.apache.org/confluence/display/FLINK/FLIP+69+-+Flink+SQL+DDL+Enhancement
  • 32. Beyond Flink 1.10 ● More Hive SQL compatibility to minimize migration efforts ● More user custom objects, like serdes and storage handlers ● Performance optimization ● Feature parity with Hive (bucketing, etc) ● Enterprise readiness - security, governance ● Regular maintenance and releases
  • 33. Thanks to Other Contributors Xuefu Zhang, Rui Li, Terry Wang, Timo Walther, Dawid Wysakowicz, Kurt Young, Jingsong Lee, Jark Wu, etc
  • 34. Conclusions Flink’s integration with Hive ● helps Flink to realize its potential in batch processing ● is a critical step for Flink towards unified data processing ● reshapes Flink’s metadata management capabilities ● enhances Flink’s SQL stack ● brings a massive user base and lays the foundation for enterprise adoption