SlideShare a Scribd company logo
1 of 35
Download to read offline
Flink and Hive Integration
- Unifying Enterprise Data Processing Systems
Bowen Li
Committer@Flink, Senior Engineer@Alibaba
Flink Forward Europe, Oct 2019
Agenda
● Background
● Motivations and Impacts
● Flink 1.9 - State of Union
● Flink 1.10 - What’s upcoming next?
● Q&A
Background
Flink aims at unifying data processing for streaming and batch use cases
● Batch is a special case of Streaming
○ bounded v.s. unbounded data streams
● Unified and simpler tech stack, deployment and operation
● Smaller learning cost for end users
○ developers, data scientists, analysts, etc
Why integrate with Hive?
● Hive is de facto standard for batch processing (ETL, analytics, etc) in enterprises
● Hive is widely adopted with huge user base
● Hive metastore is the center of big data ecosystem
● Hive users want lower latency and near real time data warehouse
● Streaming users usually have Hive deployment and need to access Hive data/metadata
Motivations and Impacts
● Strengthen Flink’s lead in stream processing by enhancing its metadata stack
● Advance Flink’s batch capabilities
● Provide unified solution for stream and batch processings using SQL
● Enrich and extend Flink’s ecosystem
● Promote Flink’s adoption
Platform Level Integration
● Not another “Hive on Xxx”
● The integration is fully in Flink repo
● Released as part of Flink
Flink 1.9 - State of the Union
check out more official documentations
https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/hive/
Integrate with Hive Metadata
We developed brand-new Catalog APIs to
● integrate Flink with Hive Metadata/Metastore
● completely reshape Flink’s metadata stack for both streaming and batch
FLIP-30 - Unified Catalog APIs
(https://cwiki.apache.org/confluence/display/FLINK/FLIP-30%3A+Unified+Catalog+APIs)
Catalog APIs
Catalog
● Meta-Objects
○ Database, Table, View, Partition, Functions, TableStats, and PartitionStats
● Operations
○ Get/List/Exist/Create/Alter/Rename/Drop
Namespace of Catalog Table/View/Function
● fully qualified namespace as <catalog_name>.<db_name>.<object_name>
Catalogs are pluggable and opens opportunities for
● Catalog for Hive
● Catalog for Streams and MQ
○ Pulsar Catalog is in review
○ Kafka(Confluent Schema Registry), RabbitMQ, RocketMQ, etc
● Catalog for structured data
○ RDMS like MySQL, etc
● Catalogs for semi-structured data
○ ElasticSearch, HBase, Cassandra, etc
Roadmaps of Catalog APIs
CatalogManager
● manage all registered catalogs and resolve objects
● default to current catalog and current database in parsing queries
select * from currentCatalog.currentDb.myTable
>>>=== can be simplified as ===>>>
select * from myTable
Arichtecture
Flink Runtime
Query processing & optimization
Table API and SQL Catalog APIs
SQL Client/Zeppelin
Flink 1.9 provides two catalog implementations out of shelf.
● GenericInMemoryCatalog
○ in-memory non-persistent, per session, used by default
● HiveCatalog
○ compatible with multiple Hive versions
○ supports most Hive data types
○ can read/write Hive meta-objects
○ can persist Flink non-hive streaming and batch meta-objects to Hive Metastore
■ e.g. kafka/pulsar tables
Catalogs
HiveCatalog
Flink-Hive interoperability: Flink can read/write Hive
metadata thru HiveCatalog
Flink can persist non-hive metadata using Hive
Metastore as storage via HiveCatalog
Design
● Officially support Hive 2.3.4 and 1.2.1
● Rely on Hive’s own compatibility for other 2.x and 1.x
Supported Hive Versions
● Supports all Hive UDF interfaces via HiveCatalog
○ UDF
○ GenericUDF
○ UDTF
○ UDAF
○ GenericUDAFResolver
Support Hive UDF
● Can read/write non-partitioned Hive tables
● Can read partitioned Hive tables
● Supports partition-pruning
● Supports text, SequenceFile, ORC, Parquet
Hive Source and Sink
Example - Table API
TableEnvironment tEnv = ...
tEnv.registerCatalog(new HiveCatalog("myHive", "/opt/hive-conf/"));
tEnv.useCatalog("myHive");
tEnv.useDatabase("myDb");
// Read Hive meta-objects
Catalog myHive1 = tEnv.getCatalog("myHive1").get();
myHive1.listDatabases();
myHive1.listTables("myDb");
ObjectPath myTablePath = new ObjectPath("myDb", "myHiveTable");
myHive1.getTable(myTablePath);
myHive1.listPartitions(myTablePath);
// Query Hive data
tEnv.sqlQuery("select * from myHiveTable").print()
SQL Client Example
// Register catalogs in sql-cli-defaults.yml
SQL CLI Example (cont’)
Flink SQL> SHOW CATALOGS;
myhive1
default_catalog
Flink SQL> SHOW DATABASES;
myDb
Flink SQL> USE myhive1.myDb;
Flink SQL> SHOW TABLES;
myTable
Flink SQL> DRESCRIBE myHiveTable;
...
Flink SQL> SELECT * FROM myHiveTable;
...
● Integration with Hive was released in Flink 1.9 in Beta
● It lays the foundation for Flink’s integration with Hive
● This initiative led us to vast development and enhancement of Flink’s SQL stack and
metadata management capabilities
Summary
Flink 1.10 - What’s upcoming next?
Supports all Hive 1.2, 2.0, 2.1, 2.2, 2.3, 3.1 versions
● 1.2.0, 1.2.1
● 2.0.0, 2.0.1
● 2.1.0, 2.1.1
● 2.2.0
● 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.3.4, 2.3.5, 2.3.6
● 3.1.0, 3.1.1, 3.1.2
Support More Hive Versions
● HiveTableSource supports
○ projection pushdown
○ reading Hive views (in-progress)
● HiveTableSink supports
○ “INSERT OVERWRITE”
○ inserting into partitions, both dynamic and static
Hive Source and Sink Improvements
FLIP-57 Rework FunctionCatalog
● Problems to solve
○ Clarify and complete function categories
■ currently definition of temp functions is ambiguous
○ Enabling referencing functions with qualified names across catalog and database
○ Redefine function resolution order
https://cwiki.apache.org/confluence/display/FLINK/FLIP-57%3A+Rework+FunctionCatalog
Similarity:
● volatile, and lifespan within a session
Differences:
● Temp System Functions
○ has no namespace, can be referenced anywhere with function name
○ can override system/built-in functions
○ “CREATE TEMPORARY SYSTEM FUNCTION …”
● Temp Catalog Functions
○ has catalog/db namespaces
○ can override catalog functions
○ “CREATE TEMPORARY FUNCTION …”
Introducing Temp System Function v.s. Temp Catalog Function
Introducing Ambiguous v.s. Precise Function References
Ambiguous function reference Precise function reference (NEW!)
with only function name
SELECT <func>(col) FROM T
with fully or partially qualified name
SELECT <cat>.<db>.<func>(col) FROM T
SELECT <db>.<func>(col) FROM T
enables cross-catalog/db function reference
new resolution order:
1. temp system function
2. system(built-in) function
3. temp catalog function in current cat/db
4. catalog function in current cat/db
resolution order:
1. temp catalog function
2. catalog function
FLIP-68 Extend Core Table System with Pluggable Modules
● Motivations:
○ Enable users to integrate Flink with cores and built-in objects of other systems
■ Supports Hive built-in functions via pluggable modules
■ To name a few more upcoming - Geo and Machine Learning modules
○ Empower users to write code and do customized developement for Flink table core
https://cwiki.apache.org/confluence/display/FLINK/FLIP-68%3A+Extend+Core+Table+Sys
tem+with+Modular+Plugins
Design of Pluggable Modules
Full set of DDLs and other commands supported via **Unified** SQL parser
● CREATE/DROP/ALTER/RENAME
○ CATALOG/DATABASE/FUNCTION
○ TABLE
■ CREATE TABLE AS SELECT ...
User Facing Points
● SQL Client - already supported a few but not thru SQL parser
● Table APIs - TableEnvironment#sqlUpdate(“...”) and sqlQuery(“...”)
https://cwiki.apache.org/confluence/display/FLINK/FLIP+69+-+Flink+SQL+DDL+Enhancement
FLIP-69 DDL Enhancement in SQL and Table APIs
Beyond Flink 1.10
● More Hive SQL compatibilities to minimize migration efforts
● More user custom objects, like serdes, storage handlers
● Performance optimization
● Feature parity with Hive (bucketing, etc)
● Enterprise readiness - security, governess
● Regular maintenance and releases
Xuefu Zhang, Rui Li, Terry Wang, Timo Walter,
Dawid Wysakowicz, Kurt Young, Jingsong Lee, Jark Wu, etc
Thanks to Other Contributors
Flink’s integrating with Hive
● helps Flink to realize its potential in batch processing
● is a critical step for Flink towards unified data processing
● reshapes Flink’s metadata management capabilities
● enhances Flink’s SQL stack
● brings mass user base and lays the foundation for enterprise adoption
Conclusions
Thanks!
Twitter: @Bowen__Li

More Related Content

What's hot

Flink Connector Development Tips & Tricks
Flink Connector Development Tips & TricksFlink Connector Development Tips & Tricks
Flink Connector Development Tips & TricksEron Wright
 
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...Flink Forward
 
Apache Flink Berlin Meetup May 2016
Apache Flink Berlin Meetup May 2016Apache Flink Berlin Meetup May 2016
Apache Flink Berlin Meetup May 2016Stephan Ewen
 
Flink Forward Berlin 2017: Piotr Wawrzyniak - Extending Apache Flink stream p...
Flink Forward Berlin 2017: Piotr Wawrzyniak - Extending Apache Flink stream p...Flink Forward Berlin 2017: Piotr Wawrzyniak - Extending Apache Flink stream p...
Flink Forward Berlin 2017: Piotr Wawrzyniak - Extending Apache Flink stream p...Flink Forward
 
Flink Forward Berlin 2017: Mihail Vieru - A Materialization Engine for Data I...
Flink Forward Berlin 2017: Mihail Vieru - A Materialization Engine for Data I...Flink Forward Berlin 2017: Mihail Vieru - A Materialization Engine for Data I...
Flink Forward Berlin 2017: Mihail Vieru - A Materialization Engine for Data I...Flink Forward
 
Flink Forward Berlin 2017: Till Rohrmann - From Apache Flink 1.3 to 1.4
Flink Forward Berlin 2017: Till Rohrmann - From Apache Flink 1.3 to 1.4Flink Forward Berlin 2017: Till Rohrmann - From Apache Flink 1.3 to 1.4
Flink Forward Berlin 2017: Till Rohrmann - From Apache Flink 1.3 to 1.4Flink Forward
 
Tran Nam-Luc – Stale Synchronous Parallel Iterations on Flink
Tran Nam-Luc – Stale Synchronous Parallel Iterations on FlinkTran Nam-Luc – Stale Synchronous Parallel Iterations on Flink
Tran Nam-Luc – Stale Synchronous Parallel Iterations on FlinkFlink Forward
 
Flink Forward SF 2017: Shaoxuan Wang_Xiaowei Jiang - Blinks Improvements to F...
Flink Forward SF 2017: Shaoxuan Wang_Xiaowei Jiang - Blinks Improvements to F...Flink Forward SF 2017: Shaoxuan Wang_Xiaowei Jiang - Blinks Improvements to F...
Flink Forward SF 2017: Shaoxuan Wang_Xiaowei Jiang - Blinks Improvements to F...Flink Forward
 
Do Flink on Web with FLOW
Do Flink on Web with FLOWDo Flink on Web with FLOW
Do Flink on Web with FLOWDongwon Kim
 
Alexander Kolb – Flink. Yet another Streaming Framework?
Alexander Kolb – Flink. Yet another Streaming Framework?Alexander Kolb – Flink. Yet another Streaming Framework?
Alexander Kolb – Flink. Yet another Streaming Framework?Flink Forward
 
Flink Forward San Francisco 2019: Developing and operating real-time applicat...
Flink Forward San Francisco 2019: Developing and operating real-time applicat...Flink Forward San Francisco 2019: Developing and operating real-time applicat...
Flink Forward San Francisco 2019: Developing and operating real-time applicat...Flink Forward
 
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...Flink Forward
 
Flink Forward SF 2017: Srikanth Satya & Tom Kaitchuck - Pravega: Storage Rei...
Flink Forward SF 2017: Srikanth Satya & Tom Kaitchuck -  Pravega: Storage Rei...Flink Forward SF 2017: Srikanth Satya & Tom Kaitchuck -  Pravega: Storage Rei...
Flink Forward SF 2017: Srikanth Satya & Tom Kaitchuck - Pravega: Storage Rei...Flink Forward
 
Flink forward SF 2017: Elizabeth K. Joseph and Ravi Yadav - Flink meet DC/OS ...
Flink forward SF 2017: Elizabeth K. Joseph and Ravi Yadav - Flink meet DC/OS ...Flink forward SF 2017: Elizabeth K. Joseph and Ravi Yadav - Flink meet DC/OS ...
Flink forward SF 2017: Elizabeth K. Joseph and Ravi Yadav - Flink meet DC/OS ...Flink Forward
 
Apache Flink Training: System Overview
Apache Flink Training: System OverviewApache Flink Training: System Overview
Apache Flink Training: System OverviewFlink Forward
 
Flink Forward Berlin 2017: Stephan Ewen - The State of Flink and how to adopt...
Flink Forward Berlin 2017: Stephan Ewen - The State of Flink and how to adopt...Flink Forward Berlin 2017: Stephan Ewen - The State of Flink and how to adopt...
Flink Forward Berlin 2017: Stephan Ewen - The State of Flink and how to adopt...Flink Forward
 
Stateful Distributed Stream Processing
Stateful Distributed Stream ProcessingStateful Distributed Stream Processing
Stateful Distributed Stream ProcessingGyula Fóra
 
Flink Forward San Francisco 2018: Stefan Richter - "How to build a modern str...
Flink Forward San Francisco 2018: Stefan Richter - "How to build a modern str...Flink Forward San Francisco 2018: Stefan Richter - "How to build a modern str...
Flink Forward San Francisco 2018: Stefan Richter - "How to build a modern str...Flink Forward
 
Flink Forward Berlin 2017: Patrick Gunia - Migration of a realtime stats prod...
Flink Forward Berlin 2017: Patrick Gunia - Migration of a realtime stats prod...Flink Forward Berlin 2017: Patrick Gunia - Migration of a realtime stats prod...
Flink Forward Berlin 2017: Patrick Gunia - Migration of a realtime stats prod...Flink Forward
 

What's hot (20)

Flink Connector Development Tips & Tricks
Flink Connector Development Tips & TricksFlink Connector Development Tips & Tricks
Flink Connector Development Tips & Tricks
 
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
 
Apache Flink Berlin Meetup May 2016
Apache Flink Berlin Meetup May 2016Apache Flink Berlin Meetup May 2016
Apache Flink Berlin Meetup May 2016
 
Flink Forward Berlin 2017: Piotr Wawrzyniak - Extending Apache Flink stream p...
Flink Forward Berlin 2017: Piotr Wawrzyniak - Extending Apache Flink stream p...Flink Forward Berlin 2017: Piotr Wawrzyniak - Extending Apache Flink stream p...
Flink Forward Berlin 2017: Piotr Wawrzyniak - Extending Apache Flink stream p...
 
Flink Forward Berlin 2017: Mihail Vieru - A Materialization Engine for Data I...
Flink Forward Berlin 2017: Mihail Vieru - A Materialization Engine for Data I...Flink Forward Berlin 2017: Mihail Vieru - A Materialization Engine for Data I...
Flink Forward Berlin 2017: Mihail Vieru - A Materialization Engine for Data I...
 
Flink Forward Berlin 2017: Till Rohrmann - From Apache Flink 1.3 to 1.4
Flink Forward Berlin 2017: Till Rohrmann - From Apache Flink 1.3 to 1.4Flink Forward Berlin 2017: Till Rohrmann - From Apache Flink 1.3 to 1.4
Flink Forward Berlin 2017: Till Rohrmann - From Apache Flink 1.3 to 1.4
 
Tran Nam-Luc – Stale Synchronous Parallel Iterations on Flink
Tran Nam-Luc – Stale Synchronous Parallel Iterations on FlinkTran Nam-Luc – Stale Synchronous Parallel Iterations on Flink
Tran Nam-Luc – Stale Synchronous Parallel Iterations on Flink
 
Flink Forward SF 2017: Shaoxuan Wang_Xiaowei Jiang - Blinks Improvements to F...
Flink Forward SF 2017: Shaoxuan Wang_Xiaowei Jiang - Blinks Improvements to F...Flink Forward SF 2017: Shaoxuan Wang_Xiaowei Jiang - Blinks Improvements to F...
Flink Forward SF 2017: Shaoxuan Wang_Xiaowei Jiang - Blinks Improvements to F...
 
Do Flink on Web with FLOW
Do Flink on Web with FLOWDo Flink on Web with FLOW
Do Flink on Web with FLOW
 
Alexander Kolb – Flink. Yet another Streaming Framework?
Alexander Kolb – Flink. Yet another Streaming Framework?Alexander Kolb – Flink. Yet another Streaming Framework?
Alexander Kolb – Flink. Yet another Streaming Framework?
 
Flink Forward San Francisco 2019: Developing and operating real-time applicat...
Flink Forward San Francisco 2019: Developing and operating real-time applicat...Flink Forward San Francisco 2019: Developing and operating real-time applicat...
Flink Forward San Francisco 2019: Developing and operating real-time applicat...
 
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
 
Flink Forward SF 2017: Srikanth Satya & Tom Kaitchuck - Pravega: Storage Rei...
Flink Forward SF 2017: Srikanth Satya & Tom Kaitchuck -  Pravega: Storage Rei...Flink Forward SF 2017: Srikanth Satya & Tom Kaitchuck -  Pravega: Storage Rei...
Flink Forward SF 2017: Srikanth Satya & Tom Kaitchuck - Pravega: Storage Rei...
 
Flink forward SF 2017: Elizabeth K. Joseph and Ravi Yadav - Flink meet DC/OS ...
Flink forward SF 2017: Elizabeth K. Joseph and Ravi Yadav - Flink meet DC/OS ...Flink forward SF 2017: Elizabeth K. Joseph and Ravi Yadav - Flink meet DC/OS ...
Flink forward SF 2017: Elizabeth K. Joseph and Ravi Yadav - Flink meet DC/OS ...
 
A look at Flink 1.2
A look at Flink 1.2A look at Flink 1.2
A look at Flink 1.2
 
Apache Flink Training: System Overview
Apache Flink Training: System OverviewApache Flink Training: System Overview
Apache Flink Training: System Overview
 
Flink Forward Berlin 2017: Stephan Ewen - The State of Flink and how to adopt...
Flink Forward Berlin 2017: Stephan Ewen - The State of Flink and how to adopt...Flink Forward Berlin 2017: Stephan Ewen - The State of Flink and how to adopt...
Flink Forward Berlin 2017: Stephan Ewen - The State of Flink and how to adopt...
 
Stateful Distributed Stream Processing
Stateful Distributed Stream ProcessingStateful Distributed Stream Processing
Stateful Distributed Stream Processing
 
Flink Forward San Francisco 2018: Stefan Richter - "How to build a modern str...
Flink Forward San Francisco 2018: Stefan Richter - "How to build a modern str...Flink Forward San Francisco 2018: Stefan Richter - "How to build a modern str...
Flink Forward San Francisco 2018: Stefan Richter - "How to build a modern str...
 
Flink Forward Berlin 2017: Patrick Gunia - Migration of a realtime stats prod...
Flink Forward Berlin 2017: Patrick Gunia - Migration of a realtime stats prod...Flink Forward Berlin 2017: Patrick Gunia - Migration of a realtime stats prod...
Flink Forward Berlin 2017: Patrick Gunia - Migration of a realtime stats prod...
 

Similar to Unify Enterprise Data Processing System Platform Level Integration of Flink and Hive - Bowen Li, Alibaba

Integrating Flink with Hive - Flink Forward SF 2019
Integrating Flink with Hive - Flink Forward SF 2019Integrating Flink with Hive - Flink Forward SF 2019
Integrating Flink with Hive - Flink Forward SF 2019Bowen Li
 
Integrating Flink with Hive, Seattle Flink Meetup, Feb 2019
Integrating Flink with Hive, Seattle Flink Meetup, Feb 2019Integrating Flink with Hive, Seattle Flink Meetup, Feb 2019
Integrating Flink with Hive, Seattle Flink Meetup, Feb 2019Bowen Li
 
Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Flink Forward
 
MOUG17: SQLT Utility for Tuning - Practical Examples
MOUG17: SQLT Utility for Tuning - Practical ExamplesMOUG17: SQLT Utility for Tuning - Practical Examples
MOUG17: SQLT Utility for Tuning - Practical ExamplesMonica Li
 
MOUG17: DB Security; Secure your Data
MOUG17: DB Security; Secure your DataMOUG17: DB Security; Secure your Data
MOUG17: DB Security; Secure your DataMonica Li
 
Migrating to Spark 2.0 - Part 2
Migrating to Spark 2.0 - Part 2Migrating to Spark 2.0 - Part 2
Migrating to Spark 2.0 - Part 2datamantra
 
Fast federated SQL with Apache Calcite
Fast federated SQL with Apache CalciteFast federated SQL with Apache Calcite
Fast federated SQL with Apache CalciteChris Baynes
 
Flink 2.0: Navigating the Future of Unified Stream and Batch Processing
Flink 2.0: Navigating the Future of Unified Stream and Batch ProcessingFlink 2.0: Navigating the Future of Unified Stream and Batch Processing
Flink 2.0: Navigating the Future of Unified Stream and Batch ProcessingHostedbyConfluent
 
Intro to InfluxDB 2.0 and Your First Flux Query by Sonia Gupta
Intro to InfluxDB 2.0 and Your First Flux Query by Sonia GuptaIntro to InfluxDB 2.0 and Your First Flux Query by Sonia Gupta
Intro to InfluxDB 2.0 and Your First Flux Query by Sonia GuptaInfluxData
 
Flink SQL & TableAPI in Large Scale Production at Alibaba
Flink SQL & TableAPI in Large Scale Production at AlibabaFlink SQL & TableAPI in Large Scale Production at Alibaba
Flink SQL & TableAPI in Large Scale Production at AlibabaDataWorks Summit
 
Corporate-informatica-training-in-mumbai
Corporate-informatica-training-in-mumbaiCorporate-informatica-training-in-mumbai
Corporate-informatica-training-in-mumbaiUnmesh Baile
 
Corporate-informatica-training-in-mumbai
Corporate-informatica-training-in-mumbaiCorporate-informatica-training-in-mumbai
Corporate-informatica-training-in-mumbaiUnmesh Baile
 
ELK Ruminating on Logs (Zendcon 2016)
ELK Ruminating on Logs (Zendcon 2016)ELK Ruminating on Logs (Zendcon 2016)
ELK Ruminating on Logs (Zendcon 2016)Mathew Beane
 
Dataweave Libraries and ObjectStore
Dataweave Libraries and ObjectStoreDataweave Libraries and ObjectStore
Dataweave Libraries and ObjectStoreVikalp Bhalia
 
Kettle: Pentaho Data Integration tool
Kettle: Pentaho Data Integration toolKettle: Pentaho Data Integration tool
Kettle: Pentaho Data Integration toolAlex Rayón Jerez
 
MLflow with Databricks
MLflow with DatabricksMLflow with Databricks
MLflow with DatabricksLiangjun Jiang
 
Mlflow with databricks
Mlflow with databricksMlflow with databricks
Mlflow with databricksLiangjun Jiang
 
KubeFlow + GPU + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + PyTo...
KubeFlow + GPU + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + PyTo...KubeFlow + GPU + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + PyTo...
KubeFlow + GPU + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + PyTo...Chris Fregly
 
OSMC 2021 | Monitoring Open Infrastructure Logs – With Real Life Examples
OSMC 2021 | Monitoring Open Infrastructure Logs – With Real Life ExamplesOSMC 2021 | Monitoring Open Infrastructure Logs – With Real Life Examples
OSMC 2021 | Monitoring Open Infrastructure Logs – With Real Life ExamplesNETWAYS
 

Similar to Unify Enterprise Data Processing System Platform Level Integration of Flink and Hive - Bowen Li, Alibaba (20)

Integrating Flink with Hive - Flink Forward SF 2019
Integrating Flink with Hive - Flink Forward SF 2019Integrating Flink with Hive - Flink Forward SF 2019
Integrating Flink with Hive - Flink Forward SF 2019
 
Integrating Flink with Hive, Seattle Flink Meetup, Feb 2019
Integrating Flink with Hive, Seattle Flink Meetup, Feb 2019Integrating Flink with Hive, Seattle Flink Meetup, Feb 2019
Integrating Flink with Hive, Seattle Flink Meetup, Feb 2019
 
Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...
 
MOUG17: SQLT Utility for Tuning - Practical Examples
MOUG17: SQLT Utility for Tuning - Practical ExamplesMOUG17: SQLT Utility for Tuning - Practical Examples
MOUG17: SQLT Utility for Tuning - Practical Examples
 
MOUG17: DB Security; Secure your Data
MOUG17: DB Security; Secure your DataMOUG17: DB Security; Secure your Data
MOUG17: DB Security; Secure your Data
 
Migrating to Spark 2.0 - Part 2
Migrating to Spark 2.0 - Part 2Migrating to Spark 2.0 - Part 2
Migrating to Spark 2.0 - Part 2
 
Fast federated SQL with Apache Calcite
Fast federated SQL with Apache CalciteFast federated SQL with Apache Calcite
Fast federated SQL with Apache Calcite
 
Flink 2.0: Navigating the Future of Unified Stream and Batch Processing
Flink 2.0: Navigating the Future of Unified Stream and Batch ProcessingFlink 2.0: Navigating the Future of Unified Stream and Batch Processing
Flink 2.0: Navigating the Future of Unified Stream and Batch Processing
 
Intro to InfluxDB 2.0 and Your First Flux Query by Sonia Gupta
Intro to InfluxDB 2.0 and Your First Flux Query by Sonia GuptaIntro to InfluxDB 2.0 and Your First Flux Query by Sonia Gupta
Intro to InfluxDB 2.0 and Your First Flux Query by Sonia Gupta
 
Flink SQL & TableAPI in Large Scale Production at Alibaba
Flink SQL & TableAPI in Large Scale Production at AlibabaFlink SQL & TableAPI in Large Scale Production at Alibaba
Flink SQL & TableAPI in Large Scale Production at Alibaba
 
Corporate-informatica-training-in-mumbai
Corporate-informatica-training-in-mumbaiCorporate-informatica-training-in-mumbai
Corporate-informatica-training-in-mumbai
 
Corporate-informatica-training-in-mumbai
Corporate-informatica-training-in-mumbaiCorporate-informatica-training-in-mumbai
Corporate-informatica-training-in-mumbai
 
Java one2013
Java one2013Java one2013
Java one2013
 
ELK Ruminating on Logs (Zendcon 2016)
ELK Ruminating on Logs (Zendcon 2016)ELK Ruminating on Logs (Zendcon 2016)
ELK Ruminating on Logs (Zendcon 2016)
 
Dataweave Libraries and ObjectStore
Dataweave Libraries and ObjectStoreDataweave Libraries and ObjectStore
Dataweave Libraries and ObjectStore
 
Kettle: Pentaho Data Integration tool
Kettle: Pentaho Data Integration toolKettle: Pentaho Data Integration tool
Kettle: Pentaho Data Integration tool
 
MLflow with Databricks
MLflow with DatabricksMLflow with Databricks
MLflow with Databricks
 
Mlflow with databricks
Mlflow with databricksMlflow with databricks
Mlflow with databricks
 
KubeFlow + GPU + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + PyTo...
KubeFlow + GPU + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + PyTo...KubeFlow + GPU + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + PyTo...
KubeFlow + GPU + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + PyTo...
 
OSMC 2021 | Monitoring Open Infrastructure Logs – With Real Life Examples
OSMC 2021 | Monitoring Open Infrastructure Logs – With Real Life ExamplesOSMC 2021 | Monitoring Open Infrastructure Logs – With Real Life Examples
OSMC 2021 | Monitoring Open Infrastructure Logs – With Real Life Examples
 

More from Flink Forward

Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkFlink Forward
 
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...Flink Forward
 
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Flink Forward
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorFlink Forward
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeFlink Forward
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Flink Forward
 
One sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkOne sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkFlink Forward
 
Tuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxTuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxFlink Forward
 
Flink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink Forward
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraFlink Forward
 
Where is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkWhere is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkFlink Forward
 
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentUsing the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentFlink Forward
 
The Current State of Table API in 2022
The Current State of Table API in 2022The Current State of Table API in 2022
The Current State of Table API in 2022Flink Forward
 
Flink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink Forward
 
Dynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsDynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsFlink Forward
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotFlink Forward
 
Processing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesProcessing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesFlink Forward
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Flink Forward
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergFlink Forward
 
Welcome to the Flink Community!
Welcome to the Flink Community!Welcome to the Flink Community!
Welcome to the Flink Community!Flink Forward
 

More from Flink Forward (20)

Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in Flink
 
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
 
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes Operator
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive Mode
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
 
One sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkOne sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async Sink
 
Tuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxTuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptx
 
Flink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at Pinterest
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native Era
 
Where is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkWhere is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in Flink
 
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentUsing the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production Deployment
 
The Current State of Table API in 2022
The Current State of Table API in 2022The Current State of Table API in 2022
The Current State of Table API in 2022
 
Flink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink SQL on Pulsar made easy
Flink SQL on Pulsar made easy
 
Dynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsDynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data Alerts
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
 
Processing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesProcessing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial Services
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & Iceberg
 
Welcome to the Flink Community!
Welcome to the Flink Community!Welcome to the Flink Community!
Welcome to the Flink Community!
 

Recently uploaded

08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsHyundai Motor Group
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Hyundai Motor Group
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 

Recently uploaded (20)

08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 

Unify Enterprise Data Processing System Platform Level Integration of Flink and Hive - Bowen Li, Alibaba

  • 1. Flink and Hive Integration - Unifying Enterprise Data Processing Systems Bowen Li Committer@Flink, Senior Engineer@Alibaba Flink Forward Europe, Oct 2019
  • 2. Agenda ● Background ● Motivations and Impacts ● Flink 1.9 - State of Union ● Flink 1.10 - What’s upcoming next? ● Q&A
  • 3. Background Flink aims at unifying data processing for streaming and batch use cases ● Batch is a special case of Streaming ○ bounded v.s. unbounded data streams ● Unified and simpler tech stack, deployment and operation ● Smaller learning cost for end users ○ developers, data scientists, analysts, etc
  • 4. Why integrate with Hive? ● Hive is de facto standard for batch processing (ETL, analytics, etc) in enterprises ● Hive is widely adopted with huge user base ● Hive metastore is the center of big data ecosystem ● Hive users want lower latency and near real time data warehouse ● Streaming users usually have Hive deployment and need to access Hive data/metadata
  • 5. Motivations and Impacts ● Strengthen Flink’s lead in stream processing by enhancing its metadata stack ● Advance Flink’s batch capabilities ● Provide unified solution for stream and batch processings using SQL ● Enrich and extend Flink’s ecosystem ● Promote Flink’s adoption
  • 6. Platform Level Integration ● Not another “Hive on Xxx” ● The integration is fully in Flink repo ● Released as part of Flink
  • 7. Flink 1.9 - State of the Union check out more official documentations https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/hive/
  • 8. Integrate with Hive Metadata We developed brand-new Catalog APIs to ● integrate Flink with Hive Metadata/Metastore ● completely reshape Flink’s metadata stack for both streaming and batch FLIP-30 - Unified Catalog APIs (https://cwiki.apache.org/confluence/display/FLINK/FLIP-30%3A+Unified+Catalog+APIs)
  • 9. Catalog APIs Catalog ● Meta-Objects ○ Database, Table, View, Partition, Functions, TableStats, and PartitionStats ● Operations ○ Get/List/Exist/Create/Alter/Rename/Drop Namespace of Catalog Table/View/Function ● fully qualified namespace as <catalog_name>.<db_name>.<object_name>
  • 10. Catalogs are pluggable and opens opportunities for ● Catalog for Hive ● Catalog for Streams and MQ ○ Pulsar Catalog is in review ○ Kafka(Confluent Schema Registry), RabbitMQ, RocketMQ, etc ● Catalog for structured data ○ RDMS like MySQL, etc ● Catalogs for semi-structured data ○ ElasticSearch, HBase, Cassandra, etc Roadmaps of Catalog APIs
  • 11. CatalogManager ● manage all registered catalogs and resolve objects ● default to current catalog and current database in parsing queries select * from currentCatalog.currentDb.myTable >>>=== can be simplified as ===>>> select * from myTable
  • 12. Arichtecture Flink Runtime Query processing & optimization Table API and SQL Catalog APIs SQL Client/Zeppelin
  • 13. Flink 1.9 provides two catalog implementations out of shelf. ● GenericInMemoryCatalog ○ in-memory non-persistent, per session, used by default ● HiveCatalog ○ compatible with multiple Hive versions ○ supports most Hive data types ○ can read/write Hive meta-objects ○ can persist Flink non-hive streaming and batch meta-objects to Hive Metastore ■ e.g. kafka/pulsar tables Catalogs
  • 14. HiveCatalog Flink-Hive interoperability: Flink can read/write Hive metadata thru HiveCatalog Flink can persist non-hive metadata using Hive Metastore as storage via HiveCatalog
  • 16. ● Officially support Hive 2.3.4 and 1.2.1 ● Rely on Hive’s own compatibility for other 2.x and 1.x Supported Hive Versions
  • 17. ● Supports all Hive UDF interfaces via HiveCatalog ○ UDF ○ GenericUDF ○ UDTF ○ UDAF ○ GenericUDAFResolver Support Hive UDF
  • 18. ● Can read/write non-partitioned Hive tables ● Can read partitioned Hive tables ● Supports partition-pruning ● Supports text, SequenceFile, ORC, Parquet Hive Source and Sink
  • 19. Example - Table API TableEnvironment tEnv = ... tEnv.registerCatalog(new HiveCatalog("myHive", "/opt/hive-conf/")); tEnv.useCatalog("myHive"); tEnv.useDatabase("myDb"); // Read Hive meta-objects Catalog myHive1 = tEnv.getCatalog("myHive1").get(); myHive1.listDatabases(); myHive1.listTables("myDb"); ObjectPath myTablePath = new ObjectPath("myDb", "myHiveTable"); myHive1.getTable(myTablePath); myHive1.listPartitions(myTablePath); // Query Hive data tEnv.sqlQuery("select * from myHiveTable").print()
  • 20. SQL Client Example // Register catalogs in sql-cli-defaults.yml
  • 21. SQL CLI Example (cont’) Flink SQL> SHOW CATALOGS; myhive1 default_catalog Flink SQL> SHOW DATABASES; myDb Flink SQL> USE myhive1.myDb; Flink SQL> SHOW TABLES; myTable Flink SQL> DRESCRIBE myHiveTable; ... Flink SQL> SELECT * FROM myHiveTable; ...
  • 22. ● Integration with Hive was released in Flink 1.9 in Beta ● It lays the foundation for Flink’s integration with Hive ● This initiative led us to vast development and enhancement of Flink’s SQL stack and metadata management capabilities Summary
  • 23. Flink 1.10 - What’s upcoming next?
  • 24. Supports all Hive 1.2, 2.0, 2.1, 2.2, 2.3, 3.1 versions ● 1.2.0, 1.2.1 ● 2.0.0, 2.0.1 ● 2.1.0, 2.1.1 ● 2.2.0 ● 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.3.4, 2.3.5, 2.3.6 ● 3.1.0, 3.1.1, 3.1.2 Support More Hive Versions
  • 25. ● HiveTableSource supports ○ projection pushdown ○ reading Hive views (in-progress) ● HiveTableSink supports ○ “INSERT OVERWRITE” ○ inserting into partitions, both dynamic and static Hive Source and Sink Improvements
  • 26. FLIP-57 Rework FunctionCatalog ● Problems to solve ○ Clarify and complete function categories ■ currently definition of temp functions is ambiguous ○ Enabling referencing functions with qualified names across catalog and database ○ Redefine function resolution order https://cwiki.apache.org/confluence/display/FLINK/FLIP-57%3A+Rework+FunctionCatalog
  • 27. Similarity: ● volatile, and lifespan within a session Differences: ● Temp System Functions ○ has no namespace, can be referenced anywhere with function name ○ can override system/built-in functions ○ “CREATE TEMPORARY SYSTEM FUNCTION …” ● Temp Catalog Functions ○ has catalog/db namespaces ○ can override catalog functions ○ “CREATE TEMPORARY FUNCTION …” Introducing Temp System Function v.s. Temp Catalog Function
  • 28. Introducing Ambiguous v.s. Precise Function References Ambiguous function reference Precise function reference (NEW!) with only function name SELECT <func>(col) FROM T with fully or partially qualified name SELECT <cat>.<db>.<func>(col) FROM T SELECT <db>.<func>(col) FROM T enables cross-catalog/db function reference new resolution order: 1. temp system function 2. system(built-in) function 3. temp catalog function in current cat/db 4. catalog function in current cat/db resolution order: 1. temp catalog function 2. catalog function
  • 29. FLIP-68 Extend Core Table System with Pluggable Modules ● Motivations: ○ Enable users to integrate Flink with cores and built-in objects of other systems ■ Supports Hive built-in functions via pluggable modules ■ To name a few more upcoming - Geo and Machine Learning modules ○ Empower users to write code and do customized developement for Flink table core https://cwiki.apache.org/confluence/display/FLINK/FLIP-68%3A+Extend+Core+Table+Sys tem+with+Modular+Plugins
  • 31. Full set of DDLs and other commands supported via **Unified** SQL parser ● CREATE/DROP/ALTER/RENAME ○ CATALOG/DATABASE/FUNCTION ○ TABLE ■ CREATE TABLE AS SELECT ... User Facing Points ● SQL Client - already supported a few but not thru SQL parser ● Table APIs - TableEnvironment#sqlUpdate(“...”) and sqlQuery(“...”) https://cwiki.apache.org/confluence/display/FLINK/FLIP+69+-+Flink+SQL+DDL+Enhancement FLIP-69 DDL Enhancement in SQL and Table APIs
  • 32. Beyond Flink 1.10 ● More Hive SQL compatibilities to minimize migration efforts ● More user custom objects, like serdes, storage handlers ● Performance optimization ● Feature parity with Hive (bucketing, etc) ● Enterprise readiness - security, governess ● Regular maintenance and releases
  • 33. Xuefu Zhang, Rui Li, Terry Wang, Timo Walter, Dawid Wysakowicz, Kurt Young, Jingsong Lee, Jark Wu, etc Thanks to Other Contributors
  • 34. Flink’s integrating with Hive ● helps Flink to realize its potential in batch processing ● is a critical step for Flink towards unified data processing ● reshapes Flink’s metadata management capabilities ● enhances Flink’s SQL stack ● brings mass user base and lays the foundation for enterprise adoption Conclusions