SlideShare a Scribd company logo
OASIS : DATA ANALYSIS PLATFORM
FOR MULTI-TENANT HADOOP CLUSTER
Keiji Yoshida - Data Engineer, Data Labs
OASIS
• Web-based data analysis platform
• Enables employees to analyze their service’s data
Agenda 1. Motivation
2. Features
3. Use Cases
Agenda 1. Motivation
2. Features
3. Use Cases
DATA PLATFORM
LINE Ads Platform
LINE Creators Market
LINE NEWS
LINE Pay
LINE LIVE
LINE MOBILE
Hadoop Cluster (Data Lake)
LINE Ads
Platform
LINE Creators
Market
LINE NEWS LINE Pay
LINE LIVE LINE MOBILE
ETL
Analysis
BI / Reporting
DATA DEMOCRATIZATION
• Make Hadoop cluster public within LINE
• Enable employees to analyze their service’s data as they like
• Speed up data analysis process and decision making
Multi-tenant Hadoop Cluster
LINE Ads
Platform
LINE Creators
Market
LINE Ads Platform LINE Creators Market
REQUIREMENTS
1. Security
2. Stability
3. Features
1. SECURITY
• Strict access control
• Allow employees to access only their service’s data
Multi-tenant Hadoop Cluster
LINE Ads
Platform
LINE Creators
Market
LINE Ads Platform LINE Creators Market
1. SECURITY
• Kerberos authentication
• Apache Ranger for authorization
2. STABILITY
• Isolation of applications
• Resource control
Multi-tenant Hadoop Cluster
App 1
App 4
App 2 App 3
App 5 App 6
2. STABILITY
• Apache Spark on YARN
• Utilize Apache YARN’s resource control mechanism
Multi-tenant Hadoop Cluster
3. FEATURES
Skill
Role
Required
Features
SQL Programming Data Science
X X X Manager Result Sharing
O X X Planner
Query Result
Visualization
O O X Engineer ETL
O O O Data Scientist
Ad Hoc Data
Analysis
3. FEATURES
• Query execution
• Query result visualization
• Code execution (Scala, Python, R)
• Scheduled execution
• Result sharing
• Result access control
3. FEATURES
Apache Zeppelin • Has security and stability issues
Jupyter
• Does not support query result visualization
• Does not support scheduled execution
Redash
• Does not support Spark application code execution
• Does not support user impersonation
Apache Superset
• Does not support Spark application code execution
• Does not support scheduled execution
Apache Hue
• Relies on Apache Livy
• Does not support concurrent Spark SQL execution
• Does not support Spark application sharing
APACHE ZEPPELIN 0.7.3 : SECURITY
• Configurable execution user
APACHE ZEPPELIN 0.7.3 : SECURITY
• Launch Spark application with another user account
• Cheats Apache Ranger
Spark Application : User BApache Zeppelin
HDFS / Apache Ranger
User A
APACHE ZEPPELIN 0.7.3 : STABILITY
• Runs only on a single server
• Does not support “yarn-cluster” mode
• Easy to freeze
Apache Zeppelin Server
Apache Zeppelin Driver Program 1 Driver Program 2
Driver Program 3 Driver Program 4 Driver Program 5
OASIS
Agenda 1. Motivation
2. Features
3. Use Cases
OVERVIEW
• Spark application submission
• Query result visualization
• Notebook sharing
• Notebook scheduling
• Multiple servers
SYSTEM ARCHITECTURE
OASIS
Spark
Interpreter
MySQL Redis
Hadoop
YARN
Cluster
HDFS /
Apache
Ranger
Job
Scheduler
Frontend /
API
End Users
NOTEBOOK CREATION
SPARK APPLICATION
• Launch per notebook session
• Use notebook’s author’s account for accessing HDFS
• Support Spark, Spark SQL, PySpark, and SparkR
SPARK APPLICATION SHARING
NOTEBOOK SHARING
• Notebooks can be shared within a “space”
• “space” : root directory of notebooks for each LINE service
• Access rights: “read write”, “read only”
Space 1
Read Write

Users
Read Only

Users
Notebooks
Space 2
Read Write

Users
Read Only

Users
Notebooks
PARAMETERS
• Parameters can be injected into a notebook
• Read only users can redraw a notebook while changing its parameters
SCHEDULING
• Automatically execute notebook
• Keep notebook contents up to date
• Periodically run ETL processing
SMALL FILES PROBLEM
• Consume a lot of NameNode’s memory
• Degrade search performance
• Default value of spark.sql.shuffle.partitions : 200
DATA INSERTION API
• oasis.insertOverwrite(query, table)
• Replace spark.sql(query).write.mode(“overwrite”).insertInto(table)
• Number of files are optimized automatically
OASIS.INSERTOVERWRITE(QUERY, TABLE)
1. Create temporary table
2. Insert query result to temporary table
Spark
Application Tmp Table
1. spark.sql(“create table …”)
2. spark.sql(query).write.insertInto(tmpTable)
OASIS.INSERTOVERWRITE(QUERY, TABLE)
3. Calculate optimal number of files
4. Recreate temporary table’s data with optimal number of files
Spark
Application Tmp Table
3. filesNum = total file size / block size
4. spark.sql(query).repartition(filesNum).write
.mode(“overwrite”).insertInto(tmpTable)
OASIS.INSERTOVERWRITE(QUERY, TABLE)
5. Drop Hive partitions from target table
6. Move temporary table’s files to target table
Spark
Application
Tmp Table
5. spark.sql(“alter table … drop partition …”)
6. FileSystem.get(…).rename(tmpPath, targetPath)
Target Table
OASIS.INSERTOVERWRITE(QUERY, TABLE)
7. Add Hive partitions to the target table
8. Drop the temporary table
Spark
Application
Tmp Table
7. spark.sql(“alter table … add partition …”)
Target Table
8. spark.sql(“drop table …”)
MULTIPLE SERVERS
• Scalable
• Highly available
OASIS
Spark
Interpreter
MySQL Redis
Job
Scheduler
Frontend /
API
SPARK INTERPRETER ROUTING
• Route information is managed on Redis
• Code of the same notebook session goes to the same interpreter
Spark Interpreter 1
Redis
Frontend / API 1
End Users
Load Balancer
Frontend / API 2 Spark Interpreter 2
Update

route Information
Search

route Information
Spark

application

code
Round-robin
MULTIPLE JOB SCHEDULERS
• Make OASIS Job Scheduler highly available
• Utilize Quartz’s clustering feature
MySQL
Job Scheduler 1 Job Scheduler 2 Job Scheduler 3
Quartz Clustering
HADOOP CLUSTER (DATA LAKE)
• 500 DataNodes / NodeManagers
• HDFS usage : 25PB
• 150+ Hive databases
• 1,500+ Hive tables
Agenda 1. Motivation
2. Features
3. Use Cases
Users
40+Spaces
1,600+Notebooks
500+
STATS
USE CASES
1. Report
2. Interactive dashboard
3. ETL
4. Monitoring
5. Ad hoc analysis
RECAP : OASIS
• Data analysis platform for a multi-tenant Hadoop cluster
• Data can be extracted, processed, visualized, and shared
• Used for reporting, data monitoring, ad hoc analysis, etc. at LINE
THANK YOU

More Related Content

What's hot

Oracleからamazon auroraへの移行にむけて
Oracleからamazon auroraへの移行にむけてOracleからamazon auroraへの移行にむけて
Oracleからamazon auroraへの移行にむけて
Yoichi Sai
 
At least onceってぶっちゃけ問題の先送りだったよね #kafkajp
At least onceってぶっちゃけ問題の先送りだったよね #kafkajpAt least onceってぶっちゃけ問題の先送りだったよね #kafkajp
At least onceってぶっちゃけ問題の先送りだったよね #kafkajp
Yahoo!デベロッパーネットワーク
 
[AWSマイスターシリーズ]Amazon Elastic Load Balancing (ELB)
[AWSマイスターシリーズ]Amazon Elastic Load Balancing (ELB)[AWSマイスターシリーズ]Amazon Elastic Load Balancing (ELB)
[AWSマイスターシリーズ]Amazon Elastic Load Balancing (ELB)Amazon Web Services Japan
 
負荷試験入門公開資料 201611
負荷試験入門公開資料 201611負荷試験入門公開資料 201611
負荷試験入門公開資料 201611
樽八 仲川
 
Linux女子部 systemd徹底入門
Linux女子部 systemd徹底入門Linux女子部 systemd徹底入門
Linux女子部 systemd徹底入門
Etsuji Nakai
 
Cassandraのしくみ データの読み書き編
Cassandraのしくみ データの読み書き編Cassandraのしくみ データの読み書き編
Cassandraのしくみ データの読み書き編
Yuki Morishita
 
Impala + Kudu を用いたデータウェアハウス構築の勘所 (仮)
Impala + Kudu を用いたデータウェアハウス構築の勘所 (仮)Impala + Kudu を用いたデータウェアハウス構築の勘所 (仮)
Impala + Kudu を用いたデータウェアハウス構築の勘所 (仮)
Cloudera Japan
 
AWSからのメール送信
AWSからのメール送信AWSからのメール送信
AWSからのメール送信
Amazon Web Services Japan
 
Elasticsearchを使うときの注意点 公開用スライド
Elasticsearchを使うときの注意点 公開用スライドElasticsearchを使うときの注意点 公開用スライド
Elasticsearchを使うときの注意点 公開用スライド
崇介 藤井
 
Kinesis + Elasticsearchでつくるさいきょうのログ分析基盤
Kinesis + Elasticsearchでつくるさいきょうのログ分析基盤Kinesis + Elasticsearchでつくるさいきょうのログ分析基盤
Kinesis + Elasticsearchでつくるさいきょうのログ分析基盤
Amazon Web Services Japan
 
AWS 初心者向けWebinar 基本から理解する、AWS運用監視
AWS 初心者向けWebinar 基本から理解する、AWS運用監視AWS 初心者向けWebinar 基本から理解する、AWS運用監視
AWS 初心者向けWebinar 基本から理解する、AWS運用監視
Amazon Web Services Japan
 
オススメのJavaログ管理手法 ~コンテナ編~(Open Source Conference 2022 Online/Spring 発表資料)
オススメのJavaログ管理手法 ~コンテナ編~(Open Source Conference 2022 Online/Spring 発表資料)オススメのJavaログ管理手法 ~コンテナ編~(Open Source Conference 2022 Online/Spring 発表資料)
オススメのJavaログ管理手法 ~コンテナ編~(Open Source Conference 2022 Online/Spring 発表資料)
NTT DATA Technology & Innovation
 
AWSにおけるバッチ処理の ベストプラクティス - Developers.IO Meetup 05
AWSにおけるバッチ処理の ベストプラクティス - Developers.IO Meetup 05AWSにおけるバッチ処理の ベストプラクティス - Developers.IO Meetup 05
AWSにおけるバッチ処理の ベストプラクティス - Developers.IO Meetup 05都元ダイスケ Miyamoto
 
コンテナにおけるパフォーマンス調査でハマった話
コンテナにおけるパフォーマンス調査でハマった話コンテナにおけるパフォーマンス調査でハマった話
コンテナにおけるパフォーマンス調査でハマった話
Yuta Shimada
 
Apache Kafkaって本当に大丈夫?~故障検証のオーバービューと興味深い挙動の紹介~
Apache Kafkaって本当に大丈夫?~故障検証のオーバービューと興味深い挙動の紹介~Apache Kafkaって本当に大丈夫?~故障検証のオーバービューと興味深い挙動の紹介~
Apache Kafkaって本当に大丈夫?~故障検証のオーバービューと興味深い挙動の紹介~
NTT DATA OSS Professional Services
 
Kubernetesでの性能解析 ~なんとなく遅いからの脱却~(Kubernetes Meetup Tokyo #33 発表資料)
Kubernetesでの性能解析 ~なんとなく遅いからの脱却~(Kubernetes Meetup Tokyo #33 発表資料)Kubernetesでの性能解析 ~なんとなく遅いからの脱却~(Kubernetes Meetup Tokyo #33 発表資料)
Kubernetesでの性能解析 ~なんとなく遅いからの脱却~(Kubernetes Meetup Tokyo #33 発表資料)
NTT DATA Technology & Innovation
 
202110 AWS Black Belt Online Seminar AWS Site-to-Site VPN
202110 AWS Black Belt Online Seminar AWS Site-to-Site VPN202110 AWS Black Belt Online Seminar AWS Site-to-Site VPN
202110 AWS Black Belt Online Seminar AWS Site-to-Site VPN
Amazon Web Services Japan
 
え、まって。その並列分散処理、Kafkaのしくみでもできるの? Apache Kafkaの機能を利用した大規模ストリームデータの並列分散処理
え、まって。その並列分散処理、Kafkaのしくみでもできるの? Apache Kafkaの機能を利用した大規模ストリームデータの並列分散処理え、まって。その並列分散処理、Kafkaのしくみでもできるの? Apache Kafkaの機能を利用した大規模ストリームデータの並列分散処理
え、まって。その並列分散処理、Kafkaのしくみでもできるの? Apache Kafkaの機能を利用した大規模ストリームデータの並列分散処理
NTT DATA Technology & Innovation
 
Hiveを高速化するLLAP
Hiveを高速化するLLAPHiveを高速化するLLAP
Hiveを高速化するLLAP
Yahoo!デベロッパーネットワーク
 
WebAssemblyのWeb以外のことぜんぶ話す
WebAssemblyのWeb以外のことぜんぶ話すWebAssemblyのWeb以外のことぜんぶ話す
WebAssemblyのWeb以外のことぜんぶ話す
Takaya Saeki
 

What's hot (20)

Oracleからamazon auroraへの移行にむけて
Oracleからamazon auroraへの移行にむけてOracleからamazon auroraへの移行にむけて
Oracleからamazon auroraへの移行にむけて
 
At least onceってぶっちゃけ問題の先送りだったよね #kafkajp
At least onceってぶっちゃけ問題の先送りだったよね #kafkajpAt least onceってぶっちゃけ問題の先送りだったよね #kafkajp
At least onceってぶっちゃけ問題の先送りだったよね #kafkajp
 
[AWSマイスターシリーズ]Amazon Elastic Load Balancing (ELB)
[AWSマイスターシリーズ]Amazon Elastic Load Balancing (ELB)[AWSマイスターシリーズ]Amazon Elastic Load Balancing (ELB)
[AWSマイスターシリーズ]Amazon Elastic Load Balancing (ELB)
 
負荷試験入門公開資料 201611
負荷試験入門公開資料 201611負荷試験入門公開資料 201611
負荷試験入門公開資料 201611
 
Linux女子部 systemd徹底入門
Linux女子部 systemd徹底入門Linux女子部 systemd徹底入門
Linux女子部 systemd徹底入門
 
Cassandraのしくみ データの読み書き編
Cassandraのしくみ データの読み書き編Cassandraのしくみ データの読み書き編
Cassandraのしくみ データの読み書き編
 
Impala + Kudu を用いたデータウェアハウス構築の勘所 (仮)
Impala + Kudu を用いたデータウェアハウス構築の勘所 (仮)Impala + Kudu を用いたデータウェアハウス構築の勘所 (仮)
Impala + Kudu を用いたデータウェアハウス構築の勘所 (仮)
 
AWSからのメール送信
AWSからのメール送信AWSからのメール送信
AWSからのメール送信
 
Elasticsearchを使うときの注意点 公開用スライド
Elasticsearchを使うときの注意点 公開用スライドElasticsearchを使うときの注意点 公開用スライド
Elasticsearchを使うときの注意点 公開用スライド
 
Kinesis + Elasticsearchでつくるさいきょうのログ分析基盤
Kinesis + Elasticsearchでつくるさいきょうのログ分析基盤Kinesis + Elasticsearchでつくるさいきょうのログ分析基盤
Kinesis + Elasticsearchでつくるさいきょうのログ分析基盤
 
AWS 初心者向けWebinar 基本から理解する、AWS運用監視
AWS 初心者向けWebinar 基本から理解する、AWS運用監視AWS 初心者向けWebinar 基本から理解する、AWS運用監視
AWS 初心者向けWebinar 基本から理解する、AWS運用監視
 
オススメのJavaログ管理手法 ~コンテナ編~(Open Source Conference 2022 Online/Spring 発表資料)
オススメのJavaログ管理手法 ~コンテナ編~(Open Source Conference 2022 Online/Spring 発表資料)オススメのJavaログ管理手法 ~コンテナ編~(Open Source Conference 2022 Online/Spring 発表資料)
オススメのJavaログ管理手法 ~コンテナ編~(Open Source Conference 2022 Online/Spring 発表資料)
 
AWSにおけるバッチ処理の ベストプラクティス - Developers.IO Meetup 05
AWSにおけるバッチ処理の ベストプラクティス - Developers.IO Meetup 05AWSにおけるバッチ処理の ベストプラクティス - Developers.IO Meetup 05
AWSにおけるバッチ処理の ベストプラクティス - Developers.IO Meetup 05
 
コンテナにおけるパフォーマンス調査でハマった話
コンテナにおけるパフォーマンス調査でハマった話コンテナにおけるパフォーマンス調査でハマった話
コンテナにおけるパフォーマンス調査でハマった話
 
Apache Kafkaって本当に大丈夫?~故障検証のオーバービューと興味深い挙動の紹介~
Apache Kafkaって本当に大丈夫?~故障検証のオーバービューと興味深い挙動の紹介~Apache Kafkaって本当に大丈夫?~故障検証のオーバービューと興味深い挙動の紹介~
Apache Kafkaって本当に大丈夫?~故障検証のオーバービューと興味深い挙動の紹介~
 
Kubernetesでの性能解析 ~なんとなく遅いからの脱却~(Kubernetes Meetup Tokyo #33 発表資料)
Kubernetesでの性能解析 ~なんとなく遅いからの脱却~(Kubernetes Meetup Tokyo #33 発表資料)Kubernetesでの性能解析 ~なんとなく遅いからの脱却~(Kubernetes Meetup Tokyo #33 発表資料)
Kubernetesでの性能解析 ~なんとなく遅いからの脱却~(Kubernetes Meetup Tokyo #33 発表資料)
 
202110 AWS Black Belt Online Seminar AWS Site-to-Site VPN
202110 AWS Black Belt Online Seminar AWS Site-to-Site VPN202110 AWS Black Belt Online Seminar AWS Site-to-Site VPN
202110 AWS Black Belt Online Seminar AWS Site-to-Site VPN
 
え、まって。その並列分散処理、Kafkaのしくみでもできるの? Apache Kafkaの機能を利用した大規模ストリームデータの並列分散処理
え、まって。その並列分散処理、Kafkaのしくみでもできるの? Apache Kafkaの機能を利用した大規模ストリームデータの並列分散処理え、まって。その並列分散処理、Kafkaのしくみでもできるの? Apache Kafkaの機能を利用した大規模ストリームデータの並列分散処理
え、まって。その並列分散処理、Kafkaのしくみでもできるの? Apache Kafkaの機能を利用した大規模ストリームデータの並列分散処理
 
Hiveを高速化するLLAP
Hiveを高速化するLLAPHiveを高速化するLLAP
Hiveを高速化するLLAP
 
WebAssemblyのWeb以外のことぜんぶ話す
WebAssemblyのWeb以外のことぜんぶ話すWebAssemblyのWeb以外のことぜんぶ話す
WebAssemblyのWeb以外のことぜんぶ話す
 

Similar to OASIS - Data Analysis Platform for Multi-tenant Hadoop Cluster

Oasis – data analysis platform for enterprise
Oasis – data analysis platform for enterpriseOasis – data analysis platform for enterprise
Oasis – data analysis platform for enterprise
LINE Corporation
 
Spark Resource Manager
Spark Resource ManagerSpark Resource Manager
Spark Resource Manager
Shad Amez
 
Past, Present, and Future of Apache Storm
Past, Present, and Future of Apache StormPast, Present, and Future of Apache Storm
Past, Present, and Future of Apache Storm
P. Taylor Goetz
 
Querying and Analyzing Data in Amazon S3
Querying and Analyzing Data in Amazon S3Querying and Analyzing Data in Amazon S3
Querying and Analyzing Data in Amazon S3
Amazon Web Services
 
Introducing Amazon EMR Release 5.0 - August 2016 Monthly Webinar Series
Introducing Amazon EMR Release 5.0 - August 2016 Monthly Webinar SeriesIntroducing Amazon EMR Release 5.0 - August 2016 Monthly Webinar Series
Introducing Amazon EMR Release 5.0 - August 2016 Monthly Webinar Series
Amazon Web Services
 
Apereo OAE - Bootcamp
Apereo OAE - BootcampApereo OAE - Bootcamp
Apereo OAE - Bootcamp
Nicolaas Matthijs
 
T3 - Deploy, manage, and scale your apps
T3 - Deploy, manage, and scale your appsT3 - Deploy, manage, and scale your apps
T3 - Deploy, manage, and scale your apps
Amazon Web Services
 
Lessons from the Field: Applying Best Practices to Your Apache Spark Applicat...
Lessons from the Field: Applying Best Practices to Your Apache Spark Applicat...Lessons from the Field: Applying Best Practices to Your Apache Spark Applicat...
Lessons from the Field: Applying Best Practices to Your Apache Spark Applicat...
Databricks
 
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
Amazon Web Services
 
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
Amazon Web Services
 
ODSC East 2017 - Reproducible Research at Scale with Apache Zeppelin and Spark
ODSC East 2017 - Reproducible Research at Scale with Apache Zeppelin and SparkODSC East 2017 - Reproducible Research at Scale with Apache Zeppelin and Spark
ODSC East 2017 - Reproducible Research at Scale with Apache Zeppelin and Spark
Carolyn Duby
 
Los Angeles AWS Users Group - Athena Deep Dive
Los Angeles AWS Users Group - Athena Deep DiveLos Angeles AWS Users Group - Athena Deep Dive
Los Angeles AWS Users Group - Athena Deep Dive
Kevin Epstein
 
使用 Amazon Athena 直接分析儲存於 S3 的巨量資料
使用 Amazon Athena 直接分析儲存於 S3 的巨量資料使用 Amazon Athena 直接分析儲存於 S3 的巨量資料
使用 Amazon Athena 直接分析儲存於 S3 的巨量資料
Amazon Web Services
 
[262] netflix 빅데이터 플랫폼
[262] netflix 빅데이터 플랫폼[262] netflix 빅데이터 플랫폼
[262] netflix 빅데이터 플랫폼
NAVER D2
 
Webinar: What's new in CDAP 3.5?
Webinar: What's new in CDAP 3.5?Webinar: What's new in CDAP 3.5?
Webinar: What's new in CDAP 3.5?
Cask Data
 
An intro to Azure Data Lake
An intro to Azure Data LakeAn intro to Azure Data Lake
An intro to Azure Data Lake
Rick van den Bosch
 
TechBeats #2
TechBeats #2TechBeats #2
TechBeats #2
applausepoland
 
Monitor Apache Spark 3 on Kubernetes using Metrics and Plugins
Monitor Apache Spark 3 on Kubernetes using Metrics and PluginsMonitor Apache Spark 3 on Kubernetes using Metrics and Plugins
Monitor Apache Spark 3 on Kubernetes using Metrics and Plugins
Databricks
 
One Tool to Rule Them All- Seamless SQL on MongoDB, MySQL and Redis with Apac...
One Tool to Rule Them All- Seamless SQL on MongoDB, MySQL and Redis with Apac...One Tool to Rule Them All- Seamless SQL on MongoDB, MySQL and Redis with Apac...
One Tool to Rule Them All- Seamless SQL on MongoDB, MySQL and Redis with Apac...Tim Vaillancourt
 
Introduction to Amazon Athena
Introduction to Amazon AthenaIntroduction to Amazon Athena
Introduction to Amazon Athena
Amazon Web Services
 

Similar to OASIS - Data Analysis Platform for Multi-tenant Hadoop Cluster (20)

Oasis – data analysis platform for enterprise
Oasis – data analysis platform for enterpriseOasis – data analysis platform for enterprise
Oasis – data analysis platform for enterprise
 
Spark Resource Manager
Spark Resource ManagerSpark Resource Manager
Spark Resource Manager
 
Past, Present, and Future of Apache Storm
Past, Present, and Future of Apache StormPast, Present, and Future of Apache Storm
Past, Present, and Future of Apache Storm
 
Querying and Analyzing Data in Amazon S3
Querying and Analyzing Data in Amazon S3Querying and Analyzing Data in Amazon S3
Querying and Analyzing Data in Amazon S3
 
Introducing Amazon EMR Release 5.0 - August 2016 Monthly Webinar Series
Introducing Amazon EMR Release 5.0 - August 2016 Monthly Webinar SeriesIntroducing Amazon EMR Release 5.0 - August 2016 Monthly Webinar Series
Introducing Amazon EMR Release 5.0 - August 2016 Monthly Webinar Series
 
Apereo OAE - Bootcamp
Apereo OAE - BootcampApereo OAE - Bootcamp
Apereo OAE - Bootcamp
 
T3 - Deploy, manage, and scale your apps
T3 - Deploy, manage, and scale your appsT3 - Deploy, manage, and scale your apps
T3 - Deploy, manage, and scale your apps
 
Lessons from the Field: Applying Best Practices to Your Apache Spark Applicat...
Lessons from the Field: Applying Best Practices to Your Apache Spark Applicat...Lessons from the Field: Applying Best Practices to Your Apache Spark Applicat...
Lessons from the Field: Applying Best Practices to Your Apache Spark Applicat...
 
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
 
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
 
ODSC East 2017 - Reproducible Research at Scale with Apache Zeppelin and Spark
ODSC East 2017 - Reproducible Research at Scale with Apache Zeppelin and SparkODSC East 2017 - Reproducible Research at Scale with Apache Zeppelin and Spark
ODSC East 2017 - Reproducible Research at Scale with Apache Zeppelin and Spark
 
Los Angeles AWS Users Group - Athena Deep Dive
Los Angeles AWS Users Group - Athena Deep DiveLos Angeles AWS Users Group - Athena Deep Dive
Los Angeles AWS Users Group - Athena Deep Dive
 
使用 Amazon Athena 直接分析儲存於 S3 的巨量資料
使用 Amazon Athena 直接分析儲存於 S3 的巨量資料使用 Amazon Athena 直接分析儲存於 S3 的巨量資料
使用 Amazon Athena 直接分析儲存於 S3 的巨量資料
 
[262] netflix 빅데이터 플랫폼
[262] netflix 빅데이터 플랫폼[262] netflix 빅데이터 플랫폼
[262] netflix 빅데이터 플랫폼
 
Webinar: What's new in CDAP 3.5?
Webinar: What's new in CDAP 3.5?Webinar: What's new in CDAP 3.5?
Webinar: What's new in CDAP 3.5?
 
An intro to Azure Data Lake
An intro to Azure Data LakeAn intro to Azure Data Lake
An intro to Azure Data Lake
 
TechBeats #2
TechBeats #2TechBeats #2
TechBeats #2
 
Monitor Apache Spark 3 on Kubernetes using Metrics and Plugins
Monitor Apache Spark 3 on Kubernetes using Metrics and PluginsMonitor Apache Spark 3 on Kubernetes using Metrics and Plugins
Monitor Apache Spark 3 on Kubernetes using Metrics and Plugins
 
One Tool to Rule Them All- Seamless SQL on MongoDB, MySQL and Redis with Apac...
One Tool to Rule Them All- Seamless SQL on MongoDB, MySQL and Redis with Apac...One Tool to Rule Them All- Seamless SQL on MongoDB, MySQL and Redis with Apac...
One Tool to Rule Them All- Seamless SQL on MongoDB, MySQL and Redis with Apac...
 
Introduction to Amazon Athena
Introduction to Amazon AthenaIntroduction to Amazon Athena
Introduction to Amazon Athena
 

More from LINE Corporation

JJUG CCC 2018 Fall 懇親会LT
JJUG CCC 2018 Fall 懇親会LTJJUG CCC 2018 Fall 懇親会LT
JJUG CCC 2018 Fall 懇親会LT
LINE Corporation
 
Reduce dependency on Rx with Kotlin Coroutines
Reduce dependency on Rx with Kotlin CoroutinesReduce dependency on Rx with Kotlin Coroutines
Reduce dependency on Rx with Kotlin Coroutines
LINE Corporation
 
Kotlin/NativeでAndroidのNativeメソッドを実装してみた
Kotlin/NativeでAndroidのNativeメソッドを実装してみたKotlin/NativeでAndroidのNativeメソッドを実装してみた
Kotlin/NativeでAndroidのNativeメソッドを実装してみた
LINE Corporation
 
Use Kotlin scripts and Clova SDK to build your Clova extension
Use Kotlin scripts and Clova SDK to build your Clova extensionUse Kotlin scripts and Clova SDK to build your Clova extension
Use Kotlin scripts and Clova SDK to build your Clova extension
LINE Corporation
 
The Magic of LINE 購物 Testing
The Magic of LINE 購物 TestingThe Magic of LINE 購物 Testing
The Magic of LINE 購物 Testing
LINE Corporation
 
GA Test Automation
GA Test AutomationGA Test Automation
GA Test Automation
LINE Corporation
 
UI Automation Test with JUnit5
UI Automation Test with JUnit5UI Automation Test with JUnit5
UI Automation Test with JUnit5
LINE Corporation
 
Feature Detection for UI Testing
Feature Detection for UI TestingFeature Detection for UI Testing
Feature Detection for UI Testing
LINE Corporation
 
LINE 新星計劃介紹與新創團隊分享
LINE 新星計劃介紹與新創團隊分享LINE 新星計劃介紹與新創團隊分享
LINE 新星計劃介紹與新創團隊分享
LINE Corporation
 
​LINE 技術合作夥伴與應用分享
​LINE 技術合作夥伴與應用分享​LINE 技術合作夥伴與應用分享
​LINE 技術合作夥伴與應用分享
LINE Corporation
 
LINE 開發者社群經營與技術推廣
LINE 開發者社群經營與技術推廣LINE 開發者社群經營與技術推廣
LINE 開發者社群經營與技術推廣
LINE Corporation
 
日本開發者大會短講分享
日本開發者大會短講分享日本開發者大會短講分享
日本開發者大會短講分享
LINE Corporation
 
LINE Chatbot - 活動報名報到設計分享
LINE Chatbot - 活動報名報到設計分享LINE Chatbot - 活動報名報到設計分享
LINE Chatbot - 活動報名報到設計分享
LINE Corporation
 
在 LINE 私有雲中使用 Managed Kubernetes
在 LINE 私有雲中使用 Managed Kubernetes在 LINE 私有雲中使用 Managed Kubernetes
在 LINE 私有雲中使用 Managed Kubernetes
LINE Corporation
 
LINE TODAY高效率的敏捷測試開發技巧
LINE TODAY高效率的敏捷測試開發技巧LINE TODAY高效率的敏捷測試開發技巧
LINE TODAY高效率的敏捷測試開發技巧
LINE Corporation
 
LINE 區塊鏈平台及代幣經濟 - LINK Chain及LINK介紹
LINE 區塊鏈平台及代幣經濟 - LINK Chain及LINK介紹LINE 區塊鏈平台及代幣經濟 - LINK Chain及LINK介紹
LINE 區塊鏈平台及代幣經濟 - LINK Chain及LINK介紹
LINE Corporation
 
LINE Things - LINE IoT平台新技術分享
LINE Things - LINE IoT平台新技術分享LINE Things - LINE IoT平台新技術分享
LINE Things - LINE IoT平台新技術分享
LINE Corporation
 
LINE Pay - 一卡通支付新體驗
LINE Pay - 一卡通支付新體驗LINE Pay - 一卡通支付新體驗
LINE Pay - 一卡通支付新體驗
LINE Corporation
 
LINE Platform API Update - 打造一個更好的Chatbot服務
LINE Platform API Update - 打造一個更好的Chatbot服務LINE Platform API Update - 打造一個更好的Chatbot服務
LINE Platform API Update - 打造一個更好的Chatbot服務
LINE Corporation
 
Keynote - ​LINE 的技術策略佈局與跨國產品開發
Keynote - ​LINE 的技術策略佈局與跨國產品開發Keynote - ​LINE 的技術策略佈局與跨國產品開發
Keynote - ​LINE 的技術策略佈局與跨國產品開發
LINE Corporation
 

More from LINE Corporation (20)

JJUG CCC 2018 Fall 懇親会LT
JJUG CCC 2018 Fall 懇親会LTJJUG CCC 2018 Fall 懇親会LT
JJUG CCC 2018 Fall 懇親会LT
 
Reduce dependency on Rx with Kotlin Coroutines
Reduce dependency on Rx with Kotlin CoroutinesReduce dependency on Rx with Kotlin Coroutines
Reduce dependency on Rx with Kotlin Coroutines
 
Kotlin/NativeでAndroidのNativeメソッドを実装してみた
Kotlin/NativeでAndroidのNativeメソッドを実装してみたKotlin/NativeでAndroidのNativeメソッドを実装してみた
Kotlin/NativeでAndroidのNativeメソッドを実装してみた
 
Use Kotlin scripts and Clova SDK to build your Clova extension
Use Kotlin scripts and Clova SDK to build your Clova extensionUse Kotlin scripts and Clova SDK to build your Clova extension
Use Kotlin scripts and Clova SDK to build your Clova extension
 
The Magic of LINE 購物 Testing
The Magic of LINE 購物 TestingThe Magic of LINE 購物 Testing
The Magic of LINE 購物 Testing
 
GA Test Automation
GA Test AutomationGA Test Automation
GA Test Automation
 
UI Automation Test with JUnit5
UI Automation Test with JUnit5UI Automation Test with JUnit5
UI Automation Test with JUnit5
 
Feature Detection for UI Testing
Feature Detection for UI TestingFeature Detection for UI Testing
Feature Detection for UI Testing
 
LINE 新星計劃介紹與新創團隊分享
LINE 新星計劃介紹與新創團隊分享LINE 新星計劃介紹與新創團隊分享
LINE 新星計劃介紹與新創團隊分享
 
​LINE 技術合作夥伴與應用分享
​LINE 技術合作夥伴與應用分享​LINE 技術合作夥伴與應用分享
​LINE 技術合作夥伴與應用分享
 
LINE 開發者社群經營與技術推廣
LINE 開發者社群經營與技術推廣LINE 開發者社群經營與技術推廣
LINE 開發者社群經營與技術推廣
 
日本開發者大會短講分享
日本開發者大會短講分享日本開發者大會短講分享
日本開發者大會短講分享
 
LINE Chatbot - 活動報名報到設計分享
LINE Chatbot - 活動報名報到設計分享LINE Chatbot - 活動報名報到設計分享
LINE Chatbot - 活動報名報到設計分享
 
在 LINE 私有雲中使用 Managed Kubernetes
在 LINE 私有雲中使用 Managed Kubernetes在 LINE 私有雲中使用 Managed Kubernetes
在 LINE 私有雲中使用 Managed Kubernetes
 
LINE TODAY高效率的敏捷測試開發技巧
LINE TODAY高效率的敏捷測試開發技巧LINE TODAY高效率的敏捷測試開發技巧
LINE TODAY高效率的敏捷測試開發技巧
 
LINE 區塊鏈平台及代幣經濟 - LINK Chain及LINK介紹
LINE 區塊鏈平台及代幣經濟 - LINK Chain及LINK介紹LINE 區塊鏈平台及代幣經濟 - LINK Chain及LINK介紹
LINE 區塊鏈平台及代幣經濟 - LINK Chain及LINK介紹
 
LINE Things - LINE IoT平台新技術分享
LINE Things - LINE IoT平台新技術分享LINE Things - LINE IoT平台新技術分享
LINE Things - LINE IoT平台新技術分享
 
LINE Pay - 一卡通支付新體驗
LINE Pay - 一卡通支付新體驗LINE Pay - 一卡通支付新體驗
LINE Pay - 一卡通支付新體驗
 
LINE Platform API Update - 打造一個更好的Chatbot服務
LINE Platform API Update - 打造一個更好的Chatbot服務LINE Platform API Update - 打造一個更好的Chatbot服務
LINE Platform API Update - 打造一個更好的Chatbot服務
 
Keynote - ​LINE 的技術策略佈局與跨國產品開發
Keynote - ​LINE 的技術策略佈局與跨國產品開發Keynote - ​LINE 的技術策略佈局與跨國產品開發
Keynote - ​LINE 的技術策略佈局與跨國產品開發
 

Recently uploaded

Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
Product School
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Product School
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 

Recently uploaded (20)

Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 

OASIS - Data Analysis Platform for Multi-tenant Hadoop Cluster

  • 1. OASIS : DATA ANALYSIS PLATFORM FOR MULTI-TENANT HADOOP CLUSTER Keiji Yoshida - Data Engineer, Data Labs
  • 2. OASIS • Web-based data analysis platform • Enables employees to analyze their service’s data
  • 3. Agenda 1. Motivation 2. Features 3. Use Cases
  • 4. Agenda 1. Motivation 2. Features 3. Use Cases
  • 5. DATA PLATFORM LINE Ads Platform LINE Creators Market LINE NEWS LINE Pay LINE LIVE LINE MOBILE Hadoop Cluster (Data Lake) LINE Ads Platform LINE Creators Market LINE NEWS LINE Pay LINE LIVE LINE MOBILE ETL Analysis BI / Reporting
  • 6. DATA DEMOCRATIZATION • Make Hadoop cluster public within LINE • Enable employees to analyze their service’s data as they like • Speed up data analysis process and decision making Multi-tenant Hadoop Cluster LINE Ads Platform LINE Creators Market LINE Ads Platform LINE Creators Market
  • 8. 1. SECURITY • Strict access control • Allow employees to access only their service’s data Multi-tenant Hadoop Cluster LINE Ads Platform LINE Creators Market LINE Ads Platform LINE Creators Market
  • 9. 1. SECURITY • Kerberos authentication • Apache Ranger for authorization
  • 10. 2. STABILITY • Isolation of applications • Resource control Multi-tenant Hadoop Cluster App 1 App 4 App 2 App 3 App 5 App 6
  • 11. 2. STABILITY • Apache Spark on YARN • Utilize Apache YARN’s resource control mechanism Multi-tenant Hadoop Cluster
  • 12. 3. FEATURES Skill Role Required Features SQL Programming Data Science X X X Manager Result Sharing O X X Planner Query Result Visualization O O X Engineer ETL O O O Data Scientist Ad Hoc Data Analysis
  • 13. 3. FEATURES • Query execution • Query result visualization • Code execution (Scala, Python, R) • Scheduled execution • Result sharing • Result access control
  • 14. 3. FEATURES Apache Zeppelin • Has security and stability issues Jupyter • Does not support query result visualization • Does not support scheduled execution Redash • Does not support Spark application code execution • Does not support user impersonation Apache Superset • Does not support Spark application code execution • Does not support scheduled execution Apache Hue • Relies on Apache Livy • Does not support concurrent Spark SQL execution • Does not support Spark application sharing
  • 15. APACHE ZEPPELIN 0.7.3 : SECURITY • Configurable execution user
  • 16. APACHE ZEPPELIN 0.7.3 : SECURITY • Launch Spark application with another user account • Cheats Apache Ranger Spark Application : User BApache Zeppelin HDFS / Apache Ranger User A
  • 17. APACHE ZEPPELIN 0.7.3 : STABILITY • Runs only on a single server • Does not support “yarn-cluster” mode • Easy to freeze Apache Zeppelin Server Apache Zeppelin Driver Program 1 Driver Program 2 Driver Program 3 Driver Program 4 Driver Program 5
  • 18. OASIS
  • 19. Agenda 1. Motivation 2. Features 3. Use Cases
  • 20. OVERVIEW • Spark application submission • Query result visualization • Notebook sharing • Notebook scheduling • Multiple servers
  • 21. SYSTEM ARCHITECTURE OASIS Spark Interpreter MySQL Redis Hadoop YARN Cluster HDFS / Apache Ranger Job Scheduler Frontend / API End Users
  • 23. SPARK APPLICATION • Launch per notebook session • Use notebook’s author’s account for accessing HDFS • Support Spark, Spark SQL, PySpark, and SparkR
  • 25. NOTEBOOK SHARING • Notebooks can be shared within a “space” • “space” : root directory of notebooks for each LINE service • Access rights: “read write”, “read only” Space 1 Read Write
 Users Read Only
 Users Notebooks Space 2 Read Write
 Users Read Only
 Users Notebooks
  • 26. PARAMETERS • Parameters can be injected into a notebook • Read only users can redraw a notebook while changing its parameters
  • 27. SCHEDULING • Automatically execute notebook • Keep notebook contents up to date • Periodically run ETL processing
  • 28. SMALL FILES PROBLEM • Consume a lot of NameNode’s memory • Degrade search performance • Default value of spark.sql.shuffle.partitions : 200
  • 29. DATA INSERTION API • oasis.insertOverwrite(query, table) • Replace spark.sql(query).write.mode(“overwrite”).insertInto(table) • Number of files are optimized automatically
  • 30. OASIS.INSERTOVERWRITE(QUERY, TABLE) 1. Create temporary table 2. Insert query result to temporary table Spark Application Tmp Table 1. spark.sql(“create table …”) 2. spark.sql(query).write.insertInto(tmpTable)
  • 31. OASIS.INSERTOVERWRITE(QUERY, TABLE) 3. Calculate optimal number of files 4. Recreate temporary table’s data with optimal number of files Spark Application Tmp Table 3. filesNum = total file size / block size 4. spark.sql(query).repartition(filesNum).write .mode(“overwrite”).insertInto(tmpTable)
  • 32. OASIS.INSERTOVERWRITE(QUERY, TABLE) 5. Drop Hive partitions from target table 6. Move temporary table’s files to target table Spark Application Tmp Table 5. spark.sql(“alter table … drop partition …”) 6. FileSystem.get(…).rename(tmpPath, targetPath) Target Table
  • 33. OASIS.INSERTOVERWRITE(QUERY, TABLE) 7. Add Hive partitions to the target table 8. Drop the temporary table Spark Application Tmp Table 7. spark.sql(“alter table … add partition …”) Target Table 8. spark.sql(“drop table …”)
  • 34. MULTIPLE SERVERS • Scalable • Highly available OASIS Spark Interpreter MySQL Redis Job Scheduler Frontend / API
  • 35. SPARK INTERPRETER ROUTING • Route information is managed on Redis • Code of the same notebook session goes to the same interpreter Spark Interpreter 1 Redis Frontend / API 1 End Users Load Balancer Frontend / API 2 Spark Interpreter 2 Update
 route Information Search
 route Information Spark
 application
 code Round-robin
  • 36. MULTIPLE JOB SCHEDULERS • Make OASIS Job Scheduler highly available • Utilize Quartz’s clustering feature MySQL Job Scheduler 1 Job Scheduler 2 Job Scheduler 3 Quartz Clustering
  • 37. HADOOP CLUSTER (DATA LAKE) • 500 DataNodes / NodeManagers • HDFS usage : 25PB • 150+ Hive databases • 1,500+ Hive tables
  • 38. Agenda 1. Motivation 2. Features 3. Use Cases
  • 40. USE CASES 1. Report 2. Interactive dashboard 3. ETL 4. Monitoring 5. Ad hoc analysis
  • 41.
  • 42.
  • 43.
  • 44.
  • 45.
  • 46. RECAP : OASIS • Data analysis platform for a multi-tenant Hadoop cluster • Data can be extracted, processed, visualized, and shared • Used for reporting, data monitoring, ad hoc analysis, etc. at LINE