SlideShare a Scribd company logo
Apache Tajo on Swift
Bringing SQL to the OpenStack World
Jihoon Son
Apache Tajo PMC member
Who am I
● Jihoon Son
○ Ph.D candidate (Computer Science & Engineering,
2010.3 ~)
○ Apache Tajo PMC and Committer (2014.5.1 ~)
○ Mentor of Google Summer of Code (2013)
● Contacts
○ Email: jihoonson AT apache.org
○ LinkedIn: https://www.linkedin.com/in/jihoonson
Outline
● OpenStack Swift
● Apache Tajo
● Tajo on Swift
● Demo
● Our Roadmap
OpenStack Swift
● Popular object storage
○ Images, videos, logs, ...
● Enterprises store objects on Swift to provide
their services
○ Usually private clusters
SQL on Swift
● Data analysis is important to improve the quality
of their services
○ SQL is one of the most powerful and popular query
language
● Many enterprise data analysis tools relying on
SQL
○ OLAP, visualization, data mining, …
● Need for using SQL on Swift
Apache Tajo
● Scalable, efficient, and fault-tolerant data
warehouse system
○ Support SQL standards compliance
○ Efficient batch execution and interactive ad-hoc
analysis
■ Low latency and high throughput
■ No use of MapReduce
○ No single point of failure
Apache Tajo
● Active open source project
○ 18 committers and 16 contributors
○ Activity summary
Apache Tajo
Pluggable Storage Layer
...
MasterMasterTajo
Master
Tajo
Worker
Tajo
Worker
Tajo
Worker
Tajo
Worker
...
Tajo on Swift
Pluggable Storage Layer
MasterMasterTajo
Master
Tajo
Worker
Tajo
Worker
Tajo
Worker
Tajo
Worker
...
...
Swift
Tajo on Swift
● No need to modify code of Tajo and Swift
○ Tajo can access Swift with the Hadoop-openstack
library
■ But, doesn’t need to install or run Hadoop
○ Just use it
Swift
Network
Tajo on Swift
● Configuration highlights
○ Swift configuration
■ Need the keystone authentication for the Hadoop
■ No additional configurations
○ HDFS configuration
■ Different cloud providers support
● Key name pattern
fs.swift.service.${provider}
Tajo on Swift
● Configuration highlights
○ Swift configuration
■ Need the keystone authentication for the HDFS client
■ No additional configurations
○ HDFS configuration
■ Different cloud providers support
● Key name pattern
fs.swift.service.${provider}
Tajo on Swift
● Data locality problem
Worker
Storage
Node
Interconnection Network
Node A
Worker
Node B
Storage
Node
Significant
Network
Overhead
Tajo on Swift
● Data locality problem
Worker
Storage
Node
Interconnection Network
Node A
Worker
Node B
Storage
Node
Advanced Integration
● List endpoints middleware
○ Providing the location information of objects,
accounts or containers
■ Tajo workers can directly access each object
○ Example
Advanced Integration
● List endpoints middleware
○ Swift configuration
○
○
○ Hadoop configuration
Advanced Integration
● Location-aware computing
○ Moving the processing close to the data
■ Avoiding the performance degradation due to the data
transfer over the network
○ Important issue when Tajo and Swift share the same
cluster
Location-aware Computing
Storage
Node
Storage
Node
Storage
Node
Query
Master
MasterMasterProxy
Server
Tajo
Worker
Tajo
Worker
Tajo
Worker
Data
location
Data
Swift Cluster Tajo Cluster
Storage
Node
Location-aware Computing
1. Getting object locations from the ring
Query
Master
MasterMasterProxy
Server
Get object locations
Storage
Node
Storage
Node
Location-aware Computing
2. Assigning tasks based on object locations
Query
Master
Worker Worker Worker ...
Storage
Node
Storage
Node
Storage
Node
...
Assign tasks
close to the object
Directly read object data
Demo
Our Roadmap
● Storage layer specialized for Swift
● Block storage support
○ Cinder and Ceph
● Provisioning Tajo clusters
○ Sahara
○ Heat, TOSCA
Thanks!
http://tajo.apache.org/

More Related Content

What's hot

Elasticsearch as a time series database
Elasticsearch as a time series databaseElasticsearch as a time series database
Elasticsearch as a time series database
felixbarny
 
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Fwdays
 
HDP2 and YARN operations point
HDP2 and YARN operations pointHDP2 and YARN operations point
HDP2 and YARN operations point
Treasure Data, Inc.
 
Spark Core
Spark CoreSpark Core
Spark Core
Todd McGrath
 
Apache Arrow: In Theory, In Practice
Apache Arrow: In Theory, In PracticeApache Arrow: In Theory, In Practice
Apache Arrow: In Theory, In Practice
Dremio Corporation
 
Scylla Summit 2018: Scaling your time series data with Newts
Scylla Summit 2018: Scaling your time series data with NewtsScylla Summit 2018: Scaling your time series data with Newts
Scylla Summit 2018: Scaling your time series data with Newts
ScyllaDB
 
MongoDB SF Ruby
MongoDB SF RubyMongoDB SF Ruby
MongoDB SF Ruby
Mike Dirolf
 
Data-Driven Development Era and Its Technologies
Data-Driven Development Era and Its TechnologiesData-Driven Development Era and Its Technologies
Data-Driven Development Era and Its Technologies
SATOSHI TAGOMORI
 
Bleeding Edge Databases
Bleeding Edge DatabasesBleeding Edge Databases
Bleeding Edge Databases
Lynn Langit
 
Austin bdug 2011_01_27_small_and_big_data
Austin bdug 2011_01_27_small_and_big_dataAustin bdug 2011_01_27_small_and_big_data
Austin bdug 2011_01_27_small_and_big_data
Alex Pinkin
 
Amazon aws big data demystified | Introduction to streaming and messaging flu...
Amazon aws big data demystified | Introduction to streaming and messaging flu...Amazon aws big data demystified | Introduction to streaming and messaging flu...
Amazon aws big data demystified | Introduction to streaming and messaging flu...
Omid Vahdaty
 
SQL, NoSQL, Distributed SQL: Choose your DataStore carefully
SQL, NoSQL, Distributed SQL: Choose your DataStore carefullySQL, NoSQL, Distributed SQL: Choose your DataStore carefully
SQL, NoSQL, Distributed SQL: Choose your DataStore carefully
Md Kamaruzzaman
 
GPS Insight on Using Presto with Scylla for Data Analytics and Data Archival
GPS Insight on Using Presto with Scylla for Data Analytics and Data ArchivalGPS Insight on Using Presto with Scylla for Data Analytics and Data Archival
GPS Insight on Using Presto with Scylla for Data Analytics and Data Archival
ScyllaDB
 
ManetoDB: Key/Value storage, BigData in Open Stack_Сергей Ковалев, Илья Свиридов
ManetoDB: Key/Value storage, BigData in Open Stack_Сергей Ковалев, Илья СвиридовManetoDB: Key/Value storage, BigData in Open Stack_Сергей Ковалев, Илья Свиридов
ManetoDB: Key/Value storage, BigData in Open Stack_Сергей Ковалев, Илья Свиридов
GeeksLab Odessa
 
Presto@Uber
Presto@UberPresto@Uber
Presto@Uber
Zhenxiao Luo
 
Open source big data landscape and possible ITS applications
Open source big data landscape and possible ITS applicationsOpen source big data landscape and possible ITS applications
Open source big data landscape and possible ITS applications
SoftwareMill
 
MongoDB Replication fundamentals - Desert Code Camp - October 2014
MongoDB Replication fundamentals - Desert Code Camp - October 2014MongoDB Replication fundamentals - Desert Code Camp - October 2014
MongoDB Replication fundamentals - Desert Code Camp - October 2014
Avinash Ramineni
 
Presto Fast SQL on Anything
Presto Fast SQL on AnythingPresto Fast SQL on Anything
Presto Fast SQL on Anything
Alluxio, Inc.
 
How ReversingLabs Serves File Reputation Service for 10B Files
How ReversingLabs Serves File Reputation Service for 10B FilesHow ReversingLabs Serves File Reputation Service for 10B Files
How ReversingLabs Serves File Reputation Service for 10B Files
ScyllaDB
 
AWS Big Data Demystified #2 | Athena, Spectrum, Emr, Hive
AWS Big Data Demystified #2 |  Athena, Spectrum, Emr, Hive AWS Big Data Demystified #2 |  Athena, Spectrum, Emr, Hive
AWS Big Data Demystified #2 | Athena, Spectrum, Emr, Hive
Omid Vahdaty
 

What's hot (20)

Elasticsearch as a time series database
Elasticsearch as a time series databaseElasticsearch as a time series database
Elasticsearch as a time series database
 
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
 
HDP2 and YARN operations point
HDP2 and YARN operations pointHDP2 and YARN operations point
HDP2 and YARN operations point
 
Spark Core
Spark CoreSpark Core
Spark Core
 
Apache Arrow: In Theory, In Practice
Apache Arrow: In Theory, In PracticeApache Arrow: In Theory, In Practice
Apache Arrow: In Theory, In Practice
 
Scylla Summit 2018: Scaling your time series data with Newts
Scylla Summit 2018: Scaling your time series data with NewtsScylla Summit 2018: Scaling your time series data with Newts
Scylla Summit 2018: Scaling your time series data with Newts
 
MongoDB SF Ruby
MongoDB SF RubyMongoDB SF Ruby
MongoDB SF Ruby
 
Data-Driven Development Era and Its Technologies
Data-Driven Development Era and Its TechnologiesData-Driven Development Era and Its Technologies
Data-Driven Development Era and Its Technologies
 
Bleeding Edge Databases
Bleeding Edge DatabasesBleeding Edge Databases
Bleeding Edge Databases
 
Austin bdug 2011_01_27_small_and_big_data
Austin bdug 2011_01_27_small_and_big_dataAustin bdug 2011_01_27_small_and_big_data
Austin bdug 2011_01_27_small_and_big_data
 
Amazon aws big data demystified | Introduction to streaming and messaging flu...
Amazon aws big data demystified | Introduction to streaming and messaging flu...Amazon aws big data demystified | Introduction to streaming and messaging flu...
Amazon aws big data demystified | Introduction to streaming and messaging flu...
 
SQL, NoSQL, Distributed SQL: Choose your DataStore carefully
SQL, NoSQL, Distributed SQL: Choose your DataStore carefullySQL, NoSQL, Distributed SQL: Choose your DataStore carefully
SQL, NoSQL, Distributed SQL: Choose your DataStore carefully
 
GPS Insight on Using Presto with Scylla for Data Analytics and Data Archival
GPS Insight on Using Presto with Scylla for Data Analytics and Data ArchivalGPS Insight on Using Presto with Scylla for Data Analytics and Data Archival
GPS Insight on Using Presto with Scylla for Data Analytics and Data Archival
 
ManetoDB: Key/Value storage, BigData in Open Stack_Сергей Ковалев, Илья Свиридов
ManetoDB: Key/Value storage, BigData in Open Stack_Сергей Ковалев, Илья СвиридовManetoDB: Key/Value storage, BigData in Open Stack_Сергей Ковалев, Илья Свиридов
ManetoDB: Key/Value storage, BigData in Open Stack_Сергей Ковалев, Илья Свиридов
 
Presto@Uber
Presto@UberPresto@Uber
Presto@Uber
 
Open source big data landscape and possible ITS applications
Open source big data landscape and possible ITS applicationsOpen source big data landscape and possible ITS applications
Open source big data landscape and possible ITS applications
 
MongoDB Replication fundamentals - Desert Code Camp - October 2014
MongoDB Replication fundamentals - Desert Code Camp - October 2014MongoDB Replication fundamentals - Desert Code Camp - October 2014
MongoDB Replication fundamentals - Desert Code Camp - October 2014
 
Presto Fast SQL on Anything
Presto Fast SQL on AnythingPresto Fast SQL on Anything
Presto Fast SQL on Anything
 
How ReversingLabs Serves File Reputation Service for 10B Files
How ReversingLabs Serves File Reputation Service for 10B FilesHow ReversingLabs Serves File Reputation Service for 10B Files
How ReversingLabs Serves File Reputation Service for 10B Files
 
AWS Big Data Demystified #2 | Athena, Spectrum, Emr, Hive
AWS Big Data Demystified #2 |  Athena, Spectrum, Emr, Hive AWS Big Data Demystified #2 |  Athena, Spectrum, Emr, Hive
AWS Big Data Demystified #2 | Athena, Spectrum, Emr, Hive
 

Viewers also liked

テレビに未来はあるか!?
テレビに未来はあるか!?テレビに未来はあるか!?
テレビに未来はあるか!?momoe
 
Le Filatrici
Le FilatriciLe Filatrici
Le Filatrici
guido522
 
Vida Sencilla
Vida SencillaVida Sencilla
Vida Sencilla
jverges
 
Apache Tajo on Swift: Bringing SQL to the OpenStack World
Apache Tajo on Swift: Bringing SQL to the OpenStack WorldApache Tajo on Swift: Bringing SQL to the OpenStack World
Apache Tajo on Swift: Bringing SQL to the OpenStack World
Jihoon Son
 
대학과 오픈소스
대학과 오픈소스대학과 오픈소스
대학과 오픈소스
Jihoon Son
 
Apache tajo configuration
Apache tajo configurationApache tajo configuration
Apache tajo configuration
Jihoon Son
 

Viewers also liked (6)

テレビに未来はあるか!?
テレビに未来はあるか!?テレビに未来はあるか!?
テレビに未来はあるか!?
 
Le Filatrici
Le FilatriciLe Filatrici
Le Filatrici
 
Vida Sencilla
Vida SencillaVida Sencilla
Vida Sencilla
 
Apache Tajo on Swift: Bringing SQL to the OpenStack World
Apache Tajo on Swift: Bringing SQL to the OpenStack WorldApache Tajo on Swift: Bringing SQL to the OpenStack World
Apache Tajo on Swift: Bringing SQL to the OpenStack World
 
대학과 오픈소스
대학과 오픈소스대학과 오픈소스
대학과 오픈소스
 
Apache tajo configuration
Apache tajo configurationApache tajo configuration
Apache tajo configuration
 

Similar to Apache Tajo on Swift

PyConIE 2017 Writing and deploying serverless python applications
PyConIE 2017 Writing and deploying serverless python applicationsPyConIE 2017 Writing and deploying serverless python applications
PyConIE 2017 Writing and deploying serverless python applications
Cesar Cardenas Desales
 
Savanna - Elastic Hadoop on OpenStack
Savanna - Elastic Hadoop on OpenStackSavanna - Elastic Hadoop on OpenStack
Savanna - Elastic Hadoop on OpenStack
Sergey Lukjanov
 
Introducing Datawave
Introducing DatawaveIntroducing Datawave
Introducing Datawave
Accumulo Summit
 
Netflix Big Data Paris 2017
Netflix Big Data Paris 2017Netflix Big Data Paris 2017
Netflix Big Data Paris 2017
Jason Flittner
 
Data Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFixData Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFix
C4Media
 
VN Tech Seminor Vol.1
VN Tech Seminor Vol.1VN Tech Seminor Vol.1
VN Tech Seminor Vol.1
Shuhei Yamashita
 
PyConIT 2018 Writing and deploying serverless python applications
PyConIT 2018 Writing and deploying serverless python applicationsPyConIT 2018 Writing and deploying serverless python applications
PyConIT 2018 Writing and deploying serverless python applications
Cesar Cardenas Desales
 
Accelerating analytics in the cloud with the Starburst Presto + Alluxio stack
Accelerating analytics in the cloud with the Starburst Presto + Alluxio stackAccelerating analytics in the cloud with the Starburst Presto + Alluxio stack
Accelerating analytics in the cloud with the Starburst Presto + Alluxio stack
Alluxio, Inc.
 
Easy Microservices with JHipster - Devoxx BE 2017
Easy Microservices with JHipster - Devoxx BE 2017Easy Microservices with JHipster - Devoxx BE 2017
Easy Microservices with JHipster - Devoxx BE 2017
Deepu K Sasidharan
 
Devoxx Belgium 2017 - easy microservices with JHipster
Devoxx Belgium 2017 - easy microservices with JHipsterDevoxx Belgium 2017 - easy microservices with JHipster
Devoxx Belgium 2017 - easy microservices with JHipster
Julien Dubois
 
Webinar about Spring Data Neo4j 4
Webinar about Spring Data Neo4j 4Webinar about Spring Data Neo4j 4
Webinar about Spring Data Neo4j 4
GraphAware
 
OpenSource and the Cloud ApacheCon.pptx
OpenSource and the Cloud  ApacheCon.pptxOpenSource and the Cloud  ApacheCon.pptx
OpenSource and the Cloud ApacheCon.pptx
lohitvijayarenu
 
Netflix Architecture and Open Source
Netflix Architecture and Open SourceNetflix Architecture and Open Source
Netflix Architecture and Open Source
All Things Open
 
Writing and deploying serverless python applications
Writing and deploying serverless python applicationsWriting and deploying serverless python applications
Writing and deploying serverless python applications
Cesar Cardenas Desales
 
CON6423: Scalable JavaScript applications with Project Nashorn
CON6423: Scalable JavaScript applications with Project NashornCON6423: Scalable JavaScript applications with Project Nashorn
CON6423: Scalable JavaScript applications with Project Nashorn
Michel Graciano
 
Visual, scalable, and manageable data loading to and from Neo4j with Apache Hop
Visual, scalable, and manageable data loading to and from Neo4j with Apache Hop Visual, scalable, and manageable data loading to and from Neo4j with Apache Hop
Visual, scalable, and manageable data loading to and from Neo4j with Apache Hop
Neo4j
 
Designing for operability and managability
Designing for operability and managabilityDesigning for operability and managability
Designing for operability and managability
Gaurav Bahrani
 
Hadoop on OpenStack - Sahara @DevNation 2014
Hadoop on OpenStack - Sahara @DevNation 2014Hadoop on OpenStack - Sahara @DevNation 2014
Hadoop on OpenStack - Sahara @DevNation 2014
spinningmatt
 
Languages don't matter anymore!
Languages don't matter anymore!Languages don't matter anymore!
Languages don't matter anymore!
Soluto
 
Snowflake Automated Deployments / CI/CD Pipelines
Snowflake Automated Deployments / CI/CD PipelinesSnowflake Automated Deployments / CI/CD Pipelines
Snowflake Automated Deployments / CI/CD Pipelines
Drew Hansen
 

Similar to Apache Tajo on Swift (20)

PyConIE 2017 Writing and deploying serverless python applications
PyConIE 2017 Writing and deploying serverless python applicationsPyConIE 2017 Writing and deploying serverless python applications
PyConIE 2017 Writing and deploying serverless python applications
 
Savanna - Elastic Hadoop on OpenStack
Savanna - Elastic Hadoop on OpenStackSavanna - Elastic Hadoop on OpenStack
Savanna - Elastic Hadoop on OpenStack
 
Introducing Datawave
Introducing DatawaveIntroducing Datawave
Introducing Datawave
 
Netflix Big Data Paris 2017
Netflix Big Data Paris 2017Netflix Big Data Paris 2017
Netflix Big Data Paris 2017
 
Data Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFixData Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFix
 
VN Tech Seminor Vol.1
VN Tech Seminor Vol.1VN Tech Seminor Vol.1
VN Tech Seminor Vol.1
 
PyConIT 2018 Writing and deploying serverless python applications
PyConIT 2018 Writing and deploying serverless python applicationsPyConIT 2018 Writing and deploying serverless python applications
PyConIT 2018 Writing and deploying serverless python applications
 
Accelerating analytics in the cloud with the Starburst Presto + Alluxio stack
Accelerating analytics in the cloud with the Starburst Presto + Alluxio stackAccelerating analytics in the cloud with the Starburst Presto + Alluxio stack
Accelerating analytics in the cloud with the Starburst Presto + Alluxio stack
 
Easy Microservices with JHipster - Devoxx BE 2017
Easy Microservices with JHipster - Devoxx BE 2017Easy Microservices with JHipster - Devoxx BE 2017
Easy Microservices with JHipster - Devoxx BE 2017
 
Devoxx Belgium 2017 - easy microservices with JHipster
Devoxx Belgium 2017 - easy microservices with JHipsterDevoxx Belgium 2017 - easy microservices with JHipster
Devoxx Belgium 2017 - easy microservices with JHipster
 
Webinar about Spring Data Neo4j 4
Webinar about Spring Data Neo4j 4Webinar about Spring Data Neo4j 4
Webinar about Spring Data Neo4j 4
 
OpenSource and the Cloud ApacheCon.pptx
OpenSource and the Cloud  ApacheCon.pptxOpenSource and the Cloud  ApacheCon.pptx
OpenSource and the Cloud ApacheCon.pptx
 
Netflix Architecture and Open Source
Netflix Architecture and Open SourceNetflix Architecture and Open Source
Netflix Architecture and Open Source
 
Writing and deploying serverless python applications
Writing and deploying serverless python applicationsWriting and deploying serverless python applications
Writing and deploying serverless python applications
 
CON6423: Scalable JavaScript applications with Project Nashorn
CON6423: Scalable JavaScript applications with Project NashornCON6423: Scalable JavaScript applications with Project Nashorn
CON6423: Scalable JavaScript applications with Project Nashorn
 
Visual, scalable, and manageable data loading to and from Neo4j with Apache Hop
Visual, scalable, and manageable data loading to and from Neo4j with Apache Hop Visual, scalable, and manageable data loading to and from Neo4j with Apache Hop
Visual, scalable, and manageable data loading to and from Neo4j with Apache Hop
 
Designing for operability and managability
Designing for operability and managabilityDesigning for operability and managability
Designing for operability and managability
 
Hadoop on OpenStack - Sahara @DevNation 2014
Hadoop on OpenStack - Sahara @DevNation 2014Hadoop on OpenStack - Sahara @DevNation 2014
Hadoop on OpenStack - Sahara @DevNation 2014
 
Languages don't matter anymore!
Languages don't matter anymore!Languages don't matter anymore!
Languages don't matter anymore!
 
Snowflake Automated Deployments / CI/CD Pipelines
Snowflake Automated Deployments / CI/CD PipelinesSnowflake Automated Deployments / CI/CD Pipelines
Snowflake Automated Deployments / CI/CD Pipelines
 

Recently uploaded

学校原版美国波士顿大学毕业证学历学位证书原版一模一样
学校原版美国波士顿大学毕业证学历学位证书原版一模一样学校原版美国波士顿大学毕业证学历学位证书原版一模一样
学校原版美国波士顿大学毕业证学历学位证书原版一模一样
171ticu
 
Embedded machine learning-based road conditions and driving behavior monitoring
Embedded machine learning-based road conditions and driving behavior monitoringEmbedded machine learning-based road conditions and driving behavior monitoring
Embedded machine learning-based road conditions and driving behavior monitoring
IJECEIAES
 
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
IJECEIAES
 
Gas agency management system project report.pdf
Gas agency management system project report.pdfGas agency management system project report.pdf
Gas agency management system project report.pdf
Kamal Acharya
 
Welding Metallurgy Ferrous Materials.pdf
Welding Metallurgy Ferrous Materials.pdfWelding Metallurgy Ferrous Materials.pdf
Welding Metallurgy Ferrous Materials.pdf
AjmalKhan50578
 
morris_worm_intro_and_source_code_analysis_.pdf
morris_worm_intro_and_source_code_analysis_.pdfmorris_worm_intro_and_source_code_analysis_.pdf
morris_worm_intro_and_source_code_analysis_.pdf
ycwu0509
 
一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理
一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理
一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理
nedcocy
 
Properties Railway Sleepers and Test.pptx
Properties Railway Sleepers and Test.pptxProperties Railway Sleepers and Test.pptx
Properties Railway Sleepers and Test.pptx
MDSABBIROJJAMANPAYEL
 
1FIDIC-CONSTRUCTION-CONTRACT-2ND-ED-2017-RED-BOOK.pdf
1FIDIC-CONSTRUCTION-CONTRACT-2ND-ED-2017-RED-BOOK.pdf1FIDIC-CONSTRUCTION-CONTRACT-2ND-ED-2017-RED-BOOK.pdf
1FIDIC-CONSTRUCTION-CONTRACT-2ND-ED-2017-RED-BOOK.pdf
MadhavJungKarki
 
Computational Engineering IITH Presentation
Computational Engineering IITH PresentationComputational Engineering IITH Presentation
Computational Engineering IITH Presentation
co23btech11018
 
Digital Twins Computer Networking Paper Presentation.pptx
Digital Twins Computer Networking Paper Presentation.pptxDigital Twins Computer Networking Paper Presentation.pptx
Digital Twins Computer Networking Paper Presentation.pptx
aryanpankaj78
 
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODELDEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
ijaia
 
132/33KV substation case study Presentation
132/33KV substation case study Presentation132/33KV substation case study Presentation
132/33KV substation case study Presentation
kandramariana6
 
An Introduction to the Compiler Designss
An Introduction to the Compiler DesignssAn Introduction to the Compiler Designss
An Introduction to the Compiler Designss
ElakkiaU
 
Advanced control scheme of doubly fed induction generator for wind turbine us...
Advanced control scheme of doubly fed induction generator for wind turbine us...Advanced control scheme of doubly fed induction generator for wind turbine us...
Advanced control scheme of doubly fed induction generator for wind turbine us...
IJECEIAES
 
AI for Legal Research with applications, tools
AI for Legal Research with applications, toolsAI for Legal Research with applications, tools
AI for Legal Research with applications, tools
mahaffeycheryld
 
Mechanical Engineering on AAI Summer Training Report-003.pdf
Mechanical Engineering on AAI Summer Training Report-003.pdfMechanical Engineering on AAI Summer Training Report-003.pdf
Mechanical Engineering on AAI Summer Training Report-003.pdf
21UME003TUSHARDEB
 
ITSM Integration with MuleSoft.pptx
ITSM  Integration with MuleSoft.pptxITSM  Integration with MuleSoft.pptx
ITSM Integration with MuleSoft.pptx
VANDANAMOHANGOUDA
 
AI + Data Community Tour - Build the Next Generation of Apps with the Einstei...
AI + Data Community Tour - Build the Next Generation of Apps with the Einstei...AI + Data Community Tour - Build the Next Generation of Apps with the Einstei...
AI + Data Community Tour - Build the Next Generation of Apps with the Einstei...
Paris Salesforce Developer Group
 
Software Engineering and Project Management - Introduction, Modeling Concepts...
Software Engineering and Project Management - Introduction, Modeling Concepts...Software Engineering and Project Management - Introduction, Modeling Concepts...
Software Engineering and Project Management - Introduction, Modeling Concepts...
Prakhyath Rai
 

Recently uploaded (20)

学校原版美国波士顿大学毕业证学历学位证书原版一模一样
学校原版美国波士顿大学毕业证学历学位证书原版一模一样学校原版美国波士顿大学毕业证学历学位证书原版一模一样
学校原版美国波士顿大学毕业证学历学位证书原版一模一样
 
Embedded machine learning-based road conditions and driving behavior monitoring
Embedded machine learning-based road conditions and driving behavior monitoringEmbedded machine learning-based road conditions and driving behavior monitoring
Embedded machine learning-based road conditions and driving behavior monitoring
 
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
 
Gas agency management system project report.pdf
Gas agency management system project report.pdfGas agency management system project report.pdf
Gas agency management system project report.pdf
 
Welding Metallurgy Ferrous Materials.pdf
Welding Metallurgy Ferrous Materials.pdfWelding Metallurgy Ferrous Materials.pdf
Welding Metallurgy Ferrous Materials.pdf
 
morris_worm_intro_and_source_code_analysis_.pdf
morris_worm_intro_and_source_code_analysis_.pdfmorris_worm_intro_and_source_code_analysis_.pdf
morris_worm_intro_and_source_code_analysis_.pdf
 
一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理
一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理
一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理
 
Properties Railway Sleepers and Test.pptx
Properties Railway Sleepers and Test.pptxProperties Railway Sleepers and Test.pptx
Properties Railway Sleepers and Test.pptx
 
1FIDIC-CONSTRUCTION-CONTRACT-2ND-ED-2017-RED-BOOK.pdf
1FIDIC-CONSTRUCTION-CONTRACT-2ND-ED-2017-RED-BOOK.pdf1FIDIC-CONSTRUCTION-CONTRACT-2ND-ED-2017-RED-BOOK.pdf
1FIDIC-CONSTRUCTION-CONTRACT-2ND-ED-2017-RED-BOOK.pdf
 
Computational Engineering IITH Presentation
Computational Engineering IITH PresentationComputational Engineering IITH Presentation
Computational Engineering IITH Presentation
 
Digital Twins Computer Networking Paper Presentation.pptx
Digital Twins Computer Networking Paper Presentation.pptxDigital Twins Computer Networking Paper Presentation.pptx
Digital Twins Computer Networking Paper Presentation.pptx
 
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODELDEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
 
132/33KV substation case study Presentation
132/33KV substation case study Presentation132/33KV substation case study Presentation
132/33KV substation case study Presentation
 
An Introduction to the Compiler Designss
An Introduction to the Compiler DesignssAn Introduction to the Compiler Designss
An Introduction to the Compiler Designss
 
Advanced control scheme of doubly fed induction generator for wind turbine us...
Advanced control scheme of doubly fed induction generator for wind turbine us...Advanced control scheme of doubly fed induction generator for wind turbine us...
Advanced control scheme of doubly fed induction generator for wind turbine us...
 
AI for Legal Research with applications, tools
AI for Legal Research with applications, toolsAI for Legal Research with applications, tools
AI for Legal Research with applications, tools
 
Mechanical Engineering on AAI Summer Training Report-003.pdf
Mechanical Engineering on AAI Summer Training Report-003.pdfMechanical Engineering on AAI Summer Training Report-003.pdf
Mechanical Engineering on AAI Summer Training Report-003.pdf
 
ITSM Integration with MuleSoft.pptx
ITSM  Integration with MuleSoft.pptxITSM  Integration with MuleSoft.pptx
ITSM Integration with MuleSoft.pptx
 
AI + Data Community Tour - Build the Next Generation of Apps with the Einstei...
AI + Data Community Tour - Build the Next Generation of Apps with the Einstei...AI + Data Community Tour - Build the Next Generation of Apps with the Einstei...
AI + Data Community Tour - Build the Next Generation of Apps with the Einstei...
 
Software Engineering and Project Management - Introduction, Modeling Concepts...
Software Engineering and Project Management - Introduction, Modeling Concepts...Software Engineering and Project Management - Introduction, Modeling Concepts...
Software Engineering and Project Management - Introduction, Modeling Concepts...
 

Apache Tajo on Swift

  • 1. Apache Tajo on Swift Bringing SQL to the OpenStack World Jihoon Son Apache Tajo PMC member
  • 2. Who am I ● Jihoon Son ○ Ph.D candidate (Computer Science & Engineering, 2010.3 ~) ○ Apache Tajo PMC and Committer (2014.5.1 ~) ○ Mentor of Google Summer of Code (2013) ● Contacts ○ Email: jihoonson AT apache.org ○ LinkedIn: https://www.linkedin.com/in/jihoonson
  • 3. Outline ● OpenStack Swift ● Apache Tajo ● Tajo on Swift ● Demo ● Our Roadmap
  • 4. OpenStack Swift ● Popular object storage ○ Images, videos, logs, ... ● Enterprises store objects on Swift to provide their services ○ Usually private clusters
  • 5. SQL on Swift ● Data analysis is important to improve the quality of their services ○ SQL is one of the most powerful and popular query language ● Many enterprise data analysis tools relying on SQL ○ OLAP, visualization, data mining, … ● Need for using SQL on Swift
  • 6. Apache Tajo ● Scalable, efficient, and fault-tolerant data warehouse system ○ Support SQL standards compliance ○ Efficient batch execution and interactive ad-hoc analysis ■ Low latency and high throughput ■ No use of MapReduce ○ No single point of failure
  • 7. Apache Tajo ● Active open source project ○ 18 committers and 16 contributors ○ Activity summary
  • 8. Apache Tajo Pluggable Storage Layer ... MasterMasterTajo Master Tajo Worker Tajo Worker Tajo Worker Tajo Worker ...
  • 9. Tajo on Swift Pluggable Storage Layer MasterMasterTajo Master Tajo Worker Tajo Worker Tajo Worker Tajo Worker ... ... Swift
  • 10. Tajo on Swift ● No need to modify code of Tajo and Swift ○ Tajo can access Swift with the Hadoop-openstack library ■ But, doesn’t need to install or run Hadoop ○ Just use it Swift Network
  • 11. Tajo on Swift ● Configuration highlights ○ Swift configuration ■ Need the keystone authentication for the Hadoop ■ No additional configurations ○ HDFS configuration ■ Different cloud providers support ● Key name pattern fs.swift.service.${provider}
  • 12. Tajo on Swift ● Configuration highlights ○ Swift configuration ■ Need the keystone authentication for the HDFS client ■ No additional configurations ○ HDFS configuration ■ Different cloud providers support ● Key name pattern fs.swift.service.${provider}
  • 13. Tajo on Swift ● Data locality problem Worker Storage Node Interconnection Network Node A Worker Node B Storage Node Significant Network Overhead
  • 14. Tajo on Swift ● Data locality problem Worker Storage Node Interconnection Network Node A Worker Node B Storage Node
  • 15. Advanced Integration ● List endpoints middleware ○ Providing the location information of objects, accounts or containers ■ Tajo workers can directly access each object ○ Example
  • 16. Advanced Integration ● List endpoints middleware ○ Swift configuration ○ ○ ○ Hadoop configuration
  • 17. Advanced Integration ● Location-aware computing ○ Moving the processing close to the data ■ Avoiding the performance degradation due to the data transfer over the network ○ Important issue when Tajo and Swift share the same cluster
  • 19. Storage Node Location-aware Computing 1. Getting object locations from the ring Query Master MasterMasterProxy Server Get object locations Storage Node Storage Node
  • 20. Location-aware Computing 2. Assigning tasks based on object locations Query Master Worker Worker Worker ... Storage Node Storage Node Storage Node ... Assign tasks close to the object Directly read object data
  • 21. Demo
  • 22. Our Roadmap ● Storage layer specialized for Swift ● Block storage support ○ Cinder and Ceph ● Provisioning Tajo clusters ○ Sahara ○ Heat, TOSCA