SlideShare a Scribd company logo
1 of 26
Download to read offline
Pharos as a Pluggable Secondary
Index Component
Lei Wang
China Everbright Bank Enterprise Architect
Introduce
Lei Wang 王 磊
光大银行科技部领域架构师,曾任职于IBM全球咨询服务部从事技术咨询工作,具有十余年数据领域
研发及咨询经验。目前负责全行数据领域系统的日常架构设计、评审及内部研发等工作,对分布式数据库、
Hadoop等基础架构研究有浓厚兴趣。
I Overall Introduction
II Architecture Design
III Future Development
Part I
Overall Introduction
Research Background
Existing Products or Solutions
1. Native Filter,Low performance
2. HBase +Solr(ES) , Complex architecture / Performance is not good enough / High maintenance cost
3. Phoenix,Heavy Solution / Community inactivity / Imperfect function
4. HBase On Cloud,Unable to use because of security requirements
Our requirements
1. Non-Invasive
2. High performance
3. Universal
4. Simple architecture
5. Support transaction consistency
Pharos
1. Name
comes from English word ‘pharos’
2. Business Scenarios
•Read Only , T+1 Batch Load
•Read and Less Write ( Experimental )
3. Design Principle
Non-Invasive, Simple architecture
Research Process
Startup
V0.1 Release V0.22 Release
V0.3
Single Index
Multi Condition
Multi Data Type
V0.2 Release
Multi Index
Sorting
Paging
Cache
Index Builder Improvement
Refactoring Code
Bitmap Index
CBO Improvement
More Complex Conditions
Todo…
Transaction Consistency Index
2018.4 2018.11 2019.3 2019.7 2019.11
Pharos V0.22 Features (on HBase 1.2.6 or CDH 5.8.3-HBase 1.2.0)
1. Single Index(single column、multi column),Multi Index.
2. Paging, Sorting.
3. Multi Condition Query,including equal, less (equal), greater (equal).
4. AND / OR logic operation.
5. Multiple Data Type, including Char, Date ,Double and so on.
6. Simple Function Compute, for example record count.
7. Batch Index Creation.
Components
Client
API
Coordinator
Server
Coprocessor
Major Business Logic
Builder
Create Index
Coprocessor
Client
1. Define conditions
2. Set conditions in the scan
Server
1. Intercept and parse scan
2. Scan Index
3. Set matched index to filter
Client Code Example
1. Keep the original HBase style
2. Avoiding the complexity of SQL parser
Part II
Architecture Design
Global Index VS Partition(Local)Index
Global Index
• Support unique index
• Index creating and updating will be part of distributed
transactions, performance is not good.
• Query will cross different nodes, so performance may not
be good
Partition Index
• Index and data are co-distribution ,so queries can be pushed down to
each node. We can get good performance.
• Avoiding distributed transaction
• Not support unique index or other global constraints.
Storage Policy
Shadow Column Family
Index and data are exist the same region but in different column
family. We just need to control the generating logic of the index start
rowkey. It is un-invasive.
Single Index Table
Region is the smallest unit that is balanced. We must
guarantee that an index region is distributed with the
corresponding data region. So we must the modify the
balancer . It is invasive.
Index Key
1. Start key, keep index co-distribution
with data
2. Index name / number
3. Indexed column value
4. Reference data row key
Index Value
1. Version info
2. Metadata for deserialization
3. Transaction flag
Index Data Structure
Client as Global Coordinator, Keep Arch Simple
Two-Phase Sorting
1. Region Side Sorting
Base on natural sequence of index
2. Client Side Sorting
Merge results from regions, If data skew occurs,
query again.
Sorting Mechanisms
Reason
1. Paging is a universal requirement.
2. Always, the matched index is greater than the memory.
Design Strategy
Adding global session, shielding internal complexity.
One session breakpoint mapping multi region’s breakpoint.
Implement
1.Assuming that the data is evenly distributed, the page size
is spread to each region.
2.Region side, we control return indexes number and cache
the breakpoint.
3.Client side, merge result, if not enough then query again .
Paging Mechanisms
Choose local cache instead of distributed cache
Keep Arch Simple
Client side cache + Server side cache
Cache Mechanisms
Avoid Rebuilding index due to Region Splitting.
For different rowkey design, Bulkload may lead to
region splitting.
After data loading, stable regions can be obtained.
So we can create indexes and keep co-distribution
with the reference data.
Index Builder
Architecture
Part III
Future Development
Transaction Consistency
Inspire by Google’s Percolator
Bob -> Joe $7
Deformation of 2 Phase Commit
The state of the primary data is the state of the
transaction.
In the write process, only modify the primary state.
In the subsequent query, the state of second data
can be modified asynchronously based on the
primary.
Step 4 -> Transaction complete
Transaction Consistency
In the write process, we can
complete majority consistency, but
not all;
In the read process, we can confirm
transaction state by data row state,
then update index state.
Bitmap Index
CBO Improvement
Integration with SQL Engine(Presto?)
Other features
Thanks!
公众号:金融数士

More Related Content

Similar to hbaseconasia2019 Pharos as a Pluggable Secondary Index Component

J2EE Batch Processing
J2EE Batch ProcessingJ2EE Batch Processing
J2EE Batch ProcessingChris Adkin
 
Large Scale SQL Considerations for SharePoint Deployments
Large Scale SQL Considerations for SharePoint DeploymentsLarge Scale SQL Considerations for SharePoint Deployments
Large Scale SQL Considerations for SharePoint DeploymentsJoel Oleson
 
Qardio experience with Core Data
Qardio experience with Core DataQardio experience with Core Data
Qardio experience with Core DataDmitrii Ivanov
 
BI Environment Technical Analysis
BI Environment Technical AnalysisBI Environment Technical Analysis
BI Environment Technical AnalysisRyan Casey
 
Government and Education Webinar: SQL Server—Indexing for Performance
Government and Education Webinar: SQL Server—Indexing for PerformanceGovernment and Education Webinar: SQL Server—Indexing for Performance
Government and Education Webinar: SQL Server—Indexing for PerformanceSolarWinds
 
Azure Days 2019: Grösser und Komplexer ist nicht immer besser (Meinrad Weiss)
Azure Days 2019: Grösser und Komplexer ist nicht immer besser (Meinrad Weiss)Azure Days 2019: Grösser und Komplexer ist nicht immer besser (Meinrad Weiss)
Azure Days 2019: Grösser und Komplexer ist nicht immer besser (Meinrad Weiss)Trivadis
 
Building High Performance MySQL Query Systems and Analytic Applications
Building High Performance MySQL Query Systems and Analytic ApplicationsBuilding High Performance MySQL Query Systems and Analytic Applications
Building High Performance MySQL Query Systems and Analytic ApplicationsCalpont
 
Building High Performance MySql Query Systems And Analytic Applications
Building High Performance MySql Query Systems And Analytic ApplicationsBuilding High Performance MySql Query Systems And Analytic Applications
Building High Performance MySql Query Systems And Analytic Applicationsguest40cda0b
 
IRJET- Physical Database Design Techniques to improve Database Performance
IRJET-	 Physical Database Design Techniques to improve Database PerformanceIRJET-	 Physical Database Design Techniques to improve Database Performance
IRJET- Physical Database Design Techniques to improve Database PerformanceIRJET Journal
 
MySQL HA Presentation
MySQL HA PresentationMySQL HA Presentation
MySQL HA Presentationpapablues
 
Relational databases vs Non-relational databases
Relational databases vs Non-relational databasesRelational databases vs Non-relational databases
Relational databases vs Non-relational databasesJames Serra
 
MongoDB at eBay
MongoDB at eBayMongoDB at eBay
MongoDB at eBayMongoDB
 

Similar to hbaseconasia2019 Pharos as a Pluggable Secondary Index Component (20)

J2EE Batch Processing
J2EE Batch ProcessingJ2EE Batch Processing
J2EE Batch Processing
 
Large Scale SQL Considerations for SharePoint Deployments
Large Scale SQL Considerations for SharePoint DeploymentsLarge Scale SQL Considerations for SharePoint Deployments
Large Scale SQL Considerations for SharePoint Deployments
 
Qardio experience with Core Data
Qardio experience with Core DataQardio experience with Core Data
Qardio experience with Core Data
 
No sql database
No sql databaseNo sql database
No sql database
 
BI Environment Technical Analysis
BI Environment Technical AnalysisBI Environment Technical Analysis
BI Environment Technical Analysis
 
Government and Education Webinar: SQL Server—Indexing for Performance
Government and Education Webinar: SQL Server—Indexing for PerformanceGovernment and Education Webinar: SQL Server—Indexing for Performance
Government and Education Webinar: SQL Server—Indexing for Performance
 
Azure Days 2019: Grösser und Komplexer ist nicht immer besser (Meinrad Weiss)
Azure Days 2019: Grösser und Komplexer ist nicht immer besser (Meinrad Weiss)Azure Days 2019: Grösser und Komplexer ist nicht immer besser (Meinrad Weiss)
Azure Days 2019: Grösser und Komplexer ist nicht immer besser (Meinrad Weiss)
 
Richa_Profile
Richa_ProfileRicha_Profile
Richa_Profile
 
NoSQL Consepts
NoSQL ConseptsNoSQL Consepts
NoSQL Consepts
 
Building High Performance MySQL Query Systems and Analytic Applications
Building High Performance MySQL Query Systems and Analytic ApplicationsBuilding High Performance MySQL Query Systems and Analytic Applications
Building High Performance MySQL Query Systems and Analytic Applications
 
Building High Performance MySql Query Systems And Analytic Applications
Building High Performance MySql Query Systems And Analytic ApplicationsBuilding High Performance MySql Query Systems And Analytic Applications
Building High Performance MySql Query Systems And Analytic Applications
 
Spring
SpringSpring
Spring
 
IRJET- Physical Database Design Techniques to improve Database Performance
IRJET-	 Physical Database Design Techniques to improve Database PerformanceIRJET-	 Physical Database Design Techniques to improve Database Performance
IRJET- Physical Database Design Techniques to improve Database Performance
 
Redshift
RedshiftRedshift
Redshift
 
MySQL HA Presentation
MySQL HA PresentationMySQL HA Presentation
MySQL HA Presentation
 
Ravi Kiran Resume
Ravi Kiran ResumeRavi Kiran Resume
Ravi Kiran Resume
 
Relational databases vs Non-relational databases
Relational databases vs Non-relational databasesRelational databases vs Non-relational databases
Relational databases vs Non-relational databases
 
Isset Presentation @ EECI2009
Isset Presentation @ EECI2009Isset Presentation @ EECI2009
Isset Presentation @ EECI2009
 
Vineet Kurrewar
Vineet KurrewarVineet Kurrewar
Vineet Kurrewar
 
MongoDB at eBay
MongoDB at eBayMongoDB at eBay
MongoDB at eBay
 

More from Michael Stack

hbaseconasia2019 HBase Table Monitoring and Troubleshooting System on Cloud
hbaseconasia2019 HBase Table Monitoring and Troubleshooting System on Cloudhbaseconasia2019 HBase Table Monitoring and Troubleshooting System on Cloud
hbaseconasia2019 HBase Table Monitoring and Troubleshooting System on CloudMichael Stack
 
hbaseconasia2019 Recent work on HBase at Pinterest
hbaseconasia2019 Recent work on HBase at Pinteresthbaseconasia2019 Recent work on HBase at Pinterest
hbaseconasia2019 Recent work on HBase at PinterestMichael Stack
 
hbaseconasia2019 Phoenix Practice in China Life Insurance Co., Ltd
hbaseconasia2019 Phoenix Practice in China Life Insurance Co., Ltdhbaseconasia2019 Phoenix Practice in China Life Insurance Co., Ltd
hbaseconasia2019 Phoenix Practice in China Life Insurance Co., LtdMichael Stack
 
hbaseconasia2019 HBase at Didi
hbaseconasia2019 HBase at Didihbaseconasia2019 HBase at Didi
hbaseconasia2019 HBase at DidiMichael Stack
 
hbaseconasia2019 The Practice in trillion-level Video Storage and billion-lev...
hbaseconasia2019 The Practice in trillion-level Video Storage and billion-lev...hbaseconasia2019 The Practice in trillion-level Video Storage and billion-lev...
hbaseconasia2019 The Practice in trillion-level Video Storage and billion-lev...Michael Stack
 
hbaseconasia2019 HBase at Tencent
hbaseconasia2019 HBase at Tencenthbaseconasia2019 HBase at Tencent
hbaseconasia2019 HBase at TencentMichael Stack
 
hbaseconasia2019 Spatio temporal Data Management based on Ali-HBase Ganos and...
hbaseconasia2019 Spatio temporal Data Management based on Ali-HBase Ganos and...hbaseconasia2019 Spatio temporal Data Management based on Ali-HBase Ganos and...
hbaseconasia2019 Spatio temporal Data Management based on Ali-HBase Ganos and...Michael Stack
 
hbaseconasia2019 Bridging the Gap between Big Data System Software Stack and ...
hbaseconasia2019 Bridging the Gap between Big Data System Software Stack and ...hbaseconasia2019 Bridging the Gap between Big Data System Software Stack and ...
hbaseconasia2019 Bridging the Gap between Big Data System Software Stack and ...Michael Stack
 
hbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibaba
hbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibabahbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibaba
hbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at AlibabaMichael Stack
 
hbaseconasia2019 OpenTSDB at Xiaomi
hbaseconasia2019 OpenTSDB at Xiaomihbaseconasia2019 OpenTSDB at Xiaomi
hbaseconasia2019 OpenTSDB at XiaomiMichael Stack
 
hbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and Spark
hbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and Sparkhbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and Spark
hbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and SparkMichael Stack
 
hbaseconasia2019 Test-suite for Automating Data-consistency checks on HBase
hbaseconasia2019 Test-suite for Automating Data-consistency checks on HBasehbaseconasia2019 Test-suite for Automating Data-consistency checks on HBase
hbaseconasia2019 Test-suite for Automating Data-consistency checks on HBaseMichael Stack
 
hbaseconasia2019 Distributed Bitmap Index Solution
hbaseconasia2019 Distributed Bitmap Index Solutionhbaseconasia2019 Distributed Bitmap Index Solution
hbaseconasia2019 Distributed Bitmap Index SolutionMichael Stack
 
hbaseconasia2019 HBase Bucket Cache on Persistent Memory
hbaseconasia2019 HBase Bucket Cache on Persistent Memoryhbaseconasia2019 HBase Bucket Cache on Persistent Memory
hbaseconasia2019 HBase Bucket Cache on Persistent MemoryMichael Stack
 
hbaseconasia2019 The Procedure v2 Implementation of WAL Splitting and ACL
hbaseconasia2019 The Procedure v2 Implementation of WAL Splitting and ACLhbaseconasia2019 The Procedure v2 Implementation of WAL Splitting and ACL
hbaseconasia2019 The Procedure v2 Implementation of WAL Splitting and ACLMichael Stack
 
hbaseconasia2019 BDS: A data synchronization platform for HBase
hbaseconasia2019 BDS: A data synchronization platform for HBasehbaseconasia2019 BDS: A data synchronization platform for HBase
hbaseconasia2019 BDS: A data synchronization platform for HBaseMichael Stack
 
hbaseconasia2019 Further GC optimization for HBase 2.x: Reading HFileBlock in...
hbaseconasia2019 Further GC optimization for HBase 2.x: Reading HFileBlock in...hbaseconasia2019 Further GC optimization for HBase 2.x: Reading HFileBlock in...
hbaseconasia2019 Further GC optimization for HBase 2.x: Reading HFileBlock in...Michael Stack
 
hbaseconasia2019 HBCK2: Concepts, trends, and recipes for fixing issues in HB...
hbaseconasia2019 HBCK2: Concepts, trends, and recipes for fixing issues in HB...hbaseconasia2019 HBCK2: Concepts, trends, and recipes for fixing issues in HB...
hbaseconasia2019 HBCK2: Concepts, trends, and recipes for fixing issues in HB...Michael Stack
 
HBaseConAsia2019 Keynote
HBaseConAsia2019 KeynoteHBaseConAsia2019 Keynote
HBaseConAsia2019 KeynoteMichael Stack
 
HBaseConAsia2018 Track3-1: Serving billions of queries in millisecond latencies
HBaseConAsia2018 Track3-1: Serving billions of queries in millisecond latenciesHBaseConAsia2018 Track3-1: Serving billions of queries in millisecond latencies
HBaseConAsia2018 Track3-1: Serving billions of queries in millisecond latenciesMichael Stack
 

More from Michael Stack (20)

hbaseconasia2019 HBase Table Monitoring and Troubleshooting System on Cloud
hbaseconasia2019 HBase Table Monitoring and Troubleshooting System on Cloudhbaseconasia2019 HBase Table Monitoring and Troubleshooting System on Cloud
hbaseconasia2019 HBase Table Monitoring and Troubleshooting System on Cloud
 
hbaseconasia2019 Recent work on HBase at Pinterest
hbaseconasia2019 Recent work on HBase at Pinteresthbaseconasia2019 Recent work on HBase at Pinterest
hbaseconasia2019 Recent work on HBase at Pinterest
 
hbaseconasia2019 Phoenix Practice in China Life Insurance Co., Ltd
hbaseconasia2019 Phoenix Practice in China Life Insurance Co., Ltdhbaseconasia2019 Phoenix Practice in China Life Insurance Co., Ltd
hbaseconasia2019 Phoenix Practice in China Life Insurance Co., Ltd
 
hbaseconasia2019 HBase at Didi
hbaseconasia2019 HBase at Didihbaseconasia2019 HBase at Didi
hbaseconasia2019 HBase at Didi
 
hbaseconasia2019 The Practice in trillion-level Video Storage and billion-lev...
hbaseconasia2019 The Practice in trillion-level Video Storage and billion-lev...hbaseconasia2019 The Practice in trillion-level Video Storage and billion-lev...
hbaseconasia2019 The Practice in trillion-level Video Storage and billion-lev...
 
hbaseconasia2019 HBase at Tencent
hbaseconasia2019 HBase at Tencenthbaseconasia2019 HBase at Tencent
hbaseconasia2019 HBase at Tencent
 
hbaseconasia2019 Spatio temporal Data Management based on Ali-HBase Ganos and...
hbaseconasia2019 Spatio temporal Data Management based on Ali-HBase Ganos and...hbaseconasia2019 Spatio temporal Data Management based on Ali-HBase Ganos and...
hbaseconasia2019 Spatio temporal Data Management based on Ali-HBase Ganos and...
 
hbaseconasia2019 Bridging the Gap between Big Data System Software Stack and ...
hbaseconasia2019 Bridging the Gap between Big Data System Software Stack and ...hbaseconasia2019 Bridging the Gap between Big Data System Software Stack and ...
hbaseconasia2019 Bridging the Gap between Big Data System Software Stack and ...
 
hbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibaba
hbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibabahbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibaba
hbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibaba
 
hbaseconasia2019 OpenTSDB at Xiaomi
hbaseconasia2019 OpenTSDB at Xiaomihbaseconasia2019 OpenTSDB at Xiaomi
hbaseconasia2019 OpenTSDB at Xiaomi
 
hbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and Spark
hbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and Sparkhbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and Spark
hbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and Spark
 
hbaseconasia2019 Test-suite for Automating Data-consistency checks on HBase
hbaseconasia2019 Test-suite for Automating Data-consistency checks on HBasehbaseconasia2019 Test-suite for Automating Data-consistency checks on HBase
hbaseconasia2019 Test-suite for Automating Data-consistency checks on HBase
 
hbaseconasia2019 Distributed Bitmap Index Solution
hbaseconasia2019 Distributed Bitmap Index Solutionhbaseconasia2019 Distributed Bitmap Index Solution
hbaseconasia2019 Distributed Bitmap Index Solution
 
hbaseconasia2019 HBase Bucket Cache on Persistent Memory
hbaseconasia2019 HBase Bucket Cache on Persistent Memoryhbaseconasia2019 HBase Bucket Cache on Persistent Memory
hbaseconasia2019 HBase Bucket Cache on Persistent Memory
 
hbaseconasia2019 The Procedure v2 Implementation of WAL Splitting and ACL
hbaseconasia2019 The Procedure v2 Implementation of WAL Splitting and ACLhbaseconasia2019 The Procedure v2 Implementation of WAL Splitting and ACL
hbaseconasia2019 The Procedure v2 Implementation of WAL Splitting and ACL
 
hbaseconasia2019 BDS: A data synchronization platform for HBase
hbaseconasia2019 BDS: A data synchronization platform for HBasehbaseconasia2019 BDS: A data synchronization platform for HBase
hbaseconasia2019 BDS: A data synchronization platform for HBase
 
hbaseconasia2019 Further GC optimization for HBase 2.x: Reading HFileBlock in...
hbaseconasia2019 Further GC optimization for HBase 2.x: Reading HFileBlock in...hbaseconasia2019 Further GC optimization for HBase 2.x: Reading HFileBlock in...
hbaseconasia2019 Further GC optimization for HBase 2.x: Reading HFileBlock in...
 
hbaseconasia2019 HBCK2: Concepts, trends, and recipes for fixing issues in HB...
hbaseconasia2019 HBCK2: Concepts, trends, and recipes for fixing issues in HB...hbaseconasia2019 HBCK2: Concepts, trends, and recipes for fixing issues in HB...
hbaseconasia2019 HBCK2: Concepts, trends, and recipes for fixing issues in HB...
 
HBaseConAsia2019 Keynote
HBaseConAsia2019 KeynoteHBaseConAsia2019 Keynote
HBaseConAsia2019 Keynote
 
HBaseConAsia2018 Track3-1: Serving billions of queries in millisecond latencies
HBaseConAsia2018 Track3-1: Serving billions of queries in millisecond latenciesHBaseConAsia2018 Track3-1: Serving billions of queries in millisecond latencies
HBaseConAsia2018 Track3-1: Serving billions of queries in millisecond latencies
 

Recently uploaded

audience research (emma) 1.pptxkkkkkkkkkkkkkkkkk
audience research (emma) 1.pptxkkkkkkkkkkkkkkkkkaudience research (emma) 1.pptxkkkkkkkkkkkkkkkkk
audience research (emma) 1.pptxkkkkkkkkkkkkkkkkklolsDocherty
 
Statistical Analysis of DNS Latencies.pdf
Statistical Analysis of DNS Latencies.pdfStatistical Analysis of DNS Latencies.pdf
Statistical Analysis of DNS Latencies.pdfOndejSur
 
Pvtaan Social media marketing proposal.pdf
Pvtaan Social media marketing proposal.pdfPvtaan Social media marketing proposal.pdf
Pvtaan Social media marketing proposal.pdfPvtaan
 
The Use of AI in Indonesia Election 2024: A Case Study
The Use of AI in Indonesia Election 2024: A Case StudyThe Use of AI in Indonesia Election 2024: A Case Study
The Use of AI in Indonesia Election 2024: A Case StudyDamar Juniarto
 
I’ll See Y’All Motherfuckers In Game 7 Shirt
I’ll See Y’All Motherfuckers In Game 7 ShirtI’ll See Y’All Motherfuckers In Game 7 Shirt
I’ll See Y’All Motherfuckers In Game 7 Shirtrahman018755
 
Reggie miller choke t shirtsReggie miller choke t shirts
Reggie miller choke t shirtsReggie miller choke t shirtsReggie miller choke t shirtsReggie miller choke t shirts
Reggie miller choke t shirtsReggie miller choke t shirtsrahman018755
 
Thank You Luv I’ll Never Walk Alone Again T shirts
Thank You Luv I’ll Never Walk Alone Again T shirtsThank You Luv I’ll Never Walk Alone Again T shirts
Thank You Luv I’ll Never Walk Alone Again T shirtsrahman018755
 
iThome_CYBERSEC2024_Drive_Into_the_DarkWeb
iThome_CYBERSEC2024_Drive_Into_the_DarkWebiThome_CYBERSEC2024_Drive_Into_the_DarkWeb
iThome_CYBERSEC2024_Drive_Into_the_DarkWebJie Liau
 
TORTOGEL TELAH MENJADI SALAH SATU PLATFORM PERMAINAN PALING FAVORIT.
TORTOGEL TELAH MENJADI SALAH SATU PLATFORM PERMAINAN PALING FAVORIT.TORTOGEL TELAH MENJADI SALAH SATU PLATFORM PERMAINAN PALING FAVORIT.
TORTOGEL TELAH MENJADI SALAH SATU PLATFORM PERMAINAN PALING FAVORIT.Tortogel
 
Topology of the Network class 8 .ppt pdf
Topology of the Network class 8 .ppt pdfTopology of the Network class 8 .ppt pdf
Topology of the Network class 8 .ppt pdfAnushkaTripathi61
 
Development Lifecycle.pptx for the secure development of apps
Development Lifecycle.pptx for the secure development of appsDevelopment Lifecycle.pptx for the secure development of apps
Development Lifecycle.pptx for the secure development of appscristianmanaila2
 
Production 2024 sunderland culture final - Copy.pptx
Production 2024 sunderland culture final - Copy.pptxProduction 2024 sunderland culture final - Copy.pptx
Production 2024 sunderland culture final - Copy.pptxChloeMeadows1
 
Bug Bounty Blueprint : A Beginner's Guide
Bug Bounty Blueprint : A Beginner's GuideBug Bounty Blueprint : A Beginner's Guide
Bug Bounty Blueprint : A Beginner's GuideVarun Mithran
 
How Do I Begin the Linksys Velop Setup Process?
How Do I Begin the Linksys Velop Setup Process?How Do I Begin the Linksys Velop Setup Process?
How Do I Begin the Linksys Velop Setup Process?Linksys Velop Login
 
Cyber Security Services Unveiled: Strategies to Secure Your Digital Presence
Cyber Security Services Unveiled: Strategies to Secure Your Digital PresenceCyber Security Services Unveiled: Strategies to Secure Your Digital Presence
Cyber Security Services Unveiled: Strategies to Secure Your Digital PresencePC Doctors NET
 
Premier Mobile App Development Agency in USA.pdf
Premier Mobile App Development Agency in USA.pdfPremier Mobile App Development Agency in USA.pdf
Premier Mobile App Development Agency in USA.pdfappinfoedgeca
 

Recently uploaded (16)

audience research (emma) 1.pptxkkkkkkkkkkkkkkkkk
audience research (emma) 1.pptxkkkkkkkkkkkkkkkkkaudience research (emma) 1.pptxkkkkkkkkkkkkkkkkk
audience research (emma) 1.pptxkkkkkkkkkkkkkkkkk
 
Statistical Analysis of DNS Latencies.pdf
Statistical Analysis of DNS Latencies.pdfStatistical Analysis of DNS Latencies.pdf
Statistical Analysis of DNS Latencies.pdf
 
Pvtaan Social media marketing proposal.pdf
Pvtaan Social media marketing proposal.pdfPvtaan Social media marketing proposal.pdf
Pvtaan Social media marketing proposal.pdf
 
The Use of AI in Indonesia Election 2024: A Case Study
The Use of AI in Indonesia Election 2024: A Case StudyThe Use of AI in Indonesia Election 2024: A Case Study
The Use of AI in Indonesia Election 2024: A Case Study
 
I’ll See Y’All Motherfuckers In Game 7 Shirt
I’ll See Y’All Motherfuckers In Game 7 ShirtI’ll See Y’All Motherfuckers In Game 7 Shirt
I’ll See Y’All Motherfuckers In Game 7 Shirt
 
Reggie miller choke t shirtsReggie miller choke t shirts
Reggie miller choke t shirtsReggie miller choke t shirtsReggie miller choke t shirtsReggie miller choke t shirts
Reggie miller choke t shirtsReggie miller choke t shirts
 
Thank You Luv I’ll Never Walk Alone Again T shirts
Thank You Luv I’ll Never Walk Alone Again T shirtsThank You Luv I’ll Never Walk Alone Again T shirts
Thank You Luv I’ll Never Walk Alone Again T shirts
 
iThome_CYBERSEC2024_Drive_Into_the_DarkWeb
iThome_CYBERSEC2024_Drive_Into_the_DarkWebiThome_CYBERSEC2024_Drive_Into_the_DarkWeb
iThome_CYBERSEC2024_Drive_Into_the_DarkWeb
 
TORTOGEL TELAH MENJADI SALAH SATU PLATFORM PERMAINAN PALING FAVORIT.
TORTOGEL TELAH MENJADI SALAH SATU PLATFORM PERMAINAN PALING FAVORIT.TORTOGEL TELAH MENJADI SALAH SATU PLATFORM PERMAINAN PALING FAVORIT.
TORTOGEL TELAH MENJADI SALAH SATU PLATFORM PERMAINAN PALING FAVORIT.
 
Topology of the Network class 8 .ppt pdf
Topology of the Network class 8 .ppt pdfTopology of the Network class 8 .ppt pdf
Topology of the Network class 8 .ppt pdf
 
Development Lifecycle.pptx for the secure development of apps
Development Lifecycle.pptx for the secure development of appsDevelopment Lifecycle.pptx for the secure development of apps
Development Lifecycle.pptx for the secure development of apps
 
Production 2024 sunderland culture final - Copy.pptx
Production 2024 sunderland culture final - Copy.pptxProduction 2024 sunderland culture final - Copy.pptx
Production 2024 sunderland culture final - Copy.pptx
 
Bug Bounty Blueprint : A Beginner's Guide
Bug Bounty Blueprint : A Beginner's GuideBug Bounty Blueprint : A Beginner's Guide
Bug Bounty Blueprint : A Beginner's Guide
 
How Do I Begin the Linksys Velop Setup Process?
How Do I Begin the Linksys Velop Setup Process?How Do I Begin the Linksys Velop Setup Process?
How Do I Begin the Linksys Velop Setup Process?
 
Cyber Security Services Unveiled: Strategies to Secure Your Digital Presence
Cyber Security Services Unveiled: Strategies to Secure Your Digital PresenceCyber Security Services Unveiled: Strategies to Secure Your Digital Presence
Cyber Security Services Unveiled: Strategies to Secure Your Digital Presence
 
Premier Mobile App Development Agency in USA.pdf
Premier Mobile App Development Agency in USA.pdfPremier Mobile App Development Agency in USA.pdf
Premier Mobile App Development Agency in USA.pdf
 

hbaseconasia2019 Pharos as a Pluggable Secondary Index Component

  • 1.
  • 2. Pharos as a Pluggable Secondary Index Component Lei Wang China Everbright Bank Enterprise Architect
  • 3. Introduce Lei Wang 王 磊 光大银行科技部领域架构师,曾任职于IBM全球咨询服务部从事技术咨询工作,具有十余年数据领域 研发及咨询经验。目前负责全行数据领域系统的日常架构设计、评审及内部研发等工作,对分布式数据库、 Hadoop等基础架构研究有浓厚兴趣。
  • 4. I Overall Introduction II Architecture Design III Future Development
  • 6. Research Background Existing Products or Solutions 1. Native Filter,Low performance 2. HBase +Solr(ES) , Complex architecture / Performance is not good enough / High maintenance cost 3. Phoenix,Heavy Solution / Community inactivity / Imperfect function 4. HBase On Cloud,Unable to use because of security requirements Our requirements 1. Non-Invasive 2. High performance 3. Universal 4. Simple architecture 5. Support transaction consistency
  • 7. Pharos 1. Name comes from English word ‘pharos’ 2. Business Scenarios •Read Only , T+1 Batch Load •Read and Less Write ( Experimental ) 3. Design Principle Non-Invasive, Simple architecture
  • 8. Research Process Startup V0.1 Release V0.22 Release V0.3 Single Index Multi Condition Multi Data Type V0.2 Release Multi Index Sorting Paging Cache Index Builder Improvement Refactoring Code Bitmap Index CBO Improvement More Complex Conditions Todo… Transaction Consistency Index 2018.4 2018.11 2019.3 2019.7 2019.11
  • 9. Pharos V0.22 Features (on HBase 1.2.6 or CDH 5.8.3-HBase 1.2.0) 1. Single Index(single column、multi column),Multi Index. 2. Paging, Sorting. 3. Multi Condition Query,including equal, less (equal), greater (equal). 4. AND / OR logic operation. 5. Multiple Data Type, including Char, Date ,Double and so on. 6. Simple Function Compute, for example record count. 7. Batch Index Creation.
  • 11. Coprocessor Client 1. Define conditions 2. Set conditions in the scan Server 1. Intercept and parse scan 2. Scan Index 3. Set matched index to filter
  • 12. Client Code Example 1. Keep the original HBase style 2. Avoiding the complexity of SQL parser
  • 14. Global Index VS Partition(Local)Index Global Index • Support unique index • Index creating and updating will be part of distributed transactions, performance is not good. • Query will cross different nodes, so performance may not be good Partition Index • Index and data are co-distribution ,so queries can be pushed down to each node. We can get good performance. • Avoiding distributed transaction • Not support unique index or other global constraints.
  • 15. Storage Policy Shadow Column Family Index and data are exist the same region but in different column family. We just need to control the generating logic of the index start rowkey. It is un-invasive. Single Index Table Region is the smallest unit that is balanced. We must guarantee that an index region is distributed with the corresponding data region. So we must the modify the balancer . It is invasive.
  • 16. Index Key 1. Start key, keep index co-distribution with data 2. Index name / number 3. Indexed column value 4. Reference data row key Index Value 1. Version info 2. Metadata for deserialization 3. Transaction flag Index Data Structure
  • 17. Client as Global Coordinator, Keep Arch Simple Two-Phase Sorting 1. Region Side Sorting Base on natural sequence of index 2. Client Side Sorting Merge results from regions, If data skew occurs, query again. Sorting Mechanisms
  • 18. Reason 1. Paging is a universal requirement. 2. Always, the matched index is greater than the memory. Design Strategy Adding global session, shielding internal complexity. One session breakpoint mapping multi region’s breakpoint. Implement 1.Assuming that the data is evenly distributed, the page size is spread to each region. 2.Region side, we control return indexes number and cache the breakpoint. 3.Client side, merge result, if not enough then query again . Paging Mechanisms
  • 19. Choose local cache instead of distributed cache Keep Arch Simple Client side cache + Server side cache Cache Mechanisms
  • 20. Avoid Rebuilding index due to Region Splitting. For different rowkey design, Bulkload may lead to region splitting. After data loading, stable regions can be obtained. So we can create indexes and keep co-distribution with the reference data. Index Builder
  • 23. Transaction Consistency Inspire by Google’s Percolator Bob -> Joe $7 Deformation of 2 Phase Commit The state of the primary data is the state of the transaction. In the write process, only modify the primary state. In the subsequent query, the state of second data can be modified asynchronously based on the primary. Step 4 -> Transaction complete
  • 24. Transaction Consistency In the write process, we can complete majority consistency, but not all; In the read process, we can confirm transaction state by data row state, then update index state.
  • 25. Bitmap Index CBO Improvement Integration with SQL Engine(Presto?) Other features