What Is Apache Tephra?
● Provides transactions for HBase and Phoenix
● Apache incubating project
● Uses HBase's native data versioning to provide multi-version concurrency control (MVCC) for transactional reads and writes
● Provides snapshot isolation of concurrent transactions
● Open source / Apache 2.0 license
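The snapshot isolation idea above can be illustrated with a toy model. This is a minimal sketch, not Tephra's code: the Snapshot and VersionedCell names are hypothetical, and the only assumption is what the slide states, that every write is kept as a separate version and a reader sees the newest version from a transaction that had already committed when the reader started.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical view handed to a transaction when it starts."""
    read_pointer: int                  # highest transaction ID this reader may see
    excluded: frozenset = frozenset()  # IDs <= read_pointer that were in progress or invalid


@dataclass
class VersionedCell:
    """One cell that keeps every write, keyed by the writer's transaction ID."""
    versions: dict = field(default_factory=dict)

    def put(self, tx_id, value):
        self.versions[tx_id] = value

    def get(self, snapshot):
        # Walk versions from newest to oldest and return the first visible one.
        for tx_id in sorted(self.versions, reverse=True):
            if tx_id <= snapshot.read_pointer and tx_id not in snapshot.excluded:
                return self.versions[tx_id]
        return None


cell = VersionedCell()
cell.put(10, "a")  # written by a committed transaction
cell.put(20, "b")  # written by a transaction that is still in progress
cell.put(30, "c")  # written after the reader started

reader = Snapshot(read_pointer=25, excluded=frozenset({20}))
print(cell.get(reader))
```

In this toy run the reader skips version 30 (too new) and version 20 (excluded), so it reads the value written by transaction 10.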
Tephra Architecture
● Tephra has three main components
● Transaction Server
– Maintains global view of transaction state
– Assigns new transaction IDs
– Performs conflict detection
● Transaction Client
– Coordinates the start, commit, and rollback of transactions
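The division of work between server and client can be sketched with a toy example of optimistic conflict detection. Everything here is an assumption made for illustration: ToyTransactionManager and its methods are hypothetical and are not Tephra's API. The rule shown is the common one for snapshot isolation, where a commit fails if another transaction that committed after this one started changed any of the same keys.

```python
class ToyTransactionManager:
    """Hypothetical stand-in for a transaction server; not Tephra's API."""

    def __init__(self):
        self.next_id = 1
        self.commit_seq = 0
        self.in_progress = {}  # tx_id -> commit sequence number seen at start
        self.committed = []    # (commit sequence number, frozenset of changed keys)

    def start(self):
        # Assign a new transaction ID and remember what was committed so far.
        tx_id = self.next_id
        self.next_id += 1
        self.in_progress[tx_id] = self.commit_seq
        return tx_id

    def commit(self, tx_id, changed_keys):
        started_at = self.in_progress.pop(tx_id)
        changed = frozenset(changed_keys)
        # Conflict: someone who committed after we started wrote one of our keys.
        for seq, keys in self.committed:
            if seq > started_at and keys & changed:
                return False  # the caller is expected to roll back its writes
        self.commit_seq += 1
        self.committed.append((self.commit_seq, changed))
        return True

    def rollback(self, tx_id):
        self.in_progress.pop(tx_id, None)


tm = ToyTransactionManager()
t1 = tm.start()
t2 = tm.start()
first = tm.commit(t1, {"row1"})
second = tm.commit(t2, {"row1"})  # overlaps with t1, which committed after t2 started
print(first, second)
```

The second commit is rejected because both transactions changed "row1" and the first one committed while the second was still open.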
Tephra Architecture
● Tephra has three main components
● TransactionProcessor Coprocessor
– Applies filtering to the data read (based on a given transaction's state)
– Cleans up any data from old (no longer visible) transactions
● Multiple transaction server instances can run concurrently
– Allows for automatic failover
– Only one server instance actively serves requests at a time
– Configured by ZooKeeper
Tephra Phoenix
● Tephra is an incubating Apache project
● Phoenix uses Tephra for transaction support
● So this functionality is in a beta stage
● It gives cross-row and cross-table transaction support with full ACID semantics
● Remember that Phoenix uses HBase as its backing store
● The next slides show the configuration
Phoenix Architecture (Reminder)
Tephra Phoenix Config
● To enable transactions, add the following config to your client-side hbase-site.xml file:
<property>
  <name>phoenix.transactions.enabled</name>
  <value>true</value>
</property>
Tephra Phoenix Config
● To configure the transaction manager, add the following config to your server-side hbase-site.xml file:
<property>
  <name>data.tx.snapshot.dir</name>
  <value>/tmp/tephra/snapshots</value>
</property>
Tephra Phoenix Config
● To set the transaction timeout, add the following config to your server-side hbase-site.xml file:
<property>
  <name>data.tx.timeout</name>
  <value>60</value>
</property>
● Then you can start Tephra on Phoenix
./bin/tephra
Tephra Requirements
Component   Source          Version
Java                        1.7.xx / 1.8.xx
HDFS        Apache Hadoop   2.0.2-alpha - 2.7.x
            CDH or HDP      (CDH) 5.0.0 - 5.12.0 / (HDP) 2.0 - 2.6
            MapR            4.1 - 5.1 (with MapR-FS)
HBase       Apache          0.96.x, 0.98.x, 1.0.x, 1.1.x, 1.2.x, 1.3.x (except 1.1.5 and 1.2.2) and 2.0.x
            CDH or HDP      (CDH) 5.0.0 - 5.12.0 / (HDP) 2.0 - 2.6
            MapR            4.1 - 5.1 (with Apache HBase)
ZooKeeper   Apache          3.4.3 - 3.4.5
            CDH or HDP      (CDH) 5.0.0 - 5.12.0 / (HDP) 2.0 - 2.6
            MapR            4.1 - 5.1
Tephra Transaction Server Config
● Add changes to hbase-site.xml
Name                        Default         Description
data.tx.bind.port           15165           Port to bind to
data.tx.bind.address        0.0.0.0         Server address to listen on
data.tx.server.io.threads   2               Number of threads for socket IO
data.tx.server.threads      20              Number of handler threads
data.tx.timeout             30              Timeout for a transaction to complete (seconds)
data.tx.long.timeout        86400           Timeout for a long-running transaction to complete (seconds)
data.tx.cleanup.interval    10              Frequency to check for timed-out transactions (seconds)
data.tx.snapshot.dir        (none listed)   HDFS directory used to store snapshots
data.tx.snapshot.interval   300             Frequency to write new snapshots
data.tx.snapshot.retain     10              Number of old transaction snapshots to retain
data.tx.metrics.period      60              Frequency for metrics reporting
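Writing this many properties by hand is error prone, so the settings listed above can be rendered into hbase-site.xml form with a short script. This is a minimal sketch using only the Python standard library. The render_hbase_site helper is a hypothetical name, and the snapshot directory is the example path from the earlier Phoenix config slide, not a default. A real hbase-site.xml usually holds other properties too, so treat the output as a fragment to merge rather than a file to overwrite.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Values copied from the listing above; the snapshot directory is an example path.
SERVER_SETTINGS = {
    "data.tx.bind.port": "15165",
    "data.tx.bind.address": "0.0.0.0",
    "data.tx.server.io.threads": "2",
    "data.tx.server.threads": "20",
    "data.tx.timeout": "30",
    "data.tx.long.timeout": "86400",
    "data.tx.cleanup.interval": "10",
    "data.tx.snapshot.dir": "/tmp/tephra/snapshots",
    "data.tx.snapshot.interval": "300",
    "data.tx.snapshot.retain": "10",
    "data.tx.metrics.period": "60",
}


def render_hbase_site(settings):
    """Return a <configuration> document with one <property> per setting."""
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    raw = ET.tostring(root, encoding="unicode")
    # minidom is used only to pretty-print the result.
    return minidom.parseString(raw).toprettyxml(indent="  ")


print(render_hbase_site(SERVER_SETTINGS))
```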
Tephra Transaction Client Config
● Add changes to hbase-site.xml
Name                                   Default   Description
data.tx.client.timeout                 30000     Client socket timeout (milliseconds)
data.tx.client.provider                pool      Client provider strategy: "pool" uses a pool of clients, "thread-local" uses a client per thread
data.tx.client.count                   50        Max number of clients for the "pool" provider
data.tx.client.obtain.timeout          3000      Timeout for obtaining a client from the "pool" provider (ms)
data.tx.client.retry.strategy          backoff   Client retry strategy ("backoff" or "n-times")
data.tx.client.retry.attempts          2         Number of times to retry ("n-times" strategy)
data.tx.client.retry.backoff.initial   100       Initial sleep time ("backoff" strategy)
data.tx.client.retry.backoff.factor    4         Multiplication factor for sleep time
data.tx.client.retry.backoff.limit     30000     Exit when sleep time reaches this limit
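To see what the backoff settings above imply, the sleep times can be worked out with a few lines of arithmetic. This is a sketch under stated assumptions, not Tephra's retry code: backoff_schedule is a hypothetical helper, the sleep values are assumed to be milliseconds like the other client timeouts, and the stopping rule assumed here is that retries end once the next sleep would be at or above the limit.

```python
def backoff_schedule(initial_ms=100, factor=4, limit_ms=30000):
    """Return the successive sleep times until the next one would reach the limit."""
    sleeps = []
    sleep = initial_ms
    while sleep < limit_ms:  # assumed stopping rule: give up at or above the limit
        sleeps.append(sleep)
        sleep *= factor
    return sleeps


schedule = backoff_schedule()
print(schedule)       # [100, 400, 1600, 6400, 25600]
print(sum(schedule))  # 34100
```

Under these assumptions the listed values allow five retries spread over roughly 34 seconds before the client gives up.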
Tephra HBase Coprocessor Configuration
● Tephra requires an HBase coprocessor to be installed on all tables where transactional reads and writes will be performed
● Add this change to hbase-site.xml:
<property>
  <name>hbase.coprocessor.region.classes</name>
  <value>org.apache.tephra.hbase.coprocessor.TransactionProcessor</value>
</property>
● Use the Tephra binary to start the server once configured
./bin/tephra start
Available Books
● See “Big Data Made Easy”
– Apress Jan 2015
● See “Mastering Apache Spark”
– Packt Oct 2015
● See “Complete Guide to Open Source Big Data Stack”
– Apress Jan 2018
● Find the author on Amazon
– www.amazon.com/Michael-Frampton/e/B00NIQDOOM/
● Connect on LinkedIn
– www.linkedin.com/in/mike-frampton-38563020
Connect
● Feel free to connect on LinkedIn
– www.linkedin.com/in/mike-frampton-38563020
● See my open source blog at
– open-source-systems.blogspot.com/
● I am always interested in
– New technology
– Opportunities
– Technology based issues
– Big data integration
