Terark Product and Technology

Terark
Make Data Smaller and Access Faster

Terark has built the fastest storage engine with the best compression.
It is compatible with MySQL, MongoDB and RocksDB, and makes
random reads 200X faster and storage 10X smaller. It is built for
general-purpose use and optimized for read-heavy scenarios, giving
big data applications greater scalability at lower cost.
Brief Introduction
Terark Confidential

Y Combinator is the world's leading startup incubator (total valuation of portfolio
companies: $100+ billion). Its best-known companies are Airbnb and Dropbox.
We Are a YC Company

Paying Customers
Terark technology helps Cloud, Big Data and Internet companies
achieve better performance at lower cost.
E-Commerce Giant around the Globe
Terark technology supports its business growth through
Alibaba Cloud.

Proven Results
Terark Compression
- TPC-H Dataset: 550G
- Terark: 47G
TCO (on the same data size), Hardware & Ops Cost
- Others (6 servers): $30,000
- Terark (1 server): $5,000

Use Terark’s CO-Index and PA-Zip to implement RocksDB’s SSTable.
• Much better compression
• Much better random read performance
• Terark trades off compression speed for high compression ratio and performance
• Use universal compaction to minimize write amplification
TerarkDB: Compatible with RocksDB

Strong Compression ( > 10:1 compression ratio)
- Increase Data Capacity
- Increase Memory Utilization, Reduce Disk I/O
- Save Data Infrastructure Cost
Extreme Performance (QPS 15~500X that of Competitors)
- Lower Latency, Higher Throughput and Concurrency
Simple DevOps & HA
- Leverage the MySQL & MongoDB ecosystem
- Support proven DevOps tools
- HA based on MySQL and MongoDB
MySQL on TerarkDB, Mongo on TerarkDB

Core Technology
● CO-Index (Compressed Ordered Index)
Direct search on a highly compressed index
● PA-Zip (Point Accessible Zip)
Direct point access to a single datum in a globally compressed dataset
Our breakthrough technology is protected by 5 patents in the US, China and worldwide.

Thanks
Sean Fu
Mobile & WeChat: (+86) 13911734987
E: xinyuan@terark.com

Appendix 1: TCO & ROI Details
Product | Hardware Cost (1 server ~ $5,000 a year, with reference to AWS) | Operational Cost (~20% of the hardware cost)
Terark | $5,000 | $1,000
Other Product | $30,000 | $6,000

Appendix 1: TCO & ROI Details
Year(s) | Cost Savings | Estimated Revenue Lift due to Performance/Scalability Improvement (~20% of Cost Savings)
1 | $30,000 | $6,000
3 | $90,000 | $18,000
5 | $150,000 | $30,000
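The figures in the two Appendix 1 tables can be reproduced with a few lines of arithmetic. The sketch below assumes the inputs stated on the slides (1 Terark server versus 6 servers for the other product, about $5,000 per server per year, operational cost at about 20% of hardware cost, revenue lift at about 20% of cost savings). The function and constant names are illustrative only.

```python
# Assumptions taken from the slides; names are hypothetical.
SERVER_COST_PER_YEAR = 5_000   # USD per server per year
OPS_PERCENT = 20               # operational cost as % of hardware cost
REV_LIFT_PERCENT = 20          # estimated revenue lift as % of cost savings


def yearly_tco(servers: int) -> int:
    """Hardware plus operational cost for one year, in USD."""
    hardware = servers * SERVER_COST_PER_YEAR
    ops = hardware * OPS_PERCENT // 100
    return hardware + ops


def cost_savings(years: int, other_servers: int = 6, terark_servers: int = 1) -> int:
    """Cumulative TCO difference between the two deployments."""
    return years * (yearly_tco(other_servers) - yearly_tco(terark_servers))


def revenue_lift(years: int) -> int:
    """Estimated revenue lift, modeled as a fixed share of the cost savings."""
    return cost_savings(years) * REV_LIFT_PERCENT // 100


if __name__ == "__main__":
    for years in (1, 3, 5):
        print(years, cost_savings(years), revenue_lift(years))
```

Integer arithmetic is used so the results match the table values exactly.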

• CO-Index (Compressed Ordered Index)
Terark Nested Succinct Trie
• PA-Zip (Point Accessible Zip)
Global compression, point access
Appendix 2: Core Technology Detail

Feature | Hash | B+Tree | Terark Nested Succinct Trie
Compression | None | OK | ✔✔✔ Excellent
Searching | ✔✔ Very Fast | OK | ✔ Fast
Exact Searching | ✔ Supported | ✔ Supported | ✔ Supported
Range Searching | Not Supported | ✔ Supported | ✔ Supported
Prefix Searching | Not Supported | ✔ Supported | ✔ Supported
Regex Searching | Not Supported | Not Supported | ✔ Supported
Reverse Searching (id to key) | Not Supported (can be worked around) | Not Supported | ✔ Supported
Index Comparison

Feature | Block-based: leveldb, rocksdb, wiredtiger… | Short data: Terark Nested Succinct Trie | Long data: Terark Global Compression
Compression ratio | OK | ✔✔✔ Excellent | ✔✔✔ Excellent
Random Read | Slow | ✔ Fast | ✔ Fast
Sequential Read | ✔ Fast | OK | ✔ Fast
Double Cache Problem | YES | NO | NO
Compression Speed | ✔ Fast | Slow | Slow
Data (Value) Compression

A succinct data structure represents data in space close to the theoretical limit. It uses a bitmap to represent the data and
rank/select operations to navigate it.
It can tremendously reduce memory usage, but it is very complex to implement. Terark has its own implementations, which achieve
much better performance than open-source implementations.
Both encodings use about 2 bits per node:
DFUDS (pre-order): 101110000100
- Needs findopen, findclose and enclose, which are much slower than rank/select; rarely used
LOUDS (level-order): 101110010000
- Simple, fast and small:
  Parent(c) = rank0(select1(c))
  Child(p, i) = select0(p) - p + i
CO-Index: Succinct Tree
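The LOUDS navigation formulas above can be tried out on a small tree. The following is a minimal sketch, not Terark's implementation: it assumes a "10" super-root prefix, 1-based node ids in level order, 1-based bit positions, and naive linear-time rank/select (a real succinct structure would answer these in constant time with auxiliary indexes). The example tree and all function names are illustrative.

```python
from collections import deque


def louds_encode(children, root):
    """Encode a tree as LOUDS bits: '10' super-root, then '1'*degree + '0'
    for each node in level order. Returns (bits, level_order_labels)."""
    bits = [1, 0]
    order = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        kids = children.get(node, [])
        bits.extend([1] * len(kids))
        bits.append(0)
        queue.extend(kids)
    return bits, order


def select(bits, bit, k):
    """1-based position of the k-th occurrence of `bit` (naive scan)."""
    seen = 0
    for pos, b in enumerate(bits, start=1):
        if b == bit:
            seen += 1
            if seen == k:
                return pos
    raise ValueError("fewer than k occurrences")


def rank(bits, bit, pos):
    """Number of occurrences of `bit` in positions 1..pos (inclusive)."""
    return sum(1 for b in bits[:pos] if b == bit)


def parent(bits, c):
    """Parent(c) = rank0(select1(c)); returns 0 for the root (super-root)."""
    return rank(bits, 0, select(bits, 1, c))


def child(bits, p, i):
    """Child(p, i) = select0(p) - p + i, with i counted from 1."""
    return select(bits, 0, p) - p + i


def degree(bits, p):
    """Number of children: the 1-bits between the p-th and (p+1)-th 0."""
    return select(bits, 0, p + 1) - select(bits, 0, p) - 1


if __name__ == "__main__":
    # Hypothetical example tree: r has children a, b, c; b has child d.
    tree = {"r": ["a", "b", "c"], "b": ["d"]}
    bits, order = louds_encode(tree, "r")
    print("".join(map(str, bits)), order)
    d_id = order.index("d") + 1
    print("parent of d:", order[parent(bits, d_id) - 1])
```

Node ids here are simply level-order ranks, so `order[id - 1]` maps an id back to the original label.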

Patricia Trie: A Compressed Trie
Path compression: Compress all one-child nodes in a
path into a single node
Nested: Convert the compressed path into a new Trie
Requirement: the trie needs to support "reverse searching",
that is, extracting the key string from a node
CO-Index: Patricia Trie + Nesting
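Path compression and the nesting step can be illustrated with ordinary dictionaries. This is only a sketch of the idea under simplifying assumptions: the trie is a nested `dict` rather than a succinct structure, `"$"` is a hypothetical end-of-key marker assumed not to occur in keys, and the "nesting" step is reduced to collecting the multi-character edge labels that would become the keys of the next-level trie.

```python
def build_trie(keys):
    """Plain character trie as nested dicts; '$' marks the end of a key."""
    root = {}
    for key in keys:
        node = root
        for ch in key:
            node = node.setdefault(ch, {})
        node["$"] = {}
    return root


def compress(node):
    """Path compression: merge each chain of one-child nodes into one edge."""
    out = {}
    for label, sub in node.items():
        # Extend the edge label while the child has exactly one outgoing
        # edge and that edge is not the end-of-key marker.
        while len(sub) == 1 and "$" not in sub:
            (next_label, next_sub), = sub.items()
            label += next_label
            sub = next_sub
        out[label] = compress(sub)
    return out


def long_labels(node):
    """Collect compressed edge labels longer than one character.
    In a nested trie these strings would be stored in a second trie."""
    labels = []
    for label, sub in node.items():
        if len(label) > 1:
            labels.append(label)
        labels.extend(long_labels(sub))
    return labels


if __name__ == "__main__":
    patricia = compress(build_trie(["romane", "romanus", "romulus"]))
    print(patricia)
    print(sorted(long_labels(patricia)))
```

Storing the collected labels in a second trie is what makes reverse searching necessary: the outer trie keeps only a node id per compressed path, and the label has to be recovered from that id.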

• Global Compression
• Global + Local Dictionary
• Short data friendly (~50 bytes)
• Larger dataset, better compression
• Point accessible (via record id)
• Inspired by LZ77
PA-Zip (Point Accessible Zip)
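The combination of a global dictionary with per-record point access can be approximated using Python's standard `zlib` preset-dictionary support. This is explicitly not the PA-Zip algorithm, whose details the slides do not give; it is a stand-in that shows the access pattern only: every record is compressed independently against one shared dictionary, and an offset table maps a record id to its compressed bytes. The class name, the dictionary contents and the sample records are all hypothetical.

```python
import zlib


class PointAccessibleStore:
    """Records compressed one by one against a shared dictionary, with an
    offset table so any record can be decompressed by id on its own."""

    def __init__(self, records, dictionary):
        self.dictionary = dictionary
        self.blob = bytearray()
        self.offsets = [0]  # offsets[i]..offsets[i+1] is record i
        for rec in records:
            comp = zlib.compressobj(zdict=dictionary)
            self.blob += comp.compress(rec) + comp.flush()
            self.offsets.append(len(self.blob))

    def __len__(self):
        return len(self.offsets) - 1

    def get(self, record_id):
        """Decompress a single record without touching any other record."""
        start = self.offsets[record_id]
        end = self.offsets[record_id + 1]
        decomp = zlib.decompressobj(zdict=self.dictionary)
        return decomp.decompress(bytes(self.blob[start:end])) + decomp.flush()


if __name__ == "__main__":
    # Hypothetical short records sharing a common structure.
    records = [
        b'{"user": "alice", "city": "Beijing"}',
        b'{"user": "bob", "city": "Shanghai"}',
        b'{"user": "carol", "city": "Shenzhen"}',
    ]
    # A hand-written shared dictionary; a real system would derive it
    # from the whole dataset.
    dictionary = b'{"user": "", "city": ""}'
    store = PointAccessibleStore(records, dictionary)
    print(store.get(1))
```

The shared dictionary is what lets short records compress at all: each record on its own is too small to contain much redundancy, but it can reference strings that are common across the dataset.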
