Terark Product and Technology
Terark
Make Data Smaller and Access Faster
Brief Introduction
Terark has built the fastest storage engine with the best compression. It is compatible with MySQL, MongoDB and RocksDB, makes random reads 200X faster and makes storage 10X smaller. It is built for general-purpose use and optimized for read-heavy scenarios, giving big data applications greater scalability at lower cost.
Terark Confidential
We Are a YC Company
Y Combinator is the world's leading startup incubator (the total valuation of its portfolio companies exceeds $100 billion). Its best-known companies are Airbnb and Dropbox.
Paying Customers
Terark technology helps cloud, big data and Internet companies achieve better performance at lower cost.
A Global E-Commerce Giant
Terark technology supports its business growth through
Alibaba Cloud.
Proven Results
[Chart] Terark Compression, TPC-H dataset: 550G vs 47G with Terark
[Chart] TCO on the same data size (hardware & ops cost): Terark (1 server) $5,000 vs others (6 servers) $30,000
TerarkDB: Compatible with RocksDB
TerarkDB uses Terark's CO-Index and PA-Zip to implement RocksDB's SSTable.
• Much better compression
• Much better random read performance
• Terark trades compression speed for a high compression ratio and read performance
• Uses universal compaction to minimize write amplification
MySQL on TerarkDB, Mongo on TerarkDB
Strong Compression (> 10:1 compression ratio)
- Increases data capacity
- Improves memory utilization and reduces disk I/O
- Saves data infrastructure cost
Extreme Performance (QPS 15~500X that of competitors)
- Lower latency, higher throughput and concurrency
Simple DevOps & HA
- Leverages the MySQL & MongoDB ecosystems
- Supports proven DevOps tools
- HA based on MySQL and MongoDB
Core Technology
● CO-Index (Compressed Ordered Index)
Searches directly on a highly compressed index
● PA-Zip (Point Accessible Zip)
Directly accesses a single record in a globally compressed dataset
Our breakthrough technology is protected by 5 patents in the US, China and worldwide.
Thanks
Sean Fu
Mobile & WeChat: (+86) 13911734987
E: xinyuan@terark.com
Appendix 1: TCO & ROI Details
Assumptions: hardware cost of about $5,000 per server per year (based on AWS pricing); operational cost of about 20% of the hardware cost.
                 Hardware    Operations
Terark           $5,000      $1,000
Other product    $30,000     $6,000
Estimated revenue lift from performance/scalability improvements: about 20% of cost savings.
Year(s)    Cost savings    Estimated revenue lift
1          $30,000         $6,000
3          $90,000         $18,000
5          $150,000        $30,000
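The Appendix 1 figures follow from a handful of stated assumptions: about $5,000 per server per year (AWS-based), operations at about 20% of hardware cost, one Terark server versus six for the other product, and a revenue lift of about 20% of the savings. The Python sketch below reproduces the tables under exactly those assumptions; the constant and function names are illustrative only.

```python
SERVER_COST_PER_YEAR = 5_000   # ~$5,000 per server per year, based on AWS pricing
OPS_RATIO = 0.20               # operational cost ~20% of hardware cost
REV_LIFT_RATIO = 0.20          # estimated revenue lift ~20% of cost savings


def annual_tco(servers: int) -> tuple[int, int]:
    """Return (hardware, operations) cost per year for a given server count."""
    hardware = servers * SERVER_COST_PER_YEAR
    operations = round(hardware * OPS_RATIO)
    return hardware, operations


def projection(years: int, terark_servers: int = 1, other_servers: int = 6) -> tuple[int, int]:
    """Return (cumulative cost savings, estimated revenue lift) over `years`."""
    terark_total = sum(annual_tco(terark_servers))   # hardware + operations
    other_total = sum(annual_tco(other_servers))
    savings = (other_total - terark_total) * years
    return savings, round(savings * REV_LIFT_RATIO)


if __name__ == "__main__":
    print("Terark:", annual_tco(1))         # (5000, 1000)
    print("Other product:", annual_tco(6))  # (30000, 6000)
    for years in (1, 3, 5):
        print(years, projection(years))
```

Changing `other_servers` or `OPS_RATIO` shows how strongly the savings depend on the 6:1 server ratio taken from the Proven Results comparison.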
Appendix 2: Core Technology Detail
• CO-Index (Compressed Ordered Index): Terark Nested Succinct Trie
• PA-Zip (Point Accessible Zip): global compression, point access
Index Comparison
                               Hash                                  B+Tree         Terark Nested Succinct Trie
Compression                    None                                  OK             ✔✔✔ Excellent
Searching                      ✔✔ Very fast                          OK             ✔ Fast
Exact searching                ✔ Supported                           ✔ Supported    ✔ Supported
Range searching                Not supported                         ✔ Supported    ✔ Supported
Prefix searching               Not supported                         ✔ Supported    ✔ Supported
Regex searching                Not supported                         Not supported  ✔ Supported
Reverse searching (id to key)  Not supported (can be worked around)  Not supported  ✔ Supported
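Most of the differences in the comparison come down to whether keys are kept in order and whether an id can be mapped back to its key. The sketch below uses a plain sorted Python list as a stand-in for an ordered index, with a key's rank serving as its id. It is not a succinct trie and says nothing about Terark's implementation; it only illustrates the exact, range, prefix, regex and reverse (id-to-key) searches. A hash table such as a Python `dict` answers exact lookups quickly but keeps no key order, so range and prefix queries would need a full scan.

```python
import bisect
import re


class SortedKeyIndex:
    """Illustrative ordered index: keys kept sorted, and a key's id is its rank."""

    def __init__(self, keys):
        self.keys = sorted(set(keys))

    def exact(self, key):
        """Exact search: return the id of `key`, or None if it is absent."""
        i = bisect.bisect_left(self.keys, key)
        return i if i < len(self.keys) and self.keys[i] == key else None

    def between(self, lo, hi):
        """Range search: keys k with lo <= k < hi, in sorted order."""
        return self.keys[bisect.bisect_left(self.keys, lo):bisect.bisect_left(self.keys, hi)]

    def prefix(self, p):
        """Prefix search: keys starting with p form one contiguous run."""
        start = end = bisect.bisect_left(self.keys, p)
        while end < len(self.keys) and self.keys[end].startswith(p):
            end += 1
        return self.keys[start:end]

    def regex(self, pattern):
        """Regex search: a trie can skip branches that cannot match; this sketch just scans."""
        rx = re.compile(pattern)
        return [k for k in self.keys if rx.fullmatch(k)]

    def key_of(self, key_id):
        """Reverse search: map an id back to its key."""
        return self.keys[key_id]


if __name__ == "__main__":
    idx = SortedKeyIndex(["banana", "apple", "cherry", "apply", "band"])
    print(idx.exact("banana"))       # 2
    print(idx.between("b", "c"))     # ['banana', 'band']
    print(idx.prefix("app"))         # ['apple', 'apply']
    print(idx.regex("ban.*"))        # ['banana', 'band']
    print(idx.key_of(4))             # cherry
```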
Data (Value) Compression
Columns: (A) Block-based: LevelDB, RocksDB, WiredTiger…; (B) Short data: Terark Nested Succinct Trie; (C) Long data: Terark Global Compression
                      (A)     (B)            (C)
Compression ratio     OK      ✔✔✔ Excellent  ✔✔✔ Excellent
Random read           Slow    ✔ Fast         ✔ Fast
Sequential read       ✔ Fast  OK             ✔ Fast
Double cache problem  Yes     No             No
Compression speed     ✔ Fast  Slow           Slow
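The "Random read: Slow" entry for block-based engines follows from how they store values: records are grouped into blocks that are compressed as a unit, so fetching one record means decompressing its whole block. The toy store below, built on Python's `zlib` with a hypothetical fixed number of records per block, counts the bytes decompressed to serve a single point read. It models only the block-based column, not Terark's formats, and the sample records are made up.

```python
import zlib


class BlockCompressedStore:
    """Toy block-based store: each run of `records_per_block` records is one zlib block."""

    def __init__(self, records, records_per_block=64):
        self.records_per_block = records_per_block
        self.blocks = []
        for start in range(0, len(records), records_per_block):
            chunk = records[start:start + records_per_block]
            # Length-prefix each record (4 bytes, little-endian) so the block can be split again.
            raw = b"".join(len(r).to_bytes(4, "little") + r for r in chunk)
            self.blocks.append(zlib.compress(raw))
        self.bytes_decompressed = 0  # running total of work done by point reads

    def get(self, record_id):
        """Point read: the whole containing block must be decompressed first."""
        block_no, slot = divmod(record_id, self.records_per_block)
        raw = zlib.decompress(self.blocks[block_no])
        self.bytes_decompressed += len(raw)
        pos = 0
        for _ in range(slot):  # skip the records that precede the target in this block
            pos += 4 + int.from_bytes(raw[pos:pos + 4], "little")
        size = int.from_bytes(raw[pos:pos + 4], "little")
        return raw[pos + 4:pos + 4 + size]


if __name__ == "__main__":
    records = [f"user:{i:05d},city=Hangzhou".encode() for i in range(1000)]
    store = BlockCompressedStore(records, records_per_block=64)
    value = store.get(130)
    print(value, store.bytes_decompressed)  # b'user:00130,city=Hangzhou' 1792
```

Reading one 24-byte record here costs 1,792 bytes of decompression (64 records of 28 bytes each, including the length prefixes). Larger blocks compress better but make this cost grow, which is the trade-off the table summarizes.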
CO-Index: Succinct Tree
A succinct data structure represents data in space close to the theoretical lower bound. It encodes the data as a bitmap and uses rank/select operations to navigate it. This can reduce memory usage tremendously, but it is very complex to implement. Terark has its own implementations, which achieve much better performance than open-source implementations.
Two tree encodings, each using 2 bits per node:
• Pre-order, DFUDS: 101110000100. Navigation needs findopen, findclose and enclose, which are much slower than rank/select; rarely used.
• Level-order, LOUDS: 101110010000. Simple, fast and small:
  $\mathrm{Parent}(c) = \mathrm{rank}_0(\mathrm{select}_1(c))$
  $\mathrm{Child}(p, i) = \mathrm{select}_0(p) - p + i$
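The two LOUDS formulas can be checked directly against the bit string 101110010000. The sketch below assumes one convention under which they hold as written: node k is the k-th 1 bit (the root is node 1), bit positions are 1-based, and rank counts inclusively. Under that convention the string decodes to a root with three children, the second of which has one child. `rank` and `select` are plain linear scans for clarity; practical succinct implementations add small auxiliary index structures so both run in (near) constant time, and that part is not shown.

```python
class Louds:
    """LOUDS tree over a '0'/'1' string, with naive O(n) rank/select."""

    def __init__(self, bits: str):
        self.bits = bits

    def rank(self, b: str, pos: int) -> int:
        """Number of bits equal to b in positions 1..pos (inclusive, 1-based)."""
        return self.bits[:pos].count(b)

    def select(self, b: str, k: int) -> int:
        """1-based position of the k-th bit equal to b."""
        seen = 0
        for pos, bit in enumerate(self.bits, start=1):
            if bit == b:
                seen += 1
                if seen == k:
                    return pos
        raise ValueError(f"fewer than {k} {b}-bits")

    def parent(self, c: int) -> int:
        # Parent(c) = rank0(select1(c)); for the root this gives 0, the virtual super-root.
        return self.rank("0", self.select("1", c))

    def child(self, p: int, i: int) -> int:
        # Child(p, i) = select0(p) - p + i, with i counted from 1.
        return self.select("0", p) - p + i

    def degree(self, p: int) -> int:
        # The children of p are the 1s between the p-th and the (p+1)-th 0.
        return self.select("0", p + 1) - self.select("0", p) - 1


if __name__ == "__main__":
    t = Louds("101110010000")
    for p in range(1, 6):
        print(p, [t.child(p, i) for i in range(1, t.degree(p) + 1)])
    print(t.parent(5))  # 3
```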
CO-Index: Patricia Trie + Nesting
Patricia Trie: A Compressed Trie
Path compression: compress every path of one-child nodes into a single node
Nesting: convert the compressed paths into a new trie
Requirement: the trie needs to support “reverse searching”, i.e., extracting the string from a node
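Path compression can be sketched in a few lines: build an ordinary one-character-per-edge trie, then merge every chain of single-child nodes that do not end a key into one edge labeled with the whole substring. The dictionary-based node layout and the function names are assumptions for illustration; the nesting step (storing those edge labels in a second trie) and any succinct encoding are not shown.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    children: dict = field(default_factory=dict)  # edge label -> Node
    terminal: bool = False                         # True if a key ends here


def build_trie(keys):
    """Ordinary trie with one character per edge."""
    root = Node()
    for key in keys:
        node = root
        for ch in key:
            node = node.children.setdefault(ch, Node())
        node.terminal = True
    return root


def compress(node):
    """Path compression: merge chains of single-child, non-terminal nodes into one edge."""
    merged = {}
    for label, child in node.children.items():
        # Follow the chain while the child has exactly one child and ends no key.
        while len(child.children) == 1 and not child.terminal:
            (next_label, next_child), = child.children.items()
            label += next_label
            child = next_child
        merged[label] = compress(child)
    node.children = merged
    return node


def edges(node, depth=0):
    """List (depth, label) pairs in sorted pre-order, for inspection."""
    out = []
    for label in sorted(node.children):
        out.append((depth, label))
        out.extend(edges(node.children[label], depth + 1))
    return out


if __name__ == "__main__":
    trie = compress(build_trie(["romane", "romanus", "romulus", "rubens", "ruber"]))
    for depth, label in edges(trie):
        print("  " * depth + label)
```

With these five keys the character trie collapses to nine edges (r, om, an, e, us, ulus, ube, ns, r) instead of one edge per character.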
PA-Zip (Point Accessible Zip)
• Global compression
• Global + local dictionary
• Friendly to short data (~50 bytes)
• The larger the dataset, the better the compression
• Point accessible (via record id)
• Inspired by LZ77
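To illustrate "global dictionary + point access" (and only that), here is a toy LZ77-flavoured codec in which every record is encoded as copies from one shared dictionary plus literals, so any record can be decoded from its id without touching its neighbours. The dictionary choice (a concatenated sample of records), the minimum match length and all names are assumptions for illustration, and only the global-dictionary half is modeled. This is not Terark's PA-Zip algorithm, which the slide does not specify.

```python
MIN_MATCH = 4  # assumed: matches shorter than this are stored as literals


def encode(record: str, dictionary: str):
    """Greedy LZ77-style parse of `record` against a global dictionary.

    Returns tokens: ("copy", offset, length) or ("lit", text).
    """
    tokens, i = [], 0
    while i < len(record):
        best_off, best_len = -1, 0
        length = MIN_MATCH
        # Naive search: grow the match while the substring still occurs in the dictionary.
        while i + length <= len(record):
            off = dictionary.find(record[i:i + length])
            if off < 0:
                break
            best_off, best_len = off, length
            length += 1
        if best_len:
            tokens.append(("copy", best_off, best_len))
            i += best_len
        else:
            # Extend the pending literal run, or start a new one.
            if tokens and tokens[-1][0] == "lit":
                tokens[-1] = ("lit", tokens[-1][1] + record[i])
            else:
                tokens.append(("lit", record[i]))
            i += 1
    return tokens


def decode(tokens, dictionary: str) -> str:
    """Rebuild one record from its tokens and the shared dictionary."""
    parts = []
    for tok in tokens:
        if tok[0] == "copy":
            _, off, length = tok
            parts.append(dictionary[off:off + length])
        else:
            parts.append(tok[1])
    return "".join(parts)


class PointAccessibleStore:
    """Each record is encoded independently against one shared (global) dictionary."""

    def __init__(self, records, sample_size=100):
        # Assumed dictionary: the first `sample_size` records concatenated together.
        self.dictionary = "".join(records[:sample_size])
        self.encoded = [encode(r, self.dictionary) for r in records]

    def get(self, record_id: int) -> str:
        # Point access: decode only this record's tokens.
        return decode(self.encoded[record_id], self.dictionary)


if __name__ == "__main__":
    records = [f"user:{i:05d},city=Hangzhou,plan=basic" for i in range(500)]
    store = PointAccessibleStore(records, sample_size=50)
    print(store.get(400))
    print(store.encoded[400])
```

Unlike the block-based sketch, a read here touches only one record's tokens plus the shared dictionary, which stays resident; the cost is a slow, search-heavy encoder, in line with the "Compression Speed: Slow" entry.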
