DB Time, Average Active Sessions, and ASH Math - Oracle performance fundamentalsJohn Beresniewicz
RMOUG 2020 abstract:
This session will cover core concepts for Oracle performance analysis first introduced in Oracle 10g and forming the backbone of many features in the Diagnostic and Tuning packs. The presentation will cover the theoretical basis and meaning of these concepts, as well as illustrate how they are fundamental to many user-facing features in both the database itself and Enterprise Manager.
Introducing Delta Live Tables: Make Reliable ETL Easy on Delta LakeDatabricks
As the size and complexity of data grows, building reliable data pipelines is increasingly important, but also complex and challenging. Learn how Delta Live Tables simplifies the ETL lifecycle on Delta Lake, helping data engineering teams greatly simplify ETL development and ongoing management to improve data quality, scale operations, and reduce cost.
Building Data Lakes with Apache AirflowGary Stafford
Build a simple Data Lake on AWS using a combination of services, including Amazon Managed Workflows for Apache Airflow (Amazon MWAA), AWS Glue, AWS Glue Studio, Amazon Athena, and Amazon S3.
Blog post and link to the video: https://garystafford.medium.com/building-a-data-lake-with-apache-airflow-b48bd953c2b
Performance Optimizations in Apache ImpalaCloudera, Inc.
Apache Impala is a modern, open-source MPP SQL engine architected from the ground up for the Hadoop data processing environment. Impala provides low latency and high concurrency for BI/analytic read-mostly queries on Hadoop, not delivered by batch frameworks such as Hive or SPARK. Impala is written from the ground up in C++ and Java. It maintains Hadoop’s flexibility by utilizing standard components (HDFS, HBase, Metastore, Sentry) and is able to read the majority of the widely-used file formats (e.g. Parquet, Avro, RCFile).
To reduce latency, such as that incurred from utilizing MapReduce or by reading data remotely, Impala implements a distributed architecture based on daemon processes that are responsible for all aspects of query execution and that run on the same machines as the rest of the Hadoop infrastructure. Impala employs runtime code generation using LLVM in order to improve execution times and uses static and dynamic partition pruning to significantly reduce the amount of data accessed. The result is performance that is on par or exceeds that of commercial MPP analytic DBMSs, depending on the particular workload. Although initially designed for running on-premises against HDFS-stored data, Impala can also run on public clouds and access data stored in various storage engines such as object stores (e.g. AWS S3), Apache Kudu and HBase. In this talk, we present Impala's architecture in detail and discuss the integration with different storage engines and the cloud.
DB Time, Average Active Sessions, and ASH Math - Oracle performance fundamentalsJohn Beresniewicz
RMOUG 2020 abstract:
This session will cover core concepts for Oracle performance analysis first introduced in Oracle 10g and forming the backbone of many features in the Diagnostic and Tuning packs. The presentation will cover the theoretical basis and meaning of these concepts, as well as illustrate how they are fundamental to many user-facing features in both the database itself and Enterprise Manager.
Introducing Delta Live Tables: Make Reliable ETL Easy on Delta LakeDatabricks
As the size and complexity of data grows, building reliable data pipelines is increasingly important, but also complex and challenging. Learn how Delta Live Tables simplifies the ETL lifecycle on Delta Lake, helping data engineering teams greatly simplify ETL development and ongoing management to improve data quality, scale operations, and reduce cost.
Building Data Lakes with Apache AirflowGary Stafford
Build a simple Data Lake on AWS using a combination of services, including Amazon Managed Workflows for Apache Airflow (Amazon MWAA), AWS Glue, AWS Glue Studio, Amazon Athena, and Amazon S3.
Blog post and link to the video: https://garystafford.medium.com/building-a-data-lake-with-apache-airflow-b48bd953c2b
Performance Optimizations in Apache ImpalaCloudera, Inc.
Apache Impala is a modern, open-source MPP SQL engine architected from the ground up for the Hadoop data processing environment. Impala provides low latency and high concurrency for BI/analytic read-mostly queries on Hadoop, not delivered by batch frameworks such as Hive or SPARK. Impala is written from the ground up in C++ and Java. It maintains Hadoop’s flexibility by utilizing standard components (HDFS, HBase, Metastore, Sentry) and is able to read the majority of the widely-used file formats (e.g. Parquet, Avro, RCFile).
To reduce latency, such as that incurred from utilizing MapReduce or by reading data remotely, Impala implements a distributed architecture based on daemon processes that are responsible for all aspects of query execution and that run on the same machines as the rest of the Hadoop infrastructure. Impala employs runtime code generation using LLVM in order to improve execution times and uses static and dynamic partition pruning to significantly reduce the amount of data accessed. The result is performance that is on par or exceeds that of commercial MPP analytic DBMSs, depending on the particular workload. Although initially designed for running on-premises against HDFS-stored data, Impala can also run on public clouds and access data stored in various storage engines such as object stores (e.g. AWS S3), Apache Kudu and HBase. In this talk, we present Impala's architecture in detail and discuss the integration with different storage engines and the cloud.
Understanding SQL Trace, TKPROF and Execution Plan for beginnersCarlos Sierra
The three fundamental steps of SQL Tuning are: 1) Diagnostics Collection; 2) Root Cause Analysis (RCA); and 3) Remediation. This introductory session on SQL Tuning is for novice DBAs and Developers that are required to investigate a piece of an application performing poorly.
On this session participants will learn about producing a SQL Trace then a summary TKPROF report. A sample TKPROF is navigated with the audience, where the trivial and the no so trivial is exposed and explain. Execution Plans are also navigated and explained, so participants can later untangle complex Execution Plans and start diagnosing SQL performing badly.
This session encourages participants to ask all kind of questions that could be potential road-blocks for deeper understanding of how to address a SQL performing poorly.
NEW LAUNCH! Intro to Amazon Athena. Analyze data in S3, using SQLAmazon Web Services
Amazon Athena is a new interactive query service that makes it easy to analyze data in Amazon S3, using standard SQL. Athena is serverless, so there is no infrastructure to setup or manage, and you can start analyzing your data immediately. You don’t even need to load your data into Athena, it works directly with data stored in S3.
In this session, we will show you how easy is to start querying your data stored in Amazon S3, with Amazon Athena. First we will use Athena to create the schema for data already in S3. Then, we will demonstrate how you can run interactive queries through the built-in query editor. We will provide best practices and use cases for Athena. Then, we will talk about supported queries, data formats, and strategies to save costs when querying data with Athena.
Data Con LA 2020
Description
Apache Druid is a cloud-native open-source database that enables developers to build highly-scalable, low-latency, real-time interactive dashboards and apps to explore huge quantities of data. This column-oriented database provides the microsecond query response times required for ad-hoc queries and programmatic analytics. Druid natively streams data from Apache Kafka (and more) and batch loads just about anything. At ingestion, Druid partitions data based on time so time-based queries run significantly faster than traditional databases, plus Druid offers SQL compatibility. Druid is used in production by AirBnB, Nielsen, Netflix and more for real-time and historical data analytics. This talk provides an introduction to Apache Druid including: Druid's core architecture and its advantages, Working with streaming and batch data in Druid, Querying data and building apps on Druid and Real-world examples of Apache Druid in action
Speaker
Matt Sarrel, Imply Data, Developer Evangelist
This presentation was inspired post read of "TimeSeries Databases" -- Ted Dunning & Ellen Friedman.
I have tried to summarize a lot of the previous bench marks. Hope others find it useful. The slides were compiled early 2015 so some of the results might have changed but the core literature should still hold.
Troubleshooting Complex Performance issues - Oracle SEG$ contentionTanel Poder
From Tanel Poder's Troubleshooting Complex Performance Issues series - an example of Oracle SEG$ internal segment contention due to some direct path insert activity.
Troubleshooting tips and tricks for Oracle Database Oct 2020Sandesh Rao
This talk presents 15 different tips and tricks using tools to better troubleshoot and debug problems with Database , Oracle RAC and Oracle Clusterware , ASM and how to get the right pieces of data with the least of commands which today most people do manually. This session will cover tools from the Oracle Autonomous Health Framework (AHF) like Trace file Analyzer (TFA) to collect , organize and analyze log data , Exachk and orachk to perform mass best practices analysis and automation , Cluster Health Advisor to debug node evictions and calibrate the framework , OSWatcher and its analysis engine , oratop for pinpointing performance issues and many others to make one feel like a rockstar DBA.
This document covers guidelines around achieving multitenancy in a data lake environment. It mentions the different design and implementation guidelines necessary for on premise as well as cloud-based multitenant data lake, and highlights the reference architecture for both these deployment options.
Storing time series data with Apache CassandraPatrick McFadin
If you are looking to collect and store time series data, it's probably not going to be small. Don't get caught without a plan! Apache Cassandra has proven itself as a solid choice now you can learn how to do it. We'll look at possible data models and the the choices you have to be successful. Then, let's open the hood and learn about how data is stored in Apache Cassandra. You don't need to be an expert in distributed systems to make this work and I'll show you how. I'll give you real-world examples and work through the steps. Give me an hour and I will upgrade your time series game.
Change Data Feed is a new feature of Delta Lake on Databricks that is available as a public preview since DBR 8.2. This feature enables a new class of ETL workloads such as incremental table/view maintenance and change auditing that were not possible before. In short, users will now be able to query row level changes across different versions of a Delta table.
In this talk we will dive into how Change Data Feed works under the hood and how to use it with existing ETL jobs to make them more efficient and also go over some new workloads it can enable.
Data Lakehouse Symposium | Day 1 | Part 2Databricks
The world of data architecture began with applications. Next came data warehouses. Then text was organized into a data warehouse.
Then one day the world discovered a whole new kind of data that was being generated by organizations. The world found that machines generated data that could be transformed into valuable insights. This was the origin of what is today called the data lakehouse. The evolution of data architecture continues today.
Come listen to industry experts describe this transformation of ordinary data into a data architecture that is invaluable to business. Simply put, organizations that take data architecture seriously are going to be at the forefront of business tomorrow.
This is an educational event.
Several of the authors of the book Building the Data Lakehouse will be presenting at this symposium.
Oracle Active Data Guard: Best Practices and New Features Deep Dive Glen Hawkins
Oracle Data Guard and Oracle Active Data Guard have long been the answer for the real-time protection, availability, and usability of Oracle data. This presentation provides an in-depth look at several key new features that will make your life easier and protect your data in new and more flexible ways. Learn how Oracle Active Data Guard 19c has been integrated with Oracle Database In-Memory and offers a faster application response after a role transition. See how DML can now be redirected from an Oracle Active Data Guard standby to its primary for more flexible data protection in today’s data centers or your data clouds. This technical deep dive on Active Data Guard is designed to give you a glimpse into upcoming new features brought to you by Oracle Development.
This presentation provides a clear overview of how Oracle Database In-Memory optimizes both analytics and mixed workloads, delivering outstanding performance while supporting real-time analytics, business intelligence, and reporting. It provides details on what you can expect from Database In-Memory in both Oracle Database 12.1.0.2 and 12.2.
Understanding SQL Trace, TKPROF and Execution Plan for beginnersCarlos Sierra
The three fundamental steps of SQL Tuning are: 1) Diagnostics Collection; 2) Root Cause Analysis (RCA); and 3) Remediation. This introductory session on SQL Tuning is for novice DBAs and Developers that are required to investigate a piece of an application performing poorly.
On this session participants will learn about producing a SQL Trace then a summary TKPROF report. A sample TKPROF is navigated with the audience, where the trivial and the no so trivial is exposed and explain. Execution Plans are also navigated and explained, so participants can later untangle complex Execution Plans and start diagnosing SQL performing badly.
This session encourages participants to ask all kind of questions that could be potential road-blocks for deeper understanding of how to address a SQL performing poorly.
NEW LAUNCH! Intro to Amazon Athena. Analyze data in S3, using SQLAmazon Web Services
Amazon Athena is a new interactive query service that makes it easy to analyze data in Amazon S3, using standard SQL. Athena is serverless, so there is no infrastructure to setup or manage, and you can start analyzing your data immediately. You don’t even need to load your data into Athena, it works directly with data stored in S3.
In this session, we will show you how easy is to start querying your data stored in Amazon S3, with Amazon Athena. First we will use Athena to create the schema for data already in S3. Then, we will demonstrate how you can run interactive queries through the built-in query editor. We will provide best practices and use cases for Athena. Then, we will talk about supported queries, data formats, and strategies to save costs when querying data with Athena.
Data Con LA 2020
Description
Apache Druid is a cloud-native open-source database that enables developers to build highly-scalable, low-latency, real-time interactive dashboards and apps to explore huge quantities of data. This column-oriented database provides the microsecond query response times required for ad-hoc queries and programmatic analytics. Druid natively streams data from Apache Kafka (and more) and batch loads just about anything. At ingestion, Druid partitions data based on time so time-based queries run significantly faster than traditional databases, plus Druid offers SQL compatibility. Druid is used in production by AirBnB, Nielsen, Netflix and more for real-time and historical data analytics. This talk provides an introduction to Apache Druid including: Druid's core architecture and its advantages, Working with streaming and batch data in Druid, Querying data and building apps on Druid and Real-world examples of Apache Druid in action
Speaker
Matt Sarrel, Imply Data, Developer Evangelist
This presentation was inspired post read of "TimeSeries Databases" -- Ted Dunning & Ellen Friedman.
I have tried to summarize a lot of the previous bench marks. Hope others find it useful. The slides were compiled early 2015 so some of the results might have changed but the core literature should still hold.
Troubleshooting Complex Performance issues - Oracle SEG$ contentionTanel Poder
From Tanel Poder's Troubleshooting Complex Performance Issues series - an example of Oracle SEG$ internal segment contention due to some direct path insert activity.
Troubleshooting tips and tricks for Oracle Database Oct 2020Sandesh Rao
This talk presents 15 different tips and tricks using tools to better troubleshoot and debug problems with Database , Oracle RAC and Oracle Clusterware , ASM and how to get the right pieces of data with the least of commands which today most people do manually. This session will cover tools from the Oracle Autonomous Health Framework (AHF) like Trace file Analyzer (TFA) to collect , organize and analyze log data , Exachk and orachk to perform mass best practices analysis and automation , Cluster Health Advisor to debug node evictions and calibrate the framework , OSWatcher and its analysis engine , oratop for pinpointing performance issues and many others to make one feel like a rockstar DBA.
This document covers guidelines around achieving multitenancy in a data lake environment. It mentions the different design and implementation guidelines necessary for on premise as well as cloud-based multitenant data lake, and highlights the reference architecture for both these deployment options.
Storing time series data with Apache CassandraPatrick McFadin
If you are looking to collect and store time series data, it's probably not going to be small. Don't get caught without a plan! Apache Cassandra has proven itself as a solid choice now you can learn how to do it. We'll look at possible data models and the the choices you have to be successful. Then, let's open the hood and learn about how data is stored in Apache Cassandra. You don't need to be an expert in distributed systems to make this work and I'll show you how. I'll give you real-world examples and work through the steps. Give me an hour and I will upgrade your time series game.
Change Data Feed is a new feature of Delta Lake on Databricks that is available as a public preview since DBR 8.2. This feature enables a new class of ETL workloads such as incremental table/view maintenance and change auditing that were not possible before. In short, users will now be able to query row level changes across different versions of a Delta table.
In this talk we will dive into how Change Data Feed works under the hood and how to use it with existing ETL jobs to make them more efficient and also go over some new workloads it can enable.
Data Lakehouse Symposium | Day 1 | Part 2Databricks
The world of data architecture began with applications. Next came data warehouses. Then text was organized into a data warehouse.
Then one day the world discovered a whole new kind of data that was being generated by organizations. The world found that machines generated data that could be transformed into valuable insights. This was the origin of what is today called the data lakehouse. The evolution of data architecture continues today.
Come listen to industry experts describe this transformation of ordinary data into a data architecture that is invaluable to business. Simply put, organizations that take data architecture seriously are going to be at the forefront of business tomorrow.
This is an educational event.
Several of the authors of the book Building the Data Lakehouse will be presenting at this symposium.
Oracle Active Data Guard: Best Practices and New Features Deep Dive Glen Hawkins
Oracle Data Guard and Oracle Active Data Guard have long been the answer for the real-time protection, availability, and usability of Oracle data. This presentation provides an in-depth look at several key new features that will make your life easier and protect your data in new and more flexible ways. Learn how Oracle Active Data Guard 19c has been integrated with Oracle Database In-Memory and offers a faster application response after a role transition. See how DML can now be redirected from an Oracle Active Data Guard standby to its primary for more flexible data protection in today’s data centers or your data clouds. This technical deep dive on Active Data Guard is designed to give you a glimpse into upcoming new features brought to you by Oracle Development.
This presentation provides a clear overview of how Oracle Database In-Memory optimizes both analytics and mixed workloads, delivering outstanding performance while supporting real-time analytics, business intelligence, and reporting. It provides details on what you can expect from Database In-Memory in both Oracle Database 12.1.0.2 and 12.2.
Build 1 trillion warehouse based on carbon databoxu42
Apache CarbonData & Spark Meetup
Build 1 trillion warehouse based on CarbonData
Huawei
Apache Spark™ is a unified analytics engine for large-scale data processing.
CarbonData is a high-performance data solution that supports various data analytic scenarios, including BI analysis, ad-hoc SQL query, fast filter lookup on detail record, streaming analytics, and so on. CarbonData has been deployed in many enterprise production environments, in one of the largest scenario it supports queries on single table with 3PB data (more than 5 trillion records) with response time less than 3 seconds!
Trinity 大幅提昇企業面對大量快速變化資訊潮流時的競爭力。
現今企業 BI 多建於 RDBMS 上,伴隨大量的 ETL 與資料交換作業。在導入 Hadoop Big Data 應用之後, 如何有效地與既有 BI 系統介接,且進一步整合,以發揮整體綜效,將是一項挑戰。
Trinity 藉由優越的架構,在傳統 Structured Data 與 Hadoop Big Data 的應用間,建立無縫的交換作業,讓資訊分析人員直接運用熟悉的方式,以大幅降低導入 Big Data 應用時的學習曲線與後續對系統維運所投入的人力。
This talk is introduce by Junping Du, who is an Apache member and Hadoop PMC, at Apache Event at Tsinghua University in China.
Junping Du comes from Tencent and is the chairman of TOSA.
About the Event:
The open source ecosystem plays more and more important role in the world. Open source software is widely used in operating systems, cloud computing, big data, artificial intelligence, and industrial Internet. Many companies have gradually increased their participation in the open source community. Developers with open source experience are increasingly valued and favored by large enterprises. The Apache Software Foundation is one of the most important open source communities, contributing a large number of valuable open source software and communities to the world.
The invited guests of this lecture are all from ASF community, including the chairman of the Apache Software Foundation, three Apache members, Top 5 Apache code committers (according to Apache annual report), the first Committer in the Hadoop project in China, several Apache project mentors or VPs, and many Apache Committers. They will tell you what the open source culture is, how to join the Apache open source community, and the Apache Way.
From a student to an apache committer practice of apache io tdbjixuan1989
This talk is introduce by Xiangdong Huang, who is a PPMC of Apache IoTDB (incubating) project, at Apache Event at Tsinghua University in China.
About the Event:
The open source ecosystem plays more and more important role in the world. Open source software is widely used in operating systems, cloud computing, big data, artificial intelligence, and industrial Internet. Many companies have gradually increased their participation in the open source community. Developers with open source experience are increasingly valued and favored by large enterprises. The Apache Software Foundation is one of the most important open source communities, contributing a large number of valuable open source software and communities to the world.
The invited guests of this lecture are all from ASF community, including the chairman of the Apache Software Foundation, three Apache members, Top 5 Apache code committers (according to Apache annual report), the first Committer in the Hadoop project in China, several Apache project mentors or VPs, and many Apache Committers. They will tell you what the open source culture is, how to join the Apache open source community, and the Apache Way.
Practice of building apache sharding sphere iincubator communityjixuan1989
This talk is introduce by Liang Zhang, who is a PPMC of Apache SahrdingSphere (incubating) project, at Apache Event at Tsinghua University in China.
Liang Zhang comes from JD.com.
About the Event:
The open source ecosystem plays more and more important role in the world. Open source software is widely used in operating systems, cloud computing, big data, artificial intelligence, and industrial Internet. Many companies have gradually increased their participation in the open source community. Developers with open source experience are increasingly valued and favored by large enterprises. The Apache Software Foundation is one of the most important open source communities, contributing a large number of valuable open source software and communities to the world.
The invited guests of this lecture are all from ASF community, including the chairman of the Apache Software Foundation, three Apache members, Top 5 Apache code committers (according to Apache annual report), the first Committer in the Hadoop project in China, several Apache project mentors or VPs, and many Apache Committers. They will tell you what the open source culture is, how to join the Apache open source community, and the Apache Way.
Willem Ning Jiang: Getting Started: How to join an Open Source project Apache...jixuan1989
This talk is introduce by Willem Ning Jiang, who is an Apache member and ServiceComb PMC, at Apache Event at Tsinghua University in China.
Willem Ning Jiang comes from Huawei.
About the Event:
The open source ecosystem plays more and more important role in the world. Open source software is widely used in operating systems, cloud computing, big data, artificial intelligence, and industrial Internet. Many companies have gradually increased their participation in the open source community. Developers with open source experience are increasingly valued and favored by large enterprises. The Apache Software Foundation is one of the most important open source communities, contributing a large number of valuable open source software and communities to the world.
The invited guests of this lecture are all from ASF community, including the chairman of the Apache Software Foundation, three Apache members, Top 5 Apache code committers (according to Apache annual report), the first Committer in the Hadoop project in China, several Apache project mentors or VPs, and many Apache Committers. They will tell you what the open source culture is, how to join the Apache open source community, and the Apache Way.
This talk is introduce by Craig L Russell, who is the Apache Software Foundation Chairman, at Apache Event at Tsinghua University in China.
About the Event:
The open source ecosystem plays more and more important role in the world. Open source software is widely used in operating systems, cloud computing, big data, artificial intelligence, and industrial Internet. Many companies have gradually increased their participation in the open source community. Developers with open source experience are increasingly valued and favored by large enterprises. The Apache Software Foundation is one of the most important open source communities, contributing a large number of valuable open source software and communities to the world.
The invited guests of this lecture are all from ASF community, including the chairman of the Apache Software Foundation, three Apache members, Top 5 Apache code committers (according to Apache annual report), the first Committer in the Hadoop project in China, several Apache project mentors or VPs, and many Apache Committers. They will tell you what the open source culture is, how to join the Apache open source community, and the Apache Way.
31. 数据写入
d1,s1, t1, v
d1,s1, t3, v
d1,s1, t2, v
d1,s1, t4, v
d1,s1, t_, v
内存数据
d1 d2
s1 s1 s2
t1, v
t3, v
t2, v
t4, v
t1, v
t2, v
t1, v
t2, v
Disk
cleaned
memory
存在大量乱序数据
乱序时长从0-300分钟不等
30分钟以内乱序数据较多
•WritableMemChunk容忍一定的乱序,且不排序
•过于陈旧的数据,则放入溢出数据对应的内存表中
*一些系统在内存中采用Btree结构
•刷入磁盘时或者服务于查询时,将数据拷贝并排序
•CopyOnRead
d2,s1, t1, v
d2,s1, t2, v
d2,s2, t1, v
d2,s2, t2, v
原始数据
33. 刷写磁盘
MC M M
MC M
M M M M
F1
F2
F3
ThreadPool
并行与串行协同工作
多
线
程
并
行
单文
件所
有M
任务
串行
保序
执行
WritableMemChunk 增大任务调度并行度
1 排序
2 编码
3 IO
1 2 3 1 2 3 1 2 3
1
2
3
1
2
3
1
2
3
34. 数据文件:现有文件格式的问题
Apache Parquet
Time1 Value1
p领域语义
Ø 时间列总是被读取
(otherwise the time dimension is missing)
Ø 数据总是按时间排序
Ø 统计信息的意义很重大
(max/min time/value, for accelerating query)
p空值问题
Ø Parquet 需要存储空值来保证按行返回结果
Ø Parquet has to add R and D fields for supporting nesting, which is
meaningless for time series data.
Time2 Value2
时间序列数据通用文件格式
36. 数据文件 TsFile
ChunkGroup Footer
Chunk
Header
Chunk data Chunk data Chunk data
marker
Chunk
Header
marker
Chunk
Header
marker
TsDeviceMetadata(
d1)
marker
TsFileMetaData(d1,d2)
String: 12 bytes
Magic
TsDeviceMetadata(
d2)
ChunkGroup Footer
Chunk
Header
Chunk data Chunk data Chunk data
marker
Chunk
Header
marker
Chunk
Header
marker
marker
marker
TsFileMetada
ta Size
String: 12 bytes
Magic
d1
d2
37. 数据文件 TsFile
ChunkGroup Footer
Chunk
Header
Chunk data Chunk data Chunk data
marker
Chunk
Header
marker
Chunk
Header
marker
TsDeviceMetadata(
d1)
marker
TsFileMetaData(d1,d2)
String: 12 bytes
Magic
TsDeviceMetadata(
d2)
ChunkGroup Footer
Chunk
Header
Chunk data Chunk data Chunk data
marker
Chunk
Header
marker
Chunk
Header
marker
marker
marker
TsFileMetada
ta Size
String: 12 bytes
Magic
d1
d2
自解释
自修复
向量化读取
38. 编码
128, 136, 144, 152, 160, …
8, 8, 8, 8 à 1st difference is constant.
0, 0, 0 à 2nd difference is 1-bit storage needed!
128, 135, 143, 154, 163, …
7, 8, 11, 9 à 1st difference is not constant
1, 3, -2 à 2nd difference is 2-bit storage needed!
Unified support of fixed frequency times series
or irregular frequency time series
TS2Diff encoding – Optimized for timestamps Bitmap encoding – for enum-type values
RLE encoding - for consecutively repeated
values
BitPacking encoding - for squeezing out
wasteful bits when storing switch values
Gorilla encoding