Vector wise presen

Slides from the 7th study meeting of the Japan JasperServer User Group (JJSUG): "About the analytic DB Ingres VectorWise" (Noda).

  • Ingres VectorWise’s differentiator is to unlock the power of modern chips. This must be the focus. No other relational database vendor does this.
  • This slide compares Ingres VectorWise data processing performance against Oracle and HP Neoview. HP Neoview in particular is not a direct Ingres VectorWise competitor, but it is included in the comparison because a number was publicly available (most vendors publish no such figures). The links to the sources are included on the slide. Make the assertion that other relational databases will be in the same range as Oracle and HP Neoview.
  • This is an imaginary example: one table with (only) 10 columns, where every column is assumed to be 1/10 of the table size. The performance difference becomes even more striking as the table gets more columns (and wide tables are common in a data warehouse).
    Both Oracle and HP Neoview use a row-based architecture (see below for Oracle Exadata), so both have to load all table data in order to process this simple query. Ingres VectorWise only accesses the 2 columns it needs to answer the query, so it reads only 2/10 of the data.
    Per-core processing speed speaks for itself: assuming no parallelism, Oracle scans 100 GB at 200 MB/s in about 500 seconds, while Ingres VectorWise scans 20 GB at 1.5 GB/s in about 13 seconds (~38 times faster than Oracle).
    HP Neoview will not run a query without using parallelism. However, you need to run at about 50x parallelism to get down to 13 seconds. This requires a lot more hardware and introduces complexity. Similarly, Oracle can run in parallel. Alternatively, in Oracle you could create a materialized view that happens to contain only the 2 columns you need and scan only 20 GB rather than 100 GB. The query would then still take 100 seconds, and someone has to tune the database to make this happen.
    Oracle Exadata implicitly provides the same benefits as columnar access if you run the query in parallel. In other words, it is certainly possible to get the query in Oracle (or HP Neoview, for that matter) down to 13 seconds, but it takes a lot more hardware and/or tuning effort. Ingres VectorWise provides this performance out of the box, with no tuning.
  • This slide shows an example of traditional database processing at the CPU level, tuple by tuple, versus Ingres VectorWise's more efficient vector-based processing: applying the same operation to a set (vector) of data at a time.
  • Access to data on disk requires millions of CPU cycles and can be achieved at 40 to 100 MB/s (for spinning disks; solid state disks (SSDs) deliver higher throughput). Access to RAM is a lot more efficient than access to disk: it requires a few hundred CPU cycles, and a throughput of 2-3 GB/s can be achieved when data is read out of RAM. Access to the chip cache, however, is by far the most efficient way to get at data: it takes a handful of CPU cycles, and throughput of up to 10 GB/s can be achieved in the chip cache.
    Ingres VectorWise pre-fetches compressed data from disk, loads it into RAM (still compressed), and uses the chip cache as the only true processing memory. As a result, data processing with Ingres VectorWise is a lot more efficient than with other relational database technologies.
  • Ingres VectorWise uses column-based storage. Analytic queries rarely access all columns of the tables accessed in the query. Column-based storage ensures that only relevant data is accessed.
    Ingres VectorWise features an innovative approach to incremental DML called Positional Delta Trees (PDTs). The PDTs enable efficient updates to the column-based store. Traditionally, incremental DML has been an Achilles heel for column-based stores.
  • Ingres VectorWise automatically compresses all data that is stored in the column store. The compression algorithm varies per column depending on the data type, but is automatically chosen by Ingres VectorWise. The algorithms Ingres VectorWise uses are optimized for high-speed decompression in order to support the high throughput requirements. Decompression is vectorized just like other functions that operate on the data.
    In order to obtain maximum throughput, Ingres VectorWise pre-fetches compressed data blocks from disk, loads them compressed into memory (into the so-called Column Buffer Manager), and only decompresses the data when it is ready to be processed. As mentioned earlier, the chip cache is used as the only true random access memory, delivering optimum throughput.
  • As data is loaded into Ingres VectorWise the database automatically creates and maintains a storage index. The storage index is very small relative to the table size and stores minimum and maximum information for data blocks. Based on the information in the storage index the database can very quickly identify candidate data blocks. This is another way Ingres VectorWise minimizes IO necessary to answer queries.
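
The storage index idea in the last note can be sketched in a few lines. This is a minimal illustration only, not Ingres VectorWise's implementation: the `BlockStats` class, the fixed `block_size`, and the function names are all hypothetical, and a real system would keep these summaries per column and per on-disk block.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BlockStats:
    """Min/max summary for one data block of a single column (hypothetical layout)."""
    block_id: int
    min_value: int
    max_value: int


def build_storage_index(values: List[int], block_size: int) -> List[BlockStats]:
    """Split a column into fixed-size blocks and record each block's min and max."""
    index = []
    for block_id, start in enumerate(range(0, len(values), block_size)):
        block = values[start:start + block_size]
        index.append(BlockStats(block_id, min(block), max(block)))
    return index


def candidate_blocks(index: List[BlockStats], low: int, high: int) -> List[int]:
    """Return ids of blocks whose [min, max] range overlaps the predicate [low, high]."""
    return [b.block_id for b in index if b.max_value >= low and b.min_value <= high]


# A 12-value column stored as three blocks of four values each.
column = [3, 5, 8, 12, 15, 19, 21, 22, 30, 41, 45, 50]
index = build_storage_index(column, block_size=4)

# Predicate "value BETWEEN 20 AND 25": only blocks that can contain a match are read.
hits = candidate_blocks(index, low=20, high=25)
print(hits)
```

Only the blocks returned by `candidate_blocks` would have to be fetched and decompressed; the others are skipped using the small index alone, which is how such an index reduces I/O.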

Presentation Transcript

  • Ingres VectorWise
    Technical Overview – 21 Jun 2011
  • Agenda
    © 2010 Ingres Corporation
    Slide 2
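  • What is Ingres VectorWise?
    A relational database for analysis and analytics
    Analytic queries run faster than on other RDBMSs
    Makes full use of the capabilities of modern CPUs
    Runs on inexpensive commodity servers
    Slide 3
    10x to 70x performance improvement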
  • Ingres VectorWise Architecture
    Slide 4
    [Diagram labels: enterprise data warehouse; BI and reporting applications; end users; ETL; ERP; CRM; SCM; data mart; ad hoc queries; dashboards; statistics; data mining; analysis; legacy; analytic DB; OLTP]
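  • Ingres VectorWise Features
    Features
    Fully exploits the capabilities of modern CPUs
    Automatic vector processing for decompression, joins, and computation
    Uses the CPU cache as RAM
    Updatable column-based storage
    Proven technology
    Column-based storage
    Automatic compression
    Automatic storage index
    Slide 5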
  • Ingres VectorWise: Data Processing Comparison
    Slide 6
    Company O DBMS (row-based storage): 200 MB/s data processing throughput per core. Depends on the CPU.
    (http://download.oracle.com/docs/cd/E11882_01/server.112/e10578/tdpdw_system.htm#CHDHAEGE)
    Company H DBMS (row-based storage): 150 MB/s data processing throughput per core.
    (http://www.wintercorp.com/whitepapers/whitepapers.asp)
    Ingres VectorWise (column-based storage): 1.5 GB/s data processing throughput per core.
  • Data Processing Performance Example
    Slide 7
    • Scenario
    • One table with 10 columns; each column is 1/10 of the table size
    • Table size is 100 GB
    select <c1>, sum(<c2>) from <table> group by <c1>
    * Assumes linear scalability
  • Faster Processing, Better Analysis
    Analyze data immediately and interactively
    No time-consuming data preparation required
    Analyze more data
    Make better use of data
    More users can analyze data
    More applications can access data
    Slide 8
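  • Fast Results at Lower Cost
    Fast database design
    No specialist needed to design the schema
    No need to design indexes, materialized views, projections, and so on
    No ongoing tuning required
    Runs on inexpensive x86-based commodity servers and PCs
    One CPU does the work of 20 or more CPUs
    A single server outperforms a complex multi-node cluster
    Reduces energy use and cost for operations and cooling
    Reduces maintenance cost and has fewer failures
    Slide 9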
  • Ingres Business Intelligence Partners
    Slide 10
  • Ingres VectorWise Track Record
    The Rohatyn Group: a hedge fund firm in New York (http://www.ingres.com/images/success_stories/success_story_rohatyn_group.pdf)
    A top-tier bank in the UK
    A top-tier university in the UK
    A B2C e-commerce company in New York
    A telephone company in Canada
    A social networking service company in Poland
    A government-affiliated financial institution in the Philippines
    An airline in Australia
    Slide 11
    As of March 2011
  • Ingres Supports 10,000+ Clients Globally
    Slide 12
  • Ingres VectorWise Technology
    Slide 13
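  • Automatic Use of Vector Processing
    A single instruction processes multiple data items
    Slide 14
    SSE (Streaming SIMD Extensions)
    16 registers of 128 bits each
    (256 bits on Intel Sandy Bridge)
    * 32-bit float x 4
    * 16-bit integer x 8
    * 8-bit byte/char x 16
    etc.
    * Addition, subtraction, multiplication, division, comparison, max/min, and so on
    * SSE4.2 is especially effective for string processing
    (GROUP BY, LIKE, and so on)
    SSE2: Pentium 4, AMD64 and later
    SSE3: later Pentium 4,
    later Athlon 64 and after
    SSE4: later Core 2 and after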
  • Processing in the CPU Cache
    CPU cache access is far faster than RAM access
    Query execution plans are built so that all vectors fit in the CPU cache
    Slide 15
    [Figure: Time / Cycles to Process versus Data Processed]
    DISK: millions of cycles, 40-100 MB
    RAM: 150-250 cycles, 2-3 GB
    CHIP: 2-20 cycles, 10 GB
  • Updatable Column-Based Storage
    Accesses only the data that is needed
    Efficient incremental updates are possible
    This was a weakness of earlier column-based storage
    Slide 16
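  • Automatic Compression and Decompression
    Compresses each column using multiple algorithms
    Ingres VectorWise automatically uses the most suitable one
    Decompression is vectorized
    Data is processed in the CPU cache
    Slide 17
    [Diagram labels: Disk; RAM; CPU; cache; column; column buffer manager; decompression. Callouts: maximizes I/O throughput; data is decompressed into and held in the CPU cache; reduces writes to and reads from RAM]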
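  • Automatic Creation of Minimum/Maximum Values
    Always created automatically
    Maintained automatically
    Keeps minimum/maximum values for each data block
    Small and fast to read (0.1% of the column size or less)
    Makes it possible to find candidate data blocks efficiently
    Slide 18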
  • TPC-H Benchmark
    Ingres VectorWise is No. 1 in performance among non-clustered systems (as of 15 June 2011)
    303,290 QphH@100GB
    400,932 QphH@300GB
    436,789 QphH@1000GB
    Ingres VectorWise is No. 1 in price/performance among non-clustered systems (as of 15 June 2011)
    0.16 USD per QphH@100GB
    0.35 USD per QphH@300GB
    0.88 USD per QphH@1000GB
    Slide 19
    What is the TPC-H benchmark? (quoted from http://www.tpc.org)
    The TPC Benchmark™H (TPC-H) is a decision support benchmark. It consists of a suite of business oriented ad-hoc queries and concurrent data
    modifications. The queries and the data populating the database have been chosen to have broad industry-wide relevance. This benchmark illustrates
    decision support systems that examine large volumes of data, execute queries with a high degree of complexity, and give answers to
    critical business questions.
  • TPC-H Benchmark Comparison
    Slide 20
  • Slide 21
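
The scenario on slide 7, combined with the per-core throughput figures on slide 6, can be worked through as simple arithmetic. The sketch below is only an illustration under stated assumptions: it uses decimal units (1 GB = 1000 MB), assumes linear scalability as the slide does, and takes the vendor names from the speaker notes; the function and variable names are made up for this example.

```python
MB_PER_GB = 1000  # assumption: decimal units, to match the round numbers on the slides

TABLE_SIZE_MB = 100 * MB_PER_GB  # 100 GB table
COLUMNS_TOTAL = 10               # every column is 1/10 of the table
COLUMNS_NEEDED = 2               # the query touches <c1> and <c2> only


def scan_seconds(data_mb: float, rate_mb_per_s: float, parallelism: int = 1) -> float:
    """Time to process data_mb at a per-core rate, assuming linear scalability."""
    return data_mb / (rate_mb_per_s * parallelism)


# A row store has to read whole rows; a column store reads only the needed columns.
row_store_mb = TABLE_SIZE_MB
column_store_mb = TABLE_SIZE_MB * COLUMNS_NEEDED / COLUMNS_TOTAL

oracle_s = scan_seconds(row_store_mb, 200)                       # single core, full table
neoview_s = scan_seconds(row_store_mb, 150)                      # single core, full table
vectorwise_s = scan_seconds(column_store_mb, 1500)               # single core, 2 of 10 columns
neoview_50x_s = scan_seconds(row_store_mb, 150, parallelism=50)  # 50-way parallel
oracle_mv_s = scan_seconds(column_store_mb, 200)                 # materialized view on 2 columns

speedup_vs_oracle = oracle_s / vectorwise_s

print(f"Oracle, 1 core:            {oracle_s:.1f} s")
print(f"HP Neoview, 1 core:        {neoview_s:.1f} s")
print(f"Ingres VectorWise, 1 core: {vectorwise_s:.1f} s")
print(f"HP Neoview, 50x parallel:  {neoview_50x_s:.1f} s")
print(f"Oracle, materialized view: {oracle_mv_s:.1f} s")
print(f"Speedup vs Oracle:         {speedup_vs_oracle:.1f}x")
```

The speedup has two independent sources: reading 2/10 of the data (a factor of 5) and the higher per-core rate of 1.5 GB/s versus 200 MB/s (a factor of 7.5).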