Introduction to
Tokyo Products

          Mikio Hirabayashi
              <hirarin@gmail.com>
Tokyo Products
• Tokyo Cabinet
  – database library
• Tokyo Tyrant
  – database server
• Tokyo Dystopia
  – full-text search engine
• Tokyo Promenade
  – content management system

[Diagram: layered stack, top to bottom: applications; custom storage / Tyrant / Dystopia / Promenade; Cabinet; file system]

• open source
  – released under LGPL
• powerful, portable, practical
  – written in standard C, optimized for POSIX
Tokyo Cabinet
 - database library -
Features
• modern implementation of DBM
 – key/value database
   • e.g. DBM, NDBM, GDBM, TDB, CDB, Berkeley DB
 – simple library, embedded in the application process
 – successor of QDBM
   • C99 and POSIX compatible, using Pthread, mmap, etc.
   • Win32 porting is work-in-progress

• high performance
 – insert: 0.4 sec/1M records (2,500,000 qps)
 – search: 0.33 sec/1M records (3,000,000 qps)
• high concurrency
  – multi-thread safe
  – read/write locking per record
• high scalability
  – hash and B+tree structure = O(1) and O(log N)
  – no practical limit on database file size (up to 8 exabytes)
• transaction
  – write ahead logging and shadow paging
  – ACID properties
• various APIs
  – on-memory list/hash/tree
  – file hash/B+tree/array/table
• script language bindings
  – Perl, Ruby, Java, Lua, Python, PHP, Haskell, Erlang, etc...
TCHDB: Hash Database
• static hashing
  – O(1) time complexity
• separate chaining
  – binary search tree
  – balances by the second hash
• free block pool
  – best fit allocation
  – dynamic defragmentation
• combines mmap and pwrite/pread
  – reduces the number of system calls
• compression
  – deflate(gzip)/bzip2/custom

[Diagram: a bucket array pointing to chains of key/value records]
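
The static hashing and separate chaining above can be sketched in memory as follows. This is a minimal illustration, not Tokyo Cabinet's implementation: the names (MiniHash, minihash_put, minihash_get), the two hash functions, and the tiny bucket count are all assumptions made for the example. It reads "balances by the second hash" as ordering each bucket's binary search tree by a second, independent hash, so that the tree shape does not depend on the order in which keys arrive.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define BUCKET_NUM 8   /* deliberately tiny so that buckets collide */

typedef struct Rec {
  char *key, *value;
  uint32_t hash2;              /* second hash: orders the tree inside a bucket */
  struct Rec *left, *right;
} Rec;

typedef struct { Rec *buckets[BUCKET_NUM]; } MiniHash;

/* Two independent string hashes (illustrative choices only). */
static uint32_t hash1(const char *s){
  uint32_t h = 19780211u;
  while(*s) h = h * 37u + (unsigned char)*s++;
  return h;
}
static uint32_t hash2(const char *s){
  uint32_t h = 2166136261u;
  while(*s) h = (h ^ (unsigned char)*s++) * 16777619u;
  return h;
}

static char *dupstr(const char *s){
  size_t n = strlen(s) + 1;
  char *p = malloc(n);
  assert(p != NULL);
  memcpy(p, s, n);
  return p;
}

/* Compare by the second hash first, then by key to break ties. */
static int rec_cmp(uint32_t h2, const char *key, const Rec *rec){
  if(h2 != rec->hash2) return h2 < rec->hash2 ? -1 : 1;
  return strcmp(key, rec->key);
}

/* Insert a record, or overwrite the value of an existing key.
   Records are never freed in this sketch. */
void minihash_put(MiniHash *db, const char *key, const char *value){
  uint32_t h2 = hash2(key);
  Rec **link = &db->buckets[hash1(key) % BUCKET_NUM];
  while(*link){
    int c = rec_cmp(h2, key, *link);
    if(c == 0){
      free((*link)->value);
      (*link)->value = dupstr(value);
      return;
    }
    link = c < 0 ? &(*link)->left : &(*link)->right;
  }
  Rec *rec = malloc(sizeof(*rec));
  assert(rec != NULL);
  rec->key = dupstr(key);
  rec->value = dupstr(value);
  rec->hash2 = h2;
  rec->left = rec->right = NULL;
  *link = rec;
}

/* Return the stored value, or NULL if the key is absent. */
const char *minihash_get(const MiniHash *db, const char *key){
  uint32_t h2 = hash2(key);
  const Rec *rec = db->buckets[hash1(key) % BUCKET_NUM];
  while(rec){
    int c = rec_cmp(h2, key, rec);
    if(c == 0) return rec->value;
    rec = c < 0 ? rec->left : rec->right;
  }
  return NULL;
}
```

With a zero-initialized MiniHash, minihash_put(&db, "foo", "hop") followed by minihash_get(&db, "foo") returns "hop". The real library keeps the bucket array and records in a file and adds the free block pool and compression listed above, none of which is modelled here.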
TCBDB: B+ Tree Database
• B+ tree
  – O(log N) time complexity
• page caching
  – LRU eviction
  – speculative search
• stands on hash DB
  – records pages in hash DB
  – inherits its time and space efficiency
• custom comparison function
  – prefix/range matching
• cursor
  – jump/next/prev

[Diagram: a B tree index above leaf pages of key/value records]
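
Prefix matching with a cursor can be illustrated on a plain sorted array, which stands in for the ordered leaf records of a B+ tree. This is only a sketch under that assumption: cursor_jump and prefix_match are hypothetical names, not TCBDB functions, and byte-wise strcmp ordering is assumed as the comparison function.

```c
#include <assert.h>
#include <string.h>

/* "jump": index of the first key that is >= target in a sorted array.
   Returns n when every key is smaller than target. */
int cursor_jump(const char *const *keys, int n, const char *target){
  int lo = 0, hi = n;
  while(lo < hi){
    int mid = lo + (hi - lo) / 2;
    if(strcmp(keys[mid], target) < 0) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

/* Prefix matching: jump to the prefix, then step "next" while keys still
   start with it. Stores the first matching index in *first and returns
   the number of matching keys. */
int prefix_match(const char *const *keys, int n, const char *prefix, int *first){
  assert(first != NULL);
  size_t plen = strlen(prefix);
  int i = cursor_jump(keys, n, prefix);
  int count = 0;
  *first = i;
  while(i < n && strncmp(keys[i], prefix, plen) == 0){
    count++;
    i++;
  }
  return count;
}
```

Because keys sharing a prefix are adjacent in sorted order, one O(log N) jump plus a sequential scan is enough; a hash database cannot answer this query without visiting every record.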
TCFDB: Fixed-length Database
• array of fixed-length elements
  – O(1) time complexity
  – natural number keys
  – addresses records by multiple of key
• most effective
  – bulk load by mmap
  – no key storage per record
  – extremely fast and concurrent

[Diagram: an array of equally sized value slots]
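
"Addresses records by multiple of key" means the position of a record is computed from its numeric key alone, which is why no key needs to be stored. A minimal sketch follows, assuming a flat memory region (an mmap'd file in practice), a hypothetical 256-byte header, and zero-padded slots; FixedDB, fixed_put and fixed_get are illustrative names, and the real on-disk layout of TCFDB is not reproduced here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define HEADER_SIZE 256   /* assumed header size, for illustration only */

typedef struct {
  unsigned char *base;    /* start of the region */
  uint64_t size;          /* total size of the region in bytes */
  uint32_t width;         /* fixed size of every value slot */
} FixedDB;

/* Offset of the slot for record `id`; ids are natural numbers from 1. */
static uint64_t fixed_offset(const FixedDB *db, uint64_t id){
  return HEADER_SIZE + (id - 1) * (uint64_t)db->width;
}

/* Store a value. Returns 0 on success, or -1 if the id is out of range
   or the value does not fit in a slot. */
int fixed_put(FixedDB *db, uint64_t id, const void *val, uint32_t len){
  if(id < 1 || len > db->width) return -1;
  uint64_t off = fixed_offset(db, id);
  if(off + db->width > db->size) return -1;
  memset(db->base + off, 0, db->width);   /* zero-pad the slot */
  memcpy(db->base + off, val, len);
  return 0;
}

/* Return a pointer to the slot of record `id`, or NULL if out of range. */
const void *fixed_get(const FixedDB *db, uint64_t id){
  if(id < 1) return NULL;
  uint64_t off = fixed_offset(db, id);
  if(off + db->width > db->size) return NULL;
  return db->base + off;
}
```

Lookup is one multiplication and one addition, with no hashing and no comparison, which is consistent with the O(1) claim above.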
TCTDB: Table Database
• column based
  – the primary key and named columns
  – stands on hash DB
• flexible structure
  – no data schema, no data types
  – each record may have a different structure
• query mechanism
  – various operators matching column values
  – lexical/decimal orders by column values
• column indexes
  – implemented with B+ tree
  – typed as string/number
  – inverted index of token/q-gram
  – query optimizer

[Diagram: a bucket array pointing to records, each a primary key followed by name/value pairs, with a column index alongside]
On-memory Structures
• TCXSTR: extensible string
  – concatenation, formatted allocation
• TCLIST: array list (deque)
  – random access by index
  – push/pop, unshift/shift, insert/remove
• TCMAP: map of hash table
  – insert/remove/search
  – iterator by order of insertion
• TCTREE: map of ordered tree
  – insert/remove/search
  – iterator by order of comparison function
Other Mechanisms
• abstract database
  – common interface to 6 schemes
     • on-memory hash, on-memory tree
     • file hash, file B+tree, file array, file table
  – decides the concrete scheme at runtime

• remote database
  – network interface of the abstract database
  – yes, it's Tokyo Tyrant!

• miscellaneous utilities
  – string processing, filesystem operation
  – memory pool, encoding/decoding
Example Code
#include <tcutil.h>
#include <tchdb.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <stdint.h>

int main(int argc, char **argv){

  TCHDB *hdb;
  int ecode;
  char *key, *value;

  /* create the object */
  hdb = tchdbnew();

  /* open the database */
  if(!tchdbopen(hdb, "casket.hdb", HDBOWRITER | HDBOCREAT)){
    ecode = tchdbecode(hdb);
    fprintf(stderr, "open error: %s\n", tchdberrmsg(ecode));
  }

  /* store records */
  if(!tchdbput2(hdb, "foo", "hop") ||
     !tchdbput2(hdb, "bar", "step") ||
     !tchdbput2(hdb, "baz", "jump")){
    ecode = tchdbecode(hdb);
    fprintf(stderr, "put error: %s\n", tchdberrmsg(ecode));
  }

  /* retrieve records */
  value = tchdbget2(hdb, "foo");
  if(value){
    printf("%s\n", value);
    free(value);
  } else {
    ecode = tchdbecode(hdb);
    fprintf(stderr, "get error: %s\n", tchdberrmsg(ecode));
  }

  /* traverse records */
  tchdbiterinit(hdb);
  while((key = tchdbiternext2(hdb)) != NULL){
    value = tchdbget2(hdb, key);
    if(value){
      printf("%s:%s\n", key, value);
      free(value);
    }
    free(key);
  }

  /* close the database */
  if(!tchdbclose(hdb)){
    ecode = tchdbecode(hdb);
    fprintf(stderr, "close error: %s\n", tchdberrmsg(ecode));
  }

  /* delete the object */
  tchdbdel(hdb);

  return 0;
}
Tokyo Tyrant
- database server -
Features
• network server of Tokyo Cabinet
 – client/server model
 – multiple applications can access one database
 – efficient binary protocol
• compatible protocols
 – supports memcached protocol and HTTP
 – available from most popular languages
• high concurrency/performance
 – resolves "c10k" with epoll/kqueue/eventports
 – 17.2 sec/1M queries (58,000 qps)
• high availability
  – hot backup and update log
  – asynchronous replication between servers
• various database schema
  – using the abstract database API of Tokyo Cabinet
• efficient operations
  – no-reply updating, multi-record retrieval
  – atomic increment
• Lua extension
  – defines arbitrary database operations
  – atomic operation by record locking
• pure script language interfaces
  – Perl, Ruby, Java, Python, PHP, Erlang, etc...
Asynchronous Replication
• master and slaves (load balancing)
• dual master (fail over)

[Diagram, master and slaves: the client sends write queries to the master server and read queries, with load balancing, to the slave servers; every server has a database and an update log; the master replicates the difference to each slave server]
[Diagram, dual master: the client queries the active master, and queries the standby master if the active master is dead; each master has a database and an update log; the difference is replicated between the two masters]
Thread Pool Model
[Diagram: a listening socket and client sockets in an epoll/kqueue queue; a task manager with a task queue; a pool of worker threads]
• first of all, the listening socket is enqueued into the epoll queue
• epoll_wait: if the event is about the listener, accept the client connection and register it with epoll_ctl(add)
• otherwise, move the readable client socket from the epoll queue to the task queue: epoll_ctl(del), then enqueue
• worker threads dequeue and do each task
• the client socket is queued back into the epoll queue if keep-alive
Lua Extension
• defines DB operations as Lua functions
  – clients send the function name and record data
  – the server returns the return value of the function

• options about atomicity
  – no locking / record locking / global locking

[Diagram: clients (front end) send a request (function name, key data, value data) to Tokyo Tyrant (back end), whose Lua processor runs a script of user-defined operations against the Tokyo Cabinet database; the response carries the result data]
case: Timestamp DB at mixi.jp
• 20 million records
  – each record size is 20 bytes
• more than 10,000 updates per sec.
  – keeps 10,000 connections
• dual master replication
  – a single server in each role
• memcached compatible protocol
  – reuses existing Perl clients

[Diagram: mod_perl pages (home.pl, show_friend.pl, view_diary.pl, search.pl, other pages) update TT (active), which replicates its database to TT (standby); list_friend.pl and list_bookmark.pl fetch from TT]
case: Cache for Big Storages
• works as proxy
  – mediates insert/search
  – write through, read through
• Lua extension
  – atomic operation by record locking
  – uses LuaSocket to access the storage
• proper DB scheme
  – TCMDB: for generic cache
  – TCNDB: for biased access
  – TCHDB: for large records such as image
  – TCFDB: for small records such as timestamp

[Diagram: clients talk to Tokyo Tyrant, which holds the cache and runs Lua in front of several MySQL/HBase databases]
• atomic insert
  1. inserts to the storage
  2. inserts to the cache
• atomic search
  1. retrieves from the cache; if found, return
  2. retrieves from the storage
  3. inserts to the cache
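
The write-through insert and read-through search steps can be sketched with two small in-memory tables standing in for the cache and the backing storage. Everything here is an assumption made for illustration: the Store and Proxy types, the function names, and the fixed sizes. It is single-threaded, so the record locking that the Lua extension provides in Tokyo Tyrant is not modelled.

```c
#include <assert.h>
#include <string.h>

#define SLOTS  8
#define KEYLEN 16
#define VALLEN 32

/* A tiny fixed-capacity key/value table, used for both cache and storage. */
typedef struct {
  char key[SLOTS][KEYLEN];
  char val[SLOTS][VALLEN];
  int used[SLOTS];
} Store;

/* Slot index of `key`, or -1 if it is not stored. */
static int store_find(const Store *s, const char *key){
  for(int i = 0; i < SLOTS; i++){
    if(s->used[i] && strcmp(s->key[i], key) == 0) return i;
  }
  return -1;
}

/* Insert or overwrite. Returns -1 if the table is full or the data is too long. */
static int store_put(Store *s, const char *key, const char *val){
  if(strlen(key) >= KEYLEN || strlen(val) >= VALLEN) return -1;
  int i = store_find(s, key);
  for(int j = 0; i < 0 && j < SLOTS; j++){
    if(!s->used[j]) i = j;             /* take the first free slot */
  }
  if(i < 0) return -1;
  strcpy(s->key[i], key);
  strcpy(s->val[i], val);
  s->used[i] = 1;
  return 0;
}

typedef struct {
  Store cache;
  Store storage;
  int storage_reads;                   /* counts lookups that reached the storage */
} Proxy;

/* Write through: 1. insert to the storage, 2. insert to the cache. */
int proxy_put(Proxy *p, const char *key, const char *val){
  if(store_put(&p->storage, key, val) != 0) return -1;
  store_put(&p->cache, key, val);      /* best effort: a full cache is not an error */
  return 0;
}

/* Read through: 1. try the cache, 2. fall back to the storage,
   3. insert what was found into the cache. */
const char *proxy_get(Proxy *p, const char *key){
  int i = store_find(&p->cache, key);
  if(i >= 0) return p->cache.val[i];
  p->storage_reads++;
  i = store_find(&p->storage, key);
  if(i < 0) return NULL;
  store_put(&p->cache, key, p->storage.val[i]);
  return p->storage.val[i];
}
```

After the first proxy_get for a key that exists only in the storage, later calls for the same key are served from the cache and storage_reads stops growing.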
Example Code
#include <tcrdb.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <stdint.h>

int main(int argc, char **argv){

  TCRDB *rdb;
  int ecode;
  char *value;

  /* create the object */
  rdb = tcrdbnew();

  /* connect to the server */
  if(!tcrdbopen(rdb, "localhost", 1978)){
    ecode = tcrdbecode(rdb);
    fprintf(stderr, "open error: %s\n", tcrdberrmsg(ecode));
  }

  /* store records */
  if(!tcrdbput2(rdb, "foo", "hop") ||
     !tcrdbput2(rdb, "bar", "step") ||
     !tcrdbput2(rdb, "baz", "jump")){
    ecode = tcrdbecode(rdb);
    fprintf(stderr, "put error: %s\n", tcrdberrmsg(ecode));
  }

  /* retrieve records */
  value = tcrdbget2(rdb, "foo");
  if(value){
    printf("%s\n", value);
    free(value);
  } else {
    ecode = tcrdbecode(rdb);
    fprintf(stderr, "get error: %s\n", tcrdberrmsg(ecode));
  }

  /* close the connection */
  if(!tcrdbclose(rdb)){
    ecode = tcrdbecode(rdb);
    fprintf(stderr, "close error: %s\n", tcrdberrmsg(ecode));
  }

  /* delete the object */
  tcrdbdel(rdb);

  return 0;
}
Tokyo Dystopia
- full-text search engine -
Features
• full-text search engine
  – manages databases of Tokyo Cabinet as an inverted
    index

• combines two tokenizers
  – character N-gram (bi-gram) method
     • perfect recall ratio
  – simple words from an external language processor
     • high accuracy and high performance

• high performance/scalability
  – handles more than 10 million documents
  – searches in milliseconds
• optimized for professional use
  – layered architecture of APIs
  – no embedded scoring system
    • so that an external scoring system can be combined
  – no text filter, no crawler, no language processor
• convenient utilities
  – multilingualism with Unicode
  – set operations
  – phrase matching, prefix matching, suffix
    matching, and token matching
  – command line utilities
Inverted Index
• stands on key/value database
  – key = token
     • N-gram or simple word
  – value = occurrence data (posting list)
     • list of pairs of document number and offset in the document

• uses B+ tree database
  – reduces write operations into the disk device
  – enables common prefix search for tokens
  – delta encoding and deflate compression

       ID:21            text: "abracadabra"
          a    -   21:10           br - 21:1, 21:8
          ab   -   21:0, 21:7      ca - 21:4
          ac   -   21:3            da - 21:6
          ad   -   21:5            ra - 21:2, 21:9
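
The bi-gram tokens and offsets for "abracadabra" can be derived mechanically. The sketch below is an illustration only: Posting, bigram_index and delta_encode are hypothetical names, the text is treated as single-byte characters (Tokyo Dystopia itself works on Unicode), and the last character is indexed as a one-character token. The delta_encode helper shows the idea behind delta encoding a posting list: ascending offsets are replaced by the gaps between them, which are smaller numbers and compress better.

```c
#include <assert.h>
#include <string.h>

#define MAX_POS 16

typedef struct {
  char token[3];       /* one or two characters plus the terminator */
  int pos[MAX_POS];    /* offsets of the token in the text, ascending */
  int npos;
} Posting;

/* Build the bi-gram postings of `text` into `out`.
   Returns the number of distinct tokens, or -1 if a limit is exceeded.
   Tokens appear in `out` in order of first occurrence. */
int bigram_index(const char *text, Posting *out, int max_tokens){
  int ntok = 0;
  size_t len = strlen(text);
  for(size_t i = 0; i < len; i++){
    /* text[len] is the terminator, so the last token has one character */
    char tok[3] = { text[i], text[i + 1], '\0' };
    int t;
    for(t = 0; t < ntok; t++){
      if(strcmp(out[t].token, tok) == 0) break;
    }
    if(t == ntok){
      if(ntok >= max_tokens) return -1;
      strcpy(out[t].token, tok);
      out[t].npos = 0;
      ntok++;
    }
    if(out[t].npos >= MAX_POS) return -1;
    out[t].pos[out[t].npos++] = (int)i;
  }
  return ntok;
}

/* Delta-encode ascending offsets in place: every entry after the first
   becomes its distance from the previous entry. */
void delta_encode(int *pos, int n){
  for(int i = n - 1; i > 0; i--) pos[i] -= pos[i - 1];
}
```

For "abracadabra" this yields eight tokens; "ra" occurs at offsets 2 and 9, which delta encoding stores as 2 and 7.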
Layered Architecture
• character N-gram index
  – "q-gram index" (only index), and "indexed database"
  – uses embedded tokenizer

• word index
  – "word index" (only index), and "tagged index"
  – uses outer tokenizer

[Diagram: layers, top to bottom: Applications; Character N-gram Index (indexed database over q-gram index) beside Tagging Index (tagged database over word index); Tokyo Cabinet]
case: friend search at mixi.jp
• 20 million records
  – each record size is 1K bytes
  – name and self introduction
• more than 100 qps
• attribute narrowing
  – gender, address, birthday
  – multiple sort orders
• distributed processing
  – more than 10 servers
  – indexer, searchers, merger
• ranking by social graph
  – the merger scores the result by following the friend links

[Diagram: the user interface sends a query to the merger, which queries the searchers; each searcher holds an inverted index and an attribute DB; the merger uses TT's cache, a copy of the social graph; the indexer builds the inverted index and the attribute DB from profile data dumped from the profile DB, and the index and the DB are copied to the searchers]
Example Code
#include <dystopia.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <stdint.h>

int main(int argc, char **argv){
  TCIDB *idb;
  int ecode, rnum, i;
  uint64_t *result;
  char *text;

  /* create the object */
  idb = tcidbnew();

  /* open the database */
  if(!tcidbopen(idb, "casket", IDBOWRITER | IDBOCREAT)){
    ecode = tcidbecode(idb);
    fprintf(stderr, "open error: %s\n", tcidberrmsg(ecode));
  }

  /* store records */
  if(!tcidbput(idb, 1, "George Washington") ||
     !tcidbput(idb, 2, "John Adams") ||
     !tcidbput(idb, 3, "Thomas Jefferson")){
    ecode = tcidbecode(idb);
    fprintf(stderr, "put error: %s\n", tcidberrmsg(ecode));
  }

  /* search records */
  result = tcidbsearch2(idb, "john || thomas", &rnum);
  if(result){
    for(i = 0; i < rnum; i++){
      text = tcidbget(idb, result[i]);
      if(text){
        printf("%d\t%s\n", (int)result[i], text);
        free(text);
      }
    }
    free(result);
  } else {
    ecode = tcidbecode(idb);
    fprintf(stderr, "search error: %s\n", tcidberrmsg(ecode));
  }

  /* close the database */
  if(!tcidbclose(idb)){
    ecode = tcidbecode(idb);
    fprintf(stderr, "close error: %s\n", tcidberrmsg(ecode));
  }

  /* delete the object */
  tcidbdel(idb);

  return 0;
}
Tokyo Promenade
- content management system -
Features
• content management system
  – manages Web contents easily with a browser
  – available as BBS, Blog, and Wiki

• simple and logical interface
  – aims at conciseness like LaTeX
  – optimized for text browsers such as w3m and Lynx
  – complying with XHTML 1.0 and considering WCAG 1.0

• high performance/throughput
  – implemented in pure C
  – uses Tokyo Cabinet and supports FastCGI
  – 0.836ms/view (more than 1,000 qps)
• sufficient functionality
  – simple Wiki formatting
  – file uploader and manager
  – user authentication by the login form
  – guest comment authorization by a riddle
  – supports the sidebar navigation
  – full-text/attribute search, calendar view
  – Atom feed
• flexible customizability
  – thorough separation of logic and presentation
  – template file to generate the output
  – server side scripting by the Lua extension
  – post processing by outer commands
Example Code
#!   Introduction to Tokyo Cabinet
#c   2009-11-05T18:58:39+09:00
#m   2009-11-05T18:58:39+09:00
#o   mikio
#t   database,programming,tokyocabinet

This article describes what is [[Tokyo Cabinet|http://1978th.net/tokyocabinet/]] and how to use it.

@ upfile:1257415094-logo-ja.png

* Features

- modern implementation of DBM
-- key/value database
-- e.g.) DBM, NDBM, GDBM, TDB, CDB, Berkeley DB
- simple library = process embedded
- Successor of QDBM
-- C99 and POSIX compatible, using Pthread, mmap, etc...
-- Win32 porting is work-in-progress
- high performance
-- insert: 0.4 sec/1M records (2,500,000 qps)
-- search: 0.33 sec/1M records (3,000,000 qps)
innovating more and yet more...
               http://1978th.net/
Hadoop london
 
Scaling ingest pipelines with high performance computing principles - Rajiv K...
Scaling ingest pipelines with high performance computing principles - Rajiv K...Scaling ingest pipelines with high performance computing principles - Rajiv K...
Scaling ingest pipelines with high performance computing principles - Rajiv K...
 
SDEC2011 NoSQL concepts and models
SDEC2011 NoSQL concepts and modelsSDEC2011 NoSQL concepts and models
SDEC2011 NoSQL concepts and models
 
Ruby on Rails & PostgreSQL - v2
Ruby on Rails & PostgreSQL - v2Ruby on Rails & PostgreSQL - v2
Ruby on Rails & PostgreSQL - v2
 
MongoDB Replication fundamentals - Desert Code Camp - October 2014
MongoDB Replication fundamentals - Desert Code Camp - October 2014MongoDB Replication fundamentals - Desert Code Camp - October 2014
MongoDB Replication fundamentals - Desert Code Camp - October 2014
 
第17回Cassandra勉強会: MyCassandra
第17回Cassandra勉強会: MyCassandra第17回Cassandra勉強会: MyCassandra
第17回Cassandra勉強会: MyCassandra
 
Scaling web applications with cassandra presentation
Scaling web applications with cassandra presentationScaling web applications with cassandra presentation
Scaling web applications with cassandra presentation
 
Spark after Dark by Chris Fregly of Databricks
Spark after Dark by Chris Fregly of DatabricksSpark after Dark by Chris Fregly of Databricks
Spark after Dark by Chris Fregly of Databricks
 
Spark After Dark - LA Apache Spark Users Group - Feb 2015
Spark After Dark - LA Apache Spark Users Group - Feb 2015Spark After Dark - LA Apache Spark Users Group - Feb 2015
Spark After Dark - LA Apache Spark Users Group - Feb 2015
 
MongoDB Replication fundamentals - Desert Code Camp - October 2014
MongoDB Replication fundamentals - Desert Code Camp - October 2014MongoDB Replication fundamentals - Desert Code Camp - October 2014
MongoDB Replication fundamentals - Desert Code Camp - October 2014
 
Hash Functions FTW
Hash Functions FTWHash Functions FTW
Hash Functions FTW
 
Scaling HDFS to Manage Billions of Files
Scaling HDFS to Manage Billions of FilesScaling HDFS to Manage Billions of Files
Scaling HDFS to Manage Billions of Files
 
Scaling HDFS to Manage Billions of Files with Key-Value Stores
Scaling HDFS to Manage Billions of Files with Key-Value StoresScaling HDFS to Manage Billions of Files with Key-Value Stores
Scaling HDFS to Manage Billions of Files with Key-Value Stores
 
HadoopThe Hadoop Java Software Framework
HadoopThe Hadoop Java Software FrameworkHadoopThe Hadoop Java Software Framework
HadoopThe Hadoop Java Software Framework
 
Modern classification techniques
Modern classification techniquesModern classification techniques
Modern classification techniques
 
Hadoop
HadoopHadoop
Hadoop
 

Recently uploaded

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 

Recently uploaded (20)

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 

Introduction to Tokyo Products

  • 1. Introduction to Tokyo Products Mikio Hirabayashi <hirarin@gmail.com>
  • 2. Tokyo Products
      • Tokyo Cabinet
        – database library
      • Tokyo Tyrant
        – database server
      • Tokyo Dystopia
        – full-text search engine
      • Tokyo Promenade
        – content management system
      • open source
        – released under LGPL
      • powerful, portable, practical
        – written in standard C, optimized for POSIX
      [Diagram, top to bottom: applications / custom storage, Tyrant, Dystopia, Promenade / Cabinet / file system]
  • 3. Tokyo Cabinet - database library -
  • 4. Features
      • modern implementation of DBM
        – key/value database
          • e.g. DBM, NDBM, GDBM, TDB, CDB, Berkeley DB
        – simple library = process embedded
        – successor of QDBM
          • C99 and POSIX compatible, using Pthread, mmap, etc.
          • Win32 porting is work-in-progress
      • high performance
        – insert: 0.4 sec/1M records (2,500,000 qps)
        – search: 0.33 sec/1M records (3,000,000 qps)
  • 5. Features (continued)
      • high concurrency
        – multi-thread safe
        – read/write locking per record
      • high scalability
        – hash and B+ tree structures = O(1) and O(log N)
        – no practical limit on the size of a database file (up to 8 exabytes)
      • transaction
        – write ahead logging and shadow paging
        – ACID properties
      • various APIs
        – on-memory list/hash/tree
        – file hash/B+ tree/array/table
      • script language bindings
        – Perl, Ruby, Java, Lua, Python, PHP, Haskell, Erlang, etc.
  • 6. TCHDB: Hash Database
      • static hashing
        – O(1) time complexity
      • separate chaining
        – binary search tree
        – balances by the second hash
      • free block pool
        – best fit allocation
        – dynamic defragmentation
      • combines mmap and pwrite/pread
        – saves calling system calls
      • compression
        – deflate(gzip)/bzip2/custom
      [Diagram: a bucket array whose slots point to chains of key/value records]
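The combination of static hashing and separate chaining on this slide can be sketched in a few lines of C. This is a minimal in-memory illustration, not Tokyo Cabinet's implementation: the names `HashDB`, `hdb_put`, `hdb_get`, the bucket count, and both hash functions are assumptions made for the example. The first hash picks a bucket from a fixed-size array; inside a bucket, records form a binary search tree ordered by a second hash, so the tree tends to stay shallow even when keys arrive in sorted order.

```c
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 8  /* fixed bucket count: "static" hashing (illustrative size) */

typedef struct Rec {
  char *key, *val;
  unsigned h2;               /* second hash: orders records inside one bucket */
  struct Rec *left, *right;
} Rec;

typedef struct { Rec *buckets[NBUCKETS]; } HashDB;

static char *dupstr(const char *s) {
  size_t n = strlen(s) + 1;
  char *p = malloc(n);
  if (p) memcpy(p, s, n);
  return p;
}

/* Two toy string hashes (hypothetical; not the ones Tokyo Cabinet uses). */
static unsigned hash1(const char *s) {
  unsigned h = 5381;
  while (*s) h = h * 33 + (unsigned char)*s++;
  return h;
}
static unsigned hash2(const char *s) {
  unsigned h = 2166136261u;
  while (*s) h = (h ^ (unsigned char)*s++) * 16777619u;
  return h;
}

/* Compare by second hash first, then by key to break ties. */
static int rec_cmp(unsigned h2, const char *key, const Rec *r) {
  if (h2 != r->h2) return h2 < r->h2 ? -1 : 1;
  return strcmp(key, r->key);
}

/* Insert or overwrite a record; returns 0 on success, -1 on allocation failure. */
int hdb_put(HashDB *db, const char *key, const char *val) {
  unsigned h2 = hash2(key);
  Rec **p = &db->buckets[hash1(key) % NBUCKETS];
  while (*p) {
    int c = rec_cmp(h2, key, *p);
    if (c == 0) {                      /* key exists: replace the value */
      char *nv = dupstr(val);
      if (!nv) return -1;
      free((*p)->val);
      (*p)->val = nv;
      return 0;
    }
    p = c < 0 ? &(*p)->left : &(*p)->right;
  }
  Rec *r = calloc(1, sizeof *r);
  if (!r) return -1;
  r->key = dupstr(key);
  r->val = dupstr(val);
  r->h2 = h2;
  if (!r->key || !r->val) { free(r->key); free(r->val); free(r); return -1; }
  *p = r;                              /* link the new leaf into the chain tree */
  return 0;
}

/* Returns the stored value (owned by the table) or NULL if the key is absent. */
const char *hdb_get(const HashDB *db, const char *key) {
  unsigned h2 = hash2(key);
  const Rec *r = db->buckets[hash1(key) % NBUCKETS];
  while (r) {
    int c = rec_cmp(h2, key, r);
    if (c == 0) return r->val;
    r = c < 0 ? r->left : r->right;
  }
  return NULL;
}
```

A caller would zero-initialize a `HashDB`, call `hdb_put` for each record, and read values back with `hdb_get`. The free block pool, mmap, and compression items from the slide are outside the scope of this sketch.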
  • 7. TCBDB: B+ Tree Database
      • B+ tree
        – O(log N) time complexity
      • page caching
        – LRU removing
        – speculative search
      • stands on hash DB
        – records pages in hash DB
        – inherits its time and space efficiency
      • custom comparison function
        – prefix/range matching
      • cursor
        – jump/next/prev
      [Diagram: a B tree index above leaf pages of key/value records]
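Prefix matching with a cursor relies only on keys being kept in comparison order, so it can be illustrated without a B+ tree. The sketch below swaps in a sorted array and binary search as a stand-in for the tree; `cur_jump` and `prefix_match` are hypothetical names, not TCBDB functions. The cursor "jumps" to the first key not less than the prefix and then steps "next" while keys still start with it.

```c
#include <stddef.h>
#include <string.h>

/* Cursor "jump": index of the first key >= target in a sorted array of keys. */
size_t cur_jump(const char **keys, size_t n, const char *target) {
  size_t lo = 0, hi = n;
  while (lo < hi) {
    size_t mid = lo + (hi - lo) / 2;
    if (strcmp(keys[mid], target) < 0) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

/* Prefix matching: jump to the prefix, then step forward while it still matches.
   Stores up to max matching indexes in out and returns how many were stored. */
size_t prefix_match(const char **keys, size_t n, const char *prefix,
                    size_t *out, size_t max) {
  size_t plen = strlen(prefix), cnt = 0;
  for (size_t i = cur_jump(keys, n, prefix);
       i < n && cnt < max && strncmp(keys[i], prefix, plen) == 0; i++)
    out[cnt++] = i;
  return cnt;
}
```

Range matching follows the same pattern: jump to the lower bound and step forward until a key exceeds the upper bound. The byte-wise `strcmp` order here plays the role of the comparison function; a different comparator would change which keys count as adjacent.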
  • 8. TCFDB: Fixed-length Database
      • array of fixed-length elements
        – O(1) time complexity
        – natural number keys
        – addresses records by multiple of key
      • most effective
        – bulk load by mmap
        – no key storage per record
        – extremely fast and concurrent
      [Diagram: an array of equally sized value slots]
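"Addresses records by multiple of key" amounts to simple offset arithmetic, sketched below. The struct and function names (`FixedDB`, `fdb_offset`, `fdb_put`, `fdb_get`) are assumptions for illustration, and the layout ignores whatever header and per-record metadata the real file format carries. With keys starting at 1, record `id` lives at byte offset `(id - 1) * width`, so no key is stored and no search is needed.

```c
#include <stdint.h>
#include <string.h>

typedef struct {
  unsigned char *base;  /* start of the record area (an mmap'ed region in a file DB) */
  uint32_t width;       /* fixed element width in bytes */
  uint64_t limit;       /* largest valid key */
} FixedDB;

/* Byte offset of record id; keys are natural numbers starting at 1. */
uint64_t fdb_offset(const FixedDB *db, uint64_t id) {
  return (id - 1) * (uint64_t)db->width;
}

/* Store up to width bytes in slot id, zero-padding the rest; -1 if out of range. */
int fdb_put(FixedDB *db, uint64_t id, const void *val, uint32_t len) {
  if (id < 1 || id > db->limit || len > db->width) return -1;
  unsigned char *slot = db->base + fdb_offset(db, id);
  memset(slot, 0, db->width);
  memcpy(slot, val, len);
  return 0;
}

/* Pointer to the slot of record id, or NULL if the key is out of range. */
const unsigned char *fdb_get(const FixedDB *db, uint64_t id) {
  if (id < 1 || id > db->limit) return NULL;
  return db->base + fdb_offset(db, id);
}
```

With a width of 8 bytes, key 3 maps to offset 16, which is the whole lookup cost. That is why this scheme suits small records such as timestamps.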
  • 9. TCTDB: Table Database
      • column based
        – the primary key and named columns
        – stands on hash DB
      • flexible structure
        – no data schema, no data type
        – various structure for each record
      • query mechanism
        – various operators matching column values
        – lexical/decimal orders by column values
      • column indexes
        – implemented with B+ tree
        – typed as string/number
        – inverted index of token/q-gram
        – query optimizer
      [Diagram: a bucket array pointing to records, each a primary key followed by name/value pairs, with a column index alongside]
  • 10. On-memory Structures
      • TCXSTR: extensible string
        – concatenation, formatted allocation
      • TCLIST: array list (deque)
        – random access by index
        – push/pop, unshift/shift, insert/remove
      • TCMAP: map of hash table
        – insert/remove/search
        – iterator by order of insertion
      • TCTREE: map of ordered tree
        – insert/remove/search
        – iterator by order of comparison function
  • 11. Other Mechanisms
      • abstract database
        – common interface of 6 schemes
          • on-memory hash, on-memory tree
          • file hash, file B+ tree, file array, file table
        – decides the concrete scheme at runtime
      • remote database
        – network interface of the abstract database
        – yes, it's Tokyo Tyrant!
      • miscellaneous utilities
        – string processing, filesystem operation
        – memory pool, encoding/decoding
  • 12. Example Code
      #include <tcutil.h>
      #include <tchdb.h>
      #include <stdio.h>
      #include <stdlib.h>
      #include <stdbool.h>
      #include <stdint.h>

      int main(int argc, char **argv){
        TCHDB *hdb;
        int ecode;
        char *key, *value;

        /* create the object */
        hdb = tchdbnew();

        /* open the database */
        if(!tchdbopen(hdb, "casket.hdb", HDBOWRITER | HDBOCREAT)){
          ecode = tchdbecode(hdb);
          fprintf(stderr, "open error: %s\n", tchdberrmsg(ecode));
        }

        /* store records */
        if(!tchdbput2(hdb, "foo", "hop") ||
           !tchdbput2(hdb, "bar", "step") ||
           !tchdbput2(hdb, "baz", "jump")){
          ecode = tchdbecode(hdb);
          fprintf(stderr, "put error: %s\n", tchdberrmsg(ecode));
        }

        /* retrieve records */
        value = tchdbget2(hdb, "foo");
        if(value){
          printf("%s\n", value);
          free(value);
        } else {
          ecode = tchdbecode(hdb);
          fprintf(stderr, "get error: %s\n", tchdberrmsg(ecode));
        }

        /* traverse records */
        tchdbiterinit(hdb);
        while((key = tchdbiternext2(hdb)) != NULL){
          value = tchdbget2(hdb, key);
          if(value){
            printf("%s:%s\n", key, value);
            free(value);
          }
          free(key);
        }

        /* close the database */
        if(!tchdbclose(hdb)){
          ecode = tchdbecode(hdb);
          fprintf(stderr, "close error: %s\n", tchdberrmsg(ecode));
        }

        /* delete the object */
        tchdbdel(hdb);

        return 0;
      }
  • 14. Features
      • network server of Tokyo Cabinet
        – client/server model
        – multiple applications can access one database
        – effective binary protocol
      • compatible protocols
        – supports memcached protocol and HTTP
        – available from most popular languages
      • high concurrency/performance
        – resolves "c10k" with epoll/kqueue/eventports
        – 17.2 sec/1M queries (58,000 qps)
  • 15. Features (continued)
      • high availability
        – hot backup and update log
        – asynchronous replication between servers
      • various database schema
        – using the abstract database API of Tokyo Cabinet
      • effective operations
        – no-reply updating, multi-record retrieval
        – atomic increment
      • Lua extension
        – defines arbitrary database operations
        – atomic operation by record locking
      • pure script language interfaces
        – Perl, Ruby, Java, Python, PHP, Erlang, etc.
  • 16. Asynchronous Replication
      • master and slaves (load balancing)
        – clients send write queries to the master server
        – clients send read queries to the slave servers, with load balancing
        – each slave server replicates the difference from the master's update log
      • dual master (fail over)
        – the client queries the active master
        – if the master is dead, the client queries the standby master
        – the two masters replicate the difference to each other
      [Diagram: every server holds a database and an update log]
  • 17. Thread Pool Model
      [Diagram: listen and accept feed an epoll/kqueue queue; a task manager moves sockets to a task queue served by worker threads]
      • first of all, the listening socket is enqueued into the epoll queue
      • the task manager waits on the queue (epoll_wait)
      • accept the client connection if the event is about the listener (epoll_ctl(add))
      • move the readable client socket from the epoll queue to the task queue (epoll_ctl(del), enqueue)
      • worker threads dequeue and do each task
      • queue back if keep-alive
  • 18. Lua Extension
      • defines DB operations as Lua functions
        – clients send the function name and record data
        – the server returns the return value of the function
      • options about atomicity
        – no locking / record locking / global locking
      [Diagram: clients (front end) send a request with function name, key data, and value data to Tokyo Tyrant (back end); its Lua processor runs a script of user-defined operations against the Tokyo Cabinet database and sends back a response with result data]
  • 19. case: Timestamp DB at mixi.jp
      • 20 million records
        – each record size is 20 bytes
      • more than 10,000 updates per sec.
        – keeps 10,000 connections
      • dual master
        – each server is only one
      • memcached compatible protocol
        – reuses existing Perl clients
      [Diagram: mod_perl pages (home.pl, show_friend.pl, view_diary.pl, search.pl, other pages) update TT (active); list_friend.pl and list_bookmark.pl fetch from it; TT (active) and TT (standby) each hold a database and replicate to each other]
  • 20. case: Cache for Big Storages
      • works as proxy
        – mediates insert/search
        – write through, read through
      • Lua extension
        – atomic operation by record locking
        – uses LuaSocket to access the storage
      • proper DB scheme
        – TCMDB: for generic cache
        – TCNDB: for biased access
        – TCHDB: for large records such as image
        – TCFDB: for small records such as timestamp
      • atomic insert
        1. inserts to the storage
        2. inserts to the cache
      • atomic search
        1. retrieves from the cache; if found, return
        2. retrieves from the storage
        3. inserts to the cache
      [Diagram: clients talk to Tokyo Tyrant, whose Lua functions sit in front of the cache database and reach the MySQL/HBase storage database]
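The insert and search steps on this slide can be sketched as two small functions. The slide implements them as Lua functions inside Tokyo Tyrant, reaching MySQL or HBase through LuaSocket; the C below is only a stand-in that models both the cache and the storage as tiny in-memory tables. All names (`Store`, `proxy_insert`, `proxy_search`) are hypothetical, and the record locking that makes each operation atomic is not shown.

```c
#include <string.h>

#define SLOTS 16
#define KLEN 32

typedef struct { char key[KLEN], val[KLEN]; int used; } Slot;
typedef struct { Slot slots[SLOTS]; int reads; } Store;  /* reads counts lookups */

static Slot *find(Store *s, const char *key) {
  for (int i = 0; i < SLOTS; i++)
    if (s->slots[i].used && strcmp(s->slots[i].key, key) == 0) return &s->slots[i];
  return NULL;
}

/* Insert or overwrite; -1 if the strings are too long or the table is full. */
static int store_put(Store *s, const char *key, const char *val) {
  if (strlen(key) >= KLEN || strlen(val) >= KLEN) return -1;
  Slot *t = find(s, key);
  for (int i = 0; !t && i < SLOTS; i++)
    if (!s->slots[i].used) t = &s->slots[i];
  if (!t) return -1;
  strcpy(t->key, key);
  strcpy(t->val, val);
  t->used = 1;
  return 0;
}

static const char *store_get(Store *s, const char *key) {
  s->reads++;
  Slot *t = find(s, key);
  return t ? t->val : NULL;
}

/* Write through: 1. insert to the storage, 2. insert to the cache. */
int proxy_insert(Store *cache, Store *storage, const char *key, const char *val) {
  if (store_put(storage, key, val) != 0) return -1;
  return store_put(cache, key, val);
}

/* Read through: 1. try the cache, 2. fall back to the storage, 3. fill the cache. */
const char *proxy_search(Store *cache, Store *storage, const char *key) {
  const char *v = store_get(cache, key);
  if (v) return v;
  v = store_get(storage, key);
  if (v) store_put(cache, key, v);
  return v;
}
```

The `reads` counter makes the effect visible: after a first `proxy_search` for a key that exists only in the storage, repeating the search leaves the storage's counter unchanged because the cache answers it.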
  • 21. Example Code
      #include <tcrdb.h>
      #include <stdio.h>
      #include <stdlib.h>
      #include <stdbool.h>
      #include <stdint.h>

      int main(int argc, char **argv){
        TCRDB *rdb;
        int ecode;
        char *value;

        /* create the object */
        rdb = tcrdbnew();

        /* connect to the server */
        if(!tcrdbopen(rdb, "localhost", 1978)){
          ecode = tcrdbecode(rdb);
          fprintf(stderr, "open error: %s\n", tcrdberrmsg(ecode));
        }

        /* store records */
        if(!tcrdbput2(rdb, "foo", "hop") ||
           !tcrdbput2(rdb, "bar", "step") ||
           !tcrdbput2(rdb, "baz", "jump")){
          ecode = tcrdbecode(rdb);
          fprintf(stderr, "put error: %s\n", tcrdberrmsg(ecode));
        }

        /* retrieve records */
        value = tcrdbget2(rdb, "foo");
        if(value){
          printf("%s\n", value);
          free(value);
        } else {
          ecode = tcrdbecode(rdb);
          fprintf(stderr, "get error: %s\n", tcrdberrmsg(ecode));
        }

        /* close the connection */
        if(!tcrdbclose(rdb)){
          ecode = tcrdbecode(rdb);
          fprintf(stderr, "close error: %s\n", tcrdberrmsg(ecode));
        }

        /* delete the object */
        tcrdbdel(rdb);

        return 0;
      }
  • 22. Tokyo Dystopia - full-text search engine -
  • 23. Features
      • full-text search engine
        – manages databases of Tokyo Cabinet as an inverted index
      • combines two tokenizers
        – character N-gram (bi-gram) method
          • perfect recall ratio
        – simple word by outer language processor
          • high accuracy and high performance
      • high performance/scalability
        – handles more than 10 million documents
        – searches in milliseconds
  • 24. Features (continued)
      • optimized for professional use
        – layered architecture of APIs
        – no embedded scoring system
          • to combine outer scoring system
        – no text filter, no crawler, no language processor
      • convenient utilities
        – multilingualism with Unicode
        – set operations
        – phrase matching, prefix matching, suffix matching, and token matching
        – command line utilities
  • 25. Inverted Index
      • stands on key/value database
        – key = token
          • N-gram or simple word
        – value = occurrence data (posting list)
          • list of pairs of document number and offset in the document
      • uses B+ tree database
        – reduces write operations into the disk device
        – enables common prefix search for tokens
        – delta encoding and deflate compression
      Example: ID:21 text: "abracadabra"
        a  - 21:10
        ab - 21:0, 21:7
        ac - 21:3
        ad - 21:5
        br - 21:1, 21:8
        ca - 21:4
        da - 21:6
        ra - 21:2, 21:9
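A short sketch shows how the posting lists for the sample text "abracadabra" arise and what delta encoding does to them. It is an illustration under simplifying assumptions, not Tokyo Dystopia's code: the names `Posting`, `bigrams`, `posting_list`, and `delta_encode` are hypothetical, characters are treated as single bytes, and only one document is indexed, so postings carry an offset but no document number.

```c
#include <stddef.h>
#include <string.h>

typedef struct { char token[3]; int offset; } Posting;

/* Split text into overlapping 2-character tokens with their offsets.
   The final character forms a 1-character token. Works on bytes, so it
   assumes single-byte characters (a simplification for this sketch). */
size_t bigrams(const char *text, Posting *out, size_t max) {
  size_t len = strlen(text), n = 0;
  for (size_t i = 0; i < len && n < max; i++, n++) {
    out[n].token[0] = text[i];
    out[n].token[1] = (i + 1 < len) ? text[i + 1] : '\0';
    out[n].token[2] = '\0';
    out[n].offset = (int)i;
  }
  return n;
}

/* Collect the offsets of one token: its posting list within a single document. */
size_t posting_list(const Posting *p, size_t n, const char *token,
                    int *offs, size_t max) {
  size_t cnt = 0;
  for (size_t i = 0; i < n && cnt < max; i++)
    if (strcmp(p[i].token, token) == 0) offs[cnt++] = p[i].offset;
  return cnt;
}

/* Delta encoding: keep the first offset, then store gaps between sorted offsets. */
void delta_encode(int *offs, size_t n) {
  for (size_t i = n; i-- > 1; ) offs[i] -= offs[i - 1];
}
```

For "abracadabra" the token `br` occurs at offsets 1 and 8, which delta encoding stores as 1 and 7. Small gaps like these are what make a later compression pass such as deflate effective on long posting lists.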
  • 26. Layered Architecture
      • character N-gram index
        – "q-gram index" (only index), and "indexed database"
        – uses embedded tokenizer
      • word index
        – "word index" (only index), and "tagged index"
        – uses outer tokenizer
      [Diagram, top to bottom: Applications / Character N-gram Index (indexed database over q-gram index) and Tagging Index (tagged database over word index) / Tokyo Cabinet]
  • 27. case: friend search at mixi.jp
      • 20 million records
        – each record size is 1K bytes
        – name and self introduction
      • more than 100 qps
      • attribute narrowing
        – gender, address, birthday
        – multiple sort orders
      • distributed processing
        – more than 10 servers
        – indexer, searchers, merger
      • ranking by social graph
        – the merger scores the result by following the friend links
      [Diagram: user interface, merger (with TT's cache and the social graph), searchers and indexer (each with an inverted index and an attribute DB), and the profile DB; arrows labeled "query", "copy the social graph", "copy the index and the DB", and "dump profile data"]
  • 28. Example Code
      #include <dystopia.h>
      #include <stdio.h>
      #include <stdlib.h>
      #include <stdbool.h>
      #include <stdint.h>

      int main(int argc, char **argv){
        TCIDB *idb;
        int ecode, rnum, i;
        uint64_t *result;
        char *text;

        /* create the object */
        idb = tcidbnew();

        /* open the database */
        if(!tcidbopen(idb, "casket", IDBOWRITER | IDBOCREAT)){
          ecode = tcidbecode(idb);
          fprintf(stderr, "open error: %s\n", tcidberrmsg(ecode));
        }

        /* store records */
        if(!tcidbput(idb, 1, "George Washington") ||
           !tcidbput(idb, 2, "John Adams") ||
           !tcidbput(idb, 3, "Thomas Jefferson")){
          ecode = tcidbecode(idb);
          fprintf(stderr, "put error: %s\n", tcidberrmsg(ecode));
        }

        /* search records */
        result = tcidbsearch2(idb, "john || thomas", &rnum);
        if(result){
          for(i = 0; i < rnum; i++){
            text = tcidbget(idb, result[i]);
            if(text){
              printf("%d\t%s\n", (int)result[i], text);
              free(text);
            }
          }
          free(result);
        } else {
          ecode = tcidbecode(idb);
          fprintf(stderr, "search error: %s\n", tcidberrmsg(ecode));
        }

        /* close the database */
        if(!tcidbclose(idb)){
          ecode = tcidbecode(idb);
          fprintf(stderr, "close error: %s\n", tcidberrmsg(ecode));
        }

        /* delete the object */
        tcidbdel(idb);

        return 0;
      }
  • 29. Tokyo Promenade - content management system -
  • 30. Features
      • content management system
        – manages Web contents easily with a browser
        – available as BBS, Blog, and Wiki
      • simple and logical interface
        – aims at conciseness like LaTeX
        – optimized for text browsers such as w3m and Lynx
        – complying with XHTML 1.0 and considering WCAG 1.0
      • high performance/throughput
        – implemented in pure C
        – uses Tokyo Cabinet and supports FastCGI
        – 0.836ms/view (more than 1,000 qps)
  • 31. Features (continued)
      • sufficient functionality
        – simple Wiki formatting
        – file uploader and manager
        – user authentication by the login form
        – guest comment authorization by a riddle
        – supports the sidebar navigation
        – full-text/attribute search, calendar view
        – Atom feed
      • flexible customizability
        – thorough separation of logic and presentation
        – template file to generate the output
        – server side scripting by the Lua extension
        – post processing by outer commands
  • 32. Example Code
      #! Introduction to Tokyo Cabinet
      #c 2009-11-05T18:58:39+09:00
      #m 2009-11-05T18:58:39+09:00
      #o mikio
      #t database,programming,tokyocabinet

      This article describes what is [[Tokyo Cabinet|http://1978th.net/tokyocabinet/]] and how to use it.

      @ upfile:1257415094-logo-ja.png

      * Features

      - modern implementation of DBM
      -- key/value database
      -- e.g.) DBM, NDBM, GDBM, TDB, CDB, Berkeley DB
      - simple library = process embedded
      - Successor of QDBM
      -- C99 and POSIX compatible, using Pthread, mmap, etc...
      -- Win32 porting is work-in-progress
      - high performance
      - insert: 0.4 sec/1M records (2,500,000 qps)
      - search: 0.33 sec/1M records (3,000,000 qps)
  • 33. innovating more and yet more... http://1978th.net/