MongoDB as
Search Engine Document Repository

         Preferred Infrastructure, Inc.
                      CTO

       Kazuki Ohta <kzk@preferred.jp>
                 http://kzk9.net/


                        1
Self Introduction
•   Kazuki Ohta, CTO at Preferred Infrastructure, Inc. (http://preferred.jp)
    •   Interested in Data Intensive Computing
    •   Graduated U-Tokyo in 2010 (System Software)
        •   Parallel I/O Middleware for Massively Parallel HPC Environment
        •   Summer Intern @ Argonne National Laboratory
    •   ACM ICPC
    •   Hadoop User Group ( http://hugjp.org/ )
•   Personal Site
    •   http://kzk9.net/, @kzk_mover

                                     2
Agenda

• Introduction of Sedue
• Problems We Had
• How MongoDB Solved
• Problems in the Integration Phase
• Future Insight
                     3
4
Sedue Search Engine
•   Enterprise Distributed Search Engine
    •   Developed at Preferred Infrastructure, Inc.
    •   Multi-threaded C++ Server (0.3 million lines)
    •   Often Handles Midscale Contents
        •   50 million documents/items

•   Around 30 customers
    •   Media, Ad, E-Commerce, Digital Library, etc.

                             5
Sedue Data Model
•   Fixed Schema over De-Normalized Data
    •   Field Definition + Index Definition
    •   How the data is stored (name? type?)
    •   How the data is indexed

ArticleID   Title        Content
ID123       iPad2        iPad2 is coming!
ID124       MongoDB      Durable in Single Server!
ID125       MongoTokyo   Today!

[Diagram labels beside the table: Search, Recommend, Filter, Query]
                                        6
Sedue Schema (Sample)
<schema>
   <fields>
      <field name="article_id" type="string" />
      <field name="title"      type="string" />
      <field name="content"    type="string" />
      <field name="date"       type="datetime" />
   </fields>
   <indexes>
      <index name="search" type="invertedindex"
       target="content" />
      <index name="recommend" type="doc2doc"
       target="title, content" />
   </indexes>
</schema>
                             7
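A minimal sketch of how a schema like the sample could be sanity-checked before use, written in Python with only the standard library. The helper name `undefined_index_targets` is hypothetical and not part of Sedue, and the sketch assumes that an index `target` is a comma-separated list of field names. For instance, a field declared as `contents` but indexed as `content` would be reported.

```python
import xml.etree.ElementTree as ET

SCHEMA = """
<schema>
  <fields>
    <field name="article_id" type="string" />
    <field name="title" type="string" />
    <field name="content" type="string" />
    <field name="date" type="datetime" />
  </fields>
  <indexes>
    <index name="search" type="invertedindex" target="content" />
    <index name="recommend" type="doc2doc" target="title, content" />
  </indexes>
</schema>
"""


def undefined_index_targets(schema_xml):
    """Return {index name: [targets that are not declared fields]}.

    Hypothetical helper for illustration; not a Sedue API.
    """
    root = ET.fromstring(schema_xml)
    fields = {field.get("name") for field in root.iter("field")}
    problems = {}
    for index in root.iter("index"):
        # Assumption: targets are comma-separated field names.
        targets = [t.strip() for t in index.get("target", "").split(",") if t.strip()]
        missing = [t for t in targets if t not in fields]
        if missing:
            problems[index.get("name")] = missing
    return problems


print(undefined_index_targets(SCHEMA))  # an empty dict means every target is a declared field
```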
Sedue Query (Sample)
(search:iPad2)?date<today()?sort=date:desc

QueryText: (search:iPad2)    Filter: date<today()    Sort: sort=date:desc

ArticleID   Title        Content                     date
ID123       iPad2        iPad2 is coming!            today
ID124       MongoDB      Durable in Single Server!   today
ID125       MongoTokyo   Today!                      yesterday

                            8
Sedue Query (Sample)
((search:iPad2)&(search:coming))?date<today()?sort=date:desc

QueryText: ((search:iPad2)&(search:coming))    Filter: date<today()    Sort: sort=date:desc

ArticleID   Title        Content                     date
ID123       iPad2        iPad2 is coming!            today
ID124       MongoDB      Durable in Single Server!   today
ID125       MongoTokyo   Today!                      yesterday

                                   9
Sedue Query (Sample)
(recommend:ID124)?date<today()?sort=date:desc

QueryText: (recommend:ID124)    Filter: date<today()    Sort: sort=date:desc

ArticleID   Title        Content                     date
ID123       iPad2        iPad2 is coming!            today
ID124       MongoDB      Durable in Single Server!   today
ID125       MongoTokyo   Today!                      yesterday

                               10
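The three-part shape of the sample queries (QueryText, Filter, Sort) can be illustrated with a small Python sketch. It assumes, based only on the samples, that `?` separates the parts and that the sort clause starts with `sort=`. The function name `split_sedue_query` and the default sort order are hypothetical; the real Sedue parser is not shown in the slides.

```python
def split_sedue_query(query):
    """Split 'QueryText?Filter?Sort' into its parts.

    Assumption: '?' is the separator and the sort clause begins with 'sort='.
    """
    parts = query.split("?")
    result = {"query_text": parts[0], "filter": None, "sort": None}
    for part in parts[1:]:
        if part.startswith("sort="):
            field, _, order = part[len("sort="):].partition(":")
            # Assumption: ascending order when no direction is given.
            result["sort"] = (field, order or "asc")
        else:
            result["filter"] = part
    return result


parsed = split_sedue_query(
    "((search:iPad2)&(search:coming))?date<today()?sort=date:desc"
)
print(parsed["query_text"])  # ((search:iPad2)&(search:coming))
print(parsed["filter"])      # date<today()
print(parsed["sort"])        # ('date', 'desc')
```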
This Data Model is Mapped to
   The Distributed System



             11
Sedue Architecture
[Diagram components: User, Query Server, Searcher, Indexer, Document Repository Proxy, Distributed Repository, Distributed File System (DFS), Crawler, Archive Manager]

                           12
Sedue Architecture
•   “Distributed Index-Query Mechanism”
    •   Create indices, distribute them, query with them
        •   Most types of search/recommendation algorithms fit
            into this architecture
    •   In other words: “Distributed Column-Oriented Database”


•   Once documents are put into Sedue, you can use search and
    recommendation in one system
    •   Registration and querying are done via a REST API

                               13
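The "create indices, distribute them, query with them" idea can be sketched as a toy in Python. This is not Sedue's implementation: the round-robin placement, the whitespace tokenizer, and the function names are all assumptions made for illustration.

```python
from collections import defaultdict

DOCS = {
    "ID123": "iPad2 is coming!",
    "ID124": "Durable in Single Server!",
    "ID125": "Today!",
}


def build_shards(docs, num_shards):
    """Create one inverted index (term -> set of doc ids) per shard."""
    shards = [defaultdict(set) for _ in range(num_shards)]
    for i, (doc_id, text) in enumerate(sorted(docs.items())):
        index = shards[i % num_shards]  # round-robin placement (assumption)
        for term in text.lower().replace("!", " ").split():
            index[term].add(doc_id)
    return shards


def search(shards, term):
    """Ask every shard for the term and merge the partial results."""
    hits = set()
    for index in shards:
        hits |= index.get(term.lower(), set())
    return sorted(hits)


shards = build_shards(DOCS, num_shards=2)
print(search(shards, "iPad2"))   # ['ID123']
print(search(shards, "server"))  # ['ID124']
```

Each shard only knows its own documents, so a query is a fan-out followed by a merge; a recommendation index could follow the same pattern with a different per-shard structure.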
OK,
now we have developed the

    Distributed
Index-Query Engine!


         14
However...
• THE PROBLEM: THE REAL WORLD
 • The schema changes once a week.
 • Real data lacks most columns.
   • Especially when building vertical search over many
      sites (each has its own schema)
 • High availability is required in some cases.
                           15
Especially, Cross-Site Search
ITPro
NikkeiBusiness Online
PC Online
TechOn
Kenplatz
ECO Japan
BPNet

                        16
ArticleID  Title       Content                    date       FlagA  FlagB  FlagC  FlagXX
ID123      iPad2       iPad2 is coming!           today                    1
ID124      MongoDB     Durable in Single Server!  today
ID125      MongoTokyo  Today!                     yesterday                       0
ID126      HBase       0.90 is out!                                               1
ID127      Cassandra                                                1
ID128      CouchDB
ID129      Ruby                                   today
ID130      Python      N/A                                          0
ID131      Haskell     N/A                                                 1
ID132      D-Lang      N/A

Sparse!!!
                                            18
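One way to picture why a sparse table is awkward under a fixed schema: if each row is turned into a document that keeps only the populated columns, the empty cells simply disappear. The Python sketch below uses three rows loosely based on the table; the lower-case column names and the exact flag values are illustrative assumptions.

```python
COLUMNS = ["article_id", "title", "content", "date", "flag_a", "flag_b", "flag_c"]

# None marks an empty cell in the fixed-schema table.
ROWS = [
    ("ID123", "iPad2", "iPad2 is coming!", "today", None, None, 1),
    ("ID127", "Cassandra", None, None, None, 1, None),
    ("ID128", "CouchDB", None, None, None, None, None),
]


def row_to_document(row):
    """Keep only the columns that actually have a value."""
    return {name: value for name, value in zip(COLUMNS, row) if value is not None}


documents = [row_to_document(row) for row in ROWS]
filled = sum(len(doc) for doc in documents)

print(documents[2])  # {'article_id': 'ID128', 'title': 'CouchDB'}
print(f"{filled} of {len(ROWS) * len(COLUMNS)} cells are populated")  # 10 of 21
```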
One Lucky Thing:
“Pluggable Storage Strategy”



             19
Pluggable Storage Strategy
•   Important: We want to focus on developing application servers
        •   We’re a search engine company, not a database company


•   DocumentRepository and DistributedFileSystem are pluggable!
    •   Many, many NoSQL storages are emerging
    •   Provide a simple interface on top of them
        •   You can select the underlying storage technology according
            to the requirements of the system itself
        •   e.g. by document volume, availability, consistency, etc.

                                     20
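A pluggable storage layer of this kind can be sketched as a small abstract interface with interchangeable backends. The Python below is only an illustration: Sedue itself is written in C++, and the class and method names (`DocumentRepository`, `put`, `get`, `make_repository`) are hypothetical rather than taken from its code base.

```python
from abc import ABC, abstractmethod


class DocumentRepository(ABC):
    """Minimal storage interface; method names are illustrative."""

    @abstractmethod
    def put(self, doc_id, document):
        """Store a document under doc_id."""

    @abstractmethod
    def get(self, doc_id):
        """Return the stored document, or None if it is unknown."""


class InMemoryRepository(DocumentRepository):
    """Stand-in backend; a real one would wrap Tokyo Cabinet, MySQL, MongoDB, etc."""

    def __init__(self):
        self._docs = {}

    def put(self, doc_id, document):
        self._docs[doc_id] = dict(document)

    def get(self, doc_id):
        return self._docs.get(doc_id)


def make_repository(kind):
    """Pick a backend by name, for example from a configuration file."""
    backends = {"memory": InMemoryRepository}
    return backends[kind]()


repo = make_repository("memory")
repo.put(1, {"title": "MongoDB"})
print(repo.get(1))  # {'title': 'MongoDB'}
```

The application server only ever sees `DocumentRepository`, so swapping the backend becomes a configuration decision driven by volume, availability, and consistency requirements.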
At first... (Repository)
                            API   Replication   Online Column Addition   Sharding
Tokyo Cabinet (Table DB)     ○         ×                  ○                 ×
MySQL                        ×         ○

Unfortunately, Tokyo Tyrant didn’t support the Table Database at that time.
                          21
At first... (DFS)

        API             Setup   Availability   Performance
NFS     POSIX           ○       costly         costly
HDFS    libhdfs sucks                          ○


                      22
23
http://www.mongodb.org/

•   OSS Document-Oriented Database
    •   No Schema, BSON, Rich Query + B-Tree Index
    •   written in C++
        •   C, C++, Java, PHP, Python, Ruby COOL drivers
    •   Embedded JavaScript Engine
        •   db.insert({"category": "   "}, {"   ": "   "})
        •   db.articles.find({"category": "   "})


    •   High Availability by ReplicaSet
    •   High Scalability by Auto-Sharding
                                                  24
As Repository
                            API   Replication         Online Column Addition   Sharding
Tokyo Cabinet (Table DB)     ○         ×                        ○                 ×
MySQL                        ×         ○
MongoDB                      ○         ○ (master-master)        ○                 ongoing

                            25
GridFS
• MongoDB as Blob-Storage
 • The content is split into 256 KB
      chunks, with some metadata.
 • Performance is not as high as HDFS, but
      still useful in mid-scale deployments.

[Diagram: Large Blob → Chunk0, Chunk1 + Metadata]


                        26
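The chunking idea can be sketched in a few lines of Python. This is not the GridFS wire format: the metadata field names and the helper names `split_blob` and `join_chunks` are assumptions chosen for illustration, and only the 256 KB chunk size comes from the slide.

```python
CHUNK_SIZE = 256 * 1024  # 256 KB, the chunk size given on the slide


def split_blob(name, data, chunk_size=CHUNK_SIZE):
    """Return (metadata, chunks) for a blob; field names are illustrative."""
    chunks = [
        {"n": i, "data": data[offset:offset + chunk_size]}
        for i, offset in enumerate(range(0, len(data), chunk_size))
    ]
    metadata = {"filename": name, "length": len(data), "chunk_size": chunk_size}
    return metadata, chunks


def join_chunks(chunks):
    """Reassemble the blob from its chunks, ordered by sequence number."""
    return b"".join(c["data"] for c in sorted(chunks, key=lambda c: c["n"]))


blob = b"x" * (600 * 1024)  # a 600 KB blob
meta, chunks = split_blob("index.bin", blob)
print(len(chunks), meta["length"])  # 3 614400
```

Because each chunk is an ordinary small document, the blob inherits whatever replication the underlying database provides.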
As DFS
         API             Setup   Availability   Performance
NFS      POSIX           ○       costly         costly
HDFS     libhdfs sucks                          ○
GridFS   C++             ○       ○

                         27
Now Sedue                             MongoDB
[Diagram: User, Sedue, and MongoDB 1.6 (Master-Master Replication) serving as Repository and DFS]

•   Used in multiple ways
    •   Repository + DFS
    •   Easy setup!!!
•   30 million documents
    •   No schema change is required
    •   Master-Master Replication
    •   Backup once a week
•   4 Production Deployments
    •   1 year

                                  28
We had issues, but MongoDB is OSS!

•   SERVER-1408 (Fixed)
    •   C++ Driver GridFS cannot store over 4G object.
•   SERVER-1372 (Fixed)
    •   NULL check for auto_ptr<DBClientConnection> is missing
•   SERVER-1328 (Fixed)
    •   scons install doesn't end with --prefix parameter?
•   SERVER-1232 (Fixed)
    •   C++ GridFS Client should support larger Chunk Size
•   SERVER-2050
    •   Enables ScopedDbConnection to set the timeout.
                                 29
Got the Mug!




     30
How Long?
•   Prototype version done in one week
    •   using C++ client API
    •   about 500 lines
•   Production release in about 2 months
    •   including bugfixes
    •   mongo-user ML is really responsive
    •   Eliot Horowitz merged my patch as quickly as possible
    •   The product itself is much more stable than I expected (sorry)

                                 31
How we store documents?
• Most straightforward way as a document DB
 • 30 million documents, 4 MB limit each...
{
    # Internal Fields
    "__docid": 32132,    # Internal DocumentID (Indexed)
    "__arcid": 3,        # Internal ShardingID (Indexed)
    # Data Fields
    "title": "MongoDB 1.8 is released!",
    "content": "Single Server Durability is supported"
}
                            32
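The document layout above can be sketched as a small Python helper that attaches the two internal fields to the data fields. The helper name `to_stored_document`, the rejection of user fields starting with `__`, and the JSON-based size estimate are assumptions for illustration; the slides only show the resulting document.

```python
import json

MAX_DOC_BYTES = 4 * 1024 * 1024  # per-document limit from the slide ("4M", read as 4 MB)


def to_stored_document(docid, arcid, fields):
    """Attach the internal fields; hypothetical helper, not Sedue code."""
    reserved = [key for key in fields if key.startswith("__")]
    if reserved:
        # Assumption: names starting with '__' are reserved for internal use.
        raise ValueError(f"reserved field names: {reserved}")
    stored = {"__docid": docid, "__arcid": arcid, **fields}
    # JSON length is only a rough proxy for the BSON size the database enforces.
    if len(json.dumps(stored).encode("utf-8")) > MAX_DOC_BYTES:
        raise ValueError("document exceeds the 4 MB limit")
    return stored


doc = to_stored_document(
    32132,
    3,
    {"title": "MongoDB 1.8 is released!",
     "content": "Single Server Durability is supported"},
)
print(sorted(doc))  # ['__arcid', '__docid', 'content', 'title']
```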
DocID Numbering


• Counter by Atomic Increment Operation
 • docid++


                   33
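The essential property of the counter is that read, increment, and return happen as one indivisible step, so no two registrations ever receive the same DocID. The Python sketch below shows that property with an in-process lock. The slides do not say which MongoDB operation was used; an atomic update operator such as `$inc` is one mechanism that fits, but that is an assumption. The class name `DocIdCounter` is hypothetical.

```python
import threading


class DocIdCounter:
    """In-process stand-in for an atomically incremented counter."""

    def __init__(self, start=0):
        self._value = start
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:  # read-increment-return as one atomic step
            self._value += 1
            return self._value


counter = DocIdCounter()
ids = []


def worker():
    for _ in range(1000):
        ids.append(counter.next_id())


threads = [threading.Thread(target=worker) for _ in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(len(ids), len(set(ids)))  # 4000 4000
```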
Query
•   Query by DocumentID
    •   db.datadb.find({"__docid": 12345}) returns 1 doc
•   Query by ShardingID
    •   db.datadb.find({"__arcid": 3}) returns fewer than 3 million docs


•   These two fields are indexed!
    •   Usage is more like K-V lookup than complex querying
    •   A ShardingID query currently accesses the whole disk structure
        •   Splitting by collection would be ideal, but is harder to maintain
                                 34
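The two access paths amount to a unique lookup and a one-to-many lookup. The toy Python store below mirrors that with two dictionaries; it is an illustration only, since MongoDB uses B-tree indexes rather than hash maps, and the class name `DocStore` is hypothetical.

```python
from collections import defaultdict


class DocStore:
    """Toy store with two secondary indexes, mirroring __docid and __arcid."""

    def __init__(self):
        self.by_docid = {}                 # unique: docid -> document
        self.by_arcid = defaultdict(list)  # non-unique: arcid -> documents

    def insert(self, doc):
        self.by_docid[doc["__docid"]] = doc
        self.by_arcid[doc["__arcid"]].append(doc)

    def find_by_docid(self, docid):
        return self.by_docid.get(docid)

    def find_by_arcid(self, arcid):
        return list(self.by_arcid.get(arcid, []))


store = DocStore()
for docid in range(1, 7):
    store.insert({"__docid": docid, "__arcid": docid % 3, "title": f"doc {docid}"})

print(store.find_by_docid(4)["title"])  # doc 4
print(len(store.find_by_arcid(0)))      # 2
```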
Problems...



     35
Problem: Disk Consumption
• MongoDB consumes a lot of disk space
• It allocates some GBs (configurable) for the
  replication log
• Mostly append architecture
 • In-place modification is supported, if the new data is smaller
    than the original size
• No compression scheme
 • We want LZO/gzip support!
                         36
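Until the database compresses data itself, one possible workaround is to compress documents in the application before storing them. The Python sketch below does this with `zlib` from the standard library; the helper names `pack` and `unpack` are hypothetical, and this is not something the slides say Sedue does.

```python
import json
import zlib


def pack(document):
    """Serialize and compress a document before handing it to storage."""
    return zlib.compress(json.dumps(document).encode("utf-8"), 6)


def unpack(blob):
    """Reverse of pack(): decompress and parse."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


doc = {
    "title": "MongoDB 1.8 is released!",
    "content": "Single Server Durability is supported. " * 200,
}
raw = json.dumps(doc).encode("utf-8")
packed = pack(doc)

print(len(raw), len(packed))  # the compressed form is much smaller for repetitive text
```

The trade-off is that a compressed field is opaque to the database, so it can no longer be queried or indexed there; fields such as `__docid` would have to stay outside the compressed payload.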
Problem: Consistency
•   Fire-and-Forget Write Behavior
    •   Normally, a MongoDB insert does not confirm success on
        the server side
    •   You need to call getLastError() to confirm it, but this is slower
    •   In a replicated environment, you can specify the minimum
        number of servers that must acknowledge the write operation
•   The ReplicaSet mechanism is somewhat of a black box?
    •   What consistency does it provide? What is the fail-over mechanism?
    •   We finally chose master-master replication. But will it become
        obsolete?
                                  37
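The difference between a fire-and-forget write and a write that waits for a minimum number of acknowledgements can be shown with a toy model in Python. This is not MongoDB's replication protocol: the `Replica` class, the `write` function, and the meaning given to `w` here are simplifications introduced for illustration.

```python
class Replica:
    """Toy server that either applies a write or silently fails."""

    def __init__(self, up=True):
        self.up = up
        self.docs = []

    def apply(self, doc):
        if not self.up:
            return False
        self.docs.append(doc)
        return True


def write(replicas, doc, w=0):
    """w=0: fire-and-forget; w>=1: require at least w acknowledgements."""
    acks = sum(replica.apply(doc) for replica in replicas)
    if w == 0:
        return None  # the caller learns nothing about the outcome
    return acks >= w


replicas = [Replica(), Replica(up=False), Replica()]
print(write(replicas, {"__docid": 1}))       # None
print(write(replicas, {"__docid": 2}, w=2))  # True
print(write(replicas, {"__docid": 3}, w=3))  # False
```

With `w=0` the second replica's failure goes unnoticed; asking for more acknowledgements surfaces it, at the cost of waiting for the servers to respond.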
1 billion Docs in MongoDB
            38
Sharding
• Scaling without any application modification




                     39
Sharding
•   Test with 2 nodes (8 GB memory, 1 SATA disk)
    •   150 document registrations / sec
        •   Up to 50 million documents
    •   Gradually slowing down...
        •   More latency than the non-sharded setup
        •   More parallelism, more nodes?
•   These results are from an early 1.7 release
    •   Improved a lot by now?
                           40
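As a rough back-of-the-envelope check, the registration rate above can be turned into a load time. The Python snippet assumes the rate stays constant at 150 documents per second, which the slide says it does not (it gradually slows down), so the figures are lower bounds. The function name `load_time_days` is hypothetical.

```python
SECONDS_PER_DAY = 86400


def load_time_days(num_docs, docs_per_sec):
    """Days needed to register num_docs at a constant rate."""
    return num_docs / docs_per_sec / SECONDS_PER_DAY


print(round(load_time_days(50_000_000, 150), 1))     # 3.9
print(round(load_time_days(1_000_000_000, 150), 1))  # 77.2
```

At this rate, 50 million documents take almost four days and 1 billion would take about eleven weeks, which is why more parallelism or more nodes would be needed.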
Conclusion
• Sedue is a “Distributed Index-Query Engine”
 • Headache: frequently changing schema
• Sedue MongoDB
 • As DocumentRepository + Blob Storage
 • MongoDB handles real data well in some cases
 • Future: Sharding for larger deployments
                       41
We’re Hiring!
•   Engineers
    •   Core Search Engine Developer
        •   C++ Expert
        •   Distributed Systems Expert
    •   Professional Support and Service
        •   UNIX/Linux Expert
    •   Summer Intern Student
•   Contact Me
    •   kzk@preferred.jp , @kzk_mover
    •   PFI: @preferred_jp
    •   SedueTeam: @nobu_k, @eiichiroi, @repeatedly
                                   42

 
Hadoop in the Cloud – The What, Why and How from the Experts
Hadoop in the Cloud – The What, Why and How from the ExpertsHadoop in the Cloud – The What, Why and How from the Experts
Hadoop in the Cloud – The What, Why and How from the Experts
 
Hadoop in the cloud – The what, why and how from the experts
Hadoop in the cloud – The what, why and how from the expertsHadoop in the cloud – The what, why and how from the experts
Hadoop in the cloud – The what, why and how from the experts
 
Engineering practices in big data storage and processing
Engineering practices in big data storage and processingEngineering practices in big data storage and processing
Engineering practices in big data storage and processing
 
Spark - The beginnings
Spark -  The beginningsSpark -  The beginnings
Spark - The beginnings
 
SQLBits X SQL Server 2012 Rich Unstructured Data
SQLBits X SQL Server 2012 Rich Unstructured DataSQLBits X SQL Server 2012 Rich Unstructured Data
SQLBits X SQL Server 2012 Rich Unstructured Data
 
A JCR View of the World - adaptTo() 2012 Berlin
A JCR View of the World - adaptTo() 2012 BerlinA JCR View of the World - adaptTo() 2012 Berlin
A JCR View of the World - adaptTo() 2012 Berlin
 
Denodo Partner Connect: Technical Webinar - Ask Me Anything
Denodo Partner Connect: Technical Webinar - Ask Me AnythingDenodo Partner Connect: Technical Webinar - Ask Me Anything
Denodo Partner Connect: Technical Webinar - Ask Me Anything
 
OpenProdoc Overview
OpenProdoc OverviewOpenProdoc Overview
OpenProdoc Overview
 
Introduction to Hadoop Administration
Introduction to Hadoop AdministrationIntroduction to Hadoop Administration
Introduction to Hadoop Administration
 
Azure document db/Cosmos DB
Azure document db/Cosmos DBAzure document db/Cosmos DB
Azure document db/Cosmos DB
 
Big Data in the Cloud - The What, Why and How from the Experts
Big Data in the Cloud - The What, Why and How from the ExpertsBig Data in the Cloud - The What, Why and How from the Experts
Big Data in the Cloud - The What, Why and How from the Experts
 
Search all the things
Search all the thingsSearch all the things
Search all the things
 
The Search Is Over: Integrating Solr and Hadoop in the Same Cluster to Simpli...
The Search Is Over: Integrating Solr and Hadoop in the Same Cluster to Simpli...The Search Is Over: Integrating Solr and Hadoop in the Same Cluster to Simpli...
The Search Is Over: Integrating Solr and Hadoop in the Same Cluster to Simpli...
 
The Search Is Over: Integrating Solr and Hadoop in the Same Cluster to Simpli...
The Search Is Over: Integrating Solr and Hadoop in the Same Cluster to Simpli...The Search Is Over: Integrating Solr and Hadoop in the Same Cluster to Simpli...
The Search Is Over: Integrating Solr and Hadoop in the Same Cluster to Simpli...
 

More from Preferred Networks

PodSecurityPolicy からGatekeeper に移行しました / Kubernetes Meetup Tokyo #57
PodSecurityPolicy からGatekeeper に移行しました / Kubernetes Meetup Tokyo #57PodSecurityPolicy からGatekeeper に移行しました / Kubernetes Meetup Tokyo #57
PodSecurityPolicy からGatekeeper に移行しました / Kubernetes Meetup Tokyo #57
Preferred Networks
 
Optunaを使ったHuman-in-the-loop最適化の紹介 - 2023/04/27 W&B 東京ミートアップ #3
Optunaを使ったHuman-in-the-loop最適化の紹介 - 2023/04/27 W&B 東京ミートアップ #3Optunaを使ったHuman-in-the-loop最適化の紹介 - 2023/04/27 W&B 東京ミートアップ #3
Optunaを使ったHuman-in-the-loop最適化の紹介 - 2023/04/27 W&B 東京ミートアップ #3
Preferred Networks
 
Kubernetes + containerd で cgroup v2 に移行したら "failed to create fsnotify watcher...
Kubernetes + containerd で cgroup v2 に移行したら "failed to create fsnotify watcher...Kubernetes + containerd で cgroup v2 に移行したら "failed to create fsnotify watcher...
Kubernetes + containerd で cgroup v2 に移行したら "failed to create fsnotify watcher...
Preferred Networks
 
深層学習の新しい応用と、 それを支える計算機の進化 - Preferred Networks CEO 西川徹 (SEMICON Japan 2022 Ke...
深層学習の新しい応用と、 それを支える計算機の進化 - Preferred Networks CEO 西川徹 (SEMICON Japan 2022 Ke...深層学習の新しい応用と、 それを支える計算機の進化 - Preferred Networks CEO 西川徹 (SEMICON Japan 2022 Ke...
深層学習の新しい応用と、 それを支える計算機の進化 - Preferred Networks CEO 西川徹 (SEMICON Japan 2022 Ke...
Preferred Networks
 
Kubernetes ControllerをScale-Outさせる方法 / Kubernetes Meetup Tokyo #55
Kubernetes ControllerをScale-Outさせる方法 / Kubernetes Meetup Tokyo #55Kubernetes ControllerをScale-Outさせる方法 / Kubernetes Meetup Tokyo #55
Kubernetes ControllerをScale-Outさせる方法 / Kubernetes Meetup Tokyo #55
Preferred Networks
 
Kaggle Happywhaleコンペ優勝解法でのOptuna使用事例 - 2022/12/10 Optuna Meetup #2
Kaggle Happywhaleコンペ優勝解法でのOptuna使用事例 - 2022/12/10 Optuna Meetup #2Kaggle Happywhaleコンペ優勝解法でのOptuna使用事例 - 2022/12/10 Optuna Meetup #2
Kaggle Happywhaleコンペ優勝解法でのOptuna使用事例 - 2022/12/10 Optuna Meetup #2
Preferred Networks
 
最新リリース:Optuna V3の全て - 2022/12/10 Optuna Meetup #2
最新リリース:Optuna V3の全て - 2022/12/10 Optuna Meetup #2最新リリース:Optuna V3の全て - 2022/12/10 Optuna Meetup #2
最新リリース:Optuna V3の全て - 2022/12/10 Optuna Meetup #2
Preferred Networks
 
Optuna Dashboardの紹介と設計解説 - 2022/12/10 Optuna Meetup #2
Optuna Dashboardの紹介と設計解説 - 2022/12/10 Optuna Meetup #2Optuna Dashboardの紹介と設計解説 - 2022/12/10 Optuna Meetup #2
Optuna Dashboardの紹介と設計解説 - 2022/12/10 Optuna Meetup #2
Preferred Networks
 
スタートアップが提案する2030年の材料開発 - 2022/11/11 QPARC講演
スタートアップが提案する2030年の材料開発 - 2022/11/11 QPARC講演スタートアップが提案する2030年の材料開発 - 2022/11/11 QPARC講演
スタートアップが提案する2030年の材料開発 - 2022/11/11 QPARC講演
Preferred Networks
 
Deep Learningのための専用プロセッサ「MN-Core」の開発と活用(2022/10/19東大大学院「 融合情報学特別講義Ⅲ」)
Deep Learningのための専用プロセッサ「MN-Core」の開発と活用(2022/10/19東大大学院「 融合情報学特別講義Ⅲ」)Deep Learningのための専用プロセッサ「MN-Core」の開発と活用(2022/10/19東大大学院「 融合情報学特別講義Ⅲ」)
Deep Learningのための専用プロセッサ「MN-Core」の開発と活用(2022/10/19東大大学院「 融合情報学特別講義Ⅲ」)
Preferred Networks
 
PFNにおける研究開発(2022/10/19 東大大学院「融合情報学特別講義Ⅲ」)
PFNにおける研究開発(2022/10/19 東大大学院「融合情報学特別講義Ⅲ」)PFNにおける研究開発(2022/10/19 東大大学院「融合情報学特別講義Ⅲ」)
PFNにおける研究開発(2022/10/19 東大大学院「融合情報学特別講義Ⅲ」)
Preferred Networks
 
自然言語処理を 役立てるのはなぜ難しいのか(2022/10/25東大大学院「自然言語処理応用」)
自然言語処理を 役立てるのはなぜ難しいのか(2022/10/25東大大学院「自然言語処理応用」)自然言語処理を 役立てるのはなぜ難しいのか(2022/10/25東大大学院「自然言語処理応用」)
自然言語処理を 役立てるのはなぜ難しいのか(2022/10/25東大大学院「自然言語処理応用」)
Preferred Networks
 
Kubernetes にこれから入るかもしれない注目機能!(2022年11月版) / TechFeed Experts Night #7 〜 コンテナ技術を語る
Kubernetes にこれから入るかもしれない注目機能!(2022年11月版) / TechFeed Experts Night #7 〜 コンテナ技術を語るKubernetes にこれから入るかもしれない注目機能!(2022年11月版) / TechFeed Experts Night #7 〜 コンテナ技術を語る
Kubernetes にこれから入るかもしれない注目機能!(2022年11月版) / TechFeed Experts Night #7 〜 コンテナ技術を語る
Preferred Networks
 
Matlantis™のニューラルネットワークポテンシャルPFPの適用範囲拡張
Matlantis™のニューラルネットワークポテンシャルPFPの適用範囲拡張Matlantis™のニューラルネットワークポテンシャルPFPの適用範囲拡張
Matlantis™のニューラルネットワークポテンシャルPFPの適用範囲拡張
Preferred Networks
 
PFNのオンプレ計算機クラスタの取り組み_第55回情報科学若手の会
PFNのオンプレ計算機クラスタの取り組み_第55回情報科学若手の会PFNのオンプレ計算機クラスタの取り組み_第55回情報科学若手の会
PFNのオンプレ計算機クラスタの取り組み_第55回情報科学若手の会
Preferred Networks
 
続・PFN のオンプレML基盤の取り組み / オンプレML基盤 on Kubernetes 〜PFN、ヤフー〜 #2
続・PFN のオンプレML基盤の取り組み / オンプレML基盤 on Kubernetes 〜PFN、ヤフー〜 #2続・PFN のオンプレML基盤の取り組み / オンプレML基盤 on Kubernetes 〜PFN、ヤフー〜 #2
続・PFN のオンプレML基盤の取り組み / オンプレML基盤 on Kubernetes 〜PFN、ヤフー〜 #2
Preferred Networks
 
Kubernetes Service Account As Multi-Cloud Identity / Cloud Native Security Co...
Kubernetes Service Account As Multi-Cloud Identity / Cloud Native Security Co...Kubernetes Service Account As Multi-Cloud Identity / Cloud Native Security Co...
Kubernetes Service Account As Multi-Cloud Identity / Cloud Native Security Co...
Preferred Networks
 
KubeCon + CloudNativeCon Europe 2022 Recap / Kubernetes Meetup Tokyo #51 / #k...
KubeCon + CloudNativeCon Europe 2022 Recap / Kubernetes Meetup Tokyo #51 / #k...KubeCon + CloudNativeCon Europe 2022 Recap / Kubernetes Meetup Tokyo #51 / #k...
KubeCon + CloudNativeCon Europe 2022 Recap / Kubernetes Meetup Tokyo #51 / #k...
Preferred Networks
 
KubeCon + CloudNativeCon Europe 2022 Recap - Batch/HPCの潮流とScheduler拡張事例 / Kub...
KubeCon + CloudNativeCon Europe 2022 Recap - Batch/HPCの潮流とScheduler拡張事例 / Kub...KubeCon + CloudNativeCon Europe 2022 Recap - Batch/HPCの潮流とScheduler拡張事例 / Kub...
KubeCon + CloudNativeCon Europe 2022 Recap - Batch/HPCの潮流とScheduler拡張事例 / Kub...
Preferred Networks
 
独断と偏見で選んだ Kubernetes 1.24 の注目機能と今後! / Kubernetes Meetup Tokyo 50
独断と偏見で選んだ Kubernetes 1.24 の注目機能と今後! / Kubernetes Meetup Tokyo 50独断と偏見で選んだ Kubernetes 1.24 の注目機能と今後! / Kubernetes Meetup Tokyo 50
独断と偏見で選んだ Kubernetes 1.24 の注目機能と今後! / Kubernetes Meetup Tokyo 50
Preferred Networks
 

More from Preferred Networks (20)

PodSecurityPolicy からGatekeeper に移行しました / Kubernetes Meetup Tokyo #57
PodSecurityPolicy からGatekeeper に移行しました / Kubernetes Meetup Tokyo #57PodSecurityPolicy からGatekeeper に移行しました / Kubernetes Meetup Tokyo #57
PodSecurityPolicy からGatekeeper に移行しました / Kubernetes Meetup Tokyo #57
 
Optunaを使ったHuman-in-the-loop最適化の紹介 - 2023/04/27 W&B 東京ミートアップ #3
Optunaを使ったHuman-in-the-loop最適化の紹介 - 2023/04/27 W&B 東京ミートアップ #3Optunaを使ったHuman-in-the-loop最適化の紹介 - 2023/04/27 W&B 東京ミートアップ #3
Optunaを使ったHuman-in-the-loop最適化の紹介 - 2023/04/27 W&B 東京ミートアップ #3
 
Kubernetes + containerd で cgroup v2 に移行したら "failed to create fsnotify watcher...
Kubernetes + containerd で cgroup v2 に移行したら "failed to create fsnotify watcher...Kubernetes + containerd で cgroup v2 に移行したら "failed to create fsnotify watcher...
Kubernetes + containerd で cgroup v2 に移行したら "failed to create fsnotify watcher...
 
深層学習の新しい応用と、 それを支える計算機の進化 - Preferred Networks CEO 西川徹 (SEMICON Japan 2022 Ke...
深層学習の新しい応用と、 それを支える計算機の進化 - Preferred Networks CEO 西川徹 (SEMICON Japan 2022 Ke...深層学習の新しい応用と、 それを支える計算機の進化 - Preferred Networks CEO 西川徹 (SEMICON Japan 2022 Ke...
深層学習の新しい応用と、 それを支える計算機の進化 - Preferred Networks CEO 西川徹 (SEMICON Japan 2022 Ke...
 
Kubernetes ControllerをScale-Outさせる方法 / Kubernetes Meetup Tokyo #55
Kubernetes ControllerをScale-Outさせる方法 / Kubernetes Meetup Tokyo #55Kubernetes ControllerをScale-Outさせる方法 / Kubernetes Meetup Tokyo #55
Kubernetes ControllerをScale-Outさせる方法 / Kubernetes Meetup Tokyo #55
 
Kaggle Happywhaleコンペ優勝解法でのOptuna使用事例 - 2022/12/10 Optuna Meetup #2
Kaggle Happywhaleコンペ優勝解法でのOptuna使用事例 - 2022/12/10 Optuna Meetup #2Kaggle Happywhaleコンペ優勝解法でのOptuna使用事例 - 2022/12/10 Optuna Meetup #2
Kaggle Happywhaleコンペ優勝解法でのOptuna使用事例 - 2022/12/10 Optuna Meetup #2
 
最新リリース:Optuna V3の全て - 2022/12/10 Optuna Meetup #2
最新リリース:Optuna V3の全て - 2022/12/10 Optuna Meetup #2最新リリース:Optuna V3の全て - 2022/12/10 Optuna Meetup #2
最新リリース:Optuna V3の全て - 2022/12/10 Optuna Meetup #2
 
Optuna Dashboardの紹介と設計解説 - 2022/12/10 Optuna Meetup #2
Optuna Dashboardの紹介と設計解説 - 2022/12/10 Optuna Meetup #2Optuna Dashboardの紹介と設計解説 - 2022/12/10 Optuna Meetup #2
Optuna Dashboardの紹介と設計解説 - 2022/12/10 Optuna Meetup #2
 
スタートアップが提案する2030年の材料開発 - 2022/11/11 QPARC講演
スタートアップが提案する2030年の材料開発 - 2022/11/11 QPARC講演スタートアップが提案する2030年の材料開発 - 2022/11/11 QPARC講演
スタートアップが提案する2030年の材料開発 - 2022/11/11 QPARC講演
 
Deep Learningのための専用プロセッサ「MN-Core」の開発と活用(2022/10/19東大大学院「 融合情報学特別講義Ⅲ」)
Deep Learningのための専用プロセッサ「MN-Core」の開発と活用(2022/10/19東大大学院「 融合情報学特別講義Ⅲ」)Deep Learningのための専用プロセッサ「MN-Core」の開発と活用(2022/10/19東大大学院「 融合情報学特別講義Ⅲ」)
Deep Learningのための専用プロセッサ「MN-Core」の開発と活用(2022/10/19東大大学院「 融合情報学特別講義Ⅲ」)
 
PFNにおける研究開発(2022/10/19 東大大学院「融合情報学特別講義Ⅲ」)
PFNにおける研究開発(2022/10/19 東大大学院「融合情報学特別講義Ⅲ」)PFNにおける研究開発(2022/10/19 東大大学院「融合情報学特別講義Ⅲ」)
PFNにおける研究開発(2022/10/19 東大大学院「融合情報学特別講義Ⅲ」)
 
自然言語処理を 役立てるのはなぜ難しいのか(2022/10/25東大大学院「自然言語処理応用」)
自然言語処理を 役立てるのはなぜ難しいのか(2022/10/25東大大学院「自然言語処理応用」)自然言語処理を 役立てるのはなぜ難しいのか(2022/10/25東大大学院「自然言語処理応用」)
自然言語処理を 役立てるのはなぜ難しいのか(2022/10/25東大大学院「自然言語処理応用」)
 
Kubernetes にこれから入るかもしれない注目機能!(2022年11月版) / TechFeed Experts Night #7 〜 コンテナ技術を語る
Kubernetes にこれから入るかもしれない注目機能!(2022年11月版) / TechFeed Experts Night #7 〜 コンテナ技術を語るKubernetes にこれから入るかもしれない注目機能!(2022年11月版) / TechFeed Experts Night #7 〜 コンテナ技術を語る
Kubernetes にこれから入るかもしれない注目機能!(2022年11月版) / TechFeed Experts Night #7 〜 コンテナ技術を語る
 
Matlantis™のニューラルネットワークポテンシャルPFPの適用範囲拡張
Matlantis™のニューラルネットワークポテンシャルPFPの適用範囲拡張Matlantis™のニューラルネットワークポテンシャルPFPの適用範囲拡張
Matlantis™のニューラルネットワークポテンシャルPFPの適用範囲拡張
 
PFNのオンプレ計算機クラスタの取り組み_第55回情報科学若手の会
PFNのオンプレ計算機クラスタの取り組み_第55回情報科学若手の会PFNのオンプレ計算機クラスタの取り組み_第55回情報科学若手の会
PFNのオンプレ計算機クラスタの取り組み_第55回情報科学若手の会
 
続・PFN のオンプレML基盤の取り組み / オンプレML基盤 on Kubernetes 〜PFN、ヤフー〜 #2
続・PFN のオンプレML基盤の取り組み / オンプレML基盤 on Kubernetes 〜PFN、ヤフー〜 #2続・PFN のオンプレML基盤の取り組み / オンプレML基盤 on Kubernetes 〜PFN、ヤフー〜 #2
続・PFN のオンプレML基盤の取り組み / オンプレML基盤 on Kubernetes 〜PFN、ヤフー〜 #2
 
Kubernetes Service Account As Multi-Cloud Identity / Cloud Native Security Co...
Kubernetes Service Account As Multi-Cloud Identity / Cloud Native Security Co...Kubernetes Service Account As Multi-Cloud Identity / Cloud Native Security Co...
Kubernetes Service Account As Multi-Cloud Identity / Cloud Native Security Co...
 
KubeCon + CloudNativeCon Europe 2022 Recap / Kubernetes Meetup Tokyo #51 / #k...
KubeCon + CloudNativeCon Europe 2022 Recap / Kubernetes Meetup Tokyo #51 / #k...KubeCon + CloudNativeCon Europe 2022 Recap / Kubernetes Meetup Tokyo #51 / #k...
KubeCon + CloudNativeCon Europe 2022 Recap / Kubernetes Meetup Tokyo #51 / #k...
 
KubeCon + CloudNativeCon Europe 2022 Recap - Batch/HPCの潮流とScheduler拡張事例 / Kub...
KubeCon + CloudNativeCon Europe 2022 Recap - Batch/HPCの潮流とScheduler拡張事例 / Kub...KubeCon + CloudNativeCon Europe 2022 Recap - Batch/HPCの潮流とScheduler拡張事例 / Kub...
KubeCon + CloudNativeCon Europe 2022 Recap - Batch/HPCの潮流とScheduler拡張事例 / Kub...
 
独断と偏見で選んだ Kubernetes 1.24 の注目機能と今後! / Kubernetes Meetup Tokyo 50
独断と偏見で選んだ Kubernetes 1.24 の注目機能と今後! / Kubernetes Meetup Tokyo 50独断と偏見で選んだ Kubernetes 1.24 の注目機能と今後! / Kubernetes Meetup Tokyo 50
独断と偏見で選んだ Kubernetes 1.24 の注目機能と今後! / Kubernetes Meetup Tokyo 50
 

Recently uploaded

20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
Matthew Sinclair
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
Alpen-Adria-Universität
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems S.M.S.A.
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
Uni Systems S.M.S.A.
 
Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
Mariano Tinti
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
Daiki Mogmet Ito
 
Infrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI modelsInfrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI models
Zilliz
 
OpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - AuthorizationOpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - Authorization
David Brossard
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
Ivanti
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
Zilliz
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
Zilliz
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
DianaGray10
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
Matthew Sinclair
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
innovationoecd
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Tosin Akinosho
 
AI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdf
AI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdfAI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdf
AI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdf
Techgropse Pvt.Ltd.
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
Jason Packer
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 

Recently uploaded (20)

20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
 
Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
 
Infrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI modelsInfrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI models
 
OpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - AuthorizationOpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - Authorization
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
 
AI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdf
AI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdfAI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdf
AI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdf
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 

MongoDB as Search Engine Repository @ MongoTokyo2011

  • 1. MongoDB as Search Engine Document Repository
      Preferred Infrastructure, Inc., CTO
      Kazuki Ohta <kzk@preferred.jp>
      http://kzk9.net/
  • 2. Self Introduction
      • Kazuki Ohta, CTO at Preferred Infrastructure, Inc. (http://preferred.jp)
          • Interested in Data Intensive Computing
          • Graduated from U-Tokyo in 2010 (System Software)
              • Parallel I/O Middleware for Massively Parallel HPC Environment
              • Summer Intern @ Argonne National Laboratory
          • ACM ICPC
          • Hadoop User Group (http://hugjp.org/)
      • Personal Site
          • http://kzk9.net/, @kzk_mover
  • 8. Agenda
      • Introduction of Sedue
      • Problems We Had
      • How MongoDB Solved
      • Problems in the Integration Phase
      • Future Insight
  • 10. Sedue Search Engine
      • Enterprise Distributed Search Engine
          • Developed at Preferred Infrastructure, Inc.
          • Multi-threaded C++ Server (0.3 million lines)
          • Often Handles Midscale Contents
              • 50 million documents/items
      • Around 30 customers
          • Media, Ad, E-Commerce, Digital Library, etc.
  • 11. Sedue Data Model
      • Fixed Schema over De-Normalized Data
          • Field Definition + Index Definition
          • How the data is stored (name? type?)
          • How the data is indexed

      ArticleID   Title        Content
      ID123       iPad2        iPad2 is coming!
      ID124       MongoDB      Durable in Single Server!
      ID125       MongoTokyo   Today!

      (Diagram labels: Search, Recommend, Filter, Query)
  • 12. Sedue Schema (Sample)
      <schema>
        <fields>
          <field name="article_id" type="string" />
          <field name="title" type="string" />
          <field name="content" type="string" />
          <field name="date" type="datetime" />
        </fields>
        <indexes>
          <index name="search" type="invertedindex" target="content" />
          <index name="recommend" type="doc2doc" target="title, content" />
        </indexes>
      </schema>
  • 13. Sedue Query (Sample)
      (search:iPad2)?date<today()?sort=date:desc
      Structure: QueryText ? Filter ? Sort

      ArticleID   Title        Content                     date
      ID123       iPad2        iPad2 is coming!            today
      ID124       MongoDB      Durable in Single Server!   today
      ID125       MongoTokyo   Today!                      yesterday
  • 14. Sedue Query (Sample)
      ((search:iPad2)&(search:coming))?date<today()?sort=date:desc
      (same QueryText ? Filter ? Sort structure and sample table as slide 13)
  • 15. Sedue Query (Sample)
      (recommend:ID124)?date<today()?sort=date:desc
      (same QueryText ? Filter ? Sort structure and sample table as slide 13)
  • 16. This Data Model is Mapped to The Distributed System
  • 25. Sedue Architecture (diagram; labels in the order they appear across build slides 17-25): Crawler, Distributed Repository, Document Repository Proxy, Indexer, Distributed File System (DFS), Searcher, Query Server, User, Archive Manager
  • 26. Sedue Architecture
      • "Distributed Index-Query Mechanism"
          • Create indices, distribute them, query with them
          • Most types of search/recommendation algorithms fit into this architecture
          • In other words: "Distributed Column-Oriented Database"
      • Once you put the documents into Sedue, you can use search and recommendation in one system
      • Registration and querying are done via a REST API
  • 27. OK, now we have developed the Distributed Index-Query Engine!
  • 33. However...
      • THE PROBLEM: THE REAL WORLD
          • Schema is changed once a week.
          • Real data lacks most columns
              • Especially in building vertical search over many sites (each has its own schema)
          • High Availability is required in some cases
  • 34. Especially, Cross-Site Search (site logos: BP, ITPro, NikkeiBusiness Online, PC Online, TechOn, Kenplatz, ECO Japan, BPNet)
  • 35. (Cross-site table)
      Columns: ArticleID, Title, Content, date, FlagA, FlagB, FlagC, FlagXX
      Rows (empty cells omitted):
      ID123 iPad2 iPad2 is coming! today 1
      ID124 MongoDB Durable in Single Server! today
      ID125 MongoTokyo Today! yesterday 0
      ID126 HBase 0.90 is out! 1
      ID127 Cassandra 1
      ID128 CouchDB
      ID129 Ruby today
      ID130 Python N/A 0
      ID131 Haskell N/A 1
      ID132 D-Lang N/A
  • 36. (Same table as slide 35, annotated) Sparse!!!
  • 37. One Lucky Thing: "Pluggable Storage Strategy"
  • 38. Pluggable Storage Strategy
      • Important: We want to focus on developing application servers
          • We're the search engine company, not the database company
      • DocumentRepository and DistributedFileSystem are pluggable!
          • Many, many NoSQL storages are emerging
          • Prepare a simple interface on top of them
      • You can select the underlying storage technology by the requirements of the system itself
          • by document volume, availability, consistency, etc.
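
The "simple interface" idea on slide 38 could look roughly like the Python sketch below. It is only an illustration: the slides do not show Sedue's actual C++ interface, so the names `DocumentRepository`, `put`, `get`, `InMemoryRepository`, and `make_repository` are all hypothetical, and the in-memory backend merely stands in for a real adapter (Tokyo Cabinet, MySQL, MongoDB).

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class DocumentRepository(ABC):
    """Hypothetical minimal storage interface (not Sedue's real API)."""

    @abstractmethod
    def put(self, doc_id: int, doc: Dict[str, object]) -> None:
        """Store or overwrite one document."""

    @abstractmethod
    def get(self, doc_id: int) -> Optional[Dict[str, object]]:
        """Return the document, or None if it does not exist."""


class InMemoryRepository(DocumentRepository):
    """Stand-in backend; a real adapter would implement the same two methods."""

    def __init__(self) -> None:
        self._docs: Dict[int, Dict[str, object]] = {}

    def put(self, doc_id: int, doc: Dict[str, object]) -> None:
        self._docs[doc_id] = dict(doc)  # copy so callers cannot mutate stored data

    def get(self, doc_id: int) -> Optional[Dict[str, object]]:
        doc = self._docs.get(doc_id)
        return dict(doc) if doc is not None else None


def make_repository(kind: str) -> DocumentRepository:
    """Pick a backend by name, e.g. from a deployment's configuration."""
    backends = {"memory": InMemoryRepository}  # assumed registry; extend per backend
    if kind not in backends:
        raise ValueError(f"unknown repository backend: {kind}")
    return backends[kind]()


repo = make_repository("memory")
repo.put(123, {"title": "iPad2", "content": "iPad2 is coming!"})
```

Because the application code only depends on `DocumentRepository`, swapping the storage technology becomes a configuration choice rather than a code change, which is the point the slide makes.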
  • 39. At first... (Repository)
      Columns: API, Replication, Online Column Addition, Sharding
      Marks per row, left to right:
      Tokyo Cabinet (Table DB): ○ × ○ ×
      MySQL: × ○
      Unfortunately, TokyoTyrant didn't support the Table Database at that time.
  • 40. At first... (DFS)
      Columns: API, Setup, Availability, Performance
      Entries per row, left to right:
      NFS: POSIX, ○, costly, costly
      HDFS: libhdfs, ○, sucks
  • 42. http://www.mongodb.org/
      • OSS Document-Oriented Database
      • No Schema, BSON, Rich Query + B-Tree Index
      • Written in C++
      • C, C++, Java, PHP, Python, Ruby COOL drivers
      • Embedded JavaScript Engine
      • db.insert({"category":" "}, MongoDB Sharding {" ": " "})
      • db.articles.find({"category": " "})
      • High Availability by ReplicaSet
      • High Scalability by Auto-Sharding
  • 43. As Repository
      Columns: API, Replication, Online Column Addition, Sharding
      Marks per row, left to right:
      Tokyo Cabinet (Table DB): ○ × ○ ×
      MySQL: × ○
      MongoDB: ○ ○ ○ ongoing (master-master)
  • 44. GridFS
      • MongoDB as Blob-Storage
      • The content is split into 256 KB chunks, with some metadata.
      • Performance is not as high as HDFS, but still useful in mid-scale deployments.
      (Diagram labels: Large Blob, Metadata, Chunk0, Chunk1)
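
The chunking idea on slide 44 can be sketched in a few lines of Python. This is a self-contained illustration of splitting a blob into 256 KB chunks plus one metadata record, not the GridFS implementation itself; the function names and the dictionary keys (`filename`, `length`, `chunk_size`, `n`, `data`) are assumptions chosen for readability.

```python
from typing import Dict, List, Tuple

CHUNK_SIZE = 256 * 1024  # 256 KB, the chunk size quoted on the slide


def split_blob(name: str, data: bytes,
               chunk_size: int = CHUNK_SIZE) -> Tuple[Dict, List[Dict]]:
    """Return one metadata record and the ordered list of chunk records."""
    metadata = {"filename": name, "length": len(data), "chunk_size": chunk_size}
    chunks = [
        {"filename": name, "n": index, "data": data[offset:offset + chunk_size]}
        for index, offset in enumerate(range(0, len(data), chunk_size))
    ]
    return metadata, chunks


def join_blob(metadata: Dict, chunks: List[Dict]) -> bytes:
    """Reassemble the blob; chunks may arrive in any order."""
    ordered = sorted(chunks, key=lambda chunk: chunk["n"])
    data = b"".join(chunk["data"] for chunk in ordered)
    if len(data) != metadata["length"]:
        raise ValueError("chunks do not match the recorded length")
    return data


blob = bytes(600 * 1024)  # a 600 KB blob of zero bytes
meta, parts = split_blob("index.bin", blob)
```

Each chunk stays small enough to be stored as an ordinary document, while the metadata record lets a reader verify that every chunk was retrieved.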
  • 45. As DFS
      Columns: API, Setup, Availability, Performance
      Entries per row, left to right:
      NFS: POSIX, ○, costly, costly
      HDFS: libhdfs, ○, sucks
      GridFS: C++, ○, ○
  • 46. Now Sedue MongoDB
      • Used in Multiple Ways
      • Repository + DFS
      • Easy setup!!!
      • 30 million documents
      • No schema change is required
      • Master-Master Replication
      • Backup once a week
      • 4 Production Deployments
      • 1 year
      (Diagram labels: User, Sedue, Repository, DFS, MongoDB 1.6 (Master-Master Replication))
  • 47. We had issues, but MongoDB is OSS!
      • SERVER-1408 (Fixed)
          • C++ Driver GridFS cannot store over 4G object.
      • SERVER-1372 (Fixed)
          • NULL check for auto_ptr<DBClientConnection> is missing
      • SERVER-1328 (Fixed)
          • scons install doesn't end with --prefix parameter?
      • SERVER-1232 (Fixed)
          • C++ GridFS Client should support larger Chunk Size
      • SERVER-2050
          • Enables ScopedDbConnection to set the timeout.
  • 49. How Long?
      • Prototype version was done in one week
          • using the C++ client API
          • about 500 lines
      • Production release in about 2 months
          • including bugfixes
      • The mongo-user ML is really responsive
          • Eliot Horowitz merged my patch as quickly as possible
      • The product itself is much more stable than I expected (sorry)
  • 50. How we store documents? • Most Straightforward Way as Document DB • 30 million documents, 4 MB limit each... { # Internal Fields “__docid”: 32132 (Internal DocumentID, Indexed), “__arcid”: 3 (Internal ShardingID, Indexed), # Data Fields “title”: “MongoDB 1.8 is released!”, “content”: “Single Server Durability is supported” } 32
  • 51. DocID Numbering • Counter by Atomic Increment Operation • docid++ 33
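The slide does not say which operation backs the counter, so the following is only an in-process stand-in for the "docid++" idea: a lock-protected counter that hands out unique, gap-free IDs even when several writers ask at once. The class and function names are hypothetical. In MongoDB itself a similar effect is commonly obtained with an atomic `$inc` update, but that is an assumption about the setup, not something the slides confirm.

```python
import threading
from typing import List


class DocIdCounter:
    """Hands out increasing document IDs; next_id() is atomic."""

    def __init__(self, start: int = 0) -> None:
        self._value = start
        self._lock = threading.Lock()

    def next_id(self) -> int:
        # Increment and read under one lock so no two callers share an ID.
        with self._lock:
            self._value += 1
            return self._value


def allocate_concurrently(counter: DocIdCounter, workers: int,
                          per_worker: int) -> List[int]:
    """Let several threads draw IDs at once and collect everything they got."""
    collected: List[int] = []
    collected_lock = threading.Lock()

    def work() -> None:
        local = [counter.next_id() for _ in range(per_worker)]
        with collected_lock:
            collected.extend(local)

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return collected
```

The property that matters for a DocID is that concurrent callers never see the same value, which is what the lock guarantees here.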
  • 52. Query • Query by DocumentID • db.datadb.find({“__docid”: 12345}) = 1 doc • Query by ShardingID • db.datadb.find({“__arcid”: 3}) = < 3 million docs • These two fields are indexed! • Usage is more like K-V lookup than complex queries • A ShardingID query currently touches the whole on-disk structure • Splitting by collection would be ideal, but is harder to maintain 34
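The access pattern described on the Query slide (two indexed fields, used like key-value lookups) can be modelled with a toy in-memory store. This is a sketch for illustration, not how MongoDB's B-tree indexes work internally; the `DocStore` class and its method names are hypothetical, while the field names `__docid` and `__arcid` come from the slides.

```python
from collections import defaultdict
from typing import Dict, List, Optional


class DocStore:
    """Toy store with one unique index and one grouping index."""

    def __init__(self) -> None:
        self._by_docid: Dict[int, dict] = {}
        self._by_arcid: Dict[int, List[dict]] = defaultdict(list)

    def insert(self, doc: dict) -> None:
        # Maintain both indexes on every insert.
        self._by_docid[doc["__docid"]] = doc
        self._by_arcid[doc["__arcid"]].append(doc)

    def find_by_docid(self, docid: int) -> Optional[dict]:
        """Lookup by DocumentID: at most one document."""
        return self._by_docid.get(docid)

    def find_by_arcid(self, arcid: int) -> List[dict]:
        """Lookup by ShardingID: every document in that shard."""
        return list(self._by_arcid.get(arcid, []))
```

A DocumentID lookup returns a single document, while a ShardingID lookup returns a whole group, which is why the second query is the expensive one at millions of documents per shard.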
  • 54. Problem: Disk Consumption • MongoDB consumes a lot of disk space • Allocates some GBs (configurable) for the replication logs • Mostly append-based architecture • In-place modification is supported if the new data is smaller than the original size • No compression scheme • want LZO/gzip support! 36
  • 55. Problem: Consistency • Fire-and-Forget Write Behavior • Normally, a MongoDB insert doesn’t confirm success on the server side • Need to call getLastError() to confirm it, but that is slower • In a replicated environment, you can specify the minimum number of servers on which the write must succeed • ReplicaSet mechanism is somewhat of a black box? • What consistency does it provide? What is the fail-over mechanism? • Finally chose master-master replication. But will it become obsolete? 37
  • 56. 1 billion Docs in MongoDB 38
  • 57. Sharding • Scaling without any application modification 39
  • 58. Sharding • Test with 2 nodes (8G mem, 1 SATA disk) • 150 document registrations / sec • Up to 50 million documents • Gradually slowing down... • More latency than non-sharding setup • More parallelism, more nodes? • These results are from an early 1.7 release • Now enhanced a lot? 40
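To put the measured rate in perspective, here is a back-of-the-envelope calculation. It assumes the 150 documents/sec rate stays constant, which the slide says it does not (registration gradually slows down), so the results are best read as lower bounds. The function name is hypothetical.

```python
SECONDS_PER_DAY = 86_400


def load_time_days(total_docs: int, docs_per_sec: float) -> float:
    """Days needed to register total_docs at a constant rate."""
    if docs_per_sec <= 0:
        raise ValueError("docs_per_sec must be positive")
    return total_docs / docs_per_sec / SECONDS_PER_DAY


# 50 million documents, the size reached in the 2-node test.
days_50m = load_time_days(50_000_000, 150)

# 1 billion documents, the target named on the earlier slide.
days_1b = load_time_days(1_000_000_000, 150)

print(f"50 million docs: {days_50m:.1f} days")
print(f"1 billion docs:  {days_1b:.1f} days")
```

Under this assumption, 50 million documents take just under 4 days and 1 billion take about 77 days, which suggests why more parallelism and more nodes are raised as the next step.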
  • 59. Conclusion • Sedue is a “Distributed Index-Query Engine” • Headache about Frequently Changing Schema • Sedue MongoDB • As Document Repository + Blob Storage • MongoDB handles real data well in some cases • Future: Sharding for Larger Deployments 41
  • 60. We’re Hiring! • Engineers • Core Search Engine Developer • C++ Expert • Distributed Systems Expert • Professional Support and Service • UNIX/Linux Expert • Summer Intern Student • Contact Me • kzk@preferred.jp , @kzk_mover • PFI: @preferred_jp • SedueTeam: @nobu_k, @eiichiroi, @repeatedly 42
