10. Sedue Search Engine
• Enterprise Distributed Search Engine
• Developed at Preferred Infrastructure, Inc.
• Multi-threaded C++ Server (0.3 million lines)
• Typically Handles Mid-Scale Content
• 50 million documents/items
• Around 30 customers
• Media, Ad, E-Commerce, Digital Library, etc.
11. Sedue Data Model
• Fixed Schema over De-Normalized Data
• Field Definition + Index Definition
• How the data is stored (name? type?)
• How the data is indexed
ArticleID | Title | Content
ID123 | iPad2 | iPad2 is coming!
ID124 | MongoDB | Durable in Single Server!
ID125 | MongoTokyo | Today!
[Figure labels: Search, Recommend, Filter, Query]
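The split between field definitions and index definitions can be sketched roughly as follows. This is only an illustration, not Sedue's actual configuration format: the names `FieldDef`, `IndexDef`, `Schema`, and the index kinds are hypothetical, chosen to mirror the example table.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FieldDef:
    name: str        # how the data is stored: the column name...
    py_type: type    # ...and its type

@dataclass(frozen=True)
class IndexDef:
    field: str       # which field the index is built over
    kind: str        # hypothetical index kinds, e.g. "search", "recommend"

@dataclass
class Schema:
    fields: list
    indexes: list

    def validate(self, doc: dict) -> list:
        """Return a list of problems; a fixed schema flags missing or mistyped fields."""
        problems = []
        for f in self.fields:
            if f.name not in doc:
                problems.append(f"missing field: {f.name}")
            elif not isinstance(doc[f.name], f.py_type):
                problems.append(f"wrong type for {f.name}")
        return problems

schema = Schema(
    fields=[FieldDef("ArticleID", str), FieldDef("Title", str), FieldDef("Content", str)],
    indexes=[IndexDef("Title", "search"), IndexDef("Content", "search"),
             IndexDef("Content", "recommend")],
)

ok = schema.validate({"ArticleID": "ID123", "Title": "iPad2", "Content": "iPad2 is coming!"})
bad = schema.validate({"ArticleID": "ID126", "Title": "No content yet"})
```

With a fixed schema like this, any document that lacks a declared field has to be rejected or padded before it can be registered.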
25. Sedue Architecture
[Figure: architecture diagram. Labels: User, Crawler, Proxy, Searcher, Indexer, Archive Manager, Document, Query, Repository, Server, Distributed Repository, Distributed File System (DFS)]
26. Sedue Architecture
• “Distributed Index-Query Mechanism”
• Create indices, distribute them, query with them
• Most types of search/recommendation algorithms fit into this architecture
• In other words: “Distributed Column-Oriented Database”
• Once you put documents into Sedue, you can use search and recommendation in one system
• Register/Query is done via REST API
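The "create indices, distribute them, query with them" idea can be sketched in a few lines. This is a toy model under stated assumptions, not Sedue's implementation: `IndexShard`, `Proxy`, the whitespace tokenizer, and the modulo placement rule are all hypothetical, and the real system separates indexers, searchers, and storage.

```python
from collections import defaultdict

class IndexShard:
    """One index shard: a tiny inverted index over the documents assigned to it."""
    def __init__(self):
        self.postings = defaultdict(set)   # term -> set of document IDs

    def add(self, doc_id, text):
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def query(self, term):
        # .get() avoids creating empty entries for unknown terms
        return self.postings.get(term.lower(), set())

class Proxy:
    """Routes registrations to shards and merges per-shard query results."""
    def __init__(self, num_shards):
        self.shards = [IndexShard() for _ in range(num_shards)]

    def register(self, doc_id, text):
        # Distribute: pick a shard by document ID (hypothetical placement rule).
        self.shards[doc_id % len(self.shards)].add(doc_id, text)

    def search(self, term):
        # Query every shard and merge the partial results.
        hits = set()
        for shard in self.shards:
            hits |= shard.query(term)
        return sorted(hits)

proxy = Proxy(num_shards=2)
proxy.register(123, "iPad2 is coming!")
proxy.register(124, "MongoDB Durable in Single Server!")
proxy.register(125, "MongoTokyo Today!")
result = proxy.search("mongodb")
```

A recommendation index would plug into the same register/query shape; only the per-shard data structure changes.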
33. However...
• THE PROBLEM: THE REAL WORLD
• The schema changes once a week.
• Real data is missing most columns
• Especially when building vertical search over many sites (each has its own schema)
• High Availability is required in some cases
38. Pluggable Storage Strategy
• Important: We want to focus on developing application servers
• we’re a search engine company, not a database company
• DocumentRepository and DistributedFileSystem are pluggable!
• Many, many NoSQL storages are emerging
• Provide a simple interface on top of them
• You can select the underlying storage technology based on the requirements of the system itself
• by document volume, availability, consistency, etc.
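A pluggable repository can be as small as an abstract interface plus a registry of backends. The sketch below is an assumption about the shape of such an interface, not Sedue's actual C++ API: the method names `put`/`get`, the `BACKENDS` registry, and `open_repository` are hypothetical, and the in-memory class merely stands in for a real storage engine.

```python
from abc import ABC, abstractmethod
from typing import Optional

class DocumentRepository(ABC):
    """Hypothetical minimal interface; the application server codes against this only."""

    @abstractmethod
    def put(self, doc_id: int, doc: dict) -> None: ...

    @abstractmethod
    def get(self, doc_id: int) -> Optional[dict]: ...

class InMemoryRepository(DocumentRepository):
    """Stand-in backend; a real one would wrap Tokyo Cabinet, MySQL, MongoDB, etc."""

    def __init__(self):
        self._docs = {}

    def put(self, doc_id, doc):
        self._docs[doc_id] = dict(doc)   # copy so callers cannot mutate stored data

    def get(self, doc_id):
        doc = self._docs.get(doc_id)
        return dict(doc) if doc is not None else None

# Backend is chosen by name, e.g. from a deployment's configuration.
BACKENDS = {"memory": InMemoryRepository}

def open_repository(kind: str) -> DocumentRepository:
    return BACKENDS[kind]()

repo = open_repository("memory")
repo.put(1, {"title": "iPad2"})
```

Keeping the interface this narrow is what makes it cheap to swap the storage when volume, availability, or consistency requirements change.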
39. At first... (Repository)
Columns: API | Replication | Online Column Addition | Sharding
Tokyo Cabinet (Table DB): ○ | × | ○ | ×
MySQL: × ○ [columns unclear]
Unfortunately, Tokyo Tyrant didn’t support the Table Database at that time.
40. At first... (DFS)
Columns: API | Setup | Availability | Performance
NFS: POSIX | ○ | costly | costly
HDFS: libhdfs (sucks) | ○ [column unclear]
42. http://www.mongodb.org/
• OSS Document-Oriented Database
• No Schema, BSON, Rich Query + B-Tree Index
• written in C++
• C, C++, Java, PHP, Python, Ruby COOL drivers
• Embedded JavaScript Engine
• db.insert({"category": " "}, {" ": " "})
• db.articles.find({"category": " "})
• High Availability by ReplicaSet
• High Scalability by Auto-Sharding
43. As Repository
Columns: API | Replication | Online Column Addition | Sharding
Tokyo Cabinet (Table DB): ○ | × | ○ | ×
MySQL: × ○ [columns unclear]
MongoDB: ○ | ○ (master-master) | ○ | ongoing
44. GridFS
• MongoDB as Blob-Storage
• The content is split into 256 KB chunks, stored with some metadata.
• Performance is not as high as HDFS, but still useful in mid-scale deployments.
[Figure: Large Blob stored as Metadata, Chunk0, Chunk1]
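The chunking idea itself is simple enough to sketch without a database. The code below is an illustration in the spirit of GridFS, not the GridFS implementation: the function names and the dictionary field names (`file`, `n`, `data`, `length`, `chunk_size`) are simplified and hypothetical.

```python
CHUNK_SIZE = 256 * 1024  # 256 KB, the chunk size quoted above

def split_blob(name: str, data: bytes, chunk_size: int = CHUNK_SIZE):
    """Return (metadata, chunks): one metadata record plus numbered chunk records."""
    chunks = [
        {"file": name, "n": i, "data": data[off:off + chunk_size]}
        for i, off in enumerate(range(0, len(data), chunk_size))
    ]
    metadata = {"file": name, "length": len(data), "chunk_size": chunk_size}
    return metadata, chunks

def join_blob(metadata, chunks):
    """Reassemble the blob by ordering chunks on their sequence number."""
    data = b"".join(c["data"] for c in sorted(chunks, key=lambda c: c["n"]))
    if len(data) != metadata["length"]:
        raise ValueError("chunks do not add up to the recorded length")
    return data

blob = bytes(600 * 1024)                      # 600 KB of zero bytes
meta, parts = split_blob("index.bin", blob)   # two full chunks plus one partial chunk
restored = join_blob(meta, list(reversed(parts)))
```

Because every chunk is an ordinary small record, a blob store built this way inherits whatever replication the underlying document store provides.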
45. As DFS
Columns: API | Setup | Availability | Performance
NFS: POSIX | ○ | costly | costly
HDFS: libhdfs (sucks) | ○ [column unclear]
GridFS: C++ | ○ ○ [columns unclear]
46. Now Sedue MongoDB
• Used in Multiple Ways
• Repository + DFS
• Easy setup!!!
• 30 million documents
• No Schema change is required
• Master-Master Replication
• Backup once a week
• 4 Production Deployments
• 1 year
[Figure labels: User, Sedue, Repository, DFS, MongoDB 1.6 (Master-Master Replication)]
47. We had issues, but MongoDB is OSS!
• SERVER-1408 (Fixed)
• C++ Driver GridFS cannot store over 4G object.
• SERVER-1372 (Fixed)
• NULL check for auto_ptr<DBClientConnection> is missing
• SERVER-1328 (Fixed)
• scons install doesn't end with --prefix parameter?
• SERVER-1232 (Fixed)
• C++ GridFS Client should support larger Chunk Size
• SERVER-2050
• Enables ScopedDbConnection to set the timeout.
49. How Long?
• Prototype version was done in one week
• using C++ client API
• about 500 lines
• Production release in about 2 months
• including bugfixes
• mongo-user ML is really responsive
• Eliot Horowitz merged my patch as quickly as possible
• The product itself is much more stable than I expected (sorry)
50. How do we store documents?
• The most straightforward way, as a Document DB
• 30 million documents, 4 MB limit each...
{
  # Internal Fields
  "__docid": 32132,   # Internal DocumentID (Indexed)
  "__arcid": 3,       # Internal ShardingID (Indexed)
  # Data Fields
  "title": "MongoDB 1.8 is released!",
  "content": "Single Server Durability is supported"
}
52. Query
• Query by DocumentID
• db.datadb.find({"__docid": 12345}) = 1 doc
• Query by ShardingID
• db.datadb.find({"__arcid": 3}) = fewer than 3 million docs
• Both fields are indexed!
• Usage is more like key-value lookup than complex queries
• A ShardingID query currently touches the whole on-disk structure
• Splitting by collection would be ideal, but is harder to maintain
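The access pattern above amounts to two indexed lookups. The following in-memory stand-in shows that shape; it is not MongoDB or its driver, and the `DataDB` class is hypothetical. Only the field names `__docid` and `__arcid` come from the slides.

```python
from collections import defaultdict

class DataDB:
    """In-memory stand-in for the `datadb` collection with its two indexes."""

    def __init__(self):
        self.by_docid = {}                  # __docid -> document (unique)
        self.by_arcid = defaultdict(list)   # __arcid -> documents in that shard

    def insert(self, doc):
        self.by_docid[doc["__docid"]] = doc
        self.by_arcid[doc["__arcid"]].append(doc)

    def find(self, query):
        # Only the two indexed lookups described above are supported.
        if "__docid" in query:
            doc = self.by_docid.get(query["__docid"])
            return [doc] if doc is not None else []
        if "__arcid" in query:
            return list(self.by_arcid.get(query["__arcid"], []))
        raise ValueError("unsupported query: only __docid or __arcid lookups")

db = DataDB()
db.insert({"__docid": 12345, "__arcid": 3, "title": "MongoDB 1.8 is released!"})
db.insert({"__docid": 12346, "__arcid": 3, "title": "MongoTokyo"})
db.insert({"__docid": 12347, "__arcid": 4, "title": "iPad2"})

one_doc = db.find({"__docid": 12345})   # point lookup
shard_docs = db.find({"__arcid": 3})    # every document in one shard
```

Splitting by collection would correspond to keeping one such structure per ShardingID instead of one shared index.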
54. Problem: Disk Consumption
• MongoDB consumes a lot of disk space
• Allocates several GB (configurable) for the replication logs
• Mostly append-only architecture
• In-place modification is supported if the new data is smaller than the original size
• No compression scheme
• want LZO/gzip support!
55. Problem: Consistency
• Fire-and-Forget Write Behavior
• By default, a MongoDB insert does not confirm success on the server side
• Need to call getLastError() to confirm it, which is slower
• In a replicated environment, you can specify the minimum number of servers that must succeed in the write operation
• The ReplicaSet mechanism is somewhat of a black box?
• What consistency does it provide? What is the fail-over mechanism?
• In the end we chose master-master replication. But will it become obsolete?
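The difference between fire-and-forget and acknowledged writes can be shown with a toy model. This is not MongoDB's wire protocol or driver API: `Replica` and `insert` are hypothetical, and the parameter name `w` is borrowed from the getLastError option of the same name only to keep the analogy readable.

```python
class Replica:
    """A server that either applies a write or silently fails."""

    def __init__(self, healthy=True):
        self.healthy = healthy
        self.docs = []

    def apply(self, doc):
        if not self.healthy:
            return False
        self.docs.append(doc)
        return True

def insert(replicas, doc, w=0):
    """Send `doc` to every replica.

    w=0 models fire-and-forget: the caller gets no confirmation at all.
    w>=1 models an acknowledged write: raise unless at least w replicas succeeded.
    """
    acks = sum(1 for r in replicas if r.apply(doc))
    if w == 0:
        return None   # outcome unknown to the caller
    if acks < w:
        raise RuntimeError(f"only {acks} of {w} required replicas acknowledged")
    return acks

replicas = [Replica(), Replica(healthy=False)]
silent = insert(replicas, {"__docid": 1})           # no error although one replica failed
confirmed = insert(replicas, {"__docid": 2}, w=1)   # one acknowledgement is enough
```

Asking for `w=2` against this pair raises an error, which is the extra safety (and extra latency) that the acknowledged path buys.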
58. Sharding
• Test with 2 nodes (8G mem, 1 SATA disk)
• 150 document registrations / sec
• Up to 50 million documents
• Gradually slowing down...
• Higher latency than the non-sharded setup
• More parallelism, more nodes?
• These results are from an early 1.7 release
• Much improved by now?
59. Conclusion
• Sedue is a “Distributed Index-Query Engine”
• Headache: Frequently Changing Schema
• Sedue MongoDB
• As DocumentRepository + Blob Storage
• MongoDB handles real data well in some cases
• Future: Sharding for Larger Deployments
60. We’re Hiring!
• Engineers
• Core Search Engine Developer
• C++ Expert
• Distributed Systems Expert
• Professional Support and Service
• UNIX/Linux Expert
• Summer Intern Student
• Contact Me
• kzk@preferred.jp , @kzk_mover
• PFI: @preferred_jp
• SedueTeam: @nobu_k, @eiichiroi, @repeatedly