• Capable of storing thousands of TB of data
• High and sustainable aggregate IO bandwidth
  – Hundreds of GB/s read performance
  – Tens of GB/s write performance
• Uninterruptible service
  – Built-in fault tolerance and high availability
• Automatic machine management
  – Plug and play
• Deque query: 2 times faster than STL
• List insert: 10 times faster than STL
• Hash query: 2.5 times faster than STL
• Offline data statistics: Map/Reduce
• Offline data computing (high CPU): MPI
• Online/offline data condition query (seconds to minutes): Big Table
• Online/offline data aggregate query: OLAP
• Online relational data query (milliseconds to seconds): Distributed DB
• Online key-value query (milliseconds): Key-Value DB (MOLA)
• Hot deployment
• Attack protection, load balancing, dynamic app deployment
• App configuration, resource constraints
• System protection, resource accounting and limits
• Upload and download of large files
Under the concept, the company's search engine will not only provide query results, but can also carry out commands like launching an application or linking a user directly with an online service.
Baidu Cloud Computing Practice Fei Dong 2010-11-12
History of Baidu
• Baidu was established in 2000 by Robin Li and Eric Xu.
• Listed on NASDAQ in 2005; market cap > 30B; 4th largest internet company in the world
• Vast majority of its revenue comes from online advertising: pay for performance (P4P)
• Success came from multimedia search ("MP3 Search")
Baidu Status
• #1 site in China, #6 in the world
• Listed on NASDAQ; market cap > 30B; 4th largest internet company
• Leading Chinese online search engine: holds 80% of the market in China
• Mission: provide the best way for people to find information
• Technology-driven company: 8000 employees, 3000 engineers
Cloud Computing Meaning
• Serves the company internally: reorganize resources, unify interfaces
• New requirements: large-scale storage and computing, high performance, high availability, dynamic user needs
• Technology moves from backend to frontend; the business relies on cloud computing
Cloud Computing Products
• Pyramid (DFS/DTS/DCS)
• Online Storage System (Mola, MySQL DBProxy)
• Offline computing: Hadoop (HCE)
• Platform as a Service: Baidu App Engine (BAE)
• Cloud cache management: ZCache
Pyramid
• DFS: Distributed File System
• DTS: Distributed Table System
• DCS: Distributed Computing System
• Machine selection: cost, energy saving
Assumptions:
• File mutation is appending (often concurrently) rather than overwriting
• Once written, files are only read (often sequentially), many times
• Component failures are the norm
• High sustained bandwidth is more important than low latency
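The append-only assumption can be illustrated with a small in-memory sketch. This is not Pyramid's DFS interface, which the slides do not show; `AppendOnlyFile`, `append`, and `scan` are hypothetical names chosen for illustration. The point is only that writers may append concurrently, nothing is ever overwritten, and readers scan in order.

```python
import threading


class AppendOnlyFile:
    """Hypothetical stand-in for a DFS file: append and scan, no overwrite."""

    def __init__(self):
        self._records = []
        self._lock = threading.Lock()

    def append(self, record):
        # Concurrent writers are serialized; each record gets the next offset.
        with self._lock:
            self._records.append(record)
            return len(self._records) - 1

    def scan(self, start=0):
        # Sequential read from a given offset up to the current end of file.
        i = start
        while True:
            with self._lock:
                if i >= len(self._records):
                    return
                record = self._records[i]
            yield record
            i += 1
```

Because there is deliberately no method to modify an existing record, a reader never has to worry about data changing underneath it.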
Pyramid - DTS
• Single DTS master + many workers
• Sorted and partitioned by row key
  – Each partition is about 256MB
  – Partitions can be split or merged due to insertion or deletion
• B+ tree hierarchy
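A minimal sketch of range partitioning by row key, assuming a partition splits at its median key once it grows past a size limit. The class and method names are hypothetical, the limit is a tiny stand-in for the 256MB figure, and merging is omitted; the slides do not describe how DTS actually implements either operation.

```python
import bisect

MAX_PARTITION_BYTES = 64  # tiny stand-in for the ~256MB partition size


class Table:
    """Rows sorted by key and held in contiguous key ranges (illustrative only)."""

    def __init__(self, max_bytes=MAX_PARTITION_BYTES):
        self.max_bytes = max_bytes
        self.starts = [""]  # starts[i] is the lowest key partition i may hold
        self.parts = [{}]   # parts[i] maps row key -> value

    def _locate(self, key):
        # The rightmost partition whose start key is <= key owns the row.
        return bisect.bisect_right(self.starts, key) - 1

    def _size(self, i):
        return sum(len(k) + len(v) for k, v in self.parts[i].items())

    def put(self, key, value):
        i = self._locate(key)
        self.parts[i][key] = value
        if self._size(i) > self.max_bytes and len(self.parts[i]) > 1:
            self._split(i)

    def get(self, key):
        return self.parts[self._locate(key)].get(key)

    def _split(self, i):
        # Move the upper half of the keys into a new partition to the right.
        keys = sorted(self.parts[i])
        mid = keys[len(keys) // 2]
        upper = {k: self.parts[i].pop(k) for k in keys if k >= mid}
        self.starts.insert(i + 1, mid)
        self.parts.insert(i + 1, upper)
```

Lookups stay a binary search over the partition start keys no matter how many splits have happened, which is the property a sorted, partitioned layout is after.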
Pyramid - Tradeoff
• Strong consistency vs. single point of failure
• Separate DFS and DTS layers: if there is a bug in DTS, the data still exists on DFS vs. the two-layer structure leads to complex engineering
• Tablet autoload and autounload; B+ tree with snapshot, checkpoint, lock-free access vs. maintenance cost
• Strong fault tolerance, high performance in both sequential and random reads vs. write latency can accumulate to hundreds of ms
DISQL
• Distributed framework for statistics requirements
Layers (top to bottom):
• Human-computer interaction: LSP, Distributed shell
• Programming API: Simple mode, DQuery mode
• High-layer framework: DISQL
• Lower-layer framework: Hadoop, MOLA, DDB, MPI, Big Table, ?
• LINUX
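The slides do not show DISQL syntax, so the sketch below makes no attempt to reproduce it. It only illustrates, under that caveat, how a high-layer statistics request such as "count rows per key" could be lowered into the map, shuffle, and reduce phases that a lower-layer framework like Hadoop runs. All function names are hypothetical.

```python
from collections import defaultdict


def map_phase(records, key_field):
    # Emit (key, 1) for every input record.
    for record in records:
        yield record[key_field], 1


def shuffle(pairs):
    # Group the emitted values by key, as the framework would between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    # Sum the values of each group to get a per-key count.
    return {key: sum(values) for key, values in groups.items()}


def count_by(records, key_field):
    """Single-process stand-in for a distributed count-per-key job."""
    return reduce_phase(shuffle(map_phase(records, key_field)))
```

For example, `count_by([{"query": "mp3"}, {"query": "map"}, {"query": "mp3"}], "query")` counts how often each query string occurs.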
Baidu App Engine
• Public Cloud: Entire Web Solution
• PHP
• Key-value DB, Cache, TaskQueue, Mail, Cron, FetchURL
BAE Concerns
• Static scalability & dynamic scalability
• Isolation & security
• High availability (computing & data)
BAE Tech Features
• Multiple apps
• Web Server Cluster
• Elastic code execution platform
• Data Center
• Resource Statistics
• Auto Monitor
• Secure runtime environment
• SDK
BAE Dependencies
• Software
  – FS (Linux EXT3)
  – DB (MySQL, DBProxy)
  – DataCenter (Mola)
  – Web Server Cluster (Lighttpd, mod_*)
  – App Language (PHP, BINGO, Smarty)
  – Network (RPC Framework, MCPack protocols)
BAE Arch.
• Dashboard
• Web server cluster
• Code execution cluster
• Cloud Service: Data center, cache, mysql, fetchurl, crontab…
• Side panels: Statistics, Auto Manage
BAE Sandbox
• POSIX environment
• HTTP Server Sandbox
• App Config
• PHP Sandbox
• Your APP
BAE Future
• LAMP runtime environment => support for diverse languages
• Support 10k+ applications => 80% of Baidu's products will be migrated
• Billion-scale daily traffic => more than 2k machines in 2 years, CPU idle 90% -> 60%
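The CPU idle target is easier to read as utilization. A short worked calculation, assuming utilization is simply 100% minus idle:

```python
def busy_percent(idle_percent):
    """CPU utilization is whatever is not idle."""
    return 100 - idle_percent


before = busy_percent(90)  # 10% busy today
after = busy_percent(60)   # 40% busy at the target
gain = after / before      # useful work per machine grows by this factor

print(f"utilization {before}% -> {after}%, {gain:.0f}x per machine")
# prints: utilization 10% -> 40%, 4x per machine
```

Dropping idle time from 90% to 60% therefore means each machine does about four times as much useful work.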
Box Computing vs. Cloud Computing
• Cloud computing focuses more on the back end
  – i.e. the infrastructure of the services, the scalable computing
• Box computing is more concerned with the front end
  – i.e. the requirements from the users and how to meet them
Aladdin Plan
• On the Aladdin open platform, a third party can submit its own service together with its structured data
  – sign up your app
  – choose the keywords you want it displayed for
  – choose a template
  – submit your data in XML form
• Addresses the problem that existing search engines cannot crawl and retrieve "hidden web" information
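The four submission steps could produce a payload along these lines. The real Aladdin XML schema is not given in the slides, so every element and attribute name here (`submission`, `keywords`, `data`, `item`) is a hypothetical placeholder, as is the `build_submission` helper.

```python
import xml.etree.ElementTree as ET


def build_submission(app_name, keywords, template, items):
    """Assemble an illustrative XML document; the schema is an assumption."""
    root = ET.Element("submission", {"app": app_name, "template": template})
    keyword_list = ET.SubElement(root, "keywords")
    for word in keywords:
        ET.SubElement(keyword_list, "keyword").text = word
    data = ET.SubElement(root, "data")
    for item in items:
        node = ET.SubElement(data, "item")
        # Each dict key becomes a child element, so keys must be valid XML names.
        for field, value in item.items():
            ET.SubElement(node, field).text = str(value)
    return ET.tostring(root, encoding="unicode")
```

A call such as `build_submission("train-times", ["train schedule"], "table", [{"title": "Beijing to Shanghai", "url": "http://example.com/1"}])` returns one XML string carrying the app, its keywords, the chosen template, and the structured records.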
Challenges
• Massive data sets: TB of data, billions of PV per day
• Results returned in a very short time: ms
• Large-scale real-time business computing