PQL – Puma Query Language
o CREATE INPUT TABLE t (‘time’, ‘adid’, ‘userid’);
o CREATE VIEW v AS
    SELECT *, udf.age(userid)
    FROM t
    WHERE udf.age(userid) > 21
o CREATE HBASE TABLE h …
o CREATE LOGICAL TABLE l …
o CREATE AGGREGATION ‘abc’
    INSERT INTO l (a, b, c)
    SELECT udf.hour(time), adid, age, count(1), udf.count_distinc(userid)
    FROM v
    GROUP BY udf.hour(time), adid, age;
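To make the PQL statements above concrete, here is a minimal Python sketch of what the view and the aggregation appear to compute: keep rows whose user is older than 21, group by hour, ad, and age, then count rows and distinct users. The `AGES` lookup, the `hour` helper, and the sample rows are hypothetical stand-ins; the slides do not show how the UDFs are implemented or how Puma evaluates the query.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical stand-in for udf.age(userid); the real UDF is not shown in the slides.
AGES = {"u1": 25, "u2": 19, "u3": 30}


def age(userid):
    return AGES[userid]


def hour(ts):
    # Assumed meaning of udf.hour(time): hour of day (UTC) of a Unix timestamp.
    return datetime.fromtimestamp(ts, tz=timezone.utc).hour


def aggregate(rows):
    """rows: iterable of (time, adid, userid) tuples, like input table t."""
    counts = defaultdict(int)
    users = defaultdict(set)
    for ts, adid, userid in rows:
        a = age(userid)
        if a <= 21:  # view v: WHERE udf.age(userid) > 21
            continue
        key = (hour(ts), adid, a)  # GROUP BY udf.hour(time), adid, age
        counts[key] += 1           # count(1)
        users[key].add(userid)     # distinct userids per group
    return {k: (counts[k], len(users[k])) for k in counts}


rows = [
    (3600, "ad1", "u1"),
    (3700, "ad1", "u1"),
    (3800, "ad1", "u2"),  # filtered out: age 19
    (7300, "ad2", "u3"),
]
result = aggregate(rows)
```

This batch version recomputes everything from a list; a streaming system would instead update the per-group counters as each log line arrives.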
Facebook recently deployed Facebook Messages, its first ever user-facing application built on the Apache Hadoop platform. Apache HBase is a database-like layer built on Hadoop designed to support billions of messages per day. This paper describes the reasons why Facebook chose Hadoop and HBase over other systems such as Apache Cassandra and Voldemort and discusses the application’s requirements for consistency, availability, partition tolerance, data model and scalability. We explore the enhancements made to Hadoop to make it a more effective realtime system, the tradeoffs we made while configuring the system, and how this solution has significant advantages over the sharded MySQL database scheme used in other applications at Facebook and many other web-scale companies.
We discuss the motivations behind our design choices, the challenges that we face in day-to-day operations, and future capabilities and improvements still under development. We offer these observations on the deployment as a model for other companies who are contemplating a Hadoop-based solution over traditional sharded RDBMS deployments.
Scribe
https://github.com/facebook/scribe/wiki
o Scribe is a server for aggregating log data that’s streamed in real time from clients. It is designed to be scalable and reliable.
o There is a scribe server running on every node in the system, configured to aggregate messages and send them to a central scribe server (or servers) in larger groups.
o If the central scribe server isn’t available, the local scribe server writes the messages to a file on local disk and sends them when the central server recovers.
o The central scribe server(s) can write the messages to the files that are their final destination, typically on an NFS filer or a distributed filesystem, or send them to another layer of scribe servers.
o Scribe is unique in that clients log entries consisting of two strings, a category and a message. The category is a high-level description of the intended destination of the message and can have a specific configuration in the scribe server, which allows data stores to be moved by changing the scribe configuration instead of client code.
o The server also allows for configurations based on category prefix, and a default configuration that can insert the category name in the file path.
o Flexibility and extensibility are provided through the “store” abstraction.
o Stores are loaded dynamically based on a configuration file, and can be changed at runtime without stopping the server.
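The category-based routing described above can be sketched as a small lookup function. This is only an illustration: the function name, the example paths, and the precedence order (exact category, then longest matching prefix, then a default that inserts the category name into the path) are assumptions, not taken from the Scribe source.

```python
def resolve_destination(category, exact, prefixes, default_template):
    """Pick a destination for a category.

    exact: dict mapping a full category name to a destination
    prefixes: dict mapping a category prefix to a destination
    default_template: path template containing "{category}"
    """
    if category in exact:
        return exact[category]
    # Assumption: when several prefixes match, the longest one wins.
    matching = [p for p in prefixes if category.startswith(p)]
    if matching:
        return prefixes[max(matching, key=len)]
    # Default configuration: insert the category name in the file path.
    return default_template.format(category=category)


# Hypothetical configuration, for illustration only.
exact = {"ads_clicks": "/mnt/logs/ads/clicks"}
prefixes = {"ads_": "/mnt/logs/ads/other", "ads_video_": "/mnt/logs/ads/video"}
default_template = "/mnt/logs/default/{category}"

destinations = {
    c: resolve_destination(c, exact, prefixes, default_template)
    for c in ["ads_clicks", "ads_video_views", "ads_misc", "search"]
}
```

Because clients only name a category, moving `ads_clicks` to a different destination means editing the `exact` mapping on the server side; no client code changes.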
o Stores are implemented as a class hierarchy, and stores can contain other stores. This allows a user to chain features together in different orders and combinations by changing only the configuration.
o Scribe is implemented as a Thrift service using the non-blocking C++ server. The installation at Facebook runs on thousands of machines and reliably delivers tens of billions of messages a day.
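The idea of stores that contain other stores can be illustrated with a small composite hierarchy. Scribe itself is written in C++; the Python classes below are a simplified sketch, and the class names and the `handle` method are hypothetical rather than Scribe's actual interface.

```python
class Store:
    """Base class: a store accepts a batch of messages and reports success."""

    def handle(self, messages):
        raise NotImplementedError


class ListStore(Store):
    """Leaf store that keeps messages in memory (stand-in for a file store)."""

    def __init__(self):
        self.messages = []

    def handle(self, messages):
        self.messages.extend(messages)
        return True


class UnavailableStore(Store):
    """Leaf store that always fails, simulating an unreachable destination."""

    def handle(self, messages):
        return False


class BufferStore(Store):
    """Tries a primary store and falls back to a secondary one on failure."""

    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary

    def handle(self, messages):
        return self.primary.handle(messages) or self.secondary.handle(messages)


class MultiStore(Store):
    """Forwards every batch to all of its child stores."""

    def __init__(self, children):
        self.children = children

    def handle(self, messages):
        results = [child.handle(messages) for child in self.children]
        return all(results)


# Chain features by nesting stores: fan out to an archive and to a
# buffered destination whose primary happens to be down.
archive = ListStore()
fallback = ListStore()
root = MultiStore([archive, BufferStore(UnavailableStore(), fallback)])
ok = root.handle(["m1", "m2"])
```

Reordering or nesting the same classes differently changes the behavior without touching any class, which is the property the slide attributes to configuration-driven stores.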
Scribe Overview / Reliability
https://github.com/facebook/scribe/wiki/Scribe-Overview
o The scribe system is designed to be robust to failure of the network or any specific machine, but does not provide transactional guarantees. If a scribe instance on a client machine (we’ll call it a resender for the moment) is unable to send messages to the central scribe server, it saves them on local disk, then sends them when the central server or network recovers. To avoid overloading the central server upon a restart, the resender waits a random time between reconnect attempts, and if the central server is near capacity it will return TRY_LATER, which tells the resender not to attempt another send for several minutes.
o The central server has similar behavior (the same code in fact) for handling failure of the NFS filer or distributed filesystem it’s writing to. If the filesystem goes down, the scribe server writes to local disk until it recovers, then sends the data from local disk to the remote filesystem. The order of the messages is preserved in both this and the resender case.
o These error cases will lead to loss of data:
o If a client can’t connect to either the local or central scribe server, the message will be lost
o If a scribe server crashes, it could lose a small amount of data that’s in memory but not on disk
o Some multiple-component failure cases, such as a resender that can’t connect to any central server while its local disk fills up
o Some rare timeout conditions can lead to duplicate messages
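The resender behavior described in this section can be sketched as follows. This is a minimal model under stated assumptions: an in-memory queue stands in for the local disk file, `send_to_central` is a hypothetical callable that returns `"OK"` or `"TRY_LATER"` or raises `ConnectionError`, and the wait intervals are illustrative values, not Scribe's defaults. The sketch returns the delay before the next attempt instead of sleeping.

```python
import random
from collections import deque

OK, TRY_LATER = "OK", "TRY_LATER"


class Resender:
    def __init__(self, send_to_central, min_wait=1.0, max_wait=5.0,
                 try_later_wait=180.0):
        self.send_to_central = send_to_central
        self.backlog = deque()  # stand-in for the file on local disk
        self.min_wait = min_wait
        self.max_wait = max_wait
        self.try_later_wait = try_later_wait

    def log(self, message):
        """Queue a message, then try to drain the backlog in order."""
        self.backlog.append(message)
        return self.flush()

    def flush(self):
        """Send queued messages oldest first.

        Returns the number of seconds to wait before the next attempt,
        or 0.0 if the backlog was fully delivered.
        """
        while self.backlog:
            try:
                result = self.send_to_central(self.backlog[0])
            except ConnectionError:
                # Random wait so that many resenders do not reconnect at once.
                return random.uniform(self.min_wait, self.max_wait)
            if result == TRY_LATER:
                # Central server is near capacity: back off for minutes.
                return self.try_later_wait
            self.backlog.popleft()  # drop only after a successful send
        return 0.0


delivered = []
state = {"up": False}


def central(message):
    if not state["up"]:
        raise ConnectionError("central server unreachable")
    delivered.append(message)
    return OK


resender = Resender(central)
wait1 = resender.log("a")   # central is down: "a" stays in the backlog
wait2 = resender.log("b")
state["up"] = True
wait3 = resender.log("c")   # backlog drains in the original order
```

A message is removed from the backlog only after the send returns, so a send that reaches the server but times out before the reply would be retried; that is one plausible way the duplicate messages mentioned above could arise.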
Scribe Overview / Configuration
https://github.com/facebook/scribe/wiki/Scribe-Overview
o The scribe server is configured by the file specified in the -c command line option, or the file /usr/local/scribe/scribe.conf if none is specified on the command line.
o The basic idea of the configuration is that a particular category of messages is sent to one or more “stores” of various types. Some types of stores can contain other stores; for example, a bucket store contains many file stores and distributes messages to them based on a hash.
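As an illustration of a bucket store distributing messages across file stores by hash, here is a small sketch. The class names are hypothetical, and hashing the whole message with CRC32 is an assumption made for the example; the slides do not say which part of the message Scribe hashes or which hash function it uses.

```python
import zlib


class FileStore:
    """Stand-in for a file store: collects messages under a name."""

    def __init__(self, name):
        self.name = name
        self.messages = []

    def handle(self, message):
        self.messages.append(message)


class BucketStore:
    """Contains several file stores and picks one per message by hash."""

    def __init__(self, num_buckets):
        self.buckets = [FileStore(f"bucket_{i:03d}") for i in range(num_buckets)]

    def bucket_index(self, message):
        # Assumption: hash the full message text; CRC32 is deterministic
        # across runs, unlike Python's built-in hash() for strings.
        return zlib.crc32(message.encode("utf-8")) % len(self.buckets)

    def handle(self, message):
        self.buckets[self.bucket_index(message)].handle(message)


store = BucketStore(4)
for i in range(100):
    store.handle(f"user{i % 10} clicked")
```

Identical messages always land in the same bucket, so each of the ten distinct messages here ends up in exactly one file store.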
o The configuration file consists of a global section and a section for each store. The global section includes the listening port number and the maximum number of messages that the server can handle in a second.
o Each store section must include a category and a type. There is no restriction on the number of categories or the number of stores per category.
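The rule that every store section needs a category and a type can be expressed as a short check over already-parsed sections. The sketch below assumes each store section has been read into a dictionary; the extra key `file_path` and the example values are illustrative assumptions, and parsing the actual configuration file format is out of scope here.

```python
def validate_stores(stores):
    """Return a list of error strings; an empty list means all stores are valid."""
    errors = []
    for i, store in enumerate(stores):
        for required in ("category", "type"):
            if required not in store:
                errors.append(f"store {i}: missing '{required}'")
    return errors


# Hypothetical parsed store sections, for illustration only.
stores = [
    {"category": "default", "type": "file", "file_path": "/tmp/scribe_default"},
    {"category": "ads", "type": "bucket"},
    {"category": "ads", "type": "file"},  # several stores per category are allowed
    {"type": "file"},                     # invalid: no category
]
errors = validate_stores(stores)
```

Only the fourth section is rejected; the two `ads` sections pass because nothing limits the number of stores per category.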