"erlang, webmail and hibari" at Rakuten tech talk
 
Presentation materials for the Rakuten tech talk held on August 24, 2010, covering an overview of Erlang, webmail development in Erlang, and the "hibari" use case for gigabyte-mailbox webmail.

    Presentation Transcript

    • People, Software, WebMail, BigData Powered by Erlang & Functional Programming Gemini Mobile Technologies, Inc. August 24, 2010 GMT Erlang August 2010 - Rakuten Tech Talk 1/55
    • Agenda
      • Introduction
      • Erlang/OTP
      • Why Erlang/OTP?
      • WebMail Case Study
      • Hibari Case Study
      • What’s Next?
    • Introduction: Who is Gemini Mobile Technologies?
      • Founded: July 2001
      • Offices: San Mateo, CA; Shibuya, Tokyo; Star City, Beijing
      • Milestones:
        • 2003: Multimedia messaging service (MMS), Vodafone Japan
        • 2005: MMSC, Nextel International
        • 2006: MMSC, eMobile Japan
        • 2006: S!Town, Softbank Mobile
        • 2007: ExCast enterprise mail gateway, NTT docomo
        • 2008: eXplo(tm) service, China Unicom
        • 2008: Fax satellite gateway, NTT docomo
        • 2009: International MMS gateway, NTT docomo
        • 2010: WebMail, Japanese Mobile Carrier & Internet Provider
        • 2010: Hibari/BigData, Open Source Community
      • Investors: Goldman Sachs, Ignite, Mizuho Capital, Tokyo MUFJ, Nomura, Access, Aplix
      • Erlang: in use for 3 years, deployed at telecoms in Japan, China, and Europe
    • Erlang/OTP: Erlang is ...
      • General purpose programming language, runtime environment
      • Originally written in Prolog, now self-hosting in its own environment
      • Functional language, with strict evaluation, single assignment, and dynamic typing
      • Support for concurrency, multi-core CPUs, network distribution, and fault tolerance
      • Designed for soft-real-time, non-stop applications
    • Erlang/OTP: Open Telecom Platform is ...
      • Collection of libraries to support Erlang applications
      • Standard support libraries: lists, trees, dictionaries, sets, files, queues, network sockets, time manipulation, generic servers & FSMs, basic mathematical functions, string handling, timed events, ...
      • Error handling & logging, alarms, hot code upgrade/downgrade, process tree manipulation & supervision, ...
      • Protocol stacks for ASN.1, CORBA, HTTP, SSL, Megaco/H.248, ...
      • Mnesia distributed database
      • Foreign language interfaces for C/C++, Java, TCP-based servers
    • Why Erlang?
      • Software for Ericsson’s products (telecom switches, radio gear) was getting too complex: C, C++, Pascal, EriPascal, assembler, PLEX, ... over 20 different languages used in production and research labs.
      • Ad hoc mechanisms for field maintenance, bugfixes, upgrades.
      • “There must be a better way” ...
      • Must be high-level to provide productivity gains.
      • Must support concurrency, error recovery. Soft-realtime requires no back-tracking, very cheap thread model.
      • Hot code upgrades very desirable.
      • Best-known product: AXD301 ATM switch, now with 2 MLoC of Erlang, plus another 1+ MLoC of C and C++ (proprietary h/w drivers, third-party firmware/drivers and protocol stacks)
    • Erlang Timeline (http://www.erlang.org/course/history.html)
      • 1982-85: Language surveys
      • 1985-86: Experiments with LISP, Prolog, Parlog.
      • 1988: First Ericsson PBX product to use Erlang (in Prolog)
      • 1989: Experimental rewrite of switch code, PLEX → Erlang, 10x programmer efficiency. First non-Prolog-based interpreter.
      • 1990: Conference papers, Erlang spreads to Bellcore & others.
      • 1992: Ports to VxWorks, PC, Macintosh. First two “real” Ericsson products start using Erlang.
      • 1993: Network distribution added. Spinoff organization to support Erlang development.
    • Erlang Timeline continued ...
      • 1995: Ericsson AXE-N product collapses (non-Erlang). The replacement AXD starts with Erlang.
      • 1998: Erlang banned for new products: it wasn’t C++ :(
      • 1998: Erlang open-sourced, new companies spin off
      • Today ...
        • Erlang still used in Ericsson (despite ban): productivity is too high
        • AXD301 has 11% of world market (market leader), runs British Telecom’s country-wide ATM network, handles 30-40 million calls/week (avg 49-66 calls/sec), has experienced 31 milliseconds of downtime per year (9 “nines” reliability)
        • Active and growing open source community
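The call-rate and reliability figures on the timeline slide can be checked with a little arithmetic. The following is a minimal sketch; the module name `axd_stats` and both function names are made up for illustration, and a 365-day year is assumed.

```erlang
-module(axd_stats).
-export([calls_per_sec/1, nines/1]).

%% Parenthesised so the macros expand safely inside larger expressions.
-define(SECONDS_PER_WEEK, (7 * 24 * 3600)).
-define(SECONDS_PER_YEAR, (365 * 24 * 3600)).

%% Average calls per second for a given weekly call volume.
calls_per_sec(CallsPerWeek) ->
    CallsPerWeek / ?SECONDS_PER_WEEK.

%% Number of "nines" of availability for a given yearly downtime in seconds:
%% the negative base-10 logarithm of the unavailable fraction of the year.
nines(DowntimeSecPerYear) ->
    -math:log10(DowntimeSecPerYear / ?SECONDS_PER_YEAR).
```

With these definitions, 30 and 40 million calls per week come out at roughly 49.6 and 66.1 calls per second, and 0.031 seconds of downtime per year gives a value just above 9, which agrees with the figures quoted on the slide.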
    • Erlang Overview
      • Concurrency: User-space thread model (extremely cheap to create, switch contexts, destroy), now with support for multiple CPUs and multi-core CPUs. Such threads are really “processes”.
      • Distribution: All inter-process communication is by message passing. Multiple Erlang VMs (virtual machines) communicate transparently via TCP. The same syntax is used for message passing in intra- and inter-node communication.
      • Robustness: All processes are isolated, no data sharing. Reliable detection of crashed processes, even on remote nodes.
      • Hot code upgrade: old and new code can run simultaneously during code upgrade. Support for data structure changes, module dependencies, etc.
    • Erlang Overview continued ...
      • External interfaces: via Erlang message passing over TCP, “standard” TCP & UDP protocols, UNIX pipes, shared library API interface.
      • Portable: Same VM runs on Linux & UNIX, Windows, Macintosh, VxWorks. Message passing between heterogeneous systems is not a problem.
      • Many programming errors avoided by: garbage collected data structures, single-assignment variables, robust exception handling and inter-node communication
    • Concurrency Oriented Programming
      • Utterly independent processes: imagine they’re on different machines!
      • Process semantics: No data sharing, copy-everything message passing
        → Sharing means: inefficient (distribution is Hard), complicated (mutexes, condition variables, write barriers, etc.)
      • No penalty for massive parallelism (e.g. tens of thousands of processes)
      • Each process has an unforgeable name
      • To send a message, the recipient’s process name is required
      • Message passing semantics are unreliable, “send and pray”
      • Reliable monitoring of remote processes: when and why
      • No unavoidable penalty for distribution
      • Same behavior on any hosted OS
    • Why use a Concurrency Oriented Programming language?
      • The world is parallel. And distributed.
      • Things fail.
      • The biggest challenge is using the proper degree of parallelism in a COP program ... but it’s difficult to err when processes are cheap.
      • Programs are automatically scalable: if it works on 1 CPU, it works on many.
      • Programs are automatically robust when a process fails, no matter where the process is located.
      See Appendix for additional information.
    • Erlang in 11 Examples
      “One minute per example” text courtesy of Joe Armstrong
      • Sequential Erlang: 5 examples
      • Concurrent Erlang: 2 examples
      • Distributed Erlang: 1 example
      • Fault-tolerant Erlang: 2 examples
      • Bit syntax: 1 example
      See Appendix for additional information.
    • Sequential: Factorial
        -module(math).
        -export([fac/1]).
        fac(N) when N > 0 -> N * fac(N-1);
        fac(0) -> 1.
        > math:fac(25).
        15511210043330985984000000
    • Sequential: Binary Tree Search
        %% A tree node is {Key, Val, Left, Right}; an empty tree is nil.
        %% Unused variables are prefixed with _ to avoid compiler warnings.
        lookup(Key, {Key, Val, _, _}) -> {ok, Val};
        lookup(Key, {Key1, _Val, Left, _Right}) when Key < Key1 -> lookup(Key, Left);
        lookup(Key, {_Key1, _Val, _Left, Right}) -> lookup(Key, Right);
        lookup(_Key, nil) -> not_found.
    • Sequential: Append, Sort, Adder
        %% append
        append([H | T], L) -> [H | append(T, L)];
        append([], L) -> L.
        %% sort
        sort([Pivot | T]) ->
            sort([X || X <- T, X < Pivot]) ++
            [Pivot] ++
            sort([X || X <- T, X >= Pivot]);
        sort([]) -> [].
        %% adder
        > Adder = fun(N) -> fun(X) -> X + N end end.
        #Fun<...>
        > G = Adder(10).
        #Fun<...>
        > G(5).
        15
    • Concurrent: Spawn, Send and Receive
        %% spawn
        Pid = spawn(fun() -> loop(0) end)
        %% send
        Pid ! Message,
        ...
        %% receive
        receive
            Message1 -> Actions1;
            Message2 -> Actions2;
            ...
        after Time ->
            TimeOutActions
        end
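The spawn, send, and receive fragments from the slide can be combined into one small runnable module. This is a minimal sketch rather than anything from the talk: the module name `counter`, its function names, and the 1000 ms timeout are all assumptions made for illustration.

```erlang
-module(counter).
-export([start/0, increment/1, value/1, stop/1]).

%% Spawn a process whose only state is the current count.
start() ->
    spawn(fun() -> loop(0) end).

%% Asynchronous send: the caller does not wait for a reply.
increment(Pid) ->
    Pid ! increment,
    ok.

%% Synchronous request: include our own pid and wait for the answer.
value(Pid) ->
    Pid ! {value, self()},
    receive
        {Pid, Count} -> {ok, Count}
    after 1000 ->
        {error, timeout}
    end.

stop(Pid) ->
    Pid ! stop,
    ok.

%% Server loop: handle one message, then recurse with the new state.
loop(Count) ->
    receive
        increment ->
            loop(Count + 1);
        {value, From} ->
            From ! {self(), Count},
            loop(Count);
        stop ->
            ok
    end.
```

Because messages between one pair of processes arrive in the order they were sent, calling `counter:increment/1` twice and then `counter:value/1` from the same process returns `{ok, 2}`.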
    • Distributed Erlang
        ...
        true = net_kernel:connect_node(NodeName),
        Pid1 = spawn(NodeName, Fun),
        Pid2 = spawn(NodeName, Module, Func, ArgList),
        %% is_process_alive/1 only accepts local pids, so ask the remote node
        true = rpc:call(NodeName, erlang, is_process_alive, [Pid1]),
        ...
    • Fault Tolerance: catch/throw
        ...
        case (catch foo(A, B)) of
            {abnormal_case1, Y} -> ...
            {'EXIT', Opps} -> ...
            Val -> ...
        end,
        ...
        foo(A, B) ->
            ...
            throw({abnormal_case1, ...})
    • Fault Tolerance: monitor a process
        ...
        process_flag(trap_exit, true),
        Pid = spawn_link(fun() -> ... end),
        receive
            {'EXIT', Pid, Why} -> ...
        end
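The trap_exit pattern on the slide can be wrapped in a small helper that runs a fun in a linked process and reports how it ended. This is a sketch under assumptions: the module name `watch`, the function `run/1`, and the `{crashed, Why}` return shape are invented for illustration.

```erlang
-module(watch).
-export([run/1]).

%% Run Fun in a linked child process and report how it terminated.
run(Fun) ->
    %% Turn exit signals from linked processes into ordinary messages.
    %% process_flag/2 returns the previous setting so it can be restored.
    Old = process_flag(trap_exit, true),
    Pid = spawn_link(Fun),
    Result =
        receive
            {'EXIT', Pid, normal} -> ok;
            {'EXIT', Pid, Why}    -> {crashed, Why}
        end,
    process_flag(trap_exit, Old),
    Result.
```

For example, `watch:run(fun() -> ok end)` returns `ok`, while `watch:run(fun() -> exit(boom) end)` returns `{crashed, boom}`. The caller survives in both cases because the exit signal is delivered as a message instead of propagating along the link.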
    • Parsing an IP datagram
        -define(IP_VERSION, 4).
        -define(IP_MIN_HDR_LEN, 5).
        DgramSize = size(Dgram),
        case Dgram of
            <<?IP_VERSION:4, HLen:4, SrvcType:8, TotLen:16,
              ID:16, Flgs:3, FragOff:13,
              TTL:8, Proto:8, HdrChkSum:16,
              SrcIP:32, DestIP:32, Body/binary>>
              when HLen >= 5, 4*HLen =< DgramSize ->
                OptsLen = 4*(HLen - ?IP_MIN_HDR_LEN),
                <<Opts:OptsLen/binary,Data/binary>> = Body,
                ...
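The datagram fragment above can be turned into a complete function for experimenting with the bit syntax. This is a minimal sketch: the module name `ipv4`, the returned map layout, and the hand-built `sample/0` datagram (with a zero checksum) are assumptions for illustration, and maps require a newer Erlang/OTP release than the one current at the time of the talk.

```erlang
-module(ipv4).
-export([parse/1, sample/0]).

-define(IP_VERSION, 4).
-define(IP_MIN_HDR_LEN, 5).

%% Match the fixed 20-byte IPv4 header, then split options from payload.
parse(<<?IP_VERSION:4, HLen:4, _SrvcType:8, TotLen:16,
        _ID:16, _Flgs:3, _FragOff:13,
        TTL:8, Proto:8, _HdrChkSum:16,
        SrcIP:32, DestIP:32, Body/binary>> = Dgram)
  when HLen >= ?IP_MIN_HDR_LEN, 4 * HLen =< byte_size(Dgram) ->
    %% HLen counts 32-bit words; anything beyond 5 words is options.
    OptsLen = 4 * (HLen - ?IP_MIN_HDR_LEN),
    <<Opts:OptsLen/binary, Data/binary>> = Body,
    {ok, #{total_len => TotLen, ttl => TTL, proto => Proto,
           src => SrcIP, dest => DestIP, opts => Opts, data => Data}};
parse(_Other) ->
    {error, bad_datagram}.

%% A hand-built datagram: no options, TTL 64, protocol 6, 4-byte payload.
sample() ->
    <<?IP_VERSION:4, 5:4, 0:8, 24:16,
      1:16, 0:3, 0:13,
      64:8, 6:8, 0:16,
      16#C0A80001:32, 16#C0A80002:32,
      "data">>.
```

Calling `ipv4:parse(ipv4:sample())` yields a map whose `data` field is `<<"data">>` and whose `opts` field is the empty binary; input that is too short or carries a different version number falls through to the second clause.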
    • Why Erlang/OTP? A Killer App
      In 2008, Gemini deployed its first commercial Erlang-based product: a high-performance “User Profile” storage server as part of a larger system.
      What wasn’t selected?
      • LDAP - persistent, fast ... but no transactions
      • RDBMS - persistent, transactions ... but too slow
      Why was Erlang selected?
      • Mnesia - persistent, fast, and transactions
      • plus many other benefits (programmable, high quality, and open source!)
      and we haven’t looked back since ... no regrets!
    • Why Erlang/OTP? What have we learned?
      Erlang and functional programming have taught us some good practices and lessons:
      • lots of processes and message passing can be cheap
      • shared and mutable data can be (are) evil
      • side-effects can be (are) evil
      • let it crash! ... defensive programming is evil
      • don’t (over) optimize too soon ... the bottlenecks aren’t always where you expect
      • keep it simple ... less is more
      • don’t be afraid to re-factor ... when you have the right tools
      • distributed systems can still be (are) difficult and complex
    • WebMail: Multi-Tier Architecture, 20K Meter View
      [Architecture diagram. Labels: MTA, ISP, MOBILE, PC, O&M; SMTP/POP/IMAP, HTTP, LDAP; CLIENT API, FRONT API, AUTH API, BACK API; DIRECTORY STORE, DATA STORE]
    • WebMail: Multi-Tier Architecture, 10K Meter View
      [Architecture diagram. Labels: O&M, MOBILE, PC, ISP, MTA; HTTP, SMTP/POP/IMAP, LDAP; M2CI I/F, M2FE I/F, M2BE I/F, M2FE AUTH I/F, M2FE JOBQ I/F, AUTH I/F; HIBARI, MNESIA, MNESIA]
    • WebMail: Erlang. What’s It Doing?
      • All core processing for the “webmail” application
      • JSON-RPC with the Web browser-based UI (based on UBF)
      • HTTP and LDAP with authentication and proxy to full-text indexing services
      • UBF for most inter-application communication
      • Interface with C++ components for speed, legacy protocol support, and code re-use
      • Application/Transaction logging and message tracing
      • Hibari distributed, scalable key-value store for all persistent data
      • Mnesia for job queuing and multi-indexed profile data
    • WebMail: Hibari. Key-Value Storage for (Almost) Everything
      • Profile Store
        • User
        • Mail
        • Mail Incoming & Outgoing Filters
        • User Interface
        • External ISP
      • Address Book Store
        • vCards - Singletons & Packs
        • Labels - Folders, Flags, and User-Defined
      • Mail Store
        • Messages - Singletons & Packs
        • Message Summaries - Singletons & Packs
        • Meta Data - Next Uid, Quotas, ...
        • Labels - Folders, Flags, and User-Defined
      • Quota Policy Store
    • WebMail: Mnesia. Storage for Everything Else
      • Subset of Profile Store
        • Indexing & retrieval by various attributes
        • The WebMail application keeps Mnesia and Hibari synchronized for provisioning, updates, and deprovisioning
        • The WebMail application uses Hibari as the master copy
      • Job Queue
        • Outgoing mail, bounce messages, vacation messages, ...
        • Notifications to external text indexer
        • Asynchronous mail deletion
        • Asynchronous user deprovisioning
        • ...
      Possible with Hibari-based storage, but Mnesia was easier (at the project start).
    • WebMail: Post (almost) Mortem. Stuff We’ll Repeat
      • Erlang, the secret sauce
        → Ericsson’s support of Erlang/OTP is wonderful
      • UBF, QuickCheck, & UBF+QuickCheck
        → Auto-compilation of QuickCheck generators from UBF contracts
      • Test in various environments:
        → Exactly the same hardware as customer, on really old & slow hardware, and on a single box/laptop
      • Automate everything possible: regression tests, performance tests, cluster setups, post-mortem log file gathering, ...
      • Document everything possible (with good tools): Git, AsciiDoc, Graphviz, “mscgen”
    • WebMail: Post (almost) Mortem. Stuff We Would Probably Do Differently
      • Negotiate a “less aggressive” schedule
      • More hardware
      • Always double check “X & Y” before the customer tries doing “X & Y”
      • Always revisit and clean up “initial” prototypes
      • Better and “practical” code review by peers
      • Better traffic models (for finding bottlenecks, garbage collection issues, ...)
      • 100% automated unit test and code coverage analysis
    • WebMail: Summary
      • Technically, Erlang was a great fit for this large system.
        → Used another language (C++) whenever convenient.
      • UBF is a very good tool for design, implementation, and testing phases of a large project.
      • Combining UBF and QuickCheck was invaluable in finding bugs that otherwise would’ve been discovered later.
      • It’s feasible to develop real-time apps on top of a distributed key-value database.
        → Hibari’s “strong consistency” support is a large advantage.
    • Hibari: What is Hibari?
      • Hibari is a production-ready, distributed, key-value, big data store.
        → China Mobile and China Unicom - SNS
        → Japanese internet provider - GB mailbox webmail
        → Japanese mobile carrier - GB mailbox webmail
      • Hibari uses chain replication for strong consistency, high availability, and durability.
      • Hibari has excellent performance, especially for read and large value operations.
      • Hibari is open-source software under the Apache 2.0 license.
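The slides name chain replication but do not show how it works. The sketch below illustrates the general technique as it is usually described (writes enter at the head and are forwarded replica by replica, the tail acknowledges them, and reads are answered by the tail). It is not Hibari's implementation: the module `chain`, its message formats, and the 1000 ms timeouts are assumptions, the replicas keep data in memory only, and failure handling and chain repair are left out entirely.

```erlang
-module(chain).
-export([start/1, write/3, read/2]).

%% Start N in-memory replicas (N >= 1) and return {Head, Tail}.
%% Replicas are created tail-first so that each one knows its successor.
start(N) when N >= 1 ->
    Tail = spawn(fun() -> replica(none, #{}) end),
    Head = lists:foldl(
             fun(_, Next) -> spawn(fun() -> replica(Next, #{}) end) end,
             Tail, lists:seq(2, N)),
    {Head, Tail}.

%% Writes enter at the head; the acknowledgement comes from the tail,
%% so an acknowledged write has reached every replica in the chain.
write({Head, _Tail}, Key, Val) ->
    Ref = make_ref(),
    Head ! {write, Key, Val, self(), Ref},
    receive {Ref, ok} -> ok after 1000 -> {error, timeout} end.

%% Reads go to the tail, which only ever holds fully replicated updates.
read({_Head, Tail}, Key) ->
    Ref = make_ref(),
    Tail ! {read, Key, self(), Ref},
    receive {Ref, Reply} -> Reply after 1000 -> {error, timeout} end.

%% Each replica applies the write locally, then forwards it down the chain.
replica(Next, Store) ->
    receive
        {write, Key, Val, Client, Ref} ->
            Store1 = maps:put(Key, Val, Store),
            case Next of
                none -> Client ! {Ref, ok};
                _    -> Next ! {write, Key, Val, Client, Ref}
            end,
            replica(Next, Store1);
        {read, Key, Client, Ref} ->
            Client ! {Ref, maps:find(Key, Store)},
            replica(Next, Store)
    end.
```

Since the tail is the last replica to apply an update and the only one to answer reads, a client that has received `ok` for a write cannot afterwards read an older value, which is the strong-consistency property the slide refers to.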
    • Hibari: Environments
      • Hibari runs on commodity, heterogeneous servers.
      • Hibari supports Red Hat, CentOS, and Fedora Linux distributions.
        → Debian, Ubuntu, Gentoo, Mac OS X, and FreeBSD are coming soon.
      • Hibari supports Erlang/OTP R13B04.
        → R14A is coming soon.
      • Hibari supports Amazon S3, JSON-RPC-RFC4627, UBF/EBF/JSF and native Erlang client APIs.
        → Thrift is coming soon.
    • Hibari: Why Another NoSQL?
      • Durable updates: Every update is written and flushed to stable storage (fsync() system call) before sending acknowledgments to the client.
      • Consistent updates: After an update is acknowledged, no client can see an older version.
      • High availability: Each key can be replicated multiple times. As long as one copy of the key survives, all operations on that key are permitted.
    • Hibari: Why Another NoSQL? (continued)
      • Lockless API: No client operation requires locks. Optionally, Hibari supports “test-and-set” of each key-value pair via a timestamp value that the server requires to increase with every update.
      • Micro-transactions: Under limited circumstances, operations on multiple keys can be given transactional commit/abort semantics.
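The timestamp-based “test-and-set” idea can be illustrated with a small in-memory model. This is only a sketch of the general technique and does not use Hibari's client API: the module `tas`, its functions, the integer timestamps that start at 0 for a missing key, and the `ts_mismatch` error are all hypothetical.

```erlang
-module(tas).
-export([new/0, get/2, set/4]).

%% The store maps each key to {Timestamp, Value}.
new() -> #{}.

get(Store, Key) ->
    case maps:find(Key, Store) of
        {ok, {TS, Val}} -> {ok, TS, Val};
        error           -> key_not_exist
    end.

%% Write Val only if ExpectedTS equals the timestamp currently stored.
%% Use 0 as ExpectedTS for a key that is not expected to exist yet.
%% Every successful write gets a strictly larger timestamp.
set(Store, Key, Val, ExpectedTS) ->
    CurrentTS = case maps:find(Key, Store) of
                    {ok, {TS, _Old}} -> TS;
                    error            -> 0
                end,
    if
        CurrentTS =:= ExpectedTS ->
            NewTS = CurrentTS + 1,
            {ok, NewTS, maps:put(Key, {NewTS, Val}, Store)};
        true ->
            {error, {ts_mismatch, CurrentTS}}
    end.
```

A client that read a value at timestamp 1 and tries to overwrite it after another client has already moved the key to timestamp 2 gets `{error, {ts_mismatch, 2}}` and can re-read and retry, so no lock is needed to avoid lost updates.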
    • Hibari: Overview - Chain Replication [diagram]
    • Hibari: Misc - Chain Balancing [diagram]
    • Hibari: Network Partition - Admin Server [diagram]
    • Hibari: Network Partition - Chains [diagram]
    • Hibari: Network Partition - Clients [diagram]
    • Hibari: Why Erlang/OTP?
      • Functional
      • Concurrency and Distribution
      • Robustness
      • Hot code and incremental upgrade
      • Tools
        → Development, analysis, production support, ...
      • Efficiency and Productivity
        → Small teams make big impact.
      • Ericsson’s support of Erlang/OTP is wonderful
      Everything you need to build robust, high performance distributed systems!
    • What’s Next?
      • WebMail
        • improving the end-user’s experience
        • expanding the system’s capacity
        • adding new and valuable features and services
      • Hibari
        • Benchmarking - YCSB performance test
        • Thrift and Cassandra API
        • Hadoop map/reduce integration
        • ...
      • Community Building
        • Erlang and Functional Programming
          → UBF hands-on workshop(s)
        • Hibari and BigData
          → Hibari hands-on workshop(s)
          → Application developer workshop(s)
    • Work Hard, Work Smarter, Have Fun
      Thank You
      • http://www.erlang.org/
      • http://www.geminimobile.com/
      • http://www.geminimobile.jp/
      • http://hibari.sourceforge.net/
      • http://github.com/norton/ubf
      • http://github.com/norton/ubf-jsonrpc
      • http://github.com/norton/ubf-bertrpc
      Feedback, Contributors Wanted: hibari@geminimobile.com
    • Appendix, Additional Slides: Concurrency Oriented Programming
    • Java and COP
      “The only safe way to execute multiple applications, written in the Java programming language, on the same computer is to use a separate JVM for each of them, and to execute each JVM in a separate OS process. This introduces various inefficiencies in resource utilization, which downgrades performance, scalability, and application startup time.”
      – Czajkowski & Daynes, Sun Microsystems
    • JSR-000121, Application Isolation API
      JSR-000121, Application Isolation API, appears (?) to implement such process separation and inter-object communication. It defines:
      • 11 classes, 78 methods (not including constructors), and 3 exceptions.
      • Does not directly address inter-machine communication.
      • Does not directly address debugging and profiling issues.
      • “Links” are used for communication between “isolates”. However ...
      “To maintain isolation, Links provide only ‘data’ passing facilities; normal Java Objects cannot be shared by passing them. However, a limited number of object types may be passed, including byte arrays, strings, isolates, and links themselves.”
    • C/C++ and COP
      • No, neither is even close to being a COP language.
      • No processes, no memory isolation, non-portable, no GC, ...
      • Pipes, files, FIFOs, UNIX domain sockets, TCP/UDP sockets, ...
      • Advantage: You have complete freedom to create the ideal solution.
      • Disadvantage: You have complete freedom to create the ideal solution.
    • Is Erlang a COP?
      • Mostly.
      • It’s possible to “forge” an Erlang process name
        • The “E” language uses crypto for provably-difficult-to-forge process naming. (Ask Google ...)
        • Very useful for debugging, almost never used by any production system.
        • Would be possible to remove the feature from the local VM, but it would be very difficult to discriminate between “legit” and “forged” PIDs received from remote nodes.
      • Better security policies are needed for WAN-scale distribution.
      • Robust failure handling is almost all there, but programmer input is slightly more than the COP ideal.
    • Appendix, Additional Slides: Erlang Examples
    • Behaviors: A generic server
      A universal client/server, with hot code swapping.
        %% make_ref/0 is the built-in that creates a unique reply tag
        rpc(A, B) ->
            Tag = make_ref(),
            A ! {rpc, self(), Tag, B},
            receive
                {Tag, Val} -> Val
            end.
        server(Fun, Data) ->
            receive
                {new_fun, Fun1} ->
                    server(Fun1, Data);
                {rpc, From, ReplyAs, Q} ->
                    {Reply, Data1} = Fun(Q, Data),
                    From ! {ReplyAs, Reply},
                    server(Fun, Data1)
            end.
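To see the hot code swap in action, the generic server can be packaged with a start function and two interchangeable handler funs. This is a sketch for illustration only: the module name `gs_demo`, the `start/2` and `swap/2` helpers, and the `adder`/`doubler` handlers are assumptions, and the reply tag is created with the built-in `make_ref/0`.

```erlang
-module(gs_demo).
-export([start/2, rpc/2, swap/2, adder/2, doubler/2]).

%% Start a generic server with a handler fun and its initial state.
start(Fun, Data) ->
    spawn(fun() -> server(Fun, Data) end).

%% Synchronous call: tag the request so only the matching reply is accepted.
rpc(Pid, Q) ->
    Tag = make_ref(),
    Pid ! {rpc, self(), Tag, Q},
    receive
        {Tag, Val} -> Val
    end.

%% Replace the handler fun while the server keeps running and keeps its state.
swap(Pid, Fun1) ->
    Pid ! {new_fun, Fun1},
    ok.

server(Fun, Data) ->
    receive
        {new_fun, Fun1} ->
            server(Fun1, Data);
        {rpc, From, ReplyAs, Q} ->
            {Reply, Data1} = Fun(Q, Data),
            From ! {ReplyAs, Reply},
            server(Fun, Data1)
    end.

%% Two handlers with the same interface: {Reply, NewState} = Handler(Query, State).
adder({add, N}, Total)   -> {Total + N, Total + N}.
doubler({add, N}, Total) -> {Total + 2 * N, Total + 2 * N}.
```

Starting the server with `fun gs_demo:adder/2` and state 0, the request `{add, 1}` returns 1. After `gs_demo:swap(Pid, fun gs_demo:doubler/2)` the same request returns 3, because the running total survived the swap and only the behaviour changed.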