Presentation materials covering an overview of Erlang, webmail development in Erlang, and the "Hibari" use case for gigabyte-mailbox webmail, presented at the Rakuten Tech Talk on August 24, 2010
"erlang, webmail and hibari" at Rakuten tech talk
1. People, Software, WebMail, BigData
Powered by
Erlang & Functional
Programming
Gemini Mobile Technologies, Inc.
August 24, 2010
GMT Erlang August 2010 - Rakuten Tech Talk 1/55
2. Agenda
• Introduction
• Erlang/OTP
• Why Erlang/OTP?
• WebMail Case Study
• Hibari Case Study
• What’s Next?
3. Introduction
Who is Gemini Mobile Technologies?
• Founded: July, 2001
• Offices: San Mateo, CA; Shibuya, Tokyo; Star City, Beijing
• Milestones:
• 2003: Multimedia messaging service (MMS), Vodafone Japan
• 2005: MMSC, Nextel International
• 2006: MMSC, eMobile Japan
• 2006: S!Town, Softbank Mobile
• 2007: ExCast enterprise mail gateway, NTT docomo
• 2008: eXplo(tm) service, China Unicom
• 2008: Fax satellite gateway, NTT docomo
• 2009: International MMS gateway, NTT docomo
• 2010: WebMail, Japanese Mobile Carrier & Internet Provider
• 2010: Hibari/BigData, Open Source Community
• Investors: Goldman Sachs, Ignite, Mizuho Capital, Tokyo
MUFJ, Nomura, Access, Aplix
• Erlang: In production use at Japanese, Chinese, & European
telecoms for 3 years
4. Agenda
• Introduction
• Erlang/OTP
• Why Erlang/OTP?
• WebMail Case Study
• Hibari Case Study
• What’s Next?
5. Erlang/OTP
Erlang is . . .
• General purpose programming language, runtime environment
• Originally written in Prolog, now self-hosting in own
environment
• Functional language, with strict evaluation, single assignment,
and dynamic typing
• Support for concurrency, multi-core CPU, network
distribution, and fault tolerance
• Designed for soft-real-time, non-stop applications
6. Erlang/OTP
Open Telecom Platform is . . .
• Collection of libraries to support Erlang applications
• Standard support libraries: lists, trees, dictionaries, sets, files,
queues, network sockets, time manipulation, generic servers &
FSMs, basic mathematical funcs, string handling, timed
events,
• Error handling & logging, alarms, hot code
upgrade/downgrade, process tree manipulation & supervision,
...
• Protocol stacks for ASN.1, CORBA, HTTP, SSL,
Megaco/H.248, . . .
• Mnesia distributed database
• Foreign language interfaces for C/C++, Java, TCP-based
servers
7. Why Erlang?
• Software for Ericsson’s products (telecom switches, radio
gear) was getting too complex: C, C++, Pascal, EriPascal,
assembler, PLEX, . . . over 20 different languages used in
production and research labs.
• Ad hoc mechanisms for field maintenance, bugfixes, upgrades.
• “There must be a better way” . . .
• Must be high-level to provide productivity gains.
• Must support concurrency, error recovery. Soft-realtime
requires no back-tracking, very cheap thread model.
• Hot code upgrades very desirable.
• Best-known product: AXD301 ATM switch, now with 2 MLoC
Erlang, plus another 1+ MLoC C and C++ (proprietary h/w
drivers, third-party firmware/drivers and protocol stacks)
8. Erlang Timeline
http://www.erlang.org/course/history.html
• 1982-85: Language surveys
• 1985-86: Experiments with LISP, Prolog, Parlog.
• 1988: First Ericsson PBX product to use Erlang (in Prolog)
• 1989: Experimental rewrite of switch code, PLEX -> Erlang,
10x programmer efficiency. First non-Prolog-based interpreter.
• 1990: Conference papers, Erlang spreads to Bellcore & others.
• 1992: Ports to VxWorks, PC, Macintosh. First two “real”
Ericsson products start using Erlang.
• 1993: Network distribution added. Spinoff organization to
support Erlang development.
9. Erlang Timeline
continued . . .
• 1995: Ericsson AXE-N product collapses (non-Erlang). The
replacement AXD starts with Erlang.
• 1998: Erlang banned for new products: it wasn’t C++ :(
• 1998: Erlang open-sourced, new companies spin off
• Today . . .
• Erlang still used in Ericsson (despite ban): productivity is too
high
• AXD301 has 11% of world market (market leader), runs
British Telecom’s country-wide ATM network, handles 30-40
million calls/week (avg 49-66 calls/sec), has experienced 31
milliseconds of downtime per year (9 “nines” reliability)
• Active and Growing Open Source Community
10. Erlang Overview
• Concurrency: User-space thread model (extremely cheap to
create, switch contexts, destroy), now support for multiple
CPUs and multi-core CPUs. Such threads are really
“processes”.
• Distribution: All inter-process communication by message
passing. Multiple Erlang VMs (virtual machines)
communicate transparently via TCP. Same syntax used for
message passing for intra- and inter-node communication.
• Robustness: All processes are isolated, no data sharing.
Reliable detection of crashed processes, even on remote nodes.
• Hot code upgrade: old and new code can run simultaneously
during code upgrade. Support for data structure changes,
module dependencies, etc.
11. Erlang Overview
continued . . .
• External interfaces: via Erlang message passing over TCP,
“standard” TCP & UDP protocols, UNIX pipes, shared library
API interface.
• Portable: Same VM runs on Linux & UNIX, Windows,
Macintosh, VxWorks. Message passing between heterogeneous
systems not a problem.
• Many programming errors avoided by: garbage collected data
structures, single-assignment variables, robust exception
handling and inter-node communication
12. Concurrency Oriented Programming
• Utterly independent processes: imagine they’re on different
machines!
• Process semantics: No data sharing, copy-everything message
passing
→ Sharing means: inefficient (distribution is Hard), complicated
(mutexes, condition variables, write barriers, etc.)
• No penalty for massive parallelism (e.g. tens of thousands of
processes)
• Each process has an unforgeable name
• To send a message, the recipient’s process name is required
• Message passing semantics are unreliable, “send and pray”
• Reliable monitoring of remote processes: when and why
• No unavoidable penalty for distribution
• Same behavior on any hosted OS
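The process semantics above can be sketched in a few lines of Erlang. This is a minimal illustration, not code from the talk: the module name cop_demo and the message shapes {add, K}, {get, From}, and {count, N} are assumptions made up for the example. The counter's state lives only inside its own process, and the caller can reach it only by sending messages to its process identifier.

```erlang
-module(cop_demo).
-export([run/0]).

%% Spawn an isolated counter process and talk to it only via messages.
run() ->
    Counter = spawn(fun() -> loop(0) end),
    Counter ! {add, 1},               % asynchronous send, no reply expected
    Counter ! {add, 2},
    Counter ! {get, self()},          % include our own pid so it can answer
    receive
        {count, N} -> {ok, N}
    after 1000 ->
        %% "Send and pray": without a reply we cannot tell what happened.
        timeout
    end.

%% The counter's state is the loop argument; nothing is shared.
loop(Count) ->
    receive
        {add, K}    -> loop(Count + K);
        {get, From} -> From ! {count, Count}   % reply, then terminate
    end.
```

Calling cop_demo:run() should return {ok, 3}, because messages between one pair of processes arrive in the order they were sent. The timeout clause is the caller's only defense against a lost message or a dead recipient.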
13. Why use a Concurrency Oriented Programming
language?
• The world is parallel. And distributed.
• Things fail.
• The biggest challenge is using the proper degree of parallelism
in a COP program . . . but it’s difficult to err when processes
are cheap.
• Programs are automatically scalable: if it works on 1 CPU, it
works on many.
• Programs are automatically robust when a process fails, no
matter where the process is located.
See Appendix for additional information.
14. Erlang in 11 Examples
“One minute per example” text courtesy of Joe Armstrong
• Sequential Erlang: 5 examples
• Concurrent Erlang: 2 examples
• Distributed Erlang: 1 example
• Fault-tolerant Erlang: 2 examples
• Bit syntax: 1 example
See Appendix for additional information.
20. Fault Tolerance: catch/throw
...
case (catch foo(A, B)) of
{abnormal_case1, Y} ->
...
{'EXIT', Opps} ->
...
Val ->
...
end,
...
foo(A, B) ->
...
throw({abnormal_case1, ...})
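A complete, compilable variant of the fragment above might look like the following sketch. The module name catch_demo and the functions classify/1 and risky/1 are hypothetical stand-ins for the elided parts of the slide; only the three-way case on the result of catch follows the original.

```erlang
-module(catch_demo).
-export([classify/1]).

%% Distinguish a thrown term, a crash, and a normal return value.
classify(X) ->
    case (catch risky(X)) of
        {abnormal_case1, Y} -> {thrown, Y};   % value passed to throw/1
        {'EXIT', _Reason}   -> crashed;       % run-time error inside risky/1
        Val                 -> {ok, Val}      % ordinary return value
    end.

%% Hypothetical worker: throws on 0, crashes on non-integers.
risky(0) -> throw({abnormal_case1, zero});
risky(X) when is_integer(X) -> 100 div X;
risky(X) -> erlang:error({bad_arg, X}).
```

With these definitions, classify(0) returns {thrown, zero}, classify(4) returns {ok, 25}, and classify(foo) returns crashed. Newer code often prefers try ... catch, which separates the three cases without relying on the shape of the returned term.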
21. Fault Tolerance: monitor a process
...
process_flag(trap_exit, true),
Pid = spawn_link(fun() -> ... end),
receive
{'EXIT', Pid, Why} ->
...
end
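The fragment above can be filled in as a small runnable sketch. The module name link_demo and the exit reason out_of_luck are assumptions chosen for illustration; the trap_exit flag, spawn_link, and the {'EXIT', Pid, Why} message are as on the slide.

```erlang
-module(link_demo).
-export([run/0]).

%% Link to a worker that dies, and observe when and why it died.
run() ->
    %% Convert exit signals from linked processes into ordinary messages.
    Old = process_flag(trap_exit, true),
    Pid = spawn_link(fun() -> exit(out_of_luck) end),
    Result = receive
                 {'EXIT', Pid, Why} -> {worker_died, Why}
             after 1000 ->
                 timeout
             end,
    process_flag(trap_exit, Old),     % restore the caller's previous setting
    Result.
```

Calling link_demo:run() should return {worker_died, out_of_luck}. Without trap_exit set to true, the abnormal exit of the linked worker would terminate the calling process as well.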
22. Parsing an IP datagram
-define(IP_VERSION, 4).
-define(IP_MIN_HDR_LEN,5).
DgramSize = size(Dgram),
case Dgram of
<<?IP_VERSION:4, HLen:4,
SrvcType:8, TotLen:16, ID:16, Flgs:3,
FragOff:13, TTL:8, Proto:8, HdrChkSum:16,
SrcIP:32, DestIP:32, Body/binary>> when
HLen >= 5, 4*HLen =< DgramSize ->
OptsLen = 4*(HLen - ?IP_MIN_HDR_LEN),
<<Opts:OptsLen/binary,Data/binary>> = Body,
...
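For readers who want to run the pattern above, here is one way to wrap it in a module. This is a sketch under assumptions: the module name ip_demo, the helper sample/0, the returned property list, and the {error, bad_datagram} fallback are all additions for illustration. byte_size/1 is used in place of size/1, which gives the same result for binaries.

```erlang
-module(ip_demo).
-export([parse/1, sample/0]).

-define(IP_VERSION, 4).
-define(IP_MIN_HDR_LEN, 5).

%% Split an IPv4 datagram into selected header fields, options, and payload.
parse(Dgram) when is_binary(Dgram) ->
    DgramSize = byte_size(Dgram),
    case Dgram of
        <<?IP_VERSION:4, HLen:4,
          _SrvcType:8, TotLen:16, _ID:16, _Flgs:3,
          _FragOff:13, TTL:8, Proto:8, _HdrChkSum:16,
          SrcIP:32, DestIP:32, Body/binary>> when
              HLen >= ?IP_MIN_HDR_LEN, 4*HLen =< DgramSize ->
            %% Header length is counted in 32-bit words; options follow
            %% the fixed 20-byte header.
            OptsLen = 4*(HLen - ?IP_MIN_HDR_LEN),
            <<Opts:OptsLen/binary, Data/binary>> = Body,
            {ok, [{total_len, TotLen}, {ttl, TTL}, {proto, Proto},
                  {src, SrcIP}, {dst, DestIP},
                  {opts, Opts}, {data, Data}]};
        _ ->
            {error, bad_datagram}
    end.

%% Hypothetical test datagram: no options, TTL 64, protocol 17 (UDP),
%% 10.0.0.1 -> 10.0.0.2, with a 5-byte payload.
sample() ->
    Payload = <<"hello">>,
    TotLen = 20 + byte_size(Payload),
    <<?IP_VERSION:4, ?IP_MIN_HDR_LEN:4, 0:8, TotLen:16, 0:16, 0:3,
      0:13, 64:8, 17:8, 0:16,
      16#0A000001:32, 16#0A000002:32, Payload/binary>>.
```

Usage: ip_demo:parse(ip_demo:sample()) returns {ok, Fields}, where proplists:get_value(data, Fields) is the binary <<"hello">>. Input that is too short or has the wrong version falls through to {error, bad_datagram}.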
23. Agenda
• Introduction
• Erlang/OTP
• Why Erlang/OTP?
• WebMail Case Study
• Hibari Case Study
• What’s Next?
24. Why Erlang/OTP?
A Killer App
In 2008, Gemini deployed its first commercial Erlang-based
product . . . a high-performance “User Profile” storage server as
part of a larger system.
What wasn’t selected?
• LDAP - persistent, fast . . . but no transactions
• RDBMS - persistent, transactions . . . but too slow
Why was Erlang selected?
• Mnesia - persistent, fast, and transactions
• plus many other benefits (programmable, high quality, and
open source!)
and we haven’t looked back since . . . no regrets!
25. Why Erlang/OTP?
What have we learned?
Erlang and functional programming have taught us some good
practices and lessons:
• lots of processes and message passing can be cheap
• shared and mutable data can be (are) evil
• side-effects can be (are) evil
• let it crash! . . . defensive programming is evil
• don’t (over) optimize too soon . . . the bottlenecks aren’t
always where you expect
• keep it simple . . . less is more
• don’t be afraid to re-factor . . . when you have the right tools
• distributed systems can still be (are) difficult and complex
26. Agenda
• Introduction
• Erlang/OTP
• Why Erlang/OTP?
• WebMail Case Study
• Hibari Case Study
• What’s Next?
27. WebMail: Multi-Tier Architecture
20K Meter View
MTA ISP MOBILE PC O&M
SMTP/POP/IMAP HTTP
LDAP CLIENT API
FRONT API
AUTH API BACK API
DIRECTORY STORE DATA STORE
28. WebMail: Multi-Tier Architecture
10K Meter View
O&M MOBILE PC ISP MTA
HTTP SMTP/POP/IMAP
M2CI I/F LDAP
M2FE I/F
M2BE I/F M2FE AUTH I/F
M2FE JOBQ I/F AUTH I/F
HIBARI MNESIA MNESIA
29. WebMail: Erlang
What’s It Doing?
• All core processing for the “webmail” application
• JSON-RPC with the Web browser-based UI (based on UBF)
• HTTP and LDAP with authentication and proxy to full-text
indexing services
• UBF for most inter-application communication
• Interface with C++ components for speed, legacy protocol
support, and code re-use
• Application/Transaction logging and message tracing
• Hibari distributed, scalable key-value store for all persistent
data
• Mnesia for job queuing and multi-indexed profile data
30. WebMail: Hibari
Key-Value Storage for (Almost) Everything
• Profile Store
• User
• Mail
• Mail Incoming & Outgoing Filters
• User Interface
• External ISP
• Address Book Store
• vCards - Singletons & Packs
• Labels - Folders, Flags, and User-Defined
• Mail Store
• Messages - Singletons & Packs
• Message Summaries - Singletons & Packs
• Meta Data - Next Uid, Quotas, . . .
• Labels - Folders, Flags, and User-Defined
• Quota Policy Store
31. WebMail: Mnesia
Storage for Everything Else
• Subset of Profile Store
• Indexing & retrieval by various attributes
• The WebMail application keeps Mnesia and Hibari
synchronized for provisioning, updates, and deprovisioning
• The WebMail application uses Hibari as the master copy
• Job Queue
• Outgoing mail, bounce messages, vacation messages, . . .
• Notifications to external text indexer
• Asynchronous mail deletion
• Asynchronous user deprovisioning
• ...
Possible with Hibari-based storage, but Mnesia was easier (at the
project start).
32. WebMail: Post (almost) Mortem
Stuff We’ll Repeat
• Erlang, the secret sauce
→ Ericsson’s support of Erlang/OTP is wonderful
• UBF, QuickCheck, & UBF+QuickCheck
→ Auto-compilation of QuickCheck generators from UBF
contracts
• Test in various environments:
→ Exactly the same hardware as customer, on really old & slow
hardware, and on a single box/laptop
• Automate everything possible: regression tests, performance
tests, cluster setups, post-mortem log file gathering, . . .
• Document everything possible (with good tools): Git,
AsciiDoc, Graphviz, “mscgen”
33. WebMail: Post (almost) Mortem
Stuff We Would Probably Do Differently
• Negotiate “less aggressive” schedule
• More hardware
• Always double check “X & Y” before customer tries
doing “X & Y”
• Always revisit and clean up “initial” prototypes
• Better and “practical” code review by peers
• Better traffic models (for finding bottlenecks, garbage
collection issues, . . . )
• 100% automated unit test and code coverage analysis
34. WebMail: Summary
• Technically, Erlang was a great fit for this large system.
→ Used another language (C++) whenever convenient.
• UBF is a very good tool for design, implementation, and
testing phases of a large project.
• Combining UBF and QuickCheck was invaluable in finding
bugs that otherwise would’ve been discovered later.
• It’s feasible to develop real-time apps on top of a distributed
key-value database.
→ Hibari’s “strong consistency” support is a large advantage.
35. Agenda
• Introduction
• Erlang/OTP
• Why Erlang/OTP?
• WebMail Case Study
• Hibari Case Study
• What’s Next?
36. Hibari
What is Hibari?
• Hibari is a production-ready, distributed, key-value, big data
store.
→ China Mobile and China Unicom - SNS
→ Japanese internet provider - GB mailbox webmail
→ Japanese mobile carrier - GB mailbox webmail
• Hibari uses chain replication for strong consistency,
high-availability, and durability.
• Hibari has excellent performance especially for read and large
value operations.
• Hibari is open-source software under the Apache 2.0 license.
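To give a feel for chain replication, here is a toy sketch of the general technique. It is not Hibari's implementation or API: the module name chain_demo, the functions start/1, write/3, and read/2, and the message formats are all assumptions for illustration. The sketch follows the usual textbook description, in which a write enters at the head, is passed replica by replica down the chain, and is acknowledged by the tail, while reads are served by the tail so that a client never sees an update that has not reached every replica.

```erlang
-module(chain_demo).
-export([start/1, write/3, read/2]).

%% Build a chain of N replica processes; returns {Head, Tail}.
start(N) when is_integer(N), N >= 1 ->
    Tail = spawn(fun() -> replica(none, orddict:new()) end),
    Head = lists:foldl(
             fun(_, Next) ->
                     spawn(fun() -> replica(Next, orddict:new()) end)
             end, Tail, lists:seq(2, N)),
    {Head, Tail}.

%% Each replica stores the update, then forwards it or acknowledges it.
replica(Next, Store) ->
    receive
        {put, Key, Val, Client} ->
            NewStore = orddict:store(Key, Val, Store),
            case Next of
                none -> Client ! {ack, Key};             % tail acknowledges
                _    -> Next ! {put, Key, Val, Client}   % pass down the chain
            end,
            replica(Next, NewStore);
        {get, Key, Client} ->
            Client ! {value, Key, orddict:find(Key, Store)},
            replica(Next, Store)
    end.

%% Writes go to the head and complete when the tail acknowledges.
write({Head, _Tail}, Key, Val) ->
    Head ! {put, Key, Val, self()},
    receive {ack, Key} -> ok after 1000 -> timeout end.

%% Reads go to the tail, which holds only fully replicated updates.
read({_Head, Tail}, Key) ->
    Tail ! {get, Key, self()},
    receive {value, Key, Result} -> Result after 1000 -> timeout end.
```

Usage: after C = chain_demo:start(3) and ok = chain_demo:write(C, k, v), the call chain_demo:read(C, k) returns {ok, v}. The sketch omits everything that makes a real system hard: durable storage, failure detection, and chain repair. The replica processes also run until the node stops.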
37. Hibari
Environments
• Hibari runs on commodity, heterogeneous servers.
• Hibari supports Red Hat, CentOS, and Fedora Linux
distributions.
→ Debian, Ubuntu, Gentoo, Mac OS X, and FreeBSD are
coming soon.
• Hibari supports Erlang/OTP R13B04.
→ R14A is coming soon.
• Hibari supports Amazon S3, JSON-RPC-RFC4627,
UBF/EBF/JSF and native Erlang client APIs.
→ Thrift is coming soon.
38. Hibari
Why Another NoSQL?
• Durable updates: Every update is written and flushed to stable
storage (fsync() system call) before sending
acknowledgments to the client.
• Consistent updates: After an update is acknowledged, no client
can see an older version.
• High availability: Each key can be replicated multiple times. As
long as one copy of the key survives, all operations
on that key are permitted.
39. Hibari
Why Another NoSQL?
• Lockless API: No client operation requires a lock.
Optionally, Hibari supports “test-and-set” of each
key-value pair via an increasing (enforced by the
server) timestamp value.
• Micro-transactions: Under limited circumstances, operations on
multiple keys can be given transactional
commit/abort semantics.
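The timestamp-based “test-and-set” idea can be illustrated with a toy, single-process store. This is a sketch only and does not use Hibari's client API: the module name tas_demo, the functions new/0, set/4, and lookup/2, the error terms, and the use of a simple counter as the timestamp are all assumptions. A write succeeds only if the caller presents the timestamp currently stored with the key, so a client holding a stale copy cannot overwrite a newer value, and no lock is ever held.

```erlang
-module(tas_demo).
-export([new/0, set/4, lookup/2]).

%% The store maps Key -> {Timestamp, Value}.
new() -> orddict:new().

%% Write Value only if ExpectedTS matches the stored timestamp.
%% Pass the atom 'none' as ExpectedTS to create a key that must not exist.
%% Returns {Reply, NewStore}.
set(Store, Key, Value, ExpectedTS) ->
    case orddict:find(Key, Store) of
        {ok, {CurrentTS, _Old}} when CurrentTS =/= ExpectedTS ->
            {{error, {ts_mismatch, CurrentTS}}, Store};
        {ok, {CurrentTS, _Old}} ->
            NewTS = CurrentTS + 1,    % the store, not the client, advances it
            {{ok, NewTS}, orddict:store(Key, {NewTS, Value}, Store)};
        error when ExpectedTS =:= none ->
            {{ok, 1}, orddict:store(Key, {1, Value}, Store)};
        error ->
            {{error, key_not_exist}, Store}
    end.

%% Returns {ok, {Timestamp, Value}} or error.
lookup(Store, Key) ->
    orddict:find(Key, Store).
```

Usage: starting from S0 = tas_demo:new(), the call tas_demo:set(S0, k, <<"v1">>, none) returns {{ok, 1}, S1}. A second writer that still presents timestamp 1 after the value has moved to timestamp 2 gets {error, {ts_mismatch, 2}} and must re-read before retrying.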
45. Hibari
Why Erlang/OTP?
• Functional
• Concurrency and Distribution
• Robustness
• Hot code and incremental upgrade
• Tools
→ Development, analysis, production support, . . .
• Efficiency and Productivity
→ Small teams make big impact.
• Ericsson’s support of Erlang/OTP is wonderful
Everything you need to build robust, high performance distributed
systems!
46. Agenda
• Introduction
• Erlang/OTP
• Why Erlang/OTP?
• WebMail Case Study
• Hibari Case Study
• What’s Next?
47. What’s Next?
• WebMail
• improving the end-user’s experience
• expanding the system’s capacity
• adding new and valuable features and services
• Hibari
• Benchmarking - YCSB performance test
• Thrift and Cassandra API
• Hadoop map/reduce integration
• ...
• Community Building
• Erlang and Functional Programming
→ UBF hands-on workshop(s)
• Hibari and BigData
→ Hibari hands-on workshop(s)
→ Application developer workshop(s)
48. Work Hard, Work Smarter, Have Fun
Thank You
http://www.erlang.org/
http://www.geminimobile.com/
http://www.geminimobile.jp/
http://hibari.sourceforge.net/
http://github.com/norton/ubf
http://github.com/norton/ubf-jsonrpc
http://github.com/norton/ubf-bertrpc
Feedback, Contributors Wanted: hibari@geminimobile.com
50. Java And COP
“The only safe way to execute multiple applications,
written in the Java programming language, on the same
computer is to use a separate JVM for each of them, and
to execute each JVM in a separate OS process. This
introduces various inefficiencies in resource utilization,
which downgrades performance, scalability, and
application startup time.”
– Czajkowski & Daynes, Sun Microsystems
51. JSR-000121, Application Isolation API
JSR-000121, Application Isolation API, appears (?) to implement
such process separation and inter-object communication. It defines:
• 11 classes, 78 methods (not including constructors), and 3
exceptions.
• Does not directly address inter-machine communication.
• Does not directly address debugging and profiling issues.
• “Links” are used for communication between “isolates”.
However . . .
“To maintain isolation, Links provide only ‘data’ passing
facilities; normal Java Objects cannot be shared by
passing them. However, a limited number of object types
may be passed, including byte arrays, strings, isolates,
and links themselves.”
52. C/C++ and COP
• No, neither is even close to being a COP.
• No processes, no memory isolation, non-portable, no GC, . . .
• Pipes, files, FIFOs, UNIX domain sockets, TCP/UDP sockets,
...
• Advantage: You have complete freedom to create the ideal
solution.
• Disadvantage: You have complete freedom to create the ideal
solution.
53. Is Erlang a COP?
• Mostly.
• It’s possible to “forge” an Erlang process name
• The “E” language uses crypto for provably-difficult-to-forge
process naming. (Ask Google. . . )
• Very useful for debugging, almost never used by any
production system.
• Would be possible to remove feature from local VM, but would
be very difficult to discriminate between “legit” vs. “forged”
PIDs received from remote nodes.
• Better security policies are needed for WAN-scale distribution.
• Robust failure handling is almost all there, but programmer
input is slightly more than the COP ideal.