The Internet-of-things: Architecting for the deluge of data


My presentation from Velocity 2014.

Published in: Technology

  1. 1. The Internet-of-things: Architecting for the deluge of data CTO Bryan Cantrill @bcantrill
  2. 2. Big Data circa 1994: Pre-Internet Source: BusinessWeek, September 5, 1994
  3. 3. Aside: Internet circa 1994 Source: BusinessWeek, October 10, 1994
  4. 4. Big Data circa 2004: Internet exhaust • Through the 1990s, Moore’s Law + Kryder’s Law grew faster than transaction rates, and what was “overwhelming” in 1994 was manageable by 2004 • But large internet concerns (Google, Facebook, Yahoo!) encountered a new class of problem: analyzing massive amounts of data emitted as a byproduct of activity • Data scaled with activity, not transactions — changing both data sizes and economics • Data sizes were too large for extant data warehousing solutions — and were embarrassingly parallel besides
  5. 5. Big Data circa 2004: MapReduce • MapReduce, pioneered by Google and later emulated by Hadoop, pointed to a new paradigm where compute tasks are broken into map and reduce phases • Serves to explicitly divide the work that can be parallelized from that which must be run sequentially • Map phases are farmed out to a storage layer that attempts to co-locate them with the data being mapped • Made for commodity scale-out systems; relatively cheap storage allowed for sloppy but effective solutions (e.g. storing data in triplicate)
  6. 6. Big Data circa 2014 • Hadoop has become the de facto big data processing engine — and HDFS the de facto storage substrate • But HDFS is designed around availability during/for computation; it is not designed to be authoritative • HDFS is used primarily for data that is redundant, transient, replaceable or otherwise fungible • Authoritative storage remains either enterprise storage (on premises) or object storage (in the cloud) • For analysis of non-fungible data, pattern is to ingest data into a Hadoop cluster from authoritative storage • But a new set of problems is poised to emerge...
  7. 7. Big Data circa 2014: Internet-of-things • IDC forecasts that the “digital universe” will grow from 130 exabytes in 2005 to 40,000 exabytes in 2020, with as much as a third having “analytic value” • This doesn’t even factor in the (long forecasted) rise of the internet-of-things/industrial internet... • Machine-generated data at the edge will effect a step function in data sizes and processing methodologies • No one really knows how much data will be generated by IoT, but the numbers are insane (e.g., an HD camera generates 20 GB/hour; a Ford Energi engine generates 25 GB/hour; a GE jet engine generates 1 TB/flight)
  8. 8. How to cope with IoT-generated data? • IoT presents so much more data that we will increasingly need data science to make sense of it • To assure data, we need to retain as much raw data as possible, storing it once and authoritatively • Storing data authoritatively has ramifications for the storage substrate • To allow for science, we need to place an emphasis on hypothesis exploration: it must be quick to iterate from hypothesis to experiment to result to new hypothesis • Emphasizing hypothesis exploration has ramifications for the compute abstractions and data movement
  9. 9. The coming ramifications of IoT • It will no longer be acceptable to discard data: all data will need to be retained to explore future hypotheses • It will no longer be acceptable to store three copies: 3X on storage costs is too acute when data is massive • It will no longer be acceptable to move data for analysis: in some cases, not even over the internet! • It will no longer be acceptable to dictate the abstraction: we must accommodate anything that can process data • These shifts are as significant as the shift from traditional data warehousing to scale-out MapReduce!
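To make the cost of triplication concrete, here is a rough back-of-the-envelope sketch. It uses the 20 GB/hour figure quoted earlier in the deck for an HD camera; the function name `retained_gb` and the assumption that the camera runs continuously for a full year are illustrative, not taken from the talk.

```shell
# retained_gb RATE HOURS COPIES: GB stored for one device emitting RATE GB/hour
# for HOURS hours, kept in COPIES copies. Integer arithmetic only.
retained_gb() {
    echo $(( $1 * $2 * $3 ))
}

hours_per_year=$(( 24 * 365 ))       # 8760; assumes continuous operation (hypothetical)
retained_gb 20 "$hours_per_year" 1   # one HD camera, single authoritative copy
retained_gb 20 "$hours_per_year" 3   # the same camera, stored in triplicate
```

Under those assumptions a single camera accounts for 175200 GB per year stored once, and 525600 GB when stored three times, before multiplying by the number of devices.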
  10. 10. IoT: Authoritative storage? • “Filesystems” that are really just user-level programs layered on local filesystems lack device-level visibility, sacrificing reliability and performance • Even in-kernel, we have seen the corrosiveness of an abstraction divide in the historic divide between logical volume management and the filesystem: • The volume manager understands multiple disks, but nothing of the higher level semantics of the filesystem • The filesystem understands the higher semantics of the data, but has no physical device understanding • This divide became entrenched over the 1990s, and had devastating ramifications for reliability and performance
  11. 11. The ZFS revolution • Starting in 2001, Sun began a revolutionary new software effort: to unify storage and eliminate the divide • In this model, filesystems would lose their one-to-one association with devices: many filesystems would be multiplexed on many devices • By starting with a clean sheet of paper, ZFS opened up vistas of innovation — and by its architecture was able to solve many otherwise intractable problems • Sun shipped ZFS in 2005, and used it as the foundation of its enterprise storage products starting in 2008 • ZFS was open sourced in 2005; it remains the only open source enterprise-grade filesystem
  12. 12. ZFS advantages • Copy-on-write design allows on-disk consistency to be always assured (eliminating file system check) • Copy-on-write design allows constant-time snapshots in unlimited quantity — and writable clones! • Filesystem architecture allows filesystems to be created instantly and expanded — or shrunk! — on-the-fly • Integrated volume management allows for intelligent device behavior with respect to disk failure and recovery • Adaptive replacement cache (ARC) allows for optimal use of DRAM — especially on high DRAM systems • Support for dedicated log and cache devices allows for optimal use of flash-based SSDs
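The snapshot and writable-clone features listed above correspond to the `zfs snapshot` and `zfs clone` subcommands. The following is a minimal sketch, not Joyent's tooling: the dataset names are hypothetical, the wrapper function `snapshot_and_clone` is invented for illustration, and by default it only prints the commands rather than running them.

```shell
# Dry-run by default: commands are echoed. Set ZFS_CMD=zfs to execute them
# for real (requires a ZFS pool and suitable privileges).
ZFS_CMD="${ZFS_CMD:-echo zfs}"

# snapshot_and_clone DATASET SNAPNAME CLONE
# Takes a snapshot of DATASET, then creates a writable clone of that snapshot.
snapshot_and_clone() {
    dataset="$1"; snap="$2"; clone="$3"
    # ZFS_CMD is deliberately unquoted so "echo zfs" splits into two words
    $ZFS_CMD snapshot "${dataset}@${snap}"
    $ZFS_CMD clone "${dataset}@${snap}" "$clone"
}
```

For example, `snapshot_and_clone tank/telemetry baseline tank/experiment1` would give an experiment its own writable view of the data as of the snapshot, without copying it.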
  13. 13. ZFS at Joyent • Joyent was the earliest ZFS adopter: becoming (in 2005) the first production user of ZFS outside of Sun • ZFS is one of the four foundational technologies of Joyent’s SmartOS, our illumos derivative • The other three foundational technologies in SmartOS are DTrace, Zones and KVM • Search “fork yeah illumos” for the (uncensored) history of OpenSolaris, illumos, SmartOS and derivatives • Joyent has extended ZFS to provide better support for multi-tenant operation with I/O throttling
  14. 14. ZFS as the basis for IoT? • ZFS offers commodity hardware economics with enterprise-grade reliability — and obviates the need for cross-machine mirroring for durability • But ZFS is not itself a scale-out distributed system, and is ill suited to become one • Conclusion: ZFS is a good building block for the data explosion from IoT, but not the whole puzzle
  15. 15. IoT: Compute abstraction? • To facilitate hypothesis exploration, we need to carefully consider the abstraction for computation • How is data exploration programmatically expressed? • How can this be made to be multi-tenant? • The key enabling technology for multi-tenancy is virtualization — but where in the stack to virtualize?
  16. 16. Hardware-level virtualization? • The historical answer, since the 1960s, has been to virtualize at the level of the hardware: • A virtual machine is presented upon which each tenant runs an operating system of their choosing • There are as many operating systems as tenants • The historical motivation for hardware virtualization remains its advantage today: it can run entire legacy stacks unmodified • However, hardware virtualization exacts a heavy toll: operating systems are not designed to share resources like DRAM, CPU, I/O devices or the network • Hardware virtualization limits tenancy and inhibits performance!
  17. 17. Platform-level virtualization? • Virtualizing at the application platform layer addresses the tenancy challenges of hardware virtualization… • ...but at the cost of dictating abstraction to the developer • With IoT, this is especially problematic: we can expect much more analog data and much deeper numerical analysis, and dependencies on native libraries and/or domain-specific languages • Virtualizing at the application platform layer poses many other challenges: • Security, resource containment, language specificity, environment-specific engineering costs
  18. 18. Joyent’s solution: OS containers • Containers virtualize the OS and hit the sweet spot: • Single OS (single kernel) allows for efficient use of hardware resources, and therefore allows load factors to be high • Disjoint instances are securely compartmentalized by the operating system • Gives customers what appears to be a virtual machine (albeit a very fast one) on which to run higher-level software • Gives customers PaaS when the abstractions work for them, IaaS when they need more generality • OS-level virtualization allows for high levels of tenancy without dictating abstraction or sacrificing efficiency • Zones is a bullet-proof implementation of OS-level virtualization, and is the core abstraction in Joyent’s SmartOS
  19. 19. Idea: ZFS + Containers?
  20. 20. Manta: ZFS + Containers! • Building a sophisticated distributed system on top of ZFS and zones, we have built Manta, an internet-facing object storage system offering in situ compute • That is, the description of compute can be brought to where objects reside instead of having to backhaul objects to transient compute • The abstractions made available for computation are anything that can run on the OS... • ...and as a reminder, the OS (Unix) was built around the notion of ad hoc unstructured data processing, and allows for remarkably terse expressions of computation
  21. 21. Aside: Unix • When Unix appeared in the early 1970s, it was not just a new system, but a new way of thinking about systems • Instead of a sealed monolith, the operating system was a collection of small, easily understood programs • First Edition Unix (1971) contained many programs that we still use today (ls, rm, cat, mv) • Its very name conveyed this minimalist aesthetic: Unix is a homophone of “eunuchs” — a castrated Multics We were a bit oppressed by the big system mentality. Ken wanted to do something simple. — Dennis Ritchie
  22. 22. Unix: Let there be light • In 1969, Doug McIlroy had the idea of connecting different components: At the same time that Thompson and Ritchie were sketching out a file system, I was sketching out how to do data processing on the blackboard by connecting together cascades of processes • This was the primordial pipe, but it took three years to persuade Thompson to adopt it: And one day I came up with a syntax for the shell that went along with the piping, and Ken said, “I’m going to do it!”
  23. 23. Unix: ...and there was light And the next morning we had this orgy of one-liners. — Doug McIlroy
  24. 24. The Unix philosophy • The pipe — coupled with the small-system aesthetic — gave rise to the Unix philosophy, as articulated by Doug McIlroy: • Write programs that do one thing and do it well • Write programs to work together • Write programs that handle text streams, because that is a universal interface • Four decades later, this philosophy remains the single most important revolution in software systems thinking!
  25. 25. Doug McIlroy v. Don Knuth: FIGHT! • In 1986, Jon Bentley posed the challenge that became the Epic Rap Battle of computer science history: Read a file of text, determine the n most frequently used words, and print out a sorted list of those words along with their frequencies. • Don Knuth’s solution: an elaborate program in WEB, a Pascal-like literate programming system of his own invention, using a purpose-built algorithm • Doug McIlroy’s solution shows the power of the Unix philosophy: tr -cs A-Za-z '\n' | tr A-Z a-z | sort | uniq -c | sort -rn | sed ${1}q
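McIlroy's pipeline reads text on standard input and takes n as the script's first argument. A minimal sketch that wraps it in a shell function, with each stage annotated, might look like this; the function name `top_words` is hypothetical, and the second argument to the first `tr` is written as the newline escape `'\n'`.

```shell
# top_words N: read text on stdin, print the N most frequent words with counts.
# The function name is illustrative; the pipeline stages are McIlroy's.
top_words() {
    tr -cs A-Za-z '\n' |   # turn each run of non-letters into a single newline
    tr A-Z a-z |           # fold everything to lower case
    sort |                 # bring identical words together
    uniq -c |              # collapse each group to "count word"
    sort -rn |             # most frequent first
    sed "${1}q"            # quit after N lines
}
```

It could then be invoked as `top_words 10 < somefile.txt`, where `somefile.txt` stands for any text file.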
  26. 26. Big Data: History repeats itself? • The original Google MapReduce paper (Dean et al., OSDI ’04) poses a problem disturbingly similar to Bentley’s challenge nearly two decades prior: Count of URL Access Frequency: The map function processes logs of web page requests and outputs ⟨URL, 1⟩. The reduce function adds together all values for the same URL and emits a ⟨URL, total count⟩ pair • But the solutions do not adhere to the Unix philosophy... • ...and nor do they make use of the substantial Unix foundation for data processing • e.g., Appendix A of the OSDI ’04 paper has a 71 line word count in C++, with nary a wc in sight
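For comparison, the URL access frequency problem can itself be sketched as a short pipeline in the Unix style. This is an illustration, not code from the paper or the talk: the function name `url_counts` is hypothetical, and it assumes logs in Common Log Format, where the request path is the seventh whitespace-separated field.

```shell
# url_counts: read access-log lines on stdin, print "count URL" pairs, busiest first.
# Assumption: Common Log Format, with the request path in field 7.
url_counts() {
    awk '{ print $7 }' |   # "map": emit one URL per request
    sort |                 # group identical URLs together
    uniq -c |              # "reduce": count each group
    sort -rn               # order by count, descending
}
```

A log in a different format would only need the `awk` field number changed.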
  27. 27. Manta: Unix for Big Data and IoT • Manta allows for an arbitrarily scalable variant of McIlroy’s solution to Bentley’s challenge: mfind -t o /bcantrill/public/v7/usr/man | mjob create -o -m "tr -cs A-Za-z '\n' | tr A-Z a-z | sort | uniq -c" -r "awk '{ x[\$2] += \$1 } END { for (w in x) { print x[w] \" \" w } }' | sort -rn | sed ${1}q" • This description is not only terse, it is high performing: data is left at rest, with the “map” phase doing heavy reduction of the data stream • As such, Manta, like Unix, is not merely syntactic sugar; it converges compute and data in a new way
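The shape of that job, one independent map per object followed by a single reduce that merges the partial counts, can be emulated on local files. The sketch below does not call Manta or its tools; the function name `local_wordfreq` and the sequential per-file loop are illustrative stand-ins for the distributed map phase.

```shell
# local_wordfreq N FILE...: emulate the job's two phases on local files.
# This is a local stand-in only; nothing here talks to Manta.
local_wordfreq() {
    n="$1"; shift
    for f in "$@"; do
        # "map": one independent word count per input file
        tr -cs A-Za-z '\n' < "$f" | tr A-Z a-z | sort | uniq -c
    done |
    # "reduce": sum the partial counts for each word, then rank them
    awk '{ x[$2] += $1 } END { for (w in x) { print x[w] " " w } }' |
    sort -rn |
    sed "${n}q"
}
```

Because each map emits only per-file counts, the reduce phase sees far less data than the raw text, which is the reduction of the data stream the slide refers to.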
  28. 28. Manta: CAP tradeoffs • Eventual consistency represents the wrong CAP tradeoffs for most; we prefer consistency over availability for writes (but still availability for reads) • Many more details: • Celebrity endorsement:
  29. 29. Manta: Other design principles • Hierarchical storage is an excellent idea (ht: Multics); Manta implements proper directories, delimited with a forward slash • Manta implements a snapshot/link hybrid dubbed a snaplink; can be used to effect versioning • Manta has full support for CORS headers • Manta uses SSH-based HTTP auth for client-side tooling (IETF draft-cavage-http-signatures-00) • Manta SDKs exist for node.js, R, Go, Java, Ruby, Python; and of course, compute jobs may be in any of these (plus Perl, Clojure, Lisp, Erlang, Forth, Prolog, Fortran, Haskell, Lua, Mono, COBOL, etc.) • “npm install manta” for command line interface
  30. 30. Manta and IoT • We believe compute/data convergence to be a constraint imposed by IoT: stores of record must support computation as a first-class, in situ operation • We believe that some (and perhaps many) IoT workloads will require computing at the edge: internet transit may be prohibitive for certain applications • We believe that Unix is a natural way of expressing this computation, and that OS containers are the right way to support this securely • We believe that ZFS is the only sane storage underpinning for such a system • Manta will surely not be the only system to represent the confluence of these, but it is the first
  31. 31. Manta: More information • Product page: • node.js module: • Manta documentation: • IRC, e-mail, Twitter, etc.: #manta on freenode,, @mcavage, @dapsays, @yunongx, @joyent