Unix in the Cloud
Ignorance, Stagnation, Obsolescence
Synopsis
▪ cloud in the broad sense of ideology
▪ not quite about running BSD on EC2

▪ limited to the skills and experience of yours truly
Multi-core
▪ installation?
▪ configuration management?

▪ load balancing?
Multi-node
▪ installation?
▪ configuration management?

▪ load balancing?

▪ why multi-node?
Large Computing Needs
▪ Facebook, Google, ...

▪ more than any OS can provide
Happy Hardware Vendor Law
The number of nodes needed to solve a given task doubles every now and again.
OS Scalability Limit
 ▪ 1 node only
 ▪ multi-socket and stacks approaching NUMA

 ▪ E25K, z10, etc. fail for most purposes
Operating System — ?
  ▪ traditional definition no longer relevant
  ▪ the notion itself on the brink of obsolescence

  ▪ field heavily eroded by current distributed apps
Distributed Applications
▪ forced to be an OS unto themselves

▪ huge overlap

▪ huge opportunity for sharing and consolidation
Anti-Patterns
▪ virtualization
▪ chefs and puppets

▪ thick abstraction
Attempts
▪ z/OS
▪ Plan 9, Inferno

▪ Clustrx, E1, DYSEAC, ...

▪ OpenStack (~~)
Species Survival Plan
Freeze the bodies and leave them for future generations to fix.
Don't Panic: Incremental
▪ perfection vs. done

▪ still a decade or more till a good AI

▪ no practical need for POSIX over a cloud
Mindful Approach
▪ immediate practicality
▪ long-term perspective

▪ sustained, integrally rich effect
Operating System
▪ major abstraction repository
▪ overlapping code distillery

▪ pre-production architecture research
Increments
Machine Generated Data
▪ logs, error messages, status monitors

▪ meant for humans... no more

▪ rethinking for better aggregation and analysis
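
One possible reading of the last bullet, as a minimal sketch: emit every event as a single JSON object per line, so that an aggregator can parse it without guessing at free-form text. The schema (`host`, `unit`, `event`) and the helpers `emit_event` and `count_by_event` are hypothetical and not taken from the talk.

```python
import json
import time


def emit_event(host, unit, event, **fields):
    """Return one log record as a single JSON line (hypothetical schema)."""
    record = {"ts": time.time(), "host": host, "unit": unit, "event": event}
    record.update(fields)
    # sort_keys keeps the key order stable, which makes records easier to diff.
    return json.dumps(record, sort_keys=True)


def count_by_event(lines):
    """Aggregate JSON log lines into a {event name: count} dictionary."""
    counts = {}
    for line in lines:
        name = json.loads(line)["event"]
        counts[name] = counts.get(name, 0) + 1
    return counts
```

The point of the sketch is only that the producer and the consumer share a machine-readable format; the transport and storage of the lines are left open.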
Identity and Authentication
▪ YP, LDAP outdated and poorly supported

▪ no distributed model

▪ passwd in git as a first stab
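
If passwd lives in git, several people can edit it at once, so some check before commit becomes useful. The following is a rough sketch of such a check, assuming passwd(5)-style lines where the UID is the third colon-separated field; the function names are made up for illustration.

```python
def parse_passwd(text):
    """Parse passwd(5)-style text into a list of (name, uid) pairs."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split(":")
        # Field 0 is the login name, field 2 is the numeric UID.
        entries.append((fields[0], int(fields[2])))
    return entries


def find_conflicts(entries):
    """Return the sorted UIDs that are claimed by more than one name."""
    owners = {}
    for name, uid in entries:
        owners.setdefault(uid, set()).add(name)
    return sorted(uid for uid, names in owners.items() if len(names) > 1)
```

A git hook could run `find_conflicts` on the staged file and refuse the commit when the result is non-empty. Shared UIDs are sometimes intentional (root and toor on BSD), so a real check would need an allow-list.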
Remote Procedure Call
  ▪ ssh losing relevance, HPN or not
  ▪ almighty agent daemon worse than rsh

  ▪ capabilities, RBAC, WoT
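
To make the capabilities bullet concrete, here is a minimal sketch of a capability token: a signed statement that one subject may perform one action, which the receiving node can verify without an all-powerful agent. The token layout, the shared `secret`, and the names `issue` and `allows` are assumptions for illustration; expiry and revocation are omitted.

```python
import hashlib
import hmac


def issue(secret, subject, action):
    """Return 'subject:action:signature' granting exactly one action."""
    msg = f"{subject}:{action}".encode()
    sig = hmac.new(secret, msg, hashlib.sha256).hexdigest()
    return f"{subject}:{action}:{sig}"


def allows(secret, token, action):
    """Check that the token is authentic and covers the requested action."""
    try:
        # Split from the right; the action itself must not contain ':'.
        subject, granted, sig = token.rsplit(":", 2)
    except ValueError:
        return False  # malformed token
    msg = f"{subject}:{granted}".encode()
    expected = hmac.new(secret, msg, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking the signature through timing.
    return hmac.compare_digest(sig, expected) and granted == action
```

The holder of such a token can restart one service and nothing else, which is the opposite of handing out a root shell.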
Hardware Failures
▪ no culture for low-level fault-tolerance
▪ watchdogd as state-of-the-art self-healing

▪ focus on self-diagnostics: disk error counters, etc.
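
A sketch of what counter-based self-diagnostics might look like: compare two snapshots of error counters and flag the ones that grew. How the counters are collected is deliberately left out; the counter names and the `growing_counters` helper are hypothetical.

```python
def growing_counters(previous, current, threshold=0):
    """Return {counter: delta} for counters that grew by more than `threshold`."""
    flagged = {}
    for name, value in current.items():
        # A counter missing from the previous snapshot is treated as zero.
        delta = value - previous.get(name, 0)
        if delta > threshold:
            flagged[name] = delta
    return flagged
```

For example, with a previous snapshot `{"reallocated_sectors": 2}` and a current one `{"reallocated_sectors": 5}`, the function reports a delta of 3, which a node could use to take itself out of rotation before the disk fails outright.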
Distributed Configuration
▪ current anti-patterns worsen the problem

▪ role-aware configuration

▪ / in git as a second stab
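
Role-aware configuration can be sketched as layering: start from settings common to all nodes, then apply the settings of each role the node holds. This is an assumed model, not something the talk specifies; `resolve`, the role names, and the keys are all illustrative.

```python
def resolve(base, roles, node_roles):
    """Merge base settings with the settings of each role, in order.

    Later roles override earlier ones; unknown roles contribute nothing.
    """
    config = dict(base)  # copy so the shared base is never mutated
    for role in node_roles:
        config.update(roles.get(role, {}))
    return config
```

With `base = {"ntp": "on", "httpd": "off"}` and a role `{"web": {"httpd": "on"}}`, a node holding the `web` role ends up with `httpd` on and `ntp` unchanged, while a node without roles gets the base as-is.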
Storage
▪ intra-node redundancy irrelevant
▪ no appropriate local multi-disk FS

▪ no fast path for data exchange

▪ nginx + curl + dispatcher
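
The talk does not say how the dispatcher works. One plausible sketch, assuming each node serves its objects over plain HTTP, is a dispatcher that maps an object key to a few nodes by rendezvous (highest-random-weight) hashing, a technique swapped in here for illustration. The node names and URL layout are hypothetical.

```python
import hashlib


def pick_nodes(key, nodes, replicas=2):
    """Rank nodes by hash(node, key) and return the top `replicas` of them."""
    def weight(node):
        return hashlib.sha256(f"{node}/{key}".encode()).hexdigest()
    # The ranking depends only on (node, key), not on the input order.
    return sorted(nodes, key=weight, reverse=True)[:replicas]


def urls(key, nodes, replicas=2):
    """Build the URLs a client could try, in order, to fetch `key`."""
    return [f"http://{node}/{key}" for node in pick_nodes(key, nodes, replicas)]
```

Redundancy then lives between nodes rather than inside them: losing a node that does not hold a given key leaves that key's placement untouched.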
Error Handling
▪ cf. Machine Generated Data and Hardware Failures
▪ software is 10x more prone to failures

▪ serious problem at scale
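
Why failures become serious at scale can be shown with simple arithmetic. The sketch below assumes node failures are independent, which is a simplification; the function name and the rates in the example are illustrative, not measurements.

```python
def p_any_failure(p_node, nodes):
    """Probability that at least one of `nodes` independent nodes fails,
    given each fails with probability `p_node` in the same time window."""
    return 1.0 - (1.0 - p_node) ** nodes
```

With a per-node failure probability of 0.001, a single node is almost always fine, but across 1000 nodes the chance of at least one failure is roughly 63%.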
☺
