Ops for NoOps - Operational Challenges for Serverless Apps


A look at the problems users face when running serverless applications in production, possible solutions, and a dig into the Lambda black box.

Presented by Erica Windisch, CTO of IOpipe, Inc. IOpipe offers application performance monitoring for serverless apps. Erica is ex-Docker, ex-Cloudscaling, a builder of clouds, and a destroyer of monoliths.

Register for IOpipe at!

Published in: Technology

  1. Ops for NoOps: Operational challenges for serverless apps. Eric Windisch, CTO, IOpipe, Inc.
  2. ERIC WINDISCH (@ewindisch)
     Founder & CTO of IOpipe, Inc.
     ex-Docker, ex-Cloudscaling. Builder of clouds, destroyer of monoliths.
  3. EVOLUTION CREATES CHALLENGES
     ➤ Fear, uncertainty, and doubt for new users:
     ➤ What problems will I run into with this new platform?
     ➤ What will I do when those problems happen?
     ➤ Will I know about those problems when they happen?
     ➤ Is it secure?
     ➤ What tools to use?
  4. SERVERLESS DEVELOPER PROFILES
     ➤ Frameworks: SLS, Zappa, Apex, DIY, others.
     ➤ Event sources: API Gateway, SNS, S3, Kinesis, others. (Alexa and AWS IoT sources are relatively infrequent.)
     ➤ Languages: Node, Python, Java, Go, C, Ruby.
     ➤ Regions: all of them (us-east, us-west, etc.); several users are moving to new international regions (Sydney, etc.).
     ➤ Events: 0 to 100M+ events per day.
     ➤ Stage: dev/test through production.
  5. CLOUDWATCH
     ➤ Basic “super-outside” metrics:
     ➤ Errors
     ➤ Logs
     ➤ Invocations/time
     ➤ Duration
     ➤ Memory
     ➤ This is what Datadog, Sumologic, etc. ingest.
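
As a rough illustration of the duration and memory figures listed above, the sketch below pulls them out of a Lambda `REPORT` log line as it appears in CloudWatch Logs. The exact field layout is an assumption based on the commonly seen format and may differ between runtimes; `parse_report_line` and `REPORT_PATTERN` are hypothetical names, not part of any AWS or IOpipe API.

```python
import re

# Assumed layout of the REPORT line Lambda writes at the end of an invocation.
# Fields are separated by tabs and/or spaces, so \s+ is used between them.
REPORT_PATTERN = re.compile(
    r"REPORT RequestId: (?P<request_id>\S+)\s+"
    r"Duration: (?P<duration_ms>[\d.]+) ms\s+"
    r"Billed Duration: (?P<billed_ms>[\d.]+) ms\s+"
    r"Memory Size: (?P<memory_mb>\d+) MB\s+"
    r"Max Memory Used: (?P<max_used_mb>\d+) MB"
)


def parse_report_line(line):
    """Return the metrics from one REPORT line, or None if it is not one."""
    match = REPORT_PATTERN.search(line)
    if match is None:
        return None
    return {
        "request_id": match.group("request_id"),
        "duration_ms": float(match.group("duration_ms")),
        "billed_ms": float(match.group("billed_ms")),
        "memory_mb": int(match.group("memory_mb")),
        "max_used_mb": int(match.group("max_used_mb")),
    }
```

These are "outside" numbers only: they say how long an invocation took and how much memory it peaked at, not why.
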
  6. HARD PROBLEMS
     ➤ Cold-starts
     ➤ Especially painful for Java users.
     ➤ Relationship of metrics vs. logs.
     ➤ Lack of, or difficulty with, profiling & tracing tools. When do GCs happen?
     ➤ Retries: why/when & in relation to event sources
     ➤ AWS account-level limits (& when to bump them up)
     ➤ Difficulty of managing unsupported languages: C, C++, Go, Ruby, etc.
     ➤ Debugging of & visibility into distributed systems
     ➤ Are failures at the event source or the Lambda function?
     ➤ Kinesis!!!
     ➤ Cross-invocation leaks
     ➤ Memory leaks
     ➤ File descriptor leaks
     ➤ Backend process visibility
     ➤ Thread/callback leaks.
     ➤ etc.
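
Cold starts and cross-invocation leaks can both be observed from inside the function, because module-level state survives for as long as the process is reused. The following is a minimal sketch under that assumption; `snapshot`, `open_fd_count`, and `leaked_fds` are hypothetical helper names, and the file descriptor count relies on a Linux-style `/proc` being readable.

```python
import os
import resource

# Module-level state: reset only when a new process starts (a cold start).
_invocations = 0


def open_fd_count():
    """Count open file descriptors via /proc, or None where that is unavailable."""
    try:
        return len(os.listdir("/proc/self/fd"))
    except OSError:
        return None


def snapshot():
    """Record per-invocation state; call once at the start of each invocation."""
    global _invocations
    _invocations += 1
    return {
        "cold_start": _invocations == 1,
        "invocation": _invocations,
        "open_fds": open_fd_count(),
        # Peak resident set size; kilobytes on Linux, bytes on macOS.
        "max_rss": resource.getrusage(resource.RUSAGE_SELF).ru_maxrss,
    }


def leaked_fds(before, after):
    """Growth in open descriptors between two snapshots, or None if unknown."""
    if before["open_fds"] is None or after["open_fds"] is None:
        return None
    return after["open_fds"] - before["open_fds"]
```

A descriptor count or peak RSS that keeps climbing across warm invocations of the same process is a hint of a leak, not proof of one.
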
  7. INSIDE THE PROCESS
     ➤ We install into your process, around your functions.
     ➤ Import a library, use a decorator (or low-level reporting API)
     ➤ Gets info via the Node.js process variable, Python sys, etc.
     ➤ Timing information for wrapped function(s).
     ➤ Stacktrace reporting.
     ➤ Extra logging / events pushed by developers.
     ➤ & looks outside…
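
The "import a library, use a decorator" approach can be sketched in a few lines of Python. This is not IOpipe's actual library or API; `monitored` and its `report` parameter are hypothetical names chosen for illustration, and the report is simply handed to a callable rather than sent over the network.

```python
import functools
import sys
import time
import traceback


def monitored(report=print):
    """Wrap a handler so timing and any stack trace are passed to `report`."""
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(event, context):
            record = {
                "function": fn.__name__,
                "python": sys.version.split()[0],  # runtime info from `sys`
                "error": None,
            }
            start = time.perf_counter()
            try:
                return fn(event, context)
            except Exception:
                # Capture the stack trace, then let the error propagate.
                record["error"] = traceback.format_exc()
                raise
            finally:
                record["duration_ms"] = (time.perf_counter() - start) * 1000.0
                report(record)
        return wrapper
    return decorate
```

A handler is then declared as `@monitored()` above `def handler(event, context): ...`. Re-raising the exception keeps the platform's own error and retry behavior unchanged.
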
  11. OUTSIDE THE FUNCTION - INSIDE THE BLACK BOX
      ➤ Reuse of containers and VMs
      ➤ Cold-starts by VM, container, and app process.
      ➤ Tenancy of VMs (how many containers)
      ➤ Host VM processes(!!) & processes in other containers(!!!)
      ➤ Limited & very likely to go away… probably per-tenant VMs anyway
      ➤ Spawned processes
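
One way to look outside the function from within it is to read the Linux `/proc` filesystem. The sketch below is an assumption about how such a probe could work, not a description of what IOpipe does or of what Lambda exposes: it lists whatever processes are visible and reads the kernel boot ID, which stays constant for the lifetime of a booted kernel and so can hint at VM reuse. The function names are hypothetical.

```python
import os


def read_first_line(path):
    """Return the first line of a file, stripped, or None if unreadable."""
    try:
        with open(path) as fh:
            return fh.readline().strip()
    except OSError:
        return None


def visible_processes(proc_root="/proc"):
    """Map PID to command name for every process visible under proc_root."""
    procs = {}
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return procs
    for entry in entries:
        if entry.isdigit():  # numeric directories are PIDs
            name = read_first_line(os.path.join(proc_root, entry, "comm"))
            if name is not None:
                procs[int(entry)] = name
    return procs


def environment_fingerprint(proc_root="/proc"):
    """Collect a boot ID and process list to compare across invocations."""
    boot_id_path = os.path.join(proc_root, "sys/kernel/random/boot_id")
    return {
        "boot_id": read_first_line(boot_id_path),
        "processes": visible_processes(proc_root),
    }
```

Comparing fingerprints across invocations shows whether the same environment was reused and whether spawned processes are still around.
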
  12. SECURITY
      ➤ I founded the Docker Security Team…
      ➤ FYI - Lambda’s not Docker!
      ➤ Lambda’s not perfect! (Security never is!)
      ➤ Amazon did a good job.
      ➤ Re-inventing the wheel means repeating some mistakes solved elsewhere…
      ➤ Still… AWS did a pretty good job.
      ➤ Don’t worry about it.
      ➤ Some questions can only be answered by AWS or with more data! TBD!
  13. APP MANAGEMENT
      ➤ Actionable metrics from inside & outside the function.
      ➤ Ingest CloudTrail for context-aware intelligence.
      ➤ Where events originate, retries, etc.
      ➤ Alarms -> Lambda invocation
      ➤ triggers AWS services, PagerDuty, IFTTT, Zapier, etc.
      ➤ Real-time visibility. Daily, Weekly, Monthly reporting.
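
The "Alarms -> Lambda invocation" flow can be sketched as a Lambda handler subscribed to an SNS topic that a CloudWatch alarm publishes to. This wiring is an assumption (the slides do not say how alarms are delivered), and `make_handler` and `notify` are hypothetical names; `notify` stands in for whatever forwards the alarm to PagerDuty, IFTTT, Zapier, or another service.

```python
import json


def parse_alarms(event):
    """Extract alarm name, state, and reason from an SNS-delivered event."""
    alarms = []
    for record in event.get("Records", []):
        # The SNS message body is itself a JSON-encoded string.
        message = json.loads(record["Sns"]["Message"])
        alarms.append({
            "name": message.get("AlarmName"),
            "state": message.get("NewStateValue"),
            "reason": message.get("NewStateReason"),
        })
    return alarms


def make_handler(notify):
    """Build a Lambda handler that calls notify() for each firing alarm."""
    def handler(event, context):
        firing = [a for a in parse_alarms(event) if a["state"] == "ALARM"]
        for alarm in firing:
            notify(alarm)
        return {"notified": len(firing)}
    return handler
```

Filtering on the `ALARM` state means recoveries (`OK`) and `INSUFFICIENT_DATA` transitions are ignored; whether that is desirable depends on the downstream service.
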
  14. GETTING HELP
      ➤ Gitter…
      ➤ Slack…
      ➤ IOpipe Slack (for registered users!)
      ➤ Forums…
      ➤ Amazon -
  15. Eric Windisch, CTO, IOpipe, Inc.
      Register for FREE beta access:
      Q&A