The document discusses datacenter computing using Apache Mesos. It begins by discussing concepts like "data democratization" and "cluster democratization", which refer to making data and computing resources available throughout an organization. It then discusses lessons from Google's approach to datacenter computing, and frameworks that can be integrated with Mesos like Hadoop, Spark, and Docker. Examples of companies using Mesos in production are provided, including Twitter, Airbnb, and eBay. Mesos provides a common substrate that makes heterogeneous computing resources available as a homogeneous set, improving scalability, elasticity, fault tolerance and resource utilization.
Strata SC 2014: Apache Mesos as an SDK for Building Distributed Frameworks
Paco Nathan
O'Reilly Media - Strata SC 2014
Apache Mesos is an open source cluster manager that provides efficient resource isolation for distributed frameworks—similar to Google’s “Borg” and “Omega” projects for warehouse scale computing. It is based on isolation features in the modern kernel: “cgroups” in Linux, “zones” in Solaris.
Google’s “Omega” research paper shows that while 80% of the jobs on a given cluster may be batch (e.g., MapReduce), 55-60% of cluster resources go toward services. The batch jobs on a cluster are the easy part; services are much more complex to schedule efficiently. However, by mixing workloads, overall resource scheduling can be greatly improved.
Given the use of Mesos as the kernel for a “data center OS”, two additional open source components Chronos (like Unix “cron”) and Marathon (like Unix “init.d”) serve as the building blocks for creating distributed, fault-tolerant, highly-available apps at scale.
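To make the Marathon analogy concrete, here is a minimal sketch of a long-running service definition. The field names follow Marathon's `/v2/apps` REST schema, but the app id, command, hostname, and resource sizes are invented for illustration:

```python
import json

# Hypothetical Marathon app definition: the id, cmd, and sizing below are
# made-up examples; the field names follow Marathon's /v2/apps schema.
app = {
    "id": "/example/web",
    "cmd": "python3 -m http.server 8080",
    "cpus": 0.25,
    "mem": 128,        # MiB per task
    "instances": 3,    # Marathon keeps 3 copies running, restarting on failure
}

payload = json.dumps(app, indent=2)
print(payload)
```

One would POST this JSON to a Marathon master (e.g., `http://marathon.example.com:8080/v2/apps`, a placeholder host), and Marathon would then supervise the tasks much like a distributed `init.d`.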
This talk examines case studies of Mesos in production at scale, ranging from Twitter (100% on-prem) to Airbnb (100% cloud), plus MediaCrossing, Categorize, HubSpot, etc. How have these organizations leveraged Mesos to build better, more scalable, and more efficient distributed apps? Lessons from the Mesos developer community show that one can port an existing framework with a wrapper in approximately 100 lines of code. Moreover, an important lesson from Spark is that, starting from these “data center OS” building blocks, one can rewrite a distributed system such as Hadoop to be 100x faster within a relatively small amount of source code.
These case studies illustrate the obvious benefits over prior approaches based on virtualization: scalability, elasticity, fault-tolerance, high availability, improved utilization rates, etc. Less obvious outcomes also include: reduced time for engineers to ramp-up new services at scale; reduced latency between batch and services, enabling new high-ROI use cases; and enabling dev/test apps to run on a production cluster without disrupting operations.
Apache Mesos is the first open source cluster manager that handles workloads efficiently in a distributed environment through dynamic resource sharing and isolation.
Deploying Docker Containers at Scale with Mesos and Marathon
Discover Pinterest
Connor Doyle from Mesosphere.
Deploying Docker Containers at Scale with Mesos and Marathon
Operating apps at web scale is the norm these days, but it remains out of reach for most companies. Deploying Docker containers with Mesos and Marathon makes it easier. See how these tools help deploy and manage Docker containers at scale, and how the Mesos cluster scheduler builds highly available, fault-tolerant, web-scale apps.
Learn from Accubits Technologies
High Performance Computing (HPC) most generally refers to the practice of aggregating computing power to deliver much higher performance than a typical desktop computer or workstation, in order to solve large problems in science, engineering, or business.
Cassandra & puppet, scaling data at $15 per month
daveconnors
Constant Contact shares lessons learned from a DevOps approach to implementing Cassandra to manage social media data for over 400k small-business customers. Puppet is critical in our tool chain. The single most important factor was the willingness of Development and Operations to stretch beyond traditional roles and responsibilities.
Time to Science, Time to Results: Accelerating Scientific Research in the Cloud
Amazon Web Services
This session will feature ways customers have accelerated scientific research using the AWS Cloud.
Speaker: Brendan Bouffler, Scientific Computing, Amazon Web Services
These are slides from a lecture given at the UC Berkeley School of Information for the Analyzing Big Data with Twitter class. A video of the talk can be found at http://blogs.ischool.berkeley.edu/i290-abdt-s12/2012/08/31/video-lecture-posted-intro-to-hadoop/
(SDD402) Amazon ElastiCache Deep Dive | AWS re:Invent 2014Amazon Web Services
Peek behind the scenes to learn about Amazon ElastiCache's design and architecture. See common design patterns of our Memcached and Redis offerings and how customers have used them for in-memory operations and achieved improved latency and throughput for applications. During this session, we review best practices, design patterns, and anti-patterns related to Amazon ElastiCache.
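As a rough illustration of the cache-aside pattern common to the Memcached and Redis usage described above, here is a toy sketch; the dict stands in for ElastiCache, `slow_lookup` stands in for a database query, and all names and values are invented:

```python
import time

# Cache-aside sketch: an in-memory dict plays the role of Memcached/Redis.
CACHE = {}          # key -> (value, expires_at)
TTL_SECONDS = 60

def slow_lookup(key):
    # placeholder for an expensive database round-trip
    return f"value-for-{key}"

def get(key):
    hit = CACHE.get(key)
    if hit is not None and hit[1] > time.time():
        return hit[0]                        # cache hit: skip the database
    value = slow_lookup(key)                 # cache miss: go to the source
    CACHE[key] = (value, time.time() + TTL_SECONDS)
    return value

print(get("user:42"))   # first call misses and populates the cache
print(get("user:42"))   # second call is served from the cache
```

The TTL bounds staleness; real deployments also need an eviction policy and invalidation on writes, which the managed service handles for you.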
Optimizing Your Infrastructure and Operating System for Hadoop
DataWorks Summit
Apache Hadoop is clearly one of the fastest-growing big data platforms for storing and analyzing arbitrarily structured data in search of business insights. However, applicable commodity infrastructure has advanced greatly in recent years, and there is not a lot of accurate, current information to assist the community in optimally designing and configuring Hadoop platforms (infrastructure and OS). In this talk we'll present guidance on Linux and infrastructure deployment, configuration, and optimization from both Red Hat and HP (derived from actual performance data), for clusters optimized for single workloads or balanced clusters that host multiple concurrent workloads.
[Capitole du Libre] #serverless - putting it to work in your company...
Ludovic Piot
Just like IaaS cloud before it, serverless promises to boost your projects' success by accelerating time to market and smoothing relations between Devs and Ops.
But implementing it within a company remains complex and costly.
After two years spent building managed platforms of this kind, we share our experience of what it takes to implement serverless in a company, avoiding the pain points and limiting the constraints as much as possible.
First, the technical architecture, with two very different implementations: Kubernetes and Helm on one side, Clever Cloud on-premise on the other.
Next, setting up and using OpenFaaS: how to test and version Function-as-a-Service code, plus the issues of blue/green deployment, rolling updates, and A/B testing, and how to quickly diagnose dependencies and communications between services.
Finally, the topics dear to production teams: vulnerability management and patch management, heterogeneity of the server fleet, monitoring and alerting, handling obsolete stacks, etc.
In an increasingly competitive marketplace, speed and business agility are paramount. And integration between customer-facing systems and back-end applications is more crucial than ever.
At this event, you'll learn how open source software built by communities, like Apache Camel, Docker, Kubernetes, OpenShift Origin, and Fabric8, can help organizations integrate services and establish effective continuous integration and delivery (CI/CD) pipelines.
OCCIware: Extensible and Standard-based XaaS Platform To Manage Everything in...
OW2
The OCCIware project aims to manage all layers and domains of the Cloud (XaaS) in a unified manner, building on the Open Cloud Computing Interface (OCCI) standard. The OCCIware Metamodel formally specifies the main OCCI concepts. A first EMF metamodel has been defined that adds new concepts to OCCI, such as Extension, Configuration, and EDataType, addressing some limitations of OCCI.
This session highlights the OCCIware platform's two main components:
– The OCCIware Studio Factory, which produces visually customizable diagram editors for any Cloud configuration business domain modeled in OCCI using the OCCI Extension Studio, such as the flagship Docker Studio;
– The OCCIware Runtime, based on the OW2 erocci project, which includes tools for deployment, supervision, and administration, and federates multiple XaaS Cloud runtimes, such as the Roboconf PaaS server and the ActiveEon Cloud Automation multi-IaaS connector.
This talk includes a demonstration of the Docker connector and of how to use the OCCIware Cloud Designer to seamlessly configure the business, platform, and infrastructure layers of a real-life Cloud application (a Java API server on top of a MongoDB cluster) on both VirtualBox and OpenStack infrastructure.
Enterprise-Ready Private and Hybrid Cloud Computing Today
RightScale
RightScale User Conference NYC 2011:
Enterprise-Ready Private and Hybrid Cloud Computing Today
Rich Wolski - Founder and CTO, Eucalyptus
In this session, we'll discuss the use of Eucalyptus and RightScale to build enterprise-grade cloud computing environments. By combining on-premise clouds with Amazon Web Services (AWS) through a common cloud management interface, Eucalyptus and AWS form a coherent platform for reliable and cost-effective enterprise cloud computing. The RightScale Cloud Management Platform delivers the high-level framework for cost-effectively automating and managing this ensemble of technologies.
OCCIware@POSS 2016 - an extensible, standard XaaS cloud consumer platform
Marc Dutoo
OCCIware at Paris Open Source Summit 2016 - an extensible, standard XaaS cloud consumer platform - demos: Docker & Linked Data Studios, online playground
Tackling complexity in giant systems: approaches from several cloud providers
Patrick Chanezon
Systems architectures evolve in cycles every 15-20 years, oscillating between centralization and decentralization but growing in size and complexity. The last cycle shifted from vertical to horizontal scalability for hardware, applications, and data platforms. This talk describes approaches used by some of the companies that pioneered cloud platforms (Google, Microsoft, Amazon, Netflix, and VMware) to tackle complexity when building these giant distributed systems.
This talk was presented at JFokus 2014.
https://www.jfokus.se/jfokus/talks.jsp#Tacklingcomplexityin
LF Collab Summit 2015: ARM Servers for the Next Generation Data Center and Cl...
The Linux Foundation
From Linux Foundation Collaboration Summit 2015, as presented by Larry Wikelius of Cavium Inc on 2015 02-18.
ARM-powered servers for cloud computing and data center servers took a big step forward when Cavium recently announced availability of the industry’s first 48-core family of ARMv8 workload optimized processors.
Delivering the highest core counts and most comprehensive set of I/O and accelerators in the market, ThunderX-based solutions are fully optimized for a targeted set of highly scalable workloads. In his talk, Larry Wikelius, who spearheads ecosystems and partner enablement for Cavium, will discuss growing momentum for ARMv8-based servers for hyperscale data center, cloud servers, big data and scale out computing as well as share new performance benchmark data. In addition, developments in the networking and carrier space are quickly mobilizing behind programs like The Open NFV Organization. The Linux Foundation, Xen Project, Open Compute Project and Linaro are also key.
Voldemort & Hadoop @ LinkedIn, Hadoop User Group Jan 2010
Bhupesh Bansal
Hadoop meetup presentation from Jan 22nd, 2010, on Project Voldemort and how it plays well with Hadoop at LinkedIn. The talk focuses on the LinkedIn Hadoop ecosystem: how LinkedIn manages complex workflows, data ETL, data storage, and online serving of 100 GB to TB of data.
Similar to Datacenter Computing with Apache Mesos - BigData DC
Human in the loop: a design pattern for managing teams working with ML
Paco Nathan
Strata CA 2018-03-08
https://conferences.oreilly.com/strata/strata-ca/public/schedule/detail/64223
Although it has long been used for use cases like simulation, training, and UX mockups, human-in-the-loop (HITL) has emerged as a key design pattern for managing teams where people and machines collaborate. One approach, active learning (a special case of semi-supervised learning), employs mostly automated processes based on machine learning models, but exceptions are referred to human experts, whose decisions help improve new iterations of the models.
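A minimal sketch of that active-learning referral loop might look like the following; the confidence scores, threshold, and item names are all invented for illustration:

```python
# Toy active-learning loop: a stand-in "model" scores items, confident
# predictions are accepted automatically, and low-confidence cases are
# queued for a human expert. Scores and the threshold are invented.
THRESHOLD = 0.8

def model_confidence(item):
    # placeholder for a real classifier's confidence score
    return {"clear-case": 0.95, "edge-case": 0.55}.get(item, 0.5)

auto_labeled, needs_review = [], []
for item in ["clear-case", "edge-case"]:
    if model_confidence(item) >= THRESHOLD:
        auto_labeled.append(item)       # the machine handles it
    else:
        needs_review.append(item)       # referred to a human expert

print(auto_labeled, needs_review)
# Human judgements on `needs_review` would then be fed back as new
# training examples for the next model iteration.
```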
Human-in-the-loop: a design pattern for managing teams that leverage ML
Paco Nathan
Strata Singapore 2017 session talk 2017-12-06
https://conferences.oreilly.com/strata/strata-sg/public/schedule/detail/65611
Human-in-the-loop is an approach which has been used for simulation, training, UX mockups, etc. A more recent design pattern is emerging for human-in-the-loop (HITL) as a way to manage teams working with machine learning (ML). A variant of semi-supervised learning called active learning allows for mostly automated processes based on ML, where exceptions get referred to human experts. Those human judgements in turn help improve new iterations of the ML models.
This talk reviews key case studies about active learning, plus other approaches for human-in-the-loop which are emerging among AI applications. We’ll consider some of the technical aspects — including available open source projects — as well as management perspectives for how to apply HITL:
* When is HITL indicated vs. when isn’t it applicable?
* How do HITL approaches compare/contrast with more “typical” use of Big Data?
* What’s the relationship between use of HITL and preparing an organization to leverage Deep Learning?
* Experiences training and managing a team which uses HITL at scale
* Caveats to know ahead of time
* In what ways do the humans involved learn from the machines?
* In particular, we’ll examine use cases at O’Reilly Media where ML pipelines for categorizing content are trained by subject matter experts providing examples, based on HITL and leveraging the open source [Project Jupyter](https://jupyter.org/) for implementation.
Human-in-a-loop: a design pattern for managing teams which leverage ML
Paco Nathan
Big Data Spain, 2017-11-16
https://www.bigdataspain.org/2017/talk/human-in-the-loop-a-design-pattern-for-managing-teams-which-leverage-ml
Human-in-the-loop is an approach which has been used for simulation, training, UX mockups, etc. A more recent design pattern is emerging for human-in-the-loop (HITL) as a way to manage teams working with machine learning (ML). A variant of semi-supervised learning called _active learning_ allows for mostly automated processes based on ML, where exceptions get referred to human experts. Those human judgements in turn help improve new iterations of the ML models.
This talk reviews key case studies about active learning, plus other approaches for human-in-the-loop which are emerging among AI applications. We'll consider some of the technical aspects -- including available open source projects -- as well as management perspectives for how to apply HITL:
* When is HITL indicated vs. when isn't it applicable?
* How do HITL approaches compare/contrast with more "typical" use of Big Data?
* What's the relationship between use of HITL and preparing an organization to leverage Deep Learning?
* Experiences training and managing a team which uses HITL at scale
* Caveats to know ahead of time
* In what ways do the humans involved learn from the machines?
In particular, we'll examine use cases at O'Reilly Media where ML pipelines for categorizing content are trained by subject matter experts providing examples, based on HITL and leveraging the open source [Project Jupyter](https://jupyter.org/) for implementation.
Humans in a loop: Jupyter notebooks as a front-end for AI
Paco Nathan
JupyterCon NY 2017-08-24
https://www.safaribooksonline.com/library/view/jupytercon-2017-/9781491985311/video313210.html
Paco Nathan reviews use cases where Jupyter provides a front-end to AI as the means for keeping "humans in the loop". This talk introduces *active learning* and the "human-in-the-loop" design pattern for managing how people and machines collaborate in AI workflows, including several case studies.
The talk also explores how O'Reilly Media leverages AI in Media, and in particular some of our use cases for active learning, such as disambiguation in content discovery. We're using Jupyter as a way to manage active learning ML pipelines, where the machines generally run automated until they hit an edge case and refer the judgement back to human experts. In turn, the experts train the ML pipelines purely through examples, not feature engineering, model parameters, etc.
Jupyter notebooks serve as one part configuration file, one part data sample, one part structured log, one part data visualization tool. O'Reilly has released an open source project on GitHub called `nbtransom` which builds atop `nbformat` and `pandas` for our active learning use cases.
This work anticipates upcoming work on collaborative documents in JupyterLab, based on Google Drive. In other words, where the machines and people are collaborators on shared documents.
Humans in the loop: AI in open source and industry
Paco Nathan
Nike Tech Talk, Portland, 2017-08-10
https://niketechtalks-aug2017.splashthat.com/
O'Reilly Media gets to see the forefront of trends in artificial intelligence: what the leading teams are working on, which use cases are getting the most traction, previews of advances before they get announced on stage. Through conferences, publishing, and training programs, we've been assembling resources for anyone who wants to learn. An excellent recent example: Generative Adversarial Networks for Beginners, by Jon Bruner.
This talk covers current trends in AI, industry use cases, and recent highlights from the AI Conf series presented by O'Reilly and Intel, plus related materials from Safari learning platform, Strata Data, Data Show, and the upcoming JupyterCon.
Along with reporting, we're leveraging AI in Media. This talk dives into O'Reilly uses of deep learning -- combined with ontology, graph algorithms, probabilistic data structures, and even some evolutionary software -- to help editors and customers alike accomplish more of what they need to do.
In particular, we'll show two open source projects in Python from O'Reilly's AI team:
• pytextrank, built atop spaCy, NetworkX, and datasketch, providing graph algorithms for advanced NLP and text analytics
• nbtransom, leveraging Project Jupyter for a human-in-the-loop design pattern approach to AI work: people and machines collaborating on content annotation
Lessons learned from 3 (going on 4) generations of Jupyter use cases at O'Reilly Media, in particular the "Oriole" tutorials, which combine video with Jupyter notebooks and Docker containers, backed by services managed on a cluster by Marathon, Mesos, Redis, and Nginx.
https://conferences.oreilly.com/fluent/fl-ca/public/schedule/detail/62859
https://conferences.oreilly.com/velocity/vl-ca/public/schedule/detail/62858
Strata UK 2017. Computable content leverages Jupyter notebooks to make learning materials more powerful by integrating compute engines, data sources, etc. O’Reilly Media extended this approach to create the new Oriole Online Tutorial medium, publishing notebooks from authors along with video timelines. (A free public tutorial, Regex Golf, by Peter Norvig demonstrates what’s possible with this technology integration.) Each user session launches a Docker container on a Mesos cluster for fully personalized compute environments. The UX is entirely browser based.
See 2020 update: https://derwen.ai/s/h88s
SF Python Meetup, 2017-02-08
https://www.meetup.com/sfpython/events/237153246/
PyTextRank is a pure-Python open source implementation of *TextRank*, based on the [Mihalcea 2004 paper](http://web.eecs.umich.edu/~mihalcea/papers/mihalcea.emnlp04.pdf) -- a graph algorithm which produces ranked keyphrases from texts. Keyphrases are generally more useful than simple keyword extraction. PyTextRank integrates `TextBlob` and `SpaCy` for NLP analysis of texts, including full parsing, named entity extraction, etc. It also produces auto-summarization of texts, making use of an approximation algorithm, `MinHash`, for better performance at scale. Overall, the package is intended to complement machine learning approaches -- specifically deep learning used for custom search and recommendations -- by developing better feature vectors from raw texts. This package is in production use at O'Reilly Media for text analytics.
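As a rough illustration of the TextRank idea (not PyTextRank's actual implementation, which adds part-of-speech filtering, lemmatization, and phrase merging), here is a toy word co-occurrence graph ranked with a few PageRank iterations; the sentences are invented:

```python
from collections import defaultdict
from itertools import combinations

# Toy TextRank-style sketch: build an undirected co-occurrence graph over
# words, then run a short PageRank power iteration (damping d = 0.85).
sentences = [
    ["graph", "algorithm", "ranks", "keyphrases"],
    ["graph", "ranks", "texts"],
]

neighbors = defaultdict(set)
for sent in sentences:
    for a, b in combinations(set(sent), 2):
        neighbors[a].add(b)
        neighbors[b].add(a)

rank = {w: 1.0 for w in neighbors}
for _ in range(20):
    rank = {
        w: 0.15 + 0.85 * sum(rank[n] / len(neighbors[n]) for n in neighbors[w])
        for w in neighbors
    }

top = sorted(rank, key=rank.get, reverse=True)
print(top[:3])
```

Words that co-occur with many other well-connected words rank highest, which is why this beats raw frequency counts for keyphrase extraction.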
Use of standards and related issues in predictive analytics
Paco Nathan
My presentation at KDD 2016 in SF, in the "Special Session on Standards in Predictive Analytics In the Era of Big and Fast Data" morning track about PMML and PFA http://dmg.org/kdd2016.html
Presented 2015-08-24 at SF Bay ACM, held at the eBay south campus in San Jose.
http://meetup.com/SF-Bay-ACM/events/221693508/
Project Jupyter (https://jupyter.org/) evolved from IPython notebooks and now supports a wide variety of programming-language back-ends. Notebooks have proven to be effective tools in Data Science, providing convenient packaging for what Don Knuth coined "literate programming" in the 1980s: code plus exposition in markdown. Results of running the code appear in-line as interactive graphics -- all packaged as collaborative, web-based documents. Some have said that the introduction of cloud-based notebooks is nearly as fundamental a change in software practice as the introduction of spreadsheets.
O'Reilly Media has been considering the question, "What comes after books and video?" Or, as one might imagine more pointedly, what comes after Kindle? To that point we have collaborated with Project Jupyter to integrate notebooks into our content management process, allowing authors to generate articles, tutorials, reports, and other media products as notebooks that also incorporate video segments. Code dependencies are containerized using Docker, and all of the content gets managed in Git repositories. We have added another layer, an open source project called Thebe, that provides a kind of "media player" for embedding the containerized notebooks into web pages.
GalvanizeU Seattle: Eleven Almost-Truisms About Data
Paco Nathan
http://www.meetup.com/Seattle-Data-Science/events/223445403/
Almost a dozen almost-truisms about Data that almost everyone should consider carefully as they embark on a journey into Data Science. There are a number of preconceptions about working with data at scale where the realities beg to differ. This talk estimates that number to be at least eleven, though probably much larger. At least that number has a great line from a movie. Let's consider some of the less-intuitive directions in which this field is heading, along with likely consequences and corollaries -- especially for those who are just now beginning to study the technologies, the processes, and the people involved.
Microservices, containers, and machine learning
Paco Nathan
http://www.oscon.com/open-source-2015/public/schedule/detail/41579
In this presentation, an open source developer community considers itself algorithmically. It shows how to surface data insights from the developer email forums of just about any Apache open source project, leveraging advanced techniques for natural language processing, machine learning, graph algorithms, time series analysis, etc. As an example, we use data from the Apache Spark email list archives to better understand its community; however, the code can be applied to many other communities.
Exsto is an open source project that demonstrates Apache Spark workflow examples for SQL-based ETL (Spark SQL), machine learning (MLlib), and graph algorithms (GraphX). It surfaces insights about developer communities from their email forums. Natural language processing services in Python (based on NLTK, TextBlob, WordNet, etc.) get containerized and used to crawl and parse email archives. These produce JSON data sets; we then run machine learning on a Spark cluster to surface insights such as:
* What are the trending topic summaries?
* Who are the leaders in the community for various topics?
* Who discusses most frequently with whom?
This talk shows how to use cloud-based notebooks for organizing and running the analytics and visualizations. It reviews the background for how and why the graph analytics and machine learning algorithms generalize patterns within the data, based on open source implementations of two advanced approaches, Word2Vec and TextRank. The talk also illustrates best practices for leveraging functional programming for big data.
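As a toy illustration of the "who discusses most frequently with whom" question (the real project does this at Spark scale; the senders and reply edges here are invented):

```python
from collections import Counter

# Count directed reply edges from a parsed mail archive: each pair
# (replier, original_sender) is one interaction. Names are made up.
replies = [
    ("alice", "bob"),    # alice replied to bob's message
    ("alice", "bob"),
    ("carol", "alice"),
]

edge_weights = Counter(replies)                       # weighted edges
most_active = Counter(sender for sender, _ in replies)  # out-degree per person

print(edge_weights.most_common(1))
print(most_active.most_common(1))
```

At scale, the same edge list becomes the input to a graph engine such as GraphX, where centrality measures surface community leaders.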
https://www.eventbrite.com/e/talk-by-paco-nathan-graph-analytics-in-spark-tickets-17173189472
Big Brains meetup hosted by BloomReach, 2015-06-04
Case study / demo of a large-scale graph analytics project, leveraging GraphX in Apache Spark to surface insights about open source developer communities — based on data mining of their email forums. The project works with any Apache email archive, applying NLP and machine learning techniques to analyze message threads, then constructs a large graph. Graph analytics, based on concise Scala coding examples in Spark, surface themes and interactions within the community. Results are used as feedback for respective developer communities, such as leaderboards, etc. As an example, we will examine analysis of the Spark developer community itself.
QCon São Paulo: Real-Time Analytics with Spark StreamingPaco Nathan
"Real-Time Analytics with Spark Streaming" presented at QCon São Paulo, 2015-03-26
http://qconsp.com/presentation/real-time-analytics-spark-streaming
This talk presents an overview of Spark and its history and applications, then focuses on the Spark Streaming component used for real-time analytics. We compare it with earlier frameworks such as MillWheel and Storm, and explore industry motivations for open-source micro-batch streaming at scale.
The talk will include demos for streaming apps that include machine-learning examples. We also consider public case studies of production deployments at scale.
We’ll review the use of open-source sketch algorithms and probabilistic data structures that get leveraged in streaming – for example, the trade-off of 4% error bounds on real-time metrics for two orders of magnitude reduction in required memory footprint of a Spark app.
Strata 2015 Data Preview: Spark, Data Visualization, YARN, and MorePaco Nathan
Spark and Databricks component of the O'Reilly Media webcast "2015 Data Preview: Spark, Data Visualization, YARN, and More", as a preview of the 2015 Strata + Hadoop World conference in San Jose http://www.oreilly.com/pub/e/3289
3. Have you heard about
“data democratization”?
making data available
throughout more of the organization
4. Then how would you handle
“cluster democratization”?
making data+resources available
throughout more of the organization
5. In other words,
how to remove silos…
7. Datacenter Computing
Google has been doing datacenter computing for years,
to address the complexities of large-scale data workflows:
• leveraging the modern kernel: isolation in lieu of VMs
• “most (>80%) jobs are batch jobs, but the majority of resources (55–80%) are allocated to service jobs”
• mixed workloads, multi-tenancy
• relatively high utilization rates
• JVM? not so much…
• reality: scheduling batch is simple; scheduling services is hard/expensive
8. The Modern Kernel: Top Linux Contributors…
arstechnica.com/information-technology/2013/09/...
9. “Return of the Borg”
Return of the Borg: How Twitter Rebuilt Google’s Secret Weapon
Cade Metz
wired.com/wiredenterprise/2013/03/google-borg-twitter-mesos
The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines
Luiz André Barroso, Urs Hölzle
research.google.com/pubs/pub35290.html
2011 GAFS Omega
John Wilkes, et al.
youtu.be/0ZFMlO98Jkc
10. Google describes the technology…
Omega: flexible, scalable schedulers for large compute clusters
Malte Schwarzkopf, Andy Konwinski, Michael Abd-El-Malek, John Wilkes
eurosys2013.tudos.org/wp-content/uploads/2013/paper/Schwarzkopf.pdf
11. Google describes the business case…
Taming Latency Variability
Jeff Dean
plus.google.com/u/0/+ResearchatGoogle/posts/C1dPhQhcDRv
12. Commercial OS Cluster Schedulers
• IBM Platform Symphony
• Microsoft Autopilot
Arguably, some grid controllers are quite notable in-category:
• Univa Grid Engine (formerly SGE)
• Condor
• etc.
14. Beyond Hadoop
Hadoop – an open source solution for fault-tolerant parallel processing of batch jobs at scale, based on commodity hardware… however, other priorities have emerged for the analytics lifecycle:
• apps require integration beyond Hadoop
• multiple topologies, mixed workloads, multi-tenancy
• significant disruptions in h/w cost/performance curves
• higher utilization
• lower latency
• highly-available, long-running services
• more than “Just JVM” – e.g., Python growth
15. Just No Getting Around It
“There’s Just No Getting Around It: You’re Building a Distributed System”
Mark Cavage
ACM Queue (2013-05-03)
queue.acm.org/detail.cfm?id=2482856
key takeaways on architecture:
• decompose the business application into discrete services on the boundaries of fault domains, scaling, and data workload
• make as many things as possible stateless
• when dealing with state, deeply understand CAP, latency, throughput, and durability requirements
“Without practical experience working on successful—and failed—systems, most engineers take a ‘hopefully it works’ approach and attempt to string together off-the-shelf software, whether open source or commercial, and often are unsuccessful at building a resilient, performant system. In reality, building a distributed system requires a methodical approach to requirements along the boundaries of failure domains, latency, throughput, durability, consistency, and desired SLAs for the business application at all aspects of the application.”
16.
17. Mesos – open source datacenter computing
a common substrate for cluster computing
mesos.apache.org
heterogeneous assets in your datacenter or cloud
made available as a homogeneous set of resources
• top-level Apache project
• scalability to 10,000s of nodes
• obviates the need for virtual machines
• isolation (pluggable) for CPU, RAM, I/O, FS, etc.
• fault-tolerant leader election based on Zookeeper
• APIs in C++, Java/Scala, Python, Go, Erlang, Haskell
• web UI for inspecting cluster state
• available for Linux, OpenSolaris, Mac OS X
18. What are the costs of Virtualization?
benchmark type      OpenVZ improvement
mixed workloads     210%–300%
LAMP (related)      38%–200%
I/O throughput      200%–500%
response time       order of magnitude
(improvements are more pronounced at higher loads)
19. What are the costs of Single Tenancy?
[charts: CPU load over time, 0–100%, for Rails, Memcached, and Hadoop each on dedicated servers, alongside the combined CPU load (Rails, Memcached, Hadoop) when the workloads share the same machines]
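The chart’s argument can be sketched numerically. A toy illustration (the load figures below are invented, not taken from the slide): three services whose peaks fall at different times need far less capacity when pooled than when siloed on dedicated servers:

```python
# Toy illustration (made-up numbers): three services sampled at the
# same five points in time, each peaking at a different moment.
rails     = [0.70, 0.30, 0.20, 0.25, 0.60]  # fraction of one server's CPU
memcached = [0.20, 0.65, 0.30, 0.20, 0.25]
hadoop    = [0.10, 0.15, 0.75, 0.70, 0.20]

# Single tenancy: each service gets capacity sized for its own peak.
dedicated_capacity = max(rails) + max(memcached) + max(hadoop)

# Mixed workloads: one pool sized for the peak of the *combined* load.
combined = [r + m + h for r, m, h in zip(rails, memcached, hadoop)]
shared_capacity = max(combined)

print(f"dedicated: {dedicated_capacity:.2f} servers' worth of CPU")
print(f"shared:    {shared_capacity:.2f} servers' worth of CPU")
```

Because the peaks do not coincide, the shared pool rides the combined curve rather than the sum of the individual peaks.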
20. Arguments for Datacenter Computing
rather than running several specialized clusters, each at relatively low utilization rates, instead run many mixed workloads
obvious benefits are realized in terms of:
• scalability, elasticity, fault tolerance, performance, utilization
• reduced equipment capex, Ops overhead, etc.
• reduced licensing, eliminating need for VMs or potential vendor lock-in
subtle benefits – arguably, more important for Enterprise IT:
• reduced time for engineers to ramp up new services at scale
• reduced latency between batch and services, enabling new high-ROI use cases
• enables Dev/Test apps to run safely on a Production cluster
22. Prior Practice: Dedicated Servers
• low utilization rates
• longer time to ramp up new services
DATACENTER
23. Prior Practice: Virtualization
DATACENTER PROVISIONED VMS
• even more machines to manage
• substantial performance decrease
due to virtualization
• VM licensing costs
24. Prior Practice: Static Partitioning
DATACENTER STATIC PARTITIONING
• even more machines to manage
• substantial performance decrease due to virtualization
• VM licensing costs
• failures make static partitioning more complex to manage
25. Mesos: One Large Pool of Resources
“We wanted people to be able to program for the datacenter just like they program for their laptop.”
Ben Hindman
[diagram: the DATACENTER presented as a single MESOS resource pool]
27. Fault-tolerant distributed systems…
…written in 100-300 lines of C++, Java/Scala, Python, Go, etc.
…building blocks, if you will
Q: required lines of network code?
A: probably none
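A framework can be that small because the Mesos master handles resource accounting and all the networking. The sketch below is not the real Mesos API; it is a simplified, self-contained illustration of the two-level resource-offer model: the master offers slave resources, and the framework’s scheduler decides which offers to accept for its tasks.

```python
# Simplified sketch of Mesos-style two-level scheduling (NOT the real
# Mesos API): the master offers slave resources to a framework, and
# the framework decides which offers to accept for its pending tasks.

class Offer:
    """One slave's available resources, as offered by the master."""
    def __init__(self, slave, cpus, mem):
        self.slave, self.cpus, self.mem = slave, cpus, mem

class EchoScheduler:
    """A toy framework scheduler: launch queued tasks on fitting offers."""
    def __init__(self, tasks):
        self.pending = list(tasks)   # (name, cpus, mem) tuples
        self.launched = []

    def resource_offers(self, offers):
        # Accept an offer only if it covers the next task's needs;
        # otherwise the offer is implicitly declined.
        for offer in offers:
            if not self.pending:
                break
            name, cpus, mem = self.pending[0]
            if offer.cpus >= cpus and offer.mem >= mem:
                self.pending.pop(0)
                self.launched.append((name, offer.slave))

# The "master" side: advertise each slave's free resources as offers.
slaves = {"slave1": (4, 8192), "slave2": (2, 4096)}
scheduler = EchoScheduler([("task-a", 2, 2048), ("task-b", 2, 2048)])
scheduler.resource_offers(
    [Offer(s, cpus, mem) for s, (cpus, mem) in slaves.items()])
print(scheduler.launched)
```

The real API adds callbacks for registration, status updates, and failures, but the scheduling decision itself stays about this small, which is why porting a framework can take on the order of a hundred lines.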
34. Mesos Master Server
init
|
+ mesos-master
|
+ marathon
|
Mesos Slave Server
init
|
+ docker
| |
| + lxc
| |
| + (user task, under container init system)
| |
|
+ mesos-slave
| |
| + /var/lib/mesos/executors/docker
| | |
| | + docker run …
| | |
The executor, monitored by the Mesos slave, delegates to the local Docker daemon for image discovery and management. The executor communicates with Marathon via the Mesos master and ensures that Docker enforces the specified resource limitations.
Example: Docker on Mesos
mesosphere.io/2013/09/26/docker-on-mesos/
35. Example: Docker on Mesos (continued)
[same process tree as the previous slide, annotated with eight numbered steps: when a user requests a container, the image is pulled from the Docker Registry, and Mesos, LXC, and Docker are tied together for launch]
mesosphere.io/2013/09/26/docker-on-mesos/
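For context on how a Docker workload enters this pipeline: Marathon accepts JSON app definitions over its v2 REST API. The sketch below loosely follows that format; the app id and image name are placeholders, and in a live cluster the payload would be POSTed to Marathon’s /v2/apps endpoint rather than just printed.

```python
import json

# Sketch of a Marathon v2 app definition that runs a Docker container
# on Mesos (app id and image are placeholder values). Marathon passes
# the resource limits (cpus, mem) to Mesos, which enforces them via
# the Docker executor shown in the process tree above.
app = {
    "id": "/example/web",        # placeholder app id
    "cpus": 0.5,
    "mem": 256,
    "instances": 2,
    "container": {
        "type": "DOCKER",
        "docker": {"image": "nginx:latest"},  # placeholder image
    },
}

payload = json.dumps(app)
print(payload)
```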
37. Quasar+Mesos @ Stanford, Twitter, etc.…
Quasar: Resource-Efficient and QoS-Aware Cluster Management
Christina Delimitrou, Christos Kozyrakis
stanford.edu/~cdel/2014.asplos.quasar.pdf
38. Quasar+Mesos @ Stanford, Twitter, etc.…
Improving Resource Efficiency with Apache Mesos
Christina Delimitrou
youtu.be/YpmElyi94AA
39. Quasar+Mesos @ Stanford, Twitter, etc.…
Consider that for datacenter computing at scale, a surge in workloads implies:
• large cap-ex investment, long lead-time to build
• utilities cannot supply the power requirements
Even for large players that achieve 2x beyond typical industry DC utilization rates, those factors become show-stoppers. Even so, high rates of over-provisioning are typical, so there’s much room to improve.
Experiences with Quasar+Mesos showed:
• 88% of apps get >95% performance
• ~10% overprovisioning instead of 500%
• up to 70% cluster utilization at steady state
• 23% shorter scenario completion
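The overprovisioning figures translate into rough capacity arithmetic. A back-of-the-envelope sketch (the peak-demand number is invented, and “500% overprovisioning” is read here as provisioning five times the extra capacity beyond peak demand):

```python
# Illustrative capacity math (only the overprovisioning rates come
# from the slide; the demand figure is made up).
peak_demand = 100  # units of work

# 500% overprovisioning: capacity is six times the peak demand.
typical = peak_demand * (1 + 5.00)
# ~10% overprovisioning with Quasar+Mesos.
quasar = peak_demand * (1 + 0.10)

savings = 1 - quasar / typical
print(f"capacity: {typical:.0f} vs {quasar:.0f} units "
      f"({savings:.0%} less hardware)")
```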
40. What’s New in Mesos 0.18?
mesos.apache.org/blog/mesos-0-18-0-released/
• Containerizer API
• Isolator API
• Cgroup layout change
Containerization and Docker:
“The first change is in terminology. The Mesos Slave now uses a Containerizer to provide an environment for each executor and its tasks to run in. Containerization includes resource isolation but is a more general concept that can encompass such things as packaging.”
44. Opposite Ends of the Spectrum, One Common Substrate
Request/Response ←→ Batch
45. Case Study: Twitter (bare metal / on premise)
“Mesos is the cornerstone of our elastic compute infrastructure – it’s how we build all our new services and is critical for Twitter’s continued success at scale. It’s one of the primary keys to our data center efficiency.”
Chris Fry, SVP Engineering
blog.twitter.com/2013/mesos-graduates-from-apache-incubation
wired.com/gadgetlab/2013/11/qa-with-chris-fry/
• key services run in production: analytics, typeahead, ads
• Twitter engineers rely on Mesos to build all new services
• instead of thinking about static machines, engineers think about resources like CPU, memory, and disk
• allows services to scale and leverage a shared pool of servers across datacenters efficiently
• reduces the time between prototyping and launching
46. Case Study: Airbnb (fungible cloud infrastructure)
“We think we might be pushing data science in the field of travel more so than anyone has ever done before… a smaller number of engineers can have higher impact through automation on Mesos.”
Mike Curtis, VP Engineering
gigaom.com/2013/07/29/airbnb-is-engineering-itself-into-a-data...
• improves resource management and efficiency
• helps advance engineering strategy of building small teams that can move fast
• key to letting engineers make the most of AWS-based infrastructure beyond just Hadoop
• allowed company to migrate off Elastic MapReduce
• enables use of Hadoop along with Chronos, Spark, Storm, etc.
47. Case Study: eBay (continuous integration)
eBay PaaS Team
ebaytechblog.com/2014/04/04/delivering-ebays-ci-solution-with-apache-mesos-part-i/
• cluster management (PaaS core framework services) for CI
• integration of: OpenStack, Jenkins, Zookeeper, Mesos, Marathon, Ansible
In eBay’s existing CI model, each developer gets a personal CI/Jenkins Master instance. This Jenkins instance runs within a dedicated VM, and over time the result has been VM sprawl and poor resource utilization. We started looking at solutions to maximize our resource utilization and reduce the VM footprint while still preserving the individual CI instance model. After much deliberation, we chose Apache Mesos for a POC. This post shares the journey of how we approached this challenge and accomplished our goal.
48. Case Study: HubSpot (cluster management)
Tom Petr
youtu.be/ROn14csiikw
• 500 deployable objects; 100 deploys/day to production; 90 engineers; 3 devops on Mesos cluster
• “Our QA cluster is now a fixed $10K/month – that used to fluctuate”
65. Calendar:
GlueCon – Broomfield, May 21 – gluecon.com/2014/
DockerCon – SF, Jun 9 – dockercon.com
Spark Summit – SF, Jun 30 – spark-summit.org/2014
OSCON – PDX, Jul 20 – oscon.com/oscon2014/
Strata NYC + Hadoop World – NYC, Oct 15 – strataconf.com/stratany2014
66. New Book: “Just Enough Math”, with Allen Day @MapR Asia
advanced math for business people, to leverage open source for Big Data
galleys: July 2014 @ OSCON
oscon.com/oscon2014/public/schedule/detail/34873
67. Enterprise Data Workflows with Cascading
O’Reilly, 2013
shop.oreilly.com/product/0636920028536.do
monthly newsletter for updates, events, conference summaries, etc.:
liber118.com/pxn/