- PyWren-IBM is a Python framework that allows users to easily scale Python code across serverless platforms like IBM Cloud Functions without having to learn the underlying storage or Function-as-a-Service APIs.
- It addresses challenges like how to integrate existing applications and workflows with serverless computing, how to process large datasets without becoming a storage expert, and how to scale code without major disruptions.
- The document discusses use cases for PyWren-IBM like Monte Carlo simulations, protein folding, and stock price prediction that demonstrate how it can be used for high performance computing workloads.
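The map-style programming model behind use cases like Monte Carlo simulation can be sketched locally. The snippet below is an illustrative sketch, not PyWren-IBM's actual API: Python's built-in map() stands in for the serverless executor, which would fan the chunks out to cloud functions instead.

```python
import random

random.seed(0)  # deterministic for the example

def estimate_pi_chunk(n_samples):
    """Count random points that land inside the unit quarter-circle."""
    inside = 0
    for _ in range(n_samples):
        x, y = random.random(), random.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return inside

# A serverless executor would run each chunk as a separate function
# invocation; the built-in map() stands in for it here.
chunks = [100_000] * 10
results = map(estimate_pi_chunk, chunks)
pi_estimate = 4 * sum(results) / (10 * 100_000)
```

The point of the pattern is that the per-chunk function is pure and self-contained, so scaling out is just a matter of swapping the map implementation.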
Your easy move to serverless computing and radically simplified data processing (Big Data Value Association)
Your easy move to serverless computing and radically simplified data processing.
Dr Ofer Biran (STSM and Manager, Cloud and Data Technologies, IBM Haifa Research Lab)
AWS Serverless patterns & best-practices in AWS (Dima Pasko)
This presentation discusses serverless patterns and best practices in AWS. It defines what serverless computing is and outlines the business case for serverless, including faster time to market, reduced costs, improved reliability, and increased innovation. It then covers AWS serverless design patterns and solutions, serverless myths and anti-patterns to avoid, and best practices like using the Serverless Application Model, AWS CDK, nested stacks, managing limits and connections, and power tuning.
1) Cloud native applications are built to take advantage of cloud computing resources like dynamically provisioned micro-services and distributed ephemeral components.
2) Netflix has transitioned to being a cloud native application built on an open source platform using AWS for scalable infrastructure, but also uses other providers for services not fully supported by AWS like content delivery and DNS.
3) What has changed is developers are freed from being the bottleneck through decentralization and automation of operations, allowing for greater agility, innovation, and business competitiveness in the cloud native model.
Disaster recovery is a coordinated process of restoring the systems, data, and infrastructure required to support key ongoing business operations. The cloud is a great way to manage this risk.
Review this AWS and Nimbo webinar where we discuss moving your data center to the AWS Cloud. We feature a real world example to illustrate how this can be achieved both quickly and smoothly.
Hess Corporation recently moved part of its infrastructure to the cloud, to prepare for a business divestiture. Relying on consultation from enterprise cloud solution provider Nimbo, the migration was completed securely, in about half the time it would have taken in an on-premises environment.
[Full slides now also available at http://www.slideshare.net/adrianco/netflix-on-cloud-combined-slides-for-dev-and-ops]
Short summary of why Netflix is running on the Amazon cloud, what is running there, what we have learned and where this is taking us.
This is the introduction to a series of public presentations that go into much more detail, presented at the Silicon Valley Cloud Computing Meetup on Oct 14th and at QCon San Francisco on November 3rd.
Metaflow: The ML Infrastructure at Netflix (Bill Liu)
Metaflow was started at Netflix to answer a pressing business need: How to enable an organization of data scientists, who are not software engineers by training, build and deploy end-to-end machine learning workflows and applications independently. We wanted to provide the best possible user experience for data scientists, allowing them to focus on parts they like (modeling using their favorite off-the-shelf libraries) while providing robust built-in solutions for the foundational infrastructure: data, compute, orchestration, and versioning.
Today, the open-source Metaflow powers hundreds of business-critical ML projects at Netflix and other companies from bioinformatics to real estate.
In this talk, you will learn about:
- What to expect from a modern ML infrastructure stack.
- Using Metaflow to boost the productivity of your data science organization, based on lessons learned from Netflix.
- Deployment strategies for a full stack of ML infrastructure that plays nicely with your existing systems and policies.
https://www.aicamp.ai/event/eventdetails/W2021080510
Continuous integration and deployment have become standard practice in software development. However, applying them to machine learning models and applications introduces many challenges. Not only do we need to account for standard code quality and integration testing, but how do we best account for changes in model performance metrics coming from changes to code, the deployment framework or mechanism, pre- and post-processing steps, or the data itself, not to mention the core deep learning model?
In addition, deep learning presents particular challenges:
* model sizes are often extremely large and take significant time and resources to train
* models are often more difficult to understand and interpret making it more difficult to debug issues
* inputs to deep learning are often very different from the tabular data involved in most ‘traditional machine learning’ models
* model formats, frameworks and the state-of-the art models and architectures themselves are changing extremely rapidly
* usually many disparate tools are combined to create the full end-to-end pipeline for training and deployment, making it trickier to plug together these components and track down issues.
We also need to take into account the impact of changes on wider aspects such as model bias, fairness, robustness and explainability. And we need to track all of this over time and in a standard, repeatable manner. This talk explores best practices for handling these myriad challenges to create a standardized, automated, repeatable pipeline for continuous deployment of deep learning models and pipelines. I will illustrate this through the work we are undertaking within the free and open-source IBM Model Asset eXchange.
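One concrete way to track model-performance changes "in a standard, repeatable manner" is a CI gate that compares a candidate model's metrics against a recorded baseline. The helper below is a hypothetical sketch (metrics_gate and its dict-based interface are my assumptions, not part of the Model Asset eXchange):

```python
def metrics_gate(baseline, candidate, tolerances):
    """Return the metric regressions that exceed their tolerance.

    baseline/candidate: dicts of metric name -> value (higher is better).
    tolerances: dict of metric name -> maximum allowed drop.
    """
    failures = []
    for name, allowed_drop in tolerances.items():
        drop = baseline[name] - candidate[name]
        if drop > allowed_drop:
            failures.append((name, baseline[name], candidate[name]))
    return failures

baseline = {"accuracy": 0.91, "f1": 0.88}
candidate = {"accuracy": 0.90, "f1": 0.84}
# f1 dropped by 0.04 against a tolerance of 0.02, so the gate flags it
failures = metrics_gate(baseline, candidate, {"accuracy": 0.02, "f1": 0.02})
```

In a pipeline, a non-empty failure list would fail the build, exactly like a broken unit test.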
Netflix Cloud Platform Building Blocks (Sudhir Tonse)
Architectural Building Blocks of the Netflix Cloud Platform and lessons learned while implementing the same.
Commandments of Web Scale Cloud Deployments
CloudGenius offers an incremental, evolutionary process for migrating IT systems to the cloud. It uses the (MC2)2 framework to evaluate alternatives and make complex decisions through multi-criteria analysis and decision-making methods such as AHP, and a prototype called CumulusGenius demonstrates the migration process.
Pragmatic Approach to Workload Migrations - London Summit Enterprise Track RePlay (Amazon Web Services)
Migrating a portfolio of legacy applications to AWS cloud infrastructure requires careful planning as each phase needs balancing between risk tolerance and the speed of migration. This session will present a set of successful best practices, tools and techniques that help migration speed of delivery and increase success rate. We will also cover the complete lifecycle of an application portfolio migration including a special focus on how to organise and conduct the assessment and identify elements that can benefit from cloud architecture.
A presentation on the Netflix Cloud Architecture and NetflixOSS open source. For the All Things Open 2015 conference in Raleigh 2015/10/19. #ATO2015 #NetflixOSS
This was presented by Supratik Ghatak, Co-Founder of Blazeclan, at the AWS Summit Mumbai 2013. CIO pain points and cloud migration reasons and strategies are the key focus of this presentation. CIOs and CTOs can gain insights into various ways of leveraging the AWS cloud, the priority areas to look at while using the cloud, and how to plan their strategies around the AWS cloud. Case studies show how organizations can benefit from the AWS Cloud.
This document discusses considerations for migrating applications to AWS. It identifies some key factors such as understanding the application stack and its components, security requirements, and current configuration. Standalone applications and loosely coupled applications are generally better candidates for migration than tightly integrated applications. It is recommended to do a proof of concept early to identify gaps. The document outlines the migration process and how on-premises infrastructure can be mapped to AWS architectures. CloudFormation templates can be used to automate infrastructure provisioning. Open source and paid toolkits can assist with monitoring and migration. Partnering with an experienced organization can help tailor the solution.
Evolution of Netflix's cloud security strategy. Includes cloud-based key management and hybrid security controls that span traditional datacenter and public cloud.
Netflix Global Applications - NoSQL Search Roadshow (Adrian Cockcroft)
This document summarizes Netflix's approach to building cloud native applications. It discusses how Netflix uses microservices, replicates components across availability zones, and implements automated testing like Chaos Monkey to make applications resilient to failures. It also describes how Netflix uses Apache Cassandra and other open source tools to build highly available storage and handle large volumes of data in the cloud.
AICamp - Dr Ramine Tinati - Making Computer Vision Real (Ramine Tinati)
The document provides an overview of computer vision and convolutional neural networks (CNNs). It discusses the basic architecture of CNNs, including convolutional layers, pooling layers, fully connected layers, and other concepts like activation functions, loss functions, regularization, and transfer learning. It also covers techniques for measuring CNN performance, such as precision, recall, average precision (AP), and intersection over union (IoU). The goal is to introduce computer vision and explain how CNNs work at a high level.
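Of the metrics listed, intersection over union is the easiest to pin down in code. A minimal sketch for axis-aligned bounding boxes in (x1, y1, x2, y2) form:

```python
def iou(box_a, box_b):
    """Intersection over Union for axis-aligned boxes (x1, y1, x2, y2)."""
    # Intersection rectangle: the overlap of the two boxes, clamped at 0
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# Two 10x10 boxes shifted by 5 in x:
# intersection = 5 * 10 = 50, union = 100 + 100 - 50 = 150, IoU = 1/3
overlap = iou((0, 0, 10, 10), (5, 0, 15, 10))
```

Detection benchmarks typically count a prediction as correct when its IoU with a ground-truth box exceeds a threshold such as 0.5, which is how IoU feeds into precision, recall, and AP.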
Intel IT Open Cloud - What's under the Hood and How do we Drive it? (Odinot Stanislas)
Intel IT is making its revolution and committing to act as a "Cloud Service Provider". The transformation is under way, with a federated, interoperable, and open cloud on the agenda, as well as a maturity framework, DevOps, and risk-taking. In short, really interesting.
Adrian Cockcroft discusses the challenges of building reliable cloud services in an imperfect environment. He describes Netflix's approach of using microservices, continuous delivery, and automation to create stability. Cockcroft also introduces NetflixOSS, an open source platform that provides libraries and tools to help other companies adopt this "cloud native" architecture. The talk outlines opportunities to improve portability and foster an ecosystem around NetflixOSS.
Feasibility of cloud migration for large enterprises (Anant Damle)
Analysis of conditions favouring movement to cloud for medium to large enterprises including checks for migration of existing application(s) from on-premise setup to cloud.
There are many questions about the best steps and ways to migrate to the cloud. Enterprises need specific steps to follow when migrating.
In this solution, we identify those specific steps and processes and how they can best be adapted.
To know more, please get in touch with us at info@blazeclan.com
Cloud Migration Cookbook: A Guide To Moving Your Apps To The Cloud (New Relic)
The process of building new apps or migrating existing apps to a cloud-based platform is complex. There are hundreds of paths you can take and only a few will make sense for you and your business. Get a step-by-step guide on how to plan for a successful app migration.
This document discusses microservices architecture and related technologies like Docker and Kubernetes. It provides an overview of how microservices break up monolithic applications into independent, scalable services. It also describes how Docker containerizes services and how Kubernetes provides automation, scaling, and management of containerized services running across a cluster. Logging and monitoring services like ELK/EFK are also summarized as ways to aggregate and analyze logs from microservices.
The document discusses cloud computing and AWS cloud technology. It provides an overview of cloud computing benefits like agility, scalability, and cost savings. It then describes AWS as the most comprehensive cloud platform offering over 175 services. It lists several popular AWS products and services like EC2, S3, RDS, DynamoDB, and Lambda. It discusses the future of faster modern architectures using cloud-first approaches like serverless, microservices, and DevOps to build applications. It concludes with tips on getting started with AWS like using free tiers and learning basic services first.
The document discusses Network Functions Virtualization (NFV) and how CloudStack can be enhanced to better support NFV use cases. It provides an overview of NFV, comparing the NFV reference architecture to CloudStack's virtual router. While CloudStack's virtual router functions similarly to a virtualized network function, CloudStack currently lacks features like layer 2 networking and enterprise topologies that are important for NFV. The document proposes enhancements to CloudStack such as new topology and network types that would improve its capabilities for NFV.
Cloud Migration and Portability Best Practices (RightScale)
Migrating applications to the cloud requires a clear understanding of both business and technical considerations. In addition, you will want to ensure portability among clouds to avoid lock-in. Here we define technical options for cloud portability and how to assess application suitability for migration.
Open-source vs. public cloud in the Big Data landscape. Friends or Foes? (GetInData)
If you want to stay up to date, subscribe to our newsletter here: https://bit.ly/3tiw1I8
A presentation about the strong competition between open-source vendors and public cloud providers in the Big Data landscape.
General discussions
Why cloud?
The terminology: relating virtualization and cloud
Types of Virtualization and Cloud deployment model
Decisive factors in migration
Hands-on cloud deployment
Cloud for banks
Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources that can be rapidly provisioned and released with minimal management effort or service provider interaction. Key characteristics of cloud computing include on-demand self-service, broad network access, resource pooling, rapid elasticity, and measured service. Common uses of cloud computing involve hosting applications and services through major cloud platforms like Amazon Web Services, Microsoft Azure, and Google Cloud.
Unleashing Apache Kafka and TensorFlow in the Cloud (Kai Wähner)
How can you leverage the flexibility and extreme scale of the public cloud combined with your Apache Kafka ecosystem to build scalable, mission-critical machine learning infrastructures that span multiple public clouds or bridge your on-premise data centre to the cloud?
This talk will discuss and demo how you can leverage machine learning technologies such as TensorFlow with your Kafka deployments in public cloud to build a scalable, mission-critical machine learning infrastructure for data preprocessing and ingestion, and model training, deployment and monitoring.
The discussed architecture includes capabilities like scalable data preprocessing for training and predictions, combination of different Deep Learning frameworks, data replication between data centres, intelligent real time microservices running on Kubernetes, and local deployment of analytic models for offline predictions.
Deep Learning UDF for KSQL for Streaming Anomaly Detection of MQTT IoT Sensor Data:
I built a KSQL UDF for sensor analytics. It leverages the new API features of KSQL to build UDF / UDAF functions easily with Java to do continuous stream processing on incoming events.
Use Case: Connected Cars - Real Time Streaming Analytics using Deep Learning
Continuously process millions of events from connected devices (sensors of cars in this example).
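The per-event logic such a UDF applies can be sketched in Python (the KSQL UDF itself is written in Java; the anomaly_detector name and the z-score rule below are illustrative assumptions, not the talk's actual model): flag a sensor reading that deviates more than a few standard deviations from a rolling window of recent readings.

```python
from collections import deque
from statistics import mean, stdev

def anomaly_detector(window_size=20, threshold=3.0):
    """Build a per-event check over a rolling window of readings.

    Mimics what a streaming UDF computes on each incoming event:
    flag values more than `threshold` sample standard deviations
    from the rolling mean, then add the value to the window.
    """
    window = deque(maxlen=window_size)

    def check(value):
        is_anomaly = False
        if len(window) >= 2:  # need at least 2 points for stdev
            mu, sigma = mean(window), stdev(window)
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                is_anomaly = True
        window.append(value)
        return is_anomaly

    return check

check = anomaly_detector(window_size=10, threshold=3.0)
readings = [50, 51, 49, 50, 52, 48, 50, 51, 49, 200]  # car sensor values
flags = [check(v) for v in readings]  # only the spike to 200 is flagged
```

In the Kafka setting, this check would run inside the continuous query, with the window state managed per sensor key by the stream processor rather than in a Python closure.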
Lino Telera gave a presentation on serverless computing. He began with introductions and background. The presentation covered serverless concepts like Function as a Service, demonstrated building a simple microservice using AWS Lambda that interacts with S3, and discussed integrating functions with services like S3 using Boto. It also showed how functions can be called from devices using skills and discussed running serverless on-premise using OpenFaaS or Pivotal Container Service. The presentation concluded with a Q&A and thanks to sponsors.
The document discusses cloud management challenges and how RightScale addresses them with its cloud management platform. It summarizes that RightScale provides unified management of multiple public and private clouds, enables self-service provisioning while maintaining governance controls, and offers automated tools to help enterprises scale their IT operations in the cloud.
Accelerate Digital Transformation with IBM Cloud Private (Michael Elder)
Accelerate the journey to cloud-native, refactor existing mission-critical workloads, and catalyze enterprise digital transformations.
How do you ensure the success of your enterprise in highly competitive market landscapes? How will you deliver new cloud-native workloads, modernize existing estates, and drive integration between them?
Protecting Your Power Systems with Cloud-based HA/DR (Precisely)
This document discusses using Skytap on Azure to provide high availability and disaster recovery for IBM Power Systems workloads in the cloud. Some key points:
- Skytap on Azure allows customers to run their IBM Power and x86 applications natively in Microsoft Azure without refactoring. This enables easy migration and modernization.
- It provides a familiar environment for IBM Power applications in Azure, requiring no training, changes, or refactoring. Workloads run securely with low-latency connectivity between on-premises and Azure networks using ExpressRoute.
- Skytap on Azure is available in several public Azure regions globally. Customers can use it for production workloads with high availability, as well as disaster
[Capitole du Libre] #serverless - mettez-le en oeuvre dans votre entreprise...Ludovic Piot
Tout comme le Cloud IaaS avant lui, le serverless promet de faciliter le succès de vos projets en accélérant le Time to Market et en fluidifiant les relations entre Devs et Ops.
Mais sa mise en œuvre au sein d’une entreprise reste complexe et coûteuse.
Après 2 ans à mettre en place des plateformes managées de ce type, nous partagons nos expériences de ce qu’il faut faire pour mettre en œuvre du serverless en entreprise, en évitant les douleurs et en limitant les contraintes au maximum.
Tout d’abord l’architecture technique, avec 2 implémentations très différentes : Kubernetes et Helm d’un côté, Clever Cloud on-premise de l’autre.
Ensuite, la mise en place et l’utilisation d’OpenFaaS. Comment tester et versionner du Function as a Service. Mais aussi les problématiques de blue/green deployment, de rolling update, d’A/B testing. Comment diagnostiquer rapidement les dépendances et les communications entre services.
Enfin, en abordant les sujets chers à la production : * vulnerability management et patch management, * hétérogénéïté du parc, * monitoring et alerting, * gestion des stacks obsolètes, etc.
RightScale Webinar: Operationalize Your Enterprise AWS Usage Through an IT Ve...RightScale
We’ve all seen the trend everywhere around us: customers want self-service. It offers them the agility they need and gives businesses the ability to scale and lower their costs. With cloud deployments, enterprises can experience similar benefits through the use of a self-service portal where internal customers can provision their own resources while Central IT maintains control and visibility. This saves both time and money.
In this webinar, learn how to empower your internal customers to provision the necessary cloud resources when they need them but also ensure that what get receive is well within IT approved guidelines. Beyond simple convenience, this methodology permits you to operationalize your AWS cloud usage to easily roll cloud into an overall IT strategy.
Architects from Amazon Web Services (AWS) and RightScale, an Advanced Technology Partner, will provide an overview of the key business and technical considerations for operationalizing your AWS cloud usage. In the second half of the webinar, our technical experts will answer your questions. Priority will be given to pre-submitted questions.
To help illustrate the effectiveness of this approach, our architects will walk you through real-world examples and the overall impact on their organizations.
Key Topics:
1. Create an IT Vending machine with consistent and reproducible processes.
2. Enable your end users while maintaining visibility and control.
3. Use cost planning and forecasting to fine-tune and understand cloud spend.
4. Discover reporting and auditing tools to ensure compliance.
5. Avoid downtime through proven HA/DR architectures.
Cloud computing is an umbrella term for internet-based computing resources that provide shared processing, data storage, software, and other services. It allows users to access applications and data from anywhere via simple web services. Key advantages include lower costs, improved performance, universal access to documents, easier collaboration, and unlimited storage. However, it requires a constant internet connection and features may be limited compared to desktop software. Data security and loss of access are also potential disadvantages.
IaaS provides on-demand, self-service access to computing resources like servers and storage. PaaS automates the deployment of applications on top of IaaS and handles scaling. SaaS delivers applications to users through a thin client like a web browser. iPaaS facilitates integration between SaaS, PaaS, IaaS, and on-premise systems through a cloud-based platform. Popular IaaS include OpenStack and VMware vSphere, PaaS include Cloud Foundry and OpenShift, while Salesforce and Office 365 are examples of SaaS.
Docker Orchestration: Welcome to the Jungle! JavaOne 2015Patrick Chanezon
In two years, Docker hit the sweet spot for devs and ops, with tools for building, shipping, and running distributed apps architected as a set of collaborating microservices packaged as Linux containers. One area of the Docker ecosystem that saw a lot of innovation in the past year is container orchestration systems. This session compares and contrasts various Docker orchestration systems (Swarm, Machine, and Compose), the batteries included with Docker itself, Mesos, Kubernetes, CoreOS/Fleet, Deis, Cloud Foundry, and Tutum. It includes a demo of how to deploy a Java 8 app with MongoDB on several of these systems. The goal of the session is to give you a framework to help evaluate how these systems can meet your particular requirements.
The document provides an overview of a cloud computing course, including introductions to cloud concepts and technologies, demonstrations of cloud capabilities, security considerations, hands-on labs, and a business case study. The course outline covers cloud models, elasticity, pay-per-use, on-demand services, virtual private clouds, storage solutions, serverless technologies, and implementing security and governance in the cloud.
We are working on building Hybrid Cloud for research and development purpose. Our project goal is to realize managing not only Public Cloud but also Private Cloud by making operations even easier. We are managing Amazon EC2, and our Private Cloud by making our own Cloud management tool by Drupal, which we call Clanavi beyond Drupal as a Content Management System. --- Drupal as a fundamental of PaaS (Platform as a Service).
We are happy to introduce our Clanavi including its requirements, architecture design and business value. We would like to show how Drupal can define to manage multiple Cloud infrastructures and why Drupal can be used as Web Application Framework.
Key Points Covered:
- Cloud Computing Overview (Definition)
- Private Cloud Requiremetns
- Goal, Design and Architecture
- Operation Problems in-the-Cloud
- Business Value by Clanavi
- Future Direction
- Q & A
Evolve or Fall Behind: Driving Transformation with Containers - Sai Vennam - ...CodeOps Technologies LLP
This presentation was the opening session in the container conference 2018 in Bangalore.
"IBM Developer Advocate Sai Vennam speaks about the latest emerging technology in the container space - from managed Kubernetes offerings to open-source tools like Istio and containerd. You'll also see how container technology is driving transformation in all industries across the world."
URL: www.containerconf.in
This document discusses three ways to move existing middleware to the cloud: lift and shift, refactor, and rebuild. It then provides details on each approach and the stages of modernization. It focuses on using containers with Docker and Kubernetes as part of the refactor approach. It also discusses related topics like microservices, API management, and IBM Cloud offerings.
Similar to Your easy move to serverless computing and radically simplified data processing (20)
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataKiwi Creative
Harness the power of AI-backed reports, benchmarking and data analysis to predict trends and detect anomalies in your marketing efforts.
Peter Caputa, CEO at Databox, reveals how you can discover the strategies and tools to increase your growth rate (and margins!).
From metrics to track to data habits to pick up, enhance your reporting for powerful insights to improve your B2B tech company's marketing.
- - -
This is the webinar recording from the June 2024 HubSpot User Group (HUG) for B2B Technology USA.
Watch the video recording at https://youtu.be/5vjwGfPN9lw
Sign up for future HUG events at https://events.hubspot.com/b2b-technology-usa/
Open Source Contributions to Postgres: The Basics POSETTE 2024ElizabethGarrettChri
Postgres is the most advanced open-source database in the world and it's supported by a community, not a single company. So how does this work? How does code actually get into Postgres? I recently had a patch submitted and committed and I want to share what I learned in that process. I’ll give you an overview of Postgres versions and how the underlying project codebase functions. I’ll also show you the process for submitting a patch and getting that tested and committed.
End-to-end pipeline agility - Berlin Buzzwords 2024Lars Albertsson
We describe how we achieve high change agility in data engineering by eliminating the fear of breaking downstream data pipelines through end-to-end pipeline testing, and by using schema metaprogramming to safely eliminate boilerplate involved in changes that affect whole pipelines.
A quick poll on agility in changing pipelines from end to end indicated a huge span in capabilities. For the question "How long time does it take for all downstream pipelines to be adapted to an upstream change," the median response was 6 months, but some respondents could do it in less than a day. When quantitative data engineering differences between the best and worst are measured, the span is often 100x-1000x, sometimes even more.
A long time ago, we suffered at Spotify from fear of changing pipelines due to not knowing what the impact might be downstream. We made plans for a technical solution to test pipelines end-to-end to mitigate that fear, but the effort failed for cultural reasons. We eventually solved this challenge, but in a different context. In this presentation we will describe how we test full pipelines effectively by manipulating workflow orchestration, which enables us to make changes in pipelines without fear of breaking downstream.
Making schema changes that affect many jobs also involves a lot of toil and boilerplate. Using schema-on-read mitigates some of it, but has drawbacks since it makes it more difficult to detect errors early. We will describe how we have rejected this tradeoff by applying schema metaprogramming, eliminating boilerplate but keeping the protection of static typing, thereby further improving agility to quickly modify data pipelines without fear.
End-to-end pipeline agility - Berlin Buzzwords 2024
Your easy move to serverless computing and radically simplified data processing
1. Your easy move to serverless computing and
radically simplified data processing
Dr. Gil Vernik, IBM Research
2. About myself
• Gil Vernik
• At IBM Research since 2010
• PhD in mathematics. Post-doc in Germany
• Architect, 25+ years of development experience
• Active in open source
• Recent interest – Cloud. Hybrid cloud. Big Data. Storage. Serverless
Twitter: @vernikgil
https://www.linkedin.com/in/gil-vernik-1a50a316/
3. Agenda
What problem we solve
Why serverless computing
How to make an easy move to serverless
Use cases
4. This project has received funding from the European Union’s Horizon 2020 research and innovation
programme under grant agreement No 825184.
http://cloudbutton.eu
6. Simulations
• Alice is working in the risk
management department at the bank
• She needs to evaluate a new contract
• She decided to run a Monte-Carlo
simulation to evaluate the contract
• About 100,000,000 calculations are needed to get a good estimate
This Photo by Unknown Author is licensed under CC BY-SA
7. The challenge
How and where to scale the code of Monte Carlo simulations?
Business logic
8. Data processing
• Maria needs to run face detection using TensorFlow over millions of images. The process requires raw images to be pre-processed before they are used by TensorFlow
• Maria wrote the code and tested it on a single image
• Now she needs to execute the same code at massive scale, with parallelism, on terabytes of data stored in object storage
Raw image Pre-processed image
9. The challenge
How can the code be scaled to run in parallel on terabytes of data without becoming an expert in scaling systems and learning storage semantics?
IBM Cloud Object Storage
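One part of the answer is automatic input partitioning: the listing of objects is split into independent work units before any code runs. The idea can be sketched in a few lines (a simplified illustration with a hypothetical `partition` helper, not PyWren-IBM's actual partitioner):

```python
def partition(keys, chunk_size):
    # Split a listing of object keys into independent work units,
    # one chunk per function invocation. A simplified sketch of what
    # an automatic input partitioner does; not PyWren-IBM's real code.
    return [keys[i:i + chunk_size] for i in range(0, len(keys), chunk_size)]

keys = ['img%03d.jpg' % i for i in range(10)]
chunks = partition(keys, 4)
```

Each chunk can then be handed to one function invocation, so no task needs to know about any other.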
10. Mid summary
• How and where to scale the code?
• How to process massive data sets without becoming a storage expert?
• How to scale certain flows of existing applications without major disruption to the existing system?
11. VMs, containers and the rest
• The naïve way to scale an application: provision highly resourced virtual machines and run the application there
• Complicated, time consuming, and expensive
• A recent trend is to leverage container platforms
• Containers offer better granularity than VMs, better resource allocation, and so on
• Docker containers became popular, yet many challenges remain in how to "containerize" existing code or applications
• Comparing VMs and containers is beyond the scope of this talk…
• Alternative: leverage Function as a Service platforms
12. FaaS - "Hello Strata NY"
Deploy the code (as specified by the FaaS provider) to IBM Cloud Functions as the action "helloStrata", then:
Invoke "helloStrata" → "Hello Strata NY"

# main() will be invoked when you Run This Action.
#
# @param Cloud Functions actions accept a single parameter,
#        which must be a JSON object.
#
# @return which must be a JSON object.
#         It will be the output of this action.
#
import sys
def main(dict):
    if 'name' in dict:
        name = dict['name']
    else:
        name = 'Strata NY'
    greeting = 'Hello ' + name + '!'
    print(greeting)
    return {'greeting': greeting}
13. Function as a Service
(Diagram: an event triggers an action on IBM Cloud Functions — deploy the code, pass input, receive output.)
• Unit of computation is a function
• A function is a short-lived task
• Smart activation, event driven, etc.
• Usually stateless
• Transparent auto-scaling
• Pay only for what you use
• No administration
• All other aspects of the execution are delegated to the cloud provider
14. Are there still challenges?
• How to integrate FaaS into existing applications and frameworks without major disruption?
• Users need to be familiar with the storage and FaaS platform APIs
• How to control and coordinate invocations?
• How to scale the input and generate the output?
15. Push to the Cloud
• Occupy the Cloud: Distributed Computing for the 99%
(Eric Jonas, Qifan Pu, Shivaram Venkataraman, Ion Stoica, Benjamin Recht, 2017)
• Why is it still "complicated" to move workflows to the Cloud? Users need to be familiar with the cloud provider's API, use deployment tools, write code according to the cloud provider's spec, and so on.
• Can FaaS be used for a broad scope of flows? (RISELab at UC Berkeley, 2017)
PyWren – an open source framework released
16. Push to the cloud with PyWren
• Serverless for more use cases (not just event-based or "glue" for services)
• A push-to-the-Cloud experience
• Designed to scale Python applications at massive scale
(Diagram: Python code fans out to serverless actions 1, 2, …, 1000.)
17. Cloud Button Toolkit
• PyWren-IBM (aka CloudButton Toolkit) is a novel Python framework extending PyWren
• ~800 commits to PyWren-IBM on top of PyWren
• Being developed as part of the CloudButton project
• Led by IBM Research Haifa
• Open source: https://github.com/pywren/pywren-ibm-cloud
18. PyWren-IBM example

import pywren_ibm_cloud as cbutton

data = [1, 2, 3, 4]

def my_map_function(x):
    return x + 7

cb = cbutton.ibm_cf_executor()
cb.map(my_map_function, data)
print(cb.get_result())
# [8, 9, 10, 11]

(The map runs as parallel actions on IBM Cloud Functions.)
20. PyWren-IBM example

import pywren_ibm_cloud as cbutton

data = "cos://mybucket/year=2019/"

def my_map_function(obj, boto3_client):
    # business logic
    return obj.name

cb = cbutton.ibm_cf_executor()
cb.map(my_map_function, data)
print(cb.get_result())
# [d1.csv, d2.csv, d3.csv, ...]

(Each object under the bucket prefix is handed to one invocation on IBM Cloud Functions.)
21. Unique differentiations of PyWren-IBM
• Pluggable implementation for FaaS platforms
• IBM Cloud Functions, Apache OpenWhisk, OpenShift by Red Hat, Kubernetes
• Supports Docker containers
• Seamless integration with Python notebooks
• Advanced input data partitioner
• Data discovery to process large amounts of data stored in IBM Cloud Object Storage, chunking of CSV files, support for user-provided partition logic
• Unique functionalities
• Map-reduce, monitoring, retry, in-memory queues, authentication token reuse, pluggable storage backends, and many more…
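The map-reduce functionality named above follows the classic two-stage pattern. A minimal local sketch (plain Python; `count_words` and `merge_counts` are hypothetical stand-ins for user code — PyWren-IBM would run the map stage as parallel cloud functions and gather the results for the reduce):

```python
from functools import reduce

def count_words(chunk):
    # Map stage: each text chunk is processed independently,
    # so chunks can be fanned out to parallel workers.
    counts = {}
    for word in chunk.split():
        counts[word] = counts.get(word, 0) + 1
    return counts

def merge_counts(a, b):
    # Reduce stage: fold the partial results into one dictionary.
    for word, n in b.items():
        a[word] = a.get(word, 0) + n
    return a

chunks = ["to be or", "not to be"]
partials = [count_words(c) for c in chunks]  # runs in parallel with map()
total = reduce(merge_counts, partials, {})
```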
22. What is PyWren-IBM good for?
• Batch processing, UDFs, ETL, HPC, and Monte Carlo simulations
• Embarrassingly parallel workloads or problems – often cases where there is little or no dependency or need for communication between parallel tasks
• A subset of map-reduce flows
(Diagram: input data is split into tasks 1, 2, 3, …, n that run in parallel; their results are gathered.)
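Because the tasks in such workloads are independent, the fan-out can be mimicked locally with the standard library (a sketch only — PyWren-IBM replaces the local pool with up to thousands of cloud function invocations):

```python
from concurrent.futures import ThreadPoolExecutor

def my_map_function(x):
    # Stands in for per-task business logic; no task depends on another,
    # which is what makes the workload embarrassingly parallel.
    return x + 7

data = list(range(1000))
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(my_map_function, data))
```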
23. What does PyWren-IBM require?
A Function as a Service platform:
• IBM Cloud Functions, Apache OpenWhisk
• OpenShift, Kubernetes, etc.
Storage accessible from the Function as a Service platform through the S3 API:
• IBM Cloud Object Storage
• Red Hat Ceph
24. PyWren-IBM and HPC This Photo by Unknown Author is licensed under CC BY-SA
25. What is HPC?
• High Performance Computing
• Mostly used to solve advanced problems such as simulations, analysis, research problems, etc.
• Is HPC well defined? – depends whom you ask
• Supercomputers, highly parallel processing, or both?
• MPI (Message Passing Interface) for communication, or only a need to exchange results between simulations?
• Data locality, or "fast" access to the data?
• Super fast? "Fast" enough? Or good enough?
This Photo by Unknown Author is licensed under CC BY-NC
26. HPC and "super" computers
• Dedicated HPC supercomputers
• Designed to be super fast
• Calculations usually rely on the Message Passing Interface (MPI)
• Pros: HPC supercomputers
• Cons: HPC supercomputers
(Diagram: HPC simulations running on dedicated HPC supercomputers.)
27. HPC and VMs
• No need to buy expensive machines
• Frameworks exist to run HPC flows over VMs
• Flows usually depend on MPI and data locality
• Recent academic interest
• Pros: virtual machines
• Cons: virtual machines
(Diagram: HPC simulations running on virtual machines – private, cloud, etc.)
28. HPC and containers
• Good granularity, parallelism, resource allocation, etc.
• Research papers, frameworks
• Singularity / Docker containers
• Pros: containers
• Cons: much of the focus is on how to move an entire application into containers, which usually requires re-designing the application
(Diagram: HPC simulations running in containers.)
29. HPC and FaaS with PyWren-IBM
• FaaS is a perfect platform for scaling code and applications
• Many FaaS platforms allow users to use Docker containers
• Code can include any dependencies
• PyWren-IBM is a natural fit for many HPC flows
• Pros: the easy move to serverless
• Try it yourself…
(Diagram: HPC simulations running on PyWren-IBM over FaaS.)
30. Use cases and demos..
IBM Cloud
Object Storage
PyWren-IBM framework
https://github.com/pywren/pywren-ibm-cloud
IBM Cloud
Functions
31. Monte Carlo and PyWren-IBM
PyWren is a natural fit for scaling Monte Carlo computations across a FaaS platform.
Users need only write the business logic; PyWren does the rest.
Monte Carlo methods are a broad class of computational algorithms used to evaluate risk and uncertainty, e.g. of investments in projects; they are popular methods in finance.
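As a concrete illustration of why Monte Carlo fits this model (a toy example, not from the talk): estimating π by random sampling. Each task needs only its own seed, so thousands of such tasks can run as independent invocations and their estimates be averaged afterwards:

```python
import random

def estimate_pi(n_samples, seed):
    # One independent Monte Carlo task: count random points that fall
    # inside the unit quarter-circle. Needs no communication with peers.
    rng = random.Random(seed)
    inside = sum(1 for _ in range(n_samples)
                 if rng.random() ** 2 + rng.random() ** 2 <= 1.0)
    return 4.0 * inside / n_samples

# Locally, fan out a few tasks; with PyWren-IBM the same function
# would be passed to map() and each task would run as a cloud function.
estimates = [estimate_pi(100_000, seed) for seed in range(8)]
pi_hat = sum(estimates) / len(estimates)
```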
32. Stock price prediction
• A mathematical approach to stock price modelling, more accurate for modelling prices over longer periods of time
• We ran Monte Carlo stock prediction over IBM Cloud Functions with PyWren-IBM
• With PyWren-IBM the total code is ~40 lines; without it, running the same code requires hundreds of additional lines

Number of forecasts | Local run (1 CPU, 4 cores) | IBM CF      | Total number of CF invocations
100,000             | 10,000 seconds             | ~70 seconds | 1,000

• We ran 1,000 concurrent invocations, each consuming 1024 MB of memory
• Each invocation predicted a forecast of 1,080 days and used 100 random samples per prediction; in total we performed 108,000,000 calculations
• About 2,500 forecasts predicted a stock price around $130
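The slides do not give the pricing equations; a common model for such forecasts is geometric Brownian motion, S(t+1) = S(t)·exp((μ − σ²/2)Δt + σ√Δt·Z). A sketch under that assumption (the drift `mu`, volatility `sigma`, and start price are illustrative values, not from the experiment):

```python
import math
import random

def forecast(s0, mu, sigma, days, seed):
    # One forecast path under geometric Brownian motion — an illustrative
    # model choice; the talk does not specify the exact equations used.
    rng = random.Random(seed)
    dt = 1.0 / 365.0
    price = s0
    for _ in range(days):
        z = rng.gauss(0.0, 1.0)
        price *= math.exp((mu - 0.5 * sigma ** 2) * dt
                          + sigma * math.sqrt(dt) * z)
    return price

# Each forecast is independent, so 1,000 of them map naturally onto
# 1,000 concurrent function invocations.
paths = [forecast(100.0, 0.05, 0.2, 1080, seed) for seed in range(100)]
```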
34. Protein Folding
• Proteins are biological polymers that carry out most of the cell's day-to-day functions
• Protein structure leads to protein function
• Proteins are made from a linear chain of amino acids and folded into a variety of 3-D shapes
• Protein folding is a complex process that is not yet completely understood
This Photo by Unknown Author is licensed under CC BY-SA-NC
This Photo by Unknown Author is licensed under CC BY-SA
35. Replica exchange
• Monte Carlo simulations are popular methods for predicting protein folding
• ProtoMol is a framework specially designed for molecular dynamics
• http://protomol.sourceforge.net
• A highly parallel replica exchange molecular dynamics (REMD) method is used to exchange between Monte Carlo processes for efficient sampling
• A series of tasks (replicas) are run in parallel at various temperatures
• From time to time the configurations of neighboring tasks are exchanged
• Various HPC frameworks allow running protein folding
• They depend on MPI
• VMs or dedicated HPC machines
36. Protein folding with PyWren-IBM
(Flow: PyWren-IBM submits a job of X invocations, each running ProtoMol → PyWren-IBM collects the results of all invocations → the REMD algorithm uses the output of the invocations as input to the next job.)
Each invocation runs the ProtoMol library to perform Monte Carlo simulations. ProtoMol uses MPI for communication between threads.
* This Photo by Unknown Author is licensed under CC BY-SA
Our experiment – 99 jobs
• Each job executes many IBM CF invocations
• Each invocation runs 100 Monte Carlo steps
• Each step runs 10,000 molecular dynamics steps
• REMD exchanges the results of the completed job, which are used as input to the following job
• Our approach doesn't use MPI
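The exchange step between neighboring replicas typically uses the standard Metropolis criterion: accept the swap with probability min(1, exp((1/kT_i − 1/kT_j)(E_i − E_j))). A minimal sketch of that textbook rule (ProtoMol's actual implementation has more detail):

```python
import math

def swap_probability(e_i, e_j, t_i, t_j, k_b=1.0):
    # Metropolis acceptance for exchanging neighboring replicas i and j,
    # running at temperatures t_i and t_j with current energies e_i, e_j.
    delta = (1.0 / (k_b * t_i) - 1.0 / (k_b * t_j)) * (e_i - e_j)
    if delta >= 0.0:
        return 1.0  # always accept a favorable exchange
    return math.exp(delta)
```

Because each replica only needs its neighbor's energy at exchange time, the replicas themselves can run as independent invocations between exchanges, which is what lets the flow avoid MPI.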
38. PyWren-IBM for data processing
Face recognition experiment with PyWren-IBM over IBM Cloud:
• Align faces using open source tools on 1,000 images stored in IBM Cloud Object Storage
• Given Python code that knows how to extract a face from a single image
• Run from any Python notebook
39. Processing images without PyWren-IBM
import logging
import os
import sys
import time
import shutil
import cv2
from openface.align_dlib import AlignDlib
logger = logging.getLogger(__name__)
temp_dir = '/tmp'
def preprocess_image(bucket, key, data_stream, storage_handler):
"""
Detect face, align and crop :param input_path. Write output to :param output_path
:param bucket: COS bucket
:param key: COS key (object name ) - may contain delimiters
:param storage_handler: can be used to read / write data from / into COS
"""
crop_dim = 180
#print("Process bucket {} key {}".format(bucket, key))
sys.stdout.write(".")
# key of the form /subdir1/../subdirN/file_name
key_components = key.split('/')
file_name = key_components[len(key_components)-1]
input_path = temp_dir + '/' + file_name
if not os.path.exists(temp_dir + '/' + 'output'):
os.makedirs(temp_dir + '/' +'output')
output_path = temp_dir + '/' +'output/' + file_name
with open(input_path, 'wb') as localfile:
shutil.copyfileobj(data_stream, localfile)
exists = os.path.isfile(temp_dir + '/' +'shape_predictor_68_face_landmarks')
if exists:
pass;
else:
res = storage_handler.get_object(bucket, 'lfw/model/shape_predictor_68_face_landmarks.dat', stream=True)
with open(temp_dir + '/' +'shape_predictor_68_face_landmarks', 'wb') as localfile:
shutil.copyfileobj(res, localfile)
align_dlib = AlignDlib(temp_dir + '/' +'shape_predictor_68_face_landmarks')
image = _process_image(input_path, crop_dim, align_dlib)
if image is not None:
#print('Writing processed file: {}'.format(output_path))
cv2.imwrite(output_path, image)
f = open(output_path, "rb")
processed_image_path = os.path.join('output',key)
storage_handler.put_object(bucket, processed_image_path, f)
os.remove(output_path)
else:
pass;
#print("Skipping filename: {}".format(input_path))
os.remove(input_path)
def _process_image(filename, crop_dim, align_dlib):
image = None
aligned_image = None
image = _buffer_image(filename)
if image is not None:
aligned_image = _align_image(image, crop_dim, align_dlib)
else:
raise IOError('Error buffering image: {}'.format(filename))
return aligned_image
def _buffer_image(filename):
logger.debug('Reading image: {}'.format(filename))
image = cv2.imread(filename, )
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
return image
def _align_image(image, crop_dim, align_dlib):
bb = align_dlib.getLargestFaceBoundingBox(image)
aligned = align_dlib.align(crop_dim, image, bb, landmarkIndices=AlignDlib.INNER_EYES_AND_BOTTOM_LIP)
if aligned is not None:
aligned = cv2.cvtColor(aligned, cv2.COLOR_BGR2RGB)
return aligned
import ibm_boto3
import ibm_botocore
from ibm_botocore.client import Config
from ibm_botocore.credentials import DefaultTokenManager
t0 = time.time()
client_config = ibm_botocore.client.Config(signature_version='oauth',
max_pool_connections=200)
api_key = config['ibm_cos']['api_key']
token_manager = DefaultTokenManager(api_key_id=api_key)
cos_client = ibm_boto3.client('s3', token_manager=token_manager,
config=client_config, endpoint_url=config['ibm_cos']['endpoint'])
try:
paginator = cos_client.get_paginator('list_objects_v2')
page_iterator = paginator.paginate(Bucket="gilvdata", Prefix = 'lfw/test/images')
print (page_iterator)
except ibm_botocore.exceptions.ClientError as e:
print(e)
class StorageHandler:
def __init__(self, cos_client):
self.cos_client = cos_client
def get_object(self, bucket_name, key, stream=False, extra_get_args={}):
"""
Get object from COS with a key. Throws StorageNoSuchKeyError if the given key does not exist.
:param key: key of the object
:return: Data of the object
:rtype: str/bytes
"""
try:
r = self.cos_client.get_object(Bucket=bucket_name, Key=key, **extra_get_args)
if stream:
data = r['Body']
else:
data = r['Body'].read()
return data
except ibm_botocore.exceptions.ClientError as e:
if e.response['Error']['Code'] == "NoSuchKey":
raise StorageNoSuchKeyError(key)
else:
raise e
def put_object(self, bucket_name, key, data):
"""
Put an object in COS. Override the object if the key already exists.
:param key: key of the object.
:param data: data of the object
:type data: str/bytes
:return: None
"""
try:
res = self.cos_client.put_object(Bucket=bucket_name, Key=key, Body=data)
status = 'OK' if res['ResponseMetadata']['HTTPStatusCode'] == 200 else 'Error'
try:
log_msg='PUT Object {} size {} {}'.format(key, len(data), status)
logger.debug(log_msg)
except:
log_msg='PUT Object {} {}'.format(key, status)
logger.debug(log_msg)
except ibm_botocore.exceptions.ClientError as e:
if e.response['Error']['Code'] == "NoSuchKey":
raise StorageNoSuchKeyError(key)
else:
raise e
temp_dir = '/home/dsxuser/.tmp'
storage_client = StorageHandler(cos_client)
for page in page_iterator:
if 'Contents' in page:
for item in page['Contents']:
key = item['Key']
r = cos_client.get_object(Bucket='gilvdata', Key=key)
data = r['Body']
Business logic vs. boilerplate:
• Loop over all images
• Close to 100 lines of "boilerplate" code to find the images, read and write the objects, etc.
• The data scientist needs to be familiar with the S3 API
• Execution time: approximately 36 minutes!
40. Processing images with PyWren-IBM
import logging
import os
import sys
import time
import shutil
import cv2
from openface.align_dlib import AlignDlib
logger = logging.getLogger(__name__)
temp_dir = '/tmp'
def preprocess_image(bucket, key, data_stream, storage_handler):
"""
Detect face, align and crop :param input_path. Write output to :param output_path
:param bucket: COS bucket
:param key: COS key (object name ) - may contain delimiters
:param storage_handler: can be used to read / write data from / into COS
"""
crop_dim = 180
#print("Process bucket {} key {}".format(bucket, key))
sys.stdout.write(".")
# key of the form /subdir1/../subdirN/file_name
key_components = key.split('/')
file_name = key_components[len(key_components)-1]
input_path = temp_dir + '/' + file_name
if not os.path.exists(temp_dir + '/' + 'output'):
os.makedirs(temp_dir + '/' +'output')
output_path = temp_dir + '/' +'output/' + file_name
with open(input_path, 'wb') as localfile:
shutil.copyfileobj(data_stream, localfile)
exists = os.path.isfile(temp_dir + '/' +'shape_predictor_68_face_landmarks')
if exists:
pass;
else:
res = storage_handler.get_object(bucket, 'lfw/model/shape_predictor_68_face_landmarks.dat', stream=True)
with open(temp_dir + '/' +'shape_predictor_68_face_landmarks', 'wb') as localfile:
shutil.copyfileobj(res, localfile)
align_dlib = AlignDlib(temp_dir + '/' +'shape_predictor_68_face_landmarks')
image = _process_image(input_path, crop_dim, align_dlib)
if image is not None:
#print('Writing processed file: {}'.format(output_path))
cv2.imwrite(output_path, image)
f = open(output_path, "rb")
processed_image_path = os.path.join('output',key)
storage_handler.put_object(bucket, processed_image_path, f)
os.remove(output_path)
else:
pass;
#print("Skipping filename: {}".format(input_path))
os.remove(input_path)
def _process_image(filename, crop_dim, align_dlib):
image = None
aligned_image = None
image = _buffer_image(filename)
if image is not None:
aligned_image = _align_image(image, crop_dim, align_dlib)
else:
raise IOError('Error buffering image: {}'.format(filename))
return aligned_image
def _buffer_image(filename):
logger.debug('Reading image: {}'.format(filename))
image = cv2.imread(filename, )
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
return image
def _align_image(image, crop_dim, align_dlib):
bb = align_dlib.getLargestFaceBoundingBox(image)
aligned = align_dlib.align(crop_dim, image, bb, landmarkIndices=AlignDlib.INNER_EYES_AND_BOTTOM_LIP)
if aligned is not None:
aligned = cv2.cvtColor(aligned, cv2.COLOR_BGR2RGB)
return aligned
pw = pywren.ibm_cf_executor(config=config, runtime='pywren-dlib-runtime_3.5')
bucket_name = 'gilvdata/lfw/test/images'
results = pw.map_reduce(preprocess_image, bucket_name, None, None).get_result()
Business logic vs. boilerplate:
• Under 3 lines of "boilerplate"!
• The data scientist does not need to use the S3 API!
• Execution time is 35 seconds
• 35 seconds as compared to 36 minutes!
41. Satellite batch data processing
Input: application code or Docker image; raw satellite imagery
IBM Cloud Functions with PyWren-IBM
Output: processed raster data, metadata, processed vector data
• Horizontally scalable
• Unified imagery ingestion pipeline
• Efficient output query on both raster data and vector data
43. Big data, mass spectrometry, computational biology
Spatial metabolomics across scales – Alexandrov team at EMBL Heidelberg
(Figure: imaging mass spectrometry applied at scales from whole organisms (~1 m) through tissues and microbial plates (~1 cm) down to single cells (~1 μm).)
Interdisciplinary work spanning single-cell biology, -omics, and computational methods (molecular image analysis, ML, AI, big data), with applications in inflammation, immunity, and cancer.
References: Protsyuk et al., Nature Protocols 2018; Bouslimani et al., PNAS 2017; Palmer et al., Nature Methods 2017; Alexandrov et al., bioRxiv 2019; Rappez et al., bioRxiv 2019.
44. METASPACE use case
• The Alexandrov team at EMBL develops novel computational biology tools to reveal the spatial organization of metabolic processes: https://www.embl.de/research/units/scb/alexandrov/
• "Imaging mass spectrometry" application
• Uses Apache Spark and is deployed across VMs in the cloud
(Flow: customer uploads a medical image → customer chooses molecular databases → data pre-processing and segmentation → molecular scan of datasets with dedicated algorithms → generation of output images with results.)
45. Metabolomics with PyWren-IBM
Metabolomics application with PyWren-IBM:
• We used PyWren-IBM to build a prototype that deploys the metabolite annotation engine as serverless actions in the IBM Cloud
• https://github.com/metaspace2020/pywren-annotation-pipeline
Benefits of PyWren-IBM:
• Better control of data partitions
• Speed of deployment, no need for VMs
• Elasticity and automatic scaling
• And many more…
46. Behind the scenes
Molecular databases: up to 100M molecular strings
Dataset input: up to 50 GB binary file
The metabolite annotation engine (molecular annotation and image processing) is deployed by PyWren-IBM on IBM Cloud Functions and produces the results.
47. Annotation results
(Figure: brain and tumor panels of a whole-body section of a mouse model showing localization of glutamate.)
Glutamate is a well-known neurotransmitter abundant in the brain. It is, however, also linked to cancer, where it supports proliferation and growth of cancer cells. Both facts are supported by the detected localization, obtained using METASPACE.
Data provided by Genentech.
48. Summary
PyWren-IBM is a novel framework with advanced capabilities for running user code in the cloud.
We demonstrated the benefits of PyWren-IBM for HPC, molecular biology, and batch data pre-processing.
For more use cases and examples visit our project page: https://github.com/pywren/pywren-ibm-cloud
All is open source.
Thank you
Gil Vernik
gilv@il.ibm.com