The document discusses creating developer-friendly containers using Chaperone and the chaperone-baseimage family. Chaperone is a process manager that provides services like logging, cron jobs, and orderly shutdown within containers. The chaperone-baseimage images use Chaperone to provide three personalities for containers: closed, attached-data, and development. This allows developers to work in a consistent environment without needing to understand container internals. In the development model, only infrastructure resides in the container, while application code and data live in the developer's local directory, which is mounted into the container so both can be edited outside it.
Slides from my talk at 2016 Apache: Big Data conference.
Resource managers like Apache YARN and Mesos have emerged as a critical layer in the cloud computing system stack, but the developer abstractions for leasing cluster resources and instantiating application logic are very low-level. We present Apache REEF, a powerful yet simple framework that helps developers of big data systems to retain fine-grained control over the cloud resources and address common problems of fault-tolerance, task scheduling and coordination, caching, interprocess communication, and bulk-data transfers. We will guide the developers through a simple REEF application and discuss current state of Apache REEF project and its place in the Hadoop ecosystem.
Triple-E’ class Continuous Delivery with Hudson, Maven, Kokki and PyDev, by Werner Keil
At Maersk Line, not only are the world's biggest ships, the 'Triple-E' class vessels, currently being built; Continuous Integration and Delivery are also practiced there on a similar scale, using Hudson, Maven and tools like Kokki, which is similar to Puppet or Chef.
This session gives a brief overview of the Multi-Configuration (Matrix) job types used in most of these projects. Things are built and deployed in a heterogeneous environment otherwise probably found only at large vendors of public cloud services.
Java APIs - The Missing Manual (Concurrency), by Hendrik Ebbers
This isn’t a talk about microservices, NoSQL, container solutions or hip new frameworks. This talk will show some of the standard Java APIs that are part of Java since version 5, 6, 7 or 8. All those features are very helpful to create maintainable and future-proof applications, regardless of whether JavaEE, Spring, JavaFX or any other framework is used. The talk will give an overview of some important standard concepts and APIs of Java like annotations, null values and concurrency.
JavaOne 2015: From Java Code to Machine Code, by Chris Bailey
When you write and run Java code, it is first compiled by javac to bytecode and then converted to optimized machine code by the just-in-time (JIT) compiler. Although JIT compilers are advanced and are able to create highly optimized code, the level of optimization achievable is ultimately limited by how the original Java code was written. This presentation introduces the compilation and optimization process and uses applications to show how following several simple rules when writing your Java code can lead to highly optimizable, and therefore highly performant, applications.
Presented at JavaOne 2015
Apache DeviceMap - ApacheCon Core Europe 2015, by Werner Keil
We experience a growing number of mobile and similar devices flooding the market almost every day. Capturing the specification of each device is a tough job. If you want to create a great UX you need dynamic content matching the hardware and browser specs of your device. That’s why Device Description Repositories (DDR) exist. Apache DeviceMap is a collaborative effort to create a comprehensive open-source and open-data repository of device information and other relevant data for various devices. The project began in January 2012, after which OpenDDR contributed data and APIs for Java and .NET. DeviceMap left the Apache Incubator in November 2014. After modularization, DeviceMap 2.0 aims to make classification generic so people can introduce their own detection domains, to support further languages like JavaScript/Node.js and common web UI frameworks, and to provide a JSON representation of device data.
Slides from Workshop 'Cloud Foundry: Hands-on Deployment Workshop'
http://www.meetup.com/CloudFoundry/events/150601282/
In this workshop you will learn Cloud Foundry fundamental concepts, setup, deployment and operations. We’ll cover a couple of alternatives for deploying CF in a local environment for learning and testing purposes, as well as deploying Cloud Foundry atop a production-level IaaS environment capable of managing hundreds of components and thousands of applications.
If you did not have a chance to work with Cloud Foundry, it may be useful to test its features locally at first. Deploying this environment on a local machine allows you to get hands-on experience in the solution and, in case you are a contributor, to test some features before you commit them to a production environment.
This is the second session of the learning pathway at PASS Summit 2019, which is still a stand-alone session teaching you how to write proper Linux Bash scripts.
Agenda:
• Brief overview of Spark provided spark-shell, spark-submit
• Overview of Spark Context
• Overview of Zeppelin and Jupyter notebooks for Spark
• Introduction to IBM Spark Kernel
• Introduction to Cloudera Livy and Spark JobServer
Github Link:
Previous meetups:
1) Introduction to Resilient Distributed Dataset and deep dive
Slides: http://www.slideshare.net/differentsachin/apache-spark-introduction-and-resilient-distributed-dataset-basics-and-deep-dive
Meetup: http://www.meetup.com/Big-Data-Developers-in-Bangalore/events/225159947/
Video: https://www.youtube.com/watch?v=MkeRWyF1y_0
Github: https://github.com/SatyaNarayan1/spark_meetup
2) Introduction to Spark DataFrames/SQL and Deep dive
Slides: http://www.slideshare.net/sachinparmarss/deep-dive-spark-data-frames-sql-and-catalyst-optimizer
Meetup: http://www.meetup.com/Big-Data-Developers-in-Bangalore/events/226419828/
Video: https://www.youtube.com/watch?v=h71MNWRv99M
Github: https://github.com/parmarsachin/spark-dataframe-demo
3) Apache Spark - Introduction to Spark Streaming and Deep dive
Slides: http://www.slideshare.net/differentsachin/apache-spark-introduction-to-spark-streaming-and-deep-dive-57671774
Meetup: http://www.meetup.com/Big-Data-Developers-in-Bangalore/events/227008581/
Video:
Github: https://github.com/agsachin/spark-meetup
Looking forward to a great interactive session. Please do provide feedback.
Video: https://youtu.be/T0L0JxDaPkc
RSVP Here: https://www.eventbrite.com/e/full-day-workshop-kubeflow-kerastensorflow-20-tf-extended-tfx-kubernetes-pytorch-xgboost-airflow-tickets-63362929227
Description
In this workshop, we build real-world machine learning pipelines using TensorFlow Extended (TFX), KubeFlow, Airflow, and MLflow.
First described in a 2017 paper, TFX is used internally by thousands of Google data scientists and engineers across every major product line within Google.
KubeFlow is a modern, end-to-end pipeline orchestration framework that embraces the latest AI best practices including hyper-parameter tuning, distributed model training, and model tracking.
Airflow is the most-widely used pipeline orchestration framework in machine learning and data engineering.
MLflow is a lightweight experiment-tracking system recently open-sourced by Databricks, the creators of Apache Spark. MLflow supports Python, Java/Scala, and R - and offers native support for TensorFlow, Keras, and Scikit-Learn.
Pre-requisites
Modern browser - and that's it!
Every attendee will receive a cloud instance
Nothing will be installed on your local laptop
Everything can be downloaded at the end of the workshop
Location
Online Workshop
The link will be sent a few hours before the start of the workshop.
Only registered users will receive the link.
If you do not receive the link a few hours before the start of the workshop, please send your Eventbrite registration confirmation to support@pipeline.ai for help.
Agenda
1. Create a Kubernetes cluster
2. Install KubeFlow, Airflow, TFX, and Jupyter
3. Setup ML Training Pipelines with KubeFlow and Airflow
4. Transform Data with TFX Transform
5. Validate Training Data with TFX Data Validation
6. Train Models with Jupyter, Keras/TensorFlow 2.0, PyTorch, XGBoost, and KubeFlow
7. Run a Notebook Directly on Kubernetes Cluster with KubeFlow
8. Analyze Models using TFX Model Analysis and Jupyter
9. Perform Hyper-Parameter Tuning with KubeFlow
10. Select the Best Model using KubeFlow Experiment Tracking
11. Run Multiple Experiments with MLflow Experiment Tracking
12. Reproduce Model Training with TFX Metadata Store
13. Deploy the Model to Production with TensorFlow Serving and Istio
14. Save and Download your Workspace
Key Takeaways
Attendees will gain experience training, analyzing, and serving real-world Keras/TensorFlow 2.0 models in production using model frameworks and open-source tools.
RSVP Here: https://www.eventbrite.com/e/full-day-workshop-kubeflow-kerastensorflow-20-tf-extended-tfx-kubernetes-pytorch-xgboost-airflow-tickets-63362929227
https://youtu.be/T0L0JxDaPkc
Big Data in Containers: Hadoop and Spark in Docker and Mesos, by Heiko Loewe
Three examples of containerized Big Data analytics:
1. Installation with Docker and Weave for small and medium deployments
2. Hadoop on Mesos with Apache Myriad
3. Spark on Mesos
Apache Deep Learning 101 - ApacheCon Montreal 2018 v0.31, by Timothy Spann
Apache Deep Learning 101 - ApacheCon Montreal 2018 v0.31
An overview for Big Data Engineers on how one could use Apache projects to run deep learning workflows with Apache NiFi, YARN, Spark, Kafka and many other Apache projects.
This isn’t a talk about microservices, NoSQL, container solutions or hip new frameworks. This talk will show some of the standard Java APIs that have been part of Java since version 5, 6, 7 or 8. All these features are very helpful to create maintainable and future-proof applications, regardless of whether JavaEE, Spring, JavaFX or any other framework is used. The talk will give an overview of some important standard concepts and APIs of Java like annotations, null values and concurrency. Based on an overview of these topics and some samples, the talk will answer questions like:
- How can I create my own annotations?
- How can I create a plugin structure without using frameworks like OSGI?
- What’s the best way to handle NullPointerExceptions?
- How can I write concurrent code that is still maintainable?
Lessons Learned Running Hadoop and Spark in Docker Containers, by BlueData, Inc.
Many initiatives for running applications inside containers have been scoped to run on a single host. Using Docker containers for large-scale production environments poses interesting challenges, especially when deploying distributed big data applications like Apache Hadoop and Apache Spark. This session at Strata + Hadoop World in New York City (September 2016) explores various solutions and tips to address the challenges encountered while deploying multi-node Hadoop and Spark production workloads using Docker containers.
Some of these challenges include container life-cycle management, smart scheduling for optimal resource utilization, network configuration and security, and performance. BlueData is "all in" on Docker containers, with a specific focus on big data applications. BlueData has learned firsthand how to address these challenges for Fortune 500 enterprises and government organizations that want to deploy big data workloads using Docker.
This session by Thomas Phelan, co-founder and chief architect at BlueData, discusses how to securely network Docker containers across multiple hosts and discusses ways to achieve high availability across distributed big data applications and hosts in your data center. Since we’re talking about very large volumes of data, performance is a key factor, so Thomas shares some of the storage options implemented at BlueData to achieve near bare-metal I/O performance for Hadoop and Spark using Docker as well as lessons learned and some tips and tricks on how to Dockerize your big data applications in a reliable, scalable, and high-performance environment.
http://conferences.oreilly.com/strata/hadoop-big-data-ny/public/schedule/detail/52042
DevOps on AWS: Accelerating Software Delivery with the AWS Developer Tools, by Amazon Web Services
Learn more about the processes followed by Amazon engineers and discuss how you can bring them to your company by using AWS CodePipeline and AWS CodeDeploy, services inspired by Amazon's internal developer tools and DevOps culture.
Docker & aPaaS: Enterprise Innovation and Trends for 2015, by WaveMaker, Inc.
WaveMaker Webinar: Cloud-based App Development and Docker: Trends to watch out for in 2015 - http://www.wavemaker.com/news/webinar-cloud-app-development-and-docker-trends/
CIOs, IT planners and developers at a growing number of organizations are taking advantage of the simplicity and productivity benefits of cloud application development. With Docker technology, cloud-based app development, or aPaaS (Application Platform as a Service), is only becoming more disruptive, forcing organizations to rethink how they handle innovation, time-to-market pressures, and IT workloads.
Three years ago, Meetic chose to rebuild its backend architecture using microservices and an event-driven strategy. As we were moving away from our old legacy application, testing features gradually became a pain, especially when those features rely on multiple changes across multiple components. Whatever the number of applications you manage, unit testing is easy, as is functional testing on a microservice: a good Gherkin framework and a set of Docker containers can do the job. The real challenge lies in end-to-end testing, even more so when a feature can involve up to 60 different components.
To solve that issue, Meetic is building a Kubernetes strategy around testing. To do so, we need to:
- Be able to generate a docker container for each pull-request on any component of the stack
- Be able to create a full testing environment in the simplest way
- Be able to launch automated test on this newly created environment
- Have a clean-up process to destroy testing environments after tests
To separate the various testing environments, we chose to use Kubernetes Namespaces, each containing a variant of the Meetic stack. But when it comes to Kubernetes, managing multiple namespaces can be hard. YAML configuration files need to be shared in a way that lets each person or automated job access and modify them without impacting others.
This is typically why Meetic chose to develop its own tool to manage namespaces through a CLI tool, or a REST API on which we can plug a friendly UI.
In this talk we will tell you the story of our CI/CD evolution to satisfy the need to create a Docker container for each new pull request. And we will show you how to make end-to-end testing easier using Blackbeard, the tool we developed, inspired by Helm, to manage namespaces.
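As a rough, hedged sketch of the namespace-per-environment idea described above (Blackbeard itself is not shown here; the namespace name and manifest directory are hypothetical), the flow amounts to something like:
kubectl create namespace pr-1234            # one namespace per pull request
kubectl apply -n pr-1234 -f k8s/            # deploy the stack variant for that pull request
# ...run the end-to-end test suite against the services in pr-1234...
kubectl delete namespace pr-1234            # clean-up after the tests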
This talk will try to cover the most important techniques and best practices used when creating a Django web application.
Overview of the topics covered:
- development general principles and goals
- python/django project initial setup - project layout, git&venv&pip&shell, settings
- central project shell command - contains all commands to manage project
- "IDE" - editor & shell
- edit/run/test cycle
- deploy/test-remotely cycle
Disclaimer: the techniques and practices presented are the author's current optimal choices for a typical Django project.
AWS re:Invent 2016: DevOps on AWS: Accelerating Software Delivery with the AW..., by Amazon Web Services
Today’s cutting edge companies have software release cycles measured in days instead of months. This agility is enabled by the DevOps practice of continuous delivery, which automates building, testing, and deploying all code changes. This automation helps you catch bugs sooner and accelerates developer productivity. In this session, we’ll share the processes followed by Amazon engineers and discuss how you can bring them to your company by using AWS CodePipeline and AWS CodeDeploy, services inspired by Amazon's internal developer tools and DevOps culture.
Epistemic Interaction - tuning interfaces to provide information for AI support, by Alan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Removing Uninteresting Bytes in Software Fuzzing, by Aftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing xml documents, and Binutil's readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format). Our preliminary results show that AFL+DIAR does not only discover new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides from the talk given at the IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW), 2022.
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices want to take full advantage of the features available on those devices, but many of those features provide convenience and capability at the expense of security. This best practices guide outlines steps users can take to better protect personal devices and information.
A tale of scale & speed: How the US Navy is enabling software delivery from l..., by sonjaschweigert1
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATOs (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
UiPath Test Automation using UiPath Test Suite series, part 6, by DianaGray10
Welcome to part 6 of the UiPath Test Automation using UiPath Test Suite series. In this session, we will cover test automation with generative AI and OpenAI.
The UiPath Test Automation with generative AI and OpenAI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, a test automation solution, with OpenAI's advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers, and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and OpenAI
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
The Art of the Pitch: WordPress Relationships and Sales, by Laura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ..., by James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. A constant focus on speed to release software to market, combined with traditionally slow and manual security checks, has caused gaps in continuous security, an important piece of the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their application supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Pushing the limits of ePRTC: 100ns holdover for 100 days, by Adtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs, by Alex Pruden
This paper presents Reef, a system for generating publicly verifiable succinct non-interactive zero-knowledge proofs that a committed document matches or does not match a regular expression. We describe applications such as proving the strength of passwords, the provenance of email despite redactions, the validity of oblivious DNS queries, and the existence of mutations in DNA. Reef supports the Perl Compatible Regular Expression syntax, including wildcards, alternation, ranges, capture groups, Kleene star, negations, and lookarounds. Reef introduces a new type of automata, Skipping Alternating Finite Automata (SAFA), that skips irrelevant parts of a document when producing proofs without undermining soundness, and instantiates SAFA with a lookup argument. Our experimental evaluation confirms that Reef can generate proofs for documents with 32M characters; the proofs are small and cheap to verify (under a second).
Paper: https://eprint.iacr.org/2023/1886
Enhancing adoption of Open Source Libraries: A case study on Albumentations.AI, by Vladimir Iglovikov, Ph.D.
Presented by Vladimir Iglovikov:
- https://www.linkedin.com/in/iglovikov/
- https://x.com/viglovikov
- https://www.instagram.com/ternaus/
This presentation delves into the journey of Albumentations.ai, a highly successful open-source library for data augmentation.
Created out of a necessity for superior performance in Kaggle competitions, Albumentations has grown to become a widely used tool among data scientists and machine learning practitioners.
This case study covers various aspects, including:
People: The contributors and community that have supported Albumentations.
Metrics: The success indicators such as downloads, daily active users, GitHub stars, and financial contributions.
Challenges: The hurdles in monetizing open-source projects and measuring user engagement.
Development Practices: Best practices for creating, maintaining, and scaling open-source libraries, including code hygiene, CI/CD, and fast iteration.
Community Building: Strategies for making adoption easy, iterating quickly, and fostering a vibrant, engaged community.
Marketing: Both online and offline marketing tactics, focusing on real, impactful interactions and collaborations.
Mental Health: Maintaining balance and not feeling pressured by user demands.
Key insights include the importance of automation, making the adoption process seamless, and leveraging offline interactions for marketing. The presentation also emphasizes the need for continuous small improvements and building a friendly, inclusive community that contributes to the project's growth.
Vladimir Iglovikov brings his extensive experience as a Kaggle Grandmaster, ex-Staff ML Engineer at Lyft, sharing valuable lessons and practical advice for anyone looking to enhance the adoption of their open-source projects.
Explore more about Albumentations and join the community at:
GitHub: https://github.com/albumentations-team/albumentations
Website: https://albumentations.ai/
LinkedIn: https://www.linkedin.com/company/100504475
Twitter: https://x.com/albumentations
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024, by Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
2. Hello
Background (the boring stuff)
Developer needs as we see it
Our solution (think of it as a case-study)
Q&A or whatever
3. We provide systems architecture and engineering consulting to enterprise, small-business development teams and start-ups. Docker is transforming the way we think about architecture and deployment. This is how we’ve been helping clients transition toward a robust container environment…
6. Managing and Deploying Containers
Solid Advice and Emerging Best Practices
Technologies / Techniques / Case-Studies
Best Practices for Application Developers
“One Process Per Container”
Advice new adopters discover online…
7. Best Practices for Application Developers
“One Process Per Container”
Futile attempts to split complex existing applications into separate container processes.
Redesign of existing systems using a micro-services architecture.
Attempts to do self-scripted container startup without a DevOps skill-set.
Attempts to create a mini-VM environment with Supervisor, S6, runit, or even systemd.
and rarely…
8. “Your architecture is really outdated. Nobody builds new systems like this anymore, and considering some of your code is 5 to 10 years old, we suggest you rebuild everything from the ground up.”
So, this message hasn’t been getting through very well…
9. “Containers can save you time and money… You can have greater application consistency and stability for existing development while paving the way for a transition to better architectures.”
This works better…
11. Good Developers…
• Benefit from uninterrupted focus for their most productive activities.
• Usually spend years mastering the language and tools they use to achieve maximum productivity.
• Resist change because they know that goals will not be met if they change toolsets without ample consideration, or at the wrong time.
13. Coders: defining, managing, scheduling, deploying, requirements changes, management changes, technology changes, goal changes.
The most successful strategies will relieve pressure rather than add new skill sets to the requirements.
14. Making things easier means…
• A documented, well-managed non-root development environment. Application code and related services should never run as root.
• An environment where system services are properly configured.
• Constraints to assure that application practices which violate production requirements trigger errors.
15. How could we best assist people in making the transition?
We built a technology solution.
16. Goals
• Create a developer-friendly environment for people who spend 99% of their time coding applications.
• Assure developers can configure and develop their applications without having to modify or understand container internals.
• “Scale down” necessary services like logging, cron, error recovery, and process management to assure all supporting services present a properly-configured container environment.
• Create a consistent runtime model so that DevOps teams can rely upon consistent requirements when developing, assembling, testing and deploying applications using tools like compose, etc.
18. Chaperone
• A single PID 1 process that provides…
1. dependency-based startup, cron scheduling, script execution, the systemd notify protocol, orderly shutdown, zombie harvesting, and…
2. syslog emulation, /dev/log capture and redirection
3. uid/gid mapping for attached storage
4. rich, full-featured service, logging and environment configuration
• A general-purpose tool. Simple YAML configuration in a single file, or as complex as desired.
• Open-source, well-documented.
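Purely as a hedged illustration of that single-file YAML configuration: the chaperone.d directory appears in the directory listing later in these slides, but the file name and key names below are assumptions, not the documented Chaperone schema (see http://garywiz.github.io/chaperone for the real reference).
# hypothetical file name and keys; consult the Chaperone docs for the real schema
cat > chaperone.d/myapp.yaml <<'EOF'
myapp.service:
  command: "/apps/bin/start-myapp"   # hypothetical start script for your application
EOF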
19. chaperone-baseimage family
(at https://registry.hub.docker.com/repos/chapdev/)
• Collection of images which use Chaperone to establish a robust development and deployment model.
• All images support three “personalities”:
• closed: applications and data reside inside the container
• attached-data: applications and infrastructure reside inside the container; data is external
• development: infrastructure is inside the container; data and applications are external (usually in the developer’s home directory).
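As a rough, hedged illustration of the first two personalities (the /apps and /apps/var paths come from the model summary later in this deck; the host path and the plain volume mount are assumptions, not options shown on these slides):
# closed: applications and data live entirely inside the container
docker run -d chapdev/chaperone-lemp
# attached-data: same image, but /apps/var is supplied from the host (host path hypothetical)
docker run -d -v /srv/mysite-data:/apps/var chapdev/chaperone-lemp
# development: handled by the chaplocal helper demonstrated on the following slides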
29. Development Model
Only infrastructure resides in the container.
Step 1: Extract the ‘chaplocal’ utility from the desired container:
docker run -i --rm chapdev/chaperone-lemp --task get-chaplocal | sh
30. Development Model
Only infrastructure resides in the container.
Step 1: Extract the ‘chaplocal’ utility from the desired container:
$ docker run -i --rm chapdev/chaperone-lemp --task get-chaplocal | sh
The 'chaplocal' script is ready to use. Here is the help you get if you type
./chaplocal
at the command line...
Usage: chaplocal [-d] local-apps-dir [image-name]
Runs the specified chaperone image and uses local-apps-dir for the apps
directory. Creates a script in local-apps-dir called run.sh so you can
run an interactive (default) or daemon instance.
Will run all container processes under the current user account with the
local drive mounted as a shared volume in the container.
If not specified, the image 'chapdev/chaperone-lemp' will be used.
$
32. Development Model
Only infrastructure resides in the container.
Step 2: Create and start a new development directory:
$ ./chaplocal myappdir
Extracting /apps default directory into /home/garyw/meetup/myappdir ...
You can customize the contents of /home/garyw/meetup/myappdir to tailor it for your application,
then use it as a template for your production image.
Executing run.sh within /home/garyw/meetup/myappdir ...
Port 8080 available at docker1:8080 ...
Port 8443 available at docker1:8443 ...
Jul 19 14:06:55 c8056b4d6b73 chaperone[1]: system will be killed when '/bin/bash' exits
Now running inside container. Directory is: /home/garyw/meetup/myappdir
The default 'nginx' site is running at http://docker1:8080/
garyw@c8056b4d6b73:~/meetup/myappdir$
Processes run as: user created via --create-user; In directory: mounted /home/garyw/apps; With data here: mounted /home/garyw/apps/var
33. Development Model
Only infrastructure resides in the container.
apps directory contents in the developer’s home directory:
Processes run as: user created via --create-user; In directory: mounted /home/garyw/apps; With data here: mounted /home/garyw/apps/var
garyw@c8056b4d6b73:~/meetup/myappdir$ ls -l
total 44
-rw-r--r-- 1 garyw garyw 328 Jul 19 14:06 bash.bashrc
drwxr-sr-x 2 garyw garyw 4096 Jul 19 13:24 bin
drwxr-sr-x 2 garyw garyw 4096 Jul 19 14:06 build
-rwxr-xr-x 1 garyw garyw 589 Jul 19 14:06 build.sh
drwxr-sr-x 2 garyw garyw 4096 Jul 19 13:24 chaperone.d
drwxr-sr-x 4 garyw garyw 4096 Jul 19 13:24 etc
-rw-r--r-- 1 garyw garyw 1016 Jun 10 03:53 README
-rwxr-xr-x 1 garyw garyw 1775 Jul 19 14:06 run.sh
drwxr-sr-x 2 garyw garyw 4096 Jul 19 13:24 startup.d
drwxr-sr-x 7 garyw garyw 4096 Jul 19 14:06 var
drwxr-sr-x 4 garyw garyw 4096 Jun 28 04:00 www
garyw@c8056b4d6b73:~/meetup/myappdir$ exit
34. Summary of container models supported by chaperone-baseimage and any derivatives
closed: processes run as "runapps"; directory: /apps; data: /apps/var
attached-data: processes run as an externally-specified UID/GID; directory: /apps; data: /apps/var (attached)
development: processes run as an externally-specified UID/GID; directory: /home/xxx/apps (attached); data: /home/xxx/apps/var (attached)
35. The result…
• Developers have a single, consistent development model where…
• They control, configure, and add all services and applications they need under their own user account in their own development directory, and…
• Resulting images can be run using all three models: closed, attached-data, and for additional development.
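Taken together, the development-model workflow reduces to the two commands demonstrated on the earlier slides:
docker run -i --rm chapdev/chaperone-lemp --task get-chaplocal | sh   # step 1: extract the chaplocal helper
./chaplocal myappdir                                                   # step 2: create and start a development directory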
40. Warning!
• In use in production, but just released this month as open-source. Though well-tested and documented, it is still a work in progress.
• Chaperone itself is platform-neutral, but tools for creating the development environment may need minor tweaking for Kitematic or boot2docker systems. The recommended environment is a Linux host.
• Images have been tested under CentOS but there is no CentOS base image yet (coming soon).
41. Q&A ++
me: http://garywiz.com
chaperone: https://github.com/garywiz/chaperone
documentation: http://garywiz.github.io/chaperone
chaperone-baseimage and friends:
https://github.com/garywiz/chaperone-docker
on Docker Hub:
https://registry.hub.docker.com/repos/chapdev/
Editor's Notes
End with: “How can we overcome this? First, let’s consider developers themselves…”
These are often at odds with the reality of organisations and businesses…
End with: HOW?
End with: “SO WE’LL QUICKLY SWITCH TO BEING TECHNICAL…”