Jupyter Kernel: How to Speak in Another Language (Wey-Han Liaw)
The document discusses how to create a Jupyter kernel. It explains that kernels use ZeroMQ sockets to communicate with clients via the Jupyter messaging protocol. Native kernels implement this protocol from scratch in the target language, while wrapper kernels are written in Python around an existing interpreter. The document provides examples of existing kernels such as IJulia and the Python ipykernel, outlines the steps to build a wrapper kernel, and mentions several other kernel types.
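The messaging protocol the abstract refers to exchanges JSON message parts (header, parent header, metadata, content) over ZeroMQ and authenticates them with an HMAC-SHA256 signature. A stdlib-only sketch of how a client might build and sign an `execute_request` (the session key and field values here are placeholders, not taken from a real connection file):

```python
# Sketch of signing a Jupyter execute_request message.
# The wire protocol HMAC-signs the JSON-serialized header,
# parent_header, metadata, and content frames, in that order.
import hashlib, hmac, json, uuid

def sign(key: bytes, *frames: bytes) -> str:
    """Hex HMAC-SHA256 signature over the serialized message frames."""
    mac = hmac.new(key, digestmod=hashlib.sha256)
    for frame in frames:
        mac.update(frame)
    return mac.hexdigest()

key = b"hypothetical-session-key"   # placeholder; real keys come from
                                    # the kernel's connection file
header = {
    "msg_id": str(uuid.uuid4()),
    "session": str(uuid.uuid4()),
    "username": "demo",
    "msg_type": "execute_request",
    "version": "5.3",               # messaging-protocol version
}
parent_header, metadata = {}, {}
content = {"code": "print('hello')", "silent": False}

frames = [json.dumps(part).encode()
          for part in (header, parent_header, metadata, content)]
signature = sign(key, *frames)
print(signature)  # sent as the first frame after the <IDS|MSG> delimiter
```

A kernel receiving this message recomputes the same HMAC over the four frames and rejects the request if the signatures differ.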
This document discusses Jupyter, an open-source tool for interactive data science and scientific computing. Jupyter allows for interactive exploration, development, and communication through code, equations, visualizations and narrative text. It supports over 50 programming languages and has found widespread adoption in academia and industry for individual and collaborative work across the entire workflow of a scientific idea from data collection to publication. The document outlines Jupyter's history and architecture, ecosystem of related projects, and future development plans to enhance collaboration and software engineering capabilities.
This document discusses using Jupyter notebooks, Pandas, and Spark for analytics pipelines on both small and large datasets. It summarizes the challenges of working with different data volumes and timeframes. For small mobile transaction data, notebooks with Pandas and R are used, while larger retail data is analyzed with Spark ML and scikit-learn in notebooks running in Docker containers. Future work includes applying Spark to additional domains and building forecasting and streaming capabilities.
Computable content: Notebooks, containers, and data-centric organizational le... (Domino Data Lab)
by Paco Nathan
Director, Learning Group at O’Reilly Media
This talk will present:
* the system architecture based on Jupyter as middleware, plus Thebe, Docker, Mesos, Nginx, etc.
* data analytics and project experiences based on delivering _computable content_ at scale
* supporting theory for this pedagogical approach, including Knuth’s _Literate Programming_
* media production techniques that use the video as _subtext_
We will also consider the use of notebooks (Jupyter and others) in an organizational context: how do notebooks help teams share and learn? what impact might notebooks have on developer collaboration that is currently focused on IDEs? The resulting medium provides highly effective tooling for a data-centric organization.
Data analytics in the cloud with Jupyter notebooks (Graham Dumpleton)
Jupyter Notebooks provide an interactive computational environment in which you can combine Python code, rich text, mathematics, plots and rich media. They offer a convenient way for data analysts to explore, capture and share their research.
Numerous options exist for working with Jupyter Notebooks, including running a Jupyter Notebook instance locally or by using a Jupyter Notebook hosting service.
This talk will provide a quick tour of some of the more well known options available for running Jupyter Notebooks. It will then look at custom options for hosting Jupyter Notebooks yourself using public or private cloud infrastructure.
An in-depth look at how you can run Jupyter Notebooks in OpenShift will be presented. This will cover how you can directly deploy a Jupyter Notebook server image, as well as how you can use Source-to-Image (S2I) to create a custom application for your requirements by combining an existing Jupyter Notebook server image with your own notebooks, additional code and research data.
Specific use cases to be explored include individual use, team use within an organisation, and classroom environments for teaching. Other issues covered include importing notebooks and data into an environment, and storing data using persistent volumes and other forms of centralised storage.
As an example of the possibilities of using Jupyter Notebooks with a cloud, it will be shown how you can easily use OpenShift to set up a distributed parallel computing cluster using ‘ipyparallel’ and use it in conjunction with a Jupyter Notebook.
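Running ipyparallel itself requires a cluster of engines, but the programming model it exposes is essentially a parallel map over engines. As a stdlib-only analogy (a thread pool standing in for remote engines, with a placeholder workload):

```python
# Stdlib analogy for ipyparallel's parallel map: ipyparallel
# distributes map calls across cluster engines; here a local
# thread pool plays the role of the engines.
from concurrent.futures import ThreadPoolExecutor

def simulate(x: int) -> int:
    # placeholder workload; with ipyparallel this function would
    # execute on a remote engine rather than a local thread
    return x * x

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(simulate, range(8)))
print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```

With ipyparallel the same shape of code would go through a client connected to the cluster, so the notebook stays interactive while the work fans out across engines.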
Version control, like Git, allows developers to track changes made to code and assets over time by saving revisions to a remote server or repository. This allows for easy collaboration, reverting mistakes, taking risks with new features, and preventing work from being lost due to hardware failures. The document recommends using a distributed version control system like Git for game development projects and outlines best practices for setting it up with Unity.
OSDC 2016 - Continuous Integration in Data Centers - Further 3 Years Later by ... (NETWAYS)
I gave a talk titled "Continuous Integration in Data Centers" at OSDC in 2013, presenting ways to realize continuous integration/delivery with Jenkins and related tools. Three years later we have gained new tools in our continuous delivery pipeline, including Docker, Gerrit and Goss. Over the years we also had to deal with different problems caused by faster release cycles, a growing team and new projects. We therefore established code review in our pipeline, improved our test infrastructure and invested in our infrastructure automation. In this talk I will discuss the lessons we learned over the last years, demonstrate how a proper continuous delivery pipeline can improve your life, and show how open source tools like Jenkins, Docker and Gerrit can be leveraged to set up such an environment.
PLOTCON NYC: The Architecture of Jupyter: Protocols for Interactive Data Expl... (Plotly)
Project Jupyter, evolved from the IPython environment, provides a platform for interactive computing that is widely used today in research, education, journalism and industry. The core premise of the Jupyter architecture is to design tools around the experience of interactive computing, building an environment, protocol, file format and libraries optimized for the computational process when there is a human in the loop, in a live iteration with ideas and data assisted by the computer.
In this talk, I will discuss what are the basic ideas that underpin Jupyter, and how they provide "lego blocks" that enable the project team, and the broader community, to develop a variety of tools and approaches to problems in interactive computing, data science, visualization and more.
1. This document discusses how to create an instant website using Python, Sphinx, and GitHub Pages by automating documentation through continuous integration and deployment workflows.
2. Key steps include setting up a Python virtual environment, installing Sphinx, configuring Sphinx deployment, building documentation locally, setting up GitHub Pages in a GitHub repository, and pushing changes to deploy updates automatically.
3. Automating documentation through these techniques provides benefits like keeping documentation close to code changes, tracking documentation issues like code, enabling iterative improvements, and allowing many contributors.
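The Sphinx configuration step above boils down to a small `conf.py`. A minimal illustrative version (project name, author, and extension list are placeholders, not taken from the original talk):

```python
# conf.py -- minimal Sphinx configuration (illustrative values only)
project = "my-instant-website"   # placeholder project name
author = "Your Name"             # placeholder author
extensions = [
    "sphinx.ext.autodoc",        # pull documentation from docstrings
    "sphinx.ext.viewcode",       # link rendered docs to source code
]
html_theme = "alabaster"         # Sphinx's default theme
```

With a file like this in place, `sphinx-build` renders the HTML locally, and a CI job can push the output to the branch GitHub Pages serves from.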
What is version control software and why do you need it? (Leonid Mamchenkov)
Version control software (VCS) manages changes to files such as documents, images, and code. It allows users to undo changes, try ideas, collaborate, and troubleshoot. VCS originated from engineering blueprints and software development in the early UNIX days. It works by storing revisions in a repository with branches and tags. Git is the most commonly used VCS as it is free, distributed, fast, and the standard for open source projects. Users can get started by installing Git, configuring user information, initializing repositories for projects, and committing file changes with descriptive messages.
IPython is an interactive Python shell that provides tools for interactive and parallel computing, widely used in the scientific world. It can also benefit any other Python developer.
This document provides an overview of GitHub and its technical architecture presented by Chris Wanstrath. Some key points:
- GitHub started as a git hosting site but became a social coding platform where users can see friends' activity and leave comments.
- It uses Ruby on Rails for the main codebase, Resque for background jobs, MySQL for the database, and nginx, unicorn, and memcached.
- Git operations are handled by Grit and communicated to file servers via the BERT-RPC based Smoke protocol.
- Caching, asset optimization, and AJAX loading are used extensively to improve performance. Monitoring tools include Nagios, Resque Web, Haystack, and CollectD.
C# - Raise the bar with functional & immutable constructs (Dutch) (Rick Beerendonk)
OO and C# no longer hold any secrets for you. And yet, every now and then a lot of code is needed to do simple things. Welcome to the world of functional programming. Or do you struggle with state and variables that change right under your nose? Immutable collections offer a way out. This session dives deep into pure functions, persistent collections, memoization, interactive extensions and other techniques that the experienced C# developer should have in their toolbox.
"Git Tutorial" a hands-on session on Git presented at Theoretical Neuroscience Lab, IISER Pune.
Very brief overview of Git commands.
Github: https://github.com/pranavcode/git-tutorial
Open Source Tools for Leveling Up Operations, FOSSET 2014 (Mandi Walls)
This document discusses using open source tools to improve operations workflows and processes. It introduces various tools including Git for version control, packaging tools like FPM, and testing tools like Nagios plugins. The document advocates applying principles from development like testing, version control, and automation to make operations processes more reliable, transparent and reduce risk.
IPython: A Modern Vision of Interactive Computing (PyData SV 2013) (PyData)
Fernando Perez gave a presentation on IPython and open source academia. He discussed (1) how IPython provides an interactive computing environment and notebook format to improve the scientific process, (2) the growth of IPython from a small project to a large open source ecosystem, and (3) challenges of open source work in an academic setting where rewards differ. He outlined a vision of building on abstractions like kernels, unified interactive and parallel computing, and growing the community.
Re-thinking Performance Tuning with HTTP/2 (Vinci Rufus)
The document discusses how best practices for performance tuning with HTTP/1.1 may need to be re-thought with the introduction of HTTP/2. It provides an overview of how HTTP/2 addresses limitations of HTTP/1.1 like head-of-line blocking through features like multiplexing, binary framing, header compression and server push. It recommends approaches like keeping HTTP requests low and caching resources while avoiding past practices like excessive domain sharding or image sprites that are no longer needed with HTTP/2.
Git is a distributed version control system that allows developers to work collaboratively on projects. It works by creating snapshots of files in a project over time. Developers can commit changes locally and then push them to a remote repository to share with others. Key Git concepts include repositories, commits, branches, cloning repositories from remote locations, and commands like push, pull, commit, log and diff to manage changes.
This document provides an overview of Git and GitHub for code versioning and sharing. It discusses key Git concepts like branches, commits, and merges. It also demonstrates how to perform basic Git commands from the command line interface. GitHub is presented as a tool for easy collaboration on Git projects through features like forking and pull requests. Overall the document serves as an introduction to using Git and GitHub for researchers and code sharing.
This document provides an overview of version control using git and GitHub. It explains that git is a distributed version control system that allows users to track changes to files and collaborate on projects. GitHub is a web-based hosting service for git repositories that provides additional features like a user interface, documentation, and pull requests. The document outlines how to install git, create a GitHub account, and covers key git concepts like commits, repositories, cloning, pulling, and pushing changes.
Jupyter Notebooks allow users to write and run code interactively in the browser by combining code and rich text in a single document. They can be run locally (by default at localhost:8888) after installing either Anaconda, a Python distribution containing popular scientific libraries, or Jupyter itself; the server is launched by typing $ jupyter notebook in a terminal. Jupyter Notebooks provide code, text, and some terminal functionality in an interactive browser-based environment for data science and scientific computing.
This document provides an introduction to Git, a distributed version control system. It discusses what Git is, its history and general features, how and where it can be used. It then provides a quick overview of installing Git, basic usage through a demo, why Git is advantageous compared to other version control systems like SVN, and some everyday Git commands and tools. Resources for learning more about Git are also listed.
The document provides an overview of version control systems and introduces Git and GitHub. It discusses the differences between centralized and distributed version control. It then covers the basics of using Git locally including initialization, staging files, committing changes, branching and merging. Finally, it demonstrates some common remote operations with GitHub such as pushing, pulling and tagging releases.
Package Management on Windows with Chocolatey (Puppet)
This document discusses using Puppet and Chocolatey for package management on Windows systems. It provides an overview of how Puppet works, why Chocolatey is useful as a package manager for Windows, how to use the Chocolatey Puppet provider to manage packages, how to create Chocolatey packages, host your own Chocolatey package server, and resources for learning more about Puppet and Windows management. It also includes an agenda for the content covered and a question and answer section.
Visual Analytics in Omics - why, what, how? (Jan Aerts)
This document discusses visual analytics in omics data. It begins by noting the shift from hypothesis-driven to data-driven research due to large datasets. Visual analytics can help explore these data by opening the "black box" of algorithms and enabling researchers to develop hypotheses. Effective visualization leverages human perception through techniques like preattentive vision and Gestalt laws. Challenges to visual analytics include scalability issues for large datasets and identifying interesting patterns for further analysis. Examples demonstrate data exploration, filtering, and user-guided analysis in genomic applications.
This document discusses the shift from hypothesis-driven to data-driven scientific research paradigms and the role of visualization in facilitating human reasoning about complex data. It describes visualization as a framework involving interaction, visual representations, and analytics to support biological data exploration and hypothesis generation. Examples are provided of visualization tools that enable interactive analysis, algorithm development by making black boxes transparent, and user-guided analysis through continuous refinement. Challenges in scalability, uncertainty, evaluation and infrastructure are also discussed.
Visual Analytics in Omics: why, what, how? (Jan Aerts)
Visual Analytics in omics can help address several challenges in analyzing complex biological data:
- It allows researchers to explore large datasets in an interactive way to generate hypotheses, as the initial analysis is often exploratory rather than driven by a specific hypothesis.
- It opens the "black box" of automated analysis by making the analysis process transparent and understandable to domain experts.
- Effective visualization techniques leverage human visual perception and cognition to facilitate reasoning about the data.
This document discusses visualizing genomic variation from DNA sequencing data. It begins by defining genomic variation such as single nucleotide polymorphisms and structural variations. It then discusses analyzing multiple samples, showing affected genes and clustering individuals. The document outlines challenges in visualizing high-dimensional genomic data from deep sequencing at scale, while maintaining computational performance for interactivity. It proposes representing rearranged chromosomes based on segment relationships to focus on functional impacts.
Visualizing the Structural Variome (VMLS-Eurovis 2013) (Jan Aerts)
This document discusses visualizing structural variation in genomes. It begins by defining structural variation and copy number variation. It then discusses why structural variation is important, listing examples of traits influenced by copy number differences. The document outlines challenges in visualizing structural variation data from techniques like array CGH and sequencing. It proposes dual approaches - focusing on functional impact and representing rearranged chromosomes based on segment relationships. Future directions discussed include single-cell analysis and cross-omic data integration.
This document discusses tools for improving reproducibility in research, including hosting data in GigaDB, sharing images using OMERO, implementing workflows using Galaxy and executable documents, and sharing virtual machines. It emphasizes the need for publishers to host and curate research objects like data, code, and workflows and provide citations for reproducible research. Key tools highlighted are GigaDB for data hosting, OMERO for image hosting, Galaxy for implementing workflows, and virtual machines for sharing full computational environments.
Youtube Link: https://youtu.be/ou65T_mC8Z8
This Edureka PPT on 'Python Spyder IDE' walks you through using the Python Spyder IDE, including its installation and customization.
A Jupyter kernel for Scala and Apache Spark.pdf (Luciano Resende)
Many data scientists already make heavy use of the Jupyter ecosystem for analyzing data with interactive notebooks. Apache Toree (incubating) is a Jupyter kernel that enables data scientists and data engineers to easily connect to Apache Spark and leverage its powerful APIs from a standard Jupyter notebook to execute their analytics workloads. In this talk, we will go over what's new in the most recent Apache Toree release. We will cover the available magics and visualization extensions that can be integrated with Toree to enable better data exploration and data visualization. We will also describe some of Toree's high-level design and how users can extend its functionality through Toree's powerful plugin system. All of this comes with multiple live demos that show how Toree can help with your analytics workloads in an Apache Spark environment.
This document discusses Sonian's contributions to open source projects like Fog, Elasticsearch, OpenStack Swift, and Chef. It also describes Sensu, an open source monitoring framework developed by Sonian. Sensu is designed for dynamic cloud environments using a messaging architecture with RabbitMQ and Redis. It allows reusing existing Nagios plugins and is intended to work with configuration management tools like Chef and Puppet. The document advocates adopting an open source community approach around Sensu to help test, develop plugins/modules, and provide documentation.
This document provides an introduction to Python programming. It discusses that Python is an interpreted, object-oriented, high-level programming language with simple syntax. It then covers the need for programming languages, different types of languages, and translators like compilers, interpreters, assemblers, linkers, and loaders. The document concludes by discussing why Python is popular for web development, software development, data science, and more.
Getting Started With Jenkins And Drupal (Philip Norton)
Jenkins is a really powerful tool for automating things like code analysis, testing and even deployment. Getting started with Jenkins, especially with Drupal, can be quite difficult for a beginner. In this session I'll show you how to install Jenkins, how to configure things like authentication, and then how to do some interesting things with the tool. I'll show some real-life examples of what can be done on your Drupal sites, such as running cron jobs, syntax-checking the code, or even automatically copying code to your web servers.
Continuous Integration with Open Source Tools - PHPUgFfm 2014-11-20 (Michael Lihs)
Presentation about open source tools to set up continuous integration and continuous deployment. Covers Git, Gitlab, Chef, Vagrant, Jenkins, Gatling, Dashing, TYPO3 Surf and some other tools. Shows some best practices for testing with Behat and Functional Testing.
The Five Stages of Enterprise Jupyter Deployment (Frederick Reiss)
Meetup talk from May 30, 2018.
Jupyter notebooks are an important tool for data science. For a single user on a laptop, these notebooks are a simple, straightforward tool. But Jupyter in the enterprise is a much more complex affair. Enterprises have large teams of data scientists who need to run their notebooks atop scalable compute infrastructure with secure, audited access to massive, proprietary data sets; all while keeping hardware costs down.
Here at IBM’s Center for Open-Source Data and AI Technologies, we’ve seen multiple enterprise rollouts of Jupyter notebooks, both first-hand, in IBM products and services; and second-hand, in our discussions with other members of the Jupyter community.
In this talk, we merge the stories of these projects and walk through the process of deploying high-performance, secure, multitenant Jupyter notebooks in an enterprise setting. Our goal here is to inform others who may be at the beginning of this journey about what is coming and how to navigate the challenges ahead.
Along the way, we answer five important questions: What are Jupyter notebooks? What makes Jupyter so attractive to data scientists? Why is deploying Jupyter in the enterprise difficult? What are your deployment options today? And, what are the tradeoffs of those approaches?
We’ll finish with a description of how IBM and other members of the Jupyter community are working to reduce those tradeoffs with the Jupyter Enterprise Gateway project. Finally, we’ll give a demonstration of multitenant Jupyter notebooks in action.
This talk is aimed at enterprise architects who need to support growing data science teams with multi-user deployments of Jupyter. No knowledge of data science is required.
Docs as Part of the Product - Open Source Summit North America 2018 (Den Delimarsky)
The presentation showcased at the Open Source Summit North America 2018 in Vancouver, BC. It covers the learnings from transitioning the MSDN site functionality and content to docs.microsoft.com.
Resumable File Upload API using GridFS and TUS (khangtoh)
TUS is a resumable file upload protocol. Combined with MongoDB GridFS, we build a REST API for uploading files and show how to scale it horizontally, using MongoDB as the storage backend for these files.
Singapore MongoDB User Group March Meetup
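To make the resumable-upload flow concrete, here is a minimal in-memory sketch of TUS-style offset tracking (an illustration only: the class and method names are invented, and a real service would persist chunks in GridFS behind HEAD/PATCH endpoints):

```python
class ResumableUpload:
    """Toy model of a TUS-style resumable upload session."""

    def __init__(self, total_length: int):
        self.total_length = total_length  # declared via Upload-Length in TUS
        self.data = b""

    @property
    def offset(self) -> int:
        # In TUS the client learns this via a HEAD request (Upload-Offset).
        return len(self.data)

    def patch(self, offset: int, chunk: bytes) -> int:
        """Append a chunk; the client must resume at the server's offset."""
        if offset != self.offset:
            raise ValueError("offset mismatch: re-query the offset and resume")
        self.data += chunk
        return self.offset

    @property
    def complete(self) -> bool:
        return self.offset == self.total_length
```

If a connection drops mid-transfer, the client asks the server for the current offset and resumes from there instead of restarting the whole file.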
On the Edge Systems Administration with Golang (Chris McEniry)
This document describes a tutorial on systems administration topics using the Go programming language. It provides an overview of the schedule and topics to be covered, including Go language features like interfaces, files, web servers, TLS, HTTP/2, JSON, package management, one-liners, cross-compilation, metrics, containers, and SSH. It also lists prerequisites and sets expectations around the example code provided, noting that errors are simply panicked and that the code is for demonstration purposes only, not meant for production use. The document serves as an agenda and introduction to the tutorial content.
This document provides a case study on a project created using open source technology. It discusses analyzing project goals and resources, evaluating open source options based on total cost of ownership, implementing a solution using LAMP stack, and lessons learned. The project was developed using Linux, Apache, MySQL, and PHP based on the needs of a low budget, ability to invest in internal skills, and reduce dependency on external trends. Key steps included preparing the Linux server, using version control and local testing, and engaging the open source community for support.
SymfonyCon Madrid 2014 - Rock Solid Deployment of Symfony Apps (Pablo Godel)
Web applications are becoming increasingly more complex, so deployment is not just transferring files with FTP anymore. We will go over the different challenges and how to deploy our PHP applications effectively, safely and consistently with the latest tools and techniques. We will also look at tools that complement deployment with management, configuration and monitoring.
Everyone wants (someone else) to do it: writing documentation for open source... (Jody Garnett)
Many people will cite how their adoption of software was based on the quality of documentation, and yet documentation can be one of the largest gaps in quality with an open source project. This talk will discuss why that is, what you (yes you) can do about it, and how the author has so far managed to avoid burnout by learning to accept less-than-perfect grammar.
A FOSS4G 2015 Presentation
This document discusses reproducible research and provides guidance on key practices and tools to support reproducibility. It defines reproducibility as distributing all data, code, and tools required to reproduce published research results. Version control systems like Git allow researchers to track changes over time and collaborate more effectively. Tools like DMPTool can help researchers create data management plans and plan for long-term storage and sharing of research data and materials. R Markdown allows integrating human-readable text with executable code to produce reproducible reports and analyses.
Using NuGet the way you should
Consuming NuGet packages, that’s what everyone does. Open source projects create NuGet packages and post them on NuGet.org. Meanwhile, all of us are still working with shared projects and fighting relative paths, versioning and so on. In this talk, we’ll use Visual Studio, NuGet and TeamCity to work with NuGet the way you should. Project references must die! Add Package Reference and good continuous integration is everything you will ever need.
The document discusses the author's approach to setting up a development environment for Django projects. It describes establishing a project layout with separate folders for source code, virtual environments, requirements files, and more. It also covers tools and practices for tasks like dependency management, testing, debugging, deployment, and overall software development philosophy.
Reproducibility and automation of machine learning process (Denis Dus)
A talk about organizing the machine learning process in practice. Conceptual and technical aspects are discussed, along with an introduction to the Luigi framework and a short story about fitting neural networks in Flo, a top-level mobile tracker of women's health.
Reproducibility - The myths and truths of pipeline bioinformatics (Simon Cockell)
This document discusses the challenges of reproducibility in bioinformatics. It notes that for an analysis to be repeatable, the same data, code, and version information must be available. However, obtaining the exact same starting data can be difficult when data is large, hardware fails, or filtering steps are not documented. Pipelines help capture and automate analyses but are not a panacea, as quality control requires human judgment. The best approach may be to package and publish individual analyses with documentation of the full process.
Similar to A Kanterakis - PyPedia: a python crowdsourcing development environment for bioinformatics and computational biology (20)
The document discusses humanizing data analysis by putting the human back in the loop of data analysis processes. It notes that current data analysis involves filtering and other automated tasks that act as a "black box" for humans. The author argues that data analysis should involve generating hypotheses with the human perspective in mind through techniques like visual analytics and cognitive tasks to make the data analysis process more transparent and understandable for people.
This document provides an introduction to data visualization. It discusses what data visualization is, why it is used, and the stages involved in creating visualizations from data. Key points include:
- Data visualization involves using visual representations of data to help people analyze and communicate information more effectively.
- Visualizations are used for tasks like recording information, analyzing data to support reasoning, and communicating information.
- The process of creating visualizations involves understanding the properties of the data, properties of images and perception, and rules for mapping data to visual encodings.
- Important considerations include which visual variables to use to encode different data properties, principles of visual perception, and enabling interaction with the data. Validating the effectiveness of a visualization is also an important part of the process.
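The data-to-visual-encoding mapping described above can be illustrated with a one-line linear scale (a toy example, not taken from the document):

```python
def map_to_color(values):
    """Linearly map numeric data onto a visual variable, here a
    grayscale intensity from 0 (minimum) to 255 (maximum)."""
    lo, hi = min(values), max(values)
    return [round(255 * (v - lo) / (hi - lo)) for v in values]
```

Choosing the visual variable (position, length, color, size) to match the data type is exactly the kind of mapping rule the document refers to.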
L Fu - Dao: a novel programming language for bioinformatics (Jan Aerts)
The document introduces Dao, a new programming language for bioinformatics. It discusses Dao's key features like optional typing, native support for concurrent programming, an LLVM-based JIT compiler, simple C interfaces, and the ClangDao tool for wrapping C/C++ libraries. An example demonstrates using thread tasks and futures for concurrent programming. The document outlines future plans to develop BioDao, an open source project providing bioinformatics modules to the Dao language.
J Wang - bioKepler: a comprehensive bioinformatics scientific workflow module... (Jan Aerts)
Presentation at BOSC2012 by J Wang - bioKepler: a comprehensive bioinformatics scientific workflow module for distributed analysis of large-scale biological data
GMOD in the Cloud provides preinstalled GMOD tools like Tripal, Chado, GBrowse, and JBrowse on cloud.gmod.org. These tools allow users to visualize, annotate, and manage biological data in the cloud. Potential use cases include community annotation events where users can load data, configure tools, annotate, and then export annotations without installing software locally. Using the cloud avoids installation issues and saves money while providing access to sample genomic datasets.
B Temperton - The Bioinformatics Testing Consortium (Jan Aerts)
The Bioinformatics Testing Consortium aims to improve bioinformatics software by having software tested by others in addition to the developers. It will assign testers to review open source bioinformatics projects and ensure they meet minimum standards through running standard tests and verifying output matches test data. This benefits new users by providing more reliable software, developers by identifying bugs, testers by learning quality standards, and journal editors by ensuring published software is fit for purpose. The consortium seeks feedback, participation, test cases, and engagement on Twitter to achieve its goals.
J Goecks - The Galaxy Visual Analysis Framework (Jan Aerts)
The document describes Galaxy, an open-source web-based platform for visual analysis of genomic data. Galaxy provides tools for obtaining, integrating, analyzing, visualizing, sharing and publishing complete genomic analyses through a graphical user interface. It allows users to easily chain tools and create complex analysis workflows. The document highlights several Galaxy visualization tools, including Trackster for interactive exploration of large genomic datasets, Paramamonster for parameter space exploration, and Circster for circular genome-wide views. Future directions include expanding visualization capabilities to other data types and integrating multiple coordinated views.
B Chapman - Toolkit for variation comparison and analysis (Jan Aerts)
The document describes a toolkit for comparing variant calls from different variant callers and sequencing technologies. It proposes establishing a set of true variants by comparing calls across multiple callers and technologies on gold standard genomes. The toolkit includes a comparison architecture that analyzes variants, identifies real variants by summarizing metrics, and scales to large numbers of variants and samples. It also describes building analysis pipelines in Clojure and providing comparison results through a web interface with metrics. The goal is to help answer biological questions by determining true variants and prioritizing based on existing evidence.
P Rocca-Serra - The open source ISA metadata tracking framework: from data cu... (Jan Aerts)
Presentation at BOSC2012 by P Rocca-Serra - The open source ISA metadata tracking framework: from data curation and management at the source, to the linked data universe
J Klein - KUPKB: sharing, connecting and exposing kidney and urinary knowledg... (Jan Aerts)
The KUPKB integrates thousands of kidney and urinary pathway studies into an RDF knowledge base using ontologies to provide schema and annotation. The iKUP browser exposes the knowledge in a simple web interface, allowing biologists to more easily survey biological publications and generate hypotheses than traditional literature searches. The tools and APIs used make it possible to build such applications at relatively low cost.
A Kalderimis - InterMine: Embeddable datamining components (Jan Aerts)
InterMine is an integrated data warehouse with an optimizing query engine. It provides web services and embeddable widgets to make powerful data querying accessible to non-technical users. InterMine runs databases for various model organisms and is working to make machine-readable APIs and data displays universally accessible.
E Afgan - Zero to a bioinformatics analysis platform in four minutes (Jan Aerts)
This document discusses how to quickly set up a bioinformatics analysis platform in four minutes using various open source tools. It introduces CloudBioLinux for building custom tool suites, CloudMan for creating scalable processing platforms, Galaxy for exploratory analysis, and BioCloudCentral for getting started easily. A new Python library called Blend is also introduced for automating repetitive tasks related to analysis and infrastructure manipulation using the APIs of these tools.
B Kinoshita - Creating biology pipelines with BioUno (Jan Aerts)
BioUno is an open source project that uses continuous integration tools like Jenkins to create biology pipelines. It was created by Bruno Kinoshita in Brazil as a way to apply DevOps practices to biology. BioUno uses Jenkins for its jobs, notifications, and integration with other tools. The next steps are to enhance documentation, find new developers and users, and compare BioUno to other similar biology tools.
The document discusses updates to the Galaxy API and automatic parallelization capabilities. The RESTful Galaxy API now uses JSON and authentication keys instead of usernames/passwords. Tools can be configured for automatic parallelization to take advantage of available resources. The Tool Shed allows simple installation and updating of tools and workflows in a Galaxy instance.
The document discusses how integrative studies can provide insights through combining candidate genomic regions, mitochondrial proteomic data, and cancer expression compendiums to discover genes involved in diseases like Leigh Syndrome and cancers. It also highlights several other studies that have integrated data like DNA sequences, copy numbers, methylation, expression profiles, and pathways to characterize disease subtypes and improve risk stratification for conditions such as glioblastoma multiforme and medulloblastoma. The document presents an example of a translational research study that integrated multiple genomic data types and computational tools in 12 steps to analyze alterations in gene expression and identify potential transcription factor binding sites.
CT Brown - Doing next-gen sequencing analysis in the cloud (Jan Aerts)
This document summarizes work on digital normalization, a technique for reducing sequencing data size prior to assembly. Digital normalization works by discarding reads whose median k-mer abundance already meets a coverage cutoff, so regions that are sufficiently covered contribute no further data. It can remove over 95% of data in a single pass with fixed memory. This makes genome and metagenome assembly scalable to larger datasets using cloud computing resources. The work is done in an open science manner, with all code, data, and manuscripts openly accessible online.
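The core idea can be sketched in a few lines of Python (a toy illustration of the technique, not the authors' actual implementation; the values of k and the cutoff are illustrative):

```python
from statistics import median


def digital_normalization(reads, k=4, cutoff=3):
    """Keep a read only if its median k-mer abundance seen so far is
    below the coverage cutoff; otherwise its region is already covered."""
    counts = {}  # k-mer -> abundance observed so far
    kept = []
    for read in reads:
        kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
        if median(counts.get(km, 0) for km in kmers) < cutoff:
            kept.append(read)
            for km in kmers:
                counts[km] = counts.get(km, 0) + 1
    return kept
```

Ten identical reads collapse to three here, since after three copies every k-mer in the read already meets the cutoff; this is why the technique shrinks redundant datasets so dramatically.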
L Forer - Cloudgene: an execution platform for MapReduce programs in public a... (Jan Aerts)
Cloudgene is an open-source platform that provides a graphical web interface to simplify the execution of MapReduce programs for genomic data analysis in public and private clouds. It allows users to integrate different MapReduce programs through a plugin interface, import and export data from various sources, and connect programs together in a pipeline. Cloudgene handles setting up clusters in public clouds and installing programs and data, making it easier for scientists to perform MapReduce analysis without having to manage the underlying infrastructure.
Building RAG with self-deployed Milvus vector database and Snowpark Container... (Zilliz)
This talk will give hands-on advice on building RAG applications with an open-source Milvus database deployed as a docker container. We will also introduce the integration of Milvus with Snowpark Container Services.
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices want to take full advantage of the features available on those devices, but many of those features trade security for convenience and capability. This best practices guide outlines steps users can take to better protect personal devices and information.
A tale of scale & speed: How the US Navy is enabling software delivery from l... (sonjaschweigert1)
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATO’s (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
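As a sketch of the last point, an automated severity gate over a vulnerability report could look like the following (the field names and severity ladder are assumptions for illustration, not Anchore's actual schema):

```python
SEVERITIES = ["negligible", "low", "medium", "high", "critical"]


def policy_check(findings, max_severity="high"):
    """Gate a container image: fail the check if any finding is more
    severe than the allowed maximum."""
    limit = SEVERITIES.index(max_severity)
    violations = [f for f in findings
                  if SEVERITIES.index(f["severity"]) > limit]
    return {"pass": not violations, "violations": violations}
```

Running such a check in the pipeline turns the policy into machine-enforceable evidence that can be attached to an ATO package.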
Removing Uninteresting Bytes in Software Fuzzing (Aftab Hussain)
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behavior in your code. We introduce DIAR, a technique designed to speed up fuzzing campaigns by pinpointing and eliminating uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing XML documents, and Binutils' readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format) files. Our preliminary results show that AFL+DIAR not only discovers new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean, optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
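The general idea of pruning uninteresting seed bytes can be sketched as a greedy reduction loop (an illustration of the concept only; DIAR's actual analysis is not described in the abstract, and `fingerprint` stands in for whatever behavior signal the fuzzer observes, such as a coverage bitmap):

```python
def trim_seed(seed: bytes, fingerprint) -> bytes:
    """Greedily drop bytes that do not change the observed program
    behavior, leaving a leaner seed for mutation."""
    i = 0
    while i < len(seed):
        candidate = seed[:i] + seed[i + 1:]
        if fingerprint(candidate) == fingerprint(seed):
            seed = candidate  # byte i was uninteresting: remove it
        else:
            i += 1            # byte i affects behavior: keep it
    return seed
```

A fuzzer that starts from the trimmed seed spends its mutation budget only on bytes that actually influence the target's behavior.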
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf (Paige Cruz)
Monitoring and observability aren’t traditionally found in software curriculums, and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is part of our current company’s observability stack.
While the dev and ops silo continues to crumble, many organizations still relegate monitoring and observability to ops, infra, and SRE teams. This is a mistake: achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party, and will share foundational concepts to build on.
“An Outlook of the Ongoing and Future Relationship between Blockchain Technologies and Process-aware Information Systems.” Invited talk at the joint workshop on Blockchain for Information Systems (BC4IS) and Blockchain for Trusted Data Sharing (B4TDS), co-located with with the 36th International Conference on Advanced Information Systems Engineering (CAiSE), 3 June 2024, Limassol, Cyprus.
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack (shyamraj55)
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor... (Neo4j)
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
What do a Lego brick and the XZ backdoor have in common? (Speck&Tech)
ABSTRACT: At first glance, a Lego brick and the XZ backdoor might seem to have in common only that both are building blocks, or dependencies, of creative and software projects. In reality, a Lego brick and the XZ backdoor case share much more than that.
Join the presentation to dive into a story of interoperability, standards, and open formats, and then discuss the important role contributors play in a sustainable open source community.
BIO: Advocate for free software and for standard, open formats. She has been an active member of the Fedora and openSUSE projects and co-founded the LibreItalia Association, where she was involved in several LibreOffice-related events, migrations, and training efforts. She previously worked on LibreOffice migrations and training courses for various public administrations and private companies. Since January 2020 she has worked at SUSE as a Software Release Engineer for Uyuni and SUSE Manager; when not pursuing her passion for computers and for Geeko, she cultivates her curiosity about astronomy (hence her nickname, deneb_alpha).
Enhancing adoption of Open Source Libraries. A case study on Albumentations.AI (Vladimir Iglovikov, Ph.D.)
Presented by Vladimir Iglovikov:
- https://www.linkedin.com/in/iglovikov/
- https://x.com/viglovikov
- https://www.instagram.com/ternaus/
This presentation delves into the journey of Albumentations.ai, a highly successful open-source library for data augmentation.
Created out of a necessity for superior performance in Kaggle competitions, Albumentations has grown to become a widely used tool among data scientists and machine learning practitioners.
This case study covers various aspects, including:
People: The contributors and community that have supported Albumentations.
Metrics: The success indicators such as downloads, daily active users, GitHub stars, and financial contributions.
Challenges: The hurdles in monetizing open-source projects and measuring user engagement.
Development Practices: Best practices for creating, maintaining, and scaling open-source libraries, including code hygiene, CI/CD, and fast iteration.
Community Building: Strategies for making adoption easy, iterating quickly, and fostering a vibrant, engaged community.
Marketing: Both online and offline marketing tactics, focusing on real, impactful interactions and collaborations.
Mental Health: Maintaining balance and not feeling pressured by user demands.
Key insights include the importance of automation, making the adoption process seamless, and leveraging offline interactions for marketing. The presentation also emphasizes the need for continuous small improvements and building a friendly, inclusive community that contributes to the project's growth.
Vladimir Iglovikov brings his extensive experience as a Kaggle Grandmaster, ex-Staff ML Engineer at Lyft, sharing valuable lessons and practical advice for anyone looking to enhance the adoption of their open-source projects.
Explore more about Albumentations and join the community at:
GitHub: https://github.com/albumentations-team/albumentations
Website: https://albumentations.ai/
LinkedIn: https://www.linkedin.com/company/100504475
Twitter: https://x.com/albumentations
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf (Malak Abu Hammad)
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
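For intuition, vector search at its core ranks items by embedding similarity. A minimal pure-Python sketch (a conceptual illustration only, not the MongoDB Atlas API) looks like this:

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def vector_search(query, docs, top_k=2):
    """Return the names of the top_k documents whose embeddings are
    most similar to the query embedding."""
    ranked = sorted(docs, key=lambda name: cosine(query, docs[name]),
                    reverse=True)
    return ranked[:top_k]
```

A real vector database replaces this linear scan with an approximate nearest-neighbor index so the search scales to millions of vectors.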
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
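One of the capabilities discussed above, AI autonomously creating structured XML content, implies a validation gate between the model and the pipeline. The following is a minimal well-formedness check using only Python's standard library; it is my own sketch, and a real workflow would add schema validation (XSD, Schematron) with a fuller toolchain such as lxml.

```python
# Minimal gate for AI-generated XML: reject output that is not well-formed.
# Schema-level validation (XSD/Schematron) would be layered on top of this.
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if xml_text parses as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# A well-formed fragment passes; a mismatched close tag is rejected.
is_well_formed("<article><title>AI and XML</title></article>")  # True
is_well_formed("<article><title>unclosed</article>")            # False
```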
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
A Kanterakis - PyPedia: a python crowdsourcing development environment for bioinformatics and computational biology
1. PyPedia
The free programming environment
that anyone can edit!
Alexandros Kanterakis
Genomics Coordination Center, Department of Genetics,
University Medical Center, Groningen, The Netherlands
3. How not to be a bioinformatician
• Stay low level at every level
• Be open source without being open
• Make tools that make no sense to scientists
• Do not ever share your results and do not reuse
• Never maintain your databases and web services
• Be unreachable and isolated
4. So, you think you can be a
bioinformatician…
• Imagine you only have: A personal computer
with a browser and an Internet connection
• Answer the following question:
- Who is the current prime minister of Latvia?
5. SYTYCBAB
• Imagine you only have: A personal computer with
a browser and an Internet connection
• Answer the following question:
Compute the Hardy-Weinberg equilibriums of a set of
genotypes
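The exercise above can be sketched in plain Python. This is my own illustrative implementation, a simple chi-square goodness-of-fit test against Hardy-Weinberg expectations for one biallelic locus, and not the method hosted on PyPedia:

```python
# Chi-square test of Hardy-Weinberg equilibrium for a biallelic locus.
# Genotypes are given as tuples of allele letters, e.g. ('A', 'G').
from collections import Counter

def hwe_chi_square(genotypes):
    """Return (chi_square, observed_counts, expected_counts).

    Assumes exactly two alleles are present in the sample.
    """
    n = len(genotypes)
    # Order each genotype so ('G','A') and ('A','G') count as the same.
    observed = Counter(tuple(sorted(g)) for g in genotypes)
    # Allele frequencies from the 2n observed alleles.
    alleles = Counter(a for g in genotypes for a in g)
    (a1, c1), (a2, c2) = sorted(alleles.items())
    p = c1 / (2 * n)
    q = 1 - p
    expected = {
        (a1, a1): p * p * n,
        (a1, a2): 2 * p * q * n,
        (a2, a2): q * q * n,
    }
    chi2 = sum((observed.get(g, 0) - e) ** 2 / e
               for g, e in expected.items() if e > 0)
    return chi2, dict(observed), expected

chi2, obs, exp = hwe_chi_square(
    [('A','A'), ('A','G'), ('G','G'), ('A','A'), ('A','G'), ('A','A')])
```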
Execute
Source
Documentation
6. Execute
Source
Documentation
But what about…
? Web environment, online execution
? Open Source
? Integrate with other tools
? Edit a method and share it
? Examples and Unit tests
? Deploy in the cloud
? Frequency of new releases
7. A Python sandbox to the rescue
From:
http://wiki.python.org/moin/SandboxedPython
So:
Google App Engine + MediaWiki = PyPedia
12. Executing a method in a remote computer
• Edit your user page and add an “ssh” section:
==ssh==
host=ec2-107-22-59-115.compute-1.amazonaws.com
username=JohnDoe
path=/home/JohnDoe/runPyPedia
• This content is NOT shown to anyone
• Install the PyPedia client on the remote computer (details on pypedia.com)
13. “Execute on remote computer”
Example:
Fixed_point_user_JohnDoe
The cloud instance contains:
numpy, scipy, matplotlib
Like SAGE but with custom
execution environments
(i.e. BioPython, PyCogent, …)
14. Cool, but I want to call the function from my local computer..
• Install the PyPedia python library:
git clone git://github.com/kantale/pypedia.git
• Load the function in python:
import pypedia
from pypedia import Pairwise_linkage_disequilibrium
Pairwise_linkage_disequilibrium([('A','A'), ('A','G'), ('G','G'), ('G','A')],
                                [('A','A'), ('A','G'), ('G','G'), ('A','A')])
{'haplotypes': [('AA', 0.49999999997393502, 0.3125),
                ('AG', 2.606498430642265e-11, 0.1875),
                ('GA', 0.12500000002606498, 0.3125),
                ('GG', 0.37499999997393502, 0.1875)],
 'R_sq': 0.59999999983318408, 'Dprime': 0.99999999986098675}
• You can call the method of any user and your method can be
called by anyone.
• Edit locally, push changes.
15. • On the top of each article there is a button:
• Creates a personalized version of the article that only
you can edit.
• This is similar to the Github’s “fork” feature.
16. Using PyPedia for open science
• A complete analysis can be hosted in PyPedia
• Any finding generated or published should be
easily shared and reproduced.
• The reproduction of a finding takes time even
when the source code is released.
17. Reproducible science
• PyPedia offers a REST interface:
• www.pypedia.com/index.php?b_timestamp=YYYYMMDDHHMMSS&get_code=<python code>
• Get the most recent version of the python code that was edited before the timestamp.
• Reproduce the analysis by sharing a single URL:
http://www.pypedia.com/index.php?b_timestamp=20120102101010&get_code=print
Pairwise_linkage_disequilibrium([('A','A'), ('A','G'), ('G','G'), ('G','A')],
[('A','A'), ('A','G'), ('G','G'), ('A','A')])
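The reproducibility URL above can be assembled with the standard library. The parameter names (`b_timestamp`, `get_code`) come from the slides; whether pypedia.com still serves this endpoint today is not guaranteed, so this is a sketch of URL construction only.

```python
# Build the PyPedia "reproduce this analysis" URL from the slides.
# Parameter names are from the talk; the endpoint itself may no longer exist.
from urllib.parse import urlencode

def pypedia_code_url(code, timestamp):
    """URL returning the latest version of `code` edited before `timestamp`."""
    params = {"b_timestamp": timestamp, "get_code": code}
    return "http://www.pypedia.com/index.php?" + urlencode(params)

url = pypedia_code_url("Pairwise_linkage_disequilibrium", "20120102101010")
```

Using `urlencode` rather than string concatenation ensures the `&` separator and any percent-escaping are handled correctly, which is exactly what was lost in the hand-written URL on the slide.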
19. Meta-webserver
• HTML injection is allowed
and encouraged!
http://www.pypedia.com/index.php/Draw_face_user_Kantale
• Example run an HTML code
posted on gist:
http://www.pypedia.com/index.php?run_code=
import urllib2
print urllib2.urlopen(
    'https://raw.github.com/gist/2689822/bbea0c43b278d7c4c04b3f7a23ba43f558fba98b/index_full.html').read()
Click me!
20. • All content is under the Simplified BSD License
• Two namespaces:
– Validated articles. i.e: Minor_allele_frequency
• Safe, only admins can edit
– User articles. i.e: Minor_allele_frequency_user_John
• Unsafe, edited by individual user
– Qualitative articles from the User namespace are promoted to the Validated namespace
– Validated articles cannot call User articles (duh..)
22. Some thoughts
(in the embarrassing event that I have some minutes left)
Code as wiki, program as wiki concept
• Multidimensional expansion
• As Mao said: Let a thousand flowers scripts bloom (and
some of them rot in hell)
• Minimize the distance:
D_sanity(SCRIPT_made_by_IT_guy, SCRIPT_useful_to_biologists)
• Encyclopedialize™ your scripts because open source isn’t
enough!
Future steps:
• Attract editors, make communities!
• If it can be done in python, why not Ruby, …?