Introduction to genome sequencing and bioinformatics. Constructing and executing portable, reproducible analyses in Common Workflow Language. Jupyter Notebook on the cloud. Example of bioinformatics analysis: neoantigen discovery using NGS data.
[2020-09-01] IIBMP2020 Generating annotation texts of HLA sequences with anti... (Eli Kaminuma)
IIBMP2020 Poster #77 Generating annotation texts of HLA sequences with antigen classes by a T5 (Text-to-Text Transfer Transformer) model using International Nucleotide Sequence Database
This document provides an overview of bioinformatics databases and file formats for storing genetic sequence data. It discusses flat file databases like GenBank that store sequences in plain text formats. It also describes relational databases that allow querying across related data fields. Examples of biological relational databases and tools for working with sequence data files are also presented.
DIYA: An annotation pipeline for any genomics lab (Andrew Stewart)
The document describes DIYA, an open source pipeline for annotating genomic sequences. The pipeline takes DNA contigs as input and produces a fully annotated sequence as output. It is modular and expandable. The pipeline includes steps for assembly, gene prediction, BLAST searches, and identification of non-coding RNA and tRNA. The software is designed to be decentralized and accessible for smaller genomics labs.
eBPF is an exciting new technology that is poised to transform Linux performance engineering. eBPF enables users to dynamically and programmatically trace any kernel or user space code path, safely and efficiently. However, understanding eBPF is not so simple. The goal of this talk is to give audiences a fundamental understanding of eBPF, how it interconnects existing Linux tracing technologies, and how it provides a powerful platform to solve any Linux performance problem.
Multiplatform JIT Code Generator for NetBSD by Alexander Nasonov (eurobsdcon)
Abstract
The next release of NetBSD will have support for Just-In-Time (JIT) compilation of bpf programs in the kernel; this change will greatly speed up traffic sniffing on multiple platforms. Unlike similar interfaces in other operating systems, bpfjit uses a unified programming interface for code generation which is based on the Stack Less JIT Compiler library (SLJIT) and which supports x86, mips, arm, sparc and some other platforms.
The speaker will give an overview of the SLJIT API and discuss some implementation details of the bpfjit code, with emphasis on the optimizations of bpf programs supported by the JIT engine. He will also touch on unit testing of dynamically generated code running inside the kernel and on other areas in the NetBSD project where bpfjit can help in boosting performance.
Speaker bio
Alex is a software developer working in the financial sector in the City of London. He often amuses fellow tube passengers with C or Lua coding in NetBSD console and sometimes even with the green kernel debugger prompt.
This document summarizes RefSeq's curation and annotation of the reference human genome GRCh38. It discusses how RefSeq provides manual curation of known transcripts and proteins as well as model annotations from computational pipelines. It also describes RefSeq's collaboration with other groups to transition annotations from GRCh37 to GRCh38 and handle structural variations and alternative loci.
The document summarizes a lecture on identifying SNPs, indels, and structural variants from next-generation sequencing data. It discusses the VCF format for storing variant call data, methods for identifying SNPs and indels, and approaches for detecting structural variants like insertions, deletions, and inversions using read pair information. It also covers sources of bias in variant calling and strategies for evaluating called variants.
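As a rough illustration of the VCF records mentioned above, the following sketch splits one tab-separated data line into its leading fixed columns (CHROM, POS, ID, REF, ALT) and labels each alternate allele by comparing its length with the reference allele. The function names `classify_allele` and `parse_vcf_line` are hypothetical, and the length-based labels are a simplification, not the method taught in the lecture.

```python
def classify_allele(ref: str, alt: str) -> str:
    """Label a REF/ALT pair using allele lengths only (a simplification)."""
    if alt.startswith("<"):
        return "symbolic"      # e.g. <DEL>, used for structural variants
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(alt) > len(ref):
        return "insertion"
    if len(alt) < len(ref):
        return "deletion"
    return "other"             # same length > 1, e.g. a multi-base substitution


def parse_vcf_line(line: str):
    """Return one (chrom, pos, ref, alt, kind) tuple per ALT allele."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt_field = fields[:5]
    return [
        (chrom, int(pos), ref, alt, classify_allele(ref, alt))
        for alt in alt_field.split(",")   # ALT may list several alleles
    ]


if __name__ == "__main__":
    for record in parse_vcf_line("chr1\t200\t.\tAT\tA,ATG\t50\tPASS\t."):
        print(record)
```

Header lines starting with `#` are not handled here; a real pipeline would normally use an established VCF library instead of splitting lines by hand.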
Advanced Evasion Techniques by Win32/Gapz (Alex Matrosov)
The document discusses advanced evasion techniques used by the Win32/Gapz malware. It describes how Gapz uses droppers, bootkits, and rootkit functionality for stealthy infection. The dropper uses PowerLoader and code injection into explorer.exe to bypass detection. The bootkit modifies the MBR and VBR to load at early boot stages. The rootkit implements hidden storage, process injection, and covert network communication channels.
The document summarizes the Biopython project and the Python ecosystem for bioinformatics. It discusses how Biopython provides tools for working with biological data, such as reading and writing sequences and restriction enzymes. It also describes related Python packages like NumPy, SciPy, and Matplotlib that are useful for scientific computing and visualization. Finally, it outlines future goals for Biopython and opportunities for community contributions.
This document provides an introduction to biological databases. It discusses primary databases like GenBank which contain original sequence submissions and secondary databases derived from primary data, maintained by third parties like NCBI. Some key databases mentioned include GenBank, PDB, Swiss-Prot. The document also provides an overview of the NCBI and Entrez retrieval system, which allows integrated searches across literature and sequences.
Cool Informatics Tools and Services for Biomedical Research (David Ruau)
This document provides an overview of bioinformatics tools and services for analyzing big data in biomedical research. It discusses traditional bioinformatics tools, analyzing genomic data from microarrays and next-generation sequencing without and with code, interpreting results using protein interaction networks and pathways, tools for data storage, cleaning and visualization, and making research reproducible. Galaxy, R, and programming are presented as useful for automated, reproducible analysis of large genomic datasets.
BioMake is a language for specifying build networks of interdependent computational tasks. It allows defining targets with logical patterns that represent tasks. Targets have dependencies on other targets and are built by running actions. This allows automating sequencing analysis pipelines by specifying the execution of tasks like formatting databases and running BLAST alignments in a declarative way.
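The idea of targets that depend on other targets can be sketched in Python as a depth-first ordering over a dependency map. This is not BioMake syntax; the function `build_order` and the file names in the example are hypothetical and only illustrate how a build network determines the order in which actions run.

```python
def build_order(dependencies):
    """Return targets ordered so that every dependency precedes its dependents.

    `dependencies` maps a target name to the list of targets it needs.
    Targets that appear only as dependencies are treated as plain inputs.
    """
    order, done, in_progress = [], set(), set()

    def visit(target):
        if target in done:
            return
        if target in in_progress:
            raise ValueError(f"dependency cycle involving {target!r}")
        in_progress.add(target)
        for dep in dependencies.get(target, []):
            visit(dep)
        in_progress.discard(target)
        done.add(target)
        order.append(target)

    for target in dependencies:
        visit(target)
    return order


if __name__ == "__main__":
    # Hypothetical pipeline: format a database, then run a BLAST search on it.
    deps = {
        "hits.blast": ["db.formatted", "query.fasta"],
        "db.formatted": ["db.fasta"],
    }
    print(build_order(deps))
```

A real build tool would also check timestamps or checksums and skip targets that are already up to date; the sketch covers only the ordering step.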
Provenance for Data Munging Environments (Paul Groth)
Data munging is a crucial task across domains ranging from drug discovery and policy studies to data science. Indeed, it has been reported that data munging accounts for 60% of the time spent in data analysis. Because data munging involves a wide variety of tasks using data from multiple sources, it often becomes difficult to understand how a cleaned dataset was actually produced (i.e. its provenance). In this talk, I discuss our recent work on tracking data provenance within desktop systems, which addresses problems of efficient and fine-grained capture. I also describe our work on scalable provenance tracking within a triple store/graph database that supports messy web data. Finally, I briefly touch on whether we will move from ad hoc data munging approaches to more declarative knowledge representation languages such as Probabilistic Soft Logic.
Presented at Information Sciences Institute - August 13, 2015
The document summarizes an update on the Biopython project and discusses the Python ecosystem for bioinformatics. It provides an overview of Biopython's features for working with biological data, examples of using Biopython modules like Entrez and AlignIO, and upcoming goals like moving to Subversion and supporting more file formats. It also discusses related Python packages like NumPy, SciPy, Matplotlib for visualization, and Jython/IronPython for interacting with virtual machines.
This document discusses Q-normalization, a method for normalizing gene expression data. It presents parallel implementations of Q-normalization using shared memory, message passing, and GPU architectures. Benchmarking shows the GPU implementation provides a 5.5x speedup over the sequential CPU version for processing large gene expression datasets. The shared memory implementation provides a 2.9x total speedup, while the message passing version is suitable for distributed memory clusters.
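For orientation, here is a minimal sequential sketch, assuming that Q-normalization refers to standard quantile normalization: each sample's values are replaced by the mean of the values holding the same rank across all samples. The function name `quantile_normalize` is hypothetical, ties are broken by position rather than averaged, and none of the parallel implementations described above are reproduced.

```python
def quantile_normalize(columns):
    """Quantile-normalize equal-length samples (one list of values per sample)."""
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all samples must have the same number of values")

    # Mean of the i-th smallest value across samples, for each rank i.
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [
        sum(sc[i] for sc in sorted_cols) / len(columns) for i in range(n)
    ]

    normalized = []
    for col in columns:
        # Indices of this sample's values from smallest to largest.
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


if __name__ == "__main__":
    samples = [[5, 2, 3], [4, 1, 6]]
    print(quantile_normalize(samples))
```

After normalization every sample shares the same set of values, which is what makes the per-sample sorting step a natural candidate for parallel execution.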
Neuroscience core lecture given at the Icahn School of Medicine at Mount Sinai. This is version 2 of the same topic. I have made some modifications to give a more gentle introduction and add a new example for ngs.plot.
This document provides an overview of RNA-Seq analysis. It begins with considerations for RNA-Seq experiments such as computational requirements. It then describes the general RNA-Seq analysis workflow including short-read alignment, transcript reconstruction, abundance estimation, visualization, and statistics. The document focuses on explaining the "Tuxedo" analysis pipeline which includes Bowtie, Tophat, Cufflinks, Cuffmerge, Cuffdiff and CummeRbund. It provides examples of commands for each step and discusses alternative tools. The document concludes with resources for further information on RNA-Seq analysis.
Geared towards bioinformatics students and taking a somewhat humorous point of view, this presentation explains what bioinformaticians are and what they do.
The document proposes HUG, a hardware and software system to efficiently process large genomic data sets. HUG includes a hardware accelerator to speed up genomic analysis algorithms like Smith-Waterman and protein folding. It also accelerates genomic variant calling, which identifies DNA changes associated with diseases but requires comparing to large databases. The system implements regular expression matching for genomic data using a reconfigurable instruction set architecture circuit, achieving a 6x speedup over software. It aims to integrate algorithms and develop visualization tools to facilitate analysis of computed genomic data.
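Smith-Waterman, one of the algorithms named above, can be sketched in software as a dynamic-programming recurrence over two sequences. The version below computes only the best local alignment score with a linear gap penalty; the scoring values (match 2, mismatch -1, gap -2) and the function name `smith_waterman_score` are illustrative assumptions and say nothing about how HUG implements the algorithm in hardware.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score between sequences a and b."""
    prev = [0] * (len(b) + 1)   # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTT", "GGACGGG"))
```

Only two rows are kept in memory because the score alone is requested; recovering the aligned substrings would require the full matrix and a traceback step.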
Spark Summit EU talk by Erwin Datema and Roeland van Ham (Spark Summit)
The document discusses KeyGene's use of Apache Spark for high-throughput genomics data analysis. KeyGene is a crop innovation company that analyzes genomic data from thousands of plants to improve crop traits like yield and quality. They previously used conventional HPC clusters for genomics pipelines but found Spark enabled more interactive analysis. KeyGene developed a "Sparkified" genomics pipeline using tools like BWA, GATK and their own Guacamole variant caller. This allowed interactive variant selection and GWAS using Spark SQL queries, demonstrating Spark is well-suited for interactive plant genomics analysis.
This document provides an overview of downstream analyses that can be performed after variant identification and filtering in a typical variant calling pipeline. It discusses visualization of variant data in each gene to identify potential causative variants. It also mentions association studies as another type of downstream analysis where variants are tested for association with disease phenotypes. The goal of downstream analyses is to help prioritize variants for further investigation.
This document provides an introduction to eBPF and XDP. It discusses the history of BPF and how it evolved into eBPF. Key aspects of eBPF covered include the instruction set, JIT compilation, verifier, helper functions, and maps. XDP is introduced as a way to program the data plane using eBPF programs attached early in the receive path. Example use cases and performance benchmarks for XDP are also mentioned.
The Infobiotics BioProgramming Language & Workbench provides a computer-aided design environment for synthetic biology. It integrates simulation, verification, and compilation capabilities through an iterative workflow. The Infobiotics Language (IBL) allows users to define synthetic biology parts, rules, and devices. IBL supports abstraction, encapsulation, and hierarchical organization. The workbench performs stochastic simulation, model checking for verification, and biomatter compilation to generate DNA sequences. It aims to enable more reliable engineering of synthetic biological circuits.
This document summarizes bioinformatics tools that can be used for analysis of high-throughput sequencing data for molecular diagnostics. It discusses databases for virulence factors and antimicrobial resistance as well as tools for assembly, annotation, pan-genome analysis, visualization, and commercial solutions. The presentation emphasizes that there is no single best tool and different approaches are needed for different questions. Collaboration with other researchers is recommended.
1. The Fletcher Framework provides a standardized way to integrate FPGAs into heterogeneous computing systems using the Apache Arrow in-memory data format.
2. Arrow avoids serialization overhead by using a standardized columnar format that allows for efficient data movement and hardware interfacing.
3. Fletcher generates hardware interfaces from Arrow schemas and provides runtime interfaces for languages like C++ and Python to accelerate algorithms on FPGAs using the Arrow data format.
Takashi Kobayashi and Hironori Washizaki, "SWEBOK Guide and Future of SE Education," First International Symposium on the Future of Software Engineering (FUSE), June 3-6, 2024, Okinawa, Japan
GraphSummit Paris - The art of the possible with Graph Technology (Neo4j)
Sudhir Hasbe, Chief Product Officer, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
Similar to Portable and reproducible bioinformatic analysis. Neoantigen discovery.
Graspan: A Big Data System for Big Code Analysis (Aftab Hussain)
We built a disk-based parallel graph system, Graspan, that uses a novel edge-pair centric computation model to compute dynamic transitive closures on very large program graphs.
We implement context-sensitive pointer/alias and dataflow analyses on Graspan. An evaluation of these analyses on large codebases such as Linux shows that their Graspan implementations scale to millions of lines of code and are much simpler than their original implementations.
These analyses were used to augment the existing checkers; these augmented checkers found 132 new NULL pointer bugs and 1308 unnecessary NULL tests in Linux 4.4.0-rc5, PostgreSQL 8.3.9, and Apache httpd 2.2.18.
- Accepted in ASPLOS ‘17, Xi’an, China.
- Featured in the tutorial, Systemized Program Analyses: A Big Data Perspective on Static Analysis Scalability, ASPLOS ‘17.
- Invited for presentation at SoCal PLS ‘16.
- Invited for poster presentation at PLDI SRC ‘16.
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App (Google)
👉👉 Click Here To Get More Info 👇👇
https://sumonreview.com/ai-fusion-buddy-review
AI Fusion Buddy Review: Key Features
✅Create Stunning AI App Suite Fully Powered By Google's Latest AI technology, Gemini
✅Use Gemini to build high-converting sales video scripts, ad copy, trending articles, blogs, etc. 100% unique!
✅Create Ultra-HD graphics with a single keyword or phrase that commands 10x eyeballs!
✅Fully automated AI articles bulk generation!
✅Auto-post or schedule stunning AI content across all your accounts at once—WordPress, Facebook, LinkedIn, Blogger, and more.
✅With one keyword or URL, generate complete websites, landing pages, and more…
✅Automatically create & sell AI content, graphics, websites, landing pages, & all that gets you paid non-stop 24*7.
✅Pre-built High-Converting 100+ website Templates and 2000+ graphic templates logos, banners, and thumbnail images in Trending Niches.
✅Say goodbye to wasting time logging into multiple Chat GPT & AI Apps once & for all!
✅Save over $5000 per year and kick out dependency on third parties completely!
✅Brand New App: Not available anywhere else!
✅ Beginner-friendly!
✅ZERO upfront cost or any extra expenses
✅Risk-Free: 30-Day Money-Back Guarantee!
✅Commercial License included!
See My Other Reviews Article:
(1) AI Genie Review: https://sumonreview.com/ai-genie-review
(2) SocioWave Review: https://sumonreview.com/sociowave-review
(3) AI Partner & Profit Review: https://sumonreview.com/ai-partner-profit-review
(4) AI Ebook Suite Review: https://sumonreview.com/ai-ebook-suite-review
Need for Speed: Removing speed bumps from your Symfony projects ⚡️ (Łukasz Chruściel)
No one wants their application to drag like a car stuck in the slow lane! Yet it’s all too common to encounter bumpy, pothole-filled solutions that slow the speed of any application. Symfony apps are not an exception.
In this talk, I will take you for a spin around the performance racetrack. We’ll explore common pitfalls - those hidden potholes on your application that can cause unexpected slowdowns. Learn how to spot these performance bumps early, and more importantly, how to navigate around them to keep your application running at top speed.
We will focus in particular on tuning your engine at the application level, making the right adjustments to ensure that your system responds like a well-oiled, high-performance race car.
Odoo ERP software
Odoo ERP software, a leading open-source software for Enterprise Resource Planning (ERP) and business management, has recently launched its latest version, Odoo 17 Community Edition. This update introduces a range of new features and enhancements designed to streamline business operations and support growth.
The Odoo Community serves as a cost-free edition within the Odoo suite of ERP systems. Tailored to accommodate the standard needs of business operations, it provides a robust platform suitable for organisations of different sizes and business sectors. Within the Odoo Community Edition, users can access a variety of essential features and services essential for managing day-to-day tasks efficiently.
This blog presents a detailed overview of the features available within the Odoo 17 Community edition, and the differences between Odoo 17 community and enterprise editions, aiming to equip you with the necessary information to make an informed decision about its suitability for your business.
A Study of Variable-Role-based Feature Enrichment in Neural Models of CodeAftab Hussain
Understanding variable roles in code has been found to be helpful by students
in learning programming -- could variable roles help deep neural models in
performing coding tasks? We do an exploratory study.
- These are slides of the talk given at InteNSE'23: The 1st International Workshop on Interpretability and Robustness in Neural Software Engineering, co-located with the 45th International Conference on Software Engineering, ICSE 2023, Melbourne Australia
UI5con 2024 - Keynote: Latest News about UI5 and it’s EcosystemPeter Muessig
Learn about the latest innovations in and around OpenUI5/SAPUI5: UI5 Tooling, UI5 linter, UI5 Web Components, Web Components Integration, UI5 2.x, UI5 GenAI.
Recording:
https://www.youtube.com/live/MSdGLG2zLy8?si=INxBHTqkwHhxV5Ta&t=0
Neo4j - Product Vision and Knowledge Graphs - GraphSummit ParisNeo4j
Dr. Jesús Barrasa, Head of Solutions Architecture for EMEA, Neo4j
Découvrez les dernières innovations de Neo4j, et notamment les dernières intégrations cloud et les améliorations produits qui font de Neo4j un choix essentiel pour les développeurs qui créent des applications avec des données interconnectées et de l’IA générative.
Transform Your Communication with Cloud-Based IVR SolutionsTheSMSPoint
Discover the power of Cloud-Based IVR Solutions to streamline communication processes. Embrace scalability and cost-efficiency while enhancing customer experiences with features like automated call routing and voice recognition. Accessible from anywhere, these solutions integrate seamlessly with existing systems, providing real-time analytics for continuous improvement. Revolutionize your communication strategy today with Cloud-Based IVR Solutions. Learn more at: https://thesmspoint.com/channel/cloud-telephony
What is Master Data Management by PiLog Groupaymanquadri279
PiLog Group's Master Data Record Manager (MDRM) is a sophisticated enterprise solution designed to ensure data accuracy, consistency, and governance across various business functions. MDRM integrates advanced data management technologies to cleanse, classify, and standardize master data, thereby enhancing data quality and operational efficiency.
Artificia Intellicence and XPath Extension FunctionsOctavian Nadolu
The purpose of this presentation is to provide an overview of how you can use AI from XSLT, XQuery, Schematron, or XML Refactoring operations, the potential benefits of using AI, and some of the challenges we face.
Microservice Teams - How the cloud changes the way we workSven Peters
A lot of technical challenges and complexity come with building a cloud-native and distributed architecture. The way we develop backend software has fundamentally changed in the last ten years. Managing a microservices architecture demands a lot of us to ensure observability and operational resiliency. But did you also change the way you run your development teams?
Sven will talk about Atlassian’s journey from a monolith to a multi-tenanted architecture and how it affected the way the engineering teams work. You will learn how we shifted to service ownership, moved to more autonomous teams (and its challenges), and established platform and enablement teams.
Do you want Software for your Business? Visit Deuglo
Deuglo has top Software Developers in India. They are experts in software development and help design and create custom Software solutions.
Deuglo follows seven steps methods for delivering their services to their customers. They called it the Software development life cycle process (SDLC).
Requirement — Collecting the Requirements is the first Phase in the SSLC process.
Feasibility Study — after completing the requirement process they move to the design phase.
Design — in this phase, they start designing the software.
Coding — when designing is completed, the developers start coding for the software.
Testing — in this phase when the coding of the software is done the testing team will start testing.
Installation — after completion of testing, the application opens to the live server and launches!
Maintenance — after completing the software development, customers start using the software.
SMS API Integration in Saudi Arabia| Best SMS API ServiceYara Milbes
Discover the benefits and implementation of SMS API integration in the UAE and Middle East. This comprehensive guide covers the importance of SMS messaging APIs, the advantages of bulk SMS APIs, and real-world case studies. Learn how CEQUENS, a leader in communication solutions, can help your business enhance customer engagement and streamline operations with innovative CPaaS, reliable SMS APIs, and omnichannel solutions, including WhatsApp Business. Perfect for businesses seeking to optimize their communication strategies in the digital age.
E-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian CompaniesQuickdice ERP
Explore the seamless transition to e-invoicing with this comprehensive guide tailored for Saudi Arabian businesses. Navigate the process effortlessly with step-by-step instructions designed to streamline implementation and enhance efficiency.
SOCRadar's Aviation Industry Q1 Incident Report is out now!
The aviation industry has always been a prime target for cybercriminals due to its critical infrastructure and high stakes. In the first quarter of 2024, the sector faced an alarming surge in cybersecurity threats, revealing its vulnerabilities and the relentless sophistication of cyber attackers.
SOCRadar’s Aviation Industry, Quarterly Incident Report, provides an in-depth analysis of these threats, detected and examined through our extensive monitoring of hacker forums, Telegram channels, and dark web platforms.
OpenMetadata Community Meeting - 5th June 2024OpenMetadata
The OpenMetadata Community Meeting was held on June 5th, 2024. In this meeting, we discussed about the data quality capabilities that are integrated with the Incident Manager, providing a complete solution to handle your data observability needs. Watch the end-to-end demo of the data quality features.
* How to run your own data quality framework
* What is the performance impact of running data quality frameworks
* How to run the test cases in your own ETL pipelines
* How the Incident Manager is integrated
* Get notified with alerts when test cases fail
Watch the meeting recording here - https://www.youtube.com/watch?v=UbNOje0kf6E
2. Agenda
1. Genome sequencing and bioinformatics
2. Constructing portable and reproducible bioinformatics analysis in Common Workflow Language (CWL)
3. Executing bioinformatic analysis on the cloud (Cancer Genomics Cloud platform)
4. Jupyter Notebook bioinformatic analysis on the cloud
5. Bioinformatic analysis example: Discovery of neoantigen cancer markers in the era of NGS data
4. DNA - the code of life
● DeoxyriboNucleic Acid
● Same in every cell (almost)
● DNA replicates during cell division
● Bases (nucleotides) pair with complementary bases
○ A - T (adenine and thymine)
○ C - G (cytosine and guanine)
● 3 billion base-pairs x 2
CTGGATTATATATAAATACGAAGGGACTAT... etc
● Introns and exons (the exome is about 2% of the genome)
● ~99.6% same between 2 individuals
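The base-pairing rule above (A with T, C with G) can be illustrated with a minimal Python sketch. The function names `complement` and `reverse_complement` are illustrative and not taken from the slides; the example fragment is the one shown on the slide.

```python
# Minimal sketch of complementary base pairing: A<->T, C<->G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def complement(seq: str) -> str:
    """Return the complementary strand, read in the same direction."""
    return seq.upper().translate(COMPLEMENT)

def reverse_complement(seq: str) -> str:
    """Return the complementary strand, read in the opposite direction."""
    return complement(seq)[::-1]

fragment = "CTGGATTATATATAAATACGAAGGGACTAT"
print(complement(fragment))
print(reverse_complement(fragment))
```

Because the two strands of DNA run in opposite directions, the reverse complement is usually the form needed when comparing a read against the other strand.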
5. Genome sequencing
● Digitization of the genome
● Human Genome Project (1990-2003), $3B
● Birth of bioinformatics
● Sanger sequencing (first generation sequencing)
○ Slow (took 13 years)
○ Costly ($3B for one human genome)
● Currently NGS (next generation sequencing)
○ Illumina
○ Around $200 and 1 day needed to sequence the genome
● Third generation sequencing is also in use
○ Longer read length (up to 50k bases)
○ Oxford Nanopore, PacBio
○ Higher error rate
○ Smaller in size
○ Sequencing in space (Mark and Scott Kelly)
6. Why perform DNA sequencing?
Stanford University
● Rare genetic diseases
● Origins of humans
● Precision medicine - Cancer treatment (immunotherapy)
● Microbes that live inside us (microbiome)
● Study ways that genomes work
7. Illumina sequencing
● Read - the sequence of a DNA fragment as read by the sequencer
● Typical whole genome sequencing experiment:
○ 200-500 million reads
○ 50-150 bases (letters) long
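As a rough worked example, the read counts and read lengths above can be compared with the 3 billion base-pair genome size mentioned earlier. This sketch only multiplies the numbers from the slides; the function names and the idea of dividing by the genome size to get an average "fold" figure are additions for illustration.

```python
# Rough arithmetic only: how many bases does an experiment produce,
# and how many times over does that cover a 3-billion-base genome?
GENOME_SIZE = 3_000_000_000  # base pairs, from "3 billion base-pairs"

def total_bases(n_reads: int, read_length: int) -> int:
    """Total number of sequenced bases in the experiment."""
    return n_reads * read_length

def fold_coverage(n_reads: int, read_length: int,
                  genome_size: int = GENOME_SIZE) -> float:
    """Average number of times each genome position is sequenced."""
    return total_bases(n_reads, read_length) / genome_size

# Low and high ends of the ranges quoted on the slide.
for n_reads, read_length in [(200_000_000, 50), (500_000_000, 150)]:
    bases = total_bases(n_reads, read_length)
    print(n_reads, read_length, bases,
          round(fold_coverage(n_reads, read_length), 1))
```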
13. Sequencing (sum up)
1. Shearing (fragmentation of the genome)
2. Attaching adapters
3. PCR amplification (optional)
4. Attaching template to surface/flow cell
5. PCR/bridge amplification (cluster creation)
6. Adding fluorescent bases and taking a picture after each cycle (repeat this many times)
7. Stack up images and read the sequence
16. Bioinformatics to the rescue!
Bioinformatics, n. The science of information and information flow in biological systems, esp. of the use of computational methods in genetics and genomics. (Oxford English Dictionary)
Bioinformatics - using statistical and computational methods to solve biological problems.
17. Secondary genomics analysis
● Genomes of all species are arrays of nucleotides (A, T, C, G), i.e. strings
● The process of DNA sequencing returns only fragments of the genome
● Our mission: RECONSTRUCT IT!
18. Sequencing data - FASTQ file
4 lines for each read
● Read id
● Read sequence
● + sign
● ASCII encoded quality
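The four-line record layout above can be read with a short Python sketch. The names `FastqRead` and `parse_fastq` and the example record are hypothetical. The sketch also assumes the Phred+33 quality encoding (quality = ASCII code minus 33), which the slides do not specify.

```python
from io import StringIO
from typing import Iterator, List, NamedTuple

class FastqRead(NamedTuple):
    read_id: str
    sequence: str
    qualities: List[int]

def parse_fastq(handle) -> Iterator[FastqRead]:
    """Yield one FastqRead per 4-line FASTQ record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:          # end of file
            break
        sequence = handle.readline().rstrip("\n")
        handle.readline()       # the '+' separator line
        quality = handle.readline().rstrip("\n")
        # Assumption: Phred+33 encoding of the quality string.
        scores = [ord(ch) - 33 for ch in quality]
        yield FastqRead(header[1:], sequence, scores)  # drop leading '@'

# Hypothetical one-read file, held in memory for the example.
example = StringIO("@read1\nACGT\n+\nII5!\n")
reads = list(parse_fastq(example))
for read in reads:
    print(read.read_id, read.sequence, read.qualities)
```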
19. Genome reconstruction
Result of a sequencing experiment
● FASTQ file
● 100-500 GB
● Each read (one 4-line record) contains a sequence 50-250 bp long
20. Genome reconstruction
How do we reconstruct the genome from reads?
1. Alignment
○ Using a reference genome to map the position of the reads
2. Assembly
○ Reconstructing the genome by finding the links between the reads
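To make the alignment idea concrete, here is a deliberately naive Python sketch that looks up the exact position of each read in a reference string. It is a toy under stated assumptions: the function `map_reads` and the reads are made up for illustration, the reference is the short fragment shown earlier in the deck, and real aligners such as BWA use an index and tolerate mismatches rather than scanning for exact matches.

```python
from typing import Dict, Iterable, List

def map_reads(reference: str, reads: Iterable[str]) -> Dict[str, List[int]]:
    """Return, for each read, every 0-based position where it matches exactly."""
    positions: Dict[str, List[int]] = {}
    for read in reads:
        hits = []
        start = reference.find(read)
        while start != -1:
            hits.append(start)
            # Continue one base further so overlapping matches are found too.
            start = reference.find(read, start + 1)
        positions[read] = hits
    return positions

reference = "CTGGATTATATATAAATACGAAGGGACTAT"
reads = ["GGATT", "ATAT", "TTTTT"]  # hypothetical reads
mapping = map_reads(reference, reads)
for read, hits in mapping.items():
    print(read, hits)
```

A read that maps to several positions (such as a short repeat) or to none shows why read length and reference quality matter for reconstruction.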
37. Common Workflow Language
● CWL is a way to describe the execution of command-line tools
● Every tool has a defined set of inputs and outputs
● Every tool is executed in its own environment (Docker)
● Execution on the cloud or local environment
● Enables portable and reproducible execution
[Figure: CWL description of an echo tool with a "message" input, as used by the CWL executor]
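A tool description like the echo example can be sketched as a small JSON document. The Python below builds one and prints it. It follows the general shape of a CWL `CommandLineTool`, but the `cwlVersion` value and the exact fields the deck's figure showed are assumptions, and the job values are hypothetical.

```python
import json

# Sketch of a CWL tool description for `echo`, written as JSON.
# Assumption: CWL v1.0 field names; the slides do not state a version.
echo_tool = {
    "cwlVersion": "v1.0",
    "class": "CommandLineTool",
    "baseCommand": "echo",
    "inputs": {
        # One string input, placed first on the command line.
        "message": {"type": "string", "inputBinding": {"position": 1}},
    },
    "outputs": [],
}

# Hypothetical job: the concrete values for the inputs declared above.
job = {"message": "Hello world!"}

print(json.dumps(echo_tool, indent=2))
print(json.dumps(job, indent=2))
```

Keeping the tool description separate from the job values is what lets the same description be reused with different inputs.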
39. What is a CWL workflow?
● A directed acyclic graph of tools connected to perform some analysis
● A workflow's nodes are:
○ Inputs (file or parameter)
○ Tools
○ Outputs
○ Workflow (a workflow can itself be a node)
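[Figure: example workflow. FASTQ and Fasta inputs feed BWA-MEM, which produces SAM; SAM2BAM then converts the SAM file.]
BWA-MEM: bwa mem ref.fa read1.fq read2.fq > aln.sam
SAM2BAM: sam2bam aln.sam > aln.bam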
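The "acyclic graph" view of a workflow can be sketched with Python's standard `graphlib` module (Python 3.9+). The graph below encodes the BWA-MEM and SAM2BAM example as a dictionary of dependencies. This only illustrates the ordering idea and is not how a CWL executor is implemented.

```python
from graphlib import TopologicalSorter

# Each key depends on the nodes in its set (files and tools alike).
workflow = {
    "BWA-MEM": {"ref.fa", "read1.fq", "read2.fq"},
    "aln.sam": {"BWA-MEM"},
    "SAM2BAM": {"aln.sam"},
    "aln.bam": {"SAM2BAM"},
}

# static_order() yields every node after all of its dependencies;
# it raises graphlib.CycleError if the graph contains a cycle.
order = list(TopologicalSorter(workflow).static_order())
print(order)
```

Because the graph has no cycles, a valid execution order always exists, and steps that do not depend on each other could be run in parallel.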
41. How to build a workflow?
https://github.com/rabix/composer
42. What is Docker?
● Docker is a lightweight virtual environment
● Allows you to package the tool (e.g. Python script or some C program) with all of its dependencies into a standardized unit for software development
● Docker containers run on any computer, on any infrastructure
● Layered container structure
● Can directly access resources of the host operating system
43. Docker file
FROM ubuntu:16.04
MAINTAINER vladimir.kovacevic@sbgenomics.com
RUN apt-get update && apt-get install -y wget \
    make \
    gcc \
    zlib1g-dev
WORKDIR /opt
RUN wget https://github.com/bwa/releases/bwa-0.7.15.tar.bz2
RUN tar xfj bwa-0.7.15.tar.bz2
WORKDIR /opt/bwa-0.7.15
RUN make
COPY Dockerfile /opt/Dockerfile
# Build image from Dockerfile and push to docker repo
docker build -t images.sbgenomics.com/vladimirk/bwa:0.7.15 .
docker push images.sbgenomics.com/vladimirk/bwa:0.7.15
44. Common Workflow Language
● Define inputs and outputs of a command line tool, runtime and requirements
● Define how to connect command line tools, creating a workflow
● Ensure reproducibility and portability
● Think of CWL as a detailed recipe!
46. Cancer Genomics Cloud platform
● Two petabytes of multi-dimensional genomics data available to ~3800 authorized researchers to analyse on the cloud
● The Cancer Genome Atlas (TCGA), a landmark cancer genomics program, molecularly characterized over 20,000 primary cancer and matched normal samples
● Free registration for academia with $300 credit!
48. ...and run it!
PhiX is an icosahedral, nontailed bacteriophage with a single-stranded DNA genome. Its tiny genome has 5386 nucleotides and was the first DNA genome to be sequenced, by Fred Sanger. Due to its small, well-defined genome sequence, PhiX is commonly used as a control for Illumina sequencing runs.
49. So, what just happened?
● Request for the default (c4.2xlarge) instance sent to AWS
● Instance initialized
● cwl.job.json created from task inputs and parameters
● cwl.job.json sent, together with cwl.app.json, to the initialized AWS instance
● Input files downloaded to the AWS instance
● Docker image(s) of the tool(s) downloaded
● Tool run inside a Docker container
● Marked outputs collected and uploaded to the cloud storage attached to our platform's project
54. HLA Typing
● The HLA gene family provides instructions for making a group of related proteins known as the human leukocyte antigen (HLA) complex.
● The HLA complex helps the immune system distinguish the body's proteins from proteins made by foreign invaders such as viruses and bacteria.
● HLA typing has been widely used for reducing the risk of organ rejection
● Specific HLA variants are associated with both autoimmune (e.g. type 1 diabetes, rheumatoid arthritis) and infectious (e.g. HIV, Hepatitis C) diseases
57. Local executor
Runnable from the command line
Suitable for local testing and development
./rabix [OPTIONS] <app> <inputs>
rabix.io
https://github.com/rabix/bunny
59. Interactive analysis
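Run Python/R Jupyter Notebooks on the cloud
Further process outputs from bioinformatics tasks
pattern = 'ACCT'
# Read the genome, skipping FASTA header lines (they start with '>')
with open('/sbgenomics/project-files/PhiX_genome.fasta', 'r') as myfile:
    data = ''.join(line.strip() for line in myfile if not line.startswith('>'))
# Count occurrences of the pattern, including overlapping ones
cnt = 0
for i in range(len(data) - len(pattern) + 1):
    if data[i:i+len(pattern)] == pattern:
        cnt += 1
        # Running count and 0-based position of this match
        print(cnt, i)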
60. Microbiome Differential Abundance Analysis
Detect microbes that are differentially abundant between disease and control samples (~500 each)
62. What is cancer?
A mutation (error) during DNA replication can fall in:
1. Intron (no change)
2. Important gene (cell dies, organism lives)
3. Gene that stops regulation of the cell division (cell lives, organism...)
What causes cancer (increases probability of mutation)?
1. EM radiation
2. Chemical agents
3. Free radicals
4. Genetic factors
5. Infections (viruses)
A dividing lung cancer cell.
Credit: National Institutes of Health
64. Neoantigens
● Neoantigens - proteins presented only by cancer cells
● When a neoantigen is known -> immune T-cells can be “programmed” to destroy cancer cells
● These unique cancer markers could be a key to developing a new generation of personalized, targeted cancer immunotherapies
Yugang Guo et al., Neoantigen Vaccine Delivery for Personalized Anticancer Immunotherapy
65. How can we discover neoantigens?
1. Reconstruct the DNA of tumor tissue, the DNA of normal tissue, and the RNA of tumor tissue
66. How can we discover neoantigens?
2. Compare DNA from Tumor and Normal tissue
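At its simplest, the tumor versus normal comparison is a set difference: variants seen in the tumor but not in the matched normal sample are candidate somatic variants. The sketch below shows only that logic. The third variant is made up for illustration, and real somatic variant callers work on aligned reads and account for sequencing errors rather than comparing clean lists.

```python
# Variants as (chromosome, position, reference base, alternate base).
tumor_variants = {
    ("1", 111957245, "C", "A"),
    ("8", 144392368, "T", "C"),
    ("2", 1000, "G", "T"),      # hypothetical inherited variant
}
normal_variants = {
    ("2", 1000, "G", "T"),      # also present in normal tissue
}

# Present in tumor, absent from normal: candidate somatic variants.
somatic = sorted(tumor_variants - normal_variants)
for chrom, pos, ref, alt in somatic:
    print(f"{chrom}_{pos}_{ref}_{alt}")
```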
67. How can we discover neoantigens?
3. Protein extraction
[Figure: a gene shown as DNA (start codon, exon 1, intron, exon 2, intron, exon 3, stop codon), as RNA (exon 1, exon 2, exon 3) and as protein. Somatic variants (mutations present in tumor) marked on the DNA: C→T, A→T (ignored), G→TC, A→GG.]
Protein: MCYEVILQNFHGVAKKRTGYHYKVGRGRALLSVES
Peptide (part of the protein): ILQNFHGVAKKRTGYHYKVGR
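The protein extraction step can be sketched as translating a coding sequence with the standard genetic code and comparing the normal and tumor results. The two short coding sequences below are hypothetical and differ by one base; the function name `translate` is illustrative. Splicing of exons is assumed to have happened already.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(cds: str) -> str:
    """Translate a coding sequence, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE[cds[i:i + 3]]
        if amino_acid == "*":   # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)

normal_cds = "ATGTGCTATGAAGTGTAA"  # hypothetical coding sequence
tumor_cds = "ATGTGCTATGAAGCGTAA"   # same, with one T->C substitution

print(translate(normal_cds))
print(translate(tumor_cds))
```

A single-base change alters one amino acid here; peptides that contain the altered position are the ones of interest as neoantigen candidates.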
68. How can we discover neoantigens?
4. Discover HLA type from genome (translates to MHC molecule)
5. Perform scoring of protein-HLA sets
Mutation | HLA type | Peptide | NetMHC score | Pickpocket score | NetCTLPan score | RNA expression
1_111957245_C_A | HLA-A*02:01 | MMLSSSPV | 0.881 | 0.633 | 1.09815 | 11.5
8_144392368_T_C | HLA-A*02:01 | WLLEKLEQL | 0.828 | 1.097 | 1.06133 | 12.5
17_28537638_C_T | HLA-A*02:01 | VLDEFPHV | 0.836 | 0.374 | 1.06015 | 23
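Once each peptide-HLA pair has scores, candidates can be ranked. The sketch below sorts the three rows from the table by NetMHC score, highest first. Using that single column as the sort key, and treating a higher score as better, are assumptions made for illustration; the slides do not say how the workflow combines the scores.

```python
# Rows copied from the table above (a subset of the columns).
candidates = [
    {"mutation": "1_111957245_C_A", "hla": "HLA-A*02:01",
     "peptide": "MMLSSSPV", "netmhc": 0.881, "rna_expression": 11.5},
    {"mutation": "8_144392368_T_C", "hla": "HLA-A*02:01",
     "peptide": "WLLEKLEQL", "netmhc": 0.828, "rna_expression": 12.5},
    {"mutation": "17_28537638_C_T", "hla": "HLA-A*02:01",
     "peptide": "VLDEFPHV", "netmhc": 0.836, "rna_expression": 23.0},
]

# Assumption: rank by NetMHC score alone, highest first.
ranked = sorted(candidates, key=lambda row: row["netmhc"], reverse=True)
for rank, row in enumerate(ranked, start=1):
    print(rank, row["peptide"], row["hla"], row["netmhc"])
```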
70. Neoantigen workflow validation
Tumor Neoantigen Selection Alliance (TESLA) Challenge
Flow-cytometry-validated protein-HLA sets (>10 patients)
SBG Neoantigen workflow detected and highly ranked the majority of confirmed neoantigens
71. Neoantigen cancer vaccine
Yugang Guo et al., Neoantigen Vaccine Delivery for Personalized Anticancer Immunotherapy
Status | Indication | Reference
Phase I | Melanoma (stage III and IV) | Ugur Sahin et al., Personalized RNA mutanome vaccines mobilize poly-specific therapeutic immunity against cancer
Phase I | Melanoma (stage IIIB/C and IVM1a/b) | Patrick A. Ott et al., An immunogenic personal neoantigen vaccine for patients with melanoma
Preclinical study | MC-38 colon cancer | Mahesh Yadav et al., Predicting immunogenic tumour mutations by combining mass spectrometry and exome sequencing
Preclinical study | B16F10 melanoma | Mutant MHC class II epitopes drive therapeutic immune responses to cancer
Preclinical study | A2.DR1 sarcoma | A vaccine targeting mutant IDH1 induces antitumour immunity
Preclinical study | B16F10 melanoma | Exploiting the Mutanome for Tumor Vaccination
Phase I | Melanoma (stage III) | Beatriz M. Carreno et al., A dendritic cell vaccine increases the breadth and diversity of melanoma neoantigen-specific T cells
72. Neoantigen cancer vaccine
More than 20 gene therapy drugs have obtained FDA approval:
● Novartis - 83% (52/63) of patients achieved complete or partial remission
● Advaxis - targets hotspot mutations that commonly occur in specific cancer types. More than 10 drug candidates have been designed for different tumor types in the ADXS-HOT program
Cons of immunotherapy
● Autoimmune disease
● Very expensive