A topic I presented at the Cloud and DevOps stream at TechXLR8 London 2017. The talk covers using Google Cloud, Kubernetes, Docker and cloud functions to create a managed distributed compute infrastructure that generates synthetic genomic data for simulation and infrastructure-testing needs.
Next generation sequencing: research opportunities and bioinformatic challenges. A seminar I gave for the Computational Life Science (Univ. of Oslo) seminar series, March 2, 2011
NGS Targeted Enrichment Technology in Cancer Research: NGS Tech Overview Webi..., by QIAGEN
This slide deck discusses the most biologically efficient, cost-effective method for successful NGS. The GeneRead DNA QuantiMIZE Kits enable determination of the optimum conditions for targeted enrichment of DNA isolated from biological samples, while the GeneRead DNAseq Panels V2 allow you to quickly and reliably deep-sequence your genes of interest. Applications in translational and clinical research are highlighted.
The Transformation of Systems Biology Into A Large Data Science, by Robert Grossman
This is a talk I gave at the Institute for Genomics & Systems Biology (IGSB) on December 7, 2009. The talk looks at the role of cloud computing platforms, including private clouds, in managing the large data volumes produced by next generation sequencing platforms.
VariantSpark: applying Spark-based machine learning methods to genomic inform..., by Denis C. Bauer
Genomic information is increasingly used in medical practice, giving rise to the need for efficient analysis methodology able to cope with thousands of individuals and millions of variants. Here we introduce VariantSpark, which utilizes Hadoop/Spark along with its machine learning library, MLlib, to provide parallelisation for population-scale bioinformatics tasks. VariantSpark interfaces with the standard variant format (VCF), offers seamless genome-wide sampling of variants and provides a pipeline for visualising results.
To demonstrate the capabilities of VariantSpark, we clustered more than 3,000 individuals with 80 million variants each to determine the population structure in the dataset. VariantSpark is 80% faster than the Spark-based genome clustering approach ADAM, as well as the comparable Hadoop/Mahout implementation and Admixture, a commonly used tool for determining individual ancestries. It is over 90% faster than traditional implementations in R and Python. These benefits in speed, resource consumption and scalability enable VariantSpark to open up the use of advanced, efficient machine learning algorithms on genomic data.
The package is written in Scala and available at https://github.com/BauerLab/VariantSpark.
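The Scala package itself is not shown here; as a conceptual sketch only (not VariantSpark's actual API), a plain-Python k-means over toy genotype vectors illustrates the clustering task the abstract describes — grouping individuals by their per-variant alternate-allele counts. All names and data below are illustrative.

```python
# Toy illustration of population clustering on genotype vectors,
# where each individual is a vector of 0/1/2 alternate-allele counts.
# VariantSpark does this at scale with Spark/MLlib; this is the bare idea.

def kmeans(points, k, iters=20):
    # deterministic init: spread initial centroids across the input
    centroids = [points[(i * (len(points) - 1)) // (k - 1)] for i in range(k)]
    for _ in range(iters):
        # assign each individual to the nearest centroid (squared Euclidean distance)
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            clusters[i].append(p)
        # recompute each centroid as the per-variant mean of its cluster
        centroids = [
            [sum(col) / len(c) for col in zip(*c)] if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# two synthetic "populations" with distinct allele profiles
pop_a = [[0, 0, 1, 0, 0] for _ in range(5)]
pop_b = [[2, 2, 1, 2, 2] for _ in range(5)]
centroids, clusters = kmeans(pop_a + pop_b, k=2)
sizes = sorted(len(c) for c in clusters)
print(sizes)  # each population lands in its own cluster: [5, 5]
```

The real system parallelises the distance computations and centroid updates across a Spark cluster, which is what makes the 3,000-individual, 80-million-variant workload tractable.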
Population-scale high-throughput sequencing data analysis, by Denis C. Bauer
Unprecedented computational capabilities and high-throughput data collection methods promise a new era of personalised, evidence-based healthcare, utilising individual genomic profiles to tailor health management, as demonstrated by recent successes in rare genetic disorders and stratified cancer treatments. However, processing genomic information at a scale relevant to the health system remains challenging due to high demands on data reproducibility and data provenance. Furthermore, the necessary computational requirements demand a large investment in compute hardware and IT personnel, which is a barrier to entry for small laboratories and difficult to maintain at peak times for larger institutes. This hampers the creation of reliable, timely production informatics environments for clinical genomics. Commercial cloud computing frameworks, such as Amazon Web Services (AWS), provide an economical alternative to in-house compute clusters, as they allow computation to be outsourced to third-party providers while retaining software and compute flexibility.
To cater to this resource-hungry, fast-paced yet sensitive environment of personalized medicine, we developed NGSANE, a Linux-based, HPC-enabled framework that minimises the overhead of setting up and processing new projects yet maintains full flexibility for custom scripting and data provenance when processing raw sequencing data, either on a local cluster or on Amazon’s Elastic Compute Cloud (EC2).
IDT provides a range of solutions for targeted next generation sequencing. Labs processing hundreds to thousands of samples can create highly uniform, custom panels using xGen® Lockdown Probes. The new xGen Acute Myeloid Leukemia (AML) panel is a predesigned set of Lockdown Probes that captures 260 genes identified by whole genome and exome sequencing of 200 patient samples. The AML panel can be used as stand-alone or customized with additional probes to detect other targets of interest.
Bioo Scientific - Improving the Performance of SureSelectXT2 Target Capture, by Bioo Scientific
Agilent’s SureSelectXT2 baits are popular options for target capture because they offer a wide range of predesigned baits and flexible customization options that allow users to design their own capture panels. Incorporating index-specific barcode blockers during library prep allows researchers to obtain a higher percentage of on-target reads and better coverage from their SureSelectXT2 target capture experiments. The NEXTflex™ Pre- and Post-Capture Combo Kit (Agilent SureSelectXT2 Compatible) incorporates index-specific barcode blockers, allowing researchers to get more useful data from their Agilent SureSelectXT2 Target Capture sequencing runs.
Improving exome sequencing, targeted sequencing, and low frequency variant de..., by Laura Berry
Presented in the NGS Tech & Applications Strand of the 4Bio Summit. To find out more, visit:
www.global-engage.com
In this presentation, Xiangyu Rao of Integrated DNA Technologies discusses the Illumina partnership for exome sequencing and other NGS products being developed at IDT.
How novel compute technology transforms life science research, by Denis C. Bauer
Unprecedented data volumes and pressure on turnaround time driven by commercial applications require bioinformatics solutions to evolve to meet these new demands. New compute paradigms and cloud-based IT solutions enable this transition. Here I present two solutions capable of meeting these demands: VariantSpark for genomic variant analysis, and GT-Scan2 for genome engineering applications.
VariantSpark classifies 3,000 individuals with 80 million genomic variants each in under 30 minutes. This Hadoop/Spark solution for machine learning on genomic data is hence capable of scaling up to population-size cohorts.
GT-Scan2 identifies CRISPR target sites by minimizing off-target effects and maximizing on-target efficiency. This optimization is powered by AWS Lambda functions, which offer an “always-on” web service that can instantaneously recruit enough compute resources to keep runtime stable even for queries with several thousand potential target sites.
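As a hedged illustration (not GT-Scan2's actual scoring model, which is more sophisticated), the per-candidate-site screening work reduces to a simple pure function: compare a guide sequence against each genomic site and keep near-matches as potential off-targets. Being embarrassingly parallel per site is what makes this kind of job a natural fit for fan-out onto serverless workers such as AWS Lambda. The sequences and thresholds below are invented for illustration.

```python
# Illustrative off-target screening by mismatch count (toy model only).

def mismatches(guide: str, site: str) -> int:
    """Count positionwise mismatches between a guide and a same-length site."""
    return sum(1 for a, b in zip(guide, site) if a != b)

def screen_sites(guide, sites, max_mismatches=3):
    """Return (site, mismatch_count) for sites close enough to be off-targets.
    Each call is independent, so sites can be sharded across many workers."""
    return [(s, m) for s in sites
            if (m := mismatches(guide, s)) <= max_mismatches]

guide = "GACGTTACCGGATTACCGGA"
sites = [
    "GACGTTACCGGATTACCGGA",  # perfect on-target match (0 mismatches)
    "GACGTAACCGGATTACCGGA",  # 1 mismatch: potential off-target
    "TTTTTTTTTTTTTTTTTTTT",  # unrelated sequence, filtered out
]
hits = screen_sites(guide, sites)
print(hits)  # keeps the 0- and 1-mismatch sites, drops the unrelated one
```

In a serverless deployment, each invocation would score one shard of candidate sites, which is how runtime can stay flat even for queries with thousands of sites.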
Presentation by Valerie Schneider discussing Genome Reference Consortium (GRC) plans for the mouse and zebrafish reference genome assemblies, presented at the 2016 meeting of The Allied Genetics Conference (TAGC). Includes a description of resources at the National Center for Biotechnology Information (NCBI) for working with reference genome assemblies.
Next-generation sequencing (NGS) has revolutionized the way we analyze diseases and commercial outfits such as Illumina, Helicos, QIAGEN and Pacific Biosciences have made significant contributions. In addition, the launch of direct-to-consumer genetic testing solutions has dramatically changed the way consumers access genomics data. Until a few years ago, the cost of sequencing was a major bottleneck. Recent developments have reduced the cost from thousands of dollars to a couple of cents per megabase. When did these changes start? What were the changes in the commercial sector in the last 15 years? This infographic is a timeline of the NGS commercial marketplace.
Timothy Dawes of Genentech and Elliot Hui of the University of California, Irvine share their well-received presentation from SLAS2017 in Washington, DC.
Adam Weinglass and Mary Jo Wildey from Merck & Co. share their winning presentation from SLAS2017 in Washington, DC. Join the conversation in the SLAS Screen Design and Assay Technology Special Interest Group LinkedIn group at https://www.linkedin.com/groups/3867725.
Course: Bioinformatics for Biomedical Research (2014).
Session: 2.3- Introduction to NGS Variant Calling Analysis.
Statistics and Bioinformatics Unit (UEB) & High Technology Unit (UAT) of Vall d'Hebron Research Institute (www.vhir.org), Barcelona.
Genome Simulation & Applications: Use of Managed Distributed Compute Infrastr..., by Nick Brown
Presentation by Hari Radhakrishnan (Solution Engineer) in my team at the Cloud & DevOps stream at TechXLR8 in London on 14th June 2017 (https://tmt.knect365.com/cloud-devops-world/ ). It provided a detailed overview of how we combined Google Cloud with Kubernetes, Docker and other cloud functions to create a managed distributed compute infrastructure that generates synthetic genomics data for simulation and testing needs. As part of the Technology Incubation Labs, it highlights the sort of technical project we drive to demonstrate what emerging technologies can offer to our internal business stakeholders.
Production Bioinformatics, emphasis on Production, by Chris Dwan
Production bioinformatics at Sema4 can be thought of as data ops - a peer to the lab ops organization. We operate 24/7 to deliver correct and timely results on NGS and other data for thousands of samples per week. This deck introduces the Prod BI organization and systems architecture with a focus on what it takes to run bioinformatics in production rather than for R&D or pure research.
Genomic Cytometry: Using Multi-Omic Approaches to Increase Dimensionality in ..., by Robert (Rob) Salomon
"Genomic Cytometry: Using Multi-Omic Approaches to Increase Dimensionality in Cytometry" was an invited tutorial given at the 2019 CYTO conference of the International Society for the Advancement of Cytometry on 22nd May 2019. The tutorial was recorded and we expect it will be converted to a CYTOU webinar in the near future.
This tutorial will begin by explaining why the emerging field of Genomic Cytometry (i.e. the measurement of cells using genomic techniques such as sequencing, in conjunction with more traditional cytometry techniques such as fluorescence, mass and imaging cytometry) is becoming a standard tool for biologists looking to unravel complex cellular processes and develop a deeper understanding of heterogeneity.
We will give a detailed overview of the various technologies that have allowed the emergence of Genomic Cytometry as well as those that continue to push the boundaries of cellular characterisation.
We will then provide a basic overview of the sequencing process, such that both research cytometrists and the staff of the cytometry SRL are better equipped to understand the downstream genomic component of Genomic Cytometry.
Finally, we will wrap up the session with case studies that illustrate the power of the genomic cytometry approach and will give a brief outline of where we feel the field needs to go as it matures. We expect attendees will gain a better understanding of 1) the rapidly maturing field of Genomic Cytometry and 2) how Genomic Cytometry should be leveraged into more traditional cytometry workflows.
EURISCO demo installations of IPT, at GBIF EU Nodes meeting in Alicante (11 M..., by Dag Endresen
Regional GBIF NODES meeting of Europe in March 2010. Presentation of current activities from the NordGen NODE. Implementations of the GBIF IPT toolkit for genebanks in Europe. Upgrade of selected genebanks from the BioCASE publishing toolkit to the IPT. First step of a larger implementation scheduled to start in 2011 as part of the EuroGeneBank application, pending an EU funding decision.
Next Generation Sequencing Informatics - Challenges and Opportunities, by Chung-Tsai Su
Genetic data is the foundation of precision medicine. Next Generation Sequencing (NGS) enables us to obtain our whole-genome data at an affordable cost. How can we process huge amounts of NGS data effectively?
This is a talk titled "Cloud-Based Services For Large Scale Analysis of Sequence & Expression Data: Lessons from Cistrack" that I gave at CAMDA 2009 on October 6, 2009.
Executing Provenance-Enabled Queries over Web Data, by eXascale Infolab
The proliferation of heterogeneous Linked Data on the Web poses new challenges to database systems. In particular, because of this heterogeneity, the capacity to store, track, and query provenance data is becoming a pivotal feature of modern triple stores. In this paper, we tackle the problem of efficiently executing provenance-enabled queries over RDF data. We propose, implement and empirically evaluate five different query execution strategies for RDF queries that incorporate knowledge of provenance. The evaluation is conducted on Web Data obtained from two different Web crawls (The Billion Triple Challenge, and the Web Data Commons). Our evaluation shows that using an adaptive query materialization execution strategy performs best in our context. Interestingly, we find that because provenance is prevalent within Web Data and is highly selective, it can be used to improve query processing performance. This is a counterintuitive result as provenance is often associated with additional overhead.
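A minimal sketch of the underlying idea, assuming nothing about the paper's actual triple-store implementation: annotate each triple with its source (making it a quad), and use the highly selective provenance value to prune the data before pattern matching. The quads and source names below are invented for illustration.

```python
# Toy provenance-enabled query over RDF-style quads:
# (subject, predicate, object, source).

quads = [
    ("alice", "knows", "bob",   "crawl:siteA"),
    ("alice", "age",   "34",    "crawl:siteB"),
    ("bob",   "knows", "carol", "crawl:siteA"),
    ("carol", "age",   "29",    "crawl:siteC"),
]

def query(quads, pattern, trusted_sources=None):
    """Match a (s, p, o) pattern, with None as a wildcard.
    Filtering on provenance first prunes most quads when it is selective,
    which is the performance effect the paper observes on Web Data."""
    if trusted_sources is not None:
        quads = [q for q in quads if q[3] in trusted_sources]
    s, p, o = pattern
    return [q for q in quads
            if (s is None or q[0] == s)
            and (p is None or q[1] == p)
            and (o is None or q[2] == o)]

# who does anyone "know", according to siteA only?
hits = query(quads, (None, "knows", None), trusted_sources={"crawl:siteA"})
print([q[2] for q in hits])  # ['bob', 'carol']
```

A production triple store would index the provenance column rather than scan it, but the ordering question (filter provenance before or after pattern matching, or materialize adaptively) is exactly the trade-off the five strategies in the paper explore.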
Design and evaluation of a genomics variant analysis pipeline using GATK Spar..., by Paolo Missier
A paper presented at the annual Italian Database conference (SEBD): http://sisinflab.poliba.it/sebd/2018/
Here is the paper: http://sisinflab.poliba.it/sebd/2018/papers/June-27-Wednesday/1-Big-Data/SEBD_2018_paper_23.pdf
UCSC's Biomolecular Department Eliminates I/O Bottleneck with Panasas, by Panasas
Slow I/O and downtime impacted the run times of the University of California, Santa Cruz's Genome Browser, a search tool used by scientists working to solve questions of the postgenomic era. They were searching for a storage solution that delivered high-performance random I/O to an exceptionally large number of cluster nodes and would allow them to focus solely on their tests instead of the systems running them.
FabSim: Facilitating computational research through automation on large-scale..., by Derek Groen
We present FabSim, a toolkit developed to simplify a range of computational tasks for researchers in diverse disciplines. FabSim is flexible, adaptable, and allows users to perform a wide range of tasks with ease. It also provides a systematic way to automate the use of resources, including HPC and distributed machines, and to make tasks easier to repeat by recording contextual information. To demonstrate this, we present three use cases where FabSim has enhanced our research productivity. These include simulating cerebrovascular bloodflow, modelling clay-polymer nanocomposites across multiple scales, and calculating ligand–protein binding affinities.
This poster is a summary of our open access paper in Computer Physics Communications, see http://dx.doi.org/10.1016/j.cpc.2016.05.020 . It was presented at the Solvay Symposium on Multiscale Modelling in Brussels in April 2016.
Similar to Genome simulation and applications (20)
Pushing the limits of ePRTC: 100ns holdover for 100 days, by Adtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
DevOps and Testing slides at DASA Connect, by Kari Kakkonen
Slides by me and Rik Marselis at the DASA Connect conference on 30.5.2024. We discuss what testing is, then what agile testing is, and finally what Testing in DevOps is. We also held a lovely workshop in which the participants tried to find different ways to think about quality and testing in different parts of the DevOps infinity loop.
Smart TV Buyer Insights Survey 2024, by 91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Climate Impact of Software Testing at Nordic Testing Days, by Kari Kakkonen
My slides at Nordic Testing Days, 6.6.2024.
The talk discusses the climate impact and sustainability of software testing. ICT and testing must carry their part of the global responsibility to help with climate warming. We can minimize our carbon footprint, but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be extended with sustainability and then measured continuously. Test environments can be used less, at smaller scale, and on demand. Test techniques can be used to optimize or minimize the number of tests. Test automation can be used to speed up testing.
UiPath Test Automation using UiPath Test Suite series, part 5, by DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series, part 5. In this session, we will cover CI/CD with DevOps.
Topics covered:
CI/CD within UiPath
End-to-end overview of a CI/CD pipeline with Azure DevOps
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor..., by Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
Elevating Tactical DDD Patterns Through Object Calisthenics, by Dorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
Essentials of Automations: The Art of Triggers and Actions in FME, by Safe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
GridMate - End to end testing is a critical piece to ensure quality and avoid..., by ThomasParaiso2
End to end testing is a critical piece to ensure quality and avoid regressions. In this session, we share our journey building an E2E testing pipeline for GridMate components (LWC and Aura) using Cypress, JSForce, FakerJS…
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -..., by DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor..., by SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo..., by James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfPeter Spielvogel
Building better applications for business users with SAP Fiori.
• What is SAP Fiori and why it matters to you
• How a better user experience drives measurable business benefits
• How to get started with SAP Fiori today
• How SAP Fiori elements accelerates application development
• How SAP Build Code includes SAP Fiori tools and other generative artificial intelligence capabilities
• How SAP Fiori paves the way for using AI in SAP apps
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
1. Genome (FASTQ and VCF) Simulation & Applications
Hariprasad Radhakrishnan
AstraZeneca, Technology Labs, UK
2. Genome Simulation
AstraZeneca
We are a global, science-led biopharmaceutical business pushing the boundaries of science to deliver life-changing medicines.
61,500 employees worldwide
$23bn 2016 revenue*
100+ countries
3. Our Experiment
• A quick introduction to DNA and DNA sequencing
• Synthetically generating genome data (FASTQ & VCF)
• How to scale and run distributed compute using Kubernetes and Docker
Hari works as an Associate Architect - Data & Analytics in the UK Tech Incubation Lab of AstraZeneca.
Genome Simulation
Introduction
7. In a perfect world (just your 3 billion letters): ~700 megabytes
In the real world, right off the genome sequencer: 100~200 gigabytes
For maybe 100,000+ samples: 10~20 petabytes
Human Genome Sequencing
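The figures on this slide follow from simple arithmetic, sketched below under stated assumptions: a bare genome packs 4 letters into 2 bits each, while raw FASTQ stores every base many times over (the coverage) alongside a quality character. The coverage and per-sample size defaults here are illustrative, not from the deck.

```python
GENOME_BASES = 3_000_000_000  # ~3 billion letters in a human genome

def packed_size_mb(bases: int = GENOME_BASES) -> float:
    """2 bits per base (4 letters), no metadata: the 'perfect world' figure."""
    return bases * 2 / 8 / 1e6           # ~750 MB, matching the slide's ~700 MB

def fastq_size_gb(bases: int = GENOME_BASES, coverage: int = 30,
                  bytes_per_base: int = 2) -> float:
    """Raw FASTQ: each base is read ~`coverage` times and stored with a
    quality character next to it (~2 bytes per sequenced base, ignoring
    read headers); 30x coverage gives ~180 GB, in the 100~200 GB range."""
    return bases * coverage * bytes_per_base / 1e9

def fleet_size_pb(samples: int = 100_000, gb_per_sample: int = 150) -> float:
    """Scale a mid-range per-sample size to 100,000 samples: ~15 PB."""
    return samples * gb_per_sample / 1e6
```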
9. • Be able to simulate high-throughput genome sequencing.
• The genomes generated can be used to test existing pipelines and infrastructure.
• If run in a distributed mode, it can generate sufficient data to create near-production scenarios; the data can be used to test ingestion, processing through the pipeline, and subsequent analytic tools.
• The data is synthetic, so there are no issues with privacy, patient de-identification, or transfer across regions.
Why Simulate Genomic Data
12. • VarSim was picked as the tool for genome simulation: it provides ways for variations to be introduced in a random fashion, so that the output FASTQ and VCF files are unique from each other.
• Other tools were considered, but they were either unmaintained or did not provide sufficient flexibility.
Tool Selection
13. References
John C. Mu, Marghoob Mohiyuddin, Jian Li, Narges Bani Asadi, Mark B. Gerstein, Alexej Abyzov, Wing H. Wong, and Hugo Y.K. Lam. VarSim: a high-fidelity simulation and validation framework for high-throughput genome sequencing with cancer applications. Bioinformatics, first published online December 17, 2014. doi:10.1093/bioinformatics/btu828
Summary:
VarSim is a framework for assessing alignment and variant calling accuracy in high-throughput genome sequencing through simulation or real data. In contrast to simulating a random mutation spectrum, it synthesizes diploid genomes with germline and somatic mutations based on a realistic model. This model leverages information such as previously reported mutations to make the synthetic genomes biologically relevant.
VarSim
14. • VarSim is a Python/Java-based tool that simulates one genome per run.
• We looked into ways to parallelize it to generate more genomes.
• Built a Docker container/image for the tool.
• Ran the experiment on Google Cloud Container Engine.
• Parameters like coverage, unique ID & seed value were externalized in a Lambda function that the Docker images could talk to and receive arguments from before execution.
• Output FASTQ files and VCFs are then stored in Cloud Storage.
• Ability to choose to generate FASTQ & VCF, or just VCF.
Technicalities
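The parameter hand-out described above can be sketched as a tiny serverless function that each container calls before starting, so every simulated genome gets a unique ID and seed. The deck does not show this code; the function name, ID format, and defaults below are all hypothetical.

```python
import itertools

# Hypothetical sketch of the parameter service: a monotonically increasing
# counter guarantees each container a distinct ID and seed, which in turn
# makes each simulated genome's variants unique.
_run_counter = itertools.count(1)

def next_run_params(coverage: int = 50, fastq: bool = True) -> dict:
    """Return the arguments one VarSim container should run with."""
    run_id = next(_run_counter)
    return {
        "unique_id": f"sim-{run_id:05d}",  # hypothetical naming scheme
        "seed": run_id,                    # distinct seed per genome
        "coverage": coverage,              # e.g. 50x, as in the cost slide
        "generate_fastq": fastq,           # FASTQ+VCF, or VCF only
    }
```

In the real setup this would sit behind an HTTP endpoint (the "Lambda function" on the slide); the core logic is just the unique counter.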
16. Using the Docker Registry on Google Cloud.
Given the size of the Docker image (4.8 GB), it made sense to take advantage of the high network speeds between servers on the cloud for quick deployment to the Container Engine.
Container Registry
17. Using the Container Registry on Google Cloud.
Can reach a max cluster size of 1,000.
Container Clusters
18. Configure the version of the Docker image to be deployed to the cluster. Kubernetes takes care of distributing the Docker image to all the instances in the cluster.
Kubernetes - Container Clusters
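The fan-out described here is the kind of thing a Kubernetes Job expresses: run the same image N times, with a bounded number of pods in parallel. A minimal sketch of such a manifest, built as a plain dict; the job name, registry path, and numbers are illustrative, not taken from the deck.

```python
def varsim_job_manifest(image_tag: str, completions: int, parallelism: int) -> dict:
    """Build a batch/v1 Job manifest that runs the simulation image
    `completions` times, at most `parallelism` pods at once."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": "varsim-sim"},          # hypothetical name
        "spec": {
            "completions": completions,              # total genomes to simulate
            "parallelism": parallelism,              # pods running concurrently
            "template": {
                "spec": {
                    "containers": [{
                        "name": "varsim",
                        # hypothetical registry path for the 4.8 GB image
                        "image": f"gcr.io/my-project/varsim:{image_tag}",
                    }],
                    "restartPolicy": "OnFailure",
                }
            },
        },
    }
```

Serialized to YAML and applied with kubectl, Kubernetes then handles pulling the image onto every node, as the slide says.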
20. • We have around 1,000 unique VCF files generated so far.
• We have around 10 genome FASTQ and VCF sets.
• More FASTQ and VCF if we can fund it.
Cost
• It cost $1,000 to generate 1,000 unique VCF files: $1 per VCF.
• Costs to generate FASTQ files vary based on the coverage required. For 50x coverage, the costs work out to around $5 for the FASTQ and VCF. The costs can be brought down by generating FASTQ files in multiple lanes.
Outcome
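The per-run figures above make batch costs easy to estimate; a quick sketch using the deck's own numbers ($1 per VCF-only run, ~$5 per FASTQ+VCF run at 50x coverage). Actual cloud pricing will vary.

```python
# Cost figures as reported in the deck, 2017 Google Cloud pricing.
VCF_ONLY_USD = 1.0
FASTQ_AND_VCF_50X_USD = 5.0

def batch_cost(n_samples: int, with_fastq: bool = False) -> float:
    """Estimated USD cost to simulate `n_samples` genomes."""
    per_sample = FASTQ_AND_VCF_50X_USD if with_fastq else VCF_ONLY_USD
    return n_samples * per_sample
```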
21. John C. Mu, Marghoob Mohiyuddin, Jian Li, Narges Bani Asadi, Mark B. Gerstein, Alexej Abyzov, Wing H. Wong, and Hugo Y.K. Lam. VarSim: a high-fidelity simulation and validation framework for high-throughput genome sequencing with cancer applications. Bioinformatics, first published online December 17, 2014. doi:10.1093/bioinformatics/btu828
Thanks to Daniel Bergqvist, Nico Gaviola & Craig Box from Google, and to Mathew Woodwark, Nick Brown, Rob Hernandez, Sandra Giuliani, and Frank Lombardi from AstraZeneca for supporting this work.
References
Thanks
The human body has about 100 trillion cells with more than 200 different cell types. Each cell harbors the same genetic information in its nucleus in form of DNA containing chromosomes
DNA
Deoxyribonucleic acid (DNA) is the chemical inside the nucleus of all cells that carries the genetic instructions for making living organisms. A DNA molecule consists of two strands that wrap around each other to resemble a twisted ladder. The sides are made of sugar and phosphate molecules. The "rungs" are made of nitrogen-containing chemicals called bases. Each strand is composed of one sugar molecule, one phosphate molecule, and a base. Four different bases are present in DNA: adenine (A), thymine (T), cytosine (C), and guanine (G). The particular order of the bases arranged along the sugar-phosphate backbone is called the DNA sequence; the sequence specifies the exact genetic instructions required to create a particular organism with its own unique traits.
Each strand of the DNA molecule is held together at its base by a weak bond. The four bases pair in a set manner: Adenine (A) pairs with thymine (T), while cytosine (C) pairs with guanine (G). These pairs of bases are known as Base Pairs (bp).
These Base Pairs (bp) are the basis of Y-chromosome testing.
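The pairing rule in the note above (A with T, C with G) means either strand fully determines the other; a minimal sketch:

```python
# Watson-Crick base pairing: each base has exactly one partner.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement_strand(sequence: str) -> str:
    """Return the base-paired partner strand for a DNA sequence."""
    return "".join(PAIR[base] for base in sequence.upper())
```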
VarSim is a Python/Java-based tool that simulates one genome per run.
We looked into ways to parallelize the process and run it simultaneously on multiple machines to generate more genomes.
We packaged the tool into a Docker container/image so it can be easily shipped and deployed in a cluster.
As we had some experience using Google Cloud, we decided to execute it in a cluster using a managed service called Container Engine (running Kubernetes). This allows us to spin up multiple machines (up to 500), deploy our Docker image (VarSim), and execute it.
Parameters like coverage, unique ID & seed value were externalized in a Lambda function that the Docker images could talk to and receive arguments from before execution.
Output FASTQ files and VCFs are then stored in Cloud Storage.
Ability to choose to generate FASTQ & VCF, or just VCF.
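Putting these notes together: each container fetches its parameters, runs the simulation, and writes its outputs to a bucket. The storage layout below is illustrative only; the deck does not specify object names, and the bucket and file-naming scheme are assumptions.

```python
def output_paths(bucket: str, unique_id: str, generate_fastq: bool) -> list:
    """Hypothetical Cloud Storage destinations for one simulated genome:
    always a VCF, plus a pair of FASTQ files (paired-end reads) on request."""
    paths = [f"gs://{bucket}/{unique_id}/{unique_id}.vcf"]
    if generate_fastq:
        # paired-end sequencing yields two FASTQ files per genome
        paths += [f"gs://{bucket}/{unique_id}/{unique_id}_{r}.fastq.gz"
                  for r in (1, 2)]
    return paths
```

With unique IDs handed out centrally, every container writes to its own prefix and no two runs collide.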