This document explores mapping variant call format (VCF) genomic data into resource description framework (RDF) triples. It presents examples of transforming VCF metadata, variant-level information, allele information, and genotype calls into RDF. It discusses challenges around materializing versus virtual transformations and questions whether the transformed data would be useful for querying to answer interesting biological questions.
ARM procedure calling conventions and recursion (Stephan Cadene)
◆ A portion of code within a larger program, often called:
a subroutine or procedure in imperative languages like C,
a method in OO languages like Java,
or a function in functional languages like Haskell
◆ Functions return a value, so some purists would say that a C function returning void is actually a procedure!
◆ Procedures are necessary for:
reducing duplication of code and enabling re-use
decomposing complex programs into manageable parts
◆ Procedures can call each other and can even call themselves
◆ What happens when we call a procedure?
The caller is suspended; control hands over to the callee
Callee performs the requested task
Callee returns control to the caller
The call stack is a data structure that stores information about active subroutines in a computer program. It keeps track of the point to which each subroutine should return control when finished. Subroutines can call other subroutines, resulting in information stacking up on the call stack. The call stack is composed of stack frames containing state information for each subroutine. Debuggers like GDB allow viewing the call stack to see how the program arrived at its current state.
On a dare, HFM expert and Oracle ACE Chris Barbieri takes a Mythbusters-style approach to debunking a series of potentially damaging changes to HFM applications to find out exactly "What would happen if I did...?"
This document discusses a modified coding technique to reduce peak-to-average power ratio (PAPR) in orthogonal frequency division multiplexing (OFDM) systems. It begins by introducing the PAPR problem in OFDM and existing solutions using Golay complementary sequences (GCS). It then presents definitions and theorems regarding recursive Golay complementary codes (RGCC) that can provide codewords with low PAPR. The paper proposes generating a 16-QAM OFDM signal from QPSK Golay complementary pairs using RGCC to achieve both higher coding rate and lower PAPR compared to existing techniques.
Safeguard Scientifics, Inc. (NYSE: SFE) has a distinguished track record of fostering innovation and building market leaders. For more than 60 years, Safeguard has been providing growth capital and operational support to entrepreneurs across an evolving spectrum of industries. Today, Safeguard is focused specifically on healthcare and technology. Safeguard is a proven partner for entrepreneurs looking to accelerate growth and build long-term value in their businesses. Investors in SFE have a unique opportunity to tap into the high potential of Safeguard's early- and growth-stage partner companies.
Syapse Presentation at Health2.0 Silicon Valley meetup, 7-16-13 (David Parpart, D.C.)
Syapse is a startup that aims to bring clinical omics data under control through a new data management platform. The platform enables the generation and use of multi-omics patient profiles from laboratories and healthcare providers to improve diagnosis and treatment. Syapse software includes products for hospitals and laboratories that integrate omics profiles, phenotypes, outcomes and medical knowledge to support clinical diagnostics workflows and precision medicine.
The document summarizes Dr. Matthieu-P. Schapranow's presentation at the Festival of Genomics in Boston on turning big medical data into precision medicine. It describes an in-memory database approach that enables real-time analysis of heterogeneous medical data sources. This allows clinicians and researchers to interactively explore patient data, clinical trials, pathways, and literature to obtain personalized treatment recommendations. The system was designed using a human-centered methodology to ensure usability, effectiveness, and feasibility for precision medicine applications.
The document discusses key considerations for big data aspects of precision medicine. It outlines 10 points regarding potential disparities in data capture, the importance of active participant involvement, ensuring enough statistical power through validation studies, exploiting network effects to connect multiple data sets while maintaining privacy, the need for third party access and usability of data, allowing imperfect initial data release, keeping data fresh over time, and leaving open questions for other researchers.
(HLS305) Transforming Cancer Treatment: Integrating Data to Deliver on the Pr... (Amazon Web Services)
In the past ten years, the cost of sequencing a human genome has fallen from $3 billion to $1,000, unlocking the ability for clinicians to use genomics in routine care. As the volume of genomic data used in the clinic begins to grow, healthcare providers are facing a number of new IT challenges, such as how to integrate this data with clinical data stored in electronic medical records, and how to make both available in real time to inform clinical decisions. In this session, find out how UCSF Medical Center and Syapse met these challenges head-on and solved them using AWS, all while remaining compliant with privacy and security requirements. Learn how Syapse's precision medicine platform uses Amazon VPC, Dedicated Instances, Amazon EC2, and Amazon EBS to build a high-performance, scalable, and HIPAA-compliant data platform that enables UCSF to deliver on the promise of precision medicine by dramatically reducing time and increasing the accuracy and utility of genomic profiling in cancer treatment.
Electronic Medical Records: From Clinical Decision Support to Precision Medicine (Kent State University)
This document discusses the transition from traditional clinical decision support using electronic medical records to precision medicine. It provides examples of how Cleveland Clinic has used electronic medical records to create registries for conditions like chronic kidney disease, develop predictive models, and power algorithms for precision treatment recommendations. The document envisions precision medicine relying on vast amounts of molecular, genomic, and patient-reported data integrated into clinical decision support.
The document outlines a potential partner pitch that includes sections on the organization, the customer problem being solved, the proposed solution and how a partnership could enhance it. It describes how the partnership would work, the underlying technology, a demonstration, competition if applicable, the management team, and next steps such as a trial period. The overall aim is to convince a potential partner to collaborate by explaining how the partnership could alleviate customer pain better than either could alone.
Opus Infosys is a software development and outsourcing company founded in 2007 and headquartered in Waltham, MA with an offshore development center in India. They are seeking partners to help acquire new clients in the US market by selling their services such as application development, software testing, and contract staffing. Partners will receive a predetermined percentage of profits generated by accounts they acquire as long as the clients continue using Opus Infosys services, with full transparency into financial information. The partnership model requires no investment and partners will benefit from Opus Infosys' goodwill and client portfolio.
The document describes Neev International, a company seeking to build mutually rewarding partnerships. It provides their website and discusses their vision of creating a space where people can bring ideas and help each other without expectation of monetary rewards. The company dreams of building a team to share visions and commitments. It commits to delivering value to clients by understanding their needs and delivering measurable benefits through proven solutions and methodologies.
The document discusses the benefits of exercise for mental health. Regular physical activity can help reduce anxiety and depression and improve mood and cognitive function. Exercise causes chemical changes in the brain that may help protect against mental illness and improve symptoms.
The slide deck we used to raise half a million dollars (Buffer)
This is the pitch deck we used to raise half a million dollars from angel investors. More here:
http://onstartups.com/tabid/3339/bid/98034/The-Pitch-Deck-We-Used-To-Raise-500-000-For-Our-Startup.aspx
The document discusses a new compiler architecture for the Dotty Scala Compiler (dsc) that takes inspiration from functional databases. The architecture treats all values as time-varying functions indexed by compilation phase. This allows the compiler to answer questions about program elements by looking up their meaning at a specific point in time. The core data types include time-indexed abstract syntax trees, types, references to declarations, and denotations, which capture the meaning of references. Caching is used to efficiently store and retrieve values across phases.
The document discusses the process of register allocation in compilers. It begins by establishing that registers are analogous to variable names in programs, as both hold values. It then explains that the goal of register allocation is to minimize data movement between registers and memory. The rest of the document dives deeper into models that track variable locations, the lifetimes of variables, how models are transformed through the allocation process, techniques like spilling and splitting variables between registers and memory, and opportunities to formalize and verify aspects of register allocation.
This document provides an overview of ElasticSearch, an open source, distributed, RESTful search and analytics engine. It discusses how ElasticSearch is highly available, distributed across shards and replicas, and can be deployed in the cloud. Examples are provided showing how to index and search data via the REST API and retrieve cluster health information. Advanced features like faceting, scripting, parent/child relationships, and versioning are also summarized.
Neuroscience core lecture given at the Icahn School of Medicine at Mount Sinai. This is version 2 of the same topic. I have made some modifications to give a gentler introduction and add a new example for ngs.plot.
The document discusses next generation sequencing methods and RNA sequencing. It covers topics like sequencing formats, data analysis workflows including mapping, clustering, assembly programs, finding new genes and correcting existing ones. It discusses input file types, calculating sequencing depth, available tools for alignment, output file formats, assembly programs, splice junction prediction, and applications of RNA sequencing like gene expression analysis and annotation.
Next-generation sequencing format and visualization with ngs.plot (Li Shen)
Lecture given at the Department of Neuroscience, Icahn School of Medicine at Mount Sinai. ngs.plot has been published in BMC Genomics. Link: http://www.biomedcentral.com/1471-2164/15/284
RNA sequencing analysis tutorial with NGS (HAMNAHAMNA8)
This document provides an overview of RNA-seq data analysis. It discusses quality control of sequencing data using tools like FastQC, mapping reads to a reference genome or transcriptome using aligners like BWA and TopHat, and summarizing reads using counting tools to obtain read counts for each gene. These counts can then be used to estimate gene expression levels and perform differential expression analysis to identify genes with different expression between samples or conditions.
The 8086 instruction set consists of the following groups: data transfer instructions (move, copy, load, exchange, input, and output); arithmetic instructions (add, subtract, increment, decrement, convert byte/word, and compare); and logical instructions (AND, OR, exclusive OR, shift/rotate, and test).
Analytical Queries with Hive: SQL Windowing and Table Functions (DataWorks Summit)
Hive Query Language (HQL) is excellent for productivity and enables reuse of SQL skills, but falls short in advanced analytic queries. Hive's Map & Reduce scripts mechanism lacks the simplicity of SQL, and specifying new analyses is cumbersome. We developed SQLWindowing for Hive (SQW) to overcome these issues. SQW introduces both windowing and table functions to the Hive user. SQW appears as an HQL extension, with table functions and windowing clauses interspersed with HQL. This means the user stays within a SQL-like interface while simultaneously having these capabilities available. SQW has been published as an open source project. It is available as both a CLI and an embeddable jar with a simple query API. There are pre-built windowing functions for ranking, aggregation, navigation, and linear regression, and table functions for time series analysis, allocations, and data densification. Functions can be chained for more complex analysis. Under the covers, MR mechanics are used to partition and order data. The fundamental interface is the tableFunction, whose core job is to operate on data partitions. Function implementations are isolated from MR mechanics and focus purely on computation logic. Groovy scripting can be used for core implementation and for parameterizing behavior. Writing functions typically involves extending one of the existing abstract functions.
Incremental pattern matching in the VIATRA2 model transformation system (Istvan Rath)
Incremental pattern matching allows model transformations to update target models incrementally based on changes to the source model. The VIATRA model transformation system implements incremental pattern matching using a RETE network to efficiently retrieve matching sets as models change. Benchmark results show near-linear performance for sparse models and constant execution time for certain patterns. Future work includes improving construction algorithms and enabling event-driven live transformations.
1. NOCOPY is a compiler hint that can be used with OUT and IN OUT parameters to request to pass them by reference rather than by value for improved performance.
2. Report types include tabular, group left, group above, form-like, matrix, multimedia, mailing label, and OLE reports.
3. Anchors define the relative position of an object to the object it is anchored to and are used to determine vertical and horizontal positioning of child objects within parent objects.
The document discusses code generation in compilers. It covers:
- The main tasks of a code generator are instruction selection, register allocation and assignment, and instruction ordering.
- Code generators map an intermediate representation to target machine code for a specific architecture. This mapping can vary in complexity depending on the level of the intermediate representation and the target architecture.
- Key issues in code generator design include addressing in the target code, representing the program as basic blocks and flow graphs, register allocation, and selecting machine instructions to represent operations.
Deep Packet Inspection with Regular Expression Matching (Editor IJCATR)
Deep packet inspection directs, persists, filters, and logs IP-based application and Web services traffic based on content encapsulated in a packet's header or payload, regardless of the protocol or application type. In content scanning, the packet payload is compared against a set of patterns specified as regular expressions. With deep packet inspection in place through a single intelligent network device, companies can boost performance without buying expensive servers or additional security products. Regular expressions are typically matched through deterministic finite automata (DFAs), but large rule sets need an amount of memory that turns out to be too large for practical implementation. Many recent works have proposed improvements to address this issue, but they increase the number of transitions (and hence of memory accesses) per character. This paper presents a new representation for DFAs, orthogonal to most of the previous solutions, called delta finite automata (δFA), which considerably reduces states and transitions while preserving only one transition per character, thus allowing fast matching. A further optimization exploits Nth-order relationships within the DFA by adopting the concept of temporary transitions.
Slides for my talk at SkyCon'12 in Limerick.
Here I've squeezed four talks into one, covering a lot of ground quickly, so I've included links to more detailed presentations and other resources.
NGS Targeted Enrichment Technology in Cancer Research: NGS Tech Overview Webi... (QIAGEN)
This slide deck discusses the most biologically efficient, cost-effective method for successful NGS. The GeneRead DNA QuantiMIZE Kits enable determination of the optimum conditions for targeted enrichment of DNA isolated from biological samples, while the GeneRead DNAseq Panels V2 allow you to quickly and reliably deep sequence your genes of interest. Applications in translational and clinical research are highlighted.
XtremeDistil: Multi-stage Distillation for Massive Multilingual ModelsSubhabrata Mukherjee
Massive distillation of pre-trained language models like multilingual BERT with 35x compression and 51x speedup (98% smaller and faster) retaining 95% F1-score over 41 languages
This document summarizes data from the Genome in a Bottle Consortium SV Data Jamboree. It discusses several approaches to integrating structural variant calls from multiple technologies to establish a benchmark set of high-confidence SVs:
1) Finding deletions supported by multiple technologies with concordant breakpoints and filtering questionable calls. This approach identified 524 deletions supported by 2+ technologies ranging in size from 20bp to over 3kb.
2) A method called svcompare that compares SV calls across technologies and outputs a multi-sample VCF with variant details from each caller. This identified over 2,000 regions with structural variants called by multiple technologies.
3) A method called svviz that analyzes read support for
Here are some useful GDB commands for debugging:
- break <function> - Set a breakpoint at a function
- break <file:line> - Set a breakpoint at a line in a file
- run - Start program execution
- next/n - Step over to next line, stepping over function calls
- step/s - Step into function calls
- finish - Step out of current function
- print/p <variable> - Print value of a variable
- backtrace/bt - Print the call stack
- info breakpoints/ib - List breakpoints
- delete <breakpoint#> - Delete a breakpoint
- layout src - Switch layout to source code view
- layout asm - Switch layout
Cassandra is a structured storage system designed for large amounts of data across commodity servers. It provides high availability with eventual consistency and scales incrementally without centralized administration. Data is partitioned across nodes and replicated for fault tolerance. Writes are applied locally and propagated asynchronously, prioritizing availability over consistency. It uses a gossip protocol for membership and failure detection.
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
Climate Impact of Software Testing at Nordic Testing DaysKari Kakkonen
My slides at Nordic Testing Days 6.6.2024
Climate impact / sustainability of software testing discussed on the talk. ICT and testing must carry their part of global responsibility to help with the climat warming. We can minimize the carbon footprint but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be added with sustainability, and then measured continuously. Test environments can be used less, and in smaller scale and on demand. Test techniques can be used in optimizing or minimizing number of tests. Test automation can be used to speed up testing.
GridMate - End to end testing is a critical piece to ensure quality and avoid...ThomasParaiso2
End to end testing is a critical piece to ensure quality and avoid regressions. In this session, we share our journey building an E2E testing pipeline for GridMate components (LWC and Aura) using Cypress, JSForce, FakerJS…
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionAggregage
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
Full-RAG: A modern architecture for hyper-personalizationZilliz
Mike Del Balso, CEO & Co-Founder at Tecton, presents "Full RAG," a novel approach to AI recommendation systems, aiming to push beyond the limitations of traditional models through a deep integration of contextual insights and real-time data, leveraging the Retrieval-Augmented Generation architecture. This talk will outline Full RAG's potential to significantly enhance personalization, address engineering challenges such as data management and model training, and introduce data enrichment with reranking as a key solution. Attendees will gain crucial insights into the importance of hyperpersonalization in AI, the capabilities of Full RAG for advanced personalization, and strategies for managing complex data integrations for deploying cutting-edge AI solutions.
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofsAlex Pruden
This paper presents Reef, a system for generating publicly verifiable succinct non-interactive zero-knowledge proofs that a committed document matches or does not match a regular expression. We describe applications such as proving the strength of passwords, the provenance of email despite redactions, the validity of oblivious DNS queries, and the existence of mutations in DNA. Reef supports the Perl Compatible Regular Expression syntax, including wildcards, alternation, ranges, capture groups, Kleene star, negations, and lookarounds. Reef introduces a new type of automata, Skipping Alternating Finite Automata (SAFA), that skips irrelevant parts of a document when producing proofs without undermining soundness, and instantiates SAFA with a lookup argument. Our experimental evaluation confirms that Reef can generate proofs for documents with 32M characters; the proofs are small and cheap to verify (under a second).
Paper: https://eprint.iacr.org/2023/1886
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
UiPath Test Automation using UiPath Test Suite series, part 5DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 5. In this session, we will cover CI/CD with devops.
Topics covered:
CI/CD with in UiPath
End-to-end overview of CI/CD pipeline with Azure devops
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfMalak Abu Hammad
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
#MongoDB #VectorSearch #AI #SemanticSearch #TechInnovation #DataScience #LLM #MachineLearning #SearchTechnology
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
VCF and RDF
1. VCF and RDF
Jeremy J Carroll
April 3rd, 2013
Prepared for W3C Clinical Genomics Task Force
www.Syapse.com
jjc@Syapse.com
Copyright 2013 Syapse Inc.
2. Contents
• Background
– Business Background
– Goal
– VCF and RDF
• This specific work (exploration)
– Stats
– VCF example
– Lots of example transforms: VCF and RDF
• Discussion: might the examples meet the
business goals?
3. Background
• Syapse Discovery provides an end-to-end solution for […] laboratories deploying next-generation sequencing-based diagnostics, […] configurable semantic data structure enables users to bring omics data together with traditional medical information […]
• This work: initial exploration of VCF import, what useful information is in a VCF, etc.
4. Success = Useful Advanced Query
Over Genomic Information
Patient: Disease = Breast Cancer
Report: Gene = V-ERB-B2 AVIAN
Tissue: Biopsy Site = Lobe of Liver
5. VCF, a Web Based Knowledge Format
• Variant Call Format
• Used for exchange from one system to
another
• Used for exchange from one lab to another
• Used for exchange between universities and
the wider community
6. RDF, a Web Based Knowledge Format
• Resource Description Framework
• Used for exchange from one system to
another
• Used for exchange from one lab to another
• Used for exchange between universities and
the wider community
7. VCF vs RDF
• VCF: efficient representation of huge
quantities of data
• RDF: flexible representation with excellent
interoperability and query
8. Materialized or Virtual?
• Materialized = Transform data:
Extract Transform Load into RDF
• Virtual = Transform query:
Runtime mapping
• Or some combination: transform VCF into
some internal format, transform SPARQL
queries into a query over internal format
9. This Work
• Initial mapping from VCF to RDF
• Materialized for simplicity
• Key questions:
– Is this useful?
• Would query answer interesting questions?
– Scale?
• Can we materialize? Should we go virtual?
10. Known limitations
• Lots not addressed ….
• Too many literals not enough URIs
• Slow
• Hoped to be fit for purpose, i.e. answering questions, not production code (scale and utility, see previous slide)
11. Some Stats
• Working with Chromosome 7 from 1000G:
ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/release/20110521/ALL.chr7.phase1_release_v3.20101123.snps_indels_svs.genotypes.vcf.gz
• 704,243 Variants, 1,092 samples
– 689,034,445 ref calls, 79,998,911 variant calls
• H/W: new MacBook Pro, 2.3 GHz Intel Core i7, 8 GB RAM
• Gunzip: 1m37s
• PyVCF: 2h41m
• VCF2TTL: 37h57m
• serdi TTL 2 Ntriples: 13h57m
• triples: 14,780,932,912 (1.2 trillion bytes)
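The ref and variant call counts above line up exactly with one diploid genotype call per sample per variant, which makes a back-of-envelope check on the other figures easy. Pure arithmetic, using only the numbers on this slide:

```python
# All figures are taken from the stats slide above.
triples = 14_780_932_912
vcf2ttl_seconds = 37 * 3600 + 57 * 60   # VCF2TTL wall time: 37h57m
variants = 704_243
samples = 1_092
calls = 689_034_445 + 79_998_911        # ref calls + variant calls

# One genotype call per sample per variant: the counts agree exactly.
assert calls == variants * samples

print(f"triples per genotype call: {triples / calls:.1f}")
print(f"VCF2TTL throughput: {triples / vcf2ttl_seconds:,.0f} triples/sec")
```

At roughly 19 triples per genotype call, the 14.8-billion-triple total follows directly from the call count, which is the heart of the materialize-or-virtual question raised earlier.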
12. VCF Overview
Metadata
##fileformat=VCFv4.0
##fileDate=20090805
##source=myImputationProgramV3.1
##reference=1000GenomesPilot-NCBI36
##phasing=partial
##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency">
##INFO=<ID=AA,Number=1,Type=String,Description="Ancestral Allele">
##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP membership, build 129">
##INFO=<ID=H2,Number=0,Type=Flag,Description="HapMap2 membership">
##FILTER=<ID=q10,Description="Quality below 10">
##FILTER=<ID=s50,Description="Less than 50% of samples have data">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FORMAT=<ID=HQ,Number=2,Type=Integer,Description="Haplotype Quality">
(The ##INFO/##FORMAT lines form the T-Box: Properties, Classes, Domains, Ranges)
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA00001 NA00002
20 14370 rs6054257 G A 29 PASS NS=3;DP=14;AF=0.5;DB;H2 GT:GQ:DP:HQ 0|0:48:1:51,51 1|0:48:8:51,51
20 17330 . T A 3.0 q10 NS=3;DP=11;AF=0.017 GT:GQ:DP:HQ 0|0:49:3:58,50 0|1:3:5:65,3
20 1110696 rs6040355 A G,T 1e+03 PASS NS=2;DP=10;AF=0.333,0.667;AA=T;DB GT:GQ:DP:HQ 1|2:21:6:23,27 2|1:2:0:18,2
20 1230237 . T . 47 PASS NS=3;DP=13;AA=T GT:GQ:DP:HQ 0|0:54:7:56,60 0|0:48:4:51,51
20 1234567 microsat1 GTCT G,GTACT . PASS NS=3;DP=9;AA=G GT:GQ:DP ./.:35:4 0/2:17:2
(Columns, left to right: Per Variant Information | Per Alternative Allele Information | Per Sample, Per Variant Call)
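These ##INFO and ##FORMAT declarations are exactly the material a transform can mine for the T-Box. A minimal sketch, assuming an invented vcf: namespace and a simple VCF-Type-to-XSD mapping (illustrative only, not the actual VCF2TTL code):

```python
import re

# Matches ##INFO=<...> and ##FORMAT=<...> metadata lines.
META_RE = re.compile(
    r'##(INFO|FORMAT)=<ID=([^,]+),Number=([^,]+),'
    r'Type=([^,]+),Description="([^"]*)">'
)

# Assumed mapping from VCF Type to XML Schema datatypes.
XSD = {"Integer": "xsd:integer", "Float": "xsd:double",
       "String": "xsd:string", "Flag": "xsd:boolean"}

def metadata_to_turtle(lines):
    """Yield one Turtle property declaration per ##INFO/##FORMAT line."""
    for line in lines:
        m = META_RE.match(line)
        if not m:
            continue
        kind, ident, number, vtype, desc = m.groups()
        yield (f'vcf:{kind}_{ident} a rdf:Property ;\n'
               f'    rdfs:label "{ident}" ;\n'
               f'    rdfs:comment "{desc}" ;\n'
               f'    rdfs:range {XSD.get(vtype, "xsd:string")} .')

header = [
    '##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">',
    '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
]
for decl in metadata_to_turtle(header):
    print(decl)
```

Note that Number=A (one value per alternative allele) is exactly the kind of convention this sketch ignores; as later slides argue, it determines whether a property belongs on the Variant or on the Allele.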
13. Key Classes
• Variant: a position in the reference genome, where mutation may happen, and related information.
• Allele: a possible sequence of bases at some variant, and related information.
• Sample: a related set of Variant Calls.
• Variant Call: a selection of two Alleles (diploid) at a Variant (can be phased or unphased).
14. A Variant
#CHROM POS ID REF ALT QUAL
20 1110696 rs6040355 A G,T 1e+03
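A sketch of how such a row could become a Variant node with one Allele per ALT value. The vcf: vocabulary, the example.org URI scheme, and the blank-node shape are all invented for illustration; missing ('.') QUAL fields are not handled:

```python
def variant_to_turtle(row):
    """Turn one VCF data row (whitespace-separated) into Turtle triples."""
    chrom, pos, ident, ref, alt, qual = row.split()[:6]
    node = f'<http://example.org/variant/{chrom}_{pos}>'  # assumed URI scheme
    triples = [
        f'{node} a vcf:Variant ;',
        f'    vcf:chromosome "{chrom}" ;',
        f'    vcf:position {pos} ;',
        f'    vcf:reference "{ref}" ;',
        f'    vcf:quality {float(qual)} ;',
    ]
    if ident != '.':                      # '.' means no identifier
        triples.append(f'    vcf:dbSNP "{ident}" ;')
    # One blank-node Allele per alternative base sequence.
    for i, bases in enumerate(alt.split(','), start=1):
        triples.append(
            f'    vcf:allele [ a vcf:Allele ; vcf:index {i} ; vcf:bases "{bases}" ] ;')
    triples[-1] = triples[-1].rstrip(';') + '.'
    return '\n'.join(triples)

print(variant_to_turtle("20 1110696 rs6040355 A G,T 1e+03"))
```

Giving each Allele its own node is what lets per-allele INFO values such as AF attach in the right place.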
15. INFO about a Variant
Same node as on the previous slide, selecting different triples for display.
The DB property should probably be a class (dbSNP membership); AF (allele frequency) is not a property of the variant but of the allele.
#CHROM POS INFO
20 1110696 NS=2;DP=10;AF=0.333,0.667;AA=T;DB
16. INFO about Alleles
Same node as on the previous slide, selecting different triples for display.
#CHROM POS REF ALT INFO
20 1110696 A G,T AF=0.333,0.667;AA=T;DB
17. Detail about a Variant Call
Same variant node as on the previous slide; HQ is mismodeled.
#CHROM POS REF ALT FORMAT NA00001
20 1110696 A G,T GT:GQ:DP:HQ 1|2:21:6:23,27
18. A Phased Diploid Genotype Call: rdf:Seq
The | indicates a phased genotype, with consistent ordering between VCF rows; rdf:Seq indicates that order is significant.
This call is heterozygous.
The VCF concept of phase set is not addressed.
#CHROM POS REF ALT FORMAT NA00001
20 1110696 A G,T GT:GQ:DP:HQ 1|2:21:6:23,27
19. Unphased Diploid Genotype Call: rdf:Bag
The / indicates an unphased genotype; rdf:Bag indicates that order is not significant. rdf:Bag is also used for homozygous genotypes.
The 0/2 indicates use of the reference allele and the 2nd alternative.
This is a different variant from the previous slides.
#CHROM POS REF ALT FORMAT NA00002
20 1234567 GTCT G,GTACT GT:GQ:DP 0/2:17:2
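The Seq-versus-Bag choice on this slide and the previous one reduces to a single test on the GT separator. A sketch, with the output shape and allele handling invented for illustration:

```python
def genotype_to_turtle(gt, ref, alts):
    """Map a GT value like '1|2' (phased) or '0/2' (unphased) to an
    rdf:Seq or rdf:Bag of allele base strings."""
    phased = '|' in gt
    container = 'rdf:Seq' if phased else 'rdf:Bag'   # Seq: order significant
    alleles = [ref] + alts                           # index 0 = reference
    members = []
    for n, idx in enumerate(gt.split('|' if phased else '/'), start=1):
        bases = '?' if idx == '.' else alleles[int(idx)]  # '.' = no call
        members.append(f'    rdf:_{n} "{bases}" ;')
    return '\n'.join([f'[ a {container} ;'] + members + ['] .'])

# Phased heterozygous call from the previous slide -> rdf:Seq
print(genotype_to_turtle('1|2', 'A', ['G', 'T']))
# Unphased call from this slide -> rdf:Bag, reference + 2nd alternative
print(genotype_to_turtle('0/2', 'GTCT', ['G', 'GTACT']))
```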
20. Genotype Likelihoods 1000G
#CHROM POS REF ALT FORMAT NA11829
7 16161 A G GT:DS:GL 0|1:1.300:-0.36,-0.43,-0.71
Notes: _1 and _2 hide one another; id_2397 is a variant of id_23830.
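Unpacking such a sample column is mechanical once the FORMAT string is zipped against the values. A sketch with an invented helper name; the reading of GL as log10 genotype likelihoods for 0/0, 0/1, 1/1 at a biallelic site follows VCF conventions:

```python
def parse_sample(fmt, value):
    """Zip a FORMAT string against one sample column and decode GT, DS, GL."""
    fields = dict(zip(fmt.split(':'), value.split(':')))
    # GL: log10 likelihoods for genotypes 0/0, 0/1, 1/1 (biallelic site).
    gl = [float(x) for x in fields['GL'].split(',')]
    likelihoods = {'0/0': 10 ** gl[0], '0/1': 10 ** gl[1], '1/1': 10 ** gl[2]}
    return fields['GT'], float(fields['DS']), likelihoods

gt, dosage, lik = parse_sample('GT:DS:GL', '0|1:1.300:-0.36,-0.43,-0.71')
print(gt, dosage)                 # the phased call and its allele dosage
print(max(lik, key=lik.get))      # genotype with the highest likelihood
```

Note how close the likelihoods are in this row; whether such soft values are worth materializing as triples is exactly the "is this useful?" question on the next slide.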
21. Discussion Points
• How much of this is actually useful?
– Maybe just artifacts of the history of the call?
– Maybe just joins that we could/should recompute on
the fly
– Maybe much of the data is fundamentally
uninteresting
• Do we know the queries? If so, can we represent the data in a more compact way and optimize for those queries?
22. Further observations
• Metadata is a mess, not actually reusable by machine, e.g. why a file date but not a sample date … provokes the desire to show the rationale for the file, but not the performance
• Extensibility of the T-Box allows English text to introduce new syntax (e.g. enumerations, per alt allele and reference). This is not a good idea.
• The per-genotype-call info is only defined for unphased data, yet the call can be phased
Editor's Notes
VCF2TTL: real 2278m26.935s, user 2269m23.479s, sys 8m2.282s
time serdi -i turtle -o ntriples /tmp/pipe | time wc
wc output: 14780932912 59123731783 1211334403999 (lines, words, bytes)
wc time: 136699.18 real, 4809.46 user, 628.36 sys
serdi time: real 2278m19.206s, user 807m12.920s, sys 29m30.467s