The document describes the CAMERA annotation pipelines and related infrastructure. It discusses the compute infrastructure including the CALIT2 compute grid and SOS cluster. It then describes the GOS/CAMERA ncRNA and ORF finding pipeline, including the rRNA finding pipeline, ORF calling, and tRNA extraction. It also discusses the GOS incremental protein clustering pipeline and the CAMERA annotation pipeline, including specifications and implementation. Finally, it provides thoughts on object-oriented design approaches for the annotation pipeline to support changing annotation rules and data sources over time.
This document discusses multithreading and multicore processors. It begins by explaining that instruction level parallelism is difficult to achieve for a single program, but that thread level parallelism exists when running multiple threads or programs simultaneously. It then covers different multithreading paradigms including coarse-grained and fine-grained multithreading as well as challenges with context switching. The document also discusses techniques for multicore processors including cache sharing and instruction fetching policies. It provides examples of commercial multicore chips and research prototypes.
BioMake is a language for specifying build networks of interdependent computational tasks. It allows defining targets with logical patterns that represent tasks. Targets have dependencies on other targets and are built by running actions. This allows automating sequencing analysis pipelines by specifying the execution of tasks like formatting databases and running BLAST alignments in a declarative way.
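The idea of targets, dependencies, and actions can be sketched in a few lines of Python. This is only an illustration of a make-style build network, not BioMake syntax; the `Target` class, the `build` function, and the target names `db.formatted` and `hits.blast` are hypothetical, and the actions merely record what they would do instead of running real tools.

```python
from typing import Callable, Dict, List


class Target:
    """A named task with dependencies and an action (hypothetical structure)."""

    def __init__(self, name: str, deps: List[str], action: Callable[[], None]):
        self.name = name
        self.deps = deps
        self.action = action


def build(name: str, targets: Dict[str, Target], done: List[str]) -> None:
    """Build all dependencies first (depth-first), then run the action once.

    No cycle detection is attempted in this sketch.
    """
    if name in done:
        return
    for dep in targets[name].deps:
        build(dep, targets, done)
    targets[name].action()
    done.append(name)


# The actions only append to a log; a real pipeline would invoke the tools.
log: List[str] = []
targets = {
    "db.formatted": Target("db.formatted", [], lambda: log.append("format database")),
    "hits.blast": Target("hits.blast", ["db.formatted"], lambda: log.append("run BLAST")),
}

order: List[str] = []
build("hits.blast", targets, order)
print(order)  # ['db.formatted', 'hits.blast']
```

Requesting only the final target is enough: the dependency on the formatted database is discovered and satisfied first, which is the declarative behavior the summary describes.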
Kernel exploits have recently become difficult to launch because DEP/NX (Data Execution Prevention/No-eXecute) prohibits the execution of malicious scripts or instructions. Return-oriented programming (ROP) is one alternative for bypassing this protection. However, building ROP gadgets is costly, and there is no guarantee that suitable gadgets can be assembled. To overcome this limitation, we introduce the Page Table Manipulation Attack (PTMA), which alters memory attributes by modifying the page table. This attack enables an attacker to rewrite the attributes of protected memory. We show how to find the page table entry of interest in the Master Kernel Page Table and modify its attributes on AArch32 and x86-64. The results show that PTMA effectively circumvents existing kernel exploitation defenses that are based on memory permissions.
This document discusses advanced Linux firewall configuration using Netfilter and Iptables. It begins with an introduction of the speaker and an overview of the topics to be covered, including packet processing, connection tracking, iptables rules and tables, iptables modules, and managing firewall rules for cloud environments. The document then delves into technical details like the sk_buff packet representation in Linux, the Netfilter packet flow, basic iptables usage, and differences between stateful and stateless firewalls.
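The difference between stateless and stateful filtering can be illustrated with a toy model. The sketch below is not Netfilter code and does not use iptables; the `Packet` and `StatefulFilter` names are hypothetical, and connection tracking is reduced to remembering the 4-tuple of each permitted outbound packet.

```python
from typing import NamedTuple, Set, Tuple


class Packet(NamedTuple):
    src: str
    sport: int
    dst: str
    dport: int


class StatefulFilter:
    """Toy connection tracker: inbound packets pass only as replies."""

    def __init__(self, allowed_out_ports: Set[int]):
        self.allowed = set(allowed_out_ports)
        self.conns: Set[Tuple[str, int, str, int]] = set()

    def outbound(self, p: Packet) -> bool:
        # Stateless part: a static rule on the destination port.
        if p.dport in self.allowed:
            # Stateful part: remember the connection for later replies.
            self.conns.add((p.src, p.sport, p.dst, p.dport))
            return True
        return False

    def inbound(self, p: Packet) -> bool:
        # A reply carries the tracked 4-tuple with both ends swapped.
        return (p.dst, p.dport, p.src, p.sport) in self.conns


fw = StatefulFilter({443})
request = Packet("10.0.0.5", 50000, "93.184.216.34", 443)
reply = Packet("93.184.216.34", 443, "10.0.0.5", 50000)
unsolicited = Packet("203.0.113.9", 443, "10.0.0.5", 50001)

print(fw.outbound(request), fw.inbound(reply), fw.inbound(unsolicited))
```

A purely stateless filter would have to accept or reject the reply and the unsolicited packet with the same static rule, since both arrive from source port 443; the tracked state is what tells them apart.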
This document summarizes Martin Geisler's presentation on using Python in Mercurial. It discusses:
1) How Mercurial uses Python for its rapid prototyping abilities and clean syntax which helps contributions.
2) How Mercurial speeds up startup time by using demandimport to lazily load modules, reducing imported modules from 305 to 69.
3) How Mercurial optimizes performance through efficient data structures like storing revisions sequentially and maintaining file ordering, as well as rewriting critical parts in C.
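Lazy module loading of the kind described in point 2 can be approximated with the Python standard library. The sketch below is not Mercurial's demandimport; it follows the `importlib.util.LazyLoader` recipe from the Python documentation, and the helper name `lazy_import` and the choice of `difflib` as the example module are assumptions made for illustration.

```python
import importlib.util
import sys


def lazy_import(name: str):
    """Return a module object whose code runs on first attribute access."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot find module {name!r}")
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # Only arms the lazy module; nothing runs yet.
    return module


difflib = lazy_import("difflib")
# The module body is executed here, on the first attribute lookup.
matcher = difflib.SequenceMatcher(None, "abcd", "abcd")
print(matcher.ratio())  # 1.0
```

The startup saving comes from modules that are imported at the top of a file but never touched on a given code path: with lazy loading their bodies are never executed.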
Writing an Ostinato Protocol Builder [FOSDEM 2021] (pstavirs)
How to add more protocols to the Ostinato traffic generator.
While the Ostinato traffic generator can import, edit and replay packets from PCAP files, most users prefer to craft packets from scratch using the Ostinato GUI which has support for common protocols out of the box. To add more protocols quickly and easily, Ostinato has a Protocol Builder framework using which new protocols can be added.
In this talk, Ostinato creator Srivats P shows you how to add a new protocol using this framework.
This document provides an overview of deep sequencing data analysis. It discusses sequencing technologies like Ion Torrent and Illumina, library preparation, alignment, and common file formats. It also demonstrates commands for quality control like FastQC, alignment with Bowtie, and working with the output files including SAM, BAM, pileup formats. Next steps discussed are accessing the Galaxy analysis framework and server to perform an NGS analysis project.
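To make the SAM output format more concrete, here is a minimal sketch that parses the eleven mandatory tab-separated fields of one alignment line and checks two FLAG bits. The `SamRecord` type, the helper names, and the example read are made up for illustration; optional tags after the eleventh field are ignored, and real projects would normally use an established library rather than hand-written parsing.

```python
from typing import NamedTuple


class SamRecord(NamedTuple):
    qname: str  # read name
    flag: int   # bitwise flags
    rname: str  # reference sequence name
    pos: int    # 1-based leftmost mapping position
    mapq: int   # mapping quality
    cigar: str
    rnext: str
    pnext: int
    tlen: int
    seq: str
    qual: str


def parse_sam_line(line: str) -> SamRecord:
    """Parse the mandatory fields of a SAM alignment line (not a header line)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line has at least 11 fields")
    qname, flag, rname, pos, mapq, cigar, rnext, pnext, tlen, seq, qual = fields[:11]
    return SamRecord(qname, int(flag), rname, int(pos), int(mapq),
                     cigar, rnext, int(pnext), int(tlen), seq, qual)


def is_unmapped(rec: SamRecord) -> bool:
    return bool(rec.flag & 0x4)   # 0x4: segment unmapped


def is_reverse(rec: SamRecord) -> bool:
    return bool(rec.flag & 0x10)  # 0x10: read maps to the reverse strand


# Made-up example read, not taken from the presentation.
example = "read1\t16\tchr1\t100\t42\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII"
rec = parse_sam_line(example)
print(rec.rname, rec.pos, is_reverse(rec), is_unmapped(rec))  # chr1 100 True False
```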
cReComp is an automated design tool that improves the productivity of developing ROS-compliant FPGA components. It generates a component-oriented interface that enables communication between FPGA hardware and ROS software. By describing a user logic circuit and configuration in simple files, cReComp can create the hardware interface circuit, ROS application code, and ROS message files to build a complete ROS-compliant FPGA component in less than an hour, significantly improving development time and productivity over manual design. An evaluation experiment showed that cReComp reduced the time and lines of code required for componentization compared to manual development.
Lec7 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Dynamic Sch... (Hsien-Hsin Sean Lee, Ph.D.)
This document summarizes a lecture on dynamic scheduling and the Tomasulo algorithm. It begins with an overview of dynamic scheduling and out-of-order execution. It then describes the Tomasulo algorithm used in the floating-point unit of the IBM 360/91, which introduced reservation stations, register renaming, and a common data bus to enable out-of-order execution. Examples illustrate how the algorithm handles the register dependencies RAW, WAR, and WAW.
This document discusses ROS (Robot Operating System) integration with FPGAs. It introduces cReComp, a creator for reusable FPGA components that allows developers to integrate FPGA hardware accelerators with ROS nodes through a standardized interface. cReComp components provide hardware acceleration while still being accessible from ROS applications through message passing. Evaluation results show that cReComp reduces development time and effort for ROS-FPGA systems compared to other approaches.
The presentation addresses the most typical issues during network software development and testing, explains the causes and suggests solutions:
- overlapping IP networks
- invalid netmasks
- incomplete routing configuration
- incorrect local MAC addresses
- unidirectional packet generator and unicast flood
- disabled ethernet auto negotiation
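The first two issues in the list, overlapping IP networks and invalid netmasks, can be checked programmatically. The sketch below uses Python's standard `ipaddress` module; the helper names and the example addresses are assumptions chosen for illustration and do not come from the presentation.

```python
import ipaddress


def networks_overlap(a: str, b: str) -> bool:
    """True if the two networks share at least one address."""
    return ipaddress.ip_network(a).overlaps(ipaddress.ip_network(b))


def is_valid_netmask(mask: str) -> bool:
    """True if `mask` is accepted as an IPv4 mask by the ipaddress module.

    Note: the module also accepts host masks such as 0.0.0.255, so this
    check rejects non-contiguous masks but not that alternative notation.
    """
    try:
        ipaddress.ip_network(f"0.0.0.0/{mask}")
        return True
    except ValueError:
        return False


print(networks_overlap("10.0.0.0/8", "10.1.0.0/16"))        # True
print(networks_overlap("192.168.1.0/24", "192.168.2.0/24"))  # False
print(is_valid_netmask("255.255.255.0"))                     # True
print(is_valid_netmask("255.0.255.0"))                       # False
```

Running such a check over the addresses assigned to test interfaces before a test session is a cheap way to catch the first two problems in the list.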
Modern CPUs use various techniques to improve performance such as instruction pipelining, cache memory, superscalar execution, out-of-order execution, speculative execution, and branch prediction. However, these optimizations can introduce security vulnerabilities like Spectre and Meltdown attacks which exploit side effects of speculative execution in the CPU cache to leak secret data from memory. Speculative execution may process instructions early before branch resolution, potentially loading secret data into the cache where an attacker can detect it using precise timing measurements. While fixes have been developed, fully mitigating these issues remains an ongoing challenge for CPU architecture.
True stories on the analysis of network activity using Python (delimitry)
The document discusses network packet analysis using Python. It provides an overview of network analysis tools like Wireshark and tcpdump, and how to use them to analyze network traffic captured in a pcap file. It also discusses how to create and send network packets using Scapy for tasks like port scanning, and how to filter network traffic using IPv4/IPv6 packet filters like iptables. The document provides examples of summarizing pcap data and crafting network packets for various protocols.
The document discusses the Tomasulo algorithm, which enables out-of-order execution of instructions in computer processors. It does this through three key mechanisms: common data busing, register tagging, and reservation stations. This allows independent instructions to execute out of order while preserving dependencies. The algorithm tracks dependencies through register tags rather than physical registers, allowing overlapping of dependent instructions by forwarding values via the common data bus. This decouples dependency tracking from instruction decoding and dispatch, improving parallelism.
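The tag-based dependency tracking described above can be modeled with a small simulation. This is a simplified teaching sketch, not the IBM 360/91 design: it has no timing, no functional-unit limits, and only two operations, and the `Tomasulo` class, the `RSn` tag format, and the register names are hypothetical.

```python
import operator
from dataclasses import dataclass
from typing import Dict, List, Optional, Union

Operand = Union[int, str]  # int: value is ready; str: tag of the producer
OPS = {"add": operator.add, "mul": operator.mul}


@dataclass
class Station:
    tag: str
    op: str
    src1: Operand
    src2: Operand

    def ready(self) -> bool:
        return isinstance(self.src1, int) and isinstance(self.src2, int)


class Tomasulo:
    def __init__(self, regs: Dict[str, int]):
        self.regs = dict(regs)
        # Register status table: which station will produce each register.
        self.reg_tag: Dict[str, Optional[str]] = {r: None for r in regs}
        self.stations: List[Station] = []
        self._count = 0

    def _read(self, reg: str) -> Operand:
        tag = self.reg_tag[reg]
        return tag if tag is not None else self.regs[reg]

    def issue(self, op: str, dest: str, s1: str, s2: str) -> str:
        """Issue in program order: capture values or tags, rename dest."""
        self._count += 1
        tag = f"RS{self._count}"
        self.stations.append(Station(tag, op, self._read(s1), self._read(s2)))
        self.reg_tag[dest] = tag
        return tag

    def complete(self, tag: str) -> int:
        """Finish one ready station (any order) and broadcast its result."""
        st = next(s for s in self.stations if s.tag == tag)
        if not st.ready():
            raise RuntimeError(f"{tag} is still waiting on an operand")
        result = OPS[st.op](st.src1, st.src2)
        self.stations.remove(st)
        # Common data bus: waiting stations and registers match on the tag.
        for other in self.stations:
            if other.src1 == tag:
                other.src1 = result
            if other.src2 == tag:
                other.src2 = result
        for reg, pending in self.reg_tag.items():
            if pending == tag:
                self.regs[reg] = result
                self.reg_tag[reg] = None
        return result


cpu = Tomasulo({"R1": 2, "R2": 3, "R3": 0, "R4": 0})
t1 = cpu.issue("mul", "R3", "R1", "R2")  # R3 = R1 * R2
t2 = cpu.issue("add", "R4", "R3", "R1")  # RAW on R3: waits for t1's tag
t3 = cpu.issue("add", "R3", "R1", "R2")  # WAW on R3: R3 now renamed to t3
cpu.complete(t3)  # finishes first, out of program order
cpu.complete(t1)  # result reaches t2 over the bus, R3 is left alone
cpu.complete(t2)
print(cpu.regs)  # {'R1': 2, 'R2': 3, 'R3': 5, 'R4': 8}
```

Because the second instruction waits on a tag rather than on register R3 itself, the later write to R3 can complete early without corrupting its operand, and the older write to R3 is dropped from the register file because the register no longer carries its tag.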
Embedded Recipes 2019 - Introduction to JTAG debugging (Anne Nicolas)
This talk introduces JTAG debugging capabilities for both hardware and software. Marek first explains what JTAG stands for and how the JTAG state machine operates. This is followed by an introduction to the free software JTAG tools OpenOCD and urJTAG. Marek briefly explains how to debug software using those tools and how that ties into the JTAG state machine. However, JTAG was originally designed for testing hardware. Marek explains what boundary scan testing (BST) is, what BSDL files are and how they are formatted, and demonstrates in practice how to blink an LED using BST and only free software tools.
Marek Vasut
This document summarizes different aspects of instrumentation and runtime measurement using VampirTrace. It discusses automatic, manual, and binary instrumentation. It describes runtime measurement behind the scenes and using the OTF trace format. It outlines various options, settings, and parameters that can be configured using environment variables to control aspects like hardware counters, memory tracing, I/O tracing, and filtering.
Since FreeBSD 8.0, SSP has been enabled automatically when compiling the OS. This GCC option, originally developed by IBM, adds protection mechanisms against buffer overflows. The presentation is accompanied by C sources and memory inspection with GDB. It begins with how SSP works (covering 3 aspects), continues with its implementation on FreeBSD and Linux, and ends with exploitation in certain cases.
1. The document describes the core analysis steps for ChIP-seq and RNA-seq experiments, including trimming, alignment, peak calling, and downstream analyses like viewing data in a genome browser and identifying motifs.
2. It explains key ChIP-seq steps like sonication, immunoprecipitation of DNA-bound proteins, and use of control samples to identify true enrichment peaks.
3. It also outlines RNA-seq workflow involving poly-A selection, cDNA synthesis, and analysis of gene and transcript expression.
This document discusses using cReComp to develop ROS-compliant FPGA components. cReComp is a tool that takes specifications written in scrp and generates FPGA IP cores and C++ driver code. An example is presented where cReComp is used to generate a FIR filter component from a scrp specification. The component communicates with ROS using topics and processes data in real-time on the FPGA to provide latency of less than 1ms. Details are provided on the component architecture generated by cReComp and how it integrates FPGA hardware acceleration with the ROS framework.
The document discusses resolving an ORA-00257 error caused by the archive log destination running out of disk space. Key steps include:
1) Deleting archive log files manually from the OS
2) Using RMAN to crosscheck the archive logs and delete the expired entries, which removes the references to the deleted files
Taking these actions frees disk space and allows the database to resume archiving redo logs, which resolves the error.
Kernel Recipes 2016 - Why you need a test strategy for your kernel development (Anne Nicolas)
Testing is important. That is a well-known fact that very few developers will dispute. Why, then, is so little kernel code covered by a clear testing strategy? Through real stories about test plans (or the lack thereof), this talk will convince you that none of your excuses for not having a test strategy are valid. You will learn how various parts of the Linux kernel have approached testing and how you can benefit from their experience. The talk will use the V4L2 subsystem to demonstrate the use of test tools, but will be applicable to kernel development in general.
Laurent Pinchart
The document discusses ROS (Robot Operating System) integration with FPGA using cReComp and Scrp. cReComp is a tool that allows creating reusable FPGA components from C++ or HDL code. Scrp is a domain specific language used to specify the hardware and software interface of a component. The document demonstrates creating a sensor component with cReComp and evaluates its performance and timing.
Lec8 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Dynamic Sch... (Hsien-Hsin Sean Lee, Ph.D.)
The document discusses dynamic scheduling in modern out-of-order processors. It describes how register renaming is used to avoid false dependencies and allow instructions to execute out of order. The reorder buffer (ROB) supports precise interrupts by buffering instruction results until they can be committed in program order, so that the processor state can be reconstructed sequentially. The ROB also allows speculatively executed instructions to be discarded cleanly after a branch misprediction.
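The role of the reorder buffer can be shown with a small model: results may arrive in any order, but architectural state is updated only from the head of the buffer. This is a simplified sketch under stated assumptions, not a description of any specific processor; the `ReorderBuffer` class and its method names are hypothetical, and the instruction passed to `squash_after` is assumed to still be in the buffer.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, Optional


@dataclass
class RobEntry:
    dest: str
    value: Optional[int] = None
    done: bool = False


class ReorderBuffer:
    def __init__(self) -> None:
        self.entries: Deque[RobEntry] = deque()  # oldest entry on the left
        self.arch_regs: Dict[str, int] = {}      # committed (precise) state

    def allocate(self, dest: str) -> RobEntry:
        """Reserve an entry in program order when an instruction issues."""
        entry = RobEntry(dest)
        self.entries.append(entry)
        return entry

    def complete(self, entry: RobEntry, value: int) -> None:
        """Record a result; completion may happen out of order."""
        entry.value = value
        entry.done = True

    def commit(self) -> int:
        """Retire finished entries from the head only; return how many."""
        retired = 0
        while self.entries and self.entries[0].done:
            entry = self.entries.popleft()
            self.arch_regs[entry.dest] = entry.value
            retired += 1
        return retired

    def squash_after(self, entry: RobEntry) -> None:
        """Drop every entry younger than `entry`, e.g. after a misprediction."""
        while self.entries and self.entries[-1] is not entry:
            self.entries.pop()


rob = ReorderBuffer()
a = rob.allocate("R1")
b = rob.allocate("R2")
c = rob.allocate("R3")

rob.complete(c, 30)       # youngest instruction finishes first
first = rob.commit()      # 0: the head (a) is not done, so nothing retires
rob.complete(a, 10)
second = rob.commit()     # 1: a retires, b still blocks c
rob.squash_after(b)       # pretend b was a mispredicted branch point: c is dropped
rob.complete(b, 20)
third = rob.commit()      # 1: b retires, c never reaches architectural state
print(first, second, third, rob.arch_regs)  # 0 1 1 {'R1': 10, 'R2': 20}
```

Since the squashed result never left the buffer, there is nothing to undo in the committed register state, which is what makes both precise interrupts and misprediction recovery straightforward.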
The document discusses the introduction of ARM 64-bit architecture. It begins with an introduction of the speaker and then covers several topics on ARM64 including:
- ARM64 terminology such as AArch64 for 64-bit mode and AArch32 for 32-bit mode
- The ARM64 execution model including 64-bit general purpose registers and 128-bit floating point registers
- The ARM64 instruction set architecture including new instructions for cache control and floating point support
- Demonstrations of ARM64 assembly code for various C examples compiled to ARM64
- Trying out ARM64 emulation using QEMU to debug ARM64 code with GDB.
Brno Perl Mongers 28.5.2015 - Perl family by mj41 (Michal Jurosz)
This document summarizes the history of the Perl family from 1987 to 2015, including the 15 years of Perl 6 development. It describes the early versions of Perl from 1.0 to 5.0 in the late 1980s and 1990s. It then covers the beginnings of Perl 6 in 2000, the development of implementations like Pugs and Rakudo, and the long journey to a stable 1.0 release in 2010. It discusses key people, technologies like Parrot and MoarVM, and the ongoing progress toward finalizing Perl 6 features and performance.
This document discusses FPGA and ROS integration using cReComp. cReComp is a tool that allows defining reusable FPGA components using a specification language and integrating them with ROS. It handles the hardware/software interface and generation of HDL, C++ code and ROS packages from a single specification file. An example is provided of using cReComp to implement an ultrasonic sensor component on an FPGA board running Linux and ROS. The goal is to explore using this approach to implement visual SLAM on an FPGA for low power robotics applications.
Squash Those IoT Security Bugs with a Hardened System Profile (Steve Arnold)
Although the tools and documentation have been around a long time, the industry as a whole has been woefully slow at taking security engineering seriously (even more so in the embedded world). The current mainline kernel includes several access control systems that reduce the risk of bugs escalating into high-level security compromises, such as the venerable SELinux (which is enabled by default in Android 4.4 and several "enterprise" Linux distributions). This presentation focuses on a complementary set of security mechanisms that work independently from the overlying frameworks: PIE toolchain hardening, PAX kernel hardening, and the PAX userland tools. These technologies work together to demote whole classes of bugs from headline-grabbing remote compromise and/or data theft exploits to "mere" DoS annoyances.
SANS @Night There's Gold in Them Thar Package Management Databases (Phil Hagen)
This document discusses how package management databases like RPM can provide useful evidence during Linux forensic examinations. It describes how RPM stores file metadata that can be queried to identify file ownership and validate installed packages. Examples are provided of using RPM to find modified or orphaned files, as well as techniques like directly validating the filesystem against package files to avoid issues with a compromised RPM database. The document encourages developing shell scripts to efficiently extract relevant RPM information.
This document discusses Ganglia, an open-source distributed monitoring system. It begins with an introduction to single host and distributed monitoring. It then covers Ganglia in more detail, explaining that Ganglia uses gmond and gmetad daemons to collect metrics from nodes and save them to RRD files. Finally, it provides steps to install and configure Ganglia, including installing dependencies, configuring gmetad and gmond, and setting up the web front-end.
This document provides solutions to common Linux commands and tasks. It covers topics such as environment setting, hardware and system specifications, file editing and compression, networking, performance monitoring, package management with RPM, and multimedia. Solutions are provided for tasks like changing the startup runlevel, monitoring swap size, editing files, getting the network IP and registering the hostname, and burning discs.
This document provides an overview of the Linux kernel, including its history, structure, build process, installation, updating, and customization. It discusses getting the kernel source code, configuring and building the kernel, installing modules and the kernel, applying updates via patches, and determining the correct driver for PCI devices by matching the vendor and device IDs. The key steps are to find the PCI IDs, search for the IDs in kernel headers to identify the driver, search the kernel makefiles and configuration to enable that driver for compilation.
Introduction to ESP32 Programming [Road to RIoT 2017]Alwin Arrasyid
Introduction to ESP32 programming using official development framework, ESP-IDF and Arduino for ESP32.
Every demo code is published in this github repository:
https://github.com/alwint3r/RTR_Surabaya2017
SystemTap is a dynamic tracing tool for Linux systems. It allows users to easily gather information about the running Linux system by defining probe points in a script. The script is compiled into a kernel module which can then be loaded to monitor the specified probe points. Some examples of useful probe points include functions, system calls, and kernel statements. SystemTap scripts can be used to trace execution, profile performance, monitor kernel functions and debug problems by printing at probe points. It provides a safe way to observe a live system without needing to recompile the kernel.
Bundling Packages and Deploying Applications with RPMAlexander Shopov
This document summarizes the steps to build an RPM package for a sample Java application called Counterbean using Tomcat. It describes preparing the build environment by installing necessary packages, creating a dedicated packager user, and initializing the RPM build tree. The document then walks through editing the spec file, adding dependencies, and building and installing the RPM package locally. Key aspects covered include file ownership, startup scripts, and switching the application's database.
Efficient System Monitoring in Cloud Native EnvironmentsGergely Szabó
This document discusses efficient system monitoring in cloud native environments using eBPF. It provides an overview of eBPF and how it can be used for monitoring applications like Prometheus. Specific topics covered include BPF, Linux kernel tracing using kprobes and tracepoints, eBPF maps and programs, and an example Prometheus exporter that leverages eBPF to export metrics.
This document discusses the Red Hat Package Manager (RPM), including what it can do, who uses it, terminology, database location, common operations like install, uninstall, query, and upgrade, and various options for those operations. RPM is used to install, manage, and uninstall software packages on Red Hat, Fedora, CentOS and other Linux distributions. It allows adding, removing, upgrading and verifying packages and their dependencies.
The document provides guidance on troubleshooting Linux systems. It discusses preparing for troubleshooting by backing up data and documentation. When issues arise, it recommends gathering information from logs, researching if the problem is widespread, and considering likely causes such as user error, software/hardware issues, or network problems. It then offers solutions such as software and hardware remedies, and provides tips for troubleshooting specific components like applications, networks, disks, and packages.
This document provides an overview of PITR (Point-in-Time Recovery) and introduces PITRTools, an open source tool that simplifies implementing PITR for PostgreSQL databases. PITRTools acts as a wrapper around common utilities like rsync, ssh, and pg_standby to enable features like warm standbys, backups, failover, and monitoring alerts. It works by pushing transaction log files from a master database to a slave, and includes configuration files and scripts to initialize, start, and manage the master archiver and slave standby processes.
Joshua D. Drake
Are you tired of not having a real solution for PITR? Enter PITRTools, a single and secure solution for using Point In Time Recovery for PostgreSQL.
This document discusses container technologies including App Container (appc) and rkt. It provides an overview of appc components like the image format, discovery, and executor. It then discusses rkt, an implementation of appc, describing its modular architecture with stages 0-2 and use of systemd and cgroups for isolation. It also touches on rkt security, networking, and integration with systemd and user namespaces.
The document provides an introduction to Linux and device drivers. It discusses Linux directory structure, kernel components, kernel modules, character drivers, and registering drivers. Key topics include dynamically loading modules, major and minor numbers, private data, and communicating with hardware via I/O ports and memory mapping.
This document provides instructions for installing and configuring Snort 2.9.6 and DAQ 2.0 on CentOS 6.3/6.4 running in a VirtualBox virtual machine. It describes compiling and installing necessary libraries like libpcap and libdnet. It then provides commands for extracting, configuring, compiling and installing DAQ and Snort. Finally it discusses configuring Snort configuration files, adding the Snort user, and providing a script to start and stop Snort.
Summit 16: OPNFV on ARM - Hardware Freedom of Choice Has Arrived!OPNFV
Freedom of choice is one of the key concepts in the SDN and NFV revolution we are seeing today. OPNFV is at the heart of this revolution yet very limited freedom of choice has existed on the hardware architecture side. However, with the work done in the Armband project, ARM servers are now an alternative hardware architecture for Brahmaputra deployments. The Armband team has ported the OPNFV Fuel Project to support deployments on ARM servers. The necessary code changes have been upstreamed through the OPNFV armband project. End users are now able to download or build their own Brahmaputra OPNFV ISO ready for ARM and install it using available OPNFV documentation. In addition to this and to further the OPNFV VNF ecosystem, a full specification OPNFV Pharos lab based on ARM servers was built by Enea for running continuous integration (CI) and continuous deployment (CD). In this presentation, we will walk you through the experiences gained in this process, the challenges and how they were overcome and what is coming next.
The CASPUR Staging System II is a disk and tape storage solution that uses a staging process to migrate files between disks and tapes. It consists of three main components - a stager, movers, and user interface commands. The stager monitors disk usage and migrates files to tape to maintain optimal disk occupancy based on configurable policies. Movers handle the actual data transfer between disks and tapes. Users can control files and view statuses using commands.
Similar to CAMERA metagenomic annotation pipeline (20)
How to Get CNIC Information System with Paksim Ga.pptxdanishmna97
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceIndexBug
Imagine a world where machines not only perform tasks but also learn, adapt, and make decisions. This is the promise of Artificial Intelligence (AI), a technology that's not just enhancing our lives but revolutionizing entire industries.
Removing Uninteresting Bytes in Software FuzzingAftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing xml documents, and Binutil's readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format). Our preliminary results show that AFL+DIAR does not only discover new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
UiPath Test Automation using UiPath Test Suite series, part 5DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 5. In this session, we will cover CI/CD with devops.
Topics covered:
CI/CD with in UiPath
End-to-end overview of CI/CD pipeline with Azure devops
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfPaige Cruz
Monitoring and observability aren’t traditionally found in software curriculums and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of your current company’s observability stack.
While the dev and ops silo continues to crumble….many organizations still relegate monitoring & observability as the purview of ops, infra and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party will share these foundational concepts to build on:
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
Best 20 SEO Techniques To Improve Website Visibility In SERPPixlogix Infotech
Boost your website's visibility with proven SEO techniques! Our latest blog dives into essential strategies to enhance your online presence, increase traffic, and rank higher on search engines. From keyword optimization to quality content creation, learn how to make your site stand out in the crowded digital landscape. Discover actionable tips and expert insights to elevate your SEO game.
Building Production Ready Search Pipelines with Spark and MilvusZilliz
Spark is the widely used ETL tool for processing, indexing and ingesting data to serving stack for search. Milvus is the production-ready open-source vector database. In this talk we will show how to use Spark to process unstructured data to extract vector representations, and push the vectors to Milvus vector database for search serving.
GraphRAG for Life Science to increase LLM accuracyTomaz Bratanic
GraphRAG for life science domain, where you retriever information from biomedical knowledge graphs using LLMs to increase the accuracy and performance of generated answers
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU und die Lizenzen nach dem CCB- und CCX-Modell sind für viele in der HCL-Community seit letztem Jahr ein heißes Thema. Als Notes- oder Domino-Kunde haben Sie vielleicht mit unerwartet hohen Benutzerzahlen und Lizenzgebühren zu kämpfen. Sie fragen sich vielleicht, wie diese neue Art der Lizenzierung funktioniert und welchen Nutzen sie Ihnen bringt. Vor allem wollen Sie sicherlich Ihr Budget einhalten und Kosten sparen, wo immer möglich. Das verstehen wir und wir möchten Ihnen dabei helfen!
Wir erklären Ihnen, wie Sie häufige Konfigurationsprobleme lösen können, die dazu führen können, dass mehr Benutzer gezählt werden als nötig, und wie Sie überflüssige oder ungenutzte Konten identifizieren und entfernen können, um Geld zu sparen. Es gibt auch einige Ansätze, die zu unnötigen Ausgaben führen können, z. B. wenn ein Personendokument anstelle eines Mail-Ins für geteilte Mailboxen verwendet wird. Wir zeigen Ihnen solche Fälle und deren Lösungen. Und natürlich erklären wir Ihnen das neue Lizenzmodell.
Nehmen Sie an diesem Webinar teil, bei dem HCL-Ambassador Marc Thomas und Gastredner Franz Walder Ihnen diese neue Welt näherbringen. Es vermittelt Ihnen die Tools und das Know-how, um den Überblick zu bewahren. Sie werden in der Lage sein, Ihre Kosten durch eine optimierte Domino-Konfiguration zu reduzieren und auch in Zukunft gering zu halten.
Diese Themen werden behandelt
- Reduzierung der Lizenzkosten durch Auffinden und Beheben von Fehlkonfigurationen und überflüssigen Konten
- Wie funktionieren CCB- und CCX-Lizenzen wirklich?
- Verstehen des DLAU-Tools und wie man es am besten nutzt
- Tipps für häufige Problembereiche, wie z. B. Team-Postfächer, Funktions-/Testbenutzer usw.
- Praxisbeispiele und Best Practices zum sofortigen Umsetzen
Communications Mining Series - Zero to Hero - Session 1DianaGray10
This session provides introduction to UiPath Communication Mining, importance and platform overview. You will acquire a good understand of the phases in Communication Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How can it help today’s business and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
UiPath Test Automation using UiPath Test Suite series, part 6DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover Test Automation with generative AI and Open AI.
UiPath Test Automation with generative AI and Open AI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, a test automation solution, with Open AI advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers, and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and Open AI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
4. CALIT2 Compute Grid
48 dual-core, dual-CPU 64-bit machines
192 SGE slots
Red Hat-based ‘Rocks Clusters’ Linux distribution (see http://rocksclusters.org)
‘Rocks Rolls’
Bio-roll (/opt/Bio)
Used to image/install each node separately, including local Perl module installs (patches)
6. SOS Cluster Global Mounts
/share/apps
applications (and related files) are installed here; analysis data should not be stored here
/home/thumper6
a global mount point: an 18T(!!!) storage volume on which all analysis data/results should be stored
/opt/Bio
tools such as clustalw, EMBOSS, hmmer and NCBI BLAST are installed under here
7. SOS Local Mounts
(on each grid node)
/state/partition1
local storage device on each grid node available
for local scratch space (438G)
/tmp
system tmp partition (7G)
10. pg0-0.camera.calit2.net
CGI scripts run as the user 'apache' on pg0-0, but 'apache' has sudo permissions for user 'ergatis'
The two CGI scripts in the install which run RunWorkflow and KillWorkflow (ergatis/kill_wf.cgi, ergatis/Ergatis/Pipeline.pm) have been modified so that 'sudo -u ergatis ' is prepended to their normal execution strings
IdGenerator.pm has been modified to use JCVIIdGenerator.pm
Many of the settings in ergatis.ini have been changed from the defaults, including disabling a number of the components
When updating the Ergatis CGI directory from the SVN repository, a backup copy should be set aside in advance
11. SGE/Workflow Notes
Two SGE queues have been configured for ergatis:
ergatis.q (192 slots)
ergatis-fast.q (144 slots)
ergatis.q is a subordinate queue of ergatis-fast.q
ergatis.q is set as the default queue for user ‘ergatis’ by specifying ‘-q ergatis.q’ in /home/ergatis/.sge_request
Workflow version 3.0 is installed
/share/apps/workflow
Workflow requires that the SGE queue's prolog and epilog scripts be set to the following:
prolog=/share/apps/workflow/bin/prolog $host $job_owner $job_id $job_name $queue
epilog=/share/apps/workflow/bin/epilog $host $job_owner $job_id $job_name $queue
The queue configuration can be checked using the command 'qconf -sq ergatis.q'
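The prolog/epilog check described above could be automated. The following is a minimal Python sketch, not part of the actual install: it assumes 'qconf -sq' prints one attribute per line (name, whitespace, value) and wraps long values with a trailing backslash. The function names and the SAMPLE text are illustrative; the expected values are copied from this slide.

```python
import subprocess

# Expected settings, copied from the slide above.
EXPECTED = {
    "prolog": "/share/apps/workflow/bin/prolog $host $job_owner $job_id $job_name $queue",
    "epilog": "/share/apps/workflow/bin/epilog $host $job_owner $job_id $job_name $queue",
}


def parse_qconf(text):
    """Parse 'qconf -sq' style output into a dict of attribute -> value."""
    # Assumption: long values are wrapped with a trailing backslash.
    joined = text.replace("\\\n", "")
    config = {}
    for line in joined.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            # Collapse the padding left over from wrapped lines.
            config[parts[0]] = " ".join(parts[1].split())
    return config


def mismatched_settings(config, expected=EXPECTED):
    """Return the names of expected attributes whose values differ."""
    return sorted(k for k, v in expected.items() if config.get(k) != v)


def fetch_queue_config(queue="ergatis.q"):
    """Run 'qconf -sq <queue>'; needs an SGE installation on the PATH."""
    result = subprocess.run(
        ["qconf", "-sq", queue], capture_output=True, text=True, check=True
    )
    return parse_qconf(result.stdout)


# Illustrative output fragment, not captured from the real cluster.
SAMPLE = """qname                 ergatis.q
prolog                /share/apps/workflow/bin/prolog $host $job_owner \\
                      $job_id $job_name $queue
epilog                NONE
"""

sample_config = parse_qconf(SAMPLE)
problems = mismatched_settings(sample_config)
print(problems)
```

On the cluster itself, mismatched_settings(fetch_queue_config()) would report which of the two scripts, if any, is not set as Workflow requires.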
12. Ergatis Application Install
The main ergatis application install directory is under /share/apps/ergatis
The chado-v1r12b1 release is the current version installed
direct copy of the install located at /usr/local/devel/ANNOTATION/ard/ at JCVI
Perl wrappers were modified via sed to the correct local directory structures
A proper install wasn't done because no working installer script was available at the time
/share/apps/ergatis/chado-v1r12b1
symlinked to /share/apps/ergatis/current
Executables that some Ergatis components use but that are not installed with Ergatis (e.g. JCVI internal scripts) are located under /share/apps/ergatis/bin
External tools which are not globally installed on sos are installed under /share/apps/ergatis/external_apps
Ergatis global directories (global_id_repository, global_saved_templates) are located under /share/apps/ergatis/ergatis_global
13. Ergatis Data Locations
All ergatis data should be put under /home/thumper6/ergatis
Project repositories are located under
/home/thumper6/ergatis/projects
or symlink /share/apps/ergatis/projects
CAMERA project repository is
/home/thumper6/ergatis/projects/camera
Databases are located under /home/thumper6/ergatis/db
or symlink /share/apps/ergatis/db
Global scratch space is under /home/thumper6/ergatis/scratch
or symlink /share/apps/ergatis/scratch
14. ikelite.rocksclusters.org
Fewer machines than the sos cluster (~20 slots?)
Initial test ergatis install was done here (similar directory structure to sos)
Completely distinct from the sos cluster
Sandbox
Shibu, Weizhong Li and others run computes here (e.g. the clustering pipeline)
17. Challenges
All computes in the pipeline must be performed on multi-sequence input/output files, as the filesystem cannot physically support 12M+ individual FASTA input/output files
other partitioning solutions could work(?) but most tools support multiple sequence inputs anyway
Overall space consumption was an issue when computes were running on the TIGR grid, but this is (currently) less of an issue on the CALIT2 grid
The solution here was to keep all inputs/outputs gzipped during pipeline execution, at the cost of some performance loss (using things like zcat -f | with NCBI BLAST, etc.)
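The gzipped, multi-sequence approach can be illustrated with a small reader. This is a Python sketch under stated assumptions, not the pipeline's own code (which is Perl): read_fasta is a hypothetical helper that, like 'zcat -f', accepts either gzipped or plain input and streams records one at a time, so a single file can hold many sequences without ever being fully decompressed on disk.

```python
import gzip
import os
import tempfile


def read_fasta(path):
    """Yield (header, sequence) pairs from a plain or gzipped FASTA file."""
    # Like 'zcat -f': detect gzip by its magic bytes, else read as plain text.
    with open(path, "rb") as probe:
        is_gzip = probe.read(2) == b"\x1f\x8b"
    opener = gzip.open if is_gzip else open
    with opener(path, "rt") as handle:
        header, chunks = None, []
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif header is not None:
                chunks.append(line.strip())
        if header is not None:
            yield header, "".join(chunks)


# Usage: write a tiny gzipped multi-FASTA file and stream it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "reads.fsa.gz")
    with gzip.open(path, "wt") as out:
        out.write(">read1 example\nACGT\nACGT\n>read2\nTTGA\n")
    records = list(read_fasta(path))

print(records)
```

The file name and the two example reads are made up for the demonstration.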
22. CAMERA rRNA Finder Overview
BLAST vs. a database of coded pooled rRNA subunit sequences
BLAST prefilter step with loose parameters
blastall -p blastn -i reads.fsa -d rrna_db.fsa -e 0.1 -F 'T' -b 1 -v 1 -z 3000000000 -W 9
Reads with prefilter hits are searched using strict parameters
blastall -p blastn -i aligned.fsa -d rrna_db.fsa -e 1e-4 -F 'm L' -b 1500 -v 1500 -q -5 -r 4 -X 1500 -z 3000000000 -W 9 -U T
Collapse aligned intervals of the same rRNA type and extract the highest scoring alignments from each region
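The final collapsing step can be sketched as a standard interval merge. The Python below is an illustration, not the camera_rrna_finder implementation: it assumes the hits for one read have already been parsed into (rrna_type, start, end, score) tuples, treats overlapping hits of the same type as one region, and keeps the single best-scoring hit per region. The tuple layout and the function name are assumptions.

```python
from collections import defaultdict


def collapse_hits(hits):
    """Merge overlapping hits of the same rRNA type on one read.

    hits: iterable of (rrna_type, start, end, score) with start <= end.
    Returns a list of regions, each a dict holding the merged interval
    and the highest-scoring hit inside it.
    """
    by_type = defaultdict(list)
    for hit in hits:
        by_type[hit[0]].append(hit)

    regions = []
    for rrna_type, type_hits in sorted(by_type.items()):
        type_hits.sort(key=lambda h: (h[1], h[2]))
        current = None
        for _, start, end, score in type_hits:
            if current is not None and start <= current["end"]:
                # Overlaps the open region: extend it, keep the best hit.
                current["end"] = max(current["end"], end)
                if score > current["best"][2]:
                    current["best"] = (start, end, score)
            else:
                current = {
                    "type": rrna_type,
                    "start": start,
                    "end": end,
                    "best": (start, end, score),
                }
                regions.append(current)
    return regions


# Made-up hits for a single read.
example_hits = [
    ("16S", 10, 100, 50.0),
    ("16S", 90, 200, 80.0),
    ("16S", 400, 500, 30.0),
    ("23S", 50, 150, 60.0),
]
regions = collapse_hits(example_hits)
for region in regions:
    print(region)
```

Hits of different rRNA types are deliberately never merged with each other, even when they overlap on the read.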
25. rRNA Finder DB
/usr/local/annotation/CAMERA/CustomDB/camera_rRNA_finder.all_rRNA.coded.cdhit_80.fsa
5S
Sequences from Archaea, Bacteria and Eukaryota were obtained from the 5S Ribosomal RNA Database
http://biobases.ibch.poznan.pl/5SData/
16S
Sequences for Archaea and Bacteria were obtained from the Greengenes 16S db
http://greengenes.lbl.gov/
18S
Source was Doug Rusch's 18S database prepared for the GOS paper
23S
Source was Doug Rusch's 23S database prepared for the GOS paper
26. rRNA Finder DB
FASTA headers were coded as follows:
>#S [D] ...original.header...
where # is one of (5, 16, 18, 23) and D is one of (A, B, E). The camera_rrna_finder component expects this format.
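A coded header of this form is straightforward to validate and split. The Python sketch below is illustrative only. It assumes A, B and E stand for Archaea, Bacteria and Eukaryota (the three domains named on the previous slide) and that the coded fields are separated by single spaces; parse_coded_header is a hypothetical name.

```python
import re

# '>#S [D] original header' with # in (5, 16, 18, 23) and D in (A, B, E).
HEADER_RE = re.compile(r"^>?(5|16|18|23)S \[([ABE])\] ?(.*)$")

# Assumed meaning of the one-letter domain codes.
DOMAINS = {"A": "Archaea", "B": "Bacteria", "E": "Eukaryota"}


def parse_coded_header(header):
    """Return (subunit, domain, original_header), or None if not coded."""
    match = HEADER_RE.match(header)
    if match is None:
        return None
    number, code, original = match.groups()
    return number + "S", DOMAINS[code], original


parsed = parse_coded_header(">16S [B] example_id some description")
print(parsed)
```

Returning None for anything that does not match makes it easy to spot sequences that were added to the database without being coded first.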
27. rRNA Finder DB
CD-HIT was run on the entire database to cluster sequences with high similarity, reducing the database size while maintaining a range of diverse sequences
Command line:
/usr/local/devel/ANNOTATION/bwhitty/cdhit/cd-hit/cd-hit-est -i input_database.fsa -o output_database.fsa -c 0.8 -n 4
Consistency of clustering was checked with a Perl script to ensure no heterogeneous clustering (e.g. 18S and 16S clustering together)
Clusters were consistent
Database size was reduced from 65,591 sequences to 1,329
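A consistency check of this kind might look like the following. This is a Python sketch, not the Perl script that was actually used. It assumes the usual CD-HIT .clstr layout ('>Cluster N' lines followed by member lines such as '0	1500nt, >name... *') and that each member name starts with the coded subunit type described earlier. The sample lines are invented.

```python
import re

# The coded subunit type is the first thing after ', >' on a member line.
MEMBER_RE = re.compile(r", >(5|16|18|23)S")


def heterogeneous_clusters(clstr_lines):
    """Return {cluster name: set of subunit types} for mixed clusters."""
    clusters = {}
    current = None
    for line in clstr_lines:
        line = line.rstrip("\n")
        if line.startswith(">Cluster"):
            current = line[1:]
            clusters[current] = set()
        elif current is not None:
            match = MEMBER_RE.search(line)
            if match:
                clusters[current].add(match.group(1) + "S")
    return {name: types for name, types in clusters.items() if len(types) > 1}


# Invented example: the second cluster mixes 18S and 16S.
SAMPLE_CLSTR = [
    ">Cluster 0",
    "0\t1500nt, >16S... *",
    "1\t1480nt, >16S... at +/97.50%",
    ">Cluster 1",
    "0\t1800nt, >18S... *",
    "1\t1500nt, >16S... at +/81.00%",
]

mixed = heterogeneous_clusters(SAMPLE_CLSTR)
print(mixed)
```

An empty result corresponds to the "Clusters were consistent" outcome reported above.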
32. The absence of called ORFs in this region of the read is due to the soft-masked rRNA sequence
RNAmmer didn’t identify the 23S sequence, though it is capable of finding 23S
39. Thoughts on Specifications
Annotation rules should not be literally codified as Perl code (and only Perl code)!!!
(especially when the “decision makers” never look at the code)
What tools do we trust?
What cutoffs do we use?
What evidence/data types do we consider?
These will (and in some cases should) change over time
40. More Thoughts
Specifications are easier to change than code, so code should be written to support change
But unless they’re defined first, the specifications will be a moving target
41. (My) Design Objectives
Must be able to add/remove annotation data sources as the annotation SOP changes
Must be able to easily change the ways in which these annotation data types are applied/combined to produce final annotation
Must be able to change/expand the types of final annotation data we are producing
42. Object-Oriented Design Approach
OOP in Perl == *, but lesser of two evils
(don’t ask me what the other evil is, but it must be pretty evil)
Encapsulates possible sources of change and prevents them from affecting downstream components (like HACCP)
Polymorphism of $parser->parse($infile) producing annotation objects is nice
Re-use was not really a motive here
*Damian Conway in his OOP Perl book says using OOP in Perl yields a 5X performance hit
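The polymorphism point can be illustrated with a small sketch. The code is Python rather than the pipeline's Perl, and everything in it is hypothetical: the class names, the AnnotationData fields and both input formats are invented for the example. What it shows is that downstream code only ever calls parse() and receives the same kind of object, whichever tool produced the input.

```python
import io
from dataclasses import dataclass, field


@dataclass
class AnnotationData:
    """Hypothetical stand-in for the pipeline's annotation data object."""
    polypeptide_id: str
    evidence_type: str
    attributes: dict = field(default_factory=dict)


class Parser:
    """Common interface: parse(infile) yields AnnotationData objects."""
    evidence_type = "Unknown"

    def parse(self, infile):
        raise NotImplementedError


class TabParser(Parser):
    """Invented format: polypeptide_id <TAB> common_name."""
    evidence_type = "Example::Tab"

    def parse(self, infile):
        for line in infile:
            line = line.rstrip("\n")
            if not line:
                continue
            pid, name = line.split("\t", 1)
            yield AnnotationData(pid, self.evidence_type, {"common_name": [name]})


class KeyValueParser(Parser):
    """Invented format: polypeptide_id key=value key=value ..."""
    evidence_type = "Example::KeyValue"

    def parse(self, infile):
        for line in infile:
            fields = line.split()
            if not fields:
                continue
            attrs = {}
            for item in fields[1:]:
                key, _, value = item.partition("=")
                attrs.setdefault(key, []).append(value)
            yield AnnotationData(fields[0], self.evidence_type, attrs)


def load_all(sources):
    """Collect annotation objects without caring which parser made them."""
    results = []
    for parser, infile in sources:
        results.extend(parser.parse(infile))
    return results


sources = [
    (TabParser(), io.StringIO("pep1\texample protein\n")),
    (KeyValueParser(), io.StringIO("pep1 EC=1.1.1.1 GO=GO:0000001\n")),
]
annotations = load_all(sources)
for annotation in annotations:
    print(annotation)
```

Adding a new annotation source then means adding one parser class; load_all and everything after it stay untouched.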
43. Annotation Pipeline Overview
Annotation Tool(s)
Annotation Source Data
Parser(s)
Annotation Data Object(s)
Annotation Rules
Final Annotation Data
We can make changes to the annotation rules without necessarily having to re-run or re-parse the data
44. Design Objectives for Parsers
A parser must:
Produce polypeptides with associated AnnotationData objects of a defined type
Produce AnnotationData objects with attributes specified in a consistent way
E.g.: All parsers should produce EC number attributes that look like anything from ‘1.1.1.1’ to ‘1.-.-.-’, never sometimes ‘1.-’. Multiple values should be split. Any clean-up or verification should be done before the AnnotationData object is created; if the data is invalid, the attribute should not be populated, or the object should not be created.
Produce annotation data objects that are independent of the source annotation data they were parsed from
e.g.: They have already been canonized as a type of ‘trusted annotation evidence type’ when they are created as AnnotationData objects. These trusted types are defined in the annotation SOP.
These features create a separation between how trusted evidence is defined (input data), and how the evidence is used to produce annotation (annotation rules)
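The EC number requirement lends itself to a short example. The Python below is a sketch of the kind of clean-up a parser could do before creating an AnnotationData object. It is not the pipeline's code, and the delimiters used to split multiple values (commas, semicolons, whitespace) are an assumption.

```python
import re


def normalize_ec(raw):
    """Return a four-field EC number such as '1.-.-.-', or None if invalid."""
    fields = raw.strip().split(".")
    if not 1 <= len(fields) <= 4:
        return None
    # Pad abbreviated forms: '1.-' becomes '1.-.-.-'.
    fields += ["-"] * (4 - len(fields))
    if not fields[0].isdigit():
        return None
    seen_dash = False
    for value in fields:
        if value == "-":
            seen_dash = True
        elif value.isdigit() and not seen_dash:
            continue
        else:
            # Non-numeric field, or a number after a '-' placeholder.
            return None
    return ".".join(fields)


def parse_ec_attribute(raw):
    """Split a raw EC string into a list of valid, normalized EC numbers."""
    values = []
    for token in re.split(r"[,;\s]+", raw.strip()):
        ec = normalize_ec(token) if token else None
        if ec and ec not in values:
            values.append(ec)
    return values


print(parse_ec_attribute("1.1.1.1, 2.7.-"))
```

Invalid tokens are dropped rather than passed through, so an attribute is only populated when at least one value survives validation.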
46. AnnotationRules
The AnnotationRules object implements the rules from the annotation SOP document
AnnotationRules::PredictedProtein takes a Polypeptide object with associated AnnotationData objects of varying types and applies the annotation rules to create a final AnnotationData object
47. AnnotationRules
Rules are encoded as an array in the following format:
ANNOTATION_TYPE|OPERATOR|ATTRIBUTE1 ATTRIBUTE2
Where OPERATOR is one of:
= for assign attribute (if unassigned)
+ for append attribute
- for overwrite attribute
Additional operators can be defined, since operators are applied through a hash of handler subroutines
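The rule format and the handler table can be sketched as follows. This is a Python illustration of the idea, not the Perl AnnotationRules code: the evidence is modeled as a plain dict of annotation type to attribute lists, rules are applied in array order, and each operator is looked up in a dict of handler functions (the counterpart of the hash of handler subroutines). The data values are invented.

```python
def parse_rule(rule):
    """Split 'ANNOTATION_TYPE|OPERATOR|ATTR1 ATTR2' into its three parts."""
    annotation_type, operator, attributes = rule.split("|")
    return annotation_type, operator, attributes.split()


def assign_if_unassigned(final, name, values):
    """'=': set the attribute only if nothing has been assigned yet."""
    if not final.get(name):
        final[name] = list(values)


def append_values(final, name, values):
    """'+': add values to whatever is already assigned."""
    target = final.setdefault(name, [])
    for value in values:
        if value not in target:
            target.append(value)


def overwrite(final, name, values):
    """'-': replace whatever is already assigned."""
    final[name] = list(values)


# New operators can be supported by adding an entry here.
HANDLERS = {"=": assign_if_unassigned, "+": append_values, "-": overwrite}


def apply_rules(rules, evidence):
    """Apply rules in order to build the final annotation dict."""
    final = {}
    for rule in rules:
        annotation_type, operator, attributes = parse_rule(rule)
        data = evidence.get(annotation_type)
        if data is None:
            continue
        for name in attributes:
            values = data.get(name)
            if values:
                HANDLERS[operator](final, name, values)
    return final


rules = [
    "TIGRFAM::FullLength::Equivalog|=|common_name gene_symbol GO EC TIGR_role",
    "PRIAM|=|GO EC",
]
evidence = {  # invented values for one polypeptide
    "TIGRFAM::FullLength::Equivalog": {
        "common_name": ["example kinase"],
        "GO": ["GO:0000001"],
    },
    "PRIAM": {"GO": ["GO:0000002"], "EC": ["2.7.1.1"]},
}
final = apply_rules(rules, evidence)
print(final)
```

Because '=' only fills unassigned attributes, the order of the rule array doubles as the evidence priority: here the GO term from the equivalog hit is kept, and PRIAM only contributes the EC number.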
48. AnnotationRules::PredictedProtein
my @annotation_order = (
## equivalog level tigrfam hits
'TIGRFAM::FullLength::Equivalog|=|common_name gene_symbol GO EC TIGR_role',
'TIGRFAM::FullLength::Exception|=|common_name gene_symbol GO EC TIGR_role',
'TIGRFAM::FullLength::HypotheticalEquivalog|=|common_name gene_symbol GO EC TIGR_role',
'TIGRFAM::FRAG::Equivalog|=|GO',
'TIGRFAM::FRAG::Exception|=|GO',
'TIGRFAM::FRAG::HypotheticalEquivalog|=|GO',
'TIGRFAM::FullLength::Domain|=|GO',
'PandaBLASTP::Characterized|=|GO',
'PRIAM|=|GO EC',
## equivalog level hits vs tigrfam frag
'TIGRFAM::FRAG::Equivalog|=|common_name gene_symbol GO EC TIGR_role',
'TIGRFAM::FRAG::Exception|=|common_name gene_symbol GO EC TIGR_role',
'TIGRFAM::FRAG::HypotheticalEquivalog|=|common_name gene_symbol GO EC TIGR_role',
## characterized high confidence blast hit
'PandaBLASTP::Characterized|=|common_name gene_symbol',
## pfam and non-equivalog tigrfams
'PFAM::FullLength::Equivalog|=|common_name gene_symbol GO EC TIGR_role',
'PFAM::FullLength::HypotheticalEquivalog|=|common_name gene_symbol GO EC TIGR_role',
'TIGRFAM::FullLength::Subfamily|=|common_name gene_symbol GO EC TIGR_role',
'TIGRFAM::FullLength::Superfamily|=|common_name gene_symbol GO EC TIGR_role',
'TIGRFAM::FullLength::EquivalogDomain|=|common_name gene_symbol GO EC TIGR_role',
'TIGRFAM::FullLength::HypotheticalEquivalogDomain|=|common_name gene_symbol GO EC TIGR_role',
'TIGRFAM::FullLength::SubfamilyDomain|=|common_name gene_symbol GO EC TIGR_role',
'TIGRFAM::FullLength::Domain|=|common_name gene_symbol GO EC TIGR_role',
…
54. Future Development
(My 2 cents)
Pipeline development must be driven by annotation SOP development work
Feedback on pipeline bugs must be vigilantly kept separate from feedback on annotation SOP bugs
First discuss and update the SOP, then modify the code
Cluster summary annotation
Shortest path here seems to be a combination of GO Slim and EC assignments? The GO consortium makes some scripts available for summarizing sets of GO assignments
If using the current code, a PolypeptideSet container class exists already. Cluster members can be added to a PolypeptideSet and that can be used as input to an AnnotationRules::FinalCluster object that is similar to the one for PredictedProtein, but with a different set of handler routines.
Incremental clustering pipeline
Good luck