SeqAn (www.seqan.de) is an open-source C++ template library (BSD license) that implements many efficient and generic data structures and algorithms for Next-Generation Sequencing (NGS) analysis. It contains gapped k-mer indices, enhanced suffix arrays (ESA), and an FM-index, as well as algorithms for fast and accurate alignment and read mapping. Based on these data types and fast I/O routines, users can easily develop tools that are both highly efficient and easy to maintain. Beyond multi-core support, the research team at Freie Universität Berlin has begun adding generic support for accelerators such as NVIDIA GPUs. Go through the slides to learn more. For your own bioinformatics development you can try GPUs for free here: www.nvidia.com/GPUTestDrive
Introduction to SeqAn, an Open-source C++ Template Library
1. Test Drive NVIDIA GPUs!
Experience the acceleration: develop your codes on the latest GPUs today. Sign up for a free GPU Test Drive on remotely hosted clusters: www.nvidia.com/GPUTestDrive
2. Prof. Dr. Knut Reinert
Algorithmische Bioinformatik, FB Mathematik und Informatik
Intro to SeqAn: an open-source C++ template library for biological sequence analysis
Knut Reinert, David Weese
Freie Universität Berlin, Institute for Computer Science
4. ~15 years ago...
Data volume and cost: in 2000, the 3 billion base pairs of the human genome were sequenced for about 3 billion US dollars, at roughly 100 million bp per day.
Nvidia Webinar, 22.10.2013
5. Sequencing today...
An Illumina HiSeq produces 100 billion bps per DAY. Within roughly ten years, sequencing has become about 10 million times cheaper.
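A rough sanity check of that factor, combining it with the cost figures from the previous slide (my arithmetic, not the slide's):

```latex
\underbrace{\frac{\$3\times 10^{9}}{3\times 10^{9}\ \text{bp}}}_{2000} = \$1/\text{bp}
\qquad
\$1/\text{bp}\times 10^{-7} = \$10^{-7}/\text{bp}
\;\Rightarrow\;
3\times 10^{9}\ \text{bp}\times \$10^{-7}/\text{bp} \approx \$300
```

That is, a factor of ten million puts the raw sequencing cost of a human-genome-sized data set in the low hundreds of dollars, consistent with early-2010s pricing.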
6. Future of NGS data analysis
8. SeqAn
SeqAn and the SeqAn tools have now been cited more than 360 times. SeqAn is under BSD license and hence free for academic AND commercial use.
Among the citing institutions are (omitting German institutes): Department of Genetics, Harvard Medical School, Boston; European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton; J. Craig Venter Institute, Rockville MD, USA; Department of Molecular Biology, Princeton University; Applied Mathematics Program, Yale University, New Haven; IBM T.J. Watson Research Center, Yorktown Heights; The Ohio State University, Columbus; University of Minnesota; Australian National University, Canberra; Department of Statistics, University of Oxford; Swedish University of Agricultural Sciences (SLU), Uppsala; Graduate School of Life Sciences, University of Cambridge; Broad Institute, Cambridge, USA; EMBL-EBI; University of California; University of Chicago; Iowa State University, Ames; The Pennsylvania State University; Peking University, Beijing; University of Science and Technology of China; BGI-Shenzhen, China; Beijing Institute of Genomics……
30. SeqAn SDK Components
Review Board to ensure code quality
CDash/CTest to automatically compile and test across platforms
Code coverage reports
32. Unified Alignment Algorithms
Versatile & Extensible DP-Interface
For example...
Standard DP-Algorithms
Global & Semi-Global Alignments
Local Alignments
Modified DP-Algorithms
Split Breakpoint Detection
Banded Chain Alignment
33. Unified Alignment Algorithms
For example...
Needleman-Wunsch with Traceback:
DPProfile<GlobalAlignment<>, LinearGaps, TracebackOn<> >
Semi-Global Gotoh without Traceback:
DPProfile<GlobalAlignment<FreeEndGaps<True, False, True, False> >, AffineGaps, TracebackOff>
Banded Smith-Waterman with Affine Gap Costs:
DPBand<BandOn>(lowerDiag, upperDiag), DPProfile<LocalAlignment<>, AffineGaps, TracebackOn<> >
Split-Breakpoint Detection for Right Anchor:
DPProfile<SplitAlignment<>, AffineGaps, TracebackOn<GapsRight> >
34. Support for Common File Formats
Important file formats for HTS analysis:
Sequences: FASTA, FASTQ; Indexed FASTA (FAI) for random access
Genomic Features: GFF 2, GFF 3, GTF, BED
Read Mapping: SAM, BAM (plus BAM indices)
Variants: VCF

SequenceStream ss("file.fa.gz");
while (!atEnd(ss))
{
    readRecord(id, seq, ss);
    cout << id << '\t' << seq << '\n';
}

BamStream bs("file.bam");
while (!atEnd(bs))
{
    readRecord(record, bs);
    cout << record.qName << '\t' << record.pos << '\n';
}

… or write your own parser: there are tutorials and helper routines for writing your own parsers.
36. Fragment Store
(Multi) Read Alignments
Read alignments can be easily imported:

std::ifstream file("ex1.sam");
read(file, store, Sam());

… and accessed as a multiple alignment, e.g. for visualization:

AlignedReadLayout layout;
layoutAlignment(layout, store);
printAlignment(svgFile, Raw(), layout, store, 1, 0, 150, 0, 36);
37. Unified Full-Text Indexing Framework
Available Indices
Suffix Trees:
• suffix array
• enhanced suffix array
• lazy suffix tree
Prefix Trie:
• FM-index
q-Gram Indices:
• direct addressing
• open addressing
• gapped

Index<TSeq, IndexEsa<> >
Index<StringSet<TSeq>, FMIndex<> >

All indices support multiple strings and external memory construction/usage.
Index Lookup Interface
All indices support the (sequential) find interface:

Finder<TIndex> finder(index);
while (find(finder, "TATAA"))
    cout << "Hit at position " << position(finder) << endl;
40. Masai read mapper
[Figure: reads vs. genome (Chr. 1, Chr. 2, …, Chr. X, "ACGCTTCATCGCCCT…"); an index of the reads (radix tree of seeds) is matched against an index of the genome (e.g. FM-index).]
The algorithm is based on the simultaneous traversal of two string indices (e.g., FM-index, enhanced suffix array, lazy suffix tree).
41. Read Mapping: Masai
Faster and more accurate than BWA and Bowtie2
Timings on a single core
43. What about multi-core implementation?
Collaboration to parallelize indices and verification algorithms in SeqAn, to speed up any applications making use of indices
44. SeqAn going parallel
GOAL
Parallelize the finder interface of SeqAn so it works on CPU and accelerators like GPU
[Slide note: will be replaced by hg18 and 10 million 20-mers]
46. SeqAn going parallel: NVIDIA GPUs
Copy needles and index to GPU
SAME count function as on CPU!
47. SeqAn going parallel
Count occurrences of 10 million 20-mers in the human genome using an FM-index:
Intel i7, 3.2 GHz:                 18.6 sec  (1x)
Intel i7, 3.2 GHz, 12 threads:      2.66 sec (7x)
Intel Xeon Phi 7120, 244 threads:   2.18 sec (8.5x)
NVIDIA Tesla K20:                   0.4 sec  (47x)
48. SeqAn going parallel
Approximate count of occurrences of 1.2 million 33-mers in the human genome using an FM-index:
Intel i7, 3.2 GHz:                 66.1 s  (1x)
Intel i7, 3.2 GHz, 12 threads:      9.0 s  (7.3x)
Intel Xeon Phi 7120, 244 threads:   3.9 s  (16.9x)
NVIDIA Tesla K20:                   3.2 s  (20.7x)
49. Part II: The details
51. CUDA preliminaries
In order to use CUDA we first had to adapt some parts of SeqAn:
• CUDA requires each function to be prefixed with the domain qualifiers __host__ or __device__ in order to generate CPU/GPU code
• We prefixed all basic template functions with a SEQAN_HOST_DEVICE macro

#ifdef __CUDACC__
#define SEQAN_HOST_DEVICE inline __device__ __host__
#else
#define SEQAN_HOST_DEVICE inline
#endif

• Static const arrays are not allowed in the way SeqAn defines them
• We replaced alphabet conversion lookup tables (e.g. Dna <--> char) by conversion functions
52. Strings
• Instead of defining a new CUDA string we simply use the Thrust library:
• It provides host_vector and device_vector classes, which are vectors with buffers in host or device memory
• However, Thrust functions are callable only from the host side
• We made both vectors accessible from SeqAn:
• SeqAn strings have to provide a set of global (meta-)functions, e.g. Value<>, resize(), …
• We simply defined the required wrapper functions for these two vectors
53. Standard Strings
• Up to here, all strings can only be used on the side of their scope
[Figure: the buffers of thrust::host_vector and seqan::String live in host memory; the buffers of thrust::device_vector and its device-side seqan::String counterpart live in device memory.]
54. Host-Device String
• How to access a device_vector from the device side?
• We could pass (POD) iterators to the kernel
• However, many SeqAn algorithms work on more complex containers
• We need the same interface of the container on the device side
• For strings we developed a so-called ContainerView (POD type)
• It provides a container interface given the begin/end pointers of a vector buffer
• The view() function creates the ContainerView object for a given device_vector
55. Host-Device String
• How to use a device_vector on the device:
[Figure: on the host side, view() creates a seqan::ContainerView from the thrust::device_vector buffer; the kernel launch copies the view to the device side.]
56. Device and View metafunctions
• For generic GPU programming:
• The Device metafunction returns the device-memory equivalent of a class

// Replaces String with thrust::device_vector.
template <typename TValue, typename TSpec>
struct Device<String<TValue, TSpec> >
{
    typedef thrust::device_vector<TValue> Type;
};

• The View metafunction returns the (POD) view type of a class

// Returns a view type that can be passed to a CUDA kernel.
template <typename TValue, typename TAlloc>
struct View<thrust::device_vector<TValue, TAlloc> >
{
    typedef ContainerView<thrust::device_vector<TValue, TAlloc> > Type;
};
57. Hello world
• A simple example to reverse a string on the GPU:

// A standard SeqAn string over the Dna alphabet.
String<Dna> myString = "ACGT";

// A Dna string in device global memory.
typename Device<String<Dna> >::Type myDeviceString;

// Copy the string to global memory.
assign(myDeviceString, myString);

// Pass a view of the device string to the CUDA kernel.
myKernel<<<1,1>>>(view(myDeviceString));

// TString is ContainerView<device_vector<Dna> >.
template <typename TString>
__global__ void myKernel(TString string)
{
    printf("length(string) = %d\n", (int)length(string));
    reverse(string);
}
58. Porting complex data structures
• More complex structures (e.g. Index, Graph) can only be ported to the GPU if they …
• don't use pointers
• use only strings of POD types (String<Dna>, but not String<String<…> >)
• use only 1-dimensional StringSets (ConcatDirect)
• Nested classes are no problem:
• The View metafunction converts all member types into their view types
• The view() function is called recursively on all members
63. The FM-index in SeqAn
• The FM-index can be implemented using a number of string-based lookup tables
• ... as well as other indices, e.g. enhanced suffix array, q-gram index
• There is a space-time tradeoff between all these indices
• The FM-index has the minimal memory requirements
64. A generic FM-index
• SeqAn's FM-index consists of some nested classes storing Strings
[Figure: the FM-index (host-only) as a tree of nested classes with Strings at the leaves.]
65. A generic FM-index
• The Device type of the FM-index uses device_vector instead of String
[Figure: GPU FM-index (host part).]
• The view of this object (= device part) is the same tree, where leaves are replaced by ContainerViews of device_vectors
66. CPU vs. GPU
• Invoking an FM-index based search on CPU and GPU:
• The findGPU kernel AND the findCPU function will invoke many instances of the SAME generic function, which will perform a backtracking algorithm on our generic index interface

// Select the index type.
typedef Index<DnaString, FMIndex<> > TIndex;

// Type is Index<device_vector<Dna>, FMIndex<> >.
typedef typename Device<TIndex>::Type TDeviceIndex;

// ========== On CPU ==========

// Create an index.
TIndex index("ACGTTGCAA");

// Use the FM-index on CPU.
findCPU(index, …);

template <typename TIndex>
void findCPU(TIndex & index, …);

// ========== On GPU ==========

// Create a device index.
TIndex index("ACGTTGCAA");
TDeviceIndex deviceIndex;
assign(deviceIndex, index);

// Use the FM-index in a CUDA kernel.
findGPU<<<...>>>(view(deviceIndex), …);

template <typename TIndex>
__global__ void findGPU(TIndex index, …);
67. Approximate search via backtracking

do {
    if (finder.score == finder.scoreThreshold)
    {
        if (goDown(textIt, suffix(pattern, patternIt))) delegate(finder);
        goUp(textIt);
        if (isRoot(textIt)) break;
    }
    else if (finder.score < finder.scoreThreshold)
    {
        if (atEnd(patternIt)) delegate(finder);
        else if (goDown(textIt))
        {
            finder.score += parentEdgeLabel(textIt) != value(patternIt);
            goNext(patternIt);
            continue;
        }
    }

    do {
        goPrevious(patternIt);
        finder.score -= parentEdgeLabel(textIt) != value(patternIt);
    } while (!goRight(textIt) && goUp(textIt));
    if (isRoot(textIt)) break;
    finder.score += parentEdgeLabel(textIt) != value(patternIt);
    goNext(patternIt);
} while (true);
68. Outlook for GPU support
• Our next steps are:
• Provide parallelFor() to hide the CUDA kernel call/OpenMP for-loop
• Develop classes for concurrent access (String, job queues)
• Port more indices and index iterators to be used with CUDA
• Port SeqAn's alignment module
• Develop a CPU/GPU version of the FM-index based read mapper Masai
• ...
• Follow our development:
• Sources: https://github.com/seqan/seqan/tree/develop
• Code examples: http://trac.seqan.de/wiki/HowTo/DevelopCUDA
70. Multicore parallelization
• We first introduced tags to switch between serial and parallel algorithms:

struct Serial_;
typedef Tag<Serial_> Serial;

struct Parallel_;
typedef Tag<Parallel_> Parallel;

• Then we defined basic atomic operations required for thread safety:

template <typename T>
inline T atomicInc(T &x, Serial)
{
    return ++x;
}

template <typename T>
inline T atomicInc(volatile T &x, Parallel)
{
    return __sync_add_and_fetch(&x, 1);
}
71. Splitter
• To this end, we developed the Splitter<TValue, TSpec> to compute a partition into subintervals of (almost) equal length …

Splitter<unsigned> splitter(10, 20, 3);
for (unsigned i = 0; i < length(splitter); ++i)
    cout << '[' << splitter[i] << ',' << splitter[i+1] << ')' << endl;

// [10,14)
// [14,17)
// [17,20)
72. Splitter
• The Splitter can also be used with iterators directly
• The Serial / Parallel tag divides an interval range into 1 / #thread_num many intervals

template <typename TIter, typename TVal, typename TParallelTag>
inline void arrayFill(TIter begin_, TIter end_,
                      TVal const &value, Tag<TParallelTag> parallelTag)
{
    Splitter<TIter> splitter(begin_, end_, parallelTag);

    SEQAN_OMP_PRAGMA(parallel for)
    for (int job = 0; job < (int)length(splitter); ++job)
        arrayFill(splitter[job], splitter[job + 1], value, Serial());
}

• Passing the Serial tag can be used to switch off the parallel behaviour
73. SeqAn going parallel
Count occurrences of 10 million 20-mers in the human genome using an FM-index:
Intel i7, 3.2 GHz:                 18.6 sec  (1x)
Intel i7, 3.2 GHz, 12 threads:      2.66 sec (7x)
Intel Xeon Phi 7120, 244 threads:   2.18 sec (8.5x)
NVIDIA Tesla K20:                   0.4 sec  (47x)
Thank you for your attention!
74. Upcoming GTC Express Webinars
October 23 - Revolutionize Virtual Desktops with the One Missing Piece: A Scalable GPU
October 30 - OpenACC 2.0 Enhancements for Cray Supercomputers
October 31 - Getting the Most out of NVIDIA GRID vGPU with Citrix XenServer
November 5 - Accelerating Face-in-the-Crowd Recognition with GPU Technology
November 6 - Bright Cluster Manager: A CUDA-ready Management Solution for GPU-based HPC
Register at www.gputechconf.com/gtcexpress
75. GTC 2014 Call for Posters
Posters should describe novel or interesting topics in
§ Science and research
§ Professional graphics
§ Mobile computing
§ Automotive applications
§ Game development
§ Cloud computing
Call opens October 29
www.gputechconf.com
76. Test Drive NVIDIA GPUs!
Experience the Acceleration
Develop your codes on the latest GPUs today
Sign up for a FREE GPU Test Drive on remotely hosted clusters
www.nvidia.com/GPUTestDrive