1. The document describes a new technique for storing and analyzing k-mers from large DNA datasets in a memory- and compute-efficient manner using probabilistic data structures.
2. It supports querying whether a k-mer is present, traversing the k-mer graph, and partitioning the graph into smaller disconnected components, in a way that guarantees correct "no" answers.
3. The technique has been implemented in a Python package that can partition and assemble datasets of up to 50 Gb in under a week using only 70 GB of RAM, providing roughly a 10x speed improvement over existing assemblers.
2. Assistant Professor (2008)
Computer Science & Engineering /
Microbiology and Molecular Genetics,
Michigan State University
BA Reed College/Math
PhD Caltech / Developmental Biology
Member of the Python Software Foundation
(a.k.a. awesomest programming language)
3. I’m a bit sick, so I may cough loudly and
obnoxiously at times.
4. 1. O’Reilly folk asked if I had anything to talk
about.
2. Professors love talking.
3. Nifty techniques, applied to a new problem.
1. Can they be applied to your problem?
2. Do you have any ideas for me?
10. Wisconsin
◦ Native prairie (Goose Pond,
Audubon)
◦ Long term cultivation (corn)
◦ Switchgrass rotation (previously
corn)
◦ Restored prairie (from 1998)
Iowa
◦ Native prairie (Morris prairie)
◦ Long term cultivation (corn)
Kansas
◦ Native prairie (Konza prairie)
◦ Long term cultivation (corn)
Iowa Native Prairie
Switchgrass
(Wisconsin)
Iowa >100 yr tilled
11. 30 Gb of sequence from Iowa corn
50 Gb of sequence from Iowa prairie
200 Gb of sequence from Wisconsin corn,
prairie
http://ivory.idyll.org/blog/aug-10/assembly-part-i
http://ivory.idyll.org/blog/jul-10/kmer-filtering
http://ivory.idyll.org/blog/jul-10/illumina-read-phenomenology
12. Whole (meta)genome shotgun sequencing
involves fragmenting and sequencing,
followed by re-assembly.
The shorter the reads, the more difficult this
is to do reliably.
Assembly scales poorly.
13. Randomly fragment & sequence from DNA;
reassemble computationally.
UMD assembly primer (cbcb.umd.edu)
14. Assembly is inherently an all by all process.
There is no good way to subdivide the short
sequences without potentially missing a key
connection:
15. Essentially, break reads (of any length) down into
multiple overlapping words of fixed length k.
ATGGACCAGATGACAC (k=12) =>
ATGGACCAGATG
TGGACCAGATGA
GGACCAGATGAC
GACCAGATGACA
ACCAGATGACAC
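This decomposition can be sketched in a couple of lines of Python (a toy illustration, not the package's actual implementation):

```python
def kmers(read, k=12):
    """Break a read of any length into all overlapping k-mers."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]

# The slide's example: a 16 bp read yields 5 overlapping 12-mers.
print(kmers("ATGGACCAGATGACAC"))
```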
18. For decisions about which paths to take, biology-based heuristics come into play as well.
19. Fixed-length words => great CS techniques
(hashing, trie structures, etc.)
Data loading/comparison scales with size of your
data, N.
Memory usage scales with # of unique words.
This is an advantage over other techniques
◦ NxN comparisons…
Some disadvantages, too; see review,
J.R. Miller et al. / Genomics (2010)
20. Unlike some other common computational
science problems in physics and chemistry,
which are combinatorial in nature, graph
analysis requires a lot of RAM (to store the
graph).
This leads to the mildly unusual HPC scaling
issue of RAM as a limiting factor.
…and RAM is expensive.
21. What if we knew which original genomes our short
sequences came from?
Then we could just put all the sequences that
came from a particular genome in a smaller
bin, and assemble that independently!
23. What if we knew which original genomes our short
sequences came from?
Then we could just put all the sequences that
came from a particular genome in a smaller
bin, and assemble that independently!
Unfortunately this is already equivalent to
solving the hard component of the assembly
problem…
24. Q: is this k-mer present in the data set?
A: no => then it is not.
A: yes => it may or may not be present.
This lets us store k-mers efficiently.
25. Once we can store/query k-mers efficiently in
this oracle, we can build additional oracles on
top of it:
26. Q: does this k-mer overlap with this other k-
mer?
A: no => then it does not, guaranteed.
A: yes => it may or may not.
This lets us traverse k-mer graphs efficiently.
27. Conveniently, perhaps
the simplest data
structure in computer
science is what we
need…
…a hash table that
ignores collisions.
Note, P(false positive) =
fractional occupancy.
28. If you ignore collisions…
O(1) query, insertion, update
Fixed memory usage
Ridiculously simple to implement (although
developing a good hash function can take
some effort)
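As a minimal sketch (not the actual C++ implementation), a collision-ignoring hash set fits in a dozen lines of Python; the class name and the use of Python's built-in `hash` are illustrative assumptions:

```python
class CollisionIgnoringSet:
    """A hash table that ignores collisions: fixed memory, O(1)
    insert/query. 'No' answers are guaranteed correct; 'yes' answers
    may be false positives, with P(false positive) = fractional
    occupancy of the table."""

    def __init__(self, size):
        self.bits = bytearray(size)  # fixed memory, chosen up front

    def add(self, kmer):
        self.bits[hash(kmer) % len(self.bits)] = 1

    def __contains__(self, kmer):
        return self.bits[hash(kmer) % len(self.bits)] == 1

oracle = CollisionIgnoringSet(1 << 20)
oracle.add("ATGGACCAGATG")
print("ATGGACCAGATG" in oracle)  # True -- inserted k-mers always hit
```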
30. Use a Bloom filter approach – multiple oracles,
in serial, are multiplicatively more reliable.
http://en.wikipedia.org/wiki/Bloom_filter
31. Adding additional filters increases discrimination
at the cost of speed.
This gives you a fairly straightforward tradeoff:
memory (decrease individual false positives) vs
computation (more filters!)
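The multiple-filter idea can be sketched as a standard Bloom filter; the hash construction and sizes below are illustrative assumptions, not the package's actual scheme:

```python
import hashlib

class BloomFilter:
    """Multiple hash functions in serial over one bit array: each
    extra hash function multiplies the reliability of a 'yes' answer,
    at the cost of extra computation per insert and per query."""

    def __init__(self, size, num_hashes):
        self.bits = bytearray(size)
        self.num_hashes = num_hashes

    def _slots(self, kmer):
        # Derive independent hash values by seeding SHA-1 (sketch only).
        for seed in range(self.num_hashes):
            digest = hashlib.sha1(f"{seed}:{kmer}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % len(self.bits)

    def add(self, kmer):
        for slot in self._slots(kmer):
            self.bits[slot] = 1

    def __contains__(self, kmer):
        # Reported present only if every filter says yes.
        return all(self.bits[slot] for slot in self._slots(kmer))

bf = BloomFilter(size=1 << 16, num_hashes=3)
bf.add("ACGTGGCAGG")
print("ACGTGGCAGG" in bf)  # True
```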
35. We can now ask, “does k-mer
ACGTGGCAGG… occur in the data set?”,
quickly and accurately.
This implicitly lets us store the graph
structure, too!
36. Once you can look up k-mers quickly, traversal
is easy: there are only 8 possible overlapping
k-mers:
4 before, and 4 after.
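Enumerating those 8 candidates is a one-liner; traversal then reduces to querying each one against the membership oracle (toy sketch):

```python
def neighbors(kmer):
    """All 8 k-mers that can overlap `kmer` by k-1 bases:
    4 extending before, 4 extending after."""
    before = [base + kmer[:-1] for base in "ACGT"]
    after = [kmer[1:] + base for base in "ACGT"]
    return before + after

# 8 candidates; only those present in the data set are real edges.
print(neighbors("ACGT"))
```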
37. We can now ask, “does k-mer
ACGTGGCAGG… occur in the data set?”,
quickly and accurately.
This implicitly lets us store the graph
structure, too, because there are only 8
possible connected nodes.
We can now traverse this graph structure and
ask several types of questions:
55. Nodes will never be erroneously disconnected.
This is critically important: it guarantees that our
k-mer graph representation yields reliable “no”
answers.
This, in turn, lets us reliably partition graphs into
smaller graphs…
…and we can do so iteratively.
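Given that guarantee, partitioning is just breadth-first traversal from each unvisited k-mer, trusting the oracle's "no" answers. An illustrative sketch (`present` stands in for the probabilistic membership query; the data is a toy example):

```python
from collections import deque

def neighbors(kmer):
    return [b + kmer[:-1] for b in "ACGT"] + [kmer[1:] + b for b in "ACGT"]

def component(start, present):
    """All k-mers reachable from `start` via overlaps, where `present`
    is the membership oracle. Because 'no' answers are always correct,
    no node of the true component is ever left out; false positives
    can only merge components, never split them."""
    seen = {start}
    todo = deque([start])
    while todo:
        for nb in neighbors(todo.popleft()):
            if nb not in seen and present(nb):
                seen.add(nb)
                todo.append(nb)
    return seen

# Toy data set: three consecutive 5-mers of one read.
data = {"ATGGA", "TGGAC", "GGACC"}
print(sorted(component("ATGGA", lambda km: km in data)))
# -> ['ATGGA', 'GGACC', 'TGGAC']
```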
57. 1. Built lightweight probabilistic data
structure/algorithm for k-mer storage.
- Constant memory, constant lookup
- Linear time to create structure
2. Implemented systematic graph traversal of
arbitrarily large graphs (> ~3 billion connected
k-mers, so far)
- Affine memory (with small linear constant)
- Bounded time for exploration; bound traded for
memory
3. Built partitioning system to eliminate small
graphs and extract disconnected graphs.
60. Python wrapping C++, ~5000 LoC. (Python handles
parallelization; go free, GIL!)
Partitioning & assembling 2 Gb data set can be done in ~8 GB of RAM in < 1 day
◦ Compare with 40 GB requirement for existing (released) assemblers.
◦ Probably 10-fold speed improvement easily (KISS; no premature opt)
Can partition, assemble ~50 Gb in < 1 wk in 70 GB of RAM,
single chassis, 8 CPU.
Not yet clear how well it scales to 200 Gb, but should…
…all of this is running on Amazon cloud rentals.
61. Lightweight probabilistic storage system for
k-mers, ~1 byte / k-mer.
Large graph traversal (10-20 bn k-mers)
◦ Tabu search
◦ Neighborhood exclusion
Graph partitioning, trimming, grokking.
◦ Iterative refinement is “perfect”
◦ Failure rate ~ memory usage, with good failover (connectivity increases).
62. More general assembly graph analysis
Breaking graphs in good places
Clustering of large protein similarity graphs/matrices
Caveats:
Preferential attachment with false positives?
First publication --
Bloom counting hash (see kmer-filtering blog post)
63. We were lucky & could turn our graph traversal
problem into a set membership query.
Tabu search / neighborhood exclusion for
exhaustive graph traversal isn’t novel, but might
be useful. Requires systematic tagging.
But… random and probabilistic approaches (skip
lists, Bloom filters, etc.) can be surprisingly
useful.
◦ One sided errors are awesome for Big Data.
http://en.wikipedia.org/wiki/Category:Probabilistic_data_structures
64. GED lab / k-mer gang
Adina Howe (w/Tiedje)
Arend Hintze, postdoc
Jason Pell, grad
Rosangela Canino-Koning,
grad
Qingpeng Zhang, grad
Collaborators (MSU)
Weiming Li
Charles Ofria
Jim Tiedje
(w/Janet Jansson, Rachel
Mackelprang (JGI))
Funding
USDA NIFA, NSF, DOE,
Michigan State U.
65. ABySS assembler – multi-node assembly in RAM
On-disk assembly:
SOAP assembler (BGI) – not open source
Cortex assembler (EBI) – unpub/not released
Contrail assembler (Michael Schatz) – unpub/not
released
It’s hard for me to tell how these last three compare ;)
BUT our current approach is orthogonal and can be
used in conjunction (as a pre-filter) with these
assemblers.
Editor's Notes
Note, no tolerance for indels
Paint between the greens.
When a green connects two or more colors, recolor one color.