The document summarizes a method for mining frequent subgraphs from linear graphs. It describes: 1) Representing data like proteins, RNA and texts as linear graphs and the need for algorithms to mine frequent patterns from such graphs. 2) A method called LGM that can efficiently enumerate and mine both connected and disconnected subgraphs from linear graphs using reverse search techniques. 3) Experiments applying LGM to mine motifs from protein structures and phrases from texts, achieving better performance than existing methods.

Lgm pakdd2011 public

LGM is an algorithm that efficiently mines frequent subgraphs from a set of linear graphs. It uses a reverse search approach to enumerate all subgraphs without duplication, defining a search tree with a reduction map. By inverting the reduction map, it can extend patterns from parent to children nodes. Experiments apply LGM to mine motifs from protein structures, finding statistically significant patterns associated with thermophilic or mesophilic functions.

gSpan algorithm

gSpan is an algorithm for frequent subgraph mining that avoids two major costs of previous approaches. It represents each graph by a depth-first search (DFS) code, so that isomorphism testing reduces to comparing codes. The algorithm grows patterns by extending edges in lexicographic order and uses the anti-monotone property of support to prune infrequent subgraphs. gSpan compares the minimum DFS codes of two graphs to determine isomorphism, which turns graph comparison into a simple string comparison. This helps reduce the problem size compared with subgraph isomorphism testing.

Data Mining Seminar - Graph Mining and Social Network Analysis

Delivered a formal presentation on course material for the Data Mining (EECS 4412) course at York University, Canada, about graph mining. Graphs have become increasingly important in modeling sophisticated structures and their interactions, with broad applications including chemical informatics, bioinformatics, computer vision, video indexing, text retrieval, and Web analysis. The formal seminar was 50 to 60 minutes followed by 10 to 20 minutes for questions.
https://wiki.eecs.yorku.ca/course_archive/2014-15/F/4412
https://wiki.eecs.yorku.ca/course_archive/2014-15/F/4412/lectures

Trends In Graph Data Management And Mining

Keynote speech at Symposium on Emerging Trends in Database Technologies (ETDT), Pune Institute of Engineering and Technology, October 2004.

5.5 graph mining

Graph mining involves discovering frequent subgraphs, patterns, or substructures from a graph database. It has applications in domains like cheminformatics, bioinformatics, social network analysis, and knowledge discovery. There are two main approaches for frequent subgraph mining - Apriori-based approaches that generate candidates level-wise and pattern growth approaches that extend frequent subgraphs. The gSpan algorithm reduces redundant searching by using a depth-first search ordering of the graphs. Mining closed, maximal or dense frequent subgraphs can further reduce the number of patterns discovered. Applications include graph indexing, substructure similarity search, and graph classification or clustering.

Close Graph

The document discusses improvements made in the CloseGraph algorithm over the previous gSpan algorithm for mining frequent graph patterns. CloseGraph mines only closed frequent graphs, which significantly reduces the number of generated patterns compared to gSpan. This is done by introducing the concepts of equivalent occurrence and extended frequency counting to allow for early termination of the pattern growth. Experimental results show CloseGraph outperforms gSpan by a factor of 4 to 10 in both runtime and number of patterns generated.

Survey on Frequent Pattern Mining on Graph Data - Slides

The document discusses various approaches for graph-based data mining to identify frequently occurring subgraph patterns. It describes mathematical graph theory based approaches like Apriori-based methods, greedy search based approaches like SUBDUE and GBI, inductive logic programming approaches like WARMR and FARMER, and inductive database approaches. It also covers kernel function based approaches using support vector machines for classification.

Graph mining ppt

This document discusses graph mining, including its motivation, applications, and algorithms. Graph mining aims to discover repetitive subgraphs in graph datasets. It has many applications including analyzing chemical compounds, biological networks, program flows, and social networks. The document outlines several graph mining algorithms, including the Apriori-based FSG algorithm, the DFS-based gSpan algorithm, and the greedy Subdue algorithm. It also distinguishes between the transaction setting and single graph setting for graph mining problems.

AI Lesson 06

Iterative deepening A* (IDA*) is an informed search algorithm similar to iterative deepening depth-first search, but it uses an f-limit instead of a depth limit. Each iteration is a depth-first search that expands only nodes whose f-value does not exceed the f-limit. After each iteration the f-limit is raised to the minimum f-value of any node pruned in that iteration. With an admissible heuristic, IDA* is complete and optimal and requires less space than A*, but it can expand many more nodes on problems where most nodes have distinct f-values. Local search methods like hill climbing iteratively improve the current state by moving to a neighboring state with a better value until no improvement is possible.
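
The f-limit loop described above can be sketched in a few lines of Python. This is a minimal illustration and not code from the slides: the function name `ida_star`, the toy `GRAPH`, and the heuristic table `H` are all assumptions made for the example, and the heuristic is assumed to be admissible.

```python
import math

# Hypothetical toy graph: node -> list of (neighbor, edge_cost).
GRAPH = {
    "A": [("B", 1), ("C", 4)],
    "B": [("C", 2), ("D", 5)],
    "C": [("D", 1)],
    "D": [],
}
# Hypothetical admissible heuristic: never overestimates the cost to "D".
H = {"A": 3, "B": 2, "C": 1, "D": 0}


def ida_star(start, goal, neighbors, h):
    """Return (path, cost) for a cheapest path, or (None, inf) if none exists."""

    def search(path, g, limit):
        node = path[-1]
        f = g + h(node)
        if f > limit:
            return f, None          # pruned: report the f-value that exceeded the limit
        if node == goal:
            return g, list(path)
        smallest = math.inf         # smallest f-value among pruned descendants
        for nxt, cost in neighbors(node):
            if nxt in path:         # avoid cycles along the current path
                continue
            path.append(nxt)
            t, found = search(path, g + cost, limit)
            path.pop()
            if found is not None:
                return t, found
            smallest = min(smallest, t)
        return smallest, None

    limit = h(start)
    while True:
        t, found = search([start], 0, limit)
        if found is not None:
            return found, t
        if t == math.inf:           # nothing was pruned, so no path exists
            return None, math.inf
        limit = t                   # raise the f-limit to the smallest pruned f-value


path, cost = ida_star("A", "D", lambda n: GRAPH[n], lambda n: H[n])
```

Only the current path is kept in memory, which is where the space saving over A* comes from; the price is that shallow nodes are re-expanded in every iteration.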

Example of iterative deepening search & bidirectional search

This deck gives some examples of iterative deepening search and bidirectional search, with definitions and some theory for both. If you have any questions, please ask in the comments or by mail; I will be happy to help.

Search algorithms master

•Common Problems Needs Computers
•The Search Problem
•Basic Search Algorithms
–Algorithms used for searching the contents of an array
•Linear or Sequential Search
•Binary Search
•Comparison Between Linear and Binary Search
•Algorithms for solving shortest path problems
–Sequential Search Algorithms
•Depth-First Search
•Breadth First Search
–Parallel or distributed Search Algorithms
•Parallel Depth-First Search
•Parallel Breadth First Search

09 heuristic search

This document discusses various heuristic search algorithms including A*, iterative-deepening A*, and recursive best-first search. It begins by introducing the concept of using evaluation functions to guide best-first search and preferentially expand nodes with lower heuristic values. It then presents the general graph search algorithm and describes how A* specifically reorders nodes using an evaluation function that considers path cost and estimated cost to the goal. Consistency conditions for the heuristic function are discussed which guarantee A* finds optimal solutions.

AI Lesson 05

The document summarizes informed search strategies, including best-first search algorithms like greedy search, uniform-cost search (UCS), and A* search. It provides an overview of how heuristics can be used to guide search toward more promising solutions. A* search is described as using both path cost g(n) and heuristic estimate h(n) to determine the best order of node expansion. The properties of A*, including admissibility, completeness, and optimality, are proven assuming h(n) never overestimates the cost to the goal. Performance depends on heuristic accuracy, with exponential growth possible if errors are large.

Dfs presentation

The document describes depth-first search (DFS), an algorithm for traversing or searching trees or graphs. It defines DFS, explains the process as visiting nodes by going deeper until reaching the end and then backtracking, provides pseudocode for the algorithm, gives an example on a directed graph, and discusses time complexity (O(V+E)), advantages like linear memory usage, and disadvantages like possible infinite traversal without a cutoff depth.
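
As a rough illustration of the traversal described above, here is an iterative DFS sketch in Python. It is not the pseudocode from the slides: the function name `dfs` and the small directed graph `DIGRAPH` are made up for the example. The visited set is what keeps the traversal from looping forever on the cycle D -> A.

```python
def dfs(graph, start):
    """Return nodes in depth-first visiting order; runs in O(V + E)."""
    visited = set()
    order = []
    stack = [start]
    while stack:
        node = stack.pop()
        if node in visited:
            continue
        visited.add(node)
        order.append(node)
        # Push neighbors in reverse so they are visited in their listed order.
        for nxt in reversed(graph.get(node, [])):
            if nxt not in visited:
                stack.append(nxt)
    return order


# Hypothetical directed graph containing the cycle A -> B -> D -> A.
DIGRAPH = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["A"]}
order = dfs(DIGRAPH, "A")
```

On graphs that are too large or infinite to mark every visited node, a depth cutoff is the usual substitute for the visited set.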

Solving problems by searching Informed (heuristics) Search

This document discusses various informed (heuristic) search strategies for solving problems, including greedy best-first search, A* search, and memory-bounded variations. Greedy best-first search uses the heuristic function h(n) alone to select nodes for expansion. A* search combines the path cost g(n) and heuristic estimate h(n) to select nodes, guaranteeing an optimal solution if h is admissible. The document provides examples of applying these searches to route finding between cities in Romania. A* search is identified as finding the optimal solution for this problem if using an admissible heuristic like straight-line distance.
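
The way A* orders nodes by g(n) + h(n) can be sketched as follows. This is a minimal example under stated assumptions, not the deck's own code: the function name `a_star`, the toy graph, and the heuristic values are hypothetical stand-ins for the Romania map and straight-line distances, and the heuristic is assumed to be admissible.

```python
import heapq


def a_star(graph, h, start, goal):
    """Return (path, cost) of a cheapest path, or (None, inf) if unreachable."""
    # Frontier entries are (f, g, node, path), ordered by f = g + h(node).
    frontier = [(h(start), 0, start, [start])]
    best_g = {start: 0}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, g
        if g > best_g.get(node, float("inf")):
            continue  # stale entry: a cheaper route to this node was found later
        for nxt, cost in graph.get(node, []):
            new_g = g + cost
            if new_g < best_g.get(nxt, float("inf")):
                best_g[nxt] = new_g
                heapq.heappush(frontier, (new_g + h(nxt), new_g, nxt, path + [nxt]))
    return None, float("inf")


# Hypothetical toy graph and admissible heuristic (estimated cost to "D").
GRAPH = {
    "A": [("B", 1), ("C", 4)],
    "B": [("C", 2), ("D", 5)],
    "C": [("D", 1)],
    "D": [],
}
H = {"A": 3, "B": 2, "C": 1, "D": 0}

path, cost = a_star(GRAPH, lambda n: H[n], "A", "D")
```

Replacing the priority `new_g + h(nxt)` with `h(nxt)` alone would give greedy best-first search, which is faster to write down but loses the optimality guarantee.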

Informed search (heuristics)

1) The document discusses various search algorithms including uninformed searches like breadth-first search as well as informed searches using heuristics.
2) It describes greedy best-first search which uses a heuristic function to select the node closest to the goal at each step, and A* search which uses both path cost and heuristic cost to guide the search.
3) Genetic algorithms are introduced as a search technique that generates successors by combining two parent states through crossover and mutation rather than expanding single nodes.

Jarrar: Informed Search

Lecture slides by Mustafa Jarrar at Birzeit University, Palestine.
See the course webpage at: http://jarrar-courses.blogspot.com/2012/04/aai-spring-jan-may-2012.html
and http://www.jarrar.info
and on Youtube:
http://www.youtube.com/watch?v=aNpLekq6-oA&list=PL44443F36733EF123

Lecture 08 uninformed search techniques

This document discusses various uninformed search techniques including breadth-first search (BFS), depth-first search (DFS), uniform cost search, and others. It provides descriptions of each technique including concepts, properties, advantages, and disadvantages. Uniform cost search is described as expanding nodes in order of cost from the source to ensure the lowest cost node is selected, making it complete and optimal/admissible.

A star algorithms

The document describes best first search algorithms. It discusses how best first search algorithms work by always selecting the most promising path based on a heuristic function. The algorithm expands the node closest to the goal at each step. The document provides pseudocode for the best first search algorithm and discusses its advantages of being more efficient than breadth-first and depth-first search, but that it can also get stuck in loops like depth-first search. An example of applying best first search to a problem is given.

DFS and BFS

This document discusses and provides examples of depth-first search (DFS) and breadth-first search (BFS) algorithms for traversing graphs. It explains that DFS involves recursively exploring all branches of the graph as deep as possible before backtracking, while BFS involves searching the neighbors of the starting node first before moving to the next level. Examples are given showing the step-by-step process of applying DFS and BFS to traverse graphs and mark visited vertices.
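
To make the contrast concrete, the sketch below runs both traversals on the same small graph. The function names and the graph `G` are hypothetical and chosen only for illustration; they are not taken from the slides.

```python
from collections import deque


def bfs(graph, start):
    """Visit nodes level by level, nearest to the start first."""
    visited = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in graph.get(node, []):
            if nxt not in visited:
                visited.add(nxt)   # mark when enqueued so each node is queued once
                queue.append(nxt)
    return order


def dfs_recursive(graph, node, visited=None):
    """Follow each branch as deep as possible before backtracking."""
    if visited is None:
        visited = []
    visited.append(node)
    for nxt in graph.get(node, []):
        if nxt not in visited:
            dfs_recursive(graph, nxt, visited)
    return visited


# Hypothetical graph: A points to B and C, both of which point to D.
G = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["A"]}
bfs_order = bfs(G, "A")
dfs_order = dfs_recursive(G, "A")
```

BFS reaches C before D because both B and C are one step from A, while DFS follows A, B, D to the end of that branch before returning to C.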

Lecture 12 Heuristic Searches

The document describes heuristic search algorithms, including best-first search and branch-and-bound search. Best-first search maintains a priority queue of nodes and expands the node with the lowest cost function value first. Branch and bound finds the optimal solution by keeping track of the best solution found so far and abandoning partial solutions that cannot improve on it. It uses pruning to reduce the number of explored nodes. Both algorithms use concepts like traversing the root node and its neighbors in ascending order of distance from the root until reaching the goal node.

Functions

The document defines and explains key concepts related to functions:
- A function relates an input to an output and maps elements from its domain to its codomain.
- For a function to be valid, each input can only map to one output.
- Functions have properties like being one-to-one (injective), onto (surjective), or both (bijective).
- Other topics covered include the domain, codomain, range, and examples of floor, ceiling, integer and absolute value functions.
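
The properties listed above can be checked mechanically for a finite function. The sketch below represents a function as a Python dict from inputs to outputs, which already enforces the rule that each input maps to exactly one output. The helper names `is_injective`, `is_surjective`, and `is_bijective` are hypothetical and used only for this example.

```python
import math


def is_injective(mapping):
    """One-to-one: no two inputs share the same output."""
    values = list(mapping.values())
    return len(values) == len(set(values))


def is_surjective(mapping, codomain):
    """Onto: every element of the codomain is hit by some input."""
    return set(codomain) <= set(mapping.values())


def is_bijective(mapping, codomain):
    return is_injective(mapping) and is_surjective(mapping, codomain)


f = {1: "a", 2: "b", 3: "a"}   # onto {"a", "b"} but not one-to-one
g = {1: "a", 2: "b"}           # one-to-one but not onto {"a", "b", "c"}

# Floor and ceiling round toward negative and positive infinity respectively.
floor_value = math.floor(-2.5)
ceil_value = math.ceil(-2.5)
```

Here the range of `g` is {"a", "b"}, a proper subset of its codomain {"a", "b", "c"}, which is exactly why it fails to be surjective.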

GATE Computer Science Solved Paper 2004

GATE Computer Science solved paper with solutions. Free practice for GATE Computer Science with Avatto.com. Prepare for GATE Aptitude at Avatto.com.

Lecture13

This document discusses graph algorithms and graph search techniques. It begins with an introduction to graphs and their representations as adjacency matrices and adjacency lists. It then covers graph terminology like vertices, edges, paths, cycles, and weighted graphs. Common graph search algorithms like breadth-first search and depth-first search are explained. Variations of these algorithms like recursive depth-first search and Dijkstra's algorithm for finding shortest paths in weighted graphs are also covered. Examples are provided throughout to illustrate the concepts and algorithms.

Breadth First Search Algorithm In 10 Minutes | Artificial Intelligence Tutori...

YouTube Link: https://youtu.be/PbCl67GY1ck
** Machine Learning Engineer Masters Program: https://www.edureka.co/masters-program/machine-learning-engineer-training **
In this Edureka Session on Breadth-First Search Algorithm, we will discuss the logic behind graph traversal methods and use examples to understand the working of the Breadth-First Search algorithm.
Here’s a list of topics covered in this session:
1. Introduction To Graph Traversal
2. What is the Breadth-First Search?
3. Understanding the Breadth-First Search algorithm with an example
4. Breadth-First Search Algorithm Pseudocode
5. Applications Of Breadth-First Search

20110319 parameterized algorithms_fomin_lecture03-04

The document discusses graph minors and fixed parameter algorithms. It introduces several important concepts in fixed parameter algorithm design like treewidth, kernelization, color coding, and iterative compression. It also discusses applications of the Graph Minors Theorem to showing that certain problems are fixed-parameter tractable.

Heuristic search

Best-first search is a heuristic search algorithm that expands the most promising node first. It uses an evaluation function f(n) that scores each node n, for example by estimating the cost to reach the goal from n. Nodes are ordered in the fringe by increasing f(n). A* search is a special case of best-first search with f(n) = g(n) + h(n), where g(n) is the path cost so far; when the heuristic function h(n) is admissible, A* is guaranteed to find an optimal solution.

Pathfinding - Part 1: Α* heuristic search

These slides are part of a course about interactive objects in games. The lectures cover some of the most widely used methodologies that allow smart objects and non-player characters (NPCs) to exhibit autonomy and flexible behavior through various forms of decision making, including techniques for pathfinding, reactive behavior through automata and processes, and goal-oriented action planning. More information can be found here: http://tinyurl.com/sv-intobj-2013

Scalable Partial Least Squares Regression on Grammar-Compressed Data Matrices

Presentation slides from the Kawarabayashi ERATO Kanshasai III (appreciation festival).

Sketch sort ochadai20101015-public

The document summarizes a multiple sorting method called SketchSort for performing all pairs similarity search on large-scale datasets. It maps vector data to binary sketches to reduce memory usage, then applies locality sensitive hashing and multiple sorting to efficiently find all pairs of data points within a given distance threshold. The method is evaluated on large image, chemical compound, and genome sequence datasets and is shown to outperform other state-of-the-art similarity search methods.

Mlab2012 tabei 20120806

The document describes a workshop on machine learning and applications to biology held in Sapporo, Japan in August 2012. It focuses on presenting space-efficient data structures for large-scale chemical fingerprint searches, including multibit trees and succinct representations of trees and tries. The goal is fast similarity searches of chemical fingerprints while using less memory than pointer-based representations.

Kdd2015reading-tabei

KDD2015読み会発表資料

Dmss2011 public

This document summarizes a method for performing kernel-based similarity search in massive graph databases using wavelet trees. It introduces the need for efficient graph similarity search as graph databases grow large. It describes representing graphs as bags-of-words and using a semi-conjunctive query to relax cosine similarity searches. The method replaces inverted indexes with a wavelet tree to enable fast top-down search while using less memory than traditional inverted indexes. Experiments on a dataset of 25 million chemical compounds demonstrate the method's ability to perform similarity search efficiently in large graph databases.

Sketch sort sugiyamalab-20101026 - public

- The document describes a multiple sorting method called SketchSort for efficiently finding all pairs of similar items in large-scale datasets.
- SketchSort maps high-dimensional vector data to binary sketches while preserving distances. It then performs multiple sorting on the sketches to enumerate similar item pairs.
- Experiments show SketchSort can efficiently find neighbor pairs in large image and genetic datasets, outperforming other state-of-the-art methods. It enables applications like clustering and information retrieval in big data domains.

DCC2014 - Fully Online Grammar Compression in Constant Space

FREQ_FOLCA and LOSSY_FOLCA are variants of FOLCA that work in constant space by removing infrequent production rules from the hash table. FREQ_FOLCA divides text into blocks and removes the lowest frequency rules each time the hash table reaches a size limit. LOSSY_FOLCA divides text into blocks and keeps rules for successive blocks based on frequency. Experiments show they can compress 100 human genomes totaling 306GB in about one day while using only a few dozen megabytes of working space.

GIW2013

The document summarizes research on developing a scalable method for predicting compound-protein interactions using minwise hashing. Key points:
- Minwise hashing is used to build compact fingerprints from high-dimensional fingerprints of compound-protein pairs, reducing memory and training time compared to previous methods.
- Linear support vector machines trained on the compact fingerprints achieve similar prediction accuracy as previous nonlinear methods, while requiring less memory and training faster, especially on large datasets of 216 million compound-protein pairs.
- Experiments show the proposed method, MH-L1SVM and MH-L2SVM, outperform baselines in training time while maintaining predictive performance, and it can extract important predictive features.

Gwt presen alsip-20111201

The document describes using a wavelet tree data structure to enable fast similarity searches of massive graph databases. A Weisfeiler-Lehman procedure is used to represent graphs as bags-of-words. The wavelet tree indexes these bags-of-words and allows semi-conjunctive queries to find graphs sharing a minimum number of words with a query graph in sublinear time. Experiments on 25 million molecular graphs showed the approach significantly outperformed inverted indexes in search time and memory usage.

CPM2013-tabei201306

This document summarizes research presented at the 24th Annual Symposium on Combinatorial Pattern Matching. It discusses three open problems in optimally encoding Straight Line Programs (SLPs), which are compressed representations of strings. The document presents information theoretic lower bounds on SLP size and describes novel techniques for building optimal encodings of SLPs in close to minimal space. It also proposes a space-efficient data structure for the reverse dictionary of an SLP.

SPIRE2013-tabei20131009

FOLCA is a fully-online grammar compression method that builds a partial parse tree in an online manner and directly encodes it into a succinct representation using just nlgn+2n+o(n) bits of space. This is asymptotically optimal. It achieves small working space of (1+α)nlgn+n(3+lg(αn)) bits using a compressed hash table. It can extract substrings in O(l+h) time using extra space of nlg(N/n)+3n+o(n) bits. Experiments show it compresses and extracts faster than LZend while using less space.

WABI2012-SuccinctMultibitTree

This document summarizes a presentation on succinct representations of multibit trees for efficient chemical fingerprint searches. It describes:
1) Using succinct data structures like rank/select dictionaries and LOUDS representations to compactly encode multibit trees and fingerprint databases in memory.
2) Two approaches for compactly representing fingerprint databases - a variable-length array and succinct trie.
3) How the succinct representations allow fast similarity searches on large chemical fingerprint datasets while using less memory than pointer-based representations.

Gwt sdm public

(1) The document describes a method for efficient similarity search in massive graph databases using wavelet trees. (2) It converts graphs into bags-of-words representations using the Weisfeiler-Lehman procedure and indexes the words with a wavelet tree to enable fast semi-conjunctive queries. (3) Experiments on 25 million chemical compounds showed the method was significantly faster than alternative approaches while using less memory.

Jayant lrs

The document introduces lexicographic reverse search (LRS), an algorithm for enumerating the vertices of a polyhedron given its H-representation. LRS uses a lexicographic pivoting rule within the simplex method and traces all possible paths in reverse to enumerate all vertices. It was developed by Avis and Fukuda to improve upon the original reverse search algorithm. The document outlines the key concepts behind LRS, including lexicographic ordering of vectors, the lexicographic simplex method, and its applications in areas like convex hull problems and linear programming.

CSMR11b.ppt

This document summarizes a research paper on identifying micro-architectures in evolving object-oriented software systems. The paper presents an approach called SGFinder that models class diagrams as labeled graphs and defines micro-architectures as connected induced subgraphs. SGFinder efficiently enumerates all micro-architectures up to a given size. The paper applies SGFinder to two open-source systems and analyzes the identified micro-architectures to find those that are particularly fault-prone, fault-free, stable or change-prone. The results provide insights into common micro-architecture patterns and their relationships to quality attributes.

Effective community search_dami2015

Community search is the problem of finding a good community for a given set of query vertices.
In this work we propose a novel method that it is in general more efficient and effective than state-of-the art, it can handle multiple query vertices, it yields optimal communities, and it is parameter free.

Lp Boost

Lp Boost

Scalable Partial Least Squares Regression on Grammar-Compressed Data Matrices

Scalable Partial Least Squares Regression on Grammar-Compressed Data Matrices

Sketch sort ochadai20101015-public

Sketch sort ochadai20101015-public

Mlab2012 tabei 20120806

Mlab2012 tabei 20120806

Kdd2015reading-tabei

Kdd2015reading-tabei

Dmss2011 public

Dmss2011 public

Ibisml2011 06-20

Ibisml2011 06-20

Sketch sort sugiyamalab-20101026 - public

Sketch sort sugiyamalab-20101026 - public

DCC2014 - Fully Online Grammar Compression in Constant Space

DCC2014 - Fully Online Grammar Compression in Constant Space

GIW2013

GIW2013

Gwt presen alsip-20111201

Gwt presen alsip-20111201

CPM2013-tabei201306

CPM2013-tabei201306

SPIRE2013-tabei20131009

SPIRE2013-tabei20131009

WABI2012-SuccinctMultibitTree

WABI2012-SuccinctMultibitTree

Gwt sdm public

Gwt sdm public

NIPS2013読み会: Scalable kernels for graphs with continuous attributes

NIPS2013読み会: Scalable kernels for graphs with continuous attributes

Jayant lrs

Jayant lrs

CSMR11b.ppt

CSMR11b.ppt

20110501 csseminar rybalkin_substructure_search

20110501 csseminar rybalkin_substructure_search

Effective community search_dami2015

Effective community search_dami2015

- 1. Max Planck Institute @ Tübingen, Feb. 25th, 2010. Mining Frequent Subgraphs from Linear Graphs. Yasuo Tabei, Computational Biology Research Center, AIST. Joint work with Daisuke Okanohara (Univ. of Tokyo), Shuichi Hirose (AIST), Koji Tsuda (AIST).
- 3. Frequent Subgraph Mining: enumerate all frequent subgraphs in a graph database. Input: graph database G={g1,g2,…,gN}. Output: frequent subgraphs appearing in at least m graphs.
- 4. gSpan algorithm (Yan et al., 2002): rightmost pattern extension. Duplication can happen, so minimum DFS code checking is required, which takes time exponential in the pattern size.
- 5. Linear Graph (Davydov et al., 2004): a labeled graph whose vertices are totally ordered. Linear graph g=(V,E,LV,LE), where V⊂N is the ordered vertex set, E⊆V×V is the edge set, LV: V→ΣV is the vertex labeling, and LE: E→ΣE is the edge labeling. Ex) RNA, protein, alternative splicing forms, predicate-argument structures (PAS). (Figure: a linear graph on vertices 1 to 6 with vertex labels A, B, A, B, C, A.)
- 9. Predicate-argument structure (figure)
- 10. Linear Subgraph Relation: g1 is a linear subgraph of g2 ⇔ (i) the ordinary subgraph condition holds (the vertex labels match, and all edges of g1 also exist in g2 with the correct labels), and (ii) the order of the vertices is conserved. (Figure: an example with g1 ⊂ g2.)
- 11. Example of Not Linear Subgraph: g1 is not a linear subgraph of g2. The vertex labels match and all edges of g1 also exist in g2 with the correct labels, but the order of the vertices is not conserved. (Figure: an example with g1 ⊄ g2.)
- 12. Total order among edges in a linear graph: compare the left nodes first; if they are identical, look at the right nodes. ∀ e1=(i,j), e2=(k,l) ∈ Eg: e1 <e e2 if and only if (i) i < k, or (ii) i = k and j < l.
- 15. Alternative splicing (figure)
- 17. Enumeration of All Linear Subgraphs of a Linear Graph: before considering a mining algorithm, we have to solve the problem of subgraph enumeration first. How can all subgraphs of a given linear graph be enumerated without duplication?
- 18. Search Lattice of All Subgraphs (figure: subgraphs arranged in levels by number of edges, starting from the empty graph)
- 19. Reverse Search (Avis and Fukuda, 1993): all subgraphs can be enumerated by traversing the search lattice, but preventing duplication is difficult. We therefore need to define a search tree within the search lattice. The reduction map f maps a child to its parent by removing the largest edge.
- 20. Search Tree induced by the reduction map: applying the reduction map to each element of the lattice induces a search tree rooted at the empty graph.
- 22. Pattern Extension Rule. Cases, by the number of vertices added together with the new edge: 0-vertex addition (A-1); 1-vertex addition (B-1 to B-7); 2-vertex addition (C-1 to C-6). The figure marks the largest edge of the parent graph, the newly added edge, and the position i.
- 23. Traversing the search tree from the root: depth-first traversal is used for its memory efficiency.
- 27. No edge labels. Rank the patterns by statistical significance (p-values): association with the thermophilic/mesophilic label, measured by Fisher's exact test.
- 28. Applying gSpan: we want to compare the execution time of our algorithm with that of gSpan, but gSpan is not directly applicable because contact maps are not always connected. We therefore made 1-gap and 2-gap linear graphs.
- 30. The execution time of LGM is reasonable. gSpan does not work on the 2-gap linear graph dataset even if the minimum support threshold is 50.
- 31. Minimum support = 10. 103 patterns have a p-value < 0.001. Thermophilic (TATA), mesophilic (pol II).
- 32. Mapping motifs in 3D structure
- 36. Classification Accuracy: the accuracy of LGM is better than that of gSpan. The PAS representation is comparable to the other representations.
- 39. Other topics: alignment algorithms for RNA sequences (Ph.D. study); all pairs similarity search method (nearest neighbor graphs).
- 40. Q & A
- 41. Data represented as linear graphs: DNA, RNA, protein 3D structure, predicate-argument structure. Reference point: 5' end (DNA, RNA), N-terminus (protein). Ex) RNA; protein (edge: 5Å). (Figure: an RNA and a protein drawn as linear graphs on vertices 1 to 4.)
- 43. Rightmost pattern extension: a graph is extended from a vertex on the rightmost path. (Figure: example extensions on vertices v1 to v4.)
- 44. What is a code for an edge? A code assigned to an edge in a graph: a set of label ids, vertex labels, edge ids. Ex) (vertex id1, vertex id2, vertex id1 label, vertex id2 label, edge label).
- 46. Motif extraction: to extract protein 3D motifs, we use Fisher's exact test. The p-value is computed as the sum of the probabilities of all tables that are more extreme than the observed table. We ranked the frequent subgraphs according to their p-values and focused on a pair of proteins, TATA-binding protein and human pol II promoter protein. Table 1: 2×2 contingency table.
- 47. (Diagram) Annotated data (Ex: DNA, protein, RNA, etc.; ATGGGGCCCCGGC: gene, VHLTPEEKKVVVK: protein) → learning → model (Ex: HMM, SCFG, etc.) → prediction on unannotated data (VHLTPEEKKVVVK ?, GGCCGGCCGGCCC ?), with feedback.
- 48. Algorithms for prediction and learning are based on Dynamic Programming (DP). Ordering in linear graphs is useful for designing DP algorithms.
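
The definitions on slides 5, 10, 11 and 12 can be sketched in Python as follows. This is a minimal illustration, not the LGM implementation: the dictionary layout and the names `make_linear_graph`, `sorted_edges` and `is_linear_subgraph` are hypothetical, and the subgraph test is a brute-force search over order-preserving vertex maps, so it is only practical for small graphs.

```python
from itertools import combinations


def make_linear_graph(vertex_labels, edges):
    """Build a linear graph (hypothetical layout, for illustration only).

    vertex_labels: sequence of labels; vertex i (1-based) has label
                   vertex_labels[i - 1], so the vertices are totally ordered.
    edges: dict mapping (i, j) with i < j to an edge label.
    """
    n = len(vertex_labels)
    for (i, j) in edges:
        assert 1 <= i < j <= n, "edges must join two distinct existing vertices"
    return {"labels": list(vertex_labels), "edges": dict(edges)}


def sorted_edges(g):
    """Edges in the total order of slide 12: left node first, then right node.

    Python compares tuples lexicographically, which is exactly that order.
    """
    return sorted(g["edges"])


def is_linear_subgraph(g1, g2):
    """Return True if g1 is a linear subgraph of g2 (slide 10).

    Every increasing choice of len(g1) vertices of g2 is an order-preserving
    candidate map; we accept the first one that matches all labels and edges.
    """
    n1, n2 = len(g1["labels"]), len(g2["labels"])
    for image in combinations(range(1, n2 + 1), n1):
        # image[k] is the g2 vertex assigned to vertex k + 1 of g1.
        if any(g1["labels"][k] != g2["labels"][image[k] - 1] for k in range(n1)):
            continue
        if all(
            (image[i - 1], image[j - 1]) in g2["edges"]
            and g2["edges"][(image[i - 1], image[j - 1])] == label
            for (i, j), label in g1["edges"].items()
        ):
            return True
    return False
```

For example, a two-vertex pattern A, A joined by an edge labeled `a` is a linear subgraph of a graph with labels A, B, A, B, C, A that contains the edge (1, 3) labeled `a`. A pattern A, B is not a linear subgraph of a graph B, A even when the edge between the two vertices matches, because the vertex order is reversed.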
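
The reverse search idea of slides 19 to 23 can be illustrated on a simplified problem: enumerating every edge subset of one fixed linear graph exactly once. This sketch makes several assumptions that are not in the slides. A subgraph is represented as a sorted tuple of edges, the function names are hypothetical, and the extension step simply adds any edge larger than the current largest one. The actual LGM extension rules (slide 22) also add vertices and work on patterns across a graph database, which this sketch does not attempt.

```python
def parent(subgraph):
    """Reduction map f: remove the largest edge of a non-empty subgraph."""
    return subgraph[:-1]


def children(subgraph, all_edges):
    """Invert the reduction map: add one edge larger than the largest edge.

    Because the added edge becomes the new largest edge, applying `parent`
    to any child returns `subgraph`, so each subgraph has a unique parent.
    """
    largest = subgraph[-1] if subgraph else None
    for edge in all_edges:
        if largest is None or edge > largest:
            yield subgraph + (edge,)


def enumerate_subgraphs(edges):
    """Depth-first traversal of the search tree rooted at the empty graph."""
    all_edges = sorted(edges)  # tuple order: left node first, then right node
    stack = [()]
    while stack:
        subgraph = stack.pop()
        yield subgraph
        stack.extend(children(subgraph, all_edges))
```

Since every subgraph is reached only through its unique parent, no duplicate check is needed, and the traversal keeps only the stack of pending tree nodes rather than a table of visited subgraphs.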
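
The p-value computation described on slides 27 and 46 can be sketched as below. The slides do not specify the exact variant of Fisher's exact test, so this sketch assumes the common two-sided definition: sum the hypergeometric probabilities of all tables with the same margins that are no more probable than the observed one. The cell layout (rows: pattern present or absent; columns: thermophilic or mesophilic) and the function names are assumptions for illustration.

```python
from math import comb


def table_probability(a, b, c, d):
    """Hypergeometric probability of the 2x2 table [[a, b], [c, d]]
    given its row and column totals."""
    n = a + b + c + d
    return comb(a + b, a) * comb(c + d, c) / comb(n, a + c)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test p-value (assumed variant, see lead-in).

    a, b: graphs containing the pattern in class 1 / class 2
    c, d: graphs not containing the pattern in class 1 / class 2
    """
    row1, row2, col1 = a + b, c + d, a + c
    p_observed = table_probability(a, b, c, d)
    lowest = max(0, col1 - row2)   # smallest feasible top-left cell
    highest = min(row1, col1)      # largest feasible top-left cell
    p_value = 0.0
    for x in range(lowest, highest + 1):
        p = table_probability(x, row1 - x, col1 - x, row2 - (col1 - x))
        # Small relative tolerance so ties with the observed table are counted.
        if p <= p_observed * (1 + 1e-9):
            p_value += p
    return p_value
```

Patterns can then be ranked by ascending p-value. For large databases a tested library routine would be a safer choice than this direct summation.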