Suffix trees and suffix arrays are data structures for efficient string matching and text indexing. A suffix tree finds a pattern in O(m) time, where m is the pattern length, by traversing the tree. A suffix array stores the suffixes in sorted order and, with LCP information, supports pattern search in O(m + log n) time, where n is the text length. Both structures can be built in O(n) time and space. They find applications in bioinformatics, data compression, and other string algorithms.
1. INDIAN INSTITUTE OF TECHNOLOGY (BHU),
VARANASI
CS-4301: SEMINAR / GROUP DISCUSSION
By:
HARSHIT AGARWAL
11100EN006
2. Suffix Tree & Suffix Array
Construction
Pattern Searching
Space & Time Complexity
Applications
(In a Nutshell …)
3. Motivation
• Need a fast text-searching algorithm with low space cost.
• DNA sequences and protein sequences are too large
to search with traditional algorithms:
• Rabin-Karp
• Naive string matching
• Some improved algorithms perform efficiently:
• KMP and BM (Boyer-Moore) algorithms for string matching.
5. Definition
A suffix tree (also called a PAT tree) is a compressed trie
containing all the suffixes of the given text.
• Properties (for a text S of length n):
• Each tree edge is labeled by a substring of S.
• Each internal node except the root has >= 2 children.
• No two edges branching out from the same internal node can start
with the same character.
• Each suffix S(i) has its corresponding labeled path from the root to a leaf, for
1 <= i <= n.
• There are n leaves.
9. Construction of Suffix Tree
Uniqueness :
(1) Here we preprocess the text string instead of the pattern.
(2) Each suffix string is padded with a terminal symbol not seen
in the string (usually denoted by $). This ensures that no suffix
is a prefix of another.
10. Naïve Method
ALGORITHM:
1) Put suffix S[1, m] into the tree.
2) Then put S[i, m] into the tree for 2 <= i <= m.
Naïve method: O(m^2) (m = text size)
while suffixes remain:
add the next shorter suffix to the tree
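The naïve insertion loop above can be sketched in Python. This is only an illustration under stated assumptions: it builds an uncompressed suffix trie out of nested dictionaries (a real suffix tree would additionally merge chains of single-child nodes into one labeled edge), it uses 0-based start positions, and the names build_suffix_trie and count_leaves are hypothetical.

```python
def build_suffix_trie(text, terminal="$"):
    """Insert every suffix of text + terminal into a nested-dict trie.

    Uncompressed sketch: one node per character, so time and space
    are quadratic in the text length, matching the naive bound.
    """
    if terminal in text:
        raise ValueError("terminal symbol must not occur in the text")
    s = text + terminal
    root = {}
    for start in range(len(s)):      # longest suffix first, then shorter ones
        node = root
        for ch in s[start:]:         # follow the existing path, extend it when needed
            node = node.setdefault(ch, {})
        node[None] = start           # mark the leaf with the suffix start (0-based)
    return root


def count_leaves(node):
    """Count leaves; the terminal symbol guarantees one leaf per suffix."""
    if None in node:
        return 1
    return sum(count_leaves(child) for child in node.values())


trie = build_suffix_trie("banana")
print(count_leaves(trie))  # 7: six suffixes of "banana" plus the suffix "$"
```

Because of the terminal symbol no suffix is a prefix of another, so every suffix ends at its own leaf.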
12. Ukkonen’s Algorithm
• Build the suffix tree incrementally from left to right.
• Build the tree in m phases, one for each character.
• At the end of phase i, we will have tree T’i, which is the tree
representing the prefix S[1..i].
• Build “implicit” suffix trees (no end-of-string marker).
• Extend the implicit suffix tree from the previous step.
• Convert the implicit suffix tree to an “explicit” one in the final step.
• Three extension rules are involved:
• ADD a new edge if new character.
• SPLIT the existing edge if character already present.
• DO NOTHING if the current suffix is already present.
20. Shortcuts: Improve the complexity
• Suffix Links
• Skip and Count Trick
• Edge Label Compressions
• A Stopper
21. Pattern Searching
Idea:
Every pattern that is present in the text (that is, every substring
of the text) must be a prefix of one of its suffixes.
Algorithm:
• Starting from the first character of the pattern and the root of the suffix
tree, do the following for every character:
• For the current character of the pattern, if there is an edge from the
current node of the suffix tree, follow the edge.
• If there is no edge, print “pattern doesn’t exist in text” and return.
• If all characters of the pattern have been processed, i.e., there is a
path from the root for the characters of the given pattern, then print
“Pattern found”.
Complexity: order n (size of pattern) rather than order m (size of text)
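The walk from the root described above might look as follows in Python. It is a minimal sketch that assumes an uncompressed suffix trie of nested dictionaries in place of a full suffix tree; the function names are hypothetical, and the lookup returns a boolean instead of printing a message.

```python
def build_suffix_trie(text, terminal="$"):
    """Naive uncompressed suffix trie (assumption: stands in for a suffix tree)."""
    s = text + terminal
    root = {}
    for start in range(len(s)):
        node = root
        for ch in s[start:]:
            node = node.setdefault(ch, {})
    return root


def contains(trie, pattern):
    """Follow one edge per pattern character, starting at the root."""
    node = trie
    for ch in pattern:
        if ch not in node:
            return False     # no edge for this character: pattern is absent
        node = node[ch]
    return True              # all characters consumed: pattern is a prefix of some suffix


trie = build_suffix_trie("banana")
print(contains(trie, "nan"))  # True
print(contains(trie, "nab"))  # False
```

The loop runs once per pattern character, so the lookup cost depends on the pattern length and not on the text length.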
23. Definition
A suffix array is just a sorted array of all the suffixes of a given
string. It contains integers that represent the starting indexes of
all the suffixes of the string, after the suffixes are sorted.
• Properties: Let S be a string of length n and let S[i..j] denote the
substring of S ranging from i to j.
• The suffix array is defined to be an array of integers providing the
starting positions of the suffixes of S in lexicographical order.
• An entry A[i] contains the starting position of the i-th smallest
suffix in S.
• For all 1 < i <= n: S[A[i-1], n] < S[A[i], n]
25. Construction of Suffix Array
• The text ends with the special sentinel letter $ that is unique
and lexicographically smaller than any other character.
• Easy O(n^2 log n) algorithm:
- Sort the n suffixes, which takes O(n log n) comparisons.
- Each comparison takes O(n).
• There are O(n log n) & O(n) algorithms for constructing suffix
arrays that use very little space.
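The easy O(n^2 log n) construction can be sketched in a few lines of Python. Assumptions: positions are 0-based (the slides count from 1), the function name build_suffix_array is hypothetical, and every character of the text compares greater than "$", which holds for letters and digits in ASCII.

```python
def build_suffix_array(text, sentinel="$"):
    """Sort all suffix start positions by the suffix they begin.

    Naive sketch: O(n log n) comparisons, each costing up to O(n).
    """
    s = text + sentinel
    return sorted(range(len(s)), key=lambda i: s[i:])


sa = build_suffix_array("banana")
print(sa)  # [6, 5, 3, 1, 0, 4, 2]
for i in sa:
    print(i, ("banana" + "$")[i:])
```

For "banana$" the sorted suffixes are $, a$, ana$, anana$, banana$, na$, nana$, which gives the start positions printed above.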
26. Can we do it in O(n)?
Skew Algorithm: Divide & Conquer
29. Pattern Searching
• If pattern P occurs in text T, then all its occurrences are
consecutive in the suffix array.
• Do a binary search on the suffix array.
• Complexity: takes O(n log m) time,
where m is the length of the text
and n is the length of the pattern.
• It can be improved to O(n + log m) time using LCP information.
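The plain binary search (without the LCP speed-up) can be sketched as below. Assumptions: 0-based positions, a suffix array built by naive sorting without a sentinel, and a hypothetical function name find_occurrences. Two binary searches locate the first and last suffix that start with the pattern, which works because all occurrences are consecutive in the suffix array.

```python
def find_occurrences(text, sa, pattern):
    """Return the sorted start positions of pattern in text.

    Each step compares the pattern with the first len(pattern)
    characters of one suffix: O(n) per step, O(log m) steps.
    """
    k = len(pattern)
    # Lower bound: first suffix whose k-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + k] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo
    # Upper bound: first suffix whose k-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + k] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[first:lo])


text = "banana"
sa = sorted(range(len(text)), key=lambda i: text[i:])  # naive construction
print(find_occurrences(text, sa, "ana"))  # [1, 3]
print(find_occurrences(text, sa, "xyz"))  # []
```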
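30. Accelerate the Search:
[Figure: search interval of the suffix array with left end L, right end R and middle M; l and r mark the matched prefix lengths.]
Maintain l = LCP(P, L)
Maintain r = LCP(P, R)
If l = r then start
comparing M to P at l + 1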
31. If l > r then:
[Figure: search interval L, M, R with matched prefix lengths l and r.]
Suppose we know LCP(L, M)
If LCP(L, M) < l we go left
If LCP(L, M) > l we go right
If LCP(L, M) = l we start
comparing at l + 1
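The idea of carrying l and r through the binary search can be sketched in Python. Note the assumption: this is the simpler variant that only skips the first min(l, r) characters at each middle suffix; it does not use the precomputed LCP(L, M) values from the slide above, so it does not by itself guarantee the O(n + log m) bound. Positions are 0-based and all function names are hypothetical.

```python
def compare_from(text, start, pattern, skip):
    """Compare text[start:] with pattern, assuming the first `skip` characters match.

    Returns (k, sign): k is the common prefix length; sign is 0 if the
    pattern is a prefix of the suffix, -1 if suffix < pattern, +1 otherwise.
    """
    k = skip
    while k < len(pattern) and start + k < len(text) and text[start + k] == pattern[k]:
        k += 1
    if k == len(pattern):
        return k, 0
    if start + k == len(text):
        return k, -1                      # suffix ended first, so it sorts lower
    return k, (-1 if text[start + k] < pattern[k] else 1)


def contains(text, sa, pattern):
    """Binary search over the suffix array, maintaining l = LCP(P, L) and r = LCP(P, R)."""
    if not sa:
        return False
    lo, hi = 0, len(sa) - 1
    l, sign_lo = compare_from(text, sa[lo], pattern, 0)
    r, sign_hi = compare_from(text, sa[hi], pattern, 0)
    if sign_lo == 0 or sign_hi == 0:
        return True
    if sign_lo > 0 or sign_hi < 0:
        return False                      # pattern sorts outside the whole array
    # Invariant: suffix at lo < pattern < suffix at hi, so every suffix
    # in between shares at least min(l, r) characters with the pattern.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        k, sign = compare_from(text, sa[mid], pattern, min(l, r))
        if sign == 0:
            return True
        if sign < 0:
            lo, l = mid, k
        else:
            hi, r = mid, k
    return False


text = "banana"
sa = sorted(range(len(text)), key=lambda i: text[i:])
print(contains(text, sa, "ban"))  # True
print(contains(text, sa, "nab"))  # False
```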
32. Suffix Array Vs Suffix Tree
• Suffix arrays are closely related to suffix trees:
• A suffix array can be constructed from a suffix tree by doing a DFS
(depth-first search) traversal of the suffix tree, visiting children in lexicographic order.
• A suffix tree can be constructed in linear time by using a
combination of the suffix array and the LCP (longest common prefix) array.
• Slight space vs. time tradeoff: suffix arrays are more space efficient
but just a bit slower, because we
just store the original string + a list of integers.
33. Applications
• Finding the longest repeated substring.
• Finding the longest common substring.
• Finding the longest palindrome in a string.
• Others:
• Data Compression & Clustering Algorithms.
• Bioinformatics.
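As an illustration of the first application listed above, the longest repeated substring can be read off a suffix array: it is the longest common prefix of some pair of neighbouring suffixes in sorted order. The sketch below assumes a naively sorted suffix array, 0-based positions, and a hypothetical function name; occurrences are allowed to overlap.

```python
def longest_repeated_substring(text):
    """Longest substring occurring at least twice (overlaps allowed)."""
    sa = sorted(range(len(text)), key=lambda i: text[i:])  # naive suffix array
    best = ""
    for a, b in zip(sa, sa[1:]):          # neighbouring suffixes in sorted order
        k = 0
        while a + k < len(text) and b + k < len(text) and text[a + k] == text[b + k]:
            k += 1
        if k > len(best):
            best = text[a:a + k]
    return best


print(longest_repeated_substring("banana"))  # ana
```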