Syntax-Directed Translation into Three Address Codesanchi29
The document discusses syntax-directed translation of code into three-address code. It defines semantic rules for generating three-address code for expressions, boolean expressions, and control flow statements. Temporary variables are generated for subexpressions and intermediate values. The semantic rules specify generating three-address code statements using temporary variables. Backpatching is also discussed as a technique to replace symbolic names in goto statements with actual addresses after code generation.
This slides contains assymptotic notations, recurrence relation like subtitution method, iteration method, master method and recursion tree method and sorting algorithms like merge sort, quick sort, heap sort, counting sort, radix sort and bucket sort.
This document discusses algorithms and their analysis. It defines an algorithm as a step-by-step procedure to solve a problem or calculate a quantity. Algorithm analysis involves evaluating memory usage and time complexity. Asymptotics, such as Big-O notation, are used to formalize the growth rates of algorithms. Common sorting algorithms like insertion sort and quicksort are analyzed using recurrence relations to determine their time complexities as O(n^2) and O(nlogn), respectively.
The document discusses solving the 8 queens problem using backtracking. It begins by explaining backtracking as an algorithm that builds partial candidates for solutions incrementally and abandons any partial candidate that cannot be completed to a valid solution. It then provides more details on the 8 queens problem itself - the goal is to place 8 queens on a chessboard so that no two queens attack each other. Backtracking is well-suited for solving this problem by attempting to place queens one by one and backtracking when an invalid placement is found.
A presentation on prim's and kruskal's algorithmGaurav Kolekar
This slides are for a presentation on Prim's and Kruskal's algorithm. Where I have tried to explain how both the algorithms work, their similarities and their differences.
This document discusses syntax-directed translation, which refers to a method of compiler implementation where the source language translation is completely driven by the parser. The parsing process and parse trees are used to direct semantic analysis and translation of the source program. Attributes and semantic rules are associated with the grammar symbols and productions to control semantic analysis and translation. There are two main representations of semantic rules: syntax-directed definitions and syntax-directed translation schemes. Syntax-directed translation schemes embed program fragments called semantic actions within production bodies and are more efficient than syntax-directed definitions as they indicate the order of evaluation of semantic actions. Attribute grammars can be used to represent syntax-directed translations.
The document discusses algorithms and pseudocode conventions. It defines an algorithm as a step-by-step procedure to solve a problem and get the desired output. Some key characteristics of algorithms are that they must be unambiguous, have well-defined inputs and outputs, terminate after a finite number of steps, and be feasible with available resources. Pseudocode is a notation for writing algorithms in a human-friendly way without ambiguity. The document provides conventions for writing pseudocode, such as using line numbers, proper indentation, assignment operators, array indexing, and flow control statements like if/else and for/while loops. It includes an example pseudocode for insertion sort following these conventions.
Syntax-Directed Translation into Three Address Codesanchi29
The document discusses syntax-directed translation of code into three-address code. It defines semantic rules for generating three-address code for expressions, boolean expressions, and control flow statements. Temporary variables are generated for subexpressions and intermediate values. The semantic rules specify generating three-address code statements using temporary variables. Backpatching is also discussed as a technique to replace symbolic names in goto statements with actual addresses after code generation.
This slides contains assymptotic notations, recurrence relation like subtitution method, iteration method, master method and recursion tree method and sorting algorithms like merge sort, quick sort, heap sort, counting sort, radix sort and bucket sort.
This document discusses algorithms and their analysis. It defines an algorithm as a step-by-step procedure to solve a problem or calculate a quantity. Algorithm analysis involves evaluating memory usage and time complexity. Asymptotics, such as Big-O notation, are used to formalize the growth rates of algorithms. Common sorting algorithms like insertion sort and quicksort are analyzed using recurrence relations to determine their time complexities as O(n^2) and O(nlogn), respectively.
The document discusses solving the 8 queens problem using backtracking. It begins by explaining backtracking as an algorithm that builds partial candidates for solutions incrementally and abandons any partial candidate that cannot be completed to a valid solution. It then provides more details on the 8 queens problem itself - the goal is to place 8 queens on a chessboard so that no two queens attack each other. Backtracking is well-suited for solving this problem by attempting to place queens one by one and backtracking when an invalid placement is found.
A presentation on prim's and kruskal's algorithmGaurav Kolekar
This slides are for a presentation on Prim's and Kruskal's algorithm. Where I have tried to explain how both the algorithms work, their similarities and their differences.
This document discusses syntax-directed translation, which refers to a method of compiler implementation where the source language translation is completely driven by the parser. The parsing process and parse trees are used to direct semantic analysis and translation of the source program. Attributes and semantic rules are associated with the grammar symbols and productions to control semantic analysis and translation. There are two main representations of semantic rules: syntax-directed definitions and syntax-directed translation schemes. Syntax-directed translation schemes embed program fragments called semantic actions within production bodies and are more efficient than syntax-directed definitions as they indicate the order of evaluation of semantic actions. Attribute grammars can be used to represent syntax-directed translations.
The document discusses algorithms and pseudocode conventions. It defines an algorithm as a step-by-step procedure to solve a problem and get the desired output. Some key characteristics of algorithms are that they must be unambiguous, have well-defined inputs and outputs, terminate after a finite number of steps, and be feasible with available resources. Pseudocode is a notation for writing algorithms in a human-friendly way without ambiguity. The document provides conventions for writing pseudocode, such as using line numbers, proper indentation, assignment operators, array indexing, and flow control statements like if/else and for/while loops. It includes an example pseudocode for insertion sort following these conventions.
This document provides an introduction to finite automata. It defines key concepts like alphabets, strings, languages, and finite state machines. It also describes the different types of automata, specifically deterministic finite automata (DFAs) and nondeterministic finite automata (NFAs). DFAs have a single transition between states for each input, while NFAs can have multiple transitions. NFAs are generally easier to construct than DFAs. The next class will focus on deterministic finite automata in more detail.
This document discusses string matching algorithms. It begins with an introduction to the naive string matching algorithm and its quadratic runtime. Then it proposes three improved algorithms: FC-RJ, FLC-RJ, and FMLC-RJ, which attempt to match patterns by restricting comparisons based on the first, first and last, or first, middle, and last characters, respectively. Experimental results show that these three proposed algorithms outperform the naive algorithm by reducing execution time, with FMLC-RJ working best for three-character patterns.
The document discusses algorithms and their analysis. It covers:
1) The definition of an algorithm and its key characteristics like being unambiguous, finite, and efficient.
2) The fundamental steps of algorithmic problem solving like understanding the problem, designing a solution, and analyzing efficiency.
3) Methods for specifying algorithms using pseudocode, flowcharts, or natural language.
4) Analyzing an algorithm's time and space efficiency using asymptotic analysis and orders of growth like best-case, worst-case, and average-case scenarios.
This document provides an overview of algorithms and algorithm analysis. It discusses key concepts like what an algorithm is, different types of algorithms, and the algorithm design and analysis process. Some important problem types covered include sorting, searching, string processing, graph problems, combinatorial problems, geometric problems, and numerical problems. Examples of specific algorithms are given for some of these problem types, like various sorting algorithms, search algorithms, graph traversal algorithms, and algorithms for solving the closest pair and convex hull problems.
Context free grammars (CFGs) are formal systems that describe the structure of languages. A CFG consists of variables, terminals, production rules, and a start variable. Production rules take the form of a single variable producing a string of terminals and/or variables. CFGs can capture the recursive structure of natural languages while ignoring agreement and reference. They are used to define context-free languages and generate parse trees. Ambiguous grammars have sentences with multiple parse trees, and disambiguation aims to impose an ordering on derivations. While ambiguity cannot always be eliminated, simplifying and restricting grammars has theoretical and practical benefits.
Intermediate code generation in Compiler DesignKuppusamy P
The document discusses intermediate code generation in compilers. It begins by explaining that intermediate code generation is the final phase of the compiler front-end and its goal is to translate the program into a format expected by the back-end. Common intermediate representations include three address code and static single assignment form. The document then discusses why intermediate representations are used, how to choose an appropriate representation, and common types of representations like graphical IRs and linear IRs.
Graph traversal techniques are used to search vertices in a graph and determine the order to visit vertices. There are two main techniques: breadth-first search (BFS) and depth-first search (DFS). BFS uses a queue and visits the nearest vertices first, producing a spanning tree. DFS uses a stack and visits vertices by going as deep as possible first, also producing a spanning tree. Both techniques involve marking visited vertices to avoid loops.
The document discusses finite automata including nondeterministic finite automata (NFAs) and deterministic finite automata (DFAs). It provides examples of NFAs and DFAs that recognize particular strings, including strings containing certain substrings. It also gives examples of DFA state machines and discusses using finite automata to recognize regular languages.
FellowBuddy.com is an innovative platform that brings students together to share notes, exam papers, study guides, project reports and presentation for upcoming exams.
We connect Students who have an understanding of course material with Students who need help.
Benefits:-
# Students can catch up on notes they missed because of an absence.
# Underachievers can find peer developed notes that break down lecture and study material in a way that they can understand
# Students can earn better grades, save time and study effectively
Our Vision & Mission – Simplifying Students Life
Our Belief – “The great breakthrough in your life comes when you realize it, that you can learn anything you need to learn; to accomplish any goal that you have set for yourself. This means there are no limits on what you can be, have or do.”
Like Us - https://www.facebook.com/FellowBuddycom
BackTracking Algorithm: Technique and ExamplesFahim Ferdous
This slides gives a strong overview of backtracking algorithm. How it came and general approaches of the techniques. Also some well-known problem and solution of backtracking algorithm.
This document discusses macros and macro processing. It defines macros as units of code abbreviation that are expanded during compilation. The macro processor performs two passes: pass 1 reads macros and stores them in a table, pass 2 expands macros by substituting actual parameters. Advanced features like conditional expansion and looping are enabled using statements like AIF, AGO, and ANOP. Nested macro calls follow a LIFO expansion order.
The document summarizes a seminar presentation on using directed acyclic graphs (DAGs) to represent and optimize basic blocks in compiler design. DAGs can be constructed from three-address code to identify common subexpressions and eliminate redundant computations. Rules for DAG construction include creating a node only if it does not already exist, representing identifiers as leaf nodes and operators as interior nodes. DAGs allow optimizations like common subexpression elimination and dead code elimination to improve performance of local optimizations on basic blocks. Examples show how DAGs identify common subexpressions and avoid recomputing the same values.
Lecture: Regular Expressions and Regular LanguagesMarina Santini
This document provides an introduction to regular expressions and regular languages. It defines the key operations used in regular expressions: union, concatenation, and Kleene star. It explains how regular expressions can be converted into finite state automata and vice versa. Examples of regular expressions are provided. The document also defines regular languages as those languages that can be accepted by a deterministic finite automaton. It introduces the pumping lemma as a way to determine if a language is not regular. Finally, it includes some practical activities for readers to practice converting regular expressions to automata and writing regular expressions.
The document discusses hash tables and hashing techniques. It describes how hash tables use a hash function to map keys to positions in a table. Collisions can occur if multiple keys map to the same position. There are two main approaches to handling collisions - open hashing which stores collided items in linked lists, and closed hashing which uses techniques like linear probing to find alternate positions in the table. The document also discusses various hash functions and their properties, like minimizing collisions.
This document provides an overview of deterministic finite automata (DFA) through examples and practice problems. It begins with defining the components of a DFA, including states, alphabet, transition function, start state, and accepting states. An example DFA is given to recognize strings ending in "00". Additional practice problems involve drawing minimal DFAs, determining the minimum number of states for a language, and completing partially drawn DFAs. The document aims to help students learn and practice working with DFA models.
The document discusses algorithm analysis and asymptotic notation. It defines algorithm analysis as comparing algorithms based on running time and other factors as problem size increases. Asymptotic notation such as Big-O, Big-Omega, and Big-Theta are introduced to classify algorithms based on how their running times grow relative to input size. Common time complexities like constant, logarithmic, linear, quadratic, and exponential are also covered. The properties and uses of asymptotic notation for equations and inequalities are explained.
Theory of automata and formal language lab manualNitesh Dubey
The document describes several experiments related to compiler design including lexical analysis, parsing, and code generation.
Experiment 1 involves writing a program to identify if a given string is an identifier or not using a DFA. Experiment 2 simulates a DFA to check if a string is accepted by the given automaton. Experiment 3 checks if a string belongs to a given grammar using a top-down parsing approach. Experiment 4 implements recursive descent parsing to parse expressions based on a grammar. Experiment 5 computes FIRST and FOLLOW sets and builds a LL(1) parsing table for a given grammar. Experiment 6 implements shift-reduce parsing to parse strings. Experiment 7 generates intermediate code like Polish notation, 3-address code, and quadruples
This document discusses asymptotic notations that are used to characterize an algorithm's efficiency. It introduces Big-Oh, Big-Omega, and Theta notations. Big-Oh gives an upper bound on running time. Big-Omega gives a lower bound. Theta gives both upper and lower bounds, representing an algorithm's exact running time. Examples are provided for each notation. Time complexity is also introduced, which describes how the time to solve a problem changes with problem size. Worst-case time analysis provides an upper bound on runtime.
This document contains a presentation on Breadth-First Search (BFS) given to students. The presentation includes:
- An introduction to BFS and its inventor Konrad Zuse.
- Definitions of key terms like graph, tree, vertex, level-order traversal.
- An example visualization of BFS on a graph with 14 steps.
- Pseudocode and a Java program implementing BFS.
- Applications of BFS like shortest paths, social networks, web crawlers.
- The time and space complexity of BFS is O(V+E) and O(V).
- A conclusion that BFS is an important algorithm that traverses
This document discusses top-down parsing and LL parsing. It begins by defining top-down parsing as discovering a parse tree from top to bottom using a preorder traversal and leftmost derivation. It then explains that LL parsing is a technique of top-down parsing that uses a parser stack, parsing table, and driver function. It identifies left recursion and left factoring as problems for LL(1) parsers, as they prevent deterministic parsing. It provides examples of eliminating left recursion and left factoring through grammar transformations.
This document is a preprint that summarizes a paper on developing a spell checker model using automata. It discusses using fuzzy automata rather than finite automata for improved string comparison. The proposed spell checker would incorporate autosuggestion features into a Windows application. It would use techniques like edit distance and tries to store dictionaries and suggest corrections. The paper outlines the design of the spell checker, discussing functions like comparing words to a dictionary and considering morphology. Advantages like improved accuracy and speed are discussed along with potential disadvantages like inability to detect all errors.
Advancements in Hindi-English Neural Machine Translation: Leveraging LSTM wit...IRJET Journal
The document discusses advancements in neural machine translation models for the Hindi-English language pair using Long Short-Term Memory (LSTM) networks with an attention mechanism. It provides details on preprocessing the parallel Hindi-English dataset, developing encoder-decoder LSTM models with attention, training the models over multiple epochs, and evaluating the trained models on test data. The proposed LSTM model achieves over 90% accuracy on Hindi to English translation tasks, demonstrating better performance than recurrent neural network baselines.
This document provides an introduction to finite automata. It defines key concepts like alphabets, strings, languages, and finite state machines. It also describes the different types of automata, specifically deterministic finite automata (DFAs) and nondeterministic finite automata (NFAs). DFAs have a single transition between states for each input, while NFAs can have multiple transitions. NFAs are generally easier to construct than DFAs. The next class will focus on deterministic finite automata in more detail.
This document discusses string matching algorithms. It begins with an introduction to the naive string matching algorithm and its quadratic runtime. Then it proposes three improved algorithms: FC-RJ, FLC-RJ, and FMLC-RJ, which attempt to match patterns by restricting comparisons based on the first, first and last, or first, middle, and last characters, respectively. Experimental results show that these three proposed algorithms outperform the naive algorithm by reducing execution time, with FMLC-RJ working best for three-character patterns.
The document discusses algorithms and their analysis. It covers:
1) The definition of an algorithm and its key characteristics like being unambiguous, finite, and efficient.
2) The fundamental steps of algorithmic problem solving like understanding the problem, designing a solution, and analyzing efficiency.
3) Methods for specifying algorithms using pseudocode, flowcharts, or natural language.
4) Analyzing an algorithm's time and space efficiency using asymptotic analysis and orders of growth like best-case, worst-case, and average-case scenarios.
This document provides an overview of algorithms and algorithm analysis. It discusses key concepts like what an algorithm is, different types of algorithms, and the algorithm design and analysis process. Some important problem types covered include sorting, searching, string processing, graph problems, combinatorial problems, geometric problems, and numerical problems. Examples of specific algorithms are given for some of these problem types, like various sorting algorithms, search algorithms, graph traversal algorithms, and algorithms for solving the closest pair and convex hull problems.
Context free grammars (CFGs) are formal systems that describe the structure of languages. A CFG consists of variables, terminals, production rules, and a start variable. Production rules take the form of a single variable producing a string of terminals and/or variables. CFGs can capture the recursive structure of natural languages while ignoring agreement and reference. They are used to define context-free languages and generate parse trees. Ambiguous grammars have sentences with multiple parse trees, and disambiguation aims to impose an ordering on derivations. While ambiguity cannot always be eliminated, simplifying and restricting grammars has theoretical and practical benefits.
Intermediate code generation in Compiler DesignKuppusamy P
The document discusses intermediate code generation in compilers. It begins by explaining that intermediate code generation is the final phase of the compiler front-end and its goal is to translate the program into a format expected by the back-end. Common intermediate representations include three address code and static single assignment form. The document then discusses why intermediate representations are used, how to choose an appropriate representation, and common types of representations like graphical IRs and linear IRs.
Graph traversal techniques are used to search vertices in a graph and determine the order to visit vertices. There are two main techniques: breadth-first search (BFS) and depth-first search (DFS). BFS uses a queue and visits the nearest vertices first, producing a spanning tree. DFS uses a stack and visits vertices by going as deep as possible first, also producing a spanning tree. Both techniques involve marking visited vertices to avoid loops.
The document discusses finite automata including nondeterministic finite automata (NFAs) and deterministic finite automata (DFAs). It provides examples of NFAs and DFAs that recognize particular strings, including strings containing certain substrings. It also gives examples of DFA state machines and discusses using finite automata to recognize regular languages.
FellowBuddy.com is an innovative platform that brings students together to share notes, exam papers, study guides, project reports and presentation for upcoming exams.
We connect Students who have an understanding of course material with Students who need help.
Benefits:-
# Students can catch up on notes they missed because of an absence.
# Underachievers can find peer developed notes that break down lecture and study material in a way that they can understand
# Students can earn better grades, save time and study effectively
Our Vision & Mission – Simplifying Students Life
Our Belief – “The great breakthrough in your life comes when you realize it, that you can learn anything you need to learn; to accomplish any goal that you have set for yourself. This means there are no limits on what you can be, have or do.”
Like Us - https://www.facebook.com/FellowBuddycom
BackTracking Algorithm: Technique and ExamplesFahim Ferdous
This slides gives a strong overview of backtracking algorithm. How it came and general approaches of the techniques. Also some well-known problem and solution of backtracking algorithm.
This document discusses macros and macro processing. It defines macros as units of code abbreviation that are expanded during compilation. The macro processor performs two passes: pass 1 reads macros and stores them in a table, pass 2 expands macros by substituting actual parameters. Advanced features like conditional expansion and looping are enabled using statements like AIF, AGO, and ANOP. Nested macro calls follow a LIFO expansion order.
The document summarizes a seminar presentation on using directed acyclic graphs (DAGs) to represent and optimize basic blocks in compiler design. DAGs can be constructed from three-address code to identify common subexpressions and eliminate redundant computations. Rules for DAG construction include creating a node only if it does not already exist, representing identifiers as leaf nodes and operators as interior nodes. DAGs allow optimizations like common subexpression elimination and dead code elimination to improve performance of local optimizations on basic blocks. Examples show how DAGs identify common subexpressions and avoid recomputing the same values.
Lecture: Regular Expressions and Regular LanguagesMarina Santini
This document provides an introduction to regular expressions and regular languages. It defines the key operations used in regular expressions: union, concatenation, and Kleene star. It explains how regular expressions can be converted into finite state automata and vice versa. Examples of regular expressions are provided. The document also defines regular languages as those languages that can be accepted by a deterministic finite automaton. It introduces the pumping lemma as a way to determine if a language is not regular. Finally, it includes some practical activities for readers to practice converting regular expressions to automata and writing regular expressions.
The document discusses hash tables and hashing techniques. It describes how hash tables use a hash function to map keys to positions in a table. Collisions can occur if multiple keys map to the same position. There are two main approaches to handling collisions - open hashing which stores collided items in linked lists, and closed hashing which uses techniques like linear probing to find alternate positions in the table. The document also discusses various hash functions and their properties, like minimizing collisions.
This document provides an overview of deterministic finite automata (DFA) through examples and practice problems. It begins with defining the components of a DFA, including states, alphabet, transition function, start state, and accepting states. An example DFA is given to recognize strings ending in "00". Additional practice problems involve drawing minimal DFAs, determining the minimum number of states for a language, and completing partially drawn DFAs. The document aims to help students learn and practice working with DFA models.
The document discusses algorithm analysis and asymptotic notation. It defines algorithm analysis as comparing algorithms based on running time and other factors as problem size increases. Asymptotic notation such as Big-O, Big-Omega, and Big-Theta are introduced to classify algorithms based on how their running times grow relative to input size. Common time complexities like constant, logarithmic, linear, quadratic, and exponential are also covered. The properties and uses of asymptotic notation for equations and inequalities are explained.
Theory of automata and formal language lab manualNitesh Dubey
The document describes several experiments related to compiler design including lexical analysis, parsing, and code generation.
Experiment 1 involves writing a program to identify if a given string is an identifier or not using a DFA. Experiment 2 simulates a DFA to check if a string is accepted by the given automaton. Experiment 3 checks if a string belongs to a given grammar using a top-down parsing approach. Experiment 4 implements recursive descent parsing to parse expressions based on a grammar. Experiment 5 computes FIRST and FOLLOW sets and builds a LL(1) parsing table for a given grammar. Experiment 6 implements shift-reduce parsing to parse strings. Experiment 7 generates intermediate code like Polish notation, 3-address code, and quadruples
This document discusses asymptotic notations that are used to characterize an algorithm's efficiency. It introduces Big-Oh, Big-Omega, and Theta notations. Big-Oh gives an upper bound on running time. Big-Omega gives a lower bound. Theta gives both upper and lower bounds, representing an algorithm's exact running time. Examples are provided for each notation. Time complexity is also introduced, which describes how the time to solve a problem changes with problem size. Worst-case time analysis provides an upper bound on runtime.
This document contains a presentation on Breadth-First Search (BFS) given to students. The presentation includes:
- An introduction to BFS and its inventor Konrad Zuse.
- Definitions of key terms like graph, tree, vertex, level-order traversal.
- An example visualization of BFS on a graph with 14 steps.
- Pseudocode and a Java program implementing BFS.
- Applications of BFS like shortest paths, social networks, web crawlers.
- The time and space complexity of BFS is O(V+E) and O(V).
- A conclusion that BFS is an important algorithm that traverses
This document discusses top-down parsing and LL parsing. It begins by defining top-down parsing as discovering a parse tree from top to bottom using a preorder traversal and leftmost derivation. It then explains that LL parsing is a technique of top-down parsing that uses a parser stack, parsing table, and driver function. It identifies left recursion and left factoring as problems for LL(1) parsers, as they prevent deterministic parsing. It provides examples of eliminating left recursion and left factoring through grammar transformations.
This document is a preprint that summarizes a paper on developing a spell checker model using automata. It discusses using fuzzy automata rather than finite automata for improved string comparison. The proposed spell checker would incorporate autosuggestion features into a Windows application. It would use techniques like edit distance and tries to store dictionaries and suggest corrections. The paper outlines the design of the spell checker, discussing functions like comparing words to a dictionary and considering morphology. Advantages like improved accuracy and speed are discussed along with potential disadvantages like inability to detect all errors.
Advancements in Hindi-English Neural Machine Translation: Leveraging LSTM wit...IRJET Journal
The document discusses advancements in neural machine translation models for the Hindi-English language pair using Long Short-Term Memory (LSTM) networks with an attention mechanism. It provides details on preprocessing the parallel Hindi-English dataset, developing encoder-decoder LSTM models with attention, training the models over multiple epochs, and evaluating the trained models on test data. The proposed LSTM model achieves over 90% accuracy on Hindi to English translation tasks, demonstrating better performance than recurrent neural network baselines.
C++ Notes by Hisham Ahmed Rizvi for Class 12th Board Examshishamrizvi
The document provides an overview of key concepts in C++ including:
- Data types like int, char, float, and double
- Variables, constants, and escape sequences
- Operators like assignment, arithmetic, relational, and logical operators
- Control structures like if/else, switch, for, while, and do-while loops
- Functions like main(), exit(), get(), put(), and getline()
- Input/output streams like cin, cout, and file streams
The document covers fundamental C++ concepts to help prepare for board exams.
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
Static dictionary for pronunciation modelingeSAT Journals
Abstract Speech Recognition is the process by which a computer can recognize spoken words. Basically it means talking to your computer and having it correctly recognize what you are saying. This work is to improve the speech recognition accuracy for Telugu language using pronunciation model. Speech Recognizers based upon Hidden Markov Model have achieved a high level of performance in controlled environments. One naive approach for robust speech recognition in order to handle the mismatch between training and testing environments, many techniques have been proposed. some methods work on acoustic model and some methods work on Language model. Pronunciation dictionary is the most important component of the Speech Recognition system. New pronunciations for the words will be considered for speech recognition accuracy.
Suggestion Generation for Specific Erroneous Part in a Sentence using Deep Le...ijtsrd
This document presents a method for generating suggestions for specific erroneous parts of sentences in Indian languages like Malayalam using deep learning. The method uses recurrent neural networks with long short-term memory layers to train a model on input-output examples of sentences and their corrections. The model takes in preprocessed sentence data and generates a set of possible corrections for erroneous parts through multiple network layers. An analysis of the model shows that it can accurately generate suggestions for word length of three, but requires more data and study to handle the complex morphology and symbols of Malayalam. The performance of the method is limited by the hardware used and it could be improved with a more powerful system and additional training data.
Abstract
Part of speech tagging plays an important role in developing natural language processing software. Part of speech tagging means assigning part of speech tag to each word of the sentence. The part of speech tagger takes a sentence as input and it assigns respective/appropriate part of speech tag to each word of that sentence. In this article I surveys the different work have done about odia POS tagging.
________________________________________________
A performance of svm with modified lesk approach for word sense disambiguatio...eSAT Journals
Abstract WSD is a Technique used to find the correct meaning of a given word in any human language. Each human language has a problem called ambiguity of a word. To finds the correct meaning of any ambiguous word is easy for human but for a machine it is great issues No of work has done on WSD but not enough in Hindi language. My objective is to provide the training to the system so that it can easily find the correct meaning of the any ambiguous word in Hindi language .For this purpose I simple used a one existing technique name modified lesk approach and give its output to the SVM to get the better result and show that SVM is better in compare to modified Lesk Approach, In this paper I simply take nine Hindi ambiguous words and three different databases to show the result. Keywords: Support Vector Machine, NLP, Word Sense Disambiguation, Modified Lesk approach, Comparison
1) This document discusses stemming algorithms that have been used for the Odia language. Stemming is the process of reducing inflected words to their root or stem for purposes like information retrieval.
2) It reviews different stemming algorithms that have been applied to Odia text, including suffix stripping, affix removal, and stochastic algorithms. It also discusses common errors in stemming like over-stemming and under-stemming.
3) Applications of stemming discussed include information retrieval, text summarization, machine translation, indexing, and question answering systems. The document concludes by surveying prior work on stemming algorithms for Odia.
hExarAbax makkAmasjix samayaM anni rojulu 5:00 am - 9:00 pm
(Mecca Masjid timings in Hyderabad - All days 5:00 am - 9:00 pm)
User query: makkAmasjix PIju eVMwa?
(What is the fee for Mecca Masjid?)
POS-tagger: makkAmasjix PIju/WQ eVMwa
Replace with root word: makkAmasjix PIju/WQ eMwa
Context Handler: Updates context to 'makkAmasjix'
Advanced Filter: Keywords - makkAmas
This lab report describes developing a program to perform string operations using suffix arrays. It includes 3 modules: 1) Finding the longest repeated substring, 2) Finding the longest common substring, and 3) Finding the longest palindrome in a string. The report provides code for building a suffix tree from a string and performing traversal to solve each problem. It also includes sample outputs and references.
This document discusses recent advances in machine learning for natural language processing using recurrent neural networks and word embeddings. It summarizes that neural networks trained on huge datasets can now understand human language better than ever before. Word embeddings represent words as dense vectors that encode semantic meaning, allowing machines to understand relationships between words. The document then discusses using recurrent neural networks and LSTM models to generate text, and provides an example output from a character-level text generation model trained on Alice in Wonderland.
Understanding Natural Languange with Corpora-based Generation of Dependency G...Edmond Lepedus
This document discusses training a dependency parser using an unparsed corpus rather than a manually parsed training set. It develops an iterative training method that generates training examples using heuristics from past parsing decisions. The method is shown to produce parse trees qualitatively similar to conventionally trained parsers. Three avenues for future research using this corpus-based generation method are proposed.
The document discusses processing Boolean queries in an information retrieval system using an inverted index. It describes the steps to process a simple conjunctive query by locating terms in the dictionary, retrieving their postings lists, and intersecting the lists. More complex queries involving OR and NOT operators are also processed in a similar way. The document also discusses optimizing query processing by considering the order of accessing postings lists.
The document discusses implementing chatbots using deep learning. It begins by defining what a chatbot is and listing some popular existing chatbots. It then describes two types of chatbot models - retrieval-based models which use predefined responses and generative models which continuously learn from conversations. The document focuses on implementing a retrieval-based model using the Ubuntu Dialog Corpus dataset and a dual encoder LSTM network model in TensorFlow. It outlines the preprocessing, model architecture, creating input functions, training, evaluating, and making predictions with the trained model.
MULTILINGUAL SPEECH TO TEXT CONVERSION USING HUGGING FACE FOR DEAF PEOPLEIRJET Journal
The document describes a system for multilingual speech-to-text conversion using Hugging Face that aims to assist deaf individuals. The system uses a Transformer-based encoder-decoder model trained on audio datasets. Feature extraction and tokenization are performed on the audio inputs. The model is fine-tuned using Hugging Face and evaluated based on word error rate. A web application allows users to record speech via microphone and see the transcription output. The implemented model achieved a 32.42% word error rate on a Hindi language dataset. The goal is to enable seamless communication for those with hearing impairments across multiple languages.
1) The paper proposes an efficient Tamil text compaction system that reduces Tamil text to around 40% of the original by identifying word categories and mapping words to compact forms while maintaining meaning.
2) The system handles common Tamil words, abbreviations/acronyms, and numbers by using a morphological analyzer to identify word roots and a generator to re-add suffixes. Compact forms are retrieved from mappings stored in data structures like trees and hashmaps.
3) Testing on over 10,000 words showed the final text was reduced to 40% of the original size, providing a more efficient way to communicate in Tamil on platforms with character limits like social media and text messages.
The document discusses analyzing Twitter data on the 2016 Chevrolet Camaro to detect consumer sentiment. It describes conducting word cloud analysis and sentiment classification using R programming. Key steps include collecting Twitter data on the Camaro, preprocessing the text data, generating a word cloud to identify frequent keywords, classifying tweets by emotion using naive Bayes classification, and classifying polarity as positive or negative sentiment. Graphs are produced to show the results of the emotion and polarity classification analyses.
DETECTION OF JARGON WORDS IN A TEXT USING SEMI-SUPERVISED LEARNINGcsandit
The proposed approach deals with the detection of jargon words in electronic data in different communication mediums like internet, mobile services etc. But in the real life, the jargon words are not used in complete word forms always. Most of the times, those words are used in different abbreviated forms like sounds alike forms, taboo morphemes etc. This proposed approach detects those abbreviated forms also using semi supervised learning methodology. This learning methodology derives the probability of a suspicious word to be a jargon word by the synset and concept analysis of the text.
Similar to Souvenir's Booth - Algorithm Design and Analysis Project Project Report (20)
The document describes the development of a low-cost Landslide and Earthquake Monitoring and Warning System (LEMWS). Key components include low-cost MEMS sensors to measure soil moisture, accelerations, and tilt. Data from the sensors is transmitted to the cloud via a microcontroller and GSM module. Thresholds for the sensors were determined through experiments. When thresholds were breached, alerts were sent by SMS. The total estimated cost for a deployment across a hill is INR 25,000, making it significantly cheaper than existing systems. The system was able to successfully detect landslides in a lab-scale experiment.
Akshit Arora is a final year computer engineering student at Thapar University in India. He has conducted research internships at IIT Mandi and IIT Delhi focused on artificial intelligence, machine learning, and data analytics. Some of his projects include developing an interactive landslide simulation tool and analyzing politician tweets to detect policy opinions. He has published papers and presented his work at several conferences. Arora is proficient in programming languages like Python and skills like data mining, machine learning, and software engineering.
Capstone Report - Industrial Attachment Program (IAP) Evaluation PortalAkshit Arora
Capstone Project Report on IAP Evaluation Portal submitted in partial fulfillment of the requirements for the award of the degree of Bachelor of Engineering in Computer Science and Engineering Department by:
Abhinav Garg (101303004), Arush Nagpal (101303034), Akshit Arora (101303012) and Chahak Gupta (101303041)
Under the supervision of:
Dr. Prashant Singh Rana, Assistant Professor, CSED (http://psrana.com) and Dr. Ajay Batish, Professor, MED (www.thapar.edu/index.php/mechanical-engineering-department/faculty?pid=153&sid=387:dr-ajay-batish)
Thapar University,
Patiala, Punjab, India - 147004.
December 2016
Organizational behavior presentation - Origins of IntelligenceAkshit Arora
Based on the following links:
http://bigthink.com/big-think-tv/the-origin-of-intelligence
https://www.ted.com/talks/simon_sinek_how_great_leaders_inspire_action
Application of Management Tools: Total Quality Management CourseAkshit Arora
This document provides an overview of several management tools used in total quality management: affinity diagrams, activity network diagrams, process decision program charts, and interrelationship diagrams. It explains the purpose and basic process for applying each tool. For affinity diagrams and activity network diagrams, example scenarios are given to demonstrate how each tool can be implemented. The document aims to communicate how these tools help with planning, organizing ideas, identifying relationships and dependencies, addressing potential problems, and ensuring project success.
A multilevel automatic thresholding method based on a genetic algorithm for a...Akshit Arora
This document summarizes a project to implement a multilevel automatic thresholding method for image segmentation based on a genetic algorithm. The method uses a genetic algorithm to initialize population of threshold values, applies genetic operators like selection, crossover and mutation to improve the threshold values at each generation, and refines the best thresholds to segment the image. The project studied this thresholding method from a research paper and implemented it in MATLAB to segment images using a genetic algorithm approach to optimization of threshold values.
Industrial Attachment Program (IAP) ReportAkshit Arora
Project report for IAP, project developed for automating the Industrial Attachment Program of Dept. of Mechanical Engineering at Thapar University. Developed under the guidance of Dr. Ajay Batish (Professor at Mechanical Engg. Dept, TU)
This project report describes the implementation of Otsu's method for image segmentation. Otsu's method is a global thresholding technique that automatically performs image thresholding. It finds the optimal threshold to separate foreground objects from the background by minimizing intra-class variance. The report provides an overview of image segmentation and thresholding techniques. It explains Otsu's algorithm and how it maximizes between-class variance. Results of applying Otsu's method on sample images using histogram analysis, the graythresh function, and adaptive thresholding are presented. The report concludes that Otsu's method is a simple and effective approach for automatic image thresholding and segmentation.
1) The document describes a project to develop an interactive landslide simulator under the guidance of Dr. Varun Dutt at the Indian Institute of Technology Mandi between June and July 2015.
2) The project involved creating a computational model and web-based game to simulate landslides and improve public understanding of landslide risks, causes, and preparedness.
3) Key activities included analyzing historical rainfall data, developing mathematical models, implementing web technologies like PHP and MySQL, and measuring changes in user risk perception and knowledge through surveys before and after playing the game.
The document describes a computational model and web-based game to simulate landslides and their impacts. The model accounts for both natural factors like rainfall and soil type, as well as human investments in prevention. It calculates the probability of landslides based on these factors. The game allows users to invest in prevention and see the consequences. The goal is to improve public understanding of landslides and inform policy. Future work includes expanding it to be multi-player, adding languages, and making it more culturally independent.
The document describes a 6-week summer internship project to develop an interactive landslide simulator. The intern created a mathematical model to simulate landslide probability based on investment in prevention techniques and rainfall. A web-based game was developed using this model to educate people about landslides. The game allows players to invest against landslides and experience consequences of landslides through messages and imagery. The project aims to improve public understanding of landslides and support policymaking and education. Future work includes making it multiplayer, adding regional languages, and making it culture independent.
An asynchronous processor chip eliminates the global clock that governs synchronous chips. Instead, asynchronous chips use completion signals to indicate when instructions are finished and can transfer data without a clock. This allows asynchronous chips to have higher performance, better power efficiency, and smaller size than synchronous chips. A key advantage is they only activate parts consuming power and minimize wasteful power from glitches. Epson developed the first flexible 8-bit asynchronous microprocessor using low-temperature polysilicon thin-film transistors that achieved 70% lower energy use and 20dB lower electromagnetic radiation than synchronous designs.
Asynchronous Processors - The Clock less FutureAkshit Arora
The synchronous processor chips of computers use a clock to time the entire chip. But this clock is also the cause of speed and power consumption headaches. Here is how an asynchronous chip overcomes these problems by eliminating the clock altogether.
A presentation by Akshit Arora and Ankit Goyal for Computer Science Architecture (UCS401) at Thapar University, Patiala
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressionsVictor Morales
K8sGPT is a tool that analyzes and diagnoses Kubernetes clusters. This presentation was used to share the requirements and dependencies to deploy K8sGPT in a local environment.
Literature Review Basics and Understanding Reference Management.pptxDr Ramhari Poudyal
Three-day training on academic research focuses on analytical tools at United Technical College, supported by the University Grant Commission, Nepal. 24-26 May 2024
Introduction- e - waste – definition - sources of e-waste– hazardous substances in e-waste - effects of e-waste on environment and human health- need for e-waste management– e-waste handling rules - waste minimization techniques for managing e-waste – recycling of e-waste - disposal treatment methods of e- waste – mechanism of extraction of precious metal from leaching solution-global Scenario of E-waste – E-waste in India- case studies.
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODELgerogepatton
As digital technology becomes more deeply embedded in power systems, protecting the communication
networks of Smart Grids (SG) has emerged as a critical concern. Distributed Network Protocol 3 (DNP3)
represents a multi-tiered application layer protocol extensively utilized in Supervisory Control and Data
Acquisition (SCADA)-based smart grids to facilitate real-time data gathering and control functionalities.
Robust Intrusion Detection Systems (IDS) are necessary for early threat detection and mitigation because
of the interconnection of these networks, which makes them vulnerable to a variety of cyberattacks. To
solve this issue, this paper develops a hybrid Deep Learning (DL) model specifically designed for intrusion
detection in smart grids. The proposed approach is a combination of the Convolutional Neural Network
(CNN) and the Long-Short-Term Memory algorithms (LSTM). We employed a recent intrusion detection
dataset (DNP3), which focuses on unauthorized commands and Denial of Service (DoS) cyberattacks, to
train and test our model. The results of our experiments show that our CNN-LSTM method is much better
at finding smart grid intrusions than other deep learning algorithms used for classification. In addition,
our proposed approach improves accuracy, precision, recall, and F1 score, achieving a high detection
accuracy rate of 99.50%.
ACEP Magazine edition 4th launched on 05.06.2024Rahul
This document provides information about the third edition of the magazine "Sthapatya" published by the Association of Civil Engineers (Practicing) Aurangabad. It includes messages from current and past presidents of ACEP, memories and photos from past ACEP events, information on life time achievement awards given by ACEP, and a technical article on concrete maintenance, repairs and strengthening. The document highlights activities of ACEP and provides a technical educational article for members.
CHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECTjpsjournal1
The rivalry between prominent international actors for dominance over Central Asia's hydrocarbon
reserves and the ancient silk trade route, along with China's diplomatic endeavours in the area, has been
referred to as the "New Great Game." This research centres on the power struggle, considering
geopolitical, geostrategic, and geoeconomic variables. Topics including trade, political hegemony, oil
politics, and conventional and nontraditional security are all explored and explained by the researcher.
Using Mackinder's Heartland, Spykman Rimland, and Hegemonic Stability theories, examines China's role
in Central Asia. This study adheres to the empirical epistemological method and has taken care of
objectivity. This study analyze primary and secondary research documents critically to elaborate role of
china’s geo economic outreach in central Asian countries and its future prospect. China is thriving in trade,
pipeline politics, and winning states, according to this study, thanks to important instruments like the
Shanghai Cooperation Organisation and the Belt and Road Economic Initiative. According to this study,
China is seeing significant success in commerce, pipeline politics, and gaining influence on other
governments. This success may be attributed to the effective utilisation of key tools such as the Shanghai
Cooperation Organisation and the Belt and Road Economic Initiative.
Understanding Inductive Bias in Machine LearningSUTEJAS
This presentation explores the concept of inductive bias in machine learning. It explains how algorithms come with built-in assumptions and preferences that guide the learning process. You'll learn about the different types of inductive bias and how they can impact the performance and generalizability of machine learning models.
The presentation also covers the positive and negative aspects of inductive bias, along with strategies for mitigating potential drawbacks. We'll explore examples of how bias manifests in algorithms like neural networks and decision trees.
By understanding inductive bias, you can gain valuable insights into how machine learning models work and make informed decisions when building and deploying them.
Using recycled concrete aggregates (RCA) for pavements is crucial to achieving sustainability. Implementing RCA for new pavement can minimize carbon footprint, conserve natural resources, reduce harmful emissions, and lower life cycle costs. Compared to natural aggregate (NA), RCA pavement has fewer comprehensive studies and sustainability assessments.
Embedded machine learning-based road conditions and driving behavior monitoringIJECEIAES
Car accident rates have increased in recent years, resulting in losses in human lives, properties, and other financial costs. An embedded machine learning-based system is developed to address this critical issue. The system can monitor road conditions, detect driving patterns, and identify aggressive driving behaviors. The system is based on neural networks trained on a comprehensive dataset of driving events, driving styles, and road conditions. The system effectively detects potential risks and helps mitigate the frequency and impact of accidents. The primary goal is to ensure the safety of drivers and vehicles. Collecting data involved gathering information on three key road events: normal street and normal drive, speed bumps, circular yellow speed bumps, and three aggressive driving actions: sudden start, sudden stop, and sudden entry. The gathered data is processed and analyzed using a machine learning system designed for limited power and memory devices. The developed system resulted in 91.9% accuracy, 93.6% precision, and 92% recall. The achieved inference time on an Arduino Nano 33 BLE Sense with a 32-bit CPU running at 64 MHz is 34 ms and requires 2.6 kB peak RAM and 139.9 kB program flash memory, making it suitable for resource-constrained embedded systems.
Souvenir's Booth - Algorithm Design and Analysis Project Project Report
1. ANALYSIS AND DESIGN OF ALGORITHMS (UCS 501) PROJECT REPORT
Souvenir's Booth
Using Auto Complete
Abhinav Garg(101303004)
Akshit Arora(101303012)
Chahak Gupta(101303041)
1/12/2015
Submitted To-
Ms. Tarunpreet Bhatia
2. SOUVENIR’S BOOTH USING AUTO COMPLETE 1
Acknowledgement__________________________________________________________________
We take this opportunity to express our profound gratitude and deep regards to Ms. Tarunpreet
Bhatia for her cordial support, valuable information, and guidance, which helped us complete
this task through its various stages despite the many problems we faced with our project
"Souvenir's Booth". She explained the concepts of algorithmic techniques in an illustrative
manner and provided the necessary guidance through links. This work simply would not
have reached such an extent without her rational guidance.
Lastly, we thank the almighty, our family members, and our friends for their constant
encouragement, without which this assignment would not have been possible.
3. SOUVENIR’S BOOTH USING AUTO COMPLETE 2
INDEX
Sr. No. Title Page Number
1. Abstract 3
2. Introduction 4
3. Objective 5
4. Literature Review 6
5. Algorithmic Techniques Used 6
6. Code 7
7. Schema of Database 10
8. Asymptotic Analysis 11
9. Working Examples 12
10. Conclusion 14
11. Contribution to Society 14
12. References 16
4. SOUVENIR’S BOOTH USING AUTO COMPLETE 3
Abstract_______________________________________________________________________________
Auto-complete is one of the most important features and an essential tool when it comes to accessing web
search engines such as Google, Yahoo, and YouTube. Although it is used intensively on the
internet, auto-complete features in many offline activities as well. The finest example is
smartphones: not only do they suggest the most likely words, they also predict words based on
the priorities of the user, such as previous search history. The utility of this feature across such
a wide range of applications led us to analyse it and implement it in the most efficient way
possible. We implemented auto-complete using a trie and analysed the time complexity of
this algorithm.
5. SOUVENIR’S BOOTH USING AUTO COMPLETE 4
Introduction__________________________________________________________________________
Over the previous couple of years, auto-complete has exploded all over the web. Facebook,
YouTube, Google, Bing, MSDN, LinkedIn, and lots of other websites all try to complete your
phrase as soon as you start typing.
Auto-complete speeds up human-computer interactions when it correctly predicts the word being
typed. It works best in domains with a limited number of possible words (such as in command-line
interpreters), when some words are much more common (such as when addressing an e-mail), or
when writing structured and predictable text (as in source code editors).
Many auto-complete algorithms learn new words after the user has written them a few times, and
can suggest alternatives based on the learned habits of the individual user, typically by means of
machine learning.
Here, we implement auto-complete in a game. The player gets an ordered set of letters called
POX. The player needs to use those starting letters to make a word. A word in the English
language starting with POX may or may not exist. If no such word exists, the player has to use
the maximum possible string from POX starting from its first letter. If she is not able to make a
word with POX but such a word does exist in the English language, she is awarded zero points.
If she is able to make such a word, she is given points according to the letters in the word. Now
Sukriti wants her friend Abhinav, who is a programmer, to make a program that helps her find
the possible words so that she can choose from them.
We use a dictionary as the data store for the words. When we type a few letters of a word, we
could do a regular-expression match through the dictionary text file on each keystroke and print
the words that start with those letters.
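As a point of comparison, here is a minimal sketch of that naive dictionary scan, assuming the same wordlist.txt file used later in the project and a simple prefix comparison in place of a full regular-expression match (the prefix "tri" is a hypothetical user input):
#include<iostream>
#include<fstream>
#include<string>
using namespace std;
int main(){
    ifstream in("wordlist.txt");
    string word, prefix="tri"; //hypothetical letters typed by the user
    while(in>>word)
        //linear scan: compare the first prefix.length() characters of each word
        if(word.compare(0, prefix.length(), prefix)==0)
            cout<<word<<endl;
    return 0;
}
Every query rescans the entire file, so the cost grows with the size of the dictionary rather than the length of the prefix; the trie described next removes that dependence.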
The trie data structure is a tree in which each node holds one letter as data and pointers to the
next nodes. It behaves like a deterministic finite automaton; that is, it works by state-to-state
transitions. Words are paths along this tree, and the root node has no character associated with it.
1. The value of each node is the path, or character sequence, leading up to it.
2. The children at each node are ideally represented using a hash table mapping the next
character to the child nodes.
3. It is also useful to set a flag at every node to indicate whether a word/phrase ends there.
The insertion cost has a linear relationship with the string length. Now let's define the method
for listing all the strings which start with a certain prefix. The idea is to traverse down to the
node representing the prefix and from there do a breadth-first search of all the nodes that are
descendants of the prefix node, as in the sketch that follows. We check whether each node is at a
word boundary using the flag we defined earlier and append the node's value to the results. A
caveat of the recursive approach is that you might run into stack-depth problems when the
maximum length of a string reaches tens of thousands of characters.
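As an illustration of points 1-3 and the prefix listing just described, here is a minimal sketch of such a node, assuming an unordered_map for the children and a boolean end-of-word flag; all names are ours, and this sketch uses a depth-first recursive walk (as the project code listed later does) rather than a breadth-first queue:
#include<iostream>
#include<unordered_map>
#include<string>
using namespace std;
struct TrieNode{
    unordered_map<char, TrieNode*> children; //next character -> child node
    bool isWord = false;                     //flag: a word/phrase ends here
};
void listAll(TrieNode* n, string path){
    //visit this node, then every descendant (depth-first)
    if(n->isWord) cout<<" -> "<<path<<endl;
    for(auto &p : n->children)
        listAll(p.second, path + p.first);
}
void listWithPrefix(TrieNode* root, const string &prefix){
    TrieNode* n = root;
    for(char c : prefix){ //walk down to the node representing the prefix
        auto it = n->children.find(c);
        if(it == n->children.end()) return; //prefix not present in the trie
        n = it->second;
    }
    listAll(n, prefix); //enumerate all words below the prefix node
}
With a populated trie, listWithPrefix(root, "tr") would print every stored word beginning with "tr".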
6. SOUVENIR’S BOOTH USING AUTO COMPLETE 5
Objectives____________________________________________________________________________
1) To optimize auto-completion of the word.
2) To optimize depth-first search by improving the traversal of the data structure through
special algorithm and analysis techniques.
3) To search for a word in the trie.
4) To modify the default structure of the trie and, with certain tweaks, improve the overall
time complexity.
5) To scale the trie substantially: we increased the number of words from 4 to 45,000 and it
worked correctly.
7. SOUVENIR’S BOOTH USING AUTO COMPLETE 6
Literature Review___________________________________________________________________
There are many algorithms and data structures to index and search strings inside a text. Some of
them are included in the standard libraries, but not all of them; the trie data structure is a good
example of one that isn't.
Let word be a single string and let dictionary be a large set of words. If we have a dictionary and
we need to know whether a single word is inside it, tries are a data structure that can
help us. But you may be asking yourself, "Why use tries if set<string> and hash tables can do
the same?" There are two main reasons:
Tries can insert and find strings in O(L) time (where L represents the length of a single word).
This is much faster than a set, and a bit faster than a hash table.
The trie is a tree where each vertex represents a single word or a prefix.
The root represents the empty string (""), the vertexes that are direct sons of the root
represent prefixes of length 1, the vertexes that are 2 edges away from the root represent
prefixes of length 2, the vertexes that are 3 edges away represent prefixes of length 3, and
so on. In other words, a vertex that is k edges away from the root has an associated prefix
of length k.
Let v and w be two vertexes of the trie, and assume that v is the direct father of w; then the
prefix associated with v must itself be a prefix of the one associated with w.
The next figure shows a trie with the words “tree”, “trie”, “algo”, “assoc”, “all”, and “also.”
Note that the vertexes of the tree do not store entire prefixes or entire words; the idea is that
the program should remember the word that each vertex represents while moving lower in the tree.
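The figure itself does not survive this transcript, so as a rough reconstruction, the trie containing those six words can be sketched as follows (each path from the root spells one word; completed words are shown in parentheses):
(root)
 +- t - r -+- e - e        ("tree")
 |         +- i - e        ("trie")
 +- a -+- l -+- g - o      ("algo")
       |     +- l          ("all")
       |     +- s - o      ("also")
       +- s - s - o - c    ("assoc")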
8. SOUVENIR’S BOOTH USING AUTO COMPLETE 7
Algorithmic Techniques_____________________________________________________________
For printing all possible words in a trie with a given root: backtracking traversal (similar
to DFS)
For searching the prefix in the trie: a naïve algorithm is used
For insertion of new words into the trie: recursion
Code___________________________________________________________________________________
#include<iostream>
#include<fstream>
#include<string>
using namespace std;
//Global variables (for the purpose of flags) =>
//found: to indicate whether the word is found or not
//len: to indicate user specified length
int found=0,len=-9999;
class node{ //Node class
public:
char info; //the letter corresponding to node is stored here
string Word;
//for storing words (if completed) till that node
class node* ptrs[256];
//to hold the pointers to 256 next possible letters (ASCII)
node(){
//node class constructor
for(int i=0;i<256;i++){
ptrs[i]=NULL;
}
info='\0'; //initialise with the null character (NULL is a pointer constant, not a char)
Word="";
}
};
void insertword(string word,int pos,class node * root){
//function to insert word
if(word.length()==pos){
//if last position is reached
root->Word=word;
//Put the final word into node
return;
}
9. SOUVENIR’S BOOTH USING AUTO COMPLETE 8
if( root->ptrs[(unsigned char)word[pos]]==NULL ){
//if the next letter is not found
//(the cast to unsigned char keeps the array index non-negative)
node *newnode;
//create new node
newnode= new node;
newnode->info=word[pos];
//with info = letter to store
root->ptrs[(unsigned char)word[pos]]=newnode;
//pointer from current letter to next letter's node
insertword(word,pos+1,root->ptrs[(unsigned char)word[pos]]);
//call insertword again with the next position
}
else
//if the next letter already exists, call insertword on it directly
insertword(word,pos+1,root->ptrs[(unsigned char)word[pos]]);
}
void printall(class node * root){
//function to print all possible words given root of trie
for(int i=0;i<256;i++)
//for all 256 pointers in the provided root node
if(root->ptrs[i]!=NULL){
//if those pointers are not null
printall(root->ptrs[i]);
//call printall recursively at each one of them
}
//Similar to a DFS traversal
//the following code is executed as the recursion starts backtracking
if(root->Word != "" && len!=-9999 && (int)root->Word.length()==len)
{
//if the user specified a length, only words of that length are printed
cout<<" -> "<<root->Word<<endl;
found=1;
//flag that a word of the specified length was found
}
else if(root->Word != "" && len==-9999)
{
//otherwise, all matching words are printed
cout<<" -> "<<root->Word<<endl;
found=1;
}
}
void suggest(string key,int pos, class node * root){
//function to print suggested words
10. SOUVENIR’S BOOTH USING AUTO COMPLETE 9
if(root->ptrs[(unsigned char)key[pos]] != NULL){
//while the next letter of the key still exists in the trie
suggest(key,pos+1,root->ptrs[(unsigned char)key[pos]]);
//call suggest with the next position
}
}
else{
printall(root);
//when last node, print all words below it
}
}
int main(){
ifstream in("wordlist.txt");
//input file "wordlist.txt"
string word,current="",key;
char ch;
node *root;
//root node of trie defined
root = new node;
while(in>>word){
//file parsed and every word inserted into the trie
//(testing the extraction itself avoids inserting the last word twice)
insertword(word,0,root);
}
in.close();
//file stream closed
cout<<endl<<"Trie has been created successfully!"<<endl;
//trie created
cout<<"Enter the starting letters of the word : ";
cin>>key;
//input key from user
cout<<"Do you know the length of the word?(y/n)";
//input required word length(if any)
cin>>ch;
sos:
if(ch=='y')
{
cout<<"Enter the lengthn";
cin>>len;
if(len<=0) {
//if user enters incorrect string length
cout<<"Please enter a length greater than 0.n";
goto sos;
}
}
11. SOUVENIR’S BOOTH USING AUTO COMPLETE 10
cout<<endl<<"Possible suggestions are :"<<endl;
suggest(key,0,root);
//suggest function call
if(!found && ch=='y')
{
cout<<"No words of specified length found";
//printed when no word of the specified length exists
}
return 0;
}
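As a usage note (our addition, assuming the listing above is saved as booth.cpp and g++ is available), the program can be compiled and run with:
g++ -o booth booth.cpp
./booth
It expects wordlist.txt to be present in the working directory, as opened in main.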
Schema of Database____________________________________________________________________
It is a file-based database which is used to store the words for the trie. Initially we took 1,000
words and inserted them into the trie. The words in the file may share a common prefix or have
different prefixes. As we optimized the code, we could store up to 45,000 words in the trie.
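For illustration, the first few lines of such a wordlist.txt might look like this (hypothetical entries, one word per line, matching the whitespace-delimited reads in main):
algo
all
also
assoc
tree
trie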
12. SOUVENIR’S BOOTH USING AUTO COMPLETE 11
Asymptotic Analysis__________________________________________________________________
Space Complexity
In the worst-case analysis, the dictionary consists of words with no common prefix. At every
level, 256 children are possible from each node, so this results in a lot of memory usage.
Maximum number of nodes possible at level 1 = 256
Maximum number of nodes possible at level 2 = 256^2
Maximum number of nodes possible at level 3 = 256^3
Maximum number of nodes possible at level 4 = 256^4
.
.
.
13. SOUVENIR’S BOOTH USING AUTO COMPLETE 12
Maximum number of nodes possible at level m = 256^m
The maximum total number of nodes in a trie, with m as the maximum word length, is:
256 + 256^2 + 256^3 + 256^4 + ... + 256^m
= 256(256^m - 1) / (256 - 1)
= (256/255)(256^m - 1)
= O(256^m)
So the trie has exponential space complexity in the worst case.
Time Complexity
The key determines the trie depth. The trie has linear time complexity in all cases, so it is
considered the most efficient structure for this task in spite of its large space complexity.
Using a trie, we can search for / insert a single key in O(M) time, where M is the maximum
string length.
In this project, we get the search result in O(key_length + ∑(L)), where key_length is the length
of the prefix input by the user and ∑(L) is the summation of the lengths of all the strings starting
with that prefix.
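For example, if the user types the prefix "tr" and the matching dictionary words are "tree" and "trie", the cost is roughly 2 + (4 + 4) = 10 character visits: 2 to walk down to the prefix node and 4 + 4 to enumerate the two matching words.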
Working Examples____________________________________________________________________
Figure 1: Output when user doesn't know length.
14. SOUVENIR’S BOOTH USING AUTO COMPLETE 13
Figure 2: Output when user inputs non-positive length
Figure 3: Output when specified length matches
Figure 4: The dictionary
15. SOUVENIR’S BOOTH USING AUTO COMPLETE 14
Conclusion__________________________________________________________________________
Sukriti's problem of finding the desired word was solved.
Auto-complete was successfully implemented using a trie.
The program suggests words if words of the desired length are not available.
DFS and the trie's implementation were understood.
Contribution to Society_______________________________________________________________
Auto-complete speeds up human-computer interactions when it correctly predicts the word being
typed.
It works best in domains with a limited number of possible words (such as in command-line
interpreters), when some words are much more common (such as when addressing an e-mail), or
when writing structured and predictable text (as in source code editors).
Works for:
1. Web Browsers
2. E-Mail Programs
3. Search Engines
4. Source Code Editors
5. Word Processors
6. Command-Line Interpreters