This document proposes a range query algorithm for big data based on MapReduce and P-Ring indexing. It introduces a dual-index structure that uses an improved P-Ring to index files across nodes, and B+ trees to index attribute values within files. The algorithm divides range queries into two steps: using P-Ring to search for relevant files, then searching within those files' B+ trees to find matching data. It describes how the P-Ring and B+ trees are constructed and used to process range queries in a distributed manner optimized for big data environments.
This document summarizes chapters from a book on database management systems using Oracle SQL and PL/SQL. It provides an overview of topics covered in each chapter, including query processing, file organization, distributed processing, transaction processing, concurrency control, data warehousing, online analytical processing and data mining techniques. The book is divided into three parts that introduce theoretical and programming concepts. It includes examples, objective questions, and exercises. The authors provide their contact information and invite feedback to improve the book.
This document provides an overview of indexing and hashing techniques for database systems. It discusses ordered indices like primary and secondary indices, which are based on sorted keys. It also covers hash indices, which distribute keys uniformly across hash buckets. The document evaluates different indexing techniques based on factors like access time, insertion/deletion time, and space overhead. It describes B+ tree indices, which maintain efficiency during data modifications. Multi-level indexing is introduced to handle large index files that do not fit in memory.
Exploration of Call Transcripts with MapReduce and Zipf’s Law (Tom Donoghue)
This study implements a proof-of-concept pipeline that captures web-based call transcripts and produces a word-frequency dataset ready for textual analysis.
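The map/reduce word-count step behind such a pipeline can be sketched as follows. This is a minimal single-process illustration, not the study's actual implementation; the transcript strings and function names are invented for the example.

```python
from collections import Counter
import re

def map_tokens(transcript: str):
    """Map step: emit (word, 1) pairs for each token in a transcript."""
    for word in re.findall(r"[a-z']+", transcript.lower()):
        yield (word, 1)

def reduce_counts(pairs):
    """Reduce step: sum the counts per word."""
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return counts

transcripts = [
    "the agent said the order shipped",
    "the customer asked about the order",
]
pairs = (p for t in transcripts for p in map_tokens(t))
freq = reduce_counts(pairs)

# Zipf's law predicts frequency roughly inversely proportional to rank,
# which the ranked output below lets one check.
for rank, (word, n) in enumerate(freq.most_common(3), start=1):
    print(rank, word, n)
```

In a real MapReduce job the map and reduce functions run on distributed workers, but the per-record logic is the same.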
Abstract—As the demand for information retrieval grows rapidly, indexing structures have become essential to supporting fast retrieval. This paper introduces a new data structure for information retrieval called the Dynamic Ordered Multi-field Index (DOMI). It is based on radix trees organized into segments, together with a hash table that points to the root of each segment, where each segment is dedicated to storing the values of a single field. The hash table gives direct access to the needed segments without traversing the upper segments, so DOMI improves look-up performance for queries addressing a single field. For queries addressing multiple fields, each relevant radix-tree segment is traversed sequentially without visiting unrelated branches. Segmentation also gives DOMI the flexibility to minimize communication overhead in a distributed system: every field is represented by one segment, and each segment can be stored as one block.
In addition, the proposed DOMI consumes less space than indexes built with B-trees or B+ trees, making it better suited to data-intensive workloads such as Big Data.
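The segment-per-field idea can be sketched as a hash table mapping field names to per-field trees. The sketch below uses a plain character trie as a stand-in for the paper's radix tree, and all class and method names are invented for illustration.

```python
class FieldSegment:
    """One segment: a minimal trie (stand-in for a radix tree) over one field's values."""
    def __init__(self):
        self.root = {}

    def insert(self, value: str, doc_id: int):
        node = self.root
        for ch in value:
            node = node.setdefault(ch, {})
        node.setdefault("$", set()).add(doc_id)  # "$" marks end-of-value

    def lookup(self, value: str):
        node = self.root
        for ch in value:
            if ch not in node:
                return set()
            node = node[ch]
        return node.get("$", set())

class DOMI:
    """Hash table over segment roots: one segment per field, reached directly."""
    def __init__(self):
        self.segments = {}

    def insert(self, field: str, value: str, doc_id: int):
        self.segments.setdefault(field, FieldSegment()).insert(value, doc_id)

    def query(self, field: str, value: str):
        seg = self.segments.get(field)  # direct jump; no other segment is touched
        return seg.lookup(value) if seg else set()

idx = DOMI()
idx.insert("author", "smith", 1)
idx.insert("author", "smyth", 2)
idx.insert("title", "smith", 3)
print(idx.query("author", "smith"))  # only the author segment is consulted
```

A radix tree would additionally collapse single-child chains into one edge, which is where the space savings over B/B+ trees come from.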
Spatial databases have become increasingly popular in recent years, with growing commercial and research interest in location-based search over spatial data. Spatial keyword search has been well studied for years owing to its importance to commercial search engines. Specifically, a spatial keyword query takes a user location and user-supplied keywords as arguments and returns objects that are spatially and textually relevant to those arguments. Geo-textual indices play an important role in spatial keyword querying, and a number of them have been proposed in recent years, mainly combining the R-tree (or its variants) with the inverted file. This paper proposes a new index structure that combines the k-d tree and the inverted file for spatial range keyword queries, ranking objects by their spatial and textual relevance to the query point within a given range.
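The combination of a spatial filter and an inverted file can be sketched as below. This is a toy illustration with invented data, and a linear distance scan stands in for the paper's k-d tree range search; only the inverted-file side is structurally faithful.

```python
from math import dist

# Toy objects: (id, (x, y), text).
objects = [
    (1, (0.0, 0.0), "coffee shop downtown"),
    (2, (5.0, 5.0), "coffee roastery"),
    (3, (0.5, 0.5), "book store"),
]

# Inverted file: keyword -> set of object ids.
inverted = {}
for oid, _, text in objects:
    for word in text.split():
        inverted.setdefault(word, set()).add(oid)

def range_keyword_query(center, radius, keywords):
    """Ids of objects within `radius` of `center` matching all keywords.
    A k-d tree would replace the linear scan over `objects`."""
    textual = set.intersection(*(inverted.get(w, set()) for w in keywords))
    return {oid for oid, pt, _ in objects
            if oid in textual and dist(center, pt) <= radius}

print(range_keyword_query((0.0, 0.0), 2.0, ["coffee"]))  # {1}: object 2 is too far
```

The interesting design question, which the paper addresses, is how tightly to interleave the two filters rather than running them one after the other.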
This document discusses several key differences between traditional databases and Hive. Hive uses a schema-on-read model where the schema is not enforced during data loading, making the initial load much faster. However, this impacts query performance since indexing and compression cannot be applied during loading. Pig Latin is a data flow language where each step transforms the input relation, unlike SQL which is declarative. While Hive originally lacked features like updates, transactions and indexing, the developers are working to integrate HBase and improve support for these features.
IRJET- Resume Information Extraction Framework (IRJET Journal)
The document discusses a framework for extracting information from resumes. Resumes are semi-structured documents that contain varying information like different fields, field names, and formats, making them difficult to parse. The proposed framework uses text mining and rule-based parsing to extract keywords from resumes, scores qualifications and skills, clusters the extracted information using DBSCAN, and classifies the resumes using gradient boosting machines. It aims to help recruiters filter and categorize large numbers of resumes more efficiently.
Text Analysis: Latent Topics and Annotated Documents (Nelson Auner)
This document describes a cluster model for combining latent topics with document attributes in text analysis. It introduces topic models and describes how metadata can be incorporated. The model restricts each document to one topic to allow collapsing observations. An algorithm is provided and applied to congressional speech and restaurant review data. Results show the model can recover topics similarly to topic models, while also capturing variation explained by metadata like political affiliation or review rating.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
A Novel Data Extraction and Alignment Method for Web Databases (IJMER)
International Journal of Modern Engineering Research (IJMER) is Peer reviewed, online Journal. It serves as an international archival forum of scholarly research related to engineering and science education.
International Journal of Modern Engineering Research (IJMER) covers all the fields of engineering and science: Electrical Engineering, Mechanical Engineering, Civil Engineering, Chemical Engineering, Computer Engineering, Agricultural Engineering, Aerospace Engineering, Thermodynamics, Structural Engineering, Control Engineering, Robotics, Mechatronics, Fluid Mechanics, Nanotechnology, Simulators, Web-based Learning, Remote Laboratories, Engineering Design Methods, Education Research, Students' Satisfaction and Motivation, Global Projects, and Assessment, among many others.
Ginix: Generalized Inverted Index for Keyword Search (IRJET Journal)
This paper presents a new index structure called Ginix (Generalized Inverted Index) that more efficiently supports keyword searches on text datasets. Ginix compresses traditional inverted indexes by merging consecutive document IDs into intervals to reduce storage space. It also develops efficient algorithms for set operations like union and intersection directly on the interval lists without requiring decompression. Experiments show that Ginix not only reduces index size but also improves search performance compared to traditional inverted indexes on real datasets.
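The interval-compression idea and the direct intersection on interval lists can be sketched as follows. This is a minimal illustration of the technique as summarized above, not Ginix's actual code.

```python
def to_intervals(doc_ids):
    """Compress a sorted posting list by merging consecutive ids into [lo, hi] intervals."""
    intervals = []
    for d in doc_ids:
        if intervals and d == intervals[-1][1] + 1:
            intervals[-1][1] = d          # extend the current run
        else:
            intervals.append([d, d])      # start a new run
    return intervals

def intersect(a, b):
    """Intersect two interval lists directly, never expanding them back to id lists."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo <= hi:
            out.append([lo, hi])          # overlapping part of the two intervals
        if a[i][1] < b[j][1]:             # advance whichever interval ends first
            i += 1
        else:
            j += 1
    return out

p1 = to_intervals([1, 2, 3, 7, 8, 9])     # [[1, 3], [7, 9]]
p2 = to_intervals([2, 3, 4, 8])           # [[2, 4], [8, 8]]
print(intersect(p1, p2))                  # [[2, 3], [8, 8]]
```

Union works the same way with a merge of overlapping runs, which is why the interval representation saves both space and per-operation work on dense id ranges.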
This document proposes a hypergraph-based approach for optimizing SPARQL queries on RDF data. It first transforms an RDF graph into a hypergraph by grouping subjects and objects connected by the same predicate under a hyperedge for that predicate. It then rearranges the patterns in a SPARQL query based on the size of corresponding hyperedges to build a query path for efficient processing. The query is executed by looping through the rearranged patterns and extracting required subjects and objects from the hypergraph representation to find matching triples. Experimental results show this approach performs better than existing systems like RDF-3x, Jena and AllegroGraph.
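The pattern-reordering step can be sketched as sorting triple patterns by the size of their predicate's hyperedge, so the most selective pattern runs first. The toy triples, query, and function names below are invented for illustration; the paper's full system also builds the hypergraph representation itself.

```python
# Toy RDF triples; the hyperedge size per predicate is the number of triples it groups.
triples = [
    ("alice", "knows", "bob"), ("alice", "knows", "carol"),
    ("bob", "knows", "carol"), ("alice", "worksAt", "acme"),
]
edge_size = {}
for s, p, o in triples:
    edge_size[p] = edge_size.get(p, 0) + 1

# Query patterns (None = variable subject). Rearranged so smaller hyperedges
# come first, shrinking intermediate results early.
patterns = [(None, "knows", "carol"), (None, "worksAt", "acme")]
ordered = sorted(patterns, key=lambda pat: edge_size[pat[1]])

def match(pat):
    """Subjects matching one pattern, scanning only its predicate's triples."""
    return {s for s, p, o in triples
            if p == pat[1] and (pat[2] is None or o == pat[2])}

result = set.intersection(*(match(pat) for pat in ordered))
print(result)  # subjects satisfying both patterns
```

Here `worksAt` (edge size 1) is evaluated before `knows` (edge size 3), mirroring the query-path construction the summary describes.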
SPATIAL R-TREE INDEX BASED ON GRID DIVISION FOR QUERY PROCESSING (ijdms)
Tracking moving objects has become essential in daily life, with many applications such as GPS navigation, traffic-monitoring services, and location-based services. Monitoring the changing positions of objects has become an important problem: the moving entities send their positions to a server over a network, generating large volumes of data with very frequent updates, so an index structure is needed to retrieve information as quickly as possible. The index must be adaptive and dynamic enough to monitor object locations and answer queries efficiently. The best-known query types in moving-object databases are range, point, and k-nearest-neighbour queries. This study uses the R-tree to answer detailed range queries efficiently; however, the R-tree alone produces considerable overlap and coverage between MBRs, so it is combined with a grid-partition index, which reduces that overlap and coverage. Combining the two methods makes query performance efficient. We perform an extensive experimental study comparing the two approaches on modern hardware.
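The grid-partition side of such an index can be sketched as below. This is a simplified illustration with an invented cell size and data; in the paper's hybrid scheme each grid cell would contain an R-tree rather than a flat point list.

```python
from collections import defaultdict

CELL = 10.0  # grid cell size; a coarse partition that bounds MBR overlap

grid = defaultdict(list)  # (cx, cy) -> [(id, x, y), ...]

def insert(oid, x, y):
    grid[(int(x // CELL), int(y // CELL))].append((oid, x, y))

def range_query(x1, y1, x2, y2):
    """Visit only the grid cells overlapping the query rectangle, then
    filter points exactly (a per-cell R-tree would refine this step)."""
    hits = []
    for cx in range(int(x1 // CELL), int(x2 // CELL) + 1):
        for cy in range(int(y1 // CELL), int(y2 // CELL) + 1):
            for oid, x, y in grid[(cx, cy)]:
                if x1 <= x <= x2 and y1 <= y <= y2:
                    hits.append(oid)
    return sorted(hits)

insert(1, 3, 4); insert(2, 12, 4); insert(3, 25, 30)
print(range_query(0, 0, 15, 10))  # [1, 2]
```

Because cells never overlap, the grid layer avoids the MBR overlap problem entirely; the R-trees inside each cell then handle skewed point distributions.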
This document provides an introduction to data analysis techniques using Python. It discusses key Python libraries for data analysis like NumPy, Pandas, SciPy, Scikit-Learn and libraries for data visualization like matplotlib and Seaborn. It covers essential concepts in data analysis like Series, DataFrames and how to perform data cleaning, transformation, aggregation and visualization on data frames. It also discusses statistical analysis, machine learning techniques and how big data and data analytics can work together. The document is intended as an overview and hands-on guide to getting started with data analysis in Python.
The document discusses indexing and hashing techniques in database systems. It describes ordered indices like B-trees which store index entries in sorted order and allow efficient retrieval of records within a range. B-trees overcome limitations of indexed sequential files by reorganizing with local changes instead of periodic rebuilds. The document also covers hashing which distributes records using a hash function, and compares indexing with hashing.
IRJET- Data Retrieval using Master Resource Description Framework (IRJET Journal)
This document discusses using a Master Resource Description Framework (MRDF) to improve data retrieval efficiency from databases. The MRDF combines multiple RDF files into a single framework to reduce the time needed for search engines to query each individual RDF file. It also describes using a user profile to track user interests and tailor query results accordingly for a personalized search experience. The MRDF approach is presented as improving search efficiency while retrieving data from databases.
International Journal of Engineering Research and Development (IJERD Editor)
Electrical, Electronics and Computer Engineering,
Information Engineering and Technology,
Mechanical, Industrial and Manufacturing Engineering,
Automation and Mechatronics Engineering,
Material and Chemical Engineering,
Civil and Architecture Engineering,
Biotechnology and Bio Engineering,
Environmental Engineering,
Petroleum and Mining Engineering,
Marine and Agriculture engineering,
Aerospace Engineering.
Multikeyword Hunt on Progressive Graphs (IRJET Journal)
1. The document presents a visualization technique for organizing and exploring large document collections by topic. It uses techniques like removing stop words, stemming, identifying frequent words, generating synonyms, and force-directed layout to generate topics from document collections and visualize them in a graph.
2. Key steps include preprocessing the documents by removing stop words and stemming, identifying frequent words, generating synonyms to group related words into topics, using force-directed layout to position topics based on their relationships, and enabling interactive visualization of the topic graph and retrieval of related documents.
3. The proposed technique aims to help users efficiently browse, organize and find relevant information from large document collections with minimal human intervention through an interactive visualized topic graph.
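The preprocessing steps in the pipeline above (stop-word removal, stemming, frequent-word extraction) can be sketched as follows. The stop list, documents, and crude suffix-stripping stemmer are invented stand-ins; a real system would use a proper stemmer such as Porter's.

```python
import re
from collections import Counter

STOP = {"the", "a", "of", "and", "to", "in"}

def stem(word):
    """A crude suffix-stripping stand-in for a real stemmer (e.g. Porter)."""
    for suf in ("ing", "ed", "s"):
        if word.endswith(suf) and len(word) > len(suf) + 2:
            return word[: -len(suf)]
    return word

def frequent_words(docs, top=3):
    """Tokenize, drop stop words, stem, and return the most frequent stems."""
    counts = Counter()
    for doc in docs:
        for w in re.findall(r"[a-z]+", doc.lower()):
            if w not in STOP:
                counts[stem(w)] += 1
    return [w for w, _ in counts.most_common(top)]

docs = ["Indexing documents", "Document indexes and indexing"]
print(frequent_words(docs))
```

The frequent stems would then seed topic nodes, with synonym grouping and force-directed layout applied downstream.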
Survey on scalable continual top-k keyword search in relational databases (eSAT Journals)
Abstract: Keyword search in relational databases is a technique of high relevance today. Extracting data from large databases is important because it reduces manpower and time consumption, and retrieving data with relevant keywords is interactive and user-friendly: the user can get information without knowing any database schema or query language such as SQL. For real-time applications, however, the database content keeps changing; for example, a database that stores publication data must absorb new publications as they arrive, so its content, and therefore the results, change over time. To handle such updates, each search takes the top-k results from the currently updated data. Top-k keyword search returns the k results ranked highest by document relevance, where keyword search in relational databases means finding structural information among tuples. The two main approaches are schema-based and graph-based. With top-k search, instead of evaluating all query results, only the k highest-scoring ones are taken; on each database update, new results are found and expired ones removed.
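The "keep only the k best results" step can be sketched with a size-k min-heap, avoiding materializing and sorting the full result set. This is a generic illustration of the top-k idea with invented scores, not any surveyed algorithm in particular.

```python
import heapq

def top_k(scored_results, k):
    """Keep only the k highest-scoring results using a size-k min-heap."""
    heap = []  # (score, doc_id), smallest kept score on top
    for doc_id, score in scored_results:
        if len(heap) < k:
            heapq.heappush(heap, (score, doc_id))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, doc_id))  # evict current minimum
    return sorted(heap, reverse=True)  # best first

results = [("d1", 0.2), ("d2", 0.9), ("d3", 0.5), ("d4", 0.7)]
print(top_k(results, 2))  # [(0.9, 'd2'), (0.7, 'd4')]
```

Algorithms like Global Pipeline and Skyline-Sweeping improve on this baseline by also deciding *which* candidate networks to evaluate, stopping once no unseen candidate can beat the current k-th score.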
Triple-Triple RDF Store with Greedy Graph based Grouping (Vinoth Chandar)
This document proposes improvements to the query performance of triple stores that store RDF data in a relational database. It explores storing RDF triples in different orders in three tables and developing a query rewriting scheme. It also looks at optimizing the physical schema through graph clustering techniques that aim to reduce the number of disk accesses by grouping related triples closer together. The paper presents an implementation of these techniques over a million triples and shows they can yield significant performance benefits on complex queries.
This document summarizes various techniques for scalable continual top-k keyword search in relational databases. There are two main approaches: schema-based and graph-based. Schema-based methods generate candidate networks from the database schema and evaluate them. Graph-based methods represent the database as a graph and use techniques like bidirectional expansion. Top-k keyword search finds the highest scoring k results instead of all results. Methods like the Global Pipeline algorithm and Skyline-Sweeping algorithm efficiently process top-k queries over multiple candidate networks. Techniques for updating results with database changes include maintaining an initial top-k and recalculating scores. Lattice-based methods share computational costs for keyword search in data streams.
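The "same triples in three orders" idea from the triple-store summary above can be sketched as three permutation indexes. The class and method names are invented for illustration; the paper stores these as relational tables with a query-rewriting layer on top.

```python
from collections import defaultdict

class TripleStore:
    """Keep the same triples in three orders (SPO, POS, OSP) so a lookup
    with any single bound position hits a clustered index."""
    def __init__(self):
        self.spo = defaultdict(list)  # subject   -> [(predicate, object)]
        self.pos = defaultdict(list)  # predicate -> [(object, subject)]
        self.osp = defaultdict(list)  # object    -> [(subject, predicate)]

    def add(self, s, p, o):
        self.spo[s].append((p, o))
        self.pos[p].append((o, s))
        self.osp[o].append((s, p))

    def by_predicate(self, p):
        """All (object, subject) pairs for a predicate, read from one index."""
        return sorted(self.pos[p])

store = TripleStore()
store.add("alice", "knows", "bob")
store.add("carol", "knows", "bob")
print(store.by_predicate("knows"))  # [('bob', 'alice'), ('bob', 'carol')]
```

The graph-clustering optimization the paper adds then decides which triples to place in the same disk block, on top of this ordering scheme.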
Efficient Record De-Duplication Identifying Using Febrl Framework (IOSR Journals)
This document describes using the Febrl (Freely Extensible Biomedical Record Linkage) framework to perform efficient record de-duplication. It discusses how Febrl allows for data cleaning, standardization, indexing, field comparison, and weight vector classification. Indexing techniques like blocking indexes, q-grams, and canopy clustering are used to reduce the number of record pair comparisons. Field comparison functions calculate matching weights, and classifiers like Fellegi-Sunter and support vector machines are used to determine matches. The method is evaluated on real-world health data, showing accuracy, precision, recall, and false positive rates for different partitioning methods.
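The blocking-index idea, which cuts down the number of record-pair comparisons, can be sketched as follows. The records and the blocking key (first letter of each name) are invented for illustration; Febrl itself offers richer schemes such as q-gram and canopy-clustering indexes.

```python
from itertools import combinations

records = [
    (1, "jonathan", "smith"),
    (2, "jonathon", "smith"),
    (3, "mary", "jones"),
]

def blocking_key(rec):
    """Block on the first letter of each name; only records sharing a key are compared."""
    _, first, last = rec
    return first[0] + last[0]

blocks = {}
for rec in records:
    blocks.setdefault(blocking_key(rec), []).append(rec)

# Candidate pairs come only from within blocks: here 1 comparison instead of 3.
pairs = [(a[0], b[0]) for block in blocks.values()
         for a, b in combinations(block, 2)]
print(pairs)  # [(1, 2)]
```

Field-comparison functions and a classifier (e.g. Fellegi-Sunter) then decide, pair by pair, which candidates are true duplicates.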
This document discusses database management systems and contains slides related to external data storage, file organization, indexing, and performance comparisons. Specifically, it provides information on different file organizations like heap files, sorted files, and indexed files. It also describes index structures like B+ trees and hash indexes. The slides provide comparisons of the cost of common operations like scans, searches, and updates between different file organization approaches.
An extended database reverse engineering – a key for database forensic invest... (eSAT Publishing House)
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
This document provides an overview of Python basics for data analysis, including introductions to key Python packages like NumPy, Pandas, and Matplotlib. It covers fundamental Python concepts like data types, operators, conditional statements, loops and functions. It also demonstrates how to load and manipulate data with NumPy arrays and Pandas DataFrames, including indexing, slicing, grouping, merging, and handling missing values. Visualization with Matplotlib charts is also covered.
Context Based Web Indexing For Semantic Web (IOSR Journals)
This summarizes a document that proposes a new context-based indexing technique using a B+ tree for web search engines. It extracts keywords from web documents and indexes them along with their contexts and ontologies in a B+ tree. This improves search speed by allowing relevant documents to be found faster from the semantic web through an optimized indexing structure as compared to linear searches or other trees. The proposed technique increases precision and recall for user queries by incorporating contextual information into the indexing process.
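Indexing a keyword together with its context under an ordered key can be sketched as below. A sorted key list maintained with `bisect` stands in for the B+ tree's leaf level; the data and function names are invented for illustration.

```python
import bisect

entries = []   # sorted list of (keyword, context) keys: stand-in for B+ tree leaves
postings = {}  # key -> list of document ids

def index(keyword, context, doc_id):
    key = (keyword, context)
    if key not in postings:
        bisect.insort(entries, key)  # keep keys ordered, as a B+ tree would
        postings[key] = []
    postings[key].append(doc_id)

def search(keyword, context):
    """Exact (keyword, context) lookup; ordered keys also permit range scans."""
    return postings.get((keyword, context), [])

index("jaguar", "animal", 1)
index("jaguar", "car", 2)
print(search("jaguar", "car"))  # [2]
```

The context component is what disambiguates homonymous keywords, which is the precision gain the document claims.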
The document discusses the small business marketing technique of risk reversal, where customers are offered something with no risk, such as a money-back guarantee or free trial period. It provides examples of companies offering 60-day money-back guarantees or free 60-day trials of software. The author explains that making a limited-time offer, such as in celebration of an event, can spark people into action and help filter out non-serious customers. Their own company provides a free one-month trial of its membership site through November 22nd to attract new subscribers.
Review and Comparisons between Multiple Ant Based Routing Algorithms in Mobi... (IJMER)
With the increasing use and development of various types of mobile ad hoc and wireless sensor networks, optimum routing in these networks remains an open topic, and new algorithms continue to be presented. Using ant colony optimization (ACO) as a routing method, because of its structural similarity to these networks' model, has produced acceptable results across different parameters, especially quality of service (QoS). Given that many articles have suggested and presented various models for ant-based routing, there is a clear need to study and compare them. Of roughly 17 applied ant-based routing schemes, this article studies and compares the most important ant-based algorithms so as to indicate the quality and importance of each under different conditions.
Artificial Intelligence based optimization of weld bead geometry in laser wel... (IJMER)
This paper reports on the modeling and optimization of laser welding of a 1.7 mm thick aluminum-magnesium alloy. Regression analysis is used for modeling, and a genetic algorithm is used to optimize the process parameters. The input values for the regression models are taken from a Taguchi-based orthogonal array. A software tool named Computer Aided Robust Parameter Genetic Algorithm (CARPGA), developed in MATLAB 2013, combines all of these methodologies and has been validated against results from published papers.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
A Novel Data Extraction and Alignment Method for Web DatabasesIJMER
International Journal of Modern Engineering Research (IJMER) is Peer reviewed, online Journal. It serves as an international archival forum of scholarly research related to engineering and science education.
International Journal of Modern Engineering Research (IJMER) covers all the fields of engineering and science: Electrical Engineering, Mechanical Engineering, Civil Engineering, Chemical Engineering, Computer Engineering, Agricultural Engineering, Aerospace Engineering, Thermodynamics, Structural Engineering, Control Engineering, Robotics, Mechatronics, Fluid Mechanics, Nanotechnology, Simulators, Web-based Learning, Remote Laboratories, Engineering Design Methods, Education Research, Students' Satisfaction and Motivation, Global Projects, and Assessment…. And many more.
Ginix Generalized Inverted Index for Keyword SearchIRJET Journal
This paper presents a new index structure called Ginix (Generalized Inverted Index) that more efficiently supports keyword searches on text datasets. Ginix compresses traditional inverted indexes by merging consecutive document IDs into intervals to reduce storage space. It also develops efficient algorithms for set operations like union and intersection directly on the interval lists without requiring decompression. Experiments show that Ginix not only reduces index size but also improves search performance compared to traditional inverted indexes on real datasets.
This document proposes a hypergraph-based approach for optimizing SPARQL queries on RDF data. It first transforms an RDF graph into a hypergraph by grouping subjects and objects connected by the same predicate under a hyperedge for that predicate. It then rearranges the patterns in a SPARQL query based on the size of corresponding hyperedges to build a query path for efficient processing. The query is executed by looping through the rearranged patterns and extracting required subjects and objects from the hypergraph representation to find matching triples. Experimental results show this approach performs better than existing systems like RDF-3x, Jena and AllegroGraph.
SPATIAL R-TREE INDEX BASED ON GRID DIVISION FOR QUERY PROCESSINGijdms
Tracing moving objects have turned out to be essential in our life and have a lot of uses like: GPS guide,
traffic monitor based administrations and location-based services. Tracking the changing places of objects
has turned into important issues. The moving entities send their positions to the server through a system
and large amount of data is generated from these objects with high frequent updates so we need an index
structure to retrieve information as fast as possible. The index structure should be adaptive, dynamic to
monitor the locations of objects and quick to give responses to the inquiries efficiently. The most wellknown
kinds of queries strategies in moving objects databases are Rang, Point and K-Nearest Neighbour
and inquiries. This study uses R-tree method to get detailed range query results efficiently. But using R-tree
only will generate much overlapping and coverage between MBR. So R-tree by combining with Gridpartition
index is used because grid-index can reduce the overlap and coverage between MBR. The query
performance will be efficient by using these methods. We perform an extensive experimental study to
compare the two approaches on modern hardware.
This document provides an introduction to data analysis techniques using Python. It discusses key Python libraries for data analysis like NumPy, Pandas, SciPy, Scikit-Learn and libraries for data visualization like matplotlib and Seaborn. It covers essential concepts in data analysis like Series, DataFrames and how to perform data cleaning, transformation, aggregation and visualization on data frames. It also discusses statistical analysis, machine learning techniques and how big data and data analytics can work together. The document is intended as an overview and hands-on guide to getting started with data analysis in Python.
The document discusses indexing and hashing techniques in database systems. It describes ordered indices like B-trees which store index entries in sorted order and allow efficient retrieval of records within a range. B-trees overcome limitations of indexed sequential files by reorganizing with local changes instead of periodic rebuilds. The document also covers hashing which distributes records using a hash function, and compares indexing with hashing.
IRJET- Data Retrieval using Master Resource Description FrameworkIRJET Journal
This document discusses using a Master Resource Description Framework (MRDF) to improve data retrieval efficiency from databases. The MRDF combines multiple RDF files into a single framework to reduce the time needed for search engines to query each individual RDF file. It also describes using a user profile to track user interests and tailor query results accordingly for a personalized search experience. The MRDF approach is presented as improving search efficiency while retrieving data from databases.
International Journal of Engineering Research and Development – IJERD Editor
Electrical, Electronics and Computer Engineering,
Information Engineering and Technology,
Mechanical, Industrial and Manufacturing Engineering,
Automation and Mechatronics Engineering,
Material and Chemical Engineering,
Civil and Architecture Engineering,
Biotechnology and Bio Engineering,
Environmental Engineering,
Petroleum and Mining Engineering,
Marine and Agriculture engineering,
Aerospace Engineering.
Multikeyword Hunt on Progressive Graphs – IRJET Journal
1. The document presents a visualization technique for organizing and exploring large document collections by topic. It uses techniques like removing stop words, stemming, identifying frequent words, generating synonyms, and force-directed layout to generate topics from document collections and visualize them in a graph.
2. Key steps include preprocessing the documents by removing stop words and stemming, identifying frequent words, generating synonyms to group related words into topics, using force-directed layout to position topics based on their relationships, and enabling interactive visualization of the topic graph and retrieval of related documents.
3. The proposed technique aims to help users efficiently browse, organize and find relevant information from large document collections with minimal human intervention through an interactive visualized topic graph.
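The preprocessing steps listed above (stop-word removal, stemming, frequent-word identification) can be sketched as follows; the stop-word list and the crude suffix-stripping "stemmer" are our own placeholders, not the paper's implementation.

```python
from collections import Counter

STOP_WORDS = {"the", "a", "of", "and", "to", "in"}  # illustrative list

def stem(word):
    """Crude suffix-stripping stand-in for a real stemmer (e.g. Porter)."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def frequent_words(text, top_n=3):
    """Remove stop words, stem, and return the most frequent stems."""
    tokens = [stem(w) for w in text.lower().split() if w not in STOP_WORDS]
    return Counter(tokens).most_common(top_n)

text = "indexing documents and searching indexed documents in the index"
print(frequent_words(text))   # -> [('index', 3), ('document', 2), ('search', 1)]
```

Grouping stems this way is what lets "indexing", "indexed" and "index" count toward one topic word.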
Survey on scalable continual top-k keyword search in relational databases – eSAT Journals
Abstract – Keyword search in relational databases is a technique of high relevance today. Extracting data from large collections of databases matters because it reduces manpower and time consumption. Keyword-based extraction of the needed information is interactive and user friendly: the user can retrieve information without knowing any database schema or query language such as SQL. However, database content changes constantly in real-time applications; for example, a database storing publication data must be updated whenever new publications arrive, so its content changes over time, and because the database is updated frequently the search results should change as well. To handle these updates, each search takes the top-k results over the currently updated data. Top-k keyword search returns the k highest-ranked results based on document relevance. Keyword search in relational databases means finding structural information among tuples; the two main approaches are schema-based and graph-based. Rather than executing all query results, top-k search keeps only the k best, and by handling database updates it finds new results and removes expired ones.
Triple-Triple RDF Store with Greedy Graph based Grouping – Vinoth Chandar
This document proposes improvements to the query performance of triple stores that store RDF data in a relational database. It explores storing RDF triples in different orders in three tables and developing a query rewriting scheme. It also looks at optimizing the physical schema through graph clustering techniques that aim to reduce the number of disk accesses by grouping related triples closer together. The paper presents an implementation of these techniques over a million triples and shows they can yield significant performance benefits on complex queries.
This document summarizes various techniques for scalable continual top-k keyword search in relational databases. There are two main approaches: schema-based and graph-based. Schema-based methods generate candidate networks from the database schema and evaluate them. Graph-based methods represent the database as a graph and use techniques like bidirectional expansion. Top-k keyword search finds the highest scoring k results instead of all results. Methods like the Global Pipeline algorithm and Skyline-Sweeping algorithm efficiently process top-k queries over multiple candidate networks. Techniques for updating results with database changes include maintaining an initial top-k and recalculating scores. Lattice-based methods share computational costs for keyword search in data streams.
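The core idea of top-k retrieval — keeping only the k highest-scoring results instead of materialising all of them — can be sketched with Python's bounded-heap helper; the scored tuples below are hypothetical, standing in for results produced by evaluating candidate networks.

```python
import heapq

# Hypothetical (relevance_score, tuple_id) pairs from a keyword search.
scored = [(0.91, "t1"), (0.42, "t2"), (0.77, "t3"), (0.88, "t4"), (0.15, "t5")]

def top_k(scored, k):
    """Select the k best results in O(n log k) using a bounded heap."""
    return heapq.nlargest(k, scored)

print(top_k(scored, 3))   # -> [(0.91, 't1'), (0.88, 't4'), (0.77, 't3')]
```

Algorithms like Global Pipeline refine this further by stopping candidate-network evaluation early once no remaining candidate can beat the current k-th score.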
Efficient Record De-Duplication Identifying Using Febrl Framework – IOSR Journals
This document describes using the Febrl (Freely Extensible Biomedical Record Linkage) framework to perform efficient record de-duplication. It discusses how Febrl allows for data cleaning, standardization, indexing, field comparison, and weight vector classification. Indexing techniques like blocking indexes, q-grams, and canopy clustering are used to reduce the number of record pair comparisons. Field comparison functions calculate matching weights, and classifiers like Fellegi-Sunter and support vector machines are used to determine matches. The method is evaluated on real-world health data, showing accuracy, precision, recall, and false positive rates for different partitioning methods.
This document discusses database management systems and contains slides related to external data storage, file organization, indexing, and performance comparisons. Specifically, it provides information on different file organizations like heap files, sorted files, and indexed files. It also describes index structures like B+ trees and hash indexes. The slides provide comparisons of the cost of common operations like scans, searches, and updates between different file organization approaches.
An extended database reverse engineering – a key for database forensic invest... – eSAT Publishing House
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
This document provides an overview of Python basics for data analysis, including introductions to key Python packages like NumPy, Pandas, and Matplotlib. It covers fundamental Python concepts like data types, operators, conditional statements, loops and functions. It also demonstrates how to load and manipulate data with NumPy arrays and Pandas DataFrames, including indexing, slicing, grouping, merging, and handling missing values. Visualization with Matplotlib charts is also covered.
Context Based Web Indexing For Semantic Web – IOSR Journals
This summarizes a document that proposes a new context-based indexing technique using a B+ tree for web search engines. It extracts keywords from web documents and indexes them along with their contexts and ontologies in a B+ tree. This improves search speed by allowing relevant documents to be found faster from the semantic web through an optimized indexing structure as compared to linear searches or other trees. The proposed technique increases precision and recall for user queries by incorporating contextual information into the indexing process.
The document discusses the small business marketing technique of risk reversal, where customers are offered something with no risk, such as a money-back guarantee or free trial period. It provides examples of companies offering 60-day money-back guarantees or free 60-day trials of software. The author explains that making a limited-time offer, such as in celebration of an event, can spark people into action and help filter out non-serious customers. Their own company provides a free one-month trial of its membership site through November 22nd to attract new subscribers.
Review and Comparisons between Multiple Ant Based Routing Algorithms in Mobi... – IJMER
With the growing use and development of various types of mobile ad hoc and wireless sensor networks, providing optimal routing in these networks remains an open topic, and new algorithms continue to be presented. Using ant colony optimization (ACO) as a routing method, because of its structural similarity to these networks' model, has produced acceptable results across different parameters, especially quality of service (QoS). Since many articles have proposed various models for ant-based routing, there is a clear need to study and compare them. Of the roughly 17 applied ant-based routing schemes, this article studies and compares the most important algorithms to indicate the quality and importance of each under different conditions.
Artificial Intelligence based optimization of weld bead geometry in laser wel... – IJMER
This paper reports on modeling and optimization of laser welding of an aluminum-magnesium alloy 1.7 mm thick. Regression analysis is used for modeling and a genetic algorithm is used to optimize the process parameters. The input values for the regression method are taken according to a Taguchi-based orthogonal array. A software tool named Computer Aided Robust Parameter Genetic Algorithm (CARPGA) was developed in MATLAB 2013 to combine these methodologies, and it has been validated against published papers.
The document discusses how the creator used, developed, and challenged existing codes and conventions of soap operas and melodrama genres when creating their own soap opera trailer. They researched conventions of genres like social realism and melodrama to ensure their trailer followed expectations. Scenes and shots from Hollyoaks were analyzed and used as inspiration. Existing front covers and billboards were also examined to incorporate standard design elements while also adding new creative elements to conventions.
This document summarizes an experiment on using electrocoagulation with iron electrodes to remove mercury from wastewater. Key findings include:
1) Maximum mercury removal of 94.5% was achieved after 40 minutes of electrocoagulation at an applied potential of 9V, agitation of 400 rpm, initial pH of 4.5, and electrolyte concentration of 1.333 g/L.
2) Higher applied potentials and agitation rates decreased mercury removal efficiency due to excessive oxygen generation and unsuitable floc formation.
3) Operating costs were calculated based on energy consumption and electrode material costs. Electrocoagulation was found to be an efficient and fast method for mercury removal compared to conventional techniques.
Development of Nanocomposite from Epoxy/PDMS-Cyanate/Nanoclay for Materials w... – IJMER
A dicyanate monomer, viz. bis-4-cyanato-polydimethylsiloxane (PDMS-CY), containing siloxane, a thermally stable structural unit, was prepared. The stability, thermal degradation kinetics and microstructures of the PDMS-CY/DGEBA system were studied.
This document summarizes a research paper on using a multilevel converter topology for AC-DC harmonic immunity in VSC HVDC transmission. Key points:
- Voltage-source converter (VSC) based HVDC systems use multilevel converters to achieve high voltage switching capabilities. Cascaded H-bridge converters allow utilization of different DC voltages on individual cells.
- Selective harmonic elimination pulse-width modulation (SHE-PWM) techniques offer the lowest possible number of switching transitions and losses for VSC HVDC. The paper discusses optimized modulation patterns providing controlled harmonic immunity between AC and DC sides for a two-level converter with a rippled DC link voltage.
- Simulation and experimental results presented confirm the effectiveness of the proposed method.
This document reviews techniques used in spoken-word recognition systems. It discusses popular feature extraction techniques like MFCC, LPC, DWT, WPD that are used to represent speech signals in a compact form before classification. Classification techniques discussed are ANN, HMM, DTW, and VQ. The document provides a brief overview of each technique and their advantages. It also presents the generalized workflow of a spoken-word recognition system including stages of speech acquisition, pre-emphasis, feature extraction, modeling, classification, and output of recognized text.
Skyline Query Processing using Filtering in Distributed Environment – IJMER
This document summarizes a research paper about skyline query processing in distributed databases. Skyline queries return multidimensional data points that are not dominated by other points. In distributed databases, skyline queries must be processed across multiple data sites. The paper proposes using multiple filtering points selected from each local skyline result to reduce the number of false positive results and communication costs between sites. Two heuristics called MaxSum and MaxDist are described for selecting filtering points that maximize their combined dominating potential across sites to improve distributed skyline query processing performance.
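The dominance relation at the heart of skyline queries can be sketched in a few lines of Python (our own illustration of the standard definition, not the paper's distributed algorithm); in a distributed setting each site would compute such a local skyline before filtering points are exchanged.

```python
def dominates(p, q):
    """p dominates q if p is no worse in every dimension and strictly
    better in at least one (here: smaller values are better)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def local_skyline(points):
    """Naive O(n^2) skyline: keep points dominated by no other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

pts = [(1, 9), (3, 3), (9, 1), (5, 5), (8, 8)]
print(local_skyline(pts))   # -> [(1, 9), (3, 3), (9, 1)]
```

A filtering point such as (3, 3) prunes (5, 5) and (8, 8) at remote sites before they are ever transmitted, which is the communication saving the paper targets.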
Voice over IP (VOIP) Security Research - A Research – IJMER
This document summarizes research on Voice over IP (VoIP) security. It begins with an overview of SIP (Session Initiation Protocol), a commonly used VoIP standard, and a taxonomy of VoIP security threats. It then surveys existing research on VoIP security classified according to the threat categories. The research covers threats like eavesdropping, denial of service attacks, toll fraud, and spam over IP telephony (SPIT). Approaches studied include encryption, authentication, reputation systems, audio fingerprinting, and Turing tests to detect automated SPIT callers. The goal is to identify gaps and guide future work on analyzing VoIP attackers and improving the security and resilience of VoIP systems.
Performance of MMSE Denoise Signal Using LS-MMSE Technique – IJMER
This paper presents the performance of MMSE signal denoising using consistent cycle spinning (CCS) and least-squares (LS) techniques. In the past decade, the TV denoising technique has been used to reduce signal noise; its main drawbacks are low signal quality and high MMSE. Here we propose the CCS-MMSE and LS-MMSE techniques. CCS-MMSE consists of two steps: wavelet-based denoising and consistent cycle spinning. Wavelet denoising has a powerful decorrelating effect on many signal domains, and consistent cycle spinning is used to estimate the MMSE in the signal domain. LS-MMSE gives a better MMSE estimate in the signal domain than CCS-MMSE. Experimental results show the average MMSE reduction achieved by the various techniques.
To make a biogas energy from different sources & creating awareness between h... – IJMER
Biogas from biomass appears as an alternative energy source in regions rich in biomass resources. This article gives an overview of present and future use of biomass as an industrial feedstock for production of fuels, chemicals and other materials; however, to be truly competitive in an open market, higher-value products are required. Results suggest that biogas technology must be encouraged, promoted, invested in, implemented and demonstrated, especially in remote rural areas. Different types of waste, found easily and everywhere, are used for biogas production, and this article explains how to make biogas from them. The study concludes that the method not only contributes to renewable biogas production but also improves effluent quality.
On some locally closed sets and spaces in Ideal Topological Spaces – IJMER
This document introduces and studies the concept of δˆ s-locally closed sets in ideal topological spaces. Some key points:
- A subset A is δˆ s-locally closed if A can be written as the intersection of a δˆ s-open set and a δˆ s-closed set.
- Various properties of δˆ s-locally closed sets are introduced and characterized, including relationships to other concepts like generalized locally closed sets.
- It is shown that a subset A is δˆ s-locally closed if and only if A can be written as the intersection of a δˆ s-open set and the δˆ s-closure of A.
Dynamic Spectrum Allocation in Wireless Sensor Networks – IJMER
Radio frequency spectrum is considered the most expensive and scarce resource among all wireless
network resources, and it is closely followed by the energy consumption, especially in low energy, battery powered
wireless sensor network devices. These days, there is a tremendous growth in the applications of wireless sensor
networks (WSNs) operating in unlicensed spectrum bands (ISM). Moreover, due to the rapid growth of wireless
devices that are designed to be operated in unlicensed spectrum bands, these spectrum bands have been overcrowded.
The problem with overcrowded spectrum or scarcity of spectrum can be solved by Dynamic Allocation of Spectrum.
In this paper we have presented the implementation and analysis of dynamic spectrum allocation in Wireless Sensors
Networks using the concept of Cognitive Radio Ad Hoc Network.
Fatigue Analysis of Acetylene Converter Reactor – IJMER
The structural integrity of mechanical components during several transients should be
assured in the design stage. This requires a fatigue analysis including thermal and structural analysis. As
an example, this study performs a fatigue analysis of the acetylene converter reactor during arbitrary
transients. Using heat transfer coefficients determined based on the operating environments, a transient
thermal analysis is performed and the results are applied to a finite element model along with the
pressure to calculate the stresses. The total stress intensity range and cumulative fatigue usage factor are
investigated to determine the adequacy of the design.
Stability of the Equilibrium Position of the Centre of Mass of an Inextensibl... – IJMER
International Journal of Modern Engineering Research (IJMER) is Peer reviewed, online Journal. It serves as an international archival forum of scholarly research related to engineering and science education.
A Survey of User Authentication Schemes for Mobile Device – IJMER
The main objective of this paper is the systematic description of the current research and
development of small or miniature unmanned aerial vehicles and micro aerial vehicles, with a focus on
rotary wing vehicles. In recent times, unmanned/Micro aerial vehicles have been operated across the
world; they have also been the subject of considerable research. In particular, UAVs/MAVs with rotary
wings have been expected to perform various tasks such as monitoring at fixed points and surveillance
from the sky, since they can not only perform static flights by hovering but also achieve vertical
takeoffs and landing. Helicopters have been used for personnel transport, carrying goods, spreading
information, and performing monitoring duties for long periods. A manned helicopter has to be used for
all these duties. On the other hand, unmanned helicopters that can be operated by radio control have
been developed as a hobby. Since unmanned helicopters are often superior to manned helicopters in
terms of cost and safety, in recent years, accomplishing tasks using unmanned helicopters has become
popular. Considerable expertise is required to operate unmanned helicopters by radio control, and
hence, vast labor resources are employed to train operators. Moreover, it is impossible to operate
unmanned helicopters outside visual areas because of lack of radio control, and the working area is
hence limited remarkably. For solving the above problems, it is necessary to realize autonomous
control of unmanned helicopters. However, no general method for designing the small unmanned
helicopters has been developed yet – today, various design techniques by different study groups using
different helicopters exist. In this paper the conceptual design process is explained.
This document describes the fundamentals and classification of efflorescence in construction bricks. It explains that efflorescence consists of deposits of soluble salts that appear on the surface of bricks, reducing their aesthetic quality. The types and causes of efflorescence are diverse and influenced by multiple factors such as salt migration, the brick's water absorption capacity, and environmental conditions. Finally, it details the most common soluble salts that cause efflorescence.
Patent database: a methodology of information retrieval from PDF – ijdms
Patent documents hold a wealth of information in themselves. A brief detail of each Indian patent application is published as an eighteen-month publication by the Indian Patent Office, in a weekly electronic gazette. To date, a proper database of Indian patents specifically for research purposes has not been available, making it complicated for researchers to use this data for measuring any kind of patent-related research activity in India. To facilitate this, we constructed a comprehensive patent database which incorporates the information presented in the electronic gazette. This database includes information such as technology class, applicant, inventor, country of origin, etc., of the patents submitted. We present the methodology for the creation of this database and its basic features, along with its accuracy and reliability, in this research paper. The patent database has been developed and can be used for various innovation research activities.
Big Data Processing with Hadoop: A Review – IRJET Journal
1. This document provides an overview of big data processing with Hadoop. It defines big data and describes the challenges of volume, velocity, variety and variability.
2. Traditional data processing approaches are inadequate for big data due to its scale. Hadoop provides a distributed file system called HDFS and a MapReduce framework to address this.
3. HDFS uses a master-slave architecture with a NameNode and DataNodes to store and retrieve file blocks. MapReduce allows distributed processing of large datasets across clusters through mapping and reducing functions.
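The map-shuffle-reduce flow described above can be imitated in-process with plain Python (a toy illustration only; a real job distributes this work over HDFS blocks across many DataNodes). The example computes the classic maximum-value-per-key aggregation.

```python
from collections import defaultdict

# Hypothetical input records: (year, temperature) pairs.
records = [("2023", 31), ("2024", 28), ("2023", 35), ("2024", 30)]

def map_phase(record):
    year, temp = record
    return (year, temp)                       # emit a key-value pair

def shuffle(pairs):
    groups = defaultdict(list)                # group values by key,
    for key, value in pairs:                  # as the framework does
        groups[key].append(value)             # between map and reduce
    return groups

def reduce_phase(groups):
    return {key: max(values) for key, values in groups.items()}

print(reduce_phase(shuffle(map_phase(r) for r in records)))
# -> {'2023': 35, '2024': 30}
```

In Hadoop proper, each mapper processes one input split, the framework sorts and partitions the emitted pairs, and reducers receive all values for a given key.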
Web Oriented FIM for large scale dataset using Hadoop – dbpublications
In large-scale datasets, mining frequent itemsets with existing parallel mining algorithms requires balancing the load by distributing the enormous data across a collection of computers, but these existing algorithms suffer from performance problems [1]. To handle this, we introduce a data-partitioning approach based on the MapReduce programming model. Our proposed system introduces a frequent-itemset ultrametric tree in place of conventional FP-trees. Experimental results show that eliminating redundant transactions improves performance by reducing computing loads.
Performance Enhancement using Appropriate File Formats in Big Data Hadoop Eco... – IRJET Journal
This document compares the performance of different file formats (Avro, Parquet, ORC, textfile) for storing large datasets in Hadoop. It conducts experiments loading datasets into tables using each format and measuring file size compression and query execution times. The results show that Parquet provides the most compact storage size, compressing data to about 1/3 the size of the original textfile. ORC and Parquet have similar performance for query execution, with ORC having a slightly smaller file size. In general, the binary formats (Avro, Parquet, ORC) outperform textfile for both storage efficiency and query performance.
A Survey on Graph Database Management Techniques for Huge Unstructured Data – IJECEIAES
Over the last decade, data analysis, data management, and big data have played a major role from both social and business perspectives. Nowadays, the graph database is a hot and trending research topic: it is preferred for dealing with the dynamic and complex relationships in connected data and offers better results. Every data element is represented as a node; for example, on a social media site a person is a node with properties such as name, age, likes and dislikes, and nodes are connected by relationships via edges. Graph databases are expected to benefit businesses and social networking sites that generate huge amounts of unstructured data, since such Big Data requires proper and efficient computational techniques. This paper reviews existing graph data computational techniques and research work to map out future research directions in graph database management.
IRJET – Big Data: A Review Study with Comparative Analysis of Hadoop – IRJET Journal
This document provides an overview of Hadoop and compares it to Spark. It discusses the key components of Hadoop including HDFS for storage, MapReduce for processing, and YARN for resource management. HDFS stores large datasets across clusters in a fault-tolerant manner. MapReduce allows parallel processing of large datasets using a map and reduce model. YARN was later added to improve resource management. The document also summarizes Spark, which can perform both batch and stream processing more efficiently than Hadoop for many workloads. A comparison of Hadoop and Spark highlights their different processing models.
Performance evaluation of Map-reduce jar pig hive and spark with machine lear... – IJECEIAES
Big data poses major challenges: making decisions over it requires huge processing power and good algorithms, along with a Hadoop environment comprising Pig, Hive, machine learning and other Hadoop ecosystem components. The data comes from industries, from the many devices and sensors around us, and from social media sites; according to McKinsey, there will be a shortage of 15,000,000 big data professionals by the end of 2020. Many technologies address big data storage and processing, among them Apache Hadoop, Apache Spark and Apache Kafka. Here we analyse the processing speed for 4 GB of data on CloudxLab using Hadoop MapReduce with varying numbers of mappers and reducers, Pig scripts, Hive queries, and a Spark environment combined with machine learning. From the results we can say that machine learning with Hadoop enhances processing performance together with Spark; that Spark is better than Hadoop MapReduce, Pig and Hive; and that Spark with Hive and machine learning gives the best performance compared with Pig, Hive and Hadoop MapReduce jars.
This document discusses big data and Hadoop. It defines big data as large datasets that are difficult to process using traditional methods due to their volume, variety, and velocity. Hadoop is presented as an open-source software framework for distributed storage and processing of large datasets across clusters of commodity servers. The key components of Hadoop are the Hadoop Distributed File System (HDFS) for storage and MapReduce as a programming model for distributed processing. A number of other technologies in Hadoop's ecosystem are also described such as HBase, Avro, Pig, Hive, Sqoop, Zookeeper and Mahout. The document concludes that Hadoop provides solutions for efficiently processing and analyzing big data.
Improving Association Rule Mining by Defining a Novel Data Structure – IRJET Journal
This document proposes a new data structure to improve the performance of association rule mining algorithms on large datasets. It involves compressing the original data in three steps: shuffling records based on similarity, creating an inverted index to map attribute values to transactions, and applying run length encoding. This compressed data structure reduces data size, loading time, and memory requirements compared to the original data. Experimental results on two datasets show the proposed structure improves the speed of an association rule mining algorithm while maintaining data validity. The structure is designed to work with existing algorithms without changing them.
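Two of the compression steps described above — building an inverted index from attribute values to transactions, then run-length encoding — can be sketched as follows (our own minimal construction; the paper's structure also includes similarity-based record shuffling, omitted here).

```python
from itertools import groupby

# Hypothetical transactions: transaction id -> set of attribute values.
transactions = {0: {"a", "b"}, 1: {"a"}, 2: {"b", "c"}, 3: {"a", "b"}}

# Step 2: inverted index mapping each attribute value to its transactions.
inverted = {}
for tid in sorted(transactions):
    for item in transactions[tid]:
        inverted.setdefault(item, []).append(tid)

# Step 3: run-length encoding of a presence bitmap for one value.
def run_length_encode(seq):
    """Compress runs of repeated symbols into (symbol, count) pairs."""
    return [(sym, len(list(run))) for sym, run in groupby(seq)]

bitmap_a = [1 if "a" in transactions[t] else 0 for t in sorted(transactions)]
print(inverted["a"])                # -> [0, 1, 3]
print(run_length_encode(bitmap_a))  # -> [(1, 2), (0, 1), (1, 1)]
```

Shuffling similar records together (the omitted first step) lengthens the runs in such bitmaps, which is precisely what makes the run-length encoding effective.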
This document discusses using the R programming language and RHadoop libraries to perform association rule mining on big data stored in Hadoop. It first provides background on big data, association rule mining, and integrating R with Hadoop using RHadoop. It then describes setting up an 8-node Hadoop cluster using Ambari and installing RHadoop libraries to enable R scripts to run MapReduce jobs on the cluster. The goal is to use R and RHadoop to analyze a training dataset and discover interesting association rules.
IRJET – Data Mining: Secure Keyword Manager – IRJET Journal
This document presents a secure and efficient scheme for multi-keyword ranked search over encrypted cloud data that supports dynamic operations like document deletion and insertion. It constructs an index tree based on the vector space model to enable multi-keyword search while supporting flexible updates. Cosine similarity is used to accurately rank search results. The scheme also proposes a "Greedy Depth first Traverse Strategy" search algorithm and a secure scheme to protect search privacy in the known ciphertext threat model.
IRJET – Top-K Query Processing using Top Order Preserving Encryption (TOPE) – IRJET Journal
This document proposes a method called Top Order Preserving Encryption (TOPE) to support top-k queries on encrypted data outsourced to public clouds. TOPE extends existing Order Preserving Encryption (OPE) schemes to preserve order when encrypting data, allowing comparison operations and queries to be run directly on the encrypted data. The proposed method uses TOPE to encrypt data before outsourcing to the cloud, and stores the encrypted data in min-heap and max-heap trees indexed by attributes. This allows the cloud to efficiently retrieve the top-k most relevant tuples for a given query while preserving data privacy during query processing.
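The property the scheme relies on — that any order-preserving encryption lets the server compare and rank ciphertexts directly — can be illustrated with a deliberately toy stand-in: a strictly increasing function. A real OPE scheme is keyed and far more involved; this sketch only demonstrates why top-k queries work on ciphertexts.

```python
import heapq

def toy_ope_encrypt(x, key=7):
    """Toy stand-in for OPE: strictly increasing, hence order-preserving.
    NOT secure; for illustration of the ranking property only."""
    return x * key + 3

plaintexts = [42, 7, 99, 13, 58]
ciphertexts = [toy_ope_encrypt(x) for x in plaintexts]

# The "cloud" ranks ciphertexts directly, never seeing plaintexts.
top2_cipher = heapq.nlargest(2, ciphertexts)
print(top2_cipher)                    # -> [696, 409]
print(sorted(plaintexts)[-2:][::-1])  # -> [99, 58], the same ranking
```

Python's heapq is a binary min-heap; the paper's min-heap and max-heap trees serve the analogous role of retrieving the smallest or largest encrypted attribute values efficiently.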
Performance Improvement of Heterogeneous Hadoop Cluster using Ranking Algorithm – IRJET Journal
This document proposes using a ranking algorithm and sampling algorithm to improve the performance of a heterogeneous Hadoop cluster. The ranking algorithm prioritizes data distribution based on node frequency, so that higher frequency nodes are processed first. The sampling algorithm randomly selects nodes for data distribution instead of evenly distributing across all nodes. The proposed approach reduces computation time and improves overall cluster performance compared to the existing approach of evenly distributing data across nodes of varying sizes. Results show the proposed approach reduces execution time for various file sizes compared to the existing approach.
IRJET – Deduplication Detection for Similarity in Document Analysis Via Vector... – IRJET Journal
The document discusses techniques for detecting similarity and deduplication in document analysis using vector analysis. It proposes analyzing documents by extracting abstract content, separating words and combining them in a word cloud to determine frequency. This approach aims to identify whether documents are duplicates by analyzing word vectors at the word, sentence and paragraph level while also applying techniques like stemming, stopping words and semantic similarity.
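Comparing documents by their word-frequency vectors, as described above, typically reduces to cosine similarity between term-count vectors. The sketch below is our own minimal construction (no stemming or stop-word handling), not the paper's full pipeline.

```python
import math
from collections import Counter

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

doc1 = Counter("the cat sat on the mat".split())
doc2 = Counter("the cat sat on the mat".split())
doc3 = Counter("stock prices fell sharply today".split())

print(round(cosine_similarity(doc1, doc2), 6))  # -> 1.0 (duplicate)
print(cosine_similarity(doc1, doc3))            # -> 0.0 (no shared words)
```

A similarity near 1.0 flags a likely duplicate, while applying the same measure at sentence and paragraph granularity catches partial overlap.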
Mankind has stored more than 295 billion gigabytes (or 295 Exabyte) of data since 1986, as per a report by the University of Southern California. Storing and monitoring this data in widely distributed environments for 24/7 is a huge task for global service organizations. These datasets require high processing power which can’t be offered by traditional databases as they are stored in an unstructured format. Although one can use Map Reduce paradigm to solve this problem using java based Hadoop, it cannot provide us with maximum functionality. Drawbacks can be overcome using Hadoop-streaming techniques that allow users to define non-java executable for processing this datasets. This paper proposes a THESAURUS model which allows a faster and easier version of business analysis.
IRJET- Determining Document Relevance using Keyword ExtractionIRJET Journal
This document describes a system that aims to search for and retrieve relevant documents from a large collection based on a user's query. It does this through three main components: keyword extraction, document searching, and a question answering bot. Keyword extraction is done using the TF-IDF algorithm to identify important words in documents. These keywords are stored in a database along with their TF-IDF weights. When a user submits a query, the system searches for documents containing keywords from the query and returns relevant results. It also includes a feedback mechanism for users to improve search accuracy over time. The goal is to deliver accurate search results quickly from large document collections.
Mining High Utility Patterns in Large Databases using Mapreduce FrameworkIRJET Journal
This document discusses mining high utility patterns from large databases using the MapReduce framework. It proposes using the d2HUP algorithm to efficiently mine high utility patterns from partitioned big data in parallel. The algorithm traverses a reverse set enumeration tree using depth-first search to identify high utility patterns based on a minimum utility threshold, without generating candidates. It partitions the data using MapReduce and mines patterns from each partition individually. The results are then combined to obtain the final high utility patterns. The proposed approach aims to improve efficiency over existing methods that are not scalable to large datasets.
A Study on Translucent Concrete Product and Its Properties by Using Optical F...IJMER
- Translucent concrete is a concrete based material with light-transferring properties,
obtained due to embedded light optical elements like Optical fibers used in concrete. Light is conducted
through the concrete from one end to the other. This results into a certain light pattern on the other
surface, depending on the fiber structure. Optical fibers transmit light so effectively that there is
virtually no loss of light conducted through the fibers. This paper deals with the modeling of such
translucent or transparent concrete blocks and panel and their usage and also the advantages it brings
in the field. The main purpose is to use sunlight as a light source to reduce the power consumption of
illumination and to use the optical fiber to sense the stress of structures and also use this concrete as an
architectural purpose of the building
Developing Cost Effective Automation for Cotton Seed DelintingIJMER
A low cost automation system for removal of lint from cottonseed is to be designed and
developed. The setup consists of stainless steel drum with stirrer in which cottonseeds having lint is mixed
with concentrated sulphuric acid. So lint will get burn. This lint free cottonseed treated with lime water to
neutralize acidic nature. After water washing this cottonseeds are used for agriculter purpose
Study & Testing Of Bio-Composite Material Based On Munja FibreIJMER
The incorporation of natural fibres such as munja fiber composites has gained
increasing applications both in many areas of Engineering and Technology. The aim of this study is to
evaluate mechanical properties such as flexural and tensile properties of reinforced epoxy composites.
This is mainly due to their applicable benefits as they are light weight and offer low cost compared to
synthetic fibre composites. Munja fibres recently have been a substitute material in many weight-critical
applications in areas such as aerospace, automotive and other high demanding industrial sectors. In
this study, natural munja fibre composites and munja/fibreglass hybrid composites were fabricated by a
combination of hand lay-up and cold-press methods. A new variety in munja fibre is the present work
the main aim of the work is to extract the neat fibre and is characterized for its flexural characteristics.
The composites are fabricated by reinforcing untreated and treated fibre and are tested for their
mechanical, properties strictly as per ASTM procedures.
Hybrid Engine (Stirling Engine + IC Engine + Electric Motor)IJMER
Hybrid engine is a combination of Stirling engine, IC engine and Electric motor. All these 3 are
connected together to a single shaft. The power source of the Stirling engine will be a Solar Panel. The aim of
this is to run the automobile using a Hybrid engine
Fabrication & Characterization of Bio Composite Materials Based On Sunnhemp F...IJMER
This document summarizes research on the fabrication and characterization of bio-composite materials using sunnhemp fibre. The document discusses how sunnhemp fibre was used to reinforce an epoxy matrix through hand lay-up methods. Various mechanical properties of the bio-composites were tested, including tensile, flexural, and impact properties. The results of the mechanical tests on the bio-composite specimens are presented. Potential applications of the sunnhemp fibre bio-composites are also suggested, such as in fall ceilings, partitions, packaging, automotive interiors, and toys.
Geochemistry and Genesis of Kammatturu Iron Ores of Devagiri Formation, Sandu...IJMER
The Greenstone belts of Karnataka are enriched in BIFs in Dharwar craton, where Iron
formations are confined to the basin shelf, clearly separated from the deeper-water iron formation that
accumulated at the basin margin and flanking the marine basin. Geochemical data procured in terms of
major, trace and REE are plotted in various diagrams to interpret the genesis of BIFs. Al2O3, Fe2O3 (T),
TiO2, CaO, and SiO2 abundances and ratios show a wide variation. Ni, Co, Zr, Sc, V, Rb, Sr, U, Th,
ΣREE, La, Ce and Eu anomalies and their binary relationships indicate that wherever the terrigenous
component has increased, the concentration of elements of felsic such as Zr and Hf has gone up. Elevated
concentrations of Ni, Co and Sc are contributed by chlorite and other components characteristic of basic
volcanic debris. The data suggest that these formations were generated by chemical and clastic
sedimentary processes on a shallow shelf. During transgression, chemical precipitation took place at the
sediment-water interface, whereas at the time of regression. Iron ore formed with sedimentary structures
and textures in Kammatturu area, in a setting where the water column was oxygenated.
Experimental Investigation on Characteristic Study of the Carbon Steel C45 in...IJMER
In this paper, the mechanical characteristics of C45 medium carbon steel are investigated
under various working conditions. The main characteristic to be studied on this paper is impact toughness
of the material with different configurations and the experiment were carried out on charpy impact testing
equipment. This study reveals the ability of the material to absorb energy up to failure for various
specimen configurations under different heat treated conditions and the corresponding results were
compared with the analysis outcome
Non linear analysis of Robot Gun Support Structure using Equivalent Dynamic A...IJMER
Robot guns are being increasingly employed in automotive manufacturing to replace
risky jobs and also to increase productivity. Using a single robot for a single operation proves to be
expensive. Hence for cost optimization, multiple guns are mounted on a single robot and multiple
operations are performed. Robot Gun structure is an efficient way in which multiple welds can be done
simultaneously. However mounting several weld guns on a single structure induces a variety of
dynamic loads, especially during movement of the robot arm as it maneuvers to reach the weld
locations. The primary idea employed in this paper, is to model those dynamic loads as equivalent G
force loads in FEA. This approach will be on the conservative side, and will be saving time and
subsequently cost efficient. The approach of the paper is towards creating a standard operating
procedure when it comes to analysis of such structures, with emphasis on deploying various technical
aspects of FEA such as Non Linear Geometry, Multipoint Constraint Contact Algorithm, Multizone
meshing .
Static Analysis of Go-Kart Chassis by Analytical and Solid Works SimulationIJMER
This paper aims to do modelling, simulation and performing the static analysis of a go
kart chassis consisting of Circular beams. Modelling, simulations and analysis are performed using 3-D
modelling software i.e. Solid Works and ANSYS according to the rulebook provided by Indian Society of
New Era Engineers (ISNEE) for National Go Kart Championship (NGKC-14).The maximum deflection is
determined by performing static analysis. Computed results are then compared to analytical calculation,
where it is found that the location of maximum deflection agrees well with theoretical approximation but
varies on magnitude aspect.
In récent year various vehicle introduced in market but due to limitation in
carbon émission and BS Séries limitd speed availability vehicle in the market and causing of
environnent pollution over few year There is need to decrease dependancy on fuel vehicle.
bicycle is to be modified for optional in the future To implement new technique using change in
pedal assembly and variable speed gearbox such as planetary gear optimise speed of vehicle
with variable speed ratio.To increase the efficiency of bicycle for confortable drive and to
reduce torque appli éd on bicycle. we introduced epicyclic gear box in which transmission done
throgh Chain Drive (i.e. Sprocket )to rear wheel with help of Epicyclical gear Box to give
number of différent Speed during driving.To reduce torque requirent in the cycle with change in
the pedal mechanism
Integration of Struts & Spring & Hibernate for Enterprise ApplicationsIJMER
This document discusses integrating the Spring, Struts, and Hibernate frameworks to develop enterprise applications. It provides an overview of each framework and their features. The Spring Framework is a lightweight, modular framework that allows for inversion of control and aspect-oriented programming. It can be used to develop any or all tiers of an application. The document proposes an architecture for an e-commerce website that integrates these three frameworks, with Spring handling the business layer, Struts the presentation layer, and Hibernate the data access layer. This modular approach allows for clear separation of concerns and reduces complexity in application development.
Microcontroller Based Automatic Sprinkler Irrigation SystemIJMER
Microcontroller based Automatic Sprinkler System is a new concept of using
intelligence power of embedded technology in the sprinkler irrigation work. Designed system replaces
the conventional manual work involved in sprinkler irrigation to automatic process. Using this system a
farmer is protected against adverse inhuman weather conditions, tedious work of changing over of
sprinkler water pipe lines & risk of accident due to high pressure in the water pipe line. Overall
sprinkler irrigation work is transformed in to a comfortableautomatic work. This system provides
flexibility & accuracy in respect of time set for the operation of a sprinkler water pipe lines. In present
work the author has designed and developed an automatic sprinkler irrigation system which is
controlled and monitored by a microcontroller interfaced with solenoid valves.
Intrusion Detection and Forensics based on decision tree and Association rule...IJMER
This paper present an approach based on the combination of, two techniques using
decision tree and Association rule mining for Probe attack detection. This approach proves to be
better than the traditional approach of generating rules for fuzzy expert system by clustering methods.
Association rule mining for selecting the best attributes together and decision tree for identifying the
best parameters together to create the rules for fuzzy expert system. After that rules for fuzzy expert
system are generated using association rule mining and decision trees. Decision trees is generated for
dataset and to find the basic parameters for creating the membership functions of fuzzy inference
system. Membership functions are generated for the probe attack. Based on these rules we have
created the fuzzy inference system that is used as an input to neuro-fuzzy system. Fuzzy inference
system is loaded to neuro-fuzzy toolbox as an input and the final ANFIS structure is generated for
outcome of neuro-fuzzy approach. The experiments and evaluations of the proposed method were
done with NSL-KDD intrusion detection dataset. As the experimental results, the proposed approach
based on the combination of, two techniques using decision tree and Association rule mining
efficiently detected probe attacks. Experimental results shows better results for detecting intrusions as
compared to others existing methods
Natural Language Ambiguity and its Effect on Machine LearningIJMER
This document discusses natural language ambiguity and its effect on machine learning. It begins by introducing different types of ambiguity that exist in natural languages, including lexical, syntactic, semantic, discourse, and pragmatic ambiguities. It then examines how these ambiguities present challenges for computational linguistics and machine translation systems. Specifically, it notes that ambiguity is a major problem for computers in processing human language as they lack the world knowledge and context that humans use to resolve ambiguities. The document concludes by outlining the typical process of machine translation and how ambiguities can interfere with tasks like analysis, transfer, and generation of text in the target language.
Today in era of software industry there is no perfect software framework available for
analysis and software development. Currently there are enormous number of software development
process exists which can be implemented to stabilize the process of developing a software system. But no
perfect system is recognized till yet which can help software developers for opting of best software
development process. This paper present the framework of skillful system combined with Likert scale. With
the help of Likert scale we define a rule based model and delegate some mass score to every process and
develop one tool name as MuxSet which will help the software developers to select an appropriate
development process that may enhance the probability of system success.
Material Parameter and Effect of Thermal Load on Functionally Graded CylindersIJMER
The present study investigates the creep in a thick-walled composite cylinders made
up of aluminum/aluminum alloy matrix and reinforced with silicon carbide particles. The distribution
of SiCp is assumed to be either uniform or decreasing linearly from the inner to the outer radius of
the cylinder. The creep behavior of the cylinder has been described by threshold stress based creep
law with a stress exponent of 5. The composite cylinders are subjected to internal pressure which is
applied gradually and steady state condition of stress is assumed. The creep parameters required to
be used in creep law, are extracted by conducting regression analysis on the available experimental
results. The mathematical models have been developed to describe steady state creep in the composite
cylinder by using von-Mises criterion. Regression analysis is used to obtain the creep parameters
required in the study. The basic equilibrium equation of the cylinder and other constitutive equations
have been solved to obtain creep stresses in the cylinder. The effect of varying particle size, particle
content and temperature on the stresses in the composite cylinder has been analyzed. The study
revealed that the stress distributions in the cylinder do not vary significantly for various combinations
of particle size, particle content and operating temperature except for slight variation observed for
varying particle content. Functionally Graded Materials (FGMs) emerged and led to the development
of superior heat resistant materials.
Energy Audit is the systematic process for finding out the energy conservation
opportunities in industrial processes. The project carried out studies on various energy conservation
measures application in areas like lighting, motors, compressors, transformer, ventilation system etc.
In this investigation, studied the technical aspects of the various measures along with its cost benefit
analysis.
Investigation found that major areas of energy conservation are-
1. Energy efficient lighting schemes.
2. Use of electronic ballast instead of copper ballast.
3. Use of wind ventilators for ventilation.
4. Use of VFD for compressor.
5. Transparent roofing sheets to reduce energy consumption.
So Energy Audit is the only perfect & analyzed way of meeting the Industrial Energy Conservation.
An Implementation of I2C Slave Interface using Verilog HDLIJMER
This document describes the implementation of an I2C slave interface using Verilog HDL. It introduces the I2C protocol which uses only two bidirectional lines (SDA and SCL) for communication. The document discusses the I2C protocol specifications including start/stop conditions, addressing, read/write operations, and acknowledgements. It then provides details on designing an I2C slave module in Verilog that responds to commands from an I2C master and allows synchronization through clock stretching. The module is simulated in ModelSim and synthesized in Xilinx. Simulation waveforms demonstrate successful read and write operations to the slave device.
Discrete Model of Two Predators competing for One PreyIJMER
This paper investigates the dynamical behavior of a discrete model of one prey two
predator systems. The equilibrium points and their stability are analyzed. Time series plots are obtained
for different sets of parameter values. Also bifurcation diagrams are plotted to show dynamical behavior
of the system in selected range of growth parameter
Application of Parabolic Trough Collectorfor Reduction of Pressure Drop in Oi...IJMER
Pipelines are the least expensive and most effective method for the oil transportation.
Due to high viscosity of crude oil, the pressure drop and pumping power requirements are very high.
So it is necessary to bring down the viscosity of crude oil. Heated pipelines are used reduce the oil
viscosity by increasing the oil temperature. Electrical heating and direct flame heating are the common
method used for heating the oil pipeline. In this work, a new application of Parabolic Trough Collector
in the field of oil pipeline transport is introduced for reducing pressure drop in oil pipelines. Oil
pipeline is heated by applying concentrated solar radiation on the pipe surface using a Parabolic
Trough Collector in which the oil pipeline acts as the absorber pipe. 3-D steady state analysis is
carried out on a heated oil pipeline using commercial CFD software package ANSYS Fluent 14.5. In
this work an effort is made to investigate the effect of concentrated solar radiation for reducing
pressure drop in the oil pipeline. The results from the numerical analysis shows that the pressure drop
in oil pipeline is get reduced by heating the pipe line using concentrated solar radiation. From this
work, the application of PTC in oil pipeline transportation is justified.
artificial intelligence and data science contents.pptxGauravCar
What is artificial intelligence? Artificial intelligence is the ability of a computer or computer-controlled robot to perform tasks that are commonly associated with the intellectual processes characteristic of humans, such as the ability to reason.
› ...
Artificial intelligence (AI) | Definitio
Batteries -Introduction – Types of Batteries – discharging and charging of battery - characteristics of battery –battery rating- various tests on battery- – Primary battery: silver button cell- Secondary battery :Ni-Cd battery-modern battery: lithium ion battery-maintenance of batteries-choices of batteries for electric vehicle applications.
Fuel Cells: Introduction- importance and classification of fuel cells - description, principle, components, applications of fuel cells: H2-O2 fuel cell, alkaline fuel cell, molten carbonate fuel cell and direct methanol fuel cells.
Introduction- e - waste – definition - sources of e-waste– hazardous substances in e-waste - effects of e-waste on environment and human health- need for e-waste management– e-waste handling rules - waste minimization techniques for managing e-waste – recycling of e-waste - disposal treatment methods of e- waste – mechanism of extraction of precious metal from leaching solution-global Scenario of E-waste – E-waste in India- case studies.
Discover the latest insights on Data Driven Maintenance with our comprehensive webinar presentation. Learn about traditional maintenance challenges, the right approach to utilizing data, and the benefits of adopting a Data Driven Maintenance strategy. Explore real-world examples, industry best practices, and innovative solutions like FMECA and the D3M model. This presentation, led by expert Jules Oudmans, is essential for asset owners looking to optimize their maintenance processes and leverage digital technologies for improved efficiency and performance. Download now to stay ahead in the evolving maintenance landscape.
Embedded machine learning-based road conditions and driving behavior monitoringIJECEIAES
Car accident rates have increased in recent years, resulting in losses in human lives, properties, and other financial costs. An embedded machine learning-based system is developed to address this critical issue. The system can monitor road conditions, detect driving patterns, and identify aggressive driving behaviors. The system is based on neural networks trained on a comprehensive dataset of driving events, driving styles, and road conditions. The system effectively detects potential risks and helps mitigate the frequency and impact of accidents. The primary goal is to ensure the safety of drivers and vehicles. Collecting data involved gathering information on three key road events: normal street and normal drive, speed bumps, circular yellow speed bumps, and three aggressive driving actions: sudden start, sudden stop, and sudden entry. The gathered data is processed and analyzed using a machine learning system designed for limited power and memory devices. The developed system resulted in 91.9% accuracy, 93.6% precision, and 92% recall. The achieved inference time on an Arduino Nano 33 BLE Sense with a 32-bit CPU running at 64 MHz is 34 ms and requires 2.6 kB peak RAM and 139.9 kB program flash memory, making it suitable for resource-constrained embedded systems.
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...IJECEIAES
Climate change's impact on the planet forced the United Nations and governments to promote green energies and electric transportation. The deployments of photovoltaic (PV) and electric vehicle (EV) systems gained stronger momentum due to their numerous advantages over fossil fuel types. The advantages go beyond sustainability to reach financial support and stability. The work in this paper introduces the hybrid system between PV and EV to support industrial and commercial plants. This paper covers the theoretical framework of the proposed hybrid system including the required equation to complete the cost analysis when PV and EV are present. In addition, the proposed design diagram which sets the priorities and requirements of the system is presented. The proposed approach allows setup to advance their power stability, especially during power outages. The presented information supports researchers and plant owners to complete the necessary analysis while promoting the deployment of clean energy. The result of a case study that represents a dairy milk farmer supports the theoretical works and highlights its advanced benefits to existing plants. The short return on investment of the proposed approach supports the paper's novelty approach for the sustainable electrical system. In addition, the proposed system allows for an isolated power setup without the need for a transmission line which enhances the safety of the electrical network
International Conference on NLP, Artificial Intelligence, Machine Learning an...gerogepatton
International Conference on NLP, Artificial Intelligence, Machine Learning and Applications (NLAIM 2024) offers a premier global platform for exchanging insights and findings in the theory, methodology, and applications of NLP, Artificial Intelligence, Machine Learning, and their applications. The conference seeks substantial contributions across all key domains of NLP, Artificial Intelligence, Machine Learning, and their practical applications, aiming to foster both theoretical advancements and real-world implementations. With a focus on facilitating collaboration between researchers and practitioners from academia and industry, the conference serves as a nexus for sharing the latest developments in the field.
International OPEN ACCESS Journal Of Modern Engineering Research (IJMER)
| IJMER | ISSN: 2249–6645 | www.ijmer.com | Vol. 4 | Iss. 2 | Feb. 2014 | 128 |
Range Query on Big Data Based on Map Reduce
Zhang Tingting 1, Qu Haipeng 2
1 (College of Information Science and Engineering, Ocean University of China, China)
2 (Corresponding Author, College of Information Science and Engineering, Ocean University of China, China)
I. Introduction
Today, the amount of data in various fields is increasing, and traditional data processing is shifting toward big data. Big data, a term first put forward by Viktor Mayer-Schönberger [0], is a data set whose content cannot be crawled, managed and processed by conventional software tools within an acceptable time. It has four characteristics: Volume, Variety, Velocity and Value [0]. Google processes data with large-scale clusters and MapReduce software, and the amount of data processed every month exceeds 400 PB [0]. The cloud computing platform of Yahoo! has 34 clusters and more than 30,000 computers, with a total storage capacity of over 100 PB [0]. Traditional query algorithms are unable to meet the needs of enterprises, so how to quickly find qualified data in such a flood of information has become a major problem for them.
Traditional range query is mainly based on establishing ordered indexes. Reference [0] presented a range query algorithm for P2P systems. It first creates a B+ tree index for each numerical attribute of a file, with the index information of the file stored only in the leaf nodes. When a range query is made, the original range is divided into sub-ranges according to the data stored in the root node of the B+ tree. Each sub-query proceeds along the corresponding child node until a leaf node is reached, and the queried results are then returned. However, the performance of this algorithm when extended to a big data environment leaves room for improvement. Reference [0] proposed a global index structure based on MapReduce to optimize range query. The indexed attributes are sorted first; the original index table is then divided sequentially into a number of sub-tables, which are assigned to different nodes of the cluster. Each sub-table of the global index stores a location list of files. When a range query is made, the range condition is split across the nodes and each sub-condition is executed in parallel. The performance of this algorithm, however, is limited by the speed of the global index.
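The splitting of a range condition across sorted sub-tables can be sketched as follows. This is an illustrative stand-in, not the paper's implementation: `boundaries` and `split_range` are hypothetical names, and each sub-table is represented only by the smallest key it stores.

```python
# Hypothetical sketch: a global index sorted on the queried attribute is cut
# into sub-tables, one per node; a range condition [lo, hi] is split so that
# only the nodes whose sub-tables intersect the range are contacted.
import bisect

def split_range(boundaries, lo, hi):
    """Map a query range [lo, hi] to the indices of the sub-tables it touches.

    boundaries[i] is the smallest key stored in sub-table i; the list is
    sorted, mirroring the sorted global index described above.
    """
    first = bisect.bisect_right(boundaries, lo) - 1
    last = bisect.bisect_right(boundaries, hi) - 1
    return list(range(max(first, 0), last + 1))

# Four sub-tables starting at keys 0, 100, 200, 300:
print(split_range([0, 100, 200, 300], 150, 250))  # [1, 2]
```

Only sub-tables 1 and 2 overlap the range [150, 250], so only those two nodes would execute a sub-condition in parallel.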
The main difficulty of querying data in a big data environment lies in extending traditional query algorithms to massive data without sacrificing query efficiency. Establishing a global index makes the index itself the bottleneck of the algorithm in a large data environment. Therefore, we need to move away from the limitations of traditional centralized index structures.
MapReduce is precisely a model for large-scale data sets. It consists of two functions, Map and Reduce. Users only need to define these two functions, without considering how the underlying system achieves distribution. The model therefore lets programmers run programs on a distributed system without being familiar with distributed programming, and it is frequently used for processing in large-scale distributed systems [0].
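The division of labor described above can be illustrated with a minimal local sketch of the programming model: the user writes only `map_fn` and `reduce_fn` (the classic word-count pair is used here purely as an example), while the `run` helper plays the role of the framework's shuffle-and-sort and distribution machinery. All names are illustrative, not from the paper.

```python
# Minimal local sketch of the MapReduce programming model: the user supplies
# only the Map and Reduce functions; the framework handles distribution.
from itertools import groupby

def map_fn(record):
    # Emit (key, 1) for each word -- the classic word-count mapper.
    for word in record.split():
        yield word, 1

def reduce_fn(key, values):
    # Combine all values emitted for one key.
    return key, sum(values)

def run(records):
    # Stand-in for the framework's shuffle/sort phase, executed locally.
    pairs = sorted(kv for r in records for kv in map_fn(r))
    return dict(reduce_fn(k, [v for _, v in g])
                for k, g in groupby(pairs, key=lambda kv: kv[0]))

print(run(["big data", "big query"]))  # {'big': 2, 'data': 1, 'query': 1}
```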
This paper presents a range query algorithm that processes big data based on MapReduce and P-Ring [0]. The files on each node are organized by P-Ring, and the nodes of the distributed system are then organized by MapReduce. A range query is divided into two steps: searching for files and querying data.
The rest of this paper is organized as follows: Section II surveys related work; Section III introduces the dual-index; Section IV presents the basic idea of the algorithm; Section V concludes the paper.
Abstract: Range query in P2P systems is mainly carried out by establishing an index, such as a B+ tree or DST. However, when the number of nodes in the system and the amount of data on a single node increase significantly, such traditional indexes become extremely large, which degrades query efficiency. At present, enterprises require effective data analysis when making important decisions, for example deriving users' consumption habits from analyzing user data. This paper aims at optimizing range query on big data. The algorithm introduces MapReduce into a P2P system and organizes the files on different nodes with a P-Ring. When a range query is made, we use the P-Ring to find the corresponding files and then search the data inside each file with a B+ tree.
Keywords: Range query, Big Data, Map Reduce, P-Ring, B+ tree
II. Related Work
Range query is achieved mainly by building ordered indexes, but such an index becomes very large under big data, so the index structure needs to be simplified. At present, the main index structure for range queries is the B+ tree: it sorts files according to one attribute, which makes it convenient for range query.
To simplify the index structure, this paper combines P-Ring and B+ trees into a dual-index structure. This structure avoids the index becoming too large, as happens when a B+ tree is established directly over all documents. A P-Ring is built for the files on the different nodes of the distributed system; the complexity of an exact search with P-Ring is O(log N) [0], where N is the number of nodes in the P-Ring [0]. A B+ tree is then created per file on the attribute appearing in the query condition and is used to query data inside that file. This method effectively reduces the height of the B+ tree and takes full advantage of the characteristics of both P-Ring and B+ tree.
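The per-file B+ tree is not reproduced here; as a stand-in with the same O(log n) lookup cost, a sorted list of attribute values searched with binary search shows what "query data inside a file" amounts to. This is a sketch under that substitution, not the paper's structure.

```python
# Stand-in for the per-file B+ tree: binary search over the file's sorted
# attribute values gives the same logarithmic range lookup inside one file.
import bisect

def range_query(sorted_values, lo, hi):
    """Return all values v with lo <= v <= hi from an already-sorted list."""
    left = bisect.bisect_left(sorted_values, lo)
    right = bisect.bisect_right(sorted_values, hi)
    return sorted_values[left:right]

data = [1, 3, 5, 8, 10, 14, 18]   # attribute values inside one file
print(range_query(data, 5, 14))   # [5, 8, 10, 14]
```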
At present, P-Ring is mainly used to search data in large-scale distributed P2P systems, and existing index structures for range query in P2P systems have certain defects. P-trees can only handle a single data item per peer and are hence not well suited for large data sets. The index proposed by Gupta et al. [0] only provides approximate answers to range queries and can miss results. Mercury [4] and P-Grid [0][0] provide probabilistic (as opposed to absolute) guarantees on search and load balancing, even when the P2P system is fully consistent. Reference [0] pointed out that the main advantage of P-Ring over existing range-query index structures is its high accuracy, and that it can be extended to environments with a large number of nodes and a large amount of data.
III. Dual-Index
The index structure proposed in this paper first indexes the files with the improved P-Ring, and then
indexes the specific data with B+ trees.
A. Defect of the current P-Ring
The traditional P-Ring mainly indexes individual data items. In a big data environment, however, data is
distributed across nodes in units of files, so to execute a range query the files containing the corresponding
data must be found first. In this circumstance the traditional P-Ring cannot meet our needs.
B. Improved P-Ring Index
P-Ring is a sort-based index: the data on the ring is strictly ordered. Each node stores one value, and the
segment between two nodes covers the values ranging from the previous node to the next one. Every node in a
P-Ring saves hierarchical information about its successor nodes, where the level of an entry is determined by
the value difference between the successor node and the current node. The improved P-Ring instead indexes by
files, by creating index entries for the ranges of attribute values in each file; the value segment between two
nodes can then store the index information of one or more files.
The index structure proposed in this article organizes the files on different nodes with the improved P-
Ring. Every node stores one value derived from the attribute ranges of the files. The nodes are arranged
sequentially, and each one stores a table of index information named infoTable. The infoTable records the range
segments together with the file names and file addresses of the files corresponding to those segments.
The following are examples of file data:
Table 1 File information of node 1
| File name | Range of attribute value | File address | Node of P-Ring |
| F1 | 10-15 | A1 | P0 |
| F2 | 5-10 | A2 | P1 |
| F3 | 6-8 | A3 | P1 |
| F4 | 10-18 | A4 | P0 |
| F5 | 1-8 | A5 | P1,P2 |
Table 2 File information of node 2
| File name | Range of attribute value | File address | Node of P-Ring |
| F6 | 2-10 | A6 | P1,P2 |
| F7 | 13-15 | A7 | P0 |
| F8 | 4-6 | A8 | P1,P2 |
| F9 | 2-10 | A9 | P1,P2 |
| F10 | 16-20 | A10 | P3 |
The process of establishing the P-Ring index on the above files is as follows.
1) Store F1. First define two nodes P0 and P1 on the ring, storing 10 and 15 respectively. Then save the
address and attribute range of F1 in the infoTable of P0. Denoting ranges by R, R0 = (10, 15);
2) Store F2. The attribute range of F2 is 5~10, which is not contained in R0, so create a new node storing
the value 5. Because 5 is smaller than the 10 stored in P0, the original P0 and P1 are renumbered by adding
1, becoming P1 and P2, and the newly inserted node becomes P0. Save the address and attribute range
R2 = (5, 10) of F2 in the infoTable of the new P0;
3) Store F3. The attribute range of F3 is 6~8, so its name and address are stored in P0, and the range of P0,
namely R2, is divided into r21 = (5, 6), r22 = (6, 8) and r23 = (8, 10). The files corresponding to each
sub-range are then stored in its infoTable; in this example the files corresponding to r21, r22 and r23 are
F2; F2 and F3; and F2, respectively;
4) Store F4. The procedure is similar to 3). When storing the file address we also need to save the
machine number or address of the file;
5) Store the remaining files in the same way.
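As an illustration, the file-level index can be sketched in Python. This is a simplified sketch under stated assumptions, not the exact procedure above: instead of renumbering nodes as in steps 1)-5), every distinct attribute boundary is treated as a breakpoint, and each elementary interval is recorded in the infoTable of the node at its left endpoint.

```python
from collections import defaultdict

def build_ring(files):
    """Build a simplified file-level ring index.

    files: list of (name, lo, hi) tuples, one per file.
    Returns (bounds, info), where bounds is the sorted list of boundary
    values and info[node_value][(a, b)] lists the files whose attribute
    range covers the elementary interval (a, b).
    """
    # every file endpoint becomes a sub-range boundary
    bounds = sorted({v for _, lo, hi in files for v in (lo, hi)})
    info = defaultdict(dict)
    for a, b in zip(bounds, bounds[1:]):
        # files whose attribute range fully covers (a, b)
        covering = [n for n, lo, hi in files if lo <= a and b <= hi]
        if covering:
            info[a][(a, b)] = covering
    return bounds, dict(info)

# Files from Table 1 and Table 2
files = [("F1", 10, 15), ("F2", 5, 10), ("F3", 6, 8), ("F4", 10, 18),
         ("F5", 1, 8), ("F6", 2, 10), ("F7", 13, 15), ("F8", 4, 6),
         ("F9", 2, 10), ("F10", 16, 20)]
bounds, info = build_ring(files)
print(info[6][(6, 8)])   # → ['F2', 'F3', 'F5', 'F6', 'F9']
```

Each distinct left endpoint plays the role of a ring node here, so the sketch produces more nodes than the stepwise construction, but the sub-range-to-files mapping it records is the same kind of information an infoTable holds.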
The P-Ring created from the files in Table 1 and Table 2 in accordance with the above steps is shown below.
Figure 1 P-Ring index based on files
In Fig. 1, the first line of each infoTable stores the file names and file addresses, the second line stores
the sub-ranges, and the third line stores the file names corresponding to each sub-range. A node of this ring can
serve as a Map node of MapReduce.
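The three-line infoTable layout just described can be modelled by a small structure; the class and field names below are hypothetical, chosen purely for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class FileEntry:
    name: str       # file name, e.g. "F2"   (line 1 of the infoTable)
    address: str    # file address, e.g. "A2" (line 1 of the infoTable)

@dataclass
class InfoTable:
    node_value: int                            # value stored at the ring node
    files: dict = field(default_factory=dict)  # sub-range (lo, hi) -> entries
                                               # (lines 2 and 3 of the infoTable)

# Node P0 after F2 and F3 are stored (step 3 of the construction above):
p0 = InfoTable(node_value=5)
p0.files = {
    (5, 6): [FileEntry("F2", "A2")],
    (6, 8): [FileEntry("F2", "A2"), FileEntry("F3", "A3")],
    (8, 10): [FileEntry("F2", "A2")],
}
print([e.name for e in p0.files[(6, 8)]])   # → ['F2', 'F3']
```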
The central issue of P2P is routing, so after the index structure is completed we need to build a routing
table for every node. The traditional P-Ring structure uses a hierarchical routing algorithm in which the value
difference between the successor node at level n and the current node is 2^(n-1). This article draws lessons from
the hierarchical algorithm and improves on it. The specific algorithm is as follows:
The routing table of every node includes both its predecessor and successor nodes. The definition of level is
the same as in the traditional P-Ring. The number of levels in the traditional P-Ring is ⌈log₂ N⌉; in this article
predecessor nodes are indexed in the same way as successor nodes, so the number of levels is smaller than in the
traditional P-Ring, namely ⌈log₂(N/2)⌉, where N is the number of nodes.
The routing tables of the nodes in Fig. 1 built according to the above algorithm are shown in Fig. 2.
The query complexity of the current P-Ring is O(log₂ N). In the modified routing algorithm the predecessor and
successor entries each index half of the nodes in the ring, so traversing at most log₂(N/2) nodes is enough to
find the information we need. The query complexity of this algorithm is therefore O(log₂(N/2)).
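A minimal sketch of the improved routing table, assuming the nodes are held in a Python list in ring order (function and variable names are illustrative):

```python
import math

def routing_table(node_ids, me):
    """Build the two-direction hierarchical routing entries for one node.

    Level i points 2**(i-1) positions away. Because predecessors are
    indexed as well as successors, each direction only has to cover half
    of the ring, so ceil(log2(N/2)) levels suffice per direction
    (versus ceil(log2 N) successor-only levels in the traditional P-Ring).
    """
    n = len(node_ids)
    levels = max(1, math.ceil(math.log2(n / 2)))
    pos = node_ids.index(me)
    succ = [node_ids[(pos + 2 ** (i - 1)) % n] for i in range(1, levels + 1)]
    pred = [node_ids[(pos - 2 ** (i - 1)) % n] for i in range(1, levels + 1)]
    return succ, pred

ring = ["P0", "P1", "P2", "P3", "P4", "P5", "P6", "P7"]
print(routing_table(ring, "P0"))   # → (['P1', 'P2'], ['P7', 'P6'])
```

With N = 8 each direction needs only ⌈log₂ 4⌉ = 2 levels, whereas successor-only routing would need ⌈log₂ 8⌉ = 3.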
Figure 2 Routing table of nodes
When a range query arrives, we first determine whether the scope of the query contains sub-ranges. If
the range is split into several sub-ranges, we record the file names corresponding to each sub-range and then
merge them.
For example, suppose P4 receives a query whose attribute range is 1~8. First locate the minimum
value 1: it lies in P0, because the range values stored at every node of the P-Ring are no smaller than the value
of that node. Since 8 is smaller than 10, the value of P2, the maximum value 8 of the query range lies in P1.
The search then continues in P0 and P1. In this example only one round of node location is needed.
After the nodes are located, a concrete search based on values is performed inside them. In the
above example we first query P0. The range stored in P0 is 1~5, and the query range 1~8 covers all values in
P0, so all files stored in P0, namely F5, F6, F8 and F9, need to be queried in detail. We then query P1 for files
matching 5~8. Two sub-ranges in P1 meet the condition, (5, 6) and (6, 8), and the files corresponding to these
two sub-ranges are F2, F3, F5, F6, F8 and F9. Summarizing the files matching the query condition over the two
ranges (files repeated in both ranges do not need to be searched twice), the final result is that F2, F3, F5, F6,
F8 and F9 need to be searched in detail. According to the index information in the infoTable, the corresponding
files are then queried concretely on their attribute values, which is the task of the B+ tree index.
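The locate-and-merge procedure of the 1~8 example can be sketched as follows. The infoTable contents below are a hypothetical layout consistent with the example (node P0 covering 1~5 and node P1 covering 5~10), and duplicate files are merged so that no file's B+ tree is searched twice:

```python
def files_for_range(info_tables, q_lo, q_hi):
    """Collect the files whose sub-ranges overlap the query range.

    info_tables: node value -> {sub-range (a, b): [file names]}.
    Files appearing in several overlapping sub-ranges are recorded once.
    """
    hits = []
    for table in info_tables.values():
        for (a, b), names in table.items():
            if a < q_hi and q_lo < b:        # sub-range overlaps the query
                for name in names:
                    if name not in hits:     # merge duplicates
                        hits.append(name)
    return hits

# Hypothetical infoTables for the example: P0 holds 1~5, P1 holds 5~10
tables = {
    1: {(1, 2): ["F5"], (2, 4): ["F5", "F6", "F9"],
        (4, 5): ["F5", "F6", "F8", "F9"]},
    5: {(5, 6): ["F2", "F5", "F6", "F8", "F9"],
        (6, 8): ["F2", "F3", "F5", "F6", "F9"],
        (8, 10): ["F2", "F6", "F9"]},
}
print(sorted(files_for_range(tables, 1, 8)))
# → ['F2', 'F3', 'F5', 'F6', 'F8', 'F9']
```

Note that the sub-range (8, 10) is correctly excluded, since it lies entirely above the query's upper bound of 8.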
C. B+ tree index
After the corresponding files are found, the data meeting the condition must be sought inside each file.
Such a data-level search requires building a B+ tree; that is, the logical organization of the data inside a file is a
B+ tree. This paper takes F4 as an example, assuming the data stored in F4 is as shown in the following table.
Table 3 Data in F4
| Order Number | Attribute | Order Number | Attribute |
| 1 | 43 | 11 | 54 |
| 2 | 46 | 12 | 58 |
| 3 | 51 | 13 | 43 |
| 4 | 44 | 14 | 46 |
| 5 | 42 | 15 | 52 |
| 6 | 50 | 16 | 56 |
| 7 | 55 | 17 | 59 |
| 8 | 48 | 18 | 57 |
| 9 | 49 | 19 | 47 |
| 10 | 41 | 20 | 60 |
The B+ tree built on the attribute of this file is shown below.
Figure 3 B+ tree based on attribute of file 4
In a B+ tree only the leaf nodes store keywords (in this paper, the data in the table). Non-leaf nodes
store only index entries containing the biggest keyword of each sub-tree.
In the dual-index proposed in this paper, each file found through the ring carries the information of the B+
tree attached to it. For example, when the query range is 47 < a ≤ 51, F4 is the file found in the ring, and the
next step is a specific search in the B+ tree of F4. The query results are 48, 49, 50 and 51.
The B+ tree index is created before the first query, and subsequent queries can use it directly unless the data
is updated.
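A sketch of the file-internal range search. For brevity the B+ tree is reduced to its sorted leaf level, which preserves the answer of a range query (a real tree would first descend through the non-leaf index entries in O(log P) steps):

```python
import bisect

def bptree_range(leaf_keys, lo, hi):
    """Range search lo < a <= hi over a B+ tree's leaf level.

    In a B+ tree all keys live in the sorted, linked leaf nodes, so a
    range query locates the first leaf key greater than lo and scans
    rightwards until the keys exceed hi. Modelling the leaf chain as one
    sorted list is a simplification of that scan.
    """
    i = bisect.bisect_right(leaf_keys, lo)   # first key strictly > lo
    j = bisect.bisect_right(leaf_keys, hi)   # one past the last key <= hi
    return leaf_keys[i:j]

# F4's attribute column from Table 3, sorted as the leaf level
f4 = sorted([43, 46, 51, 44, 42, 50, 55, 48, 49, 41,
             54, 58, 43, 46, 52, 56, 59, 57, 47, 60])
print(bptree_range(f4, 47, 51))   # → [48, 49, 50, 51]
```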
D. Map Reduce
The Master node needs to know what is stored on each node so as to avoid sending a query to
a node that would only return null results, which would hurt query efficiency. Therefore each node sends its
own ID, the range of file IDs it holds, and the numeric attributes of each file (which attribute will be searched
is not known before a query arrives) to the Master node, which stores this index information. When choosing
Map nodes we can prefer nodes that are not currently executing queries.
When the Master node receives a query request it first checks the index information it stores, and then
sends the request to the nodes containing the queried attribute. The ring then retrieves the files that contain the
query attribute and whose attribute value ranges meet the query condition.
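The Master's node-selection step might look as follows; the layout of the Master's index (node ID mapped to per-attribute value ranges) is an assumption for illustration:

```python
def dispatch(master_index, attr, lo, hi):
    """Pick the Map nodes a query should be sent to.

    master_index: node id -> {attribute name: (min, max) over that
    node's files}. A node is skipped when it cannot contain the queried
    attribute range, avoiding Map tasks that would only return null.
    """
    targets = []
    for node, attrs in master_index.items():
        if attr in attrs:
            a_min, a_max = attrs[attr]
            if a_min <= hi and lo <= a_max:   # ranges overlap
                targets.append(node)
    return targets

# Hypothetical Master index over four nodes
idx = {"P0": {"price": (10, 18)}, "P1": {"price": (1, 10)},
       "P3": {"price": (16, 20)}, "P4": {"weight": (0, 9)}}
print(dispatch(idx, "price", 5, 9))   # → ['P1']
```

Only P1 is selected: P0 and P3 hold the right attribute but their value ranges lie entirely above the query, and P4 does not hold the attribute at all.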
IV. Range Query Processing
When a query arrives, the Master node of MapReduce decides which Map nodes the query will be
sent to. After the queried nodes are determined, a file-level search begins on each node to find the qualified
files, followed by a data-level search inside each file. Finally the data found is sent to a Reduce node to merge
the results.
The concrete steps are as follows:
1. The query is first sent to the Master node. Since the Master node stores the attribute value ranges of the
files on the other nodes, it maintains a P-Ring index covering all files on those nodes and can therefore
determine on which one or more nodes the query will be executed. In this article we use the P-Ring to
organize the nodes of MapReduce, and every queried node is a Map node.
2. When the Master receives a query it first determines in which nodes the maximum and minimum values of
the query range are located; the files in those nodes store the values meeting the query range. There are the
following circumstances:
1) The maximum and minimum values are exactly the values of two nodes. In this case the infoTables of all
nodes between these two nodes (including the node where the minimum value is located) need to be searched
concretely;
2) The minimum value is exactly equal to one node, say P1, and the maximum value lies between two nodes,
say P0 and P3. We then determine which sub-range of P0's infoTable the maximum value falls in, so all files
in P1, plus the files located in the sub-ranges of P0 from the first sub-range up to the one containing the
maximum, need to be queried;
3) The maximum value is exactly equal to one node, or both values of the query range lie between two nodes
of the ring. Locate the files as in step 2) and return the file names and addresses that need a concrete search.
If the two values of the query range lie between two different node pairs, the files can be searched on the two
Map nodes simultaneously using the parallelism of MapReduce.
3. The B+ tree of each file in the list of found files is queried in parallel, according to the file names and file
addresses in the infoTable. The search begins at the root node of the B+ tree; if a child node fits the query
condition, the search continues down that branch until it reaches the leaf nodes. Finally the results are
returned to the Reduce nodes designated by the Master.
4. When a Map node finishes its query, it sends a message to the Master node, and the Master decides which
Reduce node the results will be sent to for merging.
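The overall Map/Reduce flow of the steps above can be sketched as follows, with threads standing in for Map nodes and a simple sorted merge standing in for the Reduce step; the per-node data layout is a hypothetical stand-in for the per-file B+ trees:

```python
from concurrent.futures import ThreadPoolExecutor

def run_range_query(map_nodes, lo, hi):
    """Master-side sketch of the query flow in this section.

    map_nodes: node id -> local sorted attribute values (a stand-in for
    the files' B+ trees). Each Map node filters its local data in
    parallel; the Reduce step merges the per-node result lists.
    """
    def map_task(values):                    # runs on one Map node
        return [v for v in values if lo <= v <= hi]

    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(map_task, map_nodes.values()))
    # Reduce: merge and order the partial results
    return sorted(v for part in partials for v in part)

nodes = {"P0": [41, 44, 48, 52], "P1": [43, 47, 55], "P2": [45, 60]}
print(run_range_query(nodes, 44, 50))   # → [44, 45, 47, 48]
```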
V. Conclusion
This paper proposed a range query algorithm for big data that applies a dual-index structure. In order to
simplify the B+ tree, we combined it with a P-Ring: files are stored on different nodes, and the range between
two nodes corresponds to the files containing those values, while MapReduce dispatches tasks among the
nodes. The time complexity of this algorithm is the cost of finding the files in the P-Ring plus the cost of
finding the data in the B+ trees, i.e. O(log₂(N/2) + log P), where N is the number of nodes in the P-Ring and P
is the maximum number of items in one file.
REFERENCES
[1] Viktor Mayer-Schönberger, Big Data: A Revolution That Will Transform How We Live, Work, and Think.
[2] Wang Dan, Li Maozeng, A Range Query Model based on DHT in P2P System, 2009 International Conference on
Networks Security, Wireless Communications and Trusted Computing.
[3] Hui Zhao, Shuqiang Yang, Zhikun Chen, Songcang Jin, Hong Yin, Long Li, MapReduce model-based optimization
of range queries, 2012 9th International Conference on Fuzzy Systems and Knowledge Discovery (FSKD 2012).
[4] Adina Crainiceanu, Prakash Linga, Ashwin Machanavajjhala, P-Ring: An Efficient and Robust P2P Range Index
Structure, SIGMOD '07 Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data,
Pages 223-234, New York, 2007.
[5] A. R. Bharambe, M. Agrawal, and S. Seshan. Mercury: supporting scalable multi-attribute range queries. SIGCOMM
Comput. Commun. Rev., 34(4), 2004.
[6] Michael Cardosa, Chenyu Wang, Anshuman Nangia, Abhishek Chandra, Jon Weissman, Exploring MapReduce
efficiency with highly-distributed data, MapReduce '11 Proceedings of the second international workshop on Map
Reduce and its applications, Pages 27-34 , New York 2011.
[7] Samet Ayhan, Johnathan Pesce, Paul Comitz, Gary Gerberick, Steve Bliesner, Predictive analytics with surveillance
big data, BigSpatioal’12 Proceedings of the 1st ACM SIGSPATIAL International Workshop on Analytics for Big
Geospatial Data, Pages 81-90, New York 2012.
[8] Edward Z. Yang, Robert J. Simmons, Profile Jeff Dean: Big data at Google, XRDS: Crossroads, The ACM Magazine
for Students - Big Data, Volume 19 Issue 1, fall 2012, Pages 69-69
[9] Jens Dittrich, Jorge-Arnulfo Quiane-Rioz, Efficient big data processing in Hadoop MapReduce, Proceedings of the
VLDB Endowment, Volume 5 Issue 12, August 2012 , Pages 2014-2015.
[10] A. Gupta, D. Agrawal, and A. El Abbadi. Approximate range selection queries in peer-to-peer systems. In CIDR,
2003.
[11] A. Datta, M. Hauswirth, R. John, R. Schmidt, and K. Aberer. Range-queries in trie-structured overlays. In P2P Comp.
2005.
[12] P. Ganesan, M. Bawa, and H. Garcia-Molina. Online balancing of range-partitioned data with applications to peer-to-
peer systems. In VLDB, 2004.