The E-R model is a semantic data model used for the graphical representation of a conceptual database design. Semantic data models provide more constructs, so a design expressed in one can capture more detail. With a semantic data model, the database is easier to design in the first place and easier to understand later. The conceptual design is our first comprehensive design and is independent of any particular implementation: a conceptual design expressed in the E-R data model can be implemented using any DBMS. To do so, we transform the design from the E-R data model into the data model of the chosen DBMS, for example by mapping each entity set and each many-to-many relationship to a relation of its own. Since no DBMS is based directly on the E-R data model, this transformation is always required.
Presentation of the main IR models
Presentation of our submission to TREC KBA 2014 (Entity oriented information retrieval), in partnership with Kware company (V. Bouvier, M. Benoit)
SCHEMA BASED STORAGE OF XML DOCUMENTS IN RELATIONAL DATABASES (ijwscjournal)
XML (Extensible Markup Language) is emerging as a tool for representing and exchanging data over the internet. To store and query XML data we can take two approaches: native XML databases or XML-enabled databases. This paper deals with XML-enabled databases, using relational databases to store XML documents, and focuses on mapping XML DTDs into relations. The mapping needs three steps: 1) simplify complex DTDs, 2) build a DTD graph from the simplified DTDs, 3) generate the relational schema. We present an inlining algorithm for generating relational schemas from available DTDs; the algorithm also handles recursion in an XML document.
HOLISTIC EVALUATION OF XML QUERIES WITH STRUCTURAL PREFERENCES ON AN ANNOTATE... (ijseajournal)
With the emergence of XML as the de facto format for storing and exchanging information over the Internet, the search for ever more innovative and effective querying techniques is a major concern of the XML database community. Most studies address so-called exact queries which, especially for semi-structured documents, tend to yield either abundant results (for vague queries) or empty results (for very precise queries). Since users are not necessarily interested in all possible solutions but rather in those closest to their needs, an important line of research has opened on the evaluation of preference queries. In this paper, we propose an approach for evaluating such queries when the preferences concern the structure of the document. The solution revolves around an evaluation plan in three phases: rewriting, evaluation, and merge. The rewriting phase obtains, from a partitioning-transformation operation on the initial query, a hierarchical set of preference path queries, which are evaluated holistically in the second phase by an instrumented version of the TwigStack algorithm. The merge phase synthesizes the best results.
Database Systems
DBMS
Database System Environment
Traditional File Systems
Advantages of DBMS over File Systems
Disadvantages of DBMS
DBMS
Describing and Storing data in DBMS
Three Schema Architecture
Data Independence
Queries
Transactions
Structure of DBMS
Users of DBMS
Steps in database Design Process
ER Concepts and Notations
Class Hierarchies
Overview of structured search technology. Using the structure of a document to create better search results for document search and retrieval.
How both search precision and recall are improved when the structure of a document is used.
How a keyword match in a title of a document can be used to boost the search score.
Case studies with the eXist native XML database.
Steps to set up a pilot project.
Information retrieval systems and the PageRank algorithm (Rupali Bhatnagar)
We discuss the various information retrieval models present in the literature and treat them mathematically. We also study the PageRank algorithm, which is used to rank relevant search results.
To download slides:
http://www.intelligentmining.com/category/knowledge-base/
These are my notes for a presentation I did internally at IM. It covers both the multinomial and multi-variate Bernoulli event models in Naive Bayes text classification.
What is NoSQL? NoSQL describes a family of approaches to managing data at an enterprise level that have key similarities, but - at the same time - are very different from classic SQL based relational databases.
NoSQL has emerged as a 'movement' over the last 5 years and many specific noSQL datastores - Mongo, Redis, HBase, Cassandra, Neo4J - are being used for mission critical systems by many organizations including Facebook, LinkedIn, Dropbox, American Express, NSA, & the CIA. Does NoSQL spell the end of SQL based relational datastores like Oracle, MySQL, SQLServer, & Sybase? Definitely not, but the world is moving in the direction of "Polyglot Persistence" and away from the "Relational Persistence" hegemony. In my presentation I will explain why this shift is occurring and will speculate about what the future will hold.
EFFICIENT SCHEMA BASED KEYWORD SEARCH IN RELATIONAL DATABASES (IJCSEIT Journal)
Keyword search in relational databases allows users to search for information without knowing the database schema or using the structured query language (SQL). In this paper, we address the problem of generating and evaluating candidate networks. In candidate network generation, overhead is caused by the growing number of joining tuples as the size of the minimal candidate network increases. To reduce this overhead, we propose candidate network generation algorithms that generate a minimum number of joining tuples according to the maximum number of tuple sets. We first generate a set of joining tuples, the candidate networks (CNs). Because it is difficult to obtain an optimal query processing plan while generating a large number of joins, we also develop a dynamic CN evaluation algorithm (D_CNEval) that generates connected tuple trees (CTTs) while reducing the size of intermediate join results. The performance of the proposed algorithms is evaluated on the IMDB and DBLP datasets and compared with existing algorithms.
The student will understand the basics of the Relational Database Model.
The student will learn Database Administration functions as appropriate for software developers.
The student will learn SQL.
The student will become familiar with the entire implementation cycle of a client server application.
And, you will build one.
2. Introduction
Structured data
Schema as a summary of the data
Retrieve through structured language
What would big data bring to structured data retrieval?
3. Introduction
In terms of high volume of data
Hadoop + Pig Latin came to the rescue
However, is this enough?
Recall how you write a selection. What do you need to know?
Can you remember this?
4. Introduction
Big data -> big and complicated schema
Hard to remember and operate!
May not even fit in main memory!
What should we do about it?
How does information retrieval deal with this?
5. Introduction
Search based on keywords
No need for a schema
Efficiency guaranteed using an index
All seems straightforward and easy
What are the challenges?
6. Introduction
Search for “Apple + company”
Matches “apple (fruit)”, “Apple Inc.”, “Adam’s apple”
Which one is correct? How to filter?
Challenge 1:
Filtering and disambiguation
7. Introduction
Search for “Steve Jobs + Apple”
Normalization. What to return?
(Example relations on the slide:)
ID | Name | Gender | Employer | Location
ID | Company | Location | Type | Product
ID | Street | City | State | Country
Challenge 2:
Automatic join back
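A minimal sketch (with made-up person and company tables and Python's sqlite3 module) of what "automatic join back" amounts to: the two keywords hit tuples in different relations, and the engine joins them through the foreign key on the user's behalf.

```python
# Hypothetical illustration of "automatic join back": tuples matching
# "Steve Jobs" and "Apple" live in different tables, so the system has to
# join them through a foreign key to return one combined answer.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person  (id INTEGER PRIMARY KEY, name TEXT, employer INTEGER);
    CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT, location TEXT);
    INSERT INTO company VALUES (1, 'Apple Inc.', 'Cupertino');
    INSERT INTO person  VALUES (1, 'Steve Jobs', 1);
""")

# The keyword query "Steve Jobs + Apple" is answered by a join the user
# never wrote explicitly -- the engine infers it from the schema.
rows = conn.execute("""
    SELECT p.name, c.name, c.location
    FROM person p JOIN company c ON p.employer = c.id
    WHERE p.name LIKE '%Steve Jobs%' AND c.name LIKE '%Apple%'
""").fetchall()
print(rows)   # [('Steve Jobs', 'Apple Inc.', 'Cupertino')]
```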
8. Introduction
Search for “Jordan”
Matches “Jordan (brand)”, “Michael Jordan (player)”, “Michael Jordan (professor)”, etc.
All of them should match. Which one is better?
Ranking
Challenge 3:
Ranking of the results
9. Literature Overview
Two kinds of approaches
1. Interpretative approach
Reuse database query language and index
Translate the keywords into queries
Will introduce 3 papers
2. Un-interpretative approach (focus)
Typically build own index and data structure
Model as graph and use graph-based analysis
Will introduce 3 papers
10. Literature Overview – Interpretative approach
DBXplorer, Sanjay Agrawal et al.
General: two steps
Publish step: pre-computation, indexing, etc.
Search step: lookup, enumerate over join trees, generate SQL, etc.
Efficiency:
Symbol table (index) design
Symbol table compaction
11. Literature Overview – Interpretative approach
Publish step:
1: A database is identified, along with the set of tables and columns within the database to be published.
2: Auxiliary tables are created for supporting keyword searches, e.g. an index table.
But how do we build an efficient index?
12. Literature Overview – Interpretative approach
Index goal: find the row_id and column_id to which a keyword belongs.
If the column (attribute) already has an index, we need only a column_id index (reuse the database index).
(Slide illustration: a table with columns ID, Name, Gender, Addr, Org and rows 1–3, contrasting a column index with a row index.)
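A rough sketch of what such a symbol table could look like at cell granularity, using Python's sqlite3 module; the tables and columns are discovered from the catalog rather than assumed. This illustrates the idea, not DBXplorer's actual implementation.

```python
# A minimal sketch of a symbol table at cell granularity: for every keyword
# we record which (table, column, rowid) cells contain it.
import sqlite3
from collections import defaultdict

def build_symbol_table(conn):
    symbol_table = defaultdict(set)            # keyword -> {(table, column, rowid)}
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table'")]
    for table in tables:
        columns = [r[1] for r in conn.execute(f"PRAGMA table_info({table})")]
        for row in conn.execute(f"SELECT rowid, * FROM {table}"):
            rowid, values = row[0], row[1:]
            for column, value in zip(columns, values):
                for keyword in str(value).lower().split():
                    symbol_table[keyword].add((table, column, rowid))
    return symbol_table
```

At column granularity (as in the compressed symbol table on the next slide) one would keep only the (table, column) part of each entry.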
13. Literature Overview – Interpretative approach
Compress the index table
Foreign key constraints etc.
General algorithm -- CP-Comp
(Slide illustration: a Sells table (Name, Product, ...) and a Person table (Name, Gender, ...), with Table 1 shown as a compressed symbol table and Table 2 as an uncompressed one.)
14. Literature Overview – Interpretative approach
Search step
Step 1: look up the index to find the columns/rows of the database that contain the query keywords.
Step 2: all potential subsets of tables in the database that, if joined, might contain rows having all keywords are identified and enumerated (join trees).
Step 3: for each enumerated join tree, a SQL statement is constructed (and executed) that joins the tables in the tree and selects those rows that contain all keywords. The final rows are ranked and presented to the user.
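The search step can be illustrated with a small sketch: given a hypothetical foreign-key schema graph and the tables that the symbol table reported for each keyword, enumerate connected table subsets that cover all keywords; each such subset is one join tree for which a SQL statement would then be built. A real implementation would keep only minimal trees.

```python
# Sketch of join-tree enumeration over an assumed schema graph and hit lists.
from itertools import combinations

schema_graph = {                       # hypothetical foreign-key adjacency
    "person":  {"company"},
    "company": {"person", "address"},
    "address": {"company"},
}
hits = {"steve": {"person"}, "apple": {"company"}}   # from the symbol table

def connected(tables):
    seen, stack = set(), [next(iter(tables))]
    while stack:
        t = stack.pop()
        if t in seen:
            continue
        seen.add(t)
        stack.extend(schema_graph[t] & tables)
    return seen == tables

def join_trees(max_size=3):
    for size in range(1, max_size + 1):
        for subset in map(set, combinations(schema_graph, size)):
            covers_all = all(hits[k] & subset for k in hits)
            if covers_all and connected(subset):
                yield subset           # one SQL statement would be built per tree

print(list(join_trees()))  # e.g. [{'person', 'company'}, {'person', 'company', 'address'}]
```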
17. Integrating IR and DB
DB techniques provide users with efficient ways to access structured data in RDBMSs
IR techniques allow users to use keywords to access unstructured data
E.g. structural keyword search finds how tuples that contain keywords in an RDB are interconnected (the structure); three types:
18. Schema-based approach
Connected Tree Semantics: query results in a minimal total joining network of tuples; adjacent tuples are joined by foreign key references, #tuples <= Tmax
19. Connected Tree Semantics
1. Candidate Network (CN) generation: relational algebra expressions that create trees containing all keywords, up to a certain size
2. CN evaluation: evaluates the generated CNs using SQL
20. Schema-based approach
Distinct Root Semantics: query results in a collection of tuples all reachable from a root; the root uniquely defines the tuples, distance(any tuple, root) <= Dmax
21. Schema-based approach
Distinct Core Semantics: query results in multi-center subgraphs (communities); the keyword tuples uniquely define a community, distance(any keyword tuple, any center tuple) <= Dmax
22. Distinct Core/Root Semantics
1. Create pairs between each tuple containing a keyword and every other tuple, keeping the shortest distance between them
2. Generate the subgraphs using SQL, with distinct cores/roots
24. Problem Definition
A database D is a collection of relational tables. Each relational table contains its name, attributes and value domains. All these elements together form the vocabulary.
A keyword query q is an ordered list of keywords. Each keyword specifies an element of interest.
A configuration of a keyword query on a database is an injective mapping from the keywords to the vocabulary of the database.
Task: first derive the top configurations based on some metrics and then interpret them as SQL queries (select-project-join interpretations).
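A tiny sketch of the configuration notion: each injective mapping from the query keywords to vocabulary terms is one candidate interpretation. The keyword and term names below are purely illustrative.

```python
# Enumerate configurations = injective mappings from keywords to vocabulary terms.
from itertools import permutations

keywords   = ["jobs", "apple"]                     # illustrative query
vocabulary = ["person.name", "company.name",       # illustrative vocabulary
              "company.product", "person.employer"]

def configurations(keywords, vocabulary):
    # each permutation of |keywords| vocabulary terms is one injective mapping
    for terms in permutations(vocabulary, len(keywords)):
        yield dict(zip(keywords, terms))

for cfg in list(configurations(keywords, vocabulary))[:3]:
    print(cfg)    # e.g. {'jobs': 'person.name', 'apple': 'company.name'}
```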
25. From Keywords to Queries
Need to consider the inter-dependency of the query keywords:
Introduce two different kinds of weights: intrinsic weights and contextual weights
Need to give a ranked list of all the configurations:
Develop an algorithm that builds on and extends the Hungarian (a.k.a. Munkres) algorithm
Need to separate the process of evaluating schema terms and value terms:
Evaluate the value weights based on the schema mapping
27. Contributions and Insights
Formally define the problem of keyword querying over relational databases that lack a-priori access to the database instance
Introduce the notion of a weight as a measure of the likelihood that the semantics of a keyword are represented by a database structure
Need to consider both intrinsic weights and contextual weights
Extend and exploit the Hungarian (a.k.a. Munkres) algorithm to generate a ranking of the different interpretations
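Since the ranking builds on the Hungarian (Munkres) algorithm, the assignment itself can be sketched with SciPy's linear_sum_assignment. The weight matrix below is invented; in practice it would combine the intrinsic and contextual weights described above.

```python
# Pick the best keyword-to-schema-term mapping with the Hungarian algorithm.
import numpy as np
from scipy.optimize import linear_sum_assignment

keywords = ["jobs", "apple"]
terms    = ["person.name", "company.name", "company.product"]
weights  = np.array([[0.9, 0.2, 0.1],     # likelihood that keyword i means term j
                     [0.3, 0.8, 0.6]])

# linear_sum_assignment minimizes cost, so negate the weights to maximize them
rows, cols = linear_sum_assignment(-weights)
best = {keywords[i]: terms[j] for i, j in zip(rows, cols)}
print(best)   # {'jobs': 'person.name', 'apple': 'company.name'}
```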
28. Literature Overview
Two kinds of approaches
1. Interpretative approach
Reuse database query language and index
Translate the keywords into queries
2. Un-interpretative approach
Typically build own index and data structure
Model as graph and use graph-based analysis
30. Difficulties of Keyword Search
Keyword search in text databases only needs to compute a score for each document
Keyword search on an RDBMS is more complicated (relations, attributes, tuples):
1. Generate tuple trees (answers) by joining tuples from different tables
2. Rank the answers by computing a score
31. Generate Answer Tuple Trees
Tuple tree answer rules:
1. Each leaf node in a tuple tree must contain at least one keyword
2. Each tuple appears at most once in the tree
Separate tuples into tuple sets that contain keywords and tuple sets that contain all tuples for each relation, then join adjacent sets from the schema graph within the constraints on answer trees
32. Ranking Tuple Trees
Treat the text of each tuple within an answer set as a “document”
Assign a similarity rating between each document and the query, normalizing for:
Term frequency
Document frequency
Document length
Compute the score for a tuple tree as the average over all documents
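A compact sketch of this ranking scheme: each tuple's text is a "document", each document gets a TF-IDF-style similarity to the query, and the tree score is the average over its tuples. The normalization details here are simplified relative to the original papers.

```python
# Toy TF-IDF-style scoring of a tuple tree.
import math
from collections import Counter

def similarity(doc, query, doc_freq, n_docs):
    terms = Counter(doc.lower().split())
    length = sum(terms.values())
    score = 0.0
    for q in query.lower().split():
        tf  = terms[q] / length if length else 0.0                 # term frequency
        idf = math.log((1 + n_docs) / (1 + doc_freq.get(q, 0)))    # document frequency
        score += tf * idf
    return score / math.sqrt(length or 1)                          # crude length normalization

def score_tree(tuple_texts, query, doc_freq, n_docs):
    sims = [similarity(t, query, doc_freq, n_docs) for t in tuple_texts]
    return sum(sims) / len(sims)                                   # average over "documents"
```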
33. Focused work
Keyword Searching and Browsing in Databases using BANKS
Gaurav Bhalotia et al.
ICDE 2002
34. BANKS (Browsing And Keyword Searching)
A system which enables keyword-based search on relational databases, together with data and schema browsing
(Architecture on the slide: User <-> HTTP <-> BANKS System <-> JDBC <-> Database)
35. Database and Query Model
Relational database -> directed graph
Each tuple in the database -> node in the graph
Foreign key -> directed edge
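This mapping is easy to sketch with networkx (the tuples below are hypothetical): one node per tuple, one directed edge per foreign-key reference.

```python
# Sketch of the BANKS-style data graph: nodes are tuples, edges are FK references.
import networkx as nx

G = nx.DiGraph()
G.add_node("paper:1",  label="BANKS: Keyword search...")
G.add_node("author:1", label="S. Sudarshan")
G.add_node("writes:1")

# a writes-tuple references both its paper and its author via foreign keys
G.add_edge("writes:1", "paper:1")
G.add_edge("writes:1", "author:1")
```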
37. Database and Query Model
An answer to a query should be a subgraph connecting nodes matching the keywords.
The importance of a link depends upon the type of the link, i.e. what relations it connects, and on its semantics
Ignoring directionality would cause problems because of “hubs” which are connected to a large number of nodes.
38. Database and Query Model
We may restrict the information node to be from a selected set of nodes of the graph
We incorporate another interesting feature, namely node weights, inspired by prestige rankings
Node weights and tree weights need to be combined to get an overall relevance score
39. Formal Model
Node weight: N(u)
Depends on the prestige
Set the node prestige = the in-degree of the node
Nodes that have multiple pointers to them get a higher prestige
40. Formal Model
Edge weights
Some popular tuples can be connected to many other tuples -> edges carry forward and backward edge weights
Weight of a forward link = the strength of the proximity relationship between the two tuples (set to 1 by default)
Weight of a backward link = the in-degree of the node, i.e. the number of edges pointing to it
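A sketch of these weights on the graph built above, following the slide's description rather than BANKS's exact formulas: prestige is the in-degree, forward edges get weight 1, and each backward edge is weighted by the in-degree of the node it leaves, so hub tuples become expensive to traverse backwards.

```python
# Add node prestige and forward/backward edge weights to a BANKS-style graph.
import networkx as nx

def add_weights(G):
    prestige = dict(G.in_degree())                 # node weight N(u) = in-degree
    nx.set_node_attributes(G, prestige, "prestige")

    H = G.copy()
    for u, v in G.edges():
        H[u][v]["weight"] = 1                      # forward link strength (default 1)
        # backward link (v -> u), weighted by the in-degree of v
        # (a real implementation would combine weights when both directions exist)
        H.add_edge(v, u, weight=max(1, prestige[v]))
    return H
```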
43. Searching for the best answer
Backward Expanding Search Algorithm
Intuition: find vertices from which a forward path exists to at least one node from each Si.
Run concurrent single-source shortest path algorithms from each node matching a keyword
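A simplified version of this backward expansion can be written with networkx's multi-source Dijkstra on the reversed graph: nodes reachable (against edge direction) from every keyword set are candidate answer roots, ranked here by total distance. This is a sketch of the idea, not the paper's incremental iterator.

```python
# Simplified backward expanding search over a weighted BANKS-style graph H.
import networkx as nx

def backward_expanding_search(H, keyword_sets):
    # keyword_sets: list of sets S_i of nodes containing keyword i
    R = H.reverse(copy=False)                      # expand against edge direction
    dists = [nx.multi_source_dijkstra_path_length(R, S, weight="weight")
             for S in keyword_sets]
    candidates = set(dists[0])
    for d in dists[1:]:
        candidates &= set(d)                       # roots reached from every keyword set
    # rank roots by total distance to the keyword sets (smaller is better)
    return sorted(candidates, key=lambda root: sum(d[root] for d in dists))
```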
44. Searching for the best answer
(Example on the slide: an answer tree in which the paper “BANKS: Keyword search…” is connected through writes/author edges to the author nodes S. Sudarshan, Prasan Roy, and Charuta.)
45. As an extension of BANKS
BLINKS: ranked keyword searches on graphs
He H. et al.
SIGMOD 2007
46. Introduction
Efficient ranked keyword searches on schemaless node-labeled graphs
Challenges:
Lack of schema for optimization
Hard to guarantee strong performance
Proposed technique:
Backward search algorithm
SLINKS: single-level index search
Extension for scalability: BLINKS (bi-level index search)
Contributions:
Cost-balanced expansion based backward search
Combining indexing with search
Partition-based indexing (bi-level indexing)
52. BLINKS (brief idea)
The index is too large to store and too expensive to construct in large graphs?
Use a divide-and-conquer approach to create a bi-level index
Partition the data graph into multiple subgraphs, or blocks
Intra-block index:
indexes information inside a block
4 kinds of indexes, 2 for separator nodes (important, so specially considered)
Block index:
2 simple indexes
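A toy sketch of the bi-level idea (the partition function is assumed to be given): a coarse block-level index says which blocks may contain a keyword, and a detailed intra-block index is consulted only inside those blocks.

```python
# Toy bi-level keyword index over a partitioned graph.
from collections import defaultdict

def build_bilevel_index(node_keywords, partition):
    # node_keywords: node -> set of keywords; partition: node -> block id (assumed given)
    intra_block = defaultdict(lambda: defaultdict(set))   # block -> keyword -> nodes
    block_index = defaultdict(set)                        # keyword -> blocks
    for node, keywords in node_keywords.items():
        block = partition[node]
        for kw in keywords:
            intra_block[block][kw].add(node)
            block_index[kw].add(block)
    return block_index, intra_block
```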
53. Conclusion
Keyword search challenges:
Filtering and disambiguation
Automatic join back
Ranking of the results
Additional considerations:
Efficiency
Space