The document discusses information retrieval (IR) techniques for private and public data. It provides an overview of key concepts in web-based IR including technologies, models, architecture, and challenges. It also introduces the concept of private information retrieval (PIR) which aims to allow a user to query a database while hiding which item they are accessing, in order to protect user privacy. The document outlines a potential approach for PIR using linear algebra operations on the database to retrieve the desired item without revealing which item was queried. Overall the document provides background on IR techniques for both public and private data, with a focus on the goal of PIR to allow private querying of databases.
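As a rough illustration of the linear-algebra idea behind PIR, here is a minimal sketch of the classic two-server XOR scheme. It is not necessarily the approach the slides themselves present. It assumes two non-colluding servers that hold identical copies of a database of bits; the function names `make_queries`, `server_answer`, and `retrieve` are hypothetical.

```python
import secrets


def make_queries(n, index):
    """Build two selection vectors over GF(2) that differ only at `index`.

    Each vector on its own is uniformly random, so a single server
    learns nothing about which position the client wants.
    """
    q1 = [secrets.randbelow(2) for _ in range(n)]
    q2 = list(q1)
    q2[index] ^= 1  # flip exactly the bit of the wanted position
    return q1, q2


def server_answer(database, query):
    """Inner product of database bits and query vector over GF(2)."""
    acc = 0
    for bit, selected in zip(database, query):
        acc ^= bit & selected
    return acc


def retrieve(database, index):
    """Simulate both servers locally and combine their one-bit answers."""
    q1, q2 = make_queries(len(database), index)
    # <db, q1> XOR <db, q2> = <db, q1 XOR q2> = db[index]
    return server_answer(database, q1) ^ server_answer(database, q2)
```

The privacy argument rests on the assumption that the two servers do not compare the queries they receive; if they do, the differing position reveals the index.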
This document summarizes a project that proposes a privacy-preserving personalized web search framework called UPS. UPS allows users to specify customized privacy requirements for their hierarchical user profiles. It performs online generalization of user profiles for each query to balance personalization utility and privacy risk without compromising search quality or unnecessarily exposing user profiles. The framework consists of client and server components, with the client maintaining profiles and privacy specifications and handling query processing and result personalization.
One application, multiple platforms. This application will enable users to control their home appliances and smart devices from their mobile or tablet, and also to share information and communicate with their friends and family. Based on electricity usage, each user will receive a social score and will be ranked among their friends and neighbors who are using the same platforms anonymously.
Invited lecture at POSI (Post Graduation on Information Systems at INESC, Lisbon, Portugal) about Information Retrieval Challenges: real-time retrieval, context awareness and inferring identity from content.
Privacy Protection Models and Defamation caused by k-anonymity (Hiroshi Nakagawa)
These slides introduce mathematical models for privacy protection. The models explained are 1) Private Information Retrieval, 2) IR with Homomorphic Encryption, 3) k-anonymity, 4) l-diversity, and finally 5) Defamation caused by k-anonymity.
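To make the k-anonymity model named above concrete, here is a minimal sketch of a k-anonymity check. It assumes records are dictionaries and that the caller names the quasi-identifier columns. The function name `is_k_anonymous` and the column names in the usage note are hypothetical and do not come from the slides.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values
    that occurs in `records` is shared by at least `k` records."""
    groups = Counter(
        tuple(record[column] for column in quasi_identifiers)
        for record in records
    )
    return all(size >= k for size in groups.values())
```

For example, a table generalized to columns such as `zip` and `age` is 2-anonymous when each (zip, age) pair appears in at least two rows.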
A short presentation for beginners introducing machine learning: what it is, how it works, the popular machine learning techniques and learning models (supervised, unsupervised, semi-supervised, reinforcement learning), and how they work, with various industry use cases and popular examples.
Projection Multi Scale Hashing Keyword Search in Multidimensional Datasets (IRJET Journal)
The document discusses a novel method called ProMiSH (Projection and Multi Scale Hashing) for keyword search in multi-dimensional datasets. ProMiSH uses random projection and hash-based index structures to achieve high scalability and a speedup of more than four orders of magnitude over state-of-the-art tree-based techniques. Empirical studies on real and synthetic datasets of sizes up to 10 million objects and 100 dimensions show ProMiSH scales linearly with dataset size, dimension, query size, and result size. The method groups objects embedded in a vector space that are tagged with keywords matching a given query.
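The random-projection-plus-hashing idea can be sketched as follows. This is only a single-scale illustration under simplifying assumptions, not the ProMiSH index itself, which also handles multiple scales and keywords. The names `random_unit_vectors`, `hash_key`, `build_index`, and the `bin_width` parameter are hypothetical.

```python
import math
import random


def random_unit_vectors(dim, count, seed=0):
    """Draw `count` random directions in `dim` dimensions, normalized to length 1."""
    rng = random.Random(seed)
    vectors = []
    for _ in range(count):
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(x * x for x in v))
        vectors.append([x / norm for x in v])
    return vectors


def hash_key(point, vectors, bin_width):
    """Project the point onto each direction and record which bin it falls in."""
    return tuple(
        math.floor(sum(p * v for p, v in zip(point, vec)) / bin_width)
        for vec in vectors
    )


def build_index(points, vectors, bin_width):
    """Map each hash key to the ids of the points that share it."""
    index = {}
    for point_id, point in enumerate(points):
        index.setdefault(hash_key(point, vectors, bin_width), []).append(point_id)
    return index
```

Because each direction has unit length, two points closer than `bin_width` land in the same or an adjacent bin along every direction, which is what makes the buckets useful for finding nearby objects.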
This document summarizes a research paper that proposes a framework for personalized web search using query log and clickthrough data. The framework implements a re-ranking approach that combines user search context and browsing behavior to generate personalized search results with high relevance. The framework consists of five components: a request handler, query processor, result handler, event handler, and response handler. The result handler applies a re-ranking approach using query log and clickthrough data to personalize search results before returning them to the user. An evaluation found the framework and re-ranking approach to be effective for personalized web search and information retrieval.
This document summarizes a research paper that proposes a novel framework for personalized web search using query log and clickthrough data. The framework implements a re-ranking approach to generate personalized search results with high relevance. It derives an extended set of user preferences and concepts based on extracted data from query logs and clickthrough information. An evaluation found the framework and re-ranking approach to be highly effective for personalized search and information retrieval.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Detection of Behavior using Machine Learning (IRJET Journal)
This document discusses using machine learning algorithms to analyze user behavior and predict future behavior based on browsing history data. The researchers collected browsing history data from 20 computers in a lab over various time periods. They extracted features from the data such as URL, title, and time of visit, and categorized browsing into types such as social media, academic, and shopping. They then trained machine learning models such as SVM on 80% of the data and tested them on the remaining 20% to classify users and predict their future behavior. The goal is to develop personalized services and detect anomalies in behavior.
1. The document describes a search engine scraper that extracts data from websites, summarizes the extracted information, and converts it into a relevant result for users.
2. The search engine scraper works in three stages: extraction of data from website content, summarization of the extracted data using natural language processing techniques, and conversion of the summarized data into a meaningful format for users.
3. The summarization stage uses natural language toolkit processing libraries to determine sentence similarity, assign weights to sentences, and select sentences with higher ranks to include in the summary.
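The summarization stage described in the three points above can be sketched roughly as follows. The source mentions natural language toolkit libraries; this sketch swaps in standard-library word-overlap (Jaccard) similarity instead, so it is an assumption-laden stand-in rather than the scraper's actual code. All function names are hypothetical.

```python
import re


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def words(sentence):
    """Lowercased set of alphanumeric tokens in a sentence."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def similarity(a, b):
    """Jaccard similarity between the word sets of two sentences."""
    wa, wb = words(a), words(b)
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


def summarize(text, top_n=2):
    """Weight each sentence by its total similarity to the others and
    return the `top_n` highest-weighted sentences in original order."""
    sentences = split_sentences(text)
    scores = [
        sum(similarity(s, other) for j, other in enumerate(sentences) if j != i)
        for i, s in enumerate(sentences)
    ]
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:top_n])]
```

Sentences that share vocabulary with many others receive higher weights, so off-topic sentences tend to drop out of the summary.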
As the Internet grows, users of search providers continually demand search results that match their wishes. Personalized search is one of the options available to users: search results are tailored based on the personal data they provide to the search provider. This raises privacy concerns, however, as users are typically anxious about revealing personal information to an often faceless service provider on the Internet. This work proposes to deal with the privacy issues surrounding personalized search and discusses ways that privacy can be improved so that users become more comfortable with releasing their personal information in order to obtain more precise search results.
IJCER (www.ijceronline.com) International Journal of computational Engineerin... (ijceronline)
This document summarizes a research paper on developing user profiles from search engine queries to enable personalized search results. It discusses how current search engines generally return the same results regardless of individual user interests. The paper proposes methods to construct user profiles capturing both positive and negative preferences from search histories and click-through data. Experimental results showed profiles including both preferences performed best by improving query clustering and separating similar vs. dissimilar queries. Future work aims to use profiles for collaborative filtering and predicting new query intents.
This document discusses strategies for implementing a mobile office solution using various apps and software. It analyzes potential partners and competitors in mobile device supply chains. It evaluates customer relationship management, enterprise resource planning and accounting software options like NetSuite, QuickBooks, Salesforce and Microsoft Dynamics. Implementation plans address integrating these solutions while ensuring data security, disaster recovery and network security.
Study and Implementation of a Personalized Mobile Search Engine for Secure Se... (IRJET Journal)
This document describes a proposed personalized mobile search engine (PMSE) that learns users' content and location preferences from clickthrough data and GPS locations to provide personalized search results. The PMSE uses an ontology-based user profile learned from clicks and locations, without requiring extra user effort. It has a client-server architecture where the client handles interactions and stores privacy-sensitive click data, while the server performs computationally intensive tasks like concept extraction, training, and result reranking. The PMSE aims to improve search personalization by separately considering content and location concepts based on user interests.
Team of Rivals: UX, SEO, Content & Dev UXDC 2015 (Marianne Sweeny)
The search engine landscape has changed dramatically and now relies heavily on user experience signals to influence rank in search results. In this presentation, I explore search engine methods for evaluating UX in a machine readable fashion and present a framework for successful cross-discipline collaboration.
Data Science: Expediting Use of Data by Business Users with Self-service Disc... (Denodo)
In this presentation, you will learn how Denodo expedites the use of data by business users through its new self-service discovery and search capabilities.
This presentation is part of the Fast Data Strategy Conference, and you can watch the video here goo.gl/uOPpGe.
The document discusses grid computing and the speaker's background in the topic. It provides key takeaways about understanding the evolution of technologies like grid computing and envisioning upcoming trends. It then discusses what a grid is, including early definitions, and elements of grid computing like resource sharing, coordinated problem solving, and dynamic virtual organizations. The document also outlines attributes of grid computing related to virtualization, dynamic provisioning, resource pooling, and self-adaptive software. It provides examples of how grids are used and lists common grid components.
1.supporting privacy protection in personalized web search..9440480873 ,proje... (RamaKrishnaReddyKona)
Going Mobile With Enterprise Applications - A study on user behavior and perceptions.
This paper presents findings from three research studies carried out to understand the user behavior and explore the value in using mobile devices for accessing enterprise products.
The focus is essentially on the expectations of the end-users, namely, information technology (IT) administrators. In this case, we were exploring how the users of enterprise products might want to leverage mobile technology to access their everyday tasks and information, and therefore identify potential opportunities and challenges for extending their user experience to such devices.
The document discusses various topics related to web mining and data mining including:
- Web mining techniques like web content mining, web usage mining, and web structure mining.
- Common data mining techniques like classification, clustering, association rule mining etc. and how they are applied in web content mining.
- How web usage mining analyzes server log files to understand user browsing behavior and patterns.
- Classification and clustering are two popular techniques used in web usage mining, with decision trees and k-means clustering provided as examples.
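As an illustration of the k-means clustering mentioned in the list above, here is a minimal one-dimensional sketch that could, for instance, group sessions by the number of pages visited. The function name `kmeans_1d`, the explicit starting centers, and the session-length example are assumptions made for illustration, not details from the document.

```python
def kmeans_1d(values, centers, iterations=10):
    """Lloyd's algorithm on scalar values.

    `centers` holds the initial cluster centers. Returns the final
    centers together with the clusters from the last assignment step.
    """
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest center.
        clusters = [[] for _ in centers]
        for value in values:
            nearest = min(range(len(centers)), key=lambda c: abs(value - centers[c]))
            clusters[nearest].append(value)
        # Update step: move each center to the mean of its cluster,
        # keeping the old center if the cluster is empty.
        centers = [
            sum(cluster) / len(cluster) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
    return centers, clusters
```

With hypothetical session lengths of 1, 2, 3, 20, 21, and 22 pages and starting centers 0 and 10, the sketch separates short sessions from long ones.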
The document discusses various topics related to web mining and data mining. It defines web mining as using data mining techniques to extract useful information from web data. It covers different categories of web mining including web content mining, web usage mining, and web structure mining. Popular data mining techniques for these categories are discussed such as classification, clustering, association rule mining. Other topics covered include social media mining, text mining, and applications of web mining in e-commerce.
The document announces the 10th International Conference on Pattern Recognition and Machine Intelligence (PReMI'23) to be held in December 2023 in Kolkata, India. The conference aims to provide a platform for presenting research in pattern recognition, machine intelligence, and related fields. Full papers will be published in Springer's LNCS series, and selected papers may be published in special journal issues. Authors are invited to submit papers by April 30, 2023 related to various topics in pattern recognition and machine intelligence.
The document outlines JavaServer Pages (JSP) technology which extends Servlet technology to simplify delivery of dynamic web content. It discusses key JSP components like directives, actions, scriptlets and tag libraries. It provides an example JSP page that displays the current date and time using scripting. It also describes standard JSP actions like <jsp:include> and <jsp:forward> that can be used to include or forward to other resources.
Similar to Information Retrieval AICTE FDP at GCT Coimbatore
The document discusses various aspects of computer hardware and software. It begins by listing the main hardware components of a computer like the keyboard, mouse, monitor, and printer. It then discusses the internal components like the CPU, RAM, and different storage areas. The document also covers computer languages from machine language to assembly language to high-level languages. It provides examples of algorithms, flowcharts, and programs in C language. Finally, it discusses key concepts in C programming like data types, operators, functions, and translation of programs.
Enhancing Information Retrieval by Personalization Techniques (veningstonk)
This document outlines the research modules proposed for a PhD thesis focused on enhancing information retrieval through personalization techniques. The research will include four modules: 1) enhancing retrieval using term association graph representation, 2) integrating document and user topic models for personalization, 3) using genetic algorithms for document re-ranking, and 4) employing ant colony optimization for query reformulation. Module 1 will represent documents as a term graph and use the graph to re-rank documents based on term associations. The methodology for Module 1 includes preprocessing, frequent itemset mining to construct the term graph, and approaches for ranking documents based on semantic associations in the graph.
Personalized Information Retrieval system using Computational Intelligence Te... (veningstonk)
The document presents research on developing a personalized information retrieval system using computational intelligence techniques. It discusses four proposed models: 1) a term association graph model for document re-ranking, 2) a topic model for document re-ranking, 3) a genetic intelligence model for document re-ranking, and 4) a swarm intelligence model for search query reformulation. The objectives are to improve retrieval effectiveness using term graphs and enhance personalized ranking using user topic modeling. Computational techniques like genetic algorithms and ant colony optimization will be used to re-rank documents and reformulate queries.
The document proposes a method to re-rank images returned from an image search engine by incorporating visual similarity. It extracts interest points from images to determine visual content. Images are then re-ranked based on visual similarity, as determined by comparing interest points. A graph model is generated to represent visual similarities between images as links. PageRank is then applied to the graph to assign priority scores to images, with more visually similar images being ranked higher. The goal is to return images that are both relevant and visually diverse.
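The PageRank step over a visual-similarity graph can be sketched as below. The sketch assumes the pairwise similarities have already been computed from interest points and are supplied as a matrix of non-negative weights. The function name `pagerank`, the damping value of 0.85, and the iteration count are illustrative assumptions, not values taken from the document.

```python
def pagerank(weights, damping=0.85, iterations=50):
    """Power iteration over a weighted graph.

    `weights[i][j]` is the similarity of image i to image j, treated as
    the strength of a link from i to j. Returns one score per image.
    """
    n = len(weights)
    ranks = [1.0 / n] * n
    for _ in range(iterations):
        new_ranks = [(1.0 - damping) / n] * n
        for src in range(n):
            total = sum(weights[src])
            for dst in range(n):
                if total == 0:
                    # An image with no links spreads its score evenly.
                    share = 1.0 / n
                else:
                    share = weights[src][dst] / total
                new_ranks[dst] += damping * ranks[src] * share
        ranks = new_ranks
    return ranks
```

Images that are strongly similar to many others accumulate score from their neighbors, while a visual outlier with only weak links ends up with the lowest score.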
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W... (Social Samosa)
The Modern Marketing Reckoner (MMR) is a comprehensive resource packed with POVs from 60+ industry leaders on how AI is transforming the 4 key pillars of marketing – product, place, price and promotions.
The Building Blocks of QuestDB, a Time Series Database (javier ramirez)
Talk Delivered at Valencia Codes Meetup 2024-06.
Traditionally, databases have treated timestamps as just another data type. However, when performing real-time analytics, timestamps should be first-class citizens and we need rich time semantics to get the most out of our data. We also need to deal with ever-growing datasets while staying performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed. We will also review a history of some of the changes we have gone through over the past two years to deal with late and unordered data, non-blocking writes, read-replicas, or faster batch ingestion.
End-to-end pipeline agility - Berlin Buzzwords 2024Lars Albertsson
We describe how we achieve high change agility in data engineering by eliminating the fear of breaking downstream data pipelines through end-to-end pipeline testing, and by using schema metaprogramming to safely eliminate boilerplate involved in changes that affect whole pipelines.
A quick poll on agility in changing pipelines from end to end indicated a huge span in capabilities. For the question "How long time does it take for all downstream pipelines to be adapted to an upstream change," the median response was 6 months, but some respondents could do it in less than a day. When quantitative data engineering differences between the best and worst are measured, the span is often 100x-1000x, sometimes even more.
A long time ago, we suffered at Spotify from fear of changing pipelines due to not knowing what the impact might be downstream. We made plans for a technical solution to test pipelines end-to-end to mitigate that fear, but the effort failed for cultural reasons. We eventually solved this challenge, but in a different context. In this presentation we will describe how we test full pipelines effectively by manipulating workflow orchestration, which enables us to make changes in pipelines without fear of breaking downstream.
Making schema changes that affect many jobs also involves a lot of toil and boilerplate. Using schema-on-read mitigates some of it, but has drawbacks since it makes it more difficult to detect errors early. We will describe how we have rejected this tradeoff by applying schema metaprogramming, eliminating boilerplate but keeping the protection of static typing, thereby further improving agility to quickly modify data pipelines without fear.
State of Artificial intelligence Report 2023kuntobimo2016
Artificial intelligence (AI) is a multidisciplinary field of science and engineering whose goal is to create intelligent machines.
We believe that AI will be a force multiplier on technological progress in our increasingly digital, data-driven world. This is because everything around us today, ranging from culture to consumer products, is a product of intelligence.
The State of AI Report is now in its sixth year. Consider this report as a compilation of the most interesting things we’ve seen with a goal of triggering an informed conversation about the state of AI and its implication for the future.
We consider the following key dimensions in our report:
Research: Technology breakthroughs and their capabilities.
Industry: Areas of commercial application for AI and its business impact.
Politics: Regulation of AI, its economic implications and the evolving geopolitics of AI.
Safety: Identifying and mitigating catastrophic risks that highly-capable future AI systems could pose to us.
Predictions: What we believe will happen in the next 12 months and a 2022 performance review to keep us honest.
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Aggregage
This webinar will explore cutting-edge, less familiar but powerful experimentation methodologies which address well-known limitations of standard A/B Testing. Designed for data and product leaders, this session aims to inspire the embrace of innovative approaches and provide insights into the frontiers of experimentation!
Global Situational Awareness of A.I. and where its headedvikram sood
You can see the future first in San Francisco.
Over the past year, the talk of the town has shifted from $10 billion compute clusters to $100 billion clusters to trillion-dollar clusters. Every six months another zero is added to the boardroom plans. Behind the scenes, there’s a fierce scramble to secure every power contract still available for the rest of the decade, every voltage transformer that can possibly be procured. American big business is gearing up to pour trillions of dollars into a long-unseen mobilization of American industrial might. By the end of the decade, American electricity production will have grown tens of percent; from the shale fields of Pennsylvania to the solar farms of Nevada, hundreds of millions of GPUs will hum.
The AGI race has begun. We are building machines that can think and reason. By 2025/26, these machines will outpace college graduates. By the end of the decade, they will be smarter than you or I; we will have superintelligence, in the true sense of the word. Along the way, national security forces not seen in half a century will be un-leashed, and before long, The Project will be on. If we’re lucky, we’ll be in an all-out race with the CCP; if we’re unlucky, an all-out war.
Everyone is now talking about AI, but few have the faintest glimmer of what is about to hit them. Nvidia analysts still think 2024 might be close to the peak. Mainstream pundits are stuck on the wilful blindness of “it’s just predicting the next word”. They see only hype and business-as-usual; at most they entertain another internet-scale technological change.
Before long, the world will wake up. But right now, there are perhaps a few hundred people, most of them in San Francisco and the AI labs, that have situational awareness. Through whatever peculiar forces of fate, I have found myself amongst them. A few years ago, these people were derided as crazy—but they trusted the trendlines, which allowed them to correctly predict the AI advances of the past few years. Whether these people are also right about the next few years remains to be seen. But these are very smart people—the smartest people I have ever met—and they are the ones building this technology. Perhaps they will be an odd footnote in history, or perhaps they will go down in history like Szilard and Oppenheimer and Teller. If they are seeing the future even close to correctly, we are in for a wild ride.
Let me tell you what we see.
Codeless Generative AI Pipelines
(GenAI with Milvus)
https://ml.dssconf.pl/user.html#!/lecture/DSSML24-041a/rate
Discover the potential of real-time streaming in the context of GenAI as we delve into the intricacies of Apache NiFi and its capabilities. Learn how this tool can significantly simplify the data engineering workflow for GenAI applications, allowing you to focus on the creative aspects rather than the technical complexities. I will guide you through practical examples and use cases, showing the impact of automation on prompt building. From data ingestion to transformation and delivery, witness how Apache NiFi streamlines the entire pipeline, ensuring a smooth and hassle-free experience.
Timothy Spann
https://www.youtube.com/@FLaNK-Stack
https://medium.com/@tspann
https://www.datainmotion.dev/
milvus, unstructured data, vector database, zilliz, cloud, vectors, python, deep learning, generative ai, genai, nifi, kafka, flink, streaming, iot, edge
1. INFORMATION RETRIEVAL (IR)
(PRIVATE VS. PUBLIC)
VENINGSTON. K
Ph.D. Student, Department of CSE,
Government College of Technology, Coimbatore.
veningstonk@gct.ac.in
2. PRESENTATION OUTLINE
Public IR
What is Web IR?
Overview of Web IR Technologies
Web IR Models
Web Search architecture
Semantic Matching
Personalization in Web IR
Challenges in Web based IR
Challenges in Personalizing Web IR
Summary Note
Private IR
What is Private IR?
How Does It Work?
PIR Model
Approaches to PIR
PIR Properties
Summary Note
11 December 2013, AICTE FDP on Web Application Security
4. WEB INFORMATION RETRIEVAL
(WEB SEARCH)
Technologies for helping users to accurately,
quickly, and easily find information on the web
5. GOAL OF WEB SEARCH
Accurate / Efficient / Easy to Use
Results are relevant
Response time is short
Good user experience
Results are comprehensive
Results are novel
Fast task completion
6. WEB USERS HEAVILY RELY ON SEARCH
ENGINES
10. COMPONENT TECHNOLOGIES FOR WEB IR
Relevance Ranking
Importance Ranking
Web Page Understanding
Query Understanding
Crawling
Indexing
Search Result Presentation
Anti-Spam
Search Log Data Mining / Web Mining
11. THREE IMPORTANT PROCESSES IN WEB IR
Retrieval
Finding documents from inverted index
Matching
Calculating relevance score between query and
document pair
Ranking
Ranking documents based on relevance scores, importance scores, etc.
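The three processes above can be sketched in a few lines. This is only an illustration under simplifying assumptions: the toy corpus `DOCS`, the whitespace tokenizer, and the term-count relevance score are all hypothetical and stand in for whatever a real engine uses.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def retrieve(index, query):
    """Retrieval: collect candidate documents from the inverted index."""
    candidates = set()
    for term in query.lower().split():
        candidates |= index.get(term, set())
    return candidates

def match(docs, doc_id, query):
    """Matching: a naive relevance score (count of query-term occurrences)."""
    doc_terms = docs[doc_id].lower().split()
    return sum(doc_terms.count(t) for t in query.lower().split())

def rank(docs, index, query):
    """Ranking: order candidates by descending score, ties broken by id."""
    scored = [(match(docs, d, query), d) for d in retrieve(index, query)]
    return [d for _, d in sorted(scored, key=lambda p: (-p[0], p[1]))]

# Hypothetical toy corpus, for illustration only.
DOCS = {
    "d1": "jaguar car dealer",
    "d2": "jaguar animal habitat jaguar",
    "d3": "stock prices today",
}
INDEX = build_inverted_index(DOCS)
RANKED = rank(DOCS, INDEX, "jaguar habitat")
```

Here `RANKED` places `d2` ahead of `d1`, and `d3` is never scored because retrieval filters it out before matching.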
12. WEB IR MODELS
Vector Space Model (Salton 1975)
Probabilistic Model
Okapi or BM25 Model (Robertson and Walker 1994)
Language Model (Ponte and Croft 1998)
User Model
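As a rough illustration of the first model in this list, the vector space model represents the query and each document as term vectors and scores them by cosine similarity. The sketch below assumes raw term frequencies and whitespace tokenization; the helper names `tf_vector` and `cosine` are hypothetical, and practical systems usually add weighting such as TF-IDF.

```python
import math
from collections import Counter

def tf_vector(text):
    """Represent a text as a sparse term-frequency vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

query = tf_vector("web search")
doc_close = tf_vector("web search engine")
doc_far = tf_vector("private database query")
```

With these inputs, `cosine(query, doc_close)` is larger than `cosine(query, doc_far)`, which is zero because the two texts share no terms.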
17. USER MODEL
User models are personal characteristics of the user that the system maintains
A user profile can be thought of as a user model
Types of user models
Depending on the user being modeled
Individual
Canonical (group)
Depending on Acquisition model
Explicit (stated)
Implicit (inferred)
19. PERSONALIZATION - ENVIRONMENTS WHERE IT IS BEING USED
Databases
Newsgroups
Personal Information Management (desktop files, E-mail,
bookmarks, etc.)
News: electronic journals
Search engines
Web sites
Business
e-commerce
e-health
e-etc.,
20. OBJECTIVES
To enhance personalized Web search and retrieval so that it satisfies the user's search context
To customize Web Information Retrieval (IR) for users
To provide results specific to individual users
This is important because different users expect different information even for the same query
To predict whether personalization is required or not
To develop a computationally intelligent and efficient algorithm for this personalization task
21. PERSONALIZATION IN WEB IR [1/2]
Web Personalization is viewed as an application
of data mining and machine learning techniques
to build models of user behavior that can be
applied to the task of predicting user needs and
adapting future interactions with the ultimate
goal of improved user satisfaction.
22. PERSONALIZATION IN WEB IR [2/2]
Initially, search engines were concerned with retrieving documents relevant to a query.
Given the information overload on the web, it is increasingly difficult for search engines to satisfy individual user needs.
Personalization has long been recognized as an avenue to greatly improve the search experience.
It disambiguates web search by modeling the user's interests and preferences in a user profile.
23. PROBLEM DESCRIPTION
Personalization in Web IR
Customize search results according to each individual user
Research questions in Personalized Web IR
What to use to Personalize?
How to model and represent past search contexts?
How to Personalize?
How to use it to improve search results?
When not to Personalize?
How to decide whether personalization is required or not?
How to know whether Personalization helped?
How to evaluate personalized results?
24. GENERAL PROBLEM STATEMENT
When a search query is issued, most search engines return the same results irrespective of the user's interests
Web content lacks semantic structure, which makes it difficult for the machine to understand the information provided by the user
Search engines fall short in identifying the intention of the user
Search engines fall short in processing inaccurate or ambiguous queries with imprecise keywords
25. RELATED WORKS
Short-term personalization - bookmarks
Long-term personalization - browsing history
Result diversification - query reformulation
Collaborative personalization - for groups of users
Search interaction personalization - Clicks
Session based personalization
Location based personalization
Task based personalization
and so on…
26. ARCHITECTURE OF PERSONALIZATION BASED
WEB IR
[Figure: architecture of personalization-based Web IR. Labeled elements: Web, document corpus, query string, personalized IR, rankings, ranked documents (Doc1, Doc2, Doc3, ...), feedback, query reformulation, revised query, re-ranked documents (Doc2, Doc4, Doc5, ...).]
27. CHALLENGES FOR WEB IR
Distributed Data: Documents spread over millions
of different web servers.
Volatile Data: Many documents change or
disappear rapidly (e.g. dead links).
Large Volume: Billions of separate documents.
Unstructured and Redundant Data: No uniform
structure, HTML errors, up to 30% near duplicate
documents.
Quality of Data: No editorial control, false
information, poor quality writing, typos, etc.
Heterogeneous Data: Multiple media types (images,
video), languages, character sets, etc.
28. CHALLENGES FOR PERSONALIZATION IN
WEB IR
From the system centered approach to a
user centered approach to IR
Modeling the user context in personalized
IR
Exploiting the user context to enhance
search quality
The privacy issues
The evaluation issues
(Callout: focused on in the next part of the presentation)
29. POSSIBLE APPROACHES TO INFORMATION
RETRIEVAL
Statistical approaches
◦ Co-occurrence of features between document
and query
◦ Rank documents based on similarity
Semantic approaches
◦ “Understand” the query, find matching
documents
User profile approaches
◦ User profiles store approximations of user
interests
30. BENEFITS OF PERSONALIZED SEARCH
Resolving ambiguity
The profile provides a context for the query in order to reduce ambiguity.
Example: The profile of interests makes it possible to tell which sense of “Jaguar” (“Animal” or “Car”) the user really wants
Revealing hidden treasures
The profile helps surface the most relevant documents, which could otherwise be hidden beyond the top results page
Example: The owner of an iPhone searches for Google Android. Pages referring to both would be most interesting
31. WHERE TO APPLY USER PROFILES?
The user profile can be applied in several ways
To modify the query itself (pre-processing)
Query expansion: the user profile is applied to add terms to the query
To process the results of a query (post-processing)
To present document snippets
Adaptation of meta-search
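The pre-processing and post-processing uses listed above might look like the following sketch. It assumes the profile is a plain dictionary from interest terms to weights; the functions `expand_query` and `rerank`, the profile contents, and the scoring rule are hypothetical and chosen only to make the idea concrete.

```python
def expand_query(query, profile, max_terms=2):
    """Pre-processing: append the highest-weighted profile terms to the query."""
    terms = query.lower().split()
    by_weight = sorted(profile.items(), key=lambda p: (-p[1], p[0]))
    extra = [term for term, _ in by_weight if term not in terms]
    return terms + extra[:max_terms]

def rerank(results, profile):
    """Post-processing: reorder result snippets by summed profile weight."""
    def score(snippet):
        return sum(profile.get(t, 0.0) for t in snippet.lower().split())
    return sorted(results, key=score, reverse=True)

# Hypothetical profile of a user who is mostly interested in cars.
PROFILE = {"car": 0.9, "engine": 0.6, "zoo": 0.1}
EXPANDED = expand_query("jaguar", PROFILE)
RERANKED = rerank(["jaguar animal zoo", "jaguar car engine"], PROFILE)
```

For the ambiguous query "jaguar", the expanded query gains the terms "car" and "engine", and the car-related snippet moves to the top of the re-ranked list.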
32. VARIATIONS OF USER PROFILE USAGE
33. SUMMARY ON IR
Web Information Retrieval is a very challenging
yet exciting area!
Solution: Learn about the individual user in order to match the query with the document
Personalized Web Information Retrieval
Promises significant quality improvements. However, current approaches are far from optimal
Thus, more research is necessary in the field of IR
“Computational Intelligence” could be adopted by search tools to effectively manage searching, retrieving, filtering and presenting relevant information.
34. PRIVATE INFORMATION RETRIEVAL (PIR)
[1995]
Goal: allow user to query database while hiding the
identity of the data-items.
Note: hides identity of data-items; not existence of
interaction with the user.
Motivation: patent databases; stock quotes; web access
and so on.
Paradox(?): imagine buying in a store without the seller
knowing what you buy.
(Encrypting requests is useful against third parties; not
against owner of data.)
35. WHAT IS PRIVATE INFORMATION
RETRIEVAL?
Real-World Example:
Suppose there is a movie database and we want to find information on the movie 'Indian'
We do not want anyone to know about our interest in this movie.
36. THE GOAL OF PIR
Suppose there is a movie database and we want to find information on the movie 'Endiran'
We do not want the database operator to know
about our interest in this movie.
Users' intentions are to be kept secret
37. HOW DOES IT WORK?
Very Simple approach
Download the entire database
Improved approach
Suppose there is a database with blocks $D_1, \dots, D_r$.
A client wants to retrieve block $D_\alpha$ from the database in such a way that the database operator learns nothing about $\alpha$.
Do this without downloading the entire database.
38. GOLDBERG'S SCHEME
We can represent a database of $r$ blocks as an $r \times s$ matrix $D$ and get the $\alpha$th block ($\alpha$th row) of $D$ using simple linear algebra
$D_\alpha = e_\alpha \cdot D$
where $e_\alpha = [0\ 0\ \dots\ 1\ \dots\ 0]$ is a vector with all zeros, except a one in the $\alpha$th coordinate.
There are $\ell$ servers, each with a copy of the database.
We secret-share $e_\alpha$ into $v_1, \dots, v_\ell$ and send one share to each server.
Each server computes and sends its response $r_i = v_i \cdot D$
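The steps above can be sketched as follows. The slides only say that the unit vector is "secret shared"; this sketch assumes Shamir's secret sharing over a prime field, and the modulus `P`, the privacy threshold `t`, the function names, and the toy database are all hypothetical choices for illustration. It covers only the honest-server case and uses 0-based row indices.

```python
import secrets

P = 2_147_483_647  # prime 2^31 - 1; a hypothetical choice of field modulus

def share_query(alpha, r, num_servers, t):
    """Shamir-share the unit vector e_alpha; returns one query vector per server."""
    # One random degree-t polynomial per coordinate, constant term e_alpha[j].
    polys = []
    for j in range(r):
        const = 1 if j == alpha else 0
        polys.append([const] + [secrets.randbelow(P) for _ in range(t)])
    shares = []
    for x in range(1, num_servers + 1):  # server i evaluates at x = i
        shares.append([sum(c * pow(x, e, P) for e, c in enumerate(poly)) % P
                       for poly in polys])
    return shares

def server_response(v, D):
    """Each server computes r_i = v_i . D (vector-matrix product mod P)."""
    return [sum(v[j] * D[j][c] for j in range(len(D))) % P
            for c in range(len(D[0]))]

def reconstruct(responses):
    """Lagrange-interpolate the responses at x = 0 to recover row alpha."""
    k = len(responses)
    block = [0] * len(responses[0])
    for i in range(1, k + 1):
        num, den = 1, 1
        for m in range(1, k + 1):
            if m != i:
                num = num * (-m) % P
                den = den * (i - m) % P
        coeff = num * pow(den, -1, P) % P  # modular inverse, Python 3.8+
        for c in range(len(block)):
            block[c] = (block[c] + coeff * responses[i - 1][c]) % P
    return block

# Toy database: r = 4 blocks of s = 3 field elements each.
DATABASE = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
QUERIES = share_query(alpha=2, r=4, num_servers=3, t=1)
RESPONSES = [server_response(v, DATABASE) for v in QUERIES]
BLOCK = reconstruct(RESPONSES)
```

Each response is a share of the requested row because the vector-matrix product is linear in the query. Interpolation needs at least `t + 1` correct responses; handling wrong responses is outside this sketch.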
39. GOLDBERG'S SCHEME
The responses $r_1, \dots, r_k$ are secret shares of $D_\alpha$ ($k$ is the number of responses).
What happens if some of the responses are
wrong?
40. AOL SEARCH LOG DATA SCANDAL
Search queries logged for user #4417749:
clothes for age 60
60 single men
best retirement city
jarrett arnold
jack t. arnold
jaylene and jarrett arnold
gwinnett county yellow pages
rescue of older dogs
movies for dogs
sinus infection
User identified as: Thelma Arnold, a 62-year-old widow from Lilburn, Georgia
41. OBSERVATION
The owners of databases know a lot about the users!
This poses a risk to users' privacy.
E.g. consider a database with stock prices
What can we do?
Trust them to protect our privacy, or
Use cryptography
42. HOW CAN CRYPTO HELP?
Note: This problem has nothing to do with
secure communication!
[Diagram: user U and database D]
43. CURRENT SETTING
[Diagram: user U and database D connected by a secure link]
A new primitive: Private Information Retrieval (PIR)
44. MODELING PIR
Server: holds an $n$-bit string $x$
$n$ should be thought of as very large
User: wishes to retrieve $x_i$ and to keep $i$ private
45. PRIVATE PROTOCOL TO INFORMATION RETRIEVAL
[Diagram: the SERVER holds $x = x_1, x_2, \dots, x_n \in \{0,1\}^n$; the USER holds an index $i \in \{1, \dots, n\}$ and obtains $x_i$.]
46. NON-PRIVATE PROTOCOL
[Diagram: the USER, with index $i \in \{1, \dots, n\}$, sends $i$ to the SERVER holding $x = x_1, x_2, \dots, x_n$; the SERVER returns $x_i$.]
There is NO privacy preservation.
Communication cost: $\log n$
47. TRIVIAL PRIVATE PROTOCOL
Server sends entire database x to User.
[Diagram: the SERVER holding $x = x_1, x_2, \dots, x_n$ sends $x_1, x_2, \dots, x_n$ to the USER, who reads off $x_i$.]
Information-theoretic privacy.
Communication cost: $n$
Is this optimal?
“The number of bits communicated between U and S has to be smaller than n.”
48. PROBLEM
In any 1-server PIR with information
theoretic privacy the communication is at
least n.
49. POSSIBLE SOLUTIONS
The user asks for additional random indices.
Drawback: reveals a lot of information
Employ general crypto protocols to compute $x_i$ privately.
Drawback: highly inefficient (polynomial in $n$).
Anonymity.
Note: hides the identity of the user; not the fact that $x_i$ is retrieved.
50. ANONYMITY - EXAMPLE
Original Data vs. Anonymized Data
51. TWO APPROACHES
Information-Theoretic PIR
Replicate database among k servers.
Unconditional privacy against t servers.
Computational PIR
Computational privacy, based on cryptographic
assumptions.
52. INFORMATION THEORETIC PRIVACY
(PERFECT PRIVACY)
The distribution of the queries the user sends to any server is independent of the index he/she wishes to retrieve.
This means that each server cannot gain any information about the user's interest, regardless of the server's computational power.
53. COMPUTATIONAL PRIVACY
The distributions of the queries the user sends to any server for different indices are computationally indistinguishable.
This means that each server cannot gain any information about the user's interest, provided that the server is computationally bounded.
54. COMMUNICATION COST
Multiple servers, information-theoretic PIR:
2 servers, comm. $n^{1/2}$
$k$ servers, comm. $n^{1/k}$
$\log n$ servers, comm. $\mathrm{poly}(\log n)$
Single server, computational PIR:
Comm. $\mathrm{poly}(\log n)$
55. K-SERVER PIR
Correctness: User obtains $x_i$
Privacy: No single server gets information about $i$
[Diagram: user U, holding index $i$, queries servers $S_1, S_2, \dots, S_k$, each holding a copy of $x \in \{0,1\}^n$.]
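To make the correctness and privacy conditions concrete, here is a minimal sketch of the simplest two-server case, assuming two non-colluding servers and a linear-communication XOR construction. It is meant only to illustrate the idea and is not the communication-efficient scheme; the function names are hypothetical and indices are 0-based.

```python
import secrets

def make_queries(i, n):
    """Build two selection vectors that differ only at position i."""
    q1 = [secrets.randbelow(2) for _ in range(n)]  # uniformly random subset
    q2 = q1.copy()
    q2[i] ^= 1  # flip membership of index i
    return q1, q2

def answer(x, q):
    """Server side: XOR of the database bits selected by the query."""
    a = 0
    for bit, selected in zip(x, q):
        a ^= bit & selected
    return a

def retrieve_bit(x, i):
    """User side: the XOR of both answers cancels everything except x[i]."""
    q1, q2 = make_queries(i, len(x))
    return answer(x, q1) ^ answer(x, q2)

X = [1, 0, 1, 1, 0, 0, 1, 0]  # toy database, replicated on both servers
RECOVERED = [retrieve_bit(X, i) for i in range(len(X))]
```

Each server on its own sees a uniformly random bit vector, so its view is independent of the index, while the two answers together always yield the requested bit. The price is that each query is as long as the database.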
56. PIR PROPERTIES
Database input: $B_1, B_2, \dots, B_w$
User input: index $i \in \{1, \dots, w\}$
The user and the database are polynomial-time randomized interactive algorithms
• correctness: the user learns $B_i$
• secrecy (of the user): the database does not learn $i$
• non-triviality: the total communication is $< w$
Note: secrecy of the database is not required
These properties need to be defined more formally!
57. PIR PROPERTIES
Correctness
In every invocation of the protocol the user retrieves
the bit he is interested in (i.e. xi)
Privacy
In every invocation of the protocol each server does
not gain any information about the index of the bit
retrieved by the user (i.e. i).
58. PIR DOESN'T EXIST [1/4]
Correctness, Non-triviality and Secrecy CANNOT be satisfied simultaneously.
Def: A transcript T is possible for (i, B) if P(T(i, B) = T) > 0
Take some T', and look where it is possible:
[Figure: grid of indices i against databases B, with T' marked in the cells where it is possible.]
59. PIR DOESN'T EXIST [2/4]
secrecy → if T' is possible for some B and i, then it is possible for B and all the other i's
[Figure: grid of indices i against databases B, with T' marked across all indices in the rows where it is possible.]
60. PIR DOESN'T EXIST [3/4]
non-triviality → length(transcript) < length(database)
↓
# transcripts < # databases
↓
there has to exist a T' that is possible for two databases $B_0$ and $B_1$
[Figure: grid of indices i against databases B, with T' marked across the rows for $B_0$ and $B_1$.]
61. PIR DOESN'T EXIST [4/4]
$B_0$ and $B_1$ differ on at least one index i'. So, if i' is the input of the user, then
correctness → contradiction
[Figure: grid of indices i against databases B, with T' marked across the rows for $B_0$ and $B_1$ and the column for index i' highlighted.]
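The four steps of this argument can be condensed into a single chain. The following is only a sketch, assuming a single server, binary transcripts, an $n$-bit database, and perfect correctness and secrecy; the symbols $\rho$ (the user's random coins), $\sigma_0, \sigma_1$ (the server's random coins) and $\mathrm{out}$ (the user's output function) are introduced here for illustration and do not appear in the slides.

```latex
\begin{align*}
&\textbf{Non-triviality: } |T| < n
  \;\Rightarrow\; \#\{\text{transcripts}\} \le \sum_{j=0}^{n-1} 2^{j} = 2^{n}-1
  < 2^{n} = \#\{\text{databases}\}. \\
&\textbf{Pigeonhole (fix any index } i_0\text{): }
  \exists\, B_0 \ne B_1 \text{ and } T' \text{ possible for both } (i_0, B_0) \text{ and } (i_0, B_1). \\
&\textbf{Secrecy: } T' \text{ is then possible for } (i, B_0) \text{ and } (i, B_1) \text{ for every index } i. \\
&\textbf{Choose } i' \text{ with } B_0[i'] \ne B_1[i'].
  \text{ Let } (\rho, \sigma_0) \text{ produce } T' \text{ on } (i', B_0)
  \text{ and let } \sigma_1 \text{ produce the server side of } T' \text{ on } B_1. \\
&\text{The user's messages depend only on } (i', \rho) \text{ and the server's messages, so }
  (\rho, \sigma_1) \text{ also produces } T' \text{ on } (i', B_1). \\
&\textbf{Correctness: } \mathrm{out}(i', \rho, T') = B_0[i']
  \quad\text{and}\quad \mathrm{out}(i', \rho, T') = B_1[i'],
  \text{ contradicting } B_0[i'] \ne B_1[i'].
\end{align*}
```

The key observation is that the user's output is a function of its own input, its own coins, and the transcript, none of which distinguish $B_0$ from $B_1$.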
62. THUS, IDEAL PIR DOESN'T EXIST!
How to bypass the impossibility result?
Two ideas:
limit the computing power of a cheating database
use a larger number of “independent” databases
63. SUMMARY
Complexity of PIR
Communication
Computation
Possible Extensions
Symmetric PIR
User may not learn any item other than the one he/she
requested
Searching by key-words
Public-key encryption with key-word search
64. REFERENCES
Xiaohui Tao, Yuefeng Li, and Ning Zhong, “A Personalized Ontology Model for Web Information Gathering”, IEEE Trans. Knowledge and Data Engg., vol. 23, no. 4, pp. 496-511, April 2011.
Markus Strohmaier and Mark Kröll, “Acquiring Knowledge about Human Goals from Search Query Logs”, ACM Transactions on Information System, March 2011.
K.W.-T. Leung, W. Ng, and D.L. Lee, “Deriving Concept-Based User Profiles from Search Engine Logs”, IEEE Trans. Knowledge and Data Engg., vol. 22, no. 7, pp. 969-982, July 2010.
Zhicheng Dou, Ruihua Song, Ji-Rong Wen, and Xiaojie Yuan, “Evaluating the Effectiveness of Personalized Web Search”, IEEE Trans. Knowledge and Data Engg., vol. 21, no. 8, pp. 1178-1190, Aug. 2009.
Y. Li and N. Zhong, “Mining Ontology for Automatically Acquiring Web User Information Needs”, IEEE Trans. Knowledge and Data Engg., vol. 18, no. 4, pp. 554-568, April 2006.
Fang Liu, Clement Yu, and Weiyi Meng, “Personalized Web Search for Improving Retrieval Effectiveness”, IEEE Trans. Knowledge and Data Engg., vol. 16, no. 1, pp. 28-40, January 2004.
B. Chor, O. Goldreich, E. Kushilevitz, and M. Sudan, “Private Information Retrieval”, Journal of the ACM, vol. 45, no. 6, pp. 965-982, 1998.