Social networks were initially places for people to contact each other and to find friends or new acquaintances, and as such they have always been interesting targets for machine-aided analysis. Recent developments, however, have turned social networks into one of the main arenas of information exchange, opinion expression, and debate. As a result, there is growing interest in both analyzing and integrating social network services. In this environment, efficient information retrieval is hindered by the vast amount and varying quality of user-generated content. Guiding users to relevant information is a valuable service and also a difficult task, and a crucial part of the process is accurately resolving duplicate entities to real-world ones. In this paper we propose a novel approach that uses the principles of link mining to extend the methodology of entity resolution to multitype problems. The proposed method is presented using an illustrative real-world example based on a social network and validated by comprehensive evaluation of the results.
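To make the idea concrete, here is a minimal Python sketch of collective entity resolution in the spirit of the abstract: attribute similarity is combined with link (neighbourhood) evidence, which is what lets matches propagate across entity types. The weights, threshold, and helper names are illustrative assumptions, not the paper's actual model.

```python
from difflib import SequenceMatcher

def attr_sim(a, b):
    """Attribute evidence: string similarity of entity names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def link_sim(nbrs_a, nbrs_b):
    """Link evidence: Jaccard similarity of already-resolved neighbours."""
    union = nbrs_a | nbrs_b
    return len(nbrs_a & nbrs_b) / len(union) if union else 0.0

def same_entity(a, b, nbrs_a, nbrs_b, w=0.6, threshold=0.7):
    """Toy collective resolution: combine attribute and link similarity.
    Weights and threshold are illustrative, not taken from the paper."""
    return w * attr_sim(a, b) + (1 - w) * link_sim(nbrs_a, nbrs_b) >= threshold

# Two name variants that share resolved neighbours (employer, city nodes).
print(same_entity("J. Smith", "John Smith", {"acme", "nyc"}, {"acme", "nyc"}))
```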
Graphs have become the dominant representation for many tasks, as they provide a structure for capturing entities and their relations. A powerful role of networks/graphs is to connect the local features of individual vertices into patterns that help explain how nodal relations and edges produce complex effects that ripple through a graph. User clusters form as a result of interactions between entities, yet many users today can hardly categorize their contacts into groups such as "family", "friends", or "colleagues". Analyzing a user's social graph for implicit clusters therefore enables dynamism in contact management. This study seeks to implement this dynamism through a comparative study of a deep neural network and a friend-suggestion algorithm. We analyze a user's implicit social graph and seek to automatically create custom contact groups using metrics that classify contacts based on the user's affinity to them. Experimental results demonstrate the importance of both the implicit group relationships and the interaction-based affinity in suggesting friends.
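A minimal Python sketch of the interaction-based affinity idea described above: contacts are scored by recency-weighted interaction counts, and group suggestions rank contacts that co-occur with a seed contact. The decay weighting and function names are illustrative assumptions, not the study's actual metrics.

```python
import math
from collections import defaultdict

def affinity_scores(interactions, half_life_days=30.0):
    """Score each contact by interaction frequency and recency: every
    interaction contributes a weight that decays exponentially with age."""
    scores = defaultdict(float)
    for contact, days_ago in interactions:
        scores[contact] += math.exp(-days_ago / half_life_days)
    return dict(scores)

def suggest_for_group(seed, co_occurrence, scores, k=5):
    """Rank contacts that co-occur with `seed` in past interactions
    (the implicit group signal), weighted by the user's affinity."""
    candidates = co_occurrence.get(seed, {})
    ranked = sorted(candidates,
                    key=lambda c: candidates[c] * scores.get(c, 0.0),
                    reverse=True)
    return ranked[:k]

interactions = [("alice", 1), ("alice", 3), ("bob", 2), ("carol", 40)]
co_occurrence = {"alice": {"bob": 3, "carol": 1}}  # shared threads/emails
print(suggest_for_group("alice", co_occurrence, affinity_scores(interactions)))
```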
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING (dannyijwest)
Social networks have become one of the most popular platforms allowing users to communicate and share their interests without being in the same geographical location. The great and rapid growth of social media sites such as Facebook, LinkedIn, and Twitter has produced a huge amount of user-generated content. Improving information quality and integrity has thus become a great challenge for all social media sites, which must let users get the desired content or be linked to the best relation using improved search/link techniques. Introducing semantics to social networks widens the representation of the social network. In this paper, a new model of social networks based on semantic tag ranking is introduced. The model is based on the concept of multi-agent systems. In the proposed model, the representation of social links is extended by the semantic relationships found in the vocabularies known as tags in most social networks. The proposed model for the social media engine is based on enhanced Latent Dirichlet Allocation (E-LDA) as a semantic indexing algorithm, combined with TagRank as the social network ranking algorithm. The E-LDA phase improves on LDA by optimizing its parameters, and a filter is then introduced to enhance the final indexing output. In the ranking phase, applying TagRank on top of the indexing phase improves the ranking output. Simulation results of the proposed model show improvements in both indexing and ranking output.
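As an illustration of the two-phase pipeline, the sketch below uses plain LDA (standing in for E-LDA, whose parameter optimization and filter are not reproduced here) for semantic indexing, followed by a PageRank-style score over a tag co-occurrence graph as a TagRank stand-in. The corpus, tag sets, and parameters are toy assumptions.

```python
import networkx as nx
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = ["social network graph mining", "semantic tags for social search",
        "ranking tagged content in networks"]
tags = [["graph", "mining"], ["semantic", "tags"], ["ranking", "tags"]]

# Phase 1: semantic indexing with LDA (plain LDA, not the paper's E-LDA).
X = CountVectorizer().fit_transform(docs)
doc_topics = LatentDirichletAllocation(n_components=2,
                                       random_state=0).fit_transform(X)

# Phase 2: TagRank-style scoring. Tags that co-occur on a document link to
# each other; PageRank over this graph yields a global tag ranking.
G = nx.Graph()
for tag_list in tags:
    for a in tag_list:
        for b in tag_list:
            if a != b:
                G.add_edge(a, b)
print(sorted(nx.pagerank(G).items(), key=lambda kv: -kv[1]))
```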
https://javacoffeeiq.com
As Alex Pentland puts it in his productivity study, "fewer memos, more coffee breaks": socialisation and collaboration among staff members increase productivity.
Evolving social data mining and affective analysis (Athena Vakali)
Evolving social data mining and affective analysis: methodologies, frameworks and applications - Web 2.0 facts and social data
Social associations and all kinds of graphs
Evolving social data mining
Emotion-aware social data analysis
Frameworks and applications
Survey of data mining techniques for social network analysis (Firas Husseini)
This document summarizes data mining techniques that have been used for social network analysis. It discusses how social networks generate massive amounts of data that present computational challenges due to their size, noise, and dynamism. It then reviews both traditional and recent unsupervised, semi-supervised, and supervised data mining techniques that have been applied to social network analysis to handle these challenges and discover useful knowledge from social network data, including graph theoretic techniques, tools for analyzing opinions and sentiment, and techniques for topic detection and tracking.
Online Data Preprocessing: A Case Study Approach (IJECEIAES)
Besides Internet search and e-mail, social networking is now one of the three most popular uses of the Internet. A tremendous number of volunteers write articles and share photos, videos, and links every day, at a scope and scale never imagined before. However, because social network data are huge and come from heterogeneous sources, the data are highly susceptible to inconsistency, redundancy, noise, and loss. For data scientists, preparing the data and getting it into a standard format is critical, because data quality directly affects the performance of the mining algorithms applied next. Low-quality data will certainly limit the analysis and lower the quality of the mining results. To this end, the goal of this study is to provide an overview of the different phases involved in data preprocessing, with a focus on social network data. As a case study, we show how we applied preprocessing to the data that we collected for Malaysia Airlines Flight MH370, which disappeared in 2014.
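A small Python sketch of typical cleaning steps for noisy social posts (unicode normalization, link and mention removal, whitespace collapsing, deduplication); the study's exact pipeline may differ.

```python
import re
import unicodedata

def clean_post(text):
    """Typical cleaning for noisy social posts: normalize unicode,
    drop URLs and @mentions, collapse whitespace, lowercase."""
    text = unicodedata.normalize("NFKC", text)
    text = re.sub(r"https?://\S+", " ", text)   # remove links
    text = re.sub(r"@\w+", " ", text)           # remove mentions
    return re.sub(r"\s+", " ", text).strip().lower()

posts = ["RT @user Breaking: MH370 update https://t.co/xyz",
         "RT @user Breaking: MH370 update https://t.co/xyz"]
cleaned = {clean_post(p) for p in posts}  # a set also deduplicates retweets
print(cleaned)
```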
Looking beyond plain text for document representation in the enterprise (Arjen de Vries)
In many real life scenarios, searching for information is not the user's end goal. In this presentation I look into the specific example of corporate strategy and business development in a university setting.
In today's academic institutions, strategic questions are those that relate to dependency on funding instruments, the public private partnerships that exist (and those that should be extended!), and the match between topic areas addressed by the research staff and those claimed important by policy makers. The professional search tasks encountered to answer questions in this domain are usually addressed by business intelligence (BI) tools, and not by search engines. However, professionals are known to be busy people inspired by their own research interests, and not particularly fond of keeping the
customer relationship management (CRM) or knowledge management systems up to date for the organisation's strategic interest. This then results in incomplete and inaccurate data.
Instead of requiring research staff (or their administrative support) to provide this management information, I will illustrate by example how the desired information usually exists already in the documents inherent to the academic work process. Information retrieval could thus play an important role in the computer systems that support the business analytics involved, and could significantly improve the coverage of entities of interest - i.e., to reduce the effort involved in achieving good recall in business analytics. The ranking functionality over the enterprise's (textual) content should however not be an isolated component. Our example setting integrates the information derived from research proposals, research publications and the financial systems, providing an excellent motivation for a more unified approach to structured and unstructured data.
AN INTEGRATED RANKING ALGORITHM FOR EFFICIENT INFORMATION COMPUTING IN SOCIAL... (ijwscjournal)
Social networks have widened the gap between the face of the WWW stored traditionally in search engine repositories and the actual, ever-changing face of the Web. The exponential growth of web users, and the ease with which they can upload content, highlights the need for content controls on material published on the web. As the definition of search changes, socially enhanced interactive search methodologies are the need of the hour. Ranking is pivotal for efficient web search, as search performance depends mainly on the ranking results. In this paper, a new integrated ranking model is proposed, based on the fused rank of a web object computed from the popularity earned over only valid interlinks from multiple social forums. The model identifies relationships between web objects in separate social networks based on the object inheritance graph. An experimental study indicates the effectiveness of the proposed fusion-based ranking algorithm in terms of better search results.
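The abstract does not give the fusion formula, so the sketch below shows one plausible shape of a fused rank: a weighted sum of per-forum popularity that counts only interlinks marked valid. The weights and field names are assumptions for illustration only.

```python
def fused_rank(popularity_by_forum, valid, weights):
    """Fuse per-forum popularity of a web object into one score, counting
    only forums whose interlinks passed validation (illustrative form)."""
    score = 0.0
    for forum, pop in popularity_by_forum.items():
        if valid.get(forum, False):
            score += weights.get(forum, 1.0) * pop
    return score

pop = {"forumA": 120, "forumB": 40, "spam_forum": 999}
valid = {"forumA": True, "forumB": True, "spam_forum": False}
print(fused_rank(pop, valid, {"forumA": 0.7, "forumB": 0.3}))  # spam ignored
```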
This document discusses social network analysis and its applications. It defines a social network as a social structure made up of individuals or organizations connected by relationships. Social network analysis maps and measures these relationships to derive insights. The document provides an overview of key social network analysis concepts like nodes, edges, degrees of separation, centrality, and communities. It then discusses various business applications of social network analysis like churn reduction, cross-selling, marketing, and competitive analysis. Overall, the document promotes social network analysis as a technique for understanding relationship dynamics and improving business outcomes.
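The concepts listed above map directly onto standard graph tooling; a short Python example with networkx computes centrality, degrees of separation, and communities on the classic karate-club toy network.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

G = nx.karate_club_graph()                      # classic toy social network
print(nx.degree_centrality(G)[0])               # how connected node 0 is
print(nx.betweenness_centrality(G)[0])          # how often it bridges paths
print(nx.shortest_path_length(G, 0, 33))        # degrees of separation
print([sorted(c) for c in greedy_modularity_communities(G)])  # communities
```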
The growth of social media over the last decade has revolutionized the way individuals interact and industries conduct business. Individuals produce data at an unprecedented rate by interacting, sharing, and consuming content through social media. Understanding and processing this new type of data to glean actionable patterns presents challenges and opportunities for interdisciplinary research, novel algorithms, and tool development. Social Media Mining integrates social media, social network analysis, and data mining to provide a convenient and coherent platform for students, practitioners, researchers, and project managers to understand the basics and potentials of social media mining. It introduces the unique problems arising from social media data and presents fundamental concepts, emerging issues, and effective algorithms for network analysis and data mining. Suitable for use in advanced undergraduate and beginning graduate courses as well as professional short courses, the text contains exercises of different degrees of difficulty that improve understanding and help apply concepts, principles, and methods in various scenarios of social media mining.
Details at: http://dmml.asu.edu/smm/
Organizational Overlap on Social Networks and its Applications (Sam Shah)
[This work was presented at WWW 2013.]
Online social networks have become important for networking, communication, sharing, and discovery. A considerable challenge these networks face is the fact that an online social network is partially observed, because two individuals might know each other but may not have established a connection on the site. Therefore, link prediction and recommendations are important tasks for any online social network. In this paper, we address the problem of computing edge affinity between two users on a social network, based on the users belonging to organizations such as companies, schools, and online groups. We present experimental insights from social network data on organizational overlap, a novel mathematical model to compute the probability of connection between two people based on organizational overlap, and experimental validation of this model based on real social network data. We also present novel ways in which the organizational overlap model can be applied to link prediction and community detection, which in itself could be useful for recommending entities to follow and generating a personalized news feed.
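The paper fits a full probabilistic model; as the simplest possible proxy for the same signal, a Jaccard overlap of two users' organization sets already yields a usable link-prediction feature:

```python
def overlap_affinity(orgs_u, orgs_v):
    """Jaccard overlap of the organizations two users belong to; only a
    toy proxy for the paper's probabilistic overlap model."""
    union = orgs_u | orgs_v
    return len(orgs_u & orgs_v) / len(union) if union else 0.0

u = {"acme_corp", "state_university", "running_club"}
v = {"state_university", "running_club", "chess_group"}
print(overlap_affinity(u, v))  # 0.5 -> strong link-prediction candidate
```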
Social media sites (by some referred to as the web 2.0) allow their users to interact with each other, for example in collecting and sharing so-called user-generated content - these can be just bookmarks, but also blogs, images, and videos. Social media support co-creation: processes where customers (or users, if you prefer) do not just consume but play an active role in defining and shaping the end product. Famous examples include Six Degrees, LiveJournal, Digg, Epinions, Myspace, Flickr, YouTube, LinkedIn, and Pinterest. Of course, today's internet giants Facebook and Twitter are key new developments. Finally, Wikipedia should not be overlooked - a major resource in many language technologies including information retrieval!
The second part of the lecture looks into the opportunities for information retrieval research. Social media platforms tend to provide access to user profiles, connections between users, the content these users publish or share, and how they react to each other's content through commenting and rating. Also, the large majority of social media platforms allow their users to categorize content by means of tags (or, in direct communication, through hash-tags), resulting in collaborative ways of information organization known as folksonomies. However, these social media also form a challenge for information retrieval research: the many platforms vary in functionalities, and we have only very little understanding of clearly desirable features like combining tag usage and ratings in content recommendation! A unifying approach based on random walks will be discussed to illustrate how we can answer some of these questions [1], but clearly the area has ample opportunity to leave your own marks.
In the final part of the lecture I will briefly touch upon an even wider range of opportunities, where data derived from social media form a key component enabling new research and insights. I will review a few important results from research centered on Wikipedia, Facebook, and Twitter data, as well as a diverse range of new information sources, including the geo- and temporal information derived from images and tweets, product reviews and comments on YouTube videos, and how URL shorteners may give a view on what is popular on the web.
[1] Maarten Clements, Arjen P. De Vries, and Marcel J. T. Reinders. 2010. The task-dependent effect of tags and ratings on social media access. ACM Trans. Inf. Syst. 28, 4, Article 21 (November 2010), 42 pages. http://doi.acm.org/10.1145/1852102.1852107
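The unifying random-walk idea can be illustrated with personalized PageRank (a random walk with restart) over a small user-item-tag graph; this is only the shared intuition, not the actual model of [1]. Assumes networkx 2.2+ so that the personalization dictionary may cover a subset of nodes.

```python
import networkx as nx

# Toy graph: users connect to items they rated and tags they used.
G = nx.Graph()
G.add_edges_from([("u1", "item1"), ("u1", "tag:jazz"),
                  ("u2", "item1"), ("u2", "item2"), ("item2", "tag:jazz")])

# Personalized PageRank = random walk with restart at u1; high-scoring
# items the user has not touched yet become recommendations.
scores = nx.pagerank(G, personalization={"u1": 1.0})
recs = [n for n in sorted(scores, key=scores.get, reverse=True)
        if n.startswith("item") and not G.has_edge("u1", n)]
print(recs)  # -> ['item2'], reachable via the shared item and tag
```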
An improvised model for identifying influential nodes in multi parameter soci... (csandit)
Influence maximization is one of the major tasks in viral marketing and community detection. Based on the observation that social networks are, in general, multi-parameter graphs while viral marketing and influence maximization depend on only a few parameters, we propose converting general social networks into "interest graphs". We propose an improved model for identifying influential nodes in multi-parameter social networks using these interest graphs. Experiments conducted on the interest graphs have shown better results than the method proposed in [8].
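A minimal sketch of the "interest graph" construction: edges of a multi-parameter social graph are filtered on one interest, and a crude degree-based proxy then picks influential nodes. The edge attribute name and the seed-selection heuristic are illustrative assumptions, much simpler than the paper's model.

```python
import networkx as nx

def interest_graph(G, interest):
    """Keep only edges tagged with the target interest, turning a
    multi-parameter social graph into a single 'interest graph'."""
    H = nx.Graph()
    H.add_edges_from((u, v) for u, v, d in G.edges(data=True)
                     if interest in d.get("interests", ()))
    return H

G = nx.Graph()
G.add_edge("a", "b", interests={"sports"})
G.add_edge("b", "c", interests={"sports", "music"})
G.add_edge("c", "d", interests={"music"})

H = interest_graph(G, "sports")
# Crude influence proxy: highest-degree nodes in the interest graph.
print(sorted(H.degree, key=lambda nd: -nd[1])[:2])
```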
Subscriber Churn Prediction Model using Social Network Analysis In Telecommun... (BAINIDA)
Subscriber Churn Prediction Model using Social Network Analysis in the Telecommunication Industry, by เชษฐพงศ์ ปัญญาชนกุล and Dr. อานนท์ ศักดิ์วรวิชญ์.
Presented at THE FIRST NIDA BUSINESS ANALYTICS AND DATA SCIENCES CONTEST/CONFERENCE, organized by the Graduate School of Applied Statistics and DATA SCIENCES THAILAND.
Literature Review on Social Networking in Supply chain (Sujoy Bag)
This document provides a literature review on social networking in supply chains. It begins with an introduction discussing how social media influences purchase decisions and how products gain popularity through social influence. The objective is to reduce uncertainty and delay to maximize profit and utility. The literature review then summarizes several academic papers from 2008-2016 addressing topics like influence maximization in networks, learning to coordinate in social networks, competing over networks, and using social networks to optimize demand forecasting and pricing. References are provided at the end.
Crowdsourcing the Policy Cycle - Collective Intelligence 2014 (Araz Taeihagh)
Crowdsourcing is beginning to be used for policymaking. The “wisdom of crowds” [Surowiecki 2005], and crowdsourcing [Brabham 2008], are seen as new avenues that can shape all kinds of policy, from transportation policy [Nash 2009] to urban planning [Seltzer and Mahmoudi 2013], to climate policy. In general, many have high expectations for positive outcomes with crowdsourcing, and based on both anecdotal and empirical evidence, some of these expectations seem justified [Majchrzak and Malhotra 2013]. Yet, to our knowledge, research has yet to emerge that unpacks the different forms of crowdsourcing in light of each stage of the well-established policy cycle. This work addresses this research gap, and in doing so brings increased nuance to the application of crowdsourcing techniques for policymaking.
Results for Web Graph Mining Base Recommender System for Query, Image and Soc... (iosrjce)
1) The document discusses results from implementing a web graph mining based recommender system for queries, images, and social networks using query suggestion and heat diffusion algorithms (a heat-diffusion sketch follows this list).
2) It shows that the system effectively recommends queries and images that are literally and semantically related to test queries and images.
3) It also demonstrates recommending items to users in a social network based on trust values between users, combining user-user and user-item relationships to provide personalized recommendations.
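For the heat-diffusion part, a small numpy sketch shows the generic discrete diffusion step on a graph; the recommender's actual operator and parameters may differ.

```python
import numpy as np

def heat_diffusion(A, h0, alpha=0.5, steps=10):
    """Discrete heat diffusion on a graph with adjacency A:
    h(t+1) = h(t) + alpha * (A_norm.T @ h(t) - h(t)),
    so heat flows from each node to its neighbours while being conserved."""
    deg = A.sum(axis=1, keepdims=True)
    A_norm = A / np.maximum(deg, 1)          # row-normalized adjacency
    h = h0.astype(float)
    for _ in range(steps):
        h = h + alpha * (A_norm.T @ h - h)
    return h

# Four items; heat starts on item 0 (the query) and spreads to related items.
A = np.array([[0, 1, 1, 0], [1, 0, 0, 0], [1, 0, 0, 1], [0, 0, 1, 0]])
print(heat_diffusion(A, np.array([1.0, 0, 0, 0])))
```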
Experiments on Crowdsourcing Policy Assessment - Oxford IPP 2014 (Araz Taeihagh)
This document summarizes an experiment that tested whether non-expert crowds can assess policy measures as well as experts. Researchers conducted two experiments using crowdsourced labor markets: one with Dutch participants and one with global participants. Both crowds evaluated 96 climate change adaptation policy measures on criteria like importance and urgency. The measures were previously assessed by Dutch experts. Results were analyzed to see how non-expert crowd assessments compared to expert assessments and whether geography impacted non-expert performance. The goal was to determine if crowds could be useful for policy design by evaluating policy alternatives.
An empirical study on the usage of social media in German B2C online stores (ijait)
Customers in electronic commerce (e-commerce) are shifting more and more from content consumers to content producers. Social media features (like customer reviews) allow and encourage user interaction in online stores. An interesting question is which social media features are actually provided by online stores to support user interaction. We contribute knowledge to this question by studying the social media features of the 115 highest-grossing German B2C online stores from the years 2010 and 2011. We categorize the results of the observational study into the seven building blocks of social media to understand which areas of social media are used the most in these online stores. The results of our study show that the average online store implements about five social media features and that the majority of the features are placed on product pages. The most common features were customer reviews and ratings and the sharing and liking of product details.
Sampling of User Behavior Using Online Social Network (Editor IJCATR)
The popularity of online networks provides an opportunity to study the characteristics of online social network graphs, which is important both to improve current systems and to design new applications of online social networks. Although personalized search has been proposed for many years and many personalization strategies have been investigated, it is still unclear whether personalization is consistently effective on different queries, for different users, and under different search contexts. In this paper, we study the performance of information collection in a dynamic social network. By analyzing the results, we reveal that personalized search offers significant improvement over common web search. The mixing time of the sampling process strongly depends on the characteristics of the graph.
Page access coefficient algorithm for information filtering (IAEME Publication)
The document presents a new algorithm called PAC (Page Access Coefficient) to calculate the rank of web pages for information filtering in social networks. The PAC algorithm aims to reduce computational calculations and time complexity compared to existing PageRank and Weighted PageRank algorithms. The PAC value for a web page is calculated based on the number of incoming links to the page, outgoing links from the page, and total number of pages. This value is used to rank pages when retrieving information, with the goal of presenting users the most relevant results first using less computations than other algorithms.
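The summary names the inputs (in-links, out-links, total page count) but not the closed form, so the following is only an illustrative local score built from those inputs; the paper's actual PAC formula may differ. The point it demonstrates is that, unlike PageRank, no iteration over the whole graph is needed.

```python
def pac_score(in_links, out_links, total_pages):
    """Illustrative page score from purely local link counts: reward pages
    that attract many of the collection's links and mostly receive rather
    than emit them. NOT the published PAC formula, just the same inputs."""
    return (in_links / total_pages) * (in_links / ((in_links + out_links) or 1))

pages = {"p1": (120, 10), "p2": (40, 80), "p3": (5, 2)}   # (in, out) counts
N = 10_000                                                # collection size
print(sorted(pages, key=lambda p: -pac_score(*pages[p], N)))
```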
1. The document discusses the potential uses of social media analytics in local government. It identifies four main purposes: communication, public relations, customer services, and public consultation/engagement.
2. It details a research project with two UK city councils that included a workshop introducing social media analytics tools and interviews. The workshop aimed to demonstrate tools and generate reflection on their value.
3. Interview findings suggested analytics could provide insights supplementing existing knowledge of public networks and discussions. Tools were seen as potentially valuable across different council departments.
Small Worlds With a Difference: New Gatekeepers and the Filtering of Politica... (Pascal Juergens)
Political discussions on social network platforms represent an increasingly relevant source of political information, an opportunity for the exchange of opinions and a popular source of quotes for media outlets. We analyzed political communication on Twitter during the run-up to the German general election of 2009 by extracting a directed network of user interactions based on the exchange of political information and opinions. In consonance with expectations from previous research, the resulting network exhibits small-world properties, lending itself to fast and efficient information diffusion. We go on to demonstrate that precisely the highly connected nodes, characteristic for small-world networks, are in a position to exert strong, selective influence on the information passed within the network. We use a metric based on entropy to identify these New Gatekeepers and their impact on the information flow. Finally, we perform an analysis of their input and output of political messages. It is shown that both the New Gatekeepers and ordinary users tend to filter political content on Twitter based on their personal preferences. Thus, we show that political communication on Twitter is at the same time highly dependent on a small number of users, critically positioned in the structure of the network, as well as biased by their own political perspectives.
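The entropy-based gatekeeper metric can be illustrated in a few lines: the entropy of the distribution of sources (or topics) a user forwards distinguishes selective filters from even-handed relayers. This is a simplified version of the paper's measure.

```python
import math
from collections import Counter

def source_entropy(forwarded_sources):
    """Shannon entropy of the sources a user passes along: low entropy
    means the user forwards from few sources, i.e. filters selectively."""
    counts = Counter(forwarded_sources)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

print(source_entropy(["a", "a", "a", "b"]))   # selective gatekeeper
print(source_entropy(["a", "b", "c", "d"]))   # even-handed relayer
```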
Social network websites: best practices from leading services (Fabernovel)
This document provides a general background for understanding social network websites, based on a study of online matchmaking websites and business network websites. This study is only a first step. Distributed under a Creative Commons license, it should be completed and improved through the contributions of external experts, firms, and web users, as major moves in the industry are expected to occur in the coming months.
Topological Relations in Linked Geographic Data (Blake Regalia)
Computing and querying strict, approximate and metrically refined topological relations in Linked Geographic Data. Enriching DBpedia knowledge graph with place-place topology using complex geometries from OpenStreetMap to support querying and reasoning.
https://blake-regalia.net/resource/2019-TGIS_Topology.pdf
R. Zafarani, M. A. Abbasi, and H. Liu, Social Media Mining: An Introduction, Cambridge University Press, 2014.
Free book and slides at http://socialmediamining.info/
This document summarizes a presentation about exploring the structure of government on the web. It discusses a research project comparing the institutional structure of e-government in the UK and Australia by analyzing government hyperlink networks. The project aims to assess whether online network structures reflect offline institutions and determine the "nodality" or centrality of government websites. It describes collecting hyperlink data from government seed websites in both countries using crawling software. Network reduction techniques and community detection algorithms are discussed to analyze the large datasets. Preliminary results found influential communities formed around social media and some government business websites. The document also covers manually coding a sample of websites to train machine learning models to automatically classify the policy domains of other websites.
1. The document proposes techniques to improve search performance by matching schemas between structured and unstructured data sources.
2. It involves constructing schema mappings using named entities and schema structures. It also uses strategies to narrow the search space to relevant documents.
3. The techniques were shown to improve search accuracy and reduce time/space complexity compared to existing methods.
This document discusses different approaches for analyzing social media data to gain customer insights:
1) Channel reporting tools provide overviews of specific social media platforms but lack deeper insights.
2) Scorecard systems aggregate data across sources but users cannot enhance the data.
3) Text mining analyzes sentiment but network analysis examines relationships; each technique has limitations alone.
4) The document proposes combining text mining, network analysis, and other techniques using a predictive analytics platform to generate new insights, as was done successfully for a major European telecom company.
It provides examples analyzing publicly available Slashdot data to identify influencers and show how sentiment relates to influence.
An Improvised Fuzzy Preference Tree Of CRS For E-Services Using Incremental A... (IJTET Journal)
This document describes a proposed algorithm for improving recommendation systems for e-services. It involves the following key steps:
1. Clustering customer transaction histories to group similar purchase patterns and derive customer-based recommendations.
2. Using incremental association rule mining on the transaction data to detect frequently purchased item sets and relationships between items.
3. Developing a fuzzy model to classify customers and provide dynamic recommendations tailored to different customer types. The recommendations will be based on matching customer preferences and purchase histories to specific product sets.
4. The algorithm clusters transactions, mines association rules incrementally as new data is added, and generates recommendations by classifying customers and matching them to relevant product clusters. This provides personalized and dynamic recommendations.
Unification Algorithm in Hefty Iterative Multi-tier Classifiers for Gigantic ... (Editor IJAIEM)
Dr. G. Anandharaj (1), Dr. P. Srimanchari (2)
(1) Associate Professor and Head, Department of Computer Science, Adhiparasakthi College of Arts and Science (Autonomous), Kalavai, Vellore (Dt) - 632506
(2) Assistant Professor and Head, Department of Computer Applications, Erode Arts and Science College (Autonomous), Erode (Dt) - 638001
ABSTRACT
With the unpredictable increase in mobile apps, more and more threats are migrating from the outmoded PC client to the mobile device. Whereas the traditional Windows/Intel alliance dominates the PC, the Android alliance dominates the mobile Internet, and apps have replaced PC client software as the foremost target of malicious usage. In this paper, to improve the trustworthiness of recent mobile apps, we propose a methodology for evaluating mobile apps based on a cloud computing platform and data mining. Compared with traditional methods, such as permission-pattern-based methods, it combines dynamic and static analysis to comprehensively evaluate an Android application. The Internet of Things (IoT) denotes a worldwide network of interconnected, uniquely addressable items communicating via standard protocols. Accordingly, to prepare us for the forthcoming invasion of things, data fusion can be used to manipulate and manage such data in order to improve processing efficiency and provide advanced intelligence. In this paper, we propose an efficient multidimensional fusion algorithm for IoT data based on partitioning. Finally, attribute reduction and rule extraction methods are used to obtain the synthesis results. By proving a few theorems and by simulation, the correctness and effectiveness of this algorithm are illustrated. This paper also introduces and investigates large iterative multi-tier ensemble (LIME) classifiers specifically tailored for big data. These classifiers are very hefty but quite easy to generate and use; they can be so large that it makes sense to use them only for big data. Our experiments compare LIME classifiers with various base classifiers and with standard ensemble meta-classifiers. The results obtained demonstrate that LIME classifiers can significantly increase classification accuracy, performing better than both the base classifiers and the standard ensemble meta-classifiers.
Keywords: LIME classifiers, ensemble meta-classifiers, Internet of Things, big data
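A hedged sketch of the multi-tier ensemble idea using scikit-learn (the `estimator` parameter name assumes scikit-learn 1.2 or later): each tier wraps the previous one, so the model count multiplies per tier, which is exactly why such classifiers are hefty and aimed at big data. The tier choices below are illustrative, not the paper's configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=0)

# Tier 1: a base tree; tier 2: boosting over trees; tier 3: bagging over
# the boosted ensemble. Three tiers here already mean 10 x 10 = 100 trees.
tier1 = DecisionTreeClassifier(max_depth=3)
tier2 = AdaBoostClassifier(estimator=tier1, n_estimators=10)
tier3 = BaggingClassifier(estimator=tier2, n_estimators=10)
print(cross_val_score(tier3, X, y, cv=3).mean())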
This document summarizes recent advances in collaborative filtering techniques for recommender systems. It describes how matrix factorization models have become popular for implementing collaborative filtering due to their accuracy. Neighborhood methods were also improved to be more accurate. The document outlines extensions that leverage temporal data and implicit feedback to further improve model accuracy. Key collaborative filtering approaches like matrix factorization, neighborhood methods, and techniques that combine their strengths are discussed.
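A self-contained SGD matrix-factorization sketch makes the core technique concrete; hyperparameters are illustrative, and the temporal and implicit-feedback extensions mentioned above are omitted.

```python
import numpy as np

def factorize(R, k=2, steps=2000, lr=0.02, reg=0.02):
    """Plain SGD matrix factorization: learn user factors P and item
    factors Q so that P @ Q.T approximates the observed ratings
    (zeros in R are treated as missing, not as ratings)."""
    rng = np.random.default_rng(0)
    n_users, n_items = R.shape
    P = rng.normal(0, 0.1, (n_users, k))
    Q = rng.normal(0, 0.1, (n_items, k))
    observed = [(u, i) for u in range(n_users)
                for i in range(n_items) if R[u, i] > 0]
    for _ in range(steps):
        for u, i in observed:
            err = R[u, i] - P[u] @ Q[i]
            pu = P[u].copy()                      # update with old values
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * pu - reg * Q[i])
    return P, Q

R = np.array([[5, 3, 0], [4, 0, 1], [0, 1, 5]])
P, Q = factorize(R)
print(np.round(P @ Q.T, 1))  # predictions, including the missing cells
```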
On the benefit of logic-based machine learning to learn pairwise comparisonsjournalBEEI
This document describes a study that used a logic-based machine learning approach called APARELL to learn user preferences from pairwise comparisons in a recommender system. APARELL learns description logic rules from examples of users' pairwise item preferences and background knowledge about item properties. It was implemented in a used car recommender system. A user study evaluated the pairwise preference elicitation method compared to a standard list interface. Results showed the pairwise interface was significantly better in recommending items that matched users' preferences.
How Does Data Create Economic Value_ Foundations For Valuation Models.pdfDr. Anke Nestler
In order to clarify how data creates economic value, the measurement of data value and the process of valuation of data contribution to economic benefits is being discussed in this article leading to the identification of ways to operationalize data valuation methodologies. The four independent authors André Gorius, Véronique Blum, Andreas Liebl, and Anke Nestler are leading European IP and licensing specialists of the International Licensing Executives Society (LESI).
Um zu klären, wie Daten einen wirtschaftlichen Wert schaffen, werden in diesem Artikel die Messung von Datenwerten und der Prozess der Bewertung des Beitrags von Daten zum wirtschaftlichen Nutzen erörtert, um Wege zur Operationalisierung von Datenbewertungsmethoden zu finden. Die vier unabhängigen Autoren André Gorius, Véronique Blum, Andreas Liebl und Anke Nestler sind führende europäische IP- und Lizenzierungsspezialisten der International Licensing Executives Society (LESI).
Here is a proposed relational database design for an ecommerce website:
Entities:
- Products
- Categories
- Orders
- Customers
- Reviews
Attributes:
Products:
- Product ID (primary key)
- Name
- Description
- Price
- Inventory level
- Category ID (foreign key to Categories)
Categories:
- Category ID (primary key)
- Name
Orders:
- Order ID (primary key)
- Customer ID (foreign key to Customers)
- Date
- Status
OrderItems:
- OrderItem ID (primary key)
- Order ID (foreign key to Orders)
- Product ID (
Multidirectional Product Support System for Decision Making In Textile Indust...IOSR Journals
This document proposes a multidirectional rank prediction algorithm (MDRP) for decision making in the textile industry using collaborative filtering methods. MDRP learns asymmetric similarities between users, items, ratings, and sellers simultaneously through matrix factorization to overcome data sparsity and scalability issues. The algorithm was tested on textile datasets and analyzed product and user preferences. Results showed MDRP provided more accurate recommendations than existing similarity learning and collaborative filtering methods. MDRP allows effective decision making for multiple entities with multiple attributes.
Projection Multi Scale Hashing Keyword Search in Multidimensional DatasetsIRJET Journal
The document discusses a novel method called ProMiSH (Projection and Multi Scale Hashing) for keyword search in multi-dimensional datasets. ProMiSH uses random projection and hash-based index structures to achieve high scalability and speedup of more than four orders over state-of-the-art tree-based techniques. Empirical studies on real and synthetic datasets of sizes up to 10 million objects and 100 dimensions show ProMiSH scales linearly with dataset size, dimension, query size, and result size. The method groups objects embedded in a vector space that are tagged with keywords matching a given query.
Matching data detection for the integration systemIJECEIAES
The purpose of data integration is to integrate the multiple sources of heterogeneous data available on the internet, such as text, image, and video. After this stage, the data becomes large. Therefore, it is necessary to analyze the data that can be used for the efficient execution of the query. However, we have problems with solving entities, so it is necessary to use different techniques to analyze and verify the data quality in order to obtain good data management. Then, when we have a single database, we call this mechanism deduplication. To solve the problems above, we propose in this article a method to calculate the similarity between the potential duplicate data. This solution is based on graphics technology to narrow the search field for similar features. Then, a composite mechanism is used to locate the most similar records in our database to improve the quality of the data to make good decisions from heterogeneous sources.
This document describes a project to develop an adaptive filter for detecting advertisements in social media posts about financial topics. It will do this through four main steps: 1) preprocessing and cleaning the raw text data, 2) selecting important features from the text, 3) developing filtering algorithms using naive Bayes and random forest classifiers, and 4) validating the results. The goal is to remove advertisement posts from the data to obtain more valuable business insights for enterprises analyzing social media about finance.
The document proposes a framework called SCAN to help optimize the spread of information on the web and its potential impact. SCAN uses semantic technologies, statistical and learning methods to 1) model and analyze communication across online channels, 2) forecast and measure the impact of communication, and 3) provide suggestions on which channels to use based on the analysis and measurements to increase reach and effectiveness of social media marketing campaigns. The framework aims to help professionals and users manage complexity in the modern, multi-channel web environment.
NOVEL MACHINE LEARNING ALGORITHMS FOR CENTRALITY AND CLIQUES DETECTION IN YOU...ijaia
This document summarizes a research paper that used machine learning algorithms to analyze social networks on YouTube. The researchers used unsupervised learning techniques like clustering and centrality measures to identify communities and influential users. Specifically, they used Louvain modularity and spectral clustering to detect groups for advertising purposes. Degree centrality and clique centrality were calculated to find central nodes that could be targeted for sponsorship deals. The experiments showed the algorithms could successfully find tightly-knit groups and key influencers within the larger YouTube network.
NOVEL MACHINE LEARNING ALGORITHMS FOR CENTRALITY AND CLIQUES DETECTION IN YOU...gerogepatton
The goal of this research project is to analyze the dynamics of social networks using machine learning techniques to locate maximal cliques and to find clusters for the purpose of identifying a target demographic. Unsupervised machine learning techniques are designed and implemented in this project to analyze a dataset from YouTube to discover communities in the social network and find central nodes. Different clustering algorithms are implemented and applied to the YouTube dataset. The well-known Bron-Kerbosch algorithm is used effectively in this research to find maximal cliques. The results obtained from this research could be used for advertising purposes and for building smart recommendation systems. All algorithms were implemented using Python programming language. The experimental results show that we were able to successfully find central nodes through clique-centrality and degree centrality. By utilizing clique detection algorithms, the research shown how machine learning algorithms can detect close knit groups within a larger network.
Novel Machine Learning Algorithms for Centrality and Cliques Detection in You...gerogepatton
This document summarizes a research paper that used machine learning algorithms to analyze social networks on YouTube. The researchers used unsupervised learning techniques like clustering and centrality measures to identify communities and influential users. Specifically, they used Louvain modularity and spectral clustering to detect groups for advertising purposes. Degree centrality and clique centrality were calculated to find central nodes that could be identified as influencers for product sponsorship. The experiments showed the algorithms could successfully find tightly-knit groups and key users within the larger YouTube network.
This document discusses the SMAC stack, which refers to an ecosystem of four converging technologies - Social, Mobile, Analytics, and Cloud - that can enhance IT architecture and enable digital transformation.
It provides details on each component: social for networking and marketing, mobile for accessibility, analytics for analyzing large datasets, and cloud for data storage and sharing resources. When implemented together, the SMAC stack provides a holistic solution.
Trends show customers now rely on online information, reviews, and easy mobile access. Companies use social media, mobile apps, big data analysis, and cloud services. The author's main interest is analytics due to growing data volumes and the potential to help companies manage this data.
International Journal of Engineering Research and DevelopmentIJERD Editor
Electrical, Electronics and Computer Engineering,
Information Engineering and Technology,
Mechanical, Industrial and Manufacturing Engineering,
Automation and Mechatronics Engineering,
Material and Chemical Engineering,
Civil and Architecture Engineering,
Biotechnology and Bio Engineering,
Environmental Engineering,
Petroleum and Mining Engineering,
Marine and Agriculture engineering,
Aerospace Engineering.
The document describes the Semantic Communication Engine Innsbruck (SCEI), a software suite that supports online communication, feedback collection, and impact measurement across multiple channels. It introduces key terms and defines the problem of managing content distribution across different online channels. The proposed solution features a semantic layer that abstracts domain concepts from specific channels, and a "weaving process" that aligns content with channels. The architecture separates the software into a content management system and a distribution component called dacodi, which uses adapters to interface with individual channels in a standardized way.
Applying Clustering Techniques for Efficient Text Mining in Twitter Dataijbuiiir1
Knowledge is the ultimate output of decisions on a dataset. The revolution of the Internet has made the global distance closer with the touch on the hand held electronic devices. Usage of social media sites have increased in the past decades. One of the most popular social media micro blog is Twitter. Twitter has millions of users in the world. In this paper the analysis of Twitter data is performed through the text contained in hash tags. After Preprocessing clustering algorithms are applied on text data. The different clusters formed are compared through various parameters. Visualization techniques are used to portray the results from which inferences like time series and topic flow can be easily made. The observed results show that the hierarchical clustering algorithm performs better than other algorithms.
Similar to A LINK-BASED APPROACH TO ENTITY RESOLUTION IN SOCIAL NETWORKS (20)
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
UiPath Test Automation using UiPath Test Suite series, part 6DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover Test Automation with generative AI and Open AI.
UiPath Test Automation with generative AI and Open AI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, a test automation solution, with Open AI advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers, and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and Open AI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Essentials of Automations: The Art of Triggers and Actions in FMESafe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slackshyamraj55
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
Your One-Stop Shop for Python Success: Top 10 US Python Development Providersakankshawande
Simplify your search for a reliable Python development partner! This list presents the top 10 trusted US providers offering comprehensive Python development services, ensuring your project's success from conception to completion.
Climate Impact of Software Testing at Nordic Testing DaysKari Kakkonen
My slides at Nordic Testing Days 6.6.2024
Climate impact / sustainability of software testing discussed on the talk. ICT and testing must carry their part of global responsibility to help with the climat warming. We can minimize the carbon footprint but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be added with sustainability, and then measured continuously. Test environments can be used less, and in smaller scale and on demand. Test techniques can be used in optimizing or minimizing number of tests. Test automation can be used to speed up testing.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
Monitoring and Managing Anomaly Detection on OpenShift.pdfTosin Akinosho
Monitoring and Managing Anomaly Detection on OpenShift
Overview
Dive into the world of anomaly detection on edge devices with our comprehensive hands-on tutorial. This SlideShare presentation will guide you through the entire process, from data collection and model training to edge deployment and real-time monitoring. Perfect for those looking to implement robust anomaly detection systems on resource-constrained IoT/edge devices.
Key Topics Covered
1. Introduction to Anomaly Detection
- Understand the fundamentals of anomaly detection and its importance in identifying unusual behavior or failures in systems.
2. Understanding Edge (IoT)
- Learn about edge computing and IoT, and how they enable real-time data processing and decision-making at the source.
3. What is ArgoCD?
- Discover ArgoCD, a declarative, GitOps continuous delivery tool for Kubernetes, and its role in deploying applications on edge devices.
4. Deployment Using ArgoCD for Edge Devices
- Step-by-step guide on deploying anomaly detection models on edge devices using ArgoCD.
5. Introduction to Apache Kafka and S3
- Explore Apache Kafka for real-time data streaming and Amazon S3 for scalable storage solutions.
6. Viewing Kafka Messages in the Data Lake
- Learn how to view and analyze Kafka messages stored in a data lake for better insights.
7. What is Prometheus?
- Get to know Prometheus, an open-source monitoring and alerting toolkit, and its application in monitoring edge devices.
8. Monitoring Application Metrics with Prometheus
- Detailed instructions on setting up Prometheus to monitor the performance and health of your anomaly detection system.
9. What is Camel K?
- Introduction to Camel K, a lightweight integration framework built on Apache Camel, designed for Kubernetes.
10. Configuring Camel K Integrations for Data Pipelines
- Learn how to configure Camel K for seamless data pipeline integrations in your anomaly detection workflow.
11. What is a Jupyter Notebook?
- Overview of Jupyter Notebooks, an open-source web application for creating and sharing documents with live code, equations, visualizations, and narrative text.
12. Jupyter Notebooks with Code Examples
- Hands-on examples and code snippets in Jupyter Notebooks to help you implement and test anomaly detection models.
“An Outlook of the Ongoing and Future Relationship between Blockchain Technologies and Process-aware Information Systems.” Invited talk at the joint workshop on Blockchain for Information Systems (BC4IS) and Blockchain for Trusted Data Sharing (B4TDS), co-located with with the 36th International Conference on Advanced Information Systems Engineering (CAiSE), 3 June 2024, Limassol, Cyprus.
Taking AI to the Next Level in Manufacturing.pdfssuserfac0301
Read Taking AI to the Next Level in Manufacturing to gain insights on AI adoption in the manufacturing industry, such as:
1. How quickly AI is being implemented in manufacturing.
2. Which barriers stand in the way of AI adoption.
3. How data quality and governance form the backbone of AI.
4. Organizational processes and structures that may inhibit effective AI adoption.
6. Ideas and approaches to help build your organization's AI strategy.
Ivanti’s Patch Tuesday breakdown goes beyond patching your applications and brings you the intelligence and guidance needed to prioritize where to focus your attention first. Catch early analysis on our Ivanti blog, then join industry expert Chris Goettl for the Patch Tuesday Webinar Event. There we’ll do a deep dive into each of the bulletins and give guidance on the risks associated with the newly-identified vulnerabilities.
HCL Notes and Domino License Cost Reduction in the World of DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-and-domino-license-cost-reduction-in-the-world-of-dlau/
The introduction of DLAU and the CCB & CCX licensing model caused quite a stir in the HCL community. As a Notes and Domino customer, you may have faced challenges with unexpected user counts and license costs. You probably have questions on how this new licensing approach works and how to benefit from it. Most importantly, you likely have budget constraints and want to save money where possible. Don’t worry, we can help with all of this!
We’ll show you how to fix common misconfigurations that cause higher-than-expected user counts, and how to identify accounts which you can deactivate to save money. There are also frequent patterns that can cause unnecessary cost, like using a person document instead of a mail-in for shared mailboxes. We’ll provide examples and solutions for those as well. And naturally we’ll explain the new licensing model.
Join HCL Ambassador Marc Thomas in this webinar with a special guest appearance from Franz Walder. It will give you the tools and know-how to stay on top of what is going on with Domino licensing. You will be able lower your cost through an optimized configuration and keep it low going forward.
These topics will be covered
- Reducing license cost by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best utilize it
- Tips for common problem areas, like team mailboxes, functional/test users, etc
- Practical examples and best practices to implement right away
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc
How does your privacy program stack up against your peers? What challenges are privacy teams tackling and prioritizing in 2024?
In the fifth annual Global Privacy Benchmarks Survey, we asked over 1,800 global privacy professionals and business executives to share their perspectives on the current state of privacy inside and outside of their organizations. This year’s report focused on emerging areas of importance for privacy and compliance professionals, including considerations and implications of Artificial Intelligence (AI) technologies, building brand trust, and different approaches for achieving higher privacy competence scores.
See how organizational priorities and strategic approaches to data security and privacy are evolving around the globe.
This webinar will review:
- The top 10 privacy insights from the fifth annual Global Privacy Benchmarks Survey
- The top challenges for privacy leaders, practitioners, and organizations in 2024
- Key themes to consider in developing and maintaining your privacy program
Removing Uninteresting Bytes in Software FuzzingAftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing xml documents, and Binutil's readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format). Our preliminary results show that AFL+DIAR does not only discover new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
2. 100 Computer Science & Information Technology (CS & IT)
cleaning or when integrating data from multiple sources. Besides corporate master data management for customer and product information, an obvious example is price engines, e.g. Shopzilla or PriceGrabber. These services offer easy price comparison of a single product offered by multiple vendors. A real-world example of duplicated records is shown in Table 1.
While this task is relatively easy for humans and is carried out on the fly, machines, lacking a proper understanding of the information encapsulated in each record, tend to stumble in resolving the issue. The high importance and difficulty of the entity resolution problem have triggered a large amount of research on different variants of the problem, and a fair number of approaches have been proposed, including predictive algorithms [1], similarity functions [2], graphical approaches [3] and even crowdsourcing methods [4].
Table 1. A real-world price engine example
ID Product Name Price
1 Apple iPad 4 (16 GB) Tablet $589.00
2 iPad 2 16 GB with WiFi in $313.45
3 Apple iPad Air WiFi 32GB $379.99
4 Sealed Apple Ipad4 Wi-fi 16gb $379.99
5 Apple iPad 2, 3, 4 Aluminum $31.95
6 Apple iPad 64 GB with WiFi $897.52
7 Apple iPad Air (64 GB) Tablet $857.87
8 Apple iPad 2 16 GB with WiFi $488.99
Researchers of the subject have presented two main varieties of methods in the past: similarity-based and learner-based.
Similarity-based techniques require two inputs: a similarity function S that calculates entity distance, and a threshold T to be applied. The similarity function S(e1,e2) takes a pair of entities as its input and outputs a similarity value. The more similar the two entities are according to the function, the higher the output value is. The basic approach is to calculate the similarity of all pairs of records. If S(e1,e2) ≥ T is true for the given {e1,e2} entity pair, they are considered to refer to the same entity. Gravano et al. show in [5] that cosine similarity and other similarity functions are efficiently applicable where data is mainly textual.
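For illustration, a minimal sketch of this approach in Python, with difflib's SequenceMatcher standing in for S; the records and the 0.8 cutoff are invented for the example, not taken from our experiments.

    from difflib import SequenceMatcher
    from itertools import combinations

    def S(e1: str, e2: str) -> float:
        # A stand-in similarity function: higher means more similar.
        return SequenceMatcher(None, e1.lower(), e2.lower()).ratio()

    def resolve(records, T=0.8):
        # Basic approach: compare all pairs; S(e1, e2) >= T means same entity.
        return [(e1, e2) for e1, e2 in combinations(records, 2) if S(e1, e2) >= T]

    records = ["Apple iPad 4 (16 GB) Tablet",
               "Sealed Apple Ipad4 Wi-fi 16gb",
               "Apple iPad Air WiFi 32GB"]
    print(resolve(records))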
Learner-based techniques handle entity resolution as a type of classification problem and build a model M to solve it. They represent a pair of entities {e1,e2} as a [a1,…,an] vector of attributes, where ai is a similarity value of the entities calculated on a single attribute. This approach, as any learner-based method, requires a training dataset Tr in order to build M. A suitable training set contains both positive and negative feature vectors, corresponding to matching pairs and non-matching pairs respectively. The classifier built this way can later be applied to label new record pairs as matching or non-matching entities. Formally, the entity pair {e1,e2} resolves to the same real-world entity if M(Tr, e1,e2, [T/CM]) delivers true. In this case providing threshold T is optional, as it can be calculated based on a cost matrix CM as well. Sehgal et al. [6] successfully implement learners like logistic regression and support vector machines to enhance geospatial data integration, while Breese et al. [1] used decision trees and Bayesian networks for collaborative filtering. From now on e1↔e2 denotes both S(e1,e2) and M(Tr, e1,e2, [T/CM]).
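A sketch of the learner-based variant, using scikit-learn's LogisticRegression in place of M; the attributes, training pairs and labels are invented for illustration.

    from difflib import SequenceMatcher
    from sklearn.linear_model import LogisticRegression

    ATTRS = ["name", "category"]  # illustrative shared attributes

    def feature_vector(e1, e2):
        # [a1, ..., an]: one similarity value per shared attribute.
        return [SequenceMatcher(None, str(e1[a]).lower(), str(e2[a]).lower()).ratio()
                for a in ATTRS]

    # Tr: labelled pairs; 1 = matching (same real-world entity), 0 = non-matching.
    Tr = [({"name": "Apple iPad 4 16GB", "category": "tablet"},
           {"name": "apple ipad4 16 gb", "category": "tablets"}, 1),
          ({"name": "Apple iPad 4 16GB", "category": "tablet"},
           {"name": "iPad Air 64GB", "category": "tablet"}, 0)]

    M = LogisticRegression().fit([feature_vector(e1, e2) for e1, e2, _ in Tr],
                                 [label for _, _, label in Tr])

    # Classify a new record pair as matching (1) or non-matching (0).
    pair = ({"name": "Apple iPad 4 (16 GB) Tablet", "category": "tablet"},
            {"name": "Apple iPad 4 16GB", "category": "tablet"})
    print(M.predict([feature_vector(*pair)]))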
The rest of the paper is organized as follows: in Section 2 an intriguing application is presented for motivation, then the problem formulation and the experimental setup are discussed. In Sections 5 and 6 we go into detail on data preparation and the ER methods to be applied, while results are presented in Section 7. Finally, we conclude in Section 8.
2. A MOTIVATING EXAMPLE
We will now motivate the problem of entity resolution in a specific domain, social networks, and highlight some of the issues that surface, using an illustrative example from the telecasting domain. Consider the problem of trying to construct a database of television programs broadcast on major channels and their respective “fanpages”, places for sharing information and discussion, collected from Facebook, one of the most popular social networks.
Table 2. Example of an EPG record (a) and its respective fanpage (b)

(a)
Id: 605
Day: 21-10-2013
Title: Britain from Above
Category: Nature
Start: 13:00
Stop: 14:00
Channel: BBC HD
Subtitle: null
Description: Documentary series in which broadcaster Andrew Marr…

(b)
Id: 109…813
Name: Britain From Above
Link: www.facebook.com/pages/Britain-From-Above/109…813
Category: Tv show
Likes: 282
Website: www.bbc.co.uk/britainfromabove
Talking about: 3
While Facebook enlists an extensive number of extremely useful and informative pages, the lack of central regulation and the multitude of duplicate topics make information retrieval cumbersome. Finding a way to isolate the most suitable page for each television title is not only a practical service for users, but also an algorithmically difficult task. Apart from the actual content, fanpages have very few descriptive attributes, and the same goes for television programs acquired from an EPG (Electronic Program Guide). Some attributes of a television show extracted from an EPG and its respective fanpage to be recommended are shown in Table 2; note that some marginal variables are omitted. Performing ER in such an information-poor environment is a taxing task.
For any given television show there are a number of suitable fanpages. A fanpage is considered suitable if its content is closely related to the show. The definition of “the most suitable” page is strictly subjective, although there are several ground rules to be considered during fanpage selection.

• Rule 1: Under no circumstances should an unrelated page be recommended.
• Rule 2: When more than one page is eligible, the most suitable is selected.

Violating Rule 1 would lead to user distrust, while violating Rule 2 degrades the quality of service. Regarding Rule 2, the most suitable fanpage is the one most closely interwoven with the subject and with the most intensive discussion going on.
Entity resolution in general is a delicate, human-input-intensive, error-prone process. In contrast to our example, generic ER methods work on a single entity type only. As we have seen in Section 1, generic methods use attribute similarity functions to calculate entity-level similarity. Since there are two types of entities discussed here (television program and fanpage), attribute matching is not so straightforward. Human input is crucial to identify corresponding attributes and to find a way to establish hooks between the rest, a way of connecting entities. This knowledge is also very application-specific; there is no proper way to establish the hooks automatically.
3. PROBLEM FORMULATION
As opposed to generic ER, this example enlists two distinct entity types: types A and B correspond to television programs and social network fanpages respectively. Consider the example graph G displayed in Figure 1, where the two distinct entity types form the vertices. To identify related vertices, edges (E) are added to the graph; this information is provided by the blocking method (see Section 4 for more). Optionally, the edges can also be weighted based on the confidence level of the chosen similarity function. The G(A,B,E) graph is bipartite, meaning there are no edges between identical types. A↔A similarity (comparing TV titles) is out of the scope of this paper, while B↔B similarity (between fanpages) is only observed as a function of their ties with their respective type A entities. For example, if both A3↔B7 and A3↔B8 deliver a positive answer, then due to the transitivity of the ↔ operator B7↔B8 also resolves to the same entity, weighted by the two edges involved. This approach, called link mining, has been researched extensively (refer to the survey by Getoor et al. [7]). The achievements of link mining and link-based ranking are used in algorithms like PageRank and HITS, and are consequently utilized in the way we select the suitable candidates in the post-processing step.
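The bipartite structure and the transitivity argument can be sketched with networkx; the node labels follow Figure 1, while the edges and confidence weights are invented.

    import networkx as nx
    from itertools import combinations

    # G(A, B, E): type A = television programs, type B = fanpages.
    G = nx.Graph()
    G.add_nodes_from(["A3"], kind="A")
    G.add_nodes_from(["B7", "B8"], kind="B")
    # Edges come from the blocking method, weighted by match confidence.
    G.add_edge("A3", "B7", weight=0.9)
    G.add_edge("A3", "B8", weight=0.7)

    # B<->B similarity is observed only through shared type-A neighbours:
    # A3<->B7 and A3<->B8 together imply B7<->B8 by transitivity of <->.
    b_nodes = [n for n, d in G.nodes(data=True) if d["kind"] == "B"]
    for b1, b2 in combinations(b_nodes, 2):
        for a in set(G[b1]) & set(G[b2]):
            conf = G[a][b1]["weight"] * G[a][b2]["weight"]
            print(f"{b1}<->{b2} via {a} (confidence {conf:.2f})")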
Figure 1. The telecasting example as a bipartite graph
4. EXPERIMENTAL SETUP
Our experiments were based on data gathered by XMLTV (see http://sourceforge.net/projects/xmltv/), a publicly available backend for downloading television listings. We selected the schedule of four major channels (BBC News, BBC HD, Al Jazeera English and Animal Planet), and the time period was set to October 2013. The query returned 1110 records to be used in our experiments. Due to the innate program structure of television channels, there is a substantial amount of repetition on both a daily and a weekly basis. Since the Facebook API takes only a single query expression, the title of the television program, all electronic program guide records are aggregated on the {Channel, Title, Duration} level to avoid unnecessary querying and processing caused by recurring titles. It is safe to assume that the {Channel, Title, Duration} tuple is a unique identifier for all television shows. All other query parameters are set independently of the respective television program.
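The aggregation step can be sketched in pandas; the column names mirror Table 2(a), and the rows are invented.

    import pandas as pd

    # A miniature EPG extract: repeated airings of the same show.
    epg = pd.DataFrame({
        "Channel": ["BBC HD", "BBC HD", "Animal Planet"],
        "Title":   ["Britain from Above", "Britain from Above", "Wildlife SOS"],
        "Start":   ["13:00", "09:00", "20:00"],
        "Stop":    ["14:00", "10:00", "21:00"],
    })
    epg["Duration"] = pd.to_datetime(epg["Stop"]) - pd.to_datetime(epg["Start"])

    # One row per {Channel, Title, Duration}: each show is queried only once.
    unique_shows = epg.drop_duplicates(subset=["Channel", "Title", "Duration"])
    print(len(epg), "airings aggregated into", len(unique_shows), "unique shows")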
Aggregating television programs to avoid unnecessary queries results in a substantial compression of the experimental database: 163 unique television shows are extracted from the original 1110 titles. As of 2014 there are 54 million fanpages accessible on Facebook (see http://www.statisticbrain.com/facebook-statistics); comparing all titles with all available pages would be utterly unfeasible. To avoid the computational explosion caused by comparing all available candidates, generic ER approaches use a blocking method, a heuristic to identify the most probable candidates in order to cut the search space significantly. In our case querying the Facebook search engine is a convenient way of blocking, as it delivers the most probable pages based on the internal context of Facebook (see Fig. 1).
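Sketched below is this blocking step, with a hypothetical facebook_search stub in place of the real Graph API call; the canned results and the helper name are ours, and only the 10-result cap follows the text.

    def facebook_search(title, limit=10):
        # Hypothetical stand-in for the Facebook page search endpoint;
        # canned results keep the sketch self-contained.
        canned = {"Britain from Above": ["Britain From Above",
                                         "Britain from Above Fans"]}
        return canned.get(title, [])[:limit]

    def block(unique_titles, limit=10):
        # Blocking: each show is compared only against the (at most
        # `limit`) pages returned by its own search query.
        return {title: facebook_search(title, limit) for title in unique_titles}

    print(block(["Britain from Above"]))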
Executing subsequent Facebook queries for every unique show title presents us with a total of 258 fanpage results. That gives us the number of record pair comparisons to make after applying the blocking method. Figure 2 shows the number of results returned and their respective frequency. Keep in mind that the number of results per query is limited to 10 as part of the blocking mechanism. Although the high number of queries with a single result is promising, we should by no means automatically accept and recommend those results, as doing so could violate Rule 1 presented in Section 2. In any case, evaluation of query results is required.
Figure 2. Distribution of result frequency
5. ENTITY SIMILARITY MEASURES
As we have seen in Section 2, in order to successfully identify matching entities of distinct types, certain connections or hooks have to be set up. This step requires substantial human input. Table 2 also shows that very few attributes are available for doing so. In the case of string attributes (Title and Name), edit distance and related string functions can be used. Long-winded strings containing whole paragraphs (e.g. Description) are processed to isolate a set of proper nouns and compare those only. Polynomial variables like Category need preprocessing as well; since they come from different sources, distinct variable values have to be matched first.
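For illustration, the two string hooks named above could look as follows; the Levenshtein implementation is standard, and the capitalized-word heuristic for proper nouns is a deliberately crude stand-in.

    def edit_distance(s: str, t: str) -> int:
        # Classic Levenshtein distance via dynamic programming.
        prev = list(range(len(t) + 1))
        for i, cs in enumerate(s, 1):
            curr = [i]
            for j, ct in enumerate(t, 1):
                curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                                prev[j - 1] + (cs != ct)))
            prev = curr
        return prev[-1]

    def title_similarity(a: str, b: str) -> float:
        # Normalized to [0, 1]: 1.0 means identical titles.
        a, b = a.lower(), b.lower()
        return 1 - edit_distance(a, b) / max(len(a), len(b), 1)

    def proper_nouns(text: str) -> set:
        # Crude heuristic: keep capitalized words from a long Description.
        return {w.strip(".,") for w in text.split() if w[:1].isupper()}

    print(title_similarity("Britain from Above", "Britain From Above"))
    print(proper_nouns("Documentary series in which broadcaster Andrew Marr..."))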
As we have seen before, the low number of entity attributes accounts for an information-poor environment. Utilizing any additional data source can greatly boost the performance of the ER process. Since the whole process is run online and the resolving method relies heavily on Facebook, other online services can be included as well (e.g. Google Search to find the URL of each channel). With the help of these additional services new variables can be computed.
As the variable Link is available (incidentally the main output of the whole process), the content of the fanpage can be queried (the same goes for Website). Page content can then be used extensively to generate hooks; in this case these are flags (yes-no questions) regarding the actual page content and especially the descriptive About section (since that is publicly available). New variables include whether the Channel variable or its URL is explicitly mentioned on the page, whether the website link referenced by the fanpage is the actual website of the channel, and whether the linked website references the Channel, its URL and its Title at all.
A total of 12 new descriptive features are created. Flags have a value of either 0 or 1, while distance-like features such as Title edit distance have continuous values on the [0,1] interval. Numeric variables are normalized to the same interval as well.
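A sketch of how such a feature vector might be assembled; we show three of the twelve features, with invented flag logic and a difflib-based stand-in for the Title edit distance.

    from difflib import SequenceMatcher

    def title_similarity(a, b):
        # Stand-in for the normalized Title edit distance, on [0, 1].
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()

    def build_features(show, page, about_text):
        # Three of the twelve features: one distance plus two 0/1 flags.
        return [
            title_similarity(show["Title"], page["Name"]),
            float(show["Channel"].lower() in about_text.lower()),  # flag
            float(show["ChannelURL"] in page.get("Website", "")),  # flag
        ]

    show = {"Title": "Britain from Above", "Channel": "BBC HD",
            "ChannelURL": "www.bbc.co.uk"}
    page = {"Name": "Britain From Above",
            "Website": "www.bbc.co.uk/britainfromabove"}
    about = "The official page of the BBC HD series Britain From Above."
    print(build_features(show, page, about))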
6. RESOLVING ALTERNATIVES
In Section 1 we discussed two well-known options for solving ER problems. In our work we
explored the possibility of extending both of them to 2-type ER problems (see Figure 3).
Figure 3. Similarity-based (a) and learner-based (b) techniques extended to multi-type ER
We opted for one of the most common implementations of the similarity-based method by using a simple score model. This is quite straightforward, since all our features are already transformed into the same interval. On some groups of features a maximum was computed first in order to avoid bias, as they referenced the same underlying attributes in only slightly different aspects. Otherwise the final score is calculated by simple addition (on a 10-point scale) and a threshold can be applied. For this setup the threshold was manually set to 5 points, which covered the majority of the examples (see Figure 4 for details).
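A sketch of the score model under these assumptions; the feature names and group structure are invented, and the rescaling to a 10-point scale is one plausible reading of the text rather than the exact formula used.

    def score(features, groups):
        # Max within each correlated group avoids double counting, then
        # a simple addition, rescaled to a 10-point scale.
        parts = [max(features[name] for name in group) for group in groups]
        return 10 * sum(parts) / len(parts)

    features = {"title_edit": 0.95, "title_token": 0.90,  # same attribute
                "channel_on_page": 1.0, "website_match": 0.0}
    groups = [["title_edit", "title_token"],
              ["channel_on_page"], ["website_match"]]

    s = score(features, groups)
    print(f"score {s:.1f}:", "match" if s >= 5 else "no match")  # T = 5 points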
The learner-based approach treats ER as a classification problem using the feature vector generated in the previous section. We explored several different classifier models implemented in RapidMiner (RM), a free data mining toolset (see http://www.rapidminer.com). Out of the many possibilities, three were selected based on their performance and penetration: Logistic Regression (LR), Support Vector Machine (SVM) and Random Forest (RF). The LR and SVM implementations in question, called myKLR and mySVM respectively, were written by Stefan Rueping (see [8]). RM also offers automatic threshold adjustment based on model confidence, costs and ROC analysis.
Figure 4. Similarity-based approach: respective score frequency and threshold application.
7. RESULTS AND DISCUSSION
Our experiments were carried out on the telecasting dataset using the four different decision
making method described in the previous section. A dataset was prepared by labelling entries
manually for training and testing purposes. For the sake of result comparability we elected three
common known performance measures in machine learning; recall, precision and F-measure. In
case of learner-based methods training was done using 10-fold cross-validation with stratified
sampling and a cost matrix was provided with a 20% extra penalty for the so-called type I
misclassifications (false positive examples).
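This evaluation protocol can be sketched with scikit-learn; the synthetic data is invented, and approximating the cost matrix with class weights (penalizing the non-match class by 20%) is our assumption, not the RapidMiner setup used in the paper.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import StratifiedKFold, cross_validate

    rng = np.random.default_rng(0)
    X = rng.random((200, 12))         # 12 features on the [0, 1] interval
    y = (X[:, 0] > 0.6).astype(int)   # synthetic match labels

    # class_weight approximates the cost matrix: false positives (predicting
    # a match on a true non-match) are penalized roughly 20% more.
    clf = RandomForestClassifier(class_weight={0: 1.2, 1: 1.0}, random_state=0)
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
    scores = cross_validate(clf, X, y, cv=cv,
                            scoring=("precision", "recall", "f1"))
    for m in ("precision", "recall", "f1"):
        print(m, scores["test_" + m].mean().round(3))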
As seen in Figure 5a, all approaches delivered comparable results when looking for “the most suitable” fanpage. Learner-based methods performed very similarly, with RF falling behind in precision only. The score model dominated in recall, suggesting a relatively low threshold, but its F-measure compared fairly well to the learning methods.
Tests were also carried out after relaxing Rule 2, so that all related fanpages would be acceptable. In Figure 5b all methods but RF show a substantial drop in recall, caused by the foreseeable increase of false negatives. Compensated by increased precision, F-measure increased for all classification models as well, yielding an almost identical performance. The more rigid approach of the similarity-based scoring method, however, failed to satisfy the needs of the modified environment.
Figure 5. Comparing the results of the “most suitable” (a) and “anything suitable” (b) methods
8. CONCLUSIONS
In this paper we proposed a method to successfully extend entity resolution to multitype problems through an interesting application domain. This particular solution eases information retrieval in social networks, an increasingly important field for analysis. The procedure utilizes the achievements of link mining to effectively bind entities of distinct types. We performed an extensive evaluation of the proposed methods on an illustrative real-world problem and conclude that the promising results can lead to growing scientific interest and may contribute to valuable services in the future.
REFERENCES
[1] John S. Breese, David Heckerman, and Carl Kadie. 1998. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence (UAI'98), Gregory F. Cooper and Serafín Moral (Eds.). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 43-52.
[2] Brizan, D.G., Tansel, A.U.: A survey of entity resolution and record linkage methodologies.
Communications of the IIMA 6(3), 41–50 (2006), http://www.iima.org/CIIMA/8%20CIIMA%206-
3%2041-50%20%20Brizan.pdf
[3] Zhaoqi Chen, Dmitri V. Kalashnikov, and Sharad Mehrotra. 2007. Adaptive graphical approach to
entity resolution. In Proceedings of the 7th ACM/IEEE-CS joint conference on Digital libraries
(JCDL '07). ACM, New York, NY, USA, 204-213. DOI=10.1145/1255175.1255215
http://doi.acm.org/10.1145/1255175.1255215
[4] Jiannan Wang, Tim Kraska, Michael J. Franklin, and Jianhua Feng. 2012. CrowdER: crowdsourcing
entity resolution. Proc. VLDB Endow. 5, 11 (July 2012), 1483-1494.
[5] Gravano, L., Ipeirotis, P., Koudas, N., Srivastava, D.: Text joins for data cleansing and integration in an RDBMS. In: Proceedings of the 19th International Conference on Data Engineering, pp. 729-731 (March 2003)
[6] Vivek Sehgal, Lise Getoor, and Peter D Viechnicki. 2006. Entity resolution in geospatial data
integration. In Proceedings of the 14th annual ACM international symposium on Advances in
geographic information systems (GIS '06). ACM, New York, NY, USA, 83-90.
DOI=10.1145/1183471.1183486 http://doi.acm.org/10.1145/1183471.1183486
[7] Lise Getoor and Christopher P. Diehl. 2005. Link mining: a survey. SIGKDD Explor. Newsl. 7, 2
(December 2005), 3-12. DOI=10.1145/1117454.1117456
http://doi.acm.org/10.1145/1117454.1117456
[8] Rueping, S.: mySVM-Manual. University of Dortmund, Lehrstuhl Informatik 8 (2000), URL
http://www-ai.cs.uni-dortmund.de/software/mysvm/
Authors
Gergo Barta received the MSc degree in computer science in 2012 from the Budapest
University of Technology and Economics, Hungary, where he is currently working toward
the PhD degree in the Department of Telecommunications and Media Informatics. His
research interests include spatial data analysis, data mining in interdisciplinary fields, and
machine learning algorithms.