This paper was presented at the 5th International Conference on Social Informatics (http://www.socinfo2013.com/) in Kyoto, Japan on 27 November 2013.
The full paper can be found at: http://link.springer.com/chapter/10.1007%2F978-3-319-03260-3_25
Abstract: Existence of spam URLs over emails and Online Social Media (OSM) has become a growing phenomenon. To counter the dissemination issues associated with long, complex URLs in emails and the character limit imposed on various OSM (like Twitter), the concept of URL shortening gained a lot of traction. URL shorteners take a long URL as input and return a short URL with the same landing page. With its immense popularity over time, URL shortening has become a prime target for attackers, giving them an advantage in concealing malicious content. Bitly, a leading service in this domain, is being exploited heavily to carry out phishing attacks, work-from-home scams, pornographic content propagation, etc. This imposes additional performance pressure on Bitly and other URL shorteners to detect illegitimate content and take timely action against it. In this study, we analyzed a dataset marked as suspicious by Bitly in the month of October 2013 to highlight some ground issues in their spam detection mechanism. In addition, we identified some short-URL-based features and coupled them with two domain-specific features to classify a Bitly URL as malicious or benign, and achieved a maximum accuracy of 86.41%. To the best of our knowledge, this is the first large-scale study to highlight the issues with Bitly’s spam detection policies and to propose a suitable countermeasure.
Today, more than two hundred Online Social Networks (OSNs) exist, each offering distinct services to its users, such as easier access to news or better business opportunities. To enjoy each distinct service, a user innocuously registers herself on multiple OSNs. For each OSN, she defines her identity with a different set of attributes, genre of content and friends to suit the purpose of using that OSN. Thus, the quality, quantity and veracity of the identity vary with the OSN. This results in dissimilar identities of the same user, scattered across the Internet, with no explicit links directing to one another. These disparate unlinked identities worry various stakeholders. For instance, security practitioners find it difficult to verify attributes across unlinked identities; enterprises fail to create a holistic overview of their customers.
Research that finds and links disconnected identities of a user across OSNs is termed identity resolution. Access to unique and private attributes of a user, like ‘email’, makes the task trivial; however, in the absence of such attributes, identity resolution is challenging. In this dissertation, we make an effort to leverage intelligent cues and patterns extracted from partially overlapping lists of public attributes of the compared identities. These patterns emerge due to consistent user behavior, like sharing the same mobile number, content or profile picture across OSNs. Translating these patterns into features, we devise novel heuristic, unsupervised and supervised frameworks to search and link user identities across social networks. The proposed search methods use an exhaustive set of public attributes, looking for consistent behavior patterns, and fetch the correct identity of the searched user in the candidate set for an additional 11% of users. An improvement on the proposed search mechanisms further optimizes time and space complexity. The suggested linking method compares past attribute value sets and correctly connects identities of an additional 48% of users, earlier missed by literature methods that compare only current values. Evaluations on popular OSNs like Twitter, Instagram and Facebook demonstrate the significance and generalizability of the linking method.
An Approach to Detect and Avoid Social Engineering and Phasing Attack in Soci... IJASRD Journal
Cyber-physical systems are the key innovation driver for many domains, for example automotive, aviation, industrial process control, and factory automation. However, their interconnection potentially gives adversaries easy access to sensitive data, code, and configurations. If attackers gain control, material damage or even harm to people must be expected. To counteract data theft, system manipulation and cyber attacks, security mechanisms must be embedded in the cyber-physical system. The social engineering attack templates are converted to social engineering attack scenarios by populating the template with both subjects and objects from real examples while still maintaining the detailed flow of the attack as given in the template. Social engineering by e-mail is by far the most heavily used attack vector, followed by attacks originating from websites. The attacker thus exploits the established trust by asking for permission to use the organization's wireless network facility to send an email. A social engineer can also combine technical means to achieve the attack goals. The heuristic-based detection technique analyzes and extracts phishing site features and detects phishing sites using that information. Based on the automated analysis of the account in the social network, one can build assumptions about the intensity of communication between users. Based on this information, it is possible to calculate the probability of success of a multistep social engineering attack from user to user in a cyber-physical/cyber-social system. Furthermore, the proposed social engineering attack templates can also be used to develop social engineering awareness material.
In spite of the development of prevention strategies, phishing remains a serious threat, since the primary countermeasures in use are based on reactive URL blacklisting. This strategy is insufficient because of the short lifetime of phishing websites. To overcome this problem, developing a real-time phishing website detection method is an effective solution. This research introduces the PrePhish algorithm, an automated machine learning approach that analyzes phishing and non-phishing URLs to produce reliable results. It shows that phishing URLs typically have only a few connections between the registered domain part and the path or query part of the URL. Using these connections, a URL is characterized by its inter-relatedness, which is estimated using features mined from attributes. These features are then used in a machine learning technique to detect phishing URLs from a real dataset. The classification of phishing and non-phishing websites is implemented by finding the range value and threshold value for each attribute using decision-making classification. The method is also evaluated in Matlab using three major classifiers, SVM, Random Forest and Naive Bayes, to find how it works on the dataset assessed.
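The idea of relating the registered domain of a URL to its path and query can be illustrated with a small sketch. This is not the PrePhish algorithm, whose features are not given here; it is a hypothetical stand-in that measures token overlap (Jaccard similarity) between the two parts of a URL. The function names `tokens` and `url_relatedness` are invented for illustration, and the registered name is assumed to be the second-to-last host label, which does not hold for suffixes such as `co.uk`.

```python
import re
from urllib.parse import urlsplit


def tokens(text):
    """Return the set of lowercase alphanumeric tokens in text."""
    return {t for t in re.split(r"[^a-z0-9]+", text.lower()) if t}


def url_relatedness(url):
    """Jaccard overlap between domain tokens and tokens of the rest of the URL."""
    parts = urlsplit(url)
    labels = (parts.hostname or "").split(".")
    # Assumption: the registered name is the second-to-last host label.
    # A real system would consult the Public Suffix List instead.
    if len(labels) >= 2:
        domain_tokens = tokens(labels[-2])
        subdomain_text = ".".join(labels[:-2])
    else:
        domain_tokens = tokens(labels[0])
        subdomain_text = ""
    # Everything that is not the registered name: subdomains, path, query.
    rest_tokens = tokens(" ".join([subdomain_text, parts.path, parts.query]))
    union = domain_tokens | rest_tokens
    if not union:
        return 0.0
    return len(domain_tokens & rest_tokens) / len(union)
```

A score like this could serve as one numeric feature among several passed to a classifier such as SVM, Random Forest or Naive Bayes; on its own it is far too crude to separate phishing from legitimate URLs.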
IJRET: International Journal of Research in Engineering and Technology is an international peer-reviewed online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academicians, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
PHISHING MITIGATION TECHNIQUES: A LITERATURE SURVEY IJNSA Journal
Email is a channel of communication which is considered to be a confidential medium for the exchange of information among individuals and organisations. This confidentiality no longer holds, as attackers send malicious emails to users to deceive them into disclosing private personal information such as usernames, passwords, and bank card details. In search of a solution to combat phishing attacks, different approaches have been developed. However, the traditional existing solutions have been limited in assisting email users to distinguish phishing emails from legitimate ones. This paper reviews the different email and website phishing solutions for phishing attack detection. It first provides a literature analysis of different existing phishing mitigation approaches. It then provides a discussion of the limitations of the techniques, before concluding with an exploration into how phishing detection can be improved.
DETECTION OF FAKE ACCOUNTS IN INSTAGRAM USING MACHINE LEARNING ijcsit
With the advent of the Internet and social media, while hundreds of people have benefitted from the vast sources of information available, there has been an enormous rise in cyber-crimes, particularly those targeted towards women. According to a 2019 report in the Economics Times [4], India witnessed a 457% rise in cybercrime in the five-year span between 2011 and 2016. Most speculate that this is due to the impact of social media such as Facebook, Instagram and Twitter on our daily lives. While these definitely help in creating a sound social network, creating a user account on these sites usually needs just an email ID. A real-life person can create multiple fake IDs, and hence impostors can easily be made. Unlike the real-world scenario, where multiple rules and regulations are imposed to identify oneself in a unique manner (for example, while issuing one’s passport or driver’s license), admission to the virtual world of social media does not require any such checks. In this paper, we study different Instagram accounts in particular and try to assess an account as fake or real using Machine Learning techniques, namely Logistic Regression and the Random Forest algorithm.
Vulnerability Assessment and Penetration Testing using Webkill ijtsrd
Data is more vulnerable than ever, and every technological advance raises new security threats that require new security solutions. The Webkill tool is used to assess the security of an IT system by safely exposing its weaknesses. The performance of an application is measured based on the number of false negatives and false positives. The testing technique is highly automated and covers several boundary cases by supplying invalid data as application input to make sure that exploitable vulnerabilities are absent. Deepesh Seth | Ms. N. Priya "Vulnerability Assessment and Penetration Testing using Webkill" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-5 | Issue-1, December 2020, URL: https://www.ijtsrd.com/papers/ijtsrd37919.pdf Paper URL: https://www.ijtsrd.com/computer-science/computer-security/37919/vulnerability-assessment-and-penetration-testing-using-webkill/deepesh-seth
A Survey: Data Leakage Detection Techniques IJECEIAES
Data is an important asset of organizations and forms part of their intellectual property. Every organization holds sensitive data such as customer information, financial data, patient data, personal credit card data and other information, depending on the kind of management, institute or industry. In such areas, leakage of information is a crucial problem that the organization has to face, and it incurs a high cost when it occurs. More precisely, information leakage is characterized as the intentional exposure of personal or any other sort of information to unauthorized outsiders. When important information falls into unauthorized hands or moves towards an unauthorized destination, it leads to direct and indirect losses for the affected industry in terms of cost and time. Information leakage results in vulnerability or modification of the information, so information must be protected from outside leakage. To solve this issue, there must be an efficient and effective system to protect the information. In recent years, many methods have been implemented to solve this type of problem, and they are analyzed in this survey. This paper analyzes a few of the latest techniques and proposes a novel sampling-algorithm-based data leakage detection technique.
A Comparative Analysis of Different Feature Set on the Performance of Differe... gerogepatton
Reducing the risk posed by phishers and other cybercriminals in cyberspace requires a robust and automatic means of detecting phishing websites, since the culprits come up with new techniques for achieving their goals on an almost daily basis. Phishers are constantly evolving the methods they use for luring users into revealing their sensitive information. Many methods have been proposed in the past for phishing detection, but the quest for a better solution is still on. This research covers the development of a phishing website model based on different algorithms with different sets of features, in order to investigate the most significant features in the dataset.
A Deep Learning Technique for Web Phishing Detection Combined URL Features an... IJCNCJournal
The most popular way to deceive online users nowadays is phishing. Consequently, to increase cybersecurity, more efficient web page phishing detection mechanisms are needed. In this paper, we propose an approach that relies on website images and URLs to deal with phishing website recognition as a classification challenge. Our model uses convolutional neural networks (CNNs) to extract the most important features of website images and URLs and then classifies the pages as benign or phishing. The accuracy of the experimental results was 99.67%, demonstrating the effectiveness of the proposed model in detecting web phishing attacks.
In the current scenario, the data on the web is growing exponentially. Social media generates a large amount of data, such as reviews, comments, and customers' opinions, on a daily basis. This huge amount of user-generated data is worthless unless some mining operations are applied to it. As there are a number of fake reviews, an opinion mining technique should incorporate spam detection to produce a genuine opinion. Nowadays, a number of people use social media opinions to decide whether to buy a product or service. Opinion spam detection is an exhausting and hard problem, as many fake reviews have been created by organizations or by people for various purposes. They write fake reviews to mislead readers or automated detection systems, promoting target products or degrading their reputations. The proposed technique includes ontology, geolocation and IP address tracking, a spam words dictionary using Naïve Bayes, brand-only review detection, and tracking of the account used. Piyush Jain | Karan Chheda | Mihir Jain | Prachiti Lade "Fake Product Review Monitoring System" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-3, April 2019, URL: https://www.ijtsrd.com/papers/ijtsrd21644.pdf
Paper URL: https://www.ijtsrd.com/other-scientific-research-area/other/21644/fake-product-review-monitoring-system/piyush-jain
Studying user footprints in different online social networks IIIT Hyderabad
With the growing popularity and usage of online social media services, people now have accounts (sometimes several) on multiple and diverse services like Facebook, LinkedIn, Twitter and YouTube. Publicly available information can be used to create a digital footprint of any user of these social media services. Generating such digital footprints can be very useful for personalization, profile management, and detecting malicious behavior of users. A very important application of analyzing users’ online digital footprints is to protect users from potential privacy and security risks arising from the huge amount of publicly available user information. We extracted information about user identities on different social networks through Social Graph API, FriendFeed, and Profilactic; we collated our own dataset to create the digital footprints of the users. We used username, display name, description, location, profile image, and number of connections to generate the digital footprints of the user. We applied context-specific techniques (e.g. Jaro Winkler similarity, Wordnet-based ontologies) to measure the similarity of the user profiles on different social networks. We specifically focused on Twitter and LinkedIn. In this paper, we present the analysis and results from applying automated classifiers for disambiguating profiles belonging to the same user from different social networks. UserID and Name were found to be the most discriminative features for disambiguating user profiles. Using the most promising set of features and similarity metrics, we achieved accuracy, precision and recall of 98%, 99%, and 96%, respectively.
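Jaro-Winkler similarity, mentioned above as one of the profile comparison techniques, can be sketched as follows. This is a minimal textbook implementation, not the authors' code; the prefix scale of 0.1 and the maximum prefix length of 4 are the commonly used defaults, and the example usernames in the usage note are hypothetical.

```python
def jaro(s1, s2):
    """Jaro similarity between two strings, in the range [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as matching only within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1, s2, prefix_scale=0.1, max_prefix=4):
    """Jaro similarity boosted for strings that share a common prefix."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1 - base)
```

For example, `jaro_winkler("john_doe", "johndoe")` could be used to compare a username on one network with a username on another; a score near 1 suggests the two handles may belong to the same person, though the threshold would need to be tuned on labeled data.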
This research aims to solve the problem of selection in decision making. In the profile matching method, a parameter is assessed on the difference between the target value and the value held by an individual. There are two important kinds of parameters in this method: core factors and secondary factors. Their values are converted into weighted percentages to produce the final decision, which identifies the data closest to the predetermined targets. With this method, the data are sorted dynamically against specific criteria.
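The profile matching steps described above can be sketched in a few lines. The scoring rule and the weights here are assumptions made for illustration, not values taken from the research: each unit of distance from the target is assumed to cost one point out of five, and core factors are assumed to carry 60% of the final score. The function names are hypothetical.

```python
def gap_score(value, target, max_score=5.0):
    """Score one criterion by its gap from the target.

    Assumption: each unit of distance from the target costs one point.
    """
    return max(0.0, max_score - abs(value - target))


def average(scores):
    """Mean of a list of scores, or 0.0 for an empty list."""
    return sum(scores) / len(scores) if scores else 0.0


def match_score(candidate, target, core, core_weight=0.6):
    """Weighted combination of core-factor and secondary-factor averages."""
    core_scores = [gap_score(candidate[k], t)
                   for k, t in target.items() if k in core]
    secondary_scores = [gap_score(candidate[k], t)
                        for k, t in target.items() if k not in core]
    # Assumption: core factors weigh 60%, secondary factors the remaining 40%.
    return (core_weight * average(core_scores)
            + (1 - core_weight) * average(secondary_scores))


def rank(candidates, target, core):
    """Return candidate names sorted from best to worst match."""
    return sorted(candidates,
                  key=lambda name: match_score(candidates[name], target, core),
                  reverse=True)
```

Calling `rank` with a dictionary of candidates, a target profile and the set of core criteria returns the names ordered by closeness to the target, which corresponds to the sorting step in the description.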
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
PHISHING MITIGATION TECHNIQUES: A LITERATURE SURVEYIJNSA Journal
Email is a channel of communication which is considered to be a confidential medium of communication for exchange of information among individuals and organisations. The confidentiality consideration about e-mail is no longer the case as attackers send malicious emails to users to deceive them into disclosing their private personal information such as username, password, and bank card details, etc. In search of a solution to combat phishing cybercrime attacks, different approaches have been developed. However, the traditional exiting solutions have been limited in assisting email users to identify phishing emails from legitimate ones. This paper reveals the different email and website phishing solutions in phishing attack detection. It first provides a literature analysis of different existing phishing mitigation approaches. It then provides a discussion on the limitations of the techniques, before concluding with an explorationin to how phishing detection can be improved.
DETECTION OF FAKE ACCOUNTS IN INSTAGRAM USING MACHINE LEARNINGijcsit
With the advent of the Internet and social media, while hundreds of people have benefitted from the vast sources of information available, there has been an enormous increase in the rise of cyber-crimes, particularly targeted towards women. According to a 2019 report in the [4] Economics Times, India has witnessed a 457% rise in cybercrime in the five year span between 2011 and 2016. Most speculate that this is due to impact of social media such as Facebook, Instagram and Twitter on our daily lives. While these definitely help in creating a sound social network, creation of user accounts in these sites usually needs just an email-id. A real life person can create multiple fake IDs and hence impostors can easily be made. Unlike the real world scenario where multiple rules and regulations are imposed to identify oneself in a unique manner (for example while issuing one’s passport or driver’s license), in the virtual world of social media, admission does not require any such checks. In this paper, we study the different accounts of Instagram, in particular and try to assess an account as fake or real using Machine Learning techniques namely Logistic Regression and Random Forest Algorithm.
Vulnerability Assessment and Penetration Testing using Webkillijtsrd
Data is more defenseless than any time in recent memory and each mechanical development raises new security danger that requires new security arrangements. web kill tool is directed to assess the security of an IT framework by securely uncovering its weaknesses. The performance of an application is measured based on the number of false negatives and false positives. Testing technique that is highly automated, which covers several boundary cases by means of invalid data as the application input to make sure that exploitable vulnerabilities are absent. Deepesh Seth | Ms. N. Priya "Vulnerability Assessment and Penetration Testing using Webkill" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-5 | Issue-1 , December 2020, URL: https://www.ijtsrd.com/papers/ijtsrd37919.pdf Paper URL : https://www.ijtsrd.com/computer-science/computer-security/37919/vulnerability-assessment-and-penetration-testing-using-webkill/deepesh-seth
A Survey: Data Leakage Detection Techniques IJECEIAES
Data is an important property of various organizations and it is intellectual property of organization. Every organization includes sensitive data as customer information, financial data, data of patient, personal credit card data and other information based on the kinds of management, institute or industry. For the areas like this, leakage of information is the crucial problem that the organization has to face, that poses high cost if information leakage is done. All the more definitely, information leakage is characterize as the intentional exposure of individual or any sort of information to unapproved outsiders. When the important information is goes to unapproved hands or moves towards unauthorized destination. This will prompts the direct and indirect loss of particular industry in terms of cost and time. The information leakage is outcomes in vulnerability or its modification. So information can be protected by the outsider leakages. To solve this issue there must be an efficient and effective system to avoid and protect authorized information. From not so long many methods have been implemented to solve same type of problems that are analyzed here in this survey. This paper analyzes little latest techniques and proposed novel Sampling algorithm based data leakage detection techniques.
A Comparative Analysis of Different Feature Set on the Performance of Differe...gerogepatton
Reducing the risk pose by phishers and other cybercriminals in the cyber space requires a robust and
automatic means of detecting phishing websites, since the culprits are constantly coming up with new
techniques of achieving their goals almost on daily basis. Phishers are constantly evolving the methods
they used for luring user to revealing their sensitive information. Many methods have been proposed in
past for phishing detection. But the quest for better solution is still on. This research covers the
development of phishing website model based on different algorithms with different set of features in order
to investigate the most significant features in the dataset.
A Deep Learning Technique for Web Phishing Detection Combined URL Features an...IJCNCJournal
Phishing is currently the most popular way to deceive online users. Consequently, more efficient web page phishing detection mechanisms are needed to increase cybersecurity. In this paper, we propose an approach that relies on website images and URLs and treats phishing website recognition as a classification problem. Our model uses convolutional neural networks (CNNs) to extract the most important features of website images and URLs and then classifies the pages as benign or phishing. The accuracy achieved in the experiment was 99.67%, demonstrating the effectiveness of the proposed model in detecting web phishing attacks.
In the current scenario, the data on the web is growing exponentially. Social media generates a large amount of data such as reviews, comments, and customers' opinions on a daily basis. This huge amount of user-generated data is worthless unless some mining operations are applied to it. As there are many fake reviews, opinion mining techniques should incorporate spam detection to produce genuine opinions. Nowadays, many people use social media opinions to make their decisions about purchasing a product or service. Opinion spam detection is an exhausting and hard problem, as many fake reviews have been created by organizations or individuals for various purposes. They write fake reviews to mislead readers or automated detection systems, promoting target products or degrading their reputations. The proposed technique includes ontology, geolocation and IP address tracking, a spam words dictionary using Naïve Bayes, brand-only review detection, and tracking of the account used. Piyush Jain | Karan Chheda | Mihir Jain | Prachiti Lade "Fake Product Review Monitoring System" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-3, April 2019, URL: https://www.ijtsrd.com/papers/ijtsrd21644.pdf
Paper URL: https://www.ijtsrd.com/other-scientific-research-area/other/21644/fake-product-review-monitoring-system/piyush-jain
Studying user footprints in different online social networks | IIIT Hyderabad
With the growing popularity and usage of online social media services, people now have accounts (sometimes several) on multiple and diverse services like Facebook, LinkedIn, Twitter and YouTube. Publicly available information can be used to create a digital footprint of any user using these social media services. Generating such digital footprints can be very useful for personalization, profile management, and detecting malicious behavior of users. A very important application of analyzing users’ online digital footprints is to protect users from potential privacy and security risks arising from the huge publicly available user information. We extracted information about user identities on different social networks through Social Graph API, FriendFeed, and Profilactic; we collated our own dataset to create the digital footprints of the users. We used username, display name, description, location, profile image, and number of connections to generate the digital footprints of the user. We applied context specific techniques (e.g. Jaro Winkler similarity, Wordnet based ontologies) to measure the similarity of the user profiles on different social networks. We specifically focused on Twitter and LinkedIn. In this paper, we present the analysis and results from applying automated classifiers for disambiguating profiles belonging to the same user from different social networks. UserID and Name were found to be the most discriminative features for disambiguating user profiles. Using the most promising set of features and similarity metrics, we achieved accuracy, precision and recall of 98%, 99%, and 96%, respectively.
This research addresses the problem of selection in decision making. In the profile matching method, a parameter is assessed on the difference between the target value and the value held by an individual. There are two important kinds of parameters in this method: core factors and secondary factors. These values are converted into percentage weights to produce the final decision, which identifies the data closest to the predetermined targets. With this method, the data are sorted dynamically against specific criteria.
In present times any marketing or customer strategy is incomplete without a social media presence. With customers depending all the more on social media channels to access and disseminate information and reviews, it becomes all the more important for organizations to tap social media channels for actionable insights.
Discovering Semantic Equivalence of People behind Online Profiles (RED 2012 -... | kcortis
This paper was presented at the Fifth International Workshop on Resource Discovery (RED 2012: http://www.labf.usb.ve/RED2012/) at ESWC 2012 (http://2012.eswc-conferences.org/) Conference in Heraklion, Crete, Greece on 27 May 2012.
The full paper can be found at: http://ceur-ws.org/Vol-862/REDp5.pdf
Sentiment analysis using naive bayes classifier | Dev Sahu
This presentation contains a short description of the Naive Bayes classifier algorithm, a machine learning approach for sentiment detection and text classification.
How do we protect privacy of users when building large-scale AI based systems? How do we develop machine learned models and systems taking fairness, accountability, and transparency into account? With the ongoing explosive growth of AI/ML models and systems, these are some of the ethical, legal, and technical challenges encountered by researchers and practitioners alike. In this talk, we will first motivate the need for adopting a "fairness and privacy by design" approach when developing AI/ML models and systems for different consumer and enterprise applications. We will then focus on the application of fairness-aware machine learning and privacy-preserving data mining techniques in practice, by presenting case studies spanning different LinkedIn applications (such as fairness-aware talent search ranking, privacy-preserving analytics, and LinkedIn Salary privacy & security design), and conclude with the key takeaways and open challenges.
This is a short presentation about the FAIR Metrics Evaluator - software that automates the application of FAIR Metrics against a given resource, in order to determine its degree of "FAIRness"
Candidate Ranking and Evaluation System based on Digital Footprints | IOSRjournaljce
A digital resume provides insights about a candidate to the organization. This paper proposes a system where digital resumes of candidates are generated by extracting data from social networking sites like Facebook, Twitter and LinkedIn. Data relevant to recruitment is obtained from unstructured data using data mining algorithms. Candidates are evaluated based on their digital resumes and ranked accordingly. Ranking is based on the requirements specified by an organization for a key position. The key aspects of this paper are a) specification and design of the system, b) generation of the digital resume, and c) ranking of candidates. According to the ranking provided by this system, recruiters can shortlist candidates for interviews. Thus, it revolutionizes the traditional recruitment process.
South Big Data Hub: Text Data Analysis Panel | Trey Grainger
Slides from Trey's opening presentation for the South Big Data Hub's Text Data Analysis Panel on December 8th, 2016. Trey provided a quick introduction to Apache Solr, described how companies are using Solr to power relevant search in industry, and provided a glimpse of where the industry is heading with regard to implementing more intelligent and relevant semantic search.
Network Analysis for SEO and Social Media | Mediative
Network graphs and network analysis have applications in a number of fields. Learn how this can be applied to SEO and social media - an SMX West Presentation.
DevOps and Testing slides at DASA Connect | Kari Kakkonen
Slides by Rik Marselis and me from the DASA Connect conference on 30.5.2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps is. We finished with a lovely workshop in which the participants tried to find different ways to think about quality and testing in different parts of the DevOps infinity loop.
JMeter webinar - integration with InfluxDB and Grafana | RTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality | Inflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -... | DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Essentials of Automations: Optimizing FME Workflows with Parameters | Safe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Epistemic Interaction - tuning interfaces to provide information for AI support | Alan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
UiPath Test Automation using UiPath Test Suite series, part 3 | DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Securing your Kubernetes cluster_ a step-by-step guide to success ! | KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do... | UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Connector Corner: Automate dynamic content and events by pushing a button | DianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
Transcript: Selling digital books in 2024: Insights from industry leaders - T... | BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024 | Tobias Schneck
As AI technology pushes into IT, I wondered, as an “infrastructure container Kubernetes guy”, how this fancy AI technology gets managed from an infrastructure operations point of view. Is it possible to apply our lovely cloud native principles as well? What benefits could the two technologies bring to each other?
Let me take these questions and provide you with a short journey through existing deployment models and use cases for AI software. Using practical examples, we discuss what cloud/on-premise strategy we may need to apply it to our own infrastructure and get it to work from an enterprise perspective. I want to give an overview of infrastructure requirements and technologies, and of what could benefit or limit your AI use cases in an enterprise environment. An interactive demo will give you some insights into which approaches I already have working for real.
An Ontology-based Technique for Online Profile Resolution
1. www.insight-centre.org
An Ontology-based Technique for Online Profile Resolution
Keith Cortis, Simon Scerri, Ismael Rivera, Siegfried Handschuh
International Conference on Social Informatics
Kyoto, Japan
27th November 2013
2. Introduction (1)
Instance Matching: determining whether two instances / representations refer to the same real-world entity, e.g., persons
Research Challenge: Discovery of multiple online profiles that refer to the same person identity on heterogeneous social networks
3. Introduction (2)
Improved profile matching system extended with:
Named Entity Recognition
Linked Open Data
Semantic Matching
Additional Benefit: Ontology used as a background schema
Advantage: Standard schema enables cross-network interoperability
4. Motivation
Contact Matcher Applications:
Control sharing of personal data
Detection of fully or partly anonymous contacts (> 83 million fake accounts)
New contact suggestions that are of direct interest to the user
5. Profile Resolution Technique
[Figure: Profile Resolution Technique pipeline: (1) User Profile Data Extraction; (2) Semantic Lifting (NCO); (3) Named Entity Recognition (ANNIE IE System, Large KB Gazetteer; attributes such as Name, Surname, City, Country); (4) Hybrid Matching Process with (a) Attribute Value Matching, (b) Semantic-based Matching Extension, (c) Attribute Weighting Function; (5) Online Profile Suggestions; (6) Online Profile Merging]
7. Semantic Lifting
Lifting semi-/un-structured profile information from a remote schema
Transform information to instances of the Contact Ontology (NCO)
NCO - Identity-related online profile information
8. Profile Resolution Technique
[Figure: Profile Resolution Technique pipeline, shown up to step 4a: (1) User Profile Data Extraction; (2) Semantic Lifting (NCO); (3) Named Entity Recognition (ANNIE IE System, Large KB Gazetteer); (4) Hybrid Matching Process, (a) Attribute Value Matching]
10. Profile Resolution Technique
[Figure: Profile Resolution Technique pipeline, shown up to step 4b: (1) User Profile Data Extraction; (2) Semantic Lifting (NCO); (3) Named Entity Recognition (ANNIE IE System, Large KB Gazetteer); (4) Hybrid Matching Process, (a) Attribute Value Matching, (b) Semantic-based Matching Extension]
11. Semantic-based Matching
Indirect semantic relations at a schema level
Use-case: Location-related profile attributes
Location sub-entities being semantically compared are: city, region and country
Find the semantic relations between the sub-entities in question in a bi-directional manner
E.g. Galway (profile 1) vs. Ireland (profile 2)
[Figure: relations checked in both directions between Galway and Ireland: locatedWithin, country, isPartOf, isLocationOf, containsLocation, capital, largestCity]
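The bi-directional lookup described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the relation names are taken from the slide, while the in-memory triple set, the direction assigned to each relation, and the function names `relations_between` and `semantically_related` are hypothetical stand-ins for whatever Linked Open Data source and API the authors actually queried.

```python
# Toy knowledge base of (subject, relation, object) triples.
# Relation names come from the slide; the triples and their directions are
# hypothetical stand-ins for what a Linked Open Data source might return.
TRIPLES = {
    ("Galway", "locatedWithin", "Ireland"),
    ("Galway", "country", "Ireland"),
    ("Galway", "isPartOf", "Ireland"),
    ("Ireland", "containsLocation", "Galway"),
}


def relations_between(a, b, triples=TRIPLES):
    """Collect the relations linking a to b and b to a (both directions)."""
    forward = sorted(rel for (s, rel, o) in triples if s == a and o == b)
    backward = sorted(rel for (s, rel, o) in triples if s == b and o == a)
    return {"forward": forward, "backward": backward}


def semantically_related(a, b, triples=TRIPLES):
    """Treat two location values as related if any relation links them."""
    found = relations_between(a, b, triples)
    return bool(found["forward"] or found["backward"])


print(relations_between("Galway", "Ireland"))
print(semantically_related("Galway", "Kyoto"))
```

Checking both directions means the order in which the two profiles are compared does not change the outcome.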
12. Profile Resolution Technique
[Figure: Profile Resolution Technique pipeline, shown up to step 4c: (1) User Profile Data Extraction; (2) Semantic Lifting (NCO); (3) Named Entity Recognition (ANNIE IE System, Large KB Gazetteer); (4) Hybrid Matching Process, (a) Attribute Value Matching, (b) Semantic-based Matching Extension, (c) Attribute Weighting Function]
13. Attribute Weighting Function
Approach 1: Direct Similarity Score
Name: Justin Bieber vs. J. Bieber; Similarity Value: 0.90
Approach 2: Normalised Similarity Score based on a threshold for each attribute type
Attribute Threshold for Name: 0.70
Name: Justin Bieber vs. J. Bieber; Metric Similarity Value: 0.90; Similarity Value: 1.0
Name: Justin Bieber vs. Joffrey Baratheon; Metric Similarity Value: 0.4; Similarity Value: 0.0
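A minimal Python sketch of the second approach, using the numbers from the slide. The function name `normalised_similarity` is hypothetical, and treating a score exactly equal to the threshold as a match is an assumption, since the slide does not say how the boundary case is handled.

```python
def normalised_similarity(metric_value, threshold):
    """Map a raw string-metric score to 1.0 or 0.0 using a per-attribute threshold."""
    # Assumption: a score exactly equal to the threshold counts as a match.
    return 1.0 if metric_value >= threshold else 0.0


NAME_THRESHOLD = 0.70  # attribute threshold for Name, as given on the slide

# "Justin Bieber" vs. "J. Bieber": metric similarity 0.90
print(normalised_similarity(0.90, NAME_THRESHOLD))  # 1.0
# "Justin Bieber" vs. "Joffrey Baratheon": metric similarity 0.4
print(normalised_similarity(0.40, NAME_THRESHOLD))  # 0.0
```

Approach 1 would simply pass the metric value (0.90) through unchanged.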
14. Profile Resolution Technique
[Figure: Profile Resolution Technique pipeline, shown up to step 5: (1) User Profile Data Extraction; (2) Semantic Lifting (NCO); (3) Named Entity Recognition (ANNIE IE System, Large KB Gazetteer); (4) Hybrid Matching Process, (a) Attribute Value Matching, (b) Semantic-based Matching Extension, (c) Attribute Weighting Function; (5) Online Profile Suggestions]
15. Online Profile Suggestions
Name: Joffrey Baratheon vs. Joff Baratheon
City: King’s Landing vs. King’s Landing
Role: King vs. King
Date of Birth: 286AL vs. 286AL
Similarity Score: 0.95
Similarity Threshold: 0.90
Name: Joffrey Baratheon vs. Joffrey Bieber
City: King’s Landing vs. London, Ontario
Role: King vs. Singer
Date of Birth: 286AL vs. 01/03/1994
Similarity Score: 0.30
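One way to picture how per-attribute similarities could be combined into a single score and compared with the 0.90 threshold is a weighted average, sketched below in Python. The weights, the per-attribute scores and the function names are all hypothetical; the slides do not give the actual weighting function, so this does not attempt to reproduce the 0.95 and 0.30 scores above.

```python
def profile_similarity(attribute_scores, weights):
    """Weighted average of per-attribute similarity values in [0, 1]."""
    shared = [name for name in attribute_scores if name in weights]
    total_weight = sum(weights[name] for name in shared)
    if total_weight == 0:
        return 0.0
    weighted = sum(attribute_scores[name] * weights[name] for name in shared)
    return weighted / total_weight


def is_suggested(attribute_scores, weights, threshold=0.90):
    """Suggest a profile pair when its overall score reaches the threshold."""
    return profile_similarity(attribute_scores, weights) >= threshold


# Hypothetical weights and per-attribute scores, for illustration only.
WEIGHTS = {"name": 0.4, "city": 0.2, "role": 0.2, "date_of_birth": 0.2}
close_pair = {"name": 0.9, "city": 1.0, "role": 1.0, "date_of_birth": 1.0}
distant_pair = {"name": 0.5, "city": 0.0, "role": 0.0, "date_of_birth": 0.0}

print(is_suggested(close_pair, WEIGHTS))
print(is_suggested(distant_pair, WEIGHTS))
```

Only attributes present in both the score and weight dictionaries contribute, so a profile pair with missing attributes is still scored on what it shares.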
17. Profile Resolution Technique
[Figure: Profile Resolution Technique pipeline, shown in full: (1) User Profile Data Extraction; (2) Semantic Lifting (NCO); (3) Named Entity Recognition (ANNIE IE System, Large KB Gazetteer); (4) Hybrid Matching Process, (a) Attribute Value Matching, (b) Semantic-based Matching Extension, (c) Attribute Weighting Function; (5) Online Profile Suggestions; (6) Online Profile Merging]
18. Experiments & Evaluation
Two-staged evaluation:
1. Technique
a) Best attribute similarity score approach
b) Whether NER & semantic-based matching extension improve the overall technique
c) The computational performance of the hybrid technique against the syntactic-based one
d) A similarity threshold that determines profile equivalence within a satisfactory degree of confidence
2. Usability
e) Level of precision for the profile matching
19. Technique Evaluation
Two Datasets:
1. A controlled dataset of public profiles obtained from the Web (LinkedIn and Twitter)
182 online profiles
– 112 ambiguous real-world persons (common attributes)
– 70 refer to 35 well-known sports journalists
Maximised False Positives
2. Private personal and contact-list profiles obtained from 5 consenting participants
21. Technique Evaluation – Experiment 2
String-based technique vs. String + NER + Semantic-based technique
[Figure: two charts, String Technique and Hybrid Technique, each plotting Precision, Recall and F1-Measure (Result, 0 to 1) against Threshold value (0.7, 0.75, 0.8)]
New hybrid technique improves the results considerably over the string-only based one
F-measure -> more or less stable for thresholds of 0.75 and 0.8.
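For readers who want to see how precision, recall and F1 follow from a similarity threshold, here is a small Python sketch. The scored pairs are made-up illustrative data, not the experiment's results, and `evaluate` is a hypothetical helper name.

```python
def evaluate(scored_pairs, threshold):
    """Compute (precision, recall, F1) for a list of (score, is_true_match) pairs."""
    tp = sum(1 for score, truth in scored_pairs if score >= threshold and truth)
    fp = sum(1 for score, truth in scored_pairs if score >= threshold and not truth)
    fn = sum(1 for score, truth in scored_pairs if score < threshold and truth)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical similarity scores with ground-truth labels, for illustration only.
PAIRS = [(0.95, True), (0.82, True), (0.78, False), (0.72, True), (0.60, False)]

for threshold in (0.7, 0.75, 0.8):
    print(threshold, evaluate(PAIRS, threshold))
```

Raising the threshold typically trades recall for precision, which is why the slide reports the metrics at several threshold values.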
22. Technique Evaluation – Experiment 3
Computational performance of hybrid technique vs. syntactic-only based one
For this test we selected profile pairs:
Having a number of common attributes
At least 1 attribute candidate for semantic matching
[Figure: Time (ms, 0 to 40) against Number of Common Attributes (1 to 15) for the Syntactic and Hybrid techniques]
On average hybrid technique takes ≈15ms more
24. Usability Evaluation (1)
Quantitative & Qualitative
Performance of profile matching technique
Contact matcher run against the two social networks on which the user is most active
Social Networks chosen:
Number of participants: 16
Person suggestion page
Short survey about their user experience
28. Limitations
Person’s gender is not provided by all social network APIs
Identify gender based on first name or surname through NER
Weights of some profile attributes, e.g., first name and surname, are too high
In some cases they impact the final result too strongly
More experiments will be conducted to fine-tune these weights
29. Future Work
Consider identification of higher degrees of semantic relatedness
Enrich technique with other LOD cloud datasets
Additional social networks targeted
30. Conclusion
Profile matching algorithm with:
Semantic Lifting
NER on semi-/un-structured profile information
Linked Open Data to improve the NER process
Semantic matching at the schema level to find any possible indirect semantic relations
Weighted Profile Attribute Matching
Quantitative & Qualitative Evaluation
Thank you for your attention
31. Related Work Comparison
Existing Profile Matching Approaches based on:
User’s friends
Specific Inverse Functional Properties e.g., email address
String matching of all profile attributes
Semantic relatedness between text, depending on remote Knowledge Bases e.g., Wikipedia
Evaluation of these Approaches:
Technique Evaluation on controlled datasets
No Usability Evaluation