The document discusses analyzing social networks online. It begins with defining social networks and the reasons for analyzing them, such as marketing, customer relationship management, and identifying new trends. It then discusses how to analyze social networks, including the types of data and analysis methods used. As an example, it provides details on analyzing one social network with 5,928 edges and 1,953 nodes, including the appropriate number of clusters calculated and their properties. The sources listed at the end provide more information on topics like anomaly detection techniques.
1) The document discusses using k-means clustering to analyze big data. K-means is an algorithm that partitions data into k clusters based on similarity.
2) It provides background on big data characteristics like volume, variety, and velocity. It also discusses challenges of heterogeneous, decentralized, and evolving data.
3) The document proposes applying k-means clustering to big data to map data into clusters according to its properties in a fast and efficient manner. This allows statistical analysis and knowledge extraction from large, complex datasets.
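The partitioning idea summarized above can be sketched with a minimal, standard-library version of Lloyd's algorithm. This is an illustrative sketch only, not the summarized document's implementation: the function name `kmeans`, the choice to seed the centroids with the first k points, and the toy data are all assumptions made for the example.

```python
import math


def kmeans(points, k, iters=100):
    """Partition equal-length numeric tuples into k clusters (Lloyd's algorithm)."""
    # Assumption for reproducibility: the first k points seed the centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Stop once the assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Hypothetical toy data: two well-separated groups of 2-D points.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
labels, centroids = kmeans(points, k=2)
```

On data at the scale the summary has in mind, a distributed or mini-batch variant would be the more realistic choice; the loop above only shows the assign-then-update structure.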
Statistical analysis and data mining both involve analyzing data, but have different objectives. Statistical analysis aims to describe datasets, while data mining aims to model data to predict, simulate, and optimize. Statistical analysis uses established methodology and hypothesis testing on structured data, while data mining uses heuristics to uncover hidden patterns in large, complex datasets. Data science incorporates techniques from statistics, data mining, and other fields to extract meaningful knowledge from data.
The document discusses various challenges in social network analysis including collecting and extracting network data at scale from sources such as the web, validating automated data extraction methods, and developing algorithms and software that can analyze large and complex network datasets. It also outlines different network analysis methods, visualization and simulation techniques, and recommendations for how tools can better support networking, referrals, and workflows across multiple data sources and programs. Scaling methods and algorithms to very large network sizes and developing standards to integrate diverse data and tools are highlighted as key challenges.
The Survey of Data Mining Applications And Feature Scope IJCSEIT Journal
In this paper we focus on a variety of techniques, approaches, and research areas that
are helpful and regarded as important in the field of data mining technologies. Many multinational companies
and large organizations operate in different places in different countries. Each place of operation
may generate large volumes of data. Corporate decision makers require access to all such sources to
take strategic decisions. The data warehouse delivers significant business value by improving the
effectiveness of managerial decision-making. In an uncertain and highly competitive business
environment, the value of strategic information systems such as these is easily recognized; however, in
today’s business environment, efficiency or speed is not the only key to competitiveness. Such huge
amounts of data, available in the tera- to petabyte range, have drastically changed the areas
of science and engineering. To analyze, manage, and make decisions from such huge amounts of data,
we need data mining techniques, which are transforming many fields. This paper presents a
number of applications of data mining and also focuses on the scope of data mining, which will be helpful in
further research.
Characterizing Data and Software for Social Science Research Micah Altman
This presentation describes the landscape of data and software use across the social sciences in terms of the abstract dimensions of data and data use. It then examines three use cases.
Presentation for DASPOS < https://daspos.crc.nd.edu/index.php/workshops/workshop-2 > Workshop at JCDL.
A SURVEY OF LINK MINING AND ANOMALIES DETECTION IJDKP
This document discusses link mining and its application in detecting anomalies. It begins by defining link mining as focusing on discovering explicit links between objects, as opposed to data mining which aims to find patterns within datasets. The document then surveys different types of anomalies that can be detected through link mining, including contextual, point, collective, online, and distributed anomalies. It also discusses challenges in link mining like logical vs statistical dependencies and the skewed class distribution problem in link prediction. Applications of link mining mentioned include social networks, epidemiology, and bibliographic analysis. Overall, the document provides an overview of the emerging field of link mining and its relevance for detecting unusual or anomalous links within linked datasets.
Social media provides a natural platform for dynamic emergence of citizen (as) sensor communities, where the citizens share information, express opinions, and engage in discussions. Often such an Online Citizen Sensor Community (CSC) has stated or implied goals related to workflows of organizational actors with defined roles and responsibilities. For example, a community of crisis response volunteers may inform the prioritization of responses to resource needs (e.g., medical) to assist the managers of crisis response organizations. However, in CSC, there are challenges related to information overload for organizational actors, including finding reliable information providers and finding the actionable information from citizens. This threatens awareness and articulation of workflows to enable cooperation between citizens and organizational actors. CSCs supported by Web 2.0 social media platforms offer new opportunities and pose new challenges. This work addresses issues of ambiguity in interpreting unconstrained natural language (e.g., ‘wanna help’ appearing in both types of messages for asking and offering help during crises), sparsity of user and group behaviors (e.g., expression of specific intent), and diversity of user demographics (e.g., medical or technical professional) for interpreting user-generated data of citizen sensors. Interdisciplinary research involving social and computer sciences is essential to address these socio-technical issues in CSC, and to allow better accessibility to user-generated data at a higher level of information abstraction for organizational actors. This study presents a novel web information processing framework focused on actors and actions in cooperation, called Identify-Match-Engage (IME), which fuses top-down and bottom-up computing approaches to design a cooperative web information system between citizens and organizational actors. It includes (a) identification of action-related seeking-offering intent behaviors from short, unstructured text documents using a classification model based on both declarative and statistical knowledge, (b) matching of intentions about seeking and offering, and (c) engagement models of users and groups in CSC to prioritize whom to engage, by modeling context with social theories using features of users, their generated content, and their dynamic network connections in the user interaction networks. The results show greater modeling efficiency from the fusion of top-down knowledge-driven and bottom-up data-driven approaches than from conventional bottom-up approaches alone for modeling intent and engagement. Applications of this work include use of the engagement interface tool during recent crises to enable efficient citizen engagement for spreading critical information about prioritized needs, ensuring that citizens donate only required supplies. The engagement interface application also won the United Nations ICT agency ITU's Young Innovator 2014 award.
This document contains references from various papers and books on topics related to user modeling, information retrieval, recommender systems, and machine learning. Specifically, it lists over 30 references used in research on statistical machine learning, collaborative filtering algorithms, information needs modeling, language modeling, text clustering, and other areas related to adaptive information access and recommendation. The references are from a variety of published sources including conference proceedings, journals, books, and theses spanning the years 1979-2003.
Data has become an indispensable part of every economy, industry, organization, business
function and individual. Big Data is a term used to identify datasets whose size is
beyond the ability of typical database software tools to store, manage and analyze. Big
Data introduces unique computational and statistical challenges, including scalability and
storage bottlenecks, noise accumulation, spurious correlation and measurement errors. These
challenges are distinctive and require new computational and statistical paradigms. This
paper presents a literature review of Big Data mining and its issues and challenges,
with emphasis on the distinguishing features of Big Data. It also discusses some methods to deal
with big data.
This document contains information about a Data Mining and Warehousing course taught by Mr. Sagar Pandya at Medi-Caps University. The course code is IT3ED02 and it is a 3 credit course taught over 3 hours per week. The document provides details about the course units which include introductions to data mining, association and classification, clustering, and business analysis. It also lists reference textbooks and includes sections taught by Mr. Pandya on topics like the basics of data mining, techniques, applications and challenges.
The FACT platform is an open, federated AI system that evaluates news streams, assigns trust ratings to content and sources, and adjusts these ratings over time based on new stories. It includes memory and intelligence engines to generate narratives, produce counterfactuals, and rate the trustworthiness of articles. FACT is a distributed platform that federates through self-organization and novel human-AI interaction design. Its target audiences are citizens, journalists, and civic writers. The first year goals are to develop the FACT platform, run experiments with 500+ citizens, and launch a FACT reporting channel. The core team developing FACT has expertise in AI, computational modeling, and evaluating digital platforms and algorithms.
There is a rapid intertwining of sensors and mobile devices into the fabric of our lives. This has resulted in unprecedented growth in the number of observations from the physical and social worlds reported in the cyber world. Sensing and computational components embedded in the physical world are termed a Cyber-Physical System (CPS). The current science of CPS has yet to effectively integrate citizen observations into CPS analysis. We demonstrate the role of citizen observations in CPS and propose a novel approach to perform a holistic analysis of machine and citizen sensor observations. Specifically, we demonstrate the complementary, corroborative, and timely aspects of citizen sensor observations compared to machine sensor observations in Physical-Cyber-Social (PCS) Systems.
Physical processes are inherently complex and embody uncertainties. They manifest as machine and citizen sensor observations in PCS Systems. We propose a generic framework to move from observations to decision-making and actions in PCS systems consisting of: (a) PCS event extraction, (b) PCS event understanding, and (c) PCS action recommendation. We demonstrate the role of Probabilistic Graphical Models (PGMs) as a unified framework to deal with uncertainty, complexity, and dynamism that helps translate observations into actions. Data-driven approaches alone are not guaranteed to synthesize PGMs that accurately reflect real-world dependencies. To overcome this limitation, we propose to empower PGMs using declarative domain knowledge. Specifically, we propose four techniques: (a) automatic creation of massive training data for Conditional Random Fields (CRFs) using domain knowledge of entities, used in PCS event extraction, (b) Bayesian Network structure refinement using causal knowledge from ConceptNet, used in PCS event understanding, (c) knowledge-driven piecewise linear approximation of nonlinear time series dynamics using Linear Dynamical Systems (LDS), used in PCS event understanding, and (d) transformation of knowledge of goals and actions into a Markov Decision Process (MDP) model, used in PCS action recommendation.
We evaluate the benefits of the proposed techniques on real-world applications involving traffic analytics and Internet of Things (IoT).
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
The document discusses data mining and knowledge discovery in databases. It defines data mining as the nontrivial extraction of implicit and potentially useful information from large amounts of data. With huge increases in data collection and storage, data mining aims to analyze data and discover patterns that can provide insights and knowledge about businesses and the real world. The data mining process involves selecting, preprocessing, transforming, and analyzing data to extract hidden patterns and relationships, which are then interpreted and evaluated.
The document discusses analyzing political trends on social networks using the Hidden Markov Model. It begins by introducing how social network data can be analyzed to observe user behaviors and interests. It then discusses using NodeXL to gather Twitter data based on political keywords and applying the Hidden Markov Model to statistically analyze the data and determine what political topics people focus on most. Finally, it reviews related work where other researchers have used techniques like the Hidden Markov Model and social network analysis to gather and analyze data from social media platforms.
This document provides an introduction and overview of data mining. It discusses how data mining extracts knowledge from large amounts of data to discover hidden patterns and predict future trends. It notes that for effective data mining, data sets need to be extremely large. The document outlines some key techniques of data mining including associative learning, artificial neural networks, clustering, genetic algorithms, and hidden Markov models. It also discusses applications of data mining in bioinformatics such as gene finding, protein function prediction, and disease diagnosis. Finally, it acknowledges that while bioinformatics data is rich, developing comprehensive theories remains challenging but creates opportunities for novel knowledge discovery methods.
Stakeholder-centred Identification of Data Quality Issues: Knowledge that Can... Anastasija Nikiforova
This presentation is supplementary material for presenting the research paper "Stakeholder-centred Identification of Data Quality Issues: Knowledge that Can Save Your Business" (authored by Anastasija Nikiforova and Natalija Kozmina) at the International Conference on Intelligent Data Science Technologies and Applications (IDSTA2021), November 15-16, 2021, Tartu, Estonia (web-based).
Read paper here -> Nikiforova, A., & Kozmina, N. (2021, November). Stakeholder-centred Identification of Data Quality Issues: Knowledge that Can Save Your Business. In 2021 Second International Conference on Intelligent Data Science Technologies and Applications (IDSTA) (pp. 66-73). IEEE -> https://ieeexplore.ieee.org/abstract/document/9660802?casa_token=LFJa20LrXAwAAAAA:wVwhTcCPWqxdloAvDQ3-l98KkkLx70xzG3zNvIIkJbC6wvJ4VxwX_VGc3mmW_7c1T-QJlOtTiao
The document discusses a link mining methodology adapted from the CRISP-DM process to incorporate anomaly detection using mutual information. It applies this methodology in a case study of co-citation data. The methodology involves data description, preprocessing, transformation, exploration, modeling, and evaluation. Hierarchical clustering identified 5 clusters, with cluster 1 showing strong links and cluster 5 weak links. Mutual information validated the results, showing cluster 5 had the lowest mutual information, indicating independent variables. The case study demonstrated the approach can interpret anomalies semantically and be used with real-world data volumes and inconsistencies.
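The validation step mentioned above relies on mutual information being near zero when two variables are independent. A minimal sketch of that measure for two discrete variables follows; the function name `mutual_information` and the toy sequences are assumptions for illustration, and the summarized case study may estimate the quantity differently.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information, in bits, between two equal-length discrete sequences."""
    n = len(xs)
    px = Counter(xs)             # marginal counts of x
    py = Counter(ys)             # marginal counts of y
    pxy = Counter(zip(xs, ys))   # joint counts of (x, y)
    # I(X;Y) = sum over (x, y) of p(x,y) * log2( p(x,y) / (p(x) * p(y)) ),
    # rewritten in terms of counts: p(x,y) / (p(x) p(y)) = c * n / (count_x * count_y).
    return sum(
        (c / n) * math.log2((c * n) / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


# Hypothetical examples: identical variables share one bit, independent ones share none.
dependent = mutual_information([0, 0, 1, 1], [0, 0, 1, 1])
independent = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
```

A cluster whose member variables score close to zero against each other would, under this reading, correspond to the weakly linked cluster described in the summary.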
A Novel Approach of Data Driven Analytics for Personalized Healthcare through... IJMTST Journal
Although big data technologies appear to be overhyped, they promise extraordinary potential in medicine: if development happens in a coordinated environment, in combination with other modeling strategies, it should ensure steady improvement of in-silico medicine and lead to positive clinical adoption. The proposed research is intended to investigate the main issues in order to achieve an effective integration of big data analytics and effective modeling in healthcare.
Cluster Based Access Privilege Management Scheme for Databases Editor IJMTER
Knowledge discovery is carried out using data mining techniques. Association rule mining,
classification and clustering operations are carried out under data mining. Clustering is used to group
records based on their relevancy. Distance or similarity measures are used to estimate the transaction relationship.
Census data and medical data are referred to as microdata. Data publishing schemes are used to provide private data for
analysis. Privacy preservation is used to protect private data values. Anonymity is considered in the privacy
preservation process.
Data values are released to authorized users through access control models. The Privacy Protection Mechanism
(PPM) uses suppression and generalization of relational data to anonymize it and satisfy privacy needs. An accuracy-constrained
privacy-preserving access control framework is used to manage access control in relational databases. The
access control policies define the selection predicates available to roles, while the privacy requirement is to satisfy
k-anonymity or l-diversity. An imprecision bound constraint is assigned to each selection predicate. k-anonymous
Partitioning with Imprecision Bounds (k-PIB) is used to estimate accuracy and privacy constraints. Role-based Access
Control (RBAC) allows permissions on objects to be defined based on roles in an organization. The Top Down Selection
Mondrian (TDSM) algorithm is used for query workload-based anonymization. It is constructed using greedy heuristics
and a kd-tree model. Query cuts are selected with minimum
bounds in the Top-Down Heuristic 1 algorithm (TDH1). The query bounds are updated as the partitions are added to the
output in the Top-Down Heuristic 2 algorithm (TDH2). The cost of reduced precision in the query results is used in the Top-Down
Heuristic 3 algorithm (TDH3). A repartitioning algorithm is used to reduce the total imprecision for the queries.
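To make the k-anonymity requirement concrete, the following minimal sketch checks whether every combination of quasi-identifier values occurs at least k times, before and after a simple generalization. It is not the k-PIB or TDSM algorithm from the abstract; the helper names `is_k_anonymous` and `generalize_age`, the column names, and the sample rows are all hypothetical.

```python
from collections import Counter


def is_k_anonymous(rows, quasi_identifiers, k):
    """True if each quasi-identifier combination appears in at least k rows."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(count >= k for count in groups.values())


def generalize_age(row, width=10):
    """Generalization step: replace an exact age with a range of the given width."""
    low = (row["age"] // width) * width
    return {**row, "age": f"{low}-{low + width - 1}"}


# Hypothetical microdata: exact ages make every row unique.
rows = [
    {"age": 23, "zip": "452", "diagnosis": "flu"},
    {"age": 27, "zip": "452", "diagnosis": "asthma"},
    {"age": 31, "zip": "452", "diagnosis": "flu"},
    {"age": 38, "zip": "452", "diagnosis": "diabetes"},
]
before = is_k_anonymous(rows, ["age", "zip"], k=2)
after = is_k_anonymous([generalize_age(r) for r in rows], ["age", "zip"], k=2)
```

Generalizing ages into ten-year ranges trades precision for privacy, which is the same tension the imprecision bounds in the abstract are meant to control.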
The privacy-preserving access privilege management scheme is enhanced to provide incremental mining
features. Data insert, delete and update operations are connected with the partition management mechanism. Cell-level
access control is provided with a differential privacy method. A dynamic role management model is integrated with the
access control policy mechanism for query predicates.
The Challenge of Deeper Knowledge Graphs for Science Paul Groth
Over the past 5 years, we have seen multiple successes in the development of knowledge graphs for supporting science in domains ranging from drug discovery to social science. However, in order to really improve scientific productivity, we need to expand and deepen our knowledge graphs. To do so, I believe we need to address two critical challenges: 1) dealing with low resource domains; and 2) improving quality. In this talk, I describe these challenges in detail and discuss some efforts to overcome them through the application of techniques such as unsupervised learning; the use of non-experts in expert domains, and the integration of action-oriented knowledge (i.e. experiments) into knowledge graphs.
Building Effective Visualization Shiny WVF Olga Scrivner
This document provides an overview of web visualization tools and frameworks for business intelligence and data visualization. It discusses reactive web frameworks, the Shiny application framework from RStudio, and the Web Visualization Framework (WVF) developed by the Cyberinfrastructure for Network Science Center. Examples of visualizations created with Shiny and WVF are presented, including Sankey diagrams, streamgraphs, heatmaps, and network maps. The document concludes by discussing the future outlook for WVF and promoting an online course on information visualization.
Supporting Rationale Awareness in Large-Scale Online Open Participative Activ... Lu Xiao
This document outlines the speaker's research background and future work on supporting rationale awareness in large-scale online participative activities. It begins with an introduction to the speaker's interdisciplinary research interests at the intersection of information science, human-computer interaction, data science, and social sciences. It then defines rationales and rationale awareness, and discusses related computational approaches the speaker has developed to automatically detect rationales from online discussions. The document concludes by discussing the design of awareness tools to support rationale awareness and outlines areas of future work.
The document provides a literature review on data mining. It discusses data mining concepts such as classification and prediction. Data mining has roots in machine learning, statistics, and artificial intelligence. It involves extracting patterns from large datasets. The document outlines several uses and functions of data mining, including classification, clustering, and anomaly detection. It also gives examples of data mining applications in fields like medicine, banking, insurance, and electronic commerce.
Big Data Analytics : A Social Network Approach Andry Alamsyah
This document discusses using social network analysis approaches for big data analytics. It begins by introducing social network metrics like centrality and modularity that can be applied to large social network datasets. It then provides examples of how social network analysis has been used to detect terrorist cells and identify research communities. Finally, it outlines the author's research interests and publications in areas like sentiment analysis on social media and using social networks to analyze industries.
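Of the metrics named above, degree centrality is the simplest to illustrate: a node's score is the share of other nodes it is directly connected to. The sketch below assumes an undirected graph given as an edge list; the function name `degree_centrality` and the toy star network are assumptions for the example, not material from the summarized document.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Degree centrality for an undirected graph given as (u, v) edge pairs."""
    neighbours = defaultdict(set)
    for u, v in edges:
        if u != v:  # self-loops are ignored in this sketch
            neighbours[u].add(v)
            neighbours[v].add(u)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    # Normalize by the maximum possible number of neighbours, n - 1.
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}


# Hypothetical star network: "a" is connected to everyone else.
scores = degree_centrality([("a", "b"), ("a", "c"), ("a", "d")])
```

Modularity and the other centrality variants need more machinery (community assignments, shortest paths), so for large networks a graph library would be the practical route.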
The document discusses three potential divides that may emerge in big data research: 1) between developed and developing countries, 2) between academic and commercial sector researchers, and 3) between researchers with strong computational skills versus those with less computational skills. It provides examples of methods used in different country/region contexts and notes a critique of big data research around issues like changing definitions of knowledge, misleading claims of objectivity/accuracy, and new digital divides around data access.
June 2020: Top Read Articles in Advanced Computational Intelligence aciijournal
Advanced Computational Intelligence: An International Journal (ACII) is a quarterly open access peer-reviewed journal that publishes articles which contribute new results in all areas of computational intelligence. The goal of this journal is to bring together researchers and practitioners from academia and industry to focus on advanced computational intelligence concepts and to establish new collaborations in these areas.
PhD proposal: Specialized heuristics for crowdsourcing website design - donellemckinley
The document discusses research on developing heuristics to support the design and evaluation of GLAM and academic crowdsourcing websites. It aims to address the lack of empirically-based guidance for these projects. The research will use Action Design Research methodology to iteratively develop a set of specialized heuristics. These heuristics will provide a tool to help meet project objectives of sufficient participation and high-quality contributions. The heuristics will also support crowdsourcing website design and evaluation practice.
Data has become an indispensable part of every economy, industry, organization, business
function and individual. Big Data is a term used to identify datasets whose size is
beyond the ability of typical database software tools to store, manage and analyze. Big
Data introduces unique computational and statistical challenges, including scalability and
storage bottlenecks, noise accumulation, spurious correlation and measurement errors. These
challenges are distinctive and require new computational and statistical paradigms. This
paper presents a literature review of Big Data mining and its issues and challenges,
with emphasis on the distinguishing features of Big Data. It also discusses some methods to deal
with big data.
This document contains information about a Data Mining and Warehousing course taught by Mr. Sagar Pandya at Medi-Caps University. The course code is IT3ED02 and it is a 3 credit course taught over 3 hours per week. The document provides details about the course units which include introductions to data mining, association and classification, clustering, and business analysis. It also lists reference textbooks and includes sections taught by Mr. Pandya on topics like the basics of data mining, techniques, applications and challenges.
The FACT platform is an open, federated AI system that evaluates news streams, assigns trust ratings to content and sources, and adjusts these ratings over time based on new stories. It includes memory and intelligence engines to generate narratives, produce counterfactuals, and rate the trustworthiness of articles. FACT is a distributed platform that federates through self-organization and novel human-AI interaction design. Its target audiences are citizens, journalists, and civic writers. The first year goals are to develop the FACT platform, run experiments with 500+ citizens, and launch a FACT reporting channel. The core team developing FACT has expertise in AI, computational modeling, and evaluating digital platforms and algorithms.
There is a rapid intertwining of sensors and mobile devices into the fabric of our lives. This has resulted in unprecedented growth in the number of observations from the physical and social worlds reported in the cyber world. Sensing and computational components embedded in the physical world are termed a Cyber-Physical System (CPS). The current science of CPS has yet to effectively integrate citizen observations into CPS analysis. We demonstrate the role of citizen observations in CPS and propose a novel approach to perform a holistic analysis of machine and citizen sensor observations. Specifically, we demonstrate the complementary, corroborative, and timely aspects of citizen sensor observations compared to machine sensor observations in Physical-Cyber-Social (PCS) Systems.
Physical processes are inherently complex and embody uncertainties. They manifest as machine and citizen sensor observations in PCS Systems. We propose a generic framework to move from observations to decision-making and actions in PCS systems consisting of: (a) PCS event extraction, (b) PCS event understanding, and (c) PCS action recommendation. We demonstrate the role of Probabilistic Graphical Models (PGMs) as a unified framework to deal with uncertainty, complexity, and dynamism that help translate observations into actions. Data-driven approaches alone are not guaranteed to be able to synthesize PGMs reflecting real-world dependencies accurately. To overcome this limitation, we propose to empower PGMs using declarative domain knowledge. Specifically, we propose four techniques: (a) automatic creation of massive training data for Conditional Random Fields (CRFs) using domain knowledge of entities used in PCS event extraction, (b) Bayesian Network structure refinement using causal knowledge from Concept Net used in PCS event understanding, (c) knowledge-driven piecewise linear approximation of nonlinear time series dynamics using Linear Dynamical Systems (LDS) used in PCS event understanding, and (d) transforming knowledge of goals and actions into a Markov Decision Process (MDP) model used in PCS action recommendation.
We evaluate the benefits of the proposed techniques on real-world applications involving traffic analytics and Internet of Things (IoT).
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
The document discusses data mining and knowledge discovery in databases. It defines data mining as the nontrivial extraction of implicit and potentially useful information from large amounts of data. With huge increases in data collection and storage, data mining aims to analyze data and discover patterns that can provide insights and knowledge about businesses and the real world. The data mining process involves selecting, preprocessing, transforming, and analyzing data to extract hidden patterns and relationships, which are then interpreted and evaluated.
The document discusses analyzing political trends on social networks using the Hidden Markov Model. It begins by introducing how social network data can be analyzed to observe user behaviors and interests. It then discusses using NodeXL to gather Twitter data based on political keywords and applying the Hidden Markov Model to statistically analyze the data and determine what political topics people focus on most. Finally, it reviews related work where other researchers have used techniques like the Hidden Markov Model and social network analysis to gather and analyze data from social media platforms.
This document provides an introduction and overview of data mining. It discusses how data mining extracts knowledge from large amounts of data to discover hidden patterns and predict future trends. It notes that for effective data mining, data sets need to be extremely large. The document outlines some key techniques of data mining including associative learning, artificial neural networks, clustering, genetic algorithms, and hidden Markov models. It also discusses applications of data mining in bioinformatics such as gene finding, protein function prediction, and disease diagnosis. Finally, it acknowledges that while bioinformatics data is rich, developing comprehensive theories remains challenging but creates opportunities for novel knowledge discovery methods.
Stakeholder-centred Identification of Data Quality Issues: Knowledge that Can... - Anastasija Nikiforova
This presentation is supplementary material for presenting the research paper "Stakeholder-centred Identification of Data Quality Issues: Knowledge that Can Save Your Business" (authored by Anastasija Nikiforova and Natalija Kozmina) at the International Conference on Intelligent Data Science Technologies and Applications (IDSTA2021), November 15-16, 2021, Tartu, Estonia (web-based).
Read paper here -> Nikiforova, A., & Kozmina, N. (2021, November). Stakeholder-centred Identification of Data Quality Issues: Knowledge that Can Save Your Business. In 2021 Second International Conference on Intelligent Data Science Technologies and Applications (IDSTA) (pp. 66-73). IEEE -> https://ieeexplore.ieee.org/abstract/document/9660802?casa_token=LFJa20LrXAwAAAAA:wVwhTcCPWqxdloAvDQ3-l98KkkLx70xzG3zNvIIkJbC6wvJ4VxwX_VGc3mmW_7c1T-QJlOtTiao
The document discusses a link mining methodology adapted from the CRISP-DM process to incorporate anomaly detection using mutual information. It applies this methodology in a case study of co-citation data. The methodology involves data description, preprocessing, transformation, exploration, modeling, and evaluation. Hierarchical clustering identified 5 clusters, with cluster 1 showing strong links and cluster 5 weak links. Mutual information validated the results, showing cluster 5 had the lowest mutual information, indicating independent variables. The case study demonstrated the approach can interpret anomalies semantically and be used with real-world data volumes and inconsistencies.
A Novel Approach of Data Driven Analytics for Personalized Healthcare through... - IJMTST Journal
Although big data technologies appear to be overhyped and are claimed to have extraordinary potential in medicine, if their development happens in a coordinated setting, in combination with other modeling strategies, it should ensure a steady improvement of in-silico medicine and prompt positive clinical adoption. The proposed research is intended to investigate the key issues in order to achieve an effective integration of big data analytics and effective modeling in healthcare.
Cluster Based Access Privilege Management Scheme for Databases - Editor IJMTER
Knowledge discovery is carried out using data mining techniques. Association rule mining,
classification and clustering operations are carried out under data mining. Clustering is used to group
records based on their relevance. Distance or similarity measures are used to estimate the transaction relationship.
Census data and medical data are referred to as microdata. Data publishing schemes are used to provide private data for
analysis. Privacy preservation is used to protect private data values. Anonymity is considered in the privacy
preservation process.
Data values are allowed to authorized users using the access control models. Privacy Protection Mechanism
(PPM) uses suppression and generalization of relational data to anonymize and satisfy privacy needs. Accuracy-constrained privacy-preserving access control framework is used to manage access control in relational database. The
access control policies define selection predicates available to roles while the privacy requirement is to satisfy the k-anonymity or l-diversity. Imprecision bound constraint is assigned for each selection predicate. k-anonymous
Partitioning with Imprecision Bounds (k-PIB) is used to estimate accuracy and privacy constraints. Role-based Access
Control (RBAC) allows defining permissions on objects based on roles in an organization. Top Down Selection
Mondrian (TDSM) algorithm is used for query workload-based anonymization. The Top Down Selection Mondrian
(TDSM) algorithm is constructed using greedy heuristics and kd-tree model. Query cuts are selected with minimum
bounds in Top-Down Heuristic 1 algorithm (TDH1). The query bounds are updated as the partitions are added to the
output in Top-Down Heuristic 2 algorithm (TDH2). The cost of reduced precision in the query results is used in Top-Down Heuristic 3 algorithm (TDH3). Repartitioning algorithm is used to reduce the total imprecision for the queries.
The privacy preserved access privilege management scheme is enhanced to provide incremental mining
features. Data insert, delete and update operations are connected with the partition management mechanism. Cell level
access control is provided with differential privacy method. Dynamic role management model is integrated with the
access control policy mechanism for query predicates.
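As a rough illustration of the k-anonymity requirement mentioned in this abstract, the sketch below checks whether every combination of quasi-identifier values occurs at least k times, before and after a simple generalization step. It is not the k-PIB or TDSM algorithm from the paper; the function names, the sample records and the choice of quasi-identifiers are all hypothetical.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every quasi-identifier combination occurs at least k times."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())


def generalize_age(record, width=10):
    """Replace an exact age with a coarse range (a simple generalization step)."""
    low = (record["age"] // width) * width
    return {**record, "age": f"{low}-{low + width - 1}"}


# Hypothetical microdata; "age" and "zip" are treated as quasi-identifiers.
raw = [
    {"age": 23, "zip": "130**", "diagnosis": "flu"},
    {"age": 27, "zip": "130**", "diagnosis": "asthma"},
    {"age": 34, "zip": "148**", "diagnosis": "flu"},
    {"age": 38, "zip": "148**", "diagnosis": "diabetes"},
]
generalized = [generalize_age(r) for r in raw]

print(is_k_anonymous(raw, ["age", "zip"], k=2))          # exact ages are unique
print(is_k_anonymous(generalized, ["age", "zip"], k=2))  # age ranges are shared
```

Generalizing ages into ranges trades query precision for privacy, which is the tension the imprecision bounds in the abstract are meant to control.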
The Challenge of Deeper Knowledge Graphs for Science - Paul Groth
Over the past 5 years, we have seen multiple successes in the development of knowledge graphs for supporting science in domains ranging from drug discovery to social science. However, in order to really improve scientific productivity, we need to expand and deepen our knowledge graphs. To do so, I believe we need to address two critical challenges: 1) dealing with low resource domains; and 2) improving quality. In this talk, I describe these challenges in detail and discuss some efforts to overcome them through the application of techniques such as unsupervised learning; the use of non-experts in expert domains, and the integration of action-oriented knowledge (i.e. experiments) into knowledge graphs.
Building Effective Visualization Shiny WVF - Olga Scrivner
This document provides an overview of web visualization tools and frameworks for business intelligence and data visualization. It discusses reactive web frameworks, the Shiny application framework from RStudio, and the Web Visualization Framework (WVF) developed by the Cyberinfrastructure for Network Science Center. Examples of visualizations created with Shiny and WVF are presented, including Sankey diagrams, streamgraphs, heatmaps, and network maps. The document concludes by discussing the future outlook for WVF and promoting an online course on information visualization.
Supporting Rationale Awareness in Large-Scale Online Open Participative Activ... - Lu Xiao
This document outlines the speaker's research background and future work on supporting rationale awareness in large-scale online participative activities. It begins with an introduction to the speaker's interdisciplinary research interests at the intersection of information science, human-computer interaction, data science, and social sciences. It then defines rationales and rationale awareness, and discusses related computational approaches the speaker has developed to automatically detect rationales from online discussions. The document concludes by discussing the design of awareness tools to support rationale awareness and outlines areas of future work.
The document discusses the emergence of data-driven science and computational social science. It covers several key areas:
- The growth of computational approaches and use of digital tools to manage large datasets in social science research.
- Debate around the role of theory and whether big data means the "end of theory". While data can provide insights, context from experts is still needed.
- The development of new research areas like data science, computational social science, and webometrics that utilize digital methods and focus on analyzing online data.
- Challenges in the field including uneven global development of data skills and divides between computational and non-computational researchers.
ABSTRACT: Computational social science (CSS) is an academic discipline that combines the traditional social sciences with computer science. While social scientists provide research questions, data sources, and acquisition methods, computer scientists contribute mathematical models and computational tools. CSS uses computational methods and statistical tools to analyze and model social phenomena, social structures, and human social behavior. The purpose of this paper is to provide a brief introduction to computational social science.
Key Words: computational social science, social-computational systems, social simulation models, agent-based models
From Text to Data to the World: The Future of Knowledge Graphs - Paul Groth
Keynote Integrative Bioinformatics 2018
https://docs.google.com/document/d/1E7D4_CS0vlldEcEuknXjEnSBZSZCJvbI5w1FdFh-gG4/edit
Can we improve research productivity through providing answers stemming from knowledge graphs? In this presentation, I discuss different ways of building and combining knowledge graphs.
The AIRCC's International Journal of Computer Science and Information Technology (IJCSIT) is devoted to fields of Computer Science and Information Systems. The IJCSIT is an open access peer-reviewed scientific journal published in electronic form as well as print form. The mission of this journal is to publish original contributions in its field in order to propagate knowledge amongst its readers and to be a reference publication.
This document provides information about a computational intelligence and soft computing course including the instructor's contact information, class times, required text, and an overview of upcoming lectures on data mining with neural networks. It then discusses key issues in data mining such as theory, methods/algorithms, processes, applications, and tools/techniques. Several example data mining projects are also summarized along with homework and exam due dates for the course.
This document provides an introduction to data mining concepts and techniques. It discusses why data mining is needed due to the massive growth of data. It defines data mining as the extraction of interesting patterns from large datasets. The document outlines the key steps in the knowledge discovery process and how data mining fits within business intelligence applications. It also describes different types of data that can be mined and popular data mining algorithms.
Open Grid Forum workshop on Social Networks, Semantic Grids and Web - Noshir Contractor
Workshop organized by David De Roure at the Open Grid Forum XIX. Other participants included Carole Goble, Jeremy Frey, Pamela Fox.
January 29, 2007, Chapel Hill, NC
Top 5 most viewed articles from academia in 2019 - gerogepatton
The International Journal of Artificial Intelligence & Applications (IJAIA) is a bimonthly open access peer-reviewed journal that publishes articles which contribute new results in all areas of Artificial Intelligence and its applications. It is an international journal intended for professionals and researchers in all fields of AI, including programmers and software and hardware manufacturers. The journal also aims to publish new attempts in the form of special issues on emerging areas in Artificial Intelligence and applications.
Positioning and presenting design science research for maximum impact - Nauman Shahid
Design science research (DSR) has staked its rightful ground as an important and legitimate Information
Systems (IS) research paradigm. We contend that DSR has yet to attain its full potential impact on the development
and use of information systems due to gaps in the understanding and application of DSR concepts and
methods. This essay aims to help researchers (1) appreciate the levels of artifact abstractions that may be DSR
contributions, (2) identify appropriate ways of consuming and producing knowledge when they are preparing
journal articles or other scholarly works, (3) understand and position the knowledge contributions of their
research projects, and (4) structure a DSR article so that it emphasizes significant contributions to the knowledge
base. Our focal contribution is the DSR knowledge contribution framework with two dimensions based
on the existing state of knowledge in both the problem and solution domains for the research opportunity under
study. In addition, we propose a DSR communication schema with similarities to more conventional publication
patterns, but which substitutes the description of the DSR artifact in place of a traditional results section.
We evaluate the DSR contribution framework and the DSR communication schema via examinations of DSR
exemplar publications.
It is clear from the preceding that every “art” [technique] has its speculative and its practical side. Its speculation
is the theoretical knowledge of the principles of the technique; its practice is but the habitual and instinctive
application of these principles. It is difficult if not impossible to make much progress in the application
without theory; conversely, it is difficult to understand the theory without knowledge of the technique.
The AIRCC's International Journal of Computer Science and Information Technology (IJCSIT) is devoted to fields of Computer Science and Information Systems. The IJCSIT is a peer-reviewed scientific journal published in electronic form as well as print form. The mission of this journal is to publish original contributions in its field in order to propagate knowledge amongst its readers and to be a reference publication.
This document discusses the rise of big data and data science. It notes that while data volumes are growing exponentially, data alone is just an asset - it is data scientists that create value by building data products that provide insights. The document outlines the data science workflow and highlights both the tools used and challenges faced by data scientists in extracting value from big data.
The workshop opens with a discussion of how to repurpose digital "methods of the medium" for social and cultural scholarly research, including its limitations, critiques and ethics. Subsequently participants are trained in using digital methods in hands-on sessions. How to use crawlers for dynamic URL sampling and issue network mapping? How to employ scrapers to create a bias or partisanship diagnostic instrument? We also consider how to deploy online platforms for social research. How to transform Wikipedia from an online encyclopaedia to a device for cross-cultural memory studies? How to make use of social media so as to profile the preferences and tastes of politicians’ friends, and also locate most engaged with content? How to make use of Twitter analytics to debanalize tweets, and provide compelling accounts of events on the ground? Finally, the workshop turns to the question of employing web data and metrics as societal indices more generally.
18. References
[1] Chandola V, Banerjee A and Kumar V "Anomaly Detection: A Survey", ACM Computing Surveys, 41(3), Article
15, 1-58, 2009.
[2] Hodge V and Austin J "A Survey of Outlier Detection Methodologies", Artificial Intelligence Review, 22(2),
85–126, 2004.
[3] Hassanzadeh R Anomaly Detection in Online Social Networks: Using Data Mining Techniques and Fuzzy
Logic, School of Electrical Engineering and Computer Science, Queensland University of Technology, 2014.
[4] Shearer C "The CRISP-DM model: The New Blueprint for Data Mining", Journal of Data Warehousing, 5,
13-22, 2000.
[5] Tran D, Ma W and Sharma D "Network Anomaly Detection Using Fuzzy Gaussian Mixture Models",
International Journal of Future Generation Communication and Networking, 1(1), 37-42, 2006.
[6] Kaur H and Gill N "Host Based Anomaly Detection Using Fuzzy Genetic Approach (FGA)", International
Journal of Computer Applications, 74(20), 2013.
[7] Barani F and Abadi M "A Hybrid Approach Based on Negative Selection and Artificial Bee Colony Algorithms for Anomaly Detection in
Mobile Ad Hoc Networks", International Conference of the Iranian Society of Cryptology, Ferdowsi University of Mashhad, Shahrivar 1390 SH.
[8] Rahmani Manesh M and Jalili S "Anomaly Detection in Cluster-Based Ad Hoc Networks Using a Fuzzy Voting Method", Iranian Journal of Electrical
and Computer Engineering, 10(2), Winter 1391 SH.
[9] Izakian H and Pedrycz W "Anomaly Detection in Time Series Data using a Fuzzy C-Means Clustering" Fifth
International Symposium on Computational Intelligence and Design (ISCID), IEEE, 1513-1518, China, 2013.
[10] Rabatel J, Bringay S and Poncelet P "Fuzzy Anomaly Detection in Monitoring Sensor Data" 18th
IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), IEEE, 1-8, Spain, 2011.
19. References
[11] Gómez J, González F and Dasgupta D "An Immuno-Fuzzy Approach to Anomaly Detection" The 12th IEEE
International Conference on Fuzzy Systems, 2, 1219-1224, USA, 2003.
[12] Parvath Devi K and Siva Prasad YA "Study of Anomaly Identification Techniques in Large scale systems",
International Journal of Computer Trends and Technology, 3(1), 11-17, 2012.
[13] Doostari M A, Zeinali R, Lashkari H and Ajamzamani M "Anomaly Detection in Cliques of Online Social
Networks Using Fuzzy Node-Fuzzy Graph", Journal of Basic and Applied Scientific Research, 3(8), 614-626,
2013.
[14] Shamshirband Sh, Anuar N B, Mat Kiah L and Misra S "Anomaly Detection using Fuzzy Q-learning
Algorithm", Acta Polytechnica Hungarica, 11(8), 5-28, 2014.
[15] Jabez J and Muthukumar B "Intrusion Detection System (IDS): Anomaly Detection using Outlier Detection
Approach", Procedia Computer Science, sciencedirect, 48, 338-346, 2015.
[16] Savage D, Zhang X, Yu X, Chou P and Wang Q "Anomaly Detection in Online Social Networks", Social
Networks, sciencedirect, 39, 62-70, 2014.
[17] www.systems-thinking.org/dikw/dikw.htm, Bellinger G, Castro D and Mills A, 2 January 2017, 23:00.
[18] Mouthann N Effects of Big Data Analytic on Organization's Value Creation, Faculty of science and faculty of
economics and business, University of Amsterdam, 2012.
[19] www.computable.nl/artikel/opinie/business-intelligence/4485335/1509029/big-data-wat-was-de-vraag.html,
Ras P, 2 January 2017, 23:30.
[20] www.sync.nl/de-opvolger-van-de-cloud-big-data, Leeuwen R V, 2 January 2017, 23:30.
20. References
[21] Han J, Kamber M and Cerra D Data Mining: Concepts and Techniques, Morgan Kaufmann, Waltham, USA, 2012.
[22] Kantardzic M Data Mining: Concepts, Models, Methods and Algorithms, Second Edition, John Wiley & Sons, Hoboken, USA, 2011.
[23] Tamilselvi R and Kalaiselvi S "An Overview of Data Mining Techniques and Applications", International Journal of Science and Research (IJSR), 2(2), 506-509, 2013.
[24] www.thearling.com/text/dmwhite/dmwhite.htm, Gartner Group High Performance Computing Research, 2 January 2017, 16:30.
[25] Tan P, Steinbach M and Kumar V Introduction to Data Mining, Addison Wesley, 2005.
[26] Oracle BI Group Oracle Data Mining Concepts, Release 1, Oracle, 2008.
[27] Gibert K, Sànchez-Marrè M and Codina V "Choosing the Right Data Mining Technique: Classification of Methods and Intelligent Recommendation", International Congress on Environmental Modelling and Software Modelling for Environment’s Sake, International Environmental Modelling and Software Society (iEMSs), Canada, 2010.
[28] Cheeseman P and Oldford R W Selecting Models from Data, Springer, 1994.
[29] www.data-mining.philippefournierviger.com/whatarethestepstoimplementadataminingalgorithm, Fournier Viger Ph, 2 January 2017, 20:30.
[30] www.msdn.microsoft.com/en-us/library/ms175595.aspx, Anon, 2 January 2017, 21:00.
[31] Witten I H, Frank E and Hall M A Data Mining: Practical Machine Learning Tools and Techniques, 3rd Edition, Morgan Kaufmann, Waltham, USA, 2011.
21. References
[32] Chikohora T T "A Study of the Factors Considered when Choosing an Appropriate Data Mining Algorithm", International Journal of Soft Computing and Engineering (IJSCE), 4(3), 42-45, July 2014.
[33] www.ibm.com/developerworks/ru/library/ba-data-mining-techniques, Brown M, 2 January 2017, 21:30.
[34] www.iaonline.theiia.org/data-mining-101-tools-and-techniques, Silltow J, 2 January 2017, 21:45.
[35] Chung M H and Gray P "Special Section: Data Mining", Journal of Management Information Systems, 16(1), 1999.
[36] Chapman P, Clinton J, Kerber R, Khabaza T, Reinartz T, Shearer C and Wirth R CRISP-DM 1.0, SPSS, 2000.
[37] Marsala Ch and Bouchon-Meunier B "Fuzzy Data Mining and Management of Interpretable and Subjective Information", Elsevier Fuzzy Sets and Systems Journal, 281, 252–259, December 2015.
[38] Casillas J, Cordón O, Herrera F and Magdalena L "Interpretability Improvements to Find the Balance Interpretability-Accuracy in Fuzzy Modeling: An Overview", in Interpretability Issues in Fuzzy Modeling, Springer, Berlin Heidelberg, 128, 3–22, 2003.
[39] Elouedi Z, Mellouli K and Smets P "Belief Decision Trees: Theoretical Foundations", International Journal of Approximate Reasoning, 28, 91–124, 2001.
[40] Bujnowski P, Szmidt E and Kacprzyk J "Intuitionistic Fuzzy Decision Trees: A New Approach", 13th International Conference on Artificial Intelligence and Soft Computing (ICAISC), Vol 8467 of Lecture Notes in Computer Science, Springer, 181–192, 2014.
[41] www.lrs.ed.uiuc.edu/TSE-portal/analysis/social-network-analysis, Gretzel U, 2 January 2017, 18:00.
[42] Hogan B, Carrasco J A and Wellman B "Visualizing Personal Networks: Working with Participant-aided Sociograms", Field Methods, Sage Publications, 19(2), 116-144, May 2007.
22. References
[43] Freeman L The Development of Social Network Analysis, Empirical Press, Vancouver, 2006.
[44] Otte E and Rousseau R "Social Network Analysis: A Powerful Strategy, Also for the Information Sciences", Journal of Information Science, 28(6), 441-453, 2002.
[45] McPherson M, Smith-Lovin L and Cook J M "Birds of a Feather: Homophily in Social Networks", Annual Review of Sociology, 27, 415–444, 2001.
[46] Podolny J M and Baron J N "Resources and Relationships: Social Networks and Mobility in the Workplace", American Sociological Review, 62(5), 673-693, 1997.
[47] Flynn F J, Reagans R E and Guillory L "Do You Two Know Each Other? Transitivity, Homophily and the Need for (Network) Closure", Journal of Personality and Social Psychology, 99(5), November 2010.
[48] Tsvetovat M and Kouznetsov A Social Network Analysis for Startups: Finding Connections on the Social Web, O'Reilly, 2011.
[49] Guandong X, Xu Yu J and Lee W Web Mining and Social Networking: Techniques and Applications, Springer, 2012.
[50] www.intersci.ss.uci.edu/wiki/index.php/Cohesive_blocking, 2 January 2017, 23:30.
[51] Hanneman R A and Riddle M "Concepts and Measures for Basic Network Analysis", The Sage Handbook of Social Network Analysis, Sage, 346–347, 2005.
[52] Pattillo J, Youssef N and Butenko S "On Clique Relaxation Models in Social Network Analysis", European Journal of Operational Research, 226, 9-18, 2013.
[53] Kurka D B, Godoy A and Von Zuben F J "Online Social Network Analysis: A Survey of Research Applications in Computer Science", The Computing Research Repository (CoRR), 2016.
23. References
[54] Teng H, Chen K and Lu S "Adaptive Real-Time Anomaly Detection Using Inductively Generated Sequential Patterns", IEEE Computer Society Symposium on Research in Security and Privacy, IEEE Computer Society Press, 278–284, 1990.
[55] Rousseeuw P J and Leroy A M Robust Regression and Outlier Detection, John Wiley & Sons, USA, 1987.
[56] Markou M and Singh S "Novelty Detection: A Review-Part 1: Statistical Approaches", Signal Processing, Elsevier, 83(12), 2481–2497, 2003.
[57] Markou M and Singh S "Novelty Detection: A Review-Part 2: Neural Network Based Approaches", Signal Processing, Elsevier, 83(12), 2499–2521, 2003.
[58] Saunders R and Gero J "The Importance of Being Emergent", Conference of Artificial Intelligence in Design, 2000.
[59] Song X, Wu M, Jermaine C and Ranka S "Conditional Anomaly Detection", IEEE Transactions on Knowledge and Data Engineering, 19(5), 631–645, 2007.
[60] Weigend A S, Mangeas M and Srivastava A N "Nonlinear Gated Experts for Time Series: Discovering Regimes and Avoiding Overfitting", International Journal of Neural Systems, 6(4), 373–399, 1995.
[61] Salvador S and Chan P "Learning States and Rules for Time-Series Anomaly Detection", Tech Rep in Department of Computer Science, Florida Institute of Technology, Melbourne, 2003.
[63] Shekhar S, Lu C T and Zhang P "Detecting Graph-Based Spatial Outliers: Algorithms and Applications (A Summary of Results)", 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM Press, USA, 371–376, 2001.
24. References
[64] Phoha V V The Springer Internet Security Dictionary, Springer Verlag, 2002.
[65] Denning D E "An Intrusion Detection Model", IEEE Transactions on Software Engineering, 13(2), 222–232, 1987.
[66] Fawcett T and Provost F "Activity Monitoring: Noticing Interesting Changes in Behavior", 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM Press, 53–62, 1999.
[67] Albrecht S, Busch J, Kloppenburg M, Metze F and Tavan P "Generalized Radial Basis Function Networks for Classification and Novelty Detection: Self-Organization of Optimal Bayesian Decision", Neural Networks, 13(10), 1075–1093, 2000.
[68] Emamian V, Kaveh M and Tewfik A "Robust Clustering of Acoustic Emission Signals Using the Kohonen Network", IEEE International Conference of Acoustics, Speech and Signal Processing, IEEE Computer Society, IEEE, 6, 3891-3894, Turkey, 2000.
[69] Crook P and Hayes G "A Robot Implementation of a Biologically Inspired Method for Novelty Detection", Towards Intelligent Mobile Robots Conference, Division of Informatics, UK, 2001.
[70] Crook P A, Marsland S, Hayes G and Nehmzow U "A Tale of Two Filters: On-Line Novelty Detection", International Conference on Robotics and Automation, 3894–3899, 2002.
[71] Marsland S, Nehmzow U and Shapiro J "A Model of Habituation Applied to Mobile Robots", Towards Intelligent Mobile Robots (TIMR), Bristol, 1999.
[72] Marsland S, Nehmzow U and Shapiro J "Novelty Detection for Robot Neotaxis", 2nd International Symposium on Neural Computation, 554–559, 2000.
[73] Marsland S, Nehmzow U and Shapiro J "A Real-Time Novelty Detector for a Mobile Robot", EUREL Conference on Advanced Robotics Systems, The Computing Research Repository (CoRR), 2000.
25. References
[74] Sun J, Qu H, Chakrabarti D and Faloutsos C "Neighborhood Formation and Anomaly Detection in Bipartite Graphs", 5th IEEE International Conference on Data Mining, IEEE Computer Society, Washington, DC, USA, 418–425, 2005.
[75] Ide T and Kashima H "Eigenspace-Based Anomaly Detection in Computer Systems", 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM Press, USA, 440–449, 2004.
[76] MacDonald J W and Ghosh D "Copa–Cancer Outlier Profile Analysis", Bioinformatics, 22(23), 2950–2951, 2007.
[77] Tibshirani R and Hastie T "Outlier Sums for Differential Gene Expression Analysis", Biostatistics, 8(1), 2–8, 2007.
[78] Lu C T, Chen D and Kou Y "Algorithms for Spatial Outlier Detection", 3rd International Conference on Data Mining, 597–600, 2003.
[79] Lin S and Brown D E "An Outlier-Based Data Association Method for Linking Criminal Incidents", 3rd SIAM Data Mining Conference, 2003.
[80] Dutta H, Giannella C, Borne K and Kargupta H "Distributed Top-K Outlier Detection in Astronomy Catalogs Using the Demac System", 7th SIAM International Conference on Data Mining, 2007.
[81] Kou Y, Lu C T and Chen D "Spatial Weighted Outlier Detection", SIAM Conference on Data Mining, 2006.
[82] Forrest S, Warrender C and Pearlmutter B "Detecting Intrusions Using System Calls: Alternate Data Models", IEEE ISRSP, IEEE Computer Society, USA, 133–145, 1999.
[83] Sun P, Chawla S and Arunasalam B "Mining for Outliers in Sequential Databases", SIAM International Conference on Data Mining, 2006.
26. References
[84] Noble C C and Cook D J "Graph-Based Anomaly Detection", 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM Press, 631–636, 2003.
[85] Kumar V "Parallel and Distributed Computing for Cybersecurity", IEEE Distributed Systems Online, 6(10), 2005.
[86] Spence C, Parra L and Sajda P "Detection, Synthesis and Compression in Mammographic Image Analysis With a Hierarchical Image Probability Model", IEEE Workshop on Mathematical Methods in Biomedical Image Analysis, IEEE Computer Society, USA, 3, 2001.
[87] Fujimaki R, Yairi T and Machida K "An Approach to Spacecraft Anomaly Detection Problem Using Kernel Feature Space", 11th ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, ACM Press, USA, 401–410, 2005.
[88] Edgeworth F Y "On Discordant Observations", Philosophical Magazine, 23(5), 364–375, 1887.
[89] Abdulsahib A Kh "Anomaly Detection In Text Data That Represented As A Graph Using DBSCAN Algorithm", Journal Of Theoretical And Applied Information Technology, 95(9), 2017.
[90] Carvalho L, Teixeira C, Dias E, Meira W and Carvalho O "A Simple And Effective Method For Anomaly Detection In Health Care", SDM DMMH Workshop, 2015.
[91] Nian Ke, Zhang H, Tayal A, Coleman Th and Li Y "Auto Insurance Fraud Detection Using Unsupervised Spectral Ranking For Anomaly", The Journal Of Finance And Data Science, 58-75, 2016.
[92] Kroll B, Schaffranek D, Schriegel S and Niggemann O "System Modelling Based On Machine Learning For Anomaly Detection And Predictive Maintenance In Industrial Plants", IEEE Emerging Technology And Factory Automation (ETFA), 2014.
27. References
[93] Ghosh A, Qin Sh, Lee J and Wang G "PLAT: An Automated Fault And Behavioral Anomaly Detection Tool For PLC Controlled Manufacturing Systems", Computational Intelligence And Neuroscience, Hindawi Publishing Corporation, 2016.
[94] Boyd D M and Ellison N B "Social Network Sites: Definition, History, and Scholarship", Journal of Computer Mediated Communication, 13(1), 210-230, 2008.
[95] Sundén J Material Virtualities: Approaching Online Textual Embodiment, Peter Lang Publishing, 2003.
[96] Tang L "Online Friendship", Encyclopedia of Cyber Behavior, 412-421, 2012.
[97] www.aic.gov.au/en/publications/current%20series/rpp/100-120/rpp103.aspx, Choo K R, 2 January 2017, 18:30.
[98] www.socialmediatoday.com/pamdyer/564409/6-types-social-media-users, Rozen D, Askalani M and Senn T, 2 January 2017, 19:30.
[99] Kumar R, Novak J and Tomkins A "Structure and Evolution of Online Social Networks", in Link Mining: Models, Algorithms, and Applications, Springer, 337-357 (book chapter), 2010.
[100] Thelwall M "Social Networks, Gender, and Friending: An Analysis of MySpace Member Profiles", Journal of the American Society for Information Science and Technology, 59(8), 1321-1330, 2008.
[101] Gupta A, Sycara K P, Gordon G J and Hefny A "Exploring Friend's Influence in Cultures in Twitter", Advances in Social Networks Analysis and Mining (ASONAM), IEEE ACM International Conference, 584-591, USA, 2013.
[102] Akoglu L, McGlohon M and Faloutsos C "OddBall: Spotting Anomalies in Weighted Graphs", 14th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, Springer, 410-421, Berlin, 2010.
28. References
[104] Gjoka M, Kurant M, Butts C T and Markopoulou A "Walking in Facebook: A Case Study of Unbiased Sampling of OSNs", INFOCOM, IEEE, 1-9, 2010.
[105] Gross R and Acquisti A "Information Revelation and Privacy in Online Social Networks", ACM Workshop on Privacy in the Electronic Society, 71-80, USA, 2005.
[106] Hodge M J "The Fourth Amendment and Privacy Issues on the New Internet: Facebook.com and Myspace.com", ULJ, 31(2006), 95-123, 2006.
[107] Ugander J, Karrer B, Backstrom L and Marlow C "The Anatomy of the Facebook Social Graph", The Computing Research Repository (CoRR), 2011.
[108] Newman M E, Watts D J and Strogatz S H "Random Graph Models of Social Networks", Proceedings of the National Academy of Sciences of the United States of America, 99(1), 2566-2572, 2002.
[109] Chakrabarti D, Faloutsos C and McGlohon M Managing and Mining Graph Data, 69-123 (1 Chapter), Springer, US, 2010.
[110] Guzman Jh and Poblete B "On-line Relevant Anomaly Detection in the Twitter Stream: An Efficient Bursty Keyword Detection Model", ACM SIGKDD Workshop on Outlier Detection and Description, USA, 31-39, 2013.
[111] Kaur R and Singh S "A Survey of Data Mining and Social Network Analysis Based Anomaly Detection Techniques", Egyptian Informatics Journal, 17, 199–216, 2016.
[112] Rawat A, Gugnani G, Shastri M and Kumar P "Anomaly Recognition in Online Social Networks", International Journal of Security and Its Applications, 9(7), 109-118, 2015.
[113] Viswanath B, Bashir A B, Crovella M and Guha S "Towards Detecting Anomalous User Behavior in Online Social Networks", 23rd USENIX Security Symposium, USA, 2014.
29. References
[114] Egele M, Stringhini G, Kruegel Ch and Vigna G "COMPA: Detecting Compromised Accounts on Social Networks", 20th Annual Network & Distributed System Security Symposium, USA, 2013.
[115] Altshuler Y, Fire M, Shmueli E, Elovici E, Bruckstein A, Pentland A and Lazer D "Detecting Anomalous Behaviors Using Structural Properties of Social Networks", 6th International Conference, 433-440, USA, 2013.
[116] Charrad M, Ghazzali N, Boiteau V and Niknafs A "Package NbClust V3.0", The Comprehensive R Archive Network, 2015.
[117] Breunig M, Kriegel H and Sander J "LOF: Identifying Density-based Local Outliers", ACM SIGMOD International Conference on Management of Data, 93–104, 2000.
[118] Ester M, Kriegel H-P, Sander J and Xu X "A Density-based Algorithm for Discovering Clusters in Large Spatial Databases with Noise", Second International Conference on Knowledge Discovery and Data Mining (KDD-96), 226-231, USA, 1996.
[119] Powers D M W "Evaluation: From Precision, Recall and F-Measure to ROC, Informedness, Markedness and Correlation", Journal of Machine Learning Technologies, 2(1), 37–63, 2011.