Requirements engineering (RE), as a part of the project development life cycle, has increasingly been recognized as key to ensuring on-time, on-budget, and goal-based delivery of software projects. RE of big data projects is even more crucial because of the rapid growth of big data applications over the past few years. Data processing, as part of big data RE, is essential to driving the RE process successfully. Businesses can be overwhelmed by data yet underwhelmed by the information extracted from it, which makes data processing critical in big data projects. Traditional data processing techniques fail to uncover useful information because of the defining characteristics of big data: high volume, velocity, and variety. Process mining can benefit data processing and, in turn, increase the productivity of big data projects. This paper highlights the capability of process mining in big data RE to discover valuable insights and business value from system event logs and processes. It also proposes a big data requirements engineering framework, named REBD, that helps software requirements engineers overcome many challenges of big data RE.
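A minimal sketch of the kind of event-log analysis process mining performs: counting how often one activity directly follows another across cases. The log, activity names, and directly-follows counting are illustrative assumptions, not the paper's REBD framework itself; dedicated libraries such as pm4py go much further.

```python
from collections import Counter, defaultdict

# Hypothetical event log: (case_id, activity) pairs, already ordered by time.
event_log = [
    ("c1", "elicit"), ("c1", "analyze"), ("c1", "specify"), ("c1", "validate"),
    ("c2", "elicit"), ("c2", "specify"), ("c2", "analyze"), ("c2", "validate"),
]

# Group events into one trace per case.
traces = defaultdict(list)
for case_id, activity in event_log:
    traces[case_id].append(activity)

# Count how often activity a is directly followed by activity b.
directly_follows = Counter()
for trace in traces.values():
    for a, b in zip(trace, trace[1:]):
        directly_follows[(a, b)] += 1

for (a, b), n in directly_follows.most_common():
    print(f"{a} -> {b}: {n}")
```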
The document discusses tools and techniques for big data analytics, including A/B testing, crowdsourcing, machine learning, and data mining. It provides an overview of the big data analysis pipeline, including data acquisition, information extraction, integration and representation, query processing and analysis, and interpretation. The document also discusses fields where big data is relevant like industry, healthcare, and research. It analyzes tools like A/B testing, machine learning, and data mining techniques in more detail.
This document provides an evaluation of big data analytics projects and discusses the Project Predictive Analytics (PPA) approach. It finds that many big data projects fail, with failure rates estimated between 60-85%. Common reasons for failure include a lack of managers with the appropriate competencies for the project. The document argues that effective project manager assignment is critical for project success, as the manager provides leadership, planning, and control. However, assigning the right project manager can be challenging due to issues like a lack of information on effective assignments and the difficulty of predicting human behavior. The criteria for project manager assignment should match the manager's competencies with the requirements of the specific project.
This document summarizes a literature review paper on big data analytics. It begins by defining big data as large datasets that are difficult to handle with traditional tools due to their size, variety, and velocity. It then discusses how big data analytics applies advanced analytics techniques to big data to extract valuable insights. The paper reviews literature on big data analytics tools and methods for storage, management, and analysis of big data. It also discusses opportunities that big data analytics provides for decision making in various domains.
Big Data has triggered an influx of research and perspectives on concepts and processes that previously belonged to the Data Warehouse field. Some conclude that the Data Warehouse as such will disappear; others present Big Data as the natural evolution of the Data Warehouse (perhaps without identifying a clear division between the two); and still others foresee a future of convergence, partially exploring the possible integration of both. In this paper, we review the underlying technological features of Big Data and Data Warehouse, highlighting their differences and areas of convergence. Even though some differences exist, both technologies could (and should) be integrated because they aim at the same purpose: data exploration and decision-making support. We explore convergence strategies based on the elements common to both technologies. We review the state of the art in integration proposals from the point of view of purpose, methodology, architecture, and underlying technology, highlighting the common elements that may serve as a starting point for full integration, and we put forward a proposal for integrating the two technologies.
Artificial intelligence has become a buzzword that is impacting every industry in the world. With the rise of such advanced technology, questions will always arise about its impact on our social life, environment, and economy, and thus on all efforts toward sustainable development. In the information era, enormous amounts of data have become available to decision makers. Big data refers to datasets that are not only big but also high in variety and velocity, which makes them difficult to handle with traditional tools and techniques. Given the rapid growth of such data, solutions must be studied and provided to handle it and extract value and knowledge from it for different industries and business operations. Numerous use cases have shown that AI can ensure an effective supply of information to citizens, users, and customers in times of crisis. This paper analyses some of the methods and scenarios in which AI and big data can be applied, as well as the opportunities their application provides in various business operations and crisis management domains.
The document proposes a domain-driven data mining methodology to efficiently process tickets in an IT organization. The methodology involves classifying tickets by category, identifying tickets with high issue rates, applying root cause analysis (RCA) to determine the root cause of issues, and applying continuous improvement (CI) to identify and implement solutions. An experiment applying the methodology in the banking sector showed that it improved processing rates and reduced tickets with issues compared to processing tickets independently, without categorization or RCA/CI. The methodology aims to resolve ticket issues efficiently, increase customer satisfaction, and improve processing without waiting on service level agreements.
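A rough sketch of the first two steps of that methodology, classifying tickets by category and flagging categories with high issue rates as candidates for RCA. The column names and the 20% threshold are assumptions for illustration.

```python
import pandas as pd

# Hypothetical ticket data: category plus whether processing had an issue.
tickets = pd.DataFrame({
    "category": ["network", "network", "login", "login", "login", "billing"],
    "had_issue": [1, 0, 1, 1, 0, 0],
})

# Issue rate per category; high-rate categories go to root cause analysis.
issue_rate = tickets.groupby("category")["had_issue"].mean()
print(issue_rate[issue_rate > 0.2])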
The effect of technology-organization-environment on adoption decision of bi... (IJECEIAES)
Big data technology (BDT) is being actively adopted by world-leading organizations due to its expected benefits. However, most organizations in Thailand are still at the decision or planning stage of adopting BDT, and many challenges exist in encouraging BDT diffusion in businesses. This study therefore develops a research model that investigates the determinants of BDT adoption in the Thai context, based on the technology-organization-environment (TOE) framework and diffusion of innovation (DOI) theory. Data were collected through an online questionnaire from a sample of three hundred IT employees in different organizations in Thailand. Structural equation modeling (SEM) was conducted to test the hypotheses. The results indicated that the research model fit the empirical data, with normed chi-square = 1.651, GFI = 0.895, AGFI = 0.863, NFI = 0.930, TLI = 0.964, CFI = 0.971, SRMR = 0.0392, and RMSEA = 0.046, and it explained 52% of the variance in the decision to adopt BDT. Relative advantage, top management support, competitive pressure, and trading partner pressure showed significant positive relationships with BDT adoption, while security negatively influenced it.
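For reference, two of the reported fit indices have simple closed forms; these are the standard SEM definitions, not formulas taken from the study itself:

```latex
\frac{\chi^2}{df} \quad (\text{normed chi-square}), \qquad
\mathrm{RMSEA} = \sqrt{\frac{\max(\chi^2 - df,\; 0)}{df\,(N-1)}}
```

The reported values, normed chi-square = 1.651 and RMSEA = 0.046, fall below the commonly cited cut-offs of 3 and 0.08 respectively, which is why the model is described as fitting the data.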
This document discusses data mining and provides an overview of the topic. It begins by defining data mining as the process of analyzing large amounts of data to discover hidden patterns and rules. The goal is to analyze this data and summarize it into useful information that can be used to make decisions.
It then describes some common data mining techniques like decision trees, neural networks, and clustering. It also discusses the typical stages of a data mining project, including business understanding, data preparation, modeling, evaluation, and deployment.
Finally, it provides examples of applications for data mining, such as healthcare (identifying patterns in patient data), education (improving learning outcomes), and manufacturing (enhancing product quality).
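As a small, hedged illustration of one technique named above, the sketch below fits a decision tree on made-up patient records (feature values and labels are invented; scikit-learn is assumed to be available):

```python
from sklearn.tree import DecisionTreeClassifier

# Toy patient records: [age, systolic blood pressure]; label 1 = at risk.
X = [[25, 120], [40, 135], [60, 150], [35, 118], [70, 160], [50, 140]]
y = [0, 0, 1, 0, 1, 1]

model = DecisionTreeClassifier(max_depth=2, random_state=0)
model.fit(X, y)
print(model.predict([[55, 145]]))  # classify a new, unseen record
```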
Hadoop and Big Data Readiness in Africa: A Case of Tanzania (ijsrd.com)
Big data has been referred to as a forefront pillar of any modern analytics application. Together with Hadoop, an open-source software framework, it has emerged as a solution for processing massive amounts of generated data, both structured and unstructured. With the strategies and initiatives that governments and private institutions worldwide have taken toward deploying and supporting big data analytics and Hadoop, Africa cannot be left isolated. In this paper, we assess the readiness of Africa, with Tanzania as a case study, to harness the power of big data analytics and Hadoop as tools for drawing insights that might support crucial decisions. We collected data through a questionnaire survey. Results reveal that the majority of companies are either not aware of the technologies or still at an infant stage in using big data analytics and Hadoop. We identified that most companies are in either the awakening or the advancing stage of the big data continuum. This is attributed to challenges such as a lack of IT skills to manage big data projects, the cost of technology infrastructure, deciding which data are relevant, a lack of skills to analyze the data, a lack of business support, and deciding which technology is best. It was also found that most of the companies' IT officers are not familiar with the concepts and techniques of big data analytics and Hadoop.
Challenges of big data to big data mining with their processing framework (KamleshKumar394)
This document summarizes a paper presentation on the challenges of big data mining and processing frameworks. It discusses big data characteristics like volume, variety and velocity. It outlines data challenges including volume, variety, velocity, variability, value, veracity and visualization. Process challenges involving data acquisition, cleaning, analysis, integration and querying are also summarized. Management challenges involving privacy, security, data sharing and ownership are covered. Finally, a big data mining processing framework involving the HACE theorem is presented.
The document describes developing a framework to predict human performance using ensemble techniques. It discusses collecting data on project personnel attributes and performance ratings. A random forest model is developed using attributes like programming skills, reasoning ability, domain knowledge, time efficiency, GPA, and communication skills. The random forest improves on previous studies by enhancing prediction accuracy. Results show attributes like programming skills and domain knowledge are highly important for predicting performance. The model achieves high accuracy rates based on evaluation metrics.
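A hedged sketch of that ensemble approach: a random forest trained on the listed attributes, with entirely synthetic data standing in for the study's dataset, so the learned importances only demonstrate the mechanism.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
features = ["programming", "reasoning", "domain_knowledge",
            "time_efficiency", "gpa", "communication"]
X = rng.uniform(0, 10, size=(200, len(features)))
# Synthetic rule: performance depends mainly on programming and domain knowledge.
y = (0.5 * X[:, 0] + 0.3 * X[:, 2] + rng.normal(0, 1, 200) > 4).astype(int)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
for name, importance in zip(features, model.feature_importances_):
    print(f"{name}: {importance:.2f}")
```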
This document discusses big data and tools for analyzing big data. It defines big data as very large data sets that are difficult to analyze using traditional data management systems due to the volume, variety and velocity of the data. It describes characteristics of big data including the 5 V's: volume, variety, velocity, veracity and value. It also discusses some common tools for analyzing big data, including MapReduce, Apache Hadoop, Apache Mahout and Apache Spark. Finally, it briefly mentions the potential of quantum computing for big data analysis by being able to process exponentially large amounts of data simultaneously.
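To make the MapReduce model concrete, here is a dependency-free sketch of its three phases (map, shuffle, reduce) run in memory over a toy word count; real frameworks distribute the same phases across many machines.

```python
from itertools import groupby

documents = ["big data needs big tools", "data tools for big data"]

# Map: emit (word, 1) pairs.
mapped = [(word, 1) for doc in documents for word in doc.split()]

# Shuffle: sort and group the pairs by key.
mapped.sort(key=lambda kv: kv[0])
grouped = groupby(mapped, key=lambda kv: kv[0])

# Reduce: sum the counts for each word.
counts = {word: sum(v for _, v in pairs) for word, pairs in grouped}
print(counts)  # {'big': 3, 'data': 3, 'for': 1, 'needs': 1, 'tools': 2}
```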
Data-Ed: Unlock Business Value through Data Quality Engineering (DATAVERSITY)
This webinar focuses on obtaining business value from data quality initiatives. The presenter will illustrate how chronic business challenges can often be traced to poor data quality. Data quality should be engineered by providing a framework to more quickly identify business and data problems, as well as prevent recurring issues caused by structural or process defects. The webinar will cover data quality definitions, the data quality engineering cycle and complications, causes of data quality issues, quality across the data lifecycle, tools for data quality engineering, and takeaways.
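A minimal sketch of the "identify data problems" step the webinar describes: profiling a table for common defect types. The column names, validity rule, and pandas usage are illustrative assumptions, not the presenter's framework.

```python
import pandas as pd

df = pd.DataFrame({
    "customer_id": [1, 2, 2, 4],
    "age": [34, None, 200, 28],  # one missing value, one out-of-range value
})

report = {
    "missing_ages": int(df["age"].isna().sum()),
    "duplicate_ids": int(df["customer_id"].duplicated().sum()),
    "ages_out_of_range": int(((df["age"] < 0) | (df["age"] > 120)).sum()),
}
print(report)  # {'missing_ages': 1, 'duplicate_ids': 1, 'ages_out_of_range': 1}
```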
Full Paper: Analytics: Key to go from generating big data to deriving busines... (Piyush Malik)
This document discusses how analytics can help organizations derive business value from big data. It describes how statistical analysis, machine learning, optimization, and text mining can extract meaningful insights from social media, online commerce, telecommunications, and smart utility meter data, and can also improve security. While tools exist to analyze big data, challenges remain around data security, privacy, and developing skilled talent. The paper aims to illustrate how existing algorithms can generate value from different industry use cases.
Data Quality - Standards and Application to Open Data (Marco Torchiano)
This document provides an overview of data quality standards and their application to open data. It discusses ISO standards for data quality, including ISO 25012 which defines a data quality model. Key data quality characteristics like accuracy, completeness, consistency and understandability are explained. The document also presents two case studies on open data quality: Open Coesione portal data in Italy and public contract data from Italian universities. Various data quality measures defined in ISO 25024 are calculated for the case studies, and findings around areas like traceability, metadata and decentralized vs centralized disclosure are discussed.
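As a rough illustration of the style of measure such standards define (the exact formulation in ISO 25024 may differ), completeness is typically expressed as a ratio:

```latex
\text{completeness} \;=\; \frac{\#\{\text{data items that have a value}\}}{\#\{\text{data items required to have a value}\}}
```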
This document discusses challenges and outlooks related to big data. It begins with an introduction describing how big data is being collected and analyzed in various fields such as science, education, healthcare, urban planning, and more. It then outlines the key phases in big data analysis: data acquisition and recording, information extraction and cleaning, data integration and representation, query processing and analysis, and result interpretation. For each phase, it discusses challenges and how existing techniques can be applied or extended to address big data issues. Some of the major challenges discussed are data scale, heterogeneity, lack of structure, privacy, timeliness, provenance, and visualization across the entire big data analysis pipeline.
This document discusses big data, including its characteristics of volume, velocity, and variety. It outlines issues related to big data such as storage and processing challenges due to the massive size of datasets. Privacy, security, and access are also concerns. Advantages include better understanding of customers, business optimization, improved science and healthcare. Effectively addressing the technical and analytical challenges will help realize big data's value.
Designing a Framework to Standardize Data Warehouse Development Process for E... (ijdms)
Data warehousing solutions serve as an information base for large organizations to support their decision-making tasks. With the proven need for such solutions, it is crucial to design, implement, and utilize them effectively. Data warehouse (DW) implementation has been a challenge for organizations, and its success rate has been very low. To address these problems, we propose a framework for developing effective data warehousing solutions. The framework is primarily based on the procedural aspect of data warehouse development and aims to standardize its process. We first identified its components and then worked on them in depth to arrive at a framework for the effective implementation of data warehousing projects. To verify the effectiveness of the designed framework, we applied it to the National Rural Health Mission (NRHM) project of the Indian government and designed a data warehousing solution using the proposed framework.
KEYWORDS: Data warehousing, Framework Design, Dimensional
IRJET - Analysis of Big Data Technology and its Challenges (IRJET Journal)
This document discusses big data technology and its challenges. It begins by defining big data as large, complex data sets that are growing exponentially due to increased internet usage and digital data collection. It then outlines the key steps in big data analysis: acquisition, assembly, analysis, and action. Several big data technologies that support these steps are described, including Hadoop, MapReduce, Pig, and Hive. The document concludes by examining challenges of big data such as storage limitations, data lifecycle management, analysis difficulties, and ensuring data privacy and security when analyzing large datasets.
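Hadoop Streaming, one common way the Hadoop/MapReduce stack described above is used in practice, pipes input lines to a mapper on stdin and feeds the mapper's sorted output to a reducer. A hedged word-count sketch follows; the two scripts can also be tested locally with `cat input.txt | python mapper.py | sort | python reducer.py`.

```python
# mapper.py: emit a (word, 1) pair per word, tab-separated.
import sys

for line in sys.stdin:
    for word in line.split():
        print(f"{word}\t1")
```

```python
# reducer.py: sum counts per word, relying on keys arriving in sorted order.
import sys

current, total = None, 0
for line in sys.stdin:
    word, count = line.rsplit("\t", 1)
    if word != current:
        if current is not None:
            print(f"{current}\t{total}")
        current, total = word, 0
    total += int(count)
if current is not None:
    print(f"{current}\t{total}")
```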
Big data: Challenges, Practices and Technologies (Navneet Randhawa)
This document summarizes discussions from a workshop organized by the National Institute of Standards and Technology's (NIST) Big Data Public Working Group. The workshop included four panels that discussed: 1) the current state of big data technologies; 2) future trends in big data hardware, computing models, analytics and measurement; 3) methods for improving big data sharing and collaboration; and 4) security and privacy concerns with big data. The panels featured presentations on topics such as big data reference architectures, use cases, benchmarks, data consistency issues, and approaches for enabling secure big data applications while preserving privacy.
Prasad Kommoju has 5 years of experience in ETL development using Informatica. He has worked on projects in banking and manufacturing. His skills include Informatica PowerCenter, IDQ, MDM, DIH, SQL, and UNIX. Currently he works as a Senior Software Engineer at Tech Mahindra on a project with MasterCard managing customer data.
This certificate recognizes that Roel Morales Magda completed an online course on big data analytics through Griffith University. The course explored key concepts of big data, applications across industries, and the relationship between big data and social media. Topics included the data analytics cycle, opportunities and challenges of big data, and how analytics can solve problems. Magda completed the 2-week, 4-hours-per-week course and achieved an average test score of 70%.
Machine learning techniques to improve data management and data quality - a presentation by Tobias Pentek and Martin Fadler from the Competence Center Corporate Data Quality, given at the Marcus Evans event in Amsterdam on 08.02.2019.
Big data is a broad term for data sets so large or complex that tr.docx (hartrobert670)
Big data is a broad term for data sets so large or complex that traditional data processing applications are inadequate. Challenges include analysis, capture, curation, search, sharing, storage, transfer, visualization, and information privacy. The term often refers simply to the use of predictive analytics or certain other advanced methods to extract value from data, and seldom to a particular size of data set.
Analysis of data sets can find new correlations, to "spot business trends, prevent diseases, combat crime and so on."[1] Scientists, practitioners of media and advertising and governments alike regularly meet difficulties with large data sets in areas including Internet search, finance and business informatics. Scientists encounter limitations in e-Science work, including meteorology, genomics,[2] connectomics, complex physics simulations,[3] and biological and environmental research.[4]
Data sets grow in size in part because they are increasingly being gathered by cheap and numerous information-sensing mobile devices, aerial (remote sensing), software logs, cameras, microphones, radio-frequency identification (RFID) readers, and wireless sensor networks.[5]
[6][7] The world's technological per-capita capacity to store information has roughly doubled every 40 months since the 1980s;[8] as of 2012, 2.5 exabytes (2.5×10¹⁸ bytes) of data were created every day.[9] The challenge for large enterprises is determining who should own big data initiatives that straddle the entire organization.[10]
Work with big data is necessarily uncommon; most analysis is of "PC size" data, on a desktop PC or notebook[11] that can handle the available data set.
Relational database management systems and desktop statistics and visualization packages often have difficulty handling big data. The work instead requires "massively parallel software running on tens, hundreds, or even thousands of servers".[12] What is considered "big data" varies depending on the capabilities of the users and their tools, and expanding capabilities make Big Data a moving target. Thus, what is considered to be "Big" in one year will become ordinary in later years. "For some organizations, facing hundreds of gigabytes of data for the first time may trigger a need to reconsider data management options. For others, it may take tens or hundreds of terabytes before data size becomes a significant consideration."[13]
An Investigation of Critical Failure Factors In Information Technology Projects (IOSR Journals)
The rate of failed information technology projects remains high in comparison with other infrastructure or high-technology projects. The objective of this paper is to identify and present a broad range of potential failure factors during the implementation phase and the causes of IS/IT project failure. Challenges must be overcome to achieve project goals successfully and avoid failure. In this research study, 12 articles with significant contributions were analyzed to develop a list of critical failure factors for IT projects.
This document discusses big data and its challenges related to the Internet of Things (IoT). It first defines big data and explains how the aggregation of data from many IoT systems can lead to big data. It then discusses some key challenges of big data, including issues with data volume, velocity, variety, and veracity. Specific challenges for big data from IoT systems are also reviewed, such as authentication, security, and uncertainty of data. Finally, the document outlines some potential solutions to big data challenges, such as using MapReduce for heterogeneous data, data cleaning techniques for inconsistencies, and cloud-based security platforms for IoT devices.
Encroachment in Data Processing using Big Data Technology (MangaiK4)
Abstract: Big data is now growing in nature, and information is present all around us in many forms. Big data plays a crucial role and provides business value for firms and benefiting sectors by accumulating knowledge. This growth of big data across all concerns poses a challenge for data processing techniques because it involves a variety of data in enormous volume. Tools built on data mining algorithms provide efficient data processing mechanisms but do not handle heterogeneous data well, so emerging tools such as Hadoop MapReduce, Pig, Spark, Cloudera Impala and Enterprise RTQ, IBM Netezza, and Apache Giraph serve as computing tools, while HBase, Hive, Neo4j, and Apache Cassandra serve as storage tools useful for classifying, clustering, and discovering knowledge. This study focuses on a comparative study of different data processing tools in big data analytics; their benefits are tabulated.
LEVERAGING CLOUD BASED BIG DATA ANALYTICS IN KNOWLEDGE MANAGEMENT FOR ENHANCE... (ijdpsjournal)
In the recent past, big data opportunities have gained much momentum in enhancing knowledge management in organizations. However, big data, due to its properties of high volume, variety, and velocity, can no longer be effectively stored and analyzed with traditional data management techniques to generate value for knowledge development. Hence, new technologies and architectures are required to store and analyze this big data through advanced data analytics and, in turn, generate vital real-time knowledge for effective decision making by organizations. More specifically, it is necessary to have a single infrastructure that provides common knowledge management functionality and is flexible enough to handle different types of big data and big data analysis tasks. Cloud computing infrastructures capable of storing and processing large volumes of data can be used for efficient big data processing because they minimize the initial cost of the large-scale computing infrastructure demanded by big data analytics. This paper explores the impact of big data analytics on knowledge management and proposes a cloud-based conceptual framework that can analyze big data in real time to facilitate enhanced decision making intended for competitive advantage. This framework will pave the way for organizations to explore the relationship between big data analytics and knowledge management, which are mostly deemed two distinct entities.
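As a toy illustration of the real-time analysis such a framework targets, the sketch below maintains a rolling aggregate over an incoming event stream; the event shape and window length are assumptions, not part of the proposed framework.

```python
from collections import deque

window = deque(maxlen=100)  # keep only the most recent 100 observations

def observe(value: float) -> float:
    """Add a new measurement and return the current windowed average."""
    window.append(value)
    return sum(window) / len(window)

for reading in [12.0, 15.5, 11.2, 18.3]:
    print(f"rolling average: {observe(reading):.2f}")
```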
The document is a seminar report submitted by Nikita Sanjay Rajbhoj to Prof. Ashwini Jadhav at G.S. Moze College of Engineering. It discusses big data, including its definition, characteristics, architecture, technologies, and applications. The report includes an abstract, introduction, and sections on definition, characteristics, architecture, technologies, and applications of big data. It also includes references, acknowledgements, and certificates.
In this article, we have concentrated on a range of strategies, methodologies, and distinct fields of research, all of which are useful and relevant to data mining technologies. Numerous multinational and other major corporations operate in various parts of the world, and each business location may create significant amounts of data. Corporate decision-makers need access to all of these data sources in order to make strategic decisions.
A gigantic archive of terabytes of data is created every day by modern information systems and digital technologies such as the Internet of Things and cloud computing. Analyzing these massive data sets requires considerable effort at multiple levels to extract knowledge for decision making. Big data analytics is therefore a current area of research and development. The primary goal of this paper is to investigate the potential impact of big data challenges and the various tools associated with them. Accordingly, this article provides a platform to investigate big data at its various stages. Moreover, it opens a new horizon for researchers to develop solutions based on the challenges and open research issues.
The Big Data Importance – Tools and their Usage (IRJET Journal)
This document discusses big data, tools for analyzing big data, and opportunities that big data analytics provides. It begins by defining big data and its key characteristics of volume, variety and velocity. It then discusses tools for storing, managing and processing big data like Hadoop, MapReduce and HDFS. Finally, it outlines how big data analytics can be applied across different domains to enable new insights and informed decision making through analyzing large datasets.
A Comprehensive Study of Big Data Environment and its Challenges (ijceronline)
Big Data is a data analysis methodology enabled by recent advances in technology and architecture. Big data is a massive volume of both structured and unstructured data that is so large it is difficult to process with traditional database and software techniques. This paper provides insight into big data and discusses its nature and definition, including features such as Volume, Velocity, and Variety. It also examines the sources of big data generation, the tools available for processing large volumes and varieties of data, applications of big data, and the challenges involved in handling it.
A well-defined software development process is essential to project analysis and important to the evaluation of any project. These practice guidelines are for those who manage big-data and big-data analytics projects or are responsible for the use of data analytics solutions. They are also intended for business leaders and program leaders who are responsible for developing agency capability in the area of big data and big data analytics.
For those agencies currently not using big data or big data analytics, this document may assist strategic planners, business teams and data analysts to consider the value of big data to the current and future programs.
This document is also of relevance to those in industry, research and academia who can work as partners with government on big data analytics projects.
Technical APS personnel who manage big data and/or perform big data analytics are invited to join the Data Analytics Centre of Excellence Community of Practice to share information on technical aspects of big data and big data analytics, including achieving best practice with modelling and related requirements. To join the community, send an email to the Data Analytics Centre of Excellence
The document discusses the characteristics of big data, known as the "V's of big data". It begins by describing the traditional 3 V's of big data - volume, velocity and variety. It then discusses the evolution of additional V's proposed by different researchers to describe big data, culminating in a list of 14 V's. The document argues that while previous research explored these 14 V's, issues still exist in managing big data effectively. It then proposes 3 new V's to characterize big data - verbosity, voluntariness and versatility - to further the understanding of big data.
Big data is a prominent term characterizing the growth and availability of data in all three formats: structured, unstructured, and semi-structured. Structured data resides in fixed fields of a record or file, as in relational databases and spreadsheets, whereas unstructured data includes text and multimedia content. The primary purpose of the big data concept is to describe extremely large data sets, both structured and unstructured. Big data is further defined by three "V" dimensions, namely Volume, Velocity, and Variety, with two more "V"s, Value and Veracity, added later. Volume denotes the size of the data; Velocity refers to the speed of data processing; Variety describes the types of data; Value captures the business value derived from the data; and Veracity concerns data quality and understandability. Big data has become a distinctive and preferred research area in computer science. Many open research problems exist in big data, and good solutions have been proposed by researchers, yet new techniques and algorithms for big data analysis are still needed to obtain optimal solutions. This paper presents a detailed study of big data: its basic concepts, history, applications, techniques, research issues, and tools.
ISSUES, CHALLENGES, AND SOLUTIONS: BIG DATA MINING (cscpconf)
Data has become an indispensable part of every economy, industry, organization, business function, and individual. Big Data is a term used to identify datasets whose size is beyond the ability of typical database software tools to store, manage, and analyze. Big Data introduces unique computational and statistical challenges, including scalability and storage bottlenecks, noise accumulation, spurious correlation, and measurement errors. These challenges are distinctive and require new computational and statistical paradigms. This paper presents a literature review of Big Data mining and the associated issues and challenges, with emphasis on the distinguishing features of Big Data. It also discusses some methods for dealing with big data.
Importance of Process Mining for Big Data Requirements Engineering
International Journal of Computer Science & Information Technology (IJCSIT) Vol 12, No 4, August 2020
DOI: 10.5121/ijcsit.2020.12401
IMPORTANCE OF PROCESS MINING FOR BIG DATA
REQUIREMENTS ENGINEERING
Sandhya Rani Kourla, Eesha Putti, and Mina Maleki
Department of Electrical and Computer Engineering and Computer Science,
University of Detroit Mercy, Detroit, MI, 48221, USA
ABSTRACT
Requirements engineering (RE), as a part of the project development life cycle, has increasingly been
recognized as the key to ensure on-time, on-budget, and goal-based delivery of software projects. RE of big
data projects is even more crucial because of the rapid growth of big data applications over the past few
years. Data processing, being a part of big data RE, is an essential job in driving big data RE process
successfully. Businesses can be overwhelmed by data yet underwhelmed by information, so data processing is critical in big data projects. Traditional data processing techniques fail to uncover useful information because of the main characteristics of big data, including high volume, velocity, and variety. Data processing can benefit from process mining, which in turn helps to increase the productivity of big data projects. In this paper, the capability of process mining in big data RE to
discover valuable insights and business values from event logs and processes of the systems has been
highlighted. Also, the proposed big data requirements engineering framework, named REBD, helps
software requirements engineers to eradicate many challenges of big data RE.
KEYWORDS
Big data, requirements engineering, requirements elicitation, data processing, knowledge discovery,
process mining
1. INTRODUCTION
Requirements engineering (RE) is the first and most critical phase of the Software Development
Life Cycle (SDLC). RE is the branch of engineering concerned with the real-world goals for,
functions of, constraints on systems, and the relationship of these factors to precise specifications
of system behavior [1-4]. The software requirements engineer (SRE) is responsible for translating stakeholders' needs, desires, and wishes for a software system into a precise and formal software requirements specification. SREs need to communicate effectively and frequently with
stakeholders to elicit, analyze, model, and manage their requirements. Doing this at the early
stage of the software development helps ensure requirements are meeting stakeholders’ needs
while addressing compliance and staying on schedule and within budget [3].
In contrast, many studies have indicated that performing improper and careless RE is one of the main sources of time-consuming rework, inadequate deliveries, budget overruns, and consequently project failures [3, 5-6]. The HMS Titanic (1912), the Mars Climate Orbiter (1999), Apollo 13 (1970), Space Shuttle Challenger (1986), and Space Shuttle Columbia (2002) are examples of high-profile projects that broke down due to poor requirements engineering [6]. Also, according to PMI's Pulse of the Profession (2018), 35% of
companies proclaimed that the primary reason for their project failures is inaccurate requirements
gathering [7]. As a consequence, performing a proper RE at the early stages of project
development is critical in the success of projects, especially for big data projects.
In a newly released study, International Data Corporation (IDC) forecasts that the big data
technology and analytics software market, which in 2018 reached $60.7 billion worldwide, is
expected to grow at a five-year compound annual growth rate (CAGR) of 12.5% [8]. Big data is a
technology that deals with data sets that are too large or complex to handle with traditional data
processing techniques for capturing, storing, analyzing, searching, sharing, transferring,
visualizing, querying, and updating of data. The main characteristics of big data technologies are
5V’s: volume, velocity, variety, volatility, and variability [4, 9-11]. Volume refers to the massive
amount of data; Velocity refers to the high growth rate of incoming data that needs to be
processed and analyzed; Variety refers to many different forms of data; Volatility refers to the
duration which data is valid and should be stored in the repository; Variability refers to data
whose context changes invariably. For big data projects, it is essential to define the targets of the
project at early stages. Time and cost reduction, finding insights from statistics, and optimizing decision making are the most common goals that lead an organization to undertake big data projects [4, 10]. The RE phase in big data projects' development helps in achieving these goals and finding the business values associated with projects; these values help stakeholders to understand the importance of the project and its value in the market [12].
Big data RE is data-centric, meaning there is a wealth of potential information, hidden patterns, and
knowledge that can be extracted and discovered from a large amount of historical and actual data
existing in big data projects. Knowledge discovery from big raw data is a process of collecting,
cleaning, transforming, storing, applying mining algorithms, and discovering values from the raw
data [13]. Data processing is a subset of knowledge discovery and deals with the processing of
the collected data. Since big data is associated with a large volume of data, employing traditional
data processing methods does not help much in discovering valuable insights and knowledge
from data. Process mining is an advanced technology that can not only be used for discovering business values from data but can also recognize the risks, deviations, bottlenecks, and inefficiencies associated with big data [14]. Process mining helps SREs to elicit, prioritize, and
validate requirements from big data using execution logs, process discovery, and conformance
techniques [15].
In this paper, which is an extended version of [16], we have highlighted the importance of process mining in the big data RE process to improve the business insights and values of big data
projects. This paper focuses on how data is prepared and how the processed data is converted to
valuable information through the knowledge discovery process and process mining. Moreover,
the main components of the proposed big data requirements engineering framework, named
REBD, have been described. This framework explains the detailed steps for carrying out RE
activities on different software projects development for improving their success rates.
The rest of the paper is organized as follows. After reviewing related works in section 2, section
3 describes big data requirements engineering activities. Section 4 briefs the knowledge
discovery steps and the role of process mining in extracting business values. Section 5 explains
the proposed REBD framework, and section 6 concludes the paper.
2. RELATED WORK
Requirements engineering in the context of big data applications is a hot research topic that has attracted the attention of researchers in recent years. An analysis of the state of the art of big data
RE research studies shows that little research has been conducted in this area by 2018 [10]. The
investigation areas of this research included the phases of the RE process, type of requirements,
application domains, RE research challenges, and solution proposals intended by RE research in
the context of big data applications. In the following, some of the related work for big data RE
will be presented.
To understand the context of the problem for the big data software applications, an empirically
derived RE artifact model has been proposed in [17], which is equivalent to domain models. The
proposed model can capture the fundamental RE components and the relationships concerning
the development of big data software applications. “How can the requirements of a project be
identified and analyzed to determine the suitability of a combined big data technology
application?” To answer this question, a new process model has been proposed, detailed, and
evaluated in [1]. This model considers the compound requirements, Knowledge Discovery in
Databases (KDD), and a big data classification framework to identify the most relevant
requirements of big data projects. Identified requirements for building big data applications must
address big data characteristics (5V’s) in terms of quality requirements. Quality requirements,
also known as quality attributes, are those functional or nonfunctional requirements that are used to measure the system's performance, such as reliability and availability. A new approach is proposed in [9] to ensure that the big data characteristics have been adequately addressed in the specification of quality requirements.
Even though there are many requirements elicitation techniques in traditional RE, the researchers
believe that more efficient tools and techniques should be employed to identify all requirements,
business values, and knowledge from big data. Machine learning, deep learning, natural language
processing, and data mining are some data analysis technologies that can be used to discover
requirements and valuable knowledge from big data [18-20]. To formalize the overall knowledge
discovery process (KDP) within a common framework, [13] introduced the concept of a process
model. This model helps organizations to understand the KDP better and provides a roadmap to
follow while planning and executing the project. This, in turn, results in cost and time savings,
better understanding, and acceptance of the results of such projects. KDP is primarily used to
plan, work through, and reduce the cost of any given project. Also, the actionable use case (AUC) diagram is an efficient tool introduced in [12] that allows business values identified and unleashed from data to be depicted in the data scientists' models, together with their roles in the
software and their interactions with other software components. The AUC diagrams enhance the
users’ experience, optimize the systems’ utility, and consequently maximize profit.
Process mining is another efficient method to discover business values from the existing big data
[14-15, 21-22]. The research done in [14] explains how to turn event logs into valuable insights, and its results are used to identify and understand the deviations, risks,
bottlenecks, and inefficiencies involved. Process mining helps SRE to elicit, prioritize, and
validate requirements from big data using execution logs, process discovery, and conformance
techniques [15]. One of the challenges in process mining is monitoring and controlling the
existing big data in the organizations. Because of this, [22] introduced a concept of load
balancing for distributing the workloads uniformly to support business goals in process mining.
[21] explained the perspective of process mining in an organizational context by conducting an assessment-style literature review, analyzing 58 papers on process mining to synthesize the existing knowledge on business value realization from process mining. The
capability of process mining to discover valuable insights from event logs and processes of the
system helps SRE to eradicate many challenges of traditional RE.
3. BIG DATA REQUIREMENTS ENGINEERING
As mentioned earlier, RE is one of the essential phases of the project's life cycle, and the failure of RE leads to the failure of the project. So, software requirements engineers are responsible for conducting the RE activities in a well-organized way, which can resolve many conflicts
between stakeholders as well as among requirements. There are three different types of processes in the big data RE model proposed in [12]: processes driven by the software requirements engineer (SRE), processes driven by data scientists (DS), and processes driven jointly by software requirements engineers and data scientists (SRE/DS). Table 1 demonstrates the RE activities for
the big data products, including the RE activities, the responsible person for executing each
activity, and their artifacts. Moreover, the column “Included in” indicates whether each activity is
carried out in the traditional RE model (TRE) and/or big data RE model (BDRE).
Table 1. A Requirement Engineering Model for Big Data Software

Process                    | Performed by                | Output                                      | Included in
---------------------------+-----------------------------+---------------------------------------------+-------------
Requirements Elicitation   | SRE/DS along with Customers | List of requirements                        | TRE and BDRE
Data Acquisition           | DS                          | Filtered data used for knowledge discovery  | BDRE
Requirements Analysis      | SRE                         | Technical specifications and SRE models     | TRE and BDRE
Data Analysis              | DS                          | Extracted values and DS model               | BDRE
Use Case Consolidation     | SRE/DS                      | Combined SRE and DS model                   | BDRE
Requirements Modelling     | SRE                         | Finalized AUC diagrams                      | TRE and BDRE
Requirements Validation    | SRE                         | Validated requirements                      | TRE and BDRE
Requirements Specification | SRE                         | SRS report                                  | TRE and BDRE
From the table, it is clear that compared to the traditional RE, big data RE contains a few extra
steps. The main activities of RE are described below [2-3,12].
3.1. Requirements Elicitation
Requirement elicitation, a critical activity in the requirement development process, is the practice of uncovering and gathering the stakeholders' needs and desires from the system in terms of functional and nonfunctional requirements. To have a productive elicitation and uncover all of the stakeholders' requirements, the SRE should be able to communicate effectively with all
stakeholders. One of the major problems that influence the elicitation process is the social issue.
Conflicts between different cultures and work styles influence the efficiency of the requirements
elicitation process. Figuring out social scenarios and considering those in the SRS document can
improve the RE process. Social issues rely on a few other factors, like technical, financial, and
domain issues, which can disturb the RE process [23]. Multiple techniques such as brainstorming,
card sorting, laddering, interviews, prototyping, domain analysis, Joint Application Development
(JAD), and Quality Function Deployment (QFD) can be employed to conduct requirements
elicitation. SREs select the elicitation techniques based on different factors, including project
characteristics, stakeholders’ characteristics, organizational characteristics, analyst
characteristics, and the software development process [24]. Flaws in requirements elicitation are carried forward into successive RE activities and then into the following development phase, compromising the whole process. As a consequence, spending enough time to investigate and
choose the best requirements elicitation techniques will save the cost and time for rectifying
errors at the later stage.
3.2. Data Acquisition
This activity can be explained as collecting, filtering, and fine-tuning the data before storing it in the data repository [12, 25]. Big data acquisition is expensive and characteristics-driven; it is governed by the 5Vs (volume, velocity, variety, variability, and veracity) [25]. Hence, traditional collection methods cannot handle big data acquisition. The aim of data acquisition is to gather data from distributed information sources and store the collected information in scalable, efficient big data storage. Data acquisition makes use of three
components: (a) Protocols for gathering information, (b) Framework which is used to collect
information based on the defined protocols, and (c) Technologies which allow continuous storage
of the retrieved data [25]. Hence, data scientists use different data acquisition methods to discard
useless data and keep important ones.
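To make the filtering idea concrete, the following is a minimal, self-contained Python sketch of a collect-filter-store step; the record fields, the usefulness rule, and the in-memory sink are hypothetical illustrations, not components prescribed by [25]:

    # Minimal sketch of the "collect, filter, store" idea in data acquisition.
    # All record fields and rules are hypothetical illustrations.
    import json

    REQUIRED_FIELDS = {"sensor_id", "timestamp", "value"}

    def is_useful(record: dict) -> bool:
        """Keep only complete records with a populated measurement."""
        if not REQUIRED_FIELDS.issubset(record):
            return False
        return record["value"] is not None

    def acquire(raw_lines, sink):
        """Parse incoming lines, discard useless data, store the rest."""
        kept = dropped = 0
        for line in raw_lines:
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                dropped += 1
                continue
            if is_useful(record):
                sink.append(record)   # stand-in for scalable big data storage
                kept += 1
            else:
                dropped += 1
        return kept, dropped

    # Example usage with an in-memory sink:
    storage = []
    lines = ['{"sensor_id": 1, "timestamp": 1, "value": 3.2}',
             'not json',
             '{"sensor_id": 2}']
    print(acquire(lines, storage))   # -> (1, 2)

In a real deployment the sink would be a distributed store and the parser a streaming framework, but the discard-or-keep decision shown here is the essence of the filtering step.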
3.3. Requirements Analysis
Requirement analysis is the practice of defining the system boundaries and analyzing elicited and
documented requirements to make sure that they are clear, concise, understandable,
unambiguous, consistent, and complete. Also, requirements agreement, which is a process of
resolving the conflicts in the requirements derived from different sources, is part of this activity.
A successful requirement analysis establishes the first step towards developing an excellent system that meets the needs of various stakeholders and is expected to reduce the cost of resolving unnecessary conflicts between stakeholders [26]. Use cases and user stories are often useful tools for this purpose. Moreover, task analysis, rich pictures, and video recordings help the SRE analyze the requirements.
3.4. Data Analysis and Value Discovery
In this activity, the acquired big data is analyzed by data scientists to reveal information and
discover business values. These business values are capable of making high profits for
companies. In the IT industry, many channels in a process operation carry information, and this information is gathered to derive effective value through data analysis [27]. Data analysis is a time-consuming task, and an inefficient analysis will lead to wrong decisions. As a consequence, before analyzing the data, its quality should be improved to ensure that accurate information is derived. The quality of data is measured using
multiple dimensions, resulting in the evaluation of accuracy, timeliness, and other factors of data
[27]. Different data analysis technologies such as machine learning, deep learning, natural
language processing, data mining, text analytics, and predictive analytics can be employed to
analyze data and extract business values from it [12, 18-20]. Moreover, techniques like process
mining [15] and tools like smart grids [28] can be employed to accelerate the process of
knowledge discovery from big data.
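As a toy illustration of measuring quality dimensions before analysis (the fields, the freshness window, and the use of completeness as a crude stand-in for accuracy are our own assumptions, not metrics prescribed by [27]):

    # Toy data-quality scoring over two dimensions: completeness and timeliness.
    # Field names and the freshness threshold are hypothetical.
    from datetime import datetime, timedelta, timezone

    now = datetime.now(timezone.utc)
    records = [
        {"customer": "a", "email": "a@x.com", "updated": now - timedelta(days=2)},
        {"customer": "b", "email": None,      "updated": now - timedelta(days=400)},
    ]

    def completeness(recs):
        """Fraction of field values that are populated."""
        values = [v for r in recs for v in r.values()]
        return sum(v is not None for v in values) / len(values)

    def timeliness(recs, max_age=timedelta(days=365)):
        """Fraction of records updated within the freshness window."""
        return sum(now - r["updated"] <= max_age for r in recs) / len(recs)

    print(f"completeness={completeness(records):.2f}, "
          f"timeliness={timeliness(records):.2f}")
    # -> completeness=0.83, timeliness=0.50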
3.5. Use Case Consolidation
Consolidation merges the work and the models built by software requirements engineers and data scientists. This consolidation is represented as an actionable use case (AUC) diagram. The AUC
diagram for big data software proposed in [12] is a merging of the data scientist model and the
traditional use case. Once the values are defined and presented, the consolidated model is ready
to be presented and discussed with the customers. SRE and DS together analyze AUC diagrams
to discover the values and document them. AUC contains (a) actors who can be a manager, user,
system, event triggers, or website, (b) actionable intelligence that is the specific information,
value, data needed to come to a decision [29], (c) data scientist model that contains actionable
intelligence gathered by the data scientist, algorithms, analysis methods, and data sources for
gathering actionable intelligence, and a “Rely on” relationship between actionable intelligence
and data scientist model, (d) relationships between actor and actionable intelligence, and (e)
relationships between data source, analysis, and algorithm.
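To make these components concrete, the following sketch models the listed parts of an AUC diagram as plain Python data structures; the class and field names are hypothetical and only approximate the graphical notation of [12]:

    # Hypothetical data-structure sketch of an actionable use case (AUC) diagram.
    # Names are illustrative only; [12] defines the diagram graphically.
    from dataclasses import dataclass
    from typing import List

    @dataclass
    class DataScientistModel:
        data_sources: List[str]
        analysis_methods: List[str]
        algorithms: List[str]

    @dataclass
    class ActionableIntelligence:
        description: str                # the value/data needed to reach a decision
        relies_on: DataScientistModel   # the "Rely on" relationship

    @dataclass
    class AUCDiagram:
        actors: List[str]               # manager, user, system, event trigger, website
        intelligence: List[ActionableIntelligence]

    auc = AUCDiagram(
        actors=["store manager"],
        intelligence=[ActionableIntelligence(
            description="weekly demand forecast per product",
            relies_on=DataScientistModel(
                data_sources=["sales event log"],
                analysis_methods=["time-series analysis"],
                algorithms=["gradient boosting"]))])
    print(len(auc.actors), len(auc.intelligence))   # -> 1 1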
3.6. Requirements Modeling
The requirement modeling intends to mold raw requirements into technical representations of the
system. This activity guides SRE to archive all requirements during the specification process.
Proper requirements representation eases the communication of requirements and conversion of
those requirements into system architecture and design. Insights into the requirements are hard to grasp when requirements modeling is done improperly. The purpose of requirements modeling is to figure out how business requirements are connected to information technology requirements and vice versa. Modeling promotes business-IT alignment by understanding, analyzing, and structuring the way business requirements are linked to information technology requirements and the other way around [30].
analyzed functional and nonfunctional requirements in any systems, various modeling techniques
can be employed, such as use cases, user stories, natural languages, formal languages, and a
variety of formal, semiformal, and informal diagrams [3].
3.7. Requirements Validation
Requirements validation is the practice of checking and revising requirements to ensure that the
specified requirements meet customer needs. Requirements validation also involves checking that
(a) the system does not provide the functions that the customer does not need, (b) there are no
requirements conflicts, and (c) time and budget constraints will be met. Validation establishes strong support for the requirements and ensures their quality. The main purpose of this step is to identify and address problems before the stakeholders commit to meeting the requirements. Improper requirements validation at the early stages of the project leads to irrelevant design and degraded product quality because of invalid and error-prone requirements.
Validation should be a part of all activities in RE. Systematic manual analysis of the
requirements, test-case generation, comparative product evaluation tools, or some of the
requirements elicitation techniques can be used for requirements validation [3].
3.8. Requirements Specification
Requirements specification is the process of documenting the stakeholders’ needs in terms of
precise and formal software requirements specification. A software requirement specification (SRS) is a document that is used as a foundation of software development and acts as a contract between the software stakeholders and the developers. As a consequence, preparing an accurate
SRS report at the end of the RE phase is critical in the success of projects. User requirements are
used as the input for the first specification draft of the system, its goals, and its missions. Preparing a precise requirements specification is not easy when the system under development is novel and complex, or when the stakeholders have limited knowledge of the system being acquired. The SRS is prepared by the SRE and stakeholders together. It contains the mission, scope, and goals of the project, the software and hardware requirements of the project, as well as functional and nonfunctional system requirements in terms of formal, semiformal, and informal models created in
the previous processes. In addition to the requirements matrices, the big data SRS report contains
business values extracted from data and AUC diagrams. Moreover, to ensure that big data
characteristics are appropriately addressed, SRS should contain the quality attribute matrices. The
quality attribute matrix is designed by intersecting a big data characteristic with a quality attribute
and then identifying the system’s quality requirements that apply to that intersection during big
data RE process, as explained in [9].
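As a hedged illustration of such a matrix (the cell entries below are invented examples, not requirements from [9]), a nested mapping from big data characteristic and quality attribute to a quality requirement suffices:

    # Illustrative quality attribute matrix: big data characteristic x quality
    # attribute -> quality requirement. All entries are invented examples.
    quality_matrix = {
        "volume":   {"scalability":      "store and query 10 TB/day without degradation"},
        "velocity": {"performance":      "ingest 50k events/s with <1 s end-to-end latency"},
        "variety":  {"interoperability": "accept JSON, CSV, and image payloads"},
        "veracity": {"reliability":      "flag records failing validation rules"},
    }

    def requirement(characteristic: str, attribute: str) -> str:
        """Look up the quality requirement at a (characteristic, attribute) cell."""
        return quality_matrix.get(characteristic, {}).get(attribute, "not specified")

    print(requirement("velocity", "performance"))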
4. KNOWLEDGE DISCOVERY FROM BIG DATA
As mentioned earlier, traditional RE methods are user-centric, focusing on requirements apparent to users. However, big data RE methodologies should be data-centric as well because there are lots
of potential information and hidden patterns that can be extracted from data. As a consequence,
data processing and knowledge discovery are essential jobs in big data RE.
Knowledge discovery from big raw data in RE refers to the finding of useful data patterns,
valuable information, and hidden business values from the massive amount of existing data.
Knowledge discovery concerns the entire knowledge extraction process, including (a) how data
are stored and accessed, (b) how to use efficient and scalable algorithms to analyze massive
datasets, (c) how to interpret and visualize the results, and (d) how to model and support the
interaction between humans and machines [31].
4.1. KNOWLEDGE MINING PROCESS
Figure 1 depicts the knowledge discovery process from existing raw big data. Data preparation followed by data exploration constitutes the knowledge discovery process. Before processing data and extracting knowledge, the data should be preprocessed. Data preparation includes the following steps (a small illustrative sketch follows the list):
• Data Accumulation: In this step, raw data are gathered for mining and for discovering valuable information. Different tools, such as web data collection tools, encyclopedias, and Google Forms, can be employed to gather the data.
• Data Categorization: Data categorization is the process of analyzing the data and splitting them into different categories. Understanding the data types and assigning data to the target categories is the main goal of this step.
• Data Cleaning: Raw data are usually noisy and inconsistent, with many missing values. Data cleaning means identifying incomplete, incorrect, inaccurate, or irrelevant parts of the data and then replacing, modifying, or deleting such dirty data. After categorizing data into different datasets, it is crucial to filter out the unwanted data, keep the important data, and lock down the sensitive data.
• Data Reshaping: Data reshaping is the process of converting the cleaned data into useful patterns that make the values in the data easier to understand. Data are reshaped to fit a well-defined output that a broad range of audiences can understand.
• Data Storage: After cleaning and reshaping, the data should be stored for further analysis. Data warehouses are used for storing the preprocessed data.
Data preparation is necessary for improving data quality and removing unwanted errors. Improved data quality helps in finding useful knowledge and in understanding the valuable insights and business values hidden in the data. A short sketch of these preparation steps is given below.
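As a minimal sketch of the cleaning, categorization, and reshaping steps, assuming tabular data and the pandas library (the column names, values, and categories are illustrative, not taken from any particular project):

import pandas as pd

# Data accumulation: in practice the raw data would come from collection
# tools; a small inline frame stands in for it here.
raw = pd.DataFrame({
    "order_id": [1, 1, 2, 3, None],
    "amount":   [20.0, 20.0, 250.0, None, 75.0],
})

# Data cleaning: drop exact duplicates, remove rows missing the key field,
# and fill missing numeric values with a neutral default.
clean = (
    raw.drop_duplicates()
       .dropna(subset=["order_id"])
       .fillna({"amount": 0.0})
)

# Data categorization: assign each record to a target category.
clean["category"] = pd.cut(
    clean["amount"],
    bins=[0, 50, 500, float("inf")],
    labels=["small", "medium", "large"],
    include_lowest=True,
)

# Data reshaping: pivot into a well-defined summary that a broad audience
# can read; a data warehouse would then store this preprocessed result.
summary = clean.pivot_table(
    index="category", values="amount", aggfunc=["count", "sum"], observed=True
)
print(summary)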
Data exploration allows data scientists to understand the datasets and uncover hidden business values. For this, the preprocessed, clean data should be analyzed to find the parts that are most useful for discovering knowledge. The data scientist analyzes the prepared data to determine which process mining algorithm best suits it and then applies that algorithm. The information extracted by the mining is converted into business values by data analysts and business analysts together. Data can be explored manually, by writing scripts, and/or with automated tools such as data visualization tools. Data exploration provides direction for applying data mining and process mining to discover knowledge.
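Continuing the hypothetical preparation sketch above, manual exploration can be as simple as a few script lines inspecting the prepared frame before any mining algorithm is chosen:

# Quick manual exploration: dataset size, per-column summary statistics,
# and the distribution of the assigned categories hint at which parts of
# the data are worth mining further.
print(clean.shape)
print(clean.describe(include="all"))
print(clean["category"].value_counts())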
As the knowledge discovery process operates on a massive amount of data, its efficiency and accuracy are critical concerns. Discovering information from such a huge amount of data is beyond the reach of traditional data processing techniques. Hence, it is essential to use a proper knowledge discovery technique or approach, such as process mining.

Figure 1. Knowledge discovery from big data

4.2. Process Mining

Process mining is a technique used in process management for analyzing business processes based on event logs [21]. A business process is defined as a sequence of tasks or activities, performed by people or systems, in which a specific sequence produces a service. In addition to identifying business values and insights, process mining helps in recognizing and understanding the bottlenecks, inefficiencies, deviations, and risks hidden in data [14]. Several mining algorithms, such as the heuristic miner, fuzzy miner, and genetic miner, can be employed to identify the patterns, trends, and details present in the event logs of IT systems [32].
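As a minimal sketch of such a discovery run, assuming the pm4py process mining library and a hypothetical event_log.xes export from an IT system (the heuristics miner used here is one of the algorithms named above):

import pm4py

# Load an event log (hypothetical file name); XES is the standard
# interchange format for process mining event logs.
log = pm4py.read_xes("event_log.xes")

# Discover a process model with the heuristics miner, which tolerates the
# noise typically present in real-world logs.
heuristics_net = pm4py.discover_heuristics_net(log)

# Render the discovered model to inspect bottlenecks and deviations.
pm4py.view_heuristics_net(heuristics_net)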
A data scientist faces many challenges when using process mining for value discovery in big data. Some of these challenges are listed below:
• Many process discovery algorithms operate on static event data in an offline fashion and are not suited to processing real-time event streams.
• Distribution of keys (load balancing) is another issue for process mining algorithms. Load balancing is a technique for distributing workloads uniformly [22].
• Because of the lack of remote data access support and the lack of information about internal storage, integrity checking becomes a crucial concern. Integrity checking ensures the validity and accuracy of data coming from different logs (see the sketch after this list).
• The massive amount of data in a big data application can result in poor data quality. Low-quality data leads to accessibility and integrity issues by restricting the user or organization to limited access to the data.
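As a minimal illustration of the integrity-checking concern above (a generic checksum scheme, not a technique from the cited work), an event log received from a remote source can be verified against the digest its source published:

import hashlib

def sha256_of(path: str) -> str:
    """Compute the SHA-256 digest of a log file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

def log_is_intact(path: str, published_digest: str) -> bool:
    """Check a received event log against the digest its source published."""
    return sha256_of(path) == published_digest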
Data quality can be improved by using a primary process mining analysis. This analysis reveals vital processes that are not tracked by information systems. Such process discovery helps in unfolding new tracking capabilities, such as radio frequency identification (RFID). This tracking power facilitates advanced analysis, which leads to broader insights.
5. CONCEPTUAL REBD FRAMEWORK
In this section, we describe our proposed big data requirements engineering framework, REBD, as depicted in Figure 2. The framework lays out the planned approach for carrying out big data projects so as to improve their success rate. The purpose of this conceptual framework is to eradicate challenges such as deciding whether to perform big data or traditional requirements engineering, discovering knowledge, and balancing the efforts needed for successful execution, as described in the following.
Initially, when development of a project starts, it is crucial to derive and analyze its characteristics to determine whether the project falls into the category of big data or traditional projects. Doing this at the earliest phase of the project saves a considerable amount of time over the project lifecycle. To identify the category of the project, we have to figure out whether the recognized characteristics are big data characteristics or not. As mentioned earlier, the main characteristics of big data projects are the 5V’s: volume, velocity, variety, volatility, and variability. Next, the comprehensive quantitative classification model proposed by [1] is used to verify the project type. This model combines the recognized big data characteristics to determine the project type.
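The model in [1] is quantitative; as a loose, hypothetical screening sketch only (not the actual model), a project could be flagged by counting how many of the 5V’s it clearly exhibits:

# Hypothetical screening sketch, not the classification model of [1]: flag
# a project as a big data project when enough of the 5V characteristics
# are clearly present. The threshold and inputs are illustrative.
FIVE_VS = ("volume", "velocity", "variety", "volatility", "variability")

def looks_like_big_data(observed: dict, threshold: int = 3) -> bool:
    """Return True when at least `threshold` of the 5V's are observed."""
    return sum(bool(observed.get(v, False)) for v in FIVE_VS) >= threshold

print(looks_like_big_data({"volume": True, "velocity": True, "variety": True}))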
Figure 2. Requirements Engineering for Big Data (REBD) framework
If the project falls into the group of traditional projects, traditional RE activities should be carried out in the order specified by the labels T1 to T5 in Figure 2. However, if the project falls into the group of big data projects, it should then be checked whether RE and software development (SD) are balanced.
Balancing RE and SD means determining the challenges related to RE and SD activities. The challenge related to RE is identifying the 5V’s, which has already been done, while the challenges associated with SD are designing, coding, testing, and maintaining a big data end-user application. SD challenges are a serious concern because they may lead to the failure of the project even after a lot of time has been spent on RE activities. To resolve this issue, a model has been proposed by [33] that contains a separate unit for doing the research needed for RE and SD. Obviously, proceeding without research and proofs of concept (PoC) may cause a shortage of time. The research and PoC that yield the optimal solution should be used in project practice. This ensures the balance between RE and SD and the development of novel and better big data end-user applications. Once there is a solution for all challenges, the big data RE activities should be carried out in the order specified by the labels B1 to B8 in Figure 2.
After the engineered requirements have been gathered and documented in the SRS report, the other processes of the software development lifecycle, including design, implementation, testing, deployment, and maintenance, can begin, with the aim of producing software of the highest quality at the lowest cost in the shortest possible time.
6. CONCLUSION
One of the main reasons behind the failure of many projects is improper requirements engineering and failing to capture the stakeholders’ requirements, which leads to time-consuming rework, inadequate deliveries, and budget overruns. Performing requirements engineering is therefore a crucial phase of a project’s development lifecycle, especially for big data projects. As big data is one of the new and emerging technologies, developing big data projects and improving their success rates and market value is an excellent achievement for companies. One of the outcomes of the big data RE process is discovering business values and insights from existing big data. A lot of knowledge, valuable information, business values, and quality features lie hidden in the large amount of historical and current raw data existing in big data projects. To analyze these data and discover hidden requirements, data scientists need to collaborate with SREs in big data RE activities. However, traditional data processing is not enough to discover valuable information from data, because of the 5V characteristics of big data; data scientists also need to use an advanced technology like process mining.
In this paper, we have aimed to describe the importance of process mining in the big data RE process. In addition to improving business value, process mining helps in determining the risks, deviations, inefficiencies, and bottlenecks of systems. This paper explains the detailed steps of knowledge discovery, from data preparation to data exploration. Moreover, the detailed steps for carrying out big data RE activities in the proposed REBD framework have been explained. This conceptual framework helps to increase the success rate of project development. Although process mining increases market value and digs deeper into the data by making sure that the complete set of collected data is covered, the framework, along with process mining, should be applied to real big data projects to validate its efficiency and market value. This research will be extended to discover better tools for knowledge discovery that can be used in various industries. The management of big data quality requirements will also be investigated in future work.
REFERENCES
[1] M. Volk, N. Jamous, and K. Turowski, “Ask the right questions: requirements engineering for the
execution of big data projects,” presented at the 23rd Americas Conference on Information Systems,
SIGITPROJMGMT, Boston, MA, Aug. 2017, pp 1-10.
[2] J. Dick, E. Hull, and K. Jackson, Requirements Engineering, 4th edition, Springer International
Publishing, 2017.
[3] P.A. Laplante, Requirements Engineering for Software and Systems, 3rd Edition, Auerbach Publications (T&F), 2017.
[4] D. Arruda, “Requirements engineering in the context of big data applications,” ACM SIGSOFT
Software Engineering Notes, 43(1): 1-6, Mar. 2018.
[5] A.G. Khan, et al., “Does software requirement elicitation and personality make any relation?” Journal
of Advanced Research in Dynamical and Control Systems. 11. 1162-1168, 2019.
[6] T.A. Bahill and S.J. Henderson, “Requirements development, verification, and validation exhibited in
famous failures,” Systems Engineering, 8(1): 1–14, 2005.
[7] Pulse of the Profession 2018: Success in Disruptive Times, 2018.
[8] C. Gopal, et al., “Worldwide big data and analytics software forecast, 2019–2023,” IDC Market
Analysis, US44803719, Sept. 2019.
[9] I. Noorwali, D. Arruda, and N. H. Madhavji, “Understanding quality requirements in the context of
big data systems,” presented at the 2nd International Workshop on Big Data Software Engineering
(BIGDSE), Austin, USA, May 2016, pp. 76-79.
[10] D. Arruda and N.H. Madhavji, “State of requirements engineering research in the context of big data
applications,” presented at the 24th International Working Conference on Requirements Engineering:
Foundation for Software Quality (REFSQ), Utrecht, The Netherlands, Mar. 2018, pp 307-323.
[11] M. Volk, D. Staegemann, M. Pohl, and K. Turowski, “Challenging big data engineering: positioning
of current and future development,” presented at the 4th International Conference on Internet of
Things, Big Data and Security (IoTBDS), Heraklion, Greece, May. 2019, pp 351-358.
[12] H.H. Altarturi, K. Ng, M.I.H. Ninggal, A.S.A. Nazri, and A.A.A. Ghani, “A requirement engineering
model for big data software,” presented at the 2nd International Conference on Big Data Analysis
(ICBDA), Beijing, China, Mar. 2017, pp 111-117.
[13] K.J. Cios, W.Pedrycz, R.W.Swiniarski, L. Kurgan, Data Mining: a Knowledge Discovery Approach.
Springer, 2010.
[14] W. van der Aalst, Process Mining in Action: Principles, Use Cases and Outlook, Springer, 1st ed.
2020.
[15] M. Ghasemi, “What requirements engineering can learn from process mining,” presented at the 1st
International Workshop on Learning from other Disciplines for Requirements Engineering (D4RE),
Banff, Canada, Aug. 2018, pp 8-11.
[16] S. Kourla, E. Putti, and M. Maleki, “REBD: A Conceptual Framework for Big Data Requirements
Engineering,” 7th International Conference on Computer Science and Information Technology (CSIT
2020), Helsinki, Finland, June 2020.
[17] D. Arruda, N.H. Madhavji, and I. Noorwali, “A validation study of a requirements engineering
artefact model for big data software development projects,” presented at the 14th International
Conference on Software Technologies (ICSOFT), Prague, Czech Republic, Jul. 2019, pp 106-116.
[18] B. Jan, et al., “Deep learning in big data analytics: a comparative study,” Journal of Computer
Electrical Engineering, 75(1): 275-287, 2019.
[19] A. Haldorai, A. Ramum, and C. Chow, “Editorial: big data innovation for sustainable cognitive
computing,” Mobile Netw Application Journal, 24(1): 221-226, 2019.
[20] R.H. Hariri, E.M. Fredericks, and K.M. Bowers, “Uncertainty in big data analytics: survey,
opportunities, and challenges,” Journal of Big Data, 6(1), 2019.
[21] J. Eggers and A. Hein, “Turning Big Data Into Value: A Literature Review on Business Value
Realization From Process Mining,” presented at the Twenty-Eighth European Conference on
Information Systems (ECIS2020), Marrakesh, Morocco, Jun. 2020.
[22] K.K. Azumah, S. Kosta, and L.T. Sørensen, “Load Balancing in Hybrid Clouds Through Process Mining Monitoring,” Internet and Distributed Computing Systems, Springer, Cham, 2019, pp 148-157.
[23] S. Ramachandran, S. Dodda, and L. Santapoor, “Overcoming Social Issues in Requirements
Engineering,” in Meghanathan N., Kaushik B.K., Nagamalai D. (eds) Advanced Computing,
Springer, Berlin, Heidelberg, 2011, pp 310-324.
[24] M. Batra and A. Bhatnagar, “Requirements Elicitation Technique Selection: A Five Factors
Approach,” International Journal of Engineering and Advanced Technology (IJEAT) ISSN: 2249 –
8958, 8(5C):1332-3141, India, May 2019.
[25] K. Lyko, M. Nitzschke, and AC. Ngonga Ngomo, “Big data acquisition,” in New Horizons for a
Data-Driven Economy, Springer, Cham, 2016, pp 35-61.
[26] M. Suhaib, “Conflicts Identification among Stakeholders in Goal Oriented Requirements Engineering
Process,” International Journal of Innovative Technology and Exploring Engineering (IJITEE), 2019.
[27] C. Yao, “The Effective Application of Big Data Analysis in Supply Chain Management,” IOP
Conference Series: Materials Science and Engineering Mar. 2020.
[28] X. Han, X. Wang, and H. Fan, “Requirements analysis and application research of big data in power
network dispatching and planning,” presented at the 3rd Information Technology and Mechatronics
Engineering Conference (ITOEC), Chongqing Shi, China, Oct. 2017, pp 663-668.
[29] J. Kelly, M.E. Jennex, K. Abhari, A. Durcikova, and E. Frost, “Data in the Wild: A KM Approach to
Collecting Census Data Without Surveying the Population and the Issue of Data Privacy,” in
Knowledge Management, Innovation, and Entrepreneurship in a Changing World, IGI Global, 2020,
pp. 286-312.
[30] D. Pandey, U. Suman, and A. Ramani, “A Framework for Modelling Software Requirements,”
International Journal of Computer Science Issues. 8(3):164, 2011.
[31] O. Marbán, G. Mariscal, and J. Segovia, “A Data Mining & Knowledge Discovery Process Model,” in
book Data Mining and Knowledge Discovery in Real Life Applications, IntechOpen, 2009.
[32] O. Dogan, J.L. Bayo-Monton, C. Fernandez-Llatas, and B. Oztaysi, “Analyzing of gender behaviors
from paths using process mining: A shopping mall application,” Sensors, 19(3), 557, 2019.
[33] NH. Madhavji, A. Miranskyy, and K. Kontogiannis, “Big picture of big data software Engineering:
with example research challenges,” presented at the IEEE/ACM 1st International Workshop on Big
Data Software Engineering (BIGDSE), Florence, Italy, May 2015, pp 11-14.
AUTHORS
Sandhya Rani Kourla received her Bachelor’s degree in Computer Science and Software Engineering from Kuvempu University, Davangere, India, in 2011. She is currently pursuing her Master’s degree, majoring in Computer Science and Software Engineering, at the University of Detroit Mercy, MI, USA. Before joining Detroit Mercy, she worked as a software engineer at Mindtree Ltd, Bangalore, India. She is skilled in requirements engineering, software engineering, software development, agile software development, and manual testing.
Eesha Putti is a Master’s student in Management Information Systems at the University of Detroit Mercy, Michigan, USA. She received her B.Tech in Computer Science from Manav Bharati University, Shimla, India. Prior to this, she participated in several computer science fairs and developed skills relevant to computer science and software engineering. She aspires to excel further in the fields of Big Data, Database Management Systems, and Cloud-related areas.
Mina Maleki received her Bachelor’s degree in computer engineering from Azzahra University, Tehran, Iran, in 2002, her Master’s in computer engineering and information technology from Amirkabir University of Technology, Tehran, Iran, in 2006, and her Ph.D. in computer science from the University of Windsor, Canada, in 2014. She is currently working as an Assistant Professor of Software Engineering and Computer Science at the University of Detroit Mercy, MI, USA. Her research interests are mainly focused on software engineering, machine learning, and text and big data mining.