This document discusses using open source software (OSS) to provide automated disturbance analytics and system-wide dashboard insights. It summarizes the benefits of OSS including lower costs, faster deployment, and enabling collaboration. It then describes an OSS data layer and tools developed including a power quality dashboard to aggregate and visualize data from across a utility's system. A case study is presented of the open power quality dashboard currently in beta testing at three major utilities. The dashboard provides a system-wide view of power quality events and trends to help utilities more proactively manage their system.
Power from big data - Are Europe's utilities ready for the age of data? (Steve Bray)
European utilities are facing growing volumes of data from smart meters and grids, but many are not yet maximizing the value of the data. While utilities rate themselves highly in collecting data, nearly half say they do not consistently maximize its value. Strategies for leveraging big data are immature, with over 40% having no strategy or just beginning to develop one. Utilities will need to improve at analyzing large amounts of diverse data and developing new business models to gain competitive advantage from big data insights. Talent shortages, organizational silos, and a lack of standards also pose challenges to utilities effectively capturing value from big data.
This document provides an overview of predictive analytics and its growing importance. It discusses how advances in technologies like cloud computing and the internet of things are enabling businesses to gather and analyze vast amounts of data. While descriptive and diagnostic analytics describe what happened in the past, predictive analytics uses statistical techniques to create models that forecast future outcomes. The document outlines several key drivers that are pushing predictive analytics towards mainstream adoption over the next few years, including easier-to-use tools, open source software, innovation from startups, and the availability of cloud-based solutions. It concludes that the combination of big data and predictive analytics will continue to accelerate innovation across industries.
Students at the University of Michigan are researching how to use big data to help predict weather patterns and avoid flight delays related to weather. They analyzed 10 years of hourly weather data, which is a huge dataset, to understand similarities in past weather that could help predict future weather. This predictive analysis using big data has the potential to help airlines be cautious of bad weather in advance and prevent delays or cancellations. The goal is to apply big data computing methods to the large weather dataset to solve the social issue of frequent flight delays and cancellations due to unexpected weather.
This document discusses uncertainty in big data analytics. It begins by providing background on big data, defining the common "5 V's" characteristics of big data - volume, variety, velocity, veracity, and value. It then discusses uncertainty, which exists in big data due to noise, incompleteness, and inconsistency in data. The document surveys techniques for big data analytics and how uncertainty impacts machine learning, natural language processing, and other artificial intelligence approaches. It identifies challenges that uncertainty presents and strategies for mitigating uncertainty in big data analytics.
This document discusses challenges and outlooks related to big data. It begins with an introduction describing how big data is being collected and analyzed in various fields such as science, education, healthcare, urban planning, and more. It then outlines the key phases in big data analysis: data acquisition and recording, information extraction and cleaning, data integration and representation, query processing and analysis, and result interpretation. For each phase, it discusses challenges and how existing techniques can be applied or extended to address big data issues. Some of the major challenges discussed are data scale, heterogeneity, lack of structure, privacy, timeliness, provenance, and visualization across the entire big data analysis pipeline.
The document discusses challenges in analytics for big data. It notes that big data refers to data that exceeds the capabilities of conventional algorithms and techniques to derive useful value. Some key challenges discussed include handling the large volume, high velocity, and variety of data types from different sources. Additional challenges include scalability for hierarchical and temporal data, representing uncertainty, and making the results understandable to users. The document advocates for distributed analytics from the edge to the cloud to help address issues of scale.
1. The U.S. Census Bureau faces challenges from the rise of big data sources produced outside of traditional government surveys. These new sources are generated faster and more cheaply than surveys.
2. To remain reliable sources of demographic and economic information, the Census Bureau must integrate these new big data sources with traditional surveys. This requires linking massive datasets and developing new statistical modeling techniques.
3. The Census Bureau is exploring ways to use new big data sources like web search data, social media, and e-commerce transactions to improve surveys and provide more timely, detailed information. However, maintaining privacy and developing new technology is difficult.
The XXIV Olimpiada Espinosina 2015 establishes the rules and guidelines for organizing sports competitions at the primary level. Its objectives are to foster integration and the practice of sport within the Espinosina family through participation in activities suited to each person's abilities, and to guide students toward moral and spiritual values as part of their personal formation. Fixtures and elimination rounds are detailed for futsal, handball, and volleyball across the different grades and sections.
This document summarizes a study on the structure of benthic diatom communities in the Arieş River catchment area in Transylvania, Romania. Samples were collected from 18 sites in the river and its tributaries in 2008. A total of 214 diatom taxa were identified, with the most diverse genera being Navicula, Nitzschia, Cymbella, and Gomphonema. Water quality parameters like pH, salinity, and conductivity varied among sites due to geological and human factors. Impacts of pollution on diatom community structure were detected. Floristic similarity analysis showed seasonal grouping of communities and less influence of water quality variations.
The document is a collection of photos from Flickr shared under various Creative Commons licenses. There are over 30 photos in total contributed by different photographers showcasing a variety of subjects including nature, cities, and people. Each photo is credited and linked back to the original on Flickr along with the type of Creative Commons license it carries.
This document summarizes the history and technology of optical fibers. It discusses key developments such as Alexander Graham Bell's patent in 1880, the discovery of high light loss in 1965, and optical fiber technology becoming the backbone of long-distance phone networks in the 1980s. It describes how optical fibers work via total internal reflection to transmit light along their length and are made of glass or plastic with a core and cladding. The document also differentiates between single-mode and multi-mode optical fibers and their uses, advantages, and applications.
1) The study investigated how larval and adult diet affect desiccation resistance in the marula fruit fly Ceratitis cosyra. Larvae were reared on either high- or low-yeast diets and adults were fed either sugar alone or sugar with additional yeast hydrolysate (a protein source).
2) Results showed that larval diet had a greater influence than adult diet on desiccation resistance. Flies reared as larvae on a low-yeast diet exhibited higher desiccation resistance as adults compared to those from a high-yeast larval diet.
3) Adult diet effects differed between males and females. Females thrived on a low protein adult diet irrespective
This document summarizes the results of a survey conducted at a Swedish medical college to understand student perceptions of problem-based learning (PBL) group tutorials. The survey had a high response rate of 84% and included students from terms 6-11 who had experience with PBL across the curriculum. Factor analysis identified four key factors related to student perceptions: 1) PBL as a learning method, 2) the tutor's role, 3) stress and insecurity from PBL, and 4) traditional teaching methods. Overall, students valued PBL but saw room for improvement in tutorial aims/objectives, group size, and more formal lectures. Perceptions declined over time, and students saw uneven tutor quality as an issue.
Singapore has four official languages but English is the most common. It has used various currencies throughout its history and currently uses the Singapore dollar. Singapore is an island city-state located in Southeast Asia with a population of over 5 million. It has a parliamentary republic government system. Some of Singapore's most popular tourist attractions include Universal Studios Singapore, Sentosa Island, the Night Safari zoo, and the Sri Mariamman Hindu temple. Singapore has a diverse culture that reflects its Chinese, Malay, Indian and European influences and its main religions include Buddhism, Taoism, Christianity, Islam and Hinduism. Major festivals celebrated include the Dragon Boat Festival and the biennial Garden Festival. Singapore cuisine features a blend of Chinese, Malay
WR Based Opinion Mining on Traffic Sentiment Analysis on Social Media (IRJET Journal)
This document presents a study on rule-based traffic sentiment analysis (TSA) using social media data. The study aims to develop a system to automatically retrieve tweets related to traffic and extract safety topics and sentiment polarity using unsupervised sentiment analysis. The system architecture crawls web data, performs preprocessing, extracts subjects/objects, extracts sentiment properties, and classifies sentiment. The goal is to help reduce traffic injuries and identify risk regions in real-time by monitoring public sentiment on social media. The study argues that while sentiment analysis research exists, more work is needed on transportation-related sentiment to improve transportation efficiency and safety.
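The rule-based polarity step the abstract describes can be illustrated with a minimal unsupervised sketch. The lexicons, the negation rule, and the sample tweets below are invented for illustration; they are not the paper's actual word rules.

```python
# Toy lexicons standing in for the paper's traffic-safety word rules.
POSITIVE = {"clear", "smooth", "fast"}
NEGATIVE = {"jam", "accident", "blocked", "slow"}
NEGATORS = {"not", "no"}

def classify_tweet(text):
    """Unsupervised polarity: count signed lexicon hits, flipping after a negator."""
    score, flip = 0, 1
    for token in text.lower().split():
        if token in NEGATORS:
            flip = -1
            continue
        if token in POSITIVE:
            score += flip
        elif token in NEGATIVE:
            score -= flip
        flip = 1  # negation only affects the immediately following word
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(classify_tweet("Traffic jam and an accident on I-80"))   # negative
print(classify_tweet("Roads are clear and traffic is smooth")) # positive
```

A real system would add the crawling, preprocessing, and subject/object extraction stages the architecture lists before this classification step.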
ANOMALY DETECTION AND ATTRIBUTION USING AUTO FORECAST AND DIRECTED GRAPHS (IJDKP)
In the business world, decision makers rely heavily on data to back their decisions. With the quantum of data increasing rapidly, traditional methods used to generate insights from reports and dashboards will soon become intractable. This creates a need for efficient systems that can substitute for human intelligence and reduce latency in decision making. This paper describes an approach to efficiently process time series data with multiple dimensions, such as geographies, verticals, and products, to detect anomalies in the data and, further, to explain potential reasons for the occurrence of those anomalies. The algorithm implements automatic selection of forecast models to make reliable forecasts and detect such anomalies. Depth First Search (DFS) is applied to analyse each of these anomalies and find its root causes. The algorithm filters out redundant causes and reports the insights to the stakeholders. Apart from being a hair-trigger KPI tracking mechanism, this algorithm can also be customized for problems like A/B testing, campaign tracking, and product evaluations.
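The pipeline the abstract describes (forecast, residual-based anomaly flagging, then a depth-first search over the dimension hierarchy for root causes) can be sketched in a few lines of Python. The moving-average forecaster, the k-sigma threshold, and the toy geography/product hierarchy below are illustrative assumptions, not the paper's auto-selected forecast models.

```python
from statistics import mean, stdev

def detect_anomalies(series, window=4, k=2.0):
    """Flag points whose residual against a moving-average forecast exceeds k sigma."""
    anomalies, residuals = [], []
    for i in range(window, len(series)):
        forecast = mean(series[i - window:i])  # stand-in for the auto-selected model
        residuals.append(series[i] - forecast)
        if len(residuals) > 2 and abs(residuals[-1]) > k * (stdev(residuals[:-1]) or 1.0):
            anomalies.append(i)
    return anomalies

def find_root_causes(node, threshold, path=()):
    """DFS over (deviation, children) nodes; keep the deepest branches that
    still explain more deviation than the threshold."""
    deviation, children = node
    if abs(deviation) < threshold:
        return []  # prune: this branch explains little of the anomaly
    deeper = []
    for name, child in children.items():
        deeper.extend(find_root_causes(child, threshold, path + (name,)))
    # report this node only if no child localizes the anomaly more precisely
    return deeper if deeper else [path or ("total",)]

series = [10, 11, 10, 12, 11, 10, 30, 11]
tree = (19.0, {
    "EMEA": (2.0, {}),
    "APAC": (17.0, {"Product-A": (16.0, {}), "Product-B": (1.0, {})}),
})
print(detect_anomalies(series))     # the spike at index 6 is flagged
print(find_root_causes(tree, 5.0))  # [('APAC', 'Product-A')]
```

The pruning step mirrors the paper's filtering of redundant causes: an ancestor is suppressed whenever a descendant accounts for the deviation more specifically.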
The advent of hybrid clouds, multi-clouds, and app-driven business models has necessitated a network that is robust, secure, and scalable enough to meet rapidly changing business expectations.
This paper acknowledges the great improvements that have taken place in lightning location systems, and in power system monitoring data, over the past 20 years or more. However, it suggests that there may be even more refinement possible if these two disparate data systems are brought together at the sensor data level rather than simply comparing the independent system results. It also covers a brief history of open source software (OSS) and discusses the advantages that OSS provides.
Over the past decade, cloud computing has acted as a disrupter in several areas of the IT business. Soon, it will overhaul one area of technology that has itself been growing rapidly: data analytics. Nicky will focus on a recent study by the IBM Institute for Business Value which shows that the capabilities that enable an organization to consume data faster, moving from raw data to insight-driven actions, are now the key differentiator in creating value from data and analytics. He will also talk about the requirements for the underlying infrastructure as a critical component allowing real-time crunching and analysis of high volumes of data. Based on real cases from retailers and energy companies, we will look at five predictions in five years, based on:
Analytics, Big data, and Cloud coming together will energize the Speed Advantage.
IRJET- Comparative Analysis of Various Tools for Data Mining and Big Data... (IRJET Journal)
This document compares and analyzes various tools for data mining and big data mining. It discusses traditional open source data mining tools like Orange, R, Weka, Shogun, Rapid Miner and KNIME. Each tool has different capabilities for data preprocessing, machine learning algorithms, visualization, platforms and programming languages. The document aims to help researchers select the most appropriate data mining tool for their needs and research.
This document discusses bridging reliability engineering and systems engineering when developing complex systems with software. It recommends including a formal knowledge management system to store and retrieve failure information from past projects. This closed-loop between reliability tools and systems engineering processes would help identify potential failure modes earlier and improve dependability. The document maps commonly used reliability engineering tools to each phase of the systems engineering lifecycle to integrate learnings from past failures into new designs.
This paper contains the details of a study of an Insurance Management system. The developed system will manage all the information regarding the insured and the policies offered by life insurance companies. It also contains an integrated voice-enabled appointment scheduler that alerts an agent to his daily activities, along with features such as a smart data backup system, a provisioning system, policy records, and commission reports. The application creates proposal/policy entries and is thus helpful for agents. It is designed to offer easy access to all records, to provide better maintainability, and to enable the user to make the required modifications as and when necessary. Execution of this project would enable the user to seek, use, and manipulate the records pertaining to every client.
This document discusses stream computing and its applications. Stream computing involves processing continuous streams of data in real-time, as opposed to batch processing of large static datasets. It describes key aspects of stream computing like filtering data streams and producing output streams. It also provides examples of applications that can benefit from stream computing, such as efficient traffic management, real-time surveillance, critical care monitoring in hospitals, and intrusion detection systems. The document concludes that stream computing platforms like System S are well-suited for scalable and adaptive real-time data processing.
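The filter-and-emit pattern that stream computing rests on can be modeled with Python generators. This is only a conceptual sketch, not the System S API; the sensor readings and the alarm threshold are invented for illustration.

```python
def sensor_stream(readings):
    """Stand-in for a continuous source; yields one reading at a time."""
    for r in readings:
        yield r

def threshold_filter(stream, limit):
    """Pass through only readings above the limit (e.g. alarm conditions)."""
    for value in stream:
        if value > limit:
            yield value

def tag_alerts(stream):
    """Produce an output stream of alert records from the filtered input."""
    for value in stream:
        yield {"alert": True, "value": value}

# Stages compose lazily, so each reading flows through as soon as it arrives,
# rather than waiting for a batch to accumulate.
alerts = list(tag_alerts(threshold_filter(sensor_stream([3, 9, 4, 12]), limit=5)))
print(alerts)  # [{'alert': True, 'value': 9}, {'alert': True, 'value': 12}]
```

A production stream platform adds windowing, parallelism, and back-pressure on top of this same pipeline shape.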
Analysing Transportation Data with Open Source Big Data Analytic Tools (ijeei-iaes)
This document discusses analyzing transportation data using open source big data analytic tools. It provides an overview of H2O and SparkR, two popular tools. It then demonstrates applying these tools to a transportation dataset, using a generalized linear model. Specifically, it shows importing and splitting the data, building a GLM model with H2O and SparkR, making predictions on test data, and comparing predicted versus actual values. The document provides examples of the coding and outputs at each step of the analysis process.
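The workflow the document walks through (import the data, split it, fit a generalized linear model, predict on the test set, compare predicted with actual) has the same shape regardless of tool. Below is a standard-library Python stand-in using ordinary least squares on synthetic trip-distance data; it is not the H2O or SparkR API, and the 80/20 split ratio and toy dataset are assumptions.

```python
import random

def split(rows, ratio=0.8, seed=42):
    """Shuffle and split rows into train/test sets, as done before modelling."""
    rnd = random.Random(seed)
    rows = rows[:]
    rnd.shuffle(rows)
    cut = int(len(rows) * ratio)
    return rows[:cut], rows[cut:]

def fit_linear(train):
    """Ordinary least squares for one predictor: y = a + b*x."""
    xs = [x for x, _ in train]
    ys = [y for _, y in train]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    b = sum((x - mx) * (y - my) for x, y in train) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

# Toy "trip distance -> travel time" rows standing in for the transportation set.
data = [(d, 2.0 * d + 5.0) for d in range(1, 21)]
train, test = split(data)
a, b = fit_linear(train)
predictions = [(a + b * x, y) for x, y in test]  # predicted vs. actual values
print(f"intercept={a:.2f}, slope={b:.2f}")
```

With H2O or SparkR, `split`, `fit_linear`, and the prediction loop map onto the frameworks' own split-frame, GLM-fit, and predict calls, with the heavy lifting distributed across the cluster.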
The article describes types of data used in autonomous driving, its intrinsic value and ability to monetize. Ecosystem data, fast versus slow moving data informs AV business models
IRJET- Towards Efficient Framework for Semantic Query Search Engine in Large-...IRJET Journal
The document proposes a new framework for efficient semantic search in large datasets. It aims to improve understanding of short texts by enriching them with concepts and related terms from a probabilistic knowledge base. A deep learning model using stacked autoencoders is designed to learn features from the enriched short texts and encode them into binary codes, allowing similarity searches. Experiments show the new approach captures semantics better than existing methods and enables applications like short text retrieval and classification.
IRJET- Road Accident Prediction using Machine Learning AlgorithmIRJET Journal
This document summarizes a research paper that predicts road accidents using machine learning algorithms. It discusses how large datasets have enabled data mining techniques to discover useful information. The paper aims to determine the most suitable machine learning classification technique for road accident prediction. It uses logistic regression, an algorithm that predicts a binary outcome (yes/no). The researchers clean the data, divide it into training and testing sets, and use logistic regression in Jupyter notebooks with the Python programming language. It provides percentage predictions of accident likelihood to users through a website interface. The results show logistic regression can accurately predict accidents for numerical data but has limitations for non-numerical text data.
Data observability is a collection of technologies and activities that allows data science teams to prevent problems from becoming severe business issues.
Big Data Security Challenges: An Overview and Application of User Behavior An...IRJET Journal
This document discusses big data security challenges and the application of user behavior analytics (UBA) to address those challenges. It first provides background on big data, defining its key characteristics and applications. It then outlines security risks to big data like privacy risks and risks to the data itself. Common big data security challenges are also summarized such as issues around data distribution, privacy, integrity and access control. The document then introduces UBA as a novel security analytics method, explaining how it uses machine learning to analyze user behaviors and detect anomalies that may indicate security threats like credential compromise or insider threats. Key advantages of UBA over other security tools are that it can more efficiently detect malicious user behavior and privileged account abuse.
A SESERV methodology for tussle analysis in Future Internet technologies - In...ictseserv
This document introduces a methodology for analyzing "tussles" that may occur between stakeholders with differing interests when new internet technologies are introduced. It defines tussles as conflicts that can arise at each stage of a technology's adoption and use. The methodology involves: 1) Identifying stakeholder roles and interests for a given functionality, 2) Identifying potential tussles between stakeholders, and 3) Assessing the impact of each tussle on stakeholders and the risk of spillover effects on other functionalities. The methodology aims to help understand how new technologies may affect stakeholders and to design technologies that allow for varying outcomes while avoiding instability and spillovers.
Efficiently Detecting and Analyzing Spam Reviews Using Live Data FeedIRJET Journal
This document proposes a system for efficiently detecting and analyzing spam reviews using a live data feed. The system aims to evaluate genuine customer feedback to help business analysts make decisions. It involves acquiring data from various sources, processing the data in parallel to detect fake reviews, and analyzing the results to identify spam. The key aspects of the system include filtering the data, load balancing among processing servers, aggregating results, and making decisions based on the analysis. The system architecture is divided into three units - data acquisition, data processing, and data analysis and decision making. Various algorithms are used for filtration, load balancing, processing, normalization, and summarization. The system provides accurate identification of spam while extracting useful customer feedback.
Privacy Preserving Aggregate Statistics for Mobile CrowdsensingIJSRED
This document summarizes a research paper on preserving privacy in mobile crowdsensing applications. It discusses how crowdsourced data can be aggregated and mined for valuable information but also risks disclosing sensitive user information. The paper proposes a new framework that introduces multiple agents between users and an untrusted server to help preserve location privacy of both workers and tasks in spatial crowdsourcing applications. Users upload sensed data to random agents, who then aggregate and perturb the statistics before further aggregation to publish overall statistics to third parties while protecting individual privacy through differential privacy techniques. The framework aims to enable privacy-preserving participation in crowdsourcing without relying on any single trusted entity.
Similar to 2015 GT FDA Elmendorf - ADAS and SDI-Title (20)
DNMTT - Synchrophasor Data Delivery Efficiency GEP Testing Results at Peak RCGrid Protection Alliance
GEP was tested against IEEE C37.118 for wide-area distribution of phasor data. Results showed that GEP had much less data loss than C37.118 over the same network conditions. GEP also required 60-70% less bandwidth for large and medium data flows compared to C37.118. There was no significant impact on servers between the two protocols. In conclusion, GEP represents an improved target for high-volume synchrophasor data distribution due to its robust and scalable pub/sub design.
This document provides an overview of open source software developed by the Grid Protection Alliance (GPA) for electric utilities. It describes GPA's business approach of developing open source solutions that are in production use. It then summarizes GPA's clients and the core technologies that many of its products are built upon, including the Grid Solutions Framework and Time Series Library. Finally, it outlines GPA's various open source products for synchrophasors, power quality monitoring, security, and other areas, as well as the services GPA provides.
1) The document discusses using open source software (OSS) tools to build an automated system for advanced power grid analytics.
2) It describes the key components or "building blocks" of such a system - getting data from devices, analyzing the data, and visualizing the results. Specific OSS projects like openMIC, openXDA, and Open PQ Dashboard are presented as implementing these functions.
3) Examples and use cases of automated analytics are provided, like fault detection and automated lightning correlation. Integrating additional analyses is also discussed.
Electric power companies are no exception when it comes to the flood of data now available to support business decisions and practices. To leverage the value in that flood rather than being overwhelmed, new automated analytic systems are critical. This presentation describes an environment that allows the deployment of robust automated systems that integrate data from disparate sources and present targeted proactive notifications and enterprise wide dashboard visualizations.
Advanced Automated Analytics Using OSS Tools, GA Tech FDA Conference 2016Grid Protection Alliance
Fred Elmendorf presented on using open source software (OSS) tools to build automated analytics systems. He discussed OSS projects that can get data from devices (openMIC), analyze the data (openXDA), and visualize results (Open PQ Dashboard). Examples of automated analytics included fault detection and breaker timing. Integrating lightning data was also proposed. The OSS approach stimulates collaboration and innovation while reducing costs compared to proprietary software.
Lightning continues to have major adverse impacts on the electric power grid, but there are no automated and tightly integrated systems to quickly and accurately identify all of the specific lightning strokes that impact power systems. As it stands today, comparing the estimated lightning parameters produced by continental scale lightning detection systems to the accurately measured local power system data does not produce the needed results. Nearly two decades ago concepts were described that have the potential to provide comprehensive information from the combination of data from these two different kinds of systems in near real time. This presentation gives an overview of lightning detection system evolution, and electric power data monitoring systems, then suggests using open source software (OSS) to develop a new approach for integrating these disparate systems at a fundamental data level to facilitate a dramatic step change in the timeliness and accuracy of identifying lightning strokes that impact the grid. This OSS based approach will foster collaboration and community involvement to further refine and deploy this new paradigm.
There are many very good power quality data analysis tools, but they are often vendor specific, and designed as an engineering desktop tool. This presentation is an introduction to a 'fleet view' dashboard built in the open source software (OSS) space. It is web based, built on an open source data layer that provides automated analytics, and can accept data from any standard COMTRADE or PQDIF formatted event file.
1. 2015 Georgia Tech Annual
Fault & Disturbance Analysis Conference
April 27-28, 2015 Atlanta, GA
-- 1 --
Automated Disturbance Analytics
And
System-wide Dashboard Insights
Using
Open Source Software
2. -- 2 --
Automated Disturbance Analytics and System-wide
Dashboard Insights Using Open Source Software
Fred L. Elmendorf
Grid Protection Alliance
Chattanooga, TN USA
felmendorf@gridprotectionalliance.org
Abstract— Power companies have invested huge sums of
money in building out their substation infrastructure with
current technology devices and the supporting communications
systems to integrate and operate new devices and data gathering
systems. The challenge created by an increasing number of
intelligent electronic devices (IEDs) producing data, and the
resulting increased volume of data to be managed and analyzed
makes it impossible for a human to fully understand the operational
health of the fleet of reporting devices, or to extract all the value
from the data that is being recorded. Economic pressures are
reducing the available staff to analyze data, and customers are
demanding better performance and power quality (PQ). With
greater volumes of data and decreased staff, automated
disturbance analytic systems are becoming ever more critical.
An open source software (OSS) approach maximizes
investments and facilitates industry wide collaboration to meet
the challenge.
Existing desktop tools are not designed for dynamic, real-
time, system-wide reporting, and typically, analysis engineers
and staff are so overwhelmed with data that only the most
critical events can be explored in any detail. Employing state of
the art technologies to aggregate data from the entire fleet of
reporting devices, and positioning that data in a highly
optimized database allows new value to be extracted from the
existing data. An open source ‘dashboard’ presentation of
information related to the entire population of reporting devices,
regardless of the type of device or manufacturer, can quickly
identify and alert on significant events or conditions.
This paper will provide a brief update on the growth and
benefits of OSS for the electric power industry, and a follow-up
to last year’s paper ‘The BIG Picture – A Look at Automated
Systems for Disturbance Analytics using Open Source
Software’. Fleet-wide techniques will be explored that can
move disturbance data analysis from reactive ‘firefighting’ to a
near-real-time understanding that facilitates proactive decisions.
Specific data management, aggregation, and positioning
techniques that make up an effective data layer to support a
responsive and scalable dashboard solution will be presented,
and system-wide insights facilitated by this approach will be
discussed. Whether you choose to use OSS in facing the
automated disturbance analysis challenge or not, this paper will
give you a better understanding of the complexity of the
challenge, and prepare you to make more informed solution
decisions. The paper will conclude with a case study of an
Electric Power Research Institute (EPRI) sponsored open source
power quality (PQ) dashboard funded by a number of major
utilities. The Open PQ Dashboard is currently in beta testing,
and is being deployed at the Tennessee Valley Authority
(TVA), Dominion Virginia Power, and Georgia Transmission
Corporation for further evaluation, testing and extension.
Keywords—power quality, dashboard, open source
software, disturbance analytics
I. GROWTH AND BENEFITS OF OSS
OSS has received a lot of attention already this year as
Microsoft continues with new contributions and provides blog
posts and online “how to” training videos. Microsoft is just one
example of a major, historically proprietary IT company that has
embraced OSS in a huge way. 2015 also marks the ninth year
that Black Duck Software has conducted a comprehensive cross-
industry survey to assess the future of open source, and in a
recent webcast they included these three points:
• OSS is becoming a more important part of the software ecosystem
• The use of OSS is a critical strategy for commercial companies
• The OSS business model has been validated
There is no longer a question regarding OSS as a possible
solution. It should be evaluated on an equal basis with
proprietary offerings. All software should be evaluated on
quality, security, and features whether OSS or proprietary, but
visibility of the source code, and community involvement give
OSS potential advantages in these areas. A recent EPRI white
paper provides a fresh perspective on OSS, lists some of their
important OSS projects, and presents the results of an electric
3. -- 3 --
utility specific OSS survey conducted in late 2014. The initial
survey results support the observation that OSS is still not well
understood within U.S. electric power companies.
Additional benefits of OSS that are particularly valuable in
the relatively small electric utility industry include:
• Lower total cost of ownership
• Reduced time to deployment
• Stimulates innovation
• Encourages and facilitates collaboration
Results from the 2015 Future of Open Source Survey
conducted by Black Duck Software were presented in a webinar
on April 16, 2015 [i]. Figure 1 below shows examples of a few
recent OSS related presentations and activities.
Figure 1. OSS Collage
II. FOLLOW-UP: “THE BIG PICTURE - …”
Leveraging the benefits of OSS and continuing to encourage
the use of industry standards over the past year has yielded many
improvements in automated disturbance analytic systems.
Following is an update on the gaps identified in “The Big Picture
– A Comprehensive Look at Automated Systems for
Disturbance Analytics using Open Source Software” [ii].
Data Retrieval – The ever increasing demand for more
information on the health and operation of the power system is
driving continuous growth in the communications
infrastructure. While the rate of change varies widely from one
company to another, overall it is improving. With regard to
automated near-real-time disturbance analytics, having this data
highway available is the first step. Managing the traffic on the
data highway is the next critical step in the process and at this
point it is still a patchwork of proprietary vendor supplied
systems. An OSS solution to isolate the analytic processes from
the proprietary uniqueness of reporting devices offers potential
value to all of the players. The OSS approach is good for
vendors because data from their device becomes more valuable
if there are fewer barriers to its use and it is more readily
incorporated into new applications with new audiences. It’s also
good for power companies because they can extract more value
from their installed devices, and have more flexibility in
choosing new hardware solutions. Many vendors and utilities
have expressed interest in an OSS solution, but at this time it has
not been accomplished.
Data Quality – OSS projects are underway to address a
number of data quality and availability issues. In one
application a large historical data set is analyzed to determine
the normal operating range for any trended value. Once the
normal operating range is established, each new data point is
compared to the range and appropriate alarms and notifications
are generated when the range is exceeded. New work for this
year will address missing data, latched values, engineering
reasonableness, and possibly others.
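The range-check logic described above can be sketched in a few lines of Python. This is a hypothetical illustration, not code from the project; the function names and the use of a three-sigma band are assumptions.

```python
from statistics import mean, stdev

def normal_range(history, k=3.0):
    """Derive a normal operating band from historical samples
    as mean +/- k standard deviations."""
    mu, sigma = mean(history), stdev(history)
    return mu - k * sigma, mu + k * sigma

def check_point(value, band):
    """Return an alarm string when a new sample leaves the band,
    otherwise None."""
    low, high = band
    if value < low:
        return f"LOW alarm: {value} below {low:.2f}"
    if value > high:
        return f"HIGH alarm: {value} above {high:.2f}"
    return None

# Example: trended voltage readings (synthetic here)
history = [120.0 + 0.1 * (i % 10) for i in range(1000)]
band = normal_range(history)
print(check_point(119.0, band))  # well below the band: LOW alarm
```

A percentile-based band, or one conditioned on season and loading, could be substituted without changing the comparison step.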
Analytics - Automated fault distance calculations continue
to be enhanced. Ongoing work funded through Dominion
Virginia Power, EPRI, Georgia Transmission Corporation, and
TVA, has added a sixth single-ended distance calculation
method and a native E-Max DFR format parser, and additional
work this year will add double-ended fault distance calculation
and breaker timing analysis and reporting. Additional analytics
under consideration are capacitor bank and other substation
equipment health, and cataloging and reporting on transient
events. The existing OSS data layer is capable of automatically
performing any analytics appropriate for disturbance or trending
data recorded in PQDIF, COMTRADE, or native E-Max DFR
formats.
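To illustrate the kind of single-ended calculation being automated, the well-known simple reactance method estimates distance from the imaginary part of the apparent impedance seen at one terminal. This sketch is not taken from openXDA; the names and per-mile reactance are assumptions, and the method neglects fault resistance and load flow.

```python
import cmath

def simple_reactance_distance(v_phasor, i_phasor, x_per_mile):
    """Estimate fault distance (miles) from one terminal:
    distance = Im(V/I) / positive-sequence reactance per mile,
    where V and I are the fault-loop phasors at the terminal."""
    apparent_z = v_phasor / i_phasor
    return apparent_z.imag / x_per_mile

# Example: a fault whose loop impedance is 5 + j20 ohms on a line
# with 0.8 ohm/mile reactance should locate at 20 / 0.8 = 25 miles.
v = 10_000 * cmath.exp(1j * 0.5)     # measured voltage phasor
i = v / complex(5.0, 20.0)           # current through 5 + j20 ohms
print(round(simple_reactance_distance(v, i, 0.8), 1))  # → 25.0
```

Double-ended methods, noted above as planned work, remove the fault-resistance error by combining phasors from both terminals.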
Applications – Automated fault distance calculation and notification
systems have been deployed at Dominion Virginia Power, Georgia
Transmission Corporation, and TVA. Features and analytic methods
are being enhanced in projects this year as noted above. An exciting
new use for the OSS data layer is to position data for visualization in
an OSS dashboard. The initial development of the dashboard is to
provide a fleet view of PQ related information. An independent web
based OSS system event explorer [iii] has been developed to provide
interactive review and comparisons of waveform data associated with
an event. A screenshot of the system event explorer is shown below
in Figure 2. The data layer is also being extended this year to
integrate PQ data with a proprietary EPRI PQ investigation tool.
Figure 2. System Event Explorer
III. REAL-TIME INFORMATION FOR PROACTIVE DECISIONS
Historically, PQ and event related information have been
recorded and archived to support largely manual processes for
event investigation, and manually initiated batch processing to
4. -- 4 --
produce reports of trending data. Typically this data has only
been reviewed to produce periodic reports or to investigate
events that are known to have caused system or customer issues.
Automated real-time processes are capable of analyzing and
categorizing information from every event record or trending
file. In this context, real-time means as soon as the data is
available. Data retrieval processes dictate the ‘real-time’
periodicity and lag time. Data from network connected devices
can be analyzed to produce reports and notifications within
seconds from the time of the event.
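One minimal way to realize "as soon as the data is available" is a polled drop directory: each pass processes only files not yet seen. This is a sketch under assumed mechanics, not the project's retrieval design.

```python
import os

def poll_once(drop_dir, process, seen):
    """Run `process` on every event file in drop_dir that has not
    been handled yet; return the updated seen-set. A retrieval
    service would call this in a loop, so notification latency is
    bounded by the poll period -- the paper's sense of 'real-time'."""
    for name in sorted(os.listdir(drop_dir)):
        if name not in seen:
            process(os.path.join(drop_dir, name))
            seen.add(name)
    return seen
```

Event-driven file-system notifications could replace polling where the operating system supports them, shrinking the lag further.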
IV. EFFECTIVE DATA LAYER
PQ and disturbance data is available from many different
types of devices and different manufacturers. As mentioned
previously, this presents a challenge in retrieving the data from
field devices, and it also presents a challenge in analyzing the
data. Through the extension of a 2012 EPRI OSS project to
prove the concept of automated fault location at the enterprise
level, an open source data layer [iv] has been developed to address
these challenges.
The data layer consists of:
• An automated back office service (Windows OS)
• Input parsers for event and trending data
– PQDIF
– IEEE COMTRADE
– EMAX native file format
• Output: database, emails, etc.
• Data sources:
– Power quality (PQ) monitors
– Digital fault recorders (DFRs)
– Other information systems
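The input-parser stage above amounts to routing each incoming file to a format-specific decoder. A minimal dispatch sketch follows; the parsers are placeholders and the file extensions are assumptions, not the data layer's actual conventions.

```python
import os

def parse_pqdif(path):
    return {"format": "PQDIF", "path": path}     # placeholder decoder

def parse_comtrade(path):
    return {"format": "COMTRADE", "path": path}  # placeholder decoder

def parse_emax(path):
    return {"format": "EMAX", "path": path}      # placeholder decoder

# Extension-to-parser routing; extensions are illustrative
PARSERS = {
    ".pqd": parse_pqdif,     # PQDIF
    ".cfg": parse_comtrade,  # IEEE COMTRADE configuration file
    ".dat": parse_comtrade,  # IEEE COMTRADE data file
    ".rcd": parse_emax,      # E-Max native record (name assumed)
}

def ingest(path):
    """Route an incoming event file to the parser for its format."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in PARSERS:
        raise ValueError(f"unsupported event file format: {ext}")
    return PARSERS[ext](path)
```

Because every parser emits the same normalized record shape, the downstream analytics never see device- or vendor-specific details, which is the isolation argued for in the Data Retrieval discussion above.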
A logical overview of the automation platform is shown below
in Figure 3.
Figure 3. Logical Overview
A physical overview of the automation platform is shown
below in Figure 4.
Figure 4. Physical Overview
V. SYSTEM-WIDE INSIGHTS
Using the data layer and presentation tools that have been
developed using OSS as previously described in this paper, it is
now possible to draw data together from many disparate data
sources, and present it in a system-wide context. The initial PQ
Dashboard uses this technique to convey information through a
combination of geographic, grid, histogram, and tabular
visualization panels to present a ‘one shot visual’. This ‘one
shot’ approach assists the user in comprehending the
information represented in very large volumes of data.
Additional functionality is being added in current projects that
will facilitate system wide visualization of any trended quantity
overlaid with power system representations. For example, a heat
map of system-wide minimum voltage could be displayed with
a system single line.
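The heat-map example above reduces to a per-site aggregation of trended samples. A minimal sketch, with names assumed for illustration:

```python
def min_voltage_by_site(samples):
    """Reduce trended voltage samples -- (site, per-unit voltage)
    pairs -- to the per-site minimum that a heat-map layer would
    color against the system one-line."""
    worst = {}
    for site, pu in samples:
        worst[site] = min(pu, worst.get(site, float("inf")))
    return worst

samples = [("SubA", 0.98), ("SubB", 0.95), ("SubA", 0.93), ("SubB", 0.99)]
print(min_voltage_by_site(samples))  # {'SubA': 0.93, 'SubB': 0.95}
```

The same reduction works for any trended quantity (maximum THD, flicker, unbalance) by swapping the aggregation function.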
VI. PQ DASHBOARD CASE STUDY
In 2014 EPRI initiated a project to use the open source
extensible disturbance analytics platform (openXDA) to provide
the data layer for an OSS PQ Dashboard. The Open PQ
Dashboard [v] is currently in beta status, and one of the tasks to be
completed this year is to produce a stable, easily deployable,
maintainable version 1.0. Additional tasks in the project will
provide greatly enhanced geographic displays, add new data
quality and availability alarming and reporting, and other
features as budget and schedule allow. The Open PQ Dashboard
has been deployed at two utilities with a third deployment
scheduled in June, 2015. Because of the OSS nature of the Open
PQ Dashboard and the openXDA, additional features and
functions are being added through independent projects that all
benefit the code base. Some of the features that have been added
through other projects include much more flexible time controls
and application navigation, the inclusion of new tabs for ‘Faults’
and ‘Breaker Timing’, and optimization of code for
5. -- 5 --
responsiveness. An additional EPRI project is underway that
uses the openXDA to integrate PQ data with EPRI’s popular PQ
Investigator tool, and displays the results through the PQ
Dashboard.
An example of the EVENTS tab with the PQ Dashboard in
the Map view is shown below in Figure 5.
Figure 5. PQ Dashboard Events with Map
An example of the EVENTS tab with the PQ Dashboard in
the Grid view is shown below in Figure 6.
Figure 6. PQ Dashboard Events with Grid
An example of the TRENDING tab with the PQ Dashboard in the
Map view is shown below in Figure 7.
Figure 7. PQ Dashboard Trends with Map
An example of the TRENDING tab with the PQ
Dashboard in the Grid view is shown below in Figure 8.
Figure 8. PQ Dashboard Trends with Grid
VII. SPAWNING NEW TOOLS
The automated analytic functions provided through the
openXDA and the fleet wide visualizations available through the
PQ Dashboard allow the user to quickly understand events or
changes on the system while positioning the relevant data for
detailed analysis. As mentioned earlier, an OSS system event
explorer (openSEE) has been developed to facilitate this detailed
analysis. When openXDA is configured to produce automated
email notifications for fault distance calculations, a link to
openSEE can be embedded in the email so that a user can
instantly view the waveforms associated with the fault in an
interactive web environment. Additionally, openSEE is directly
available through the PQ Dashboard and allows the user to
seamlessly examine the associated waveforms. openSEE is one
example of new analysis tools that can further leverage the
power of the OSS tools described in this paper.
Figure 9. openSEE with Phasor chart
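Composing a notification that carries an openSEE deep link, as described above, might look like the following sketch. The URL pattern and query parameter are illustrative assumptions, not the actual openSEE route.

```python
from email.message import EmailMessage

def fault_notification(event_id, line_name, miles, opensee_base):
    """Build a fault-distance notification whose body embeds a link
    into openSEE for the event's waveforms. The 'eventid' parameter
    name is hypothetical."""
    link = f"{opensee_base}?eventid={event_id}"
    msg = EmailMessage()
    msg["Subject"] = f"Fault on {line_name}: {miles:.1f} mi from terminal"
    msg.set_content(
        f"openXDA located a fault {miles:.1f} miles out on {line_name}.\n"
        f"View the waveforms in openSEE: {link}\n"
    )
    return msg

msg = fault_notification(4021, "Line 115-7", 25.3,
                         "https://pq.example.net/openSEE")
print(msg["Subject"])
```

Because the link resolves in a browser, the recipient moves from notification to interactive waveform review in one click, with no desktop tool installed.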
The frameworks are in place, and real-world experience
demonstrates that it is now possible to develop robust, extensible
software systems that can achieve automated disturbance
analytics and system-wide dashboard insights using an OSS
development strategy.
6. -- 6 --
[i] 2015 Future of Open Source,
https://www.blackducksoftware.com/future-of-open-source
[ii] The Big Picture,
http://www.slideshare.net/FredElmendorf/2014-georgia-tech-fda-pres-asda-using-oss-37239423
[iii] openSEE – System Event Explorer, http://opensee.codeplex.com
[iv] openXDA, http://openxda.codeplex.com
[v] Open Power Quality Dashboard,
http://sourceforge.net/projects/epriopenpqdashboard/