This paper studies a new, quantitative approach using fractal geometry to analyse basic tenets
of good programming style. Experiments on C source of the GNU/Linux Core Utilities, a
collection of 114 programs or approximately 70,000 lines of code, show systematic changes in
style are correlated with statistically significant changes in fractal dimension (P≤0.0009). The
data further show positive but weak correlation between lines of code and fractal dimension
(r=0.0878). These results suggest the fractal dimension is a reliable metric of changes that
affect good style, the knowledge of which may be useful for maintaining a code base.
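The abstract does not say how the fractal dimension of source code is estimated. As an illustration only, the sketch below uses box counting, a common estimator, and makes the assumption that a source file is treated as a set of 2D points, one per non-whitespace character. The function names are hypothetical and not taken from the paper.

```python
import math


def text_to_points(source):
    """Map each non-whitespace character to a (column, row) point (an assumption)."""
    return [
        (col, row)
        for row, line in enumerate(source.splitlines())
        for col, ch in enumerate(line)
        if not ch.isspace()
    ]


def box_count(points, size):
    """Count the size-by-size grid boxes that contain at least one point."""
    return len({(x // size, y // size) for x, y in points})


def box_counting_dimension(points, sizes=(1, 2, 4, 8, 16)):
    """Estimate dimension as the least-squares slope of log N(s) against log(1/s)."""
    xs = [math.log(1.0 / s) for s in sizes]
    ys = [math.log(box_count(points, s)) for s in sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# A filled square behaves like a 2D object, a straight line like a 1D object.
square = [(x, y) for x in range(64) for y in range(64)]
line = [(x, 0) for x in range(64)]
print(round(box_counting_dimension(square), 3))
print(round(box_counting_dimension(line), 3))
```

Under this assumption, a style change that alters indentation or line length changes the point set and may therefore shift the estimate.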
A NOVEL FEATURE SET FOR RECOGNITION OF SIMILAR SHAPED HANDWRITTEN HINDI CHARA... cscpconf
The growing need for handwritten Hindi character recognition in Indian offices, such as passport and railway offices, has made it a vital area of research. Similar shaped characters are more prone to misclassification. In this paper, four Machine Learning (ML) algorithms, namely Bayesian Network, Radial Basis Function Network (RBFN), Multilayer Perceptron (MLP), and C4.5 Decision Tree, are used for recognition of Similar Shaped Handwritten Hindi Characters (SSHHC), and their performance is compared. A novel feature set of 85 features is generated on the basis of character geometry. Because of the high dimensionality of the feature vector, the classifiers can be computationally expensive, so its dimensionality is reduced to 11 and 4 using Correlation-Based (CFS) and Consistency-Based (CON) feature selection techniques respectively. Experimental results show that Bayesian Network is the better choice when used with CFS features, while C4.5 gives better performance with CON features.
COQUEL: A CONCEPTUAL QUERY LANGUAGE BASED ON THE ENTITY-RELATIONSHIP MODEL csandit
As more and more collections of data become available on the Internet, end users who are not
experts in Computer Science demand easy solutions for retrieving data from these collections. A
good solution for these users is a conceptual query language, which facilitates the composition of
queries by means of a graphical interface. In this paper, we present (1) CoQueL, a conceptual
query language specified on E/R models and (2) a translation architecture for translating
CoQueL queries into languages such as XQuery or SQL.
Towards a semantic for uml activity diagram based on institution theory for i... csandit
In this article, we define an approach for model transformation. We use the example of UML
Activity Diagram (UML AD) and Event-B as the source and target formalisms. Before doing the
transformation, a formal semantics is given to the source formalism. We use institution
theory to define the intended semantics. With this theory, we obtain an algebraic specification for
this formalism. Thus, the source formalism is defined in its own natural semantics,
without any intermediate semantics. Model transformation is performed by a set of
transformation schemas which preserve the semantics expressed in the source model during the
transformation process. The generated model, expressed in the Event-B language, is used for
the formal verification of the source model. As a result, once a model is expressed in a precise
formalism, its verification can be seen as the verification of the Event-B model that is
semantically equivalent to the source model. In the present work, we combine
institution theory, the Event-B method and graph grammars to develop an approach supporting the
specification, transformation and verification of UML AD.
Using Meta-modeling Graph Grammars and R-Maude to Process and Simulate LRN Models Waqas Tariq
Nowadays, code mobility technology is one of the most attractive research domains. Numerous domains are concerned, many platforms have been developed and interesting applications have been realized. However, the weakness of modeling languages in dealing with code mobility at the requirements phase has prompted the proposal of new formalisms. Among these, we find Labeled Reconfigurable Nets (LRN) [9]. This new formalism allows explicit modeling of computational environments and of process mobility between them. It allows, in a simple and intuitive way, the modeling of mobile code paradigms (mobile agent, code on demand, remote evaluation). In this paper, we propose an approach based on the combined use of Meta-modeling and Graph Grammars to automatically generate a visual modeling tool for LRN for analysis and simulation purposes. In our approach, the UML Class diagram formalism is used to define a meta-model of LRN. The meta-modeling tool AToM3 is used to generate a visual modeling tool according to the proposed LRN meta-model. We also propose a graph grammar to generate an R-Maude [22] specification of the graphically specified LRN models. The reconfigurable rewriting logic language R-Maude is then used to simulate the resulting R-Maude specification. Our approach is illustrated through examples.
Functional Verification of Large-integers Circuits using a Cosimulation-base... IJECEIAES
Cryptography and computational algebra designs are complex systems based on modular arithmetic and built on multi-level modules where the bit-width is generally larger than 64 bits. Because of their particularity, such designs pose a real challenge for verification, in part because large-integer functions are not supported in current hardware description languages (HDLs), which limits the utility of HDL testbenches. On the other hand, the high-level verification approach has proved its efficiency over the HDL testbench technique in the last decade by raising the latter to a higher abstraction level. In this work, we propose a high-level platform to verify such designs by leveraging the capabilities of a popular tool (Matlab/Simulink) to meet the requirements of cycle-accurate verification without bit-size restrictions and at multiple levels inside the design architecture. The proposed high-level platform is augmented by assertion-based verification to complete the verification coverage. The experimental results of the platform on the test case provided good evidence of its performance and re-usability.
Image-Based Literal Node Matching for Linked Data Integration IJwest
This paper proposes a method of identifying and aggregating literal nodes that have the same meaning in Linked Open Data (LOD) in order to facilitate cross-domain search. LOD has a graph structure in which most nodes are represented by Uniform Resource Identifiers (URIs), and thus LOD sets are connected and searched through different domains. However, 5% of the values are literal values (strings without a URI) even in a de facto hub of LOD, DBpedia. In SPARQL Protocol and RDF Query Language (SPARQL) queries, we need to rely on regular expressions to match and trace the literal nodes. Therefore, we propose a novel method in which part of the LOD graph structure is regarded as a block image, and the matching is then calculated from image features of LOD. In experiments, we created about 30,000 literal pairs from a Japanese music category of DBpedia Japanese and Freebase, and confirmed that the proposed method determines literal identity with an F-measure of 76.1-85.0%.
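For readers unfamiliar with the F-measure quoted above, the following is a minimal sketch of the standard definition. The precision and recall values in the example are hypothetical, since the abstract reports only the resulting F-measure range.

```python
def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical values, not taken from the paper.
score = f_measure(0.80, 0.90)
print(round(score, 3))
```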
Cmaps as intellectual prosthesis (GERAS 34, Paris) Lawrie Hunter
At the present time, 'increasing accessibility of technology' is readily read as 'increasing accessibility of electronic information technology', but this is to ignore a history of pre-electronic technologies which have generally been conflated with the original media of education, first speech and rather later the writing of continuous text.
The insertion of spaces between words in text was a technology for accessibility of encoding. The paragraph was a technology for the signaling of rhetorical shifts. The bullet list is used for the representation of clusters of notions, either atomic (listing) or aggregates (classification). More substantial technological innovations include the data table and the graph.
One revolutionary technology that has not become mainstream in instructional communication is the Novakian concept map (i.e. the map whose links have text labels to specify the relation between two nodes). This technology has been substantially migrated to electronic information technology, and is arguably more prevalent there than in the traditional sphere, though it is still largely regarded as a novelty or non-essential element of instructional discourse.
This paper reports a case study of a fruitful application of Novakian mapping, wherein EAP learners of academic writing for management discover intellectual leverage in mapping, and develop their own use of the technique, in an iterative manner, in counterpoint with text analysis work. It tracks the cycling between moves analysis and concept mapping as these members of a graduate seminar work to unpack a paper that they have identified as a 'good model', but which they have realized is not a well-written paper.
The observations made here suggest that concept mapping is a pre-electronic technology that deserves a place amongst the essential tools for instructional discourse, particularly in settings such as EAP where the identification of rhetorical orchestration is difficult and where argument is often masked by other rhetorical devices.
Modeling and Evaluation of Performance and Reliability of Component-based So... Editor IJCATR
Validation of software systems is very useful at the early stages of their development cycle. Evaluation of functional
requirements is supported by clear and appropriate approaches, but there is no similar strategy for evaluation of non-functional requirements
(such as performance and reliability). Because meeting the non-functional requirements has a significant effect on the success of software
systems, suitable means of evaluating them are needed. Also, if software performance is
specified using performance models, it may be evaluated at the early stages of the software development cycle. Therefore, modeling
and evaluation of non-functional requirements at the software architecture level, which is designed at the early stages of the
development cycle and prior to implementation, will be very effective.
We propose an approach for evaluating the performance and reliability of software systems, based on formal models (hierarchical timed
colored Petri nets) at the software architecture level. In this approach, the software architecture is described by UML use case, activity and
component diagrams, and the UML model is then transformed by a proposed algorithm into an executable model based on hierarchical timed
colored Petri nets (HTCPN). Consequently, by executing the executable model and analysing its results, non-functional requirements including
performance (such as response time) and reliability may be evaluated at the software architecture level.
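To make the idea of an executable Petri net model concrete, here is a minimal sketch of the basic place/transition firing rule. It deliberately omits the hierarchy, timing and token colours of HTCPN, and the place and transition names are hypothetical rather than taken from the paper.

```python
def enabled(marking, transition):
    """A transition is enabled when every input place holds enough tokens."""
    return all(marking.get(p, 0) >= n for p, n in transition["inputs"].items())


def fire(marking, transition):
    """Return the new marking after firing; the original marking is not modified."""
    if not enabled(marking, transition):
        raise ValueError("transition is not enabled")
    new_marking = dict(marking)
    for place, n in transition["inputs"].items():
        new_marking[place] = new_marking.get(place, 0) - n  # consume tokens
    for place, n in transition["outputs"].items():
        new_marking[place] = new_marking.get(place, 0) + n  # produce tokens
    return new_marking


# Hypothetical component: requests wait for a single server.
start = {"inputs": {"waiting": 1, "server_idle": 1}, "outputs": {"busy": 1}}
finish = {"inputs": {"busy": 1}, "outputs": {"done": 1, "server_idle": 1}}

m0 = {"waiting": 2, "server_idle": 1}
m1 = fire(m0, start)
m2 = fire(m1, finish)
print(m1, m2)
```

A timed variant would attach a delay to each transition, which is what allows measures such as response time to be read off a simulation run.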
The Download: Tech Talks by the HPCC Systems Community, Episode 16 HPCC Systems
This episode will feature our 2018 HPCC Systems summer interns:
Shah Muhammad Hamdi, PhD student, CS at Georgia State University - Dimensionality Reduction and Feature Selection in ECL-ML
Hamdi will discuss the parallel implementation of Principal Component Analysis (PCA) using the Parallel Block Basic Linear Algebra Subsystem (PBblas) library and ECL implementations of feature selection algorithms for the HPCC Systems platform.
Robert Kennedy, PhD student in Computer Science at Florida Atlantic University - Parallel Distributed Deep Learning on HPCC Systems
Robert will cover what he implemented during his summer internship. Combining HPCC Systems and Google’s TensorFlow, Robert created a parallel stochastic gradient descent algorithm to provide a basis for future deep neural network research and to enhance HPCC Systems’ distributed neural network training capabilities.
Aramis Tanelus, programmer and senior at American Heritage High School where he is the lead programmer for the Advanced Robotics Team - Developing HPCC Systems Data Ingestion APIs for Common Robotic Sensors.
Aramis’s project will make it easy for anyone in robotics around the world to ingest data from common robotic sensors into an HPCC Systems platform for use in data analysis. Aramis will be speaking about his work on the autonomous agricultural robot and on implementing new packages for the Robot Operating System to interface with HPCC Systems for big data analysis.
Saminda Wijeratne, Masters student, Computational Science and Engineering at Georgia Institute of Technology, Atlanta - MPI Proof of Concept
The built-in "Message Passing" library in HPCC Systems is designed to handle these communications among dissimilar components and perform non-trivial communication patterns among them. Saminda will explore how this library currently operates and how we can introduce a different implementation such as an existing popular library called MPI.
A design pattern is a general solution to a commonly occurring problem in software design. It is a
template for solving a problem that can be used in many different situations. Patterns formalize best practices
that the programmer can use to solve common problems when designing an application or system. In this
article we focus on how the proposed UML diagrams can be implemented in the C#
language and whether the diagrams can be implemented in program code with the
greatest possible precision.
Tracing Requirements as a Problem of Machine Learning ijseajournal
Software requirement engineering and evolution are essential to the software development process, which defines and elaborates what is to be built in a project. Requirements are mostly written in text and later evolve into fine-grained and actionable artifacts with details about system configurations, technology stacks, etc. Tracing the evolution of requirements enables stakeholders to determine the origin of each requirement and understand how well the software’s design reflects its requirements. Because establishing requirements traceability is not a trivial task, a machine learning approach is used to classify traceability between various associated requirements. In particular, a 2-learner, ontology-based, pseudo-instances-enhanced approach, in which two classifiers are trained to separately exploit two types of features (lexical features and features derived from a hand-built ontology), is investigated for this task. The hand-built ontology is also leveraged to generate pseudo training instances to improve machine learning results. In comparison to a supervised baseline system that uses only lexical features, our approach yields a relative error reduction of 56.0%. Most interestingly, results do not deteriorate when the hand-built ontology is replaced with its automatically
constructed counterpart.
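The "relative error reduction" figure above is conventionally computed as the drop in error divided by the baseline error. The sketch below shows that calculation; the two error rates are hypothetical values chosen only to reproduce a reduction of about 56%, since the abstract does not report the underlying error rates.

```python
def relative_error_reduction(baseline_error, new_error):
    """Fraction of the baseline error removed by the new system."""
    if baseline_error <= 0:
        raise ValueError("baseline error must be positive")
    return (baseline_error - new_error) / baseline_error


# Hypothetical error rates, not taken from the paper.
reduction = relative_error_reduction(0.25, 0.11)
print(f"{reduction:.1%}")
```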
A Case Elaboration Methodology for a Semantic Web Service Discovery System Ba... IJERA Editor
Case-Based Reasoning (CBR) is a paradigm of intelligent reasoning which consists of reusing the results of previously solved problems (Source Cases) to solve new problems (Target Cases). It has been formalized as a five-step process consisting of "Elaboration", "Retrieve", "Reuse", "Revise" and "Retain". In this paper we focus on the first phase of the CBR cycle, with all of the modeling required to formalize a Case in our CBR-based system for semantic Web service discovery (CBR4WSD). This phase consists of formalizing and structuring the problem description before launching the “Retrieve” phase and selecting the most appropriate Source Cases from the Case Base. We identify a set of basic descriptors to formalize the Cases handled in our CBR4WSD system. In this context, and in accordance with CBR policies, we put forward our Case representation model.
In today's economic societies, using cash is an inseparable aspect of human life. People use
cash for shopping, services, entertainment, bank operations and so on. This large amount of
contact with cash, and the necessity of knowing its monetary value, creates one of the most
challenging problems for visually impaired people. In this paper we propose a mobile-phone-based
approach to identify the monetary value of cash from a picture, using image
processing and machine vision techniques. While the developed approach is very fast, it can
recognize the value of cash with an average accuracy of about 95% and can overcome different
challenges such as rotation, scaling, collision, illumination changes, perspective, and some others.
In this paper, a modified invasive weed optimization (IWO) algorithm is presented for
optimization of multiobjective flexible job shop scheduling problems (FJSSPs) with the criteria
of minimizing the maximum completion time (makespan), the total workload of machines and the
workload of the critical machine. IWO is a bio-inspired metaheuristic that mimics the
ecological behaviour of weeds in colonizing and finding a suitable place for growth and
reproduction. IWO was developed to solve continuous optimization problems, which is why the
Smallest Position Value (SPV) heuristic rule is used to convert the continuous position
values to discrete job sequences. The computational experiments show that the proposed
algorithm is highly competitive with the state-of-the-art methods in the literature, since it is able to
find the optimal and best-known solutions on the instances studied.
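The conversion from continuous positions to a discrete sequence can be sketched as follows. This assumes the usual formulation of the SPV rule, in which the dimension with the smallest position value is scheduled first; the position vector is a hypothetical example and the abstract does not give the paper's exact encoding.

```python
def spv_sequence(position):
    """Return job indices ordered by ascending position value (SPV rule)."""
    return sorted(range(len(position)), key=lambda j: position[j])


# Hypothetical continuous position of one weed, one value per job.
position = [1.80, -0.99, 3.01, -0.72, -1.20, 2.15]
print(spv_sequence(position))  # [4, 1, 3, 0, 5, 2]
```

Because the mapping depends only on the ordering of the values, the continuous search operators of IWO can be left unchanged while every candidate still decodes to a valid permutation.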
STATE SPACE GENERATION FRAMEWORK BASED ON BINARY DECISION DIAGRAM FOR DISTRIB... csandit
This paper proposes a new framework based on Binary Decision Diagrams (BDDs) for the graph distribution problem in the context of explicit model checking. BDDs are already used to represent the state space in symbolic model checking. We therefore take advantage of the high compression ratio of BDDs to encode not only the state space, but also the place where each state will be stored. To this end, a fitness function that gives a well-balanced load of states over the nodes of a homogeneous network is used. Furthermore, a detailed explanation of how to
calculate the inter-site edges between different nodes based on the adapted data structure is presented.
Image-Based Literal Node Matching for Linked Data IntegrationIJwest
This paper proposes a method of identifying and aggregating literal nodes that have the same meaning in Linked Open Data (LOD) in order to facilitate cross-domain search. LOD has a graph structure in which most nodes are represented by Uniform Resource Identifiers (URIs), and thus LOD sets are connected and searched through different domains.However, 5% of the values are literal values (strings without URI) even in a de facto hub of LOD, DBpedia. In SPARQL Protocol and RDF Query Language (SPARQL) queries, we need to rely on regular expression to match and trace the literal nodes. Therefore, we propose a novel method, in which part of the LOD graph structure is regarded as a block image, and then the matching is calculated by image features of LOD. In experiments, we created about 30,000 literal pairs from a Japanese music category of DBpedia Japanese and Freebase, and confirmed that the proposed method determines literal identity with F-measure of 76.1-85.0%.
Bca3020– data base management system(dbms)smumbahelp
Dear students get fully solved assignments
Send your semester & Specialization name to our mail id :
“ help.mbaassignments@gmail.com ”
or
Call us at : 08263069601
Mit202 data base management system(dbms)smumbahelp
Dear students get fully solved assignments
Send your semester & Specialization name to our mail id :
“ help.mbaassignments@gmail.com ”
or
Call us at : 08263069601
Cmaps as intellectual prosthesis (GERAS 34, Paris)Lawrie Hunter
At the present time, 'increasing accessibility of technology' is readily read as 'increasing accessibility of electronic information technology', but this is to ignore a history of pre-electronic technologies which have generally been conflated with the original media of education, first speech and rather later the writing of continuous text.
The insertion of spaces between words in text was a technology for accessibility of encoding. The paragraph was a technology for the signaling of rhetorical shifts. The bullet list is used for the representation of clusters of notions, either atomic (listing) or aggregates (classification). More substantial technological innovations include the data table and the graph.
One revolutionary technology that has not become mainstream in instructional communication is the Novakian concept map (i.e. the map whose links have text labels to specify the relation between two nodes). This technology has been substantially migrated to electronic information technology, and is arguably more prevalent there than in the traditional sphere, though it is still largely regarded as a novelty or non-essential element of instructional discourse.
This paper reports a case study of a fruitful application of Novakian mapping, wherein EAP learners of academic writing for management discover intellectual leverage in mapping, and develop their own use of the technique, in an iterative manner, in counterpoint with text analysis work. It tracks the cycling between moves analysis and concept mapping as these members of a graduate seminar work to unpack a paper that they have identified as a 'good model', but which they have realized is not a well-written paper.
The observations made here suggest that concept mapping is a pre-electronic technology that deserves a place amongst the essential tools for instructional discourse, particularly in settings such as EAP where the identification of rhetorical orchestration is difficult and where argument is often masked by other rhetorical devices.
Modeling and Evaluation of Performance and Reliability of Component-based So...Editor IJCATR
Validation of software systems is very useful at the primary stages of their development cycle. Evaluation of functional
requirements is supported by clear and appropriate approaches, but there is no similar strategy for evaluation of non-functional requirements
(such as performance and reliability). Whereas establishing the non-functional requirements have significant effect on success of software
systems, therefore considerable necessities are needed for evaluation of non-functional requirements. Also, if the software performance has
been specified based on performance models, may be evaluated at the primary stages of software development cycle. Therefore, modeling
and evaluation of non-functional requirements in software architecture level, that are designed at the primary stages of software systems
development cycle and prior to implementation, will be very effective.
We propose an approach for evaluate the performance and reliability of software systems, based on formal models (hierarchical timed
colored petri nets) in software architecture level. In this approach, the software architecture is described by UML use case, activity and
component diagrams, then UML model is transformed to an executable model based on hierarchical timed colored petri nets (HTCPN) by a
proposed algorithm. Consequently, upon execution of an executive model and analysis of its results, non-functional requirements including
performance (such as response time) and reliability may be evaluated in software architecture level.
The Download: Tech Talks by the HPCC Systems Community, Episode 16HPCC Systems
This episode will feature our 2018 HPCC Systems summer interns:
Shah Muhammad Hamdi, PhD student, CS at Georgia State University - Dimensionality Reduction and Feature Selection in ECL-ML
Hamdi will discuss the parallel implementation of Principal Component Analysis (PCA) using the Parallel Block Basic Linear Algebra Subsystem (PBblas) library and ECL implementations of feature selection algorithms for the HPCC Systems platform.
Robert Kennedy, PhD student in Computer Science at Florida Atlantic University - Parallel Distributed Deep Learning on HPCC Systems
Robert will cover what he implemented during his summer internship. Combining HPCC Systems and Google’s TensorFlow, Robert created a parallel stochastic gradient descent algorithm to provide a basis for future deep neural network research and to enhance HPCC System’s distributed neural network training capabilities.
Aramis Tanelus, programmer and senior at American Heritage High School where he is the lead programmer for the Advanced Robotics Team - Developing HPCC Systems Data Ingestion APIs for Common Robotic Sensors.
Aramis’s project will make it easy for anyone in robotics around the world to ingest data from common robotic sensors into an HPCC Systems platform for use in data analysis. Aramis will be speaking about his work on the autonomous agricultural robot and implementing new packages for the Robotics Operating System to interface with HPCC Systems for big data analysis.
Saminda Wijeratne, Masters student, Computational Science and Engineering at Georgia Institute of Technology, Atlanta - MPI Proof of Concept
The built-in "Message Passing" library in HPCC Systems is designed to handle these communications among dissimilar components and perform non-trivial communication patterns among them. Saminda will explore how this library currently operates and how we can introduce a different implementation such as an existing popular library called MPI.
2. 2 Computer Science & Information Technology (CS & IT)
anecdotal arguments rather than metrics to reason about aesthetic outcomes, and no research
effort heretofore has investigated the problem or its opportunities.
In this paper, we study a new, quantitative way to analyse basic tenets of good programming style
using fractal geometry [7]. Fractals are often associated with beauty in nature and human designs
[8]. Furthermore, since fractals are self-similar and scale-invariant, we hypothesized a fractal
approach might be inherently robust for handling distributions of source sizes.
Experiments with the C source code of the GNU/Linux Core Utilities [9], 114 commands of the
Linux shell or about 70,000 lines of code (LOC), show systematic changes in programming style
are correlated with statistically significant changes (P≤0.0002) in fractal dimension [10]. The data
further show that while the baseline sizes of C source files vary widely, there is a positive but
weak correlation with fractal dimension (r=0.0878). These data suggest the fractal dimension is a
reliable metric of changes in source that affect good style, the knowledge of which may be useful
for maintaining a code base.
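To make the notion of fractal dimension concrete, the following is a minimal sketch of a box-counting estimate applied to source text. It is an illustration only, not the pipeline used in this paper, which relies on the Fractop library: the rendering of a source file as a binary grid (`text_to_grid`), the grid size of 64, and the function names are all assumptions made for this example.

```python
import math


def text_to_grid(source, size=64):
    """Render source text as a size x size binary grid (hypothetical encoding):
    a cell is 1 where a non-whitespace character occurs, 0 elsewhere."""
    grid = [[0] * size for _ in range(size)]
    for row, line in enumerate(source.splitlines()[:size]):
        for col, ch in enumerate(line.expandtabs(4)[:size]):
            if not ch.isspace():
                grid[row][col] = 1
    return grid


def box_count(grid, box):
    """Count the boxes of side `box` that contain at least one set cell."""
    size = len(grid)
    count = 0
    for r in range(0, size, box):
        for c in range(0, size, box):
            if any(
                grid[i][j]
                for i in range(r, min(r + box, size))
                for j in range(c, min(c + box, size))
            ):
                count += 1
    return count


def box_counting_dimension(grid):
    """Estimate the dimension as the least-squares slope of log N(s)
    against log(1/s) for box sides s = 1, 2, 4, ..., size/2."""
    size = len(grid)
    xs, ys = [], []
    box = 1
    while box <= size // 2:
        n = box_count(grid, box)
        if n > 0:
            xs.append(math.log(1.0 / box))
            ys.append(math.log(n))
        box *= 2
    if len(xs) < 2:
        raise ValueError("grid is empty or too small to fit a slope")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


if __name__ == "__main__":
    sample = "int main(void)\n{\n    return 0;\n}\n"
    print(box_counting_dimension(text_to_grid(sample)))
```

Under this encoding a completely filled grid yields an estimate of about 2 and a single filled row about 1, so changes in layout such as indentation and line breaks move the estimate between these extremes. Whether this tracks the values reported by Fractop depends on how that library rasterizes and samples its input, which this sketch does not attempt to reproduce.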
2. RELATED WORK
Aesthetic value in source is not the same as readability [11] [12], although the two are related.
The latter is more about comprehending code whereas the former, appreciating it, l’art pour l’art.
Beauty in source is also not the same as functional complexity [13]. Complexity relates to design
and efficiency in algorithms and data structures, which may have appeal in a conceptual, though
not necessarily a visual sense, although here again there is overlap. Beautiful Code [14] explores
just this sort of conceptual aesthetic, not only in source but also in debugging and testing which
are not subjects we consider. Gabriel [15] argues against clarity and conceptual beauty as primary
goals of software in favour of what the author calls “habitability.” Yet comfort with the code is
independent of style since programmers might forgo style best practices as long as they can live
with it, whereas our starting point is good style. The fractal dimension has been applied to a wide
range of disciplines, though not software development [16]. Our code depends on Fractop [17], a
Java library originally developed to categorize neural tissue. We have reused this library to
analyze source code. Some researchers have employed the fractal dimension to study paintings of
artists [18]; others working in a similar vein have used the fractal dimension to authenticate
Jackson Pollock’s “action paintings” [19] [20]. Still others have used the fractal dimension to
examine aesthetic appeal in artificially intelligent path finding in videogames [21] [22] [23]. An
investigation of Scala repositories on GitHub.com found sources are organized according to
power-law distributions [24] [25], but that effort did not consider style. Kokol et al. [26] [27] [28]
reported evidence of fractal structure and long-range correlations in source; however, they were
investigating not style but fine details, character, operator, and string patterns in a small sample
of randomly generated Pascal programs. We study style in a moderate size sample of highly
functional C programs.
3. METHODS
We use a multi-phase operation to process a single source file: 1) beautify or de-beautify the
source style, if necessary; 2) convert the result to an in-memory representation called an artefact; 3)
calculate the fractal dimension of the artefact.
To beautify the source in phase 1, we use a combination of the GNU/Linux indent command and
a kit we developed called Mango [29] (see below). The indent manual page [6] gives input
options for beautifying the source according to four distinct C styles: GNU, K&R (Kernighan and
Ritchie), Berkeley, and Linux (kernel). These styles affect indentation, spacing, and comments; the
manual page describes the differences. The indent command does not, however, change
mnemonics.
Mango is a kit written in Scala, C, and to drive the experiments, Korn shell scripts. During the
first phase of processing, Mango mostly does the reverse of indent: it “mangles” or de-beautifies
C source and outputs new source as we discuss below.
3.1. Baseline measurements
To get baseline measurements of the source, Mango skips phase 1 and sends the unmodified
source directly to phases 2 and 3 to generate the artefact and calculate the fractal dimension,
respectively.
3.2. De-beautifying source
When de-beautifying source in phase 1, Mango does one of the following: remove indentation,
randomize indentation, remove comments, or make the names of variables, functions, macros,
and labels less mnemonic. To remove indentation, Mango trims the leading spaces from each line. To
randomize the indentation, Mango inserts a random number of spaces at the beginning of each line.
To remove comments, Mango strips the file of both block (/* … */) and line (//) comments.
Finally, to make names less mnemonic, Mango shortens them according to the algorithm below.
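The three layout treatments described above can be sketched in a few lines. The Python below is a minimal illustration under stated assumptions and is not Mango's own Scala code: the function names are hypothetical, blank lines are left unpadded, and the comment stripper relies on a regular expression that skips string and character literals rather than on a full C lexer.

```python
import random
import re

# Matches string and character literals (kept) or comments (dropped).
_LITERAL_OR_COMMENT = re.compile(
    r'"(?:\\.|[^"\\])*"'     # string literal
    r"|'(?:\\.|[^'\\])*'"    # character literal
    r"|/\*.*?\*/"            # block comment
    r"|//[^\n]*",            # line comment
    re.DOTALL,
)


def remove_indentation(source: str) -> str:
    """Trim leading spaces and tabs from every line."""
    return "\n".join(line.lstrip(" \t") for line in source.split("\n"))


def randomize_indentation(source: str, max_spaces: int, rng: random.Random) -> str:
    """Insert 0..max_spaces spaces at the start of each non-blank line."""
    out = []
    for line in source.split("\n"):
        # Assumption: blank lines stay blank so no trailing whitespace is created.
        pad = " " * rng.randint(0, max_spaces) if line.strip() else ""
        out.append(pad + line)
    return "\n".join(out)


def remove_comments(source: str) -> str:
    """Strip /* ... */ and // comments, leaving literals untouched."""
    def replace(match):
        text = match.group(0)
        # Both comment forms start with "/"; literals start with a quote.
        return "" if text.startswith("/") else text
    return _LITERAL_OR_COMMENT.sub(replace, source)
```

Passing a seeded `random.Random` instance keeps a randomized treatment repeatable across trials, which matters when the same file is measured several times.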
3.3. Non-mnemonic algorithm
The algorithm to shorten names requires two passes over the source. During the first pass Mango
filters out key words, compiler directives, library references, names shorter than a minimum length
(l=3), and names appearing fewer than a minimum number of times (n=3). For names that get through
these filters, Mango calculates new, non-mnemonic names as follows. If a name has at least one
under bar (“_”), Mango splits the name along the under bar, keeps the whole first sub-name followed
by an under bar, and appends the first letter of each subsequent sub-name. If a name is an
uppercase name, Mango uses every other letter to re-form the name, effectively cutting the name
in half. If a name is neither of these, Mango shortens the name by half. Mango puts the old name and
the new name in a database for lookup and substitution back into the source during the second
pass. The table below gives some examples of how the algorithm works.
Table 1. Example changes by non-mnemonic algorithm
Old name       New name
i              i
T_FATE_INIT    T_FI
NOUPDATE       NUDT
linkname       link
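The shortening rules and the rows of Table 1 can be reproduced with a short sketch. The Python below is illustrative only and is not Mango's implementation: the function names and the small keyword set are hypothetical, halving rounds down for odd lengths by assumption, and the naive tokenizer also renames matches inside strings and comments. Collisions between shortened names are not handled.

```python
import re
from collections import Counter

MIN_LENGTH = 3      # l in the text
MIN_FREQUENCY = 3   # n in the text

# Identifier-like tokens; the leading \b avoids matching inside numbers.
IDENT = re.compile(r"\b[A-Za-z_]\w*")

# Assumption: a partial keyword list, enough for illustration only.
C_KEYWORDS = frozenset({
    "int", "char", "void", "return", "if", "else", "for", "while",
    "struct", "static", "const", "include", "define",
})


def shorten(name: str) -> str:
    """Apply the three shortening rules to a single name."""
    if len(name) < MIN_LENGTH:
        return name                      # filtered: too short
    if "_" in name:
        first, *rest = name.split("_")
        return first + "_" + "".join(part[0] for part in rest if part)
    if name.isupper():
        return name[::2]                 # every other letter
    return name[: len(name) // 2]        # first half


def build_table(source: str, reserved=C_KEYWORDS) -> dict:
    """First pass: collect names that survive the filters."""
    counts = Counter(IDENT.findall(source))
    return {
        name: shorten(name)
        for name, seen in counts.items()
        if name not in reserved
        and len(name) >= MIN_LENGTH
        and seen >= MIN_FREQUENCY
    }


def substitute(source: str, table: dict) -> str:
    """Second pass: replace each surviving name with its short form."""
    return IDENT.sub(lambda m: table.get(m.group(0), m.group(0)), source)
```

For example, `substitute(src, build_table(src))` renames `linkname` to `link` only if `linkname` occurs at least three times in `src`.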
3.4. Mnemonic algorithm
Mango also has a beautify mode of phase 1 to make names more mnemonic. Mango does not, of
course, know the intention of programmers or the semantics of names. However, it can simulate
these by lengthening names. The algorithm to lengthen names is similar to the one to shorten
them. During the first pass Mango collects appropriately filtered candidate names of a maximum
length (l=3) and with a minimum frequency (n=3). Mango extends these names to a length of
four by repeating the letters in the name or adding an under bar after the name. The
table below gives some examples of how the algorithm works.
Table 2. Example changes by the mnemonic algorithm
Old name       New name
loop           loop
foo            foo_
go             gogo
i              iiii
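The rows of Table 2 are consistent with a simple padding rule, sketched below. This is an inference from the table rather than Mango's documented behaviour: we assume a name is repeated when its length divides four evenly and is otherwise padded with under bars, and the function name is hypothetical.

```python
MAX_LENGTH = 3      # l in the text: only names this short are lengthened
TARGET_LENGTH = 4   # lengthened names are four characters long


def lengthen(name: str) -> str:
    """Pad a short name to TARGET_LENGTH; leave longer names alone."""
    if not name or len(name) > MAX_LENGTH:
        return name
    if TARGET_LENGTH % len(name) == 0:
        # Assumption: repeat the whole name when it fits evenly.
        return name * (TARGET_LENGTH // len(name))
    # Otherwise fill the remainder with under bars.
    return name + "_" * (TARGET_LENGTH - len(name))
```

The frequency filter (n=3) and the substitution pass would work as in the shortening algorithm and are omitted here.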
3.5. Artefact generation
Phase 2 of Mango converts an input source file to an artefact, which has one of two types of
encodings: literal and block.
With literal encoding, the flat text of the source is written to a buffered image using a graphics
context. The text is Courier New, ten-point, plain style, and black foreground over a white
background with ten-point line height. In this case, the artefact looks identical to the flat text except
it is in bitmap form.
With block encoding, each character in the input is written to the graphics context as a “block”, an
8×10 (pixels) black filled rectangle over a white background, with two pixels between each
rectangle. Spaces are 10×10 pixels. A block artefact resembles the source but in digital outline.
Block encoding has two advantages. It makes the artefact more robust, that is, more independent
of language. Similarly, it makes the mnemonic and non-mnemonic algorithms more robust. In fact,
for these algorithms with block encoding, only the length of the name is relevant, not the name
itself.
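Block encoding is simple enough to sketch without a graphics library. The Python below builds the bitmap as a list of rows, with 1 for black and 0 for white. It is an illustration under assumptions and not Mango's rendering code: the two-pixel gap is applied vertically as well as horizontally, tabs are expanded to eight columns, and the function name is hypothetical.

```python
BLOCK_W, BLOCK_H = 8, 10   # size of one filled rectangle, in pixels
GAP = 2                    # white pixels between rectangles
CELL_W = BLOCK_W + GAP     # 10 px, which also matches the 10 px space width
CELL_H = BLOCK_H + GAP     # assumption: same gap between text rows


def block_encode(source: str, tab_size: int = 8) -> list:
    """Return a 0/1 bitmap in which every non-space character is a block."""
    lines = source.expandtabs(tab_size).split("\n")
    width = max((len(line) for line in lines), default=0) * CELL_W
    height = len(lines) * CELL_H
    image = [[0] * width for _ in range(height)]
    for row, line in enumerate(lines):
        for col, ch in enumerate(line):
            if ch.isspace():
                continue                      # spaces stay white
            x0, y0 = col * CELL_W, row * CELL_H
            for y in range(y0, y0 + BLOCK_H):
                for x in range(x0, x0 + BLOCK_W):
                    image[y][x] = 1
    return image
```

Because only character positions matter, two names of equal length produce identical blocks, which is why only name length is relevant under this encoding.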
The figure below is an example of a simple C program.
#include <stdio.h>
int main(int argc, char** argv) {
    printf("Hello, world!");
    return 0;
}
Figure 1. Simple C file, which is identical to its literal artefact encoding except in bitmap form
A literal artefact looks identical to the figure above except it is a bitmap.
The figure below shows the same C program as a block artefact.
Figure 2. Same C file as an artefact with block encoding
As the reader can see from the figure above, all the language details have been “blocked”. Only
the digital outline persists.
3.6. Fractal dimension calculation
The third and final phase of Mango measures the fractal dimension of the artefact. Mandelbrot [10]
described fractals as geometric objects that are nowhere differentiable, that is, textured, and
self-similar at different scales. We use the geometric interpretation based on reticular cell
counting, or the box counting dimension. We chose this method for two reasons. Firstly, the box
counting dimension is conceptually and computationally straightforward. Secondly, Fractop [17]
provides a tested, high quality implementation.
Mandelbrot also said fractal objects have fractional dimension, D, namely, a non-whole number
called the fractional dimension. Mathematically, D is given by the Hausdorff dimension [16]:
$$D(S) = \lim_{\varepsilon \to 0} \frac{\log N_{\varepsilon}(S)}{\log(1/\varepsilon)} \qquad (1)$$
where S represents a set of points on a surface (e.g., coastlines, brush strokes, source lines of
code, etc.), ε is the size of the measuring tool or ruler, and Nε(S) is the number of self-similar
objects or subcomponents covered by the measuring tool. For fractal objects, log Nε(S) will be
greater than log (1/ε) by a fractional amount. If the tool is a uniform grid of square cells, then a
straight line passes through twice as many cells if the cell length is reduced by a factor of two. A
fractal object passes through more than twice as many cells.
The artefact is S from Equation 1. Mango uses the Fractop default grid sizes of 2, 3, 4, 6, 8, 12,
16, 32, 64, and 128 measured in pixels for ε. For any given input artefact, Mango returns D,
which is the slope of the line describing how the log proportion of cells intersected by the surface
increases as log cell size decreases.
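A box counting estimate of D can be sketched as follows. This is a textbook formulation and not Fractop's code: the bitmap is assumed to be a list of rows of 0/1 pixels, the sketch regresses the log of the raw count of occupied cells on log (1/ε) by ordinary least squares, and Fractop's own regression details may differ. The function names are hypothetical; only the grid sizes come from the text.

```python
import math

# Grid sizes in pixels, as listed in the text.
GRID_SIZES = (2, 3, 4, 6, 8, 12, 16, 32, 64, 128)


def count_boxes(image, size: int) -> int:
    """Count grid cells of the given size that contain a black pixel."""
    occupied = set()
    for y, row in enumerate(image):
        for x, pixel in enumerate(row):
            if pixel:
                occupied.add((x // size, y // size))
    return len(occupied)


def box_counting_dimension(image, sizes=GRID_SIZES) -> float:
    """Least-squares slope of log N against log(1/size)."""
    xs, ys = [], []
    for size in sizes:
        n = count_boxes(image, size)
        if n > 0:
            xs.append(math.log(1.0 / size))
            ys.append(math.log(n))
    if len(xs) < 2:
        raise ValueError("need at least two non-empty grid sizes")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

As a sanity check, a completely filled square gives a slope of 2 and a single straight row of pixels gives a slope of 1 when the grid sizes divide the image evenly.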
4. EXPERIMENT DESIGN
The GNU/Linux Core Utilities version 8.10 [9] comprise 114 .c source files. First, we
generated descriptive statistics for this test bed for number of files and LOC.
We then ran three experiments as follows:
1. Establish baseline D using the original, unmodified C files with literal and block
artefact encodings.
2. Treat the source with de-beautifying regimes using Mango to i) remove indentation, ii)
randomize indentation by 0-20 spaces, iii) randomize indentation by 0-40 spaces, iv)
make names non-mnemonic, and v) remove comments.
3. Treat the source with beautifying regimes using Mango to i) make names more mnemonic, and
using GNU/Linux indent to refactor the source with ii) GNU, iii) K&R, iv) Berkeley, and v)
Linux style settings.
We observed the frequency and direction in which D changes relative to the baseline. We
computed the percentage change and the one-tailed P-value using the Binomial test [30]. We also
measured the rank correlation coefficient, Spearman’s rho [30], between the baseline D and lines
of code over all source files.
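Both statistics are easy to compute directly. The sketch below is illustrative and makes assumptions the text does not state: the binomial null hypothesis is that D is equally likely to move up or down (p = 0.5), Spearman's rho is computed as the Pearson correlation of average ranks, and the function names are hypothetical.

```python
import math


def binomial_p_one_tailed(successes: int, trials: int) -> float:
    """P(X >= successes) for X ~ Binomial(trials, 0.5)."""
    tail = sum(math.comb(trials, i) for i in range(successes, trials + 1))
    return tail / 2 ** trials


def ranks(values) -> list:
    """1-based ranks, with ties given the average of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman_rho(xs, ys) -> float:
    """Pearson correlation of the rank-transformed samples."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)
```

For a treatment that moves D in the expected direction in k of the 114 files, `binomial_p_one_tailed(k, 114)` gives the one-tailed P-value under the stated null.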
5. RESULTS
The table below gives the test bed summary statistics. The range of LOC is fairly wide, from files
with just two lines to several thousand lines.
Table 3. Test bed summary statistics
Files 114
Total LOC 69,722
Median LOC 356
Maximum LOC 4,733
Minimum LOC 2
The table below gives the baseline fractal dimension values for literal and block encodings.
Table 4. Baseline analysis
Statistic       Literal   Block
Median D        1.4592    1.6500
Maximum D       1.5448    1.7176
Minimum D       0.9836    1.4011
r (LOC v. D)    0.0878    0.0878
5.1 De-beautifying treatments
The tables below give the direction and the frequency of changes in D in relation to the
baseline. As the reader can see, the fractal dimension decreases in every case but one, with
broadly similar results for literal and block encoded artefacts. The exception is removing indents,
which is statistically significant but as a contrarian indicator. In other words, rather than
decreasing D, it increases it in relation to the baseline. We explore this matter further below.
Table 5. Changes in D in relation to the baseline with literal encoding
Treatment             Dir.   Freq.   Rate   P
Random indents 0-20   down   112     98%    <0.0001
Random indents 0-40   down   109     96%    <0.0001
Remove indents        up     107     94%    <0.0001
Remove comments       down   82      72%    <0.0001
Non-mnemonic          down   104     91%    <0.0001
Table 6. Changes in D in relation to the baseline with block encoding
Treatment             Dir.   Freq.   Rate   P
Random indents 0-20   down   113     99%    <0.0001
Random indents 0-40   down   113     99%    <0.0001
Remove indents        up     107     94%    <0.0001
Remove comments       down   112     98%    <0.0001
Non-mnemonic          down   106     93%    <0.0001
5.2 Beautifying treatments
The tables below give the direction and the frequency of changes in D in relation to the
baseline. D increases under every beautifying treatment.
Table 7. Changes in D in relation to the baseline with literal encoding
Treatment        Dir.   Freq.   Rate   P
GNU style        up     100     88%    <0.0001
K&R style        up     105     92%    <0.0001
Berkeley style   up     74      65%    0.0009
Linux style      up     106     93%    <0.0001
Mnemonic         up     97      85%    <0.0001
Table 8. Changes in D in relation to the baseline with block encoding
Treatment        Dir.   Freq.   Rate   P
GNU style        up     112     98%    <0.0001
K&R style        up     104     91%    <0.0001
Berkeley style   up     78      68%    <0.0001
Linux style      up     105     92%    <0.0001
Mnemonic         up     99      87%    <0.0001
5.3 No indentation as contrarian indicator
The de-beautifying treatment in section 5.1, “De-beautifying treatments,” removed indentation on all
the source lines, and we found D increased. We hypothesized that if removing indentation were a
contrarian indicator, D would rise steadily from the baseline (0% removal rate) to complete
indentation removal (100% removal rate). The null hypothesis is that D does not change with the
removal rate. To test it, we examined several files and found we could reject the null, at least on
a subset of typical size files. For instance, mktemp.c has 358 LOC, which is very close to the
median file size. We removed the indentation on randomly selected lines at 25%, 50%, and 75%
rates and measured D in ten trials using literal encoding. The data for mktemp.c in the table
below are typical of the other programs we examined.
Table 9. D for different random indentation removal rates over ten trials for mktemp.c
         Indentation removal rate
Trial    25%           50%           75%
1        1.468205428   1.470438295   1.476648907
2        1.46463698    1.472219091   1.47721244
3        1.465692458   1.470056954   1.475848552
4        1.465102815   1.47256331    1.479550183
5        1.464691894   1.469024252   1.477846232
6        1.464413407   1.470376845   1.480434004
7        1.465313286   1.474732486   1.481568639
8        1.466252928   1.470800863   1.480060737
9        1.469609632   1.470203698   1.474179211
10       1.467231153   1.468487205   1.480865379
Median   1.465502872   1.47040757    1.478698207
The chart below shows the plot with the median values for 25%, 50%, and 75% removal rates,
the baseline (0%), and complete removal (100%).
Figure 3. Indentation removal rate vs. D for mktemp.c, where 0% is the baseline and 100% is
removal of all indentation.
6. DISCUSSION
The first observation we make is that generally Dliteral < Dblock. This makes sense since the block
encoding covers more surface area, S, in the artefact than the literal encoding. Our preference is
for block encoding because of the robustness we mentioned earlier. Nevertheless, the pattern of
results is consistent between literal and block encoding. When we de-beautify the source, D
decreases; when we beautify the source, D increases.
The exception, as we noted, is the removal of all indentation, which Figure 3 suggests is a
contrarian indicator of style. We believe the contrariness is a peculiar property of
the fractal dimension. That is, keeping in mind that D=2 means there is no texture and we have a
completely covered surface of a solid colour, the larger D for removing indentation implies
greater surface area. Thus, having all the text aligned on the left gives a more compact, and thus
more complete, surface.
All the beautifying treatments increase D. With literal encoding, the indent command programmed
with Linux style raises D most often; with block encoding, GNU style does. Berkeley style is the
least effective under both encodings.
What is most interesting is that although the GNU/Linux Core Utilities were presumably written with
the GNU style guide, the GNU style-beautifying regime nonetheless increases D. If changes in D
represent changes in style, as the data suggest, then it appears there may yet be room for style
improvements in the Core Utilities.
This observation offers insight into how to formulate a relative aesthetic value. Consider, for
instance, the conflict between regimes that beautify code and increase D and the contrarian effect
of removing all indentation, which de-beautifies the code but also increases D. One way to resolve
this is to randomly sample the removal of indentation at different rates, measure D for each rate
as we did above, and test the slope of the line. If it is near zero, we assume there must be poor
indentation. In fact, the slope might be the aesthetic value of the indentation. A similar process
could be developed for documentation and mnemonics.
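As a worked example of the slope idea, the sketch below fits an ordinary least-squares line to the three median values reported in Table 9 for mktemp.c. Only those medians come from the data; the baseline (0%) and complete removal (100%) values are not tabulated, so they are left out, and the function name is hypothetical.

```python
# Removal rates and median D values taken from Table 9 (mktemp.c).
RATES = (0.25, 0.50, 0.75)
MEDIAN_D = (1.465502872, 1.47040757, 1.478698207)


def ols_slope(xs, ys) -> float:
    """Ordinary least-squares slope of ys regressed on xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Change in D per unit change in removal rate (rate expressed as 0..1).
slope = ols_slope(RATES, MEDIAN_D)
```

For these three points the slope is about 0.026, that is, D rises by roughly 0.0026 for every additional ten percentage points of indentation removed. A formal test of whether the slope differs from zero would need the per-trial values rather than the medians.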
7. CONCLUSIONS
We have seen how systematic changes in the style of C programs affect the fractal dimension in a
statistically significant manner. Future research may consider the nature of these changes, i.e.,
how much beauty was added or removed by a change in style as suggested in the discussion.
Another useful avenue is confirming these results for programming languages other than C.
REFERENCES
[1] Vermeulen, Allan & Ambler, Scott W., (2000) The Elements of Java Style, Cambridge
[2] Oualline, S., (1992) C Elements of Style: The Programmer’s Style Manual for Elegant C and C++
Programs, M&T
[3] Google, Inc., (2015) “google-styleguide”, http://code.google.com/p/google-styleguide/, accessed 11-
May-2015
[4] NOAA National Weather Service, National Weather Service Office of Hydrologic Development,
(2007) “General Software Development Standards and Guidelines Version 3.5”
[5] Kant, Immanuel, (1978) The Critique of Judgment (1790), translation by J. C. Meredith, Oxford
University Press
[6] Free Software Foundation, (2015) http://linux.die.net/man/1/indent, accessed 13-May-2015
[7] Mandelbrot, Benoit, (1967) “How long is the coast of Britain? Statistical self-similarity and fractional
dimension,” Science, vol. 156 (3775), p. 636-638
[8] Peitgen, Heinz-Otto & Richter, P.H., (1986) The Beauty of Fractals, Springer
[9] Free Software Foundation (2015) http://www.gnu.org/software/coreutils/coreutils.html, accessed 11-
May-2015
[10] Mandelbrot, Benoit, (1982) Fractal Geometry of Nature, Freeman, 1982
[11] Posnett, Daryl, Hindle, Abram & Devanbu, Prem, (2011) “A Simpler Model of Software
Readability”, MSR ’11 Proceedings of the 8th Working Conference on Mining Software Repositories
[12] Buse, Raymond P.L., & Weimer, Westley R., (2008) “A metric for software readability,” ISSTA '08
Proceedings of the 2008 international symposium on Software testing and analysis
[13] Tran-Cao, De, Lévesque, Ghislain, & Meunier, Jean-Guy, (2004) "A Field Study of Software
Functional Complexity Measurement," Proceedings of the 14th International Workshop on Software
Measurement
[14] Oram, Andy & Wilson, Greg, eds. (2007) Beautiful Code, O’Reilly
[15] Gabriel, Richard, (1996) Patterns of Software, Oxford
[16] Schroeder, M., (2009) Fractals, Chaos, and Power Laws, Dover, 2009
[17] Cornforth, David, Jelinek, Herbert, Peichl, Leo, (2002) “Fractop: A Tool for Automated Biological
Image Classification,” Proceedings of the Sixth Australia-Japan Joint Workshop on Intelligent and
Evolutionary Systems, p. 1-8
[18] Gerl, Peter, Schönlieb, Carola, Wang, Kung Cheih, (2004) “The Use of Fractal Dimension in Arts
Analysis,” Harmonic and Fractal Image Analysis, 2004, p. 70-73
[19] Coddington, Jim, Elton, John, & Rockmore, Daniel, Wang, Yang, (2008) “Multifractal analysis and
authentication of Jackson Pollock paintings” Proc. SPIE 6810, Computer Image Analysis in the Study
of Art, 68100F; doi: 10.1117/12.765015
[20] Taylor, R.P, Micolich, A.P., Jonas, D., (1999) “Fractal analysis of Pollock’s drip paintings,” Nature,
vol. 399, June 1999
[21] Coleman, R, (2009) “Long-Memory of Pathfinding Aesthetics,” International Journal of Computer
Games Technology, Volume 2009, Article ID 318505
[22] Coleman, R., (2009) “Fractal Analysis of Stealthy Pathfinding,” International Journal of Computer
Games Technology, Special Issue on Artificial Intelligence for Computer Games, Volume 2009,
Article ID 670459
[23] Coleman, R., (2008) “Fractal Analysis of Pathfinding Aesthetics,” International Journal of Simulation
Modeling, Vol. 7, No. 2
[24] Coleman, Ron, Johnson, Matthew, (2014) “A Study of Scala Repositories on Github”, International
Journal of Advanced Computer Science Applications, vol. 5, issue 7, August 2014
[25] Coleman, Ron, Johnson, Matthew, (2014) “Power-Laws and Structure in Functional Programs,”
Proceedings of 2014 International Conference on Computational Science & Computational
Intelligence, Las Vegas, NV, IEEE Computer Society
[26] P. Kokol, J. Brest, and V. Zumer, “Long-range correlations in computer programs,” Cybernetics and
systems, 28(1), 1997, p43-57
[27] P. Kokol, J. Brest, “Fractal structure of random programs,” SIGPLAN notices 33(6), 1998, p33-38
[28] P. Kokol “Searching for fractal structure in computer programs,” SIGPLAN 29(1), 1994
[29] Coleman, R., Pretty project, (2015) http://github.com/roncoleman125/Pretty, accessed 11-May-2015
[30] Conover, W.J., (1999) Practical Non-Parametric Statistics, Wiley