The document discusses using the k-nearest neighbor (k-NN) algorithm for missing data imputation. It compares mean, median, and standard deviation imputation when each is combined with k-NN grouping. The techniques are applied to groups of different sizes, and median and standard deviation substitution show slightly better results than mean substitution. For these two techniques, accuracy also improves with larger group sizes and higher percentages of missing data.
A tutorial on LDA that first builds intuition for the algorithm and then works through a numerical example solved in MATLAB. The presentation is an audio slide deck that becomes self-explanatory when downloaded and viewed in slideshow mode.
Analysis of Variance (ANOVA) is a general statistical technique that analyzes sample variances in order to compare multiple population means.
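For a concrete illustration of the idea (not taken from the slides; the group samples below are invented), a one-way ANOVA takes one sample per group and tests whether all population means are equal:

```python
# Hypothetical one-way ANOVA illustration with SciPy (not from the slides).
from scipy import stats

# Made-up samples from three groups (e.g., three teaching methods).
group_a = [85, 90, 88, 75, 95]
group_b = [70, 65, 80, 74, 68]
group_c = [82, 79, 88, 84, 80]

# H0: all three population means are equal.
f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")  # reject H0 if p < 0.05
```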
This presentation discusses the application of discriminant analysis in sports research, walking through the steps of the analysis and the testing of its assumptions.
Chapter 3: Describing, Exploring, and Comparing Data
3.2: Measures of Variation
Parametric Sensitivity Analysis of a Mathematical Model of Two Interacting Po... (IOSR Journals)
Experts in the mathematical modeling of two interacting technologies have observed differing contributions from the intraspecific and interspecific coefficients, in conjunction with the starting population sizes and the trading period. In this complex multi-parameter system of competing technologies evolving over time, we have used the numerical method of mathematical norms to measure the sensitivity of the intraspecific coefficients b and e, the starting population sizes of the two interacting technologies, and the duration of trading. We have observed that the two intraspecific coefficients can be considered the most sensitive parameters, while the starting populations are the least sensitive. We expect these contributions to provide useful insights into determining the important parameters that drive the dynamics of the technological substitution model in the context of one-at-a-time sensitivity analysis.
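The paper's model equations and norm are not reproduced in this summary; the sketch below merely illustrates the one-at-a-time procedure on a generic two-population competition model, with every coefficient value, the 1% perturbation size, and the Euclidean norm chosen for illustration only:

```python
# One-at-a-time (OAT) sensitivity sketch on a generic two-population
# competition model; every value below is assumed for illustration and
# is not taken from the paper.
import numpy as np
from scipy.integrate import solve_ivp

def model(t, y, b, e, c12, c21):
    x1, x2 = y
    # b, e: intraspecific coefficients; c12, c21: interspecific coefficients.
    return [x1 * (1.0 - b * x1 - c12 * x2),
            x2 * (1.0 - e * x2 - c21 * x1)]

def trajectory(params, y0=(0.1, 0.1), t_end=50.0):
    sol = solve_ivp(model, (0.0, t_end), y0, args=params,
                    t_eval=np.linspace(0.0, t_end, 200))
    return sol.y

base = (0.5, 0.4, 0.2, 0.3)  # assumed (b, e, c12, c21)
ref = trajectory(base)

for i, name in enumerate(["b", "e", "c12", "c21"]):
    bumped = list(base)
    bumped[i] *= 1.01  # perturb one parameter at a time by 1%
    # Sensitivity = norm of the deviation of the perturbed trajectory.
    print(name, np.linalg.norm(trajectory(tuple(bumped)) - ref))
```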
This report includes information about:
1. Pre-Processing Variables
a. Treating Missing Values
b. Treating correlated variables
2. Selection of Variables using random forest weights (see the sketch after this list)
3. Building a model to predict donors and the amount expected to be donated.
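As a rough sketch of step 2 (the report's actual code and data are not shown here; the synthetic dataset and the mean-importance cutoff are assumptions), random forest importances can be used to rank and select variables in scikit-learn:

```python
# Hedged sketch of variable selection via random forest importances;
# the dataset and the selection cutoff are placeholders, not the report's.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
importances = rf.feature_importances_  # the "random forest weights"

# Keep features whose importance exceeds the mean importance (assumed rule).
selected = np.where(importances > importances.mean())[0]
print("selected feature indices:", selected)
```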
This presentation discusses in detail the procedure involved in a two-factor MANOVA. Both the multivariate and the univariate analyses are shown for this design by solving an illustration in the SPSS software.
This presentation discusses the procedure involved in a two-way mixed ANOVA design, demonstrated by solving a problem in SPSS.
As in many other scientific domains where computer-based tools need to be evaluated, medical imaging often requires the expensive generation of manual ground truth. For some specific tasks, medical doctors are required to guarantee high-quality, valid results, whereas other tasks, such as the image modality classification described in this text, can be performed to sufficiently high quality by general domain experts.
Discriminant analysis is a technique used to analyze research data when the criterion (dependent) variable is categorical and the predictor (independent) variables are interval in nature. The term categorical means that the dependent variable is divided into a number of categories.
DA is typically used when the groups are already defined prior to the study.
The end result of DA is a model that can be used to predict group membership. The model allows us to understand the relationship between the set of selected variables and the observations, and it enables one to assess the contributions of the individual variables.
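As a minimal sketch of such a model (not the presentation's own material; the iris data stands in for real research data), scikit-learn's LinearDiscriminantAnalysis fits a discriminant model and predicts group membership:

```python
# Minimal discriminant-analysis sketch with scikit-learn (illustrative only).
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)  # interval predictors, categorical groups

lda = LinearDiscriminantAnalysis().fit(X, y)
print(lda.predict(X[:5]))  # predicted group memberships
print(lda.coef_)           # per-variable contributions to the model
```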
Hybrid Method HVS-MRMR for Variable Selection in Multilayer Artificial Neural... (IJECEIAES)
Variable selection is an important technique for reducing the dimensionality of data, frequently used in preprocessing for data mining. This paper presents a new variable selection algorithm that combines heuristic variable selection (HVS) with Minimum Redundancy Maximum Relevance (MRMR). We enhance the HVS method for variable selection by incorporating the MRMR filter. Our algorithm follows a wrapper approach using a multi-layer perceptron; we call it the HVS-MRMR wrapper for variable selection. The relevance of a set of variables is measured by a convex combination of the relevance given by the HVS criterion and the MRMR criterion. The approach selects new relevant variables; we evaluate the performance of HVS-MRMR on eight benchmark classification problems. The experimental results show that HVS-MRMR selects fewer variables with higher classification accuracy compared to MRMR, HVS, and no variable selection on most datasets. HVS-MRMR can be applied to various classification problems that require high classification accuracy.
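The HVS and MRMR scoring functions themselves are not spelled out in this abstract; the sketch below shows only the convex combination of the two relevance scores that the abstract describes, with placeholder score arrays and an assumed weight alpha:

```python
# Sketch of the convex combination of HVS and MRMR relevance scores
# described in the abstract; score arrays and alpha are placeholders.
import numpy as np

def combined_relevance(hvs_scores, mrmr_scores, alpha=0.5):
    """Convex combination alpha*HVS + (1 - alpha)*MRMR, per variable."""
    h = np.asarray(hvs_scores, dtype=float)
    m = np.asarray(mrmr_scores, dtype=float)
    # Min-max normalize each criterion so the two scales are comparable.
    h = (h - h.min()) / (np.ptp(h) or 1.0)
    m = (m - m.min()) / (np.ptp(m) or 1.0)
    return alpha * h + (1.0 - alpha) * m

scores = combined_relevance([0.2, 0.9, 0.4], [0.7, 0.6, 0.1])
print(np.argsort(scores)[::-1])  # variables ranked by combined relevance
```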
May 2015 talk to SW Data Meetup by Professor Hendrik Blockeel from KU Leuven & Leiden University.
With increasing amounts of ever more complex forms of digital data becoming available, the methods for analyzing these data have also become more diverse and sophisticated. With this comes an increased risk of incorrect use of these methods, and a greater burden on the user to be knowledgeable about their assumptions. In addition, the user needs to know about a wide variety of methods to be able to apply the most suitable one to a particular problem. This combination of broad and deep knowledge is not sustainable.
The idea behind declarative data analysis is that the burden of choosing the right statistical methodology for answering a research question should no longer lie with the user, but with the system. The user should be able to simply describe the problem, formulate a question, and let the system take it from there. To achieve this, we need to find answers to questions such as: what languages are suitable for formulating these questions, and what execution mechanisms can we develop for them? In this talk, I will discuss recent and ongoing research in this direction. The talk will touch upon query languages for data mining and for statistical inference, declarative modeling for data mining, meta-learning, and constraint-based data mining. What connects these research threads is that they all strive to put intelligence about data analysis into the system, instead of assuming it resides in the user.
Hendrik Blockeel is a professor of computer science at KU Leuven, Belgium, and part-time associate professor at Leiden University, The Netherlands. His research interests lie mostly in machine learning and data mining. He has made a variety of research contributions in these fields, including work on decision tree learning, inductive logic programming, predictive clustering, probabilistic-logical models, inductive databases, constraint-based data mining, and declarative data analysis. He is an action editor for Machine Learning and serves on the editorial board of several other journals. He has chaired or organized multiple conferences, workshops, and summer schools, including ILP, ECMLPKDD, IDA and ACAI, and he has been vice-chair, area chair, or senior PC member for ECAI, IJCAI, ICML, KDD, ICDM. He was a member of the board of the European Coordinating Committee for Artificial Intelligence from 2004 to 2010, and currently serves as publications chair for the ECMLPKDD steering committee.
Performance evaluation of hepatitis diagnosis using single and multi classifi... (ahmedbohy)
The goal of our paper is to obtain superior accuracy from different classifiers, or from multi-classifier fusion, in diagnosing hepatitis using a dataset from Ljubljana University. We present an implementation of several classification methods regarded as the best algorithms in the medical field, and then apply fusion between classifiers to find the best multi-classifier fusion approach. Classification accuracy is computed from the confusion matrix within a 10-fold cross-validation scheme. The experimental results show that, for all data sets (complete, reduced, and with no missing values), multi-classifier fusion achieves better accuracy than the single classifiers.
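A hedged sketch of the general idea (not the paper's code; the dataset and classifier choices are assumptions): fuse several classifiers by majority voting and score everything with 10-fold cross-validation in scikit-learn:

```python
# Illustrative multi-classifier fusion with majority voting and 10-fold CV;
# the dataset and the classifiers are assumptions, not the paper's setup.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

singles = [("lr", LogisticRegression(max_iter=5000)),
           ("nb", GaussianNB()),
           ("dt", DecisionTreeClassifier(random_state=0))]
fusion = VotingClassifier(estimators=singles, voting="hard")

for name, clf in singles + [("fusion", fusion)]:
    acc = cross_val_score(clf, X, y, cv=10).mean()  # 10-fold accuracy
    print(f"{name}: {acc:.3f}")
```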
ENHANCED BREAST CANCER RECOGNITION BASED ON ROTATION FOREST FEATURE SELECTIO... (cscpconf)
Optimization problems are predominantly solved using computational intelligence. One issue that can be addressed in this context is the evaluation of attribute subset selection. This paper presents a computational intelligence technique for solving this optimization problem using a proposed model called Modified Genetic Search Algorithms (MGSA), which avoids poor local search spaces using merit and scaled fitness variables, detecting and deleting bad candidate chromosomes and thereby reducing the number of individual chromosomes in the search space and the iterations in subsequent generations. The paper also aims to show that Rotation Forest ensembles are useful for feature selection. The base classifier is multinomial logistic regression integrated with Haar wavelets as the projection filter, and the rank of each feature is reproduced with 10-fold cross-validation. The paper discusses the main findings and concludes with promising results for the proposed model. It explores the combination of MGSA for optimization with Naive Bayes classification. The result obtained using the proposed MGSA model is validated mathematically using Principal Component Analysis. The goal is to improve the accuracy and quality of breast cancer diagnosis with robust machine learning algorithms. Compared to other works in the literature survey, the experimental results achieved in this paper are better, with statistical inference.
A Novel Method for Prevention of Bandwidth Distributed Denial of Service Attacks (IJERD Editor)
Distributed Denial of Service (DDoS) attacks have become a massive threat to the Internet, whose traditional architecture is vulnerable to them. An attacker first acquires an army of zombies, which is then instructed when to start an attack and whom to attack. In this paper, the different techniques used to perform DDoS attacks, the tools used to perform them, and countermeasures for detecting attackers and eliminating Bandwidth Distributed Denial of Service (B-DDoS) attacks are reviewed. DDoS attacks are carried out using various flooding techniques.
The main purpose of this paper is to design an architecture that can reduce bandwidth DDoS attacks and keep the victim site or server available for normal users by eliminating the zombie machines. Our primary focus is to discuss how normal machines turn into zombies (bots), how an attack is initiated, the DDoS attack procedure, and how an organization can save its server from becoming a DDoS victim. To demonstrate this, we implemented a simulated environment with Cisco switches, routers, a firewall, several virtual machines, and some attack tools to display a real DDoS attack. Using time scheduling, resource limiting, system logs, access control lists, and a Modular Policy Framework, we stopped the attack and identified the attacker (bot) machines.
Hearing loss is one of the most common human impairments. It is estimated that by the year 2015 more than 700 million people will suffer mild deafness. Most can be helped by hearing aid devices, depending on the severity of their hearing loss. This paper describes the implementation and characterization details of a dual-channel transmitter front end (TFE) for digital hearing aid (DHA) applications that uses novel microelectromechanical-systems (MEMS) audio transducers and ultra-low-power, power-scalable analog-to-digital converters (ADCs), enabling a very low form factor, energy-efficient implementation for next-generation DHAs. The contribution of the design is the implementation of the dual-channel MEMS microphones and the power-scalable ADC system.
Influence of tensile behaviour of slab on the structural Behaviour of shear c... (IJERD Editor)
A composite beam is composed of a steel beam and a slab connected by means of shear connectors, such as studs installed on the top flange of the steel beam, to form a structure that behaves monolithically. This study analyzes the effects of the tensile behavior of the slab on the structural behavior of the shear connection, such as slip stiffness and maximum shear force, in composite beams subjected to hogging moment. The results show that shear studs located in the crack-concentration zones caused by large hogging moments sustain significantly smaller shear force and slip stiffness than those in other zones. Moreover, the reduction of slip stiffness in the shear connection also appears to be closely related to the change in the tensile strain of the rebar as the load increases. Further experimental and analytical studies shall be conducted considering variables such as the reinforcement ratio and the arrangement of shear connectors to achieve an efficient design of the shear connection in composite beams subjected to hogging moment.
Gold prospecting using Remote Sensing ‘A case study of Sudan’ (IJERD Editor)
Gold has been extracted from northeast Africa for more than 5000 years, and this may be the first place where the metal was extracted. The Arabian-Nubian Shield (ANS) is an exposure of Precambrian crystalline rocks on the flanks of the Red Sea; the crystalline rocks are mostly Neoproterozoic in age. The ANS spans the nations of Israel, Jordan, Egypt, Saudi Arabia, Sudan, Eritrea, Ethiopia, Yemen, and Somalia. It consists of juvenile continental crust that formed between 900 and 550 Ma, when intra-oceanic arcs welded together along ophiolite-decorated arc sutures. Primary Au mineralization probably developed in association with the growth of the intra-oceanic arcs and the evolution of back arcs. Multiple episodes of deformation have obscured the primary metallogenic setting, but at least some of the deposits preserve evidence that they originated as sea-floor massive sulphide deposits.
The Red Sea Hills region is a vast span of rugged, harsh, and inhospitable terrain with an inimical moon-like landscape; nevertheless, since ancient times it has been famed as an abode of gold and was a major source of wealth for the Pharaohs of ancient Egypt. The Pharaohs' old workings have been periodically rediscovered through time. Recent endeavours by the Geological Research Authority of Sudan (GRAS) led to the discovery of a score of occurrences of gold and massive sulphide mineralization. In the 1990s, GRAS, in cooperation with BRGM, utilized Landsat TM satellite data and the spectral ratio technique to map possible mineralized zones in the Red Sea Hills of Sudan. The outcome of the study mapped a gossan-type gold mineralization. The band ratio technique was applied to the Arbaat area and a signature of an alteration zone was detected; such alteration zones are commonly associated with mineralization. A field check confirmed the existence of a stockwork of gold-bearing quartz in the alteration zone. Another type of gold mineralization discovered using remote sensing is the gold associated with metachert in the Atmur Desert.
Reducing Corrosion Rate by Welding Design (IJERD Editor)
The paper addresses the importance of welding design in preventing corrosion in steel. Welding is used to join pipes, profiles on bridges, spindles, and many more parts of engineering construction. Problems associated with welding are common issues in these fields, especially corrosion. Corrosion can be reduced by many methods, among them painting, controlling humidity, and good welding design. The research finds that reducing residual stress in the weld reduces the corrosion rate.
Preheating at 500°C and 600°C gives a better condition for reducing the corrosion rate than preheating at 400°C. For all welding groove types, material preheated at 500°C and 600°C loses 0.5%-0.69% after a 14-day corrosion test, while material preheated at 400°C loses 0.57%-0.76%.
The welding groove also influences the corrosion rate. X and V type welding grooves give a better condition for reducing the corrosion rate than 1/2V and 1/2X grooves. After the 14-day corrosion test, the samples with the X groove type lose 0.5%-0.57%, the samples with the V groove lose 0.51%-0.59%, and the samples with the 1/2V and 1/2X grooves lose 0.58%-0.71%.
Router 1X3 – RTL Design and Verification (IJERD Editor)
Routing is the process of moving a packet of data from source to destination, enabling messages to pass from one computer to another and eventually reach the target machine. A router is a networking device that forwards data packets between computer networks. It is connected to two or more data lines from different networks (as opposed to a network switch, which connects data lines from one single network). This paper mainly emphasizes the study of the router device and its top-level architecture, and how the various sub-modules of the router, i.e. the register, FIFO, FSM, and synchronizer, are synthesized, simulated, and finally connected to the top module.
Active Power Exchange in Distributed Power-Flow Controller (DPFC) At Third Ha... (IJERD Editor)
This paper presents a component within the flexible AC-transmission system (FACTS) family called the distributed power-flow controller (DPFC). The DPFC is derived from the unified power-flow controller (UPFC), with the common dc link eliminated. The DPFC has the same control capabilities as the UPFC, comprising the adjustment of the line impedance, the transmission angle, and the bus voltage. The active power exchange between the shunt and series converters, which goes through the common dc link in the UPFC, now goes through the transmission lines at the third-harmonic frequency. The DPFC uses multiple small-size single-phase converters, which reduces the cost of equipment, requires no voltage isolation between phases, and increases redundancy and thereby reliability. The principle and analysis of the DPFC are presented in this paper, and the corresponding simulation results, carried out on a scaled prototype, are also shown.
Mitigation of Voltage Sag/Swell with Fuzzy Control Reduced Rating DVR (IJERD Editor)
Power quality has become an increasingly pivotal issue from the industrial electricity consumer's point of view in recent times. Modern industries employ sensitive power electronic equipment, control devices, and non-linear loads as part of automated processes to increase energy efficiency and productivity. Voltage disturbances are the most common power quality problem, because of the large number of sophisticated and sensitive electronic devices now used in industrial systems. This paper discusses the design and simulation of a dynamic voltage restorer (DVR) for improving power quality and reducing the harmonic distortion of sensitive loads. Power quality problems occur as non-standard voltage, current, and frequency; voltage sag, swell, flicker, and harmonics are some of the power system problems affecting sensitive loads. The compensation capability of a DVR depends primarily on the maximum voltage injection ability and the amount of stored energy available within the restorer. The device is connected in series with the distribution feeder at medium voltage. A fuzzy logic controller produces the gate pulses for the DVR control circuit, and the circuit is simulated using the MATLAB/SIMULINK software.
Study on the Fused Deposition Modelling In Additive Manufacturing (IJERD Editor)
The additive manufacturing process, also popularly known as 3-D printing, is a process in which a product is created in a succession of layers. It is based on a novel materials-incremental manufacturing philosophy. Unlike conventional manufacturing processes, where material is removed from a given workpiece to derive the final shape of a product, 3-D printing builds the product from scratch, obviating the need to cut away material and preventing wastage of raw materials. Commonly used raw materials for the process are ABS plastic, PLA, and nylon; recently the use of gold, bronze, and wood has also been implemented. The process imposes essentially no complexity constraint, in that objects of any shape and size can be manufactured.
Spyware triggering system by particular string value (IJERD Editor)
This computer program can be used for good or bad purposes, in hacking or in general use. It can be seen as the next step beyond hacking techniques such as keyloggers and spyware. Once a user or hacker stores a particular string as input, the software continually compares the user's typing activity with the stored string and, on a match, launches the spyware program.
A Blind Steganalysis on JPEG Gray Level Image Based on Statistical Features a... (IJERD Editor)
This paper presents a blind steganalysis technique to effectively attack JPEG steganographic schemes, i.e. Jsteg, F5, Outguess, and DWT-based schemes. The proposed method exploits the correlations between block-DCT coefficients from intra-block and inter-block relations, and the statistical moments of characteristic functions of the test image are selected as features. The features are extracted from the BDCT JPEG 2-array. A Support Vector Machine with cross-validation is implemented for the classification. The proposed scheme gives improved outcomes in attacking these schemes.
Secure Image Transmission for Cloud Storage System Using Hybrid Scheme (IJERD Editor)
Data in the cloud is transferred between servers and users. The privacy of that data is very important, as it contains personal information; if the data is hacked, it can be used to defame a person. Delays also occur during data transmission, e.g. in mobile communication, where bandwidth is low. Hence, compression algorithms are proposed for fast and efficient transmission, encryption is used for security, and blurring provides an additional layer of security. These algorithms are hybridized to obtain robust, efficient security and transmission over the cloud storage system.
Application of Buckley-Leverett Equation in Modeling the Radius of Invasion i... (IJERD Editor)
A thorough review of the existing literature indicates that the Buckley-Leverett equation analyzes waterflood practices directly, without any adjustment for real reservoir scenarios, which introduces quite a number of errors into the analyses. Also, for most waterflood scenarios, a radial investigation is more appropriate than a simplified linear system. This study investigates the adaptation of the Buckley-Leverett equation to estimate the radius of invasion of the displacing fluid during waterflooding. The model is also adapted for a microbial flood, and a comparative analysis is conducted for both waterflooding and microbial flooding. The analysis not only succeeds in determining the radial distance of the leading edge of water during the flooding process, but also gives a clearer understanding of the applicability of microbes to enhancing oil production through in-situ production of bio-products such as biosurfactants, biogenic gases, and bio-acids.
Gesture Gaming on the World Wide Web Using an Ordinary Web Camera (IJERD Editor)
Gesture gaming is a method by which users with a laptop/PC/Xbox play games using natural or bodily gestures. This paper presents a way of playing free flash games on the internet using an ordinary webcam with the help of open-source technologies. Emphasis in human activity recognition is placed on pose estimation and the consistency of the player's pose, which are estimated with an ordinary web camera at resolutions from VGA to 20 megapixels. Our work involved showing the user a 10-second documentary on how to play a particular game using gestures and on the various kinds of gestures that can be performed in front of the system. The initial RGB values for the gesture component are obtained by instructing the user to place the component in a red box for about 10 seconds after the short documentary finishes. The system then opens the concerned game on popular flash game sites such as Miniclip, Games Arcade, or GameStop, loads the game by clicking at various places, and brings it to the state where the user only needs to perform gestures to start playing. At any point the user can call off the game by hitting the Esc key, and the program will release all of the controls and return to the desktop. It was noted that the results obtained using an ordinary webcam matched those of the Kinect, and users could relive the gaming experience of free flash games on the net. Effective in-game advertising could therefore also be achieved, resulting in disruptive growth for advertising firms.
Hardware Analysis of Resonant Frequency Converter Using Isolated Circuits And... (IJERD Editor)
The LLC resonant frequency converter is basically a combination of series and parallel resonant circuits. The LCC resonant converter has the disadvantage that, although it has two resonant frequencies, the lower resonant frequency lies in the ZCS region [5], so for this application we cannot design the converter to work at that resonant frequency. The LLC resonant converter has existed for a very long time, but because its characteristics were unknown it was used as a series resonant converter with a basically passive (resistive) load. Here, it is designed to operate at a switching frequency higher than the resonant frequency of the series resonant tank of Lr and Cr, where the converter acts very similarly to a series resonant converter. The benefit of the LLC resonant converter is a narrow switching frequency range at light load [6]. The control circuit plays a very important role: the 555 timer used here provides a perfect square wave, since the control circuit introduces no slew rate, which makes the square wave strong and impenetrable. The dead-band circuit provides an exclusive dead band of microseconds so as to avoid the simultaneous firing of the two pairs of IGBTs when one pair switches off and the other switches on for the slightest period of time. An isolator circuit is associated with every circuit used, because it acts as a driver and as isolation, and each IGBT is provided with its own transformer supply [3]. The IGBTs are fired with the appropriate signals from the previous boards, and finally a high-frequency rectifier circuit with a filtering capacitor is used to obtain an exact dc waveform. The basic goal of this analysis is to observe the waveforms and characteristics of converters with differently positioned passive elements in the form of tank circuits.
Simulated Analysis of Resonant Frequency Converter Using Different Tank Circu... (IJERD Editor)
The LLC resonant frequency converter is basically a combination of series and parallel resonant circuits. The LCC resonant converter has the disadvantage that, although it has two resonant frequencies, the lower resonant frequency lies in the ZCS region [5], so for this application we cannot design the converter to work at that resonant frequency. The LLC resonant converter has existed for a very long time, but because its characteristics were unknown it was used as a series resonant converter with a basically passive (resistive) load. Here, it is designed to operate at a switching frequency higher than the resonant frequency of the series resonant tank of Lr and Cr, where the converter acts very similarly to a series resonant converter. The benefit of the LLC resonant converter is a narrow switching frequency range at light load [6]. The control circuit plays a very important role: the 555 timer used here provides a perfect square wave, since the control circuit introduces no slew rate, which makes the square wave strong and impenetrable. The dead-band circuit provides an exclusive dead band of microseconds so as to avoid the simultaneous firing of the two pairs of IGBTs when one pair switches off and the other switches on for the slightest period of time. An isolator circuit is associated with every circuit used, because it acts as a driver and as isolation, and each IGBT is provided with its own transformer supply [3]. The IGBTs are fired with the appropriate signals from the previous boards, and finally a high-frequency rectifier circuit with a filtering capacitor is used to obtain an exact dc waveform. The basic goal of this analysis is to observe the waveforms and characteristics of converters with differently positioned passive elements in the form of tank circuits. The supporting simulation is done with the PSIM 6.0 software tool.
An amateur radio operator, also known as a HAM, communicates with other HAMs through radio waves. Wireless communication in which the Moon is used as a natural satellite is called Moon-bounce or EME (Earth-Moon-Earth). Long-distance communication (DXing) using Very High Frequency (VHF) amateur HAM radio used to be difficult, but even with a modest setup comprising a good transceiver, a power amplifier, and a high-gain antenna with high directivity, VHF DXing is possible. Generally a 2X11 YAGI antenna is used, along with a rotor to set the horizontal and vertical angles. Moon-tracking software gives the exact location and visibility of the Moon at both stations, and other vital data needed to acquire the real-time position of the Moon.
MS-Extractor: An Innovative Approach to Extract Microsatellites on 'Y' Chrom... (IJERD Editor)
Simple Sequence Repeats (SSRs), also known as microsatellites, have been extensively used as molecular markers due to their abundance and high degree of polymorphism. The nucleotide sequences of polymorphic forms of the same gene should be 99.9% identical, so extracting microsatellites from the gene is crucial. Microsatellite repeat counts are compared; if they differ greatly, the subject may have a disorder. The Y chromosome likely contains 50 to 60 genes that provide instructions for making proteins. Because only males have the Y chromosome, the genes on this chromosome tend to be involved in male sex determination and development. Several microsatellite extractors exist, but they fail to extract microsatellites from large data sets of gigabytes or terabytes in size. The proposed tool, 'MS-Extractor: An Innovative Approach to Extract Microsatellites on the Y Chromosome', can extract both perfect and imperfect microsatellites from large data sets of the human 'Y' genome. The proposed system uses string matching with a sliding-window approach to locate microsatellites and extract them.
Importance of Measurements in Smart Grid (IJERD Editor)
The need for a reliable supply, independence from fossil fuels, and the capability to provide clean energy at a fixed and lower cost is transforming the existing power grid into the Smart Grid, and the development of a smart energy distribution grid is a current goal of many nations. A Smart Grid should have new capabilities such as self-healing, high reliability, energy management, and real-time pricing. This new era of the smart future grid will lead to major changes in existing technologies at the generation, transmission, and distribution levels. The incorporation of renewable energy resources and distributed generators into the existing grid will increase the complexity, optimization problems, and instability of the system. This will lead to a paradigm shift in the instrumentation and control requirements of Smart Grids for a high-quality, stable, and reliable supply of electric power. Monitoring the state and stability of the grid system relies on the availability of reliable measurement data. This paper discusses the measurement areas that pose new measurement challenges, the development of smart meters, and the critical parameters of electric energy to be monitored for improving the reliability of power systems.
Study of Macro level Properties of SCC using GGBS and Lime stone powder (IJERD Editor)
One of the major environmental concerns is the disposal of waste materials and the utilization of industrial by-products. Limestone quarries produce millions of tons of waste dust powder every year. Having a considerably high degree of fineness in comparison to cement, this material may be utilized as a partial replacement for cement. For this purpose, an experiment was conducted to investigate the possibility of using limestone powder, in combination with GGBS, in the production of SCC, and how it affects the fresh and mechanical properties of SCC. First, SCC was made by replacing cement with GGBS at percentages of 10, 20, 30, 40, and 50; then, taking the optimum GGBS mix, limestone powder was blended into the mix at percentages of 5, 10, 15, and 20 as a partial replacement for cement. The test results show that the SCC mix with a combination of 30% GGBS and 15% limestone powder gives the maximum compressive strength, with fresh properties also within the limits prescribed by EFNARC.
Welcome to International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development
e-ISSN: 2278-067X, p-ISSN: 2278-800X, www.ijerd.com
Volume 5, Issue 1 (November 2012), PP.05-07
K-Nearest Neighbor in Missing Data Imputation
Ms. R. Malarvizhi¹, Dr. Antony Selvadoss Thanamani²
¹Research Scholar, Research Department of Computer Science, NGM College, 90 Palghat Road, Pollachi, Bharathiyar University, Coimbatore.
²Professor and HOD, Research Department of Computer Science, NGM College, 90 Palghat Road, Pollachi, Bharathiyar University, Coimbatore.
Abstract:- We propose a comparative study of single imputation techniques such as mean, median, and standard deviation substitution combined with the k-NN algorithm. A training set with corresponding class labels groups the data into groups of different sizes. The techniques are applied within each group and the results are compared; median and standard deviation substitution show better results than mean substitution.
Keywords:- Mean Substitution, Median Substitution, Standard Deviation, k-NN Algorithm, Training Set
I. INTRODUCTION
Missing data is one of the problems that must be solved for real-world applications, and both traditional and modern methods exist for solving it. Values may be missing completely at random, missing at random, or missing not at random, and each case should be treated separately. The k-NN algorithm is used to group the data set into different groups. The training examples are vectors in a multidimensional feature space, each with a class label. The training phase of the algorithm consists only of storing the feature vectors and class labels of the training samples. In the classification phase, k is a user-defined constant, and an unlabeled vector is classified by assigning the label that is most frequent among the k training samples nearest to that query point; usually the Euclidean distance is used as the distance metric. After the data are grouped, the missing data in each group are imputed by the mean, median, or standard deviation, and the resulting accuracy percentages are compared.
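To make the pipeline concrete, here is a minimal sketch on invented data (the paper itself gives no code): records are assigned to groups with a k-NN classifier trained on complete attributes, and missing values are then filled with a per-group statistic:

```python
# Sketch of the paper's pipeline on invented data: assign each record to a
# group with k-NN on its complete attributes, then fill the record's missing
# attribute with that group's statistic.
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

# Training set: complete attribute vectors with known group labels.
X_train = [[1.0, 2.0], [1.2, 1.9], [8.0, 9.0], [7.8, 9.1]]
y_train = [0, 0, 1, 1]
knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)

# Test records: columns a, b are complete; column c has missing values.
df = pd.DataFrame({"a": [1.1, 7.9, 1.0, 8.1],
                   "b": [2.1, 9.0, 2.2, 8.8],
                   "c": [5.0, np.nan, np.nan, 12.0]})
df["group"] = knn.predict(df[["a", "b"]].to_numpy())

# Impute c within each k-NN group; swapping .median() for .mean() or .std()
# gives the paper's three variants.
df["c"] = df.groupby("group")["c"].transform(lambda s: s.fillna(s.median()))
print(df)
```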
II. LITERATURE SURVEY
Rubin and Little have defined three classes of missing data: missing completely at random (MCAR), missing at random (MAR), and not missing at random (NMAR). In the case of statistical modelling, when external knowledge is available about the dependencies between the missing values, as well as about the dependencies between missing and observed data, one might use Markov or Gibbs processes for imputation. Likelihood-based tests have been proposed by Fuchs (1982) for contingency tables and by Little (1988) for multivariate normal data. A nonparametric test has been proposed by Diggle (1989) for preliminary screening. Rideout and Diggle (1991) have proposed a parametric test that requires modelling of the missing-data mechanism. Chen and Little (1999) have generalized Little's (1988) basic idea of constructing test statistics; they avoid distributional assumptions, whereas Little (1988) assumed normal data. Some specific tests for linear regression models have been proposed too. Simon and Simonoff (1986) have written an article describing tools for MAR diagnostics and other purposes; they make no assumptions about the nature of the missing-value process. Simonoff (1998) has introduced a test to detect non-MCAR mechanisms, with diagnostics based on standard outlier and leverage-point regression diagnostics. Recently, Toutenburg and Fieger (2001) introduced methods to analyse and detect non-MCAR processes for missing covariates, using outlier detection to identify non-MCAR cases.
III. SINGLE IMPUTATION TECHNIQUES
A. Mean Substitution
The most commonly practiced approach is mean substitution, a single imputation technique. Mean substitution replaces missing values of a variable with the mean of the observed values. Each imputed value is therefore contingent upon one and only one quantity: the between-subjects mean of that variable based on the available data. Mean substitution preserves the mean of a variable's distribution; however, it typically distorts other characteristics of the distribution.
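A minimal sketch with made-up values shows both properties at once: the mean is preserved, and the variance shrinks:

```python
# Mean substitution sketch (made-up values): the mean is preserved,
# but the variance is biased downward, as the text warns.
import numpy as np
import pandas as pd

s = pd.Series([4.0, 6.0, np.nan, 10.0, np.nan, 8.0])
imputed = s.fillna(s.mean())   # observed mean = 7.0

print(imputed.mean())          # still 7.0
print(s.var(), imputed.var())  # variance shrinks after imputation
```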
B. Median Substitution
Mean or median substitution of covariates and outcome variables is still frequently used. The method is slightly improved by first stratifying the data into subgroups and using the subgroup average. Median imputation leaves the median of the entire data set the same as it would be with case deletion, but the variability between individuals' responses is decreased, biasing variances and covariances toward zero.
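A quick sketch of the stratified variant described above (subgroups and values invented): each missing value takes the median of its own subgroup rather than the overall median:

```python
# Stratified median substitution sketch (invented subgroups and values):
# each missing value takes the median of its own subgroup.
import numpy as np
import pandas as pd

df = pd.DataFrame({"subgroup": ["a", "a", "a", "b", "b", "b"],
                   "value":    [1.0, np.nan, 3.0, 10.0, 12.0, np.nan]})

df["value"] = (df.groupby("subgroup")["value"]
                 .transform(lambda s: s.fillna(s.median())))
print(df)  # subgroup a gets 2.0, subgroup b gets 11.0
```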
C. Standard Deviation
The standard deviation measures the spread of the data about the mean value. It is useful in comparing sets of data
which may have the same mean but a different range. The standard deviation is given by the formula

$\sigma = \sqrt{\dfrac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2}$

where $\bar{x}$ is the mean of the $N$ observed values.
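A small numeric check of the formula (values invented); NumPy's ddof argument selects the population form used above (ddof=0) or the sample form that divides by N - 1:

```python
# Checking the standard deviation formula above on invented values.
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(x.mean())       # 5.0
print(x.std(ddof=0))  # 2.0, population form matching the formula above
print(x.std(ddof=1))  # sample form, divides by N - 1 instead of N
```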
D. k-Nearest Neighbor Algorithm for Classification
Each sample in our data set has n attributes, which we combine to form an n-dimensional vector x = (x1, x2, ..., xn). These n attributes are considered the independent variables. Each sample also has another attribute, denoted by y (the dependent variable), whose value depends on the other n attributes x. We assume that y is a categorical variable, and that there is a scalar function f which assigns a class y = f(x) to every such vector. We suppose that a set of T such vectors is given together with their corresponding classes: x(i), y(i) for i = 1, 2, ..., T. This set is referred to as the training set.
The idea in k-nearest neighbor methods is to identify the k samples in the training set whose independent variables x are most similar to a new sample u, and to use these k samples to classify the new sample into a class v. If f is a smooth function, a reasonable idea is to look for samples in our training data that are near u (in terms of the independent variables) and then compute v from the values of y for these samples. The distance, or dissimilarity, between samples is measured using the Euclidean distance.
The Euclidean distance between the points x and u is

$d(x, u) = \sqrt{\sum_{i=1}^{n}(x_i - u_i)^2}$
The simplest case is k = 1, where we find the sample in the training set closest (the nearest neighbor) to u and set v = y, where y is the class of that nearest neighboring sample. For general k-NN, we find the k nearest neighbors of u and then use a majority decision rule to classify the new sample. The advantage is that higher values of k provide smoothing that reduces the risk of over-fitting due to noise in the training data. In typical applications k is in units or tens rather than hundreds or thousands. Notice that if k = T, the number of samples in the training data set, we are merely predicting the class that has the majority in the training data for all samples, irrespective of u.
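The classification rule just described can be written from scratch in a few lines (a sketch; array shapes and tie-breaking behaviour are assumptions):

```python
# From-scratch k-NN majority-vote classifier matching the description above
# (Euclidean distance; ties go to the most common label found first).
from collections import Counter

import numpy as np

def knn_classify(X_train, y_train, u, k=3):
    """Classify a new sample u from its k nearest training samples."""
    dists = np.sqrt(((X_train - u) ** 2).sum(axis=1))  # Euclidean distances
    nearest = np.argsort(dists)[:k]                    # indices of k nearest
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]                  # majority decision rule

X = np.array([[1.0, 2.0], [1.5, 1.8], [8.0, 8.0], [8.2, 7.9]])
y = np.array(["low", "low", "high", "high"])
print(knn_classify(X, y, np.array([1.2, 2.1]), k=3))   # -> "low"
```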
IV. EXPERIMENTAL ANALYSIS AND RESULT
A dataset consisting of 5000 records with 5 variables was taken for the analysis. The test dataset was prepared with some values missing, with the missing percentage varying over 2, 5, 10, 15, and 20. The k-NN algorithm was run with different training data and the corresponding classes, and the group size was varied over 3, 6, and 9. Within each group the missing values were separately imputed with the mean, median, and standard deviation. The results are shown in the table below.
Table 1: Classification of Data into Various Groups
Percentage      3 Groups            6 Groups            9 Groups
of Missing   Mean  Med  Std      Mean  Med  Std      Mean  Med  Std
             Sub   Sub  Dev      Sub   Sub  Dev      Sub   Sub  Dev
    2         67    70   70       69    71   71       71    73   72
    5         69    72   72       75    76   76       74    77   77
   10         66    71   71       72    78   78       71    80   80
   15         64    72   72       66    77   77       66    79   79
   20         60    72   72       55    78   77       50    80   80
The table below shows the averages of the results above. They show that median and standard deviation substitution improve somewhat over mean substitution, and that accuracy also gradually improves with larger group sizes.
Table 2: Average of the above Method
Percentage      Mean           Median         Standard
of Missing   Substitution   Substitution     Deviation
    2             69             71              71
    5             73             75              75
   10             70             76              76
   15             65             76              76
   20             55             77              76
V. CONCLUSION AND FUTURE WORK
The k-NN algorithm is one of the well-known classifiers for grouping data. Traditional methods such as mean, median, and standard deviation substitution are used to improve the accuracy of missing data imputation. This work can be further enhanced by comparison with other machine learning techniques such as SOM and MLP.