IJRIME Volume 1, Issue 5 ISSN 2249-1619
Sr. No.  TITLE & NAME OF THE AUTHOR(S)
1   REAL TIME NETWORK MONITORING SYSTEM IN LAN ENVIRONMENT
    M. Shoaib Yousaf, Ahmed Mattin, Ahsan Raza Sattar
2   QUALITY OF WORKING LIFE IN INSURANCE SECTOR
    Rita Goyal
3   REFACTORABILITY ANALYSIS USING LINEAR REGRESSION
    Gauri Khurana, Sonika Jindal
4   OPTIMIZING FILTERING PHASE FOR NEAR-DUPLICATE DETECTION OF WEB PAGES USING TDW-MATRIX
    Tanvi Gupta
5   STUDY AND DESIGN OF BUILDING INTEGRATED PHOTO VOLTAIC SYSTEM AT HCTM CAMPUS KAITHAL, HARYANA
    Rajeev Kumar, Gagan Deep Singh
6   FUNDS MANAGEMENT OF ICICI BANK
    Manju Sharma
7   EMERGING TRENDS IN HUMAN RESOURCE MANAGEMENT: A CHALLENGE TO THE ITES
    Raunak Narayan
8   FUNDAMENTAL CHALLENGES IN EMERGENT FIELD OF SENSOR NETWORK SECURITY AND INITIAL APPROACHES TO SOLVE THEM
    D. P. Mishra, M. K. Kowar
9   THE ECONOMICS & BUSINESS OF EUROPEAN LEAGUE FOOTBALL
    Rosy Kalra
10  AN ALGORITHM FOR SOLVING A CAPACITATED FIXED CHARGE BI-CRITERION INDEFINITE QUADRATIC TRANSPORTATION PROBLEM WITH RESTRICTED FLOW
    S. R. Arora, Kavita Gupta
11  IMPACTS OF USE OF RFBIDW ON TAXATION
    Sulatan Singh, Surendra Kundu, Madhu Arora
12  EVALUATION OF KNOWLEDGE MANAGEMENT LEVEL OF EDUCATIONAL INSTITUTIONS USING FCE AND AHP
    Mohit Maheshwarkar, N. Sohani, Pallavi Maheshwarkar
13  EVALUATION OF KNOWLEDGE MANAGEMENT LEVEL OF EDUCATIONAL INSTITUTIONS USING ANALYTICAL HIERARCHY PROCESS: A CASE STUDY IN INDIA
    Mohit Maheshwarkar, N. Sohani, Pallavi Maheshwarkar
14  PROBE FEED RECTANGULAR PATCH MICROSTRIP ANTENNA: CAD METHODOLOGY
    R. D. Kanphade, D. G. Wakade, N. T. Markad
15  DETERMINANTS OF GROWTH OF TOURISM INDUSTRY IN GOA: A STUDY
    Dr. Achut Pednekar
International Journal of Research in IT, Management and Engineering
www.gjmr.org
REAL TIME NETWORK MONITORING SYSTEM IN LAN ENVIRONMENT
M. Shoaib Yousaf*
Ahmed Mattin*
Ahsan Raza Sattar*
ABSTRACT
In this research, we compare different NMS tools and their features. We also analyze the three available SNMP versions and compare them with respect to security to determine which is best to use. SNMPv1 and SNMPv2 share most features, but SNMPv2 introduced modifications to overcome deficiencies in version 1. SNMPv3 then added security and remote configuration to the earlier versions and is the most up-to-date version available today. We examine two methods of securing network traffic: SNMPv3, the latest version, and the combination of a non-secure SNMP version with Internet Protocol Security, i.e. SNMP over IPSec. Both techniques provide authentication, integrity, and confidentiality for network traffic carried over SNMP.
Keywords: NMS, LAN, SNMP, TCP/IP, IPSec.
*Computer Science Department, University of Agriculture, Faisalabad, Pakistan
INTRODUCTION
Network management systems (NMSs) are used to ensure the accessibility and overall health of computers and network devices installed in a LAN. An NMS can efficiently detect and report failures of devices configured in the network to the administrator. The NMS continuously sends messages across the network to all other hosts to confirm their status. When device failures or slow responses are detected, the system sends additional messages, called alerts, to inform system administrators about the problems.
To have control of the overall network, the administrator needs to know the condition of all devices configured on the network, e.g. the data flowing in and out of each host. A protocol is available within the TCP/IP suite, the Simple Network Management Protocol (SNMP), to meet this purpose (Amir and McCanne, 2003).
Administrators use multiple tools for monitoring the network, as there is no restriction on which monitoring tool to select. For example, to have a complete view of network devices on the internet, shared intranets, mail servers, database servers, etc., administrators use IP monitoring software and are updated via alarms, messages, or e-mail alerts in case a connection fails (Bradley, 2002).
The basic idea of this work is to compare different NMS tools and their features. In this paper we discuss the three available SNMP versions. SNMPv1 and SNMPv2 share most features, but SNMPv2 introduced modifications to overcome deficiencies in version 1. SNMPv3 then added security and remote configuration to the earlier versions and is the most up-to-date version available today. Our main target is to examine two methods of securing network traffic: (i) SNMPv3, the latest version, and (ii) the combination of a non-secure SNMP version with Internet Protocol Security, i.e. SNMP over IPSec. Both techniques provide authentication, integrity, and confidentiality for network traffic carried over SNMP.
MATERIALS & METHODS
This section focuses on the design of the network management system. The major parts of the system, and how these parts correlate with each other to make the network management system work, are discussed here. We compare the available SNMP versions to find the better one to use with a network management system. The administrator should also be well aware of security-related requirements such as the ability to restore, the ability to add and delete users, and the ability to monitor network accessibility, traffic volume, rerouting, user authentication, and fault response time.
WORKING MECHANISM
SNMP was proposed as a protocol for managing network nodes such as important servers, workstations, routers, and switches. The SNMP protocol runs over UDP, a connectionless transport-layer protocol in the OSI model. SNMP is used to measure network performance, to locate hosts, to resolve network problems, and to update the network. An SNMP-managed network consists of three fundamental parts: managed devices, SNMP agents, and NMSs. An SNMP-managed device contains an SNMP agent, which sits inside the network and watches all network activity. The SNMP agent collects network information and stores it for use by the NMSs. All network devices, such as routers, servers, switches, and printers, are controlled by the NMSs. The agent placed inside each SNMP device regularly watches all network events; it is granted limited access to the collected data and converts this data into a form readable by SNMP.
Figure: How an NMS works (Swee, 2006).
Three versions of SNMP are most commonly used: SNMPv1, SNMPv2, and SNMPv3. Versions 1 and 2 are similar in function, except that in v2 security was enhanced to overcome known security issues. Keeping in view the importance of security, a new version, SNMPv3, was developed that covers the remaining security issues and provides more features, such as remote configuration.
SNMPv1 sits alongside the OSI layers and performs its functions independently, without disturbing them. SNMPv1 was the most commonly used protocol in the early days, before the invention of the next version. An NMS generates requests to devices, and the devices respond to these requests. Four operations are used in v1: Get, GetNext, Set, and Trap. The Get operation is used by the NMS to request objects and their values. The GetNext operation requests the next value in a table. The Set operation sets values inside the SNMP agent. The last operation, Trap, is used by an agent to report a change in the network to the NMS. The basic limitations of version 1 concern security, i.e. message authentication and protection from outside intruders. SNMPv2 was designed in 1993 to overcome these problems and was intended as an improvement on its predecessor.
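The four SNMPv1 operations can be illustrated with a small dict-backed sketch. This is a toy model of the semantics only, not the real wire protocol or any SNMP library (no BER encoding, no UDP transport); all names here are illustrative:

```python
# Toy model of the four SNMPv1 operations against a dict-backed MIB.
# Illustrates the semantics described above; not the real protocol.

class ToyAgent:
    def __init__(self, mib):
        # MIB as a mapping of OID string -> value, kept in sorted order
        self.mib = dict(sorted(mib.items()))

    def get(self, oid):
        """Get: return the value bound to an exact OID."""
        return self.mib[oid]

    def get_next(self, oid):
        """GetNext: return the first (oid, value) after `oid`,
        which is how an NMS walks a table."""
        for key in self.mib:
            if key > oid:
                return key, self.mib[key]
        raise KeyError("end of MIB view")

    def set(self, oid, value):
        """Set: write a value into the agent's MIB."""
        self.mib[oid] = value

    def trap(self, notify):
        """Trap: agent-initiated report of an event to the NMS."""
        notify("linkDown on interface 2")  # example event

agent = ToyAgent({
    "1.3.6.1.2.1.1.1.0": "Linux router",   # sysDescr
    "1.3.6.1.2.1.1.5.0": "gw1",            # sysName
})
assert agent.get("1.3.6.1.2.1.1.5.0") == "gw1"
next_oid, _ = agent.get_next("1.3.6.1.2.1.1.1.0")
assert next_oid == "1.3.6.1.2.1.1.5.0"
```

In a real deployment the NMS side would issue these operations over UDP port 161 (traps to port 162), but the request/response pattern is the same.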
SNMPv2 extended version 1 with the GetBulk and Inform operations. The GetBulk operation collects a large block of information at once and gives the NMS access to it. The Inform operation is used for communication between one NMS and another: it works like a trap but additionally receives a response from the other NMS. The major area to be enhanced in SNMPv2 was security, which motivated its development. SNMPv2 has different message formats; the difference between versions 1 and 2 lies purely in the field of security, while the message format carried over UDP is otherwise the same as in version 1. The newer version, SNMPv3, adds further security and remote configuration, protecting messages and providing an easy module for accessing them.
A new characteristic, not available in previous versions, is the view-based access control added in SNMPv3. This feature allows the managed elements to control access to important information. The SNMP engine includes a View-based Access Control Model (VACM) that supports several message formats with different security models. This improvement to NMS and SNMP is suitable for all types of hardware. In SNMPv3, security is divided into three levels: the upper level is authentication with privacy (authPriv), the middle level is authentication without privacy (authNoPriv), and the bottom level is no authentication and no privacy (noAuthNoPriv). SNMP also has the ability to reboot network devices thanks to its security features. Figure 3.4.3 (not reproduced here) shows the security subsystem of SNMPv3 (Swee, 2006).
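The three security levels correspond to the authentication and privacy bits of the SNMPv3 message flags field (msgFlags, RFC 3412). A minimal sketch of that mapping, shown for illustration:

```python
# SNMPv3 msgFlags bits (RFC 3412): bit 0 = authFlag, bit 1 = privFlag,
# bit 2 = reportableFlag. Privacy without authentication is not a
# legal combination, which is why there are exactly three levels.
AUTH_FLAG = 0x01
PRIV_FLAG = 0x02
REPORTABLE_FLAG = 0x04

def msg_flags(level, reportable=False):
    """Map a USM security-level name to its msgFlags byte."""
    bits = {
        "noAuthNoPriv": 0,
        "authNoPriv":   AUTH_FLAG,
        "authPriv":     AUTH_FLAG | PRIV_FLAG,
    }[level]
    if reportable:
        bits |= REPORTABLE_FLAG
    return bits

assert msg_flags("noAuthNoPriv") == 0x00
assert msg_flags("authNoPriv") == 0x01
assert msg_flags("authPriv") == 0x03
```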
COMPARATIVE STUDY OF EXISTING SYSTEMS
It is a basic requirement that the selected network have the capability to reduce traffic problems concerning delay, response time, and throughput. Several materials exist on the internet and in the market concerning these networks and their problems, and several procedures exist for each kind of network. This research, however, is concerned with the performance analysis of NMS protocols and with selecting the best protocol among them. Certain issues arise regarding traffic types, throughput, latency, and network availability. These issues are very common and challenging for administrators, especially in organizations whose WAN links contain routing devices. Such organizations can suffer from various kinds of traffic delays if the proper network is not selected after careful investigation.
RESULTS
We analyzed the security of SNMP in this research to conclude which version is best to use in a network. We examined two techniques for securing SNMP traffic: first, SNMPv3, the most up-to-date version of SNMP, and second, a non-secure version of SNMP combined with Internet Protocol Security (IPSec). The security used with SNMPv2 consumes less network capacity than SNMPv3 and also secures other IP applications, which is not possible with SNMPv3. It also reduces the load on administrators in configuring, managing, and maintaining monitoring systems, so that their attention can focus on higher-level policies and critical abnormal circumstances, as discussed above.
RESULT 1 FOR ONE VARIABLE
The network capacity used by SNMP is examined by running the SNMP agent with the help of an SNMP management function. IPSec used a tunnel-mode security mechanism to communicate between the gateways. The IP packets generated by SNMP operations running on the host machine, captured by Ethereal, are shown in Table 1 below.
SNMP Version / Security Scheme    Get    Response    Total
V2c                                78       102       180
V2c over IPSec                    137       153       288
V3 noAuthNoPriv                   141       165       306
V3 authNoPriv                     153       177       330
V3 authPriv                       168       192       358
V3 noAuthNoPriv over IPSec        191       217       408
V3 authNoPriv over IPSec          209       233       440
V3 authPriv over IPSec            223       249       472
Table 1 shows the SNMP-Get messages, SNMP-Response messages, and the total SNMP Get/Response sizes in bytes using different security schemes for one variable.
RESULT 2 FOR SEVEN VARIABLES
The second result is almost the same as the first, except that it is obtained using seven variables.
SNMP Version / Security Scheme    Get    Response    Total
V2c                               176       288       464
V2c over IPSec                    233       345       578
V3 noAuthNoPriv                   249       351       590
V3 authNoPriv                     251       363       614
V3 authPriv                       265       378       643
V3 noAuthNoPriv over IPSec        289       401       690
V3 authNoPriv over IPSec          305       417       722
V3 authPriv over IPSec            321       433       754
Table 2 shows the SNMP-Get messages, SNMP-Response messages, and the total SNMP Get/Response sizes in bytes using different security schemes for seven variables.
From the above results we can conclude that IPSec, using authentication and the triple-DES encryption scheme, consumes 57 bytes more than a normal IP packet carrying the same payload. SNMPv3 consumes 89 bytes more than a normal IP packet when using the HMAC-MD5-96 authentication and DES encryption schemes.
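The quoted overheads can be approximately re-derived from Table 1. A quick sketch (the byte counts are copied from the table; averaging over the Get and Response messages is our assumption):

```python
# Get/Response sizes in bytes for one variable, copied from Table 1.
table1 = {
    "V2c":            (78, 102),
    "V2c over IPSec": (137, 153),
    "V3 authPriv":    (168, 192),
}

def overhead(scheme, baseline="V2c"):
    """Average extra bytes per message of `scheme` over the baseline."""
    g0, r0 = table1[baseline]
    g1, r1 = table1[scheme]
    return ((g1 - g0) + (r1 - r0)) / 2

# IPSec tunnel overhead: ((137-78) + (153-102)) / 2 = 55 bytes per
# message, close to the ~57 bytes quoted above.
assert overhead("V2c over IPSec") == 55.0
# SNMPv3 authPriv overhead: ((168-78) + (192-102)) / 2 = 90 bytes,
# close to the ~89 bytes quoted.
assert overhead("V3 authPriv") == 90.0
```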
RESULT 3 PROCESSING TIME CONSUMED BY SNMP
The SNMP agent running on the gateways is used to measure the processing time consumed by a secure SNMP operation. Ethereal captured the IP packets generated by the SNMP-Get operation running on the observer host. We use a node-to-node tunnel-mode security connection to distinguish the source and destination of each packet. DES encryption is computationally intensive, and triple-DES adds roughly three times more processing than DES, but the experiment still allows us to draw insightful conclusions. The processing time interval is defined as the time from Ethereal capturing the SNMP Get message to the time of the corresponding SNMP Response message. Table 3 shows the average processing time interval and the standard deviation calculated for both approaches.
SNMP Version / Security Scheme    Mean Time    Standard Deviation
V2c                                   310.4            12.2
V3 noAuthNoPriv                       525.9             6.5
V3 authNoPriv                         591.7             6.1
V3 authPriv                           696.8            57.7
V2c over IPSec                        778.8            80.1
V3 noAuthNoPriv over IPSec           1057.0            19.4
V3 authNoPriv over IPSec             1160.0            21.2
V3 authPriv over IPSec               1457.7            79.5
Table 3 shows the average processing time interval and standard deviation for each scheme.
RESULT 4 CAPACITY CONSUMED BY SNMPV3 FOR THE DISCOVERY EXCHANGE
We calculated the capacity consumed by SNMPv3 during the discovery exchange: the SNMP-Get message, its corresponding SNMP-Report message, and the total bytes used in the exchange. From the results shown below we can see that an SNMPv3 discovery exchange is similar in size and function to a typical SNMP Get/Response exchange. A more sophisticated SNMP management suite remembers the most recent timeliness parameters received from each SNMPv3 engine it communicates with, thus reducing the need for discovery exchanges.
SNMP Version                   Request    Report    Total
SNMPv3 authPriv                    102       139      241
SNMPv3 authPriv over IPSec         159       193      352
Table 4 shows the capacity consumed by SNMPv3 for the discovery exchange.
RESULT 5 CAPACITY CONSUMED BY IPSEC
To measure the network capacity consumed by IPSec, the FreeS/WAN IPSec tool is configured to renegotiate security between the gateways every minute. Many of these updates are also captured by the Ethereal application running on the observer host. The table below shows the IP packet sizes (in bytes) for all nine packets captured while the initial tunnel-mode security association is established.
Packet #    Mode     Length
1           Main        204
2           Main        108
3           Main        208
4           Main        208
5           Main         96
6           Main         96
7           Quick       344
8           Quick       320
9           Quick        80
Table 5 shows the network capacity consumed by IPSec.
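Summing the packet lengths in Table 5, the initial security-association setup costs about 1.7 kB of network capacity; a quick check:

```python
# Packet lengths in bytes, copied from Table 5 (main-mode and
# quick-mode packets of the initial tunnel-mode SA establishment).
main  = [204, 108, 208, 208, 96, 96]
quick = [344, 320, 80]

total = sum(main) + sum(quick)
assert sum(main) == 920    # six main-mode packets
assert sum(quick) == 744   # three quick-mode packets
assert total == 1664       # bytes for the whole initial exchange
```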
SUMMARY / CONCLUSION
From the results obtained above, we conclude that SNMPv3 requires about 24% more network capacity than SNMPv2 with IPSec. Also, as the size of the application-layer payload changes, the output of SNMPv2 with IPSec changes significantly. Both techniques, SNMPv2 over IPSec and SNMPv3, load network devices roughly equally. The processing overhead of a device doubles when SNMPv2 is used with authentication and encryption schemes and IPSec is installed on that device. Better results can be obtained if security processing and SNMP processing run on separate devices. In the case of SNMPv2 over IPSec, the security gateway is separate from the network devices where the SNMP agent is implemented; in SNMPv3, however, both security processing and SNMP processing run on a single device, which creates problems for implementing SNMPv3.
The discovery exchange in SNMPv3 consumes about 240 more bytes of network capacity. The complexity of the SNMP application affects the frequency of discovery exchanges. If the SNMP application has no feature to store the timeliness parameters, the efficiency of the network capacity is badly affected by the discovery process, making the network more overloaded.
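The 24% figure can be reproduced from Table 1, assuming the comparison is SNMPv3 authPriv against SNMPv2c over IPSec for one variable:

```python
# Totals in bytes for one variable, copied from Table 1.
snmpv3_authpriv    = 358   # SNMPv3 authPriv Get + Response
snmpv2c_over_ipsec = 288   # SNMPv2c over IPSec Get + Response

# Relative extra capacity of SNMPv3 over SNMPv2c-with-IPSec:
extra = snmpv3_authpriv / snmpv2c_over_ipsec - 1
assert round(extra * 100) == 24   # ~24% more capacity, as stated
```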
REFERENCES
Amir, E. and S. McCanne 2003. An active service framework and its application,
Communications Architectures and Protocols, pp: 178–189.
Apostolopoulos, T. and V. Daskalou 1995. On the Implementation of a Prototype for
Performance Management Services, IEEE symposium on computers and communications, 57-
63. A research paper on a prototype for management services.
Behrouz, A. F. 2004. TCP-IP Protocol Suit, McGraw Hill publication, pp: 156-163.
Bettati R. 2008. Modern Fault Trace Analysis and its Capabilities Department of Computer
Science and Center for Information Assurance and Security Texas A&M University College
Station, TX, 77801,USA
Bierman, A. and L. Bucci 2002. Remote Network Monitoring MIB Protocol Identifiers,
Proposed technical specification for RMON2 protocol identifiers, pp: 194-220.
Blum A. and D. Song 2004. Monitoring and Measurements of network bounds. In Proceedings
of the 7th International Symposium on Recent Advances in Intrusion Detection, RAID ’04,
September 2004.
Bradley, M. 2002. Remote Network Monitoring MIB Extensions for Switched Networks
proposed technical specification for RMON of switched networks, pp: 51-68.
Symantec Internet Security threat report highlights (Symantec.com),
http://www.prdomain.com/companies/Symantec/newreleases/Symantec_internet_205032.htm
Accessed on 15 May 2011.
Chang, C. and L. Sung. 2008. Integration and Application of Web-Service-Based Expert System
and Computer Maintenance Management Information System. In Proceedings of the 2008 IEEE
Asia-Pacific Services Computing Conference, pp: 207-212.
Cheswick R. 2002. Firewall and Internet Security, Addison Wesley Professional Computing
Series; pp: 201-223.
Corey V. and C. Peterman 2005. IEEE Internet Computing Volume 6, Issue 6 Pages: 60 – 66.
Year of Publication: 2002 ISSN: 1089-7801
Cottrell, L. and C. Logg. 2004. Network monitoring for the LAN and WAN, http://www.slac.stanford.edu/grp/scs/net/talk/ornl-96/ornl.html. A tutorial paper on monitoring of the Wide Area Network including the internet.
Ergin, M., K. Ramachandran and M. Gruteser 2007. Understanding the effect of access point
density on wireless LAN performance, International Conference on Mobile Computing and
Networking Proceedings of the 13th annual ACM international conference on Mobile computing
and networking, pp: 62-64.
Gast, M. 2002. 802.11 wireless networks: the definitive guide, Wiley, pp: 85-89.
Huges, J. 1996. Characterizing Network Behavior Using Remote Monitoring Devices, Telecommunications, pp: 43-44.
Jung, H. J. and J. Y. Choen 2007. Real-time network monitoring scheme based on SNMP for dynamic information, Journal of Network and Computer Applications, 30 (1), pp: 331-353.
QUALITY OF WORKING LIFE IN INSURANCE SECTOR
Rita Goyal*
ABSTRACT
The study of quality of working life (QWL) has been an important and critical area in management and organizational performance for several years, especially in the LIC. This paper aims to study the extent of QWL in the LIC and explores the proposed link between QWL and employee productivity. Two hundred fifty employees responded to the researcher's questionnaire. The study makes use of statistical techniques such as the mean, standard deviation, t-test, and ANOVA to process and analyze the data collected. The demographic portion of the instrument was developed by the researcher to collect demographic information. To explore the difference between the means of two groups, the t-test was applied; one-way ANOVA was used to explore differences among more than two groups. The paper ends by offering useful suggestions to the management involved in the operations of the corporation.
Key words: Quality of working life, Insurance Sector, Competency Development, Employees
Productivity, Work-Life Balance
*Lecturer Dept. of Humanities and Social Sciences, Maharishi Markendeshwar University,
Mullana (Ambala)
INTRODUCTION
Quality of Working Life is a process of work organization which enables members at all levels to participate actively in shaping the organization's environment, methods, and outcomes. The conceptual categories which together make up the quality of working life are adequate and fair compensation; safe and healthy working conditions; immediate opportunity to use and develop human capacities; opportunity for continued growth and security; social integration in the work organization; constitutionalism in the work organization; work and the total life space; and the social relevance of work life. The term Quality of Work Life was actually introduced in the late 1960s. Since then the term has been gaining more and more importance everywhere, at every workplace. Initially, quality of work life focused on the effects of employment on the general well-being and health of workers, but its focus has since changed. Every organization needs to give its workers a good environment, including all financial and non-financial incentives, so that it can retain its employees for a longer period and achieve its organizational goals. The concept of QWL is based on the assumption that a job is more than just a job: it is the center of a person's life. In recent years there has been increasing concern for QWL due to several factors: the rise in education level and consequently in the job aspirations of employees; associations of workers; the significance of human resource management; widespread industrial unrest; and the growth of knowledge about human behavior.
LITERATURE REVIEW
Bearfield (2003) used 16 questions to examine quality of working life and distinguished between causes of dissatisfaction among professionals, intermediate clerical, sales, and service workers, indicating that different concerns might have to be addressed for different groups. The distinction made between job satisfaction and dissatisfaction in quality of working life reflects the influence of job satisfaction theories. Lawler (2004) argued that Quality of Working Life is not a unitary concept, but incorporates a hierarchy of perspectives that include not only work-based factors such as job satisfaction, satisfaction with pay, and relationships with work colleagues, but also factors that broadly reflect life satisfaction and general feelings of well-being. He suggested that quality of working life was associated with satisfaction with wages, hours, and working conditions, describing the "basic elements of a good quality of work life" as:
a safe work environment,
equitable wages,
equal employment opportunities and opportunities for advancement.
Waddell Jane and Carr Paul (2005): in addition to the competition of globalization and products, organizations face competition related to employee retention, while employees face competition for their time. As an increasing number of employees face competing demands between work and family, maintaining a healthy work-life balance is of paramount importance. In spite of family-friendly policies, many employees perceive negative consequences associated with availing themselves of these policies. At the same time, over 50% of American employees fail to take their allotted vacation time. Failure to achieve a healthy work-life balance can lead to overload, which may result in the loss of employees. Encouraging a healthy work-life balance benefits both the organization and the employees. Lawler and Porter (2006): an individual's experience of satisfaction or dissatisfaction can be substantially rooted in their perception, rather than simply reflecting their "real world". Further, an individual's perception can be affected by relative comparison (am I paid as much as that person?) and by comparison of internalized ideals, aspirations, and expectations with the individual's current state.
In summary, where it has been considered, authors differ in their views on the core constituents of Quality of Working Life (e.g. Sirgy, Efraty, Siegel & Lee, 2001; Warr, Cook & Wall, 1979). It has generally been agreed, however, that Quality of Working Life is conceptually similar to the well-being of employees but differs from job satisfaction, which solely represents the workplace domain. Banerjee Indranil (2006): jobs are getting increasingly demanding as organizations face competition and become leaner in structure, leading to conflict between people's professional deliverables and personal requirements. It is acknowledged that continuous disregard of personal issues ultimately leads to employee underperformance, so people often discuss work-life balance but seldom act on it. The focus now is: "Who is going to bell the cat?" Tackling the problem requires a multi-pronged effort comprising the organization, the employee, the government, the industry, society, etc. Tekuru Siva Ram (2007): work-life balance is all about individuals having complete control over their work, i.e. deciding when, why, where, and how to work. Finding these pressures encroaching on their private life and time, employees are unable to do anything about it and are finally squeezed out. Organizations should consider work-life balance as an extension of the fringe benefits offered to the
employees. This will help both the employees and the organization. Aggarwala Tanuja (2007): conflicting demands and pressures from work and life (family) can interfere with each other, since the two domains are complementary, not conflicting, priorities. Acceptance of this reality by organizations, along with new business and societal trends, has seen the growth of family-friendly practices at the workplace. Adopting a win-win approach, a growing number of organizations believe that helping employees balance and integrate their work lives with the rest of their lives leads to positive outcomes for both the employee and the employer. Work-family practices should be viewed as part of an overall HR and business strategy that is related to a firm's competitive advantage. Swamy (2007): in today's business context the pressures of work have been intensifying, and there is a growing feeling among employees that the demands of work are beginning to dominate life, producing a sense of work-life imbalance. The challenge of integrating work and family life is part of everyday reality for the majority of employees. Organizations have to innovate continually and come up with programs that give employees scope to balance their responsibilities at the workplace with the interests they have outside work.
Suman Ghalawat (2010) states that QWL is a process of work organization which enables members at all levels to participate actively in shaping the organization's environment, methods, and outcomes. This value-based process is aimed at meeting the twin goals of enhanced organizational effectiveness and improved quality of life at work for employees. Work is an integral part of our everyday life, as it is our livelihood, career, or business. On average we spend around twelve hours daily in the workplace, that is, one third of our life, so it does influence the overall quality of our life. Work should yield job satisfaction, give peace of mind, and provide the fulfillment of having done a task as expected, without any flaw, having spent the time fruitfully, constructively, and purposefully. Even if it is a small step towards a lifetime goal, at the end of the day it gives satisfaction and eagerness to look forward to the next day. The factors that influence and decide the quality of work life are: attitude, environment, opportunities, nature of the job, people, stress level, career prospects, growth and development, risk involved, and reward.
OBJECTIVES OF STUDY
In light of the domain for research, the study was undertaken with the following objectives:
1. To examine the nature of quality of working life prevailing in some selected branches of the LIC.
2. To study the differences in the perception of employees on the basis of gender.
3. To study the differences in the perception of employees on the basis of designation.
4. To study the differences in the perception of employees on the basis of qualification.
HYPOTHESES
In view of the objectives set for the study, the following null hypotheses were formulated:
Ho1.1: There is no significant difference between the perceptions of male and female employees regarding quality of working life.
Ho1.2: There is no significant difference between the perceptions of employees at different levels regarding quality of working life.
Ho1.3: There is no significant difference between the perceptions of graduate and postgraduate employees regarding quality of working life.
RESEARCH METHODOLOGY
Data
A total of 400 employees were chosen randomly from the 4 branches, keeping in view their total strength and range of activities. Of the 400 questionnaires distributed, only 250 were received completed in all respects. The study was therefore conducted with a 62.5% response rate.
SAMPLE OF THE STUDY
The following tables represent the sample of the study:

Gender-wise distribution of employees
           N      Percent
Male       185    74
Female     65     26
Total      250    100

Designation-wise distribution of employees
           N      Percent
Class-I    100    40
Class-II   69     27.6
Class-III  81     32.4
Total      250    100

Qualification-wise distribution of employees
               N      Percent
Graduate       140    56
Post Graduate  110    44
Total          250    100
QUESTIONNAIRE
The questions were designed to help the respondents identify major strengths and weaknesses of the Corporation and provide insights. The endeavor was to identify the key quality of working life issues on which employees' perceptions could be obtained. The respondents were specifically requested to ignore their personal prejudices and use their best judgment on a 5-point Likert scale. The purpose of this exercise was to make the responses a true reflection of organizational reality rather than individual opinion. The five points of the scale indicated in the questionnaire are: 1 - Strongly Disagree, 2 - Disagree, 3 - Undecided, 4 - Agree and 5 - Strongly Agree. The reliability (Cronbach's coefficient alpha) of the questionnaire was found to be 0.89, which shows that the data have satisfactory internal consistency.
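The reliability figure above follows the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). A minimal pure-Python sketch, using hypothetical 5-point responses rather than the study's (unpublished) raw data:

```python
def cronbach_alpha(items):
    """Cronbach's alpha for a list of item-response columns of equal length."""
    k = len(items)        # number of questionnaire items
    n = len(items[0])     # number of respondents

    def sample_var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    # Per-respondent total score across all items:
    totals = [sum(col[i] for col in items) for i in range(n)]
    return k / (k - 1) * (1 - sum(sample_var(c) for c in items) / sample_var(totals))

# Three hypothetical items answered by five respondents; items that move
# together produce a high alpha.
alpha = cronbach_alpha([[1, 2, 3, 4, 5], [1, 2, 3, 4, 5], [2, 2, 3, 5, 5]])
```

With consistent responses like these, alpha comes out close to 1, well above the conventional 0.7 threshold for satisfactory internal consistency.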
RESULTS & DISCUSSION
Descriptive Analysis
The results in the following table reveal that on the scale for quality of working life, the highest mean score (44.29) is for male employees and the lowest (33.56) is for Level III employees. The same is shown graphically in Figure 1.1.
The summary of the t-test presented in Table 1.2 indicates that the t-value (1.60) is not significant, as the p-value (0.110) is greater than 0.05. Hence the hypothesis that there is no significant difference between the perceptions of male and female employees regarding quality of working life is accepted at the 0.05 level of significance.
The mean value for male employees (44.29) is less than that for female employees (48.89); it is therefore concluded that female employees have a better perception of QWL than male employees.
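The t-value in Table 1.2 can be recovered from the reported summary statistics alone, assuming a pooled-variance two-sample t-test (a standard choice; the paper does not state which variant was used). A sketch:

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic with pooled variance (equal variances assumed).

    Returns the t statistic and the degrees of freedom n1 + n2 - 2.
    """
    sp2 = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    se = math.sqrt(sp2 * (1.0 / n1 + 1.0 / n2))
    return (mean2 - mean1) / se, n1 + n2 - 2

# Summary statistics from Table 1.2 (male vs. female employees):
t, df = pooled_t(44.29, 19.85, 185, 48.89, 20.13, 65)  # t ≈ 1.60, df = 248
```

The result matches the tabled values (t = 1.60, df = 248), which supports reading the table's means as 44.29 and 48.89.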
The summary of the univariate analysis of variance presented in Table 1.3 indicates that the p-value (0.232) is greater than 0.05, as the F-value (1.469) is not significant at the 0.05 level. Hence the hypothesis is accepted at the 0.05 level of significance: there is no significant difference among the perceptions of employees at different levels regarding quality of working life.
The summary of the t-test presented in Table 1.4 indicates that the t-value (0.348) is not significant, as the p-value (0.728) is greater than 0.05. Hence the hypothesis that there is no significant difference between the perceptions of graduate and postgraduate employees regarding QWL is accepted at the 0.05 level of significance; there is no significant difference between the two groups in the selected branches of LIC.
The mean value for graduate employees (34.69) is less than that for postgraduate employees (35.58); it is therefore concluded that postgraduate employees have a better perception of QWL than graduate employees. Thus the findings are:
The difference between the perceptions of male and female employees regarding quality of working life is not significant. This shows that gender does not affect employees' perception of the QWL system, as all are equally aware of its significance.
There is no significant difference among the perceptions of employees at different levels regarding quality of working life, as all are equally aware of its significance; the need for employee development is felt in all cases. The difference between the perceptions of graduate and postgraduate employees regarding quality of work life in the selected branches of LIC is not significant, as both groups are equally concerned with improvement and progress.
CONCLUSION
In LIC, the Quality of Working Life principles are the principle of security, the principle of equity, the principle of individuation and the principle of democracy. On the basis of this study it can be said that employees of LIC in the Northern region are happy with the working conditions of the LIC. They feel safe and secure in LIC, though they feel the Corporation should start its own transport facilities for the staff. The main dissatisfaction among them is the lack of growth opportunities; they are not provided with extra care such as health camps. Poor work-life balance leads to many harmful outcomes such as tardiness, poor performance, lack of motivation, more errors and absence from work. Worst of all, poor work-life balance undoubtedly reduces work quality and productivity: when an employee is not able to give time to his family at home, he will feel stressed out at work. A sound work-life balance, by contrast, has a positive impact on employees' productivity, and the quality of work improves significantly as employees feel fresh rather than stressed.
SUGGESTIONS
1. The Corporation must be committed to an open and transparent style of operation that includes sharing appropriate information with employees and sincerely inviting their input regarding problems, opportunities and the implementation of improvement plans.
2. Employees must be given opportunities for advancement in the Corporation.
3. Traditional status barriers between different classes must be broken to permit the establishment of an atmosphere of trust and open communication.
4. Employees should receive feedback on results achieved and recognition for superior performance. Other forms of positive reinforcement, such as financial incentives, should also be made available where feasible.
5. Improved communication and coordination among the workers and the organization help to integrate different jobs, resulting in better task performance.
6. Better working conditions enhance workers' motivation to work in a healthy atmosphere, resulting in higher motivation and increased production.
7. QWL includes participation in group discussion and problem solving, improving skills and enhancing capabilities, thus building confidence and increasing output.
Table 1.1: Scale for Quality of Working Life
Factor                          N     Mean    S.D.
Gender - Male                   185   44.29   19.85
         Female                 65    38.89   20.13
Designation - Level I           100   39.27   18.69
              Level II          69    34.72   20.88
              Level III         81    33.56   22.86
Qualification - Graduate        140   34.69   19.34
                Post Graduate   110   35.58   20.95
Table 1.2: Perceptual differences between male and female employees regarding quality of working life
Group              Sample size   Mean    S.D.    t-value   df    p-value
Male employees     185           44.29   19.85   1.60      248   0.110
Female employees   65            48.89   20.13
p > 0.05
Table 1.3: Perceptual differences between employees at different levels regarding quality of working life
Particulars   Sample size   Mean    d.f.   F-value   p-value
Class I       100           39.27   2      1.469     0.232
Class II      69            34.72
Class III     81            33.56
p > 0.05
Table 1.4: Perceptual differences between employees with graduate and postgraduate qualifications regarding quality of working life
Particulars              Sample size   Mean    S.D.    t-value   df    p-value
Graduate employees       140           34.69   19.34   0.348     248   0.728
Postgraduate employees   110           35.58   20.95
p > 0.05
REFACTORABILITY ANALYSIS USING LINEAR REGRESSION
Gauri Khurana*
Sonika Jindal **
ABSTRACT
Software refactoring - improving the internal structure of software without changing its external behavior - is an important activity for avoiding software quality decay. Key to this activity is the identification of portions of the source code that offer opportunities for refactoring - the so-called bad smells. The underlying objective is to improve the quality of the software system with regard to future maintenance and development activities. The goal of this review paper is to discuss an approach that helps detect code bad smells through source code metrics, and the results obtained from its use. In this discussion, we propose a measure of refactorability based on four factors: reusability, understandability, modifiability and maintainability. Since each of these factors is intangible in nature and hard to measure, it is proposed that they be measured in terms of a point system. It is also important to bring in new elements that might be affected by a refactoring sequence - for example, structural testing requirements - which can be used in the future as a new metric to detect refactoring opportunities.
Keywords: Refactoring, reusability, understandability, modifiability, maintainability, bad
smell, metrics
*CSE, SBSCET, Ferozpur. PTU, Jalandhar
** Assistant Professor, Department of Computer Science, SBSCET, Ferozpur. PTU,
Jalandhar.
1. INTRODUCTION
1.1 Introduction to refactoring
Refactoring is a well-defined process that improves the quality of systems and allows developers to repair code that is becoming hard to maintain, without throwing away the existing source code and starting again. By careful application of refactorings the system's behavior remains the same but returns to a well-structured design. The use of automated refactoring tools makes it more likely that the developer will perform the necessary refactorings, since the tools are much quicker and reduce the chance of introducing bugs.
"Refactoring is the process of changing a software system in such a way that it does not alter the external behavior of the code yet it improves its internal structure." - Martin Fowler, in Refactoring: Improving the Design of Existing Code.
Refactoring is a kind of reorganization. Technically, it comes from mathematics, where you factor an expression into an equivalence - the factors are cleaner ways of expressing the same statement. Refactoring implies equivalence: the beginning and the end product must be functionally identical. The shift from Structured Programming to Object-Oriented Programming is a fundamental example of refactoring. [1]
"Refactoring is the process of taking an object design and rearranging it in various ways to make the design more flexible and/or usable." - Ralph Johnson.
Four Reasons to change the code:
The four primary reasons to change the code are [2]:
1. Adding a feature
2. Fixing a bug
3. Improving the design
4. Optimizing resource usage
1.2 Preserving Behavior
Feature addition and bug fixing are very much like refactoring and optimization. In all cases of changing code, we want to change some functionality, some behavior, but we want to preserve much more (see Figure 1).
Figure 1: Preserving Behavior [2]
Figure 1 shows what is supposed to happen when we make changes, but what does it mean for us practically? On the positive side, it tells us what we have to concentrate on: we have to make sure that the small number of things we change are changed correctly. On the negative side, that isn't the only thing we have to concentrate on; we also have to figure out how to preserve the rest of the behavior, and the amount of behavior to be preserved is usually very large.
Preserving behavior is a large challenge. When we need to make changes and preserve
behavior, it can involve considerable risk. [2] To mitigate risk, we have to ask three
questions:
1. What changes do we have to make?
2. How will we know that we’ve done them correctly?
3. How will we know that we haven’t broken anything?
1.3 Why do we need refactoring?
The longer object-oriented systems are in use, the more probable it is that these systems have to be maintained [3], i.e. they have to be optimized toward a given goal (perfective maintenance), corrected with respect to identified defects (corrective maintenance) and adjusted to a changing environment (adaptive maintenance). Whereas many of these activities can be subsumed under the reengineering area, there are additional change activities that are much less difficult to apply than typical reengineering activities and which do not change the external behavior [4]. The main goal of these "mini-reengineering activities" is to improve understandability and to simplify reengineering activities. Fowler calls these activities refactorings, which he defines as "a change made to the internal structure of a software to make it easier to understand and cheaper to modify without changing its observable behavior" [1, p. 53].
Fowler suggests four purposes of refactoring [1]:
1. Improve the design of software – Through accumulating code changes, code loses its
structure, thereby increasingly drifting towards a state of decay. Refactoring can be used
to cure software decay by redistributing parts of the code to the “right” places, and by
removing duplicated code. The claim that refactoring can improve the design of software
is confirmed by [3] with regard to cohesion and with respect to coupling, as indicators
for internal software quality. Another claimed benefit in the area of improved design is
improved flexibility.
2. Make software easier to understand – Refactoring can help make the code more readable
by making it better communicate its purpose. A different way in which refactoring
supports program understanding is in reflecting hypotheses about the purpose of the code
by changing the code, and afterwards testing that understanding through rerunning the
code. The suggested process to do so is to start refactoring the little details to clarify the
code, thereby exposing the design. The potential to improve understandability through
refactoring is confirmed by many authors [1, 3]. In more specific terms, [5] discusses
how refactorings can be used to improve communicating the purpose of the code.
3. Help find bugs – Through clarifying the structure of the code, the assumptions within the
code are also clarified, making it easier to find bugs.
4. Program faster – Through improving the design and overall understandability of the
code, rapid software development is supported.
1.4 When should one consider refactoring?
Ideally, refactoring would be part of a continuing quality improvement process. In other
words, refactoring would be seamlessly interwoven with other day-to-day activities of every
software developer.
Refactoring may be useful when a bug has surfaced and the problem needs to be fixed, or when the code needs to be extended. Refactoring at the same time as maintaining or adding new features also makes management and developers more likely to allow it, since it will not require an extra phase of testing.
If the developer in charge finds it difficult to understand the code, he will (hopefully) ask
questions, and begin to document the incomprehensible code.
Often, however, schedule pressures do not permit implementing a clean solution right away. A feature might have to be added in a hurry, a bug patched rather than fixed. In these cases, the code in question should be marked with a FIXME note so that it can be reworked when time permits. Such circumstances call not for individual refactorings but for a whole refactoring project. When the time has come to address the accumulated problems, a scan for FIXMEs, TODOs, etc. over the code base will return all the trouble spots for review, and they can be refactored according to priority.
2. SEMANTIC GAP
The concept of Semantic gap is relevant whenever a human activity, observation and task are
transferred into computational representation [6]. Like programs, programming languages are
not only mathematical objects but also software engineering artifacts. Describing the
semantics of real-world languages can help bring language theory to bear on both exciting
and important real-world problems. Achieving this is not purely a mathematical task, but
equally one of (semantic) engineering. The implementations of all major languages—
especially scripting languages defined by implementations—come with large and well-
structured test suites. These suites embody the intended semantics of the language. We
should be able to use such a test suite to retrofit semantics. For this to be useful, it is not
sufficient to merely create semantics for the core language [4].
More precisely, the gap is the difference between contextual knowledge in a powerful language (e.g. natural language) and its reproducible, computational representation in a formal language (e.g. a programming language). The semantic gap actually opens between the selection of the rules and the representation of the task.
With the passage of time the business scenario keeps changing, and software development must match the business environment. Therefore the code of any software also changes with respect to the business scenario. There might be architectural changes in the software due to a business reengineering process, and the programmer has to rethink the implementation of the code due to changes in the requirements. This offers an opportunity to relook at, redesign and refactor the code, and thus forces new semantics to be laid down with respect to the changing business scenario.
3. REFACTORING ACTIVITIES
The refactoring process consists of a number of different activities, each of which can be
automated to a certain extent [7]:
1. Identify where the code should be refactored;
2. Determine which refactorings should be applied to the identified places;
3. Guarantee that the applied refactoring preserves behavior;
4. Apply the refactoring;
5. Assess the effect of refactoring on software quality characteristics;
6. Maintain consistency between refactored program code and other software artifacts (or
vice versa).
The steps taken when applying a refactoring should be small enough that their consequences can be overseen, and reproducible enough that others can understand them. Generalized in this way, refactoring steps are merely rules that can be applied to any structure.
Refactoring not only covers the mechanics of restructuring, but also addresses the following issues [Martin Fowler]:
1. Refactoring emphasizes that, in the absence of more formal guarantees, testing should be used to ensure that each restructuring is behavior preserving. A rich test suite should be built, which must be run before and after each refactoring is applied.
2. Refactorings are described in a catalog, using a template reminiscent of design patterns.
3. Refactorings are applied in small steps, one by one, running the test suite after every step.
4. METRICS FOR REFACTORABILITY
The following metrics are identified for calculating the values of the four factors proposed here. They are defined as follows:
1. LinesOfCode (NbLines): The LOC for a method equals the number of sequence points found for this method in the file. A sequence point is used to mark a spot in the IL code that corresponds to a specific location in the original source. Notice that sequence points which correspond to braces '{' and '}' are not taken into account. Interfaces, abstract methods and enumerations have a LOC equal to 0. Only concrete code that is effectively executed is considered when computing LOC.
Namespace, type, field and method declarations are not considered lines of code because they don't have corresponding sequence points. LOC computed from an anonymous method doesn't interfere with the LOC of its outer declaring method.
Recommendations: Methods where LinesOfCode is higher than 20 are hard to understand and maintain. Methods where ILInstructions is higher than 40 are extremely complex and should be split into smaller methods (except if they are automatically generated by a tool).
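The real metric counts compiler sequence points from PDB debug data, which requires build artifacts. As a rough, hypothetical illustration of the brace-exclusion rule, a purely line-based approximation might look like:

```python
def lines_of_code(source):
    """Approximate LOC: non-blank lines, excluding comment-only lines and
    lines consisting solely of braces (which carry no sequence points)."""
    loc = 0
    for line in source.splitlines():
        s = line.strip()
        if s and not s.startswith(("//", "#")) and s not in ("{", "}"):
            loc += 1
    return loc

# Brace-only and comment-only lines are not counted:
sample = "int F()\n{\n    // comment only\n    int x = 1;\n    return x;\n}\n"
```

For this sample the approximation yields 3, counting only the signature and the two executable statements.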
2. LinesOfComment(NbComments): This metric can be computed only if PDB files
are present and if corresponding source files can be found. The number of lines of
comment is computed as follow:
For a method, it is the number of lines of comment that can be found in its body. If a
method contains an anonymous method, lines of comment defined in the anonymous
method are not counted for the outer method but are counted for the anonymous
method.
For a type, it is the sum of the number of lines of comment that can be found in each
of its partial definition.
For a namespace, it is the sum of the number of lines of comment that can be found in
each of its partial definition.
For an assembly, it is the sum of the number of lines of comment that can be found in
each of its source file.
Notice that this metric is not an additive metric (i.e. for example, the number of lines of
comment of a namespace can be greater than the number of lines of comment over all its
types).
Recommendations: This metric is not helpful to asses the quality of source code. We
prefer to use the metric PercentageComment.
3. NbMethods: The number of methods. A method can be an abstract, virtual or non-
virtual method, a method declared in an interface, a constructor, a class constructor, a
finalizer, a property/indexer getter or setter, an event adder or remover.
Recommendations: Types where NbMethods > 20 might be hard to understand and maintain, but there are cases where it is relevant to have a high value for NbMethods.
4. NbFields: The number of fields. A field can be a regular field, an enumeration's value, or a read-only or const field.
Recommendations: Types that are not enumerations and where NbFields is higher than 20 might be hard to understand and maintain, but there are cases where it is relevant to have a high value for NbFields.
5. Afferent coupling (Ca): The number of types outside this assembly that depend on
types within this assembly. High afferent coupling indicates that the concerned
assemblies have many responsibilities.
6. Efferent coupling (Ce): The number of types outside this assembly used by child types of this assembly. High efferent coupling indicates that the concerned assembly is dependent.
There is a whole range of interesting code metrics relating to coupling. The simplest ones are Afferent Coupling (Ca) and Efferent Coupling (Ce). Basically, the Ca of a code element is the number of code elements that use it, and its Ce is the number of code elements that it uses.
Figure 2: Afferent and Efferent Coupling
You can define Ca and Ce for the graph of assembly dependencies, the graph of namespace dependencies, the graph of type dependencies and the graph of method dependencies of a code base. You can also define the Ca metric on the fields of a program as the number of methods that access the field.
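On any of these dependency graphs, Ca and Ce reduce to simple counts. A sketch over a toy graph (the element names are hypothetical), represented as a dict from each element to the set of elements it uses:

```python
def efferent_coupling(deps, elem):
    """Ce: how many elements `elem` uses."""
    return len(deps.get(elem, set()))

def afferent_coupling(deps, elem):
    """Ca: how many elements use `elem`."""
    return sum(1 for used in deps.values() if elem in used)

# Hypothetical dependencies: A uses B and C, B uses C, C uses nothing.
deps = {"A": {"B", "C"}, "B": {"C"}, "C": set()}
```

Here C has Ca = 2 (many responsibilities, in the terminology above) while A has Ce = 2 and Ca = 0 (dependent, but not depended upon).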
7. Cyclomatic Complexity (CC): Cyclomatic complexity is a popular procedural
software metric equal to the number of decisions that can be taken in a procedure.
Concretely, in C# the CC of a method is 1 + {the number of the following expressions found in the body of the method}:
if | while | for | foreach | case | default | continue | goto | && | || | catch | ternary operator ?: | ??
The following expressions are not counted for CC computation:
else | do | switch | try | using | throw | finally | return | object creation | method call | field access
The Cyclomatic Complexity metric is defined on methods. Adapted to the OO world, it is also defined for classes and structures as the sum of their methods' CC. Notice that the CC of an anonymous method is not counted when computing the CC of its outer method.
Recommendations: Methods where CC is higher than 15 are hard to understand and maintain. Methods where CC is higher than 30 are extremely complex and should be split into smaller methods (except if they are automatically generated by a tool).
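The same counting rule can be mirrored for Python source with the standard ast module; Python lacks case and goto, so the decision list below is an analogue of, not identical to, the C# list above:

```python
import ast

# Node types counted as one decision each; `and`/`or` chains add
# len(values) - 1 decisions, matching the &&/|| rule.
_DECISIONS = (ast.If, ast.While, ast.For, ast.ExceptHandler, ast.IfExp)

def cyclomatic_complexity(source):
    """1 + the number of decision points found in the parsed source."""
    cc = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _DECISIONS):
            cc += 1
        elif isinstance(node, ast.BoolOp):
            cc += len(node.values) - 1
    return cc

# One `if` plus one `or` gives CC = 3:
src = "def f(x):\n    if x > 0 or x < -10:\n        return 1\n    return 0\n"
```

Note that `return` is deliberately not counted, in line with the exclusion list above.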
8. Efferent coupling at method level (MethodCe): The Efferent Coupling for a
particular method is the number of methods it directly depends on.
9. Afferent coupling at field level (FieldCa): The Afferent Coupling for a particular
field is the number of methods that directly use it.
10. NbOverloads: The number of overloads of a method. If a method is not overloaded, its NbOverloads value equals 1. This metric is also applicable to constructors.
Recommendations: Methods where NbOverloads is higher than 6 might be a problem to maintain and provoke higher coupling than necessary. Keeping overloads low also helps reduce the number of constructors of a class.
11. Association Between Classes (ABC): The Association Between Classes metric for a particular class or structure is the number of members of other types it directly uses in the body of its methods.
12. Depth of Inheritance Tree (DIT): The Depth of Inheritance Tree for a class or a structure is its number of base classes (including the System.Object class, thus DIT >= 1).
Recommendations: Types where DepthOfInheritance is 6 or higher might be hard to maintain. However, this is not a strict rule, since your classes might inherit from third-party classes which themselves have a high depth of inheritance.
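Python's method resolution order gives a quick analogue for single-inheritance hierarchies, where every class ultimately derives from `object` just as C# types derive from System.Object. A hypothetical sketch:

```python
def depth_of_inheritance(cls):
    """Number of ancestor classes including `object`, so DIT >= 1.

    For single inheritance len(cls.__mro__) - 1 is exactly the chain length;
    with multiple inheritance the MRO lists all ancestors, so this counts
    every base rather than the longest path.
    """
    return len(cls.__mro__) - 1

# A three-level hypothetical hierarchy:
class Base: pass
class Middle(Base): pass
class Leaf(Middle): pass
```

Here `Base` has DIT 1 (only `object` above it) and `Leaf` has DIT 3, comfortably under the threshold of 6 suggested above.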
13. NbAssemblies: The number of assemblies. Only application assemblies are taken into account.
14. NbNamespaces: The number of namespaces. The anonymous namespace counts as one. If a namespace is defined over N assemblies, it counts as N.
15. PercentageCoverage: The percentage of code covered by tests. Code coverage data are imported from coverage files. If, for example, the uncoverable attribute feature is used on a method and all its sibling methods are 100% covered, then the parent type is considered 100% covered. Coverage metrics are not available if the metric LinesOfCode is not available.
Recommendations: The closer to 100%, the better.
16. Relational Cohesion (H): The average number of internal relationships per type. Let R be the number of type relationships that are internal to this project (i.e. that do not connect to types outside the project), and let N be the number of types within the project. Then H = (R + 1) / N; the extra 1 in the formula prevents H = 0 when N = 1. The relational cohesion represents the relationships this project has among all its types.
Recommendations: Since classes inside a project should be strongly related, the cohesion should be high. On the other hand, too-high values may indicate over-coupling. A good range for RelationalCohesion is 1.5 to 4.0; projects where RelationalCohesion < 1.5 or RelationalCohesion > 4.0 might be problematic.
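The formula is simple enough to work through directly; a one-line sketch with hypothetical counts:

```python
def relational_cohesion(internal_relationships, n_types):
    """H = (R + 1) / N, as defined above; the +1 avoids H = 0 when N = 1."""
    return (internal_relationships + 1) / n_types

# A hypothetical project with 7 internal type relationships across 4 types:
h = relational_cohesion(7, 4)  # H = 2.0, inside the recommended 1.5-4.0 band
```

A lone type with no relationships still scores H = (0 + 1) / 1 = 1.0, below the recommended band, flagging it for review rather than dividing cleanly to zero.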
5. RATING SCALE
A rating scale is a set of categorize designed to elicit information about a quantitative or a
qualitative attribute. In the social sciences, common examples are the Likert scale and 1-10
rating scales in which a person selects the number which is considered to reflect the
perceived quality of a product. More than one rating scale is required to measure an attitude
or perception due to the requirement for statistical comparisons between the categories in the
polytomous Rasch model for ordered categories (Andrich, 1978).
5.1 Likert scale
A Likert scale is a psychometric scale commonly used in questionnaires, and is the most widely used scale in survey research, such that the term is often used interchangeably with rating scale even though the two are not synonymous. When responding to a Likert questionnaire item, respondents specify their level of agreement with a statement. The scale is named after its inventor, the US organizational-behavior psychologist Rensis Likert (1903-1981). Each item may be analyzed separately, or in some cases item responses may be summed to create a score for a group of items; hence, Likert scales are often called summative scales.
Likert-scale data can, in principle, be used as a basis for obtaining interval-level estimates on a continuum by applying the polytomous Rasch model, when data can be obtained that fit this model. In addition, the polytomous Rasch model permits testing the hypothesis that the statements reflect increasing levels of an attitude or trait, as intended. For example, application of the model often indicates that the neutral category does not represent a level of attitude or trait between the disagree and agree categories. Again, not every set of Likert-scaled items can be used for Rasch measurement; the data have to be thoroughly checked to fulfill the strict formal axioms of the model.
Likert scales usually have five potential choices (strongly agree, agree, neutral, disagree, strongly disagree) but sometimes go up to ten or more. The final average score represents the overall level of accomplishment or attitude toward the subject matter [8].
Since each of the factors is intangible in nature and hard to measure, it is proposed that they be measured in terms of a point system, as follows:
Table 1: Scale of Reusability:
High Reusability 10-9
Medium Reusability 8-7
Low Reusability 6-5
Very low Reusability 4-3
No Reusability 2-1
Table 2: Scale of Maintainability:
High Maintainability 10-9
Medium Maintainability 8-7
Low Maintainability 6-5
Very low Maintainability 4-3
No Maintainability 2-1
Table 3: Scale of Understandability:
High Understandability 10-9
Medium Understandability 8-7
Low Understandability 6-5
Very low Understandability 4-3
No Understandability 2-1
Table 4: Scale of Modifiability:
High Modifiability 10-9
Medium Modifiability 8-7
Low Modifiability 6-5
Very low Modifiability 4-3
No Modifiability 2-1
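All four tables share the same 10-point band structure, so assigning a label to a score can be mechanized. A hypothetical helper (the band boundaries are taken from Tables 1-4):

```python
def band(points):
    """Map a 1-10 point score to the label bands shared by Tables 1-4."""
    if not 1 <= points <= 10:
        raise ValueError("points must be between 1 and 10")
    # Highest matching band floor wins: 9-10 High, 7-8 Medium, 5-6 Low,
    # 3-4 Very low, 1-2 No.
    for floor, label in [(9, "High"), (7, "Medium"), (5, "Low"),
                         (3, "Very low"), (1, "No")]:
        if points >= floor:
            return label
```

For instance, a reusability score of 6 falls in the "Low" band, while a maintainability score of 9 is "High".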
6. CORRELATION AND REGRESSION ANALYSIS
Correlation and regression are generally performed together. The application of correlation
analysis is to measure the degree of association between two sets of quantitative data. There
are virtually no limits of applying correlation analysis to any dataset of two or more variables.
It is the researcher’s responsibility to ensure correct use of correlation analysis. Correlation is
usually followed by regression analysis in many applications. The main objective of
regression analysis is to explain the variation in one variable (called the dependent variable),
based on the variation in one or more other variables (called the independent variables). If
there is only one dependent variable and only one independent variable used to explain the
variation in it, then the model is known as simple regression. If multiple independent
variables are used to explain the variation in one dependent variable, it is called multiple
regression [9]. Even though the regression equation could be either linear or non-linear, we
limit our discussion to linear models.
From the regression analysis of the four factors (reusability, understandability,
modifiability, maintainability), performed separately using their respective metrics, the
analysis of refactorability can be done by applying linear regression over refactorability
using these four factors. Thus, the regression equation for refactorability is as follows:
Y = a + bX1 + cX2 + dX3 + eX4
where Y is the dependent variable and X1, X2, X3, and X4 are the independent variables.
The above regression equation is applied to each factor that is considered to affect the
refactorability of the software. The following steps are carried out for each factor
separately, taking its respective metrics as the independent variables.
Step 1: Collect the dataset containing the values for each identified metric. Based on that
dataset, points on the rating scale are assigned according to the rules.
Step 2: The correlation between the independent variables and the dependent variable is
computed for each factor affecting refactoring, using the SPSS 16 tool. A positive
correlation value indicates that the factor is directly affected by that variable; a negative
value indicates that the factor is inversely affected by the respective variable.
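The correlation computed in Step 2 (via SPSS 16 in the paper) is the Pearson coefficient, which can also be sketched directly; the data values below are illustrative, not taken from the paper's dataset.

```python
import math

# Pearson correlation between a metric and a factor rating.
# A positive r means the factor rises with the metric (direct effect);
# a negative r means it falls (inverse effect).

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

metric = [1, 2, 3, 4, 5]
factor = [2, 4, 6, 8, 10]                 # perfectly, directly related
print(round(pearson(metric, factor), 3))  # 1.0
```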
Step 3: Regression analysis is performed to explain the variation in one variable (the
dependent variable) based on the other variables (the independent variables). A linear
equation is used for the regression analysis, and the values of its coefficients are determined.
Step 4: The output of the regression is assessed with the value of R-square, the coefficient
of determination, which measures the strength of association in the regression analysis. It
varies between 0 and 1 and represents the proportion of the total variation in the dependent
variable that is accounted for by the variation in the factors.
After applying these steps to each factor, refactorability is estimated using the linear
regression equation, with refactorability as the dependent variable and the four factors
affecting refactoring as the independent variables. Partial regression plots are obtained for
each factor; the slope of each plot indicates whether the model designed to determine
refactorability is good or bad, and an approximately linear plot shows that the model
developed for refactorability based on that factor is good enough to determine
refactorability.
The results of the regression analysis of all the considered factors that affect refactoring
are studied. Based on the results for each factor, the points on the rating scale are obtained
for refactorability.
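The model Y = a + bX1 + cX2 + dX3 + eX4 and its R-square can be sketched numerically with ordinary least squares; the dataset, weights, and noise level below are synthetic, invented purely for illustration.

```python
import numpy as np

# Fit Y = a + b*X1 + c*X2 + d*X3 + e*X4 by least squares on synthetic data.
rng = np.random.default_rng(0)
X = rng.uniform(1, 10, size=(30, 4))          # four factor ratings, 30 samples
true_coef = np.array([0.3, 0.25, 0.25, 0.2])  # assumed weights (illustrative)
y = 1.0 + X @ true_coef + rng.normal(0, 0.1, 30)

A = np.column_stack([np.ones(len(X)), X])     # prepend intercept column
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
a, b, c, d, e = coef

# R-square: proportion of total variation explained by the factors.
resid = y - A @ coef
r_square = 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))
print(round(r_square, 3))   # close to 1 for this low-noise data
```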
7. CONCLUSION
Software refactoring is an important area of research that promises substantial benefits to
software maintenance. Refactoring is a process that improves quality and allows developers
to repair code that is becoming hard to maintain, without throwing away the existing source
code and starting again. After proper application of refactoring techniques we can return to
well-structured and well-designed code: careful application of refactorings leaves the
system's behavior unchanged while restoring a well-structured design. The use of automated
refactoring tools makes it more likely that developers will perform the necessary
refactorings, since the tools are much quicker and reduce the chance of introducing bugs.
From the literature survey of various research papers, the following factors are determined
for measuring the refactoring of code and the level of optimization of code: reusability,
maintainability, understandability, and modifiability. Here, we have proposed a 10-point
system to measure refactorability, based on the Likert rating scale. The
metrics that affect each factor of refactoring are determined and the values are calculated.
The correlation and regression analyses are performed to determine the associations and
variations among the various metrics used and their respective factors. The linear regression
equation used for the analysis is
Y = a + bX1 + cX2 + dX3 + eX4
where Y is the dependent variable, a is the intercept, b, c, d, e are the regression
coefficients, and X1, X2, X3, X4 are the independent variables. The variation in the
independent variables explains the variation in the dependent variable. The measure of
strength of association in the regression analysis is given by the coefficient of
determination, R-square, which varies between 0 and 1 and represents the proportion of the
total variation in the dependent variable that is accounted for by the variation in the factors.
REFERENCES
[1] Martin Fowler, Kent Beck, John Brant, William F. Opdyke, Don Roberts, 1999,
Refactoring: Improving the Design of Existing Code, Addison-Wesley.
[2] Michael C. Feathers, 2004, Working Effectively with Legacy Code, Prentice Hall
(Robert C. Martin Series).
[3] Frank Simon, Frank Steinbruckner, Claus Lewerentz, 2001, Metrics Based Refactorings,
In: Proceedings of 5th European Conference on Software Maintenance and Reengineering,
IEEE CS Press, Lisbon, Portugal, pp. 30-38.
[4] Arjun Guha, Shriram Krishnamurthi, 2010, Minding the (Semantic) Gap, Engineering
Programming Language Theory.
[5] W. C. Wake, 2003. Refactoring Workbook, Addison-Wesley Longman Publishing Co.,
Inc., Boston, MA, USA.
[6] C. Dorai, S. Venkatesh, 2003. Bridging the Semantic Gap with Computational Media
Aesthetics, IEEE Multimedia, Vol. 10, No. 2, pp.15-17.
[7] Tom Mens, Tom Tourwe, 2004, A Survey of Software Refactoring, IEEE Transactions on
Software Engineering, Vol. 30, No. 2, pp. 126-139.
[8] http://www.businessdictionary.com/definition/Likert-scale.html
[9] John Fox, 1997, Applied Regression Analysis, Linear Models, and Related Methods,
Thousand Oaks, CA: Sage Publications.
OPTIMIZING FILTERING PHASE FOR NEAR-DUPLICATE
DETECTION OF WEB PAGES USING TDW-MATRIX
Tanvi Gupta*
ABSTRACT
The voluminous amount of web documents has weakened the performance and reliability of
web search engines. Web content mining faces huge problems due to the existence of duplicate
and near-duplicate web pages. These pages either increase the index storage space or
increase the serving costs, thereby irritating the users. In this paper, the proposed work
optimizes the filtering phase, which consists of prefix and positional filtering, by adding
suffix filtering, a generalization of positional filtering to the suffixes of the records. The
goal is to add one more filtering method that prunes candidates which survive the prefix and
positional filtering.
Keywords: near-duplicates, TDW-matrix, Prefix-filtering, Positional-filtering, suffix-filtering
*Lingaya’s University, Faridabad, India
INTRODUCTION:
Over the last decade there has been tremendous growth of information on the World Wide
Web (WWW), which has become a major source of information. The web creates new
challenges for information retrieval, as the amount of information on the web and the
number of users using the web are growing rapidly. It is practically impossible to search
through this extremely large database for the information needed by a user; hence the need
for search engines arises. Search engines use crawlers to gather information and store it in a
database maintained on the search engine side. For a given user's query, the search engine
searches the local database and very quickly displays the results.
However, the voluminous amount of web documents has resulted in problems for search
engines, making the search results less relevant to the user. In addition to this,
the presence of duplicate and near-duplicate web documents has created an additional
overhead for the search engines critically affecting their performance. The demand for
integrating data from heterogeneous sources leads to the problem of near-duplicate web
pages. Near-duplicate data bear high similarity to each other, yet they are not bitwise
identical [2][4].
A. TDW Matrix Algorithm
The TDW Matrix Algorithm is a three-stage algorithm which receives an input record and a
threshold value and returns an optimal set of near-duplicates. In the first phase, the
rendering phase [3], all pre-processing is done and a weighting scheme is applied; a global
ordering is then performed to form a term-document weight matrix. In the second phase, the
filtering phase, two well-known filtering mechanisms, prefix filtering and positional
filtering, are applied to reduce the size of the competing record set and hence the number of
comparisons. In the third phase, the verification phase, singular value decomposition is
applied and a similarity check is done based on the threshold value, finally yielding an
optimal number of near-duplicate records.
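The rendering phase can be sketched for its global-ordering step, assuming whitespace tokenization and an ascending-document-frequency ordering (a common choice so that rare tokens appear first in each record's prefix); the weighting scheme and the term-document weight matrix are omitted, and the documents are toy strings.

```python
from collections import Counter

# Rendering-phase sketch: tokenize, compute document frequencies, and impose
# a global ordering so each record's tokens are sorted rare-tokens-first.

docs = ["web search engine results", "web search engine pages",
        "duplicate web pages detection"]

tokenized = [d.split() for d in docs]
df = Counter(t for doc in tokenized for t in set(doc))   # document frequency

# Global ordering: rarer tokens first; ties broken alphabetically.
order = {t: i for i, t in enumerate(sorted(df, key=lambda t: (df[t], t)))}

records = [sorted(set(doc), key=order.get) for doc in tokenized]
print(records[0])   # rare token "results" first, ubiquitous "web" last
```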
Fig.1: General Architecture [1].
B. Suffix Filtering Method:-
The suffix filtering method is a generalization of positional filtering to the suffixes of the
records. The challenge, however, is that the suffixes of records are neither indexed nor have
their partial overlaps been calculated. We therefore face the following two technical issues:
(i) How to establish an upper bound in the absence of indices or partial overlap results?
(ii) How to find the position of a token without tokens being indexed?
The first issue is solved by converting an overlap constraint into an equivalent Hamming
distance constraint, and then lower-bounding the Hamming distance by partitioning the
suffixes in a coordinated way. The suffix of a record x is denoted xs. Consider a pair of
records (x, y) that meets the Jaccard similarity threshold t and, without loss of generality,
|y| ≤ |x|. Since the overlap in their prefixes is at most the minimum length of the prefixes,
the following upper bound can be derived for the Hamming distance of their suffixes:
H(xs, ys) ≤ Hmax = 2|x| − 2⌈(t/(1+t)) · (|x| + |y|)⌉ − (⌈t · |x|⌉ − ⌈t · |y|⌉)    (1)
In order to check whether H (xs, ys) exceeds the maximum allowable value, an estimate of the
lower bound of H (xs, ys) is provided below. First we choose an arbitrary token w from ys,
and divide ys into two partitions: the left partition yl and the right partition yr. The
criterion for the partitioning is that the left partition contains all the tokens in ys that
precede w in the global ordering, and the right partition contains w (if any) and the tokens in
ys that succeed w in the global ordering. Similarly, divide xs into xl and xr using w too
(even though w might not occur in x). Since xl (xr) shares no common token with yr (yl),
H(xs, ys) = H(xl, yl) + H(xr, yr). The lower bound of H(xl, yl) can be estimated as the
difference between |xl| and |yl|, and similarly for the right partitions. Therefore,
H(xs, ys) ≥ abs(|xl| − |yl|) + abs(|xr| − |yr|)    (2)
Finally, we can safely prune away candidates whose lower bound Hamming distance is
already larger than the allowable threshold Hmax.
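Equation (2) and the pruning test can be sketched as follows, assuming token lists already sorted in the global ordering (plain alphabetical order here, for simplicity); the example records and probe token are invented.

```python
import bisect

# Suffix-filtering lower bound: partition both suffixes around a probe token w
# and bound H(xs, ys) from below per Eq. (2).

def hamming_lower_bound(xs, ys, w):
    """Return abs(|xl| - |yl|) + abs(|xr| - |yr|) <= H(xs, ys)."""
    # Left partition: tokens strictly before w; right: w (if present) and after.
    xi = bisect.bisect_left(xs, w)
    yi = bisect.bisect_left(ys, w)
    return abs(xi - yi) + abs((len(xs) - xi) - (len(ys) - yi))

xs = ["b", "c", "d", "f"]
ys = ["a", "c", "e"]
w = ys[len(ys) // 2]          # probe with ys's middle token, "c"
lb = hamming_lower_bound(xs, ys, w)
print(lb)                     # prune the pair whenever lb > Hmax from Eq. (1)
```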
RELATED WORK:
A. Prefix Filtering: Consider an ordering O of the token universe U and a set of records,
each with tokens sorted in the order of O. Let the p-prefix of a record x be the first p tokens
of x. If O(x, y) ≥ α, then the (|x|−α+1)-prefix of x and the (|y|−α+1)-prefix of y must share
at least one token.
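The prefix filtering principle can be illustrated directly; the records, tokens, and threshold below are toy values, with tokens assumed pre-sorted in the global ordering.

```python
# If O(x, y) >= alpha, the (|x|-alpha+1)-prefix of x and the
# (|y|-alpha+1)-prefix of y must share at least one token.

def prefixes_share_token(x, y, alpha):
    px = set(x[: len(x) - alpha + 1])
    py = set(y[: len(y) - alpha + 1])
    return bool(px & py)

x = ["a", "b", "d", "e"]
y = ["a", "c", "d", "f"]
alpha = 2                      # x and y actually overlap on {a, d}
print(prefixes_share_token(x, y, alpha))   # True, so the pair survives
```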
Since prefix filtering is a necessary but not a sufficient condition for the corresponding
overlap constraint, an algorithm is designed as follows. First, in an indexing phase, build
inverted indices on the tokens that appear in the prefix of each record. Then, in a candidate
generation phase, generate a set of candidate pairs by merging the record identifiers returned
by probing the inverted indices for the tokens in the prefix of each record. The candidate
pairs are those that have the potential of meeting the similarity threshold and, by the prefix
filtering principle, are guaranteed to be a superset of the final answer. Finally, in a
verification phase, evaluate the similarity of each candidate pair and add it to the final
result if it meets the similarity threshold.
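The indexing and candidate-generation phases just described can be sketched as follows. For simplicity a single fixed overlap threshold alpha is assumed for all records, whereas in practice it depends on the record lengths and the similarity threshold; the records are toy token lists.

```python
from collections import defaultdict

# Build inverted lists over prefix tokens while streaming records, probing
# them first so each candidate pair is generated once.

def candidates(records, alpha):
    index = defaultdict(set)          # token -> ids whose prefix contains it
    pairs = set()
    for rid, rec in enumerate(records):
        prefix = rec[: len(rec) - alpha + 1]
        for token in prefix:
            for other in index[token]:
                pairs.add((other, rid))
            index[token].add(rid)
    return pairs

recs = [["a", "b", "c"], ["a", "d", "e"], ["d", "e", "f"]]
print(candidates(recs, 2))
```

Only pairs sharing a prefix token survive; verification would then compute the true similarity of each surviving pair.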
B. Positional Filtering: Consider an ordering O of the token universe U and a set of
records, each with tokens sorted in the order of O. Let token w = x[i]; w partitions the record
into the left partition xl(w) = x[1 . . (i − 1)] and the right partition xr(w) = x[i . . |x|]. If
O(x, y) ≥ α, then for every token w ∈ x ∩ y, O(xl(w), yl(w)) + min(|xr(w)|, |yr(w)|) ≥ α.
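The positional filtering test above can be sketched directly; the records and thresholds are toy values, with tokens assumed sorted in the global ordering.

```python
# Positional filtering: the overlap already seen in the left partitions plus
# the best case achievable from the right partitions must still reach alpha,
# otherwise the pair is pruned.

def positional_ok(x, y, w, alpha):
    xi, yi = x.index(w), y.index(w)               # w must occur in both
    left_overlap = len(set(x[:xi]) & set(y[:yi]))  # O(xl(w), yl(w))
    right_best = min(len(x) - xi, len(y) - yi)     # w itself counts here
    return left_overlap + right_best >= alpha

x = ["a", "b", "c", "d", "e"]
y = ["a", "c", "f"]
print(positional_ok(x, y, "c", 3))   # True: the pair may still reach 3
print(positional_ok(x, y, "c", 4))   # False: prune without verification
```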