ISSN: 2277 – 9043
    International Journal of Advanced Research in Computer Science and Electronics Engineering (IJARCSEE)
                                                                             Volume 1, Issue 6, August 2012




Security threats to data mining and analysis tools of TIA program
Swati Vashisht, Lecturer, CSE Dept., DIT SE, Gr. Noida
Divya Singh, Lecturer, CSE Dept., DIT SE, Gr. Noida
Bhanu Prakash Lohani, Lecturer, CSE Dept., DIT SE, Gr. Noida



Abstract: Data mining is a process that attempts to discover patterns in large data sets. The actual data mining task is the automatic or semi-automatic analysis of large quantities of data to extract previously unknown, interesting patterns, such as groups of data records (cluster analysis), unusual records (anomaly detection) and dependencies (association rule mining). This usually involves database techniques such as spatial indexes. These patterns can then be seen as a kind of summary of the input data, and may be used in further analysis or, for example, in machine learning and predictive analytics. As the internet has become involved in all areas of human activity, there are increasing concerns that data mining may pose a threat to our privacy and security, which makes security one of the major issues to monitor. In this paper we present recent research on data mining and its security, and we survey the use of data mining for crime detection.

Index Terms: data mining and security, intrusion detection, terrorist attack.

I. INTRODUCTION

Data mining is the process of discovering new patterns in large data sets using methods at the intersection of artificial intelligence, machine learning, statistics and database systems. It is the process of analyzing data from different perspectives and summarizing it into useful information, for example predicting the success of a marketing campaign, looking for patterns in financial transactions to discover illegal activities, or analyzing genome sequences.[1]

For mining decisions, data can be grouped into the following categories:

• Data classes: Stored data is used to locate data in predetermined groups.

• Data clusters: Data items are grouped according to logical relationships or consumer preferences.

• Data associations: Data can be mined to identify associations.

• Sequential patterns: Data is mined to anticipate behavior patterns and trends.
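The "data associations" category can be made concrete with the two standard measures of association rule mining: support (the fraction of transactions that contain an itemset) and confidence (the fraction of transactions containing the antecedent that also contain the consequent). The Python sketch below is only an illustration under stated assumptions: the transactions, the thresholds and the helper names `support`, `confidence` and `pair_rules` are hypothetical, and it enumerates only one-item-to-one-item rules rather than implementing a full algorithm such as Apriori.

```python
from itertools import combinations

# Hypothetical market-basket transactions (illustrative data, not from the paper).
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate P(consequent | antecedent) as support(A and C) / support(A)."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0
    return support(set(antecedent) | set(consequent), transactions) / base


def pair_rules(transactions, min_support=0.4, min_confidence=0.7):
    """Hypothetical helper: list single-item rules A -> C meeting both thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, c in combinations(items, 2):
        # Check the rule in both directions, since confidence is asymmetric.
        for lhs, rhs in ((a, c), (c, a)):
            s = support({lhs, rhs}, transactions)
            conf = confidence({lhs}, {rhs}, transactions)
            if s >= min_support and conf >= min_confidence:
                rules.append((lhs, rhs, s, conf))
    return rules


if __name__ == "__main__":
    for lhs, rhs, s, conf in pair_rules(TRANSACTIONS):
        print(f"{lhs} -> {rhs}: support={s:.2f}, confidence={conf:.2f}")
```

With the default thresholds, `beer -> diapers` (confidence 1.0) is kept, while `beer -> bread` (confidence about 0.67) is filtered out even though it meets the support threshold.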

                                     All Rights Reserved © 2012 IJARCSEE
II. POSSIBLE THREATS TO SECURITY

A. Predict information about classified work from its correlation with unclassified work:

Classification is a data mining technique used to predict group membership for data instances, which are assigned to classes based on their feature values. Predictive analysis can then be applied to predict future patterns from a record of the past, and such a record can be analyzed more effectively once the data has been classified. Unclassified work, by contrast, may involve duplicate and redundant data that is difficult to manage.[2]

A correlation is an index of the strength of the relationship between two variables. If unclassified work is strongly correlated with classified work, mining the unclassified data may therefore allow information about the classified work to be predicted.

B. Detect “hidden” information based on a “conspicuous” lack of information:

Data mining techniques are mainly used to detect hidden information in large databases. Query generators and data interpretation components are combined with discovery-driven systems to reveal hidden data.

C. Mining “Open Source” data to determine predictive events:

Predictive analysis uses data to predict future patterns. It is an area of statistical analysis that extracts information from data and uses it to predict future trends and behavior patterns. The core of predictive analytics is capturing the relationships between explanatory variables and the predicted variables from past occurrences, and exploiting these relationships to predict future outcomes.

D. Intrusion Detection

An intrusion can be defined as "any set of actions that attempt to compromise the integrity, confidentiality or availability of a resource". Intrusion prevention techniques, such as user authentication (e.g., using passwords or biometrics), avoiding programming errors, and information protection (e.g., encryption), have been used to protect computer systems as a first line of defense.[5] An intrusion detection system produces reports, whereas an intrusion prevention system is placed in-line and can actively prevent or block the intrusions it detects. The role of an intrusion detection system is to identify malicious activity, log information about that activity and report it.[2]

III. TO IMPROVE SECURITY

• For privacy concerns, access to privacy-sensitive information, such as credit card transaction records, health care records, biological traits, criminal investigation records and ethnicity, should be restricted to authorized users only. Various privacy-enhancing data mining techniques have therefore been developed to help protect data. Databases can employ a multilevel security model to classify and restrict data according to various security levels, with each user permitted access only to data at their authorized levels.[2]

• For security concerns, data mining can be used for crime detection and prevention through programs such as the TIA (Terrorism Information Awareness) program. This project was to focus on three specific areas of research: language translation; data search with pattern recognition and privacy protection; and advanced collaborative and
decision support tools.[9] CAPPS-II (Computer-Assisted Passenger Prescreening System) is another such program. In this system, when a person books a plane ticket, the airline collects certain identifying information, such as full name and address. This information is checked against data stores (e.g., the TSA No-Fly list or the FBI Ten Most Wanted Fugitives list) and used to assign a terrorism "risk score" to that person. High risk scores require the airline to subject the person to extended baggage and/or personal screening, and to contact law enforcement if necessary. MATRIX (Multistate Anti-Terrorism Information Exchange) leverages advanced computer management capabilities to access, share and analyze public records more quickly, helping law enforcement generate leads, expedite investigations, and possibly prevent terrorist attacks.[3]

IV. CONCLUSION

Data mining uses data analysis tools to discover previously unknown, valid patterns and relationships in large data sets. In the TIA (Terrorism Information Awareness) program, a data mining application is designed to identify potential terrorist suspects in a large pool of individuals using a statistical approach, in which each individual is tested against a predesigned model that includes information about known terrorists. However, while such an approach may reaffirm a particular profile, it will not necessarily identify an individual whose behavior deviates significantly from the original model, and an individual may be considered a suspect merely because some of their information matches the original model.

V. FUTURE WORK

We are at an initial stage of our research, and much remains to be done, including the following task: in the TIA program, person identification must not be based on a statistical approach, i.e., on comparison with a standard model and known behavioral patterns. We are therefore trying to design a technology-based analysis tool for the Terrorism Information Awareness program.

REFERENCES

[1] www.anderson.ucla.edu/faculty/jason.frand/teacher/.../datamining.htm

[2] Jiawei Han, Micheline Kamber, and Jian Pei, Data Mining: Concepts and Techniques.

[3] William J. Krouse, The Multi-State Anti-Terrorism Information Exchange (MATRIX) Pilot Project.

[4] Gerhard Paaß, Wolf Reinhardt, Stefan Rüping, and Stefan Wrobel, Data Mining for Security and Crime Detection.

[5] Wenke Lee and Salvatore J. Stolfo, Data Mining Approaches for Intrusion Detection.

[6] Sushmita Mitra, Sankar K. Pal, and Pabitra Mitra, Data Mining in Soft Computing Framework: A Survey.

[7] Varun Chandola, Eric Eilertson, Levent Ertöz, György Simon, and Vipin Kumar, Data Mining for Cyber Security.

[8] Bhavani Thuraisingham, Latifur Khan, Mohammad M. Masud, and Kevin W. Hamlen, Data Mining for Security Applications.

[9] Jeffrey W. Seifert, Data Mining and Homeland Security: An Overview.

[10] Anshu Veda, Prajakta Kalekar, and Anirudha Bodhankar, Intrusion Detection Using Datamining Techniques.

Authors' profiles

Swati Vashisht holds a bachelor's degree in Information Technology and is pursuing a master's degree in Computer Science & Engineering. Her areas of interest are data mining and warehousing, and operating systems.

Divya Singh holds a bachelor's degree in Computer Science & Engineering and is pursuing a master's degree in CSE from Amity University. Her areas of interest are computer networks and data mining.

Bhanu Prakash Lohani holds a bachelor's degree in Computer Science & Engineering and is pursuing a master's degree in CSE from Amity University. His areas of interest are computer networks and data mining.



