Jainendra Singh
Department of Computer Science
Maharaja Surajmal Institute, New Delhi,
India
Real-Time Big Data Analytics: Security Concerns
and Challenges with Machine Learning Algorithms
Abstract
 With the great power of data comes great responsibility! A
big data initiative should not only focus on the volume,
velocity, or variety of the data, but also on the best
way to protect it.
 Security is usually an afterthought, but Elemental
provides the right technology framework to get you
the deep visibility and multilayer security any big data
project requires.
 Multilevel protection of your data processing nodes
means implementing security controls at the
application, operating system, and network levels while
keeping a bird's-eye view of the entire system, using
actionable intelligence to deter malicious activity,
emerging threats, and vulnerabilities.
 Advances in Machine Learning (ML) provide new
challenges and solutions to the security problems
encountered in such applications and technologies.
Introduction
 The simplest definition of "big data" is exactly what it
sounds like: massive amounts of data, by anybody's
standards. Of course, the amount of data that
constitutes "big" changes over time.
 Big data typically refers to the following types of data:
 Traditional enterprise data - includes customer
information from CRM systems, transactional ERP
data, web store transactions, and general ledger data.
 Machine-generated/sensor data - includes Call Detail
Records ("CDRs"), weblogs, smart meters,
manufacturing sensors, equipment logs (often
referred to as digital exhaust), and trading systems
data.
 Social data - includes customer feedback streams,
microblogging sites like Twitter, and social media
platforms like Facebook.
Four Key Characteristics That
Define Big Data
 Volume.
 Velocity.
 Variety.
 Value.
Importance Of Big Data
 When big data is distilled and analyzed in
combination with traditional enterprise data,
enterprises can develop a more thorough and
insightful understanding of their business, which can
lead to enhanced productivity, a stronger competitive
position and greater innovation - all of which can have
a significant impact on the bottom line.
 Manufacturing companies deploy sensors in their
products to return a stream of telemetry. In the
automotive industry, systems such as General Motors'
OnStar® or Renault's R-Link® deliver
communications, security, and navigation services.
 Retailers usually know who buys their products; social
media and web log files from their e-commerce sites
broaden that view of customer behavior.
 Finally, social media sites like Facebook and
LinkedIn simply wouldn't exist without big data.
Organize Big Data
 The infrastructure required for organizing big data
must be able to process and manipulate data in the
original storage location; support very high throughput
(often in batch) to deal with large data processing
steps; and handle a large variety of data formats, from
unstructured to structured.
 Hadoop is a technology that allows large data
volumes to be organized and processed while
keeping the data on the original storage cluster. The
Hadoop Distributed File System (HDFS), for example,
serves as the long-term storage system for web logs.
 These web logs are turned into browsing behavior
(sessions) by running MapReduce programs on the
cluster and generating aggregated results on the
same cluster. These aggregated results are then
loaded into a relational DBMS.
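The weblog-to-sessions pipeline described above can be sketched in plain Python rather than an actual Hadoop MapReduce job; the log records, field layout, and 30-minute inactivity gap are hypothetical illustration, not the deck's actual data:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical weblog records: (user_id, timestamp_seconds, url).
LOGS = [
    ("u1", 100, "/home"), ("u2", 105, "/home"),
    ("u1", 130, "/cart"), ("u1", 4000, "/home"),
]

SESSION_GAP = 1800  # seconds of inactivity that ends a session


def map_phase(records):
    """Mapper: emit (user_id, (timestamp, url)) pairs."""
    for user, ts, url in records:
        yield user, (ts, url)


def reduce_phase(pairs):
    """Reducer: group hits per user and count sessions split by gaps."""
    sessions = {}
    for user, hits in groupby(sorted(pairs, key=itemgetter(0)),
                              key=itemgetter(0)):
        times = sorted(ts for _, (ts, _) in hits)
        count, last = 1, times[0]
        for ts in times[1:]:
            if ts - last > SESSION_GAP:
                count += 1  # a new session starts after a long gap
            last = ts
        sessions[user] = count
    return sessions


print(reduce_phase(map_phase(LOGS)))  # {'u1': 2, 'u2': 1}
```

The resulting per-user aggregates are exactly the kind of small, structured output that would then be loaded into the relational DBMS.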
Analyze Big Data
 Since data is not always moved during the organization
phase, the analysis may also be done in a distributed
environment, where some data will stay where it was
originally stored and be transparently accessed from a
data warehouse.
 The infrastructure required for analyzing big data must be
able to support deeper analytics such as statistical
analysis and data mining, on a wider variety of data types
stored in diverse systems; scale to extreme data volumes;
deliver faster response times driven by changes in
behavior; and automate decisions based on analytical
models. Most importantly, the infrastructure must be able
to integrate analysis on the combination of big data and
traditional enterprise data.
 New insight comes not just from analyzing new data, but
from analyzing it within the context of the old to provide
new perspectives on old problems.
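As a minimal sketch of that last point, combining an aggregated big-data result with traditional enterprise data can yield a metric neither source provides alone; the customer IDs, session counts, and revenue figures below are invented for illustration:

```python
import statistics

# Hypothetical aggregated big-data result: web sessions per customer.
sessions = {"c1": 12, "c2": 3, "c3": 8}
# Hypothetical traditional enterprise (CRM) data: yearly revenue per customer.
revenue = {"c1": 540.0, "c2": 90.0, "c3": 310.0}

# Join on customer id and derive revenue per session -- a view that
# neither data set provides on its own.
revenue_per_session = {
    cust: revenue[cust] / sessions[cust]
    for cust in sessions
    if cust in revenue
}

print(revenue_per_session)
print("mean revenue/session:", statistics.mean(revenue_per_session.values()))
```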
Big Data Security Challenges
1. Secure computations in distributed programming
frameworks
2. Security best practices for non-relational data
stores
3. Secure data storage and transaction logs
4. End-point input validation/filtering
5. Real-time security/compliance monitoring
6. Scalable and composable privacy-preserving
data mining and analytics
7. Cryptographically enforced access control and
secure communication
8. Granular access control
9. Granular audits
10. Data provenance
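To make one of these challenges concrete, end-point input validation/filtering (item 4) can be approached with an allow-list check before records enter the cluster; the CDR record shape below is a hypothetical example, not a real telecom format:

```python
import re

# Hypothetical CDR record shape: "caller,callee,duration_seconds".
CDR_PATTERN = re.compile(r"^\d{10},\d{10},\d{1,5}$")


def validate_cdr(record: str) -> bool:
    """Allow-list check: accept only records matching the expected shape."""
    return bool(CDR_PATTERN.match(record))


print(validate_cdr("5550001111,5552223333,120"))              # well-formed
print(validate_cdr("5550001111,x'; DROP TABLE cdrs;--,120"))  # rejected
```

Rejecting malformed records at the end-point keeps injected or corrupted data from ever reaching the distributed computation.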
Proposed Research Work
Conclusion
 Machine learning is a subfield of artificial
intelligence concerned with techniques that allow
computers to improve their outputs based on
previous experiences. The field is closely related
to data mining and often uses techniques from
statistics, probability theory, pattern recognition,
and a host of other areas. Although machine
learning is not a new field, it is definitely growing.
Many large companies, including IBM®, Google,
Amazon, Yahoo!, and Facebook, have
implemented machine-learning algorithms in their
applications.
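A minimal sketch of "improving outputs based on previous experiences" is a 1-nearest-neighbour classifier written from scratch; the training points and benign/malicious labels are invented for illustration:

```python
# Labelled past experience: (feature_vector, label) pairs.
TRAIN = [((1.0, 1.0), "benign"), ((1.2, 0.9), "benign"),
         ((8.0, 9.0), "malicious"), ((9.1, 8.5), "malicious")]


def classify(point, train=TRAIN):
    """Return the label of the training example closest to `point`."""
    def sq_dist(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
    return min(train, key=lambda ex: sq_dist(ex[0], point))[1]


print(classify((0.8, 1.1)))  # near the benign cluster
print(classify((8.7, 8.8)))  # near the malicious cluster
```

Adding more labelled examples to `TRAIN` changes (and ideally improves) the classifier's output, which is the essence of learning from experience.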
References
 An Oracle White Paper, June 2013.
 "Defending Networks with Incomplete Information:
A Machine Learning Approach," Black Hat Briefings
USA, 2013.
 http://www.skytree.net/machine-learning
 hadoop.apache.org
 http://www.networkworld.com/community/blog/defining-
big-data-security-analytics
 Laney, Douglas. "3D Data Management:
Controlling Data Volume, Velocity and Variety."
Gartner, 6 February 2001.