Karim Baina Assises AUSIM 2016

Cyber-security Intelligence with Big Data Analytics: Value, Machine Learning Algorithms & Defence Strategy, Architecture & Processes, Data Processing Paradigms, Ecosystem Overview, and Case Studies

  1. Karim BAÏNA, « Workshop 3 : Fight against Cybercrime and Crime by Big Data Analysis », AUSIM'2016. Workshop 3: Fight against Cybercrime and Crime by Big Data Analysis, 28 October 2016. Karim BAÏNA, co-head of the University Diploma « Big Data Scientist », Head of the Software Engineering Department, Head of the Cooperation Office, ENSIAS, Mohammed V University of Rabat, Morocco
  2. Hacked home devices caused a massive, large-scale Internet DDoS attack (source: USA TODAY, 10:04 a.m., October 22, 2016). SAN FRANCISCO: Eleven hours after a massive online attack that blocked access to many popular websites, the company under assault has finally restored its service. Dyn, a New Hampshire-based company that monitors and routes Internet traffic, was the victim of a massive attack that began at 7:10 a.m. Friday morning. The issue kept some users on the East Coast from accessing Twitter, Spotify, Netflix, Amazon, Tumblr, Reddit, PayPal and other sites. Eleven hours later (at 6:17 p.m. Friday), Dyn updated its website to say that the DDoS attack had been resolved and service restored. The Mirai software (origin of the attack) uses malware from phishing emails to first infect a computer or home network, then spreads to everything on it, taking over DVRs, cable set-top boxes, routers and even Internet-connected cameras used by stores and businesses for surveillance. These devices are in turn used to create a robot network (or botnet) that sends the millions of messages that knock out the victims' computer systems. The DDoS attack was sophisticated and highly distributed, involving "10s of millions of IP addresses" of IoT devices in the Mirai botnet, protected by little more than factory-default usernames and passwords.
  3. « Cybersecurity Framework, Big Data and the associated analytics tools coupled with the emergence of cloud, mobile, and social computing offer opportunities to process and analyze structured and unstructured cybersecurity-relevant data » (NIST, National Institute of Standards and Technology, 2014). « Security analytics market is projected to hit $7.1 billion by 2020 » (MarketsandMarkets, 2015).
  4. Outline: 1. Introduction; 2. Cyber-security Intelligence with Big Data Analytics: Value, Machine learning Algorithms & Defence strategy, Architecture & Processes, Data processing paradigms, Ecosystem overview; 3. Case Studies; 4. References
  5. Introduction
  6. Big Data: the Digital Universe drives the growth and integration of the digital economy. 90% of the world's data has been produced during the last 5 years; +1.2 T (10¹²) searches on Google; +4 B (10⁹) hours of video on YouTube; +1 B active users on Facebook, spending 700 M minutes per month; +500 M users posting +55 M tweets every day; +30 B RFID tags in 2013 (1.3 B in 2005); +6 B mobile phones; +4.6 B camera phones; +420 M wearable, wireless health monitors; +200 M smart meters in 2014 (76 M in 2009); +100 M GPS-enabled devices (sources: intel.com, IBM, Hongkiat.com).
  7. "Dark Web" and the underground cybersecurity economy. The Crime-as-a-Service business model drives a digital underground economy that provides a wide range of commercial services facilitating any type of cybercrime, targeting vulnerabilities of people, processes, and technology. The financial gain from cybercrime stimulates its commercialisation as well as its innovation, scale, sophistication, intelligence, versatility, and availability. Prices on the Dark Web: 1,000 verified e-mail addresses, $10; 1,000 social network accounts, $12; 1 passport scan, $2; 1 cloud account, $8; 1 credit card number, $20.
  8. Cyber attacks in numbers: 3.5 new threats per second (12,600 per hour, 302,400 per day); 16 identity thefts per second (58,000 per hour, 1,350,000 per day); 30% of victims are attacked via social networks; theft of "hard" intellectual property increased by 56% in 2015. Chart: cyber attack risk mitigation priorities (© Teradata & Ponemon Institute, 2013).
  9. Big Data opportunities for security & privacy protection. [Figure: number of occurrences of the terms risk, crime/criminal, fraud/fraudulent, surveillance, account/ability/ant, prevent/tion/ting/tive, anomalies, anonymise/sation, trust/ed/ing/ees, terrorism/ist, and cameras in Big Data opportunity studies.] Analysis of 10 Big Data opportunity studies (282 pages, 115,623 words): 396 occurrences in total, about 1.4 per page. © Karim Baïna 2016
  10. Cyber-security Intelligence, 1st generation: firewalls, Intrusion Detection Systems (IDS), Intrusion Prevention Systems (IPS). Security architects realized the need for layered security (e.g., reactive security and breach response) because a system with 100% protective security is impossible. Source: www.123rf.com
  11. Cyber-security Intelligence, 2nd generation: Security Information and Event Management (SIEM). Managing alerts from different intrusion detection sensors and rules was a big challenge in enterprise settings. SIEM systems aggregate and filter alarms from many sources and present actionable information to security analysts. Corporate cybercrime is increasing: the number of security incidents rose by 38% in 2015 and is still growing, and prevention and detection methods have proved largely ineffective (PricewaterhouseCoopers, 2016). Currently, detecting complex cyber-espionage attacks (e.g. Advanced Persistent Threats, APT) relies heavily on the expertise of human analysts to create custom signatures and perform manual investigation. Source: ManageEngine.com
  12. Cyber-security Intelligence with Big Data Analytics
  13. Cyber-security Intelligence, 3rd generation: Big Data analytics in security (© Dan Tembe, 2016). Big Data analytics tools combined with SIEM technologies have the potential to provide a significant advance in actionable security intelligence by: becoming proactive and not only reactive; consolidating, combining, and analysing logs from multiple data sources automatically rather than in isolation; enhancing IDS & IPS through continual adjustment, effectively learning good and bad behaviours; reducing the time needed to correlate long-term historical data (kept without purging) for forensic purposes, and contextualizing diverse security event information; detecting complex threats (APT, 0-day, DDoS botnets) automatically at an early stage, using more sophisticated pattern analysis and anomaly identification based on feature extraction.
  14. Value of Big Data and data analytics for cyber-security challenges: efficient and personalised recognition of malicious behaviour (patterns) representing cyber-security threats, in order to suggest or recommend actions; identification of actionable security information in large enterprise data sets, and reduction of the false positive rate (Veracity) to manageable levels, since actions are expensive; complex event correlation analysis (e.g. user profile and behaviour similarity, event dependence or causality) to produce coherent pieces of cyber-security knowledge; prevention analysis: deducing that an event will happen (future cyber-security risk probability) and proposing anticipative actions to limit its impact; prediction analysis: exact deduction and explanation of when an internal or external cyber-security issue will happen, and forecasting of its consequences; proving compliance with regulatory requirements. Related application domains: territorial security, prediction of natural catastrophes, citizen security, financial fraud analysis, prevention of epidemic spread.
  15. Big Data architecture and processes (inspired by EMC, except for the RTAP part): the Big Data Zone combines a Big Data Lake (batch processing of data at rest: acquisition, extraction, cleaning/annotation, integration/aggregation, representation, and recording of unstructured, semi-structured, and structured data) with Real-Time Processing (RTAP of data in motion: Big Data management and analytics in real time); the Analytics Sandbox supports modeling and analysis through an inductive/inferential approach on a sample data set, and interpretation of the results; a continuous learning loop (deductive/inductive process) links the Big Data Zone and the Analytics Sandbox; the Business Intelligence Environment provides browsing of structured datamarts, KPI reporting, actioning and alerting, and integration with business processes.
  16. Defense strategy with Big Data analytics algorithms (© CISCO 2015)
  17. Network anomaly detection with Big Data analytics algorithms (© CISCO 2015)
  18. Basic network anomaly examples (© CISCO 2015)
  19. Anomaly Detection (AD). [Figure: anomalies appearing as outliers from a linear regression (supervised algorithm) and from clusters 1 and 2 found by K-Means clustering (unsupervised algorithm).] 1) Generate a model of what is normal: group data using supervised or unsupervised methods, e.g. classification or clustering. 2) Anomaly detection: the problem of finding patterns in data that do not conform to the expected behaviour, i.e. detecting data points that deviate strongly from normal expected observations, and triggering a signal when this happens. Examples: smart, customized and targeted malware; malicious or negligent insiders who abuse their access to put data or IP at risk; compliance breaches that require complex, interrelated rule sets to be detected; etc.
  20. Anomaly Detection (AD) (source: Happiest Minds Technologies, 2013)
  21. Behaviour change/Anomaly Detection (B.A.D.): 1) Generate a model of what is normal: if the score of the currently collected data is not an outlier (within a window of the most recent data), it is added to a reference buffer. 2) Behaviour change detection: keep monitoring changes in patterns between the current data and the reference buffer using a distance metric, detect shifts in the score of the current data, and trigger a signal when one happens. Examples: user/employee behaviour, asset behaviour, interaction behaviour.
  22. Advanced network anomaly example: APT (Advanced Persistent Threat) (source: IBM, 2013)
  23. Advanced network anomaly example: APT (Advanced Persistent Threat). Filter malicious URLs with neural networks or clustering; filter spam e-mail with a trained machine learning model (decision tree, support vector machine); identify infected PDF files containing malicious JavaScript by (1) detecting JavaScript syntactically and (2) categorising the JS code as malicious with clustering based on syntax trees; detect abnormal outbound data transfers over the network (if the data exfiltration targets the enterprise network and not a third party's); correlate all of these detections within a time window.
  24. Managing the variety of cybersecurity data sources with Big Data. [Figure: OLAP; structured, semi-structured, and non-structured data.] Data crosses multiple (internal or external) data sources of multiple formats (or not even formatted), with no schema constraint (ELT or schema on read): network traffic events from firewalls and security devices, software application events (e.g. website traffic, financial transactions, business processes), and people action events.
  25. Managing the volume of cybersecurity data with Big Data (data at rest): spread data across a cluster of computers (partitioning/fragmentation); keep processing physically close to the data (parallel synchronous [micro-]batching for data locality); store large enterprise data sets for a longer period without purging; scale the analysis of big security data (minutes instead of hours).
  26. Managing the velocity of cybersecurity data production with Big Data (data in motion): pattern recognition, correlation, and scoring rules are applied as data (events) arrive at the processing layer, handled even before storage; millions of events are processed per second (real-time analysis processing, RTAP). It is estimated that an enterprise as large as HP (in 2013) generates 1 trillion events per day, or roughly 12 million events per second.
  27. Case Studies
  28. Cybersecurity with Big Data, case study (Zions Bancorporation, RSA Conference 2012). Network security: Zions Bancorporation announced that it uses Hadoop clusters and Big Data to parse more data more quickly than with traditional SIEM tools. In its experience, the quantity of data and the frequency of event analysis are too much for traditional SIEMs to handle alone. To better model the security context of the enterprise, Zions Bancorporation built a security data warehouse with Hive on Hadoop: 120 TB (2 years of storage) of more than 120 types of multi-source data, including transactions, fraud alerts, server logs, firewall logs, and IDS logs. In its traditional systems, searching a month's worth of data could take between 20 minutes and 1 hour; in the new Hadoop system, running queries with Hive returns the same results in about one minute.
  29. Cybersecurity with Big Data, case study (source: http://spot.apache.org). Normal data in an enterprise environment includes billions of events per day (IP traffic information and network traffic). This data, collected by NetFlow and needed to identify cybersecurity issues, must be stored and analyzed. Storage alone is costly, and analyzing the amount of data stored is another challenge entirely. Apache Spot (Incubating) addresses this: it was designed to gather, store, and analyze Big Data, which makes it well suited to this cybersecurity challenge. Apache Spot (Incubating) can integrate many different data sources into a data lake, then add operational context to the data by linking configuration, inventory, service databases, and other data stores. This helps prioritize the actions to take under different attack, malware, APT, and hacking scenarios.
  30. Cybersecurity with Big Data, case study (HP Labs, 2013). Machine learning approach & algorithms: graph inference approach, supervised anomaly detectors. A large-scale graph inference approach was introduced to identify malware-infected hosts in an enterprise network and the malicious domains accessed by the enterprise's hosts. Experiments used a data set of 2 billion HTTP requests collected at a large enterprise, a data set of 1 billion DNS requests collected at an ISP, and a data set of 35 billion network IDS alerts collected from over 900 enterprises worldwide. High true positive rates and low false positive rates can be achieved with only limited labelled data (events labelled as normal or as attacks) used to train the anomaly detectors (supervised algorithm).
  31. Cybersecurity with Big Data, case study (HP Labs, 2013). Machine learning algorithms: decision trees, Support Vector Machines (SVM). Terabytes of DNS events, consisting of billions of DNS requests and responses collected at an ISP, were analyzed. The goal was to use this rich source of DNS information to identify botnets, malicious domains, etc. A varied set of features was computed, including features derived from domain names, timestamps, and DNS response time-to-live (TTL) values. Classification techniques (e.g., decision trees and support vector machines) were then used to identify infected hosts and malicious domains. The analysis has already identified many malicious activities in the ISP data set.
  32. Big Data: an ecosystem of new concepts and innovative technologies
  33. References: Sandip K. Pal and Manish Anand, User Behavior based Anomaly Detection for Cyber Network Security, Happiest Minds Technologies, 2013; Big Data Working Group, Big Data Analytics for Security Intelligence, Cloud Security Alliance, 2013; Big Data Analytics in Cyber Defense, sponsored by Teradata, conducted by Ponemon Institute LLC, February 2013; The Internet Organised Crime Threat Assessment (iOCTA), Europol, 2014; Detecting Hacks: Anomaly Detection on Networking Data, CISCO, Hadoop Summit, 2015; PwC, Turnaround and transformation in cybersecurity: Key findings from The Global State of Information Security Survey, 2016; Sri Krishnamurthy, Anomaly Detection Techniques and Best Practices, 4th Annual Global Big Data Conference, Santa Clara, California, August-September 2016; Salah Baïna, Nouvelles Technologies & Nouvelles Menaces (New Technologies & New Threats), Securisk Africa Forum, February 2016; Karim Baïna, Les Big Data : Paradigm Shift et catalyseur de création de la Valeur (Big Data: Paradigm Shift and Catalyst for Value Creation), ISIMA, Université Blaise Pascal, July 2016.
  34. Workshop 3: Fight against Cybercrime and Crime by Big Data Analysis, 28 October 2016. Karim BAÏNA, karim.baina@um5.ac.ma, co-head of the University Diploma « Big Data Scientist », Head of the Software Engineering Department, Head of the Cooperation Office, ENSIAS, Mohammed V University of Rabat, Morocco
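
Slide 11 says SIEM systems aggregate and filter alarms from many sources before presenting actionable information to analysts. The Python sketch below shows one simple way to do that. It is an illustration only, not how any particular SIEM product works. The `Alert` fields, the 1 to 5 severity scale, the grouping key (source IP plus signature) and both thresholds are assumptions. The sensor names and IP addresses are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Alert:
    sensor: str     # hypothetical sensor name, e.g. "ids-1" or "firewall"
    src_ip: str
    signature: str  # rule or signature that fired
    severity: int   # assumed scale: 1 (low) to 5 (critical)

def aggregate_alerts(alerts, min_sensors=2, min_severity=3):
    """Group raw alarms by (source IP, signature), keep groups that are
    corroborated by several sensors or severe enough, and rank them."""
    groups = defaultdict(list)
    for alert in alerts:
        groups[(alert.src_ip, alert.signature)].append(alert)
    incidents = []
    for (src_ip, signature), items in groups.items():
        sensors = sorted({a.sensor for a in items})
        max_severity = max(a.severity for a in items)
        # Filter: drop low-severity noise reported by a single sensor
        if len(sensors) >= min_sensors or max_severity >= min_severity:
            incidents.append({"src_ip": src_ip, "signature": signature,
                              "count": len(items), "sensors": sensors,
                              "max_severity": max_severity})
    # Most severe first, then most corroborated, then most frequent
    incidents.sort(key=lambda i: (i["max_severity"], len(i["sensors"]), i["count"]),
                   reverse=True)
    return incidents

alerts = [
    Alert("ids-1", "10.0.0.5", "port-scan", 2),
    Alert("firewall", "10.0.0.5", "port-scan", 2),
    Alert("ids-1", "10.0.0.9", "ssh-bruteforce", 4),
    Alert("ids-2", "10.0.0.7", "odd-user-agent", 1),
]
incidents = aggregate_alerts(alerts)
```

Four raw alarms become two incidents. The port scan is kept because two sensors report it, and the brute-force attempt is kept because of its severity. The isolated low-severity alarm is dropped.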
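
Slide 19 describes anomaly detection in two steps. First, build a model of what is normal. Second, flag data points that deviate strongly from it. A minimal sketch of the unsupervised (K-Means) variant follows. Several choices are assumptions: the deterministic farthest-first initialisation, the "mean + 3 standard deviations" distance threshold, and the toy 2-D features. The supervised linear-regression variant on the slide is not covered.

```python
import math
import statistics

def kmeans(points, k, iters=100):
    """Lloyd's K-Means on tuples of floats; returns the final centroids."""
    # Farthest-first initialisation: deterministic and spreads the seeds apart
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # New centroid = mean of its members (an empty cluster keeps its old one)
        new = [tuple(sum(axis) / len(members) for axis in zip(*members)) if members
               else centroids[i] for i, members in enumerate(clusters)]
        if new == centroids:
            break
        centroids = new
    return centroids

def fit_normal_model(train, k):
    """Step 1: learn what 'normal' looks like and derive a distance threshold."""
    centroids = kmeans(train, k)
    dists = [min(math.dist(p, c) for c in centroids) for p in train]
    threshold = statistics.fmean(dists) + 3 * statistics.pstdev(dists)  # assumed rule
    return centroids, threshold

def is_anomaly(point, centroids, threshold):
    """Step 2: a point far from every cluster of normal behaviour is anomalous."""
    return min(math.dist(point, c) for c in centroids) > threshold

# Hypothetical 2-D features per host, e.g. (connections/min, MB sent/min)
normal = [(1, 1), (1.2, 0.8), (0.9, 1.1), (1.1, 1.0),
          (8, 8), (8.2, 7.9), (7.8, 8.1), (8.1, 8.2)]
centroids, threshold = fit_normal_model(normal, k=2)
```

For example, `is_anomaly((5, 5), centroids, threshold)` is `True` because the point sits between the two clusters of normal behaviour. `(1.1, 1.1)` stays within the first cluster and is not flagged.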
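
Slide 21's B.A.D. procedure keeps a reference buffer of recent non-outlier scores. It then measures a distance between the current data and that buffer. The sketch below is one possible reading of it, not the method from the cited Happiest Minds paper. It uses a z-score to decide whether a new score is an outlier, and the distance between the mean of a short current window and the reference mean (in reference standard deviations) to detect a shift. Buffer sizes, thresholds and the example scores are assumptions.

```python
from collections import deque
import statistics

class BehaviourChangeDetector:
    """Sketch of B.A.D.: a reference buffer of recent non-outlier scores
    (step 1) and a distance test between the current window and it (step 2)."""

    def __init__(self, ref_size=50, window=5, warmup=10, outlier_z=3.0, shift_z=5.0):
        self.reference = deque(maxlen=ref_size)  # model of "normal" scores
        self.current = deque(maxlen=window)      # most recent scores
        self.warmup = warmup
        self.outlier_z = outlier_z
        self.shift_z = shift_z

    def _spread(self):
        return statistics.pstdev(self.reference) or 1e-9  # avoid division by zero

    def update(self, score):
        """Feed one behaviour score; return True when a change signal fires."""
        if len(self.reference) < self.warmup:
            self.reference.append(score)  # warm-up: accept everything
        else:
            z = abs(score - statistics.fmean(self.reference)) / self._spread()
            if z <= self.outlier_z:
                self.reference.append(score)  # step 1: not an outlier, extend the model
        self.current.append(score)
        if len(self.current) < self.current.maxlen or len(self.reference) < self.warmup:
            return False
        # Step 2: distance between the current window and the reference buffer
        distance = abs(statistics.fmean(self.current) - statistics.fmean(self.reference))
        return distance / self._spread() > self.shift_z

detector = BehaviourChangeDetector()
# Hypothetical daily score for one user: stable, then a sustained jump
scores = [10, 11, 12] * 10 + [30] * 10
signals = [detector.update(s) for s in scores]
first_alert = signals.index(True)
```

With these settings, the first jump to 30 (index 30) is treated as a single outlier and does not fire. The signal fires on the second consecutive high score (index 31), once the current window has clearly shifted away from the reference buffer.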
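
Slide 23 ends by correlating all of these detections within a time window. The hedged sketch below covers only that last step. It assumes each detector emits `(timestamp, host, detector)` tuples. The detector names (`malicious_url`, `spam_email`, `malicious_pdf_js`, `outbound_exfiltration`) are hypothetical. A host is flagged when at least three different detectors fire on it within one hour, and both numbers are illustrative.

```python
from collections import defaultdict

def correlate(detections, window_s=3600, min_distinct=3):
    """Flag hosts on which >= min_distinct different detectors fired
    within a sliding window of window_s seconds."""
    by_host = defaultdict(list)
    for ts, host, detector in detections:
        by_host[host].append((ts, detector))
    flagged = {}
    for host, events in by_host.items():
        events.sort()  # chronological order
        start = 0
        for end in range(len(events)):
            # Move the window start forward until the window spans <= window_s
            while events[end][0] - events[start][0] > window_s:
                start += 1
            kinds = {d for _, d in events[start:end + 1]}
            if len(kinds) >= min_distinct:
                flagged[host] = sorted(kinds)
                break
    return flagged

detections = [
    (0, "pc-12", "spam_email"),
    (600, "pc-12", "malicious_pdf_js"),
    (1800, "pc-12", "outbound_exfiltration"),
    (0, "pc-40", "spam_email"),
    (7200, "pc-40", "malicious_url"),
    (14400, "pc-40", "outbound_exfiltration"),
]
suspects = correlate(detections)
```

Only `pc-12` is flagged. Its spam e-mail, malicious PDF and exfiltration detections fall within 30 minutes, which matches the APT chain on the slide. The same three kinds of events on `pc-40` are spread over four hours and never share a window.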
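
Slide 24 mentions schema on read (ELT). Raw events from heterogeneous sources are stored as-is, and a common structure is applied only when the data is read. The sketch below illustrates the idea on three hypothetical raw formats: a firewall CSV line, an application JSON event and an IDS key=value line. The format detection rules and the common field names (`ts`, `source`, `actor`, `detail`) are assumptions.

```python
import json
import re

# Raw events stored as-is: no schema is enforced at load time (ELT).
RAW = [
    "2016-10-21T07:10:02Z,DENY,203.0.113.7,10.0.0.5,23",  # firewall CSV
    '{"ts": "2016-10-21T07:10:05Z", "user": "alice", "action": "wire_transfer", "amount": 9500}',
    "ts=2016-10-21T07:11:00Z sig=telnet-bruteforce src=203.0.113.7",  # IDS key=value
]

def read_event(line):
    """Apply a schema only when reading: map each raw format onto common fields."""
    line = line.strip()
    if line.startswith("{"):  # application event in JSON
        doc = json.loads(line)
        return {"ts": doc["ts"], "source": "application",
                "actor": doc.get("user"), "detail": doc.get("action")}
    if "=" in line:  # IDS alert as key=value pairs
        kv = dict(re.findall(r"(\w+)=(\S+)", line))
        return {"ts": kv.get("ts"), "source": "ids",
                "actor": kv.get("src"), "detail": kv.get("sig")}
    # Otherwise assume the firewall CSV layout: ts,verdict,src,dst,port
    ts, verdict, src, dst, port = line.split(",")
    return {"ts": ts, "source": "firewall", "actor": src,
            "detail": f"{verdict} {dst}:{port}"}

events = [read_event(r) for r in RAW]
```

Once read through the same schema, the firewall and IDS records for `203.0.113.7` can be joined on `actor`, even though they were never normalised when stored.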
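
Slide 25 lists two principles: spread data across a cluster, and keep processing close to the data. The single-process sketch below imitates that map/reduce pattern. Records are hash-partitioned by source IP. Each "node" counts DENY events on its own shard, and only the small per-node summaries are merged. Threads stand in for cluster nodes. The record layout and the counting task are hypothetical. A real deployment would rely on a framework such as Hadoop rather than this code.

```python
import zlib
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def partition(records, n_nodes):
    """Hash-partition records across nodes by source IP.
    crc32 is used instead of hash() because str hashes change between runs."""
    shards = [[] for _ in range(n_nodes)]
    for rec in records:
        shards[zlib.crc32(rec["src_ip"].encode()) % n_nodes].append(rec)
    return shards

def local_deny_count(shard):
    """'Map' step: runs where the shard is stored, so raw logs never move."""
    return Counter(r["src_ip"] for r in shard if r["action"] == "DENY")

def merge(partials):
    """'Reduce' step: only the small per-node summaries are combined."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

# Hypothetical firewall records
records = ([{"src_ip": "203.0.113.7", "action": "DENY"}] * 3
           + [{"src_ip": "198.51.100.2", "action": "ALLOW"},
              {"src_ip": "198.51.100.2", "action": "DENY"}])
shards = partition(records, n_nodes=4)
# Threads stand in for cluster nodes in this single-process sketch
with ThreadPoolExecutor(max_workers=4) as pool:
    deny_counts = merge(pool.map(local_deny_count, shards))
```

Partitioning by source IP places all records of one IP on the same shard. As a result, each per-node count is already complete for the IPs it holds, and the merge step only concatenates results.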
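
Slide 26 converts HP's estimated 1 trillion events per day into about 12 million events per second, and says events are handled on arrival, before storage. The sketch below checks that arithmetic: 10¹² / 86,400 ≈ 11.6 million per second. It also shows a minimal sliding-window counter of the kind a scoring rule on data in motion might use. The window length and the `alert_above` threshold are hypothetical parameters.

```python
from collections import deque

EVENTS_PER_DAY = 10**12          # HP estimate quoted on slide 26
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400
events_per_second = EVENTS_PER_DAY / SECONDS_PER_DAY  # about 11.6 million

class SlidingWindowRate:
    """Count events seen in the last `window_s` seconds, updated on arrival
    (before storage), so that a scoring rule can fire on bursts."""

    def __init__(self, window_s=1.0, alert_above=1000):
        self.window_s = window_s
        self.alert_above = alert_above  # hypothetical rule threshold
        self.times = deque()

    def add(self, ts):
        """Record one event at time ts (seconds, non-decreasing) and
        return the number of events in the window (ts - window_s, ts]."""
        self.times.append(ts)
        while ts - self.times[0] >= self.window_s:
            self.times.popleft()  # evict events that left the window
        return len(self.times)

    def burst(self):
        """Scoring rule: is the current rate above the threshold?"""
        return len(self.times) > self.alert_above

rate = SlidingWindowRate(window_s=1.0, alert_above=3)
counts = [rate.add(t) for t in [0.0, 0.2, 0.5, 0.9, 1.0, 1.6]]
```

Here `counts` is `[1, 2, 3, 4, 4, 3]`. At t = 1.0 the event from t = 0.0 leaves the one-second window, and at t = 1.6 only the events from 0.9, 1.0 and 1.6 remain. Each event costs amortised O(1) work, which is what makes per-event processing at millions of events per second plausible.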
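
Slide 30 cites a large-scale graph inference approach that links infected hosts to the malicious domains they access. Its exact algorithm is not given on the slide. The sketch below therefore swaps in plain iterative label propagation on a bipartite host-domain graph, to convey the "guilt by association" idea. Each unlabelled node repeatedly takes the mean score of its neighbours, while a few labelled seed domains stay fixed. The graph, the seed labels and all names are hypothetical.

```python
from collections import defaultdict

def propagate(edges, seeds, iters=100):
    """edges: (host, domain) pairs seen in HTTP/DNS logs.
    seeds: {node: 1.0 (known malicious) or 0.0 (known benign)}.
    Each unlabelled node repeatedly takes the mean score of its neighbours;
    labelled nodes stay fixed. Host and domain names must not collide."""
    neighbours = defaultdict(set)
    for host, domain in edges:
        neighbours[host].add(domain)
        neighbours[domain].add(host)
    scores = {node: seeds.get(node, 0.0) for node in neighbours}
    for _ in range(iters):
        # The right-hand side reads the previous iteration's scores
        scores = {node: seeds[node] if node in seeds
                  else sum(scores[m] for m in nbrs) / len(nbrs)
                  for node, nbrs in neighbours.items()}
    return scores

edges = [
    ("pc-1", "evil.example"), ("pc-1", "unknown.example"),
    ("pc-2", "news.example"), ("pc-2", "unknown.example"),
    ("pc-3", "evil.example"), ("pc-3", "unknown.example"),
    ("pc-4", "news.example"), ("pc-4", "shop.example"),
]
seeds = {"evil.example": 1.0, "news.example": 0.0}
scores = propagate(edges, seeds)
ranking = sorted((n for n in scores if n.endswith(".example") and n not in seeds),
                 key=scores.get, reverse=True)
```

`unknown.example` converges to 2/3 because two of the three hosts that visit it also visit the known-malicious domain, so it ranks above `shop.example` (score 0). `pc-1` and `pc-3` end at 5/6. This is why a handful of labelled nodes can be enough to rank many unlabelled ones.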
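
Slide 31 describes computing features from domain names and DNS TTL values, then classifying with decision trees or SVMs. The sketch below computes a few such features: label length, digit ratio, Shannon entropy and TTL. It then trains the simplest possible decision tree, a one-level "stump" found by exhaustive search, in place of the full decision trees and SVMs used by HP Labs. The feature set and the tiny labelled sample are illustrative assumptions, far too small for real use.

```python
import math
from collections import Counter

def shannon_entropy(text):
    """Shannon entropy in bits per character."""
    counts = Counter(text)
    n = len(text)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def dns_features(domain, ttl):
    """Hypothetical features from a domain name and the DNS response TTL."""
    label = domain.split(".")[0]  # left-most label
    return {
        "length": len(label),
        "digit_ratio": sum(ch.isdigit() for ch in label) / len(label),
        "entropy": shannon_entropy(label),
        "ttl": ttl,
    }

def train_stump(rows, labels):
    """Depth-1 decision tree: exhaustively choose the feature, threshold and
    direction that misclassify the fewest training rows."""
    best = None
    for feature in rows[0]:
        for threshold in sorted({r[feature] for r in rows}):
            for above_is_malicious in (True, False):
                errors = sum(((r[feature] > threshold) == above_is_malicious) != y
                             for r, y in zip(rows, labels))
                if best is None or errors < best[0]:
                    best = (errors, feature, threshold, above_is_malicious)
    return best[1:]

def predict(stump, row):
    feature, threshold, above_is_malicious = stump
    return (row[feature] > threshold) == above_is_malicious

# Hypothetical labelled (domain, TTL, is_malicious) samples
training = [("google.com", 300, False), ("wikipedia.org", 3600, False),
            ("stackoverflow.com", 300, False), ("amazon.com", 60, False),
            ("xk2j9qz7vw1b.net", 30, True), ("q8z3m1lkv0rt.biz", 20, True),
            ("p0o9i8u7y6t5.info", 10, True)]
rows = [dns_features(d, ttl) for d, ttl, _ in training]
stump = train_stump(rows, [y for _, _, y in training])
```

On this toy sample, label length alone cannot separate the classes because `stackoverflow` is longer than the random-looking names. The stump therefore settles on `digit_ratio > 0`. A real model would combine many features, as the slide's decision trees and SVMs do.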
