Big Data for CyberSecurity



A key concern on today's Internet is the threat of cybercrime. Cybercrimes on the Web use different types of malware and fraud for purposes such as financial theft, espionage, copyright infringement, denial of service and cyber-warfare. The malware spreads over protocols such as HTTP, HTTPS and IRC, through links in email or instant messages, through malicious attachments, and through phishing attacks. This cyber threat landscape, often controlled by organized crime and nation states, has been evolving rapidly, and the threats are becoming more evasive and difficult to detect. Attackers often use multiple infection mechanisms to take control of machines and make them part of botnets, which can then be used to perpetrate other kinds of attacks such as data leakage and denial of service. As threats blend across diverse data channels, their detection requires scalable distributed monitoring and cross-correlation with a substantial amount of contextual information. Conventional methods of protecting against cyber attacks, such as signature-based detection and firewalls, have become less effective.

Many corporations, security companies and governments are therefore beginning to employ increasingly sophisticated means of detecting and protecting against cyber attacks. Recently, data-driven approaches have become popular for detecting new kinds of attacks. Instead of relying on static signature-based detection, these techniques seek to detect anomalies and other patterns in various kinds of data, such as network traffic statistics and server and application logs. For example, a sudden increase in the number of unresolvable DNS requests from a laptop might indicate that it is infected by a bot. These approaches rely on very large volumes of data and on a variety of analytics to process that data. In this talk, I will describe some Big Data based analytics and systems that IBM has built for detecting different kinds of cyber-attacks, particularly new kinds or new sources of cyber-attacks that may not have been seen before. These analytics span both real-time processing on the IBM InfoSphere Streams platform and off-line processing using InfoSphere BigInsights and data mining tools like SPSS.
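
The unresolvable-DNS example can be illustrated with a small sketch. It is not the analytic described in the talk; it only assumes that some upstream step already counts, per host and per time window, the DNS lookups that failed to resolve. The class name `NxdomainSpikeDetector` and the parameters `history`, `threshold` and `min_count` are hypothetical choices made for illustration.

```python
from collections import defaultdict, deque
from statistics import mean, pstdev


class NxdomainSpikeDetector:
    """Flags hosts whose count of unresolvable DNS lookups in the current
    window jumps well above their own recent baseline.

    Hypothetical sketch: names, window handling and thresholds are assumptions.
    """

    def __init__(self, history=12, threshold=3.0, min_count=20):
        self.threshold = threshold    # allowed deviations above the mean
        self.min_count = min_count    # ignore spikes that are tiny in absolute terms
        # Per-host sliding history of the last `history` window counts.
        self.baseline = defaultdict(lambda: deque(maxlen=history))

    def observe_window(self, counts):
        """counts: dict mapping host -> failed lookups in this window.
        Returns the sorted list of hosts that look anomalous."""
        alerts = []
        for host, n in counts.items():
            past = self.baseline[host]
            if len(past) >= 3:  # need a few windows before judging
                mu = mean(past)
                # Floor the deviation at 1.0 so a perfectly flat baseline
                # does not turn every small change into an alert.
                sigma = max(pstdev(past), 1.0)
                if n >= self.min_count and n > mu + self.threshold * sigma:
                    alerts.append(host)
            past.append(n)  # simplification: the spike itself joins the baseline
        return sorted(alerts)


detector = NxdomainSpikeDetector(history=5)
for _ in range(4):
    detector.observe_window({"laptop-1": 2, "laptop-2": 3})
print(detector.observe_window({"laptop-1": 150, "laptop-2": 3}))
```

Comparing each host against its own history, rather than against a global limit, is one simple way to tolerate hosts that are naturally noisy. Hosts that are absent from a window are simply skipped here, which a real deployment would probably want to treat as a zero count.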

Published in: Technology
  • This talk was held at the 7th meeting on May 13 at IBM Zurich by Dr. Anand Ranganathan, Research Staff Member, IBM Research Watson Lab NY.
  • This slide shows a timeline of events during the first half of 2011: a number of different attacks against major organizations, many of which we feel are probably pretty operationally competent. It is not surprising that some of these organizations were breached. We also relate the attack vector as best we understand it, based on what's been publicly disclosed. And we have a conjecture about the impact of each breach from a financial standpoint; that's a rough estimate based on what's been publicly disclosed, so those numbers are certainly not to be bet on. But it's as good as we can do based on what we know.
  • The Open Security Foundation reported a 40% increase in breach events for 2012, covering loss, theft, and exposure of personally identifiable information.
  • There is need to talk: bots receive updates and commands from the C&C node. They utilize a command and control structure, through IRC, HTML, SSL, Twitter, IM or custom-built solutions. Botnet communications are becoming more sophisticated and harder to track: peer-to-peer, distributed vs. hierarchical control structure; fast fluxing, name generation.
  • Key Points. Integrate v3: the point is to have one platform to manage all of the data; there's no point in having separate silos of data, each creating separate silos of insight. From the customer POV (a solution POV), big data has to be bigger than just one technology. Analyze v3: very important point. We see big data as a viable place to analyze and store data. New technology is not just a pre-processor to get data into a structured DW for analysis. This is a significant area of value add by IBM, and the game has changed: unlike DBs/SQL, the market is asking who gets the better answer, and therefore the sophistication and accuracy of the analytics matter. Visualization: need to bring big data to the users; the spreadsheet metaphor is the key to doing so. Development: need sophisticated development tools for the engines, and across them, to enable the market to develop analytic applications. Workload optimization: improvements upon open source for efficient processing and storage. Security and Governance: many are rushing into big data like the wild west, but there is sensitive data that needs to be protected, and retention policies need to be determined. All of the maturity of governance for the structured world can benefit the big data world.
  • What we are monitoring: more than 12,000 systems. We have about 12,000 unique MAC addresses in our DB, and we can only obtain MAC addresses for part of the systems we monitor (mostly systems using DHCP), since we do not yet connect to the infrastructure that assigns fixed IP addresses. We added ARP monitoring to correlate static IP addresses with their MAC addresses, but we see only part of the ARP traffic because the taps are located at the network boundaries. To give you an idea of scale, we track about 200,000-600,000 unique domain names per day, and 20K to 120K unique domain names per hour.
  • Big Data for CyberSecurity

    1. © 2013 IBM Corporation. May 14, 2013. Big Data for CyberSecurity. Anand Ranganathan, Research Staff Member, TJ Watson Research Center
    2. Agenda: Cyber Threats; IBM Big Data Suite; Big Data Analytics for CyberSecurity – Monitor network behaviors to detect known and unknown cyber-threats in enterprises – Detect Denial of Service attacks in large ISPs – Detect data leakage from organizations
    3. Cyber-Threats Are Becoming More Sophisticated
    4. 2011: Year of the Targeted Attack. Source: IBM X-Force® Research 2011 Trend and Risk Report. [Figure: 2011 Sampling of Security Incidents by Attack Type, Time and Impact. Timeline from Jan to Dec; each circle is a breached organization labeled by sector (e.g. online gaming, central government, IT security, banking, defense, consumer electronics, police, entertainment, consulting). Attack types in legend: SQL Injection, URL Tampering, Spear Phishing, 3rd Party Software, DDoS, SecureID, Trojan Software, Unknown. Size of circle estimates relative impact of breach in terms of cost to business.] Conjecture of relative breach impact is based on publicly disclosed information regarding leaked records and financial losses.
    5. 2012: The explosion of breaches continues! Source: IBM X-Force® Research 2012 Trend and Risk Report. [Figure: 2012 Sampling of Security Incidents by Attack Type, Time and Impact.] Conjecture of relative breach impact is based on publicly disclosed information regarding leaked records and financial losses.
    6. Common Cyber Security Risks and Potential Impacts. Common security risks: a Denial of Service attack that prevents or impairs the use of networks, systems, or applications by exhausting resources; Malware infection - a virus, worm, Trojan horse, or other code-based malicious entity that successfully infects a host; a targeted, advanced attack – also known as an advanced persistent threat (APT) - which is designed to be undetectable; loss or theft of technology (laptops, memory sticks, PDAs) which contains sensitive data, and inadvertent disclosure of data; Defacement - a person gains logical or physical access without permission and defaces a Web application. Potential impacts: Loss of Customers; Impact to Brand; Sensitive Data Disclosure; Stolen Intellectual Property; Loss of Data & Productivity; Personal and National Security.
    7. Botnets. Botnet = a network of compromised computers controlled by the botmaster, ranging in size from hundreds to millions of hosts. Purpose: denial of service attacks, spam delivery, stealing credentials and data, compromising control systems, etc. Hosts are infected by downloads from malicious websites, emailed executables, web, memory stick, PDF, … Bots receive updates and commands from the Command and Control node, and communications are becoming more sophisticated.
    8. Botnet Communication. There is need to talk: bots receive updates and commands from the C&C node. They utilize a command and control structure, through IRC, HTML, SSL, Twitter, IM or custom-built solutions. Botnet communications are becoming more sophisticated and harder to track – peer-to-peer, distributed vs. hierarchical control structure – fast fluxing, name generation. [Diagram labels: C&C; P2P]
    9. A Typical Threat Example. [Diagram: numbered steps 1-9 linking Spammer, Mail-Client, Victim, Malicious Web server, Domain Name Server, and Command & Control. Step labels: <click>; Follow link; Malicious Web server sends or reflects exploit code; web-page +; Install Malware; C&C/Updater DN; C&C/Updater IP Address Lookup; Contact Updater By IP Address (C&C); Remotely Control Malware; Execute (Spam..)]
    10. A Typical Threat Example. [Same diagram as slide 9, annotated with monitoring points: a) Monitor DNS; b) Monitor NetFlow; c) Monitor Port & Protocol Usage; d) Monitor Web Traffic]
    11. Typical Solution Architecture. [Diagram labels: inputs DNS, NetFlow, …; hardware X86 Box, X86 Blade, Cell Blade, FPGA Blade; Operating System; Transport; System S Data Fabric; Unsupervised Real-Time Analytics; Supervised Learning; Dashboarding / Visualization; numbered flows 1-4: Real-time Results (Tickets, Monitoring); Collect Results + Evidence; Trends, History; Adapted Analytics Models.] • Cybersecurity Analytics • Real-time processing of massive data streams • Advanced data mining and trend analytics • New and incremental model learning. PureData System for Analytics, BigInsights
    12. IBM Confidential © 2012 IBM Corporation. IBM Big Data Suite. [Diagram labels: Smarter Communications. Analytic Applications: BI / Reporting; Exploration / Visualization; Functional App; Industry App; Predictive Analytics; Content Analytics. IBM Big Data Platform: Systems Management; Application Development; Visualization & Discovery; Accelerators; Hadoop System; Stream Computing; Data Warehouse; Information Integration & Governance]
    13. IBM InfoSphere Streams: A Platform for Real Time Analytics on BIG Data. Millions of events per second; microsecond latency; traditional / non-traditional data sources; real time delivery; powerful analytics. Application areas: algo trading; telco churn prediction; smart grid; cyber security; government / law enforcement; ICU monitoring; environment monitoring. Volume: terabytes per second, petabytes per day. Variety: all kinds of data, all kinds of analytics. Velocity: insights in microseconds. Agility: dynamically responsive, rapid application development.
    14. How Streams Works: continuous ingestion, continuous analysis; achieve scale by partitioning applications into components
    15. How Streams Works: continuous ingestion, continuous analysis; achieve scale by partitioning applications into components and by distributing across stream-connected hardware nodes. Infrastructure provides services for scheduling analytics across h/w nodes, establishing streaming connectivity, … [Operators shown: Transform, Filter, Classify, Correlate, Annotate.] Where appropriate, elements can be “fused” together for lower communication latencies.
    16. Security Appliances (Firewalls, IDS, IPS, SIEMs) vs Big Data. Comparison of the IBM QRadar Security Intelligence Platform with the IBM Big Data Platform (InfoSphere BigInsights, Streams and PureData for Analytics):
        – Security use cases: Turnkey (QRadar) vs. Custom (Big Data Platform)
        – User Interface: All-in-one console vs. Purpose-built applications
        – Data Sources: 450+ preconfigured (and growing) vs. Everything else
        – Data Volume: 100+ Terabyte range vs. Peta-byte range
        – Real-time Analysis: Seconds vs. Milliseconds
        – Analytics: Pre-built, primarily rule-based vs. Custom, learning
        – Required Expertise: Average - Security practitioners vs. Skilled – Data scientists and analysts
    17. Organizations have a growing need to identify and protect against threats by building insights from broader and larger data sets
    18. A Typical Threat Example. [Repeat of the annotated diagram from slide 10, with monitoring points: a) Monitor DNS; b) Monitor NetFlow; c) Monitor Port & Protocol Usage; d) Monitor Web Traffic]
    19. Traditional Security Analytics. [Diagram labels: Monitored Network; The Rest Of The World; DNS; DHCP; Firewall; IDS/IPS (inline); Conventional Setup.] Detect signatures within individual data streams.
    20. Streaming Analytics. [Diagram labels: Monitored Network; The Rest Of The World (Internet); DNS; DHCP; Firewall; IDS/IPS (inline); IDS/IPS Alerts, …; Real-Time Streaming Analytics Setup.] Conventional components detect signatures within individual data streams. Real-Time Cyber Security Analytics detects behaviors by correlating across diverse & massive data streams via Analytics in Motion. Models are learnt offline with Analytics on Data at Rest.
    21. Streaming Analytics for Fast-flux Botnets. [Dataflow diagram labels: DNS Response Records; FastFlux Analytics (several parallel instances); Candidate Names/IPs with Confidence Values; Aggregator; Suspected Fast-flux Domain Names; Suspected Fast-Flux IP addresses; Join with DNS Queries (with internal querying host IP addresses); Join with DHCP Traffic (IP to MAC to System/Owner); Join with Host Logs, IPS Alerts, …, NetFlow; output: Fast-fluxing Bot alerts]
    22. [No text on slide]
    23. Use Case 2 - Detect Distributed Denial of Service Attacks in ISPs. DDoS attacks are often launched by botnets to flood a target server. They often use techniques to amplify the flooding – e.g. DNS amplification attacks. Very hard to detect and prevent in time – need to monitor 100s of Gbps – need to monitor millions of DNS requests per second. Use InfoSphere Streams for running analytics for detecting DDoS attacks – look for anomalies in DNS server requests – scale to Internet-level traffic rates.
    24. Use Case 3 - Detect Data-Leakage from organizations. Determine what information employees (or bots) are sending out of the company – look at all the information flowing out of the company to the outside world – determine if it contains any confidential or sensitive information. Monitor what information employees (or bots) are seeing/accessing – determine if they are accessing sensitive information (even if they may have the rights to access it) – determine if their access patterns are suddenly changing • e.g. an employee who is suddenly accessing much more information than he (or someone else in his role) typically accesses may want to sell this information outside or leave the company.
    25. [No text on slide]
    26. DNS Amplification Attack. Key characteristics: 1) Targeted attack victimizing hosts & servers 2) DNS service provider becomes a participant and unavailable during attack 3) Attack attribution is hard. To delete
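
The DNS amplification attacks named on slides 23 and 26 can be illustrated with a short sketch. It is an assumption-laden illustration, not the InfoSphere Streams analytic from the talk. It assumes a monitoring point that sees DNS traffic to and from the hosts being protected, so that the spoofed queries sent by the attacker are never observed there, and it flags hosts that receive large volumes of DNS responses they did not ask for. The `DnsPacket` record, both function names, and the `min_bytes` threshold are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class DnsPacket(NamedTuple):
    """Hypothetical minimal view of one DNS message seen on the wire."""
    src: str           # sender IP address
    dst: str           # receiver IP address
    txid: int          # DNS transaction ID
    is_response: bool  # False for a query, True for a response
    size: int          # message size in bytes


def unsolicited_response_bytes(packets: Iterable[DnsPacket]) -> Dict[str, int]:
    """Sum, per destination host, the bytes of DNS responses that match
    no previously observed query from that host."""
    pending = set()                # (client, resolver, txid) of open queries
    unsolicited = defaultdict(int)
    for p in packets:
        if not p.is_response:
            pending.add((p.src, p.dst, p.txid))
            continue
        key = (p.dst, p.src, p.txid)
        if key in pending:
            pending.discard(key)   # legitimate answer to an observed query
        else:
            unsolicited[p.dst] += p.size
    return dict(unsolicited)


def likely_victims(packets: Iterable[DnsPacket], min_bytes: int = 1_000_000) -> List[str]:
    """Hosts whose unsolicited DNS response volume reaches min_bytes."""
    totals = unsolicited_response_bytes(packets)
    return sorted(host for host, b in totals.items() if b >= min_bytes)


traffic = [
    DnsPacket("10.0.0.5", "8.8.8.8", 1, False, 60),   # normal query
    DnsPacket("8.8.8.8", "10.0.0.5", 1, True, 120),   # its answer
]
# 400 large responses arriving at a host that never sent a query.
traffic += [DnsPacket("198.51.100.7", "10.0.0.9", i, True, 3000) for i in range(400)]
print(likely_victims(traffic))
```

A production version would need to expire old entries in `pending` and operate on time windows; at the traffic rates quoted on slide 23 (hundreds of Gbps, millions of requests per second) that state would also have to be partitioned across nodes.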