Machine learning for cyber security public version 11 oct 11



QUAD CHART SUMMARY

MACHINES CAN LEARN LIKE HUMANS

[Figure: ai-one works like an “empty brain” – learning meaning by detecting patterns and associations.]

New technology enables machines to learn like humans by understanding the inherent structure of data.

Unlike other forms of artificial intelligence, ai-one’s technology detects every relationship between every byte without any human intervention at the moment of data ingestion.

The biologically inspired system is autonomic – spawning computational and data cells within a neural network as it responds to external sensors.

OVERVIEW

ai-one’s software development kit enables programmers to build machine learning into applications. This tool generates an associative network (called a lightweight ontology) that reveals every relationship between each byte in the system. Like a human brain, the technology learns the contextual meaning of data by detecting patterns and relationships – including subtle signals within complex data.

The technology has broad applicability to solve problems that traditionally relied upon human cognition to detect; such as finding high-order term co-occurrences, isolating anomalous patterns and identifying latent relationships.

A modified version of ai-one’s core technology has the potential to transform cyber warfare by enabling highly adaptive attacks and defenses. Moreover, it has the potential to provide situational awareness of every packet’s content, destination and intended purpose in near real-time across exabyte-scale networks.

POTENTIAL APPLICATIONS

CYBER WARFARE
• Defensive: Recognition of threat patterns
• Offensive: Recognition of vulnerabilities

COMPLIANCE
• Risk assessment & mitigation
• Behavior monitoring & management

INSIDER THREAT MITIGATION
• Conspiracy detection
• Anomalous usage detection

RED TEAM ATTACK SIMULATIONS
• Identification of API component weakness
• Intelligent malware
• Incremental insertion attacks

BLUE TEAM DEFENSE SIMULATIONS
• Deep packet pattern recognition
• Softkill counter measures

BENEFITS

AUTONOMIC
• Learns without any human intervention
• Finds the unexpected

OBJECTIVE (UNBIASED)
• Detects intrinsic & hidden patterns
• No cognitive bias from humans

FAST DEPLOYMENT
• Works with existing technologies
• SDK enables plug-n-play architecture

SCALABLE/FLEXIBLE
• Many deployment options
• Product roadmap to 2EB/instance capacity

PROVEN
• In use by Swiss BKA, SwissPort, others
• COTS version available for evaluation now

© 2011, ai-one inc.
CONTENTS

Foreword
  About ai-one inc.
Abstract
The Current State of Cyber Security is Fundamentally Flawed
  Sources & Types of Cyber Attacks
  Exploiting API Weaknesses (Application Hijacking)
  Machine Learning Measures and Counter-Measures to API Exploits
  Exploiting Impersonations
  Machine Learning Measures and Counter-Measures to Impersonation
Threat Evolution: Exploiting Complexity
Lightweight Ontologies (LWO): A New Computational Approach
  Summary of the Benefits of ai-one’s Technology
ai-one Technology Roadmap
  Current Commercial-Off-The-Shelf
  64-Bit Multi-thread COTS
  64-Bit Chipsets
Next Steps – Proofs of Concept
  Immediate COTS-Based Approach
  Intermediate COTS Approach
  Matrix Chipset Approach
Appendix - A Worst Case Scenario: MHOTCO Attacks
  Unnoticeable Attrition
  The Game Changer: Machine Learning
FOREWORD

How will artificial intelligence impact global cyber security? Or, put another way: how can cyber assets be attacked and defended with a new generation of machine learning technologies?

This paper provides actionable technical insights for business, government and military executives seeking technologies that will provide a competitive advantage in cyber security. We believe the requirements for both military and civilian cyber defenses are similar enough to use published (public) military specifications as a common denominator for protecting cyber assets.

Based on more than 50 people-years of research and development, we believe that machine learning is transformational to the “cyber battlespace” – where computers and/or networks are intentionally disrupted to cause harm or further criminal, political, ideological, social, or similar objectives.

Our goal is to inspire innovation: ai-one does not provide a solution or services. We only provide core machine learning technologies. We believe that a complete artificial intelligence solution to combat cyber security threats requires combining multiple tiers of technology – possibly including natural language processing (NLP), machine learning, signal processing, Bayesian decisioning tools, packet profile ontologies, etc.

Cyber warfare spans both military and civilian concerns. The US Department of Defense has defined cyber warfare as the “Fifth Battlespace” (after the land, sea, air and space domains). As a result, all branches of the US military now have cyber-specific commands.

Similarly, the civilian world is justifiably obsessed with the protection of cyber assets.
Groups such as Anonymous and WikiLeaks have wreaked havoc on financial institutions, markets and governments by disrupting mission critical networks and disseminating proprietary information. Cyber security is essential for civil freedoms, economic opportunities and national security.

ai-one is one of a handful of firms that provide off-the-shelf machine learning and artificial intelligence application program interfaces (API). Our breakthrough is creating a system that enables any programmer to build machine learning into almost any program. The core value is simple: We detect patterns.

If you know the patterns… you know the relationships between data elements.
If you know the relationships… then you know the context of any element.
If you know context… you understand meaning.
ai-one’s APIs enable machines to learn. Any data. Any format. Faster and more accurately than a human. The implications of this technology span all of computing.

First, let’s define the terms artificial intelligence and machine learning as they are used within this paper – as there are many interpretations of both.

• Artificial intelligence (AI) is the simulation of human intelligence in machines. Its critical feature is the ability to make decisions. Thus, there is a vast range of capabilities within artificial intelligence. A simple manifestation would be a search engine – such as Google. More sophisticated AI systems would include agents that make autonomous decisions – such as Apple’s Siri.

• Machine learning (ML) is a branch of AI that is specifically concerned with enabling machines to understand information, intent and context. Its critical feature is to derive the meaning of data by evaluating data from sensors and/or data storage devices. Examples include latent Dirichlet allocation (LDA) and ai-one’s adaptive holosemantic dataspace (HSDS). Both are self-organizing maps (SOMs) that detect patterns. However, there are significant differences: LDA’s use of Bayesian statistics makes it computationally less efficient than HSDS, which is a new form of neural network that is transparent, autonomous and at least as accurate as Bayes.

ABOUT AI-ONE

ai-one inc. is a Delaware C-corporation with headquarters in La Jolla, California and offices in Zurich, Switzerland (ai-one AG) and Berlin, Germany (ai-one GmbH). The company was originally named Semantic Systems when it was founded in Zurich, Switzerland in 2003 by Walter Diggelmann, Manfred Hoffleisch and Thomas Diggelmann. The company commercializes the mathematical discoveries of Manfred Hoffleisch and the invention of the Hoffleisch Neuronal Network (HNN).
This technology has evolved dramatically over the past eight years to a point where it is now commercially available as a software development kit (SDK) and application programming interface (API).

Our mission is to embed “Biologically Inspired Intelligence” in every computing device and application, empowering developers to help people use the global information explosion to improve the quality of human life. More information on ai-one can be found at
ABSTRACT

Machines can learn like humans by understanding the inherent complexity of patterns and associations in data.

The goal of this paper is to inspire new ideas and invite collaboration to innovate new ways to protect large-scale cyber assets. Our central questions are:

1. How will real-time, deep pattern recognition change cyber warfare?
2. How will machine learning of byte-patterns impact the evolution of cyber attacks?
3. How can machine learning systems protect large-scale networks?
4. Can machine learning reduce the human capital and expenditures required to defend large-scale networks?

Cyber defenses of the US military, government and critical civilian infrastructure are inadequate. The US Department of Homeland Security’s “Cyber Storm III” drill in September 2010 demonstrated that private industry and government resources are unable to protect critical infrastructure from destruction by a well-orchestrated cyber attack.[1] “American cyber defense has fallen far behind the technological capabilities of our adversaries [such]…that the number of cyber attacks is now so large and their sophistication so great that many organizations are having trouble determining which new threats and vulnerabilities pose the greatest risk.”[2]

This paper outlines a framework to improve US cyber defenses in a matter of months at very minimal cost with virtually no technological risk.

“America’s prosperity in the 21st century will depend on cyber security.”
PRESIDENT BARACK OBAMA, MAY 29, 2009

A new form of machine learning discovered by ai-one inc. has the potential to transform cyber warfare. This technology was made commercially available in June 2011. It is in use by Swiss law enforcement and a major European mobile network, and is under evaluation by more than 40 organizations worldwide.[3]

[1] US GAO report, “CYBERSECURITY: Continued Attention Needed to Protect Our Nation’s Critical Infrastructure.” Statement of Gregory C. Wilshusen, Director, Information Security Issues, July 26, 2011.
[2] The Lipman Report, “Threats to the Information Highway: Cyber Warfare, Cyber Terrorism and Cyber Crime.” October 15, 2010, p.1.
[3] Bundeskriminalamt (German equivalent to the US FBI) built a shoe print recognition system that is in use at three major Swiss CSI labs. ai-one is restricted from advertising or using the name of customers as part of licensing and non-disclosure agreements.
Large-scale government and corporate networks are irresistible targets for cyber attacks – from hackers, hostile government agencies and malicious NGOs. These networks are fantastically complex. Each user, application, data source, sensor and control mechanism adds value. Yet each of these components increases the threat surface for cyber attacks. Defending a network by simplifying network complexity is not an option. Taking functionality away from a network would be self-defeating. Moreover, the best networks use a blend of custom, commercial and open-source technologies – each presenting a new opportunity for attack. Thus, cyber security depends on understanding complexity – not simplifying it.

“All war presupposes human weakness and seeks to exploit it.”
CARL VON CLAUSEWITZ IN VOM KRIEGE

Current technologies built on computer programming – such as anti-malware software, firewalls and network appliances (such as IDPS) – are unable to detect the most catastrophic forms of zero-day attacks: incremental delivery of viruses, application hijacking, impersonation, insider conspiracies and cloaked DDoS.[4]

Why? Computer programming is reductionist and prone to cognitive biases. First, programmers and analysts simplify threat profiles by categorizing them so they can be processed mathematically and logically using structured data. For example, they look for viruses and potential variations using fuzzy matching techniques. Simplifying the complexity of suspicious byte-patterns into mathematical models provides ample opportunities for attackers to “hide in the noise.” Second, programmers and analysts are human. They make mistakes. Moreover, they tend to repeat mistakes – so if you find one security hole, you can search for patterns that will lead you to others.

Cyber attackers know these weaknesses and exploit them by hiding within the noise of network complexity and discovering patterns of weaknesses.
Deception and exploitation of predictable defensive patterns are the pillars of successful offensive cyber attacks. Thus, current defenses are destined to fail against the next generation of zero-day cyber attacks (such as incremental viral insertion, MHOTCO and genetic algorithm intrusions).[5]

“All warfare is based on deception.”
THE ART OF WAR BY SUN TZU, 600 BC

[4] Zero-day attacks refer to threats to networks that exploit vulnerabilities that are unknown to administrators and/or cyber security applications and appliances. Zero-day exploits include detection of security holes that are used or shared by attackers before the network detects the vulnerability.
[5] See Appendix for “Worst Case Scenario” that describes possible MHOTCO attack.

New artificial intelligence technology that learns through detecting data heterarchies enables unprecedented levels of cyber security and countermeasures. Knowing the structure of data is the key to understanding its meaning. Machine learning using heterarchical pattern recognition reveals the relationships and associations between all bytes across an entire system (or network) – including overlaps, multiplicities, mixed ascendancies, and divergent-but-coexistent patterns. This approach is similar to how humans learn: We associate stimuli with patterns. For example, a child learns that the sound “dog” refers to the 65-pound, four-legged creature with soft fuzzy white hair. A computer would need to be programmed with a series of commands to know that “dog” refers to a specific creature – and is thus unable to recognize similarities that are not part of the predetermined definition of “dog” – such as a black 5-pound miniature poodle.

[Figure: A representation of heterarchy data structure.]

In June 2011, ai-one released a new machine learning application programming interface (API) that is a radical departure from traditional forms of artificial intelligence. The technology is a neural network that detects heterarchical byte-patterns and creates a dynamic descriptive associative network – called a lightweight ontology. This technology determines the meaning of data by evaluating the relationships between each byte, cluster of bytes, words, documents, and so on. Unlike other forms of artificial intelligence, ai-one’s approach:

• Detects how each byte relates to another – including multiple paths, asynchronous relationships and multiple high-order co-occurrences.
• Automatically generates an associative network (lightweight ontology) revealing all patterns and relationships – detecting anomalies within any portion of the data set.
• Enables machine learning without human intervention.
• Is unbiased. It does not rely upon external ontologies or standards.
• Learns associations upon data ingestion – so it is much faster than techniques that require recalculations, such as COStf-idf.[6, 7]
• Is non-redundant. Each byte pattern is stored only once. This has the effect of compressing data while increasing pattern recognition speed.
• Spawns cells. The underlying cell structure in the neural network is autonomic, generating cells as they are needed when stimulated by sensors (during data input).
• Uses neural cells that can theoretically be shared across other instances of the network.[8]

[6] COStf-idf is an approach to determine the relevance of a term in any given corpus.
[7] For a more extensive comparison see: Reimer, U., Maier, E., Streit, S., Diggelmann, T., Hoffleisch, M., Learning a Lightweight Ontology for Semantic Retrieval in Patient-Centered Information Systems. In International Journal of Knowledge Management, 7(3), 11-26, (July-September 2011)
[8] ai-one internal research project scheduled for mid-2012.

“UNDERSTANDING AI-ONE REQUIRES AN OPEN MIND – ONE THAT IGNORES WHAT HAS BEEN AND EMBRACES WHAT IS POSSIBLE.”
ALLAN TERRY, PHD, FORMER DARPA AI SCIENTIST (PRIME CONTRACTOR)

This technology has the potential to enable cyber security systems to detect, evaluate and counter threats by assessing anomalies within packets, byte-patterns, data traffic and user behaviors across the entire network. When placed into a matrix chipset, this technology can theoretically evaluate every byte across the entire network in real time with exabytes ($10^{18}$ bytes) of capacity using a combination of sliding windows, high performance computing (HPC) and hardware accelerators.

As such, we will present how this technology has the potential to revolutionize cyber security by supporting each of the “Five Pillars” of the framework defined by the US Military for cyberwarfare:[9, 10]

Cyberwarfare Pillar: Cyber domain is similar to other elements in battlespace.
Potential Roles for Machine Learning:
• Transparency to command & control of emerging threats
• Unbiased detection & analysis of threats by detecting anomalies
• Empower human analysts with actionable intelligence

Cyberwarfare Pillar: Proactive defenses
Potential Roles for Machine Learning:
• Constant real-time monitoring of every packet across network
• Near instant recognition of anomalies within packet payload or communication frames

Cyberwarfare Pillar: Protection of critical infrastructure
Potential Roles for Machine Learning:
• Enhance intrusion detection and protection systems (IDPS) with real-time libraries & heuristic approximations of potential threats

Cyberwarfare Pillar: Collective defense
Potential Roles for Machine Learning:
• Early detection & instant response across entire network
• Enable counter-counter-measures, trapping, etc.

Cyberwarfare Pillar: Maintain advantage of technological change
Potential Roles for Machine Learning:
• Early adoption of technology with accelerating rate of returns (1st mover advantage).

[9] For purposes of this paper, the requirements of large multi-national corporations (such as Goldman-Sachs, Google, Exxon, etc.) are substantially similar to those of government agencies (such as DoD, DHS, NSA, etc.).
The next generation of cyber security attacks will be deadly in their subtlety: They can remain undetected until it is too late to prevent catastrophic loss of data, loss of connectivity and/or malicious manipulation of sensitive information. Such attacks can collapse key infrastructure systems such as power grids, communications networks, financial systems and national security assets.

The advantages of machine learning as a first line of defense against zero-day attacks include:

• Force multiplication – enabling fewer human analysts to identify, thwart and counter far greater numbers of attacks than programmatic approaches.
• Evolutionary advantage – enabling cyber defenses to preempt threat adaptations by detecting any change within byte patterns.
• Battlespace awareness – providing network security analysts with situational awareness by identifying and classifying byte pattern mutations.
• Proactive defenses – constant monitoring of the entire threat surface to detect any patterns of vulnerability before they can be exploited by the enemy.
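
To make the idea of "detecting any change within byte patterns" more concrete, the following is a minimal sketch in Python. It is not ai-one's technology or API: it is a generic byte n-gram baseline, and the function names (`ngrams`, `build_baseline`, `novelty_score`) and the choice of 2-byte windows are assumptions made only for illustration. It records which byte sequences appear in known-good traffic and scores a new payload by the fraction of its sequences that were never seen before.

```python
from collections import Counter


def ngrams(data: bytes, n: int = 2):
    """Return every contiguous n-byte window of the payload."""
    return [data[i:i + n] for i in range(len(data) - n + 1)]


def build_baseline(samples, n: int = 2) -> Counter:
    """Count the byte n-grams observed in known-good traffic (hypothetical helper)."""
    seen = Counter()
    for sample in samples:
        seen.update(ngrams(sample, n))
    return seen


def novelty_score(payload: bytes, baseline: Counter, n: int = 2) -> float:
    """Fraction of the payload's n-grams that never appeared in the baseline."""
    grams = ngrams(payload, n)
    if not grams:
        return 0.0
    unseen = sum(1 for g in grams if g not in baseline)
    return unseen / len(grams)


# Illustrative data only: two ordinary requests form the baseline.
baseline = build_baseline([
    b"GET /index.html HTTP/1.1",
    b"GET /about.html HTTP/1.1",
])
print(novelty_score(b"GET /index.html HTTP/1.1", baseline))  # familiar payload
print(novelty_score(b"\x90\x90\x90\x90\xcc", baseline))      # unfamiliar byte pattern
```

A score near zero means the payload is built from byte patterns already observed; a score near one means almost every pattern is new. A real deployment would need far richer context than this sketch provides, which is the gap the paper argues machine learning can fill.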
THE CURRENT STATE OF CYBER SECURITY IS FUNDAMENTALLY FLAWED

Our research indicates that cyber security is far worse than is commonly reported in news outlets. We estimate there is an extreme shortage of human capital with the skills necessary to thwart attacks from rapidly evolving, highly adaptive adversaries.[11, 12]

[Figure: Twitter call for DDoS attack.]

Research for this paper includes publicly available sources of information found on the Internet, interviews with network and software security experts and experts in artificial intelligence. In particular, we speculate on how machine learning might impact the security of large-scale (enterprise) networks from both offensive and defensive perspectives. Specifically, we seek to find ways that machine learning might create and thwart zero-day attacks in networks deploying the most current security technologies, such as neural network enabled intrusion detection and protection systems (IDPS), heuristic and fuzzy matching anti-malware software systems, distributed firewalls, and packet encryption technologies. Furthermore, we evaluate ways that adaptive adversaries might bypass application level security measures such as:

• address space layout randomization (ASLR)
• heap hardening
• data execution prevention (DEP)

We conclude that machine learning provides first-mover advantages to both attackers and defenders. However, we find that the nature of machine learning’s ability to understand complexity provides the greater advantage to network defenses when deployed as part of a multi-layer defensive framework.

As networks grow in value they become exponentially more at risk to cyber attacks. Metcalfe’s Law states that the value of any network is proportional to the square of the number of users.[13] From a practical standpoint, usability is proportional to functionality. That is, the use of a network is proportional to its functionality: The more it can do, the more people will use it.
From a cyber security standpoint, each additional function (or application) running on a network increases the threat surface. Vulnerabilities grow super-linearly because attacks can happen at both the application surface (through an API) and in the connections between applications (through malicious packets).[14]

[11] The shortage in cyber warriors in the US Government is widely reported. For example, see Threats to the Information Highway: Cyber Warfare, Cyber Terrorism and Cyber Crime
[13] $V \propto n^2$, where value (V) is proportional to the square of the number of connected users of a network (n).

Coordinated cyber attacks using more than one method are the most effective means to find zero-day vulnerabilities. The December 2009 attack on Google reportedly relied upon exploiting previously discovered pigeonholes to extract information while human analysts were concurrently distracted by what appeared to be an unrelated attack.

SOURCES & TYPES OF CYBER ATTACKS

Threats:
• Internal (employees, contractors, etc.)
• External (hostile nations, terrorist organizations, criminals, etc.)

Attack Types:
• Malicious code (viruses, Trojans, etc.)
• Incremental payloads (MHOTCO, API hijacking, etc.)
• Brute Force (DDoS, hash collisions, etc.)
• Impersonation (ID hack, etc.)
• Camouflage (cloaking, masking, etc.)
• Conspiracy (distributed leaks, espionage, etc.)

Cyber attacks are usually derivatives of previously successful tactics.[15] Attackers know that software programmers are human – they make mistakes. Moreover, they tend to repeat the same mistakes – making it relatively easy to exploit vulnerabilities once they are detected.[16] Thus, if a hacker finds that a particular part of a network has been breached with a particular byte-pattern (such as a birthday attack) they will often create numerous variations of this pattern to be used in the future to secure an entry into the network (such as a pigeonhole).

Let’s evaluate a few of these types of attacks to compare and contrast computer programming and machine learning approaches to exploit and defend cyber vulnerabilities.

[14] Threat vulnerability is a corollary to Metcalfe’s Law whereby each additional network connection provides an additional point of security exposure. $T \propto n^2 p^2$, where vulnerability (T) is proportional to the square of the number of connected users of a network (n) times the square of the number of APIs (p).
[15] Interview with former anonymous hacker.
[16] Yamaguchi, Fabian. “Automated Extraction of API Usage Patterns from Source Code for Vulnerability Identification” Diploma Thesis TU Berlin, January 2011.
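
The scaling relationships in footnotes 13 and 14 can be illustrated with a short Python sketch. The constants of proportionality are unknown, so they are simply taken as 1 here; the function names and the sample values of n and p are assumptions chosen only to show how quickly exposure grows relative to value.

```python
def network_value(n: int) -> int:
    """Metcalfe's Law (footnote 13): V is proportional to n^2; constant taken as 1."""
    return n ** 2


def threat_exposure(n: int, p: int) -> int:
    """Corollary in footnote 14: T is proportional to n^2 * p^2; constant taken as 1."""
    return (n ** 2) * (p ** 2)


# Illustrative values only: n connected users, p exposed APIs.
for n, p in [(100, 5), (200, 5), (200, 10)]:
    print(f"n={n:4d} p={p:3d} value={network_value(n):7d} exposure={threat_exposure(n, p):9d}")
```

Under these assumptions, doubling the number of users quadruples both value and exposure, while doubling users and APIs together multiplies exposure by sixteen and value by only four.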
EXPLOITING API WEAKNESSES (APPLICATION HIJACKING)

Detecting flaws in application program interfaces (APIs) is a rapidly evolving form of cyber attack where vulnerabilities in the underlying application are exploited. For example, an attacker may use video files to embed code that will cause a video player to erase files. This approach often involves incrementally inserting malicious code, frame-by-frame, to corrupt the file buffer and/or hijack the application. This incremental approach depends upon finding flaws within the code base. This is easily done if the attacker has access to the application outside the network – such as a commercial or open-source copy of the software.

PROGRAMMING MEASURES AND COUNTER-MEASURES TO API EXPLOITS

Traditional approaches to thwart derivative attacks on an API are relatively straightforward but human resource intensive: First, the attack is analyzed to identify markers (such as identifiers within packet payload). Next, the markers are categorized, classified and recorded – usually into a master library (e.g., McAfee Global Threat Intelligence). Finally, anti-malware software (such as McAfee) and IDPS network appliances (such as ForeScout CounterACT) scan packets to detect threats from known sources (malware, IPs, DNS, etc.). Threats that are close derivatives of known threats are easily thwarted using look-up tables, algorithms and heuristics while concurrently detecting and isolating anomalous network behavior for further human review.

PROBLEMS WITH THE COMPUTER PROGRAMMING APPROACH

There are many problems with defenses that know only what they are programmed to know. First, it is almost impossible for a person to predict and program a computer to handle every possible attack. Even if you could, it is practically impossible to scale human resources to meet the demands of addressing each potential threat as network complexities grow exponentially. A single adaptive adversary can keep many security analysts very busy.
Next, cyber threats are far easier to produce than they are to detect – it takes 10 times more effort to isolate and develop counter measures to a virus than it does to create it.[17] Finally, the sheer scale of external intelligence and human resources far outstrips the defensive resources available within the firewall. For example, the US Army’s estimated 21,000 security analysts must counter the collective learning capacity and computational resources of all hackers seeking to disrupt ARCYBER – potentially facing a 100:1 disadvantage worldwide.[18]

“Should we fear hackers? Intent is at the heart of this question.”
KEVIN MITNICK, HACKER, AFTER HIS RELEASE FROM FEDERAL PRISON 2000.

[17] Estimate based on evaluation of virus source codes available at Also see: Stepan, Adrian. “Defeating Polymorphism: Beyond Emulation” Microsoft Corporation, 2005.

Moreover, new approaches to malware involve incremental loading of fragments of malware into a network where they are later assembled and executed by a native application. Often the malicious code fragments are placed over many disparate channels and inputs, thereby disguising themselves as noise or erroneous packets.[19]

MACHINE LEARNING MEASURES AND COUNTER-MEASURES TO API EXPLOITS

Machine learning is an ideal technology for both attacking and defending against API source code vulnerabilities. Knowing that programmers tend to repeat mistakes, an attacker can find similarities across the code base to identify vulnerabilities. A sophisticated attacker might use genetic algorithms and/or statistical techniques (such as naïve Bayes) to find new vulnerabilities that are similar to others that have been found previously. Machine learning provides defenders with an advantage over attackers because it detects these flaws before the attack. This enables the defender to entrap, deceive or use other counter-measures against the attacker.

Machine learning provides a first-mover advantage to both defender and attacker – but the advantage is far stronger for the defender because it can detect any anomaly within the byte-pattern of the network – even after malicious code has bypassed cyber defenses, as in a sleeper attack.[20] Thus, the attacker would need to camouflage byte-patterns in addition to finding and exploiting vulnerabilities – requiring the attacker to add tremendous complexity to his tactics to bypass defenses. Since machine learning becomes more intelligent with use, the defender’s systems will harden with each attack – becoming exponentially more secure over time.

EXPLOITING IMPERSONATIONS

Counterfeiting network authentication to gain illicit access to network assets is one of the oldest tricks in the hacker’s book.
This can be done as easily as leaving a thumb drive infected with malware in a parking lot for a curious insider to insert into a network computer. It can also involve sophisticated social engineering to crack passwords, find use patterns and points of entry for a hacker to impersonate a legitimate user.[21]

[Figure: Identity theft. Photo credit: New York Times.]

[18] Force size estimates from
[19] Examples of this technique were discussed at the BlackHat Security Conference in early August 2011.
[20] For a discussion on sleeper attacks see: Borg, Scott. “Securing the Supply Chain for Electronic Equipment: A Strategy and Framework.” The Internet Security Alliance report to the White House. (available on and also The US Cyber Consequences Unit (

PROGRAMMING MEASURES AND COUNTER-MEASURES TO IMPERSONATIONS

Traditional approaches to impersonation attacks depend upon user authentication and controlling access to network assets using predetermined permissions. Once an attacker is inside the network with a false identity, he can run freely so long as he does not trigger any alarms by violating his permissions. This defense is entirely programmatic as it assumes that if the attacker gets past the firewall he will behave differently than a legitimate user. That assumption offers little protection, since the attacker can use his presence to learn about network assets in order to attack them in different ways. For example, the attacker can identify APIs and network appliances and determine other security protocols to identify further vulnerabilities that might be compromised with an external attack.

PROBLEMS WITH THE COMPUTER PROGRAMMING APPROACH TO PREVENT IMPERSONATIONS

Rules-based permissions are only as good as the rules’ ability to model human behavior. Attackers familiar with these rules and the standard practices of network security easily stay within acceptable boundaries of use.

MACHINE LEARNING MEASURES AND COUNTER-MEASURES TO IMPERSONATION

In the case of insider threats, machine learning provides the defender more advantages than the attacker. Although attackers can use machine learning of byte-patterns to “hack” an identity, they are limited to behaving exactly as that identity would – to the extent that they must know how that person has behaved in the past and how the system will perceive their every movement.
The defender's advantage is that machine learning creates an “entology”, an ontology of the entity, for every authenticated user. This is a heterarchical representation of all past behavior at the byte or packet level. This enables network security to evaluate use patterns to find anomalies that would be difficult (if not impossible) to predict using a set of computer programming commands. Machine learning does not depend on rules; it relies only on observation to find associations and patterns. This can be done at every point within the network: routers, network appliances, APIs, database access points, etc.

[Figure: ai-one's technology works like an “empty brain”, learning from associations.]

21 Interview with former forensic network security agent at major investment bank.

THREAT EVOLUTION: EXPLOITING COMPLEXITY

Detecting cyber threats is much like finding signals within noise. The greater the noise, the more difficult it is to detect faint signals.

Traditional computer programming technologies require data to be structured into a known format before it can be transformed using mathematical and logical operations. Machines are only as smart as what they are told. A programmer must command every step of the process for the machine to complete a task, such as recognizing a pattern.

For more than 50 years, the field of artificial intelligence has evolved techniques to enable machines to learn. A complete discussion of this vast body of work is beyond the scope of this paper. However, it is important to note that developments in neural networks, such as Hopfield and Kohonen networks, enable machines to learn complex patterns without human supervision. In these technologies, it is necessary to provide structure and parameters for what will be learned. For example, traditional neural networks rely upon training sets and/or neighborhood functions. Even then, these approaches run the risks of “over-learning” and learning the wrong things.22, 23 Learning is biased because the networks depend on human assumptions.24

LIGHTWEIGHT ONTOLOGIES (LWO): A NEW COMPUTATIONAL APPROACH

An easy way to understand ai-one's technology is to think of it as an “empty brain” (like an infant) that learns the meaning of data through associations. Similar to a small child that learns language by associating individual sounds with physical objects, ai-one's neural network learns the meaning of bytes by associating them with other bytes. The network builds an associative network that defines the relationship of every byte within the entire corpus of data. This relationship can be symmetric, asymmetric or heterogeneous.
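To make the idea of a byte-level associative network concrete, here is a minimal sketch. It is an assumption-laden toy, not ai-one's algorithm: it simply counts how often one byte follows another within a small window, and the function names and the `window` parameter are invented for illustration. Because the counts are directional, the resulting relationships can be asymmetric, as the paragraph above notes.

```python
from collections import defaultdict


def build_byte_graph(data: bytes, window: int = 2):
    """Count how often byte b follows byte a within `window` positions."""
    graph = defaultdict(lambda: defaultdict(int))
    for i, a in enumerate(data):
        for b in data[i + 1 : i + 1 + window]:
            graph[a][b] += 1
    return graph


def strongest(graph, byte, top=3):
    """Return the bytes most strongly associated with `byte`, heaviest first."""
    neighbours = graph.get(byte, {})
    return sorted(neighbours.items(), key=lambda kv: (-kv[1], kv[0]))[:top]


graph = build_byte_graph(b"abcabcabd")
a, b = ord("a"), ord("b")
print(graph[a][b], graph[b][a])  # forward and reverse weights can differ
print(strongest(graph, a))
```

The graph is updated one byte at a time, so new data can be added without rebuilding what was already counted.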
This corpus can be as small as a single character of a word or as large as the entire Internet. The limitation on how much data the system can process is a function of the hardware and system.

ai-one's technology is radically different from other forms of neural nets or artificial intelligence. First, ai-one's nets do not have any neural structures pre-defined by the user. Rather, they resemble neurological structures where connections between the nodes are autonomic, forming without conscious control. These connections form an n-dimensional graph that describes all relationships between every byte that has been fed into the system. The system learns at the time of data ingestion, automatically adjusting relationships to account for new data.

[Figure: BrainView visualization tool shows the LWO data relationships inside ai-one.]

Second, the system creates a lightweight ontology (LWO) that automatically classifies each byte into a hierarchy by topic, starting with the most general then progressively moving to the most specific. An unlimited number of hierarchies can form in any direction, thereby forming a heterarchy. Hierarchical classifications are arranged by hyponymy.25, 26 ai-one's lightweight ontology differs from a full-fledged ontology because it detects only the inherent semantic meaning of each byte as it relates to another; there is no human bias or over-learning. Rather, the LWO enables the machine to learn high-order relationships between any data elements. For example, it can detect the conceptual meaning of words and isolate when a word is used in an unexpected or unique way.

Another feature of ai-one's technology is that it provides humans with the option to teach the system, thereby giving the machine an intentional point-of-view. Queries can be introduced to the LWO that dynamically adjust the topography of the data to influence the importance of data elements to specific relationships. This enables it to “learn” the optimal path to answer a question. If that question is repeated, the system tightens the associations among the relevant data elements that form the answer. This process can be thought of as similar to the way muscle memory works in humans. Complex patterns, such as a tennis serve, are learned through repetition.

Unlike traditional neural nets, ai-one's technology reveals all relationships that comprise the answer to a query. It is semi-transparent. It is also teachable: commands within the SDK enable humans to instruct the system to make specific associations and ignore others. However, the best practice is to teach the system by directing it to use external resources that are verified as truthful (such as malware libraries) to learn patterns faster.

Some of the most useful instructions focus the system on finding anomalies, so you find answers to the questions you didn't know you needed to ask.

Finally, the system is both language and data agnostic because it learns at the byte level.

22 Reimer, U., op. cit.
24 Ibid.
25 Also known as hypernym-hyponym relationships. A hyponym is a word (or data element) that is included in the meaning of another word (or data element) with broader classifications. For example, ‘scarlet’ is the hyponym to the hypernym ‘red.’
26 Hyponymy is usually associated with computational linguistics and natural language processing. ai-one applies these classification and extraction techniques to include other forms of data. For a discussion on hyponymy see: Navigli, Roberto and Velardi, Paola. “Learning Word-Class Lattices for Definition and Hypernym Extraction.” Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 1318–1327, Uppsala, Sweden, 11-16 July 2010. © 2010 Association for Computational Linguistics.

SUMMARY OF THE BENEFITS OF AI-ONE'S TECHNOLOGY

• Works with existing technologies: The CORE API/Library works with other programming languages such as C, Java, Microsoft, etc.
• Autonomic learning: It learns as it is stimulated by external sensors, without any human intervention or training sets.
• Machine generated lightweight ontologies (LWO): Reveals all relationships with simple commands.
• Dynamic topologies: Finds the best answers by automatically reshaping data surfaces to fit queries.
• Byte-level processing: Data and language agnostic. It works equally well with structured and unstructured data. It can work with or without external references (e.g., human-curated ontologies, libraries, databases, etc.).
• Fast: 10,000x faster than COStf-idf in a benchmark comparison.
• Efficient: No need to re-index the entire corpus as new information is learned or inserted.
• Transparency: It reports on the pathway that is used to determine associations.
• Asymmetric, bidirectional pathways: Enables machine detection of high-order co-occurrences, where concepts (or words or packets or bytes) can be closely associated although they never occur in the same place at the same time. (E.g., the words “rust” and “corrosion” mean the same thing although they never occur together.)
• Low cost & fast deployment: Far less expensive and faster to implement than competing technologies.
• Future flexibility: Extensible architecture ensures solutions built with the SDK will port into future firmware (chipsets). Our intent is to provide customers with an easy plug-and-play upgrade to further improve the speed and performance of solutions built using ai-one software.

AI-ONE TECHNOLOGY ROADMAP

CURRENT COMMERCIAL-OFF-THE-SHELF

ai-one's technology is currently offered as a core API/Library, with a small software development kit (SDK) that enables programmers to build artificial intelligence into any application. The system mimics neurophysiology, generating neural nets with massively connected, asymmetrical graphs that are stimulated by binary spikes from external sensors (e.g., IDPS, firewalls, etc.).

The system grows as it learns through exposure to data (e.g., from sensors such as routers, firewalls, etc.). The current manifestation of the technology is appropriate for a small-scale proof of concept to demonstrate how ai-one's technology can be used to harden network security against MHOTCO and other, unanticipated forms of cyber attacks.

[Figure: Screen shot of ai-one's BrainBoard prototyping tool for machine learning instruction sets.]

64-BIT MULTI-THREAD COTS

ai-one is currently porting the 32-bit, single thread dll into a 64-bit multithread, platform agnostic system. Theoretically, this version will have the capacity of storing and processing up to 2 exabytes of data (2EB or 2 x 10^18 bytes) per instance. Processing speed for this size of data will depend upon a number of factors beyond ai-one's control, such as memory access times, processor speeds, operating system overhead, etc.
64-BIT CHIPSETS

Future plans include porting the core technology onto FPGA and ASIC chipsets, where it will run approximately 10,000x faster than traditional neural nets. ai-one is at an advanced stage of research after spending more than eight years developing the technology in this direction.

[Figure: Architecture for ai-one artificial brain chipset.]

We anticipate that the most likely candidate for an “artificial brain” that can learn the relationships of both structured and unstructured data will be a matrix ASIC chipset where the holosemantic network using ai-one's neural mathematics will operate in unison with traditional chipsets running linear equations (e.g., Intel, IBM, etc.).

This matrix architecture enables unlimited scaling across multiple computational clusters.

[Figure: Concept diagram for heterarchical data processing chip.]

Since 2006, ai-one has been building experimental prototypes using this matrix architecture. Our research indicates these chipsets will be 10^4 to 10^6 times faster than neural networks running on the current generation of Intel i7 900 chips. Commercial production requires additional research and testing.

NEXT STEPS: PROOFS OF CONCEPT

IMMEDIATE COTS-BASED27 APPROACH

The current version of ai-one's technology (32-bit, single thread) is appropriate for a small-scale proof of concept (POC) where a total of 250MB of network traffic would be monitored offline using a moving windows approach. The objective would be to prove ai-one's machine learning approach can detect threats posed by an oppositional (red) attack team. This system could be built within three months at a cost of less than $600,000 using all off-the-shelf hardware and software integrated into a custom built application using ai-one's Topic-Mapper API.

INTERMEDIATE COTS APPROACH

The second evolution would be a POC using a 64-bit multithread instance of ai-one's technology. This is due for commercial release in 2012. Research indicates this configuration can potentially process up to 2EB (exabytes, or 2 x 10^18 bytes) of data per instance and can be deployed as a software-as-a-service (SaaS) over a commercial host such as Amazon Web Services or Google App Engine. However, more research is necessary before determining how this COTS approach would be most effective against cyber threats. For example, it might be most effective to use a sliding window approach combined with clustering multiple instances of the 64-bit dll across an HPC cluster.28 This POC will demonstrate the ability of the ai-one solution to scale to accommodate security for most enterprise networks (likely in excess of 1 petabyte, or 10^15 bytes, of traffic per day). We estimate this will take approximately 1 year to develop at a cost of approximately $3-5 million excluding hardware manufacturing costs.

[Figure: ai-one's first working prototype of an artificial brain, 2006.]

MATRIX CHIPSET APPROACH

The third evolution will involve developing and testing in two stages. The first stage is a POC using the 64-bit multithread deployed as a field programmable gate array (FPGA) that will be configured to run up to 1 exabyte (10^18 bytes) to demonstrate that the ai-one solution can operate at network speed for a 1 petabyte/day traffic load. We estimate this will take approximately 1 year to develop at a cost of not more than $10 million for the first chipset excluding the costs of manufacturing hardware.

27 COTS = commercial-off-the-shelf.
28 High performance computing (HPC) clusters may require changes to bus architectures to accommodate neural cell traffic. More research is necessary in this area.
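The moving (sliding) window approach proposed for the proofs of concept above can be sketched as follows. This is a generic illustration, assuming a captured traffic buffer held in memory; the function name and the `size` and `step` parameters are hypothetical and do not come from ai-one's Topic-Mapper API.

```python
def sliding_windows(data: bytes, size: int, step: int):
    """Yield (offset, chunk) pairs; windows overlap whenever step < size."""
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    if not data:
        return
    start = 0
    while True:
        yield start, data[start : start + size]
        if start + size >= len(data):
            break  # this window reached the end of the capture
        start += step


capture = bytes(range(10))  # stand-in for an offline traffic capture
for offset, chunk in sliding_windows(capture, size=4, step=2):
    print(offset, list(chunk))
```

Overlapping windows let a pattern that straddles a window boundary still appear whole in a neighboring window, at the cost of processing some bytes more than once.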
Once the FPGA is proven, the second stage is to deploy the FPGA solution using application specific chip sets (ASICs) that will operate in clusters where each chip performs at 10,000x current COTS speeds. We estimate the ai-one ASIC solution will operate at very low energy levels (<10% of current Intel i7 chips) and be able to process at least 1PB/second. We estimate this solution will require an additional 1 year of development at a cost of approximately $50 million excluding the costs of manufacturing hardware.

[Figure: Clustering of ai-one matrix chips for HPC.]
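For scale, the traffic figures quoted in the roadmap can be converted to sustained byte rates. The short calculation below assumes decimal units (1 PB = 10^15 bytes) and a perfectly even load over the day; both are simplifying assumptions made for illustration only.

```python
SECONDS_PER_DAY = 24 * 60 * 60
PETABYTE = 10**15  # decimal petabyte, as used in the text

per_day_rate = PETABYTE / SECONDS_PER_DAY  # bytes per second for 1 PB/day
asic_rate = PETABYTE                       # bytes per second for 1 PB/second
poc_capture = 250 * 10**6                  # the 250MB offline POC capture

print(f"1 PB/day = {per_day_rate / 10**9:.2f} GB/s")
print(f"1 PB/s is {asic_rate / per_day_rate:,.0f}x the 1 PB/day load")
print(f"250MB at the 1 PB/day rate takes {poc_capture / per_day_rate * 1000:.1f} ms")
```

A 1 PB/day load works out to roughly 11.57 GB/s sustained, so the 1PB/second ASIC estimate corresponds to 86,400 times that load.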
APPENDIX - A WORST CASE SCENARIO: MULTIPLE HIGH ORDER TERM CO-OCCURRENCE ATTACKS

Our research indicates that current network security technologies are unable to thwart multiple high order term co-occurrence (MHOTCO) attacks. The essence of MHOTCO is to use packets that look and behave differently when viewed as individuals but assimilate into malware once a critical mass touches a vulnerable network asset, appliance or application. Each packet can be thought of as a word or part of a word. MHOTCO attacks use different “words” that mean the same thing, usually at several extended levels.29 The diagram below illustrates how MHOTCO escapes detection from computer programming approaches.

[Figure: Diagram 1: Relating packets using a single high order co-occurrence reference packet.]

MHOTCO attacks introduce malware into a network using multiple packets that are seemingly unrelated because they come from different sources, use different control bits and have payloads that do not have any similarity. Thus, traditional lookup tables and heuristic testing of packets will not detect a threat from any single packet or small group of packets. An effective tactic to attack structured network defenses using MHOTCO is to intentionally stimulate and monitor network defense counter-measures to deduce vulnerabilities in unstructured (human analyst) defenses.

29 Hyponyms can extend to many levels called orders. For example, ‘scarlet’ is a first-order hyponym to ‘red’ and a second-order hyponym to ‘color,’ etc.

UNNOTICEABLE ATTRITION

A possible scenario might start with a birthday attack on a military network whereby hash collisions identify pigeonhole opportunities to bypass an intrusion detection and protection system (IDPS) and firewall security systems. Once the pigeonhole error is identified, the hacker reveals further vulnerabilities in the network appliances' approach to threat identification by intentionally stimulating signature-based, statistical anomaly-based and stateful protocol analysis detection measures. The attacker uses a series of split A/B tests to compare successful and rejected intrusions to refine malware packet wrappers and payloads. These results develop a topology of network defenses, including detailed analysis of vulnerabilities in IDPS, network access and control systems, and malware counter-measures.

Successful MHOTCO attacks are unrecognized by the network defenses for several reasons:
• The defenses fail to recognize that the attack is from a common source.
• The attack packets have no physical or signature similarity.
• The attacks do not spike network activity (unlike DDoS).

The MHOTCO attacker camouflages his identity by using rotating or masking IP addresses, aliases, impersonations, etc. He places malicious packets that do not conform to the known patterns within the libraries of the anti-malware software. These lie dormant and undetected until the attacker decides to use them. The MHOTCO attacker is patient: he might take years to slowly test security measures to ensure that network activities never increase above a threshold.
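Why activity that stays below a threshold goes unnoticed can be shown with a toy volume-based detector. The threshold value and the event counts below are invented for illustration and do not describe any particular IDPS product.

```python
def windows_over_threshold(events_per_window, threshold):
    """Return indexes of windows whose event count exceeds the threshold."""
    return [i for i, n in enumerate(events_per_window) if n > threshold]


THRESHOLD = 100           # hypothetical per-window alert level
burst = [5, 4, 900, 6]    # a noisy attack that spikes one window
low_and_slow = [3] * 300  # the same 900 events spread over 300 windows

print(windows_over_threshold(burst, THRESHOLD))
print(windows_over_threshold(low_and_slow, THRESHOLD))
print(sum(low_and_slow))
```

The burst trips the detector in one window, while the slow campaign delivers the same total volume without ever raising an alert. A defense based on per-window volume alone therefore cannot see it.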
He hides in a sea of cyber noise. The human analyst never knows that network security has been compromised; all he can do is refine the malware detection algorithms using his imperfect knowledge of the situation.

At this point, the attacker has complete control of the cyber battlespace: he knows where the vulnerabilities lie and has implanted malicious code that can be deployed at his command at a time of his choosing. He has used counter-counter measures to intentionally misguide his adversaries into building ever more convoluted defenses that depend on the questionable accuracy of algorithms and statistics at a cost of thousands of human analysts, all while increasing complexity and potential areas to hide.

Meanwhile, the network administrators and security analysts feel confident they are safe. They believe they have successfully thwarted numerous unrelated attacks, and can report that network activities are within acceptable variances with detailed analysis of the threats that have been stopped. They have developed mountains of code to analyze and process threats. And they have thousands of “highly trained” cyber-defense troops to step in if needed.

Cyber warriors have no idea that they have already lost the battle until it is too late to do anything about it.

THE GAME CHANGER: MACHINE LEARNING

Now imagine the same situation where a MHOTCO attacker uses the same tricks, only the network administrators use artificial intelligence so network appliances (machines) can learn the intent, purpose and expected behavior for every packet on the network. The artificial intelligence system learns the relationship of every packet on the network to every other. The system develops an infinitely scalable, n-dimensional graph, a holosemantic dataspace, that enables the machines (firewalls, IDPS, etc.) to understand how all packets are related at the byte level.30 Camouflaging payloads and control wrappers is useless because the machines using ai-one's intelligence understand the inherent latent semantic meaning of each packet by detecting hyponymy relationships autonomously.

Now the MHOTCO attacker unintentionally reveals his intent with every packet he introduces into the system. The network administrators are in control. They can deploy CN measures to manage attacks at their own discretion to support strategic military and political objectives.

[Figure: ai-one's API connects network sensors to the holosemantic dataspace.]

Machine learning transforms network vulnerabilities into strategic military assets.

30 Holosemantic means ‘whole meaning.’ ai-one uses this term to describe that each cell within the ai-one neural network contains enough information to infer the shape of the data space. Another way to understand this is to think of it as an emergent system. For more information on emergence see: Corning, Peter A. “The Re-Emergence of ‘Emergence’: A Venerable Concept in Search of a Theory” in Complexity (2002) 7(6): pages 18-30.
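The notion of a high-order co-occurrence, on which the MHOTCO discussion rests, can be illustrated with a small second-order example. This is a generic sketch using word sets; the function names and sample data are hypothetical, and it is not a description of ai-one's holosemantic network. Terms here could equally stand for packets or byte patterns.

```python
from collections import defaultdict
from itertools import combinations


def first_order(docs):
    """Map each term to the set of terms it directly co-occurs with."""
    neighbours = defaultdict(set)
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            neighbours[a].add(b)
            neighbours[b].add(a)
    return neighbours


def second_order(neighbours, term):
    """Terms that never co-occur with `term` but share a neighbour with it."""
    linked = defaultdict(set)
    for mid in neighbours.get(term, ()):
        for other in neighbours[mid]:
            if other != term and other not in neighbours[term]:
                linked[other].add(mid)
    return dict(linked)


docs = [
    ["rust", "iron", "water"],
    ["corrosion", "iron", "salt"],
    ["paint", "salt"],
]
n = first_order(docs)
print(second_order(n, "rust"))
```

Here “rust” and “corrosion” never appear in the same document, yet they are linked through the shared neighbour “iron”. Repeating the step on the result would give third-order links, and so on.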