Cognitive Computing - Associative Memories, Saffron Technologies

Saffron's Associative Memory Base (SMB) enables real-time predictive analytics on hybrid data (structured data and content) without rules or modeling (schema-free). SMB generates a semantic graph from the hybrid raw Big Data. The edges of the graph hold counts: how often two attributes have been observed jointly. The counts are the basis for correlations and statistics. The graph is stored as matrices, enabling high-performance analytics.

Saffron Memory Base is used for sense-making and prediction in the DoD, the national security community, global risk, personalized consumer marketing, and financial services. Customers include the DoD, many Fortune 500 companies such as Boeing, GE, and GD, non-profit organizations such as The Bill and Melinda Gates Foundation, and many more.

  • We’ve been promised intelligence. AI since 1957, intelligent agents since Apple Navigator. What do we have? Even in search, the top results are arbitrary, based on popularity or who paid most. Searching, making sense, predicting, and making decisions is hard cognitive work. We ALL need help, whether at work or at home, as professional or consumer. Intelligence needs to deal with the real world in real time. Computers need to be more human.
  • “Synapses of the world wide web”
  • To fix this, let’s go back to visionaries like Vannevar Bush: inventor and science administrator known for his work on analog computers, for initiating the Manhattan Project, for founding Raytheon, and for the memex, an adjustable microfilm viewer with a structure analogous to that of the World Wide Web. Founder of NSF.
  • Semantic Web: The MEANING
  • Three universal computing machines: the von Neumann architecture (CPU and RAM); cellular automata (von Neumann and Stanislaw Ulam); associative memories (synapses as compute and storage unit) -> content-addressable associative memory -> asynchronous, reaching a fixed point: Hopfield nets (homomorphic to the Ising model) -> a node's behavior is deterministic, it moves to a state that minimizes the energy of itself and its neighbors -> Lyapunov, emerging patterns.
  • Let me give a quick example of what we do, given a sentence. The real world is observed as vectors with context, not the abstraction of predicates and yes/no, true/false logic. The more natural consequence is to think of a Vector Symbolic Architecture, implemented by matrices, beyond graphs. This approach also clarifies the definition of “associations” and what we mean by “connect the dots”. Sentence not document, triples not pairs, statistics not just symbolics; fully materialized as a very fast “memory base”. Precise and contextual, a massive correlation engine as well as connection engine, with real-time query for fast exploitation.
  • For every entity and other mark in the data…
  • Sentence by sentence, document by document, record by record, data base by data base, …
  • For every THING in the data, including all context – at massive scale, not just these two sentences. This is the way our brains work. I will meet some of you for the first time and know it is the first time. I know others of you very well, not just in strength, but in knowing who else, where, what, and when we’ve met before. I might remember when we met on some particular topic, twice. Not the probability of meeting, a memory of meeting.
  • All the time Defrag
  • Graphs are difficult to store and navigate. Add the complexity of real-world links. Lincoln Labs -> matrices.

Cognitive Computing - Associative Memories, Saffron Technologies: Presentation Transcript

  • Cognitive Computing. Paul Hofmann, PhD, CTO, Saffron Technology. April 2013. (Data sources shown on the title slide: Google, Twitter, Facebook, databases, social networks, stocks, email, Excel, Word, PDF, RSS.)
  • 2 4/11/2013 © Saffron Technology, Inc. All rights reserved.
  • Cognitive Work Will Be Automated - Democratized. More real: real time, streams. More human: concepts, context, experience.
  • Growing Interest in Brain-Like Computing. From Big Data to signals and prediction via cognitive computing. Our brain contains trillions of powerful associative neurons to represent the context and conditions of every link.
  • An Old Idea, Now Inevitable for Emerging Needs “The human mind … operates by association. Selection by association, rather than indexing, may yet be mechanized.” As We May Think, 1945 Vannevar Bush
  • An Old Idea, Now Inevitable for Emerging Needs. Semantic Web, hypertext: all were human-friendly.
  • An Old Idea, Now Inevitable for Emerging Needs “The human mind … operates by association. Selection by association, rather than indexing, may yet be mechanized.” As We May Think, 1945 Vannevar Bush Connections and counts, synapses and strengths. What else is there?
  • Order Out of Chaos - Asynchronous Computing. Ising model for the order-disorder phase transition, for example ferromagnetism. Hopfield network: emerging pattern. Connections and counts = synapses and strengths.
  • Apex for Big Data - Brain-Like Thinking. Remembering: Connections, Counts, Context. SENSE-MAKING: Who/what is related? How? Where? When? Who/what is similar? How similar/different? DECISION-MAKING: What could happen? Where? When? What has been done before? Did it work? Input is hybrid Big Data: Google, RSS, Twitter, Facebook, databases, social networks, stocks, Excel, email, Word, PDF.
  • Memory Construction and Associative Recall. Attributes: “John Smith” flew to “London” on “14 Jan 2009” aboard “United Airlines” to meet with “Prime Minister” for “2 hours” on a rainy day. Entities (with memory): “John Smith”, “London”, “United Airlines”, “Prime Minister”. [Figure: entity memories, each holding the attribute context of the snippet (flew, meet, aboard, rainy, day, 14 Jan 2009, 2 hours).] Memory associations created by snippet RefID 1234.
  • Memory - Matrix Conditional on One Entity. Snippet of intelligence in RefID 1234: John Smith flew to London on 14 Jan 2009 aboard United Airlines to meet with Prime Minister for 2 hours on a rainy day. (Vector description can also include fact, sentiment, and other marks to learn.)
    • Snippet scope – more precise than document
    • Semantic triples – adds context to pairwise
    • Statistical frequencies – adds relevance to graph; allows for calculating correlation
    • Materialized storage – instant exploitation rather than query-compute over raw data
  • Multiple Entity Matrices From Just One Snippet. Snippet of intelligence in RefID 1234: John Smith flew to London on 14 Jan 2009 aboard United Airlines to meet with Prime Minister for 2 hours on a rainy day. [Figure: four entity matrices (Place: London; Person: John Smith; Person: Prime Minister; Organization: United Airlines). Rows and columns are the snippet's attributes (refid 1234, place London, person John Smith, person Prime Minister, organization United Airlines, time 14-Jan-09, verb flew, verb meet, keyword rainy, keyword day, keyword aboard, duration 2 hours); every cell holds a count of 1. Legend: Red = entity, Green = other attribute, Yellow = new data.]
  • Incremental Update Observing Another Snippet: John Smith also met with House of Commons while in London.
  • The 3 Cs: Connections and Counts in Context. Snippets: John Smith flew to London on 14 Jan 2009 aboard United Airlines to meet with Prime Minister for 2 hours on a rainy day. John Smith also met with House of Commons while in London. [Figure: the entity matrices after both snippets, with a "No change" label next to the Person: Prime Minister and Organization: United Airlines matrices.]
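The incremental update walked through on these slides can be sketched in a few lines of Python. This is a minimal illustration, not Saffron's implementation: the `MemoryBase` class, the set of entity categories, the (category, value) representation of a snippet, and the normalization of "met" to "meet" are all assumptions made for the example. Each entity in a snippet gets its own matrix of pairwise counts over the other attributes of that snippet, so a new snippet only touches the memories of the entities it mentions.

```python
from collections import defaultdict
from itertools import combinations

# Assumption: attributes in these categories are entities with their own memory.
ENTITY_CATEGORIES = {"person", "place", "organization"}


class MemoryBase:
    """Toy associative memory: one matrix of pairwise counts per entity."""

    def __init__(self):
        # memories[entity][(attr_a, attr_b)] -> number of snippets in which
        # entity, attr_a and attr_b were observed together (a triple count).
        self.memories = defaultdict(lambda: defaultdict(int))

    def observe(self, snippet):
        """Incrementally update the memory of every entity in one snippet."""
        attrs = sorted(set(snippet))
        for entity in attrs:
            if entity[0] not in ENTITY_CATEGORIES:
                continue
            context = [a for a in attrs if a != entity]
            for pair in combinations(context, 2):
                self.memories[entity][pair] += 1

    def count(self, entity, attr_a, attr_b):
        """How often attr_a and attr_b co-occurred in the context of entity."""
        pair = tuple(sorted((attr_a, attr_b)))
        return self.memories.get(entity, {}).get(pair, 0)


SNIPPET_1 = [
    ("refid", "1234"), ("person", "John Smith"), ("place", "London"),
    ("time", "14-Jan-09"), ("organization", "United Airlines"),
    ("verb", "flew"), ("verb", "meet"), ("person", "Prime Minister"),
    ("duration", "2 hours"), ("keyword", "rainy"), ("keyword", "day"),
    ("keyword", "aboard"),
]
# Second snippet; "met" is normalized to "meet" (an assumption of this sketch).
SNIPPET_2 = [
    ("person", "John Smith"), ("verb", "meet"),
    ("organization", "House of Commons"), ("place", "London"),
]

memory = MemoryBase()
memory.observe(SNIPPET_1)
memory.observe(SNIPPET_2)

john = ("person", "John Smith")
london = ("place", "London")
print(memory.count(john, london, ("verb", "meet")))  # observed in both snippets
print(memory.count(john, london, ("verb", "flew")))  # observed only in the first
```

Because the second snippet does not mention Prime Minister or United Airlines, their memories are left untouched by the second `observe` call.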
  • From SQL to Matrix of Connections and Counts. Incremental, non-parametric, non-functional learning and novelty detection (know what you don’t know).
    Order_Supplier
    order_id  supplier_id  order_date  company  country  contact
    500125    10000        05/12       IBM      USA      John
    500126    10001        05/12       HP       USA      Mary
    500127    10001        05/13       HP       USA      Mary
  • Triple Store for Billions of Matrices. Query: "John, who do you fly to London?" Lookup path: memory for John -> matrix for John -> row for London -> columns for air carrier -> counts (17, 2, 1 shown for AA, UA, BA). [Figure: an example memory matrix whose rows and columns are attributes such as City:Basra, City:Anah, City:An Najaf, Country:Iraq, Person:F. Demet, Person:S. Hussein, Item:Uranium, Material:Plutonium, Mat_Amt:4 tons, Mat_Amt:28 tons, Fin_Amt:$100,000, with counts in the cells.] Associative memory triples:
    • A massive connection engine
    • A massive correlation engine
    • Pre-joined, pre-scanned, co-local
  • Reason By Similarity Using Nearest Neighbors. Example: rank the animals by their similarity to platypus. Step 1: given platypus, recall attributes. Step 2: given attributes, recall animals. At increasing scale, only associated attributes and the target category are accessed, and only non-zero associations even exist. (Entropy-based attribute “shrinkage” is also computed but not shown here.)
  • Reason By Similarity Using Counts. Example: rank the animals by their similarity to platypus.

    animal          blood  birth       legs  hair  scales  fins
    horse           warm   livebearer  4     y     n       n
    dog             warm   livebearer  4     y     n       n
    dolphin         warm   livebearer  0     y     n       y
    platypus        warm   eggbearer   4     y     n       n
    trout           cold   eggbearer   0     n     y       y
    thresher shark  warm   livebearer  0     n     n       y
    tiger shark     cold   eggbearer   0     n     n       y
    alligator       cold   eggbearer   4     n     y       n

    +----------------+-------------+
    | animal         | similarity  |
    +----------------+-------------+
    | platypus       | 6           |
    | dog            | 5           |
    | horse          | 5           |
    | dolphin        | 3           |
    | alligator      | 3           |
    | thresher shark | 2           |
    | tiger shark    | 2           |
    | trout          | 1           |
    +----------------+-------------+

    The equivalent SQL is tricky and touches the entire table:

    SELECT a.animal,
           ((CASE WHEN a.blood  = b.blood  THEN 1 ELSE 0 END) +
            (CASE WHEN a.birth  = b.birth  THEN 1 ELSE 0 END) +
            (CASE WHEN a.legs   = b.legs   THEN 1 ELSE 0 END) +
            (CASE WHEN a.hair   = b.hair   THEN 1 ELSE 0 END) +
            (CASE WHEN a.scales = b.scales THEN 1 ELSE 0 END) +
            (CASE WHEN a.fins   = b.fins   THEN 1 ELSE 0 END)) AS similarity
    FROM animals a, animals b
    WHERE b.animal = 'platypus'
    ORDER BY similarity DESC;
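The two-step recall (platypus -> attributes -> animals) can be sketched with a plain inverted index. This is an illustrative stand-in for an associative memory, not Saffron's API; the function names `build_index` and `rank_by_similarity` are hypothetical. Unlike the SQL self-join, the lookup only visits animals that share at least one attribute value with the target.

```python
from collections import Counter, defaultdict

COLUMNS = ("blood", "birth", "legs", "hair", "scales", "fins")
ANIMALS = {
    "horse":          ("warm", "livebearer", 4, "y", "n", "n"),
    "dog":            ("warm", "livebearer", 4, "y", "n", "n"),
    "dolphin":        ("warm", "livebearer", 0, "y", "n", "y"),
    "platypus":       ("warm", "eggbearer",  4, "y", "n", "n"),
    "trout":          ("cold", "eggbearer",  0, "n", "y", "y"),
    "thresher shark": ("warm", "livebearer", 0, "n", "n", "y"),
    "tiger shark":    ("cold", "eggbearer",  0, "n", "n", "y"),
    "alligator":      ("cold", "eggbearer",  4, "n", "y", "n"),
}


def build_index(animals):
    """Map each (column, value) attribute to the set of animals that have it."""
    index = defaultdict(set)
    for animal, values in animals.items():
        for attribute in zip(COLUMNS, values):
            index[attribute].add(animal)
    return index


def rank_by_similarity(target, animals, index):
    """Step 1: recall the target's attributes. Step 2: recall animals per attribute."""
    scores = Counter()
    for attribute in zip(COLUMNS, animals[target]):
        for animal in index[attribute]:
            scores[animal] += 1
    # Highest count first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


index = build_index(ANIMALS)
for animal, similarity in rank_by_similarity("platypus", ANIMALS, index):
    print(f"{animal:15s} {similarity}")
```

The counts match the similarity table on the slide; only the order within ties may differ.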
  • How Are Memories Implemented?
  • Matrices Are Efficient Implementations of Memory. What Teradata did for databases and Google did for search, Saffron provides for memories. Matrix representation provides:
    – Co-locality: minimizes the computing resources needed
    – Distribution: maximizes the computer resources available
    – Compression: stores connections and counts efficiently
    [Figure: thumbnails of the Category:value ID encoding, Matrix ID::Row ID partitioning, and hyper-sparse encoding diagrams that are detailed on the following slides.]
  • Atom Encoding for Co-locality of Distribution. ID: 64-bit word (category, value, other stuff). 10 high bits encode for 1024 categories; 42 low bits encode for 4.2B values in category. All values are ensured co-local within the lower bit range. This category/value bit ratio can be configured. Scanning IDs past the category range terminates the scan. Global consistent identity and order (no sort join). Distributed Atom Table on a server cluster, globally shared but also distributed: Category and Value are hashed and shifted into the ID. Example entries: Person: Null; Person: John Smith; Person: Prime Minister; Place: Null; Place: London; Place: Washington D.C.; … IDs are referenced from memories.
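The category/value packing can be illustrated with plain bit shifts. The sketch below takes the 10-bit and 42-bit widths from the slide (note that 2^42 is roughly 4.4 trillion, so the slide's "4.2B" may refer to a different configured width); the function names, the numeric category IDs, and the choice to leave the remaining high bits of the 64-bit word unused are assumptions.

```python
import bisect

CATEGORY_BITS = 10   # from the slide: 1024 categories
VALUE_BITS = 42      # from the slide; the ratio is configurable
VALUE_MASK = (1 << VALUE_BITS) - 1


def make_atom_id(category, value):
    """Pack a category number and a value number into one integer ID."""
    if not 0 <= category < (1 << CATEGORY_BITS):
        raise ValueError("category out of range")
    if not 0 <= value <= VALUE_MASK:
        raise ValueError("value out of range")
    return (category << VALUE_BITS) | value


def split_atom_id(atom_id):
    """Recover (category, value) from a packed ID."""
    return atom_id >> VALUE_BITS, atom_id & VALUE_MASK


def scan_category(sorted_ids, category):
    """Yield the IDs of one category from a globally sorted ID list.

    All values of a category share the same high bits, so they are adjacent
    in sort order and the scan can stop at the first ID past the range.
    """
    low = category << VALUE_BITS
    high = low + (1 << VALUE_BITS)
    start = bisect.bisect_left(sorted_ids, low)
    for atom_id in sorted_ids[start:]:
        if atom_id >= high:
            break  # past the category range: terminate the scan
        yield atom_id


PERSON, PLACE = 1, 2  # hypothetical category numbers
ids = sorted([
    make_atom_id(PLACE, 7), make_atom_id(PERSON, 3),
    make_atom_id(PERSON, 9), make_atom_id(PLACE, 1),
])
print([split_atom_id(i) for i in scan_category(ids, PERSON)])
```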
  • Partitioning and Distribution with DHC. Figure labels: Formal Matrix; Matrix-Row Partitioned (keys of the form Matrix ID::Row ID, hyper-sparse encoding of Col IDs); Row-Segment Partitioned (if large); Small Matrix ID (sparse encoding); Server Cluster. Server lookup example: Person: John Smith :: Place: London -> Matrix ID 123 :: Row ID 456 -> DHC -> hash(123::456) mod 10 (cluster size) -> answers located on Server 4 of 10.
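The server lookup in this example amounts to hashing the Matrix ID::Row ID key and taking the result modulo the cluster size. A minimal sketch follows, assuming SHA-256 as the hash; the slide does not say which hash function DHC uses, so the server number produced here will generally not be the "Server 4" of the slide. The function name `server_for` is hypothetical.

```python
import hashlib


def server_for(matrix_id, row_id, cluster_size):
    """Map a Matrix ID::Row ID key to a server index in [0, cluster_size)."""
    key = f"{matrix_id}::{row_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % cluster_size


# Every client computes the same location without asking a central directory.
print(server_for(123, 456, 10))
```

A stable hash (rather than Python's built-in `hash`, which is salted per process for strings) is used so that all nodes agree on the mapping.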
  • Hyper-Sparse Coincidence Matrix Compression. Codex is a combination of methods for ID connections and counts: a zero-run “pointer” to the next connection ID over 2^54 distance, and a variable-size integer to span from initial low counts to any BigNum. Control bits legend: first bit 0 = End byte, 1 = Continue; second bit 0 = Zero length (zero run), 1 = Counter. Example (Matrix for A, Row for B, IDs and counts for all returns of ?C):
    – 00 010110: End, zero run, 010110 -> ID 0 + 22 = 22
    – 01 000110: End, integer count, 000110 -> return ID 22 : Count 6, ID 22++ = 23
    – Continue, zero run, 110011… ; Continue, zero run, 1100111000100… ; End, zero run, 11001110001001000011 -> ID 23 + 1688710 = 1688733
    – End, integer count, 000100 -> return ID 1688733 : Count 5
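The idea of alternating zero runs and counts, each stored as a variable-length number, can be sketched as follows. The byte layout (one continue bit, one type bit, six payload bits) is read off the slide's legend, but the order of multi-byte chunks (most significant first) and all function names are assumptions; this is a toy codec, not Saffron's Codex.

```python
CONTINUE = 0b1000_0000   # control bit 1: more bytes follow for this number
COUNTER = 0b0100_0000    # control bit 2: payload is a count, else a zero run
PAYLOAD_BITS = 6
PAYLOAD_MASK = 0b0011_1111


def _emit(value, is_count):
    """Encode one non-negative integer as 6-bit chunks, most significant first."""
    chunks = [value & PAYLOAD_MASK]
    value >>= PAYLOAD_BITS
    while value:
        chunks.append(value & PAYLOAD_MASK)
        value >>= PAYLOAD_BITS
    chunks.reverse()
    kind = COUNTER if is_count else 0
    encoded = [CONTINUE | kind | chunk for chunk in chunks[:-1]]
    encoded.append(kind | chunks[-1])  # the last byte clears the continue bit
    return encoded


def encode_row(row):
    """Encode {column_id: count} as (zero run, count) pairs in column order."""
    out = []
    cursor = 0
    for column_id in sorted(row):
        out += _emit(column_id - cursor, is_count=False)  # gap to next non-zero
        out += _emit(row[column_id], is_count=True)
        cursor = column_id + 1
    return bytes(out)


def decode_row(data):
    """Yield (column_id, count) pairs from an encoded row."""
    cursor = 0
    value = 0
    for byte in data:
        value = (value << PAYLOAD_BITS) | (byte & PAYLOAD_MASK)
        if byte & CONTINUE:
            continue
        if byte & COUNTER:
            yield cursor, value   # return ID : count, then advance the ID
            cursor += 1
        else:
            cursor += value       # skip the run of zero cells
        value = 0


row = {22: 6, 1688733: 5}
encoded = encode_row(row)
print(len(encoded), list(decode_row(encoded)))
```

Only non-zero cells cost any bytes, and small gaps and small counts cost a single byte each, which is the property the slide emphasizes.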
  • Memory Physics: Small and Local is Also Fast. Compared to a database, a memory-base responds to queries faster and faster as data grows larger*.
    Compared to tables and graphs:
    – No table joins • Pre-joined
    – No table scans • Pre-counted
    – No semantic sort joins • Pre-ordered
    – No graph pointer-chasing • Co-local
    *As the number of observations exceeds the number of attributes, which is a common property of real-world observations. Therefore, evolution selected for brains to be memory-bases, not databases.
    Some Proofs in the Pudding:
    • Reached over 20M triples/min on 10 nodes for ingestion
    • Near-linear scalability of ingestion (0.99 slope) tested from 1 to 10 nodes
    • Distributed cluster installation and management tested to 64 server nodes
    • Semantic expansion of 2.6X at recent customer, world-record 20 bytes/triple
    • Sub-second “real time” Web Service - 110 msec mean
    • “Dirty secret” of semantic graph stores: toward an order of magnitude larger storage costs than SaffronMemoryBase
  • Growing Agreement for Matrix Representation. Graphs and matrices are formally equivalent, but matrices better address: – Syntactic complexity – Ease of implementation – Performance
  • What Do Saffron’s Customers Do? Make sense of “things” in their environment: how they are connected, similar, different. Identify threats and opportunities based on a real-time understanding of past and current experience. Predict what may happen next with conditions-based, personalized knowledge of each person or “thing”.
  • Converging Analytics of Hybrid Data. Analytic processes: Descriptive, Diagnostic, Predictive, Prescriptive. STRUCTURED -> market-driven convergence -> HYBRID <- market-driven convergence <- CONTENT. Source: Gartner.
  • Connecting the Dots - Entity Analytics. "If you don't go after the network, you're never going to stop these guys. Never." “Find without Looking” in MRO; Global Fortune 100; Risk Mitigation for Assets and Causes; World’s Largest Nonprofit; Associative Targeting with SF/SOF; 500,000X faster to read everything. www.saffrontech.com/solutions/demo-sensemaking/
  • Disrupting Illegal Trafficking. Mission: find traffickers (weapons, drugs, people). Problem: huge data sets (many data sources, unstructured and structured, many “things” in the data); don’t know what to look for, anything can matter; attributes always changing to create false signals and aliases. With Saffron: – Unify all the data – Automatically find connections for the “things” in the data as it arrives – Find the people and organizations behind the aliases, and who and what they are connected to – Anticipate what they are going to do next – Illuminate the dots that matter
  • Predictive Maintenance. Unify: – Pilots’ intuition and sensory recall – Complete maintenance records – Mechanics’ knowledge and experience. Identify signals and patterns. Learn from “one”, apply to others based on conditional similarity. Result: 100% accuracy with 2% false positives, improved from 63% accuracy with 18% false positives.
  • Anticipating Events at Nuclear Power Plants. Thousands of safety events occur annually. Some are minor and don’t require much action; some are not. We find patterns in masses of reports over 40 years of experience, across events, plants, owner operators, systems, components, manufacturers, and people. With Saffron we know where we have seen this event or situation before, what happened, what was done, and where it might happen again.
  • Predicting Threats For The Gates Foundation. The Bill and Melinda Gates Foundation exploits predictive analytics to ensure that its global mission is advanced safely and securely. This analytical capability provides a 360 view of known risks and identifies emerging risks. Saffron’s threat scoring system collects and analyzes unsolicited inbound correspondence, providing real-time diagnostic intelligence. The Gates Foundation synthesizes a factor- and motive-based conceptual model from the Rand Corporation with behavioral modeling by Saffron Technology's Associative Memory capabilities to operationalize threat prediction. If a correspondence is classified as high threat, it is automatically routed to the Foundation's protective intelligence specialists. They use Saffron to determine root cause and take appropriate action.
  • Predict Threat Assessment at Gates Foundation. Strategic Early Warning System (SEWS), Igor Ansoff: – Scan environment to detect weak signals and rare events to predict surprises and discontinuities – Implemented using RAND and Saffron. RAND model: threat assessment model. Saffron: operationalizes the RAND model – Unification of hybrid data, finding patterns and scoring
  • Prediction.
    ETL – ingest hybrid data: – unify hybrid data from individual communication incidents/events – define groups and individuals as risk vectors – build multidimensional associative arrays
    Correlate and select the right attribute combinations: – calculate several statistical measures – calculate partial correlation and lift • 2-way and 3-way mutual information – select the right attributes and attribute combinations • ranking the partial correlations (plus some engineering)
    Prediction: – use instance-based learning, or – use traditional statistics like regression
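Lift and 2-way mutual information can both be computed directly from stored counts, without going back to the raw records. The sketch below uses the textbook definitions for two binary attributes; the function names and the four-count interface are assumptions for illustration and are not taken from Saffron's APIs.

```python
import math


def lift(n_ab, n_a, n_b, n_total):
    """Observed joint frequency divided by the one expected under independence."""
    return (n_ab * n_total) / (n_a * n_b)


def mutual_information(n_ab, n_a, n_b, n_total):
    """2-way mutual information in bits for two binary attributes A and B.

    n_ab: observations with both A and B; n_a, n_b: observations with A, with B;
    n_total: all observations. Only counts are needed.
    """
    cells = {
        (1, 1): n_ab,
        (1, 0): n_a - n_ab,
        (0, 1): n_b - n_ab,
        (0, 0): n_total - n_a - n_b + n_ab,
    }
    p_a = {1: n_a / n_total, 0: 1 - n_a / n_total}
    p_b = {1: n_b / n_total, 0: 1 - n_b / n_total}
    total = 0.0
    for (a, b), n in cells.items():
        if n > 0:  # empty cells contribute nothing
            p = n / n_total
            total += p * math.log2(p / (p_a[a] * p_b[b]))
    return total


# Independent attributes: lift 1, no shared information.
print(lift(25, 50, 50, 100), mutual_information(25, 50, 50, 100))
# Attributes that always occur together: lift 2, one full bit of information.
print(lift(50, 50, 50, 100), mutual_information(50, 50, 50, 100))
```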
  • Forensics and Prediction in One Platform (analytic process; analytic capability; current use cases; Saffron application):
    – DISCOVERY (What is in my data?). Capability: find without knowing what to look for (rapid visualization; multiple, diverse sources; new and relevant intel based on connections). Use cases: All-Source Intelligence for Everyone; Experience-Based Knowledge Management; National Security. Application: SaffronAdvantage.
    – SENSE MAKING (What is happening?). Capability: find patterns of similarity and connections in hybrid data (who/what is connected or similar to whom/what? How, when, where, why?). Use cases: Due Diligence, Sales Intelligence, Root Cause Analysis, Entity De-duplication, Risk Intelligence. Application: SaffronAdvantage.
    – PREDICTIVE (What will happen?). Capability: model-free statistics (scoring of threat, risk, response; automatic pattern discovery; what can we anticipate based on contextually similar situations?). Use cases: Experience-Based Predictive Maintenance; Intelligence Surveillance and Reconnaissance Tasking. Application: SaffronAPIs, SaffronMemoryBase.
    – PRESCRIPTIVE (What should I do?). Capability: leverage experience from past situations, actions and outcomes (Have we seen this before? What did the best do? What did we do? Did it work? What should we do?). Use cases: MRO Parts Replacement; Best Practices; Intelligence Surveillance, Reconnaissance Decisions. Application: SaffronAPIs, SaffronMemoryBase.
  • Saffron Memory Base - Architecture (layers, top to bottom):
    – INSIGHT: sense making, decision support. Applications: SaffronAdvantage; partner and customer applications.
    – REASONING: analogies, connections, classifications, episodic patterns, temporal trends, and customer-defined. Analytic reasoning REST APIs; Auto Recall (RESTful navigation of the MemoryBase).
    – KNOWLEDGE: entity connections, counts and context. SaffronMemoryBase: spaces, memories, matrices, rows, columns.
    – INGESTION: data ingestion templates for structured data; text analysis for unstructured data (entity extraction, NLP). SaffronAdmin.
    – DATA: structured, unstructured, streaming, other.
  • Saffron Open Platform:
    – Collect and harvest hybrid data: unstructured content; semi-structured data; structured data – RDBMS (JDBC, SQL); Web & Deep Web (Bright Planet); streams like Twitter (Attensity); text, PDF, email; voice (Nuance, IVR).
    – Ingest and unify: data dictionary; ontologies; name lists; special parsers for disparate data types; NLP (ThingFinder, Attensity, WordNet).
    – Store and analyze: connections & counts; semantic analysis; statistical analysis; clustering & pattern; prediction; Hadoop MapReduce; R; Hadoop.
    – View and report: SaffronAdvantage™; trends & episodes; emerging patterns; episodic patterns; prediction; BI; UI (Tableau); graphs (Tom Sawyer, ); BI (IBM Cognos, SAP BOBJ).
    – Row labels: Saffron proprietary; Saffron API connectors; open API connectors.
  • Consumer Intelligence - Analytics 2.0. Customer CLV & Influencer Revenue:
    • Predict customer lifetime and influencer revenue
    • Model-free prediction beyond structured and parametric models based on theory
    • Combine transactional BI and unstructured web data (sentiment, motivation, etc.) to quantify network effects, conversion, etc.
    Gartner: CMO will spend more than CIO, 2015. Actionable Insights for Chief Scoring Officer:
    Use case – Churn • Understand individual customer to prevent revenue loss
    Use case – Increase Customer Acquisition • Improve conversion through influencer network
    Use case – Upselling and Cross Selling • Move customer to higher value add by understanding sentiment and motivation
  • Distributed Storage And Computing:
    – SQL, schema: RDBMS. Distribution impossible. "SQL is dead."
    – NoSQL, schema-less: triple stores. Optimal trade-off between localization and distribution. ASYNCHRONOUS COMPUTING, aka distributed agents or message passing.
    – NoSQL, schema-less: key-value stores. Embarrassingly parallel; no machine learning; no HPC. "Hadoop is too simple": data has to be copied; no scatter-gather; triples are rebuilt from key values.
    – Systems named on the slide: BigTable, BigQuery, Hadoop, Cassandra, HBase, Dynamo, Saffron, GFS, GraphDBs.
    – MULTI-DIMENSIONAL ASSOCIATIVE ARRAYS UNIFY: natural combination of structured and unstructured; 3D to 2D (graphs mapped onto matrices); allow for simplicity AND high performance of linear algebra; semantics and statistics; counts connect to any type of statistics.
  • Twitter: @paul_hofmann. Email: phofmann@saffrontech.com. Homepage: www.paulhofmann.net. Blog: www.paulhofmann.net/blog. SlideShare: www.slideshare.com/paulhofmann. LinkedIn: www.linkedin.com/in/hofmannpaul
  • Saffron MemoryBase®: More
  • Why Couldn’t a Larger, Better-Funded Company Do the Same Thing?
  • Fourteen Associative Memory Patents
  • Unfair Advantages:
    – Enterprise-ready product: operational with reference customers
    – Real-time analytics: unifying complex, multi-structured data in one knowledge representation
    – No rules or models: subject matter experts, not data scientists or statisticians
    – Semantics meets statistics: schema-free on hybrid data
    – Incremental instance-based learning: as data arrives, in real time
    – Leadership team: diverse, with experience in customer value delivery, advanced technology invention and development, rapid growth adaptation, strategic account management, application development (IBM, SAP, PeopleSoft, JD Edwards)
    – IP: 14 unique patents in schema-free semantics and statistics for associative memory
  • How Does the Customer Buy Saffron? How Fast Can You Implement a Saffron Solution?
  • Delivery and Services.
    Licensing: – Based on CPU cores used, with volume-based pricing – Perpetual or term subscriptions
    Delivery: – On-premise or private cloud – Hosted, managed service
    Implementation: – Pick a problem, focus, solve it, then expand – Use case opportunity focused – Project team combines Saffron, partners and customer – 90-day implementation cycles are the norm
  • Rapid Implementation Construct (elapsed time; focus; activities):
    – Day 1 - 30, Configure & Define: confirm IT or hosting resources; complete software licenses & SOW; project team orientation; understand use cases; review data sources & parsing requirements; define success metrics; complete project plan.
    – Day 30 - 90, Develop: project team training; start up SaffronAdmin; ingest sample data sources; confirm use cases with Advantage; tune, repeat, and verify; integration with legacy data bases.
    – Day 60 - 120, Operationalize: end user training (application, sense making); end user guides; change management; validate with early adopter customers; lock down V1 for operational roll out.
    – Day 120+, Roll Out: launch operational roll out; expand UI and services capabilities as needed; fine tune.
  • What About the Competition?
  • Competitive AND Complementary Positioning. [Figure: Semantic Stores and Associative Memories (architectures) are linked by UNIFY; Data Visualization and Statistical Packages are linked by COMPLEMENT.] Complements:
    – Efficient connection store for “higher” AI to extract more formal relationships and control memory with business rules
    – Massive frequency store for any “flavor” of statistics, including use of Saffron to quickly discover and build more traditional models
    – Saffron methods of partitioning and compression fit well with column-oriented infrastructure as one storage solution for both data and memories
    – Saffron is additive, not replacing data stores, but adding a memory base to a polymorphic architecture
    – Saffron APIs add intelligence to existing business applications and data-oriented visualizations, providing smarter, faster access for business owners and operational users
  • Saffron vs. Selected Competitors (competitor names do not appear in the transcript):
    – Advanced faceted search. No complete semantic graph and no counts for advanced statistics.
    – Hybrid logic and statistics. No incremental ingestion/learning; the real world is not a Jeopardy question.
    – Symbolics with add-on statistics. No unified representation, only a “bolt-on” of traditional statistics.
    – Lead in statistics, some text analytics. No semantic graph for hybrid analytics in combination.
    – Associative “experience” GUI but not an associative store. No complete graph, no count statistics.
    – Manual link construction in GUI. No automated intelligence, no deep analytics.
    – Biologically inspired AI toolkit. Nascent and unclear mix of traditional methods and new claims.
  • Saffron’s unique capabilities v. MapReduce
    MapReduce                      | Saffron MemoryBase
    Distributed batch processing   | Distributed real-time transactions
    New attributes -> code change  | Real-time update: no manual effort
    Low level assembler-like API   | High level, declarative API -> no programming
    Generic framework              | Optimized solution for advanced analytics
  • Saffron’s unique capabilities v. RDBMS
    RDBMS                           | Saffron (Matrices)
    Table joins for semantics       | Pre-joined matrices
    Predefined schema               | Schema-less
    Limited keys & sorting joins    | Everything is a key, globally sorted
    No natural partitioning         | Shared-nothing parallelism
    Structured data is fact-based   | Knowledge is more exploitable
    Nearest-neighbor is infeasible  | Nearest-neighbor is instant
  • What are Industry Analysts Saying about Saffron?
  • Converging Analytics of Hybrid Data. Analytic processes: Descriptive, Diagnostic, Predictive, Prescriptive. STRUCTURED -> market-driven convergence -> HYBRID <- market-driven convergence <- CONTENT. Source: Gartner.
  • It’s The Dawning Of The Age Of BI DBMS. Quickly changing requirements; disparate data. Associative: “Use associative index when you can’t predict the future but need to prepare for anything”, 2011.
  • View on BI Ecosystem
    Component                | Hadoop Apache Projects   | Commercial distribution or Hadoop integration
    App dev / scripting      | Pig, Cascading, WebHDFS  |
    Integrate and transform  | Pig, Sqoop, Flume        |
    DBMS SQL                 | Hive, Derby              |
    DBMS NoSQL               | Cassandra, HBase         |
  • Some Cool Properties of the Bit Vector Space
  • Properties Of n-bit Vector Space. Most distances are at the equator = n/2. n/2 is the indifference distance, or orthogonal: if d(x,y) = n/2, then x and y are orthogonal. n/2 is the mean distance. Standard deviation for big n: sigma = sqrt(n)/2 = r of n-dim sphere. In the case of the tesseract (4-dimensional cube): 4 categories, 2 values each.
    distance  number of elements
    4         1
    3         4
    2         6
    1         4
    SMB is perfect for outliers and similarities.
  • Tesseract – 4-Cube
  • Bit-Vectors For The Tesseract – 4-Cube
  • Statistics of Distances in SMB. Most distances are at the equator = n/2. n/2 is the indifference distance, or orthogonal. Standard deviation for big n = sqrt(n)/2. For n = 1000, sigma = sqrt(n)/2 ~ 16 and n/2 = 500. A fraction of about 0.999999 of the data lies within 5 sigma ~ 80 bits of the mean -> only about 1 out of 1 MM of the space is closer than ~420 bits or further than ~580 bits.
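These distance statistics can be checked exactly with binomial coefficients, since the number of n-bit vectors at Hamming distance d from any fixed vector is C(n, d). The sketch below is an independent check written for this transcript, not code from the presentation; the function names are illustrative.

```python
import math


def distance_histogram(n):
    """Number of n-bit vectors at each Hamming distance 0..n from a fixed vector."""
    return [math.comb(n, d) for d in range(n + 1)]


def mean_and_sigma(n):
    """Exact mean and standard deviation of the Hamming distance distribution."""
    hist = distance_histogram(n)
    total = 2 ** n
    mean = sum(d * c for d, c in enumerate(hist)) / total
    # (d - n/2)^2 == (2d - n)^2 / 4, which keeps the sum in exact integers.
    variance = sum(c * (2 * d - n) ** 2 for d, c in enumerate(hist)) / (4 * total)
    return mean, math.sqrt(variance)


def tail_fraction(n, low, high):
    """Fraction of the space closer than `low` bits or further than `high` bits."""
    outside = sum(math.comb(n, d) for d in range(n + 1) if d < low or d > high)
    return outside / 2 ** n


print(distance_histogram(4))         # the tesseract: 4 bits, 16 corners
print(mean_and_sigma(1000))          # compare with n/2 and sqrt(n)/2
print(tail_fraction(1000, 420, 580)) # compare with the slide's 1-in-a-million
```

The histogram for n = 4 also includes distance 0 (the vector itself), which the slide's table omits.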
  • Classification And Prediction. 1. Classify a given vector with respect to a given training set. 2. Classify a given set into classes. Universal approaches like k-nearest neighbor, k-means, random forests, etc., and modeling approaches like logistic regression, decision trees, clustering, Bayesian networks, etc. In SMB: Hamming distance is built in. Entropy is very easy to calculate and no original vectors are needed; it is used for finding patterns, clusters, classification and prediction, via mutual information and interaction information.