The global data sphere, consisting of machine data and human data, is growing exponentially, reaching the order of zettabytes. In comparison, the processing power of computers has been stagnating for many years. Artificial Intelligence – a newer variant of Machine Learning – bypasses the need to understand a system when modelling it; however, this convenience comes at the cost of extremely high energy consumption.
The complexity of language makes statistical Natural Language Understanding (NLU) models particularly energy hungry. Since most of the zettabyte data sphere consists of human data, such as texts or social networks, we face four major obstacles:
1. Findability of Information – when truth is hard to find, fake news rule
2. Von Neumann Gap – when processors cannot process faster, then we need more of them (energy)
3. Stuck in the Average – when statistical models generate a bias toward the majority, innovation has a hard time
4. Privacy – if user profiles are created “passively” on the server side instead of “actively” on the client side, we lose control
The current approach to overcoming these limitations is to use larger and larger data sets on more and more processing nodes for training. Instead, AI algorithms should be optimized for efficiency rather than precision. By that measure, statistical modelling is disqualified as a brute-force approach for language applications. As a replacement for statistical modelling and arithmetic, set theory and geometry seem a much better choice, as they allow the direct processing of words instead of their occurrence counts – which is exactly what the human brain does with language, using only 7 Watts!
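The set-theoretic alternative hinted at above can be illustrated with a toy sketch. The word "fingerprints" below are entirely invented for illustration: words are represented as sparse sets of context features, and similarity is plain set overlap rather than arithmetic on occurrence counts.

```python
# Toy sketch: words as sparse sets of context features; similarity by set overlap.
# The vocabulary and feature sets below are invented for illustration only.

fingerprints = {
    "dog": {"animal", "pet", "bark", "fur", "leash"},
    "cat": {"animal", "pet", "fur", "purr", "claw"},
    "car": {"vehicle", "engine", "wheel", "road"},
}

def overlap_similarity(w1, w2):
    """Jaccard overlap of two word fingerprints: a set operation, no counting."""
    a, b = fingerprints[w1], fingerprints[w2]
    return len(a & b) / len(a | b)

print(overlap_similarity("dog", "cat") > overlap_similarity("dog", "car"))  # True
```

The comparison needs only intersection and union of small sets, which is the kind of direct word-level operation the abstract contrasts with statistical counting.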
AI-SDV 2021: Jay ven Eman - implementation-of-new-technology-within-a-big-pha...Dr. Haxel Consult
Synonyms break search! How? Why is this important? What a synonym is and how it breaks search will be explained with real-world examples. AI-based solutions are proposed, and relevant standards are identified. How synonym solutions should be used for search is explained. Learn what you can do yourself. Tools help, but it doesn’t have to be complicated or expensive. It is as straightforward as setting priorities!
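How a missing synonym breaks search can be shown in a few lines. This is a minimal sketch with invented documents and a hand-made synonym ring; a real system would draw its synonyms from a maintained thesaurus rather than a hard-coded dictionary.

```python
# Minimal sketch: exact-match search misses documents that use a synonym;
# expanding the query with a synonym ring restores recall.
# Documents and synonyms are invented for illustration.

docs = {
    1: "the attorney filed the brief",
    2: "the lawyer reviewed the contract",
}
synonyms = {"attorney": {"attorney", "lawyer"}, "lawyer": {"attorney", "lawyer"}}

def search(term, expand=False):
    terms = synonyms.get(term, {term}) if expand else {term}
    return sorted(d for d, text in docs.items()
                  if any(t in text.split() for t in terms))

print(search("attorney"))               # [1] -- misses document 2
print(search("attorney", expand=True))  # [1, 2] -- finds both
```

The unexpanded query silently loses half the relevant results, which is exactly the failure mode the talk describes.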
AI-SDV 2021: Heiko Wongel - Machine learning tools in patent searching - are ...Dr. Haxel Consult
• The rapidly growing amount of patent documentation will soon no longer be manageable with conventional search methods - possibly this is already the case today.
• For a couple of years now, the application of machine learning (ML) methods has been discussed as a potential solution for reducing human effort in searching large amounts of patent data. While some promising projects and ideas in this direction have been presented by different sources, the real breakthrough for ML as a standard and widely accepted patent search method has not happened yet.
• The presentation looks into the challenges that still exist in this area, especially as far as practical applicability and acceptance by users is concerned, using the Intergator Smart Search project as an example.
What do PLI, MetOpera, ASCO, and PLOS have in common? Content management and content discovery needed major improvements. Users were not getting the results they needed. The content production team – editors, managing editors, the whole team – could no longer cope with the volume and variety. Content quality was suffering. Brief discussions of each organization’s challenges set the stage for AI-based, human-curated solutions. What worked, what didn’t, and the how and the why will be presented.
AI-SDV 2020: Special Hypertext Information Treatment ...Dr. Haxel Consult
With all new technologies and intelligence one may think that all information issues will be solved in the (near) future. However, one of the most fundamental issues at hand is that with the lack of reliable, quality information there is no useable output to work with in the first place. This presentation looks at the global challenges that we are still faced with today relating to content that will keep us from truly intelligent discovery in the future if nothing is done.
Kairntech combines technologies from natural language processing (NLP) and machine learning to support clients in analysing large amounts of text-based information.
You find more information at https://kairntech.com/
AILANI is a novel and unique semantic search enterprise solution for fast, easy and comprehensive knowledge discovery. It combines semantic modelling, ontologies, linguistics and artificial intelligence (AI) algorithms in a self-refining system that delivers results based on the inter-related meaning of facts. AILANI not only allows phrase searches and structured queries; it also offers its users a unique hybrid natural language question answering system combining machine learning algorithms with semantic network-based "prior knowledge" inference. It integrates seamlessly with existing infrastructure and helps leverage knowledge buried both in decade-old data and in data derived from news feeds and clinical trials, providing real-time semantic analysis of breaking news. For the pharmaceutical industry it is critical to stay up to date with the latest clinical trials news for decision-making in drug development. Integrating the relevant data and using ontology-based refiners enables fast and efficient retrieval of information about the clinical competitive landscape.
IC-SDV 2019: Down-to-earth machine learning: What you always wanted your data...Dr. Haxel Consult
Applications of machine learning on NLP tasks today receive a lot of attention and have been shown to yield state-of-the-art results on a wide range of tasks. We describe several cases where machine learning is deployed productively under the usual constraints of real-world projects: fast throughput, reasonably low requirements in terms of training corpus size, and high-quality results. What we observe is a general trend towards open source – our components, too, are open source. With the software being mostly freely available, the key success criterion for many NLP projects today is therefore first and foremost the expertise required to combine, tune and apply open source components.
IC-SDV 2018: Aleksandar Kapisoda (Boehringer) Using Machine Learning for Auto...Dr. Haxel Consult
Focusing on the significance of targets is one of the key drivers for quality of web search.
Filtering targeted companies based on the significance of their business model for the expected search results was one of our “nice to haves” last year.
Evaluating a number of artificial intelligence approaches based on neural networks, classical machine learning and semantic technologies led us to a working hybrid approach.
AI is Not Magic: It’s Time to Demystify and Apply Srinivasan Parthiban (VINGY...Dr. Haxel Consult
The term Artificial Intelligence was first coined in 1956, and since then the technology has progressed, disappointed, and re-emerged. Now the prediction is that AI will add $16 trillion to the global economy by 2030. AI is becoming as fundamental as electricity, the internet, and mobile were once they entered the mainstream. Not having an AI strategy in 2020 will be like not having a mobile strategy in 2010, or an Internet strategy in 2000. As a result of technology advancements, AI-related patent applications have surged over recent years. The patent searchers, information professionals and bioinformatics researchers who have been involved in collecting, organizing and analysing data are starting to move up the ladder with AI. Of course, AI can help you, your business, your employees, and your customers, but you need a prescriptive approach to harness its power and put AI to work. This presentation will take a glimpse under the hood of AI and look into some recent trends in data and analytics that are relevant to information professionals.
AI-SDV 2020: AI-augmented Question Answering and Semantic Search for Life Sci...Dr. Haxel Consult
We have created structure-based chemical ontologies that are used to classify chemical compounds automatically. These classifications can be used successfully in semantic search engines to find all representatives of a chemical class. In the present paper we demonstrate use cases in which these chemical classes serve as features in typical machine learning approaches.
Thus, we have used the co-occurrence of chemical compounds with biological and physico-chemical properties in scientific articles to train models that predict properties of novel compounds that did not occur in those training sets. Examples include the prediction of hepatotoxicity and of bioavailability. In principle, one can use any property that is found in the textual vicinity of compounds to build such predictive models. Criteria will be presented for judging the quality and predictive power of such models.
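The co-occurrence idea can be sketched in a few lines. The corpus, compound names and class memberships below are entirely hypothetical; a real model would train a proper classifier over ontology-class features rather than score raw counts.

```python
from collections import Counter

# Sketch: count how often each chemical class co-occurs with a property term
# in a toy corpus, then score a novel compound via its class memberships.
# All data here is invented for illustration.

corpus = [
    ("nitrosamine", "hepatotoxicity"),
    ("nitrosamine", "hepatotoxicity"),
    ("sugar_alcohol", "bioavailability"),
]
compound_classes = {"NDMA": {"nitrosamine"}, "xylitol": {"sugar_alcohol"}}

cooc = Counter(corpus)  # (class, property) -> co-occurrence count

def score(compound, prop):
    """Sum co-occurrence counts of the compound's classes with the property."""
    return sum(cooc[(cls, prop)] for cls in compound_classes[compound])

print(score("NDMA", "hepatotoxicity"))    # 2 -- class seen with the property
print(score("xylitol", "hepatotoxicity")) # 0 -- no supporting evidence
```

A compound absent from the training texts still gets a score, because the prediction flows through its ontology class rather than the compound name itself.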
Averbis specializes in text mining and machine-learning-based patent monitoring. We help our clients screen large numbers of patents in no time, estimate their relevancy for the company and automatically classify them into customer-specific categories. Our approach is based on artificial intelligence, so that it learns from and imitates the behavior of IP professionals. Compared to conventional rule-based approaches, our approach is up to 400% more accurate and achieves the same accuracy as manual monitoring. At the same time, it reduces manual patent monitoring intervention by up to 80%. Thanks to Information Discovery, we enable IP professionals to reduce backlogs, improve staff efficiency and minimize inconsistencies associated with patent monitoring, ultimately improving the experience both for you and your customers.
Python for Data Science | Python Data Science Tutorial | Data Science Certifi...Edureka!
( Python Data Science Training : https://www.edureka.co/python )
This Edureka video on "Python For Data Science" explains the fundamental concepts of data science using Python. It will also help you to analyze, manipulate and implement machine learning using various Python libraries such as NumPy, Pandas and Scikit-learn.
This video helps you to learn the below topics:
1. Need of Data Science
2. What is Data Science?
3. How Python is used for Data Science?
4. Data Manipulation in Python
5. Implement Machine Learning using Python
6. Demo
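As a taste of what the listed topics cover, here is a minimal NumPy sketch of "implementing machine learning using Python": ordinary least squares on made-up data. The tutorial itself uses Pandas and Scikit-learn for the fuller workflow; this is only a dependency-light illustration.

```python
import numpy as np

# Made-up data: y = 2x + 1 with no noise, so least squares recovers it exactly.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0

# Design matrix with a bias column; solve min ||A w - y|| for w = (slope, intercept).
A = np.column_stack([x, np.ones_like(x)])
slope, intercept = np.linalg.lstsq(A, y, rcond=None)[0]

print(round(slope, 6), round(intercept, 6))  # 2.0 1.0
```

The same fit done with `sklearn.linear_model.LinearRegression` would return identical coefficients; NumPy just makes the underlying linear algebra visible.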
Subscribe to our channel to get video updates. Hit the subscribe button above.
Check out our Python Training Playlist: https://goo.gl/Na1p9G
Applications of Data Science in Drug Discovery, Financial Services, Project Management, Human Resources and Marketing.
By Dr. Laila Alabidi at the JOSA Data Science Meetup on 17/8/2019.
II-SDV 2017: Localizing International Content for Search, Data Mining and Ana...Dr. Haxel Consult
Advances in text mining, analytics and machine learning are transforming our applications and enabling ever more powerful capabilities, yet most applications and platforms are designed to deal with a single (normalized) language. Hence, as our applications and platforms are increasingly required to ingest international content, the challenge becomes finding ways to normalize content to a single language without compromising quality. An extension of this question is how we define quality in this context and what, if any, by-products a localization effort can produce that may enhance the usefulness of the application.
This talk will, using patent searching as an example use case, review the challenges and possible solution approaches for handling localization effectively and will show what current emerging technology offers, what to expect and what not to expect and provide an introductory practical guide to handling localization in the context of data mining and analytics.
In this talk, we introduce the Data Scientist role, differentiate investigative and operational analytics, and demonstrate a complete Data Science process using Python ecosystem tools like IPython Notebook, Pandas, Matplotlib, NumPy, SciPy and Scikit-learn. We also touch on the use of Python in a Big Data context, using Hadoop and Spark.
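The investigative half of that process can be illustrated without any heavy dependencies. The event log below is invented; the talk itself would do this grouping with Pandas on real data.

```python
from collections import defaultdict

# Toy event log (invented): investigative analytics here means ad-hoc
# grouping and summarising to answer a one-off question.
events = [
    {"user": "a", "ms": 120},
    {"user": "a", "ms": 80},
    {"user": "b", "ms": 200},
]

totals = defaultdict(int)
for e in events:  # plain-Python equivalent of df.groupby("user")["ms"].sum()
    totals[e["user"]] += e["ms"]

print(dict(totals))  # {'a': 200, 'b': 200}
```

Operational analytics would wrap the same aggregation in a scheduled, monitored pipeline; the distinction is in how the computation is run, not in the computation itself.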
“Artificial Intelligence” covers a wide range of technologies today, including those that enable machine vision, affective computing, deep learning, and natural language processing. As advances increase, so do expectations. We now see a rush to add “AI inside” for applications and appliances in almost every domain. The reality is that some firms will have mega-hits with AI-enabled applications, and many more will suffer setbacks based on flawed adoption strategies.
This webinar will present an assessment of key AI technologies today, and help participants identify promising applications based on matching requirements to mature-enough technologies.
A number of recent milestones in AI have rekindled the faith that human-grade computer intelligence can fuel the next technological revolution. In parallel and almost independently, the job role of Data Scientist rose to become one of the hottest tickets in the technology sector. Despite the obvious overlap between the domains of Data Science and Artificial Intelligence, the two approaches are sufficiently distinct that choosing the wrong one might cause a product to fail or a hiring process to go wrong. This presentation will offer some clarity and best practices with regard to understanding what data analysis requirements you really have, as opposed to what you think you have.
Biology, medicine, physics, astrophysics, chemistry: all these scientific domains need to process large amounts of data with more and more complex software systems. For achieving reproducible science, there are several challenges ahead involving multidisciplinary collaboration and socio-technical innovation, with software at the center of the problem. Despite the availability of data and code, several studies report that the same data analyzed with different software can lead to different results. I see this problem as a manifestation of deep software variability: many factors (operating system, third-party libraries, versions, workloads, compile-time options and flags, etc.), themselves subject to variability, can alter the results, up to the point that they can dramatically change the conclusions of some scientific studies. In this keynote, I argue that deep software variability is both a threat and an opportunity for reproducible science. I first outline some works on (deep) software variability, reporting preliminary evidence of complex interactions between variability layers. I then link the ongoing work on variability modelling to deep software variability in the quest for reproducible science.
A sentient network - How High-velocity Data and Machine Learning will Shape t...Wenjing Chu
Dell's Distinguished Engineer Wenjing Chu discusses innovations in applying Machine Learning to solve challenges in Telco/Communication Services, and predicts that the future is a Sentient Network powered by Machine Learning that can handle real-time high-velocity data.
Presentation at Circuits of Profit and NetONets in Budapest on 6 and 7 June 2011.
The videos in the presentation are missing from this SlideShare copy. They will be available later.
Big Data to SMART Data : Process scenario
Scenario for implementing a process that transforms raw data into exploitable, representative data, covering streaming, distributed systems, messaging, storage in a NoSQL environment, and graphic visualization of the data within a Big Data ecosystem, using the following technologies:
Apache Storm, Apache Zookeeper, Apache Kafka, Apache Cassandra, Apache Spark and Data-Driven Document.
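The Big-Data-to-SMART-data flow described above can be mimicked with a dependency-free generator pipeline. This is a conceptual sketch only: in the real scenario, Kafka topics, Spark jobs and Cassandra tables take the place of these Python stand-ins.

```python
# Conceptual sketch of the pipeline: ingest -> clean/transform -> store.
# Each stage is a plain Python construct standing in for Kafka, Spark
# and a NoSQL store; the data records are invented for illustration.

def ingest(raw_messages):  # stand-in for consuming a Kafka topic
    for msg in raw_messages:
        yield msg

def transform(stream):     # stand-in for a Spark streaming job
    for msg in stream:
        if msg.get("value") is not None:  # drop unusable records
            yield {"key": msg["key"], "value": float(msg["value"])}

store = {}                 # stand-in for a Cassandra-like table

raw = [{"key": "t1", "value": "3.5"}, {"key": "t2", "value": None}]
for record in transform(ingest(raw)):
    store[record["key"]] = record["value"]

print(store)  # {'t1': 3.5}
```

The point of the sketch is the shape of the flow: each stage consumes a stream and emits a cleaner one, so only "SMART" (validated, typed) data reaches storage.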
SAMOA: A Platform for Mining Big Data Streams (Apache BigData Europe 2015)Nicolas Kourtellis
A general overview of the APACHE SAMOA platform for mining big data streams using machine learning algorithms running on distributed stream processing platforms such as Apache STORM, Apache Flink, Apache Samza and Apache Apex.
Results are shown from experimentation with VHT, the Vertical Hoeffding Tree proposed in "VHT: Vertical Hoeffding Tree." N. Kourtellis, G. De Francisci Morales, A. Bifet, A. Murdopo. IEEE BigData 2016.
Presentation in APACHE BIG DATA Europe 2015
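The Hoeffding bound at the heart of VHT and other Hoeffding trees is easy to state: after n observations of a quantity with range R, the true mean lies within epsilon of the observed mean with probability 1 − delta, where epsilon = sqrt(R² ln(1/delta) / 2n). A small sketch of the standard formula (the split-decision framing follows the Hoeffding-tree literature; the numbers below are arbitrary):

```python
import math

def hoeffding_bound(value_range, delta, n):
    """epsilon = sqrt(R^2 * ln(1/delta) / (2n)); shrinks as observations grow."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

# A Hoeffding tree splits a node once the gain gap between the two best
# attributes exceeds epsilon -- the choice is then correct with probability
# at least 1 - delta, without ever revisiting the data.
eps_small_n = hoeffding_bound(1.0, 1e-7, 200)
eps_large_n = hoeffding_bound(1.0, 1e-7, 20000)
print(eps_large_n < eps_small_n)  # True: more data -> tighter bound
```

This bound is what lets a streaming tree commit to split decisions on a single pass, which is the property the distributed VHT experiments exploit.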
Digital Transformation and Innovation on http://denreymer.com
- Merging the Real World and the Virtual World
- Intelligence Everywhere
- The New IT Reality Emerges
http://www.gartner.com//it/content/2940400/2940420/january_15_top_10_technology_trends_2015_dcearley.pdf
Synergy of Human and Artificial Intelligence in Software EngineeringTao Xie
Keynote Talk by Tao Xie at International NSF sponsored Workshop on Realizing Artificial Intelligence Synergies in Software Engineering (RAISE 2013) http://promisedata.org/raise/2013/
AI-SDV 2022: Creation and updating of large Knowledge Graphs through NLP Anal...Dr. Haxel Consult
Knowledge Graphs are an increasingly relevant approach to storing detailed knowledge in many domains. Recent advances in NLP make it possible to enrich Knowledge Graphs through automated analysis of large volumes of literature, greatly reducing the effort of traditional manual information capture. In our presentation we report on the approach taken in a project with partner Fraunhofer SCAI in the life sciences, where a knowledge graph organising detailed facts about psychiatric diseases has been computed.
Information of cause-effect relations between proteins, genes, drugs and diseases has been encoded in the BEL (Biological Expression Language) and imported into a Graph database to approach an indication-wide Knowledge Graph for the selected therapeutic area. Ultimately, updating the graph will amount to just rerunning the analysis on the newly published literature.
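The update-by-rerun idea can be sketched with a minimal in-memory graph. The triples below are hypothetical and only loosely styled after BEL cause-effect statements; a production system would use a real graph database.

```python
# Minimal knowledge graph: (subject, relation, object) triples, rebuilt by
# "rerunning the analysis" over whichever corpus of statements is current.
# Entity and relation names are invented for illustration.

def build_graph(statements):
    graph = {}
    for subj, rel, obj in statements:
        graph.setdefault(subj, set()).add((rel, obj))
    return graph

corpus_v1 = [("GeneA", "increases", "ProteinB")]
corpus_v2 = corpus_v1 + [("DrugX", "decreases", "ProteinB")]  # newly published

g1 = build_graph(corpus_v1)
g2 = build_graph(corpus_v2)  # the "update" is simply a rerun over more text
print(sorted(g2))            # ['DrugX', 'GeneA']
```

Because the graph is a pure function of the extracted statements, keeping it current reduces to rerunning extraction on newly published literature, exactly as the abstract describes.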
AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t...Dr. Haxel Consult
In 2019 the UK was the first major economy to embrace a legal obligation to achieve net zero carbon emissions by 2050. More broadly, the 2021 UK Innovation Strategy sets out the UK government’s vision to make the UK a global hub for innovation by 2035 with a target of increasing public and private sector R&D expenditure to 2.4% of GDP to support the UK being a science superpower with a world-class research and innovation system.
IP rights create an incentive for R&D which ultimately leads to innovation. Analysis and insights from IP data can therefore help provide a better understanding of how the IP system is being used and where and what innovation is taking place. Research and analysis of IP data is a key input to the ongoing work of the UKIPO’s Green Tech Working Group which seeks to:
further the UK’s status as a global leader by making the UK’s IP environment the best for innovating green technology;
develop and deliver IP policies to support government’s ambition on climate change and green technologies; and
to help innovators best protect and commercialise their green tech innovations both at home and internationally.
The UKIPO has been developing a broad portfolio of 'green' IP analytics research. A series of patent analytics reports on green technologies has been published, and analysis has been conducted of the UK's Green Channel scheme for accelerated processing of green patent applications. Patents have been used to identify technological comparative advantage within different green technologies at a country level, and new insights have been uncovered by mapping green technology patents to the UN Sustainable Development Goals (SDGs). Trade mark data provides a timeliness and closeness-to-market factor that patent data does not, and complementary analysis of UK 'green' trade marks, identified using a machine learning algorithm, provides a commercialisation angle to our research.
AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...Dr. Haxel Consult
Word embeddings, deep learning, transformer models and other pre-trained neural language models (sometimes recently referred to as "foundational models") have fundamentally changed the way state-of-the-art systems for natural language processing and information access are built today. The "Data-to-Value" process methodology (Leidner 2013; Leidner 2022a,b) has been devised to embody best practices for the construction of natural language engineering solutions; it can assist practitioners and has also been used to transfer industrial insights into the university classroom. This talk recaps how the methodology supports engineers in building systems more consistently and then outlines the changes in the methodology to adapt it to the deep learning age. The cost and energy implications will also be discussed.
AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...Dr. Haxel Consult
In the patent domain, all types of issues, from very specific search requirements to the linguistic characteristics of the text domain, are accentuated. Consequently, to develop patent text mining tools for scientists and patent experts, we need to understand their daily work tasks, as well as the linguistic character of the text genre (i.e., patentese). Patent text is a mixture of legal and domain-specific terms. In processing technical English texts, a multi-word unit method is often deployed as a word-formation strategy to expand the working vocabulary, i.e., introducing a new concept without the invention of an entirely new word. This productive word formation is a well-known challenge for traditional natural language processing tools utilizing supervised machine learning algorithms due to limited domain-specific training data. Deep learning technologies have been introduced to overcome the reduction in performance of traditional NLP tools. In the Artificial Researcher technologies, we have integrated explicit and implicit linguistic knowledge into the deep learning algorithms, essential for domain-specific text mining tools. In this talk, we will present a step-by-step process of how we have developed the mentioned text mining tools. For the final outline, we will also demonstrate how these tools can be integrated in a cross-genre passage retrieval system, based on a technology from 2016 that still holds the state-of-the-art within the patent text mining research community in 2022.
AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...Dr. Haxel Consult
In 2013 we witnessed an evolutionary change in the NLP field thanks to the introduction of space embeddings which, with the use of deep learning architectures, achieved human-level performance in many NLP tasks. With the introduction of the attention mechanism in 2017 the results were further improved and, as a result, embeddings are quickly becoming the de facto standard for solving many NLP problems. In this presentation, you will learn how to generate and use space embeddings for search purposes, with comparison metrics against more traditional relevance-based search engines. Moreover, I will share some initial results from a paper currently under review that provides insight on hyperparameter tuning during the generation of embeddings.
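The contrast between the two search styles can be sketched in a few lines of pure Python (all weights and vectors below are made up for illustration): a relevancy engine scores by weighted term overlap, while an embedding engine scores by vector proximity, so paraphrases that share no terms can still match.

```python
import math
from collections import Counter

def tfidf_score(query, doc, idf):
    """Toy relevancy score: sum of idf weights of terms shared by query and doc."""
    q, d = Counter(query.split()), Counter(doc.split())
    return sum(idf.get(t, 0.0) * d[t] for t in q if t in d)

def cosine(u, v):
    """Embedding score: cosine similarity of two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Illustrative idf table and tiny 3-d "embeddings"; real systems use trained,
# high-dimensional vectors.
idf = {"car": 1.0, "insurance": 2.0, "vehicle": 1.0, "cover": 1.5}
emb = {"car insurance": [0.9, 0.1, 0.2], "vehicle cover": [0.85, 0.15, 0.25]}

print(tfidf_score("car insurance", "vehicle cover", idf))  # 0 (no shared terms)
print(cosine(emb["car insurance"], emb["vehicle cover"]))  # high similarity
```

The paraphrase "vehicle cover" scores zero under term overlap but very close under the embedding metric, which is the core behaviour the talk compares.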
AI-SDV 2022: Rolling out web crawling at Boehringer Ingelheim - 10 years of e...Dr. Haxel Consult
10 years in the making: how real-world business cases have driven the development of CCC's deep search solutions, leading to the capabilities for web crawling and delivery of targeted intelligence that help R&D-intensive companies gain a competitive advantage.
AI-SDV 2022: Machine learning based patent categorization: A success story in...Dr. Haxel Consult
Machine learning based patent categorization: A success story in monitoring a complex technology with high patenting activity
Susanne Tropf (Syngenta, Switzerland)
Kornel Marko (Averbis, Germany)
AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology,...Dr. Haxel Consult
It is relatively easy for a human to read a document and quickly figure out which concepts are important. However, this task is a difficult challenge for a machine. During the past few decades, there have been two main approaches to concept identification: Natural Language Processing and Machine Learning. During the early part of this century, Machine Learning made great strides as new techniques came into wider use (SVMs, topic modeling, etc.). Sensing the competition, Natural Language Processing responded with the deployment of new emerging techniques (semantic networks, finite state automata, etc.). Neither approach has completely solved the WHAT problem. Advances in Artificial Intelligence have the potential to significantly improve the situation. Where AI is making the most impact is as an enhancement to make Machine Learning and Natural Language Processing work better and, more importantly, work together. This presentation looks at some of this history and what might happen in the future when we blend the interpretation of language with pattern prediction.
AI-SDV 2022: New Insights from Trademarks with Natural Language Processing Al...Dr. Haxel Consult
Trademarks serve as key leading indicators for innovation and economic growth. As the vanguards of new and expanding enterprises, trademarks can be used to study entrepreneurship and shifting market demands in response to varying economic factors. This responsiveness has been seen as recently as the COVID-19 pandemic, where trademark research revealed key insights about business reaction to the global upheaval.
At CIPO, we have been delving more deeply than ever before into trademark analysis by leveraging cutting-edge natural language processing (NLP) tools to derive actionable business intelligence from trademark data. In this presentation, we present a survey of NLP in use at CIPO and the insights we have learned applying them. These insights include COVID-19 responses, line-of-business trends based on firm characteristics, and more.
We also discuss ongoing and future trademark research projects at CIPO. These projects include emerging technology detection methods and high-resolution trademark classification systems. We conclude that artificial intelligence-enhanced tools like NLP are key components of future exploitation of trademark data for business and economic intelligence.
AI-SDV 2022: Extracting information from tables in documents Holger Keibel (K...Dr. Haxel Consult
In our customer projects involving automated document processing, we often encounter document types providing crucial data in the form of tables. While established text analytics algorithms are usually optimized to operate on running text, they tend to produce rather poor results on tables as they do not capture the non-sequential relations inside them (e.g. interpret the content of a table cell relative to its column title, interpret line breaks inside a cell differently from line breaks between cells or rows). While there are elaborate information extraction products on the market for a few highly specific types of tabular documents, there is no general approach out there. The main cause for this is the fact that table structures can be encoded by a heterogeneous range of layout means (e.g. column boundaries can be signaled by lines vs. aligned text vs. white space). In this talk, we will illustrate several solutions that we have developed for a range of challenges occurring in this context, both for scanned and digitally generated documents.
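One way to honour the non-sequential relations described here (interpreting each cell relative to its column title) is to rebuild header-keyed records before any text analytics run. A minimal Python sketch, assuming an upstream layout-analysis step (the hard part, not shown) has already detected the cell grid:

```python
def table_to_records(grid):
    """Turn a detected table grid (list of rows, first row = header) into
    header-keyed records, so each cell is read relative to its column title."""
    header, *rows = grid
    return [dict(zip(header, row)) for row in rows]

# Hypothetical output of an upstream cell-detection step.
grid = [
    ["Invoice No", "Date", "Amount"],
    ["A-1001", "2022-03-01", "120.00"],
    ["A-1002", "2022-03-05", "87.50"],
]

print(table_to_records(grid)[0]["Amount"])  # 120.00
```

With records in this shape, downstream extraction can ask for "the Amount of invoice A-1001" instead of scanning a flattened text stream.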
AI-SDV 2022: Scientific publishing in the age of data mining and artificial i...Dr. Haxel Consult
Most scientific journals request that the complete set of research data be published simultaneously with the peer-reviewed paper. The publication of the research data is usually carried out as so-called "Supplementary Material" attached to the original paper, or on a "Research Data Repository". Both forms have in common that the data is usually published unstructured and not in a uniform, machine-processable format. This makes its further use in electronic tools for AI or data mining unnecessarily difficult or even impossible. A concept is presented in which the data is digitally recorded, following the principle of FAIR data, as part of the publication process. This digital capture makes the data available to the scientific community for easy use in data mining and AI tools. The data in the repository contains links to the publication to document its origin. The concept is applicable to preprints, peer-reviewed papers, diploma and doctoral theses, and is particularly suitable for open access publications. Moreover, the presentation highlights corresponding activities recently reported in scientific publications.
AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...Dr. Haxel Consult
How do you find video when you only have sparse data? While you can wander the stacks (if you can still find open stacks) for inspiration, video, whether physical or digital, is difficult to discover. Wandering the virtual stacks is, well, virtually impossible. Discovery platforms on the whole have not replicated the inspirational experience of wandering the stacks.
More companies are using archivable video for internal communication about the various research projects, product developments, test results, and more that are being considered, in progress, or completed. Showing how an experiment was conducted can convey information that is very difficult to communicate via text. How do you find a company video that might be helpful for your project?
A case study is presented of the problems and the solutions that were implemented by a large, multinational chemical company. A suite of content discovery technologies was used, including a video-to-text-to-tagging system connected to their document database and automatically indexed using several chemical as well as conceptual systems (rule-based, NLP, inference engine). To support manuscript and video submission, a metadata extraction program pulls the metadata and inserts it into the submission forms so the author can move quickly through that process.
Copyright Clearance Center
A pioneer in voluntary collective licensing, CCC (Copyright Clearance Center) helps organizations integrate, access, and share information through licensing, content, software, and professional services. With expertise in copyright and information management, CCC and its subsidiary RightsDirect collaborate with stakeholders to design and deliver innovative information solutions that power decision-making by helping people integrate and navigate data sources and content assets. CCC recently acquired the assets and technology of Deep SEARCH 9 (DS9), a knowledge management platform that leverages machine learning to help customers perform semantic search, tag content, and discover new insights.
Lighthouse IP is the world’s leading provider of intellectual property content. The core business of Lighthouse IP is sourcing and creating content from the world’s most challenging authorities. Specialized in IP data, Lighthouse IP provides over 160 countries coverage for patents, over 200 authorities for trademarks and over 90 authorities for designs. Lighthouse IP data is available via several partners. The company is headquartered in Schiphol-Rijk in the Netherlands and has offices in the United States, China, Thailand, Vietnam, Egypt, Indonesia and Belarus. Globally a team of 150 experts works on the creation of this unique data collection.
CENTREDOC was created in 1964 as the technical information center of the swiss watchmaking industry. Building on a strong team of engineers, CENTREDOC now offers a complete range of services and solutions for the monitoring of strategic, technological and competitive information. CENTREDOC is also a leader in the research of patent, technical and business intelligence, and offers consulting expertise in the implementation of monitoring solutions.
AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...Dr. Haxel Consult
The everyday use of AI-driven algorithms for data search, analysis and synthesis brings important time savings, but also reveals the need to understand and accept the limitations of the technology. Practical deployments on concrete topics are essential to assess and manage the challenges of neural-network-based AI. A workshop report.
AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...Dr. Haxel Consult
What if there was a platform where literature, conference abstracts, patents, clinical trials, news, grants and other sources were fully integrated? What if the data would be harmonized, enriched with standardized concepts and ready for analysis? After building our patent analytics platform we didn’t stop dreaming and built our big data analytics platform by semantically integrating text-rich, scientific sources. In my presentation I will talk about what we built and why we built it. And, of course, I will also address the challenges and hurdles along the way. Was it worth it and what comes next? Let’s talk about it!
AI-SDV 2021: Francisco Webber - Efficiency is the New Precision
1. Efficiency is the New Precision
Semantic Supercomputing in the Zettabyte age
Francisco De Sousa Webber
Co-Founder & CEO
f.webber@cortical.io
2. Big Bang: Data Explosion
[Chart: global data volume growing from terabytes through petabytes and exabytes to zettabytes by 2025, across the Mainframe/Mini, PC/Client, Internet and Virtualisation eras; data types shown: transactional data, human files, social interactions, machine-generated data (IoT)]
3. Current Status: ML & AI is the Answer … Is it?
- (Text) content increases
- Productivity decreases
- Performance of the current Von Neumann computer platform stagnates
6. Von Neumann Computing Limitations
[Diagram: computing unit (processor with control unit and arithmetic unit), memory with address logic, and the I/O world, connected over a shared bus]
- Address bottleneck: every instruction passes over the bus
- 1000+ instructions, sequential access
8. Statistical Machine Learning Limitations
[Diagram: use-case data and annotated data flow in as training and test data; a training engine produces a use-case model, which an inference engine serves for the data out-flow]
Pain points: insufficient data, slow training, model imprecision, inference latency, manual effort
9. Statistical AI & ML Problem: Efficiency
We need to improve the principle, not just increase the computational power:
- Findability issue
- Von Neumann gap
- Exponential power consumption
- Million-model multiverse
Analogy: a steam car (the initial principle) and a hydrogen fuel-cell car (the latest principle) are both cars, for people, using "water gas"; the difference between them is efficiency.
10. Statistical Modelling: Findability Issue
[Diagram: the user sees only the Google "virtual view" of the internet; outside the Google field of view, the actual internet forms a "blind spot"]
- The internet is growing; the visible internet grows slowly, while the invisible internet grows fast
- The number of new pages grows faster than the number of keywords pointing at them
11. Statistical Modelling: Von Neumann Gap
[Chart, 1980–2020: the data amount grows faster than processing speed, opening an ever wider gap]
The current computing paradigm is insufficient for the growing data load:
- Increased error rates
- Increased power consumption
- Increased processing delays
12. Statistical Modelling: Exponential Energy Need
[Chart, 2018–2030, 0–8%]
- The current global energy consumption of computing devices equals that of global air transport
- In 2030 the global energy consumption of computing devices will reach that of global automobile transportation
13. Statistical Modelling: Million Model Multiverse
Every local use case requires:
- Individually collected and prepared local training data
- Individually labeled data for supervised learning, with a local gold standard
- An individually trained local statistical model
Result: no network effects.
14. Statistical Modelling: Technology Impact
- Findability issue → fake news: when it is hard to find information, it is also hard to find the truth
- Von Neumann gap → climate change: green computing is beyond the Von Neumann gap
- Stuck in the average → innovation gap: statistics averages; innovation is not made by majorities
- Phased ML user profiles → populism: statistical ML models facilitate opinion meddling
15. The Solution: Semantic Folding
- Based on recent findings in neuroscience
- Implemented as an unsupervised machine learning approach
- Replaces complex statistical modelling with analogical computation
16. Semantic Folding: Analogical Computation
Context: bank, account, holder, payment, tax, in-house, manager
- "signed contract" vs. "done deal": 36% fingerprint overlap (similar meanings)
- "star trek" vs. "done deal": 1% fingerprint overlap (different meanings)
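Since semantic fingerprints are sparse binary representations, the similarity judgement above reduces to counting shared set bits. A minimal Python sketch; the bit positions and resulting percentages are invented for illustration (the slide's actual figures are 36% and 1%), and the real Retina uses a far larger semantic space:

```python
def overlap(fp_a, fp_b):
    """Percentage of fp_a's set bits that also appear in fp_b."""
    return 100 * len(fp_a & fp_b) // len(fp_a)

# Hypothetical fingerprints: sets of active bit positions in a semantic space.
signed_contract = {3, 17, 42, 99, 204, 511, 768, 901, 1500, 2047}
done_deal       = {3, 17, 42, 99, 204, 600, 768, 901, 1500, 3000}
star_trek       = {5, 80, 204, 777, 1234, 2500, 3111, 3500, 3900, 4000}

print(overlap(signed_contract, done_deal))  # 80: similar meanings
print(overlap(signed_contract, star_trek))  # 10: different meanings
```

Because the comparison is a set intersection (bitwise AND on real hardware), it needs no arithmetic on occurrence counts, which is the deck's point about analogical computation.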
17. Semantic Fingerprinting
- Training of the semantic space: reference material → semantic word fingerprint dictionary
- Converting text into semantic fingerprints: use-case data → semantic text fingerprint
- Comparing semantic fingerprints
18. Level 1: Word Fingerprints
Fingerprint generation for "organ" aggregates its contexts:
- Context 1 (anatomy): liver, heart, muscle, endothelia, body, anatomy
- Context 2a (composers): composer, baroque, music, score, Johann Sebastian Bach
- Context 2b (instruments): piano, guitar, trombone, flute, trumpet, quartet, music
- Context 3 (architecture): church, altar, baroque, architecture, renaissance
19. Level 2: Text Fingerprints
Example: "organs and pianos are musical instruments"
The word fingerprints (1, 2, 3, 4) are combined by aggregation + sparsification into a single text fingerprint.
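The aggregation + sparsification step can be sketched as: count how often each bit is set across the word fingerprints, then keep only the most frequently hit bits so the text fingerprint stays as sparse as its inputs. The bit positions and the `keep` parameter below are illustrative assumptions, not Cortical.io's actual encoding:

```python
from collections import Counter

def text_fingerprint(word_fps, keep=4):
    """Aggregate word fingerprints, then sparsify by keeping the `keep`
    most frequently set bits."""
    counts = Counter(bit for fp in word_fps for bit in fp)
    return {bit for bit, _ in counts.most_common(keep)}

# Hypothetical word fingerprints for "organs and pianos are musical instruments".
organ       = {1, 5, 9, 12}
piano       = {5, 9, 14, 20}
musical     = {5, 9, 12, 33}
instruments = {5, 14, 33, 40}

print(sorted(text_fingerprint([organ, piano, musical, instruments])))
```

Bits shared by many words (here 5 and 9) survive sparsification, so the text fingerprint concentrates on the meaning the words have in common.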
20. Many Languages – One Semantic Fingerprint
Concepts and their representations are stable across languages:
philosophy (EN), philosophie (FR), filosofía (ES), философия (RU), فلسفة (AR), 哲學 (ZH)
21. NLU Primitive 1: Semantic Search
[Diagram: an example document serves as the query; a similarity engine matches it against a document index and returns a result set of the most similar documents, ranked along the user's information need]
22. NLU Primitive 2: Semantic Classification
[Diagram: a semantic space trained for compliance; a semantic filter fingerprint (SH1_email AND SH2_email AND SH3_email) sorts incoming emails such as Email 12, Email 189 and Email 2443 into a positive and a negative class]
23. NLU Primitive 3: Keyword Extraction
Input text: "Socrates 470/469 – 399 BC was a classical Greek (Athenian) philosopher credited as one of the founders of Western philosophy. He is an enigmatic figure known chiefly through the accounts of classical writers, especially the writings of his students Plato and Xenophon and the plays of his contemporary Aristophanes. Plato's dialogues are among the most comprehensive accounts of Socrates to survive from antiquity, though it is unclear the degree to which Socrates himself is hidden behind his best disciple, Plato."
Aggregation of word fingerprints yields a text fingerprint; keywords are extracted by maximizing for similarity:
["plato", "socrates", "philosopher", "aristophanes", "antiquity", "writings", "xenophon", "dialogues", "disciple", "philosophy"]
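The "maximize for similarity" step can be sketched as ranking candidate words by the overlap of their word fingerprint with the document's text fingerprint. All fingerprints below are invented toy values:

```python
def extract_keywords(text_fp, word_fps, k=3):
    """Rank candidate words by overlap of their fingerprint with the
    text fingerprint; return the top k as keywords."""
    ranked = sorted(word_fps, key=lambda w: len(word_fps[w] & text_fp), reverse=True)
    return ranked[:k]

# Hypothetical fingerprints; text_fp would come from aggregating the document.
text_fp = {5, 9, 12, 14, 33}
word_fps = {
    "plato":    {5, 9, 12, 50},
    "socrates": {5, 9, 33, 60},
    "weather":  {70, 80, 90, 100},
}

print(extract_keywords(text_fp, word_fps, k=2))  # ['plato', 'socrates']
```

Words whose fingerprints barely intersect the text fingerprint (here "weather") fall to the bottom of the ranking and are never selected.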
24. Semantic Folding based Machine Learning
Example input: a text passage on snoring remedies (mechanical devices such as splints and braces, nasal strips, continuous positive airway pressure, surgical procedures, and side-sleeping advice).
The Retina Engine feeds downstream algorithms (SVM, random forest, DL network, algorithm 1 … algorithm n) for classification, clustering, prediction, generating, computing and analyzing.
25. Cortical.io Engines
Semantic Engine capabilities: semantic search, semantic annotation, document classification/clustering, keyword extraction, context term generation, information discovery, expert finding, text analytics, risk analysis, business intelligence, lease/credit agreements.
26. Hardware Acceleration for Semantic Folding
- Match one query fingerprint against an unlimited number of document fingerprints: enterprise search, discovery search, web search, social media profile search
- Match one filter fingerprint against a stream of incoming fingerprints: real-time document classification, email filtering and routing, deep packet inspection, social media topic detection
Throughput: desktop, each board searches up to 1 billion fingerprints per second; enterprise, each server up to 10 billion; web scale, each rack up to 100 billion.
27. Semantic Super Computing Platform
[Architecture diagram]
- Retina System on a Xilinx Alveo board in an x86 host (storage, CPU cores, memory): Retina Engine (converter, similar-term, context and compare modules; Retina database), Retina Search (document index, fingerprint matcher, search re-ranker), Retina Filter (filter bank, fingerprint matcher, filter re-ranker)
- Application server: administration app, email-filter, semantic-search and further apps with their APIs, plus a management & monitoring API
- Integration layer: identity & access management and SMTP, RDB, CMS, DMS, file-service, web-service and BPM connectors
28. Comparing the Leading NLU Approaches
[Chart: precision (70–90%) vs. speed on a logarithmic axis for classification of the Enron email dataset, Farmer set. Systems compared: Retina Engine on CPU and on FPGA (1x Xilinx Alveo 250), pure keyword baseline (scikit-learn TfidfVectorizer, CPU), FastText (official Facebook pre-trained model, CPU), Doc2Vec and Word2Vec (Gensim implementation, pre-trained Google model, CPU), and BERT (PyTorch bert-base-uncased) on CPU and GPU (AWS g3.8xlarge EC2)]
Data: "The Enron Email Corpus", archived 2011-03-08 at the Wayback Machine; retrieved March 5, 2011.
29. Demonstrated Semantic Folding Use Cases
- Banking: e-mail & chat compliance monitoring; credit risk analysis
- CRM: customer intent analysis
- Legal: contract intelligence; regulatory process optimization
- Financial services: investment signal extraction from news streams
- Life sciences: information discovery
- Media: viewer stream analytics
- Automotive: handbook search; car supplier management; consolidation of car terminology
- Technical support: support intelligence
- Social media: organic topic mining
- Commerce: catalogue management & automation
- Human resources: job description / resume matching
30. NLU by Semantic Folding
- Simplicity: one algorithm, one operator, one data format
- Compositionality: words, sentences, paragraphs, documents
- Analogy: normalized representation, bitwise similarity
- Modelability: unsupervised semantic model generation
- Efficiency: small amounts of reference data
- Scalability: one semantic model, many use cases
- Replicability: same use case in a new domain
- Inspectability: refinement, debugging, verification
- Robustness: "graceful failing"
31. Market Potential – Semantic Folding
[Chart, log scale: the global data sphere (zettabytes) split into transactional data, machine-generated/sensor data, human-generated data and social media data; within these, ML data and text ML data delimit the Semantic Folding potential market]
Contact: info@cortical.io