Big Data and the SP
Theory of Intelligence
Varsha Prabhakaran
S8 CSE B
Roll No: 43
Contents
Introduction
SP Theory of Intelligence
Problems of Big Data
Volume
Efficiency
Transmission
Variety
Veracity
Visualization
Introduction
The SP theory of intelligence can be applied to the management
and analysis of big data.
It helps overcome the problem of variety in big data.
Analysis of streaming data (velocity).
Economies in the transmission of data.
Veracity in big data.
Visualization of knowledge structures and inferential
processes.
SP Theory of Intelligence
Designed to simplify and integrate concepts across artificial
intelligence, mainstream computing, and human perception
and cognition.
Product of an extensive program of development and testing
via the SP computer model.
Knowledge is represented with arrays of atomic symbols in one
or two dimensions, called “patterns”.
Processing is done by compressing information:
Via the matching and unification of patterns.
Via the building of multiple alignments.
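As a rough illustration of matching and unification (not the SP computer model itself), the sketch below finds the longest matching run in two patterns and stores it once, replacing it in each pattern with a reference symbol. The function name `unify`, the reference symbol `#`, and the sample strings are assumptions made for this example, and Python's standard `difflib` stands in for the SP system's own search.

```python
from difflib import SequenceMatcher


def unify(a: str, b: str, ref: str = "#") -> tuple[str, str, str]:
    """Store the longest run shared by a and b once; point to it with `ref`.

    Assumes `ref` does not already occur in either pattern.
    """
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    m = matcher.find_longest_match(0, len(a), 0, len(b))
    chunk = a[m.a:m.a + m.size]
    if not chunk:
        return "", a, b  # nothing in common, nothing to unify
    a_short = a[:m.a] + ref + a[m.a + m.size:]
    b_short = b[:m.b] + ref + b[m.b + m.size:]
    return chunk, a_short, b_short


chunk, first, second = unify("the boy runs", "the boy sits")
before = len("the boy runs") + len("the boy sits")
after = len(chunk) + len(first) + len(second)
print(repr(chunk), first, second, before, after)
```

In this toy case the two patterns take 24 characters as given and 18 after the shared chunk is stored once, which is the kind of saving that unification is meant to capture.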
Benefits of the SP Theory
Conceptual simplicity combined with descriptive and
explanatory power across several aspects of intelligence.
Simplification of computing systems, including software.
Deeper insights and better solutions in several areas of
application.
Seamless integration of structures and functions within and
between different areas of application
SIMPLIFICATION OF COMPUTING
SYSTEMS
MULTIPLE ALIGNMENT: A CONCEPT
BORROWED FROM BIOINFORMATICS
Multiple Alignment
The system aims to find multiple alignments that enable a
New pattern to be encoded economically in terms of one or
more Old patterns
Multiple alignment provides the key to:
Versatility in representing different kinds of knowledge.
Versatility in different kinds of processing in AI and mainstream
computing.
Multiple Alignment
S → NP V NP
NP → D N
D → t h i s
D → t h a t
N → g i r l
N → b o y
V → l o v e s
V → h a t e s
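One way to read the grammar above is that each choice among alternatives can be recorded as a small index, so that a whole sentence is encoded by a handful of digits. The sketch below is a hypothetical illustration of that idea, not the SP computer model: it assumes the alternatives are numbered from 0 in the order listed, and the names `GRAMMAR`, `SENTENCE_SHAPE`, `encode`, and `decode` are invented for the example. Under that assumption the sentence "this boy loves that girl" yields the digits 0 1 0 1 0, which agrees with the digits in the code pattern S 0 1 0 1 0 #S.

```python
# Alternatives for each category, numbered from 0 in the order listed above
# (the numbering scheme is an assumption made for this sketch).
GRAMMAR = {
    "D": ["this", "that"],
    "N": ["girl", "boy"],
    "V": ["loves", "hates"],
}

# S -> NP V NP and NP -> D N, so every sentence has the shape D N V D N.
SENTENCE_SHAPE = ["D", "N", "V", "D", "N"]


def encode(sentence: str) -> list[int]:
    """Encode a sentence as the index of the alternative chosen at each slot."""
    words = sentence.split()
    if len(words) != len(SENTENCE_SHAPE):
        raise ValueError("sentence does not fit S -> NP V NP")
    # list.index raises ValueError for a word the grammar does not contain.
    return [GRAMMAR[cat].index(word) for cat, word in zip(SENTENCE_SHAPE, words)]


def decode(code: list[int]) -> str:
    """Rebuild the sentence from the grammar and the list of indices."""
    return " ".join(GRAMMAR[cat][i] for cat, i in zip(SENTENCE_SHAPE, code))


code = encode("this boy loves that girl")
print(code)
print(decode(code))
```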
Multiple Alignment
S 0 1 0 1 0 #S
Multiple Alignment
Compression difference:
$CD = B_N - B_E$
$B_N$: the total number of bits in those symbols in the New pattern that are
aligned with Old symbols in the alignment.
$B_E$: the total number of bits in the symbols in the code pattern.
Compression ratio:
$CR = B_N / B_E$
Multiple Alignment
$B_N$ is calculated as:
$B_N = \sum_{i=1}^{h} C_i$
where $C_i$ is the size of the code for the $i$th symbol in a sequence, $H_1 \ldots H_h$, comprising those symbols within the New pattern that are aligned with Old symbols.
Multiple Alignment
$B_E$ is calculated as:
$B_E = \sum_{i=1}^{s} C_i$
where $C_i$ is the size of the code for the $i$th symbol in the sequence
of $s$ symbols in the code pattern derived from the multiple
alignment.
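The two sums and the scores CD and CR defined above can be computed directly once a code size is known for each symbol. The sketch below assumes purely hypothetical sizes (4 bits for each of 20 aligned New symbols, 2 bits for each of 7 code symbols); in the SP system the sizes would come from the model itself, and the function name `compression_scores` is invented for this example.

```python
def compression_scores(new_aligned_bits: list[int], code_bits: list[int]) -> tuple[int, float]:
    """Return (CD, CR) from per-symbol code sizes given in bits.

    new_aligned_bits: sizes of the New symbols that are aligned with Old symbols.
    code_bits: sizes of the symbols in the code pattern.
    """
    b_n = sum(new_aligned_bits)
    b_e = sum(code_bits)
    if b_e == 0:
        raise ValueError("the code pattern must contain at least one bit")
    return b_n - b_e, b_n / b_e


# Hypothetical sizes: 20 aligned letters at 4 bits each, 7 code symbols at 2 bits each.
cd, cr = compression_scores([4] * 20, [2] * 7)
print(cd, round(cr, 3))  # 66 5.714
```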
Big Data
Problems of Big Data and Solutions
Volume: big data is … BIG!
Efficiency in computation and the use of energy.
Unsupervised learning: discovering ‘natural’ structures in
data.
Transmission of information and the use of energy.
Variety: in kinds of data, formats, and modes of processing.
Veracity: errors and uncertainties in data.
Interpretation of data: pattern recognition, reasoning
Velocity: analysis of streaming data.
Visualization: representing structures and processes.
Volume: Making Big Data Smaller
“Very-large-scale data sets introduce many data management
challenges.”
Information compression.
Direct benefits in storage, management and transmission.
Indirect benefits:
Efficiency in computation and the use of energy.
Unsupervised learning.
Additional economies in transmission and the use of energy.
Assistance in the management of errors and uncertainties in
data.
Processes of interpretation.
Energy, Speed and Bulk
In the SP theory, a process of searching for matching patterns
is central in all kinds of ‘processing’ or ‘computing’.
This means that anything that increases the efficiency of
searching will increase computational efficiency and,
probably, cut the use of energy:
Reducing the volume of big data.
Exploiting probabilities.
Cutting out some searching.
Efficiency via Reduction in Volume
Information compression is central in how the
SP system works:
Reducing the size of big data.
Reducing the size of search terms.
Both these things can increase the efficiency
of searching, meaning gains in computational
efficiency and cuts in the use of energy.
Efficiency Via Probabilities
Statistical knowledge flows directly from:
Information compression in the SP system and
The intimate connection between information compression and
concepts of prediction and probability.
There is great potential to cut out unnecessary searching, with
consequent gains in efficiency.
Potential for savings at all levels and in all parts of the system
and on many fronts in its stored knowledge.
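A small sketch can make the link between probabilities and reduced searching concrete. It assumes a plain linear search through stored patterns and compares two orderings: alphabetical, and most frequent first. The query list and the names `comparisons`, `alphabetical`, and `by_frequency` are invented for this illustration and say nothing about how the SP system orders its own search.

```python
from collections import Counter


def comparisons(order: list[str], queries: list[str]) -> int:
    """Total comparisons made when each query is found by linear search through `order`."""
    position = {pattern: i + 1 for i, pattern in enumerate(order)}
    return sum(position[q] for q in queries)


# Hypothetical workload: some patterns are asked for far more often than others.
queries = ["the"] * 6 + ["boy"] * 3 + ["zebra"]

alphabetical = sorted(set(queries))
by_frequency = [pattern for pattern, _ in Counter(queries).most_common()]

print(comparisons(alphabetical, queries))  # 18
print(comparisons(by_frequency, queries))  # 15
```

Trying the most probable patterns first cuts the total from 18 comparisons to 15 here; the gap would grow with a larger store and a more skewed distribution.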
Efficiency via a Synergy with Data-Centric
Computing
In SP-neural, SP patterns may be realized as
neuronal pattern assemblies.
There would be close integration of data and
processing, as in data-centric computing.
Direct connections may cut out some
searching.
Unsupervised learning
Lossless compression of a body of information
Information compression, or “minimum length encoding”
remains the key.
Matching and unification of patterns
The SP computer model has already demonstrated an ability
to discover generative grammars, including segmental
structures, classes of structure, and abstract patterns.
For a body of information, I, the products of learning are:
a grammar (G) and an encoding (E) of I in terms of G.
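The split of I into G and E can be sketched with a deliberately simple stand-in: here G is only a flat list of the distinct words in I, and E is the list of indices into G. This is far cruder than the generative grammars the SP computer model is reported to discover, and the names `learn` and `reconstruct` are assumptions for the example, but it shows that G and E together recover I without loss.

```python
def learn(body: str) -> tuple[list[str], list[int]]:
    """Split a body of text into a word list (G) and a list of indices into it (E)."""
    grammar: list[str] = []
    index: dict[str, int] = {}
    encoding: list[int] = []
    for word in body.split():
        if word not in index:
            index[word] = len(grammar)  # first time seen: add the word to G
            grammar.append(word)
        encoding.append(index[word])    # every occurrence becomes one index in E
    return grammar, encoding


def reconstruct(grammar: list[str], encoding: list[int]) -> str:
    """Recover the original body from G and E."""
    return " ".join(grammar[i] for i in encoding)


body = "this boy loves that girl that girl hates this boy"
G, E = learn(body)
print(G)
print(E)
print(reconstruct(G, E) == body)
```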
Product of Learning
Transmission of Data
• By making big data smaller (“Volume”).
• By separating grammar (G) from encoding (E), as in some
dictionary techniques and analysis/synthesis schemes.
• Efficiency in transmission can mean cuts in the use of energy.
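The idea of separating G from E resembles preset-dictionary compression, which is available in Python's standard `zlib` module. In the sketch below, sender and receiver are assumed to hold the same dictionary in advance (playing the part of G), so only the compressed message (playing the part of E) has to travel. This is ordinary DEFLATE with a preset dictionary, used as a stand-in and not the SP system; `SHARED_G`, `send`, and `receive` are names invented for the example.

```python
import zlib

# Hypothetical shared knowledge, held by both sender and receiver beforehand.
SHARED_G = b"this that girl boy loves hates "


def send(message: bytes) -> bytes:
    """Compress a message against the shared dictionary."""
    compressor = zlib.compressobj(zdict=SHARED_G)
    return compressor.compress(message) + compressor.flush()


def receive(payload: bytes) -> bytes:
    """Decompress a payload using the same shared dictionary."""
    decompressor = zlib.decompressobj(zdict=SHARED_G)
    return decompressor.decompress(payload) + decompressor.flush()


message = b"this boy loves that girl"
payload = send(message)
print(len(message), len(payload), len(zlib.compress(message)))
print(receive(payload) == message)
```

The printed sizes let the payload be compared with plain `zlib.compress` on the same message; the receiver cannot decode the payload without the shared dictionary.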
Transmission of Data
Simplicity of a focus on the matching and unification of
patterns.
Aims to discover structures that are “natural”.
The brain-inspired “DONSVIC” principle can mean relatively
high levels of information compression.
Potential for G to include structures not recognized by
most compression algorithms, such as:
Generic 3D models of objects and scenes.
Generic sequential redundancies across sequences of frames.
Overcoming Problems of Variety of Big
Data
Diverse kinds of data: the world’s many languages, spoken or
written; static and moving images; music as sound and music
in its written form; numbers and mathematical notations;
tables; charts; graphs; networks; trees; grammars; computer
programs; and more.
There are often several different computer formats for each
kind of data. With images, for example: JPEG, TIFF, WMF,
BMP, GIF, EPS, PDF, PNG, PBM, and more.
Adding to the complexity is that each kind of data and each
format normally requires its own special mode of processing.
THIS IS A MESS! It needs cleaning up.
Although some kinds of diversity are useful, there is a case for
developing a universal framework for the representation and
processing of diverse kinds of knowledge (UFK).
Universal Framework for the Representation
and Processing of Knowledge (UFK)
Potential benefits of a UFK in:
● Learning structure in data;
● Interpretation of data;
● Data fusion;
● Understanding and translation of natural languages;
● The semantic web and internet of things;
● Long-term preservation of data;
● Seamless integration in the representation and processing of
diverse kinds of knowledge.
Most concepts are an amalgam of diverse kinds of knowledge
(which implies some uniformity in the representation and
processing of diverse kinds of knowledge).
The SP system is a good candidate for the role of UFK
because of its versatility in the representation and processing
of diverse kinds of knowledge.
How Variety Hinders Learning
Discovering the association between lightning and thunder is
likely to be difficult when:
Lightning appears in big data as a static image in one of several formats; or
in a moving image in one of several formats; or it is described, in spoken
or written form, as any one of such things as “firebolt”, “fulmination”, “la
foudre”, “der Blitz”, “lluched”, “a big flash in the sky”, or indeed
“lightning”.
Thunder is represented in one of several different audio formats; or it is
described, in spoken or written form, as “thunder”, “gök gürültüsü”, “le
tonnerre”, “a great rumble”, and so on.
If learning and discovery processes are going to work
effectively, we need to get behind these surface forms and
focus on the underlying meanings. This can be done using a
UFK.
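The effect of surface variety on learning can be shown with a toy count of co-occurrences. In the sketch below a hand-built table maps a few of the surface forms mentioned above to shared concept labels; the table `CONCEPT`, the labels, and the event list are all hypothetical, and a real UFK would have to learn such mappings, not be handed them.

```python
from collections import Counter

# Hypothetical hand-built mapping from surface forms to underlying concepts.
CONCEPT = {
    "lightning": "LIGHTNING",
    "la foudre": "LIGHTNING",
    "der Blitz": "LIGHTNING",
    "thunder": "THUNDER",
    "le tonnerre": "THUNDER",
    "gök gürültüsü": "THUNDER",
}


def cooccurrences(events: list[tuple[str, str]]) -> Counter:
    """Count pairs of events after mapping each surface form to its concept."""
    counts: Counter = Counter()
    for first, second in events:
        counts[(CONCEPT[first], CONCEPT[second])] += 1
    return counts


events = [
    ("lightning", "thunder"),
    ("la foudre", "le tonnerre"),
    ("der Blitz", "gök gürültüsü"),
]

raw = Counter(events)            # every surface pair is seen only once
unified = cooccurrences(events)  # one concept pair is seen three times
print(max(raw.values()), unified[("LIGHTNING", "THUNDER")])
```

Counted on surface forms, no pairing repeats; counted on concepts, the association between lightning and thunder shows up three times.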
Veracity
“In building a statistical model from any data source, one must
often deal with the fact that data are imperfect. Real-world
data are corrupted with noise. … Measurement processes are
inherently noisy, data can be recorded with error, and parts of
the data may be missing.”
In tasks such as parsing or pattern recognition, the SP system
is robust in the face of errors of omission, addition, or
substitution.
Veracity
When we learn a first language (L):
We learn from a finite sample.
We generalize (to L) without over-generalizing.
We learn ‘correct’ knowledge despite ‘dirty data’.
Veracity
For any body of data, I, principles of minimum-length
encoding provide the key:
Aim to minimize the overall size of G and E.
G is a distillation or ‘essence’ of I, that excludes most
‘errors’ and generalizes beyond I.
G + E is a lossless compression of I, including typos etc., but
without generalizations.
Systematic distortions remain a problem.
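The way minimum-length encoding keeps one-off errors out of G can be sketched with a toy cost model. The costs are assumptions made only for this example: a word stored in G costs its letters plus one end marker, a reference to it in E costs one symbol, and a word spelled out in E costs its letters plus one escape symbol. The function names and the sample text, with the typo "grll", are likewise invented.

```python
def total_size(tokens: list[str], grammar: set[str]) -> int:
    """Size of G plus E, in symbols, under the toy cost model described above."""
    g_size = sum(len(word) + 1 for word in grammar)
    e_size = sum(1 if tok in grammar else len(tok) + 1 for tok in tokens)
    return g_size + e_size


def choose_grammar(tokens: list[str]) -> set[str]:
    """Add a word to G only when doing so makes G plus E smaller."""
    grammar: set[str] = set()
    for word in sorted(set(tokens)):
        if total_size(tokens, grammar | {word}) < total_size(tokens, grammar):
            grammar.add(word)
    return grammar


# Three clean sentences and one containing a typo ("grll").
tokens = ("this boy loves that girl " * 3 + "this boy loves that grll").split()
G = choose_grammar(tokens)
print(sorted(G))
print("grll" in G)
```

Recurring words earn a place in G because each reference is cheaper than spelling the word out, while the single typo is cheaper to spell out in E, so it is still recorded losslessly but does not enter G.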
Interpretation of Data
Processing I in conjunction with a pre-established grammar (G) to
create a relatively compact encoding (E) of I
Depending on the nature of I and G, the process of interpretation
may be seen to achieve:
Pattern recognition
Information retrieval
Parsing and production of natural language
Translation from one representation to another
Planning
Problem solving
Velocity: Analysis of Streaming Data
In the context of big data, “velocity” means the analysis
of streaming data as it is received.
“This is the way humans process information.”
This style of analysis is at the heart of how the SP
system has been designed.
Unsupervised learning.
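Processing data as it arrives, without waiting for the whole stream, can be sketched with a Python generator that updates its statistics one symbol at a time. This is only a generic illustration of streaming analysis under the assumption that simple frequency counts are the statistic of interest; the name `running_counts` is invented for the example.

```python
from collections import Counter
from typing import Iterable, Iterator


def running_counts(stream: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Yield each symbol with its count so far, updating as the stream is read."""
    counts: Counter = Counter()
    for symbol in stream:
        counts[symbol] += 1
        yield symbol, counts[symbol]


# iter(...) stands in for a live feed that can only be read once.
for symbol, seen in running_counts(iter("abca")):
    print(symbol, seen)
```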
Visualizations
The SP system is well suited to visualization for these reasons:
Transparency in the representation of knowledge.
Transparency in processing.
The system is designed to discover ‘natural’ structures in data.
There is clear potential to integrate visualization with the
statistical techniques that lie at the heart of how the SP system
works.
Conclusion
The SP theory of intelligence, designed to simplify and integrate
concepts across artificial intelligence, mainstream computing, and
human perception and cognition, has potential in the management
and analysis of big data.
The SP system has potential as a universal framework for the
representation and processing of diverse kinds of knowledge
(UFK), helping to reduce the problem of variety in big data:
the great diversity of formalisms and formats for knowledge,
and how they are processed.
Bibliography
J. G. Wolff, “Big data and the SP theory of intelligence”, IEEE Access, 2, 301-315, 2014.
www.cognitionresearch.org/sp.htm

KuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
 
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of CodeA Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
 
Empowering Growth with Best Software Development Company in Noida - Deuglo
Empowering Growth with Best Software  Development Company in Noida - DeugloEmpowering Growth with Best Software  Development Company in Noida - Deuglo
Empowering Growth with Best Software Development Company in Noida - Deuglo
 
Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdfVitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdf
 
UI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
UI5con 2024 - Boost Your Development Experience with UI5 Tooling ExtensionsUI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
UI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
 
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI AppAI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
 
GraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph TechnologyGraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph Technology
 
What is Master Data Management by PiLog Group
What is Master Data Management by PiLog GroupWhat is Master Data Management by PiLog Group
What is Master Data Management by PiLog Group
 
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit ParisNeo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
 
UI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
UI5con 2024 - Keynote: Latest News about UI5 and it’s EcosystemUI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
UI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
 
LORRAINE ANDREI_LEQUIGAN_HOW TO USE WHATSAPP.pptx
LORRAINE ANDREI_LEQUIGAN_HOW TO USE WHATSAPP.pptxLORRAINE ANDREI_LEQUIGAN_HOW TO USE WHATSAPP.pptx
LORRAINE ANDREI_LEQUIGAN_HOW TO USE WHATSAPP.pptx
 
Fundamentals of Programming and Language Processors
Fundamentals of Programming and Language ProcessorsFundamentals of Programming and Language Processors
Fundamentals of Programming and Language Processors
 
ALGIT - Assembly Line for Green IT - Numbers, Data, Facts
ALGIT - Assembly Line for Green IT - Numbers, Data, FactsALGIT - Assembly Line for Green IT - Numbers, Data, Facts
ALGIT - Assembly Line for Green IT - Numbers, Data, Facts
 
Why Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise Edition
Why Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise EditionWhy Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise Edition
Why Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise Edition
 
SMS API Integration in Saudi Arabia| Best SMS API Service
SMS API Integration in Saudi Arabia| Best SMS API ServiceSMS API Integration in Saudi Arabia| Best SMS API Service
SMS API Integration in Saudi Arabia| Best SMS API Service
 
Webinar On-Demand: Using Flutter for Embedded
Webinar On-Demand: Using Flutter for EmbeddedWebinar On-Demand: Using Flutter for Embedded
Webinar On-Demand: Using Flutter for Embedded
 
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
 
Transform Your Communication with Cloud-Based IVR Solutions
Transform Your Communication with Cloud-Based IVR SolutionsTransform Your Communication with Cloud-Based IVR Solutions
Transform Your Communication with Cloud-Based IVR Solutions
 
Hand Rolled Applicative User Validation Code Kata
Hand Rolled Applicative User ValidationCode KataHand Rolled Applicative User ValidationCode Kata
Hand Rolled Applicative User Validation Code Kata
 
Using Query Store in Azure PostgreSQL to Understand Query Performance
Using Query Store in Azure PostgreSQL to Understand Query PerformanceUsing Query Store in Azure PostgreSQL to Understand Query Performance
Using Query Store in Azure PostgreSQL to Understand Query Performance
 

Big data and SP Theory of Intelligence

  • 1. Big Data and the SP Theory of Intelligence. Varsha Prabhakaran, S8 CSE B, Roll No: 43.
  • 2. Contents: Introduction; SP Theory of Intelligence; Problems of Big Data (Volume, Efficiency, Transmission, Variety, Veracity, Visualization).
  • 3. Introduction. The SP theory of intelligence can be applied to the management and analysis of big data. Overcomes the problem of variety in big data. Analysis of streaming data (velocity). Economies in the transmission of data. Veracity in big data. Visualization of knowledge structures and inferential processes.
  • 4. SP Theory of Intelligence
  • 5. SP Theory of Intelligence. Designed to simplify and integrate concepts across artificial intelligence, mainstream computing, and human perception and cognition. Product of an extensive program of development and testing via the SP computer model. Knowledge is represented with arrays of atomic symbols in one or two dimensions, called “patterns”. Processing is done by compressing information: via the matching and unification of patterns, and via the building of multiple alignments.
  • 6. Benefits of the SP Theory. Conceptual simplicity combined with descriptive and explanatory power across several aspects of intelligence. Simplification of computing systems, including software. Deeper insights and better solutions in several areas of application. Seamless integration of structures and functions within and between different areas of application.
  • 8. MULTIPLE ALIGNMENT: A CONCEPT BORROWED FROM BIOINFORMATICS
  • 9. Multiple Alignment. The system aims to find multiple alignments that enable a New pattern to be encoded economically in terms of one or more Old patterns. Multiple alignment provides the key to: versatility in representing different kinds of knowledge; versatility in different kinds of processing in AI and mainstream computing.
  • 10. Multiple Alignment. Example grammar: S → NP V NP; NP → D N; D → t h i s; D → t h a t; N → g i r l; N → b o y; V → l o v e s; V → h a t e s.
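
To make the grammar on this slide concrete, here is a minimal Python sketch that enumerates every sentence the rules can generate. It is only an illustration: it treats whole words as terminals rather than the letter-by-letter atomic symbols used on the slide, and the names `GRAMMAR` and `expand` are hypothetical, not part of the SP computer model.

```python
from itertools import product

# The slide's rules, with each word treated as a single terminal (a simplification).
GRAMMAR = {
    "S": [["NP", "V", "NP"]],
    "NP": [["D", "N"]],
    "D": [["this"], ["that"]],
    "N": [["girl"], ["boy"]],
    "V": [["loves"], ["hates"]],
}

def expand(symbol):
    """Return every word sequence that `symbol` can derive."""
    if symbol not in GRAMMAR:
        # Anything without a rule is a terminal word.
        return [[symbol]]
    results = []
    for right_hand_side in GRAMMAR[symbol]:
        # Expand each symbol on the right-hand side, then combine the alternatives.
        parts = [expand(s) for s in right_hand_side]
        for combination in product(*parts):
            results.append([word for part in combination for word in part])
    return results

sentences = [" ".join(words) for words in expand("S")]
print(len(sentences))
print(sentences[0])
```

Each noun phrase has four variants and the verb has two, so the grammar covers 4 × 2 × 4 sentences, including "this boy loves that girl".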
  • 13. Multiple Alignment. Compression difference: $CD = B_N - B_E$, where $B_N$ is the total number of bits in those symbols in the New pattern that are aligned with Old symbols in the alignment, and $B_E$ is the total number of bits in the symbols in the code pattern. Compression ratio: $CR = B_N / B_E$.
  • 14. Multiple Alignment. $B_N$ is calculated as $B_N = \sum_{i=1}^{h} C_i$, where $C_i$ is the size of the code for the $i$th symbol in a sequence, $H_1 \ldots H_h$, comprising those symbols within the New pattern that are aligned with Old symbols.
  • 15. Multiple Alignment. $B_E$ is calculated as $B_E = \sum_{i=1}^{s} C_i$, where $C_i$ is the size of the code for the $i$th symbol in the sequence of $s$ symbols in the code pattern derived from the multiple alignment.
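
The compression difference and compression ratio defined on slides 13 to 15 can be sketched in a few lines of Python. This is an illustrative calculation only: the function name `compression_scores` and the bit sizes in the example are assumptions made up for the sketch, not values from the SP computer model.

```python
from typing import Sequence, Tuple

def compression_scores(
    new_symbol_bits: Sequence[float],
    code_symbol_bits: Sequence[float],
) -> Tuple[float, float]:
    """Return (CD, CR) for one multiple alignment.

    new_symbol_bits:  code sizes C_i of the New symbols aligned with Old symbols.
    code_symbol_bits: code sizes C_i of the symbols in the derived code pattern.
    """
    b_n = sum(new_symbol_bits)   # BN: bits in the aligned New symbols
    b_e = sum(code_symbol_bits)  # BE: bits in the code pattern
    return b_n - b_e, b_n / b_e  # CD = BN - BE, CR = BN / BE

# Hypothetical sizes: ten aligned New symbols of 8 bits each,
# encoded by a code pattern of three symbols.
cd, cr = compression_scores([8.0] * 10, [4.0, 6.0, 5.0])
print(cd, cr)
```

A larger CD or CR means the New pattern is encoded more economically in terms of the Old patterns, which is what the search for good multiple alignments aims for.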
  • 20. Problems of Big Data and Solutions. Volume: big data is … BIG! Efficiency in computation and the use of energy. Unsupervised learning: discovering ‘natural’ structures in data. Transmission of information and the use of energy. Variety: in kinds of data, formats, and modes of processing. Veracity: errors and uncertainties in data. Interpretation of data: pattern recognition, reasoning. Velocity: analysis of streaming data. Visualization: representing structures and processes.
  • 21. Volume: Making Big Data Smaller. “Very-large-scale data sets introduce many data management challenges.” Information compression brings direct benefits in storage, management and transmission. Indirect benefits: efficiency in computation and the use of energy; unsupervised learning; additional economies in transmission and the use of energy; assistance in the management of errors and uncertainties in data; processes of interpretation.
  • 22. Energy, Speed and Bulk. In the SP theory, a process of searching for matching patterns is central in all kinds of ‘processing’ or ‘computing’. This means that anything that increases the efficiency of searching will increase computational efficiency and, probably, cut the use of energy: reducing the volume of big data; exploiting probabilities; cutting out some searching.
  • 23. Efficiency via Reduction in Volume. Information compression is central in how the SP system works: reducing the size of big data; reducing the size of search terms. Both of these can increase the efficiency of searching, meaning gains in computational efficiency and cuts in the use of energy.
  • 28. Efficiency via Probabilities. Statistical knowledge flows directly from information compression in the SP system and from the intimate connection between information compression and concepts of prediction and probability. There is great potential to cut out unnecessary searching, with consequent gains in efficiency. Potential for savings at all levels and in all parts of the system, and on many fronts in its stored knowledge.
  • 29. Efficiency via a Synergy with Data-Centric Computing
  • 30. Efficiency via a Synergy with Data-Centric Computing. In SP-neural, SP patterns may be realized as neuronal pattern assemblies. There would be close integration of data and processing, as in data-centric computing. Direct connections may cut out some searching.
  • 31. Unsupervised Learning. Lossless compression of a body of information. Information compression, or “minimum length encoding”, remains the key. Matching and unification of patterns. The SP computer model has already demonstrated an ability to discover generative grammars, including segmental structures, classes of structure, and abstract patterns. For a body of information, I, the products of learning are a grammar (G) and an encoding (E) of I in terms of G.
  • 33. Transmission of Data. Transmission can be made more economical by making big data smaller (“Volume”) and by separating grammar (G) from encoding (E), as in some dictionary techniques and analysis/synthesis schemes. Efficiency in transmission can mean cuts in the use of energy.
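
The idea of separating grammar (G) from encoding (E) can be illustrated with a very simple dictionary technique, sketched below in Python. This is not the SP system's own encoding: it assumes that sender and receiver already share the word list `grammar`, so only the short list of indices needs to be transmitted. The names `encode`, `decode` and `grammar` are hypothetical.

```python
def encode(words, grammar):
    """Encode a message as indices into the shared grammar (the E part)."""
    index = {word: i for i, word in enumerate(grammar)}
    return [index[word] for word in words]

def decode(codes, grammar):
    """Rebuild the message at the receiving end from E and the shared G."""
    return [grammar[code] for code in codes]

# G: assumed to be held by both sender and receiver in advance.
grammar = ["this", "that", "girl", "boy", "loves", "hates"]

message = "this boy loves that girl".split()
encoding = encode(message, grammar)    # only this needs to be transmitted
restored = decode(encoding, grammar)
print(encoding)
print(restored == message)
```

With six entries in G, each index needs only three bits, so the transmitted E is much smaller than the original text, at the cost of both ends having to hold G.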
  • 35. Transmission of Data. Simplicity of a focus on the matching and unification of patterns. Aims to discover structures that are “natural”. The brain-inspired “DONSVIC” principle can mean relatively high levels of information compression. Potential for G to include structures not recognized by most compression algorithms, such as: generic 3D models of objects and scenes; generic sequential redundancies across sequences of frames.
  • 36. Overcoming Problems of Variety of Big Data. Diverse kinds of data: the world’s many languages, spoken or written; static and moving images; music as sound and music in its written form; numbers and mathematical notations; tables; charts; graphs; networks; trees; grammars; computer programs; and more. There are often several different computer formats for each kind of data. With images, for example: JPEG, TIFF, WMF, BMP, GIF, EPS, PDF, PNG, PBM, and more. Adding to the complexity, each kind of data and each format normally requires its own special mode of processing. THIS IS A MESS! It needs cleaning up. Although some kinds of diversity are useful, there is a case for developing a universal framework for the representation and processing of diverse kinds of knowledge (UFK).
  • 37. Universal Framework for the Representation and Processing of Knowledge (UFK). Potential benefits of a UFK in: learning structure in data; interpretation of data; data fusion; understanding and translation of natural languages; the semantic web and internet of things; long-term preservation of data; seamless integration in the representation and processing of diverse kinds of knowledge. Most concepts are an amalgam of diverse kinds of knowledge (which implies some uniformity in the representation and processing of diverse kinds of knowledge). The SP system is a good candidate for the role of UFK because of its versatility in the representation and processing of diverse kinds of knowledge.
  • 38. How Variety Hinders Learning. Discovering the association between lightning and thunder is likely to be difficult when: lightning appears in big data as a static image in one of several formats, or as a moving image in one of several formats, or is described, in spoken or written form, as any one of such things as “firebolt”, “fulmination”, “la foudre”, “der Blitz”, “lluched”, “a big flash in the sky”, or indeed “lightning”; and thunder is represented in one of several different audio formats, or is described, in spoken or written form, as “thunder”, “gök gürültüsü”, “le tonnerre”, “a great rumble”, and so on. If learning and discovery processes are going to work effectively, we need to get behind these surface forms and focus on the underlying meanings. This can be done using a UFK.
  • 39. Veracity. “In building a statistical model from any data source, one must often deal with the fact that data are imperfect. Real-world data are corrupted with noise. … Measurement processes are inherently noisy, data can be recorded with error, and parts of the data may be missing.” In tasks such as parsing or pattern recognition, the SP system is robust in the face of errors of omission, addition, or substitution.
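
As a rough illustration of what robustness to errors of omission, addition, or substitution means in pattern recognition, the Python sketch below recognises a stored word despite one corrupted character. It uses the standard library's `difflib` as a stand-in for approximate matching; it is not the SP system's multiple-alignment search, and `OLD_PATTERNS`, `recognise` and the cutoff value are assumptions chosen for the example.

```python
from difflib import get_close_matches

# Stored ("Old") patterns; the vocabulary is made up for this sketch.
OLD_PATTERNS = ["lightning", "thunder", "rumble"]

def recognise(new_pattern, old_patterns=OLD_PATTERNS, cutoff=0.6):
    """Return the best-matching stored pattern, or None if nothing is close enough."""
    matches = get_close_matches(new_pattern, old_patterns, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(recognise("lightnng"))   # omission: one letter missing
print(recognise("thundder"))   # addition: one extra letter
print(recognise("rumxle"))     # substitution: one letter replaced
print(recognise("zzzz"))       # nothing similar is stored
```

The first three calls still find the intended word, while the last returns `None` because no stored pattern is similar enough.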
  • 41. Veracity. When we learn a first language (L): we learn from a finite sample; we generalize (to L) without over-generalizing; we learn ‘correct’ knowledge despite ‘dirty data’.
  • 42. Veracity. For any body of data, I, principles of minimum-length encoding provide the key: aim to minimize the overall size of G and E. G is a distillation or ‘essence’ of I that excludes most ‘errors’ and generalizes beyond I. E + G is a lossless compression of I, including typos etc., but without generalizations. Systematic distortions remain a problem.
  • 43. Interpretation of Data. Processing I in conjunction with a pre-established grammar (G) to create a relatively compact encoding (E) of I. Depending on the nature of I and G, the process of interpretation may be seen to achieve: pattern recognition; information retrieval; parsing and production of natural language; translation from one representation to another; planning; problem solving.
  • 44. Velocity: Analysis of Streaming Data. In the context of big data, “velocity” means the analysis of streaming data as it is received. “This is the way humans process information.” This style of analysis is at the heart of how the SP system has been designed. Unsupervised learning.
  • 45. Visualizations. The SP system is well suited to visualization for these reasons: transparency in the representation of knowledge; transparency in processing; the system is designed to discover ‘natural’ structures in data. There is clear potential to integrate visualization with the statistical techniques that lie at the heart of how the SP system works.
  • 46. Conclusion. The SP theory, designed to simplify and integrate concepts across artificial intelligence, mainstream computing, and human perception and cognition, has potential in the management and analysis of big data. The SP system has potential as a universal framework for the representation and processing of diverse kinds of knowledge (UFK), helping to reduce the problem of variety in big data: the great diversity of formalisms and formats for knowledge, and how they are processed.
  • 47. Bibliography. www.cognitionresearch.org/sp.htm. Article: J G Wolff, “Big data and the SP theory of intelligence”, IEEE Access, 2, 301-315, 2014.