Combinatory Logic and Language Engineering
Ismail Biskri, Adam Joly and Boucif Amar Bensaber
LAMIA, Université du Québec à Trois-Rivières

Introduction
 Language engineering:
 Everything related to NLP and knowledge extraction
 Main goal: help humans access the knowledge contained in texts
 Definition:
 The study and description of the concepts, approaches, methods and
techniques that allow data extraction, knowledge modeling and knowledge acquisition
from texts
 Knowledge acquisition from text needs to be supported by corpus analysis
tools, such as:
 Semantic or syntactic analyzers
 Marker tracking tools supported by contextual exploration
 Statistical analyzers
 Etc.
 Numerous application fields since the development of the Web and
office tools
Biskri, Joly & Amar Bensaber, ICGST 2011

Introduction (2/3)
 Many generations of tools:
 At the beginning (about 40 years ago):
 Applications focusing on a single function
 Since the 90’s:
 Industry requires more complex approaches to text analysis
 There is interest in assembling functions and operations into complex processing
chains (Hallab & al., 2000; Moscarola & al., 2002)
 Most of the tools proposed since then offer a variety of functions
 Despite some success with scientists and industry, they have
important limitations:
 The technologies offer a closed and limited set of functionalities
 They are designed as autonomous entities that can hardly, or simply cannot, be
integrated into more complex processing chains
 They can be unusable for researchers with specific analysis needs (lack of
adaptability)

Introduction (3/3)
 Recently, a new generation of software platforms for language engineering
has started to emerge
 Statistical analysis:
 Aladin (Seffah & al., 1995)
 T2K and Knime (Warr, 2007)
 Linguistic Analysis:
 Context (Crispino & al., 1999)
 Gate (Cunningham & al., 2002)
 These new platforms raise new questions about processing chains,
concerning:
 Their coherence
 Their flexibility
 Their adaptability
 Etc.

General Framework
 Processing chain:
 Integrated sequence of computational modules dedicated to specific processing,
assembled in a pertinent order according to a processing goal determined by the
language engineer
 A module performs an operation that applies to one or more object
entities of a given type and returns object entities of another
type
 A processing chain allows the composition of modules
 We need a formal system that can answer 2 fundamental questions:
 Given a set of modules, what are the allowable arrangements which lead to
coherent processing chains?
 Given a coherent processing chain, how can we automate (as much as possible)
its evaluation (in the sense of its computability)?
 Such a system will be at the center of our theoretical model

General Framework
 Theoretical general framework chosen: Applicative
Grammars (Desclés, 1990; Shaumyan, 1998)
 Instead of designing a rewriting grammar for the syntactic
validation of the processing chain, we use a typed logic.
 Types are assigned to inputs/outputs (integer, char, …)
 Types constrain the possible compositions of modules
 Main advantages of this formalism:
 Ensures a sound composition of the modules in the processing chains,
by validating the types assigned to the modules
 Allows an unbounded number of modules to be composed
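To illustrate how types can constrain composition, here is a minimal Python sketch. It assumes each module declares a single input type and a single output type; the `Module` class, the `compose` function and the `tokenize` and `count` modules are hypothetical names used only for this example, not part of the authors' system.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Module:
    """Hypothetical module descriptor: a function plus its declared types."""
    name: str
    in_type: type
    out_type: type
    run: Callable


def compose(second: Module, first: Module) -> Module:
    """Build the chain 'first, then second' only if the types line up."""
    if first.out_type is not second.in_type:
        raise TypeError(
            f"{first.name} returns {first.out_type.__name__}, "
            f"but {second.name} expects {second.in_type.__name__}"
        )
    return Module(
        name=f"{second.name}({first.name})",
        in_type=first.in_type,
        out_type=second.out_type,
        run=lambda value: second.run(first.run(value)),
    )


# Illustrative modules (assumed, for the example only)
tokenize = Module("tokenize", str, list, str.split)
count = Module("count", list, int, len)

chain = compose(count, tokenize)  # str -> list -> int: accepted
print(chain.name, chain.run("combinatory logic and language engineering"))

try:
    compose(tokenize, count)      # an int fed to a str input: rejected
except TypeError as err:
    print("rejected:", err)
```

The composed chain is itself a `Module`, so it can be composed again, which mirrors the idea that typed composition can be repeated without bound.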

Combinatory Logic
Combinator   Role           β-reduction rule
B            Composition    B x y z → x (y z)
C            Permutation    C x z y → x y z
Φ            Distribution   Φ x y z u → x (y u) (z u)
W            Duplication    W x y → x y y
 From the work of Schönfinkel (1924) and Curry and Feys (1958)
 Eliminates the need for variables in mathematics
 Combinators:
 Abstract operators that apply to other operators in order to build more
complex operators;
 Act as functions over arguments, in an operator-operands structure
 Each specific action is represented by a unique rule that defines the
equivalence between a logical expression with a combinator versus one
without a combinator (β-reduction rule)
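The four reduction rules for B, C, Φ and W can be checked with curried Python functions. This is only a sketch: `PHI` stands for Φ, and the operators `inc`, `double`, `add` and `sub` are illustrative stand-ins for arbitrary operators.

```python
def B(x):
    """Composition: B x y z -> x (y z)."""
    return lambda y: lambda z: x(y(z))


def C(x):
    """Permutation: C x z y -> x y z."""
    return lambda z: lambda y: x(y)(z)


def PHI(x):
    """Distribution: PHI x y z u -> x (y u) (z u)."""
    return lambda y: lambda z: lambda u: x(y(u))(z(u))


def W(x):
    """Duplication: W x y -> x y y."""
    return lambda y: x(y)(y)


# Illustrative curried operators (assumptions for the example)
inc = lambda n: n + 1
double = lambda n: n * 2
add = lambda a: lambda b: a + b
sub = lambda a: lambda b: a - b

results = {
    "B": B(inc)(double)(5),           # inc (double 5)
    "C": C(sub)(1)(10),               # sub 10 1
    "PHI": PHI(add)(inc)(double)(5),  # add (inc 5) (double 5)
    "W": W(add)(7),                   # add 7 7
}
print(results)
```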

Combinatory Logic (2/3)
 Complex combinators:
 We can recursively combine many elementary combinators
to form an infinite range of complex combinators
 The global action is determined by the successive application of the
combinators (from left to right)
 Example:
i. B B C x y z u v
ii. B (C x) y z u v
iii. C x (y z) u v
iv. x u (y z) v
 Power combinators (χⁿ):
 Repeat the action of the combinator χ n times
 Distance combinators (χₙ):
 Postpone the action of the combinator χ by n steps
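The complex, power and distance combinators can be sketched in Python as follows. The slides do not spell out the recursive definitions, so the sketch assumes the usual ones: χ¹ = χ and χⁿ = B χ χⁿ⁻¹ for powers, χ₀ = χ and χₙ = B χₙ₋₁ for distances. The helper names `power` and `distance` and the sample operands are hypothetical.

```python
def B(x):
    """B x y z -> x (y z)"""
    return lambda y: lambda z: x(y(z))


def C(x):
    """C x z y -> x y z"""
    return lambda z: lambda y: x(y)(z)


def W(x):
    """W x y -> x y y"""
    return lambda y: x(y)(y)


def power(chi, n):
    """Assumed definition: chi^1 = chi, chi^n = B chi chi^(n-1)."""
    result = chi
    for _ in range(n - 1):
        result = B(chi)(result)
    return result


def distance(chi, n):
    """Assumed definition: chi_0 = chi, chi_n = B chi_(n-1)."""
    result = chi
    for _ in range(n):
        result = B(result)
    return result


# Complex combinator from the example: B B C x y z u v -> x u (y z) v
x = lambda a: lambda b: lambda c: (a, b, c)
y = lambda t: ("y", t)
example = B(B)(C)(x)(y)("z")("u")("v")

# Power: W^2 f a -> f a a a (the duplication is applied twice)
add3 = lambda a: lambda b: lambda c: a + b + c
squared_w = power(W, 2)(add3)(2)

# Distance: C_2 f a b c d -> f a b d c (the permutation is postponed by 2 steps)
quad = lambda p: lambda q: lambda r: lambda s: (p, q, r, s)
delayed_c = distance(C, 2)(quad)(1)(2)(3)(4)

print(example, squared_w, delayed_c)
```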

Combinatory Logic (3/3)
 Combinatory logic serves 2 major goals:
 It gives an interoperable and formal representation of the solution;
 Combinatory logic expressions formally represent the composition of the
modules of the processing chain and give the direct execution order
 Combinators provide operators to support the different types of
interactions between modules:
 B: expresses the composition of 2 interconnected modules
 C: ensures that all combinators and modules of the expression appear together
on the left and all inputs on the right (ordering)
 Φ: distributes the same input to 2 or more different modules
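As an illustration of the Φ case, the sketch below feeds one text to two modules and passes both results to a third. The modules `count_chars`, `count_words` and `ratio` are hypothetical and chosen only to make the data flow visible; `PHI` stands for Φ.

```python
def PHI(x):
    """Distribution: PHI x y z u -> x (y u) (z u)."""
    return lambda y: lambda z: lambda u: x(y(u))(z(u))


# Hypothetical modules (assumptions for the example)
count_chars = len
count_words = lambda text: len(text.split())
ratio = lambda chars: lambda words: chars / words

# The expression lists the modules on the left; the single input comes last.
chars_per_word = PHI(ratio)(count_chars)(count_words)
print(chars_per_word("ab cd"))
```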

Processing Chains
 Our model builds systems using metaprogramming:
 The metaprograms act as controllers over the programs (modules) by specifying the
interactions between modules and their execution flow
 The goal is to be able to easily replace a module by another one with
compatible inputs and outputs
 Module:
 It acts like a math function:
 It takes arguments as inputs
 It performs a specific action
 It returns a result as output
 Each module is independent (black box: we know what it does but we are not interested
in how)
 It must have the capacity to communicate with other modules following a protocol

Processing Chains (2/2)
 A controller supervises the flow of communication:
 It verifies the validity of connections between modules (i.e. whether the processing chain is
syntactically correct)
 It determines the execution order of modules (following the combinatory
expression)
 It triggers the execution of a module (one at a time only)
[Figure: two processing chains (Processing chain 1 and Processing chain 2) built from modules M1, M2, … with inputs I1, I2, … and outputs O1, O2, …, supervised by Controller 1; some boxes carry a double label (M3/C2, M2/C3)]
 By abstraction, a processing chain (the controller and its modules)
can be considered a (super or meta) module in itself
 Thus it can be used as a module in another processing chain
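The controller's three duties and the abstraction of a chain into a module can be sketched as below. This is a simplified illustration limited to serial chains; the `Module` and `Controller` classes and the `lower`, `split` and `size` modules are hypothetical, not the authors' implementation.

```python
from typing import Callable, NamedTuple


class Module(NamedTuple):
    """Hypothetical black-box module with declared input and output types."""
    name: str
    in_type: type
    out_type: type
    run: Callable


class Controller:
    """Supervises a serial chain: validates links, then runs modules in order."""

    def __init__(self, name, modules):
        self.name = name
        self.modules = list(modules)
        self.trace = []  # names of the modules, in the order they were triggered
        # Validity check: each output type must match the next input type.
        for upstream, downstream in zip(self.modules, self.modules[1:]):
            if upstream.out_type is not downstream.in_type:
                raise TypeError(
                    f"cannot connect {upstream.name} to {downstream.name}"
                )

    def run(self, value):
        for module in self.modules:  # one module at a time
            self.trace.append(module.name)
            value = module.run(value)
        return value

    def as_module(self):
        """Abstraction step: expose the whole chain as a single module."""
        return Module(self.name, self.modules[0].in_type,
                      self.modules[-1].out_type, self.run)


# Illustrative modules (assumed, for the example only)
lower = Module("lower", str, str, str.lower)
split = Module("split", str, list, str.split)
size = Module("size", list, int, len)

chain1 = Controller("chain1", [lower, split])
chain2 = Controller("chain2", [chain1.as_module(), size])  # chain1 reused as a module
result = chain2.run("Combinatory Logic")
print(result, chain2.trace, chain1.trace)
```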

Basic Processing Chains (1 module)
[Figure: module M1 with a single input I1 and output O1; module M1 with inputs I1, I2, …, In and output O1]
 1 input:
 No combinator needed
 O1 is obtained by applying M1 to I1
 O1 = M1 I1
 n inputs:
 We add the inputs at the end of the expression
 O1 = M1 I1 I2 … In

Serial processing chains
 Relation of composition between modules (B)
 2 connected modules:
 O1 = M1 I1
 O2 = M2 I2
 I2 = O1
 O2 = M2 (M1 I1)
 O2 = B M2 M1 I1
 3 connected modules:
 O3 = M3 I3
 I3 = O2
 O3 = M3 (B M2 M1 I1)
 O3 = B³ M3 B M2 M1 I1
 O3 = C B³ B M3 M2 M1 I1
 4 connected modules: O4 = C B⁴ (C B³ B) M4 M3 M2 M1 I1
 (…)
 The power of B is determined by the number of modules in the chain
[Figure: serial chain I1 → M1 → O1 = I2 → M2 → O2, and the same chain extended with O2 = I3 → M3 → O3]
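The serial expressions for 2, 3 and 4 connected modules can be checked against direct nesting with a short Python sketch. It reads the exponent on B as a power combinator and assumes the usual definition χ¹ = χ, χⁿ = B χ χⁿ⁻¹; the `power` helper and the four modules are hypothetical.

```python
def B(x):
    """B x y z -> x (y z)"""
    return lambda y: lambda z: x(y(z))


def C(x):
    """C x z y -> x y z"""
    return lambda z: lambda y: x(y)(z)


def power(chi, n):
    """Assumed definition: chi^1 = chi, chi^n = B chi chi^(n-1)."""
    result = chi
    for _ in range(n - 1):
        result = B(chi)(result)
    return result


# Hypothetical one-input modules (assumptions for the example)
M1 = str.lower
M2 = str.split
M3 = len
M4 = lambda n: n * 2

I1 = "To Be Or Not"

two = B(M2)(M1)(I1)                                          # B M2 M1 I1
three = C(power(B, 3))(B)(M3)(M2)(M1)(I1)                    # C B^3 B M3 M2 M1 I1
four = C(power(B, 4))(C(power(B, 3))(B))(M4)(M3)(M2)(M1)(I1)

print(two, three, four)
```

In each expression the modules appear on the left and the input on the right, and evaluating it gives the same result as nesting the calls by hand, e.g. `M3(M2(M1(I1)))`.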

Parallel processing chains
 Contain modules that have several inputs
 A module connected to the 1st input of a 2nd module:
 O2 = M2 I2 I3
 O1 = M1 I1
 I2 = O1
 O2 = M2 (M1 I1) I3
 O2 = B M2 M1 I1 I3
 2 modules connected to a 3rd module:
 O3 = M3 I3 I4
 I3 = M1 I1
 I4 = M2 I2
 O3 = M3 (M1 I1) (M2 I2)
 O3 = B M3 M1 I1 (M2 I2)
 O3 = C₂ B M3 M1 (M2 I2) I1
 O3 = B₃ C₂ B M3 M1 M2 I2 I1
 3 modules connected to a 4th module: B₇ C₆ C₆ B₃ C₂ B M4 M1 M2 M3 I3 I2 I1
 (…)
 The distances of combinators B and C are determined by the number of modules
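[Figure: module M1 (input I1) feeding the 1st input of M2, whose 2nd input is I3; modules M1 (input I1) and M2 (input I2) feeding the two inputs I3 and I4 of M3]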
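The parallel expressions can be checked in the same way. The sketch below reads the indices on B and C as distance combinators and assumes the usual definition χ₀ = χ, χₙ = B χₙ₋₁; the `distance` helper and the modules are hypothetical, and the modules here are named after the 3-modules-to-a-4th case.

```python
def B(x):
    """B x y z -> x (y z)"""
    return lambda y: lambda z: x(y(z))


def C(x):
    """C x z y -> x y z"""
    return lambda z: lambda y: x(y)(z)


def distance(chi, n):
    """Assumed definition: chi_0 = chi, chi_n = B chi_(n-1)."""
    result = chi
    for _ in range(n):
        result = B(result)
    return result


# Hypothetical modules (assumptions for the example)
M1 = lambda text: len(text.split())           # word count
M2 = len                                      # character count
M3 = str.upper
pair = lambda a: lambda b: (a, b)             # a module with 2 inputs
triple = lambda a: lambda b: lambda c: (a, b, c)  # a module with 3 inputs

I1, I2, I3 = "a b c", "hello", "x"
B3, C2, C6, B7 = distance(B, 3), distance(C, 2), distance(C, 6), distance(B, 7)

# 2 modules connected to a 3rd: B_3 C_2 B M3 M1 M2 I2 I1 (here the 3rd is `pair`)
two_to_one = B3(C2)(B)(pair)(M1)(M2)(I2)(I1)

# 3 modules connected to a 4th: B_7 C_6 C_6 B_3 C_2 B M4 M1 M2 M3 I3 I2 I1
three_to_one = B7(C6)(C6)(B3)(C2)(B)(triple)(M1)(M2)(M3)(I3)(I2)(I1)

print(two_to_one, three_to_one)
```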

A Complex Processing Chain
[Figure: a complex processing chain of seven modules M1 to M7, with inputs I1 to I9 and outputs O1 to O7]
 B₃ C₂ B M7 (B M6 M4) (B₃ C₂ B M5 M2 (B M3 M1)) I4 I2 I1

SATIM
 Following these formalisms and principles, we have implemented a prototype
(work in progress) named SATIM.
 SATIM: « Système d’Analyse et de Traitement de l’Information Multidimensionnelle »
(Multidimensional Data Analysis and Processing System)
 The architecture of this modular platform provides 3 levels of interaction with a
language engineer:
1. Workshop:
 Contains various modules, procedures and functions, with their assigned applicative categories
 Modules can be added to or deleted from a « database » of modules
2. Laboratory:
 Allows an engineer to build a processing chain and adjust it through tests, according to the
objective
3. Application:
 The output of the previous level: the processing chain becomes standalone software that
contains a coherent and well-organized subset of modules

Conclusion
 We are at the prototype/test stage
 Eventually, it will become the full-scale project within which we aim
to design tools for language engineering and other tools for NLP in
general
 The strong foundations (formalism and principles) at the heart of SATIM
aim to address the need for coherence, flexibility, adaptability and
easy communication between programs (processing chains):
 Modules are independent: we can easily replace a module with another one
that has compatible inputs and outputs, to change parts of a given program
 We believe that the approach could help research teams collaborate
by sharing components
