Fusepool Machine Learning Framework

June 25th, Brussels

Fusepool
Structured Content
Visualization
Enable personalized software

Outline
Introduction to adaptive interfaces
Source refinement
Document labeling
Link prediction
Adaptive layout
Simple Machine Learning: Listen-Update-Predict (LUP)
LUP in detail for document labeling
Predictive queries

Adaptive interfaces
Guillaume Bouchard (Xerox)

Customization/contextualization of interfaces
Known and accepted by big internet companies
Not easy to implement for SMEs

Annotation tools
●To manage large knowledge bases, there is a need for efficient interfaces for annotators
●Web 2.0 companies are investigating these tools
●Mixed initiative
oA learning algorithm + human interface
●Remark: a user can be an annotator for some time

Supervised automation
Introduction
Challenge
LOD (Linked Open Data) provides a huge amount of data
Hard to organize
Goal
Streamline KB cleaning and management through implicit and explicit feedback
Specifications
Easy tagging of documents
Near real-time prediction

Adaptive components in Fusepool
Document category prediction
Entity labeling
Source refinement (re-ranking based on previous user clicks)
Adaptive Layout
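
As an illustration of source refinement, re-ranking from previous clicks can be sketched as a stable sort on click counts. This is a minimal sketch under assumptions: the class name `ClickReRanker`, the use of raw click counts as the only signal, and the tie-breaking rule are hypothetical and not taken from the Fusepool code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical helper: not part of the Fusepool platform.
final class ClickReRanker {
    private ClickReRanker() {}

    /**
     * Returns a copy of the sources ordered by decreasing click count.
     * Sources never clicked count as 0; ties keep their original order
     * because List.sort is stable.
     */
    static List<String> reRank(List<String> sources, Map<String, Integer> clicks) {
        List<String> ranked = new ArrayList<>(sources);
        ranked.sort(Comparator
                .comparingInt((String s) -> clicks.getOrDefault(s, 0))
                .reversed());
        return ranked;
    }
}
```

A real component would probably discount old clicks and combine them with the original relevance score; the sketch only shows the feedback loop from clicks to ranking.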

Simple Machine Learning:
Listen-Update-Predict (LUP)

Motivation
●Adaptive systems
●Many systems use machine learning algorithms as internal components
●The interaction between raw data, annotations, algorithms and predictions is not simple:
• Data: large and distributed (the 3 Vs: Velocity, Variety, Volume)
• Algorithms: multiple possible algorithms for the same task, slow training/inference
• Visualization: must carry the uncertainty about data, annotations and predictions
●Common problems:
• Confusion between predictions and data
• Models not automatically updated (models must be re-trained manually)
• No simple way to test new algorithms
• Annotations not shared across models in the same system
• Too few annotations in specific domains (no principled way to gather new annotations)

Prior art
• Patterns (and Anti-Patterns) for Developing Machine Learning Systems. SysML 2008
• https://www.usenix.org/legacy/event/sysml08/tech/rios_talk.pdf
• The Agent Learning Pattern: Implementing ML algorithms in multiagent systems
• http://www.cs.cmu.edu/~alberto/papers/LearningPatternSugarLoaf.pdf
• Gestalt, a general-purpose integrated development environment designed for the application of machine learning
• Kayur Patel (University of Washington)
• http://www.acm.org/uist/archive/adjunct/2010/pdf/doctoral_consortium/p355.pdf
• Scikit-learn. Three complementary interfaces: Estimator, Predictor, Transformer
• http://hal.inria.fr/docs/00/85/65/11/PDF/paper.pdf
• Infer.net: Probabilistic programming. Compilation of machine learning codes
• http://research.microsoft.com/en-us/um/people/cmbishop/downloads/bishop-mbml-2012.pdf
• Never-Ending Language Learning (NELL). The closest to our work but focused on language
• www.cs.cmu.edu/~acarlson/papers/carlson-aaai10.pdf

Never Ending Language Learning
●Intelligent computer agent
●Runs forever. Every day:
1. extract, or read, information from the web
2. learn to perform this task better
●Carlson, Betteridge, Kisiel, Settles, Hruschka and Mitchell (2010) give the design principles for such an agent

LUPI Module overview
Listen
Gets notified when new annotations arrive
Update
Processes annotations and updates learning models
Predict
Exposes a prediction service available to other components
Investigate
Actively ask for new annotations
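
One common way to decide which annotation to ask for in the Investigate step is uncertainty sampling: request a label for the document whose current prediction is least confident. The sketch below assumes a binary label and a map from document URI to predicted probability. The class name `UncertaintySampler` and the choice of uncertainty sampling are assumptions for illustration; the slides do not say which strategy Fusepool uses.

```java
import java.util.Map;

// Hypothetical helper: uncertainty sampling for a binary label.
final class UncertaintySampler {
    private UncertaintySampler() {}

    /**
     * Returns the document whose predicted probability is closest to 0.5,
     * i.e. the one the current model is least sure about.
     * Returns null when there is no candidate.
     */
    static String mostUncertain(Map<String, Double> probabilityByDoc) {
        String pick = null;
        double smallestMargin = Double.POSITIVE_INFINITY;
        for (Map.Entry<String, Double> e : probabilityByDoc.entrySet()) {
            double margin = Math.abs(e.getValue() - 0.5);
            if (margin < smallestMargin) {
                smallestMargin = margin;
                pick = e.getKey();
            }
        }
        return pick;
    }
}
```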

LUP modules are monitored by the Fusepool main platform

LUP Module Implementation
●LUPEngine is a Java interface
●Location: com.xerox.services.LUPEngine
o + getGraphListener(...);
o + graphChanged(...);
o + updateModels(...);
o + predict(...);

Follow the LUP
Listen
Users give labels to documents in the GUI
Labels stored in annotation store
Update
Optimize the model with latest annotations
Warm start machine learning algorithms
Predict
Real time prediction based on updated model
Visible in the GUI
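
The three steps above can be sketched end to end with a toy in-memory model. This is only an illustration under assumptions: `ToyLabelLup` and its method names are hypothetical, and the word-count scorer stands in for the real learning algorithm, which the slides do not detail. The point is the warm start: the weights survive between updates, so each new annotation refines the existing model instead of retraining from scratch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy LUP module: a word-count scorer per label.
final class ToyLabelLup {
    // label -> (word -> weight); kept across updates (warm start)
    private final Map<String, Map<String, Double>> weights = new HashMap<>();

    /** Listen: called when a user labels a document in the GUI. */
    void onAnnotation(String text, String label) {
        update(text, label);
    }

    /** Update: increments the weights of the annotated label in place. */
    void update(String text, String label) {
        Map<String, Double> w = weights.computeIfAbsent(label, k -> new HashMap<>());
        for (String word : tokenize(text)) {
            w.merge(word, 1.0, Double::sum);
        }
    }

    /** Predict: returns the best-scoring label, or null if nothing was learned yet. */
    String predict(String text) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Map<String, Double>> e : weights.entrySet()) {
            double score = 0.0;
            for (String word : tokenize(text)) {
                score += e.getValue().getOrDefault(word, 0.0);
            }
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best;
    }

    private static String[] tokenize(String text) {
        return text.toLowerCase().split("\\W+");
    }
}
```

Usage: call `onAnnotation(...)` for each labeled document, then `predict(...)` at any time; predictions always reflect the latest annotations.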

Architecture
Components Process

Xerox web services
Update and prediction using REST interface
Scaling up prediction to huge datasets

Listen
private class MyListener implements GraphListener {
    /**
     * Listener method: called when matching modifications are detected on
     * the Annostore. This method triggers the learning process, using
     * the updateModels(HashMap<String,String> params) method.
     */
    @Override
    public void graphChanged(List<GraphEvent> list) {
        annostore = tcManager.getMGraph(ANNOTATION_GRAPH_NAME);
        for (GraphEvent e : list) {
            log.info("New #MyKindOfAnnotation !");
            HashMap<String, String> params = new HashMap<String, String>();
            // 1.) Accessing the target of the annotation
            Iterator<Triple> it = annostore.filter(e.getTriple().getSubject(),
                    new UriRef("http://www.w3.org/ns/oa#hasTarget"),
                    null);
            if (!it.hasNext()) {
                continue; // annotation without a target: nothing to learn from
            }
            // 2.) Accessing the content as text of the target
            //     e.g. the new word to insert into the dictionary
            Resource target = it.next().getObject();
            it = annostore.filter((NonLiteral) target,
                    new UriRef("http://www.w3.org/2011/content#chars"),
                    null);
            if (!it.hasNext()) {
                continue; // target without text content
            }
            String newWord = it.next().getObject().toString();
            params.put("newWord", newWord);
            updateModels(params);
        }
    }
}

Update
/**
 * This method updates the learning models.
 */
public void updateModels(HashMap<String, String> params) {
    String newWord = params.get("newWord");
    log.info("Adding " + newWord + " to dictionary");
    myDictionnary.add(newWord);
}

Predict
HashMap<String, String> params = new HashMap<String, String>();
String docURI = "<http://fusepool.info/doc/pmc/2751467>";
// Build the parameters to pass to LUP3.4 via the predictionHub
params.put("docURI", docURI);
// Call the LUP34.predict(...) method via the predictionHub.predict(...) method
String predictedLabels = predictionHub.predict("LUP34", params);
// Dump the result of the prediction
log.info(predictedLabels);
// e.g. "tissue__0.713##sodium__0.09135##English__0.016"
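
The prediction comes back as a single string, so a caller has to split it into labels and scores. The sketch below assumes, based only on the one example result shown above, that entries are separated by `##` and that each entry is `label__score`. The class `PredictionParser` is hypothetical and not part of the Fusepool API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: the "##" and "__" separators are inferred
// from the single example result shown on the slide.
final class PredictionParser {
    private PredictionParser() {}

    /** Parses "label__score##label__score" into a map that keeps the original order. */
    static Map<String, Double> parse(String predictedLabels) {
        Map<String, Double> scores = new LinkedHashMap<>();
        if (predictedLabels == null || predictedLabels.isEmpty()) {
            return scores;
        }
        for (String entry : predictedLabels.split("##")) {
            int sep = entry.lastIndexOf("__");
            if (sep < 0) {
                continue; // skip entries without a score separator
            }
            String label = entry.substring(0, sep);
            // Throws NumberFormatException if the score is not a number
            double score = Double.parseDouble(entry.substring(sep + 2));
            scores.put(label, score);
        }
        return scores;
    }

    public static void main(String[] args) {
        parse("tissue__0.713##sodium__0.09135##English__0.016")
                .forEach((label, score) -> System.out.println(label + " -> " + score));
    }
}
```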

Multi-task learning services
● Better prediction based on a multi-task algorithm with label embedding
● Efficient learning algorithms
o Alternating optimization
o Stochastic Gradient Descent
● Efficient storage based on Cassandra
Sequence diagram
1. The GUI inserts annotations
2. The Listener calls the LUP3.4 module
3. The LUP calls the REST API
4. Then the information flows back when doing prediction

Properly tested interface
Corpus      20 Newsgroups          WebKB                  Cade
Tolerance   1      2      3        1      2      3        1      2
Rank = 20   0.152  0.074  0.05     0.15   0.055  0.035    0.348  0.222
Rank = 50   0.16   0.072  0.052    0.2    0.085  0.04     0.386  0.266
Rank = 100  0.256  0.166  0.126    0.335  0.18   0.11     0.134  0.072

Predictive queries

Motivation for predictive queries
Most prediction problems can be expressed as a query on “missing” information.
SELECT ?n WHERE
<?d, hasLabel, “WellWritten”>
<?p, isAuthor, ?d>
<?p, hasName, ?n>

Semantic Search API
Predictive SPARQL
Core idea: learn a model on KB
Now we can query missing data!
● SPARQL is a standard query language for semantic data
● Predictive SPARQL: generalization to probabilistic models

Semantic Search API
Predictive SPARQL example

Semantic Search API
Predictive model
● Use of tensor factorization methods
● Tensor = generalization of matrices
● Scalable probabilistic models
● Based on the RESCAL approximation:
$T_{ikj} \approx e_i^{\top} R_k e_j$
where:
o $e_i$ and $e_j$ are the vectors of entities $i$ and $j$
o $R_k$ is the relational matrix of relation $k$
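
The bilinear score in the approximation above can be computed directly. This is a minimal sketch, assuming the entity vectors and the relation matrix have already been learned; the class `RescalScore` is hypothetical, and the sketch covers scoring only, not the factorization itself.

```java
// Hypothetical helper: evaluates the RESCAL-style score e_i^T R_k e_j.
final class RescalScore {
    private RescalScore() {}

    /**
     * Computes e_i^T R_k e_j.
     * Assumes rk has ei.length rows and ej.length columns.
     */
    static double score(double[] ei, double[][] rk, double[] ej) {
        double total = 0.0;
        for (int a = 0; a < ei.length; a++) {
            double rowDot = 0.0; // (R_k e_j)[a]
            for (int b = 0; b < ej.length; b++) {
                rowDot += rk[a][b] * ej[b];
            }
            total += ei[a] * rowDot;
        }
        return total;
    }
}
```

A predictive query can then rank candidate triples by this score: a higher value means the model considers the triple (entity i, relation k, entity j) more plausible.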

Conclusion

Main achievements
● LUP: Listen-Update-Predict is a design pattern that provides software engineering best practices
● Predictive SPARQL: a framework for predictive queries on RDF data

Future of Fusepool
Xerox is using Fusepool for exploring and
organizing its customer KB
