Shifu plugin-trainer and pmml-adapter

Shifu-Plugin Demo
Lisa Hua
7/21/2014

Recap
1. Convert PMML back to ML model
2. Integrate into Shifu as Shifu-plugin-*
3. Add examples
4. Performance test for the PMML evaluator

Miscellaneous
1. Compatibility issue:
Spark depends on Akka 2.2.3, while Shifu uses 2.1.1
2. Spark overview
3. About the showcase:
a. a video that introduces Shifu
b. a poster that describes my project
c. a project title and project description

PMML Adapter Demo
Lisa Hua
06/23/14
ML Framework | Neural Network | Logistic Regression | SVM | Decision Tree
Encog        | Support        | Support             | TBD | None
Spark        | None           | Support             | TBD | TBD
Mahout       | Support        | Support             | TBD | TBD
H2o          | TBD            | None                | TBD | TBD

Outline
1. Neural Network Model Conversion
a. Encog NN model
b. Mahout NN model
2. Logistic Regression Model Conversion
a. Encog LR model (NN)
b. Spark LR model
c. Mahout LR model
3. PMML Adapter API and how to extend
PMML Adapter

protected void initMLModel() {
    // ...
    mlModel = new MultilayerPerceptron();
    // addLayer(layer size, isFinalLayer, squashing function name)
    mlModel.addLayer(20, false, "Identity");
    mlModel.addLayer(45, false, "Sigmoid");
    mlModel.addLayer(45, false, "Sigmoid");
    mlModel.addLayer(1, true, "Sigmoid");
    for (MahoutData data : inputDataSet) {
        mlModel.trainOnline(data.getInput());
        // ...
    }
}

protected void adaptToPMML() {
    // ...
    Matrix[] matrixList = mlModel.getWeightMatrices();
    // ...
}
Squashing functions:
1. Only Identity and Sigmoid are supported for now.
2. squashFunctionList is protected and has no getter, so for now the
activationFunction is set to sigmoid by default.
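Because the squashing functions cannot be read back from the trained model, the adapter has to choose a PMML activationFunction itself. A minimal sketch of that mapping, assuming the Mahout names "Identity" and "Sigmoid" used above and the PMML values "identity" and "logistic"; SquashFunctionMapper and toPmmlActivation are hypothetical names:

```java
final class SquashFunctionMapper {

    // Maps a Mahout squashing-function name (as passed to addLayer) to a PMML
    // activationFunction value. Only Identity is special-cased; every other
    // name, including Sigmoid and null, falls back to "logistic", mirroring
    // the adapter's sigmoid default.
    static String toPmmlActivation(String mahoutName) {
        return "Identity".equalsIgnoreCase(mahoutName) ? "identity" : "logistic";
    }

    public static void main(String[] args) {
        System.out.println(toPmmlActivation("Identity")); // identity
        System.out.println(toPmmlActivation("Sigmoid"));  // logistic
    }
}
```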
Mahout NN Model - trainOnline()
// in the adapter
for (int k = 1; k < columnSize; k++) {
    neuron.withConnections(new Connection(matrix.get(j, k)));
}
// bias neuron of each layer (its output is fixed at 1); its weight is in column 0
neuron.withConnections(new Connection(matrix.get(j, 0)));
The bias is the first neuron (index 0) in every layer except the final layer.
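The loop above can be sketched end to end as plain PMML text, assuming the weight-matrix layout it implies (row j is neuron j of the next layer, column 0 is the bias weight, column k ≥ 1 is the weight from neuron k-1 of the previous layer) and the "layer,index" neuron ids used in the PMML skeleton. Unlike the adapter, which adds the bias as an extra connection, this sketch writes it to the Neuron's bias attribute; both give the same weighted sum as long as the bias connection's source outputs 1. MahoutLayerToPmml is a hypothetical name, and double[][] stands in for Mahout's Matrix:

```java
final class MahoutLayerToPmml {

    // Builds a PMML <NeuralLayer> fragment from one weight matrix.
    // Assumed layout: row j = neuron j of layer `layer`; column 0 = bias weight;
    // column k >= 1 = weight from neuron k-1 of layer `layer - 1`.
    static String toNeuralLayer(double[][] weights, int layer, String activation) {
        StringBuilder xml = new StringBuilder();
        xml.append("<NeuralLayer activationFunction=\"").append(activation).append("\">\n");
        for (int j = 0; j < weights.length; j++) {
            // Bias goes into the bias attribute here (the adapter instead adds
            // it as an extra Con and leaves bias at 0.0).
            xml.append("  <Neuron id=\"").append(layer).append(',').append(j)
               .append("\" bias=\"").append(weights[j][0]).append("\">\n");
            for (int k = 1; k < weights[j].length; k++) {
                xml.append("    <Con from=\"").append(layer - 1).append(',').append(k - 1)
                   .append("\" weight=\"").append(weights[j][k]).append("\"/>\n");
            }
            xml.append("  </Neuron>\n");
        }
        return xml.append("</NeuralLayer>\n").toString();
    }

    public static void main(String[] args) {
        // Hypothetical layer 1 with a single neuron: bias 0.5, two input weights
        double[][] w = {{0.5, 1.0, -1.0}};
        System.out.print(toNeuralLayer(w, 1, "logistic"));
    }
}
```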

protected void evaluatePMML() {
    for (int i = 0; i < mahoutDataSet.size(); i++) {
        Assert.assertEquals(
                getPMMLEvaluatorResult(pmmlEvalResultList.get(i)),
                getMahoutResult(mahoutDataSet.get(i)), DELTA); // DELTA = 1e-5
    }
}

private double getMahoutResult(MahoutData data) {
    return mlModel.getOutput(data.getEvalInput()).get(0);
}
Mahout NN Model - getOutput()

Encog LR Model - compute()
    lrModel = (BasicNetwork) networkReader.read(new FileInputStream("EncogLR.lr"));
}
    double[] weights = lrModel.getWeights();
    ...
}

protected void evaluatePMML() {
    for (int i = 0; i < dataSet.size(); i++) {
        Assert.assertEquals(getPMMLEvaluatorResult(i),
                getNextEncogLRResult(mlResultIterator), DELTA);
    }
}

private double getNextEncogLRResult(Iterator<MLDataPair> mlResultIterator) {
    // run the next record through the network; the LR score is output 0
    MLData result = lrModel.compute(mlResultIterator.next().getInput());
    return result.getData(0);
}

Spark LR Model: train() and predict()
    lrModel = LogisticRegressionWithSGD.train(points.rdd(), iterations, stepSize);
}
    double[] weights = lrModel.weights();
    ...
}

protected void evaluatePMML() {
    ...
    List<Double> evalList = lrModel.predict(evalRDD).cache().collect();
    for (...) {
        Assert.assertEquals(getPMMLEvaluatorResult(i), evalList.get(i), DELTA);
    }
}
Notes:
1. The method lrModel.weights() returns the intercept followed by the weight list.
2. Compatibility issue:
Spark depends on Akka 2.2.3, while Shifu uses 2.1.1. Changing the Akka version of shifu-core from 2.1.1 to 2.2.3 currently breaks the build. Based on the build history, I suspect the problem lies in Guagua, but the root cause is still unknown.
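Taking note 1 at face value, the PMML regression coefficients come from splitting the weight array: the first entry is the intercept, the rest are per-feature coefficients. The sketch below scores one record that way, assuming a plain double[] of weights (the function summary lists weights(): double[]). The exact return type of weights(), and whether predict() returns this probability or a thresholded 0/1 label, may vary by Spark version, so treat this as an illustration; SparkLrScore is a hypothetical name:

```java
final class SparkLrScore {

    // Scores one record from a raw weight array laid out as
    // [intercept, w_1, ..., w_n] (the layout described in note 1).
    // Returns the logistic probability, not a thresholded 0/1 label.
    static double score(double[] rawWeights, double[] features) {
        if (rawWeights.length != features.length + 1) {
            throw new IllegalArgumentException(
                    "expected an intercept plus one weight per feature");
        }
        double z = rawWeights[0]; // intercept
        for (int i = 0; i < features.length; i++) {
            z += rawWeights[i + 1] * features[i];
        }
        return 1.0 / (1.0 + Math.exp(-z));
    }

    public static void main(String[] args) {
        double[] raw = {1.0, 2.0}; // intercept 1.0, one coefficient 2.0
        System.out.println(score(raw, new double[] {-0.5})); // z = 0, prints 0.5
    }
}
```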

Mahout LR Model - train() and classifyScalar()
    lrModel = new OnlineLogisticRegression(2, 20, new L1());
    // numCategories, numFeatures, prior function
    for (MahoutDataPair pair : inputDataSet) {
        lrModel.train(pair.getActual(), pair.getFeatureField());
    }
}
    Matrix matrix = lrModel.getBeta(); // coefficients: a dense matrix of size
                                       // (numCategories - 1) x numFeatures
}

private double getMahoutResult(MahoutDataPair data) {
    // returns a single scalar probability when there are exactly two categories
    return lrModel.classifyScalar(data.getVector());
}
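For two categories, classifyScalar can be reproduced from getBeta(): the beta matrix has a single row, and the score is the logistic function of that row's dot product with the feature vector. OnlineLogisticRegression has no separate intercept term, so an intercept has to come from a constant feature. A minimal sketch with plain arrays in place of Mahout's Matrix and Vector; Mahout may also apply pending regularization inside classifyScalar, so coefficients read from getBeta() before that step can differ slightly. MahoutLrScore is a hypothetical name:

```java
final class MahoutLrScore {

    // Two categories: beta is 1 x numFeatures, so only row 0 is needed.
    // Score = logistic(beta[0] . x); there is no separate intercept term.
    static double classifyScalar(double[] betaRow0, double[] x) {
        if (betaRow0.length != x.length) {
            throw new IllegalArgumentException("beta and x must have the same length");
        }
        double r = 0.0;
        for (int i = 0; i < x.length; i++) {
            r += betaRow0[i] * x[i];
        }
        // Numerically stable logistic: avoids exp overflow for large |r|
        if (r < 0.0) {
            double s = Math.exp(r);
            return s / (1.0 + s);
        }
        return 1.0 / (1.0 + Math.exp(-r));
    }

    public static void main(String[] args) {
        double[] beta = {0.8, -0.4}; // hypothetical coefficients
        double[] x = {1.0, 2.0};     // 0.8 - 0.8 = 0, so the score is 0.5
        System.out.println(classifyScalar(beta, x));
    }
}
```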

Summary of Evaluation Dataset
Model               | ML Framework     | Input Data Fields | Input Data | Evaluation Data | Nodes in Each Layer
Neural Network      | Encog, 2 layers  | 20                | 450        | 118 / 560       | 20,45,45,1
Neural Network      | Encog, 3 layers  | 25                | 450        | 550             | 25,20,15,20,1
Neural Network      | Mahout, 2 layers | 20                | 450        | 118 / 560       | 20,45,45,1
Neural Network      | Mahout, 3 layers | 25                | 450        | 550             | 25,20,15,20,1
Logistic Regression | Encog            | 20                | 450        | 118 / 560       | -
Logistic Regression | Spark            | 20                | 450        | 118 / 560       | -
Logistic Regression | Mahout           | 20                | 450        | 118 / 560       | -

Summary of the Functions
Framework | Model                                | Class Name               | Parent Class/Interface                      | Training Method                    | Retrieve Training Result      | Evaluation Method                       | Basic Data Structure
Encog     | Neural Network / Logistic Regression | BasicNetwork             | MLClassification                            | compute(MLDataSet data)            | getWeights(): double[]        | compute()                               | MLData: Double[]; MLDataSet: Set<Double[]>
Spark     | Logistic Regression                  | LogisticRegressionModel  | GeneralizedLinearModel, ClassificationModel | train(RDD data)                    | weights(): double[]           | predict(RDD<Vector>): RDD<Double>       | RDD: Resilient Distributed Dataset
Mahout    | Neural Network                       | MultilayerPerceptron     | NeuralNetwork                               | trainOnline(Vector instance)       | getWeightMatrices(): Matrix[] | getOutput(Vector): Vector               | Vector; Matrix: List<Vector>
Mahout    | Logistic Regression                  | OnlineLogisticRegression | AbstractOnlineLogisticRegression            | train(int actual, Vector instance) | getBeta(): Matrix             | classifyScalar(Vector instance): double | Vector; Matrix: List<Vector>

3. PMML Adapter API
1. For a new ML model conversion:
a. Implement a subclass of PMMLModelBuilder<TargetPMMLModel, SourceMLModel> and implement adaptMLModelToPMML().
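A minimal sketch of the extension pattern. The real PMMLModelBuilder signature is not shown in this deck, so the interface below, the (source, target) parameters of adaptMLModelToPMML, and the Toy* classes are all hypothetical stand-ins; the point is that each new conversion is one class bound to a (target PMML model, source ML model) pair:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for PMMLModelBuilder<TargetPMMLModel, SourceMLModel>;
// the real Shifu type may be a class and may use a different signature.
interface PMMLModelBuilder<T, S> {
    T adaptMLModelToPMML(S mlModel, T pmmlModel);
}

// Toy source model: an intercept plus one coefficient per feature.
final class ToyLrModel {
    final double intercept;
    final double[] coefficients;

    ToyLrModel(double intercept, double[] coefficients) {
        this.intercept = intercept;
        this.coefficients = coefficients.clone();
    }
}

// Toy target model standing in for a PMML RegressionModel's RegressionTable.
final class ToyRegressionTable {
    double intercept;
    final List<String> numericPredictors = new ArrayList<>();
}

// The new adapter: copies the trained parameters into the target model.
final class ToyLrToPmmlBuilder implements PMMLModelBuilder<ToyRegressionTable, ToyLrModel> {
    @Override
    public ToyRegressionTable adaptMLModelToPMML(ToyLrModel mlModel, ToyRegressionTable table) {
        table.intercept = mlModel.intercept;
        for (int i = 0; i < mlModel.coefficients.length; i++) {
            // placeholder field names; a real adapter would take them from the MiningSchema
            table.numericPredictors.add("field_" + i + "=" + mlModel.coefficients[i]);
        }
        return table;
    }
}
```

Supporting another framework would then mean adding another builder class, for example one bound to a Mahout model type, without touching the existing ones.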

Next Step
● Support: supported by the PMML Adapter
● None: the ML framework does not currently support this ML model
● TBD: to be determined
ML Framework | Neural Network | Logistic Regression | SVM | Decision Tree
Encog        | Support        | Support             | TBD | None
Spark        | None           | Support             | TBD | TBD
Mahout       | Support        | Support             | TBD | TBD
H2o          | TBD            | None                | TBD | TBD

1. PMML skeleton - Neural Network
<PMML>
  <Header></Header>
  <DataDictionary></DataDictionary>  <!-- specifies the format of the input CSV -->
  <NeuralNetwork functionName="classification" activationFunction="logistic">  <!-- the model -->
    <MiningSchema></MiningSchema>  <!-- how the input data is used -->
    <LocalTransformations></LocalTransformations>  <!-- derived fields -->
    <NeuralInputs>  <!-- input layer: which fields are used -->
      <NeuralInput></NeuralInput>
    </NeuralInputs>
    <!-- one NeuralLayer per hidden layer plus the output layer; the input layer is NeuralInputs -->
    <NeuralLayer activationFunction="logistic">
      <Neuron id="X,Y" bias="0.0">
        <Con from="X-1,Y" weight=""/>
      </Neuron>
    </NeuralLayer>
    <NeuralOutputs numberOfOutputs="1">
      <NeuralOutput outputNeuron="3,0"></NeuralOutput>
    </NeuralOutputs>
  </NeuralNetwork>
</PMML>

2.1 PMML Neural Network - Mahout
Layer sizes: 2,3,1
Weight matrix 0 (input layer to hidden layer):
{
0 => {0:-0.2861259717601905,1:-0.4079344783742465,2:-0.43218273192749174}
1 => {0:0.223912887382075,1:-0.08865866120943716,2:0.4095464158191267}
2 => {0:0.14754755237008804,1:0.2638192545136143,2:0.06633581725392071}
}
Weight matrix 1 (hidden layer to output layer):
{
0 => {0:0.04388751672411058,1:-0.35597268769777723,2:0.21149680575173224,3:0.34402628331423807}
}
Outputs:
0.5635827615510126
0.5482023969601073
0.5609684690326279
0.5751568027254008
Framework | Propagation     | Weights  | Train                       | Evaluate
Encog     | backpropagation | double[] | MLTrain/Propagation         |
Mahout    | feed-forward    | Matrix   | network.trainOnline(vector) | network.getOutput(vector)
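To make the 2,3,1 weight dump concrete, here is a forward pass over those two matrices. It assumes the layout Mahout appears to use (row j is neuron j of the next layer, column 0 is the bias weight, column k ≥ 1 is the weight from neuron k-1 of the previous layer) and sigmoid activations on both the hidden and output layers; Mahout231Forward is a hypothetical name:

```java
final class Mahout231Forward {

    // Weight matrices copied from the 2,3,1 dump above.
    static final double[][] W0 = {
        {-0.2861259717601905, -0.4079344783742465, -0.43218273192749174},
        {0.223912887382075, -0.08865866120943716, 0.4095464158191267},
        {0.14754755237008804, 0.2638192545136143, 0.06633581725392071}
    };
    static final double[][] W1 = {
        {0.04388751672411058, -0.35597268769777723, 0.21149680575173224, 0.34402628331423807}
    };

    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // One layer: out[j] = sigmoid(w[j][0] + sum over k >= 1 of w[j][k] * in[k-1]).
    static double[] layer(double[][] w, double[] in) {
        double[] out = new double[w.length];
        for (int j = 0; j < w.length; j++) {
            double z = w[j][0]; // bias neuron outputs 1, so its weight is added directly
            for (int k = 1; k < w[j].length; k++) {
                z += w[j][k] * in[k - 1];
            }
            out[j] = sigmoid(z);
        }
        return out;
    }

    static double predict(double x1, double x2) {
        double[] hidden = layer(W0, new double[] {x1, x2});
        return layer(W1, hidden)[0];
    }

    public static void main(String[] args) {
        System.out.println(predict(0.0, 0.0));
    }
}
```

With this layout, predict(0.0, 0.0) comes out at roughly 0.548, which happens to match the second value printed above (0.5482023969601073). The deck does not say which inputs produced those values, so this is only a consistency check on the assumed layout.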

3. PMML Evaluation
public Map<String, Double> evaluateRaw(EvaluationContext context) {
    NeuralNetwork neuralNetwork = getModel();
    Map<String, Double> result = Maps.newLinkedHashMap();

    // input layer: evaluate each NeuralInput's DerivedField
    NeuralInputs neuralInputs = neuralNetwork.getNeuralInputs();
    for (NeuralInput neuralInput : neuralInputs) {
        DerivedField derivedField = neuralInput.getDerivedField();
        FieldValue value = ExpressionUtil.evaluate(derivedField, context);
        // ...
        result.put(neuralInput.getId(), (value.asNumber()).doubleValue());
    }

    // hidden and output layers: weighted sum of inputs, then activation
    List<NeuralLayer> neuralLayers = neuralNetwork.getNeuralLayers();
    for (NeuralLayer neuralLayer : neuralLayers) {
        List<Neuron> neurons = neuralLayer.getNeurons();
        for (Neuron neuron : neurons) {
            // the bias of each Neuron; should be set to 0, since the adapter
            // already adds the bias weight as a connection
            double z = neuron.getBias();
            List<Connection> connections = neuron.getConnections();
            for (Connection connection : connections) {
                double input = result.get(connection.getFrom());
                z += input * connection.getWeight();
            }
            double output = activation(z, neuralLayer);
            result.put(neuron.getId(), output);
        }
        normalizeNeuronOutputs(neuralLayer, result);
    }
    return result;
}

private double activation(double z, NeuralLayer neuralLayer) {
    // ...
    switch (activationFunction) {
        case LOGISTIC:
            return 1.0 / (1.0 + Math.exp(-z)); // sigmoid
        case IDENTITY:
            return z; // linear
        // ...
    }
}

How to get a score from the PMML evaluator - EvaluatorTest
PMML pmml = loadPMML(getClass());
// loadPMML does roughly:
//   InputStream is = getResourceAsStream("/pmml/" + getSimpleName() + ".pmml");
//   return IOUtil.unmarshal(is);
NeuralNetworkEvaluator evaluator = new NeuralNetworkEvaluator(pmml);
List<FieldName> activeFields = evaluator.getActiveFields(); // input fields the model expects

InputStream is = getClass().getResourceAsStream("/pmml/NormalizedData.csv");
List<Map<FieldName, String>> input = CsvUtil.load(is);
int index = 0;
for (Map<FieldName, String> maps : input) {
    Map<FieldName, NeuronClassificationMap> evaluateList =
            (Map<FieldName, NeuronClassificationMap>) evaluator.evaluate(maps);
    for (NeuronClassificationMap cMap : evaluateList.values()) {
        for (Map.Entry<?, Double> entry : cMap.entrySet()) {
            // print the score scaled by 1000
            System.out.println(index++ + ":" + entry.getKey() + ":" + entry.getValue() * 1000);
        }
    }
}
