Shifu (www.shifu.ml) is a fast and scalable machine learning platform. This presentation briefly describes how to convert Logistic Regression and Neural Network models trained in Encog, Mahout, and Spark to PMML.
2. Recap
1. Convert PMML back to ML model
2. Integrate to Shifu as Shifu-plugin-*
3. Add examples
4. Performance test for PMML evaluator
3. Miscellaneous
1. Compatibility issue:
Spark depends on Akka 2.2.3, while Shifu uses 2.1.1
2. Spark overview
3. About showcase
a video that introduces shifu
a poster that describes my project
a project title and project description
4. PMML Adapter Demo
Lisa Hua
06/23/14
ML Framework   Neural Network   Logistic Regression   SVM   Decision Tree
Encog          Support          Support               TBD   None
Spark          None             Support               TBD   TBD
Mahout         Support          Support               TBD   TBD
H2O            TBD              None                  TBD   TBD
5. Outline
1. Neural Network Model Conversion
a. Encog NN model
b. Mahout NN model
2. Logistic Regression Model Conversion
a. Encog LR model (NN)
b. Spark LR model
c. Mahout LR model
3. PMML Adapter API and how to extend
PMML Adapter
7. Mahout NN Model - trainOnline()
protected void initMLModel() {...
    mlModel = new MultilayerPerceptron();
    // addLayer(numNodes, isFinalLayer, squashFunction);
    // the first layer has numInputFields nodes
    mlModel.addLayer(20, false, "Identity");
    mlModel.addLayer(45, false, "Sigmoid");
    mlModel.addLayer(45, false, "Sigmoid");
    mlModel.addLayer(1, true, "Sigmoid");
    for (MahoutData data : inputDataSet) {
        mlModel.trainOnline(data.getInput()); …
    }
}
protected void adaptToPMML() {...
    Matrix[] matrixList = mlModel.getWeightMatrices();...
}
squashFunctions:
1. Only Identity and Sigmoid are supported for now.
2. squashFunctionList is protected and has no getter, so for now we set activationFunction to sigmoid by default.
// in Adapter
for (int k = 1; k < columnSize; k++) {
    neuron.withConnections(new Connection(matrix.get(j, k)));
}
// bias neuron for each layer, set to bias=1
neuron.withConnections(new Connection(matrix.get(j, 0)));
In every layer except the final one, the bias is the first neuron.
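The adapter loop above can be illustrated without Mahout or JPMML on the classpath. The following is a minimal sketch, assuming that column 0 of each weight-matrix row holds the bias weight and the remaining columns hold the weights from the previous layer's neurons. The Con class, the "bias" id, and the "layer,index" id format are hypothetical stand-ins chosen for this example, not the adapter's actual types.

```java
import java.util.ArrayList;
import java.util.List;

public class BiasColumnSketch {
    /** Hypothetical stand-in for a PMML connection: source neuron id plus weight. */
    static final class Con {
        final String from;
        final double weight;

        Con(String from, double weight) {
            this.from = from;
            this.weight = weight;
        }
    }

    /** Hypothetical id of the constant bias neuron (its output is always 1). */
    static final String BIAS_ID = "bias";

    /**
     * Converts one row of a layer weight matrix into connections.
     * Assumption: row[0] is the bias weight, row[1..] are the regular weights.
     */
    static List<Con> rowToConnections(double[] row, int prevLayer) {
        List<Con> cons = new ArrayList<>();
        // Regular connections: column k feeds from neuron (k - 1) of the previous layer.
        for (int k = 1; k < row.length; k++) {
            cons.add(new Con(prevLayer + "," + (k - 1), row[k]));
        }
        // The bias connection is appended last, mirroring the order in the adapter loop.
        cons.add(new Con(BIAS_ID, row[0]));
        return cons;
    }

    public static void main(String[] args) {
        double[] row = {0.5, 1.0, -2.0};
        for (Con c : rowToConnections(row, 0)) {
            System.out.println(c.from + " -> " + c.weight);
        }
    }
}
```

Keeping the bias as an ordinary connection from a constant neuron is one way to leave the Neuron bias attribute at 0, which matches the evaluator note later in the deck.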
8. Mahout NN Model - getOutput()
protected void evaluatePMML() {
    for (int i = 0; i < mahoutDataSet.size(); i++) {
        Assert.assertEquals(
            getPMMLEvaluatorResult(pmmlEvalResultList.get(i)),
            getMahoutResult(mahoutDataSet.get(i)), DELTA); // DELTA = 10^-5
    }
}
private double getMahoutResult(MahoutData data) {
    return mlModel.getOutput(data.getEvalInput()).get(0);
}
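The tolerance check used in the evaluation step can be sketched on plain arrays. This is an illustrative helper only: the class name, the method firstMismatch, and the sample scores are made up for the example, and DELTA is set to 1e-5 on the assumption that the slide's value is 10^-5.

```java
public class ScoreComparisonSketch {
    /** Tolerance assumed from the slide (10^-5). */
    static final double DELTA = 1e-5;

    /**
     * Returns the index of the first pair of scores that differs by more
     * than DELTA, or -1 if every pair agrees within the tolerance.
     */
    static int firstMismatch(double[] pmmlScores, double[] frameworkScores) {
        if (pmmlScores.length != frameworkScores.length) {
            throw new IllegalArgumentException("score lists differ in length");
        }
        for (int i = 0; i < pmmlScores.length; i++) {
            if (Math.abs(pmmlScores[i] - frameworkScores[i]) > DELTA) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        double[] pmml = {0.25, 0.5, 0.75};      // made-up scores
        double[] framework = {0.25, 0.5, 0.76}; // last score is off by 0.01
        System.out.println(firstMismatch(pmml, framework));
    }
}
```

Reporting the index of the first mismatch, rather than failing on a bare assertion, makes it easier to find the offending evaluation record.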
10. Encog LR Model - compute()
protected void initMLModel() {...
    lrModel = (BasicNetwork) networkReader.read(new FileInputStream("EncogLR.lr"));
}
protected void adaptToPMML() {...
    double[] weights = lrModel.getWeights();...
}
protected void evaluatePMML() {
    for (int i = 0; i < dataSet.size(); i++) {
        Assert.assertEquals(getPMMLEvaluatorResult(index++),
            getNextEncogLRResult(mlResultIterator), DELTA);
    }
}
private double getNextEncogLRResult(Iterator<MLDataPair> mlResultIterator) {
    MLData result = lrModel.compute(mlResultIterator.next().getInput());
    return result.getData(0);
}
11. Spark LR Model: train() and predict()
protected void initMLModel() {...
    lrModel = LogisticRegressionWithSGD.train(points.rdd(), iterations, stepSize);
}
protected void adaptToPMML() {...
    double[] weights = lrModel.weights();
    ...
}
protected void evaluatePMML() {...
    List<Double> sparkEvalList = lrModel.predict(evalRDD).cache().collect();
    for (...) {
        Assert.assertEquals(getPMMLEvaluatorResult(i),
            sparkEvalList.get(i), DELTA);
    }
}
Notes:
1. The method lrModel.weights() returns the intercept followed by the weight list.
2. Compatibility issue:
Spark depends on Akka 2.2.3, while Shifu uses 2.1.1. Currently, there is a compatibility issue if we change the Akka version of shifu-core from 2.1.1 to 2.2.3. Based on the build history, I suspect the issue lies in Guagua, but the root cause is still unknown to me.
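Note 1 above matters when the weights are copied into PMML, because the intercept and the per-field coefficients go to different places. A minimal sketch, assuming the layout described in the note (intercept at index 0, one coefficient per input field after it); the class and method names are hypothetical, and the code uses plain arrays rather than Spark types.

```java
import java.util.Arrays;

public class InterceptFirstSketch {
    /** Assumption from the note: the first element is the intercept. */
    static double intercept(double[] weights) {
        return weights[0];
    }

    /** The remaining elements are the per-field coefficients. */
    static double[] coefficients(double[] weights) {
        return Arrays.copyOfRange(weights, 1, weights.length);
    }

    /** Logistic score: sigmoid(intercept + sum of coefficient * feature). */
    static double score(double[] weights, double[] features) {
        double[] coef = coefficients(weights);
        if (coef.length != features.length) {
            throw new IllegalArgumentException("expected " + coef.length + " features");
        }
        double z = intercept(weights);
        for (int i = 0; i < coef.length; i++) {
            z += coef[i] * features[i];
        }
        return 1.0 / (1.0 + Math.exp(-z));
    }

    public static void main(String[] args) {
        double[] weights = {1.0, 2.0, -1.0}; // made-up values: intercept, w1, w2
        double[] features = {1.0, 3.0};
        System.out.println(score(weights, features));
    }
}
```

The sketch returns a probability. Depending on the Spark version and configuration, predict() may return a thresholded label instead, so the two values are not necessarily comparable without adjustment.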
12. Mahout LR Model - train() and classifyScalar()
protected void initMLModel() {...
    // OnlineLogisticRegression(numCategories, numFeatures, PriorFunction)
    lrModel = new OnlineLogisticRegression(2, 20, new L1());
    for (MahoutDataPair pair : inputDataSet) {
        lrModel.train(pair.getActual(), pair.getFeatureField());
    }
}
protected void adaptToPMML() {...
    // Coefficients: a dense matrix that is (numCategories-1) x numFeatures
    Matrix matrix = lrModel.getBeta();
}
private double getMahoutResult(MahoutDataPair data) {
    // Returns a single scalar probability in the case where we have two categories.
    return lrModel.classifyScalar(data.getVector());
}
13. Summary of Evaluation Dataset
Model                 ML Framework      Input Data Field   Input Data   Evaluation Data   Nodes in each layer
Neural Network        Encog 2 layers    20                 450          118, 560          20,45,45,1
Neural Network        Encog 3 layers    25                 450          550               25,20,15,20,1
Neural Network        Mahout 2 layers   20                 450          118, 560          20,45,45,1
Neural Network        Mahout 3 layers   25                 450          550               25,20,15,20,1
Logistic Regression   Encog             20                 450          118, 560
Logistic Regression   Spark             20                 450          118, 560
Logistic Regression   Mahout            20                 450          118, 560
14. Summary of the Functions
Encog - Neural Network and Logistic Regression
    Model class name: BasicNetwork
    Parent class/interface: MLClassification
    Training method: compute(MLDataSet data)
    Retrieve training result: getWeights(): double[]
    Evaluation method: compute()
    Basic data structure: MLData: Double[]; MLDataSet: Set<Double[]>
Spark - Logistic Regression
    Model class name: LogisticRegressionModel
    Parent class/interface: GeneralizedLinearModel, ClassificationModel
    Training method: train(RDD data)
    Retrieve training result: weights(): double[]
    Evaluation method: predict(RDD<Vector>): RDD<Double>
    Basic data structure: RDD: Resilient Distributed Dataset
Mahout - Neural Network
    Model class name: MultilayerPerceptron
    Parent class/interface: NeuralNetwork
    Training method: trainOnline(Vector instance)
    Retrieve training result: getWeightMatrices(): Matrix[]
    Evaluation method: getOutput(Vector): Vector
    Basic data structure: Vector; Matrix: List<Vector>
Mahout - Logistic Regression
    Model class name: OnlineLogisticRegression
    Parent class/interface: AbstractOnlineLogisticRegression
    Training method: train(int actual, Vector instance)
    Retrieve training result: getBeta(): Matrix
    Evaluation method: classifyScalar(Vector instance): double
16. 3. PMML Adapter API
1. For new ML model conversion
a. Implement a subclass of PMMLModelBuilder<TargetPMMLModel, SourceMLModel> and implement adaptMLModelToPMML()
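The extension point above can be pictured with a toy example. Everything in this sketch is an assumption made for illustration: the abstract class below is a hypothetical stand-in for Shifu's PMMLModelBuilder (the real signature of adaptMLModelToPMML() is not shown in the slides and may differ), and ToyLinearModel, ToyRegressionTable, and the "field_i" names are invented so that the example compiles on its own.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class AdapterSketch {
    /** Hypothetical stand-in for the adapter base class; the real signature may differ. */
    abstract static class PMMLModelBuilder<T, S> {
        abstract T adaptMLModelToPMML(S sourceModel, T pmmlModel);
    }

    /** Toy source model: a bare list of weights (invented for this example). */
    static final class ToyLinearModel {
        final double[] weights;

        ToyLinearModel(double[] weights) {
            this.weights = weights;
        }
    }

    /** Toy target model: an ordered map from predictor name to coefficient. */
    static final class ToyRegressionTable {
        final Map<String, Double> coefficients = new LinkedHashMap<>();
    }

    /** A new conversion is added by subclassing the builder for one source/target pair. */
    static final class ToyLinearBuilder
            extends PMMLModelBuilder<ToyRegressionTable, ToyLinearModel> {
        @Override
        ToyRegressionTable adaptMLModelToPMML(ToyLinearModel source, ToyRegressionTable target) {
            // Copy each weight into the target under a made-up field name.
            for (int i = 0; i < source.weights.length; i++) {
                target.coefficients.put("field_" + i, source.weights[i]);
            }
            return target;
        }
    }

    public static void main(String[] args) {
        ToyRegressionTable table = new ToyLinearBuilder()
                .adaptMLModelToPMML(new ToyLinearModel(new double[]{0.25, -0.5}),
                        new ToyRegressionTable());
        System.out.println(table.coefficients);
    }
}
```

The two type parameters keep each conversion type-safe: a builder for one framework's model cannot be handed another framework's model by mistake.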
17. Next Step
● Support: supported by PMML Adapter
● None: The ML framework doesn’t support this ML model currently
● TBD: To be determined
ML Framework   Neural Network   Logistic Regression   SVM   Decision Tree
Encog          Support          Support               TBD   None
Spark          None             Support               TBD   TBD
Mahout         Support          Support               TBD   TBD
H2O            TBD              None                  TBD   TBD
18. 1. PMML skeleton - Neural Network
<PMML>
  <Header></Header>
  <DataDictionary></DataDictionary> (specifies the format of the input CSV)
  <NeuralNetwork functionName="classification"> (the model)
    <MiningSchema></MiningSchema> (how to use the input data)
    <LocalTransformation></LocalTransformation> (specifies derived fields)
    <NeuralInput></NeuralInput> (input layer: which fields should be used)
    <NeuralLayers> (layers, not including the input layer and output layer)
      <NeuralLayer activationFunction="logistic">
        <Neuron id="X,Y" bias="0.0">
          <Con from="X-1,Y" weight=""/>
        </Neuron>
      </NeuralLayer>
    </NeuralLayers>
    <NeuralOutputs numberOfOutputs="1">
      <NeuralOutput outputNeuron="3,0"></NeuralOutput>
    </NeuralOutputs>
  </NeuralNetwork>
</PMML>
20. 3. PMML Evaluation
public Map<String, Double> evaluateRaw(EvaluationContext context){
    NeuralNetwork neuralNetwork = getModel();
    Map<String, Double> result = Maps.newLinkedHashMap();
    NeuralInputs neuralInputs = neuralNetwork.getNeuralInputs();
    for(NeuralInput neuralInput: neuralInputs){
        DerivedField derivedField = neuralInput.getDerivedField();
        FieldValue value = ExpressionUtil.evaluate(derivedField, context);...
        result.put(neuralInput.getId(), (value.asNumber()).doubleValue());
    }
    List<NeuralLayer> neuralLayers = neuralNetwork.getNeuralLayers();
    for(NeuralLayer neuralLayer : neuralLayers){
        List<Neuron> neurons = neuralLayer.getNeurons();
        for(Neuron neuron : neurons){
            // the bias for each Neuron, should be set to 0
            double z = neuron.getBias();
            List<Connection> connections = neuron.getConnections();
            for(Connection connection : connections){
                double input = result.get(connection.getFrom());
                z += input * connection.getWeight();
            }
            double output = activation(z, neuralLayer);
            result.put(neuron.getId(), output);
        }
        normalizeNeuronOutputs(neuralLayer, result);
    }
    return result;
}
private double activation(double z, NeuralLayer neuralLayer){...
    switch(activationFunction){
        case LOGISTIC: return 1.0 / (1.0 + Math.exp(-z)); // Sigmoid
        case IDENTITY: return z; // Linear
        ...
    }
}
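The evaluation loop above amounts to a layer-by-layer forward pass. The following self-contained sketch reproduces that arithmetic on plain arrays, assuming logistic activation in every layer and no output normalization. The class, the array layout (weights[layer][neuron][input]), and the sample numbers are chosen for this example and are not part of the evaluator.

```java
public class ForwardPassSketch {
    /** Logistic (sigmoid) activation, as in the LOGISTIC case of the evaluator. */
    static double logistic(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /**
     * Forward pass through fully connected layers.
     * weights[l][j][i] is the weight from input i to neuron j of layer l;
     * bias[l][j] is the bias of neuron j of layer l.
     */
    static double[] forward(double[] input, double[][][] weights, double[][] bias) {
        double[] activations = input;
        for (int l = 0; l < weights.length; l++) {
            double[] next = new double[weights[l].length];
            for (int j = 0; j < next.length; j++) {
                // Start from the bias, then accumulate the weighted inputs.
                double z = bias[l][j];
                for (int i = 0; i < activations.length; i++) {
                    z += activations[i] * weights[l][j][i];
                }
                next[j] = logistic(z);
            }
            activations = next;
        }
        return activations;
    }

    public static void main(String[] args) {
        double[] input = {1.0, 2.0};
        double[][][] weights = {
            {{0.5, -0.25}}, // layer 0: one neuron with two inputs
            {{2.0}}         // layer 1: one neuron with one input
        };
        double[][] bias = {{0.0}, {-1.0}};
        System.out.println(forward(input, weights, bias)[0]);
    }
}
```

With these sample values both neurons see z = 0, so each layer outputs 0.5, which makes the result easy to verify by hand.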
21. How to get score from PMML evaluator - EvaluatorTest
PMML pmml = loadPMML(getClass());
//InputStream is = getResourceAsStream("/pmml/" + getSimpleName() + ".pmml");
//return IOUtil.unmarshal(is);
NeuralNetworkEvaluator evaluator = new NeuralNetworkEvaluator(pmml);
InputStream is = getClass().getResourceAsStream("/pmml/NormalizedData.csv");
List<Map<FieldName, String>> input = CsvUtil.load(is);
int index = 0;
for (Map<FieldName, String> maps : input) {
    Map<FieldName, NeuronClassificationMap> evaluateList =
        (Map<FieldName, NeuronClassificationMap>) evaluator.evaluate(maps);
    for (NeuronClassificationMap cMap : evaluateList.values())
        for (Map.Entry<?, Double> entry : cMap.entrySet())
            System.out.println(index++ + ":" + entry.getKey() + ":" +
                entry.getValue() * 1000);
    List<FieldName> activeFields = evaluator.getActiveFields();
}
Editor's Notes
Final w: 21 [0.0,3.4710234211383546,2.023273257032689,3.5343502155005786,3.366137268739723,2.7192122540777857,3.1421212856560685,3.7936396756134347,2.6680726281308655,2.5460611839812803,2.67024401591956,0.950869815171535,0.8354990986314994,1.5612095144815994,3.87191600605803,2.2967968626408948,3.8800049504570575,3.7145275740775876,2.7820240348634053,2.967509669711238,3.878495608939215]