SlideShare a Scribd company logo
1 of 48
Download to read offline
Programming Math in Java
Lessons from Apache Commons Math
Phil Steitz
psteitz@apache.org
Apachecon North America 2015
April 14, 2015
Apachecon North America 2015 Programming Math in Java April 14, 2015 1 / 48
Agenda
Commons Math Overview
Examples and problems
Lessons learned
Getting involved
Apachecon North America 2015 Programming Math in Java April 14, 2015 2 / 48
Disclaimer
Apache Commons Math is a community-based, open
development project.
The discussion of problems, challenges and lessons
learned represents one contributor’s views.
Apachecon North America 2015 Programming Math in Java April 14, 2015 3 / 48
Goals
Self-contained, ASL-licensed Java
Provide simple solutions to common problems
Implement standard, well-documented algorithms
Balance ease of use and good design
Stay well-tested and maintained
Maintain a strong community
Apachecon North America 2015 Programming Math in Java April 14, 2015 4 / 48
Some Statistics (as of March 11, 2015)...
67 packages
908 classes
84,057 lines of code
54,629 lines of comments
99.7% documented API
5862 unit tests
92% of code covered
80 open issues
Average 216 issues resolved per year
6 active committers
70+ contributors
0 dependencies
Apachecon North America 2015 Programming Math in Java April 14, 2015 5 / 48
Quick Tour
Apachecon North America 2015 Programming Math in Java April 14, 2015 6 / 48
Building Blocks - util, special, complex packages
FastMath (pure java replacement for java.lang.Math)
Dfp (arbitrary precision arithmetic)
Continued fractions
Special functions (Beta, Gamma, Bessel, Erf)
Array utilities (norms, normalization, filling, etc)
Basic combinatorics ( n
k
, n!, Stirling numbers...)
Basic integer arithmetic (checked ops, gcd, lcm, primality...)
Complex numbers
Apachecon North America 2015 Programming Math in Java April 14, 2015 7 / 48
Linear Algebra - linear package
Vector and Matrix operations (dense, sparse, field)
Decompositions (LU, QR, Cholesky, SVD, Eigenvalue)
Solvers
Basic Numerical Analysis - analysis package
Automatic Differentiation
Numerical Integration
Interpolation
Root finders
Apachecon North America 2015 Programming Math in Java April 14, 2015 8 / 48
Probability and Statistics - distributions, stat and
random packages
Probability distributions (Gaussian, Exponential, Poisson...)
Random number generators (Well, ISAAC, Mersenne twister...)
Random data generators (samplers, random vectors...)
Univariate statistics (storeless and in-memory)
Regression (matrix-based and storeless)
Correlation
Inference (T-, G-, ChiSquare, Kolmogorov-Smirnov tests...)
Apachecon North America 2015 Programming Math in Java April 14, 2015 9 / 48
Optimization and Geometry - optim, fitting, ode,
transform, geometry packages
Fitting (non-linear least-squares, curve fitting)
Ordinary differential equations (ODE)
Linear/Non-linear optimization
Computational Geometry
Fast Fourier transform
Apachecon North America 2015 Programming Math in Java April 14, 2015 10 / 48
AI / Machine learning - genetics, filter and ml
packages
Genetic algorithms
Neural networks
Clustering
Kalman filter
Apachecon North America 2015 Programming Math in Java April 14, 2015 11 / 48
Examples and Challenges
Apachecon North America 2015 Programming Math in Java April 14, 2015 12 / 48
Example 1 - Root finding
Find all roots of the polynomial
p(x) = x5
+ 4x3
+ x2
+ 4 = (x + 1)(x2
− x + 1)(x2
+ 4)
double[] coefficients = { 4.0, 0.0, 1.0, 4.0, 0.0, 1.0 };
LaguerreSolver solver = new LaguerreSolver();
Complex[] result = solver.solveAllComplex(coefficients, 0);
Apachecon North America 2015 Programming Math in Java April 14, 2015 13 / 48
Example - Root finding (cont)
How about this one?
p(x) =
50
i=1
15, 000(i + ix + ix2
+ ix3
)
As of 3.4.1, this will appear to hang.
Two problems:
1 Complex iterates "escape to NaN"
2 Default max function evaluations is set to
Integer.MAX_VALUE
Apachecon North America 2015 Programming Math in Java April 14, 2015 14 / 48
Example 1 - Root finding (cont)
First problem: escape to NaN
Long and painful history debating C99x Annex G
Desire: clear, self-contained, open documentation
Desire: simple, performant code
Result: use standard definitions and let NaNs propagate
Second problem: hanging
Ongoing debate about what to make defaults, whether to have
defaults at all...
Ease of use vs foot-shooting potential
OO vs procedural / struct / primitives design
Apachecon North America 2015 Programming Math in Java April 14, 2015 15 / 48
Example 2 - Kolmogorov-Smirnov Test
Problem: Given two samples S1 and S2, are they drawn from the
same underlying probability distribution?
// load sample data into double[] arrays
double[] s1 = ...
double[] s2 = ...
KolmogorovSmirnovTest test = new KolmogorovSmirnovTest();
// Compute p-value for test
double pValue = test.kolmogorovSmirnovTest(s1, s2);
Apachecon North America 2015 Programming Math in Java April 14, 2015 16 / 48
K-S Test Implementation Challenges
Test statistic = Dm,n = maximum difference between the empirical
distribution functions of the two samples.
The distribution of Dm,n is asymptotically an easy-to-compute
distribution
For very small m, n, the distribution can be computed exactly,
but expensively
What to do for mid-size samples (100 < m × n < 10, 000)?
Should this distribution be in the distributions package?
Reference data / correctness very hard to verify
Apachecon North America 2015 Programming Math in Java April 14, 2015 17 / 48
Example 3 - Multivariate optimization
Find the minimum value of 100(y − x2
)2
+ (1 − x)2
, starting at the
point (−1, 1)
MultivariateFunction f = new MultivariateFunction() {
public double value(double[] x) {
double xs = x[0] * x[0];
return (100 * FastMath.pow((x[1] - xs), 2)) +
FastMath.pow(1 - x[0], 2);
}
};
PowellOptimizer optim = new PowellOptimizer(1E-13, 1e-40);
PointValuePair result = optim.optimize(new MaxEval(1000),
new ObjectiveFunction(f),
GoalType.MINIMIZE,
new InitialGuess(new double[] {-1, 1}));
Works OK in this case, but...
Apachecon North America 2015 Programming Math in Java April 14, 2015 18 / 48
Example 3 - Optimization (cont)
User question: How do I choose among the available optimizers?
Developer question: How do we define a consistent API?
Another solution for Example 3:
BOBYQAOptimizer optim = new BOBYQAOptimizer(4);
PointValuePair result = optim.optimize(new MaxEval(1000),
new ObjectiveFunction(f),
GoalType.MINIMIZE,
new InitialGuess(new double[] {-1, 1}),
SimpleBounds.unbounded(2));
Note additional parameter to optimize.
Apachecon North America 2015 Programming Math in Java April 14, 2015 19 / 48
Example 3 - Optimization (cont)
To allow arbitrary parameter lists, we settled on varargs ...
public abstract class BaseOptimizer<PAIR>
...
/**
* Stores data and performs the optimization.
*
* The list of parameters is open-ended so that sub-classes can extend it
* with arguments specific to their concrete implementations.
*
* When the method is called multiple times, instance data is overwritten
* only when actually present in the list of arguments: when not specified,
* data set in a previous call is retained (and thus is optional in
* subsequent calls).
...
* @param optData Optimization data.
...
*/
public PAIR optimize(OptimizationData... optData)
Apachecon North America 2015 Programming Math in Java April 14, 2015 20 / 48
Optimization (cont) - version 4 direction
Refactor all classes in "o.a.c.m.optim"
Phase out "OptimizationData"
Use "fluent" API
Separate optimization problem from optimization
procedure
Experimentation has started with least squares
optimizers:
LevenbergMarquardtOptimizer
GaussNewtonOptimizer
Apachecon North America 2015 Programming Math in Java April 14, 2015 21 / 48
Optimization (cont) - version 4 direction
New classes/interfaces in "o.a.c.m.fitting.leastsquares":
LeastSquaresProblem and LeastSquaresProblem.Evaluation
LeastSquaresOptimizer and LeastSquaresOptimizer.Optimum
LeastSquaresFactory and LeastSquaresBuilder
Usage:
LeastSquaresProblem problem = LeastSquaresFactory.create( /* ... */ );
LeastSquaresOptimizer optim = new LevenbergMarquardtOptimizer( /* ... */ );
Optimum result = optim.optimize(problem);
Yet to be done: Refactor all the other optimizers.
Apachecon North America 2015 Programming Math in Java April 14, 2015 22 / 48
Example 3 - Optimization (cont)
Some additional challenges:
Stochastic algorithms are tricky to test
FORTRAN ports create awful Java code, but "fixing" can
create discrepancies
BOBYQAOptimizer port has never really been fully supported
Answering user questions requires expertise and time
Apachecon North America 2015 Programming Math in Java April 14, 2015 23 / 48
Message from our sponsors...
Come join us!
Code is interesting!
Math is interesting!
Bugs are interesting!
People are interesting!
Come join us!
http://commons.apache.org/math
Apachecon North America 2015 Programming Math in Java April 14, 2015 24 / 48
Example 4 - Aggregated statistics
Problem: Aggregate summary statistics across multiple processes
// Create a collection of stats collectors
Collection<SummaryStatistics> aggregate =
new ArrayList<SummaryStatistics>();
// Add a collector, hand it out to some process
// and gather some data...
SummaryStatistics procStats = new SummaryStatistics();
aggregate.add(procStats);
procStats.addValue(wildValue);
...
// Aggregate results
StatisticalSummary aggregatedStats =
AggregateSummaryStatistics.aggregate(aggregate);
SummaryStatistics instances handle large data streams
StatisticalSummary is reporting interface
Question: how to distribute instances / workload?
Apachecon North America 2015 Programming Math in Java April 14, 2015 25 / 48
Threading and workload management
Questions:
1 Should we implement multi-threaded algorithms?
2 How can we be more friendly for Hadoop / other
distributed compute environments?
So far (as of 3.4.1), we have not done 1.
So far, answer to 2 has been "leave it to users."
Could be both answers are wrong...
Apachecon North America 2015 Programming Math in Java April 14, 2015 26 / 48
Example 5 - Genetic algorithm
Another example where concurrent execution would be natural...
// initialize a new genetic algorithm
GeneticAlgorithm ga = new GeneticAlgorithm(
new OnePointCrossover<Integer>(),
CROSSOVER_RATE,
new BinaryMutation(),
MUTATION_RATE,
new TournamentSelection(TOURNAMENT_ARITY)
);
// initial population
Population initial = getPopulation();
// stopping conditions
StoppingCondition stopCond = new FixedGenerationCount(NUM_GENERATIONS);
// run the algorithm
Population finalPopulation = ga.evolve(initial, stopCond);
// best chromosome from the final population
Chromosome bestFinal = finalPopulation.getFittestChromosome();
Question: how to distribute "evolve" workload?
Apachecon North America 2015 Programming Math in Java April 14, 2015 27 / 48
Example 6 - OLS regression
Three ways to do it
OLSMultipleRegression (stat.regression) in-memory
MillerUpdatingRegression (stat.regression) streaming
LevenbergMarquardtOptimizer (fitting.leastsquares) in-memory
1 How to make sure users discover the right solution for them?
2 How much dependency / reuse / common structure to force?
Apachecon North America 2015 Programming Math in Java April 14, 2015 28 / 48
Example 6 - OLS regression (cont)
Simplest, most common:
OLSMultipleLinearRegression regression = new OLSMultipleLinearRegression();
// Load data
double[] y = new double[]{11.0, 12.0, 13.0, 14.0, 15.0, 16.0};
double[] x = new double[6][];
x[0] = new double[]{0, 0, 0, 0, 0};
x[1] = new double[]{2.0, 0, 0, 0, 0};
...
x[5] = new double[]{0, 0, 0, 0, 6.0};
regression.newSample(y, x);
// Estimate model
double[] beta = regression.estimateRegressionParameters();
double[] residuals = regression.estimateResiduals();
double rSquared = regression.calculateRSquared();
double sigma = regression.estimateRegressionStandardError();
Question: What happens if design matrix is singular?
Apachecon North America 2015 Programming Math in Java April 14, 2015 29 / 48
Example 6 - OLS regression (cont)
Answer: if it is "exactly" singular, SingularMatrixException
If it is near-singular, garbage out
So make singularity threshold configurable:
OLSMultipleLinearRegression regression =
new OLSMultipleLinearRegression(threshold);
What exactly is this threshold?
What should exception error message say?
How to doc it?
What if we change underlying impl?
Apachecon North America 2015 Programming Math in Java April 14, 2015 30 / 48
Message from our sponsors...
Come join us!
Code is interesting!
Math is interesting!
Bugs are interesting!
People are interesting!
Come join us!
http://commons.apache.org/math
Apachecon North America 2015 Programming Math in Java April 14, 2015 31 / 48
Example 7 - Rank correlation
Given two double arrays x[] and y[], we want a
statistical measure of how well their rank orderings
match.
Use Spearman’s rank correlation coefficient.
double[] x = ...
double[] y = ...
SpearmansCorrelation corr = new SpearmansCorrelation();
corr.correlation(x, y);
How are ties in the data handled?
What if one of the arrays contains NaNs?
Apachecon North America 2015 Programming Math in Java April 14, 2015 32 / 48
Example 7 - Rank correlation (cont)
As of 3.4.1, by default ties are averaged and NaNs cause
NotANumberException
Both are configurable via RankingAlgorithm constructor
argument
RankingAlgorithm is a ranking implementation plus
TiesStrategy - what ranks to assign to tied values
NaNStrategy - what to do with NaN values
Default in Spearman’s is natural ranking on doubles with ties
averaged and NaNs disallowed
Apachecon North America 2015 Programming Math in Java April 14, 2015 33 / 48
Example 7 - Rank correlation (cont)
NaNStrategies:
MINIMAL - NaNs are treated as minimal in the ordering
MAXIMAL - NaNs are treated as maximal in the ordering
REMOVED - NaNs are removed
FIXED - NaNs are left "in place"
FAILED - NaNs cause NotANumberException
Problems:
1 What if user supplies a NaturalRanking with the NaN strategy
that says remove NaNs?
2 How to doc clearly and completely?
Apachecon North America 2015 Programming Math in Java April 14, 2015 34 / 48
Example 8 - Random Numbers
Commons Math provides alternatives to the JDK PRNG and a
pluggable framework
RandomGenerator interface is like j.u.Random
All random data generation within Commons Math is
pluggable
Low-level example:
// Allocate an array to hold random bytes
byte[] bytes = new byte[20];
// Instantiate a Well generator with seed = 100;
Well19937c gen = new Well19937c(100);
// Fill the array with random bytes from the generator
gen.nextBytes(bytes);
Apachecon North America 2015 Programming Math in Java April 14, 2015 35 / 48
Example 8 - Random Numbers (cont)
High-level example:
// Create a RandomDataGenerator using a Mersenne Twister
RandomDataGenerator gen =
new RandomDataGenerator(new MersenneTwister());
// Sample a random value from the Poisson(2) distribution
long dev = gen.nextPoisson(2);
Problems:
1 Well generators (used as defaults a lot) have some
initialization overhead. How / when to take this hit?
2 Distribution classes also have sample() methods, but that
makes them dependent on generators
Apachecon North America 2015 Programming Math in Java April 14, 2015 36 / 48
Example 9 - Linear Programming
Maximize 7x1 + 3x2 subject to
3x1 − 5x3 ≤ 0
2x1 − 5x4 ≤ 0
... (more constraints)
LinearObjectiveFunction f =
new LinearObjectiveFunction(new double[] { 7, 3, 0, 0 }, 0 );
List<LinearConstraint> constraints = new ArrayList<LinearConstraint>();
constraints.add( new LinearConstraint(new double[] { 3, 0, -5, 0 },
Relationship.LEQ, 0.0));
constraints.add(new LinearConstraint(new double[] { 2, 0, 0, -5 },
Relationship.LEQ, 0.0));
...
SimplexSolver solver = new SimplexSolver();
PointValuePair solution = solver.optimize(new MaxIter(100), f,
new LinearConstraintSet(constraints),
GoalType.MAXIMIZE,
new NonNegativeConstraint(true));
Apachecon North America 2015 Programming Math in Java April 14, 2015 37 / 48
Example 9 - Linear Programming (cont)
Question: What happens if the algorithm does not converge?
Answer: TooManyIterationsException
What if I want to know the last PointValuePair?
SolutionCallback callback = new SolutionCallback();
try {
PointValuePair solution = solver.optimize(new MaxIter(100), f,
new LinearConstraintSet(constraints),
GoalType.MAXIMIZE,
new NonNegativeConstraint(true), callback);
} catch (TooManyIterationsException ex) {
PointValuePair lastIterate = callback.getSolution();
}
(Rejected) Alternatives:
1 Don’t throw, but embed some kind of status object in return
2 Shove context data in exception messages
Apachecon North America 2015 Programming Math in Java April 14, 2015 38 / 48
Message from our sponsors...
Come join us!
Code is interesting!
Math is interesting!
Bugs are interesting!
People are interesting!
Come join us!
http://commons.apache.org/math
Apachecon North America 2015 Programming Math in Java April 14, 2015 39 / 48
Example 10 - Geometry
Calculate the convex hull and enclosing ball of a set of 2D points:
List<Vector2D> points = ...
ConvexHullGenerator2D generator = new MonotoneChain(true, 1e-6);
ConvexHull2D hull = generator.generate(points);
Encloser<Euclidean2D, Vector2D> encloser =
new WelzlEncloser<Euclidean2D, Vector2D>(1e-6, new DiskGenerator());
EnclosingBall<Euclidean2D, Vector2D> ball = encloser.enclose(points);
Problems:
1 algorithms are prone to
numerical problems
2 important to specify a
meaningful tolerance, but
depends on input data
3 does it make sense to
support a default tolerance?
Apachecon North America 2015 Programming Math in Java April 14, 2015 40 / 48
Lessons Learned
Apachecon North America 2015 Programming Math in Java April 14, 2015 41 / 48
Key Challenges Recap...
Balancing performance, mathematical correctness and
algorithm fidelity
Fully documenting behavior and API contracts
Steering users toward good solutions as simply as possible
Balancing OO design and internal reuse with approachability
Obscure and abandoned contributions
Reference implementations and / or data for validation
Backward compatibility vs API improvement
Apachecon North America 2015 Programming Math in Java April 14, 2015 42 / 48
Some lessons learned
Let practical use cases drive performance / accuracy / fidelity
decisions
Be careful with ports and large contributions
Limit public API to what users need
Prefer standard definitions and algorithms
Take time to fully research bugs and patches
Constantly improve javadoc and User Guide
Try to avoid compatibility breaks, but bundle into major
version releases when you have to
Apachecon North America 2015 Programming Math in Java April 14, 2015 43 / 48
Get Involved!
Apachecon North America 2015 Programming Math in Java April 14, 2015 44 / 48
Using Apache Commons Math
Maven:
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-math3</artifactId>
<version>3.4.1</version>
</dependency>
Download:
http://commons.apache.org/math/download_math.cgi
Watch for new versions!
Apachecon North America 2015 Programming Math in Java April 14, 2015 45 / 48
Links
Project homepage: http://commons.apache.org/math/
Issue tracker:
https://issues.apache.org/jira/browse/MATH
Mailinglists: dev@commons.apache.org &
user@commons.apache.org
e-mail subject: [math]
Wiki: http://wiki.apache.org/commons/MathWishList
Apachecon North America 2015 Programming Math in Java April 14, 2015 46 / 48
How to contribute?
Check out the user guide
http://commons.apache.org/math/userguide
Ask - and answer - questions on the user mailing list
Participate in discussions on the dev list
Create bug reports / feature requests / improvements in JIRA
Create and submit patches (code, documentation, examples)
Look at
http://commons.apache.org/math/developers.html
Run mvn site and review CheckStyle, Findbugs reports when
creating patches
Don’t be shy asking for help with maven, git, coding style,
math ... anything
Provide feedback - most welcome!Apachecon North America 2015 Programming Math in Java April 14, 2015 47 / 48
Questions?
Apachecon North America 2015 Programming Math in Java April 14, 2015 48 / 48

More Related Content

What's hot

What's hot (13)

Vegetables
VegetablesVegetables
Vegetables
 
PPT Morfologi Tumbuhan - Jenis dan Bagian Bunga
PPT Morfologi Tumbuhan - Jenis dan Bagian BungaPPT Morfologi Tumbuhan - Jenis dan Bagian Bunga
PPT Morfologi Tumbuhan - Jenis dan Bagian Bunga
 
WATER VASCULAR SYSTEM.pptx
WATER VASCULAR SYSTEM.pptxWATER VASCULAR SYSTEM.pptx
WATER VASCULAR SYSTEM.pptx
 
Kunci determinasi
Kunci determinasiKunci determinasi
Kunci determinasi
 
Pelajaran 7 8 mandarin
Pelajaran 7 8 mandarinPelajaran 7 8 mandarin
Pelajaran 7 8 mandarin
 
Pencirian, Konsep Sifat, dan Sumber Bukti Taksonomi
Pencirian, Konsep Sifat, dan Sumber Bukti TaksonomiPencirian, Konsep Sifat, dan Sumber Bukti Taksonomi
Pencirian, Konsep Sifat, dan Sumber Bukti Taksonomi
 
Clear concept Of porifera phylum
Clear concept Of porifera phylumClear concept Of porifera phylum
Clear concept Of porifera phylum
 
10. Phyla Cnidaria and Ctenophora
10. Phyla Cnidaria and Ctenophora10. Phyla Cnidaria and Ctenophora
10. Phyla Cnidaria and Ctenophora
 
Reaksi fisiologi, adaptasi, dan bentuk bentuk kehidupan
Reaksi fisiologi, adaptasi, dan bentuk bentuk kehidupanReaksi fisiologi, adaptasi, dan bentuk bentuk kehidupan
Reaksi fisiologi, adaptasi, dan bentuk bentuk kehidupan
 
Klasifikasi porifera
Klasifikasi poriferaKlasifikasi porifera
Klasifikasi porifera
 
PPT Embriologi Tumbuhan - Bryophyta
PPT Embriologi Tumbuhan - BryophytaPPT Embriologi Tumbuhan - Bryophyta
PPT Embriologi Tumbuhan - Bryophyta
 
Sekresi
SekresiSekresi
Sekresi
 
Pertanyaan 4
Pertanyaan 4Pertanyaan 4
Pertanyaan 4
 

Similar to Programming Math in Java: Lessons from Apache Commons Math

【FIT2016チュートリアル】ここから始める情報処理 ~機械学習編~
【FIT2016チュートリアル】ここから始める情報処理  ~機械学習編~【FIT2016チュートリアル】ここから始める情報処理  ~機械学習編~
【FIT2016チュートリアル】ここから始める情報処理 ~機械学習編~Toshihiko Yamasaki
 
Pythia Reloaded: An Intelligent Unit Testing-Based Code Grader for Education
Pythia Reloaded: An Intelligent Unit Testing-Based Code Grader for EducationPythia Reloaded: An Intelligent Unit Testing-Based Code Grader for Education
Pythia Reloaded: An Intelligent Unit Testing-Based Code Grader for EducationECAM Brussels Engineering School
 
Start machine learning in 5 simple steps
Start machine learning in 5 simple stepsStart machine learning in 5 simple steps
Start machine learning in 5 simple stepsRenjith M P
 
Mathassess demo-20090130
Mathassess demo-20090130Mathassess demo-20090130
Mathassess demo-20090130JB Online
 
IRJET- Unabridged Review of Supervised Machine Learning Regression and Classi...
IRJET- Unabridged Review of Supervised Machine Learning Regression and Classi...IRJET- Unabridged Review of Supervised Machine Learning Regression and Classi...
IRJET- Unabridged Review of Supervised Machine Learning Regression and Classi...IRJET Journal
 
Machine learning and_neural_network_lecture_slide_ece_dku
Machine learning and_neural_network_lecture_slide_ece_dkuMachine learning and_neural_network_lecture_slide_ece_dku
Machine learning and_neural_network_lecture_slide_ece_dkuSeokhyun Yoon
 
The Science of DBMS: Query Optimization
The Science of DBMS: Query Optimization The Science of DBMS: Query Optimization
The Science of DBMS: Query Optimization SAP Technology
 
Nag software For Finance
Nag software For FinanceNag software For Finance
Nag software For Financefcassier
 
IRJET- Machine Learning Techniques for Code Optimization
IRJET-  	  Machine Learning Techniques for Code OptimizationIRJET-  	  Machine Learning Techniques for Code Optimization
IRJET- Machine Learning Techniques for Code OptimizationIRJET Journal
 
On the application of SAT solvers for Search Based Software Testing
On the application of SAT solvers for Search Based Software TestingOn the application of SAT solvers for Search Based Software Testing
On the application of SAT solvers for Search Based Software Testingjfrchicanog
 
Testify smart testoptimization-ecfeed
Testify smart testoptimization-ecfeedTestify smart testoptimization-ecfeed
Testify smart testoptimization-ecfeedMinh Nguyen
 
Lessons Learned from Building Machine Learning Software at Netflix
Lessons Learned from Building Machine Learning Software at NetflixLessons Learned from Building Machine Learning Software at Netflix
Lessons Learned from Building Machine Learning Software at NetflixJustin Basilico
 
Machine learning for java developers
Machine learning for java developersMachine learning for java developers
Machine learning for java developersNirmal Fernando
 
LNCS 5050 - Bilevel Optimization and Machine Learning
LNCS 5050 - Bilevel Optimization and Machine LearningLNCS 5050 - Bilevel Optimization and Machine Learning
LNCS 5050 - Bilevel Optimization and Machine Learningbutest
 
Numerical Excellence In Finance N A G Jan2010
Numerical Excellence In Finance N A G Jan2010Numerical Excellence In Finance N A G Jan2010
Numerical Excellence In Finance N A G Jan2010John Holden
 
Natural Computing for Vehicular Networks
Natural Computing for Vehicular NetworksNatural Computing for Vehicular Networks
Natural Computing for Vehicular NetworksJamal Toutouh, PhD
 
Towards Developing a Repository of Logical Errors Observed in Parallel Code t...
Towards Developing a Repository of Logical Errors Observed in Parallel Code t...Towards Developing a Repository of Logical Errors Observed in Parallel Code t...
Towards Developing a Repository of Logical Errors Observed in Parallel Code t...Ritu Arora
 

Similar to Programming Math in Java: Lessons from Apache Commons Math (20)

【FIT2016チュートリアル】ここから始める情報処理 ~機械学習編~
【FIT2016チュートリアル】ここから始める情報処理  ~機械学習編~【FIT2016チュートリアル】ここから始める情報処理  ~機械学習編~
【FIT2016チュートリアル】ここから始める情報処理 ~機械学習編~
 
Pythia Reloaded: An Intelligent Unit Testing-Based Code Grader for Education
Pythia Reloaded: An Intelligent Unit Testing-Based Code Grader for EducationPythia Reloaded: An Intelligent Unit Testing-Based Code Grader for Education
Pythia Reloaded: An Intelligent Unit Testing-Based Code Grader for Education
 
Start machine learning in 5 simple steps
Start machine learning in 5 simple stepsStart machine learning in 5 simple steps
Start machine learning in 5 simple steps
 
Mathassess demo-20090130
Mathassess demo-20090130Mathassess demo-20090130
Mathassess demo-20090130
 
Math Assess Demo 20090130
Math Assess Demo 20090130Math Assess Demo 20090130
Math Assess Demo 20090130
 
IRJET- Unabridged Review of Supervised Machine Learning Regression and Classi...
IRJET- Unabridged Review of Supervised Machine Learning Regression and Classi...IRJET- Unabridged Review of Supervised Machine Learning Regression and Classi...
IRJET- Unabridged Review of Supervised Machine Learning Regression and Classi...
 
Machine learning and_neural_network_lecture_slide_ece_dku
Machine learning and_neural_network_lecture_slide_ece_dkuMachine learning and_neural_network_lecture_slide_ece_dku
Machine learning and_neural_network_lecture_slide_ece_dku
 
The Science of DBMS: Query Optimization
The Science of DBMS: Query Optimization The Science of DBMS: Query Optimization
The Science of DBMS: Query Optimization
 
Nag software For Finance
Nag software For FinanceNag software For Finance
Nag software For Finance
 
IRJET- Machine Learning Techniques for Code Optimization
IRJET-  	  Machine Learning Techniques for Code OptimizationIRJET-  	  Machine Learning Techniques for Code Optimization
IRJET- Machine Learning Techniques for Code Optimization
 
On the application of SAT solvers for Search Based Software Testing
On the application of SAT solvers for Search Based Software TestingOn the application of SAT solvers for Search Based Software Testing
On the application of SAT solvers for Search Based Software Testing
 
Testify smart testoptimization-ecfeed
Testify smart testoptimization-ecfeedTestify smart testoptimization-ecfeed
Testify smart testoptimization-ecfeed
 
Lessons Learned from Building Machine Learning Software at Netflix
Lessons Learned from Building Machine Learning Software at NetflixLessons Learned from Building Machine Learning Software at Netflix
Lessons Learned from Building Machine Learning Software at Netflix
 
Machine learning for java developers
Machine learning for java developersMachine learning for java developers
Machine learning for java developers
 
LNCS 5050 - Bilevel Optimization and Machine Learning
LNCS 5050 - Bilevel Optimization and Machine LearningLNCS 5050 - Bilevel Optimization and Machine Learning
LNCS 5050 - Bilevel Optimization and Machine Learning
 
Ssbse12b.ppt
Ssbse12b.pptSsbse12b.ppt
Ssbse12b.ppt
 
Numerical Excellence In Finance N A G Jan2010
Numerical Excellence In Finance N A G Jan2010Numerical Excellence In Finance N A G Jan2010
Numerical Excellence In Finance N A G Jan2010
 
geekgap.io webinar #1
geekgap.io webinar #1geekgap.io webinar #1
geekgap.io webinar #1
 
Natural Computing for Vehicular Networks
Natural Computing for Vehicular NetworksNatural Computing for Vehicular Networks
Natural Computing for Vehicular Networks
 
Towards Developing a Repository of Logical Errors Observed in Parallel Code t...
Towards Developing a Repository of Logical Errors Observed in Parallel Code t...Towards Developing a Repository of Logical Errors Observed in Parallel Code t...
Towards Developing a Repository of Logical Errors Observed in Parallel Code t...
 

More from Phil Steitz

Apache Commons Pool and DBCP - Version 2 Update
Apache Commons Pool and DBCP - Version 2 UpdateApache Commons Pool and DBCP - Version 2 Update
Apache Commons Pool and DBCP - Version 2 UpdatePhil Steitz
 
Leadership model
Leadership modelLeadership model
Leadership modelPhil Steitz
 
Open Development in the Enterprise
Open Development in the EnterpriseOpen Development in the Enterprise
Open Development in the EnterprisePhil Steitz
 
Commons Pool and DBCP
Commons Pool and DBCPCommons Pool and DBCP
Commons Pool and DBCPPhil Steitz
 
Principles of Technology Leadership
Principles of Technology LeadershipPrinciples of Technology Leadership
Principles of Technology LeadershipPhil Steitz
 

More from Phil Steitz (7)

Apache Commons Pool and DBCP - Version 2 Update
Apache Commons Pool and DBCP - Version 2 UpdateApache Commons Pool and DBCP - Version 2 Update
Apache Commons Pool and DBCP - Version 2 Update
 
Commons Nabla
Commons NablaCommons Nabla
Commons Nabla
 
Concurrency
ConcurrencyConcurrency
Concurrency
 
Leadership model
Leadership modelLeadership model
Leadership model
 
Open Development in the Enterprise
Open Development in the EnterpriseOpen Development in the Enterprise
Open Development in the Enterprise
 
Commons Pool and DBCP
Commons Pool and DBCPCommons Pool and DBCP
Commons Pool and DBCP
 
Principles of Technology Leadership
Principles of Technology LeadershipPrinciples of Technology Leadership
Principles of Technology Leadership
 

Recently uploaded

Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmSujith Sukumaran
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWave PLM
 
How to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdfHow to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdfLivetecs LLC
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsAhmed Mohamed
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)OPEN KNOWLEDGE GmbH
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based projectAnoyGreter
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxTier1 app
 
What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....kzayra69
 
Buds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in NoidaBuds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in Noidabntitsolutionsrishis
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtimeandrehoraa
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfAlina Yurenko
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEEVICTOR MAESTRE RAMIREZ
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesŁukasz Chruściel
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Andreas Granig
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样umasea
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Matt Ray
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackVICTOR MAESTRE RAMIREZ
 

Recently uploaded (20)

Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalm
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 
How to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdfHow to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdf
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML Diagrams
 
2.pdf Ejercicios de programación competitiva
2.pdf Ejercicios de programación competitiva2.pdf Ejercicios de programación competitiva
2.pdf Ejercicios de programación competitiva
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based project
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
 
What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....
 
Buds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in NoidaBuds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in Noida
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtime
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEE
 
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort ServiceHot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New Features
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStack
 

Programming Math in Java: Lessons from Apache Commons Math

  • 1. Programming Math in Java Lessons from Apache Commons Math Phil Steitz psteitz@apache.org Apachecon North America 2015 April 14, 2015 Apachecon North America 2015 Programming Math in Java April 14, 2015 1 / 48
  • 2. Agenda Commons Math Overview Examples and problems Lessons learned Getting involved Apachecon North America 2015 Programming Math in Java April 14, 2015 2 / 48
  • 3. Disclaimer Apache Commons Math is a community-based, open development project. The discussion of problems, challenges and lessons learned represents one contributor’s views. Apachecon North America 2015 Programming Math in Java April 14, 2015 3 / 48
  • 4. Goals Self-contained, ASL-licensed Java Provide simple solutions to common problems Implement standard, well-documented algorithms Balance ease of use and good design Stay well-tested and maintained Maintain a strong community Apachecon North America 2015 Programming Math in Java April 14, 2015 4 / 48
  • 5. Some Statistics (as of March 11, 2015)... 67 packages 908 classes 84,057 lines of code 54,629 lines of comments 99.7% documented API 5862 unit tests 92% of code covered 80 open issues Average 216 issues resolved per year 6 active committers 70+ contributors 0 dependencies Apachecon North America 2015 Programming Math in Java April 14, 2015 5 / 48
  • 6. Quick Tour Apachecon North America 2015 Programming Math in Java April 14, 2015 6 / 48
  • 7. Building Blocks - util, special, complex packages FastMath (pure java replacement for java.lang.Math) Dfp (arbitrary precision arithmetic) Continued fractions Special functions (Beta, Gamma, Bessel, Erf) Array utilities (norms, normalization, filling, etc) Basic combinatorics ( n k , n!, Stirling numbers...) Basic integer arithmetic (checked ops, gcd, lcm, primality...) Complex numbers Apachecon North America 2015 Programming Math in Java April 14, 2015 7 / 48
  • 8. Linear Algebra - linear package Vector and Matrix operations (dense, sparse, field) Decompositions (LU, QR, Cholesky, SVD, Eigenvalue) Solvers Basic Numerical Analysis - analysis package Automatic Differentiation Numerical Integration Interpolation Root finders Apachecon North America 2015 Programming Math in Java April 14, 2015 8 / 48
  • 9. Probability and Statistics - distributions, stat and random packages Probability distributions (Gaussian, Exponential, Poisson...) Random number generators (Well, ISAAC, Mersenne twister...) Random data generators (samplers, random vectors...) Univariate statistics (storeless and in-memory) Regression (matrix-based and storeless) Correlation Inference (T-, G-, ChiSquare, Kolmogorov-Smirnov tests...) Apachecon North America 2015 Programming Math in Java April 14, 2015 9 / 48
  • 10. Optimization and Geometry - optim, fitting, ode, transform, geometry packages Fitting (non-linear least-squares, curve fitting) Ordinary differential equations (ODE) Linear/Non-linear optimization Computational Geometry Fast Fourier transform Apachecon North America 2015 Programming Math in Java April 14, 2015 10 / 48
  • 11. AI / Machine learning - genetics, filter and ml packages Genetic algorithms Neural networks Clustering Kalman filter Apachecon North America 2015 Programming Math in Java April 14, 2015 11 / 48
  • 12. Examples and Challenges Apachecon North America 2015 Programming Math in Java April 14, 2015 12 / 48
  • 13. Example 1 - Root finding Find all roots of the polynomial p(x) = x5 + 4x3 + x2 + 4 = (x + 1)(x2 − x + 1)(x2 + 4) double[] coefficients = { 4.0, 0.0, 1.0, 4.0, 0.0, 1.0 }; LaguerreSolver solver = new LaguerreSolver(); Complex[] result = solver.solveAllComplex(coefficients, 0); Apachecon North America 2015 Programming Math in Java April 14, 2015 13 / 48
  • 14. Example - Root finding (cont) How about this one? p(x) = 50 i=1 15, 000(i + ix + ix2 + ix3 ) As of 3.4.1, this will appear to hang. Two problems: 1 Complex iterates "escape to NaN" 2 Default max function evaluations is set to Integer.MAX_VALUE Apachecon North America 2015 Programming Math in Java April 14, 2015 14 / 48
  • 15. Example 1 - Root finding (cont) First problem: escape to NaN Long and painful history debating C99x Annex G Desire: clear, self-contained, open documentation Desire: simple, performant code Result: use standard definitions and let NaNs propagate Second problem: hanging Ongoing debate about what to make defaults, whether to have defaults at all... Ease of use vs foot-shooting potential OO vs procedural / struct / primitives design Apachecon North America 2015 Programming Math in Java April 14, 2015 15 / 48
  • 16. Example 2 - Kolmogorov-Smirnov Test Problem: Given two samples S1 and S2, are they drawn from the same underlying probability distribution? // load sample data into double[] arrays double[] s1 = ... double[] s2 = ... KolmogorovSmirnovTest test = new KolmogorovSmirnovTest(); // Compute p-value for test double pValue = test.kolmogorovSmirnovTest(s1, s2); Apachecon North America 2015 Programming Math in Java April 14, 2015 16 / 48
  • 17. K-S Test Implementation Challenges Test statistic = Dm,n = maximum difference between the empirical distribution functions of the two samples. The distribution of Dm,n is asymptotically an easy-to-compute distribution For very small m, n, the distribution can be computed exactly, but expensively What to do for mid-size samples (100 < m × n < 10, 000)? Should this distribution be in the distributions package? Reference data / correctness very hard to verify Apachecon North America 2015 Programming Math in Java April 14, 2015 17 / 48
  • 18. Example 3 - Multivariate optimization Find the minimum value of 100(y − x2 )2 + (1 − x)2 , starting at the point (−1, 1) MultivariateFunction f = new MultivariateFunction() { public double value(double[] x) { double xs = x[0] * x[0]; return (100 * FastMath.pow((x[1] - xs), 2)) + FastMath.pow(1 - x[0], 2); } }; PowellOptimizer optim = new PowellOptimizer(1E-13, 1e-40); PointValuePair result = optim.optimize(new MaxEval(1000), new ObjectiveFunction(f), GoalType.MINIMIZE, new InitialGuess(new double[] {-1, 1})); Works OK in this case, but... Apachecon North America 2015 Programming Math in Java April 14, 2015 18 / 48
  • 19. Example 3 - Optimization (cont) User question: How do I choose among the available optimizers? Developer question: How do we define a consistent API? Another solution for Example 3: BOBYQAOptimizer optim = new BOBYQAOptimizer(4); PointValuePair result = optim.optimize(new MaxEval(1000), new ObjectiveFunction(f), GoalType.MINIMIZE, new InitialGuess(new double[] {-1, 1}), SimpleBounds.unbounded(2)); Note additional parameter to optimize. Apachecon North America 2015 Programming Math in Java April 14, 2015 19 / 48
  • 20. Example 3 - Optimization (cont) To allow arbitrary parameter lists, we settled on varargs ... public abstract class BaseOptimizer<PAIR> ... /** * Stores data and performs the optimization. * * The list of parameters is open-ended so that sub-classes can extend it * with arguments specific to their concrete implementations. * * When the method is called multiple times, instance data is overwritten * only when actually present in the list of arguments: when not specified, * data set in a previous call is retained (and thus is optional in * subsequent calls). ... * @param optData Optimization data. ... */ public PAIR optimize(OptimizationData... optData) Apachecon North America 2015 Programming Math in Java April 14, 2015 20 / 48
  • 21. Optimization (cont) - version 4 direction Refactor all classes in "o.a.c.m.optim" Phase out "OptimizationData" Use "fluent" API Separate optimization problem from optimization procedure Experimentation has started with least squares optimizers: LevenbergMarquardtOptimizer GaussNewtonOptimizer Apachecon North America 2015 Programming Math in Java April 14, 2015 21 / 48
  • 22. Optimization (cont) - version 4 direction New classes/interfaces in "o.a.c.m.fitting.leastsquares": LeastSquaresProblem and LeastSquaresProblem.Evaluation LeastSquaresOptimizer and LeastSquaresOptimizer.Optimum LeastSquaresFactory and LeastSquaresBuilder Usage: LeastSquaresProblem problem = LeastSquaresFactory.create( /* ... */ ); LeastSquaresOptimizer optim = new LevenbergMarquardtOptimizer( /* ... */ ); Optimum result = optim.optimize(problem); Yet to be done: Refactor all the other optimizers. Apachecon North America 2015 Programming Math in Java April 14, 2015 22 / 48
  • 23. Example 3 - Optimization (cont) Some additional challenges: Stochastic algorithms are tricky to test FORTRAN ports create awful Java code, but "fixing" can create discrepancies BOBYQAOptimizer port has never really been fully supported Answering user questions requires expertise and time Apachecon North America 2015 Programming Math in Java April 14, 2015 23 / 48
  • 24. Message from our sponsors... Come join us! Code is interesting! Math is interesting! Bugs are interesting! People are interesting! Come join us! http://commons.apache.org/math Apachecon North America 2015 Programming Math in Java April 14, 2015 24 / 48
  • 25. Example 4 - Aggregated statistics Problem: Aggregate summary statistics across multiple processes // Create a collection of stats collectors Collection<SummaryStatistics> aggregate = new ArrayList<SummaryStatistics>(); // Add a collector, hand it out to some process // and gather some data... SummaryStatistics procStats = new SummaryStatistics(); aggregate.add(procStats); procStats.addValue(wildValue); ... // Aggregate results StatisticalSummary aggregatedStats = AggregateSummaryStatistics.aggregate(aggregate); SummaryStatistics instances handle large data streams StatisticalSummary is reporting interface Question: how to distribute instances / workload? Apachecon North America 2015 Programming Math in Java April 14, 2015 25 / 48
  • 26. Threading and workload management Questions: 1 Should we implement multi-threaded algorithms? 2 How can we be more friendly for Hadoop / other distributed compute environments? So far (as of 3.4.1), we have not done 1. So far, answer to 2 has been "leave it to users." Could be both answers are wrong... Apachecon North America 2015 Programming Math in Java April 14, 2015 26 / 48
  • 27. Example 5 - Genetic algorithm Another example where concurrent execution would be natural... // initialize a new genetic algorithm GeneticAlgorithm ga = new GeneticAlgorithm( new OnePointCrossover<Integer>(), CROSSOVER_RATE, new BinaryMutation(), MUTATION_RATE, new TournamentSelection(TOURNAMENT_ARITY) ); // initial population Population initial = getPopulation(); // stopping conditions StoppingCondition stopCond = new FixedGenerationCount(NUM_GENERATIONS); // run the algorithm Population finalPopulation = ga.evolve(initial, stopCond); // best chromosome from the final population Chromosome bestFinal = finalPopulation.getFittestChromosome(); Question: how to distribute "evolve" workload? Apachecon North America 2015 Programming Math in Java April 14, 2015 27 / 48
  • 28. Example 6 - OLS regression Three ways to do it OLSMultipleRegression (stat.regression) in-memory MillerUpdatingRegression (stat.regression) streaming LevenbergMarquardtOptimizer (fitting.leastsquares) in-memory 1 How to make sure users discover the right solution for them? 2 How much dependency / reuse / common structure to force? Apachecon North America 2015 Programming Math in Java April 14, 2015 28 / 48
  • 29. Example 6 - OLS regression (cont) Simplest, most common: OLSMultipleLinearRegression regression = new OLSMultipleLinearRegression(); // Load data double[] y = new double[]{11.0, 12.0, 13.0, 14.0, 15.0, 16.0}; double[] x = new double[6][]; x[0] = new double[]{0, 0, 0, 0, 0}; x[1] = new double[]{2.0, 0, 0, 0, 0}; ... x[5] = new double[]{0, 0, 0, 0, 6.0}; regression.newSample(y, x); // Estimate model double[] beta = regression.estimateRegressionParameters(); double[] residuals = regression.estimateResiduals(); double rSquared = regression.calculateRSquared(); double sigma = regression.estimateRegressionStandardError(); Question: What happens if design matrix is singular? Apachecon North America 2015 Programming Math in Java April 14, 2015 29 / 48
  • 30. Example 6 - OLS regression (cont) Answer: if it is "exactly" singular, SingularMatrixException If it is near-singular, garbage out So make singularity threshold configurable: OLSMultipleLinearRegression regression = new OLSMultipleLinearRegression(threshold); What exactly is this threshold? What should exception error message say? How to doc it? What if we change underlying impl? Apachecon North America 2015 Programming Math in Java April 14, 2015 30 / 48
  • 31. Message from our sponsors... Come join us! Code is interesting! Math is interesting! Bugs are interesting! People are interesting! Come join us! http://commons.apache.org/math Apachecon North America 2015 Programming Math in Java April 14, 2015 31 / 48
  • 32. Example 7 - Rank correlation Given two double arrays x[] and y[], we want a statistical measure of how well their rank orderings match. Use Spearman’s rank correlation coefficient. double[] x = ... double[] y = ... SpearmansCorrelation corr = new SpearmansCorrelation(); corr.correlation(x, y); How are ties in the data handled? What if one of the arrays contains NaNs? Apachecon North America 2015 Programming Math in Java April 14, 2015 32 / 48
  • 33. Example 7 - Rank correlation (cont) As of 3.4.1, by default ties are averaged and NaNs cause NotANumberException Both are configurable via RankingAlgorithm constructor argument RankingAlgorithm is a ranking implementation plus TiesStrategy - what ranks to assign to tied values NaNStrategy - what to do with NaN values Default in Spearman’s is natural ranking on doubles with ties averaged and NaNs disallowed Apachecon North America 2015 Programming Math in Java April 14, 2015 33 / 48
  • 34. Example 7 - Rank correlation (cont) NaNStrategies: MINIMAL - NaNs are treated as minimal in the ordering MAXIMAL - NaNs are treated as maximal in the ordering REMOVED - NaNs are removed FIXED - NaNs are left "in place" FAILED - NaNs cause NotANumberException Problems: 1 What if user supplies a NaturalRanking with the NaN strategy that says remove NaNs? 2 How to doc clearly and completely? Apachecon North America 2015 Programming Math in Java April 14, 2015 34 / 48
  • 35. Example 8 - Random Numbers Commons Math provides alternatives to the JDK PRNG and a pluggable framework RandomGenerator interface is like j.u.Random All random data generation within Commons Math is pluggable Low-level example: // Allocate an array to hold random bytes byte[] bytes = new byte[20]; // Instantiate a Well generator with seed = 100; Well19937c gen = new Well19937c(100); // Fill the array with random bytes from the generator gen.nextBytes(bytes); Apachecon North America 2015 Programming Math in Java April 14, 2015 35 / 48
  • 36. Example 8 - Random Numbers (cont) High-level example: // Create a RandomDataGenerator using a Mersenne Twister RandomDataGenerator gen = new RandomDataGenerator(new MersenneTwister()); // Sample a random value from the Poisson(2) distribution long dev = gen.nextPoisson(2); Problems: 1 Well generators (used as defaults a lot) have some initialization overhead. How / when to take this hit? 2 Distribution classes also have sample() methods, but that makes them dependent on generators Apachecon North America 2015 Programming Math in Java April 14, 2015 36 / 48
  • 37. Example 9 - Linear Programming Maximize 7x1 + 3x2 subject to 3x1 − 5x3 ≤ 0 2x1 − 5x4 ≤ 0 ... (more constraints) LinearObjectiveFunction f = new LinearObjectiveFunction(new double[] { 7, 3, 0, 0 }, 0 ); List<LinearConstraint> constraints = new ArrayList<LinearConstraint>(); constraints.add( new LinearConstraint(new double[] { 3, 0, -5, 0 }, Relationship.LEQ, 0.0)); constraints.add(new LinearConstraint(new double[] { 2, 0, 0, -5 }, Relationship.LEQ, 0.0)); ... SimplexSolver solver = new SimplexSolver(); PointValuePair solution = solver.optimize(new MaxIter(100), f, new LinearConstraintSet(constraints), GoalType.MAXIMIZE, new NonNegativeConstraint(true)); Apachecon North America 2015 Programming Math in Java April 14, 2015 37 / 48
  • 38. Example 9 - Linear Programming (cont) Question: What happens if the algorithm does not converge? Answer: TooManyIterationsException What if I want to know the last PointValuePair? SolutionCallback callback = new SolutionCallback(); try { PointValuePair solution = solver.optimize(new MaxIter(100), f, new LinearConstraintSet(constraints), GoalType.MAXIMIZE, new NonNegativeConstraint(true), callback); } catch (TooManyIterationsException ex) { PointValuePair lastIterate = callback.getSolution(); } (Rejected) Alternatives: 1 Don’t throw, but embed some kind of status object in return 2 Shove context data in exception messages Apachecon North America 2015 Programming Math in Java April 14, 2015 38 / 48
  • 39. Message from our sponsors... Come join us! Code is interesting! Math is interesting! Bugs are interesting! People are interesting! Come join us! http://commons.apache.org/math Apachecon North America 2015 Programming Math in Java April 14, 2015 39 / 48
  • 40. Example 10 - Geometry Calculate the convex hull and enclosing ball of a set of 2D points: List<Vector2D> points = ... ConvexHullGenerator2D generator = new MonotoneChain(true, 1e-6); ConvexHull2D hull = generator.generate(points); Encloser<Euclidean2D, Vector2D> encloser = new WelzlEncloser<Euclidean2D, Vector2D>(1e-6, new DiskGenerator()); EnclosingBall<Euclidean2D, Vector2D> ball = encloser.enclose(points); Problems: 1 algorithms are prone to numerical problems 2 important to specify a meaningful tolerance, but depends on input data 3 does it make sense to support a default tolerance? Apachecon North America 2015 Programming Math in Java April 14, 2015 40 / 48
  • 41. Lessons Learned Apachecon North America 2015 Programming Math in Java April 14, 2015 41 / 48
  • 42. Key Challenges Recap... Balancing performance, mathematical correctness and algorithm fidelity Fully documenting behavior and API contracts Steering users toward good solutions as simply as possible Balancing OO design and internal reuse with approachability Obscure and abandoned contributions Reference implementations and / or data for validation Backward compatibility vs API improvement Apachecon North America 2015 Programming Math in Java April 14, 2015 42 / 48
  • 43. Some lessons learned Let practical use cases drive performance / accuracy / fidelity decisions Be careful with ports and large contributions Limit public API to what users need Prefer standard definitions and algorithms Take time to fully research bugs and patches Constantly improve javadoc and User Guide Try to avoid compatibility breaks, but bundle into major version releases when you have to Apachecon North America 2015 Programming Math in Java April 14, 2015 43 / 48
  • 44. Get Involved! Apachecon North America 2015 Programming Math in Java April 14, 2015 44 / 48
  • 45. Using Apache Commons Math Maven: <dependency> <groupId>org.apache.commons</groupId> <artifactId>commons-math3</artifactId> <version>3.4.1</version> </dependency> Download: http://commons.apache.org/math/download_math.cgi Watch for new versions! Apachecon North America 2015 Programming Math in Java April 14, 2015 45 / 48
  • 46. Links Project homepage: http://commons.apache.org/math/ Issue tracker: https://issues.apache.org/jira/browse/MATH Mailinglists: dev@commons.apache.org & user@commons.apache.org e-mail subject: [math] Wiki: http://wiki.apache.org/commons/MathWishList Apachecon North America 2015 Programming Math in Java April 14, 2015 46 / 48
  • 47. How to contribute? Check out the user guide http://commons.apache.org/math/userguide Ask - and answer - questions on the user mailing list Participate in discussions on the dev list Create bug reports / feature requests / improvements in JIRA Create and submit patches (code, documentation, examples) Look at http://commons.apache.org/math/developers.html Run mvn site and review CheckStyle, Findbugs reports when creating patches Don’t be shy asking for help with maven, git, coding style, math ... anything Provide feedback - most welcome!Apachecon North America 2015 Programming Math in Java April 14, 2015 47 / 48
  • 48. Questions? Apachecon North America 2015 Programming Math in Java April 14, 2015 48 / 48