Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
1 / 47
mlpack: or, How I Learned To Stop Worrying and
Love C++
Ryan R. Curtin
September 23, 2016
Introduction
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
pro...
Introduction
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
pro...
Introduction
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
pro...
Introduction
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
pro...
Introduction
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
pro...
Introduction
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
pro...
Introduction
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
pro...
Graph #1
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
program...
Graph #1
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
program...
Graph #1
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
program...
Graph #1
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
program...
Graph #1
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
program...
Graph #1
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
program...
Graph #1
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
program...
Graph #1
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
program...
Graph #1
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
program...
The Big Tradeoff
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line...
The Big Tradeoff
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line...
The Big Tradeoff
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line...
So, mlpack.
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
prog...
So, mlpack.
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
prog...
So, mlpack.
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
prog...
Standard ML
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
prog...
Cutting-edge ML
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
...
Command-line programs
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command...
Command-line programs
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command...
Command-line programs
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command...
Command-line programs
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command...
Genericity
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
progr...
Genericity
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
progr...
Genericity
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
progr...
Genericity
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
progr...
Genericity
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
progr...
Genericity
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
progr...
Genericity
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
progr...
Genericity
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
progr...
Genericity
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
progr...
Genericity
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
progr...
Pros of C++
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
prog...
Pros of C++
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
prog...
Pros of C++
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
prog...
Pros of C++
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
prog...
Pros of C++
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
prog...
Pros of C++
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
prog...
Pros of C++
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
prog...
Cons of C++
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
prog...
Cons of C++
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
prog...
Cons of C++
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
prog...
Cons of C++
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
prog...
Cons of C++
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
prog...
Why templates?
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
p...
Why templates?
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
p...
Why templates?
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
p...
Why templates?
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
p...
Why templates?
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
p...
Why templates?
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
p...
Compile-time expressions
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Comm...
Compile-time expressions
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Comm...
Compile-time expressions
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Comm...
Compile-time expressions
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Comm...
Compile-time expressions
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Comm...
Compile-time expressions
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Comm...
Compile-time expressions
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Comm...
Compile-time expressions
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Comm...
Compile-time expressions
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Comm...
Compile-time expressions
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Comm...
Compile-time expressions
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Comm...
Compile-time expressions
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Comm...
Take-home
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
progra...
kNN benchmarks
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
p...
kNN benchmarks
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
p...
vs. Spark
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
progra...
vs. Spark
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
progra...
What’s coming?
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
p...
Questions and comments?
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Comma...
Upcoming SlideShare
Loading in …5
×

Ryan Curtin, Principal Research Scientist, Symantec at MLconf ATL 2016

407 views

Published on

mlpack: Or, How I Learned To Stop Worrying and Love C++: mlpack is a cutting-edge C++ machine learning library containing fast implementations of both standard machine learning algorithms and recently-published algorithms. In this talk, I will introduce mlpack, its design philosophy, and discuss how C++ is helpful for making implementations fast, as well as the pros and cons of C++ as a language choice. I will briefly review the capabilities of mlpack, then focus on mlpack’s flexibility by demonstrating the k-means clustering code (and maybe some other algorithms too, like nearest neighbor search), and how it might be used in a production environment. The project website can be found at http://www.mlpack.org/.

Published in: Technology
  • Be the first to comment

  • Be the first to like this

Ryan Curtin, Principal Research Scientist, Symantec at MLconf ATL 2016

  1. 1. 1 / 47 mlpack: or, How I Learned To Stop Worrying and Love C++ Ryan R. Curtin September 23, 2016
  2. 2. Introduction ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 2 / 47 Why are we here?
  3. 3. Introduction ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 2 / 47 Why are we here? speed
  4. 4. Introduction ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 2 / 47 Why are we here? speed programming languages
  5. 5. Introduction ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 2 / 47 Why are we here? speed programming languages machine learning
  6. 6. Introduction ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 2 / 47 Why are we here? speed programming languages machine learning C++
  7. 7. Introduction ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 2 / 47 Why are we here? speed programming languages machine learning C++ mlpack
  8. 8. Introduction ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 2 / 47 Why are we here? speed programming languages machine learning C++ mlpack fast
  9. 9. Graph #1 ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 3 / 47
  10. 10. Graph #1 ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 3 / 47 low-level high-level
  11. 11. Graph #1 ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 3 / 47 low-level high-level ASM
  12. 12. Graph #1 ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 3 / 47 low-level high-level ASM
  13. 13. Graph #1 ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 3 / 47 low-level high-level ASM VB
  14. 14. Graph #1 ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 3 / 47 low-level high-level ASM VB
  15. 15. Graph #1 ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 3 / 47 low-level high-level ASM VB C C++ Java Python Scala Ruby INTERCAL C# PHP JS Go Tcl Lua Note: this is not a scientific or particularly accurate representation.
  16. 16. Graph #1 ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 3 / 47 low-level high-level ASM VB C C++ Java Python Scala Ruby INTERCAL C# PHP JS Go Tcl Lua Note: this is not a scientific or particularly accurate representation. fast easy
  17. 17. Graph #1 ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 3 / 47 low-level high-level ASM VB C C++ Java Python Scala Ruby INTERCAL C# PHP JS Go Tcl Lua Note: this is not a scientific or particularly accurate representation. fast easy specific portable
  18. 18. The Big Tradeoff ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 4 / 47 speed vs. portability and readability
  19. 19. The Big Tradeoff ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 4 / 47 speed vs. portability and readability
  20. 20. The Big Tradeoff ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 4 / 47 speed vs. portability and readability If we’re careful, we can get speed, portability, and readability by using C++.
  21. 21. So, mlpack. ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 5 / 47 What is it?
  22. 22. So, mlpack. ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 5 / 47 What is it? ● a fast general-purpose C++ machine learning library ● contains flexible implementations of common and cutting-edge machine learning algorithms ● for fast or big runs on single workstations ● bindings are available for R, and are coming for other languages ● commonly used inside the machine learning community ● 60+ developers from around the world ● frequent participation in the Google Summer of Code program R.R. Curtin, J.R. Cline, N.P. Slagle, W.B. March, P. Ram, N.A. Mehta, A.G. Gray, “mlpack: a scalable C++ machine learning library”, in The Journal of Machine Learning Research, vol. 14, p. 801–805, 2013.
  23. 23. So, mlpack. ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 5 / 47 What is it? ● a fast general-purpose C++ machine learning library ● contains flexible implementations of common and cutting-edge machine learning algorithms ● for fast or big runs on single workstations ● bindings are available for R, and are coming for other languages ● commonly used inside the machine learning community ● 60+ developers from around the world ● frequent participation in the Google Summer of Code program R.R. Curtin, J.R. Cline, N.P. Slagle, W.B. March, P. Ram, N.A. Mehta, A.G. Gray, “mlpack: a scalable C++ machine learning library”, in The Journal of Machine Learning Research, vol. 14, p. 801–805, 2013. http://www.mlpack.org/ https://github.com/mlpack/mlpack/
  24. 24. Standard ML ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 6 / 47 mlpack implements a lot of standard machine learning techniques. ● k-means and mean shift clustering ● nearest neighbor search and range search ● naive Bayes classifier ● regression: linear, logistic, LARS, softmax ● independent components analysis ● matrix factorization and collaborative filtering ● AdaBoost (plus some weak learners) ● GMMs and HMMs ● deep learning (still experimental) ● kernel PCA and regular PCA ● data preprocessing utilities ● numerous generic optimizers
  25. 25. Cutting-edge ML ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 7 / 47 mlpack also implements new, cutting-edge machine learning techniques. ● several variants of the k-means algorithm ● density estimation trees ● Hoeffding (streaming) decision trees (and experimental streaming forests) ● sparse coding and local coordinate coding ● dual-tree minimum spanning tree calculation ● dual-tree fast max-kernel search ● neighborhood components analysis ● rank-approximate nearest neighbor search ● automatically tuned locality-sensitive hashing (still experimental) ● QUIC-SVD (fast approximate SVD)
  26. 26. Command-line programs ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 8 / 47 You don’t need to be a C++ expert.
  27. 27. Command-line programs ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 9 / 47 You don’t need to be a C++ expert. # Train AdaBoost model. $ mlpack_adaboost -t training_file.h5 -l training_labels.h5 > -M trained_model.bin # Predict with AdaBoost model. $ mlpack_adaboost -m trained_model.bin -T test_set.csv > -o test_set_predictions.csv
  28. 28. Command-line programs ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 10 / 47 You don’t need to be a C++ expert. # Train AdaBoost model. $ mlpack_adaboost -t training_file.h5 -l training_labels.h5 > -M trained_model.bin # Predict with AdaBoost model. $ mlpack_adaboost -m trained_model.bin -T test_set.csv > -o test_set_predictions.csv # Use PCA to reduce to 5 dimensions. $ mlpack_pca -i dataset.txt -d 5 -o new_dataset.csv
  29. 29. Command-line programs ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 11 / 47 You don’t need to be a C++ expert. # Train AdaBoost model. $ mlpack_adaboost -t training_file.h5 -l training_labels.h5 > -M trained_model.bin # Predict with AdaBoost model. $ mlpack_adaboost -m trained_model.bin -T test_set.csv > -o test_set_predictions.csv # Use PCA to reduce to 5 dimensions. $ mlpack_pca -i dataset.txt -d 5 -o new_dataset.csv # Impute missing values ("NULL") in the input dataset to the # mean in that dimension. $ mlpack_preprocess_imputer -i dataset.h5 -s mean -o imputed.h5
  30. 30. Genericity ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 12 / 47 Why write an algorithm for one specific situation?
  31. 31. Genericity ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 13 / 47 Why write an algorithm for one specific situation? NearestNeighborSearch n(dataset); n.Search(query_set, 3, neighbors, distances); What if I don’t want the Euclidean distance?
  32. 32. Genericity ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 14 / 47 Why write an algorithm for one specific situation? // The numeric parameter is the value of p for the p-norm to // use. 1 = Manhattan distance, 2 = Euclidean distance, etc. NearestNeighborSearch n(dataset, 1); n.Search(query_set, 3, neighbors, distances); Ok, this is a little better!
  33. 33. Genericity ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 15 / 47 Why write an algorithm for one specific situation? // ManhattanDistance is a class with a method Evaluate(). NearestNeighborSearch<ManhattanDistance> n(dataset); n.Search(query_set, 3, neighbors, distances); This is much better! The user can specify whatever distance metric they want, including one they write themselves.
  34. 34. Genericity ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 16 / 47 Why write an algorithm for one specific situation? // This will _definitely_ get me best paper at ICML! I can // feel it! class MyStupidDistance { static double Evaluate(const arma::vec& a, const arma::vec& b) { return 15.0 * std::abs(a[0] - b[0]); } }; // Now we can use it! NearestNeighborSearch<MyStupidDistance> n(dataset); n.Search(query_set, 3, neighbors, distances);
  35. 35. Genericity ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 17 / 47 Why write an algorithm for one specific situation? // We can also use sparse matrices instead! NearestNeighborSearch<MyStupidDistance, arma::sp_mat> n(sparse_dataset); n.Search(sparse_query_set, 3, neighbors, distances);
  36. 36. Genericity ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 18 / 47 Why write an algorithm for one specific situation? // Nearest neighbor search with arbitrary types of trees! NearestNeighborSearch<EuclideanDistance, arma::mat, KDTree> kn; NearestNeighborSearch<EuclideanDistance, arma::sp_mat, CoverTree> cn; NearestNeighborSearch<ManhattanDistance, arma::mat, Octree> on; NearestNeighborSearch<ChebyshevDistance, arma::sp_mat, RPlusTree> rn; NearestNeighborSearch<MahalanobisDistance, arma::mat, RPTree> rpn; NearestNeighborSearch<EuclideanDistance, arma::mat, XTree> xn;
  37. 37. Genericity ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 19 / 47 Why write an algorithm for one specific situation? // Nearest neighbor search with arbitrary types of trees! NearestNeighborSearch<EuclideanDistance, arma::mat, KDTree> kn; NearestNeighborSearch<EuclideanDistance, arma::sp_mat, CoverTree> cn; NearestNeighborSearch<ManhattanDistance, arma::mat, Octree> on; NearestNeighborSearch<ChebyshevDistance, arma::sp_mat, RPlusTree> rn; NearestNeighborSearch<MahalanobisDistance, arma::mat, RPTree> rpn; NearestNeighborSearch<EuclideanDistance, arma::mat, XTree> xn; R.R. Curtin, “Improving dual-tree algorithms”. PhD thesis, Georgia Institute of Technology, At- lanta, GA, 8/2015.
  38. 38. Genericity ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 20 / 47 Why write an algorithm for one specific situation? // Nearest neighbor search with arbitrary types of trees! NearestNeighborSearch<EuclideanDistance, arma::mat, KDTree> kn; NearestNeighborSearch<EuclideanDistance, arma::sp_mat, CoverTree> cn; NearestNeighborSearch<ManhattanDistance, arma::mat, Octree> on; NearestNeighborSearch<ChebyshevDistance, arma::sp_mat, RPlusTree> rn; NearestNeighborSearch<MahalanobisDistance, arma::mat, RPTree> rpn; NearestNeighborSearch<EuclideanDistance, arma::mat, XTree> xn; R.R. Curtin, “Improving dual-tree algorithms”. PhD thesis, Georgia Institute of Technology, At- lanta, GA, 8/2015.
  39. 39. Genericity ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 21 / 47 Why write an algorithm for one specific situation? // Nearest neighbor search with arbitrary types of trees! NearestNeighborSearch<EuclideanDistance, arma::mat, KDTree> kn; NearestNeighborSearch<EuclideanDistance, arma::sp_mat, CoverTree> cn; NearestNeighborSearch<ManhattanDistance, arma::mat, Octree> on; NearestNeighborSearch<ChebyshevDistance, arma::sp_mat, RPlusTree> rn; NearestNeighborSearch<MahalanobisDistance, arma::mat, RPTree> rpn; NearestNeighborSearch<EuclideanDistance, arma::mat, XTree> xn; R.R. Curtin, “Improving dual-tree algorithms”. PhD thesis, Georgia Institute of Technology, At- lanta, GA, 8/2015.
  40. 40. Pros of C++ ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 22 / 47 C++ is great!
  41. 41. Pros of C++ ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 22 / 47 C++ is great! ● Generic programming at compile time via templates.
  42. 42. Pros of C++ ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 22 / 47 C++ is great! ● Generic programming at compile time via templates. ● Low-level memory management.
  43. 43. Pros of C++ ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 22 / 47 C++ is great! ● Generic programming at compile time via templates. ● Low-level memory management. ● Little to no runtime overhead.
  44. 44. Pros of C++ ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 22 / 47 C++ is great! ● Generic programming at compile time via templates. ● Low-level memory management. ● Little to no runtime overhead. ● Well-known!
  45. 45. Pros of C++ ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 22 / 47 C++ is great! ● Generic programming at compile time via templates. ● Low-level memory management. ● Little to no runtime overhead. ● Well-known! ● The Armadillo library gives us good linear algebra primitives.
  46. 46. Pros of C++ ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 23 / 47 C++ is great! ● Generic programming at compile time via templates. ● Low-level memory management. ● Little to no runtime overhead. ● Well-known! ● The Armadillo library gives us good linear algebra primitives. using namespace arma; extern mat x, y; mat z = (x + y) * chol(x) + 3 * chol(y.t());
  47. 47. Cons of C++ ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 24 / 47 C++ is not great!
  48. 48. Cons of C++ ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 24 / 47 C++ is not great! ● Templates can be hard to debug because of error messages.
  49. 49. Cons of C++ ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 24 / 47 C++ is not great! ● Templates can be hard to debug because of error messages. ● Memory bugs are easy to introduce.
  50. 50. Cons of C++ ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 24 / 47 C++ is not great! ● Templates can be hard to debug because of error messages. ● Memory bugs are easy to introduce. ● The new language revisions are not making the language any simpler...
  51. 51. Cons of C++ ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 24 / 47 C++ is not great! ● Templates can be hard to debug because of error messages. ● Memory bugs are easy to introduce. ● The new language revisions are not making the language any simpler...
  52. 52. Why templates? ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 25 / 47 What about virtual inheritance?
  53. 53. Why templates? ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 26 / 47 What about virtual inheritance? class MyStupidDistance : public Distance { virtual double Evaluate(const arma::vec& a, const arma::vec& b) { return 15.0 * std::abs(a[0] - b[0]); } }; NearestNeighborSearch n(dataset, new MyStupidDistance()); n.Search(3, neighbors, distances);
  54. 54. Why templates? ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 27 / 47 What about virtual inheritance? class MyStupidDistance : public Distance { virtual double Evaluate(const arma::vec& a, const arma::vec& b) { return 15.0 * std::abs(a[0] - b[0]); } }; NearestNeighborSearch n(dataset, new MyStupidDistance()); n.Search(3, neighbors, distances); vtable lookup penalty!
  55. 55. Why templates? ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 28 / 47 Using inheritance to call a function costs us instructions: Distance* d = new MyStupidDistance(); d->Evaluate(a, b); MyStupidDistance::Evaluate(a, b);
  56. 56. Why templates? ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 29 / 47 Using inheritance to call a function costs us instructions: Distance* d = new MyStupidDistance(); d->Evaluate(a, b); MyStupidDistance::Evaluate(a, b); ; push stack pointer movq %rsp, %rdi ; get location of function movq $_ZTV1A+16, (%rsp) ; call Evaluate() call _ZN1A1aEd ; just call Evaluate()! call _ZN1B1aEd.isra.0.constprop.1
  57. 57. Why templates? ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 30 / 47 Using inheritance to call a function costs us instructions: Distance* d = new MyStupidDistance(); d->Evaluate(a, b); MyStupidDistance::Evaluate(a, b); ; push stack pointer movq %rsp, %rdi ; get location of function movq $_ZTV1A+16, (%rsp) ; call Evaluate() call _ZN1A1aEd ; just call Evaluate()! call _ZN1B1aEd.isra.0.constprop.1 Up to 10%+ performance penalty in some situations!
  58. 58. Compile-time expressions ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 31 / 47 What about math? (Armadillo)
  59. 59. Compile-time expressions ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 32 / 47 What about math? (Armadillo) In C: extern double** a, b, c, d, e; extern int rows, cols; // We want to do e = a + b + c + d. mat_copy(e, a, rows, cols); mat_add(e, b, rows, cols); mat_add(e, c, rows, cols); mat_add(e, d, rows, cols);
  60. 60. Compile-time expressions ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 33 / 47 What about math? (Armadillo) In C with a custom function: extern double** a, b, c, d, e; extern int rows, cols; // We want to do e = a + b + c + d. mat_add4_into(e, a, b, c, d, rows, cols); Fastest! (one pass)
  61. 61. Compile-time expressions ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 34 / 47 What about math? (Armadillo) In MATLAB: e = a + b + c + d
  62. 62. Compile-time expressions ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 35 / 47 What about math? (Armadillo) In MATLAB: e = a + b + c + d Beautiful!
  63. 63. Compile-time expressions ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 36 / 47 What about math? (Armadillo)
  64. 64. Compile-time expressions ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 37 / 47 What about math? (Armadillo)
  65. 65. Compile-time expressions ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 38 / 47 What about math? (Armadillo) In C++ (with Armadillo): using namespace arma; extern mat a, b, c, d; mat e = a + b + c + d; No temporaries, only one pass! Just as fast as the fastest C implementation.
  66. 66. Compile-time expressions ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 39 / 47 What about math? (Armadillo) In C++ (with Armadillo): using namespace arma; extern mat a, b, c, d; mat e = a + b + c + d; ● operator+(mat, mat) returns a placeholder type op<mat, mat, add>
  67. 67. Compile-time expressions ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 40 / 47 What about math? (Armadillo) In C++ (with Armadillo): using namespace arma; extern mat a, b, c, d; mat e = a + b + c + d; ● operator+(mat, mat) returns a placeholder type op<mat, mat, add> ● operator+() is also overloaded for op<...> arguments
  68. 68. Compile-time expressions ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 41 / 47 What about math? (Armadillo) In C++ (with Armadillo): using namespace arma; extern mat a, b, c, d; mat e = a + b + c + d; ● operator+(mat, mat) returns a placeholder type op<mat, mat, add> ● operator+() is also overloaded for op<...> arguments ● So a + b + c + d returns op<mat, op<mat, op<mat, mat, add>, add>, add>
  69. 69. Compile-time expressions ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 42 / 47 What about math? (Armadillo) In C++ (with Armadillo): using namespace arma; extern mat a, b, c, d; mat e = a + b + c + d; ● operator+(mat, mat) returns a placeholder type op<mat, mat, add> ● operator+() is also overloaded for op<...> arguments ● So a + b + c + d returns op<mat, op<mat, op<mat, mat, add>, add>, add> ● operator=() is overloaded to ‘unwrap’ op<> expressions
  70. 70. Take-home ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 43 / 47 ● Templates give us generic code. ● Templates allow us to generate fast code.
  71. 71. kNN benchmarks ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 44 / 47 dataset d N mlpack mlpy matlab scikit shogun Weka isolet 617 8k 15.65s 59.09s 50.88s 44.59s 59.56s 220.38s corel 32 68k 17.70s 95.26s fail 63.32s fail 29.38s covertype 54 581k 18.04s 27.68s >9000s 44.55s >9000s 42.34s twitter 78 583k 1573.92s >9000s >9000s 4637.81s fail >9000s mnist 784 70k 3129.46s >9000s fail 8494.24s 6040.16s >9000s tinyImages 384 100k 4535.38s 9000s fail >9000s fail >9000s
  72. 72. kNN benchmarks ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 44 / 47 dataset d N mlpack mlpy matlab scikit shogun Weka isolet 617 8k 15.65s 59.09s 50.88s 44.59s 59.56s 220.38s corel 32 68k 17.70s 95.26s fail 63.32s fail 29.38s covertype 54 581k 18.04s 27.68s >9000s 44.55s >9000s 42.34s twitter 78 583k 1573.92s >9000s >9000s 4637.81s fail >9000s mnist 784 70k 3129.46s >9000s fail 8494.24s 6040.16s >9000s tinyImages 384 100k 4535.38s 9000s fail >9000s fail >9000s
  73. 73. vs. Spark ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 45 / 47 We can use mmap() for out-of-core learning because our algorithms are generic!
  74. 74. vs. Spark ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 45 / 47 We can use mmap() for out-of-core learning because our algorithms are generic! D. Fang, P. Chau. M3: scaling up machine learning via memory mapping, SIGMOD/PODS 2016.
  75. 75. What’s coming? ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 46 / 47 ● Bindings to other languages! (more than just R) ● Non-experimental support for deep learning ● Automatic tuning of machine learning algorithms ● Further compile-time speedups
  76. 76. Questions and comments? ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 47 / 47 http://www.mlpack.org/ https://github.com/mlpack/mlpack/

×