SlideShare a Scribd company logo
NumPy and SciPy: History
                     and Ideas for the Future
                               Travis E. Oliphant

                          SciPy Tokyo. March 18, 2012




Saturday, March 17, 12
My Roots




Saturday, March 17, 12
My Roots
                         Images from BYU Mers Lab




Saturday, March 17, 12
Science led to Python
                         2
   ⇢0 (2⇡f ) Ui (a, f ) = [Cijkl (a, f ) Uk,l (a, f )],j

          Raja Muthupillai


                                 Richard Ehman
                                      1997



                                                      Armando Manduca


Saturday, March 17, 12
Finding derivatives of 5-d data
                                  ⌅=r⇥U




Saturday, March 17, 12
Scientist at heart




Saturday, March 17, 12
Python origins.                      http://python-history.blogspot.com/2009/01/brief-timeline-of-python.html



           Version         Date
                0.9.0    Feb. 1991
                0.9.4    Dec. 1991
                0.9.6    Apr. 1992
                0.9.8    Jan. 1993
                1.0.0    Jan. 1994
                 1.2     Apr. 1995
                 1.4     Oct. 1996
                1.5.2    Apr. 1999


Saturday, March 17, 12
Python origins.                      http://python-history.blogspot.com/2009/01/brief-timeline-of-python.html



           Version         Date
                0.9.0    Feb. 1991
                0.9.4    Dec. 1991
                0.9.6    Apr. 1992
                0.9.8    Jan. 1993
                1.0.0    Jan. 1994
                 1.2     Apr. 1995
                 1.4     Oct. 1996
                1.5.2    Apr. 1999


Saturday, March 17, 12
Brief History

                              Person               Package       Year
                                                 Matrix Object
                            Jim Fulton                           1994
                                                  in Python
                          Jim Hugunin              Numeric       1995
                         Perry Greenfield, Rick
                          White, Todd Miller      Numarray       2001

                         Travis Oliphant            NumPy        2005



Saturday, March 17, 12
Found Python and Numeric in 1997
              I was a fairly proficient MATLAB user, but it was not memory efficient enough.



                     Loved the expressive syntax of Python
                  Loved the fact that slicing didn’t make copies
                     Loved the existing multiple data-types
                    Loved how much more flexible it was to
                         extend than MATLAB was
                  Loved that I could read the source code and
                                   extend it




Saturday, March 17, 12
First problem: Efficient Data Input
                             “It’s All About the Data”

                                                 Reference Counting Essay
                           TableIO     http://www.python.org/doc/essays/refcnt/
                          April 1998                   May 1998


      Michael A. Miller                                                       Guido van Rossum



                                          NumPyIO
                                          June 1998




Saturday, March 17, 12
First problem: Efficient Data Input
                             “It’s All About the Data”

                                                 Reference Counting Essay
                           TableIO     http://www.python.org/doc/essays/refcnt/
                          April 1998                   May 1998


      Michael A. Miller                                                       Guido van Rossum



                                          NumPyIO
                                          June 1998




Saturday, March 17, 12
Early pieces of SciPy
                   fftw wrappers              cephesmodule
                     June 1998               November 1998




                 stats.py
      December 1998
                                    Gary
                                 Strangman




Saturday, March 17, 12
1999 : Early SciPy emerges
       Discussions on the matrix-sig from 1997 to 1999 wanting a complete data analysis
    environment: Paul Barrett, Joe Harrington, Perry Greenfield, Paul Dubois, Konrad Hinsen,
                 and others. Activity in 1998, led to increased interest in 1999.

     In response on 15 Jan, 1999, I posted to matrix-sig a list of routines I felt needed to be
    present and began wrapping / writing in earnest. On 6 April 1999, I announced I would
                  be creating this uber-package which eventually became SciPy

              Gaussian quadrature                 5 Jan 1999
                     cephes 1.0                  30 Jan 1999
                    sigtools 0.40                23 Feb 1999
                  Numeric docs                   March 1999
                     cephes 1.1                  9 Mar 1999                                       Plotting??
                   multipack 0.3                 13 Apr 1999
                 Helper routines                 14 Apr 1999                                       Gist
          multipack 0.6 (leastsq, ode, fsolve,   29 Apr 1999                                      XPLOT
                         quad)
                                                                                                  DISLIN
              sparse plan described              30 May 1999
                                                                                                  Gnuplot
                   multipack 0.7                 14 Jun 1999
                   SparsePy 0.1
               cephes 1.2 (vectorize)
                                                 5 Nov 1999
                                                 29 Dec 1999
                                                                           Helping with f2py


Saturday, March 17, 12
Early 2000 : Numeric needs


             •      Memory mapped arrays
             •      Rank-0 arrays or scalars
             •      Handling indirect indexing: a[[10,5,7]]
             •      Handling masked indexing: a[[True, False, False]
             •      More attributes to N-d arrays
             •      “Record arrays”




Saturday, March 17, 12
Early contributors in 1999
                                Hosting of first Multipack CVS repository (June 1999)
                                                  Amazing makefiles
                                                Interface to FITPACK
                             Wrote f2py as he watched my brute-force approach (July 1999)
      Pearu Peterson



                         (IPP) Early IPython interactive environment (27 Apr 1999)
                                       Matlab file reader (24 Apr 1999)

        Janko Hauser


                             Created windows binaries of multipack, cephesmodule,
                            fftw, and signaltools (June 1999 while still in high school!)

        Robert Kern



Saturday, March 17, 12
Early contributors in 1999
                                Hosting of first Multipack CVS repository (June 1999)
                                                  Amazing makefiles
                                                Interface to FITPACK
                             Wrote f2py as he watched my brute-force approach (July 1999)
      Pearu Peterson



                         (IPP) Early IPython interactive environment (27 Apr 1999)
                                       Matlab file reader (24 Apr 1999)

        Janko Hauser


                             Created windows binaries of multipack, cephesmodule,
                            fftw, and signaltools (June 1999 while still in high school!)

        Robert Kern



Saturday, March 17, 12
Saturday, March 17, 12
Saturday, March 17, 12
Saturday, March 17, 12
2000-2001




Saturday, March 17, 12
SciPy 2001        Travis Oliphant
                             optimize
                              sparse
                           interpolate
                            integrate
                              special
                               signal
                                stats      Founded in 2001 with Travis Vaught
                              fftpack
                                misc




                                                            Eric Jones
                                                              weave
                                                             cluster
     Pearu Peterson
                                                               GA*
          linalg
       interpolate
           f2py



Saturday, March 17, 12
Finally... Plotting



                    John Hunter
                       2001



Saturday, March 17, 12
IPython (building on IPP)
                    Fernando Perez
                     Dec 10, 2001




Saturday, March 17, 12
STSCI leads out with Numarray


             Perry Greenfield
               J. Todd Miller
                Rick White
                Paul Barrett


Saturday, March 17, 12
Numarray released in 2003

         • too slow for small arrays
         • incomplete implementation for ufuncs
         • minimal Numeric code re-use

         • lots of very nice things, though (e.g. memory
           maps, fast code for large arrays, better
           sorting algorithms)



Saturday, March 17, 12
Split in the community


                         Numeric
                          SciPy

                                   Numarray
                                   ndimage
                                    others




Saturday, March 17, 12
ndimage




Saturday, March 17, 12
started January 2005

                              NumPy
                          Version 1.0 October 2006



                         Key contributions from:
                                Numarray
                                 Numeric
                               Chuck Harris
                               Robert Kern
                               David Cooke
                                Pierre GM




Saturday, March 17, 12
NumPy in Python
            Q: When is NumPy going to be part of the
            core Python?

            A: Data structure is in Python as PEP 3118




Saturday, March 17, 12
Saturday, March 17, 12
Left Academica For Industry




Saturday, March 17, 12
Community effort                           many, many others --- forgive me!
      • Chuck Harris
      • Pauli Virtanen
      • David Cournapeau
      • Stefan van der Walt
      • Jarrod Millman
      • Josef Perktold
      • Anne Archibald
      • Dag Sverre Seljebotn
      • Robert Kern
      • Matthew Brett
      • Warren Weckesser
      • Ralf Gommers
      • Joe Harrington --- Documentation effort
      • Andrew Straw --- www.scipy.org




Saturday, March 17, 12
NumPy Array




                         shape




Saturday, March 17, 12
NumPy Essentials
        • Data: the array object
               – slicing
               – shapes and strides
               – data-type generality

        • Fast Math:
               – vectorization
               – broadcasting
               – aggregations


Saturday, March 17, 12
Zen of NumPy
          •   strided is better than scattered
          •   contiguous is better than strided
          •   descriptive is better than imperative
          •   array-oriented is better than object-oriented
          •   broadcasting is a great idea
          •   vectorized is better than an explicit loop
          •   unless it’s too complicated --- then use Cython / weave
          •   think in higher dimensions




Saturday, March 17, 12
Now What?




Saturday, March 17, 12
Experiences Conulsting
         • Basically, I was a consultant and manager of
           consultants for 4 years.
         • It kept the lights on at home, but gave me minimal
           time for NumPy.
         • Silver lining was that I learned exactly what “big-data”
           problems big companies have and saw first-hand that
           numerous “little” improvements need to be made to
           NumPy which will allow it to have a larger impact on
           helping to manage the world’s data.




Saturday, March 17, 12
Putting Science back in CS
                • Much of the software stack is for systems
                  programming --- C++, Java, .NET, ObjC, web
                   - Complex numbers?
                   - Vectorization primitives?
                • Array-based programming has been
                  supplanted by Object-oriented programming
                • Software stack for scientists is not as helpful
                  as it should be
                • Fortran is still where many scientists end up


Saturday, March 17, 12
APL : the first array-oriented language
    • Appeared in 1964
    • Originated by Ken Iverson
    • Direct descendants (J, K, Matlab) are still
      used heavily and people pay a lot of money
      for them                            APL
    • NumPy is a descendent                J

                                     K   Matlab
                                                  Numeric
                                                  NumPy




Saturday, March 17, 12
Conway’s game of Life
              • Dead cell with exactly 3 live neighbors
                will come to life
              • A live cell with 2 or 3 neighbors will
                survive
              • With too few or too many neighbors, the
                cell dies




Saturday, March 17, 12
Interesting Patterns emerge




Saturday, March 17, 12
Conway’s Game of Life
     APL


    NumPy
                         Initialization


        Update Step




Saturday, March 17, 12
Improvements needed
        • NDArray improvements
          • Indexes (esp. for Structured arrays)
          • SQL front-end
          • Multi-level, hierarchical labels
          • selection via mappings (labeled arrays)
          • Memory spaces (array made up of regions)
          • Distributed arrays (global array)
          • Compressed arrays
          • Standard distributed persistance
          • fancy indexing as view and optimizations
          • streaming arrays


Saturday, March 17, 12
Improvements needed
         • Dtype improvements
           • Enumerated types (including dynamic enumeration)
           • Derived fields
           • Specification as a class (or JSON)
           • Pointer dtype (i.e. C++ object, or varchar)
           • Finishing datetime
           • Missing data with both bit-patterns and mask
           • Parameterized field names




Saturday, March 17, 12
Example of Object Dtype

                         @np.dtype
                         class Stock(np.DType):
                               symbol = np.Str(4)
                               open = np.Int(2)
                               close = np.Int(2)
                               high = np.Int(2)
                               low = np.Int(2)
                               @np.Int(2)
                               def mid(self):
                                   return (self.high + self.low) / 2.0




Saturday, March 17, 12
Improvements needed
         • Ufunc improvements
           • Generalized ufuncs support more than just
             contiguous arrays
           • Specification of ufuncs in Python
           • Move most dtype “array functions” to ufuncs
           • Unify error-handling for all computations
           • Allow lazy-evaluation and remote computation ---
             streaming and generator data
           • Structured and string dtype ufuncs
           • Multi-core and GPU optimized ufuncs
           • Group-by reduction



Saturday, March 17, 12
Improvements needed
         • Miscellaneous improvements
           • ABI-management
           • Eventual Move to C++ library (NDLib)
           • NDLib could serve as base for Javascript and other
             high-level languages
           • Integration with LLVM
           • Possible dtype / shape / stride unification into a
             “dimension protocol”
           • Remote computation
           • Fast I/O for CSV and Excel




Saturday, March 17, 12
NumPy Users

             • Want to be able to write Python to get fast
                   code that works on arrays and scalars
             •     Need access to a boat-load of C-extensions
                   (NumPy is just the beginning)


                            PyPy doesn’t cut it for us!




Saturday, March 17, 12
Ufuncs




Saturday, March 17, 12
                                         Generalized
                                          UFuncs
                                                                               Python
                                                                              Function
                                          Window
                                          Kernel
                                           Funcs

                                          Function-
                                            based
                                          Indexing


                                          Memory
                                                                                         Dynamic compilation




                                           Filters
                                                                       Dynamic
                                                                      Compilation




                         NumPy Runtime
                                         I/O Filters



                                         Reduction
                                          Filters


                                         Computed
                                         Columns
                                                       function pointer
SciPy needs a Python compiler

                         optimize                   integrate


                         special                       ode



                         writing more of SciPy at high-level




Saturday, March 17, 12
Numba -- a Python compiler

               • Replays byte-code on a stack with simple type-
                 inference
               • Translates to LLVM (using LLVM-py)
               • Uses LLVM for code-gen
               • Resulting C-level function-pointer can be
                 inserted into NumPy run-time
               • Understands NumPy arrays
               • Is NumPy / SciPy aware


Saturday, March 17, 12
NumPy + Mamba = Numba
                  Python Function                             Machine Code


                                            LLVM-PY

                                            LLVM 3.1
                         ISPC      OpenCL    OpenMP    CUDA     CLANG

                           Intel       AMD        Nvidia      Apple



Saturday, March 17, 12
Examples




Saturday, March 17, 12
Examples




Saturday, March 17, 12
Software Stack Future?
                           Plateaus of Code re-use + DSLs
                     SQL                                R
                              TDPL                                Matlab


                                      Python


                               OBJC                C
                    FORTRAN                                 C++



                                       LLVM



Saturday, March 17, 12
Seeking Developers!



                    https://github.com/ContinuumIO/numba




Saturday, March 17, 12
How to pay for all this?




Saturday, March 17, 12
Dual strategy




                         NumPy 2.0



Saturday, March 17, 12
NumFOCUS
    Num(Py) Foundation for Open Code for Usable Science




Saturday, March 17, 12
NumFOCUS


            • Mission
              • To initiate and support educational programs
                furthering the use of open source software in
                science.
              • To promote the use of high-level languages and
                open source in science, engineering, and math
                research
              • To encourage reproducible scientific research




Saturday, March 17, 12
NumFOCUS


           • Activites
             • Sponsor sprints and conferences
             • Provide scholarships and grants
             • Pay for documentation development and basic
               course development
             • Work with domain-specific organizations
             • Raise funds from industries using Python and
               NumPy



Saturday, March 17, 12
NumFOCUS

                • Directors
                  • Jarrod Millman
                  • Fernando Perez
                  • Travis Oliphant
                  • Perry Greenfield
                  • John Hunter
                • Members
                  • Basically people who donate for now. In time, a
                    body that elects directors.



Saturday, March 17, 12

More Related Content

What's hot

楽しいクォータニオンの世界 田所 第二回Rogyゼミ
楽しいクォータニオンの世界 田所 第二回Rogyゼミ楽しいクォータニオンの世界 田所 第二回Rogyゼミ
楽しいクォータニオンの世界 田所 第二回Rogyゼミ
rogy01
 
Introduction to Generative Adversarial Networks (GANs)
Introduction to Generative Adversarial Networks (GANs)Introduction to Generative Adversarial Networks (GANs)
Introduction to Generative Adversarial Networks (GANs)
Appsilon Data Science
 
Bayesian Structural Equations Modeling (SEM)
Bayesian Structural Equations Modeling (SEM)Bayesian Structural Equations Modeling (SEM)
Bayesian Structural Equations Modeling (SEM)
Hamy Temkit
 
Deep neural networks
Deep neural networksDeep neural networks
Deep neural networks
Si Haem
 
Variational Autoencoder Tutorial
Variational Autoencoder Tutorial Variational Autoencoder Tutorial
Variational Autoencoder Tutorial
Hojin Yang
 
Winning data science competitions, presented by Owen Zhang
Winning data science competitions, presented by Owen ZhangWinning data science competitions, presented by Owen Zhang
Winning data science competitions, presented by Owen Zhang
Vivian S. Zhang
 
Deep Learning With Neural Networks
Deep Learning With Neural NetworksDeep Learning With Neural Networks
Deep Learning With Neural Networks
Aniket Maurya
 
그림 그리는 AI
그림 그리는 AI그림 그리는 AI
그림 그리는 AI
NAVER Engineering
 
Diffusion models beat gans on image synthesis
Diffusion models beat gans on image synthesisDiffusion models beat gans on image synthesis
Diffusion models beat gans on image synthesis
BeerenSahu
 
Feature Pyramid Network, FPN
Feature Pyramid Network, FPNFeature Pyramid Network, FPN
Feature Pyramid Network, FPN
Institute of Agricultural Machinery, NARO
 
Robustness in deep learning
Robustness in deep learningRobustness in deep learning
Robustness in deep learning
Ganesan Narayanasamy
 
Introduction to XGBoost
Introduction to XGBoostIntroduction to XGBoost
Introduction to XGBoost
Joonyoung Yi
 
從 VAE 走向深度學習新理論
從 VAE 走向深度學習新理論從 VAE 走向深度學習新理論
從 VAE 走向深度學習新理論
岳華 杜
 
Optimizers
OptimizersOptimizers
Optimizers
Il Gu Yi
 
PRML読書会#4資料+補足
PRML読書会#4資料+補足PRML読書会#4資料+補足
PRML読書会#4資料+補足Hiromasa Ohashi
 
非制約最小二乗密度比推定法 uLSIF を用いた外れ値検出
非制約最小二乗密度比推定法 uLSIF を用いた外れ値検出非制約最小二乗密度比推定法 uLSIF を用いた外れ値検出
非制約最小二乗密度比推定法 uLSIF を用いた外れ値検出
hoxo_m
 
03.12 cnn backpropagation
03.12 cnn backpropagation03.12 cnn backpropagation
03.12 cnn backpropagation
Dea-hwan Ki
 
2次元/3次元幾何学変換の統一的な最適計算論文
2次元/3次元幾何学変換の統一的な最適計算論文2次元/3次元幾何学変換の統一的な最適計算論文
2次元/3次元幾何学変換の統一的な最適計算論文doboncho
 
Swarm intelligence
Swarm intelligenceSwarm intelligence
Swarm intelligence
Shaheena Begum Mohammed
 
Feature Engineering
Feature EngineeringFeature Engineering
Feature Engineering
Sri Ambati
 

What's hot (20)

楽しいクォータニオンの世界 田所 第二回Rogyゼミ
楽しいクォータニオンの世界 田所 第二回Rogyゼミ楽しいクォータニオンの世界 田所 第二回Rogyゼミ
楽しいクォータニオンの世界 田所 第二回Rogyゼミ
 
Introduction to Generative Adversarial Networks (GANs)
Introduction to Generative Adversarial Networks (GANs)Introduction to Generative Adversarial Networks (GANs)
Introduction to Generative Adversarial Networks (GANs)
 
Bayesian Structural Equations Modeling (SEM)
Bayesian Structural Equations Modeling (SEM)Bayesian Structural Equations Modeling (SEM)
Bayesian Structural Equations Modeling (SEM)
 
Deep neural networks
Deep neural networksDeep neural networks
Deep neural networks
 
Variational Autoencoder Tutorial
Variational Autoencoder Tutorial Variational Autoencoder Tutorial
Variational Autoencoder Tutorial
 
Winning data science competitions, presented by Owen Zhang
Winning data science competitions, presented by Owen ZhangWinning data science competitions, presented by Owen Zhang
Winning data science competitions, presented by Owen Zhang
 
Deep Learning With Neural Networks
Deep Learning With Neural NetworksDeep Learning With Neural Networks
Deep Learning With Neural Networks
 
그림 그리는 AI
그림 그리는 AI그림 그리는 AI
그림 그리는 AI
 
Diffusion models beat gans on image synthesis
Diffusion models beat gans on image synthesisDiffusion models beat gans on image synthesis
Diffusion models beat gans on image synthesis
 
Feature Pyramid Network, FPN
Feature Pyramid Network, FPNFeature Pyramid Network, FPN
Feature Pyramid Network, FPN
 
Robustness in deep learning
Robustness in deep learningRobustness in deep learning
Robustness in deep learning
 
Introduction to XGBoost
Introduction to XGBoostIntroduction to XGBoost
Introduction to XGBoost
 
從 VAE 走向深度學習新理論
從 VAE 走向深度學習新理論從 VAE 走向深度學習新理論
從 VAE 走向深度學習新理論
 
Optimizers
OptimizersOptimizers
Optimizers
 
PRML読書会#4資料+補足
PRML読書会#4資料+補足PRML読書会#4資料+補足
PRML読書会#4資料+補足
 
非制約最小二乗密度比推定法 uLSIF を用いた外れ値検出
非制約最小二乗密度比推定法 uLSIF を用いた外れ値検出非制約最小二乗密度比推定法 uLSIF を用いた外れ値検出
非制約最小二乗密度比推定法 uLSIF を用いた外れ値検出
 
03.12 cnn backpropagation
03.12 cnn backpropagation03.12 cnn backpropagation
03.12 cnn backpropagation
 
2次元/3次元幾何学変換の統一的な最適計算論文
2次元/3次元幾何学変換の統一的な最適計算論文2次元/3次元幾何学変換の統一的な最適計算論文
2次元/3次元幾何学変換の統一的な最適計算論文
 
Swarm intelligence
Swarm intelligenceSwarm intelligence
Swarm intelligence
 
Feature Engineering
Feature EngineeringFeature Engineering
Feature Engineering
 

More from Shohei Hido

CuPy: A NumPy-compatible Library for GPU
CuPy: A NumPy-compatible Library for GPUCuPy: A NumPy-compatible Library for GPU
CuPy: A NumPy-compatible Library for GPU
Shohei Hido
 
Deep Learning Lab 異常検知入門
Deep Learning Lab 異常検知入門Deep Learning Lab 異常検知入門
Deep Learning Lab 異常検知入門
Shohei Hido
 
NIPS2017概要
NIPS2017概要NIPS2017概要
NIPS2017概要
Shohei Hido
 
ディープラーニングの産業応用とそれを支える技術
ディープラーニングの産業応用とそれを支える技術ディープラーニングの産業応用とそれを支える技術
ディープラーニングの産業応用とそれを支える技術
Shohei Hido
 
機械学習モデルフォーマットの話:さようならPMML、こんにちはPFA
機械学習モデルフォーマットの話:さようならPMML、こんにちはPFA機械学習モデルフォーマットの話:さようならPMML、こんにちはPFA
機械学習モデルフォーマットの話:さようならPMML、こんにちはPFA
Shohei Hido
 
Software for Edge Heavy Computing @ INTEROP 2016 Tokyo
Software for Edge Heavy Computing @ INTEROP 2016 TokyoSoftware for Edge Heavy Computing @ INTEROP 2016 Tokyo
Software for Edge Heavy Computing @ INTEROP 2016 Tokyo
Shohei Hido
 
Chainer GTC 2016
Chainer GTC 2016Chainer GTC 2016
Chainer GTC 2016
Shohei Hido
 
How AI revolutionizes robotics and automotive industries
How AI revolutionizes robotics and automotive industriesHow AI revolutionizes robotics and automotive industries
How AI revolutionizes robotics and automotive industries
Shohei Hido
 
NIPS2015概要資料
NIPS2015概要資料NIPS2015概要資料
NIPS2015概要資料
Shohei Hido
 
プロダクトマネージャのお仕事
プロダクトマネージャのお仕事プロダクトマネージャのお仕事
プロダクトマネージャのお仕事
Shohei Hido
 
あなたの業務に機械学習を活用する5つのポイント
あなたの業務に機械学習を活用する5つのポイントあなたの業務に機械学習を活用する5つのポイント
あなたの業務に機械学習を活用する5つのポイント
Shohei Hido
 
PFIセミナー "「失敗の本質」を読む"発表資料
PFIセミナー "「失敗の本質」を読む"発表資料PFIセミナー "「失敗の本質」を読む"発表資料
PFIセミナー "「失敗の本質」を読む"発表資料
Shohei Hido
 
NIPS2013読み会: More Effective Distributed ML via a Stale Synchronous Parallel P...
NIPS2013読み会: More Effective Distributed ML via a Stale Synchronous Parallel P...NIPS2013読み会: More Effective Distributed ML via a Stale Synchronous Parallel P...
NIPS2013読み会: More Effective Distributed ML via a Stale Synchronous Parallel P...
Shohei Hido
 
機械学習CROSS 後半資料
機械学習CROSS 後半資料機械学習CROSS 後半資料
機械学習CROSS 後半資料
Shohei Hido
 
機械学習CROSS 前半資料
機械学習CROSS 前半資料機械学習CROSS 前半資料
機械学習CROSS 前半資料
Shohei Hido
 
Jubatus Casual Talks #2 異常検知入門
Jubatus Casual Talks #2 異常検知入門Jubatus Casual Talks #2 異常検知入門
Jubatus Casual Talks #2 異常検知入門
Shohei Hido
 
Jubatusが目指すインテリジェンス基盤
Jubatusが目指すインテリジェンス基盤Jubatusが目指すインテリジェンス基盤
Jubatusが目指すインテリジェンス基盤
Shohei Hido
 
今年のKDDベストペーパーを実装・公開しました
今年のKDDベストペーパーを実装・公開しました今年のKDDベストペーパーを実装・公開しました
今年のKDDベストペーパーを実装・公開しました
Shohei Hido
 
さらば!データサイエンティスト
さらば!データサイエンティストさらば!データサイエンティスト
さらば!データサイエンティスト
Shohei Hido
 
ICML2013読み会 開会宣言
ICML2013読み会 開会宣言ICML2013読み会 開会宣言
ICML2013読み会 開会宣言
Shohei Hido
 

More from Shohei Hido (20)

CuPy: A NumPy-compatible Library for GPU
CuPy: A NumPy-compatible Library for GPUCuPy: A NumPy-compatible Library for GPU
CuPy: A NumPy-compatible Library for GPU
 
Deep Learning Lab 異常検知入門
Deep Learning Lab 異常検知入門Deep Learning Lab 異常検知入門
Deep Learning Lab 異常検知入門
 
NIPS2017概要
NIPS2017概要NIPS2017概要
NIPS2017概要
 
ディープラーニングの産業応用とそれを支える技術
ディープラーニングの産業応用とそれを支える技術ディープラーニングの産業応用とそれを支える技術
ディープラーニングの産業応用とそれを支える技術
 
機械学習モデルフォーマットの話:さようならPMML、こんにちはPFA
機械学習モデルフォーマットの話:さようならPMML、こんにちはPFA機械学習モデルフォーマットの話:さようならPMML、こんにちはPFA
機械学習モデルフォーマットの話:さようならPMML、こんにちはPFA
 
Software for Edge Heavy Computing @ INTEROP 2016 Tokyo
Software for Edge Heavy Computing @ INTEROP 2016 TokyoSoftware for Edge Heavy Computing @ INTEROP 2016 Tokyo
Software for Edge Heavy Computing @ INTEROP 2016 Tokyo
 
Chainer GTC 2016
Chainer GTC 2016Chainer GTC 2016
Chainer GTC 2016
 
How AI revolutionizes robotics and automotive industries
How AI revolutionizes robotics and automotive industriesHow AI revolutionizes robotics and automotive industries
How AI revolutionizes robotics and automotive industries
 
NIPS2015概要資料
NIPS2015概要資料NIPS2015概要資料
NIPS2015概要資料
 
プロダクトマネージャのお仕事
プロダクトマネージャのお仕事プロダクトマネージャのお仕事
プロダクトマネージャのお仕事
 
あなたの業務に機械学習を活用する5つのポイント
あなたの業務に機械学習を活用する5つのポイントあなたの業務に機械学習を活用する5つのポイント
あなたの業務に機械学習を活用する5つのポイント
 
PFIセミナー "「失敗の本質」を読む"発表資料
PFIセミナー "「失敗の本質」を読む"発表資料PFIセミナー "「失敗の本質」を読む"発表資料
PFIセミナー "「失敗の本質」を読む"発表資料
 
NIPS2013読み会: More Effective Distributed ML via a Stale Synchronous Parallel P...
NIPS2013読み会: More Effective Distributed ML via a Stale Synchronous Parallel P...NIPS2013読み会: More Effective Distributed ML via a Stale Synchronous Parallel P...
NIPS2013読み会: More Effective Distributed ML via a Stale Synchronous Parallel P...
 
機械学習CROSS 後半資料
機械学習CROSS 後半資料機械学習CROSS 後半資料
機械学習CROSS 後半資料
 
機械学習CROSS 前半資料
機械学習CROSS 前半資料機械学習CROSS 前半資料
機械学習CROSS 前半資料
 
Jubatus Casual Talks #2 異常検知入門
Jubatus Casual Talks #2 異常検知入門Jubatus Casual Talks #2 異常検知入門
Jubatus Casual Talks #2 異常検知入門
 
Jubatusが目指すインテリジェンス基盤
Jubatusが目指すインテリジェンス基盤Jubatusが目指すインテリジェンス基盤
Jubatusが目指すインテリジェンス基盤
 
今年のKDDベストペーパーを実装・公開しました
今年のKDDベストペーパーを実装・公開しました今年のKDDベストペーパーを実装・公開しました
今年のKDDベストペーパーを実装・公開しました
 
さらば!データサイエンティスト
さらば!データサイエンティストさらば!データサイエンティスト
さらば!データサイエンティスト
 
ICML2013読み会 開会宣言
ICML2013読み会 開会宣言ICML2013読み会 開会宣言
ICML2013読み会 開会宣言
 

Recently uploaded

Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 

Recently uploaded (20)

Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 

Travis E. Oliphant, "NumPy and SciPy: History and Ideas for the Future"

  • 1. NumPy and SciPy: History and Ideas for the Future Travis E. Oliphant SciPy Tokyo. March 18, 2012 Saturday, March 17, 12
  • 3. My Roots Images from BYU Mers Lab Saturday, March 17, 12
  • 4. Science led to Python 2 ⇢0 (2⇡f ) Ui (a, f ) = [Cijkl (a, f ) Uk,l (a, f )],j Raja Muthupillai Richard Ehman 1997 Armando Manduca Saturday, March 17, 12
  • 5. Finding derivatives of 5-d data ⌅=r⇥U Saturday, March 17, 12
  • 7. Python origins. http://python-history.blogspot.com/2009/01/brief-timeline-of-python.html Version Date 0.9.0 Feb. 1991 0.9.4 Dec. 1991 0.9.6 Apr. 1992 0.9.8 Jan. 1993 1.0.0 Jan. 1994 1.2 Apr. 1995 1.4 Oct. 1996 1.5.2 Apr. 1999 Saturday, March 17, 12
  • 8. Python origins. http://python-history.blogspot.com/2009/01/brief-timeline-of-python.html Version Date 0.9.0 Feb. 1991 0.9.4 Dec. 1991 0.9.6 Apr. 1992 0.9.8 Jan. 1993 1.0.0 Jan. 1994 1.2 Apr. 1995 1.4 Oct. 1996 1.5.2 Apr. 1999 Saturday, March 17, 12
  • 9. Brief History Person Package Year Matrix Object Jim Fulton 1994 in Python Jim Hugunin Numeric 1995 Perry Greenfield, Rick White, Todd Miller Numarray 2001 Travis Oliphant NumPy 2005 Saturday, March 17, 12
  • 10. Found Python and Numeric in 1997 I was a fairly proficient MATLAB user, but it was not memory efficient enough. Loved the expressive syntax of Python Loved the fact that slicing didn’t make copies Loved the existing multiple data-types Loved how much more flexible it was to extend than MATLAB was Loved that I could read the source code and extend it Saturday, March 17, 12
  • 11. First problem: Efficient Data Input “It’s All About the Data” Reference Counting Essay TableIO http://www.python.org/doc/essays/refcnt/ April 1998 May 1998 Michael A. Miller Guido van Rossum NumPyIO June 1998 Saturday, March 17, 12
  • 12. First problem: Efficient Data Input “It’s All About the Data” Reference Counting Essay TableIO http://www.python.org/doc/essays/refcnt/ April 1998 May 1998 Michael A. Miller Guido van Rossum NumPyIO June 1998 Saturday, March 17, 12
  • 13. Early pieces of SciPy fftw wrappers cephesmodule June 1998 November 1998 stats.py December 1998 Gary Strangman Saturday, March 17, 12
  • 14. 1999 : Early SciPy emerges Discussions on the matrix-sig from 1997 to 1999 wanting a complete data analysis environment: Paul Barrett, Joe Harrington, Perry Greenfield, Paul Dubois, Konrad Hinsen, and others. Activity in 1998, led to increased interest in 1999. In response on 15 Jan, 1999, I posted to matrix-sig a list of routines I felt needed to be present and began wrapping / writing in earnest. On 6 April 1999, I announced I would be creating this uber-package which eventually became SciPy Gaussian quadrature 5 Jan 1999 cephes 1.0 30 Jan 1999 sigtools 0.40 23 Feb 1999 Numeric docs March 1999 cephes 1.1 9 Mar 1999 Plotting?? multipack 0.3 13 Apr 1999 Helper routines 14 Apr 1999 Gist multipack 0.6 (leastsq, ode, fsolve, 29 Apr 1999 XPLOT quad) DISLIN sparse plan described 30 May 1999 Gnuplot multipack 0.7 14 Jun 1999 SparsePy 0.1 cephes 1.2 (vectorize) 5 Nov 1999 29 Dec 1999 Helping with f2py Saturday, March 17, 12
  • 15. Early 2000 : Numeric needs • Memory mapped arrays • Rank-0 arrays or scalars • Handling indirect indexing: a[[10,5,7]] • Handling masked indexing: a[[True, False, False] • More attributes to N-d arrays • “Record arrays” Saturday, March 17, 12
  • 16. Early contributors in 1999 Hosting of first Multipack CVS repository (June 1999) Amazing makefiles Interface to FITPACK Wrote f2py as he watched my brute-force approach (July 1999) Pearu Peterson (IPP) Early IPython interactive environment (27 Apr 1999) Matlab file reader (24 Apr 1999) Janko Hauser Created windows binaries of multipack, cephesmodule, fftw, and signaltools (June 1999 while still in high school!) Robert Kern Saturday, March 17, 12
  • 17. Early contributors in 1999 Hosting of first Multipack CVS repository (June 1999) Amazing makefiles Interface to FITPACK Wrote f2py as he watched my brute-force approach (July 1999) Pearu Peterson (IPP) Early IPython interactive environment (27 Apr 1999) Matlab file reader (24 Apr 1999) Janko Hauser Created windows binaries of multipack, cephesmodule, fftw, and signaltools (June 1999 while still in high school!) Robert Kern Saturday, March 17, 12
  • 22. SciPy 2001 Travis Oliphant optimize sparse interpolate integrate special signal stats Founded in 2001 with Travis Vaught fftpack misc Eric Jones weave cluster Pearu Peterson GA* linalg interpolate f2py Saturday, March 17, 12
  • 23. Finally... Plotting John Hunter 2001 Saturday, March 17, 12
  • 24. IPython (building on IPP) Fernando Perez Dec 10, 2001 Saturday, March 17, 12
  • 25. STSCI leads out with Numarray Perry Greenfield J. Todd Miller Rick White Paul Barrett Saturday, March 17, 12
  • 26. Numarray released in 2003 • too slow for small arrays • incomplete implementation for ufuncs • minimal Numeric code re-use • lots of very nice things, though (e.g. memory maps, fast code for large arrays, better sorting algorithms) Saturday, March 17, 12
  • 27. Split in the community Numeric SciPy Numarray ndimage others Saturday, March 17, 12
  • 29. started January 2005 NumPy Version 1.0 October 2006 Key contributions from: Numarray Numeric Chuck Harris Robert Kern David Cooke Pierre GM Saturday, March 17, 12
  • 30. NumPy in Python Q: When is NumPy going to be part of the core Python? A: Data structure is in Python as PEP 3118 Saturday, March 17, 12
  • 32. Left Academica For Industry Saturday, March 17, 12
  • 33. Community effort many, many others --- forgive me! • Chuck Harris • Pauli Virtanen • David Cournapeau • Stefan van der Walt • Jarrod Millman • Josef Perktold • Anne Archibald • Dag Sverre Seljebotn • Robert Kern • Matthew Brett • Warren Weckesser • Ralf Gommers • Joe Harrington --- Documentation effort • Andrew Straw --- www.scipy.org Saturday, March 17, 12
  • 34. NumPy Array shape Saturday, March 17, 12
  • 35. NumPy Essentials • Data: the array object – slicing – shapes and strides – data-type generality • Fast Math: – vectorization – broadcasting – aggregations Saturday, March 17, 12
  • 36. Zen of NumPy • strided is better than scattered • contiguous is better than strided • descriptive is better than imperative • array-oriented is better than object-oriented • broadcasting is a great idea • vectorized is better than an explicit loop • unless it’s too complicated --- then use Cython / weave • think in higher dimensions Saturday, March 17, 12
  • 38. Experiences Conulsting • Basically, I was a consultant and manager of consultants for 4 years. • It kept the lights on at home, but gave me minimal time for NumPy. • Silver lining was that I learned exactly what “big-data” problems big companies have and saw first-hand that numerous “little” improvements need to be made to NumPy which will allow it to have a larger impact on helping to manage the world’s data. Saturday, March 17, 12
  • 39. Putting Science back in CS • Much of the software stack is for systems programming --- C++, Java, .NET, ObjC, web - Complex numbers? - Vectorization primitives? • Array-based programming has been supplanted by Object-oriented programming • Software stack for scientists is not as helpful as it should be • Fortran is still where many scientists end up Saturday, March 17, 12
  • 40. APL : the first array-oriented language • Appeared in 1964 • Originated by Ken Iverson • Direct descendants (J, K, Matlab) are still used heavily and people pay a lot of money for them APL • NumPy is a descendent J K Matlab Numeric NumPy Saturday, March 17, 12
  • 41. Conway’s game of Life • Dead cell with exactly 3 live neighbors will come to life • A live cell with 2 or 3 neighbors will survive • With too few or too many neighbors, the cell dies Saturday, March 17, 12
  • 43. Conway’s Game of Life APL NumPy Initialization Update Step Saturday, March 17, 12
  • 44. Improvements needed • NDArray improvements • Indexes (esp. for Structured arrays) • SQL front-end • Multi-level, hierarchical labels • selection via mappings (labeled arrays) • Memory spaces (array made up of regions) • Distributed arrays (global array) • Compressed arrays • Standard distributed persistance • fancy indexing as view and optimizations • streaming arrays Saturday, March 17, 12
  • 45. Improvements needed • Dtype improvements • Enumerated types (including dynamic enumeration) • Derived fields • Specification as a class (or JSON) • Pointer dtype (i.e. C++ object, or varchar) • Finishing datetime • Missing data with both bit-patterns and mask • Parameterized field names Saturday, March 17, 12
  • 46. Example of Object Dtype @np.dtype class Stock(np.DType): symbol = np.Str(4) open = np.Int(2) close = np.Int(2) high = np.Int(2) low = np.Int(2) @np.Int(2) def mid(self): return (self.high + self.low) / 2.0 Saturday, March 17, 12
  • 47. Improvements needed • Ufunc improvements • Generalized ufuncs support more than just contiguous arrays • Specification of ufuncs in Python • Move most dtype “array functions” to ufuncs • Unify error-handling for all computations • Allow lazy-evaluation and remote computation --- streaming and generator data • Structured and string dtype ufuncs • Multi-core and GPU optimized ufuncs • Group-by reduction Saturday, March 17, 12
  • 48. Improvements needed • Miscellaneous improvements • ABI-management • Eventual Move to C++ library (NDLib) • NDLib could serve as base for Javascript and other high-level languages • Integration with LLVM • Possible dtype / shape / stride unification into a “dimension protocol” • Remote computation • Fast I/O for CSV and Excel Saturday, March 17, 12
  • 49. NumPy Users • Want to be able to write Python to get fast code that works on arrays and scalars • Need access to a boat-load of C-extensions (NumPy is just the beginning) PyPy doesn’t cut it for us! Saturday, March 17, 12
  • 50. Ufuncs Saturday, March 17, 12 Generalized UFuncs Python Function Window Kernel Funcs Function- based Indexing Memory Dynamic compilation Filters Dynamic Compilation NumPy Runtime I/O Filters Reduction Filters Computed Columns function pointer
  • 51. SciPy needs a Python compiler optimize integrate special ode writing more of SciPy at high-level Saturday, March 17, 12
  • 52. Numba -- a Python compiler • Replays byte-code on a stack with simple type- inference • Translates to LLVM (using LLVM-py) • Uses LLVM for code-gen • Resulting C-level function-pointer can be inserted into NumPy run-time • Understands NumPy arrays • Is NumPy / SciPy aware Saturday, March 17, 12
  • 53. NumPy + Mamba = Numba Python Function Machine Code LLVM-PY LLVM 3.1 ISPC OpenCL OpenMP CUDA CLANG Intel AMD Nvidia Apple Saturday, March 17, 12
  • 56. Software Stack Future? Plateaus of Code re-use + DSLs SQL R TDPL Matlab Python OBJC C FORTRAN C++ LLVM Saturday, March 17, 12
  • 57. Seeking Developers! https://github.com/ContinuumIO/numba Saturday, March 17, 12
  • 58. How to pay for all this? Saturday, March 17, 12
  • 59. Dual strategy NumPy 2.0 Saturday, March 17, 12
  • 60. NumFOCUS Num(Py) Foundation for Open Code for Usable Science Saturday, March 17, 12
  • 61. NumFOCUS • Mission • To initiate and support educational programs furthering the use of open source software in science. • To promote the use of high-level languages and open source in science, engineering, and math research • To encourage reproducible scientific research Saturday, March 17, 12
  • 62. NumFOCUS • Activites • Sponsor sprints and conferences • Provide scholarships and grants • Pay for documentation development and basic course development • Work with domain-specific organizations • Raise funds from industries using Python and NumPy Saturday, March 17, 12
  • 63. NumFOCUS • Directors • Jarrod Millman • Fernando Perez • Travis Oliphant • Perry Greenfield • John Hunter • Members • Basically people who donate for now. In time, a body that elects directors. Saturday, March 17, 12