Julia - Easier, Better, Faster, Stronger


Published in: Technology, Education

  1. Easier, Better, Faster, Stronger Kenta Sato July 02, 2014 1 / 30
  2. Agenda
     1. The Julia Language
     2. Easier
        - Familiar Syntax
        - Just-In-Time Compiler
     3. Better
        - Types for Technical Computing
        - Library Support
        - Type System
     4. Faster
        - Benchmark
        - N Queens Puzzle
     5. Stronger
        - Multiple Dispatch
        - Macros
  3. Notations
     Here I use the following special notation in examples.
     <expression> #> <value>: The <expression> is evaluated to the <value>.
     <expression> #: <output>: When the <expression> is evaluated, it prints the <output> to the screen.
     <expression> #! <error>: When the <expression> is evaluated, it throws the <error>.
     Examples:
         42                       #> 42
         2 + 3                    #> 5
         "hello, world"           #> "hello, world"
         println("hello, world")  #: hello, world
         42 + "hello, world"      #! ERROR: no method +(Int64, ASCIIString)
  4. The Julia Language
     "Julia is a high-level, high-performance dynamic programming language for technical computing, with syntax that is familiar to users of other technical computing environments. It provides a sophisticated compiler, distributed parallel execution, numerical accuracy, and an extensive mathematical function library. The core of the Julia implementation is licensed under the MIT license. Various libraries used by the Julia environment include their own licenses such as the GPL, LGPL, and BSD (therefore the environment, which consists of the language, user interfaces, and libraries, is under the GPL)."
  5. Easier - Familiar Syntax
     At a glance, the syntax of Julia will feel familiar.
     - The usage of for, while, and if is very close to that of Ruby or Python.
     - continue, break, and return work as you expect.
     - Defining a function is also straightforward: the function name is followed by its arguments.
     - You can specify the types of arguments, but doing so is optional.
     - @inbounds is a macro; macro names always start with the @ character.
         function sort!(v::AbstractVector, lo::Int, hi::Int, ::InsertionSortAlg, o::Ordering)
             @inbounds for i in lo+1:hi
                 j = i
                 x = v[i]
                 while j > lo
                     if lt(o, x, v[j-1])
                         v[j] = v[j-1]
                         j -= 1
                         continue
                     end
                     break
                 end
                 v[j] = x
             end
             return v
         end
     (from base/sort.jl)
  6. Easier - Familiar Syntax
     n:m creates a range that is inclusive on both sides. Python's range(n, m) includes the left endpoint but excludes the right one, which is often confusing.
     [... for x in xs] creates an array from xs, which can be anything iterable. This notation is known as a list comprehension in Python and Haskell.
         4:8                  #> 4:8
         [x for x in 4:8]     #> [4,5,6,7,8]
         [4:8]                #> [4,5,6,7,8]
         [x*2 for x in 4:8]   #> [8,10,12,14,16]
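The range and comprehension behavior above can be checked with a short sketch. It assumes a current Julia 1.x release, where (unlike the 0.3 release used in the slides) `[4:8]` no longer expands the range and `collect` is used instead; the variable names are illustrative only.

```julia
# Ranges are inclusive on both ends.
r = 4:8
println(first(r), " ", last(r), " ", length(r))

# collect materializes a range into an array
# (Julia 1.x replacement for the 0.3-era [4:8]).
xs = collect(r)

# A comprehension builds an array from any iterable.
doubled = [x * 2 for x in r]

println(xs)
println(doubled)
```

A range stores only its endpoints (and step), so it is cheap to create even when it covers many values; `collect` is needed only when an actual array is required.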
  7. Easier - Familiar Syntax
     - Array indices always start at 1, not 0. When you allocate an array of size n, all indices in 1:n are accessible.
     - You can use a range to copy part of an array.
     - The step of a range is placed between the start and the stop (i.e. start:step:stop).
     - You can also specify a negative step, which creates a reversed range.
     - There is a special index, end, that denotes the last index of an array.
         xs = [8, 6, 4, 2, 0]
         xs[1:3]       #> [8,6,4]
         xs[4:end]     #> [2,0]
         xs[1:2:end]   #> [8,4,0]
         xs[end:-2:1]  #> [0,4,8]
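The slide says a range index copies part of an array. A minimal sketch of what that means in practice, assuming a current Julia 1.x release (where `view` is available as the non-copying alternative); the variable names are illustrative:

```julia
xs = [8, 6, 4, 2, 0]

# Indexing with a range makes a copy: changing the copy leaves xs alone.
part = xs[1:3]
part[1] = 100
println(xs[1])

# A view shares memory with the parent array instead of copying.
window = view(xs, 1:3)
window[1] = 100
println(xs[1])

# Stepped and reversed ranges work the same way as on the slide.
every_other = xs[1:2:end]
reversed = xs[end:-2:1]
```

Copies are the safer default; a view is worth considering when the slice is large and only read or updated in place.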
  8. Easier - Just-In-Time Compiler
     To run a program written in Julia, there is no need to compile it beforehand. You only have to give the entry point file to Julia's JIT (Just-In-Time) compiler:
         % cat myprogram.jl
         n = 10
         xs = [1:n]
         println("the total between 1 and $n is $(sum(xs))")
         % julia myprogram.jl
         the total between 1 and 10 is 55
     From version 0.3, the standard libraries are precompiled when you build Julia, which greatly reduces the startup time of your program.
         % time julia myprogram.jl
         the total between 1 and 10 is 55
                 0.80 real         0.43 user         0.10 sys
  9. Better - Types for Technical Computing
     Julia supports various numerical types with different sizes.
     Integer types:
         Type     Signed?  Number of bits  Smallest value  Largest value
         Int8     ✓        8               -2^7            2^7 - 1
         Uint8             8               0               2^8 - 1
         Int16    ✓        16              -2^15           2^15 - 1
         Uint16            16              0               2^16 - 1
         Int32    ✓        32              -2^31           2^31 - 1
         Uint32            32              0               2^32 - 1
         Int64    ✓        64              -2^63           2^63 - 1
         Uint64            64              0               2^64 - 1
         Int128   ✓        128             -2^127          2^127 - 1
         Uint128           128             0               2^128 - 1
         Bool     N/A      8               false (0)       true (1)
         Char     N/A      32              '\0'            '\Uffffffff'
  10. Better - Types for Technical Computing
      Floating-point types:
          Type     Precision  Number of bits
          Float16  half       16
          Float32  single     32
          Float64  double     64
      Examples:
          10000           #> 10000
          typeof(10000)   #> Int64
          0x12            #> 0x12
          typeof(0x12)    #> Uint8
          0x123           #> 0x0123
          typeof(0x123)   #> Uint16
          1.2             #> 1.2
          typeof(1.2)     #> Float64
          1.2e-10         #> 1.2e-10
      Complex numbers and rational numbers are also available:
          1 + 2im   # 1 + 2i
          6//9      # 2/3
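The literal typing rules above can be explored with a short sketch. It assumes a current Julia 1.x release on a 64-bit machine, where the unsigned types are spelled `UInt8`, `UInt16`, and so on (the slides use the 0.3-era `Uint8` spelling); the variable names are illustrative.

```julia
# The type of a hexadecimal literal depends on how many digits are written.
small = 0x12      # two hex digits fit in 8 bits
wide = 0x123      # three hex digits need 16 bits
println(typeof(small), " ", typeof(wide))

# Each fixed-size integer type has a known range.
println(typemin(Int8), " ", typemax(Int8))

# Fixed-size integer arithmetic wraps around on overflow.
wrapped = typemax(Int64) + 1
println(wrapped == typemin(Int64))

# Rationals are reduced to lowest terms; complex arithmetic works as usual.
ratio = 6 // 9
product = (1 + 2im) * (1 - 2im)
println(ratio, " ", product)
```

The wrap-around in the third step is the reason the arbitrary-precision types exist for values that may exceed 64 bits.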
  11. Better - Types for Technical Computing
      If you need more precise values, arbitrary-precision arithmetic is supported. Two data types offer it:
      - BigInt: arbitrary-precision integers
      - BigFloat: arbitrary-precision floating-point numbers
          big_prime = BigInt("5052785737795758503064406447721934417290878968063369478337")
          typeof(big_prime)   #> BigInt
          precise_pi = BigFloat("3.14159265358979323846264338327950288419716939937510582097")
          typeof(precise_pi)  #> BigFloat
      If you need customized types, you can define a new type. User-defined types are instantiated by calling functions that share the type's name, called constructors:
          type Point
              x::Float64
              y::Float64
          end
          # Point is the constructor.
          p1 = Point(1.2, 3.4)
          p2 = Point(0.2, -3.1)
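A minimal sketch of arbitrary-precision values and a user-defined type, assuming a current Julia 1.x release. In 1.x the `type` keyword from the slide is spelled `struct`, and `parse(BigInt, ...)` is used here to build a big integer from a string; the variable names are illustrative.

```julia
# Parse a big integer from its decimal string.
big_prime = parse(BigInt, "5052785737795758503064406447721934417290878968063369478337")
println(typeof(big_prime))

# Int64 arithmetic wraps around, BigInt arithmetic does not.
println(2^64)        # wraps in Int64
println(big(2)^64)   # exact

# A user-defined immutable type with two Float64 fields.
struct Point
    x::Float64
    y::Float64
end

# The type name acts as the constructor; arguments are converted to the field types.
p = Point(1, 2)
println(p.x, " ", p.y)
```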
  12. Better - Library Support
      Julia bundles various libraries. They are incorporated into the standard library, so you rarely need to know the details of the underlying APIs.
      Numerical computing:
      - OpenBLAS: basic linear algebra subprograms
      - LAPACK: linear algebra routines for solving systems
      - Intel® Math Kernel Library (optional): fast math library for Intel processors
      - SuiteSparse: linear algebra routines for sparse matrices
      - ARPACK: subroutines designed to solve large-scale eigenvalue problems
      - FFTW: library for computing discrete Fourier transforms
      Other tools:
      - PCRE: Perl-compatible regular expressions library
      - libuv: asynchronous I/O library
  13. Better - Library Support
      Here are some functions of the linear algebra library.
          a = randn((50, 1000))    # 50x1000 matrix
          b = randn((50, 1000))    # 50x1000 matrix
          x = randn((1000, 1000))  # 1000x1000 matrix
          # dot product
          dot(vec(a), vec(b))
          # matrix multiplication
          a * x
          # LU factorization
          lu(x)
          # eigenvalues and eigenvectors
          eig(x)
      The vec function converts a multi-dimensional array into a vector without copying.
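A small sketch of the same operations on matrices small enough to check by hand. It assumes a current Julia 1.x release, where these functions live in the `LinearAlgebra` standard library and `eig` has been replaced by `eigen`; the matrices and variable names are illustrative.

```julia
using LinearAlgebra

a = [1.0 2.0; 3.0 4.0]
b = [5.0 6.0; 7.0 8.0]

# vec flattens in column-major order, so this is 1*5 + 3*7 + 2*6 + 4*8.
d = dot(vec(a), vec(b))

# Matrix multiplication.
p = a * b

# LU factorization with row pivoting: L * U reproduces the permuted rows of a.
F = lu(a)
lu_ok = isapprox(F.L * F.U, a[F.p, :])

# Eigenvalues of a symmetric matrix are real and returned in ascending order.
E = eigen(Symmetric([2.0 1.0; 1.0 2.0]))
println(E.values)

# vec shares memory with the original array: writing through v changes a.
v = vec(a)
v[1] = 10.0
println(a[1, 1])
```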
  14. Better - Type System
      Julia's type system is dynamically checked: type safety is verified at runtime. Each value has a concrete type, and that type is not implicitly converted to another type at runtime. As a rule of thumb, types must be converted explicitly. There are two notable exceptions: arithmetic operators and constructors.
          x = 12
          typeof(x)  #> Int64
          y = 12.0
          typeof(y)  #> Float64
          # this function only accepts an Int64 argument
          function foo(x::Int64)
              println("the value is $x")
          end
          foo(x)  #: the value is 12
          foo(y)  #! ERROR: no method foo(Float64)
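The "convert explicitly" rule can be illustrated with a short sketch, assuming a current Julia 1.x release. The function `foo` mirrors the one on the slide but returns a string so the result is easy to inspect; the other names are illustrative.

```julia
# Same idea as the slide: a method that only accepts Int64.
foo(x::Int64) = "the value is $x"

y = 12.0

# Calling foo with a Float64 fails: no implicit conversion happens.
rejected = try
    foo(y)
    false
catch err
    err isa MethodError
end

# An explicit conversion works when the value is exactly representable...
converted = foo(convert(Int64, y))

# ...and fails loudly when it is not, rather than silently truncating.
inexact = try
    convert(Int64, 12.5)
    false
catch err
    err isa InexactError
end

println(rejected, " ", converted, " ", inexact)
```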
  15. Better - Type System
      Arithmetic operators are functions in Julia. For example, addition of Float64 values is defined as +(x::Float64, y::Float64) at float.jl:125. But you can also apply these operators to values of different types. This automatic type conversion is called promotion, and it is defined by the promote_rule function.
          x = 12
          y = 12.0
          x + y  #> 24.0
          x - y  #> 0.0
          x * y  #> 144.0
          x / y  #> 1.0
      The promotion rule is defined as:
          promote_rule(::Type{Float64}, ::Type{Int64}) = Float64
      Constructors also do type conversion implicitly.
          type Point
              x::Float64
              y::Float64
          end
          Point(x, y)  #> Point(12.0,12.0)
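Promotion can be observed directly with the `promote` and `promote_type` functions. A minimal sketch, assuming a current Julia 1.x release on a 64-bit machine; the variable names are illustrative.

```julia
x = 12
y = 12.0

# promote converts all arguments to their common type.
promoted = promote(x, y)

# promote_type reports the common type without converting anything.
common = promote_type(Int64, Float64)

# Mixed-type arithmetic promotes first, then calls the same-type method.
total = x + y

# Division of two integers yields a Float64.
quotient = 12 / 12

println(promoted, " ", common, " ", total, " ", quotient)
```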
  16. Better - Type System
      Types can be parameterized by other types or by values. These are called type parameters. For example, an array has two type parameters: the element type and the number of dimensions. The Array{T,D} type is a D-dimensional array whose elements are of type T.
          typeof([1, 2, 3])                #> Array{Int64,1}
          typeof([1.0, 2.0, 3.0])          #> Array{Float64,1}
          typeof(["one", "two", "three"])  #> Array{ASCIIString,1}
          typeof([1 2; 3 4])               #> Array{Int64,2}
      Julia allows you to define parameterized types as follows:
          type Point{T}
              x::T
              y::T
          end
          Point{Int}(1, 2)          #> Point{Int64}(1,2)
          Point{Float64}(4.2, 2.1)  #> Point{Float64}(4.2,2.1)
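A sketch of a parameterized type in use, assuming a current Julia 1.x release on a 64-bit machine (so `type` becomes `struct`). It shows that the parameter can be inferred from the arguments, given explicitly, or rejected when the fields disagree.

```julia
struct Point{T}
    x::T
    y::T
end

# T is inferred from the arguments.
p_int = Point(1, 2)
println(typeof(p_int))

# T can be given explicitly; the arguments are then converted to it.
p_float = Point{Float64}(4, 2)
println(p_float.x, " ", p_float.y)

# Both fields share one T, so mixing Int and Float64 without
# an explicit parameter has no matching constructor.
mixed_rejected = try
    Point(1, 2.0)
    false
catch err
    err isa MethodError
end
println(mixed_rejected)

# Arrays carry their element type and dimensionality as parameters.
matrix_type = typeof([1 2; 3 4])
println(matrix_type == Array{Int64,2})
```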
  17. Faster - Benchmark
      The performance of Julia is comparable to that of compiled languages like C and Fortran, and much better than that of interpreted languages.
      [Figure: benchmark times relative to C (smaller is better, C performance = 1.0), plotted on a logarithmic scale for Fortran, Go, JavaScript, Julia, Mathematica, Matlab, Octave, Python, and R on the fib, mandel, pi_sum, rand_mat_mul, rand_mat_stat, printfd, quicksort, and parse_int benchmarks.]
  18. Faster - Benchmark
      The performance of Julia is comparable to that of compiled languages like C and Fortran, and much better than that of interpreted languages.
      Table: benchmark times relative to C (smaller is better, C performance = 1.0).
                         Fortran     Julia       Python      R           Matlab      Octave      Mathematica JavaScript  Go
                         gcc 4.8.1   0.2         2.7.3       3.0.2       R2012a      3.6.4       8.0         V8          go1
          fib            0.26        0.91        30.37       411.36      1992.00     3211.81     64.46       2.18        1.03
          parse_int      5.03        1.60        13.95       59.40       1463.16     7109.85     29.54       2.43        4.79
          quicksort      1.11        1.14        31.98       524.29      101.84      1132.04     35.74       3.51        1.25
          mandel         0.86        0.85        14.19       106.97      64.58       316.95      6.07        3.49        2.36
          pi_sum         0.80        1.00        16.33       15.42       1.29        237.41      1.32        0.84        1.41
          rand_mat_stat  0.64        1.66        13.52       10.84       6.61        14.98       4.52        3.28        8.12
          rand_mat_mul   0.96        1.01        3.41        3.98        1.10        3.41        1.16        14.60       8.51
      C compiled by gcc 4.8.1, taking the best timing from all optimization levels (-O0 through -O3). C, Fortran and Julia use OpenBLAS v0.2.8. The Python implementations of rand_mat_stat and rand_mat_mul use NumPy (v1.6.1) functions; the rest are pure Python implementations.
  19. Faster - N Queens Puzzle
      Place N queens on an N × N chessboard so that no two queens attack each other, and return the number of such placements.
      [Figure: some of the solutions for N = 8. Weisstein, Eric W. "Queens Problem." From MathWorld--A Wolfram Web Resource.]
  20. Faster - N Queens Puzzle
      As N gets bigger, the number of solutions grows drastically, and it may take a long time to get the answer when N is sufficiently large. The algorithm relies heavily on arithmetic, iteration, recursive function calls, and branching, so this puzzle is well suited to testing the efficiency of a programming language.
      The number of solutions of the N queens puzzle:
          N    Solutions
          4    2
          5    10
          6    4
          7    40
          8    92
          9    352
          10   724
          11   2,680
          12   14,200
          13   73,712
          14   365,596
          15   2,279,184
  21. Faster - N Queens Puzzle
      Program in Julia.
      - solve(n::Int): put n queens on a board, then return the number of solutions.
      - search(places, i, n): put a queen on the i-th row.
      - isok(places, i, j): check whether you can put a queen at (i, j).
      This algorithm is not optimal (you could exploit the symmetry of the positions), but it is enough to time the speed of Julia.
      In isok, you could iterate over enumerate(places) instead, but that killed the performance of the code.
      Julia:
          function solve(n::Int)
              places = zeros(Int, n)
              search(places, 1, n)
          end

          function search(places, i, n)
              if i == n + 1
                  return 1
              end
              s = 0
              @inbounds for j in 1:n
                  if isok(places, i, j)
                      places[i] = j
                      s += search(places, i + 1, n)
                  end
              end
              s
          end

          function isok(places, i, j)
              qi = 1
              @inbounds for qj in places
                  if qi == i
                      break
                  elseif qj == j || abs(qi - i) == abs(qj - j)
                      return false
                  end
                  qi += 1
              end
              true
          end
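The program above can be run against the known solution counts for small boards. The sketch below is a self-contained variant, assuming a current Julia 1.x release; it keeps the slide's function names and algorithm, but `isok` is rewritten to loop over the row indices `1:i-1` directly, which is my simplification rather than the author's code.

```julia
# Can a queen go at row i, column j, given the queens already placed in rows 1..i-1?
function isok(places, i, j)
    for qi in 1:i-1
        qj = places[qi]
        # Same column, or same diagonal (equal row and column distance).
        if qj == j || abs(qi - i) == abs(qj - j)
            return false
        end
    end
    return true
end

# Count the ways to fill rows i..n, given the placements in rows 1..i-1.
function search(places, i, n)
    i == n + 1 && return 1
    s = 0
    for j in 1:n
        if isok(places, i, j)
            places[i] = j
            s += search(places, i + 1, n)
        end
    end
    return s
end

# places[i] holds the column of the queen in row i.
solve(n::Int) = search(zeros(Int, n), 1, n)

for n in 4:8
    println("N = ", n, ": ", solve(n))
end
```

Rows below the current one are never cleared when backtracking; this is safe because `isok` only inspects rows above `i`.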
  22. Faster - N Queens Puzzle
      Python and C++ are the competitors in our benchmark.
      Python:
          def solve(n):
              places = [-1] * n
              return search(places, 0, n)

          def search(places, i, n):
              if i == n:
                  return 1
              s = 0
              for j in range(n):
                  if isok(places, i, j):
                      places[i] = j
                      s += search(places, i + 1, n)
              return s

          def isok(places, i, j):
              for qi, qj in enumerate(places):
                  if qi == i:
                      break
                  elif qj == j or abs(qi - i) == abs(qj - j):
                      return False
              return True
      C++:
          int solve(int n)
          {
              std::vector<int> places(n, -1);
              return search(places, 0, n);
          }

          int search(std::vector<int>& places, int i, int n)
          {
              if (i == n)
                  return 1;
              int s = 0;
              for (int j = 0; j < n; j++) {
                  if (isok(places, i, j)) {
                      places[i] = j;
                      s += search(places, i + 1, n);
                  }
              }
              return s;
          }

          bool isok(const std::vector<int>& places, int i, int j)
          {
              int qi = 0;
              for (int qj : places) {
                  if (qi == i)
                      break;
                  else if (qj == j || abs(qi - i) == abs(qj - j))
                      return false;
                  qi++;
              }
              return true;
          }
  23. Faster - N Queens Puzzle
      I measured the total time to get the answers for N = 4, 5, ..., 14.
      Julia - v0.3 (commit: da158df6b5b7f918989a73317a799c909d639e5f)
          % time julia.jl eightqueen.jl 14 > /dev/null
                 10.05 real         9.89 user         0.11 sys
      Python - v3.4.1
          % time python3 eightqueen.py 14 > /dev/null
               1283.34 real      1255.18 user         2.67 sys
      C++ - v503.0.40
          % clang++ -O3 --std=c++11 --stdlib=libc++ eightqueen.cpp
          % time ./a.out 14 > /dev/null
                  8.24 real         8.17 user         0.01 sys
  24. Faster - N Queens Puzzle
      And N = 15.
      Julia
          % time julia.jl eightqueen.jl 15 > /dev/null
                 64.75 real        63.73 user         0.17 sys
      C++
          % time ./a.out 15 > /dev/null
                 54.31 real        53.89 user         0.05 sys
      Note that the Julia result includes JIT compilation time, whereas the C++ program was compiled before execution.
      Python was not measured for N = 15 because it took too long.
      Platform Info:
          System: Darwin (x86_64-apple-darwin13.2.0)
          CPU: Intel(R) Core(TM) i5-2435M CPU @ 2.40GHz
  25. Stronger - Multiple Dispatch
      - We often want to use a single function name to handle different types.
        Adding floats and adding integers are completely different procedures, but we want to use the + operator in both cases.
      - Leaving some parameters optional is useful.
        maximum(A, dims) computes the maximum value of an array A over the given dimensions.
        maximum(A) computes the maximum value of an array A, ignoring dimensions.
      - A unified API reduces what you have to remember.
        fit(model, x, y) trains model based on the input x and the output y.
        The model may be a Generalized Linear Model, Lasso, Random Forest, SVM, and so on.
      Julia satisfies these demands using multiple dispatch: a method is selected according to the number of arguments and their types.
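The `maximum` example can be tried on a small matrix. This sketch assumes a current Julia 1.x release, where the dimensions are passed as the keyword argument `dims` rather than positionally as on the slide; the matrix and variable names are illustrative.

```julia
A = [1 5 3;
     4 2 6]

# One name, several call forms.
overall = maximum(A)              # over all elements
per_column = maximum(A, dims=1)   # 1x3 matrix: maximum of each column
per_row = maximum(A, dims=2)      # 2x1 matrix: maximum of each row

println(overall)
println(per_column)
println(vec(per_row))
```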
  26. Stronger - Multiple Dispatch
      When the foo function is called, one of the following methods is selected based on the number of arguments.
          function foo()
              println("foo 0:")
          end
          function foo(x)
              println("foo 1: $x")
          end
          function foo(x, y)
              println("foo 2: $x $y")
          end
          foo()          #: foo 0:
          foo(100)       #: foo 1: 100
          foo(100, 200)  #: foo 2: 100 200
  27. Stronger - Multiple Dispatch
      Multiple dispatch also takes the types of the arguments into account: the method whose type signature matches the argument values is selected.
          function foo(x::Int, y::Int)
              println("foo Int Int: $x $y")
          end
          function foo(x::Float64, y::Float64)
              println("foo Float64 Float64: $x $y")
          end
          function foo(x::Int, y::Float64)
              println("foo Int Float64: $x $y")
          end
          foo(1, 2)      #: foo Int Int: 1 2
          foo(1.0, 2.0)  #: foo Float64 Float64: 1.0 2.0
          foo(1, 2.0)    #: foo Int Float64: 1 2.0
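Dispatch also works with abstract types, in which case the most specific matching method wins. A minimal sketch, assuming a current Julia 1.x release; the function `kind` is a hypothetical example, not part of any library.

```julia
# Three methods of one function, from most to least specific.
kind(x::Integer, y::Integer) = "both integers"
kind(x::Number, y::Number) = "two numbers"
kind(x, y) = "anything"

# The types of *all* arguments decide which method runs.
println(kind(1, 2))
println(kind(1, 2.0))
println(kind(1, "two"))

# methods lists every method defined for a function.
println(length(methods(kind)))
```

Because the fallback `kind(x, y)` accepts any pair of values, no call with two arguments can fail with a missing-method error.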
  28. Stronger - Macros
      Macros allow you to inspect or modify your code from within Julia itself. In the following example, the assert macro receives the given expression (x > 0), then evaluates the expression in place. When the result is false, it throws an assertion error. Note that the error message contains the captured expression (x > 0) that evaluated to false; this information is useful for debugging.
          x = -5
          @assert x > 0  #! ERROR: assertion failed: x > 0
      Instead of relying on the expression, you can specify an error message:
          x = -5
          @assert x > 0 "x must be positive"  #! ERROR: assertion failed: x must be positive
  29. Stronger - Macros
      The assert macro is defined as follows in the standard library. The macro is called with an expression (ex) and zero or more messages (msgs...). If no message is given, the expression itself becomes the error message (msg). Then the error message is constructed. Finally, the assertion code is spliced into the calling place.
          macro assert(ex, msgs...)
              msg = isempty(msgs) ? ex : msgs[1]
              if !isempty(msgs) && isa(msg, Expr)
                  # message is an expression needing evaluating
                  msg = :(string("assertion failed: ", $(esc(msg))))
              elseif isdefined(Base, :string)
                  msg = string("assertion failed: ", msg)
              else
                  # string() might not be defined during bootstrap
                  msg = :(string("assertion failed: ", $(Expr(:quote, msg))))
              end
              :($(esc(ex)) ? $(nothing) : error($msg))
          end
      (from base/error.jl)
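The core idea (capture an expression, turn it into a message, splice a check into the caller) fits in a few lines. The sketch below assumes a current Julia 1.x release; `@check` is a hypothetical, simplified macro written for illustration, not the standard library's `@assert`.

```julia
# A stripped-down assertion macro: no custom messages, no bootstrap handling.
macro check(ex)
    # At expansion time, ex is the unevaluated expression, e.g. :(x > 0).
    msg = string("check failed: ", ex)
    # esc(ex) makes the expression refer to the caller's variables.
    return :($(esc(ex)) ? nothing : error($msg))
end

x = -5

# The failing expression itself shows up in the error message.
result = try
    @check x > 0
    "passed"
catch err
    err.msg
end
println(result)

# A passing check evaluates to nothing.
ok = @check x < 0
println(ok === nothing)
```

The message string is built once, when the macro is expanded, so a passing check costs only the evaluation of the condition.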
  30. Future - :)