It's The Memory, Stupid!
or:
How I Learned to Stop Worrying about CPU Speed
and Love Memory Access

Francesc Alted
Software Architect

Big Data Spain 2012, Madrid (Spain)
November 16, 2012
About Continuum Analytics
• Develop new ways to store, compute, and
  visualize data.
• Provide open technologies for Data
  Integration on a massive scale.
• Provide software tools, training, and
  integration/consulting services to
  corporate, government, and educational
  clients worldwide.
Overview

• The Era of ‘Big Data’
• A few words about Python and NumPy
• The Starving CPU problem
• Choosing optimal containers for Big Data
“A wind of streaming data, social data
        and unstructured data is knocking at
      the door, and we're starting to let it in.
           It's a scary place at the moment.”

          -- Unidentified bank IT executive, as
            quoted by “The American Banker”




The Dawn of ‘Big Data’
Challenges

• We have to deal with as much data as
  possible by using limited resources


• So, we must use our computational
  resources optimally to be able to get the
  most out of Big Data
Interactivity and Big Data

• Interactivity is crucial for handling data

• Interactivity and performance are crucial
  for handling Big Data
Python and ‘Big Data’
• Python is an interpreted language and hence
   offers interactivity
• Myth: “Python is slow, so why on earth would
   you use it for Big Data?”
• Answer: Python has access to an incredibly
   powerful range of libraries that boost its
   performance far beyond your expectations
• ...and during this talk I will prove it!
NumPy: A ‘De Facto’ Standard Container
Operating with NumPy
• array[2]; array[1,1:5, :]; array[[3,6,10]]
• (array1**3 / array2) - sin(array3)
• numpy.dot(array1, array2): access to
  optimized BLAS (*GEMM) functions
• and much more...
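The operations listed on this slide can be tried out with a small sketch like the following. The array names, shapes and values are made up for illustration; only NumPy itself is assumed.

```python
import numpy as np

# Illustrative array (name and shape are assumptions, not from the talk)
array = np.arange(11 * 6 * 4, dtype=np.float64).reshape(11, 6, 4)

elem = array[2]                 # index along the first axis: shape (6, 4)
block = array[1, 1:5, :]        # slicing: rows 1..4 of array[1], shape (4, 4)
picked = array[[3, 6, 10]]      # fancy indexing with a list: shape (3, 6, 4)

array1 = np.linspace(1.0, 2.0, 5)
array2 = np.full(5, 2.0)
array3 = np.zeros(5)

# Element-wise arithmetic and ufuncs operate on whole arrays at once
result = (array1**3 / array2) - np.sin(array3)

# Matrix product; for float arrays NumPy typically hands this to BLAS (*GEMM)
m1 = np.ones((3, 4))
m2 = np.ones((4, 2))
product = np.dot(m1, m2)
```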
Nothing Is Perfect

• NumPy is just great for many use cases
• However, it also has its own deficiencies:
  •   It follows the Python evaluation order in complex
      expressions like (a * b) + c, so each intermediate
      result is stored in a temporary array

  •   It does not take advantage of multiple cores
      (except for BLAS computations)
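To make the first deficiency concrete: NumPy evaluates `(a * b) + c` one operator at a time, so it first materializes the full-size result of `a * b` and then walks memory again to add `c`. The sketch below (array names and sizes are arbitrary choices) spells out that hidden temporary and shows one way to avoid the extra allocation with the `out=` argument of NumPy ufuncs.

```python
import numpy as np

n = 1_000_000
a = np.random.rand(n)
b = np.random.rand(n)
c = np.random.rand(n)

# What NumPy effectively does for (a * b) + c: two passes, one temporary
tmp = a * b          # full-size temporary array (n * 8 bytes)
r1 = tmp + c         # second pass over memory, second allocation

# Same result, reusing a single output buffer
r2 = np.empty_like(a)
np.multiply(a, b, out=r2)   # r2 = a * b
np.add(r2, c, out=r2)       # r2 = r2 + c, in place
```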
Numexpr: Dealing with Complex Expressions
• It comes with a specialized virtual machine
  for evaluating expressions
• It accelerates computations mainly by
  using memory more efficiently
• It supports multithreading that is extremely
  easy to use (active by default)
Exercise (I)
Evaluate the following polynomial:
      0.25x^3 + 0.75x^2 + 1.5x - 2
in the range [-1, 1] with a step size of 2*10^-7,
using both NumPy and numexpr.
Note: use a single thread for numexpr:
      numexpr.set_num_threads(1)
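One possible way to set up the exercise is sketched below. It assumes the `numexpr` package is installed (the import is guarded so the NumPy part still runs without it). The variable names and the use of `timeit` are choices made here, not part of the original exercise. A step of 2*10^-7 over [-1, 1] corresponds to about 10 million points.

```python
import timeit
import numpy as np

try:
    import numexpr as ne
except ImportError:          # numexpr is optional for this sketch
    ne = None

N = 10_000_000               # (1 - (-1)) / 2e-7 points
x = np.linspace(-1, 1, N)

def poly_numpy(x):
    return 0.25*x**3 + 0.75*x**2 + 1.5*x - 2

t_np = timeit.timeit(lambda: poly_numpy(x), number=1)
print(f"NumPy:   {t_np:.3f} s")

if ne is not None:
    ne.set_num_threads(1)    # single thread, as the exercise asks
    t_ne = timeit.timeit(
        lambda: ne.evaluate("0.25*x**3 + 0.75*x**2 + 1.5*x - 2",
                            local_dict={"x": x}),
        number=1)
    print(f"numexpr: {t_ne:.3f} s")
```

Timings depend heavily on the machine, so a single run like this is only indicative.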
Exercise (II)
Rewrite the polynomial in this notation:

    ((0.25x + 0.75)x + 1.5)x - 2

and redo the computations.

What happens?
Time to evaluate polynomial (1 thread), in seconds:

Expression                          NumPy    Numexpr
.25*x**3 + .75*x**2 - 1.5*x - 2     1.613    0.138
((.25*x + .75)*x - 1.5)*x - 2       0.301    0.11
x                                   0.052    0.045
sin(x)**2+cos(x)**2                 0.715    0.559

[Figure: bar chart "NumPy vs Numexpr (1 thread)", time (s) for
.25*x**3 + .75*x**2 - 1.5*x - 2 and ((.25*x + .75)*x - 1.5)*x - 2]
Power Expansion
Numexpr expands the expression:

0.25*x**3 + 0.75*x**2 + 1.5*x - 2
to:
0.25*x*x*x + 0.75*x*x + 1.5*x - 2

so there is no need to call the transcendental pow() function
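The effect of this rewrite can be checked with plain NumPy: the expanded form uses only multiplications and gives the same values, up to rounding, as the form with powers. This is a hand-written illustration of the idea, not numexpr's internal code.

```python
import numpy as np

x = np.linspace(-1, 1, 1_000_001)

with_pow = 0.25*x**3 + 0.75*x**2 + 1.5*x - 2
expanded = 0.25*x*x*x + 0.75*x*x + 1.5*x - 2   # multiplications only
horner = ((0.25*x + 0.75)*x + 1.5)*x - 2       # the Exercise (II) form

# Largest absolute difference between the two ways of writing the powers
max_diff = np.max(np.abs(with_pow - expanded))
```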
Pending question

• Why does numexpr continue to be 3x faster
  than NumPy, even when both are executing
  exactly the *same* number of operations?
“Across the industry, today’s chips are largely
    able to execute code faster than we can feed
                them with instructions and data.”

               – Richard Sites, after his article
                    “It’s The Memory, Stupid!”,
          Microprocessor Report, 10(10),1996



The Starving CPU Problem
Memory Access Time vs CPU Cycle Time
Book in 2009
The Status of CPU Starvation in 2012
• Memory latency is much worse than the processor
  cycle time (between 250x and 500x slower).
• Memory bandwidth is improving at a better
  rate than memory latency, but it is also
  slower than processors (between 30x and
  100x).
CPU Caches to the Rescue

• CPU cache latency and throughput
  are much better than memory
• However: the faster they run the
  smaller they must be
CPU Cache Evolution
[Figure: three memory hierarchies, each listed from largest capacity to highest speed]

  (a) Up to end 80's:   Mechanical disk -> Main memory -> CPU
  (b) 90's and 2000's:  Mechanical disk -> Main memory -> Level 2 cache ->
                        Level 1 cache -> CPU
  (c) 2010's:           Mechanical disk -> Solid state disk -> Main memory ->
                        Level 3 cache -> Level 2 cache -> Level 1 cache -> CPU

Figure 1. Evolution of the hierarchical memory model. (a) The primordial (and simplest) model; (b) the most common current
implementation, which includes additional cache levels; and (c) a sensible guess at what’s coming over the next decade:
three levels of cache in the CPU and solid state disks lying between main memory and classical mechanical disks.
When Are CPU Caches Effective?
Mainly in a couple of scenarios:
 • Temporal locality: when the dataset is
   reused
 • Spatial locality: when the dataset is
   accessed sequentially
The Blocking Technique
When accessing disk or memory, get a contiguous block that fits
in CPU cache, operate upon it and reuse it as much as possible.

Use this extensively to leverage spatial and temporal localities
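A minimal sketch of the blocking technique in NumPy follows. It assumes that a block of 32K doubles (256 KB) roughly fits in cache; the block size and the function name are illustrative choices, and the best value depends on the CPU. Each block is brought in once and every step of the expression is applied to it before moving on to the next block.

```python
import numpy as np

def poly_blocked(x, block_len=32 * 1024):
    """Evaluate ((0.25*x + 0.75)*x + 1.5)*x - 2 one block at a time."""
    y = np.empty_like(x)
    for start in range(0, len(x), block_len):
        xb = x[start:start + block_len]      # contiguous view, no copy
        yb = y[start:start + block_len]      # view into the output array
        np.multiply(xb, 0.25, out=yb)        # yb = 0.25*x
        yb += 0.75
        yb *= xb                             # block is reused while still cached
        yb += 1.5
        yb *= xb
        yb -= 2
    return y

x = np.linspace(-1, 1, 1_000_000)
y = poly_blocked(x)
```

Because the in-place operations write through the `yb` view, no full-size temporaries are created.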
Time To Answer Pending Questions

Time to evaluate polynomial (1 thread), in seconds:

Expression                          NumPy    Numexpr
.25*x**3 + .75*x**2 - 1.5*x - 2     1.613    0.138
((.25*x + .75)*x - 1.5)*x - 2       0.301    0.11
x                                   0.052    0.045
sin(x)**2+cos(x)**2                 0.715    0.559

[Figure: bar chart "NumPy vs Numexpr (1 thread)", time (s) for the two
polynomial forms]
Beyond numexpr: Numba
Numexpr Limitations
• Numexpr only implements element-wise
  operations, i.e. ‘a*b’ is evaluated as:
  for i in range(N):
      c[i] = a[i] * b[i]

• In particular, it cannot deal with things like:
  for i in range(N):
      c[i] = a[i-1] + a[i] * b[i]
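For reference, the second loop mixes neighbouring elements, which is why it falls outside purely element-wise evaluation. The sketch below writes it as an explicit Python loop and checks it against an equivalent formulation with shifted NumPy slices. The loop starts at index 1 so that `a[i-1]` does not wrap around to the last element; that is an assumption made here, not something stated on the slide.

```python
import numpy as np

N = 1000
a = np.random.rand(N)
b = np.random.rand(N)

# Explicit loop: c[i] depends on a[i-1] as well as on a[i] and b[i]
c_loop = np.zeros(N)
for i in range(1, N):
    c_loop[i] = a[i-1] + a[i] * b[i]

# Same computation expressed with shifted slices
c_sliced = np.zeros(N)
c_sliced[1:] = a[:-1] + a[1:] * b[1:]
```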
Numba: Overcoming numexpr Limitations
• Numba is a JIT compiler that can translate a subset
  of the Python language into machine code
• It uses the LLVM infrastructure behind the
  scenes
• It can achieve similar or better performance
  than numexpr, but with more flexibility
How Numba Works
[Diagram: Python Function -> LLVM-PY -> LLVM 3.1 -> Machine Code.
 Shown alongside LLVM 3.1: ISPC, OpenCL, OpenMP, CUDA, CLANG
 (Intel, AMD, Nvidia, Apple)]
Numba Example: Computing the Polynomial
import numpy as np
import numba as nb

N = 10*1000*1000

x = np.linspace(-1, 1, N)
y = np.empty(N, dtype=np.float64)

# Compile for two 1-D float64 arrays; the function returns nothing
@nb.jit(nb.void(nb.f8[:], nb.f8[:]), nopython=True)
def poly(x, y):
    for i in range(N):
        # y[i] = 0.25*x[i]**3 + 0.75*x[i]**2 + 1.5*x[i] - 2
        y[i] = ((0.25*x[i] + 0.75)*x[i] + 1.5)*x[i] - 2

poly(x, y)   # run through Numba!
Times for Computing the Polynomial (In Seconds)

  Poly version      (I)        (II)
  NumPy             1.086      0.505
  numexpr           0.108      0.096
  Numba             0.055      0.054
  Pure C, OpenMP    0.215      0.054

• Compilation time for Numba: 0.019 sec
• Run on Mac OSX, Core2 Duo @ 2.13 GHz
Numba: LLVM for Python
Python code can reach C speed without having to program in C itself
(and without losing interactivity!)
Numba in SC 2012
Awesome Python!
If a datastore requires all data to fit in
                     memory, it isn't big data

                   -- Alex Gaynor (on Twitter)




Optimal Containers for Big Data
The Need for a Good Data Container
• Too often we focus only on
  computing as fast as possible
• But we have seen how important data
  access is
• Hence, having an optimal data structure is
  critical for getting good performance when
  processing very large datasets
Appending Data in Large NumPy Objects

[Diagram: the array to be enlarged and the new data to append are both
copied into a newly allocated final array object]

• Normally a realloc() call cannot enlarge the array in place
• Both memory areas have to exist simultaneously
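The copy can be observed directly. In this sketch (sizes are arbitrary), `np.append` returns a brand-new array and leaves the original buffer untouched, so both exist at the same time.

```python
import numpy as np

big = np.zeros(1_000_000)
new_data = np.ones(1_000)

# np.append allocates a new buffer and copies both inputs into it
enlarged = np.append(big, new_data)

shares = np.shares_memory(big, enlarged)     # do the two arrays overlap?
peak_bytes = big.nbytes + enlarged.nbytes    # both buffers are alive here
```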
Contiguous vs Chunked

 NumPy container       Blaze container
 (single block)        chunk 1
                       chunk 2
                       ...
                       chunk N

 Contiguous memory     Discontiguous memory
Appending data in Blaze

[Diagram: the existing chunks (chunk 1, chunk 2) of the array to be enlarged
stay where they are in the final array object, with no copy; the new data to
append is compressed into a new chunk]

Only a small amount of data has to be compressed
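The idea can be sketched with a toy chunked container. This is not Blaze's API: the class name `ChunkedArray` and its methods are hypothetical, and `zlib` from the standard library stands in for Blosc so that the example has no extra dependencies. Appending compresses only the new data; existing chunks are never touched.

```python
import zlib
import numpy as np

class ChunkedArray:
    """Toy append-only container of compressed float64 chunks (hypothetical)."""

    def __init__(self):
        self.chunks = []      # list of compressed byte strings
        self.length = 0       # total number of elements stored

    def append(self, data):
        data = np.ascontiguousarray(data, dtype=np.float64)
        # Only the new data is compressed; old chunks stay as they are
        self.chunks.append(zlib.compress(data.tobytes(), 1))
        self.length += len(data)

    def __iter__(self):
        # Decompress one chunk at a time
        for raw in self.chunks:
            yield np.frombuffer(zlib.decompress(raw), dtype=np.float64)

ca = ChunkedArray()
ca.append(np.arange(1000.0))
first_chunk = ca.chunks[0]
ca.append(np.arange(1000.0, 1500.0))

# Reduce over the container chunk by chunk
total = sum(chunk.sum() for chunk in ca)
```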
Blosc: (de)compressing faster than memcpy()

Transmission + decompression faster than direct transfer?
Example of How Blosc Accelerates Genomics I/O:
SeqPack (backed by Blosc)

TABLE 1
Test Data Sets

 #  Source        Identifier   Sequencer            Read Count   Read Length  ID Lengths  FASTQ Size
 1  1000 Genomes  ERR000018    Illumina GA            9,280,498   36 bp       40–50        1,105 MB
 2  1000 Genomes  SRR493233 1  Illumina HiSeq 2000   43,225,060  100 bp       51–61       10,916 MB
 3  1000 Genomes  SRR497004 1  AB SOLiD 4           122,924,963   51 bp       78–91       22,990 MB

Fig. 1. In-memory throughputs for several compression schemes applied to increasing block sizes (where each
sequence is 256 bytes long).

Source: Howison, M. (in press). High-throughput compression of FASTQ data
with SeqDB. IEEE Transactions on Computational Biology and Bioinformatics.
How Blaze Does Out-Of-Core Computations

Virtual Machine : Python, numexpr, Numba
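Blaze's own machinery is not visible in the extracted slide, but the general out-of-core pattern can be sketched with `numpy.memmap`: the data lives on disk and the expression is evaluated one chunk at a time, so only a chunk needs to be in memory. The file names, chunk size and function name below are illustrative assumptions, and plain NumPy plays the role of the per-chunk "virtual machine".

```python
import os
import tempfile
import numpy as np

def out_of_core_poly(src, dst, chunk_len=64 * 1024):
    """Evaluate the polynomial chunk by chunk over disk-backed arrays."""
    for start in range(0, len(src), chunk_len):
        xb = np.array(src[start:start + chunk_len])   # copy one chunk into RAM
        dst[start:start + chunk_len] = ((0.25*xb + 0.75)*xb + 1.5)*xb - 2

N = 1_000_000
tmpdir = tempfile.mkdtemp()
x_disk = np.memmap(os.path.join(tmpdir, "x.dat"), dtype=np.float64,
                   mode="w+", shape=(N,))
y_disk = np.memmap(os.path.join(tmpdir, "y.dat"), dtype=np.float64,
                   mode="w+", shape=(N,))
x_disk[:] = np.linspace(-1, 1, N)

out_of_core_poly(x_disk, y_disk)
y_disk.flush()    # make sure the results reach the file
```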
Last Message for Today
Big data is tricky to manage:

Look for the optimal containers for
your data


Spending some time choosing an
appropriate data container can be a big time
saver in the long run
Summary
• Python is a perfect language for Big Data
• Nowadays you should be aware of the
  memory system for getting good
  performance
• Choosing appropriate data containers is of
  the utmost importance when dealing with
  Big Data
“Success in Big Data will be achieved by
those developers who are able to look
beyond the standard and to understand the
underlying hardware resources and the
variety of algorithms available.”

-- Oscar de Bustos, HPC Line of Business
Manager at BULL
Thank you!

More Related Content

What's hot

MapReduce: A useful parallel tool that still has room for improvement
MapReduce: A useful parallel tool that still has room for improvementMapReduce: A useful parallel tool that still has room for improvement
MapReduce: A useful parallel tool that still has room for improvement
Kyong-Ha Lee
 
Webinar: Understanding Storage for Performance and Data Safety
Webinar: Understanding Storage for Performance and Data SafetyWebinar: Understanding Storage for Performance and Data Safety
Webinar: Understanding Storage for Performance and Data Safety
MongoDB
 
Advances in GPU Computing
Advances in GPU ComputingAdvances in GPU Computing
Advances in GPU Computing
Frédéric Parienté
 
Real-Time Big Data with Storm, Kafka and GigaSpaces
Real-Time Big Data with Storm, Kafka and GigaSpacesReal-Time Big Data with Storm, Kafka and GigaSpaces
Real-Time Big Data with Storm, Kafka and GigaSpaces
Oleksii Diagiliev
 
CuPy v4 and v5 roadmap
CuPy v4 and v5 roadmapCuPy v4 and v5 roadmap
CuPy v4 and v5 roadmap
Preferred Networks
 
Hadoop Summit 2012 | Bayesian Counters AKA In Memory Data Mining for Large Da...
Hadoop Summit 2012 | Bayesian Counters AKA In Memory Data Mining for Large Da...Hadoop Summit 2012 | Bayesian Counters AKA In Memory Data Mining for Large Da...
Hadoop Summit 2012 | Bayesian Counters AKA In Memory Data Mining for Large Da...
Cloudera, Inc.
 
Comparison of deep learning frameworks from a viewpoint of double backpropaga...
Comparison of deep learning frameworks from a viewpoint of double backpropaga...Comparison of deep learning frameworks from a viewpoint of double backpropaga...
Comparison of deep learning frameworks from a viewpoint of double backpropaga...
Kenta Oono
 
Distributed Multi-GPU Computing with Dask, CuPy and RAPIDS
Distributed Multi-GPU Computing with Dask, CuPy and RAPIDSDistributed Multi-GPU Computing with Dask, CuPy and RAPIDS
Distributed Multi-GPU Computing with Dask, CuPy and RAPIDS
PeterAndreasEntschev
 
Linux MMAP & Ioremap introduction
Linux MMAP & Ioremap introductionLinux MMAP & Ioremap introduction
Linux MMAP & Ioremap introduction
Gene Chang
 
BDT305 Transforming Big Data with Spark and Shark - AWS re: Invent 2012
BDT305 Transforming Big Data with Spark and Shark - AWS re: Invent 2012BDT305 Transforming Big Data with Spark and Shark - AWS re: Invent 2012
BDT305 Transforming Big Data with Spark and Shark - AWS re: Invent 2012
Amazon Web Services
 
Transforming Big Data with Spark and Shark - AWS Re:Invent 2012 BDT 305
Transforming Big Data with Spark and Shark - AWS Re:Invent 2012 BDT 305Transforming Big Data with Spark and Shark - AWS Re:Invent 2012 BDT 305
Transforming Big Data with Spark and Shark - AWS Re:Invent 2012 BDT 305
mjfrankli
 
SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012
SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012
SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012
Chris Richardson
 
Chainer Update v1.8.0 -> v1.10.0+
Chainer Update v1.8.0 -> v1.10.0+Chainer Update v1.8.0 -> v1.10.0+
Chainer Update v1.8.0 -> v1.10.0+
Seiya Tokui
 
lec4_ref.pdf
lec4_ref.pdflec4_ref.pdf
lec4_ref.pdf
vishal choudhary
 
Apache Hadoop & Friends at Utah Java User's Group
Apache Hadoop & Friends at Utah Java User's GroupApache Hadoop & Friends at Utah Java User's Group
Apache Hadoop & Friends at Utah Java User's Group
Cloudera, Inc.
 
Caffe framework tutorial2
Caffe framework tutorial2Caffe framework tutorial2
Caffe framework tutorial2
Park Chunduck
 
クラウド時代の半導体メモリー技術
クラウド時代の半導体メモリー技術クラウド時代の半導体メモリー技術
クラウド時代の半導体メモリー技術
Ryousei Takano
 
GIST AI-X Computing Cluster
GIST AI-X Computing ClusterGIST AI-X Computing Cluster
GIST AI-X Computing Cluster
Jax Jargalsaikhan
 
Understanding DLmalloc
Understanding DLmallocUnderstanding DLmalloc
Understanding DLmalloc
Haifeng Li
 
GPU-Accelerated Parallel Computing
GPU-Accelerated Parallel ComputingGPU-Accelerated Parallel Computing
GPU-Accelerated Parallel Computing
Jun Young Park
 

What's hot (20)

MapReduce: A useful parallel tool that still has room for improvement
MapReduce: A useful parallel tool that still has room for improvementMapReduce: A useful parallel tool that still has room for improvement
MapReduce: A useful parallel tool that still has room for improvement
 
Webinar: Understanding Storage for Performance and Data Safety
Webinar: Understanding Storage for Performance and Data SafetyWebinar: Understanding Storage for Performance and Data Safety
Webinar: Understanding Storage for Performance and Data Safety
 
Advances in GPU Computing
Advances in GPU ComputingAdvances in GPU Computing
Advances in GPU Computing
 
Real-Time Big Data with Storm, Kafka and GigaSpaces
Real-Time Big Data with Storm, Kafka and GigaSpacesReal-Time Big Data with Storm, Kafka and GigaSpaces
Real-Time Big Data with Storm, Kafka and GigaSpaces
 
CuPy v4 and v5 roadmap
CuPy v4 and v5 roadmapCuPy v4 and v5 roadmap
CuPy v4 and v5 roadmap
 
Hadoop Summit 2012 | Bayesian Counters AKA In Memory Data Mining for Large Da...
Hadoop Summit 2012 | Bayesian Counters AKA In Memory Data Mining for Large Da...Hadoop Summit 2012 | Bayesian Counters AKA In Memory Data Mining for Large Da...
Hadoop Summit 2012 | Bayesian Counters AKA In Memory Data Mining for Large Da...
 
Comparison of deep learning frameworks from a viewpoint of double backpropaga...
Comparison of deep learning frameworks from a viewpoint of double backpropaga...Comparison of deep learning frameworks from a viewpoint of double backpropaga...
Comparison of deep learning frameworks from a viewpoint of double backpropaga...
 
Distributed Multi-GPU Computing with Dask, CuPy and RAPIDS
Distributed Multi-GPU Computing with Dask, CuPy and RAPIDSDistributed Multi-GPU Computing with Dask, CuPy and RAPIDS
Distributed Multi-GPU Computing with Dask, CuPy and RAPIDS
 
Linux MMAP & Ioremap introduction
Linux MMAP & Ioremap introductionLinux MMAP & Ioremap introduction
Linux MMAP & Ioremap introduction
 
BDT305 Transforming Big Data with Spark and Shark - AWS re: Invent 2012
BDT305 Transforming Big Data with Spark and Shark - AWS re: Invent 2012BDT305 Transforming Big Data with Spark and Shark - AWS re: Invent 2012
BDT305 Transforming Big Data with Spark and Shark - AWS re: Invent 2012
 
Transforming Big Data with Spark and Shark - AWS Re:Invent 2012 BDT 305
Transforming Big Data with Spark and Shark - AWS Re:Invent 2012 BDT 305Transforming Big Data with Spark and Shark - AWS Re:Invent 2012 BDT 305
Transforming Big Data with Spark and Shark - AWS Re:Invent 2012 BDT 305
 
SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012
SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012
SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012
 
Chainer Update v1.8.0 -> v1.10.0+
Chainer Update v1.8.0 -> v1.10.0+Chainer Update v1.8.0 -> v1.10.0+
Chainer Update v1.8.0 -> v1.10.0+
 
lec4_ref.pdf
lec4_ref.pdflec4_ref.pdf
lec4_ref.pdf
 
Apache Hadoop & Friends at Utah Java User's Group
Apache Hadoop & Friends at Utah Java User's GroupApache Hadoop & Friends at Utah Java User's Group
Apache Hadoop & Friends at Utah Java User's Group
 
Caffe framework tutorial2
Caffe framework tutorial2Caffe framework tutorial2
Caffe framework tutorial2
 
クラウド時代の半導体メモリー技術
クラウド時代の半導体メモリー技術クラウド時代の半導体メモリー技術
クラウド時代の半導体メモリー技術
 
GIST AI-X Computing Cluster
GIST AI-X Computing ClusterGIST AI-X Computing Cluster
GIST AI-X Computing Cluster
 
Understanding DLmalloc
Understanding DLmallocUnderstanding DLmalloc
Understanding DLmalloc
 
GPU-Accelerated Parallel Computing
GPU-Accelerated Parallel ComputingGPU-Accelerated Parallel Computing
GPU-Accelerated Parallel Computing
 

Similar to Memory efficient applications. FRANCESC ALTED at Big Data Spain 2012

How shit works: the CPU
How shit works: the CPUHow shit works: the CPU
How shit works: the CPU
Tomer Gabel
 
Sean Kandel - Data profiling: Assessing the overall content and quality of a ...
Sean Kandel - Data profiling: Assessing the overall content and quality of a ...Sean Kandel - Data profiling: Assessing the overall content and quality of a ...
Sean Kandel - Data profiling: Assessing the overall content and quality of a ...
huguk
 
Jvm memory model
Jvm memory modelJvm memory model
Jvm memory model
Yoav Avrahami
 
Storm presentation
Storm presentationStorm presentation
Storm presentation
Shyam Raj
 
Lecture 25
Lecture 25Lecture 25
Lecture 25
Berkay TURAN
 
"Practical Machine Learning With Ruby" by Iqbal Farabi (ID Ruby Community)
"Practical Machine Learning With Ruby" by Iqbal Farabi (ID Ruby Community)"Practical Machine Learning With Ruby" by Iqbal Farabi (ID Ruby Community)
"Practical Machine Learning With Ruby" by Iqbal Farabi (ID Ruby Community)
Tech in Asia ID
 
Learn How to Master Solr1 4
Learn How to Master Solr1 4Learn How to Master Solr1 4
Learn How to Master Solr1 4
Lucidworks (Archived)
 
HES2011 - Aaron Portnoy and Logan Brown - Black Box Auditing Adobe Shockwave
HES2011 - Aaron Portnoy and Logan Brown - Black Box Auditing Adobe ShockwaveHES2011 - Aaron Portnoy and Logan Brown - Black Box Auditing Adobe Shockwave
HES2011 - Aaron Portnoy and Logan Brown - Black Box Auditing Adobe Shockwave
Hackito Ergo Sum
 
Python高级编程(二)
Python高级编程(二)Python高级编程(二)
Python高级编程(二)
Qiangning Hong
 
[Harvard CS264] 02 - Parallel Thinking, Architecture, Theory & Patterns
[Harvard CS264] 02 - Parallel Thinking, Architecture, Theory & Patterns[Harvard CS264] 02 - Parallel Thinking, Architecture, Theory & Patterns
[Harvard CS264] 02 - Parallel Thinking, Architecture, Theory & Patterns
npinto
 
Migrating from matlab to python
Migrating from matlab to pythonMigrating from matlab to python
Migrating from matlab to python
ActiveState
 
Apache con 2020 use cases and optimizations of iotdb
Apache con 2020 use cases and optimizations of iotdbApache con 2020 use cases and optimizations of iotdb
Apache con 2020 use cases and optimizations of iotdb
ZhangZhengming
 
NAS EP Algorithm
NAS EP Algorithm NAS EP Algorithm
NAS EP Algorithm
Jongsu "Liam" Kim
 
NVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflow
NVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflowNVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflow
NVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflow
NVIDIA Taiwan
 
Ca บทที่สี่
Ca บทที่สี่Ca บทที่สี่
Ca บทที่สี่
atit604
 
Hanborq optimizations on hadoop map reduce 20120221a
Hanborq optimizations on hadoop map reduce 20120221aHanborq optimizations on hadoop map reduce 20120221a
Hanborq optimizations on hadoop map reduce 20120221a
Schubert Zhang
 
Kaggle tokyo 2018
Kaggle tokyo 2018Kaggle tokyo 2018
Kaggle tokyo 2018
Cournapeau David
 
Spark Summit EU talk by Qifan Pu
Spark Summit EU talk by Qifan PuSpark Summit EU talk by Qifan Pu
Spark Summit EU talk by Qifan Pu
Spark Summit
 
Ops Jumpstart: MongoDB Administration 101
Ops Jumpstart: MongoDB Administration 101Ops Jumpstart: MongoDB Administration 101
Ops Jumpstart: MongoDB Administration 101
MongoDB
 
Multithreading and Parallelism on iOS [MobOS 2013]
 Multithreading and Parallelism on iOS [MobOS 2013] Multithreading and Parallelism on iOS [MobOS 2013]
Multithreading and Parallelism on iOS [MobOS 2013]
Kuba Břečka
 

Similar to Memory efficient applications. FRANCESC ALTED at Big Data Spain 2012 (20)

How shit works: the CPU
How shit works: the CPUHow shit works: the CPU
How shit works: the CPU
 
Sean Kandel - Data profiling: Assessing the overall content and quality of a ...
Sean Kandel - Data profiling: Assessing the overall content and quality of a ...Sean Kandel - Data profiling: Assessing the overall content and quality of a ...
Sean Kandel - Data profiling: Assessing the overall content and quality of a ...
 
Jvm memory model
Jvm memory modelJvm memory model
Jvm memory model
 
Storm presentation
Storm presentationStorm presentation
Storm presentation
 
Lecture 25
Lecture 25Lecture 25
Lecture 25
 
"Practical Machine Learning With Ruby" by Iqbal Farabi (ID Ruby Community)
"Practical Machine Learning With Ruby" by Iqbal Farabi (ID Ruby Community)"Practical Machine Learning With Ruby" by Iqbal Farabi (ID Ruby Community)
"Practical Machine Learning With Ruby" by Iqbal Farabi (ID Ruby Community)
 
Learn How to Master Solr1 4
Learn How to Master Solr1 4Learn How to Master Solr1 4
Learn How to Master Solr1 4
 
HES2011 - Aaron Portnoy and Logan Brown - Black Box Auditing Adobe Shockwave
HES2011 - Aaron Portnoy and Logan Brown - Black Box Auditing Adobe ShockwaveHES2011 - Aaron Portnoy and Logan Brown - Black Box Auditing Adobe Shockwave
HES2011 - Aaron Portnoy and Logan Brown - Black Box Auditing Adobe Shockwave
 
Python高级编程(二)
Python高级编程(二)Python高级编程(二)
Python高级编程(二)
 
[Harvard CS264] 02 - Parallel Thinking, Architecture, Theory & Patterns
[Harvard CS264] 02 - Parallel Thinking, Architecture, Theory & Patterns[Harvard CS264] 02 - Parallel Thinking, Architecture, Theory & Patterns
[Harvard CS264] 02 - Parallel Thinking, Architecture, Theory & Patterns
 
Migrating from matlab to python
Migrating from matlab to pythonMigrating from matlab to python
Migrating from matlab to python
 
Apache con 2020 use cases and optimizations of iotdb
Apache con 2020 use cases and optimizations of iotdbApache con 2020 use cases and optimizations of iotdb
Apache con 2020 use cases and optimizations of iotdb
 
NAS EP Algorithm
NAS EP Algorithm NAS EP Algorithm
NAS EP Algorithm
 
NVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflow
NVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflowNVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflow
NVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflow
 
Ca บทที่สี่
Ca บทที่สี่Ca บทที่สี่
Ca บทที่สี่
 
Hanborq optimizations on hadoop map reduce 20120221a
Hanborq optimizations on hadoop map reduce 20120221aHanborq optimizations on hadoop map reduce 20120221a
Hanborq optimizations on hadoop map reduce 20120221a
 
Kaggle tokyo 2018
Kaggle tokyo 2018Kaggle tokyo 2018
Kaggle tokyo 2018
 
Spark Summit EU talk by Qifan Pu
Spark Summit EU talk by Qifan PuSpark Summit EU talk by Qifan Pu
Spark Summit EU talk by Qifan Pu
 
Ops Jumpstart: MongoDB Administration 101
Ops Jumpstart: MongoDB Administration 101Ops Jumpstart: MongoDB Administration 101
Ops Jumpstart: MongoDB Administration 101
 
Multithreading and Parallelism on iOS [MobOS 2013]
 Multithreading and Parallelism on iOS [MobOS 2013] Multithreading and Parallelism on iOS [MobOS 2013]
Multithreading and Parallelism on iOS [MobOS 2013]
 


Memory efficient applications. FRANCESC ALTED at Big Data Spain 2012

  • 1. It's The Memory, Stupid! or: How I Learned to Stop Worrying about CPU Speed and Love Memory Access. Francesc Alted, Software Architect. Big Data Spain 2012, Madrid (Spain), November 16, 2012
  • 2. About Continuum Analytics • Develop new ways to store, compute, and visualize data. • Provide open technologies for Data Integration on a massive scale. • Provide software tools, training, and integration/consulting services to corporate, government, and educational clients worldwide.
  • 3. Overview • The Era of ‘Big Data’ • A few words about Python and NumPy • The Starving CPU problem • Choosing optimal containers for Big Data
  • 4. “A wind of streaming data, social data and unstructured data is knocking at the door, and we're starting to let it in. It's a scary place at the moment.” -- Unidentified bank IT executive, as quoted by “The American Banker” The Dawn of ‘Big Data’
  • 5. Challenges • We have to deal with as much data as possible by using limited resources • So, we must use our computational resources optimally to be able to get the most out of Big Data
  • 6. Interactivity and Big Data • Interactivity is crucial for handling data • Interactivity and performance are crucial for handling Big Data
  • 7. Python and ‘Big Data’ • Python is an interpreted language and hence offers interactivity • Myth: “Python is slow, so why the hell are you going to use it for Big Data?” • Answer: Python has access to an incredibly powerful range of libraries that boost its performance far beyond your expectations • ...and during this talk I will prove it!
  • 8. NumPy: A Standard ‘De Facto’ Container
  • 10. Operating with NumPy • array[2]; array[1,1:5, :]; array[[3,6,10]] • (array1**3 / array2) - sin(array3) • numpy.dot(array1, array2): access to optimized BLAS (*GEMM) functions • and much more...
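The operations listed on this slide can be tried out with a few small arrays. The following is a minimal sketch; the array names, shapes and values are illustrative assumptions, not taken from the talk.

```python
import numpy as np

# Illustrative data (shapes chosen only so that the slide's indexing is valid).
array = np.arange(4 * 6 * 3, dtype=np.float64).reshape(4, 6, 3)
vector = np.arange(12, dtype=np.float64)

block = array[2]              # basic indexing: a (6, 3) view of the third block
window = array[1, 1:5, :]     # slicing: rows 1 to 4 of block 1, shape (4, 3)
picked = vector[[3, 6, 10]]   # fancy indexing: a copy holding elements 3, 6 and 10

# Element-wise arithmetic; each operator allocates a new temporary array.
array1 = np.linspace(1.0, 2.0, 5)
array2 = np.linspace(2.0, 3.0, 5)
array3 = np.linspace(0.0, np.pi, 5)
result = (array1**3 / array2) - np.sin(array3)

# Matrix product; for floating-point 2-D arrays NumPy hands this to BLAS (*GEMM).
m1 = np.arange(6, dtype=np.float64).reshape(2, 3)
m2 = np.arange(12, dtype=np.float64).reshape(3, 4)
product = np.dot(m1, m2)      # shape (2, 4)
```

Basic indexing and slicing return views on the original data, while indexing with a list returns a copy, which matters once the arrays become large.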
  • 11. Nothing Is Perfect • NumPy is just great for many use cases • However, it also has its own deficiencies: • It follows the Python evaluation order in complex expressions like (a * b) + c, so each intermediate result is stored in a temporary array • It does not support multiple processors (except for BLAS computations)
  • 12. Numexpr: Dealing with Complex Expressions • It comes with a specialized virtual machine for evaluating expressions • It accelerates computations mainly by using memory more efficiently • It supports multithreading that is extremely easy to use (active by default)
  • 13. Exercise (I) Evaluate the following polynomial: 0.25x^3 + 0.75x^2 + 1.5x - 2 in the range [-1, 1] with a step size of 2*10^-7, using both NumPy and numexpr. Note: use a single processor for numexpr: numexpr.set_num_threads(1)
  • 14. Exercise (II) Rewrite the polynomial in this notation: ((0.25x + 0.75)x + 1.5)x - 2 and redo the computations. What happens?
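One way to set up both exercises is sketched below. The helper names (`make_x`, `best_time`, `time_all`) are hypothetical, the numexpr import is guarded so the NumPy part still runs without it, and the step is coarser than the one in the exercise; none of this is the speaker's own solution.

```python
import timeit
import numpy as np

try:
    import numexpr as ne
    ne.set_num_threads(1)        # single thread, as the exercise asks
except ImportError:              # numexpr is optional in this sketch
    ne = None

FORM_I = "0.25*x**3 + 0.75*x**2 + 1.5*x - 2"
FORM_II = "((0.25*x + 0.75)*x + 1.5)*x - 2"

def make_x(step):
    """Points covering [-1, 1] with the given step size."""
    n = int(round(2.0 / step)) + 1
    return np.linspace(-1.0, 1.0, n)

def poly_numpy_i(x):
    return 0.25 * x**3 + 0.75 * x**2 + 1.5 * x - 2

def poly_numpy_ii(x):
    return ((0.25 * x + 0.75) * x + 1.5) * x - 2

def best_time(func, repeat=3):
    """Best wall-clock time of `repeat` single runs, in seconds."""
    return min(timeit.repeat(func, number=1, repeat=repeat))

def time_all(x):
    timings = {
        "NumPy (I)": best_time(lambda: poly_numpy_i(x)),
        "NumPy (II)": best_time(lambda: poly_numpy_ii(x)),
    }
    if ne is not None:
        for label, expr in (("numexpr (I)", FORM_I), ("numexpr (II)", FORM_II)):
            timings[label] = best_time(lambda: ne.evaluate(expr, local_dict={"x": x}))
    return timings

# The exercise uses a step of 2e-7 (about 10 million points); a coarser
# step is used here only to keep the sketch quick to run.
x = make_x(2e-5)
for label, seconds in time_all(x).items():
    print(f"{label:14s} {seconds:.6f} s")
```

Calling `make_x(2e-7)` reproduces the size asked for in Exercise (I); the absolute timings depend on the machine, so no expected figures are given here.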
  • 15. NumPy vs Numexpr (1 thread), times in seconds
       Expression                        NumPy    Numexpr
       ((.25*x + .75)*x - 1.5)*x - 2     0.301    0.11
       x                                 0.052    0.045
       sin(x)**2+cos(x)**2               0.715    0.559
       [Bar chart: "Time to evaluate polynomial (1 thread)"; y-axis: Time (s); series: NumPy, Numexpr; groups: .25*x**3 + .75*x**2 - 1.5*x - 2 and ((.25*x + .75)*x - 1.5)*x - 2]
  • 16. Power Expansion. Numexpr expands the expression 0.25x^3 + 0.75x^2 + 1.5x - 2 to 0.25*x*x*x + 0.75*x*x + 1.5*x - 2, so there is no need to call the transcendental pow() function
  • 17. Pending question • Why does numexpr continue to be 3x faster than NumPy, even when both are executing exactly the *same* number of operations?
  • 18. “Across the industry, today’s chips are largely able to execute code faster than we can feed them with instructions and data.” -- Richard Sites, after his article “It’s The Memory, Stupid!”, Microprocessor Report, 10(10), 1996. The Starving CPU Problem
  • 19. Memory Access Time vs CPU Cycle Time
  • 21. The Status of CPU Starvation in 2012 • Memory latency is much slower (between 250x and 500x) than processors. • Memory bandwidth is improving at a better rate than memory latency, but it is also slower than processors (between 30x and 100x).
  • 22. CPU Caches to the Rescue • CPU cache latency and throughput are much better than memory • However: the faster they run the smaller they must be
  • 23. CPU Cache Evolution
       (a) Up to end 80’s: mechanical disk, main memory, central processing unit (CPU)
       (b) 90’s and 2000’s: mechanical disk, main memory, level 2 cache, level 1 cache, CPU
       (c) 2010’s: mechanical disk, solid state disk, main memory, level 3 cache, level 2 cache, level 1 cache, CPU
       (Diagram axes: capacity, speed.)
       Figure 1. Evolution of the hierarchical memory model. (a) The primordial (and simplest) model; (b) the most common current implementation, which includes additional cache levels; and (c) a sensible guess at what’s coming over the next decade: three levels of cache in the CPU and solid state disks lying between main memory and classical mechanical disks.
  • 24. When Are CPU Caches Effective? Mainly in a couple of scenarios: • Temporal locality: when the dataset is reused • Spatial locality: when the dataset is accessed sequentially
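Spatial locality can be observed from Python with a small experiment. The sketch below sums the same 2-D array once along contiguous rows and once along strided columns; the array size and function names are assumptions made for illustration.

```python
import timeit
import numpy as np

a = np.ones((2000, 2000))    # C order: each row is contiguous (about 32 MB total)

def sum_by_rows(a):
    """Walk memory sequentially: every a[i, :] is a contiguous run."""
    total = 0.0
    for i in range(a.shape[0]):
        total += a[i, :].sum()
    return total

def sum_by_columns(a):
    """Jump through memory: consecutive elements of a[:, j] are a whole row apart."""
    total = 0.0
    for j in range(a.shape[1]):
        total += a[:, j].sum()
    return total

t_rows = min(timeit.repeat(lambda: sum_by_rows(a), number=1, repeat=3))
t_cols = min(timeit.repeat(lambda: sum_by_columns(a), number=1, repeat=3))
print(f"row-wise:    {t_rows:.4f} s")
print(f"column-wise: {t_cols:.4f} s")
```

Both functions perform the same number of additions, so any difference in the timings comes from how the data is fetched rather than from the arithmetic.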
  • 25. The Blocking Technique. When accessing disk or memory, get a contiguous block that fits in CPU cache, operate upon it and reuse it as much as possible. Use this extensively to leverage spatial and temporal localities
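The blocking technique can be sketched in plain NumPy by evaluating the polynomial from the exercises one block at a time and reusing a single scratch buffer. The function name and the default block length (4096 doubles, or 32 KB, assumed here to be of the order of an L1 data cache) are illustrative choices, not values from the talk.

```python
import numpy as np

def poly_blocked(x, block_len=4096):
    """Evaluate ((0.25x + 0.75)x + 1.5)x - 2 one cache-sized block at a time."""
    out = np.empty_like(x)
    scratch = np.empty(block_len, dtype=x.dtype)   # reused for every block
    for start in range(0, len(x), block_len):
        stop = min(start + block_len, len(x))
        xb = x[start:stop]                 # view on the input, no copy
        t = scratch[: stop - start]        # view on the scratch buffer
        np.multiply(xb, 0.25, out=t)       # t = 0.25*x
        t += 0.75
        t *= xb
        t += 1.5
        t *= xb
        t -= 2
        out[start:stop] = t
    return out

x = np.linspace(-1.0, 1.0, 10_001)
y = poly_blocked(x)
```

Every intermediate result lives in the small scratch buffer, so no full-size temporary array is written to and read back from main memory.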
  • 26. Time To Answer Pending Questions. NumPy vs Numexpr (1 thread), times in seconds
       Expression                          NumPy    Numexpr
       .25*x**3 + .75*x**2 - 1.5*x - 2     1.613    0.138
       ((.25*x + .75)*x - 1.5)*x - 2       0.301    0.11
       x                                   0.052    0.045
       sin(x)**2+cos(x)**2                 0.715    0.559
       [Bar chart: "Time to evaluate polynomial (1 thread)"; y-axis: Time (s); series: NumPy, Numexpr; groups: .25*x**3 + .75*x**2 - 1.5*x - 2 and ((.25*x + .75)*x - 1.5)*x - 2]
  • 30. Numexpr Limitations • Numexpr only implements element-wise operations, i.e. ‘a*b’ is evaluated as:
           for i in range(N):
               c[i] = a[i] * b[i]
       • In particular, it cannot deal with things like:
           for i in range(N):
               c[i] = a[i-1] + a[i] * b[i]
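The two loops on this slide are written out below as runnable functions, mainly to make the difference concrete. The function names are hypothetical. Note that with Python indexing `a[i-1]` wraps around to the last element when `i` is 0; whether the slide intended that is not stated.

```python
import numpy as np

def elementwise(a, b):
    """c[i] depends only on position i of the inputs."""
    c = np.empty_like(a)
    for i in range(len(a)):
        c[i] = a[i] * b[i]
    return c

def with_neighbour(a, b):
    """c[i] also needs a[i-1], so it is no longer purely element-wise."""
    c = np.empty_like(a)
    for i in range(len(a)):
        c[i] = a[i - 1] + a[i] * b[i]   # a[-1] is the last element when i == 0
    return c

a = np.arange(1.0, 6.0)      # [1, 2, 3, 4, 5]
b = np.full(5, 2.0)
c1 = elementwise(a, b)
c2 = with_neighbour(a, b)
```

In NumPy the second pattern can still be vectorized by shifting the array explicitly, for example `np.roll(a, 1) + a * b`, at the cost of one more temporary array.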
  • 31. Numba: Overcoming numexpr Limitations • Numba is a JIT that can translate a subset of the Python language into machine code • It uses LLVM infrastructure behind the scenes • Can achieve similar or better performance than numexpr, but with more flexibility
  • 32. How Numba Works [Diagram labels: Python Function, LLVM-PY, LLVM 3.1, Machine Code; ISPC, OpenCL, OpenMP, CUDA, CLANG; Intel, AMD, Nvidia, Apple]
  • 33. Numba Example: Computing the Polynomial
       import numpy as np
       import numba as nb
       N = 10*1000*1000
       x = np.linspace(-1, 1, N)
       y = np.empty(N, dtype=np.float64)
       @nb.jit(arg_types=[nb.f8[:], nb.f8[:]])
       def poly(x, y):
           for i in range(N):
               # y[i] = 0.25*x[i]**3 + 0.75*x[i]**2 + 1.5*x[i] - 2
               y[i] = ((0.25*x[i] + 0.75)*x[i] + 1.5)*x[i] - 2
       poly(x, y)  # run through Numba!
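The `arg_types` keyword in the slide belongs to the Numba release available in 2012. As far as I know, current releases infer argument types on the first call, so a present-day version of the same example might look like the sketch below. The guarded import and the pure-Python fallback are additions made only so the sketch runs without Numba installed, and `N` is smaller than on the slide.

```python
import numpy as np

try:
    from numba import njit       # nopython-mode JIT decorator
except ImportError:
    def njit(func):              # fallback: run the plain Python loop
        return func

@njit
def poly(x, y):
    # Horner form of 0.25*x**3 + 0.75*x**2 + 1.5*x - 2, one element at a time.
    for i in range(x.shape[0]):
        y[i] = ((0.25 * x[i] + 0.75) * x[i] + 1.5) * x[i] - 2

N = 100_000                      # the slide uses 10*1000*1000
x = np.linspace(-1, 1, N)
y = np.empty(N, dtype=np.float64)
poly(x, y)                       # the first call triggers compilation under Numba
```

The loop length is taken from `x.shape[0]` rather than from a global `N`, so the compiled function does not depend on a module-level constant.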
  • 34. Times for Computing the Polynomial (In Seconds)
       Poly version       (I)      (II)
       NumPy              1.086    0.505
       numexpr            0.108    0.096
       Numba              0.055    0.054
       Pure C, OpenMP     0.215    0.054
       • Compilation time for Numba: 0.019 sec • Run on Mac OSX, Core2 Duo @ 2.13 GHz
  • 35. Numba: LLVM for Python Python code can reach C speed without having to program in C itself (and without losing interactivity!)
  • 36. Numba in SC 2012
  • 37. Numba in SC2012 Awesome Python!
  • 38. “If a datastore requires all data to fit in memory, it isn't big data” -- Alex Gaynor (on Twitter). Optimal Containers for Big Data
  • 39. The Need for a Good Data Container • Too many times we are too focused on computing as fast as possible • But we have seen how important data access is • Hence, having an optimal data structure is critical for getting good performance when processing very large datasets
  • 40. Appending Data in Large NumPy Objects [Diagram: the array to be enlarged and the new data to append are both copied into a newly allocated memory area, which becomes the final array object] • Normally a realloc() call will not be able to extend the existing block in place • Both memory areas have to exist simultaneously
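The copy described on this slide is easy to confirm from NumPy itself. The sketch below uses arbitrary sizes chosen for illustration.

```python
import numpy as np

big = np.arange(1_000_000, dtype=np.float64)   # array to be enlarged (8 MB)
new = np.arange(10, dtype=np.float64)          # new data to append (80 bytes)

grown = np.append(big, new)                    # allocates a new array and copies both inputs

shares = np.shares_memory(big, grown)          # False: the result lives in new memory
print(shares, big.nbytes, grown.nbytes)
```

Appending 80 bytes therefore costs a fresh allocation of about 8 MB plus a full copy, and until `big` is released both buffers are held in memory at once.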
  • 41. Contiguous vs Chunked. NumPy container: contiguous memory. Blaze container: discontiguous memory (chunk 1, chunk 2, ..., chunk N)
  • 42. Appending data in Blaze [Diagram: the array to be enlarged (chunk 1, chunk 2) keeps its existing chunks in the final array object; the new data to append is compressed into a new chunk] Only a small amount of data has to be compressed
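The idea of a chunked, compressed container can be illustrated with a toy class. This is not Blaze's implementation or API: `ChunkedArray` is a hypothetical name, and the standard-library `zlib` stands in for Blosc purely so the sketch has no extra dependencies.

```python
import zlib
import numpy as np

class ChunkedArray:
    """Hypothetical append-only 1-D float64 container made of compressed chunks."""

    def __init__(self):
        self._chunks = []        # list of compressed byte strings
        self._length = 0

    def append(self, data):
        """Compress only the new data; existing chunks are left untouched."""
        data = np.ascontiguousarray(data, dtype=np.float64)
        self._chunks.append(zlib.compress(data.tobytes()))
        self._length += data.size

    def __len__(self):
        return self._length

    def chunks(self):
        """Yield one decompressed chunk at a time."""
        for blob in self._chunks:
            yield np.frombuffer(zlib.decompress(blob), dtype=np.float64)

    def sum(self):
        """Block-wise reduction: only one chunk is decompressed at any moment."""
        return sum(chunk.sum() for chunk in self.chunks())

c = ChunkedArray()
c.append(np.arange(1000.0))
c.append(np.arange(1000.0, 1010.0))    # cheap: no copy of the first chunk
total = c.sum()
```

Because each chunk is independent, an append never moves existing data, and reductions such as `sum()` can stream through the chunks without materializing the whole array.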
  • 43. Blosc: (de)compressing faster than memcpy() Transmission + decompression faster than direct transfer?
  • 44. Example of How Blosc Accelerates Genomics I/O: SeqPack (backed by Blosc)
       TABLE 1. Test Data Sets
       #  Source        Identifier   Sequencer            Read Count   Read Length  ID Lengths  FASTQ Size
       1  1000 Genomes  ERR000018    Illumina GA          9,280,498    36 bp        40–50       1,105 MB
       2  1000 Genomes  SRR493233 1  Illumina HiSeq 2000  43,225,060   100 bp       51–61       10,916 MB
       3  1000 Genomes  SRR497004 1  AB SOLiD 4           122,924,963  51 bp        78–91       22,990 MB
       Fig. 1. In-memory throughputs for several compression schemes applied to increasing block sizes (where each sequence is 256 bytes long).
       Source: Howison, M. (in press). High-throughput compression of FASTQ data with SeqDB. IEEE Transactions on Computational Biology and Bioinformatics.
  • 45. How Blaze Does Out-Of-Core Computations [Diagram not recoverable; surviving label: Virtual Machine: Python, numexpr, Numba]
  • 46. Last Message for Today. Big data is tricky to manage: look for the optimal containers for your data. Spending some time choosing an appropriate data container can be a big time saver in the long run
  • 47. Summary • Python is a perfect language for Big Data • Nowadays you should be aware of the memory system for getting good performance • Choosing appropriate data containers is of the utmost importance when dealing with Big Data
  • 48. “The success of Big Data will be achieved by those developers who are able to look beyond the standard and understand the underlying hardware resources and the variety of algorithms available.” -- Oscar de Bustos, HPC Line of Business Manager at BULL