High Performance with Java


   malduarte@gmail.com
Foreword

In the beginning was the Tao. The Tao gave birth
  to Space and Time. Therefore Space and Time
  are Yin and Yang of programming.
Programmers that do not comprehend the Tao are
  always running out of time and space for their
  programs. Programmers that comprehend the
  Tao always have enough time and space to
  accomplish their goals.
How could it be otherwise?
From www.canonical.org/~kragen/tao-of-programming.htm
What is High Performance?

• Hitachi H8 8-bit CPU, 16 MHz
• 32 KB RAM

2 x Sun SPARC Enterprise M5000
• 6 quad-core 2.4 GHz Sparc VII CPUs, 6 MB L2 cache
• 48 hw threads, 32 GB RAM




Sources:
Sun Microsystems: www.sun.com/servers/midrange/m5000/
Wikipedia: en.wikipedia.org/wiki/Lego_Mindstorms
Aad van der Steen HPC Page - www.phys.uu.nl/~steen/web08/sparc.html
High Performance is all about
“Delivering solutions which meet
  requirements within time and space
  constraints using available resources
  rationally”

The most important resource: brain time.

HW increases performance with time, brain
 decreases performance with time.
Why Java?

• Mature technology
• Speedy and stable VMs (those who were
  burned in the early days still loathe it,
  though)
• Lots of high quality tools
• Lots of high quality available libraries
• Large ecosystem
• NOT the language itself
GSM 101




Source: en.wikipedia.org/wiki/GSM
A small case study

• Goal: Analyse 17 GB (gzip’ed) worth of
  MSC Call Detail Records (CDRs in Mobile
  Operator Lingo)
Snippet:
04|001|26806XXXXXXXXXX|3519XXXXXXXX|3519800049344611||||||
081105|002559|||00062|00|000-076|015-113||||MALM1
|0|01|9XXXXXXXX|11|||2|1|MICOUT|0|0||||||||||||||||||331985|268061011305482|B
AL10A|15|22|12402523|||||||||||||||||||||||||||||||02|||||100001011305482||3e3212003
4df00|||0|1|17|||1|||3||1|01|3519XXXXXXXX||||1|01|3519XXXXXXXX||25||||||0|01|
9XXXXXXXX|002559|081105|00062||2||5||||||||||||3|||||||||||||||||||||||||||||||||||||||||||||||||||
||||||||||||||||



Note: Sensitive information was hidden
A bit more info

• Approximately 170 GB uncompressed
• Exactly 359 014 695 CDRs
• Trivia: about 3 days' worth of GSM call
  logs.
• Correlate CDRs with customer information
• Performance goal: running time must be
  below one hour.
Performance Budget




• Network bandwidth and latency
• Disk bandwidth and latency
• Memory
• CPU
If you don’t take a temperature you
can’t find a fever
• Measure the progress as the system is
  implemented
• Make *honest* measurements. Prove
  yourself wrong.
• Avoid premature optimization – How can
  you know? If you’re within your
  performance budget don’t worry

(*) Fat Man’s Law X – “House of God”
Samuel Shem - http://en.wikipedia.org/wiki/The_House_of_God
"The journey of a thousand miles starts
with a single step." Lao Tse
• Line read performance

1811229 Line Sample

Sample timings:
real    0m13.872s
user    0m13.366s
sys     0m4.056s

ETA: ~45 minutes
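The ETA above is a linear extrapolation from the sample to the full 359 014 695 lines. A minimal sketch of that measurement, assuming a plain-text sample file; the class and method names are hypothetical and the slides do not show the author's actual harness:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class LineReadBenchmark {
    /** Counts the lines of a file through a buffered reader. */
    public static long countLines(Path file) throws IOException {
        long lines = 0;
        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.ISO_8859_1)) {
            while (in.readLine() != null) {
                lines++;
            }
        }
        return lines;
    }

    /** Linearly extrapolates a sample timing to the full data set, in seconds. */
    public static double etaSeconds(double sampleSeconds, long sampleLines, long totalLines) {
        return sampleSeconds * totalLines / sampleLines;
    }

    public static void main(String[] args) throws IOException {
        Path sample = Paths.get(args[0]);
        long start = System.nanoTime();
        long lines = countLines(sample);
        double seconds = (System.nanoTime() - start) / 1e9;
        // 359 014 695 is the total CDR count quoted in the slides
        double eta = etaSeconds(seconds, lines, 359_014_695L);
        System.out.printf("%d lines in %.3f s, ETA for the full set: %.1f min%n",
                lines, seconds, eta / 60);
    }
}
```

Plugging in the slide's numbers (13.872 s for 1 811 229 lines) gives a little under 46 minutes, in line with the quoted ETA.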
I/O Tips

• Use Memory Mapped Files (see
  FileChannel.map and MappedByteBuffer
  APIs)
• Use Buffered I/O - BufferedInputStream
• Optimal buffer size is a multiple of the OS
  page size (usually 8k)
• If the process is I/O bound and you have fast
  CPUs, consider processing compressed
  files
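As an illustration of the first tip, here is a small sketch that counts newline bytes through FileChannel.map and MappedByteBuffer. The class name and the 1 GB chunk size are assumptions made for the example, not something taken from the talk:

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedLineCounter {
    /** Counts '\n' bytes by mapping the file read-only, one chunk at a time. */
    public static long countNewlines(Path file) throws IOException {
        // A single mapping is limited to Integer.MAX_VALUE bytes, so large files are mapped in chunks.
        final long chunk = 1L << 30;
        long count = 0;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = ch.size();
            for (long pos = 0; pos < size; pos += chunk) {
                long len = Math.min(chunk, size - pos);
                MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, pos, len);
                while (buf.hasRemaining()) {
                    if (buf.get() == '\n') {
                        count++;
                    }
                }
            }
        }
        return count;
    }
}
```

Whether mapping beats buffered streams depends on the OS, the file system and the access pattern, so it is worth measuring both against the performance budget.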
One more step

• Extract date of call and customer phone
  number

04|001|268061100021547|3519XXXXXXXX|3519800049344611||||||
081105|002559|||00062|00|000-076|015-113||||MALM1
|0|01|9XXXXXXXX|11|||2|1|MICOUT|0|0||||||||||||||||||331985|2680610113
05482|BAL10A|15|22|12402523|||||||||||||||||||||||||||||||02|||||100001011305
482||3e32120034df00|||0|1|17|||1|||3||1|01|3519XXXXXXXX||||1|01|351
9XXXXXXXX||25||||||0|01|9XXXXXXXX|002559|081105|00062||2||5|||||||
|||||3|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||



Censored numbers to protect the innocent
Split lines by columns

String fields[] = line.split("\\|");

Sample timings:
real    1m0.670s
user    1m1.252s
sys     0m6.015s

ETA: 3 hours, 18 minutes



~ 4.4 x slower (60.7 s vs 13.9 s)!!! Exceeded the performance budget
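Two details of String.split are easy to miss: the argument is a regular expression, so a literal pipe has to be escaped or quoted, and trailing empty fields are dropped unless a negative limit is passed. The sketch below (hypothetical class name) shows a precompiled pattern that keeps every column. Older JVMs compiled the pattern on each split call; newer JDKs are understood to have a fast path for simple separators, so measure before assuming where the time goes.

```java
import java.util.regex.Pattern;

public class RegexSplitDemo {
    // Pattern.quote turns "|" into a literal; compiled once and reused for every line.
    private static final Pattern PIPE = Pattern.compile(Pattern.quote("|"));

    /** Splits on '|' and keeps trailing empty columns (limit -1). */
    public static String[] splitAll(String line) {
        return PIPE.split(line, -1);
    }

    public static void main(String[] args) {
        String line = "04|001|26806|3519||";
        System.out.println(splitAll(line).length);      // all 6 columns, empty ones included
        System.out.println(line.split("\\|").length);   // 4: trailing empty columns are dropped
    }
}
```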
When in doubt, profile




      ~85% spent splitting fields!
Tune

String fields[] = split(line, '|', 3,10,11);
Sample timings:
real    0m13.450s
user    0m13.425s
sys     0m3.965s



ETA: 44 minutes and 35 seconds


14 extra lines of Java code and we’re back on track
Must get SIM card data

•   SIM card Type (prepaid, postpaid, ...)
•   ~ 15 million record table
•   Database constantly under load
•   4000 queries/s (0.25 ms/q) spare capacity
Database Tips (JDBC)

 – Reuse connections!
 – Read only ? setReadOnly(true)
 – Always use PreparedStatements
 – Always explicitly close ResultSet (GC
   friendly)
 – Turn off autocommit
 – Use batched operations and transactions in
   CRUD type accesses
 – Large ResultSets? Increase fetch size!
   rs.setFetchSize(XXX)
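The tips above can be combined in a few lines of JDBC. This is only a sketch: the table and column names (sim_cards, msisdn, card_type) and the fetch size of 1000 are invented for illustration, and the caller is assumed to own and reuse the Connection.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

public class SimCardQuery {
    // Hypothetical schema, for illustration only.
    private static final String SQL =
            "SELECT msisdn, card_type FROM sim_cards WHERE card_type = ?";

    /** Reads all rows of one card type over a connection that the caller keeps and reuses. */
    public static List<String> loadByType(Connection con, int cardType) throws SQLException {
        con.setReadOnly(true);      // read-only hint for the driver
        con.setAutoCommit(false);   // no implicit commit per statement
        List<String> rows = new ArrayList<>();
        // try-with-resources closes the statement and the result set explicitly
        try (PreparedStatement ps = con.prepareStatement(SQL)) {
            ps.setInt(1, cardType);
            ps.setFetchSize(1000);  // larger fetch size for a large result set (value is a guess)
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    rows.add(rs.getString(1) + ":" + rs.getInt(2));
                }
            }
            con.commit();
        }
        return rows;
    }
}
```

The connection is deliberately not closed here, so that it can be reused for the next query.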
Ooops

• Too slow!
• Assuming an average rate of 4000 q/s:

ETA: ~ 1 day, 56 minutes
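The estimate follows from one database lookup per CDR at the spare rate quoted earlier. A tiny sketch of the arithmetic, with a hypothetical class name:

```java
import java.time.Duration;

public class QueryEta {
    /** Time needed for the given number of lookups at a steady query rate. */
    public static Duration eta(long lookups, long queriesPerSecond) {
        return Duration.ofSeconds(lookups / queriesPerSecond);
    }

    public static void main(String[] args) {
        // 359 014 695 CDRs, one lookup each, at 4000 queries per second
        Duration d = eta(359_014_695L, 4_000L);
        System.out.println(d.toHours() + " h " + (d.toMinutes() % 60) + " min");
    }
}
```

That is roughly 89 750 seconds, about 25 times the one-hour budget, which is why the database round trip had to go.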
Alternatives



• In-memory databases: TimesTen, SolidDb
• Embedded relational: H2, Hsqldb, Derby
• Other embedded: BerkeleyDB, Infinitydb
Must keep a balance


• On one side: performance
• On the other: cost, complexity, learning curve (aka neuron time), maintenance
Remembering old times

• In C/C++ you could map structs to
  memory
• The amount of information needed is 16
  bytes per SIM card (phone number, start
  date, end date, type of card – 4 * 4 bytes)
• ~ 343 M if stored in a compact form (int[])
• Sort the data and wrap the array in a List
• Use Collections.binarySearch to do the
  heavy lifting
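A self-contained sketch of this idea, assuming each record is four ints (phone number, start, end, type) sorted by phone number and start date. The class name, the comparator and the -1 "not found" value are illustrative choices, not the author's code:

```java
import java.util.AbstractList;
import java.util.Collections;
import java.util.Comparator;
import java.util.RandomAccess;

public class SimTable extends AbstractList<int[]> implements RandomAccess {
    private static final int FIELDS = 4; // msisdn, start, end, type
    private final int[] data;            // records stored back to back, already sorted

    public SimTable(int[] sortedData) {
        this.data = sortedData;
    }

    @Override
    public int[] get(int index) {
        int o = index * FIELDS;
        return new int[] { data[o], data[o + 1], data[o + 2], data[o + 3] };
    }

    @Override
    public int size() {
        return data.length / FIELDS;
    }

    // Orders records by phone number, then by validity period; overlapping periods compare as equal.
    private static final Comparator<int[]> BY_NUMBER_AND_PERIOD = (a, b) -> {
        if (a[0] != b[0]) return Integer.compare(a[0], b[0]);
        if (a[2] < b[1]) return -1; // a ends before b starts
        if (a[1] > b[2]) return 1;  // a starts after b ends
        return 0;
    };

    /** Returns the card type valid for the number at the given time, or -1 if none. */
    public int typeAt(int msisdn, int epochSeconds) {
        int[] key = { msisdn, epochSeconds, epochSeconds, 0 }; // a period of zero length
        int idx = Collections.binarySearch(this, key, BY_NUMBER_AND_PERIOD);
        return idx < 0 ? -1 : data[idx * FIELDS + 3];
    }
}
```

Implementing RandomAccess matters: it lets Collections.binarySearch index straight into the list instead of walking it with an iterator.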
Way faster!

• No extra libraries, 40 lines of simple java
  code

ETA: 1 hour, 30 minutes and 35 seconds

Above the budget
Put those extra cores to work

• 6 Quad Core 2.4ghz - 6 MB L2
  Cache,Sparc VII CPUs, 48 hw threads,
  32Gb RAM
• Split the data in work units
• Split the work units among the threads
• Collect the results when the threads finish
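The three steps above map naturally onto an ExecutorService. In this sketch a work unit is simply a list of lines and the "work" is counting lines that contain a separator; both are stand-ins, since the slides do not show how the real work units were defined:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelCount {
    /** Counts lines containing '|' across all work units, one task per unit. */
    public static long countMatching(List<List<String>> workUnits, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (List<String> unit : workUnits) {
                // Each task only reads its own unit and keeps its counter local: no shared R/W data.
                Callable<Long> task = () -> {
                    long n = 0;
                    for (String line : unit) {
                        if (line.indexOf('|') >= 0) {
                            n++;
                        }
                    }
                    return n;
                };
                results.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> f : results) {
                total += f.get(); // collect the results as the tasks finish
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```

Because the partial counts are merged only after each task completes, no locking is needed.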
Concurrent tips

• Concurrent programming is really hard!
• But you’re not going to be able to avoid it
  (cpu speed increases per core stalled,
  cores are increasing in number)
• Don’t share R/W data among threads
• Locking will kill performance
• Be aware of memory architecture
java.sun.com/javase/6/docs/technotes/guides/concurrency/index.html
Mission Accomplished

• With 8 threads of the 48 possible

Real running time: 10 minutes,
 23 seconds
Near linear scaling!
There’s no point in optimizing more. We’ve
 just entered the Law of Diminishing returns
 en.wikipedia.org/wiki/Diminishing_returns
What about Network I/O?
• 1 thread per client using blocking I/O does
  not scale
• Use Nonblocking I/O
• VM implementors will (probably) use the
  best API in the host OS (epoll in
  Linux Kernel 2.6 for example)
• NBIO is hard. Don’t reinvent the wheel.
  See Apache Mina - mina.apache.org
• Scales to over 10.000k connections easily!
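To show what a framework such as Mina wraps, here is a bare java.nio sketch of a single-threaded echo server driven by a Selector. It is deliberately simplified (the class name is hypothetical, there is one shared buffer, and writes spin if the socket buffer is full) and is not meant as production code:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedSelectorException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

public class NioEchoServer implements Runnable, AutoCloseable {
    private final Selector selector;
    private final ServerSocketChannel server;

    public NioEchoServer(int port) throws IOException {
        selector = Selector.open();
        server = ServerSocketChannel.open();
        server.configureBlocking(false);
        server.bind(new InetSocketAddress("127.0.0.1", port)); // port 0 picks any free port
        server.register(selector, SelectionKey.OP_ACCEPT);
    }

    public int port() throws IOException {
        return ((InetSocketAddress) server.getLocalAddress()).getPort();
    }

    @Override
    public void run() {
        ByteBuffer buf = ByteBuffer.allocate(8192);
        try {
            while (selector.isOpen()) {
                selector.select(); // one thread waits on all connections at once
                if (!selector.isOpen()) break;
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    if (!key.isValid()) continue;
                    if (key.isAcceptable()) {
                        SocketChannel client = server.accept();
                        if (client != null) {
                            client.configureBlocking(false);
                            client.register(selector, SelectionKey.OP_READ);
                        }
                    } else if (key.isReadable()) {
                        echo((SocketChannel) key.channel(), buf);
                    }
                }
            }
        } catch (IOException | ClosedSelectorException e) {
            // selector closed or accept failed: leave the loop
        }
    }

    private static void echo(SocketChannel client, ByteBuffer buf) {
        try {
            buf.clear();
            if (client.read(buf) < 0) { // peer closed the connection
                client.close();
                return;
            }
            buf.flip();
            while (buf.hasRemaining()) {
                client.write(buf); // simplification: spins if the socket buffer is full
            }
        } catch (IOException e) {
            try { client.close(); } catch (IOException ignored) { }
        }
    }

    @Override
    public void close() throws IOException {
        selector.close();
        server.close();
    }
}
```

A real server would register OP_WRITE for partial writes and keep a buffer per connection, which is the kind of bookkeeping the slide suggests leaving to a library.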
A few extra tips
• Know your VM
• Not all VMs are created equal
• Even without changing a line of code you
  can improve things, if you know what
  you’re doing
• If you’re using the SUN VM try the Server
  VM (default is Client VM)
• Plenty of options to fiddle with:
  blogs.sun.com/watt/resource/jvm-options-list.html
What about designing and maintaining
complex systems?
• Implement a feature complete solution in
  small scale
• Learn the performance characteristics.
  Implement benchmarks.
• Change the architecture if needed
• How much does it cost? It’s all about
  €€€€€ (licensing, hardware, human
  resources, rack space, energy, cooling
  requirements, maintenance,...)
Keep measuring after the system
goes live
“The only man I know who behaves sensibly
  is my tailor; he takes my measurements
  anew each time he sees me. The rest go
  on with their old measurements and
  expect me to fit them.”
George Bernard Shaw -
en.wikiquote.org/wiki/George_Bernard_Shaw




• Especially if you keep adding features
Code snippets – A (way) faster split
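String fields[] = split(line, '|', 3,10,11);

// Extracts only the requested columns (0-based, in ascending order) without regex
// and without creating strings for the columns that are skipped.
public static String[] split(String l, char sep, int... columns) {
    String[] fields = new String[columns.length];
    if(columns.length == 0)
        return fields;                  // nothing requested
    int start = 0, column = 0, end, i = 0;
    while((end = l.indexOf(sep, start)) != -1) {
        if(column++ == columns[i]) {
            fields[i] = l.substring(start, end);
            if(++i == columns.length)
                return fields;          // all requested columns found: stop scanning
        }
        start = end + 1;
    }
    // the last column of a line has no trailing separator
    if(column == columns[i])
        fields[i] = l.substring(start);
    return fields;                      // columns missing from the line stay null
}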
Static in-memory “database”: Poor
man’s solution (but as fast as it gets)
public class ClientFile implements List<CardInfo>, RandomAccess {
    static final int CLIENT_SIZE = 16;
    int[] clients;

      public ClientFile() throws FileNotFoundException, IOException {
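          File f = new File("clientes.db");
          int client_count = (int)f.length() / CLIENT_SIZE;
          clients = new int[client_count * 4];
          byte b[] = new byte[(int) f.length()];
          // readFully: a single read() is not guaranteed to fill the buffer
          DataInputStream in = new DataInputStream(new FileInputStream(f));
          try {
              in.readFully(b);
          } finally {
              in.close();
          }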

          for(int i = 0;i   != client_count; ++i) {
              clients[i *   4] =     toi(b, i * CLIENT_SIZE);
              clients[i *   4 + 1] = toi(b, i * CLIENT_SIZE + 4);
              clients[i *   4 + 2] = toi(b, i * CLIENT_SIZE + 8);
              clients[i *   4 + 3] = toi(b, i * CLIENT_SIZE + 12);
          }
      }
      // map byte[] to integer
      public int toi(byte[] b, int offset) {
          return ((0xFF & b[offset])       << 24) +
                  ((0xFF & b[offset + 1]) << 16) +
                  ((0xFF & b[offset + 2]) << 8) +
                   (0xFF & b[offset + 3]);
      }
(…)
Static in-memory “database”:
 (continued)
(…)
      public CardInfo get(int index) {
          return new CardInfo(clients[index * 4],
                             clients[index * 4 + 1],
                             clients[index * 4 + 2],
                             clients[index * 4 + 3]);
      }

  public CardInfo getCardInfo(String msisdn, String yymmdd, String hhmmss){
      Calendar cal = Calendar.getInstance();
      cal.set(i(yymmdd, 0, 1) + 2000, i(yymmdd, 2, 3) - 1, i(yymmdd, 4, 5),
              i(hhmmss, 0, 1), i(hhmmss, 2, 3), i(hhmmss, 4, 5));


        int idx = Collections.binarySearch(this,
                       new Key(i(msisdn),
                               (int)(cal.getTimeInMillis() / 1000)));
        if (idx < 0) {
            return null;
        }
        return get(idx);
  }
Questions?


    1€       • Answers


    5€       • Answers that require thought


   20 €      • Correct Answers


For Free!    • Dumb looks

More Related Content

What's hot

Intro to Machine Learning for GPUs
Intro to Machine Learning for GPUsIntro to Machine Learning for GPUs
Intro to Machine Learning for GPUsSri Ambati
 
Extreme Linux Performance Monitoring and Tuning
Extreme Linux Performance Monitoring and TuningExtreme Linux Performance Monitoring and Tuning
Extreme Linux Performance Monitoring and TuningMilind Koyande
 
Tales from the Field
Tales from the FieldTales from the Field
Tales from the FieldMongoDB
 
Making a Process (Virtualizing Memory)
Making a Process (Virtualizing Memory)Making a Process (Virtualizing Memory)
Making a Process (Virtualizing Memory)David Evans
 
Tuning parallelcodeonsolaris005
Tuning parallelcodeonsolaris005Tuning parallelcodeonsolaris005
Tuning parallelcodeonsolaris005dflexer
 
PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database AnalyticsPL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database AnalyticsKohei KaiGai
 
AI橋渡しクラウド(ABCI)における高性能計算とAI/ビッグデータ処理の融合
AI橋渡しクラウド(ABCI)における高性能計算とAI/ビッグデータ処理の融合AI橋渡しクラウド(ABCI)における高性能計算とAI/ビッグデータ処理の融合
AI橋渡しクラウド(ABCI)における高性能計算とAI/ビッグデータ処理の融合Hitoshi Sato
 
YOW2020 Linux Systems Performance
YOW2020 Linux Systems PerformanceYOW2020 Linux Systems Performance
YOW2020 Linux Systems PerformanceBrendan Gregg
 
ABCI: AI Bridging Cloud Infrastructure for Scalable AI/Big Data
ABCI: AI Bridging Cloud Infrastructure for Scalable AI/Big DataABCI: AI Bridging Cloud Infrastructure for Scalable AI/Big Data
ABCI: AI Bridging Cloud Infrastructure for Scalable AI/Big DataHitoshi Sato
 
Linux Performance Tools
Linux Performance ToolsLinux Performance Tools
Linux Performance ToolsBrendan Gregg
 
GPU and Deep learning best practices
GPU and Deep learning best practicesGPU and Deep learning best practices
GPU and Deep learning best practicesLior Sidi
 
Molecular Shape Searching on GPUs: A Brave New World
Molecular Shape Searching on GPUs: A Brave New WorldMolecular Shape Searching on GPUs: A Brave New World
Molecular Shape Searching on GPUs: A Brave New WorldCan Ozdoruk
 
Introduction to parallel computing using CUDA
Introduction to parallel computing using CUDAIntroduction to parallel computing using CUDA
Introduction to parallel computing using CUDAMartin Peniak
 
On heap cache vs off-heap cache
On heap cache vs off-heap cacheOn heap cache vs off-heap cache
On heap cache vs off-heap cachergrebski
 
PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos
PT-4054, "OpenCL™ Accelerated Compute Libraries" by John MelonakosPT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos
PT-4054, "OpenCL™ Accelerated Compute Libraries" by John MelonakosAMD Developer Central
 
Applying of the NVIDIA CUDA to the video processing in the task of the roundw...
Applying of the NVIDIA CUDA to the video processing in the task of the roundw...Applying of the NVIDIA CUDA to the video processing in the task of the roundw...
Applying of the NVIDIA CUDA to the video processing in the task of the roundw...Ural-PDC
 
GPUs in Big Data - StampedeCon 2014
GPUs in Big Data - StampedeCon 2014GPUs in Big Data - StampedeCon 2014
GPUs in Big Data - StampedeCon 2014StampedeCon
 
High Availability With DRBD & Heartbeat
High Availability With DRBD & HeartbeatHigh Availability With DRBD & Heartbeat
High Availability With DRBD & HeartbeatChris Barber
 

What's hot (20)

Intro to Machine Learning for GPUs
Intro to Machine Learning for GPUsIntro to Machine Learning for GPUs
Intro to Machine Learning for GPUs
 
Extreme Linux Performance Monitoring and Tuning
Extreme Linux Performance Monitoring and TuningExtreme Linux Performance Monitoring and Tuning
Extreme Linux Performance Monitoring and Tuning
 
Tales from the Field
Tales from the FieldTales from the Field
Tales from the Field
 
Making a Process (Virtualizing Memory)
Making a Process (Virtualizing Memory)Making a Process (Virtualizing Memory)
Making a Process (Virtualizing Memory)
 
Tuning parallelcodeonsolaris005
Tuning parallelcodeonsolaris005Tuning parallelcodeonsolaris005
Tuning parallelcodeonsolaris005
 
jvm goes to big data
jvm goes to big datajvm goes to big data
jvm goes to big data
 
PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database AnalyticsPL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
 
AI橋渡しクラウド(ABCI)における高性能計算とAI/ビッグデータ処理の融合
AI橋渡しクラウド(ABCI)における高性能計算とAI/ビッグデータ処理の融合AI橋渡しクラウド(ABCI)における高性能計算とAI/ビッグデータ処理の融合
AI橋渡しクラウド(ABCI)における高性能計算とAI/ビッグデータ処理の融合
 
YOW2020 Linux Systems Performance
YOW2020 Linux Systems PerformanceYOW2020 Linux Systems Performance
YOW2020 Linux Systems Performance
 
ABCI: AI Bridging Cloud Infrastructure for Scalable AI/Big Data
ABCI: AI Bridging Cloud Infrastructure for Scalable AI/Big DataABCI: AI Bridging Cloud Infrastructure for Scalable AI/Big Data
ABCI: AI Bridging Cloud Infrastructure for Scalable AI/Big Data
 
Linux Performance Tools
Linux Performance ToolsLinux Performance Tools
Linux Performance Tools
 
GPU and Deep learning best practices
GPU and Deep learning best practicesGPU and Deep learning best practices
GPU and Deep learning best practices
 
Molecular Shape Searching on GPUs: A Brave New World
Molecular Shape Searching on GPUs: A Brave New WorldMolecular Shape Searching on GPUs: A Brave New World
Molecular Shape Searching on GPUs: A Brave New World
 
Exascale Capabl
Exascale CapablExascale Capabl
Exascale Capabl
 
Introduction to parallel computing using CUDA
Introduction to parallel computing using CUDAIntroduction to parallel computing using CUDA
Introduction to parallel computing using CUDA
 
On heap cache vs off-heap cache
On heap cache vs off-heap cacheOn heap cache vs off-heap cache
On heap cache vs off-heap cache
 
PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos
PT-4054, "OpenCL™ Accelerated Compute Libraries" by John MelonakosPT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos
PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos
 
Applying of the NVIDIA CUDA to the video processing in the task of the roundw...
Applying of the NVIDIA CUDA to the video processing in the task of the roundw...Applying of the NVIDIA CUDA to the video processing in the task of the roundw...
Applying of the NVIDIA CUDA to the video processing in the task of the roundw...
 
GPUs in Big Data - StampedeCon 2014
GPUs in Big Data - StampedeCon 2014GPUs in Big Data - StampedeCon 2014
GPUs in Big Data - StampedeCon 2014
 
High Availability With DRBD & Heartbeat
High Availability With DRBD & HeartbeatHigh Availability With DRBD & Heartbeat
High Availability With DRBD & Heartbeat
 

Similar to Alto Desempenho com Java

High Performance With Java
High Performance With JavaHigh Performance With Java
High Performance With Javamalduarte
 
Chronix Poster for the Poster Session FAST 2017
Chronix Poster for the Poster Session FAST 2017Chronix Poster for the Poster Session FAST 2017
Chronix Poster for the Poster Session FAST 2017Florian Lautenschlager
 
Challenges in Embedded Development
Challenges in Embedded DevelopmentChallenges in Embedded Development
Challenges in Embedded DevelopmentSQABD
 
Parallelism Processor Design
Parallelism Processor DesignParallelism Processor Design
Parallelism Processor DesignSri Prasanna
 
Project Tungsten: Bringing Spark Closer to Bare Metal
Project Tungsten: Bringing Spark Closer to Bare MetalProject Tungsten: Bringing Spark Closer to Bare Metal
Project Tungsten: Bringing Spark Closer to Bare MetalDatabricks
 
KSCOPE 2013: Exadata Consolidation Success Story
KSCOPE 2013: Exadata Consolidation Success StoryKSCOPE 2013: Exadata Consolidation Success Story
KSCOPE 2013: Exadata Consolidation Success StoryKristofferson A
 
Exploring the Performance Impact of Virtualization on an HPC Cloud
Exploring the Performance Impact of Virtualization on an HPC CloudExploring the Performance Impact of Virtualization on an HPC Cloud
Exploring the Performance Impact of Virtualization on an HPC CloudRyousei Takano
 
Programmable Exascale Supercomputer
Programmable Exascale SupercomputerProgrammable Exascale Supercomputer
Programmable Exascale SupercomputerSagar Dolas
 
Project Tungsten Phase II: Joining a Billion Rows per Second on a Laptop
Project Tungsten Phase II: Joining a Billion Rows per Second on a LaptopProject Tungsten Phase II: Joining a Billion Rows per Second on a Laptop
Project Tungsten Phase II: Joining a Billion Rows per Second on a LaptopDatabricks
 
Comparison between computers of past and present
Comparison between computers of past and presentComparison between computers of past and present
Comparison between computers of past and presentMuhammad Danish Badar
 
Deep Dive on Amazon EC2 instances
Deep Dive on Amazon EC2 instancesDeep Dive on Amazon EC2 instances
Deep Dive on Amazon EC2 instancesAmazon Web Services
 
Linux Systems Performance 2016
Linux Systems Performance 2016Linux Systems Performance 2016
Linux Systems Performance 2016Brendan Gregg
 
AIST Super Green Cloud: lessons learned from the operation and the performanc...
AIST Super Green Cloud: lessons learned from the operation and the performanc...AIST Super Green Cloud: lessons learned from the operation and the performanc...
AIST Super Green Cloud: lessons learned from the operation and the performanc...Ryousei Takano
 
Flink Forward Berlin 2018: George Theodorakis - "Hardware-efficient Stream Pr...
Flink Forward Berlin 2018: George Theodorakis - "Hardware-efficient Stream Pr...Flink Forward Berlin 2018: George Theodorakis - "Hardware-efficient Stream Pr...
Flink Forward Berlin 2018: George Theodorakis - "Hardware-efficient Stream Pr...Flink Forward
 

Similar to Alto Desempenho com Java (20)

High Performance With Java
High Performance With JavaHigh Performance With Java
High Performance With Java
 
Chronix Poster for the Poster Session FAST 2017
Chronix Poster for the Poster Session FAST 2017Chronix Poster for the Poster Session FAST 2017
Chronix Poster for the Poster Session FAST 2017
 
Challenges in Embedded Development
Challenges in Embedded DevelopmentChallenges in Embedded Development
Challenges in Embedded Development
 
Parallelism Processor Design
Parallelism Processor DesignParallelism Processor Design
Parallelism Processor Design
 
Project Tungsten: Bringing Spark Closer to Bare Metal
Project Tungsten: Bringing Spark Closer to Bare MetalProject Tungsten: Bringing Spark Closer to Bare Metal
Project Tungsten: Bringing Spark Closer to Bare Metal
 
KSCOPE 2013: Exadata Consolidation Success Story
KSCOPE 2013: Exadata Consolidation Success StoryKSCOPE 2013: Exadata Consolidation Success Story
KSCOPE 2013: Exadata Consolidation Success Story
 
Exploring the Performance Impact of Virtualization on an HPC Cloud
Exploring the Performance Impact of Virtualization on an HPC CloudExploring the Performance Impact of Virtualization on an HPC Cloud
Exploring the Performance Impact of Virtualization on an HPC Cloud
 
Programmable Exascale Supercomputer
Programmable Exascale SupercomputerProgrammable Exascale Supercomputer
Programmable Exascale Supercomputer
 
Project Tungsten Phase II: Joining a Billion Rows per Second on a Laptop
Project Tungsten Phase II: Joining a Billion Rows per Second on a LaptopProject Tungsten Phase II: Joining a Billion Rows per Second on a Laptop
Project Tungsten Phase II: Joining a Billion Rows per Second on a Laptop
 
Available HPC resources at CSUC
Available HPC resources at CSUCAvailable HPC resources at CSUC
Available HPC resources at CSUC
 
Comparison between computers of past and present
Comparison between computers of past and presentComparison between computers of past and present
Comparison between computers of past and present
 
Coa presentation3
Coa presentation3Coa presentation3
Coa presentation3
 
Available HPC resources at CSUC
Available HPC resources at CSUCAvailable HPC resources at CSUC
Available HPC resources at CSUC
 
Programar para GPUs
Programar para GPUsProgramar para GPUs
Programar para GPUs
 
Deep Dive on Amazon EC2 instances
Deep Dive on Amazon EC2 instancesDeep Dive on Amazon EC2 instances
Deep Dive on Amazon EC2 instances
 
Linux Systems Performance 2016
Linux Systems Performance 2016Linux Systems Performance 2016
Linux Systems Performance 2016
 
AIST Super Green Cloud: lessons learned from the operation and the performanc...
AIST Super Green Cloud: lessons learned from the operation and the performanc...AIST Super Green Cloud: lessons learned from the operation and the performanc...
AIST Super Green Cloud: lessons learned from the operation and the performanc...
 
Java Memory Model
Java Memory ModelJava Memory Model
Java Memory Model
 
Flink Forward Berlin 2018: George Theodorakis - "Hardware-efficient Stream Pr...
Flink Forward Berlin 2018: George Theodorakis - "Hardware-efficient Stream Pr...Flink Forward Berlin 2018: George Theodorakis - "Hardware-efficient Stream Pr...
Flink Forward Berlin 2018: George Theodorakis - "Hardware-efficient Stream Pr...
 
NWU and HPC
NWU and HPCNWU and HPC
NWU and HPC
 

More from codebits

Gis SAPO Hands On
Gis SAPO Hands OnGis SAPO Hands On
Gis SAPO Hands Oncodebits
 
Aplicações Web TV no Meo
Aplicações Web TV no MeoAplicações Web TV no Meo
Aplicações Web TV no Meocodebits
 
Forms Usability 101
Forms Usability 101Forms Usability 101
Forms Usability 101codebits
 
Speak up: como criar Speech-based apps
Speak up: como criar Speech-based appsSpeak up: como criar Speech-based apps
Speak up: como criar Speech-based appscodebits
 
XMPP Hands-On
XMPP Hands-OnXMPP Hands-On
XMPP Hands-Oncodebits
 
Mitos da Acessibilidade Web
Mitos da Acessibilidade WebMitos da Acessibilidade Web
Mitos da Acessibilidade Webcodebits
 
Getting started with mobile devices development - Openmoko Freerunner
Getting started with mobile devices development - Openmoko FreerunnerGetting started with mobile devices development - Openmoko Freerunner
Getting started with mobile devices development - Openmoko Freerunnercodebits
 
Hardware Hacking area: Make Cool Things with Microcontrollers (and learn to s...
Hardware Hacking area: Make Cool Things with Microcontrollers (and learn to s...Hardware Hacking area: Make Cool Things with Microcontrollers (and learn to s...
Hardware Hacking area: Make Cool Things with Microcontrollers (and learn to s...codebits
 
Getting started with mobile devices development - Openmoko Freerunner
Getting started with mobile devices development - Openmoko FreerunnerGetting started with mobile devices development - Openmoko Freerunner
Getting started with mobile devices development - Openmoko Freerunnercodebits
 
Exploring XMPP
Exploring XMPPExploring XMPP
Exploring XMPPcodebits
 
Sapo BUS Hands-On
Sapo BUS Hands-OnSapo BUS Hands-On
Sapo BUS Hands-Oncodebits
 
Qtractor - An Audio/MIDI multi-track sequencer
Qtractor - An Audio/MIDI multi-track sequencerQtractor - An Audio/MIDI multi-track sequencer
Qtractor - An Audio/MIDI multi-track sequencercodebits
 
Making the Chumby
Making the ChumbyMaking the Chumby
Making the Chumbycodebits
 
Globs - Gestão de Glossários
Globs - Gestão de GlossáriosGlobs - Gestão de Glossários
Globs - Gestão de Glossárioscodebits
 
ATrad - Sistema de Garantia de Qualidade de Traduções
ATrad - Sistema de Garantia de Qualidade de TraduçõesATrad - Sistema de Garantia de Qualidade de Traduções
ATrad - Sistema de Garantia de Qualidade de Traduçõescodebits
 
Sapo GIS Hands-On
Sapo GIS Hands-OnSapo GIS Hands-On
Sapo GIS Hands-Oncodebits
 
Practical Thin Server Architecture With Dojo Sapo Codebits 2008
Practical Thin Server Architecture With Dojo Sapo Codebits 2008Practical Thin Server Architecture With Dojo Sapo Codebits 2008
Practical Thin Server Architecture With Dojo Sapo Codebits 2008codebits
 
Optimização de pesquisas Web utilizando Colónias de Formigas
Optimização de pesquisas Web utilizando Colónias de FormigasOptimização de pesquisas Web utilizando Colónias de Formigas
Optimização de pesquisas Web utilizando Colónias de Formigascodebits
 

More from codebits (20)

Gis SAPO Hands On
Gis SAPO Hands OnGis SAPO Hands On
Gis SAPO Hands On
 
Aplicações Web TV no Meo
Aplicações Web TV no MeoAplicações Web TV no Meo
Aplicações Web TV no Meo
 
Forms Usability 101
Forms Usability 101Forms Usability 101
Forms Usability 101
 
Speak up: como criar Speech-based apps
Speak up: como criar Speech-based appsSpeak up: como criar Speech-based apps
Speak up: como criar Speech-based apps
 
XMPP Hands-On
XMPP Hands-OnXMPP Hands-On
XMPP Hands-On
 
Mitos da Acessibilidade Web
Mitos da Acessibilidade WebMitos da Acessibilidade Web
Mitos da Acessibilidade Web
 
Getting started with mobile devices development - Openmoko Freerunner
Getting started with mobile devices development - Openmoko FreerunnerGetting started with mobile devices development - Openmoko Freerunner
Getting started with mobile devices development - Openmoko Freerunner
 
Hardware Hacking area: Make Cool Things with Microcontrollers (and learn to s...
Hardware Hacking area: Make Cool Things with Microcontrollers (and learn to s...Hardware Hacking area: Make Cool Things with Microcontrollers (and learn to s...
Hardware Hacking area: Make Cool Things with Microcontrollers (and learn to s...
 
CouchDB
CouchDBCouchDB
CouchDB
 
Getting started with mobile devices development - Openmoko Freerunner
Getting started with mobile devices development - Openmoko FreerunnerGetting started with mobile devices development - Openmoko Freerunner
Getting started with mobile devices development - Openmoko Freerunner
 
Exploring XMPP
Exploring XMPPExploring XMPP
Exploring XMPP
 
Sapo BUS Hands-On
Sapo BUS Hands-OnSapo BUS Hands-On
Sapo BUS Hands-On
 
Qtractor - An Audio/MIDI multi-track sequencer
Qtractor - An Audio/MIDI multi-track sequencerQtractor - An Audio/MIDI multi-track sequencer
Qtractor - An Audio/MIDI multi-track sequencer
 
Making the Chumby
Making the ChumbyMaking the Chumby
Making the Chumby
 
Globs - Gestão de Glossários
Globs - Gestão de GlossáriosGlobs - Gestão de Glossários
Globs - Gestão de Glossários
 
ATrad - Sistema de Garantia de Qualidade de Traduções
ATrad - Sistema de Garantia de Qualidade de TraduçõesATrad - Sistema de Garantia de Qualidade de Traduções
ATrad - Sistema de Garantia de Qualidade de Traduções
 
Sapo GIS Hands-On
Sapo GIS Hands-OnSapo GIS Hands-On
Sapo GIS Hands-On
 
Gis@sapo
Gis@sapoGis@sapo
Gis@sapo
 
Practical Thin Server Architecture With Dojo Sapo Codebits 2008
Practical Thin Server Architecture With Dojo Sapo Codebits 2008Practical Thin Server Architecture With Dojo Sapo Codebits 2008
Practical Thin Server Architecture With Dojo Sapo Codebits 2008
 
Optimização de pesquisas Web utilizando Colónias de Formigas
Optimização de pesquisas Web utilizando Colónias de FormigasOptimização de pesquisas Web utilizando Colónias de Formigas
Optimização de pesquisas Web utilizando Colónias de Formigas
 

Alto Desempenho com Java

  • 1. High Performance with Java malduarte@gmail.com
  • 2. Foreword In the beginning was the Tao. The Tao gave birth to Space and Time. Therefore Space and Time are Yin and Yang of programming. Programmers that do not comprehend the Tao are always running out of time and space for their programs. Programmers that comprehend the Tao always have enough time and space to accomplish their goals. How could it be otherwise? From www.canonical.org/~kragen/tao-of-programming.htm
  • 3. What is High Performance? 2 X Sun SPARC Enterprise M5000 •HitachiH8 8 bit cpu, 16 MHz 6 Quad Core 2.4ghz - 6 MB L2 •32 kb Ram Cache,Sparc VII CPUs, 48 hw threads, 32Gb RAM Sources: Sun Microsystems: www.sun.com/servers/midrange/m5000/ WikiPedia: en.wikipedia.org/wiki/Lego_Mindstorms Aad van der Steen HPC Page - www.phys.uu.nl/~steen/web08/sparc.html
  • 4. High Performance is all about “Delivering solutions which meet requirements within time and space constraints using available resources rationally” The most important resource: brain time. HW increases performance with time, brain decreases performance with time.
  • 5. Why Java? • Mature technology • Speedy and Stable VMs (those who were burned in the early days still loath it, though) • Lots of high quality tools • Lots of high quality available libraries • Large ecosystem • NOT the language itself 
  • 7. A small case study • Goal: Analyse 17 G (gzip’ed) worth of MSC Call Detail Records (CDRs in Mobile Operator Lingo) Snippet: 04|001|26806XXXXXXXXXX|3519XXXXXXXX|3519800049344611|||||| 081105|002559|||00062|00|000-076|015-113||||MALM1 |0|01|9XXXXXXXX|11|||2|1|MICOUT|0|0||||||||||||||||||331985|268061011305482|B AL10A|15|22|12402523|||||||||||||||||||||||||||||||02|||||100001011305482||3e3212003 4df00|||0|1|17|||1|||3||1|01|3519XXXXXXXX||||1|01|3519XXXXXXXX||25||||||0|01| 9XXXXXXXX|002559|081105|00062||2||5||||||||||||3||||||||||||||||||||||||||||||||||||||||||||||||||| |||||||||||||||| Note: Sensitive information was hidden
  • 8. A bit more info • Approximately 170 G uncompressed • Exactly 359 014 695 CDRs • Trivia: about 3 days worth of GSM call logs. • Correlate CDRs with customer information • Performance goal: running time must be below one hour.
  • 9. Performance Budget • Network bandwidth and latency • Disk bandwidth and latency • Memory • CPU
  • 10. If you don’t take a temperature you can’t find a fever (*) • Measure the progress as the system is implemented • Make *honest* measurements. Prove yourself wrong. • Avoid premature optimization – How can you know? If you’re within your performance budget don’t worry (*) Fat Man’s Law X – “The House of God”, Samuel Shem - http://en.wikipedia.org/wiki/The_House_of_God
  • 11. "The journey of a thousand miles starts with a single step." Lao Tse • Line read performance, 1811229-line sample. Sample timings: real 0m13.872s user 0m13.366s sys 0m4.056s ETA: ~45 minutes
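The ETA on this slide can be reproduced by extrapolating the sample's throughput to the full data set. The following is a minimal sketch, assuming throughput stays constant over the whole run; the class and method names are hypothetical and not taken from the slides.

```java
public class EtaEstimate {
    /**
     * Extrapolates the total running time in seconds from a timed sample,
     * assuming the throughput measured on the sample holds for the whole input.
     */
    public static double estimateSeconds(long sampleLines, double sampleSeconds, long totalLines) {
        double linesPerSecond = sampleLines / sampleSeconds;
        return totalLines / linesPerSecond;
    }

    public static void main(String[] args) {
        // Figures from the slides: 1 811 229 lines read in 13.872 s (real time),
        // 359 014 695 CDRs in total.
        double eta = estimateSeconds(1_811_229L, 13.872, 359_014_695L);
        System.out.println("ETA in minutes: " + (eta / 60.0));
    }
}
```

With the numbers from the slides this lands a little under 46 minutes, in line with the "~45 minutes" quoted above.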
  • 12. I/O Tips • Use Memory Mapped Files (see the FileChannel.map and MappedByteBuffer APIs) • Use Buffered I/O - BufferedInputStream • Optimal buffer size is a multiple of the OS page size (usually 8k) • If the process is I/O bound and you have fast CPUs, consider processing compressed files
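As an illustration of the buffered I/O and compressed-file tips, here is a minimal sketch that counts the lines of a gzip'ed file without decompressing it to disk first. The class name, the 64 KiB buffer size and the ISO-8859-1 charset are assumptions made for this example; the slides do not show the author's actual reading code.

```java
import java.io.BufferedInputStream;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;

public class GzipLineCounter {
    // Assumed buffer size: 64 KiB, a multiple of the common 4k and 8k page sizes.
    static final int BUFFER_SIZE = 64 * 1024;

    /** Counts the lines of a gzip-compressed text file, decompressing on the fly. */
    public static long countLines(Path gzFile) throws IOException {
        try (InputStream raw = Files.newInputStream(gzFile);
             InputStream buffered = new BufferedInputStream(raw, BUFFER_SIZE);
             InputStream unzipped = new GZIPInputStream(buffered, BUFFER_SIZE);
             BufferedReader reader = new BufferedReader(
                     new InputStreamReader(unzipped, StandardCharsets.ISO_8859_1), BUFFER_SIZE)) {
            long lines = 0;
            while (reader.readLine() != null) {
                lines++;
            }
            return lines;
        }
    }
}
```

Reading the compressed file trades CPU time for disk bandwidth, which only pays off when the process is I/O bound, as the slide notes.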
  • 13. One more step • Extract date of call and customer phone number 04|001|268061100021547|3519XXXXXXXX|3519800049344611|||||| 081105|002559|||00062|00|000-076|015-113||||MALM1 |0|01|9XXXXXXXX|11|||2|1|MICOUT|0|0||||||||||||||||||331985|2680610113 05482|BAL10A|15|22|12402523|||||||||||||||||||||||||||||||02|||||100001011305 482||3e32120034df00|||0|1|17|||1|||3||1|01|3519XXXXXXXX||||1|01|351 9XXXXXXXX||25||||||0|01|9XXXXXXXX|002559|081105|00062||2||5||||||| |||||3||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Censored numbers to protect the innocent 
  • 14. Split lines by columns String fields[] = line.split("\\|"); Sample timings: real 1m0.670s user 1m1.252s sys 0m6.015s ETA: 3 hours, 18 minutes ~ 6 x slower!!! Exceeded the performance budget
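One detail worth keeping in mind with String.split: its argument is a regular expression, so a bare | is read as an alternation of two empty patterns rather than as a literal pipe, and trailing empty fields are dropped unless a negative limit is passed. The sketch below shows a literal-pipe split; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class SplitPitfall {
    /** Splits on a literal '|' and keeps trailing empty fields (limit -1). */
    public static String[] splitOnPipe(String line) {
        return line.split(Pattern.quote("|"), -1);
    }

    public static void main(String[] args) {
        String line = "04|001||";
        // Escaped pipe, default limit: trailing empty fields are discarded.
        System.out.println(line.split("\\|").length);
        // Literal pipe with limit -1: every field is kept, including empty ones.
        System.out.println(splitOnPipe(line).length);
        // Unescaped pipe: the regex matches the empty string, so the line is
        // broken up between characters instead of at the separators.
        System.out.println(line.split("|").length);
    }
}
```

Either way the regex engine does work for every field, which is consistent with the profile on the next slide showing most of the time spent splitting.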
  • 15. When in doubt, profile ~85% spent splitting fields!
  • 16. Tune String fields[] = split(line, '|', 3, 10, 11); Sample timings: real 0m13.450s user 0m13.425s sys 0m3.965s ETA: 44 minutes and 35 seconds. 14 extra lines of Java code and we’re back on track
  • 17. Must get SIM card data • SIM card type (prepaid, postpaid, ...) • ~ 15 million record table • Database constantly under load • 4000 queries/s (0.25 ms/q) spare capacity
  • 18. Database Tips (JDBC) – Reuse connections! – Read only? setReadOnly(true) – Always use PreparedStatements – Always explicitly close ResultSets (GC friendly) – Turn off autocommit – Use batched operations and transactions in CRUD type accesses – Large ResultSets? Increase fetch size! rs.setFetchSize(XXX)
  • 19. Ooops • Too slow! • Assuming an average rate of 4000 q/s: ETA: ~ 1 day, 56 minutes
  • 20. Alternatives • In-memory databases: TimesTen, solidDB • Embedded relational: H2, HSQLDB, Derby • Other embedded: Berkeley DB, InfinityDB
  • 21. Must keep a balance: Performance versus Cost, Complexity, Learning Curve (aka neuron time), Maintenance
  • 22. Remembering old times • In C/C++ you could map structs to memory • The amount of information needed is 16 bytes per SIM card (phone number, start date, end date, type of card – 4 * 4 bytes) • ~ 343 M if stored in a compact form (int[]) • Sort the data and wrap the array in a List • Use Collections.binarySearch to do the heavy lifting
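The idea of packing fixed-size records into one int[] and searching it can be sketched as follows. This is a simplified illustration, not the author's code: it runs a hand-written binary search directly over the strided array instead of wrapping it in a List for Collections.binarySearch, and it keys on the phone number only. The class name and field layout are assumptions.

```java
public class CompactCardTable {
    // Assumed layout per record: phone number, start date, end date, card type.
    public static final int FIELDS = 4;

    private final int[] data; // records stored back to back, sorted by phone number

    public CompactCardTable(int[] sortedData) {
        if (sortedData.length % FIELDS != 0) {
            throw new IllegalArgumentException("length must be a multiple of " + FIELDS);
        }
        this.data = sortedData;
    }

    public int size() {
        return data.length / FIELDS;
    }

    /** Returns the index of the record with the given phone number, or -1 if absent. */
    public int find(int phoneNumber) {
        int lo = 0;
        int hi = size() - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int candidate = data[mid * FIELDS]; // field 0 holds the key
            if (candidate < phoneNumber) {
                lo = mid + 1;
            } else if (candidate > phoneNumber) {
                hi = mid - 1;
            } else {
                return mid;
            }
        }
        return -1;
    }

    /** Returns the card type (field 3) of the record at the given index. */
    public int cardType(int recordIndex) {
        return data[recordIndex * FIELDS + 3];
    }
}
```

Storing everything in a single primitive array avoids per-record object headers and pointer chasing, which is what keeps the memory footprint close to 16 bytes per record.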
  • 23. Way faster! • No extra libraries, 40 lines of simple Java code ETA: 1 hour, 30 minutes and 35 seconds. Above the budget
  • 24. Put those extra cores to work • 6 Quad Core 2.4 GHz - 6 MB L2 Cache, SPARC VII CPUs, 48 hw threads, 32 GB RAM • Split the data into work units • Split the work units among the threads • Collect the results when the threads finish
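The split, distribute and collect steps can be sketched with a fixed thread pool from java.util.concurrent. This is a minimal illustration under assumptions: work units are modelled as in-memory lists of lines and the "work" is a simple prefix count, whereas the slides do not show how the real work units (presumably files or file ranges) were defined. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelWorkUnits {
    /** Counts lines starting with the given prefix, one task per work unit. */
    public static long countMatching(List<List<String>> workUnits, String prefix, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (List<String> unit : workUnits) {
                Callable<Long> task = () -> {
                    long count = 0; // local to the task: no shared mutable state, no locks
                    for (String line : unit) {
                        if (line.startsWith(prefix)) {
                            count++;
                        }
                    }
                    return count;
                };
                results.add(pool.submit(task));
            }
            // Collect the partial results once each task has finished.
            long total = 0;
            for (Future<Long> result : results) {
                total += result.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```

Each task only reads its own unit and returns a private counter, so the threads never contend on shared read/write data.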
  • 25. Concurrent tips • Concurrent programming is really hard! • But you’re not going to be able to avoid it (per-core CPU speed increases have stalled, while the number of cores keeps increasing) • Don’t share R/W data among threads • Locking will kill performance • Be aware of memory architecture java.sun.com/javase/6/docs/technotes/guides/concurrency/index.html
  • 26. Mission Accomplished • With 8 threads of the 48 possible, real running time: 10 minutes, 23 seconds. Near linear scaling! There’s no point in optimizing more. We’ve just entered the Law of Diminishing Returns en.wikipedia.org/wiki/Diminishing_returns
  • 27. What about Network I/O • 1 thread per client using blocking I/O does not scale • Use Nonblocking I/O • VM implementors will (probably) use the best API in the host OS (epoll in Linux kernel 2.6, for example) • NBIO is hard. Don’t reinvent the wheel. See Apache Mina - mina.apache.org • Scales to over 10.000k connections easily!
  • 28. A few extra tips • Know your VM • Not all VMs are created equal • Even without changing a line of code you can improve things, if you know what you’re doing • If you’re using the Sun VM try the Server VM (default is Client VM) • Plenty of options to fiddle with blogs.sun.com/watt/resource/jvm-options-list.html
  • 29. What about designing and maintaining complex systems • Implement a feature complete solution in small scale • Learn the performance characteristics. Implement benchmarks. • Change the architecture if needed • How much does it cost? It’s all about €€€€€ (licensing, hardware, human resources, rack space, energy, cooling requirements, maintenance,...)
  • 30. Keep measuring after the system goes live “The only man I know who behaves sensibly is my tailor; he takes my measurements anew each time he sees me. The rest go on with their old measurements and expect me to fit them.” George Bernard Shaw - en.wikiquote.org/wiki/George_Bernard_Shaw • Especially if you keep adding features
  • 31. Code snippets – A (way) faster split

        String fields[] = split(line, '|', 3, 10, 11);

        // Extracts only the requested columns (zero-based, in ascending order).
        public static String[] split(String l, char sep, int... columns) {
            String[] fields = new String[columns.length];
            int start = 0, column = 0, end, i = 0;
            while ((end = l.indexOf(sep, start)) != -1) {
                if (column++ == columns[i]) {
                    fields[i] = l.substring(start, end);
                    if (++i == columns.length) return fields; // all requested columns found
                }
                start = end + 1;
            }
            // The last column of the line has no trailing separator.
            if (column == columns[i]) fields[i] = l.substring(start);
            return fields;
        }
  • 32. Static in-memory “database”: Poor man’s solution (but as fast as it gets)

        public class ClientFile implements List<CardInfo>, RandomAccess {
            static final int CLIENT_SIZE = 16; // bytes per record: 4 ints
            int[] clients;

            public ClientFile() throws FileNotFoundException, IOException {
                File f = new File("clientes.db");
                int client_count = (int) f.length() / CLIENT_SIZE;
                clients = new int[client_count * 4];
                byte b[] = new byte[(int) f.length()];
                // readFully, unlike a single read(), guarantees the whole file is
                // loaded; try-with-resources closes the stream afterwards.
                try (DataInputStream in = new DataInputStream(new FileInputStream(f))) {
                    in.readFully(b);
                }
                for (int i = 0; i != client_count; ++i) {
                    clients[i * 4] = toi(b, i * CLIENT_SIZE);
                    clients[i * 4 + 1] = toi(b, i * CLIENT_SIZE + 4);
                    clients[i * 4 + 2] = toi(b, i * CLIENT_SIZE + 8);
                    clients[i * 4 + 3] = toi(b, i * CLIENT_SIZE + 12);
                }
            }

            // map 4 bytes (big-endian) to an integer
            public int toi(byte[] b, int offset) {
                return ((0xFF & b[offset]) << 24)
                     + ((0xFF & b[offset + 1]) << 16)
                     + ((0xFF & b[offset + 2]) << 8)
                     + (0xFF & b[offset + 3]);
            }
            (…)
  • 33. Static in-memory “database”: (continued)

            (…)
            public CardInfo get(int index) {
                return new CardInfo(clients[index * 4], clients[index * 4 + 1],
                                    clients[index * 4 + 2], clients[index * 4 + 3]);
            }

            public CardInfo getCardInfo(String msisdn, String yymmdd, String hhmmss) {
                Calendar cal = Calendar.getInstance();
                cal.set(i(yymmdd, 0, 1) + 2000, i(yymmdd, 2, 3) - 1, i(yymmdd, 4, 5),
                        i(hhmmss, 0, 1), i(hhmmss, 2, 3), i(hhmmss, 4, 5));
                int idx = Collections.binarySearch(this,
                        new Key(i(msisdn), (int) (cal.getTimeInMillis() / 1000)));
                if (idx < 0) {
                    return null;
                }
                return get(idx);
            }
  • 34. Questions? • Answers: 1 € • Answers that require thought: 5 € • Correct answers: 20 € • Dumb looks: for free!