High Performance with Java


   malduarte@gmail.com

  1. High Performance with Java malduarte@gmail.com
  2. Foreword In the beginning was the Tao. The Tao gave birth to Space and Time. Therefore Space and Time are Yin and Yang of programming. Programmers that do not comprehend the Tao are always running out of time and space for their programs. Programmers that comprehend the Tao always have enough time and space to accomplish their goals. How could it be otherwise? From www.canonical.org/~kragen/tao-of-programming.htm
  3. What is High Performance?
     • Lego Mindstorms brick: Hitachi H8 8-bit CPU, 16 MHz, 32 KB RAM
     • 2 x Sun SPARC Enterprise M5000: 6 quad-core 2.4 GHz SPARC VII CPUs, 6 MB L2 cache, 48 hardware threads, 32 GB RAM
     Sources: Sun Microsystems: www.sun.com/servers/midrange/m5000/, Wikipedia: en.wikipedia.org/wiki/Lego_Mindstorms, Aad van der Steen HPC Page: www.phys.uu.nl/~steen/web08/sparc.html
  4. High Performance is all about “Delivering solutions which meet requirements within time and space constraints using available resources rationally”
     The most important resource: brain time. Hardware performance increases over time; brain performance decreases over time.
  5. Why Java?
     • Mature technology
     • Speedy and stable VMs (those who were burned in the early days still loathe it, though)
     • Lots of high-quality tools
     • Lots of high-quality libraries available
     • Large ecosystem
     • NOT the language itself
  6. GSM 101 Source: en.wikipedia.org/wiki/GSM
  7. A small case study
     • Goal: analyse 17 GB (gzipped) worth of MSC Call Detail Records (CDRs in mobile operator lingo)
     Snippet:
     04|001|26806XXXXXXXXXX|3519XXXXXXXX|3519800049344611|||||| 081105|002559|||00062|00|000-076|015-113||||MALM1 |0|01|9XXXXXXXX|11|||2|1|MICOUT|0|0||||||||||||||||||331985|268061011305482|BAL10A|15|22|12402523|||||||||||||||||||||||||||||||02|||||100001011305482||3e32120034df00|||0|1|17|||1|||3||1|01|3519XXXXXXXX||||1|01|3519XXXXXXXX||25||||||0|01|9XXXXXXXX|002559|081105|00062||2||5||||||||||||3||||||||||||||||||||||||||||||||||||||||||||||||||| ||||||||||||||||
     Note: sensitive information was hidden
  8. A bit more info
     • Approximately 170 GB uncompressed
     • Exactly 359 014 695 CDRs
     • Trivia: about 3 days' worth of GSM call logs
     • Correlate CDRs with customer information
     • Performance goal: running time must be below one hour
  9. Performance Budget: network bandwidth and latency, disk bandwidth and latency, memory, CPU
  10. If you don’t take a temperature you can’t find a fever (*)
      • Measure progress as the system is implemented
      • Make *honest* measurements. Prove yourself wrong.
      • Avoid premature optimization
        – How can you know? If you’re within your performance budget, don’t worry
      (*) Fat Man’s Law X, “The House of God”, Samuel Shem - http://en.wikipedia.org/wiki/The_House_of_God
  11. "The journey of a thousand miles starts with a single step." Lao Tse
      • Line read performance
      1,811,229-line sample
      Sample timings:
      real 0m13.872s
      user 0m13.366s
      sys  0m4.056s
      ETA: ~45 minutes
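The extrapolation on slide 11 can be sketched in a few lines. This is an assumption about how such a measurement might be taken, not the author's harness: `LineReadBenchmark`, `countLines`, and `etaSeconds` are hypothetical names, and the estimate assumes throughput stays constant across the whole data set.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

/** Hypothetical helper: reads a sample line by line and extrapolates to the full data set. */
class LineReadBenchmark {

    /** Reads every line from the source and returns how many lines were seen. */
    static long countLines(Reader source) throws IOException {
        long lines = 0;
        try (BufferedReader in = new BufferedReader(source, 64 * 1024)) {
            while (in.readLine() != null) {
                lines++;
            }
        }
        return lines;
    }

    /** Linear extrapolation: seconds needed for totalLines at the rate measured on the sample. */
    static double etaSeconds(long sampleLines, double sampleSeconds, long totalLines) {
        return sampleSeconds * ((double) totalLines / sampleLines);
    }
}
```

With the figures from the slides, `etaSeconds(1_811_229, 13.872, 359_014_695)` comes out at roughly 2,750 seconds, which is consistent with the ~45 minute estimate.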
  12. I/O Tips
      • Use memory-mapped files (see the FileChannel.map and MappedByteBuffer APIs)
      • Use buffered I/O: BufferedInputStream
      • The optimal buffer size is a multiple of the OS page size (commonly 4 KB or 8 KB)
      • If the process is I/O bound and you have fast CPUs, consider processing compressed files
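The I/O tips above can be illustrated with a minimal sketch. It is not the author's code: `IoTipsSketch` and its method names are hypothetical, the 64 KB buffer size is an arbitrary multiple of common page sizes, and the ISO-8859-1 charset is an assumption about the CDR files.

```java
import java.io.BufferedInputStream;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.zip.GZIPInputStream;

class IoTipsSketch {
    // Arbitrary choice: a multiple of both 4 KB and 8 KB pages.
    static final int BUFFER_SIZE = 64 * 1024;

    /** Counts lines in a gzip-compressed text file without unpacking it to disk first. */
    static long countGzipLines(Path gzFile) throws IOException {
        try (InputStream raw = new BufferedInputStream(Files.newInputStream(gzFile), BUFFER_SIZE);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(new GZIPInputStream(raw, BUFFER_SIZE),
                             StandardCharsets.ISO_8859_1),
                     BUFFER_SIZE)) {
            long lines = 0;
            while (in.readLine() != null) {
                lines++;
            }
            return lines;
        }
    }

    /** Counts '\n' bytes in an uncompressed file through read-only memory mappings. */
    static long countNewlinesMapped(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = channel.size();
            long count = 0;
            // One mapping cannot exceed Integer.MAX_VALUE bytes, so walk the file in windows.
            for (long pos = 0; pos < size; pos += Integer.MAX_VALUE) {
                long length = Math.min(Integer.MAX_VALUE, size - pos);
                MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_ONLY, pos, length);
                while (buffer.hasRemaining()) {
                    if (buffer.get() == '\n') {
                        count++;
                    }
                }
            }
            return count;
        }
    }
}
```

Reading the gzipped file directly trades CPU time for disk bandwidth, which is the trade suggested in the last tip; whether it pays off depends on how I/O bound the job actually is.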
  13. One more step
      • Extract the date of the call and the customer phone number
      04|001|268061100021547|3519XXXXXXXX|3519800049344611|||||| 081105|002559|||00062|00|000-076|015-113||||MALM1 |0|01|9XXXXXXXX|11|||2|1|MICOUT|0|0||||||||||||||||||331985|268061011305482|BAL10A|15|22|12402523|||||||||||||||||||||||||||||||02|||||100001011305482||3e32120034df00|||0|1|17|||1|||3||1|01|3519XXXXXXXX||||1|01|3519XXXXXXXX||25||||||0|01|9XXXXXXXX|002559|081105|00062||2||5||||||||||||3|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
      Censored numbers to protect the innocent
  14. Split lines by columns
          // String.split takes a regular expression, so the pipe must be escaped
          String fields[] = line.split("\\|");
      Sample timings:
      real 1m0.670s
      user 1m1.252s
      sys  0m6.015s
      ETA: 3 hours, 18 minutes
      ~6x slower!!! Exceeded the performance budget
  15. When in doubt, profile: ~85% of the time is spent splitting fields!
  16. Tune
          String fields[] = split(line, '|', 3, 10, 11);
      Sample timings:
      real 0m13.450s
      user 0m13.425s
      sys  0m3.965s
      ETA: 44 minutes and 35 seconds
      14 extra lines of Java code and we’re back on track
  17. Must get SIM card data
      • SIM card type (prepaid, postpaid, ...)
      • ~15 million record table
      • Database constantly under load
      • 4000 queries/s (0.25 ms/query) spare capacity
  18. Database Tips (JDBC)
      – Reuse connections!
      – Read only? Call setReadOnly(true)
      – Always use PreparedStatements
      – Always explicitly close ResultSets (GC friendly)
      – Turn off autocommit
      – Use batched operations and transactions in CRUD-type accesses
      – Large ResultSets? Increase the fetch size: rs.setFetchSize(XXX)
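Several of the JDBC tips above fit into one small lookup helper. The sketch below is illustrative only: the class name `CardTypeLookup`, the table `sim_cards`, and its columns are invented, and the fetch size of 1000 is an arbitrary value. Only standard `java.sql` calls are used.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

/** Hypothetical lookup helper; the table and column names are invented for illustration. */
class CardTypeLookup implements AutoCloseable {
    static final String QUERY = "SELECT card_type FROM sim_cards WHERE msisdn = ?";

    private final PreparedStatement statement;

    /** Configures a caller-supplied (reused) connection once and prepares the statement once. */
    CardTypeLookup(Connection connection) throws SQLException {
        connection.setReadOnly(true);     // hint to the driver: this session only reads
        connection.setAutoCommit(false);  // no implicit commit after every statement
        statement = connection.prepareStatement(QUERY);
        statement.setFetchSize(1000);     // only matters for large result sets; value is arbitrary
    }

    /** Returns the card type for the given number, or -1 when no row matches. */
    int cardType(long msisdn) throws SQLException {
        statement.setLong(1, msisdn);
        try (ResultSet rs = statement.executeQuery()) { // closed explicitly, even on error
            return rs.next() ? rs.getInt(1) : -1;
        }
    }

    @Override
    public void close() throws SQLException {
        statement.close();
    }
}
```

The connection is passed in rather than opened here so that one connection and one prepared statement can serve millions of lookups.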
  19. Ooops
      • Too slow!
      • Assuming an average rate of 4000 q/s, one lookup per CDR takes 359 014 695 / 4000 ≈ 89 754 s
      ETA: ~1 day, 56 minutes
  20. Alternatives
      • In-memory databases: TimesTen, solidDB
      • Embedded relational: H2, HSQLDB, Derby
      • Other embedded: Berkeley DB, InfinityDB
  21. Must keep a balance: Performance versus Cost, Complexity, Learning Curve (aka neuron time), Maintenance
  22. Remembering old times
      • In C/C++ you could map structs to memory
      • The amount of information needed is 16 bytes per SIM card (phone number, start date, end date, type of card: 4 * 4 bytes)
      • ~343 MB if stored in a compact form (int[])
      • Sort the data and wrap the array in a List
      • Use Collections.binarySearch to do the heavy lifting
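The compact layout described above can be sketched as follows. This is not the author's `ClientFile` class: `CompactCardTable` and `typeAt` are hypothetical names, and the sketch swaps in a hand-written binary search over the `int[]` instead of the `List` wrapper plus `Collections.binarySearch` that the slides use. It also assumes the validity intervals for one number never overlap.

```java
/** Hypothetical compact table: 4 ints per record, sorted by number, then by start time. */
class CompactCardTable {
    private static final int FIELDS = 4; // number, start, end, type = 16 bytes per record
    private final int[] data;

    /** @param sortedData records laid out back to back, already sorted as described above */
    CompactCardTable(int[] sortedData) {
        this.data = sortedData;
    }

    /** Returns the card type valid for this number at this time, or -1 if there is none. */
    int typeAt(int number, int time) {
        int lo = 0;
        int hi = data.length / FIELDS - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int offset = mid * FIELDS;
            int cmp = Integer.compare(data[offset], number);
            if (cmp == 0) {
                if (time < data[offset + 1]) {
                    cmp = 1;   // this record starts after the probe time
                } else if (time > data[offset + 2]) {
                    cmp = -1;  // this record ended before the probe time
                }
            }
            if (cmp < 0) {
                lo = mid + 1;
            } else if (cmp > 0) {
                hi = mid - 1;
            } else {
                return data[offset + 3];
            }
        }
        return -1;
    }
}
```

Keeping everything in one primitive array avoids per-record object headers and pointer chasing, which is what makes the 16 bytes per record figure achievable on the JVM.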
  23. Way faster!
      • No extra libraries, 40 lines of simple Java code
      ETA: 1 hour, 30 minutes and 35 seconds
      Above the budget
  24. Put those extra cores to work
      • 6 quad-core 2.4 GHz SPARC VII CPUs, 6 MB L2 cache, 48 hardware threads, 32 GB RAM
      • Split the data into work units
      • Split the work units among the threads
      • Collect the results when the threads finish
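The three steps above (split into work units, hand them to threads, collect the results) map naturally onto an `ExecutorService`. The sketch below is an assumption about the shape of such a driver, not the author's code: `ParallelCount` is a hypothetical name, and a work unit is modelled as an in-memory list of lines, whereas the real job would more likely use files or byte ranges.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical driver: one task per work unit, private state per task, merge at the end. */
class ParallelCount {

    /** Counts non-empty lines per work unit on a fixed pool and sums the per-task results. */
    static long countRecords(List<List<String>> workUnits, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> pending = new ArrayList<>();
            for (List<String> unit : workUnits) {
                Callable<Long> task = () -> {
                    long local = 0; // task-private counter: nothing shared, nothing locked
                    for (String line : unit) {
                        if (!line.isEmpty()) {
                            local++;
                        }
                    }
                    return local;
                };
                pending.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> result : pending) {
                total += result.get(); // blocks until that task has finished
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```

Each task only touches its own work unit and its own counter, so no locks are needed; the only synchronization point is the final merge.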
  25. Concurrent tips
      • Concurrent programming is really hard!
      • But you’re not going to be able to avoid it (per-core CPU speed has stalled while the number of cores keeps increasing)
      • Don’t share R/W data among threads
      • Locking will kill performance
      • Be aware of the memory architecture
      java.sun.com/javase/6/docs/technotes/guides/concurrency/index.html
  26. Mission Accomplished
      • With 8 threads of the 48 possible
      Real running time: 10 minutes, 23 seconds
      Near linear scaling!
      There’s no point in optimizing more. We’ve just entered the territory of the law of diminishing returns: en.wikipedia.org/wiki/Diminishing_returns
  27. What about Network I/O?
      • 1 thread per client using blocking I/O does not scale
      • Use non-blocking I/O (NBIO)
      • VM implementors will (probably) use the best API in the host OS (epoll in Linux kernel 2.6, for example)
      • NBIO is hard. Don’t reinvent the wheel. See Apache MINA - mina.apache.org
      • Scales to over 10.000k connections easily!
  28. A few extra tips
      • Know your VM
      • Not all VMs are created equal
      • Even without changing a line of code you can improve things, if you know what you’re doing
      • If you’re using the Sun VM, try the Server VM (the default is the Client VM)
      • Plenty of options to fiddle with: blogs.sun.com/watt/resource/jvm-options-list.html
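Before fiddling with options it helps to confirm which VM is actually running and what it sees. A minimal sketch, with `VmInfo` as a hypothetical name; it only reads standard system properties and `Runtime` values.

```java
/** Hypothetical helper: reports which VM is running and the resources it sees. */
class VmInfo {

    /** Builds a one-line summary from standard system properties and Runtime values. */
    static String describe() {
        Runtime runtime = Runtime.getRuntime();
        return System.getProperty("java.vm.name")
                + " " + System.getProperty("java.vm.version")
                + ", processors=" + runtime.availableProcessors()
                + ", maxHeapMB=" + runtime.maxMemory() / (1024 * 1024);
    }
}
```

Printing `VmInfo.describe()` once with the default settings and once after launching with `java -server` shows whether the flag changed the VM that was selected.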
  29. What about designing and maintaining complex systems?
      • Implement a feature-complete solution at small scale
      • Learn the performance characteristics. Implement benchmarks.
      • Change the architecture if needed
      • How much does it cost? It’s all about €€€€€ (licensing, hardware, human resources, rack space, energy, cooling requirements, maintenance, ...)
  30. Keep measuring after the system goes live
      “The only man I know who behaves sensibly is my tailor; he takes my measurements anew each time he sees me. The rest go on with their old measurements and expect me to fit them.” George Bernard Shaw - en.wikiquote.org/wiki/George_Bernard_Shaw
      • Especially if you keep adding features
  31. Code snippets – A (way) faster split
          String fields[] = split(line, '|', 3, 10, 11);

          // Returns only the requested columns; column indexes must be in ascending order.
          public static String[] split(String l, char sep, int... columns) {
              String[] fields = new String[columns.length];
              int start = 0, column = 0, end, i = 0;
              while ((end = l.indexOf(sep, start)) != -1) {
                  if (column++ == columns[i]) {
                      fields[i] = l.substring(start, end);
                      if (++i == columns.length)
                          return fields; // all requested columns found: stop scanning
                  }
                  start = end + 1;
              }
              // the last field on the line has no trailing separator
              if (column == columns[i])
                  fields[i] = l.substring(start);
              return fields;
          }
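The routine from slide 31 can be exercised on a shortened record. In the sketch below the method body is copied from the slide so the block is self-contained; the wrapper class `FastSplitDemo` and the sample line are made up for illustration.

```java
import java.util.Arrays;

/** Hypothetical wrapper around the split routine shown on slide 31. */
class FastSplitDemo {

    /** Returns only the requested columns; column indexes must be in ascending order. */
    static String[] split(String l, char sep, int... columns) {
        String[] fields = new String[columns.length];
        int start = 0, column = 0, end, i = 0;
        while ((end = l.indexOf(sep, start)) != -1) {
            if (column++ == columns[i]) {
                fields[i] = l.substring(start, end);
                if (++i == columns.length) {
                    return fields; // stop scanning once every requested column is found
                }
            }
            start = end + 1;
        }
        if (column == columns[i]) {
            fields[i] = l.substring(start); // last field has no trailing separator
        }
        return fields;
    }

    public static void main(String[] args) {
        // Made-up, shortened record: columns 3, 6 and 7 hold number, date and time.
        String line = "04|001|268|3519|||081105|002559";
        System.out.println(Arrays.toString(split(line, '|', 3, 6, 7)));
        // prints [3519, 081105, 002559]
    }
}
```

Unlike `String.split`, this version involves no regular expression, allocates only the requested substrings, and stops at the last column of interest. Columns that do not exist on the line are left as `null`.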
  32. Static in-memory “database”: Poor man’s solution (but as fast as it gets)
          public class ClientFile implements List<CardInfo>, RandomAccess {
              static final int CLIENT_SIZE = 16; // bytes per record: 4 ints
              int[] clients;

              public ClientFile() throws FileNotFoundException, IOException {
                  File f = new File("clientes.db");
                  int client_count = (int) f.length() / CLIENT_SIZE;
                  clients = new int[client_count * 4];
                  byte b[] = new byte[(int) f.length()];
                  // readFully: a plain read() may return before the whole file is read
                  DataInputStream in = new DataInputStream(new FileInputStream(f));
                  try {
                      in.readFully(b);
                  } finally {
                      in.close();
                  }
                  for (int i = 0; i != client_count; ++i) {
                      clients[i * 4] = toi(b, i * CLIENT_SIZE);
                      clients[i * 4 + 1] = toi(b, i * CLIENT_SIZE + 4);
                      clients[i * 4 + 2] = toi(b, i * CLIENT_SIZE + 8);
                      clients[i * 4 + 3] = toi(b, i * CLIENT_SIZE + 12);
                  }
              }

              // map 4 big-endian bytes to an int
              public int toi(byte[] b, int offset) {
                  return ((0xFF & b[offset]) << 24)
                       + ((0xFF & b[offset + 1]) << 16)
                       + ((0xFF & b[offset + 2]) << 8)
                       + (0xFF & b[offset + 3]);
              }
              (…)
  33. Static in-memory “database”: (continued)
              (…)
              public CardInfo get(int index) {
                  return new CardInfo(clients[index * 4], clients[index * 4 + 1],
                                      clients[index * 4 + 2], clients[index * 4 + 3]);
              }

              public CardInfo getCardInfo(String msisdn, String yymmdd, String hhmmss) {
                  Calendar cal = Calendar.getInstance();
                  cal.set(i(yymmdd, 0, 1) + 2000, i(yymmdd, 2, 3) - 1, i(yymmdd, 4, 5),
                          i(hhmmss, 0, 1), i(hhmmss, 2, 3), i(hhmmss, 4, 5));
                  int idx = Collections.binarySearch(this,
                          new Key(i(msisdn), (int) (cal.getTimeInMillis() / 1000)));
                  if (idx < 0) {
                      return null;
                  }
                  return get(idx);
              }
  34. Questions?
      • Answers: 1 €
      • Answers that require thought: 5 €
      • Correct answers: 20 €
      • Dumb looks: for free!
