My parallel universe

Title: Mitt Parallella Universum
Author: Andreas Olofsson
Location: Latinoware, Foz do Iguaçu, Brazil
Date: Oct, 2014
Abstract:
Andreas Olofsson is the founder of Adapteva (http://adapteva.com), a company created with the goal of bringing a 10x advance in floating-point energy efficiency to the mobile device market. By May 2009, Olofsson had created the first prototype based on a new type of parallel multicore processor architecture. The initial prototype was implemented in 65 nm and had 16 independent microprocessor cores. In September 2012, Adapteva launched the Parallella project on Kickstarter, marketed as a "supercomputer for everyone." Architecture reference manuals for the platform were published as part of the campaign to draw attention to the project. The campaign sought $750,000 in funding, which it reached within a month. The single-board computer, with a 16-core Epiphany chip, was scheduled to become available in May 2013 at a cost of $99.

Published in: Technology

  1. Mitt Parallella Universum. Latinoware-2014. Andreas Olofsson, andreas@adapteva.com, Twitter: @adapteva
  2. The Prologue... In 2008 I quit my job to launch a chip startup with the goal of boosting processor energy efficiency by 25X.
  3. 25X: Equivalent to driving from Patagonia to Alaska on one tank of gas!
  4. Why?
  5. Peak CPU Performance is Stalling!
  6. Real CPU performance is stalling!
  7. ...and it gets worse!
  8. So What?
  9. Communication, Robotics, IoT, Datacenters/HPC: Life without Moore is boring!
  10. The “Epiphany”
  11. The Epiphany Manycore Architecture: RISC, SRAM, Router, DMA. No Caches, No Standards, No MMU, No Legacy ...No Power
  12. Epiphany-III Power-up (2011): Success!! Happy but tired Pappa!
  13. The market reception.... • Ambric • Asocs • Aspex • Axis Semi • BOPS • Boston Circuits • Brightscale • Chameleon • Clearspeed • PACT • Picochip • Plurality • Quicksilver • Rapport • Recore • Sandbridge • SiByte • TILERA • SiCortex • Silicon Hive • Spiral Gateway • Stream Processors • Stretch • Venray • Xelerated • XMOS • Zililabs. How the $@%# will we program this thing??
  14. There is no “C” of parallel programming: Erlang, SystemC, Intel TBB, Co-Fortran, Lisp, Janus, Scala, Haskell, Pragmas, Fortress, Hadoop, Linda, Smalltalk, CUDA, Clojure, UPC, PVM, Rust, Julia, OpenCL, Go, X10, Posix, XC, Occam, OpenHMPP, ParaSail, APL, Simulink, Charm++, Occam-pi, OpenMP, Ada, Labview, Ptolemy, StreamIt, Verilog, OpenACC, C++Amp, Sisal, Star-P, VHDL, Cilk, Chapel, MPI, MCAPI, Java
  15. The Problem(s)! ● Parallel programming is HARD! ● Productivity matters. Time is money ● <1% of developers know parallel programming ● Technology doesn't move backwards!
  16. The Obvious Answer: Open Source Collaboration!
  17. Presenting “Parallella” ● Launched in September 2012 at $99 (now starting at $119) ● Open source SW/HW! ● Runs Linux (Ubuntu) ● Dual-core ARM A9 processor ● A sizable FPGA ● 1GB RAM, USB, HDMI, GigE ● 16/64 Epiphany coprocessors ● 50 Gbit/sec IO, 25/100 GFLOPS
  18. Parallella Mission and Principles ● Mission: To help make parallel computing ubiquitous ● Principles: ● Complete and open documentation ● Low cost ● Open source software ● Open standards ● Open source hardware (schematics, layout) ● Open collaboration: http://github.com/parallella, http://forums.parallella.org
  19. Some Perspective... ● 1993: CM-5 ● 1024 processors ● 136 GFLOPS/100 kW ● #1 in 1993 Top500 List ● Price: >$30M ● 2014: Parallella-64 ● 66 processors ● 100 GFLOPS*/5 W ● #1 in energy efficiency ● Price: $199*
  20. Yes, but does it work?
  21. 25X: Size does matter... Tianhe-2 ● 33 PFLOPS ● $390M USD ● 24 MW ● Insanity!!!! “There is STILL plenty of room at the bottom”. 33 PFLOPS ≈ 16 28nm Epiphany wafers**
  22. Now What?
  23. Parallella Research in 2014 ● >10,000 Parallella boards shipped ● 200+ University collaborations ● $10K in hardware donated ● Active Research Areas: ● Computer science education ● Robotics/drones ● Software defined radio ● HPC
  24. Parallella Universities in South America. Brazil: ● São Paulo State University ● CELTAB ● Federal University of Uberlândia. Argentina: ● Universidad Austral, Argentina ● Universidad de Buenos Aires ● Universidad Nacional de La Plata ● Universidad Tecnológica Nacional ● Pontificia Universidad Javeriana ● Universidad Nacional de Córdoba. Chile: ● Universidad Mayor. Colombia: ● Universidad Industrial de Santander
  25. Some Parallella Lessons ● Openness is more important than cost ● You CAN build hardware at a profit outside China, we did it! ● Collaboration is VERY hard work ● Time is our devs' most precious resource ● Ease of use wins over performance every time (simplicity + docs + support)
  26. How we benefited from open source ● As consumers: ● Linux, U-boot, Ubuntu, Beaglebone, Verilator ● As recipients: ● Eclipse Multicore IDE ($1M) ● OpenCL ($1M) ● Multicore Epiphany simulator ($50K) ● Demos ($50K)
  27. It is “your” responsibility to make pervasive parallel computing a reality! Explorers: 1. Create the tools to make parallel programming easier 2. Create algorithms that scale (Amdahl) 3. Create a universal parallel software stack. Teachers: 1. Rewrite the computer science curriculum 2. Retrain 20M programmers
  28. The Future of HW: A Brief Summary (Constraint --> Result) ● Performance limits --> Massive parallelism ● Thermal density --> Slow clocks (1MHz-1GHz) ● Failure rate --> Distributed systems ● Bandwidth --> No shared resources ● Density --> 3D chip stacking ● Efficiency --> Heterogeneous HW ● Productivity --> Heterogeneous SW ● Amdahl's law --> New algorithms ● Development cost --> Open collaboration ● Latency --> Open collaboration
  29. Get ready now!! ● Critical code must be performance scalable to 1000 threads ● You (or a tool) will manage memory in software ● Know where in the universe your bits are stored! ● The hardware will fail often, can your SW handle it? ● The minimum number of languages is 2.
  30. The Future is Heterogeneous. FPGA: ● Irregular math ● IO ● Customization. CPU: ● Legacy code ● 90% of LOC ● <100 GFLOPS. ASIC: ● Makes comeback at end of Moore's Law ● Another 100X boost. Accelerators: ● Math crunching ● Scalable ● >100 GFLOPS
  31. ● 16K-64K CPUs, 1MB/core (3D), ~20 TFLOPS, 0.2W-20W ● 64 CPUs, 32KB/core, 100 GFLOPS, 0.1W-2W ● 64 CPUs, 128KB/core, 80 GFLOPS (DPF), 0.1W-3W ● 1K CPUs, 128KB/core, ~1.2 TFLOPS, 0.4W-40W ● Year labels on the slide: 2013, 2015, 2015, 2018. By 2018 there WILL be 64K-core chips! This is a new world. Without legacy, a great opportunity to do software right!
  32. Getting your hands dirty ● Tomorrow: LAB2 from 10am-2pm ● Email: andreas@adapteva.com ● Twitter: @adapteva
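
Slides 19 and 21 compare machines by performance and power draw. As a rough sketch of that arithmetic, the snippet below divides the quoted GFLOPS by the quoted watts. The GFLOPS-per-watt metric, the dictionary, and the function names are illustrative assumptions, not something the deck defines, and the asterisked slide figures carry footnotes that are not in this transcript, so the inputs should be read as nominal.

```python
# Figures copied from slides 19 and 21 of the transcript.
# Assumption: GFLOPS per watt is a fair stand-in for "energy efficiency";
# every name below is hypothetical and chosen only for this illustration.
SYSTEMS = {
    "CM-5 (1993)": (136.0, 100_000.0),          # 136 GFLOPS, 100 kW
    "Parallella-64 (2014)": (100.0, 5.0),       # 100 GFLOPS, 5 W
    "Tianhe-2": (33_000_000.0, 24_000_000.0),   # 33 PFLOPS, 24 MW
}


def gflops_per_watt(gflops: float, watts: float) -> float:
    """Return GFLOPS divided by power draw in watts."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    return gflops / watts


def efficiency_table() -> dict:
    """Map each system name to its nominal GFLOPS per watt."""
    return {name: gflops_per_watt(g, w) for name, (g, w) in SYSTEMS.items()}


if __name__ == "__main__":
    table = efficiency_table()
    for name, eff in table.items():
        print(f"{name}: {eff:.5f} GFLOPS/W")
    ratio = table["Parallella-64 (2014)"] / table["Tianhe-2"]
    print(f"Parallella-64 vs Tianhe-2 (nominal): {ratio:.1f}x")
```

Run as a script, it prints one line per system plus the nominal ratio between the Parallella-64 and Tianhe-2 figures.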

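
Slide 27 asks for algorithms that scale (Amdahl) and slide 29 asks for critical code that scales to 1000 threads. A minimal sketch of Amdahl's law, speedup = 1 / ((1 - p) + p / n), shows why both matter. The function name and the sample parallel fractions are assumptions made for illustration; the deck itself gives no numbers here.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Ideal speedup under Amdahl's law.

    parallel_fraction: share of the runtime that can run in parallel (0..1).
    workers: number of threads or cores applied to the parallel part.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Hypothetical parallel fractions, evaluated at the 1000 threads
    # mentioned on slide 29.
    for p in (0.90, 0.99, 0.999):
        print(f"p={p}: {amdahl_speedup(p, 1000):.1f}x on 1000 threads")
```

Even a small serial fraction caps the achievable speedup well below the thread count, which is the motivation behind the "new algorithms" entry on slide 28.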