
Computing Beyond Moore’s Law: Architecture and Device Innovations


The performance of computing has improved by around 10^12 times in the past 70 years. This tremendous growth has followed Moore’s Law, which brings higher performance and lower power at the same time and enables ICT to achieve higher power efficiency with every new generation of technology. Fujitsu has made significant contributions in ICT on a global scale during this period. However, due to physical and economic limits, we cannot expect to continue this trend in the next decade. We must develop new technology to overcome this difficulty and establish a new paradigm beyond Moore’s Law. In this session, we highlight the latest challenges Fujitsu R&D faces for increasing performance through computer architecture and device innovations.
Speaker
Takeshi Horie

Published in: Technology


  1. Copyright 2016 FUJITSU. Fujitsu Forum 2016 #FujitsuForum
  2. Computing Beyond Moore’s Law: Architecture and Device Innovations. Takeshi Horie, Head of Computer Systems Laboratory, FUJITSU LABORATORIES LTD.
  3. Why computing now?
     • Data explosion: Data is generated by many IoT devices, and the amount of data is exploding. Computing creates knowledge and intelligence from data, but traditional computing cannot handle this volume.
     • End of Moore’s law: For 50 years we have enjoyed device technology scaling, but that is ending.
     • We must fundamentally rethink computing architecture.
  4. Demand for Computing and Fujitsu Computer Systems
  5. Computer performance
     • Since ENIAC was developed 70 years ago, computer performance has doubled about every 1.5 years.
     • Chart: computations per second per computer (1E+00 to 1E+12) versus year (1930 to 2010), starting from ENIAC (1946; photo: U.S. federal government), with a trend line of 2x / 1.5 years.
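As a rough check of the doubling rate quoted on this slide, the sketch below compounds a 2x gain every 1.5 years over 70 years and, in the other direction, derives the doubling period implied by the 10^12 top of the chart axis. The function names are illustrative and do not come from the presentation.

```python
import math


def growth_factor(years: float, doubling_period: float) -> float:
    """Total multiplicative gain after `years` of doubling every `doubling_period` years."""
    return 2.0 ** (years / doubling_period)


def implied_doubling_period(years: float, total_gain: float) -> float:
    """Doubling period (in years) that produces `total_gain` over `years`."""
    return years / math.log2(total_gain)


if __name__ == "__main__":
    # Strict 1.5-year doubling sustained for 70 years.
    print(f"gain over 70 years: {growth_factor(70, 1.5):.2e}")
    # Doubling period that would give exactly a 10^12-fold gain in 70 years.
    print(f"implied doubling period: {implied_doubling_period(70, 1e12):.2f} years")
```

A strict 1.5-year doubling gives a gain of roughly 10^14, while a 10^12-fold gain corresponds to doubling about every 1.76 years, so the two figures agree only to within the rounding of a trend line.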
  6. Computing demand for scientific applications
     • Although computing has enabled applications in a variety of fields, much higher computing power is still required to solve complex real-world problems.
     • Examples: heart simulation (joint research with the University of Tokyo); tsunami simulation (joint research with Tohoku University - International Research Institute for Disaster).
     • Application areas: life science and drug manufacturing; global change prediction for reducing disaster; industrial innovation; new material and energy creation; origin of matter and the universe.
  7. Computing demand for financial applications
     • Tokyo Stock Exchange, Inc. (TSE) is one of the world's top trading markets and lists around 3,800 issues. Daily trading value exceeds three trillion yen.
     • Trading volume is increasing year by year.
     • For high-frequency trading, response time was reduced from 2 ms to 500 μs in 5 years.
     • Chart: trading volume in the TSE 1st section, 1949 to 2015 (axis 0 to 900 million).
     • Chart: response time of TSE: 2 ms (2010), 900 μs (2012), 500 μs (2015).
  8. Fujitsu computer systems
     • Timeline (1950 to 2010) in four categories: Supercomputer, Mainframe, Enterprise Servers, Ubiquitous Terminal.
     • Systems shown: FACOM100 (1954), FACOM230-10 (1965), M-190 (1976), OASYS100 (1980), VP-100 (1982), M-780 (1985), FM TOWNS (1989), M-1800 (1990), DS90 (1991), VPP-500 (1992), FM V (1993), GS21 (2002), PRIMEQUEST (2005), PRIMEHPC FX10 (2011), Arrows (2011), SPARC M10 (2013).
  9. Fujitsu microprocessors
     • Roadmap diagram in five periods (up to 1999, 2000 to 2003, 2004 to 2007, 2008 to 2011, 2012 to 2015, and 2016 onward) across three product lines: Mainframe, UNIX, Supercomputer.
     • Processors shown: SPARC64, SPARC64 II, SPARC64 GP, SPARC64 V, SPARC64 V+, SPARC64 VI, SPARC64 VII, SPARC64 VIIIfx (K computer), SPARC64 IXfx, SPARC64 X, SPARC64 X+, SPARC64 XIfx.
     • Mainframe systems shown: GS8600, GS8800, GS8800B, GS8900, GS21, GS21 600, GS21 900, GS21 M2600.
     • High-performance features: Store Ahead, Branch History, Prefetch, single-chip CPU, non-blocking cache, out-of-order execution, superscalar, L2 cache on die, HPC-ACE, System on Chip, hardware barrier, multi-core, multi-thread, Virtual Machine Architecture, Software On Chip, high-speed interconnect.
     • High-reliability features: cache ECC, register/ALU parity, instruction retry, cache dynamic degradation, RC/RT/History.
  10. Fujitsu high performance computing
     • Fujitsu provides many HPC solutions to satisfy various customer demands.
     • Support for both supercomputers with an original CPU and x86 cluster systems.
     • Post-K will be developed in collaboration with RIKEN and ARM.
     • Supercomputer: PRIMEHPC FX10, PRIMEHPC FX100, K computer (co-developed with RIKEN), Post-K (co-developed with RIKEN and ARM).
     • Large-scale SMP system: RX900.
     • x86 cluster: CX400/CX600 (KNL), BX900/BX400.
  11. IoT and Data Explosion
  12. IoT connects everything
     • By 2020, 50 billion devices will be connected and will generate data constantly.
     • Chart (source: Cisco): billions of devices (10 to 50) versus year (1990 to 2020), compared with the world population. Annotations: only 1 million PCs were connected to the Internet; the number of devices exceeded the world population; more than 50 billion devices in 2020.
  13. Data explosion
     • As the amount of data explodes, it exceeds the capability of traditional ICT.
     • New processing is needed to create valuable information from unstructured data.
     • Chart: amount of data versus year (1990 to 2020), with 1 ZB = 10^21 bytes and 1 YB = 10^24 bytes. The amount of data will reach 40 zettabytes by 2020 and 1 yottabyte by 2030. Unstructured data: IoT, sensors. Structured data: business data, RDB.
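The two projections on this slide (40 ZB by 2020, 1 YB by 2030) imply a compound annual growth rate, which the short sketch below works out. It assumes smooth exponential growth between the two points, which the slide does not state; the helper name is illustrative.

```python
ZB = 10**21  # bytes in a zettabyte, as defined on the slide
YB = 10**24  # bytes in a yottabyte, as defined on the slide


def annual_growth_rate(start: float, end: float, years: float) -> float:
    """Compound annual growth rate that takes `start` to `end` in `years` years."""
    return (end / start) ** (1.0 / years) - 1.0


# 40 ZB in 2020 growing to 1 YB in 2030 is a 25-fold increase in 10 years.
rate = annual_growth_rate(40 * ZB, 1 * YB, 10)

if __name__ == "__main__":
    print(f"implied growth: {rate:.1%} per year")
```

Under that assumption the data volume grows by about 38% per year.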
  14. New information processing for data explosion
     • Diagram labels: Numeric, Data, Information, Knowledge, Intelligence/Knowledge; Volume; Quality for Value; Computing.
  15. Technology Trend for Computing
  16. Microprocessor trend
     • Transistor counts are growing exponentially, following Moore’s law.
     • Single-thread performance: increased by 60%/year until 2005, then slowed to +20%/year from 2005.
     • Power and operating frequency: power restrictions have limited operating frequency since 2005.
     • Performance growth is limited by power consumption.
     • Chart (source: Stanford, K. Rupp): transistor counts (thousands), performance, frequency (MHz), power (W), core counts.
  17. Memory trend
     • DRAM density has saturated. NAND flash density is growing, with limited endurance.
     • There is a big performance gap between DRAM and NAND flash.
     • Next-generation memories are required to fill the DRAM-NAND gap.
     • Chart (source: ISSCC, VLSI Circuits & Tech., ASSCC, IEDM): memory IC capacity [Gb/die] (10^-3 to 10^3) versus year (2000 to 2016). Growth rates: NAND +32%/yr, DRAM +18%/yr, MRAM +52%/yr, PCM +95%/yr, ReRAM +140%/yr.
     • Diagram: access time (ns, μs, ms) across the hierarchy of CPU cache (SRAM), memory (DRAM), flash (SSD) and magnetic storage (HDD), showing a 1000x performance gap to be filled by next-generation memory.
  18. Moore’s law
     • Device technology scaling has brought higher performance as well as higher power efficiency over the past 50 years.
     • The trade-off line is determined by the device technology of each generation. As technology scales, the trade-off line moves upward.
     • The technology node will reach 7 nm in 2020 (physical limitation of current transistor technology).
     • Trade-off relation: Power efficiency × (Performance)^2 = K ∝ s^5, where s is the scaling factor.
     • Technology scaling will no longer be a driver for computing.
     • Chart: power efficiency (a.u., 1 to 10^4) versus performance (a.u., 10^2 to 10^5), with trade-off lines for 1990, 2000, 2010 and 2020, mobile and server regions, and the advancement of Moore’s limit line.
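The relation on this slide, Power efficiency × (Performance)^2 = K ∝ s^5, can be read as a family of trade-off lines, one per technology generation. The sketch below is a minimal illustration of that reading in arbitrary units. The constant `k0` and the function names are assumptions, and the proportionality is treated as exact.

```python
def tradeoff_constant(s: float, k0: float = 1.0) -> float:
    """K for scaling factor s, assuming K = k0 * s**5 (k0 is an arbitrary unit)."""
    return k0 * s ** 5


def power_efficiency(performance: float, s: float, k0: float = 1.0) -> float:
    """Power efficiency on the trade-off line: efficiency = K / performance**2."""
    return tradeoff_constant(s, k0) / performance ** 2


if __name__ == "__main__":
    # Same performance, one generation with twice the scaling factor.
    gain = power_efficiency(100.0, s=2.0) / power_efficiency(100.0, s=1.0)
    print(f"efficiency gain at fixed performance: {gain:.0f}x")
    # Same generation, 10x the performance.
    loss = power_efficiency(100.0, s=1.0) / power_efficiency(1000.0, s=1.0)
    print(f"efficiency cost of 10x performance: {loss:.0f}x")
```

On a log-log plot each line has slope -2, so within one generation a 10x gain in performance costs 100x in power efficiency, while doubling s lifts the whole line by a factor of 2^5 = 32.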
  19. Computing innovations beyond Moore’s law
     • To overcome the limit of Moore’s law in both performance and power efficiency, realize beyond-Moore’s-law computing through two approaches: computing architecture innovation and device innovation.
     • Chart: power efficiency (a.u., 1 to 10^4) versus performance (a.u., 10^2 to 10^5), showing Moore’s Law, Moore’s limit line, and the Beyond Moore’s Law region.
  20. Computing Architecture Innovation
  21. Data explosion and challenges
     • Overcome challenges to realize new information processing.
     • Chart: amount of data versus year (2000 to 2030), reaching 40 ZB (40 × 10^21 B) in 2020 and 1 YB (10^24 B) in 2030, split into structured and unstructured data. Annotation: limits of power, transmission, integration, and processing.
     • Essence of intelligence: data, information, knowledge, intelligence.
     • Challenges: process technology; network bandwidth; power consumption; computing power.
  22. Computing architecture innovation
     • Create a new computing paradigm for the data explosion.
     • Chart: amount of data versus year (2000 to 2030), reaching 40 ZB (40 × 10^21 B) in 2020 and 1 YB (10^24 B) in 2030, split into structured and unstructured data. Annotation: limits of power, transmission, integration, and processing.
     • Essence of intelligence: data, information, knowledge, intelligence.
     • Challenges: process technology; network bandwidth; power consumption; computing power.
     • Added labels: New Computing Architecture; Moore’s Law Computing; Hyperconnected Cloud; Cloud Computing System.
  23. Hyperconnected Cloud
     • R&D vision and strategy: “Hyperconnected Cloud”.
     • Web-scale ICT provides computing and data processing power through service-oriented connection.
     • AI and security are embedded at every layer to create knowledge in a safe and secure society.
  24. New computing architecture
     • Evolving from numeric computing to intelligence computing.
     • Diagram: processing (numeric, media, knowledge, intelligence) versus specialization, placing conventional computing, supercomputers, accelerators, neural computing (inference), neural computing (learning), brain-inspired computing and quantum computers around the end of Moore’s Law.
  25. Approach for new computing architecture
     • Evolving from numeric computing to intelligence computing.
     • Diagram: processing (numeric, media, knowledge, intelligence) versus specialization, showing conventional computing, supercomputers, neural computing (inference), neural computing (learning), brain-inspired computing and quantum computers. Highlighted group: Conventional Computing.
  26. Approach for new computing architecture
     • Evolving from numeric computing to intelligence computing.
     • Diagram: processing (numeric, media, knowledge, intelligence) versus specialization, showing conventional computing, supercomputers, accelerators, neural computing (inference), neural computing (learning), brain-inspired computing and quantum computers. Highlighted groups: Conventional Computing; Domain Specific Computing.
  27. Approach for new computing architecture
     • Evolving from numeric computing to intelligence computing.
     • Diagram: processing (numeric, media, knowledge, intelligence) versus specialization, showing conventional computing, scientific computing, accelerators, neural computing (inference), neural computing (learning), brain-inspired computing and quantum computers. Highlighted groups: Conventional Computing; Domain Specific Computing; New Computing Paradigm.
  28. Approach for new computing architecture
     • Evolving from numeric computing to intelligence computing.
     • Diagram: processing (numeric, media, knowledge, intelligence) versus specialization, showing conventional computing, scientific computing, accelerators, neural computing (inference), neural computing (learning), brain-inspired computing and quantum computers. Highlighted groups: Conventional Computing; Domain Specific Computing; New Computing Paradigm; Future Computing Technologies.
  29. What is domain specific computing?
     • Achieve extremely high performance, simple operation and low cost by specializing hardware and software in specific application domains.
     • Optimize architecture to the characteristics of the specific domain.
     • Optimize hardware and software to the major functions of the domain.
     • Diagram: domain specific computing with a hardware configuration optimized for the domain. Example domains: media search; big data analysis; control, compression; encryption, attack detection; combinatorial optimization.
  30. Three areas for domain specific computing
     • Media processing; rivalling quantum computing; neural computing.
     • Diagram: domain specific computing with a hardware configuration optimized for the domain. Example domains: media search; big data analysis; control, compression; encryption, attack detection; combinatorial optimization.
  31. Computing Architecture Innovation: Rivalling Quantum Computing for Combinatorial Optimization (Demonstration 1)
  32. What is combinatorial optimization?
     • Application examples: power delivery, disaster recovery, investment portfolio.
     • Example problem: find the shortest tour through a set of cities.
     • Number of combinations: (N-1)!/2. For example, 32 cities give on the order of 10^33 combinations (combinatorial explosion).
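The (N-1)!/2 count on this slide can be evaluated directly. The sketch below computes it with exact integer arithmetic; the function name is illustrative, and the formula is taken to count round trips that differ in more than starting city or direction.

```python
import math


def tour_count(n_cities: int) -> int:
    """Number of distinct round trips through n_cities, using (N-1)!/2."""
    if n_cities < 3:
        raise ValueError("the formula applies to 3 or more cities")
    # (N-1)! is even for N >= 3, so integer division is exact.
    return math.factorial(n_cities - 1) // 2


if __name__ == "__main__":
    for n in (4, 8, 16, 32):
        print(f"{n:2d} cities: {tour_count(n):.2e} tours")
```

For 32 cities the count is about 4 × 10^33, which matches the order of magnitude quoted on the slide.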
  33. Strategy to solve combinatorial optimization
     • Create a high-speed and widely applicable architecture.
     • Diagram: speed (slow to fast) versus applicability (limited problems to practical problems), positioning the conventional processor, the quantum computer (quantum annealing type) and our goal.
     • Example problems on the diagram: locating power grid failure; pick-up and delivery of 2000 depots; locating failures in a 20-breaker power grid; map coloring.
  34. Proposed new computing architecture (press release on Oct. 20th 2016)
     • Architecture to meet usability and scalability for combinatorial optimization: solve practical problems by using CMOS digital design; realize scalability for larger problems and speed enhancement.
     • Features: minimize the volume of data to move in a parallel and hierarchical structure; accelerate the search for paths by parallel score calculation and transition facilitation.
     • Diagram: multiple engines for larger problems; further speed-up achieved by parallelism; speed-up by parallel score calculation and transition facilitation.
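The slide names two mechanisms, parallel score calculation and transition facilitation, without giving details. The sketch below is one possible reading and not Fujitsu's design: a simulated-annealing loop over a small QUBO problem that scores every single-bit flip in each step (sequentially here, where hardware could do it in parallel) and, when no flip is accepted, raises an offset that makes uphill moves easier on the next step. The toy max-cut problem, the offset rule, the cooling schedule and all names are assumptions made for illustration.

```python
import math
import random


def maxcut_qubo(n, edges):
    """Symmetric QUBO matrix whose energy equals minus the cut size (toy problem)."""
    q = [[0] * n for _ in range(n)]
    for i, j in edges:
        q[i][i] -= 1
        q[j][j] -= 1
        q[i][j] += 1
        q[j][i] += 1
    return q


def energy(q, x):
    """QUBO energy: sum over i, j of q[i][j] * x[i] * x[j] for binary x."""
    n = len(x)
    return sum(q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def flip_delta(q, x, i):
    """Energy change caused by flipping bit i (q must be symmetric)."""
    sign = 1 - 2 * x[i]  # +1 for a 0 -> 1 flip, -1 for a 1 -> 0 flip
    coupling = sum(q[i][j] * x[j] for j in range(len(x)) if j != i)
    return sign * (q[i][i] + 2 * coupling)


def anneal(q, steps=2000, t_start=2.0, t_end=0.05, offset_step=0.1, seed=0):
    """Return (best_state, best_energy) found by the annealing loop."""
    rng = random.Random(seed)
    n = len(q)
    x = [rng.randint(0, 1) for _ in range(n)]
    e = energy(q, x)
    best_x, best_e = x[:], e
    offset = 0.0
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (step / (steps - 1))
        # Assumed meaning of "parallel score calculation": score every flip.
        deltas = [flip_delta(q, x, i) for i in range(n)]
        accepted = [
            i for i, d in enumerate(deltas)
            if d - offset <= 0 or rng.random() < math.exp(-(d - offset) / t)
        ]
        if accepted:
            i = rng.choice(accepted)
            x[i] = 1 - x[i]
            e += deltas[i]
            offset = 0.0
            if e < best_e:
                best_x, best_e = x[:], e
        else:
            # Assumed meaning of "transition facilitation": lower the barrier.
            offset += offset_step
    return best_x, best_e


if __name__ == "__main__":
    ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
    state, best = anneal(maxcut_qubo(4, ring))
    print(state, best)
```

Scoring all candidate moves before choosing one means a step is wasted only when every move is rejected, and the offset shortens those stalls. The 4-node ring is far smaller than the 32-city problem in the presentation and is used only to keep the example checkable.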
  35. Evaluation of our prototype
     • 12,000x speedup confirmed using a 32-city traveling salesman problem.
     • Engine performance evaluated using an FPGA implementation.
     • Chart: time to solution (sec, 0.1 to 10,000) for a conventional processor (3.5-GHz Intel Xeon E5) and for this work: FPGA 2x, parallel score calculation 1000x, transition facilitation 6x, 12,000x in total.
  36. Current status and future plan
     • High-speed, widely applicable architecture for optimization.
     • Operates 12,000 times faster than a conventional processor.
     • 1,000,000 times speedup envisioned using higher-layer parallelism.
     • Diagram: engine, integration of many engines, upper-layer parallelism. Achieved 12,000 times speedup using internal-engine parallelism; further speed-up and larger network size by using upper layers.
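The figures on the evaluation chart (2x, 1000x, 6x, 12,000x) are consistent with the individual gains multiplying. Assuming that reading, the sketch below recomputes the engine speedup and the additional factor that upper-layer parallelism would have to supply to reach the envisioned 1,000,000x. The variable names are illustrative.

```python
from math import prod

# Gains read from the evaluation chart, relative to a conventional processor.
measured_factors = {
    "FPGA implementation": 2,
    "parallel score calculation": 1000,
    "transition facilitation": 6,
}

# Assumption: the gains are independent and therefore multiply.
engine_speedup = prod(measured_factors.values())

envisioned_speedup = 1_000_000
remaining_factor = envisioned_speedup / engine_speedup

if __name__ == "__main__":
    print(f"engine speedup: {engine_speedup:,}x")
    print(f"still needed from upper-layer parallelism: {remaining_factor:.0f}x")
```

On this reading, upper-layer parallelism needs to contribute a further gain of roughly 83x.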
  37. Ecosystem of combinatorial optimizer
     • Layers: architecture; software development environment; application.
     • Participants: research institutes, Fujitsu, universities.
     • Combinatorial optimizer engine: high-speed engine; scalable solution.
     • Application to practical problems: delivery, distribution; manufacturing, CAD; decision making; AI.
     • Next step: make an SDK with the engine available for joint research projects.
  38. Computing Architecture Innovation: Neural Computing (Demonstration 5)
  39. Neural computing comes back again
     • Since 2012, deep learning algorithms and enhanced computing capability have enabled much higher object recognition rates than before.
     • Diagram: conventional pipeline of input image, feature extraction (manual design), features, classification, results; deep learning pipeline of input image, feature extraction (automatic extraction), features, classification, results.
     • Chart: general object recognition rate (axis 0.00 to 0.30), 2011 to 2015, comparing neural computing with conventional machine learning algorithms. Annotations: large difference; improving every year.
     • Diagram: feedforward neural network with inputs, weights w_ij and outputs y_1, y_2, ..., y_n, used for learning and inference.
  40. Computing for deeper neural network
     • To achieve higher accuracy, neural networks have become deeper and larger.
     • Processing speed: learning with a deeper neural network is time consuming.
     • Processing capacity: the limited memory size of a GPU is critical for larger neural networks.
     • Chart (neural network size trend): memory size [GB] (0 to 18) versus year (1998 to 2016), comparing GPU memory size (about 16 GB) with neural network size at batch = 8 for LeNet, AlexNet, VGGNet and ResNet.
  41. Fastest learning w/ HPC technology (press release on Aug. 9th 2016)
     • Developed high-speed technology to process deep learning.
     • Using "AlexNet," 64 GPUs in parallel achieve 27 times the speed of a single GPU, for the world's fastest processing.
     • Chart: 1 GPU versus 64 GPUs at the same accuracy. Our approach (64 GPUs): 27x faster learning speed (60x faster execution speed), 1.8x faster than the conventional approach (64 GPUs).
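The 27x figure on 64 GPUs can be restated as a parallel efficiency. The sketch below computes it and, purely as an illustrative model that the presentation does not use, the serial fraction that Amdahl's law would need to produce the same speedup. Real multi-GPU training is limited by communication as well as serial work, so the second number is only a rough equivalent.

```python
def parallel_efficiency(speedup: float, workers: int) -> float:
    """Fraction of ideal linear scaling that was achieved."""
    return speedup / workers


def amdahl_serial_fraction(speedup: float, workers: int) -> float:
    """Serial fraction f such that 1 / (f + (1 - f) / workers) equals speedup."""
    return (workers / speedup - 1.0) / (workers - 1.0)


if __name__ == "__main__":
    eff = parallel_efficiency(27, 64)
    f = amdahl_serial_fraction(27, 64)
    print(f"parallel efficiency: {eff:.1%}")
    print(f"equivalent Amdahl serial fraction: {f:.2%}")
```

The result is an efficiency of about 42%, equivalent under Amdahl's law to a serial fraction of a little over 2%.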
  42. Doubles deep learning neural network scale (press release on Sep. 21st 2016)
     • Developed technology to streamline the use of GPU internal memory, supporting the growing neural network scale that raises machine learning accuracy.
     • Enabled neural network machine learning at up to twice the scale possible with previous technology.
     • Response after press release: “How A New Technology Promises To Make Learning More Powerful Than It Already Is” by Kelvin Murae, Forbes.
     • Chart: conventional versus our approach with the same memory: 2x more images, 4% more accuracy.
  43. Computing Architecture Innovation: Media Processing (Demonstration 2)
  44. Needs for image retrieval
     • Offices routinely create and store numerous documents that contain images, such as presentation materials.
     • The massive stock of stored image materials is not sufficiently reused.
     • In offices, 10% of work time is wasted searching for documents.
     • A more intuitive search method is needed: “Search by image” increases productivity.
  45. Partial image retrieval
     • Find images based on matches with a part of the query image.
     • Diagram: a query image is searched against a massive image DB and returns results that include partial matches and enlarged or reduced images.
     • A general-purpose server takes a long time to process the massive calculations of partial matching.
     • Partial image retrieval must be accelerated to search for a target image intuitively and efficiently.
  46. Image search acceleration system (press release on Feb. 2nd 2016)
     • We develop technology for instantaneous searches of a target image from a massive volume of images.
     • Diagram: a client with a visual, intuitive user interface sends a query by image and receives results ("I found it!"). The server holds the database and the partial image retrieval engine, in which a CPU and an FPGA handle matching, feature extraction, I/O processing and overall control.
  47. Demonstration
  48. Performance
     • Search performance: more than 50 times.
     • Power consumption: less than 1/30†.
     • Cubic volume of space: less than 1/50†.
     • † For equivalent search performance.
     • Throughput: conventional server 200 images/sec; media domain specific server 12,000 images/sec (more than 50 times).
     • “Search by image” makes document creation more productive.
  49. Device Innovation
  50. Device innovations for beyond Moore’s law
     • Novel packaging technology: System in Package to be replaced by new and different types of integration and scaling, such as 2.5D integration with an interposer and 3D stacked ICs.
     • Beyond CMOS: new technology that may take the place of silicon CMOS technology. New channel materials: compound semiconductors, graphene and CNTs (carbon nanotubes). New principle devices: tunneling FET, spin FET, Mott FET, …
     • Device innovation accelerates further innovation in architecture.
  51. Summary
  52. Computing architecture innovations
     • The demand for computing performance is unlimited.
     • We will continue to innovate computing architecture and penetrate new applications with data explosion.
     • Diagram: penetration versus computing paradigm shift, with vector, graphics processing, accelerators, neural computing, brain-inspired computing and quantum computers.
