HIS 2015: Prof. Ian Phillips - Stronger than its weakest link

HIS 2015

  1. Stronger than its weakest link
     High Integrity Software Conference (HIS'15), 5 Nov 2015, Bristol. PDF & SlideCast @ http://ianp24.blogspot.com
     Prof. Ian Phillips, Principal Staff Engineer, ARM Ltd (ian.phillips@arm.com). Visiting Prof. at ... Contribution to Industry Award 2008. (2v0)
     Opinions expressed are my own ...
  2. High Integrity Software !? ..Or..
     "The scientific method assumes that a system with perfect integrity yields a singular extrapolation within its domain that one can test against observed results" (Wikipedia)
     • Is Software the weakest link in High Integrity Systems?
     • Such that improving it is all that's necessary to produce High Integrity Systems?
     • When we say Software, are we actually thinking Computation?
     • But Computation is about results, not about implementation technologies!
  3. We know what Proper Computing is ...
     • HPC and Mainframe ... maybe Workstation
     • But not really Laptop or ... (Heaven forbid) a Pocketable?
  4. Graham's Orrery - c1700
     • A machine to Compute the position of the planets
     • Single-Task, Continuous-Time, Analogue, Mechanical Computer (with backlash!)
     George Graham, Clock-Maker (1674-1751)
  5. Amsler's Planimeter - c1856
     • A Machine for Computing the Area of an arbitrary 2D shape
     • Technology: Precision Mechanics, Analogue
     • Available today ... Electronically enhanced (pictured: Planimeter 2015!)
     Jakob Amsler-Laffon, Mathematician, physicist, engineer (1823-1912)
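A mechanical planimeter evaluates a line integral around the boundary of a shape (Green's theorem). For a shape approximated as a polygon, the digital counterpart is the shoelace formula. A minimal C sketch, assuming the vertices are supplied in order and the polygon does not self-intersect; `polygon_area` is a hypothetical name, not something from the talk:

```c
#include <math.h>
#include <stddef.h>

/* Area of a simple (non-self-intersecting) polygon by the shoelace formula,
 * the discrete form of the boundary integral a planimeter evaluates
 * mechanically. Vertices must be listed in order (either direction). */
double polygon_area(const double *x, const double *y, size_t n)
{
    double twice_area = 0.0;
    for (size_t i = 0; i < n; i++) {
        size_t j = (i + 1) % n;          /* next vertex, wrapping back to the first */
        twice_area += x[i] * y[j] - x[j] * y[i];
    }
    return fabs(twice_area) / 2.0;       /* the sign only encodes traversal direction */
}
```

For the triangle (0,0), (4,0), (0,3) it returns 6, whichever direction the vertices are listed in.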
  6. Computing is solving a Model of a Subset of Reality ... Fast enough to be useful and affordable by its customer
     [Diagram: IN (x): Enumerated Phenomena -> y = F(x) -> OUT (y): Processed Data/Information]
     • State (s) and Time (t) are implicit or explicit variables in this
     • And so are Accuracy (a), Reliability (r) and Cost ($)
     • All of which can be balanced (Architected) to meet End-Customer needs
     • Exceeding needs almost always 'costs' more!
     ... Technologies and Methodologies just offer 'star$' options over basic functionality
     ... Not all of which will be commercially valuable
     So: y = F(x, s, t, a, r, $)
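To make y = F(x, s, t, ...) concrete, the sketch below passes state and time explicitly, using a first-order low-pass filter purely as a stand-in model (the talk names no particular F). `lowpass_t` and `lowpass_step` are hypothetical names, and the forward-Euler update is one illustrative accuracy/cost trade-off, not a recommended design:

```c
/* Hypothetical stand-in model: state (s) is carried between calls,
 * time (t) enters through the sample interval dt. */
typedef struct {
    double s;      /* filter state: the model's memory of past inputs */
    double tau;    /* time constant, in seconds */
} lowpass_t;

/* One evaluation of y = F(x, s, t): updates the state and returns the output.
 * Forward-Euler step; its accuracy (the slide's 'a') depends on dt being small
 * relative to tau, which is traded against the number of evaluations ('$'). */
double lowpass_step(lowpass_t *f, double x, double dt)
{
    f->s += (dt / f->tau) * (x - f->s);
    return f->s;
}
```

A smaller dt buys accuracy at the price of more evaluations; a dt larger than 2*tau makes this particular update unstable.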
  7. Digital Electronics Changed the Computation Game ...
     [Chart: Moore's law (after ITRS'99): approximate process geometry from 100um down to 10nm, Transistors/Chip (M) and Transistors/PM (K). Source: http://en.wikipedia.org/wiki/Moore’s_law]
  8. 2012: Nvidia's Tegra 3 Processor Unit (around 1B transistors)
     NB: The Tegra 3 is similar to the Apple A4
  9. Computing Systems
     • The System is perceived at its Human Interface
     • Though the actual interface is (usually) relatively dumb
     • And its Compute Engine is almost always remote (and may be shared)
  10. The Invisible Face of Computing Today
      Unrecognised but Vital ... All need to be Dependable
  11. The Visible Face of Computing Today
      Essential but not Vital ... But BIG-BIG-BIG $
  12. Electronic System (Cyber-physical System) - c2015, incorporating DIGIC5+ (ARM)
      Input: Image (Light) => Compute (Process Image) => Output: SD Card (Electrons)
      Technologies: Digital Electronics; Software; Memory; Optics; Analogue Electronics; Sensors/Transducers; Mechanics; Micro-Motors; Displays; Discharge Tube; Robotic Assembly; Plastic, Metal, Glass
      ... Many Technologies seamlessly cooperating, to Enhance Human Memory
      ... Traditional silos (inc. SW and HW) are just a means to this end!
      [Diagram labels: System-Level Computation vs. 'Classic' Computer]
  13. Computing for the Masses ... Technology Products are Increasingly 'Intelligent'
      [Chart: millions of units vs. human population, 1970-2030: Main Frame, Mini Computer, Personal Computer, Desktop Internet, Mobile Internet]
      • 1st Era: Select work-tasks
      • 2nd Era: Broad-based computing for specific tasks
      • 3rd Era: Computing as part of our lives
      From "Technology is the Driver" to "Consumer is the Driver"
      ... Old Markets are still there, but don't drive the Technology today!
  14. Typical 2015 Computing Platform ...
      ... is just 137.2 x 70.5 x 5.9 mm
  15. Typical 2015 Computing Platform: Exynos 5422
      Eight 32-bit CPUs (big.LITTLE):
      • Four big (2.1GHz ARM A15) for heavy tasks;
      • Four small (1.5GHz ARM A7) for lighter tasks.
      + Nine Mali GPU cores ...
      ... A ~30-Core Heterogeneous Multi-Processor ... In your Shirt Pocket!
      One Board ... 21 significant 'Chips'
  16. 2010: Apple's A4 SIP Package (Cross-section): IC Packaging Technology
      • The processor is the centre rectangle. The silver circles beneath it are solder balls.
      • The two rectangles above are RAM dies, offset to make room for the wirebonds.
      • Putting the RAM close to the processor reduces latency, making RAM faster, and reduces power consumption ... but increases cost.
      • Memory: Unknown
      • Processor: Samsung/Apple (ARM Processor)
      • Packaging: Unknown (SIP Technology)
      [Cross-section labels: Processor SOC Die; 2 Memory Dies; Glue; Memory 'Package'; 4-Layer Platform 'Package'. Source: http://www.ifixit.com. Photo: Steve Jobs, WWDC 2010]
  17. 2013: Samsung Solid-State Memory
      • Smart Memory (eMMC)
      • 16-128Gb in a single package
      • 8Gb/die, stacked 2-16 dies/package
      • Handles errors in the API (Smart Interface)
      • Package just 1.4mm thick! (11.5 x 13 x 1.4 mm)
      ... Smaller than a postage stamp
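One way a 'smart' memory can handle errors behind its interface is an error-correcting code applied inside the device. The sketch below uses Hamming(7,4), the simplest single-error-correcting code, purely for illustration: the slide does not say which code this eMMC uses, and flash controllers typically use far stronger codes. The function names are hypothetical:

```c
#include <stdint.h>

/* Bit i of a codeword holds Hamming position i+1 (positions 1..7);
 * parity bits sit at positions 1, 2 and 4, data bits at 3, 5, 6 and 7. */
static unsigned bit(unsigned v, unsigned i) { return (v >> i) & 1u; }

/* Hypothetical encoder: 4 data bits in, 7-bit codeword out. */
uint8_t hamming74_encode(uint8_t nibble)
{
    unsigned d1 = bit(nibble, 0), d2 = bit(nibble, 1),
             d3 = bit(nibble, 2), d4 = bit(nibble, 3);
    unsigned p1 = d1 ^ d2 ^ d4;   /* covers positions 1,3,5,7 */
    unsigned p2 = d1 ^ d3 ^ d4;   /* covers positions 2,3,6,7 */
    unsigned p4 = d2 ^ d3 ^ d4;   /* covers positions 4,5,6,7 */
    return (uint8_t)(p1 | p2 << 1 | d1 << 2 | p4 << 3 | d2 << 4 | d3 << 5 | d4 << 6);
}

/* Hypothetical decoder: corrects any single flipped bit, returns the data.
 * The syndrome is the XOR of the positions of all set bits: 0 for a valid
 * codeword, otherwise the position of the bit in error. */
uint8_t hamming74_decode(uint8_t code)
{
    unsigned syndrome = 0;
    for (unsigned pos = 1; pos <= 7; pos++)
        if (bit(code, pos - 1))
            syndrome ^= pos;
    if (syndrome != 0)
        code ^= (uint8_t)(1u << (syndrome - 1));   /* flip the faulty bit back */
    return (uint8_t)(bit(code, 2) | bit(code, 4) << 1 | bit(code, 5) << 2 | bit(code, 6) << 3);
}
```

The host only ever sees the corrected nibble, which is the sense in which the errors are 'handled in the API'; two simultaneous bit errors would be mis-corrected by this code.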
  18. Moore's Law: Increasing Design Challenge ...
      [Chart: as slide 7 (after ITRS'99), annotated "Verification Gap" with efforts of 1,800py, 8,500py and 100py. Source: http://en.wikipedia.org/wiki/Moore’s_law]
  19. Computing Technologies in Business Context
      Businesses have to be Competitive, Money-Making Machines today ...
      • They sell things that Their Customers desire and can afford
      • To satisfy the End-Customer's needs ... in an End-Product which may be several 'layers' above them.
      • Focus on their Core Competencies as a Component Provider in a Global Market
      • Avoid Commoditisation by Differentiation
      • Improved Cost and Quality (by improving Process) ..and..
      • Improved Business-Models (which make the Money) ..and..
      • Improved Functionality (by new Technology and Methods)
      • But New Product Development is a Cost and a Risk to be Minimised
      • Technology (HW, SW, Mechanics, Optics, Graphene, etc.) just enables Options!
      • New Technology may cost more (including risk) than it delivers in Product Value!
      • Over-Design costs ... Business can't afford the Precautionary Principle!
      ... Because successful End-Products fund their entire (RD&I) Value-Chains
      ... Reuse of their Technologies becomes an economic necessity in other markets!
  20. Component and Sub-Systems from Global Enterprise ...
      ... Global Teams contributing Specialist Knowledge & Know-how
      • Apple ID'd 159 Tier-1 Suppliers ...
      • Thousands of Engineers Globally
      • Est. 10x Tier-2 Suppliers ...
      • Including Virtual Components(1) and Sub-Systems (ARM and other IP Providers)
      • Multiple Technologies ...
      • Hardware, Software, Optics, Mechanics, Acoustics, RF, Plastics, etc.
      • Manufacturing, Test, Qualification, etc.
      • Methods, Tools, Training, etc.
      • Tens of thousands of Engineers Globally
      ... More than 90% of Technology and Methods are Reused (productivity)!
      (1) Virtual Components do not appear on the BOM
  21. Designer Productivity has become the Limiting Factor
      The Customer Expectation of the Billions of available Transistors is irresistible!
      • But the only way to economically realise this potential is by product evolution: reusing, and reusing again, the work of our technical predecessors ...
      • Hardware, Software and other Technologies; Methods and Tools; and throughout the stack
      • In-Company: Sourced and Evolved from Predecessor Products
      • Ex-Company: Sourced from businesses with Specialist Knowledge/Experience
      • Reuse Improves Quality, as objects are designed more carefully and bug-fixes are incremental
      • Reuse Improves Productivity, as objects can be deployed without understanding their implementation technology (or its limitations)
      ... It delivers working systems quickly with finite teams, but the dependability cannot be quantified!
      ... Despite this, Commercial Technologies will be used in Systems on which people Depend
      • The cost of alternatives will be several orders of magnitude too great
      • The issue is (just) making dependable systems using undependable components
  22. ARM: Delivers Reuse-Based Productivity ...
      ... 24 Processors in 6 Families for different Application Domains (ranging from about 50MTr down to about 50KTr)
  23. ... Tools to create optimal Heterogeneous Multi-Processors ...
      [Diagram: example system built around a CoreLink™ CCN-504 Cache Coherent Network]
      • Up to 4 cores per cluster (Quad Cortex-A15 with L2 cache); up to 4 coherent clusters
      • Integrated L3 cache (8-16MB) and Snoop Filter
      • Up to 18 AMBA interfaces for I/O-coherent accelerators and IO
      • Heterogeneous processors: CPU, GPU, DSP and accelerators
      • IO Virtualisation with System MMU; Virtualized Interrupts (Interrupt Control)
      • Uniform System memory: dual-channel DDR3/4 x72 via CoreLink™ DMC-520 controllers (DDR4-3200 PHY)
      • NIC-400 Network Interconnect to peripherals: Flash, GPIO, USB, PCIe, 10-40 GbE, DPI, Crypto, SATA; AHB; Peripheral address space
  24. ... Other Tools, Libraries and Partners to Realize the Potential
      • Technology to build Electronic System solutions:
      • Software, Drivers, OS-Ports, Tools and Utilities to create efficient systems with optimized software solutions
      • Diverse Physical Components, including CPU and GPU processors designed for specific tasks
      • Interconnect System IP delivering coherency and the quality of service required for lowest memory bandwidth
      • Optimised Cell-Libraries for highly optimized SoC implementations
      • Well Connected to Partners in the Life-Cycle:
      • For complementary tools and methods required by System Developers
      • Global Technology, Global Partners:
      • >900 Licences; Millions of Developers
  25. Are the Outcomes of this 'chain' Dependable?
      Evidently so: They are Functional and Dependable enough to satisfy Billions/yr!
      Smart-Phone shipments, 2Q15: 185 million (~0.75B/yr)
      ... The probability of a 'fairly reliable' system failing, when you need to use it for an 'improbable' event, is 'highly improbable'
      ... And mostly this is enough
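The closing remark is a product of two small probabilities. With purely hypothetical numbers (a 1-in-1,000 chance of needing the system for the improbable event, and a 1-in-100 chance of it failing at that moment), and assuming the need and the failure are independent, an assumption that may not hold in practice, the arithmetic is:

```latex
P(\text{fails when needed}) = P(\text{needed}) \times P(\text{fails})
  = 10^{-3} \times 10^{-2} = 10^{-5}
```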
  26. Design is Transforming a Model of Behaviour ...
      ... evolving a Mathematical Model to meet Non-Functional Constraints
      • Create a Functional-Model(1) on a 'Generic' Platform
      • Evolve the Model (& Platform) until Functional and Non-Functional Performance is Adequate
      • Transform to a Functional-Model on an 'Optimal' (HW/SW) Platform
      NOTE: 'Final SW' is still a Model of Behaviour!
      [Diagram: functions F1-F5 mapped onto an 'Optimal' Platform of Processor(s), Bus(es), HW1-HW4, Hardware Interface, RTOS/Drivers and Threads]
      (1) This includes a Model of Execution such as a Java VM.
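The 'evolve the model' flow can be illustrated at small scale: keep an obviously correct functional model as the reference, transform it into a faster implementation, and check that the two agree. A hedged C sketch using population count as a stand-in function (the slide names no specific one); the function names are hypothetical and the check covers only a sub-range of inputs:

```c
#include <stdint.h>

/* Functional model: the obvious, easily reviewed definition. */
unsigned popcount_ref(uint32_t x)
{
    unsigned n = 0;
    for (int i = 0; i < 32; i++)
        n += (x >> i) & 1u;
    return n;
}

/* Transformed model: the same function restructured for speed (SWAR bit
 * tricks). It is still only a model of behaviour, so it is checked. */
unsigned popcount_fast(uint32_t x)
{
    x = x - ((x >> 1) & 0x55555555u);                    /* 2-bit counts */
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);    /* 4-bit counts */
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;                    /* 8-bit counts */
    return (x * 0x01010101u) >> 24;                      /* sum of the bytes */
}

/* Equivalence check over inputs 0..limit-1 only: exhaustive checking of all
 * 2^32 inputs is possible here, but this rarely scales to whole systems. */
int models_agree(uint32_t limit)
{
    for (uint32_t x = 0; x < limit; x++)
        if (popcount_ref(x) != popcount_fast(x))
            return 0;
    return 1;
}
```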
  27. Is Software (Logic) Inherently Undependable?
      Software is a Model of Reality, executing on a Hardware and Software Platform
      • All models are a simplification of reality; therefore they all have limitations
      • "All models are wrong, but some are useful" (G.E. Box)
      • Normal Software Design Methods are create-it-wrong, test-it-right ...
      • Quality is established by Test, and by bug-fixes/patches in the field (an inherently poor method)
      • Software Reuse offers hugely improved Productivity (not using it is not an option)
      • Software Reuse offers improved Quality (but over what?)
      • Examination shows that all code has high residual errors ...
      • Well-structured and tested Source-Code has ~5 errors per 1,000 lines of code (E-KLOC)
      • Commercial code is typically ~5x worse than this
      • Most errors are harmless, but there is no useful correlation
      • Formal-Methods are better, but the cost is high if you can't utilise (normal) legacy code.
      • But even 'Perfect-Software' still has to execute on an Imperfect-Platform ...
      "YES!": But Good-Enough satisfies the Commercial Imperative for most applications
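Scaling the slide's defect densities to a hypothetical 1,000,000-line code base shows what 'high residual errors' means in absolute terms, assuming the density applies uniformly:

```latex
\begin{align*}
N_{\text{err}} &\approx \frac{\text{lines of code}}{1000} \times \text{E-KLOC} \\
\frac{10^{6}}{1000} \times 5 &= 5{,}000 && \text{(well-structured, tested code)} \\
\frac{10^{6}}{1000} \times (5 \times 5) &= 25{,}000 && \text{(typical commercial code, \textasciitilde 5x worse)}
\end{align*}
```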
  28. Open Source is Dependable? "Somebody will see the bugs!" (But only if they look!)
      • OpenSSL HeartBleed bug (2014)(1)
      • Update was received just before a Public Holiday
      • Editor was a known and high-quality source
      • Code was reviewed informally and released
      • Editor was conflicted with day-job, family and holiday pressure(2)
      • Too little resource to do a proper job.
      • This was a classic E-KLOC error ...
      • Not a Coding, Formatting, or Functional error
      • It was a System error (an omission in a non-functional aspect of the code).
      "It is now very clear that OpenSSL development could benefit from dedicated full-time, properly funded developers"
      "OSF typically receives only $2,000 a year in donations"
      ... Was the 'fault' with the software Source (OpenSSL Software Foundation (OSF))?
      ... Or with a User Community too ready to believe in the Myth of Open Source software?
      (1) http://www.wired.com/2014/04/heartbleedslesson/
      (2) http://veridicalsystems.com/blog/of-money-responsibility-and-pride/
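The point that Heartbleed was an omission rather than a coding error can be shown in a few lines. The C sketch below is a simplified illustration, not OpenSSL code: it assumes a hypothetical message layout (1-byte type, 2-byte big-endian claimed length, then payload) and a hypothetical `echo_payload` function. The vulnerable pattern echoed back as many bytes as the sender claimed; the fix is a single check comparing that claim with the bytes actually received:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical layout: [type:1][claimed_len:2, big-endian][payload...] */
#define HDR_LEN 3

/* Copies the payload into out and returns its length, or -1 if the message
 * is malformed. The claimed_len check is the "non-functional" line whose
 * absence lets an echo read past the end of the received data. */
long echo_payload(const unsigned char *msg, size_t msg_len,
                  unsigned char *out, size_t out_cap)
{
    if (msg_len < HDR_LEN)
        return -1;
    size_t claimed_len = ((size_t)msg[1] << 8) | msg[2];
    if (claimed_len > msg_len - HDR_LEN)   /* never trust the sender's length */
        return -1;
    if (claimed_len > out_cap)
        return -1;
    memcpy(out, msg + HDR_LEN, claimed_len);
    return (long)claimed_len;
}
```

Without the first length check the function still passes every test that sends honest messages, which is why this class of error slips through functional review.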
  29. So is Hardware (Logic) Dependable? 1/3
      • Boolean Mathematics (HDL) is Dependable, but implementation depends on reliably mapping its equations to the physical world through Logic-Gates
      • A Gate is a Saturated Analogue circuit, with Non-Functional attributes.
      • CMOS has been a 'reliable' Boolean mapping for 30 years, but ...
      • Today's 20nm transistors (14nm soon) have larger variability, and there are many more on a chip (typically 1B in 2014)
      • At 70degC, Vtn = 130mV (sigma ~25mV): around 1 in 5 million transistors have Vt < 0 (can't be turned off)
      • So that's >100 transistors/chip that don't switch off
      • And there's another >100 that only turn on weakly (low drive/slow)
      • This is intrinsic (atomic), so they will always be randomly located!
      ... "NO!": Today's chips shouldn't work! (So why do they?)
      [Diagram: CMOS NAND gate schematic, inputs A and B, supply +V, output OUT]
  30. So is Hardware (Logic) Dependable? 2/3
      Mitigating this we have ...
      • Weak Transistors: Not all ...
      • Are at 70 degC even if the die is (but some will be higher)
      • Are Minimum Size (larger 'area' reduces variability)
      • Are on Critical Paths; and the probability of there being more than one on a path is low!
      • CMOS Logic: Is very robust and will continue to function with out-of-spec transistors
      • Leaky Gates and Faster Transitions are seldom functional failures (but they do hit reliability!)
      • Speed variations on a path average out (on average!)
      • Errors are frequently difficult to detect (and thus correct!)
      • Memory: Analogue Circuits are much more sensitive to transistor variation. But ...
      • Failures are easier to detect (and work around)
      • Spare rows/columns are included to fix manufacturing (static) defects ... but not dynamic (in-use) ones
      • NV-M limited write-cycles and bit failures are shielded by their smart API ... to some degree.
      ... Hardware failure is not always easily spotted at the functional level!
  31. So is Hardware (Logic) Dependable? 3/3
      • And we haven't included imponderables ...
      • Internally and Externally generated noise? (greater susceptibility at lower voltages)
      • High-energy particles? (greater susceptibility at smaller geometries)
      • Wear-out: Vt/Gain drift and Electro-Migration? (greater susceptibility at smaller geometries)
      • Local Hot-Spots? (140C is not uncommon on chip)
      • Limitations of Verification and Test (State-Space exploration is always a sub-set)
      • We are repeatedly multiplying tiny improbables by ever larger numbers ...
      • And many of the values are only guesses!
      • We have no real idea about the reliability/dependability of modern Systems or Components
      • But we know that as process geometries shrink, Susceptibility will get worse ...
      • Chips will get ever more complex (and more chips will be used in more complex Systems)
      • Transistors will get smaller and Designers will erode safety margins to get performance
      ... Despite this, Chips and Systems do Yield more than we would rightly expect ...
      ... So we must be utilising Unknown Safety Margins!
  32. Killing a Sacred Cow: SW and HW Logic are the Same
      ... They have different characteristics, so choice is a System Architectural decision!
      'Hardware' Language (Verilog):
        // A master-slave type D-Flip Flop
        module flop (data, clock, clear, q, qb);
          input  data, clock, clear;
          output q, qb;
          // primitive #delay instance-name (output, input1, input2, ...),
          nand #10 nd1 (a, data, clock, clear),
                   nd2 (b, ndata, clock),
                   nd4 (d, c, b, clear),
                   nd5 (e, c, nclock),
                   nd6 (f, d, nclock),
                   nd8 (qb, q, f, clear);
          nand #9  nd3 (c, a, d),
                   nd7 (q, e, qb);
          not  #10 inv1 (ndata, data),
                   inv2 (nclock, clock);
        endmodule
      'Software' Language (C):
        #include <stdio.h>
        #include <time.h>
        /* Use the PC's timer to check processing time */
        int main(void)
        {
            clock_t start, deltime;
            long junk, i;
            float secs;
            printf("input loop count: ");
            while (scanf("%ld", &junk) == 1) {
                start = clock();
                deltime = 0;
                for (i = 0; i < junk; i++)
                    deltime = clock() - start;   /* each pass re-reads the clock */
                secs = (float)deltime / CLOCKS_PER_SEC;
                printf("for %ld loops, #tics = %ld, seconds = %f\n", junk, (long)deltime, secs);
                printf("input loop count: ");
            }
            return 0;
        }
      [Diagram: both are compiled onto a Target Platform (CMOS for HW, a CPU for SW), using Target Architecture Info, Compilers and Configuration Files]
  33. So whilst Boolean Mathematics is Absolute ...
      ... all implementations of it are not
      • By the time you are writing Applications you are hugely dependent on the layered accuracy of other people's work beneath ... Both Hardware and Software
      [Diagram: A Software View and A Hardware View of the layered stack]
  34. So Complex Electronic Systems are Impossible!
      • We Can't Design them Right
      • HW is SW, and Coding errors remain. The State-space is too big for exploration by simulation. We can't model or explore whole Systems, and they are too complex for Formal methods. Reuse embodies unknown bugs.
      • We Can't Make them Right
      • Chips are subject to Process Imperfections and Variability. Chips and Systems are subject to Verification and Test Escapes. Boolean math is absolute; logic cells and real layouts are not
      • We Can't Keep them Right
      • Chips are susceptible to Supply Transients, Wear-Out and High-Energy particles. Most damage is not immediately obvious.
      ... And it will all get worse as process geometries shrink
      ... Yet every year we make Billions of Systems that work!
      "The Naysayers are just Harbingers of Doom!"
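Why the state-space is 'too big for exploration by simulation' follows from simple arithmetic: n flip-flops give 2^n states. The sketch assumes a hypothetical simulator visiting a billion states per second; at that rate just 64 state bits take roughly 585 years to enumerate, and a billion-transistor SoC holds vastly more state than that. The function name is hypothetical:

```c
#include <math.h>

/* Years needed to visit every state of a design with n_flops flip-flops
 * (2^n_flops states) at states_per_second simulated states per second. */
double years_to_enumerate(int n_flops, double states_per_second)
{
    const double seconds_per_year = 365.25 * 24.0 * 3600.0;
    return ldexp(1.0, n_flops) / states_per_second / seconds_per_year;
}
```

Each additional flip-flop doubles the time, so faster simulators only shift the limit by a few bits.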
  35. Dependable on Undependable
      Any Methods that are based on perfection in HW or SW are untenable ...
      • System-Level Dependability is what matters ...
      • Component and Sub-System dependability is inherently poor (and will get worse).
      • Productivity demands that Dependable Systems must Reuse Components and Sub-Systems (Physical and Virtual); and the affordable ones are of Commercial quality!
      • Clean-Sheet design is not an option for almost all complex products! ... the cost-is-no-object customer is an endangered species
      • Increasing the Dependability of Components and Sub-Systems helps, but can never be enough
      • ARM's product is really 'Enhanced Reuse for Electronic System Design and Manufacture'
      ... The Only Place to implement System-Level Dependability on an Undependable Platform is at the System-Layer!
      • Reliable components and sub-systems will help, but cannot ever be enough
      • Predominantly a 'Software' challenge, but not alone (Don't forget the simple Watch-Dog)
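The 'simple Watch-Dog' is a classic system-level safeguard: a timer that the healthy system must service periodically, and which triggers recovery if it is not. The C sketch below models that behaviour in software for illustration only; in practice the timer is usually an independent hardware block, and `watchdog_t` and the `wdt_*` functions are hypothetical names:

```c
#include <stdbool.h>

/* Hypothetical software model of a watchdog: the monitored task must "kick"
 * it at least once every timeout_ticks, or it fires. */
typedef struct {
    unsigned timeout_ticks;
    unsigned remaining;
    bool fired;
} watchdog_t;

void wdt_init(watchdog_t *w, unsigned timeout_ticks)
{
    w->timeout_ticks = timeout_ticks;
    w->remaining = timeout_ticks;
    w->fired = false;
}

/* Called by the healthy task to show it is still making progress. */
void wdt_kick(watchdog_t *w) { w->remaining = w->timeout_ticks; }

/* Called from a periodic timer tick; returns true once the watchdog has
 * expired, at which point the system-level recovery path should run. */
bool wdt_tick(watchdog_t *w)
{
    if (w->remaining > 0)
        w->remaining--;
    if (w->remaining == 0)
        w->fired = true;
    return w->fired;
}
```

The watchdog knows nothing about what failed; it only detects that progress stopped, which is why it works even when the fault lies in an undependable component.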
  36. The Real Conclusions
      • Systems are what End-Customers buy; they expect them to be Dependable Enough
      • A subjective concept, which is Application, State and Context dependent (& Technology independent)
      • Commercial Components (HW/SW) will be the building blocks of Dependable Systems
      • Commercial use gives us the Technologies which we are economically bound to use today
      • Though they work better than we would rightly expect, we cannot quantify their quality
      • Improving their Quality/Reliability/Dependability helps, but 100% is an asymptotic goal!
      • The System Knows what the System Wants
      • So: System behaviour and robustness must be handled at the System-Level (Top-Level); only it can know the expected action and appropriate corrective action for its domain.
      • And: Because of the size of the Functional and Non-Functional Space, conformance cannot be measured, so it will require a Policy-Based approach.
      ... Meanwhile, systems that people depend on will be produced ...
      ... The Commercial Imperative can't/won't wait for the 'right methodology'
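A 'Policy-Based approach' could take many forms; one minimal reading is a designer-written table that maps the system's operating mode and the detected fault class to a corrective action, so that the knowledge of 'what the System wants' lives at the top level. The C sketch below assumes hypothetical modes, fault classes and actions (none are from the talk) and shows only the lookup, not fault detection:

```c
/* Hypothetical fault classes, operating modes and corrective actions. */
typedef enum { FAULT_NONE, FAULT_TRANSIENT, FAULT_PERSISTENT, FAULT_COUNT } FaultClass;
typedef enum { MODE_NORMAL, MODE_CRITICAL, MODE_COUNT } OpMode;
typedef enum { ACT_CONTINUE, ACT_RETRY, ACT_DEGRADE, ACT_SAFE_STOP } Action;

/* The policy is data, not code: the system designer, who knows the domain,
 * fills in the corrective action for each (operating mode, fault class) pair. */
static const Action policy[MODE_COUNT][FAULT_COUNT] = {
    /*                 NONE          TRANSIENT    PERSISTENT    */
    [MODE_NORMAL]   = { ACT_CONTINUE, ACT_RETRY,   ACT_DEGRADE   },
    [MODE_CRITICAL] = { ACT_CONTINUE, ACT_DEGRADE, ACT_SAFE_STOP },
};

/* Looks up the action; anything outside the table falls back to the safest one. */
Action choose_action(OpMode mode, FaultClass fault)
{
    if ((unsigned)mode >= MODE_COUNT || (unsigned)fault >= FAULT_COUNT)
        return ACT_SAFE_STOP;
    return policy[mode][fault];
}
```

Keeping the policy as a table lets it be reviewed and changed per application without touching the components it supervises.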
  37. The END Is Very Nigh ...
      PDF & SlideCast through http://ianp24.blogspot.com
