Stronger than its weakest link
High Integrity Software Conference (HIS'15)
Pdf & SlideCast @ http://ianp24.blogspot.com
Opinions expressed are my own ...
Prof. Ian Phillips
Principal Staff Engineer
Visiting Prof. at ...
Industry Award 2008
High Integrity Software !?
"The scientific method assumes that a system with perfect integrity yields a singular
extrapolation within its domain that one can test against observed results" (Wikipedia)
§ Is Software the weakest link in
High Integrity Systems ?
§ Such that improving it is all that's
necessary to produce High
§ When we say Software, are we
actually thinking Computation?
§ But Computation is about results not about
We know what Proper Computing is...
§ HPC and Mainframe ... maybe Workstation
§ But not really Laptop or ... (Heaven forbid) a Pocketable?
Graham's Orrery - c1700
§ A machine to Compute the position of the planets
§ Single-Task, Continuous Time, Analogue, Mechanical, Computer (With backlash!)
George Graham. Clock-Maker (1674-1751)
Amsler’s Planimeter - c1856
Planimeter 2015 !
§ A Machine for Computing the Area of an arbitrary 2D shape
§ Technology: Precision Mechanics, Analogue
§ Available today ... Electronically enhanced
Jakob Amsler-Laffon. Mathematician,
physicist, engineer (1823-1912)
§ State (s) and Time (t) are implicit or explicit variables in this
§ And so are Accuracy (a), Reliability (r) and Cost ($)
§ All of which can be balanced (Architected) to meet End-Customer needs
§ Exceeding needs almost always 'costs' more!
... Technologies and Methodologies just offer 'star$' options over basic functionality
... Not all of which will be commercially valuable
Computing is solving a Model of a Subset of Reality ...
Fast enough to be useful and affordable by its customer
Digital Electronics Changed the Computation Game ...
2012: Nvidia’s Tegra 3 Processor Unit (Around 1B transistors)
NB: The Tegra 3 is similar to the Apple A4
§ The System is perceived at its Human Interface
§ Though the actual interface is (usually) relatively dumb
§ And its Compute Engine is almost always remote (and may be shared)
The Invisible Face of Computing Today
Unrecognised but Vital ... All need to be Dependable
The Visible Face of Computing Today
Essential but not Vital ... But BIG-BIG-BIG $
§ Digital Electronics
§ Analogue Electronic
§ Discharge Tube
§ Robotic Assembly
§ Plastic, Metal, Glass
Input: Image(Light) => Compute (Process Image) => Output: SD Card (Electrons)
... Many Technologies seamlessly cooperating, to Enhance Human Memory
... Traditional silos (inc. SW and HW) are just a means to this end!
Electronic System (Cyber-physical System) - c2015
Incorporating DIGIC5+ (ARM)
Computing for the Masses ...
... Technology Products are Increasingly ‘Intelligent’
[Figure: timeline, 1970 to 2030. Labels: "... for specific tasks"; "Computing as part of our lives"; "Technology is the Driver"; "Consumer is the Driver"]
... Old Markets are still there; but don't drive the Technology today!
Typical 2015 Computing Platform ...
... is just 137.2 x 70.5 x 5.9 mm
Typical 2015 Computing Platform
Eight 32 bit CPUs (big.LITTLE):
• Four big (2.1GHz ARM A15) for
• Four small (1.5GHz ARM A7)
for lighter tasks.
+ Nine Mali GPU cores ...
... A ~30 Core Heterogeneous Multi-Processor ... In your Shirt Pocket!
One Board ...
21 significant ‘Chips’
2010: Apple’s A4 SIP Package (Cross-section)
IC Packaging Technology
§ The processor is the centre rectangle. The silver circles beneath it are solder balls.
§ Two rectangles above are RAM die, offset to make room for the wirebonds.
§ Putting the RAM close to the processor reduces latency, making RAM
faster, and reduces power consumption ... But increases cost.
§ Memory: Unknown
§ Processor: Samsung/Apple (ARM Processor)
§ Packaging: Unknown (SIP Technology)
Source ... http://www.ifixit.com
Processor SOC Die
2 Memory Dies
Steve Jobs WWDC 2010
2013: Samsung Solid-State Memory
§ Smart Memory (eMMC)
§ 16-128Gb in a single package
§ 8Gb/die. Stacked 2-16 die/package
§ Handles errors in the API (Smart Interface)
§ Package just 1.4mm thick! (11.5x13x1.4mm)
... Smaller than a postage stamp
Moore’s Law: Increasing Design Challenge...
§ They sell things that Their Customers desire and can afford
§ To satisfy the End-Customer's needs ... In an End-Product which may be several ‘layers’ above them.
§ Focus on their Core Competencies as a Component Provider in a Global Market
§ Avoid Commoditisation by Differentiation
§ Improved Cost and Quality (by improving Process) ..and..
§ Improved Business-Models (which make the Money) ..and..
§ Improved Functionality (by new Technology and Methods)
§ But New Product Development is a Cost and a Risk to be Minimised
§ Technology (HW, SW, Mechanics, Optics, Graphene, etc) just enables Options!
§ New-Technology may cost more (including risk) than it delivers in Product Value!
§ Over-Design costs ... Business can't afford the Precautionary Principle!
... Because successful End-Products fund their entire (RD&I) Value-Chains
... Reuse of their Technologies becomes an economic necessity in other markets!
Computing Technologies in Business Context
Businesses have to be Competitive, Money Making Machines today ...
Component and Sub-Systems from Global Enterprise ...
... Global Teams contributing Specialist Knowledge & Knowhow
§ Apple ID’d 159 Tier-1 Suppliers ...
§ Thousands of Engineers Globally
§ Est. 10x Tier-2 Suppliers ...
§ Including Virtual Components (1) and
Sub-Systems (ARM and other IP Providers)
§ Multiple Technologies ...
§ Hardware, Software, Optics,
Mechanics, Acoustics, RF, Plastics, etc
§ Manufacturing, Test, Qualification,
§ Methods, Tools, Training, etc
§ Tens of thousands Engineers Globally
... More than 90% of Technology and
Methods are Reused (productivity)!
1: Virtual Components do not appear on BOM
§ But the only way to economically realise this potential is by product evolution;
reusing and reusing again the work of our technical predecessors ...
§ Hardware, Software and other Technologies; Methods and Tools; and throughout the stack
§ In-Company: Sourced and Evolved from Predecessor Products
§ Ex-Company: Sourced from businesses with Specialist Knowledge/Experience
§ Reuse Improves Quality; as objects are designed more carefully, and bug-fixes are incremental
§ Reuse Improves Productivity; as objects can be deployed without understanding their implementation
technology (or its limitations)
... It delivers working systems quickly with finite teams; but the dependability cannot be quantified!
... Despite this, Commercial Technologies will be used in Systems on which people Depend
§ The cost of alternatives will be several orders of magnitude too great
§ The issue is (just) making dependable systems using undependable components
Designer Productivity has become the Limiting Factor
The Customer Expectation of the Billions of available Transistors is irresistible!
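One long-established way of "making dependable systems using undependable components" is redundancy with majority voting. The sketch below is illustrative only and is not taken from the slides: the replica functions `good` and `faulty`, and the helper names, are hypothetical. It assumes replicas fail independently, which reused components sharing a common bug will not.

```python
from collections import Counter
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def majority_vote(results: Sequence[T]) -> T:
    """Return the value reported by a strict majority of replicas, else raise."""
    value, count = Counter(results).most_common(1)[0]
    if count * 2 <= len(results):
        raise RuntimeError("no majority: replicas disagree")
    return value


def run_redundant(replicas: Sequence[Callable[[int], int]], x: int) -> int:
    """Run every replica on the same input and vote on the outputs."""
    return majority_vote([replica(x) for replica in replicas])


def good(x: int) -> int:
    return x * x


def faulty(x: int) -> int:
    return x * x + 1  # hypothetical component with a residual bug


# Two good replicas out-vote the faulty one.
result = run_redundant([good, faulty, good], 7)
print(result)
```

The voter itself is a single point of failure, so this only moves the dependability problem up a level rather than removing it.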
ARM: Delivers Reuse-Based Productivity ...
.... 24 Processors in 6 Families for different Application Domains
...Tools to create optimal Heterogeneous Multi-Processors ...
[Block diagram labels: NIC-400 Network Interconnect; 8-16MB L3 cache; CoreLink™ CCN-504 Cache Coherent Network; IO Virtualisation with System MMU; Up to 4 cores; Up to 4 ...; Up to 18 AMBA ...; Peripheral address space; Heterogeneous processors: CPU, GPU, DSP and ...]
… Other Tools, Libraries and Partners to Realize the Potential
§ Technology to build Electronic System solutions:
§ Software, Drivers, OS-Ports, Tools, Utilities to create
efficient system with optimized software solutions
§ Diverse Physical Components, including CPU and GPU
processors designed for specific tasks
§ Interconnect System IP delivering coherency and the
quality of service required for lowest memory bandwidth
§ Optimised Cell-Libraries for a highly optimized SoC
§ Well Connected to Partners in the Life-Cycle:
§ For complementary tools and methods required by
§ Global Technology Global Partners:
§ >900 Licences; Millions of Developers
Are the Outcomes of this 'chain' Dependable?
Evidently so: They are Functional and Dependable enough to satisfy Billions/yr!
Smart-Phone shipments 2Q15 - 185 million (~0.75B/yr)
... The probability of a 'fairly reliable' system failing, when you need to use it
for an 'improbable' event, is 'highly improbable' ... And mostly this is enough
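The "highly improbable" claim is a product of two small probabilities. A minimal sketch, assuming the system's failure and the demand for it are independent; both numbers below are made up for illustration and do not come from the slides.

```python
def failure_on_demand(unavailability: float, demand_probability: float) -> float:
    """Probability that the system is needed AND is failed at that moment.

    Assumes the two events are independent, which is an optimistic
    assumption: the 'improbable' event may itself stress the system.
    """
    return unavailability * demand_probability


# Hypothetical figures: failed 0.1% of the time, needed 0.01% of the time.
p = failure_on_demand(unavailability=1e-3, demand_probability=1e-4)
print(f"P(failed when needed) ~ {p:.1e}")
```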
HW1, HW2, HW3, HW4
Create Functional-Model (1) on a 'Generic' Platform
Evolving the Model (& Platform) until Functional
and Non-Functional, Performance is Adequate.
NOTE: 'Final SW' is still a Model of Behaviour!
Design is Transforming a Model of Behaviour ...
... evolving a Mathematical Model to meet Non-Functional Constraints
Transform to a Functional-Model on an 'Optimal' (HW/SW) Platform
1: This includes a Model of Execution such as a Java VM.
§ All models are a simplification of reality; therefore they all have limitations
§ "All models are wrong, but some are useful" (G.E.P. Box)
§ Normal Software Design Methods are create-it-wrong, test-it-right ...
§ Quality is established by Test; and bug-fixes/patches in the field (An inherently poor method)
§ Software Reuse offers hugely improved Productivity (Not-using it is not an option)
§ Software Reuse offers improved Quality (But over what?)
§ Examination shows that all code has high residual errors ...
§ Well structured and tested Source-Code has ~5 errors per 1,000 lines of code (E-KLOC)
§ Commercial code is typically ~5x worse than this
§ Most errors are harmless – But there is no useful correlation
§ Formal-Methods are better; but cost is high if you can't utilise (normal) legacy code.
§ But Even 'Perfect-Software' still has to execute on an Imperfect-Platform
... "YES!": But Good-Enough satisfies the Commercial Imperative for most applications
Is Software (Logic) Inherently Undependable?
Software is a Model of Reality, executing on a Hardware and Software Platform
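To put the E-KLOC figures in scale, the sketch below applies the slide's two densities (~5 errors per KLOC for well-structured, tested code, and ~5x that for commercial code) to a code base. The one-million-line size is a hypothetical example, not a figure from the slides.

```python
def residual_defects(lines_of_code: int, errors_per_kloc: float) -> float:
    """Expected residual errors for a code base at a given error density."""
    return lines_of_code / 1000 * errors_per_kloc


WELL_TESTED_E_KLOC = 5.0                     # ~5 errors per KLOC (from the slide)
COMMERCIAL_E_KLOC = 5.0 * WELL_TESTED_E_KLOC  # "typically ~5x worse" (from the slide)

loc = 1_000_000  # hypothetical code-base size

print(f"well tested: ~{residual_defects(loc, WELL_TESTED_E_KLOC):.0f} residual errors")
print(f"commercial:  ~{residual_defects(loc, COMMERCIAL_E_KLOC):.0f} residual errors")
```

The count says nothing about which errors matter; as the slide notes, most are harmless but there is no useful correlation.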
Open Source is Dependable?
"Somebody will see the bugs!" (But only if they look!)
“It is now very clear that
OpenSSL development could
benefit from dedicated full-time,
properly funded developers”
“OSF typically receives only
$2,000 a year in donations”
§ OpenSSL Heartbleed bug (2014) 1
§ Update was received just before a Public Holiday
§ Editor was a known and high-quality source
§ Code was reviewed informally and released
§ Editor was conflicted with day-job, family and holiday pressure 2
§ Too few resources to do a proper job.
§ This was a classic E-KLOC error ...
§ Not a Coding, Formatting, or Functional error
§ It was a System error (an omission in a non-functional aspect of the code).
... Was the ‘fault’ with the software Source (OpenSSL Software Foundation (OSF)) ?
... Or a User Community too-ready to believe in the Myth of Open Source software?
§ Boolean Mathematics (HDL) is Dependable; but implementation depends on reliably
mapping its equations to the physical world through Logic-Gates
§ A Gate is a Saturated Analogue circuit; with Non-Functional attributes.
§ CMOS has been a 'reliable' Boolean mapping for 30 years, but ...
§ Today’s 20nm transistors (14nm soon) have larger variability,
and there are many more on a chip (Typically 1B in 2014)
§ At 70degC, Vtn=130mV (sigma ~25mV), around 1 in 5 million
transistors have Vt<0 (Can’t be turned off)
§ So that’s >100 transistors/chip that don’t switch off
§ And there's another >100 that only turn-on weakly (low drive/slow)
§ This is intrinsic (atomic), so will always be randomly located!
... "NO!": Today’s chips shouldn’t work! (So why do they?)
So is Hardware (Logic) Dependable? 1/3
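The Vt<0 estimate can be sanity-checked with a simple tail calculation. This is a sketch under one loud assumption: that threshold voltage is normally distributed, which the slide does not state. With the slide's mean (130mV) and sigma (~25mV), zero volts sits 5.2 sigma below the mean, and the pure Gaussian tail comes out within a small factor of the slide's "around 1 in 5 million"; real distributions need not be Gaussian that far out.

```python
import math


def prob_below(x: float, mean: float, sigma: float) -> float:
    """P(X < x) for X ~ Normal(mean, sigma), via the complementary error function."""
    return 0.5 * math.erfc((mean - x) / (sigma * math.sqrt(2.0)))


VT_MEAN_V = 0.130                      # Vtn at 70 degC (from the slide)
VT_SIGMA_V = 0.025                     # sigma (from the slide)
TRANSISTORS_PER_CHIP = 1_000_000_000   # "typically 1B in 2014" (from the slide)

# Probability that a transistor's threshold is below 0 V (cannot be turned off).
p_always_on = prob_below(0.0, VT_MEAN_V, VT_SIGMA_V)
expected_per_chip = p_always_on * TRANSISTORS_PER_CHIP

print(f"P(Vt < 0) ~ {p_always_on:.2e}")
print(f"expected always-on transistors per chip ~ {expected_per_chip:.0f}")
```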
Mitigating this we have ...
§ Weak Transistors: Not all ...
§ Are at 70 degC even if the die is (But some will be higher)
§ Are Minimum Size (Larger ‘area’ reduces variability)
§ Are on Critical Paths; and the probability of there being more than one on a path is low!
§ CMOS Logic: Is very robust and will continue to function with out-of-spec transistors
§ Leaky Gates and Faster Transitions are seldom functional failures (but they do hit reliability!)
§ Speed variations on a path average out (on average!)
§ Errors are frequently difficult to detect (and thus correct!)
§ Memory: Analogue Circuits are much more sensitive to transistor variation. But ...
§ Failures are easier to detect (and work around)
§ Spare rows/columns are included to fix manufacturing (static) defects ... but not dynamic (use)
§ NV-M limited write-cycles and bit failures are shielded by their smart API ... to some degree.
... Hardware failure is not always easily spotted at the functional level!
So is Hardware (Logic) Dependable? 2/3
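The claim that more than one weak transistor on a single path is unlikely can be illustrated with a binomial model. This sketch assumes weak transistors are independent and randomly located (as the previous slide argues) and uses the "1 in 5 million" rate quoted there; the path length of 100 transistors is a hypothetical value chosen for illustration.

```python
import math


def prob_at_least(k_min: int, n: int, p: float) -> float:
    """P(at least k_min 'weak' devices among n), each weak independently with probability p."""
    return sum(
        math.comb(n, k) * p**k * (1.0 - p) ** (n - k)
        for k in range(k_min, n + 1)
    )


P_WEAK = 1 / 5_000_000   # "around 1 in 5 million" (from the slides)
PATH_TRANSISTORS = 100   # hypothetical number of transistors on one path

p_one_or_more = prob_at_least(1, PATH_TRANSISTORS, P_WEAK)
p_two_or_more = prob_at_least(2, PATH_TRANSISTORS, P_WEAK)

print(f"P(>=1 weak transistor on the path) ~ {p_one_or_more:.2e}")
print(f"P(>=2 weak transistors on the path) ~ {p_two_or_more:.2e}")
```

Summing the binomial terms directly avoids the cancellation error of computing 1 minus two nearly equal numbers.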
§ And we haven't included imponderables ...
§ Internally and Externally generated noise? (Greater susceptibility at lower voltages)
§ High-energy particles? (Greater susceptibility at smaller geometries)
§ Wear-out: Vt/Gain drift and Electromigration? (Greater susceptibility at smaller geometries)
§ Local Hot-Spots? (140C is not uncommon on chip)
§ Limitations of Verification and Test (State-Space exploration is always a sub-set)
§ We are repeatedly multiplying tiny-improbables, by ever larger-numbers ...
§ And many of the values are only guesses!
§ We have no real idea about the reliability/dependability of modern Systems or Components
§ But we know that as process geometries shrink, Susceptibility will get worse ...
§ Chips will get ever more complex (and more chips will be used in more complex Systems)
§ Transistors will get smaller and Designers will erode safety margins to get performance
... Despite this; Chips and Systems do Yield more than we would rightly expect ...
... So we must be utilising Unknown Safety Margins!
So is Hardware (Logic) Dependable? 3/3
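"Multiplying tiny-improbables by ever larger-numbers" has a simple closed form if failures are assumed independent: the chance of at least one failure among n devices is 1 - (1 - p)^n. In the sketch below the per-device probability is a made-up value (the slide itself says many such values are only guesses); the point is how quickly the result stops being tiny as n grows.

```python
import math


def prob_any_failure(p_each: float, n: int) -> float:
    """P(at least one failure among n independent devices) = 1 - (1 - p_each)**n.

    Uses log1p/expm1 so that very small p_each and very large n stay accurate.
    """
    return -math.expm1(n * math.log1p(-p_each))


P_EACH = 1e-9  # hypothetical per-transistor failure probability (a guess)

for n in (1_000, 1_000_000, 1_000_000_000):
    print(f"n = {n:>13,d}: P(any failure) ~ {prob_any_failure(P_EACH, n):.3g}")
```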
Killing a Sacred Cow: SW and HW Logic are the Same
... They have different characteristics, so choice is a System Architectural decision!
// A master-slave type D-Flip Flop
module flop (data, clock, clear, q, qb);
  input data, clock, clear;
  output q, qb;
  // primitive #delay instance-name
  // (output, input1, input2, .....),
  nand #10 nd1 (a, data, clock, clear),
           nd2 (b, ndata, clock),
           nd4 (d, c, b, clear),
           nd5 (e, c, nclock),
           nd6 (f, d, nclock),
           nd8 (qb, q, f, clear);
  nand #9  nd3 (c, a, d),
           nd7 (q, e, qb);
  not  #10 inv1 (ndata, data),
           inv2 (nclock, clock);
endmodule
'Hardware' Language (Verilog) 'Software' Language (C)
/* Use the PC's timer to check */
/* processing time */
printf("input loop count: ");
time = clock();
deltime = clock() - time;
secs = (float) deltime/CLOCKS_PER
printf("for %ld loops, #tics = %
[Figure labels: CMOS to CPU; Target Architecture Info; HW to SW]
§ By the time you are writing Applications you are
hugely dependent on the layered-accuracy of other
people's work beneath
... Both Hardware and Software
So whilst Boolean Mathematics is Absolute ...
... all implementations of it are not
§ We Can’t Design them Right
§ HW is SW; and Coding errors remain. State-space too big for simulation
exploration. Can’t model or explore whole Systems and they are too
complex for Formal methods. Reuse embodies unknown bugs.
§ We Can’t Make them Right
§ Chips are subject to Process Imperfections and Variability. Chips and
Systems are subject to Verification and Test Escapes. Boolean math
is absolute; logic cells and real layouts are not
§ We Can’t Keep them Right
§ Chips are susceptible to Supply Transients, Wear-Out and High-Energy
particles. Most damage is not immediately obvious.
... And it will all get worse as process geometries shrink
... Yet every year we make Billions of Systems that work!
"The Naysayers are just Harbingers of Doom!"
So Complex Electronic Systems are Impossible!
§ System-Level Dependability is what matters ...
§ Component and Sub-System dependability is inherently poor (and will get worse).
§ Productivity demands that Dependable Systems must Reuse Components and
Sub-Systems (Physical and Virtual); and the affordable ones are of Commercial quality!
§ Clean-Sheet design is not an option for almost all complex products!
... the cost-is-no-object customer is an endangered species
§ Increasing the Dependability of Components and Sub-Systems helps; but can never be enough
§ ARM's product is really: 'Enhanced Reuse for Electronic System Design and Manufacture'
... The Only Place to implement System-Level Dependability on an Undependable
Platform, is at the System-Layer!
§ Reliable components and sub-systems will help, but cannot ever be enough
§ Predominantly a 'Software' challenge; but not alone (Don't forget the simple Watch-Dog)
Dependable on Undependable
Any Methods that are based on perfection in HW or SW are untenable ...
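The "simple Watch-Dog" mentioned above can be sketched in a few lines. This is a software illustration only: a real watchdog is normally an independent hardware timer that resets the system, so that it still works when the software it guards has hung. The class and method names are hypothetical, and the clock is injectable so the behaviour can be tested without waiting.

```python
import time
from typing import Callable


class Watchdog:
    """Reports expiry if the supervised task has not 'kicked' it within timeout_s."""

    def __init__(self, timeout_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._timeout_s = timeout_s
        self._clock = clock
        self._last_kick = clock()

    def kick(self) -> None:
        """Called by the supervised task to show it is still making progress."""
        self._last_kick = self._clock()

    def expired(self) -> bool:
        """True once more than timeout_s has passed since the last kick."""
        return self._clock() - self._last_kick > self._timeout_s


# Usage with a fake clock (a one-element list holding the current time).
now = [0.0]
wd = Watchdog(timeout_s=1.0, clock=lambda: now[0])
now[0] = 0.9
wd.kick()                # task reports in at t = 0.9
now[0] = 2.0
print(wd.expired())      # no kick since t = 0.9, timeout is 1.0
```

What to do on expiry (reset, fail over, stop safely) is a system-level decision; the watchdog only detects the loss of progress.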
The Real Conclusions
§ Systems are what End-Customers buy; they expect them to be Dependable Enough
§ A subjective concept; which is Application, State and Context dependent (& Technology independent)
§ Commercial Components (HW/SW) will be the building blocks of Dependable Systems
§ Commercial use gives us the Technologies which we are economically bound to use today
§ Though they work better than we would rightly expect, we cannot quantify their quality
§ Improving their Quality/Reliability/Dependability helps; but 100% is an asymptotic goal!
§ The System Knows what the System Wants
§ So: System behaviour and robustness must be handled at the System-Level (Top-Level);
only it can know the expected action and appropriate corrective action for its domain.
§ And: Because of the size of the Functional and Non-Functional Space, conformance cannot be
measured; so it will require a Policy Based approach.
... Meanwhile systems that people depend on will be produced
... The Commercial Imperative can’t/won't wait for the 'right methodology'
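One possible reading of a "Policy Based approach" at the System-Level is a table that maps observed faults to corrective actions, with a conservative default for anything unforeseen. The sketch below is an interpretation, not the author's method: the fault names, actions and retry limit are all hypothetical.

```python
from enum import Enum


class Action(Enum):
    RETRY = "retry"
    DEGRADE = "degrade"      # continue with reduced functionality
    SAFE_STOP = "safe_stop"  # move to a known safe state


# Hypothetical system-level policy: only the system knows what is appropriate
# for its own domain, so the table lives at the top level, not in components.
POLICY = {
    "sensor_timeout": Action.RETRY,
    "checksum_mismatch": Action.DEGRADE,
}


def decide(fault: str, attempts: int, max_retries: int = 2) -> Action:
    """Choose a corrective action; unknown faults get the most conservative one."""
    action = POLICY.get(fault, Action.SAFE_STOP)
    if action is Action.RETRY and attempts >= max_retries:
        return Action.DEGRADE  # retries exhausted, stop repeating the same thing
    return action


print(decide("sensor_timeout", attempts=0))
print(decide("sensor_timeout", attempts=2))
print(decide("never_seen_before", attempts=0))
```

The default branch matters most: conformance cannot be measured across the whole space, so the policy has to say what happens for faults nobody listed.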
The END Is Very Nigh ...
Pdf & SlideCast through http://ianp24.blogspot.com