Tackling 400 MHz Timing-Closure for 25/50/100 GbE
Shep Siegel
Atomic Rules LLC
Shepard.Siegel@atomicrules.com
2015-09-22
©2015 Atomic Rules LLC
  
Introduction Transcript 1/3
• “I don’t know of any reason why you would have (timing closure) issues with the V-US fabric at 400 MHz, why don’t you try it and see how it goes?”
  – Gordon Brebner (personal correspondence, Fall 2014)

Introduction Transcript 2/3
• “Sure, sounds great, we are putting our best engineers right on it. The 25/50/100 GbE work you are doing sounds exciting!”
  – Shep Siegel (personal response to Gordon, Fall 2014)

Introduction Transcript 3/3
• { Sound of Impact }
  – Unknown (somewhere in Vivado 2015.1, January 2015)
  
Architecture Matters!
• This talk teases Fmax and Timing Closure
• It is really about how to avoid getting painted into that problem-corner in the first place
• And that requires good Architecture
  
Why Architecture?
• Architecture impacts many aspects of design, timing closure is but one of them
• Architectural choices are Strategic
  – Expensive if you get it wrong
• Domain-Specific-Languages (DSLs) make Architectural Investigation easier than ever
  
What Architecture?
• In what way do you wish to express the design?
  – World of choice of DSLs, legacy RTLs
  – You can mix and match with Vivado IPI
• What choices are you making?
  – Language Choices
    • ‘C’ or other imperative expression
    • Your DSL of choice – What is Appropriate?
    • “Pick and Shovel” – Sometimes a legacy RTL is just fine
  – Device-Centric, Structural Choices
    • SLICEM vs. SLICEL
    • CARRY8 vs. DSP48E2
    • Distributed RAM vs. BRAM
• Are you aware that you are viewing the problem as top-down, bottom-up, or middle-out?
  
How Architecture?
• All this can be overwhelming
• Suggest a divide-and-conquer approach
  – Iterative Refinement is one way
• Don’t delay, start experimenting at once!
  – Small Failures often yield rich insights
  
This Talk – Problem Statement
• The 20 nm UltraScale fabric is fast
• 25/50/100 GbE suggests a natural ~400 MHz
  – Area and Cost concerns to keep packet data paths as narrow and occupied as is practical
• But 400 MHz in a V-US-2 is challenging
  – What can we do to close timing?
  – How do we avoid negative setup slack?
  – How did we close timing with 25GbE UDP/IP?
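
The “natural ~400 MHz” follows from dividing the line rate by the data-path width. A minimal sketch, assuming the 8-byte path the talk mentions for 25 GbE; the 16- and 32-byte widths for 50/100 GbE are illustrative assumptions, and line-coding and inter-packet-gap overheads are ignored:

```python
def min_clock_mhz(line_rate_gbps: float, width_bytes: int) -> float:
    """Clock needed to move line_rate_gbps over a width_bytes-wide path, one word per cycle."""
    bits_per_cycle = width_bytes * 8
    return line_rate_gbps * 1e3 / bits_per_cycle  # Gb/s -> Mb/s, divided by bits per word


if __name__ == "__main__":
    # (line rate in Gb/s, assumed data-path width in bytes)
    for rate, width in [(10, 8), (25, 8), (50, 16), (100, 32)]:
        print(f"{rate:>3} GbE, {width:>2} B path: {min_clock_mhz(rate, width):.3f} MHz")
```

With these widths the 25/50/100 GbE cases all come out at 390.625 MHz, and the 10 GbE case at 156.25 MHz, the under-clocked rate used later in the talk.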
  
Olivebridge Manifesto
• AR First-Mover/Early-Adopter in 25 GbE IP
  – Have Product ready to meet market needs
• L2/L3/L4 Packet Processing at Line Rates
  – Ethernet 802.3 / Internet Protocols at 25 Gbps
• 400 MHz Fabric Operation on 20 nm FPGAs
  – Requires Specialized Circuit and Physical Design
• 2.5x Under-Clocking for 10 GbE on 28 nm
  – Broader Market While 25 Gb Adoption Grows
  
Olivebridge Focus
• UDP/IP Datagram Service for 10 and 25 GbE
  – Well-defined function and interfaces
  – Serve existing customer needs in 10 GbE space
  – Be the early-to-market in nascent 25 GbE space
  – Be well positioned with 50/100 GbE variations
• L2 802.3 Packet Validation
  – Initially use FPGA Vendor MAC/PCS/PMA/PHY IP
  – Self-Synchronizing Generators, Mungers, Checkers
  
Olivebridge Platforms (partial)
• BittWare A5PL – 28 nm Altera Arria V GZ
• BittWare S5PEDS – 28 nm 2x Altera Stratix V
• Xilinx ZC706 – 28 nm Xilinx Zynq-7
• Xilinx KCU105 – 20 nm Xilinx Kintex-UltraScale (K-US)
• Xilinx VCU107 – 20 nm Xilinx Virtex-UltraScale (V-US)
• BittWare Jasper – 20 nm Xilinx Virtex-UltraScale (V-US)
• BittWare Mustang – 20 nm Xilinx Kintex-UltraScale (K-US)
• BittWare A10PS4 – 20 nm Altera Arria 10 (vaporware)
• Xilinx TBD – 16 nm Xilinx Virtex-UltraScale+ (V-US+)
  
Where to Start?
• Business Unit identified the 25 GbE UDP/IP baked into the Olivebridge Manifesto
  – Made a business case for investment
• We started sketching top-down, but know from experience that bottom-up will come into play
  
25 GbE UDP/IP 1/2
• We bet that the 28 Gb GTY would be a game-changer
• We knew that we didn’t have the depth or resources of the Sarance team at Xilinx
  – Would rather buy than build the PMA/PCS/MAC
  – Ride Xilinx’ coat-tails of silicon, tools, IP
• We wanted our first IP offering to be unambiguous in function, but distinctive in positioning
  – UDP/IP is ubiquitous (also the market need)
  – 8B data paths at 400 MHz are not common (yet)
  
25 GbE UDP/IP 2/2

BittWare Jasper V-US (VU095)
  
Application Drives Architecture 1/2
• Facts on the ground were that we were going to use the Xilinx/Sarance PHY/PMA/PCS/MAC stack
  – Allowed us to quickly kick off a “show me” demo of 25 GbE
• From L2 down to the wire, we had to trust Xilinx
  
Application Drives Architecture 2/2
• But from the MAC L2 interface on up, the freedom to innovate was in our hands
  – Our triumph if our choices are good
  – Our failure if our choices are bad
• IPI enables heterogeneous architectures within a single application
  – One approach does not need to fit all!
  
Off to Work
• We love our DSLs!
• Away we code in Bluespec SystemVerilog (BSV), and in a few weeks we have functional sim of key elements
  – ARP Cache
  – Segmentation and Reassembly
  – IGMP Join/Leave Machinery
  – PCAP files as sinks and sources of packet streams
  
Architecture Drives Implementation
• As good and bad ideas about architecture anneal, it is a good time not to lose sight of basic facts on the ground
• Remember
  – Architecture is the key Fmax driver
  – Experiment early and often
  – Otherwise you may become a victim of the tools
  
No Magic – Just Sound Practice
• Mature 20 nm Silicon
  – Study the data sheets, use DocNav
• Mature FPGA CAD Tools (e.g. Vivado 2015.x)
  – Run out-of-context builds early and often
• Mature Engineers
  – Frequency Scaling ended a decade ago
  
Crawl, Walk, Run
• Observing your code run CORRECTLY in Verilog sim is a valuable pre-condition for architectural innovation!
  – You can then automate the tests, so you can watch your innovation break the regressions, then refine your innovation to be correct
• Functional-Correctness First, Performance Correctness Iteratively
  
Synthesis, Out-of-Context
• Software Engineers say “compile early and often”
• Circuit Designers can do the same by running Vivado Synthesis, out-of-context, on RTL circuit fragments (sub-modules)
  – Getting feedback in minutes as to the approximate area and Fmax of a module is one of the most-exploitable objective measures at your disposal
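
One way to make “synthesize early and often” routine is to script it. A minimal sketch in Python that writes a Vivado non-project Tcl script for an out-of-context synthesis run; the module name `mkUdpRx`, the file names, the `CLK` port name, and the part string are placeholders assumed for illustration, not taken from the talk:

```python
from pathlib import Path


def ooc_synth_tcl(top, sources, part, period_ns, clk_port="CLK"):
    """Return a Tcl script that synthesizes `top` out-of-context and writes area/timing reports."""
    lines = [f"read_verilog {src}" for src in sources]
    lines += [
        # Out-of-context mode skips I/O buffer insertion, so a sub-module can be built alone.
        f"synth_design -mode out_of_context -top {top} -part {part}",
        # Constrain the clock at the target period (2.5 ns for 400 MHz).
        f"create_clock -name {clk_port} -period {period_ns} [get_ports {clk_port}]",
        f"report_utilization -file {top}_utilization.rpt",
        f"report_timing_summary -file {top}_timing.rpt",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    script = ooc_synth_tcl(
        top="mkUdpRx",                # hypothetical module name
        sources=["mkUdpRx.v"],        # hypothetical generated Verilog file
        part="xcvu095-ffva2104-2-e",  # assumed part string; check it against your device
        period_ns=2.5,
    )
    Path("ooc_mkUdpRx.tcl").write_text(script)
    print(script)
```

The generated script can be run with `vivado -mode batch -source ooc_mkUdpRx.tcl`. Post-synthesis timing is only an estimate, since nothing has been placed or routed yet.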
  
What Happened?
• Our first architectural choices gave us a taste of correct function, but missed timing miserably
  – We had over 50% negative setup slack (negative setup slack of more than 1.25 ns on a 2.5 ns timing arc)
• Panic or Progress?
  – Progress of course!
  – We call this a “Happy Mistake”
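
The “over 50%” figure translates directly into an achievable clock rate. A minimal sketch of the arithmetic, assuming the worst path is the only limit and using exactly -1.25 ns as the slack (the slide says “more than”):

```python
def achievable_fmax_mhz(target_period_ns: float, worst_setup_slack_ns: float) -> float:
    """Fmax implied by the worst setup slack reported against a target clock period."""
    # Negative slack means the critical path is longer than the target period.
    critical_path_ns = target_period_ns - worst_setup_slack_ns
    return 1e3 / critical_path_ns


if __name__ == "__main__":
    target_ns = 2.5   # 400 MHz target
    slack_ns = -1.25  # worst negative slack quoted on the slide
    fmax = achievable_fmax_mhz(target_ns, slack_ns)
    print(f"critical path {target_ns - slack_ns:.2f} ns -> Fmax ~ {fmax:.1f} MHz")
```

At -1.25 ns of slack the critical path is 3.75 ns, so the design would run at roughly 267 MHz, about two thirds of the 400 MHz target.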
  
Two Paths
• Our initial architecture was coming up short at 25 GbE (400 MHz). The architecture-lead handed the design over to the implementation-lead with a simple task:
  – Under-clock the RTL IP at 156.25 MHz instead of 400 MHz to realize the same function at 10 GbE, not 25 GbE
• The architecture-lead went back to looking at topologies with 6 or fewer levels of 6LUTs
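
Why count LUT levels? The clock period has to be shared among every LUT-plus-route hop between two registers. A rough sketch of that budget; the 0.3 ns allowance for clock-to-out, setup, and clock uncertainty is a hypothetical round number, not a data-sheet value:

```python
def per_level_budget_ns(period_ns: float, levels: int, overhead_ns: float = 0.3) -> float:
    """Average delay available to each LUT + net hop on a register-to-register path."""
    return (period_ns - overhead_ns) / levels


if __name__ == "__main__":
    for mhz in (400.0, 156.25):
        period_ns = 1e3 / mhz
        for levels in (4, 6, 8, 12):
            budget = per_level_budget_ns(period_ns, levels)
            print(f"{mhz:7.2f} MHz, {levels:2d} levels: {budget:.3f} ns per level")
```

Under these assumptions, six levels at 400 MHz leave about 0.37 ns per hop, while the same depth at 156.25 MHz leaves about 1 ns, which is why under-clocking for 10 GbE was the quick path to a working demo.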
  
10 GbE Success
• Before we settled on the correct architecture choices for 25 GbE, we had a functional demo at 10 GbE to show our stakeholders
• And since we had yet to make some of the hard architectural decisions, we had parallelized some of the architecture work with the implementation work
  
BittWare S5PEDS
8 x TXRX x 10 GbE Channels / S5
  
400 MHz or Bust
• As much as this talk stresses architectural innovation at the early stage, it’s worthwhile to run through P&R to be sure
  – We found physical design issues, essentially independent of the architectural choices
  – Addressing some of them made reasoning about the architecture choices easier
  – In the end the numbers from the backend of Vivado are the ones that count
  
Elastic Pipelines and Atomic Rules
• Unsurprisingly, one of our go-to design patterns is elastic pipelines that are produced and consumed by atomic rules
  – In the end we achieved our throughput and area goals by adding some latency jitter through the use of a cascade of shallow (2-deep) FIFOs implemented out of fabric, distributed RAM, or SRL16/32s
  – Since “lowest latency” was not one of the current requirements, we stopped short of creating a static schedule
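
Why 2-deep? A behavioural sketch (plain Python, not the BSV source) of a chain of FIFOs whose enqueue and dequeue decisions see only the occupancy registered at the start of each cycle, an assumed model of handshakes with no combinational path running through the chain. With depth 2 the chain sustains one item per cycle; with depth 1 it drops to one item every other cycle:

```python
from collections import deque


def run_chain(num_stages, num_items, depth=2, sink_ready=lambda cycle: True):
    """Simulate a cascade of FIFOs; return (cycle, item) for each item leaving the last stage."""
    fifos = [deque() for _ in range(num_stages)]
    delivered = []
    sent = 0
    cycle = 0
    while len(delivered) < num_items:
        # All decisions use the occupancy at the start of the cycle (registered status only).
        occupancy = [len(f) for f in fifos]
        # The sink takes the head of the last FIFO when it is ready.
        if occupancy[-1] > 0 and sink_ready(cycle):
            delivered.append((cycle, fifos[-1].popleft()))
        # Each stage forwards its head if the next stage was not full at the start of the cycle.
        for i in range(num_stages - 1):
            if occupancy[i] > 0 and occupancy[i + 1] < depth:
                fifos[i + 1].append(fifos[i].popleft())
        # The source offers a new item every cycle while the first FIFO has room.
        if sent < num_items and occupancy[0] < depth:
            fifos[0].append(sent)
            sent += 1
        cycle += 1
    return delivered


if __name__ == "__main__":
    for depth in (1, 2):
        out = run_chain(num_stages=4, num_items=8, depth=depth)
        print(f"depth {depth}: delivery cycles {[c for c, _ in out]}")
```

Stalling the sink (for example `sink_ready=lambda c: c % 3 != 0`) shows the latency jitter: items still arrive in order, but at uneven intervals, while the FIFOs absorb the back-pressure.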
  
VU095 Single 25GbE Channel TX/RX

200Gb Bisection BW on a 3m Cable
  
Hoplite to the Rescue
• We’ve been closely following the work of Jan Gray and Nachiket Kapre on their Hoplite NoC
  – Best paper award at FPL 2015
  – Architecture-driven design of an austere NoC
  – Spatial Programming taken to an extreme
  – Harmonizes with other 400 MHz IPs (Olivebridge)
  
KU040-2, 4x6, 2.4 ns, 6600 LUTs (2.7%), 16K DFF (3.4%), 0-injection latency 31 ns, 100 Gb/node
Image courtesy of Jan Gray. See fpga.org/hoplite
  
Conclusions and Summary
• 400 MHz fabric logic is achievable in -2 grade 20 nm V/K-US with plenty of up-front thinking
• Failure to understand the timing costs early can impact the schedule, or stop the show!
• Vivado empowers the designer at several levels to successfully reach these goals
  – In IPI to allow heterogeneous, best-tools DSLs
  – In Synthesis to get timing and area early and often
  – In P&R to, more times than not, better Synthesis
  
AR Background

About Atomic Rules
• Digital Systems Consultancy based in Auburn, NH
• Specializing in FPGA Programming, Systems Integration for Commercial and Defense
• In business for 7 years
  – Actively recruiting new engineers
  
Lead Talent
• Shep Siegel (CTO and Founder):
  – ex-Mercury Systems, ex-Datacube, ex-Ampex, author and speaker, graduate of RIT, Senior Member IEEE, Senior Member ACM
• David Wright (VP Strategy):
  – ex-IBM, ex-Perot Systems, ex-Datacube, graduate of UNH, Agile methods ScrumMaster

Lead Talent (continued)
• John Miller (Embedded Software):
  – RDMA, ARM, POSIX, RTOS, Ethernet, PCIe, Linux KMD
  – 20 Years Bridging H/W and S/W for Embedded
  – John.Miller@atomicrules.com
• Hadar Agam (EE/CS Digital Design):
  – Complex Concurrency, Bluespec SystemVerilog, Digital Design
  – 20 Years of industry-leading design innovation
  – Hadar.Agam@atomicrules.com
• Ed Czeck (EE/CS Digital Design):
  – Complex Concurrency, Functional Programming, Digital Design
  – 20 Years of industry-leading design innovation
  – Ed.Czeck@atomicrules.com

Lead Talent (continued)
• Bach Long (EE/CS Digital Design):
  – Verilog, BSV, Deep Quartus/Vivado
  – 10 Years Reconfigurable Computing
  – Bach.Long@atomicrules.com
• Aaron Severance (EE/CS Digital Design):
  – Vector Processing, Verilog, System Design
  – Recent UBC PhD, VectorBlox Co-Founder
  – Aaron.Severance@atomicrules.com
• Steve Gabriel (EE/DSP/Math Control/Signal System Design):
  – Quad8, Evans&Sutherland, Ampex, Microsoft
  – Quaternion rotations and Galois fields
  – Steve.Gabriel@atomicrules.com
  
About Atomic Rules 1/2
• A typical engagement is to create codes to operationalize a platform, create/refresh an application, or both
• We sell rights to use the IP we own
• We sell IP and product development services
• We sell support services around code we write

About Atomic Rules 2/2
• We know Signals and Systems
• We know Complex Concurrency
• We know Middleware and IP Integration
to offer our clients…
• “More with More”, in less time
• Fewer defects: Correct-by-Construction
• Greater productivity, reduced time-to-solution
• Reduced cost of reuse and tech refresh
  
Core Beliefs and Axioms
• Separation of Concerns
• Divide and Conquer
• Automate or Die
• Write Things Once
• Interface Before Implementation
• Functional Correctness First
• Components Must Compose
• Components Work as Expected
• IP Should be Portable, Vendor-Agnostic if possible

Agile and Iterative
• Rapidly “Going Deep” is highly valued
  – Proof points that can be seen and measured
• Agile and Iterative design
  – Achieve Functional Correctness quickly
  – Achieve Performance Correctness iteratively
  
Client Roster (partial list)
• BAE Systems
• CSPi / Myricom
• DRS Technologies
• Maxim Integrated
• Mercury Federal Systems
• Skreens Entertainment
• Stanford University
• US Air Force (AFRL)
• Xilinx

Partner Roster
• 25G / 50G Ethernet Consortium
• 25-50-100 Ethernet Alliance
• Accellera/OCP-IP Community Member
• ARM Connected Community Member
• BittWare Solution Partner
• Bluespec Technology Partner
• MathWorks Connections Partner
• NetFPGA Infrastructure Developer
• OpenCPI Infrastructure Developer
• P4 Language Consortium Member
• PCI-SIG Member
• VITA Trade Association Member
• Xilinx Alliance Member Partner
  
Global Teaming
NetFPGA-10G team at Xilinx Dublin

Intern Program
Atomic Rules Intern Emeritus, U-Ark MSCE Graduate (2013), Christina Smith

10 GbE First-Movers
Operational 10 GbE on 65 nm Virtex-5, October 2010

25 GbE Early-Adopter
Operational 25 GbE on 20 nm Virtex-UltraScale VU095 on VCU107, May 2015
  
Thank You
• Please share with us your challenges
• Come see us sponsor (every year since 2008)
  – FPGA-2016 (Monterey, CA)
  – FCCM-2016 (Washington, DC)
• Follow our CTO Blog “Scalable Atomicity”
• Thanks!
  

Tackling 400 MHz Timing Closure

  • 1.
    Tackling  400  MHz  Timing-­‐ Closure  for  25/50/100  GbE   Shep  Siegel   Atomic  Rules  LLC   1  ©2015  Atomic  Rules  LLC   Shepard.Siegel@atomicrules.com   Tackling  400  MHz  Timing  Closure   2015-­‐09-­‐22  
  • 2.
    IntroducGon  Transcript  1/3   •  “  I  don’t  know  of  any  reason  why  you  would   have  (Gming  closure)  issues  with  the  V-­‐US   fabric  at  400  MHz,  why  don’t  you  try  it  and   see  how  it  goes?”   – Gordon  Brebner   (personal  correspondence,  Fall  2014)   2  ©2015  Atomic  Rules  LLC  
  • 3.
    IntroducGon  Transcript  2/3   •  “Sure,  sounds  great,  we  are  pu[ng  our  best   engineers  right  on  it.  The  25/50/100  GbE  work   you  are  doing  sounds  exciGng!”   – Shep  Siegel   (personal  response  to  Gordon,  Fall  2014)   3  ©2015  Atomic  Rules  LLC  
  • 4.
    IntroducGon  Transcript  3/3   •  {  Sound  of  Impact  }   – Unknown   (somewhere  in  Vivado  2015.1,  January  2015)     4  ©2015  Atomic  Rules  LLC  
  • 5.
    Architecture  Mabers!   • This  talk  teases  Fmax  and  Timing  Closure   •  It  is  really  about  how  to  avoid  ge[ng  painted   into  that  problem-­‐corner  in  the  first  place     •  And  that  requires  good  Architecture   5  ©2015  Atomic  Rules  LLC  
  • 6.
    Why  Architecture?   • Architecture  impacts  many  aspects  of  design,   Gming  closure  is  but  one  of  them   •  Architectural  choices  are  Strategic   – Expensive  if  you  get  it  wrong   •  Domain-­‐Specific-­‐Languages  (DSLs)  make   Architectural  InvesGgaGon  easier  than  ever     6  ©2015  Atomic  Rules  LLC  
  • 7.
    What  Architecture?   • In  what  way  do  you  wish  to  express  the  design?   –  World  of  choice  of  DSLs,  legacy  RTLs   –  You  can  mix  and  match  with  Vivado  IPI   •  What  choices  are  you  making?   –  Language  Choices   •  ‘C’  or  other  imperaGve  expression   •  Your  DSL  of  choice  –  What  is  Appropriate?   •  “Pick  and  Shovel”  –  SomeGmes  a  legacy  RTL  is  just  fine   –  Device-­‐Centric,  Structural  Choices   •  MSLICE  vs.  LSLICE   •  CARRY8  vs.  DSP48E2   •  Distributed  RAM  vs.  BRAM   •  Are  you  aware  that  you  are  viewing  the  problem  as  top-­‐down,   bobom-­‐up,  or  middle-­‐out?   7  ©2015  Atomic  Rules  LLC  
  • 8.
    How  Architecture?   • All  this  can  be  overwhelming   •  Suggest  a  divide-­‐and-­‐conquer  approach   – IteraGve  Refinement  is  one  way   •  Don’t  delay,  start  experimenGng  at  once!   – Small  Failures  ooen  yield  rich  insights   8  ©2015  Atomic  Rules  LLC  
  • 9.
    This  Talk  –  Problem  Statement   •  The  20  nm  UltraScale  fabric  is  fast   •  25/50/100  GbE  suggests  a  natural  ~400  MHz   – Area  and  Cost  concerns  to  keep  packet  data  paths   as  narrow  and  occupied  as  is  pracGcal   •  But  400  MHz  in  a  V-­‐US-­‐2  is  challenging   – What  can  we  do  to  close  Gming?   – How  do  we  avoid  negaGve  setup  slack?   – How  did  we  close  Gming  with  25GbE  UDP/IP?     9  ©2015  Atomic  Rules  LLC  
  • 10.
    Olivebridge  Manifesto   • AR  First-­‐Mover/Early-­‐Adopter  in  25  GbE  IP   – Have  Product  ready  to  meet  market  needs   •  L2/L3/L4  Packet  Processing  at  Line  Rates   – Ethernet  802.3  /  Internet  Protocols  at  25  Gbps   •  400  MHz  Fabric  OperaGon  on  20  nm  FPGAs   – Requires  Specialized  Circuit  and  Physical  Design   •  2.5x  Under-­‐Clocking  for  10  GbE  on  28  nm   – Broader  Market  While  25  Gb  AdopGon  Grows   10  ©2015  Atomic  Rules  LLC  
  • 11.
    Olivebridge  Focus   • UDP/IP  Datagram  Service  for  10  and  25  GbE   – Well-­‐defined  funcGon  and  interfaces   – Serve  exisGng  customer  needs  in  10  GbE  space   – Be  the  early-­‐to-­‐market  in  nascent  25  GbE  space   – Be  well  posiGoned  with  50/100  GbE  variaGons   •  L2  802.3  Packet  ValidaGon   – IniGally  use  FPGA  Vendor  MAC/PCS/PMA/PHY  IP   – Self-­‐Synchronizing  Generators,  Mungers,  Checkers   11  ©2015  Atomic  Rules  LLC  
  • 12.
    Olivebridge  Plarorms  (parGal)   •  BibWare  A5PL  –  28  nm  Altera  Arria  V  GZ   •  BibWare  S5PEDS  –  28  nm  2x  Altera  StraGx  V   •  Xilinx  ZC706  –  28  nm  Xilinx  Zynq-­‐7   •  Xilinx  KCU105  –  20  nm  Xilinx  Kintex-­‐UltraScale  (K-­‐US)   •  Xilinx  VCU107  –  20  nm  Xilinx  Virtex-­‐UltraScale  (V-­‐US)   •  BibWare  Jasper-­‐  20  nm  Xilinx  Virtex-­‐UltraScale  (V-­‐US)   •  BibWare  Mustang  –  20  nm  Xilinx  Kintex-­‐UltraScale  (K-­‐US)   •  BibWare  A10PS4  -­‐  20  nm  Altera  Arria  10  (vaporware)   •  Xilinx  TBD  –  16  nm  Xilinx  Virtex-­‐UltraScale+  (V-­‐US+)     12  ©2015  Atomic  Rules  LLC  
  • 13.
    Where  to  Start?   •  Business  Unit  idenGfied  the  25  GBE  UDP/IP   baked-­‐into  the  Olivebridge  Manifesto   – Made  a  business  case  for  investment   •  We  started  sketching  top-­‐down;  but  know   from  experience  that  bobom-­‐up  will  come   into  play     13  ©2015  Atomic  Rules  LLC  
  • 14.
    25  GbE  UDP/IP  1/2   •  We  bet  that  the  28  Gb  GTY  would  be  a  game-­‐changer     •  We  knew  that  we  didn’t  have  the  depth  or  resources   of  the  Sarance  team  at  Xilinx   –  Would  rather  buy  than  build  the  PMA/PCS/MAC   –  Ride  Xilinx’  coat-­‐tails  of  silicon,  tools,  IP   •  We  wanted  our  first  IP  offering  to  be  unambiguous  in   funcGon;  but  disGncGve  in  posiGoning   –  UDP/IP  is  ubiquitous  (also  the  market  need)   –  8B  data  paths  are  400  MHz  are  not  common  (yet)       14  ©2015  Atomic  Rules  LLC  
  • 15.
    25  GbE  UDP/IP  2/2   15  ©2015  Atomic  Rules  LLC  
  • 16.
    BibWare  Jasper  V-­‐US  (VU095)   16  ©2015  Atomic  Rules  LLC  
  • 17.
    ApplicaGon  Drives  Architecture  1/2   •  Facts  on  the  ground  were  that  we  were  going   to  use  the  Xilinx/Sarance  PHY/PMA/PCS/MAC   stack.   – Allowed  us  to  quickly  kick  off  a  “show  me”  demo   of  25  GbE   •  From  L2  down  to  the  wire,  we  had  to  trust   Xilinx       17  ©2015  Atomic  Rules  LLC  
  • 18.
    ApplicaGon  Drives  Architecture  2/2   •  But  from  the  MAC  L2  interface  on  up,  the   freedom  to  innovate  was  in  our  hands   – Our  triumph  if  our  choices  are  good   – Our  failure  if  our  choices  are  bad   •  IPI  enables  heterogeneous  architectures   within  a  single  applicaGon   – One  approach  does  not  need  to  fit  all!       18  ©2015  Atomic  Rules  LLC  
  • 19.
    Off  to  Work   •  We  love  our  DSLs!   •  Away  we  code  in  Bluespec  SystemVerilog   (BSV),  and  in  a  few  weeks  we  have  funcGonal   sim  of  key  elements   – ARP  Cache   – SegmentaGon  and  Reassembly   – IGMP  Join/Leave  Machinery   – PCAP  files  as  sinks  and  sources  of  packet  streams   19  ©2015  Atomic  Rules  LLC  
  • 20.
    Architecture  drives  ImplementaGon   •  As  good  and  bad  ideas  about  architecture   anneal,  good  Gme  to  not  lose  site  of  basic   facts  on  the  ground     •  Remember   – Architecture  is  the  key  Fmax  driver   – Experiment  early  and  ooen   – Otherwise  you  may  become  a  vicGm  to  the  tools   20  ©2015  Atomic  Rules  LLC  
  • 21.
    No  Magic  –  Just  Sound  PracGce   •  Mature  20  nm  Silicon   – Study  the  data  sheets,  use  DocNav     •  Mature  FPGA  CAD  Tools  (e.g.  Vivado  2015.x)   – Run  out-­‐of-­‐context  builds  early  and  ooen     •  Mature  Engineers   – Frequency  Scaling  ended  a  decade  ago   21  ©2015  Atomic  Rules  LLC  
  • 22.
    Crawl,  Walk,  Run   •  Observing  your  code  run  CORRECTLY  in   Verilog  sim  is  a  valuable  pre-­‐condiGon  for   architectural  innovaGon!   – You  can  then  automate  the  tests,  so  you  can   watch  your  innovaGon  break  the  regressions,  then   refine  your  innovaGon  to  be  correct   •  FuncGonal-­‐Correctness  First,  Performance   Correctness  IteraGvely   22  ©2015  Atomic  Rules  LLC  
  • 23.
    Synthesis,  Out-­‐of-­‐Context   • Sooware  Engineers  say  “compile  early  and   ooen”   •  Circuit  Designers  can  do  the  same  by  running   Vivado  Synthesis,  out-­‐of-­‐context,  on  RTL   circuit  fragments  (sub-­‐modules)   – Ge[ng  feedback  in  minutes  as  to  the   approximate  area  and  Fmax  of  a  module  is  one  of   the  most-­‐exploitable  objecGve  measures  at  your   disposal   23  ©2015  Atomic  Rules  LLC  
  • 24.
    What  Happened?   • Our  first  architectural  choices  gave  us  a  taste   of  correct  funcGon;  but  missed  Gming   miserably.     – We  had  over  50%  negaGve  setup  slack  (negaGve   setup  Gme  of  more  than  1.25  ns  on  a  2.5  ns  Gming   arc)   •  Panic  or  Progress   – Progress  of  course!     – We  call  this  a  “Happy  Mistake”   24  ©2015  Atomic  Rules  LLC  
  • 25.
    Two  Paths   • Our  iniGal  architecture  was  coming  up  short  at   25  GbE  (400  MHz).  The  architecture-­‐lead   handed  the  design  over  to  the   implementaGon-­‐lead  with  a  simple  task:   – Under-­‐clock  the  RTL  IP  at  156.25  MHz  instead  of   400  MHz  to  realize  the  same  funcGon  at  10  GbE,   not  25  GbE   •  The  architecture-­‐lead  went  back  to  looking  at   topologies  with  6  or  fewer  levels  of  6LUTs   25  ©2015  Atomic  Rules  LLC  
  • 26.
    10  GbE  Success   •  Before  we  sebled  on  the  correct  architecture   choices  for  25  GbE,  we  had  a  funcGonal  demo   at  10  GbE  to  show  our  stakeholders   •  And  since  we  had  yet  to  make  some  of  the   hard  architectural  decisions,  we  had   parallelized  some  of  the  architecture  work   with  the  implementaGon  work     26  ©2015  Atomic  Rules  LLC  
  • 27.
    27  ©2015  Atomic  Rules  LLC   BibWare  S5PEDS   8  x  TXRX  x  10  GbE   Channels  /  S5    
  • 28.
    400  MHz  or  Bust   •  As  much  as  this  talk  stresses  architectural   innovaGon  at  the  early  stage,  it’s  worthwhile   to  run  through  P&R  to  be  sure   – We  found  physical  design  issues,  essenGally   independent  of  the  architectural  choices   – Addressing  some  of  them  made  reasoning  about   the  architecture  choices  easier   – In  the  end  the  numbers  from  the  backend  of   Vivado  are  the  ones  that  count   28  ©2015  Atomic  Rules  LLC  
  • 29.
    ElasGc  Pipelines  and  Atomic  Rules   •  Unsurprisingly,  one  of  our  go-­‐to  design-­‐ paberns  are  elasGc  pipelines  that  are   produced  and  consumed  by  atomic  rules   – In  the  end  we  achieved  our  throughput  and  area   goals  by  adding  some  latency-­‐jiber  by  the  use  of  a   cascade  of  shallow  (2  deep)  FIFOs  implemented   out  of  fabric,  distributed  RAM,  or  SRL16/32s.   – Since  “lowest  latency”  was  not  one  of  the  current   requirements,  we  stopped  short  of  creaGng  a   staGc  schedule.     29  ©2015  Atomic  Rules  LLC  
  • 30.
    [Figure: VU095 Single 25GbE Channel TX/RX]
  • 31.
    200Gb Bisection BW on a 3m Cable
  • 32.
    Hoplite to the Rescue
    • We’ve been closely following the work of Jan Gray and Nachiket Kapre on their Hoplite NoC
      – Best paper award at FPL 2015
      – Architecture-driven design of an austere NoC
      – Spatial Programming taken to an extreme
      – Harmonizes with other 400 MHz IPs (Olivebridge)
  • 33.
    [Figure: KU040-2, 4x6, 2.4 ns, 6600 LUTs (2.7%), 16K DFF (3.4%), 0-injection latency 31 ns, 100 Gb/node. Image courtesy of Jan Gray. See fpga.org/hoplite]
  • 34.
    Conclusions and Summary
    • 400 MHz fabric logic is achievable in -2 grade 20 nm V/K-US with plenty of up-front thinking
    • Failure to understand the timing costs early can impact the schedule, or stop the show!
    • Vivado empowers the designer at several levels to successfully reach these goals
      – In IPI to allow heterogeneous, best-tools DSLs
      – In Synthesis to get timing and area early and often
      – In P&R to, more times than not, better Synthesis
  • 35.
    AR Background
  • 36.
    About Atomic Rules
    • Digital Systems Consultancy based in Auburn, NH
    • Specializing in FPGA Programming, Systems Integration for Commercial and Defense
    • In business for 7 years
      – Actively recruiting new engineers
  • 37.
    Lead Talent
    • Shep Siegel (CTO and Founder):
      – ex-Mercury Systems, ex-Datacube, ex-Ampex, author and speaker, graduate of RIT, Senior Member IEEE, Senior Member ACM
    • David Wright (VP Strategy):
      – ex-IBM, ex-Perot Systems, ex-Datacube, graduate of UNH, Agile methods ScrumMaster
  • 38.
    Lead Talent (continued)
    • John Miller (Embedded Software):
      – RDMA, ARM, POSIX, RTOS, Ethernet, PCIe, Linux KMD
      – 20 Years Bridging H/W and S/W for Embedded
      – John.Miller@atomicrules.com
    • Hadar Agam (EE/CS Digital Design):
      – Complex Concurrency, Bluespec SystemVerilog, Digital Design
      – 20 Years of industry-leading design innovation
      – Hadar.Agam@atomicrules.com
    • Ed Czeck (EE/CS Digital Design):
      – Complex Concurrency, Functional Programming, Digital Design
      – 20 Years of industry-leading design innovation
      – Ed.Czeck@atomicrules.com
  • 39.
    Lead Talent (continued)
    • Bach Long (EE/CS Digital Design):
      – Verilog, BSV, Deep Quartus/Vivado
      – 10 Years Reconfigurable Computing
      – Bach.Long@atomicrules.com
    • Aaron Severance (EE/CS Digital Design):
      – Vector Processing, Verilog, System Design
      – Recent UBC PhD, VectorBlox Co-Founder
      – Aaron.Severance@atomicrules.com
    • Steve Gabriel (EE/DSP/Math Control/Signal System Design):
      – Quad8, Evans&Sutherland, Ampex, Microsoft
      – Quaternion rotations and Galois fields
      – Steve.Gabriel@atomicrules.com
  • 40.
    About Atomic Rules 1/2
    • A typical engagement is to create codes to operationalize a platform, create/refresh an application, or both
    • We sell rights to use the IP we own
    • We sell IP and product development services
    • We sell support services around code we write
  • 41.
    About Atomic Rules 2/2
    • We know Signals and Systems
    • We know Complex Concurrency
    • We know Middleware and IP Integration
    to offer our clients…
    • “More with More”, in less time
    • Fewer defects: Correct-by-Construction
    • Greater productivity, reduced time-to-solution
    • Reduced cost of reuse and tech refresh
  • 42.
    Core Beliefs and Axioms
    • Separation of Concerns
    • Divide and Conquer
    • Automate or Die
    • Write Things Once
    • Interface Before Implementation
    • Functional Correctness First
    • Components Must Compose
    • Components Work as Expected
    • IP Should be Portable, Vendor-Agnostic if possible
  • 43.
    Agile and Iterative
    • Rapidly “Going Deep” is highly valued
      – Proof points that can be seen and measured
    • Agile and Iterative design
      – Achieve Functional Correctness quickly
      – Achieve Performance Correctness iteratively
  • 44.
    Client Roster (partial list)
    • BAE Systems
    • CSPi / Myricom
    • DRS Technologies
    • Maxim Integrated
    • Mercury Federal Systems
    • Skreens Entertainment
    • Stanford University
    • US Air Force (AFRL)
    • Xilinx
  • 45.
    Partner Roster
    • 25G / 50G Ethernet Consortium
    • 25-50-100 Ethernet Alliance
    • Accellera/OCP-IP Community Member
    • ARM Connected Community Member
    • BittWare Solution Partner
    • Bluespec Technology Partner
    • MathWorks Connections Partner
    • NetFPGA Infrastructure Developer
    • OpenCPI Infrastructure Developer
    • P4 Language Consortium Member
    • PCI-SIG Member
    • VITA Trade Association Member
    • Xilinx Alliance Member Partner
  • 46.
    Global Teaming
    [Figure: NetFPGA-10G team at Xilinx Dublin]
  • 47.
    Intern Program
    [Figure: Atomic Rules Intern Emeritus, U-Ark MSCE Graduate (2013), Christina Smith]
  • 48.
    10 GbE First-Movers
    [Figure: Operational 10 GbE on 65 nm Virtex-5, October 2010]
  • 49.
    25 GbE Early-Adopter
    [Figure: Operational 25 GbE on 20 nm Virtex-UltraScale VU095 on VC107, May 2015]
  • 50.
    Thank You
    • Please share with us your challenges
    • Come see us sponsor (every year since 2008)
      – FPGA-2016 (Monterey, CA)
      – FCCM-2016 (Washington, DC)
    • Follow our CTO Blog “Scalable Atomicity”
    • Thanks!