OpenSPARC T1
Processor

DV Club
 – Silicon Valley
May 23rd, 2006

Shrenik Mehta
Director, Frontend Technologies &
OpenSPARC Program
System Group
Sun Microsystems, Inc.

  www.opensparc.net
Agenda


                    1. New wave – Chip Multi-Threading (CMT)
                    2. UltraSPARC T1
                    3. Participation Age – UltraSPARC T1
                       verification
                    4. OpenSPARC™ community




www.opensparc.net                                                          2
                               Design Verification Club – Silicon Valley
Making the Right Waves

                                                                                                    Chip Multi-threading
                                                                                                    (CoolThreads ) TM

                                                            Symmetrical
                                                            Multi-processing (SMP)
       Improved Price/Performance




                                      Reduced Instruction Set
                                      Computing (RISC)




                                    1980                1990                            2000                2010
www.opensparc.net                                                                                              3
                                                        Design Verification Club – Silicon Valley
Attributes of Commercial Workloads
                                        Web Services                             Client Server          Data
                                                                                                      Warehouse

                             TIER1          TIER2           TIER3        SAP 2T         SAP 3T         DSS
           Attribute           Web         App Serv           Data                        (DB)        (TPC-H)
                             (Web99)        (JBB)           (TPC-C)

         Application          Web            Server          OLTP            ERP          ERP           DSS
         Category            Server           Java


         Instruction-level
                              Low             Low             Low          Medium          Low          High
         Parallelism

         Thread-level
                              High            High            High           High         High          High
         Parallelism

         Instruction/Data     Large          Large           Large         Medium         Large        Large
         Working Set

         Data Sharing         Low           Medium            High         Medium         High         Medium



www.opensparc.net                                                                                 4
                                     Design Verification Club – Silicon Valley
Memory Bottleneck
     Relative Performance

      10000
                         CPU Frequency                             2x Every 2 Years
                         DRAM Speeds
        1000



         100                                                                          Gap

          10

                                                                   2x Every 6 Years
           1
               1980         1985             1990                  1995        2000         2005
www.opensparc.netWorld
 Source: Sun             Wide Analyst Conference Feb. 25, 2003                         5
                                   Design Verification Club – Silicon Valley
Single Threading                                                              HURRY
    Up to 85% Cycles Waiting for Memory                                           UP AND
                                                                                   WAIT!


            Single Threaded
              Performance

                                                         Typical Processor
                                                         Utilization:15–25%
                                      Thread


                                              C          M          C     M   C         M

                                                                                            Tim
                                                          Memory Latency      Compute        e


www.opensparc.net                                                                  6
                              Design Verification Club – Silicon Valley
The Power of CMT - CoolThreads

                   UltraSPARC T1
           Single Threaded
             Performance Utilization:
             Processor                                         Up                           Chip Multi-threaded
                                    to 85%                                                  (CMT) Performance


  Thread 4                  C       M       C     M C             M
  Thread 3              C       M       C       M C           M
  Thread 2          C       M       C       M C            M
  Thread 1    C         M       C       M C            M

                                                                            Tim
                        Memory Latency                 Compute               e


www.opensparc.net                                                                                       7
                                                Design Verification Club – Silicon Valley
Chip Multi-Threading (CMT) to the rescue




           CMP                       HMT                                         CMT
 (chip multiprocessing) (hardware multithreading)                                (chip
                                                                            multithreading)

    n cores per processor     m strands per core                        n x m threads per processor

www.opensparc.net                                                                       8
                            Design Verification Club – Silicon Valley
Why CMT Works

                                                  “Goal: 100% Resource Utilization”
                                    10.0
  Relative Performance on thread-




                                                                 Multi-Thread, Multi-Core
  rich memory bound workloads




                                                                              Multi-Thread, Single-Core
                                     2.0

                                     1.0                                        Single-Thread, Single-Core


                                                                                     SPARC: 4 threads per core
                                     .05                                             ● Increases core die area by ~20%
                                                                                     ● Improves performance by ~50–100%




                                           20%   Core Size               Maximum
www.opensparc.net                                                                                            9
                                                      Design Verification Club – Silicon Valley
Example - SpecJBB Execution Efficiency
                                            Idle Time

           Single
                                               3.79 cycles                                          1
                                                                                                          =             21%
          Threaded
                                                                                                 1 + 3.79            Efficiency
                                                                     Idle Time
                                                                  1.56 cycles
                                                                                                    4
                                                                                                          =            72%
            Four                                                                                 4 + 1.56           Efficiency
          Threaded


                                                                                                                  Cycles
                      0                                          4                                   8
                Compute             Pipeline Conflict                  Pipeline Latency                   Memory Latency
           A. S. Leon et al., “A Power-Efficient High Throughput 32-Thread SPARC Processor,” ISSCC06, Paper 5.1

                      Copyright Sun Microsystems 2006, Sun Microsystems, Inc. All rights reserved. Used by permission.
Page 10
UltraSPARC T1 Processor
 • SPARC V9 implementation                      DDR-2              DDR-2                 DDR-2         DDR-2
                                               SDRAM              SDRAM                 SDRAM         SDRAM
 • Up to eight 4-way multi-
   threaded cores for up to
   32 simultaneous threads
 • All cores connected through a
   134.4GB/s crossbar switch                               L2$             L2$          L2$      L2$
 • High-bandwidth 12-way associative                                             Xbar                    FPU
   3MB Level-2 cache on chip
 • 4 DDR2 channels (23GB/s)                             C1 C2 C3 C4 C5 C6 C7 C8
 • Power : < 80W
 • ~300M transistors                                                           Sys I/F
                                                                            Buffer Switch
 • 378 sq. mm die                                                               Core



                                              1 of 8 Cores                       BUS
www.opensparc.net                                                                                11
                               Design Verification Club – Silicon Valley
CMT: On-chip = High Bandwidth
 Traditional SMP: 32 Threads
       Example: Typical SMP Machine Configuration                              Niagara: 32 Threads
                                                                                  One motherboard, no switch ASICs
            P P P P                     P P P P
        I                                Switch   I
        O    Switch                               O                                                     Mem Ctlr
            M M M M                     M M M M
                                                                                        P   P      L2
            P P P P                     P P P P
                                                                                                   L2
                                                                                                        Mem Ctlr




                                                                                            XBar
                                                  I                                     P   P
        I
        O    Switch                      Switch   O              Switch                 P   P      L2
                                                                                        P   P      L2   Mem Ctlr
                               Switch
                      Switch




            M M M M                     M M M M
            P P P P                     P P P P                                                         Mem Ctlr
        I                                         I                                             I/O
        O    Switch                      Switch   O
            M M M M                     M M M M
                                                                                    Direct crossbar interconnect
            P P P P                     P P P P
        I                                         I                             Lower cost, better RAS, lower BTUs,
        O    Switch                      Switch   O                                 lower and uniform latency,
            M M M M                     M M M M                                  greater and uniform bandwidth. . .




www.opensparc.net                                                                                            12
                                            Design Verification Club – Silicon Valley
Faster Can Be Cooler

       Single-Core Processor                                    CMT Processor
                                           107C
                                                                    C1 C2 C3 C4
                                           102C

                                            96C

                                            91C

                                            85C

                                            80C

                                            74C

                                            69C

                                            63C

                                            58C                     C5 C6 C7 C8

                                    (Not to Scale)
www.opensparc.net                                                                 13
                        Design Verification Club – Silicon Valley
UltraSPARC-T1: Some Design Choices

                                          • Simpler core architecture to maximize
                                            cores on die
                                          • Caches, dram channels shared across
                                            cores give better area utilization
                                          • Shared L2 decreases cost of
                                            coherence misses by an order of
                                            magnitude
                                          • On die memory controllers reduce miss
                                            latency
                                          • Crossbar good for b/w, latency,
                                            functional verification
                                          • 378mm2 die in 90nm dissipating ~70W



www.opensparc.net                                                      14
                    Design Verification Club – Silicon Valley
UltraSPARC-T1 Processor Core
  MUL
                                 ●   Four threads per core
                                 ●   Single issue 6 stage pipeline
                          EXU
                                 ● 16KB I-Cache, 8KB D-Cache
                                 > Unique resources per thread
                                   > Registers
     IFU
                                   > Portions of I-fetch datapath
                                   > Store and Miss buffers
                                 > Resources shared by 4 threads
   MMU              LSU
                                   > Caches, TLBs, Execution Units
                                   > Pipeline registers and DP
                                 ●   Core Area = 11mm2 in 90nm
     TRAP
                                 ●   MT adds ~20% area to core



www.opensparc.net               Design Verification Club – Silicon Valley   15
SPARC Core Pipeline
          Fetch          Thrd Sel            Decode               Execute              Memory              WB
                                                  Regfile
                                                                                  Modular Exponientation Unit
                                                  x4


                                                                        Alu                  DCache
        ICache          Inst          Thrd                              Mul                  Dtlb
        Itlb            buf x 4       Sel        Decode                 Shft                                Crossbar
                                                                                             Stbuf x 4      Interface
                                      Mux                               Div


                    Thread selects                                     Instruction type
                                               Thread                  misses
                                               select                  traps & interrupts
                                               logic                   resource conflicts

                 Thrd      PC logic
                 Sel       x4
                 Mux


www.opensparc.net                                                                                     16
                                      Design Verification Club – Silicon Valley
Thread Selection Policy
    ●   Switch between available threads every cycle giving
        priority to least recently executed thread
    ●   Threads become unavailable due to:
        ●    Long latency ops like loads, branch, mul, div.
        ●    Pipeline stalls such as cache misses, traps, and resource conflicts
    ●   Loads are speculated as cache hits, and the thread is switched in
        with lower priority




www.opensparc.net                                                             17
                                  Design Verification Club – Silicon Valley
CMT Benefits

                          Performance

                          Cost
                           ●   Fewer servers
                           ●   Less floor space
                           ●   Reduced power consumption
                           ●   Less air conditioning
                           ●   Lower administration and
                               maintenance


                           Reliability
www.opensparc.net                                               18
                    Design Verification Club – Silicon Valley
CMT Pays Off with CoolThreads Technology                              TM



     Sun Fire T1000 and T2000




                 ● Up to 5x the performance
                 ● As low as 1/5 the energy

                 ● As small as 1/4 the size



www.opensparc.net
    *See disclosures                                                        19
                           Design Verification Club – Silicon Valley
CoolThreads Servers are a Hit


    “...performance in several                            “These servers could save
    profiles unmatched for the                              companies millions.”
  power and space it consumes.”




      “...the machines put Sun at the                           “...the UltraSparc T1 is
      cutting edge of one of the chip                            truly a revolutionary
         industry's biggest trends...                                  processor.”
            multi-core systems.”


www.opensparc.net                                                                20
                            Design Verification Club – Silicon Valley
New wave requires rethinking everything




                                                          Why not
                                                        open source
                                                         hardware?


www.opensparc.net                                               21
                    Design Verification Club – Silicon Valley
World's First Open Source Microprocessor
                                                           OpenSPARC.net
                                                    • Governed by GPL (2)
                                                    • Complete chip architecture
                                                    • Register Transfer Logic (RTL)
                                                    • Hypervisor API
                                                    • Verification suite and
                                                      architectural models
                                                    • Simulation model for Solaris
                                                      bringup on s/w




www.opensparc.net                                                       22
                    Design Verification Club – Silicon Valley
Virtualisation

    • Thin software layer between OS and platform                            sun4v virtual machine
      hardware
                                                                                    User      User           User
                                                                                    App       App            App
    • Para-virtualised OS
                                                                                              Solaris

    • Hypervisor + sun4v interface                                                 OpenBoot
       • Virtualises machine HW and isolates OS from register-
         level
       • Delivered with platform not OS                                                    Hypervisor
       • Not itself an OS
                                                                                       SPARC hardware



                                                                    stable
                                                                  interface
                                                                   “sun4v”
www.opensparc.net                                                                                       23
                                       Design Verification Club – Silicon Valley
It’s about
                                                           Participation
www.opensparc.net                                                   24
                    Design Verification Club – Silicon Valley
UltraSPARC T1 - Verification
• Philosophy
    > Verify once and then leverage, e.g. SPARC core, L2$ bank
• Strategy
    > Directed “Black box” testing for architectural, “cycle”
      independent features
    > Stand Alone Testing to flush out all basic bugs allows for more
      “independent bug peeling” and better controllability
    > Directed Random “White box” testing with Functional Coverage
      Feedback
    > Some formal methods, code review in high risk areas



www.opensparc.net         Design Verification Club – Silicon Valley   25
Architecture Verification
• Cross checking with Architectural Simulator (AS)
    > Checks “Architectural state”, but follows RTL on certain
        conditions like memory order
• Program Reference Manual (PRM) Coverage
    > Aim to “qualify” PRM with set of diags
    > Exhaustive set of directed diags, very basic operations, with
      certain coverage goals
    > Timing independent, only checks for functionality
    > Divided in opcode groups – ALU, State-regs, Control-Transfer,
      Mem-ops
    > Traps, ASI & MMU, State-register coverage


www.opensparc.net         Design Verification Club – Silicon Valley   26
Architecture Verification (2)
• Self-Checking Diags
    > Covers areas where AS lags
         >Especially in Diagnostics, IO/DMA, tick-regs, errors, reset
          and self-modifying code
         >Memory model verification
         >Performance diags




www.opensparc.net          Design Verification Club – Silicon Valley   27
Architecture Verification (3)
• System Interface Verification
    > Producer-Consumer variations between SPARC core and I/O
      devices for memory and I/O addresses
    > All threads accessibility to DRAM controller and I/O's
    > Traps on unsupported instructions
• Reset Verification
    > Power-up state of the machine as per PRM
    > Chip state after warm reset, including DRAM controller and I/O
• Debug Support Verification
    > Cache set associativity functionality
    > Error injection, clock-stop, TAP access

www.opensparc.net        Design Verification Club – Silicon Valley   28
Microarchitecture Verification
                                               Function-Focused
                                               MT           Coherency                RAS ... Perf Arch
            Unit-Focused Testing


                                   Fetch
                                   Execute
                                   TLB
                                   Ld/St
                                   Traps
                                   System
                                   Float
                                   Switch
                                   L2
                                   DRAM

www.opensparc.net                            Design Verification Club – Silicon Valley              29
Microarchitecture Verification (2)
• Unit focused Verification
    > Write Checkers
    > Write Directed, C/Perl, Internal code generator diags to verify
      functionality
    > All major blocks Pipe, Traps, TLB, Load Store, Streaming,
      Floating point, crossbar, L2 and DRAM controller
• Function focused Verification
    > Rely on AS and functional checkers
    > Generate directed and pseudo-random
    > Total Store Order (memory ordering), Multi-threading,
      RAS/Errors
    > Notice that function focused verification spans across blocks
www.opensparc.net        Design Verification Club – Silicon Valley   30
Memory verification using Symbolic
Simulation               testbench file (.tb)




             Transistor                            SPICE Netlist
             Schematic                                                                                   RTL




                                                                                 ssf -runcv

                                                                             s                Fa
Main Steps                                                             Pas                         il
1. Schematic to SPICE netlist generation                                                                debug
                                                      coverage analysis
2. Testbench generation
3. Symbolic simulation and debug
4. Symbolic coverage analysis



www.opensparc.net                Design Verification Club – Silicon Valley                                  31
Clock Domain Checking Verification
• Clock Domain Checking (CDC)
    > Static analysis of clock domain crossings
• Key features of CDC
    >   Identify clock domain crossings
    >   Detect missing synchronizers
    >   Identify incorrect trasfer protocols
    >   Verify reconvergence of independently synchronized signals
• Found 8 bugs across 6000 crossings in 3 clock domains



www.opensparc.net         Design Verification Club – Silicon Valley   32
Performance Verification
• Goal to make sure performance of RTL model matches
  the Architectural performance simulator
• Single thread and multi-thread architectural and
  performance diags matches within 2-3% accuracy
• Run bigger Multi-threaded application traces under mini-
  OS kernel
• Found 50+ bugs




www.opensparc.net   Design Verification Club – Silicon Valley   33
Hardware Acceleration
• Prevent functional bug escapes by running long directed
  and random tests
• Facilitate early SW development and system integration
    > Booted Hypervisor, Open Boot Prom (OBP) and multi-threaded
        Solaris prior to Tapeout
• Aid Post-silicon debug and fix verification
    > Repeatability, check-point replay
• Ran estimated 5+ Trillion simulation cycles



www.opensparc.net          Design Verification Club – Silicon Valley   34
Functional Verification Tapeout Criteria
• 15B fullchip random cycles without a hardware failure
• Full chip bug rate < 1 per week
• Coverage
    > Functional coverage greater than 95%
    > Line/Condition coverage greater than 99%
    > Remaining untested coverage reviews and waivers
• All architectural diags including all PRM documented
  features written and passing
• Tester reset tested in RTL & full gate level simulation
• Hypervisor and OBP brought up in RTL
• No open bugs blocking bring up activities             35
www.opensparc.net      Design Verification Club – Silicon Valley
Verification
                     Metrics

www.opensparc.net                                                 36
                      Design Verification Club – Silicon Valley
RTL Stability, by week




www.opensparc.net   Design Verification Club – Silicon Valley   37
Weekly New Bugs




www.opensparc.net   Design Verification Club – Silicon Valley   38
Cumulative Bugs




www.opensparc.net   Design Verification Club – Silicon Valley   39
Coverage Objects




www.opensparc.net   Design Verification Club – Silicon Valley   40
System level Coverage




www.opensparc.net   Design Verification Club – Silicon Valley   41
Coverage
• 20K+ total coverage objects
                 Coverage Distrubution across Functional Blocks
    4500

    4000

    3500

    3000

    2500
                                                                                         Coverage Objects
    2000

    1500

    1000

     500

       0
           IFU   EXU   FFU   MMU SPU     TLU   LSU    MSS    FPU    DRA    IOB     JBI
                                                                    M




www.opensparc.net                      Design Verification Club – Silicon Valley                            42
Verification Effectiveness
                              Test Escapes                                          Escapes across various functionalities
45                                                                    16

40                                                                    14

35
                                                                      12
30
                                                                      10
25
                                                                       8
20
                                                                       6
15
                                                                       4
10

 5                                                                     2

 0                                                                     0
      Critical/Not-critical      Fixed/Feature   Workaround (hw/sw)         Fetch     Execution   Memory   I/O     Electrical        DFT




 •   30% Post TO1.0 bugs were critical
 •   80% of post TO1.0 bugs were fixed, rest became features
 •   Most of the bugs had hardware or software workaround
 •   Memory and I/O subsystem had most escapes
 •   Data compares quite well against other commercial CPUs
www.opensparc.net                                Design Verification Club – Silicon Valley                                      43
Feedback
                    "The positive effect of good verification I've noted in
                    recent project—Sun's Niagara . Projects had a close
                    working relationship between the design team and the
                    verification team. As a result of the close teamwork, very
                    early silicon worked extremely well. On many larger
                    design projects, the design team builds the design and
                    then "throws it over the wall" to the verification team.
                    This bureaucratic approach often appears in entrenched
                    organizations of larger, established companies. Smaller
                    startups cannot afford such a
                    luxury, and the Niagara team was formed as a startup
                    (Afara WebSystems) and then bought by Sun
                    Microsystems. The Afara team managed to maintain its
                    team approach even after being swallowed up by the
                    much larger Sun."
                    -- Kevin Krewell, Analyst Microprocessor report 5/31/05


www.opensparc.net                                                             44
                               Design Verification Club – Silicon Valley
About the Community: opensparc.net
                           Clustermaps for http://opensparc.net




                     Innovation
                      will happen everywhere



www.opensparc.net
                    Innovation Happens Everywhere                        45
                             Design Verification Club – Silicon Valley
Get the Source, Start Innovating
                                 20+ GB/s read/write
        DDR2 DIMM                DDR2 DIMM                  DDR2 DIMM                    DDR2 DIMM



              16B @ 333 MT/s
                                                                                                                Things you can do:
                                                                                                                - use as is
   3MB L2$
                     Memory             Memory                Memory             Memory                         - add/delete cores
                    Controller         Controller            Controller         Controller
                                                                                                                - add new instr.
                    L2$ Bank
                    L2$ Bank           L2$ Bank
                                       L2$ Bank              L2$ Bank
                                                             L2$ Bank             L2$ Bank
                                                                                 L2$ Bank                       - change FPU
                                                                                                                - add video/graphics
                                                 Crossbar
                                                 Crossbar                                          FPU
                                                                                                                - add network interface
                16KB I$    16KB I$   16KB I$    16KB I$    16KB I$   16KB I$   16KB I$   16KB I$                - change memory
                 8KB D$    8KB D$    8KB D$     8KB D$     8KB D$    8KB D$    8KB D$    8KB D$
                                                                                                                interface
                    C1      C2       C3          C4         C5        C6       C7        C8                     - change I/O interface
                                                                                                                - change cache/mem
                                                                                                                interface
                                                     Sys I/F                               4 threads per core   - etc.
                                               Buffer Switch Core


                                                            16B @ 200Mhz
                                      IO BUS                3.2GB/s peak, 2.5GB/s effective


       Innovate anywhere – within it or outside it
www.opensparc.net                                                                                                       46
                                                          Design Verification Club – Silicon Valley
OpenSPARC Communities
       Academia/Universities                                                    EDA Vendors
        Architecture, ISA, VLSI course work                                          Benchmarking
        Threading, Scaling, Parallelization                                          Reference flow
        Benchmarks                                                                   FPGA
                                                                                     Emulation
                                                                                     Verification
                                                                                     Physical Design
                                                                                     Multi-threaded tools

        CMT Tools
        Compilers, Threading
        Optimization                                      Hardware IP Suppliers
        Performance Analysis                                                   PCI cores, SERDES etc.
            Operating Systems
                    OpenSolaris,                      Chip Designers
                    Linux, BSD variants,                       SoC designs, Hard macros
                    Embedded OSs                               Telecom applications
www.opensparc.net                                                                           47
                                   Design Verification Club – Silicon Valley
64 Bits. 32 Threads. Free.


      Imagine what’s
          next...

 "Sun's decision to release Verilog source code for the UltraSPARC hardware design
 under a free software license is a historic step. Sun is showing its profound
 understanding of the forces shaping our technological future in making this decision.
                         -- Eben Moglen, founding director of the Software Freedom Law Center
 "The free world welcomes Sun's decision to use the Free Software Foundation's GNU
 GPL for the freeing of OpenSPARC. We'd love to see other hardware companies
 follow in Sun's footsteps."
                         -- Richard Stallman, Free Software Foundation

www.opensparc.net            Design Verification Club – Silicon Valley            48
Thank You!
Shrenik Mehta
Director, Frontend Technologies &
OpenSPARC Program
Sun Microsystems, Inc.


  www.opensparc.net
Conversations We Should Have Now


                    OpenSource                                Network with Peers

           (Software, hardware)



              OpenStandards
                                                                       Open Verification
       (SystemVerilog, Property
        Specification Language)

www.opensparc.net                Design Verification Club – Silicon Valley                 50
Legal Substantiation – Benchmarks
 •   Sun Fire T2000 (8 cores, 1 chip) 14,001 SPECweb2005. IBM p5 550 (4 cores, 2 chips) 7881 SPECweb2005. IBM
     eServer Xseries x346 (2 cores, 2 chips) 4348 SPECweb2005. SPEC, SPECweb reg tm of Standard Performance
     Evaluation Corporation. Sun Fire T2000 results submitted to SPEC. Other results from www.spec.org as of December 6,
     2005.
 •   SPECjAppServer2004 with BEA - Sun Fire T2000 (8 cores, 1 chip) 615.64 JOPS@Standard. SPECjAppServer2004 Sun
     Fire rx4600 (4 cores, 4 chip) 471.28 JOPS@Standard. SPEC, SPECjAppServer reg tm of Standard Performance
     Evaluation Corporation. Sun Fire T2000 results submitted to SPEC. Other results from www.spec.org as of 12/06/2005.
 •   SPECjAppServer2004 with Sun Java System Application Server. SPECjAppServer2004 Sun Fire T2000 (8 cores, 1 chip)
     436.71 JOPS@Standard. SPECjAppServer2004 HPrx4640 (4 cores, 4 chip) 471.28 JOPS@Standard. SPEC,
     SPECjAppServer reg tm of Standard Performance Evaluation Corporation. Sun Fire T2000 results submitted to SPEC.
     Other results from www.spec.org as of 12/06/2005. Sun HW+SW application tier cost = $37,484.95, appl cost per JOP =
     $85.83. HP HW+SW application tier cost = $140,537.88, appl cost per JOP = $298.20 HP Bill of Material from
     http://www.spec.org/jAppServer2004/results/res2005q3/jAppServer2004-20050913-00016.html BEA pricing from
     http://www.awaretechnologies.com/BEA/index.html.System pricing dated 11/29/05
 •   Sun Fire T1000 Server (1 chip, 8 cores, 1-way) 51,540 bops, 12,885 bops/JVM. IBM x346 (2 chip, 4 cores, 4-way)
     39,585 bops, 39,585 bops/JVM. IBM p520 (1 chip, 2 cores, 2-way) 32,820 bops, 32,820 bops/JVM. Dell SC1425 (2 chip,
     2 cores, 2-way) 24,208 bops, 24,208 bops/JVM. SPEC, SPECjbb reg tm of Standard PerformanceEvaluation Corporation.
     Sun Fire T1000 results submitted to SPEC. Other resultss as of 12/6/2005 on www.spec.org




www.opensparc.net                                                                                       51
                                            Design Verification Club – Silicon Valley
Legal Substantiation – Benchmarks
      (Cont'd)
  •   NotesBench R6iNotes Sun Fire T2000 (1x1200 MHz UltraSPARC T1, 32GB), 4 partitions, Solaris[TM] 10, Lotus[R] Domino
      7.0, 19,000 users, $4.36 per user, 16,061 NotesMark tpm, 400 ms avg rt. NotesBench R6iNotes IBM x346 (2 x 3.4 GHz
      Xeon processors, 8GB), 1 partition, SuSE Linux 8, Lotus[R] DominoR6.5.3, 6,50 users, $9.07 per user, 5,109 NotesMark
      tpm, 569 ms avg rt.More info www.notesbench.org
  •   Portal tests conducted on Sun Fire Sun Fire T2000 configured with 6 cores, 1.0GHz UltraSPARC T1 processor and 16GB
      memory. Dell 6650 configured with 4 x Intel Xeon processors at 2GHz and 4GB memory. 1 processor was active to run
      the test. Internal test using SLAMD Distributed Load Generation Engine AE & Mercury Load Runner. Both systems were
      installed with Solaris 10 OS and Sun Java Portal Server 7. Test date: 11/14/05.
  •   Crypto Processing RSA & DSA sign operations @ 1024-bit




www.opensparc.net                                                                                       52
                                            Design Verification Club – Silicon Valley
Legal Substantiation – Benchmarks
      (Cont'd)
  •   Two-tier SAP ECC 5.0 Standard Sales and Distribution (SD) benchmark Sun Fire T2000 (1 processor, 8 cores, 32 threads)
      1.2 GHz UltraSPARC T1, 32 GB mem, 950 SD benchmark users, 1.91 sec avg resp, MaxDB 7.5 database, Solaris 10.
      SAP certification number was not available at press time, please see: www.sap.com/benchmark. Benchmark data
      submitted for approval: 950 SD Users (Sales &Distribution), Ave. dialog resp. time: 1.91 seconds, Throughput: Fully
      processed order line items/hour:95,670, Dialog steps/hour: 287,000, SAPS: 4,780, Average DB req. time (dia/upd): 0.080
      sec / 0.157 sec, CPU utilization of central server: 99%, central server OS: Solaris 10, RDBMS: MaxDB 7.5, SAP ECC

      Release: 5.0, Configuration: Sun Fire Model T2000, 1 processor/ 8 cores / 32 threads, UltraSPARC T1, 1200 MHz, 64
      KB(D) + 128 KB(I) L1 cache, 3 MB L2 cache, 32 GB main memory. Two-tier SAP SD Standard Application Benchmark
      Release 4.70 (64-bit)results for the HP ProLiant DL580 (4-way, 4 procs, 4 cores, 4 threads) included 4x 3.33 Ghz Xeon, 32
      GB mem, 937 SAP SD Benchmark users,1.96 sec avg response time, Cert#2005012, running Microsoft® Windows Server

      2003, Enterprise x64 Edition (64-bit) and Microsoft SQL Server 2000 Enterprise Edition (32-bit), certified on March 29,
      2005. Two-tier SAP SD Standard Application Benchmark Release 4.70 (64-bit) results for the HP rx4640 (4-way, 4 procs, 4
      cores, 4 threads) included 4x 1.5 Ghz Itanium2, 32 GB mem, 880 SAP SD Benchmark users, 1.89 sec avg response time,
      Cert#2004030, running HP-UX 11i,Oracle 9i certified on June 4, 2004. More information on SAP Benchmark results can be
      found at www.sap.com/benchmark




www.opensparc.net                                                                                           53
                                              Design Verification Club – Silicon Valley

OpenSPARC T1 Processor

  • 1.
    OpenSPARC T1 Processor DV Club – Silicon Valley May 23rd, 2006 Shrenik Mehta Director, Frontend Technologies & OpenSPARC Program System Group Sun Microsystems, Inc. www.opensparc.net
  • 2.
    Agenda 1. New wave – Chip Multi-Threading (CMT) 2. UltraSPARC T1 3. Participation Age – UltraSPARC T1 verification 4. OpenSPARC™ community www.opensparc.net 2 Design Verification Club – Silicon Valley
  • 3.
    Making the RightWaves Chip Multi-threading (CoolThreads ) TM Symmetrical Multi-processing (SMP) Improved Price/Performance Reduced Instruction Set Computing (RISC) 1980 1990 2000 2010 www.opensparc.net 3 Design Verification Club – Silicon Valley
  • 4.
    Attributes of CommercialWorkloads Web Services Client Server Data Warehouse TIER1 TIER2 TIER3 SAP 2T SAP 3T DSS Attribute Web App Serv Data (DB) (TPC-H) (Web99) (JBB) (TPC-C) Application Web Server OLTP ERP ERP DSS Category Server Java Instruction-level Low Low Low Medium Low High Parallelism Thread-level High High High High High High Parallelism Instruction/Data Large Large Large Medium Large Large Working Set Data Sharing Low Medium High Medium High Medium www.opensparc.net 4 Design Verification Club – Silicon Valley
  • 5.
    Memory Bottleneck Relative Performance 10000 CPU Frequency 2x Every 2 Years DRAM Speeds 1000 100 Gap 10 2x Every 6 Years 1 1980 1985 1990 1995 2000 2005 www.opensparc.netWorld Source: Sun Wide Analyst Conference Feb. 25, 2003 5 Design Verification Club – Silicon Valley
  • 6.
    Single Threading HURRY Up to 85% Cycles Waiting for Memory UP AND WAIT! Single Threaded Performance Typical Processor Utilization:15–25% Thread C M C M C M Tim Memory Latency Compute e www.opensparc.net 6 Design Verification Club – Silicon Valley
  • 7.
    The Power ofCMT - CoolThreads UltraSPARC T1 Single Threaded Performance Utilization: Processor Up Chip Multi-threaded to 85% (CMT) Performance Thread 4 C M C M C M Thread 3 C M C M C M Thread 2 C M C M C M Thread 1 C M C M C M Tim Memory Latency Compute e www.opensparc.net 7 Design Verification Club – Silicon Valley
  • 8.
    Chip Multi-Threading (CMT)to the rescue CMP HMT CMT (chip multiprocessing) (hardware multithreading) (chip multithreading) n cores per processor m strands per core n x m threads per processor www.opensparc.net 8 Design Verification Club – Silicon Valley
  • 9.
    Why CMT Works “Goal: 100% Resource Utilization” 10.0 Relative Performance on thread- Multi-Thread, Multi-Core rich memory bound workloads Multi-Thread, Single-Core 2.0 1.0 Single-Thread, Single-Core SPARC: 4 threads per core .05 ● Increases core die area by ~20% ● Improves performance by ~50–100% 20% Core Size Maximum www.opensparc.net 9 Design Verification Club – Silicon Valley
  • 10.
    Example - SpecJBBExecution Efficiency Idle Time Single 3.79 cycles 1 = 21% Threaded 1 + 3.79 Efficiency Idle Time 1.56 cycles 4 = 72% Four 4 + 1.56 Efficiency Threaded Cycles 0 4 8 Compute Pipeline Conflict Pipeline Latency Memory Latency A. S. Leon et al., “A Power-Efficient High Throughput 32-Thread SPARC Processor,” ISSCC06, Paper 5.1 Copyright Sun Microsystems 2006, Sun Microsystems, Inc. All rights reserved. Used by permission. Page 10
  • 11.
    UltraSPARC T1 Processor • SPARC V9 implementation DDR-2 DDR-2 DDR-2 DDR-2 SDRAM SDRAM SDRAM SDRAM • Up to eight 4-way multi- threaded cores for up to 32 simultaneous threads • All cores connected through a 134.4GB/s crossbar switch L2$ L2$ L2$ L2$ • High-bandwidth 12-way associative Xbar FPU 3MB Level-2 cache on chip • 4 DDR2 channels (23GB/s) C1 C2 C3 C4 C5 C6 C7 C8 • Power : < 80W • ~300M transistors Sys I/F Buffer Switch • 378 sq. mm die Core 1 of 8 Cores BUS www.opensparc.net 11 Design Verification Club – Silicon Valley
  • 12.
    CMT: On-chip =High Bandwidth Traditional SMP: 32 Threads Example: Typical SMP Machine Configuration Niagara: 32 Threads One motherboard, no switch ASICs P P P P P P P P I Switch I O Switch O Mem Ctlr M M M M M M M M P P L2 P P P P P P P P L2 Mem Ctlr XBar I P P I O Switch Switch O Switch P P L2 P P L2 Mem Ctlr Switch Switch M M M M M M M M P P P P P P P P Mem Ctlr I I I/O O Switch Switch O M M M M M M M M Direct crossbar interconnect P P P P P P P P I I Lower cost, better RAS, lower BTUs, O Switch Switch O lower and uniform latency, M M M M M M M M greater and uniform bandwidth. . . www.opensparc.net 12 Design Verification Club – Silicon Valley
  • 13.
    Faster Can BeCooler Single-Core Processor CMT Processor 107C C1 C2 C3 C4 102C 96C 91C 85C 80C 74C 69C 63C 58C C5 C6 C7 C8 (Not to Scale) www.opensparc.net 13 Design Verification Club – Silicon Valley
  • 14.
    UltraSPARC-T1: Some DesignChoices • Simpler core architecture to maximize cores on die • Caches, dram channels shared across cores give better area utilization • Shared L2 decreases cost of coherence misses by an order of magnitude • On die memory controllers reduce miss latency • Crossbar good for b/w, latency, functional verification • 378mm2 die in 90nm dissipating ~70W www.opensparc.net 14 Design Verification Club – Silicon Valley
  • 15.
    UltraSPARC-T1 Processor Core MUL ● Four threads per core ● Single issue 6 stage pipeline EXU ● 16KB I-Cache, 8KB D-Cache > Unique resources per thread > Registers IFU > Portions of I-fetch datapath > Store and Miss buffers > Resources shared by 4 threads MMU LSU > Caches, TLBs, Execution Units > Pipeline registers and DP ● Core Area = 11mm2 in 90nm TRAP ● MT adds ~20% area to core www.opensparc.net Design Verification Club – Silicon Valley 15
  • 16.
    SPARC Core Pipeline Fetch Thrd Sel Decode Execute Memory WB Regfile Modular Exponientation Unit x4 Alu DCache ICache Inst Thrd Mul Dtlb Itlb buf x 4 Sel Decode Shft Crossbar Stbuf x 4 Interface Mux Div Thread selects Instruction type Thread misses select traps & interrupts logic resource conflicts Thrd PC logic Sel x4 Mux www.opensparc.net 16 Design Verification Club – Silicon Valley
  • 17.
    Thread Selection Policy ● Switch between available threads every cycle giving priority to least recently executed thread ● Threads become unavailable due to: ● Long latency ops like loads, branch, mul, div. ● Pipeline stalls such as cache misses, traps, and resource conflicts ● Loads are speculated as cache hits, and the thread is switched in with lower priority www.opensparc.net 17 Design Verification Club – Silicon Valley
  • 18.
    CMT Benefits Performance Cost ● Fewer servers ● Less floor space ● Reduced power consumption ● Less air conditioning ● Lower administration and maintenance Reliability www.opensparc.net 18 Design Verification Club – Silicon Valley
  • 19.
    CMT Pays Offwith CoolThreads Technology TM Sun Fire T1000 and T2000 ● Up to 5x the performance ● As low as 1/5 the energy ● As small as 1/4 the size www.opensparc.net *See disclosures 19 Design Verification Club – Silicon Valley
  • 20.
    CoolThreads Servers area Hit “...performance in several “These servers could save profiles unmatched for the companies millions.” power and space it consumes.” “...the machines put Sun at the “...the UltraSparc T1 is cutting edge of one of the chip truly a revolutionary industry's biggest trends... processor.” multi-core systems.” www.opensparc.net 20 Design Verification Club – Silicon Valley
  • 21.
    New wave requiresrethinking everything Why not open source hardware? www.opensparc.net 21 Design Verification Club – Silicon Valley
  • 22.
    World's First OpenSource Microprocessor OpenSPARC.net • Governed by GPL (2) • Complete chip architecture • Register Transfer Logic (RTL) • Hypervisor API • Verification suite and architectural models • Simulation model for Solaris bringup on s/w www.opensparc.net 22 Design Verification Club – Silicon Valley
  • 23.
    Virtualisation • Thin software layer between OS and platform sun4v virtual machine hardware User User User App App App • Para-virtualised OS Solaris • Hypervisor + sun4v interface OpenBoot • Virtualises machine HW and isolates OS from register- level • Delivered with platform not OS Hypervisor • Not itself an OS SPARC hardware stable interface “sun4v” www.opensparc.net 23 Design Verification Club – Silicon Valley
  • 24.
    It’s about Participation www.opensparc.net 24 Design Verification Club – Silicon Valley
  • 25.
    UltraSPARC T1 -Verification • Philosophy > Verify once and then leverage, e.g. SPARC core, L2$ bank • Strategy > Directed “Black box” testing for architectural, “cycle” independent features > Stand Alone Testing to flush out all basic bugs allows for more “independent bug peeling” and better controllability > Directed Random “White box” testing with Functional Coverage Feedback > Some formal methods, code review in high risk areas www.opensparc.net Design Verification Club – Silicon Valley 25
  • 26.
    Architecture Verification • Crosschecking with Architectural Simulator (AS) > Checks “Architectural state”, but follows RTL on certain conditions like memory order • Program Reference Manual (PRM) Coverage > Aim to “qualify” PRM with set of diags > Exhaustive set of directed diags, very basic operations, with certain coverage goals > Timing independent, only checks for functionality > Divided in opcode groups – ALU, State-regs, Control-Transfer, Mem-ops > Traps, ASI & MMU, State-register coverage www.opensparc.net Design Verification Club – Silicon Valley 26
  • 27.
    Architecture Verification (2) •Self-Checking Diags > Covers areas where AS lags >Especially in Diagnostics, IO/DMA, tick-regs, errors, reset and self-modifying code >Memory model verification >Performance diags www.opensparc.net Design Verification Club – Silicon Valley 27
  • 28.
    Architecture Verification (3) •System Interface Verification > Producer-Consumer variations between SPARC core and I/O devices for memory and I/O addresses > All threads accessibility to DRAM controller and I/O's > Traps on unsupported instructions • Reset Verification > Power-up state of the machine as per PRM > Chip state after warm reset, including DRAM controller and I/O • Debug Support Verification > Cache set associativity functionality > Error injection, clock-stop, TAP access www.opensparc.net Design Verification Club – Silicon Valley 28
  • 29.
    Microarchitecture Verification Function-Focused MT Coherency RAS ... Perf Arch Unit-Focused Testing Fetch Execute TLB Ld/St Traps System Float Switch L2 DRAM www.opensparc.net Design Verification Club – Silicon Valley 29
  • 30.
    Microarchitecture Verification (2) •Unit focused Verification > Write Checkers > Write Directed, C/Perl, Internal code generator diags to verify functionality > All major blocks Pipe, Traps, TLB, Load Store, Streaming, Floating point, crossbar, L2 and DRAM controller • Function focused Verification > Rely on AS and functional checkers > Generate directed and pseudo-random > Total Store Order (memory ordering), Multi-threading, RAS/Errors > Notice that function focused verification spans across blocks www.opensparc.net Design Verification Club – Silicon Valley 30
  • 31.
    Memory verification usingSymbolic Simulation testbench file (.tb) Transistor SPICE Netlist Schematic RTL ssf -runcv s Fa Main Steps Pas il 1. Schematic to SPICE netlist generation debug coverage analysis 2. Testbench generation 3. Symbolic simulation and debug 4. Symbolic coverage analysis www.opensparc.net Design Verification Club – Silicon Valley 31
  • 32.
    Clock Domain CheckingVerification • Clock Domain Checking (CDC) > Static analysis of clock domain crossings • Key features of CDC > Identify clock domain crossings > Detect missing synchronizers > Identify incorrect trasfer protocols > Verify reconvergence of independently synchronized signals • Found 8 bugs across 6000 crossings in 3 clock domains www.opensparc.net Design Verification Club – Silicon Valley 32
  • 33.
    Performance Verification • Goalto make sure performance of RTL model matches the Architectural performance simulator • Single thread and multi-thread architectural and performance diags matches within 2-3% accuracy • Run bigger Multi-threaded application traces under mini- OS kernel • Found 50+ bugs www.opensparc.net Design Verification Club – Silicon Valley 33
  • 34.
    Hardware Acceleration • Preventfunctional bug escapes by running long directed and random tests • Facilitate early SW development and system integration > Booted Hypervisor, Open Boot Prom (OBP) and multi-threaded Solaris prior to Tapeout • Aid Post-silicon debug and fix verification > Repeatability, check-point replay • Ran estimated 5+ Trillion simulation cycles www.opensparc.net Design Verification Club – Silicon Valley 34
  • 35.
    Functional Verification TapeoutCriteria • 15B fullchip random cycles without a hardware failure • Full chip bug rate < 1 per week • Coverage > Functional coverage greater than 95% > Line/Condition coverage greater than 99% > Remaining untested coverage reviews and waivers • All architectural diags including all PRM documented features written and passing • Tester reset tested in RTL & full gate level simulation • Hypervisor and OBP brought up in RTL • No open bugs blocking bring up activities 35 www.opensparc.net Design Verification Club – Silicon Valley
  • 36.
    Verification Metrics www.opensparc.net 36 Design Verification Club – Silicon Valley
  • 37.
    RTL Stability, byweek www.opensparc.net Design Verification Club – Silicon Valley 37
  • 38.
    Weekly New Bugs www.opensparc.net Design Verification Club – Silicon Valley 38
  • 39.
    Cumulative Bugs www.opensparc.net Design Verification Club – Silicon Valley 39
  • 40.
    Coverage Objects www.opensparc.net Design Verification Club – Silicon Valley 40
  • 41.
    System level Coverage www.opensparc.net Design Verification Club – Silicon Valley 41
  • 42.
    Coverage • 20K+ totalcoverage objects Coverage Distrubution across Functional Blocks 4500 4000 3500 3000 2500 Coverage Objects 2000 1500 1000 500 0 IFU EXU FFU MMU SPU TLU LSU MSS FPU DRA IOB JBI M www.opensparc.net Design Verification Club – Silicon Valley 42
  • 43.
    Verification Effectiveness Test Escapes Escapes across various functionalities 45 16 40 14 35 12 30 10 25 8 20 6 15 4 10 5 2 0 0 Critical/Not-critical Fixed/Feature Workaround (hw/sw) Fetch Execution Memory I/O Electrical DFT • 30% Post TO1.0 bugs were critical • 80% of post TO1.0 bugs were fixed, rest became features • Most of the bugs had hardware or software workaround • Memory and I/O subsystem had most escapes • Data compares quite well against other commercial CPUs www.opensparc.net Design Verification Club – Silicon Valley 43
  • 44.
    Feedback "The positive effect of good verification I've noted in recent project—Sun's Niagara . Projects had a close working relationship between the design team and the verification team. As a result of the close teamwork, very early silicon worked extremely well. On many larger design projects, the design team builds the design and then "throws it over the wall" to the verification team. This bureaucratic approach often appears in entrenched organizations of larger, established companies. Smaller startups cannot afford such a luxury, and the Niagara team was formed as a startup (Afara WebSystems) and then bought by Sun Microsystems. The Afara team managed to maintain its team approach even after being swallowed up by the much larger Sun." -- Kevin Krewell, Analyst Microprocessor report 5/31/05 www.opensparc.net 44 Design Verification Club – Silicon Valley
  • 45.
    About the Community:opensparc.net Clustermaps for http://opensparc.net Innovation will happen everywhere www.opensparc.net Innovation Happens Everywhere 45 Design Verification Club – Silicon Valley
  • 46.
    Get the Source,Start Innovating 20+ GB/s read/write DDR2 DIMM DDR2 DIMM DDR2 DIMM DDR2 DIMM 16B @ 333 MT/s Things you can do: - use as is 3MB L2$ Memory Memory Memory Memory - add/delete cores Controller Controller Controller Controller - add new instr. L2$ Bank L2$ Bank L2$ Bank L2$ Bank L2$ Bank L2$ Bank L2$ Bank L2$ Bank - change FPU - add video/graphics Crossbar Crossbar FPU - add network interface 16KB I$ 16KB I$ 16KB I$ 16KB I$ 16KB I$ 16KB I$ 16KB I$ 16KB I$ - change memory 8KB D$ 8KB D$ 8KB D$ 8KB D$ 8KB D$ 8KB D$ 8KB D$ 8KB D$ interface C1 C2 C3 C4 C5 C6 C7 C8 - change I/O interface - change cache/mem interface Sys I/F 4 threads per core - etc. Buffer Switch Core 16B @ 200Mhz IO BUS 3.2GB/s peak, 2.5GB/s effective Innovate anywhere – within it or outside it www.opensparc.net 46 Design Verification Club – Silicon Valley
  • 47.
    OpenSPARC Communities Academia/Universities EDA Vendors Architecture, ISA, VLSI course work Benchmarking Threading, Scaling, Parallelization Reference flow Benchmarks FPGA Emulation Verification Physical Design Multi-threaded tools CMT Tools Compilers, Threading Optimization Hardware IP Suppliers Performance Analysis PCI cores, SERDES etc. Operating Systems OpenSolaris, Chip Designers Linux, BSD variants, SoC designs, Hard macros Embedded OSs Telecom applications www.opensparc.net 47 Design Verification Club – Silicon Valley
  • 48.
    64 Bits. 32Threads. Free. Imagine what’s next... "Sun's decision to release Verilog source code for the UltraSPARC hardware design under a free software license is a historic step. Sun is showing its profound understanding of the forces shaping our technological future in making this decision. -- Eben Moglen, founding director of the Software Freedom Law Center "The free world welcomes Sun's decision to use the Free Software Foundation's GNU GPL for the freeing of OpenSPARC. We'd love to see other hardware companies follow in Sun's footsteps." -- Richard Stallman, Free Software Foundation www.opensparc.net Design Verification Club – Silicon Valley 48
  • 49.
    Thank You! Shrenik Mehta Director,Frontend Technologies & OpenSPARC Program Sun Microsystems, Inc. www.opensparc.net
  • 50.
    Conversations We ShouldHave Now OpenSource Network with Peers (Software, hardware) OpenStandards Open Verification (SystemVerilog, Property Specification Language) www.opensparc.net Design Verification Club – Silicon Valley 50
  • 51.
    Legal Substantiation –Benchmarks • Sun Fire T2000 (8 cores, 1 chip) 14,001 SPECweb2005. IBM p5 550 (4 cores, 2 chips) 7881 SPECweb2005. IBM eServer Xseries x346 (2 cores, 2 chips) 4348 SPECweb2005. SPEC, SPECweb reg tm of Standard Performance Evaluation Corporation. Sun Fire T2000 results submitted to SPEC. Other results from www.spec.org as of December 6, 2005. • SPECjAppServer2004 with BEA - Sun Fire T2000 (8 cores, 1 chip) 615.64 JOPS@Standard. SPECjAppServer2004 Sun Fire rx4600 (4 cores, 4 chip) 471.28 JOPS@Standard. SPEC, SPECjAppServer reg tm of Standard Performance Evaluation Corporation. Sun Fire T2000 results submitted to SPEC. Other results from www.spec.org as of 12/06/2005. • SPECjAppServer2004 with Sun Java System Application Server. SPECjAppServer2004 Sun Fire T2000 (8 cores, 1 chip) 436.71 JOPS@Standard. SPECjAppServer2004 HPrx4640 (4 cores, 4 chip) 471.28 JOPS@Standard. SPEC, SPECjAppServer reg tm of Standard Performance Evaluation Corporation. Sun Fire T2000 results submitted to SPEC. Other results from www.spec.org as of 12/06/2005. Sun HW+SW application tier cost = $37,484.95, appl cost per JOP = $85.83. HP HW+SW application tier cost = $140,537.88, appl cost per JOP = $298.20 HP Bill of Material from http://www.spec.org/jAppServer2004/results/res2005q3/jAppServer2004-20050913-00016.html BEA pricing from http://www.awaretechnologies.com/BEA/index.html.System pricing dated 11/29/05 • Sun Fire T1000 Server (1 chip, 8 cores, 1-way) 51,540 bops, 12,885 bops/JVM. IBM x346 (2 chip, 4 cores, 4-way) 39,585 bops, 39,585 bops/JVM. IBM p520 (1 chip, 2 cores, 2-way) 32,820 bops, 32,820 bops/JVM. Dell SC1425 (2 chip, 2 cores, 2-way) 24,208 bops, 24,208 bops/JVM. SPEC, SPECjbb reg tm of Standard PerformanceEvaluation Corporation. Sun Fire T1000 results submitted to SPEC. Other resultss as of 12/6/2005 on www.spec.org www.opensparc.net 51 Design Verification Club – Silicon Valley
  • 52.
    Legal Substantiation –Benchmarks (Cont'd) • NotesBench R6iNotes Sun Fire T2000 (1x1200 MHz UltraSPARC T1, 32GB), 4 partitions, Solaris[TM] 10, Lotus[R] Domino 7.0, 19,000 users, $4.36 per user, 16,061 NotesMark tpm, 400 ms avg rt. NotesBench R6iNotes IBM x346 (2 x 3.4 GHz Xeon processors, 8GB), 1 partition, SuSE Linux 8, Lotus[R] DominoR6.5.3, 6,50 users, $9.07 per user, 5,109 NotesMark tpm, 569 ms avg rt.More info www.notesbench.org • Portal tests conducted on Sun Fire Sun Fire T2000 configured with 6 cores, 1.0GHz UltraSPARC T1 processor and 16GB memory. Dell 6650 configured with 4 x Intel Xeon processors at 2GHz and 4GB memory. 1 processor was active to run the test. Internal test using SLAMD Distributed Load Generation Engine AE & Mercury Load Runner. Both systems were installed with Solaris 10 OS and Sun Java Portal Server 7. Test date: 11/14/05. • Crypto Processing RSA & DSA sign operations @ 1024-bit www.opensparc.net 52 Design Verification Club – Silicon Valley
  • 53.
    Legal Substantiation –Benchmarks (Cont'd) • Two-tier SAP ECC 5.0 Standard Sales and Distribution (SD) benchmark Sun Fire T2000 (1 processor, 8 cores, 32 threads) 1.2 GHz UltraSPARC T1, 32 GB mem, 950 SD benchmark users, 1.91 sec avg resp, MaxDB 7.5 database, Solaris 10. SAP certification number was not available at press time, please see: www.sap.com/benchmark. Benchmark data submitted for approval: 950 SD Users (Sales &Distribution), Ave. dialog resp. time: 1.91 seconds, Throughput: Fully processed order line items/hour:95,670, Dialog steps/hour: 287,000, SAPS: 4,780, Average DB req. time (dia/upd): 0.080 sec / 0.157 sec, CPU utilization of central server: 99%, central server OS: Solaris 10, RDBMS: MaxDB 7.5, SAP ECC Release: 5.0, Configuration: Sun Fire Model T2000, 1 processor/ 8 cores / 32 threads, UltraSPARC T1, 1200 MHz, 64 KB(D) + 128 KB(I) L1 cache, 3 MB L2 cache, 32 GB main memory. Two-tier SAP SD Standard Application Benchmark Release 4.70 (64-bit)results for the HP ProLiant DL580 (4-way, 4 procs, 4 cores, 4 threads) included 4x 3.33 Ghz Xeon, 32 GB mem, 937 SAP SD Benchmark users,1.96 sec avg response time, Cert#2005012, running Microsoft® Windows Server 2003, Enterprise x64 Edition (64-bit) and Microsoft SQL Server 2000 Enterprise Edition (32-bit), certified on March 29, 2005. Two-tier SAP SD Standard Application Benchmark Release 4.70 (64-bit) results for the HP rx4640 (4-way, 4 procs, 4 cores, 4 threads) included 4x 1.5 Ghz Itanium2, 32 GB mem, 880 SAP SD Benchmark users, 1.89 sec avg response time, Cert#2004030, running HP-UX 11i,Oracle 9i certified on June 4, 2004. More information on SAP Benchmark results can be found at www.sap.com/benchmark www.opensparc.net 53 Design Verification Club – Silicon Valley