Software & Services Group
Developer Products Division Copyright© 2011, Intel Corporation. All rights reserved.
*Other brands and names are the property of their respective owners.
Features Of Modern Intel
Microprocessors
Prepared By:
Krunal P Siddhapathak (10BEC097)
Core and Multi-Core Processor
What is a Core?
 A standard processor has one core (single-core). A single-core processor
executes one instruction stream at a time. It does use pipelines internally,
which allow several instructions to be in flight at different stages, but the
instructions still come from that one stream.
What is a Multi-Core Processor?
 A multi-core processor is composed of two or more independent cores,
each capable of processing individual instructions. A dual-core processor
contains two cores, a quad-core processor contains four cores, and a
hexa-core processor contains six cores.
Need Of Multi-Core Processors
Multiple cores can be used to run two programs side by side:
while an intensive program is running (AV scan, video
conversion, CD ripping, etc.), another core can run your
browser or email client.
Multiple cores really shine when you’re using a program that
can spread its work over more than one core (called
parallelization) to improve its throughput. Programs
such as graphics software and games can run multiple
instruction streams at the same time and deliver faster,
smoother results.
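As a minimal sketch of parallelization, the Python snippet below splits one CPU-heavy job (summing squares) into contiguous chunks and hands each chunk to a separate worker process. The names `split_range`, `sum_squares`, and `parallel_sum_of_squares` are illustrative and not taken from any particular program; the actual speed-up depends on the workload and the machine.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def split_range(n, parts):
    """Split range(n) into `parts` contiguous (start, stop) chunks."""
    step, extra = divmod(n, parts)
    bounds, start = [], 0
    for i in range(parts):
        # The first `extra` chunks take one additional item each.
        stop = start + step + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def sum_squares(bounds):
    """CPU-bound work for one chunk: sum of i*i for start <= i < stop."""
    start, stop = bounds
    return sum(i * i for i in range(start, stop))


def parallel_sum_of_squares(n, workers=None):
    """Run one chunk per worker process so several cores can work at once."""
    workers = workers or os.cpu_count() or 1
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum_squares, split_range(n, workers)))
```

Calling `parallel_sum_of_squares(10_000_000)` from a script's `if __name__ == "__main__":` block returns the same value as the single-core `sum_squares((0, 10_000_000))`. On a multi-core machine it will typically finish sooner, although process start-up costs can outweigh the gain for small inputs.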
If you use CPU-intensive software, multiple cores will likely
provide a better computing experience. If you use your PC to
check emails and watch the occasional video, you really don’t
need a multi-core processor.
Core 2 Duo vs. Core i3 vs. Core i5
                    Core 2 Duo       Core i3      Core i5
Number of Threads   Two              Four         Four
Socket              775 (45/65nm)    1156 (nm)    1156 (nm)
Compatible RAM      DDR2             DDR3         DDR3
Turbo Boost         No               No           Yes
Overclocking        No               Yes          No
Do I need an i3, i5, or i7?
As with all computer hardware, the type of processor you
need depends on your workload, how long you want your
computer to stay current, and your budget.
If you:
 Browse the internet, check email, and play the occasional Flash game (like
Farmville): get a single-core netbook or desktop.
 Do word processing and spreadsheets, listen to music often, and watch
movies: get an i3 processor (or any dual-core processor, e.g. a Core 2 Duo).
 Play the occasional game and are happy with lower resolution and lower
quality graphics (this suggestion assumes the graphics processor in the
pre-built PC is well matched to the processor), or watch
HD movies: get an i5.
 Do graphic publishing, music creation, or programming (and
compiling), watch HD movies, or like to play visually appealing games:
get a quad-core i5 or an i7.
 Like to have the very best hardware and play the most graphically
intense games: get a quad-core or hexa-core i7 Extreme.
Intel Sandy Bridge Microarchitecture
 Many of the bottlenecks of previous designs have been dealt with in the Sandy Bridge.
 Instruction fetch and predecoding have been a serious bottleneck in Intel designs for
many years. In the NetBurst architecture they tried to fix this problem by caching
decoded µops, without much success.
 In the Sandy Bridge design, they are caching instructions both before and after
decoding. The limited size of the µop cache is therefore less problematic, and the µop
cache appears to be very efficient. The limited number of register read ports has been a
serious, and often neglected, bottleneck since the old Pentium Pro.
 This bottleneck has now finally been removed in the Sandy Bridge. Previous Intel
processors have only one memory read port where AMD processors have two. This was a
bottleneck in many math applications. The Sandy Bridge has two read ports, whereby
this bottleneck is removed. The branch prediction has been improved by having bigger
buffers and a shorter misprediction penalty, but it has no loop predictor, and
mispredictions are still quite common.
 The new AVX instruction set is an important improvement. The throughput of floating
point addition and multiplication is doubled when the new 256-bit YMM registers are
used. The new non-destructive three-operand instructions are quite convenient for
reducing register pressure and avoiding register move instructions. There is, however, a
serious performance penalty for mixing vector instructions with and without the VEX
prefix. This penalty is easily avoided if the programming guidelines are followed, but I
suspect that it will be a very common programming error in the future to inadvertently
mix VEX and non-VEX instructions, and such errors will be difficult to detect.
Intel Sandy Bridge Microarchitecture(Contd.)
Whenever the narrowest bottleneck is removed from a
system, the next less narrow bottleneck will become the
limiting factor. The new bottlenecks that require attention in
the Sandy Bridge are the following:
 The µop cache: This cache can ideally hold up to 1536 µops. The effective utilization
will be much less in most cases. The programmer should pay attention to make sure
the most critical inner loops fit into the µop cache.
 Instruction fetch and decoding: The fetch/decode rate has not been improved over
previous processors and is still a potential bottleneck for code that doesn’t fit into
the µop cache.
 Data cache bank conflicts: The increased memory read bandwidth means that the
frequency of cache conflicts will increase. Cache bank conflicts are almost
unavoidable in programs that utilize the memory ports to their maximum capacity.
 Branch prediction: While the branch history buffer and branch target buffers are
probably bigger than in previous designs, mispredictions are still quite common.
 Sharing of resources between threads: Many of the critical resources are shared
between the two threads of a core when hyperthreading is on. It may be wise to turn
off hyperthreading when multiple threads depend on the same execution resources.
Intel Ivy Bridge Microarchitecture
Ivy Bridge is the codename for an Intel microprocessor using
the Sandy Bridge microarchitecture. The name is also applied
more broadly to the 22 nm die shrink of the microarchitecture
based on tri-gate ("3D") transistors, which is also used in the
future Ivy Bridge-EX and Ivy Bridge-EP microprocessors. Ivy
Bridge processors are backwards-compatible with the Sandy
Bridge platform, but might require a firmware update (vendor
specific). Intel has released new 7-series Panther Point
chipsets with integrated USB 3.0 to complement Ivy Bridge.
Volume production of Ivy Bridge chips began in the third
quarter of 2011. Quad-core and dual-core-mobile models
launched on April 29, 2012 and May 31, 2012 respectively.
Core i3 desktop processors, as well as the first 22 nm Pentium
were launched and available the first week of September,
2012.
Intel Ivy Bridge Microarchitecture(Contd.)
How much faster are the Ivy Bridge processors?
 The base clock frequency of these processors ranges from 2.8
GHz (for Core i5-3450S) to 3.5 GHz (for Core i7-3770K).
What different types of the Ivy Bridge processors
are available?
 There are many types of processors in the Ivy Bridge family. The
type is indicated by putting a suffix to the CPU model name. The
following list explains these suffixes -
 K – Unlocked, ready to be overclocked.
 S – Performance optimized. Low power consumption.
 T – Power optimized. Ultra low power consumption.
 M – Mobile processors for mobile devices.
 Q – Quad core processors.
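The suffix list above can be treated as a simple lookup table. The following Python sketch reads the trailing letters of a model name and maps each one to the meaning given in the list; the helper name `describe_model` and the parsing rule (letters that follow the final digits) are assumptions made for illustration, not an Intel naming specification.

```python
import re

# Suffix meanings exactly as listed above.
SUFFIX_MEANINGS = {
    "K": "Unlocked, ready to be overclocked",
    "S": "Performance optimized, low power consumption",
    "T": "Power optimized, ultra low power consumption",
    "M": "Mobile processor for mobile devices",
    "Q": "Quad core processor",
}


def describe_model(model):
    """Return the meaning of each suffix letter at the end of a model name."""
    match = re.search(r"\d+([A-Z]*)$", model.strip())
    if match is None:
        return []
    return [SUFFIX_MEANINGS.get(letter, "unknown suffix") for letter in match.group(1)]


print(describe_model("Core i7-3770K"))
print(describe_model("Core i5-3450S"))
```

A model name with no trailing letters yields an empty list, and letters outside the table are reported as unknown rather than guessed.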
Intel Ivy Bridge Microarchitecture(Contd.)
Features present in Ivy Bridge:
 HD graphics – Ivy Bridge processors have a GPU built in.
The GPU supports DirectX 11 (Sandy Bridge supports version 10.1) and
OpenGL 3.1 (Sandy Bridge supports version 3.0). Ivy Bridge processors
have the Intel HD 4000 or HD 2500 GPU. This means that you do not
need an add-on graphics card.
 Quick Sync Video – This feature, first introduced with Sandy Bridge,
uses dedicated media processing hardware to make video creation
and conversion faster and easier, whether you want to create DVDs,
create, convert and edit 2D/3D videos, or upload to your favorite social
networking sites.
 WiDi 3.0 – Wireless Display technology allows you to stream media
content to your Wi-Fi connected display devices. You can
share a 1080p 60 FPS video using WiDi.
 Turbo Boost Technology 2.0 – Using Turbo Boost technology, you can
make your Ivy Bridge processors run faster than their base frequency. For
example, a 3.5 GHz Core i7 can be made to run at 3.9 GHz for some time.
Core 2 Duo vs. Core i3
 The Core 2 Duo is Intel's veteran, covering a wide range of price and
performance sweet spots. It is now being replaced, however, by
Intel's rookie Core i3. So, is the Core i3 actually better than the Core
2 Duo, or can you hold off upgrading for a while longer?
 The Core 2 Duo has been the processor of choice in laptops for about
three years. Over those three years the average speeds of Core 2
Duo processors have advanced significantly and many of today's
Core 2 Duo laptops have speeds of around 2.2 GHz or faster. Core 2
Duo processors have also been the go-to for many less expensive
desktop systems, with speeds reaching over 3 GHz.
 However, there is a newcomer which is challenging the Core 2 Duo.
This is the Core i3. It is very similar to the Core 2 Duo in many
ways. Both are dual-core processors and most Core 2 Duos and Core
i3 have similar clock speeds. However, the processors are based on
different architectures.
 So, which one is better?
Core 2 Duo vs. Core i3(Contd.)
Architecture
 The Core 2 Duo processors are based on the Core 2 architecture.
The Core and Core 2 architectures were arguably Intel's most
successful architectures, as they replaced the Pentium 4
processors in desktop systems and made Intel competitive in that
space once again.
 The Core i3 is based on a new architecture called Nehalem. The
Nehalem architecture has numerous advantages over the Core 2
architecture. Nehalem is better constructed for quad-core
processors, has hyper-threading available, and can use a feature
called Turbo Boost which maximizes processor speed. However,
because the Core i3 is the low-end Nehalem variant, most of
these features are disabled or not relevant - the Core i3 is a dual
core processor and Turbo Boost is disabled, but hyper-threading
is enabled.
Core 2 Duo vs. Core i3(Contd.)
Processor Performance
 The Core i3 is the slowest variant of the Nehalem based processor. The
Core 2 Duo processors, however, don't have the same differentiation
between versions of the same architecture. The fastest Core 2 Duo
desktop processor has a speed of 3.33 GHz, while the fastest Core i3
desktop processor is clocked at 3.06 GHz.
 You might therefore expect that the Core 2 Duo would have the edge -
particularly when you consider that the Core 2 Duo costs almost three
times as much if you buy it individually - but in fact the Core i3 is faster,
and often by no small margin. The Core i3 is faster even in single-
threaded applications, but the performance gap really widens in multi-
threaded applications. This is because the Core i3 has hyper-threading,
which turns the two real cores into four virtual cores. Windows works with
the Core i3 as if it is a quad-core processor.
 These results remain true in the mobile space, as well. Core i3 processors
punch at least 500 MHz above their weight in single-thread applications,
and are virtually always faster in multi-threaded applications, no matter
the clock speeds of the Core 2 Duo and Core i3 processors you are
comparing.
Core 2 Duo vs. Core i3(Contd.)
Power Usage and Heat
 A look at the technical specifications of the Core i3 processors automatically puts
them into a negative light when it comes to power consumption. The desktop Core i3
parts are listed as having a 73 Watt TDP, while most Core 2 Duo desktop parts have a
65 Watt TDP. In laptops the Core i3 has a 35 watt TDP, while Core 2 Duo mobile
processors usually have a 25 Watt TDP.
 These differences pan out about how you'd expect them to when it comes to
absolute power consumption. The Core i3 processors do consume just slightly more
power than Core 2 Duo processors at load and at idle. We're talking a difference of
around 10 Watts on desktops and a few on laptops - nothing huge, but a difference
nonetheless.
 However, when it comes to power efficiency the answer becomes less clear. In order
for a processor to be power efficient, it needs to not only have low power
consumption but also the ability to complete tasks quickly. This lowers the overall
"task energy" because a faster processor will be done with a task before a slower
processor, and once done it will slip back into an idle state.
 When viewed from this perspective, the Core i3 is much more efficient than the Core
2 Duo on both the desktop and the laptop. This means that the Core i3 will probably
not use any more power than a Core 2 Duo - and may actually use less - unless your
usage patterns place a constant load on your processor.
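The "task energy" argument is simple arithmetic: energy is power multiplied by time, summed over the busy and idle phases. The Python sketch below works one example. The 65 W and 73 W figures are the desktop TDPs quoted above, used here as rough stand-ins for power under load (TDP is not a measured power draw); the 10 W idle power, the task durations, and the function name `task_energy_joules` are hypothetical values chosen only to illustrate the calculation.

```python
def task_energy_joules(active_watts, idle_watts, task_seconds, window_seconds):
    """Energy used over a fixed window: busy for task_seconds, idle afterwards."""
    if task_seconds > window_seconds:
        raise ValueError("task must finish inside the observation window")
    idle_seconds = window_seconds - task_seconds
    return active_watts * task_seconds + idle_watts * idle_seconds


WINDOW = 100  # seconds observed for both processors (hypothetical)

# Slower chip: lower power under load, but busy for the whole window.
slower = task_energy_joules(active_watts=65, idle_watts=10,
                            task_seconds=100, window_seconds=WINDOW)

# Faster chip: higher power under load, but done in 70 s and idle for 30 s.
faster = task_energy_joules(active_watts=73, idle_watts=10,
                            task_seconds=70, window_seconds=WINDOW)

print(slower, faster)  # 6500 5410
```

Under these assumed numbers the faster processor uses less energy for the same task despite its higher TDP. If the load were constant, so that neither chip ever idled, the higher-power part would use more.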
Various Core Processors Of Intel
Core i3 Series
 Intel's Core i3 processor line has always been a budget option. These
processors remain dual-core, unlike the rest of the Core line, which is
made up of quad core processors. Intel's Core i3 processors also have
many features restricted.
 The main feature that is withheld from the Core i3 processors is Turbo Boost,
the dynamic overclocking available on most Intel processors. This,
along with the dual-core design, accounts for most of the performance
difference between Core i3 processors and the i5 and i7 options.
 One feature that Core i3 has - and i5 doesn't - is hyper-threading. This is
Intel's logical-core duplication technology, which allows each physical core to
be used as two logical cores. As a result, Windows will display a
dual-core Core i3 processor as if it were a quad-core.
 Finally, Core i3 processors have their integrated graphics processor
restricted to a maximum clock speed of 1100 MHz, and all Core i3
processors have the 2000 series IGP, which is restricted to 6 execution
cores. This will result in slightly lower IGP performance overall, but the
difference is frankly inconsequential in many situations.
Various Core Processors Of Intel(Contd.)
Core i5 Series
 Intel used to split the Core i5 processor brand into two different
lines, one of which was dual-core and one of which was quad-
core.
 All Sandy Bridge Core i5 processors are quad-core processors;
they all have Turbo Boost, and they all lack Hyper-Threading.
Most of the Core i5 processors, apart from the K series (explained
later), use the same 2000 series IGP with a maximum clock speed
of 1100 MHz and six execution cores.
 In the i3 vs. i5 vs. i7 battle, the Core i5 processor is now
obviously the main-stream option no matter which product you
buy. The only substantial difference between the Core i5 options
is the clock speed, which ranges from 2.8 GHz to 3.3 GHz.
Obviously, the products with a quicker clock speed are more
expensive than those that are slower.
Various Core Processors Of Intel(Contd.)
Core i7 Series
 These processors are virtually identical to the Core i5. They have a 100
MHz higher base clock speed, which is inconsequential in most situations.
The real feature difference is the addition of hyper-threading on the Core
i7, which means that the processor will appear as an 8-core processor in
Windows. This improves threaded performance and can result in a
substantial boost if you're using a program that is able to take advantage
of 8 threads.
 Of course, most programs can't take advantage of 8 threads. Those that
can are usually meant for enterprise or advanced video editing
applications - 3D rendering programs, photo editing programs, and
scientific programs are categories of software frequently designed to use
8 threads. The average user is unlikely to see the full benefit of the hyper-
threading feature. In the Core i3 vs. i5 vs. i7 battle, the i7 has limited
appeal.
 The IGP on Core i7 processors can also reach a higher maximum clock
speed of 1350 MHz; however, this difference is largely
inconsequential when measuring real-world performance.
Various Core Processors Of Intel(Contd.)
The K series processor
 Late in the lifespan of Intel's previous Core i branded products,
Intel introduced the "K" series. These processors had unlocked
multipliers, making them easier to overclock.
 Intel has kept this line of products alive with the new Sandy
Bridge architecture by introducing a K series Core i5 and i7
processor. As before, these processors have unlocked multipliers.
However, they also have a new feature - better integrated
graphics processors.
 This comes in the form of the 3000 series IGP, which has 12
execution cores instead of 6. The maximum clock speed remains
limited by the processor brand - the Core i5 K is limited to 1100
MHz, while the Core i7 K can reach 1350 MHz. The additional
execution cores can result in better performance in games,
although, to be honest, the IGP isn't remotely cut out for desktop
gaming.
Various Core Processors Of Intel(Contd.)
The IGP Features: Sandy Bridge
 The most important new feature added to Intel's Sandy Bridge
processors is the inclusion of an IGP on the processor. Intel did this
before with Core i3 and some Core i5 processors, but the IGP was still
separate from the processor itself - the IGP and CPU were placed in the
same package, but didn't physically work together.
 Now Intel has taken the IGP integration a step further and worked the IGP
into the CPU architecture. It even shares cache with the processor. What
this means, in practical terms, is that the on-board graphics of Intel's new
processors are superior to anything they've offered before. It also enables
Quick Sync, a video transcoding feature that provides blazing
performance when converting videos to a different format.
 Intel is offering two different types of IGPs on its processors. The 2000
has 6 execution units, while the 3000 has 12 execution units. Obviously,
the latter is quicker. Intel hasn't tied the IGP that you receive to the type
of processor you choose, however. Instead, it has tied the 3000 series
IGP to the "K" series processors. If you see a "K" at the end of the
processor's name, it has the 3000 series IGP. So far, Intel doesn't offer a
Core i3 K series processor, but that could change in the future.
Various Core Processors Of Intel(Contd.)
Laying Out the Chipset
 The staggered release of Intel's previous Core i3/i5/i7 products also
resulted in a staggered release of processor sockets and their related
chipsets. First came the LGA 1366 processor socket, which was tied to some
Core i7 processors. Then Intel confused things by releasing the LGA 1156
socket, which was made available on several different chipsets and
processor types. Choosing the right socket and chipset for a processor
wasn't easy.
 Intel has now clarified matters by releasing a single processor socket and
two processor chipsets alongside Sandy Bridge. The new socket is LGA
1155, and it isn't backwards compatible with anything Intel has previously
offered. The new chipsets are P67 and H67, with the P variant being
performance-oriented and the H variant targeted at general use. The main
difference is that P67 allows for processor overclocking, while H67 does
not. P67 also offers 16 additional PCIe lanes. Both Core i3 and i5
processors are compatible with either chipset.
Core i5 vs. Core i7
Core i5: The New Middle Class
 While the hardware has changed, Intel's branding scheme remains the
same, and Core i5 remains Intel's primary mid-range processor. It is
targeted at the heart of the market, with pricing that is not at budget
levels but still affordable, and performance that is extremely quick but not
the fastest Intel offers.
 Intel's high-end processor line is the Core i7. Many users who are looking
for a high-performance part end up considering both i5 and i7 products.
A Unified Socket and Chipset
 Perhaps the best news to come out of Intel's new line of i5 and i7
processors is the introduction of a single socket for all Sandy Bridge Core
i3/i5/i7 processors. For now, at least, the Sandy Bridge processors all
use the LGA 1155 socket. In case you're wondering, this socket is not
backwards compatible with previous LGA 1156 processors.
Core i5 vs. Core i7(Contd.)
Intel Turbo Boost
 Intel has made Turbo Boost a standard feature on all Core i5 and
i7 processors, from the least to most expensive. Intel has also
reduced the gap between the maximum turbo boost frequencies
on different processors. Previously, some of the older Core i7
processors actually had a much less efficient Turbo Boost feature
than some newer Core i5s.
 All of Intel's current Core i5 and i7 processors offer a boost of
between 300 and 400 MHz. The least expensive i5s offer the 300
MHz boost - for example, the Core i5 2300 has a base clock
speed of 2.8 GHz and a maximum Turbo Boost speed of 3.1 GHz.
The Intel Core i7 2600, on the other hand, offers a base clock
speed of 3.4 GHz and a maximum Turbo Boost of 3.8 GHz.
 Besides the clock speed difference, Turbo Boost is essentially the
same on the i5 and i7 processors.
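To put the two examples above side by side, this short Python sketch computes the boost in MHz and as a percentage of the base clock. The clock figures are the ones quoted in the text; the helper name `turbo_gain` is illustrative, and the result describes the maximum Turbo Boost clock only, not sustained performance.

```python
def turbo_gain(base_mhz, turbo_mhz):
    """Return (boost in MHz, boost as a percentage of the base clock)."""
    boost = turbo_mhz - base_mhz
    return boost, 100.0 * boost / base_mhz


for name, base, turbo in [("Core i5 2300", 2800, 3100),
                          ("Core i7 2600", 3400, 3800)]:
    boost, percent = turbo_gain(base, turbo)
    print(f"{name}: +{boost} MHz ({percent:.1f}% over base)")
```

Relative to their base clocks, the two boosts come out close to each other (roughly 11 to 12 percent), which is consistent with the point that Turbo Boost behaves essentially the same on both lines.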
Core i5 vs. Core i7(Contd.)
Difference in Hyper-Threading
 Another significant performance difference is how the Core i7 and
Core i5 products will be handling hyper-threading. Hyper-
threading is a technology used by Intel to simulate more cores
than actually exist on the processor. While Core i7 products have
all been quad-cores, they appear in Windows as having eight
cores. This further improves performance when using programs
that make good use of multi-threading.
 All Sandy Bridge Core i5 processors have hyper-threading
disabled, and all Sandy Bridge Core i7 processors have hyper-
threading enabled. This is a major feature difference of Core i5
vs. Core i7 processors, and it will give the Core i7 products an
advantage over Core i5 processors in some heavily multi-
threaded applications.
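The core counts that Windows reports follow directly from the rule described above: each physical core appears as two logical processors when hyper-threading is enabled, and as one otherwise. A minimal Python sketch, with the helper name `logical_processors` chosen for illustration:

```python
import os


def logical_processors(physical_cores, hyper_threading):
    """Logical processors the OS sees: two per core with hyper-threading, else one."""
    return physical_cores * (2 if hyper_threading else 1)


print(logical_processors(4, True))   # quad-core Core i7 -> 8
print(logical_processors(4, False))  # quad-core Core i5 -> 4
print(os.cpu_count())                # logical processors on this machine (may be None)
```

Note that `os.cpu_count()` reports logical processors, so on a hyper-threaded machine it is typically twice the number of physical cores.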
Core i5 vs. Core i7(Contd.)
The New IGP
All of Intel's Sandy Bridge processors make use of a new
IGP that is part of the processor architecture.
While far from a gaming-grade video solution, the
IGP offers reasonable performance without
consuming much power. It also enables features like Quick
Sync, which can transcode video extremely quickly.
There are two versions of this IGP: the 2000 and the 3000.
The only difference between the two is the number of
execution units. The 2000 has 6, while the 3000 has 12.
This doesn't mean the 3000 is twice as quick, but it does
mean the 3000 is about 50% quicker in most
benchmarks.
Core i5 vs. Core i7(Contd.)
i5 vs. i7: What it means to Consumers and Power Users
 Currently, the Core i5 processor brand makes up most of Intel's Sandy
Bridge processor line. The prices of these processors range from $177 to
$216 with base clock speeds between 2.8 GHz and 3.3 GHz. Intel only
offers two Core i7 products, the Core i7-2600 and Core i7-2600K, both of
which have a 3.4 GHz base clock speed. The i7-2600 has a price tag of
$294.
 As you may have guessed, paying about $80 more for the 100 MHz clock
speed increase between the fastest i5 and the i7 isn't a great deal. The
main reason to pay this additional cash for an i7 is hyper-threading, but
this advantage will only be evident if you frequently use programs that
can actually make use of 8 threads.
 For most users, the i5 is clearly the better deal. The i5-2500 makes the
most sense in my opinion, as it offers an extremely quick base clock
speed of 3.3 GHz for about $200. Of course, the value of this is subject to
change in the future as Intel fleshes out its product line with new models.
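The value comparison above comes down to a few lines of arithmetic. The Python sketch below uses the prices and clock speeds quoted in the text and assumes that the $216 top of the i5 price range belongs to the fastest (3.3 GHz) i5; that pairing is an assumption, and the variable names are illustrative.

```python
# Figures quoted in the text; pairing $216 with the 3.3 GHz i5 is an assumption.
I5_PRICE, I5_MHZ = 216, 3300
I7_PRICE, I7_MHZ = 294, 3400

extra_cost = I7_PRICE - I5_PRICE                        # dollars
price_gain_pct = 100.0 * extra_cost / I5_PRICE          # percent more money
clock_gain_pct = 100.0 * (I7_MHZ - I5_MHZ) / I5_MHZ     # percent more base clock

print(f"Extra cost: ${extra_cost}")
print(f"Price increase: {price_gain_pct:.1f}%")
print(f"Base clock increase: {clock_gain_pct:.1f}%")
```

On clock speed alone the i7 costs roughly a third more for about 3% more base frequency, which is why hyper-threading has to carry the case for the upgrade.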
Hyper Threading
 Hyper-Threading Technology brings the concept of simultaneous multi-threading to
the Intel Architecture. Hyper-Threading Technology makes a single physical
processor appear as two logical processors. The physical execution resources are
shared and the architecture state is duplicated for the two logical processors. From
a software or architecture perspective, this means operating systems and user
programs can schedule processes or threads to logical processors as they would on
multiple physical processors. From a microarchitecture perspective, this means
that instructions from both logical processors will persist and execute
simultaneously on shared execution resources.
 The amazing growth of the Internet and telecommunications is powered by ever-
faster systems demanding increasingly higher levels of processor performance. To
keep up with this demand we cannot rely entirely on traditional approaches to
processor design. Microarchitecture techniques used to achieve past processor
performance improvement–super-pipelining, branch prediction, super-scalar
execution, out-of-order execution, caches–have made microprocessors
increasingly more complex, have more transistors, and consume more power. In
fact, transistor counts and power are increasing at rates greater than processor
performance. Processor architects are therefore looking for ways to improve
performance at a greater rate than transistor counts and power dissipation. Intel’s
Hyper-Threading Technology is one solution.
Hyper Threading(Contd.)
 A look at today’s software trends reveals that server applications consist of
multiple threads or processes that can be executed in parallel. On-line
transaction processing and Web services have an abundance of software
threads that can be executed simultaneously for faster performance. Even
desktop applications are becoming increasingly parallel. Intel architects have
been trying to leverage this so-called thread-level parallelism (TLP) to gain a
better performance vs. transistor count and power ratio.
 In both the high-end and mid-range server markets, multiprocessors have
been commonly used to get more performance from the system. By adding
more processors, applications potentially get substantial performance
improvement by executing multiple threads on multiple processors at the
same time. These threads might be from the same application, from different
applications running simultaneously, from operating system services, or from
operating system threads doing background maintenance. Multiprocessor
systems have been used for many years, and high-end programmers are
familiar with the techniques to exploit multiprocessors for higher performance
levels.
Hyper Threading(Contd.)
 In recent years a number of other techniques to further exploit TLP have been
discussed and some products have been announced. One of these techniques is
chip multiprocessing (CMP), where two processors are put on a single die. The two
processors each have a full set of execution and architectural resources. The
processors may or may not share a large on-chip cache. CMP is largely orthogonal
to conventional multiprocessor systems, as you can have multiple CMP processors
in a multiprocessor configuration. Recently announced processors incorporate two
processors on each die. However, a CMP chip is significantly larger than the size of
a single-core chip and therefore more expensive to manufacture; moreover, it
does not begin to address the die size and power considerations.
 Another approach is to allow a single processor to execute multiple threads by
switching between them. Time-slice multithreading is where the processor
switches between software threads after a fixed time period. Time-slice
multithreading can result in wasted execution slots but can effectively minimize
the effects of long latencies to memory. Switch-on-event multithreading would
switch threads on long latency events such as cache misses. This approach can
work well for server applications that have large numbers of cache misses and
where the two threads are executing similar tasks. However, neither the time-slice
nor the switch-on-event multithreading technique achieves optimal
overlap of many sources of inefficient resource usage, such as branch
mispredictions, instruction dependencies, etc.
Hyper Threading(Contd.)
 Finally, there is simultaneous multi-threading, where multiple threads can
execute on a single processor without switching. The threads execute
simultaneously and make much better use of the resources. This approach
makes the most effective use of processor resources: it maximizes the
performance vs. transistor count and power consumption. Hyper-Threading
Technology brings the simultaneous multi-threading approach to the Intel
architecture. In this paper we discuss the architecture and the first
implementation of Hyper-Threading Technology on the Intel Xeon processor
family.
 Hyper-Threading Technology makes a single physical processor appear as
multiple logical processors. To do this, there is one copy of the architecture
state for each logical processor, and the logical processors share a single set
of physical execution resources. From a software or architecture perspective,
this means operating systems and user programs can schedule processes or
threads to logical processors as they would on conventional physical
processors in a multiprocessor system. From a microarchitecture perspective,
this means that instructions from logical processors will persist and execute
simultaneously on shared execution resources.
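As a small illustration of the software view described above, the following Python sketch asks the operating system how many logical processors it exposes. On a machine with Hyper-Threading enabled, this count normally includes every logical processor of each physical core. The helper name logical_processor_count is made up for this example.

```python
import os


def logical_processor_count() -> int:
    """Return the number of logical processors the OS reports (at least 1)."""
    # os.cpu_count() returns None when the count cannot be determined,
    # so fall back to 1 in that case.
    count = os.cpu_count()
    return count if count is not None else 1


if __name__ == "__main__":
    print("Logical processors visible to the OS:", logical_processor_count())
```

The count alone does not say whether two logical processors share a physical core; that mapping is operating-system specific and is not covered here.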
Hyper Threading(Contd.)
A few elements of a CPU need to be understood in order to
understand Hyper-Threading Technology:
 Registers - Registers are basically circuits that hold a single 64-bit value and are
the fastest form of storage available on a computer. The x86 architecture provides a
number of General Purpose Registers that are used by an executing program. In a
multicore chip, registers are unique to each core, so if you have a quad-core
processor, there will be 4 sets of general purpose registers.
 Cache – Cache is essentially a form of storage that falls between registers and RAM
in terms of speed. In modern processors there are generally three levels; in the
case of the i7, Levels 1 and 2 are private to each core and Level 3 is shared by all the cores on a chip.
The most important thing to know is that accessing the cache is slower than
accessing registers but still faster than accessing RAM.
 Execution Unit – This is the section of the CPU responsible for actually executing
the instructions. If you tell the computer to add 2 + 3, this is the part in which that
operation is performed.
 Front-End – This is a unit of the processor that is also known as Instruction
Fetch/Decode. Essentially, this unit grabs instructions from either cache or RAM
and decodes them into a form that the execution unit can understand.
 Branch Predictor - This unit attempts to predict branches in program code. If
there is an “if-then” statement in a program, it guesses which statements will be
executed and prefetches them for the front-end.
Hyper Threading(Contd.)
 In a core with HT, the registers are all duplicated. This means that one core
will have 2 sets of registers, and each set is what the operating system will see as a
“logical core,” since the contents of the registers represent the processor’s state.
We’ll call these sets A and B. Even though it appears as two cores, they will
still be sharing the same cache, branch predictor, front-end, and most
importantly, execution unit. Because they still share so many resources, it is
simplest to picture only one thread executing at once. The advantage of adding the HT
logic is that if a thread is executing and stalls for any reason, the other
thread can be switched in very fast while the cause of the stall in the first
thread is addressed. To better illustrate how this works, consider the
following:
 Set A is considered the current state of the processor.
 Thread A starts executing.
 Thread A needs a value from memory that isn’t in the cache.
 Memory access is very time consuming in CPU terms, so thread A is considered
stalled.
 Instead of wasting cycles waiting for the memory operation to complete, set B is
considered the current state.
 Thread B now executes until it stalls or until thread A can execute again (the memory
operation finishes).
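The steps above can be sketched as a tiny cycle-counting simulation. This is only a toy model under stated assumptions, not how the hardware is implemented: each thread is a string of operations, a cache miss stalls its thread for an assumed MISS_LATENCY of 4 cycles, and the function name run and the operation letters are invented for this example. With switch_on_stall=False the core behaves as if it had a single register set; with switch_on_stall=True it switches to the other register set whenever the current thread is stalled.

```python
MISS_LATENCY = 4  # assumed stall length in cycles (illustrative value only)


def run(threads, switch_on_stall):
    """Return the number of cycles needed to finish all threads.

    Each thread is a string of operations:
      'C' = one cycle of compute
      'M' = a load that misses the cache; it issues in one cycle and then
            stalls its thread for MISS_LATENCY further cycles.
    """
    n = len(threads)
    pc = [0] * n          # next operation index per register set
    ready_at = [0] * n    # first cycle at which each thread can run again
    current = 0           # register set currently treated as the active state
    cycle = 0

    def unfinished(i):
        return pc[i] < len(threads[i])

    while any(unfinished(i) for i in range(n)):
        if not unfinished(current):
            # The current thread is done; hand the core to the next one.
            current = next(i for i in range(n) if unfinished(i))
        if switch_on_stall and ready_at[current] > cycle:
            # Current thread is stalled: switch to a ready register set, if any.
            for i in range(n):
                if unfinished(i) and ready_at[i] <= cycle:
                    current = i
                    break
        if ready_at[current] <= cycle:
            op = threads[current][pc[current]]
            pc[current] += 1
            if op == "M":
                ready_at[current] = cycle + 1 + MISS_LATENCY
        # Otherwise nothing is ready and this cycle is wasted.
        cycle += 1
    # Account for a miss that is still outstanding when the last op issues.
    return max([cycle] + ready_at)


if __name__ == "__main__":
    work = ["CMCC", "CCMC"]  # thread A, thread B
    print("one register set :", run(work, switch_on_stall=False), "cycles")
    print("two register sets:", run(work, switch_on_stall=True), "cycles")
```

In this model the second thread fills cycles that would otherwise be spent waiting on memory, so the same work finishes in fewer cycles when switching is allowed.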
Hyper Threading(Contd.)
This process simply repeats continuously. Now,
there should be an obvious question: What can cause a
thread to stall? There are a few things; the simplest one to
understand is a cache miss. This is when the thread goes to
access a value that isn’t currently in the cache or any of the
registers. A branch misprediction can also cause a stall; it
occurs when the branch predictor prefetches the wrong
instructions into the cache.
There is another time Hyper-Threading kicks in, and that is if
one thread is using floating-point resources while the second is
using only integer resources. HT will allow them both to
execute simultaneously as long as they don't conflict.
Does Hyper-Threading actually help?
 Hyper-Threading has some interesting performance characteristics as a
result of its nature. HT will provide close to zero advantage if instruction
decoding or execution is the limiting factor in performance. In the
Nehalem architecture this is rarely the case. HT performs ideally when
there are a lot of cache misses or branch mispredictions, since the
execution unit would otherwise be idle waiting for these issues to be
resolved.
 Basically, certain applications will benefit more than others. A
more parallel workload, such as rendering or encoding video, will see a
nice benefit from HT since it’s likely both threads will be accessing the
same data, so they aren’t really competing for cache. Additionally, the
relatively small amount of local L2 cache in the i7 (256 KB) means there
will be a decent amount of memory access, giving the second thread time
to execute. Also, HT can result in a more responsive machine when not much
is going on, since threads will have very low execution time and it’s much
faster for the CPU to switch the active register set than to grab another
thread from RAM and load it into the registers.
Are there drawbacks?
As with most engineering decisions, there are drawbacks to
HT. One of the more obvious ones is that since HT keeps the
execution unit fed more efficiently, the unit spends less time
idle, which can result in higher operating temperatures. More
idle time would give the CPU a chance to cool down before
the next execution burst and would result in a lower maximum
temperature.
There are also programs that will either see no benefit
from HT or see decreased performance. Typically,
something whose performance is limited by cache, instruction
decode, the execution unit, or memory access will see little or
even negative improvement from HT (one of the reasons the i7 has
so much memory bandwidth).
Are there drawbacks?(Contd.)
 Running more than one multithreaded, computationally intensive task at
a time can also be a situation where HT doesn’t help performance. If a
processor core is running threads that come from different programs or that
operate on different data, all of the shared resources are effectively
halved (data cache, branch prediction, instruction cache). This means
branch mispredictions and cache misses become even more common,
possibly to the point where both threads are stalled. Depending on the
specific program, this can mean either lower performance (compared to
HT being disabled) or worse scaling than expected.
 The last drawback is probably the most important one: the benefit of HT
is inconsistent and dependent upon the specific operating environment
and programs being run. Because of the way it works, code that is
heavily optimized is likely to show less benefit, as it would be designed to
reduce branch mispredictions and cache misses. The inconsistency of HT
while multitasking won’t show up in benchmarks, since they’re designed
to test only a single task at a time.
Is it worth using Hyper-Threading Technology?
If you do a lot of 3D rendering or video
transcoding, then it probably is, since this is the
workload HT is best suited for. If you find that you
generally run multiple intensive tasks
simultaneously (like playing a game while encoding
a video or recompiling the Linux kernel in a VM),
then HT could have a negative impact on overall
performance (though not necessarily). One thing
that is certain is that its impact is exaggerated in
synthetic benchmarks, almost to the point where it
becomes misleading.
Virtualization
Server virtualization:
 Huge data centers contain a large number of servers. Workload,
user activity, and other factors decide which server is used and
when, yet companies still spend money, energy, and resources on
servers that are not used to their full capacity, keeping them
updated and protecting them from crashes and overheating.
Server virtualization consolidates those physical servers onto
fewer, more powerful and energy-efficient servers, each of which
runs virtual machines (VMs) that pretend to be multiple servers
on the network. The virtual server environment is transparent on
the network, so each user can interact with the virtual servers as
if they were still separate physical servers. The main advantage
is that only a few energy-efficient servers need to be looked
after instead of many, which saves resources, energy, and money.
Virtualization(Contd.)
 As shown in the figure, in the traditional architecture the hardware runs a
single operating system, and the different applications run on that
operating system.
 Because such a system is not energy efficient, a virtual environment was
developed in which a single machine can run several different operating
systems.
Virtual Machine
 A virtual machine monitor (VMM) is a host program that allows a
single computer to support multiple, identical execution
environments. All the users see their systems as self-contained
computers isolated from other users, even though every user is
served by the same machine. In this context, a virtual machine is an
operating system (OS) that is managed by an underlying control
program. For example, IBM's VM/ESA can control multiple virtual
machines on an IBM S/390 system.
 Server virtualization is done to reduce energy costs and to simplify
manageability and disaster management.
 In server virtualization, VMM software is added to allow the
hardware to run more than one OS.
 Major components of the server:
 Processor
 Chipset
 Network interface
Virtual Machine(Contd.)
 The individual technologies that make up Intel VT are built into these
components to boost performance, reliability, and
flexibility.
 Intel VT supports virtual machine architectures comprised of two
principal classes of software:
 Virtual-Machine Monitor (VMM): A VMM acts as a host and has full control
of the processor(s) and other platform hardware. VMM presents guest
software (see below) with an abstraction of a virtual processor and allows
it to execute directly on a logical processor. A VMM is able to retain
selective control of processor resources, physical memory, interrupt
management, and I/O.
 Guest Software: Each virtual machine is a guest software environment
that supports a stack consisting of an operating system (OS) and
application software. Each operates independently of other virtual
machines and uses the same interface to processor(s), memory, storage,
graphics, and I/O provided by a physical platform. The software stack
acts as if it were running on a platform with no VMM. Software executing
in a virtual machine must operate with reduced privilege so that the VMM
can retain control of platform resources.
Intel Virtualization Technology-Flex Migration
(Intel VT-X)
 Obviously, as IT adds new systems, it would be much more convenient and
efficient if an IT manager could simply add new resources to existing pools without
having to worry about differences in processor generation. For this reason, Intel
has developed Intel VT Flex Migration. When combined with support from
virtualization software, it ensures that the hypervisor can expose a consistent set
of instructions across all servers in the pool. Intel VT Flex Migration support starts
with Intel® Core™ microarchitecture and will be available in future generations of
the Intel Xeon processor family.
 With Intel VT Flex Migration, IT managers can easily add current and future Intel
Xeon processor-based systems to the same resource pool when using supporting
hypervisor software. This gives IT the power to choose the right server platform
when it is needed to optimize performance, cost, power, and reliability, without
having to worry about forward and backward compatibility across generations of
Intel Xeon processor-based servers starting with Intel Core microarchitecture and
extending into future generations of Intel Xeon processors. IT managers can pool
server resources using multiple generations of Intel Xeon processors whether they
are single, dual- or multi-processor based. This creates a dynamic virtual server
infrastructure that enables the use of live VM migration to improve usage models
such as failover, load balancing, disaster recovery, and server maintenance.
Intel VT-X(Contd.)
 Current Intel® Xeon® 5400 and 5200
processor series, 3300 and 3100 processor
series, as well as future Intel Xeon
processors, support Intel VT Flex Migration.
Using virtualization software that is enabled
to take advantage of this feature, Intel
servers based on these processors can be
pooled with earlier generations of Intel Core
microarchitecture processors. These include
Intel® Xeon® 7300, 5300, 5100, 3200, and
3000 series processors. Intel VT Flex Migration
is a major component of Intel VT-x. With
this technology, applications can be
migrated from one server to another, which
also helps in recovering from a disaster.
 With Intel VT Flex Migration, one can
migrate between two processor generations and
so react quickly to changing conditions,
making it much easier to keep servers up and
running.
Flex Priority
 Intel VT Flex Priority optimizes and accelerates interrupt virtualization by
improving virtual machine access to the Task Priority Register thereby
enabling efficient Symmetric Multi-Processing (SMP) configurations of 32-bit
guest operating systems. For users, this translates into more efficient
performance in virtual environments for their critical enterprise applications.
 Intel VT Flex Priority was designed to accelerate virtualization interrupt
handling thereby improving virtualization performance. Intel VT Flex Priority
accelerates interrupt handling by preventing unnecessary VMExits on
accesses to the Advanced Programmable Interrupt Controller.
 Intel VT Flex Priority improves virtualization performance by 35%.
 A processor is constantly bombarded with interrupts, and not all of them
are critical enough to be handled the moment they occur. Intel VT Flex
Priority acts like a receptionist who alerts the processor only when an
interrupt is critical, so the processor is interrupted less often and can
work more efficiently.
Virtualization for directed I/O
 A VMM must support virtualization of I/O requests from guest software. I/O
virtualization may be supported by a VMM through any of the following
models:
 Emulation: A VMM may expose a virtual device to guest software by emulating an existing
(legacy) I/O device. VMM emulates the functionality of the I/O device in software over whatever
physical devices are available on the physical platform. I/O virtualization through emulation
provides good compatibility (by allowing existing device drivers to run within a guest), but poses
limitations with performance and functionality.
 New Software Interfaces: This model is similar to I/O emulation, but instead of emulating
legacy devices, VMM software exposes a synthetic device interface to guest software. The
synthetic device interface is defined to be virtualization-friendly to enable efficient virtualization
compared to the overhead associated with I/O emulation. This model provides improved
performance over emulation, but has reduced compatibility (due to the need for specialized guest
software or drivers utilizing the new software interfaces).
 Assignment: A VMM may directly assign the physical I/O devices to VMs. In this model, the driver
for an assigned I/O device runs in the VM to which it is assigned and is allowed to interact directly
with the device hardware with minimal or no VMM involvement. Robust I/O assignment requires
additional hardware support to ensure the assigned device accesses are isolated and restricted to
resources owned by the assigned partition. The I/O assignment model may also be used to create
one or more I/O container partitions that support emulation or software interfaces for virtualizing
I/O requests from other guests. The I/O-container-based approach removes the need for running
the physical device drivers as part of VMM privileged software.
Virtualization for directed I/O(Contd.)
Models contd.:
 I/O Device Sharing: In this model, which is an extension to the I/O
assignment model, an I/O device supports multiple functional interfaces,
each of which may be independently assigned to a VM. The device
hardware itself is capable of accepting multiple I/O requests through any
of these functional interfaces and processing them utilizing the device's
hardware resources.
 Depending on the usage requirements, a VMM may support any of
the above models for I/O virtualization. For example, I/O emulation
may be best suited for virtualizing legacy devices. I/O assignment
may provide the best performance when hosting I/O-intensive
workloads in a guest. Using new software interfaces makes a trade-off
between compatibility and performance, and I/O device sharing
provides more virtual devices than the number of physical devices in
the platform.
Overview Of Intel Virtualization
A general requirement for all of the above I/O virtualization
models is the ability to isolate and restrict device accesses to
the resources owned by the partition managing the device.
Intel VT for Directed I/O provides VMM software with the
following capabilities:
 I/O device assignment: For flexibly assigning I/O devices to
VMs and extending the protection and isolation properties of VMs
for I/O operations.
 DMA remapping: For supporting independent address
translations for Direct Memory Accesses (DMA) from devices.
 Interrupt remapping: For supporting isolation and routing of
interrupts from devices and external interrupt controllers to
appropriate VMs.
 Reliability: For recording and reporting to system software DMA
and interrupt errors that may otherwise corrupt memory or
impact VM isolation.
DMA Remapping
 DMA remapping facilities have been implemented in a variety of contexts
in the past to facilitate different usages. In workstations and server
platforms, traditional I/O memory management units (IOMMUs) have
been implemented in PCI root bridges to efficiently support
scatter/gather operations or I/O devices with limited DMA addressability.
Other well-known examples of DMA remapping facilities include the AGP
Graphics Aperture Remapping Table (GART), the Translation and
Protection Table (TPT) defined in the Virtual Interface Architecture, which
subsequently influenced a similar capability in the InfiniBand
Architecture and Remote DMA (RDMA) over TCP/IP specifications. DMA
remapping facilities have also been explored in the context of NICs
designed for low latency cluster interconnects.
 Traditional IOMMUs typically support an aperture-based architecture. All
DMA requests that target a programmed aperture address range in the
system physical address space are translated irrespective of the source
of the request. While this is useful for handling legacy device limitations
(such as limited DMA addressability or scatter/gather capabilities), it
is not adequate for I/O virtualization usages that require full DMA
isolation.
DMA Remapping(Contd.)
 The VT-d architecture is a generalized IOMMU architecture that enables
system software to create multiple DMA protection domains. A protection
domain is abstractly defined as an isolated environment to which a subset of
the host physical memory is allocated. Depending on the software usage
model, a DMA protection domain may represent memory allocated to a VM,
or the DMA memory allocated by a guest-OS driver running in a VM or as
part of the VMM itself. The VT-d architecture enables system software to
assign one or more I/O devices to a protection domain. DMA isolation is
achieved by restricting access to a protection domain's physical memory from
I/O devices not assigned to it, through address-translation tables.
 The I/O devices assigned to a protection domain can be provided a view of
memory that may be different than the host view of physical memory. VT-d
hardware treats the address specified in a DMA request as a DMA virtual
address (DVA). Depending on the software usage model, a DVA may be the
Guest Physical Address (GPA) of the VM to which the I/O device is assigned,
or some software-abstracted virtual I/O address (similar to CPU linear
addresses). VT-d hardware transforms the address in a DMA request issued
by an I/O device to its corresponding Host Physical Address (HPA).
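The translation and isolation described above can be sketched in a few lines of Python. This is a minimal model under stated assumptions, not the actual VT-d table format: the class names ProtectionDomain and RemappingHardware, the single-level page table, and the 4 KB page size are all invented for illustration. Each device is assigned to one protection domain, and a DMA virtual address (DVA) is translated to a Host Physical Address (HPA) only through that domain's table.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


class ProtectionDomain:
    """An isolated environment that owns a subset of host physical memory."""

    def __init__(self, name):
        self.name = name
        self.page_table = {}  # DVA page number -> HPA page number

    def map_page(self, dva_page, hpa_page):
        self.page_table[dva_page] = hpa_page


class RemappingHardware:
    """Toy stand-in for the DMA remapping step (hypothetical, simplified)."""

    def __init__(self):
        self.device_domain = {}  # device id -> ProtectionDomain

    def assign(self, device_id, domain):
        self.device_domain[device_id] = domain

    def translate(self, device_id, dva):
        """Translate a DVA issued by device_id to an HPA, or refuse the access."""
        domain = self.device_domain.get(device_id)
        if domain is None:
            raise PermissionError(f"device {device_id!r} has no protection domain")
        page, offset = divmod(dva, PAGE_SIZE)
        if page not in domain.page_table:
            raise PermissionError(
                f"device {device_id!r}: DVA {dva:#x} is not mapped in {domain.name}"
            )
        return domain.page_table[page] * PAGE_SIZE + offset


if __name__ == "__main__":
    hw = RemappingHardware()
    domain1, domain2 = ProtectionDomain("domain-1"), ProtectionDomain("domain-2")
    domain1.map_page(0, 0x100)
    domain2.map_page(0, 0x200)
    hw.assign("device-1", domain1)
    hw.assign("device-2", domain2)
    # The same DVA resolves to different host memory for each device.
    print(hex(hw.translate("device-1", 0x10)))
    print(hex(hw.translate("device-2", 0x10)))
```

Because a device can only reach pages present in its own domain's table, a DMA request to any other address is rejected rather than reaching memory that belongs to another domain.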
DMA Remapping(Contd.)
 Figure 5 illustrates DMA address translation in a multi-domain usage. I/O
devices 1 and 2 are assigned to protection domains 1 and 2, respectively,
each with its own view of the DMA address space.
 Figure 6 illustrates a PC platform configuration with VT-d hardware
implemented in the north-bridge component.
Figure 5: DMA remapping
Figure 6: Platform configuration with VT-d
Intel Smart Memory Access
 Intel Smart Memory Access improves system performance by
optimizing the use of the available data bandwidth from the memory
subsystem and hiding the latency of memory accesses. The goal is
to ensure that data can be used as quickly as possible and is located
as close as possible to where it’s needed to minimize latency and
thus improve efficiency and speed.
 Intel Smart Memory Access includes a new capability called memory
disambiguation, which increases the efficiency of out-of-order
processing by providing the execution cores with the built-in
intelligence to speculatively load data for instructions that are about
to execute before all previous store instructions are executed.
 Intel Smart Memory Access also includes an instruction pointer-based
prefetcher that “prefetches” memory contents before they are
requested so they can be placed in cache and readily accessed when
needed. Increasing the number of loads that occur from cache
versus main memory reduces memory latency and improves
performance.
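To make the idea of an instruction pointer-based prefetcher more concrete, here is a minimal Python sketch. It assumes a simple stride-detection scheme, which the text above does not spell out: the class name IPStridePrefetcher, the table layout, and the rule "prefetch once the same stride is seen twice in a row" are illustrative assumptions, not a description of Intel's hardware.

```python
class IPStridePrefetcher:
    """Toy prefetcher with one entry per load instruction pointer (IP)."""

    def __init__(self):
        self.table = {}  # ip -> (last_address, last_stride)

    def access(self, ip, address):
        """Record a load and return the address to prefetch, or None."""
        prefetch = None
        if ip in self.table:
            last_address, last_stride = self.table[ip]
            stride = address - last_address
            # Prefetch only once the same nonzero stride repeats for this IP.
            if stride != 0 and stride == last_stride:
                prefetch = address + stride
            self.table[ip] = (address, stride)
        else:
            # First time this load is seen: no stride is known yet.
            self.table[ip] = (address, 0)
        return prefetch


if __name__ == "__main__":
    prefetcher = IPStridePrefetcher()
    # One load instruction (IP 0x400) walking an array in 64-byte steps.
    for addr in (1000, 1064, 1128, 1192):
        print(addr, "->", prefetcher.access(0x400, addr))
```

Keying the table by instruction pointer lets each load in a loop be tracked separately, so two loads walking different arrays do not disturb each other's pattern.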
Intel Smart Memory Access(Contd.)
How does Intel Smart Memory Access improve execution
throughput?
 The Intel Core microarchitecture memory cluster (level 1 data memory
subsystem) is highly out-of-order, non-blocking, and speculative.
It has a variety of methods of caching and buffering to help
achieve its performance. Included among these are Intel Smart
Memory Access and its two key features: memory disambiguation
and the instruction pointer-based (IP-based) prefetcher to the level 1
data cache.
Memory Disambiguation
 Since the Intel Pentium Pro, all Intel processors have featured a sophisticated
out-of-order memory engine allowing the CPU to execute non-dependent
instructions in any order. They had a significant shortcoming, however: these
processors were built around a conservative set of assumptions concerning
which memory accesses could proceed out of order. They would not move a
load in the execution order above a store having an unknown address (cases
where a prior store has not been executed yet). This was because if the store
and load end up sharing the same address, it results in an incorrect
instruction execution. Yet many loads are to locations unrelated to recently
executed stores. Prior hardware implementations created false dependencies
if they blocked such loads based on unknown store addresses. All these false
dependencies resulted in many lost opportunities for out-of-order execution.
 In designing Intel Core microarchitecture, Intel sought a way to eliminate
false dependencies using a technique known as memory disambiguation.
(“Disambiguation” is defined as the clarification that follows the removal of an
ambiguity.) Through memory disambiguation, Intel Core microarchitecture is
able to resolve many of the cases where the ambiguity of whether a
particular load and store share the same address thwarts out-of-order
execution.
Memory Disambiguation(Contd.)
 Memory disambiguation uses a predictor and accompanying
algorithms to eliminate these false dependencies that block a load
from being moved up and completed as soon as possible. The basic
objective is to be able to ignore unknown store-address blocking
conditions whenever a load operation dispatched from the
processor’s reservation station (RS) is predicted to not collide with a
store. This prediction is eventually verified by checking all RS-
dispatched store addresses for an address match against newer
loads that were predicted non-conflicting and already executed. If
there is an offending load already executed, the pipe is flushed and
execution restarted from that load.
 The memory disambiguation predictor is based on a hash table that
is indexed with a hashed version of the load’s EIP address bits.
(“EIP” is used here to represent the instruction pointer in all x86
modes.) Each predictor entry behaves as a saturating counter, with
reset.
Memory Disambiguation (Contd.)
The predictor has two write operations, both done during the
load’s retirement:
• Increment the entry if the load "behaved well," that is, if it met
unknown store addresses but none of them collided.
• Reset the entry to zero if the load "misbehaved," that is, if it
collided with at least one older store that was dispatched by the
RS after the load. The reset is done regardless of whether the
load was actually disambiguated.
The predictor takes a conservative approach. In order to allow
memory disambiguation, it requires that a number of
consecutive iterations of a load having the same EIP behave
well. This isn’t necessarily a guarantee of success, though. If
two loads with different EIPs clash in the same predictor
entry, their predictions will interact.
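The two write operations and the aliasing effect can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the table size, the saturation value, and the hash function are hypothetical (the slides give none of them), and the class and method names are invented for illustration.

```python
class DisambiguationPredictor:
    """Toy model: a hash-indexed table of saturating counters with reset."""

    def __init__(self, num_entries=256, saturation=15):
        # Both sizes are hypothetical; the source does not state them.
        self.num_entries = num_entries
        self.saturation = saturation
        self.counters = [0] * num_entries

    def index(self, eip):
        # Hypothetical hash: XOR-fold upper EIP bits into the lower bits.
        return (eip ^ (eip >> 8)) % self.num_entries

    def allows_disambiguation(self, eip):
        # Only a saturated counter marks the load as safe to move up.
        return self.counters[self.index(eip)] == self.saturation

    def update_at_retirement(self, eip, met_unknown_store, collided):
        i = self.index(eip)
        if collided:
            # The load "misbehaved": reset the entry to zero.
            self.counters[i] = 0
        elif met_unknown_store:
            # The load "behaved well": increment, but never past saturation.
            self.counters[i] = min(self.counters[i] + 1, self.saturation)


predictor = DisambiguationPredictor(num_entries=16, saturation=3)
load_eip = 0x401000
for _ in range(3):
    predictor.update_at_retirement(load_eip, met_unknown_store=True, collided=False)
print(predictor.allows_disambiguation(load_eip))
```

Because two different EIPs can hash to the same entry, a collision reported for one load also resets the counter that the other load relies on, which is the interaction described above.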
Memory Disambiguation (Contd.)
Predictor lookup
• The predictor is looked up when a load instruction is dispatched from the
RS to the memory pipe. If the respective counter is saturated, the load is
assumed to be safe and the result is written to the "disambiguation
allowed bit" in the load buffer. This means that if the load finds a
relevant unknown store address, the condition is ignored and the load is
allowed to go on. If the predictor is not saturated, the load will behave
as in prior implementations. In other words, if there is a relevant
unknown store address, the load will get blocked.
Load dispatch
• If the load meets an older unknown store address, it sets the
"update bit," indicating that the load should update the predictor. If the
prediction was "go," the load is dispatched and sets the "done" bit,
indicating that disambiguation was done. If the prediction was "no go,"
the load is conservatively blocked until all older store addresses are
resolved.
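The lookup and dispatch rules above amount to a small decision procedure. The following Python sketch models it with one record per load; the record layout and the function name are assumptions made for illustration, not the actual load buffer format.

```python
from dataclasses import dataclass


@dataclass
class LoadBufferEntry:
    eip: int
    disambiguation_allowed: bool = False  # written at predictor lookup
    update: bool = False   # load met an older unknown store address
    done: bool = False     # load went ahead despite the unknown address
    blocked: bool = False  # load waits for older store addresses


def dispatch_load(entry, counter_saturated, older_store_address_unknown):
    # Predictor lookup: a saturated counter means the prediction is "go".
    entry.disambiguation_allowed = counter_saturated
    if not older_store_address_unknown:
        return entry  # no unknown store address, nothing to disambiguate
    entry.update = True  # this load will update the predictor at retirement
    if entry.disambiguation_allowed:
        entry.done = True      # "go": ignore the unknown address
    else:
        entry.blocked = True   # "no go": behave as in prior implementations
    return entry


print(dispatch_load(LoadBufferEntry(eip=0x401000), True, True))
```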
Memory Disambiguation (Contd.)
Prediction verification
• To recover in case of a misprediction by the disambiguation predictor, the
addresses of all store operations dispatched from the RS to the Memory
Order Buffer must be compared with the addresses of all loads that are
younger than the store. If such a match is found, the respective "reset bit"
is set. When a load that was disambiguated retires with its reset bit set,
the pipe is restarted from that load to re-execute it and all its dependent
instructions correctly.
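A minimal Python sketch of this verification step follows. It assumes exact address equality as the match condition and uses an integer age to express program order; real hardware would have to consider overlapping byte ranges, and all names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class InFlightLoad:
    age: int              # larger value = younger in program order
    address: int
    executed: bool
    done: bool = False    # the load was disambiguated
    reset: bool = False   # an older store to the same address showed up later


def on_store_address_resolved(store_age, store_address, loads):
    # Compare the store address against every younger, already executed load.
    for load in loads:
        if load.age > store_age and load.executed and load.address == store_address:
            load.reset = True


def must_restart_at_retirement(load):
    # The pipe restarts only for a disambiguated load whose reset bit is set.
    return load.done and load.reset


loads = [InFlightLoad(age=5, address=0x100, executed=True, done=True)]
on_store_address_resolved(store_age=3, store_address=0x100, loads=loads)
print(must_restart_at_retirement(loads[0]))
```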
Watchdog mechanism
• Because disambiguation is based on prediction and mispredictions can
cause an execution pipe flush, it is important to build in safeguards to
avoid rare cases of performance loss. Consequently, Intel Core
microarchitecture includes a mechanism that temporarily disables memory
disambiguation in such cases. This mechanism constantly monitors
the success rate of the disambiguation predictor.
Advanced Smart Cache
• Intel Advanced Smart Cache is a multi-core optimized cache that
improves performance and efficiency by increasing the probability
that each execution core of a dual-core processor can access data
from a higher-performance, more-efficient cache subsystem.
• To accomplish this, Intel Core microarchitecture shares the Level 2
(L2) cache between the cores. This better optimizes cache resources
by storing data in one place that each core can access. By sharing the
L2 cache between the cores, Intel Advanced Smart Cache allows each
core to dynamically use up to 100 percent of the available L2 cache.
Threads can then dynamically use the required cache capacity.
• As an extreme example, if one of the cores is inactive, the other core
will have access to the full cache. Intel Advanced Smart Cache
enables very efficient sharing of data between threads running on
different cores. It also enables obtaining data from cache at higher
throughput rates for better performance. Intel Advanced Smart
Cache provides a peak transfer rate of 96 GB/sec (at 3 GHz
frequency).
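As a quick check on the quoted figure, the peak rate can be converted into bytes moved per clock cycle. The sketch below assumes 1 GB = 10^9 bytes; the slides do not say whether the figure is per core or for the whole cache, so the result is only the rate implied by the two quoted numbers.

```python
peak_rate_bytes_per_s = 96 * 10**9  # 96 GB/sec, assuming 1 GB = 10**9 bytes
clock_hz = 3 * 10**9                # 3 GHz

bytes_per_cycle = peak_rate_bytes_per_s // clock_hz
bits_per_cycle = bytes_per_cycle * 8

print(bytes_per_cycle)  # 32
print(bits_per_cycle)   # 256
```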
Wide Dynamic Execution
• Intel Wide Dynamic Execution significantly enhances dynamic execution,
enabling delivery of more instructions per clock cycle to improve execution
time and energy efficiency. Every execution core is 33 percent wider than
previous generations, allowing each core to fetch, decode, and retire up to
four full instructions simultaneously.
• Intel Wide Dynamic Execution also includes a new and innovative capability
called Macrofusion. Macrofusion combines certain common x86 instructions
into a single instruction that is executed as a single entity, increasing the
peak throughput of the engine to five instructions per clock. The wide
execution engine, when Macrofusion comes into play, is then capable of up to
six instructions per cycle of throughput for even greater energy-efficient
performance.
• Intel Core microarchitecture also uses extended microfusion, a technique that
"fuses" micro-ops derived from the same macro-op to reduce the number of
micro-ops that need to be executed. Studies have shown that micro-op fusion
can reduce the number of micro-ops handled by the out-of-order logic by
more than 10 percent.
• Intel Core microarchitecture "extends" the number of micro-ops that can be
fused internally within the processor.
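To illustrate what Macrofusion does to an instruction stream, the following Python sketch pairs a compare-type instruction with the conditional jump that immediately follows it. It is a simplification under stated assumptions: instructions are reduced to bare mnemonics, the set of jump mnemonics is illustrative rather than exhaustive, and real hardware places further restrictions on which pairs qualify.

```python
# Illustrative mnemonic sets; not a complete list of fusible instructions.
COMPARES = {"cmp", "test"}
CONDITIONAL_JUMPS = {"je", "jne", "jz", "jnz", "ja", "jae", "jb", "jbe",
                     "jg", "jge", "jl", "jle"}


def macrofuse(instructions):
    """Merge each compare that is directly followed by a conditional jump."""
    fused = []
    i = 0
    while i < len(instructions):
        op = instructions[i]
        nxt = instructions[i + 1] if i + 1 < len(instructions) else None
        if op in COMPARES and nxt in CONDITIONAL_JUMPS:
            fused.append(op + "+" + nxt)  # the pair is handled as one entity
            i += 2
        else:
            fused.append(op)
            i += 1
    return fused


loop_body = ["mov", "add", "cmp", "jne"]
print(macrofuse(loop_body))  # ['mov', 'add', 'cmp+jne']
```

Four incoming instructions leave the decoder as three entities, which is how fusion lets the engine handle more instructions per clock than its nominal width.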
Wide Dynamic Execution (Contd.)
Intel Core microarchitecture also incorporates an updated ESP
(Extended Stack Pointer) Tracker. Stack tracking allows safe
early resolution of stack references by keeping track of the value
of the ESP register. About 25 percent of all loads are stack loads
and 95 percent of these loads may be resolved in the front end,
again contributing to greater energy efficiency [Bekerman].
Micro-op reduction resulting from micro-op fusion, Macrofusion,
ESP Tracker, and other techniques makes various resources in the
engine appear virtually deeper than their actual size and results
in executing a given amount of work with less toggling of
signals, two factors that provide more performance for the same
or less power.
Intel Core microarchitecture also provides deep out-of-order
buffers to allow for more instructions in flight, enabling more
out-of-order execution to better exploit instruction-level parallelism.
Advanced Digital Media Boost
• Intel Advanced Digital Media Boost helps achieve similar dramatic gains
in throughput for programs utilizing SSE instructions with 128-bit
operands. (SSE instructions enhance Intel architecture by enabling
programmers to develop algorithms that can mix packed single-precision
floating-point, double-precision floating-point, and integer data.)
• These throughput gains come from combining a 128-bit-wide internal
data path with Intel Wide Dynamic Execution and matching widths and
throughputs in the relevant caches. Intel Advanced Digital Media Boost
enables most 128-bit instructions to be dispatched at a throughput rate
of one per clock cycle, effectively doubling the speed of execution and
resulting in peak floating point performance of 24 GFlops (on each core,
single precision, at 3 GHz frequency).
• Intel Advanced Digital Media Boost is particularly useful when running
many important multimedia operations involving graphics, video, and
audio, and processing other rich data sets that use SSE, SSE2, and SSE3
instructions.
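The 24 GFlops figure can be reproduced with simple arithmetic. The sketch below rests on one assumption that the slides do not state: that a core can issue one packed single-precision add and one packed single-precision multiply in the same cycle, each operating on a full 128-bit operand.

```python
clock_ghz = 3                 # 3 GHz, as quoted above
lanes = 128 // 32             # four single-precision values per 128-bit operand
fp_instr_per_cycle = 2        # assumption: one packed add plus one packed multiply

peak_gflops = clock_ghz * lanes * fp_instr_per_cycle
print(peak_gflops)  # 24
```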
Intelligent Power Capability
• Intel Intelligent Power Capability is a set of capabilities for reducing
power consumption and device design requirements. This feature
manages the runtime power consumption of all the processor’s
execution cores. It includes an advanced power-gating capability
that allows for ultra fine-grained logic control, turning on
individual processor logic subsystems only if and when they are
needed.
• Additionally, many buses and arrays are split so that data required in
some modes of operation can be put in a low-power state when not
needed. In the past, implementing such power gating has been
challenging because of the power consumed in powering down and
ramping back up, as well as the need to maintain system
responsiveness when returning to full power [Wechsler].
• Through Intel Intelligent Power Capability, Intel has been able to
satisfy these concerns, ensuring significant power savings without
sacrificing responsiveness.
References
http://www.brighthub.com
http://mintywhite.com
http://www.flyertalk.com
http://www.overclock.net/a/hyperthreading-explained
http://download.intel.com/technology/computing/vptech/Intel(r)_VT_for_Direct_IO.pdf
http://software.intel.com/sites/default/files/m/3/4/d/6/3/18374-sma.pdf
http://www.youtube.com/watch?v=gqZrarZiHp8
http://www.youtube.com/watch?v=3fcI6G7Scqk
http://www.youtube.com/watch?v=V9AiN7oJaIM
http://www.youtube.com/watch?v=kkrqyEpINSQ
http://www.youtube.com/watch?v=y0Q40pBoIwA
Thank You

Features of Modern Intel Microprocessors

  • 1. Software & Services Group Developer Products Division Copyright© 2011, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners. Essential Performance Advanced Performance Distributed Performance Efficient Performance Features Of Modern Intel Microprocessors Prepared By: Krunal P Siddhapathak (10BEC097)
  • 2. Core and Multi-Core Processor. What is a Core? A standard processor has one core (single-core). A single-core processor runs one instruction stream at a time (it does use pipelines internally, which allow several instructions to be in flight together; however, they all come from that one stream). What is a Multi-Core Processor? A multi-core processor consists of two or more independent cores, each capable of executing its own instructions. A dual-core processor contains two cores, a quad-core processor contains four cores, and a hexa-core processor contains six cores.
  • 3. Need Of Multi-Core Processors. Multiple cores can be used to run two programs side by side: when an intensive program is running (AV scan, video conversion, CD ripping, etc.), another core remains free to run your browser or check your email. Multiple cores really shine when you’re using a program that can utilize more than one core (called parallelization) to improve the program’s efficiency and addressability. Programs such as graphics software and games can run multiple instructions at the same time and deliver faster, smoother results. If you use CPU-intensive software, multiple cores will likely provide a better computing experience. If you use your PC to check emails and watch the occasional video, you really don’t need a multi-core processor.
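To make the idea of parallelization concrete, here is a minimal Python sketch that splits one CPU-bound job into chunks and hands each chunk to a separate worker process, so that each can occupy its own core. The function names (`split_range`, `sum_squares`, `parallel_sum_squares`) and the workload are hypothetical and chosen only for illustration; they do not come from the slides.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def split_range(n, parts):
    """Split range(n) into `parts` contiguous (start, stop) chunks."""
    step, extra = divmod(n, parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks get one additional element each.
        stop = start + step + (1 if i < extra else 0)
        chunks.append((start, stop))
        start = stop
    return chunks


def sum_squares(bounds):
    """CPU-bound work for one chunk: sum of i*i for start <= i < stop."""
    start, stop = bounds
    return sum(i * i for i in range(start, stop))


def parallel_sum_squares(n, workers=None):
    """Run one chunk per worker process and combine the partial sums."""
    workers = workers or os.cpu_count() or 1
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum_squares, split_range(n, workers)))
```

In a script, `parallel_sum_squares(1_000_000)` would normally be called from inside an `if __name__ == "__main__":` guard. Any speedup depends on how many cores are actually free and on the cost of starting the worker processes, so small jobs may well run slower this way.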
  • 4. Core 2 Duo vs. Core i3 vs. Core i5

                           Core 2 Duo       Core i3      Core i5
        Number of Threads  Two              Four         Four
        Socket             775 (45/65nm)    1156 (nm)    1156 (nm)
        Compatible RAM     DDR2             DDR3         DDR3
        Turbo Boost        No               No           Yes
        Overclocking       No               Yes          No
  • 5. Do I need an i3, i5, or i7? As with all computer hardware, the type of processor you need depends on your needs, how long you want your computer to stay current, and your budget. If you: (1) browse the internet, check email, and play the occasional Flash game (like Farmville), get a single-core netbook or desktop; (2) do word processing, spreadsheets, etc., listen to music often, and watch movies, get an i3 processor (or any dual-core processor, e.g. a Core 2 Duo); (3) play the occasional game and are happy with lower resolution and lower quality graphics (this suggestion assumes the graphics processor on the pre-built PC is well matched to the processor), watch HD movies, etc., get an i5; (4) do graphic publishing, music creation, or programming (and compiling), watch HD movies, or like to play visually appealing games, get a quad-core i5 or i7; (5) like to have the very best hardware and play the most graphically intense games, get a quad-core or hexa-core Core i7 Extreme.
  • 6. Intel Sandy Bridge Microarchitecture. Many of the bottlenecks of previous designs have been dealt with in the Sandy Bridge. Instruction fetch and predecoding have been a serious bottleneck in Intel designs for many years. In the NetBurst architecture Intel tried to fix this problem by caching decoded µops, without much success. In the Sandy Bridge design, instructions are cached both before and after decoding. The limited size of the µop cache is therefore less problematic, and the µop cache appears to be very efficient. The limited number of register read ports has been a serious, and often neglected, bottleneck since the old Pentium Pro. This bottleneck has now finally been removed in the Sandy Bridge. Previous Intel processors had only one memory read port where AMD processors have two. This was a bottleneck in many math applications. The Sandy Bridge has two read ports, which removes this bottleneck. The branch prediction has been improved by having bigger buffers and a shorter misprediction penalty, but it has no loop predictor, and mispredictions are still quite common. The new AVX instruction set is an important improvement. The throughput of floating point addition and multiplication is doubled when the new 256-bit YMM registers are used. The new non-destructive three-operand instructions are quite convenient for reducing register pressure and avoiding register move instructions. There is, however, a serious performance penalty for mixing vector instructions with and without the VEX prefix. This penalty is easily avoided if the programming guidelines are followed, but I suspect that it will be a very common programming error in the future to inadvertently mix VEX and non-VEX instructions, and such errors will be difficult to detect.
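The doubling of floating point throughput with the 256-bit YMM registers follows from simple lane arithmetic: a register twice as wide holds twice as many values per instruction. The short Python sketch below works this out. The 128-bit XMM width used for comparison is the width of the older SSE registers and is not stated on the slide; `simd_lanes` is a hypothetical helper name.

```python
XMM_BITS = 128  # SSE register width (assumption: not stated on the slide)
YMM_BITS = 256  # AVX register width, as given in the slide


def simd_lanes(register_bits, element_bits):
    """Number of elements of a given width that fit in one vector register."""
    return register_bits // element_bits


for name, element_bits in (("single precision", 32), ("double precision", 64)):
    xmm = simd_lanes(XMM_BITS, element_bits)
    ymm = simd_lanes(YMM_BITS, element_bits)
    print(f"{name}: {xmm} lanes per XMM register, {ymm} lanes per YMM register")
```

This only counts lanes per instruction. Whether real code sees the full factor of two also depends on memory bandwidth and on avoiding the VEX/non-VEX mixing penalty mentioned above.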
  • 7. Intel Sandy Bridge Microarchitecture (Contd.). Whenever the narrowest bottleneck is removed from a system, the next less narrow bottleneck will become the limiting factor. The new bottlenecks that require attention in the Sandy Bridge are the following. The µop cache: This cache can ideally hold up to 1536 µops. The effective utilization will be much less in most cases. The programmer should pay attention to make sure the most critical inner loops fit into the µop cache. Instruction fetch and decoding: The fetch/decode rate has not been improved over previous processors and is still a potential bottleneck for code that doesn’t fit into the µop cache. Data cache bank conflicts: The increased memory read bandwidth means that the frequency of cache conflicts will increase. Cache bank conflicts are almost unavoidable in programs that utilize the memory ports to their maximum capacity. Branch prediction: While the branch history buffer and branch target buffers are probably bigger than in previous designs, mispredictions are still quite common. Sharing of resources between threads: Many of the critical resources are shared between the two threads of a core when hyperthreading is on. It may be wise to turn off hyperthreading when multiple threads depend on the same execution resources.
  • 8. Intel Ivy Bridge Microarchitecture. Ivy Bridge is the codename for an Intel microprocessor using the Sandy Bridge microarchitecture. The name is also applied more broadly to the 22 nm die shrink of the microarchitecture based on tri-gate ("3D") transistors, which is also used in the future Ivy Bridge-EX and Ivy Bridge-EP microprocessors. Ivy Bridge processors are backwards-compatible with the Sandy Bridge platform, but might require a firmware update (vendor specific). Intel has released new 7-series Panther Point chipsets with integrated USB 3.0 to complement Ivy Bridge. Volume production of Ivy Bridge chips began in the third quarter of 2011. Quad-core and dual-core mobile models launched on April 29, 2012 and May 31, 2012 respectively. Core i3 desktop processors, as well as the first 22 nm Pentium, were launched and available in the first week of September 2012.
  • 9. Intel Ivy Bridge Microarchitecture (Contd.). How much faster are the Ivy Bridge processors? The base clock frequency of these processors ranges from 2.8 GHz (for the Core i5-3450S) to 3.5 GHz (for the Core i7-3770K). What different types of Ivy Bridge processors are available? There are many types of processors in the Ivy Bridge family. The type is indicated by a suffix on the CPU model name. The suffixes are: K: unlocked, ready to be overclocked; S: performance optimized, low power consumption; T: power optimized, ultra low power consumption; M: mobile processors for mobile devices; Q: quad-core processors.
  • 10. Intel Ivy Bridge Microarchitecture (Contd.). Features present in Ivy Bridge. HD graphics: Ivy Bridge processors have a built-in GPU. The GPU supports DirectX 11 (Sandy Bridge supports version 10.1) and OpenGL 3.1 (Sandy Bridge supports version 3.0). Ivy Bridge processors have the Intel HD4000/HD2500 GPUs. This means that you do not need an add-on graphics card. QuickSync Video: This feature, which debuted with Sandy Bridge, is also present in Intel's 3rd-generation processors. It uses dedicated media processing to make video creation and conversion faster and easier, whether you want to create DVDs, create, convert and edit 3D/2D videos, or upload to your favorite social networking sites. WiDi 3.0: Wireless Display technology allows you to stream media content to your Wi-Fi connected display devices. You can share a 1080p 60 FPS video using WiDi. Turbo Boost Technology 2.0: With Turbo Boost, an Ivy Bridge processor can run faster than its base frequency. For example, a 3.5 GHz Core i7 can run at 3.9 GHz for some time.
  • 11. Core 2 Duo vs. Core i3. The Core 2 Duo is Intel's veteran, covering a wide range of price and performance sweet spots. It is now being replaced, however, by Intel's rookie Core i3. So, is the Core i3 actually better than the Core 2 Duo, or can you hold off upgrading for a while longer? The Core 2 Duo has been the processor of choice in laptops for about three years. Over those three years the average speeds of Core 2 Duo processors have advanced significantly, and many of today's Core 2 Duo laptops have speeds of around 2.2 GHz or faster. Core 2 Duo processors have also been the go-to for many less expensive desktop systems, with speeds reaching over 3 GHz. However, there is a newcomer challenging the Core 2 Duo: the Core i3. It is very similar to the Core 2 Duo in many ways. Both are dual-core processors, and most Core 2 Duos and Core i3s have similar clock speeds. However, the processors are based on different architectures. So, which one is better?
  • 12. Core 2 Duo vs. Core i3 (Contd.). Architecture. The Core 2 Duo processors are based on the Core 2 architecture. The Core and Core 2 architectures were arguably Intel's most successful architectures, as they replaced the Pentium 4 processors in desktop systems and made Intel competitive in that space once again. The Core i3 is based on a new architecture called Nehalem. The Nehalem architecture has numerous advantages over the Core 2 architecture. Nehalem is better constructed for quad-core processors, has hyper-threading available, and can use a feature called Turbo Boost which maximizes processor speed. However, because the Core i3 is the low-end Nehalem variant, most of these features are disabled or not relevant: the Core i3 is a dual-core processor and Turbo Boost is disabled, but hyper-threading is enabled.
  • 13. Core 2 Duo vs. Core i3 (Contd.). Processor Performance. The Core i3 is the slowest variant of the Nehalem-based processors. The Core 2 Duo processors, however, don't have the same differentiation between versions of the same architecture. The fastest Core 2 Duo desktop processor has a speed of 3.33 GHz, while the fastest Core i3 desktop processor is clocked at 3.06 GHz. You might therefore expect that the Core 2 Duo would have the edge - particularly when you consider that the Core 2 Duo costs almost three times as much if you buy it individually - but in fact the Core i3 is faster, and often by no small margin. The Core i3 is faster even in single-threaded applications, but the performance gap really widens in multi-threaded applications. This is because the Core i3 has hyper-threading, which turns the two real cores into four virtual cores. Windows works with the Core i3 as if it were a quad-core processor. These results remain true in the mobile space as well. Core i3 processors punch at least 500 MHz above their weight in single-threaded applications, and are virtually always faster in multi-threaded applications, no matter the clock speeds of the Core 2 Duo and Core i3 processors you are comparing.
  • 14. Core 2 Duo vs. Core i3 (Contd.). Power Usage and Heat. A look at the technical specifications of the Core i3 processors automatically puts them in a negative light when it comes to power consumption. The desktop Core i3 parts are listed as having a 73 Watt TDP, while most Core 2 Duo desktop parts have a 65 Watt TDP. In laptops the Core i3 has a 35 Watt TDP, while Core 2 Duo mobile processors usually have a 25 Watt TDP. These differences pan out about how you'd expect them to when it comes to absolute power consumption. The Core i3 processors do consume slightly more power than Core 2 Duo processors at load and at idle. We're talking a difference of around 10 Watts on desktops and a few on laptops - nothing huge, but a difference nonetheless. However, when it comes to power efficiency the answer becomes less clear. In order for a processor to be power efficient, it needs not only low power consumption but also the ability to complete tasks quickly. This lowers the overall "task energy" because a faster processor will be done with a task before a slower processor, and once done it will slip back into an idle state. When viewed from this perspective, the Core i3 is much more efficient than the Core 2 Duo on both the desktop and the laptop. This means that the Core i3 will probably not use any more power than a Core 2 Duo - and may actually use less - unless your usage patterns place a constant load on your processor.
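The "task energy" argument can be checked with a little arithmetic: energy is power multiplied by time, so a chip that draws more power but finishes sooner and then idles can still use less energy over the same period. The Python sketch below illustrates this. All wattages and durations are hypothetical numbers chosen for illustration (only the roughly 10 Watt load difference echoes the slide); they are not measurements of any Core 2 Duo or Core i3 part.

```python
def task_energy_joules(load_watts, idle_watts, busy_seconds, window_seconds):
    """Energy used over a fixed window: busy at load power, then idle."""
    if busy_seconds > window_seconds:
        raise ValueError("the task must finish within the window")
    idle_seconds = window_seconds - busy_seconds
    return load_watts * busy_seconds + idle_watts * idle_seconds


WINDOW = 60  # seconds observed (hypothetical)

# Hypothetical slower chip: lower power, but busy for the whole window.
slower = task_energy_joules(load_watts=55, idle_watts=10,
                            busy_seconds=60, window_seconds=WINDOW)

# Hypothetical faster chip: 10 W more at load, done in 40 s, then idle.
faster = task_energy_joules(load_watts=65, idle_watts=12,
                            busy_seconds=40, window_seconds=WINDOW)

print(f"slower chip: {slower} J, faster chip: {faster} J")
```

With these made-up figures the faster chip uses 2840 J against 3300 J. If both chips were kept busy for the full window, the faster chip would use 65 W x 60 s = 3900 J, which is the constant-load exception the slide mentions.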
  • 15. Various Core Processors Of Intel. Core i3 Series. Intel's Core i3 processor line has always been a budget option. These processors remain dual-core, unlike the rest of the Core line, which is made up of quad-core processors. Intel's Core i3 processors also have many features restricted. The main feature that is kept from the Core i3 processors is Turbo Boost, the dynamic overclocking available on most Intel processors. This, along with the dual-core design, accounts for most of the performance difference between Core i3 processors and the i5 and i7 options. One feature that Core i3 has - and i5 doesn't - is hyper-threading. This is Intel's logical-core duplication technology, which allows each physical core to be used as two logical cores. The result is that Windows will display a dual-core Core i3 processor as if it were a quad-core. Finally, Core i3 processors have their integrated graphics processor restricted to a maximum clock speed of 1100 MHz, and all Core i3 processors have the 2000 series IGP, which is restricted to 6 execution cores. This will result in slightly lower IGP performance overall, but the difference is frankly inconsequential in many situations.
  • 16. Various Core Processors Of Intel (Contd.). Core i5 Series. Intel used to split the Core i5 processor brand into two different lines, one of which was dual-core and one of which was quad-core. All Sandy Bridge Core i5 processors are quad-core processors, they all have Turbo Boost, and they all lack Hyper-Threading. Most of the Core i5 processors, besides the K series (explained later), use the same 2000 series IGP with a maximum clock speed of 1100 MHz and six execution cores. In the i3 vs. i5 vs. i7 battle, the Core i5 processor is now the mainstream option no matter which product you buy. The only substantial difference between the Core i5 options is the clock speed, which ranges from 2.8 GHz to 3.3 GHz. The products with a higher clock speed are more expensive than the slower ones.
  • 17. Various Core Processors Of Intel (Contd.). Core i7 Series. These processors are virtually identical to the Core i5. They have a 100 MHz higher base clock speed, which is inconsequential in most situations. The real feature difference is the addition of hyper-threading on the Core i7, which means that the processor will appear as an 8-core processor in Windows. This improves threaded performance and can result in a substantial boost if you're using a program that is able to take advantage of 8 threads. Of course, most programs can't take advantage of 8 threads. Those that can are usually meant for enterprise or advanced video editing applications - 3D rendering programs, photo editing programs, and scientific programs are categories of software frequently designed to use 8 threads. The average user is unlikely to see the full benefit of the hyper-threading feature. In the Core i3 vs. i5 vs. i7 battle, the i7 has limited appeal. The IGP on Core i7 processors can also reach a higher maximum clock speed of 1350 MHz; however, this difference is largely inconsequential when measuring real-world performance.
  • 18. Various Core Processors Of Intel (Contd.). The K series processor. Late in the lifespan of Intel's previous Core i branded products, Intel introduced the "K" series. These processors had unlocked multipliers, making them easier to overclock. Intel has kept this line of products alive with the new Sandy Bridge architecture by introducing a K series Core i5 and i7 processor. As before, these processors have unlocked multipliers. However, they also have a new feature - better integrated graphics processors. This comes in the form of the 3000 series IGP, which has 12 execution cores instead of 6. The maximum clock speed remains limited by the processor brand - the Core i5 K is limited to 1100 MHz, while the Core i7 K can reach 1350 MHz. The additional execution cores can result in better performance in games, although to be honest, the IGP isn't remotely cut out for desktop gaming.
  • 19. Various Core Processors Of Intel (Contd.). The IGP Features: Sandy Bridge. The most important new feature added to Intel's Sandy Bridge processors is the inclusion of an IGP on the processor. Intel did this before with Core i3 and some Core i5 processors, but the IGP was still separate from the processor itself - the IGP and CPU were placed on the same piece of silicon, but didn't physically work together. Now Intel has taken the IGP integration a step further and worked the IGP into the CPU architecture. It even shares cache with the processor. What this means, in practical terms, is that the on-board graphics of Intel's new processors are superior to anything they've offered before. It also enables Quick Sync, a video transcoding feature that provides blazing performance when converting videos to a different format. Intel is offering two different types of IGPs on its processors. The 2000 has 6 execution units, while the 3000 has 12 execution units. Obviously, the latter is quicker. Intel hasn't tied the IGP that you receive to the type of processor you choose, however. Instead, it has tied the 3000 series IGP to the "K" series processors. If you see a "K" at the end of the processor's name, it has the 3000 series IGP. So far, Intel doesn't offer a Core i3 K series processor, but that could change in the future.
  • 20. Various Core Processors Of Intel (Contd.). Laying Out the Chipset. The staggered release of Intel's previous Core i3/i5/i7 products also resulted in a staggered release of processor sockets and their related chipsets. First came the LGA 1366 processor socket, which was tied to some Core i7 processors. Then Intel confused things by releasing the LGA 1156 socket, which was made available on several different chipsets and processor types. Choosing the right socket and chipset for a processor wasn't easy. Intel has now clarified matters by releasing a single processor socket and two processor chipsets alongside Sandy Bridge. The new socket is LGA 1155, and it isn't backwards compatible with anything Intel has previously offered. The new chipsets are P67 and H67, with the P variant being performance-oriented and the H variant targeted at general use. The main difference is that P67 allows for processor overclocking, while H67 does not. P67 also offers 16 additional PCIe lanes. Both Core i3 and i5 processors are compatible with either chipset.
  • 21. Core i5 vs. Core i7. Core i5: The New Middle Class. While the hardware has changed, Intel's branding scheme remains the same, and Core i5 remains Intel's primary mid-range processor. It is targeted at the heart of the market, with pricing that is not at budget levels but still affordable, and performance that is extremely quick but not the fastest Intel offers. Intel's high-end processor line is the Core i7. Many users who are looking for a high-performance part end up considering both i5 and i7 products. A Unified Socket and Chipset. Perhaps the best news to come out of Intel's new line of i5 and i7 processors is the introduction of a single socket for all Sandy Bridge Core i3/i5/i7 processors. For now, the Sandy Bridge processors all use the LGA 1155 socket. In case you're wondering, this socket is not backwards compatible with previous LGA 1156 processors.
  • 22. Core i5 vs. Core i7 (Contd.). Intel Turbo Boost. Intel has made Turbo Boost a standard feature on all Core i5 and i7 processors, from the least to the most expensive. Intel has also reduced the gap between the maximum Turbo Boost frequencies on different processors. Previously, some of the older Core i7 processors actually had a much less efficient Turbo Boost feature than some newer Core i5s. All of Intel's current Core i5 and i7 processors offer a boost of between 300 and 400 MHz. The least expensive i5s offer the 300 MHz boost - for example, the Core i5 2300 has a base clock speed of 2.8 GHz and a maximum Turbo Boost speed of 3.1 GHz. The Intel Core i7 2600, on the other hand, offers a base clock speed of 3.4 GHz and a maximum Turbo Boost of 3.8 GHz. Besides the clock speed difference, Turbo Boost is essentially the same on the i5 and i7 processors.
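The Turbo Boost figures quoted above can be verified with a few lines of arithmetic. The Python sketch below computes the boost headroom in MHz and as a percentage of the base clock, using only the base and turbo frequencies given in the slide; the helper names are hypothetical.

```python
# (base GHz, maximum Turbo Boost GHz), as quoted in the slide
CPUS = {
    "Core i5 2300": (2.8, 3.1),
    "Core i7 2600": (3.4, 3.8),
}


def turbo_headroom_mhz(base_ghz, turbo_ghz):
    """Difference between turbo and base clock, in MHz."""
    return round((turbo_ghz - base_ghz) * 1000)


def turbo_gain_percent(base_ghz, turbo_ghz):
    """Turbo headroom as a percentage of the base clock."""
    return round((turbo_ghz - base_ghz) / base_ghz * 100, 1)


for name, (base, turbo) in CPUS.items():
    print(f"{name}: +{turbo_headroom_mhz(base, turbo)} MHz "
          f"({turbo_gain_percent(base, turbo)}% over base)")
```

Both parts gain a little over a tenth of their base clock, which fits the slide's remark that Turbo Boost is essentially the same on the i5 and i7 apart from the absolute clock speeds.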
  • 23. Core i5 vs. Core i7 (Contd.). Difference in Hyper-Threading. Another significant performance difference is how the Core i7 and Core i5 products handle hyper-threading. Hyper-threading is a technology used by Intel to simulate more cores than actually exist on the processor. While Core i7 products have all been quad-cores, they appear in Windows as having eight cores. This further improves performance when using programs that make good use of multi-threading. All Sandy Bridge Core i5 processors have hyper-threading disabled, and all Sandy Bridge Core i7 processors have hyper-threading enabled. This is a major feature difference between Core i5 and Core i7 processors, and it will give the Core i7 products an advantage over Core i5 processors in some heavily multi-threaded applications.
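The core counts that the operating system reports for each brand follow directly from the physical core count and whether hyper-threading is enabled. The Python sketch below models this for the Sandy Bridge line-up as described in these slides (dual-core i3 with hyper-threading, quad-core i5 without, quad-core i7 with). The `CpuModel` class is a hypothetical illustration, not an Intel or operating system API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CpuModel:
    name: str
    physical_cores: int
    hyper_threading: bool

    @property
    def logical_processors(self):
        """Cores as seen by the OS: two per physical core with hyper-threading."""
        return self.physical_cores * (2 if self.hyper_threading else 1)


# Line-up as described in the slides
SANDY_BRIDGE = [
    CpuModel("Core i3", physical_cores=2, hyper_threading=True),
    CpuModel("Core i5", physical_cores=4, hyper_threading=False),
    CpuModel("Core i7", physical_cores=4, hyper_threading=True),
]

for cpu in SANDY_BRIDGE:
    print(f"{cpu.name}: {cpu.physical_cores} physical cores, "
          f"{cpu.logical_processors} logical processors")
```

The model shows why a Core i3 and a Core i5 both appear with four cores in Windows even though only the i5 has four physical cores; the two extra logical processors on the i3 share execution resources with the real ones.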
  • 24. Core i5 vs. Core i7 (Contd.). The New IGP. All of Intel's Sandy Bridge processors make use of a new integrated graphics processor (IGP) that is part of the processor architecture. While far from a gaming-grade video solution, the IGP offers reasonable performance without consuming much power. It also enables features like Quick Sync, which can transcode video extremely quickly. There are two versions of this IGP: the 2000 and the 3000. The only difference between the two is the number of execution units. The 2000 has 6, while the 3000 has 12. This doesn't mean the 3000 is twice as quick, but it does mean the 3000 is about 50% quicker in most benchmarks.
  • 25. Core i5 vs. Core i7 (Contd.). i5 vs. i7: What it means to Consumers and Power Users. Currently, the Core i5 processor brand makes up most of Intel's Sandy Bridge processor line. The prices of these processors range from $177 to $216, with base clock speeds between 2.8 GHz and 3.3 GHz. Intel only offers two Core i7 products, the Core i7-2600 and Core i7-2600K, both of which have a 3.4 GHz base clock speed. The i7-2600 has a price tag of $294. As you may have guessed, paying about $80 more for the 100 MHz clock speed increase between the fastest i5 and the i7 isn't a great deal. The main reason to pay this additional cash for an i7 is hyper-threading, but this advantage will only be evident if you frequently use programs that can actually make use of 8 threads. For most users, the i5 is clearly the better deal. The i5-2500 makes the most sense in my opinion, as it offers an extremely quick base clock speed of 3.3 GHz for about $200. Of course, the value of this is subject to change in the future as Intel fleshes out its product line with new models.
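The value argument above can be put into numbers. The Python sketch below compares the price premium of the i7-2600 with its base clock advantage. It assumes that the $216 top of the i5 price range belongs to the fastest (3.3 GHz) i5, which the slide implies with its "about $80 more" remark but does not state outright; the function name is hypothetical.

```python
I5_TOP_PRICE = 216   # assumption: price of the fastest (3.3 GHz) Core i5
I5_TOP_CLOCK = 3.3   # GHz, from the slide
I7_PRICE = 294       # Core i7-2600, from the slide
I7_CLOCK = 3.4       # GHz, from the slide


def percent_increase(old, new):
    """Relative increase from `old` to `new`, in percent, to one decimal."""
    return round((new - old) / old * 100, 1)


premium = I7_PRICE - I5_TOP_PRICE
print(f"price premium: ${premium} "
      f"({percent_increase(I5_TOP_PRICE, I7_PRICE)}% more)")
print(f"base clock gain: {percent_increase(I5_TOP_CLOCK, I7_CLOCK)}%")
```

Under that assumption the i7 costs about 36% more for about 3% more base clock, so the premium only pays off when hyper-threading is actually used.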
  • 26. Hyper Threading. Hyper-Threading Technology brings the concept of simultaneous multi-threading to the Intel Architecture. Hyper-Threading Technology makes a single physical processor appear as two logical processors. The physical execution resources are shared and the architecture state is duplicated for the two logical processors. From a software or architecture perspective, this means operating systems and user programs can schedule processes or threads to logical processors as they would on multiple physical processors. From a microarchitecture perspective, this means that instructions from both logical processors will persist and execute simultaneously on shared execution resources. The amazing growth of the Internet and telecommunications is powered by ever-faster systems demanding increasingly higher levels of processor performance. To keep up with this demand we cannot rely entirely on traditional approaches to processor design. Microarchitecture techniques used to achieve past processor performance improvement (super-pipelining, branch prediction, super-scalar execution, out-of-order execution, caches) have made microprocessors increasingly more complex, have more transistors, and consume more power. In fact, transistor counts and power are increasing at rates greater than processor performance. Processor architects are therefore looking for ways to improve performance at a greater rate than transistor counts and power dissipation. Intel’s Hyper-Threading Technology is one solution.
  • 27. Hyper Threading (Contd.). A look at today’s software trends reveals that server applications consist of multiple threads or processes that can be executed in parallel. On-line transaction processing and Web services have an abundance of software threads that can be executed simultaneously for faster performance. Even desktop applications are becoming increasingly parallel. Intel architects have been trying to leverage this so-called thread-level parallelism (TLP) to gain a better performance vs. transistor count and power ratio. In both the high-end and mid-range server markets, multiprocessors have been commonly used to get more performance from the system. By adding more processors, applications potentially get substantial performance improvement by executing multiple threads on multiple processors at the same time. These threads might be from the same application, from different applications running simultaneously, from operating system services, or from operating system threads doing background maintenance. Multiprocessor systems have been used for many years, and high-end programmers are familiar with the techniques to exploit multiprocessors for higher performance levels.
  • 28. Hyper-Threading (Contd.)  In recent years a number of other techniques to further exploit TLP have been discussed, and some products have been announced. One of these techniques is chip multiprocessing (CMP), where two processors are put on a single die. Each of the two processors has a full set of execution and architectural resources, and the processors may or may not share a large on-chip cache. CMP is largely orthogonal to conventional multiprocessor systems, as you can have multiple CMP processors in a multiprocessor configuration. Recently announced processors incorporate two processors on each die. However, a CMP chip is significantly larger than a single-core chip and therefore more expensive to manufacture; moreover, it does not begin to address the die size and power considerations.  Another approach is to allow a single processor to execute multiple threads by switching between them. In time-slice multithreading, the processor switches between software threads after a fixed time period. This can result in wasted execution slots but can effectively minimize the effects of long latencies to memory. Switch-on-event multithreading switches threads on long-latency events such as cache misses. This approach can work well for server applications that have large numbers of cache misses and where the two threads are executing similar tasks. However, neither time-slice nor switch-on-event multithreading achieves optimal overlap of the many sources of inefficient resource usage, such as branch mispredictions and instruction dependencies.
  • 29. Hyper-Threading (Contd.)  Finally, there is simultaneous multi-threading, where multiple threads can execute on a single processor without switching. The threads execute simultaneously and make much better use of the resources. This approach makes the most effective use of processor resources: it maximizes performance relative to transistor count and power consumption. Hyper-Threading Technology brings the simultaneous multi-threading approach to the Intel architecture. Its first implementation was on the Intel Xeon processor family.  Hyper-Threading Technology makes a single physical processor appear as multiple logical processors. To do this, there is one copy of the architecture state for each logical processor, and the logical processors share a single set of physical execution resources. From a software or architecture perspective, this means operating systems and user programs can schedule processes or threads to logical processors as they would on conventional physical processors in a multiprocessor system. From a microarchitecture perspective, this means that instructions from the logical processors persist and execute simultaneously on shared execution resources.
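Because the operating system schedules onto logical processors, the count it reports is normally the number of logical processors, not physical cores. A minimal sketch of how a program might observe this, using Python's standard `os.cpu_count()`; the helper name `logical_processor_count` is made up for this illustration:

```python
import os


def logical_processor_count() -> int:
    """Return the number of logical processors the OS exposes (at least 1)."""
    # os.cpu_count() reports logical processors, so a core with
    # Hyper-Threading enabled is typically counted twice.
    count = os.cpu_count()
    # cpu_count() may return None when the count cannot be determined.
    return count if count is not None else 1


if __name__ == "__main__":
    print("Logical processors visible to the OS:", logical_processor_count())
```

The standard library does not report physical cores; third-party packages such as `psutil` (`psutil.cpu_count(logical=False)`) can provide that figure for comparison.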
  • 30. Hyper-Threading (Contd.) A few elements of the CPU need to be understood in order to understand Hyper-Threading Technology:  Registers: Registers are circuits that each hold a single 64-bit value and are the fastest form of storage available on a computer. The x86 architecture provides a number of general-purpose registers that are used by an executing program. In a multi-core chip, registers are unique to each core, so a quad-core processor has four sets of general-purpose registers.  Cache: Cache is a form of storage that falls between registers and RAM in terms of speed. Modern processors generally have three levels; in the case of the i7, Levels 1 and 2 are private to each core and Level 3 is shared by all the cores on a chip. The most important thing to know is that accessing the cache is slower than accessing registers but still faster than accessing RAM.  Execution Unit: This is the section of the CPU responsible for actually executing instructions. If you tell the computer to add 2 + 3, this is where that operation is performed.  Front-End: This unit of the processor is also known as Instruction Fetch/Decode. It grabs instructions from either cache or RAM and decodes them into a form that the execution unit can understand.  Branch Predictor: This unit attempts to predict branches in program code. If there is an “if-then” statement in a program, it guesses which statements will be executed and prefetches them for the front-end.
  • 31. Hyper-Threading (Contd.)  In a core with HT, the registers are all duplicated. One core therefore has two sets of registers, and each set is what the operating system sees as a “logical core,” since the contents of the registers represent the processor’s state. We’ll call these sets A and B. Even though the core appears as two cores, both logical cores share the same cache, branch predictor, front-end, and, most importantly, execution unit. Because they share so many resources, in this simplified picture only one thread occupies the execution unit at a time. The advantage of adding the HT logic is that if the executing thread stalls for any reason, the other thread can be switched in very quickly while the cause of the stall in the first thread is resolved. To illustrate how this works, consider the following sequence:  Set A is considered the current state of the processor.  Thread A starts executing.  Thread A needs a value from memory that isn’t in the cache.  Memory access is very time consuming in CPU terms, so thread A is considered stalled.  Instead of wasting cycles waiting for the memory operation to complete, set B is considered the current state.  Thread B now executes until it stalls or until thread A can execute again (its memory operation finishes).
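The sequence above can be sketched as a toy cycle counter. This is only an illustration of the simplified switch-on-stall picture described on this slide, not a model of real Hyper-Threading hardware. The op encoding ("E" for one cycle of execution, "M" for a load that misses the cache), the function names, and the miss latency are all assumptions made for the sketch.

```python
def run_serial(threads, miss_latency):
    """Cycles needed when each thread runs to completion before the next starts."""
    cycles = 0
    for ops in threads:
        for op in ops:
            cycles += 1                 # one cycle to issue the op
            if op == "M":
                cycles += miss_latency  # execution unit idles during the miss
    return cycles


def run_switch_on_stall(threads, miss_latency):
    """Cycles needed when a stalled thread hands the execution unit to another."""
    n = len(threads)
    next_op = [0] * n    # index of the next op for each thread
    ready_at = [0] * n   # first cycle at which each thread may issue again
    current = 0          # register set currently treated as the processor state
    cycle = 0
    while any(next_op[i] < len(threads[i]) for i in range(n)):
        runnable = [i for i in range(n)
                    if next_op[i] < len(threads[i]) and ready_at[i] <= cycle]
        if runnable:
            if current not in runnable:
                current = runnable[0]   # switch register sets on a stall
            op = threads[current][next_op[current]]
            next_op[current] += 1
            if op == "M":
                ready_at[current] = cycle + 1 + miss_latency
        cycle += 1                      # an idle cycle if nothing was runnable
    # Outstanding misses must complete before the work counts as finished.
    return max([cycle] + ready_at)


if __name__ == "__main__":
    workload = ["EME", "EEE"]  # thread A misses once; thread B is pure compute
    print(run_serial(workload, miss_latency=4))           # prints 10
    print(run_switch_on_stall(workload, miss_latency=4))  # prints 7
```

In this toy workload thread B's three operations fit entirely inside thread A's memory stall, so the pair finishes in 7 cycles instead of 10. With two purely compute-bound threads (`["EEE", "EEE"]`) both functions return 6, since there are no stalls to hide.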
  • 32. Hyper-Threading (Contd.) This process repeats continuously. An obvious question follows: what can cause a thread to stall? There are a few causes; the simplest to understand is a cache miss, which occurs when the thread tries to access a value that isn’t currently in the cache or in any of the registers. A branch misprediction is another: the branch predictor guesses wrong, and the wrong instructions are prefetched into the cache. Hyper-Threading also helps when one thread is using floating-point resources while the second is using only integer resources. HT allows both to execute simultaneously as long as they don't conflict.
  • 33. Does Hyper-Threading actually help?  Hyper-Threading has some interesting performance characteristics as a result of its nature. HT provides close to zero advantage if instruction decoding or execution is the limiting factor in performance. In the Nehalem architecture this is rarely the case. HT performs best when there are a lot of cache misses or branch mispredictions, since the execution unit would otherwise sit idle waiting for these to be resolved.  Certain applications therefore benefit more than others. A highly parallel workload such as rendering or video encoding sees a good benefit from HT, since both threads are likely to be accessing the same data and so aren’t really competing for cache. Additionally, the relatively small local L2 cache in the i7 (256 KB) means there will be a fair amount of memory access, which gives the second thread time to execute. HT can also make a lightly loaded machine more responsive, since threads have very short execution times and it is much faster for the CPU to switch the active register set than to fetch another thread’s state from RAM and load it into the registers.
  • 34. Are there drawbacks? As with most engineering decisions, there are drawbacks to HT. One of the more obvious ones is that because HT keeps the execution unit fed more efficiently, the unit spends less time idle, which can result in higher operating temperatures. More idle time would give the CPU a chance to cool down before the next burst of execution and would result in a lower maximum temperature. There are also programs that see no benefit from HT, or even decreased performance. Typically, a program whose performance is limited by cache, instruction decode, the execution unit, or memory access will see little or no improvement from HT, or even a slowdown (one of the reasons the i7 has so much memory bandwidth).
  • 35. Are there drawbacks? (Contd.)  Running more than one multithreaded, computationally intensive task at a time can also be a situation where HT doesn’t help performance. If a processor core is running threads from different programs, or threads that operate on different data, all of the shared resources (data cache, branch prediction, instruction cache) are effectively halved. Branch mispredictions and cache misses then become even more common, possibly to the point where both threads are stalled. Depending on the specific program, this can mean either lower performance (compared to HT being disabled) or worse scaling than expected.  The last drawback is probably the most important one: the benefit of HT is inconsistent and depends on the specific operating environment and the programs being run. Because of the way HT works, heavily optimized code is likely to show less benefit, as it is designed to reduce branch mispredictions and cache misses. The inconsistency of HT while multitasking won’t show up in benchmarks, since they’re designed to test only a single task at a time.
  • 36. Is Hyper-Threading Technology worth using? If you do a lot of 3D rendering or video transcoding, then it probably is, since this is the workload HT is best suited for. If you generally run multiple intensive tasks simultaneously (such as playing a game while encoding a video or recompiling the Linux kernel in a VM), then HT could have a negative impact on overall performance, though not necessarily. One thing that is certain is that its impact is exaggerated in synthetic benchmarks, almost to the point of being misleading.
  • 37. Virtualization Server virtualization:  Large data centers contain many servers. Workload, user activity, and other factors determine which servers are in use at any given time, yet companies still spend money, energy, and resources keeping under-used servers updated and protecting them from crashes and overheating. Server virtualization consolidates those physical servers onto fewer, more powerful, and more energy-efficient servers, each of which hosts virtual machines (VMs) that appear on the network as multiple separate servers. The virtual server environment is transparent on the network, so each user interacts with the virtual servers as if they were still separate physical machines. The main advantage is that only a few energy-efficient servers have to be maintained instead of many, which saves resources, energy, and money.
  • 38. Virtualization (Contd.)  As shown in the figure, in the traditional architecture the hardware runs a single operating system, and the different applications all run on that operating system.  Because this arrangement is not energy efficient, a virtual environment was developed that allows several different operating systems to run on a single machine.
  • 39. Virtual Machine  A virtual machine monitor (VMM) is a host program that allows a single computer to support multiple, identical execution environments. All the users see their systems as self-contained computers isolated from other users, even though every user is served by the same machine. In this context, a virtual machine is an operating system (OS) that is managed by an underlying control program. For example, IBM's VM/ESA can control multiple virtual machines on an IBM S/390 system.  Server virtualization is used to reduce energy costs and to simplify management and disaster management.  In server virtualization, VMM software is added so that the hardware can run more than one OS.  Major components of the server:  Processor  Chipset  Network interface
  • 40. Virtual Machine (Contd.)  The individual technologies that make up Intel VT are built into these components to boost performance, reliability, and flexibility.  Intel VT supports virtual machine architectures comprising two principal classes of software:  Virtual-Machine Monitor (VMM): A VMM acts as a host and has full control of the processor(s) and other platform hardware. The VMM presents guest software (see below) with an abstraction of a virtual processor and allows it to execute directly on a logical processor. A VMM is able to retain selective control of processor resources, physical memory, interrupt management, and I/O.  Guest Software: Each virtual machine is a guest software environment that supports a stack consisting of an operating system (OS) and application software. Each operates independently of other virtual machines and uses the same interface to processor(s), memory, storage, graphics, and I/O provided by a physical platform. The software stack acts as if it were running on a platform with no VMM. Software executing in a virtual machine must operate with reduced privilege so that the VMM can retain control of platform resources.
  • 41. Intel Virtualization Technology Flex Migration (Intel VT-x)  As IT adds new systems, it is much more convenient and efficient if an IT manager can simply add new resources to existing pools without having to worry about differences in processor generation. For this reason, Intel has developed Intel VT Flex Migration. When combined with support from virtualization software, it ensures that the hypervisor can expose a consistent set of instructions across all servers in the pool. Intel VT Flex Migration support starts with Intel® Core™ microarchitecture and will be available in future generations of the Intel Xeon processor family.  With Intel VT Flex Migration, IT managers can easily add current and future Intel Xeon processor-based systems to the same resource pool when using supporting hypervisor software. This gives IT the power to choose the right server platform when it is needed to optimize performance, cost, power, and reliability, without having to worry about forward and backward compatibility across generations of Intel Xeon processor-based servers, starting with Intel Core microarchitecture and extending into future generations of Intel Xeon processors. IT managers can pool server resources using multiple generations of Intel Xeon processors, whether they are single-, dual-, or multi-processor based. This creates a dynamic virtual server infrastructure that enables the use of live VM migration to improve usage models such as failover, load balancing, disaster recovery, and server maintenance.
  • 42. Intel VT-x (Contd.)  The current Intel® Xeon® 5400 and 5200 processor series and 3300 and 3100 processor series, as well as future Intel Xeon processors, support Intel VT Flex Migration. Using virtualization software that is enabled to take advantage of this feature, Intel servers based on these processors can be pooled with servers based on earlier generations of Intel Core microarchitecture processors. These include the Intel® Xeon® 7300, 5300, 5100, 3200, and 3000 series processors. Intel VT Flex Migration is a major component of Intel VT-x. With this technology, applications can be migrated from one server to another, which also supports disaster recovery.  With Intel VT Flex Migration, workloads can be migrated between two generations of processors, so IT can react quickly to changing conditions, which makes it much easier to keep servers up and running.
  • 43. Flex Priority  Intel VT Flex Priority optimizes and accelerates interrupt virtualization by improving virtual machine access to the Task Priority Register, thereby enabling efficient Symmetric Multi-Processing (SMP) configurations of 32-bit guest operating systems. For users, this translates into more efficient performance in virtual environments for their critical enterprise applications.  Intel VT Flex Priority was designed to accelerate virtualization interrupt handling, thereby improving virtualization performance. It accelerates interrupt handling by preventing unnecessary VM exits on accesses to the Advanced Programmable Interrupt Controller.  Intel VT Flex Priority is reported to improve virtualization performance by 35%.  A processor is constantly bombarded with interrupts. Many of them are critical, but not every interrupt needs to be handled at the moment it occurs. Intel VT Flex Priority acts somewhat like a receptionist who alerts the processor only when an interrupt is critical, so the processor is interrupted less often and can work more efficiently.
  • 44. Virtualization for directed I/O  A VMM must support virtualization of I/O requests from guest software. I/O virtualization may be supported by a VMM through any of the following models:  Emulation: A VMM may expose a virtual device to guest software by emulating an existing (legacy) I/O device. The VMM emulates the functionality of the I/O device in software over whatever physical devices are available on the physical platform. I/O virtualization through emulation provides good compatibility (by allowing existing device drivers to run within a guest), but poses limitations on performance and functionality.  New Software Interfaces: This model is similar to I/O emulation, but instead of emulating legacy devices, VMM software exposes a synthetic device interface to guest software. The synthetic device interface is defined to be virtualization-friendly, which enables efficient virtualization compared to the overhead associated with I/O emulation. This model provides improved performance over emulation, but has reduced compatibility (due to the need for specialized guest software or drivers utilizing the new software interfaces).  Assignment: A VMM may directly assign the physical I/O devices to VMs. In this model, the driver for an assigned I/O device runs in the VM to which it is assigned and is allowed to interact directly with the device hardware with minimal or no VMM involvement. Robust I/O assignment requires additional hardware support to ensure the assigned device accesses are isolated and restricted to resources owned by the assigned partition. The I/O assignment model may also be used to create one or more I/O container partitions that support emulation or software interfaces for virtualizing I/O requests from other guests.  The I/O-container-based approach removes the need for running the physical device drivers as part of VMM privileged software.
  • 45. Virtualization for directed I/O (Contd.) Models contd.:  I/O Device Sharing: In this model, which is an extension of the I/O assignment model, an I/O device supports multiple functional interfaces, each of which may be independently assigned to a VM. The device hardware itself is capable of accepting multiple I/O requests through any of these functional interfaces and processing them using the device's hardware resources.  Depending on the usage requirements, a VMM may support any of the above models for I/O virtualization. For example, I/O emulation may be best suited for virtualizing legacy devices. I/O assignment may provide the best performance when hosting I/O-intensive workloads in a guest. Using new software interfaces makes a trade-off between compatibility and performance, and I/O device sharing provides more virtual devices than the number of physical devices in the platform.
  • 46. Overview Of Intel Virtualization A general requirement for all of the above I/O virtualization models is the ability to isolate and restrict device accesses to the resources owned by the partition managing the device. Intel VT for Directed I/O provides VMM software with the following capabilities:  I/O device assignment: For flexibly assigning I/O devices to VMs and extending the protection and isolation properties of VMs for I/O operations.  DMA remapping: For supporting independent address translations for Direct Memory Accesses (DMA) from devices.  Interrupt remapping: For supporting isolation and routing of interrupts from devices and external interrupt controllers to appropriate VMs.  Reliability: For recording and reporting to system software DMA and interrupt errors that may otherwise corrupt memory or impact VM isolation.
  • 47. DMA Remapping  DMA remapping facilities have been implemented in a variety of contexts in the past to facilitate different usages. In workstations and server platforms, traditional I/O memory management units (IOMMUs) have been implemented in PCI root bridges to efficiently support scatter/gather operations or I/O devices with limited DMA addressability. Other well-known examples of DMA remapping facilities include the AGP Graphics Aperture Remapping Table (GART) and the Translation and Protection Table (TPT) defined in the Virtual Interface Architecture, which subsequently influenced a similar capability in the InfiniBand Architecture and the Remote DMA (RDMA) over TCP/IP specifications. DMA remapping facilities have also been explored in the context of NICs designed for low-latency cluster interconnects.  Traditional IOMMUs typically support an aperture-based architecture. All DMA requests that target a programmed aperture address range in the system physical address space are translated irrespective of the source of the request. While this is useful for handling legacy device limitations (such as limited DMA addressability or scatter/gather capabilities), it is not adequate for I/O virtualization usages that require full DMA isolation.
  • 48. DMA Remapping (Contd.)  The VT-d architecture is a generalized IOMMU architecture that enables system software to create multiple DMA protection domains. A protection domain is abstractly defined as an isolated environment to which a subset of the host physical memory is allocated. Depending on the software usage model, a DMA protection domain may represent memory allocated to a VM, or the DMA memory allocated by a guest-OS driver running in a VM or as part of the VMM itself. The VT-d architecture enables system software to assign one or more I/O devices to a protection domain. DMA isolation is achieved by using address-translation tables to restrict access to a protection domain's physical memory from I/O devices not assigned to it.  The I/O devices assigned to a protection domain can be provided a view of memory that may be different from the host view of physical memory. VT-d hardware treats the address specified in a DMA request as a DMA virtual address (DVA). Depending on the software usage model, a DVA may be the Guest Physical Address (GPA) of the VM to which the I/O device is assigned, or some software-abstracted virtual I/O address (similar to CPU linear addresses). VT-d hardware transforms the address in a DMA request issued by an I/O device to its corresponding Host Physical Address (HPA).
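The idea of per-domain translation can be sketched in a few lines. This is a conceptual illustration only, not a model of the actual VT-d table formats: the class `DmaRemapper`, the exception `DmaFault`, the flat dictionary page tables, and the 4 KB page size are all assumptions made for the sketch.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


class DmaFault(Exception):
    """Raised when a DMA request falls outside the device's protection domain."""


class DmaRemapper:
    """Toy DMA remapping unit: one translation table per protection domain."""

    def __init__(self):
        self.device_domain = {}  # device id -> domain id
        self.page_tables = {}    # domain id -> {DVA page number: HPA page number}

    def create_domain(self, domain_id):
        self.page_tables[domain_id] = {}

    def assign_device(self, device_id, domain_id):
        if domain_id not in self.page_tables:
            raise KeyError(f"unknown domain {domain_id}")
        self.device_domain[device_id] = domain_id

    def map_page(self, domain_id, dva_page, hpa_page):
        self.page_tables[domain_id][dva_page] = hpa_page

    def translate(self, device_id, dva):
        """Translate a DMA virtual address to a host physical address."""
        domain = self.device_domain.get(device_id)
        if domain is None:
            raise DmaFault(f"device {device_id} is not assigned to any domain")
        page, offset = divmod(dva, PAGE_SIZE)
        hpa_page = self.page_tables[domain].get(page)
        if hpa_page is None:
            # The page is not part of this device's domain: block the access.
            raise DmaFault(f"device {device_id}: DVA {dva:#x} is not mapped")
        return hpa_page * PAGE_SIZE + offset


if __name__ == "__main__":
    iommu = DmaRemapper()
    iommu.create_domain(1)
    iommu.create_domain(2)
    iommu.assign_device("dev1", 1)
    iommu.assign_device("dev2", 2)
    # The same DVA page maps to different host pages in each domain.
    iommu.map_page(1, 0x10, 0x200)
    iommu.map_page(2, 0x10, 0x300)
    print(hex(iommu.translate("dev1", 0x10010)))  # prints 0x200010
    print(hex(iommu.translate("dev2", 0x10010)))  # prints 0x300010
```

Each device sees its own view of the DMA address space, and a request to a page that was never mapped into its domain raises `DmaFault` instead of reaching host memory.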
  • 49. DMA Remapping (Contd.)  Figure 5 illustrates DMA address translation in a multi-domain usage. I/O devices 1 and 2 are assigned to protection domains 1 and 2, respectively, each with its own view of the DMA address space.  Figure 6 illustrates a PC platform configuration with VT-d hardware implemented in the north-bridge component. Figure 5: DMA remapping Figure 6: Platform configuration with VT-d
  • 50. Intel Smart Memory Access  Intel Smart Memory Access improves system performance by optimizing the use of the available data bandwidth from the memory subsystem and hiding the latency of memory accesses. The goal is to ensure that data can be used as quickly as possible and is located as close as possible to where it’s needed, to minimize latency and thus improve efficiency and speed.  Intel Smart Memory Access includes a new capability called memory disambiguation, which increases the efficiency of out-of-order processing by providing the execution cores with the built-in intelligence to speculatively load data for instructions that are about to execute before all previous store instructions are executed.  Intel Smart Memory Access also includes an instruction pointer-based prefetcher that “prefetches” memory contents before they are requested so they can be placed in cache and readily accessed when needed. Increasing the number of loads that are served from cache rather than main memory reduces memory latency and improves performance.
  • 51. Intel Smart Memory Access (Contd.) How does Intel Smart Memory Access improve execution throughput?  The Intel Core microarchitecture memory cluster (the level 1 data memory subsystem) is highly out-of-order, non-blocking, and speculative. It uses a variety of caching and buffering methods to help achieve its performance. Among these are Intel Smart Memory Access and its two key features: memory disambiguation and the instruction pointer-based (IP-based) prefetcher to the level 1 data cache.
  • 52. Memory Disambiguation  Since the Intel Pentium Pro, all Intel processors have featured a sophisticated out-of-order memory engine that allows the CPU to execute non-dependent instructions in any order. However, they had a significant shortcoming: these processors were built around a conservative set of assumptions concerning which memory accesses could proceed out of order. They would not move a load in the execution order above a store having an unknown address (cases where a prior store has not been executed yet). This was because if the store and the load end up sharing the same address, executing the load first produces an incorrect result. Yet many loads are to locations unrelated to recently executed stores. Prior hardware implementations created false dependencies when they blocked such loads based on unknown store addresses. All these false dependencies resulted in many lost opportunities for out-of-order execution.  In designing Intel Core microarchitecture, Intel sought a way to eliminate false dependencies using a technique known as memory disambiguation. (“Disambiguation” is defined as the clarification that follows the removal of an ambiguity.) Through memory disambiguation, Intel Core microarchitecture is able to resolve many of the cases where the ambiguity of whether a particular load and store share the same address thwarts out-of-order execution.
  • 53. Memory Disambiguation (Contd.)  Memory disambiguation uses a predictor and accompanying algorithms to eliminate these false dependencies that block a load from being moved up and completed as soon as possible. The basic objective is to be able to ignore unknown store-address blocking conditions whenever a load operation dispatched from the processor’s reservation station (RS) is predicted not to collide with a store. This prediction is eventually verified by checking all RS-dispatched store addresses for an address match against newer loads that were predicted non-conflicting and have already executed. If an offending load has already executed, the pipe is flushed and execution is restarted from that load.  The memory disambiguation predictor is based on a hash table that is indexed with a hashed version of the load’s EIP address bits. (“EIP” is used here to represent the instruction pointer in all x86 modes.) Each predictor entry behaves as a saturating counter with reset.
  • 54. Memory Disambiguation (Contd.): The predictor has two write operations, both done during the load's retirement: (1) Increment the entry if the load "behaved well," that is, if it met unknown store addresses but none of them collided. (2) Reset the entry to zero if the load "misbehaved," that is, if it collided with at least one older store that was dispatched by the RS after the load. The reset is done regardless of whether the load was actually disambiguated. The predictor takes a conservative approach: in order to allow memory disambiguation, it requires that a number of consecutive iterations of a load having the same EIP behave well. This isn't necessarily a guarantee of success, though. If two loads with different EIPs clash in the same predictor entry, their predictions will interact.
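The increment and reset rules above can be sketched as a small table of saturating counters. This is only an illustrative model, not Intel's implementation: the table size, the saturation threshold, the hash function, and every name in the code are assumptions made for the example, since the slides give none of these details.

```python
class DisambiguationPredictor:
    """Toy model of a hash table of saturating counters with reset."""

    def __init__(self, entries=256, saturation=15):
        self.entries = entries        # table size (assumed value)
        self.saturation = saturation  # counter ceiling (assumed value)
        self.table = [0] * entries

    def _index(self, load_eip):
        # Hypothetical hash: fold upper address bits into the lower bits.
        return (load_eip ^ (load_eip >> 8)) % self.entries

    def allows(self, load_eip):
        # Disambiguation is allowed only when the counter is saturated.
        return self.table[self._index(load_eip)] == self.saturation

    def retire(self, load_eip, met_unknown_store, collided):
        # Both write operations happen when the load retires.
        i = self._index(load_eip)
        if collided:
            # "Misbehaved": reset, whether or not the load was disambiguated.
            self.table[i] = 0
        elif met_unknown_store:
            # "Behaved well": count up, stopping at the saturation value.
            self.table[i] = min(self.table[i] + 1, self.saturation)
```

Because two different EIPs can hash to the same entry, a collision seen by one load also resets the prediction for the other, which is the interaction the slide warns about.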
  • 55. Memory Disambiguation (Contd.): Predictor lookup: The predictor is looked up when a load instruction is dispatched from the RS to the memory pipe. If the respective counter is saturated, the load is assumed to be safe and the result is written to the "disambiguation allowed bit" in the load buffer. This means that if the load finds a relevant unknown store address, the condition is ignored and the load is allowed to go on. If the predictor is not saturated, the load will behave as in prior implementations. In other words, if there is a relevant unknown store address, the load will get blocked. Load dispatch: If the load meets an older unknown store address, it sets the "update bit," indicating that the load should update the predictor. If the prediction was "go," the load is dispatched and sets the "done" bit, indicating that disambiguation was done. If the prediction was "no go," the load is conservatively blocked until all older store addresses are resolved.
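The lookup and dispatch rules can be written as a small decision function. This is a sketch of the logic as described on the slide, not of the hardware: the `LoadDispatch` record and the `dispatch_load` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class LoadDispatch:
    dispatched: bool  # load proceeds down the memory pipe now
    update_bit: bool  # load should update the predictor at retirement
    done_bit: bool    # disambiguation was actually performed


def dispatch_load(counter_saturated, older_unknown_store):
    # The lookup result plays the role of the "disambiguation allowed bit".
    allowed = counter_saturated
    if not older_unknown_store:
        # Nothing ambiguous: the load goes ahead and the predictor is untouched.
        return LoadDispatch(dispatched=True, update_bit=False, done_bit=False)
    if allowed:
        # Prediction "go": ignore the unknown store address and proceed.
        return LoadDispatch(dispatched=True, update_bit=True, done_bit=True)
    # Prediction "no go": block until older store addresses are resolved.
    return LoadDispatch(dispatched=False, update_bit=True, done_bit=False)
```

Only the middle case is speculative, which is why it is the only one that sets the done bit and can later require a pipe flush.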
  • 56. Memory Disambiguation (Contd.): Prediction verification: To recover from a misprediction by the disambiguation predictor, the address of every store operation dispatched from the RS to the Memory Order Buffer must be compared with the addresses of all loads that are younger than the store. If a match is found, the respective "reset bit" is set. When a load that was disambiguated retires with its reset bit set, the pipe is restarted from that load so that it and all its dependent instructions are re-executed correctly. Watchdog mechanism: Because disambiguation is based on prediction and mispredictions can cause an execution pipe flush, it is important to build in safeguards against rare cases of performance loss. Consequently, Intel Core microarchitecture includes a mechanism that constantly monitors the success rate of the disambiguation predictor and temporarily disables memory disambiguation when needed to prevent such cases.
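The verification step can be sketched as an address check run for each dispatched store, followed by a retirement test. The model is deliberately simplified and its names are hypothetical: ages are program-order sequence numbers (larger means younger), and addresses are compared for exact equality, whereas real hardware would have to consider overlapping accesses.

```python
from dataclasses import dataclass


@dataclass
class Load:
    age: int                 # program-order sequence number (assumed model)
    address: int
    executed: bool
    done_bit: bool = False   # load was actually disambiguated
    reset_bit: bool = False  # a conflicting older store was found


def check_store(store_age, store_address, loads):
    # Compare the store's address with every younger load already executed.
    for load in loads:
        younger = load.age > store_age
        if load.executed and younger and load.address == store_address:
            load.reset_bit = True


def must_flush_at_retirement(load):
    # Restart the pipe only if the load went ahead speculatively
    # and a collision was detected afterwards.
    return load.done_bit and load.reset_bit
```

A load whose reset bit is set without the done bit still resets its predictor entry at retirement, but it does not need a flush because it never ran ahead of the store.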
  • 57. Advanced Smart Cache: Intel Advanced Smart Cache is a multi-core optimized cache that improves performance and efficiency by increasing the probability that each execution core of a dual-core processor can access data from a higher-performance, more efficient cache subsystem. To accomplish this, Intel Core microarchitecture shares the Level 2 (L2) cache between the cores. This makes better use of cache resources by storing data in one place that each core can access. Because the L2 cache is shared, each core can dynamically use up to 100 percent of the available L2 cache, and threads can dynamically use the cache capacity they require. As an extreme example, if one of the cores is inactive, the other core has access to the full cache. Intel Advanced Smart Cache enables very efficient sharing of data between threads running on different cores. It also enables data to be obtained from cache at higher throughput rates for better performance. Intel Advanced Smart Cache provides a peak transfer rate of 96 GB/sec (at 3 GHz frequency).
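To put the peak transfer rate in per-cycle terms, the quoted figures can be divided out. This is a quick arithmetic sketch that assumes "GB" means 10^9 bytes; how the bytes are split across cores or ports is not stated on the slide and is not modeled here.

```python
# Figures quoted on the slide.
peak_bytes_per_second = 96 * 10**9  # 96 GB/sec, assuming 1 GB = 10**9 bytes
clock_hz = 3 * 10**9                # 3 GHz

# Bytes the cache would have to deliver every clock cycle at peak.
bytes_per_cycle = peak_bytes_per_second // clock_hz
bits_per_cycle = bytes_per_cycle * 8

print(bytes_per_cycle, "bytes per cycle,", bits_per_cycle, "bits per cycle")
```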
  • 58. Wide Dynamic Execution: Intel Wide Dynamic Execution significantly enhances dynamic execution, enabling delivery of more instructions per clock cycle to improve execution time and energy efficiency. Every execution core is 33 percent wider than in previous generations, allowing each core to fetch, decode, and retire up to four full instructions simultaneously. Intel Wide Dynamic Execution also includes a new capability called Macrofusion. Macrofusion combines certain common x86 instructions into a single instruction that is executed as a single entity, increasing the peak throughput of the engine to five instructions per clock. The wide execution engine, when Macrofusion comes into play, is then capable of a throughput of up to six instructions per cycle for even greater energy-efficient performance. Intel Core microarchitecture also uses extended microfusion, a technique that "fuses" micro-ops derived from the same macro-op to reduce the number of micro-ops that need to be executed. Studies have shown that micro-op fusion can reduce the number of micro-ops handled by the out-of-order logic by more than 10 percent. Intel Core microarchitecture "extends" the number of micro-ops that can be fused internally within the processor.
  • 59. Wide Dynamic Execution (Contd.): Intel Core microarchitecture also incorporates an updated ESP (Extended Stack Pointer) Tracker. Stack tracking allows safe early resolution of stack references by keeping track of the value of the ESP register. About 25 percent of all loads are stack loads, and 95 percent of these loads may be resolved in the front end, again contributing to greater energy efficiency [Bekerman]. Micro-op reduction resulting from micro-op fusion, Macrofusion, the ESP Tracker, and other techniques makes various resources in the engine appear virtually deeper than their actual size and lets a given amount of work execute with less toggling of signals. Both factors provide more performance for the same or less power. Intel Core microarchitecture also provides deeper out-of-order buffers to allow for more instructions in flight, enabling more out-of-order execution to better exploit instruction-level parallelism.
  • 60. Advanced Digital Media Boost: Intel Advanced Digital Media Boost helps achieve similarly dramatic gains in throughput for programs that use SSE instructions with 128-bit operands. (SSE instructions enhance Intel architecture by enabling programmers to develop algorithms that mix packed single-precision and double-precision floating-point values and integers.) These throughput gains come from combining a 128-bit-wide internal data path with Intel Wide Dynamic Execution and matching widths and throughputs in the relevant caches. Intel Advanced Digital Media Boost enables most 128-bit instructions to be dispatched at a throughput rate of one per clock cycle, effectively doubling the speed of execution and resulting in peak floating-point performance of 24 GFlops (on each core, single precision, at 3 GHz frequency). Intel Advanced Digital Media Boost is particularly useful for multimedia operations involving graphics, video, and audio, and for processing other rich data sets that use SSE, SSE2, and SSE3 instructions.
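The 24 GFlops figure can be reproduced with simple arithmetic. The slide does not break the number down, so the sketch below shows one reading that is consistent with it: four single-precision lanes per 128-bit operand, and an assumed two packed floating-point instructions (for example one add and one multiply) completing per cycle.

```python
# From the slide: 128-bit operands, 3 GHz clock, single precision.
register_bits = 128
single_precision_bits = 32
clock_ghz = 3

# Assumption (not stated on the slide): two packed floating-point
# instructions, such as one add and one multiply, complete each cycle.
packed_fp_ops_per_cycle = 2

lanes = register_bits // single_precision_bits     # values per operand
flops_per_cycle = lanes * packed_fp_ops_per_cycle  # floating-point ops per cycle
peak_gflops = flops_per_cycle * clock_ghz          # per core

print(peak_gflops, "GFlops per core")
```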
  • 61. Intelligent Power Capability: Intel Intelligent Power Capability is a set of capabilities for reducing power consumption and device design requirements. This feature manages the runtime power consumption of all the processor's execution cores. It includes an advanced power-gating capability that allows ultra-fine-grained logic control, turning on individual processor logic subsystems only if and when they are needed. Additionally, many buses and arrays are split so that data required in some modes of operation can be put in a low-power state when not needed. In the past, implementing such power gating has been challenging because of the power consumed in powering down and ramping back up, as well as the need to maintain system responsiveness when returning to full power [Wechsler]. Through Intel Intelligent Power Capability, Intel has been able to satisfy these concerns, ensuring significant power savings without sacrificing responsiveness.
  • 62. References: http://www.brighthub.com http://mintywhite.com http://www.flyertalk.com http://www.overclock.net/a/hyperthreading-explained http://download.intel.com/technology/computing/vptech/Intel(r)_VT_for_Direct_IO.pdf http://software.intel.com/sites/default/files/m/3/4/d/6/3/18374-sma.pdf http://www.youtube.com/watch?v=gqZrarZiHp8 http://www.youtube.com/watch?v=3fcI6G7Scqk http://www.youtube.com/watch?v=V9AiN7oJaIM http://www.youtube.com/watch?v=kkrqyEpINSQ http://www.youtube.com/watch?v=y0Q40pBoIwA
  • 63. Thank You