147. The Loop Stream Detector first identifies repeating instruction sequences (loops).
148. Once a loop is identified, the traditional branch prediction, fetch, and decode stages are temporarily switched off while the loop executes.
149. This saves the cycles that would otherwise be wasted in these pipeline stages re-processing the same repeated instructions. </li></ul>3/25/2011<br />AN ARCHITECTURE PERSPECTIVE<br />36<br />
150. <ul><li>Enhanced Branch Prediction:
151. New Second-Level Branch Target Buffer: Improves branch prediction in applications with large code footprints (e.g., database applications).
152. New Renamed Return Stack Buffer: Stores the return addresses associated with calls and returns; renaming keeps the buffer from being corrupted along mispredicted paths.
153. SSE 4.2:
154. Introduces seven new SSE 4.2 instructions, including four that optimize string and text processing.
155. STTNI (String and text new instructions):
156. Operate on 16 bytes at a time.
157. This boosts XML parsing speed and enables faster searching and pattern matching, lexing, tokenizing, and regular-expression evaluation.</li></ul>
159. Intelligent Power Technology<br /><ul><li>Integrated Power Gates:
160. Allows each core to be idled independently, taking it to near-zero power and reducing overall idle power.
161. Automated Low-Power States:
162. Automatically puts the processor and memory into the lowest available power states that still meet the requirements of the current workload. </li></ul>
164. So the hypervisor can pin a virtual machine to a specific processor and its dedicated memory.
165. Hardware-assisted page-table management:
166. Gives the guest OS more direct access to the hardware, reducing the compute-intensive software page-table translation performed by the hypervisor.
167. Directed I/O:
168. Speeds data movement and eliminates much of the performance overhead by giving designated virtual machines their own dedicated I/O devices, reducing the VMM's overhead in managing I/O traffic.
169. Virtualized Connectivity:
170. Integrates extensive hardware assists into the I/O devices themselves.
171. By performing routing to and from virtual machines in dedicated network silicon, it speeds delivery and reduces the load on the VMM and server processors.
172. Delivers up to twice the throughput of non-hardware-assisted devices.</li></ul>
173. Enhancements Over the Core Microarchitecture<br /><ul><li>Pipeline: 14 stages in Core, but 20 to 24 stages in Nehalem.
174. Branch Prediction: advanced RSB and L2 Branch Predictor.
175. Unified 2nd-Level TLB: a 512-entry L2 TLB, versus 256 entries in Core.
176. Macrofusion: now fuses compare-and-branch macro-ops in 64-bit mode as well; Core supported macrofusion only in 32-bit mode.
177. Loop Stream Detection: more efficient in Nehalem.
178. The Execution Engine and Out-of-Order Execution:
179. The Reorder Buffer has been made a third larger, up from 96 to 128 entries.
180. The Reservation Station (which schedules operations onto available execution units) has been given an extra four slots, allowing 36 entries.</li></ul>