Overview of MSIL


The .NET architecture addresses an important need: language interoperability. Instead of generating native code that is specific to one platform, compilers can generate code in MSIL (Microsoft Intermediate Language) that targets the Common Language Runtime (CLR), and so gain the benefits provided by .NET. Advanced programmers occasionally look at the MSIL code (using the Ildasm tool) when they are unsure what is happening under the hood. It is therefore useful for the C# programmer to understand the basics of MSIL. This beginner-level article gives an overview of MSIL and of debugging with the Ildasm tool.

System Requirements

The programming examples in this article use C# as the source language for generating MSIL code, so the reader is expected to have a basic understanding of C#. No prior exposure to MSIL is necessary. The reader is also assumed to know what a stack data structure is. Access to the Ildasm tool and the C# compiler is preferable.

Article Structure

The article has three main sections:

● An Overview of MSIL: Explains the basics of MSIL, the data types, the instruction types, and the way that the instructions are executed.
● Examining MSIL: Covers MSIL using simple example programs.
● Debugging Using the Ildasm Tool: Explains the use of the intermediate language disassembler (Ildasm) and the way it can be used for debugging.

.NET supports several high-level languages such as C#, VB.NET and Managed C++. MSIL is designed to accommodate a wide range of languages. In .NET, the unit of deployment is the PE (Portable Executable) file, a predefined binary standard (similar in role to the class files of Java). MSIL, along with metadata, is stored inside the PE files generated by the compiler. MSIL itself is a simple language and does not take much effort to understand. Metadata describes the types (their definitions, signatures, and so on) and is available at runtime.
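As a small illustration of metadata being available at runtime, the following sketch uses the reflection classes of the .NET Framework to look up one method and print what the metadata says about it. The class name MetadataPeek and the helper FindWriteLine are hypothetical names chosen for this example; reflection is only one way of reading metadata and is not otherwise covered in this article.

```csharp
using System;
using System.Reflection;

// Hypothetical example class; not part of the original article.
static class MetadataPeek
{
    // Looks up the Console.WriteLine overload that takes a single string.
    public static MethodInfo FindWriteLine()
    {
        return typeof(Console).GetMethod("WriteLine", new Type[] { typeof(string) });
    }

    static void Main()
    {
        MethodInfo method = FindWriteLine();

        // Each value below comes from the metadata stored with the type.
        Console.WriteLine(method.DeclaringType.FullName);   // System.Console
        Console.WriteLine(method.Name);                     // WriteLine
        Console.WriteLine(method.ReturnType.FullName);      // System.Void
        foreach (ParameterInfo parameter in method.GetParameters())
            Console.WriteLine(parameter.ParameterType.FullName);  // System.String
    }
}
```

The declaring type, name, parameter type and return type printed here are the same pieces of information that appear in a method signature in disassembled MSIL.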
An Overview of MSIL

MSIL is a CPU-independent, stack-based instruction set that can be efficiently converted to the native code of a specific platform. In this stack-based approach, the representation assumes the presence of a run-time evaluation stack, and code is generated with that stack in mind: expressions are evaluated on the stack and intermediate values are kept there. Evaluating the code directly on such a stack would be a form of interpretation. In practice, MSIL is not interpreted: a Just-In-Time (JIT) compiler translates the intermediate code to native code for the particular platform at runtime. Stack-based code is portable across platforms and is easy to verify.

MSIL:

● Supports object-oriented programming.
● Works in terms of the data types available in the .NET Framework, for example System.String and System.Int32.
● Has instructions that can be classified into types such as loading (ld*), storing (st*), method invocation, arithmetic operations, logical operations, control flow, memory allocation, and exception handling.

The following section covers basic instructions using examples.

Examining MSIL

Let us start with the following simple C# code and see how it is compiled to intermediate code.

    Console.WriteLine("hello world");

The MSIL code looks like this (obtained with the Ildasm tool, which is discussed later).

    // disassembled code using ildasm tool
    ldstr "hello world"
    call void [mscorlib]System.Console::WriteLine(string)

Now let us examine how it works:

● The ldstr (standing for 'load string') instruction pushes the string constant "hello world" onto the evaluation stack.
● The call instruction calls a method. Here, the call is made to the static WriteLine method of the Console class, which is in the System namespace and is available in mscorlib.dll. The WriteLine method takes a string as its argument and its return type is void.

It executes as follows:

● The ldstr instruction pushes a reference to the constant "hello world" onto the stack.
● The call instruction calls the WriteLine method, which takes a string argument; that argument is popped from the stack. Now the stack contains nothing. The WriteLine method then executes, prints the message "hello world" on the screen, and returns.

As you can see, understanding the MSIL code is far from difficult! If you have prior exposure to any assembly language, it will be very easy for you to learn MSIL. From this simple program, let us move on to a program illustrating branching and arithmetic instructions.
    // C# source code
    int i = 10;
    if (i != 20)
        i = i * 20;
    Console.WriteLine(i);

    // disassembled MSIL code using ildasm tool
    IL_0000: ldc.i4.s 10
    IL_0002: stloc.0
    IL_0003: ldloc.0
    IL_0004: ldc.i4.s 20
    IL_0006: beq.s IL_000d
    IL_0008: ldloc.0
    IL_0009: ldc.i4.s 20
    IL_000b: mul
    IL_000c: stloc.0
    IL_000d: ldloc.0
    IL_000e: call void [mscorlib]System.Console::WriteLine(int32)

Quite a lot of MSIL code has been generated for this short piece of C#, but it is simple once you understand what the instructions do. Each instruction is preceded by IL_xxxx: - these are labels, which make it possible to 'jump' from one part of the code to another. The instructions execute as follows:

● The ldc.i4.s (stands for 'load constant'.'four byte integer'.'single byte argument') instruction pushes the integer constant 10 onto the stack.
● The stloc.0 (stands for 'store in location'.'zeroth variable') instruction pops the integer constant 10 from the stack and stores it in variable number 0 (local variables are numbered from 0).
● The ldloc.0 (stands for 'load from location'.'zeroth variable') instruction loads the value of the variable at location zero (i.e. variable i in the source code) and pushes it onto the stack.
● The ldc.i4.s instruction pushes the integer constant 20 onto the stack.
● The beq.s (stands for 'branch if equal to'.'single byte argument') instruction pops two items from the stack and checks whether they are equal; if so, it transfers control to the instruction at the label IL_000d. The compiler has inverted the i != 20 test of the source code: the branch skips the multiplication when i equals 20. Here i is 10, so the branch is not taken.
● The ldloc.0 instruction pushes the value of variable i onto the stack.
● The ldc.i4.s instruction pushes the integer constant 20 onto the stack.
● The mul (stands for 'multiply') instruction pops two items from the stack, multiplies the values, and pushes the result back onto the stack. The result of the multiplication is now at the top of the stack.
● The stloc.0 instruction pops the top value from the stack (the result of the multiplication in this case) and stores it in variable i.
● The ldloc.0 instruction pushes the value of i onto the stack.
● The call (stands for 'call the method') instruction calls the WriteLine method that takes an integer as its argument. The argument is popped from the stack and WriteLine displays it on the screen.
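The walkthrough above can be replayed by hand on an explicit stack. The following is a minimal sketch, not a description of how the CLR executes the code (the JIT compiler produces native code instead). The class name EvalStackDemo and the method Run are hypothetical names chosen for this example, and the generic Stack<int> class assumes C# 2.0 or later.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical example class; not part of the original article.
static class EvalStackDemo
{
    // Replays the IL listing step by step and returns the value
    // that would be passed to Console.WriteLine(int32).
    public static int Run()
    {
        Stack<int> stack = new Stack<int>();
        int loc0;                        // local variable 0, i.e. i

        stack.Push(10);                  // IL_0000: ldc.i4.s 10
        loc0 = stack.Pop();              // IL_0002: stloc.0
        stack.Push(loc0);                // IL_0003: ldloc.0
        stack.Push(20);                  // IL_0004: ldc.i4.s 20

        int right = stack.Pop();         // IL_0006: beq.s pops both operands
        int left = stack.Pop();
        if (left != right)               // equal values would jump to IL_000d
        {
            stack.Push(loc0);            // IL_0008: ldloc.0
            stack.Push(20);              // IL_0009: ldc.i4.s 20
            int b = stack.Pop();         // IL_000b: mul pops two values
            int a = stack.Pop();
            stack.Push(a * b);           //          and pushes the product
            loc0 = stack.Pop();          // IL_000c: stloc.0
        }

        stack.Push(loc0);                // IL_000d: ldloc.0
        return stack.Pop();              // IL_000e: call pops its argument
    }

    static void Main()
    {
        Console.WriteLine(Run());        // prints 200
    }
}
```

Every push and pop in the sketch corresponds to one instruction in the listing, which makes it easy to check that the stack is empty again after the call.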
Debugging Using the Ildasm Tool

Microsoft's .NET SDK ships with an IL disassembler, Ildasm.exe (usually located in the directory Program Files\Microsoft.Net\FrameworkSDK\Bin). A disassembler loads your assemblies and shows the MSIL code along with other details of the assembly. This tool can be handy for debugging once you become proficient at reading MSIL code.

How can MSIL help in debugging? Bugs happen when there is a mismatch between what we expect the code to do and what the code actually does. If we can dig down to a lower level and see what the machine is actually doing with our code, it is easier to spot the mismatch. That is the idea behind using Ildasm for debugging.

Let us look at an example and see how we can debug the code. The following innocent-looking code doesn't work as you'd expect. It doesn't print "yes, o1 == o2", even though the code looks straightforward.

    int i = 10;
    object o1 = i, o2 = i;
    if (o1 == o2)
        Console.WriteLine("yes, o1 == o2");
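To follow along, the fragment above needs to be wrapped in a complete program before it can be compiled and disassembled. A minimal sketch, assuming a class named BoxingDemo saved in a file named BoxingDemo.cs (both names are chosen for this example only):

```csharp
using System;

// Hypothetical wrapper class; only the body of Main comes from the article.
static class BoxingDemo
{
    static void Main()
    {
        int i = 10;
        object o1 = i, o2 = i;
        if (o1 == o2)
            Console.WriteLine("yes, o1 == o2");
    }
}
```

Assuming the C# compiler and Ildasm are on the path, compiling with csc BoxingDemo.cs and then opening the result with ildasm BoxingDemo.exe should show the MSIL for Main. The exact listing depends on the compiler version and on whether debug information or optimization is enabled, so offsets and extra instructions such as nop may differ from the listings in this article.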
Now let us dig a little deeper and see what the machine is actually doing by looking at the MSIL code shown by the Ildasm tool:

    IL_0000: ldc.i4.s 10
    IL_0002: stloc.0
    IL_0003: ldloc.0
    IL_0004: box [mscorlib]System.Int32
    IL_0009: stloc.1
    IL_000a: ldloc.0
    IL_000b: box [mscorlib]System.Int32
    IL_0010: stloc.2
    IL_0011: ldloc.1
    IL_0012: ldloc.2
    IL_0013: bne.un.s IL_001f
    IL_0015: ldstr "yes, o1 == o2"
    IL_001a: call void [mscorlib]System.Console::WriteLine(string)
    IL_001f: ret

There lies the clue. Can you see that the boxing operation from int to object takes place twice? When a value type is converted to a reference type, a new object is allocated on the heap. Since boxing is done twice, o1 and o2 refer to two different objects in two different places on the heap. The bne.un.s instruction at IL_0013 compares the two references, not the boxed values, so it branches past the call to WriteLine.

We have found where things went wrong, and this means we can make a simple correction to our code:

    int i = 10;
    object o1 = i, o2 = o1;
    if (o1 == o2)
        Console.WriteLine("yes, o1 == o2");

Now when we look at the resulting MSIL code (again disassembled with the Ildasm tool), the boxing is done only once, and both references point to the same object. So the program now works as expected.

    IL_0000: ldc.i4.s 10
    IL_0002: stloc.0
    IL_0003: ldloc.0
    IL_0004: box [mscorlib]System.Int32
    IL_0009: stloc.1
    IL_000a: ldloc.1
    IL_000b: stloc.2
    IL_000c: ldloc.1
    IL_000d: ldloc.2
    IL_000e: bne.un.s IL_001a
    IL_0010: ldstr "yes, o1 == o2"
    IL_0015: call void [mscorlib]System.Console::WriteLine(string)
    IL_001a: ret

The example shown here is simple, but it shows how the tool can be used effectively for debugging code.

Article Review

In this article we have explained the basics of MSIL and, using this knowledge, looked at how the Ildasm tool can help you debug your code. This is only a beginner-level article, so interested readers are encouraged to look further into MSIL and the Ildasm tool.

All rights reserved. Copyright Jan 2004.