First compiler written



Details how the first compiler was written

Published in: Education, Technology

  1. How was the first compiler written?

I heard about the chicken and the egg and bootstrapping. I have a few questions. What wrote the first compiler that converted something into binary instructions? Is assembly compiled or translated into binary instructions? ...I'd find it hard to believe they wrote a compiler in binary.

Assembly instructions are (generally) a direct mapping to opcodes, which are (multi-)byte values of machine code that can be directly interpreted by the processor. It is quite possible to write a program in opcodes directly by looking them up in a table (such as this one for the 6039 microprocessor, for example) that lists them with the matching assembly instructions, and by hand-determining memory addresses/offsets for things like jumps. The first programs were written in exactly this fashion: hand-written opcodes.

However, most of the time it's simpler to use an assembler to "compile" assembly code. An assembler does these opcode lookups automatically and also computes addresses/offsets for named jump labels, et cetera. The first assemblers were written by hand. Those assemblers could then be used to assemble more complicated assemblers, which could then be used to assemble compilers written for higher-level languages, and so on.

This process of iteratively writing the tools to simplify the creation of the next set of tools is called (as mentioned by David Rabinowitz in his answer) bootstrapping.
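The opcode lookup and label-offset bookkeeping described above can be sketched in a few lines. What follows is a minimal two-pass assembler in Python for a hypothetical 8-bit machine. The mnemonics (LOAD, ADD, JMP, HALT), their opcode values, and the one-byte operand format are all invented for illustration; they do not correspond to the 6039 or to any other real processor.

```python
# Toy two-pass assembler for a HYPOTHETICAL 8-bit machine.
# Mnemonics and opcode values are invented for illustration only.

OPCODES = {
    "LOAD": 0x01,  # LOAD n    : put the immediate value n in the accumulator
    "ADD":  0x02,  # ADD n     : add the immediate value n to the accumulator
    "JMP":  0x03,  # JMP label : jump to an absolute address
    "HALT": 0xFF,  # HALT      : stop (takes no operand)
}


def assemble(source: str) -> bytes:
    """Translate assembly text into machine code bytes."""
    # Strip ';' comments and blank lines.
    lines = []
    for raw in source.splitlines():
        line = raw.split(";", 1)[0].strip()
        if line:
            lines.append(line)

    # Pass 1: walk the program, recording the address of every label.
    labels = {}
    instructions = []
    address = 0
    for line in lines:
        if line.endswith(":"):
            labels[line[:-1]] = address
            continue
        parts = line.split()
        mnemonic = parts[0].upper()
        if mnemonic not in OPCODES:
            raise ValueError(f"unknown mnemonic: {mnemonic}")
        operand = parts[1] if len(parts) > 1 else None
        instructions.append((mnemonic, operand))
        # One byte for the opcode, plus one byte if there is an operand.
        address += 1 if operand is None else 2

    # Pass 2: emit opcodes, replacing label names with their addresses.
    output = bytearray()
    for mnemonic, operand in instructions:
        output.append(OPCODES[mnemonic])
        if operand is not None:
            value = labels[operand] if operand in labels else int(operand, 0)
            output.append(value & 0xFF)
    return bytes(output)


PROGRAM = """
start:
    LOAD 5
loop:
    ADD 1
    JMP loop    ; 'loop' resolves to address 2
    HALT
"""

machine_code = assemble(PROGRAM)
print(machine_code.hex(" "))
```

Running it prints `01 05 02 01 03 02 ff`, the same bytes one would get by looking up each opcode in the table by hand and working out that the label `loop` sits at address 2. Two passes are used so that a jump can refer to a label defined later in the program.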
  2. Opcode (from Wikipedia, the free encyclopedia; redirected from Op-code)

In computer science, an opcode (operation code) is the portion of a machine language instruction that specifies the operation to be performed.

ISA

Opcode specifications and formats are laid out in the instruction set architecture (ISA) of the processor in question (which may be a general CPU or a more specialized processing unit). Apart from the opcode itself, an instruction normally also has one or more specifiers for operands (i.e. data) on which the operation should act, although some operations may have implicit operands, or none at all. There are instruction sets with nearly uniform fields for opcode and operand specifiers, as well as others (the x86 architecture, for instance) with a more complicated, variable-length structure. [1]

Depending on the architecture, the operands may be register values, values on the stack, other memory values, I/O ports, etc., specified and accessed using more or less complex addressing modes. The types of operations include arithmetic, data copying, logical operations, and program control, as well as special instructions (such as CPUID and others).

Assembly

Assembly language, or just assembly, is a low-level programming language that uses mnemonics, instructions and operands to represent machine code. This enhances readability while still giving precise control over the machine instructions. Most programming is currently
  3. done using high-level programming languages,[2][3] which are typically easier to read and write. These languages need to be compiled (translated into assembly language), or run through other compiled programs.[4]

Software instruction sets

Opcodes can also be found in so-called byte codes and other representations intended for a software interpreter rather than a hardware device. These software-based instruction sets often employ slightly higher-level data types and operations than most hardware counterparts, but are nevertheless constructed along similar lines. Examples include the byte code found in Java class files, which is then interpreted by the Java Virtual Machine (JVM), the byte code used in GNU Emacs for compiled LISP code, .NET Common Intermediate Language (CIL), and many others.

How were the first compilers made?

I always wonder this, and perhaps I need a good history lesson on programming languages. But since most compilers nowadays are made in C, how were the very first compilers made (AKA before C), or were all the languages just interpreted?

With that being said, I still don't understand how even the first assembly language was done. I understand what assembly language is, but I don't see how they got the VERY first assembly language working (like, how did they make the first commands, like mov R21 or w/e, map to their binary equivalents?).

Ha, I've done this.

Many CPUs have simple, fixed-size instructions that are just a couple of bytes long. For a simple CPU like a Motorola 6800, for example, you could fit all of its instructions on a single sheet of paper. Each instruction would have a two-byte opcode associated with it, and arguments. You could hand-assemble a program by looking up each instruction's opcode. You'd then write your program on paper, annotating each instruction with its corresponding opcode. Once you had
  4. written out your program, you could burn each opcode in sequence to an EPROM, which would then store your program. Wire the EPROM up to the CPU with just the right instructions at the right addresses, and you have a simple working program.

And to answer your next question: yes, it was painful (we did this in high school). But I have to say that wiring up every chip in an 8-bit computer and writing a program manually gave me a depth of understanding of computer architecture which I could probably not have achieved any other way.

More advanced chips (like x86) are far more difficult to hand-code, because they often have variable-length instructions. VLIW/EPIC processors like the Itanium are close to impossible to hand-code efficiently because they deal in packets of instructions which are optimized and assembled by advanced compilers.

For new architectures, programs are almost always written and assembled on another computer first, then loaded onto the new architecture. In fact, firms like Intel that actually build CPUs can run actual programs on architectures which don't exist yet by running them on simulators. But I digress...

As for compilers, at their very simplest, they can be little more than "cut and paste" programs. You could write a very simple, non-optimizing "high level language" that just clusters together simple assembly language instructions without a whole lot of effort.

If you want a history of compilers and programming languages, I suggest you GOTO a history of FORTRAN.

History of FORTRAN

1-1 A Brief History of FORTRAN/Fortran
***************************************
(Thanks to John Nebel for the nice description of a FORTRAN user's point of view)

A note on names
---------------
Both forms of the language name, FORTRAN and Fortran, are used. In this text, older versions of the language (up to and including 1977) will be referred to as FORTRAN; post-1977 ones will be referred to as 'Fortran 90', 'Fortran 95', etc.
The development of FORTRAN I
----------------------------
The first FORTRAN compiler was a milestone in the history of computing. At that time computers had very small memories (on the order of 15KB; it was common then to count memory capacities in bits), they were slow
  5. and had very primitive operating systems (if they had them at all). In those days it seemed that the only practical way was to program in assembly language.

The pioneers of FORTRAN didn't invent the idea of writing programs in a High Level Language (HLL) and compiling the source code to object code with an optimizing compiler, but they produced the first successful HLL. They designed an HLL that is still widely used, and an optimizing compiler that produced very efficient code; in fact, the FORTRAN I compiler held the record for optimizing code for 20 years!

This wonderful first FORTRAN compiler was designed and written from scratch in 1954-57 by an IBM team led by John W. Backus and staffed with super-programmers like Sheldon F. Best, Harlan Herrick, Peter Sheridan, Roy Nutt, Robert Nelson, Irving Ziller, Richard Goldberg, Lois Haibt and David Sayre. By the way, Backus was also system co-designer of the computer that ran the first compiler, the IBM 704.

The new invention caught on quickly, and no wonder: programs computing nuclear power reactor parameters now took hours instead of weeks to write, and required much less programming skill. Another great advantage of the new invention was that programs now became portable. Fortran won the battle against assembly language, the first in a series of battles to come, and was adopted by the scientific and military communities and used extensively in the Space Program and military projects.

The phenomenal success of the FORTRAN I team can be attributed in part to the friendly, non-authoritarian group climate. Another factor may be that IBM management had the sense to shelter and protect the group, even though the project took much more time than was first anticipated.
FORTRAN II, III, IV and FORTRAN 66
----------------------------------
FORTRAN II (1958) was a significant improvement: it added the capability for separate compilation of program modules, and assembly language modules could also be 'linked loaded' with FORTRAN modules.

FORTRAN III (1958) was never released to the public. It made it possible to use assembly language code right in the middle of the FORTRAN code. Such "inlined" assembly code can be more efficient, but the advantages of an HLL are lost (e.g. portability, ease of use).

FORTRAN IV (1961) was a 'clean up' of FORTRAN II, improving things like the implementation of the COMMON and EQUIVALENCE statements, and eliminating some machine-dependent language irregularities. A FORTRAN II to FORTRAN IV translator was used to retain backward compatibility with earlier FORTRAN programs.

In May 1962 another milestone was reached: an ASA committee started developing a standard for the FORTRAN language, a very important step that made it worthwhile for vendors to produce FORTRAN systems for every new computer, and made FORTRAN an even more popular HLL. The new ASA standard was published in 1966, and was known accordingly
  6. as FORTRAN 66; it was the first HLL standard in the world.

FORTRAN 77 standard
-------------------
Although the standard was formally superseded many years ago, compilers for FORTRAN 77 are still used today, mainly to re-compile legacy code.

FORTRAN 77 added:

  o  DO loops with a decreasing control variable (index).
  o  Block IF statements: IF ... THEN ... ELSE ... ENDIF. Before F77 there were only IF GOTO statements.
  o  Pre-test of DO loops. Before F77, DO loops were always executed at least once, so you had to add an IF GOTO before the loop if you wanted the expected behaviour.
  o  CHARACTER data type. Before F77, characters were always stored inside INTEGER variables.
  o  Apostrophe-delimited character string constants.
  o  Main program termination without a STOP statement.

The next Fortran standard (Fortran 90) was published too many years after FORTRAN 77 was out, allowing other programming languages to evolve and compete with Fortran. For example, the system-programming language C, and its evolved variant C++, became more popular in the traditional strongholds of Fortran, the scientific and engineering worlds, in spite of not being computation-oriented.

The delay in publishing a new standard can be attributed in part to political reasons, as testified by Brian Meek in 'The Fortran Saga'.

Fortran 90 standard
-------------------
A new standard has been designed and widely implemented in recent years. It is unofficially called Fortran 90, and adds many powerful extensions to FORTRAN 77. The language in its present form is competitive with computer languages created later (e.g. C).

Fortran 90 added:

  o  Free format source code form (column independent)
  o  Modern control structures (CASE & DO WHILE)
  o  Records/structures, called "Derived Data Types"
  o  Powerful array notation (array sections, array operators, etc.)
  o  Dynamic memory allocation
  o  Operator overloading
  o  Keyword argument passing
  o  The INTENT (in, out, inout) procedure argument attribute
  o  Control of numeric precision and range
  o  Modules: packages containing variables and code

Fortran 95 standard
  7. Fortran 95 added some minor improvements to the Fortran 90 standard.

Fortran from a user point of view
---------------------------------
... yes, it was FORTRAN on the IBM 7094. [I] have written volumes of Fortran code and have suffered through "it ought to be written in assembly language", "it ought to be written in PL/1", "it ought to be written in COBOL", "it ought to be written in Pascal", "it ought to be written in C", etc., depending on what generation of programmers was doing the criticizing.

A few years ago, in the COBOL era, one of the users resorted to replying to questioners by showing them some function they liked and asking "you tell me, what language was that written in?" ... It was good to see someone else cognizant of the language's obvious merits.

Bibliography on FORTRAN history
-------------------------------
Annals of the History of Computing, 6, 1, January 1984 (whole issue).
Programming Systems and Languages (S. Rosen ed.), McGraw Hill, 1967, pp 29-47 (this is Backus's original paper).
History of Programming Languages (R.L. Wexelblat ed.), Academic Press, 1981, pp 25-74.
A summary appears in vol. 5 of the Encyclopedia of Science and Technology, Academic Press, 1986, under 'Fortran', and in Chapter 1 of Fortran 90 Explained (Oxford, 1990).

+-------------------------------------------------+
| FORTRAN IS THE COMPUTING LANGUAGE OF CHOICE

A History of Computer Programming Languages

Ever since the invention of Charles Babbage’s difference engine in 1822, computers have required a means of instructing them to perform a specific task. This means is known as a
  8. programming language.

Computer languages were first composed of a series of steps to wire a particular program; these morphed into a series of steps keyed into the computer and then executed; later these languages acquired advanced features such as logical branching and object orientation. The computer languages of the last fifty years have come in two stages: the first major languages, and the second major languages, which are in use today.

In the beginning, Charles Babbage’s difference engine could only be made to execute tasks by changing the gears which executed the calculations. Thus, the earliest form of a computer language was physical motion. Eventually, physical motion was replaced by electrical signals when the US Government built the ENIAC between 1943 and 1945. It followed many of the same principles as Babbage’s engine and hence could only be “programmed” by presetting switches and rewiring the entire system for each new “program” or calculation. This process proved to be very tedious.

In 1945, John von Neumann was working at the Institute for Advanced Study. He developed two important concepts that directly affected the path of computer programming languages. The first was known as the “stored-program technique.” This technique stated that the actual computer hardware should be simple and not need to be hand-wired for each program. Instead, complex instructions should be used to control the simple hardware, allowing it to be reprogrammed much faster.

The second concept was also extremely important to the development of programming languages. Von Neumann called it “conditional control transfer.” This idea gave rise to the notion of subroutines, or small blocks of code that could be jumped to in any order, instead of a single set of chronologically ordered steps for the computer to take.
The second part of the idea stated that computer code should be able to branch based on logical statements such as IF (expression) THEN, and to loop, such as with a FOR statement. “Conditional control transfer” gave rise to the idea of “libraries,” which are blocks of code that can be reused over and over.

(Updated Aug 1 2004: Around this time, Konrad Zuse, a German, was inventing his own computing systems independently and developed many of the same concepts, both in his machines and in the Plankalkul programming language. Alas, his work did not become widely known until much later. For more information, see the entries on Wikipedia: Konrad Zuse and Plankalkul.)

In 1949, a few years after Von Neumann’s work, the language Short Code appeared. It was the first computer language for electronic devices and it required the programmer to change its statements into 0’s and 1’s by hand. Still, it was the first step towards the complex languages of today. In 1951, Grace Hopper wrote the first compiler, A-0. A compiler is a program that turns the language’s statements into 0’s and 1’s for the computer to understand. This led to faster programming, as the programmer no longer had to do the work by hand.

In 1957, the first of the major languages appeared in the form of FORTRAN. Its name stands for FORmula TRANslating system. The language was designed at IBM for scientific computing. The components were very simple, and provided the programmer with low-level access to the computer’s innards. Today, this language would be considered restrictive as it only included IF, DO, and GOTO statements, but at the time, these commands were a big step forward. The basic
  9. types of data in use today got their start in FORTRAN; these included logical variables (TRUE or FALSE), and integer, real, and double-precision numbers.

Though FORTRAN was good at handling numbers, it was not so good at handling input and output, which mattered most to business computing. Business computing started to take off in 1959, and because of this, COBOL was developed. It was designed from the ground up as the language for businessmen. Its only data types were numbers and strings of text. It also allowed for these to be grouped into arrays and records, so that data could be tracked and organized better. It is interesting to note that a COBOL program is built in a way similar to an essay, with four or five major sections that build into an elegant whole. COBOL statements also have a very English-like grammar, making the language quite easy to learn. All of these features were designed to make it easier for the average business to learn and adopt it.

(Updated Aug 11 2004) In 1958, John McCarthy of MIT created the LISt Processing (or LISP) language. It was designed for Artificial Intelligence (AI) research. Because it was designed for a specialized field, the original release of LISP had a unique syntax: essentially none. Programmers wrote code in parse trees, which are usually a compiler-generated intermediary between higher syntax (such as in C or Java) and lower-level code. Another obvious difference between this language (in its original form) and other languages is that the basic and only type of data is the list; in the mid-1960’s, LISP acquired other data types. A LISP list is denoted by a sequence of items enclosed by parentheses. LISP programs themselves are written as a set of lists, so that LISP has the unique ability to modify itself, and hence grow on its own.
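The idea that a program is itself a list, and can therefore be inspected and rewritten like any other data, can be illustrated outside LISP. The sketch below uses Python rather than LISP: it stores a small Boolean expression as nested lists and evaluates it. The operator names (OR, AND, NOT) and the `evaluate` function are assumptions made for this example and are not taken from any LISP implementation.

```python
# Minimal sketch: a "program" stored as nested lists, in the spirit of LISP.
# Python lists stand in for LISP lists; the operator names are our own choice.

def evaluate(expr):
    """Evaluate a nested-list expression such as ["OR", x, y]."""
    if not isinstance(expr, list):
        return expr                      # a plain value evaluates to itself
    operator, *args = expr               # first item names the operation
    values = [evaluate(arg) for arg in args]
    if operator == "OR":
        return any(values)
    if operator == "AND":
        return all(values)
    if operator == "NOT":
        return not values[0]
    raise ValueError(f"unknown operator: {operator}")


program = ["OR", False, ["AND", True, ["NOT", False]]]
result = evaluate(program)

# Because the program is ordinary data, other code can rewrite it.
program[0] = "AND"
rewritten = evaluate(program)

print(result, rewritten)
```

Here `result` is `True` and `rewritten` is `False`: changing one list element changed the program's meaning, which is a small-scale version of the self-modification described above.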
The LISP syntax was known as “Cambridge Polish,” as it was very different from standard Boolean logic (Wexelblat, 177):

  x V y - Cambridge Polish, what was used to describe the LISP program
  OR(x,y) - parenthesized prefix notation, what was used in the LISP program
  x OR y - standard Boolean logic

LISP remains in use today because of its highly specialized and abstract nature.

The Algol language was created by a committee for scientific use in 1958. Its major contribution is being the root of the tree that has led to such languages as Pascal, C, C++, and Java. It was also the first language with a formal grammar, known as Backus-Naur Form or BNF (McGraw-Hill Encyclopedia of Science and Technology, 454). Though Algol implemented some novel concepts, such as recursive calling of functions, the next version of the language, Algol 68, became bloated and difficult to use. This led to the adoption of smaller and more compact languages, such as Pascal.

Pascal was begun in 1968 by Niklaus Wirth. Its development was mainly out of necessity for a good teaching tool. In the beginning, the language designers had no hopes for it to enjoy widespread adoption. Instead, they concentrated on developing good tools for teaching, such as a debugger and editing system, and on support for common early microprocessor machines which were in use in teaching institutions.
  10. Pascal was designed in a very orderly way: it combined many of the best features of the languages in use at the time, COBOL, FORTRAN, and ALGOL. While doing so, many of the irregularities and oddball statements of these languages were cleaned up, which helped it gain users (Bergin, 100-101). The combination of features, input/output and solid mathematical features, made it a highly successful language. Pascal also improved the “pointer” data type, a very powerful feature of any language that implements it. It also added a CASE statement, which allowed instructions to branch like a tree in such a manner:

    CASE expression OF
        possible-expression-value-1:
            statements to execute...
        possible-expression-value-2:
            statements to execute...
    END

Pascal also helped the development of dynamic variables, which could be created while a program was being run, through the NEW and DISPOSE commands. However, Pascal did not implement dynamic arrays, or groups of variables, which proved to be needed and led to its downfall (Bergin, 101-102). Wirth later created a successor to Pascal, Modula-2, but by the time it appeared, C was gaining popularity and users at a rapid pace.

C was developed in 1972 by Dennis Ritchie while working at Bell Labs in New Jersey. The transition in usage from the first major languages to the major languages of today occurred with the transition between Pascal and C. Its direct ancestors are B and BCPL, but its similarities to Pascal are quite obvious. All of the features of Pascal, including the new ones such as the CASE statement, are available in C. C uses pointers extensively and was built to be fast and powerful at the expense of being hard to read. But because it fixed most of the mistakes Pascal had, it won over former Pascal users quite rapidly.

Ritchie developed C for the new Unix system being created at the same time. Because of this, C and Unix go hand in hand.
Unix gives C such advanced features as dynamic variables, multitasking, interrupt handling, forking, and strong, low-level input-output. Because of this, C is very commonly used to program operating systems such as Unix, Windows, the MacOS, and Linux.

In the late 1970’s and early 1980’s, a new programming method was being developed. It was known as Object Oriented Programming, or OOP. Objects are pieces of data that can be packaged and manipulated by the programmer. Bjarne Stroustrup liked this method and developed extensions to C known as “C With Classes.” This set of extensions developed into the full-featured language C++, which was released in 1983.

C++ was designed to organize the raw power of C using OOP, but maintain the speed of C and be able to run on many different types of computers. C++ is most often used in simulations, such as games. C++ provides an elegant way to track and manipulate hundreds of instances of people in elevators, or armies filled with different types of soldiers. It is the language of choice in today’s AP Computer Science courses.
  11. In the early 1990’s, interactive TV was the technology of the future. Sun Microsystems decided that interactive TV needed a special, portable (able to run on many types of machines) language. This language eventually became Java. In 1994, the Java project team changed their focus to the web, which was becoming “the cool thing” after interactive TV failed. The next year, Netscape licensed Java for use in their internet browser, Navigator. At this point, Java became the language of the future and several companies announced applications which would be written in Java, none of which came into use.

Though Java has very lofty goals and is a textbook example of a good language, it may be the “language that wasn’t.” It has serious optimization problems, meaning that programs written in it run very slowly. And Sun has hurt Java’s acceptance by engaging in political battles over it with Microsoft. But Java may wind up as the instructional language of tomorrow as it is truly object-oriented and implements advanced techniques such as true portability of code and garbage collection.

Visual Basic is often taught as a first programming language today, as it is based on the BASIC language developed in 1964 by John Kemeny and Thomas Kurtz. BASIC is a very limited language and was designed for non-computer science people. Statements are chiefly run sequentially, but program control can change based on IF..THEN and GOSUB statements, which execute a certain block of code and then return to the original point in the program’s flow.

Microsoft has extended BASIC in its Visual Basic (VB) product. The heart of VB is the form, or blank window, on which you drag and drop components such as menus, pictures, and slider bars. These items are known as “widgets.” Widgets have properties (such as color) and events (such as clicks and double-clicks) and are central to building any user interface today in any language.
VB is most often used today to create quick and simple interfaces to other Microsoft products such as Excel and Access without needing a lot of code, though it is possible to create full applications with it.

Perl has often been described as the “duct tape of the Internet,” because it is most often used as the engine for a web interface or in scripts that modify configuration files. It has very strong text matching functions which make it ideal for these tasks. Perl was developed by Larry Wall in 1987 because the Unix sed and awk tools (used for text manipulation) were no longer strong enough to support his needs. Depending on whom you ask, Perl stands for Practical Extraction and Reporting Language or Pathologically Eclectic Rubbish Lister.

Programming languages have been under development for years and will remain so for many years to come. They got their start with a list of steps to wire a computer to perform a task. These steps eventually found their way into software and began to acquire newer and better features. The first major languages were characterized by the simple fact that they were intended for one purpose and one purpose only, while the languages of today are differentiated by the way they are programmed, as they can be used for almost any
  12. purpose. And perhaps the languages of tomorrow will be more natural with the invention of quantum and biological computers.