How was the first compiler written?
I heard about the chicken and the egg and bootstrapping. I have a few questions.
What wrote the first compiler that converted something into binary instructions?
Is assembly compiled or translated into binary instructions?
...I'd find it hard to believe they wrote a compiler in binary.
Assembly instructions are (generally) a direct mapping to opcodes, which are (multi-)byte values
of machine code that can be directly interpreted by the processor. It is quite possible to
write a program in opcodes directly by looking them up from a table (such as
this one for the 6809 microprocessor, for example) that lists them with the
matching assembly instructions, and hand-determining memory
addresses/offsets for things like jumps.
The first programs were done in exactly this fashion - hand-written opcodes.
However, most of the time it's simpler to use an assembler to "compile" assembly code, which
automatically does these opcode lookups, as well as being helpful in computing addresses/offsets
for named jump labels, et cetera.
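What an assembler automates can be sketched in a few lines of code. Below is a minimal two-pass assembler for an invented toy instruction set (the mnemonics and opcode bytes here are made up for illustration, not taken from any real CPU): the first pass records label addresses, the second emits opcodes and resolves jumps.

```python
# Minimal two-pass assembler for an invented toy instruction set.
# Pass 1 records the address of each label; pass 2 emits opcode bytes
# and resolves jumps to the addresses collected in pass 1.

OPCODES = {"NOP": 0x00, "INC": 0x01, "DEC": 0x02, "JMP": 0x10, "HLT": 0xFF}

def assemble(lines):
    labels, program = {}, []
    # Pass 1: compute the byte address of every label.
    addr = 0
    for line in lines:
        line = line.strip()
        if line.endswith(":"):
            labels[line[:-1]] = addr
        elif line:
            mnemonic = line.split()[0]
            addr += 2 if mnemonic == "JMP" else 1  # JMP carries a 1-byte target
    # Pass 2: emit machine code, looking up label addresses for jumps.
    for line in lines:
        line = line.strip()
        if not line or line.endswith(":"):
            continue
        parts = line.split()
        program.append(OPCODES[parts[0]])
        if parts[0] == "JMP":
            program.append(labels[parts[1]])
    return bytes(program)

code = assemble(["start:", "INC", "DEC", "JMP start", "HLT"])
print(code.hex())  # 01021000ff
```

This is exactly the bookkeeping a human hand-assembler does on paper: look up each opcode in the table, and go back to fill in jump targets once every instruction's address is known.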
The first assemblers were written by hand. Those
assemblers could then be used to assemble more
complicated assemblers, which could then be used to
assemble compilers written for higher-level
languages, and so on. This process of iteratively writing the tools to simplify
the creation of the next set of tools is called bootstrapping (as mentioned by
David Rabinowitz in his answer).
From Wikipedia, the free encyclopedia
In computer science, an opcode (operation code) is the portion of a machine language
instruction that specifies the operation to be performed.
Their specification and format are laid out in the instruction set architecture (ISA) of the
processor in question (which may be a general CPU or a more specialized processing unit). Apart
from the opcode itself, an instruction normally also has one or more specifiers for operands (i.e.
data) on which the operation should act, although some operations may have implicit operands,
or none at all. There are instruction sets with nearly uniform fields for opcode and operand
specifiers, as well as others (the x86 architecture for instance) with a more complicated, varied
length structure. 
Depending on architecture, the operands may be register values, values in the stack, other
memory values, I/O ports, etc., specified and accessed using more or less complex addressing
modes. The types of operations include arithmetic, data copying, logical operations, and
program control, as well as special instructions (such as CPUID and others).
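For an instruction set with near-uniform fields, packing and unpacking an instruction word is plain bit manipulation. The sketch below uses a hypothetical 16-bit format (a 4-bit opcode plus two 6-bit register specifiers; the field widths and opcode value are invented for illustration):

```python
# Pack and unpack a hypothetical 16-bit instruction word:
# bits 15-12: opcode, bits 11-6: destination register, bits 5-0: source register.

def encode(opcode, rd, rs):
    assert opcode < 16 and rd < 64 and rs < 64
    return (opcode << 12) | (rd << 6) | rs

def decode(word):
    return (word >> 12) & 0xF, (word >> 6) & 0x3F, word & 0x3F

word = encode(0x3, 5, 9)   # e.g. "ADD r5, r9" in this toy format
print(hex(word))           # 0x3149
print(decode(word))        # (3, 5, 9)
```

Variable-length encodings like x86's need a more involved decoder (prefixes, ModR/M bytes, displacement and immediate fields), but the principle of fixed bit fields naming an operation and its operands is the same.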
Assembly language, or just assembly, is a low-level programming language, which uses
mnemonics, instructions and operands to represent machine code. This enhances the readability
while still giving precise control over the machine instructions. Most programming is currently
done using high-level programming languages, which are typically easier to read and write.
These languages need to be compiled (translated into machine code, often via assembly
language) or run through an interpreter.
Software instruction sets
Opcodes can also be found in so called byte codes and other representations intended for a
software interpreter rather than a hardware device. These software based instruction sets often
employ slightly higher-level data types and operations than most hardware counterparts, but are
nevertheless constructed along similar lines. Examples include the byte code found in Java class
files which are then interpreted by the Java Virtual Machine (JVM), the byte code used in GNU
Emacs for compiled LISP code, .NET Common Intermediate Language (CIL), and many others.
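These software instruction sets are executed the same way hardware ones are: a dispatch loop fetches an opcode and acts on it. Here is a toy stack-machine interpreter; the opcode numbers are invented for this sketch, not taken from the JVM or CIL, but the fetch-decode-execute pattern is the same:

```python
# Toy stack-machine bytecode interpreter. The opcode numbers are invented;
# real byte codes (JVM, CIL) follow the same fetch-decode-execute loop
# with a much larger instruction set.

PUSH, ADD, MUL, HALT = 0x01, 0x02, 0x03, 0xFF

def run(code):
    stack, pc = [], 0
    while True:
        op = code[pc]; pc += 1
        if op == PUSH:                 # next byte is an immediate operand
            stack.append(code[pc]); pc += 1
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == MUL:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == HALT:
            return stack.pop()

# Computes (2 + 3) * 4
print(run(bytes([PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT])))  # 20
```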
How were the first compilers made?
I always wonder this, and perhaps I need a good history lesson on programming
languages. But since most compilers nowadays are made in C, how were the very first
compilers made (AKA before C) or were all the languages just interpreted?
With that being said, I still don't understand how even the first assembly language
was done. I understand what assembly language is, but I don't see how they got the
very first assembly language working. How did they map the first commands
(like mov R21) to their binary equivalents?
Ha, I've done this. Many CPUs have simple, fixed-size instructions that are just a
couple of bytes long. For a simple CPU like a Motorola 6800 for example, you
could fit all of its instructions on a single sheet of paper. Each instruction would
have a one-byte opcode associated with it, plus its arguments. You could hand-assemble
a program by looking up each instruction's opcode. You'd then write your program
on paper, annotating each instruction with its corresponding opcode. Once you had
written out your program, you could burn each opcode in sequence to an
EPROM which would then store your program. Wire the EPROM up to the CPU
with just the right instructions at the right addresses, and you have a simple working
program. And to answer your next question, yes. It was painful (we did this in high
school). But I have to say that wiring up every chip in an 8-bit computer and writing
a program manually gave me a depth of understanding of computer architecture
which I could probably not have achieved any other way.
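The paper-and-opcode-sheet workflow described above is easy to mimic in code: keep the opcode sheet as a table and translate each instruction by lookup. The byte values below are Motorola 6800 opcodes quoted from memory (LDAA/LDAB immediate, ABA), so treat them as illustrative rather than authoritative:

```python
# Hand-assembly as table lookup: each (mnemonic, operand) pair becomes
# opcode byte(s), exactly as you would do on paper with the CPU's opcode
# sheet. Byte values for the Motorola 6800, quoted from memory.

OPCODE_SHEET = {
    "LDAA#": 0x86,   # load accumulator A, immediate operand
    "LDAB#": 0xC6,   # load accumulator B, immediate operand
    "ABA":   0x1B,   # add B to A (implied operands, no extra byte)
}

# A tiny program: A = 2, B = 3, A = A + B
program = [("LDAA#", 2), ("LDAB#", 3), ("ABA", None)]

rom = []
for mnemonic, operand in program:
    rom.append(OPCODE_SHEET[mnemonic])
    if operand is not None:          # immediate operand byte follows the opcode
        rom.append(operand)

print(" ".join(f"{b:02X}" for b in rom))  # 86 02 C6 03 1B
```

The resulting byte list is exactly what would be burned to the EPROM, in order, starting at the address the CPU fetches from.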
More advanced chips (like x86) are far more difficult to hand-code, because they
often have variable-length instructions. VLIW/EPIC processors like the Itanium are
close to impossible to hand-code efficiently because they deal in packets of
instructions which are optimized and assembled by advanced compilers. For new
architectures, programs are almost always written and assembled on another
computer first, then loaded into the new architecture. In fact, for firms like Intel who
actually build CPUs, they can run actual programs on architectures which don't exist
yet by running them on simulators. But I digress...
As for compilers, at their very simplest, they can be little more than "cut and paste"
programs. You could write a very simple, non-optimizing, "high level language"
that just clusters together simple assembly language instructions without a whole lot
of optimization.
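Such a "cut and paste" compiler can be little more than a template expander: each high-level statement maps to a canned sequence of assembly lines with the operands substituted in. A minimal sketch (both the mini-language and the assembly mnemonics are invented for illustration):

```python
# A "cut and paste" compiler: each statement of a tiny invented language
# expands to a fixed assembly template, with operands substituted in.

TEMPLATES = {
    "set": ["LDA #{val}", "STA {var}"],                # set x 5  ->  x = 5
    "add": ["LDA {var}", "ADD #{val}", "STA {var}"],   # add x 3  ->  x = x + 3
}

def compile_line(line):
    op, var, val = line.split()
    return [t.format(var=var, val=val) for t in TEMPLATES[op]]

source = ["set x 5", "add x 3"]
for line in source:
    for asm in compile_line(line):
        print(asm)
```

There is no register allocation or optimization here, just expansion; the earliest high-level translators were not much more sophisticated than this, which is part of why the FORTRAN I optimizer discussed below was such a landmark.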
If you want a history of compilers and programming languages, I suggest you
GOTO a history of FORTRAN.
History of FORTRAN
1-1 A Brief History of FORTRAN/Fortran
(Thanks to John Nebel for the nice description of a FORTRAN user's
point of view)
A note on names
--------------
Both forms of the language name, FORTRAN and Fortran, are used.
In this text, older versions (before and including 1977) of the
language will be referred to as FORTRAN, post-1977 ones will be
referred to as 'Fortran 90', 'Fortran 95' etc.
The development of FORTRAN I
---------------------------
The first FORTRAN compiler was a milestone in the history of computing.
At that time computers had very small memories (on the order of 15 KB;
it was common then to count memory capacities in bits), they were slow,
and they had very primitive operating systems (if they had them at all).
In those days it seemed that the only practical way was to program in
assembly language.
The pioneers of FORTRAN didn't invent the idea of writing programs in a
High Level Language (HLL) and compiling the source code to object code
with an optimizing compiler, but they produced the first successful HLL.
They designed an HLL that is still widely used, and an optimizing compiler
that produced very efficient code, in fact the FORTRAN I compiler held
the record for optimizing code for 20 years!
This wonderful first FORTRAN compiler was designed and written from
scratch in 1954-57 by an IBM team led by John W. Backus and staffed with
super-programmers like Sheldon F. Best, Harlan Herrick, Peter Sheridan,
Roy Nutt, Robert Nelson, Irving Ziller, Richard Goldberg, Lois Haibt
and David Sayre. By the way, Backus was also system co-designer of the
computer that ran the first compiler, the IBM 704.
The new invention caught on quickly, and no wonder: programs computing nuclear
power reactor parameters now took hours instead of weeks to write, and
required much less programming skill. Another great advantage of the new
invention was that programs now became portable. Fortran won the battle
against Assembly language, the first in a series of battles to come,
and was adopted by the scientific and military communities and used
extensively in the Space Program and military projects.
The phenomenal success of the FORTRAN I team can be attributed in part
to the friendly non-authoritative group climate. Another factor may be
that IBM management had the sense to shelter and protect the group,
even though the project took much more time than was first anticipated.
FORTRAN II, III, IV and FORTRAN 66
---------------------------------
FORTRAN II (1958) was a significant improvement; it added the capability
for separate compilation of program modules, assembly language modules
could also be 'linked loaded' with FORTRAN modules.
FORTRAN III (1958) was never released to the public. It made possible
using assembly language code right in the middle of the FORTRAN code.
Such "inlined" assembly code can be more efficient, but the advantages
of an HLL are lost (e.g. portability, ease of use).
FORTRAN IV (1961) was a 'clean up' of FORTRAN II, improving things
like the implementation of the COMMON and EQUIVALENCE statements,
and eliminating some machine-dependent language irregularities.
A FORTRAN II to FORTRAN IV translator was used to retain backward
compatibility with earlier FORTRAN programs.
In May 1962 another milestone was passed: an ASA committee started
developing a standard for the FORTRAN language, a very important step
that made it worthwhile for vendors to produce FORTRAN systems for
every new computer, and made FORTRAN an even more popular HLL.
The new ASA standard was published in 1966 and was known accordingly
as FORTRAN 66. It was the first HLL standard in the world.
FORTRAN 77 standard
------------------
Though the standard was formally superseded many years ago, compilers for
FORTRAN 77 are still used today, mainly to re-compile legacy code.
FORTRAN 77 added:
DO loops with a decreasing control variable (index).
Block if statements IF ... THEN ... ELSE ... ENDIF.
Before F77 there were only IF GOTO statements.
Pre-test of DO loops. Before F77 DO loops were always
executed at least once, so you had to add an IF GOTO
before the loop if you wanted the expected behaviour.
CHARACTER data type. Before F77 characters were always
stored inside INTEGER variables.
Apostrophe delimited character string constants.
Main program termination without a STOP statement.
The next Fortran standard (Fortran 90) was published many years after
Fortran 77, allowing other programming languages to
evolve and compete with Fortran. For example, the system-programming
language C, and its evolved variant C++, became more popular in the
traditional strongholds of Fortran: the scientific and engineering
worlds, in spite of being non-computationally oriented.
The delay in publishing a new standard can be attributed in part
to political reasons as testified by Brian Meek in:
The Fortran Saga
Fortran 90 standard
------------------
A new standard has been designed and widely implemented in recent years.
It is unofficially called Fortran 90, and adds many powerful extensions
to FORTRAN 77. The language in its present form is competitive with
computer languages created later (e.g. C).
Fortran 90 added:
Free format source code form (column independent)
Modern control structures (CASE & DO WHILE)
Records/structures - called "Derived Data Types"
Powerful array notation (array sections, array operators, etc.)
Dynamic memory allocation
Keyword argument passing
The INTENT (in, out, inout) procedure argument attribute
Control of numeric precision and range
Modules - packages containing variable and code
Fortran 95 standard
------------------
Fortran 95 added some minor improvements to the Fortran 90 standard.
Fortran from a user point of view
--------------------------------
... yes, it was FORTRAN on the IBM 7094. [I] Have written volumes
of Fortran code and have suffered through "it ought to be written
in assembly language", "it ought to be written in PL/1", "it ought
to be written in COBOL", "it ought to be written in Pascal", "it
ought to be written in C", etc. depending on what generation of
programmers was doing the criticizing.
A few years ago, in the COBOL era, one of the users resorted to
replying to questioners by showing them some function they liked
and asking "you tell me, what language was that written in?"
... It was good to see someone else cognizant of the language's
Bibliography on FORTRAN history
------------------------------
Annals of the History of Computing, 6, 1, January 1984 (whole issue).
Programming Systems and Languages (S. Rosen ed.), McGraw Hill,
1967, pp 29-47 (this is Backus's original paper).
History of Programming Languages (R.L. Wexelblat ed.),
Academic Press, 1981, pp 25-74.
A summary appears in vol. 5 of the Encyclopedia of Science
and Technology, Academic Press, 1986, under 'Fortran'.
and in Chapter 1 of Fortran 90 Explained (Oxford, 1990).
A History of Computer Programming
Ever since the invention of Charles Babbage’s difference engine in 1822, computers have
required a means of instructing them to perform a specific task. This means is known as a
programming language. Computer languages were first composed of a series of steps to wire a
particular program; these morphed into a series of steps keyed into the computer and then
executed; later these languages acquired advanced features such as logical branching and object
orientation. The computer languages of the last fifty years have come in two stages, the first
major languages and the second major languages, which are in use today.
In the beginning, Charles Babbage’s difference engine could only be made to execute tasks by
changing the gears which executed the calculations. Thus, the earliest form of a computer
language was physical motion. Eventually, physical motion was replaced by electrical signals
when the US Government built the ENIAC in the mid-1940s. It followed many of the same principles as
Babbage’s engine and hence, could only be “programmed” by presetting switches and rewiring
the entire system for each new “program” or calculation. This process proved to be very tedious.
In 1945, John Von Neumann was working at the Institute for Advanced Study. He developed
two important concepts that directly affected the path of computer programming languages. The
first was known as “shared-program technique” (www.softlord.com). This technique stated that
the actual computer hardware should be simple and not need to be hand-wired for each program.
Instead, complex instructions should be used to control the simple hardware, allowing it to be
reprogrammed much faster.
The second concept was also extremely important to the development of programming
languages. Von Neumann called it “conditional control transfer” (www.softlord.com). This idea
gave rise to the notion of subroutines, or small blocks of code that could be jumped to in any
order, instead of a single set of chronologically ordered steps for the computer to take. The
second part of the idea stated that computer code should be able to branch based on logical
statements such as IF (expression) THEN, and looped such as with a FOR statement.
“Conditional control transfer” gave rise to the idea of “libraries,” which are blocks of code that
can be reused over and over. (Updated Aug 1 2004: Around this time, Konrad Zuse, a German,
was inventing his own computing systems independently and developed many of the same
concepts, both in his machines and in the Plankalkul programming language. Alas, his work did
not become widely known until much later. For more information, see this website:
http://www.epemag.com/zuse/, or the entries on Wikipedia: Konrad Zuse and Plankalkul.)
In 1949, a few years after Von Neumann’s work, the language Short Code appeared
(www.byte.com). It was the first computer language for electronic devices and it required the
programmer to change its statements into 0’s and 1’s by hand. Still, it was the first step towards
the complex languages of today. In 1951, Grace Hopper wrote the first compiler, A-0
(www.byte.com). A compiler is a program that turns the language’s statements into 0’s and 1’s
for the computer to understand. This led to faster programming, as the programmer no longer
had to do the work by hand.
In 1957, the first of the major languages appeared in the form of FORTRAN. Its name stands for
FORmula TRANslating system. The language was designed at IBM for scientific computing.
The components were very simple, and provided the programmer with low-level access to the
computer's innards. Today, this language would be considered restrictive as it only included IF,
DO, and GOTO statements, but at the time, these commands were a big step forward. The basic
types of data in use today got their start in FORTRAN; these included logical variables (TRUE
or FALSE), and integer, real, and double-precision numbers.
Though FORTRAN was good at handling numbers, it was not so good at handling input and
output, which mattered most to business computing. Business computing started to take off in
1959, and because of this, COBOL was developed. It was designed from the ground up as the
language for businessmen. Its only data types were numbers and strings of text. It also allowed
for these to be grouped into arrays and records, so that data could be tracked and organized
better. It is interesting to note that a COBOL program is built in a way similar to an essay, with
four or five major sections that build into an elegant whole. COBOL statements also have a very
English-like grammar, making it quite easy to learn. All of these features were designed to make
it easier for the average business to learn and adopt it.
(Updated Aug 11 2004) In 1958, John McCarthy of MIT created the LISt Processing (or LISP)
language. It was designed for Artificial Intelligence (AI) research. Because it was designed for a
specialized field, the original release of LISP had a unique syntax: essentially none.
Programmers wrote code in parse trees, which are usually a compiler-generated intermediary
between higher syntax (such as in C or Java) and lower-level code. Another obvious difference
between this language (in original form) and other languages is that the basic and only type of
data is the list; in the mid-1960’s, LISP acquired other data types. A LISP list is denoted by a
sequence of items enclosed by parentheses. LISP programs themselves are written as a set of
lists, so that LISP has the unique ability to modify itself, and hence grow on its own. The LISP
syntax was known as "Cambridge Polish," as it was very different from standard Boolean logic:
    x V y   - Cambridge Polish, used to describe the LISP OR function
    OR(x,y) - parenthesized prefix notation, used in the LISP OR function
    x OR y  - standard Boolean logic
LISP remains in use today because of its highly specialized and abstract nature.
The Algol language was created by a committee for scientific use in 1958. Its major
contribution is being the root of the tree that has led to such languages as Pascal, C, C++,
and Java. It was also the first language with a formal grammar, known as Backus-Naur
Form or BNF (McGraw-Hill Encyclopedia of Science and Technology, 454). Though
Algol implemented some novel concepts, such as recursive calling of functions, the next
version of the language, Algol 68, became bloated and difficult to use (www.byte.com).
This led to the adoption of smaller and more compact languages, such as Pascal.
Pascal was begun in 1968 by Niklaus Wirth. Its development was mainly out of necessity
for a good teaching tool. In the beginning, the language designers had no hopes for it to
enjoy widespread adoption. Instead, they concentrated on developing good tools for
teaching such as a debugger and editing system and support for common early
microprocessor machines which were in use in teaching institutions.
Pascal was designed in a very orderly approach; it combined many of the best features of
the languages in use at the time, COBOL, FORTRAN, and ALGOL. While doing so,
many of the irregularities and oddball statements of these languages were cleaned up,
which helped it gain users (Bergin, 100-101). The combination of features, input/output
and solid mathematical features, made it a highly successful language. Pascal also
improved the “pointer” data type, a very powerful feature of any language that
implements it. It also added a CASE statement that allowed execution to branch like
a tree in such a manner:
CASE expression OF
   possibility 1: statements to execute...
   possibility 2: statements to execute...
END
Pascal also helped the development of dynamic variables, which could be created while a
program was being run, through the NEW and DISPOSE commands. However, Pascal
did not implement dynamic arrays, or groups of variables, which proved to be needed and
led to its downfall (Bergin, 101-102). Wirth later created a successor to Pascal,
Modula-2, but by the time it appeared, C was gaining popularity and users at a rapid pace.
C was developed in 1972 by Dennis Ritchie while working at Bell Labs in New Jersey.
The transition in usage from the first major languages to the major languages of today
occurred with the transition between Pascal and C. Its direct ancestors are B and BCPL,
but its similarities to Pascal are quite obvious. All of the features of Pascal, including the
new ones such as the CASE statement are available in C. C uses pointers extensively and
was built to be fast and powerful at the expense of being hard to read. But because it
fixed most of the mistakes Pascal had, it won over former-Pascal users quite rapidly.
Ritchie developed C for the new Unix system being created at the same time. Because of
this, C and Unix go hand in hand. Unix gives C such advanced features as dynamic
variables, multitasking, interrupt handling, forking, and strong, low-level, input-output.
Because of this, C is very commonly used to program operating systems such as Unix,
Windows, the MacOS, and Linux.
In the late 1970’s and early 1980’s, a new programming method was being developed. It
was known as Object Oriented Programming, or OOP. Objects are pieces of data that can
be packaged and manipulated by the programmer. Bjarne Stroustrup liked this method
and developed extensions to C known as “C With Classes.” This set of extensions
developed into the full-featured language C++, which was released in 1983.
C++ was designed to organize the raw power of C using OOP, but maintain the speed of
C and be able to run on many different types of computers. C++ is most often used in
simulations, such as games. C++ provides an elegant way to track and manipulate
hundreds of instances of people in elevators, or armies filled with different types of
soldiers. It is the language of choice in today’s AP Computer Science courses.
In the early 1990’s, interactive TV was the technology of the future. Sun Microsystems
decided that interactive TV needed a special, portable (can run on many types of
machines), language. This language eventually became Java. In 1994, the Java project
team changed their focus to the web, which was becoming “the cool thing” after
interactive TV failed. The next year, Netscape licensed Java for use in their internet
browser, Navigator. At this point, Java became the language of the future and several
companies announced applications which would be written in Java, none of which came
into use.
Though Java has very lofty goals and is a text-book example of a good language, it may
be the “language that wasn’t.” It has serious optimization problems, meaning that
programs written in it run very slowly. And Sun has hurt Java’s acceptance by engaging
in political battles over it with Microsoft. But Java may wind up as the instructional
language of tomorrow as it is truly object-oriented and implements advanced techniques
such as true portability of code and garbage collection.
Visual Basic is often taught as a first programming language today as it is based on the
BASIC language developed in 1964 by John Kemeny and Thomas Kurtz. BASIC is a
very limited language and was designed for non-computer science people. Statements are
chiefly run sequentially, but program control can change based on IF..THEN, and
GOSUB statements which execute a certain block of code and then return to the original
point in the program’s flow.
Microsoft has extended BASIC in its Visual Basic (VB) product. The heart of VB is the
form, or blank window on which you drag and drop components such as menus, pictures,
and slider bars. These items are known as “widgets.” Widgets have properties (such as their
color) and events (such as clicks and double-clicks) and are central to building any user
interface today in any language. VB is most often used today to create quick and simple
interfaces to other Microsoft products such as Excel and Access without needing a lot of
code, though it is possible to create full applications with it.
Perl has often been described as the “duct tape of the Internet,” because it is most often
used as the engine for a web interface or in scripts that modify configuration files. It has
very strong text matching functions which make it ideal for these tasks. Perl was
developed by Larry Wall in 1987 because the Unix sed and awk tools (used for text
manipulation) were no longer strong enough to support his needs. Depending on whom
you ask, Perl stands for Practical Extraction and Reporting Language or Pathologically
Eclectic Rubbish Lister.
Programming languages have been under development for years and will remain so for
many years to come. They got their start with a list of steps to wire a computer to perform
a task. These steps eventually found their way into software and began to acquire newer
and better features. The first major languages were characterized by the simple fact that
they were intended for one purpose and one purpose only, while the languages of today
are differentiated by the way they are programmed in, as they can be used for almost any
purpose. And perhaps the languages of tomorrow will be more natural with the invention
of quantum and biological computers.