Wednesday, February 01, 2006

Class II - Some Early History - Pseudocodes and FORTRAN

Machine Code:
benefits: fast
drawback: hard to write. ADD might be 14. Readability. Absolute addressing. Recomputing locations, NOPs for deletions. Gave rise to Assembler, with same benefit.

issues:
floats were simulated. indexing missing - adding var quantity to fixed address.

instead, bec of the von Neumann architecture, since the instruction was in memory, they modified the address portion of the instruction, which was error prone.

thus, subroutines to do floats, and to index. since most of time spent simulating floats, little overhead comparatively for doing other things. Thus, pseudocode interpreters were natural developement.

Short Code
Words are 72 bits, as 12 6-bit bytes. mathematic expressions to be evaluated. Byte-pair values. Some codes:

01 -    06 abs         1n (n+2)nd power
02 ) 07 + 2n (n+2)nd root
03 = 08 pause 4n if <= n
04 / 09 ( 58 print and tab
implemented as pure interpreted. 50X slower than machine code.

X0 = SQRT(ABS(Y0))
00 X0 03 20 Y0

Homework: How would we write Z0 = CUBEROOT(Y1 * Z0)? How would we write Z0=Y0?

pseudocode - generally used today as english description of code, not always going into specifics of implementation. here, means something else: an instruction diff than that provided by the machine. initially presented as way of saving space in memory, since pseudocode more compact.

Implemented a virtual computer, with own data types and operations, atop an actual computer.

Automation Principle: Automate mechanical, tedious, error-prone activities.
Also, simpler to understand since fewer exceptions, thus Regularity principle: Regular rules, w/o exceptions, are easier to read, learn, describe, implement.

Compiling routines mentioned in book are predecessor to macros.

Book designs a pseudocode, based on following capabilities:
float arithmetic and comparison, indexing, transfer of control, IO.
Syntax: Use numbers since many did not have alphabetic IO, 1 number on each card, each instruction one word. Choose how big addresses to be. Machine has only 2000 locations, so allow 3 decimal digits, and other 1000 for interpreter and program.
Impossible error Principle: rather than detecting errors, may them impossible to commit.

Since address is 3 digits and word is 10 digits, restricted to refer to 3 variables (mem locations), so commands of the form
operator operand1 operand2 dest. +1 means ADD
+1 010 150 200

Orthogonality Principle: they use + and - sign to denote reverse operations. +/-, */ divide, square/squareroot. Two orthogonal, or indep mechanisms: the number = operation, and sign=direct or reverse. two primitives combine in various ways. corollary of regularity principle.
Less to remember.

But orthogonality can go to far.

Similarly, = vs !=, and >= vs <. Need not reverse since just reverse order of operands. no need for positive, negative, zero tests. Can adopt convention that 000 contains 0. Moving: could be implemented via adding to 0, but adding floats is slow. (So do check if adding to 0 and make that a move op!) Chose 0 so that ops have easy to remember vals: move, add, multiply, square. regularity principle. Indexing and loops: x[i] -> z; x -> y[i]
+6 xxx iii zzz
-6 xxx yyy iii

Abstract out code common to all loops. Abstraction principle, corollary of Automation principle. eliminates possible source of error. "Avoid requiring something be stated more than once, factor out recurring pattern."
+7 iii nnn ddd
i = index; n=upper bound; d=location of beginning of loop. this is a do loop.
+8: read. -8 print
+9: stop

memory cards read in program as well as initial data, separated by 10 9's.
pg 20 has sample program.

Construct an interpreter for pseudocode.
read-execute cycle: called fetch-execute cycle before.

decoding by extracting their parts. similar to lexical parsing.
dest := abs(instruction) mod 1000

then, based on sign and instruction:

if sign is + then
do case op of:
0: move;
1: add;
...
if sign is - then
do case op of
0: // op if have negative 0
1: subtract;

etc.

Computational instructions:
multiply:
data[dest] := floating_product(Data[operand1], Data[operand2])

Test equality:
if floating equality (Data[operand1], Data[operand2]) then IP:=dest

Debugging:
Add to fetch-execute cycle a print of IP and the contents at IP

Statement Labels
As mentioned above, inserts into code shifts everything over.
Labeling Principle: Do not require the user to know absolute pos of item in a list, but rather associate labels with any pos that must be referenced elsewhere.

-7 0LL 000 000

this binds a symbolic label to absolute position.

Thus, equality test becomes
+4 xxx yyy 0LL

but then end up scanning program for the labels. can make label table instead.

By extension, symbol table for vars. In the data section:
+0 sss nnn 000
+d ddd ddd ddd

+0 666 150 000
+0 000 000 001
inits a 150 num array to 1s

This binds the symbol to the location at load time. And loader takes care of storage allocation, instead of having to initialize them all via cards like in the past.

+6 666 111 222
V222 := V666[V111]

Can check type info in symbol table to avoid out of bound errors.
Security Principle:
No prog that violates the def of language or intended structure should escape detection.

Extend to symbolic pseudo-code. Symbols for vars, ops.

To allow symbolic, look up ops and vars on symbol table and replace them with actual values, and then interpret. (Alteratively, could have interp directly.) Thus, first translation then interpretation.

In sum, pseudocodes provide virtual computer, which was more regular and at a higher level. Increased security, allow debugging such as traces (and breakpoints).

FORTRAN
And the IBM 704.
The engineer-coder bottleneck. On other hand, speed of execution an issue.
The IBM 704 developed together with the language, thus tightly integrated. IBM 704 supported floats directly, so no need for simulation, thus speed issue against pseudocodes, thus compiled language.
Skip features in language which not quickly produced in hardware. Thus, limits on statements, on control structures, no recursion.

FORTRAN 0: A design document. spoke of efficiency of machine code with ease of pseudocode. "FORTRAN would elim coding errors, thus little syntax error checking."

Environment: Computers slow, small, unreliable. Used primarily of scientific applications. No existing efficient way to program computers. High cost of computers vs. programmers.

FORTRAN I: Implementation introducing realities. Features: IO formatting, var names 6 chars, user defined subroutines, weird mathematical IF statement. DO Loop.
From Wikipedia:
10 dim i
20 i = 0
30 i = i + 1
40 if i <> 10 then goto 90
50 if i = 10 then goto 70
60 goto 30
70 print "Program Completed."
80 end
90 print i & " squared = " & i * i
100 goto 30

Here is the same code written using control structures.

dim i
for i = 1 to 10
print i & " squared = " & square(i)
next
print "Program Completed."

function square(i)
square = i * i
end function


No > in charset, 704 had 3 way mathematic branch instruction, which replaced FORTRAN 0's IF.
IF (arithmetic expression) N1, N2, N3

DO Loop. Was posttest rather than pretest because of 704's architecture, as well as other control statements.
No data typing. I_, J_, K_, L_, M_ are ints, rest are floats. because of scientific domain.

largest part of time spent developing spend optimizing.

FORTRAN II:
besides other features, independent compilation of subs. otherwise would crash before compiling, limiting programs to 300 - 400 lines.

FORTRAN IV:
Explicit type declarations. Why is this important? reduce errors by type checking, do not introduce new vars erroneously, expand set of names.
Logical IF construct.
Pass subs as parameters to subs.
How would we do this in C++? Digression into complex declarations in general.

Homework Assignment to print above avg nums in array.

A web site about development of FORTRAN - what features were added.

FORTRAN 77: adds char string handling, logical loop control statements (rather than just for loops), IF with optional ELSE.

FORTRAN 90: major differences. names up to 31 chars. ! as comment. collection of matrix functions: e.g. MATMUL, DOTPRODUCT, TRANSPOSE, MAXVAL, MINVAL, PRODUCT, SUM. User defined data type called "derived type." Dynamic allocation of these types, and dynamic allocation and deallocation of arrays. Vs. prev. only had static data. Select case. Exit = break, cycle = continue of C; recursion. Modules (with public and private data and methods). specification of numeric precision, ";" as statement separator. Lines up to 132 chars in length.
While array processing: A= B*Sin(A)
deprecating features.

FORTRAN 95: Minor changes. Forall, initialization of pointers, and of "derived types."
Also, NULL function, automatic garbage collection.

FORTRAN 2003:
Interop with C, international - allow , or . for decimal point. Access to command line arguments, environmental vars, procedure pointers, IO enhancements, OOP support, parameterized derived types, etc.

Some linear algebra and graph algorithm introduction. Describe the algorithm.

Next Time: Names and Binding, in more than just FORTRAN.

No comments: