Programming Languages: 2006

Thursday, May 25, 2006

Review - A Work In Progress

Conceptual Summary - In Reverse Order
Here are some concepts to focus on in reviewing.

Exception Handling:
Dynamic vs static binding of exception handling. How is dynamic more flexible? How is static better?
Scoping and propagation.
How does Java improve over C++'s exception handling? How does the finally clause improve? How about the OO nature of Java? The handling of built in exceptions not explicitly raised by the user?
How can a lang without specific features for exception handling handle errors? what are drawbacks of this approach?

Concurrency:

What is cooperative, competitive synchronization?
Why do we need them?
How does a binary semaphore help with competitve synchronization?
How does a regular semaphore help with competitve synchronization?
what are the inner workings of wait and release for semaphores?
What is a monitor?
What is message passing? Rendezvous?
How can we use message passing to implement a binary semaphore? A monitor?
What mechanism is used for concurrency in Ada 83? Ada 95? Java?

Interactive Fiction:
Why is an object oriented language well suited for programming interactive fiction? Why is a natural language based programming language?

Implementing Subprograms:
FORTRAN 77 with no recursion vs ALGOL and on.

FORTRAN 77:
How handle activation records, access to nonlocals (but not globals either), transfer of control. Why does this present a problem in terms of recursion?

ALGOL:
with stacked activation records. how does this allow for recursion?
What does an activation record look like.
What is the difference between the dynamic and static link. Why is a static link chain O(n)? What is the idea behind a display, and how can this minimize lookup?
Where scoping is static, how does a pair (depth, offset) work to identify a nonlocal? Why can this not work for dynamic scoping, and what would we need to store instead?

LISP:

How would you write a LISP program to return N + 3?
To get the length of a list?
What is car? cdr? caadar? If I gave you a list, could you tell me caadar of that list?

Guarded Commands:
The od and fi constructs - how they work. How are guarded commands useful in terms of ensuring program accuracy?

How are guarded commands useful in concurrency to acheiving nondeterminism?

Subprograms:

in, out, inout parameters.
what are: pass by value, pass by result, pass by value-result,
pass by reference?
positional parameters vs. keyword parameters. default parameters.

Iterative Statements
Difference between pretest and posttest,

Control Structures:
Nothing

PROLOG:
Facts, goals, and rules.
the Terach example. How would you write a rule for ancestor?

Data types:
arrays: static, fixed-stack dynamic, stack-dynamic, heap-dynamic.

Functional Programming Languages:
What is a higher order function? What is mapcar?
In Haskell, what is an infinite list? Lazy evaluation.

Tuesday, May 23, 2006

Slides for Functional Programming Langs

http://www.csee.umbc.edu/331/fall00/nicholas/lectures/fininfp.ppt

Thursday, May 18, 2006

Exception Handling

http://www.cs.ndsu.nodak.edu/~slator/html/CS372/sebesta-pdf/13.pdf

Monday, May 15, 2006

Read this White Paper

by Graham Nelson on Inform and programming in English
http://www.inform-fiction.org/I7Downloads/Documents/WhitePaper.pdf

Read this on Java synchronization

http://www.ccs.neu.edu/home/kenb/g712/res/tutorial.html

Tuesday, May 09, 2006

GNAT for Mac

Check it out here:

http://www.macada.org/

Instructions on Setting Up Ada for DOS

OK, I got a simple compilation working.
Go to this ftp site (open in Firefox or IE)
ftp://ftp.cs.kuleuven.ac.be/pub/Ada-Belgium/mirrors/gnu-ada/3.15p/winnt/

and download the gnat-3.15p-nt.exe file. running this file will install the ada compiler.
we do not really need an IDE (integrated development environment). we can start with just command line.
it should install to c:\gnat
inside c:\gnat\bin, there are a bunch of executables, but all are runnable from the one program gnat
so type gnat and you will see the options.
copy to that directory ..\examples\hello.ada, so that you need not worry about other directories.
hello.ada is the source program.

Then:
gnat compile hello.adb
(this will create a file hello.ali)
gnat bind -x hello.ali
(then we need to link)
gnat link hello.ali
(and finally will run the program)
hello

the output will be:
Hello World. Welcome to GNAT

similar steps are available for gnat on other platforms

Thursday, May 04, 2006

Further Ada Assignment

Work through the examples on this web site to familiarize yourself with concurrency in Ada.

http://linux.disco.unimib.it/~ferretti/lp/ada_task.html

Also read pgs 292 - 300 in the book, which covers much related info.

Assignment

Download an Ada compiler, install it, and write a hello world program.
We are going to write a concurrent program in Ada.

Lecture Notes for Concurrency

Available as a pdf here.

Definition of Race hazard (condition)

at wikipedia:

A race hazard (or race condition) is a flaw in a system or process where the output exhibits unexpected critical dependence on the relative timing of events. The term originates with the idea of two signals racing each other to influence the output first.

Race hazards occur in poorly-designed electronics systems, especially logic circuits, but they may also arise in computer software.

...

Computing

Race hazards may arise in software, especially when communicating between separate processes or threads of execution.

Here is a simple example:

Let us assume that two threads T1 and T2 each want to increment the value of a global integer by one.

If the two threads run simultaneously without locking or synchronization, the outcome of the operation could be wrong

Integer i = 0;
T1 reads the value of i into a register : 0
T2 reads the value of i into a register : 0
T1 increments the value of i : 0 + 1 = 1
T2 increments the value of i : 0 + 1 = 1

The final value of i is 1 instead of the expected result of 2.

For another example, consider the following two tasks, in pseudocode:

global integer A = 0;

task Received() 
{
   A = A + 1;
   print "RX";
}

task Timeout() // Print only the even numbers
{
   if (A is divisible by 2)
   {
       print A;
   }
}

task Received is activated whenever an interrupt is received from the serial controller, and increments the value of A.

task Timeout occurs every second. If A is divisible by 2, it prints A. Output would look something like:

0
0
0
RX
RX
2
RX
RX
4
4

Now consider this chain of events, which might occur next:

timeout occurs, activating task Timeout
task Timeout evaluates A and finds it is divisible by 2, so elects to execute the "print A" next.
data is received on the serial port, causing an interrupt and a switch to task Received
task Received runs to completion, incrementing A and printing "RX"
control returns to task Timeout
task timeout executes print A, using the current value of A, which is 5.

Mutexes are used to address this problem in concurrent programming.

In filesystems, File locking provides a commonly-used solution. A more cumbersome remedy involves reorganizing the system in such a way that one unique process (running a daemon or the like) has exclusive access to the file, and all other processes that need to access the data in that file do so only via interprocess communication with that one process (which of course requires synchronization at the process level).

In networking, consider a distributed chat network like IRC, where a user acquires channel-operator privileges in any channel he starts. If two users on different servers, on different ends of the same network, try to start the same-named channel at the same time, each user's respective server will grant channel-operator privileges to each user, since neither server will yet have received the other server's signal that it has allocated that channel. (Note that this problem has been largely solved by various IRC server implementations.)

In this case of a race hazard, the concept of the "shared resource" covers the state of the network (what channels exist, as well as what users started them and therefore have what privileges), which each server can freely change as long as it signals the other servers on the network about the changes so that they can update their conception of the state of the network. However, the latency across the network makes possible the kind of race condition described. In this case, heading off race conditions by imposing a form of control over access to the shared resource—say, appointing one server to control who holds what privileges—would mean turning the distributed network into a centralized one (at least for that one part of the network operation). Where users find such a solution unacceptable, a pragmatic solution can have the system 1) recognize when a race hazard has occurred; and 2) repair the ill effects.

A race condition exemplifies an anti-pattern.

A particularly poignant example of a race condition was one of the problems that plagued the Therac-25 (a Life-critical system) accidents. Another example is the Energy Management System used by Ohio-based FirstEnergy Corp., that had a race condition in the alarm subsystem; when three sagging power lines were tripped simultaneously, the condition prevented alerts being raised to the monitoring technicians. This software flaw eventually led to the North American Blackout of 2003.

Tuesday, May 02, 2006

Concurrency

Concurrency

program level
subprogram level
statement level
instruction level

first computers: general, i, o
early 60's: multiple processors - program level
Now: multi-processor: single instruction multiple data - eg. on vectors
mult instruc mult data: each general processor has own data. can be a distributed system

physical vs. logical concurrency
thread of control
coroutines - quasi-concurrent, because single thread of control

physical, logical concurrency - have, or can imagine have, different threads of control

task: a unit that can be in concurrent execution with other such units
vs. subprogram:
can be implicitly started rather than explicitly called
when task invoked, need not wait for return from that task (asynchronous)
on return, may not necessarily return control to the caller

Communication w other tasks
shared nonlocal variables
message passing
parameters
if no such communication, called disjoint
often used for simulations, so to coordinate need communication

Synchronization:
mechanism controlling order of task execution
coop vs competition.
cooperative when A must wait for B, in order for task A to continue sensibly
Competitive: both require same resource which cannot be used simultaneously
if A needs to access shared location x, which B is using, B must wait.

famous cooperation: producer/consumer problem. from OS
one unit produces, the other consumes.place in, take from a buffer.
cannot take from buffer if empty.

competitive:
int total = 3;
A: add 1 to total.
B: mult total by 2

each must fetch, modify
what can happen without syncronization?
3 values can result. 8 (correct),
if A, B fetch before other writes back, --- if A writes first, Total = 6. If B first, total = 4

mutual exclusion. each must request and then release the resource.

Also, if more tasks than processors, need a scheduler, which delays execution.
Can time slice. let process execute for X amount of time. OK if all processes have same priority. complications,
as e.g. they wait for i/o, delays for syncronization.

tasks can be in several states.
1. New - task created, not begun
2. runnable, ready - ready to run but not currently running. given processor time, or was running and was blocked
3. running - currently executing
4. blocked - was running, but interrupted by an event. i/o most common, since most slow.
5. dead - no longer actuve. execution finished or was killed.

different algorithms for choosing which ready task to make runnable when currently running one is blocked. e.g. priority queue.

characteristic: liveness - program continues ti execute, leading to its completion.
deadlike can break liveness. how so?
A and B need X and Y to continue, but A has X, B has Y. deadlock avoidance - in lang and program design.

Language support for concurrency: PL/I, Ada 95, Java.

to be continued

Tuesday, April 25, 2006

Common Blocks in Fortran 77

http://www.nationalfacility.apac.edu.au/training/FortranAdvanced/slides/slides.017.html

Common blocks are a Fortran 77 construct, providing areas of shared storage between different subroutines. eg.

program main
   integer :: a, b
   real :: c
   common /d/ a, b, c
 ...
end program main

subroutine foo()
   integer :: a, b
   real :: c
   common /d/ a, b, c
 ...
end subroutine foo

the common storage is called d and contains the integers a, b and the real c. Setting these variables in one routine changes it in all. Common blocks are now replaced by modules in Fortran 90.

Implementing Subprograms

Here:

Thursday, April 06, 2006

A Lisp link

Here:

Tuesday, April 04, 2006

Lisp Interepters

Here

I'm going to try out UfaSoft's one.

Monday, April 03, 2006

LISP - reading assignment and homework

Read chapter 9 in the book until page 318, or at least read pgs 317-318, and do exercise 9-2 questions 1 and 3.

Thursday, March 30, 2006

Guarded Commands

See here.

Subprograms

Access the pdf here.

Tuesday, March 28, 2006

Iterative Statements

FORTRAN I: do was posttest.
DO label var = init, terminal, stepsize
stepsize defaults to 1
unsigned int constants or simple positive int variables

FORTRAN 90: DO statement is pretest. fortran 77: allows loop var to be int, real, double.
loops can be expressions, with neg and pos values. eval at beginning of DO to arrive at an iteration count. It is this value, not the loop parameters, that control the loop. so one could change them.

no multiple entry, yes multiple exits.

ALGOL 60: a case in flexibility
[for_stmt] -) for var := [list_element] {, [list_element]} do [statement]
[list_element] -) [expression]
| [expression] step [expression] until [expression]
| [expression] while [Boolean_expr]

See this website: http://logos.cs.uic.edu/476/notes/overheads/ch7controlstructures.pdf

Pascal For Statement
for var = init (to | downto) final do statement

it grows or shrinks in steps of 1. loop var cannot be changed in the body.
after loop, loop var undefined. loop var is an ordinal, with scope of the loop.

init and final checked at the start, and not after, and so may be changed in the loop.
why do this? efficiency, reliability?

Ada For statement
for variable in [reverse] discrete_range loop
...
end loop

and note that loop var has loop scope, will mask variables with wider scope.

C, C++:
We know this.
for (int i = 9; i < style="font-weight: bold;">
any of the elements in the for may be missing.
scope until end of function.

Logically Controlled Loops (rather than counting loops)
posttest, pretest?
should it be special form of a counting loop?

examples of pretest, posttest in C++, Java.
do while vs. while.

how would we implement them (=operational semantics)

FORTRAN 90: has neither.
Ada: pretest, no posttest. why?
Perl: while and until.
how so? see here: http://vergil.chemistry.gatech.edu/resources/programming/perl-tutorial/control.html

  while ($food < $stomach_ache) {    $plop++;    $fizz--;    $oh_what_a_relief_it_is *= 5;  }

Like the unless statement above, it is easier to say "until something is true" (Schwartz 61). Thus, Perl provides the until statement:

  until ($x > 6) {
  print "$x is the number!\n";
  $x++;
}

Like C, Perl lets you do do, while loops, but Perl also adds do, until loops:

  # do while example
do {
  $junk++;
} while ($junk < aok =" @array_hooray[$x];">

and of course the for loop.
might as well talk about Perl conditionals here as well:

if and else statements are used as conditionals as in any other language. For example:

  if ($cow < $sheep) {    print "baaah....\n";  } else {    print "moo....\n";  }

unless statement is specific to Perl, in which you want to always just evaluate the false part instead of doing something when not true. For example:

  unless ($age <>

Pascal has a repeat...until statement, which is the reverse of C's do...while, same as Perl's do...until.

Think about it - when we have break statements, they are the reverse of a standard while conditional. repeat...until maintains that.

Post-tests not freq used. dangerous because loop always executed at least once, and programmer might forget this. put after loop body helps make this a bit clearer.

to be continued...

Thursday, March 23, 2006

Control Structures

Control Statement = control statement + statement(s) it controls
control execution path
without, all have are assignments and evaluating expressions.
two types: selection (among multiple execution paths) and repeated execution.
basically, if and while. but can add more = greater writability, but programmers only learn subset of lang.

how many statements do you allow in the control statement?
one or many. blocks. otherwise, restricted to single statement, or use of a goto. control structures can replace goto.

another q: can we enter into the middle of control structures (multiple entries) - langs differ. in terms of whether can exit multiply, not such a distinction since generally safe to do that - but there is a design philosophy.

Selection
generally 2 way vs. n way selectors (1 way "degenerate" form of the two way. fortran's 3 way selector).

design issue: what type of expression controls the selector? can we select more than a single statement? do we handle nested selection?

Two way: Logical If:
FORTRAN IV
IF (BOOL expression) statement
simple, inflexible. how do you get multiple statments? with goto circumventing.

IF (FLAG .NE. 1) GO TO 20
I = 1
J = 1
20 CONTINUE

but points off for lack of readability.

How about multiple entry? Yes. How?

ALGOL 60 and on (=Java, FORTRAN 90): two-way selector:
if (bool expr) then
statement
else
statement

only one or the other is executed.

and either of these can be compound, in which case it is the then clause and else clause, rather than statement.

how do we generate the degenerate 1-way if? how do we generate the two-way?

Nesting Selectors
Ambiguity.
if (a == 1)
if (b == 2)
result = 5;
else
result = 1;

Python: semantic interpretation based on indentation. other languages, no.

In Java, most other imperative langs - match most recent, so must indent the else.
by a rule, rather than in the syntax. but will the programmer know?

ALGOL 60 - by Syntax rather than by fiat (rule).
no if nested in another if. must nest in a begin/end clause.

How would we then write the code to get each interp? (Use {} as a shorthand.) Can get either effect in Java, but to match first if rather than second, must use {}.

Perl: then and else clauses must always be compound.

in C++, Java, the introduction of the else clause, or else by compounds, {}, marked the end of the statement or compound, and where this was a single statement with no immediate else, was ambiguity. And difference between marker of end of clause, if it is a compound of single statement.

In Ada, VB6, have end if;
how does this help?

Because "end if" closes the nested if, shows which one it matches. how do we write each in Ada?

Multiple Selection
generalization of 2-way, and can implement 2-way as multiple.

design issues
do we allow compound statements, type of expression (2 way selector needed only 2 vals), can we select more than one statement/compound? what if does not eval to one of specified vals?

FORTRAN I - the three-way selector: IF(X) 10, 20, 30

and here can go really anywhere - need not even immediately follow if, or each other. user responsible for gotos, and can omit with compiler not detecting.
Also in FORTRAN I
computed:
GO TO (label1, label2, labeln) expression

assigned:
GO TO var_or_const

C.A.R. Hoare, in ALGOL-W (1966)

case int_expression of
begin
statement1;
...
statementn;
end

and could be compounds.

Pascal similar, except
statement_i becomes:
constant_list_i: statement_i;

yet we cannot say goto caselabel in Pascal.

later Pascals: an else clause, = to default
no fallthru.

in C#:

      switch(n) 
{   
   case 1:
      cost += 25;
      break;            
   case 2:      
      cost += 25;
      goto case 1;     
   case 3:      
      cost += 50;
      goto case 1;   
   default:      
      Console.WriteLine("Invalid selection. Please select 1, 2, or 3.");      
      break;
 }

and they will not allow you to omit a break of a goto.

does not disallow multiple entry. and has fallthru. unless use break, which basically is a restricted goto.
but what if forget the break and get unwanted fallthru?

reliability vs. flexibility.

In Ada, allow ranges: 10..15, as well as 5 | 6 | 19, and have an others clause

this is good for basing on single value at the top. but sometimes, based on various boolean conditions. must use nested ifs. poor readability. some langs (FORTRAN 90, Ada, Perl) give way around this.

elsif vs a bunch of nestings and endifs

Iterative statements
execute statment 0, 1, more times
all prog langs.
sometimes by iterative constructs, sometimes by recursion.

at first, directly related to arrays. why? that was the problem domain.

design questions: how iteration controlled, where in the loop is the control mechanism?

body, pretest, posttest.

counter-controlled loops: inital, terminal values of loop var, + step size.
logically controlled loops more general, but not necessarily more commonly used. could be more complex to implement that counter-controlled.
often supported by machine architecture. sometimes lang features outlive machine architecture (Fortran's 3 way if)

more design qs:
type, scope of loop var. what value does it have at termination, can we change loop vars in the loop (what are effects of this?), are loop params eval once, or for every iteration (let us say the terminal condition changes)

FORTRAN I: do was posttest.
DO label var = init, terminal, stepsize
stepsize defaults to 1
unsigned int constants or simple positive int variables

FORTRAN 90: DO statement is pretest. fortran 77: allows loop var to be int, real, double

Thursday, March 16, 2006

Expressions, Assignment Statements

Overloaded Operators:
operator used for more than one purpose. writability vs. reliability. is this so? depends on how use it.

op overloading in C++. give me an example (<<, >>, +).

by language def, by user.

APL and SNOBOL have binary, unary versions of almost all operators, leads to confusion.

C++: & operator, as address of and bitwise and.
unrelated functions. similarly, * operator.

general problem of reusing symbols. comments, references, etc.
a /*p

unary, binary -.

"distinct symbols increase readability." is this so? + for string appending.
when used correctly, aids in readability.
add(matmul(A, B), matmul(C, D)
vs.
A * B + C * D
on the other hand, need know types of both operands to know meaning of operator.

example from Pascal, vs. C++: find sum of bunch of ints.
avg := sum / count;
avg = sum / count;

in Pascal, / specifies float result of division between two ints. VB as well, / vs
Type conversion
Narrowing vs. Widening conversion. widening generally safer.

mixed mode expressions = op has operands of different types. coercion to appropriate type where operator not defined on that type. also, there are explicit conversions via typecasting.

Diff langs disagree on whether to do coercion.

int a, b, c; float d;
a = b * d; // as an error for b * c

case in flexibility vs. efficiency:
PL/I - string can be combined with an int. string is scanned for numeric value, and if a decimal point, assumed to be a float, and coerce other operand to float. Coercion, and choice of coercion, at run time.

reduce flexibility in favor of reliability
Ada:
cannot mix int and float in arithmetic expression except for **, exponent.

most others, no such restrictions.

Java (byte, short), C++ (short, char). operators such as + first convert to int, and then convert back on assignment, even though both operands of the same type.

Explicit type conversion
Ada:
AVG := FLOAT(SUM) / FLOAT(COUNT)

C:
(int) angle
also C++'s function-like casting

errors can result from coercions. divide by zero when divide by a float.

Relational, Boolean Expressions
for the most part, similar. sometimes slight differences. FORTRAN used .EQ. and .NE. because symbols not on punch cards.
Ada uses = for comparison. /= for not equal.

relational given lower precedence than arithmetic, so that can say
a + b > c + d

Boolean expressions:
In Ada:
**, abs, not
*, /, mod, rem
unary +, -
etc.

lowest:
and, or, xor, and then, or else

these ops non-associative. A > B and A < k =" 0"> b > c.
Python handles this interestingly.

x <> z

This is interpreted by Python as::

if x <> z:
return y > z # 1 for plain comparisons
else:
return y > z # 0 for plain comparisons
else:
return x <>

Note that this requires testing the truth value of the result of
comparisons, with potential "shortcutting" of the right-side
comparison testings. In other words, the truth-value of the
result of a comparison determines the result of a chained
operation.

Which brings us to Short-circuiting.
When know answer already, can ignore rest.
(13 * a) * (b / 13 - 1)
what if a is 0? But not easily detected. By boolean, already considering truth value.
(a >= 0) and (b < style="font-weight: bold;">speed.
can use this shortcircuiting to your advantage.
if (index in range and a[index] > 9)
so as to prevent an error.

on the other hand, sometimes will not expect that it is shortcircuiting, so issue of reliability. calling a function, incrementing a variable.

how decide: FORTRAN 77: up to lang implementor, if part is a func which assigns to a nonlocal variable, that nonlocal becomes "undefined"
difficulty in detecting these side-effects of functions.

In Ada: and then, or else does short-circuiting.

In C++: &&, || do short-circuit. If wish to avoid short-circuit, force 1 and 0 results and use bitwise operators.

Assignment

A = B

what about A = B = C?
depends. in C, yes. In PL/I, Basic, Visual Basic, = also does comparison, so would assign truth value of B = C to A.

ALGOL 60: := for assignment.

Mult targets:
PL/I: Sum, Total = 0.
other langs, e.g., C, do not.

Conditional targets
?: to give an lval rather than an rval, in C++.

Compound Assignment
a = a + b
a += b

C, C++, Java.
interesting feature of C# that defines these for you.

Unary Assignment
in C, C++, Java, etc. ++a.
when apply two unary ops, one on left side, one on right, works right to left:
- (count ++) even without parens

Assignment has an expression value
in C.
while ((ch = getchar() != EOF)

Mixed mode assignment
Do the source, target need be same type? FORTRAN, C, C++: coercion.
Pascal: depends - int to real, but not real to int.
Ada, Modula2 - nope.

Java: differs from C++ - only if widening results. increases reliability.

Tuesday, March 07, 2006

Data Types - III

Associative Arrays

Perl:
%ages = ("Josh" => 10, "John" = 32);
$ages{"Josh"} = 29;
// and assignment will create an element
delete $ages{"John"};
// or the entire hash:
@salaries = ();

Here is a Perl tutorial on hashes. Read through it. keys and values operator/collections.

    for my $key ( keys %hash ) {
my $value = $hash{$key};
print "$key => $value\n";
}

exists keyword, or else will create the element.
if (exists $ages{"Peter"} ) ...

Possible Implementation: Read this website about implementation of Perl hashes. Linked Lists - but then O(N). Could turn each string into unique index, and use array, but then very sparse. Can use a hash function. Handle collisions how? Better hash function, or else chaining. (Alternatively, red-black trees.) Read article for more info.

C++: Map, multimap. C#: Dictionary, etc.

Record:
Heterogeneous data structure.
First, COBOL. (Not in early Fortrans.) C's struct. C++'s class. Java has class, not struct. C# has both, but with different meanings (struct is treated as a primitive, unboxed).

COBOL:


01 EMPLOYEE-RECORD.
02   EMPLOYEE-NAME.
05   FIRST    PICTURE IS X(20).
05   MIDDLE   PICTURE IS X(10).
05   LAST     PICTURE IS X(20).
02   HOURLY-RATE   PICTURE IS 99V99.

Pascal, Ada, use record nesting rather than relative numbers.


EMPLOYEE-RECORD :
record
EMPLOYEE-NAME :
  record
    FIRST    : STRING(1..20);
    MIDDLE   : STRING(1..10);
    LAST     : STRING(1..20);.
  end record;
HOURLY-RATE   : FLOAT;
end record;

Referring to subelements:
COBOL: MIDDLE OF EMPLOYEE-NAME OF EMPLOYEE-RECORD
Others langs: dot, or -> for pointers to structs. Fortran: %

elliptical names - as long as unambiguous, can omit middle stuff.

with in Pascal, VB6. Can do this in C++ using references.

Record Ops
Assignment. types must be same. Ada: comparison for =, <>, initialization with literals.

COBOL: MOVE CORRESPONDING
if subfields have same name, copy it.

Implementation. adjacent memory, and field names associated with offset. descriptor has, for each field: Name, type, offset.

Unions
type that stores diff type values at diff times.
Q: Should we provide type-checking for current type, dynamically? Should unions be embedded in records?

free unions give programmer complete control, no type checking.
Pascal's unions, Ada's unions: have a tag or discriminant that tells current type.
more about this, but can read up on it. Ada adds constrained union such that in declaration, can restrict to one specific type, and thus do static type-checking.

Eval: flexibility vs. reliability.

Implementation:
Use same address for all members of union. enough storage for the largest.

Sets
C++ has but not exactly the same.
Until then, Pascal alone among imperative langs. max size implementation dependent, often < style="font-weight: bold;">type colors = {red, green, blue};
colorset = set of colors.
var set1, set2 : colorset;

set1 := [red, blue];

if ch in ['a', 'e', 'i', 'o', 'u'] ...

Check out ISETLW.

Eval
Ada, based on Pascal, but did not provide sets. rather, added membership op on enums.

other langs, use arrays and implement ops. unweildy, less efficient.

Impl
stored as bit strings. 0 if present, 1 if not present.

(C++ has a bitset.)

Pointers
We know about pointers, dereferencing, assignment to local, off heap, etc.

some problems with pointers:
dangling pointers - points to deallocated heap variable. and what if that memory claimed by s/t else?
lost heap-dynamic variable - lost in terms of no longer accessible. memory leak, vs. garbage collection.

Pascal: create new vars with new and elim with dispose, which sets all pointers pointing at this object to nil. implementation dependent.

Ada: access types. auto dealloc when ptr goes out of scope. elims problems caused by reliance on explicit dealloc. though gives deliberately unweildy alternate method.

C++, Java: ref types.

Handling dangling ptrs:
tombstones. ptrs only point to tombstones. tombstone points to actual memory. when deallocate, tombstone remains and is set to nil. costly in time and space. storage never reclaimed. no prog lang uses.

garbage collection vs. reference counters: lazy, eager approach.

reference counters - keep track how many things pointing to this, when ref count reaches zero, can be reclaimed.

garbage collection - when run out of space, or can explicitly invoke. how do we know when something not pointed to? keep track of each ptr. named ptrs, and following named ptrs as far as they go.

Next: Expressions, Assignment Statements
unary, binary, ternary ops.
usually Infix.

a + b * c

right to left, left to right, or operator precedence.

associativity to left, to right: a - b + c - d

use of parentheses.

side effects -what else happens as a result.
a + fun(a)
functional side-effects
FORTRAN only allows (=legal) if function does not change value. but s/t hard to determine, based on pointers, globals.

operator side-effects
b = a++ + a++;
reference point, e.g. assigning twice between two reference points.

to be continued...

More Prolog

Here are some Prolog programs (written in Strawberry Prolog) and their output:

program
father(avraham, yitzchak).
father(yitzchak, yaakov).

ancestor(X, Y) :- father(X,Y).
ancestor(X, Y) :- father(X,Z), ancestor(Z, Y).

?- father(yitzchak, X), write(X), write("\n").

Output:
Compiling the file:
c:\father
0 errors, 0 warnings.

yaakov
Yes.

program
add1(X, Y) :- Y is X + 1.

?- add1(4, 5).

output
Yes.

program
add1(X, Y) :- Y is X + 1.

?- add1(4, 6).

output
No.

program
add1(X, Y) :- Y is X + 1.

?- add1(4, X), write(X), write("\n").

output
5
Yes.

program
add(X, Y, Z) :- Z is X + Y.

?- add(4, 5, X), write(X), write("\n").

output
9
Yes.

program
add(X, Y, Z) :- Z is X + Y.

?- add(4, "hello", X), write(X), write("\n").

output
4hello
Yes.

program
add(X, Y, Z) :- Z is X + Y.

?- add(4, Y, X), write(X), write("\n").

output
Compiling the file:
c:\father
0 errors, 0 warnings.

Runtime Error at step 3:
The argument 2 of "+"
is "var" but has to be "str" or "int" or "float".

No.

program
add(X, Y, Z) :- Z is X + Y.

?- add(4, 5, 9), write(X), write("\n").

output
_9
Yes.

program
father(avraham, yitzchak).
father(yitzchak, yaakov).

ancestor(X, Y) :- father(X,Y).
ancestor(X, Y) :- father(X,Z), ancestor(Z, Y).

?- ancestor(avraham, X), write(X), write("\n").

output (selecting Run, then Next Answer)
Compiling the file:
C:\father
0 errors, 0 warnings.

yitzchak
Yes.
yaakov
Yes.

program
abs(X, Y) :- X = Y, X >= 0.

?- abs(4, T), write(T), nl.

output
4
Yes.

program
abs(X, Y) :- X = Y, X >= 0.

?- abs(-4, T), write(T), nl.

output
No.

program
abs(X, Y) :- Y = X, X >= 0.
abs(X, Y) :- Y = 0 - X.

?- abs(-4, T), write(T), nl.

output
4
Yes.

program:
append([], L, L).
append([H | T] , L2, [H | L3] ) :- append(T, L2, L3).

1) empty list + L = L
2) recursion step
appending a list [H | T] to any list L2 produces the list [H | L3], on condition that L3 is formed by appending T to L2.

try out an example.

Wednesday, March 01, 2006

Prolog Compilers, Interpreters

There is one that actually runs in your browser window if you have a Java-enabled browser. Click here, scroll down to "the distribution," and click on the button labelled "Start W-Prolog" to start the prolog interpreter Applet.

Copy the text of the facts and rules from the previous post, and run a query to see if terach is the father of yishmael.

There are some free compilers and interpreters here.

Prolog - I

Facts, rules, and goals.

father(terach, avraham).
father(avraham, yitzchak).
father(avraham, yishmael).

ancestor(X, Y) :- father(X, Y).
ancestor(X, Y) :- father(X, Z), ancestor(Z, Y).

try this out. how would you write a brother rule?

Tuesday, February 28, 2006

Data Types 2

Array = homogenous collection
q: what types for subscripts, do we rangecheck, are ranges bound, array alloc when, how many subscripts (1,2), initialization, are slices allowed?
FORTRAN:
SUM = SUM + B(I)
same () as functions so needs keep track. why? no other chars avail. Ada as well - because both are mappings of parameters to values. Pascal, C++, Java = []

obvious point: element type vs. subscript type. most often subrange of int. Pascal, Ada: Bool, char, enum.

subrange checking: Pascal, Ada, Java. Not C++, FORTRAN - out of bound errors. speed vs. reliability.

Binding of subscript type statically bound (usually) but value range can s/t vary dynamically.

lower bound: 0, 1, set by user.

4 cat, based on binding to subscript range and binding to storage.
static array - both ranges and binding to storage at compile time. no dynamic alloc dealloc. FORTRAN 77.
fixed stack-dynamic - subscripts bound statically, storage bound off the stack - C, Pascal.
stack-dynamic - same as above but subscript bound dynamically as well. but then remain fixed for lifetime of var. Ada:
GET (LIST_LEN)
declare
LIST : array (1..LIST_LEN) of INTEGER
begin
...
end
heap-dynamic - same, but can change many times during lifetime. FORTRAN 90:
INTEGER, ALLOCATABLE, ARRAY (:, :) :: MAT
ALLOCATE (MAT(10, NUMBER_OF_COLUMNS)
DEALLOCATE(MAT)
same idea in C, C++ using pointers.

Number of Subscripts
saw in fortran assignment that could use mult indices. original fortran restricted to 3. The ZERO-ONE-INFINITE principle. Let programmer bear the cost if he wants. FORTRAN IV it became 7. Modern - unlimited.

Array init: FORTRAN has it in the form of a DATA statement:
DATA MYVAR / 8, 9, 10/
in C, init of arrays, of strings, of arrays of strings. In Java.
Ada:
LIST : array (1..5) of INTEGER := (1, 4, 6, 8, 10)
LIST : array (1..5) of INTEGER := (1 => 3, 4 => 9, others =>0)

array ops

to be continued...

Thursday, February 23, 2006

Class Notes - Data Types - 1

Lifetime
Not same as scope. Example – subprogram in C where exists but hidden. Static in C.

Symbolic constants
Why? Readability. Reliability. Ease of changing the program. We saw in the program where can be useful. Two types – lvalues and rvalues. Const vs. #define.
Pascal - simple, Modula-2: complex expressions. manifest constants –simple due to static binding of values.
C++, Java, Ada: constants based on variables – bound at runtime

Initializing Variables
We know C++. int x = 6; FORTRAN has DATA statement
REAL PI
INTEGER SUM
DATA SUM /0/, PI /3.14/
Some langs do not have init statements. Just do assignment. Pascal.
When is this initialization done? Static vs. auto in C.

ALGOL 60
First, ALGOL 58, designed by international committee. 3 goals: close to standard mathematical notation; use to describe computing processes in publications; able to translate to machine code. JOVIAL, MAD, NELIAC. In general, taken to be guidelines for developing langs. IBM interested in promoting FORTRAN.
ALGOL 60, Peter Naur described using BNF, which while elegant, was not well understood back then.
Contributions: many langs based on this. Example: pascal, ALGOL 68, Ada, C++, PL/I, Java. Was only accepted way of describing algorithms. Influenced machine architecture to implement the concepts. Advances: block structure; pass by value and pass by name. recursive procs. Stack dynamic arrays. (now they have them in C, I think). But lacked standard IO = difficulty porting. People already using FORTRAN.

I posted before an initial ALGOL reading – 95 – 121. A summary of material covered there – Algol 58, use of BNF (with further discussion in next chapter), Algol 60. Blocks. Though ALGOL has static scoping, a discussion of dynamic scoping. Pg 118. generalized arrays by allowing lower bounds, dynamic arrays. Pg 119 – strong typing. Look thru the code.

Now 121-140. A lot of what we discussed. Pass by name, value. Recursive procedures. Introduction of for loop with step – very general. The switch.

months
For days := 31, 28, 31, 30, 31, 31, 31, 30, 31, 30 do
For days := 31, if (mod(year, 4) = 0 then 29, else 28, 31, 30, 31, 31, 31, 30, 31, 30

Homework: Print Oscillating sum
Homework: Set up Prolog

Data types:
Early on, model linked list, binary tree, with arrays. Think. How would do it?
COBOL – allowed specification of precision, structured data type for records. PL/I – extended precision to ints and floats. ALGOL 68: primitives and structure-defining ops. (examples of ops – in C: {}, (), *, []) User defined data types – able to associate name w/ data type. Abstract data types – use of a type differs from internal representation and ops available on type
Homogeneous (array) vs. heterogenous (record).

Primitives – not based on other types. Ints reflect storage in hardware. Others – in software.

Numerics – integers: signed (one’s complement, two’s complement), unsigned, short, int, long. Floats: 1 sign bit, 8 bit exponent, 23 bit fraction = 32. Double: 1/11/52 = 64.
Decimal: use byte or nibble per digit. Decimal point does not float. Boolean types. As bit, byte, int. Char type: ASCII, Unicode (2 byte).

String Types: might be built in or built upon array of char. Do we allow array ops on it. Should they be static or dynamic length?
Not primitive: C++, Pascal, Ada.
In Ada: array of char, can reference a single char. Various ops built in.
substr: name1(2:4)
& for concatenation. As in VB6.
C++: ignoring the stl. After all, literal strings are arrays of char, null terminated. Built in library functions – strcpy, strcat, etc.
FORTRAN 70+: they are primitives. Have assignment, relational ops, concatenation, substring.
Java: primitive type. String class: holds constant strings. StringBuffer. Why this is really annoying. String comparisons in Java.
SNOBOL4, Perl, Javascript: pattern matching string op. many other langs: as functions.
In SNOBOL4, patterns can be assigned to variables.
LETTER = ‘abcdefghijklmnopqrstuvwxyz’
WORDPATTERN = BREAK(LETTER) SPAN(LETTER) . WORD
Means skip until find a letter, then span letters, then assign (the .) to the variable WORD
TEXT WORDPATTERN
Perl: Based on regular expressions. /[A-Za-z][ [A-Za-z]+/ /\d+\. ?\d* | \.\d+/

static vs. dynamic string lengths
FORTRAN 90:
CHARACTER (LEN = 15) A, B
Always full, empty chars are blank.
C, C++: limited dynamic. Except for STL
SNOBOL4, JavaScript, Perl: dynamic
Talk about benefits, drawbacks of each scheme

Implementation
Static: at compile time: name of type, the length, the address
Limited dynamic: store capacity, current length. (though C++ needs this not)
No need for dynamic allocation.
Dynamic: as linked list, but have many pointers. Dense list. But what if space afterwards is claimed – must realloc and copy. We can use a mixture of the two approaches.

User-defined Ordinals.
enum. C, Ada
Design: In Pascal, a given enum val (e.g. January) not allowed to be used in more than a single enum, cannot be input or output, but can be assigned, compared, used in for loop.
Some languages allow overloading of these literals, but then need to specify type:
e.g. type letters is (‘A’, ‘B’, ‘C’, ‘D’ ….
type vowels is (‘A’, ‘E’, ‘I’, ‘O’, ‘U’)
for letter in ‘A’ ,,. ‘U’ loop
for letter in vowels(‘A’) ,,. Vowels(‘U’) loop

Ada’s BOOLEAN and CHARACTER are predefined enums

Ops: pred, successor, position. Pascal: pred(blue) is red. Ada, as attributes: Letter’Pred(‘B’) is ‘A’

Improves readability, reliability

Subrange types – continuous subsequence of an ordinal
Pascal:
type
uppercase = ‘A’ .. ‘Z’;
index = 1 .. 100;

In Ada, called subtypes. Assuming already defined an enum:
Subtype WEEKDAYS is DAYS range Mon .. Fri
Subtype INDEX is INTEGER range 1 .. 100

Ada allows assignment from parent to subtype unless it is not defined, so might assign Tuesday but not Sunday.
Recall that diff from Ada’s type, where cannot assign from parent to derived type.

Implementation: As parent types, but with range checking for each assignment.

Array Types

Tuesday, February 21, 2006

Syllabus - work in progress

Work in progress - to be refined, added to, etc.

Preliminaries

why study programming languages, programming domains, criteria to evaluate languages, influences on language design, language categories, design trade-off.

Some History

pseudocodes, FORTRAN, ALGOL. discuss other languages later, as they come up.

Syntax

describing syntax, lexemes, tokens, language recognizers and generators, BNF and CFGs, EBNF, parse trees, ambiguity, precedence, associativity, regular expressions, FSA, NFSAs, PDAs.
Lexical analysis, parsing, lex and yacc.

Names, Bindings, Type Checking, Scopes, Lifetime. static and dynamic
Data Types

Primitives, strings, user-defined ordinals, arrays, associative arrays, records, unions, sets, pointers. Also, some discussion of ALGOL

Expressions and Assignment statements

arithmetics expressions, overloaded operators, type conversions, relational/boolean expressions, short circuit evaluation, assignment statements, mixed-mode assignment. More ALGOL.

Control Structures

compound statements, selection statements, iterative statements, unconditional branching, guarded commands

Subprograms

design issues, local referencing environments, parameter passing methods, overloaded subprograms, generic subprograms, separate and independent compilation, design issues for functions, accessing nonlocal environments, user-defined overloaded operators, coroutines.

Implementing subprograms

General semantics of calls and returns, implementing: FORTRAN subprograms, ALGOL subprograms, blocks, dynamic scoping, parameters that are subprogram names.

Encapsulation, OOP - we will probably skip since we already know this fairly well.
Concurrency - perhaps
Exception handling - perhaps if have time at the end, since will cover this elsewhere

in PL/I, in Ada, in C++, in Java

Functional Programming - we will do LISP

Mathematical functions, fundamentals of functional programming langs, LISP, Scheme, Common LISP, ML, Haskell, Applications of, comparison between functional and imperative languages

Logic Programming Languages - Prolog

Predicate Calculus, Proving Theorems, Logic Programming, Intro to Prolog, applications of

Bindings And Scope

3. Variables
Replace absolute address with symbols. A tuple: (name, address, value, type, lifetime, scope).
address: can have many instantiations of same name at different addresses. via functions, different scope, recursion. vs.
alias - many names for one address. EQUIVALENCE in Fortran. Pascal and Ada's variant record structure. C's union. C's pointers, C++'s references.
type - defines value range and ops can perform. unsigned short int.
value - contents of mem cell.

4. Binding
Association between attrib and entity, or between op and symbol.
binding time = when binding takes place.
int count; ... count = count + 5;
Bindings and binding times:
set of poss types for count: lang design time
type of count: compile time
poss values for count: compiler design time
count's value: execution time
meanings of +: lang definition time
meaning of + in this statement: compile time
internal representation of 5: compiler design time

binding attribs to vars: static = before runtime. vs. dynamic.

Type Binding. static w/ or w/o explicit declarations. Modern: Perl, ML - no explicit. VB6 optional. Fortran, PL/I, Basic: implicit. we spoke about Fortran's convention, and about unreliability. C: declarations vs definitions. dynamic: not bound by declaration, but at assignment, and can be reassigned. =flexibility. func can deal with data of any type. C++ needs know type of the data. Consider swap in C.
Jscript: list = [1.4, 4.5] and can reassign to a scalar.
lack of reliability due to typos generating new vars, or if assignment should not work, will.
usually interpreted rather than compiled because hard to implement dynamic type changes in machine code.

Type inference: ML can infer type.
fun circum(r) = 3.14 * r * r;
fun square(x) : int = x * x; //int specifies return type

Storage Bindings/Lifetimes
allocation/deallocation. lifetime= time var bound to a specific address.
static vars - throughout program execution. retain history, speed 'cause no alloc/dealloc. no recursion, waste of space.
stack-dynamic - allocated when declaration statement reached. could either be when enter func (Ada) or when reach that pos (Java). Allows recursion, sharing of space.
explicit heap-dynamic - using new and delete
implicit heap-dynamic - Perl, Javascript arrays and strings. as a result of assignment.

Type-Checking
For both operations and functions. Type checking: make sure operands are of compatible types. compatible = legal for operator or can be implicitly converted (=coersion). type error - where cannot apply operator to operand.

depends on bindings of vars to types. if static, can do type checking statically. if dynamic, must be done at runtime = dynamic type checking. better to detect earlier. tradeoff of reliability with flexibility. What about variants, unions? Must maintain current type, if do actually check.

Strong Typing
Structured programming in 70's --> strong typing. each name has single type, defined statically = known at compile time. For variant records: type errors are always detected.
Pascal: almost strongly typed. allows type tag omission in variants.
Ada as well: variants dynamically checked. But can specify UNCHECKED_CONVERSION function. oiwerful, useful for using pointers as ints for user-defined storage allocation/deallocation. Modula-3: LOOPHOLE
C, C++: not. functions where parameters not type-checked. Namely, the "..." How do you think printf works?
ML: Strong: types all statically known, from declarations of inference.
Java: Strongly typed. Can do explicit conversion, resulting in type error, but no implicit conversion. at least in assignment. still some implicit coversion, in terms of evaluating expressions. int + float. string + int. "hello" + i + j.
more coercion --> less reliability.

Type Compatibility
in terms of assignment, most often. name type compatibility = only if have same names.
structure type compatibility - if structures are the same.

Consider the enum in C. Can we assign it to an int of v.v.?
When pass vars to funcs. if we define locally, who is to say that types are the same? Original Pascal - define it globally.

flexibility vs. difficulty implementing. compiler compare name, or compare each element type. What if have same structure in terms of data types but diff field names? Diff subscript ranges (0...9) vs (1..10)? Two enums with same num of components?

type celsuis = real;
type fahrenheit = real;

Pascal did not initally specify, which led to diff implementations. ISO Pascal - mixed: declaration equivalence - if define one type name = another type.

Ada: derived type vs subtype. derived:

type celsuis is new FLOAT;
type fahrenheit is new FLOAT;

cannot assign to each other, to and fro FLOATS, except literals.
subtype is compatible with parent:
subtype SMALL_TYPE is INTEGER [range 0 .. 99];

C: structural equivalence.
C++: name equivalence. typedef does not introduce new type, but acts as a macro.

Ada: need no type names
C, D : array (1 .. 10) of INTEGER; is incompatible
type: LIST_10 is array (1 .. 10) of INTEGER;
C, D: LIST_10;

Scope
scope = range in which is visible, ie can be referenced in a statement. locals vs. globals.

ALGOL 60: static scope. this is what we are used to. in C: globals seen in functions. within functions, nested ifs, blocks, etc. However, also nested procedures. static parent and ancestors.


procedure big;
var x : integer;
procedure sub1;
begin { sub1 }
...x...
end; {sub1}
procedure sub2;
var x : integer;
begin { sub2 }
...x...
end; {sub2}
begin { big }
...x...
end; {big}

Also, have narrower scop hiding wider scope. Scope resolution op in C++. C++ doesn't allow nested funcs, but do allow globals.

Blocks
Again, ALGOL 60.
In Ada:
declare TEMP : integer
begin
TEMP := FIRST
FIRST := SECOND
SECOND := FIRST
end;

C, Java: {}. for's changing scope.

Dynamic Scope
APL, SNOBOL4, early lisp
problems: cannot determine attributes statically.
1) less reliable, since subprograms, even far off textually, can modify local vars.
2) cannot do static type checking
3) akin to Spaghetti code.
4) speed issues - much slower to access
benefits: no need to pass vars, bec implicitly visible
But static is more readable and faster. perhaps later discuss implementation of each.

Reading in ALGOL, Compiling in ALGOL

We are about ready to move on to the next programming language: ALGOL.

Here is a page with some free ALGOL compilers and interpreters. Try installing one and see if the following "hello world" code will run:

'BEGIN'
 'COMMENT' Hello World in Algol 60;
  OUTPUT(4,'(''('Hello World!')',/')')
'END'

Or in ALGOL 68:

( # Hello World in Algol 68 # print(("Hello World!",newline)))

In the book, read from pg. 95-121.

Here is a Wikipedia article on ALGOL:

Analysis of a FORTRAN Matrix Program - III

The code, as a reminder:

! program to calculate whether a graph is strongly connected
! it calculates this my using matrix multiplication on the
! adjacency matrix representing the graph
PROGRAM matrix

INTEGER :: length
PARAMETER (length = 4)
INTEGER :: a,b, c, m(length,length), copy(length,length), total, d, e, f

m(1, :) = (/ 1, 0, 1, 0/)
m(2, :) = (/ 1, 0, 0, 0/)
m(3, :) = (/ 1, 1, 0, 1/)
m(4, :) = (/ 0, 0, 1, 0/)

!initialize a copy of m so that we may calculate N^2
copy = m

DO a = 1, length
DO b = 1, length
total = 0
DO c = 1, length
total = m(a,c) * copy(c,b) + total
END DO
IF (total > 0) THEN
m(a,b) = 1
ELSE
m(a,b) = 0
END IF
END DO
END DO

DO f = 1, length
PRINT *, m(f,1:4)
END DO

DO e = 1, length
total = 0
DO d = 1, length
IF( m(d,e) .EQ. 1 .AND. m(e,d) .EQ. 1 ) total = total + 2
END DO
IF ( total == 8 ) THEN
PRINT *, 'Strongly Connected At: ', e
ELSE
PRINT *, 'Not Strongly Connected At: ', e
END IF
END DO

END PROGRAM matrix

Now, while the example I used in class had only 1s and 0s in the result, in reality we may have higher numbers. This is fine. A zero in position (i,j) means that there is no path from i to j, but a number greater than zero, such as 1 or 3 or 8, means that there is a path. So we need not check specifically for a nonzero and set the result to 1.

One we know this, we may use the built-in matmul function that I suggested you use, rather than doing the matrix multiplication by hand. Doing matmul it by hand is certainly still good programming exercise, but if we have this function built in, it may just be incredibly optimized. Further, it saves programmer work. Further, it makes the code smaller and thus more readable and less error-prone.

There was actually a bug in this code, which I should comment upon before proceeding:


DO a = 1, length
DO b = 1, length
    total = 0

        ! the following code calculates a single dotproduct
    DO c = 1, length
        total = m(a,c) * copy(c,b) + total
    END DO

        ! and here we assign the result to m
    IF (total > 0) THEN
        m(a,b) = 1
    ELSE
        m(a,b) = 0
    END IF
END DO
END DO

The problem is twofold. Firstly, we are only doing a single matrix multiplication in this entire code - calculating M^2 rather than M^4. Secondly, we are not even calculating M^2 correctly! This is because we are modifying M in the middle of the multiplication! M(1,1) is being set, and then that value is being used to compute M(1,2), M(1,3), M(1,4), and M(2,1), M(3,1), and M(4,1). We could solve this by making a third matrix, result, and assigning to that matrix, and then stating m = result at the end. But let us skip this entire step and just use the built in matmul function. We no longer need the copy matrix for this purpose, but I will leave it in in case we need it for some other purpose. I will change the code, commenting out the previous code so we can see what changed. I will still only calculate M^2. We will deal with M^4 and the associated problem later.


! program to calculate whether a graph is strongly connected
! it calculates this my using matrix multiplication on the
! adjacency matrix representing the graph
PROGRAM matrix

INTEGER :: length
PARAMETER (length = 4)
INTEGER :: a,b, c, m(length,length), copy(length,length), total, d, e, f

m(1, :) = (/ 1, 0, 1, 0/)
m(2, :) = (/ 1, 0, 0, 0/)
m(3, :) = (/ 1, 1, 0, 1/)
m(4, :) = (/ 0, 0, 1, 0/)

!initialize a copy of m so that we may calculate N^2
copy = m

!DO a = 1, length
! DO b = 1, length
!  total = 0
!
!  DO c = 1, length
!   total = m(a,c) * copy(c,b) + total
!  END DO
!
!  IF (total > 0) THEN
!   m(a,b) = 1
!  ELSE
!   m(a,b) = 0
!  END IF
! END DO
!END DO

! calculate M^2
m = matmul(m, m)

DO f = 1, length
PRINT *, m(f,1:4)
END DO

DO e = 1, length
total = 0
DO d = 1, length
IF( m(d,e) .EQ. 1 .AND. m(e,d) .EQ. 1 ) total = total + 2
END DO
IF ( total == 8 ) THEN
PRINT *, 'Strongly Connected At: ', e
ELSE
PRINT *, 'Not Strongly Connected At: ', e
END IF
END DO

END PROGRAM matrix

Analysis of a FORTRAN Matrix Program - II

The program already has a nice feature. It defines length to be 4 and then uses length throughout in the loops. That way, if we want to change the program to handle a 5 X 5 matrix, we can just change the declaration of our matrices to 5 X 5, and length to = 5, and the program will run. Using symbols instead of literals in this way makes the program mode readily adaptable.

The program, again:

! program to calculate whether a graph is strongly connected
! it calculates this my using matrix multiplication on the
! adjacency matrix representing the graph
PROGRAM matrix

INTEGER :: a,b, c, m(4,4), copy(4,4), total, length = 4, d, e, f

m(1, :) = (/ 1, 0, 1, 0/)
m(2, :) = (/ 1, 0, 0, 0/)
m(3, :) = (/ 1, 1, 0, 1/)
m(4, :) = (/ 0, 0, 1, 0/)

! copy(1, :) = (/ 1, 0, 1, 0/)
! copy(2, :) = (/ 1, 0, 0, 0/)
! copy(3, :) = (/ 1, 1, 0, 1/)
! copy(4, :) = (/ 0, 0, 1, 0/)

!initialize a copy of m so that we may calculate N^2
copy = m

DO a = 1, length
DO b = 1, length
total = 0
DO c = 1, length
total = m(a,c) * copy(c,b) + total
END DO
IF (total > 0) THEN
m(a,b) = 1
ELSE
m(a,b) = 0
END IF
END DO
END DO

DO f = 1, length
PRINT *, m(f,1:4)
END DO

DO e = 1, length
total = 0
DO d = 1, length
IF( m(d,e) .EQ. 1 .AND. m(e,d) .EQ. 1 ) total = total + 2
END DO
IF ( total == 8 ) THEN
PRINT *, 'Strongly Connected At: ', e
ELSE
PRINT *, 'Not Strongly Connected At: ', e
END IF
END DO

END PROGRAM matrix

However, as implemented, there are two things I would change. Firstly, length is declared as a variable. This means that somewhere in the program, I might change it by accident. It would be better to declare it a constant. But how do we do this in FORTRAN? The answer, we declare its data type and then set its value with a PARAMETER statement.

Another thing I would change: the declaration of our matrices m and copy use a literal 4. When modifying the program, I would not only have to change the definition of length but my matrix definitions as well. Instead, let us use our new constant to define the matrix size. We will also remove the old code that was commented out:

! program to calculate whether a graph is strongly connected
! it calculates this my using matrix multiplication on the
! adjacency matrix representing the graph
PROGRAM matrix

INTEGER :: length
PARAMETER (length = 4)
INTEGER :: a,b, c, m(length,length), copy(length,length), total, d, e, f

m(1, :) = (/ 1, 0, 1, 0/)
m(2, :) = (/ 1, 0, 0, 0/)
m(3, :) = (/ 1, 1, 0, 1/)
m(4, :) = (/ 0, 0, 1, 0/)

!initialize a copy of m so that we may calculate N^2
copy = m

DO a = 1, length
DO b = 1, length
total = 0
DO c = 1, length
total = m(a,c) * copy(c,b) + total
END DO
IF (total > 0) THEN
m(a,b) = 1
ELSE
m(a,b) = 0
END IF
END DO
END DO

DO f = 1, length
PRINT *, m(f,1:4)
END DO

DO e = 1, length
total = 0
DO d = 1, length
IF( m(d,e) .EQ. 1 .AND. m(e,d) .EQ. 1 ) total = total + 2
END DO
IF ( total == 8 ) THEN
PRINT *, 'Strongly Connected At: ', e
ELSE
PRINT *, 'Not Strongly Connected At: ', e
END IF
END DO

END PROGRAM matrix

Analysis of a FORTRAN Matrix Program - I

Here I present a matrix homework. I will post comments and suggestions in a series of posts. First, here is the program at the outset.

PROGRAM matrix

INTEGER :: a,b, c, m(4,4), copy(4,4), total, length = 4, d, e, f

m(1, :) = (/ 1, 0, 1, 0/)
m(2, :) = (/ 1, 0, 0, 0/)
m(3, :) = (/ 1, 1, 0, 1/)
m(4, :) = (/ 0, 0, 1, 0/)

copy(1, :) = (/ 1, 0, 1, 0/)
copy(2, :) = (/ 1, 0, 0, 0/)
copy(3, :) = (/ 1, 1, 0, 1/)
copy(4, :) = (/ 0, 0, 1, 0/)

DO a = 1, length
DO b = 1, length
total = 0
DO c = 1, length
total = m(a,c) * copy(c,b) + total
END DO
IF (total > 0) THEN
m(a,b) = 1
ELSE
m(a,b) = 0
END IF
END DO
END DO

DO f = 1, length
PRINT *, m(f,1:4)
END DO

DO e = 1, length
total = 0
DO d = 1, length
IF( m(d,e) .EQ. 1 .AND. m(e,d) .EQ. 1 ) total = total + 2
END DO
IF ( total == 8 ) THEN
PRINT *, 'Strongly Connected At: ', e
ELSE
PRINT *, 'Not Strongly Connected At: ', e
END IF
END DO

END PROGRAM matrix

One thing that is missing from this program is comments. Tell me what the program is supposed to do, and what each section of the code is supposed to do. It used to be that the letter C at the beginning of the line would start a comment. Here, instead, we use the character !

Another thing to realize is that you can assign an entire matrix to another matrix with the = operator. You need not initialize copy separately, or even loop, copying elements. Rather, just use =. I will comment out the code and put in the new code in its place. My changes in red.

! program to calculate whether a graph is strongly connected
! it calculates this my using matrix multiplication on the
! adjacency matrix representing the graph
PROGRAM matrix

INTEGER :: a,b, c, m(4,4), copy(4,4), total, length = 4, d, e, f

m(1, :) = (/ 1, 0, 1, 0/)
m(2, :) = (/ 1, 0, 0, 0/)
m(3, :) = (/ 1, 1, 0, 1/)
m(4, :) = (/ 0, 0, 1, 0/)

! copy(1, :) = (/ 1, 0, 1, 0/)
! copy(2, :) = (/ 1, 0, 0, 0/)
! copy(3, :) = (/ 1, 1, 0, 1/)
! copy(4, :) = (/ 0, 0, 1, 0/)

!initialize a copy of m so that we may calculate N^2
copy = m

DO a = 1, length
DO b = 1, length
total = 0
DO c = 1, length
total = m(a,c) * copy(c,b) + total
END DO
IF (total > 0) THEN
m(a,b) = 1
ELSE
m(a,b) = 0
END IF
END DO
END DO

DO f = 1, length
PRINT *, m(f,1:4)
END DO

DO e = 1, length
total = 0
DO d = 1, length
IF( m(d,e) .EQ. 1 .AND. m(e,d) .EQ. 1 ) total = total + 2
END DO
IF ( total == 8 ) THEN
PRINT *, 'Strongly Connected At: ', e
ELSE
PRINT *, 'Not Strongly Connected At: ', e
END IF
END DO

END PROGRAM matrix

Monday, February 20, 2006

Some Pointers with the FORTRAN matrix assignment

1)
Definition of Strongly Connected Graph

A directed graph is called strongly connected if for every pair of vertices u and v there is a path from u to v and a path from v to u.

2) A program that uses the built in matmul, and does assignment from one matrix to another:

! EXAMPLE 4
!
! Aprogram to compute the largest eigenvalue
! of a real, dense matrix by the power method

integer, parameter::devno=10, n=10, minit = 10, maxit=100
real::lambda_old, lambda_new
real::rel_err, tol = 2.0**(-20)
real,dimension(n)::x_new, x_old=(/(1.0/n,i=1,n)/)
real,dimension(n,n)::a
open(devno, file='a.dat')
read(10,10) ((a(i,j),j=1,n),i=1,n)
10 format(10f4.1)
iterate: do i=1,maxit
 write (*,*) "i = ", i
 x_new=matmul(a, x_old)
 lambda_new = sqrt(dot_product(x_new, x_new))
 write(*,*) "lambda_new = ", lambda_new
 if (i .gt. minit) then
    rel_err = abs((lambda_new - lambda_old)/lambda_old)
    if (rel_err .lt. tol) exit iterate
 endif
 if (i .eq. maxit) write (*,*) "Maximum Iterations reached"
 lambda_old = lambda_new
 x_old = x_new/lambda_new
end do iterate
write(*,*) "lambda = ", lambda_new
end

Another place that uses matmul.

Friday, February 17, 2006

Explaining Lex and Yacc

Note: Change {} after includes to pointy brackets.
1)
What exactly are you doing when you process something with lex?

Well, lex takes your lex file, say example2.l and generates a file in the C language. The name of this file is lex.yy.c.

In class, I spoke about Finite State Machines, and how they can recognize regular languages. lex.yy.c is an implementation of a Finite State Machine based on your lex file. The regular expressions are inputs, and the actions or printfs are the outputs.

Thus, let us say we have example2.l:

%{
#include {stdio.h}
%}

%%
[0123456789]+           printf("NUMBER\n");
[a-zA-Z][a-zA-Z0-9]*    printf("WORD\n");
%%

The stuff between %{ and %} will be included in the C file, which is important since we are using printf. The stuff between %% and %% are regular expressions together with the action to take after successfully recognizing that expression.

At the command line, we type:
lex example2.l

This generates the file lex.yy.c.

Pseudocode to explain what this file contains (obtained from this website):

        state= 0; get next input character
      while (not end of input) {
         depending on current state and input character
            match: /* input expected */
               calculate new state; get next input character
            accept: /* current pattern completely matched */
               state= 0; perform action corresponding to pattern
            error: /* input unexpected */
               state= 0; echo 1st character input after last accept or error;
               reset input to 2nd character input after last accept or error;
      }

Thus, we start at the start state, read input. So long as we are matching, we calculate a new state based on current state and input character. We do this until we enter an accept state. If we do, we go to the start state once again, so we can match the next string (=token), . If we go to an error state along the way, we say so and try matching again.

We need not go into detail here how exactly lex.yy.c implements this Finite State Machine in C. You can try examining the code if you want.

Now, if we want to run this Finite State Machine in lex.yy.c, we cannot, because it is written in C, not in machine language. Thus, we must compile the file.

There are two ways to compile it. We can incorporate the lex.yy.c with other files in a C project, and call the various functions and access the variables it exposes. We would need to do this because lex.yy.c does not have its own main function. Alternatively, we could specify that libl provide us with an appropriate main(). We do the latter.

Thus, at the command line, we type:
cc lex.yy.c -o example2 -ll

This -ll gets the main() function from libl. -o states that the output file should be what we specify, namely example2.

We can then run the executable, example2.

2) What exactly does yacc do?
yacc takes a yacc file which contains BNF and generates a C file which implements a Push Down Automaton (PDA) which accepts the context free language specified in the yacc file.

We use yacc together with lex.
Our lex file will recognize and inform us about tokens, while our yacc file will base itself upon those tokens and recognize a context free grammar.

Beforehand, we used the -ll option with the cc command, so that we were provided with a main() function.

Now, however, we want to compile more than just the lex.yy.c file. We want to incorporate it with C code generated by yacc. Thus, we will no longer use the -ll option. Instead, we will make sure that our yacc file has a main() function defined.

3)
Let us consider example4, which is the thermostat example.

Create a lex file, example4.l. The contents of this file should be:

%{
#include {stdio.h}
#include "y.tab.h"
%}
%%
[0-9]+                  return NUMBER;
heat                    return TOKHEAT;
on|off                  return STATE;
target                  return TOKTARGET;
temperature             return TOKTEMPERATURE;
\n                      /* ignore end of line */;
[ \t]+                  /* ignore whitespace */;
%%

Most of this should be familiar. We are matching tabs, newlines, the string "on" or "off", etcetera.

One thing is different, though. Rather that just having #include {stdio.h}, we have an additional line that we are explicitly marking to be included in the generated C file, lex.yy.c. That line is:

#include "y.tab.h"

y.tab.h is a header file needed in association with the C file generated by yacc, namely y.tab.c.

We now need a yacc file. The tutorial does a bad job here by separating parts of the file, and not . Basically, we need a single file that does three things:

Provide literal code that will include necessary .h files and provide needed functions such as main(). This part will be between %{ and %}
Give a definition of the tokens to be used in our grammar. This line will start %token.
Lay out the context free grammar, with rules, as well as actions to take (such as printf) after the rule has been satisfied. This should be embedded within %% and %% tags, but the tutorial omitted it, and thus you may have had a compile time error.

The contents of the file example4.y should be:

%{
#include
#include

void yyerror(const char *str)
{
fprintf(stderr,"error: %s\n",str);
}

int yywrap()
{
return 1;
}

main()
{
yyparse();
}

%}

%token NUMBER TOKHEAT STATE TOKTARGET TOKTEMPERATURE

%%
commands: /* empty */
| commands command
;

command:
heat_switch
|
target_set
;

heat_switch:
TOKHEAT STATE
{
printf("\tHeat turned on or off\n");
}
;

target_set:
TOKTARGET TOKTEMPERATURE NUMBER
{
printf("\tTemperature set\n");
}
;
%%

Save this file as example4.y. Then run yacc on it by typing at the command line:

yacc example4.y

This will generate two files, y.tab.h and y.tab.c.

Now, we have our three files:
lex.yy.c
y.tab.c
y.tab.h

We need to compile them. Remember we are not using the -ll option since we are providing our own main() function.

cc lex.yy.c y.tab.c -o example4

Then run example4.

Thursday, February 16, 2006

Names and Binding

We may return to talk more about parsing later.

1. Name forms:
max length, connector characters, case sensitivity, Limitation on max length to ease symbol table. C++ unlimited, but depends on implementation. does case sensitivity hurt readability, reliability, writability?
2. Special words = keywords vs. reserved words.
REAL APPLE
REAL = 3.4
INTEGER REAL
REAL INTEGER
predefined names in Ada. INTEGER and FLOAT are there by default but may be overridden. Pascal. Others in Ada brought in via with statements. printf and scanf in C.
3. Variables
Replace absolute address with symbols. A tuple: (name, address, value, type, scope).
address: can have many instantiations of same name at different addresses. via functions, different scope, recursion. vs.
alias - many names for one address. EQUIVALENCE in Fortran. Pascal and Ada's variant record structure. See here:
How do we implement this? Change size of allocate maximum.






TYPE Geom_Type IS (Unknown, Rectangle, Circle);
TYPE Geometric_Figure(Figure_Type : Geom_Type := Unknown) IS RECORD
-- Invariant data fields here
CASE Figure_Type IS
WHEN Rectangle =>
   Width  : Float;
   Height : Float;
WHEN Circle =>
   Radius : Float;
WHEN Unknown =>
   NULL;
END CASE;
END RECORD;

constrained vs. unconstrained.

GF : Geometric_Figure;
-- An unconstrained Geometric_Figure
GC : Geometric_Figure( Figure_Type => Circle );
-- A constrained Geometric Figure of Figure_Type Circle

discriminant, fixed, variant. reading allowed for all, var so long as proper type. writing: discr if dont change val or as record assignment. record assignment allowed if unconstrained, else discr must match. record compare if discr match/
C's union. C's pointers, C++'s references.

type - defines value range and ops can perform. unsigned short int.
value - contents of mem cell.

4. Binding
Association between attrib and entity, or between op and symbol.
binding time = when binding takes place.
int count; ... count = count + 5;
Bindings and binding times:
set of poss types for count: lang design time
type of count: compile time
poss values for count: compiler design time
count's value: execution time
meanings of +: lang definition time
meaning of + in this statement: compile time
internal representation of 5: compiler design time

binding attribs to vars: static = before runtime. vs. dynamic.

Type Binding. static w/ or w/o explicit declarations. Modern: Perl, ML - no explicit. VB6 optional. Fortran, PL/I, Basic: implicit. we spoke about Fortran's convention, and about unreliability. C: declarations vs definitions. dynamic: not bound by declaration, but at assignment, and can be reassigned. =flexibility. func can deal with data of any type. C++ needs know type of the data. Consider swap in C.
Jscript: list = [1.4, 4.5] and can reassign to a scalar.
lack of reliability due to typos generating new vars, or if assignment should not work, will.
usually interpreted rather than compiled because hard to implement dynamic type changes in machine code.

Type inference: ML can infer type.
fun circum(r) = 3.14 * r * r;
fun square(x) : int = x * x; //int specifies return type

Storage Bindings/Lifetimes
allocation/deallocation. lifetime= time var bound to a specific address.
static vars - throughout program execution. retain history, speed 'cause no alloc/dealloc. no recursion, waste of space.
stack-dynamic - allocated when declaration statement reached. could either be when enter func (Ada) or when reach that pos (Java). Allows recursion, sharing of space.
explicit heap-dynamic - using new and delete
implicit heap-dynamic - Perl, Javascript arrays and strings. as a result of assignment.