Tuesday, March 07, 2006

Data Types - III

Associative Arrays

Perl:
%ages = ("Josh" => 10, "John" = 32);
$ages{"Josh"} = 29;
// and assignment will create an element
delete $ages{"John"};
// or the entire hash:
@salaries = ();

Here is a Perl tutorial on hashes. Read through it. keys and values operator/collections.

    for my $key ( keys %hash ) {
my $value = $hash{$key};
print "$key => $value\n";
}
exists keyword, or else will create the element.
if (exists $ages{"Peter"} ) ...

Possible Implementation: Read this website about implementation of Perl hashes. Linked Lists - but then O(N). Could turn each string into unique index, and use array, but then very sparse. Can use a hash function. Handle collisions how? Better hash function, or else chaining. (Alternatively, red-black trees.) Read article for more info.

C++: Map, multimap. C#: Dictionary, etc.

Record:
Heterogeneous data structure.
First, COBOL. (Not in early Fortrans.) C's struct. C++'s class. Java has class, not struct. C# has both, but with different meanings (struct is treated as a primitive, unboxed).

COBOL:

01 EMPLOYEE-RECORD.
02 EMPLOYEE-NAME.
05 FIRST PICTURE IS X(20).
05 MIDDLE PICTURE IS X(10).
05 LAST PICTURE IS X(20).
02 HOURLY-RATE PICTURE IS 99V99.

Pascal, Ada, use record nesting rather than relative numbers.


EMPLOYEE-RECORD :
record
EMPLOYEE-NAME :
record
FIRST : STRING(1..20);
MIDDLE : STRING(1..10);
LAST : STRING(1..20);.
end record;
HOURLY-RATE : FLOAT;
end record;
Referring to subelements:
COBOL: MIDDLE OF EMPLOYEE-NAME OF EMPLOYEE-RECORD
Others langs: dot, or -> for pointers to structs. Fortran: %

elliptical names - as long as unambiguous, can omit middle stuff.

with in Pascal, VB6. Can do this in C++ using references.

Record Ops
Assignment. types must be same. Ada: comparison for =, <>, initialization with literals.

COBOL: MOVE CORRESPONDING
if subfields have same name, copy it.

Implementation. adjacent memory, and field names associated with offset. descriptor has, for each field: Name, type, offset.

Unions
type that stores diff type values at diff times.
Q: Should we provide type-checking for current type, dynamically? Should unions be embedded in records?

free unions give programmer complete control, no type checking.
Pascal's unions, Ada's unions: have a tag or discriminant that tells current type.
more about this, but can read up on it. Ada adds constrained union such that in declaration, can restrict to one specific type, and thus do static type-checking.

Eval: flexibility vs. reliability.

Implementation:
Use same address for all members of union. enough storage for the largest.

Sets
C++ has but not exactly the same.
Until then, Pascal alone among imperative langs. max size implementation dependent, often < style="font-weight: bold;">type colors = {red, green, blue};
colorset = set of colors.
var set1, set2 : colorset;

set1 := [red, blue];

if ch in ['a', 'e', 'i', 'o', 'u'] ...

Check out ISETLW.

Eval
Ada, based on Pascal, but did not provide sets. rather, added membership op on enums.

other langs, use arrays and implement ops. unweildy, less efficient.

Impl
stored as bit strings. 0 if present, 1 if not present.

(C++ has a bitset.)

Pointers
We know about pointers, dereferencing, assignment to local, off heap, etc.

some problems with pointers:
dangling pointers - points to deallocated heap variable. and what if that memory claimed by s/t else?
lost heap-dynamic variable - lost in terms of no longer accessible. memory leak, vs. garbage collection.

Pascal: create new vars with new and elim with dispose, which sets all pointers pointing at this object to nil. implementation dependent.

Ada: access types. auto dealloc when ptr goes out of scope. elims problems caused by reliance on explicit dealloc. though gives deliberately unweildy alternate method.

C++, Java: ref types.

Handling dangling ptrs:
tombstones. ptrs only point to tombstones. tombstone points to actual memory. when deallocate, tombstone remains and is set to nil. costly in time and space. storage never reclaimed. no prog lang uses.

garbage collection vs. reference counters: lazy, eager approach.

reference counters - keep track how many things pointing to this, when ref count reaches zero, can be reclaimed.

garbage collection - when run out of space, or can explicitly invoke. how do we know when something not pointed to? keep track of each ptr. named ptrs, and following named ptrs as far as they go.


Next: Expressions, Assignment Statements
unary, binary, ternary ops.
usually Infix.

a + b * c

right to left, left to right, or operator precedence.

associativity to left, to right: a - b + c - d

use of parentheses.

side effects -what else happens as a result.
a + fun(a)
functional side-effects
FORTRAN only allows (=legal) if function does not change value. but s/t hard to determine, based on pointers, globals.

operator side-effects
b = a++ + a++;
reference point, e.g. assigning twice between two reference points.


to be continued...

No comments: