Cs 1621 B

Course Notes for CS1621 Structure of Programming Languages, Part B
By John C. Ramirez, Department of Computer Science, University of Pittsburgh
These notes are intended for use by students in CS1621 at the University of Pittsburgh and no one else. These notes are provided free of charge and may not be sold in any shape or form. Material from these notes is obtained from various sources, including, but not limited to, the following textbooks:
Concepts of Programming Languages, Seventh Edition, by Robert W. Sebesta (Addison Wesley)
Programming Languages: Design and Implementation, Fourth Edition, by Terrence W. Pratt and Marvin V. Zelkowitz (Prentice Hall)
Compilers: Principles, Techniques, and Tools, by Aho, Sethi and Ullman (Addison Wesley)
Expressions
Expressions are vital to programs

Allow the programmer to specify the calculations that the computer is to perform
It is important that the programmer understand how a language evaluates expressions
Things to consider:
Precedence and associativity
Order of operand evaluation
Side-effects of evaluation
Overloadings and coercions
Precedence and Associativity

We always learn these rules for any new language
Vital to using expressions correctly
Most languages have similar precedence for the standard operators: * and / first, then + and -
But the programmer needs to understand precedence and associativity for all operators, especially those that may be unusual
Ex: boolean and relational operators

and or not < > <= >= != ==
In Pascal, the boolean operators have higher precedence than the relational operators (opposite of C++)
if x < y then writeln('Less');
if x < y and y < z then writeln('Middle');
The second statement is an error in Pascal: since and has higher precedence than <, the first subexpression evaluated would be y and y, which is not legal when y is not boolean
if (x < y) and (y < z) then writeln('Middle');

Now it is ok
In C++
if (x < y && y < z) cout << "Middle" << endl;
This is fine in C++
Ex: unary ++ and -- in C++
Precedence and associativity are wacky!

#include <iostream>
using namespace std;

int main()
{
    unsigned int i1 = 0, i2, i3, i4, i5, j, k, m1, m2, m3, m4, m5;
    j = i1++;
    k = ++i1;
    cout << j << " " << k << endl;
    i5 = i4 = i3 = i2 = i1;
    // Each of the next five lines modifies the same variable more than once
    // in a single expression: undefined behavior in C++, so results vary by compiler
    m1 = i1++ + i1++ + i1++;
    m2 = i2++ + ++i2 + i2++;
    m3 = i3++ + ++i3 + ++i3;
    m4 = ++i4 + i4++ + ++i4;
    m5 = ++i5 + ++i5 + ++i5;
    cout << i1 << " " << m1 << endl;
    cout << i2 << " " << m2 << endl;
    cout << i3 << " " << m3 << endl;
    cout << i4 << " " << m4 << endl;
    cout << i5 << " " << m5 << endl;
}
Output? See plusplus.cpp and try it on different platforms
http://www.cppreference.com/operator_precedence.html
See problem in Assignment 3
Compare to plusplus.java and plusplus.pl
In some cases, an expression is ambiguous, and the compiler will not let you write it or will warn you about it
Ex: A ** B ** C in Ada
Must have parentheses
Ex: Mixing bitwise operators in C++

Compiler warns you to use parentheses
Sometimes you could probably figure it out, but you're better off not trying
Ex: If more than one coercion can occur in C++
May have defined both a constructor and a conversion function
Sometimes you don't think you should care about precedence and associativity, but you should
In math, addition and multiplication are associative and commutative. On a computer, overflow can cause this to not always be the case:
float A = 1e+30, B = 1.0/1e+30, C = 1e+30;
A*B*C ~= 1e+30, but A*C*B = infinity (A*C overflows)
See Overflow.cpp
F1.add(F2); vs. F2.add(F1); -- If F1 and F2 are from different classes, the operations may be different or perhaps not even legal
Side-effects can also cause evaluation order problems

Expressions can involve function calls, which can change variable values
Y = f(X) + X;
Y = X + f(X);
Without side-effects, the results are the same, but if f(X) changes the value of X, the results could be different
Most languages allow reference parameters with functions. These can cause logic errors if used improperly. See side.cpp
How to handle this?

Leave it up to the programmer, as in Pascal and C++
Limits compiler optimizations, some of which may include reordering of operations
The compiler cannot reorder operations if doing so could possibly change the result
Do not allow (most) side-effects to occur, as in Ada

Ada functions cannot change their parameters
Now optimizations can reorder expressions without changing the result (at least not because of this)
Best advice is to program in such a way as to either avoid all side-effects, or to only allow them in cases where they will not affect expression evaluation
Operator Overloading
Used in many newer high-level languages
Can be good and bad

Good:
Aids in readability and simplifies code if used correctly
Ex: Variables A, B and C of a new class Complex
A + B + C is more clear than (A.add(B)).add(C)
Ex: String variables can be compared

if (A < B)
is clearer than
if (A.compareTo(B) < 0)
Bad:
Can harm readability if used incorrectly
Ex: + defined to do multiplication
But methods could be improperly named as well
Function calls are not obvious, especially if other versions of the function exist
In C++ we could have a member function + and also a friend function +: which one is used?
Can allow some logic errors to go undetected

Ex: C++ uses / for float and integer division
If the user expects a value between 0 and 1, it's not going to happen if integer division is used
Some languages like C++ and Ada allow programmer-defined operator overloading
Others like Java do not

Both positions have support
Coercion and conversion

In many expressions we use more than one datatype
These are mixed expressions, and allowing them seems reasonable
However, often the operators and functions used are defined for only a single type. In this case, to allow mixed expressions to be used, some types must be converted to other types
The differences in languages are whether these conversions should be IMPLICIT or EXPLICIT
Explicit conversion
In this case the language allows few or no mixed expressions in the code. To mix data types, the programmer must convert explicitly through an operation or function call
Ex: Ada does not even allow mixing of floats and integers
Good:
Everything is clear: no uncertainty or ambiguity
Programmer can more easily verify correctness of programs
Easier to avoid logic errors
Bad:
Makes the language very wordy
Can be annoying, especially when the types are similar (ex. addition of integers and floats)
Implicit conversion: coercion

In this case mixed expressions are allowed, and the language coerces types where needed to allow types to match. Usually a language has some rules by which the coercions are performed
Good:
Less wordy: makes programs shorter and sometimes easier to write
Bad:
Programs are harder to verify for correctness
It is not always clear which coercion is being done, especially when programmer-defined coercions are allowed
Can lead to logic errors in programs
Ex: In C++ expressions are always coerced if they can be
Standard rules of promotion for predefined types can be easily remembered
However, the programmer can also define functions that will be used for coercion
Constructors for classes and conversion functions are both implicitly called if necessary
Now the rules are less clear and can lead to ambiguity and logic errors
Consider A = B + C where A, B and C are all of different types. Any/all of the following could exist:
+ operator with two type B arguments
+ operator with two type C arguments
Constructor for type B with argument type C
Constructor for type C with argument type B
Coercion function from C to B
Coercion function from B to C
Constructor for type A with argument type B
Constructor for type A with argument type C
How does the programmer know which will be used? The programmer should NOT assume any particular coercion will occur in this case
Here explicit coercion should be used to remove ambiguity
See coercion.cpp and rational.h

Boolean expressions
Expressions that evaluate to TRUE or FALSE
Formed using relational operators and boolean operators

Relational operators: operators which compare values
Operands can be most primitive types, and complex types as well in some cases
Boolean operators: operators used to combine boolean results

Operands must be boolean values (exception is C/C++, which also accept numeric operands)
Same guidelines for precedence and associativity hold here

Know the rules for current language
Ex: Ada boolean operators and, or have the same precedence but are NON-associative when mixed with each other
if A and B or C then -- illegal in Ada; must parenthesize
Ex: C++ boolean operator && has higher precedence than ||
Short-Circuit Evaluation
Important note (that we may not have emphasized earlier):
Operator precedence and associativity are for OPERATORS, not OPERANDS
The operators simply indicate how the operands are combined/utilized, NOT the order in which they are accessed/determined
For example: A + B + C + D
We know we first add A and B, then add C, then add D
But the VALUES for A, B, C and D could be obtained in ANY ORDER
Done to optimize execution (ex. in parallel)
This is significant in (at least) 2 situations:

1) Operand evaluation produces a side-effect that changes result of subsequent operand evaluation
As we discussed previously, an operand could be a function call with a reference parameter
An operand could be used/modified more than once, as with the ++ example
2) An operand may not even be valid if a previous operand evaluates in a certain way
Ex: if ((X != 0) && (Y/X < 1)) cout << rational;
Considering the && operator, if the first operand evaluates to FALSE (X is 0), evaluating the second operand would cause a run-time error (division by zero)
Now if the compiler tried to evaluate these in parallel, it could cause problems
Solution is SHORT-CIRCUIT EVALUATION (SSE)
Idea of SSE is simple:

Evaluate boolean expressions only until a final answer can be determined
For example, with && we know that FALSE && ANYTHING == FALSE, so we would not get the division by zero error
SSE is nice because it makes our code simpler

If we know compiler uses SSE, we can put into a single expression what otherwise would require two
Ex:
if ((X != 0) && (Y/X < 1)) cout << rational;
Without SSE, how would we have to write this to prevent possible run-time error?
Do on board
Drawbacks of SSE?
Now the computer must evaluate operands sequentially
This slows down program execution, especially in environments with multiple CPUs
So we have safety/ease of programming vs. execution efficiency

Solution is to offer programmer the choice

Ada normally evaluates operands in arbitrary order
But the special operators and then and or else provide short-circuit evaluation if desired
C++ and Java use SSE for && and ||, but always evaluate both operands of the bitwise & and | (in C++ in an unspecified order)
Assignment
Central to Imperative Languages
Gives a value to a variable

Typical syntax:
<variable> <assig. operator> <expression>
Semantics:
1) Compute lvalue of variable
2) Compute rvalue of expression
3) Store computed rvalue in lvalue location
Variations
Some languages allow multiple targets
C++ allows conditional targets (in Java, a ?: expression cannot be assigned to)

Wacky ?: operator
C, C++ and Java have many assignment variations for convenience

Ex: ++, +=, *=
C, C++ and Java return the rvalue as operation result

Allows assignment to be mixed within other expressions. As with many features from C and C++, this is both good and bad
Allows shorter code in cases such as:

A = B = C
while ((ch = getchar()) != EOF)
Since it is changing the value of a variable, order of evaluation is critical

Typically associates right to left, and it is a good idea to parenthesize (as above)
Famous C/C++ bug that we mentioned before: if (x = y) is wacky!

Will ALWAYS be true if y is non-zero
Will ALWAYS be false if y is zero
Newer compilers warn you about it
Not possible in Java since if requires a boolean (unless x and y are themselves boolean)
Concern also must be given for overloading the assignment operator (legal in C++ and Ada)
It is possible to cause it to behave differently from what is normally expected
Care has to be taken so that it works in all cases
Ex: Overloading = for a linked list variable

LList<myData> A, B;
// Fill B with various nodes
A = B;
If we want to use this assignment as with other assignments, we need to return the assigned result as the result of the assignment
In C++ this is typically a reference return value, so that we can cascade the operator effectively:
A = (B = C);
(A = B) = C;
In the first, when the assignment B = C is finished, we need the rvalue of the result
In the second, when the assignment A = B is finished, we need the lvalue of the result
A reference allows both (even though the second seems silly to do)
Also, what about A = A;

If we destroy the old list before copying the new one, self-assignment destroys the very value we are trying to copy
One issue that you may not normally consider: How is the rvalue evaluated?
For statically typed languages, there is usually no ambiguity: the expression result type must match the type of the variable
But for dynamically typed languages, it is no longer clear
Ex: in Prolog, A = 5+3. Since A is not necessarily an integer, 5+3 could be taken as a string just as reasonably as it could be taken as an arithmetic expression. See assig.pl
Control Statements
Primary types of control in imperative languages

Selection
Choose between 1 or more different actions
Iteration
Repeat an action 0 or more times
Selection
One-way selection
if statement exists in virtually every imperative language
Idea here is that we either execute a statement or do not
In modern languages this is achieved using an if without the optional else
Two-way selection
Now we incorporate the else with the if
Typical syntax:
if <condition> <statement> else <statement>
Interesting issues:
1) Form of condition?
2) What kinds of statements are allowed?
3) Is nesting allowed and how is it interpreted?
1) Form of condition
Most languages require a boolean expression (true or false only)
C/C++ are exceptions: int values are allowed
2) Kinds of statements
Original FORTRAN and BASIC allowed only a single statement
This is not conducive to good programming techniques
The only way to have multiple statements is by using an unconditional branch, i.e. GO TO
ALGOL 60 introduced the compound statement

Now an arbitrary number of statements can be used
All newer imperative languages (and updates of older languages) either use compound statements or allow multiple statements within the if
3) Nesting
It logically follows that a statement within an if clause or else clause could be another if statement
Remember orthogonality
What issues occur in this case?

Only problem of interest is one we have already discussed

If the number of if clauses and the number of else clauses are not equal, how are they associated?
There are two main approaches to handling this:

1) Use a rule (static semantics) to determine how this is handled
This is the approach taken in Pascal, C, C++ and Java: an else is associated with the nearest preceding unmatched if
The system handles the rule consistently, so there is no ambiguity, but, like rules of precedence and associativity, the programmer could forget it or make a mistake that is not caught
Can lead to logic errors
We have already seen this example
2) Use syntax to determine how it is handled

This is the approach taken in Ada, BASIC, Modula-2 and ALGOL 68
Every if statement must be syntactically terminated (ex: end if)
Now an inner if clause without an else clause must still have an end if, and syntactically the outer else can only be associated with the outer if
Perl has a slightly different approach: the statement for an if MUST be a compound statement. The result is the same, since the inner if will now be within a compound statement
Multiple Selection
Idea is to choose from many possible options
Clearly one way of doing this is through nested if statements

Often preferable, especially if the means of selection is a series of separate boolean expressions
// Break tie for A and B in some sport
if (A beat B twice) then A wins tie
else if (B beat A twice) then B wins tie
else if (A scored more points than B) then A wins tie
else if (B scored more points than A) then B wins tie
However, in some situations, the options are based on different result values of a single expression:
Ex: Menu in which user chooses an option from 1 to 5; each option causes a different action
In these instances, nested ifs could be used

In fact these are all we really need
But the nesting gets complicated, often making the statements harder to follow and more prone to logic errors
So many languages supply a case statement

Specifically designed for multiple alternative selection based on different results of a single expression
There are some interesting issues to consider here

Many are the same as for two-way selection
The text discusses them at length
A few that we will look at

What happens after the code for the matched selection is executed? One option is to break out of the structure, continuing with the next statement after it
This makes each option mutually exclusive
This approach is taken by Algol W, Pascal and Ada
Probably the most intuitive idea: the choices are mutually exclusive by default
C, C++ and Java do not automatically break out after the selection has been executed
This is good and bad (as usual)
Adds flexibility
If the execution for one selection is a superset of another, it makes sense to allow the flow to continue within the selection statement
Causes potential logic problems

Programmer must manually add breaks
If one is missed, no syntax error occurs
What happens if no match is found?

Two logical alternatives:
1. Do nothing
2. Error
C, C++, Java adopt the do-nothing approach

Seems logical that if nothing matches nothing should be done
ANSI Standard Pascal and Ada adopt the error approach

More reliable, since now an accidental out of range value will be detected as an error rather than just a do nothing
C, C++, Java, Ada, Turbo Pascal, BASIC also provide a default choice
Good idea to always use so you can detect an out of range value without causing a runtime or logic error
Iteration
Three primary types of iterative loops: conditional loops, counting loops and arbitrary loops
1) Conditional (logically controlled) loops
Number of iterations is determined by a boolean condition, and cannot be (usually) precalculated
Note that we cannot predict when this condition will become false
ex: while (infile && valid == 1)
Many languages have two versions of the conditional loop

Pretest: condition is tested prior to each execution of the loop body
May execute loop body 0 times
Posttest: condition is tested immediately after executing the loop body

Will always execute loop body at least 1 time
Ada does not have this version
Two versions are provided for convenience: we can always simulate one loop with the other (plus some conditionals)
See loops.cpp. The real difference is where each is more appropriate
Conditional loops are the most general kind of loops, and are really all that is needed in an imperative programming language
However, many looping applications deal with arrays and sequences of values
For convenience and efficiency it is prudent to provide a looping structure geared toward these applications
2) Counting Loops (counter-controlled loops)

Number of iterations determined by a control variable, an initial value, a terminal value, and an increment
We can (usually) precalculate the number of iterations based on the initial value, terminal value and increment
Ex: for (int i = 3; i <= N; i += 2) { ... }
i obtains values 3, 5, 7, ..., N (or N - 1 if N is even)
For N = 31, the number of iterations equals CEILING((TERM - INIT + 1)/INCR), i.e. CEILING((N - 3 + 1)/2) = CEILING((31 - 3 + 1)/2) = CEILING(14.5) = 15
Precalculation is nice because it allows the computer to base the loop on an iteration count (if it chooses to do so) which can be executed more quickly than conditional testing each time
Machine can use a register for the iteration count and not have to worry about obtaining operands for the comparisons at each iteration of the loop, something that must be done with a conditional loop
To allow precalculation and iteration counts to work, some restrictions must be made on the loop
Loop control variable cannot be altered by the programmer within the loop body
Terminal value must be calculated only one time, when the loop is first entered
It will also speed things up if the loop control variable is an integer (or integral type) so no float operations are necessary
This is the approach taken in Pascal and Ada

See for.p
Pascal and Ada also do not allow an increment other than 1 or -1, and do not carry the value of the control variable past the end of the loop
In Pascal, the value is officially undefined, but in any Pascal implementation it will typically be one of two things: 1) the terminal value of the loop or 2) the terminal value + 1 (or - 1). Case 1) typically indicates that iteration counts are being used
In Ada, the loop control variable is implicitly declared in the loop header and becomes truly undefined at the end of the loop: accessing it afterward will cause an undeclared variable error
This is now generally accepted as a good idea, since it reduces the side-effect problems of using loop control variables that were declared and assigned elsewhere. C++ and Java both allow (but do not require) this as well
Attitude in Pascal and Ada is that if you want more complex iteration (ex. increment other than 1 or -1, option of changing the number of iterations during the loop's execution) you should use a while loop
C, C++ and Java have a different approach

For loop is not really a for loop in the traditional sense
It is a very general loop that can be used for any looping application
It is more accurately a while loop with the addition of an initialization statement and a post-body statement
for (init-expr; pretest-expr; post-body-expr)
Now really anything goes: the pretest-expr and post-body-expr are evaluated on each iteration of the loop. It can certainly be used for a counting loop, as most of you have used it
Can also be used as an arbitrary loop to do more or less whatever the programmer wants it to do
Added flexibility, with added danger: the usual for C and C++. See for.cpp
"foreach" loop
Newer languages also have included a "foreach" loop to iterate through data
Key difference between "for" and "foreach"
"for" iterates through indexes (typically), which can be used to access an array / collection if desired
Loop control variable is typically an integer
"foreach" iterates through the values in the collection directly

No indexing is used, at least not directly
Loop control variable is the data type we are accessing in the collection
foreach loop has its advantages and disadvantages

Advantages:
Since no counter is used, we eliminate the possibility of index out of bounds problems
We can iterate over a collection without having to know the implementation details of the collection
Allows for data hiding and improves error prevention
We will likely discuss this more when we discuss object-oriented programming
Disadvantage
When accessing an array, we may want or need the index value
Ex: What if we want to change the data in the array or reorganize it
Ex: Sorting would be difficult using "foreach"
See forEach.java and foreach.pl
Control Statements
3) Arbitrary Loops
Now the loop is basically an infinite loop, with the programmer expected to break out of it explicitly at some point
Ada allows this with the loop ... end loop construct:
loop
   ...
end loop;
The exit statement will break out of the loop, and can be put into an if statement
Thus we can break out of the loop from more than one place
Although C, C++ and Java do not explicitly have this construct, you can certainly build it by making a while or for loop an infinite loop and using the break statement to break out
while (1)      // C
{
   ...
}
while (true)   // Java
{
   ...
}
Again this feature adds flexibility, but makes code less readable and harder to debug
Unconditional Branching
Transfer execution from one section of code to another section of code
Commonly known as the goto
Used extensively in early languages which lacked block control structures
Ex. early FORTRAN and BASIC programs relied heavily on the goto
It was necessary then, but most modern languages contain block control structures
Even then computer scientists were aware of how problematic they could be
Spaghetti code that results is very difficult to read
Modification of one code segment can significantly impact many parts of the program: the programmer must be aware of all places that can go to that code segment
Debugging is very difficult: it is hard to find and fix logic errors since all possible execution paths are difficult to trace
Now languages have blocks and extensive control structures

It has been shown that goto adds no functionality (i.e. nothing can be done with it that cannot be done without it)
However, many languages still have goto
Unrestricted goto allows code segments that normally have only one entry and exit point to have many
Ex: What happens if you jump into the middle of a procedure (what about parameters?) or a while loop (condition is skipped)
Most newer languages that have the goto have restrictions on it

Ex: Cannot jump into an inactive statement or block in Pascal
If restricted and used infrequently, it can actually be useful in some languages
Ex: Pascal does not have a break statement. If an exceptional situation would cause an exit from a loop, using a goto may be more readable than adding extra convoluted logic
Some (newer) languages do not have goto at all

Ex: Java
Allows breaks from loops
Has exception handlers
Subprograms
Semi-independent blocks of code with the following basic characteristics:
Only one entry point (the beginning of the subprogram), and executes when called:
Parameter information is passed to the subprogram
Caller execution is temporarily suspended, and the subprogram executes
When the subprogram terminates, caller execution resumes at the point directly following the subprogram call
What types of subprograms can we have?

Most languages have two different types, procedures and functions
Procedures can be thought of as new named statements that can supplement the predefined statements in the language
Ex: Statements to search or sort an array
Once defined, these can be used anywhere they are needed in a program
In order to have an effect on the overall program, a procedure needs to act on something other than just the variables local to the procedure. This can be done through:
Outputting data to the display or to a file
Altering a (relatively) global variable that will be accessed/used later by a different part of the program
Altering formal parameters such that the actual parameters in the caller are modified
This will be discussed in more detail soon
Functions can be thought of as code segments that calculate and return a single result
Modeled after math functions
Used within expressions, where the result value is substituted for the call
The effect of functions on the overall program is the value returned by them. Thus, from an ideal (and mathematical) point of view, functions should have NO OTHER effect on the overall program
Should NOT modify global variables
Should NOT alter actual parameters
Naturally, both of the above are allowed in many languages

In these cases it is up to the programmer to decide how he/she wants to use functions
Again, the tradeoff for the increased flexibility is more potential for logic errors and more difficulty in debugging
C/C++/Java
Only have functions, no procedures
void functions can mimic the behavior of procedures

Local variables
How/when are they allocated?
Stack-dynamic:
Default in most modern imperative languages
Required for recursive calls, since memory must be associated with each call, not each subprogram
Ex: Binary Search
mid = (left + right)/2;
Many different values for mid must be able to coexist, one for each call on the run-time stack
Could not do this if memory was statically allocated
Overhead is time for allocation and deallocation each time a subprogram is called
May not seem like a lot of time is needed, but it can add up if many calls are made in a program
Access must be indirect since actual memory location of variable will not be known until a subprogram call is made
Location in run-time stack depends upon calls made prior to the current one, which can differ from run to run
Also adds some time overhead
Static:
Used in languages that do not support recursion (ex. older FORTRAN)
Also optional in other languages, such as C and C++
Allow variables to retain values from call to call
Remember the lifetime is the duration of the program
Ex: In the CS1501 LZW algorithm, when writing codewords to a file, the bit buffer is static
The leftover bits are kept in the buffer for the next call
Parameters
Parameters are vital to subprograms
Allow information to be:

Passed IN to the subprogram
Passed OUT from the subprogram
Passed IN and OUT to and from the subprogram
When writing subprograms, programmer decides which is required for a given subprogram
Then programmer utilizes syntax/rules in language being used to achieve the desired option
Sometimes the syntax/rules of the language do not fit exactly with the 3 use options given
In these cases the programmer must be careful to use the parameters as he/she intends
Some definitions:
Formal Parameter:
Parameter specified in the subprogram header
Only exists during the execution of the subprogram
Sometimes called just "parameter"
Actual Parameter:
Parameter specified in the call of the subprogram
May exist outside of the scope of the procedure
Sometimes called just "argument"
Rules for Formal and Actual parameters differ, as we will discuss
Parameter Passing Options

Pass-by-Value
Pass-by-Reference
Pass-by-Result
Pass-by-Value-Result
Pass-by-Name
You should be familiar with Pass-by-Value and Pass-by-Reference
Others may be new to you; we'll discuss each
Pass-by-Value
Formal parameter is a copy of the actual parameter
i.e. get r-value of actual parameter and copy it into the formal parameter
Default in many imperative languages

Only kind used in C and Java
Used for IN parameter passing
Actual can typically be a variable, constant or expression

Benefit is that actual parameters cannot be altered through manipulation of the formals
Also useful in some recursive calls, since a new copy is made with each call
Problem is that copying a parameter can be quite expensive, both in terms of time and memory
Ex: Consider an object with an array of 1000 floats
The object is copied with each call to the function
If, for example, recursive calls are made, a lot of memory can be consumed very quickly
Implementation:
Using a run-time stack, this is straightforward
When the subprogram is called, a copy of the actual parameter is placed into a local variable, which is stored on the run-time stack (in the activation record for the subprogram)
During subprogram execution, the formal parameter is used like any other local variable for the subprogram
Only difference is that it is initialized via the actual parameter
Pass-by-Reference
Formal parameter is a reference to (or address of) the actual parameter variable
get l-value of actual param and copy it into the formal param, then access the actual param indirectly through the formal param
Used in Pascal (var parameters), in C (using explicit pointers) and in C++ and PHP (&)
Most appropriate for IN and OUT parameter passing, but can be used for all
Actual param usually restricted to a variable
Benefit is that we can change or not change the actual parameter using the formal; it is up to the programmer
Also good that memory is saved: only an address is copied
Problem is that we can miss logic errors if we accidentally alter an actual parameter through the formal parameter
Also some applications (ex: some recursion) don't work as well
We may not want a change in one call to affect another call
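A short C++ sketch contrasting a C++ reference parameter with the explicit-pointer version that C requires (all names are illustrative).

```cpp
// C++ reference parameter: 'x' is bound to the caller's variable (its l-value),
// so every use of x reads or writes the actual parameter.
void incrementRef(int& x) {
    x = x + 1;
}

// C has no reference parameters; the same effect is obtained by passing
// the address explicitly and dereferencing it.
void incrementPtr(int* p) {
    *p = *p + 1;
}

// Both parameters are used for IN and OUT at once.
void swapValues(int& a, int& b) {
    int temp = a;
    a = b;
    b = temp;
}
```

A call such as `incrementRef(v)` changes `v` itself, which is exactly what makes accidental alterations through the formal possible.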
Constant Reference Parameters

Developers of C++ realized that value parameters are not practical for large data objects (too much time and memory, esp. for recursive algorithms)
Reference parameters have the danger of accidental side effects (when used for IN parameters)
Solution is to pass parameters by reference, but not allow them to be altered: constant reference
Now the compiler gives an error if the parameter is changed within the subprogram
Passing it on to another sub by value makes a copy; passing it to a non-const reference parameter is rejected by standard C++ compilers
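A minimal C++ sketch of a constant reference parameter (function names are illustrative): only an address is passed, yet any attempt to modify the actual is a compile-time error.

```cpp
#include <vector>

// Constant reference: no copy of the elements is made,
// but the compiler rejects any attempt to modify the actual through it.
float sumByConstRef(const std::vector<float>& values) {
    float total = 0.0f;
    for (float v : values) {
        total += v;
    }
    // values.push_back(0.0f);  // would not compile: 'values' is const
    return total;
}

// Passing the parameter on to another sub by constant reference is allowed.
float average(const std::vector<float>& values) {
    return values.empty() ? 0.0f : sumByConstRef(values) / values.size();
}
```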
Good concept, but not perfect

Programmer can get around it by casting to a pointer and altering indirectly
See params.cpp
Ada IN parameters have a similar idea

Cannot be assigned/altered within the function
Cannot be passed as out or in out to another sub
More on Ada params shortly
Implementation:
Using run-time stack, address of actual is stored in activation record
Actual is accessed indirectly in sub through its address
Pass-by-Result
Reference parameters are not an exact fit for out parameters
Ex: A procedure designed to read data from a file into an object
Here we don't care about what used to be in the object; we just want to be sure that at the end the appropriate value is assigned
With reference parameters we COULD access the old value and use it if we wanted to (or by mistake)
Pass-by-Result prevents this
In Pass-by-Result, the actual parameter is not actually passed to the subprogram; it only waits to have a value passed back to it
Formal parameter is a local variable
During life of subprogram its value does not affect actual parameter at all
At end of subprogram its value is passed back to the actual parameter
So what is actually needed of actual parameter is its address (l-value)

When address is obtained can affect result for some contrived examples
// Note: This is NOT real code
int A[8];
for (int i = 0; i < 8; i++)
    A[i] = i;
global int j = 2;
foo(A[j]);
output(A[]);

sub foo(int param)
{
    int temp = 25;
    j = 5;
    param = temp;
}
-----------------------------------------------
Output: 0 1 25 3 4 5 6 7    // if address obtained at call
Output: 0 1 2 3 4 25 6 7    // if obtained at return
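C++ has no pass-by-result mode, so this sketch spells out the mechanism by hand for the foo(A[j]) example; the function name and the addressAtCall flag are hypothetical. The flag selects whether the actual's l-value is computed at the call or at the return.

```cpp
#include <array>

std::array<int, 8> simulateFooByResult(bool addressAtCall) {
    std::array<int, 8> A{};
    for (int i = 0; i < 8; i++) A[i] = i;
    int j = 2;                              // plays the role of the global j

    // Call: at most the l-value of the actual is taken; its value is not copied in.
    int* actual = addressAtCall ? &A[j] : nullptr;
    int param;                              // formal: an uninitialized local

    // Body of foo
    int temp = 25;
    j = 5;
    param = temp;

    // Return: the formal's final value is copied back to the actual's address.
    if (!addressAtCall) actual = &A[j];     // l-value computed now, with j == 5
    *actual = param;
    return A;
}
```

With `true` the array becomes 0 1 25 3 4 5 6 7; with `false` it becomes 0 1 2 3 4 25 6 7.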
If used, address is typically obtained at call
Ada 83 out parameters for simple types are ALMOST this, but the formal parameter value cannot be accessed within the sub (so it is not really a local variable)
Ada 95 changed out parameters to allow them to be accessed, fitting the Pass-By-Result model more closely
Implementation:
At sub call, actual param address is calculated and stored in run-time stack, as is the formal param (as a local)
Final result of formal is copied back to actual address at end of sub
Pass-by-Value-Result
Now the actual parameter's value is passed to the formal parameter when the subprogram is called, being stored and used as a local variable
At the end of the subprogram the value is passed back to the actual parameter
As the name indicates, this is a combination of Pass-by-Value and Pass-by-Result
Used for IN and OUT parameters
If aliasing is NOT allowed/used, and if no exceptions occur in the subprogram, the effect of value-result and reference is the same
Precondition: Actual parameter has value obtained previous to call
During subprogram: Only formal parameter is accessed, updated as desired

Postcondition: Actual parameter has last value assigned within subprogram
However if aliasing is allowed/used, there can be differences

Ex: Actual parameter is accessed directly as a global variable and is also passed to the sub as a parameter
With reference params, changes to the formal immediately change the global actual param
With value-result params, changes to the formal do not affect the global actual param (until the sub terminates)
Ada uses value-result for simple IN OUT parameters

But in Ada 83 it is not specified how structured in out params are passed
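A C++ sketch of the aliasing difference. C++ itself has no value-result mode, so it is simulated with an explicit copy-in/copy-out; the global `g` and both function names are hypothetical.

```cpp
int g = 0;  // global that is also passed as the actual parameter (an alias)

// Reference: x IS g, so the write to x is visible through g immediately.
void updateByReference(int& x) {
    x = x + 10;
    g = g * 2;      // g already holds x's new value
}

// Value-result, simulated: copy in, work on a local, copy out at the end.
void updateByValueResult(int& actual) {
    int x = actual; // copy in
    x = x + 10;     // global g is not affected yet
    g = g * 2;      // uses g's old value
    actual = x;     // copy out overwrites g
}
```

Starting from `g = 1`, the reference version leaves `g == 22`, while the value-result version leaves `g == 11`.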
Idea is that language creators did not want to require the params to be passed in any specific way
They just wanted to require the in-out effect
If the result could differ based on whether params are value-result or reference, then the program is erroneous
Up to programmer to NOT use aliases
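Ada 95 clarified the rules: elementary types are passed by copy, tagged (and some limited) types are passed by reference, and for other composite types the choice is still left to the implementation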

See params.adb
Implementation:
Value + Result
Pass-by-Name
Definitely wackiest way of param passing
Used for IN and OUT parameters, and only in Algol

Idea is that actual parameter is textually substituted for the formal in all places that it is accessed in the subprogram
Kind of like a macro substitution
It is only evaluated at the point of use in the subprogram

Evaluated EACH TIME it is used in subprogram
Thus the parameter value or address could change based on where/when in the subprogram it is evaluated
However, the referencing environment used is that of the CALLER, not of the subprogram
So only changes within the subprogram that have a global effect will change its evaluation
This also makes implementation more difficult
For simple variables this is equivalent to pass-by-reference

Variable address evaluates the same way regardless of where in the subprogram it is located
For constant expressions, this is (almost) equivalent to pass-by-value

Evaluation of constant expr. will not change from one part of the subprogram to another
But cannot assign a new value to the formal param unless a copy is made
But it gets wacky when array elements or variable expressions are passed
Now changes within the subprogram can affect the index of the array or a variable within the expression
Can cause evaluation to differ in different parts of the subprogram
global int i = 0, var = 11, n = 5;
global int A[2] = {4, 8};
foo(var, 2*n, A[i]);   // all pass by name

void foo(int x, int y, int z)
{
    x = x + 1;
    output(var);
    output(y);
    n = n + 1;
    output(y);
    output(z);
    z = z + 1;
    output(z);
    i = i + 1;
    z = z + 1;
    output(z);
}
Implementation:
It is not trivial to allow the macro to be evaluated and reevaluated in the environment of the caller
Parameterless subprograms called thunks are used
Thunk evaluates parameter in current state of caller's referencing environment
Returns the resulting address or value
Clearly this is a lot of overhead
Overhead and confusing results are why this is not used in newer languages
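The thunk mechanism can be sketched in C++ with lambdas standing in for thunks. The globals mirror the pass-by-name example; the body of foo is an illustrative sequence of statements, and `out` stands in for output(...). Each use of a formal calls its thunk, so the actual is re-evaluated in the caller's environment every time.

```cpp
#include <functional>
#include <vector>

// Globals of the caller, as in the pass-by-name example.
int i = 0, var = 11, n = 5;
int A[2] = {4, 8};

// Assignable actuals yield an l-value (int&); an expression such as 2*n
// can only yield a value, so it cannot be assigned through its formal.
using LvalueThunk = std::function<int&()>;
using ValueThunk = std::function<int()>;

std::vector<int> foo(LvalueThunk x, ValueThunk y, LvalueThunk z) {
    std::vector<int> out;
    x() = x() + 1;           // var becomes 12
    out.push_back(var);      // 12
    out.push_back(y());      // 2*n = 10
    n = n + 1;
    out.push_back(y());      // re-evaluated: 2*n = 12
    out.push_back(z());      // A[0] = 4
    z() = z() + 1;           // A[0] becomes 5
    out.push_back(z());      // 5
    i = i + 1;
    z() = z() + 1;           // z now denotes A[1]: 8 + 1
    out.push_back(z());      // 9
    return out;
}

// The caller builds one thunk per actual parameter of foo(var, 2*n, A[i]).
std::vector<int> callFooByName() {
    return foo([]() -> int& { return var; },
               []() { return 2 * n; },
               []() -> int& { return A[i]; });
}
```

The overhead is visible even in this sketch: every read or write of a formal is a call through a std::function.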
Subprograms as Parameters
We allow variables as parameters so that we can access their values (or addresses) from within a subprogram
Why not allow subprograms so that we can execute them from within a subprogram?
Some languages do allow this (ex. Pascal, C++, PHP)
However, there are some issues to consider
Can the parameter subprogram arguments differ in form from each other?
If so, how to type check and even check the number of arguments when the subprogram is actually called?
Easiest solution is to require the arguments to all have the same form
Header of parameter subprogram must be given within the header of the subprogram it is being passed to
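In C++ the form of a subprogram parameter is fixed by its declared type, which is one way of giving the parameter's header inside the receiving subprogram's header. A small sketch with hypothetical functions:

```cpp
// The header of the parameter subprogram, int(int), is part of applyTwice's
// own header, so every call through f can be type checked at compile time.
int applyTwice(int (*f)(int), int x) {
    return f(f(x));
}

int addThree(int v) { return v + 3; }
int square(int v) { return v * v; }
```

`applyTwice(addThree, 1)` yields 7; passing a function with a different header, such as `double(double)`, is rejected by the compiler.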
Scope is also an issue: what is the referencing environment of the subprogram that is being passed as a parameter?
Three reasonable possibilities exist:
1) The referencing environment in which the parameter subprogram is CALLED: shallow binding
2) The referencing environment in which the parameter subprogram is DEFINED: deep binding
3) The referencing environment in which the parameter subprogram is PASSED as an argument: ad hoc binding
Note that shallow binding fits well with dynamic scoping and deep binding fits well with static scoping
Pascal and C++ both use deep binding
Shallow binding is used by SNOBOL, which also uses dynamic scoping
Ad hoc binding has never been used
See fnparams.cpp
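C++ has no nested functions, but a lambda behaves like a subprogram defined inside another one, which makes deep binding visible. A minimal sketch with hypothetical names: the lambda uses the `x` of the environment where it was defined, not the `x` of the subprogram that calls it.

```cpp
#include <functional>

// Calls its parameter subprogram from inside its own referencing environment.
int callFromHere(const std::function<int()>& f) {
    [[maybe_unused]] int x = 100;  // under shallow binding, f would see this x
    return f();
}

int makeAndPass() {
    int x = 1;                         // x where the lambda is DEFINED
    auto getX = [&x]() { return x; };  // deep binding: refers to this x
    return callFromHere(getX);         // yields 1, not 100
}
```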
Overloading (ad hoc polymorphism)

Using the same subprogram name with different parameter lists
When a subprogram is called, the compiler selects the correct version based on the parameter lists
In Ada, return type for a function is also used, since coercion is not done in Ada and function return values cannot be ignored
Enables programmer to use the same name for similar functions that take different argument types
Use: Make it easier for the programmer to use consistent names for subprograms
Without overloading: Programmer must make up different but similar names for subprograms that do similar things but for different types
Ex: abs(int) fabs(double) labs(long)
Ex: ISort(int * A) FSort(float * A)
With overloading: Programmer uses the same name and the compiler decides which to use
Ex: abs(int) abs(float) abs(long)
Ex: Sort(int * A) Sort(float * A)
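A small C++ sketch of overloading; the name `absValue` is hypothetical (chosen so it does not clash with the standard `abs`). One name covers all three types, and the compiler picks the version whose parameter list matches the argument.

```cpp
// Without overloading, C needs distinct names: abs, fabs, labs.
// With overloading, one name is used for every numeric type here.
int absValue(int v) { return v < 0 ? -v : v; }
long absValue(long v) { return v < 0 ? -v : v; }
double absValue(double v) { return v < 0 ? -v : v; }
```

`absValue(-3)` calls the int version, `absValue(-4L)` the long version and `absValue(-2.5)` the double version.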
But programmer must be careful:

Ada and C++ both allow overloading and default parameters
Leaving out some parameters in the call could make a call ambiguous
i.e. it matches more than one function header
Call can also be ambiguous if implicit casting of arguments is done
Operator Overloading is the same idea, but with symbols rather than identifiers
We discussed these issues previously
See Slide 12 of cs1621b.ppt
Generics
Parametric polymorphism
One or more parameters are passed to a subprogram when it is instantiated (i.e. when the code is generated) indicating the types that will be used for the parameters in the subprogram call
Can also be used in conjunction with packages (Ada) and classes (C++)
Thus a single subprogram declaration can be used to generate many different callable subprograms, all with the same functionality
Motivation:
Programmers often apply data structures and algorithms to more than one data type
Ex. Sorting, Searching algos
Ex. BST, PQ, Stack, Queue data structures
Even with overloading, the programmer must still write different (identical except for type) versions of the code
Generics simply transfer the job of making the different versions from the programmer to the compiler: they automate the overloading process
Note that DIFFERENT VERSIONS of the code MUST STILL BE generated
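A minimal C++ template sketch (insertion sort is just an illustrative choice): the programmer writes one declaration, and the compiler generates a separate version for each element type the template is used with.

```cpp
#include <cstddef>
#include <vector>

// One generic declaration; insertionSort<int> and insertionSort<double>
// are generated as separate code when the template is instantiated.
template <typename T>
void insertionSort(std::vector<T>& a) {
    for (std::size_t i = 1; i < a.size(); ++i) {
        T key = a[i];
        std::size_t j = i;
        while (j > 0 && key < a[j - 1]) {  // requires operator< on T
            a[j] = a[j - 1];
            --j;
        }
        a[j] = key;
    }
}
```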
So the reason we have generics is to save the programmer some time (and perhaps some confusion)
Ada vs. C++:

In Ada, template instantiations must be explicit
Programmer specifies template arguments using the new statement
Ex: package int_io is new integer_io(integer);
The generic package is integer_io
The instantiated package is int_io
The type argument is integer
As is usual in Ada, if declaration is explicit, there will be no surprises

In C++, template instantiations can be explicit or implicit
Implicit: generated automatically by the compiler when a call is seen with the appropriate arguments
Duplicate instantiations are merged into a single code segment
Coercion cannot be done, since the types won't match the template correctly
Saves programmer some typing
Explicit: programmer declares each version

Coercion can be done using regular C++ promotion and conversion rules
Programmer is aware of each version
See template.cpp and tordlist.h
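A short C++ sketch of implicit versus explicit instantiation (template.cpp and tordlist.h are not reproduced; `largest` is a hypothetical example). Implicit deduction applies no coercion, while naming the version explicitly lets the normal conversion rules apply.

```cpp
template <typename T>
T largest(T a, T b) {
    return (a < b) ? b : a;
}

// Explicit instantiation: the programmer requests the double version by name.
template double largest<double>(double, double);

// Implicit: largest(3, 4) makes the compiler generate largest<int>.
// largest(3, 4.5) does not compile: T cannot be deduced as both int and
// double, because no coercion is applied during deduction.
// largest<double>(3, 4.5) compiles: with T fixed, 3 is converted to 3.0.
```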

Java Generics
In Java 1.5 "generics" were added to the language
It is somewhat misleading, since generic abilities were always built into the Java language
Collections were defined in terms of class Object, which is the superclass to other Java classes
They could be used to store any Java class
However, retrieving objects back from the collection required explicit casting to the actual type if we wanted full access to them
ArrayList A = new ArrayList();
A.add(new String("Wacky"));
String S = (String) A.remove(0);
Also any typing mistakes (mixing types in the collection unintentionally) could only be caught at run-time (via casting exceptions)
Overall not bad, but some people thought type parameters should be allowed
JDK 1.5 added syntax very similar to that for C++ templates
However, it is very different from C++ templates (and Ada generics as well)
It is not really adding any new generic abilities to the language
It is not creating new code for each version of the class or method
It is designed to make collections of objects more type-safe
See more details in the handout
Implementing Subprograms
What is involved when a subprogram is called, during its execution, and when it terminates?
This will differ depending on if recursion is allowed in a language or not
Most modern languages allow recursion, but original FORTRAN (up to FORTRAN 77) did not allow it
FORTRAN 77 (and before)

All variables within a subprogram were static, and recursive calls were not allowed
Activation records were still used, but they also could be static
Since all data was static, the size was known at compile time
Run-time stack not needed, since at most one call per sub could be performed at a time
What do we need to know when a subprogram is called?

Return Value (if sub is a function)
Local Variables (static)
Parameters (like local variables that are initialized)
Return Address (where to go back to when subprogram ends)
C, C++ and Java

To allow for recursive calls, a run-time stack is used
Multiple activations of the same subprogram can co-exist
Each needs its own copy of parameters and local variables
But subs are not allowed to be directly nested

The only non-local variables that need to be accessed are global variables
However, inner classes allow a nesting of sorts
So the activation record looks similar to that used in FORTRAN

With additional link location to access global variables
Now multiple instances of an activation record can occur at the same time, so they must be created dynamically (at run-time), unlike in FORTRAN
Let's look at some of the contents of an activation record
Temporaries
Local Variables
Parameters
Dynamic Link (to previous call)
Static Link (to non-locals)
Return Address
Temps and local variables are allocated within the subprogram call. In Pascal, C and C++, the local variables must be of fixed size. In Ada, they can be variable size (ex. arrays)
Parameters, links to non-locals and the return address are placed into the AR by the caller of the subprogram, so they are lower in the record
See rtstack.cpp
Accessing non-local variables within a subprogram

Local variables are located within the activation record (AR)
Can be accessed by knowing the base address of the AR plus a local_offset for each variable
Ex: Base address of AR = 162
int x, y[5];   // address of x is 162 + (other AR stuff)
float z;       // address of z is 162 + (other AR stuff) + 4 + 20
               // (4 bytes for x, 20 for y[5], assuming 4-byte ints)
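The same offset arithmetic can be checked with a C++ struct standing in for the locals part of an activation record. This is a hypothetical model: it assumes 4-byte int and float and no padding, which holds on common platforms but is not guaranteed by the language.

```cpp
#include <cstddef>

// Locals in declaration order, laid out at fixed offsets known at compile time.
struct Locals {
    int x;      // local_offset 0
    int y[5];   // local_offset 4
    float z;    // local_offset 4 + 5*4 = 24
};

constexpr std::size_t offsetOfY = offsetof(Locals, y);
constexpr std::size_t offsetOfZ = offsetof(Locals, z);
// Address of z = base of AR + (other AR stuff) + offsetOfZ
```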
Non-locals are located elsewhere

For languages like C and C++:
Subprograms cannot be nested
Besides locals there are global variables
For languages like Ada and Pascal:

Subprograms can be nested to arbitrary depth
A sub can be declared within a sub, which is within a sub, which is within a sub
Using static scope, variables declared in a textual parent sub are accessible from an inner sub
Relative global variables
But the variable locations could be in different places on the run-time stack
How to find them?
What do we need to do?

1) Locate the AR that contains the nonlocal
2) Find where in the AR the variable is located
Finding where in the AR to look is the same as for local variables

Keep track of a local_offset value for the variable
Locating the AR is a different story

May not be directly prior to current AR
Two techniques used to locate AR
1) Static links
A link is kept in an AR to that AR's textual parent (from the declaration)
To access a single nonlocal many links may be crossed
2) Display
A single array is kept to indicate all of the currently accessible nested subs
Any nonlocal can be accessed with two indirect accesses

Static Links
Due to rules of static scope, if a subprogram is called, its textual parent subprogram MUST be active
sub foo {
    sub fum { }
}
main {
    // cannot call fum directly
}
However, textual parent does NOT have to be previous call on run-time stack
So dynamic link in AR is not enough (but would work for dynamic scoping)
sub foo {
    sub innerA { }
    sub innerB { innerA; }
    innerB;
}
main { foo; }
Run-time stack while innerA executes (top first): innerA, innerB, foo, main
Static links connect an AR to the AR of the sub's textual parent, no matter where previously on the RT stack it is
How is this used to access nonlocal variables?
Can be determined and maintained based on the nesting depths of the subprograms that are called
The difference in the nesting depths between the sub using a nonlocal variable and the sub in which the nonlocal is declared is equal to the number of static links that must be crossed to find the correct AR for the variable
This difference can be stored for each variable when the program is compiled, so that at run-time finding the variable is simple
sub parent {
    var X, Y
    sub child1 {
        var X, Z
        sub grand1 {
            var Z
        }
    }
    sub child2 {
        var Y
        call child1
    }
}
main { call parent }
If variable Y is accessed within grand1

chain offset is 2, since Y is declared two levels outside grand1
so the search for Y only has to be done once, at compile-time
at run-time we know to follow two static links, whatever the call sequence is
What actually happens when a sub is called?
AR for textual parent of sub must be located on the run-time stack, so that the static link can be linked to it
A clear (but inefficient) way to do this is to follow dynamic links down the RTS until the AR for the parent sub is found
A better way can take advantage of the fact that the calling sub and the called sub must be relatives in the declaration tree
Calling sub could be parent of called sub (but not grandparent)
Calling sub could be called sub (direct recursion)
Calling sub could be a sibling of called sub
Calling sub could be a descendant of called sub (indirect recursion)
Calling sub could be a niece of called sub
So instead of following dynamic links, at compile-time we can pre-calculate the number of static links (from caller) to follow to find the appropriate textual parent AR
Always equal to: nesting_depth(calling sub) - nesting_depth(called sub) + 1
Calling sub could be parent of called sub
X - (X+1) + 1 = 0 static links (use caller's AR)
Calling sub could be called sub (direct recursion)
X - X + 1 = 1 static link: same textual parent
Calling sub could be a sibling of called sub
X - X + 1 = 1 static link: same textual parent
Calling sub could be a descendant of called sub (indirect recursion)
Calling sub could be a niece of called sub
Follow diff. in nesting depth + 1 static links
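The static-chain mechanics can be sketched in C++ over a simulated run-time stack. The struct and function names are hypothetical, and AR "addresses" are just vector indices. pushCall sets the new AR's static link with the depth formula, and fetch follows chain_offset static links and then indexes by local_offset. The example assumes child1 calls grand1 in the parent/child1/child2/grand1 program.

```cpp
#include <utility>
#include <vector>

struct ActivationRecord {
    const char* sub;
    int nestingDepth;
    int staticLink;           // index of the textual parent's AR (-1 for main)
    std::vector<int> locals;  // locals in declaration order
};

// The caller is the AR on top of the stack. Follow
// depth(caller) - depth(callee) + 1 static links to find the callee's parent.
void pushCall(std::vector<ActivationRecord>& stack, const char* sub,
              int calleeDepth, std::vector<int> locals) {
    int parent = static_cast<int>(stack.size()) - 1;
    int hops = stack[parent].nestingDepth - calleeDepth + 1;
    for (int k = 0; k < hops; ++k) parent = stack[parent].staticLink;
    stack.push_back({sub, calleeDepth, parent, std::move(locals)});
}

// Access with a compile-time (chain_offset, local_offset) pair.
int& fetch(std::vector<ActivationRecord>& stack, int chainOffset, int localOffset) {
    int ar = static_cast<int>(stack.size()) - 1;  // current AR
    for (int k = 0; k < chainOffset; ++k) ar = stack[ar].staticLink;
    return stack[ar].locals[localOffset];
}

// Call sequence: main, parent, child2, child1, grand1.
std::vector<ActivationRecord> buildExampleStack() {
    std::vector<ActivationRecord> stack;
    stack.push_back({"main", 0, -1, {}});
    pushCall(stack, "parent", 1, {10, 20});  // X = 10, Y = 20
    pushCall(stack, "child2", 2, {30});      // Y = 30
    pushCall(stack, "child1", 2, {40, 50});  // X = 40, Z = 50
    pushCall(stack, "grand1", 3, {60});      // Z = 60
    return stack;
}
```

In grand1, `fetch(stack, 2, 1)` reaches parent's Y (20), not child2's Y (30), even though child2's AR is closer on the stack.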
procedure Bigsub is
   procedure A(Flag: Boolean) is
      procedure B is
         ...
         A(false);
      end; -- B
   begin -- A
      if Flag then
         B;
      else
         C;
      end if;
   end; -- A
   procedure C is
      procedure D is
         -- here
      end; -- D
      ...
      D;
   end; -- C
begin -- Bigsub
   A(true);
end; -- Bigsub
Run-time stack when execution reaches "here" in D (top of stack first):
D: dynamic link to C, static link to C, return addr. to C
C: dynamic link to A, static link to Bigsub, return addr. to A
A: param flag (= false), dynamic link to B, static link to Bigsub, return addr. to B
B: dynamic link to A, static link to A, return addr. to A
A: param flag (= true), dynamic link to Bigsub, static link to Bigsub, return addr. to Bigsub
Bigsub: dynamic link to caller, static link, return addr.
Evaluation of static links

Maintaining is not too time-consuming
Chain offsets can be calculated at compile time
Local variables can be accessed directly
Non-locals must follow 1 or more static links

Works well if nesting depths do not get too deep
For deep sub nesting, cost of non-local access can be high

But usually 2 or 3 levels is max used
Display
Uses a single array to store links to ARs at all relevant nesting depths
To access a nonlocal at a given nesting depth, we just follow the display entry for that depth, then the local_offset
Never more than one link to follow
Array is updated as subs are called and as they terminate
Generally faster than static links if many nesting levels are used
We will skip the details here; read the text
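A minimal C++ sketch of a display, assuming the usual scheme of saving the replaced display entry on each call and restoring it on return (the struct and method names are hypothetical; ARs are modeled as vectors of locals).

```cpp
#include <utility>
#include <vector>

struct DisplayMachine {
    std::vector<std::vector<int>> stack;  // ARs on the run-time stack
    std::vector<int> display;             // display[d]: AR of active sub at depth d
    std::vector<int> saved;               // replaced entries, restored on return

    void call(int depth, std::vector<int> locals) {
        stack.push_back(std::move(locals));
        if (static_cast<int>(display.size()) <= depth) display.resize(depth + 1, -1);
        saved.push_back(display[depth]);  // remember the entry being replaced
        display[depth] = static_cast<int>(stack.size()) - 1;
    }

    void ret(int depth) {
        display[depth] = saved.back();    // reinstate the previous AR at this depth
        saved.pop_back();
        stack.pop_back();
    }

    // Any nonlocal: one display lookup, then the local_offset.
    int& fetch(int depth, int localOffset) {
        return stack[display[depth]][localOffset];
    }
};
```

Unlike the static-chain version, `fetch` never loops, no matter how deep the nesting is.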
Nested declaration blocks

Idea could be similar to nested subs
Blocks could be treated as parameterless subs
Static links could be used to determine textual parent
But it is actually much easier to handle, since block entry and exit is always the same
Parent block goes to child block
When child block terminates, we revert to parent block

Simply push new block declarations onto run-time stack, and pop them when block terminates
But we only have one activation record, so no links are required
"Non-locals" can be accessed just like locals
Dynamic Scoping
When a non-local variable is accessed, we always follow the dynamic links until the correct declaration is found
Clearly could differ depending upon call sequence
But the mechanics are actually simple
ARs must store names of local variables so we know where to stop the search
In static scoping the names are not needed, just the offsets
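A C++ sketch of the dynamic-scoping search, with hypothetical names: each AR stores variable names, and a lookup walks from the newest AR down the dynamic chain until the name is found.

```cpp
#include <string>
#include <utility>
#include <vector>

struct NamedAR {
    std::string sub;
    std::vector<std::pair<std::string, int>> vars;  // names are needed here
};

// Follow dynamic links (here: walk down the stack) until a declaration is found.
int* lookupDynamic(std::vector<NamedAR>& stack, const std::string& name) {
    for (auto ar = stack.rbegin(); ar != stack.rend(); ++ar) {  // newest first
        for (auto& var : ar->vars) {
            if (var.first == name) return &var.second;
        }
    }
    return nullptr;  // not declared anywhere along this call chain
}
```

With the call chain parent, child2, child1, grand1 from the static-link example, a reference to Y in grand1 now finds child2's Y, because child2's AR is the most recent one declaring Y.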
Data Abstraction
Procedural (process) abstraction:

Action can be performed without requiring detailed knowledge of how it is performed
Data abstraction:
New type can be used without requiring detailed knowledge of how it is implemented
We don't need to know the details of how it is stored in memory
We don't need to know the details of how it is manipulated via operations

More formally, an ADT must satisfy two conditions:

1) The declarations of the type and operations (interface) are contained in a single syntactic unit ENCAPSULATION
The interface does not depend on how the objects are represented or how the operations are implemented
2) The representation of the objects is hidden from users of the ADT DATA HIDING
Objects can only be manipulated via the provided interface
Ex: Stack
Data: something that can store and access multiple data values in the manner dictated by the operations
Operations:
Push: add new value to top of stack
Pop: remove top value from stack
Top: view top value (or a copy) without removing it
Empty: is stack empty?
User of stack only needs to know the parameters and effect of each operation to use a stack correctly
Implementation could be an array, a linked-list, or maybe something different
Does not affect use
Implementer can hide these details from the user through private declarations
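A minimal C++ sketch of the stack ADT (the class is illustrative): the operations form the public interface, while the representation is private and could be swapped for a linked list without changing any client code.

```cpp
#include <cassert>
#include <vector>

template <typename T>
class Stack {
public:
    void push(const T& value) { items.push_back(value); }  // add to top
    void pop() {                                           // remove top
        assert(!empty());
        items.pop_back();
    }
    const T& top() const {                                 // view top
        assert(!empty());
        return items.back();
    }
    bool empty() const { return items.empty(); }

private:
    std::vector<T> items;  // hidden representation (data hiding)
};
```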
The idea of data abstraction was not always supported by programming languages
Ex: FORTRAN, Pascal, C did not fully support either encapsulation or data hiding
When learning good programming style, users tried to "simulate" data abstraction
Logically group type definitions, procedures and functions together as a unit
Only access the data type via the procedures and functions
Naturally, this was at the programmer's discretion
See ADT.p
Newer languages added true data abstraction

Ada via packages
C++, Java, C#, Ada95 via classes / objects
Encapsulation units that contain all details of the new type
Access modifiers that prevent access to internal details of the ADT from outside the encapsulation unit
See text for more details

Object-Oriented Programming (OOP)
Characteristics of OOP
1) Data abstraction: encapsulation + information-hiding
The operations for manipulating data are considered to be part of the data type (encapsulated)
The implementation details of the data type (both the structure of the data and the implementation of the operations) are separate from their specifications and (possibly) hidden from the user
As we discussed with ADTs
2) Inheritance
The characteristics of an ADT (data + operations) can be passed on to a subtype
Subtype can also add new data and operations
Allows programmer to build new (derived) types from old (parent) ones
Common data/operations do not have to be rewritten (or copied)
Operations that are slightly different in derived type can be rewritten (overridden) for that type
New data/operations tailor the derived type to the problem at hand
Parent type is unchanged and may (sometimes) be used together with derived type
Ex: Shape class

Has data: CenterX, CenterY
Has operations: AREA, DRAW
Subclasses: Rectangle, Circle, Triangle
Each subclass inherits the data and operations from the Shape class
Rectangle adds data: length, width
Rectangle overrides AREA (= length * width), and DRAW in the appropriate way for a rectangle
Subclass of Rectangle: Square

Guarantees that length == width
Similar ideas for Circle and Triangle
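A C++ sketch of part of the Shape hierarchy (DRAW is omitted to keep it short, and the constructor signatures are assumptions). Rectangle inherits the center data, adds length and width, and overrides area; Square guarantees length == width by taking a single side length.

```cpp
class Shape {
public:
    Shape(double cx, double cy) : centerX(cx), centerY(cy) {}
    virtual ~Shape() = default;
    virtual double area() const { return 0.0; }  // generic shape: no area
    double getCenterX() const { return centerX; }
    double getCenterY() const { return centerY; }

protected:
    double centerX, centerY;  // inherited by every subclass
};

class Rectangle : public Shape {
public:
    Rectangle(double cx, double cy, double len, double wid)
        : Shape(cx, cy), length(len), width(wid) {}
    double area() const override { return length * width; }  // overridden

protected:
    double length, width;  // data added by Rectangle
};

// Only one side is accepted, so length == width always holds.
class Square : public Rectangle {
public:
    Square(double cx, double cy, double side) : Rectangle(cx, cy, side, side) {}
};
```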
3) Polymorphism
Variables of a parent class can also be assigned objects of a subclass (or subclass of a subclass)
Operations used with a variable are based upon the class of the object currently stored (could be a parent type object or a derived type object)
Operations may have been overridden in the derived class
Dynamic binding allows parent and derived objects to be used together in a logical way
Ex: Shape class

We could declare:
Shape[] shapelist = new Shape[100];
shapelist[0] = new Rectangle(0, 0, 10, 20);
shapelist[1] = new Square(50, 100, 30, 30);
shapelist[2] = new Circle(100, 50, 25);
for (int i = 0; i < 3; i++)
    shapelist[i].Draw();
Polymorphism allows these different objects to be accessed consistently within the same array
Think about how you could do the code above in C or Pascal
It would not be easy!
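In C++ the same polymorphic array must hold pointers (or references), since storing Shape objects by value would slice them. A minimal sketch with hypothetical classes, where draw returns a string instead of drawing:

```cpp
#include <memory>
#include <string>
#include <vector>

struct Shape {
    virtual ~Shape() = default;
    virtual std::string draw() const = 0;  // each subclass supplies its own
};

struct Rectangle : Shape {
    std::string draw() const override { return "rectangle"; }
};

struct Circle : Shape {
    std::string draw() const override { return "circle"; }
};

// One loop handles every kind of shape; each call is bound at run time.
std::vector<std::string> drawAll(const std::vector<std::unique_ptr<Shape>>& shapelist) {
    std::vector<std::string> drawn;
    for (const auto& s : shapelist) drawn.push_back(s->draw());
    return drawn;
}
```

Adding a Pentagon means adding one more subclass; drawAll does not change.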
One option: Make one giant struct or record to contain all of the data, including a union or variant
Base class would use only the core data items
Derived classes would use additional data items as provided in the union or variant
To do the operations, we would need a switch or case to test which type the variable is, so that it can be written out appropriately
Now what if we want to add another new derived class, Pentagon?

With OOP, it is simple to add any new data and override the necessary operations
Without OOP we would have to change the overall structure of the data and operations; old types would change, possibly causing problems
OO Languages
1) Smalltalk was the first and purest OOL
All data (even numeric literals) are objects, and are all descendents of class Object
Objects are all allocated from the heap, and implicitly deallocated (garbage collection)
Variables are references, with implicit dereferencing
Execution of a program (logically) involves objects sending messages to each other, executing methods, and responding back
So the data is driving the execution, not the control statements
Smalltalk example to count letters in an input string

| data ctr letters |
data := Prompter prompt: 'Enter your name' default:''.
ctr := 1.
letters := 0.
[ctr <= data size] whileTrue: [
    (data at: ctr) isLetter ifTrue: [ letters := letters + 1 ].
    ctr := ctr + 1.
].
letters printNl.
Note variables are not typed

Only type checking is that message sent to the object is recognized
Even blocks [] are objects

Evaluated when appropriate methods are called
Consider the while loop below

[ctr <= data size] whileTrue: [
    (data at: ctr) isLetter ifTrue: [ letters := letters + 1 ].
    ctr := ctr + 1.
].
Semantics of this loop are as follows:

whileTrue: is a message sent to the top block, with the second block as a parameter
The top block executes a method corresponding to the whileTrue: message that does the following:
Evaluates the top block
If true, evaluates the parameter block, then repeats
If false, exits the method
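The whileTrue: semantics can be modeled in C++ with lambdas playing the role of blocks. This is a sketch of the message's meaning only (the function names are hypothetical, and a real Smalltalk implementation does not recurse like this).

```cpp
#include <cctype>
#include <cstddef>
#include <functional>
#include <string>

// "Message" sent to a receiver block, with a body block as its argument.
void whileTrue(const std::function<bool()>& receiver,
               const std::function<void()>& body) {
    if (receiver()) {               // evaluate the receiver block
        body();                     // true: evaluate the parameter block
        whileTrue(receiver, body);  // then send the message again
    }                               // false: simply return
}

// The letter-counting loop rewritten with the model above.
int countLetters(const std::string& data) {
    std::size_t ctr = 1;            // Smalltalk strings are indexed from 1
    int letters = 0;
    whileTrue([&] { return ctr <= data.size(); },
              [&] {
                  if (std::isalpha(static_cast<unsigned char>(data[ctr - 1]))) letters++;
                  ctr++;
              });
    return letters;
}
```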
This propagation of messages can sometimes lead to very short code, if variables are eliminated
Equivalent to previous code:

| letters |
letters := 0.
(Prompter prompt: 'Enter your name' default:'') do: [ :c |
    c isLetter ifTrue: [ letters := letters + 1 ].
].
letters printNl.
Now we cascade the messages to allow fewer statements (also the do: loop iterates through the characters in a string, so we don't need the loop counter)
((Prompter prompt: 'Enter your name' default:'') select: [ :c | c isLetter ]) size printNl.
Now the select: loop generates a string based on the condition in the block
More on Smalltalk (classes and objects)

Data in an object can be an instance variable or a class variable
Instance variables are associated with objects
Separate data for each object
Accessible only through the methods defined for that object; always private to the class
Class variables are associated with classes

Shared data for all objects of the same class
Accessible from all objects, but still private to the class
Methods have a similar grouping, but are public

Instance methods: associated with objects
Class methods: associated with entire class
More on Smalltalk (inheritance)

Object base class of all others
Only single inheritance allowed

All inheritance is implementation inheritance
Data and methods of parent class are always accessible to the derived class
i.e. Cannot hide implementation details from derived class
Advantage: Derived class can likely implement its methods more efficiently with access to parent data
Disadvantage: Change in parent class implementation will likely require change in derived class implementation
Ex. Traversable stack
More on Smalltalk (polymorphism)

All messages are dynamically bound to methods
At run-time, when a message is received, the object's class is searched for a method, then, if necessary, its superclass, its super-superclass and so on up to Object
Variables have no types since they are only used to refer to objects, not to determine the messages an object can receive
Clearly some liabilities with this approach
Slows language down due to run-time overhead
Programmer type errors cannot be caught until execution time
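The run-time method search can be sketched in C++ with a method dictionary per class and a superclass link (ClassInfo and lookup are hypothetical names; methods take no arguments to keep it short). The search cost at every message send is the overhead mentioned above.

```cpp
#include <functional>
#include <map>
#include <string>

struct ClassInfo {
    std::string name;
    const ClassInfo* superclass;  // nullptr above Object
    std::map<std::string, std::function<std::string()>> methods;
};

// Search the receiver's class, then each superclass up to Object.
// nullptr means no class defines the method (Smalltalk would then
// send doesNotUnderstand:).
const std::function<std::string()>* lookup(const ClassInfo* cls,
                                           const std::string& selector) {
    for (; cls != nullptr; cls = cls->superclass) {
        auto it = cls->methods.find(selector);
        if (it != cls->methods.end()) return &it->second;
    }
    return nullptr;
}
```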
Let's look at some examples:

person.cls as an example of a new class
See personTest.st
student.cls as an example of a subclass
studentTest.st as an example showing polymorphic access
twodarry.cls as another subclass example
See twodTest.st
For more information, see the GNU Smalltalk User's Guide:

http://www.gnu.org/software/smalltalk/gst-manual/gst.html
2) C++ is an imperative/OO mix

Had to be backward compatible with C
Wanted to add object-oriented features
Result is that programmer can use as few or as many OO features as he/she wants to
C++ Classes and Objects

Can be static, stack-dynamic or heap-dynamic
Member data and member functions can be private, protected or public
Allows programmer to decide
Like Smalltalk, has notion of class variables
Declared as static in C++
Destructor needed if object uses dynamic memory
C++ Inheritance
Do not need a superclass (no Object base class for all other classes)
Multiple inheritance is allowed
Complex and difficult to use
Implementation inheritance or interface inheritance are allowed

With interface inheritance, all data and functions are still inherited, but only public ones are directly accessible to the derived class
Advantage: Modifications to parent class do not affect derived class, as long as they do not change the interface
Disadvantage: Operations may be slower, since they cannot access the data directly
C++ polymorphism
By default all functions are statically bound
Recall that this allows faster execution, a goal of the C++ language
However, true polymorphism cannot be utilized with statically bound functions
Dynamic binding is enabled by using virtual functions and pointers (or references)
This tells the compiler not to bind the function name to the code until run-time
Abstract base classes can be created with pure virtual functions

Not implemented in the base class
See poly.cpp
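A compact C++ sketch of static versus dynamic binding and of an abstract base class (poly.cpp is not reproduced here; all class names are hypothetical). Through a base reference or pointer, only the virtual function is bound to the object's actual class.

```cpp
#include <string>

class Base {
public:
    virtual ~Base() = default;
    std::string staticName() const { return "Base"; }          // statically bound
    virtual std::string dynamicName() const { return "Base"; } // dynamically bound
};

class Derived : public Base {
public:
    std::string staticName() const { return "Derived"; }  // hides, does not override
    std::string dynamicName() const override { return "Derived"; }
};

// Abstract base class: area has no implementation here,
// so AbstractShape objects cannot be created.
class AbstractShape {
public:
    virtual ~AbstractShape() = default;
    virtual double area() const = 0;
};

class UnitSquare : public AbstractShape {
public:
    double area() const override { return 1.0; }
};
```

Given `Derived d; Base& ref = d;`, `ref.staticName()` returns "Base" while `ref.dynamicName()` returns "Derived".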
3) Java falls in between Smalltalk and C++

Like Smalltalk:
Object is base class to other classes
Single inheritance only
Objects are (almost) all dynamic, with garbage collection
References used to access
Method names are (by default) dynamically bound
Like C++:
Access can be private, public or protected Static binding can optionally be used to improve run-time speed Overall syntax for member data and function access Variables are typed
151
OOP
Other Java OOP features:

Interfaces allow for a simplified form of multiple inheritance
An interface is in a sense a base class with no data and only abstract (pure virtual) methods
A class that implements an interface simply implements the methods specified therein
Advantages: Objects that implement an interface can be used wherever the interface is specified. This allows for a type of generic behavior
Ex: Comparable interface, Runnable interface
Disadvantage: Can become complicated when interfaces and inheritance are both used
Reflection, which allows us to manipulate the classes themselves

See poly.java
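Since the notes describe an interface as a base class with no data and only pure virtual methods, a C++ analogue can make the idea concrete. This is only a sketch: `Comparable`, `compareTo`, `Version` and `maxOf` are hypothetical names modeled loosely on Java's `Comparable` interface, and `maxOf` works for any class that implements the interface.

```cpp
#include <cassert>
#include <vector>

// Hypothetical C++ analogue of a Java interface: no data, only pure virtual methods.
class Comparable {
public:
    virtual ~Comparable() = default;
    // Negative, zero or positive, following the convention of Java's compareTo.
    virtual int compareTo(const Comparable& other) const = 0;
};

// A class "implements the interface" by overriding every pure virtual method.
class Version : public Comparable {
public:
    Version(int majorVersion, int minorVersion)
        : major_(majorVersion), minor_(minorVersion) {}
    int compareTo(const Comparable& other) const override {
        // Throws std::bad_cast if other is not a Version.
        const Version& o = dynamic_cast<const Version&>(other);
        if (major_ != o.major_) return major_ < o.major_ ? -1 : 1;
        if (minor_ != o.minor_) return minor_ < o.minor_ ? -1 : 1;
        return 0;
    }
private:
    int major_;
    int minor_;
};

// Generic code that depends only on the interface, so it works for
// any class that implements Comparable.
const Comparable* maxOf(const std::vector<const Comparable*>& items) {
    const Comparable* best = nullptr;
    for (const Comparable* item : items)
        if (best == nullptr || item->compareTo(*best) > 0) best = item;
    return best;
}
```

A class can inherit from several such data-free bases (for example `class Version : public Comparable, public Printable`, where `Printable` would be another hypothetical interface), which corresponds to the simplified form of multiple inheritance that Java interfaces provide.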
OOP
OOL Implementation
Data:
Typically a record/struct type of storage is used: the Class Instance Record (CIR)
Data members are accessed by name, in the same way as record fields
Subclass adds extra data to the CIR of the parent class
Private access is enforced by limiting visibility of the data
OOP
Subprograms:
Static binding
Subprograms that will be called are determined by the variable's type
Variable types are known at compile time, so the code can be determined then
Dynamic Binding:
Subprograms that will be called are determined by the object's type, not the variable's type
Objects stored in a variable are determined at run time
Appropriate links must be stored with the object
But they are the same for all objects of that class
Virtual Method Table (VMT) used to store links to all pertinent subprograms
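The following hand-built C++ sketch imitates what a compiler might generate for dynamic binding; the struct and function names are hypothetical and real compilers differ in detail. Each object's CIR holds a link to its class's VMT, a subclass CIR begins with the parent's CIR, and `callArea` finds the code through the VMT at run time.

```cpp
#include <cassert>

struct ShapeCIR;

// Virtual Method Table: one per class, shared by every object of that class.
struct ShapeVMT {
    double (*area)(const ShapeCIR* self);
    double (*perimeter)(const ShapeCIR* self);
};

// Class Instance Record of the base class: the VMT link plus data.
struct ShapeCIR {
    const ShapeVMT* vmt;
    int id;                                   // base-class data member
};

// Each subclass CIR begins with the parent's CIR and appends its own data.
struct RectCIR {
    ShapeCIR base;
    double width, height;
};

struct SquareCIR {
    ShapeCIR base;
    double side;
};

// Method bodies receive the object explicitly (the implicit "this").
// The casts are valid because base is the first member of a standard-layout struct.
double rectArea(const ShapeCIR* self) {
    const RectCIR* r = reinterpret_cast<const RectCIR*>(self);
    return r->width * r->height;
}
double rectPerimeter(const ShapeCIR* self) {
    const RectCIR* r = reinterpret_cast<const RectCIR*>(self);
    return 2 * (r->width + r->height);
}
double squareArea(const ShapeCIR* self) {
    const SquareCIR* s = reinterpret_cast<const SquareCIR*>(self);
    return s->side * s->side;
}
double squarePerimeter(const ShapeCIR* self) {
    const SquareCIR* s = reinterpret_cast<const SquareCIR*>(self);
    return 4 * s->side;
}

// One VMT per class.
const ShapeVMT rectVMT{rectArea, rectPerimeter};
const ShapeVMT squareVMT{squareArea, squarePerimeter};

// Dynamic binding: the code is found through the object's VMT at run time,
// even though the caller only knows it has a ShapeCIR*.
double callArea(const ShapeCIR* obj) { return obj->vmt->area(obj); }
double callPerimeter(const ShapeCIR* obj) { return obj->vmt->perimeter(obj); }
```

Calling `squareArea(&s.base)` directly corresponds to static binding: the code to run is already known at compile time. Going through `callArea` costs an extra pointer lookup, which is the run-time price that C++ avoids by binding statically unless asked otherwise.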
Parallelism
Parallelism is incorporated into programs for 2 primary reasons:

1) Program is running in a multiprocessing or distributed environment
Many computers now have multiple CPUs
Many jobs are distributed over multiple computers in a network

A programming language should be able to take advantage of this parallelism
Many algorithms can be improved if designed for parallel execution
This is PHYSICAL PARALLELISM

Parallelism
2) Program is running in a simulated parallel environment, allowing for asynchronous activity

Ex: Two windows are displayed to the user. One shows the current time (incremented by seconds) and one allows the user to draw images on the screen
We don't want the act of the user drawing to stop the clock
We don't want the clock running to prevent the user from drawing
Even with a single processor, we want both of these activities to execute in parallel
This is LOGICAL PARALLELISM

Parallelism
What issues must we be concerned with?

Synchronization
Execution of tasks in parallel causes them to be asynchronous
Cannot predict at what point in time one task will execute an instruction relative to another task
If the tasks are independent, this is not a problem

No resources are shared, so it doesn't matter where in the execution each task is
Ex: One task to count ballots from Florida, one task to count ballots from New Mexico
Parallelism
If the tasks have some dependencies, there can be a problem

Most common dependency is shared data
To handle this we must synchronize the tasks
Cooperation Synchronization
One task is dependent upon an output/outcome of another
Ex: Task B must process data produced by Task A
Contractor B cannot put up drywall until contractor A has finished the wiring
Task to count ballots cannot proceed until the task that collects ballots provides it with some
We must have a mechanism that allows Task B to pause until the data is available
B could loop and keep checking for data
B could wait for some signal from A
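A minimal C++ sketch of the second option (B waits for a signal from A) using `std::condition_variable`; the `Mailbox` class and `countBallots` function are hypothetical. Unlike looping and re-checking, the waiting task sleeps until it is notified, and the predicate passed to `wait` is re-checked after every wake-up, so spurious wake-ups are harmless.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

// Cooperation synchronization: Task B (consumer) pauses until Task A
// (producer) has made data available.
class Mailbox {
public:
    void put(int value) {
        {
            std::lock_guard<std::mutex> lock(m_);
            items_.push(value);
        }
        ready_.notify_one();                                    // A signals B
    }
    int take() {
        std::unique_lock<std::mutex> lock(m_);
        ready_.wait(lock, [this] { return !items_.empty(); });  // B sleeps until data exists
        int value = items_.front();
        items_.pop();
        return value;
    }
private:
    std::mutex m_;
    std::condition_variable ready_;
    std::queue<int> items_;
};

// Hypothetical ballot example: Task A collects n ballots, Task B counts them.
int countBallots(int n) {
    Mailbox box;
    int counted = 0;
    std::thread taskB([&] {
        for (int i = 0; i < n; ++i) counted += box.take();
    });
    std::thread taskA([&] {
        for (int i = 0; i < n; ++i) box.put(1);                 // each ballot counts as 1
    });
    taskA.join();
    taskB.join();
    return counted;                                             // safe to read after join
}
```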
Parallelism
Competition Synchronization
Both tasks are competing for the same shared resource
If one or both tasks modify the data, it could cause data inconsistencies
Ex: Task A and Task B are MAC machine accesses of the same bank account
Task A checks the balance: $200
Task B checks the balance: $200
Task A withdraws $200
Task A updates balance to $0
Task B withdraws $200
Task B updates balance to -$200
We must have some mechanism that ensures MUTUAL EXCLUSION for CRITICAL DATA
We could have a LOCK on the data, or a similar mechanism allowing only one task to access it at a time
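A minimal sketch of the LOCK approach for the bank-account example, assuming C++11 threads; `Account` and `withdraw` are hypothetical names. Checking the balance and updating it happen inside one critical section, so the interleaving in the example above cannot occur and exactly one of two $200 withdrawals succeeds.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Competition synchronization: the check-then-update sequence is the
// critical section, so it runs under a lock (mutual exclusion).
class Account {
public:
    explicit Account(int balance) : balance_(balance) {}
    // Returns true if the withdrawal happened.
    bool withdraw(int amount) {
        std::lock_guard<std::mutex> lock(m_);   // only one task at a time
        if (balance_ < amount) return false;    // check the balance...
        balance_ -= amount;                     // ...and update it before anyone else can look
        return true;
    }
    int balance() {
        std::lock_guard<std::mutex> lock(m_);
        return balance_;
    }
private:
    std::mutex m_;
    int balance_;
};
```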
Parallelism
Synchronization Mechanisms
Semaphores
Devised by Dijkstra
Basically guards that are placed around code
P must succeed to gain access to code
Decrements a counter when it succeeds
V executes when the critical section ends
Increments the counter, possibly allowing a waiting task to proceed
Based on the initial value of the counter, we can control how many tasks are allowed to access the critical section at once
If used properly, can guarantee either cooperation or competition synchronization

However, it is easy to NOT use them properly
Can cause problems
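The P/V description can be sketched in C++ as a counting semaphore built from a mutex and a condition variable (C++20 also provides `std::counting_semaphore` in `<semaphore>`). `Semaphore` and `maxConcurrent` are hypothetical names; with an initial count of `limit`, at most `limit` tasks are inside the critical section at once. It also shows how easy misuse is: a task that forgets to call `V()` can leave other tasks blocked forever.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

class Semaphore {
public:
    explicit Semaphore(int initial) : count_(initial) {}
    void P() {                                  // block until count > 0, then decrement
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return count_ > 0; });
        --count_;
    }
    void V() {                                  // increment and wake one waiting task
        {
            std::lock_guard<std::mutex> lock(m_);
            ++count_;
        }
        cv_.notify_one();
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    int count_;
};

// Runs `tasks` threads through a critical section guarded by a semaphore
// with initial value `limit`; returns the largest number of threads that
// were ever inside at the same time.
int maxConcurrent(int tasks, int limit) {
    Semaphore sem(limit);
    std::mutex statsMutex;
    int inside = 0, peak = 0;
    std::vector<std::thread> threads;
    for (int i = 0; i < tasks; ++i) {
        threads.emplace_back([&] {
            sem.P();                            // enter the critical section
            {
                std::lock_guard<std::mutex> lock(statsMutex);
                ++inside;
                if (inside > peak) peak = inside;
            }
            std::this_thread::yield();          // give other threads a chance to overlap
            {
                std::lock_guard<std::mutex> lock(statsMutex);
                --inside;
            }
            sem.V();                            // leave the critical section
        });
    }
    for (auto& t : threads) t.join();
    return peak;
}
```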
Parallelism
Monitors
Devised by Hansen and Hoare
Critical data section is part of a data object that allows only one task entry at a time
Better than semaphores for competition synchronization, because mechanism is built into the monitor
Harder for the programmer to mess up
No better for cooperation synchronization

Still must be done manually
Used in Concurrent Pascal, Modula-2 and (somewhat) in Java
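A monitor-style sketch in C++ (hypothetical `BoundedBuffer` class): every public operation locks the same private mutex, so competition synchronization is built in, while waiting when the buffer is full or empty is still coded by hand with condition variables. In Java, `synchronized` methods together with `wait()`/`notify()` play a similar role.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>

class BoundedBuffer {
public:
    explicit BoundedBuffer(std::size_t capacity) : capacity_(capacity) {}
    void deposit(int value) {
        std::unique_lock<std::mutex> lock(m_);                  // monitor entry
        // Cooperation is still manual: wait while the buffer is full.
        notFull_.wait(lock, [this] { return items_.size() < capacity_; });
        items_.push_back(value);
        notEmpty_.notify_one();
    }
    int fetch() {
        std::unique_lock<std::mutex> lock(m_);                  // monitor entry
        notEmpty_.wait(lock, [this] { return !items_.empty(); });
        int value = items_.front();
        items_.pop_front();
        notFull_.notify_one();
        return value;
    }
private:
    std::mutex m_;                                              // the monitor lock
    std::condition_variable notFull_, notEmpty_;
    std::deque<int> items_;
    std::size_t capacity_;
};
```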

Parallelism
Message Passing
Proposed by Hansen and Hoare
More general than either of the two previous techniques

Tasks are synchronized via messages sent to each other
Message is similar in look/execution to a subprogram call, but with restrictions:
Caller (or sender) of the message is blocked at the call until the receiver is ready to receive it
Receiver (or executor) of the message is blocked at the message code until the message is called
Caller and Receiver meet at a rendezvous
Parallelism
Idea is that we know exactly where in the code both tasks will be when a rendezvous occurs
So even though tasks execute asynchronously, we synchronize them with respect to each other at a rendezvous
Ex: Ada
Still much of the work is up to the programmer
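A rough C++ sketch of a rendezvous, in the spirit of an Ada entry call and `accept` but not a full model of Ada semantics; `Rendezvous`, `call` and `accept` are hypothetical names. The caller blocks in `call` until its message has been accepted, and the receiver blocks in `accept` until some caller has arrived, so both tasks are at a known point when the exchange happens.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

class Rendezvous {
public:
    void call(int message) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !hasMessage_; });    // one caller at a time
        message_ = message;
        hasMessage_ = true;
        cv_.notify_all();                                   // wake the receiver
        cv_.wait(lock, [this] { return accepted_; });       // caller blocked until accepted
        accepted_ = false;
        hasMessage_ = false;
        cv_.notify_all();                                   // let the next caller in
    }
    int accept() {
        std::unique_lock<std::mutex> lock(m_);
        // Receiver blocked until a caller has arrived with a message.
        cv_.wait(lock, [this] { return hasMessage_ && !accepted_; });
        int message = message_;
        accepted_ = true;                                   // the rendezvous happens here
        cv_.notify_all();
        return message;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    int message_ = 0;
    bool hasMessage_ = false;
    bool accepted_ = false;
};
```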
Parallelism
Parallel processing concerns

Data consistency
We have already discussed this
Mutual exclusion is needed to prevent multiple tasks from accessing critical data at the same time
However, efforts to ensure data consistency can cause other problems, such as DEADLOCK and STARVATION
Parallelism
Deadlock
When a (shared) resource has restricted access, it can cause a task to stop execution
Wait in a semaphore queue
Wait in a monitor queue
Wait in an accept queue
If a circular resource dependency exists, we can get deadlock
Ex:

Task A has acquired binary semaphore S1
Task B has acquired binary semaphore S2
Task A is waiting for binary semaphore S2
Task B is waiting for binary semaphore S1
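One common way to rule out the circular wait in this example is to acquire both resources as a single step. A minimal C++ sketch, treating the `std::mutex` objects `S1` and `S2` as the binary semaphores (`taskA`, `taskB`, `shared1` and `shared2` are hypothetical names): if the two tasks locked the mutexes one at a time in opposite orders they could deadlock, but `std::lock` acquires both using a deadlock-avoidance algorithm. Acquiring resources in one fixed global order is another standard fix.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

std::mutex S1, S2;          // play the role of binary semaphores S1 and S2
int shared1 = 0, shared2 = 0;

void taskA(int rounds) {
    for (int i = 0; i < rounds; ++i) {
        std::lock(S1, S2);                                   // acquire both, never just one
        std::lock_guard<std::mutex> g1(S1, std::adopt_lock); // release automatically
        std::lock_guard<std::mutex> g2(S2, std::adopt_lock);
        ++shared1;
        ++shared2;
    }
}

void taskB(int rounds) {
    for (int i = 0; i < rounds; ++i) {
        std::lock(S2, S1);                                   // opposite order is now safe
        std::lock_guard<std::mutex> g2(S2, std::adopt_lock);
        std::lock_guard<std::mutex> g1(S1, std::adopt_lock);
        ++shared2;
        ++shared1;
    }
}
```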
Parallelism
Starvation
To combat deadlock, most languages allow a task to release a resource prematurely in some circumstances
Ex: If one of the tasks in the previous example releases its semaphore, the other can proceed
Under these circumstances there is the possibility that a task may never acquire all of the resources that it needs at the time it needs them: this is STARVATION
We must be careful to avoid all of these problems when programming in parallel

Parallelism
Let's look at Java as an example:

Deadlock: see deadlock.java
Corrupt data: see corrupt.java

Some features of older Java implementations are now deprecated because they are too prone to deadlock and starvation problems
Suspend / Resume
Does not free locked objects
Can easily lead to deadlock if not resumed
Stop
Immediately frees locked objects
Can lead to data inconsistency
Prolog
As we discussed previously, Prolog is a language used for logic programming
"Programs" in Prolog consist of facts and rules in a database

Facts consist of an identifier followed by a parenthesized, comma-separated list of objects (atoms), followed by a period
The identifier represents some relationship amongst the objects, and is called a predicate
The objects are the arguments
Ex. from ex1.pl: father(herb, irving).
Prolog
Rules are predicates that consist of a head and a body

In order for the head to "succeed" in its evaluation, all of the goals in the body must be satisfied
Ex from ex1.pl:
sibling(X,Y) :- X \== Y, parent(P,X), parent(P,Y).
The :- can be thought of as "if"
These goals could be facts, or could be other rules
Execution of a program is in fact a sequence of questions, or assertions

Database is searched in an effort to satisfy all of the assertions
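To make the search concrete, here is a rough C++ analogy (not how a Prolog engine is implemented) of answering a query such as `?- sibling(irving, ruth).` with both arguments given. Only the names `herb` and `irving` come from ex1.pl; the `parentDB` facts are hypothetical. Each loop searches the database from top to bottom, and moving on in the outer loop after the inner search fails plays the role of backtracking into `parent(P,X)` to try another `P`.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

using Fact = std::pair<std::string, std::string>;   // parent(P, Child)

// Hypothetical parent/2 facts, in database order.
const std::vector<Fact> parentDB = {
    {"herb", "irving"}, {"sarah", "irving"}, {"sarah", "ruth"}, {"irving", "john"},
};

// sibling(X,Y) :- X \== Y, parent(P,X), parent(P,Y).   (ground query)
bool sibling(const std::string& x, const std::string& y) {
    if (x == y) return false;                   // goal 1: X \== Y
    for (const Fact& f1 : parentDB) {           // goal 2: parent(P, X), top to bottom
        if (f1.second != x) continue;           //   fact does not match: try the next one
        const std::string& p = f1.first;        //   P becomes instantiated here
        for (const Fact& f2 : parentDB)         // goal 3: parent(P, Y) with P bound
            if (f2.first == p && f2.second == y)
                return true;                    //   every goal satisfied: answer yes
        // goal 3 failed for this P: backtrack into goal 2, resuming after f1
    }
    return false;                               // search exhausted: answer no
}
```

For `sibling(irving, ruth)`, goal 2 first binds `P` to `herb`, goal 3 fails, execution backtracks, goal 2 resumes and binds `P` to `sarah`, and goal 3 then succeeds.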
Prolog
If assertions can be satisfied, answer is yes

Otherwise, answer is no
If a given assertion succeeds, execution proceeds to the next one
If a given assertion fails, execution backtracks and attempts to re-satisfy the previous assertion
So what about variable assignments?

These are in fact just side effects that occur in an effort to satisfy the query
In fact, variables are not assigned in the traditional (imperative language) sense
Prolog
Variables in Prolog are dynamically typed and have two states:

Uninstantiated:
Variable is not associated with a value
Instantiated:
Variable is associated with a value
Once a variable is instantiated, it keeps that value, and all occurrences of that variable within the same scope have that value
Cannot be re-assigned in the sense of imperative languages
However, if execution backtracks past the point at which it was instantiated, it can again become uninstantiated
Let's look again at ex1.pl

Prolog
Recursion and database search

Recursion is a fundamental part of programming in Prolog
Execution is simply satisfaction of goals, and there are no loops as in imperative languages
Thus, to build complex "programs" we must utilize recursive programming
Each attempt to satisfy a goal initiates a search of the database

Prolog
By default the DB is searched from top to bottom
We can take advantage of this in our programs

Ex: put the base case before the recursive case, so we don't have to explicitly test for it
Although, as the text points out, this could be considered a flaw in the language, since the order in which the rules are considered should not matter to the "truth" of the logic
Prolog
If a subgoal in a rule fails at any point, we backtrack and attempt to resatisfy a previously satisfied subgoal
When resatisfying a subgoal, the db search resumes from the point at which it succeeded the first time
See recurse.pl
Prolog Lists
As in Lisp, the list is an important data structure in Prolog

A list consists of a head and a tail
Tail could be the empty list
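Prolog writes a list with head `H` and tail `T` as `[H|T]`, and the empty list as `[]`. As a rough C++ analogy, assuming a hypothetical shared-pointer cons cell, the recursive functions below mirror two-clause Prolog predicates (the `len` and `sum` predicates in the comments are illustrative, not from the course files), with the base case for the empty list listed first.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical cons cell: a non-empty list is a head plus a tail list;
// the empty list [] is represented by nullptr.
struct Cell;
using List = std::shared_ptr<Cell>;

struct Cell {
    int head;
    List tail;
};

// Builds [head|tail].
List cons(int head, List tail) {
    return std::make_shared<Cell>(Cell{head, std::move(tail)});
}

// Prolog:  len([], 0).
//          len([_|T], N) :- len(T, M), N is M + 1.
int len(const List& list) {
    if (!list) return 0;                 // base case first: the empty list
    return len(list->tail) + 1;          // recursive case: 1 + length of the tail
}

// Prolog:  sum([], 0).
//          sum([H|T], S) :- sum(T, S1), S is H + S1.
int sum(const List& list) {
    if (!list) return 0;
    return list->head + sum(list->tail);
}
```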