
Course Notes for CS1621 Structure of Programming Languages, Part B

By John C. Ramirez, Department of Computer Science, University of Pittsburgh

These notes are intended for use by students in CS1621 at the University of Pittsburgh and no one else. These notes are provided free of charge and may not be sold in any shape or form. Material from these notes is obtained from various sources, including, but not limited to, the textbooks:
Concepts of Programming Languages, Seventh Edition, by Robert W. Sebesta (Addison Wesley)
Programming Languages, Design and Implementation, Fourth Edition, by Terrence W. Pratt and Marvin V. Zelkowitz (Prentice Hall)
Compilers: Principles, Techniques, and Tools, by Aho, Sethi and Ullman (Addison Wesley)

Expressions

Expressions are vital to programs


Allow programmer to specify the calculations that computer is to perform
It is important that programmer understand how a language evaluates expressions

Things to consider:
Precedence and associativity
Order of operand evaluation
Side-effects of evaluation
Overloadings and coercions

Expressions

Precedence and Associativity


We always learn these rules for any new language
Vital to using expressions correctly

Most languages have similar precedence for the standard operators: * / then + -
But programmer needs to understand precedence and associativity for all operators, especially those that may be unusual

Expressions

Ex: boolean and relational operators


and or not < > <= >= != ==

In Pascal, the boolean operators have higher precedence than the relational operators (opposite of C++)
if x < y then writeln('Less');
if x < y and y < z then writeln('Middle');
Above is an error in Pascal, since the first subexpression evaluated would be y and y

if (x < y) and (y < z) then writeln('Middle');
Now it is ok

In C++
if (x < y && y < z) cout << "Middle" << endl;
This is fine in C++

Expressions

Ex: unary ++ and -- in C++

Precedence and associativity are wacky!
Note that modifying the same variable more than once within a single expression has undefined behavior in C++, so the m1..m5 results can differ between compilers

#include <iostream>
using namespace std;

int main()
{
    unsigned int i1 = 0, i2, i3, i4, i5, j, k, m1, m2, m3, m4, m5;
    j = i1++;   // postfix: j gets the old value of i1
    k = ++i1;   // prefix: i1 is incremented before k is assigned
    cout << j << " " << k << endl;
    i5 = i4 = i3 = i2 = i1;
    m1 = i1++ + i1++ + i1++;
    m2 = i2++ + ++i2 + i2++;
    m3 = i3++ + ++i3 + ++i3;
    m4 = ++i4 + i4++ + ++i4;
    m5 = ++i5 + ++i5 + ++i5;
    cout << i1 << " " << m1 << endl;
    cout << i2 << " " << m2 << endl;
    cout << i3 << " " << m3 << endl;
    cout << i4 << " " << m4 << endl;
    cout << i5 << " " << m5 << endl;
}

Expressions

Output? See plusplus.cpp and try it on different platforms
http://www.cppreference.com/operator_precedence.html
See problem in Assignment 3
Compare to plusplus.java and plusplus.pl

Expressions

In some cases, expression is ambiguous and compiler will not let you do it, or warn you about it
Ex: A ** B ** C in Ada
Must have parentheses

Ex: Mixing bitwise operators in C++


Warning to use parentheses

Sometimes you could probably figure it out, but you're better off not trying
Ex: If more than one coercion can occur in C++
May have defined both a constructor and a conversion function

Expressions

Sometimes you don't think you should care about precedence and associativity, but you should
In math, addition and multiplication are associative and commutative
On computer, overflow can cause this to not always be the case:
floats A = 1e+30, B = 1.0/1e+30, C = 1e+30
A*B*C ~= 1e+30
A*C*B = infinity
see Overflow.cpp
F1.add(F2); F2.add(F1) -- If F1 and F2 are from different classes, the operations may be different or perhaps not even legal

Expressions

Side-effects can also cause evaluation order problems


Expressions can involve function calls, which can change variable values
Y = f(X) + X; Y = X + f(X);

Without side-effects, the results are the same, but if f(X) changes the value of X, the results could be different
Most languages allow reference parameters with functions
These can cause logic errors if used improperly
See side.cpp

Expressions

How to handle this?


Leave it up to the programmer, as in Pascal and C++
Limits compiler optimizations, some of which may include reordering of operations
Compiler cannot reorder if it could possibly change result

Do not allow (most) side-effects to occur, as in Ada
Ada functions cannot change parameters
Now optimizations can reorder expressions without changing result (at least due to this)

Best advice is to program in such a way as to either avoid all side-effects, or to only allow them in cases where they will not affect expression evaluation

Expressions

Operator Overloading
Used in many newer high-level languages

Can be good and bad


Good:
Aids in readability and simplifies code if used correctly
Ex: New class Complex variables A, B and C
A + B + C is more clear than (A.add(B)).add(C)

Ex: String variables can be compared


if (A < B)
is clearer than
if (A.compareTo(B) < 0)

Expressions

Bad:
Can harm readability if used incorrectly
Ex: + defined to do multiplication
But methods could be improperly named as well

Function calls are not obvious, especially if other versions of the function exist
In C++ we could have a member function + and also a friend function +: which is used?

Can allow some logic errors to go undetected


Ex: C++ uses / for float and integer division
If user expects a value between 0 and 1, it's not going to happen if integer division is used

Expressions

Some languages like C++ and Ada allow programmer-defined operator overloading

Others like Java do not


Both positions have support


Expressions

Coercion and conversion


In many expressions we use more than one datatype
Mixed expressions
This seems a reasonable thing to allow

However, often the operators and functions used are defined for only a single type
In this case, to allow mixed expressions to be used, some types must be converted to other types

The differences in languages are whether these conversions should be IMPLICIT or EXPLICIT

Expressions

Explicit conversion
In this case the language allows little or no mixed expressions in the code
To allow mixing of data types, the programmer must convert through an operation or function call
Ex: Ada does not even allow mixing of floats and integers

Good:
Everything is clear: no uncertainty or ambiguity
Programmer can more easily verify correctness of programs
Easier to avoid logic errors

Expressions

Bad:
Makes language very wordy
Can be annoying, especially when the types are similar (ex. addition of integers and floats)

Implicit conversion (coercion)
In this case mixed expressions are allowed, and the language coerces types where needed to allow types to match
Usually a language has some rules by which the coercions are performed

Good:
Less wordy: makes programs shorter and sometimes easier to write

Expressions

Bad:
Programs are harder to verify for correctness
It is not always clear which coercion is being done, especially when programmer-defined coercions are allowed
Can lead to logic errors in programs
Ex: In C++ expressions are always coerced if they can be
Standard rules of promotion for predefined types can be easily remembered
However, programmer can also define functions that will be used for coercion
Constructors for classes and conversion functions are both implicitly called if necessary
Now the rules are less clear and can lead to ambiguity and logic errors

Expressions

Consider A = B + C where A, B and C are all of different types
Any/all of the following could exist:
+ operator with two type B arguments
+ operator with two type C arguments
Constructor for type B with argument type C
Constructor for type C with argument type B
Coercion function from C to B
Coercion function from B to C
Constructor for type A with argument type B
Constructor for type A with argument type C

How does programmer know which will be used?
Should NOT assume any particular coercion will occur in this case
Here explicit coercion should be used to remove ambiguity

See coercion.cpp and rational.h

Expressions

Boolean expressions
Expressions that evaluate to TRUE or FALSE

Formed using relational operators and boolean operators


Relational operators: operators which compare values
Operands can be most primitive types and complex types as well in some cases

Boolean operators: operators used to combine boolean results
Operands must be boolean values
Exception is C/C++

Expressions

Same guidelines for precedence and associativity hold here


Know the rules for current language
Ex: Ada boolean operators and, or have the same precedence but are NON-associative when mixed with each other
if A and B or C then -- illegal in Ada: must parenthesize

Ex: C++ boolean operator && has higher precedence than ||

Expressions

Short-Circuit Evaluation
Important note (that we may not have emphasized earlier):
Operator precedence and associativity are for OPERATORS, not OPERANDS
The operators simply indicate how the operands are combined/utilized, NOT the order in which they are accessed/determined
For example: A + B + C + D
We know we first add A and B, then add C, then add D
But the VALUES for A, B, C and D could be obtained in ANY ORDER
Done to optimize execution (ex. in parallel)

Expressions

This is significant in (at least) 2 situations:


1) Operand evaluation produces a side-effect that changes result of subsequent operand evaluation
As we discussed previously, operand could be a function call with a reference parameter
Operand could be used/modified more than once, as with ++ example

2) An operand may not even be valid if a previous operand evaluates in a certain way
Ex: if ((X != 0) && (Y/X < 1)) cout << rational;
Considering the && operator, if the first operand evaluates to FALSE, the second operand evaluates to a run-time error
Now if the compiler would try to do these in parallel it could cause problems
Solution is SHORT-CIRCUIT EVALUATION (SSE)

Expressions

Idea of SSE is simple:


Evaluate boolean expressions only until a final answer can be determined
For example with &&, we know that FALSE && ANYTHING == FALSE so we would not get the division by zero error

SSE is nice because it makes our code simpler


If we know compiler uses SSE, we can put into a single expression what otherwise would require two


Expressions

Ex:

if ((X != 0) && (Y/X < 1)) cout << rational;

Without SSE, how would we have to write this to prevent possible run-time error?
Do on board

Drawbacks of SSE?
Now computer must evaluate operands sequentially
Slows down program execution, especially in environments with multiple CPUs

So we have safety/ease of programming vs. execution efficiency

Expressions

Solution is to offer programmer the choice


Ada uses arbitrary evaluation of operands normally
But special operators "and then" and "or else" provide short-circuit evaluation if desired

C++ and Java use SSE for && and || but arbitrary evaluation for bitwise & and |


Expressions

Assignment
Central to Imperative Languages

Gives a value to a variable


Typical syntax:
<variable> <assig. operator> <expression>
Semantics:
1) Compute lvalue of variable
2) Compute rvalue of expression
3) Store computed rvalue in lvalue location

Expressions

Variations
Some languages allow multiple targets

C++ and Java allow conditional targets


Wacky ?: operator

C, C++ and Java have many assignment variations for convenience


Ex: ++, +=, *=

C, C++ and Java return the rvalue as operation result


Allows assignment to be mixed within other expressions
As with many features from C, C++, this is both good and bad

Expressions

Allows shorter code in cases such as:


A = B = C
while ((ch = getchar()) != EOF)

Since it is changing the value of a variable, order of evaluation is critical


Typically associates right to left, and it is a good idea to parenthesize (as above)

Famous C/C++ bug that we mentioned before: if (x = y) is wacky!


Will ALWAYS be true if y is non-zero
Will ALWAYS be false if y is zero
Newer compilers warn you about it
Not possible in Java since if requires a boolean

Concern also must be given for overloading the assignment operator (legal in C++ and Ada)
It is possible to cause it to behave differently from what is normally expected
Care has to be taken so that it works in all cases

Expressions

Ex: Overloading = for a linked list variable


LList<myData> A, B;
// Fill B with various nodes
A = B;

If we want to use this assignment as with other assignments, we need to return the assigned result as the result of the assignment
In C++ this is typically a reference return value, so that we can cascade the operator effectively
A = (B = C);
(A = B) = C;
In the first, when the assignment B = C is finished, we need the rvalue of the result
In the second, when the assignment A = B is finished, we need the lvalue of the result
Reference allows both (even though the second seems silly to do)

Also, how about A = A;
If we destroy old LL before assigning new one, this could destroy the value

Expressions

One issue that you may not normally consider: How is the rvalue evaluated?
For statically typed languages, there is usually no ambiguity: expression result type must match the type of the variable
But for dynamically typed languages, it is no longer clear
Ex: in Prolog A = 5 + 3
Since A is not necessarily an integer, 5 + 3 could be taken as a string just as reasonably as it could be taken as an arithmetic expression
See assig.pl

Control Statements

Primary types of control in imperative languages


Selection
Choose between 1 or more different actions

Iteration
Repeat an action 0 or more times


Control Statements

Selection
One-way selection
if statement exists in virtually every imperative language
Idea here is that we either execute a statement or do not
In modern languages this is achieved using an if without the optional else

Two-way selection
Now we incorporate the else with the if

Control Statements

Typical syntax:
if <condition> <statement> else <statement>

Interesting issues:
1) Form of condition?
2) What kinds of statements are allowed?
3) Is nesting allowed and how is it interpreted?

Control Statements

1) Form of condition
Most languages require a boolean expression (true or false only)
C/C++ are exceptions: int values are allowed

2) Kinds of statements
Original FORTRAN and BASIC allowed only a single statement
This is not conducive to good programming techniques
Only way to have multiple statements is by using an unconditional branch, i.e. GO TO

Control Statements

ALGOL 60 introduced the compound statement


Now an arbitrary number of statements can be used
All newer imperative languages (and updates of older languages) either use compound statements or allow multiple statements within the if

3) Nesting
It logically follows that a statement within an if clause or else clause could be another if statement
Remember orthogonality

What issues occur in this case?



Control Statements

Only problem of interest is one we have already discussed


If the number of if clauses and else clauses are not equal, how are they associated?

There are two main approaches to handling this:


1) Use a rule (static semantics) to determine how this is handled
This is the approach taken in Pascal, C, C++ and Java
System handles the rule consistently, so there is no ambiguity, but, like rules of precedence and associativity, the programmer could forget it or make a mistake that is not caught
Can lead to logic errors
We have already seen this example

Control Statements

2) Use syntax to determine how it is handled


This is the approach taken in Ada, BASIC, Modula-2, ALGOL 68
Every if statement must be syntactically terminated (ex: end if)
Now an inner if clause without an else clause must still have an end if, and syntactically the outer else can only be associated with the outer if
Perl has a slightly different approach: the statement for an if MUST be a compound statement
Result is the same, since the inner if will now be within a compound statement

Control Statements

Multiple Selection
Idea is to choose from many possible options

Clearly one way of doing this is through nested if statements


Often preferable, especially if the means of selection is a series of separate boolean expressions
// Break tie for A and B in some sport
if (A beat B twice) then A wins tie
else if (B beat A twice) then B wins tie
else if (A scored more points than B) then A wins tie
else if (B scored more points than A) then B wins tie

Control Statements

However, in some situations, the options are based on different result values of a single expression:
Ex: Menu in which user chooses an option from 1 to 5; each option causes a different action

In these instances, nested ifs could be used


In fact these are all we really need
But the nesting gets complicated, often making the statements harder to follow and making them more prone to logic errors

So many languages supply a case statement


Specifically designed for multiple alternative selection based on different results of a single expression


Control Statements

There are some interesting issues to consider here


Many are the same as for two-way selection
Text discusses them at length

A few that we will look at


What happens after the code for the matched selection is executed?
One option is to break out of the structure, continuing with the next statement after it
This makes each option mutually exclusive
This approach is taken by Algol W, Pascal, Ada
Probably the most intuitive idea: the choices are mutually exclusive by default

Control Statements

C, C++ and Java do not automatically break out after the selection has been executed
This is good and bad (as usual)
Adds flexibility
If the execution for one selection is a superset of another, it makes sense to allow the flow to continue within the selection statement

Causes potential logic problems


Programmer must manually add breaks
If one is missed, no syntax error occurs

What happens if no match is found?


Two logical alternatives:
1. Do nothing
2. Error

Control Statements

C, C++, Java adopt the do-nothing approach


Seems logical that if nothing matches nothing should be done

ANSI Standard Pascal and Ada adopt the error approach


More reliable, since now an accidental out of range value will be detected as an error rather than just a do nothing

C, C++, Java, Ada, Turbo Pascal, BASIC also provide a default choice
Good idea to always use so you can detect an out of range value without causing a runtime or logic error


Control Statements

Iteration
Three primary types of iterative loops: conditional loops, counting loops and arbitrary loops
1) Conditional (logically controlled) loops
Number of iterations is determined by a boolean condition, and (usually) cannot be precalculated
ex: while (infile && valid == 1)
Note that we cannot predict when this condition will become false

Control Statements

Many languages have two versions of the conditional loop


Pretest: condition is tested prior to entering the loop body
May execute loop body 0 times

Posttest: condition is tested immediately after executing loop body
Will always execute loop body at least 1 time
Ada does not have this version

Two versions are provided for convenience: we can always simulate one loop with the other (plus some conditionals)
See loops.cpp
Clearly the difference is where each is more appropriate

Control Statements

Conditional loops are the most general kind of loops, and are really all that is needed in an imperative programming language
However, many looping applications deal with arrays and sequences of values
For convenience and efficiency it is prudent to provide a looping structure geared toward these applications

2) Counting Loops (counter-controlled loops)


Number of iterations determined by a control variable, an initial value, a terminal value, and an increment

Control Statements

We can (usually) precalculate the number of iterations based on the initial value, terminal value and increment
Ex: for (int i = 3; i <= N; i+=2) {
i obtains values 3, 5, 7, ..., N (or N - 1 if N is even)
For N = 31, the number of iterations equals CEILING((TERM - INIT + 1)/INCR) or CEILING((N - 3 + 1)/2) = CEILING((31 - 3 + 1)/2) = 15

Precalculation is nice because it allows the computer to base the loop on an iteration count (if it chooses to do so) which can be executed more quickly than conditional testing each time

Control Statements

Machine can use a register for the iteration count and not have to worry about obtaining operands for the comparisons at each iteration of the loop, something that must be done with a conditional loop

To allow precalculation and iteration counts to work, some restrictions must be made on the loop
Loop control variable cannot be altered by the programmer within the loop body
Terminal value must be calculated only one time, when loop is first entered
It will also speed things up if the loop control variable is an integer (or integral type) so no float operations are necessary

This is the approach taken in Pascal and Ada


See for.p

Control Structures

Pascal and Ada also do not allow an increment other than 1 or -1, and do not carry the value of the control variable past the end of the loop
In Pascal, the value is officially undefined, but in any Pascal implementation it will typically be one of two things:
1) The terminal value of the loop or
2) The terminal value + 1 or - 1
Case 1) typically indicates that iteration counts are being used
In Ada, the loop control variable is implicitly declared in the loop header, and becomes really undefined at the end of the loop: accessing it afterward will cause an undeclared variable error
This is now generally accepted as a good idea, since it reduces side-effect problems of using loop control variables that were declared and assigned elsewhere. C++ and Java both allow (but do not require) this as well

Control Structures

Attitude in Pascal and Ada is that if you want more complex iteration (ex. increment other than 1 or -1, option of changing number of iterations during the loop's execution) you should use a while loop

C, C++ and Java have a different approach


For loop is not really a for loop in the traditional sense
It is a very general loop that can be used for any looping application
It more appropriately is a while loop with the addition of an initialization-statement and a post-body statement

Control Statements

for (init-expr; pretest-expr; post-body-expr)

Now really anything goes, and the pretest-expr and post-body-expr are evaluated for each iteration of the loop
Can certainly be used for a counting loop, as most of you have used it

Can also be used as an arbitrary loop to do more or less whatever programmer wants it to do
Added flexibility, with added danger
The usual for C, C++: see for.cpp

"foreach" loop

Newer languages also have included a "foreach" loop to iterate through data
Key difference between "for" and "foreach"
"for" iterates through indexes (typically), which can be used to access an array / collection if desired
Loop control variable is typically an integer

"foreach" iterates through the values in the collection directly


No indexing is used, at least not directly
Loop control variable is the data type we are accessing in the collection

"foreach" loop

foreach loop has its advantages and disadvantages


Advantages:
Since no counter is used, we eliminate the possibility of index out of bounds problems

We can iterate over a collection without having to know the implementation details of the collection
Allows for data hiding and improves error prevention
We will likely discuss this more when we discuss object-oriented programming

"foreach" loop

Disadvantage
When accessing an array, we may want or need the index value
Ex: What if we want to change the data in the array or reorganize it
Ex: Sorting would be difficult using "foreach"

See forEach.java and foreach.pl


Control Statements

3) Arbitrary Loops
Now the loop is basically an infinite loop, with the programmer expected to break out of it explicitly at some point
Ada allows this with the loop ... end loop; construct
exit statement will break out of the loop, and can be put into an if statement
Thus we can break out of the loop from more than one place

Control Statements

Although C, C++ and Java do not explicitly have this construct, you can certainly build it by making a while or for loop an infinite loop and using the break statement to break out
while (1) { } // C
while (true) { } // Java

Again this feature adds flexibility, but makes code less readable and harder to debug


Control Statements

Unconditional Branching
Transfer execution from one section of code to another section of code
Commonly known as the goto
Used extensively in early languages which lacked block control structures
Ex. early FORTRAN and BASIC programs relied heavily on the goto

It was necessary then, but most modern languages contain block control structures

Control Statements

Even then computer scientists were aware of how problematic they could be
Spaghetti code that results is very difficult to read
Modification of one code segment can significantly impact many parts of the program: programmer must be aware of all places that can go to that code segment
Debugging is very difficult: it is hard to find and fix logic errors since all possible execution paths are difficult to trace

Now languages have blocks and extensive control structures


It has been shown that goto adds no functionality (i.e. nothing can be done with it that cannot be done without it)
However, many languages still have goto

Control Statements

Unrestricted goto allows code segments that normally have only one entry and exit point to have many
Ex: What happens if you jump into the middle of a procedure (what about parameters?) or a while loop (condition is skipped)

Most newer languages that have the goto have restrictions on it


Ex: Cannot jump into an inactive statement or block in Pascal
If restricted and used infrequently, can actually be useful in some languages
Ex: Pascal does not have a break statement. If an exceptional situation would cause an exit from a loop, using a goto may be more readable than adding extra convoluted logic

Control Statements

Some (newer) languages do not have goto at all


Ex: Java
Allows breaks from loops
Has exception handlers

Subprograms

Subprograms
Semi-independent blocks of code with the following basic characteristics:
Only one entry point: the beginning of the subprogram
Execute when called:
Parameter information is passed to subprogram
Caller execution is temporarily suspended, and subprogram executes
When subprogram terminates, caller execution resumes at point directly following the subprogram call

Subprograms

What types of subprograms can we have?


Most languages have two different types, procedures and functions
Procedures can be thought of as new named statements that can supplement the predefined statements in the language
Ex: Statements to search or sort an array

Once defined, these can be used anywhere they are needed in a program

Subprograms

In order to have an effect on the overall program, a procedure needs to act on something other than just the variables local to the procedure. This can be done through:
Outputting data to the display or to a file
Altering a (relatively) global variable that will be accessed/used later by a different part of the program
Altering formal parameters such that the actual parameters in the caller are modified
This will be discussed in more detail soon

Subprograms

Functions can be thought of as code segments that calculate and return a single result
Modeled after math functions
Used within expressions, where result value is substituted for the call
The effect of functions on the overall program is the value returned by them. Thus, from an ideal (and mathematical) point of view, functions should have NO OTHER effect on the overall program

Subprograms

Should NOT modify global variables
Should NOT alter actual parameters

Naturally, both of the above are allowed in many languages


In these cases it is up to the programmer to decide how he/she wants to use functions
Again the tradeoff for the increased flexibility is more potential for logic errors and more difficulty in debugging

C/C++/Java
Only have functions, no procedures

void functions can mimic the behavior of procedures



Subprograms

Local variables
How/when are they allocated?

Stack-dynamic:
Default in most modern imperative languages
Required for recursive calls, since memory must be associated with each call, not each subprogram
Ex: Binary Search
mid = (left + right)/2;

Many different values for mid must be able to coexist, one for each call on the run-time stack
Could not do it if memory was statically allocated

Subprograms

Overhead is time for allocation and deallocation each time a subprogram is called
May not seem like a lot of time is needed, but it can add up if many calls are made in a program

Access must be indirect since actual memory location of variable will not be known until a subprogram call is made
Location in run-time stack depends upon calls made prior to current one, which can differ from run to run
Also adds some time overhead

Static:
Used in languages that do not support recursion (ex. older FORTRAN)

Subprograms

Also optional in other languages, such as C and C++
Allow variables to retain values from call to call
Remember the lifetime is the duration of the program
Ex: In CS1501 LZW algorithm writing codewords to a file, the bit buffer is static
The leftover bits are kept in the buffer for the next call

Subprograms

Parameters
Parameters are vital to subprograms

Allow information to be:


Passed IN to the subprogram
Passed OUT from the subprogram
Passed IN and OUT to and from the subprogram

When writing subprograms, programmer decides which is required for a given subprogram

Subprograms

Then programmer utilizes syntax/rules in language being used to achieve the desired option
Sometimes the syntax/rules of the language do not fit exactly with the 3 use options given
In these cases programmer must be careful to use the parameters as he/she intends

Some definitions:
Formal Parameter:
Parameter specified in the subprogram header
Only exists during duration of subprogram execution
Sometimes called just "parameter"

Subprograms

Actual Parameter:
Parameter specified in call of the subprogram
May exist outside of the scope of the procedure
Sometimes called just "argument"

Rules for Formal and Actual parameters differ, as we will discuss


Subprograms

Parameter Passing Options


Pass-by-Value

Pass-by-Reference
Pass-by-Result

Pass-by-Value-Result

Pass-by-Name
You should be familiar with Pass-by-Value and Pass-by-Reference
Others may be new to you
We'll discuss each

Subprograms

Pass-by-Value
Formal parameter is a copy of the actual parameter
i.e. get r-value of actual parameter and copy it into the formal parameter

Default in many imperative languages


Only kind used in C and Java

Used for IN parameter passing

Actual can typically be a variable, constant or expression



Subprograms

Benefit is that actual parameters cannot be altered through manipulation of the formals

Also useful in some recursive calls, since a new copy is made with each call
Problem is that copying a parameter can be quite expensive, both in terms of time and memory
Ex: Consider an object with an array of 1000 floats
Object is copied with each call to the function
If, for example, recursive calls are made, a lot of memory can be consumed very quickly

Subprograms

Implementation:
Using a run-time stack, this is straightforward
When subprogram is called, copy of actual parameter is placed into a local variable, which is stored on the run-time stack (in the activation record for the subprogram) During subprogram execution, formal parameter is used like any other local variable for the subprogram Only difference is that it is initialized via the actual parameter


Pass-by-Reference
Formal parameter is a reference to (or address of) the actual parameter variable
get l-value of actual param and copy it into the formal param, then access the actual param indirectly through the formal param

Used in Pascal (var parameters), in C (using explicit pointers) and C++ and PHP (&)
Most appropriate for IN and OUT parameter passing, but can be used for all
Actual param usually restricted to a variable

Benefit is that we can change or not change the actual parameter using the formal: it is up to the programmer
Also good that memory is saved: only an address is copied
Problem is that we can miss logic errors if we accidentally alter an actual parameter through the formal parameter
Also some applications (ex: some recursion) don't work as well
We may not want change at one call to affect another call

Constant Reference Parameters


Developers of C++ realized that value parameters are not practical for large data objects (too much time and memory, esp. for recursive algorithms)
Reference parameters have danger of accidental side effects (when used for IN parameters)
Solution is to pass parameters by reference, but not allow them to be altered: constant reference
Now compiler gives error if parameter is changed within subprogram
Copy made if passed by reference to another sub

Good concept, but not perfect


Programmer can get around it by casting to a pointer and altering indirectly
See params.cpp

Ada IN parameters have a similar idea


Cannot be assigned/altered within the function
Cannot be passed by out or in out to another sub
More on Ada params shortly

Implementation:
Using run-time stack, address of actual is stored in activation record
Actual is accessed indirectly in sub through its address

Pass-by-Result
Reference parameters are not an exact fit for out parameters
Ex: A procedure designed to read data from a file into an object

Here we don't care about what used to be in the object: we just want to be sure that at the end the appropriate value is assigned
With reference parameters we COULD access the old value and use it if we wanted to (or by mistake)
Pass-by-Result prevents this

In Pass-by-Result, actual parameter is not actually passed to the subprogram: it only waits to have a value passed back to it
Formal parameter is a local variable
During life of subprogram its value does not affect actual parameter at all
At end of subprogram its value is passed back to the actual parameter

So what is actually needed of actual parameter is its address (lvalue)


When address is obtained can affect result for some contrived examples

// Note: This is NOT real code
int A[8];
for (int i = 0; i < 8; i++)
    A[i] = i;
global int j = 2;
foo(A[j]);
output(A[]);

sub foo(int param)
{
    int temp = 25;
    j = 5;
    param = temp;
}
------------------------------------------------
Output: 0 1 25 3 4 5 6 7   // if address obtained at call
Output: 0 1 2 3 4 25 6 7   // if obtained at ret.

If used, address is typically obtained at call
Ada 83 out parameters for simple types are ALMOST this, but the formal parameter value cannot be accessed within the sub (so it is not really a local variable)
Ada 95 changed out parameters to allow them to be accessed, fitting the Pass-By-Result model more closely

Implementation:
At sub call, actual param address is calculated and stored in run-time stack, as is the formal param (as a local)
Final result of formal is copied back to actual address at end of sub

Pass-by-Value-Result
Now actual parameter's value is passed to the formal parameter when subprogram is called, being stored and used as a local variable
At the end of the subprogram the value is passed back to the actual parameter
As the name indicates, this is a combination of Pass-by-Value and Pass-by-Result
Used for IN and OUT parameters

If aliasing is NOT allowed/used, and if no exceptions occur in the subprogram, the effect of value-result and reference is the same
Precondition: Actual parameter has value obtained previous to call

During subprogram: Only formal parameter is accessed, updated as desired


Postcondition: Actual parameter has last value assigned within subprogram


However if aliasing is allowed/used, there can be differences


Ex: Actual parameter is accessed directly as a global variable and is also passed to the sub as a parameter
With reference params, changes to the formal immediately change the global actual param
With value-result params, changes to the formal do not affect the global actual param (until the sub terminates)

Ada uses value-result for simple IN OUT parameters


But in Ada 83 it is not specified how structured in out params are passed

Idea is that language creators did not want to require the params to be passed in any specific way
They just wanted to require the in-out effect
If the result could differ based on whether params are value-result or reference, then the program is erroneous
Up to programmer to NOT use aliases

Ada 95 clarified, requiring all structured in out parameters to be passed by reference


See params.adb

Implementation:
Value + Result

Pass-by-Name
Definitely wackiest way of param passing

Used for IN and OUT parameters, and only in Algol


Idea is that actual parameter is textually substituted for the formal in all places that it is accessed in the subprogram
Kind of like a macro substitution

It is only evaluated at the point of use in the subprogram


Evaluated EACH TIME it is used in subprogram

Thus the parameter value or address could change based on where/when in the subprogram it is evaluated
However, the referencing environment used is that of the CALLER, not of the subprogram
So only changes within the subprogram that have a global effect will change its evaluation
This also makes implementation more difficult

For simple variables this is equivalent to pass-by-reference


Variable address evaluates the same way regardless of where in the subprogram it is located

For constant expressions, this is (almost) equivalent to pass-by-value


Evaluation of constant expr. will not change from one part of the subprogram to another
But cannot assign a new value to the formal param unless a copy is made

But it gets wacky when array elements or variable expressions are passed
Now changes within the subprogram can affect the index of the array or a variable within the expression
Can cause evaluation to differ in different parts of the subprogram

global int i = 0, var = 11, n = 5;
global int A[2] = {4, 8};
foo(var, 2*n, A[i]); // all pass by name

void foo(int x, int y, int z)
{
    x = x + 1;
    output(var);
    output(y);
    n = n + 1;
    output(y);
    output(z);
    z = z + 1;
    output(z);
    i = i + 1;
    z = z + 1;
    output(z);
}

Implementation:
It is not trivial to allow macro to be evaluated and reevaluated in environment of the caller
Parameterless subprograms called thunks are used
Thunk evaluates parameter in current state of caller's referencing environment
Returns the resulting address or value

Clearly this is a lot of overhead

Overhead and confusing results are why this is not used in newer languages

Subprograms as Parameters
We allow variables as parameters so that we can access their values (or addresses) from within a subprogram
Why not allow subprograms so that we can execute them from within a subprogram?
Some languages do allow this (ex. Pascal, C++, PHP)

However, there are some issues to consider


Can the parameter subprogram arguments differ in form from each other?
If so, how to type check and even check the number of arguments when the subprogram is actually called?

Easiest solution is to require the arguments to all have the same form
Header of parameter subprogram must be given within the header of the subprogram it is being passed to

Scope is also an issue: what is the referencing environment of the subprogram that is being passed as a parameter?
Three reasonable possibilities exist:

1) The referencing environment in which the parameter subprogram is CALLED: shallow binding

2) The referencing environment in which the parameter subprogram is DEFINED: deep binding

3) The referencing environment in which the parameter subprogram is PASSED as an argument: ad hoc binding
Note that shallow binding fits well with dynamic scoping and deep binding fits well with static scoping

Pascal and C++ both use deep binding
Shallow binding is used by SNOBOL, which also uses dynamic scoping
Ad hoc binding has never been used
See fnparams.cpp

Overloading (ad hoc polymorphism)


Using the same subprogram name with different parameter lists
When a subprogram is called, the compiler selects the correct version based on the parameter lists
In Ada, return type for a function is also used, since coercion is not done in Ada and function return values cannot be ignored

Enables programmer to use the same name for similar functions that take different argument types

Use: Make it easier for the programmer to use consistent names for subprograms
Without overloading: Programmer must make up different but similar names for subprograms that do similar things but for different types
Ex: abs(int), fabs(float), labs(long)
Ex: ISort(int * A), FSort(float * A)

With overloading: Programmer uses the same name and the compiler decides which to use
Ex: abs(int), abs(float), abs(long)
Ex: Sort(int * A), Sort(float * A)


But programmer must be careful:


Ada and C++ both allow overloading and default parameters
Leaving out some parameters in the call could make a call ambiguous
i.e. it matches more than one function header

Call can also be ambiguous if implicit casting of arguments is done

Operator Overloading is the same idea, but with symbols rather than identifiers
We discussed these issues previously
See Slide 12 of cs1621b.ppt

Generics

Generics
Parametric polymorphism
One or more parameters are passed to a subprogram when it is instantiated (i.e. when the code is generated) indicating the types that will be used for the parameters in the subprogram call
Can also be used in conjunction with packages (Ada) and classes (C++)

Thus a single subprogram declaration can be used to generate many different callable subprograms, all with the same functionality

Motivation:
Programmers often apply data structures and algorithms to more than one data type
Ex. Sorting, Searching algos
Ex. BST, PQ, Stack, Queue data structures

Even with overloading, the programmer must still write different (identical except for type) versions of the code
Generics simply transfer the job of making the different versions from the programmer to the compiler: this automates the overloading process
Note that DIFFERENT VERSIONS of the code MUST STILL BE generated

So the reason we have generics is to save the programmer some time (and perhaps some confusion)

Ada vs. C++:


In Ada, template instantiations must be explicit
Programmer specifies template arguments using the new statement
Ex: package int_io is new integer_io(integer);
The generic package is integer_io
The instantiated package is int_io
The type argument is integer

As is usual in Ada, if declaration is explicit, there will be no surprises



In C++, template instantiations can be explicit or implicit
Implicit: generated automatically by the compiler when a call is seen with the appropriate arguments
Duplicate instantiations are merged into a single code segment
Coercion cannot be done, since the types won't match the template correctly
Saves programmer some typing

Explicit: programmer declares each version


Coercion can be done using regular C++ promotion and conversion rules
Programmer is aware of each version

See template.cpp and tordlist.h



Java Generics
In Java 1.5 "generics" were added to the language
It is somewhat misleading, since generic abilities were always built into the Java language
Collections were defined in terms of class Object, which is the superclass to other Java classes

They could be used to store any Java class


However, retrieving objects back from the collection required explicit casting to the actual type if we wanted full access to them
ArrayList A = new ArrayList();
A.add(new String("Wacky"));
String S = (String) A.remove(0);

Also any typing mistakes (mixing types in the collection unintentionally) could only be caught at run-time (via casting exceptions)
Overall not bad, but some people thought type parameters should be allowed

JDK 1.5 added syntax very similar to that for C++ templates

However, it is very different from C++ templates (and Ada generics as well)
It is not really adding any new generic abilities to the language
It is not creating new code for each version of the class or method
It is designed to make collections of objects more type-safe
See more details in the handout

Implementing Subprograms

What is involved when a subprogram is called, during its execution, and when it terminates?
This will differ depending on whether recursion is allowed in a language or not

Most modern languages allow recursion, but original FORTRAN (up to FORTRAN 77) did not allow it


FORTRAN 77 (and before)


All variables within a subprogram were static, and recursive calls were not allowed
Activation records were still used, but they also could be static
Since all data was static, the size was known at compile time
Run-time stack not needed, since at most one call per sub could be performed at a time

What do we need to know when a subprogram is called?



Return Value

If sub is a function

Local Variables

Static

Parameters

Like local variables that are initialized


Return Address

Where to go back to when subprogram ends

C, C++ and Java


To allow for recursive calls, a run-time stack is used
Multiple activations of the same subprogram can co-exist
Each needs its own copy of parameters and local variables

But subs are not allowed to be directly nested


The only non-local variables that need to be accessed are global variables
However, inner classes allow a nesting of sorts

So the activation record looks similar to that used in FORTRAN


With additional link location to access global variables

Now multiple instances of an activation record can occur at the same time, so they must be created dynamically (at run-time), unlike in FORTRAN
Let's look at some of the contents of an activation record

Activation record contents (top of record to bottom):
Temporaries
Local Variables
Parameters
Dynamic Link to previous call
Static Link to NonLocals
Return Address

Temps and local variables are allocated within the subprog. call
In Pascal, C and C++, the local variables must be of fixed size
In Ada, they can be variable size (ex. arrays)

Parameters, links to non-locals and the return address are placed into the AR by the caller of the subprogram, so they are lower in the record


See rtstack.cpp

Accessing non-local variables within a subprogram


Local variables are located within the activation record (AR)
Can be accessed by knowing the base address of the AR plus a local_offset for each variable
Ex: Base address of AR = 162

int x, y[5];  // address of x is 162 + (other AR stuff)
float z;      // address of z is 162 + (other AR stuff) + 4 + 20

Non-locals are located elsewhere


For languages like C and C++:
Subprograms cannot be nested
Besides locals there are global variables

For languages like Ada and Pascal:


Subprograms can be nested to arbitrary depth
A sub can be declared within a sub, which is within a sub, which is within a sub
Using static scope, variables declared in a textual parent sub are accessible from an inner sub
Relative global variables

But the variable locations could be in different places on the run-time stack
How to find them?

What do we need to do?


1) Locate the AR that contains the nonlocal

2) Find where in the AR the variable is located

Finding where in the AR to look is the same as for local variables


Keep track of a local_offset value for the variable

Locating the AR is a different story


May not be directly prior to current AR

Two techniques used to locate AR


1) Static links
A link is kept in an AR to that AR's textual parent (from the declaration)
To access a single nonlocal many links may be crossed

2) Display
A single array is kept to indicate all of the currently accessible nested subs
Any nonlocal can be accessed with two indirect accesses

Static Links
Due to rules of static scope, if a subprogram is called, its textual parent subprogram MUST be active
sub foo {
    sub fum { }
}
main {
    // cannot call fum directly
}

However, textual parent does NOT have to be previous call on run-time stack
So dynamic link in AR is not enough (but would work for dynamic scoping)
sub foo {
    sub innerA { }
    sub innerB { innerA; }
    innerB;
}
main {
    foo;
}

Run-time stack (top to bottom): innerA, innerB, foo

Static links connect an AR to the AR of the sub's textual parent, no matter where previously on the RT stack it is
How is this used to access nonlocal variables?
Can be determined and maintained based on the nesting depths of the subprograms that are called
The difference in the nesting depths between the sub using a nonlocal variable and the sub in which the nonlocal is declared is equal to the number of static links that must be crossed to find the correct AR for the variable

This difference can be stored for each variable when the program is compiled, so that at run-time finding the variable is simple
sub parent {
    var X, Y
    sub child1 {
        var X, Z
        sub grand1 { var Z }
    }
    sub child2 {
        var Y
        call child1
    }
}
main { call parent }

If variable Y is accessed within grand1


chain offset is 2, since Y is declared two levels outside grand1
so search for Y only has to be done once at compile-time
at run-time we know to follow two static links, whatever call sequence is

What actually happens when a sub is called?
AR for textual parent of sub must be located on the run-time stack, so that the static link can be linked to it
A clear (but inefficient) way to do this is to follow dynamic links down the RTS until the AR for the parent sub is found
A better way can take advantage of the fact that the calling sub and the called sub must be relatives in the declaration tree
Calling sub could be parent of called sub (but not grandparent)
Calling sub could be called sub (direct recursion)
Calling sub could be a sibling of called sub
Calling sub could be a descendent of called sub (indirect recursion)
Calling sub could be a niece of called sub

So instead of following dynamic links, at compile-time we can pre-calculate the number of static links (from caller) to follow to find the appropriate textual parent AR
Always equal to: nesting_depth(calling sub) - nesting_depth(called sub) + 1
Calling sub could be parent of called sub
X - (X+1) + 1 = 0 static links (use caller's AR)

Calling sub could be called sub (direct recursion)
X - X + 1 = 1 static link: same textual parent

Calling sub could be a sibling of called sub
X - X + 1 = 1 static link: same textual parent

Calling sub could be a descendent of called sub (indirect recursion)
Calling sub could be a niece of called sub
Follow diff. in nesting depth + 1 static links
procedure Bigsub is
  procedure A(Flag: Boolean) is
    procedure B is
      ...
      A(false);
    end; -- B
  begin -- A
    if flag then B;
    else C;
  end; -- A
  procedure C is
    procedure D is
      here
    end; -- D
    ...
    D;
  end; -- C
begin -- Bigsub
  A(true);
end; -- Bigsub

Run-time stack when execution reaches "here" in D (top to bottom):
D: dynamic link to C, static link to C, return addr. to C
C: dynamic link to A, static link to Bigsub, return addr. to A
A: param flag ( = false), dynamic link to B, static link to Bigsub, return addr. to B
B: dynamic link to A, static link to A, return addr. to A
A: param flag ( = true), dynamic link to Bigsub, static link to Bigsub, return addr. to Bigsub
Bigsub: dynamic link to caller, static link, return addr.


Evaluation of static links


Maintaining is not too time-consuming
Chain offsets can be calculated at compile time

Local variables can be accessed directly

Non-locals must follow 1 or more static links


Works well if nesting depths do not get too deep

For deep sub nesting, cost of non-local access can be high


But usually 2 or 3 levels is max used

Display
Uses a single array to store links to ARs at all relevant nesting depths
To access a nonlocal at a given nesting depth, we just follow the display entry for that depth, then the local_offset
Never more than one link to follow

Array is updated as subs are called and as they terminate
Generally faster than static links if many nesting levels are used
We will skip the details here: read the text

Nested declaration blocks


Idea could be similar to nested subs
Blocks could be treated as parameterless subs
Static links could be used to determine textual parent

But it is actually much easier to handle, since block entry and exit is always the same
Parent block goes to child block

When child block terminates, we revert to parent block



Simply push new block declarations onto run-time stack, and pop them when block terminates
But we only have one activation record, so no links are required
"Non-locals" can be accessed just like locals


Dynamic Scoping
When a non-local variable is accessed, we always follow the dynamic links until the correct declaration is found
Clearly could differ depending upon call sequence
But the mechanics are actually simple

ARs must store names of local variables so we know where to stop the search
In static scoping the names are not needed, just the offsets

Data Abstraction

Procedural (process) abstraction:


Action can be performed without requiring detailed knowledge of how it is performed

Data abstraction:
New type can be used without requiring detailed knowledge of how it is implemented
We don't need to know the details of how it is stored in memory

We don't need to know the details of how it is manipulated via operations



More formally, an ADT must satisfy two conditions:


1) The declarations of the type and operations (interface) are contained in a single syntactic unit: ENCAPSULATION
The interface does not depend on how the objects are represented or how the operations are implemented

2) The representation of the objects is hidden from users of the ADT: DATA HIDING
Objects can only be manipulated via the provided interface

Ex: Stack
Data: something that can store and access multiple data values in the manner dictated by the operations
Operations:
Push: add new value to top of stack
Pop: remove top value from stack
Top: view top value (or a copy) without removing
Empty: is stack empty?

User of stack only needs to know the parameters and effect of each operation to use a stack correctly
Implementation could be an array, a linked-list, or maybe something different
Does not affect use
Implementer can hide these details from the user through private declarations

The idea of data abstraction was not always supported by programming languages
Ex: FORTRAN, Pascal, C did not fully support either encapsulation or data hiding
When learning good programming style, users tried to "simulate" data abstraction
Logically group type definitions, procedures and functions together as a unit
Only access the data type via the procedures and functions
Naturally, this was at the programmer's discretion
See ADT.p

Newer languages added true data abstraction


Ada via packages
C++, Java, C#, Ada95 via classes / objects

Encapsulation units that contain all details of the new type
Access modifiers that prevent access to internal details of the ADT from outside the encapsulation unit

See text for more details



Object-Oriented Programming (OOP)

Characteristics of OOP
1) Data abstraction: encapsulation + information-hiding
The operations for manipulating data are considered to be part of the data type (encapsulated)
The implementation details of the data type (both the structure of the data and the implementation of the operations) are separate from their specifications and (possibly) hidden from the user
As we discussed with ADTs

2) Inheritance
The characteristics of an ADT (data + operations) can be passed on to a subtype
Subtype can also add new data and operations

Allows programmer to build new (derived) types from old (parent) ones
Common data/operations do not have to be rewritten (or copied)
Operations that are slightly different in derived type can be rewritten (overridden) for that type
New data/operations tailor the derived type to the problem at hand
Parent type is unchanged and may (sometimes) be used together with derived type

Ex: Shape class


Has data: CenterX, CenterY
Has operations: AREA, DRAW
Subclasses: Rectangle, Circle, Triangle
Each subclass inherits the data and operations from the Shape class
Rectangle adds data: length, width
Rectangle overrides AREA = length * width, and DRAW in appropriate way for a rectangle

Subclass of Rectangle: Square


Guarantees that length == width

Similar ideas for Circle and Triangle


3) Polymorphism
Variables of a parent class can also be assigned objects of a subclass (or subclass of a subclass)
Operations used with a variable are based upon the class of the object currently stored (could be a parent type object or a derived type object)
Operations may have been overridden in the derived class
Dynamic binding allows parent and derived objects to be used together in a logical way

Ex: Shape class


We could declare:
Shape shapelist[100];
shapelist[0] = new Rectangle(0, 0, 10, 20);
shapelist[1] = new Square(50, 100, 30, 30);
shapelist[2] = new Circle(100, 50, 25);
for (int i = 0; i < 3; i++)
    shapelist[i].Draw();

Polymorphism allows these different objects to be accessed consistently within the same array
Think about how you could do the code above in C or Pascal
It would not be easy!

One option: Make one giant struct or record to contain all of the data, including a union or variant
Base class would use only the core data items
Derived classes would use additional data items as provided in the union or variant
To do the operations, we would need a switch or case to test which type the variable is, so that it can be written out appropriately

Now what if we want to add another new derived class, Pentagon?


With OOP, it is simple to add any new data and override the necessary operations
Without OOP we would have to change the overall structure of the data and operations: old types would change, possibly causing problems

OO Languages
1) Smalltalk was the first and purest OOL

All data (even numeric literals) are objects, and are all descendents of class Object
Objects are all allocated from the heap, and implicitly deallocated (garbage collection)
Variables are references, with implicit dereferencing
Execution of a program (logically) involves objects sending messages to each other, executing methods, and responding back
So the data is driving the execution, not the control statements

Smalltalk example to count letters in an input string


| data ctr letters |
data := Prompter prompt: 'Enter your name' default:''.
ctr := 1.
letters := 0.
[ctr <= data size] whileTrue: [
    (data at: ctr) isLetter ifTrue: [ letters := letters + 1 ].
    ctr := ctr + 1.
].
letters printNl.

Note variables are not typed


Only type checking is that message sent to the object is recognized

Even blocks [] are objects


Evaluated when appropriate methods are called

Consider the while loop below


[ctr <= data size] whileTrue: [
    (data at: ctr) isLetter ifTrue: [ letters := letters + 1 ].
    ctr := ctr + 1.
].

Semantics of this loop are as follows:


whileTrue: is a message sent to the top block, with the second block as a parameter
The top block executes a method corresponding to the whileTrue: message that does the following:
Evaluates the top block
If true, evaluates the parameter block and repeats
If false, exits the method

This propagation of messages can sometimes lead to very short code, if variables are eliminated

Equivalent to previous code:


| letters |
letters := 0.
(Prompter prompt: 'Enter your name' default:'') do: [ :c |
    c isLetter ifTrue: [ letters := letters + 1 ].
].
letters printNl.

Now we cascade the messages to allow fewer statements (also the do: loop iterates through the characters in a string, so we don't need the loop counter)

((Prompter prompt: 'Enter your name' default:'') select: [ :c | c isLetter ]) size printNl.

Now the select: loop generates a string based on the condition in the block

More on Smalltalk (classes and objects)


Data in an object can be an instance variable or a class variable
Instance variables are associated with objects
Separate data for each object
Accessible only through the methods defined for that object: always private to the class

Class variables are associated with classes


Shared data for all objects of the same class
Accessible from all objects, but still private to the class

Methods have a similar grouping, but are public

Instance methods associated with objects
Class methods associated with entire class

More on Smalltalk (inheritance)


Object base class of all others

Only single inheritance allowed


All inheritance is implementation inheritance
Data and methods of parent class are always accessible to the derived class
i.e. Cannot hide implementation details from derived class

Advantage: Derived class can likely implement its methods more efficiently with access to parent data
Disadvantage: Change in parent class implementation will likely require change in derived class implementation
Ex. Traversable stack

More on Smalltalk (polymorphism)


All messages are dynamically bound to methods
At run-time, when a message is received, the object's class is searched for a method, then, if necessary, its superclass, its super-superclass and so on up to Object

Variables have no types since they are only used to refer to objects, not to determine the messages an object can receive
Clearly some liabilities with this approach
Slows language down due to run-time overhead
Programmer type errors cannot be caught until execution time

Let's look at some examples:


person.cls as an example of a new class
See personTest.st

student.cls as an example of a subclass
studentTest.st as an example showing polymorphic access
twodarry.cls as another subclass example
See twodTest.st

For more information, see the GNU Smalltalk User's Guide:


http://www.gnu.org/software/smalltalk/gst-manual/gst.html

2) C++ is an imperative/OO mix


Had to be backward compatible with C
Wanted to add object-oriented features
Result is that programmer can use as few or as many OO features as he/she wants to

C++ Classes and Objects


Can be static, stack-dynamic or heap-dynamic
Member data and member functions can be private, protected or public
Allows programmer to decide
Like Smalltalk, has notion of class variables
Declared as static in C++
Destructor needed if object uses dynamic memory

C++ Inheritance
Do not need a superclass (no Object base class for all other classes)
Multiple inheritance is allowed
Complex and difficult to use

Implementation inheritance or interface inheritance are allowed


With interface inheritance, all data and functions are still inherited, but only public ones are directly accessible to the derived class
Advantage: Modifications to parent class do not affect derived class, as long as they do not change the interface
Disadvantage: Operations may be slower, since they cannot access the data directly
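A minimal sketch of interface inheritance, with hypothetical class names. The parent's data is private, so the derived class inherits it but can only reach it through the public functions.

```cpp
class Account {
public:
    explicit Account(long cents) : balanceCents(cents) {}
    long balance() const { return balanceCents; }
    void deposit(long cents) { balanceCents += cents; }
private:
    long balanceCents;  // inherited by subclasses, but not accessible to them
};

class SavingsAccount : public Account {
public:
    using Account::Account;  // reuse the parent's constructor
    // Must go through the public interface; balanceCents is not visible here.
    void addInterest(int percent) { deposit(balance() * percent / 100); }
};
```

Account could later store its balance differently without any change to SavingsAccount, at the cost of two extra function calls in addInterest (which a compiler may well inline).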

OOP

C++ polymorphism
By default all functions are statically bound
Recall that this allows faster execution, a goal of the C++ language
However, true polymorphism cannot be utilized with statically bound functions

Dynamic binding is enabled by using virtual functions and pointers (or references)
This tells the compiler not to bind the function name to the code until run-time

Abstract base classes can be created with pure virtual functions


Not implemented in the base class

See poly.cpp
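The difference between the two bindings can be sketched as follows. This is not the contents of poly.cpp; Shape, Circle and describe are hypothetical names.

```cpp
#include <string>

class Shape {  // abstract base class: it has a pure virtual function
public:
    virtual ~Shape() = default;
    virtual std::string name() const = 0;          // pure virtual: dynamically bound
    std::string kind() const { return "shape"; }   // non-virtual: statically bound
};

class Circle : public Shape {
public:
    std::string name() const override { return "circle"; }
    std::string kind() const { return "round shape"; }  // hides kind(), does not override it
};

// Called through a base-class reference: kind() is chosen from the static
// type (Shape), name() from the dynamic type of the object.
std::string describe(const Shape& s) {
    return s.kind() + "/" + s.name();
}
```

Passing a Circle to describe yields the Shape version of kind() but the Circle version of name(); only the virtual function behaves polymorphically.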

OOP

3) Java falls in between Smalltalk and C++


Like Smalltalk:
Object is base class to other classes
Single inheritance only
Objects are (almost) all dynamic, with garbage collection
References used to access objects
Method names are (by default) dynamically bound

Like C++:
Access can be private, public or protected
Static binding can optionally be used to improve run-time speed
Overall syntax for member data and function access
Variables are typed

OOP

Other Java OOP features:


Interfaces allow for a simplified form of multiple inheritance
An interface is in a sense a base class with no data and only abstract (pure virtual) methods
A class that implements an interface simply implements the methods specified therein
Advantages: Objects that implement an interface can be used wherever the interface is specified. This allows for a type of generic behavior
Ex: Comparable interface, Runnable interface

Disadvantage: Can become complicated when interfaces and inheritance are both used
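To keep these notes' added examples in one language, here is the closest C++ analogue rather than Java itself: an "interface" is a class with no data and only pure virtual functions. The names Comparable, Printable, Money and maxOf are assumptions for illustration and are not the Java library types.

```cpp
#include <string>

struct Comparable {  // interface-like: no data, only pure virtual functions
    virtual ~Comparable() = default;
    virtual int compareTo(const Comparable& other) const = 0;
};

struct Printable {
    virtual ~Printable() = default;
    virtual std::string print() const = 0;
};

// One class can implement several interfaces.
class Money : public Comparable, public Printable {
public:
    explicit Money(int c) : cents(c) {}
    int compareTo(const Comparable& other) const override {
        // Assumes other is also Money; dynamic_cast throws std::bad_cast otherwise.
        const Money& m = dynamic_cast<const Money&>(other);
        return cents - m.cents;
    }
    std::string print() const override { return std::to_string(cents) + " cents"; }
private:
    int cents;
};

// Generic code written only against the interface.
const Comparable* maxOf(const Comparable* a, const Comparable* b) {
    return a->compareTo(*b) >= 0 ? a : b;
}
```

maxOf works for any class that implements Comparable, which is the generic behavior mentioned above.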

Reflection allows us to manipulate the classes themselves


See poly.java

OOP

OOL Implementation
Data:
Typically a record/struct type of storage is used
Class Instance Record (CIR)
Data members are accessed by name, in the same way as records
Subclass adds extra data to CIR of parent class
Private access enforced by limiting visibility of the data
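One way to picture the CIR is with plain structs. This is a hand-written model with hypothetical names, not what a particular compiler is guaranteed to emit.

```cpp
#include <cstddef>

struct PersonCIR {   // class instance record for the parent class
    int id;
    int age;
};

struct StudentCIR {  // subclass record: parent's fields first, then the additions
    PersonCIR base;
    double gpa;
};

// Code compiled for the parent can work on the parent part of a subclass
// record unchanged, because the parent's fields keep the same offsets.
int ageOf(const PersonCIR* p) { return p->age; }
```

Because the parent part sits at offset 0 of the subclass record, a pointer to a StudentCIR can serve wherever a PersonCIR is expected.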


OOP

Subprograms:
Static binding
Subprograms that will be called are determined by the variable type
Variable types are known at compile time and code can be determined then

Dynamic Binding:
Subprograms that will be called are determined by the object's type, not the variable's type
Objects stored in a variable are determined at run time
Appropriate links must be stored with the object
But they are the same for all objects of that class

Virtual Method Table (VMT) used to store links to all pertinent subprograms
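A hand-built VMT might look like the sketch below. Real compilers generate the equivalent automatically for virtual functions; the names here (VMT, Object, speak and the two sample classes) are hypothetical.

```cpp
#include <string>

struct Object;  // forward declaration

struct VMT {    // one table per class, shared by all of its objects
    std::string (*speak)(const Object*);
};

struct Object { // each object stores a single link to its class's table
    const VMT* vmt;
};

std::string animalSpeak(const Object*) { return "..."; }
std::string dogSpeak(const Object*) { return "Woof"; }

const VMT animalVMT{animalSpeak};
const VMT dogVMT{dogSpeak};

// Dynamic dispatch: follow the object's link to its VMT, then call through the slot.
std::string speak(const Object* o) { return o->vmt->speak(o); }
```

The call in speak does not know the object's class at compile time; the VMT link stored in the object selects the code at run time.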

Parallelism

Parallelism is incorporated into programs for 2 primary reasons:


1) Program is running in a multiprocessing or distributed environment
Many computers now have multiple CPUs

Many jobs are distributed over multiple computers in a network


A programming language should be able to take advantage of this parallelism
Many algorithms can be improved if designed for parallel execution

This is PHYSICAL PARALLELISM



Parallelism

2) Program is running in a simulated parallel environment, allowing for asynchronous activity


Ex: Two windows are displayed to the user. One shows the current time (incremented by seconds) and one allows the user to draw images on the screen
We don't want the act of the user drawing to stop the clock
We don't want the clock running to prevent the user from drawing
Even with a single processor, we want both of these activities to execute in parallel

This is LOGICAL PARALLELISM



Parallelism

What issues must we be concerned with?


Synchronization
Execution of tasks in parallel causes them to be asynchronous
Cannot predict at what point in time one task will execute an instruction relative to another task

If the tasks are independent, this is not a problem


No resources are shared, so it doesn't matter where in the execution each task is
Ex: One task to count ballots from Florida, one task to count ballots from New Mexico


Parallelism

If the tasks have some dependencies, there can be a problem


Most common dependency is shared data
To handle this we must synchronize the tasks
Cooperation Synchronization
One task is dependent upon an output/outcome of another
Ex: Task B must process data produced by Task A
Contractor B cannot put up drywall until contractor A has finished the wiring
Task to count ballots cannot proceed until task that collects ballots provides it with some

We must have a mechanism that allows Task B to pause until the data is available
B could loop and keep checking for data
B could wait for some signal from A
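The second option, waiting for a signal, can be sketched with the standard C++ thread library. The function name and the data values are made up for the example.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Task A produces a value; Task B sleeps until A signals that it is ready.
int runProducerConsumer() {
    std::mutex m;
    std::condition_variable ready;
    bool hasData = false;
    int data = 0, result = 0;

    std::thread taskB([&] {
        std::unique_lock<std::mutex> lock(m);
        ready.wait(lock, [&] { return hasData; });  // pause until the data is available
        result = data * 2;                          // process A's output
    });
    std::thread taskA([&] {
        {
            std::lock_guard<std::mutex> lock(m);
            data = 21;
            hasData = true;
        }
        ready.notify_one();                         // signal B
    });

    taskA.join();
    taskB.join();
    return result;
}
```

The result is the same whichever thread happens to run first, because B checks the hasData flag under the lock before it sleeps.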

Parallelism

Competition Synchronization
Both tasks are competing for the same shared resource
If one or both tasks modify the data, it could cause data inconsistencies
Ex: Task A and Task B are MAC machine accesses of the same bank account
Task A checks the balance: $200
Task B checks the balance: $200
Task A withdraws $200
Task A updates balance to $0
Task B withdraws $200
Task B updates balance to $-200

We must have some mechanism that ensures MUTUAL EXCLUSION for CRITICAL DATA
We could have a LOCK on the data, or a similar mechanism allowing only one task to access it at a time
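A lock on the account might look like this sketch (BankAccount and finalBalanceAfterRace are hypothetical names). The balance check and the update happen under one lock, so the interleaving in the example above cannot occur.

```cpp
#include <mutex>
#include <thread>

class BankAccount {
public:
    explicit BankAccount(int b) : balance(b) {}
    // Check and update form one critical section: no other task can
    // read the balance between the test and the subtraction.
    bool withdraw(int amount) {
        std::lock_guard<std::mutex> lock(m);
        if (balance < amount) return false;
        balance -= amount;
        return true;
    }
    int current() {
        std::lock_guard<std::mutex> lock(m);
        return balance;
    }
private:
    std::mutex m;
    int balance;
};

// Two tasks each try to withdraw $200 from a $200 account.
int finalBalanceAfterRace() {
    BankAccount acct(200);
    std::thread a([&] { acct.withdraw(200); });
    std::thread b([&] { acct.withdraw(200); });
    a.join();
    b.join();
    return acct.current();
}
```

Exactly one of the two withdrawals succeeds, so the balance ends at $0 and never at $-200.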

Parallelism

Synchronization Mechanisms
Semaphores
Devised by Dijkstra
Basically guards that are placed around code
P must succeed to gain access to code
Decrements a counter when it succeeds

V executes when critical section ends
Based on initial value of counter, we can control how many tasks are allowed to access the critical section at once

If used properly, can guarantee either cooperation or competition synchronization


However, it is easy to NOT use them properly
Ex: omitting a V can leave other tasks blocked forever; omitting a P leaves the critical section unprotected
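A counting semaphore can be sketched on top of a mutex and a condition variable. This is one possible construction, not Dijkstra's original formulation, and maxInside is a hypothetical helper that measures how many tasks were inside the guarded section at once.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

class Semaphore {
public:
    explicit Semaphore(int initial) : count(initial) {}
    void P() {  // must succeed to enter; decrements the counter when it does
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [this] { return count > 0; });
        --count;
    }
    void V() {  // executed when the critical section ends
        {
            std::lock_guard<std::mutex> lock(m);
            ++count;
        }
        cv.notify_one();
    }
private:
    std::mutex m;
    std::condition_variable cv;
    int count;
};

// Runs `tasks` threads through a section guarded by Semaphore(limit) and
// returns the largest number of threads observed inside at the same time.
int maxInside(int tasks, int limit) {
    Semaphore sem(limit);
    std::mutex statsLock;
    int inside = 0, peak = 0;
    std::vector<std::thread> pool;
    for (int i = 0; i < tasks; ++i) {
        pool.emplace_back([&] {
            sem.P();
            {
                std::lock_guard<std::mutex> g(statsLock);
                if (++inside > peak) peak = inside;
            }
            {
                std::lock_guard<std::mutex> g(statsLock);
                --inside;
            }
            sem.V();
        });
    }
    for (auto& t : pool) t.join();
    return peak;
}
```

With an initial value of 1 the semaphore acts as a lock; a larger initial value admits that many tasks at once.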

Parallelism

Monitors
Devised by Brinch Hansen and Hoare

Critical data section is part of a data object that allows only one task entry at a time
Better than semaphores for competition synchronization, because mechanism is built into the monitor
Harder for the programmer to mess up

No better for cooperation synchronization


Still must be done manually

Used in Concurrent Pascal, Modula-2 and (somewhat) in Java
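C++ has no monitor construct, but the idea can be approximated by a class whose data is private and whose every method takes the same lock. In this sketch (all names hypothetical) mutual exclusion comes from the class design, while the cooperation part still has to be coded by hand with a condition.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

class CounterMonitor {
public:
    void increment() {
        std::lock_guard<std::mutex> lock(m);  // only one task inside at a time
        ++value;
        changed.notify_all();
    }
    // Cooperation synchronization is still manual: wait on a condition.
    int waitUntilAtLeast(int target) {
        std::unique_lock<std::mutex> lock(m);
        changed.wait(lock, [&] { return value >= target; });
        return value;
    }
private:
    std::mutex m;
    std::condition_variable changed;
    int value = 0;  // critical data: reachable only through the methods above
};

// Several threads increment the shared counter; the caller waits for the total.
int countWithThreads(int threads, int perThread) {
    CounterMonitor counter;
    std::vector<std::thread> pool;
    for (int i = 0; i < threads; ++i) {
        pool.emplace_back([&] {
            for (int j = 0; j < perThread; ++j) counter.increment();
        });
    }
    int seen = counter.waitUntilAtLeast(threads * perThread);
    for (auto& t : pool) t.join();
    return seen;
}
```

Callers cannot forget to lock, because they never touch value directly.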



Parallelism

Message Passing
Proposed by Brinch Hansen and Hoare

More general than either of the two previous techniques


Tasks are synchronized via messages sent to each other
Message is similar in look/execution to a subprogram call, but with restrictions:
Caller (or passer) of the message is blocked at the call until the receiver is ready to receive it
Receiver (or executer) of the message is blocked at the message code until the message is called
Caller and Receiver meet at a rendezvous

Parallelism

Idea is that we know exactly where in the code both tasks will be when a rendezvous occurs
So even though tasks execute asynchronously, we synchronize them with respect to each other at a rendezvous

Ex: Ada

Still much of the work is up to the programmer
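A simplified rendezvous can be sketched in C++ as a one-slot synchronous channel. This is not Ada's mechanism (in Ada the accept body runs while the caller is blocked); it only models the blocking on both sides, and the names Rendezvous, send, accept and exchange are assumptions for this example.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

class Rendezvous {
public:
    // Caller is blocked until a receiver has accepted this message.
    void send(int msg) {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [this] { return !full; });  // one message in flight at a time
        slot = msg;
        full = true;
        long mine = ++sent;                       // ticket for this message
        cv.notify_all();
        cv.wait(lock, [this, mine] { return received >= mine; });
    }
    // Receiver is blocked until some caller sends a message.
    int accept() {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [this] { return full; });
        int msg = slot;
        full = false;
        ++received;
        cv.notify_all();
        return msg;
    }
private:
    std::mutex m;
    std::condition_variable cv;
    int slot = 0;
    bool full = false;
    long sent = 0, received = 0;
};

// One sender and one receiver meet twice; returns the sum the receiver saw.
int exchange() {
    Rendezvous r;
    int got = 0;
    std::thread receiver([&] { got = r.accept() + r.accept(); });
    r.send(40);
    r.send(2);
    receiver.join();
    return got;
}
```

When send returns, the sender knows the receiver has already taken the message, so both tasks are at a known point in their code.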


Parallelism

Parallel processing concerns


Data consistency
We have already discussed this
Mutual exclusion is needed to prevent multiple tasks from accessing critical data at the same time
However, efforts to ensure data consistency can cause other problems, such as DEADLOCK and STARVATION


Parallelism

Deadlock
When a (shared) resource has restricted access, it can cause a task to stop execution
Wait in a semaphore queue
Wait in a monitor queue
Wait in an accept queue

If a circular resource dependency exists, we can get deadlock
Ex:
Task A has acquired binary semaphore S1
Task B has acquired binary semaphore S2
Task A is waiting for binary semaphore S2
Task B is waiting for binary semaphore S1
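One common way to break the circular wait, not taken from these notes, is to acquire both resources in a single step. The sketch below uses mutexes as stand-ins for the binary semaphores S1 and S2 and relies on std::lock, which the C++ standard specifies as locking all of its arguments without deadlock; runBoth is a hypothetical name.

```cpp
#include <mutex>
#include <thread>

// Task A wants s1 then s2, Task B wants s2 then s1. Locking them one at a
// time in those orders could deadlock; std::lock acquires both safely.
int runBoth(int rounds) {
    std::mutex s1, s2;
    int shared = 0;

    std::thread taskA([&] {
        for (int i = 0; i < rounds; ++i) {
            std::lock(s1, s2);
            std::lock_guard<std::mutex> g1(s1, std::adopt_lock);  // release on scope exit
            std::lock_guard<std::mutex> g2(s2, std::adopt_lock);
            ++shared;
        }
    });
    std::thread taskB([&] {
        for (int i = 0; i < rounds; ++i) {
            std::lock(s2, s1);  // opposite order is still safe
            std::lock_guard<std::mutex> g2(s2, std::adopt_lock);
            std::lock_guard<std::mutex> g1(s1, std::adopt_lock);
            ++shared;
        }
    });

    taskA.join();
    taskB.join();
    return shared;
}
```

Both tasks always finish, and every increment is counted because each one happens while both locks are held.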

Parallelism

Starvation
To combat deadlock, most languages allow a task to release a resource prematurely in some circumstances
Ex: If one of the tasks in the previous example releases its semaphore, the other can proceed

Under these circumstances there is the possibility that a task may never acquire all of the resources that it needs at the time it needs them: this is starvation

We must be careful to avoid all of these problems when programming in parallel



Parallelism

Let's look at Java as an example:


Deadlock: see deadlock.java

Corrupt data: see corrupt.java


Some features of older Java implementations are now deprecated because they are too prone to deadlock and data consistency problems
Suspend / Resume
Does not free locked objects
Can easily lead to deadlock if not resumed

Stop
Immediately frees locked objects
Can lead to data inconsistency

Prolog

As we discussed previously, Prolog is a language used for logic programming

"Programs" in Prolog consist of facts and rules in a database


Facts consist of an identifier followed by a parenthesized, comma-separated list of objects (atoms), followed by a period
The identifier represents some relationship amongst the objects, and is called a predicate
The objects are the arguments
Ex. from ex1.pl: father(herb, irving).


Prolog

Rules are predicates that consist of a head and a body


In order for the head to "succeed" in its evaluation, all of the goals in the body must be satisfied
Ex from ex1.pl:
sibling(X,Y) :- X \== Y, parent(P,X), parent(P,Y).
The :- can be thought of as "if"

These goals could be facts, or could be other rules

Execution of a program is in fact a sequence of questions, or assertions


Database is searched in an effort to satisfy all of the assertions

Prolog

If assertions can be satisfied, answer is yes


Otherwise, answer is no

If a given assertion succeeds, execution proceeds to the next one
If a given assertion fails, execution backtracks and attempts to re-satisfy the previous assertion
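The proceed-or-backtrack behavior can be imitated in C++ with nested searches over a small fact table. This is only a model of the search for a query with a known first argument, not a Prolog implementation. Apart from herb and irving, the facts and names below are made up.

```cpp
#include <string>
#include <utility>
#include <vector>

using Fact = std::pair<std::string, std::string>;  // models parent(P, C)

// Hypothetical database, searched from top to bottom.
const std::vector<Fact> parentFacts = {
    {"herb", "irving"}, {"herb", "mary"}, {"sue", "tom"}};

// Models a sibling query for a known X: satisfy parent(P, X), then
// parent(P, Y) with Y different from X. When an inner goal fails, the loop
// moves on to the next candidate fact, which plays the role of backtracking.
std::vector<std::string> siblingsOf(const std::string& x) {
    std::vector<std::string> answers;
    for (const Fact& f1 : parentFacts) {          // goal: parent(P, x)
        if (f1.second != x) continue;             // fails: try the next fact
        for (const Fact& f2 : parentFacts) {      // goal: parent(P, Y), P now instantiated
            if (f2.first != f1.first) continue;   // different parent: fails
            if (f2.second == x) continue;         // Y must not equal X
            answers.push_back(f2.second);         // all goals satisfied
        }
    }
    return answers;
}
```

A query that finds no combination of facts satisfying every goal returns an empty list, which corresponds to the answer no.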

So what about variable assignments?


These are in fact just side effects that occur in an effort to satisfy the query
In fact variables are not assigned in the traditional (imperative language) sense

Prolog

Variables in Prolog are dynamically typed and have two states:


Uninstantiated:
Variable is not associated with a value

Instantiated
Variable is associated with a value

Once a variable is instantiated, it keeps that value, and all occurrences of that variable within the same scope have that value
Cannot be re-assigned in sense of imperative languages
However, if execution backtracks past the point at which it was instantiated, it can again become uninstantiated

Let's look again at ex1.pl



Prolog

Recursion and database search


Recursion is a fundamental part of programming in Prolog
Execution is simply satisfaction of goals, and there are no loops as in imperative languages
Thus, to build complex "programs" we must utilize recursive programming

Each attempt to satisfy a goal initiates a search of the database



Prolog

By default the DB is searched from top to bottom

We can take advantage of this in our programs


Ex: put the base case before the recursive case, so we don't have to explicitly test for it
Although, as the text points out, this could be considered to be a flaw in the language, since the order that the rules are considered should not matter to the "truth" of the logic


Prolog

If a subgoal in a rule fails at any point, we backtrack and attempt to resatisfy a previously satisfied subgoal
When resatisfying a subgoal, the db search resumes from the point at which it succeeded the first time

See recurse.pl


Prolog Lists

As in Lisp, the list is an important data structure in Prolog


A list consists of a head and a tail
Tail could be the empty list

