
Stellenbosch University: CS144 Project Flapjack: A stack-based language for the JVM Part 1: The Reader

July 31, 2013

Introduction

This project is due Sunday 25 August at 23:55. See the section Hand-in Specification at the end of this document. In this project, we will implement an interpreter for a stack-based programming language for the JVM that we'll call Flapjack. The language will steal ideas from other stack-based languages such as Forth and PostScript, and from languages like Lisp and Mathematica. We have not covered stacks yet, but if you want to, you can read ahead in the textbook's section on using stacks for evaluating arithmetic expressions, stack-based languages and function-call abstraction (pages 570-575). You do not need to do this yet to complete the first part of the project.

Flapjack's Syntax

What does a valid Flapjack program look like? (Roughly, this is what we mean by the syntax of a language.) We start with this example from the textbook:

    1 2 3 + 4 5 * * +

The structure of this program is very simple: it is just a series of forms (here numbers or operators) separated by whitespace (spaces, tab characters, newlines). We are going to add another possible form, which will be a series of zero or more forms within curly brackets. In the snippet of code below, there are three such forms

    {5 7 <} ifthen {2 -3 +} else {4}

the forms in question being {5 7 <}, {2 -3 +} and {4}. These compound forms will allow us to structure the language more easily, just as they do in other languages such as Java. Here is a more complex example of a Flapjack program that calculates the factorials 0! through 10! and prints each one out, one per line:

    defun factorial {
        {dup 2 <}          \ {n ...} --> {(n<2) n ...}
        ifthen {pop 1}     \ {n ...} --> {1 ...}
        else {
            dup 1 -        \ {n ...} --> {(n-1) n ...}
            factorial      \ Recursion: {(n-1) n ...} --> {(n-1)! n}
            *              \ {(n-1)! n ...} --> {n! ...}
        }
    }

    -1 true                \ {} --> {true -1}
    while {
        {1 +}              \ {k} --> {(k+1)}
        dup                \ {j} --> {j j}
        factorial          \ {j j} --> {j! j}
        println            \ {j! j} --> {j}
        {dup 10 <}         \ {j ...} --> {(j < 10) j ...}
    }

Note that everything after a backslash (\) is ignored as a comment (just as // starts a line comment in Java).

What does it all mean?


The lack of a detailed explanation of what each of the elements of Flapjack actually means is deliberate. The part of the project you will be writing can be used to implement many different languages. This doesn't mean that the code will be able to read any language's exact syntax, but you can easily translate a language so that its essence is captured by this syntax. Things like strings can always be added later. Some ideas if you want to play around with your code later:

x86 Assembly Language


    label infinite_loop
    mov ax 5
    mov bx 6
    add ax bx
    jmp infinite_loop
    ...

Java(ish)

    public class Foo {
        public static void main{String{} args} {
            int a = 4
            if {a == 4} {
                System.out.println{5}
            } else {
                System.out.println{7}
            }
        }
    }

Turtle graphics: Logo(ish)


    forward 20
    penup
    turnleft 90
    forward 30
    pendown
    forward 10

First things first: The Reader


In languages with very simple syntax such as Lisp and Prolog, the very first stage of the compiler/interpreter is called the reader. (In languages with more complex syntax such as Java, this stage is usually divided into what are called the scanner and the parser.) Your first task will be to write a reader for Flapjack. First we must specify the Flapjack syntax more exactly (you will learn how to do this more formally in a future course). We do this recursively, starting at the biggest unit, the program, and defining each part in terms of smaller parts.

- A Flapjack Program is a series of zero or more Forms.
- A Form is either a Simple Form or a Compound Form.
- A Simple Form is either an Integer or a Symbol.
- A Compound Form starts with a { and ends with a }, between which are zero or more Forms.
- An Integer is a series of digits (0-9), possibly starting with a minus -.
- A Symbol is any string of characters (ASCII characters 33 through 126) that
    - Does not start with a digit (0-9), but characters other than the first one may be digits.
    - Does not contain whitespace characters (spaces, newlines, tabs etc.).
    - Does not contain { or }.
    - Does not contain a backslash \ (this is used for line comments, see below).

Some more rules:

- Simple forms must be separated by either a piece of whitespace or a curly brace, so 1234foo is not legal, but 1234 foo is, and so is 1234{foo}. Note that foo1234 is legal; it is a single symbol.
- Line comments are started using the backslash \. Every character from the backslash up until the end of the line is treated as whitespace. (By the previous rule, that means comments are a valid means of separating simple forms.)

Based on this specification, we can outline the reader's algorithm. (Note that in this language we only ever need the current character and the next character in the program to make the necessary decisions during reading.)

- Are we at the end of the file? If so, we are done. If this happens inside a recursive call to the reader, there is a missing closing curly brace }: throw a MissingStackEndTokenException.
- Read a character.
- Is the character whitespace? Then throw it away and start again. Otherwise continue on.
- Is the character a backslash \? Then read and throw away each character until the end of the current line and start again. Otherwise continue on.
- Is the character a digit? This is a positive integer: call the integer reading subroutine. Otherwise continue on.
- Is the character a minus? If not, continue on. If so, this may be a negative integer OR a symbol starting with -. Look at the next character. If it is a digit, then call the integer reading subroutine (this is a negative integer). If it is not a digit, then we have a symbol (note that it may be a minus by itself; the next character may be whitespace!).
- Is the character an opening curly brace { ? If so, call the reader recursively and continue.
- Is the character a closing curly brace } ? If so, return from the recursive reader call. If we aren't in a recursive call to the reader, then this is a mismatched closing brace: throw an UnexpectedTokenCharacterException.
- Otherwise, the character starts a symbol: call the symbol reading subroutine.

When reading an integer, make sure to check whether it is within the bounds allowed for ints in Java. If there is overflow or underflow, signal an IntegerLiteralOutOfBoundsException.
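The steps above can be sketched in Java roughly as follows. This is only an illustration under stated assumptions, not the skeleton's API: the class name ReaderSketch, its read method and the "int "/"sym " output labels are hypothetical, the program is walked with a plain index instead of an FJLookaheadReader, and IllegalStateException stands in for the project's own exception classes. Integers are kept as text here (conversion and bounds checking are a separate concern), and the ASCII 33 through 126 restriction on symbols is not enforced.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the reading loop; not the FJReader skeleton. */
class ReaderSketch {
    private final String src;
    private int pos = 0;
    private final List<String> out = new ArrayList<>();

    private ReaderSketch(String src) { this.src = src; }

    /** Reads a whole program and returns one description per element. */
    static List<String> read(String program) {
        ReaderSketch r = new ReaderSketch(program);
        r.readForms(0);
        return r.out;
    }

    private static boolean isDigit(char c) { return c >= '0' && c <= '9'; }

    /** True when c ends a simple form: whitespace, a brace, or a comment start. */
    private static boolean isDelimiter(char c) {
        return Character.isWhitespace(c) || c == '{' || c == '}' || c == '\\';
    }

    /** depth is 0 at top level and grows by one for each enclosing '{'. */
    private void readForms(int depth) {
        while (pos < src.length()) {
            char c = src.charAt(pos);
            if (Character.isWhitespace(c)) {
                pos++;                                   // throw whitespace away
            } else if (c == '\\') {                      // comment: skip to end of line
                while (pos < src.length() && src.charAt(pos) != '\n') pos++;
            } else if (c == '{') {
                pos++;
                out.add("{");
                readForms(depth + 1);                    // recursive call for a compound form
            } else if (c == '}') {
                if (depth == 0) throw new IllegalStateException("unexpected } at " + pos);
                pos++;
                out.add("}");
                return;                                  // leave the recursive call
            } else {
                // One character of lookahead decides between "-5" and the symbol "-".
                boolean nextIsDigit = pos + 1 < src.length() && isDigit(src.charAt(pos + 1));
                readSimpleForm(isDigit(c) || (c == '-' && nextIsDigit));
            }
        }
        if (depth > 0) throw new IllegalStateException("missing } at end of input");
    }

    /** Collects characters one by one until a delimiter or the end of input. */
    private void readSimpleForm(boolean integer) {
        StringBuilder text = new StringBuilder();
        text.append(src.charAt(pos++));                  // first character, already classified
        while (pos < src.length() && !isDelimiter(src.charAt(pos))) {
            char c = src.charAt(pos);
            if (integer && !isDigit(c)) throw new IllegalStateException("bad integer at " + pos);
            text.append(c);
            pos++;
        }
        out.add((integer ? "int " : "sym ") + text);
    }

    public static void main(String[] args) {
        for (String form : read("{5 7 <} ifthen {2 -3 +} else {4} \\ a comment")) {
            System.out.println(form);
        }
    }
}
```

Passing the nesting depth into the recursive call is one simple way to tell a legitimate closing brace from a mismatched one; the skeleton classes may organise this differently.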

Instructions
For this part of the project, you will be provided with code skeletons for the classes FJLookaheadReader, FJReader, FJInteger, FJSymbol, FJBoolean and the relevant exceptions. You will also receive the interface FlapjackObject. You must implement the missing portions of these classes (and any auxiliary classes you deem necessary). Look at the code skeletons for the API definitions and for what each method is supposed to do. We will now give an overview of the function of each of the objects within the larger system.

Recall that while we are reading a program with this particular syntax, we only ever need to look at the current character and the next character in order to make the necessary decisions. FJLookaheadReader allows us to do this. A new FJLookaheadReader is created by passing a string containing the entire program to the constructor. You will use this object to go through the string one character at a time, so that you can look at the current and next character at any point in the program.

All pieces of data handled by Flapjack's virtual machine (you will get this later) will implement the interface FlapjackObject. FJInteger, FJSymbol and FJBoolean are the most basic examples of these. We will discuss what is meant by "implement" and "interface" when we deal with inheritance. For now, this aspect of the program is not important.

FJInteger has the responsibility of representing and storing integer values (internally represented by a Java int). It must also contain the code for reading in an integer given an FJLookaheadReader, using the static method readForm. This method is called by the FJReader when it has determined that an integer has been encountered. You must write your own routine for converting the sequence of characters into an integer; you may not use existing library functions such as Integer.parseInt to do this for you.

FJSymbol has the responsibility of representing and storing symbols (internally represented by a Java String). It must also contain the code for reading in a symbol given an FJLookaheadReader, using the static method readForm. This method is called by the FJReader when it has determined that a symbol has been encountered. As with FJInteger.readForm, you must write your own routine to accumulate the characters of the symbol one by one; you may not use existing methods such as String.substring to do this. Note that symbol names are immutable. Once a symbol is created, its name may never be altered.

FJBoolean objects are a special kind of FJSymbol which represent truth values. When the readForm method is called on FJSymbol, the method must determine whether the symbol represents true or false, and then return those FJBooleans instead of regular FJSymbols. There should only ever be two instances of FJBoolean, one representing the value true and one representing the value false. (Hint: google for the Singleton design pattern.) All boolean values are then just references to these two instances. This means that memory isn't wasted making redundant FJBoolean objects, and we can also check whether an object is the value true or false using the == operator, which is very fast. Just note that if an object is != true, it isn't necessarily false; it could be any other object, such as an FJInteger.

FJReader is responsible for taking a String and implementing the logic to read in all the forms (symbols, integers and booleans) from the source, skipping any whitespace and comments between the forms it encounters. It should delegate the reading of integers to the relevant method in FJInteger, and the reading of symbols to the relevant method in FJSymbol.
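One possible way to convert digits to an int without Integer.parseInt, while still detecting out-of-bounds literals, is sketched below. The class IntegerSketch and its parse method are hypothetical: the sketch takes the literal as an already collected String, whereas the real FJInteger.readForm pulls characters from an FJLookaheadReader, and ArithmeticException stands in for IntegerLiteralOutOfBoundsException. The value is accumulated as a negative number because the negative int range is one larger than the positive range, so Integer.MIN_VALUE needs no special case.

```java
/** Hypothetical helper illustrating manual integer conversion with a bounds check. */
class IntegerSketch {
    /** Converts text such as "42" or "-17" to an int, digit by digit. */
    static int parse(String text) {
        int i = 0;
        boolean negative = !text.isEmpty() && text.charAt(0) == '-';
        if (negative) i++;
        if (i == text.length()) throw new IllegalArgumentException("no digits in: " + text);

        // Smallest value the negated running total may reach.
        int limit = negative ? Integer.MIN_VALUE : -Integer.MAX_VALUE;
        int result = 0;                       // holds minus the value read so far
        for (; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < '0' || c > '9') throw new IllegalArgumentException("not a digit: " + c);
            int digit = c - '0';
            // Check BEFORE multiplying: result * 10 - digit must stay >= limit.
            // Java division truncates toward zero, which rounds this negative bound up.
            if (result < (limit + digit) / 10) {
                throw new ArithmeticException("integer literal out of bounds: " + text);
            }
            result = result * 10 - digit;
        }
        return negative ? result : -result;
    }

    public static void main(String[] args) {
        System.out.println(parse("-2147483648"));
        System.out.println(parse("2147483647"));
    }
}
```

Testing the bound before each multiplication matters: once an int has wrapped around, the overflow can no longer be detected reliably from the result.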

Output Specification
When you have completed this part of the project, your code will not yet execute the Flapjack program it is given, so we need another means of testing your code. To do this, you will write a small client with the class name FJTestReader and a main method that accepts a single filename as its only parameter from the console. The program then loads the entire contents of the file into a single string and passes the string to FJReader. You then print out each element, one per line, as you encounter it. If an exception is encountered, you print the name of the exception on its own line, then the message associated with it on its own line, and then simply exit the program. Below is a more detailed description of how to print each language element (every space shown inside a string literal is a single space).

FJSymbol:
    System.out.println("FJSymbol: " + StringOfSymbol + "");

FJBoolean:
    System.out.println("FJBoolean: " + StringOfBoolean + "");

FJInteger:
    System.out.println("FJInteger: " + intValue + " Hash: " + ihash(intValue));

Opening brace ({):
    System.out.println("{");

Closing brace (}):
    System.out.println("}");

Exception:
    System.out.println(ex.getClass().getName());
    System.out.println(ex.getMessage());
    System.exit(1);

The static method ihash is defined as:

    static int ihash(int value) {
        return ((value * 11) ^ 123456);
    }
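As a quick check of the arithmetic, the sketch below wraps ihash and the FJInteger output line in a hypothetical class OutputSketch (the class and the describeInteger helper are illustrative names, not part of the skeleton). For example, 5 * 11 = 55, and 55 ^ 123456 = 123511 because the two numbers share no set bits. Note that value * 11 wraps around silently for large ints, which is normal Java int behaviour.

```java
/** Hypothetical wrapper showing how the FJInteger output line is assembled. */
class OutputSketch {
    // Same definition as in the specification.
    static int ihash(int value) {
        return ((value * 11) ^ 123456);
    }

    /** Builds the line printed for an integer form. */
    static String describeInteger(int intValue) {
        return "FJInteger: " + intValue + " Hash: " + ihash(intValue);
    }

    public static void main(String[] args) {
        System.out.println(describeInteger(5));   // prints "FJInteger: 5 Hash: 123511"
        System.out.println(describeInteger(-3));  // prints "FJInteger: -3 Hash: -123489"
    }
}
```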

Test Files
You will be given a set of Flapjack programs to test with. If you can correctly process and output all the forms in the test programs, you will be on your way, but you will not necessarily get a good mark for this part of the project. You will have to think a bit further and even write your own test cases.

Hand-in Specification
After you have completed this part of the project, you are to compress all the source code into a tar file and submit it on http://webstudies2.sun.ac.za/. You were given clear instructions with the first tutorial on how to compress files using the tar format. If you submit late, your work will not be evaluated and it may cost you the course this semester! Rather submit a bit early.
