The runtime stack, and Scheme

Today we’re going to be talking about the runtime stack, and then moving on to Scheme, with a new assignment soon to be assigned soon.

Runtime Stack

Also known as just “the stack.”

Procedures can call each other, and even call themselves. When a procedure returns, you have to continue execution in the calling procedure just after the procedure call. To implement this, you use a stack data structure.

This stack stores all needed information about all currently active procedure calls, such as parameters, local variables, and so on.

The data structure that’s stored on the stack, corresponding to a single procedure call, is called a stack frame. (Also historically called an “activation record”.)

For example, if you have functions f, g, and h, and they call each other in this order:

f → g → g → g → h → f

The stack would look like this, with each of the table cells representing stack frames:

*
f
h
g
g
g
f

Where * is the top of the stack. New stack frames get added here, and stack frames are removed from this end when functions return.

We often draw the stack as growing up (and will do so in class), but in practice, the stack often grows down in memory.

Why? Because the memory layout of a program’s address space is usually something like this:

Section Description
STACK The stack. Grows down, so new entries appear at the bottom.
 
The idea is that you maximize the empty space available for both the stack and the heap by making them both grow into this empty space.
 
DATA Contains: constants, and global variables, and the heap where malloced or newed memory is allocated. The heap typically grows upwards when needed.
CODE Area where your executable code is stored.

What does a stack frame consist of?

Something like this:

Section Description
Local Variables Section of memory where the local variables for the called procedure are stored.
Dynamic Link Pointer to the calling procedure’s stack frame. More on this later.
Return address Address of the code in the calling procedure that you have to jump back to when the called procedure returns.
Static Link More on this later.
Parameters Parameters for the called function.
_______ __________________________________
... Calling procedure’s stack frame

This is managed mainly by assembly code’s call instruction, which puts the current code address onto the stack and then jumps to the code for the function you’re calling. There’s then a ret (return) instruction, which pops the return address from the stack, and jumps to that address.

There are two dedicated registers that every machine has: the stack pointer and the frame pointer. The stack pointer always points to the top of the stack. The frame pointer always points to a specific point in the stack frame, usually the dynamic link. Why? Because local variables and parameters are accessed as offsets from the frame pointer (fp ± some value).

And this demystifies the dynamic link for us: it’s just the frame pointer from the calling procedure. The frame pointer gets pushed onto the stack during the call instruction, and it’s restored afterwards.

But what happens if a function accesses some non-local variable?

Global variables are easy, in that they are sitting in some fixed place in memory. We know where they are.

But what if you’re accessing an outer variable in a nested function? For example:

procedure f() is
x : Integer := 0;
begin
    def g():
    begin
        x := x + 1;
    end g;
end f;

That’s where you use the static link. The static link points to the stack frame from the surrounding procedure. In our example, g’s static link would point to f. This isn’t needed in languages like C where you can’t nest procedures, but it is needed in languages like Ada, or any language that has nested procedures and static scoping. See the course notes for a useful example.

What if you take a procedure as a parameter, and the parameter-procedure accesses variables from its parent procedures? For example:

procedure A() is
    x : Integer;
    
    procedure B(procedure D)
        x : Integer;
    begin
        D();
    end B;
    
    procedure C()
    begin
        x := 1;
    end C;
    
begin
    B(C);
end A;

The function B has to know not just the location of the code for C, but also its static link. So the value that has to get passed in is a pair of pointers: one for C’s code, and one for its static link, which is A’s stack frame. This is called a closure.

And now, on to Scheme.

Scheme

Huzzah for functional languages! The next two languages we’re going to talk about, Scheme and ML, are functional. These are languages where:

  • Programs are composed of function definitions,

  • Variables denote non-modifiable values (like in math), not memory locations, and

  • Functions are first-class values, so they can be treated like any other kind of data: passed as a parameter, returned from other functions, put into a data structure, whatever.

Because variables can’t be changed, loops make no sense, and so don’t exist. If you can’t change a value, it makes no sense to do something multiple times. Repetition is instead accomplished through recursion.

A brief diversion on recursion

Remember: think about recursion the way you think about inductive proofs. Show what it’ll do in the base case (with the “smallest” possible input). Then assume it works right for input of “size” k, and compute the result for “size” k+1.

Remember the factorial function in ML:

fun fac 0 = 1
 |  fac n = n * fac(n-1)

First line is the base case, second is the recursive case, which assumes the recursive call works.

Back to Scheme

Scheme is a dialect of a language called Lisp, which is one of the earliest programming languages, developed mainly by AI researchers in the late 1950s. At the time, the main programming language in use was Fortran, which is very good for processing data but doesn’t have very flexible data structures or anything.

Scheme is a small elegant dialect of Lisp, developed at MIT in 1975. It’s statically scoped, as opposed to classic Lisp which was dynamically scoped by accident.

The implementation we’re going to be using is Racket. There’s a nice IDE called DrRacket, or you can use it in the shell or in Emacs. More on this tomorrow in recitation. More specifically, we’re using Racket R5RS.

Prof uses Emacs, and will be saving his demo for future perusal. Selected notes are here.

Scheme, like Lisp, is dynamically typed, so all type checking is done at runtime, and types are not declared. The programmer has to make sure they never perform any operations on variables of the wrong type.

Scheme has a few fundamental types built in:

23 ; Integers
4.623 ; Floating-point numbers
"I am a string" ; Strings
'hello ; A symbol: data type whose only attribute is its name
'Jonathan ; Another symbol

'(1 2 3 4 5) ; Primary aggregate data structure is a list
'(1 2 (3 4 (5) 6 7)) ; And they can be nested

We have 15 minutes left in class, and he’s going to give us the rest of the syntax of the language in that time. It’s possible because the syntax is trivial.

Scheme syntax

In scheme, you have two types of expressions:

  1. Atomic expressions: These are things like literals or symbols.

  2. Non-atomic expressions: All of these start with an open parenthesis (, a series of expressions, and a close parenthesis ). E.g. (exp1 exp2 exp3 ... expN). In this case, exp1 should be a function or a keyword.

So to call functions, you put the function name first, and then the arguments. Ex. to add two numbers, or to do a bit of math:

(+ 3 4) ; + is a function
(+ 3 (* 5 6))

Keywords include things like conditionals, using the if keyword:

(if (= 3 4) 'yes 'no) ; First "argument" to if is
                      ; condition, second is "true"
                      ; branch, and third is "false"
                      ; branch.

Defining a variable:

(define x 20)
(+ x 29) ; returns 49

x will now forever be 20. You never get to change it.

There’s a special version of define for defining functions, by using a non-atomic expression as the first “argument” to define:

(define (fac n) (if (= n 0) 1 (* n (fac (n-1)))))

Protip: you need an editor which can show you whether your parens are balanced.

Here’s another way you can do conditionals, which is more convenient than doing a whole bunch of nested ifs:

(cond ((= x 10) 'ten)
      ((= x 20) 'twenty)
      (else 'thirty))

This will execute the branch of the first true expression.

5 minutes left, let’s talk more about lists.

Lists

There’s a function called list that takes any number of parameters and puts them into a list.

(list 2 4 (* 5 6) (+ 7 8)) ; Returns '(2 4 30 15)

There’s another function called cons, which takes a value and a list, and returns a new list with the value at the front.

(cons 1 (list 3 5 7 9)) ; Returns '(1 3 5 7 9)

Note the quote character ' there and above: it tells the interpreter not to evaluate the expression, but instead treat it as data. If you just type (1 2 3) it tries to evaluate 1 as a keyword or a function and errors out. But typing '(1 2 3) just gives you a list containing 1, 2, and 3.

More in the next class.

Like these posts? Want more? Subscribe by RSS or email: