
GARBAGE COLLECTION

Huan Keat, Toh


Florida State University
November 2003

ABSTRACT
Since the introduction of memory management techniques in contemporary operating systems,
much progress has been made in application development. As good as these advances may
sound, drawbacks soon followed. Many contemporary operating systems offer the flexibility of
explicitly allocating and de-allocating memory on the heap using user-level functions such as
malloc, calloc and free. Creating objects on the heap is easy; getting rid of them is hard. Explicit
reclamation of heap-allocated objects imposes a serious burden on the programmer and often
leads to memory leaks and dangling references. Here we take a look at a mechanism employed
by most modern programming languages to automate the reclamation of these heap-allocated
objects. We mainly discuss two distinct techniques: reference counting and mark and sweep.
One is not better than the other; both are simply necessary in order for garbage collection to
work.

Introduction
Garbage collection was first introduced in functional languages. Designers of imperative
languages soon realized that they too had to adopt this technique as processes consumed more
and more memory. Clu, Cedar, Ada, Modula-3, and Java are among the languages that implement
garbage collection. It should be noted that the imperative languages mentioned above are all
strongly typed. For garbage collection as described here to work, the programming language
must be strongly typed.

Automatically reclaiming unused or unreferenced heap-allocated objects is a tedious mechanism
to implement, but however cumbersome it may be, its benefits and convenience far outweigh
the implementation hardship. The two techniques that make garbage collection work are
“reference counting” and “mark and sweep”. These two techniques, though entirely different,
must coexist in order for garbage collection to work.

Reference Count
A garbage collector needs a way of knowing whether an object is in use. One way of
determining this is to check whether any references (i.e., pointers) to it exist from outside the
heap. The sample pseudo-code and Figure 1 (a) and (b) illustrate the workings of “reference
counting”.
    class Customer {
        Customer() { /* … */ }

        public static void main(String[] argv) {
            Customer a, b;

            a = new Customer();   // first Customer: count 1
            b = new Customer();   // second Customer: count 1
            a = b;                // first Customer: count 0, second Customer: count 2

            // rest of program
        }
    }

Figure 1 (a): Before a = b executes, each handle refers to its own Customer object, and each object has a count of 1.
Figure 1 (b): After a = b executes, handles a and b both refer to the second Customer object (count 2); the first Customer object's count is 0.

All handles defined in the program, namely “a” and “b”, are references located outside the heap
(on the stack) to objects allocated in the heap. When the reference count of an object reaches
zero, the run-time system must recursively decrement the counts of any objects referred to by
pointers within the object being reclaimed, and reclaim those objects too if their counts reach
zero (i.e., pointers to objects within objects). Figure 2 (a), (b) and (c) illustrate how the reference
count technique decrements the counts of objects within objects.
    // Empty placeholder classes, added only so that the sample compiles.
    class Address { }
    class Account { }

    class Customer {
        Address add;
        Account ac;

        Customer() {
            add = new Address();
            ac = new Account();
        }

        public static void main(String[] argv) {
            Customer a, b;

            a = new Customer();
            b = new Customer();
            a = b;
        }
    }

Figure 2 (a): The Customer object (count 1) points through add and ac to an Address object (count 1) and an Account object (count 1).
Figure 2 (b): The Customer object's count drops to 0; the Address and Account objects still have a count of 1.
Figure 2 (c): The counts of the Address and Account objects are decremented to 0 as well, so all three objects have a count of 0.

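
The recursive decrement shown in Figure 2 can be sketched in Java as a small simulation. This is only an illustration under stated assumptions: Java manages memory itself, so the Obj class, its count field, and the retain and release helpers are hypothetical names that model what a reference-counting run-time system might do, not an actual collector.

```java
import java.util.ArrayList;
import java.util.List;

class RefCountSketch {
    // Hypothetical heap object with a counter field and the pointers it holds.
    static final class Obj {
        final String name;
        int count = 0;                               // the per-object counter field
        final List<Obj> fields = new ArrayList<>();  // pointers stored inside this object
        boolean reclaimed = false;
        Obj(String name) { this.name = name; }
    }

    // Record one more reference to obj (from a handle or from another object).
    static Obj retain(Obj obj) {
        if (obj != null) obj.count++;
        return obj;
    }

    // Drop one reference; at zero, reclaim obj and recursively release its fields.
    static void release(Obj obj) {
        if (obj == null) return;
        obj.count--;
        if (obj.count == 0) {
            obj.reclaimed = true;
            for (Obj child : obj.fields) release(child);
        }
    }

    // Model a Customer whose constructor allocates an Address and an Account.
    static Obj newCustomer(String name) {
        Obj c = new Obj(name);
        c.fields.add(retain(new Obj(name + ".add")));
        c.fields.add(retain(new Obj(name + ".ac")));
        return c;
    }

    public static void main(String[] args) {
        Obj a = retain(newCustomer("first"));
        Obj b = retain(newCustomer("second"));
        Obj old = a;
        a = retain(b);   // a = b: the second Customer gains a reference
        release(old);    // the first Customer loses its only reference
        System.out.println(old.name + " reclaimed: " + old.reclaimed);  // first reclaimed: true
        System.out.println(b.name + " count: " + b.count);              // second count: 2
    }
}
```

In this sketch the assignment a = b is split into an explicit retain and release, which is roughly the bookkeeping a compiler would have to emit around every pointer assignment.
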
As mentioned earlier, garbage collection only works in strongly typed languages. Reference
counting uses the type descriptors generated by the language compiler to keep track of every
pointer used in the program, with one type descriptor for every distinct type. On most systems,
a type descriptor is just a table that lists the offsets within the type at which pointers can be
found. The other important element is the counter field in every heap-allocated object.
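
To make the idea of a type descriptor more concrete, here is a minimal sketch in Java. It assumes a simplified model in which a heap block is an array of slots and the descriptor lists the slot indices that hold pointers; TypeDescriptor, Block and pointersIn are hypothetical names chosen for this illustration, not part of any real run-time system.

```java
import java.util.ArrayList;
import java.util.List;

class TypeDescriptorSketch {
    // Hypothetical descriptor: one per distinct type, listing the offsets that hold pointers.
    static final class TypeDescriptor {
        final String typeName;
        final int[] pointerOffsets;
        TypeDescriptor(String typeName, int... pointerOffsets) {
            this.typeName = typeName;
            this.pointerOffsets = pointerOffsets;
        }
    }

    // Hypothetical heap block: its descriptor, its counter field, and its raw slots.
    static final class Block {
        final TypeDescriptor type;
        int count;              // the counter field mentioned in the text
        final Object[] slots;
        Block(TypeDescriptor type, int slotCount) {
            this.type = type;
            this.slots = new Object[slotCount];
        }
    }

    // Use the descriptor to find the blocks that the given block points to.
    static List<Block> pointersIn(Block b) {
        List<Block> found = new ArrayList<>();
        for (int offset : b.type.pointerOffsets) {
            Object slot = b.slots[offset];
            if (slot != null) found.add((Block) slot);
        }
        return found;
    }

    public static void main(String[] args) {
        TypeDescriptor address = new TypeDescriptor("Address");          // no pointer fields
        TypeDescriptor customer = new TypeDescriptor("Customer", 1, 2);  // pointers at offsets 1 and 2
        Block c = new Block(customer, 3);
        c.slots[0] = 42;                     // a non-pointer field, ignored by the collector
        c.slots[1] = new Block(address, 1);  // pointer field at offset 1; offset 2 stays null
        System.out.println(pointersIn(c).size());  // 1
    }
}
```

The collector never has to guess which slots are pointers: it only consults the table, which is why the approach depends on the compiler knowing every type.
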

As promising as this method may sound, it has some major drawbacks. First, the counter field
adds memory overhead to every heap-allocated object. Second, the cost of updating reference
counts can be significant when a program contains a large number of pointers, and the updates
become recursive when objects hold pointers to other objects. Lastly, and most important of
all, this technique may fail to collect circular structures. Figure 3 illustrates this shortcoming of
the reference count technique.

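Figure 3: Circular Structure
- Useful objects: objects to which references exist.
- Useless objects: objects that cannot be reached by following a chain of valid pointers starting from outside the heap.
Before: the stack handle stooges refers to “larry” (count 2), which is linked in a circular structure with “moe” (count 1) and “curly” (count 1).
After stooges := nil: “larry”, “moe” and “curly” each still have a count of 1, so none of them is reclaimed.
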
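
The situation in Figure 3 can be reproduced with a small Java simulation. As before, this is a sketch under assumptions: Node, retain and release are hypothetical helpers that model a reference-counting run-time system, and the order of the links in the ring (larry, moe, curly) is assumed for the example.

```java
class CycleLeakSketch {
    // Hypothetical heap object with a counter field and a single pointer field.
    static final class Node {
        final String name;
        int count = 0;
        Node next;
        boolean reclaimed = false;
        Node(String name) { this.name = name; }
    }

    static Node retain(Node n) {
        if (n != null) n.count++;
        return n;
    }

    // Drop one reference; reclaim and recurse only when the count reaches zero.
    static void release(Node n) {
        if (n == null) return;
        n.count--;
        if (n.count == 0) {
            n.reclaimed = true;
            release(n.next);
        }
    }

    // Build the circular structure: larry -> moe -> curly -> larry.
    static Node[] buildRing() {
        Node larry = new Node("larry");
        Node moe = new Node("moe");
        Node curly = new Node("curly");
        larry.next = retain(moe);
        moe.next = retain(curly);
        curly.next = retain(larry);
        return new Node[] { larry, moe, curly };
    }

    public static void main(String[] args) {
        Node[] ring = buildRing();
        Node stooges = retain(ring[0]);  // larry now has count 2
        release(stooges);                // stooges := nil, larry drops to count 1
        stooges = null;
        for (Node n : ring) {
            // Every node keeps count 1, so nothing is reclaimed.
            System.out.println(n.name + " count=" + n.count + " reclaimed=" + n.reclaimed);
        }
    }
}
```

No handle outside the heap can reach the ring any more, yet every count stays at 1 because each object is still referenced by its neighbour.
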
Mark and Sweep
This technique was devised to overcome the shortcomings of the first technique, reference
counting. Unlike reference counting, however, it is executed only when the available memory
falls below a minimum threshold. The technique is a three-step process:

1. The collector walks through the heap, marking every single block as “useless”, regardless of
whether it is allocated for objects or not.
2. Beginning with all pointers outside the heap (the pointers on the stack), the collector
recursively traverses all linked data structures, marking each newly discovered block as
“useful”. (Here traversal is done using a stack whose size is proportional to the longest
chain through the heap.)
3. The collector walks through the heap again, moving every block that is still marked “useless”
to the free list.
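
The three steps above can be sketched in Java as follows. This is a simplified model rather than a real collector: Block, collect and the sample heap are hypothetical, the heap is an ordinary list of blocks, and the recursion of step 2 is replaced by an explicit worklist stack.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

class MarkSweepSketch {
    // Hypothetical heap block: a mark flag plus the pointers it contains.
    static final class Block {
        final String name;
        boolean useful;
        final List<Block> pointers = new ArrayList<>();
        Block(String name) { this.name = name; }
        void pointTo(Block... targets) { pointers.addAll(Arrays.asList(targets)); }
    }

    // Runs the three steps and returns the blocks moved to the free list.
    static List<Block> collect(List<Block> heap, List<Block> roots) {
        // Step 1: mark every block as "useless".
        for (Block b : heap) b.useful = false;

        // Step 2: starting from the pointers outside the heap, mark reachable blocks "useful".
        Deque<Block> stack = new ArrayDeque<>();
        for (Block r : roots) {
            if (!r.useful) { r.useful = true; stack.push(r); }
        }
        while (!stack.isEmpty()) {
            Block b = stack.pop();
            for (Block p : b.pointers) {
                if (!p.useful) { p.useful = true; stack.push(p); }
            }
        }

        // Step 3: move every block still marked "useless" to the free list.
        List<Block> freeList = new ArrayList<>();
        for (Block b : heap) {
            if (!b.useful) freeList.add(b);
        }
        heap.removeAll(freeList);
        return freeList;
    }

    public static void main(String[] args) {
        Block r = new Block("R"), x = new Block("X"), y = new Block("Y");
        Block larry = new Block("larry"), moe = new Block("moe"), curly = new Block("curly");
        r.pointTo(x, y);
        larry.pointTo(moe);
        moe.pointTo(curly);
        curly.pointTo(larry);   // an unreachable circular structure

        List<Block> heap = new ArrayList<>(Arrays.asList(r, x, y, larry, moe, curly));
        for (Block b : collect(heap, Arrays.asList(r))) {
            System.out.println("freed " + b.name);   // larry, moe and curly
        }
    }
}
```

Because usefulness is decided by reachability rather than by counts, the circular structure that defeats reference counting is reclaimed here.
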

Figure: Step 2 – Traversal Illustrated. A pointer s outside the heap (on the stack) refers to block R; the heap contains blocks R, X, Y, Z and W. The traversal stack, operated with Push() and Pop(), is shown holding W, Y and R.

Marking again relies on the type descriptor generated by the compiler. Type descriptors must be
word-aligned on most machines, so the two low-order bits of a descriptor's address are
guaranteed to be zero. We can use these bits to store the “free” and “useful” flags, provided we
mask them out before using the address.
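
The bit manipulation described above might look like the following Java sketch. Java does not expose raw addresses, so a long stands in for the word that holds the descriptor address; the constant names, the helper methods and the sample address 0x1000 are all assumptions made for the illustration.

```java
class TagBitsSketch {
    // The two low-order bits of a word-aligned address are free to hold flags.
    static final long FREE_BIT = 0b01;
    static final long USEFUL_BIT = 0b10;
    static final long FLAG_MASK = FREE_BIT | USEFUL_BIT;

    static long setUseful(long word)     { return word | USEFUL_BIT; }
    static long clearUseful(long word)   { return word & ~USEFUL_BIT; }
    static boolean isUseful(long word)   { return (word & USEFUL_BIT) != 0; }
    static boolean isFree(long word)     { return (word & FREE_BIT) != 0; }

    // Mask the flags out before the word is used as an address again.
    static long address(long word)       { return word & ~FLAG_MASK; }

    public static void main(String[] args) {
        long descriptorAddress = 0x1000;   // hypothetical word-aligned descriptor address
        long word = setUseful(descriptorAddress);
        System.out.println(Long.toHexString(word));           // 1002
        System.out.println(Long.toHexString(address(word)));  // 1000
        System.out.println(isUseful(word) + " " + isFree(word));  // true false
    }
}
```

Storing the flags this way costs no extra space per block, at the price of one masking operation every time the descriptor address is read.
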

Again, as with every new technique, there are some shortcomings, and this is especially true of
mark and sweep. The most apparent issue is the use of a stack for heap traversal. We run the
garbage collector to free up memory precisely because we are running low on it, yet the
traversal needs a stack whose size is proportional to the longest chain through the heap. It
makes little sense to depend on such stack space, because there may be none available: in
contemporary memory management systems, the heap and the stack grow toward each other.

Figure 4: The stack and the heap growing toward each other.

To overcome this major setback, a technique that minimizes the use of stack space had to be
devised. The “pointer-reversal technique” introduced by H. Schorr and W. M. Waite came to the
rescue. It traverses the linked data structures without using additional stack space: as the
collector follows a pointer into a block, it reverses that pointer to refer back to the previous
block, and restores it on the way back. This gives us the “illusion of a stack”. Figure 5 illustrates
the “pointer-reversal technique”.

Figure 5: Technique 3 – Pointer-reversal. Two snapshots of the heap blocks R, X, Y, Z and W during traversal, with the previous block and the current block indicated.
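
A minimal sketch of pointer reversal in Java, under simplifying assumptions: every block has exactly two pointer fields (left and right), and each block carries a small state field recording how many of its fields have been explored. The names Node, state and mark are hypothetical, and the sketch follows the general idea of the technique rather than any particular published formulation.

```java
class PointerReversalSketch {
    // Hypothetical block with two pointer fields, a mark flag and a traversal state.
    static final class Node {
        final String name;
        Node left, right;
        boolean useful;   // the mark flag
        int state;        // 0 = nothing explored, 1 = left explored, 2 = both explored
        Node(String name) { this.name = name; }
    }

    // Marks everything reachable from root using only two extra pointers (prev and cur).
    // Assumes state is 0 in every unmarked block, e.g. reset during step 1 of mark and sweep.
    static void mark(Node root) {
        if (root == null || root.useful) return;
        Node prev = null;   // previous block
        Node cur = root;    // current block
        cur.useful = true;
        while (true) {
            if (cur.state < 2) {
                boolean goLeft = (cur.state == 0);
                cur.state++;
                Node child = goLeft ? cur.left : cur.right;
                if (child != null && !child.useful) {
                    // Descend: reverse the followed pointer so it remembers the way back.
                    if (goLeft) cur.left = prev; else cur.right = prev;
                    prev = cur;
                    cur = child;
                    cur.useful = true;
                }
            } else {
                // Both fields explored: retreat and restore the reversed pointer.
                if (prev == null) return;   // back at the root, traversal finished
                Node parent = prev;
                if (parent.state == 1) {    // we came down through parent.left
                    prev = parent.left;
                    parent.left = cur;
                } else {                    // we came down through parent.right
                    prev = parent.right;
                    parent.right = cur;
                }
                cur = parent;
            }
        }
    }

    public static void main(String[] args) {
        Node r = new Node("R"), x = new Node("X"), y = new Node("Y");
        Node z = new Node("Z"), w = new Node("W"), u = new Node("U");
        r.left = x; r.right = y;
        x.left = z;
        y.left = w; y.right = r;   // a pointer back to R forms a cycle
        mark(r);
        for (Node n : new Node[] { r, x, y, z, w, u }) {
            System.out.println(n.name + " useful=" + n.useful);   // only U stays false
        }
    }
}
```

The path back to the root is stored in the blocks themselves, so the traversal needs no stack; the cost is the extra state per block and the temporary modification of the heap while the collector runs.
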
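Figure 6 illustrates the three-step mark and sweep process.

Figure 6: Mark And Sweep – 3 Stage Process. Three snapshots of the heap blocks R, X, Y, Z and W: (1) the collector walks through the heap; (2) heap exploration via pointer traversal; (3) the collector walks through the heap again and moves blocks to the free list. Blocks are distinguished as useful objects or useless objects.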

Conclusion
Garbage collection is not a new concept in the field of programming languages. Designers of
many imperative languages have come to realize that garbage collection is indeed a useful
feature, one that saves much hassle in application development. Although only two techniques
(“reference counting” and “mark and sweep”) are discussed in this paper, many other
approaches exist. Reference counting is carried out continuously while the application runs,
whereas mark and sweep is carried out only when the available memory falls below a certain
pre-defined threshold.

In the near future, we will most certainly see much more efficient techniques for automatically
reclaiming heap-allocated objects. Of course, when 64-bit computing reaches the industry, we
will also see higher-capacity hardware, especially computer memory. This leaves us with a
question: will a garbage collection mechanism for application software still be significant?

References
Michael L. Scott, Programming Language Pragmatics (Section 7.7.3), Morgan Kaufmann.
ISBN: 1-55860-442-1.
