By now, you should have a pretty good feel for how the serialization mechanism works for individual classes. The next step in explaining serialization is to discuss the actual serialization algorithm in a little more detail. This discussion won't handle all the details of serialization.[5] Instead, the idea is to cover the algorithm and protocol, so you can understand how the various hooks for customizing serialization work and how they fit into the context of an RMI application.
After writing out the associated class information, the serialization mechanism writes out the following information for each instance:
A description of the most-derived class.
Data associated with the instance, interpreted as an instance of the least-derived class.
Data associated with the instance, interpreted as an instance of the second least-derived class.
And so on, until:
Data associated with the instance, interpreted as an instance of the most-derived class.
So what really happens is that the type of the instance is written out, and then all the serializable state is stored in discrete chunks that correspond to the class structure. But there's a question still remaining: what do we mean by "a description of the most-derived class"? This is either a reference to a class description that has already been recorded (e.g., an earlier location in the stream) or the following information:
The version ID of the class, an integer used to validate the .class files
A boolean stating whether writeObject( )/readObject( ) are implemented
The number of serializable fields
A description of each field (its name and type)
Extra data produced by ObjectOutputStream's annotateClass( ) method
A description of its superclass, if the superclass is serializable
This should, of course, immediately seem familiar. The class descriptions consist entirely of metadata that allows the instance to be read back in. In fact, this is one of the most beautiful aspects of serialization; the serialization mechanism automatically, at runtime, converts class objects into metadata so instances can be serialized with the least amount of programmer work.
If writeObject( ) is passed an instance that has already been written to the stream, the handle is written to the stream, and no further operations are necessary. If, however, writeObject( ) is passed an instance that has not yet been written to the stream, two things happen. First, the instance is assigned a reference handle, and the mapping from instance to reference handle is stored by ObjectOutputStream. The handle that is assigned is the next integer in a sequence.

TIP: Remember the reset( ) method on ObjectOutputStream? It clears the mapping and resets the handle counter to 0x7E0000. RMI also automatically resets its serialization mechanism after every remote method call.

Second, the instance data is written out as per the data format described earlier. This can involve some complications if the instance has a field whose value is also a serializable instance. In this case, the serialization of the first instance is suspended, and the second instance is serialized in its place (or, if the second instance has already been serialized, the reference handle for the second instance is written out). After the second instance is fully serialized, serialization of the first instance resumes. The contents of the stream look a little bit like Figure 10-5.

Figure 10-5. Contents of serialization's data stream
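The reference-handle behavior is easy to see in a few lines of code. The sketch below (class names are illustrative, not from the text) serializes the same instance twice into one stream; on reading, both readObject( ) calls yield the same object, because the second write was only a handle:

```java
import java.io.*;

// Illustrative serializable class (not from the text).
class Point implements Serializable {
    int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }
}

public class HandleDemo {
    // Serialize the same instance twice, read it back twice,
    // and report whether a single shared instance comes back.
    public static boolean sharedAfterRoundTrip() {
        try {
            Point p = new Point(1, 2);
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(bytes);
            out.writeObject(p);  // full class description + instance data
            out.writeObject(p);  // only a reference handle is written
            out.close();

            ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()));
            Point a = (Point) in.readObject();
            Point b = (Point) in.readObject();
            return a == b;  // the handle maps both reads to one instance
        } catch (Exception e) {
            return false;
        }
    }
}
```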
Reading

From the description of writing, it's pretty easy to guess most of what happens when readObject( ) is called. Unfortunately, because of versioning issues, the implementation of readObject( ) is actually a little more complex than you might guess. When it reads in an instance description, ObjectInputStream gets the following information:
Descriptions of all the classes involved
The serialization data from the instance
The problem is that the class descriptions that the instance of ObjectInputStream reads from the stream may not be equivalent to the class descriptions of the same classes in the local JVM. For example, if an instance is serialized to a file and then read back in three years later, there's a pretty good chance that the class definitions used to serialize the instance have changed. This means that ObjectInputStream uses the class descriptions in two ways:
It uses them to actually pull data from the stream, since the class descriptions completely describe the contents of the stream.
It compares the class descriptions to the classes it has locally and tries to determine whether the classes have changed, in which case it throws an exception. If the class descriptions match the local classes, it creates the instance and sets the instance's state appropriately.
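The "version ID" carried in the stream is the class's serialVersionUID, and it can be pinned explicitly so that compatible edits to a class don't cause this comparison to fail. A minimal sketch (the class names here are assumed, not from the text):

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

// Illustrative class: pinning the version ID keeps compatible changes
// (e.g. adding a method) from invalidating previously written streams.
class UserRecord implements Serializable {
    private static final long serialVersionUID = 1L;
    String name;
    int age;
}

public class SerialVersionDemo {
    // The class descriptor that ObjectInputStream compares against
    // carries exactly this ID.
    public static long storedVersion() {
        return ObjectStreamClass.lookup(UserRecord.class).getSerialVersionUID();
    }
}
```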
Key structures implemented as part of the Java Collections API are the various types of maps, in particular the hash map (via the HashMap class and other related classes).

ConcurrentHashMap

Maps allow you to associate keys with values, and they crop up in all sorts of uses such as:
Caches: for example, after reading the contents of a given file or database table, we could associate the file name with its contents (or the database key with a representation of the row data) in a HashMap;
Dictionaries: for example, we could associate locale abbreviations with a language name;
Sparse arrays: by mapping integers to values, we in effect create an array that does not waste space on blank elements.
Frequently-accessed hash maps can be important in server applications for caching purposes, and as such they can receive a good deal of concurrent access. Before Java 5, the standard HashMap implementation had the weakness that accessing the map concurrently meant synchronizing on the entire map on each access. This means that, for example, a frequently used cache implemented as a hash map can encounter high contention: multiple threads attempting to access the map at the same time frequently have to block waiting for one another.
Inside a hash map, it is often possible to lock only the portion of the map that is being accessed. This optimisation is generally called lock striping. Java 5 brings a hash map optimised in this way in the form of ConcurrentHashMap. A combination of lock striping plus judicious use of volatile variables gives the class two highly concurrent properties:
Writing to a ConcurrentHashMap locks only a portion of the map;
Reads can generally occur without locking.
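The synchronized-map example discussed next isn't reproduced in this excerpt; a minimal sketch of what it presumably looks like (class and method names assumed from the surrounding text) is:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Assumed reconstruction of the pre-ConcurrentHashMap example:
// a HashMap made thread-safe with a synchronization wrapper.
public class QueryCounter {
    private final Map<String, Integer> queryCounts =
            Collections.synchronizedMap(new HashMap<String, Integer>());

    // Safe (the map is never corrupted) but not atomic: two threads can
    // read the same old value and both write old+1, "missing" a count.
    public void incrementCount(String q) {
        Integer oldVal = queryCounts.get(q);
        queryCounts.put(q, (oldVal == null) ? 1 : oldVal + 1);
    }

    public int getCount(String q) {
        Integer v = queryCounts.get(q);
        return (v == null) ? 0 : v;
    }
}
```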
In this example, we're using a plain old HashMap wrapped up in a synchronization wrapper. Recall that wrapping the map with Collections.synchronizedMap(...) makes it safe to access the map concurrently: each call to get(), put(), size(), containsKey(), etc. will synchronize on the map during the call. (One problem that we'll see in a minute is that iterating over the map still requires explicit synchronization.)

Note that this doesn't make incrementCount() atomic, but it does make it safe. That is, concurrent calls to incrementCount() will never leave the map in a corrupted state, but they might 'miss a count' from time to time. For example, two threads could concurrently read a current value of, say, 2 for a particular query, both independently increment it to 3, and both set it to 3, when in fact two queries have been made. Generally, in the context of counting queries, we'd probably live with this: it's quite unlikely that two clients are making the selfsame query at exactly the same time, and even if they were, we wouldn't really care about missing the odd count here and there in order to improve performance.

In this case, we can improve concurrency in a single line by replacing our synchronized hash map with a ConcurrentHashMap:
private Map<String,Integer> queryCounts = new ConcurrentHashMap<String,Integer>(1000);
Note that our incrementCount() will still have the same semantics: that is, it will never leave the map in an inconsistent state, but it could still miss a count in an unlucky case.
In our case, the interesting methods are the replace() methods, which are effectively compare-and-set operations for a map. So we can implement our incrementCount() method as follows. Note that we do now need to change the signature of our queryCounts map and declare it as a ConcurrentMap:
public final class MyServlet extends MyAbstractServlet {
    private ConcurrentMap<String,Integer> queryCounts =
        new ConcurrentHashMap<String,Integer>(1000);

    private void incrementCount(String q) {
        Integer oldVal, newVal;
        do {
            oldVal = queryCounts.get(q);
            newVal = (oldVal == null) ? 1 : (oldVal + 1);
            // replace() cannot take a null expected value, so the first
            // count for a key must go through putIfAbsent() instead.
        } while (oldVal == null
                ? queryCounts.putIfAbsent(q, newVal) != null
                : !queryCounts.replace(q, oldVal, newVal));
    }
}
This code is very similar to the code to update an AtomicInteger: we read the current value of the count, calculate the new count, and then say to the ConcurrentHashMap: "please map this key to this new value, if and only if the previously mapped value was this". If the call returns false to say that we were wrong about the previously mapped value, indicating in effect that another thread has "snuck in", then we simply loop round and try again. As with AtomicInteger updates, this is very efficient because we rarely expect another thread to sneak in, and where it does, we can keep hold of the CPU rather than having to sleep while the other thread releases the lock.
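As an aside not in the original text: from Java 8 onwards, ConcurrentHashMap's merge( ) method performs this read-modify-write atomically for you, so the hand-written loop can be dropped entirely (a sketch with assumed names):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Java 8+ alternative: merge() retries the compare-and-set internally.
public class MergeCounter {
    private final ConcurrentMap<String, Integer> queryCounts =
            new ConcurrentHashMap<String, Integer>();

    public void incrementCount(String q) {
        // Atomically maps q to 1 if absent, otherwise to oldValue + 1.
        queryCounts.merge(q, 1, Integer::sum);
    }

    public int getCount(String q) {
        return queryCounts.getOrDefault(q, 0);
    }
}
```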
Iterating over a ConcurrentHashMap is guaranteed to see the map in a "safe state", reflecting at the very least the state of the map at the time iteration began. This is both good news and bad news:
Good news: it is perfect for cases where we want iteration not to affect concurrency, at the expense of possibly missing an update made while iterating (e.g. in our imaginary web server, iterating in order to persist the current query counts to a database: we probably wouldn't care about missing the odd count);
Bad news: because there's no way to completely lock a ConcurrentHashMap, there's no easy option for taking a "snapshot" of the map as a truly atomic operation.
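The weakly consistent behavior is easy to demonstrate: modifying a ConcurrentHashMap mid-iteration never throws ConcurrentModificationException, though the iterator may or may not see the new entry (names here are illustrative):

```java
import java.util.concurrent.ConcurrentHashMap;

public class IterationDemo {
    // Mutates the map while iterating over it; with a plain HashMap this
    // would throw ConcurrentModificationException, here it never does.
    public static boolean safeConcurrentIteration() {
        ConcurrentHashMap<String, Integer> m =
                new ConcurrentHashMap<String, Integer>();
        m.put("a", 1);
        m.put("b", 2);
        for (String key : m.keySet()) {
            m.put("c", 3);  // the iterator may or may not visit "c"
        }
        return m.containsKey("c");
    }
}
```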
Garbage collection
Reference counting is a form of garbage collection whereby each object has a count of the number of references to it. Garbage is identified by having a reference count of zero. An object's reference count is incremented when a reference to it is created, and decremented when a reference is destroyed. The object's memory is reclaimed when the count reaches zero. Compared to tracing garbage collection, reference counting guarantees that objects are destroyed as soon as they become unreachable (assuming that there are no reference cycles), and usually only accesses memory that is either in CPU caches, in objects to be freed, or directly pointed to by those, and thus tends not to have significant negative side effects on CPU cache and virtual memory operation. There are some disadvantages to reference counting:
If two or more objects refer to each other, they can create a cycle whereby neither will be collected, as their mutual references never let their reference counts become zero. Some garbage collection systems using reference counting (like the one in CPython) use specific cycle-detecting algorithms to deal with this issue.[9] Another strategy is to use weak references for the "backpointers" which create cycles. Under reference counting, a weak reference is similar to a weak reference under a tracing garbage collector: it is a special reference object whose existence does not increment the reference count of the referent object. Furthermore, a weak reference is safe in that when the referent object becomes garbage, any weak reference to it lapses rather than being permitted to remain dangling, meaning that it turns into a predictable value, such as a null reference.

In naive implementations, each assignment of a reference and each reference falling out of scope often require modifications of one or more reference counters. However, in the common case when a reference is copied from an outer-scope variable into an inner-scope variable, such that the lifetime of the inner variable is bounded by the lifetime of the outer one, the reference incrementing can be eliminated: the outer variable "owns" the reference. In the programming language C++, this technique is readily implemented and demonstrated with the use of const references. Reference counting in C++ is usually implemented using "smart pointers" whose constructors, destructors, and assignment operators manage the references. A smart pointer can be passed by reference to a function, which avoids the need to copy-construct a new reference (which would increase the reference count on entry into the function and decrease it on exit); instead the function receives a reference to the smart pointer, which is produced inexpensively.
When used in a multithreaded environment, these modifications (increment and decrement) may need to be atomic operations, such as compare-and-swap, at least for any objects which are shared, or potentially shared, among multiple threads. Atomic operations are expensive on a multiprocessor, and even more expensive if they have to be emulated with software algorithms.

It is possible to avoid this issue by adding per-thread or per-CPU reference counts and only accessing the global reference count when the local reference counts become, or are no longer, zero (or, alternatively, using a binary tree of reference counts, or even giving up deterministic destruction in exchange for not having a global reference count at all), but this adds significant memory overhead and thus tends to be useful only in special cases (it is used, for example, in the reference counting of Linux kernel modules).

Naive implementations of reference counting do not in general provide real-time behavior, because any pointer assignment can potentially cause a number of objects bounded only by total allocated memory size to be recursively freed while the thread is unable to perform other work. It is possible to avoid this issue by delegating the freeing of objects whose reference count dropped to zero to other threads, at the cost of extra overhead.
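The atomic increment/decrement scheme described above can be sketched in Java with an AtomicInteger (the class and method names here are illustrative; Java itself, of course, normally relies on its tracing collector instead):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch of thread-safe reference counting.
public class RefCountedResource {
    private final AtomicInteger refCount =
            new AtomicInteger(1);        // the creator holds one reference
    private volatile boolean freed = false;

    public void retain() {
        refCount.incrementAndGet();      // atomic: safe under concurrent sharing
    }

    public void release() {
        // Reclaim deterministically as soon as the count reaches zero.
        if (refCount.decrementAndGet() == 0) {
            freed = true;                // stands in for actually freeing memory
        }
    }

    public boolean isFreed() {
        return freed;
    }
}
```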
Thread Local
The ThreadLocal class contains these methods:

Method            Purpose
get( )            Returns the value for the current thread
set( )            Sets a new value for the current thread
initialValue( )   Used to return an initial value (if ThreadLocal is subclassed)
remove( )         In JDK 5 only: used to delete the current thread's value (for clean-up only)
The simplest way to use a ThreadLocal object is to implement it as a singleton. Here's an example in which the value stored in the ThreadLocal is a List:
import java.util.List;

public class MyThreadLocal {

    private static ThreadLocal tLocal = new ThreadLocal();

    public static void set(List list) {
        tLocal.set(list);
    }

    public static List get() {
        return (List) tLocal.get();
    }

    . . .
}
The first time you use this technique, it may seem a bit like magic, but behind the scenes the local data is simply fetched using a unique ID of the thread.

The ThreadLocal class provides thread-local variables. These variables differ from their normal counterparts in that each thread that accesses one (via its get or set method) has its own, independently initialized copy of the variable. ThreadLocal instances are typically private static fields in classes that wish to associate state with a thread (e.g., a user ID or transaction ID). For example, the class below generates unique identifiers local to each thread. A thread's ID is assigned the first time it invokes UniqueThreadIdGenerator.getCurrentThreadId() and remains unchanged on subsequent calls.
import java.util.concurrent.atomic.AtomicInteger;

public class UniqueThreadIdGenerator {

    private static final AtomicInteger uniqueId = new AtomicInteger(0);

    private static final ThreadLocal<Integer> uniqueNum =
        new ThreadLocal<Integer>() {
            @Override
            protected Integer initialValue() {
                return uniqueId.getAndIncrement();
            }
        };

    public static int getCurrentThreadId() {
        return uniqueNum.get();  // the per-thread value, not the shared counter
    }
} // UniqueThreadIdGenerator
Each thread holds an implicit reference to its copy of a thread-local variable as long as the thread is alive and the ThreadLocal instance is accessible; after a thread goes away, all of its copies of thread-local instances are subject to garbage collection (unless other references to these copies exist).
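The per-thread independence, and the remove( ) clean-up mentioned in the method table, can be seen in a short sketch (names illustrative; withInitial( ) requires Java 8 or later):

```java
// Demonstrates that each thread sees its own, independently
// initialized copy of a ThreadLocal value.
public class ThreadLocalDemo {
    private static final ThreadLocal<StringBuilder> buffer =
            ThreadLocal.withInitial(StringBuilder::new);

    public static boolean copiesAreIndependent() {
        StringBuilder mine = buffer.get();  // this thread's copy
        final StringBuilder[] theirs = new StringBuilder[1];
        Thread t = new Thread(new Runnable() {
            public void run() {
                theirs[0] = buffer.get();   // a freshly initialized copy
            }
        });
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            return false;
        }
        buffer.remove();  // drop this thread's copy so it can be collected
        return theirs[0] != null && theirs[0] != mine;
    }
}
```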