
CHAPTER 1 : AI And Internal Representation

INTRODUCTION
The problem of defining artificial intelligence becomes one of defining intelligence itself.

What is intelligence? Is it a single faculty, or a collection of properties? Is it a learned faculty, or something that already exists? Can it be observed in behavior, or do we have some scale to measure it? These questions, though their answers may be of diverse nature, help in understanding and exploring the limits of AI.

HISTORY
Intelligence forms the foundation of all human technology and, in fact, of all human civilization. Yet there was long a feeling that human efforts to acquire knowledge constitute a transgression against the laws of God or nature, e.g. Eve in the Bible, Prometheus in Greek mythology. The logical starting point of the history of AI dates back to Aristotle. He formalized the insights, wonders and fears regarding nature, through careful analysis, into disciplined thought. For him, the study of thought itself was the basis of all knowledge. In his Logic, he investigated whether certain propositions can be said to be true because they are related to other propositions that are known to be true. Gottlob Frege, Bertrand Russell, Kurt Gödel, Alan Turing and others followed this school of thought. The major development which drastically changed the world view was the discovery of Copernicus: that the earth is not the center of the universe but just a component of it. Though it went against practiced dogmas and revered religious beliefs, it brought a new realization that
o Our ideas about the world may be fundamentally different from its appearance.
o There is a gap between the human mind and its surrounding realities.
o There is a gap between our ideas about things and the things in themselves.
The argument is that
o We can separate the mind and the physical world.
o It is necessary to find a way to connect the two.
The accepted view is that, though they are separate, they are not fundamentally different. Mental processes, like physical processes, can be characterized using formal mathematics or logic.
LOGIC BASED INTELLIGENCE
Once thinking was seen as a form of computation, its formalization and mechanization became necessary. In the 17th century, Leibniz introduced a system of formal logic and designed a machine for automating its calculations. Euler's founding of graph theory, through the Königsberg bridges problem, introduced the concept of state space representation. George Boole, through his Boolean algebra, gave a mathematical formalization of the laws of logic using AND, OR and NOT. Gottlob Frege created a language, called First Order Predicate Calculus, which was later used as a representation technique in AI.
Page 1 of 98

Bertrand Russell, in Principia Mathematica, developed a theoretical foundation for AI by treating mathematics as a formal system. Alfred Tarski created a theory of reference describing how the well-formed formulae of a logic refer to objects in the world. Though the study of science and mathematics had been the prerequisite for the formal study of AI, the invention of the digital computer changed this. Using its potential to provide memory and processing power, we can treat
o Intelligence as a form of information processing.
o Search as a problem solving methodology.
o Knowledge as something that can be put in a representational form and manipulated using algorithms.

THINKING MACHINES
Can machines think? Are humans machines? Can machines think like humans? How would we know whether a computer is thinking?

CAN COMPUTERS THINK?
Some common answers:
o No (dualist/mystic): Computers lack "mental stuff". They don't have intuitions, feelings, phenomenology. (Soul?)
o No (neurophysiology critical): Even if their behavior is arbitrarily close, biology is essential.
o No (beyond our capabilities): Not impossible in principle, but too complex in practice. There are limits on our self-knowledge which will prevent us from creating a thinking computer.
o Yes (but not in our lifetimes): Too complex. Practical obstacles may be insurmountable. But with better science and technology, maybe.
o Yes (functionalist): Programs/computers will become smarter without a clear limit. For all practical purposes, they will 'think' because they will perform the FUNCTIONS of thinking. (The standard AI answer.)
o Yes (extreme functionalist; back to mystic): Computers already think. All matter exhibits mind in different aspects and degrees.

TURING TEST
Alan Turing, in 1950, considered the question of whether a machine could actually be made to think. The Turing test measures the performance of an allegedly intelligent machine.


LOGIC BASED INTELLIGENCE VS. AGENT BASED INTELLIGENCE
We have looked at intelligence as logical inference and at logic as a knowledge representation technique. This is the typical Western line of thought starting from Aristotle. Of late, this school of thought has been questioned. The argument is based on simple observations:
o The formation of language follows no logical blueprint; it has much to do with cultural and social situation.
o The working of the brain is based on the interactions of neurons (so we have Artificial Neural Networks).
o Species adapt to an environment (so we have Genetic Algorithms).
o The working of social systems is based on the performance of autonomous individual agents (so we have Intelligent Agents).
These examples show two things:
o Intelligence emerges from a process.
o Intelligence is reflected in the collective behavior of agents.
The agent-oriented, emergent view of intelligence has the following properties:
o Agents are autonomous or semi-autonomous. Each agent has a specified responsibility to undertake and is ignorant of what the others are doing.
o Each agent is sensitive to its own surrounding environment and has no knowledge of the full domain.
o Agents interact with one another. The society of agents is structured, which helps in solving the global problem.
o The cooperative interaction of the agents results in the emergence of intelligence.

FOUNDATIONS OF AI
Different fields have contributed to AI in the form of ideas, viewpoints and techniques:
o Philosophy: logic, reasoning, mind as a physical system, foundations of learning, language and rationality.
o Mathematics: formal representation and proof, algorithms, computation, (un)decidability, (in)tractability, probability.
o Psychology: adaptation, phenomena of perception and motor control.
o Economics: formal theory of rational decisions, game theory.
o Linguistics: knowledge representation, grammar.

o Neuroscience: the physical substrate for mental activities.
o Control theory: homeostatic systems, stability, and optimal agent design.

A BRIEF HISTORY
What happened after WWII?
o 1943: Warren McCulloch and Walter Pitts proposed a model of artificial Boolean neurons to perform computations. First steps toward connectionist computation and learning (Hebbian learning). Marvin Minsky and Dean Edmonds (1951) constructed the first neural network computer.
o 1950: Alan Turing's "Computing Machinery and Intelligence": the first complete vision of AI.
The birth of AI (1956)
o The Dartmouth Workshop brought together the top minds in automata theory, neural nets and the study of intelligence. Allen Newell and Herbert Simon: the Logic Theorist (the first non-numerical thinking program, used for theorem proving). For the next 20 years the field was dominated by these participants.
Great expectations (1952-1969)
o Newell and Simon introduced the General Problem Solver, an imitation of human problem solving. Arthur Samuel (1952) investigated game playing (checkers) with great success. John McCarthy (1958), inventor of Lisp (the second-oldest high-level language), was logic oriented: the Advice Taker (separation between knowledge and reasoning).
o Marvin Minsky (1958): introduction of microworlds that appear to require intelligence to solve, e.g. the blocks world. Anti-logic orientation; the society of mind.
Collapse in AI research (1966-1973)
o Progress was slower than expected, and predictions were unrealistic. Some systems lacked scalability. Combinatorial explosion in search. Fundamental limitations on techniques and representations. Minsky and Papert (1969): Perceptrons.
AI revival through knowledge-based systems (1969-1979)
o General-purpose vs. domain-specific systems, e.g. the DENDRAL project, the first successful knowledge-intensive system.
o Expert systems: MYCIN to diagnose blood infections; introduction of uncertainty in reasoning.
o Increase in knowledge representation research.

Logic, frames, semantic nets.
AI becomes an industry (1980 - present)
Connectionist revival (1986 - present)
o Parallel distributed processing (Rumelhart and McClelland, 1986); back propagation.
AI becomes a science (1987 - present)
o In speech recognition: hidden Markov models.
o In neural networks.
o In uncertain reasoning and expert systems: the Bayesian network formalism.
The emergence of intelligent agents (1995 - present)
o The whole-agent problem: how does an agent act/behave when embedded in a real environment with continuous sensory inputs?

STATE OF THE ART
Deep Blue defeated the reigning world chess champion, Garry Kasparov, in 1997. ALVINN: "No hands across America" (driving autonomously 98% of the time from Pittsburgh to San Diego). DART: during the 1991 Gulf War, US forces deployed an AI logistics planning and scheduling program that involved up to 50,000 vehicles, cargo items, and people. NASA's on-board autonomous planning program controlled the scheduling of operations for a spacecraft. Proverb solves crossword puzzles better than most humans.

AI IN LOGIC PERSPECTIVE
AI is the study of mental faculties through the use of computational models. It rests on the premise that what the brain does may be thought of as a kind of computation, even though what the brain does easily, e.g. vision, may take enormous effort for a machine.

INTERNAL REPRESENTATION
In order to act intelligently, a computer must have knowledge about the domain of interest. Knowledge is the body of facts and principles gathered, or the act, fact, or state of knowing. This knowledge needs to be presented in a form that is understood by the machine. This unique format is called the internal representation. Thus plain English sentences can be translated into an internal representation and then used to answer questions based on the given sentences.

PROPERTIES OF INTERNAL REPRESENTATION
Internal representation must remove all referential ambiguity.
o Referential ambiguity is ambiguity about what a phrase in the sentence refers to.
o E.g. Raj said that Ram was not well. He must be lying.
o Who does "he" refer to?
Internal representation should avoid word-sense ambiguity.
o Word-sense ambiguities arise because of the multiple meanings of words.
o E.g. Raj caught a pen. Raj caught a train. Raj caught fever.
Internal representation must explicitly mention functional structure.
o Functional structure is the word order used in the language to express an idea.
o E.g. Ram killed Ravan. Ravan was killed by Ram.

o Thus the internal representation need not follow the word order of the original sentence.
Internal representation should be able to handle complex sentences without losing the meaning attached to them.
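The functional-structure property can be illustrated with a small sketch: two surface forms of the same idea map to one canonical predicate-first tuple, so word order is factored out. This is a minimal, assumed illustration; the tiny "parser" below only handles this one active/passive pattern and is not a general translator.

```python
# Hypothetical sketch: map two surface forms of a sentence to one
# canonical internal representation (predicate agent patient).
def to_internal(sentence):
    words = sentence.rstrip(".").split()
    if len(words) == 3 and words[1] == "killed":          # "Ram killed Ravan"
        agent, _, patient = words
    elif len(words) == 5 and words[1:4] == ["was", "killed", "by"]:
        patient, _, _, _, agent = words                   # "Ravan was killed by Ram"
    else:
        raise ValueError("pattern not handled in this sketch")
    return ("kill", agent, patient)                       # predicate-first form

print(to_internal("Ram killed Ravan"))         # ('kill', 'Ram', 'Ravan')
print(to_internal("Ravan was killed by Ram"))  # ('kill', 'Ram', 'Ravan')
```

Both sentences yield the same internal representation, which is exactly what the property demands.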

PREDICATE CALCULUS
Predicate calculus is an internal representation methodology which helps us deduce further results from given propositions (statements). Predicate calculus accesses the individual components of a proposition and represents the proposition in terms of them. For example, the sentence "Raj came late on Sunday" can be represented in predicate calculus as: (came-late Raj Sunday). Here came-late is a predicate that describes the relation between a person and a day. "Raj came late on a rainy Sunday" can be represented as: (came-late Raj Sunday) & (inst Sunday rainy). The predicate permits us to break a statement down into component parts, namely objects, a characteristic of an object, or some assertion about an object.

SYNTAX OF PREDICATE CALCULUS
Predicates and arguments
o In predicate calculus, a proposition is divided into two parts: arguments (or objects) and a predicate (or assertion).
o The arguments are the individuals or objects an assertion is made about. The predicate is the assertion made about them.
o In an English sentence, the objects are the nouns that serve as subject and object of the sentence, and the predicate is the verb or part of the verb.
o For example, the proposition "Vinod likes apple" would be stated as: (likes Vinod apple)
o where likes is the predicate and Vinod and apple are the arguments.
o In some cases, the proposition may not have an obvious predicate. For example: "Anita is a woman", i.e. (inst Anita woman).
Constants
o Constants are fixed-value terms that belong to a given domain.
o They are denoted by numbers and words, e.g. 123, abc.
Variables
o In predicate calculus, letters may be substituted for the arguments.
o The symbols x or y can be used to designate some object or individual.
o The example "Vinod likes apple" can be expressed in variable form with x = Vinod and y = apple. The proposition then becomes: (likes x y)
o If variables are used, the stated proposition must be true for any names substituted for the variables.
Instantiation
o Instantiation is the process of assigning the name of a specific individual or object to a variable.
o That object or individual becomes an instance of that variable.
o In the previous example, supplying Vinod for x and apple for y is a case of instantiation.
Connectives
o There are four connectives used in predicate calculus: not, and, or and if.

o If p and q are formulas then (and p, q), (or p, q), (not p) and (if p, q) are also formulas.
o They can be defined by truth tables.
o (not p):
    p | (not p)
    T |   F
    F |   T
o (and p, q):
    p q | (and p, q)
    T T |     T
    T F |     F
    F T |     F
    F F |     F
o (or p, q):
    p q | (or p, q)
    T T |    T
    T F |    T
    F T |    T
    F F |    F
o (if p, q):
    p q | (if p, q)
    T T |    T
    T F |    F
    F T |    T
    F F |    T
Quantifiers
o A quantifier is a symbol that permits us to state the range or scope of the variables in a predicate logic expression.
o Two quantifiers are used in logic:
    The universal quantifier, "for all". E.g. (forall (x) f) for a formula f.
    The existential quantifier, "exists". E.g. (exists (x) f) for a formula f.
Function applications
o A function application consists of a function applied to zero or more arguments, e.g. friend-of(x).
"All Maharashtrians are Indian citizens" can be expressed as:
o (forall (x) (if (Maharashtrian x) (Indian-citizen x)))
"Every car has a wheel" can be expressed as:
o (forall (x) (if (car x) (exists (y) (wheel-of x y))))
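The four connectives can be cross-checked by writing them as small functions and regenerating the truth tables. This is a minimal Python sketch; the trailing-underscore function names are just illustrative, and note that (if p, q) is material implication: false only when p is true and q is false.

```python
# The four predicate-calculus connectives as Python functions.
def not_(p):    return not p
def and_(p, q): return p and q
def or_(p, q):  return p or q
def if_(p, q):  return (not p) or q   # material implication

# Regenerate the (if p, q) truth table row by row.
for p in (True, False):
    for q in (True, False):
        print(p, q, if_(p, q))
```

Running the loop reproduces the table above: the only False row is p=True, q=False.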

THE PREDICATE CALCULUS CONSISTS OF: A set of constant terms. A set of variables. A set of predicates, each with a specified number of arguments. A set of functions, each with a specified number of arguments. The connectives - if, and, or and not. The quantifiers - exists and forall. The terms used in predicate calculus are:

o Constant terms.
o Variables.
o Functions applied to the correct number of terms.
The formulas used in predicate calculus are:
o A predicate applied to the correct number of terms.
o If p and q are formulas, then (if p, q), (and p, q), (or p, q) and (not p) are formulas.
o If x is a variable and p is a formula, then (exists (x) p) and (forall (x) p) are formulas.
In predicate calculus, the initial facts from which we derive more facts are called axioms. The facts we deduce from the axioms are called theorems. The set of axioms is not fixed; it changes over time as new information (new axioms) arrives.
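The inductive definition of formulas lends itself to a recursive well-formedness checker. The following is a minimal Python sketch under assumed representation choices: formulas are nested tuples such as ("if", p, q), any string is accepted as a predicate or variable name, and any non-tuple element counts as a term.

```python
# Recursive check of the inductive formula definition (sketch).
def is_formula(f):
    if not isinstance(f, tuple) or not f:
        return False
    op = f[0]
    if op == "not":
        return len(f) == 2 and is_formula(f[1])
    if op in ("if", "and", "or"):
        return len(f) == 3 and is_formula(f[1]) and is_formula(f[2])
    if op in ("forall", "exists"):
        # quantifier: a variable name and a formula
        return len(f) == 3 and isinstance(f[1], str) and is_formula(f[2])
    # otherwise: a predicate applied to terms, e.g. ("likes", "Vinod", "apple")
    return all(not isinstance(t, tuple) or len(t) > 0 for t in f[1:])

print(is_formula(("likes", "Vinod", "apple")))   # True
print(is_formula(("forall", "x",
                  ("if", ("car", "x"),
                   ("exists", "y", ("wheel-of", "x", "y"))))))  # True
```

The second call walks the same structure as the "every car has a wheel" example above.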

INFERENCE RULES
From a given set of axioms, we can deduce more facts using inference rules. The important inference rules are:
o Modus ponens: from p and (if p, q) infer q.
o Chain rule: from (if p, q) and (if q, r) infer (if p, r).
o Substitution: if p is a valid axiom, then a statement derived from it by consistent substitution of propositions is also valid.
o Simplification: from (and p, q) infer p.
o Conjunction: from p and q infer (and p, q).
o Transposition: from (if p, q) infer (if (not q) (not p)).
o Universal instantiation: if something is true of everything, then it is true of any particular thing.
o Abduction: from q and (if p, q) infer p. (Abduction can lead to wrong conclusions. Still, it is very important because it generates explanations, as in medical diagnosis.)
o Induction: from (P a) and (P b) infer (forall (x) (P x)). (Induction leads to learning.)

EXERCISE: EXPRESS THE FOLLOWING IN PREDICATE CALCULUS:
Roses are red.
o (forall (x) (if (inst x rose) (color x red)))
Violets are blue.
o (forall (x) (if (inst x violet) (color x blue)))
Every chicken hatched from an egg.
o (forall (x) (if (chicken x) (exists (y) (and (egg y) (hatched-from x y)))))
Some language is spoken by everyone in this class.
o (exists (y) (and (language y) (forall (x) (if (in-this-class x) (speaks x y)))))
If you push anything hard enough, it will fall over.
o (forall (x) (if (push-hard x) (fall-over x)))
Everybody loves somebody sometime.
o (forall (x) (exists (y) (loves-sometime x y)))
Anyone with two or more spouses is a bigamist.
o (forall (x) (if (has-two-or-more-spouses x) (inst x bigamist)))
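Modus ponens is the workhorse of these rules, and applying it repeatedly gives a simple forward-chaining procedure. The sketch below is a minimal, propositional illustration (no variables); the axioms used are made up for the example, and rules are represented as ("if", p, q) tuples.

```python
# Forward chaining with modus ponens (propositional sketch).
def forward_chain(axioms):
    known = {a for a in axioms if not (isinstance(a, tuple) and a[0] == "if")}
    rules = [a for a in axioms if isinstance(a, tuple) and a[0] == "if"]
    changed = True
    while changed:
        changed = False
        for _, p, q in rules:
            # modus ponens: from p and (if p, q), infer q
            if p in known and q not in known:
                known.add(q)
                changed = True
    return known

axioms = ["rains",
          ("if", "rains", "wet-ground"),
          ("if", "wet-ground", "slippery")]
print(forward_chain(axioms))   # {'rains', 'wet-ground', 'slippery'}
```

Notice that chaining modus ponens through both rules has the same effect as the chain rule: from (if rains, wet-ground) and (if wet-ground, slippery) we effectively obtain (if rains, slippery).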


ALTERNATIVE NOTATIONS
Knowledge represented in the internal representation technique of predicate calculus can also be represented in a number of alternative notations. The important ones are:
o Semantic networks.
One of the oldest and easiest to understand knowledge representation schemes is the semantic network. Semantic networks are graphical depictions of knowledge that show hierarchical relationships between objects. For example, "Sachin is a cricketer", i.e. (inst Sachin cricketer), can be represented in a semantic network as:

    Sachin --inst--> Cricketer

A semantic network is made up of a number of ovals or circles called nodes. Nodes represent objects and descriptive information about those objects. Objects can be any physical item, concept, event or action. The nodes are interconnected by links called arcs. These arcs show the relationships between the various objects and descriptive factors. The arrows on the lines point from an object to its value along the corresponding arc. From the viewpoint of predicate calculus, semantic networks replace terms with nodes and relations with labeled directed arcs. The semantic network is a very flexible method of knowledge representation; there are no hard rules about knowledge in this form. Semantic networks can show inheritance, in the sense that they can explain how elements of specific classes inherit attributes and values from the more general classes in which they are included. The isa relation is a subset relation: the cricketers are a subset of the set of sportsmen.


    Sachin --inst--> Cricketer --isa--> Sportsman

E.g. (isa cricketer sportsman). The instance relation corresponds to the relation element-of: Sachin is an element of the set of cricketers, and thus an element of all the supersets of the set of cricketers. The isa relation corresponds to the relation subset-of: cricketers are a subset of sportsmen, and hence cricketers inherit all the properties of sportsmen.

The predicate calculus lacks backward pointers, resulting in long searches when retrieving information. Predicate calculus combined with an indexing (pointing) scheme, however, is a better internal representation scheme than semantic networks, since it also has connectives and quantifiers.
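The inheritance behavior of a semantic network can be sketched in a few lines: labeled arcs become dictionaries, and property lookup walks inst and isa links upward. This is a minimal, assumed illustration; the properties attached to each node are invented for the example.

```python
# A tiny semantic network as labeled arcs, with property
# inheritance along inst/isa links (sketch; data is illustrative).
isa  = {"cricketer": "sportsman"}   # subset links
inst = {"Sachin": "cricketer"}      # element-of links
props = {"sportsman": {"plays": "sport"},
         "cricketer": {"uses": "bat"}}

def lookup(node, prop):
    """Walk inst then isa links until the property is found."""
    while node is not None:
        if prop in props.get(node, {}):
            return props[node][prop]
        node = inst.get(node) or isa.get(node)
    return None

print(lookup("Sachin", "uses"))    # 'bat'   (inherited from cricketer)
print(lookup("Sachin", "plays"))   # 'sport' (inherited from sportsman)
```

Sachin inherits "uses bat" from cricketer and "plays sport" from sportsman, exactly the inheritance the diagrams above depict.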

o Slot assertion notation.
In slot assertion notation, the various arguments of a predicate, called slots, are expressed as separate assertions. Slot assertion notation is a special type of predicate calculus representation. For example, (catch-object Sachin ball) can be expressed as:
(inst catch1 catch-object) ; catch1 is an instance of catching.
(catcher catch1 Sachin) ; Sachin did the catching.
(caught catch1 ball) ; he caught the ball.
o Frame notation.
Frame notation combines the different slots of the slot assertion notation. Thus we have:
(catch-object catch1 (catcher Sachin) (caught ball))
Here we have constructed a single structure, called a frame, that includes all the information.

EXERCISE: CONVERT THE FOLLOWING TO FIRST-ORDER PREDICATE LOGIC USING THE PREDICATES INDICATED:
swimming_pool(X) steamy(X) large(X) unpleasant(X) noisy(X) place(X)
All large swimming pools are noisy and steamy places.
All noisy and steamy places are unpleasant.
All noisy and steamy places except swimming pools are unpleasant.
The swimming pool is small and quiet.
ANSWERS:
All large swimming pools are noisy and steamy places.
o (forall (x) (if (and (large x) (swimming_pool x)) (and (noisy x) (steamy x) (place x))))
All noisy and steamy places are unpleasant.
o (forall (x) (if (and (noisy x) (steamy x) (place x)) (unpleasant x)))
All noisy and steamy places except swimming pools are unpleasant.
o (forall (x) (if (and (noisy x) (steamy x) (place x) (not (swimming_pool x))) (unpleasant x)))
The swimming pool is small and quiet.
o (and (swimming_pool pool1) (not (large pool1)) (not (noisy pool1)))

CHAPTER 2 : LISP

BRIEF HISTORY
Lisp (the name stands for LISt Processor) is the second oldest programming language still in use (after FORTRAN), invented by John McCarthy at MIT in 1958. For many years it could only be run on special purpose and rather expensive hardware. Until the mid '80s, Lisp was more a family of dialects than a single language.

In 1986 an ANSI subcommittee was formed to standardize these dialects into a single Common Lisp, the result being the first object-oriented language to become an ANSI standard, in 1994. A famous book: ANSI Common Lisp by Paul Graham, Prentice Hall, 1996.

LIST
Lists are surrounded by parentheses; anything surrounded by parentheses is a list. Here are some examples of things that are lists:
o (1 2 3 4 5)
o (a b c)
o (cat 77 dog 89)
What if I put parentheses around nothing? What if I put parentheses around another list? In both cases the answer is the same: you still have a list.
Atoms are separated by white space or parentheses. We need a name for the things that appear between the parentheses, the things that are not themselves lists but rather (in our examples so far) words and numbers. These things are called atoms. Accordingly, these words and numbers are all atoms:
o 1
o 25
o 342
o mouse
o factorial

FORM
A form is meant to be evaluated. A form can be either an atom or a list; the important thing is that the form is meant to be evaluated. A number is an atom (its value is constant, for obvious reasons). Lisp does not store a value for a number; the number is said to be self-evaluating. We are going to introduce a new term without a complete definition: for now, think of a symbol as an atom that can have a value. If a form is a list, then the first element must be either a symbol or a special form called a lambda expression. The symbol must name a function. In Lisp, the symbols +, -, *, and / name the four common arithmetic operations: addition, subtraction, multiplication, and division. Each of these symbols has an associated function that performs the arithmetic operation. So when Lisp evaluates the form (+ 2 3), it applies the function for addition to the arguments 2 and 3, giving the expected result 5.

FUNCTION
A function is applied to its arguments. Lisp, when given a list to evaluate, treats the form as a function call. For example:
o (+ 4 9) evaluates to 13

We can apply arithmetic operations to these numbers. The syntax for doing so is o Left parenthesis o Name of operator o Various arguments to that operator, each preceded by white space o Right parenthesis For example: o (+ 1 2 3)

NESTED CALCULATIONS
We can nest one function call within another one, for example:
o (+ (* 2 3) (* 4 5 6))
There is an unambiguous rule for evaluating function calls, as follows:
o Process the arguments to the function, in order ("from left to right").
o Evaluate each argument in turn.
o Once all the arguments have been evaluated, call the original function with these values.
o Return the result.
So, to evaluate (+ (* 2 3) (* 4 5 6)):
o We start by noting that we have a call to the function + with arguments (* 2 3) and (* 4 5 6).
o We evaluate the first argument, namely (* 2 3). We note that this is itself a function call (the function is * and its arguments are 2 and 3). We must therefore evaluate the first argument to *: this is the number 2, which evaluates to itself. We next evaluate the second argument to *: this is the number 3, which evaluates to itself. We can now call the function * with arguments 2 and 3; the result of this function call is 6.
o The value of the first argument to the function + is therefore 6.
o Similarly the value of the second argument to the function + is 120.
o We can now call the function + with arguments 6 and 120 and finally return the value 126.
Lisp always does the same thing to evaluate a list form:
o Evaluate the arguments, from left to right.
o Get the function associated with the first element.
o Apply the function to the arguments.
Remember that an atom can also be a Lisp form. When given an atom to evaluate, Lisp simply returns its value:
o 17.95 evaluates to 17.95
Here are a few more examples:
o (atom 123) T
o (numberp 123) T
o (atom :foo)

T
ATOM and NUMBERP are predicates. Predicates return a true or false value. NIL is the only false value in Lisp; everything else is true. A function can return any number of values. Sometimes we would like to have a function return several values. For now, let's see what happens when Lisp evaluates a VALUES form:
o (values 1 2 3 :hi "Hello") evaluates to the five values 1 2 3 :HI "Hello"

SETQ
SETQ assigns a value to a symbol; Lisp then evaluates the symbol form by retrieving its variable value.
(setq his-name "Rahul")
o "Rahul"
his-name
o "Rahul"
(setq a-variable 57)
o 57
a-variable
o 57
SETQ's first argument is a symbol. This is not evaluated. The second argument is assigned as the variable's value. SETQ returns the value of its last argument. The SETQ form can actually take any even number of arguments, which should be alternating symbols and values:
o (setq month "July" day 12 year 2005) evaluates to 2005
o month "July"
o day 12
o year 2005
SETQ performs the assignments from left to right, and returns the rightmost value.

LET
The LET form looks a little more complicated than what we have seen so far. The LET form uses nested lists, but because it is a special form, only certain elements get evaluated.
(let ((a 3) (b 4) (c 5)) (* (+ a b) c))
o 35

a
o Error: unbound variable.
The above LET form defines values for the symbols A, B, and C, then uses these as variables in an arithmetic calculation. In general, LET looks like this:
o (let (bindings) forms)
o where bindings is any number of two-element lists, each list containing a symbol and a value, and forms is any number of Lisp forms. The forms are evaluated, in order, using the values established by the bindings.
If you define a variable using SETQ and then name the same variable in a LET form, the value defined by LET supersedes the other value during evaluation of the LET:
o (setq a 89) 89
o a 89
o (let ((a 3)) (+ a 2)) 5
o a 89
Unlike SETQ, which assigns values in left-to-right order, LET binds its variables all at the same time:
o (setq w 77) 77
o (let ((w 8) (x w)) (+ w x)) 85
LET binds w to 8 and x to w. Because these bindings happen at the same time, w still had its outer value of 77 when x was bound, so (+ w x) is (+ 8 77), which is 85.

COND
The COND macro lets us evaluate Lisp forms conditionally. Like LET, COND uses parentheses to delimit different parts of the form. Consider this example:
o (let ((a 1) (b 2) (c 1) (d 1)) (cond ((eql a b) 1) ((eql a c) "First form" 2) ((eql a d) 3))) 2
EQL returns T if its two arguments are identical, i.e. the same. Only two of the three tests were executed. The first, (EQL A B), returned NIL; therefore the rest of that clause (containing the number 1 as its only form) was skipped. The second clause tested (EQL A C), which was true. Because this test returned a non-NIL value, the remainder of the clause (the two atomic forms "First form" and 2) was evaluated, and the value of the last form was returned as the value of the COND, which was then returned as the value of the enclosing LET. The third clause was never tested, since an earlier clause had already been chosen; clauses are tested in order. Conventional use of COND puts T as the test form in the final clause. This guarantees that the body forms of the final clause get evaluated if the tests in all of the other clauses fail.


You can use the last clause to return a default value or perform some appropriate operation. Here's an example:
o (let ((a 32)) (cond ((eql a 13) "An unlucky number") ((eql a 99) "A lucky number") (t "Nothing special about this number"))) "Nothing special about this number"

QUOTE
Sometimes we would like to suppress Lisp's normal evaluation rules. One such case is when we'd like a symbol to stand for itself, rather than its value, when it appears as an argument of a function call:
o (setq a 97) 97
o a 97
o (setq b 23) 23
o (setq a b) 23
o a 23
o (setq a (quote b)) B
o a B
The difference is that B's value is used in (SETQ A B), whereas B stands for itself in (SETQ A (QUOTE B)). The QUOTE form is so commonly used that Lisp provides a shorthand notation:
o (QUOTE form) = 'form
The = symbol here means that the two Lisp forms are equivalent.

CONS CONS is the most basic constructor of lists. It is a function, so it evaluates both of its arguments. The second argument must be a list or NIL. (cons 1 nil) o (1) (cons 2 (cons 1 nil)) o (2 1) (cons 3 (cons 2 (cons 1 nil))) o (3 2 1) CONS adds a new item to the beginning of a list. The empty list is equivalent to NIL. ( ) = NIL So we could also have written: o (cons 1 ( )) (1) o (cons 2 (cons 1 ( ))) (2 1) o (cons 3 (cons 2 (cons 1 ( )))) (3 2 1)

NIL is one of two symbols in Lisp that isn't a keyword but still has itself as its constant value; T is the other symbol that works like this. The fact that NIL evaluates to itself, combined with ( ) = NIL, means that you can write ( ) rather than (QUOTE ( )). Otherwise, Lisp would have to make an exception to its evaluation rule to handle the empty list.

LIST
As we have noticed, building a list out of nested CONS forms can be a bit tedious. The LIST form does the same thing in a more perspicuous manner:
o (list 1 2 3) (1 2 3)
LIST can take any number of arguments. Because LIST is a function, it evaluates its arguments:
o (list 1 2 :hello "there" 3) (1 2 :HELLO "there" 3)
o (let ((a :this) (b :and) (c :that)) (list a 1 b c 2)) (:THIS 1 :AND :THAT 2)

FIRST AND REST
If we think of a list as being made up of two parts, the first element and everything else, then we can retrieve any individual element of a list using the two operations FIRST and REST.
(setq my-list (quote (1 2 3 4 5)))
o (1 2 3 4 5)
(first my-list)
o 1
(rest my-list)
o (2 3 4 5)
(first (rest my-list))
o 2
(rest (rest my-list))
o (3 4 5)
(first (rest (rest my-list)))
o 3
(rest (rest (rest my-list)))
o (4 5)
(first (rest (rest (rest my-list))))
o 4

NAMING AND IDENTITY
A symbol is just a name
o It can stand for itself.
o This makes it easy to write certain kinds of programs in Lisp.
o For example, if we want our program to represent relationships in our family tree, we can make a database that keeps relationships like this:
(father Ram Arun)
(son Ram Dev)
(father Ram Sangita)

(mother Lakshmi Arun)
(mother Lakshmi Sangita)
o Each relationship is a list.
o (father Ram Arun) means that Ram is Arun's father.
o Every element of every list in our database is a symbol.
o Our Lisp program can compare symbols in this database to determine, for example, that Dev is Arun's grandfather.
o If we tried to write a program like this in another language, a language without symbols, we would have to decide how to represent the names of family members and relationships, and then create code to perform all the needed operations: reading, printing, comparison, assignment, etc.
o This is all built into Lisp, because symbols are a data type distinct from the objects they might be used to name.
A symbol is always unique
o Every time our program uses a symbol, that symbol is identical to every other symbol with the same name. We can use the EQ test to compare symbols:
(eq 'a 'a) T
(eq 'david 'a) NIL
(eq 'Lakshmi 'Sangita) NIL
(setq zzz 'sleeper) SLEEPER
(eq zzz 'sleeper) T
o Notice that it does not matter whether we use uppercase or lowercase letters in our symbol names.
o Internally, Lisp translates every alphabetic character in a symbol name to a common case, usually upper.
A symbol can name a value
o Although the ability of a Lisp symbol to stand for itself is sometimes useful, a more common use is for the symbol to name a value.
o This is the role played by variable and function names in other programming languages.
o A Lisp symbol most commonly names a value, or, when used as the first element of a function call form, a function.
o What's unusual about Lisp is that a symbol can have a value as a function and as a variable at the same time:
(setq first 'number-one) NUMBER-ONE
(first (list 3 2 1)) 3
first NUMBER-ONE
o Note how FIRST is used as a variable in the first and last cases, and as a function (predefined by Lisp, in this example) in the second case.

o Lisp decides which of these values to use based on where the symbol appears.
o When the evaluation rule requires a value, Lisp looks for the variable value of the symbol.
o When a function is called for, Lisp looks for the symbol's function.
o A symbol can have other values besides those it has as a variable or function.
o A symbol can also have values for its documentation, property list, and print name.
o A symbol's documentation is text that we create to describe a symbol.
o We can create this using the DOCUMENTATION form or as part of certain forms which define a symbol's value.
o Because a symbol can have multiple meanings, we can assign documentation to each of several meanings, for example as a function and as a variable.
A value can have more than one name
o More than one symbol can share a value.
o Other languages have pointers that work this way.
o Lisp does not expose pointers to the programmer, but does have shared objects.
o An object is considered identical when it passes the EQ test. Consider the following:
(setq L1 (list 'a 'b 'c)) (A B C)
(setq L2 L1) (A B C)
(eq L1 L2) T
(setq L3 (list 'a 'b 'c)) (A B C)
(eq L3 L1) NIL
o Here, L1 is EQ to L2 because L1 names the same value as L2.
o In other words, the value created by the (LIST 'A 'B 'C) form has two names, L1 and L2. The (SETQ L2 L1) form says, "Make the value of L2 be the value of L1."
o Not a copy of the value, but the value itself.
o So L1 and L2 share the same value: the list (A B C) which was first assigned as the value of L1.
o L3 also has a list (A B C) as its value, but it is a different list than the one shared by L1 and L2.
o Even though the value of L3 looks the same as the value of L1 and L2, it is a different list because it was created by a different LIST form.
o So (EQ L3 L1) is NIL because their values are different lists, each made of the symbols A, B, and C.
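The sharing behavior described above can be checked directly at the REPL. The following is a small sketch (the variable names are illustrative, not from the text); COPY-LIST is the standard way to get a structurally equal but distinct list:

```lisp
;; Two names for one list vs. two lists that merely look alike.
(defparameter *l1* (list 'a 'b 'c))
(defparameter *l2* *l1*)             ; second name for the same object
(defparameter *l3* (copy-list *l1*)) ; new cons cells, same printed form

(eq *l1* *l2*)     ; => T   -- the identical object
(eq *l1* *l3*)     ; => NIL -- a different list
(equal *l1* *l3*)  ; => T   -- but structurally the same
```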

ESSENTIAL FUNCTION DEFINITIONS DEFUN o DEFUN defines named functions. o We can define a named function using the DEFUN form:


(defun secret-number (the-number)
  (let ((the-secret 37))
    (cond ((= the-number the-secret) 'that-is-the-secret-number)
          ((< the-number the-secret) 'too-low)
          ((> the-number the-secret) 'too-high))))
SECRET-NUMBER
o The DEFUN form has three arguments:
The name of the function: SECRET-NUMBER.
A list of parameter names: (THE-NUMBER), which will be bound to the function's arguments when it is called.
The body of the function: (LET ...).
o (secret-number 11) TOO-LOW
o (secret-number 99) TOO-HIGH
o (secret-number 37) THAT-IS-THE-SECRET-NUMBER
o Of course, we can define a function of more than one argument:
(defun my-calculation (a b c x)
  (+ (* a (* x x)) (* b x) c))
MY-CALCULATION
(my-calculation 3 2 7 5) 92
LAMBDA
o LAMBDA defines anonymous functions.
o Sometimes the function we need is so trivial or so obvious that we don't want to have to invent a name or worry about whether the name might be in use somewhere else.
o For situations like this, Lisp lets us create an unnamed, or anonymous, function using the LAMBDA form. A LAMBDA form looks like a DEFUN form without the name:
(lambda (a b c x)
  (+ (* a (* x x)) (* b x) c))
o A LAMBDA form is not called by name; it normally appears where Lisp expects to find a function, such as the first element of a form:
((lambda (a b c x) (+ (* a (* x x)) (* b x) c)) 3 2 7 5) 92
DEFMACRO
o DEFMACRO defines named macros.
o The macro body returns a form to be evaluated. In other words, we need to write the body of the macro such that it returns a form, not a value.
o Here is a simple macro to illustrate most of what we need to know:
(defmacro setq-literal (place literal)
  `(setq ,place ',literal))
SETQ-LITERAL
(setq-literal a b) B
a B
o SETQ-LITERAL works like SETQ, except that neither argument is evaluated.
o So in our call to (SETQ-LITERAL A B) above, here's what happens:

Bind PLACE to the symbol A.
Bind LITERAL to the symbol B.
Evaluate the body `(SETQ ,PLACE ',LITERAL), following these steps:
Evaluate PLACE to get the symbol A.
Evaluate LITERAL to get the symbol B.
Return the form (SETQ A 'B).
Evaluate the form (SETQ A 'B).
o A form typically creates and returns only one value.
o Lisp has only a small number of forms which create or receive multiple values.
VALUES
o VALUES creates multiple (or no) values.
o The VALUES form creates zero or more values:
(values :this) :THIS
(values :this :that) :THIS :THAT
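VALUES is usually paired with a form that receives multiple values, such as MULTIPLE-VALUE-BIND. A small sketch (MIN-AND-MAX is our own illustrative function; FLOOR is a standard function that already returns two values):

```lisp
;; FLOOR returns a quotient and a remainder as two separate values.
(multiple-value-bind (q r) (floor 17 5)
  (list q r))   ; => (3 2)

;; Our own function can return several values with VALUES.
(defun min-and-max (a b)
  (if (< a b)
      (values a b)
      (values b a)))

(multiple-value-bind (lo hi) (min-and-max 9 4)
  (list lo hi)) ; => (4 9)
```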

DATA TYPES
Lisp almost always does the right thing with numbers.
(/ 1 3)
o 1/3
(float (/ 1 3))
o 0.33333334
Characters give Lisp something to read and write. Basic Lisp I/O uses characters. The READ and WRITE functions turn characters into Lisp objects and vice versa. READ-CHAR and WRITE-CHAR read and write single characters.
(read) a
o A
(read) #\a
o #\a
(read-char) a
o #\a
(write 'a) A
o A
(write #\a) #\a
o #\a
(write-char #\a) a
o #\a
(write-char 'a)
o Error: Not a character
You should notice that newline terminates READ input. This is because READ collects characters trying to form a complete Lisp expression. In the example, READ collects a symbol, which is terminated by the newline.

The symbol could also have been terminated by a space, a parenthesis, or any other character that can't be part of a symbol. In contrast, READ-CHAR reads exactly one character from the input. As soon as that character is consumed, READ-CHAR completes executing and returns the character.
Lisp represents a single character using the notation #\char, where char is a literal character.
Character           Lisp
Space               #\Space
Newline             #\Newline
Backspace           #\Backspace
Tab                 #\Tab
Linefeed            #\Linefeed
Formfeed            #\Page
Carriage return     #\Return
Rubout or DEL       #\Rubout
Arrays
o If we need to organize data in tables of two, three, or more dimensions, we can create an array:
(setq a1 (make-array '(3 4)))
#2A((NIL NIL NIL NIL) (NIL NIL NIL NIL) (NIL NIL NIL NIL))
(setf (aref a1 0 0) (list 'element 0 0))
(ELEMENT 0 0)
(setf (aref a1 1 0) (list 'element 1 0))
(ELEMENT 1 0)
(setf (aref a1 2 0) (list 'element 2 0))
(ELEMENT 2 0)
a1
#2A(((ELEMENT 0 0) NIL NIL NIL) ((ELEMENT 1 0) NIL NIL NIL) ((ELEMENT 2 0) NIL NIL NIL))
(aref a1 0 0)
(ELEMENT 0 0)
(setf (aref a1 0 1) pi)
3.141592653589793
(setf (aref a1 0 2) "hello")
"hello"
(aref a1 0 2)
"hello"
o An array's rank is the same as its number of dimensions.
o We created a rank-2 array in the above example.
o Lisp prints an array using the notation #rankA(...).
o The contents of the array appear as nested lists, with the first dimension appearing as the outermost grouping, and the last dimension appearing as the elements of the innermost grouping.
o To retrieve an element of an array, use AREF. AREF's first argument is the array; the remaining arguments specify the index along each dimension.
o The number of indices must match the rank of the array.
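Rank and dimensions come together when traversing an array. A hedged sketch (the names *A* and SUM-ARRAY are ours): ARRAY-DIMENSION returns the size of one dimension, so nested DOTIMES loops can visit every element.

```lisp
;; Create a 3x4 array of ones and sum every element.
(defparameter *a* (make-array '(3 4) :initial-element 1))

(defun sum-array (array)
  (let ((sum 0))
    (dotimes (i (array-dimension array 0))   ; rows
      (dotimes (j (array-dimension array 1)) ; columns
        (incf sum (aref array i j))))
    sum))

(sum-array *a*)  ; => 12
```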

o Vectors are one-dimensional arrays.
o We can create a vector using MAKE-ARRAY, and access its elements using AREF.
o (setq v1 (make-array '(3))) #(NIL NIL NIL)
o (make-array 3) #(NIL NIL NIL)
o (setf (aref v1 0) :zero) :ZERO
o (setf (aref v1 1) :one) :ONE
o (aref v1 0) :ZERO
o v1 #(:ZERO :ONE NIL)
o Lisp prints vectors using the slightly abbreviated form #(...), rather than #1A(...).
o We can use either a single-element list or a number to specify the vector dimensions to MAKE-ARRAY -- the effect is the same.
o We can create a vector from a list of values, using the VECTOR form:
(vector 34 22 30) #(34 22 30)
o This is similar to the LIST form, except that the result is a vector instead of a list.
o We can use AREF to access the elements of a vector, or we can use the sequence-specific function, ELT:
(setf v2 (vector 34 22 30 99 66 77)) #(34 22 30 99 66 77)
(setf (elt v2 3) :radio) :RADIO
v2 #(34 22 30 :RADIO 66 77)
Strings
o Strings are vectors that contain only characters.
o We already know how to write a string using the "..." syntax.
o Since a string is a vector, we can apply the array and vector functions to access elements of a string.
o We can also create strings using the MAKE-STRING function, or change characters or symbols to strings using the STRING function.
o (setq s1 "hello, there.") "hello, there."
o (setf (elt s1 0) #\H) #\H
o (setf (elt s1 12) #\!) #\!
o s1 "Hello, there!"
o (string 'a-symbol) "A-SYMBOL"

o (string #\G) "G"
Symbols
o Symbols are unique, but they can have many values.
o We know that a symbol has a unique identity.
o A symbol is identical to any other symbol spelled the same way.
o We also know that a symbol can have values as a variable and a function, and for documentation, print name, and properties.
o A symbol's property list is like a miniature database which associates a number of key/value pairs with the symbol.
o For example, if our program represented and manipulated objects, we could store information about an object on its property list:
(setf (get 'object-1 'color) 'red) RED
(setf (get 'object-1 'size) 'large) LARGE
(setf (get 'object-1 'shape) 'round) ROUND
(setf (get 'object-1 'position) '(on table)) (ON TABLE)
(setf (get 'object-1 'weight) 15) 15
(symbol-plist 'object-1) (WEIGHT 15 POSITION (ON TABLE) SHAPE ROUND SIZE LARGE COLOR RED)
(get 'object-1 'color) RED
object-1 Error: no value
o Note that OBJECT-1 doesn't have a value -- all of the useful information is in two places: the identity of the symbol, and the symbol's properties.
o This can be done much more easily using structures.
Structures
o Structures let us store related data.
o A Lisp structure gives us a way to create an object which stores related data in named slots.
o (defstruct struct-1 color size shape position weight) STRUCT-1
o (setq object-2 (make-struct-1 :size 'small :color 'green :weight 10 :shape 'square)) #S(STRUCT-1 :COLOR GREEN :SIZE SMALL :SHAPE SQUARE :POSITION NIL :WEIGHT 10)
o (struct-1-shape object-2) SQUARE
o (struct-1-position object-2) NIL
o (setf (struct-1-position object-2) '(under table)) (UNDER TABLE)

o (struct-1-position object-2) (UNDER TABLE)
o In the example, we defined a structure type named STRUCT-1 with slots named COLOR, SHAPE, SIZE, POSITION, and WEIGHT.
o Then we created an instance of a STRUCT-1 type, and assigned the instance to the variable OBJECT-2.
o The rest of the example shows how to access slots of a structure instance using accessor functions named for the structure type and the slot name.
o Example:
(defstruct point x y z) POINT
(defun distance-from-origin (point)
  (let* ((x (point-x point))
         (y (point-y point))
         (z (point-z point)))
    (sqrt (+ (* x x) (* y y) (* z z)))))
DISTANCE-FROM-ORIGIN
(defun reflect-in-y-axis (point)
  (setf (point-y point) (- (point-y point))))
REFLECT-IN-Y-AXIS
(setf my-point (make-point :x 3 :y 4 :z 12)) #S(POINT :X 3 :Y 4 :Z 12)
(type-of my-point) POINT
(distance-from-origin my-point) 13.0
(reflect-in-y-axis my-point) -4
my-point #S(POINT :X 3 :Y -4 :Z 12)
Type Information
o Type information is apparent at runtime.
o A symbol can be associated with any type of value at runtime. For cases where it matters, Lisp lets us query the type of a value.
o (type-of 123) FIXNUM
o (type-of 123456789000) BIGNUM
o (type-of "hello, world") (SIMPLE-BASE-STRING 12)
o (type-of 'toolbar) SYMBOL
o (type-of '(a b c)) CONS
o TYPE-OF returns a symbol or a list indicating the type of its argument.
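Beyond inspecting types with TYPE-OF, a program can branch on them with the standard TYPECASE macro. A minimal sketch (DESCRIBE-VALUE and its keyword results are our own illustration):

```lisp
;; Dispatch on the runtime type of a value.
(defun describe-value (x)
  (typecase x
    (integer :number)
    (string  :text)
    (cons    :list)
    (symbol  :name)
    (t       :other)))

(describe-value 123)      ; => :NUMBER
(describe-value "hello")  ; => :TEXT
(describe-value '(a b))   ; => :LIST
(describe-value 'toolbar) ; => :NAME
```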

ESSENTIAL INPUT AND OUTPUT Read o READ accepts Lisp data. o READ turns characters into Lisp data.

o So far, we've seen a printed representation of several kinds of Lisp data: symbols and numbers, strings, characters, lists, arrays, vectors, and structures.
o The Lisp reader does its job according to a classification of characters.
o The standard classifications are shown below:
Standard Constituent Characters
abcdefghijklmnopqrstuvwxyz
ABCDEFGHIJKLMNOPQRSTUVWXYZ
0123456789
!$%&*+-./:<=>?@[]^_{}~
<backspace> <rubout>
Standard Terminating Macro Characters
"'(),;`
Standard Non-Terminating Macro Characters
#
Standard Single Escape Characters
\
Standard Multiple Escape Characters
|
Standard Whitespace Characters
<tab> <space> <page> <newline> <return> <linefeed>
o If READ starts with a constituent character, it begins accumulating a symbol or number.
o When READ encounters a terminating macro character or a whitespace character, it tries to interpret the collected constituent characters first as a number, then as a symbol. If a numeric interpretation is possible, READ returns the number.
o Otherwise, READ changes the alphabetic characters to a standard case (normally upper case), interns the name as a symbol, and returns the symbol.
o Escape characters play a special role.
o A single escape character forces the following character to be treated exactly as a constituent character.
o In this way characters that are normally treated as whitespace or terminating macro characters can be part of a symbol.
o If READ encounters an escape character, it never attempts to interpret the resulting constituents as a number, even if only digits were escaped.
o If READ starts with a macro character, the character determines the next step:
"  Read a string.
'  Read the next form and wrap it in QUOTE.
(  Read a list.
;  Ignore everything up to the newline.
#  Decide what to do based on the next character.
Print

o PRINT writes Lisp data for us and for READ.
o The PRINT function changes a Lisp object into the sequence of characters that READ would need to reconstruct it:
(print 'abc)
ABC
ABC
(print (list 1 2 3))
(1 2 3)
(1 2 3)
(print "A String")
"A String"
"A String"
(print 387.9532)
387.9532
387.9532
o PRINT always begins its output with a newline character, and follows its output with a space.
o This ensures that the PRINT output stands apart from any surrounding output, since newline and space are both treated as whitespace, and cannot be part of the printed representation of a Lisp object (unless escaped).
o Other variations of PRINT have different uses. PRIN1 behaves as PRINT, but does not surround its output with whitespace.
o This might be useful if we are building up a name from successive pieces, for example. PRINC behaves as PRIN1, but generates output intended for display, rather than for READ; for example, PRINC omits the quotes around a string, and does not print escape characters.
o (print 'a\ bc) |A BC|
o (prin1 'a\ bc) |A BC|
o (princ '|A BC|) A BC
If
o IF allows the execution of a form to be dependent on a single test-form.
o First, test-form is evaluated. If the result is true, then the then-form is selected; otherwise the else-form is selected. Whichever form is selected is then evaluated.
o Examples:
(if t 5) 5
(if nil 1 2) 2
o The syntax of the operator IF is:
if
predicate (always evaluated)
what to do if the predicate is true, i.e. if the predicate does not evaluate to nil (this argument is not evaluated if the predicate is false)

what to do if the predicate is false, i.e. if the predicate evaluates to nil (this argument is not evaluated if the predicate is true)
o Suppose that we have been asked to write a function which prompts the user for a value, reads it in and prints the result.
o (fahrenheit-to-celsius)
Please give a value in degrees F: 32
32 degrees F is 0.0 degrees C.
Format
o FORMAT is a function for generating formatted output. Its first argument is a destination.
o Specify NIL for output to a string (like sprintf in C), which FORMAT generates and returns, or T for output to the listener (like printf in C), in which case FORMAT returns NIL.
o The next argument is known as the format string. In the same way that printf handles specially any occurrence of the character %, FORMAT handles specially any occurrence of the character ~ (pronounced tilde). In particular, ~& means: output a fresh line (i.e. if we weren't already at the start of a line, output a new one). ~a means: take the next of the arguments to FORMAT and insert its printed representation here.
o Example:
(let ((name "Alok")) (format nil "Hello, ~a." name)) "Hello, Alok."
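The FAHRENHEIT-TO-CELSIUS function quoted earlier is never defined in the text. One plausible reconstruction using READ, IF, and FORMAT might look like the sketch below; the helper name F-TO-C and the exact prompt wording are our assumptions, not the original definition.

```lisp
;; Hypothetical reconstruction of the FAHRENHEIT-TO-CELSIUS example.
(defun f-to-c (f)
  "Convert degrees Fahrenheit to degrees Celsius."
  (/ (* (- f 32) 5.0) 9))

(defun fahrenheit-to-celsius ()
  (format t "~&Please give a value in degrees F: ")
  (let ((f (read)))
    (if (numberp f)
        (format t "~&~a degrees F is ~a degrees C.~%" f (f-to-c f))
        (format t "~&That was not a number.~%"))))
```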

LET AND LET*
LET and LET* create new variable bindings and execute a series of forms that use these bindings. LET performs the bindings in parallel and LET* does them sequentially.
The form (let ((var1 init-form-1) (var2 init-form-2) ...) ...) first evaluates the expressions init-form-1, init-form-2, and so on, in that order, saving the resulting values. Then all of the variables varj are bound to the corresponding values; each binding is lexical unless there is a special declaration to the contrary. The expressions of the body are then evaluated in order; the values of all but the last are discarded.
LET* is similar to LET, but the bindings of variables are performed sequentially rather than in parallel. The init-form of a var can refer to vars previously bound in the LET*.
The form (let* ((var1 init-form-1) (var2 init-form-2) ...) ...) first evaluates the expression init-form-1, then binds the variable var1 to that value; then it evaluates init-form-2 and binds var2, and so on. The expressions of the body are then evaluated in order; the values of all but the last are discarded.
For both LET and LET*, if there is no init-form associated with a var, the var is initialized to nil.
The special form LET has the property that the scope of the name binding does not include any initial value form. For LET*, a variable's scope also includes the remaining initial value forms for subsequent variable bindings.
Examples:
o (setq a 'big) BIG
o (defun dummy-function () a) DUMMY-FUNCTION
o (let ((a 'small) (b a)) (format nil "~S ~S ~S" a b (dummy-function))) "SMALL BIG BIG"

o (let* ((a 'small) (b a)) (format nil "~S ~S ~S" a b (dummy-function))) "SMALL SMALL BIG"
FURTHER LOGIC
And
o Takes any number of arguments.
o Evaluates them in order until one returns nil, at which point AND stops evaluating things and returns nil.
o If the last argument returns non-nil, then AND returns that value.
o Examples:
(and 1 2 3 4 5) 5
(and 1 2 nil 4 5) NIL
Or
o Takes any number of arguments.
o Evaluates them in order until one returns a non-nil value, at which point OR stops evaluating things and returns that value.
o If the last argument returns nil, then OR returns nil.
o Examples:
(or 1 2 3 4 5) 1
(or nil nil nil 4 nil 5 6 7) 4
Setf
o As in the case of vectors and arrays, we can change the value of a variable using SETF.
o (defun look-at-setf (thing) (format t "~&Value supplied was ~a" thing) (setf thing 99) (format t "~&Value has been changed to ~a" thing) thing) LOOK-AT-SETF
o (look-at-setf 55)
Value supplied was 55
Value has been changed to 99
99
TOP LOOP
We interact with the Lisp system through a built-in piece of code called the toploop, which repeats three simple steps for as long as we run the Lisp system:
o Read an expression (we provide the expression).
o Evaluate the expression just read.
o Print the result(s) of the evaluation.
This is also called the "read-eval-print" loop. The toploop also provides a minimal user interface -- a prompt to indicate that it's ready to read a new expression -- and a way to gracefully catch any errors we might make. If we were to write the Lisp code for a toploop, it would look something like this:
o (loop (terpri) (princ 'ready>) (print (eval (read))))

(terpri) prints a blank line.
(loop ...) executes its forms in order, then repeats.
(eval ...) returns the result of evaluating a form.
The system's prompt has been replaced with READY>.
Every valid Lisp form we type will be read, evaluated, and printed by our toploop. Example:
o READY> (cons 1 (cons 2 (cons 3 nil)))
(1 2 3)
o READY>
We can get out of this prompt using abort. In Lisp, the debugger is accessed via a break loop. This behaves just like a toploop, but accepts additional commands to inspect or alter the state of the broken computation.

FUNCTION THAT TAKES ONE OR MORE OPTIONAL ARGUMENTS
If we want to make a function that takes one or more optional arguments, we use the &OPTIONAL keyword followed by a list of parameter names, like this:
o (defun silly-list (p1 p2 &optional p3 p4) (list p1 p2 p3 p4)) SILLY-LIST
o (silly-list 'f 'b) (F B NIL NIL)
o (silly-list 'f 'b 'ba) (F B BA NIL)
o (silly-list 'f 'b 'ba 'hi) (F B BA HI)
The optional parameters default to NIL when the call does not supply a value.
RECURSIVE FUNCTIONS
A function that calls itself is recursive. The recursive call may be direct (the function calls itself) or indirect (the function calls another function which -- perhaps after calling still more functions -- calls the original function).
Suppose we want to find the factorial of a number:
o (defun factorial (n) (if (eql n 0) 1 (* n (factorial (- n 1)))))
This can be done alternatively as follows:
o (defun factorial (n) (cond ((= n 0) 1) (t (* n (factorial (- n 1))))))
This function has two cases, corresponding to the two branches of the COND. The first case says that the factorial of zero is just one -- no recursive call is needed. The second case says that the factorial of some number is the number multiplied by the factorial of one less than the number -- this is a recursive call which reduces the amount of work remaining because it brings the number closer to the terminating condition of the first COND clause.
The following function calculates the length of a list (we call it MY-LENGTH so as not to redefine the built-in LENGTH):
o (defun my-length (list) (cond ((null list) 0) (t (1+ (my-length (rest list)))))) MY-LENGTH
o (my-length '(a b c d))

4
NULL is true for an empty list, so the first COND clause returns zero for the empty list. The second COND clause gets evaluated (if the first clause is skipped) because its condition is T; it adds one to the result of the recursive call on a list which is one element shorter (a list consists of its FIRST element and the REST of the list).
Note the similarities between FACTORIAL and MY-LENGTH. The base case is always the first in the COND because it must be tested before the recursive case -- otherwise, the recursive function calls would never end.
Suppose we are trying to write a function which takes two arguments -- a list and a number -- and if the number is = to any element of the list then return the position of that element in the list. (So if the number matches the first element in the list return 0, if it matches the second return 1, and so on.) If the number isn't found in the list, we'll return nil. Here's one solution to the problem:
o (defun position-one (list number)
    (if list
        (if (= (first list) number)
            0
            (let ((pos (position-one (rest list) number)))
              (if pos (1+ pos) nil)))))
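The recursive pattern above can also be restated with an extra accumulator argument, which keeps the pending multiplication out of the call stack; many Lisp compilers can turn such tail calls into a loop. A sketch (the helper name FACTORIAL-ACC is ours):

```lisp
;; Accumulator-passing version of FACTORIAL.
(defun factorial-acc (n acc)
  (if (zerop n)
      acc                                ; base case: return the accumulated product
      (factorial-acc (- n 1) (* n acc)))) ; tail call: nothing left to do afterwards

(defun factorial (n)
  (factorial-acc n 1))

(factorial 5)  ; => 120
```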

LOOPS
A simple loop looks like the following:
o (loop (print "How are we?") (return 1) (print "I am fine."))
"How are we?"
1
RETURN is normally used in a conditional form, like this:
o (let ((n 0)) (loop (when (> n 10) (return)) (print n) (prin1 (* n n)) (incf n)))
0 0
1 1
2 4
3 9
4 16
5 25
6 36
7 49
8 64
9 81
10 100
NIL
INCF and DECF are used for incrementing and decrementing a value respectively.
DOTIMES FOR A COUNTED LOOP
To simply loop for some fixed number of iterations, the DOTIMES form is the best choice. The previous example simplifies to:
o (dotimes (n 11) (print n) (prin1 (* n n)))
0 0
1 1
2 4
3 9
4 16

5 25
6 36
7 49
8 64
9 81
10 100
NIL
DOTIMES always returns NIL (or the result of evaluating its optional third argument).

DOLIST TO PROCESS ELEMENTS OF A LIST
Another common use for iteration is to process each element of a list. DOLIST supports this:
o (dolist (item '(1 2 4 5 9 17 25)) (format t "~&~D is~:[n't~;~] a perfect square." item (= item (expt (isqrt item) 2))))
1 is a perfect square.
2 isn't a perfect square.
4 is a perfect square.
5 isn't a perfect square.
9 is a perfect square.
17 isn't a perfect square.
25 is a perfect square.
NIL
FORMATTING
Lisp's FORMAT implements a programming language in its own right, designed expressly for the purposes of formatting textual output. FORMAT can print data of many types, using various decorations and embellishments. It can print numbers as words or -- for movie buffs -- as Roman numerals. We can even make portions of the output appear differently depending upon the formatted variables.
FORMAT expects a destination argument, a format control string, and a list of zero or more arguments to be used by the control string to produce formatted output. Output goes to a location determined by the destination argument. If the destination is T, output goes to *STANDARD-OUTPUT*. The destination can also be a specific output stream.
There are two ways FORMAT can send output to a string. One is to specify NIL for the destination: FORMAT will return a string containing the formatted output. The other way is to specify a string for the destination; the string must have a fill pointer.
(defparameter *s* (make-array 0 :element-type 'character :adjustable t :fill-pointer 0))
o (format *s* "Hello~%")

o NIL
*s*
o Hello
(format *s* "Goodbye")
o NIL
*s*
o Hello
o Goodbye
(setf (fill-pointer *s*) 0)
o 0
*s*
o ""
(format *s* "A new beginning")
o NIL
*s*
o A new beginning
The call to MAKE-ARRAY with options as shown above creates an empty string that can expand to accommodate new output. Formatting additional output to this string appends the new output to whatever is already there. To empty the string, we can either reset its fill pointer (as shown) or create a new empty string. FORMAT returns NIL except when the destination is NIL.
The format control string contains literal text and formatting directives. Directives are always introduced with a ~ character.
Directive  Interpretation
~%         New Line
~&         Fresh Line
~|         Page Break
~T         Tab Stop
~<         Justification
~>         Terminate ~<
~C         Character
~(         Case Conversion
~)         Terminate ~(
~D         Decimal Integer
~B         Binary Integer
~O         Octal Integer
~X         Hexadecimal Integer
~bR        Base-b Integer
~R         Spell an Integer
~P         Plural
~F         Floating Point
~E         Scientific Notation
~$         Monetary
~A         Legibly, Without Escapes
~S         Readably, With Escapes
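A few of the directives from the table can be seen in action with a NIL destination, so each call simply returns the formatted string:

```lisp
;; Sample uses of common FORMAT directives.
(format nil "~r" 42)       ; => "forty-two"      (~R spells an integer)
(format nil "~b" 10)       ; => "1010"           (~B binary)
(format nil "~$" 3.5)      ; => "3.50"           (~$ monetary)
(format nil "~d dog~:p" 3) ; => "3 dogs"         (~:P pluralizes on the previous argument)
(format nil "~d dog~:p" 1) ; => "1 dog"
```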

DYNAMIC AND GLOBAL VARIABLES Common Lisp provides two ways to create global variables: DEFVAR and DEFPARAMETER. Both forms take a variable name, an initial value, and an optional documentation string.

After it has been DEFVARed or DEFPARAMETERed, the name can be used anywhere to refer to the current binding of the global variable. Global variables are conventionally named with names that start and end with *. Examples of DEFVAR and DEFPARAMETER look like this:
o (defvar *count* 0)
o (defparameter *sum* 0.001)
The difference between the two forms is that DEFPARAMETER always assigns the initial value to the named variable, while DEFVAR does so only if the variable is undefined. Practically speaking, we should use DEFVAR to define variables that will contain data we want to keep even if we make a change to the source code that uses the variable.
After defining a variable with DEFVAR or DEFPARAMETER, we can refer to it from anywhere. For instance,
o (defun countplus () (incf *count*))
It turns out that that's exactly what Common Lisp's other kind of variable -- dynamic variables -- lets us do.
When we bind a dynamic variable -- for example, with a LET variable or a function parameter -- the binding that's created on entry to the binding form replaces the global binding for the duration of the binding form. And it turns out that all global variables are, in fact, dynamic variables.
A simple example shows how this works:
o (defvar *x* 10)
o (defun fun () (format t "X: ~d~%" *x*))
The DEFVAR creates a global binding for the variable *x* with the value 10. The reference to *x* in fun will look up the current binding dynamically. If we call fun from the top level, the global binding created by the DEFVAR is the only binding available, so it prints 10.
o (fun)
X: 10
But we can use LET to create a new binding that temporarily shadows the global binding, and fun will print a different value.
o (let ((*x* 20)) (fun))
X: 20
Now call fun again, with no LET, and it again sees the global binding.
o (fun)
X: 10
Now define another function,
o (defun bar () (fun) (let ((*x* 20)) (fun)) (fun))
Note that the middle call to fun is wrapped in a LET that binds *x* to the new value 20.
When we run bar, we get this result:
o (bar)
X: 10
X: 20
X: 10
As we can see, the first call to fun sees the global binding, with its value of 10.


The middle call, however, sees the new binding, with the value 20. But after the LET, fun once again sees the global binding.
As with lexical bindings, assigning a new value affects only the current binding. To see this, we can redefine fun to include an assignment to *x*.
o (defun fun () (format t "Before assignment~18tX: ~d~%" *x*) (setf *x* (+ 1 *x*)) (format t "After assignment~18tX: ~d~%" *x*))
Now fun prints the value of *x*, increments it, and prints it again. If we just run fun, we will see this:
o (fun)
Before assignment X: 10
After assignment X: 11
Not too surprising. Now run bar.
o (bar)
Before assignment X: 11
After assignment X: 12
Before assignment X: 20
After assignment X: 21
Before assignment X: 12
After assignment X: 13
Notice that *x* started at 11 -- the earlier call to fun really did change the global value. The first call to fun from bar increments the global binding to 12. The middle call doesn't see the global binding because of the LET. Then the last call can see the global binding again and increments it from 12 to 13.
The name of every variable defined with DEFVAR and DEFPARAMETER is automatically declared globally special. This means whenever we use such a name in a binding form -- in a LET or as a function parameter or any other construct that creates a new variable binding -- the binding that's created will be a dynamic binding. This is why the * naming convention is so important: it'd be bad news if we used a name for what we thought was a lexical variable and that variable happened to be globally special. If we always name global variables according to the * naming convention, we'll never accidentally use a dynamic binding where we intend to establish a lexical binding.
DO
While DOLIST and DOTIMES are convenient and easy to use, they aren't flexible enough to use for all loops. The DO template is:
(do (variable-definition*) (end-test-form result-form*) statement*)
Example:
o (do ((i 0 (1+ i))) ((>= i 4)) (print i))
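DO can step several variables in parallel and return a value through its result form; the example below (our own sketch, not from the text) accumulates a running sum while counting:

```lisp
;; Step I and SUM together; when I reaches 5, return SUM.
;; Both step forms see the *old* values, so SUM collects 0+1+2+3+4.
(do ((i 0 (1+ i))
     (sum 0 (+ sum i)))
    ((= i 5) sum))  ; => 10
```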

DEFMACRO
DEFMACRO stands for DEFine MACRO. The basic skeleton of a DEFMACRO is quite similar to the skeleton of a DEFUN:
(defmacro name (parameter*)
  "Optional documentation string."
  body-form*)
Like a function, a macro consists of a name, a parameter list, an optional documentation string, and a body of Lisp expressions.

However, the job of a macro isn't to do anything directly -- its job is to generate code that will later do what we want.
(defmacro mac1 (a b) `(+ ,a (* ,b 3)))
o MAC1
(mac1 4 5)
o 19
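We can see the code a macro generates, without evaluating it, by using the standard MACROEXPAND-1 function. A sketch repeating the MAC1 definition so the block is self-contained:

```lisp
;; MACROEXPAND-1 shows the form a macro call produces before evaluation.
(defmacro mac1 (a b)
  `(+ ,a (* ,b 3)))

(macroexpand-1 '(mac1 4 5)) ; => (+ 4 (* 5 3)), plus a second value T
(mac1 4 5)                  ; => 19
```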

APPEND
The function APPEND takes any number of list arguments and returns a new list containing the elements of all its arguments. For instance:
o (append (list 1 2) (list 3 4)) (1 2 3 4)
REDUCING A SEQUENCE
The function REDUCE takes (in the simplest case) a function and a sequence. It uses this function first to combine the first two elements of the sequence, then to combine the result with the third element, then to combine this latest result with the fourth element, and so on until the whole sequence has been processed.
Example:
o (reduce #'+ '(1 2 3 4 5 6 7)) 28
CAR AND CDR OF A LIST
CAR returns the first element of a list. CDR returns the rest of the list.
Example:
o (car '(1 2 3 4)) 1
o (cdr '(1 2 3 4)) (2 3 4)
RPLACA, RPLACD, SETF CIRCULARITY
A list is constructed of CONS cells. Each CONS has two parts, a CAR and a CDR. The CAR holds the data for one element of the list, and the CDR holds the CONS that makes up the head of the rest of the list.
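REDUCE also accepts keyword arguments beyond the simplest case shown above; a brief sketch using the standard :INITIAL-VALUE and :FROM-END options:

```lisp
;; A different combining function.
(reduce #'max '(3 9 2 7))                ; => 9

;; :INITIAL-VALUE seeds the accumulation before the first element.
(reduce #'+ '(1 2 3) :initial-value 10)  ; => 16

;; :FROM-END combines right-to-left; with CONS this rebuilds the list.
(reduce #'cons '(1 2 3) :from-end t :initial-value nil) ; => (1 2 3)
```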

By using RPLACA and RPLACD to change the two fields of CONS, we can alter the normal structure of a list. For example, we could splice out the second element of a list like this: o (defparameter *my-list* (list 1 2 3 4)) *MY-LIST* o (rplacd *my-list* (cdr (cdr *my-list*)))

(1 3 4)
o *my-list* (1 3 4)
CONTRAST EXAMPLE: PUSH AND DELETE
Here's an example showing DELETE and PUSH:
o (defparameter *my-list* (list 1 2 3 4)) *MY-LIST*
o (delete 3 *my-list*) (1 2 4)
o *my-list* (1 2 4)
o (defparameter *my-list* (list 1 2 3 4)) *MY-LIST*
o (delete 1 *my-list*) (2 3 4)
o *my-list* (1 2 3 4)
DELETE is destructive, but we cannot rely on it to update the variable: deleting the first element returns the shortened list, yet *MY-LIST* still names the original head cons, so it still prints (1 2 3 4).
But some macros, for example PUSH and POP, take a place as an argument and arrange to update the place with the correct value. For example:
o (defparameter *stack* ()) *STACK*
o (push 3 *stack*) (3)
o (push 2 *stack*) (2 3)
o (push 1 *stack*) (1 2 3)
o *stack* (1 2 3)
o (pop *stack*) 1
o *stack* (2 3)
COMPARISONS
Not all comparisons are equal. Lisp has a core set of comparison functions that work on virtually any kind of object. These are:
o EQ
o EQL
o EQUAL
o EQUALP
The tests with the shorter names support stricter definitions of equality. The tests with the longer names implement less restrictive, perhaps more intuitive, definitions of equality.
EQ

o EQ is true for identical symbols.
o In fact, it's true for any identical object. In other words, an object is EQ to itself. Even a composite object, such as a list, is EQ to itself. (But two lists are not EQ just because they look the same when printed; they must truly be the same list to be EQ.) Under the covers, EQ just compares the memory addresses of objects.
o EQ is not guaranteed to be true for identical characters or numbers.
o This is because most Lisp systems don't assign a unique memory address to a particular number or character; numbers and characters are generally created as needed and stored temporarily in the hardware registers of the processor.
EQL
o EQL is also true for identical numbers and characters.
o EQL retains EQ's notion of equality, and extends it to identical numbers and characters.
o Numbers must agree in value and type; thus 0.0 is not EQL to 0. Characters must be truly identical; EQL is case sensitive.
EQUAL
o EQUAL is usually true for things that print the same.
o EQ and EQL are not generally true for lists that print the same.
o Lists that are not EQ but have the same structure will be indistinguishable when printed; they will also be EQUAL.
o Strings are also considered EQUAL if they print the same. Like EQL, the comparison of characters within strings is case sensitive.
EQUALP
o EQUALP ignores number type and character case.
o EQUALP is the most permissive of the core comparison functions. Everything that is EQUAL is also EQUALP. But EQUALP ignores case distinctions between characters, and applies the (typeless) mathematical concept of equality to numbers; thus 0.0 is EQUALP to 0.
o Furthermore, EQUALP is true if corresponding elements are EQUALP in the following composite data types:
  Arrays
  Structures
  Hash Tables
Longer tests are slower; know what we are comparing. The generality of the above longer-named tests comes with a price. They must test the types of their arguments to decide what kind of equality is applicable; this takes time. EQ is blind to the type of an object; either the objects are the same object, or they're not. This kind of test typically compiles into one or two machine instructions and is very fast. We can avoid unnecessary runtime overhead by using the most restrictive (shortest-named) test that meets our needs.

CHAP 3: NEURAL NETWORKS & FUZZY SYSTEMS
WHAT DO WE SEE IN THE FOLLOWING PICTURE?

We recognize an illusory bright cross. But technically there is no cross, only four squares within a square. Such a cross exists only in our brain, not on the screen. The real-time interaction of millions of neurons in our brain is behind this cross-like perception. The asynchronous, nonlinear, massively parallel, distributed neurons perform such recognition under uncertainty. We reason with vague concepts, beliefs, estimates, guesses etc. This inexactness is called fuzziness. Though we use exact scientific tools in day-to-day decision making, the final control remains fuzzy, e.g. medical diagnosis. This we may casually refer to as experience, judgment, sixth sense etc.

LOGIC
Bivalent. On or Off. True or False. Present or Absent. A or Not A.
BIVALENT LOGIC CREATES PARADOXES
A man says "Don't trust me." Can we trust him? One side of a card says "The sentence on the other side is true" and the other side of the card says "The sentence on the other side is not true." Which side is true? A speaker says "I lie." Does he tell the truth? A liar says all his friends are liars. Does he lie? A barber shaves everybody who cannot shave themselves. Can he shave himself?
BIVALENT PARADOXES AS FUZZY MID POINTS
The paradoxes all have the same property: a statement S and its negation have the same truth value, i.e. t(S) = t(not S), i.e. t(S) = 1 - t(S). If S is true, t(S) = 1, then 1 = 0. If S is false, t(S) = 0, then 0 = 1. The fuzzy interpretation takes 2 t(S) = 1, giving t(S) = 1/2. Thus paradoxes reduce to literal half truths, or, mathematically, the midpoint of the interval [0, 1]. Thus, fuzziness means multi-valuedness or multivalence. Three-valued fuzziness corresponds to true, false and indeterminate.

Or present, absent and ambiguous. The heap-of-sand problem can also be tackled using the fuzzy approach. Thus fuzziness relaxes the black-and-white rigidity of bivalent logic into a multi-valued gray area between black and white. TRUE and FALSE become two limiting cases of a band of indeterminacy.

FUZZY SYSTEMS: HISTORY
Around 300 BC, Aristotle came up with binary logic involving the numbers 0 and 1. It came down to one law: True or False. Later Plato questioned its rationale by proposing a third region which is beyond True or False. Buddha supported this argument by stating that the world, as it is, is filled with contradictions, with things and not-things. Philosophers like Hegel, Marx and Engels supported the school of thought of many-valued logic. In the early 1900s, Lukasiewicz first proposed three-valued logic along with the mathematics to accompany it. The third value he proposed was "possible" and he assigned it a numerical value between True and False. In 1965 Dr. Lotfi A. Zadeh published his work "Fuzzy Sets", which described the mathematics of fuzzy set theory. This theory proposed that the membership function operate over the range of real numbers [0, 1], in place of the {0, 1} followed by Boolean set theory. The indicator function of a non-fuzzy set A is given by
  IA(x) = 1 if x is in A
        = 0 if x is not in A.
Zadeh extended this function to a multi-valued membership function mA: X -> [0, 1]. This membership function measures the degree to which element x belongs to the set A:
  mA(x) = Degree(x is in A)
mA(x) = 0 denotes that x is not a member of the set; mA(x) = 1 denotes that x is definitely a member; and all other values denote degrees of membership.
FUZZY SYSTEMS: EXAMPLE
Let us consider the example of TALL to illustrate a fuzzy set. We can assign a degree of membership to each person in the fuzzy set TALL as follows (height in feet):
  Tall(x) = 0, if height(x) < 5
          = (height(x) - 5)/2, if 5 <= height(x) <= 7
          = 1, if height(x) > 7.
The heights and degrees of membership of each person can be shown as follows:
  Person   Height   Degree of Membership
  A        5'2"     0.10
  B        5'3"     0.15
  C        5'5"     0.25
  D        5'7"     0.35
  E        6'1"     0.54
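The TALL membership function above can be sketched directly in code (a minimal illustration; the function name and the choice of feet as the unit follow the example):

```python
def tall(height_ft):
    """Degree to which a height (in feet) belongs to the fuzzy set TALL."""
    if height_ft < 5:
        return 0.0                   # definitely not tall
    if height_ft > 7:
        return 1.0                   # definitely tall
    return (height_ft - 5) / 2       # linear ramp between 5 and 7 feet

# For person E (6'1" = 6 + 1/12 feet) this gives about 0.54, matching the table.
print(tall(6 + 1/12))
```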
SOME APPLICATIONS / PRODUCTS
o Railway subway in Sendai, Japan, where train movements are controlled by fuzzy controlled systems.
o Omron camera aiming for the telecast of sporting events.
o Hitachi washing machine with single button control.
o Sony pocket computers with handwritten-character recognition.
FUZZY SYSTEMS & NEURAL NETWORKS
Both process inexact information inexactly, and they share the same mathematical foundation. Each neuron emits a bounded signal, just like a bounded set value. A set of n neurons defines a family of n-dimensional continuous or fuzzy sets. At each instant the n-vector of neural outputs defines a fuzzy unit. Each fuzzy unit indicates the degree to which the neuron belongs to the n-dimensional fuzzy set. The neuronal state space (the set of all n-possibilities) equals the set of all n-dimensional fit vectors (the fuzzy power set), given by In = [0, 1] x [0, 1] x ... x [0, 1]. This n-dimensional unit cube has 2^n vertices.
NON FUZZY SET TO FUZZY SET: 1 DIMENSION
Consider a non-fuzzy set X = {x1}, containing only one element. The power set of X = {∅, {x1}}, where ∅ = 0 and {x1} = 1, the binary bits. The corresponding fuzzy set contains all the values from 0 to 1: the two vertices 0 and 1 become the endpoints of the unit interval [0, 1].

NON FUZZY SET TO FUZZY SET: 2 DIMENSIONS
Consider a non-fuzzy set X = {x1, x2}, containing 2 elements. The power set of X = {∅, {x1}, {x2}, {x1, x2}}, where ∅ = (0, 0), {x1} = (1, 0), {x2} = (0, 1) and {x1, x2} = (1, 1). These points correspond to the vertices of a unit square; the corresponding fuzzy set fills in the whole square. (Diagram: unit square with corners (0, 0), (1, 0), (0, 1) and (1, 1).)

NON FUZZY SET TO FUZZY SET: 3 DIMENSIONS
Consider a non-fuzzy set X = {x1, x2, x3}, containing 3 elements. The power set of X = {∅, {x1}, {x2}, {x3}, {x1, x2}, {x1, x3}, {x2, x3}, {x1, x2, x3}}, where ∅ = (0, 0, 0), {x1} = (1, 0, 0), {x2} = (0, 1, 0), {x3} = (0, 0, 1), {x1, x2} = (1, 1, 0), {x1, x3} = (1, 0, 1), {x2, x3} = (0, 1, 1) and {x1, x2, x3} = (1, 1, 1). These points correspond to the vertices of a unit cube.

NON FUZZY SET TO FUZZY SET: N DIMENSIONS
In general, an n-vector non-fuzzy set value corresponds to an n-dimensional fit vector in the n-cube In = [0, 1] x [0, 1] x ... x [0, 1]. The midpoint (1/2, 1/2, ..., 1/2) of this n-cube corresponds to the paradoxes of logic, where truth and falsity have the same value.
FIT VECTOR: EXAMPLE
1 dimension: 1/3.
2 dimensions: (1/3, 2/5), where mA(x1) = 1/3 and mA(x2) = 2/5.
3 dimensions: (1/3, 2/5, 3/4), where mA(x1) = 1/3, mA(x2) = 2/5 and mA(x3) = 3/4.
SUBSETHOOD THEOREM
Subsethood measures the degree to which A is a subset of B and is denoted by S(A, B):
  S(A, B) = Degree(A is a subset of B) = M(A ∩ B) / M(A) = P(B|A)
where M(A) denotes the fuzzy count of the fit vector, i.e. if A = (a1, a2, ..., an) then M(A) = a1 + a2 + ... + an, and 0 <= S(A, B) <= 1.
Question: Apply the subsethood theorem in R3 with A = (3/4, 1/3, 1/6) and B = (1/4, 1/2, 1/3).
Answer: X = {x1, x2, x3} contains 3 elements. The power set of X = {(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0), (1, 0, 1), (0, 1, 1), (1, 1, 1)}. Consider the fuzzy subset B = (1/4, 1/2, 1/3). If A and B are fuzzy sets, then the membership of the intersection is min(mA(x), mB(x)) and that of the union is max(mA(x), mB(x)). Thus
  A ∩ B = min(A, B) = (1/4, 1/3, 1/6)
  M(A ∩ B) = 1/4 + 1/3 + 1/6 = 3/4
  M(A) = 3/4 + 1/3 + 1/6 = 5/4
  S(A, B) = M(A ∩ B) / M(A) = (3/4) / (5/4) = 3/5 = 60%.
PROBABILITY AS SUBSETHOOD
Consider a statistical experiment X of n trials. Suppose A defines the subset of successful trials. Let there be nA successes out of n trials, with 1 denoting success and 0 denoting failure. Then
  S(X, A) = M(X ∩ A) / M(X) = M(A) / M(X) = nA / n = P(A).
Thus probability reduces to subsethood.
DYNAMIC SYSTEMS APPROACH
Dynamic systems are common.
Electrical engineering: signal processing, filtering

Computer science: algorithms, robotics ... Maths: functions, Statistical estimation Philosophy: thinking, action Biology: neuroscience, evolution Economics: market equilibrium, game theory Anthropology: culture All receive stimuli and adapt to give responses. Thus, Brain = A dynamic System.
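The subsethood example above can be checked directly in code (a transcription of the formulas for M(A) and S(A, B); the function names are illustrative):

```python
def fuzzy_count(a):
    """M(A): the sigma-count of a fit vector."""
    return sum(a)

def subsethood(a, b):
    """S(A, B) = M(A intersect B) / M(A), with min as fuzzy intersection."""
    intersection = [min(x, y) for x, y in zip(a, b)]
    return fuzzy_count(intersection) / fuzzy_count(a)

A = (3/4, 1/3, 1/6)
B = (1/4, 1/2, 1/3)
print(subsethood(A, B))   # 0.6 (up to rounding), i.e. 60%
```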

BRAIN

BIOLOGICAL NEURAL SYSTEMS

The brain is composed of approximately 100 billion (10^11) neurons. A typical neuron collects signals from other neurons through a host of fine structures called dendrites. The neuron sends out spikes of electrical activity through a long, thin strand known as an axon, which splits into thousands of branches. At the end of each branch, a structure called a synapse converts the activity from the axon into electrical effects that inhibit or excite activity in the connected neurons. When a neuron receives excitatory input that is sufficiently large compared with its inhibitory input, it sends a spike of electrical activity down its axon. Learning occurs by changing the effectiveness of the synapses so that the influence of one neuron on another changes.

BRAIN = A DYNAMIC SYSTEM



We can duplicate the working of the brain in machines. These machines are expected to work smarter. This smartness is referred to as machine intelligence. Artificial Neural Networks and Fuzzy Systems are two adaptive machine-intelligent systems.

NEURAL AND FUZZY SYSTEMS AS FUNCTION ESTIMATORS
Neural Networks and Fuzzy Systems estimate I/O functions. Both are model-free in I/O analysis, so the same architecture can be used for different problems. Both are trainable dynamical systems that use sample data. This sample information is encoded in a parallel distributed framework. Neural Networks have the property of recognition without definition and learn from previous experience, e.g. a child. This property helps in generalizing and better learning, e.g. natural-language development. Distributed encoding in Neural Networks helps in recognizing partial patterns, fault tolerance and graceful degradation. Neural Networks contain a collection of processing units called neurons. Neurons work as I/O functions and synapses (junctions) work as adjustable weights. Thus Neural Networks behave as adaptive function estimators.
NEURAL NETWORKS AS TRAINABLE DYNAMICAL SYSTEMS
Network activity in a Neural Network follows a trajectory in the state space of all possibilities. Each point in the state space is a possible Neural Network configuration. A trajectory begins at an input state and ends at a solution state, e.g. pattern recognition. Here synaptic values gradually change to learn new patterns.
ARTIFICIAL NEURAL NETWORKS
Artificial neurons are analogous to their biological inspirers.

Here the neuron is actually a processing unit; it calculates the weighted sum of the input signals to the neuron to generate the activation signal a, given by

  a = w1 x1 + w2 x2 + ... + wn xn

where wi is the strength of the synapse connected to the neuron and xi is an input feature to the neuron.

The activation signal is passed through a transform function f to produce the output of the neuron, given by

  y = f(a)

The transform function can be linear or non-linear, such as a threshold or sigmoid function.
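The two equations above can be combined into a small sketch (the names are illustrative; the sigmoid is one common choice of transform function):

```python
import math

def neuron_output(weights, inputs, transform=None):
    """Weighted sum a = sum(wi * xi), passed through a transform function."""
    a = sum(w * x for w, x in zip(weights, inputs))        # activation signal
    if transform is None:
        transform = lambda a: 1.0 / (1.0 + math.exp(-a))   # sigmoid
    return transform(a)

# A zero activation gives a sigmoid output of exactly 0.5:
print(neuron_output([1.0, -1.0], [0.5, 0.5]))   # 0.5
```

Passing a different `transform`, e.g. a step function, turns the same unit into a threshold neuron.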

CLASSIFICATION OF NEURAL NETWORK MODELS
We can classify Neural Networks based on whether they learn with supervision and whether they contain feedback.
o Supervised & Feed-Forward: Perceptron, LMS, Back Propagation
o Unsupervised & Feed-Forward: Self-Organizing Map, Data Clustering
o Supervised & Feed-Back: Recurrent Back Propagation
o Unsupervised & Feed-Back: Boltzmann Learning, Hopfield Network

INTELLIGENT BEHAVIOR AS ADAPTIVE MODEL-FREE ESTIMATION
Intelligent systems adaptively estimate functions from data, without a model for I/O processing. Living creatures respond to stimuli; that is, they map stimuli to responses (like f: X -> Y). Intelligent systems associate similar responses with similar stimuli: they produce minor changes in output if the inputs are changed slightly.
INTELLIGENT SYSTEMS GENERALIZE
Let S and R be the spaces of stimuli and responses. Consider balls Bx in S and By in R with f(Bx) = By.

For every similar response y in By, we can find some similar stimulus x in Bx such that y = f(x). That is, f maps Bx onto By.

INTELLIGENT SYSTEMS ARE CREATIVE
The measure of creativity of f is given by CBx(f) = V(By) / V(Bx), where V denotes volume.
Case 1: CBx(f) = 0.
o => V(By) = 0 or V(Bx) = infinity.
o => f is a constant function or Bx is of infinite radius.
o => f is dull, or stimuli overwhelm responses.
Case 2: CBx(f) = infinity.
o => V(By) = infinity, i.e. infinite radius for By.
o => f emits infinitely varied responses.
Small variations in input then provide novel responses. This manifests as creativity.

LEARNING AS CHANGE
Intelligent systems learn or adapt. Learning or adaptation means just parameter change. The parameter may be:
o Average transmission rate at synaptic junctions.
o Gene frequency at the locus of a chromosome.
Practically, learning means change, so learning laws describe a dynamic system. We can make any system encode or decode information, e.g. mowing a lawn of grass: lawn = brain. Supervised learning uses class-membership information: it can know what belongs and what does not belong. E.g. a speech recognition system at an airport, supervised using a carrot-and-stick policy. Unsupervised systems adaptively cluster like patterns with like patterns. Biological synapses learn without supervision.
SYMBOLS VS NUMBERS: RULES VS PRINCIPLES
We cannot mathematically define all our behavior, e.g. emotions. Natural languages follow this approach. Some languages are developed as articulated languages, e.g. Esperanto, Interlingua. Computer programming languages follow articulation. For example, Lisp and Prolog process symbols within a framework of bivalent logic and propositional rules.
EXPERT SYSTEM KNOWLEDGE AS RULE TREES
AI systems store and process propositional rules. The rules have the form:
o IF (CONDITION) THEN (ACTION)
E.g. for the water jug problem, IF (4, 0) THEN (4, 3). The collection of all such rules for a given problem is called the knowledge base. The knowledge base can be put in a tree format, as shown:
(Diagram: search tree rooted at the state (0, 0), branching through states such as (4, 0), (0, 3), (4, 3), (1, 3) and (3, 0).)

By searching through the knowledge tree we can perform inference (the process of finding a solution). There are two forms of inference:
o Forward chaining
o Backward chaining

FORWARD CHAINING
Forward chaining starts with some initial information and works forward, attempting to match that information with a rule. Once a fact has been matched to the IF part of a rule, the rule is fired. The action could produce new knowledge or a new fact that is stored in the knowledge base. This new fact may then be used to search out the next appropriate rule. This searching and matching process continues until a final conclusion rule is fired.
BACKWARD CHAINING
Backward chaining starts with a fact in the database, but this time it is the hypothesis. The rule interpreter then begins examining the THEN parts of rules for a match. The inference engine searches for evidence to support the hypothesis originally stated. If a match is found, the database is updated, recording the conditions or premises that the rule stated as necessary for supporting the matched conclusion. The chaining process continues with the system repeatedly attempting to match the right-hand side of a rule against the current status of the system. The corresponding IF sides of the matched rules are used to generate new intermediate hypotheses or goal states, which are recorded in the database. This backward chaining continues until the hypothesis is proved.
REMARK
The choice of inference strategy, with either forward or backward chaining, is determined by the design of the system and the nature of the problem being solved. In large systems with many rules, the forward chaining or data-driven approach may be too slow, as it will generate many sequences of rules. The search, as a result, can go off in undesired directions, exploring alternatives that do not fit the problem. In such cases a backward chaining or goal-driven approach may be advantageous. On the other hand, a backward chaining process could get a fixation on a particular hypothesis and continue to explore it even though the data available to support it may not be there. The system does not know when to switch the emphasis or context to a more appropriate search sequence. Some expert systems incorporate both forward and backward chaining. This speeds up the process and ensures a solution; the concurrent forward-backward search rapidly converges on an answer.
BRAIN = COMPUTER
Consider the brain as a computer:
o Language <-> strings of thoughts.
o Programming <-> learning.
o Logical inference <-> evolution.

o Feed-forward search through knowledge <-> feedback from previous experience.
DIFFERENT MODEL-FREE ESTIMATORS
Model-free = the estimator assumes no mathematical model of how outputs depend on inputs.

                 FRAMEWORK
  KNOWLEDGE      Symbolic          Numeric
  Structured     Expert Systems    Fuzzy Systems
  Unstructured   ----              Neural Networks
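The forward-chaining loop described earlier (match IF parts, fire the rule, store the new fact, repeat) can be sketched in a few lines; the rule format and the rules themselves here are hypothetical, purely for illustration:

```python
def forward_chain(facts, rules):
    """Fire rules whose IF parts are satisfied until no new facts appear."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # A rule fires when all its premises are known facts and its
            # conclusion is a new fact for the knowledge base.
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

rules = [({"a", "b"}, "c"), ({"c"}, "d")]
print(forward_chain({"a", "b"}, rules))   # the new facts "c" and "d" are derived
```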
FUZZY SYSTEMS AS STRUCTURED NUMERIC ESTIMATORS
Fuzzy systems encode structured knowledge in a numeric framework. We can enter a fuzzy association like (TALL, HEIGHT) as an entry in a FAM (Fuzzy Associative Memory) rule matrix. A FAM rule is an I/O map.
FAM RULES: E.G. FUZZY CONTROL OF A PENDULUM
Let θ be the angle of the pendulum, Δθ the angular velocity of the pendulum, and v the current to the motor control that adjusts the pendulum. All variables are fuzzy; v is the output and the others are inputs. Each variable has 5 fuzzy set values:
o Negative Medium (NM)
o Negative Small (NS)
o Zero (ZE)
o Positive Small (PS)
o Positive Medium (PM)
FAM RULES OF PENDULUM
(Rows are θ, columns are Δθ, entries are v; blank cells carry no rule.)
           Δθ: NM   NS   ZE   PS   PM
  θ = NM                  PM
  θ = NS                  PS
  θ = ZE       PM   PS   ZE   NS   NM
  θ = PS                  NS
  θ = PM                  NM
Usually fuzzy set values are defined as trapezoids. E.g. the angle value 0 belongs to the fuzzy value ZE to degree 1, while the angle value 3 may belong to ZE only to degree, say, 0.6.
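A membership function consistent with the numbers in the example (degree 1 at angle 0, about 0.6 at angle 3) can be sketched as a triangle; the half-base width of 7.5 degrees is an assumption chosen only to match those two points:

```python
def ze_membership(angle, half_base=7.5):
    """Triangular membership for fuzzy value ZE, peaked at angle 0."""
    return max(0.0, 1.0 - abs(angle) / half_base)

print(ze_membership(0))   # 1.0
print(ze_membership(3))   # ~0.6
```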

GENERATING FUZZY RULES WITH PRODUCT SPACE CLUSTERING
Pendulum case: 2 inputs, 1 output. Product space = R3. Each I/O triplet (θ, Δθ, v) is a point in R3. Movement of the pendulum defines a trajectory, and v = ZE corresponds to (0, 0, 0). Each fuzzy variable has 5 fuzzy subsets of the x, y, z coordinates. The Cartesian product of these subsets gives 5 x 5 x 5 = 125 FAM cells. Most systems pass through only a few of these cells.

Each FAM cell corresponds to a FAM rule.

FUZZY SYSTEMS AS PARALLEL ASSOCIATORS
Fuzzy Systems store and process FAM rules in parallel: B = sum over j of (wj Bj). Adaptive Fuzzy Systems use sample data and neural / statistical algorithms to choose the coefficients. If the input fuzzy sets define points in the unit hypercube In and the output fuzzy sets define points in the unit hypercube Ip, then a transformation S: In -> Ip defines a Fuzzy System. S defines an adaptive Fuzzy System if it changes with time, i.e. dS/dt != 0.

FUZZY SYSTEMS AS PRINCIPLE-BASED SYSTEMS
AI expert systems work through rules. Inference is performed by traversing the decision tree. The tree can be shallow or deep, e.g. shallow chess, deep water jug. A shallow tree uses only a small proportion of the stored knowledge in an inference; in that sense it is non-interactive. Fuzzy systems are shallow but interactive: every inference fires every FAM rule to some degree. AI expert systems use a rule-based approach, but fuzzy systems use a principle-based approach, e.g. AI vs. fuzzy judge. Rules apply in an all-or-none fashion. Principles have a dimension of weight or importance. Principles evolve, but rules are static. Adaptive Fuzzy Systems use neural techniques to abstract fuzzy principles from samples. This is similar to our acquisition of knowledge.



CHAP 4: NEURAL NETWORK THEORY

NEURONS AS FUNCTIONS
A neuron transforms an activation x(t) into a bounded output signal S(x(t)). Usually a sigmoid function is used for this purpose.

EFFECT OF SIGNAL FUNCTION

where θ = wn+1 (the threshold treated as an extra weight). Usually signal functions are monotone non-decreasing, i.e. dS/dx >= 0. But dS/dt = (dS/dx)(dx/dt), i.e. signal velocity depends on activation velocity.
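The logistic sigmoid is the usual choice of signal function; a quick sketch (with the steepness c as a free parameter) also checks numerically that it is monotone non-decreasing:

```python
import math

def logistic(x, c=1.0):
    """Logistic signal function S(x) = 1 / (1 + e^(-cx)), bounded in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-c * x))

# Monotone non-decreasing: S(x1) <= S(x2) whenever x1 <= x2.
xs = [-3, -1, 0, 1, 3]
print([round(logistic(x), 3) for x in xs])   # [0.047, 0.269, 0.5, 0.731, 0.953]
```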

NEURON FIELDS
A field of neurons is a topological grouping, e.g. by closeness or proximity. In humans, volume proximity defines a field. We denote fields of neurons as Fx, Fy, Fz etc.


NEURONAL STATE SPACE
Consider a network with only two fields: the input field Fx of dimension n and the output field Fy of dimension p. Then the state of the system is given by:
o X(t) = (x1(t), x2(t), ..., xn(t))
o Y(t) = (y1(t), y2(t), ..., yp(t))
Thus the state space of X(t) = Rn, the state space of Y(t) = Rp, and the state space of the Neural Network = Rn x Rp.
SIGNAL STATE SPACE AS HYPERCUBES
The signal state S(X) of field Fx is given by:
o S(X(t)) = (S1(x1(t)), S2(x2(t)), ..., Sn(xn(t)))
Si denotes the signal function of the i-th neuron in the field Fx. The signal state space consists of all possible signal states. Since signal functions are bounded, it is an n-dimensional hypercube. If the range of the signal function is [0, 1], then the signal state space is In = [0, 1]^n. The unit hypercube In also defines the fuzzy power set F(2^X) of a fuzzy set X of n elements.
COMMON SIGNAL FUNCTIONS
Logistic signal function
o S(x) = 1 / (1 + e^(-cx))
o S' = c S (1 - S) > 0
o => S is monotonically increasing.
Hyperbolic tangent signal function
o S(x) = tanh(cx)
o S' = c (1 - S^2) > 0
o => S is monotonically increasing.
Threshold linear signal function
o A piecewise linear function:
o S(x) = 1, if cx >= 1
       = 0, if cx < 0
       = cx, otherwise
o S' = c > 0 on the linear piece
o => S is monotonically non-decreasing.
Linear signal function
o S(x) = cx
o S' = c > 0
o => S is monotonically increasing.
Threshold exponential signal function
o S(x) = min(1, e^(cx))
o S' = c e^(cx) > 0, when e^(cx) < 1
o => S is monotonically increasing.
Exponential distribution signal function
o S(x) = max(0, 1 - e^(-cx))
o S' = c e^(-cx) > 0, for x > 0
o => S is monotonically increasing.
o And S'' = -c^2 e^(-cx) < 0 => S is strictly concave.
Ratio polynomial signal function
o S(x) = max(0, x^n / (c + x^n)), c > 0
o S' = c n x^(n-1) / (c + x^n)^2 > 0, for x > 0
o => S is monotonically increasing.
CHAP 5: A GENTLE INTRODUCTION TO GENETIC ALGORITHMS
Genetic Algorithms are search and optimization techniques based on Darwin's Principle of Natural Selection.
DARWIN'S PRINCIPLE OF NATURAL SELECTION
IF there are organisms that reproduce, and
IF offspring inherit traits from their progenitors, and
IF there is variability of traits, and
IF the environment cannot support all members of a growing population,
THEN those members of the population with less-adaptive traits (determined by the environment) will die out, and
THEN those members with more-adaptive traits (determined by the environment) will thrive.
The result is the evolution of species.
EVOLUTION
The context of evolution is a population (of organisms, objects, agents ...) that survives for a limited time (usually) and then dies. Some produce offspring for succeeding generations; the fitter ones tend to produce more. Over many generations, the make-up of the population changes. Without the need for any individual to change, over successive generations the species changes and, in some sense, (usually) adapts to the conditions.
REQUIREMENTS
Heredity

o Offspring are (roughly) identical to their parents.
Variability
o Except not exactly the same; there is some significant variation.
Selection
o The fitter ones are likely to have more offspring.
Variability is usually random and undirected. Selection is usually non-random and directed. In natural evolution the direction of selection does not imply a conscious director. In artificial evolution we are often the director.

BASIC IDEA OF PRINCIPLE OF NATURAL SELECTION
"Select the best, discard the rest."
AN EXAMPLE OF NATURAL SELECTION
Giraffes have long necks.
o Giraffes with slightly longer necks could feed on leaves of higher branches when all the lower ones had been eaten off: they had a better chance of survival.
o The favorable characteristic propagated through generations of giraffes; now the evolved species has long necks.
o Longer necks may have been a deviant characteristic (mutation) initially, but since it was favorable it was propagated over generations. Now it is an established trait.
o So, some mutations are beneficial.
NATURE TO COMPUTER MAPPING
  Nature         Computer
  Population     Set of solutions
  Individual     Solution to a problem
  Fitness        Quality of a solution
  Chromosome     Encoding of a solution
  Gene           Part of the encoding of a solution
  Reproduction   Crossover, mutation

EVOLUTION THROUGH NATURAL SELECTION
Initial Population of Animals
  -> Struggle for Existence; Survival of the Fittest
  -> Surviving Individuals Reproduce, Propagate Favorable Characteristics
  -> (Millions of Years)
  -> Evolved Species (Favorable Characteristics Now a Trait of Species)
CLASSES OF SEARCH TECHNIQUES
Search Techniques
o Calculus-Based Techniques
  - Direct Methods (e.g. Fibonacci, Newton)
  - Indirect Methods
o Guided Random Techniques
  - Simulated Annealing
  - Evolutionary Algorithms
    - Evolutionary Strategies
    - Genetic Algorithms
      - Parallel (Centralized, Distributed)
      - Sequential (Steady-state, Generational)
o Enumerative Techniques
  - Dynamic Programming

SEARCH METHODS
Blind random search does not use acquired information in deciding the future direction of the search. Hill climbing and gradient descent use acquired information; however, they are prone to becoming trapped on local optima.


THE GENETIC ALGORITHM
Directed search algorithms based on the mechanics of biological evolution. Developed by John Holland, University of Michigan (1970s):
o To understand the adaptive processes of natural systems.
o To design artificial systems software that retains the robustness of natural systems.
Genetic Algorithms provide efficient, effective techniques for optimization and machine learning applications, and are widely used today in business, scientific and engineering circles.
GENETIC ALGORITHMS VS. CONVENTIONAL OPTIMIZATION TECHNIQUES
o Direct manipulation of a coding.
o Search from a population, not from a single point.
o Blind search: guided by fitness (payoff) values, not by derivatives or auxiliary knowledge.
o Search using probabilistic, not deterministic, rules.
o Does not use a knowledge base.
o Biologically inspired.
COMPONENTS OF A GENETIC ALGORITHM
A problem to solve, and ...
o Encoding technique (gene, chromosome)
o Initialization procedure (creation)
o Evaluation function (environment)
o Selection of parents (reproduction)
o Genetic operators (mutation, recombination / reproduction)
o Parameter settings (practice and art)
WORKING MECHANISM OF GENETIC ALGORITHMS


Begin: T = 0, Initialize Population
  -> Evaluate Solutions
  -> Optimum Solution?
       Yes -> Stop
       No  -> Selection -> Crossover -> Mutation, T = T + 1 -> (back to Evaluate Solutions)

SIMPLE GENETIC ALGORITHM
Simple_Genetic_Algorithm()
{
    Initialize the Population;
    Calculate Fitness Function;
    While (Fitness Value != Optimal Value)
    {
        Selection;  // Natural selection, survival of the fittest
        Crossover;  // Reproduction, propagate favorable characteristics
        Mutation;   // Mutation
        Calculate Fitness Function;
    }
}
BASIC GENETIC ALGORITHM


Current Population
  -> Select Fitter For Parents
  -> Produce Offspring From Parents
  -> Evaluate All Fitness
  -> (becomes the new Current Population)
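The generate-evaluate-select loop sketched above can be made concrete with a small, self-contained sketch (all names and parameter values are illustrative; the fitness function used here is the toy "count the 1 bits" problem):

```python
import random

def run_ga(fitness, length=8, pop_size=20, generations=50, p_mut=0.05, seed=0):
    """A minimal generational GA with tournament selection and elitism."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    history = []
    for _ in range(generations):
        def pick():  # tournament selection: the fitter of two random members
            a, b = rng.sample(pop, 2)
            return a if fitness(a) >= fitness(b) else b
        new_pop = [best[:]]  # elitism: carry the best forward unchanged
        while len(new_pop) < pop_size:
            p1, p2 = pick(), pick()
            cut = rng.randrange(1, length)             # single-point crossover
            child = p1[:cut] + p2[cut:]
            child = [b ^ 1 if rng.random() < p_mut else b
                     for b in child]                   # bit-flip mutation
            new_pop.append(child)
        pop = new_pop
        best = max(pop, key=fitness)
        history.append(fitness(best))
    return best, history

best, history = run_ga(sum)   # fitness = number of 1 bits ("onemax")
```

Because of the elitism step, the best fitness recorded in `history` never decreases from one generation to the next.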

THE GENETIC ALGORITHM CYCLE OF REPRODUCTION
Population -> (Reproduction) -> Parents -> Children -> (Modification) -> Modified Children -> (Evaluation) -> Evaluated Children -> back into the Population; Deleted Members -> (Discard)
POPULATION

Chromosomes could be:
o Bit strings (0101 ... 1100)
o Real numbers (43.2 -33.1 ... 0.0 89.2)
o Permutations of elements (E11 E3 E7 ... E1 E15)
o Lists of rules (R1 R2 R3 ... R22 R23)
o Program elements (genetic programming)
o ... any data structure ...

REPRODUCTION
Population -> Parents -> Children
Parents are selected at random, with selection chances biased in relation to chromosome evaluations.

CHROMOSOME MODIFICATION
Children -> (Modification) -> Modified Children
Modifications are stochastically triggered. Operator types are:
o Mutation
o Crossover
o Reproduction (recombination)

EVALUATION
Modified Children -> (Evaluation) -> Evaluated Children
The evaluator decodes a chromosome and assigns it a fitness measure. The evaluator is the only link between a classical Genetic Algorithm and the problem it is solving.

DELETION
Population -> Deleted Members -> (Discard)
Generational Genetic Algorithm:
o The entire population is replaced with each iteration.
Steady-state Genetic Algorithm:
o A few members are replaced each generation.
MUTATION: LOCAL MODIFICATION
  Before: (1 0 1 1 0 1 1 0)          After: (1 0 1 0 0 1 1 0)
  Before: (1.38 -69.4 326.44 0.1)    After: (1.38 -67.5 326.44 0.1)
Mutation causes movement in the search space (local or global) and restores lost information to the population.

CROSSOVER
  P1 (0 1 1 0 1 0 0 0)    C1 (0 1 0 0 1 0 0 0)
  P2 (1 1 0 1 1 0 1 0)    C2 (1 1 1 1 1 0 1 0)

Crossover is a critical feature of genetic algorithms: o It greatly accelerates search early in evolution of a population. o It leads to effective combination of schemata (sub-solutions on different chromosomes).

AN ABSTRACT EXAMPLE
(Figure: distribution of individuals in generation 0 and distribution of individuals in generation N.)
A SIMPLE EXAMPLE

GENETIC ALGORITHM OPERATORS AND PARAMETERS Encoding o The process of representing the solution in the form of a string that conveys the necessary information. o Just as in a chromosome, each gene controls a particular characteristic of the individual; similarly, each bit in the string represents a characteristic of the solution. Encoding Methods o Binary Encoding Most common method of encoding. Chromosomes are strings of 1s and 0s and each position in the chromosome represents a particular characteristic of the problem. Chromosome A Chromosome B 10110010110011100101 11111110000000011111

o Permutation Encoding Useful in ordering problems such as the Traveling Salesman Problem (TSP). Example, in TSP, every chromosome is a string of numbers, each of which represents a city to be visited. Chromosome A Chromosome B 1 5 3 2 6 4 7 9 8 8 5 6 7 2 3 1 4 9

o Value Encoding
Used in problems where complicated values, such as real numbers, are required and where binary encoding would not suffice. Good for some problems, but it is often necessary to develop problem-specific crossover and mutation techniques for these chromosomes.
Chromosome A: 1.235 5.323 0.454 2.321 2.454
Chromosome B: (left), (back), (left), (right), (forward)
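A minimal sketch of binary encoding and decoding for an integer x ∈ [0, 31], as used in the running f(x) = x² example of these notes (the helper names are illustrative):

```python
def encode(x, n_bits=5):
    """Encode integer x as a list of n_bits binary genes (MSB first)."""
    return [(x >> (n_bits - 1 - i)) & 1 for i in range(n_bits)]

def decode(bits):
    """Decode a binary chromosome back to an integer."""
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value

print(encode(13))               # [0, 1, 1, 0, 1]
print(decode([1, 1, 0, 0, 0]))  # 24
```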

FITNESS FUNCTION
A fitness function quantifies the optimality of a solution (chromosome) so that that particular solution may be ranked against all the other solutions. A fitness value is assigned to each solution depending on how close it actually is to solving the problem. An ideal fitness function correlates closely with the goal and is quickly computable. Example: in TSP, f(x) is the sum of the distances between the cities in the solution; the lower the value, the fitter the solution.

ROULETTE WHEEL SELECTION
Each string in the current population has a slot assigned to it in proportion to its fitness. We spin the weighted roulette wheel thus defined n times (where n is the total number of solutions). Each time the roulette wheel stops, the string corresponding to that slot is copied into the new population. Fitter strings are assigned larger slots and hence have a better chance of appearing in the new population.

EXAMPLE OF ROULETTE WHEEL SELECTION

No.   String   Fitness   % of Total
1     01101    169       14.4
2     11000    576       49.2
3     01000    64        5.5
4     10011    361       30.9
Total          1170      100.0
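The spin described above can be sketched in a few lines (a minimal illustration; the function and variable names are not from the notes):

```python
import random

def roulette_select(population, fitnesses):
    """Spin the weighted wheel once: pick an index with probability
    proportional to fitness, via a running partial sum."""
    total = sum(fitnesses)
    spin = random.uniform(0, total)
    cumulative = 0.0
    for i, f in enumerate(fitnesses):
        cumulative += f
        if spin <= cumulative:
            return i
    return len(population) - 1  # guard against floating-point round-off

# The table above: strings and their fitness values f(x) = x^2
population = ["01101", "11000", "01000", "10011"]
fitnesses = [169, 576, 64, 361]

random.seed(0)
counts = [0, 0, 0, 0]
for _ in range(10000):
    counts[roulette_select(population, fitnesses)] += 1
print(counts)  # roughly proportional to 14.4 : 49.2 : 5.5 : 30.9
```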

CROSSOVER
It is the process in which two chromosomes (strings) combine their genetic material (bits) to produce a new offspring possessing both their characteristics. Two strings are picked at random from the mating pool to cross over. The method chosen depends on the encoding method.

CROSSOVER METHODS
Single Point Crossover
o A random point is chosen on the individual chromosomes (strings) and the genetic material is exchanged at this point.

Chromosome 1: 11011 | 00100110110
Chromosome 2: 11011 | 11000011110
Offspring 1:  11011 | 11000011110
Offspring 2:  11011 | 00100110110

Two Point Crossover
o Two random points are chosen on the individual chromosomes (strings) and the genetic material is exchanged at these points.
Chromosome 1: 11011 | 00100 | 110110
Chromosome 2: 10101 | 11000 | 011110
Offspring 1:  10101 | 00100 | 011110
Offspring 2:  11011 | 11000 | 110110

NOTE: These chromosomes are different from the last example.
Uniform Crossover
o Each gene (bit) is selected randomly from one of the corresponding genes of the parent chromosomes.
Chromosome 1: 11011 | 00100 | 110110
Chromosome 2: 10101 | 11000 | 011110
Offspring:    10111 | 00000 | 110110

NOTE: Uniform crossover yields only one offspring. Crossover between two good solutions may not always yield a better or equally good solution; but since the parents are good, the probability of the child being good is high.

If the offspring is not good (a poor solution), it will be removed in the next iteration during selection.
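The single- and two-point methods above can be sketched as follows (a minimal illustration on bit strings; function names are not from the notes):

```python
import random

def single_point_crossover(p1, p2):
    """Exchange the tails of two equal-length strings at one random site."""
    site = random.randint(1, len(p1) - 1)
    return p1[:site] + p2[site:], p2[:site] + p1[site:]

def two_point_crossover(p1, p2):
    """Exchange the middle segment between two random sites."""
    a, b = sorted(random.sample(range(1, len(p1)), 2))
    return p1[:a] + p2[a:b] + p1[b:], p2[:a] + p1[a:b] + p2[b:]

random.seed(7)
print(single_point_crossover("1101100100110110", "1101111000011110"))
print(two_point_crossover("1101100100110110", "1010111000011110"))
```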

ELITISM
Elitism is a method which copies the best chromosome to the new offspring population before crossover and mutation are applied. When creating a new population by crossover and mutation, the best chromosome might otherwise be lost. Elitism forces the Genetic Algorithm to retain some number of the best individuals at each generation. It has been found that elitism significantly improves performance.

MUTATION
It is the process by which a string is deliberately changed so as to maintain diversity in the population set. We saw in the giraffes example that mutations can be beneficial.
Mutation Probability
o Determines how often parts of a chromosome will be mutated.
o After an offspring has been produced from two parents (sexual Genetic Algorithm) or from one parent (asexual Genetic Algorithm), mutate at randomly chosen loci with some probability.
o Locus = a position on the genotype.

A SIMPLE OPTIMIZATION EXAMPLE

Optimization of f(x) = x², with x ∈ [0, 31].
Problem representation
o Encoding of the variable x as a binary vector.
o [0, 31] → [00000, 11111]

A GENETIC ALGORITHM BY HAND

String   Initial      x       Fitness      % of Total   Selection
No.      Population   Value   f(x) = x^2   Fitness      Probability
1        01101        13      169          14.4         0.144
2        11000        24      576          49.2         0.492
3        01000        8       64           5.5          0.055
4        10011        19      361          30.9         0.309

String   After       Mate   Crossover   Mutation   New          Fitness
No.      Selection          Point       (locus)    Population   f(x) = x^2
1        0110|1      2      4           -          01100        144
2        1100|0      1      4           3          11101        841
3        11|000      4      2           -          11011        729
4        10|011      2      2           -          10000        256

BUILDING BLOCKS (SCHEMAS)
How do we characterize the evolution of a population in a Genetic Algorithm?
Goal:
o Identify the basic building block of Genetic Algorithms.
o Describe families of individuals.
Consider (11101) with fitness 841 and (11011) with fitness 729. The structure (11***), where * can be 1 or 0, is a very powerful one.

SCHEMA: DEFINITION
A schema is a string over the alphabet (0, 1, *), where * means "don't care".
A typical schema for strings of length 6: 10**0*.
Instances of the above schema: 101101, 100000, …
There are 3^6 = 729 schemata for strings of length 6; in general, 3^l schemata for strings of length l.
Highly fit schemata of short defining length can come to dominate the evolutionary process, with crossover creating fitter ones from them. Mutation has little effect here; it acts as an insurance policy against lost genetic material.

BENEFITS OF GENETIC ALGORITHMS
o Concept is easy to understand.
o Modular, separate from the application.
o Supports multi-objective optimization.
o Good for noisy environments.
o Always an answer; the answer gets better with time.
o Inherently parallel; easily distributed.
o Many ways to speed up and improve a Genetic Algorithm based application as knowledge about the problem domain is gained.
o Easy to exploit previous or alternate solutions.
o Flexible building blocks for hybrid applications.
o Substantial history and range of use.

WHEN TO USE A GENETIC ALGORITHM
o Alternate solutions are too slow or overly complicated.
o An exploratory tool is needed to examine new approaches.
o The problem is similar to one that has already been solved successfully using a Genetic Algorithm.
o You want to hybridize with an existing solution.
o The benefits of Genetic Algorithm technology meet key problem requirements.

MAIN DIFFICULTIES OF GENETIC ALGORITHMS
Adjustment of the Genetic Algorithm control parameters:

o Population Size
o Crossover Probability
o Mutation Probability
Specification of the termination condition.
Representation of the problem solutions.

SOME GENETIC ALGORITHM APPLICATION TYPES

Domain                       Application Types
Control                      Gas Pipeline, Pole Balancing, Missile Evasion, Pursuit
Design                       Semiconductor Layout, Aircraft Design, Keyboard Configuration, Communication Networks
Scheduling                   Manufacturing, Facility Scheduling, Resource Allocation
Robotics                     Trajectory Planning
Machine Learning             Designing Neural Networks, Improving Classification Algorithms, Classifier Systems
Signal Processing            Filter Design
Game Playing                 Poker, Checkers, Prisoner's Dilemma
Combinatorial Optimization   Set Covering, Traveling Salesman, Routing, Bin Packing, Graph Coloring, Partitioning

6 : GENETIC ALGORITHMS REVISITED: MATHEMATICAL FOUNDATIONS

SCHEMA
Consider strings constructed over the binary alphabet V = {0, 1}.
Thus a string can be represented as:
o A = 0101101 = a1 a2 … a7, where each ai is called a gene.
Each gene can take the value 1 or 0; we call the values of the ai (i.e. 1 or 0) alleles.
Consider a population of strings A(t) at time t.
Consider a schema H taken from the three-letter alphabet V+ = {0, 1, *}.
E.g. the string (*11*0*1) is a representation of H.

SCHEMATA: PROPERTIES
All schemata are not equal. They differ in two counts:
o Order: The order of a schema H, denoted O(H), is the number of fixed positions present in the template of the schema. E.g. O(1*11*0*) = 4.
o Defining length: The defining length of a schema H, denoted δ(H), is the distance between the first and the last specific string positions. δ(1*11*0*) = 6 − 1 = 5. δ(**1****) = 0.
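The two counts can be computed mechanically; a short sketch (the helper names are illustrative):

```python
def order(schema):
    """Order O(H): number of fixed (non-*) positions in the schema."""
    return sum(1 for c in schema if c != "*")

def defining_length(schema):
    """Defining length d(H): distance between first and last fixed positions."""
    fixed = [i for i, c in enumerate(schema) if c != "*"]
    return fixed[-1] - fixed[0] if fixed else 0

def matches(schema, string):
    """True if the string is an instance of the schema (* = don't care)."""
    return all(s == "*" or s == c for s, c in zip(schema, string))

print(order("1*11*0*"))            # 4
print(defining_length("1*11*0*"))  # 5
print(matches("10**0*", "101101")) # True
```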

SCHEMA DIFFERENCE EQUATION
Suppose that at a given time t there are m examples of a particular schema H in the population A(t); denote this m = m(H, t).
During reproduction, a particular string is selected with probability pi = fi / Σ fj, where fi is its fitness.
Suppose a completely new generation is created from the population using reproduction alone. Then the number of schemata at time t+1 is given by
m(H, t+1) = m(H, t) · f(H) / f̄, where f̄ = Σ fi / n.
Here f(H) is the average fitness of the strings representing schema H at time t, and f̄ is the average fitness of the population.
Thus a schema grows as the ratio of the average fitness of the schema to the average fitness of the population: schemata with fitness above the population average receive an increasing number of samples in the next generation. A schema grows or decays according to its schema average under reproduction.
If a particular schema H remains above average by an amount c·f̄, then
m(H, t+1) = m(H, t) · (f̄ + c·f̄) / f̄ = (1 + c) · m(H, t).
When t = 0, m(H, 1) = (1 + c) · m(H, 0).
When t = 1, m(H, 2) = (1 + c)² · m(H, 0).
In general, m(H, t) = (1 + c)^t · m(H, 0).
The equation is similar to compound interest (a geometric progression), i.e. reproduction allocates exponentially increasing (or decreasing) numbers of schemata to future generations.

EFFECT OF CROSSOVER ON SCHEMATA
Consider a string A = (0111100) and two schemata H1 = (*1****0) and H2 = (***11**).
Let the random crossover site be 3, i.e. A = (011|1100), H1 = (*1* | ***0) and H2 = (*** | 11**). Here schema H1 will be destroyed while H2 survives.
Since the crossover site is chosen uniformly between 1 and 6, and the defining length of H1 is large, H1 has less chance of surviving: P(H1 destroyed) = pd = δ(H1) / (l−1) = 5/6.
In general, p(H destroyed) = pd = δ(H) / (l−1), so p(H survives) = ps = 1 − pd.
If crossover itself is applied with probability pc, then ps ≥ 1 − pc · δ(H) / (l−1).
So the combined effect of reproduction and crossover is
m(H, t+1) ≥ m(H, t) · (f(H)/f̄) · [1 − pc · δ(H)/(l−1)].
Thus those schemata with above-average fitness and short defining length grow exponentially during evolution.

EFFECT OF MUTATION ON A SCHEMA
Mutation is the random alteration of a single position with probability pm. For a schema to survive, each of its fixed positions must survive mutation.
Survival probability for a single position = 1 − pm.
If O(H) is the number of fixed positions in the schema,
p(H survives mutation) = (1 − pm)^O(H) ≈ 1 − O(H) · pm (for pm << 1).

SCHEMA THEOREM

So the combined effect of reproduction, crossover and mutation is
m(H, t+1) ≥ m(H, t) · (f(H)/f̄) · [1 − pc · δ(H)/(l−1) − O(H) · pm]
This is called the Schema Theorem, or the Fundamental Theorem of Genetic Algorithms: short, low-order, above-average schemata grow exponentially during evolution.
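The reproduction term of the theorem can be checked numerically on the four strings of the worked example (a sketch; note that f̄ = 1170/4 = 292.5 exactly, which the notes round to 293):

```python
def matches(schema, string):
    """True if string is an instance of schema (* = don't care)."""
    return all(s == "*" or s == c for s, c in zip(schema, string))

population = ["01101", "11000", "01000", "10011"]
fitness = {"01101": 169, "11000": 576, "01000": 64, "10011": 361}

H1 = "1****"
instances = [s for s in population if matches(H1, s)]
m = len(instances)                               # m(H1, t) = 2
f_H1 = sum(fitness[s] for s in instances) / m    # schema average = 468.5
f_bar = sum(fitness.values()) / len(population)  # population average = 292.5
expected = (f_H1 / f_bar) * m                    # expected copies next generation
print(m, f_H1, round(expected, 2))               # 2 468.5 3.2
```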

SCHEMA PROCESSING: AN EXAMPLE

String   Initial      x       Fitness      % of Total   Selection
No.      Population   Value   f(x) = x^2   Fitness      Probability
1        01101        13      169          14.4         0.144
2        11000        24      576          49.2         0.492
3        01000        8       64           5.5          0.055
4        10011        19      361          30.9         0.309

String   After       Mate   Crossover   Mutation   New          Fitness
No.      Selection          Point       (locus)    Population   f(x) = x^2
1        0110|1      2      4           -          01100        144
2        1100|0      1      4           3          11101        841
3        11|000      4      2           -          11011        729
4        10|011      2      2           -          10000        256

Reproduction on H1
o Consider 3 schemata: H1 = 1****, H2 = *10** and H3 = 1***0.
o Strings 2 and 4 are representations of H1, so m(H1, t) = 2.
o After reproduction, there are 3 copies of H1.
o To check the schema theorem:
o f(H1) = (576 + 361) / 2 = 468.5
o m(H1, t+1) = (f(H1) / f̄) · m(H1, t) = (468.5 / 293) · 2 = 3.20 ≈ 3 (as observed).
Crossover on H1
o No crossover loss, since δ(H1) = 0.
Mutation on H1
o If pm = 0.001, then m · pm = 3 · 0.001 = 0.003 ≈ 0, i.e. no bits of the schema are expected to be lost to mutation.
o Thus we obtain the expected number of schemata as prescribed by the schema theorem.
o The cases of H2 and H3 are similar.

TWO-ARMED AND K-ARMED BANDIT PROBLEM
Schemata of low order, short defining length and above-average fitness receive exponentially increasing trials in the future. Why should it be this way? It can be explained using the two-armed bandit problem.
2-Armed Bandit Problem
A slot machine has two arms, L and R. The arms pay awards μ1 and μ2 with variances σ1² and σ2², where μ1 > μ2.
We want two things:
o Make a decision about which arm to play.

o Collect information about which is the better arm.
The first is called exploitation and the second exploration. The trade-off between the exploration and exploitation of knowledge is a characteristic of adaptive systems. Experimentally, one can give an exponentially increasing number of trials to the observed better arm. This is similar to the exponential allocation of trials to better schemata.

COMPETING SCHEMATA
Two schemata A and B, with individual positions ai and bi, are competing if at every position i = 1, 2, …, l either ai = bi = *, or both ai != * and bi != *, with ai != bi for at least one such position.
For example, consider the following set of 8 schemata:
o *00*0**
o *00*1**
o *01*0**
o *01*1**
o *10*0**
o *10*1**
o *11*0**
o *11*1**
These schemata, fixed at locations 2, 3 and 5, compete to be in the next population (similar to an 8-armed bandit). There are 7C3 = 35 different sets of locations for such 2³ = 8 schemata.
In general, for schemata of order j over strings of length l, there are lCj different sets of 2^j competing schemata. Not all of these schemata are usually played at once.

NUMBER OF SCHEMATA PROCESSED
Consider a population of n binary strings. A number of long, high-order schemata are destroyed by crossover and mutation. Still, a Genetic Algorithm processes O(n³) schemata. This result, due to Holland, is known as implicit parallelism.

BUILDING BLOCK HYPOTHESIS

Schemata are building blocks. E.g. maximize the function f(x) = x² on [0, 31].
Let H1 = 1****. This one-fixed-bit schema corresponds to the right half of the domain, x ≥ 16. The schema H2 = 0**** corresponds to the left half, x < 16.

The schema H3 = ****1 corresponds to the half of the domain between 1 & 2, between 3 & 4, …
The schema H4 = ****0 corresponds to the half of the domain between 0 & 2, between 4 & 6, …
Thus, 1-bit schemata each cover half of the full space.
The schema H5 = 10*** corresponds to the domain between 16 & 24.
The schema H6 = **1*1 contributes to the domain between 5 & 6, between 7 & 8, between 13 & 14, …
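The domain claims above can be verified by enumeration (a sketch; `schema_members` is an illustrative helper, not from the notes):

```python
def schema_members(schema):
    """All integers in [0, 2^len - 1] whose binary encoding matches the schema."""
    n = len(schema)
    out = []
    for x in range(2 ** n):
        bits = format(x, f"0{n}b")
        if all(s == "*" or s == b for s, b in zip(schema, bits)):
            out.append(x)
    return out

print(schema_members("10***"))  # [16, 17, 18, 19, 20, 21, 22, 23]
print(schema_members("**1*1"))  # [5, 7, 13, 15, 21, 23, 29, 31]
```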

SCHEMATA: GEOMETRIC REPRESENTATION
Consider strings of length 3. All of them can be represented by the vertices of a cube. The 2-bit schemata can be represented by its edges (lines), and the 1-bit schemata by its faces (planes). In general, schemata are represented by hyperplanes in a hypercube.

GENETIC ALGORITHM HARD AND GENETIC ALGORITHM DECEPTIVE
Schemata, or building blocks, lead to better populations, but not all problems can be solved the Genetic Algorithm way. Problems that are difficult to solve using Genetic Algorithm techniques are called Genetic Algorithm hard problems.

GENETIC ALGORITHM DECEPTIVE
Genetic Algorithm hard problems may have difficulties in coding: possible solutions may not be amenable to the genetic operators. This coding-function combination of Genetic Algorithm hard problems is called Genetic Algorithm deceptive.

GENETIC ALGORITHM DECEPTIVE: CHARACTERISTICS
Genetic Algorithm deceptive problems tend to have a remote, isolated optimum, i.e. a best point surrounded by a huge collection of worst points. Finding it is similar to finding a needle in a haystack. Many techniques, not only Genetic Algorithms, have difficulty in such cases. The consolation is that such real-world problems are few.

MINIMUM DECEPTIVE PROBLEM (MDP)
What is the smallest problem that can be deceptive or misleading? For this, we consider low-order, short schemata which lead to an incorrect higher-order schema. We can show that the 2-bit problem is the smallest MDP.

2-BIT PROBLEM IS MDP
Consider four order-2 schemata over two defining positions, with attached fitness values:
o ***0*****0* → f00
o ***0*****1* → f01
o ***1*****0* → f10
o ***1*****1* → f11
(The fitness values are schema averages.)
Suppose f11 is the global maximum.

Then f11 > f00, f01, f10.
Introduce an element of deception to make the problem Genetic Algorithm hard. For that, assume that one or both of the sub-optimal order-1 schemata are better than the optimal order-1 schemata, i.e.
f(0*) > f(1*) and f(*0) > f(*1),
i.e. (f00 + f01)/2 > (f10 + f11)/2 and (f00 + f10)/2 > (f01 + f11)/2.
Both cannot be true at the same time, as then f11 could not be the global optimum; only one can hold. Without loss of generality, assume f(0*) > f(1*) and f(*0) < f(*1).
Normalize the fitness values and label: r = f11/f00, c = f01/f00, c′ = f10/f00, with the global conditions r > c, r > 1 and r > c′.
Deception condition in normalized form: r < 1 + c − c′.
These results give: c′ < 1 and c′ < c.
Thus there are two types of deceptive 2-bit problems:
o Type 1: f01 > f00 (c > 1)
o Type 2: f00 ≥ f01 (c ≤ 1)
Both are deceptive, so the 2-bit problem is deceptive. Similarly, we can show that the 1-bit problem is not deceptive. So the 2-bit problem is the MDP.

EXTENDED SCHEMA ANALYSIS OF THE 2-BIT PROBLEM
We have a 2-bit problem that seems misleading. By the schema theorem,
m(H, t+1) ≥ m(H, t) · (f(H)/f̄) · [1 − pc · δ(H)/(l−1) − O(H) · pm]
When pm = 0, crossover has more importance, so we take a closer look at crossover.
Crossover yield table for the 2-bit problem (s = same schemata as the parents):

X    | 00     | 01     | 10     | 11
00   | s      | s      | s      | 01, 10
01   | s      | s      | 00, 11 | s
10   | s      | 00, 11 | s      | s
11   | 01, 10 | s      | s      | s

On crossover, complementary pairs lose genetic material, and this loss is compensated by the gain from the other complementary pair of schemata. We have to account for the expected loss and gain of schemata due to crossover. Assuming proportionate reproduction, crossover and mutation, we can write the expected proportion of each schema:
P11(t+1) = P11(t) · (f11/f̄) · [1 − p′c · (f00/f̄) · P00(t)] + p′c · (f01·f10/f̄²) · P01(t) · P10(t)
P10(t+1) = P10(t) · (f10/f̄) · [1 − p′c · (f01/f̄) · P01(t)] + p′c · (f00·f11/f̄²) · P00(t) · P11(t)
P01(t+1) = P01(t) · (f01/f̄) · [1 − p′c · (f10/f̄) · P10(t)] + p′c · (f00·f11/f̄²) · P00(t) · P11(t)
P00(t+1) = P00(t) · (f00/f̄) · [1 − p′c · (f11/f̄) · P11(t)] + p′c · (f01·f10/f̄²) · P01(t) · P10(t)
where f̄ is the average population fitness and p′c = p(cross between the two bits) = pc · δ(H)/(l−1).
These equations predict the expected proportions of the four schemata in the next generation. A necessary condition for the Genetic Algorithm to be successful is that the sequence <P11(t)> converges to 1.

Thus, the Genetic Algorithm refuses to be misled by the initial conditions.

7 : COMPUTER IMPLEMENTATION OF A GENETIC ALGORITHM

It is disconcerting at first, due to:
o Coding.
o Populations rather than individuals.
o Randomness giving direction.

GENETIC ALGORITHM IMPLEMENTATION
Data Structure
o Strings are stored as arrays.
o Specify: population size, string size, probability of mutation, probability of crossover.
Reproduction
o Through the roulette wheel selection method.
o Take a partial sum of the fitness values.
o Generate a random number to specify the location where the wheel has stopped.
o Select the corresponding string into the new population.
Crossover
o Take two parents randomly.
o Generate a random number between 1 and l−1; this is the crossover site.
o Exchange bits from that site onwards.
o Two offspring are generated.
Mutation
o Generate a random number to select a string for mutation.
o Generate a random number to select a bit for mutation.
o Change the bit at that position, generating a new offspring.
Fitness Function
o Choose a proper fitness function; a string's suitability for the next population is judged by it.

GOOD GENETIC ALGORITHM: EXPERIMENTS SHOW
o Choose a high crossover probability.
o Choose a low mutation probability.
o Choose a moderate population size, say 30.

GENETIC ALGORITHM DRAWBACK: PREMATURE CONVERGENCE
The procedure may optimize, but not globally.

This results in premature convergence.
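The implementation recipe above (roulette wheel reproduction, single-point crossover, bit mutation) can be sketched end-to-end for the running example f(x) = x² on [0, 31]; the parameter values and helper names are illustrative:

```python
import random

POP_SIZE, STRING_SIZE = 4, 5
P_CROSSOVER, P_MUTATION = 0.9, 0.01

def fitness(bits):
    """Decode the 5-bit string and evaluate f(x) = x^2."""
    x = int("".join(map(str, bits)), 2)
    return x * x

def roulette(pop, fits):
    """Spin the weighted wheel via a partial sum of fitness values."""
    spin, partial = random.uniform(0, sum(fits)), 0.0
    for individual, f in zip(pop, fits):
        partial += f
        if spin <= partial:
            return individual
    return pop[-1]

def crossover(p1, p2):
    """Single-point crossover at a random site in [1, l-1]."""
    if random.random() < P_CROSSOVER:
        site = random.randint(1, STRING_SIZE - 1)
        return p1[:site] + p2[site:], p2[:site] + p1[site:]
    return p1[:], p2[:]

def mutate(bits):
    """Flip each bit with probability P_MUTATION."""
    return [1 - b if random.random() < P_MUTATION else b for b in bits]

random.seed(0)
pop = [[random.randint(0, 1) for _ in range(STRING_SIZE)] for _ in range(POP_SIZE)]
for _ in range(20):  # generation loop
    fits = [fitness(ind) for ind in pop]
    new_pop = []
    while len(new_pop) < POP_SIZE:
        c1, c2 = crossover(roulette(pop, fits), roulette(pop, fits))
        new_pop += [mutate(c1), mutate(c2)]
    pop = new_pop
print(max(fitness(ind) for ind in pop))  # best fitness found; the optimum is 31^2 = 961
```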

MAPPING OBJECTIVE FUNCTION TO FITNESS FUNCTION
The objective of some problems is to minimize a function, say a cost g(x). Then take
f(x) = Cmax − g(x) when g(x) < Cmax, and f(x) = 0 otherwise,
where Cmax can be taken as the largest cost observed so far.
For a maximization problem with a utility u(x) that may go negative, take f(x) = u(x) + Cmin.

FITNESS SCALING
At the start of a Genetic Algorithm run there are a few highly fit strings among many inferior ones. Within a few generations the superior ones become dominant, which may lead to premature convergence. Later in the run, though the strings are diverse, almost all of them may have nearly the same fitness, so average and superior strings get the same number of copies in the next population; survival of the fittest becomes a random walk, which needs to be avoided. The strategy to avoid both problems is scaling.

LINEAR SCALING
f′ = a·f + b (a and b are to be determined).
Always take:
o f′avg = favg
o f′max = Cmult · favg, where Cmult is the expected number of copies of the best string in the next population (Cmult ≈ 2 gives f′max ≈ 2·favg).

BECAUSE OF SCALING
The number of copies given to extraordinary best strings is restricted, and the number given to lowly ones increases. In a mature run, a few bad strings may lie well below the population average; if scaling is applied there, the low fitness values can go negative, which is undesired. In that case, instead take f′min = 0 at f = fmin. Scaling helps prevent the early dominance of a few best strings and encourages healthy competition among near-equals.

CODING IN GENETIC ALGORITHM
There are different coding methods. Which one to select? Two guidelines:
o Principle of meaningful building blocks: The user should select a coding so that short, low-order schemata are relevant to the underlying problem and relatively unrelated to schemata over other fixed positions.
o Principle of minimal alphabets: The user should select the smallest alphabet that permits a natural expression of the problem.

CODING: BINARY VS. NON-BINARY

E.g. maximize f(x) = x² on [0, 31].

Binary Representation   32-Letter Alphabet Representation
01101                   N
11000                   Y
01001                   I
10011                   T

In the non-binary coding there are no coding similarities to be exploited.
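The linear scaling rule described above (f′ = a·f + b with f′avg = favg and f′max = Cmult·favg) can be sketched as follows; the fallback when the scaled minimum would go negative maps fmin to 0 as described, and the function name is illustrative:

```python
def linear_scaling_coeffs(f_min, f_avg, f_max, c_mult=2.0):
    """Coefficients (a, b) for f' = a*f + b such that f'_avg = f_avg and
    f'_max = c_mult * f_avg. If the scaled minimum would go negative,
    stretch so that f'_min = 0 instead (keeping f'_avg = f_avg)."""
    if f_min > (c_mult * f_avg - f_max) / (c_mult - 1.0):
        # Normal scaling: line through (f_avg, f_avg) and (f_max, c_mult * f_avg)
        a = (c_mult - 1.0) * f_avg / (f_max - f_avg)
        b = f_avg * (f_max - c_mult * f_avg) / (f_max - f_avg)
    else:
        # Scale so the minimum maps to zero: line through (f_min, 0), (f_avg, f_avg)
        a = f_avg / (f_avg - f_min)
        b = -f_min * f_avg / (f_avg - f_min)
    return a, b

# Values from the four-string example: f_min = 64, f_avg = 292.5, f_max = 576
a, b = linear_scaling_coeffs(64, 292.5, 576)
print(round(a * 292.5 + b, 2))  # scaled average stays 292.5
print(round(a * 576 + b, 2))    # scaled max becomes 2 * 292.5 = 585.0
```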

DISCRETIZATION OF A CONTINUUM
Many optimization problems are defined over a continuum. Such a problem can be converted into a finite collection of discrete problems which can then be solved using a Genetic Algorithm.

CONSTRAINTS
Genetic Algorithms are generally used for unconstrained problems, but they can be used for constrained problems too. This is done by incorporating a penalty in the objective function as and when the constraints are violated.

8 : SOME APPLICATIONS OF GENETIC ALGORITHMS

HISTORY OF GENETIC ALGORITHMS
Initially, biologists used computers for simulating natural genetics, aiming to understand natural phenomena.
Pioneer: Fraser considered a phenotype function
phenotype = a + q·a·|a| + c·a³
Chromosome = a 15-bit string, with 5 bits each for a, q and c. The interaction of these groups of bits forms a selection of a diverse population. He chose strings with phenotype values between specified limits, say 0 and 1. Future generations are evolved with acceptable string structure; this is similar to typical optimization.

HOLLAND, THE FATHER OF GENETIC ALGORITHMS
Introduced the Genetic Algorithm as a computational technique, though his aim was entirely different: he wanted to create general programs and machines which could adapt to a changing environment. He recognized:
o The importance of selection.
o The effectiveness of populations against individuals.
o (Initially) the lesser importance of crossover and mutation.

BAGLEY AND THE ADAPTIVE GAME-PLAYING PROGRAM
First coined the term "Genetic Algorithm". Used it to play hexapawn. Used reproduction, crossover, and mutation. Reduced selection in the beginning to reduce the dominance of some strings, and increased selection later to allow competition.

The technique was called an Adaptive Genetic Algorithm.

ROSENBERG AND BIOLOGICAL CELL SIMULATION
Simulated a population of single-celled organisms. Defined a finite-length string with a pair of chromosomes (diploid): a string length of 20 with a maximum of 16 alleles. Introduced an offspring generation function (OGF) that fixes the number of offspring, to check selection. Introduced a probability distribution over the crossover site.

CAVICCHIO AND PATTERN RECOGNITION
Applied the Genetic Algorithm to the design of detectors for pattern recognition. An image is digitized as a 25 x 25 binary pixel grid, and a detector is a subset of the pixels. During training, known images are presented and lists of detector states are stored. During the recognition phase, an unknown image is presented and the matches are counted. Allowed reproduction, crossover and mutation. Used pre-selection: an offspring always replaces one of its parents, to preserve diversity.

WEINBERG AND CELL SIMULATION
Computer simulation of a living cell. Proposed a multilevel Genetic Algorithm: the lower level is an adaptive Genetic Algorithm and the upper is non-adaptive. The lower level is meant to find the parameters of the Genetic Algorithm. These parameters are given to the upper-level Genetic Algorithm, which tests the fitness of the population strings. The fitter ones are sent back to the lower level, and the process continues. The upper level functions like a supreme judge.

HOLLSTIEN AND FUNCTION OPTIMIZATION
Used five selection methods:
o Progeny testing: the fitness of its offspring controls a parent's further breeding.
o Individual selection: an individual's own fitness decides its future as a parent.
o Family selection: the fitness of a family controls the use of all family members as parents.
o Within-family selection: fitness within a family controls selection within that family.
o Combined selection: a combination of the above selection methods.
Used eight mating preferences:
o Random mating: all matings equally likely.
o Inbreeding: related individuals mate intentionally.
o Line breeding: a unique individual is identified, mated with standard ones, and the offspring is selected.
o Outbreeding: contrasting individuals are chosen as parents.
o Self-fertilization: an individual breeds with itself.
o Clonal propagation: a copy of an individual is formed.
o Positive assortive mating: like individuals mate with like.
o Negative assortive mating: unlike individuals are bred.
Used populations of 16 strings.

Concluded that inbreeding and outbreeding are better.

FRANTZ AND POSITIONAL EFFECT
Used a larger population size (100) and string size (25). Used:
o Roulette wheel selection
o Simple crossover
o Mutation
Sought a correlation between positional effect and rate of improvement. Introduced:
o Inversion
o Partial complement (migration)
o Multiple-point crossover
Migration:
o Select a few strings.
o Complement about one third of the bits of these strings.
o The new strings are called immigrants.
o Helps in maintaining diversity, but reduces performance.

BOSWORTH, FOO, ZEIGLER - REAL GENES
Coding based on a minimalist, binary-like approach, as against a maximalist approach. Thought mutation needed a change, so introduced five variations of it. Not a Genetic Algorithm in the pure sense.

BOX AND EVOLUTIONARY OPERATION
More a management technique for workers to execute a plan than an algorithm. Followed nature's mechanisms:
o Genetic variability
o Selection
Loose application of mutation, as anything which changes structure. Not a Genetic Algorithm in the modern sense.

FOGEL, OWENS AND WALSH - EVOLUTIONARY PROGRAMMING
Consider the state diagram of a 3-state machine. 0 and 1 are the inputs; A, B, C are the states; there are three output symbols.
Two operators:
o Selection: choose the best out of parent and child.
o Mutation: change a machine by altering an output symbol, a state transition, the number of states or the initial state.
Drawback: limited to small problem spaces.
The transition description is given by:

Present State   Input Symbol   Next State
C               0              B
B               1              C
C               1              A
A               1              A
A               0              B
B               1              C

DE JONG AND FUNCTION OPTIMIZATION
Mainly used the Genetic Algorithm as a function optimizer. Used six test functions with the following properties:
o Continuous / discontinuous
o Convex / non-convex
o Unimodal / multimodal
o Quadratic / non-quadratic
o Low dimensionality / high dimensionality
o Deterministic / stochastic
Devised two different performance measures:
o Online (ongoing) performance: every trial counts. The online performance xe(s) of strategy s on environment e is xe(s) = (1/T) Σ fe(t), where fe(t) is the objective function value for environment e on trial t.
o Offline (convergence) performance: only the best values found so far count. The offline performance xe*(s) of strategy s on environment e is xe*(s) = (1/T) Σ fe*(t), where fe*(t) = best{fe(1), fe(2), …, fe(t)}.
De Jong called his algorithm reproductive plan R1. In R1, three operations were used:
o Roulette wheel selection
o Simple crossover
o Simple mutation
R1 is a family of plans using 4 parameters: n, pc, pm and G (generation gap), with G = 1 for non-overlapping populations and 0 < G < 1 for overlapping populations. In overlapping populations, n × G individuals are selected for the genetic operations.
He observed that a larger population size leads to better offline performance, while a smaller population size leads to more rapid initial change.
He investigated five variations of plan R1:
o R2 - Elitist Model
o R3 - Expected Value Model
o R4 - Elitist Expected Value Model
o R5 - Crowding Factor Model
o R6 - Generalized Crossover Model

R2 - Elitist Model


o Let a*(t) be the best individual generated up to time t. After generating A(t+1) in the usual fashion, if a*(t) is not in A(t+1), include a*(t) in A(t+1) as the (N+1)th member.
o It improves local search at the expense of the global perspective.
R3 - Expected Value Model
o Each string in the population is given an expected number of offspring f/f̄.
o Thereafter, each time a string is selected for crossover or mutation, its offspring count is reduced by 0.5.
o When an individual is selected for reproduction without crossover or mutation, its offspring count is reduced by 1.
o When the offspring count falls below 0, the string is no longer available for selection.
R4 - Elitist Expected Value Model
o Combination of R2 and R3.
o Much better performance.
R5 - Crowding Factor Model
o In nature, like individuals dominate a niche in the population; increased competition for limited resources then decreases life expectancy and birth rate.
o De Jong enforced a crowding pressure by the forceful replacement of older strings with newer offspring.
o For that, consider an overlapping population with G = 0.1.
o Defined a parameter, the crowding factor.
o When an offspring is born, a string is selected for dying: the dying string is chosen as the one which most resembles the new offspring (bit-by-bit similarity).
o The process is similar to the pre-selection of Cavicchio.
R6 - Generalized Crossover Model
o Used a new parameter, the number of crossover points (CP).
o When CP = 1, it is simple crossover.
o If l is the length of the string, there are lC_CP choices of crossover points for multiple crossover.
o As CP is increased, less structure can be preserved across a particular cross; effectively, the process becomes a random shuffle and fewer important schemata can be preserved.
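De Jong's two measures can be sketched as follows (in the standard formulation, assuming maximization: online averages every trial value, offline averages the best value seen so far; names are illustrative):

```python
def online_performance(history):
    """De Jong's online measure: average of all objective values observed."""
    return sum(history) / len(history)

def offline_performance(history):
    """De Jong's offline measure: average of the best-so-far values."""
    best_so_far, total = float("-inf"), 0.0
    for f in history:
        best_so_far = max(best_so_far, f)
        total += best_so_far
    return total / len(history)

trials = [169, 576, 64, 361, 841]
print(online_performance(trials))   # 2011 / 5 = 402.2
print(offline_performance(trials))  # (169 + 576 + 576 + 576 + 841) / 5 = 547.6
```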

IMPROVEMENTS IN BASIC TECHNIQUES
Since De Jong there have been improvements to the basic Genetic Algorithm. They correspond to:
o Selection
o Scaling
o Ranking

ALTERNATIVE SELECTION SCHEMES: BRINDLE
Deterministic Sampling
o Find pselection = fi / Σ fj.
o Expected number of copies of a string: ei = pselection · n; each string first receives int(ei) copies.
o Fill the remaining slots of the population from the top of the list sorted by the fractional parts of ei.

o Example:

String   Initial      x       Fitness      % of Total   Selection
No.      Population   Value   f(x) = x^2   Fitness      Probability
1        01101        13      169          14.4         0.144
2        11000        24      576          49.2         0.492
3        01000        8       64           5.5          0.055
4        10011        19      361          30.9         0.309

o Here strings 2 and 4 are selected initially (integer parts).
o Then sort the fractional parts: 0.96, 0.56, 0.23 and 0.22.
o The best strings are 2 and 1, corresponding to the fractional parts 0.96 and 0.56.
o Thus the new population = {2, 4, 2, 1}.
Remainder Stochastic Sampling Without Replacement
o As in deterministic sampling, the integer parts are selected first.
o The fractional parts of ei are taken as probabilities: Bernoulli trials are conducted with probability of success equal to the fractional part.
o E.g. ei = 2.5 gives 2 sure copies and another with probability 0.5.
Remainder Stochastic Sampling With Replacement
o As in deterministic sampling, the integer parts are selected first.
o The fractional parts of ei are used as weights in a roulette wheel selection procedure.
Stochastic Sampling With Replacement
o Typical roulette wheel selection.
Stochastic Sampling Without Replacement
o De Jong's expected value model R3.
o Each string in the population is given an expected number of offspring f/f̄.
o Each time a string is selected for crossover or mutation, its offspring count is reduced by 0.5; when it is selected for reproduction without crossover or mutation, its count is reduced by 1.
o When the offspring count falls below 0, the string is no longer available for selection.
Stochastic Tournament
o Selection probabilities are calculated as usual.
o Successive pairs of individuals are drawn using roulette wheel selection.
o Out of each pair, the string with the higher fitness is put into the population.
o A new pair is drawn, and the process continues until the population is full.
These selection procedures still show many drawbacks, because of the inferiority of roulette wheel selection. Of all of these, R3 is considered the best.
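Brindle's deterministic sampling, as described above, can be sketched as follows (the function name is illustrative):

```python
def deterministic_sampling(fitnesses, n):
    """Give each string int(e_i) copies, where e_i = n * f_i / sum(f),
    then fill the remaining slots in order of the largest fractional parts."""
    total = sum(fitnesses)
    expected = [n * f / total for f in fitnesses]
    counts = [int(e) for e in expected]
    remaining = n - sum(counts)
    # Indices sorted by descending fractional part
    by_fraction = sorted(range(len(fitnesses)),
                         key=lambda i: expected[i] - counts[i], reverse=True)
    for i in by_fraction[:remaining]:
        counts[i] += 1
    return counts

# The 4-string example: strings receive 1, 2, 0 and 1 copies,
# i.e. the new population {2, 4, 2, 1}
print(deterministic_sampling([169, 576, 64, 361], 4))  # [1, 2, 0, 1]
```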

SCALING MECHANISMS
Without scaling, a few highly fit strings may dominate from the beginning. Important scaling procedures are:
o Linear scaling: f' = a f + b, where a and b are chosen so that f'avg = favg and f'max = Cmult * favg (e.g. Cmult = 2).

Where Cmult = the expected number of copies of the best string in the next population.
o Sigma truncation: f' = f - (favg - c * sigma), where sigma = the population standard deviation and c is a constant between 1 and 3. Negative values are set to zero.
o Power law scaling: f' = f^k for some k.
RANKING PROCEDURES
Selection of strings is based on the ranks of their fitness values. The population is sorted according to fitness, and individuals are assigned an offspring count based on their rank.
APPLICATIONS OF GENETIC ALGORITHMS
Medical image registration with a Genetic Algorithm
o A Genetic Algorithm is used to perform image registration in Digital Subtraction Angiography (DSA).
o DSA examines the interior of an artery by comparing two X-rays taken before and after injecting a dye.
o The images are digitized and subtracted pixel by pixel.
o The difference image shows the interior of the artery.
o The pre-injection image is transformed by the bilinear map x'(x, y) = a0 + a1x + a2y + a3xy and y'(x, y) = b0 + b1x + b2y + b3xy, where the ai and bi are unknown.
o The Genetic Algorithm finds the ai and bi by minimizing the mean absolute difference between the images.
Iterated Prisoner's Dilemma
                              Player 2
                     Co-operate         Defect
Player 1 Co-operate  (R = 6, R = 6)     (S = 0, T = 10)
         Defect      (T = 10, S = 0)    (P = 2, P = 2)
o The game is played repeatedly, building up a history of C and D moves.
o This iterated problem can be solved using the tit-for-tat strategy.
o Axelrod showed that a Genetic Algorithm can do much better, representing a strategy as a 63-bit string conditioned on the last three moves of the other prisoner.
9 : ADVANCED OPERATORS AND TECHNIQUES IN GENETIC SEARCH
Until now, we have considered the Genetic Algorithm with the genetic operators:
o Selection
o Mutation
o Crossover

DOMINANCE, DIPLOIDY, ABEYANCE Nature offers: o Diploidy (i.e. pairs of chromosomes)



o Dominance (as demonstrated by Mendel with pea plants)
Until now we have considered only haploid chromosomes, i.e. single-stranded chromosomes such as (1011110001).

NATURE'S WAYS
Most of the complex life forms in nature are diploid, i.e. have double-stranded chromosomes. In diploid form, a genotype carries chromosomes in pairs, called homologous chromosomes, which carry information for the same functions.
DIPLOID CHROMOSOMES

Each cell has a nucleus. The rod-shaped particles inside are chromosomes, which we think of in pairs. The number differs by species: human (46), tobacco (48), goldfish (94), chimpanzee (48). They are usually paired up.
X & Y chromosomes:
o Humans: male (xy), female (xx)
o Birds: male (xx), female (xy)
E.g.
o AbCDe
o aBCde
Each pair contains upper-case and lower-case characters. Each allele represents a particular characteristic. E.g.
o a - blue eyes
o A - green eyes
But the phenotype can express only one of them at a time. This is made possible by the genetic operator of dominance.

DOMINANCE An allele is said to be dominant if it is expressed when paired with some other allele. E.g. upper cases are dominant and lower cases are recessive.

The phenotype expressed by the chromosome pair AbCDe / aBCde is ABCDe, i.e. a dominant gene is expressed when heterozygous (Aa → A) or homozygous (CC → C). A recessive gene is expressed only when homozygous (ee → e).
Diploidy remembers alleles. Dominance protects a remembered allele from harmful selection in a hostile environment. In a changed environment, a remembered but neglected allele can be selected again. The effectiveness of the organism increases because it is more ready to face a changing environment. Diploidy permits carrying along multiple possibilities while only one is expressed: an old lesson learned from experience is held in abeyance rather than discarded.

DIPLOIDY AND DOMINANCE IN GENETIC ALGORITHMS
Two evolving dominance mechanisms have been proposed.
First scheme:
o Each binary gene is described by two genes: a modifier gene and a functional gene.
o The functional gene takes the value 1 or 0.
o The modifier gene takes either M or m.
o Here 0 is taken as dominant.
o A dominance expression map is constructed as follows:
     0M 0m 1M 1m
  0M  0  0  0  0
  0m  0  0  0  1
  1M  0  0  1  1
  1m  0  1  1  1

o Similarly, a single-locus tri-allelic dominance map was introduced as follows:
     0  1  2
  0  0  0  1
  1  0  1  1
  2  1  1  1

o Later Holland studied this further and represented the three alleles as {0, 10, 1}, where 10 is a recessive 1.
o Here the most effective allele becomes dominant and shields the other.
o A mutation-like operator is needed to map 1 to 10, 10 to 0, and so on.
o Results: better population diversity, but no improvement in average or final performance compared with the haploid simulation.
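The tri-allelic scheme above can be sketched as a lookup table (a minimal illustration; the names `DOMINANCE`, `express` and the token `"r1"` for the recessive 1 are ours):

```python
# Tri-allelic dominance map: alleles 0, "r1" (recessive 1) and 1 (dominant 1).
# 1 dominates 0; "r1" yields 1 only when homozygous, otherwise 0 wins.
DOMINANCE = {
    (0, 0): 0, (0, "r1"): 0, (0, 1): 1,
    ("r1", "r1"): 1, ("r1", 1): 1, (1, 1): 1,
}

def express(chrom_a, chrom_b):
    """Expressed phenotype of a diploid pair, locus by locus."""
    def lookup(a, b):
        # The map is symmetric, so try both orderings of the pair
        return DOMINANCE.get((a, b), DOMINANCE.get((b, a)))
    return [lookup(a, b) for a, b in zip(chrom_a, chrom_b)]
```

For example, pairing the strands (0, r1, 1) and (0, 0, r1) expresses the phenotype (0, 0, 1): the recessive 1 is shielded at the second locus but expressed through dominance of 1 at the third.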

BRINDLE'S STUDY ON DOMINANCE
Brindle introduced six schemes:
o Random, Fixed, Global Dominance
The dominance of each binary allele is determined for all loci, for all time, at the beginning.

At each location an unbiased coin is tossed and a single dominance map is recorded. The dominant allele is expressed whether heterozygous or homozygous; the recessive allele is expressed only when homozygous.
o Variable, Global Dominance
The probability of dominance of 0 or 1 at a locus equals the proportion of 0s and 1s at that location. The expression of an allele at a heterozygous locus is decided by a Bernoulli trial.
o Deterministic, Variable, Global Dominance
The proportion of 0s and 1s at each location is calculated; the allele with the greater proportion is declared dominant.
o Choose A Random Chromosome
A chromosome is selected from the pair at random and its alleles are taken as dominant. This is equivalent to selecting and using one of a heterozygous pair at random.
o Dominance Of The Better Chromosome
Compare the fitness of each chromosome and choose the better one as dominant.
o Haploid Controls Diploid: Adaptive Dominance
A third (haploid) chromosome carries an adaptive dominance map that determines the expression of the normal diploid pair. The dominance map can thus be constructed dynamically by the Genetic Algorithm.
Conclusion:
o Many objections were raised to Brindle's proposals on dominance.

AN ANALYSIS OF DOMINANCE AND DIPLOIDY IN GENETIC ALGORITHM SEARCH
We can add dominance and diploidy to the schema theorem. Let He be the expressed schema and H the physical schema. Then
m(H, t+1) >= m(H, t) (f(He)/favg) [1 - pc d(H)/(l-1) - o(H) pm]
where d(H) is the defining length and o(H) the order of H. For a fully dominant schema H, f(H) = f(He); in general it is expected that f(He) >= f(H).
Suppose there are only two competing alternatives at a locus - one dominant and one recessive. The dominant allele is expressed when heterozygous or homozygous; the recessive one is expressed only when homozygous. Let fd and fr be the expected fitness values, r = fd / fr, and K a crossover-mutation loss constant. Then the proportion of recessives in the next generation is given by:
o Pt+1 = Pt K (Pt + r (1 - Pt)) / ((1 - r) Pt^2 + r)
Plotting Pt+1 against Pt shows the long-run behaviour. Conclusions:
o The haploid case always destroys more alternatives than the corresponding diploid case.
o Under diploidy and dominance, mutation plays a lesser role.
10 : INTRODUCTION TO GENETICS BASED MACHINE LEARNING


Introduced by Holland, who suggested a language called the Broadcast Language. It consists of production rules (called broadcast units) over a 10-letter alphabet of bits and a wild card. Later the first GBML system, called Cognitive System Level One (CS-1), was implemented. There are applications in many different fields.

CLASSIFIER SYSTEMS
The most popular form of GBML. A classifier system is a machine learning system that learns syntactically simple string rules to guide its performance in an arbitrary environment. It consists of:
o Rule & message system
o Apportionment of credit system (modeled after an information-based service economy)
o Genetic Algorithm
RULE & MESSAGE SYSTEM
A kind of production system. Production rules have the format:
o If <condition> then <action>
o E.g. if <(0,0)> then <(4,0)>
Each rule is of a fixed length and is amenable to genetic operations. The system allows parallel activation of rules, in contrast to the single-rule firing of a usual expert system. The relative value of a rule is learned rather than fixed; competition ensures that good rules survive. A classifier's bank balance is taken as its fitness, and an internal currency is introduced.
<message> ::= {0, 1}^l is the basic token of information exchange.
<condition> ::= {0, 1, #}^l
A condition is matched by a message if the corresponding bits agree at every non-# position. E.g. #10# matches 1100 but not 1000. Once matched, a classifier becomes a candidate to post its message to the message list.
Example:
No. Classifier
1   01## : 0000
2   00#0 : 1100
3   11## : 1000
4   ##00 : 0001
o Suppose the environment sends the message 0111. Only classifier 1 is matched, so it may post its message 0000.
APPORTIONMENT OF CREDIT ALGORITHM
The Bucket Brigade Algorithm ranks each individual classifier according to its efficiency in achieving reward from the environment. It is an information economy, in which the right to trade is bought and sold by classifiers. They form a chain of middlemen from the information manufacturers (the environment) to the information consumers (the effectors). It has two components:

o Auction
When a classifier's condition matches, it qualifies for the auction. Each classifier maintains a record of its net worth, called its strength. Each matched classifier makes a bid B proportional to its strength; thus highly fit classifiers are given preference.
o Clearing House
When a classifier wins the auction, it clears its payment through the clearing house: a matched and activated classifier pays its bid to the classifiers responsible for sending the messages that matched its conditions.
Classifiers make bids Bi during the auction. Winning classifiers turn over their bids to the clearing house as payments Pi. A classifier may also have receipts Ri from its previous message-sending activity or from the environment, and pays a tax Ti. Thus the strength of the ith classifier evolves as
Si(t+1) = Si(t) - Pi(t) - Ti(t) + Ri(t)
Bids are proportional to strength, Bi = Cbid * Si. If there is noise, the effective bid is EBi = Bi + N(0, sigma_bid); the winner pays its bid Bi, not EBi, to the clearing house. The tax is Ti = Ctax * Si. So the difference equation becomes
S(t+1) = S(t) - Cbid S(t) - Ctax S(t) + R(t)
i.e. S(t+1) = (1 - k) S(t) + R(t), where k = Cbid + Ctax.
The system is stable only when R(t) is bounded. Ignoring R(t), S(t+1) = (1 - k) S(t), so S(n) = (1 - k)^n S(0) for an active classifier that receives no receipts.
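The condition-matching rule and the strength difference equation above can be sketched as follows (a minimal illustration; function names are ours):

```python
def matches(condition, message):
    """A condition over {0,1,#} matches a binary message iff every
    non-# position agrees (e.g. '#10#' matches '1100' but not '1000')."""
    return len(condition) == len(message) and all(
        c == "#" or c == m for c, m in zip(condition, message))

def strength_series(s0, c_bid, c_tax, receipts):
    """Iterate S(t+1) = (1 - k) S(t) + R(t), with k = c_bid + c_tax."""
    k = c_bid + c_tax
    s, out = s0, [s0]
    for r in receipts:
        s = (1 - k) * s + r
        out.append(s)
    return out
```

With no receipts the strength of an active classifier decays geometrically, S(n) = (1 - k)^n S(0), which is why a classifier must eventually be paid to survive the auction.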

GENETIC ALGORITHM
In order to inject new and better rules into the bucket brigade system, a Genetic Algorithm is used. Here the setting differs from the optimization case: populations may overlap across generations, and a generation gap is used. Selection is through roulette wheel selection with De Jong's crowding procedure. Mutation changes a symbol to one of the other alternatives (e.g. 0 → {1, #}) with some probability.
SIMPLE CLASSIFIER SYSTEM (SCS)
An SCS, a simple version of the classifier system, has been developed. Experimental results show that an SCS with a Genetic Algorithm outperforms both an SCS without a Genetic Algorithm and random guessing.
11 : APPLICATIONS OF GENETICS BASED MACHINE LEARNING
GBML systems discover better computer programs by applying selection, recombination and other genetic operators. Holland pioneered the theoretical foundation of GBML. There followed:
o The proposal of the Broadcast Language.
o The implementation of the first classifier system (Cognitive System 1).
o A proposal to construct complex machines built from fixed components with the schemata property.

CS-1

Classifier conditions are constructed over {0, 1, #}. There are many resources (corresponding, e.g., to hunger and thirst), and the system maintains a different reservoir for each. Current resource levels determine current demand, which in turn determines which rules to activate. An epochal algorithm is used instead of the bucket brigade algorithm; an epoch is the time period between two payoff events. The parameters are the predicted payoff values ui. Let di be the current demand (i.e. how low the reservoir level is). Then the appropriation value = di * ui. A roulette wheel weighted by M, where M = a match score that increases with rule specificity, is used to select the winner. The epochal apportionment algorithm tracks the accuracy of a classifier's predicted payoff using three parameters:
o Age
o Frequency
o Attenuation
Results show the procedure outperforms the other methods tested.

LS-1 SYSTEM
Smith's LS-1 system has a different architecture. Holland's CS treats rules as individuals, whereas LS-1 treats rule sets as the individuals in a population. It has four genetic operators:
o Reproduction
o Mutation
o Modified crossover
o Inversion (i.e. r1:r2:r3 → r3:r2:r1)
Results: good, but they cannot be compared to CS-1 as the measures are different.
BOOKER'S FOOD AND POISON LEARNER
Studies:
o The connection between classifier systems and cognitive science.
o Modifications to the Genetic Algorithm that support machine learning.
o Application of classifier systems to the problem of finding food and avoiding poison.
It uses the split architecture of CS-1 and introduces two mechanisms:
o Sharing: classifiers whose conditions match the same pattern share the payoff.
o Marriage restriction: mating of complementary patterns is restricted.
EYE-EYE COORDINATION

Wilson's classifier system for the sensory-motor coordination of a video camera. The task is to learn to center an object in the image by moving the camera in the right direction. It uses a modified CS-1 architecture and a complex retina-cortex mapping. Instead of one-dimensional strings, 4 by 4 arrays are used as rules, with a two-dimensional crossover.
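One plausible way to realize a two-dimensional crossover on such 4 by 4 rule arrays is to exchange a random rectangular patch between two parents. This patch-based scheme is our assumption for illustration, not necessarily Wilson's exact operator:

```python
import random

def crossover_2d(parent_a, parent_b, rng=random):
    """Swap a random rectangular patch between two equal-sized 2-D arrays."""
    rows, cols = len(parent_a), len(parent_a[0])
    # Pick two distinct row cuts and two distinct column cuts
    r1, r2 = sorted(rng.sample(range(rows + 1), 2))
    c1, c2 = sorted(rng.sample(range(cols + 1), 2))
    child_a = [row[:] for row in parent_a]
    child_b = [row[:] for row in parent_b]
    for r in range(r1, r2):
        for c in range(c1, c2):
            child_a[r][c], child_b[r][c] = parent_b[r][c], parent_a[r][c]
    return child_a, child_b
```

Every cell of the children still comes from one parent or the other, so no genetic material is created or lost, just as with one-dimensional crossover.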

ANIMAT CLASSIFIER SYSTEM
Wilson's roaming classifier system searches a two-dimensional woods, seeking food and avoiding trees. It uses an 18 by 58 rectangular grid that contains trees (T) and food (F). The ANIMAT, represented by *, knows only its immediate surroundings. E.g.
B T T
B * F
B B B
It generates the environmental message TTFBBBBB. Taking T → 01, F → 11 and B → 00, the bit representation is 0101110000000000. There are eight classifiers to recognize this. Four mechanisms are used:
o The match set: the set of matching classifiers.
o A create operator, invoked when there is no matching classifier.
o A partial intersection operator: two rules with the same action are aligned by replacing mismatched positions with #.
o Time-to-payoff estimation.
PIPELINE OPERATIONS CLASSIFIER SYSTEM
Due to Goldberg. It has two parts:
o Optimization of pipeline operations by a Genetic Algorithm.
o Learning control of pipeline operations by a classifier system.
BOOLE
Due to Wilson: a classifier system that learns difficult Boolean functions, applied to a function built from NOT, AND and OR.
CL-ONE
Parallel semantic networks in a classifier framework, by Forrest, who developed a compiler to translate code written in the semantic network language KL-ONE into classifier system format, thus connecting symbolic Artificial Intelligence to classifier systems. It has four components:
o Parser and classifier generator

o Symbol table manager
o External command processor
o Classifier system
LEARNING SIMPLE AND SEQUENTIAL PROGRAMS
Due to Cramer. Shows that GAs can be used with programs not in production-rule format. Worked with the language PL, converted to a simpler subset, and devised two coding schemes: JB and TB.
GENETIC PROGRAMMING
Due to Koza. Genetic Algorithms can be used to generate programs. Highly successful.
12 : INTRODUCTION
Challenge: how to manage ever-increasing amounts of information.
Solution: Data Mining and Knowledge Discovery in Databases (KDD).

INFORMATION AS A PRODUCTION FACTOR
Most international organizations produce more information in a week than many people could read in a lifetime. The human ability to learn and interpret is not sufficient; mechanization of the filtering, selection and interpretation of data is important. E.g. the stock market.
COMPUTER SYSTEMS THAT CAN LEARN
Adaptation to the environment is natural - e.g. plants, animals, human beings. Learning is a form of adaptation. Machines can be programmed to learn from mistakes: thus expert systems give way to learning systems.
DATA MINING MOTIVATION
Mechanical production of data creates the need for mechanical consumption of data. Large databases hold vast amounts of information; the difficulty lies in accessing it.
KDD AND DATA MINING
KDD
o Extraction of knowledge from data.
o Official definition: the non-trivial extraction of implicit, previously unknown and potentially useful knowledge from data.
Data Mining
o The discovery stage of the KDD process.
o The process of discovering patterns, automatically or semi-automatically, in large quantities of data.

o Patterns discovered must be useful: meaningful in that they lead to some advantage, usually economic.

Data mining is a multi-disciplinary field.
DATA MINING VS. QUERY TOOLS
SQL: for when you know exactly what you are looking for.
Data Mining: for when you only vaguely know what you are looking for.
PRACTICAL APPLICATIONS
KDD is more complicated than initially thought:
o 80% preparing data
o 20% mining data
DATA MINING TECHNIQUES
Not so much a single technique; more the idea that there is more knowledge hidden in the data than shows itself on the surface.
13 : WHAT IS LEARNING?
LEARNING
An individual learns how to carry out a certain task by making a transition from a situation in which the task cannot be carried out to a situation in which the same task can be carried out under the same circumstances.
SELF-LEARNING COMPUTER SYSTEMS
A self-learning computer can generate programs itself, enabling it to carry out new tasks.
MACHINE LEARNING AND THE METHODOLOGY OF SCIENCE


[Figure: the empirical cycle of scientific research - theory formation and theory falsification]
MACHINE LEARNING
The patterns that machine learning programs find can never be definitive theories. They are only hypotheses with provisional value, so the output of machine learning programs must be checked for statistical relevance.

CONCEPT LEARNING
Criteria for evaluating learned concepts:
o Recognition by experience
o Classification accuracy
o Transparency
o Statistical relevance
o Information content
o Complexity of the search space
A KANGAROO IN MIST

Complexity of Search Spaces
It is important to know the complexity of the search space beforehand. Important aspects of learning algorithms include:
o Supervised vs. unsupervised learning
o Background knowledge
o Bias (constraints)
o Batch vs. incremental learning
o Noise and redundancy
14 : DATA MINING AND THE DATA WAREHOUSE

WHAT IS A DATA WAREHOUSE?
Defined in many different ways, but not rigorously:
o A decision support database that is maintained separately from the organization's operational database.
o It supports information processing by providing a solid platform of consolidated, historical data for analysis.

"A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management's decision-making process." - W. H. Inmon.
Data Warehousing:
o The process of constructing and using data warehouses.

DATA WAREHOUSE Subject-Oriented o Organized around major subjects, such as customer, product, sales. o Focusing on the modeling and analysis of data for decision makers, not on daily operations or transaction processing. o Provide a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process. Integrated o Constructed by integrating multiple, heterogeneous data sources. Relational databases, flat files, on-line transaction records. o Data cleaning and data integration techniques are applied. Ensure consistency in naming conventions, encoding structures, attribute measures, etc. among different data sources E.g. Hotel price: currency, tax, breakfast covered, etc. When data is moved to the warehouse, it is converted. Time Variant o The time horizon for the data warehouse is significantly longer than that of operational systems. Operational database: current value data. Data warehouse data: provide information from a historical perspective (e.g. past 5-10 years). o Every key structure in the data warehouse contains an element of time, explicitly or implicitly but the key of operational data may or may not contain time element. Non-volatile o A physically separate store of data transformed from the operational environment. o Operational update of data does not occur in the data warehouse environment. Does not require transaction processing, recovery, and concurrency control mechanisms. Requires only two operations in data accessing: initial loading of data and access of data. DESIGNING DECISION SUPPORT SYSTEM Requirements of user. Hardware and Software requirements. Integration with Data Mining. CLIENT / SERVER AND DATA WAREHOUSE Top down or bottom up. Requirement of data marts. COST JUSTIFICATION Speed.

Complexity. Repetition. Comparison with expert system. 15 : THE KNOWLEDGE DISCOVERY PROCESS

Pre-processing
o Data selection
o Cleaning
o Coding
Data Mining
o Select a model
o Apply the model
Analysis of results and assimilation
o Take action and measure the results

THE KDD PROCESS


[Figure: the KDD process - from operational and external data through data selection, cleaning (domain consistency, de-duplication, disambiguation), enrichment, coding and data mining (clustering, segmentation, prediction) to reporting, with information requirements driving the process and actions feeding back into the data]

DATA PREPROCESSING Data Selection o Identify the relevant data, both internal and external to the organization. o Select the subset of the data appropriate for the particular data mining application. o Store the data in a database separate from the operational systems. Cleaning o Domain consistency: replace certain values with null. o De-duplication: customers are often added to the DB on each purchase transaction. o Disambiguation: highlighting ambiguities for a decision by the user. E.g. if names differed slightly but addresses were the same.
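The disambiguation step above can be sketched as follows. This is a minimal illustration under our own assumptions (the record layout, the function name, and the rule "same address, different names" as the ambiguity trigger are all ours):

```python
def flag_ambiguous(records):
    """Group (name, address) customer records by address and return the
    groups whose names differ, for a manual de-duplication decision."""
    by_address = {}
    for name, address in records:
        by_address.setdefault(address, set()).add(name)
    # Keep only addresses carrying more than one distinct name
    return {addr: names for addr, names in by_address.items()
            if len(names) > 1}
```

A real system would add fuzzy name matching; here the point is only that ambiguities are highlighted for the user rather than resolved automatically.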

Enrichment o Additional fields are added to records from external sources which may be vital in establishing relationships. Coding o E.g. take addresses and replace them with regional codes o E.g. transform birth dates into age ranges It is often necessary to convert continuous data into range data for categorization purposes.
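The coding of birth dates into age ranges can be sketched as follows (a minimal illustration; the function name, the reference year and the bucket width are our assumptions):

```python
def to_age_range(birth_year, current_year=2024, width=10):
    """Map a birth year to a coarse age range such as '20-29'."""
    age = current_year - birth_year
    lo = (age // width) * width  # round down to the start of the bucket
    return f"{lo}-{lo + width - 1}"
```

Converting a continuous attribute into a handful of ranges like this is what makes it usable for the categorization-based mining techniques discussed below.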

DATA MINING
Preliminary Analysis
o Much interesting information can be found simply by querying the data set.
o This may be supported by a visualization of the data set.
Choose one or more modeling approaches. There are two styles of data mining:
o Hypothesis testing
o Knowledge discovery
The styles and approaches are not mutually exclusive.
DATA MINING TECHNIQUES
Not so much a single technique; more the idea that there is more knowledge hidden in the data than shows itself on the surface. Any technique that helps to extract more out of data is useful:
o Query Tools
80% of the interesting information can be obtained through these, but the remaining 20% is more vital to the business. Queries provide a naive prediction that any better algorithm has to beat.
o Statistical Techniques
o Visualization
Gives a better idea about data sets and possible patterns, e.g. a scatter diagram.

Likelihood & distance: similar records are close together in the data space. Records become points in a multidimensional space, so we can look for clusters in that space, e.g. the possible customers of a product.

o On-Line Analytical Processing (OLAP)
Assumes multidimensional data are available, so information can be accessed along business dimensions.
OLAP vs. Data Mining: OLAP tools do not learn and create no new knowledge; they cannot search for solutions. Data Mining is more powerful.
o Case-Based Learning (k-nearest neighbor)
Records that are close to each other lie in each other's neighborhood. To predict the behavior of an individual, look at the behavior of its k nearest neighbors and take their average (or majority) as the prediction. The training set includes class labels: examine the k items nearest to the item to be classified, and place the new item in the class with the greatest number of close items.

KNN Algorithm
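The k-nearest-neighbor classification described above can be sketched as follows (a minimal illustration; names and the Euclidean distance choice are ours):

```python
from collections import Counter

def knn_classify(train, query, k=3):
    """train: list of (point, label) pairs; classify query by the
    majority label among the k nearest points (Euclidean distance)."""
    dist = lambda p, q: sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
    nearest = sorted(train, key=lambda pl: dist(pl[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

For a numeric target one would average the neighbors' values instead of taking a majority vote; the neighborhood idea is the same either way.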


o Decision Trees
A tree representation of the data that can identify the conditions leading to an outcome, e.g. predicting car ownership from age.

o Association Rules
Identify co-occurring purchase habits, e.g. blue jeans and white T-shirts.
Example: in market basket data, items frequently purchased together, such as Bread => Butter.
Uses:
o Placement
o Advertising
o Sales
o Coupons
Objective: increase sales and reduce costs.
o Neural Networks

There are mainly three models.
Perceptrons
o The perceptron is one of the simplest neural networks.
o It has no hidden layers.
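A minimal perceptron sketch, illustrating the classic update rule on a threshold unit with no hidden layers (function names and parameter choices are ours):

```python
def perceptron_train(samples, epochs=20, lr=0.1):
    """samples: list of (inputs, target) with target in {0, 1}.
    Learns weights and a bias with the perceptron update rule."""
    n = len(samples[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, t in samples:
            y = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            err = t - y  # -1, 0 or +1
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

def perceptron_predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
```

On a linearly separable problem such as Boolean OR the rule converges; problems like XOR need the hidden layers of the back-propagation networks below.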

Back-propagation networks
Kohonen Self-Organizing Maps
o The brain has dedicated regions such as visual maps and maps of spatial position.
o Initially, the SOM assigns a random vector to each unit.
o During training, these vectors are incrementally adjusted to give better coverage of the data.
o Competitive, unsupervised learning.
o The scheme mimics how neurons work in the brain:
A neuron's firing promotes the firing of those near it.
Neurons far apart inhibit each other.
Neurons have specific, non-overlapping tasks.

o Genetic Algorithms (refer to the previous notes)
16 : SETTING UP A KDD ENVIRONMENT
DIFFERENT FORMS OF KNOWLEDGE
Shallow: easily retrievable using SQL.
Multidimensional: accessible with OLAP.
Hidden: extractable with pattern recognition algorithms.
Deep: discoverable only with clues, e.g. decryption is possible only with the key.
SIX STAGES OF KDD
Data Selection

Cleaning
Enrichment
Coding
Data Mining
Reporting

DATA MINING
Different types of tasks can be tackled by suitable techniques:
o Classification tasks: association rules, k-nearest neighbor, decision trees.
o Problem-solving tasks: Genetic Algorithms.
o Knowledge engineering tasks: inductive logic algorithms.
A data mining algorithm is selected based on:
o Quality of input
o Quality of output
o Performance
10 RULES FOR SETTING UP A GOOD DATA MINING ENVIRONMENT
o Support extremely large data sets.
o Support hybrid learning (classification).
o Establish a data warehouse.
o Introduce data cleaning facilities.
o Facilitate working with dynamic coding.
o Integrate with DSS.
o Choose an extendable architecture.
o Support heterogeneous databases.
o Introduce a client-server architecture.
o Introduce cache optimization.
17 : SOME REAL-LIFE APPLICATIONS
o Customer profiling for a large bank.
o CAPTAIN - a career planner for pilots at KLM airlines.
o Discovering foreign key relationships.
18 : SOME FORMAL ASPECTS OF LEARNING ALGORITHMS
LEARNING AS A COMPRESSION OF DATA
Learning reduces the search space, i.e. learning is similar to data compression. Not all compressions are easily learnable, e.g. decryption.
LEARNING ALGORITHM AS A BLACK BOX
With an input file message and an output file message. Types of messages:
o Unstructured or random messages

o Highly structured messages with patterns that are easy to find.
o Highly structured data sets that are difficult to decipher.
o Partially structured data sets.

