CSIT5900 Lecture 2: Reactive Agents
Fangzhen Lin
Department of Computer Science and Engineering
Hong Kong University of Science and Technology
Fangzhen Lin (HKUST), Lecture 2: Reactive Agents

Overview

Agent is a term for entities that can perform certain tasks in some environment autonomously. An agent needs to have perception, can perform some actions, and has some purpose (goal). We first study stimulus-response agents, then consider adding states to these agents. We consider how to control these agents using rules, neural networks, and genetic algorithms, and whether the control programs are designed or learned (evolved).

Stimulus-Response Agents

Stimulus-Response (S-R) agents are machines without internal states that simply react to immediate stimuli in their environments.

A Boundary-Following Robot

A robot in a two-dimensional grid world containing an outer boundary and a solid object. The robot senses whether the eight surrounding cells (s1-s8, arranged s1 s2 s3 on top, s8 and s4 at the sides, s7 s6 s5 on the bottom) are free for it to occupy. A robot starting inside will go clockwise around the inside of the outer boundary; a robot starting next to the object will go counterclockwise around the object's outside boundary. (Figure © 1998 Morgan Kaufmann Publishers.)

Sensors and Actions

Sensors: eight of them, s1-s8. si = 1 if the corresponding cell is occupied, and 0 otherwise.
Actions: the robot can move to an adjacent free cell. There are four of them:
1 north - moves the robot one cell up.
2 east - moves the robot one cell to the right.
3 south - moves the robot one cell down.
4 west - moves the robot one cell to the left.
All of them have their indicated effects unless the robot attempts to move to a cell that is not free; in that case, the action has no effect.
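The grid world, sensors, and actions above can be sketched in Python (an illustrative sketch, not part of the course materials; the coordinate convention is an assumption):

```python
# Occupied cells are stored in a set of (x, y) pairs; x increases east, y north.
# Sensor layout from the figure: s1 s2 s3 on top, s8 _ s4 in the middle,
# s7 s6 s5 on the bottom.

# Offsets (dx, dy) for sensors s1..s8.
SENSOR_OFFSETS = [(-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1), (-1, 0)]
MOVES = {"north": (0, 1), "east": (1, 0), "south": (0, -1), "west": (-1, 0)}

def sense(pos, occupied):
    """Return (s1, ..., s8): 1 if the corresponding cell is occupied, else 0."""
    x, y = pos
    return tuple(1 if (x + dx, y + dy) in occupied else 0
                 for dx, dy in SENSOR_OFFSETS)

def move(pos, action, occupied):
    """Move to the adjacent cell; the action has no effect if the cell is not free."""
    dx, dy = MOVES[action]
    target = (pos[0] + dx, pos[1] + dy)
    return pos if target in occupied else target
```

For example, with only the cell directly north occupied, `sense` sets s2 = 1 and all other sensors to 0, and a `north` move leaves the robot where it is.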
Controlling the Robot

We now have a model of the problem: an environment modeled by a two-dimensional grid world, an agent with sensors to check whether nearby cells are occupied, a collection of actions for moving around the world, and the task of following the boundary of the first obstacle that it encounters. We now need an algorithm to control the robot:
- design a production system to control it;
- learn a control function;
- evolve a program to control it.
We start with the manual design approach.

Basic Architecture

Designer's job: specify a function of the sensory inputs that selects actions appropriate for the task at hand (boundary-following in our example). In general, it is often convenient to divide the function into two components: perceptual processing, which maps the sensory input to a feature vector X, and an action function, which maps the feature vector to an action. The designer assigns intended meanings to the features (e.g., "next to wall", "in a corner"). (Figure © 1998 Morgan Kaufmann Publishers.)

Perception

The robot has 8 sensory inputs s1,...,s8, so there are 2^8 = 256 different combinations of their values. Although some of the combinations are not legal because of our restriction against tight spaces (spaces between objects and boundaries that are only one cell wide), there are still many legal ones. For the task at hand, we only need four binary-valued features of the sensors, x1, x2, x3, x4: each feature has value 1 if and only if at least one of its designated (shaded) cells is not free. (Figure © 1998 Morgan Kaufmann Publishers.)

Action

Given the four features, we can specify a function of these features that selects the right action for boundary-following as follows: if none of the four features equals 1, then move north (it could be any direction);
if x1 = 1 and x2 = 0, then move east;
if x2 = 1 and x3 = 0, then move south;
if x3 = 1 and x4 = 0, then move west;
if x4 = 1 and x1 = 0, then move north.
We now need to represent and implement the perception and action functions. To that end, we briefly review Boolean algebra below.

Boolean Algebra

A Boolean function maps an n-tuple of 0/1 values to {0, 1}. Boolean algebra is a convenient notation for representing Boolean functions using · (and, often omitted), + (or), and an overbar (negation):
1 + 1 = 1, 1 + 0 = 1, 0 + 0 = 0,
1 · 1 = 1, 1 · 0 = 0, 0 · 0 = 0,
1̄ = 0, 0̄ = 1.
The two binary operators are commutative and associative. Examples: x4 is s1 + s8; the condition for the robot moving north is x̄1x̄2x̄3x̄4 + x4x̄1.

Production Systems

One convenient representation for an action function is a production system. A production system is a sequence of productions, which are rules of the form c → a, meaning: if condition c is true, then do a. Here a could be a primitive action, a call to a procedure, or a set of actions to be executed simultaneously. When more than one production can fire (their condition parts are true), the first one is applied. The following is a production-system representation of the action function for our boundary-following robot:
x4x̄1 → north,
x3x̄4 → west,
x2x̄3 → south,
x1x̄2 → east,
1 → north.

Here is a production system for getting the robot to a corner:
inCorner → nil,
1 → bf,
where inCorner is a feature of the sensory input corresponding to detecting a corner, nil is the do-nothing action, and bf is the action that the above boundary-following production system produces.
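A production system like the ones above can be interpreted with a few lines of Python (a sketch for illustration, not the course's code):

```python
# Productions are (condition, action) pairs, tried in order; the first
# production whose condition is true fires.

def run_production_system(productions, features):
    """Return the action of the first production whose condition holds."""
    for condition, action in productions:
        if condition(features):
            return action
    return None

# The boundary-following rules, with x = (x1, x2, x3, x4) indexed from 0.
boundary_following = [
    (lambda x: x[3] and not x[0], "north"),  # x4 x̄1 → north
    (lambda x: x[2] and not x[3], "west"),   # x3 x̄4 → west
    (lambda x: x[1] and not x[2], "south"),  # x2 x̄3 → south
    (lambda x: x[0] and not x[1], "east"),   # x1 x̄2 → east
    (lambda x: True, "north"),               # 1 → north (default)
]
```

For instance, with feature vector (1, 0, 0, 0) the fourth rule fires and the robot moves east; with (0, 0, 0, 0) only the default rule applies.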
Learning Action Functions - Supervised Learning with TLUs

Supervised Learning

Given a training set consisting of:
- a set Σ of n-dimensional vectors (these could be the vectors of the robot's sensory inputs, or the feature vectors computed by the perceptual processing component);
- for each vector in Σ, an associated action called the label of the vector (this could be the action that the learner observed the teacher perform, or simply the desired action given by the designer of the robot);
the task of learning is to compute a function that responds "acceptably" to the members of the training set: this usually means a function that agrees with as many members of the training set as possible. What class of functions to learn is crucial. Here we consider a class of simple linear weighted functions called TLUs.

Example - When to Move East

A training set for learning when to move east (the label is x1x̄2):

Number  Sensory vector  x1x̄2 (move east)
1       00001100        0
2       11100000        1
3       00100000        1
4       00000000        0
5       00001000        0
6       01100000        1

(© 1998 Morgan Kaufmann Publishers.)

TLUs

Boolean functions, and thus production systems whose conditions are Boolean functions, can be implemented easily as programs. They can also be implemented directly as circuits. A type of circuit of particular interest in AI is the threshold logic unit (TLU), also called a perceptron: given inputs x1,...,xn with weights w1,...,wn and threshold θ, its output is

f = 1 if Σ_{i=1}^{n} wi·xi ≥ θ, and f = 0 otherwise.

(Figure © 1998 Morgan Kaufmann Publishers.)

Not all Boolean functions can be implemented as TLUs; those that can be are called linearly separable functions.
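The TLU computation is a one-liner in Python (a minimal sketch; the AND example below is a standard illustration, not taken from the slides):

```python
# A threshold logic unit: output 1 iff the weighted sum of the inputs
# reaches the threshold.

def tlu(weights, threshold, inputs):
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0
```

For example, two-input AND is linearly separable: with weights (1, 1) and threshold 1.5, the output is 1 only when both inputs are 1. XOR, by contrast, has no such weights and threshold.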
Here are two examples from the figure: one TLU computes x1x2x̄3 with weights 1, 1, −1 and threshold 1.5; another computes x1x̄2 from inputs s2, s3, s4, s5 with weights 1, 1, −2, −2 and threshold 0.5. (© 1998 Morgan Kaufmann Publishers.)

Networks

A single production rule can sometimes be encoded as a TLU, but a production system needs a network of TLUs to implement. Such networks are often called (artificial) neural networks, and they can be used to represent any Boolean function.

Neurons

From http://en.wikipedia.org/wiki/Neuron: A neuron is an electrically excitable cell that processes and transmits information through electrical and chemical signals via synapses. Neurons connected to each other form neural networks, and are the core components of the nervous system. The number of neurons in the brain varies dramatically from species to species. One estimate (published in 1988) puts the human brain at about 100 billion (10^11) neurons and 100 trillion (10^14) synapses. The fruit fly Drosophila melanogaster, a common subject in biological experiments, has around 100,000 neurons and exhibits many complex behaviors.

Learning TLUs

Recall that a TLU is determined by: the number of inputs; for each input an associated number called its weight; and a number called the threshold. We know the number of inputs from the training set, and we can assume that the threshold is always 0 by introducing a new input whose weight is the negative of the original threshold and whose value is always 1. So what we need to learn is the vector of weights.

The Error-Correction Procedure

Given a vector X from the training set (augmented by the (n+1)th special input 1), let d be the label (desired output) of this input, and f the actual output of the current TLU. The weight change rule is:
W ← W + c(d − f)X,
where the number c is called the learning rate.
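The weight-change rule above, applied repeatedly over the training set, can be sketched as follows (an illustrative sketch; the initialization range and epoch limit are assumptions):

```python
import random

def train_tlu(training_set, c=0.1, max_epochs=100):
    """Error-correction training of a TLU with threshold folded into the
    weights. training_set: list of (input_vector, desired_output) pairs."""
    n = len(training_set[0][0])
    # One extra weight plays the role of the (negated) threshold.
    w = [random.uniform(-0.5, 0.5) for _ in range(n + 1)]
    for _ in range(max_epochs):
        stable = True
        for x, d in training_set:
            xa = list(x) + [1]  # augment with the constant input 1
            f = 1 if sum(wi * xi for wi, xi in zip(w, xa)) >= 0 else 0
            if f != d:
                # W ← W + c(d − f)X
                w = [wi + c * (d - f) * xi for wi, xi in zip(w, xa)]
                stable = False
        if stable:  # a full pass with no corrections: done
            break
    return w
```

On a linearly separable training set (for example, the AND function) the loop is guaranteed to reach a stable weight vector.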
To use this rule to learn a TLU: start with a random initial weight vector, choose a learning rate c, and then iterate through the training set repeatedly until the weight vector becomes stable. If the training set can be captured by a linearly separable Boolean function, then this procedure will terminate. The exact number of steps needed depends on:
- the initial weight values;
- the learning rate in the weight-updating rule;
- the order of presentation of the training examples.

Learning Action Functions with Genetic Programming

Machine Evolution

Humans became what we are now through millions of years of evolution from apes. Presumably, we could simulate this evolutionary process to evolve smart machines as well. Evolution has two key components:
- reproduction (how parents produce descendants); and
- survival of the fittest (how to select which of these descendants will reproduce more descendants).

Genetic Programming (GP)

GP is about evolving programs to solve specific problems. The first issue is the representation of programs: the techniques for evolving C programs would be different from those for ML programs. The general idea is:
- decide what the legal programs are;
- define a fitness function;
- select a set of legal programs as generation 0;
- produce the next generation, repeating until a desired program is produced.
Three common techniques for producing the next generation are:
- copy (clone a good gene to the next generation);
- crossover (mix up two parents' genes);
- mutation (mutate a gene).
I will illustrate the idea using the boundary-following robot.
The Task

Wall-following in the following grid world (we have to fix the environment). The robot's eight sensors are named by direction: nw, n, ne, e, se, s, sw, w. (Figure © 1998 Morgan Kaufmann Publishers.)

Program Representation

We shall assume that the program we are looking for can be constructed from the primitive ones (east, west, etc.) by Boolean operations and conditionals. The example on the following slide represents a program that does the same thing as the following production system:
(n + ne)ē → east,
(e + se)s̄ → south,
(s + sw)w̄ → west,
1 → north.

As an expression tree, the program is:

(IF (AND (OR (n) (ne)) (NOT (e)))
    (east)
    (IF (AND (OR (e) (se)) (NOT (s)))
        (south)
        (IF (AND (OR (s) (sw)) (NOT (w)))
            (west)
            (north))))

(Figure © 1998 Morgan Kaufmann Publishers.)

Fitness Function

Given a program and a starting position for the robot, we run it for 60 steps, and then count the number of cells next to the wall that are visited during these 60 steps (max = 32, min = 0). For a given program, we do ten such runs with the robot starting at ten randomly chosen positions. The total count of next-to-the-wall cells visited is taken as the fitness of the program (max = 320, min = 0).

The GP Process

Generation 0: 5000 random programs. Procedure for producing generation n+1 from generation n: copy 10% of the programs from generation n to generation n+1. These programs are chosen by the following tournament selection process: 7 programs are randomly selected from the population, and the most fit of these seven is chosen.
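Tournament selection, and the subtree crossover used for the rest of the population, can be sketched over programs represented as nested lists (a sketch under that representational assumption, not the course's code):

```python
import random

def subtrees(tree, path=()):
    """Yield (path, subtree) for every node of a nested-list program tree."""
    yield path, tree
    if isinstance(tree, list):
        for i, child in enumerate(tree):
            yield from subtrees(child, path + (i,))

def replace_at(tree, path, new):
    """Return a copy of tree with the subtree at path replaced by new."""
    if not path:
        return new
    copy = list(tree)
    copy[path[0]] = replace_at(copy[path[0]], path[1:], new)
    return copy

def crossover(mother, father):
    """Replace a random subtree of the mother with a random subtree of the father."""
    m_path, _ = random.choice(list(subtrees(mother)))
    _, f_sub = random.choice(list(subtrees(father)))
    return replace_at(mother, m_path, f_sub)

def tournament_select(population, fitness, k=7):
    """Pick k programs at random; the fittest of them wins."""
    return max(random.sample(population, k), key=fitness)
```

Here `fitness` would be the wall-following score described above; crossover may return the child of any pair of crossover points, including replacing the whole mother program.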
The rest (90%) are produced from generation n by a crossover operation as follows: a mother and a father are chosen from generation n by the tournament selection process, and a randomly chosen subtree of the father replaces a randomly selected subtree of the mother. See the next slide for an example. Sometimes a mutation operator is also used: it selects a member of generation n by the tournament selection process, and then a randomly chosen subtree of it is deleted and replaced by a random tree. When used, the mutation operation is applied sparingly (maybe at a 1% rate).

An Example of the Crossover Operation

A mother program and a father program are cut at randomly chosen crossover points, and the father's subtree replaces the mother's subtree to produce the child program. (Figure © 1998 Morgan Kaufmann Publishers.)

Performance of GP

The performance of GP depends on:
- the size of generation 0;
- the copy rate, crossover rate, and mutation rate;
- the parameters used in the tournament selection process.
For the wall-following example, GP generates a perfect program after about 10 generations. (Figure: fitness vs. generation number; © 1998 Morgan Kaufmann Publishers.)

State Machines

Recall that S-R agents have no memory; they just respond to the current stimuli. Often one needs more powerful agents. State machines are agents that can remember the action they have just done and the state the environment was previously in, and can have some mental states. The action function of a state machine is then a mapping from the current sensory inputs, the state of the environment at the previous time step, the action that the machine has just taken, and the current mental states.
Basic Architecture of State Machines

The perceptual processing component produces the current feature vector Xt from the sensory input together with memory (the previous feature vector Xt−1 and the previous action at−1); the action function then selects the action at. (Figure © 1998 Morgan Kaufmann Publishers.)

Sensory-Impaired Boundary-Following Robot

Consider again our boundary-following robot, but assume now that it is somewhat sensory-impaired: its sensory inputs are only (s2, s4, s6, s8). Can you design an S-R agent for following a boundary based only on these four sensors?

Let w1-w8 be the features defined as follows:
- wi = si for i = 2, 4, 6, 8;
- w1 = 1 iff at the previous time step w2 = 1 and the robot moved east;
- w3 = 1 iff at the previous time step w4 = 1 and the robot moved south;
- w5 = 1 iff at the previous time step w6 = 1 and the robot moved west;
- w7 = 1 iff at the previous time step w8 = 1 and the robot moved north.
Using these features, the following production system will do the job:
w2w̄4 → east,
w4w̄6 → south,
w6w̄8 → west,
w8w̄2 → north,
w1 → north,
w3 → east,
w5 → south,
w7 → west,
1 → north.
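The memory features above can be computed step by step as follows (a sketch using dictionaries for the sensor and feature values; the representation is an assumption):

```python
def update_features(sensors, prev_w, prev_action):
    """Compute w1..w8 for the sensory-impaired robot.
    sensors: dict with keys 's2', 's4', 's6', 's8';
    prev_w: the previous feature dict (None on the first step);
    prev_action: the action just taken (None on the first step)."""
    # w2, w4, w6, w8 come straight from the sensors.
    w = {f"w{i}": sensors[f"s{i}"] for i in (2, 4, 6, 8)}
    # w1, w3, w5, w7 remember the previous feature vector and action.
    w["w1"] = 1 if prev_w and prev_w["w2"] == 1 and prev_action == "east" else 0
    w["w3"] = 1 if prev_w and prev_w["w4"] == 1 and prev_action == "south" else 0
    w["w5"] = 1 if prev_w and prev_w["w6"] == 1 and prev_action == "west" else 0
    w["w7"] = 1 if prev_w and prev_w["w8"] == 1 and prev_action == "north" else 0
    return w
```

For example, if the cell to the north was occupied (w2 = 1) and the robot just moved east, then w1 = 1 on the current step even though the four sensors may now all read 0: that is exactly the memory an S-R agent lacks.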