FIT5222 Planning and Automated Reasoning
Assignment 2: Pacman Capture the Flag

Pacman Capture the Flag was developed by UC Berkeley CS188. It is a multiplayer capture-the-flag variant of Pacman in which two teams, red and blue, compete against each other. Each team must eat the food on the far side of the map while defending its own food. We will hold a Pacman Capture the Flag competition at the end of this semester.

Part 1: Installation

Follow the instructions from week 1 to obtain the code for Pacman: Capture the Flag.

Part 2: Rules of Pacman: Capture the Flag

Characteristics:
● An agent needs to trade off offense versus defense.
● Only limited information is provided to agents.

Layout
The Pacman map is divided into two halves: blue (right) and red (left). Red agents (which all have even indices) must defend the red food while trying to eat the blue food. While on the red side, a red agent is a ghost; when crossing into enemy territory, it becomes a Pacman.

Scoring
As a Pacman eats food dots, those dots are stored up inside that Pacman and removed from the board. When a Pacman returns to its own side of the board, it "deposits" the food dots it is carrying, earning one point per food pellet delivered. Red team scores are positive, while blue team scores are negative. If a Pacman is eaten by a ghost before reaching its own side of the board, it explodes into a cloud of food dots that are deposited back onto the board.

Eating Pacman
When a Pacman is eaten by an opposing ghost, the Pacman returns to its starting position (as a ghost). No points are awarded for eating an opponent.

Power Capsules
If a Pacman eats a power capsule, agents on the opposing team become "scared" for the next 40 moves, or until they are eaten and respawn, whichever comes sooner. Agents that are "scared" are susceptible, while in the form of ghosts (i.e. while on their own team's side), to being eaten by a Pacman.
Specifically, if a Pacman collides with a "scared" ghost, the Pacman is unaffected and the ghost respawns at its starting position (no longer in the "scared" state).

Observations
Agents can only observe an opponent's configuration (position and direction) if they or their teammate is within 5 squares (Manhattan distance). In addition, an agent always receives a noisy distance reading for each agent on the board, which can be used to approximately locate unobserved opponents.

Winning
A game ends when one team returns all but two of the opponents' dots. Games are also limited to 1200 agent moves (300 moves for each of the four agents). If this move limit is reached, whichever team has returned the most food wins. If the score is zero (i.e., tied), the game is recorded as a tie.

Part 3: Getting Started and Useful Tools

Use "git pull" to pull the latest code from the pacman-public repo.

You can find two examples in the Pacman Capture the Flag project: baselineTeam.py and team_pddl.py. baselineTeam.py is the official example for Pacman Capture the Flag. team_pddl.py is a simple example of how PDDL can be used to guide the high-level actions of an agent (should the agent defend home, attack for food, or escape from enemies?). The low-level actions (exactly what to do to complete a high-level action) are then chosen by a Q-learning model, whose decisions are based on a set of weights and features.

Important code you should know in these examples:
● You can use the ReflexCaptureAgent(CaptureAgent) class as a template for designing your own Pacman agent.
● def chooseAction(self, gameState)
This is the function the game calls when it is the agent's turn to act; in other words, it is where things start for each turn.
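To make the 5-square visibility rule from the Observations section concrete, here is a small standalone sketch. The function names are illustrative, not part of the provided code base; in the actual framework you would use the distance utilities it supplies and positions obtained via the gameState accessors:

```python
def manhattan_distance(p, q):
    # Grid (Manhattan) distance between two (x, y) squares.
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def opponent_visible(my_pos, teammate_pos, opponent_pos, radius=5):
    # An opponent's exact configuration (position and direction) is revealed
    # only if the agent or its teammate is within `radius` squares of it;
    # otherwise only a noisy distance reading is available.
    return (manhattan_distance(my_pos, opponent_pos) <= radius or
            manhattan_distance(teammate_pos, opponent_pos) <= radius)
```

When this check fails for an opponent, getPosition() returns no exact position and your agent must fall back on the noisy distance readings to estimate where that opponent is.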
● gameState
This variable is an instance of the GameState class (defined in capture.py), which contains the current state of the game and a set of useful functions for easily retrieving information about that state.
● getAgentState(i)
A function of a GameState instance. It returns an instance of the AgentState class, which holds the currently accessible state of an agent. AgentState provides the getPosition() and getDirection() functions, which return the agent's position (if available) and the agent's direction.

Important code you should read:
● GameState class in capture.py
An instance of this class stores all the information about the current game state, along with a set of functions to retrieve that information. Read each function in this class, as you will use them when retrieving information from the gameState variable in your agents.
● AgentState class in game.py
This class contains the state of an agent, including whether it is a Pacman, its scared timer, and the food it is carrying.
● CaptureAgent class in captureAgents.py
This class is the base class of the agents you will implement. Make sure you are familiar with the methods available in this class.

Part 4: Task

Your task is to implement, in myTeam.py, Pacman agents that navigate the maze while eating food from the enemy's home turf. These agents will be uploaded to the server and run against each other. The agents that perform this task quickest, without the enemy eating all of their food, will win the tournament. Your implementation should:
● Use PDDL to guide the high-level actions of your agents. You can refer to team_pddl.py as a base implementation, and improve it by considering more states, introducing more high-level actions, and developing more complicated PDDL models.
● For each high-level action, you may improve the low-level implementation by introducing more features and weights to the model.
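To make the weights-and-features idea concrete, here is a minimal standalone sketch of a linear Q-function and its approximate Q-learning weight update. The feature names are hypothetical and these functions are not part of the provided code base; they only illustrate the update rule your training process would implement:

```python
from typing import Dict

def q_value(weights: Dict[str, float], features: Dict[str, float]) -> float:
    # Q(s, a) is approximated as a weighted sum of features: sum_f w_f * f(s, a).
    return sum(weights.get(name, 0.0) * value for name, value in features.items())

def update_weights(weights: Dict[str, float], features: Dict[str, float],
                   reward: float, max_next_q: float,
                   alpha: float = 0.1, gamma: float = 0.9) -> Dict[str, float]:
    # One approximate Q-learning update:
    #   w_f <- w_f + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)) * f(s, a)
    correction = reward + gamma * max_next_q - q_value(weights, features)
    return {name: weights.get(name, 0.0) + alpha * correction * value
            for name, value in features.items()}
```

The correction term is the difference between the observed return and the current estimate, so weights attached to active features are pushed up when the agent is rewarded more than expected and down otherwise. How you define the reward for each high-level action is exactly the design decision discussed below.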
Then build an approximate Q-learning update process to train the model and obtain a good set of weights; this part does not need to be implemented in PDDL. For each high-level action, you should carefully consider what its purpose is and how you will reward or penalise the agent.
● To train your Q-learning model, you can run Pacman in silent mode with the "-Q" argument and automatically run Pacman for many games with the "-n 1000" argument (replace 1000 with any number you want). You also need to use "-l ./layouts/bloxCapture.lay" (replacing bloxCapture.lay with another map) to train your agent on other maps in the "layouts" folder.

Please note that there are several different strategies for implementing your agents: they can be defensive, offensive, or both, depending on certain conditions. You may wish to use a different strategy for each of the two agents, or one unified strategy.

Upon submission to the server, your agent will be evaluated against a baseline (team_pddl.py) implementation for 21 rounds on 3 different maps. If your agent wins more than 11 rounds, it proceeds to be evaluated for another 21 rounds against the next-ranked agent, and so on, until it is either defeated or ranked first. You are expected to at least beat the baseline implementation on the server initially; otherwise you will not pass to the next stage of competing with other students' agents. Within each round there is a time limit, and the agent that has succeeded in eating the most food within that time wins the round. The winning agent is the one that beats all other agents for more than 11 of the 21 rounds.

Part 5: Agent Report

Along with your agent, you will be required to submit a report to Moodle describing your agents' strategy and explaining your code.
Please include a description of your model as well as a justification for your strategy choice. We will not specify a required length for this submission, but please make sure that your report is as detailed and complete as possible, including references to the code as well as detailed motivation for your strategy. For example, when describing your model, also refer to how your code implements that model.

We would like to remind all students that obtaining code online and using it constitutes a breach of academic integrity and will be liable to heavy repercussions. The work must be completely your own. The same goes for obtaining code from a friend or colleague. As such, please make sure to work independently and take reasonable steps to ensure other students can't copy or misuse your work. Please see: https://www.monash.edu/students/study-support/academic-integrity

Part 6: Marking Rubric

1. Rank in the competition (33.3% of grade; the exact score depends on performance within each cohort)
● N (0%-49%): Loses to the baseline implementation.
● P (50%-59%): Outranks the baseline implementation.
● C (60%-69%): Ranked in the 50%-75% band of all submissions.
● D (70%-79%): Ranked in the 25%-50% band of all submissions.
● HD (80%-100%): Ranked in the top 25% of all submissions.

2. Agent strategy implementation (33.3% of grade)
● N: Minor updates to the example PDDL implementation in the code base (for example, only changing parameter values).
● P: One planning strategy; the same strategy for both agents.
● C: One unique strategy for each planning agent.
● D: Multiple strategies of varying complexity. Unique RL implementation.
● HD: Adaptable agents that can transition between planning strategies.

3. Report - Description of your approach (23.3% of grade)
● N: Incomplete or insufficient description of the approach and/or pseudo-code.
● P: High-level description of the approach and pseudo-code.
● C: Discussion and algorithmic analysis, e.g. time, space, completeness, optimality.
● D: Link the agent strategy to what was learned in the lectures and tutorials, provide motivation for your choices and, if possible, prove lower and upper bounds. Reflections: advantages and disadvantages of your approach(es).
● HD: Numerical experiments analysing the efficiency of your implementations (e.g. on standard benchmarks and against appropriate reference algorithms).

4. Report - Communication skills (10% of grade)
● N: Hard to follow, with no clear narrative. Inadequate or no separation of the discussion text into coherent sections. Writing is not accurate or articulate.
● P: The writing has a tenuously logical narrative. Some attempt at the expected structural elements (e.g. intro, conclusion). Writing is not accurate or articulate most of the time.
● C: The text has a clear logical narrative and the expected structural elements (e.g. intro, conclusion). Writing is accurate and articulate most of the time. There are some supporting …
● D: The writing is well composed, has a clear and logical narrative, and is well structured. Writing is generally accurate and articulate. The document has appropriate …
● HD: The writing is very well composed and has a very clear and logically formed narrative as a whole. Writing is accurate and articulate. The document is expertly structured in the style of a scientific report, including …

Part 7: Submission Guide

Submission to Moodle:
The following submission items are required:
1. Your implementation source code, in a single directory called "src" (you can copy everything in the piglet folder to "src"). Zip the code directory with the file name last_name_student_id_pacman.zip (for example, Chen_123456_pacman.zip).
2. The report describing your approaches, as a single PDF file. Name the PDF last_name_student_id_report_pacman.pdf (for example, Chen_123456_report_pacman.pdf).

Submission to Contest Server:
You must make a successful submission on the server, as part of your mark depends on your rank.
Submit as early as possible, as you don't know what will happen if you submit on the last day.

Submission deadline: 11:55 pm, 28 May 2021