FIT5222-PlanningandAutomatedReasoning
Assignment2:
Pacman-CapturetheFlag
InthisassignmentwewilldevelopanAIcontrollerforamultiplayervariantofPacman.Inthisgame
twoteams,redandblue,competewitheachothertocaptureasmanydotsaspossiblefromthe
oppositesideofthemap.
OriginallydevelopedatUCBerkeley,thissetuppresentsuswithachallengingplanningdomain
andanopportunitytosolveafunproblem.Attheendofthesemesterwewillholdacontesttofind
outwhodevelopedthemosteffectivecontroller.Gloryandafancycertificateawaitthetopthree
students!
Part1:Installation
PigletPDDLSolver
InthisassignmentyouwillusethepigletlibrarytosolvePDDLproblems.Thisrequiresinstalling
piglettothepythonenvironment:
1.Gotopiglet-publicfolder(Ifyouuseavirtualenvironment,activatethevirtualenvfirst.You
coulduseyourflatlandvirtualenvironmentaswell.)
2.gitfetch
3.gitcheckoutpddl_solver//ifsuccessful,youshouldfindapddl_solver.pyunder
lib_piglet/utils/folder
4.pythonsetup.pyinstall
CheckthemyTeam.pystarterimplementationtoseehowtousethenewlyinstalledlibrary.
ThecompetitionserverwillhavethesamepigletinstalledintheenvironmentasaPDDLsolver.
FIT5222-PlanningandAutomatedReasoning
UpdatePacmanCode
MakesureyouhavethelatestversionofthePacmanCTFgameenvironment:
1.Gotothepacman-publicfolder(youclonedtherepoinweek1)
2.gitfetch
3.gitreset--hard//incaseanychangesyoumadelocally
4.gitpull
5.Ifyourcodeisnewest,youshouldseestaffTeam.pyandberkeleyTeam.pyintherepo.
Allcodesforthegameenvironmentarenowfoundinthepacmanfolder.
Part2:GameRules
WecancharacterisePacmanCTFasfollows:
●Multi-agent:Twoagentsneedtoworktogetheragainstanopposingteam.
●Discrete:Theenvironmentisagridmazeandtimeisdiscretisedintounittimesteps.
●Dynamic:Theenvironmentchangesasfoodisconsumed.
●Partiallyobservable:Eachagenthasalimitedsensingrange.
●Sequential:Pastdecisionsaffectfutureactions.
●Deterministic:Thecurrentstatedependsonlyontheteamsandtheiractions.
●Offline:Theworldstopswhileanagentdeliberates.
●Known:Alltherulesofthegameareavailableapriori.
Environment
Thegamemapisdividedintotwohalves:red(left)andblue(right).Whenontheredside,ared
agentisaghost.Whencrossingintoenemyterritory,theagentbecomesaPacman.Red
agentsmustdefendtheredfoodwhiletryingtoeatthebluefood.
Scoring
WhenaPacmanagenteatsafooddot,thatdotisremovedfromtheboardandplacedina
virtualbackpack.Whentheagentreturnstotheirownsideoftheboard,theyautomatically
"deposit"thecontentsoftheirbackpack,earningonepointperdot.TheRedteamscores
positivepoints,whiletheBlueteamscoresnegativepoints.
Watchout!Ifanagentiscaughtbyaghost,beforereachingtheirownsideoftheboard,all
dotsintheirbackpackwillbedepositedbacktotheiroriginalpositionsontheboard.Theagent
alsoreturns(asaghost)toitsoriginalstartingposition.Nopointsarescoredinthiscase.No
pointsareawardedtotheopposingteameither(forcatchingPacman).
FIT5222-PlanningandAutomatedReasoning
PowerCapsules
Asmallnumberofpowercapsulesarescatteredthroughoutthemaze.Thesecanbeeatenby
aPacmanagent.Whenthishappensghostsbecome“scared”andaresusceptibletobeing
caughtbythepoweredupPacman.Thisghosteffectlastsforthenext40timesteps,oruntil
caught,whichevercomessooner.
Observations
Agentscanonlyobserveanopponent'sconfiguration(positionanddirection)iftheyortheir
teammatearewithin5squares(Manhattandistance).Inaddition,anagentalwaysgetsanoisy
distancereadingforeachagentontheboard,whichcanbeusedtoreasonaboutunobserved
opponents.Thereadinghasanerrorof+/-6stepsfromthetruedistance.Ageneraldirection
(towardtheopponent)isnotspecified.
Winning
Agameendswhenoneteamreturnsallbuttwooftheopponents'dots.Gamesarealsolimited
to300timesteps(i.e.,300movesforeachofthefouragents).Ifthismovelimitisreached,
whicheverteamhasreturnedthemostfoodwins.Ifthescoreiszero(i.e.,tied)thisisrecorded
asatiegame.
Part3:Task&StudentWorkflow
Ateverytimestepthesimulatorwillpresentyouwiththecurrentstateofthegameboard.Yourtask
willbetoreasonoverthiscurrentstateanddecidehowyouragentsshouldreact.
Thiswillrequireyoutothinkabouttheplanningprocessinseveraldifferentways:atthehigh-level,
whereyoudecidewhatstrategyyourteamwillpursue;atthelow-level,whereyoutransformthat
strategyintoaconcretesequenceoflow-levelactions;andyouwillneedtodecidewhentoreplan
youragents,inresponsetochangingcircumstances.
Weillustrateanexampleofthestudentworkflowinthediagrambelow.
FIT5222-PlanningandAutomatedReasoning
YourstartingpointformodifyingthisworkflowisthefunctionchooseActioninfilemyTeam.py.
Thisfunctioniscalledbythesimulatorateverytimestepandforeachagent.Thesimulator
providestheagentwithadescriptionofthegameboard.Attheendofthefunctionyoumustreturn
aconcreteactionfortheagenttoexecuteinthenexttimestep.
Howtodecidewhichactiontheagentshouldtakeisentirelyuptoyou.Thestartingcodeprovides
abasictemplateforthedecision-makingprocessandsomesimple,thoughnotveryeffective,
strategies.YouwillneedtomodifythiscodetocreateawinningAI.
Thenextsectionsgivefurtherdetailsaboutthedecision-makingprocess.Formoredetails
regardingthePacmanCTFcode,themyTeam.pyimplementation,andothergeneraltips,referto
theadditionaldocument“Assignment2CodeDocumentation.pdf”!
3.1.ChooseaHighLevelGoal
Whatisthehigh-levelstrategythatagentsshouldpursue?Forexample,theycouldvisitthe
opposingsideandtrytoscorepoints,ortheycouldstayhome,anddefendtheirscore.Eachagent
choosesitsownhighlevelgoal,butthatdecisionmaybeinfluencedbythecurrentstateofthe
board,thecurrentscore,andthestrategyandpositionoftheteammateagent.
YourstartingpointhereisthegetGoalsfunctioninthefilemyTeam.py.GivenaPDDLdescription
ofthecurrentstate,thisfunctionreturnspositiveandnegativePDDLstatedescriptionsthat
togetherdescribeahigh-levelgoal.Youcancreatemultipledifferentgoalsandthenchoose
betweenthembasedontheinformationinthecurrentstate.
FIT5222-PlanningandAutomatedReasoning
3.2.GenerateaHighLevelPlan
Toachieveyourhigh-levelgoalsyouwillneedtocomputeacorrespondingsequenceofhigh-level
actions(i.e.,aplan).GivenaPDDLdescriptionoftheinitial(i.e.,thecurrent)statesandthegoal
states,thefunctiongetHighLevelPlancomputessuchaplan.Youdonotneedtomodifythis
function.Justcallitasnecessary.
Butwhatiftheagentalreadyhasahigh-levelplan,computedinapreviousiteration?It’suptoyou
todecidehowtoproceed.Theagentcouldretaintheexistingplanoritcouldchange,andreplan
anewfromthecurrentstate.Forexample,ifthepre-conditionsforthenextactionarenolonger
satisfied,theagentwillcertainlyneedtogenerateanewhighlevelplan.Theagentmayalso
decidetoreviseitscurrentplan(andpossiblycurrentgoal!)ifitsexecutionproducesan
unexpectedresult(e.g.,beingcaughtbyaghostorapowered-upPacman).
Thestaffimplementationchecksifwecanreuseexistingplan:
Note:OnlyalimitedsetofpredicatesareincludedinthebasicPDDLstatedescription(seethefile
myTeam.pddl)andnotallofthemappearinthegivenactions.Youcancreatearichermodel,and
computemorediverseplans,by(1)refiningexistingactionsorcreatingnewonesthatusemoreof
theavailablepredicates;(2)enhancingthePDDLstatedescriptionwithadditionalinformationfrom
thecurrentgamestate(i.e.,addmorepredicates).Weprovideusefulprogrammaticfacilitiesfor
FIT5222-PlanningandAutomatedReasoning
thispurpose.Seethe get_pddl_statefunctionforexamplesofhowtocollectinformationfromthe
simulatorenvironment(gameState).
3.3.GenerateaLowLevelPlan
Highlevelplanstelltheagentswhattodo,butinbroadstrokes.Forexample,ifthegoalistoscore
points,anecessaryactionistomovetotheoppositesideoftheboard.Buthowshouldtheagent
getthere?Onceontheopposingside,thenexthighlevelactionmightbetoeatafooddot.But
whichone?Lowlevelplanningprovidesanswerstothesequestions.
ThefunctiongetLowLevelPlanisyourstartingpointhere.Itsinputsareahigh-levelactionanda
descriptionofthecurrentgamestate.Thefunctionreturnsasequenceofmoveactionsthatthe
agentneedstotaketoachievethehigh-leveleffectsoftheinputaction.
Youarefreetoimplementthisfunctionusinganymethodyouchoose.Forexample:
●HeuristicSearchStrategy:youcandevelopyourowncustomstrategiesbyimplementing
anewexpanderclassforyourowncustomisedstate-spacerepresentation.
●LearningBasedStrategy:weprovideaverybasicexampleofthisapproach.Youcan
makeyourownimplementationorimproveitbyintroducingmorefeaturestothemodel.
Thenyouwilltrainthenewmodeltoobtainasetofgoodweights.Youshouldcarefully
consider,foreachhighlevelaction,howyouwillrewardorpenalisetheagentforits
performance.Youmayneedtokeepadjustingyourimplementation,observehowyour
agentbehaves,andtrysomethingnewtoovercomethedrawbacksyouobserved.
Butwhatiftheagentalreadyhasalow-levelplan,computedduringapreviousiteration?Again,it’s
uptoyoutodecidehowtoproceed.Theagentcouldcontinuefollowingtheexistinglow-levelplan
oritmightdecidetorevisititsdecisionsonaccountofnewinformationprovidedbythegamestate.
Forexample,ifPacmanisheadingtowardsafooddot,butobservesaghostupahead,continuing
towardthedotmaybeabadidea.
Thestaffimplementationreuselowlevelplans:
FIT5222-PlanningandAutomatedReasoning
Remember!Therearemanydifferentwaystoachieveahigh-levelaction.Youmayalsowishto
considerwhetheryourtwoagentscoordinatetheirlow-levelactionsoractindependentlyofone
another.
3.4.ExecuteMoves
Yourlow-levelstrategyhasgeneratedasequenceofdirectionsfortheagenttomovein.Thefirst
moveinthisplanisgiventothesimulatorforexecution.Eachofyourtwoagents,andthetwo
opposingagents,moveaccordingtotheirowninstructionsandthetimestepisadvancedbyone.
AllofthishappensafterthecompletionofthechooseActionfunction.
Basedonthenewstateofthegameboard,youmayneedtore-evaluateyourgoalsandplans.
Thisworkflowiscyclicandcontinuesuntilthegameisover.
Remember:Youcan(andshould!)compareyourimplementation(myTeam.py)againstthe
existingbaselines(staffTeam.pyorberkeleyTeam.py).Youcanfindtheseimplementationsinthe
pacmanfolder.Thebaselinesareusefultocheckthatyourimplementationworksandtomeasure
yourprogress.
Part4:CompetitionSetup
WewillhaveacompetitiontoseewhohasthestrongestpacmanAI.Youwillbeabletosubmit
yourimplementationtoacontestserverwheretheywillberunagainsteachother.
Uponsubmissiontotheserveryouragentwillbeevaluatedagainstastaffbaselineimplementation
(staffTeam.py).Youragentwillbeevaluatedfor49gameson7differentmaps.Withineachgame
therewillbeatimelimitandtheagentthathassucceededineatingthemostfoodduringthattime
willwinthatgame.Towinthematchwithstaffbaseline,yourteammustwinconvincingly:28outof
49games.
BywinningthestaffTeam.py,youwillgetapassforCriteria1competitionscore(seemoredetails
inmarkingrubrics),haveapositionontheleaderboard,andyouragentwillbeevaluatedagainst
allotherparticipantsontheleaderboardtogetadditionalvictorypoints.
Witheachopponentontheleaderboard(excludingthestaffbaselineimplementation),youragent
willbeevaluatedonthesamesetofproblems:
●Intheeventofaconvincingwin(winatleast28outof49games),youwillget3victory
points.
●Intheeventofatie(winlessthan28games,butloselessthan28gamesaswell,
consideringtiegamesneitherwinnorlost.),youwillget0.5victorypoints.
●Intheeventofaloss(lose28gamesormore),youwillget0victorypoints.
Ifyouwinthestaffbaselineimplementation,yourmarksforCriteria1competitionwillbedecided
by(Yourtotalvictorypoints/Allavailablevictorypoints)%,wheretheallavailablevictorypointsis
3×(numberofallparticipants-1).
FIT5222-PlanningandAutomatedReasoning
Youareexpectedtoclearlywinagainstthestaffbaselineimplementation,otherwiseyouwill
notproceedtothenextstageofcompetingwithentriesfromotherstudents.Inthiscaseyour
gradeforCriteria1ofthemarkingrubricwillbeafail.
Theserverwillkeeparecordofyourmatchesagainsteveryopponentontheleaderboard.When
someonehaveanewbestimplementationrecorded,yourscoreagainstthatstudentwillbe
updatedaswell.
Moredetailsregardingthecompetitionsetupwillbeprovidedinasupplementarydocument
“Assignment2ContestSubmissionInstruction.pdf”.
Part5:Report
Togetherwithyourimplementationsourcecodeyouwillberequiredtosubmitareportthat
describesyourapproach.Youwillneedtowriteadescriptionofyourstrategyandgivea
justificationforalgorithmicchoices.Makesurethatyourreportisasdetailedandcompleteas
possible,includingappropriatereferences.
Whendescribingyourmodelyoushouldexplainnotonlywhatitdoesbutalsoanalyseitsstrengths
andweaknesses(e.g.,complexity,guaranteesetc).Youshouldalsodiscusshowyourcode
implementsthatmodel:forexample,importantdatastructures,necessaryoptimisationsand
parameterchoices.Justifyyourdesignandimplementationchoiceswithdiscussionand,where
appropriate,withexperiments.
Youcanaskquestionsabouttheassignmentandyoucandiscussthemeritsofdifferentdesign
choices(e.g.,onEd).Howeverallsubmittedworkmustbecompletelyyourown.Donotcopy
implementationsyouhappentofindonline.Donotaskyourfriendsorcolleaguesfortheir
implementations.Youwillneedtoworkindependentlyandtakereasonablestepstomakesure
others’don’tcopyormisuseyourwork.Formoreinformationpleasesee
https://www.monash.edu/students/study-support/academic-integrity
PLAGIARISMISNOTACCEPTABLE.
YOUWILLFAILTHEASSIGNMENTANDPOSSIBLYTHEENTIREUNIT.
FIT5222-PlanningandAutomatedReasoning
Part6:MarkingRubric
Therubricspecifiesthemarkingcriteriaforyourimplementationandreport.Toavoiddisappointment,paycloseattentiontotherequirements.
CriteriaN
0%-49%
P
50%-59%
C
60%-69%
D
70%-79%
HD
80%-100%
Weight
(%)
1.CompetitionLosestothestaff
baseline
implementation
Outranksstaffbaseline
implementation.
Get25%~49%ofall
availablevictorypoints.
Get50%~74%ofall
availablevictorypoints.
Get75%~100%ofall
availablevictorypoints.
33.3%
2.Agentstrategy
implementation.
Minorupdatesto
examplePDDL
implementationin
thecodebase(for
example,only
changingparameter
values)
Addressessome
drawbacksinexisting
implementation.
Improvesthemodeland
highlevelplannerby:
(1)Introducingnewactions
tothepddlmodel,which
usesmorelistedpredicates
ornewpredicatesyou
introduced.Or
(2)Designnewgoal
functions.
Sameaspreviouscriteria
plus:
Implementsalternativehigh
levelgoalprioritisation
scheme(agentsswitch
goalsbasedonnew
observations).
Improveslowlevelplanner;
e.g.morefeatures,better
rewardfunctions,oruses
heuristicsearchforlow
levelplanning.
Sameasprevious
criteriaplus:
Customisedlowlevel
plansfordistincthigh
levelactions;e.g.
differentlow-level
strategiesforeatfood
(which?)vs.goto
opposingside(where?).
Sameaspreviouscriteria
plus:
Highlevelandlowlevel
decisionstakeintoaccount
thecurrentactionand
strategyoftheteammate
agent.(Twoagentsshould
cooperatewitheachother,
andshareinformationwith
eachother.)
Ordesignyourownmodel
andagentworkflow.
33.3%
FIT5222-PlanningandAutomatedReasoning
3.Report-
Descriptionof
yourapproach
Incompleteor
insufficient
descriptionofthe
approach.
Insufficientor
missing
justifications.
Littleorno
pseudo-code.
Highleveldescriptionofthe
approachwithsufficient
pseudo-code.
Discusseshowthenew
implementationimproves
uponthestaffbaseline,
staffTeam.py
Sameaspreviouscriteria
plus:
Discussionandalgorithmic
analysis(asapplicable).
E.g.,time,space,
completeness,optimality.
Providemotivationforyour
implementationchoices.
Linktheagentstrategyto
lecturesandtutorials.
Appropriatereferencing
(whereapplicable).
Sameasprevious
criteriaplus:
Reflections:
Discussandcompare
theadvantagesand
disadvantagesofat
least2-3implemented
strategies/attempts.
Sameaspreviouscriteria
plus:
Welldocumentednumerical
experiments,analysingthe
efficiency,advantages,
drawbacksofyourdifferent
implementations.E.g.
improvementsonsuccess
ratebetweendifferent
strategies/attempts,runtime
statisticsonplanningtime,
andappropriatereference
algorithms.
23.3%
4.Report-
Communication
skills
Hardtofollowwith
noclearnarrative.
Inadequateorno
separationof
discussiontextinto
coherentsections.
Writingisnot
accurateor
articulate.
Inadequate
supportingmaterials.
Inadequateor
Thewritinghasatenuously
logicalnarrative.Some
attemptattheexpected
structuralelements(e.g.
Intro,conclusion).
Writingisnotaccurateor
articulatemostofthetime.
Thedocumenthasfew
supportingmaterials
(tables,images,
pseudo-code).
Thestudenthas
attemptedtoundertake
Thetexthasaclearlogical
narrativeandexpected
structuralelements(e.g.
intro,conclusion).
Writingisnotaccurateor
articulatemostofthetime.
Therearesomesupporting
materials(tables,images,
pseudo-code)butnotwell
integratedwiththerestof
thetext.
Thestudentfollowsthe
requirementsforcitingand
Thewritingiswell
composedandhasa
clearandlogical
narrativeandiswell
structured.
Writingisgenerally
accurateandarticulate.
Thedocumenthas
appropriatesupporting
materialsthatarewell
integratedwiththerest
ofthetext.
Thestudentfollowsthe
Thewritingisverywell
composedandhasavery
clearandlogically
formedanarrative.
Writingisaccurateand
articulate.
Thedocumentisexpertly
structuredinthestyleofa
scientificreport,including
appropriatesupporting
materialsthatclearlyimprove
thequalityofassociated
discussion.
10%
FIT5222-PlanningandAutomatedReasoning
missingreferencing.citingandreferencing
withfrequenterrors.
referencing,with
somenotableerrors.
requirementsforciting
andreferencing,with
someminorerrors.
Thestudentfollowsthe
requirementsforcitingand
referencing.
FIT5222-PlanningandAutomatedReasoning
Part7:SubmissionGuide
SubmissiontoMoodle:
Thefollowingsubmissionitemsarerequired:
1.Yourimplementationsourcecodes,inasingledirectorycalled"src"(youcancopy
everythinginthepacmanfolderto"src").Zipthecodesdirectorywithfilename
last_name_student_id_pacman.zip.(Forexample,Chen_123456_pacman.zip)
2.Thereportdescribingyourapproachesasasinglepdffile.Namethepdfas
last_name_student_id_report_pacman.pdf.(Forexample,
Chen_123456_report_pacman.pdf)
SubmissiontoContestServer:
Youmustmakeasuccessfulsubmissiononthecontestserveraspartofyourmark.
Submitearlyandsubmitoften.Don’twaitforthelastday,whenyoumightbesubject
tocircumstancesoutofyourcontrol(likeserveroutages!).
Submissiondeadline:11:55pmWednesday30thOctober2024