ELE8066 Intelligent Systems and Control Developing a Genetic Algorithm to Solve the 0-1 Knapsack Problem It is often the case that we must make the best choices that we can within a limited budget, consider -- grocery shopping while staying within your budget; packing a suitcase for a trip while staying below your luggage weight allowance; deciding which items you will carry in a small handbag; deciding which programs and documents you will keep on your computer with a finite hard drive capacity; deciding which topics to study for an exam with the time and mental energy available! There are also many examples of this decision problem in commercial and industrial scenarios -- a manufacturing plant choosing what to produce with a limited number of operational hours; forming a sports team line-up with some maximum budget for players; filling a shipping vessel with the most profitable contents while limited by the maximum weight and volume of the cargo hold; which towns a salesman will visit with a limited marketing budget. This puzzle has a mathematical name - it is called the Knapsack Problem. It is named because the decision problem is imagined as filling a knapsack (or rucksack, backpack etc.) with the best choice of items. However the difficulty is that the total weight of the chosen items must not exceed the maximum weight capacity of the knapsack (single constraint). The problem can also be extended to two constraints - where the knapsack has a defined maximum weight capacity and a maximum volume capacity. The more constraints that are added, the more difficult it becomes to find the optimal solution to the problem. In fact this problem is considered to be NP-Hard - the time required to find a perfect solution grows exponentially with the number of items! Sometimes it is simple to visualise the decision (Grocery Shopping : monetary cost of an item vs a budget) or a little more abstract (Manufacturing: hours spent on a job vs the total operational hours of a plant) -- notice that the choice can be a set of items or a series of actions. Any decision can be considered in this way if the task is to choose some, but not all, of a set of possibilities. A particular variation of this problem is called the 0-1 Knapsack Problem. In this case we would like to maximise the total profit by examining each and every item from a set, and decide whether to take or leave that item (but we cannot take more than one of each item). We also must remember to abide by the constraint that the total weight of our chosen items is less than or equal to our maximum weight capacity. The constraints of these problems can be defined by the Capacity Ratio - this is ratio of maximum capacity to the total weight of all available items. In other words, a Capacity Ratio of 1 would mean that you have space for all of the available items. Capacity Ratio of 0.5 means that your knapsack will hold only half of the available total weight. Capacity Ratio of 0.1 results in a knapsack that holds only 10% of the available item weight. Note that the item weights and values are all different - 10% of the available total item weight does not necessarily mean 10% of the total number of available items. Mathematically this can be described as - n = Total number of items pi = Profit of the ith item wi = Weight of the ith item W = Maximum weight constraint C = Capacity Ratio T = Total weight of n items xi = Choice for the ith item - 0 (do not take the item) or 1 (put the item in the knapsack) In this assignment your task will be to produce good solutions to the 0-1 Knapsack Problem using a Genetic Algorithm. You will explore the design and architecture of Genetic Algorithms and learn the difference between parameters and mechanisms in heuristic optimisation. This class of algorithms involves randomness and chance - therefore you should run each test more than once, as the exact result will be different each time. Algorithm Base Parameters Use these parameters as a baseline when developing your algorithm. Population Size = 50 Maximum Generations = 150 Selection Number = 25 Mutation Rate = 0.02 A - Implement a Basic Genetic Algorithm A1. Implement a GA using the provided functions Generate your dataset using the provided function genDataset(). This will give you two vectors that represent the profit and weight of the items. This is the set that you will use for your optimisation challenges. The GA functions provided use binary encoding for xi where each gene is a single bit; 1 represents taking an item, and 0 represents leaving the item behind. The chromosome length is fixed to the number of available items. This is only one of many ways to set up a genetic encoding. MATLAB functions have been provided to form the basic building blocks of a genetic algorithm and you can refer to the lecture notes or online resources for more information on the purpose of each phase. Complete the function that implements a basic genetic algorithm to optimise your dataset. There is a template included at the beginning of the script that has the basic functions. Briefly explain the operation and importance of each stage of the genetic algorithm, and how they work in your MATLAB function. [20 Marks] A2. Identify Valid Solutions The algorithm in Part A.1 will ignore the weight limits and chase the highest overall score, but our knapsack has a limited capacity! Calculate the maximum allowable weight outside of the function (total weight of all items multiplied by the capacity ratio). Now, modify your function to also check for, and return, Best Valid Score (you will need to test each population member and decide if they are valid or invalid solutions). Explain how you check for valid solutions. Use the base parameters and Capacity Ratio = 0.65 to produce a plot showing Best Score and Best Valid Score vs Generation on the same graph. [10 Marks] B Add Intelligence to Your GA To solve the problem more effectively we must shape the behaviour of our GA. In metaheuristics it is important to consider an algorithm's organic behaviour, as well as its mathematical parameters. You should use the base parameters for each of the tests in this section. It is required that you create additional arguments to your function, so that you can control the new features - this will make it much easier to perform the tests and will be essential for the final optimisation. Copy your GA function file from Part A, and create a new file, ga_B.m to add the features Add the extra arguments : elite_no, pen_mode, sel_mode to your function before you start implementing the features B1. Elite Selection Elite selection is a mechanism which preserves the best solutions (highest evaluation) from one generation to the next - these elite members of the population are passed directly through selection, are preserved through crossover, and are NOT subject to mutation. Implement elite selection in your algorithm - you must modify the selection, crossover and mutation functions to take the additional argument elite_no. Use the additional argument elite_no to your ga_B function to control this feature. Note: See the maxk() function in MATLAB's reference material. Plot the Best Score vs Generation for elite numbers of [0, 5, 50], on a single graph. Use the base parameters and Capacity Ratio = 0.99 for this test. Describe the effect that elite selection has upon the score. [10 Marks] B2. Penalty Function Solutions which are beyond the weight limit are invalid and solutions which are below the weight limit are not taking full advantage of the resources. We can subtract a penalty from each solution's score, based upon the difference between their total weight and the maximum weight limit. The penalty function should be applied between evaluation and selection. Make sure to preserve the original score of solutions before the penalty is subtracted (you will need the original scores for the output of the algorithm). Implement the Linear and Quadratic penalty functions : vj = Penalty for the jth individual solution Sj = Total profit (score) of the jth individual solution T = Total weight of n items Yj = Total weight of the jth individual solution Q = Constant penalty scalar W = Maximum weight constraint Dj = Distance from the constraint for the jth individual solution Use the additional argument, pen_mode to your ga_B function to control this feature (where: 0=NO penalty, 1=linear penalty, 2=quadratic penalty). Compare the Mean Solution Weight vs Generation of the Linear and Quadratic penalties across a range of Capacity Ratios. Describe the relationships between the penalty functions and the capacity constraint. [10 Marks] B3. Repair Mechanism We noted previously that solutions become invalid when they exceed the weight limit. It is possible to enforce that all solutions are valid by incorporating a repair mechanism that removes items (one at a time) from invalid solutions until they are below the maximum weight limit. The repair mechanism should be applied before evaluation. Implement two forms of repair mechanism : - Random Choice : Choose item to be removed randomly - Least Value: Remove the lowest value item (where value is defined as profit/weight) Use the additional argument, rep_mode to your ga_B function to control this feature (where: 0=NO repair, 1=random repair, 2=least value repair). Plot Mean Solution Weight vs Generation for the Random and Least Value repair mechanisms. Show results for the repair mechanism tests at Capacity Ratios of 0.20 and 0.80 and display the Maximum Weight as a dashed horizontal line. [10 Marks] B4. Summary of Mechanisms Assess the new functionalities of your algorithm by comparing the Best Valid Score achieved for each variation (Linear Penalty, Quadratic Penalty, Random Repair, Least-Value Repair). Remember to use the base parameters and test with the Capacity Ratios of 0.25, 0.50, 0.75. Present the results in a table and scale the scores for each test, so that each Capacity Ratio column has a maximum value of 100. [10 Marks] C = 0.25 C = 0.50 C = 0.75 Linear Penalty Quadratic Penalty Random Repair Least-Value Repair C Tune the Parameters of the Intelligent GA One of the advantages of GA's is that they are often capable of a finding a 'good enough' solution relatively quickly. For smaller knapsack problems, it is also possible to calculate a globally optimal solution - so we can use this to benchmark and tune our GA for convergence speed. Find the optimal score for your assigned capacity ratio (look up your student number in the attached table) by using the provided knapsack_optimal() function. The initial target score for your GA is 97% of the global optimal. Modify your GA code so that it will stop early if it reaches a certain score threshold and return the generation number that it stopped at. Note: An if statement and the break keyword is the easy way to do this Copy your GA function file from Part B, and create a new file. ga_C.m Add an additional argument, score_stop to your ga_C function to control this feature C1. Decide upon which mechanisms to use and identify the optimal parameters Use the results from Part B to decide upon which mechanism(s) you will use in your GA (elite selection, linear/quadratic penalty, random/least-value repair). Systematically identify the GA parameters that result in the fastest solution to your problem. You must not exceed the following values: Population Size = 100 Maximum Generations = 300 All other parameters and mechanisms are free for you to optimise as you see fit. Run a series of tests to identify the best Population Size, Selection Number, Mutation Rate and Elite Number. If the algorithm is performing well and consistently reaching the goal then you should increase the target percentage in order to better identify the parameters required for highest performance. Note: The surf() and slice() functions may be helpful to visualise your algorithm's performance. Note: MATLAB is able to parallelise for loops using the parfor feature - if you are running many parameter tests or taking averages it can be beneficial to take advantage of all the CPU cores on your computer. Explain your choice of GA mechanisms, referring to the results of Part B and any additional testing that you performed. [5 Marks] Discuss your methodology to find the optimal parameters, you may support your analysis with plots and/or tabulated test results. Record the tuned parameters that you have identified for your algorithm. What are your observations about the sensitivity of the algorithm to each parameter? When using a GA to solve a problem, is the fastest converging solution always the best solution? [20 Marks] C2. Let It Go! Run your optimal architecture with tuned parameters for 1,000 generations. Note: You can use the MATLAB Editor's Run and Time to measure how long each part takes to run. Report the % of the optimal score that it achieves and how long the GA takes to compute in comparison to the knapsack_optimal() function. [5 Marks] Assignment Report Maximum 8 pages including cover page, all text, equations and plots, font size 11. Your report should contain the following: • Descriptions of the procedures used and explanations and observations on the results obtained in each exercise. The required ouputs are underlined at the end of each exercise. • Plots should be fully annotated (i.e. appropriate title, legend and axis labels, etc.) and appropriately scaled so important details are clear. (NB: Matlab figures can be copied and pasted into word using the edit->copy figure option in Matlab.) • References to the m-files relevant to the results presented. • Reports exceeding 8 pages will be penalised (10% marks reduction per additional page). Assignment Submission • Your report must be in pdf form and must be submitted via Canvas along with all your Matlab m-files (functions for ga_A, ga_B and ga_C, as well as the relevant scripts used to undertake each analysis). Do not include broken or otherwise non-functioning code. • The m-files should be clearly labelled and include comments to explain their function. • Zip your report document, m-files and models into a single zip file for on-line submission. The zip file should be called your_name_A2.zip • The deadline for submitting the assignment is 1700, 23rd March 2020. For questions or queries, contact: Che Cameron -
[email protected] Part C - Assigned Capacity Ratios Name Capacity Ratio (Part C) Tianyi Chen 0.74 Amrapali Das 0.4 Aravind Rahul Dhanarajan 0.1 Timoteo García Bertoa 0.86 Soukalin Ghosh 0.26 Rahul Jeerumapathi Umapathi 0.76 Anqi Jin 0.36 Muhammad Khan 0.28 Mateusz Kobiera 0.8 Karthik Kumar 0.92 Jie Lei 0.12 Conor Lewis 0.24 Songbing Liang 0.72 Aoqiang Liu 0.82 Feng Liu 0.88 Zhengkun Liu 0.16 Hayley McCready 0.32 Aditi Mishra 0.22 Shazia Mohammadi 0.64 Aditya Mukherjee 0.78 Swagatam Sanyal 0.7 Jinpeng Shan 0.6 Madhu Sai Vamsi Vayala 0.9 Xiaoxin Wang 0.84 Zhe Wang 0.34 Zihao Wang 0.68 Xue Wu 0.66 Ziqi Wu 0.38 Bohan Yu 0.62 Ruijie Zhai 0.3 Zihe Zhang 0.18 Qingyu Zheng 0.2 Dazhi Zhu 0.14