辅导案例-CS547

欢迎使用51辅导,51作业君孵化低价透明的学长辅导平台,服务保持优质,平均费用压低50%以上! 51fudao.top
CS547 - Advanced Topics in Software
Engineering
Software Project Economics -
Cost Estimation and Genetic Programming
Software Project Economics - Cost Estimation and Genetic ProgrammingCS547 - Advanced Topics in Software Engineering 1 / 20
Introduction
Cost estimation is a challenging problem
Given a project with various parameters (estimates of size,
domain, programming languages, expertise of development team
etc. etc.) how much is it going to cost?
Many different approaches to making predictions:
Algorithmic (COCOMO etc.)
Historical data (Expert Judgement, Statistics and Machine
Learning)
Numerous approaches but no agreement over which one to use
This topic looks at the use of genetic programming for cost
estimation
Software Project Economics - Cost Estimation and Genetic ProgrammingCS547 - Advanced Topics in Software Engineering 2 / 20
Cost Estimation - Using Historical Data
Typical dataset example:
@attribute Input numeric
@attribute Output numeric
@attribute Inquiry numeric
@attribute File numeric
@attribute FPAdj numeric
@attribute RawFPcounts numeric
@attribute AdjFP numeric
@attribute Effort numeric
@data
25 150 75 60 1 1750 1750 102.4
193 98 70 36 1 1902 1902 105.2
70 27 0 12 0.8 535 428 11.1
40 60 20 12 1.15 660 759 21.1
10 69 1 9 0.9 478.89 431 28.8
13 19 0 23 0.75 377.33 283 10
34 14 0 5 0.8 256.25 205 8
...
A project may have several parameters
which capture various characteristics
Will vary between projects and
datasets
Interested in deriving the most
accurate prediction for the Effort
parameter
Given a new project we can use this
to make an estimate for the
development effort
Software Project Economics - Cost Estimation and Genetic ProgrammingCS547 - Advanced Topics in Software Engineering 3 / 20
Genetic Programming
Application of evolutionary computation to programs or functions
Genetically breeds a population of programs to solve a problem
Uses the same principles of genetic operators: sexual
recombination, mutation etc.
Aim (in this case) is to evolve a function which makes an accurate
prediction of cost given a project’s characteristics
GP is also widely used for a range of other problems such as code
optimisation or automated repair
Software Project Economics - Cost Estimation and Genetic ProgrammingCS547 - Advanced Topics in Software Engineering 4 / 20
Program Representation
Programs expressed as syntax
trees rather than lines of code
e.g. max(x+x, x+3*y)
Internally represented as trees or
prefix expressions:-
(max(+ x x) (+ x (* 3 y)))
Software Project Economics - Cost Estimation and Genetic ProgrammingCS547 - Advanced Topics in Software Engineering 5 / 20
GP - Key Components
Need to specify:
Terminal set
Function Set
Fitness Measure
Run Parameters
Termination Criterion
Software Project Economics - Cost Estimation and Genetic ProgrammingCS547 - Advanced Topics in Software Engineering 6 / 20
Terminal Set
Identify the possible leaves that constitute the program.
May consist of:
External inputs (x, y etc.)
Numerical constants (0, 42, 3.14159 etc.)
0-arity functions (rand() etc.)
Software Project Economics - Cost Estimation and Genetic ProgrammingCS547 - Advanced Topics in Software Engineering 7 / 20
Function Set
Identify the possible non-terminal elements of the program.
May consist of simple arithmetic operators (+, -, /, *, /) and a
range of additional functions:
Mathematical (sin, abs etc.)
Boolean (and, or, not)
Conditional (if-then-else)
Looping (for, repeat)
Software Project Economics - Cost Estimation and Genetic ProgrammingCS547 - Advanced Topics in Software Engineering 8 / 20
Other Key Components
Need to specify:
Fitness Measure
Captures goal of search. Same as for other search-based
problems.
Cost estimation: distance between predicted and known effort
value
Run Parameters
Genetic operator probabilities
Maximum size of programs
...
Termination Criterion
Number of generations or fitness-based success criterion
Software Project Economics - Cost Estimation and Genetic ProgrammingCS547 - Advanced Topics in Software Engineering 9 / 20
Steps in a Run
Identical to other search-based approaches
1 Randomly create an initial population of programs from available
primitives
2 While the termination criteria remains unsatisfied, do:
1 Execute each program to determine its fitness
2 Select programs to contribute to the next generation
3 Create new individual programs by applying genetic operators
3 Return best individual
Software Project Economics - Cost Estimation and Genetic ProgrammingCS547 - Advanced Topics in Software Engineering 10 / 20
Initialising the Population
Programs in initial population built by recursively generating a tree
composed of random choices of functions and terminals
Individuals generated subject to a defined maximum size
Three standard approaches
Full initialisation
Grow initialisation
Ramped half-and-half (combination of full and grow)
Software Project Economics - Cost Estimation and Genetic ProgrammingCS547 - Advanced Topics in Software Engineering 11 / 20
Full Initialisation
Nodes taken from function set until a maximum tree depth is
reached, after which only terminals may be chosen
Software Project Economics - Cost Estimation and Genetic ProgrammingCS547 - Advanced Topics in Software Engineering 12 / 20
Grow Initialisation
Like full but allows nodes to be taken from the full function set until
a maximum tree depth is reached
Software Project Economics - Cost Estimation and Genetic ProgrammingCS547 - Advanced Topics in Software Engineering 13 / 20
Genetic Operators - Crossover
Given copies of two selected parents, crossover randomly selects a
crossover point in each parent tree and swaps the sub-trees rooted at
the crossover points
Software Project Economics - Cost Estimation and Genetic ProgrammingCS547 - Advanced Topics in Software Engineering 14 / 20
Genetic Operators - Mutation
Mutation randomly selects a mutation point in a tree and replaces
the sub-tree rooted there with a randomly generated sub-tree
Software Project Economics - Cost Estimation and Genetic ProgrammingCS547 - Advanced Topics in Software Engineering 15 / 20
Example using tiny gp.java
Data:
1 1
2 4
3 9
4 16
5 25
...
45 2025
46 2116
47 2209
48 2304
49 2401
50 2500
Output:
Software Project Economics - Cost Estimation and Genetic ProgrammingCS547 - Advanced Topics in Software Engineering 16 / 20
Cost Estimation - Determining Prediction Accuracy
Various standard measures to assess how good a prediction is:
MMRE - Mean Magnitude of Relative Error:-
MRE = (abs(actual effort - estimated effort))/actual effort
Pred(n) - Proportion of estimates within n% of the actual
MAE - Mean Absolute Error :-
abs(actual effort - estimated effort)
These measures are used to evaluate the performance of a model and
may also form the basis of a fitness function
Software Project Economics - Cost Estimation and Genetic ProgrammingCS547 - Advanced Topics in Software Engineering 17 / 20
How Good is The Evolved Program?
One approach is to use a train/test split:
Train the model on 80% of the data, and hold back 20% for
testing
Alternatively, especially if the data set is small, use k-fold
cross-validation:
Evolve a program using k-1 partitions and compare the program’s
output with the remaining partition
Repeat and average over the whole dataset
Software Project Economics - Cost Estimation and Genetic ProgrammingCS547 - Advanced Topics in Software Engineering 18 / 20
Comparing Against Other Estimation Techniques
Large number in use, e.g.
Linear Regression
Decision Trees
KNN (K Nearest Neighbours)
Plus many many other more sophisticated machine learning
techniques....
Most implemented in Weka (machine learning and data mining
toolkit) – installed in labs and uses same format as cost estimation
data files you will be working from, or R (more powerful language and
set of libraries), or within Python libraries.
Software Project Economics - Cost Estimation and Genetic ProgrammingCS547 - Advanced Topics in Software Engineering 19 / 20
Assessed Task
Using GP to build software project cost estimation functions
Read section 8.4 in the ‘Techniques Taxonomy, Tutorial’ paper
Collection of effort datasets on myplace - try at least two of
these (more if you wish)
Several frameworks/libraries to support this (e.g. JGAP for Java
or DEAP for Python)
Evaluate the effectiveness of your approach
Compare with one other basic technique and also baseline (e.g.
mean Effort)
Produce a short report
More details on myplace
Software Project Economics - Cost Estimation and Genetic ProgrammingCS547 - Advanced Topics in Software Engineering 20 / 20
51作业君

Email:51zuoyejun

@gmail.com

添加客服微信: abby12468