APM395/595: Semester Project
Due Tuesday, December 1, 2020, by 10 pm, uploaded to Blackboard

Introduction:

Accurate estimators of flood discharges and resulting flood damages are a key element of an
effective flood damage abatement program. In the US, on average about 200 deaths a year occur
due to floods, and worldwide approximately 1 billion people currently live within the path of the
100-year flood. These are often the world’s poorest inhabitants, and the number of people living
within the 100-year floodplain is expected to increase to 2 billion people by 2050. In addition,
annual monetary flood damages worldwide are typically in the hundreds of billions of dollars, and
recently the World Bank estimated annual damages to be over $1 trillion by 2050 in 136 of the
world’s largest coastal cities.

One component to preparing for flood events is to predict expected flood levels. Such estimates
are used for insurance purposes (for delineating floodplains), for designing hydraulic structures to
help alleviate or control flooding, and to site and design infrastructure to withstand flooding events.
When a historic streamflow record at a river site is available, a frequency analysis is used to
estimate the flood percentile of interest. In the United States, the recommended procedure is to use
a log-Pearson III distribution (which is similar to a lognormal distribution) to describe annual
maximum streamflows. When no historic streamflow data is available at the site of interest (i.e. an
ungauged river site), the problem of estimating flood percentiles becomes more difficult and less
accurate. One common procedure to estimate flood percentiles at ungauged river sites is to develop
a regional regression model between the flood percentile you are interested in and measured
watershed parameters that describe characteristics such as geology, geomorphology, topography,
and climate. A regression relationship is developed using information from a number of gauged
river sites in a hydrologically similar region. Once the parameters of the regression model have
been estimated, the flood percentile can then be estimated at the ungauged site using measured
watershed characteristics for that site. In this project, you are asked to use methods discussed in
APM395/595 to describe annual maximum streamflows at one gauged river site and to develop a
regional regression relationship between 100-year flood estimates in the region and measured
watershed characteristics.

Project Description:

You will need to use three data files for this project. The first file you must retrieve from the
Internet. We are interested in obtaining the recorded annual maximum streamflows for the
Susquehanna River in Conklin, NY (USGS #01503000). You should retrieve these from the
USGS web site (see Homework #1 for information on how to do this if you don’t remember).
From this data, you should create a file to be read into R (such as a “.csv” file). The second file is
located on our Blackboard page under Assignments. This file is called Project_2020.xlsx and
contains information from 66 gauged river sites in our study area. These sites were chosen because
(1) their gauges were located at a longitude between 74° W and 77° W and their latitude was
south of 43° N, (2) they have at least 20 years of record, (3) the hydrologic disturbance index
(HDI) at each site is less than 20 (HDI is a measure of how impacted a watershed is by
anthropogenic influences), and (4) watershed characteristics associated with the stream gauge are
available in the GAGES-II database compiled by Falcone (2011)
(http://water.usgs.gov/GIS/metadata/usgswrd/XML/gagesII_Sept2011.xml), an updated version of the
original GAGES database by Falcone et al. (2010) (http://esapubs.org/Archive/ecol/E091/045/default.htm).

The GAGES-II database contains hundreds of watershed characteristics for over 9000 gauged
watersheds across the US. These watershed characteristics were extracted using GIS tools and
spatially-explicit digital raster grids of variables describing topography, geology, geography,
meteorology, etc. We have pulled a subset of the database for the New York sites. Included in this
file are the USGS gauging station numbers and site names, estimates of the 100-year flood (using
an LP3), and 100 watershed characteristics (column F to column DA in the spreadsheet) for all 66
sites. Here you are asked to describe the annual maximum flows at one site and develop a regional
regression model to predict the 100-year flood for this region.

Steps in Project:

Step 1: Description of Maximum Flows
We have discussed a number of different methods, both visual and numerical, to describe a
data set. Use any methods you consider appropriate to describe the annual maximum flows on
the Susquehanna River in Conklin, NY (USGS #01503000). You should think about what
important characteristics of the data set should be mentioned, and what characteristics are less
important (i.e. can be left out). For example, you should definitely discuss whether there are
any apparent temporal (time) trends in annual maximum flows over the period of record (i.e.
are maximum flows getting larger, smaller, staying about the same?). This section should be
thorough, yet concise.
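
As one possible starting point for Step 1, the sketch below combines a few numerical summaries, two plots, and two simple checks for a time trend. It runs on synthetic flows, not the Conklin record. The data frame `peaks`, its column names `year` and `flow`, and the choice of a linear fit plus Kendall's tau are all assumptions made for illustration; use whichever methods from class you consider appropriate.

```r
# Hypothetical stand-in for the annual maximum flow record. In the project,
# this data frame would be read from the .csv file built from the USGS data.
set.seed(1)
peaks <- data.frame(
  year = 1950:2019,
  flow = exp(rnorm(70, mean = 10, sd = 0.4))  # synthetic flows, arbitrary units
)

# Numerical description: five-number summary, mean, and standard deviation
print(summary(peaks$flow))
print(sd(peaks$flow))

# Visual description: distribution shape and the series through time
hist(peaks$flow, main = "Annual maximum flow", xlab = "Flow")
plot(peaks$year, peaks$flow, type = "b", xlab = "Year", ylab = "Annual maximum flow")

# Trend check 1: slope of a simple linear regression of flow on year
trend_fit <- lm(flow ~ year, data = peaks)
slope_p <- summary(trend_fit)$coefficients["year", "Pr(>|t|)"]

# Trend check 2: Kendall's tau, a rank-based (nonparametric) alternative
kendall <- cor.test(peaks$year, peaks$flow, method = "kendall")

print(c(slope_p_value = slope_p, kendall_p_value = kendall$p.value))
```

A small p-value from either check would suggest a temporal trend worth discussing; a large one is consistent with flows staying about the same over the period of record.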

Step 2: Development of a Regional Regression Equation for the 100-year Flood

In general, a commonly employed model between the 100-year flood, Q100, and watershed
characteristics takes the form:

$$Q_{100} = b_0 X_1^{b_1} X_2^{b_2} \cdots X_p^{b_p}$$

where X1, X2, ..., Xp are watershed characteristics, and b0, b1, b2, ..., bp are model parameters to
be estimated. By taking the logarithm of both sides of this equation, we get:

$$\ln(Q_{100}) = \ln(b_0) + b_1 \ln(X_1) + b_2 \ln(X_2) + \cdots + b_p \ln(X_p)$$

which is a linear equation. The parameters in this linear equation can be estimated using
ordinary least squares (OLS) regression procedures.
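
To illustrate why the log transform makes OLS applicable, the sketch below generates synthetic data from a power-law model with known exponents and then recovers them with `lm()`. The variables `X1` and `X2`, the "true" parameter values, and the multiplicative error term are hypothetical and are not taken from Project_2020.xlsx.

```r
# Synthetic example: 66 hypothetical sites with two strictly positive
# watershed characteristics (values are invented for illustration only).
set.seed(42)
n  <- 66
X1 <- exp(rnorm(n, mean = 5, sd = 1))
X2 <- exp(rnorm(n, mean = 0, sd = 0.3))

# Assumed "true" parameters of the power-law model Q100 = b0 * X1^b1 * X2^b2,
# with a multiplicative lognormal error so the log model has additive noise.
b0 <- 50
b1 <- 0.8
b2 <- 1.5
Q100 <- b0 * X1^b1 * X2^b2 * exp(rnorm(n, mean = 0, sd = 0.1))

# OLS on the log-transformed equation:
# ln(Q100) = ln(b0) + b1 * ln(X1) + b2 * ln(X2)
fit <- lm(log(Q100) ~ log(X1) + log(X2))
print(coef(fit))

# The intercept estimates ln(b0), so exponentiate it to get back to b0.
b0_hat <- exp(coef(fit)[["(Intercept)"]])
print(b0_hat)
```

The fitted slopes on `log(X1)` and `log(X2)` estimate the exponents b1 and b2 directly, which is what makes the linearized form convenient.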

Your goal is to develop the best possible model for ln(Q100) using the available watershed
characteristics, Xi, in the file ‘Project_2020.xlsx’. You may use any form of the watershed
characteristics that you choose (an obvious start is to take the logarithms of the strictly positive
parameters, but any form you choose will be allowable).

We will judge 'best' by the magnitude of the adjusted-R2 (coefficient of determination) value
for the final regression equation. The larger the adjusted-R2, the better. HOWEVER, PLEASE
NOTE THAT ALL VARIABLES IN THE FINAL MODEL MUST BE SIGNIFICANT AT
THE 5% SIGNIFICANCE LEVEL OR BETTER! In addition, you should perform analyses of the
assumptions of the regression model you developed (such as the normality of the residuals,
residual plots, etc.). Once you develop your model, you are to estimate the 100-year flood at
the Susquehanna River in Conklin, NY using this model and watershed characteristics from
this site. All calculations must be done in R.
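
The checks described above might be organized along the lines of the following sketch. It uses synthetic data; the data frame `sites`, the predictors `area` and `precip`, and the values in `new_site` are hypothetical placeholders, not columns or values from the project files. The Shapiro-Wilk test is only one of several ways to examine residual normality.

```r
# Synthetic regional data set (66 hypothetical sites, invented values)
set.seed(7)
n <- 66
sites <- data.frame(
  area   = exp(rnorm(n, mean = 5, sd = 1)),
  precip = exp(rnorm(n, mean = 0, sd = 0.2))
)
sites$Q100 <- 40 * sites$area^0.85 * sites$precip^1.2 *
  exp(rnorm(n, mean = 0, sd = 0.15))

# Fit the log-linear model by OLS
fit <- lm(log(Q100) ~ log(area) + log(precip), data = sites)
s   <- summary(fit)

# Model quality: adjusted R-squared and p-values of the explanatory variables
adj_r2   <- s$adj.r.squared
p_values <- s$coefficients[-1, "Pr(>|t|)"]   # drop the intercept row
all_significant <- all(p_values < 0.05)      # 5% significance requirement
print(adj_r2)
print(p_values)

# Residual diagnostics: normality test, residuals vs. fitted, normal Q-Q plot
res <- residuals(fit)
print(shapiro.test(res))
plot(fitted(fit), res, xlab = "Fitted ln(Q100)", ylab = "Residual")
abline(h = 0, lty = 2)
qqnorm(res)
qqline(res)

# Prediction at a new site (hypothetical characteristics), made in log space
# and then transformed back to the original units.
new_site    <- data.frame(area = 2000, precip = 1.1)
ln_q100_hat <- predict(fit, newdata = new_site)
q100_hat    <- exp(ln_q100_hat)
print(q100_hat)
```

Note that exponentiating a prediction made in log space gives an estimate that, under lognormal errors, is closer to the median than to the mean of Q100; whether to correct for this is a modeling decision to discuss in the report.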

Project Write-up:

You are to present your analysis as a thorough, yet concise, professionally written engineering
report. Your final report should contain:
1) An introduction to the problem (i.e. why are you doing this?).
2) A visual and numerical description of annual maximum streamflow at the Susquehanna
River in Conklin, NY, as well as a discussion of this information.
3) A description of your technique for developing a regional regression equation for the 100-
year flood, including a presentation and discussion of your analysis of model assumptions.
4) A presentation of your final regression equation, as well as the important statistics relevant
to the equation (i.e. report the adjusted-R2 value, maximum p-values, etc.). Include the
results and discussion of the prediction of the 100-year flood at the Susquehanna River in
Conklin, NY.
5) A discussion of what other variables might be included in this model (i.e. those
characteristics within the GAGES II database that seem important but may not have been
included in your final model, or other characteristics not included in the GAGES II
database) that you think might be important.
6) Concluding remarks on your analysis.
7) The R script you developed to perform your analysis (as a file and not a hard copy in your
report). This script should not have any excess code and should operate without any input
from us (i.e. we should not have to change the name of your input file for the code to run).
Make sure your R code will read in your input files without changes, and that you do not
use any additional R packages beyond those we have discussed in class. We will run
the code to verify that it indeed generates the data on which you based your report and
recommendations. If we run the code and it does not produce the output that you have
provided in your report, we will ask you to explain the discrepancy. The script should have
comments (lines starting with “#” in R) throughout the code that will aid us in understanding and
troubleshooting it. The top of your code should have comments including your
name, the class number, date, a description of what the code does, and a brief
description of every variable used in your code.

Your report (other than the appendix) should contain no more than 4 pages of text (double spaced,
12 pt font, 1-inch margins). Plots and tables are not included in this 4-page total but should instead
be put in an appendix at the end of your report. All written sections, as described above, should fit
within the 4-page limit.

Report Submission:

The written report, the R script, and your input file(s) (.csv) must be uploaded to Blackboard by
10 pm on Tuesday December 1, 2020. Make sure your R script calls the input file from the
working directory (i.e. don’t include C:\temp\thiscourseisgreat\kroll_1.csv or a setwd
command, but instead just kroll_1.csv for the input file). The written report should be a
Microsoft Word document (not a PDF). To submit your project, you must make a single folder entitled
‘yourlastname_project’ which will contain 3+ files: ‘yourlastname.R’, ‘yourlastname.doc’ and
‘yourlastname.csv’ (if you have multiple input files, call them ‘yourlastname_1.csv’,
‘yourlastname_2.csv’, etc.).
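
A minimal sketch of how the top of such a script might look, with the input file referenced by name only so that it resolves against the working directory. The header fields are placeholders to fill in, and the object name `flows` is an assumption made for illustration.

```r
# Name:        (your name)
# Class:       APM395/595
# Date:        (date)
# Description: (what this script does)
# Variables:   input_file - name of the input .csv file
#              flows      - data frame read from input_file

# File name only: no drive letter, no folder path, and no setwd() call.
input_file <- "yourlastname_1.csv"

if (file.exists(input_file)) {
  flows <- read.csv(input_file)
  str(flows)
} else {
  message("Input file not found in the working directory: ", getwd())
}
```

Because no path is hard-coded, the script runs unchanged on any machine as long as the .csv file sits in the same folder the script is run from.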

Grading of Report:
The grading of this report shall be as follows:
25% Clarity, punctuation, and grammar
25% Discussion
50% Technical analysis, experimental design, and performance of your final regression equation

Penalty for Late Reports:
The late penalty for this project is:
After 10 pm 12/1/20 and before 10 pm 12/2/20: 10% penalty
After 10 pm 12/2/20 and before 10 am 12/4/20: 30% penalty
After 10 am 12/4/20: 100% penalty

Academic Integrity:
Plagiarism of any type, and especially copied R scripts, will not be tolerated. It is very easy to
identify a copied script. Anyone submitting identical or slightly modified scripts will be charged
with academic plagiarism and will, at least, receive a zero on this assignment. For more
information see the SUNY ESF Student Handbook available at:
https://www.esf.edu/students/handbook/documents/handbook.pdf.

References:
Falcone, J.A., Carlisle, D.M., Wolock, D.M., and Meador, M.R. 2010. GAGES: A stream gage
database for evaluating natural and altered flow conditions in the conterminous United
States. Ecology 91:621.
Falcone, J.A. 2011. GAGES-II: Geospatial Attributes of Gages for Evaluating Streamflow.
Vector digital data. http://water.usgs.gov/GIS/metadata/usgswrd/XML/gagesII_Sept2011.xml,
accessed November 10, 2013.
