APM395/595: Semester Project
Due Tuesday December 1, 2020 by 10 pm, uploaded to Blackboard

Introduction:
Accurate estimators of flood discharges and the resulting flood damages are a key element of an effective flood damage abatement program. In the US, floods cause about 200 deaths a year on average, and worldwide approximately 1 billion people currently live within the path of the 100-year flood. These are often the world’s poorest inhabitants, and the number of people living within the 100-year floodplain is expected to increase to 2 billion by 2050. In addition, annual monetary flood damages worldwide are typically in the hundreds of billions of dollars, and the World Bank recently estimated that annual damages in 136 of the world’s largest coastal cities will exceed $1 trillion by 2050.

One component of preparing for flood events is predicting expected flood levels. Such estimates are used for insurance purposes (delineating floodplains), for designing hydraulic structures that help alleviate or control flooding, and for siting and designing infrastructure to withstand flooding events. When a historic streamflow record is available at a river site, a frequency analysis is used to estimate the flood percentile of interest. In the United States, the recommended procedure is to use a log-Pearson III distribution (which is similar to a lognormal distribution) to describe annual maximum streamflows.

When no historic streamflow data are available at the site of interest (i.e. an ungauged river site), estimating flood percentiles becomes more difficult and less accurate. One common procedure for estimating flood percentiles at ungauged river sites is to develop a regional regression model between the flood percentile of interest and measured watershed parameters that describe characteristics such as geology, geomorphology, topography, and climate.
A regression relationship is developed using information from a number of gauged river sites in a hydrologically similar region. Once the parameters of the regression model have been estimated, the flood percentile can be estimated at the ungauged site using measured watershed characteristics for that site.

In this project, you are asked to use methods discussed in APM395/595 to describe annual maximum streamflows at one gauged river site and to develop a regional regression relationship between 100-year flood estimates in the region and measured watershed characteristics.

Project Description:
You will need to use three data files for this project. The first file you must retrieve from the Internet. We are interested in the recorded annual maximum streamflows for the Susquehanna River in Conklin, NY (USGS #01503000). You should retrieve these from the USGS web site (see Homework #1 for information on how to do this if you don’t remember). From these data, you should create a file to be read into R (such as a “.csv” file).

The second file is located on our Blackboard page under Assignments. This file is called Project_2020.xlsx and contains information from 66 gauged river sites in our study area. These sites were chosen because (1) their gauges are located at a longitude between 74° W and 77° W and a latitude south of 43° N, (2) they have at least 20 years of record, (3) the hydrologic disturbance index (HDI) at each site is less than 20 (HDI is a measure of how impacted a watershed is by anthropogenic influences), and (4) watershed characteristics associated with the stream gauge are available in the GAGES-II database compiled by Falcone (2011) (http://water.usgs.gov/GIS/metadata/usgswrd/XML/gagesII_Sept2011.xml), an updated version of the original GAGES database by Falcone et al. (2010) (http://esapubs.org/Archive/ecol/E091/045/default.htm).
The GAGES-II database contains hundreds of watershed characteristics for over 9000 gauged watersheds across the US. These watershed characteristics were extracted using GIS tools and spatially explicit digital raster grids of variables describing topography, geology, geography, meteorology, etc. We have pulled a subset of the database for the New York sites. Included in this file are the USGS gauging station numbers and site names, estimates of the 100-year flood (using an LP3), and 100 watershed characteristics (column F to column DA in the spreadsheet) for all 66 sites.

Here you are asked to describe the annual maximum flows at one site and to develop a regional regression model to predict the 100-year flood for this region.

Steps in Project:

Step 1: Description of Maximum Flows
We have discussed a number of different methods, both visual and numerical, to describe a data set. Use any methods you consider appropriate to describe the annual maximum flows on the Susquehanna River in Conklin, NY (USGS #01503000). You should think about which characteristics of the data set are important enough to mention and which are less important (i.e. can be left out). For example, you should definitely discuss whether there are any apparent temporal (time) trends in annual maximum flows over the period of record (i.e. are maximum flows getting larger, getting smaller, or staying about the same?). This section should be thorough, yet concise.

Step 2: Development of a Regional Regression Equation for the 100-year Flood
In general, a commonly employed model relating the 100-year flood, $Q_{100}$, to watershed characteristics takes the form:

$$Q_{100} = e^{b_0} X_1^{b_1} X_2^{b_2} \cdots X_k^{b_k}$$

where $X_1, X_2, \ldots, X_k$ are watershed characteristics, and $b_0, b_1, b_2, \ldots, b_k$ are model parameters to be estimated. By taking the logarithm of both sides of this equation, we get:

$$\ln(Q_{100}) = b_0 + b_1 \ln(X_1) + b_2 \ln(X_2) + \cdots + b_k \ln(X_k)$$

which is a linear equation.
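
As a rough illustration of how a log-linear model of this kind can be fit in R, the sketch below uses synthetic data only. The variable names (`area`, `precip`), the parameter values, and the hypothetical prediction site are all assumptions made for the example and are not taken from Project_2020.xlsx. The use of `shapiro.test` as a normality check is likewise just one possible choice, not necessarily the method covered in class.

```r
# Sketch only: synthetic data and hypothetical variable names,
# not the contents of Project_2020.xlsx.
set.seed(1)
n <- 66

# Two made-up, strictly positive watershed characteristics
area   <- exp(rnorm(n, mean = 5,   sd = 1))    # hypothetical drainage area
precip <- exp(rnorm(n, mean = 4.6, sd = 0.1))  # hypothetical mean annual precipitation

# Generate Q100 from the power-law model with assumed parameters
b0 <- 1.0; b1 <- 0.8; b2 <- 1.5
q100 <- exp(b0) * area^b1 * precip^b2 * exp(rnorm(n, sd = 0.2))

sites <- data.frame(q100, area, precip)

# Fit the log-transformed (linear) form by ordinary least squares
fit <- lm(log(q100) ~ log(area) + log(precip), data = sites)

adj_r2  <- summary(fit)$adj.r.squared        # adjusted R-squared
p_vals  <- coef(summary(fit))[, "Pr(>|t|)"]  # p-value for each coefficient
print(adj_r2)
print(p_vals)

# Basic checks of the regression assumptions
res <- residuals(fit)
print(shapiro.test(res))       # one possible test of residual normality
plot(fitted(fit), res)         # residuals versus fitted values
qqnorm(res); qqline(res)       # normal probability plot of residuals

# Predict at a hypothetical site, then back-transform from ln(Q100) to Q100
new_site <- data.frame(area = 500, precip = 100)
q100_hat <- exp(predict(fit, newdata = new_site))
print(q100_hat)
```

Because the model is fit to ln(Q100), the prediction must be exponentiated to return to flow units. Note that simply exponentiating gives an estimate closer to the median than to the mean of Q100, which may or may not matter for a given application.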
The parameters in this linear equation can be estimated using ordinary least squares (OLS) regression procedures. Your goal is to develop the best possible model for ln(Q100) using the available watershed characteristics, Xi, in the file ‘Project_2020.xlsx’. You may use any form of the watershed characteristics that you choose (an obvious start is to take the logarithms of the strictly positive parameters, but any form you choose is allowable).

We will judge 'best' by the magnitude of the adjusted-R² (adjusted coefficient of determination) value for the final regression equation. The larger the adjusted-R², the better. HOWEVER, PLEASE NOTE THAT ALL VARIABLES IN THE FINAL MODEL MUST BE SIGNIFICANT AT THE 5% SIGNIFICANCE LEVEL OR BETTER! In addition, you should analyze the assumptions of the regression model you developed (such as the normality of the residuals, residual plots, etc.). Once you develop your model, you are to estimate the 100-year flood at the Susquehanna River in Conklin, NY using this model and watershed characteristics from this site. All calculations must be done in R.

Project Write-up:
You are to present your analysis as a thorough, yet concise, professionally written engineering report. Your final report should contain:
1) An introduction to the problem (i.e. why are you doing this?).
2) A visual and numerical description of annual maximum streamflow at the Susquehanna River in Conklin, NY, as well as a discussion of this information.
3) A description of your technique for developing a regional regression equation for the 100-year flood, including a presentation and discussion of your analysis of model assumptions.
4) A presentation of your final regression equation, as well as the important statistics relevant to the equation (i.e. report the adjusted-R² value, maximum p-values, etc.). Include the results and discussion of the prediction of the 100-year flood at the Susquehanna River in Conklin, NY.
5) A discussion of what other variables might be included in this model and that you think might be important (i.e. characteristics within the GAGES-II database that seem important but may not have been included in your final model, or other characteristics not included in the GAGES-II database).
6) Concluding remarks on your analysis.
7) The R script you developed to perform your analysis (as a file and not a hard copy in your report). This script should not have any excess code and should operate without any input from us (i.e. we should not have to change the name of your input file for the code to run). Make sure your R code reads in your input files without changes and that you do not use any R packages beyond those we have discussed in class. We will run the code to verify that it indeed generates the data on which you based your report and recommendations. If we run the code and it does not produce the output that you have provided in your report, we will ask you to explain the discrepancy. The script should include comments (lines starting with “#” in R) throughout the code that will aid us in understanding and troubleshooting it. The top of your code should have comments including your name, the class number, the date, a description of what the code does, and a brief description of every variable used in your code.

Your report (other than the appendix) should contain no more than 4 pages of text (double spaced, 12 pt font, 1-inch margins). Plots and tables are not included in this 4-page total but should instead be put in an appendix at the end of your report. All written sections, as described above, should fit within the 4-page limit.

Report Submission:
The written report, the R script, and your input file(s) (.csv) must be uploaded to Blackboard by 10 pm on Tuesday December 1, 2020. Make sure your R script calls the input file from the working directory (i.e.
don’t include C:\temp\thiscourseisgreat\kroll_1.csv or a setwd command, but instead just kroll_1.csv for the input file). The written report should be an MS Word document (not a pdf). To submit your project, you must make a single folder entitled ‘yourlastname_project’ which will contain 3+ files: ‘yourlastname.R’, ‘yourlastname.doc’ and ‘yourlastname.csv’ (if you have multiple input files, call them ‘yourlastname_1.csv’, ‘yourlastname_2.csv’, etc.).

Grading of Report:
The grading of this report shall be as follows:
25% Clarity, punctuation, and grammar
25% Discussion
50% Technical analysis, experimental design, and performance of your final regression equation

Penalty for Late Reports:
The late penalty for this project is:
After 10 pm 12/1/20 and before 10 pm 12/2/20: 10% penalty
After 10 pm 12/2/20 and before 10 am 12/4/20: 30% penalty
After 10 am 12/4/20: 100% penalty

Academic Integrity:
Plagiarism of any type, and especially copied R scripts, will not be tolerated. It is very easy to identify a copied script. Anyone submitting identical or slightly modified scripts will be charged with academic plagiarism and will, at a minimum, receive a zero on this assignment. For more information see the SUNY ESF Student Handbook available at: https://www.esf.edu/students/handbook/documents/handbook.pdf.

References:
Falcone, J.A., Carlisle, D.M., Wolock, D.M., and Meador, M.R. 2010. GAGES: A stream gage database for evaluating natural and altered flow conditions in the conterminous United States. Ecology 91:621.
Falcone, J.A. 2011. GAGES-II: Geospatial Attributes of Gages for Evaluating Streamflow. Vector digital data. http://water.usgs.gov/GIS/metadata/usgswrd/XML/gagesII_Sept2011.xml, accessed November 10, 2013.