辅导案例-CS 3114

欢迎使用51辅导,51作业君孵化低价透明的学长辅导平台,服务保持优质,平均费用压低50%以上! 51fudao.top
CS 3114 Data Structures & Algorithms J4: Geographic Information System
Version 5.00 This is a purely individual assignment! 1
Geographic Information System

Geographic information systems organize information pertaining to geographic features and provide various kinds of access
to the information. A geographic feature may possess many attributes (see below). In particular, a geographic feature has a
specific location. There are a number of ways to specify location. For this project, we will use latitude and longitude, which
will allow us to deal with geographic features at any location on Earth. A reasonably detailed tutorial on latitude and
longitude can be found in the Wikipedia at en.wikipedia.org/wiki/Latitude and en.wikipedia.org/wiki/Longitude.

The GIS record files were obtained from the website for the USGS Board on Geographic Names (www.usgs.gov/core-
science-systems/ngp/board-on-geographic-names/download-gnis-data). The file begins with a descriptive header line,
followed by a sequence of GIS records, one per line, which contain the following fields in the indicated order:

Figure 1: Geographic Data Record Format

Name Type
Length/
Decimals
Short Description
Feature ID Integer 10
Permanent, unique feature record identifier and official feature name Feature
Name
String 120
Feature
Class
String 50 See Figure 3 later in this specification
State Alpha String 2
The unique two letter alphabetic code and the unique two number code for a US State State
Numeric
String 2
County
Name
String 100
The name and unique three number code for a county or county equivalent
County
Numeric
String 3
Primary
Latitude
DMS
String 7
The official feature location
DMS-degrees/minutes/seconds
DEC-decimal degrees.

Note: Records showing "Unknown" and zeros for the latitude and longitude DMS and
decimal fields, respectively, indicate that the coordinates of the feature are unknown.
They are recorded in the database as zeros to satisfy the format requirements of a
numerical data type. They are not errors and do not reference the actual geographic
coordinates at 0 latitude, 0 longitude.
Primary
Longitude
DMS
String 8
Primary
Latitude
DEC
Real
Number
11/7
Primary
Longitude
DEC
Real
Number
12/7
Source
Latitude
DMS
String 7
Source coordinates of linear feature only (Class = Stream, Valley, Arroyo)
DMS-degrees/minutes/seconds
DEC-decimal degrees.

Note: Records showing "Unknown" and zeros for the latitude and longitude DMS and
decimal fields, respectively, indicate that the coordinates of the feature are unknown.
They are recorded in the database as zeros to satisfy the format requirements of a
numerical data type. They are not errors and do not reference the actual geographic
coordinates at 0 latitude, 0 longitude.
Source
Longitude
DMS
String 8
Source
Latitude
DEC
Real
Number
11/7
Source
Longitude
DEC
Real
Number
12/7

CS 3114 Data Structures & Algorithms J4: Geographic Information System
Version 5.00 This is a purely individual assignment! 2
Elevation
(meters)
Integer 5 Elevation in meters above (-below) sea level of the surface at the primary coordinates
Elevation
(feet)
Integer 6 Elevation in feet above (-below) sea level of the surface at the primary coordinates
Map Name String 100 Name of USGS base series topographic map containing the primary coordinates.
Date
Created
String

The date the feature was initially committed to the database.
Date Edited String

The date any attribute of an existing feature was last edited.

Notes:

 See https://geonames.usgs.gov/domestic/states_fileformat.htm for the full field descriptions.
 The type specifications used here have been modified from the source (URL above) to better reflect the realities of
your programming environment.
 Latitude and longitude may be expressed in DMS (degrees/minutes/seconds, 0820830W) format, or DEC (real
number, -82.1417975) format. In DMS format, latitude will always be expressed using 6 digits followed by a single
character specifying the hemisphere, and longitude will always be expressed using 7 digits followed by a
hemisphere designator.
 Although some fields are mandatory, some may be omitted altogether. Best practice is to treat every field as if it
may be left unspecified. Certain fields are necessary in order to index a record: the feature name and the primary
latitude and primary longitude. If a record omits any of those fields, you may discard the record, or index it as far as
possible.

In the GIS record file, each record will occur on a single line, and the fields will be separated by pipe ('|') symbols. Empty
fields will be indicated by a pair of pipe symbols with no characters between them. See the posted VA_Monterey.txt file
for many examples.

GIS record files are guaranteed to conform to this syntax, so there is no explicit requirement that you validate the files. On
the other hand, some error-checking during parsing may help you detect errors in your parsing logic.

The file can be thought of as a sequence of bytes, each at a unique offset from the beginning of the file, just like the cells of
an array. So, each GIS record begins at a unique offset from the beginning of the file.

Line Termination

Each line of a text file ends with a particular marker (known as the line terminator). In MS-DOS/Windows file systems, the
line terminator is a sequence of two ASCII characters (CR + LF, 0X0D0A). In Unix systems, the line terminator is a single
ASCII character (LF). Other systems may use other line termination conventions.

Why should you care? Which line termination is used has an effect on the file offsets for all but the first record in the data
file. As long as we’re all testing with files that use the same line termination, we should all get the same file offsets. But if
you change the file format (of the posted data files) to use different line termination, you will get different file offsets than are
shown in the posted log files. Most good text editors will tell you what line termination is used in an opened file, and also let
you change the line termination scheme.

All that being said, this project is not auto-graded, and the grading of correctness will depend on whether you report the
correct search results, not on the file offsets you report.

CS 3114 Data Structures & Algorithms J4: Geographic Information System
Version 5.00 This is a purely individual assignment! 3
Figure 2: Sample Geographic Data
Records

Note that some record fields are optional, and that
when there is no given value for a field, there are
still delimiter symbols for it.

Also, some of the lines are "wrapped" to fit into
the text box; lines are never "wrapped" in the
actual data files.

F
E
A
T
U
R
E
_
I
D
|
F
E
A
T
U
R
E
_
N
A
M
E
|
F
E
A
T
U
R
E
_
C
L
A
S
S
|
S
T
A
T
E
_
A
L
P
H
A
|
S
T
A
T
E
_
N
U
M
E
R
I
C
|
C
O
U
N
T
Y
_
N
A
M
E
|
C
O
U
N
T
Y
_
N
U
M
E
R
I
C
|
P
R
I
M
A
R
Y
_
L
A
T
_
D
M
S
|
P
R
I
M
_
L
O
N
G
_
D
M
S
|
P
R
I
M
_
L
A
T
_
D
E
C
|
P
R
I
M
_
L
O
N
G
_
D
E
C
|
S
O
U
R
C
E
_
L
A
T
_
D
M
S
|
S
O
U
R
C
E
_
L
O
N
G
_
D
M
S
|
S
O
U
R
C
E
_
L
A
T
_
D
E
C
|
S
O
U
R
C
E
_
L
O
N
G
_
D
E
C
|
E
L
E
V
_
I
N
_
M
|
E
L
E
V
_
I
N
_
F
T
|
M
A
P
_
N
A
M
E
|
D
A
T
E
_
C
R
E
A
T
E
D
|
D
A
T
E
_
E
D
I
T
E
D

1
4
7
9
1
1
6
|
M
o
n
t
e
r
e
y

E
l
e
m
e
n
t
a
r
y

S
c
h
o
o
l
|
S
c
h
o
o
l
|
V
A
|
5
1
|
R
o
a
n
o
k
e

(
c
i
t
y
)
|
7
7
0
|
3
7
1
9
0
6
N
|
0
7
9
5
6
0
8
W
|
3
7
.
3
1
8
3
7
5
3
|
-
7
9
.
9
3
5
5
8
5
7
|
|
|
|
|
3
2
3
|
1
0
6
0
|
R
o
a
n
o
k
e
|
0
9
/
2
8
/
1
9
7
9
|
0
9
/
1
5
/
2
0
1
0

1
4
8
1
3
4
5
|
A
s
b
u
r
y

C
h
u
r
c
h
|
C
h
u
r
c
h
|
V
A
|
5
1
|
H
i
g
h
l
a
n
d
|
0
9
1
|
3
8
2
6
0
7
N
|
0
7
9
3
3
1
2
W
|
3
8
.
4
3
5
3
9
8
1
|
-
7
9
.
5
5
3
3
8
0
7
|
|
|
|
|
8
1
8
|
2
6
8
4
|
M
o
n
t
e
r
e
y
|
0
9
/
2
8
/
1
9
7
9
|

1
4
8
1
8
5
2
|
B
l
u
e

G
r
a
s
s
|
P
o
p
u
l
a
t
e
d

P
l
a
c
e
|
V
A
|
5
1
|
H
i
g
h
l
a
n
d
|
0
9
1
|
3
8
3
0
0
0
N
|
0
7
9
3
2
5
9
W
|
3
8
.
5
0
0
1
1
8
8
|
-
7
9
.
5
4
9
7
7
0
2
|
|
|
|
|
7
7
7
|
2
5
4
9
|
M
o
n
t
e
r
e
y
|
0
9
/
2
8
/
1
9
7
9
|

1
4
8
1
8
7
8
|
B
l
u
e
g
r
a
s
s

V
a
l
l
e
y
|
V
a
l
l
e
y
|
V
A
|
5
1
|
H
i
g
h
l
a
n
d
|
0
9
1
|
3
8
2
9
5
3
N
|
0
7
9
3
2
2
2
W
|
3
8
.
4
9
8
1
7
4
5
|
-
7
9
.
5
3
9
4
9
2
|
3
8
2
6
0
1
N
|
0
7
9
3
8
0
0
W
|
3
8
.
4
3
3
7
3
0
9
|
-
7
9
.
6
3
3
3
8
3
3
|
7
5
9
|
2
4
9
0
|
M
o
n
t
e
r
e
y
|
0
9
/
2
8
/
1
9
7
9
|

1
4
8
2
1
1
0
|
B
u
c
k

H
i
l
l
|
S
u
m
m
i
t
|
V
A
|
5
1
|
H
i
g
h
l
a
n
d
|
0
9
1
|
3
8
1
9
0
2
N
|
0
7
9
3
3
5
8
W
|
3
8
.
3
1
7
3
4
5
2
|
-
7
9
.
5
6
6
1
5
7
7
|
|
|
|
|
1
0
0
3
|
3
2
9
1
|
M
o
n
t
e
r
e
y

S
E
|
0
9
/
2
8
/
1
9
7
9
|

1
4
8
2
1
7
6
|
B
u
r
n
e
r
s

R
u
n
|
S
t
r
e
a
m
|
V
A
|
5
1
|
H
i
g
h
l
a
n
d
|
0
9
1
|
3
8
2
5
0
9
N
|
0
7
9
3
4
0
9
W
|
3
8
.
4
1
9
2
8
7
3
|
-
7
9
.
5
6
9
2
1
4
4
|
3
8
2
5
3
1
N
|
0
7
9
3
5
3
8
W
|
3
8
.
4
2
5
2
7
7
8
|
-
7
9
.
5
9
3
8
8
8
9
|
8
4
8
|
2
7
8
2
|
M
o
n
t
e
r
e
y
|
0
9
/
2
8
/
1
9
7
9
|

1
4
8
2
3
2
4
|
M
o
u
n
t

C
a
r
l
y
l
e
|
S
u
m
m
i
t
|
V
A
|
5
1
|
H
i
g
h
l
a
n
d
|
0
9
1
|
3
8
1
5
5
6
N
|
0
7
9
3
3
5
3
W
|
3
8
.
2
6
5
6
7
9
9
|
-
7
9
.
5
6
4
7
6
8
2
|
|
|
|
|
6
9
8
|
2
2
9
0
|
M
o
n
t
e
r
e
y

S
E
|
0
9
/
2
8
/
1
9
7
9
|

1
4
8
2
4
3
4
|
C
e
n
t
r
a
l

C
h
u
r
c
h
|
C
h
u
r
c
h
|
V
A
|
5
1
|
H
i
g
h
l
a
n
d
|
0
9
1
|
3
8
2
9
5
3
N
|
0
7
9
3
3
2
3
W
|
3
8
.
4
9
8
1
7
4
4
|
-
7
9
.
5
5
6
4
3
7
1
|
|
|
|
|
7
7
3
|
2
5
3
6
|
M
o
n
t
e
r
e
y
|
0
9
/
2
8
/
1
9
7
9
|

1
4
8
2
5
5
7
|
C
l
a
y
l
i
c
k

H
o
l
l
o
w
|
V
a
l
l
e
y
|
V
A
|
5
1
|
H
i
g
h
l
a
n
d
|
0
9
1
|
3
8
1
6
1
3
N
|
0
7
9
3
2
3
8
W
|
3
8
.
2
7
0
4
0
2
1
|
-
7
9
.
5
4
3
9
3
4
3
|
3
8
1
7
3
3
N
|
0
7
9
3
3
2
4
W
|
3
8
.
2
9
2
5
|
-
7
9
.
5
5
6
6
6
6
7
|
5
7
3
|
1
8
8
0
|
M
o
n
t
e
r
e
y

S
E
|
0
9
/
2
8
/
1
9
7
9
|

1
4
8
2
7
8
5
|
C
r
a
b

R
u
n
|
S
t
r
e
a
m
|
V
A
|
5
1
|
H
i
g
h
l
a
n
d
|
0
9
1
|
3
8
1
7
0
7
N
|
0
7
9
3
1
4
4
W
|
3
8
.
2
8
5
4
0
1
8
|
-
7
9
.
5
2
8
9
3
4
|
3
8
1
9
0
3
N
|
0
7
9
3
4
1
5
W
|
3
8
.
3
1
7
5
|
-
7
9
.
5
7
0
8
3
3
3
|
5
7
9
|
1
9
0
0
|
M
o
n
t
e
r
e
y

S
E
|
0
9
/
2
8
/
1
9
7
9
|

1
4
8
2
9
5
0
|
D
a
v
i
s

R
u
n
|
S
t
r
e
a
m
|
V
A
|
5
1
|
H
i
g
h
l
a
n
d
|
0
9
1
|
3
8
1
8
2
4
N
|
0
7
9
3
0
5
3
W
|
3
8
.
3
0
6
7
9
0
3
|
-
7
9
.
5
1
4
7
6
7
1
|
3
8
2
0
5
7
N
|
0
7
9
3
5
0
5
W
|
3
8
.
3
4
9
1
6
6
7
|
-
7
9
.
5
8
4
7
2
2
2
|
6
0
1
|
1
9
7
2
|
M
o
n
t
e
r
e
y

S
E
|
0
9
/
2
8
/
1
9
7
9
|

1
4
8
3
2
8
1
|
E
l
k

R
u
n
|
S
t
r
e
a
m
|
V
A
|
5
1
|
H
i
g
h
l
a
n
d
|
0
9
1
|
3
8
2
9
3
6
N
|
0
7
9
3
1
5
3
W
|
3
8
.
4
9
3
4
5
2
4
|
-
7
9
.
5
3
1
4
3
6
2
|
3
8
3
1
2
1
N
|
0
7
9
3
0
5
6
W
|
3
8
.
5
2
2
6
1
8
5
|
-
7
9
.
5
1
5
6
0
2
7
|
7
5
7
|
2
4
8
4
|
M
o
n
t
e
r
e
y
|
0
9
/
2
8
/
1
9
7
9
|

1
4
8
3
4
9
2
|
F
o
r
k
s

o
f

W
a
t
e
r
s
|
L
o
c
a
l
e
|
V
A
|
5
1
|
H
i
g
h
l
a
n
d
|
0
9
1
|
3
8
2
8
5
6
N
|
0
7
9
3
0
3
1
W
|
3
8
.
4
8
2
3
4
1
7
|
-
7
9
.
5
0
8
6
5
7
5
|
|
|
|
|
7
0
5
|
2
3
1
3
|
M
o
n
t
e
r
e
y
|
0
9
/
2
8
/
1
9
7
9
|

1
4
8
3
5
2
7
|
F
r
a
n
k

R
u
n
|
S
t
r
e
a
m
|
V
A
|
5
1
|
H
i
g
h
l
a
n
d
|
0
9
1
|
3
8
2
9
5
3
N
|
0
7
9
3
3
1
0
W
|
3
8
.
4
9
8
1
7
4
4
|
-
7
9
.
5
5
2
8
2
5
8
|
3
8
3
3
0
4
N
|
0
7
9
3
3
4
1
W
|
3
8
.
5
5
1
2
2
8
5
|
-
7
9
.
5
6
1
4
3
8
1
|
7
8
0
|
2
5
5
9
|
M
o
n
t
e
r
e
y
|
0
9
/
2
8
/
1
9
7
9
|

1
4
8
3
6
4
7
|
G
i
n
s
e
n
g

M
o
u
n
t
a
i
n
|
S
u
m
m
i
t
|
V
A
|
5
1
|
H
i
g
h
l
a
n
d
|
0
9
1
|
3
8
2
8
5
0
N
|
0
7
9
3
1
3
9
W
|
3
8
.
4
8
0
6
7
5
|
-
7
9
.
5
2
7
5
4
7
|
|
|
|
|
9
7
8
|
3
2
0
9
|
M
o
n
t
e
r
e
y
|
0
9
/
2
8
/
1
9
7
9
|

1
4
8
3
8
6
0
|
G
u
l
f

M
o
u
n
t
a
i
n
|
S
u
m
m
i
t
|
V
A
|
5
1
|
H
i
g
h
l
a
n
d
|
0
9
1
|
3
8
2
9
4
0
N
|
0
7
9
3
1
0
3
W
|
3
8
.
4
9
4
5
6
3
6
|
-
7
9
.
5
1
7
5
4
6
8
|
|
|
|
|
1
0
0
6
|
3
3
0
0
|
M
o
n
t
e
r
e
y
|
0
9
/
2
8
/
1
9
7
9
|

1
4
8
3
9
1
6
|
H
a
m
i
l
t
o
n

C
h
a
p
e
l
|
C
h
u
r
c
h
|
V
A
|
5
1
|
H
i
g
h
l
a
n
d
|
0
9
1
|
3
8
1
7
4
0
N
|
0
7
9
3
7
0
7
W
|
3
8
.
2
9
4
5
6
7
7
|
-
7
9
.
6
1
8
6
5
9
1
|
|
|
|
|
8
2
3
|
2
7
0
0
|
M
o
n
t
e
r
e
y

S
E
|
0
9
/
2
8
/
1
9
7
9
|

1
4
8
4
0
9
7
|
H
i
g
h
l
a
n
d

H
i
g
h

S
c
h
o
o
l
|
S
c
h
o
o
l
|
V
A
|
5
1
|
H
i
g
h
l
a
n
d
|
0
9
1
|
3
8
2
4
2
6
N
|
0
7
9
3
4
4
4
W
|
3
8
.
4
0
7
1
3
8
7
|
-
7
9
.
5
7
8
9
3
3
3
|
|
|
|
|
8
7
9
|
2
8
8
4
|
M
o
n
t
e
r
e
y
|
0
9
/
2
8
/
1
9
7
9
|
0
9
/
1
5
/
2
0
1
0

1
4
8
4
0
9
9
|
H
i
g
h
l
a
n
d

W
i
l
d
l
i
f
e

M
a
n
a
g
e
m
e
n
t

A
r
e
a
|
P
a
r
k
|
V
A
|
5
1
|
H
i
g
h
l
a
n
d
|
0
9
1
|
3
8
1
9
0
5
N
|
0
7
9
3
4
3
9
W
|
3
8
.
3
1
8
1
7
8
5
|
-
7
9
.
5
7
7
5
4
7
|
|
|
|
|
9
5
4
|
3
1
3
0
|
M
o
n
t
e
r
e
y

S
E
|
0
9
/
2
8
/
1
9
7
9
|

.

.

.

CS 3114 Data Structures & Algorithms J4: Geographic Information System
Version 5.00 This is a purely individual assignment! 4
Assignment

You will implement a system that indexes and provides search features for a file of GIS records, as described above.

Your system will build and maintain several in-memory index data structures to support these operations:

 Importing new GIS records into the database file
 Retrieving data for all GIS records matching given geographic coordinates
 Retrieving data for all GIS records matching a given feature name and state
 Retrieving data for all GIS records that fall within a given (rectangular) geographic region
 Displaying the in-memory indices in a human-readable manner

You will implement a single software system in Java to perform all system functions.

Program Invocation

The program will take the names of three files from the command line, like this:

java GIS

Note that this implies your main class must be named GIS, and must be in the default Java package.

The database file should be created as an empty file; note that the specified database file may already exist, in which case the
existing file should be truncated or deleted and recreated. If the command script file is not found the program should write an
error message to the console and exit. The log file should be rewritten every time the program is run, so if the file already
exists it should be truncated or deleted and recreated.

System Overview

The system will create and maintain a GIS database file that contains all the records that are imported as the program runs.
The GIS database file will be empty initially. All the indexing of records will be done relative to this file.

There is no guarantee that the GIS record file will not contain two or more distinct records that have the same geographic
coordinates. In fact, this is natural since the coordinates are expressed in the usual DMS system. So, we cannot treat
geographic coordinates as a primary (unique) key.

The GIS records will be indexed by the Feature Name and State (abbreviation) fields. This name index will support
finding offsets of GIS records that match a given feature name and state abbreviation.

The GIS records will also be indexed by geographic coordinate. This coordinate index will support finding offsets of GIS
records that match a given primary latitude and primary longitude.

The system will include a buffer pool, as a front end for the GIS database file, to improve search speed. See the discussion of
the buffer pool below for detailed requirements. When performing searches, retrieving a GIS record from the database file
must be managed through the buffer pool. During an import operation, when records are written to the database file, the
buffer pool will be bypassed, since the buffer pool would not improve performance during imports.

When searches are performed, complete GIS records will be retrieved from the GIS database file that your program
maintains. The only complete GIS records that are stored in memory at any time are those that have just been retrieved to
satisfy the current search, or individual GIS records created while importing data or GIS records stored in the buffer pool.

Aside from where specific data structures are required, you may use any suitable Java library containers you like.

Each index should have the ability to write a nicely-formatted display of itself to an output stream.



CS 3114 Data Structures & Algorithms J4: Geographic Information System
Version 5.00 This is a purely individual assignment! 5
Name Index Internals

The name index will use a hash table for its physical organization. Each hash table entry will store a feature name and state
abbreviation (separately or concatenated, as you like) and the file offset(s) of the matching record(s). Since each GIS record
occupies one line in the file, it is a trivial matter to locate and read a record given nothing but the file offset at which the
record begins.

Your table will use quadratic probing to resolve collisions, with the quadratic function (n2 + n)/2 to compute the step size.

The hash table must use a contiguous physical structure (array). The initial size of the table will be 1024, and the table will
resize itself automatically, by doubling its size whenever the table becomes 70% full.

Since the specified table sizes given above are powers of 2, an empty slot will always be found unless the table is full.

You will use the elfhash() function from the course notes, and apply it to the concatenation of the feature name and state
(abbreviation) field of the data records. Precisely how you form the concatenation is up to you.

You must be able to display the contents of the hash table in a readable manner.

Coordinate Index Internals

The coordinate index will use a bucket PR quadtree for the physical organization. In a bucket PR quadtree, each leaf stores
up to K data objects (for some fixed value of K). Upon insertion, if the added value would fall into a leaf that is already full,
then the region corresponding to the leaf will be partitioned into quadrants and the K+1 data objects will be inserted into
those quadrants as appropriate. As is the case with the regular PR quadtree, this may lead to a sequence of partitioning steps,
extending the relevant branch of the quadtree by multiple levels. In this project, K will probably equal 4, but I reserve the
right to specify a different bucket size with little notice, so this should be easy to modify.

The index entries held in the quadtree will store a geographic coordinate and a collection of the file offsets of the matching
GIS records in the database file.

Note: do not confuse the bucket size with any limit on the number of GIS records that may be associated with a single
geographic coordinate. A quadtree node can contain index objects for up to K different geographic coordinates. Each such
index object can contain references to an unlimited number of different GIS records.

The PR quadtree implementation should follow good design practices, and its interface should be somewhat similar to that of
the BST. You are expected to implement different types for the leaf and internal nodes, with appropriate data membership
for each, and an abstract base type from which they are both derived. Of course, these were all requirements for the related
minor project.

You must be able to display the PR quadtree in a readable manner. PR quadtree display code is given in the course notes.
The display must clearly indicate the structure of the tree, the relationships between its nodes, and the data objects in the leaf
nodes.

Buffer Pool Details

The buffer pool for the database file should be capable of buffering up to 15 records, and will use LRU replacement. You
may use any structure you like to organize the pool slots; however, since the pool will have to deal with record replacements,
some structures will be more efficient (and simpler) to use. You may use any classes from the Java library you think are
appropriate.

It is up to you to decide whether your buffer pool stores interpreted or raw data; i.e., whether the buffer pool stores GIS
record objects or just strings.

You must be able to display the contents of the buffer pool, listed from MRU to LRU entry, in a readable manner. The order
in which you retrieve records when servicing a multi-match search is not specified, so such searches may result in different
orderings of the records within the buffer pool. That is OK.

CS 3114 Data Structures & Algorithms J4: Geographic Information System
Version 5.00 This is a purely individual assignment! 6
A Note on Coordinates and Spatial Regions

It is important to remember that there are fundamental differences between the notion that a geographic feature has specific
coordinates (which may be thought of as a point) and the notion that each node of the PR quadtree corresponds to a particular
sub-region of the coordinate space (which may contain many geographic features).

In this assignment, coordinates of geographic features are specified as latitude/longitude pairs, and the minimum resolution is
one second of arc. Thus, you may think of the geographic coordinates as being specified by a pair of integer values.

On the other hand, the boundaries of the sub-regions are determined by performing arithmetic operations, including division,
starting with the values that define the boundaries of the “world”. Unless the dimensions of the world happen to be powers
of 2, this can quickly lead to regions whose boundaries cannot be expressed exactly as integer values. You may use floating-
point values or integer values to represent region boundaries when computing region boundaries during splitting and quadtree
traversals. If you use integers, be careful not to unintentionally create "gaps" between regions.

Your implementation should view the boundary between regions as belonging to one of those regions. The choice of a
particular rule for handling this situation is left to you. The specification for the PR quadtree project describes how I made
that decision, but there is absolutely no requirement that you follow the same approach.

When carrying out a region search, you must determine whether the search region overlaps with the region corresponding to
a subtree node before descending into that subtree. The Java libraries include a Rectangle class which could be (too) useful.
You may make use of the Rectangle class, but you will be penalized 10% if your submitted solution makes use of any of the
following Rectangle methods: contains(), intersection(), and intersects(). Note though, that it is acceptable to make use
of those methods during development, but you must implement your own versions of them in your final submission.

Other System Elements

There should be an overall controller that validates the command line arguments and manages the initialization of the various
system components. The controller should hand off execution to a command processor that manages retrieving commands
from the script file, and making the necessary calls to other components in order to carry out those commands.

Naturally, there should be a data type that models a GIS record.

There may well be additional system elements, whether data types or data structures, or system components that are not
mentioned here. The fact no additional elements are explicitly identified here does not imply that you will not be expected to
analyze the design issues carefully, and to perhaps include such elements.

Aside from the command-line interface, there are no specific requirements for interfaces of any of the classes that will make
up your GIS; it is up to you to analyze the specification and come up with an appropriate set of classes, and to design their
interfaces to facilitate the necessary interactions. It is probably worth pointing out that an index (e.g., a geographic
coordinate index) should not simply be a naked container object (e.g, quadtree); if that's not clear to you, think more carefully
about what sort of interface would be appropriate for an index, as opposed to a container.

Command File

The execution of the program will be driven by a script file. Lines beginning with a semicolon character (';') are comments
and should be ignored. Blank lines are possible. Each line in the command file consists of a sequence of tokens, which will
be separated by single tab characters. A line terminator will immediately follow the final token on each line. The command
file is guaranteed to conform to this specification, so you do not need to worry about error-checking when reading it.

The first non-comment line will specify the world boundaries to be used:

world
This will be the first command in the file, and will occur once. It specifies the boundaries of the coordinate space to be
modeled. The four parameters will be longitude and latitudes expressed in DMS format, representing the vertical and
horizontal boundaries of the coordinate space.

CS 3114 Data Structures & Algorithms J4: Geographic Information System
Version 5.00 This is a purely individual assignment! 7
It is certainly possible that the GIS record file will contain records for features that lie outside the specified coordinate
space. Such records should be ignored; i.e., they will not be indexed.

Each subsequent non-comment line of the command file will specify one of the commands described below. One command
is used to load records into your database from external files:

import
Add all the valid GIS records in the specified file to the database file. This means that the records will be appended to
the existing database file, and that those records will be indexed in the manner described earlier. When the import is
completed, log the number of entries added to each index, and the longest probe sequence that was needed when
inserting to the hash table. (A valid record is one that lies within the specified world boundaries.)

Another command requires producing a human-friendly display of the contents of an index structure:

debug[ quad | hash | pool ]
Log the contents of the specified index structure in a fashion that makes the internal structure and contents of the index
clear. It is not necessary to be overly verbose here, but you should include information like key values and file offsets
where appropriate.

Another simply terminates execution, which is handy if you want to process only part of a command file:

quit
Terminate program execution.

The other commands involve searches of the indexed records:

what_is_at
For every GIS record in the database file that matches the given , log the offset at
which the record was found, and the feature name, county name, and state abbreviation. Do not log any other data
from the matching records.

what_is
For every GIS record in the database file that matches the given and abbreviation>, log the offset at which the record was found, and the county name, the primary latitude, and the
primary longitude. Do not log any other data from the matching records.

what_is_in
For every GIS record in the database file whose coordinates fall within the closed rectangle with the specified height
and width, centered at the , log the offset at which the record was found, and the
feature name, the state name, and the primary latitude and primary longitude. Do not log any other data from the
matching records. The half-height and half-width are specified as seconds.

The what_is_in command takes an optional modifier, -long, which causes the display of a long listing of the
relevant records. The switch will be the first token following the name of the command. If this switch is present, then
for every GIS record in the database file whose coordinates fall within the closed rectangle with the specified height
and width, centered at the , log every important non-empty field, nicely formatted
and labeled. See the posted log files for an example. Do not log any empty fields. The half-height and half-width are
specified as seconds.

The what_is_in command also takes an optional modifier, causing the search results to be filtered:

-filter [ pop | water | structure ]

The switch and its modifier will be the first and second tokens following the name of the command. If present, this
causes the set of matching records to be filtered to only show those whose feature type field corresponds to the given
filter specifier. See Figure 3 for instructions on how to interpret the feature types shown above.

CS 3114 Data Structures & Algorithms J4: Geographic Information System
Version 5.00 This is a purely individual assignment! 8
If a is specified for a command, it will be expressed as a pair of latitude/longitude values,
expressed in the same DMS format that is used in the GIS record files.

For all the commands, if a search results in displaying information about multiple records, you may display that data in any
order that is natural to your design.

Sample command scripts, and corresponding log files, will be provided on the website. As a general rule, every command
should result in some output. In particular, a descriptive message should be logged if a search yields no matching records.

Log File Description

Since this assignment will be graded by TAs, rather than the Curator, the format of the output is left up to you. Of course,
your output should be clear, concise, well labeled, and correct. You should begin the log with a few lines identifying
yourself, and listing the names of the input files that are being used. No one will object if you use the posted log files as a
model for formatting your own output, but you are certainly not expected to match the formatting of those exactly.

The remainder of the log file output should come directly from your processing of the command file. You are required to
echo each comment line, and each command that you process to the log file so that it's easy to determine which command
each section of your output corresponds to. Each command (except for "world") should be numbered, starting with 1, and
the output from each command should be well formatted, and delimited from the output resulting from processing other
commands. A complete sample log will be posted shortly on the course website.

A zip file is posted, containing scripts, GIS data files, and reference log files you can use in testing. The grading will be done
using files in exactly the same format (perhaps even including some of the posted files).


Administrative Issues

Submission

You will submit this assignment to the Curator System (read the Student Guide), where it will be archived for grading by a
course TA.

For this assignment, you must submit a single zip file containing all the Java source code files for your implementation (i.e.,
.java files). Submit only the Java source files. Do not submit Java bytecode (class) files. If you use packages in your
implementation (and that's good practice), your zip file must include the correct directory structure for those packages, and
your GIS.java file must be in the top directory when the tar file is unpacked. That's easy to verify by running "unzip –
l" on the tar file you are planning to submit.

In order to correct submission errors and late-breaking implementation errors, you will be allowed up to five submissions for
this assignment. By default, we will grade the last submission you make.

The Student Guide and link to the submission client can be found at: http://www.cs.vt.edu/curator/

Evaluation

The quality of the OO design in your solution will be evaluated; you are expected to identify and implement a sound
collection of classes, whether the specification mentions them or not. The quality of your internal documentation will be
evaluated. Finally, the correctness of your solution will be evaluated by executing your solution on a collection of test data
files. Be sure to test your solution with all of the data sets that will be posted, since we will use a variety of data sets,
including at least one very large data one (perhaps hundreds of thousands of records) in our evaluation.

The quality of your design and internal documentation will determine approximately 30% of your grade for this assignment.
Pay attention to the design discussion in class for Project 1.

We will compile and test your solution on CentOS 8, using version 11.0.8 of the JDK. The choice of operating system
should not matter with a Java program (write once, run everywhere, right?), but you should compile and test your solution on
rlogin before submitting it.
CS 3114 Data Structures & Algorithms J4: Geographic Information System
Version 5.00 This is a purely individual assignment! 9
Late submissions will be penalized at a rate of 10% per day or fraction thereof. There will also be a final deadline, after
which we will not accept submissions.


Pedagogic Points

The goals of this assignment include, but are not limited to:

 implementation of a hash table generic in Java
 implementation of a bucket PR quadtree generic in Java (derived from your existing PR quadtree)
 implementation of a buffer pool in Java
 implementation of complementary internal and leaf node types to conserve memory, using an appropriate hierarchy
of types
 design of appropriate index classes to wrap the containers
 design of appropriate data objects to store in each index
 understanding how to navigate a file in Java
 understanding how to parse delimited strings in Java (based on your solution to an earlier project)
 creation of a sensible OO design for the overall system, including the identification of a number of useful classes not
explicitly named in this specification
 implementation of such an OO design into a working system
 incremental testing of the basic components of the system in isolation
 satisfaction when the entire system comes together in good working order


Pledge

Each of your program submissions must be pledged to conform to the Honor Code requirements for this course. Specifically,
you must include the pledge statement provided on the Projects page of the course website. The pledge statement should be
in the file that contains your main class (which must be named GIS.java).


CS 3114 Data Structures & Algorithms J4: Geographic Information System
Version 5.00 This is a purely individual assignment! 10
Figure 3: Feature Class Designators

The Feature Class field of the GIS records may contain any of the following designators. We care about these because of the
-filter switch that may be used with the what_is_in commands. The table below indicates which of the standard
correspond to one of the filter specifiers.

Class Type Description
Airport structure Manmade facility maintained for the use of aircraft (airfield, airstrip, landing field, landing strip).
Arch Natural arch-like opening in a rock mass (bridge, natural bridge, sea arch).
Area
Any one of several areally extensive natural features not included in other categories (badlands, barren, delta, fan,
garden).
Arroyo water Watercourse or channel through which water may occasionally flow (coulee, draw, gully, wash).
Bar
Natural accumulation of sand, gravel, or alluvium forming an underwater or exposed embankment (ledge, reef,
sandbar, shoal, spit).
Basin Natural depression or relatively low area enclosed by higher land (amphitheater, cirque, pit, sink).
Bay water
Indentation of a coastline or shoreline enclosing a part of a body of water; a body of water partly surrounded by land
(arm, bight, cove, estuary, gulf, inlet, sound).
Beach
The sloping shore along a body of water that is washed by waves or tides and is usually covered by sand or gravel
(coast, shore, strand).
Bench
Area of relatively level land on the flank of an elevation such as a hill, ridge, or mountain where the slope of the land
rises on one side and descends on the opposite side (level).
Bend water
Curve in the course of a stream and (or) the land within the curve; a curve in a linear body of water (bottom, loop,
meander).
Bridge structure
Manmade structure carrying a trail, road, or other transportation system across a body of water or depression
(causeway, overpass, trestle).
Building structure
A manmade structure with walls and a roof for protection of people and (or) materials, but not including church,
hospital, or school.
Canal water Manmade waterway used by watercraft or for drainage, irrigation, mining, or water power (ditch, lateal).
Cape Projection of land extending into a body of water (lea, neck, peninsula, point).
Cave Natural underground passageway or chamber, or a hollowed out cavity in the side of a cliff (cavern, grotto).
Cemetery A place or area for burying the dead (burial, burying ground, grave, memorial garden).
Census
A statistical area delineated locally specifically for the tabulation of Census Bureau data (census designated place,
census county division, unorganized territory, various types of American Indian/Alaska Native statistical areas).
Distinct from Civil and Populated Place.
Channel water
Linear deep part of a body of water through which the main volume of water flows and is frequently used as aroute
for watercraft (passage, reach, strait, thoroughfare, throughfare).
Church structure Building used for religious worship (chapel, mosque, synagogue, tabernacle, temple).
Civil
A political division formed for administrative purposes (borough, county, incorporated place, municipio, parish, town,
township). Distinct from Census and Populated Place.
Cliff Very steep or vertical slope (bluff, crag, head, headland, nose, palisades, precipice, promontory, rim, rimrock).
Crater
Circular-shaped depression at the summit of a volcanic cone or one on the surface of the land caused by the impact
of a meteorite; a manmade depression caused by an explosion (caldera, lua).
Crossing A place where two or more routes of transportation form a junction or intersection (overpass, underpass).
Dam structure
Water barrier or embankment built across the course of a stream or into a body of water to control and (or) impound
the flow of water (breakwater, dike, jetty).
Falls water Perpendicular or very steep fall of water in the course of a stream (cascade, cataract, waterfall).
Flat Relative level area within a region of greater relief (clearing, glade, playa).
Forest
Bounded area of woods, forest, or grassland under the administration of a political agency (see "woods") (national
forest, national grasslands, State forest).
Gap
Low point or opening between hills or mountains or in a ridge or mountain range (col, notch, pass, saddle, water
gap, wind gap).
Glacier water
Body or stream of ice moving outward and downslope from an area of accumulation; an area of relatively permanent
snow or ice on the top or side of a mountain or mountainous area (icefield, ice patch, snow patch).
Gut water Relatively small coastal waterway connecting larger bodies of water or other waterways (creek, inlet, slough).
CS 3114 Data Structures & Algorithms J4: Geographic Information System
Version 5.00 This is a purely individual assignment! 11
Harbor water Sheltered area of water where ships or other watercraft can anchor or dock (hono, port, roads, roadstead).
Hospital structure Building where the sick or injured may receive medical or surgical attention (infirmary).
Island
Area of dry or relatively dry land surrounded by water or low wetland (archipelago, atoll, cay, hammock, hummock,
isla, isle, key, moku, rock).
Isthmus Narrow section of land in a body of water connecting two larger land areas.
Lake water Natural body of inland water (backwater, lac, lagoon, laguna, pond, pool, resaca, waterhole).
Lava Formations resulting from the consolidation of molten rock on the surface of the Earth (kepula, lava flow).
Levee structure Natural or manmade embankment flanking a stream (bank, berm).
Locale
Place at which there is or was human activity; it does not include populated places, mines, and dams (battlefield,
crossroad, camp, farm, ghost town, landing, railroad siding, ranch, ruins, site, station, windmill).
Military Place or facility used for various aspects of or relating to military activity.
Mine
Place or area from which commercial minerals are or were removed from the Earth; not including oilfield (pit, quarry,
shaft).
Oilfield Area where petroleum is or was removed from the Earth.
Park structure
Place or area set aside for recreation or preservation of a cultural or natural resource and under some form of
government administration; not including National or State forests or Reserves (national historical landmark, national
park, State park, wilderness area).
Pillar Vertical, standing, often spire-shaped, natural rock formation (chimney, monument, pinnacle, pohaku, rock tower).
Plain
A region of general uniform slope, comparatively level and of considerable extent (grassland, highland, kula,
plateau, upland).
Populated
Place
pop
Place or area with clustered or scattered buildings and a permanent human population (city, settlement, town,
village). A populated place is usually not incorporated and by definition has no legal boundaries. However, a
populated place may have a corresponding "civil" record, the legal boundaries of which may or may not coincide
with the perceived populated place. Distinct from Census and Civil classes.
Post Office structure An official facility of the U.S. Postal Service used for processing and distributing mail and other postal material.
Range Chain of hills or mountains; a somewhat linear, complex mountainous or hilly area (cordillera, sierra).
Rapids water Fast-flowing section of a stream, often shallow and with exposed rock or boulders (riffle, ripple).
Reserve A tract of land set aside for a specific use (does not include forests, civil divisions, parks).
Reservoir water Artificially impounded body of water (lake, tank).
Ridge
Elevation with a narrow, elongated crest which can be part of a hill or mountain (crest, cuesta, escarpment, hogback,
lae, rim, spur).
School structure
Building or group of buildings used as an institution for study, teaching, and learning (academy, college, high school,
university).
Sea water Large body of salt water (gulf, ocean).
Slope A gently inclined part of the Earth's surface (grade, pitch).
Spring water Place where underground water flows naturally to the surface of the Earth (seep).
Stream water
Linear body of water flowing on the Earth's surface (anabranch, awawa, bayou, branch, brook, creek, distributary,
fork, kill, pup, rio, river, run, slough).
Summit
Prominent elevation rising above the surrounding level of the Earth's surface; does not include pillars, ridges, or
ranges (ahu, berg, bald, butte, cerro, colina, cone, cumbre, dome, head, hill, horn, knob, knoll, mauna, mesa,
mesita, mound, mount, mountain, peak, puu, rock, sugarloaf, table, volcano).
Swamp water
Poorly drained wetland, fresh or saltwater, wooded or grassy, possibly covered with open water (bog, cienega,
marais, marsh, pocosin).
Tower structure A manmade structure, higher than its diameter, generally used for observation, storage, or electronic transmission.
Trail Route for passage from one point to another; does not include roads or highways (jeep trail, path, ski trail).
Tunnel structure Linear underground passageway open at both ends.
Unknown This class is assigned to legacy data only. It will not be assigned to new or edited records.
Valley
Linear depression in the Earth's surface that generally slopes from one end to the other (barranca, canyon, chasm,
cove, draw, glen, gorge, gulch, gulf, hollow, ravine).
Well water Manmade shaft or hole in the Earth's surface used to obtain fluid or gaseous materials.
Woods
Small area covered with a dense growth of trees; does not include an area of trees under the administration of a
political agency (see "forest").

CS 3114 Data Structures & Algorithms J4: Geographic Information System
Version 5.00 This is a purely individual assignment! 12
Change Log Relative to Version 5.00

If changes are made to the specification, details will be noted below, the version number will be updated, and an
announcement will be made on the course Forum board.

Page
5.00 Base document. Posted on November 6.




欢迎咨询51作业君
51作业君

Email:51zuoyejun

@gmail.com

添加客服微信: abby12468