程序代写案例-INFO20003

欢迎使用51辅导，51作业君孵化低价透明的学长辅导平台，服务保持优质，平均费用压低50%以上！ 51fudao.top

INFO20003 Database Systems 1© University of Melbourne
INFO20003 Database Systems
Lecture 10
Storage and Indexing
Week 5
Dr Renata Borovica-Gajic
INFO20003 Database Systems 3© University of Melbourne
What this subject is all about.
Remember this?
Database System
Store
Access
Process
SQL
Queries
Results
select val from sales
where id = max;
Organisational Description
and Problem Area
• An investment bank wants to have a database to provide it with the ability to store information about its trading operations. The bank essentially works with
customers by providing the capability for trading stocks, shares and other commodities. The bank has three branches in which exist a number of departments.
Departments have a department manager who supervises a number of staff within the department. A set of accounts are used to store information about the
currency of the organisations operations. Accounts can be customer accounts or internal “house” accounts, each of which allow trades to be made upon them.
There are a number of account types. There are many customers and customers may have one or more contacts. Customers have a facility for lending
money to pay for their purchases of stocks and commodities. Staff make deals on the behalf of their customers using a funding source and keeping track of
settlements on the deals being made. There are many types of deal to be made. Settlements are full or partial payments of the deals and are recorded
whenever a payment is made.
• Please note that this section is purely made up and by all means is a very short description of a real investment bank (although many details have been left out
and wide ranging assumptions have been made.
MODELLING
SQL ARCHITECTURE / INTERNAL WORKINGS
INFO20003 Database Systems 4© University of Melbourne
Components of a DBMS
DBMS
Index files
Heap files
Database
Concurrency
control module
File and access
methods mgr.
Buffer pool mgr.
Disk space mgr.
Storage module Concurrency
control module
Transaction
mgr.
Lock mgr.
Crash recovery
module
Log mgr.
Query processing module
Parser/
Compiler Optimizer Executor
This is one of several possible architectures;
each system has its own slight variations.
TODAY
INFO20003 Database Systems 5© University of Melbourne
Coverage
• File organization (Heap & sorted files)
• Index files & indexes
• Index classification
Readings: Chapter 8, Ramakrishnan & Gehrke, Database Systems
INFO20003 Database Systems 6© University of Melbourne
• FILE: A collection of pages, each containing a collection of
records.
• DBMS must support:
–insert/delete/modify record
–read a particular record (specified using record id)
–scan all records (possibly with some conditions on the
records to be retrieved)
Files (in a DBMS)
Page Page
Record
col1, col2, col3
col1, col2, col3 FILE
INFO20003 Database Systems 7© University of Melbourne
Alternative File Organizations
• Many alternatives exist, each good for some situations, and
not so good in others:
1. Heap files: no particular order among records
–Suitable when typical access is a file scan retrieving all records
2. Sorted Files: pages and records within pages are ordered by some
condition
–Best for retrieval (of a range of records) in some order
3. Index File Organizations:
–Special data structure that has the fastest retrieval in some order
–Will cover shortly..
INFO20003 Database Systems 8© University of Melbourne
1. Heap (Unordered) Files
• Simplest file structure, contains records in no
particular order
• As file grows and shrinks, disk pages are allocated
and de-allocated
–Fastest for inserts compared to other alternatives

12,20 12,10
11,80
13,75

16,20 14,10
14,80
18,75

22,20
21,10
20,80
36,75
…
Used to denote that pages are linked
INFO20003 Database Systems 9© University of Melbourne
Sorted files
• Similar structure like heap files (pages and records), but pages
and records are ordered
• Fast for range queries, but hard for maintenance (each insert
potentially reshuffles records)
• Example: A sorted file ordered by age

12,20
12,10
11,80
13,75

16,20
14,10
14,80
18,75

22,20
21,10
20,80
36,75
…
INFO20003 Database Systems 10© University of Melbourne
Storage hierarchy
• Data is typically stored in pages on Hard Disks (HDD).
• To be able to process and analyze it – data needs to be
brought to Memory (RAM).
INFO20003 Database Systems 11© University of Melbourne
How does a DBMS decide which option is better?
• DBMS model the cost of all operations
• The cost is typically expressed in the number of page accesses (or disk
I/O operations – to bring data from disk to memory)
–1 page access (on disk) == 1 I/O (used interchangeably)
• Example: If we have a table of 100 records, and each page can store 10
records, what would be the cost of accessing the entire file
• Answer: For 100 records we have 10 pages in total (100/10), thus the cost
to access the entire file is 10 I/O (or 10 pages)
INFO20003 Database Systems 12© University of Melbourne
• Heap file (no order) = B;
• Sorted file (exploit order) = log2 B
• Example: Find all records with ages between 20 and 30, for the file that
has B pages. Consider both alternative: having an unsorted and sorted
file. What would be the cheapest cost?
• 20 < age <30, num pages = B
Which alternative is better?
12,20
12,10
11,80
13,75 16,20
14,10
14,80
18,75
22,20
21,10
20,80
36,7552,20
41,10
36,80
80,75
12,20
12,10
11,80
13,75
16,20
14,10
14,80
18,75
22,20
21,10
20,80
36,75
52,20
41,10
36,80
80,75
Heap file
Sorted file
INFO20003 Database Systems 14© University of Melbourne
File Organization & Indexing
• File organization (Heap & sorted files)
• Index files & indexes
• Index classification
INFO20003 Database Systems 15© University of Melbourne
Indexes
• Sometimes, we want to retrieve records by specifying the
values in one or more fields, e.g.,
–Find all students in the “CIS” department
–Find all students with a gpa > 3
• An index is a data structure built on top of data pages used for
efficient search. The index is built over specific fields called
search key fields. E.g. we can build an index on GPA, or
department name.
–The index speeds up selections on the search key fields
–Any subset of the fields of a relation can be the search key for an index
on the relation
–Note: Search key is not the same as key (e.g., doesn't have to be
unique)
INFO20003 Database Systems 16© University of Melbourne
Directory
2.5 3 3.5
1.2 1.7 1.8 1.9 2.2 2.4 2.7 2.7 2.9 3.2 3.3 3.3 3.6 3.8 3.9 4.0
2
Data Records
In Data Pages
An index contains a collection of data entries, and supports efficient
retrieval of data records matching a given search condition
Data entries:
(Index File)
(Data file)
Example: Simple Index on GPA
2Smaller than
Larger/equal than
Find results
Sorted data entries (on GPA)
INFO20003 Database Systems 17© University of Melbourne
File Organization & Indexing
• File organization (Heap & sorted files)
• Index files & indexes
• Index classification
INFO20003 Database Systems 18© University of Melbourne
Index Classification
• Classification based on various factors:
–Clustered vs. Unclustered
–Primary vs. Secondary
–Single Key vs. Composite
–Indexing technique:
−Tree-based, hash-based, other
INFO20003 Database Systems 19© University of Melbourne
Index Classification: Clustering
• Clustered vs. unclustered: If order of data records is the
same as the order of index data entries, then the index is
called clustered index. Otherwise is unclustered.
Data entries
(Index File)
(Data file)
Data Records
Data entries
Data RecordsCLUSTERED UNCLUSTERED
INFO20003 Database Systems 20© University of Melbourne
Zoom in Clustered Index
….
...
INDEX FILE
HEAP FILE
TUPLE
PAGES
Data entries 1 612
1, …
6, …
12, …
ROOT
INTERNAL
NODES
LEAF
NODES
INFO20003 Database Systems 21© University of Melbourne
Zoom in Unclustered Index
….
...
ROOT
INTERNAL
NODES
LEAF
NODES
INDEX FILE
HEAP FILE
TUPLE
PAGES
Data entries
1, …
6, …
12, …
1 126
INFO20003 Database Systems 22© University of Melbourne
Clustering properties
•A data file can have a clustered index on at most one search
key combination (i.e. we cannot have multiple clustered
indexes over a single table)
•Cost of retrieving data records through index varies greatly
based on whether index is clustered (cheaper for clustered)
•Clustered indexes are more expensive to maintain (require
file reorganization), but are really efficient for range search
INFO20003 Database Systems 23© University of Melbourne
Clustered vs. Unclustered Index: Cost
• (Approximated) cost of retrieving records found in range scan:
1. Clustered: cost ≈ # pages in data file with matching records
2. Unclustered: cost ≈ # of matching index data entries (data
records)
Data entries
(Index File)
(Data file)
Data Records
Data entries
Data RecordsCLUSTERED UNCLUSTERED
INFO20003 Database Systems 24© University of Melbourne
• Primary index includes the table’s primary key
• Secondary is any other index
• Properties:
–Primary index never contains duplicates
–Secondary index may contain duplicates
Primary vs. Secondary Index
INFO20003 Database Systems 26© University of Melbourne
• An index can be built over a combination of search keys
• Data entries in index sorted by search keys
sue 13 75
bob
12
10
20
8011
12
name age sal
cal
joe

12,20
12,10
11,80
13,75

20,12
10,12
75,13
80,11
Data records
sorted by name
• Examples:
1. Index on
2. Index on
3. Efficient to answer:
age=12 and sal = 10
age=12 and sal > 15
Composite Search Keys
Data entries
INFO20003 Database Systems 27© University of Melbourne
Hash-based index
• Hash-based index:
–Represents index as a collection of buckets. Hash function
maps the search key to the corresponding bucket.
- h(r.search_key) = bucket in which record r belongs
–Good for equality selections
Bucket 1
Bucket 3
Bucket 1
Bucket 4
Data File Index File (sal)
• Example: Hash-based index on (sal)
H=sal(mod 4)
Find Sal = 2007
2007 mod 4 = 3 go to Buck.4 …
INFO20003 Database Systems 28© University of Melbourne
Tree-based index
• Tree-based index:
–Underlying data structure is a binary (B+) tree. Nodes contain
pointers to lower levels (search left for lower, right for higher).
Leaves contain data entries sorted by search key values.
–Good for range selections
–So far we have shown those
Index File (age)
Data File
• Example: Tree-based index on (age)
Find age > 39
INFO20003 Database Systems 29© University of Melbourne
Summary
• Many alternative file organizations exist, each appropriate
in some situation
• If selection queries are frequent, sorting the file or building
an index is important
• Index is an additional data structure (i.e. file) introduced to
quickly find entries with given key values
–Hash-based indexes only good for equality search
–Sorted files and tree-based indexes best for range search; also
good for equality search
–Files rarely kept sorted in practice (because of the cost of
maintaining them); B+ tree index is better
INFO20003 Database Systems 30© University of Melbourne
What’s examinable
• Describe alternative file organizations
• What is an index, when do we use them
• Index classification
INFO20003 Database Systems 31© University of Melbourne
Next Lecture
- -
• Query processing part 1
‒ Selection and projection (execution, costs)
‒ Let’s demystify how DBMS perform work

欢迎咨询51作业君