代写辅导接单-CSCI 4380 -

欢迎使用51辅导,51作业君孵化低价透明的学长辅导平台,服务保持优质,平均费用压低50%以上! 51fudao.top

CSCI 4380 Database Systems 2025 spring

Instructor: Lei Yu Assistant Prof,

Department of Computer Science Outline: • Test 2 grades released.

• Makeup test grades will be available soon. • Lecture 21 Exercise after class:

48 hours

• Conventional indexes • B-Trees The Main Purpose of Index Structures Speedup the search process 3 indexσa=6(R) blocks containing the desired

tuples quickly

figure out

disks otherwise have to scan the entire R also need to handle dynamic changes of R Root B+Tree Example

n=3 10 0 12 0 15 0 18 0 30 3 5 11 30 35 10 0 10 1 11 0 12 0 13 0 15 0 15 6 17 9 18 0 20 0 Sample non-leaf to keys to keys

to keys

to keys < 57

57£ k<81

81£k<95

³95 57 81 95 Sample leaf node:

From non-leaf node

to next leaf

in sequence 57 81 95 To

re co rd

w ith

k ey

5 7 To

re co rd

w ith

k ey

8 1 To

re co rd

w ith

k ey

8 5 In textbook’s notation

n=3 Leaf: Non-leaf: 30 35 30 30 35 30 B+tree rules q Rule 1. All leaves are at same lowest level (balanced

tree) q Rule 2. Pointers in leaves point to records except for

“sequence pointer” q Rule 3. Number of keys/pointers in nodes: 8 Max. # pointers Max. #

keys Min. # pointers Min. # keys Non-leaf n+1 n é(n+1)/2ù é(n+1)/2ù-1 Leaf n+1 n ë(n+1)/2û+1 ë(n+1)/2û Root n+1 n 2 1 A B+tree node of order n

pn+1knk2p2k1p1 p3 …… k1 k2 ‧‧‧‧‧‧ kn p1 p2 p3 pn+1pn pn k1 k2 … p1 p2 p3 Minimum Occupancy Rule (or Minimum Fill

Factor Rule) • Balance and Predictable Height: To keep the tree from

becoming lopsided or sparse, we enforce a rule that each node

(except the root) must be at least half full. • Space Efficiency: Nodes map directly to disk blocks, and

Accessing a half-empty block is a waste of I/O. Max. # pointers Max. #

keys Min. # pointers Min. # keys Non-leaf n+1 n é(n+1)/2ù é(n+1)/2ù-1 Leaf n+1 n ë(n+1)/2û+1 ë(n+1)/2û Root n+1 n 2 1

Full node

min. node Non-leaf Leaf n=3 12 0 15 0 18 0 30 3 5 11 30 35 co un ts

e ve n

if

nu ll Search in a B+tree 1. Start from the root 2. Search in a leaf block 3. May not have to go to the data file

11 Search(ptr, k);

\\ search a record of key value k in the subtree rooted at ptr

\\ assume the B+tree is a dense index of order n

Case 1. ptr is a leaf

\\ pn+1 is the sequence pointer

IF (k = ki) for a key ki in *ptr THEN return(pi);

ELSE return(Null);

Case 2. ptr is not a leaf

find a key ki in *ptr such that ki-1 ≤ k < ki;

return(Search(pi, k));

Search in B+tree 12 A tree node Search(ptr, k);

\\ search a record of key value k in the subtree rooted at ptr

\\ assume the B+tree is a dense index of order n

Case 1. ptr is a leaf

\\ pn+1 is the sequence pointer

IF (k = ki) for a key ki in *ptr THEN return(pi);

ELSE return(Null);

Case 2. ptr is not a leaf

find a key ki in *ptr such that ki-1 ≤ k < ki;

return(Search(pi, k));

k1 k2 ‧‧‧‧‧‧ kn p1 p2 p3 pn+1pn Search in B+tree 13 100 30 120 150 180 3 5 11 30 35 100 101 110 120 130 150 156 179 180 200 root Search(ptr, k);

\\ search a record of key value k in the subtree rooted at ptr

\\ assume the B+tree is a dense index of order n

Case 1. ptr is a leaf

\\ pn+1 is the sequence pointer

IF (k = ki) for a key ki in *ptr THEN return(pi);

ELSE return(Null);

Case 2. ptr is not a leaf

find a key ki in *ptr such that ki-1 ≤ k < ki;

return(Search(pi, k));

A tree node k1 k2 ‧‧‧‧‧‧ kn p1 p2 p3 pn+1pn Search in B+tree 14 100 30 120 150 180 3 5 11 30 35 100 101 110 120 130 150 156 179 180 200 root Search(*prt, 130) Search(ptr, k);

\\ search a record of key value k in the subtree rooted at ptr

\\ assume the B+tree is a dense index of order n

Case 1. ptr is a leaf

\\ pn+1 is the sequence pointer

IF (k = ki) for a key ki in *ptr THEN return(pi);

ELSE return(Null);

Case 2. ptr is not a leaf

find a key ki in *ptr such that ki-1 ≤ k < ki;

return(Search(pi, k));

A tree node k1 k2 ‧‧‧‧‧‧ kn p1 p2 p3 pn+1pn Search in B+tree 15 100 30 120 150 180 3 5 11 30 35 100 101 110 120 130 150 156 179 180 200 root Search(*prt, 130) Search(ptr, k);

\\ search a record of key value k in the subtree rooted at ptr

\\ assume the B+tree is a dense index of order n

Case 1. ptr is a leaf

\\ pn+1 is the sequence pointer

IF (k = ki) for a key ki in *ptr THEN return(pi);

ELSE return(Null);

Case 2. ptr is not a leaf

find a key ki in *ptr such that ki-1 ≤ k < ki;

return(Search(pi, k));

A tree node k1 k2 ‧‧‧‧‧‧ kn p1 p2 p3 pn+1pn Search in B+tree 16 100 30 120 150 180 3 5 11 30 35 100 101 110 120 130 150 156 179 180 200 root Search(*prt, 130) 100 £ 130 Search(ptr, k);

\\ search a record of key value k in the subtree rooted at ptr

\\ assume the B+tree is a dense index of order n

Case 1. ptr is a leaf

\\ pn+1 is the sequence pointer

IF (k = ki) for a key ki in *ptr THEN return(pi);

ELSE return(Null);

Case 2. ptr is not a leaf

find a key ki in *ptr such that ki-1 ≤ k < ki;

return(Search(pi, k));

A tree node k1 k2 ‧‧‧‧‧‧ kn p1 p2 p3 pn+1pn Search in B+tree 17 100 30 120 150 180 3 5 11 30 35 100 101 110 120 130 150 156 179 180 200 root Search(*prt, 130) 100 £ 130 12 0 £

13 0 < 15 0 Search(ptr, k);

\\ search a record of key value k in the subtree rooted at ptr

\\ assume the B+tree is a dense index of order n

Case 1. ptr is a leaf

\\ pn+1 is the sequence pointer

IF (k = ki) for a key ki in *ptr THEN return(pi);

ELSE return(Null);

Case 2. ptr is not a leaf

find a key ki in *ptr such that ki-1 ≤ k < ki;

return(Search(pi, k));

A tree node k1 k2 ‧‧‧‧‧‧ kn p1 p2 p3 pn+1pn Search in B+tree 18 100 30 120 150 180 3 5 11 30 35 100 101 110 120 130 150 156 179 180 200 root Search(*prt, 130) 12 0 £

13 0 < 15 0 130 =130 return 100 £ 130 Search(ptr, k);

\\ search a record of key value k in the subtree rooted at ptr

\\ assume the B+tree is a dense index of order n

Case 1. ptr is a leaf

\\ pn+1 is the sequence pointer

IF (k = ki) for a key ki in *ptr THEN return(pi);

ELSE return(Null);

Case 2. ptr is not a leaf

find a key ki in *ptr such that ki-1 ≤ k < ki;

return(Search(pi, k));

A tree node k1 k2 ‧‧‧‧‧‧ kn p1 p2 p3 pn+1pn Range Search in B+tree 19 To research all records whose key values are

between k1 and k2: Range Search in B+tree 20 To research all records whose key values are

between k1 and k2: Range-Search(ptr, k1, k2) 1. Call Search(ptr, k1); 2. Follow the sequence pointers until the

search key value is larger than k2.

Range Search in

B+tree 21 Range-Search(ptr, k1, k2) 1. Call Search(ptr, k1); 2. Follow the sequence pointers until the

search key value is larger than k2.

100 30 120 150 180 3 5 11 30 35 100 101 110 120 130 150 156 179 180 200 root Search(ptr, k);

\\ search a record of key value k in the subtree rooted at ptr

\\ assume the B+tree is a dense index of order n

Case 1. ptr is a leaf

\\ pn+1 is the sequence pointer

IF (k = ki) for a key ki in *ptr THEN return(pi);

ELSE return(Null);

Case 2. ptr is not a leaf

find a key ki in *ptr such that ki-1 ≤ k < ki;

return(Search(pi, k));

A tree node k1 k2 ‧‧‧‧‧‧ kn p1 p2 p3 pn+1pn Range Search in

B+tree 22 Range-Search(ptr, k1, k2) 1. Call Search(ptr, k1); 2. Follow the sequence pointers until the

search key value is larger than k2.

100 30 120 150 180 3 5 11 30 35 100 101 110 120 130 150 156 179 180 200 root Range-Search(*prt, 50, 125) Search(ptr, k);

\\ search a record of key value k in the subtree rooted at ptr

\\ assume the B+tree is a dense index of order n

Case 1. ptr is a leaf

\\ pn+1 is the sequence pointer

IF (k = ki) for a key ki in *ptr THEN return(pi);

ELSE return(Null);

Case 2. ptr is not a leaf

find a key ki in *ptr such that ki-1 ≤ k < ki;

return(Search(pi, k));

A tree node k1 k2 ‧‧‧‧‧‧ kn p1 p2 p3 pn+1pn Range Search in

B+tree 23 Range-Search(ptr, k1, k2) 1. Call Search(ptr, k1); 2. Follow the sequence pointers until the

search key value is larger than k2.

100 30 120 150 180 3 5 11 30 35 100 101 110 120 130 150 156 179 180 200 root Range-Search(*prt, 50, 125) Search(*prt, 50) Search(ptr, k);

\\ search a record of key value k in the subtree rooted at ptr

\\ assume the B+tree is a dense index of order n

Case 1. ptr is a leaf

\\ pn+1 is the sequence pointer

IF (k = ki) for a key ki in *ptr THEN return(pi);

ELSE return(Null);

Case 2. ptr is not a leaf

find a key ki in *ptr such that ki-1 ≤ k < ki;

return(Search(pi, k));

A tree node k1 k2 ‧‧‧‧‧‧ kn p1 p2 p3 pn+1pn Range Search in

B+tree 24 Range-Search(ptr, k1, k2) 1. Call Search(ptr, k1); 2. Follow the sequence pointers until the

search key value is larger than k2.

100 30 120 150 180 3 5 11 30 35 100 101 110 120 130 150 156 179 180 200 root 50

< 1 00 30 £ 50 Search(ptr, k);

\\ search a record of key value k in the subtree rooted at ptr

\\ assume the B+tree is a dense index of order n

Case 1. ptr is a leaf

\\ pn+1 is the sequence pointer

IF (k = ki) for a key ki in *ptr THEN return(pi);

ELSE return(Null);

Case 2. ptr is not a leaf

find a key ki in *ptr such that ki-1 ≤ k < ki;

return(Search(pi, k));

Range-Search(*prt, 50, 125) Search(*prt, 50)A tree node k1 k2 ‧‧‧‧‧‧ kn p1 p2 p3 pn+1pn Range Search in

B+tree 25 Range-Search(ptr, k1, k2) 1. Call Search(ptr, k1); 2. Follow the sequence pointers until the

search key value is larger than k2.

100 30 120 150 180 3 5 11 30 35 100 101 110 120 130 150 156 179 180 200 root 120 £ 125 return 50

< 1 00 30 £ 50 110 £ 125 101 £ 125 100 £ 125 return returnreturn 135 > 125 STOP 35 < 50 Not Return Search(ptr, k);

\\ search a record of key value k in the subtree rooted at ptr

\\ assume the B+tree is a dense index of order n

Case 1. ptr is a leaf

\\ pn+1 is the sequence pointer

IF (k = ki) for a key ki in *ptr THEN return(pi);

ELSE return(Null);

Case 2. ptr is not a leaf

find a key ki in *ptr such that ki-1 ≤ k < ki;

return(Search(pi, k));

Range-Search(*prt, 50, 125) Search(*prt, 50)A tree node k1 k2 ‧‧‧‧‧‧ kn p1 p2 p3 pn+1pn Insert into B+tree 26 Insert into B+tree Basic idea: 1. Find the leaf L where the record r should be

inserted; 2. If L has further room, then insert r into L, and return; 3. If L is full, spilt L plus r into two leaves (each is

about half full): this causes an additional child for the

parent P of L, thus we need to add a child to P; 4. If P is already full, then we have to split P and add an

additional child to P’s parent … (recursively)

27 Insert into B+tree q Simple case (space available for new child) q Leaf overflow q Non-leaf overflow q New root 28 Insert into B+tree q Simple case (space available for new child) q Leaf overflow q Non-leaf overflow q New root 29 order n=3 30 I. Simple case: Insert key 32 100 30 40 3 5 11 30 31 order n=3 31 I. Simple case: Insert key 32 100 30 40 3 5 11 30 31 Insert(prt, 32) order n=3 32 I. Simple case: Insert key 32 100 30 40 3 5 11 30 31 Insert(prt, 32) 32

< 1 00 30 £ 32 <40 order n=3 33 I. Simple case: Insert key 32 100 30 40 3 5 11 30 31 Insert(prt, 32) 32

< 1 00 30 £ 32 <40 room for 32 order n=3 34 I. Simple case: Insert key 32 100 30 40 3 5 11 30 31 32 Insert(prt, 32) 32

< 1 00 30 £ 32 <40 Insert into B+tree q Simple case (space available for new child) q Leaf overflow q Non-leaf overflow q New root 35 Insert into B+tree q Simple case (space available for new child) q Leaf overflow q Non-leaf overflow q New root 36 Insert into B+tree q Simple case (space available for new child) q Leaf overflow q Non-leaf overflow q New root 37 Idea: when there is no space for a new child

(all pointers are in use), split the node into

two nodes, with both at least half full. 38 Complication: node overflow 39 Complication: Leaf overflow 40 pn+1 kn+1 p1

k1 … p k

p k

… pn kn p*

+ 1 /2

+ 1 /2

+ 1 /2 +1

+ 1 /2 +1 Complication: Leaf overflow p

k

p’

41 p1 k1 … p k

---

p** p

k

… pn kn pn+1 kn+1 ---

p* + 1 /2

+ 1 /2

+ 1 /2 +1

+ 1 /2 +1 pn+1 kn+1 p1

k1 … p k

p k

… pn kn p*

+ 1 /2

+ 1 /2

+ 1 /2 +1

+ 1 /2 +1 Complication: Leaf overflow p k

k

p’

p

k

p’

42 p1 k1 … p k

---

p** p

k

… pn kn pn+1 kn+1 ---

p* + 1 /2

+ 1 /2

+ 1 /2 +1

+ 1 /2 +1 pn+1 kn+1 p1

k1 … p k

p k

… pn kn p*

+ 1 /2

+ 1 /2

+ 1 /2 +1

+ 1 /2 +1 Complication: Leaf overflow p k

k

p’

p

k

p’

43 p1 k1 … p k

---

p** p

k

… pn kn pn+1 kn+1 ---

p* + 1 /2

+ 1 /2

+ 1 /2 +1

+ 1 /2 +1 pn+1 kn+1 p1

k1 … p k

p k

… pn kn p*

+ 1 /2

+ 1 /2

+ 1 /2 +1

+ 1 /2 +1 Complication: Leaf overflow p k

k

p’

q p

k

p’

44 p1 k1 … p k

---

p** p

k

… pn kn pn+1 kn+1 ---

p* p k

k

p’

q

+ 1 /2

+ 1 /2

+ 1 /2 +1

+ 1 /2 +1 pn+1 kn+1 p1

k1 … p k

p k

… pn kn p*

+ 1 /2 p

k

p’

+ 1 /2

+ 1 /2 +1

+ 1 /2 +1 Complication: Leaf overflow ? What is the

key here? 45 p1 k1 … p k

---

p** p

k

… pn kn pn+1 kn+1 ---

p* p k

k

p’

q

+ 1 /2

+ 1 /2

+ 1 /2 +1

+ 1 /2 +1 pn+1 kn+1 p1

k1 … p k

p k

… pn kn p*

+ 1 /2 p

k

p’

+ 1 /2

+ 1 /2 +1

+ 1 /2 +1 Complication: Leaf overflow ? What is the

key here? 46 p1 k1 … p k

---

p** p

k

… pn kn pn+1 kn+1 ---

p* p k

k

p’

q

+ 1 /2

+ 1 /2

+ 1 /2 +1

+ 1 /2 +1 pn+1 kn+1 p1

k1 … p k

p k

… pn kn p*

+ 1 /2 p

k

p’

+ 1 /2

+ 1 /2 +1

+ 1 /2 +1 Complication: Leaf overflow

+ 1 /2 +1 k

≤ x < k;

+ 1 /2 +1 47 100 30 3 5 11 30 31 Leaf overflow: Insert key 7 (order n = 3) 48 100 30 3 5 11 30 31 Leaf overflow: Insert key 7 (order n = 3) 7 49 100 30 3 5 30 31 Leaf overflow: Insert key 7 (order n = 3) 7 11 50 100 30 3 5 30 31 Leaf overflow: Insert key 7 (order n = 3) 7 11 3 5 11 7 51 100 30 3 5 30 31 Leaf overflow: Insert key 7 (order n = 3) 7 11 3 5 11 7 52 100 30 3 5 30 31 Leaf overflow: Insert key 7 (order n = 3) 7 11 3 5 11 7 53 100 30 3 5 30 31 Leaf overflow: Insert key 7 (order n = 3) 7 11 3 5 11 7 54 100 7 30 3 5 30 31 Leaf overflow: Insert key 7 (order n = 3) 7 11 3 5 11 7 55 Complication: nonleaf overflow 56 kn+1 pn+2p1

k1 … p k p

k p

k

… pn kn pn+1 ( + 1)/2( + 1)/2 -1 ( + 1)/2 -1 ( + 1)/2 p

k’ p’ ( + 1)/2 +1 ( + 1)/2 +1 Complication: nonleaf overflow 57 kn+1 pn+2p1

k1 … p k p

k p

k

… pn kn pn+1 ( + 1)/2( + 1)/2 -1 ( + 1)/2 -1 ( + 1)/2 p

k’ p’ p1

k1

p2

k2 … p k

p( + 1)/2 -1 ( + 1)/2 -1 p k

… pn kn pn+1 kn+1 pn+2( + 1)/2 +1 k ( + 1)/2p

k’ p’q ( + 1)/2 ( + 1)/2 +1 ( + 1)/2 +1 ( + 1)/2 +1 Complication: nonleaf overflow 58 kn+1 pn+2p1

k1 … p k p

k p

k

… pn kn pn+1 ( + 1)/2( + 1)/2 -1 ( + 1)/2 -1 ( + 1)/2 p

k’ p’ p1

k1

p2

k2 … p k

p( + 1)/2 -1 ( + 1)/2 -1 p k

… pn kn pn+1 kn+1 pn+2( + 1)/2 +1 k ( + 1)/2p

k’ p’q ( + 1)/2 ( + 1)/2 +1 ( + 1)/2 +1 ( + 1)/2 +1 Complication: nonleaf overflow 59 kn+1 pn+2p1

k1 … p k p

k p

k

… pn kn pn+1 ( + 1)/2( + 1)/2 -1 ( + 1)/2 -1 ( + 1)/2 p

k’ p’ p1

k1

p2

k2 … p k

p( + 1)/2 -1 ( + 1)/2 -1 p k

… pn kn pn+1 kn+1 pn+2( + 1)/2 +1 k ( + 1)/2p

k’ p’q ( + 1)/2 ( + 1)/2 +1 ( + 1)/2 +1 ( + 1)/2 +1 Complication: nonleaf overflow 60 kn+1 pn+2p1

k1 … p k p

k p

k

… pn kn pn+1 ( + 1)/2( + 1)/2 -1 ( + 1)/2 -1 ( + 1)/2 p

k’ p’ p1

k1

p2

k2 … p k

p( + 1)/2 -1 ( + 1)/2 -1 p k

… pn kn pn+1 kn+1 pn+2( + 1)/2 +1 k ( + 1)/2p

k’ p’q ( + 1)/2 ( + 1)/2 +1 ( + 1)/2 +1 ( + 1)/2 +1 Complication: nonleaf overflow ? What is the

key here? 61 kn+1 pn+2p1

k1 … p k p

k p

k

… pn kn pn+1 ( + 1)/2( + 1)/2 -1 ( + 1)/2 -1 ( + 1)/2 p

k’ p’ p1

k1

p2

k2 … p k

p( + 1)/2 -1 ( + 1)/2 -1 p k

… pn kn pn+1 kn+1 pn+2( + 1)/2 +1 k ( + 1)/2p

k’ p’q ( + 1)/2 ( + 1)/2 +1 ( + 1)/2 +1 ( + 1)/2 +1 Complication: nonleaf overflow ? not used What is the

key here? 62 kn+1 pn+2p1

k1 … p k p

k p

k

… pn kn pn+1 ( + 1)/2( + 1)/2 -1 ( + 1)/2 -1 ( + 1)/2 p

k’ p’ p1

k1

p2

k2 … p k

p( + 1)/2 -1 ( + 1)/2 -1 p k

… pn kn pn+1 kn+1 pn+2( + 1)/2 +1 k ( + 1)/2p

k’ p’q ( + 1)/2 ( + 1)/2 +1 ( + 1)/2 +1 ( + 1)/2 +1 Complication: nonleaf overflow

63 Nonleaf overflow: Insert key 160 (order n = 3) 100 120 150 180 150 156 179 180 210 160 64 Nonleaf overflow: Insert key 160 (order n = 3) 100 120 150 180 150 156 180 210 150 156 179 160 160 179 65 Nonleaf overflow: Insert key 160 (order n = 3) 100 120 150 180 150 156 180 210 150 156 179 160 160 179 66 Nonleaf overflow: Insert key 160 (order n = 3) 100 120 150 180 150 156 180 210 150 156 179 160 160 179 160 67 Nonleaf overflow: Insert key 160 (order n = 3) 100 120 150 180 150 156 180 210 150 156 179 160 160 179 no room! 160 68 Nonleaf overflow: Insert key 160 (order n = 3) 100 120 150 156 180 210 150 156 179 160 160 179 120 150 180 160 160 180150 69 Nonleaf overflow: Insert key 160 (order n = 3) 100 120 150 156 180 210 150 156 179 160 160 179 160 180150 120 150 180 160 70 Nonleaf overflow: Insert key 160 (order n = 3) 100 150 120 150 156 180 210 150 156 179 160 160 179 160 180150 120 150 180 160 Delete in B+tree 71 Delete in B+tree Basic idea: 1. Find the leaf L where the record r should be

deleted; 2. If L remains at least half-full after deleting r, then

delete r, and return; 3. Else consider the sibling L’ of L; 3.1

If L’ is more than half-full, then move a record

from L’ to L, and return;

3.2

Else combine L and L’ into a single leaf; 3.3

But now you need to consider the case of deleting

a child from L’s parent … (recursively)

72 q Simple case: the node remains at

least half-full after deletion. q Re-distribute keys among siblings q Coalesce with a sibling and delete a pointer from its father 73 Delete in B+tree q Simple case: the node remains at least half-full after deletion. q Re-distribute keys among siblings q Coalesce with a sibling and delete a pointer from its father 74 Delete in B+tree 75 order n=3Simple case: Delete key 35 76 order n=3Simple case: Delete key 35 105 10 40 3 5 8 10 20 35 Delete(prt, 35) 40 50 77 order n=3Simple case: Delete key 35 105 10 40 3 5 8 10 20 35 Delete(prt, 35) 40 50 Delete

35 78 order n=3Simple case: Delete key 35 105 10 40 3 5 8 10 20 35 Delete(prt, 35) 40 50 After deletion, the

node still keeps at

least half-full 79 order n=3Simple case: Delete key 35 105 10 40 3 5 8 10 20 40 50 q Simple case: the node remains at least half-full after

deletion. q Re-distribute keys among siblings q Coalesce with a sibling and delete a pointer from its father 80 Delete in B+tree 81 Key re-distribution at leaves p1

k1 … pi ki

--- p’

k’

p

k

p” k” q1 h1 q2 h2 … qt ht --- p* q* i = ë(n+1)/2û - 1 t > ë(n+1)/2û

82 p1

k1 … pi ki

--- p’

k’

p

k

p” k” q1 h1 q2 h2 … qt ht --- p* q* p1

k1 … pi ki q1 h1 --- p’

k’

p

k

p” k” q2 h2 … qt ht --- p* q* i = ë(n+1)/2û - 1 t > ë(n+1)/2û

Key re-distribution at leaves 83 p1

k1 … pi ki

--- p’

k’

p

k

p” k” q1 h1 q2 h2 … qt ht --- p* q* p1

k1 … pi ki q1 h1 --- p’

k’

p

k

p” k” q2 h2 … qt ht --- p* q* h2 i = ë(n+1)/2û - 1 t > ë(n+1)/2û

Key re-distribution at leaves 84 order n=3 Key re-distribution at leaves: Delete 5 85 order n=3 105 10 40 3 5 10 20 35 Delete(prt, 5) 40 50 Key re-distribution at leaves: Delete 5 86 105 10 40 3 5 10 20 35 Delete(prt, 5) 40 50 Delete 5 order n=3 Key re-distribution at leaves: Delete 5 87 105 10 40 3 5 10 20 35 Delete(prt, 5) 40 50 order n=3 Key re-distribution at leaves: Delete 5 88 105 10 40 3 5 10 20 35 Delete(prt, 5) 40 50 Less than

half-full !! order n=3 Key re-distribution at leaves: Delete 5 89 105 10 40 3 5 10 20 35 Delete(prt, 5) 40 50 Look at the sibling,

which is more than

half-full, so we can

redistribute the keysa order n=3 Key re-distribution at leaves: Delete 5 90 105 10 40 3 5 10 20 35 Delete(prt, 5) 40 50 Look at the sibling,

which is more than

half-full, so we can

redistribute the keys order n=3 Key re-distribution at leaves: Delete 5 91 105 10 40 3 10 20 35 Delete(prt, 5) 40 50 redistribution 3 10 20 35 order n=3 Key re-distribution at leaves: Delete 5 92 105 10 40 3 10 20 35 Delete(prt, 5) 40 50 redistribution 3 10 20 35 20 order n=3 Key re-distribution at leaves: Delete 5 93 105 10 40 3 10 20 35 Delete(prt, 5) 40 50 redistribution 3 10 20 35 20 20 order n=3 Key re-distribution at leaves: Delete 5 94 105 20 40 3 10 20 35 40 50 order n=3 Key re-distribution at leaves: Delete 5 95 Key re-distribution at nonleaves 96 Key re-distribution at nonleaves p1

k1 … pi ki

pi+1

p’

k’

p

k

p” k” q1 h1 q2 h2 … qt ht qt+1

-

i+1 = é(n+1)/2ù - 1 t+1> é(n+1)/2ù 97 Key re-distribution at nonleaves p1

k1 … pi ki

pi+1

p’

k’

p

k

p” k” q1 h1 q2 h2 … qt ht qt+1

-

p1

k1 … pi ki

pi+1

p’

k’

p

k

p” k” q1 h1 q2 h2 … qt ht qt+1

-

i+1 = é(n+1)/2ù - 1 t+1> é(n+1)/2ù 98 Key re-distribution at nonleaves p1

k1 … pi ki

pi+1

p’

k’

p

k

p” k” q1 h1 q2 h2 … qt ht qt+1

-

p1

k1 … pi ki

pi+1

q1

p’

k’

p

k

p” k” q1 h1 q2 h2 … qt ht qt+1

-

i+1 = é(n+1)/2ù - 1 t+1> é(n+1)/2ù 99 Key re-distribution at nonleaves p1

k1 … pi ki

pi+1

p’

k’

p

k

p” k” q1 h1 q2 h2 … qt ht qt+1

-

p1

k1 … pi ki

pi+1

q1

p’

k’

p

k

p” k” q1 h1 q2 h2 … qt ht qt+1

-

? i+1 = é(n+1)/2ù - 1 t+1> é(n+1)/2ù 100 Key re-distribution at nonleaves p1

k1 … pi ki

pi+1

p’

k’

p

k

p” k” q1 h1 q2 h2 … qt ht qt+1

-

p1

k1 … pi ki

pi+1

q1

p’

k’

p

k

p” k” q1 h1 q2 h2 … qt ht qt+1

-

? i+1 = é(n+1)/2ù - 1 t+1> é(n+1)/2ù 101 Key re-distribution at nonleaves p1

k1 … pi ki

pi+1

p’

k’

p

k

p” k” q1 h1 q2 h2 … qt ht qt+1

-

p1

k1 … pi ki

pi+1

q1

p’

k’

p

k

p” k” q1 h1 q2 h2 … qt ht qt+1

-

k’ i+1 = é(n+1)/2ù - 1 t+1> é(n+1)/2ù 102 Key re-distribution at nonleaves p1

k1 … pi ki

pi+1

p’

k’

p

k

p” k” q1 h1 q2 h2 … qt ht qt+1

-

p1

k1 … pi ki

pi+1

q1

p’

k’

p

k

p” k” q1 h1 q2 h2 … qt ht qt+1

-

k’ ? i+1 = é(n+1)/2ù - 1 t+1> é(n+1)/2ù 103 Key re-distribution at nonleaves p1

k1 … pi ki

pi+1

p’

k’

p

k

p” k” q1 h1 q2 h2 … qt ht qt+1

-

p1

k1 … pi ki

pi+1

q1

p’

k’

p

k

p” k” q1 h1 q2 h2 … qt ht qt+1

-

k’ ? i+1 = é(n+1)/2ù - 1 t+1> é(n+1)/2ù 104 Key re-distribution at nonleaves p1

k1 … pi ki

pi+1

p’

k’

p

k

p” k” q1 h1 q2 h2 … qt ht qt+1

-

p1

k1 … pi ki

pi+1

q1

p’

k’

p

k

p” k” q1 h1 q2 h2 … qt ht qt+1

-

k’ h1 i+1 = é(n+1)/2ù - 1 t+1> é(n+1)/2ù 105 Key re-distribution at nonleaves p1

k1 … pi ki

pi+1

p’

k’

p

k

p” k” q1 h1 q2 h2 … qt ht qt+1

-

p1

k1 … pi ki

pi+1

q1

p’

k’

p

k

p” k” q2 h2 … qt ht qt+1 k’ h1 i+1 = é(n+1)/2ù - 1 t+1> é(n+1)/2ù q Simple case: the node remains at least half-full after

deletion. q Re-distribute keys among siblings q Coalesce with a sibling and delete a pointer from its father 106 Delete in B+tree q Simple case: the node remains at least half-full after

deletion. q Re-distribute keys among siblings q Coalesce with a sibling and delete a pointer from its father 107 Delete in B+tree Observation: when two siblings both are no more than half full,

coalesce them into a single node (which is nearly full) 108 Node Coalescence 109 Leaf Coalescence 110 p1

k1 … pi ki

--- p’

k’

p

k

p” k” q1 h1 … qt ht --- p* q* i = ë(n+1)/2û - 1 t = ë(n+1)/2û

Leaf Coalescence 111 p1

k1 … pi ki

--- p’

k’

p

k

p” k” q1 h1 … qt ht --- p* q* p1

k1 … pi ki p’

k’

p

k

p” k” q1 h1 … qt ht

- q* q* i = ë(n+1)/2û - 1 t = ë(n+1)/2û

Leaf Coalescence 112 p1

k1 … pi ki

--- p’

k’

p

k

p” k” q1 h1 … qt ht --- p* q* p1

k1 … pi ki p’

k’

p

k

p” k” q1 h1 … qt ht

- q* q* i = ë(n+1)/2û - 1 t = ë(n+1)/2û

Leaf Coalescence 113 Leaf coalescence : Delete key 5 114 order n=3 Leaf coalescence : Delete key 5 38 10 3 5 10 20 Delete(prt, 5) 40 50 60 80 60 75 80 90 115 order n=3 Leaf coalescence : Delete key 5 38 10 3 5 10 20 Delete(prt, 5) 40 50 60 80 60 75 Delete 5 80 90 116 order n=3 Leaf coalescence : Delete key 5 38 10 3 5 10 20 Delete(prt, 5) 40 50 60 80 60 75 80 90 117 order n=3 Leaf coalescence : Delete key 5 38 10 3 5 10 20 Delete(prt, 5) 40 50 60 80 60 75 The sibling is just

half-full, so we

should coalesce 80 90 118 order n=3 Leaf coalescence : Delete key 5 38 10 3 10 20 10 20 Delete(prt, 5) 40 50 60 80 60 75 3 5 10 20 Leaf

coalescence 80 90 119 order n=3 Leaf coalescence : Delete key 5 38 10 3 10 20 10 20 Delete(prt, 5) 40 50 60 80 60 75 3 5 10 20 Less than

half-full Leaf

coalescence 80 90 120 order n=3 Leaf coalescence : Delete key 5 38 10 3 10 20 10 20 Delete(prt, 5) 40 50 60 80 60 75 3 5 10 20 Less than

half-full Leaf

coalescence more than

half-full, so

we can re- distribute

pointers at

nonleaves 80 90 121 order n=3 38 10 3 10 20 10 20 Delete(prt, 5) 40 50 60 80 60 75 3 5 10 20 Less than

half-full Leaf

coalescence more than

half-full, so

we can re- distribute

pointers at

nonleaves 80 90 Leaf coalescence : Delete key 5 122 order n=3 Key re-distribution at Nonleaves 38 10 3 10 20 10 20 Delete(prt, 5) 40 50 60 80 60 75 3 5 10 20 Less than

half-full Leaf

coalescence more than

half-full, so

we can re- distribute

pointers at

nonleaves 80 90 123 order n=3 Key re-distribution at Nonleaves 38 10 3 10 20 10 20 Delete(prt, 5) 40 50 60 80 60 75 3 5 10 20 Less than

half-full Leaf

coalescence 38 60 more than

half-full, so

we can re- distribute

pointers at

nonleaves 80 90 124 order n=3 Key re-distribution at Nonleaves 38 10 3 10 20 10 20 Delete(prt, 5) 40 50 60 80 60 75 3 5 10 20 Less than

half-full Leaf

coalescence 38 60 80 more than

half-full, so

we can re- distribute

pointers at

nonleaves 80 90 125 order n=3 Key re-distribution at Nonleaves 60 38 3 10 20 40 50 80 80 60 75 80 90 126 Nonleaf Coalescence 127 p1

k1 … pi ki

pi+1 p’ k’

p

k p” k” q1 h1 … qt ht qt+1

i+1 = é(n+1)/2ù - 1 t+1= é(n+1)/2ù Nonleaf Coalescence 128 p1

k1 … pi ki

pi+1 p’ k’

p

k p” k” q1 h1 … qt ht qt+1

p1

k1 … pi ki

pi+1 p’ k’

p

k p” k” q1 h1 … qt ht qt+1 q1 h1 … qt ht qt+1

i+1 = é(n+1)/2ù - 1 t+1= é(n+1)/2ù Nonleaf Coalescence 129 p1

k1 … pi ki

pi+1 p’ k’

p

k p” k” q1 h1 … qt ht qt+1

p1

k1 … pi ki

pi+1 p’ k’

p

k p” k” q1 h1 … qt ht qt+1 q1 h1 … qt ht qt+1 ? i+1 = é(n+1)/2ù - 1 t+1= é(n+1)/2ù Nonleaf Coalescence 130 p1

k1 … pi ki

pi+1 p’ k’

p

k p” k” q1 h1 … qt ht qt+1

p1

k1 … pi ki

pi+1 p’ k’

p

k p” k” q1 h1 … qt ht qt+1 q1 h1 … qt ht qt+1 ? i+1 = é(n+1)/2ù - 1 t+1= é(n+1)/2ù Nonleaf Coalescence 131 p1

k1 … pi ki

pi+1 p’ k’

p

k p” k” q1 h1 … qt ht qt+1

p1

k1 … pi ki

pi+1 p’ k’

p

k p” k” q1 h1 … qt ht qt+1 q1 h1 … qt ht qt+1 k i+1 = é(n+1)/2ù - 1 t+1= é(n+1)/2ù Nonleaf Coalescence 132 p1

k1 … pi ki

pi+1 p’ k’

p

k p” k” q1 h1 … qt ht qt+1

p1

k1 … pi ki

pi+1 p’ k’

p

k p” k” q1 h1 … qt ht qt+1 q1 h1 … qt ht qt+1 k i+1 = é(n+1)/2ù - 1 t+1= é(n+1)/2ù Nonleaf Coalescence 133 order n=3 Nonleaf coalescence : Delete key 5 134 order n=3 Nonleaf coalescence : Delete key 5 55 10 Delete(prt, 5) 55 58 60 61 723 5 10 20 135 order n=3 Nonleaf coalescence : Delete key 5 55 10 Delete(prt, 5) 55 58 60 61 723 5 10 20 delete 5 136 order n=3 Nonleaf coalescence : Delete key 5 55 10 Delete(prt, 5) 55 58 60 61 723 5 10 20 137 order n=3 Nonleaf coalescence : Delete key 5 55 10 Delete(prt, 5) 55 58 60 61 723 10 20 10 20 leaf

coalescence 3 5 10 20 138 order n=3 Nonleaf coalescence : Delete key 5 55 10 Delete(prt, 5) 55 58 60 61 723 10 20 10 20 leaf

coalescence 3 5 10 20 less than

half-full 139 order n=3 Nonleaf coalescence : Delete key 5 55 10 Delete(prt, 5) 55 58 60 61 723 10 20 10 20 leaf

coalescence 3 5 10 20 less than

half-full just half-full,

so we need to

coalesce 140 order n=3 Nonleaf coalescence : Delete key 5 55 55 60 Delete(prt, 5) 55 58 60 61 723 10 20 10 20 leaf

coalescence 3 5 10 20 55 60 nonleaf

coalescence 141 order n=3 Nonleaf coalescence : Delete key 5 55 55 60 Delete(prt, 5) 55 58 60 61 723 10 20 10 20 leaf

coalescence 3 5 10 20 55 60 nonleaf

coalescence new root 142 order n=3 Nonleaf coalescence : Delete key 5 55 60 55 58 61 723 10 20 B+tree deletions in practice q Often, coalescing is not implemented

Too hard and not worth it! 143 Pseudo Code for Insertion in B+tree Insert(pt, (p, k), (p', k')); \\ technically, the smallest key kmin in pt is also returned

\\ (p, k) is a pointer-key pair to be inserted into the subtree rooted at pt; p' is a new sibling

\\ of pt, if created, and k' is the smallest key value in p' ;

Case 1. pt = (p1, k1, ..., pi, ki, --, pn+1) is a leaf

\\ pn+1 is the sequence pointer

IF

(i < n) THEN insert (p, k) into pt;

return(p' = Null, k' = 0);

ELSE re-arrange (p1, k1), ..., (pn, kn),

and (p, k) into (ρ1, κ1, ..., ρn+1, κn+1);

create a new leaf p''; put (ρr+1, κr+1, …, ρn+1, κn+1, --, pn+1) in p'';

\\ r =

ë(n+1)/2û

put (ρ1, κ1, ..., ρr, κr, --, p'') in pt;

\\ pn+1 and p'' are sequence pointers in p'' and pt.

IF pt is the root THEN create a new root with children pt and p'' (and key κr+1)

ELSE return(p' = p'', k' = κr+1);

Case 2. pt is not a leaf

find a key ki in pt such that ki-1 ≤ k < ki;

Insert(pi , (k, p), (k", p"));

IF (p" = Null) THEN return(k' = 0, p' = Null);

ELSE IF there is room in pt, THEN insert (k", p") into pt; return(k' = 0, p' = Null);

ELSE re-arrange the content in pt and (k", p") into (ρ1, κ1, ..., ρn+1, κn+1, ρn+2);

create a new node p''; put (ρr+1, κr+1, …, ρn+1, κn+1, ρn+2, -- ) in p''; \\

r = é(n+1)/2ù

leave (ρ1, κ1, ..., ρr-1, κr-1, ρr , -- ) in pt;

IF pt is the root THEN create a new root with children pt and p'' (and key κr)

ELSE return(p' = p'', k' = κr ).

144 Pseudo Code for Deletion in a B+tree Delete(pt, (k,p), belowmin); \\ technically, the smallest key kmin in *pt is also returned

\\ (k,p) is the data record to be deleted from the subtree rooted at pt; belowmin = true if

\\ after deletion, pt has fewer than the required min # of pointers;

Case 1. pt is a leaf

delete (k,p) in pt;

IF pt has at least ë(n+1)/2û data pointers OR pt is the root

THEN return (belowmin = false) ELSE return (belowmin = true);

Case 2. pt is not a leaf

find a key ki in pt such that ki-1 ≤ k < ki;

Delete(pi , (k, p), belowmin');

IF (not belowmin') THEN return(belowmin= false);

ELSE IF pi has an adjacent sibling p' that has more than the required min # of pointers

THEN move one key-pointer pair from p' to pi;

ELSE combine pi with an adjacent sibling of pi

into a single node;

IF pt is the root with only one pointer pi

THEN pt = pi; return(belowmin= false);

IF pt has at least é(n+1)/2ù pointers OR pt is the root

THEN return(belowmin= false) ELSE return(belowmin= true);

145 Application of B+ Tree • Range query 19. Range Search in B+tree • (a,b) • What if b is infinite?

- to the end of the chain of leaves. • What if a is –infinite? – start from the first leaf. • Duplicate keys • B-tree on multiple attributes

• Scan efficiency B-tree with Duplicate keys • A non-leaf node stores the first key of the first node that is not

already present in the previous left sibling.

• If there is no such key, then a null (placeholder) value is stored at

this location. B-tree with Duplicate keys • A non-leaf node stores the first key of the first node that is not

already present in the previous left sibling.

• If there is no such key, then a null (placeholder) value is stored at

this location. B-tree with Duplicate keys • A non-leaf node stores the first key of the first node that is not

already present in the previous left sibling.

• If there is no such key, then a null (placeholder) value is stored at

this location. B-tree with Duplicate keys • A non-leaf node stores the first key of the first node that is not

already present in the previous left sibling.

• If there is no such key, then a null (placeholder) value is stored at

this location. B-tree with Duplicate keys • A non-leaf node stores the first key of the first node that is not

already present in the previous left sibling.

• If there is no such key, then a null (placeholder) value is stored at

this location. Any key search becomes range query. Index on multiple attributes • An index on multiple attributes like A,B will sort tuples first by A

and then by B Index on R(X, Y) X = 'C' & Y = 8 X = 'J' & 3 <= Y <= 8 Search for J,3 - J,8 'C' <= X <= 'E' & Y = 4 Search for C,4 - E,4 'C' <= X <= 'E' & 3 <= Y <= 8 Search for C,3 - E,8 Y <= 5 B-tree indices

• Each node in a B-tree is a disk page (stored on disk)

• Each B-tree is created on one or more key values

• Leaf nodes store one entry for each tuple: each entry has a key

value and a pointer to a tuple (dense index: one leaf node for each

tuple)

Example • 92,000 tuples in instructors Index on instructors(coursecode,

instructor)

• To index each tuple: 9 (course code) + 16 (instructor) + 20 byte (for

tuple address) 45 bytes per tuple!

• Index page (assume) is 8KB Store: 8000/45 = about 177 tuples per

node

• Each node must have: 177 tuples max (88 tuples min) Assume

each node is 100% full (mostly):

• At leaf level: ceil(92,000/177) = 520 nodes in the leaf level

• The level above: ceil(520/177) = 3 nodes in the next internal level The level

above: ceil(3/177) = 1 node (root)

• Assume each node is 50% full (mostly):

• At leaf level: 1040 nodes At a level above: 12 nodes At a level above: 1

node Cost of queries/index search •

Number of disk pages read/written •

For any search, find the first leaf node in the search and then

scan leaf nodes left to right following sibling pointers until the end

of the range is reached. This will find all the matching tuples for a

query.

• Depending on the query, these tuples may need to be read from

disk as well.

Example • R(A,B,C,D,E), Assume: 100,000 tuples 10,000 pages ( 10 tuples

per page)

• Index I1 on R(A): 400 pages at leaf, 3 levels (root+internal+leaf)

• Index I2 on R(B): 400 pages at leaf, 3 levels (root+internal+leaf)

• Index I3 on R(A,B): 1,000 pages at leaf, 3 levels

(root+internal+leaf)

• Index I4 on R(B,A): 1,000 pages at leaf, 3 levels

(root+internal+leaf)

Example • SELECT A,B,C,D,E FROM R

WHERE 10 <= A and A <= 20 ; (returns: 100 tuples)

Query plan 1 -> Scan the whole relation: Cost = 10,000

Query plan 2: Scan I1 (find tuple pointers that match 10<=A<=20) + Read all

tuples that I find based on the index from relation R 100,000/400 = 250 tuples per node

100 tuples for this query will fit in a single leaf node!

Cost to scan the index = 1 (root) + 1 (internal) + 1 or 2 (leaf) = 4 max

To read the matching 100 tuples: I need to read pages of R

How many pages do I need to read?

Best case: 10 pages (all packed sorted by A)

+ 4 = 14 pages Worst case: 100 Cost of this query (max) : 4 + 100 = 104 pages

Example • Query plan 3: Use index I3:

• 100,000 (tuples of R) / 1,000 = 100 tuples per node

• Cost of index search = 4 (max)

• Cost of searching the relation = 100 Total cost = 104 (max) • Query plan 4: Use index I4: Index I4 on R(B,A): 1,000 pages at leaf,

3 levels (root+internal+leaf)

• Scan all leaf, cost of index search = 1 (root) + 1 (internal) + 1,000 (all leaf)

= 1002

• Scan the tuples = 100

• Cost = 1,102 Example • SELECT B FROM R WHERE 10 <= A and A <= 20 ; (returns: 100

tuples)

• Query plans:

• 1. Scan all of R = 10,000

• 2. Use I1 on R(A) = 104 (max)

• 3. Use I3 on R(A,B) = 4 (index only search)

• 4. Use I4 on R(B,A) = 1,002 (index only search) 51作业君版权所有

51作业君

Email:51zuoyejun

@gmail.com

添加客服微信: Fudaojun0228