Yanlei Diao
UMass Amherst
May 1, 2008
Handling the Buffer Pool
Force every write to disk at commit time?
Provides durability.
Poor response time. Why?
No force, how can we ensure durability?
Steal buffer-pool frames from uncommitted Xacts?
If not, poor throughput. Why?
If steal, how can we ensure atomicity?

            No Steal    Steal
Force       Trivial
No Force                Desired
Basic Idea: Logging
ARIES
[Figure: LSNs in the log, pageLSNs on data pages in the DB, flushedLSN in RAM]
Log Records
LogRecord fields:
prevLSN, XID, type
Update records only: pageID, length, offset, before-image, after-image
Possible log record types:
Update
Commit
Abort
End (signifies end of commit or abort)
Compensation Log Records (CLRs), for UNDO actions
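One way to picture the record layout is the following minimal Python sketch. The snake_case field names, the `undo_next_lsn` field on CLRs, and the helpers `apply_redo` and `apply_undo` are illustrative assumptions, not part of the slides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LogRecord:
    lsn: int
    prev_lsn: Optional[int]  # previous record of the same Xact, None if first
    xid: str
    type: str                # "update", "commit", "abort", "end", or "clr"
    # The fields below are meaningful for update records only.
    page_id: Optional[str] = None
    offset: Optional[int] = None
    length: Optional[int] = None
    before: Optional[bytes] = None   # before-image, used for UNDO
    after: Optional[bytes] = None    # after-image, used for REDO
    undo_next_lsn: Optional[int] = None  # CLRs only (assumed field name)


def apply_redo(page: bytearray, rec: LogRecord) -> None:
    """Reinstall the after-image at the logged offset."""
    page[rec.offset:rec.offset + rec.length] = rec.after


def apply_undo(page: bytearray, rec: LogRecord) -> None:
    """Reinstall the before-image at the logged offset."""
    page[rec.offset:rec.offset + rec.length] = rec.before
```

Keeping both images in an update record is what lets the same record serve REDO (after-image) and UNDO (before-image).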
Other Log-Related State
Transaction Table:
One entry per active Xact.
Contains XID, status (running/committed/aborted),
and lastLSN.
Dirty Page Table:
One entry per dirty page in buffer pool.
Contains recLSN (i.e., recovery LSN) -- the LSN of
the log record which first caused the page to be
dirty.
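A minimal sketch of how the two tables might be maintained when an update is logged, assuming plain Python dictionaries and a hypothetical helper `note_update` (neither is prescribed by the slides):

```python
def note_update(xact_table, dirty_pages, lsn, xid, page_id):
    """Record the bookkeeping for one update log record.

    xact_table:  XID -> {"status": ..., "lastLSN": ...}
    dirty_pages: pageID -> recLSN
    Returns the prevLSN to store in the new log record.
    """
    entry = xact_table.setdefault(xid, {"status": "running", "lastLSN": None})
    prev_lsn = entry["lastLSN"]
    entry["lastLSN"] = lsn
    # recLSN is set only by the FIRST update that dirties the page.
    dirty_pages.setdefault(page_id, lsn)
    return prev_lsn
```

Note that a later update to an already dirty page changes lastLSN but leaves recLSN alone.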
Checkpointing
Periodically, DBMS creates a checkpoint, to
minimize the time taken to recover in the event
of a system crash. Write to log:
1. begin_checkpoint record: indicates when chkpt began.
2. end_checkpoint record: contains current Xact table and
dirty page table.
3. Store LSN of chkpt record in a safe place (master
record).
Implementing end_checkpoint
Quiescent (sharp) checkpointing:
Complete existing transactions and hold all incoming
transactions.
When no xact is running, copy current Xact table and
dirty page table to create end_checkpoint record.
Nonquiescent (fuzzy) checkpointing:
Copy Xact table and dirty page table incrementally to the
end_checkpoint record while other Xacts continue to
run.
• These tables are accurate only as of the time of the
begin_checkpoint record.
Fuzzy checkpoint to a sharp checkpoint: apply all the
log records generated during the checkpoint (or after
the begin_checkpoint time).
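The checkpoint steps can be sketched roughly as follows. This is a simplified, single-threaded illustration: the log is a Python list, an LSN is assumed to be the list index, and `take_fuzzy_checkpoint` and the `master` dictionary are hypothetical names.

```python
import copy


def take_fuzzy_checkpoint(log, xact_table, dirty_pages, master):
    """Append begin/end checkpoint records and update the master record."""
    begin_lsn = len(log)  # assumption: LSN == index in the log list
    log.append({"type": "begin_checkpoint"})
    # Snapshot the tables. In a real system other Xacts keep appending
    # log records while this copy is being made.
    snapshot = {
        "type": "end_checkpoint",
        "xact_table": copy.deepcopy(xact_table),
        "dirty_pages": dict(dirty_pages),
    }
    log.append(snapshot)
    # Only now is the checkpoint usable, so record it in a safe place.
    master["checkpoint_lsn"] = begin_lsn
    return begin_lsn
```

Because the snapshot is a copy, later changes to the live tables do not alter the end_checkpoint record; recovery compensates by scanning the log forward from begin_checkpoint.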
The Big Picture:
What’s Stored Where
LOG: LogRecords, each with prevLSN, XID, type; update records also carry pageID, length, offset, before-image, after-image. Part of the log is flushed to disk; the tail is in RAM.
DB: Data pages, each with a pageLSN; master record.
RAM: Xact Table (lastLSN, status), Dirty Page Table (recLSN), flushedLSN.
Transaction Commit
Simple Transaction Abort
For now, consider an explicit abort of a Xact.
No crash involved.
Want to “play back” the log in reverse order,
UNDOing updates.
Get lastLSN of Xact from Xact table.
Can follow chain of log records backward via the
prevLSN field.
To perform UNDO, also need to have a lock on data!
Before starting UNDO, write an Abort log record.
For recovering from crash during UNDO!
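The backward walk along prevLSN can be sketched as below. This is an illustration under several assumptions: the log is a Python list whose index is the LSN, a page is a single value in a dictionary, the Xact already holds the needed locks, and the helper names `append` and `abort_xact` are hypothetical. Writing a CLR per undone update follows the CLR record type listed earlier.

```python
def append(log, rec):
    """Append a record; assumption: its LSN is its index in the list."""
    rec["lsn"] = len(log)
    log.append(rec)
    return rec["lsn"]


def abort_xact(log, xact_table, pages, xid):
    entry = xact_table[xid]
    cursor = entry["lastLSN"]  # newest record of this Xact
    # Write the Abort record BEFORE undoing anything.
    entry["lastLSN"] = append(
        log, {"type": "abort", "xid": xid, "prevLSN": cursor})
    entry["status"] = "aborted"
    while cursor is not None:
        rec = log[cursor]
        if rec["type"] == "update":
            pages[rec["page"]] = rec["before"]  # restore before-image
            entry["lastLSN"] = append(log, {
                "type": "clr", "xid": xid, "prevLSN": entry["lastLSN"],
                "page": rec["page"], "after": rec["before"],
                "undoNextLSN": rec["prevLSN"],
            })
        cursor = rec["prevLSN"]  # follow the chain backward
    append(log, {"type": "end", "xid": xid, "prevLSN": entry["lastLSN"]})
    del xact_table[xid]
```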
Abort (Contd.)
Crash Recovery: Big Picture
Start from most recent checkpoint (via master record).
Analysis figures out which Xacts committed since checkpoint, which failed.
REDO all actions. Repeat history! Start from smallest recLSN in Dirty Page Table.
UNDO effects of failed Xacts. Back to oldest LSN of a running Xact at crash.
[Figure: log timeline, oldest to newest: oldest log rec. of Xact active at crash; smallest recLSN in dirty page table after Analysis; last chkpt; CRASH. Bars A, R, U mark the extent of each phase.]
Recovery: The Analysis Phase
Reconstruct state at checkpoint.
Get begin_checkpoint record via the master record.
Find its end_checkpoint record, read the xact table and
dirty page table (D.P.T.).
(Go back to the begin_checkpoint record.)
Reconstruct Xact Table and Dirty Page Table at
crash: scan log forward from checkpoint.
End record: Remove Xact from Xact table.
Other records: Add Xact to Xact table (if not there), set its
lastLSN to this LSN, change Xact status on commit.
Update record: If P not in Dirty Page Table,
• Add P to D.P.T., set its recLSN to this LSN.
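The forward scan can be sketched as follows, assuming the log is a list of dictionaries sorted by LSN and that `analysis` is a hypothetical function name. Two details go slightly beyond the slide and are assumptions: the status is also changed on an abort record, and CLRs are treated like updates when adding pages to the Dirty Page Table.

```python
def analysis(log, begin_ckpt_lsn):
    """Rebuild the Xact table and Dirty Page Table as of the crash."""
    records = [r for r in log if r["lsn"] >= begin_ckpt_lsn]
    end_ckpt = next(r for r in records if r["type"] == "end_checkpoint")
    # Start from the state saved in the end_checkpoint record.
    xact_table = {x: dict(e) for x, e in end_ckpt["xact_table"].items()}
    dirty_pages = dict(end_ckpt["dirty_pages"])
    for rec in records:
        if rec["type"] in ("begin_checkpoint", "end_checkpoint"):
            continue
        xid, lsn = rec["xid"], rec["lsn"]
        if rec["type"] == "end":
            xact_table.pop(xid, None)
            continue
        entry = xact_table.setdefault(
            xid, {"status": "running", "lastLSN": None})
        entry["lastLSN"] = lsn
        if rec["type"] == "commit":
            entry["status"] = "committed"
        elif rec["type"] == "abort":      # assumption, not on the slide
            entry["status"] = "aborted"
        if rec["type"] in ("update", "clr"):
            dirty_pages.setdefault(rec["page"], lsn)  # recLSN = first LSN
    return xact_table, dirty_pages
```

Any Xact still in the returned table without committed status is a loser that the UNDO phase must roll back.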
Recovery: The REDO Phase
We repeat history to reconstruct state at crash:
Reapply all updates (even of aborted Xacts!), redo CLRs.
Scan forward from log record containing smallest
recLSN in D.P.T. For each CLR or update log
record (LSN), REDO the action unless
1. Affected page is not in the Dirty Page Table, or
2. Affected page is in D.P.T., but has recLSN > LSN, or
3. pageLSN (in DB, read with 1 I/O) ≥ LSN.
To REDO an action:
Reapply logged action.
Set pageLSN to LSN in memory. No additional logging!
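The three skip conditions can be sketched as below. The representation is assumed for illustration: log records are dictionaries sorted by LSN, each page is a dictionary with `pageLSN` and `data`, and `redo` is a hypothetical function name.

```python
def redo(log, dirty_pages, disk_pages):
    """Repeat history; returns the LSNs that were actually reapplied."""
    if not dirty_pages:
        return []
    start = min(dirty_pages.values())  # smallest recLSN in the D.P.T.
    redone = []
    for rec in log:
        if rec["lsn"] < start or rec["type"] not in ("update", "clr"):
            continue
        page_id, lsn = rec["page"], rec["lsn"]
        if page_id not in dirty_pages:       # condition 1
            continue
        if dirty_pages[page_id] > lsn:       # condition 2
            continue
        page = disk_pages[page_id]           # the one I/O
        if page["pageLSN"] >= lsn:           # condition 3
            continue
        page["data"] = rec["after"]          # reapply logged action
        page["pageLSN"] = lsn                # no additional logging
        redone.append(lsn)
    return redone
```

Conditions 1 and 2 are checked first because they need no I/O; the page is read only when the tables cannot rule the record out.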
Recovery: The UNDO Phase
ToUndo={ l | l: lastLSN of a “loser” Xact}
Repeat:
Choose largest LSN among ToUndo.
If this LSN is a CLR and undonextLSN==NULL
• Write an End record for this Xact.
If this LSN is a CLR, and undonextLSN != NULL
• Add undonextLSN to ToUndo
Else this LSN is an update. Undo the update,
write a CLR, add prevLSN to ToUndo.
Until ToUndo is empty.
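The loop above can be sketched in Python as follows. The data layout is assumed for illustration: log records are dictionaries in a list sorted by LSN, a new LSN is the last LSN plus one, a page is a single value in a dictionary, and `undo_phase` is a hypothetical name. In this sketch the End record is written as soon as an update with no prevLSN has been undone.

```python
def undo_phase(log, xact_table, pages):
    """Roll back all loser Xacts, writing CLRs and End records."""
    by_lsn = {r["lsn"]: r for r in log}
    to_undo = {e["lastLSN"] for e in xact_table.values()
               if e["status"] != "committed"}

    def append(rec):
        rec["lsn"] = log[-1]["lsn"] + 1  # assumed LSN allocation scheme
        rec["prevLSN"] = xact_table[rec["xid"]]["lastLSN"]
        xact_table[rec["xid"]]["lastLSN"] = rec["lsn"]
        log.append(rec)
        by_lsn[rec["lsn"]] = rec

    while to_undo:
        lsn = max(to_undo)               # always the largest LSN first
        to_undo.remove(lsn)
        rec = by_lsn[lsn]
        if rec["type"] == "clr":
            nxt = rec["undoNextLSN"]     # skip what was already undone
        elif rec["type"] == "update":
            pages[rec["page"]] = rec["before"]
            append({"type": "clr", "xid": rec["xid"], "page": rec["page"],
                    "after": rec["before"], "undoNextLSN": rec["prevLSN"]})
            nxt = rec["prevLSN"]
        else:                            # e.g. an abort record
            nxt = rec["prevLSN"]
        if nxt is None:                  # Xact fully undone
            append({"type": "end", "xid": rec["xid"]})
            del xact_table[rec["xid"]]
        else:
            to_undo.add(nxt)
```

Because a CLR's undonextLSN points past the update it compensates, restarting this loop after a crash during UNDO never undoes the same update twice.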
Example of Recovery
LSN   LOG
00    begin_checkpoint
05    end_checkpoint
10    update: T1 writes P5
20    update: T2 writes P3
30    T1 aborts
40    CLR: Undo T1 LSN 10
45    T1 ends
50    update: T3 writes P1
60    update: T2 writes P5
      CRASH, RESTART
RAM: Xact Table (lastLSN, status), Dirty Page Table (recLSN), flushedLSN, ToUndo.
In the figure, prevLSNs chain each Xact's log records.
Example: Crash During Restart!
LSN     LOG
00,05   begin_checkpoint, end_checkpoint
10      update: T1 writes P5
20      update: T2 writes P3
30      T1 abort
40,45   CLR: Undo T1 LSN 10, T1 End
50      update: T3 writes P1
60      update: T2 writes P5
        CRASH, RESTART
70      CLR: Undo T2 LSN 60
80,85   CLR: Undo T3 LSN 50, T3 end
        CRASH, RESTART
90      CLR: Undo T2 LSN 20, T2 end
RAM: Xact Table (lastLSN, status), Dirty Page Table (recLSN), flushedLSN, ToUndo.
In the figure, undonextLSN pointers link each CLR to the next record to undo.
Summary of Logging/Recovery
Recovery Manager guarantees Atomicity &
Durability.
Use WAL to allow STEAL/NO-FORCE w/o
sacrificing correctness.
LSNs identify log records; linked into backwards
chains per transaction (via prevLSN).
pageLSN allows comparison of data page and
log records.
Summary, Cont.
Checkpointing: A quick way to limit the
amount of log to scan on recovery.
Recovery works in 3 phases:
Analysis: Forward from checkpoint.
Redo: Forward from oldest recLSN.
Undo: Backward from end to first LSN of oldest
Xact alive at crash.
Upon Undo, write CLRs.
Redo “repeats history”: simplifies the logic!