
Computer System Organization & Architecture

Thursday, September 8, 2016

12:03 AM

Interrupts

- Signal sent to CPU → Stop currently running prog → Allow OS to do immediate jobs
- Interrupt Service Routine (ISR)/Interrupt Handler: Code run in response to an interrupt
  Interrupt vector: Stores all ISR addresses
  Polling: Tests all devices in turn → Determine which interrupt occurred
- Incoming interrupts disabled while another is processed → Prevent loss
- Alter current prog's control flow:
  Interrupt → Hardware exec instruction at ISR address

Computer Organization

- CPUs: Move data from/to main mem, local buffers
- Device controller:
  In charge of a device
  Has local buffer
  Uses interrupt to inform CPU it finished operations
- Concurrent exec of CPUs, devices:
  Compete for mem cycles
- Bus:
  Connects CPUs, device controllers
  Provides access to shared mem

Direct Memory Access (DMA)

- Problem with interrupt-based I/O:
  Processor involved throughout I/O processing
  Large data transferred → Many interrupts
- Solution: DMA:
  Device controller transfers data block from buffer directly to main mem without CPU intervention
  Only 1 interrupt/block → Not disturb CPU too much

Interrupt Types

- External:
  From I/O devices: keyboard, mouse, …
  Timer: Interrupt CPU every few ms
- Internal/Exception:
  Generated when hardware detects errors in prog:
    arithmetic (division by zero), addressing (bad pointer), invalid instruction, …
- Prog generated/SuperVisor Call (SVC):
  Used to transfer control to OS
  Means for user prog to call OS funcs

Computer System Architecture

- Multiprocessor system:
  ≥2 processors in close comm, share bus, clock, mem, peripheral devices
  Advantages:
    Throughput (more work in less time)
    Reliability (1 fails, others still run)
    Econ of scale (Cheaper than multiple single-processor sys)
  Types:
    Asym multiprocessing: Master-slave CPUs
    Sym multiprocessing (SMP): Each processor has own reg, cache but share mem
- Uniform Mem Access (UMA):
  All processors access mem through shared bus
  Any CPU accesses any RAM in same time interval

Storage Hierarchy Structure

Storage device          Type          Feature
Register                Volatile      In CPU, high-speed
Cache                   Volatile      In CPU/Module
Main mem (RAM)          Volatile      CPU can access directly
Second mem (SSD, HDD)   Non-volatile  Extension of RAM, large storage
Tertiary storage        Non-volatile  For backup/archival data

- Non-uniform Mem Access (NUMA):
  Each processor has local mem, accesses mem of other processors through shared bus

Caching

- Idea: Store copies of data from slower to faster storage on temp basis
- Assumption: Data will be used again soon
- Mechanism: Faster storage first checked if info there
  HIT: Info used directly from cache
  MISS: Copy data from slower storage & use
- Trait: Cache smaller than storage being cached

- Clustered Systems:
  ≥2 individual sys loosely coupled
  Share storage via storage-area network (SAN)
  Types:
    Asym clustering: 1 machine monitors others running app
    Sym clustering: Multiple nodes run app, monitor each other


Fundamentals of Operating System


Friday, September 9, 2016

11:33 PM

Operating System

- Prog intermediary between user & hardware:
  Kernel: Main prog running at all times
  Others: sys/app progs
- Goals:
  Control, coordinate use of sys res
  Use hardware in efficient, protected manner
  Make comp sys convenient to users (services)

OS Implementation

- Overview Workflow:
  Comp startup, boot up OS:
    CPU jumps to bootstrapping code in firmware (ROM/EPROM)
    Bootstrapping code initializes all sys parts: registers, device controllers, mem
    Load & start OS kernel, system processes/daemons
  Wait to handle events:
    Interrupts, sys calls, …
- Multiprogram:
  Organize jobs (code + data) → Keep CPU busy at all times
  Job scheduling:
    Select & run job
    When job has to wait (e.g. I/O), switch to another
- Multitasking/Time sharing:
  CPU switches jobs frequently → Users can interact with each job while it runs → Interactive platform
- Dual-mode Operation:
  User mode:
    Certain machine instructions, regs, mem locs not avail
    I/O can't be accessed
  Kernel mode: No restriction
  Mode bit: Provided by hardware, helps distinguish user/kernel mode
  Transfer user → kernel mode via sys call
  Purpose: Protect OS from errant users, users from one another
- Timer:
  Interrupt CPU after specific interval
  Help prevent user prog from getting stuck/hogging resource/never returning control back to OS


Operating System Structures


Saturday, September 10, 2016

12:53 AM

System call

- Sys call: Prog requests services from OS kernel
- Implementation:
  Libraries in programming lang serve as link to OS sys calls
  A number associated with each sys call, maintained by sys call interface through index table
  Callers know nothing about implementation of sys call
- Param passing:
  Pass in regs (limits no of params passed)
  Pass addr of mem block storing params in a reg
  Push onto stack by program, pop by OS
- Types:
  Process control: Create, terminate, wait, …
  File management
  Device management: Request, release device, …
  Info maintenance: Get/set time, …
  Comm: Send/receive msg
  Protection: Control access to resource, get/set perm, …

OS Services

- User interface:
  CLI (Command Line Interface)
  GUI (Graphical User Interface)
  Batch interface: Exec commands, directives stored in file
  Touchscreen
- Prog exec: Load prog into mem & run
- I/O operations
- File-sys manipulation: Read, write, create, delete, list, search files & dirs
- Comm: Exchange info between procs on same/different computers
- Error detection
- Resource alloc: Alloc resources to users/jobs concurrently running
- Accounting: Keep track of which users use how much resources
- Protection & Security:
  Protection: Ensure all access to sys resources under control
  Security: Defend external users/devices from invalid access
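As a concrete illustration of the number-per-sys-call idea, a minimal sketch in C using the POSIX/Linux syscall() wrapper (SYS_getpid is the index-table number from <sys/syscall.h>; Linux-specific):

    #include <stdio.h>
    #include <unistd.h>
    #include <sys/syscall.h>   /* SYS_getpid: index-table number for getpid */

    int main(void) {
        /* Library wrapper: libc's getpid() traps into the kernel for us */
        pid_t a = getpid();
        /* Equivalent direct invocation by sys-call number -- the caller
           never sees how the kernel implements it */
        pid_t b = (pid_t)syscall(SYS_getpid);
        printf("getpid() = %d, syscall(SYS_getpid) = %d\n", (int)a, (int)b);
        return 0;
    }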

Separation of Policy vs Mechanism

- Policy: Defines what needs to be done
- Mechanism: Defines how it is actually implemented
- Separating them → Policy can be changed without changing mechanism (& vice versa), minimizing changes needed

OS System Design

General principle: Modularity:
  Divide funcs into modules → Simplify debug & sys verification

Monolithic:
  Def:
    - Limited structuring
    - Typically: Kernel = Everything below sys-call interface, above physical hardware
  Pros:
    - Little overhead → Performance
  Cons:
    - Difficult to implement & maintain

Layered:
  Def:
    - OS = n layers:
      Layer 0 = Hardware
      Layer n = UI
    - Each layer only interacts with 2 others:
      Built on top of lower
      Provides service for upper
  Pros:
    - Info hiding: Upper layer not need to know how lower implemented, only what it does & how to use it
    - Simple to construct & debug
  Cons:
    - Difficult to appropriately define layers
    - Performance: Each layer adds overhead

Microkernel:
  Def:
    - Remove all non-essential components from kernel, implement them as user-lvl progs
    - Kernel keeps minimal proc & mem management
    - Comm between client progs & services through msg passing
  Pros:
    - Easy to extend OS: New services added to user space → No need to modify kernel
    - Easy to port OS to new hardware
    - Reliable: Services running in user space → Their failure not affect kernel
  Cons:
    - Performance: User space ↔ Kernel comm overhead

Modular:
  Def:
    - Core services + (Dynamically) loadable kernel modules
    - Layered + Microkernel hybrid:
      Like layered: Each kernel section has defined, protected interface
        BUT no hierarchy: Any module can call any other
      Like microkernel: Core module has core funcs, knowledge of how to load & comm with sub-modules
        BUT sub-modules need not invoke msg passing to comm
  Pros:
    - Flexibility: New funcs added/removed from kernel without need to recompile/reboot kernel
  Cons:
    - Difficult to design: Which funcs in core, which in sub-modules?
    - Comm between sub-modules: Complex implementation
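To make the loadable-kernel-module idea concrete, a minimal sketch of a Linux module (illustrative, not from the course notes; built with the usual kbuild Makefile and loaded/unloaded with insmod/rmmod, no kernel recompile or reboot):

    #include <linux/init.h>
    #include <linux/module.h>
    #include <linux/kernel.h>

    MODULE_LICENSE("GPL");
    MODULE_DESCRIPTION("Toy module: added/removed without recompiling kernel");

    static int __init toy_init(void)   /* runs at insmod */
    {
        printk(KERN_INFO "toy: loaded\n");
        return 0;
    }

    static void __exit toy_exit(void)  /* runs at rmmod */
    {
        printk(KERN_INFO "toy: unloaded\n");
    }

    module_init(toy_init);
    module_exit(toy_exit);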

Virtualization

- VM (Virtual machine):
  Create illusion of multiple procs, each exec on own processor with own (virtual) mem
  Benefits:
    Programming simplicity: Each proc thinks it has all mem/CPU time
    Fault isolation:
      Procs unable to directly impact others
      Bugs not crash whole machine
    Protection & portability
- Virtualization: Run OS as app within another OS
- Emulation:
  Used when source CPU type ≠ target CPU type
  Allow apps compiled for source CPU to run on target CPU
  Generally slow (Translation of source sys instructions to equiv target sys instructions)

Process
Wednesday, September 21, 2016

1:18 PM

Overview

- Def: Prog in exec
  Prog = Passive entity
    File containing list of instructions
  Proc = Active entity (with prog counter, processor regs specifying next instruction, …)
  Exec file loaded into mem: Prog → Proc
  1 prog → ≥1 procs
- Components:
  Text: Prog code
  Stack: Temp data
    Func params, local vars, …
  Data: Global vars
  Heap: Dynamically allocated mem

Process Control Block (PCB)

- Each proc represented by PCB in OS
- PCB Components (single thread exec):
  State: Run, wait, ready, halt, …
  ID: Unique no for each proc
  Prog counter: Instruction loc of next exec
  CPU regs
  Mem-management info
  I/O status: I/O devices allocated, list of open files
  CPU scheduling info: Priorities, scheduling queue pointers
  Accounting info:
    CPU used, time limits, clock time elapsed since start
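A hypothetical, simplified PCB layout in C (field names and sizes are illustrative assumptions, not from any real kernel):

    #include <stdio.h>

    enum proc_state { NEW, READY, RUNNING, WAITING, TERMINATED };

    struct pcb {
        int             pid;            /* ID: unique no per proc            */
        enum proc_state state;          /* run / wait / ready / ...          */
        unsigned long   prog_counter;   /* loc of next instruction to exec   */
        unsigned long   regs[16];       /* saved CPU registers               */
        void           *page_table;     /* mem-management info               */
        int             open_files[16]; /* I/O status: open file descriptors */
        int             priority;       /* CPU scheduling info               */
        struct pcb     *next;           /* scheduling queue pointer          */
        unsigned long   cpu_time_used;  /* accounting info                   */
    };

    int main(void) {
        struct pcb p = { .pid = 1, .state = READY, .priority = 5 };
        printf("pid=%d state=%d prio=%d\n", p.pid, p.state, p.priority);
        return 0;
    }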


Process Scheduling

- Purpose: Max CPU use by selecting suitable, avail proc for next exec
- How:
  Maintain scheduling queues of procs:
    Job queue: All procs in sys
    Ready queue: Procs in main mem, waiting for exec
    Device queues: Procs waiting for I/O
  Migrate procs among queues
- Types of scheduler:
  Long-term (Job scheduler):
    Function: Select which procs should be brought into ready queue
    Controls deg of multiprog
    Existence in time-sharing sys: Almost absent/Minimal
    Speed: Lowest (invoked every secs, mins)
  Short-term (CPU scheduler):
    Function: Select which proc should be exec next & alloc CPU
    Lesser control over deg of multiprog
    Existence in time-sharing sys: Minimal
    Speed: Highest (invoked every millisecs)
  Medium-term (Process-swapping scheduler):
    Function: Remove proc from mem, store on disk, bring back later to continue exec
    Reduces deg of multiprog
    Existence in time-sharing sys: Part of
    Speed: Middle
- Proc types:
  I/O bound:
    Spend more time on I/O than computations
    Many short CPU bursts
  CPU-bound:
    Spend more time on computations
    Few long CPU bursts
- States:
  New: Created
  Ready: Waiting to be assigned to CPU
  Running: Instructions exec
  Waiting: Waiting for some event to occur (I/O)
  Terminated: Finished exec

Context Switch

- Switch of CPU from 1 proc/thread to another
  Proc context stored in PCB
  Save state of old proc
  Load state of new proc
- Properties:
  Overhead: Sys does no useful work while switching
  Highly dependent on hardware support
    SUN UltraSPARC: Multiple sets of regs per CPU → multiple contexts loaded at once

Process Creation

- What OS does:
  Create new PCB (inexpensive)
  Setup new page tables for addr space (more expensive)
  Copy data from parent proc (very expensive)
  Copy I/O state (medium expensive)
- UNIX commands:
  fork(): Create new proc:
    Make 2 identical copies of addr space, 1 for parent, 1 for new child proc. BUT:
      In parent: fork() returns child's pid if successfully created, -1 otherwise
      In child: fork() returns 0
    Both procs start exec at next statement following fork() sys call
  execl(), execlp(), execle(), execv(), execvp(), execvpe():
    Child overwrites its proc image with new one, starts completely different prog
  wait(pid):
    Parent waits for child to complete
    PID of child & its entry in proc table then released
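A minimal runnable sketch of the fork()/exec()/wait() pattern (POSIX; ls as the exec'd prog is an arbitrary choice):

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/wait.h>

    int main(void) {
        pid_t pid = fork();            /* duplicate addr space */
        if (pid < 0) {                 /* fork failed: returns -1 in parent */
            perror("fork");
            exit(1);
        } else if (pid == 0) {         /* child: fork() returned 0 */
            execlp("ls", "ls", "-l", (char *)NULL); /* overwrite with new prog */
            perror("execlp");          /* reached only if exec failed */
            exit(1);
        } else {                       /* parent: fork() returned child's pid */
            int status;
            waitpid(pid, &status, 0);  /* wait; child's proc-table entry released */
            printf("child %d finished\n", (int)pid);
        }
        return 0;
    }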

Process Termination

- UNIX commands:
  exit(): Proc exec last statement, ask OS to delete it, dealloc resources
  abort(): Parent terminates exec of child proc, when:
    Child exceeds usage of resources
    Task assigned to child no longer required
- Zombie: Child proc terminated without parent waiting
- Orphan:
  Child of parent terminated without calling wait()
  UNIX: Assign init proc as new parent

Interprocess Communication (IPC)

- Why IPC:
  Info sharing
  Computation speedup (Subtasks exec in parallel, …)
  Modularity (Sys funcs divided into multiple procs, threads, …)
  Convenience (Users multitask, …)
- Shared Mem (Implicit: no msg exchanged)
  How:
    Map addr to shared-mem region
    Access mem through syscalls: read(), write()
  Buffer: bounded (fixed size), unbounded (no practical limit)
  Issues:
    Multicore: Multi-cache coherency
    Producer-Consumer: Make sure
      Producer not add data into full buffer
      Consumer not remove data from empty buffer

Communication in Client-Server Systems

- Sockets: Comm endpoint, = IP + Port
  e.g. 161.25.19.8:1625
- Pipes: Act as conduit for 2 procs to exchange info, coordinate activities

  Ordinary pipes (UNIX):
    Comm in producer-consumer style
    Require parent-child rel
    Unidirectional: Parent = Producer, Child = Consumer
    Removed after procs finish comm & terminate
    Sys call: pipe(int fd[2])
      Return: 0 = Success, -1 = Failure
      fd[0] for reading, fd[1] for writing
  Named pipes:
    Bidirectional, no parent-child rel
    Can be used by ≥2 procs
    Still exist after procs finish comm

- Msg Passing (Explicit):

  Direct Comm:
    Mechanism:
      Proc must explicitly name recipient/sender it wants to comm with
      Link established automatically
      Only 1 link between 2 procs in comm
    Comm modes supported: 1-to-1
    Operations:
      send(P, msg): Send msg to proc P
      receive(Q, msg): Receive msg from proc Q
    Issues:
      Limited modularity: Proc name changed → OS has to re-match every pair of sender & receiver
      Need to know proc name before comm

  Indirect Comm:
    Mechanism:
      Msg sent/received through mailbox (also called port)
      Procs comm through shared mailbox
      Each mailbox has unique ID
      1 link can be associated with ≥2 procs
    Comm modes supported: 1-to-1, 1-to-Many, Many-to-1, Many-to-Many
    Operations:
      Create/Destroy mailbox
      send(A, msg): Send msg to mailbox A
      receive(A, msg): Receive msg from mailbox A
    Issues:
      Complex implementation
      Latency, Overhead
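A minimal sketch of ordinary-pipe comm in the producer-consumer style described above (POSIX; the message text is arbitrary):

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/wait.h>

    int main(void) {
        int fd[2];
        if (pipe(fd) == -1) { perror("pipe"); return 1; } /* -1 = failure */

        if (fork() == 0) {                 /* child = consumer */
            close(fd[1]);                  /* close unused write end */
            char buf[64];
            ssize_t n = read(fd[0], buf, sizeof(buf) - 1); /* fd[0]: read end */
            buf[n > 0 ? n : 0] = '\0';
            printf("child read: %s\n", buf);
            close(fd[0]);
        } else {                           /* parent = producer */
            close(fd[0]);                  /* close unused read end */
            const char *msg = "hello via pipe";
            write(fd[1], msg, strlen(msg)); /* fd[1]: write end */
            close(fd[1]);
            wait(NULL);
        }
        return 0;
    }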

- Sync (Blocking) vs Async (Non-blocking) comm:

  Sync:
    Mechanism:
      Sender: Blocked until msg received by receiver/mailbox
      Receiver: Blocked until msg avail
    Pros:
      Sender: Can effectively confirm receipt
      No buffer space problem: ≤1 outstanding msg
      Easy to implement
      Low overhead
    Cons:
      Sender: Can't do extra work while waiting for receipt confirmation
      Receiver: Can't do extra work besides waiting for msg

  Async:
    Mechanism:
      Sender: Send msg and resume operation immediately
      Receiver: Retrieve valid msg or null
    Pros:
      Sender & Receiver don't have to wait → Can do extra work
      Concurrency
      Avoid some awkward kernel decisions (e.g. swap waiting proc out to disk?)
    Cons:
      Buffer space problem: Sender & Receiver must buffer msgs until receipt confirmed
      Receiver must test very often whether new msg is waiting → Difficult in interrupt & signal prog

- Buffering: Queue of msgs attached to link (direct/indirect), implemented as:
  Zero cap: Sender must wait for receiver
  Bounded cap: Sender must wait if link full
  Unbounded cap: Sender never waits


Multithreaded Programming
Thursday, September 29, 2016

5:16 PM

Parallelism vs Concurrency

- Parallelism: Sys can perform >1 task simultaneously
  Data parallelism: Distribute different subsets of same data across multiple cores, same operation on each
  Task parallelism: Distribute threads across cores, each thread performs unique operation
- Concurrency: Sys supports >1 task making progress
  Single processor: Round-robin scheduling → Concurrent exec

Amdahl's Law

- Identify performance gains from adding cores to app that has serial & parallel components:
  speedup ≤ 1 / (S + (1 - S)/N)
  S: Proportion of serial components
  N: No of processing cores
  e.g. App: 75% parallel, 2 cores
    S = 1 - 0.75 = 0.25, N = 2
    speedup ≤ 1 / (0.25 + 0.75/2) = 1.6
- Serial portion has disproportionate effect on performance gained by adding cores

Thread Overview

- Def: Portion of code executed indept of main prog
- Benefits:
  Responsiveness: Allow continuous exec if part of proc blocked, especially important for user interfaces
  Resource sharing:
    Threads within proc share resources by default
    Easier than shared mem, msg passing (must be explicitly arranged by programmer)
  Economy:
    Thread creation much cheaper than proc creation
    Thread switching much lower overhead than proc's context switching
  Scalability: Multi-threaded proc can take advantage of multiprocessor architecture
- Thread params:
  Private: Thread Control Block (TCB)
    Exec info: CPU registers, program counter, stack
    Scheduling info: state, priority, CPU time
    Accounting info
    Pointers (for scheduling queues implementation)
  Public: Shared by all threads in proc:
    Mem content (global vars, heap)
    I/O state (file sys, network, …)
- Thread Lifecycle:
  New: Created
  Ready: Waiting to run (for CPU)
  Running: Exec (in CPU)
  Waiting: Waiting for some event to occur
  Terminated: Finished exec

User Threads vs Kernel Threads

- User threads: Managed by user-lvl thread library, no kernel support
  POSIX Pthreads, Win32 Threads, Java Threads
- Kernel threads: Supported by kernel
- Relationship between user & kernel threads:


Many-to-One:
  - Many user threads mapped to single kernel thread
  - One user thread blocking causes all to block
  - Multiple user threads may not run in parallel on multicore sys because only 1 in kernel at a time
  - e.g. Solaris Green Threads, GNU Portable Threads
One-to-One:
  - Each user thread mapped to a kernel thread
  - More concurrency than Many-to-One
  - e.g. Windows NT, Linux
Many-to-Many:
  - Many user threads mapped to many kernel threads
  - Allow OS to create sufficient no of kernel threads
  - e.g. Solaris 8.x and earlier

Threading Issues

1. Semantics of fork(), clone(), exec() in multithreaded environment
   - fork() has 2 versions:
     Duplicate all threads of proc
     Duplicate only thread that invoked fork() sys call
   - clone():
     Similar to fork(), but new proc shares addr space with calling proc (instead of using completely new addr space like fork())
     Means of creating threads in Linux
   - exec(): Replaces the whole proc (as normal)
     → Calling exec() immediately after forking & duplicating all threads is unnecessary
2. UNIX Signals:
   - Used to notify proc that a particular event has occurred
   - Can be received sync/async
   - Follow pattern:
     Signal generated (by event)
     Signal delivered to proc
     Once delivered, signal must be handled
   - Every signal has default handler run by kernel, which can be overridden by user-defined handler
   - Signal handling func:
     void handler_func_name(int sig)
   - Register user-defined handler with kernel:
     signal(sig_value, handler_func_name)
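A minimal sketch following this registration pattern (POSIX; note printf inside a handler is not async-signal-safe and is used only for illustration):

    #include <stdio.h>
    #include <signal.h>
    #include <unistd.h>

    /* User-defined handler: void handler(int sig) */
    static void handler_func(int sig) {
        printf("caught signal %d\n", sig);  /* illustration only */
    }

    int main(void) {
        signal(SIGINT, handler_func);  /* register handler with kernel */
        pause();                       /* suspend until a signal is handled
                                          (Ctrl-C delivers SIGINT) */
        return 0;
    }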

3. Thread Cancellation: Terminating thread before completion
   - Async cancellation: Terminate immediately
   - Deferred cancellation: Allow target thread to periodically check whether it should be cancelled

Thread-Local Storage (TLS)

- Allow each thread to have own copy of data


- Unique to each thread
- Useful when user not have control over thread
creation proc
- Different from local var:
  Local var only visible during single func invocation
  TLS visible across all func invocations
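A minimal sketch using C11's _Thread_local storage class (each thread increments its own copy; the thread ids passed as args are illustrative):

    #include <stdio.h>
    #include <pthread.h>

    /* Each thread gets its own instance of this variable */
    static _Thread_local int tls_counter = 0;

    static void *worker(void *arg) {
        /* Visible across all func invocations in THIS thread,
           but independent of other threads' copies */
        for (int i = 0; i < 3; i++)
            tls_counter++;
        printf("thread %ld: tls_counter = %d\n", (long)arg, tls_counter);
        return NULL;   /* both threads print 3 */
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, worker, (void *)1L);
        pthread_create(&t2, NULL, worker, (void *)2L);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        return 0;
    }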


Two-lvl Model = Many-to-Many + One-to-One
  e.g. HP-UX, IRIX, …

Process Scheduling
Thursday, September 29, 2016

5:16 PM

CPU Scheduling

- Def: Act of selecting next proc for CPU to service once current proc leaves CPU idle
  Have K jobs ready to run
  Have N ≥ 1 CPUs
  → Assign which jobs to which CPU?
- Criteria:
  Max CPU utilization: Keep CPU as busy as possible
  Max Throughput: No of procs completing exec per time unit
  Min Turnaround time: Time to execute a proc (from time of submitting proc to ready queue to time of completion)
  Min Waiting time: Time proc spends waiting in ready queue
  Min Response time: Time from when proc submitted until its first response produced (in time-sharing env)
  NOTE: Waiting Time = Turnaround Time - Total Execution Time
                     = Turnaround Time - Total CPU Burst Time
- Circumstances to manipulate queue:
  1: Proc switches from running → wait state
    I/O request, wait() called
  2: Proc switches from running → ready state
    By interrupts
  3: Proc switches from waiting → ready
    I/O complete
  4: Proc terminates
- Scheduling type:
  Preemptive (2 & 3): Currently running proc may be interrupted & moved to ready state by OS
  Non-preemptive (1 & 4): If proc in running state, it continues to execute until termination / I/O required / Syscall required

Dispatcher

- Module that gives CPU control to proc selected by short-term scheduler
- Steps:
  Switch context
  Switch to user mode
  Jump to proper loc in user prog to restart that prog
- Dispatcher latency: Time taken for dispatcher to stop 1 proc & start another

Next CPU Burst Time Prediction (Using Exponential Avg)

τ(n+1) = α × t(n) + (1 - α) × τ(n)
  τ(n+1): Predicted length of (n+1)th CPU burst
  τ(n): Predicted length of nth CPU burst
  t(n): Actual length of nth CPU burst
  α: Weight of recent history (e.g. α = 0.5, initial guess τ(0) = 10)
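A one-function C sketch of the update (the history values t(0) = 6, τ(0) = 10 are the usual textbook example, assumed here):

    #include <stdio.h>

    /* tau_{n+1} = alpha * t_n + (1 - alpha) * tau_n */
    double next_burst(double alpha, double t_n, double tau_n) {
        return alpha * t_n + (1.0 - alpha) * tau_n;
    }

    int main(void) {
        /* assumed: initial guess tau_0 = 10, actual burst t_0 = 6 */
        printf("tau_1 = %.1f\n", next_burst(0.5, 6.0, 10.0)); /* 8.0 */
        return 0;
    }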

Scheduling Algorithms

First-Come, First-Serve (FCFS): Non-preemptive
  - Dispatch procs according to arrival time
  - Pros: Simple, no starvation
  - Cons:
    Short proc behind long proc → CPU utilization ↓
    CPU-bound proc served before many I/O-bound procs
    Throughput not emphasized

Priority Scheduling: Both preemptive & non-preemptive versions
  - Dispatch proc with highest prio (smallest integer) first
  - Pros: Good response time for high-prio procs
  - Cons: Starvation (Low-prio procs may never be executed)
    Solution: Aging: As time progresses, proc prio ↑

Shortest-Job-First (SJF): Non-preemptive
  - Form of non-preemptive prio scheduling
  - Dispatch proc with shortest CPU burst first
    Same CPU burst → Dispatch proc arriving first
  - Pros:
    Optimal waiting time
    Response time ↑ for short procs
    High throughput
  - Cons: Starvation

Shortest-Remaining-Time-First (SRTF): Preemptive
  - Form of preemptive prio scheduling
  - At each time instance: Calculate remaining CPU burst time of all procs
    If proc with shorter remaining CPU burst time in ready queue: Preempt currently running proc with this proc
  - Pros: Good response time for short procs arriving at a time instance
  - Cons: Starvation

Round-Robin (RR): Preemptive
  - Each proc gets small unit of CPU time (Time quantum)
  - After time quantum elapses / When proc requests I/O, syscall, …: Proc preempted & added to end of ready queue
  - Pros:
    Better response time overall
    No starvation
  - Cons: High turnaround time
  - Time quantum >> Context switch time:
    Quantum ↑ does not necessarily mean Turnaround time ↓
    Quantum too big: Becomes FCFS
  - n procs in ready queue, time quantum q: All procs wait ≤ (n-1)q

Multilevel Queue

- Ready queue partitioned into separate queues
  Foreground (interactive) / Background (batch), …
- Procs permanently assigned to 1 queue when entering sys, based on their props
  Mem size, prio, proc type, …
- Each queue has own scheduling algorithm
- Scheduling must be done among queues:
  Fixed-prio preemptive:
    Serve all from foreground, then background
  Time slice: Each queue gets certain amount of CPU time for its own scheduling
    80% foreground → RR
    20% background → FCFS

Multilevel Feedback Queue

- Similar to Multilvl Queue, but procs can move between queues
- Defined params:
  No of queues
  Scheduling algorithm for each queue
  Methods to determine when to upgrade proc
  Methods to determine when to downgrade proc
  Methods to determine which queue proc will enter when it needs service
- Example (3 queues Q0, Q1, Q2):
  Inter-queue scheduling: Preemptive (Finish all jobs in Q0, then Q1, then Q2)
  Proc downgrade:
    New proc enters Q0; if it not finish in 8ms (Q0 quantum), move to Q1
    If still not complete within 16ms (Q1 quantum), move to Q2
  Proc upgrade: If it not consume all pre-allocated CPU time
    I/O request, syscall, …

Multicore Processor Scheduling (multiple processors on same chip)

- Mem stall: When proc accesses mem, it spends significant amount of time waiting for data to be avail (due to cache misses, …)
- Scheduling takes advantage of mem stall to make progress on another thread
  Dual-thread processor core (2 logical processors)
  Dual-thread, dual-core sys = 4 logical cores

Multiprocessor Scheduling (for homogeneous sys (all CPUs of same type))

- Approaches:
  Asym:
    1 master processor, controlling all activities & running all kernel code
    Others run only user code
  Sym (more common):
    Each processor schedules its own jobs
    Ready queue: Can be common/separate
- Processor affinity: Proc has affinity for processor where it is currently running, especially cache content
  Soft affinity: OS attempts to keep proc running on same processor, but not guarantee it
  Hard affinity: Allow proc to specify subset of processors where it may run

Thread Scheduling

- OS schedules kernel-lvl threads, not procs
- User-lvl thread must be mapped to associated kernel-lvl thread, although this mapping can be done indirectly through lightweight proc (LWP)
  1 User-lvl thread → 1 LWP
  1 LWP → 1 Kernel thread
  LWP appears to be virtual processor on which proc schedules user threads to run

Unbound Threads / Process-Contention Scope (PCS):
  - Thread models applied: Many-to-1, Many-to-many
  - Association of user-lvl threads to LWPs managed by thread library
Bound Threads / Sys-Contention Scope (SCS):
  - Thread models applied: 1-to-1
  - Bound thread permanently attached to 1 LWP

- Non-Uniform Mem Access (NUMA):
  Main mem architecture can affect processor affinity: particular CPUs have faster access to mem on same chip than to mem in other locs
  If proc has affinity for particular CPU → Should preferentially be assigned mem storage with "local" fast access
- Load Balancing: Balance job load between processors
  Achieved naturally by sys using common ready queue
  Maintained through push & pull migration:
    Push migration: A proc runs periodically & moves procs from heavily loaded processors to less loaded ones
    Pull migration: Idle processor takes procs from ready queues of others
    Push & Pull migration:
      Can run in parallel (not mutually exclusive)
      Work against principle of processor affinity

Scheduling Algorithm Evaluation

- Deterministic model: Take particular pre-determined workload, define performance of each algorithm on that workload (e.g. avg waiting time)
- Queueing model:
  Suppose distribution of CPU, I/O bursts known (based on measurement, simple estimation)
  Consider computer sys as network of servers, each with queue of waiting procs
  Based on these, calculate certain performance characteristics of individual waiting queues
  Little's Formula:
    n = λ × W
    n: Queue length
    λ: Avg rate of new jobs arriving
    W: Avg waiting time in queue

- Simulation
- Real implementation


Process Synchronization
Wednesday, October 19, 2016

9:19 AM

Critical Section Problem

- Situation: N cooperating procs P0, P1, …, Pn-1, each has structure:
      do {
          { Entry Section }
          { Critical Section }
          { Exit Section }
          { Remainder Section }
      } while (true)
  Entry section: Request entry into critical section
  Critical section: Access & update shared vars, …
  Exit section: Release shared vars' lock, allow other procs in
- Assumption: Procs exec at non-zero speed
- Problem: Design protocol to ensure when 1 proc in Critical Section, others can't be in
- Solution requirements:
  Mutual Exclusion: Only 1 proc in Critical Section at any time
  Progress: No proc running outside Critical Section can block others from entering Critical Section
  Bounded Waiting: Upon request, no proc should wait arbitrarily long to enter Critical Section

Race Condition

- >1 procs access & manipulate shared data concurrently
- Outcome of exec depends on particular order in which access/exec occurs
- Example: each C++ statement is several hardware instructions:

  Proc  C++ code   Proc instructions
  P0    Counter++  Register1 = Counter
                   Register1 = Register1 + 1
                   Counter = Register1
  P1    Counter--  Register2 = Counter
                   Register2 = Register2 - 1
                   Counter = Register2

  Initially: Counter = 5. P0, P1 exec interleavingly:

  Proc  Hardware Instruction       Value
  P0    Register1 = Counter        Register1 = 5
  P0    Register1 = Register1 + 1  Register1 = 6
  P1    Register2 = Counter        Register2 = 5
  P1    Register2 = Register2 - 1  Register2 = 4
  P0    Counter = Register1        Counter = 6
  P1    Counter = Register2        Counter = 4

Critical Section Algorithmic Solution - Peterson's Algorithm

1. Assumption: LOAD, STORE instructions are atomic (non-interruptible)
2. Implementation:
   Initialization:
       // intendToEnter[i] = 1: Proc i intends to enter Critical Section
       int intendToEnter[] = {0, 0};
       // turn = i: It's proc i's turn to enter Critical Section
       int turn;

   P0:
       while (1) {
           /* Entry section */
           intendToEnter[0] = 1;
           turn = 1;
           while (intendToEnter[1] && turn == 1);

           {CRITICAL SECTION}

           /* Exit section */
           intendToEnter[0] = 0;

           {REMAINDER SECTION}
       }

   P1:
       while (1) {
           /* Entry section */
           intendToEnter[1] = 1;
           turn = 0;
           while (intendToEnter[0] && turn == 0);

           {CRITICAL SECTION}

           /* Exit section */
           intendToEnter[1] = 0;

           {REMAINDER SECTION}
       }

3. Proof of Correctness:
   - Mutual exclusion:
     For both P0 & P1 to enter Critical Section:
       intendToEnter[0] = intendToEnter[1] = 1
       turn = 0 & turn = 1 at same time → Impossible
   - Progress:
     P0 can be prevented from entering Critical Section only if intendToEnter[1] = 1 & turn = 1
     If P1 not in Critical Section:
       P1 has finished Critical Section: Reset intendToEnter[1] = 0 → P0 can enter Critical Section
       P1 is in Entry Section: Set turn = 0 → P0 can enter Critical Section
     → P0 can't be blocked forever
   - Bounded-Waiting:
     If P0 is waiting in Entry Section: intendToEnter[1] = 1 & turn = 1
     After P1 leaves Critical Section: Reset intendToEnter[1] = 0 → P0 can enter Critical Section
     → P0 only needs to wait at most 1 turn to enter Critical Section

Critical Section Hardware Solutions

General approaches:
- Protect critical regions via locks
  Implemented by special atomic hardware instructions
  Test/Set/Swap mem words
- Disable interrupts:
  Make currently running code execute without being preempted by others
  Not work for multiprocessor env

Lock-based Solution

1. Test-and-Set:
   a. Assumption: Func TestAndSet() atomic
   b. Implementation:
       // Initialization:
       // lock = 1: A proc is still in Critical Section
       int lock = 0;
       // waiting[i] = 1: Proc i is waiting to enter Critical Section
       int waiting[MAX_N];
       memset(waiting, 0, sizeof(waiting));

       // Acquire lock
       int TestAndSet(int* target) {
           int retVal = *target;
           *target = 1;
           return retVal;
       }

       // Pi's code
       do {
           /* Entry Section */
           waiting[i] = 1;
           while (waiting[i] && TestAndSet(&lock));
           waiting[i] = 0;

           {CRITICAL SECTION}

           /* Exit Section */
           // Scan waiting[] in cyclic order i+1, i+2, …, n-1, 0, …, i-1
           // Find first proc j with waiting[j] = 1
           int j = (i + 1) % n;
           while (j != i && !waiting[j]) j = (j + 1) % n;
           if (j == i) // No more procs waiting, release lock
               lock = 0;
           else waiting[j] = 0; // Hand entry to Pj

           {REMAINDER SECTION}
       } while (1)

   c. Sketch Proof of Correctness:
      - Mutual exclusion: Pi enters Critical Section only if waiting[i] = 0 or lock = 0
        waiting[i] = 0 only if:
          Pj just left Critical Section & changed waiting[i] from 1 → 0 (other waiting[j] = 1 remain unchanged; lock = 1 kept unchanged)
        lock = 0 only if: Pi is the first to enter Critical Section. After entering Critical Section, lock = 1
        → Only 1 proc in Critical Section at any time
      - Progress: Pi exiting Critical Section will set lock = 0 or waiting[j] = 0
        → Allow another proc to enter Critical Section
      - Bounded-Waiting:
        A proc leaving Critical Section scans waiting[] in cyclic order → Finds the first waiting[j] = 1 (if avail) → Allows Pj to enter Critical Section
        → Waiting proc only needs to wait at most (n - 1) turns to enter Critical Section

Spinlock

- Proc wastes CPU cycles due to busy waiting (using loop to repeatedly check condition) for entry to Critical Section
- Advantage: No context switch needed for waiting phase
  → Suitable for short waiting times
- Often used in multiprocessor sys:
  1 thread "spins" on 1 processor
  1 thread performs Critical Section on another processor

2. Mutex lock
   a. Assumption: Funcs Acquire(), Release() atomic
   b. Implementation:
       // Initialization
       bool available = true;

       // Support funcs
       Acquire() {
           // Busy Waiting
           while (!available);
           available = false;
       }
       Release() {
           available = true;
       }

       // Proc code
       do {
           Acquire();
           {CRITICAL SECTION}
           Release();
           {REMAINDER SECTION}
       } while (true)

Semaphore

- General sync tool
- Components:
  Shared integer var S
  2 atomic operations:
      void wait(int& S) {
          while (S <= 0);
          S--;
      }
      void signal(int& S) {
          S++;
      }
- Type:
  Counting: No restriction on S's domain
  Binary: S ∈ {0, 1}
- Non-busy-waiting implementation:
      struct Semaphore {
          int value;
          ProcessWaitingQueue Q;
      };
      void wait(Semaphore& S) {
          S.value--;
          if (S.value < 0) {
              /* Push proc invoking wait() to waiting queue */
              S.Q.push(GetInvokingProc());
              /* Syscall: Suspend invoking proc */
              block();
          }
      }
      void signal(Semaphore& S) {
          S.value++;
          if (S.value <= 0) {
              Process P = S.Q.pop();
              /* Syscall: Resume exec of previously blocked proc */
              wakeup(P);
          }
      }
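For comparison with the pseudocode above, a minimal sketch using a real API, POSIX pthread mutexes (no busy waiting: the kernel blocks waiters):

    #include <pthread.h>
    #include <stdio.h>

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static int counter = 0;              /* shared var */

    static void *worker(void *arg) {
        for (int i = 0; i < 100000; i++) {
            pthread_mutex_lock(&lock);   /* Entry section */
            counter++;                   /* Critical section */
            pthread_mutex_unlock(&lock); /* Exit section */
        }
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, worker, NULL);
        pthread_create(&t2, NULL, worker, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("counter = %d\n", counter); /* always 200000 */
        return 0;
    }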

Deadlock & Starvation

- Deadlock: ≥2 waiting procs wait indefinitely for event that can be caused by only 1 of them
  e.g. S, Q: Semaphores, initialized to 1

        P0               P1
   (1)  wait(S);         wait(Q);
   (2)  wait(Q);         wait(S);
        …                …
   (3)  signal(S);       signal(Q);
   (4)  signal(Q);       signal(S);

  After (2): P0 blocked in Q's waiting queue, P1 blocked in S's waiting queue
  To resume P0: P1 must release Q's lock (by calling signal(Q))
    But P1 stuck at (2) because of waiting on S → P1 can't reach (3)
  Similar for P1 → Deadlock

Priority Inversion

- Lower-prio proc holds lock needed by higher-prio proc
- Issue: Low-prio proc preempted by another proc with higher prio (but not needing lock) → Lock-requiring high-prio proc must wait longer
  e.g. 3 procs: L, M, H (prio: L < M < H)
    H needs resource R, currently locked by L
    M not need R
    H waits for L to finish using R
    During that time: M arrives, preempts L
    → H has to wait longer to get access to R
- Solution: Prio Inheritance Protocol
  Procs accessing resources needed by higher-prio proc inherit the higher prio until they finish using resource
    → Prevent being preempted by higher-prio procs not needing resource
  After using resource, prio reverted to original
- Starvation (Indefinite blocking): Proc never removed from semaphore queue where it is suspended
  e.g. Semaphore queue uses LIFO (last-in first-out) order

Synchronization Problems
Sunday, October 23, 2016

4:20 PM

Bounded-Buffer

- Statement: Sync producer & consumer on buffer of size n
- Solution:

  Semaphore       Init  Usage
  mutex           1     Ensure buffer access mutually excluded
  haveData        0     Lock access until buffer has data
  haveEmptySlot   n     Lock access until buffer has empty slot

  Producer:
      do {
          {Produce item}
          wait(haveEmptySlot);
          wait(mutex);
          {Add item to buffer}
          signal(mutex);
          signal(haveData);
      } while (true)

  Consumer:
      do {
          wait(haveData);
          wait(mutex);
          {Remove item from buffer}
          signal(mutex);
          signal(haveEmptySlot);
          {Consume item}
      } while (true)

Readers-Writers

- Statement:
  Data set shared among ≥1 writer, multiple readers
  Readers: Only read data set
  Writers: Read + Write data set
- Solution:
  Initialization:
      // Ensure writer/reader access mutually excluded
      // No separate access control among readers
      Semaphore rw_mutex = 1;
      // Lock readers' access to shared var read_count
      Semaphore mutex = 1;
      // Number of readers currently accessing data
      int read_count = 0;

  Writer:
      do {
          wait(rw_mutex);
          {Perform writing}
          signal(rw_mutex);
      } while (true)

  Reader:
      do {
          wait(mutex);
          read_count++;
          /* Because multiple readers can work simultaneously,
             only the FIRST incoming one needs to acquire rw_mutex lock */
          if (read_count == 1)
              wait(rw_mutex);
          signal(mutex);

          {Perform reading}

          wait(mutex);
          read_count--;
          /* LAST reader that finishes accessing releases rw_mutex lock */
          if (read_count == 0)
              signal(rw_mutex);
          signal(mutex);
      } while (true)
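The same bounded-buffer scheme with real POSIX semaphores (buffer size, item counts, and the int buffer are assumptions of this sketch):

    #include <stdio.h>
    #include <pthread.h>
    #include <semaphore.h>

    #define N 8                           /* buffer size */

    static int buffer[N], in = 0, out = 0;
    static sem_t mutex, haveData, haveEmptySlot;

    static void *producer(void *arg) {
        for (int item = 1; item <= 20; item++) {
            sem_wait(&haveEmptySlot);     /* block if buffer full  */
            sem_wait(&mutex);
            buffer[in] = item; in = (in + 1) % N;        /* add item */
            sem_post(&mutex);
            sem_post(&haveData);
        }
        return NULL;
    }

    static void *consumer(void *arg) {
        for (int i = 0; i < 20; i++) {
            sem_wait(&haveData);          /* block if buffer empty */
            sem_wait(&mutex);
            int item = buffer[out]; out = (out + 1) % N; /* remove item */
            sem_post(&mutex);
            sem_post(&haveEmptySlot);
            printf("consumed %d\n", item);
        }
        return NULL;
    }

    int main(void) {
        sem_init(&mutex, 0, 1);           /* init = 1 */
        sem_init(&haveData, 0, 0);        /* init = 0 */
        sem_init(&haveEmptySlot, 0, N);   /* init = n */
        pthread_t p, c;
        pthread_create(&p, NULL, producer, NULL);
        pthread_create(&c, NULL, consumer, NULL);
        pthread_join(p, NULL);
        pthread_join(c, NULL);
        return 0;
    }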


Deadlocks
Wednesday, November 9, 2016

10:20 AM

System Model for Deadlocks

- Resource types: R1, R2, …, Rm
- Each Ri has Wi instances
- Each proc utilizes resources as follows:
  1: Request resource
  2: Use resource
  3: Release resource
- Resource Allocation Graph:
  Vertices:
    Procs: P = {P1, P2, …, Pn}
    Resource types: R = {R1, R2, …, Rm}
  Edges:
    Request edge: Pi → Rj (Pi requests 1 instance of Rj)
    Assignment edge: Rj → Pi (1 instance of Rj assigned to Pi)
- Observation:
  No cycle → No deadlock
  Cycle:
    1 instance per resource → Deadlock
    >1 instance per resource → Possibly deadlock
      e.g. Cycle exists, but P4 can run & terminate normally, then release 1 instance of R2;
      P3 then takes that R2 instance, breaking the cycle → No deadlock

Deadlock

- Set of blocked procs, each holding resources & waiting to acquire resource held by other procs
- Happens only if 4 conditions hold simultaneously (necessary, but NOT sufficient):
  Mutual exclusion: Only 1 proc uses 1 resource instance at a time
  Hold & Wait: Proc holding ≥1 resource waiting to acquire resources held by others
  No preemption: Resource can only be voluntarily released by proc holding it
  Circular wait:
    Waiting proc set {P0, P1, …, Pn}:
      P0 waiting for resource held by P1
      P1 waiting for resource held by P2
      …
      Pn waiting for resource held by P0

Deadlock Prevention

Ensure ≥1 necessary condition can't hold by restraining ways requests can be made:
- Mutual exclusion:
  Can be hard to eliminate because some resources are inherently non-sharable
- Hold & Wait:
  Whenever proc requests resource, it doesn't hold any others:
    Proc can only request & get allocated all resources before beginning exec, OR
    Proc can only request resources when it has none
  → Low resource utilization, starvation possible
- No preemption:
  When proc (currently holding resources) requests another resource not immediately allocable:
    All held resources released
    Preempted resources given to waiting procs
    Proc holding released resources only restarted when it can regain its old resources
  Can only be applied to resources whose state can easily be saved & restored
    Registers, mem space: yes
    Locks, semaphores: no
- Circular wait:
  Impose total ordering of all resource types
  Require procs to request resources in increasing order of enumeration
    R1 < R2 < … < Rm

Deadlock Recovery


- Terminate procs:
  Abort all deadlocked procs:
    Not desirable
  Abort procs 1 by 1, until deadlock cycle eliminated:
    Considerable overhead (must run detection algorithm every time 1 proc aborted)
- Preempt resources:
  Successively preempt resources from procs & give them to other procs until deadlock cycle broken
  Possibly starvation (same proc always picked as victim)

Deadlock Avoidance
Friday, December 16, 2016

11:58 AM

Deadlock Avoidance

- Require in advance additional info about which resources will be requested by procs
  → OS can decide for each request whether proc should wait → Avoid unsafe state
- Approach:
  Proc declares max no of resources of each type it may need
  Deadlock avoidance algorithm dynamically examines resource-allocation state → Ensure circular wait never exists

Resource-Allocation Graph Scheme

- Claim edge Pi → Rj (dashed line): Pi MAY request Rj
  e.g. P1 → R2, P2 → R2
- Request edge Pi → Rj: Converted from claim edge, when Pi DEFINITELY requests Rj
  e.g. P2 → R1
- Assignment edge Rj → Pi: Converted from request edge, when Rj allocated to Pi
  e.g. R1 → P1
- When Pi releases Rj, assignment edge reconverted to claim edge
- Resource-Allocation Graph Algorithm:
  Request Pi → Rj can be granted only if converting request edge to assignment edge doesn't form cycle

Safe State

- Sys in safe state if ∃ sequence P1, P2, …, Pn:
  Resources needed by Pi can be satisfied by currently avail resources + resources held by Pj (j < i)
- Implication:
  If Pi's resource needs not immediately available: Wait until all/some Pj (j < i) finish
- Sys in safe state: No deadlock
- Sys in unsafe state: Possibly deadlock
  Why: Not every proc would actually require its MAX pieces of resource to finish
Banker's Algorithm

- Safe-state example: 10 instances of 1 resource type:

  Proc  Max need  Current allocation
  P1    6         4
  P2    3         2
  P3    9         3

  Exec sequence to safe state: P2, P1, P3
    Free instances = 10 - 4 - 2 - 3 = 1 → P2 gets 1 more, runs, releases
    Free instances = 1 - 1 + 3 = 3 → P1 gets 2 more, runs, releases
    Free instances = 3 - 2 + 6 = 7 → P3 gets 6 more, runs, releases

- Unsafe-state example: P1: max need 10, alloc 4; P2: alloc 5
    Only 10 - 4 - 5 = 1 piece of resource left
    Deadlock happens if ≥1 proc requests its max amount of resource
    In reality:
      P1 can finish job with current allocation, then release resources
      P2 will then have enough resources to finish
      → No deadlock

- Assumptions:
  Resources have multiple instances
  Proc must declare a priori its max use
  When proc requests resources, it may have to wait (First check whether allocation results in safe state)
  When proc gets all resources, it must return them in finite amount of time

- Data structures:
  n = No of procs
  m = No of resource types
  available[1…m]:
    available[j] = k: Rj has k avail instances
  Max[1…n, 1…m]:
    Max[i, j] = k: Pi may request at most k instances of Rj
  Allocation[1…n, 1…m]:
    Allocation[i, j] = k: Pi currently allocated k instances of Rj
  Need[1…n, 1…m]:
    Need[i, j] = Max[i, j] - Allocation[i, j]

- Safety Algorithm: Find sequence of exec procs → safe state
  Initialization:
    Work ← Available
    Finish[i] = false, i = 1, 2, …, n
  Loop until Finish[i] = True ∀i:
    Find i:
      Finish[i] = false
      Need[i] ≤ Work
    If no such i: Sys not in safe state, terminate
    Add Pi to sequence
    Work ← Work + Allocation[i]
    Finish[i] ← True
- Resource-Request Algorithm:
  Given Pi requesting resources:
    Request[i, 1…m]: Additional instances Pi wants (on top of Allocation[i])
  1: If Request[i] ≤ Need[i]: Go to step 2
     Otherwise, raise error (since Pi exceeds its max claim)
  2: If Request[i] ≤ Available: Go to step 3
     Otherwise, Pi must wait (since resources currently not avail)
  3: Pretend to allocate resources to Pi:
     Avail ← Avail - Request[i]
     Allocation[i] ← Allocation[i] + Request[i]
     Need[i] ← Need[i] - Request[i]
  4: Run safety algorithm to determine safe state:
     If safe: Additional resources can be allocated to Pi
     If unsafe:
       Pi must wait
       Restore Avail, Allocation[i], Need[i]

Example of Banker's Algorithm

- Initially in safe state with sequence: P0, P3, P2, P1, P4
- Request from P4 arrives: Request[4] = (0, 0, 2, 0)
  Request[4] ≤ Need[4] ((0, 0, 2, 0) ≤ (2, 2, 3, 3)) ✓
  Request[4] ≤ Available ((0, 0, 2, 0) ≤ (3, 3, 2, 1)) ✓
  Pretend to allocate resources to P4:
    Avail ← (3, 3, 2, 1) - (0, 0, 2, 0) = (3, 3, 0, 1)
    Allocation[4] ← (1, 4, 3, 2) + (0, 0, 2, 0) = (1, 4, 5, 3)
    Need[4] ← (2, 2, 3, 3) - (0, 0, 2, 0) = (2, 2, 1, 3)
  Available not enough to execute any Pi (as resource C = 0)
  → P4's request (0, 0, 2, 0) can't be granted immediately

Deadlock Detection
Friday, December 16, 2016

12:25 PM

Single Instance of Each Resource Type

- Wait-for graph:
  Vertices: Procs
  Edge Pi → Pj: Pi waiting for Pj
- Periodically search for cycle in graph
  Cycle → Deadlock
- Complexity: O(n²)

Multiple Instances of Each Resource Type

- Data structures:
  Available[1…m]:
    Available[i] = k: Ri still has k avail instances
  Allocation[1…n, 1…m]:
    Allocation[i, j] = k: Pi currently allocated k instances of Rj
  Request[1…n, 1…m]:
    Request[i, j] = k: Pi requests k more instances of Rj
- Detection algorithm:
  1: Initialize
     Work ← Available
     Finish[i] = False, if Allocation[i] ≠ 0
               = True, otherwise (i = 1, 2, …, n)
  2: Find i:
     Finish[i] = False
     Request[i] ≤ Work
     If no such i: Go to step 4
  3: Work ← Work + Allocation[i]
     Finish[i] = True
     Go to step 2
  4: If ∃i: Finish[i] = False: Pi deadlocked, system in deadlock state


Memory Management
Thursday, December 15, 2016

9:11 PM

Overview

- Prog must be brought into mem & placed within proc to run
  Done by long-term scheduler
- Memory usually divided into 2 partitions:
  Low mem: Resident OS, interrupt vector
  High mem: User procs
- Physical addr: Addr seen by memory unit
- Logical (Virtual) addr:
  Generated by CPU
  User progs deal with logical addrs, never see real physical ones
- Symbolic addr: Addr represented by symbols in source code
  x, y, count
- Relative (Relocatable) addr: Addr represented as offset from a known location

Address Protection & Dynamic Relocation

- Addr protection:
  Addr space of proc/user defined by base & limit registers
  CPU checks every mem access generated in user mode, ensures it falls within base & limit for that proc/user
  Implementation:
    Relocation (base) register: Contains value of smallest physical addr
    Limit register: Contains range of logical addrs
- Dynamic loading:
  Routines kept on disk in relocatable format & not loaded to mem until called
  → Utilize mem space:
    Unused routine never loaded
    Large code portions handling infrequently occurring cases rarely loaded
  Implementation:
    Prog designed to support dynamic loading
    OS provides library implementing dynamic loading

Multistep Address Binding of User Program

- Compile time:
  Compiler translates symbolic → relative addr
  Same logical & physical addr
- Loading time:
  Loader translates relative → absolute addr
  Same logical & physical addr
- Exec time:
  Memory Management Unit translates logical → physical addr
  Different logical & physical addr

Contiguous Allocation

- Hole: Block of avail mem
- When proc arrives: Allocated a hole large enough to accommodate it
- When proc exits: Free its partition; adjacent free partitions combined
- Allocation strategies: see below

Swapping

- Temporarily swap proc from mem to backing store, then bring back into mem to continue exec
  Support: Physical mem space < Logical mem space
  → Degree of multiprogramming ↑
- Ready-queue: Manage ready-to-run procs having mem images on disk


First fit: Allocate first hole that is big enough
Best fit: Allocate smallest hole that is big enough
  Produces smallest leftover hole
  Must search through hole list
Worst fit: Allocate largest hole
  Produces largest leftover hole
  Must search through hole list

Fragmentation

- External fragmentation: Mem space exists to satisfy request, but not contiguous
- Internal fragmentation:
  Allocated mem slightly larger than requested mem
  Remaining mem is inside allocated partition → can't be used by other procs
- Solutions (to external fragmentation):
  Compaction:
    Shuffle mem contents to place all free mem together in 1 large block
    Possible only if addr relocation is dynamic, done at exec time
    Can be expensive
  Segmentation

Segmentation

- Program is a collection of segments
- Logical addr format: (Seg no, Offset)
- Segment table:
  Entry: (Base, Limit, Validation bit):
    Base: Starting physical addr where segment resides in mem
    Limit: Length of segment
    Validation bit: Illegal? Read/Write privileges
  Referenced by 2 hardware registers:
    Segment-table base register (STBR): Table's location in mem
    Segment-table length register (STLR): No of segments used by prog


Paging
Thursday, December 15, 2016

10:24 PM

Overview

- Divide physical mem into fixed-size blocks → Frames
  Frame size = 2^n (512 B – 16 MB)
- Divide logical mem into fixed-size blocks → Pages
  Usually: Page size = Frame size
- Program of size N pages needs N free frames
  e.g. Page size = 2048 B, Process size = 72766 B
    72766 / 2048 = 35, remainder 1086
    → Need 36 pages
    Internal fragmentation = 2048 - 1086 = 962 B

Page Table

- Registers:
  Page-table base register (PTBR): Page table location in mem
  Page-table length register (PTLR): Indicates page table's no of entries
- Every data/instruction access requires 2 mem accesses:
  Page table → Data/Instruction
- Translation look-aside buffer (TLB):
  Portion of page table, stored in cache → Efficiency
  Small (64 – 1024 entries)
  Some entries wired down for permanent access
    Entries for key kernel code
- Effective Access Time (EAT): Avg time needed to access a physical mem addr
  EAT = α × (TLB Search Time + Mem Access Time)
      + (1 - α) × (TLB Search Time + 2 × Mem Access Time)
  α: Hit ratio: Prob of finding page no in TLB
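A worked example with assumed numbers (TLB search 20 ns, mem access 100 ns, α = 0.8):
  EAT = 0.8 × (20 + 100) + 0.2 × (20 + 2 × 100) = 96 + 44 = 140 ns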

- Logical addr format: (Page no, Offset)
  Page no (p): Index into page table
  Page size = 2^n
  Logical space size = 2^m
  → No of page-table entries = 2^(m-n)
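A minimal sketch of extracting (p, d) with bit ops, reusing the 2048-B page example (n = 11 is an assumption matching that page size):

    #include <stdio.h>

    int main(void) {
        unsigned long addr = 72766;                /* logical addr from example    */
        unsigned n = 11;                           /* page size = 2^11 = 2048 B    */
        unsigned long p = addr >> n;               /* page no = addr / 2048 = 35   */
        unsigned long d = addr & ((1UL << n) - 1); /* offset = addr % 2048 = 1086  */
        printf("p = %lu, d = %lu\n", p, d);
        return 0;
    }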

Hierarchical Paging

- Break up logical addr space into multiple-lvl page tables
- Logical addr format: (p1, p2, d)
- Motivation: Don't always need all logical addr space → Don't need to create all intermediate-lvl page tables at same time

Memory Protection

- Protection bits:
  Associated with each frame
  Indicate read/write access
- Valid-invalid bit:
  Associated with each page table entry
  Meaning:
    Valid: Page in proc's logical addr space → Legal page
    Invalid: Page not in proc's logical addr space
- Violation → Trap to kernel
Real-life Paging Architectures

- Intel IA-32: 2-lvl paging; supports 4 KB & 4 MB pages
- Intel x86-64:
  Only implements 48-bit addressing
  4-lvl paging hierarchy
- ARM:
  1-lvl paging for sections
  2-lvl paging for smaller pages
  2 lvls of TLBs:
    Outer lvl: 2 micro TLBs (1 data + 1 instruction)
    Inner: 1 main TLB

Page Size

- Page Size ↑: Page Faults ↓, Internal fragmentation ↑
- Multiple page sizes: Allow apps to use page size suitable to them


Virtual Memory Management


Thursday, December 15, 2016

11:13 PM

Virtual Memory Overview

- Motivation: Separation of logical mem from physical mem
  Support logical addr space > Physical addr space
  → Prog size not constrained by physical mem
  → Allow more progs to run concurrently
- Proc's virtual addr space:
  Heap: Grows upward, used for dynamic mem allocation
  Stack: Grows downward, used for successive function calls
  Blank space between heap & stack:
    Part of virtual addr space
    Only requires actual physical space if heap/stack grow
    → Physical mem consumption ↓
- Shared library: Each proc considers shared libraries part of its virtual addr space

Demand Paging

- Lazy swapper principle: Page only brought from disk to mem when needed
  → More procs can run concurrently
  → No unnecessary I/O

- Valid-invalid bit: Indicates whether page in mem
  v (Valid): In mem
  i (Invalid): Not in mem
- Pager: Responsible for swapping pages in/out of mem
- Handling page fault (page not currently in mem):
  OS looks at another table (usually in PCB) to decide whether reference is valid/invalid:
    Invalid: Abort
    Valid & not in mem: Continue
  Get empty frame
  Swap page into frame via scheduled disk operation
  Reset table to indicate page now in mem
    Set validation bit = v
  Restart instruction causing page fault
- Page replacement:
  Occurs when no free frame avail
  Swap out page in mem not really in use
  Modify (dirty) bit: Indicates whether page modified → Avoid unnecessary page transfers
- Demand paging performance:
  p = Page fault rate
  Effective Access Time (EAT) =
      (1 - p) × Memory Access Time
    + p × Avg Page-Fault Service Time
  Avg Page-Fault Service Time =
      Page Fault Overhead
    + Swap Page Out + Swap Page In
    + Restart Overhead

Prepaging

- Prepage all/some pages before proc actually references them
  → Fewer page faults at proc startup
- However: Prepaging unused pages → I/O + mem waste
  s pages prepaged, fraction α of these pages used (0 < α < 1)
  Cost of saved page faults: s × α
  Cost of prepaging unnecessary pages: s × (1 - α)
  α → 0: Prepaging loses

Page Replacement
Thursday, December 15, 2016

12:01 PM

FIFO (First-in-first-out):
  - Replace oldest page
  - FIFO queue: Can be used to track page ages
  - Belady's Anomaly: Available frames ↑ but Page faults ↑

OPT (Optimal):
  - Replace page that won't be used for longest period of time
  - No Belady's anomaly
  - Requires future knowledge → Impossible to implement

LRU (Least-Recently Used):
  - Replace page that has not been used for the longest amount of time
  - Implementations:
    Counter:
      Each page entry has counter
      Page referenced: Counter ← Clock time
      Replacement needed: Find page with min counter value
      Expensive search, inexpensive update
    Stack:
      Keep doubly-linked list of page numbers
      Page referenced: Move to beginning of list
      Expensive update, no search needed
  - No Belady's anomaly

Second chance (LRU Approximation):
  - Each page has reference bit:
    Initially 0
    When referenced: Set to 1
  - When replacement needed: Iterate through all pages:
    If reference bit = 0: Replace
    If reference bit = 1: Set to 0, continue
  - Implementation: FIFO queue + Hardware-provided reference bit

Counting:
  - Keep counter of no of references made to each page:
    LFU (Least Frequently Used): Replace page with min count
    MFU (Most Frequently Used): Replace page with max count
  - Not commonly used

Frame Allocation
Thursday, December 15, 2016

3:10 PM

Thrashing

- Proc not having enough pages
  → Having to frequently swap pages in/out
  → High page-fault rate
Frame Allocation Overview

- Purpose: Allocate frame to proc when needed

- Approach:
Equal allocation: Each proc get equal no of frame
100 frame, 5 proc
Each proc get 20

Proportional allocation: Allocate proportionately to proc size


si = Size of proc i

- Reason:
Proc migrates between localities

S = si
m = Total no of frames

Size of locality > Total mem size

Allocation for pi:


Priority allocation:
Allocate proportionately based on proc priority

- Can limit effects by local/priority page replacement: Prevent


proc from stealing others' frame, causing thrashing to others

If proc P get page fault:


Replace 1 of its own frames, OR
Replace frame taken from proc with lower priority
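A worked example of proportional allocation with assumed numbers (m = 62 frames; two procs with s1 = 10 and s2 = 127 pages):
  a1 = 10/137 × 62 ≈ 4 frames
  a2 = 127/137 × 62 ≈ 57 frames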

Working-Set Model

- Global vs Local replacement:
  Global:
    Proc selects replacement frame from set of all frames
    May provide better sys throughput
    Exec time can vary greatly: Proc can't control own page-fault rate
  Local:
    Each proc selects replacement frame from own frame set
    Per-proc performance more consistent
    Can underutilize mem: Pages preallocated to proc may go unused
- m: Max no of frames provided by physical mem
- Δ = Working-set window
  e.g. 10000 instructions
- WSS_i (Working set of proc i) = No of pages referenced in most recent Δ
  Δ too small: Not encompass entire locality
  Δ too large: Encompass several localities
  Δ = ∞: Encompass entire prog
- D = Σ WSS_i: Total demand frames (Approx of current locality in whole sys)
- D > m: Thrashing (≥1 proc lacks mem)
- Working-set strategy: If D > m: Suspend/Swap out 1 proc
  → Prevent thrashing
  → Maintain high degree of multiprogramming
- Implementation: Approx working set with timer + reference bits
  e.g. Δ = 10000:
    Timer interrupt after every 5000 units
    2 ref bits for each page: x1, x2
    Page referenced: Set x1 = 1
    Whenever timer interrupts: Copy all x1's to x2's, set all x1's to 0
    If x1 = 1 or x2 = 1: Page in working set
  More bits used → Accuracy ↑, Cost ↑
- TLB Reach = TLB Size × Page Size:
  Ideally: Working set of each proc stored in TLB
  Otherwise: High page-fault rate

Page-Fault Frequency

- Use local replacement policy
- Establish "acceptable" page-fault rate:
  Current rate too low: Proc loses frame
  Current rate too high: Proc gains frame
→ Prevent thrashing


File System
Friday, December 16, 2016

12:31 AM

File Access Methods

- Sequential access:
  Read next
  Write next
  Reset
  No read after last write (rewrite)
- Direct access (Consider file as set of fixed-length records):
  Read n
  Write n
  Position to n
  Read next
  Write next
  Rewrite n
  (n = Relative block number)
- Others:
  Built on top of direct access
  File indexing: Fast determination of data location

File Overview

- Attributes:
  Name, Type, Size
  Time
  User identification
  Identifier: Unique tag (number) to identify file within file sys
- Structure:
  None: Just byte/word sequence
  Simple record:
    Lines, fixed/variable-length records
  Complex:
    Formatted documents, relocatable exec files
- Operations:
  Create/Read/Write/Delete/Truncate
  Seek (Reposition within file)
  Open(F):
    Search directory structure on disk for entry F
    Move F's content to mem
    Prepare F for subsequent access
  Close(F): Move F's content from mem to directory structure on disk

- OS's open file management:
  Open file table (system-wide + per-proc): Track open files
  File pointer (per-proc): Pointer to last read/write location
  File open count:
    Count no of procs currently using file
    Allow removal of entry from open-file table when last proc closes file
  Disk location of file
  Access rights (per proc)
- Open file locking:
  Shared reader lock, exclusive writer lock
  Mandatory lock: OS denies access depending on locks held & requested
    Windows
  Advisory lock: Proc can find lock status & decide what to do (programmer decides)
    UNIX

File Sharing

- Multi-user system: Per-user, per-group permissions
- Remote file systems:
  Use networking to access files
    Manually: FTP
    Automatically: Distributed file sys
  Distributed Information Systems: Implement unified access to information needed for remote computing
    Active Directory, NIS, LDAP, …

Access List and Group on UNIX

- Access mode: Read/Write/Execute
- User classes: Owner, Group, Public (others)
- Ask manager to create group, then add users to group
- Attach group to file:
  chgrp G game
  (G: group name, game: filename)
- Define access for file, subdirectory

Disk structure

- Disk divided into partitions; each partition can be protected against
  failure using RAID
- Volume:
  Entity containing file sys
  Tracks file sys info in: Device directory / Volume's table of contents


File System Mounting

- File sys must be mounted before it can be accessed

Directory Structure

- Collection of nodes containing file info
- Resides on disk
- Operations supported:
  Search/Create/Delete/Rename file
  List directory
  Traverse file sys

Structure types (illustrations omitted):

  Single-lvl:
    Single directory for all users
    + Very simple
    - Naming problem: Different users can't create files with same name
    - Grouping problem: No subdirectory

  Two-lvl:
    Separate directory for each user
    + Can have same filename under different users
    + More efficient search than single-lvl directory
    - Limited grouping capability

  Tree-Structured:
    + Efficient search
    + Grouping capability

  Acyclic-Graph:
    + Can share subdirectories & files
      Link: Another name (pointer) to existing file
      Resolve link: Follow pointer to file
    - Deletion might lead to dangling pointer (pointing to empty/wrong file)
    - Need to ensure no cycle:
      Allow links to files, not subdirectories, OR
      Link added: Use cycle-detection algorithm to detect cycle

  General Graph:
    Allows cycles


File System Implementation
Friday, December 16, 2016
8:38 PM

Role of Disk

- Disk provides most of the secondary storage where file sys is maintained:
  Disk can be rewritten in place (each single block supports read/write)
  Disk can directly access any block
  → Simple to access files sequentially or randomly
- I/O (mem ↔ disk) transfers performed in blocks → I/O efficiency

File System Organization

- In-mem structures:
  Mount table: Info about mounted volumes
  Directory-structure cache: Info about recently accessed directories
  Sys-wide open-file table: FCB copies of open files
  Per-proc open-file table: Pointers to entries in sys-wide open-file table
  Buffers: Hold file blocks being read from / written to disk
- On-disk structures:

  Boot control block (per volume):
    Info needed by sys to boot OS from volume
    Usually first block on volume (if OS present)
    UFS: Boot block | NTFS: Partition boot sector

  Volume control block (per volume):
    Total no of blocks, No of free blocks, Block size,
    Free-block pointers, Free-FCB count & pointers
    UFS: Superblock | NTFS: Stored in master file table

  Directory structure (per file sys):
    File name, Associated inode (FCB) no
    UFS: inode no | NTFS: Stored in master file table

  File control block (per file):
    File details: Permission, Size, Date
    UFS: inode | NTFS: Stored in master file table using relational
    database (1 row/file)

Layers (high → low):

  App prog

  Logical file sys:
    - Metadata: All file sys structure except file content
    - File Control Block (FCB): Info about file (ownership, permission,
      file content location on disk)
    - Manage directory structure
    - Provide info needed by file organization module
    - Translate file name → file identifier/handle/location through FCB

  File organization module:
    - Translate logical → physical block addr
    - Pass physical block addr to basic file sys for data transfer
    - Manage free disk space, disk block allocation

  Basic file sys:
    - Issue commands to device driver → Read/Write physical block on disk
      (e.g. "Retrieve block 123")
    - Manage mem buffers:
      Hold data during mem ↔ disk transit
      Buffer mem allocated before actual disk block transfer occurs
    - Cache frequently used file sys metadata & file blocks

  I/O control:
    - Manage I/O devices through device drivers, interrupt handlers
    - Transfer data between mem ↔ disk
      (e.g. "Read drive 1, cylinder 72, track 2, sector 10")

  Device

File Operation Implementation

- Open file:
  Pass filename to logical file sys
  If file already in use (by another proc): Per-proc open-file table entry
  created, pointing to existing sys-wide open-file table entry
  Otherwise: Search directory structure:
    File found: Copy FCB into sys-wide open-file table in mem
  Open call returns pointer to entry in per-proc open-file table:
    All file operations performed via this pointer
    UNIX: File descriptor | Windows: File handle
- Read file:
  Data read & copied to specified user proc mem addr
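The relationships above can be pictured in C; these struct layouts are
illustrative assumptions, not any real OS's definitions:

  #include <stddef.h>

  typedef struct {             /* simplified FCB (e.g. an inode) */
      int    owner, perm;
      size_t size;
      /* ... disk locations of file content ... */
  } fcb_t;

  typedef struct {             /* sys-wide open-file table entry */
      fcb_t  fcb;              /* FCB copied into mem at first open */
      int    open_count;       /* no of procs currently using the file */
  } sys_open_t;

  typedef struct {             /* per-proc open-file table entry */
      sys_open_t *sys_entry;   /* points into the sys-wide table */
      size_t      file_pos;    /* per-proc file pointer (last r/w pos) */
      int         access_mode; /* per-proc access rights */
  } proc_open_t;

  /* open() returns an index into the per-proc table: the UNIX "file
   * descriptor" mentioned above. close() decrements open_count and
   * frees the sys-wide entry when it reaches 0. */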

Directory Management

- Structure management:

  Linear list:
    List of filenames, with pointers to data blocks
    + Simple to implement
    - Search file in linear time → Slow

  Hash table:
    Linear list stores directory entries; hash table maps
    Key = Filename → Value = Pointer to entry/data blocks
    + ↓ Time to search file
    - Collision: 2 different filenames hash to same location
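A minimal C sketch of such a hash-table directory, using separate
chaining for collisions (all names and sizes are illustrative):

  #include <string.h>

  #define NBUCKETS 128

  typedef struct dirent_s {
      char             name[32];
      long             first_block;   /* pointer to the file's data blocks */
      struct dirent_s *next;          /* chain for colliding filenames */
  } dirent_t;

  static dirent_t *buckets[NBUCKETS];

  static unsigned hash(const char *s) {        /* simple string hash */
      unsigned h = 5381;
      while (*s) h = h * 33 + (unsigned char)*s++;
      return h % NBUCKETS;
  }

  /* Lookup is O(1) on average vs O(n) for a plain linear list; the
   * chain handles two filenames hashing to the same bucket. */
  dirent_t *dir_lookup(const char *name) {
      for (dirent_t *e = buckets[hash(name)]; e; e = e->next)
          if (strcmp(e->name, name) == 0)
              return e;
      return NULL;
  }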

- Disk block allocation:

  Contiguous:
    Each file occupies a contiguous set of disk blocks
    + Simple to manage: FCB only needs to save start location & length
    - File size:
      Difficult to grow
      Must be known at time of file creation
      Overestimated → Internal fragmentation
    - Finding free space for new file:
      Time-consuming
      Common strategies: Best fit, first fit (more efficient than worst fit)
    - High cost of compaction for large disks

  Extent-based (modified contiguous allocation):
    Extent: Contiguous set of disk blocks
    File = Linked list of extents (each extent has link to next one)
    - Internal fragmentation: Extent too large
    - External fragmentation: Extents can have various sizes

  Linked:
    File = Linked list of disk blocks, ending at null pointer
    + No external fragmentation, no compaction needed
    + File size can grow easily
    - Direct access: Inefficient
    - Extra disk space required for pointers
    - Reliability: Pointers can be lost/damaged
    FAT (File Allocation Table) variant:
      Disk section at beginning of volume set aside for FAT table
      Directory entry contains block no of first block of file
      Table entry (indexed by block no) contains block no of next block in file
      Table entry for last block: Special end-of-file value
      Cache FAT table in mem → Improve random access

  Indexed:
    Each file has own index block: Special location storing array of pointers
    Types:
      Direct (normal): 1 index block, pointers refer directly to real data blocks
      Indirect, linked scheme: ≥2 index blocks linked together
      Indirect, multi-lvl scheme: 1st-lvl index block → 2nd-lvl index block
      → file block
      Combined scheme: Direct blocks for small files, indirect blocks for
      large files
    + No external fragmentation (pointers can point to any free disk block)
    + Supports direct access
    - Wasted space: Index pointer overhead > linked allocation's overhead
      Small file (1-2 disk blocks):
        Indexed: Still needs entire index block
        Linked: Only needs space for 2 pointers
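A C sketch of following a FAT chain, assuming the table is already cached
in mem as fat[] and FAT_EOF marks the last block (both names illustrative):

  #define FAT_EOF (-1)

  extern int fat[];   /* fat[b] = block no of the block after b */

  /* Returns the physical block no of logical block n of the file whose
   * directory entry says it starts at first_block, or FAT_EOF if the
   * file is shorter than n+1 blocks. */
  int fat_logical_to_physical(int first_block, int n) {
      int b = first_block;
      while (n-- > 0 && b != FAT_EOF)
          b = fat[b];   /* one table lookup per hop: in-mem, no disk I/O */
      return b;
  }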

Free Space Management

  Bit vector:
    1 bit per disk block (free/allocated)
    + Simple
    + Efficient: Find first free block, or first n consecutive free blocks
    - Inefficient if entire vector can't be kept in mem

  Linked-list:
    Linked list of free disk blocks
    Cache pointer to first free block in mem
    + Easy: Locate a single free block
    - Hard to obtain a contiguous set of free blocks
    - Inefficient: Traverse entire list (but this is not a frequent action)

  Grouping (modified linked-list):
    Store addrs of n free blocks in first free block:
      First (n - 1) of these blocks actually free
      Last block: Holds addrs of another n free blocks
    + Can find a large no of free blocks quickly

  Counting:
    Keep entries of (Addr of first free block, Count of following contiguous
    free blocks)
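A C sketch of the bit-vector scan, assuming 1 = free and 32-bit word
granularity (sizes illustrative):

  #include <stdint.h>

  #define NWORDS 1024                 /* 1024 * 32 = 32768 blocks tracked */

  static uint32_t freemap[NWORDS];    /* bit i of word w covers block w*32+i */

  /* Returns the first free block no, or -1 if the disk is full. */
  int first_free_block(void) {
      for (int w = 0; w < NWORDS; w++) {
          if (freemap[w] == 0)        /* whole word allocated: skip 32 blocks */
              continue;
          for (int i = 0; i < 32; i++)
              if (freemap[w] & (UINT32_C(1) << i))
                  return w * 32 + i;
      }
      return -1;
  }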

Mass Storage System
Sunday, December 11, 2016
1:49 AM

Solid-State Disk

- Non-volatile mem, used like a hard drive
- Faster access speed (no moving parts)
- More reliable than HDDs
- Storage cost per MB more expensive
- May have shorter life span

Magnetic Tape

- Relatively permanent
- Can hold large quantities of data
- Mainly used for backup, storage of infrequently-used data
- Very slow access time (~1000× slower than disk)

Magnetic Disk

- Structure:
  Platter: Circular plate storing data
  Track: Circular path on platter, formed by consecutive sectors
  Sector: Subdivision of track on platter, stores fixed amount of data
  Cylinder: Vertical collection of corresponding tracks on all platters
  Arm: Holds read-write head
  Read-write head: Device for reading/writing data on disk

Disk structure (logical)

- Disk drive addressed as large 1D array of logical blocks
- Logical blocks mapped onto sectors sequentially
  (sector 0: 1st sector of 1st track on outermost cylinder)
- Mapping order: Through that track, then remaining tracks in that
  cylinder, then cylinders from outermost to innermost
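A C sketch of that mapping, assuming an idealized uniform geometry (real
drives use zoned recording, with more sectors on outer tracks):

  typedef struct { int cylinder, head, sector; } chs_t;

  /* Blocks are numbered through a track, then through the remaining
   * tracks of the cylinder, then cylinder by cylinder outward-in. */
  chs_t lba_to_chs(int lba, int heads_per_cyl, int sectors_per_track) {
      chs_t p;
      p.cylinder = lba / (heads_per_cyl * sectors_per_track);
      p.head     = (lba / sectors_per_track) % heads_per_cyl;
      p.sector   = lba % sectors_per_track;
      return p;
  }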
Disk Attachment

- SCSI:
  Bus, max 16 devices/cable
  SCSI initiator (controller card) requests operation
  SCSI targets (devices) perform task
- Storage Array:
  Attached array of disks
  Ports to connect hosts to array
  Memory, controlling software (NVRAM, ...)
- Storage Area Network (SAN):
  Multiple hosts attached to multiple storage arrays
  Physical medium: Fibre Channel switches
  Storage made available via LUN (Logical Unit Number) masking from
  specific arrays to specific servers
  Easy to add/remove storage
- Network-Attached Storage (NAS):
  Storage made available over network rather than local connection
  Implemented via remote procedure calls (RPCs) between host and storage
  over TCP/UDP
  Protocols: NFS, CIFS
  iSCSI protocol: Carries SCSI protocol over IP

Disk Metrics

- Seek time: Time to move disk arm to desired cylinder
- Rotational latency: Time to rotate desired sector under disk head:
  Time per rotation = 60 / RPM (s)
  Avg rotational latency = 1/2 × 60 / RPM (s)
- Random-access (positioning) time = Seek time + Rotational latency
- Transfer rate: Rate of data flow between drive ↔ computer mem
- Avg I/O time = Avg access time + (Transfer amount / Transfer rate)
  + Controller overhead
- Example: Transfer 4KB on a 7200RPM disk with 5ms avg seek time,
  1GB/sec transfer rate, 0.1ms controller overhead:
  Avg I/O time = 5ms + (1/2 × 60/7200)s + 4KB/(1GB/s) + 0.1ms
               = 5ms + 4.17ms + 0.004ms + 0.1ms ≈ 9.27ms
- Head crash: Results from disk head making contact with disk surface
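The formula above as a small C helper; inputs mirror the worked example
(assumed units: ms for times, bytes and bytes/s for the transfer terms):

  #include <stdio.h>

  double avg_io_ms(double seek_ms, double rpm, double bytes,
                   double rate_bytes_per_s, double ctrl_ms) {
      double rot_ms  = 0.5 * 60.0 / rpm * 1000.0;   /* avg rotational latency */
      double xfer_ms = bytes / rate_bytes_per_s * 1000.0;
      return seek_ms + rot_ms + xfer_ms + ctrl_ms;
  }

  int main(void) {
      /* 4KB on a 7200RPM disk, 5ms avg seek, 1GB/s, 0.1ms controller */
      printf("%.2f ms\n", avg_io_ms(5.0, 7200.0, 4096.0, 1e9, 0.1));
      return 0;   /* prints ~9.27 ms, matching the example above */
  }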

Disk Scheduling

Example used throughout (illustrations omitted):
  200 cylinders (0-199)
  Queue = 98, 183, 37, 122, 14, 124, 65, 67
  Head starts at 53

FCFS (First come first serve):
  Service requests in arrival order
  Head movement: 53, 98, 183, 37, 122, 14, 124, 65, 67
  Total distance = (183 - 53) + (183 - 37) + (122 - 37) + (122 - 14)
                 + (124 - 14) + (124 - 65) + (67 - 65) = 640

SSTF (Shortest Seek Time First):
  Select request with min seek time from current head pos
  - Issue: Starvation (far requests can wait indefinitely)
  Head movement: 53, 65, 67, 37, 14, 98, 122, 124, 183
  Total distance = (67 - 53) + (67 - 14) + (183 - 14) = 236

SCAN:
  Head starts at 1 end, services requests until it gets to the other end;
  then head movement is reversed & servicing continues
  Performs well for systems placing heavy load on disk
  Assume head is moving toward 0 (coming from 199):
    Head movement: 53, 37, 14, 0, 65, 67, 98, 122, 124, 183
    Total distance = (53 - 0) + (183 - 0) = 236

LOOK:
  Similar to SCAN, except head only goes as far as last request in each
  direction, then reverses immediately
  Assume head is moving toward 199 (coming from 0):
    Head movement: 53, 65, 67, 98, 122, 124, 183, 37, 14
    Total distance = (183 - 53) + (183 - 14) = 299

C-SCAN:
  Head moves from 1 end to the other, servicing requests as it goes;
  when it reaches the other end, it immediately returns to the beginning
  without servicing any requests on the return trip
  Provides more uniform wait time than SCAN
  Performs well for systems placing heavy load on disk
  Assume head is moving toward 199 (coming from 0):
    Head movement: 53, 65, 67, 98, 122, 124, 183, 199, 0, 14, 37
    Total distance = (199 - 53) + (199 - 0) + (37 - 0) = 382

C-LOOK:
  Similar to C-SCAN, except head only goes as far as the last request in
  each direction, then returns immediately to the lowest pending request
  without servicing on the way back
  Assume head is moving toward 199 (coming from 0):
    Head movement: 53, 65, 67, 98, 122, 124, 183, 14, 37
    Total distance = (183 - 53) + (183 - 14) + (37 - 14) = 322
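A small C sketch that reproduces the FCFS and SSTF totals from the example
above (the queue and start cylinder are as given; the code is illustrative):

  #include <stdio.h>

  static int dist(int a, int b) { return a > b ? a - b : b - a; }

  int fcfs_total(const int *q, int n, int head) {
      int total = 0;
      for (int i = 0; i < n; i++) { total += dist(head, q[i]); head = q[i]; }
      return total;
  }

  int sstf_total(int *q, int n, int head) {
      int total = 0;
      for (int served = 0; served < n; served++) {
          int best = -1;
          for (int i = 0; i < n; i++)   /* pick closest pending request */
              if (q[i] >= 0 && (best < 0 || dist(head, q[i]) < dist(head, q[best])))
                  best = i;
          total += dist(head, q[best]);
          head = q[best];
          q[best] = -1;                 /* mark as serviced */
      }
      return total;
  }

  int main(void) {
      int q1[] = {98, 183, 37, 122, 14, 124, 65, 67};
      int q2[] = {98, 183, 37, 122, 14, 124, 65, 67};
      printf("FCFS: %d\n", fcfs_total(q1, 8, 53));   /* 640 */
      printf("SSTF: %d\n", sstf_total(q2, 8, 53));   /* 236 */
      return 0;
  }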

Disk Management

- Low-lvl (physical) format:
  Divide disk into sectors that disk controller can read/write:
  Each sector holds header info, data, error-correction code (ECC)
- Logical (file sys) format:
  Partition disk into ≥1 groups of cylinders, each treated as logical disk
  Efficiency: Group blocks into clusters:
    Disk I/O in blocks
    File I/O in clusters
- RAID (Redundant Array of Inexpensive Disks):
  ≥2 disk drives, provide reliability via redundancy
  Helps recover from disk failure, but does not prevent/detect data corruption
  Techniques:
    Parity: Re-generate lost content from parity info saved elsewhere
    (cost: parity calculation overhead)
    Stripe: Split data across multiple disks (no full copy on a single disk)
    Mirror: Store copies of data on multiple disks
  Hot-spare disk: Disk left unallocated; automatically replaces failed disk
  & has data rebuilt onto it

RAID lvls (illustrations omitted):

  Lvl 0: Blocks striped → Excellent performance
         No mirror, no parity → No redundancy
  Lvl 1: Blocks mirrored → Excellent redundancy
         No striping, no parity → Good performance
  Lvl 2: Bit-lvl striping; 1 data disk group + 1 ECC disk group
  Lvl 3: Byte-lvl striping; 2 data disks + 1 parity disk
  Lvl 4: Block-lvl striping; 2 data disks + 1 parity disk
  Lvl 5: Blocks striped → Good performance
         Distributed parity → Good redundancy
  Lvl 6: Blocks striped; 2 distributed parities (2 parity blocks for each
         data block)
  Lvl 0+1: Mirror of stripes
  Lvl 1+0: Stripe of mirrors
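A sketch of the parity idea in C, assuming one parity block per stripe as
in RAID 4/5; block size and function names are illustrative:

  #include <string.h>

  #define BLK 512

  /* parity = d0 XOR d1 XOR ... XOR d[ndisks-1], byte by byte */
  void compute_parity(unsigned char parity[BLK],
                      unsigned char data[][BLK], int ndisks) {
      memset(parity, 0, BLK);
      for (int d = 0; d < ndisks; d++)
          for (int i = 0; i < BLK; i++)
              parity[i] ^= data[d][i];
  }

  /* Recover a failed disk's block: XOR of parity and the surviving
   * data blocks (this calculation is the parity overhead noted above). */
  void rebuild_block(unsigned char out[BLK], const unsigned char parity[BLK],
                     unsigned char surviving[][BLK], int nsurvive) {
      memcpy(out, parity, BLK);
      for (int d = 0; d < nsurvive; d++)
          for (int i = 0; i < BLK; i++)
              out[i] ^= surviving[d][i];
  }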

- ZFS:
  Adds checksums of all data & metadata
  Checksum not kept with block, but stored with pointer to that block
  → Detect/Correct block corruption
  No partitions; disks allocated in pools
