
LINUX CPU SCHEDULING

Objectives

At the end of this session you should be able to:

Name the two scheduling algorithms introduced in the 2.6 Linux kernel
Name the major steps in these algorithms

Linux scheduler history

Linux 2.6 O(1) Scheduler

Reduces scheduling algorithm complexity from O(n) to O(1)
Better support for SMP systems: avoids the single runqueue lock and the cache problems of the 2.4 scheduler by keeping a runqueue per CPU

Preemptive: a higher-priority process can preempt a running process with lower priority

SMP Support in 2.4 and 2.6 versions


[Figure: SMP support compared — in the 2.4 kernel, CPU1, CPU2 and CPU3 share a single runqueue; in the 2.6 kernel, each CPU has its own runqueue]

[Figure: The Linux 2.6 scheduler runqueue structure]

Linux scheduler

140 priority levels


The lower the value, the higher the priority. E.g.: priority level 110 has a higher priority than level 130.

Two priority-ordered 'priority arrays' per CPU

'Active' array : tasks which have timeslice left
'Expired' array : tasks which have run out of their timeslice
Both accessed through pointers from the per-CPU runqueue
They are switched via a simple pointer swap, as sketched below
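A minimal sketch of that pointer swap, with simplified struct and field names (the real runqueue and prio_array in kernel/sched.c carry much more state):

/* A minimal sketch of the O(1) active/expired array swap.
 * Struct and field names are simplified, not the exact kernel ones. */
struct prio_array {
    unsigned int nr_active;        /* runnable tasks held in this array */
    /* the real prio_array also holds one bitmap bit and one FIFO list
       per priority level (140 of them); omitted to keep the sketch short */
};

struct runqueue {
    struct prio_array *active;     /* tasks with timeslice left */
    struct prio_array *expired;    /* tasks that have used up their timeslice */
    struct prio_array  arrays[2];  /* the storage both pointers refer to */
};

/* Once every task in the active array has expired, the arrays are exchanged
 * in O(1) by swapping the two pointers -- no task is copied or moved. */
static void switch_arrays(struct runqueue *rq)
{
    struct prio_array *tmp = rq->active;

    rq->active  = rq->expired;
    rq->expired = tmp;
}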

Scheduling policy

140 Priority levels


0-99 : RT prio ( MAX_RT_PRIO = 100 )
100-139 : User task prio ( MAX_PRIO = 140 )

Three different scheduling policies


One for normal tasks (SCHED_NORMAL)
Two for real-time tasks (SCHED_FIFO and SCHED_RR)

Normal tasks
Each task is assigned a nice value
Static priority: PRIO = MAX_RT_PRIO + NICE + 20
Each task is assigned a time slice
Tasks at the same priority are round-robined (see the sketch after this list)

Ensures Priority + Fairness
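A small sketch of the static-priority mapping named above; the constants come from the slide, the helper names are only illustrative:

/* Static priority of a normal task: PRIO = MAX_RT_PRIO + NICE + 20.
 * With nice in [-20, 19] this yields priorities 100..139; 0..99 stay
 * reserved for real-time tasks. */
#define MAX_RT_PRIO 100
#define MAX_PRIO    140

static int nice_to_prio(int nice)      /* nice in [-20, 19] */
{
    return MAX_RT_PRIO + nice + 20;    /* 100 .. MAX_PRIO-1 */
}

static int prio_to_nice(int prio)      /* inverse mapping */
{
    return prio - MAX_RT_PRIO - 20;
}

In the O(1) scheduler the timeslice is then derived from this static priority, with higher-priority (lower-valued) tasks receiving larger slices.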

[Figure: Priority and interactivity effect on timeslice]

[Table: Nice value vs. static priority and quantum]

Dynamic priority

Dynamic priority is calculated from the static priority and the average sleep time. Roughly speaking, the bonus is a number in [0, 10] that measures what percentage of the time the process was sleeping recently; 5 is neutral, 10 helps priority by 5, 0 hurts priority by 5.

DP = max(100, min(SP - bonus + 5, 139))
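The same formula as a small helper (names are illustrative; the kernel derives the bonus from the task's sleep average):

/* Dynamic priority: DP = max(100, min(SP - bonus + 5, 139)).
 * bonus is assumed to be in [0, 10]; 5 is neutral. */
static int dynamic_prio(int static_prio, int bonus)
{
    int dp = static_prio - bonus + 5;

    if (dp < 100)
        dp = 100;       /* never climb into the real-time range */
    if (dp > 139)
        dp = 139;       /* never drop below the lowest user priority */
    return dp;
}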

Interactivity

Dynamically scales a task's priority based on its interactivity:

Interactive tasks receive a priority bonus of up to -5, and hence a larger timeslice
CPU-bound tasks receive a priority penalty of up to +5
Interactivity is estimated using a running sleep average
Interactive tasks are I/O bound: they wait for events to occur, so sleeping tasks are assumed to be I/O bound or interactive
The actual bonus/penalty is determined by comparing the sleep average against a constant maximum sleep average, as sketched below
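One way to picture how the running sleep average turns into that bonus; the window constant below is an assumption for illustration, not the kernel's value:

/* Scale a task's recent sleep average against a fixed maximum sleep
 * average to get a bonus in [0, 10]: a task that slept for the whole
 * window gets the full bonus, a task that never slept gets 0. */
#define MAX_BONUS        10
#define MAX_SLEEP_AVG_MS 1000    /* illustrative window, not the kernel constant */

static int sleep_avg_to_bonus(unsigned int sleep_avg_ms)
{
    if (sleep_avg_ms > MAX_SLEEP_AVG_MS)
        sleep_avg_ms = MAX_SLEEP_AVG_MS;
    return (int)(sleep_avg_ms * MAX_BONUS / MAX_SLEEP_AVG_MS);
}

A bonus above 5 then raises the dynamic priority, a bonus below 5 lowers it.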

Recalculation of priorities

When a task finishes its timeslice:

Its interactivity is estimated
Interactive tasks can be inserted into the 'Active' array again
Otherwise, its priority is recalculated and it is inserted at the new priority level in the 'Expired' array (see the sketch below)
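A self-contained sketch of that decision; the interactivity test is abstracted into a flag, and the real code also refills the timeslice and guards against starving the expired array:

/* Where does a task go when its timeslice runs out? Interactive tasks may be
 * requeued on the active array with their priority untouched; everything else
 * gets its priority recalculated and waits on the expired array. */
enum requeue_target { REQUEUE_ACTIVE, REQUEUE_EXPIRED };

static enum requeue_target on_timeslice_expiry(int interactive,
                                               int static_prio, int bonus,
                                               int *prio)
{
    if (interactive)
        return REQUEUE_ACTIVE;            /* runs again before the array swap */

    /* recalculate priority as in the dynamic-priority sketch above */
    *prio = static_prio - bonus + 5;
    if (*prio < 100) *prio = 100;
    if (*prio > 139) *prio = 139;
    return REQUEUE_EXPIRED;               /* waits until the arrays are swapped */
}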

Scheduling in Linux

The scheduler selects the next process to be assigned to the CPU based on process priority. In a high-level C program the nice value can be modified using the following functions:
int getpriority(int which, id_t who);
int setpriority(int which, id_t who, int value);
int nice(int incr);

Parameters

which: Specifies the type of target. Can be one of PRIO_PROCESS, PRIO_PGRP, or PRIO_USER.
who: The target of the setpriority() request; a process ID, process group ID, or user ID, respectively, depending on the value of which. A value of 0 indicates that the target is the current process, process group, or user.
value: The new nice value for the process. Values in the range [-20, 19] are valid; values outside that range are silently clipped to this range.
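A small usage sketch of these calls (error handling kept minimal; the printed values are whatever the running system reports):

/* Read and change the calling process's nice value with the API above.
 * Lowering the value (raising priority) normally requires privilege. */
#include <stdio.h>
#include <errno.h>
#include <unistd.h>
#include <sys/resource.h>

int main(void)
{
    errno = 0;
    int old = getpriority(PRIO_PROCESS, 0);          /* 0 = current process */
    if (old == -1 && errno != 0)
        perror("getpriority");
    printf("current nice value: %d\n", old);

    if (setpriority(PRIO_PROCESS, 0, old + 5) == -1) /* be nicer: lower priority */
        perror("setpriority");

    errno = 0;
    if (nice(-5) == -1 && errno != 0)                /* raising priority may need root */
        perror("nice");

    printf("nice value now: %d\n", getpriority(PRIO_PROCESS, 0));
    return 0;
}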

Nice value

Varies between [-20, 19]
Lower value, higher priority
The current process needs superuser privilege to lower its nice value (raise its priority), but it can raise its nice value (lower its own priority) without privilege.

Nice value vs. Win32 Priority


nice value      Win32 Priority
-20 to -16      THREAD_PRIORITY_HIGHEST
-15 to -6       THREAD_PRIORITY_ABOVE_NORMAL
-5 to +4        THREAD_PRIORITY_NORMAL
+5 to +14       THREAD_PRIORITY_BELOW_NORMAL
+15 to +19      THREAD_PRIORITY_LOWEST

Linux 2.6 CFS Scheduler


Was merged into the 2.6.23 release
Uses a red-black tree structure instead of multilevel queues
Tries to run the task with the "gravest need" for CPU time, as sketched below
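To make the "gravest need" idea concrete: CFS keys runnable tasks by how little CPU time they have received (their virtual runtime) and always picks the smallest key, which is the leftmost node of its red-black tree. A self-contained toy version of that selection rule, using a plain array instead of the kernel's tree, so the names and layout are illustrative only:

/* Toy model of the CFS pick rule: among runnable entities, choose the one
 * with the smallest virtual runtime. CFS keeps entities in a red-black
 * tree keyed by vruntime, so this "minimum" is simply the leftmost node;
 * here an array stands in for the tree to keep the example self-contained. */
#include <stdio.h>

struct toy_entity {
    const char        *name;
    unsigned long long vruntime;   /* weighted CPU time received so far */
};

static const struct toy_entity *pick_next(const struct toy_entity *se, int n)
{
    const struct toy_entity *best = NULL;
    for (int i = 0; i < n; i++)
        if (!best || se[i].vruntime < best->vruntime)
            best = &se[i];
    return best;                   /* the task with the "gravest need" */
}

int main(void)
{
    struct toy_entity rq[] = {
        { "editor",   1200 },      /* slept a lot, little CPU used */
        { "compiler", 9800 },      /* CPU hog, large vruntime      */
        { "shell",    3100 },
    };
    printf("next task: %s\n", pick_next(rq, 3)->name);   /* -> editor */
    return 0;
}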

Red-Black tree in CFS

Red-Black tree properties


Self-balancing
Insertion and deletion operations in O(log n)

With a proper implementation, its performance is almost the same as that of O(1) algorithms!
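For reference, insertion into the kernel's red-black tree follows a standard pattern: walk down to find the link, attach the node, then let the library rebalance. A kernel-side sketch with an illustrative struct item (not a type taken from the scheduler):

#include <linux/types.h>
#include <linux/rbtree.h>

/* Illustrative element type: a node ordered by a 64-bit key. */
struct item {
    struct rb_node node;
    u64 key;
};

static void item_insert(struct rb_root *root, struct item *new)
{
    struct rb_node **link = &root->rb_node;
    struct rb_node *parent = NULL;

    /* ordinary binary-search descent to find the insertion point */
    while (*link) {
        struct item *cur = rb_entry(*link, struct item, node);

        parent = *link;
        if (new->key < cur->key)
            link = &(*link)->rb_left;
        else
            link = &(*link)->rb_right;
    }

    rb_link_node(&new->node, parent, link);   /* attach the new node */
    rb_insert_color(&new->node, root);        /* rebalance: O(log n) */
}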

Red-Black tree demo

http://www.ece.uc.edu/~franco/C321/html/RedBlack/redblack.html

struct cfs_rq { ... };        /* Defined in 2.6.23: kernel/sched.c */
struct sched_entity { ... };  /* Defined in 2.6.23: /usr/include/linux/sched.h */
struct task_struct { ... };   /* Defined in 2.6.23: /usr/include/linux/sched.h */

Appendix

struct sched_class
struct sched_class {  /* Defined in 2.6.23: /usr/include/linux/sched.h */
    struct sched_class *next;

    void (*enqueue_task) (struct rq *rq, struct task_struct *p, int wakeup);
    void (*dequeue_task) (struct rq *rq, struct task_struct *p, int sleep);
    void (*yield_task) (struct rq *rq, struct task_struct *p);

    void (*check_preempt_curr) (struct rq *rq, struct task_struct *p);

    struct task_struct * (*pick_next_task) (struct rq *rq);
    void (*put_prev_task) (struct rq *rq, struct task_struct *p);

    unsigned long (*load_balance) (struct rq *this_rq, int this_cpu,
            struct rq *busiest,
            unsigned long max_nr_move, unsigned long max_load_move,
            struct sched_domain *sd, enum cpu_idle_type idle,
            int *all_pinned, int *this_best_prio);

    void (*set_curr_task) (struct rq *rq);
    void (*task_tick) (struct rq *rq, struct task_struct *p);
    void (*task_new) (struct rq *rq, struct task_struct *p);
};

enqueue_task: When a task enters a runnable state, this function is called. It puts the scheduling entity (process) into the red-black tree and increments the nr_running variable.

dequeue_task: When a task is no longer runnable, this function is called to take the corresponding scheduling entity out of the red-black tree. It decrements the nr_running variable.

yield_task: This function is basically just a dequeue followed by an enqueue, unless the compat_yield sysctl is turned on; in that case, it places the scheduling entity at the right-most end of the red-black tree.

check_preempt_curr: This function checks whether the currently running task can be preempted. The CFS scheduler module does fairness testing before actually preempting the running task. This drives wakeup preemption.

pick_next_task: This function chooses the most appropriate process eligible to run next.

load_balance: Each scheduler module implements a pair of functions, load_balance_start() and load_balance_next(), to implement an iterator that gets called in the load_balance routine of the module. The core scheduler uses this method to load-balance processes managed by the scheduling module.

set_curr_task: This function is called when a task changes its scheduling class or changes its task group.

task_tick: This function is mostly called from time-tick functions; it might lead to a process switch. This drives running preemption.

task_new: The core scheduler gives the scheduling module an opportunity to manage new task startup. The CFS scheduling module uses it for group scheduling, while the scheduling module for real-time tasks does not use it.
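As a rough picture of how these hooks get used, a simplified sketch of the core scheduler's walk over the scheduling classes (the real pick_next_task() in kernel/sched.c also has a fast path for the common all-CFS case):

/* Simplified sketch: the core scheduler walks the list of scheduling
 * classes, highest priority first (real-time before CFS before idle),
 * and asks each one in turn for a task via its pick_next_task hook. */
static struct task_struct *core_pick_next_task(struct rq *rq)
{
    const struct sched_class *class = sched_class_highest; /* assumed to point at the highest-priority (RT) class */
    struct task_struct *p;

    for (;;) {
        p = class->pick_next_task(rq);
        if (p)
            return p;                 /* this class has a runnable task */
        class = class->next;          /* fall through to the next class */
    }
}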
