Objectives
Two scheduling algorithms introduced in the 2.6 Linux kernel.
Name the major steps in these algorithms.
Reduce scheduling algorithm complexity from O(n) to O(1).
Better support for SMP systems.
Single
Preemptive: A higher priority process can preempt a running process with lower priority
[Figure: diagram labelled CPU1, CPU2, CPU3]
Linux scheduler
The lower the value, the higher the priority. E.g., priority level 110 has a higher priority than 130.
[...] left
'Expired' array: tasks which have run.
Both are accessed through pointers from the per-CPU runqueue.
Scheduling policy
Normal tasks
Each task is assigned a nice value.
PRIO = MAX_RT_PRIO + NICE + 20
Each task is assigned a time slice.
Tasks at the same priority are scheduled round-robin.
Dynamic priority
Dynamic priority is calculated from static priority and average sleep time.
Roughly speaking, the bonus is a number in [0, 10] that measures what percentage of the time the process was sleeping recently; 5 is neutral, 10 helps priority by 5, 0 hurts priority by 5.
DP = max(100, min(SP - bonus + 5, 139))
Interactivity
Dynamically scales a task's priority based on its interactivity.
Interactive tasks receive a priority bonus [-5].
CPU-bound tasks receive a priority penalty [+5].
Interactivity is estimated using a running sleep average.
Interactive tasks are I/O bound: they wait for events to occur.
Sleeping tasks are therefore treated as I/O bound or interactive.
The actual bonus/penalty is determined by comparing the sleep average against a constant maximum sleep average.
Recalculation of priorities
Interactivity is estimated.
Interactive tasks can be inserted into the 'Active' array again.
Otherwise, the priority is recalculated and the task is inserted at its new priority level in the 'Expired' array.
Scheduling in Linux
The scheduler selects the next process to be assigned to the CPU based on process priority. In a C program, the nice value can be read and modified using the following functions:
int getpriority(int which, id_t who);
int setpriority(int which, id_t who, int value);
int nice(int incr);
Parameters
which: Specifies the type of target. Can be one of PRIO_PROCESS, PRIO_PGRP, or PRIO_USER.
who: Is the target of the setpriority() request; a process ID, process group ID, or user ID, respectively, depending on the value of which. A value of 0 indicates that the target is the current process, process group, or user.
value: Is the new nice value for the process. Values in the range [-20, 19] are valid; values outside that range are silently clipped to this range.
Nice value
Varies between [-20, 19].
Lower value, higher priority.
The current process needs superuser privilege to lower its nice value (that is, to raise its priority).
Without that privilege it can still raise its nice value, which lowers its priority.
The CFS scheduler was merged into the 2.6.23 release.
Uses a red-black tree structure instead of multilevel queues.
Tries to run the task with the "gravest need" for CPU time.
http://www.ece.uc.edu/~franco/C321/html/RedBlack/redblack.html
struct cfs_rq { /* Defined in 2.6.23:kernel/sched.c */
struct sched_entity { /* Defined in 2.6.23:/usr/include/linux/sched.h */
struct task_struct { /* Defined in 2.6.23:/usr/include/linux/sched.h */
Appendix
struct sched_class
struct sched_class { /* Defined in 2.6.23:/usr/include/linux/sched.h */
    struct sched_class *next;
    void (*enqueue_task) (struct rq *rq, struct task_struct *p, int wakeup);
    void (*dequeue_task) (struct rq *rq, struct task_struct *p, int sleep);
    void (*yield_task) (struct rq *rq, struct task_struct *p);
    void (*check_preempt_curr) (struct rq *rq, struct task_struct *p);
    struct task_struct * (*pick_next_task) (struct rq *rq);
    void (*put_prev_task) (struct rq *rq, struct task_struct *p);
    unsigned long (*load_balance) (struct rq *this_rq, int this_cpu,
                                   struct rq *busiest,
                                   unsigned long max_nr_move,
                                   unsigned long max_load_move,
                                   struct sched_domain *sd,
                                   enum cpu_idle_type idle,
                                   int *all_pinned, int *this_best_prio);
    void (*set_curr_task) (struct rq *rq);
    void (*task_tick) (struct rq *rq, struct task_struct *p);
    void (*task_new) (struct rq *rq, struct task_struct *p);
};
enqueue_task: When a task enters a runnable state, this function is called. It puts the scheduling entity (process) into the red-black tree and increments the nr_running variable.
dequeue_task: When a task is no longer runnable, this function is called to keep the corresponding scheduling entity out of the red-black tree. It decrements the nr_running variable.
yield_task: This function is basically just a dequeue followed by an enqueue, unless the compat_yield sysctl is turned on; in that case, it places the scheduling entity at the right-most end of the red-black tree.
check_preempt_curr: This function checks whether the currently running task can be preempted. The CFS scheduler module does fairness testing before actually preempting the running task. This drives the wakeup preemption.
pick_next_task: This function chooses the most appropriate process eligible to run next.
load_balance: Each scheduler module implements a pair of functions, load_balance_start() and load_balance_next(), to implement an iterator that gets called in the load_balance routine of the module. The core scheduler uses this method to load-balance processes managed by the scheduling module.
set_curr_task: This function is called when a task changes its scheduling class or changes its task group.
task_tick: This function is mostly called from time tick functions; it might lead to a process switch. This drives the running preemption.
task_new: The core scheduler gives the scheduling module an opportunity to manage new task startup. The CFS scheduling module uses it for group scheduling, while the scheduling module for a real-time task does not use it.