Theory of System Administration
DANSS Seminar
Feb. 23rd, 2003
Elliot Jaffe

Outline
 What is System Administration
 Problems in System Administration
 Theory overview
 Results
 Research directions

Feb. 23, 2003 Danss - Theory of SysAdmin 2


What is System Administration?
What do you think?



What is System Administration
In computer technology, a set of functions
that provides support services, ensures
reliable operations, promotes efficient use
of the system, and ensures that prescribed
service-quality objectives are met.

Synonym: system management.


US Federal Standard 1037C



System Administration is
The function that provides:

Reliability – Stable, consistent service

Efficiency – Performance

Predictability – Service Level Agreement



CS HUJI System Administration
 Infrastructure
 Operating Systems
 Networking
 Account Administration
 Software Licensing, Installation and Support
 Education



What you don’t see
 Budgets
 Cost Benefit Analysis
 Vendor Selection
 Service Contracts
 Long term planning
 Policy creation



Problems in Sys Admin

Strategic

Tactical



Strategic Problems
 Economic cost/benefit analysis
 How much disk space should be purchased in
the next year?
 Should we buy one new router, or do we
need a fail-over pair?
 If we get 25% additional students, what
resources will we need?



Strategic Problems #2
 What is the right level of disk space
quotas?
 Should we use a VLAN to localize network
traffic?



Tactical Problems
 What is the best way to maintain multiple
systems?
 How do we apply patches?
 How should we roll out an OS change?
 How do we support multiple configurations?
 How many configurations should we support?
 How do we use version control as part of system
administration?



A complete theory should enable
 Policy determination and evaluation
 Strategic decisions about resource usage and
allocation
 Interactions between users and system for resources
 Productivity considerations (economics of the system)
 Empirical verification of strategies and policies
 Efficiency of policy and its implementation
 Efficiency of the system in doing its job



Theory of System Administration

A group of computers is an evolving,
stochastic system viewable at multiple
levels of detail.



Configuration Space
 The memory state of the computer
 The set of bits that define the computer
state.

 Example:
 The state of the bits in primary memory and
on secondary media (disks)



Time
 Time is a discrete value.
 For averaging purposes, we allow it to take on
real values.

 Example:
 The system clock is discrete, taking values that are
multiples of the clock period Tc.
 t = 0, Tc, 2Tc, …, nTc



Configuration
 A pattern of values associated with each
point on the configuration space.

 Example:
 The state of all bits in main memory at time t.
 This pattern changes over time.



Averaging
 Over time scales much larger than Tc, the
average properties of the system can be
treated as a continuum approximation, i.e.
as real functions of time.

 Example:
 The number of non-zero bits at any real value
of time.
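
A minimal sketch of the averaging idea, assuming a hypothetical memory state stored as a Python integer and sampled once per clock tick. The names popcount and window_average, the 8-bit states, and the window of 4 ticks are illustrative assumptions, not part of the talk.

```python
def popcount(state: int) -> int:
    """Number of non-zero bits in an integer memory state."""
    return bin(state).count("1")


def window_average(samples: list[int], window: int) -> list[float]:
    """Average consecutive, non-overlapping blocks of `window` samples.

    Each output value summarizes a time scale of `window` clock ticks,
    which is what allows the result to be treated as a smooth function.
    """
    return [
        sum(samples[i:i + window]) / window
        for i in range(0, len(samples) - window + 1, window)
    ]


# Hypothetical 8-bit memory states at t = 0, Tc, 2Tc, ..., 7Tc.
states = [0b00001111, 0b00000111, 0b00011111, 0b00001111,
          0b00111111, 0b00011111, 0b01111111, 0b00111111]

bits_set = [popcount(s) for s in states]  # discrete value at every tick
coarse = window_average(bits_set, 4)      # one value per 4-tick window
```

The per-tick counts jump up and down, while the window averages change slowly. That slow trend is the quantity the continuum approximation describes.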



Scales
 Transition from low-level to high-level
 Group objects together to form new objects
 Refer to state of object over time

Level  Example
6      LANs
5      Users, VMs
4      Files
3      int, float, char
2      bytes, words
1      bits



Closed Dynamical Systems
 A closed dynamical system consists of a
configuration space, an initial configuration and
a rule for subsequent time development
 Closed dynamical systems are deterministic
 Example:
 A standalone computer without any external input
is a closed dynamical system
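
A small sketch of why a closed system is deterministic: given the same initial configuration and the same rule, the trajectory is always the same. The 8-bit configuration and the rotate_left_8 rule are hypothetical stand-ins chosen only to keep the example short.

```python
from typing import Callable


def evolve(initial: int, rule: Callable[[int], int], steps: int) -> list[int]:
    """Trajectory of a closed system: the initial configuration
    followed by `steps` applications of the update rule."""
    trajectory = [initial]
    for _ in range(steps):
        trajectory.append(rule(trajectory[-1]))
    return trajectory


def rotate_left_8(config: int) -> int:
    """Hypothetical rule: rotate an 8-bit configuration left by one bit."""
    return ((config << 1) | (config >> 7)) & 0xFF


# No external input enters either run, so both runs must agree.
run_a = evolve(0b00010011, rotate_left_8, 8)
run_b = evolve(0b00010011, rotate_left_8, 8)
```

Here run_a and run_b are identical step for step. An open system would need an extra input argument at each step, and the trajectory would then depend on values the rule cannot predict.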



Interactions
 An interaction between two systems is an
endomorphism on the combined systems
such that both systems determine the time
developments of one another.

 Example:
 Two standalone computers connected via a
network and synchronizing system times.
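
The clock example can be sketched as a single map on the combined state of both machines. This is an illustration under assumptions: the halfway-correction rule below is hypothetical and is not a real time synchronization protocol.

```python
def sync_step(clocks: tuple[float, float]) -> tuple[float, float]:
    """One step of the interaction, mapping the combined state to a new
    combined state. Each clock's next value depends on the other clock."""
    a, b = clocks
    a, b = a + 1.0, b + 1.0  # both clocks tick once
    midpoint = (a + b) / 2
    # Each clock corrects itself halfway toward the shared midpoint.
    return (a + (midpoint - a) / 2, b + (midpoint - b) / 2)


state = (0.0, 8.0)  # hypothetical initial clock readings
history = [state]
for _ in range(3):
    state = sync_step(state)
    history.append(state)

gaps = [b - a for a, b in history]  # disagreement between the clocks
```

Neither clock's trajectory can be computed without the other's, which is the sense in which the two systems determine each other's time development.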



Environment
 An ensemble of mutually interacting
systems.

 Example:
 A user interacting with a computer.
 People are not standalone!



Open Dynamical System
 Projection of an ensemble of interacting
systems onto the state of a given system.
 The configuration state of an open system
is unpredictable over any interval dt ~ Tc.

 Does this mean that all is lost?



Stability
 Assume that there exists some time scale
on which it is possible to predict the
average state of the systems in question.

 We are not interested in managing
systems which cannot achieve a minimal
level of stability, since these systems
cannot perform any reliable function.



Multiple Time Scales
 Short term:
 Tc, the computer clock
 Medium term:
 human time > 10^7 Tc
 Long term:
 months and years > 10^7 human time



Components of System State
 The state of a system at any given time is
composed of a slowly varying local average
and a rapidly fluctuating stochastic
remainder.
 Are these systems stable?

[Figure: system state versus time]
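
One way to illustrate this decomposition is a centered moving average. This is a sketch under assumptions: the synthetic disk-usage series, the window half-width, and the function names are all hypothetical, and a moving average is only one possible choice of local average.

```python
def local_average(series: list[float], half_width: int) -> list[float]:
    """Centered moving average; windows are truncated at the series ends."""
    averaged = []
    for i in range(len(series)):
        lo = max(0, i - half_width)
        hi = min(len(series), i + half_width + 1)
        averaged.append(sum(series[lo:hi]) / (hi - lo))
    return averaged


def decompose(series: list[float], half_width: int):
    """Split a series into a slowly varying part and a fast remainder."""
    slow = local_average(series, half_width)
    fast = [x - s for x, s in zip(series, slow)]
    return slow, fast


# Hypothetical disk usage in percent: slow upward drift plus a
# fluctuation that alternates between +3 and -3 at every sample.
usage = [50 + 0.5 * t + (3 if t % 2 == 0 else -3) for t in range(20)]
slow, fast = decompose(usage, half_width=2)
```

By construction, slow plus fast reproduces the original series exactly. Whether the system is stable is then a question about how slow behaves over long time scales.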



Tasks
 A task is a representation of an
autonomous process executed on related
sets of state.

 A task is closed if, after execution, it returns the
system to the original state.
 A task is open if, after execution, it has changed
the overall system state.
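
The closed/open distinction can be sketched as a before-and-after comparison of state. The dictionary-based state and the two example tasks are hypothetical, and the check covers only one starting state, so it illustrates the definition rather than proving a task closed in general.

```python
import copy
from typing import Callable

State = dict[str, int]


def is_closed(task: Callable[[State], None], state: State) -> bool:
    """Run `task` on a copy of `state` and report whether the copy
    ends up equal to the original state."""
    trial = copy.deepcopy(state)
    task(trial)
    return trial == state


def scratch_job(state: State) -> None:
    """Uses temporary space, then releases it before finishing."""
    state["tmp_mb"] += 200
    state["tmp_mb"] -= 200


def install_package(state: State) -> None:
    """Leaves a lasting change in the system state."""
    state["usr_mb"] += 150


system = {"tmp_mb": 10, "usr_mb": 4000}
scratch_is_closed = is_closed(scratch_job, system)      # closed task
install_is_closed = is_closed(install_package, system)  # open task
```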



Maintenance Tasks
 A maintenance task is a task which
reduces the total rate of change of the
average configuration state.

 Example:
 Deletion of accumulated garbage



Policy
 A policy is an average specification of
equivalent system behaviors.
 A set of system states that are equivalent
over the given time period.

 A policy is neither good nor bad. It does
not necessarily lead to stability or chaos.



Policy - Examples
 Users are restricted to a known quota of
file system space.
 All computers must run Microsoft Office.
 Only port 80 will be open on network
servers.
 SSH will be used for all remote computer
access.



Convergence
 A convergent average policy is one whose
tasks result in an equivalent configuration
for all sufficiently large time scales.
 A convergent average policy is one whose
average behavior in time ends in a fixed
average state between two sufficiently
different time values.



Convergence - Example
 Deleting temporary files on a regular basis
is a convergent policy since it returns the
system to a known state (i.e. a given
amount of free file system space).
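
A toy simulation of this example, assuming hypothetical numbers throughout: a 1000 MB file system, 600 MB of permanent data, a repeating pattern of daily temporary-file growth, and a cleanup every 7 days.

```python
def simulate_free_space(capacity_mb: int, permanent_mb: int,
                        daily_temp_mb: list[int], cleanup_every: int,
                        days: int) -> list[int]:
    """Free space in MB at the end of each day.

    Temporary files grow by a repeating daily pattern and are deleted
    completely on every `cleanup_every`-th day.
    """
    temp_mb = 0
    free = []
    for day in range(1, days + 1):
        temp_mb += daily_temp_mb[(day - 1) % len(daily_temp_mb)]
        if day % cleanup_every == 0:
            temp_mb = 0  # maintenance task: delete temporary files
        free.append(capacity_mb - permanent_mb - temp_mb)
    return free


free = simulate_free_space(capacity_mb=1000, permanent_mb=600,
                           daily_temp_mb=[30, 10, 50],
                           cleanup_every=7, days=28)
after_cleanup = free[6::7]  # free space on each cleanup day
```

Free space drifts down between cleanups but returns to the same value after every cleanup, so the state sampled on the cleanup time scale is fixed. Without the cleanup the drift would continue until the disk filled.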



Persistent State
 A persistent state is a configuration for which
the probability of returning to an equivalent
configuration at a later time is 1.
 Persistence is reflected in the property that
the rate of change of the average state is
much slower than the rate of change of fast
moving variations.



Persistent States
 The fast variations extend over several
complete cycles before any appreciable
change in the average is seen.

[Figure: state versus time]



Theorem
 In an open system, a policy specifies a class of
equivalent persistent states if and only if the policy
exhibits average convergence.

 You can maintain the state of the system if and only
if your policy consistently returns the system to a
similar state, i.e. the average resource usage is
constant over the policy's time scale.



Implications
 System Administration is the development,
specification and implementation of
environments and maintenance tasks with
the goal of creating a persistent average
state.



Strategy
 Type I
 Stochastic models
 Type II
 Semantic models



Type I - Stochastic models
 Analyze what is happening on multiple time
scales
 Describe locally averaged states
 Model known boundary conditions

 Empirical measurements of existing systems.


 Predictive modeling of systems based on
measurements.



Problems with Stochastic Models
 Statistical measurements are rare
 No experimental repeatability
 Conditions of measurements are
constantly changing
 Absolute definitions are impossible
 People cannot be described by a small
number of characteristics



Stochastic modeling -- Uses
 Strategic planning
 Do we need to buy more file servers?

 Problem identification
 Why is user X using 300% of the normal disk
quota?
 Why is computer Y rebooting twice a week
when all other systems are stable for months?
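
As a sketch of the problem-identification use, the snippet below flags users whose disk usage is far above what is typical. It assumes that "normal" means the median usage across users and that 300% means a factor of 3. Both are interpretations for illustration, and the user names and figures are made up.

```python
from statistics import median


def flag_heavy_users(usage_mb: dict[str, float],
                     factor: float = 3.0) -> list[str]:
    """Users whose usage is at least `factor` times the median usage."""
    typical = median(usage_mb.values())
    return sorted(user for user, mb in usage_mb.items()
                  if mb >= factor * typical)


# Hypothetical measured disk usage per user, in MB.
usage_mb = {"alice": 480, "bob": 510, "carol": 450,
            "dave": 1600, "erin": 530}
heavy = flag_heavy_users(usage_mb)
```

The median is used instead of the mean so that a single heavy user does not raise the baseline they are compared against.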



Strategic models
 Analyze what might be changed in a
system.
 Formulate as a game of strategy
 Achieve larger goals than just maintaining
a persistent state.



Strategic Goals
 Sys Admin: Keep the system alive and
running so that users can perform a
maximum amount of work
 Benign User: produce useful work using
the system. (consumes resources)
 Malicious User: Maximize control of
system resources



Strategic tools
 Game Theory
 Contests between System Administrator and
malicious users.
 System Downtime: Mean time to repair /
Mean time between failures
 Minimize MTTR or maximize MTBF?
 Levels of monitoring: At what point does the
cost of monitoring overwhelm the benefit?
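
A short worked example of the downtime ratio, using hypothetical figures of 4 hours to repair and 1000 hours between failures. The second function is the usual fraction-of-time-down formula, shown for comparison on the assumption that the second mean time counts only up time.

```python
def downtime_ratio(mttr_hours: float, mtbf_hours: float) -> float:
    """The ratio named on the slide: mean time to repair divided by
    the mean time between (or before) failures."""
    return mttr_hours / mtbf_hours


def unavailability(mttr_hours: float, mtbf_hours: float) -> float:
    """Fraction of total time spent down, treating `mtbf_hours` as the
    mean up time between failures."""
    return mttr_hours / (mtbf_hours + mttr_hours)


baseline = downtime_ratio(4, 1000)        # hypothetical starting point
faster_repair = downtime_ratio(2, 1000)   # halve the repair time
fewer_failures = downtime_ratio(4, 2000)  # double the time to failure
```

Halving the repair time and doubling the time between failures give the same ratio, so the ratio alone cannot answer the question of which to pursue. The choice depends on what each improvement costs.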



Current research
 Recovering File space
 System upgrades
 Quota systems



Recovering File Space
 How do you clean unused files?
 Competition between users and admins
 Trade-off between
 having enough space to operate
 Users recreating temp files that were deleted

 Users “grabbing” space for later use



Patch Application
 How do you apply changes to a distributed
system?
 Divergence

 Convergence

 Congruence



Quota application
 What is the correct way to set file system
quotas?
 By category
 Dynamically assign users to groups
 Set group to lowest maximal value



Bibliography
 Burgess, M. 2003. On the Theory of System
Administration. Journal of the ACM.
 Traugott, S. and Brown, L. 2002. Why Order Matters:
Turing Equivalence in Automated Systems
Administration. LISA 2002.
 Gilfix, M. 2002. Holistic Quota Management: The
Natural Path to a Better, More Efficient Quota
System. LISA 2002.
