
Why Computers Are Getting Slower

(and what we can do about it)

Rik van Riel


Sr. Software Engineer, Red Hat
Why Computers Are Getting Slower

- The traditional approach to better performance
- Why computers are getting slower: no miracle good enough
- Operating system improvements: what Red Hat can do
- Deployment & application improvements: what you can do
- Conclusions
Your performance needs

- More
- More
- More
- More
- More
- Cheaper, too.
The traditional approach to performance

1. Wait for the hardware people to perform miracles.
2. ???
3. Performance!
Why computers are getting slower

- Moore's law
- Pretty graphs of an ugly reality:
  - CPU performance vs. core performance
  - Storage cannot keep up
  - Capacity vs. performance
- Why the upcoming hardware miracles are not enough

Moore's law

- The number of transistors on a chip doubles about every two years
- Moore's law predicts density and complexity, not performance
- Multi-core performance still doubles about every two years, but single-core performance does not
- However, software people have relied on exponentially increasing performance, making Moore's law the cause of, and solution to, every performance problem
- Moore's law will not save us this time
Processor performance
Memory access latencies

Access latencies in CPU cycles:

CPU           L1 cache   L2 cache   RAM   Disk
386           -          -          2     500,000
486           2          -          10    1,800,000
586           2          -          20    1,500,000
Pentium II    2          10         35    2,400,000
Pentium III   2          15         50    6,000,000
Pentium 4     3          25         200   18,000,000
Core 2        3          25         200   24,000,000

Note: old CPUs took multiple cycles to run one instruction, while new CPUs can run multiple instructions per cycle.
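To make the RAM column concrete, here is a small hypothetical demo (not part of the original slides) that chases pointers through a 64MB buffer, once in sequential order and once along a random cycle. On typical hardware the random walk is many times slower, because nearly every hop misses the caches and pays the full RAM latency shown above.

```c
/* Hypothetical demo, not from the talk: chase pointers through a buffer
 * much larger than the CPU caches, sequentially and in random order. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (64UL * 1024 * 1024 / sizeof(size_t))  /* 64MB of links */

static double seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

static void chase(const size_t *next, const char *label)
{
    double t = seconds();
    size_t pos = 0;
    for (size_t i = 0; i < N; i++)
        pos = next[pos];
    /* printing pos keeps the compiler from optimizing the loop away */
    printf("%s: %.2fs (end=%zu)\n", label, seconds() - t, pos);
}

int main(void)
{
    size_t *next = malloc(N * sizeof(size_t));
    size_t *order = malloc(N * sizeof(size_t));
    if (!next || !order)
        return 1;

    /* Sequential chain: the hardware prefetcher hides most RAM latency. */
    for (size_t i = 0; i < N; i++)
        next[i] = (i + 1) % N;
    chase(next, "sequential");

    /* One random cycle over all elements: almost every hop is a miss. */
    for (size_t i = 0; i < N; i++)
        order[i] = i;
    srand(1);
    for (size_t i = N - 1; i > 0; i--) {   /* Fisher-Yates shuffle */
        size_t j = rand() % (i + 1);
        size_t tmp = order[i]; order[i] = order[j]; order[j] = tmp;
    }
    for (size_t i = 0; i < N; i++)
        next[order[i]] = order[(i + 1) % N];
    chase(next, "random");

    free(next);
    free(order);
    return 0;
}
```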
Hard disk capacity & performance
Availability consequences

- Filesystem checks
  - Fsck has turned from a standard boot procedure into a major inconvenience
  - As disk error rates drop more slowly than disk capacity grows, errors become more likely
  - As disks continue to grow in size while seek times barely drop, fsck times will increase from inconvenience to disaster
- Backups
  - Full backups take too long: reading a full 1TB disk at roughly 100MB/s already takes close to three hours
  - Incremental backups solve that problem
  - But can you afford to wait for a restore from backup?


Hardware miracles

- Solid State Disks
  - Flash SSDs are expected to overtake hard disks in $/GB within 5 years
  - Current access latencies are 10-100x lower than those of hard disks
  - 2+ million rewrite cycles: better longevity than hard disks
  - However, in 5 years' time capacities will also be 10x larger than today, so fsck and backup/restore times could still be bad
- NUMA
  - Can alleviate the memory bottleneck in SMP systems, but only if programs mostly access local memory and CPU cache (see the sketch after this list)
  - Not new, just becoming more widespread
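As an illustration of "mostly access local memory" (an assumed example, not from the talk), this minimal sketch uses libnuma to pin itself to one node and allocate memory from that node, so later accesses never cross the interconnect. numa_run_on_node() and numa_alloc_local() are real libnuma calls; the surrounding program is just for illustration.

```c
/* NUMA-friendly allocation sketch with libnuma (link with -lnuma).
 * Assumed example, not from the talk. */
#include <numa.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "no NUMA support on this system\n");
        return 1;
    }
    /* Run on node 0 and allocate from node 0's memory, so the working
     * set stays local instead of bouncing across the interconnect. */
    numa_run_on_node(0);
    size_t len = 64 * 1024 * 1024;
    char *buf = numa_alloc_local(len);
    if (!buf)
        return 1;
    memset(buf, 0, len);   /* touch the pages; they come from node 0 */
    numa_free(buf, len);
    return 0;
}
```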
Hardware miracles (continued)

- Large CPU caches
  - CPU cache sizes are growing fast, but data sets grow faster
  - Data can only be cached if it is read-only or accessed by just this CPU

Conclusions

- Faster hardware can lead to slower system operations, due to increased capacity
- Hardware miracles will not save us this time
Operating system improvements

Things we can do for you.

- Scheduler improvements
- Lockless kernel synchronization
- Tickless timer & power management
- Memory management improvements
- Filesystem developments
Scheduler improvements

- Lower latency scheduling for real-time needs
- Better CPU affinity and SMP/NUMA balancing (see the affinity sketch after this list)
  - More cache accesses, fewer RAM accesses
  - Keep processes on their own NUMA node when possible
- Power-aware scheduling
  - Move tasks to one CPU core, keep the others in deep sleep
  - With Intel's Dynamic Acceleration Technology, the non-idle core can run faster as a result
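Applications can also cooperate with the scheduler's affinity logic from userspace. The sketch below (an assumed example, not from the talk) pins the calling process to CPU 0 with the Linux sched_setaffinity(2) call, so its working set stays in one CPU's cache.

```c
/* Pin the calling process to CPU 0. Assumed illustration, not from
 * the talk. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(0, &set);                    /* allow CPU 0 only */
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return 1;
    }
    printf("now restricted to CPU 0\n");
    return 0;
}
```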


Lockless kernel synchronization

- Locks require that CPUs exchange data via RAM
  - Exchanging data between CPUs is slow
  - Fine-grained locking can reduce throughput
- Linux uses several lockless synchronization algorithms
  - RCU (Read-Copy Update) and seqlocks
  - Both are best for read-heavy data structures
- Readers do not dirty the lock
  - The cache line with the synchronization info can stay shared between all CPU caches
- Writers notify the readers by writing to the lock
  - Writers do not have to wait for readers to finish
  - (a simplified seqlock sketch follows below)
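To show the seqlock idea in miniature, here is a simplified userspace sketch in C11 atomics (an assumed illustration; the kernel's real seqlock differs). Readers never store to the lock word, so its cache line stays shared between CPUs; the writer makes the sequence count odd while updating, and readers retry if they saw an odd value or the count changed under them.

```c
/* Simplified userspace seqlock sketch. Assumes a single writer (multiple
 * writers would need their own lock); not the kernel's implementation. */
#include <stdatomic.h>
#include <stdio.h>

struct seq_pair {
    atomic_uint seq;      /* odd = write in progress */
    atomic_long a, b;     /* the protected data */
};

static void pair_write(struct seq_pair *p, long a, long b)
{
    unsigned s = atomic_load_explicit(&p->seq, memory_order_relaxed);
    atomic_store_explicit(&p->seq, s + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release); /* seq++ ordered before data */
    atomic_store_explicit(&p->a, a, memory_order_relaxed);
    atomic_store_explicit(&p->b, b, memory_order_relaxed);
    atomic_store_explicit(&p->seq, s + 2, memory_order_release);
}

/* Readers only load; they retry if a writer was active mid-read. */
static void pair_read(struct seq_pair *p, long *a, long *b)
{
    unsigned before, after;
    do {
        before = atomic_load_explicit(&p->seq, memory_order_acquire);
        *a = atomic_load_explicit(&p->a, memory_order_relaxed);
        *b = atomic_load_explicit(&p->b, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
        after = atomic_load_explicit(&p->seq, memory_order_relaxed);
    } while ((before & 1) || before != after);
}

int main(void)
{
    struct seq_pair p = { 0 };
    long a, b;
    pair_write(&p, 1, 2);
    pair_read(&p, &a, &b);
    printf("a=%ld b=%ld\n", a, b);
    return 0;
}
```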
Tickless timer & power management

- Traditionally, Linux used a 100Hz or 1000Hz timer interrupt
  - Uses power
  - Keeps the CPU from going into a deep sleep mode
  - Makes higher precision wakeups difficult
  - Is a performance problem with virtualization
- Instead of a fixed timer tick:
  - Determine when the next timer expires
  - Set the hardware clock to go off at that time (see the one-shot sketch below)
- Longer sleep periods allow the CPU to go into a deeper power saving mode
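The same one-shot idea is visible from userspace. This assumed sketch (not from the talk) uses Linux's timerfd to arm a single wakeup at the next deadline, rather than waking up on every tick of a periodic timer.

```c
/* One-shot wakeup with timerfd: arm the clock for the next deadline
 * instead of taking a periodic tick. Assumed example, not from the talk. */
#include <sys/timerfd.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fd = timerfd_create(CLOCK_MONOTONIC, 0);
    if (fd < 0) { perror("timerfd_create"); return 1; }

    struct itimerspec its = {
        .it_value    = { .tv_sec = 2, .tv_nsec = 0 }, /* next expiry */
        .it_interval = { 0, 0 },                      /* zero = one-shot */
    };
    if (timerfd_settime(fd, 0, &its, NULL) < 0) {
        perror("timerfd_settime");
        return 1;
    }

    uint64_t expirations;
    read(fd, &expirations, sizeof(expirations)); /* sleep until it fires */
    printf("woke up after %llu expiration(s)\n",
           (unsigned long long)expirations);
    close(fd);
    return 0;
}
```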
Memory management improvements

- Lockless page cache
  - File data can be looked up faster, on multiple CPUs simultaneously
  - Improved concurrency is especially important for things like glibc, which get mmapped and faulted in on every exec()
- Split LRU lists
  - At pageout time, only scan pages that are candidates for eviction (see the toy sketch below)
  - Important for systems with many millions of pages
  - Allows different replacement algorithms for the page cache and for process pages
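The payoff is easy to see in miniature. In this toy sketch (an assumed illustration, far simpler than the kernel's code), pages that can never be evicted live on their own list, so reclaim walks only the evictable list instead of every page in the system.

```c
/* Toy illustration of split LRU lists: reclaim scans only evictable
 * pages. Assumed sketch, not kernel code. */
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct page {
    struct page *next;
    bool referenced;       /* accessed since the last scan? */
};

static struct page *evictable_lru;    /* candidates for pageout */
static struct page *unevictable_lru;  /* mlocked etc.: never scanned */

/* Reclaim up to 'want' pages, touching only the evictable list. */
static size_t shrink_evictable(size_t want)
{
    size_t freed = 0;
    struct page **pp = &evictable_lru;
    while (*pp && freed < want) {
        struct page *page = *pp;
        if (page->referenced) {
            page->referenced = false;  /* give it another round */
            pp = &page->next;
        } else {
            *pp = page->next;          /* evict: unlink from the LRU */
            freed++;
        }
    }
    return freed;
}

int main(void)
{
    struct page pages[3] = {
        { &pages[1], true  },
        { &pages[2], false },
        { NULL,      false },
    };
    evictable_lru = &pages[0];
    unevictable_lru = NULL;            /* empty in this toy example */
    printf("freed %zu pages\n", shrink_evictable(2));  /* frees 2 */
    return 0;
}
```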


Filesystem developments

- Capacity
  - 48 or 64 bit block numbers
- Reliability
  - Disk error rates are between 1 in 1TB and 1 in 1000TB
  - Disk sizes have reached 1TB already
  - Metadata checksums can detect errors (see the sketch after this list)
- Availability
  - Errors will be more common on larger filesystems
  - Fsck needs to be fast: repair-driven design
- Performance
  - Smarter metadata layout can reduce disk seeks
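To show the metadata checksum idea, here is a toy sketch with an assumed on-disk structure (real filesystems use their own formats and stronger schemes). The checksum is computed when a block is sealed for writing and verified on read-back, so silent corruption is detected instead of trusted.

```c
/* Toy metadata checksum: detect silent corruption on read-back.
 * The struct layout is assumed for illustration only. */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct meta_block {
    uint64_t block_nr;       /* 64-bit block numbers for large disks */
    uint8_t  payload[4072];
    uint32_t checksum;       /* CRC32 over everything above */
};

static uint32_t crc32(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t crc = 0xFFFFFFFFu;
    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & -(crc & 1u));
    }
    return ~crc;
}

static void meta_seal(struct meta_block *b)
{
    b->checksum = crc32(b, offsetof(struct meta_block, checksum));
}

static bool meta_ok(const struct meta_block *b)
{
    return b->checksum == crc32(b, offsetof(struct meta_block, checksum));
}

int main(void)
{
    struct meta_block b = { .block_nr = 12345 };
    memcpy(b.payload, "important metadata", 19);
    meta_seal(&b);
    b.payload[3] ^= 0x40;   /* simulate a silent bit flip on disk */
    printf("block ok? %s\n", meta_ok(&b) ? "yes" : "no");  /* "no" */
    return 0;
}
```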
Deployment & application changes

Things you will have to do.

- Analyze your performance and capacity needs
- Experiment with new hardware
- Use NUMA/SMP friendly applications
- Virtualization & availability
Analyze your needs

- How much space will your users' programs need?
  - RAM and disk
- How much performance do they expect?
  - What kind of latencies do they need?
- What are the availability requirements?
  - Can a hardware problem stop you from meeting availability goals?
  - How long will a restore from backup take?
- How realistic are the users' requirements?
Experiment with new hardware

- Running on more CPUs
  - Can result in more cache misses
  - Some workloads run slower with more CPUs
  - Especially true when NUMA is involved
- Solid state disks
  - More expensive than hard disks per GB
  - Cheaper than hard disks per I/O operation per second
  - May be cost effective for certain workloads: databases, mail servers, ...
NUMA & SMP friendly applications

- CPUs are fast; communication between CPUs is slow
  - Maximize performance by minimizing communication
- Fine-grained locking increases parallelism, but also increases inter-CPU communication!
  - Worked great in the 1990s, but no more
- Writing to common data structures invalidates cache lines and increases inter-CPU communication
- Write mostly to thread-local data, read mostly from shared data (see the sketch after this list)
- Use NUMA/SMP friendly runtimes (JVM, etc.)
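As a small illustration of "write thread-local, read shared" (an assumed example, not from the talk): each thread below increments its own cache-line-aligned counter, so the hot write path never bounces a cache line between CPUs; the counters are only read together after the threads finish.

```c
/* Threads write only their own cache-line-aligned slot; shared data is
 * read only at the end. Assumed illustration; compile with -pthread. */
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4
#define ITERS    10000000L

/* Pad each counter to its own 64-byte cache line to avoid false sharing. */
struct slot { _Alignas(64) long count; };
static struct slot slots[NTHREADS];

static void *worker(void *arg)
{
    struct slot *mine = arg;
    for (long i = 0; i < ITERS; i++)
        mine->count++;             /* thread-local write: no line bouncing */
    return NULL;
}

int main(void)
{
    pthread_t tid[NTHREADS];
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&tid[i], NULL, worker, &slots[i]);

    long total = 0;
    for (int i = 0; i < NTHREADS; i++) {
        pthread_join(tid[i], NULL);
        total += slots[i].count;   /* read the shared data once, at the end */
    }
    printf("total = %ld\n", total);
    return 0;
}
```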
Virtualization & availability

- How reliable do your systems need to be?
- What do you spend time on when doing recovery?
  - Installing the OS?
  - Configuring applications?
  - Restoring data from backup?
- Virtualization can hide some of that time
  - Guest OS with application lives on network storage
  - Guest OS with application can run elsewhere while you configure new hardware and the host OS
- Use redundant network storage
Conclusions

- Expectation: higher performance, cheaper
- Reality: faster components sometimes lead to slower systems
- Software needs to improve
  - Some things can be fixed at the OS level; Red Hat is working on this
  - Other things can be fixed at the deployment and application levels; you will have to do those
- Analyze your needs and tell Red Hat