
Why Computers Are Getting Slower

(and what we can do about it)

Rik van Riel


Sr. Software Engineer, Red Hat
Why Computers Are Getting Slower

- The traditional approach to better performance
- Why computers are getting slower: no miracle good enough
- Operating system improvements: what Red Hat can do
- Deployment & application improvements: what you can do
- Conclusions
Your performance needs

- More
- More
- More
- More
- More
- Cheaper, too.
The traditional approach to performance

1. Wait for the hardware people to perform miracles.
2. ???
3. Performance!
Why computers are getting slower

- Moore's law
- Pretty graphs of an ugly reality:
  - CPU performance vs. core performance
  - Storage cannot keep up
  - Capacity vs. performance
- Why the upcoming hardware miracles are not enough

Moore's law

- The number of transistors on a chip doubles about every two years
- Moore's law predicts density and complexity, not performance
- Multi-core performance still doubles about every two years, but single-core performance does not
- However, software people have relied on exponentially increasing performance, making Moore's law the cause of, and solution to, every performance problem
- Moore's law will not save us this time
Processor performance
Memory access latencies

Access latencies in CPU cycles:

CPU           L1 cache   L2 cache   RAM   Disk
386           -          -          2     500,000
486           2          -          10    1,800,000
586           2          -          20    1,500,000
Pentium II    2          10         35    2,400,000
Pentium III   2          15         50    6,000,000
Pentium 4     3          25         200   18,000,000
Core 2        3          25         200   24,000,000

Note: old CPUs took multiple cycles to run one instruction, while new CPUs can run multiple instructions per cycle.
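To make the RAM column concrete, here is a small hypothetical demo (not part of the original slides) that chases pointers through a 64MB buffer, once in sequential order and once along a random cycle. On typical hardware the random walk is many times slower, because nearly every hop misses the caches and pays the full RAM latency shown above.

```c
/* Hypothetical demo, not from the talk: chase pointers through a buffer
 * much larger than the CPU caches, sequentially and in random order. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (64UL * 1024 * 1024 / sizeof(size_t))  /* 64MB of links */

static double seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

static void chase(const size_t *next, const char *label)
{
    double t = seconds();
    size_t pos = 0;
    for (size_t i = 0; i < N; i++)
        pos = next[pos];
    /* printing pos keeps the compiler from optimizing the loop away */
    printf("%s: %.2fs (end=%zu)\n", label, seconds() - t, pos);
}

int main(void)
{
    size_t *next = malloc(N * sizeof(size_t));
    size_t *order = malloc(N * sizeof(size_t));
    if (!next || !order)
        return 1;

    /* Sequential chain: the hardware prefetcher hides most RAM latency. */
    for (size_t i = 0; i < N; i++)
        next[i] = (i + 1) % N;
    chase(next, "sequential");

    /* One random cycle over all elements: almost every hop is a miss. */
    for (size_t i = 0; i < N; i++)
        order[i] = i;
    srand(1);
    for (size_t i = N - 1; i > 0; i--) {   /* Fisher-Yates shuffle */
        size_t j = rand() % (i + 1);
        size_t tmp = order[i]; order[i] = order[j]; order[j] = tmp;
    }
    for (size_t i = 0; i < N; i++)
        next[order[i]] = order[(i + 1) % N];
    chase(next, "random");

    free(next);
    free(order);
    return 0;
}
```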
Hard disk capacity & performance
Availability consequences

- Filesystem checks
  - Fsck has turned from a standard boot procedure into a major inconvenience
  - As disk error rates drop more slowly than disk capacity grows, errors become more likely
  - As disks continue to grow in size while seek times barely drop, fsck times will increase from inconvenience to disaster
- Backups
  - Full backups take too long: reading a full 1TB disk at roughly 100MB/s already takes close to three hours
  - Incremental backups solve that problem
  - But can you afford to wait for a restore from backup?


Hardware miracles

- Solid State Disks
  - Flash SSDs are expected to overtake hard disks in $/GB within 5 years
  - Current access latencies are 10-100x lower than those of hard disks
  - 2+ million rewrite cycles: better longevity than hard disks
  - However, in 5 years' time capacities will also be 10x larger than today, so fsck and backup/restore times could still be bad
- NUMA
  - Can alleviate the memory bottleneck in SMP systems, but only if programs mostly access local memory and CPU cache (see the sketch after this list)
  - Not new, just becoming more widespread
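As an illustration of "mostly access local memory" (an assumed example, not from the talk), this minimal sketch uses libnuma to pin itself to one node and allocate memory from that node, so later accesses never cross the interconnect. numa_run_on_node() and numa_alloc_local() are real libnuma calls; the surrounding program is just for illustration.

```c
/* NUMA-friendly allocation sketch with libnuma (link with -lnuma).
 * Assumed example, not from the talk. */
#include <numa.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "no NUMA support on this system\n");
        return 1;
    }
    /* Run on node 0 and allocate from node 0's memory, so the working
     * set stays local instead of bouncing across the interconnect. */
    numa_run_on_node(0);
    size_t len = 64 * 1024 * 1024;
    char *buf = numa_alloc_local(len);
    if (!buf)
        return 1;
    memset(buf, 0, len);   /* touch the pages; they come from node 0 */
    numa_free(buf, len);
    return 0;
}
```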
Hardware miracles (continued)

- Large CPU caches
  - CPU cache sizes are growing fast, but data sets grow faster
  - Data can only be cached if it is read-only or accessed by just this CPU

Conclusions

- Faster hardware can lead to slower system operations, due to increased capacity
- Hardware miracles will not save us this time
Operating system improvements

Things we can do for you.

- Scheduler improvements
- Lockless kernel synchronization
- Tickless timer & power management
- Memory management improvements
- Filesystem developments
Scheduler improvements

- Lower latency scheduling for real-time needs
- Better CPU affinity and SMP/NUMA balancing (see the affinity sketch after this list)
  - More cache accesses, fewer RAM accesses
  - Keep processes on their own NUMA node when possible
- Power-aware scheduling
  - Move tasks to one CPU core, keep the others in deep sleep
  - With Intel's Dynamic Acceleration Technology, the non-idle core can run faster as a result
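Applications can also cooperate with the scheduler's affinity logic from userspace. The sketch below (an assumed example, not from the talk) pins the calling process to CPU 0 with the Linux sched_setaffinity(2) call, so its working set stays in one CPU's cache.

```c
/* Pin the calling process to CPU 0. Assumed illustration, not from
 * the talk. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(0, &set);                    /* allow CPU 0 only */
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return 1;
    }
    printf("now restricted to CPU 0\n");
    return 0;
}
```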


Lockless kernel synchronization

- Locks require that CPUs exchange data via RAM
  - Exchanging data between CPUs is slow
  - Fine-grained locking can reduce throughput
- Linux uses several lockless synchronization algorithms
  - RCU (Read-Copy Update) and seqlocks
  - Both are best for read-heavy data structures
- Readers do not dirty the lock
  - The cache line with the synchronization info can stay shared between all CPU caches
- Writers notify the readers by writing to the lock
  - Writers do not have to wait for readers to finish
  - (a simplified seqlock sketch follows below)
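To show the seqlock idea in miniature, here is a simplified userspace sketch in C11 atomics (an assumed illustration; the kernel's real seqlock differs). Readers never store to the lock word, so its cache line stays shared between CPUs; the writer makes the sequence count odd while updating, and readers retry if they saw an odd value or the count changed under them.

```c
/* Simplified userspace seqlock sketch. Assumes a single writer (multiple
 * writers would need their own lock); not the kernel's implementation. */
#include <stdatomic.h>
#include <stdio.h>

struct seq_pair {
    atomic_uint seq;      /* odd = write in progress */
    atomic_long a, b;     /* the protected data */
};

static void pair_write(struct seq_pair *p, long a, long b)
{
    unsigned s = atomic_load_explicit(&p->seq, memory_order_relaxed);
    atomic_store_explicit(&p->seq, s + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release); /* seq++ ordered before data */
    atomic_store_explicit(&p->a, a, memory_order_relaxed);
    atomic_store_explicit(&p->b, b, memory_order_relaxed);
    atomic_store_explicit(&p->seq, s + 2, memory_order_release);
}

/* Readers only load; they retry if a writer was active mid-read. */
static void pair_read(struct seq_pair *p, long *a, long *b)
{
    unsigned before, after;
    do {
        before = atomic_load_explicit(&p->seq, memory_order_acquire);
        *a = atomic_load_explicit(&p->a, memory_order_relaxed);
        *b = atomic_load_explicit(&p->b, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
        after = atomic_load_explicit(&p->seq, memory_order_relaxed);
    } while ((before & 1) || before != after);
}

int main(void)
{
    struct seq_pair p = { 0 };
    long a, b;
    pair_write(&p, 1, 2);
    pair_read(&p, &a, &b);
    printf("a=%ld b=%ld\n", a, b);
    return 0;
}
```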
Tickless timer & power management

- Traditionally, Linux used a 100Hz or 1000Hz timer interrupt
  - Uses power
  - Keeps the CPU from going into a deep sleep mode
  - Makes higher precision wakeups difficult
  - Is a performance problem with virtualization
- Instead of a fixed timer tick:
  - Determine when the next timer expires
  - Set the hardware clock to go off at that time (see the one-shot sketch below)
- Longer sleep periods allow the CPU to go into a deeper power saving mode
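The same one-shot idea is visible from userspace. This assumed sketch (not from the talk) uses Linux's timerfd to arm a single wakeup at the next deadline, rather than waking up on every tick of a periodic timer.

```c
/* One-shot wakeup with timerfd: arm the clock for the next deadline
 * instead of taking a periodic tick. Assumed example, not from the talk. */
#include <sys/timerfd.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fd = timerfd_create(CLOCK_MONOTONIC, 0);
    if (fd < 0) { perror("timerfd_create"); return 1; }

    struct itimerspec its = {
        .it_value    = { .tv_sec = 2, .tv_nsec = 0 }, /* next expiry */
        .it_interval = { 0, 0 },                      /* zero = one-shot */
    };
    if (timerfd_settime(fd, 0, &its, NULL) < 0) {
        perror("timerfd_settime");
        return 1;
    }

    uint64_t expirations;
    read(fd, &expirations, sizeof(expirations)); /* sleep until it fires */
    printf("woke up after %llu expiration(s)\n",
           (unsigned long long)expirations);
    close(fd);
    return 0;
}
```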
Memory management improvements

- Lockless page cache
  - File data can be looked up faster, on multiple CPUs simultaneously
  - Improved concurrency is especially important for things like glibc, which get mmapped and faulted in on every exec()
- Split LRU lists
  - At pageout time, only scan pages that are candidates for eviction (see the toy sketch below)
  - Important for systems with many millions of pages
  - Allows different replacement algorithms for the page cache and for process pages
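The payoff is easy to see in miniature. In this toy sketch (an assumed illustration, far simpler than the kernel's code), pages that can never be evicted live on their own list, so reclaim walks only the evictable list instead of every page in the system.

```c
/* Toy illustration of split LRU lists: reclaim scans only evictable
 * pages. Assumed sketch, not kernel code. */
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct page {
    struct page *next;
    bool referenced;       /* accessed since the last scan? */
};

static struct page *evictable_lru;    /* candidates for pageout */
static struct page *unevictable_lru;  /* mlocked etc.: never scanned */

/* Reclaim up to 'want' pages, touching only the evictable list. */
static size_t shrink_evictable(size_t want)
{
    size_t freed = 0;
    struct page **pp = &evictable_lru;
    while (*pp && freed < want) {
        struct page *page = *pp;
        if (page->referenced) {
            page->referenced = false;  /* give it another round */
            pp = &page->next;
        } else {
            *pp = page->next;          /* evict: unlink from the LRU */
            freed++;
        }
    }
    return freed;
}

int main(void)
{
    struct page pages[3] = {
        { &pages[1], true  },
        { &pages[2], false },
        { NULL,      false },
    };
    evictable_lru = &pages[0];
    unevictable_lru = NULL;            /* empty in this toy example */
    printf("freed %zu pages\n", shrink_evictable(2));  /* frees 2 */
    return 0;
}
```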


Filesystem developments

- Capacity
  - 48 or 64 bit block numbers
- Reliability
  - Disk error rates are between 1 in 1TB and 1 in 1000TB
  - Disk sizes have reached 1TB already
  - Metadata checksums can detect errors (see the sketch after this list)
- Availability
  - Errors will be more common on larger filesystems
  - Fsck needs to be fast: repair-driven design
- Performance
  - Smarter metadata layout can reduce disk seeks
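To show the metadata checksum idea, here is a toy sketch with an assumed on-disk structure (real filesystems use their own formats and stronger schemes). The checksum is computed when a block is sealed for writing and verified on read-back, so silent corruption is detected instead of trusted.

```c
/* Toy metadata checksum: detect silent corruption on read-back.
 * The struct layout is assumed for illustration only. */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct meta_block {
    uint64_t block_nr;       /* 64-bit block numbers for large disks */
    uint8_t  payload[4072];
    uint32_t checksum;       /* CRC32 over everything above */
};

static uint32_t crc32(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t crc = 0xFFFFFFFFu;
    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & -(crc & 1u));
    }
    return ~crc;
}

static void meta_seal(struct meta_block *b)
{
    b->checksum = crc32(b, offsetof(struct meta_block, checksum));
}

static bool meta_ok(const struct meta_block *b)
{
    return b->checksum == crc32(b, offsetof(struct meta_block, checksum));
}

int main(void)
{
    struct meta_block b = { .block_nr = 12345 };
    memcpy(b.payload, "important metadata", 19);
    meta_seal(&b);
    b.payload[3] ^= 0x40;   /* simulate a silent bit flip on disk */
    printf("block ok? %s\n", meta_ok(&b) ? "yes" : "no");  /* "no" */
    return 0;
}
```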
Deployment & application changes

Things you will have to do.

- Analyze your performance and capacity needs
- Experiment with new hardware
- Use NUMA/SMP friendly applications
- Virtualization & availability
Analyze your needs

- How much space will your users' programs need?
  - RAM and disk
- How much performance do they expect?
  - What kind of latencies do they need?
- What are the availability requirements?
  - Can a hardware problem stop you from meeting availability goals?
  - How long will a restore from backup take?
- How realistic are the users' requirements?
Experiment with new hardware

- Running on more CPUs
  - Can result in more cache misses
  - Some workloads run slower with more CPUs
  - Especially true when NUMA is involved
- Solid state disks
  - More expensive than hard disks per GB
  - Cheaper than hard disks per I/O operation per second
  - May be cost effective for certain workloads: databases, mail servers, ...
NUMA & SMP friendly applications

- CPUs are fast; communication between CPUs is slow
  - Maximize performance by minimizing communication
- Fine-grained locking increases parallelism, but also increases inter-CPU communication!
  - Worked great in the 1990s, but no more
- Writing to common data structures invalidates cache lines and increases inter-CPU communication
- Write mostly to thread-local data, read mostly from shared data (see the sketch after this list)
- Use NUMA/SMP friendly runtimes (JVM, etc.)
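As a small illustration of "write thread-local, read shared" (an assumed example, not from the talk): each thread below increments its own cache-line-aligned counter, so the hot write path never bounces a cache line between CPUs; the counters are only read together after the threads finish.

```c
/* Threads write only their own cache-line-aligned slot; shared data is
 * read only at the end. Assumed illustration; compile with -pthread. */
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4
#define ITERS    10000000L

/* Pad each counter to its own 64-byte cache line to avoid false sharing. */
struct slot { _Alignas(64) long count; };
static struct slot slots[NTHREADS];

static void *worker(void *arg)
{
    struct slot *mine = arg;
    for (long i = 0; i < ITERS; i++)
        mine->count++;             /* thread-local write: no line bouncing */
    return NULL;
}

int main(void)
{
    pthread_t tid[NTHREADS];
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&tid[i], NULL, worker, &slots[i]);

    long total = 0;
    for (int i = 0; i < NTHREADS; i++) {
        pthread_join(tid[i], NULL);
        total += slots[i].count;   /* read the shared data once, at the end */
    }
    printf("total = %ld\n", total);
    return 0;
}
```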
Virtualization & availability

- How reliable do your systems need to be?
- What do you spend time on when doing recovery?
  - Installing the OS?
  - Configuring applications?
  - Restoring data from backup?
- Virtualization can hide some of that time
  - Guest OS with application lives on network storage
  - Guest OS with application can run elsewhere while you configure new hardware and the host OS
- Use redundant network storage
Conclusions

- Expectation: higher performance, cheaper
- Reality: faster components sometimes lead to slower systems
- Software needs to improve
  - Some things can be fixed at the OS level; Red Hat is working on this
  - Other things can be fixed at the deployment and application levels; you will have to do those
- Analyze your needs and tell Red Hat