Sfink/Performance Thoughts

I'm going to go a little crazy with taxonomies.

What Kind of Performance

  • Latency - how long do you have to wait between the time you initiate an action and the time some detectable response appears?
    • This is all the user really cares about, but it's not always the right thing to look at as a developer, since it is the end result of lots of other things that may be affected by multiple variables.
    • Latency of the complete response is one thing, but in reality some things are going to take some time, and so the latency to a visible progress indicator may be more important. Depends on the situation.
    • Variability of latency can be important too. It's critically important if the output depends on it, e.g. watching an animation of some sort. But it also interferes with learning: "I click on this button to make it do this... oh wait, it didn't work, did I forget something? Let me see if I need to -- oh, there it is. Odd, it normally doesn't take that long." (There's a small timing sketch after this list.)
  • Memory usage - why is using memory bad?
    • latency is going to get very bad once you hit a certain "problem size"
    • you can't have as many other things active at the same time
    • the rest of the system gets sluggish as a side effect
    • you'll eventually crash the browser or some other application
  • Storage space - normally a far lesser concern with browsers.
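
For concreteness, here's a minimal sketch of measuring latency and its variability, in C. do_action() is a hypothetical stand-in for whatever operation you're actually measuring:

  #include <stdio.h>
  #include <time.h>

  /* Hypothetical stand-in for the operation being measured. */
  static void do_action(void) {
      struct timespec ts = {0, 5 * 1000 * 1000};  /* pretend it takes ~5 ms */
      nanosleep(&ts, NULL);
  }

  static double now_ms(void) {
      struct timespec ts;
      clock_gettime(CLOCK_MONOTONIC, &ts);
      return ts.tv_sec * 1000.0 + ts.tv_nsec / 1e6;
  }

  int main(void) {
      double min = 1e9, max = 0;
      for (int i = 0; i < 100; i++) {
          double start = now_ms();
          do_action();
          double elapsed = now_ms() - start;
          if (elapsed < min) min = elapsed;
          if (elapsed > max) max = elapsed;
      }
      /* min is the best-case latency; the max-min spread is the
         variability that interferes with learning. */
      printf("latency: min %.2f ms, max %.2f ms\n", min, max);
      return 0;
  }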

Where Does Time Go?

I/O

I'm using I/O very generally, inclusive of disk, network, and memory.

You can kind of walk up the cache hierarchy, though it isn't really strictly a hierarchy.

At each level, you'll usually have asynchronous and synchronous behavior.

  • synchronous: stuff you have to wait for. Reads are usually synchronous. (Exceptions: readahead, or when you have another thread you can switch to. If you get close to actual devices, DMA can be a form of asynchronous read.)
  • asynchronous: stuff you don't have to wait for. Writes are often asynchronous. (There are many more exceptions here than for reads being synchronous.) A small sketch contrasting write() and fsync() follows this list.
    • Asynchronous requests don't matter, until they do: when too much asynchronous data is outstanding, it starts blocking and becoming synchronous. Asynchronous requests can also slow down or block synchronous ones earlier. Often this is because of dependencies between the requests. Those dependencies may or may not be fundamental -- they might just be a driver limitation or a simplification in the logic of whatever is handling the resource. (A memory read might block on an earlier write because it's hard to be certain they don't alias the same address space. Or a write may trigger a read to fill in the rest of a cache line.)
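
The write side of this is easy to see on a POSIX system: write() usually just drops the data into the OS page cache and returns (asynchronous), while fsync() makes you wait for the device (synchronous). A minimal sketch, with a scratch file name of my choosing:

  #include <fcntl.h>
  #include <stdio.h>
  #include <time.h>
  #include <unistd.h>

  static double now_ms(void) {
      struct timespec ts;
      clock_gettime(CLOCK_MONOTONIC, &ts);
      return ts.tv_sec * 1000.0 + ts.tv_nsec / 1e6;
  }

  int main(void) {
      char buf[4096] = {0};
      int fd = open("scratch.dat", O_WRONLY | O_CREAT | O_TRUNC, 0644);
      if (fd < 0) { perror("open"); return 1; }

      double t0 = now_ms();
      if (write(fd, buf, sizeof(buf)) < 0)  /* lands in the page cache */
          perror("write");
      double t1 = now_ms();
      fsync(fd);                            /* blocks until it's durable */
      double t2 = now_ms();

      printf("write(): %.3f ms, fsync(): %.3f ms\n", t1 - t0, t2 - t1);
      close(fd);
      return 0;
  }

The write() typically comes back in microseconds; the fsync() is where the actual disk latency shows up.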

Most of these levels also have two kinds of access, slow and fast. (Often mapping to seeks vs sequential reads.)

All of them have weird exceptions to the simple taxonomy. (Disks have memory caches. Cache line aliasing has weird effects. Etc.)

Cache hierarchy, roughly ordered from most expensive to least:

  • Network I/O
    • Latency cost varies widely with the server and the phase of the moon.
  • Disk I/O
    • Writes can be asynchronous unless you need to explicitly synchronize for durability (in the ACID sense). But the OS will tend to flush them every so often even if it doesn't strictly need to, and those flushes can block reads.
    • Sequential I/O is fast; random I/O is slow. Somewhat less true with an SSD, but I don't know much about those. The difference is significant enough that it's probably worthwhile to track this with two different metrics: initial block reads and total disk bandwidth. (There's a sketch of the difference after this list.)
  • Main RAM. In my myopic view, RAM is RAM. Mobile devices may break this with different types of memory (e.g. volatile vs non-volatile memory can run at different speeds). But if you stick your thumbs in your ears and waggle your fingers, you can ignore that.
  • TLB. Waggle faster. (Not often worth worrying about for non-specialized workloads.)
  • L3 then L2 then L1 caches.
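
Here's a rough sketch of the sequential-vs-random difference using pread(). bigfile.dat is a placeholder for some existing large file; note that if the file is already sitting in the page cache you'll measure memory speeds, not disk speeds, so it needs to be bigger than RAM (or evicted first) for the seek costs to show:

  #include <fcntl.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <time.h>
  #include <unistd.h>

  #define BLOCK 4096
  #define NREADS 1000

  static double now_ms(void) {
      struct timespec ts;
      clock_gettime(CLOCK_MONOTONIC, &ts);
      return ts.tv_sec * 1000.0 + ts.tv_nsec / 1e6;
  }

  int main(void) {
      char buf[BLOCK];
      int fd = open("bigfile.dat", O_RDONLY);  /* placeholder name */
      if (fd < 0) { perror("open"); return 1; }
      off_t nblocks = lseek(fd, 0, SEEK_END) / BLOCK;  /* assumes >= NREADS blocks */

      double t0 = now_ms();
      for (int i = 0; i < NREADS; i++)  /* sequential: readahead-friendly */
          pread(fd, buf, BLOCK, (off_t)i * BLOCK);
      double t1 = now_ms();
      srand(42);
      for (int i = 0; i < NREADS; i++)  /* random: roughly one seek each */
          pread(fd, buf, BLOCK, (rand() % nblocks) * BLOCK);
      double t2 = now_ms();

      printf("sequential: %.1f ms, random: %.1f ms\n", t1 - t0, t2 - t1);
      close(fd);
      return 0;
  }

On a spinning disk the random loop is typically an order of magnitude or more slower; on an SSD the gap shrinks a lot, which is part of why the two metrics (initial block reads, total bandwidth) are worth tracking separately.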

This is a cache hierarchy, so for the most part the later layers will fall back to the earlier when their capacity is exceeded. And writes often go straight through to a slower layer.

CPU

CPUs are fast. Even on mobile devices, they're pretty fast. CPUs rarely consume large chunks of time just crunching through basic math operations. The time normally disappears into loading and saving data to and from memory, which counts as I/O.

Except that many measurement tools describe I/O in terms of CPU clock ticks. An L1 cache miss, for example, is normally measured in clock ticks. So you can think of it as CPU time if you like. (Clock ticks map fairly well to actual time, although you may need to adjust for occasional frequency scaling or whatever.)
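
To see how "CPU time" is often really memory time, here's a sketch that does the same number of additions two ways. The second version walks the array at a cache-line stride, so nearly every access has to go out past L1; the arithmetic is identical, and the time difference is all memory hierarchy (exact numbers depend on the machine; 64 is the usual x86 cache line size):

  #include <stdio.h>
  #include <stdlib.h>
  #include <time.h>

  #define N (64L * 1024 * 1024)  /* 64 MB, comfortably bigger than L3 */

  static double now_ms(void) {
      struct timespec ts;
      clock_gettime(CLOCK_MONOTONIC, &ts);
      return ts.tv_sec * 1000.0 + ts.tv_nsec / 1e6;
  }

  int main(void) {
      unsigned char *a = malloc(N);
      if (!a) return 1;
      long sum = 0;
      for (long i = 0; i < N; i++) a[i] = 1;  /* fault in all the pages */

      double t0 = now_ms();
      for (long i = 0; i < N; i++)  /* sequential: 64 adds per line fetched */
          sum += a[i];
      double t1 = now_ms();
      for (long off = 0; off < 64; off++)     /* same N adds in total, but */
          for (long i = off; i < N; i += 64)  /* each pass refetches every line */
              sum += a[i];
      double t2 = now_ms();

      printf("sequential: %.1f ms, strided: %.1f ms (sum=%ld)\n",
             t1 - t0, t2 - t1, sum);
      free(a);
      return 0;
  }

Both loops do N additions, but the strided one drags 64x the data through the caches, and that traffic, not the adds, is where the time goes.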

More importantly, tools draw a distinction between "I/O wait time" and "CPU time". This distinction is mostly real: during an I/O wait, your process is scheduled out and not running. CPU time includes time when the CPU is twiddling its thumbs waiting for a cache miss to be resolved; the CPU isn't going anywhere, and it'll keep running your process immediately after the needed data gets loaded in. (Exception: with SMT, aka hyperthreading, it may be able to very very quickly go do work for someone else but be back in time to deal with your data before you've even noticed it was gone.)
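
A sketch of the distinction using the POSIX clocks directly: CLOCK_PROCESS_CPUTIME_ID only advances while your process is actually on the CPU (cache-miss stalls included), while CLOCK_MONOTONIC keeps ticking while you're scheduled out. The sleep() below stands in for a blocking read:

  #include <stdio.h>
  #include <time.h>
  #include <unistd.h>

  static double clock_ms(clockid_t id) {
      struct timespec ts;
      clock_gettime(id, &ts);
      return ts.tv_sec * 1000.0 + ts.tv_nsec / 1e6;
  }

  int main(void) {
      double wall0 = clock_ms(CLOCK_MONOTONIC);
      double cpu0  = clock_ms(CLOCK_PROCESS_CPUTIME_ID);

      sleep(1);  /* stand-in for a blocking read: scheduled out, no CPU time */
      volatile long x = 0;
      for (long i = 0; i < 200 * 1000 * 1000L; i++) x += i;  /* real CPU work */

      /* wall time includes the sleep; CPU time only counts the loop */
      printf("wall: %.0f ms, cpu: %.0f ms\n",
             clock_ms(CLOCK_MONOTONIC) - wall0,
             clock_ms(CLOCK_PROCESS_CPUTIME_ID) - cpu0);
      return 0;
  }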