Subject: Support for zero'ing pages in idle loop
To: None <current-users@netbsd.org>
From: Jason R Thorpe <thorpej@zembu.com>
List: current-users
Date: 04/24/2000 10:27:54

Hi folks...

I've just committed code that implements pre-zero'ing of pages in
the idle loop. This helps zero-fill page faults a fair bit, and
also speeds up e.g. page table allocation. Some lmbench results
before and after, on a 300MHz Celeron w/ 32M of RAM:

Processor, Processes - times in microseconds - smaller is better
----------------------------------------------------------------
Host      OS             Mhz null null      open selct  sig  sig fork exec   sh
                             call  I/O stat clos       inst hndl proc proc proc
--------- ------------- ---- ---- ---- ---- ---- ----- ---- ---- ---- ---- ----
i386-netb NetBSD 1.4X    298  1.2  5.1   15   24 0.05K  2.7    4 1.1K   4K   8K
i386-netb NetBSD 1.4X    298  1.2  5.2   16   23 0.05K  2.7    4 0.8K   4K   7K

File & VM system latencies in microseconds - smaller is better
--------------------------------------------------------------
Host      OS            0K File       10K File      Mmap    Prot  Page
                        Create Delete Create Delete Latency Fault Fault
--------- ------------- ------ ------ ------ ------ ------- ----- -----
i386-netb NetBSD 1.4X                                  7298        9.1K
i386-netb NetBSD 1.4X                                  7098        7.7K

Memory latencies in nanoseconds - smaller is better
    (WARNING - may not be correct, check graphs)
---------------------------------------------------
Host      OS            Mhz L1 $ L2 $ Main mem Guesses
--------- ------------- --- ---- ---- -------- -------
i386-netb NetBSD 1.4X   298   10   36      202
i386-netb NetBSD 1.4X   298   10   36      200

Note the L2 latency didn't change -- the i386 implementation does the
access uncached. Compare this with doing it cached:

i386-netb NetBSD 1.4X   298   10   87      200

"Ouch." For this reason, portmasters who glue this into their port's
idle loop should provide an uncached method like the i386 port does.
If that is not possible, it is probably best to just not do it at all
until we can come up with a way of doing this cached that keeps the
cache footprint small.

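To illustrate the cache-footprint concern, here is a minimal userland
sketch of zeroing a page with non-temporal stores, which hint to the CPU
that the written lines need not be kept in the cache. This is NOT what
the i386 port does (it does the access uncached); it is a swapped-in
technique shown only as an assumption-laden example. It assumes an x86
CPU with SSE2 and falls back to a plain memset() elsewhere. The names
zero_page_nocache and ZP_PAGE_SIZE are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>	/* _mm_setzero_si128, _mm_stream_si128, _mm_sfence */
#endif

#define ZP_PAGE_SIZE 4096	/* assumed page size for this sketch */

/*
 * Zero one page while trying to keep it out of the data cache.
 * 'page' must be 16-byte aligned (any page-aligned address is).
 * Hypothetical helper, not a NetBSD kernel interface.
 */
void
zero_page_nocache(void *page)
{
	assert(((uintptr_t)page & 15) == 0);
#if defined(__SSE2__)
	__m128i zero = _mm_setzero_si128();
	__m128i *p = (__m128i *)page;
	size_t i;

	for (i = 0; i < ZP_PAGE_SIZE / sizeof(__m128i); i++)
		_mm_stream_si128(&p[i], zero);	/* non-temporal 16-byte store */
	_mm_sfence();	/* order the streamed stores before later stores */
#else
	/* Fallback: ordinary cached stores; these do disturb the cache. */
	memset(page, 0, ZP_PAGE_SIZE);
#endif
}
```

Whether this actually beats a cached zero on a given CPU is something
that would have to be measured, e.g. with the same lmbench runs as above.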
In other words, consider this a work-in-progress, with some incremental
benefit along the way :-)

Note that lmbench really, really, really stresses memory allocators
and doesn't leave the system with much idle time. For this reason, the
numbers might not look that impressive: there's a large miss rate for
zero-fill allocations, even though there is some improvement in overall
performance. The hit rate is much better for "normal" system activity
(e.g. running X, netscape, compiling stuff, etc.), and I've observed the
netscape-startup-time benchmark improve by 50-60% on the same system I
ran lmbench on.

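The hit/miss behaviour described above can be sketched with a toy
userland model. This is not the committed kernel code; every name here
(sim_pool, sim_idle_zero_one, sim_alloc_zeroed, ...) is hypothetical.
The idle hook zeroes at most one free page per call, so a caller could
check for runnable work between pages. The allocator prefers a
pre-zeroed page (a hit) and otherwise zeroes on demand (a miss).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SIM_PAGE_SIZE	4096	/* assumed page size */
#define SIM_NPAGES	8	/* tiny pool, just for illustration */

struct sim_page {
	unsigned char data[SIM_PAGE_SIZE];
	bool free;	/* page is on the free list */
	bool zeroed;	/* page is known to contain only zeros */
};

struct sim_pool {
	struct sim_page pages[SIM_NPAGES];
	unsigned zero_hits;	/* requests served by a pre-zeroed page */
	unsigned zero_misses;	/* requests that had to zero on demand */
};

/* Start with every page free and full of garbage. */
void
sim_pool_init(struct sim_pool *pool)
{
	size_t i;

	for (i = 0; i < SIM_NPAGES; i++) {
		memset(pool->pages[i].data, 0xAA, SIM_PAGE_SIZE);
		pool->pages[i].free = true;
		pool->pages[i].zeroed = false;
	}
	pool->zero_hits = pool->zero_misses = 0;
}

/* Idle-loop hook: zero at most one free, not-yet-zeroed page. */
bool
sim_idle_zero_one(struct sim_pool *pool)
{
	size_t i;

	for (i = 0; i < SIM_NPAGES; i++) {
		struct sim_page *pg = &pool->pages[i];

		if (pg->free && !pg->zeroed) {
			memset(pg->data, 0, SIM_PAGE_SIZE);
			pg->zeroed = true;
			return true;	/* did some work */
		}
	}
	return false;			/* nothing left to zero */
}

/* Allocate a zero-filled page, preferring one zeroed while idle. */
struct sim_page *
sim_alloc_zeroed(struct sim_pool *pool)
{
	struct sim_page *fallback = NULL;
	size_t i;

	for (i = 0; i < SIM_NPAGES; i++) {
		struct sim_page *pg = &pool->pages[i];

		if (!pg->free)
			continue;
		if (pg->zeroed) {	/* hit: no zeroing needed now */
			pool->zero_hits++;
			pg->free = pg->zeroed = false;
			return pg;
		}
		if (fallback == NULL)
			fallback = pg;
	}
	if (fallback == NULL)
		return NULL;		/* out of pages */
	memset(fallback->data, 0, SIM_PAGE_SIZE);	/* miss: zero on demand */
	pool->zero_misses++;
	fallback->free = false;
	return fallback;
}

/* Return a page to the pool; its contents are now unknown. */
void
sim_free(struct sim_pool *pool, struct sim_page *pg)
{
	(void)pool;
	pg->free = true;
	pg->zeroed = false;
}
```

A workload that allocates back to back without ever letting
sim_idle_zero_one() run only records misses, which is roughly the
situation lmbench creates. Interleaving idle calls between allocations
turns those into hits.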
Enjoy!
--
-- Jason R. Thorpe <thorpej@zembu.com>