Linux/Compiler Options: Difference between revisions

From MozillaWiki
(add a note about how to use --enable-optimize)
m (→‎Overview: add links to cross-compliting and 32-on-64-bit-linux)
 
(7 intermediate revisions by one other user not shown)
Line 1: Line 1:
=Overview=
=Overview=
* For cross-compiling look at [http://developer.mozilla.org/en/docs/Cross-Compiling_Mozilla]
* For compiling a 32-bit Firefox on a 64-bit Linux look at [http://developer.mozilla.org/en/docs/Compiling_32-bit_Firefox_on_a_Linux_64-bit_OS].


==Proper use of compiler options==
==Proper use of compiler options==


'''Don't set a default optimization level for the entire browser build.'''
'''For Firefox 3 builds, please use --enable-optimize without flags.'''
 
Our testing has shown that different parts of Mozilla run faster at different optimization levels.  For example, cairo, pixman and sqlite are compiled at -O2 because they are fastest at that level while the JS engine is fastest at -Os. [https://bugzilla.mozilla.org/show_bug.cgi?id=409803#c9]  Don't use --enable-optimize as a place to pass in random compile flags.  That's a global setting that sets optimization levels [http://lxr.mozilla.org/seamonkey/search?string=MODULE_OPTIMIZE_FLAGS throughout the source tree] and is different depending on the module being compiled.
 
If you still need to pass in other non-optimization flags to the compile, please use CFLAGS and CXXFLAGS instead of passing them to --enable-optimize.
 
'''For Firefox 2 builds, you probably want to set a default optimization level.'''
 
The default optimization level on the 1.8 branch (a.k.a Firefox 2) is -O3 which is too aggressive and trades off a lot of space for not much speed.  So you probably want to use --enable-optimize="..." for this release.
 
If you're using gcc 4.1.x you should use -O2 to make things go as fast as possible.  This will result in about a 2MB code size hit.  If you want to avoid that code size hit you can specify "-Os -finline-limit=100" which gives back most of the performance without too much code size growth.  See the notes below.


Different parts of Mozilla run faster at different optimization levels. For example, cairo, pixman and sqlite are compiled at -O2 because they are fastest at that level while the JS engine is fastest at -Os. [https://bugzilla.mozilla.org/show_bug.cgi?id=409803#c9]  If you want to use --enable-optimize, don't add extra optimization flags there. That's a global setting that sets optimization levels [http://lxr.mozilla.org/seamonkey/search?string=MODULE_OPTIMIZE_FLAGS throughout the source tree].  Instead pass non-optimization flags that you care about via CFLAGS and CXXFLAGS during the build.
For gcc 4.3.x you can use -O2 for your builds.  The size hit is smaller because of visibility changes in that release of the compiler.
 
'''If you want to change optimization levels, please do it per-module.'''
 
As we discovered, the best optimization settings are per-module. If your testing shows that changes to a particular module improve performance please let us know by filing a bug against Firefox/Build Config and we can evaluate it and get it into the tree.


=Compilers=
=Compilers=


<table class="fullwidth-table">
<tr>
<th>Compiler + Options</th>
<th>Notes</th>
</tr>


<tr>
Notes from dwitte on gcc 4.3 vs. 4.1.2. [https://bugzilla.mozilla.org/show_bug.cgi?id=409803#c17]  Also see the [https://bugzilla.mozilla.org/show_bug.cgi?id=409803#c0 original post] about possible ways to make gcc 4.1.2 faster as well by using -Os and -finline-limit.
<td>gcc 4.1.2 (-Os -freorder-blocks -fno-reorder-functions)</td>
 
===gcc 4.1.2 notes===
 
<pre>
it turns out that gcc 4.1.2 on linux, at our default optimization setting "-Os
-freorder-blocks -fno-reorder-functions", avoids inlining even trivial
functions (where the cost of doing so is less than even the fncall overhead).
this is bad news for things like nsTArray, nsCOMPtr etc, which can result in
many layers of wrapper calls if not inlined sensibly.


<td>
gcc has an option to control inlining, "-finline-limit=n", which will (roughly)
inline functions up to length n pseudo-instructions. to give some sense for
numbers, the default value of n at -O2 is 600. i ran some tests and found that
with our current settings and -finline-limit=50 on a 32-bit linux build, which
is enough to inline trivial (one or two line) wrapper methods but no more, we
can get a codesize saving of 225kb (2%), a Ts win of 3%, a Txul win of 18%, and
a Tp2 win of about 25% (!).


Baseline for testing. Very similar to our reference platform, which uses gcc 4.1.1. Those options above are the default options that we use for compiling.
i also compared this to plain -O2: Txul is unchanged, Ts improves 3%, and Tp2
improves about 4%. however, codesize jumps 2,414kb (19%). maybe we can increase
the inline limit at -Os to get back a bit of this perf, without exploding
codesize. (we originally moved from -O2 to -Os on gcc 3.x, because it gave a
huge codesize win and also a perf win of a few percent on Ts, Txul, and Tp. so,
it seems gcc4.x behaves quite differently.)


This compiler version apparently does not inline even trivial functions.[https://bugzilla.mozilla.org/show_bug.cgi?id=409803#c0]
</pre>


</td>
===gcc 4.3 notes===
</tr>


<tr>
<pre>
<td>gcc 4.2.1 (-Os -freorder-blocks -fno-reorder-functions -finline-limit=50)</td>
i've tested gcc 4.3 a bit. to summarize, it looks like this pathological -Os
behavior is specific to 4.1 branch, and possibly just 4.1.2. also, there are
some substantial perf and codesize wins to be had with gcc 4.3.


<td>
gory details: tested with gcc 4.3 (20080104 pull). "stock configuration" is
Note the use of -finline-limit=n.  This will force many small functions and helpers to be inlined. (-O2 uses 600 as a value, for example.)  On a 32-bit Linux build this results in:
"-Os -freorder-blocks -fno-reorder-functions". some Tp2 numbers:


codesize saving of 225kb (2%)<br/>
baseline: gcc 4.3, stock:      142.78 ms
Ts win of 3%<br/>
stock, with -finline-limit=50:  146.89 ms (+2.9%)
Txul win of 18%<br/>
-O2:                            131.56 ms (-7.9%)
Tp2 win of about 25% (!)<br/>
</td>
</tr>


<tr>
for comparison with previous results (comment 0):
<td>gcc 4.2.1 (-O2 -freorder-blocks -fno-reorder-functions)</td>
gcc 4.1.2, stock:              199    ms (+39%)
stock, with -finline-limit=50:  149.33 ms (+4.6%)
-O2:                            142.67 ms (even)


<td>
|size libxul.so|
Comparing -O2 vs. -Os -finline-limit=50, not directly against baseline. [https://bugzilla.mozilla.org/show_bug.cgi?id=409803#c0]
gcc 4.3, stock:                12,387kb
stock, with -finline-limit=50:  12,325kb (-62kb)
-O2:                           15,061kb (+2,674kb)


gcc 4.1.2, stock:              13,249kb (+862kb)
stock, with -finline-limit=50:  13,025kb (+638kb)
-O2:                            15,440kb (+3,053kb)


Txul is unchanged<br/>
a few points from this data:
Ts improves 3%<br/>
1) -Os is very sane on 4.3 by default.
Tp2 improves about 4%<br/>
2) on 4.3, relative to -Os, -O2 has improved a lot (8% Tp win, although at a
however, codesize jumps 2,414kb (19%)
2.7Mb codesize cost).
</td>
3) 4.3 is 5 - 8% faster on Tp2 than 4.1.2, depending on -Os/-O2.
4) 4.3 gives an 400-800k codesize saving over 4.1.2.


</table>
3 & 4) are probably the same thing - a result of the hidden visibility
propagation improvements introduced in gcc 4.2. these are a major win for us.
</pre>


=Distributions=
=Distributions=

Latest revision as of 09:23, 12 February 2008

Overview

  • For cross-compiling look at [1]
  • For compiling a 32-bit Firefox on a 64-bit Linux look at [2].

Proper use of compiler options

For Firefox 3 builds, please use --enable-optimize without flags.

Our testing has shown that different parts of Mozilla run faster at different optimization levels. For example, cairo, pixman and sqlite are compiled at -O2 because they are fastest at that level while the JS engine is fastest at -Os. [3] Don't use --enable-optimize as a place to pass in random compile flags. That's a global setting that sets optimization levels throughout the source tree and is different depending on the module being compiled.

If you still need to pass in other non-optimization flags to the compile, please use CFLAGS and CXXFLAGS instead of passing them to --enable-optimize.
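
A minimal sketch of a mozconfig that follows this advice for a Firefox 3 build. The file name mozconfig-fx3 and the -pipe value are assumptions chosen for illustration; substitute whichever non-optimization flags you actually need.

```shell
#!/bin/sh
# Write a hypothetical mozconfig for a Firefox 3 build.
# --enable-optimize is given with no value, so each module keeps its own level.
cat > mozconfig-fx3 <<'EOF'
ac_add_options --enable-application=browser
ac_add_options --enable-optimize
# Non-optimization flags go through CFLAGS/CXXFLAGS (-pipe is only an example).
export CFLAGS="-pipe"
export CXXFLAGS="-pipe"
EOF
```

The point of the layout is that nothing after --enable-optimize carries a value, so the per-module optimization levels in the tree are left alone.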

For Firefox 2 builds, you probably want to set a default optimization level.

The default optimization level on the 1.8 branch (a.k.a. Firefox 2) is -O3, which is too aggressive and trades off a lot of space for not much speed. So you probably want to use --enable-optimize="..." for this release.

If you're using gcc 4.1.x you should use -O2 to make things go as fast as possible. This will result in about a 2MB code size hit. If you want to avoid that code size hit you can specify "-Os -finline-limit=100" which gives back most of the performance without too much code size growth. See the notes below.

For gcc 4.3.x you can use -O2 for your builds. The size hit is smaller because of visibility changes in that release of the compiler.
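
The Firefox 2 recommendations above can be sketched as a small shell helper. The function name optimize_flags_for and its "speed"/"size" argument are hypothetical and not part of the Mozilla build system; the flag values are the ones given on this page.

```shell
#!/bin/sh
# Suggest a value for --enable-optimize on a Firefox 2 (1.8 branch) build.
# $1: gcc version (for example from `gcc -dumpversion`); $2: "speed" or "size".
# optimize_flags_for is a hypothetical helper, not part of the build system.
optimize_flags_for() {
  case "$1" in
    4.1|4.1.*)
      if [ "$2" = "size" ]; then
        # Gives back most of the performance without much code size growth.
        echo "-Os -finline-limit=100"
      else
        # Fastest, at roughly a 2MB code size cost.
        echo "-O2"
      fi
      ;;
    4.3|4.3.*)
      echo "-O2"
      ;;
    *)
      # This page gives no recommendation for other versions.
      return 1
      ;;
  esac
}

# Example: print a mozconfig line for the locally installed gcc, if covered.
if flags=$(optimize_flags_for "$(gcc -dumpversion 2>/dev/null)" size); then
  echo "ac_add_options --enable-optimize=\"$flags\""
fi
```

For a gcc version outside 4.1.x and 4.3.x the helper prints nothing and returns a non-zero status, rather than guessing.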

If you want to change optimization levels, please do it per-module.

As we discovered, the best optimization settings are per-module. If your testing shows that changes to a particular module improve performance, please let us know by filing a bug against Firefox/Build Config so we can evaluate the change and get it into the tree.
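
To see which modules already set their own level, one option is to search a checked-out tree for MODULE_OPTIMIZE_FLAGS, the variable behind the "throughout the source tree" link. This is a rough sketch: list_module_optimize_flags is a hypothetical helper, and it assumes a local source checkout and a grep that supports -r.

```shell
#!/bin/sh
# List files under a source tree that mention MODULE_OPTIMIZE_FLAGS.
# list_module_optimize_flags is a hypothetical helper; $1 is the tree root
# (defaults to the current directory).
list_module_optimize_flags() {
  grep -rn 'MODULE_OPTIMIZE_FLAGS' "${1:-.}" 2>/dev/null
}
```

Calling it with the root of the checkout prints one file:line:text entry per match, which is a reasonable starting point before proposing a per-module change in a bug.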

Compilers

Notes from dwitte on gcc 4.3 vs. 4.1.2. [4] Also see the original post for possible ways to make gcc 4.1.2 faster by using -Os and -finline-limit.

gcc 4.1.2 notes

it turns out that gcc 4.1.2 on linux, at our default optimization setting "-Os
-freorder-blocks -fno-reorder-functions", avoids inlining even trivial
functions (where the cost of doing so is less than even the fncall overhead).
this is bad news for things like nsTArray, nsCOMPtr etc, which can result in
many layers of wrapper calls if not inlined sensibly.

gcc has an option to control inlining, "-finline-limit=n", which will (roughly)
inline functions up to length n pseudo-instructions. to give some sense for
numbers, the default value of n at -O2 is 600. i ran some tests and found that
with our current settings and -finline-limit=50 on a 32-bit linux build, which
is enough to inline trivial (one or two line) wrapper methods but no more, we
can get a codesize saving of 225kb (2%), a Ts win of 3%, a Txul win of 18%, and
a Tp2 win of about 25% (!).

i also compared this to plain -O2: Txul is unchanged, Ts improves 3%, and Tp2
improves about 4%. however, codesize jumps 2,414kb (19%). maybe we can increase
the inline limit at -Os to get back a bit of this perf, without exploding
codesize. (we originally moved from -O2 to -Os on gcc 3.x, because it gave a
huge codesize win and also a perf win of a few percent on Ts, Txul, and Tp. so,
it seems gcc4.x behaves quite differently.)

gcc 4.3 notes

i've tested gcc 4.3 a bit. to summarize, it looks like this pathological -Os
behavior is specific to 4.1 branch, and possibly just 4.1.2. also, there are
some substantial perf and codesize wins to be had with gcc 4.3.

gory details: tested with gcc 4.3 (20080104 pull). "stock configuration" is
"-Os -freorder-blocks -fno-reorder-functions". some Tp2 numbers:

baseline: gcc 4.3, stock:       142.78 ms
stock, with -finline-limit=50:  146.89 ms (+2.9%)
-O2:                            131.56 ms (-7.9%)

for comparison with previous results (comment 0):
gcc 4.1.2, stock:               199    ms (+39%)
stock, with -finline-limit=50:  149.33 ms (+4.6%)
-O2:                            142.67 ms (even)

|size libxul.so|
gcc 4.3, stock:                 12,387kb
stock, with -finline-limit=50:  12,325kb (-62kb)
-O2:                            15,061kb (+2,674kb)

gcc 4.1.2, stock:               13,249kb (+862kb)
stock, with -finline-limit=50:  13,025kb (+638kb)
-O2:                            15,440kb (+3,053kb)

a few points from this data:
1) -Os is very sane on 4.3 by default.
2) on 4.3, relative to -Os, -O2 has improved a lot (8% Tp win, although at a
2.7Mb codesize cost).
3) 4.3 is 5 - 8% faster on Tp2 than 4.1.2, depending on -Os/-O2.
4) 4.3 gives an 400-800k codesize saving over 4.1.2.

3 & 4) are probably the same thing - a result of the hidden visibility
propagation improvements introduced in gcc 4.2. these are a major win for us.
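
The percentages in the Tp2 table can be re-derived from the raw timings. The sketch below uses a hypothetical helper, pct_change, and takes the gcc 4.3 stock figure (142.78 ms) as the baseline; it assumes an awk whose printf accepts the + flag, which is the case for the common implementations.

```shell
#!/bin/sh
# Percentage change of a measurement relative to a baseline.
# pct_change is a hypothetical helper: $1 = baseline, $2 = measurement.
pct_change() {
  awk -v base="$1" -v val="$2" \
    'BEGIN { printf "%+.1f%%\n", (val - base) / base * 100 }'
}

# Tp2 figures from the notes above, relative to gcc 4.3 stock (142.78 ms).
pct_change 142.78 146.89   # gcc 4.3, stock with -finline-limit=50
pct_change 142.78 131.56   # gcc 4.3, -O2
pct_change 142.78 199      # gcc 4.1.2, stock
```

This prints +2.9%, -7.9% and +39.4%, in line with the rounded figures quoted in the table.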

Distributions

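Each entry below gives the distribution name, its GCC version and its last build, followed by the flags used.

Name: Ubuntu 7.10
GCC Version: gcc version 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)
Last Build: 2.0.0.11+2nobinonly-0ubuntu0.7.10 (2008-01-07)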

gcc flags
-Wall -W -Wno-unused -Wpointer-arith -Wcast-align -Wno-long-long -pedantic -g -Wall -O2 -pthread -pipe
g++ flags
-fno-rtti -fno-exceptions -Wall -Wconversion -Wpointer-arith -Wcast-align -Woverloaded-virtual -Wsynth -Wno-ctor-dtor-privacy -Wno-non-virtual-dtor -Wno-long-long -pedantic -g -Wall -O2 -fshort-wchar -pthread -pipe
configure flags
--build=i486-linux-gnu --prefix=/usr '--includedir=${prefix}/include' '--mandir=${prefix}/share/man' '--infodir=${prefix}/share/info' --sysconfdir=/etc --localstatedir=/var '--libexecdir=${prefix}/lib/firefox' --disable-maintainer-mode --disable-dependency-tracking --srcdir=. --disable-debug --with-default-mozilla-five-home= --with-user-appdir=.mozilla --with-system-png=/usr --with-system-jpeg=/usr --with-system-zlib=/usr --with-system-nspr --with-system-nss --disable-composer --disable-debug --disable-elf-dynstr-gc --disable-gtktest --disable-installer --disable-ldap --disable-mailnews --disable-profilesharing --disable-strip --disable-strip-libs --disable-tests --disable-updater --disable-xprint --enable-application=browser --enable-canvas --enable-default-toolkit=gtk2 --enable-gnomevfs --enable-libthai '--enable-optimize=-pipe\ -w\ -O2\ -fno-strict-aliasing\ -g' --enable-pango --enable-postscript --enable-svg --enable-svg-renderer=cairo --enable-system-cairo --enable-mathml --enable-xft --enable-xinerama --enable-extensions=default --enable-single-profile --enable-system-myspell --with-distribution-id=com.ubuntu --enable-official-branding --enable-system-c

Name: Fedora 8
GCC Version: gcc version 4.1.2 20070925 (Red Hat 4.1.2-33)
Last Build: firefox-2.0.0.10-3.fc8 (2008-01-04)

gcc flags
-Wall -W -Wno-unused -Wpointer-arith -Wcast-align -Wno-long-long -pedantic -pthread -pipe
g++ flags
-fno-rtti -fno-exceptions -Wall -Wconversion -Wpointer-arith -Wcast-align -Woverloaded-virtual -Wsynth -Wno-ctor-dtor-privacy -Wno-non-virtual-dtor -Wno-long-long -pedantic -fshort-wchar -pthread -pipe
configure flags
--enable-application=browser --prefix=/usr --libdir=/usr/lib --with-system-nspr --with-system-nss --with-system-jpeg --with-system-zlib --with-system-png --with-pthreads --disable-tests --disable-debug --disable-installer '--enable-optimize=-Os -g -pipe -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m32 -march=i386 -mtune=generic -fasynchronous-unwind-tables' --enable-xinerama --enable-default-toolkit=gtk2 --disable-xprint --disable-strip --enable-pango --enable-system-cairo --enable-svg --enable-canvas --enable-startup-notification --enable-official-branding

Name: CentOS 5.1
GCC Version: gcc version 4.1.2 20070626 (Red Hat 4.1.2-14)
Last Build: firefox-1.5.0.12-7.el5.centos (2008-01-07)

gcc flags
-Wall -W -Wno-unused -Wpointer-arith -Wcast-align -Wno-long-long -pedantic -pthread -pipe
g++ flags
-fno-rtti -fno-exceptions -Wall -Wconversion -Wpointer-arith -Wcast-align -Woverloaded-virtual -Wsynth -Wno-ctor-dtor-privacy -Wno-non-virtual-dtor -Wno-long-long -pedantic -fshort-wchar -pthread -pipe
configure flags
--enable-application=browser --prefix=/usr --libdir=/usr/lib --with-system-nspr --with-system-nss --with-system-jpeg --with-system-zlib --with-system-png --with-pthreads --disable-tests --disable-debug --disable-installer '--enable-optimize=-Os -g -pipe -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m32 -march=i386 -mtune=generic -fasynchronous-unwind-tables' --enable-xinerama --enable-default-toolkit=gtk2 --disable-xprint --disable-strip --enable-pango --enable-system-cairo --enable-svg --enable-canvas --enable-official-branding