From MozillaWiki

What is Codesighs?

Codesighs is a set of tools to help you determine the code and data size of shared libraries and executables. Once you can measure the code and data size, then you can measure drifts in size as code changes occur.

Why use Codesighs when we already have file size on disk?

Codesighs does not look at the size on disk. Instead Codesighs relies upon symbol data made available by either the information in the file itself or by a linker map file. Using this data, Codesighs can tell the difference between executable code and static data, and can also determine the size of the symbols involved (e.g. functions, static variables).

File size on disk is an important metric for installers and for eyeballing size differences on large changes. Codesighs offers you the opportunity to measure even minute changes that may show no difference in file size.


If you are starting from an existing mozilla source tree:

  • Set MOZ_MAPINFO=1 in your build environment.
  • cvs checkout mozilla/tools/codesighs
  • Rerun mozilla/configure with your normal options, and in addition specify --enable-codesighs.
  • If you are using a linux build:
    • Run make in the mozilla/tools/codesighs directory.
    • From the parent directory of the mozilla source tree, execute the following command: ./mozilla/tools/codesighs/autosummary.unix.bash results001.tsv results000.tsv summary001.txt
  • If you are using a windows build:
    • All .exe and .dll files need to be relinked, such that mapfile information is generated at link time. One way to do this is by rebuilding your entire tree; this will also build your mozilla/tools/codesighs directory.
  • If you are using Mac OS X, XXX.
    • From the parent directory of the mozilla source tree, execute the following command from your bash shell: ./mozilla/tools/codesighs/ results001.tsv results000.tsv summary001.txt
  • See below for a description of these files and the output of the script.

Otherwise, if you are starting with no source tree:

  • Set MOZ_MAPINFO=1 in your build environment.
  • Perform the normal source tree checkout steps.
  • Enable --enable-codesighs, either by placing it in your .mozconfig file or by running configure manually.
  • Proceed with the normal build steps.
  • If you are using a linux or Mac OS X build:
    • From the parent directory of the mozilla source tree, execute the following command: ./mozilla/tools/codesighs/autosummary.unix.bash results001.tsv results000.tsv summary001.txt
  • If you are using a Windows build:
    • From the parent directory of the mozilla source tree, execute the following command from your bash shell: ./mozilla/tools/codesighs/ results001.tsv results000.tsv summary001.txt
  • See below for a description of these files and the output of the script.

The script itself will first output a single number which represents the total size of all code and data in the considered executable build files.

The script may output a second number which represents the composite size difference from the results000.tsv file to the results001.tsv file, but only if the file results000.tsv is present.

The file results000.tsv does not need to exist beforehand, but go ahead and specify it anyway. If it does exist, it should hold the results of a prior run of the script (i.e. a previous results001.tsv). By using prior results of the script, you can see the differences caused by any source code changes applied to your tree.

The file results001.tsv will be overwritten to contain all symbol data garnered from the build. Use this file in the future as the results000.tsv file to see the differences caused by any source code changes you apply to the source tree. If interested, take a look at this file. It contains all symbols found, sorted by their respective sizes. This information could be a good starting point if you are interested in reducing the code or data footprint of the build.

The file summary001.txt will contain any code or data size differences between results000.tsv and results001.tsv. In addition, this file will give a brief summary of the code and data sizes of the modules in the build.
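The reuse of results001.tsv as the next baseline can be sketched as a small helper to run before each new invocation of the script. This is only an illustration: the function name rotate_results is hypothetical, and the autosummary scripts do not require it.

```python
import shutil
from pathlib import Path


def rotate_results(workdir):
    """Copy the latest results001.tsv over results000.tsv, if present.

    Returns True when a baseline was written, False when there was
    no prior run to promote.
    """
    workdir = Path(workdir)
    latest = workdir / "results001.tsv"
    baseline = workdir / "results000.tsv"
    if latest.exists():
        # The old run becomes the baseline for the next comparison.
        shutil.copyfile(latest, baseline)
        return True
    return False
```

After calling rotate_results on the parent directory of the source tree, rerunning the script overwrites results001.tsv and compares it against the promoted baseline.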

A Longer Introduction

Once you have performed the steps in the shorter HOWTO and are interested in the niceties, this section is for you. By explaining each Codesighs tool separately, I hope to empower you to wield or modify them as you will. Also, simply reading the autosummary.*.bash scripts will cover almost everything I will state below.

  • msmap2tsv

This command takes a MS linker .map file and converts it into a format which codesighs understands.

As a warning, the symbol sizes this tool reports are not guaranteed. The .map files produced by the MS linker do not specify the sizes of the symbols, but instead give the offsets of the symbols in particular sections. msmap2tsv uses these offsets and sections as clues to a symbol's size. All code and data is accounted for, but the guesswork may report some symbol sizes improperly. For example, the size reported for a public symbol may include static functions that sit near it in the source file.
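To make the guesswork concrete, here is a minimal sketch of one plausible way to derive sizes from offsets: treat each symbol as extending to the next known offset in the same section. This is an assumption for illustration, not the actual msmap2tsv algorithm, and the name estimate_sizes and its inputs are hypothetical.

```python
def estimate_sizes(symbols, section_length):
    """Estimate symbol sizes within a single section.

    symbols: iterable of (name, offset) pairs, offsets relative to
             the start of the section.
    section_length: total length of the section in bytes.
    Returns a dict mapping each name to its estimated size.
    """
    ordered = sorted(symbols, key=lambda item: item[1])
    sizes = {}
    for index, (name, offset) in enumerate(ordered):
        if index + 1 < len(ordered):
            next_offset = ordered[index + 1][1]
        else:
            # The last symbol is assumed to run to the end of the section.
            next_offset = section_length
        sizes[name] = next_offset - offset
    return sizes
```

Under this scheme, anything that lies between two listed symbols but is not itself listed (such as a static function) is counted as part of the preceding symbol, which is the kind of misattribution described above.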

Here is a list of sections a .map file might contain. Knowing the various sections can come in handy when trying to determine what the tool output represents. In short, the section a symbol lives in determines whether its size is attributed to code or data:

  • bss: uninitialized data.
  • crt: runtime library initialization/shutdown pointers.
  • data: initialized data.
  • debug: COFF debug information data.
  • edata: exported functions data.
  • idata: imported functions data.
  • rdata: read only data.
  • reloc: base relocations data.
  • rsrc: resource data.
  • text: machine code.
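The list above suggests a simple classification rule, sketched here. It assumes that only the text section counts as code and everything else counts as data, and that a leading dot or a "$" suffix on a section name can be ignored; the function name classify_section is hypothetical and this is not taken from the tool's source.

```python
def classify_section(section_name):
    """Return "CODE" or "DATA" for a linker section name.

    Assumption: only "text" holds machine code; every other section
    in the list above is treated as data.
    """
    # Drop a leading dot and any "$..." suffix, e.g. "idata$6" -> "idata".
    base = section_name.lstrip(".").split("$", 1)[0].lower()
    return "CODE" if base == "text" else "DATA"
```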

Further, the sections reported in the .map file may not be present in the resultant executable. This is a positive result: the linker merges them, which reduces overhead, since each section uses at least 4k of system memory even if only 1 byte of the section is utilized. For example, the edata and idata sections are normally merged with the rdata section, bss is normally merged with the data section, and so on. MSDN has some articles regarding merging of sections. If you see too many sections via a "dumpbin /summary <filename.exe>", then perhaps one way to reduce physical memory strain is to merge some of the sections.
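The cost of unmerged sections can be estimated with a little arithmetic. The sketch below assumes each section is rounded up to a whole number of 4k pages, which generalizes the "at least 4k" figure above; the section sizes used in the usage note are made up for illustration.

```python
PAGE_SIZE = 4096  # the 4k figure quoted above


def resident_bytes(section_sizes):
    """Sum section sizes after rounding each one up to whole pages."""
    total = 0
    for size in section_sizes.values():
        if size > 0:
            pages = -(-size // PAGE_SIZE)  # ceiling division
            total += pages * PAGE_SIZE
    return total
```

With hypothetical sizes of 5000 bytes for rdata, 1 byte for edata and 300 bytes for idata, the three separate sections occupy 16384 bytes under this assumption, while a single merged section of 5301 bytes occupies 8192.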

Another thing to consider is that, at the time of this writing, msmap2tsv does not demangle the symbol names reported in the mapfile. This can make it slightly more difficult to recognize C++ symbols. On the other hand, nm2tsv does demangle the names if you are using a Linux build.

  • nm2tsv

This command takes the output of the GNU nm tool and converts it into a format which codesighs understands.

Specifically, the options to the nm tool should be: --format=bsd --size-sort --print-file-name --demangle

The requirement for nm to be from GNU comes from some hard coded interpretations regarding the symbol type. The symbol types are used to help determine whether a symbol is code or data. Knowing these symbol types can help in understanding the output of this tool. Some of the types are as follows:

  • B: uninitialized data
  • D: initialized data
  • R: read only data
  • T: machine code
  • V: weak object
  • W: weak symbol

Because the nm tool reports the size of each symbol when the --size-sort switch is used, no guesswork is performed by this tool with regards to the symbol size.
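As a rough illustration of how such output might be interpreted, the sketch below parses one line and classifies it using the types listed above. It assumes each line has the shape "file:hexsize type name", and it deliberately leaves W unclassified because the list above does not say whether a weak symbol is code or data. The names parse_nm_line and KIND_BY_TYPE are hypothetical; this is not the nm2tsv implementation.

```python
import re
from typing import NamedTuple, Optional

# Classification taken from the list above; "W" is left out on purpose.
KIND_BY_TYPE = {"B": "DATA", "D": "DATA", "R": "DATA", "T": "CODE", "V": "DATA"}

# Assumed line shape: "<file>:<hex size> <type letter> <symbol name>"
LINE_PATTERN = re.compile(r"^(.*?):([0-9A-Fa-f]+) (\S) (.*)$")


class Symbol(NamedTuple):
    module: str
    size: int
    type: str
    name: str
    kind: Optional[str]  # "CODE", "DATA", or None when unclassified


def parse_nm_line(line):
    """Parse one line of nm output, or return None if it does not match."""
    match = LINE_PATTERN.match(line.rstrip("\n"))
    if match is None:
        return None
    module, size_hex, sym_type, name = match.groups()
    # GNU nm usually prints local symbols in lowercase; fold them here.
    kind = KIND_BY_TYPE.get(sym_type.upper())
    return Symbol(module, int(size_hex, 16), sym_type, name, kind)
```

Keeping the demangled name as the final field means names that contain spaces or "::" are preserved intact.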

  • codesighs

This tool takes the output from msmap2tsv or nm2tsv and outputs total sums regarding code and data size by module.

When you are researching various aspects of the tsv data, this is the easiest tool to reach for.

This tool has a lot of command line options. Short of importing the tsv output into a database to perform queries against, this tool is your best option. You can specify a fairly verbose query using the command line.

For instance, if you were interested in the import and export symbol sizes of a win32 build, you would run the following command: codesighs.exe --match-section idata --match-section edata --input somefile.tsv. The results of this command would show you the overhead incurred from importing and exporting functions on win32.

Here's a hypothetical sample of the output of this tool using the switches "--modules --match-module mozilla --match-module xpcom --match-module nspr" on a tsv file generated from every mapfile found in the source tree:

Overall Size
       Total:     4886384
       Code:      1304442
       Data:      3581942

       Total:     4364095
       Code:       948124
       Data:      3415971

       Total:      216929
       Code:       175202
       Data:        41727

       Total:      211255
       Code:       102283
       Data:       108972

       Total:       94105
       Code:        78833
       Data:        15272
  • maptsvdifftool

This tool outputs a human readable change summary of code and data size drifts. It is used mainly in the autosummary.*.bash scripts to show drifts after changes to a source tree occur, but it is possible to get the same results by hand.

These steps are vague in nature, but you should be able to follow them and you will have a custom drift report in no time:

  • Run nm2tsv on a binary or msmap2tsv on a mapfile to produce some tsv output. Do this to as many mapfiles or binaries as you see fit to produce the right tsv output.
  • Sort the tsv output and save it away somewhere.
  • Make whatever build changes you have in mind, and then reproduce the tsv output and sort it.
  • Diff the first tsv output with the second tsv output.
  • Run the diff results through maptsvdifftool to see the deltas in a human readable form.
  • If there was no change, consider using the --zero-drift command line argument which will show all changes even if they result in a net zero change. This option will show you every minute change made to the symbols.
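The idea behind the drift report and the zero-drift case can be sketched as follows. This is only a conceptual model working on two hypothetical name-to-size dictionaries, not on the real tsv or diff formats, and it assumes that a net zero total hides the per-symbol changes unless they are explicitly requested; the function name symbol_drift is hypothetical.

```python
def symbol_drift(before, after, zero_drift=False):
    """Compare two {symbol name: size} mappings.

    Returns (net_change, per_symbol_changes). When the net change is
    zero, the per-symbol changes are hidden unless zero_drift is True.
    """
    names = sorted(set(before) | set(after))
    changed = {}
    for name in names:
        # A symbol missing from one side counts as size 0 there.
        delta = after.get(name, 0) - before.get(name, 0)
        if delta != 0:
            changed[name] = delta
    net_change = sum(changed.values())
    if net_change == 0 and not zero_drift:
        return net_change, {}
    return net_change, changed
```

For example, one symbol growing by 4 bytes while another shrinks by 4 bytes yields a net change of 0, and the individual deltas only appear when zero_drift is set.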

Here's a hypothetical sample of the output of this tool:

Overall Change in Size
       Total:        +6628
       Code:         +4133
       Data:         +2495

       Total:        +6628
       Code:         +4133
       Data:         +2495
             +4133     text (CODE)
                     +4112     codesighs.obj
                             +2192     _initOptions
                              +992     _cleanOptions
                              +928     _codesighs
                       +21     MSVCRTD:MSVCRTD.dll
                                +6     _strstr
                                +6     _strtoul
                                +6     __errno
                                +5     __strdup
                                -2     _printf
             +2430     data (DATA)
                     +2432     UNDEF:codesighs:data
                             +2432     UNDEF:codesighs:data
                        -2     MSVCRTD:merr.obj
                                -2     ___defaultmatherr
               +33     idata$6 (DATA)
                       +33     UNDEF:codesighs:idata$6
                               +33     UNDEF:codesighs:idata$6
               +16     idata$4 (DATA)
                       +16     UNDEF:codesighs:idata$4
                               +16     UNDEF:codesighs:idata$4
               +16     idata$5 (DATA)
                       +16     MSVCRTD:MSVCRTD.dll
                                +4     __imp__strstr
                                +4     __imp__strtoul
                                +4     \177MSVCRTD_NULL_THUNK_DATA
                                +4     __imp___errno