Platform/HTML5 sanitizer

Gecko Requirements

  • Allow a setting for enabling styles.
  • Allow a setting for enabling comments. See bug 572642.
  • Have three element white lists: HTML, SVG and MathML.
  • Have three attribute white lists: HTML, SVG and MathML. Whether an attribute is allowed depends only on the namespace of the element it is on, not on the specific element.
  • Have three lists of attributes that take URLs. Drop the attributes when they have prohibited URLs (after trimming whitespace from the value).
    • Resolve relative URLs into absolute ones using a per-fragment base URL. (Is this correct for the Gecko requirements?)
    • Why is whitespace trimmed before the security check?
    • However, allow any URL in the src attribute on the img element, because img elements are considered safe.
      • Why risk this?
  • Have a list of SVG attributes that take different-document references.
  • Have a list of SVG attributes that are allowed to have same-document references only.
  • If styles are allowed, sanitize style attribute values. If styles aren't allowed, drop the style attribute.
  • Always drop script and title elements and their contents.
  • If styles are disabled, drop style elements and their contents.
  • If styles are enabled, sanitize the content of style elements.
  • Add the controls attribute to the video and audio elements (if it isn't there already).
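
The attribute-level rules above (URL attributes, the style attribute, and the controls attribute) could be sketched roughly as follows. This is only an illustration in Python, not Gecko code: the scheme list, the URL attribute list, the function names and the sanitize_style callback are all hypothetical, and the attribute white list itself is left out. The sketch uses the resolved URL only for the check and keeps the attribute value as written; whether the resolved value should be stored instead is the open point raised in the list.

```python
from urllib.parse import urljoin, urlsplit

HTML_WHITESPACE = " \t\n\r\f"
# Hypothetical values, chosen only for illustration.
ALLOWED_SCHEMES = {"http", "https", "ftp", "mailto"}
HTML_URL_ATTRS = {"href", "src", "cite", "poster"}


def is_allowed_url(value, base_url):
    """Trim whitespace, resolve against the fragment base, check the scheme."""
    absolute = urljoin(base_url, value.strip(HTML_WHITESPACE))
    return urlsplit(absolute).scheme in ALLOWED_SCHEMES


def sanitize_attrs(element, attrs, base_url, sanitize_style=None):
    """Return a filtered copy of an HTML element's attribute dict.

    sanitize_style is a hypothetical callback; None means styles are disabled.
    """
    kept = {}
    for name, value in attrs.items():
        if name == "style":
            # Styles disabled: drop the attribute. Enabled: sanitize its value.
            if sanitize_style is not None:
                kept[name] = sanitize_style(value)
            continue
        is_img_src = element == "img" and name == "src"
        if name in HTML_URL_ATTRS and not is_img_src:
            # Drop the whole attribute when its URL is prohibited.
            if not is_allowed_url(value, base_url):
                continue
        kept[name] = value
    if element in ("video", "audio"):
        # Add controls only if it is not there already.
        kept.setdefault("controls", "")
    return kept
```

For example, sanitize_attrs("a", {"href": " javascript:alert(1) "}, "http://example.com/d/") drops the href attribute, while the same value in the src attribute of an img element is kept, which is exactly the exception questioned in the list.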

Open Questions

  • Can stylistic SVG attributes have values that need to be sanitized?
  • Can stylistic MathML attributes have values that need to be sanitized?
  • Should element whitelisting take place after the tree builder algorithm so that the namespace of the element is known?
    • Likely yes.
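
To illustrate why the namespace needs to be known first, here is a minimal Python sketch of a white list lookup keyed on namespace and local name. The white list contents are hypothetical and heavily abbreviated; only the namespace URIs are the standard ones.

```python
HTML_NS = "http://www.w3.org/1999/xhtml"
SVG_NS = "http://www.w3.org/2000/svg"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Hypothetical, abbreviated white lists, one per namespace.
ELEMENT_WHITELIST = {
    HTML_NS: {"a", "p", "img", "video", "audio"},
    SVG_NS: {"a", "svg", "circle"},
    MATHML_NS: {"math", "mi", "mo"},
}


def is_element_allowed(namespace, local_name):
    """Look the element up in the white list for its own namespace only."""
    return local_name in ELEMENT_WHITELIST.get(namespace, set())
```

With a lookup like this, a p element is accepted in the HTML namespace but rejected in the SVG namespace, so the check only gives the right answer once the tree builder has assigned the namespace.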

Non-Gecko Requirements

  • Allow form-related elements to be toggled on and off in the white list.
  • Allow using the sanitizer in non-fragment mode (in which case, the title element should be allowed).
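
One possible shape for these two options, sketched in Python. The option names, the base element set and the list of form-related elements are assumptions made for the example, not part of any existing API.

```python
from dataclasses import dataclass

# Hypothetical, abbreviated element sets.
BASE_HTML_ELEMENTS = frozenset({"a", "p", "img"})
FORM_ELEMENTS = frozenset(
    {"form", "input", "button", "select", "option", "textarea", "label", "fieldset"}
)


@dataclass(frozen=True)
class SanitizerOptions:
    allow_forms: bool = False   # toggle form-related elements
    fragment_mode: bool = True  # False means a whole document is sanitized


def html_element_whitelist(options):
    """Build the HTML element white list for the given options."""
    allowed = set(BASE_HTML_ELEMENTS)
    if options.allow_forms:
        allowed |= FORM_ELEMENTS
    if not options.fragment_mode:
        # The title element is only meaningful for a full document.
        allowed.add("title")
    return allowed
```

With the defaults, neither form elements nor title are allowed, which matches the fragment-mode behavior described under Gecko Requirements.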