Script Origin Tracking: Difference between revisions

First cut.
(work in progress; just filed to get it off my laptop)
 
(First cut.)
Line 1: Line 1:
<i>This draft is being discussed in [https://bugzilla.mozilla.org/show_bug.cgi?id=637572 bug 637572]. The interface it describes is not stable, and perhaps not even implemented.</i>
<i>This draft is being discussed in [https://bugzilla.mozilla.org/show_bug.cgi?id=637572 bug 637572]. The interface it describes is not stable, and perhaps not even implemented.</i>


A debugger should help developers find any JavaScript code that has been introduced into a browsing context, regardless of how it got there; allow them to set breakpoints in any code; and be able to explain where any given piece of code came from.
A debugger should be able to explain how any given piece of JavaScript code running in a web page got there.


In a browser, JavaScript code can initially enter the system through any number of channels:
JavaScript code can initially enter a browsing context in several ways:
<ul>
<ul>
<li>A script may appear in an HTML <code>&lt;script&gt;</code> element (or be cited by its <code>src</code> attribute).
<li>A script may appear in an HTML <code>&lt;script&gt;</code> element (or be cited by its <code>src</code> attribute).
<li>A script may appear in HTML as an event handler content attribute.
<li>A script may appear in HTML as an event handler content attribute.
<li>The browser could retrieve a <code>javascript:</code> URL.
</ul>
</ul>
Once loaded, JavaScript code can then itself introduce new scripts:
 
Once loaded, JavaScript code can then itself introduce more code:
<ul>
<ul>
<li>It can call <code>eval</code> or the <code>Function</code> constructor.
<li>It can call <code>eval</code>, the <code>Function</code> constructor, and similar functions.
<li>It can assign a new script to a DOM element's event handler IDL attribute.
<li>It can create web workers; and web workers can call <code>importScripts</code>.
<li>It can assign new scripts to DOM elements' event handler IDL attributes.
<li>It can use DOM manipulation (assignments to <code>innerHTML</code>, calls to <code>appendChild</code>, and so on) to introduce new &lt;script&gt; elements and event handler content attributes.
<li>It can use DOM manipulation (assignments to <code>innerHTML</code>, calls to <code>appendChild</code>, and so on) to introduce new &lt;script&gt; elements and event handler content attributes.
</ul>
</ul>


= Origin Values =
Given a particular piece of JavaScript, script origin tracking provides a complete trail showing how that JavaScript was loaded as a consequence of navigating to a resource.
 
== Location values ==
 
A <i>location</i> is a value that describes a particular point in markup text or a script. A location has the form:


<i>... separate into things representing scripts (algebraic) and things representing specific locations in a script (line, [column,] script)</i>
  { origin:<i>origin</i>, line:<i>line</i>, column:<i>column</i> }


An <i>origin value</i> describes how a particular script was loaded into its browsing context. An origin value has one of the forms below.
where <i>origin</i> is an origin value (described below), <i>line</i> is a one-based line number, and <i>column</i> is a zero-based column number. The <code>column</code> property is optional. The <code>line</code> property may also be omitted if it is not available; simple consumers could treat a missing <code>line</code> as referring to the beginning of the text.
 
A <i>markup location</i> is a location in markup text: a location whose <code>origin</code> is a markup origin. A <i>script location</i> is a location in a script: a location whose <code>origin</code> is a script origin.
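
As a rough illustration of the location form described above, here is a minimal sketch in JavaScript. The helper names (<code>normalizeLocation</code>, <code>isMarkupOrigin</code>, <code>isMarkupLocation</code>) and the sample URL are hypothetical, and the check for a markup origin simply assumes that markup origins are the ones carrying a <code>browsingContext</code> or <code>dynamicMarkup</code> property. None of this is part of the draft interface.

```javascript
// Hypothetical helper: fill in defaults for a location value.
// Assumption: a missing `line` is treated as the beginning of the text
// (line 1, since lines are one-based). A missing `column` stays absent.
function normalizeLocation(location) {
  const result = {
    origin: location.origin,
    line: location.line === undefined ? 1 : location.line,
  };
  if (location.column !== undefined) {
    result.column = location.column;
  }
  return result;
}

// Assumption: markup origins can be recognized by these property names.
function isMarkupOrigin(origin) {
  return "browsingContext" in origin || "dynamicMarkup" in origin;
}

// A markup location is a location whose origin is a markup origin.
function isMarkupLocation(location) {
  return isMarkupOrigin(location.origin);
}

// Sample values; the URL is made up for illustration.
const samplePage = { browsingContext: "http://example.com/index.html" };
const inMarkup = { origin: samplePage, line: 12, column: 4 };
const withoutLine = { origin: samplePage };

console.log(isMarkupLocation(inMarkup));          // true
console.log(normalizeLocation(withoutLine).line); // 1
```

Treating a missing <code>line</code> as line 1 is only one possible reading of "the beginning of the text"; a consumer that cares about the difference could keep the property absent instead.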
 
== Origin values ==
 
An <i>origin value</i> is a value that describes where a particular markup text or script text came from: a URL, for example. A <i>script origin</i> is where a script came from; a <i>markup origin</i> is where some markup text (HTML or XML) came from. We describe the forms origin values can take and their meanings below.
 
=== Script origin values ===
 
A script origin value describes the origin of a particular piece of JavaScript code. It has one of the following forms:


<dl>
<dl>
<dt><code>{ url:<i>url</i> }</code>
<dt><code>{ scriptElement:<i>element</i>, markupLocation:<i>location</i> }</code>
<dd>This script was loaded from the resource identified by the absolute URL <i>url</i>. Note that this covers:
<dd>This script belongs to the &lt;script&gt; element <i>element</i> whose content appears inline at <i>location</i>. <i>Element</i> is a DOM element object; <i>location</i> is a markup location value.
 
<dt><code>{ scriptElement:<i>element</i>, markupLocation:<i>location</i>, url:<i>url</i> }</code>
<dd>As above, but for script elements with a <code>src</code> attribute that refer to an external script resource. <i>Url</i> is the absolute form of the URL given by the <code>src</code> attribute.
 
<dt><code>{ scriptElement:<i>element</i>, scriptLocation:<i>location</i> }</code>
<dd>This script belongs to the dynamically constructed &lt;script&gt; element <i>element</i>, whose contents were assigned to it at <i>location</i>. Script elements created by <code>createElement</code> or similar functions use this form. <i>Element</i> is a DOM element object, and <i>location</i> is a script location value.
 
<dt><code>{ eventHandler:<i>element</i>, attribute:<i>attribute</i>, markupLocation:<i>location</i> }</code>
<dd>This script is the event handler content attribute <i>attribute</i> of <i>element</i>, appearing in markup at <i>location</i>. By 'event handler content attribute', we mean a bit of JavaScript code appearing in markup as the value of an element attribute. <i>Element</i> is a DOM element, <i>attribute</i> is the name of the event handler attribute, a string, and <i>location</i> is the location of the element's attribute, a markup location.
 
<dt><code>{ eventHandler:<i>element</i>, attribute:<i>attribute</i>, scriptLocation:<i>location</i> }</code>
<dd>As above, except that the handler script was assigned to <i>element</i>'s event handler IDL attribute <i>attribute</i> by JavaScript code at <i>location</i>, a script location. This covers both JavaScript assignments to element properties (like <code><i>element</i>.<i>property</i> = <i>script</i></code>) and calls to DOM methods that manipulate element attributes (like <code><i>element</i>.setAttribute("<i>attribute</i>", <i>script</i>)</code>).
 
<dt><code>{ evaluated:<i>function</i>, scriptLocation:<i>location</i> }</code>
<dd>The call at <i>location</i> to <i>function</i> produced this script. <i>Location</i> is a script location. <i>Function</i> is a string, naming the function called to evaluate or compile the script. Common values for <i>function</i> might be:
<ul>
<ul>
<li>&lt;script&gt; elements with a <code>src</code> attribute (here, <i>url</i> is that attribute's value);
<li><code>"eval"</code>, referring to the global object's <code>eval</code> property
<li>&lt;script&gt; elements with script text in-line; and
<li><code>"Function"</code>, referring to the <code>Function</code> constructor
<li>event handler content attributes (here, <i>url</i> identifies the containing HTML file)
<li><code>"setTimeout"</code>, referring to the HTML5 <code>setTimeout</code> function
</ul>
</ul>


<dt><code>{ eval:<i>function</i>, origin:<i>origin</i>, line:<i>line</i>, column:<i>column</i> }</code>
<dt><code>{ evaluated:<i>function</i>, scriptLocation:<i>location</i>, url:<i>url</i> }</code>
<dd>This script was produced by a call to <i>function</i> at <i>line</i> and <i>column</i> in the script given by <i>origin</i>, which is itself an origin value. <i>Function</i> is a string, naming the function called to evaluate or compile the script; typical values would be:
<dd>As above, where <i>function</i> loaded this script from <i>url</i>. This is used for functions like the Web Workers API's <code>importScripts</code>. <i>Url</i> is the absolute form of the URL from which the script was loaded, a string.
<ul>
 
<li><code>"eval"</code>, referring to the global object's <code>eval</code> property;
(Ideally, we would provide a way for cooperative custom content module loaders (the sort implemented using <code>XMLHttpRequest</code> and <code>eval</code>) to construct their own script origin values like this for the scripts they pass to <code>eval</code>, whose <i>location</i> values referred to the point at which they were called. Of course, telling a function the point from which it was called has security repercussions, so this would need to be handled carefully.)
<li><code>"Function"</code>, referring to the <code>Function</code> constructor;
<li><code>"setTimeout"</code>, referring to the HTML5 <code>setTimeout</code> function;
</ul>
and so on.


<dt><code>{ attributeAssignment:<i>attribute</i>, element:<i>element</i>, origin:<i>origin</i>, line:<i>line</i>, column:<i>column</i> }</code>
<dt><code>{ javascriptURL:<i>url</i> }</code>
<dd>The assignment at <i>line</i> and <i>column</i> in the script given by <i>origin</i> set the property of the DOM element <i>element</i> named <i>attribute</i> to this script. The value was interpreted as a script either because <i>attribute</i> is an event handler IDL attribute of <i>element</i> (like <code>"onclick"</code> or <code>"onkeydown"</code>), or <i>element</i> is a &lt;script&gt; node and <i>attribute</i> is <code>text</code> or <code>textContent</code>.
<dd>Retrieving the <code>javascript:</code> URL <i>url</i> created this script. <i>Url</i> is a string.


<dt><code>{ dynamicMarkup:<i>function</i>, element:<i>element</i>, origin:<i>origin</i>, line:<i>line</i>, column:<i>column</i> }</code>
Usually, the code in 'javascript:' URLs is so ephemeral that debuggers won't come across it, but it is possible for such code to live longer. For example, the effect of visiting a URL like:
<dd>The call to the dynamic markup insertion function named <i>function</i> at <i>line</i> and <i>column</i> in <i>origin</i> created this script. (This is used for functions that supply the new markup as a string, not for functions that operate on DOM elements as JavaScript objects.) (<i>What are line numbers relative to here?</i>)


<dt><code>{ elementConstructor:<i>constructor</i>, element:<i>element</i>, attribute:<i>attribute</i>, origin:<i>origin</i>, line:<i>line</i>, column:<i>column</i> }</code>
  javascript:g=function(){return"look%20on%20my%20works,%20ye%20mighty,%20and%20despair";};void(0)
<dd>The call to <i>constructor</i> at <i>line</i> and <i>column</i> in <i>origin</i> created the DOM element <i>element</i> whose attribute <i>attribute</i> is this script. <i>Attribute</i> (We do not record the point at which <i>element</i> was inserted in the DOM, only where the element was constructed.)
is to create a function <code>g</code> on <code>window</code> (the page's global object) whose source can only reasonably be attributed to the <code>javascript:</code> URL.


Note that the result of evaluating a <code>javascript:</code> URL may itself be taken to be markup or JavaScript source code. This origin refers to the code in the <code>javascript:</code> URL itself, not code produced by dereferencing such a URL.
</dl>
</dl>


<i>any origin value can have a <code>source</code> property, too</i>
Any script origin value may also have a property named <code>source</code>, whose value is the original source code of the script.


For example:
=== Markup origin values ===
 
A markup origin value describes the origin of a particular piece of markup text (HTML; XHTML; and so on). A markup origin value has one of the following forms:
 
<dl>
<dt><code>{ browsingContext:<i>url</i> }</code>
<dd>This describes a top-level browsing context visiting <i>url</i>. <i>Url</i> is an absolute URL, a string.
 
<dt><code>{ browsingContext:<i>url</i>, container:<i>container</i>, markupLocation:<i>location</i> }</code>
<dd>As above, except that the context is a nested browsing context whose browsing context container is <i>container</i> (an <code>&lt;iframe&gt;</code> element, perhaps) appearing in markup at <i>location</i>.
 
<dt><code>{ browsingContext:<i>url</i>, container:<i>container</i>, scriptLocation:<i>location</i> }</code>
<dd>As above, except that the browsing context container's URL was set by JavaScript at <i>location</i>. This is also the form used for
 
<dt><code>{ browsingContext:<i>url</i>, opener:<i>opener</i> }</code>
<dd>This describes an auxiliary browsing context whose opener browsing context is <i>opener</i>. <i>Opener</i> is a markup origin value.
 
<dt><code>{ dynamicMarkup:<i>node</i>, method:<i>method</i>, scriptLocation:<i>location</i> }</code>
<dd>The call at <i>location</i> to <i>node</i>'s method named <i>method</i> inserted this markup. This form is used for calls to <code>document.write</code> and similar functions. <i>Node</i> may be a DOM document or element; <i>method</i> is a string; and <i>location</i> is a script location.
 
<dt><code>{ dynamicMarkup:<i>node</i>, attribute:<i>attribute</i>, scriptLocation:<i>location</i> }</code>
<dd>The assignment at <i>location</i> to <i>node</i>'s attribute named <i>attribute</i> inserted this markup. This form is used for assignments to properties like <code>innerHTML</code>. <i>Node</i> may be a DOM document or element; <i>attribute</i> is a string; and <i>location</i> is a script location.
 
</dl>


  inline script
Any markup origin value may have a property named <code>source</code>, whose value is the markup text.


  inline event handler content attribute
== Examples ==
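
The following is a minimal sketch, not taken from any implementation, of how the forms above might fit together for a page whose inline script calls <code>eval</code>. The URL, the line and column numbers, the placeholder element object, and the helper <code>originTrail</code> are all hypothetical. The helper assumes that each origin value carries at most one of <code>scriptLocation</code> or <code>markupLocation</code>, and that following that location's <code>origin</code> (or an <code>opener</code>) leads one step back toward the resource that was navigated to.

```javascript
// Placeholder for a DOM element, so the sketch also runs outside a browser.
const scriptElement = { tagName: "SCRIPT" };

// Markup origin: a top-level browsing context visiting a made-up page.
const page = { browsingContext: "http://example.com/index.html" };

// Script origin: an inline <script> element at line 10 of that page.
const inlineScript = {
  scriptElement: scriptElement,
  markupLocation: { origin: page, line: 10 },
};

// Script origin: code produced by a call to eval at line 3, column 8
// of the inline script. The optional `source` property holds its text.
const evalScript = {
  evaluated: "eval",
  scriptLocation: { origin: inlineScript, line: 3, column: 8 },
  source: "var answer = 6 * 7;",
};

// Hypothetical helper: collect the chain of origin values, starting with
// the given origin and ending with the one that has no further link.
function originTrail(origin) {
  const trail = [];
  let current = origin;
  while (current) {
    trail.push(current);
    const location = current.scriptLocation || current.markupLocation;
    current = location ? location.origin : current.opener;
  }
  return trail;
}

const trail = originTrail(evalScript);
console.log(trail.length);            // 3
console.log(trail[2].browsingContext); // http://example.com/index.html
```

A debugger could present such a trail from last to first: the page was visited, its inline script ran, and that script's <code>eval</code> call produced the code in question.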


<i>distinction between javascript: URL content and retrieved resource content</i>


== The origin value prototype ==
== The origin value prototype ==


The prototype of an origin value holds the following methods:
The prototypes of origin values and location values hold the following methods:


<dl>
<dl>
<dt>toString()
<dt>toString()
<dd>Format the origin value as a human-readable string.
<dd>Format the location or origin value as a human-readable string. In English. <i>&#x5b;Author is crushed by gigantic Monty Python-esque weight labeled "localization"&#x5d;</i>
</dl>
</dl>
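
To make the idea of a shared prototype concrete, here is a small sketch. The output format, the prototype objects <code>locationProto</code> and <code>originProto</code>, and the helpers <code>makeLocation</code> and <code>makeOrigin</code> are assumptions made for illustration; the draft does not specify what <code>toString</code> produces. The sketch handles only a few origin forms and hard-codes English text.

```javascript
// Hypothetical prototype for location values: "origin:line:column".
const locationProto = {
  toString() {
    let text = String(this.origin);
    if (this.line !== undefined) {
      text += ":" + this.line;
      if (this.column !== undefined) {
        text += ":" + this.column;
      }
    }
    return text;
  },
};

// Hypothetical prototype for origin values; covers only a few forms.
const originProto = {
  toString() {
    if ("javascriptURL" in this) return this.javascriptURL;
    if ("evaluated" in this) {
      return this.evaluated + " call at " + String(this.scriptLocation);
    }
    if ("browsingContext" in this) return this.browsingContext;
    if ("url" in this) return this.url;
    return "(unknown origin)";
  },
};

function makeLocation(props) {
  return Object.assign(Object.create(locationProto), props);
}

function makeOrigin(props) {
  return Object.assign(Object.create(originProto), props);
}

// Made-up sample: an external script whose code calls eval.
const htmlPage = makeOrigin({ browsingContext: "http://example.com/" });
const externalScript = makeOrigin({
  scriptElement: {}, // placeholder for a DOM element
  markupLocation: makeLocation({ origin: htmlPage, line: 5 }),
  url: "http://example.com/app.js",
});
const evaluated = makeOrigin({
  evaluated: "eval",
  scriptLocation: makeLocation({ origin: externalScript, line: 20, column: 4 }),
});

console.log(String(evaluated)); // eval call at http://example.com/app.js:20:4
```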


Line 72: Line 124:
The <code>origin</code> property of a [[Debug_Object#Debug.Script|<code>Debug.Script</code>]] is an origin value describing how the given script was loaded into its browsing context.
The <code>origin</code> property of a [[Debug_Object#Debug.Script|<code>Debug.Script</code>]] is an origin value describing how the given script was loaded into its browsing context.


= Open items =
== Open items ==


<ul>
<ul>
<li>Do these get serialized by XDR? Hopefully, yes. DOM elements probably don't need to be preserved there.
<li>Might be nice to have a lazy script location object that initially just holds the JSScript and PC (cheap to construct, because it avoids consulting the source map to get the line number), but can look up (and memoize) the origin/line on demand. These could hold a weak reference to the JSScript, such that, just before the JSScript goes away, we do the (JSScript, PC) -> (origin/line) computation. This saves a source map lookup when the location is never actually used and the JSScript outlives the lazy script location object.
<li>XDR serialization should just preserve this information; it will need to throw away DOM element references.
<li>Need to be able to pass an origin to eval explicitly (and thus, need the prototype to be public)
<li>Need to be able to pass an origin to eval explicitly (and thus, need the prototype to be public)
<li>It's possible to provide more details about javascript: URLs: where does the URL appear? Who put it there? But it doesn't seem like it should be a high priority.
</ul>
</ul>
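
The lazy script location idea in the list above could look roughly like the following sketch. It models only the memoization part: the class name <code>LazyScriptLocation</code>, the <code>lookup</code> callback standing in for the (JSScript, PC) -> (origin/line) computation, and the fake script object are all hypothetical. The weak reference and the "resolve just before the script goes away" behavior are engine-level concerns that this sketch does not attempt to model.

```javascript
// Hypothetical lazy location: cheap to construct, resolves on first use.
class LazyScriptLocation {
  constructor(script, pc, lookup) {
    this._script = script;
    this._pc = pc;
    this._lookup = lookup; // stand-in for the source map lookup
    this._resolved = null;
  }

  // Perform the lookup at most once and remember the result.
  resolve() {
    if (this._resolved === null) {
      this._resolved = this._lookup(this._script, this._pc);
      this._script = null; // the script is no longer needed once resolved
    }
    return this._resolved;
  }

  get origin() {
    return this.resolve().origin;
  }

  get line() {
    return this.resolve().line;
  }
}

// Fake lookup that counts how often it runs; the pc-to-line rule is made up.
let lookups = 0;
function fakeLookup(script, pc) {
  lookups++;
  return { origin: script.origin, line: 1 + Math.floor(pc / 10) };
}

const fakeScript = { origin: { javascriptURL: "javascript:void(0)" } };
const lazy = new LazyScriptLocation(fakeScript, 42, fakeLookup);

console.log(lookups);                           // 0
console.log(lazy.line);                         // 5
console.log(lazy.origin === fakeScript.origin); // true
console.log(lookups);                           // 1
```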