User:Asqueella/JEP 107: Difference between revisions

m (Created page with 'This is an edited version of Labs/Jetpack/Reboot/JEP/107. == JEP 107 - Page Mods == * Champion: Daniel Buchner - daniel@mozilla.com * Status: Accepted/Pre-Production * Bug …')
 
(→‎TODO: Rename constructor to PageMod - done <http://bitbucket.org/nickolay/jetpack-packages/changeset/d8e31de0c690>)
 
(36 intermediate revisions by the same user not shown)
This is an edited version of [[Labs/Jetpack/Reboot/JEP/107]].
This is an alternative version of [[Labs/Jetpack/Reboot/JEP/107]] ('''page mods''').
 
Here's the [http://bitbucket.org/nickolay/jetpack-packages/src/tip/packages/page-mods/ in-progress implementation] (currently 'Model "B"').
 
----


== JEP 107 - Page Mods ==


* Champion: Daniel Buchner - daniel@mozilla.com
* Champion: Nickolay Ponomarev <asqueella@gmail.com>
* Status: Accepted/Pre-Production
* Status: ?
* Bug Ticket: [https://bugzilla.mozilla.org/show_bug.cgi?id=546739 546739]
* Bug Ticket: {{bug|546739}}
* Type: API
* Difficulty: 4


=== Proposal ===
Page Mods is chiefly aimed at modifying and manipulating content documents within Firefox. Page Mods should do this seamlessly, so that the end result is the only change visible to the user.
Introduce an API allowing jetpacks to run script whenever a content page the jetpack is interested in loads.


=== Key Issues ===
This way of enhancing functionality of web sites was popularized by the [https://addons.mozilla.org/en-US/firefox/addon/748 Greasemonkey extension].  
When documents represented in the "matches" white-list load, we must ensure that script and styles injected into the page are evaluated by the parser in the normal load cycle. Perhaps we could achieve this by dynamically creating resource (or custom protocol) URIs that contain the style and script blocks entered, similar to the Background Page mechanisms. These URIs would be dynamically created CSS/JS files that would then be injected in the appropriate places within the matched documents (CSS files in the doc head, JS files just after the close of the body tag).


=== Dependencies & Requirements ===
{{Note|Some parts of this API are generally agreed on (<code>include</code>, global <code>add()</code>/<code>remove()</code> methods), but there are two execution models under consideration for the scripts actually working with the page (see [[#Discussion - e10s]]).
* Requires [[Labs/Jetpack/Reboot/JEP/104|JEP 104 - Simple Storage]]
* Ability to dynamically create a resource, such a method would require [[Labs/Jetpack/Reboot/JEP/106|JEP 106 - Registered Jetpack URLs]]
* Capturing the page pre-render and injecting resources, for example: linked style sheets in the header and script tags after the body
* Ability to access extension specific resources


== Internal Methods ==
<p>The initial proposal described a model (called "B" in the e10s discussion), but current thinking is that another model, "C", is overall better. The differences between the two models are noted throughout the rest of this proposal.</p>}}


== API Methods ==
<span style="background-color: yellow">[Model "B"]</span> Unlike Greasemonkey scripts, in this proposal the pagemod's scripts all share the same Jetpack context and have the same privileges as the jetpack itself (but see [[#Discussion - e10s]] below).


=== Use Cases ===
* Porting Greasemonkey-style scripts to Jetpack (see [http://ehsanakhgari.org/blog/2010-01-07/bugzilla-tweaks-enhanced Bugzilla tweaks] for Jetpack prototype for example)
* Implementing scripts that enhance certain sites, while also having access to higher-privileged APIs.
* Adding methods and properties to the global window. For example, window.geolocation could have been implemented as a jetpack, and we could implement window.camera or window.microphone as a jetpack in the future. (via bsmedberg)


==== Page Mods <i>Initialization</i> ====
=== Non-Use Cases ===
* Porting Stylish-like CSS-based modifications to Jetpack (this could be added in a later version of the API)


=== Dependencies & Requirements ===
* <code>onWindowCreate</code>, and the current implementation of the module in general, require the fix for {{bug|549539}}, which (as of 2010-05-26) has only landed on mozilla-central (Firefox versions after 3.6.x). The plan is to land it on 3.6.x too, though.
* Making this API work in [[Electrolysis/Jetpack]] requires additional effort from the e10s team (see details on that page)


<pre class="brush:js;">
=== API Methods ===
var myMods = new pageMod({
Here's an example of how the ScriptMod API can be used in a jetpack:
  'include': ['*.google.com', 'jetpack.mozillalabs.com'],
  'exclude': ['https://mail.google.com/*'],
  'style': [
    'http://yui.yahooapis.com/3.0.0/build/cssbase/base-min.css',
    'body { background: #ffffff; font-family: Trebuchet MS; } span.details { font-size: 7px; }'
  ],
  'script': [
    'http://ajax.googleapis.com/ajax/libs/mootools/1.2.4/mootools.js',
    function(){
      $('container').addEvents({
          'click:relay(ul li)': function(){ this.setStyle('background','#000') },
          'mouseenter:relay(ul li)': function(){ this.tween('background','#000') },
          'mouseleave:relay(ul li)': function(){ this.tween('background','#fff') }
      });
    }
  ]
});
</pre>
 
==== Page Mods Method: <i>add</i> ====


<b>Arguments:</b>
<pre class="brush:js; gutter:0">
var ScriptMod = require("page-mod").ScriptMod;
var myMod = new ScriptMod({
  include: ["*.example.com",
            "http://example.org/a/specific/url",
            "http://example.info/*"],


#<b>type</b> - (<i>string</i>) The type of modifier being added. Can be 'include', 'exclude', 'style', or 'script'
  // [Model "B"] The callbacks are specified in the jetpack context:
#<b>data</b> -
  onWindowCreate: function(wrappedWindow) {
#* include: (<i>array</i>) an array of URL strings to apply mods to
    // this runs each time a new content document starts loading, but
#* exclude: (<i>array</i>) an array of URL strings to skip when modding
    // before the page starts loading, so we can't interact with the
#* style: (<i>array</i>) an array of style or resource strings
    // page's DOM here yet.
#* script: (<i>array</i>) an array of functions or resource strings
    wrappedWindow.wrappedJSObject.newExposedProperty = 1;
  },
  onDOMReady: function(wrappedWindow) {
    // at this point we can work with the DOM
    wrappedWindow.document.body.innerHTML = "<h1>Jetpack Page Mods</h1>";
  },


<b>Returns:</b>
  // [Model "C"] The content script is run in a separate context, thus
  // it's specified in a separate file. The specific syntax is not final!
  script: require("self").data.url("my-example-org-mod.js"),
  // we'll also need a way to receive messages from the content
  // script here and maybe track the new pages getting loaded.
});
</pre>


The Page Mods instance
In this proposal, a script mod specifies the pages it might modify, and the scripts to run on these pages.


<b>Notes:</b>
The <code>onWindowCreate</code> callback function gets called at the earliest possible moment (before the page even started loading, which is made possible by the notifications added in {{bug|549539}}).


Modifications passed to the Page Mods instance with this method will be added, and persist, on open and future documents matching all of the URL(s) currently on the 'include' white-list.
The <code>onDOMReady</code> callback is called as soon as the page's DOM is ready (on the [http://wiki.greasespot.net/DOMContentLoaded <code>DOMContentLoaded</code>] event).


<b>Examples:</b>
==== <code>ScriptMod</code> constructor ====
The <code>ScriptMod</code> constructor takes a single <code>options</code> parameter which is an object that may define the following properties:
* <code>include</code>: a required parameter specifying the pages the scripts in this script mod should run on.
** Providing a string value <code>str</code> is equivalent to providing a single-item array <code>[str]</code>.
** The mod's scripts run on pages matching ''any'' of the <code>include</code> rules.
** Each <code>include</code> rule is a string using one of the following formats (see [[#Discussion - format for include|discussion below]]):
**# <code>*</code> (a single asterisk) - any page
**# <code>*.domain.name</code> - pages from the specified domain and all its subdomains, regardless of their scheme.
**# <code><nowiki>http://example.com/*</nowiki></code> - any URLs with the specified prefix.
**# <code><nowiki>http://example.com/test</nowiki></code> - the single specified URL
* <span style="background-color: yellow">[Model "B"]</span> <code>onWindowCreate</code>, <code>onDOMReady</code>: optional parameters specifying the code to run on the matched pages.
** No code is run if these parameters are not specified.
** Providing a single function <code>func</code> is equivalent to providing a single-item array <code>[func]</code>
** When the provided value is an array, its items are expected to be functions. Non-function values are ignored.
** The specified functions are called in order:
*** for <code>onWindowCreate</code> - when a page matching the <code>include</code> rules starts to load (but before any content is loaded in the page -- i.e. when the <code>content-document-global-created</code> notification implemented in {{bug|549539}} is issued)
*** for <code>onDOMReady</code> - when a <code>DOMContentLoaded</code> event fires for the matching page.
** An exception thrown from one of the functions does not stop the rest of the functions from executing.
** The specified callbacks are called with a single <code>wrappedWindow</code> parameter -- the content's <code>window</code> object wrapped in an XPCNativeWrapper. The callback's <code>this</code> is the page mod object ('''TBD''' not currently implemented). It goes without saying that with this syntax the callbacks are run in the calling module's scope, not in the content page's scope.


<pre class="brush:js;">
Creating a <code>ScriptMod</code> instance does '''not''' automatically [[#global-add|add]] (activate) it.
myMods.add('include', ['*.digg.com']);


myMods.add('exclude', ['http://labs.digg.com/*']);
==== <code id="global-add">add()</code> ====
 
<pre class="brush:js; gutter:0">
myMods.add('style', ['body { background: #ffffff; font-family: Trebuchet MS; } span.details { font-size: 7px; }']);
require("page-mod").add(scriptMod)
</pre>


// Creating a function reference
* <code>add()</code> makes the specified script mod take effect on any matching pages that start to load after the call. Adding a script mod does not apply it to existing matching pages.
* <code>scriptMod</code> must be a [[#ScriptMod constructor|<code>ScriptMod</code>]] instance.
* Trying to add the same script mod twice throws an exception.
* This method does not have a return value.


var pageLog = function(){
==== <code id="global-add">remove()</code> ====
  console.log('I just logged a message with Page Mods!');
<pre class="brush:js; gutter:0">
}
require("page-mod").remove(scriptMod)
 
myMods.add('script', pageLog);
</pre>
</pre>
* Call <code>remove()</code> to stop a script mod from running on further pages. This does not undo the mod's effects on already loaded pages.
* <code>scriptMod</code> must be a [[#ScriptMod constructor|<code>ScriptMod</code>]] instance, added earlier.
* Trying to remove a script mod that has not been added throws an exception.
* This method does not have a return value.


==== FUTURE ADDITION: Page Mods Method: <i>remove</i> ====
=== Discussion ===


<b>Arguments:</b>
Extracted from [http://groups.google.com/group/mozilla-labs-jetpack/browse_thread/thread/09deffcc11fa00ea/2ce4c1ed979cb8da this thread].


#<b>type</b> - (<i>string</i>) The type of modifier being added. Can be 'include', 'exclude', 'style', or 'script'
==== Discussion - <code>include</code> option ====
#<b>data</b> -
A short survey of existing formats:
#* include: (<i>array</i>) an array of URL strings to apply mods to
* [http://wiki.greasespot.net/Include_and_exclude_rules Greasemonkey scripts] specify include and exclude URLs, each may contain wildcards ("*") in any location and may use a special ".tld" domain. These rules get compiled to a regular expression (see [http://github.com/greasemonkey/greasemonkey/blob/master/content/convert2RegExp.js convert2RegExp]), which is then matched against every URL loaded in the browser.
#* exclude: (<i>array</i>) an array of URL strings to skip when modding
* [http://code.google.com/chrome/extensions/match_patterns.html Match patterns for Google Chrome's content scripts] are similar to Greasemonkey's, but require the domain to be specified (either fully, as any domain, or as <code>*.domain</code>) and don't have the magic ".tld" domain.
#* style: (<i>array</i>) an array of style or resource strings
** Also of interest: [http://code.google.com/p/chromium/issues/detail?id=18259], [http://groups.google.com/a/chromium.org/group/chromium-extensions/browse_thread/thread/9e3903c0817b5837/3d305eb340f01763]
#* script: (<i>array</i>) an array of functions or resource strings
* When applying CSS through [https://developer.mozilla.org/en/Using_the_Stylesheet_Service the Stylesheet Service], which is an easy and robust way to apply CSS to all content and is also what Stylish uses, you have to describe the filters in CSS itself, i.e. with the [https://developer.mozilla.org/index.php?title=En/CSS/%40-moz-document @-moz-document] rule. It allows specifying a domain, an exact URL, or a URL prefix.


<b>Returns:</b>
Comments: [http://groups.google.com/group/mozilla-labs-jetpack/msg/decd886a1ae37018 Myk #1] [http://groups.google.com/group/mozilla-labs-jetpack/msg/bd83ab801443a05b Myk #2] [http://groups.google.com/group/mozilla-labs-jetpack/msg/5cd0089d3795853e Brian]


The Page Mods instance
The 'include' option is required so that mods clearly specify which pages they apply to (for easier auditing).


<b>Examples:</b>
It was suggested to restrict the schemes of URLs page mods can run on, since letting a page mod run on chrome://, for example, can have security consequences we have not thought through.


<pre class="brush:js;">
==== Discussion - e10s ====
Context: in the 0.5 timeframe it is planned to move jetpacks to their own processes, as described on the [[Electrolysis/Jetpack]] page. In the long term, content tabs will run in their own processes as well ("out-of-process tabs").


myMods.remove('include', '*.google.com');
Communication between different processes is not entirely transparent [http://groups.google.com/group/mozilla-labs-jetpack/msg/8a06fdd83b71242c]: while the jetpack process will be able to call content functions, reference content objects and pass primitive values to content, content won't be able to hold references to jetpack objects. This means it won't be possible to pass a jetpack-defined callback to content functions (with a few possible exceptions).


myMods.remove('exclude', 'http://labs.digg.com/*');
From the discussion referenced above, there are different possible models of pagemods execution (implying different implementation requirements):
<pre>
A. run a script in the web page context, and let it communicate with the
jetpack via postMessage-style APIs. [...]


myMods.remove('style', [
B. run a script in the jetpack context and pass it the window/document for a
  'body { background; font-family; } span.details{}'  //This would remove all styles for the selector
page being loaded. This requires CPOW wrappers, which have some limitations[...]
]);
 
myMods.remove('script', pageLog);


C. Run page mods in the content processes (to avoid getting involved with CPOWs
and their limitations), but in a separate context from the page (to make it
possible to write page mods that can do things that we don't want to expose
to regular pages). My understanding is that it is similar to what Google
Chrome does and similar to what Greasemonkey does.
</pre>
</pre>


==== FUTURE ADDITION: Page Mods Method: <i>empty</i> ====
This proposal currently specifies (B). Comments collected from the discussion:
 
* on A:
<b>Arguments:</b>
** [Benjamin] This is trivially straightforward to do in a multi-process world[...]. But there are issues with polluting the content script namespace (e.g. if the jetpack needs to define functions).
 
** [Nickolay] this means we can't give it [the jetpack script in content] any additional privileges (e.g. by listening for postMessage'd requests asking to do something that requires chrome permissions or by providing additional APIs like GM_* in Greasemonkey). It's fine for simple scripts, but not in general, I think.
#<b>type</b> - (<i>string</i>) The type of modifier being added. Can be 'include', 'exclude', 'style', or 'script'
* on B:
 
** [Myk] Despite the limitations imposed by the requirement for CPOW wrappers, its developer ergonomics appeal to me. It's not yet clear what the relative security implications are, however.
<b>Returns:</b>
** [Nickolay] thinks that inability to register a callback is a major flaw for those who need it (cited the case of using [http://code.google.com/p/gmail-greasemonkey/wiki/GmailGreasemonkey10API Gmail's Greasemonkey API] to get notified of changes in the web app)
 
* on C:
The Page Mods instance
** [Nickolay] suggested this as an optional addition to (B) for scripts that need transparent interaction with content.
 
** [Benjamin] That's attractive in some ways, but it breaks the normal jetpack behavior of being a single script that does everything. I'm not sure it's worth breaking that programming model.
<b>Examples:</b>
** [http://groups.google.com/group/mozilla-labs-jetpack/msg/6c0eb6901d51742a [Myk]] given the technical limitations inherent to [jetpacks and content in separate processes], the question then becomes what feasible model best approximates [the] ideal experience. I think the answer to that question is in fact your suggested model C.
 
** '''Requires additional code in the single-process case, additions to the platform in the e10s case.'''
<pre class="brush:js;">
myMods.empty('include');
</pre>


== Use Cases ==
==== Discussion - comparison to the original JEP ====
This JEP has four main differences from the [[Labs/Jetpack/Reboot/JEP/107|original JEP 107]]:
* CSS-based mods were deferred to a later version of the API.
* This JEP doesn't promise enabling/disabling page mods "instantly", since I don't see a way to implement it.
* Scripts in the original JEP run in the context of the page, while in this JEP they run in the jetpack context. Although it's an important feature, I think it can be implemented separately, since it requires substantially more effort and additional coordination for e10s.
* <code>add/remove/empty</code> methods on the page mod object were not included, since there's no clear use case for them, especially if the changes are not applied instantly, as in this proposal.


# Creation of CSS-based add-ons like Stylish, EditCSS, etc...
==== Discussion - script context in Model C ====
# Creation of JS-based add-ons like Execute JS, JS Exec etc...
Background: Chrome's [http://code.google.com/chrome/extensions/content_scripts.html Content Scripts] and [http://code.google.com/chrome/extensions/messaging.html Message Passing].
# In General: Any Greasemonkey-style add-on, with the advantage that this API would allow for far greater flexibility - turning on and off only certain parts of a mod, automatically flashing a new url/web-page with the active parts of a mod by using the <i>add</i> method to include a new match to the matches white-list


=== Common Actions ===
A few issues here:
* '''[RESOLVED]''' At which point does the separate script run, how does it declare it wants to do something on-window-created or on-DOM-ready.
** (Chrome has "run_at" option in the content script's manifest, which defaults to "document_idle" meaning "sometime between DOMReady and soon after onload" with other options being DOMReady and WindowCreated. It's not clear how important this is for performance and why.)
** Since we want to make it possible to run code on-window-created to install APIs, we'll run the pagemod script before load starts and provide onReady callback or have a DOMContentLoaded example in documentation.
* '''[UNRESOLVED]''' How do we let the script include common libraries / modularize its code
** [Nickolay] Chrome lists the scripts to be loaded (in the single content script context) in order in the manifest. Simple and similar to web pages, but different from jetpack execution model.
** Possible option: provide <code>require()</code> to content scripts, like in the main jetpack
*** Pros
**** [Myk/Brian] we do need a way for the mod context to access self.data.url, JavaScript libraries like jQuery, and probably some other functionality. And modules are our hammer.
**** [[http://groups.google.com/group/mozilla-labs-jetpack/msg/b6a054635076a25e Myk]] there is a long tail of built-in functionality that page mods might want to access [..] We could design another mechanism to expose APIs to page mod modules, but that mechanism would either be [..] limited  [..] duplicate what "require" already provides.
***** [[http://groups.google.com/group/mozilla-labs-jetpack/msg/42ff72faf47e91f3 Benjamin]] strongly disagree here. Page mods, if they wish to access all these other bits, should use message-passing to the main addon code.
*** Cons
**** [Myk] a bit worried about the potential for confusion due to conflating the two spaces by providing both with the same interface for importing functionality but not allowing one to import the same functionality as the other.
**** [Nickolay] we should also remember that the content scripts (and the related CommonJS machinery) will reload every single page load. I think that while the CommonJS hammer is attractive, the content scripts should generally not be as complex as to require it. We can add it later if there's need.
** [Brian] Perhaps the first release will not provide a require() function, and then later (once we figure out our story for the "search path" for this context and how it differs from the other modules), we can make it available.
** [Brian] I'm vaguely thinking that the PageMod() constructor, next to the script: argument, could provide a list of libraries that are made available to that script. Maybe a mapping, like: <code>scriptlibs: { jquery: data.url("jquery.js") } }</code>. Allowing my-example-org-mod.js to use: <code>var jq = require("jquery");</code>
* '''[RESOLVED?]''' What is the script's global, how does it access page's Window and Document
** [Nickolay] Both GreaseMonkey and Chrome create a clean object as the script's global with __proto__ set to XPCNativeWrapper(contentWindow) and necessary globals added to it (GM_*, chrome.extension.*). I think we should implement a scheme like GM's/Chrome's, since it will be the most familiar and intuitive. [[http://groups.google.com/group/mozilla-labs-jetpack/msg/b60602ab3bca1517 Myk] and [http://groups.google.com/group/mozilla-labs-jetpack/msg/42ff72faf47e91f3 Benjamin agreed this is a good solution for jetpack as well]. Other options are listed in Benjamin's message.]
** [Brian] Passing the window as an argument [to a callback] (versus providing it to the whole module as a global) seems more in keeping with the "The Number Of Globals In A CommonJS Module Shall Be Two: require and exports" pattern. [...]
*** [Myk,Nickolay] given that the sole purpose of "page mod" modules is to access the pages they are modifying, it's worth simplifying access to the "window" object by defining it globally
* How does the script communicate with the main jetpack
** GreaseMonkey defines several GM_* globals to provide additional functionality to GM scripts.
** [Nickolay] Chrome implements bidirectional asynchronous message passing via <code>chrome.extension.sendRequest(json, responseCallback)</code> and another, pipe-based (<code>Port</code>) API for long-lived connections.
** [Nickolay] If we are going to allow exporting APIs (e.g. window.microphone) via this mechanism, we might need sync content->jetpack messaging. bsmedberg also mentioned this as a possibility.
** [Brian] suggested a similar pipe-based mechanism: <code>onNewPage: function (pipe) {</code> in the ScriptMod options. "Instead of a "pipe" argument, maybe the onNewPage function should get a "control" object, from which it can manipulate the pipe, ask about the URL from which the target page was loaded, and register to hear about the page going away. The latter would be necessary for the all-volume-control jetpack to remove closed pages from its list."
** [Benjamin] Start with something like <code>addon.postMessage(JSON.stringify(messageobj), '*');</code> to re-use existing DOM mindshare (this doesn't do RPC).


The API, if done in this fashion, gives the developer the ability to dramatically simplify application actions such as:
=== TODO ===
* Creating an instance of Page Mods that adds script or styles to a set of matched urls
* For SDK 0.5:
* Further extending an existing instance of Page Mods with additional styles and script
*# <s>Rename constructor to PageMod [http://groups.google.com/group/mozilla-labs-jetpack/msg/008c7dba5e4d21a1]</s>
* Toggling on and off specific styles or script within a Page Mods instance
*# Finalize the format for "include" rules and implement the necessary changes.
* Adding new matches to a Page Mods instance, which in turn instantly applies active styles and script within that instance to the newly added matches.
*#* Restrict "*" to only match HTTP(S)+FTP [http://groups.google.com/group/mozilla-labs-jetpack/msg/e23a845e70561473] [http://groups.google.com/group/mozilla-labs-jetpack/msg/2e7eb312e61820bb]
* Multiple instances of Page Mods can be instantiated, which enables a whole cadre of functionality that the object-bound 'singleton' implementation neglects.
*# Identify changes required for [[Electrolysis/Jetpack]] and implement them.
*# Fix the remaining XXX:
*#* minor tweaks
*#* disable test on not supported Firefox versions (e.g. 3.6.3) -- is it needed or will jetpack drop support for 3.6.x with e10s anyway?
*#* (maybe) figure out leak report in tests if the test tab is not closed before stopping tests.
*#* Do we pass an XPCNativeWrapper to pagemod callbacks and do we advertise it in the docs?
*#** [http://groups.google.com/group/mozilla-labs-jetpack/msg/bd83ab801443a05b Myk]: "the argument the callback functions are passed should be called simply "window" in the documentation rather than wrappedWindow, as developers are unlikely to encounter the differences"
*#** [http://groups.google.com/group/mozilla-labs-jetpack/msg/e23a845e70561473 Nickolay]: disagreed - XPCNW are very visible, "I'm keeping wrappedWindow for now, pending the decision to just pass an unwrapped value to the callback."
*#* [docs] encourage addon developers to clean up after their page mods via the unload module. (The clean up actions should also run when the script mod is removed).
*#* [docs] Should provide an example of using jQuery in a script mod.
*#* If we keep model "B"'s APIs make sure to implement the latest naming changes suggested by Myk.
* Post-0.5:
*# Possible API enhancements:
*#* implement helper functions for common actions (insert <style>s, <script>s, etc.)
*#* filtering functions as part of <code>include</code> ([http://groups.google.com/group/mozilla-labs-jetpack/msg/bd83ab801443a05b])
*#* CSS-based mods
*# Provide a way to run scripts in separate context for each page (i.e. in the content process for out-of-process tabs)

Latest revision as of 16:22, 5 June 2010

This is an alternative version of Labs/Jetpack/Reboot/JEP/107 (page mods).

Here's the in-progress implementation (currently 'Model "B"').


JEP 107 - Page Mods

  • Champion: Nickolay Ponomarev <asqueella@gmail.com>
  • Status: ?
  • Bug Ticket: bug 546739
  • Type: API

Proposal

Introduce an API allowing jetpacks to run script whenever a content page the jetpack is interested in loads.

This way of enhancing functionality of web sites was popularized by the Greasemonkey extension.

Note: Some parts of this API are generally agreed on (include, global add()/remove() methods), but there are two execution models under consideration for the scripts actually working with the page (see #Discussion - e10s).

The initial proposal described a model (called "B" in the e10s discussion), but current thinking is that another model, "C", is overall better. The differences between the two models are noted throughout the rest of this proposal.

[Model "B"] Unlike Greasemonkey scripts, in this proposal the pagemod's scripts all share the same Jetpack context and have the same privileges as the jetpack itself (but see #Discussion - e10s below).

Use Cases

  • Porting Greasemonkey-style scripts to Jetpack (see Bugzilla tweaks for Jetpack prototype for example)
  • Implementing scripts that enhance certain sites, while also having access to higher-privileged APIs.
  • Adding methods and properties to the global window. For example, window.geolocation could have been implemented as a jetpack, and we could implement window.camera or window.microphone as a jetpack in the future. (via bsmedberg)

Non-Use Cases

  • Porting Stylish-like CSS-based modifications to Jetpack (this could be added in a later version of the API)

Dependencies & Requirements

  • onWindowCreate, and the current implementation of the module in general, require the fix for bug 549539, which (as of 2010-05-26) has only landed on mozilla-central (Firefox versions after 3.6.x). The plan is to land it on 3.6.x too, though.
  • Making this API work in Electrolysis/Jetpack requires additional effort from the e10s team (see details on that page)

API Methods

Here's an example of how the ScriptMod API can be used in a jetpack:

var ScriptMod = require("page-mod").ScriptMod;
var myMod = new ScriptMod({
  include: ["*.example.com",
            "http://example.org/a/specific/url",
            "http://example.info/*"],

  // [Model "B"] The callbacks are specified in the jetpack context:
  onWindowCreate: function(wrappedWindow) {
    // this runs each time a new content document starts loading, but
    // before the page starts loading, so we can't interact with the
    // page's DOM here yet.
    wrappedWindow.wrappedJSObject.newExposedProperty = 1;
  },
  onDOMReady: function(wrappedWindow) {
    // at this point we can work with the DOM
    wrappedWindow.document.body.innerHTML = "<h1>Jetpack Page Mods</h1>";
  },

  // [Model "C"] The content script is run in a separate context, thus
  // it's specified in a separate file. The specific syntax is not final!
  script: require("self").data.url("my-example-org-mod.js"),
  // we'll also need a way to receive messages from the content
  // script here and maybe track the new pages getting loaded.
});

In this proposal, a script mod specifies the pages it might modify, and the scripts to run on these pages.

The onWindowCreate callback function gets called at the earliest possible moment (before the page even started loading, which is made possible by the notifications added in bug 549539).

The onDOMReady callback is called as soon as the page's DOM is ready (on the DOMContentLoaded event).

ScriptMod constructor

The ScriptMod constructor takes a single options parameter which is an object that may define the following properties:

  • include: a required parameter specifying the pages the scripts in this script mod should run on.
    • Providing a string value str is equivalent to providing a single-item array [str].
    • The mod's scripts run on pages matching any of the include rules.
    • Each include rule is a string using one of the following formats (see discussion below):
      1. * (a single asterisk) - any page
      2. *.domain.name - pages from the specified domain and all its subdomains, regardless of their scheme.
      3. http://example.com/* - any URLs with the specified prefix.
      4. http://example.com/test - the single specified URL
  • [Model "B"] onWindowCreate, onDOMReady: optional parameters specifying the code to run on the matched pages.
    • No code is run if these parameters are not specified.
    • Providing a single function func is equivalent to providing a single-item array [func]
    • When the provided value is an array, its items are expected to be functions. Non-function values are ignored.
    • The specified functions are called in order:
      • for onWindowCreate - when a page matching the include rules starts to load (but before any content is loaded in the page -- i.e. when the content-document-global-created notification implemented in bug 549539 is issued)
      • for onDOMReady - when a DOMContentLoaded event fires for the matching page.
    • An exception thrown from one of the functions does not stop the rest of functions from executing.
    • The specified callbacks are called with a single wrappedWindow parameter -- the content's window object wrapped in an XPCNativeWrapper. The callback's this is the page mod object (TBD not currently implemented). It goes without saying that with this syntax the callbacks are run in the calling module's scope, not in the content page's scope.
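The callback rules above (a single function is treated as a one-item array, non-function items are ignored, callbacks run in order, and an exception in one does not stop the rest) can be sketched in plain JavaScript. This is only an illustration under stated assumptions, not the actual implementation: normalizeCallbacks, runCallbacks and reportError are hypothetical names, the window is a stand-in object rather than an XPCNativeWrapper, and passing the page mod as this follows the TBD item above.

```javascript
// Hypothetical helper (not part of the proposed API): normalizes the value of
// onWindowCreate / onDOMReady into a list of functions.
function normalizeCallbacks(value) {
  var list = Array.isArray(value) ? value : [value];
  // Non-function items (including a missing option) are ignored.
  return list.filter(function (item) {
    return typeof item === "function";
  });
}

// Calls every callback in order, with the page mod as `this` and the wrapped
// window as the only argument. `reportError` is an assumed error sink.
function runCallbacks(value, pageMod, wrappedWindow, reportError) {
  normalizeCallbacks(value).forEach(function (callback) {
    try {
      callback.call(pageMod, wrappedWindow);
    } catch (e) {
      // An exception must not prevent the remaining callbacks from running.
      reportError(e);
    }
  });
}

// Usage with a stand-in window object:
var log = [];
var fakeWindow = { document: { title: "Example" } };
runCallbacks(
  [
    function (win) { log.push("first: " + win.document.title); },
    "not a function",
    function () { throw new Error("boom"); },
    function () { log.push("last"); }
  ],
  {},
  fakeWindow,
  function (e) { log.push("error: " + e.message); }
);
```

Here the string item is skipped and the throwing callback is reported through reportError, while the callbacks before and after it still run.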

Creating a ScriptMod instance does not automatically add (activate) it.

add()

require("page-mod").add(scriptMod)
  • add() makes the specified script mod take effect on any matching pages that start to load after the call. Adding a script mod does not apply it to existing matching pages.
  • scriptMod must be a ScriptMod instance.
  • Trying to add the same script mod twice throws an exception.
  • This method does not have a return value.

remove()

require("page-mod").remove(scriptMod)
  • Call remove() to stop a script mod from running on further pages. This does not undo the mod's effects on already loaded pages.
  • scriptMod must be a ScriptMod instance, added earlier.
  • Trying to remove a script mod that has not been added throws an exception.
  • This method does not have a return value.
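The add()/remove() contract can be summarised with a small in-memory registry. This is a sketch under assumptions, not the module's code: createRegistry and modsForNewPage are hypothetical names, the ScriptMod below is a minimal stand-in for the real constructor, and throwing a TypeError for a non-ScriptMod argument is a guess (the text only says the argument must be a ScriptMod instance).

```javascript
// Minimal stand-in for the ScriptMod constructor described above.
function ScriptMod(options) {
  this.include = options.include;
}

// Hypothetical registry holding the currently active script mods.
function createRegistry() {
  var active = [];
  return {
    add: function (scriptMod) {
      if (!(scriptMod instanceof ScriptMod)) {
        // Assumption: the proposal does not say what happens in this case.
        throw new TypeError("expected a ScriptMod instance");
      }
      if (active.indexOf(scriptMod) !== -1) {
        throw new Error("script mod has already been added");
      }
      active.push(scriptMod);
      // No return value, as specified.
    },
    remove: function (scriptMod) {
      var index = active.indexOf(scriptMod);
      if (index === -1) {
        throw new Error("script mod has not been added");
      }
      active.splice(index, 1);
    },
    // Snapshot of the mods to consider for a page that starts loading now;
    // pages that are already loaded are never revisited.
    modsForNewPage: function () {
      return active.slice();
    }
  };
}
```

Because only pages that start loading after add() consult the registry, adding or removing a mod never touches pages that are already open, which matches the "no instant apply or undo" behaviour described above.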

Discussion

Extracted from this thread.

Discussion - include option

A short survey of existing formats:

  • Greasemonkey scripts specify include and exclude URLs, each of which may contain wildcards ("*") in any location and may use a special ".tld" domain. These rules get compiled to a regular expression (see convert2RegExp), which is then matched against every URL loaded in the browser.
  • Match patterns for Google Chrome's content scripts are similar to Greasemonkey's, but require the domain to be specified (either fully, as any domain, or as *.domain) and don't have the magic ".tld" domain.
  • When applying CSS via the Stylesheet Service (see "Using the Stylesheet Service"), which is an easy and robust way to apply CSS to all content and is also what Stylish uses, you have to describe the filters in CSS, i.e. with an @-moz-document rule. It allows specifying a domain, an exact URL, or a URL prefix.

Comments: Myk #1 Myk #2 Brian

The 'include' option is made required to make the mods clearly specify which pages they apply to (for easier auditing).

It was suggested to restrict the schemes of URLs page mods can run on, since letting a page mod run on chrome://, for example, can have security consequences we have not thought through.
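To make the four include formats from the constructor section concrete, here is a minimal matcher sketch. It rests on several assumptions: hostOf, matchesRule and matchesInclude are hypothetical names, the standard URL global is used for host parsing, and the scheme restriction suggested above is not implemented, since the rule format is not final.

```javascript
// Hypothetical helper: extracts the host name, or null for unparsable URLs.
function hostOf(url) {
  try {
    return new URL(url).hostname;
  } catch (e) {
    return null;
  }
}

// Checks one include rule against a URL.
function matchesRule(rule, url) {
  if (rule === "*") {
    return true; // format 1: any page
  }
  if (rule.indexOf("*.") === 0) {
    // format 2: the domain and all its subdomains, regardless of scheme
    var domain = rule.slice(2);
    var host = hostOf(url);
    return host !== null && (host === domain || host.endsWith("." + domain));
  }
  if (rule.charAt(rule.length - 1) === "*") {
    // format 3: any URL with the given prefix
    return url.indexOf(rule.slice(0, -1)) === 0;
  }
  return url === rule; // format 4: the single specified URL
}

// A string is treated as a single-item array; any matching rule is enough.
function matchesInclude(include, url) {
  var rules = typeof include === "string" ? [include] : include;
  return rules.some(function (rule) {
    return matchesRule(rule, url);
  });
}
```

For example, matchesInclude("*.example.com", "https://www.example.com/x") is true, while a look-alike host such as notexample.com does not match, because the comparison is made on whole domain labels rather than on raw string suffixes.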

Discussion - e10s

Context: in the 0.5 timeframe it is planned to move jetpacks to their own processes, as described on the Electrolysis/Jetpack page. In the long term, content tabs will run in their own processes as well ("out-of-process tabs").

Communication between different processes is not entirely transparent [3]: while the jetpack process will be able to call content functions, reference content objects and pass primitive values to content, content won't be able to hold references to jetpack objects. This means it won't be possible to pass a jetpack-defined callback to content functions (with a few possible exceptions).

From the discussion referenced above, there are different possible models of pagemods execution (implying different implementation requirements):

A. run a script in the web page context, and let it communicate with the
jetpack via postMessage-style APIs. [...]

B. run a script in the jetpack context and pass it the window/document for a
page being loaded. This requires CPOW wrappers, which have some limitations[...]

C. Run page mods in the content processes (to avoid getting involved with CPOWs
and their limitations), but in a separate context from the page (to make it
possible to write page mods that can do things that we don't want to expose
to regular pages). My understanding is that it is similar to what Google
Chrome does and similar to what Greasemonkey does.

This proposal currently specifies (B). Comments collected from the discussion:

  • on A:
    • [Benjamin] This is trivially straightforward to do in a multi-process world[...]. But there are issues with polluting the content script namespace (e.g. if the jetpack needs to define functions).
    • [Nickolay] this means we can't give it [the jetpack script in content] any additional privileges (e.g. by listening for postMessage'd requests asking to do something that requires chrome permissions or by providing additional APIs like GM_* in Greasemonkey). It's fine for simple scripts, but not in general, I think.
  • on B:
    • [Myk] Despite the limitations imposed by the requirement for CPOW wrappers, its developer ergonomics appeal to me. It's not yet clear what the relative security implications are, however.
    • [Nickolay] thinks that inability to register a callback is a major flaw for those who need it (cited the case of using Gmail's Greasemonkey API to get notified of changes in the web app)
  • on C:
    • [Nickolay] suggested this as an optional addition to (B) for scripts that need transparent interaction with content.
    • [Benjamin] That's attractive in some ways, but it breaks the normal jetpack behavior of being a single script that does everything. I'm not sure it's worth breaking that programming model.
    • [Myk] given the technical limitations inherent to [jetpacks and content in separate processes], the question then becomes what feasible model best approximates [the] ideal experience. I think the answer to that question is in fact your suggested model C.
    • Requires additional code in the single-process case, additions to the platform in the e10s case.

Discussion - comparison to the original JEP

This JEP has four main differences from the original JEP 107:

  • CSS-based mods were deferred to a later version of the API.
  • This JEP doesn't promise enabling/disabling page mods "instantly", since I don't see a way to implement it.
  • Scripts in the original JEP run in the context of the page, while in this JEP they run in the jetpack context. Although it's an important feature, I think it can be implemented separately, since it requires substantially more effort and additional coordination for e10s.
  • add/remove/empty methods on the page mod object were not included, since there's no clear use case for them, especially if the changes are not applied instantly, as in this proposal.

Discussion - script context in Model C

Background: Chrome's Content Scripts and Message Passing.

A few issues here:

  • [RESOLVED] At which point does the separate script run, how does it declare it wants to do something on-window-created or on-DOM-ready.
    • (Chrome has "run_at" option in the content script's manifest, which defaults to "document_idle" meaning "sometime between DOMReady and soon after onload" with other options being DOMReady and WindowCreated. It's not clear how important this is for performance and why.)
    • Since we want to make it possible to run code on-window-created to install APIs, we'll run the pagemod script before load starts and provide an onReady callback or have a DOMContentLoaded example in the documentation.
  • [UNRESOLVED] How do we let the script include common libraries / modularize its code
    • [Nickolay] Chrome lists the scripts to be loaded (in the single content script context) in order in the manifest. Simple and similar to web pages, but different from jetpack execution model.
    • Possible option: provide require() to content scripts, like in the main jetpack
      • Pros
        • [Myk/Brian] we do need a way for the mod context to access self.data.url, JavaScript libraries like jQuery, and probably some other functionality. And modules are our hammer.
        • [Myk] there is a long tail of built-in functionality that page mods might want to access [..] We could design another mechanism to expose APIs to page mod modules, but that mechanism would either be [..] limited [..] duplicate what "require" already provides.
          • [Benjamin] strongly disagree here. Page mods, if they wish to access all these other bits, should use message-passing to the main addon code.
      • Cons
        • [Myk] a bit worried about the potential for confusion due to conflating the two spaces by providing both with the same interface for importing functionality but not allowing one to import the same functionality as the other.
        • [Nickolay] we should also remember that the content scripts (and the related CommonJS machinery) will reload on every single page load. I think that while the CommonJS hammer is attractive, the content scripts should generally not be so complex as to require it. We can add it later if there's a need.
    • [Brian] Perhaps the first release will not provide a require() function, and then later (once we figure out our story for the "search path" for this context and how it differs from the other modules), we can make it available.
    • [Brian] I'm vaguely thinking that the PageMod() constructor, next to the script: argument, could provide a list of libraries that are made available to that script. Maybe a mapping, like: scriptlibs: { jquery: data.url("jquery.js") }. Allowing my-example-org-mod.js to use: var jq = require("jquery");
  • [RESOLVED?] What is the script's global, how does it access page's Window and Document
    • [Nickolay] Both Greasemonkey and Chrome create a clean object as the script's global with __proto__ set to XPCNativeWrapper(contentWindow) and necessary globals added to it (GM_*, chrome.extension.*). I think we should implement a scheme like GM's/Chrome's, since it will be the most familiar and intuitive. [Myk and Benjamin agreed this is a good solution for jetpack as well. Other options are listed in Benjamin's message.]
    • [Brian] Passing the window as an argument [to a callback] (versus providing it to the whole module as a global) seems more in keeping with the "The Number Of Globals In A CommonJS Module Shall Be Two: require and exports" pattern. [...]
      • [Myk,Nickolay] given that the sole purpose of "page mod" modules is to access the pages they are modifying, it's worth simplifying access to the "window" object by defining it globally
  • How does the script communicate with the main jetpack
    • Greasemonkey defines several GM_* globals to provide additional functionality to GM scripts.
    • [Nickolay] Chrome implements bidirectional asynchronous message passing via chrome.extension.sendRequest(json, responseCallback) and another pipe (Port) based API for long-lived connections.
    • [Nickolay] If we are going to allow exporting APIs (e.g. window.microphone) via this mechanism, we might need sync content->jetpack messaging. bsmedberg also mentioned this as a possibility.
    • [Brian] suggested a similar pipe-based mechanism: onNewPage: function (pipe) { in the ScriptMod options. "Instead of a "pipe" argument, maybe the onNewPage function should get a "control" object, from which it can manipulate the pipe, ask about the URL from which the target page was loaded, and register to hear about the page going away. The latter would be necessary for the all-volume-control jetpack to remove closed pages from its list."
    • [Benjamin] Start with something like addon.postMessage(JSON.stringify(messageobj), '*'); to re-use existing DOM mindshare (this doesn't do RPC).
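The global-object scheme and the postMessage-style channel discussed above can be combined in a small sketch. Everything here is an assumption made for illustration: createScriptGlobal and the outbox array are hypothetical, a plain object stands in for XPCNativeWrapper(contentWindow), Object.create is used in place of assigning __proto__, and the addon.postMessage signature simply mirrors Benjamin's suggestion without any real cross-process delivery.

```javascript
// Builds a clean global for a content script: lookups fall through to the
// (wrapped) window via the prototype chain, while new globals defined by the
// script stay on the clean object and do not pollute the page.
function createScriptGlobal(wrappedWindow, outbox) {
  var scriptGlobal = Object.create(wrappedWindow);
  // Extra global added for the script, as in GM_* or chrome.extension.*.
  scriptGlobal.addon = {
    // One-way, JSON-only channel; targetOrigin is accepted but ignored here.
    postMessage: function (message, targetOrigin) {
      outbox.push(message);
    }
  };
  return scriptGlobal;
}

// Stand-in for the wrapped content window.
var pageWindow = {
  location: { href: "http://example.org/" },
  document: { title: "Example" }
};
var outbox = [];
var scriptGlobal = createScriptGlobal(pageWindow, outbox);

// What a content script might do with its global:
scriptGlobal.helperCount = 1; // defined on the clean global only
scriptGlobal.addon.postMessage(
  JSON.stringify({ title: scriptGlobal.document.title }),
  "*"
);

// The jetpack side would receive the serialized message:
var received = JSON.parse(outbox[0]);
```

This covers only the one-way, non-RPC case; a request/response or pipe-based API like the ones mentioned above would need to be layered on top.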

TODO

  • For SDK 0.5:
    1. Rename constructor to PageMod [4]
    2. Finalize the format for "include" rules and implement the necessary changes.
      • Restrict "*" to only match HTTP(S)+FTP [5] [6]
    3. Identify changes required for Electrolysis/Jetpack and implement them.
    4. Fix the remaining XXX:
      • minor tweaks
      • disable test on not supported Firefox versions (e.g. 3.6.3) -- is it needed or will jetpack drop support for 3.6.x with e10s anyway?
      • (maybe) figure out leak report in tests if the test tab is not closed before stopping tests.
      • Do we pass an XPCNativeWrapper to pagemod callbacks and do we advertise it in the docs?
        • Myk: "the argument the callback functions are passed should be called simply "window" in the documentation rather than wrappedWindow, as developers are unlikely to encounter the differences"
        • Nickolay: disagreed - XPCNW are very visible, "I'm keeping wrappedWindow for now, pending the decision to just pass an unwrapped value to the callback."
      • [docs] encourage addon developers to clean up after their page mods via the unload module. (The clean up actions should also run when the script mod is removed).
      • [docs] Should provide an example of using jQuery in a script mod.
      • If we keep model "B"'s APIs, make sure to implement the latest naming changes suggested by Myk.
  • Post-0.5:
    1. Possible API enhancements:
      • implement helper functions for common actions (insert <style>s, <script>s, etc.)
      • filtering functions as part of include ([7])
      • CSS-based mods
    2. Provide a way to run scripts in separate context for each page (i.e. in the content process for out-of-process tabs)