LocalStorage

From MozillaWiki
Revision as of 00:16, 29 June 2005 by Roc (talk | contribs) (→‎Storage)

Problem

The web is an application platform. There are millions of people who use the web as their primary tool to read email, store and edit documents and search for information. However, the web is only usable as an interface when the network is available.

When people want to take an interface that they have designed for the web and make it work when the network is not available, they have only a couple of options. They can write an application that the user installs on their hard drive and runs as a full native application. This could be written using native tools (e.g. a Windows application) or as an interpreted application using an OS-in-a-box (e.g. against Sun's proprietary Java implementation or something .NET based). This nearly always involves creating a second interface in addition to the "main" web interface, which is an expensive proposition for software developers. The inability of most solutions in this area to leverage the existing front-end web code should be considered a failure.

A second problem is that installing software on your computer often involves an all-or-nothing choice. Either you install the software and give it complete access to your system, or you don't install it and you lose the benefits of being able to work offline. Victims of spyware on Windows understand this pain. It would be nice if we could offer a system that sandboxes data for a web application so that it gets no access to local resources beyond what is normally available to a standard web application. Our extensions system also suffers from this problem.

The last problem that we believe needs to be solved is the problem surrounding how difficult it is to install software. We believe that for most users installing software is hard and scary. We believe that this is one of the reasons why IE has such a huge advantage over us in the market. It would be nice if we could offer a solution that included no obvious software installation. A good measure would be that adding a new offline "application" would be as easy as making a bookmark. Most users know how to make bookmarks and consider it to be a safe operation. This is in contrast to Extensions, which solve most of the above problems with the exception of trust and ease of interface.

Goals

These three problems lead us to a few specific goals:

1. The system should leverage the web technologies that exist today. This means that JavaScript, CSS and the DOM are the main technologies used.

2. The system should use an incremental approach that allows web developers the ability to add this to their sites with very little cost or development time.

3. The system should operate within the security principal of the original web site that provides the application, and except for an additional set of APIs that the application can use, the apps get no more permissions.

4. The system should be so easy to use and safe-looking that using it does not make users uncomfortable. This means no installation dialogs, no preferences and no progress bars. In fact, users shouldn't even know they are using it.

Development strategy

Deployment

The first thing that we need to describe is what makes up a web application. This generally consists of a set of files: HTML or XML to describe a basic document hierarchy, JavaScript to manipulate it and CSS to describe how to render it. A manifest must be included that describes all of the components that make up an application.

Another problem with deployment is versioning. You have to know that a particular browser version supports a particular set of APIs. With multiple versions of various browser implementations out there, this creates a large matrix of support and testing. It would be nice if the API that a browser supports also included a capability-based versioning scheme. This also has the advantage that some browsers (like a small handheld browser with limited storage) might only have to implement part of the API.
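A minimal sketch of how a page might use capability-based versioning, assuming the supportsCapability call proposed under Deployment APIs below. The capability names ("storage", "storage.search") and the makeBrowser stand-in are hypothetical; they exist only so the sketch can run outside a browser.

```javascript
// Stand-in for a browser that implements only part of the API.
// The capability names are illustrative, not part of any specification.
function makeBrowser(capabilities) {
  const supported = new Set(capabilities);
  return {
    supportsCapability(aCapability) {
      return supported.has(aCapability);
    },
  };
}

// Pick the richest feature level the browser reports, falling back to
// plain online behaviour when no storage support is available.
function chooseMode(browser) {
  if (!browser.supportsCapability("storage")) {
    return "online-only";
  }
  return browser.supportsCapability("storage.search")
    ? "offline-with-search"
    : "offline-basic";
}

const handheld = makeBrowser(["storage"]);
const desktop = makeBrowser(["storage", "storage.search"]);
const legacy = makeBrowser([]);

console.log(chooseMode(handheld)); // offline-basic
console.log(chooseMode(desktop));  // offline-with-search
console.log(chooseMode(legacy));   // online-only
```

Asking about capabilities rather than browser versions keeps the test matrix down to the features an application actually uses.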

roc: I think the right thing to do here is to make an offline application just a set of pinned pages in the cache. Then you can write a single set of pages that can function online or offline. HTTP caching instructions take care of application updating. We could automatically allow a site to pin N% of your disk space, if your free disk space is >M% of total; if they try to go over that, the user has to approve it.
Instead of providing a manifest, a more Web-friendly approach might be to crawl the page. Basically, when you select a page for offline viewing (which could be as simple as just bookmarking it), we pin it and also download and pin any of the resources we would if we were doing Save As --- images, stylesheets, scripts, IFRAMEs. In addition we could provide a new 'rel' value for <a> links that says "crawl this link for offline viewing". This would be really easy for Web developers to use and maintain and not too difficult to implement, I suspect. Application crawling and download could be done in the background.
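The crawl described above could look roughly like the following sketch. It works on a hypothetical in-memory description of a site rather than real HTML, and "offline" is an assumed name for the new rel value; neither is defined anywhere in this proposal.

```javascript
// Hypothetical in-memory "web": each page lists the resources it embeds
// and its links. rel: "offline" is an assumed name for the new rel value.
const site = {
  "/index.html": {
    resources: ["/app.css", "/app.js"],
    links: [
      { href: "/compose.html", rel: "offline" },
      { href: "/help.html", rel: "" },
    ],
  },
  "/compose.html": {
    resources: ["/app.css", "/compose.js"],
    links: [{ href: "/index.html", rel: "offline" }],
  },
  "/help.html": { resources: ["/help.css"], links: [] },
};

// Breadth-first crawl from the bookmarked page. Embedded resources are
// always pinned; links are followed only when marked for offline use.
function collectPinSet(pages, startUrl) {
  const pinned = new Set();
  const queue = [startUrl];
  while (queue.length > 0) {
    const url = queue.shift();
    if (pinned.has(url)) continue; // already handled (also breaks cycles)
    pinned.add(url);
    const page = pages[url];
    if (!page) continue; // unknown URL: pin it but do not crawl further
    for (const res of page.resources) pinned.add(res);
    for (const link of page.links) {
      if (link.rel === "offline" && !pinned.has(link.href)) {
        queue.push(link.href);
      }
    }
  }
  return [...pinned].sort();
}

console.log(collectPinSet(site, "/index.html"));
```

In this sketch /help.html is never pinned because its link is not marked for offline viewing, and the link from /compose.html back to /index.html does not cause a loop.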

Storage

It's important to think about the kinds of strategies we want to use for apps to store data. Very often there's a structured database on the backend of a web site. We believe that following that model for our data model makes it very easy to reflect that data back into the client-side store with very little effort, and encourages the building of automated utilities to do so. However, it's not our goal to create a fully featured relational database on the client. It's important to find the correct mix between enough complexity to get things done and the simplicity that makes things easy to use.

We believe that supporting storage and querying through the two usual models is important. These are query by key (maps to dictionaries) and iteration and lookup by offset (maps to arrays).
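As a rough illustration of the two access models, the sketch below keeps the same rows reachable both by key and by offset. The Table class and its method names are made up for this example and are not part of the proposed API.

```javascript
// Minimal in-memory table that supports both access models.
// The class and method names are illustrative only.
class Table {
  constructor(keyColumn) {
    this.keyColumn = keyColumn;
    this.rows = [];         // insertion order, for iteration and offsets
    this.byKey = new Map(); // key value -> row, for dictionary-style lookup
  }

  add(row) {
    const key = row[this.keyColumn];
    if (this.byKey.has(key)) {
      throw new Error("duplicate key: " + key);
    }
    this.rows.push(row);
    this.byKey.set(key, row);
  }

  // Dictionary model: look a row up by its key value.
  getByKey(key) {
    return this.byKey.has(key) ? this.byKey.get(key) : null;
  }

  // Array model: look a row up by its position.
  getByOffset(offset) {
    return offset >= 0 && offset < this.rows.length ? this.rows[offset] : null;
  }
}

const mail = new Table("id");
mail.add({ id: "m1", subject: "hello" });
mail.add({ id: "m2", subject: "lunch?" });

console.log(mail.getByKey("m2").subject);  // lunch?
console.log(mail.getByOffset(0).subject);  // hello
```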

Because people will add apps online and sometimes remove apps offline, we believe that it should be possible to mark some sets of data as being "dirty" so the browser can warn a user that some data associated with that application has not been saved.
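One way the dirty marking could work is sketched below. DirtyTracker, removeApp and the confirmLoss callback are all hypothetical names; the callback stands in for whatever warning dialog the browser would show.

```javascript
// Tracks which storage units hold data not yet sent back to the server.
// All names here are illustrative; the proposal does not define this API.
class DirtyTracker {
  constructor() {
    this.dirty = new Set();
  }
  markDirty(storageName) {
    this.dirty.add(storageName);
  }
  markClean(storageName) {
    this.dirty.delete(storageName);
  }
  // Names of all storage units that still hold unsaved data.
  unsaved() {
    return [...this.dirty].sort();
  }
}

// Removing an app proceeds only if nothing is dirty or the user agrees.
// `confirmLoss` stands in for the browser's warning dialog.
function removeApp(tracker, confirmLoss) {
  const unsaved = tracker.unsaved();
  if (unsaved.length > 0 && !confirmLoss(unsaved)) {
    return false; // user chose to keep the app and its unsaved data
  }
  return true;
}

const tracker = new DirtyTracker();
tracker.markDirty("drafts");
console.log(removeApp(tracker, () => false)); // false: removal cancelled
tracker.markClean("drafts");
console.log(removeApp(tracker, () => false)); // true: nothing unsaved
```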

People writing apps will want to be able to download and store chunks of content and binary data such as images or Java class files. Therefore, we think it's important to be able to redirect the output from a URL load to a particular place in a database. The ability to execute a URL load (JavaScript or an image, for example) from a place in storage using a URL syntax is also important.

It's also important to add some basic techniques for querying the data stored in the rows. For example, if you have an app that's storing 50 megabytes of email and you want to search that text, loading and searching each piece of text is expensive. We should add some simple and useful sugar to make this kind of thing easy. Also, user-defined functions for sorting and comparisons in queries would make this kind of thing a lot more useful.
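The search descriptor shown later under window.storage.search could be evaluated against a row roughly as follows. This is a sketch under two assumptions that the proposal does not state: clauses are folded left to right, with each later clause combining its terms with the result so far, and a "contains" operator exists for substring search. Only "equals" appears in the proposal.

```javascript
// Comparison operators. "equals" appears in the proposal; "contains" is
// a hypothetical addition for substring search.
const operators = {
  equals: (value, arg) => value === arg,
  contains: (value, arg) => typeof value === "string" && value.includes(arg),
};

// A term looks like { column: { operator: argument } }.
function evalTerm(row, term) {
  const column = Object.keys(term)[0];
  const test = term[column];
  const op = Object.keys(test)[0];
  if (!Object.prototype.hasOwnProperty.call(operators, op)) {
    throw new Error("unknown operator: " + op);
  }
  return operators[op](row[column], test[op]);
}

// A clause looks like { and: [terms] } or { or: [terms] }. The first
// clause seeds the result; each later clause combines its terms with the
// result so far (an assumed reading of the proposal's example).
function matches(row, descriptor) {
  let result = null;
  for (const clause of descriptor) {
    const kind = Object.keys(clause)[0];
    if (kind !== "and" && kind !== "or") {
      throw new Error("unknown clause: " + kind);
    }
    const values = clause[kind].map((term) => evalTerm(row, term));
    if (result !== null) values.unshift(result);
    result = kind === "and" ? values.every(Boolean) : values.some(Boolean);
  }
  return result === null ? true : result; // an empty descriptor matches all
}

const query = [
  { and: [{ name: { equals: "bob" } }, { addr: { equals: "bar" } }] },
  { or: [{ status: { equals: "on" } }] },
];

console.log(matches({ name: "bob", addr: "bar", status: "off" }, query)); // true
console.log(matches({ name: "al", addr: "bar", status: "off" }, query));  // false
```

With this reading the query above behaves like (name == "bob" && addr == "bar") || status == "on". New operators, including user-defined ones, would be added as further entries in the operators table.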

It would also be useful to app writers if some data could be automatically expired, like from a LIFO cache. This means that some apps could be written so that they could keep a "smart cache" around. Imagine browsing a wiki where the wiki site kept a list of all of the articles that were linked from the pages that you read. This would still allow you to do some research and editing while you were offline, even though you hadn't specifically downloaded all those articles. Or a maps site could keep chunks of map data near your house in your cache so that they would be available when you were offline.
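One possible expiry policy is the priority rule that the API section below gives for the "cache" column type: entries with lower values go first, and entries with negative values are never removed. The sketch below applies that rule to a list of entries; the field names (priority, bytes) and the evict function are assumptions made for illustration.

```javascript
// Remove the lowest-priority entries until the total size fits within
// maxBytes. Entries with a negative priority are never removed, so the
// result may still exceed maxBytes if only such entries remain.
function evict(entries, maxBytes) {
  let used = entries.reduce((sum, e) => sum + e.bytes, 0);

  // Eviction candidates, lowest priority first. filter() returns a new
  // array, so sorting it leaves the caller's array untouched.
  const candidates = entries
    .filter((e) => e.priority >= 0)
    .sort((a, b) => a.priority - b.priority);

  const removed = new Set();
  for (const entry of candidates) {
    if (used <= maxBytes) break;
    removed.add(entry);
    used -= entry.bytes;
  }
  return entries.filter((e) => !removed.has(e));
}

const tiles = [
  { key: "downtown", priority: 1, bytes: 40 },
  { key: "suburbs", priority: 5, bytes: 40 },
  { key: "home", priority: -1, bytes: 40 }, // pinned: never evicted
];

console.log(evict(tiles, 80).map((e) => e.key)); // [ 'suburbs', 'home' ]
```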

Applications should be able to manage their own cache if they want. This means that we need to expose the amount of storage used by any particular application and allow the application to break down the amount of data stored in any particular table.

Sharing data between applications is also important. We believe that we should use the cookie model as the basis for how to share data across applications. For example, if you're using a mail application to read mail offline, the address book data, stored at a different site, should also be easy to access. (Think of mail.yahoo.com and address.yahoo.com.)
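A sketch of what "the cookie model" might mean for shared data: a value is scoped to a domain, and any host under that domain can read it. The SharedStore class is hypothetical, and the matching rule is simplified; real cookie handling has further restrictions (for example, refusing to scope data to a whole top-level domain) that are left out here.

```javascript
// True when `host` may read data scoped to `domain`: the host is the
// domain itself or ends with it on a label boundary.
function domainMatches(host, domain) {
  const h = host.toLowerCase();
  const d = domain.toLowerCase().replace(/^\./, "");
  return h === d || h.endsWith("." + d);
}

// Hypothetical shared store keyed by (domain, key).
class SharedStore {
  constructor() {
    this.entries = [];
  }
  set(domain, key, value) {
    // Replace any existing entry for the same domain and key.
    this.entries = this.entries.filter(
      (e) => !(e.domain === domain && e.key === key)
    );
    this.entries.push({ domain, key, value });
  }
  get(host, key) {
    const hit = this.entries.find(
      (e) => e.key === key && domainMatches(host, e.domain)
    );
    return hit ? hit.value : null;
  }
}

const shared = new SharedStore();
shared.set(".yahoo.com", "addressbook", "bob, alice");

console.log(shared.get("mail.yahoo.com", "addressbook"));    // bob, alice
console.log(shared.get("address.yahoo.com", "addressbook")); // bob, alice
console.log(shared.get("mail.example.org", "addressbook"));  // null
```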

roc: I think that trying to provide structured storage on the client is hopeless for the same reasons that trying to provide structured storage in a filesystem is hopeless. Whatever model we choose will never be the right model for the majority of applications. Furthermore, it diverges from today's Web programming models. I think we should just provide a simple filesystem API --- essentially, a persistent hashmap from string keys to string values --- slightly enhanced cookies. Remember that developers will want to at least do all the things they do with cookies today, including obfuscation, encryption, and on our side, quota management. People can build libraries on top of this if they want to. They can build libraries for indexing, LIFO cache management and so on. In fact, I would suggest that we simply use cookies as the API, but relax some of the limits. If we do that, we'd want to add a new method to read or write a substring of a cookie to support efficient access to very large cookies.
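The alternative suggested in the comment above, a persistent map from string keys to string values with substring access and a quota, might have an interface along these lines. The StringStore class and its method names are invented for this sketch, and it keeps everything in memory, so it shows only the shape of the interface, not the efficiency gain that substring access would give on large stored values.

```javascript
// Hypothetical string-to-string store with a quota measured in
// characters (keys plus values).
class StringStore {
  constructor(quotaChars) {
    this.map = new Map();
    this.quota = quotaChars;
  }

  used() {
    let total = 0;
    for (const [key, value] of this.map) total += key.length + value.length;
    return total;
  }

  set(key, value) {
    const old = this.map.get(key);
    const delta =
      old === undefined ? key.length + value.length : value.length - old.length;
    if (this.used() + delta > this.quota) {
      throw new Error("quota exceeded");
    }
    this.map.set(key, value);
  }

  get(key) {
    return this.map.has(key) ? this.map.get(key) : null;
  }

  // Read `length` characters starting at `start` without returning the
  // whole value to the caller.
  readSubstring(key, start, length) {
    const value = this.get(key);
    return value === null ? null : value.slice(start, start + length);
  }

  // Overwrite characters in place starting at `start`, extending the
  // value if the new text runs past its end.
  writeSubstring(key, start, text) {
    const value = this.get(key) === null ? "" : this.get(key);
    if (start < 0 || start > value.length) {
      throw new RangeError("start is outside the stored value");
    }
    this.set(key, value.slice(0, start) + text + value.slice(start + text.length));
  }
}

const store = new StringStore(1000);
store.set("mail", "hello world");
console.log(store.readSubstring("mail", 6, 5)); // world
store.writeSubstring("mail", 0, "HELLO");
console.log(store.get("mail"));                 // HELLO world
```

Indexing, cache management and structured records would then be libraries built on top of these few calls.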

UI

We believe that it's important to avoid having to re-educate users about what this new model means to them. Their only experience should be that the web is suddenly much more useful than it was before, while at the same time being just as safe.

We think that the best possible place to make this change is through the bookmark user interface. It leverages the existing training and user interface to which people are accustomed. People know how to use bookmarks and already feel safe doing so. For example, if you bookmark a site and that site contains a special <link> tag, that bookmark will be added as a "smart bookmark", downloading the manifest for the application. Then, when you access the site through the bookmark while you're offline, you get the local copy described in the manifest instead of the one off the network.

Deleting a smart bookmark can delete the associated storage of that data, after warning a user of any unsaved data.

We can only identify two places where new UI might be required.

1. A way to modify the amount of storage that's allocated to a particular app.

2. A warning that a user is about to remove a "bookmark" that contains unsaved data.

Since apps will know how much of their allocated storage they have used, it might be nice to allow an app to raise the dialog that changes the amount of storage allocated to it.

There's also the problem of trusted vs. untrusted computers. We should probably add a way to easily disable this functionality completely for use on untrusted computers.

In summary, it's important that we leverage the existing UI that's out there and make these new "apps" painless and transparent to users.

roc: I really like the bookmarks UI idea. When we're in offline mode bookmarks for pages that are not available should be disabled with a tooltip explaining why. Although maybe everything that gets bookmarked should be downloaded for offline access anyway.

APIs

The APIs have a few easy rules:

1. Think about the use cases.

2. Always leverage what you have today.

3. Avoid over-complex calls and abstractions. People can (and will) add those later through their own libraries.

Known use cases:

1. Storage <-> XML-RPC bridge. It's clear that people will want to build bridges between XML-RPC and this storage mechanism.

2. Binary data retrieved from URLs. People will want to download binary data (especially images) into a database row directly.

3. Storing and querying text. It should be possible to download and search text strings.

4. Storing and querying structured data. If you download an XML document into a database row, it might be nice to be able to search that data based on an attribute name or value stored in that XML document.

Deployment APIs

  window.supportsCapability(string aCapability);
    Check to see if the api supports aCapability.
  Storage-related APIs
  Creating and querying storage types.
  window.storage.create(string aName, array aDescription)
    aName is the name of the storage.  The description is an array of
    dictionaries that describes the structure of the storage.  For
    example:
    var desc = [ { name: "col1", type: "int", key: true },
                 { name: "col2", type: "string" },
                 { name: "col3", type: "cache" } ];
    window.storage.create("foo", desc);
    The key: entry in the dictionary notes that this column will hold
    primary keys.
    Supported types include:
      int - An integer of size XXX.
      double - A floating point number of size XXX.
      string - A string in unicode format (UCS-2?  Should be at least
        UTF-8.)
      blob - A binary chunk of data.  Each entry can have its own
      mime type.  This mime type can be set by the application or is
      automatically set when the blob is filled from the network.
      cache - this special type is actually a double saying that the
      storage module should manage caching in this particular storage
      module.  If the application starts to fill to the maximum
      storage capacity, entries with lower values will be deleted
      before entries with higher values.  However, entries with
      negative values will never be deleted.  Applications cannot
      predict when entries will be removed from the storage unit.
      Note: add a default mime type above to a column descriptor?
  window.storage.delete(string aName);
    Deletes a particular piece of storage by name.
  window.storage.getNames()
    Returns an array of strings with the names of all of the storage
    units defined.
  window.storage.getDesc(string aName);
    Returns an array of dictionaries describing the storage unit
    defined.  See window.storage.create() above.
  window.storage.addDataByName(string aName, dictionary aValues);
    This will add a new row to aName storage and the column names
    will be specified as the keys in aValues.
  window.storage.addData(string aName, array aValues);
    This will add a new row using the offset in the aValues array as
    the offset in the aName storage.
  window.storage.deleteDataByName(string aName, dictionary aValues);
    This will delete any rows in aName storage that match the
    key/value pairs in aValues.
  window.storage.deleteData(string aName, array aValues);
    This will delete any rows in aName storage that match the values
    specified in aValues.
  window.storage.getRowsByName(string aName, dictionary aValues);
    This will return any rows that match the specified values.  Note
    that there's no equiv. getRows() call.  This is because you have
    to be able to leave out columns for which you don't know the
    value, and 'null' is the only way to pass that information in an
    array, yet you should still be able to match against the 'null'
    value in the database.
  window.storage.search(string aName, searchDesc aSearchDescriptor);
    This is the powerful search function.  You will need to build a
    query object that allows you to search.
    The search descriptor is an array of dictionaries that includes
    arbitrarily complex search functions in reverse polish notation.
    For example:
    var foo = [ {and: [ {name: {equals: "bob"}}, {addr: {equals: "bar"}} ] },
                {or:  [ {status: {equals: "on"}} ] } ];
    This is the same as
    if ((name == "bob" && addr == "bar") || status == "on")
    Note: Ugly!  Can we just pass JS evals and then get the decision
    tree?  Also, string matching using substrings and regex?  How
    about user-defined functions?
  window.storage.getByName(string aName, object key)
    This assumes that you set the key attribute when you created the
    storage.  It will return the row that matches the key.
  window.storage.getByOffset(string aName, number offset)
    This will return the item at the given offset in the storage.
  window.storage.getCount(string aName)
    This will get the number of rows in aName storage.
  window.storage.iterate(string aName, func callback)
    This will iterate over every row in aName storage calling your
    callback for each one.
  window.storage.getUsageAllowed();
  window.storage.getUsage(string aName);
    getUsageAllowed() returns the number of bytes that the app can
    use for storage.  If you pass 'null' to getUsage it will return
    the total number of bytes in use.  getUsage with a storage name
    will give the number of bytes in use for a specific piece of
    storage.
  Notes:
    add .sort to sort a table?
    How about updates?
  Networking APIs
    How to specify which specific row to download?  Maybe with an
    add()?  Should be something like:
    window.storage.addFromNetwork(aName, rowdesc, url);
    Same problem with setting the mime type for a particular row in a
    particular column - need a way to specify that.
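To get a feel for how the storage calls listed above fit together, here is a minimal in-memory stand-in for a subset of them. It is a sketch only: createStorage is a hypothetical helper, nothing is persisted, column types are not enforced, and search, usage accounting and the networking calls are left out.

```javascript
// In-memory stand-in for part of the proposed window.storage object.
function createStorage() {
  const tables = new Map(); // storage name -> { desc, rows }

  function table(aName) {
    const t = tables.get(aName);
    if (!t) throw new Error("no such storage: " + aName);
    return t;
  }

  return {
    create(aName, aDescription) {
      tables.set(aName, { desc: aDescription, rows: [] });
    },
    getNames() {
      return [...tables.keys()];
    },
    getDesc(aName) {
      return table(aName).desc;
    },
    // Add a row given as { columnName: value }.
    addDataByName(aName, aValues) {
      table(aName).rows.push({ ...aValues });
    },
    // Add a row given as an array in column order.
    addData(aName, aValues) {
      const t = table(aName);
      const row = {};
      t.desc.forEach((col, i) => {
        row[col.name] = aValues[i];
      });
      t.rows.push(row);
    },
    // Columns left out of aValues match anything; null matches only null.
    getRowsByName(aName, aValues) {
      return table(aName).rows.filter((row) =>
        Object.keys(aValues).every((col) => row[col] === aValues[col])
      );
    },
    // Look a row up through the column marked key: true.
    getByName(aName, key) {
      const t = table(aName);
      const keyCol = t.desc.find((col) => col.key);
      if (!keyCol) throw new Error("storage has no key column: " + aName);
      return t.rows.find((row) => row[keyCol.name] === key) || null;
    },
    getByOffset(aName, offset) {
      return table(aName).rows[offset] || null;
    },
    getCount(aName) {
      return table(aName).rows.length;
    },
    iterate(aName, callback) {
      table(aName).rows.forEach((row) => callback(row));
    },
  };
}

const storage = createStorage();
storage.create("contacts", [
  { name: "id", type: "int", key: true },
  { name: "name", type: "string" },
  { name: "addr", type: "string" },
]);
storage.addDataByName("contacts", { id: 1, name: "bob", addr: "bar" });
storage.addData("contacts", [2, "alice", null]);

console.log(storage.getByName("contacts", 2).name);                    // alice
console.log(storage.getRowsByName("contacts", { addr: null }).length); // 1
console.log(storage.getCount("contacts"));                             // 2
```

The getRowsByName call shows why a dictionary argument is convenient: leaving a column out means "any value", while passing null explicitly matches rows whose stored value is null.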