User:Uri/Bidi editing

From MozillaWiki

This is currently a private draft. Come back later if you're interested.

My goals for this document are:

  1. Describe the fundamental issues involved in bidirectional editing.
  2. Give a high-level overview of how Mozilla currently tackles these issues.
  3. Point out some problems with the current approach.
  4. Present an abstract alternative approach, which I think can solve some of these problems.
  5. Finally, suggest how this new approach can be implemented within the current framework.

Before starting, one important note:

Mozilla currently implements what's known as visual caret movement. That is, pressing the left (right) arrow key always moves the caret one place to the left (right), regardless of the directionality of the text the caret is on, or of the paragraph directionality. This approach is also the system approach on Mac OS X (and always was on MacOS), but is not the system behaviour on Windows (which uses logical caret movement instead).

This document assumes that this approach is going to remain, i.e., it discusses only visual caret movement. Some of the issues discussed below might be relevant to logical caret movement as well, but many aren't. If you're interested in the visual-vs-logical debate, see bug 167288.

The issues

The main issue involved in bidirectional editing is that there is no one-to-one relationship between a logical position in the text, and a visual position on the screen. A single logical position can map to two different visual positions, and a single visual position might map to two different logical positions.

In the following examples, I will use uppercase Latin letters to represent RTL (e.g. Hebrew) letters, whereas lowercase Latin letters will represent LTR (e.g. Latin) letters. This seems to be the convention, as it makes it easier for people who cannot read RTL languages to understand the examples.

Consider, for example, the text with the following logical representation: latinHEBREWmore (this example deliberately omits spaces, in order to avoid the issues associated with resolving their directionality). This text is displayed on the screen as latinWERBEHmore.

Consider the logical position between n and H. This is immediately after n, so it maps to the visual position between n and W. But it is also immediately before H, so it also maps to the visual position between H and m.

Now, consider the visual position between n and W. It is immediately after n, so it can be mapped to the logical position between n and H. But it is also immediately after W (in the right-to-left reading order of the Hebrew run), so it can also be mapped to the logical position between W and m.
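The two mappings above can be checked with a small sketch. It assumes the toy convention used in this document (uppercase is RTL, lowercase is LTR), an LTR paragraph, and no neutral characters or nested embeddings. The function names are mine and are not taken from Mozilla's code or from any bidi library.

```python
def levels(text):
    """Embedding level per character: 1 for RTL (uppercase), 0 for LTR."""
    return [1 if ch.isupper() else 0 for ch in text]


def visual_order(text):
    """Logical indices listed in left-to-right display order."""
    lv = levels(text)
    order = []
    i = 0
    while i < len(text):
        j = i
        while j < len(text) and lv[j] == lv[i]:
            j += 1
        run = list(range(i, j))
        if lv[i] == 1:
            run.reverse()  # RTL runs are displayed reversed
        order.extend(run)
        i = j
    return order


def display(text):
    """The string as it appears on screen."""
    return "".join(text[i] for i in visual_order(text))


def visual_positions(text, logical_pos):
    """Visual caret positions (0 = left edge) that a logical position maps to."""
    lv = levels(text)
    where = {logical: visual for visual, logical in enumerate(visual_order(text))}
    result = set()
    if logical_pos > 0:
        # Trailing edge of the previous character.
        prev = logical_pos - 1
        result.add(where[prev] + 1 if lv[prev] == 0 else where[prev])
    if logical_pos < len(text):
        # Leading edge of the next character.
        nxt = logical_pos
        result.add(where[nxt] if lv[nxt] == 0 else where[nxt] + 1)
    return sorted(result)


def logical_positions(text, visual_pos):
    """Logical positions that can be displayed at a given visual position."""
    return [p for p in range(len(text) + 1)
            if visual_pos in visual_positions(text, p)]
```

With text = "latinHEBREWmore", visual position 5 is the gap between n and W, and visual position 11 is the gap between H and m. Logical position 5 (between n and H) maps to both of them, and so does logical position 11 (between W and m), which is exactly the ambiguity described above.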

Bidirectional text is stored logically, and (obviously) displayed visually. The caret, being a graphical element, corresponds to a visual location. The user can manipulate text and move the caret through a combination of logical operations (such as typing or deleting) and visual operations (such as using the arrow keys). Therefore, the problem of mapping between logical and visual positions in a way which will meet the expectations of the user is the central problem of bidirectional editing.

At this point, I would like to recommend reading Guidelines of a Logical User Interface (UI) for Editing Bidirectional Text by Matitiahu Allouche of IBM. This document presents a method for dealing with the problems associated with BiDi editing. It contains some useful definitions, as well as a detailed description of a logical-to-visual mapping algorithm (in the "Conversion of cursor positions" section). This document is the basis of the current Mozilla bidi editing implementation (which I'll describe below). I'd like to thank Simon Montagu for introducing me to this document.

The current Mozilla implementation

This section describes my understanding of the current system. It might contain inaccuracies. If you spot any, please let me know.

Mozilla represents the caret location internally as a (collapsed) selection, which consists of:

  • A logical position inside the content tree (i.e. a content node and an offset into that node).
  • A "hint", which is a boolean value indicating how the caret arrived at the current location.

The "hint" mechanism was originally devised as a way of indicating where to display the caret when it is at the end of a wrapping line. When the caret arrives from the left (in LTR text) via the right arrow key, it should be displayed at the end of the line (after the space at which the line wraps). When the right arrow key is pressed again, the caret should move to the beginning of the following line. Note that the caret remains in the same logical position, so this is a simple (non-bidi) case where one logical position can be mapped to two visual positions.
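As an illustration of the non-bidi case, here is a minimal sketch of a caret made of a logical position plus a boolean hint, and of how the hint picks one of the two visual positions at a line wrap. The class, the field names and the function are hypothetical stand-ins for this document only. They are not Mozilla's actual types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Caret:
    node: str              # stand-in for a content node
    offset: int            # logical offset into that node
    prefer_previous: bool  # hypothetical name for the boolean hint


def caret_line_and_column(line_lengths, caret):
    """Return (line, column) for the caret.

    line_lengths holds the number of characters on each wrapped line.
    """
    start = 0
    for line_no, length in enumerate(line_lengths):
        end = start + length
        if caret.offset < end:
            return line_no, caret.offset - start
        if caret.offset == end:
            is_last = line_no == len(line_lengths) - 1
            if caret.prefer_previous or is_last:
                return line_no, length  # end of this line
            return line_no + 1, 0       # start of the next line
        start = end
    raise ValueError("offset is past the end of the text")
```

For text wrapped into lines of 6 and 5 characters, offset 6 is displayed at the end of the first line when the hint is set, and at the start of the second line when it is not. The logical position is the same in both cases.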

When bidi support was added to Mozilla, the "hint" mechanism's role expanded to handle other cases where one logical position maps to two visual positions, as described in the previous section. In terms of the IBM document, the hint mechanism is used to implement the concept of the "cursor bidi level". It serves both as an input to the algorithm determining the cursor level, and as an output of the logical-to-visual mapping algorithm. <<<more work needed here>>>

When required to display the caret, the system examines the selection object and invokes the IBM algorithm in order to determine the visual position in which the caret will be displayed. When the logical position is in the middle of an LTR run of characters or an RTL run of characters, there is no ambiguity and the algorithm is trivial. Things get interesting only when the logical caret position is between runs of different directions or, more accurately, between runs of different bidi embedding levels.

Typing

When the user types a character, that character is inserted into the text stream at the logical insertion point. The insertion point is then moved to after the new character, and the new visual caret position is determined again according to the IBM logical-to-visual mapping algorithm.

Deleting

When the user attempts to delete a character (either forward, using the "delete" key, or backward, using the "backspace" key), things get a bit more complicated. In our previous example (latinHEBREWmore), suppose that the logical insertion point is between n and H, and that the visual mapping currently chosen for this position is the first one mentioned above, i.e. after (to the right of) the n. Visually, this looks like this: latin|WERBEHmore (where | represents the caret). Suppose now that the user presses the (forward) "delete" key. Since deleting is a logical function, the character which should be deleted is the one which logically follows the insertion point, that is, H. However, notice that the caret isn't currently displayed adjacent to that character! So deleting the H would likely be confusing and unexpected. IBM's algorithm handles this by specifying that in this case no deletion is actually done. Instead, the logical insertion point is mapped to the alternative visual caret position, so that the caret appears between H and m, indicating to the user that another press of the "delete" key will delete the H.
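The forward-delete behaviour described above can be sketched as follows. This is only my reading of the paragraph, under the uppercase-is-RTL convention and with no neutral characters. The attached_to_next flag is a hypothetical stand-in for "the caret is displayed next to the character that logically follows it", and the state after a real deletion is an assumption of mine.

```python
def is_rtl(ch):
    """Toy convention of this document: uppercase stands for RTL letters."""
    return ch.isupper()


def forward_delete(text, pos, attached_to_next):
    """Return (text, pos, attached_to_next) after one press of "delete"."""
    if pos >= len(text):
        return text, pos, attached_to_next  # nothing to delete
    at_boundary = pos > 0 and is_rtl(text[pos - 1]) != is_rtl(text[pos])
    if at_boundary and not attached_to_next:
        # First press: only switch to the other visual mapping, so the
        # caret is shown next to the character about to be deleted.
        return text, pos, True
    # Otherwise delete the character that logically follows the caret.
    # Assumption: the caret then stays attached to the new next character.
    return text[:pos] + text[pos + 1:], pos, True
```

Starting from ("latinHEBREWmore", 5, False), the first press leaves the text unchanged and only flips the flag. The second press removes the H, giving "latinEBREWmore".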

Switching keyboards

Actually, typing is not as simple as described above. Consider the same scenario as above, but this time, instead of pressing "delete", the user types a Hebrew letter (let's say X). This letter is inserted (logically) between the n and the H, which means that it will appear visually to the right of the H: latinWERBEHXmore. Notice that the newly-inserted letter appears away from the visual caret position! IBM's algorithm tries to address this issue in two ways:

  1. It tries to ensure that the keyboard language selection will match the current logical-to-visual mapping. For example, the situation described above will occur when the caret arrived at that position by using the right-arrow to move past the word latin. In this case, the IBM algorithm specifies that the keyboard input method should be set to an LTR method, so the user will not be able to type a Hebrew letter without manually switching the keyboard input to Hebrew (similarly, the other visual mapping for the same logical position is triggered by left-arrowing over the Hebrew word, which would set the keyboard input selection to Hebrew). Note that this part of the IBM algorithm is currently not implemented by Mozilla - see bug 162242.
  2. When the user does manually switch the keyboard layout, the system adjusts the visual positioning of the caret to match the position in which the next expected character will actually appear. In the example above, if the user switches the keyboard layout from English to Hebrew, the logical-to-visual mapping will be switched so that the caret will be displayed between the H and the m, so when the X is typed, it appears at the caret's location.
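The example and the second point can be illustrated with a short sketch, again under the uppercase-is-RTL convention, an LTR paragraph and no neutral characters. The function names and the "previous"/"next" return values are hypothetical. This shows the idea of matching the caret to the keyboard direction, not the IBM algorithm itself.

```python
from itertools import groupby


def display(text):
    """Visual form of the text: each RTL (uppercase) run is reversed."""
    out = []
    for rtl, run in groupby(text, key=str.isupper):
        chars = list(run)
        out.append("".join(reversed(chars)) if rtl else "".join(chars))
    return "".join(out)


def type_char(text, pos, ch):
    """Insert ch at the logical position and advance the insertion point."""
    return text[:pos] + ch + text[pos:], pos + 1


def attach_side(text, pos, keyboard_is_rtl):
    """Pick the neighbour whose direction matches the keyboard layout.

    Returns "previous" or "next": the character next to which the caret
    should be displayed.
    """
    prev_rtl = text[pos - 1].isupper() if pos > 0 else None
    next_rtl = text[pos].isupper() if pos < len(text) else None
    if prev_rtl == keyboard_is_rtl:
        return "previous"
    if next_rtl == keyboard_is_rtl:
        return "next"
    # Neither neighbour matches: fall back to whichever one exists.
    return "previous" if prev_rtl is not None else "next"
```

At logical position 5 of latinHEBREWmore, an English keyboard attaches the caret to the n (displayed between n and W), while a Hebrew keyboard attaches it to the H (displayed between H and m). Typing X at that position yields the logical text latinXHEBREWmore, which is displayed as latinWERBEHXmore.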

Note that the combination of these two methods still doesn't solve the problem entirely. It's possible for the user to type LTR characters (such as numbers) when the keyboard layout is Hebrew, or to type neutral characters, which will become part of an RTL run, while the keyboard layout is English. I'll get back to this in the next section.