User:Uri/Bidi editing: Difference between revisions

m
spellcheck
m (spellcheck)
Line 3: Line 3:
My goals for this document are:
My goals for this document are:
# Describe the fundamental issues involved in bidirectional editing.
# Describe the fundamental issues involved in bidirectional editing.
# Give a high-level overview of how Mozilla currently tackles these isues.  
# Give a high-level overview of how Mozilla currently tackles these issues.  
# Point out some problems with the current approach.
# Point out some problems with the current approach.
# Present an abstract alternative approach, which I think can solve some of these problems.  
# Present an abstract alternative approach, which I think can solve some of these problems.  
Line 10: Line 10:
Before starting, one important note:
Before starting, one important note:


Mozilla currently implements what's known as '''visual caret movement'''. That is, pressing the left (right) arrow key always moves the caret one place to the left (right), regardless of the directionality of the text the caret is on, or of the paragraph directionality. This approach is also the system approach on Mac OS X (and always was on MacOS), but is ''not'' the system behaviour on Windows (which uses '''logical caret movement''' instead).
Mozilla currently implements what's known as '''visual caret movement'''. That is, pressing the left (right) arrow key always moves the caret one place to the left (right), regardless of the directionality of the text the caret is on, or of the paragraph directionality. This approach is also the system approach on Mac OS X (and always was on Mac OS), but is ''not'' the system behavior on Windows (which uses '''logical caret movement''' instead).


This document assumes that this approach is going to remain, i.e., it duscusses only visual caret movement. Some of the issues discussed below might be relevant to logical caret movement as well, but many aren't. If you're interested in the visual-vs-logical debate, see [https://bugzilla.mozilla.org/show_bug.cgi?id=167288 bug 167288].
This document assumes that this approach is going to remain, i.e., it discusses only visual caret movement. Some of the issues discussed below might be relevant to logical caret movement as well, but many aren't. If you're interested in the visual-vs.-logical debate, see [https://bugzilla.mozilla.org/show_bug.cgi?id=167288 bug 167288].


==The issues==
==The issues==
The main issue involved in bidirectional editing is that there is no one-to-one relationship between a logical position in the text, and a visual position on the screen. A single logical position can map to two different visual positions, and a single visual position might map to two different logical positions.
The main issue involved in bidirectional editing is that there is no one-to-one relationship between a logical position in the text, and a visual position on the screen. A single logical position can map to two different visual positions, and a single visual position might map to two different logical positions.


<small>''In the following examples, I will use uppercase Latin letters to represent RTL (e.g. Hebrew) letters, whereas lowercase latin letters will represent LTR (e.g. Latin) letters. This seems to be the convention, as it makes it easier for people who can not read RTL languages to understand the examples.''</small>
<small>''In the following examples, I will use uppercase Latin letters to represent RTL (e.g. Hebrew) letters, whereas lowercase Latin letters will represent LTR (e.g. Latin) letters. This is the convention, as it makes it easier for people who cannot read RTL languages to understand the examples.''</small>


Consider, for eample, the text with the following logical representation: '''latinHEBREWmore''' (this example diliberately ommits spaces, in order to avoid the issues associated with resolving their directionality). This text is displayed on the screen as '''latinWERBEHmore'''.
Consider, for example, the text with the following logical representation: '''latinHEBREWmore''' (this example deliberately omits spaces, in order to avoid the issues associated with resolving their directionality). This text is displayed on the screen as '''latinWERBEHmore'''.


Consider the logical position between '''n''' and '''H'''. This is immediately after '''n''', so it maps to the visual position between '''n''' and '''W'''. But it is also immediately before '''H''', so it also maps to the visual position between '''H''' and '''m'''.
Consider the logical position between '''n''' and '''H'''. This is immediately after '''n''', so it maps to the visual position between '''n''' and '''W'''. But it is also immediately before '''H''', so it also maps to the visual position between '''H''' and '''m'''.
Line 25: Line 25:
Now, consider the visual position between '''n''' and '''W'''. It is immediately after '''n''', so it can be mapped to the logical position between '''n''' and '''H'''. But it is also immediately after '''W''', so it can also be mapped to the logical position between '''W''' and '''m'''.
Now, consider the visual position between '''n''' and '''W'''. It is immediately after '''n''', so it can be mapped to the logical position between '''n''' and '''H'''. But it is also immediately after '''W''', so it can also be mapped to the logical position between '''W''' and '''m'''.
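
The two-way ambiguity described above can be reproduced with a small sketch. It is only an illustration under simplifying assumptions that are not part of the original text: the paragraph is LTR, there are no neutral characters and no nested embedding levels, and uppercase letters stand for RTL characters as in the examples. The function names are hypothetical and do not correspond to anything in Mozilla.

```python
def is_rtl(ch):
    # Convention from the examples: uppercase stands for RTL, lowercase for LTR.
    return ch.isupper()


def visual_order(text):
    """Logical indices in left-to-right screen order (LTR paragraph, no nesting)."""
    order, start = [], 0
    for i in range(1, len(text) + 1):
        if i == len(text) or is_rtl(text[i]) != is_rtl(text[start]):
            run = list(range(start, i))
            # An RTL run is drawn in reverse; an LTR run keeps its order.
            order.extend(reversed(run) if is_rtl(text[start]) else run)
            start = i
    return order


def display(text):
    return "".join(text[i] for i in visual_order(text))


def visual_positions(text, pos):
    """Screen columns that the logical caret position `pos` can map to."""
    column = {logical: col for col, logical in enumerate(visual_order(text))}
    found = set()
    if pos > 0:  # trailing edge of the previous character
        prev = pos - 1
        found.add(column[prev] if is_rtl(text[prev]) else column[prev] + 1)
    if pos < len(text):  # leading edge of the next character
        found.add(column[pos] + 1 if is_rtl(text[pos]) else column[pos])
    return found


def logical_positions(text, col):
    """Logical caret positions that the screen column `col` can map to."""
    order = visual_order(text)
    found = set()
    if col > 0:  # character drawn to the left of the caret
        i = order[col - 1]
        found.add(i if is_rtl(text[i]) else i + 1)
    if col < len(order):  # character drawn to the right of the caret
        i = order[col]
        found.add(i + 1 if is_rtl(text[i]) else i)
    return found


text = "latinHEBREWmore"
print(display(text))                       # latinWERBEHmore
print(sorted(visual_positions(text, 5)))   # [5, 11]
print(sorted(logical_positions(text, 5)))  # [5, 11]
```

Logical position 5 (between '''n''' and '''H''') maps to screen columns 5 and 11, and screen column 5 (between '''n''' and '''W''') maps to logical positions 5 and 11, which matches the two cases worked through above.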


Bidirectional text is stored logically, and (obviously) displayed visually. The caret, being a graphical element, corresponds to a visual location. The user can manipulate text and move the caret through a combination of '''logical functions''' (such as typing or deleting) and '''visual functions''' (such as using the arrow keys). Therefore, the problem of mapping between logical and visual positions in a way which will meet the expectations of the user is the central problem of bidirectional editing.
Bidirectional text is stored logically, and (obviously) displayed visually. The caret, being a graphical element, corresponds to a visual location. The user can manipulate text and move the caret through a combination of '''logical functions''' (such as typing or deleting) and '''visual functions''' (such as using the arrow keys). Therefore, the problem of mapping between logical and visual positions in a way that will meet the expectations of the user is the central problem of bidirectional editing.


At this point, I would like to recommend reading [http://www-306.ibm.com/software/globalization/topics/bidiui/index.jsp Guidelines of a Logical User Interface (UI) for Editing Bidirectional Text] by Matitiahu Allouche of IBM. This document presents a method for dealing with the problems associated with BiDi editing. It contains some useful definitions, as well as a detailed description of a logical-to-visual mapping algorithm (in the "[http://www-306.ibm.com/software/globalization/topics/bidiui/conversion.jsp Conversion of cursor positions]" section). This document is the basis of the current Mozilla bidi editing implementation (which I'll describe below). I'd like to thank Simon Montagu for introducing me to this document, which I'll hereby refer to as "the IBM document" (or "the IBM algorithm").
At this point, I would like to recommend reading [http://www-306.ibm.com/software/globalization/topics/bidiui/index.jsp Guidelines of a Logical User Interface (UI) for Editing Bidirectional Text] by Matitiahu Allouche of IBM. This document presents a method for dealing with the problems associated with bidi editing. It contains some useful definitions, as well as a detailed description of a logical-to-visual mapping algorithm (in the "[http://www-306.ibm.com/software/globalization/topics/bidiui/conversion.jsp Conversion of cursor positions]" section). This document is the basis of the current Mozilla bidi editing implementation (which I'll describe below). I'd like to thank Simon Montagu for introducing me to this document, which I'll hereby refer to as "the IBM document" (or "the IBM algorithm").


==The current Mozilla implementation==
==The current Mozilla implementation==
''This section describes my understanding of the current system. It might contain inaccuracies. If you spot any, please let me know.''
''This section describes my understanding of the current system. It might contain inaccuracies. If you spot any, please let me know.''


Mozilla represents the caret location internally as a (collapsed) selection, which consistes of:
Mozilla represents the caret location internally as a (collapsed) selection, which consists of:
* A logical position inside the content tree (i.e. a content node and an offset into that node).
* A logical position inside the content tree (i.e. a content node and an offset into that node).
* A "hint", which is a boolean value indicating how the caret arrived at the current location.
* A "hint", which is a boolean value indicating how the caret arrived at the current location.
Line 42: Line 42:
When bidi support was added to Mozilla, the "hint" mechanism's role expanded to handle other cases where one logical position maps to two visual positions, as described in the previous section. In terms of the IBM document, the hint mechanism is used to implement the concept of the "cursor bidi level". It serves both as an input to the algorithm determining the cursor level, and as an output of the logical-to-visual mapping algorithm.
When bidi support was added to Mozilla, the "hint" mechanism's role expanded to handle other cases where one logical position maps to two visual positions, as described in the previous section. In terms of the IBM document, the hint mechanism is used to implement the concept of the "cursor bidi level". It serves both as an input to the algorithm determining the cursor level, and as an output of the logical-to-visual mapping algorithm.


When required to display the caret, the system examines the selection object, and invokes the IBM agorithm in order to determine the visual position in which the caret will be displayed (when the logical position is in the middle of an LTR run of characters or an RTL run of characters, there is no ambiguity and the algorithm is trivial. Things get interesting only when the logical caret position is between runs of different directions, or, more accurately: between runs of different bidi embedding levels).
When required to display the caret, the system examines the selection object, and invokes the IBM algorithm in order to determine the visual position in which the caret will be displayed (when the logical position is in the middle of an LTR run of characters or an RTL run of characters, there is no ambiguity and the algorithm is trivial. Things get interesting only when the logical caret position is between runs of different directions, or, more accurately: between runs of different bidi embedding levels).


===Typing===
===Typing===
Line 48: Line 48:


===Deleting===
===Deleting===
When the user attempts to delete a character (either forward, using the "delete" key, or backwards, useing the "backspace" key), things get a bit more complicated.
When the user attempts to delete a character (either forward, using the "delete" key, or backwards, using the "backspace" key), things get a bit more complicated.
In our previous example ('''latinHEBREWmore'''), consider that the logical insertion point is between '''n''' and '''H'''. Now suppose, that the visual mapping currently chosen for this position is the first one mentioned above, i.e. after (to the right of) the '''n'''. Visually, this looks like this: '''latin|WERBEHmore''' (where | represents the caret). Suppose now that the user pressed the (forward) "delete" key. Since deleting is a logical function, the character which should be deleted is the one which logically follows the insertion point, that is '''H'''.
In our previous example ('''latinHEBREWmore'''), consider that the logical insertion point is between '''n''' and '''H'''. Now suppose that the visual mapping currently chosen for this position is the first one mentioned above, i.e. after (to the right of) the '''n'''. Visually, this looks like this: '''latin|WERBEHmore''' (where | represents the caret). Suppose now that the user pressed the (forward) "delete" key. Since deleting is a logical function, the character that should be deleted is the one that logically follows the insertion point, that is, '''H'''.
However, notice that the caret isn't currently displayed as adjacent to that character! So deleting the '''H''' would likely be confusing and unexpected. IBM's algorithm handles this by specifying that in this case, no deletion will actually be done, but instead, the logical insertion point will be mapped to the alternative visual caret position, so that the caret will appear between '''H''' and '''m''', indicating to the user that ''another'' press of the "delete" key will delete the '''H'''.
However, notice that the caret isn't currently displayed as adjacent to that character! So deleting the '''H''' would likely be confusing and unexpected. IBM's algorithm handles this by specifying that in this case, no deletion will actually be done, but instead, the logical insertion point will be mapped to the alternative visual caret position, so that the caret will appear between '''H''' and '''m''', indicating to the user that ''another'' press of the "delete" key will delete the '''H'''.
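
The "first press moves the caret, second press deletes" behavior can be sketched as follows. This is a rough model of my reading of the description above, not the IBM algorithm itself and not Mozilla code. It assumes an LTR paragraph with no neutral characters or nested levels, and it replaces the boolean hint with a hypothetical <code>attach</code> value (<code>"prev"</code> or <code>"next"</code>) naming the logical neighbor the caret is drawn next to.

```python
def is_rtl(ch):
    # Convention from the examples: uppercase stands for RTL, lowercase for LTR.
    return ch.isupper()


def visual_index(text, i):
    """Screen column of logical character i (LTR paragraph, no nesting)."""
    start = i
    while start > 0 and is_rtl(text[start - 1]) == is_rtl(text[i]):
        start -= 1
    end = i + 1
    while end < len(text) and is_rtl(text[end]) == is_rtl(text[i]):
        end += 1
    # Characters of an RTL run are mirrored inside the run's screen extent.
    return start + (end - 1 - i) if is_rtl(text[i]) else i


def caret_column(text, pos, attach):
    """Screen column of the caret for a logical position and an attachment."""
    if not text:
        return 0
    if attach == "prev" and pos > 0:
        col = visual_index(text, pos - 1)
        return col if is_rtl(text[pos - 1]) else col + 1  # trailing edge
    if pos < len(text):
        col = visual_index(text, pos)
        return col + 1 if is_rtl(text[pos]) else col  # leading edge
    return caret_column(text, pos, "prev")  # end of text: only one neighbor


def forward_delete(text, pos, attach):
    """Return (text, pos, attach) after one press of the forward "delete" key."""
    if pos >= len(text):
        return text, pos, attach  # nothing follows the insertion point
    if caret_column(text, pos, attach) != caret_column(text, pos, "next"):
        # The caret is not drawn next to the character that would be deleted:
        # move it there instead of deleting.
        return text, pos, "next"
    return text[:pos] + text[pos + 1:], pos, "next"


state = ("latinHEBREWmore", 5, "prev")   # shown as latin|WERBEHmore
state = forward_delete(*state)
print(state, caret_column(*state))       # text unchanged, caret now at column 11
state = forward_delete(*state)
print(state[0])                          # latinEBREWmore
```

The first call leaves the text untouched and only moves the caret to the position between '''H''' and '''m'''; the second call removes the '''H'''.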


===Switching keyboards===
===Switching keyboards===
Actually, typing is not as as simple as described above. Consider The same scenario as above, but this time, instead of pressing "delete", the user types a Hebrew letter (let's say '''X'''). This letter is inserted (logically) between the '''n''' and the '''H''', which means that it will appear visually to the right of the '''H''': '''latinWERBEHXmore'''. Notice that the newly-inserted letter appears away from the visual caret position!
Actually, typing is not as simple as described above. Consider the same scenario as above, but this time, instead of pressing "delete", the user types a Hebrew letter (let's say '''X'''). This letter is inserted (logically) between the '''n''' and the '''H''', which means that it will appear visually to the right of the '''H''': '''latinWERBEHXmore'''. Notice that the newly inserted letter appears away from the visual caret position!
IBM's algorithm tries to address this issue in two ways:
IBM's algorithm tries to address this issue in two ways:
#It tries to ensure that the keyboard language selection will match the current logical-to-visual mapping. For example, the situation described above will occur when the caret arrived at that position by using the right-arrow to moce past the word '''latin'''. In this case, the IBM algorithm specifies that the keyboard input method should be set to an LTR method, so the use rwill not be able to type a Hebrew letter without manually switching the keyboard input to Hebrew (similarly, the other visual mapping for the same logical position is triggered by left-arrowing over the Hebrew word, which would set tyhe keyboard input selection to Hebrew). ''Note that this part of the IBM algorithm is currently not implemented by Mozilla'' - see [https://bugzilla.mozilla.org/show_bug.cgi?id=162242 bug 162242].
#It tries to ensure that the keyboard language selection will match the current logical-to-visual mapping. For example, the situation described above will occur when the caret arrived at that position by using the right-arrow to move past the word '''latin'''. In this case, the IBM algorithm specifies that the keyboard input method should be set to an LTR method, so the user will not be able to type a Hebrew letter without manually switching the keyboard input to Hebrew (similarly, the other visual mapping for the same logical position is triggered by left-arrowing over the Hebrew word, which would set the keyboard input selection to Hebrew). ''Note that this part of the IBM algorithm is currently not implemented by Mozilla'' - see [https://bugzilla.mozilla.org/show_bug.cgi?id=162242 bug 162242].
#When the user does manually switch the keyboard layout, the system adjusts the visual positioning of the caret to match the position in which the next expected character will actually apper. In the example above, if the user switches the keyboard layout from English to Hebrew, the logical-to-visual mapping will be switched so that the caret will be displayed between the '''H''' and the '''m''', so when the '''X''' is typed, it appears at the caret's location.
#When the user does manually switch the keyboard layout, the system adjusts the visual positioning of the caret to match the position in which the next expected character will actually appear. In the example above, if the user switches the keyboard layout from English to Hebrew, the logical-to-visual mapping will be switched so that the caret will be displayed between the '''H''' and the '''m''', so when the '''X''' is typed, it appears at the caret's location.
Note that the combination of these two methods still doesn't solve the problem entirely. It's possible for the user to type LTR characters (such as numbers) when the keyboard layout is Hebrew, ot to type neutral characters, whcih will become part of an RTL run, while the keyboard layout is English. I'll get back to this in the next section.
Note that the combination of these two methods still doesn't solve the problem entirely. It's possible for the user to type LTR characters (such as numbers) when the keyboard layout is Hebrew, or to type neutral characters, which will become part of an RTL run, while the keyboard layout is English. I'll get back to this in the next section.


===Moving the caret===
===Moving the caret===
''I'll focus here on using the left and right arrow keys (without modifyers) to move the caret. There are other methods of moving the caret, but for the purpose of this documents they will be ignored.''
''I'll focus here on using the left and right arrow keys (without modifiers) to move the caret. There are other methods of moving the caret, but for the purposes of this document they will be ignored.''


When the user presses an arrow key (left or right), the following process is initiated:
When the user presses an arrow key (left or right), the following process is initiated:
Line 69: Line 69:
The main problems with the current system should be evident from my description of the system above. I'll re-iterate them briefly:
The main problems with the current system should be evident from my description of the system above. I'll re-iterate them briefly:
* The system does not always behave as the user expects:
* The system does not always behave as the user expects:
** In the case of typing, the system (even when fully implemented) does not ''ensure'' that the typed character will apear at the location of the caret. The result could be confusing and even frustrating for the user. See [https://bugzilla.mozilla.org/show_bug.cgi?id=300004 bug 300004], and, specifically, the [https://bugzilla.mozilla.org/attachment.cgi?id=192975 second testcase] attached to it.
** In the case of typing, the system (even when fully implemented) does not ''ensure'' that the typed character will appear at the location of the caret. The result could be confusing and even frustrating for the user. See [https://bugzilla.mozilla.org/show_bug.cgi?id=300004 bug 300004], and, specifically, the [https://bugzilla.mozilla.org/attachment.cgi?id=192975 second testcase] attached to it.
** In the case of deleting, when the caret is not adjacent to the to-be-deleted character, the system's solution is to not actually delete a character, but to move the caret (possibly a long distance!) to the position where the deletion ''would have'' taken place. This is likely ''not'' what the user expects. The user expects a character visually adjacent to the caret to be deleted.
** In the case of deleting, when the caret is not adjacent to the to-be-deleted character, the system's solution is to not actually delete a character, but to move the caret (possibly a long distance!) to the position where the deletion ''would have'' taken place. This is likely ''not'' what the user expects. The user expects a character visually adjacent to the caret to be deleted.
** When switching keyboard layouts, the caret might move to a different positionn. This, again, is unexpected from the point of view of the user, which would expect the text being typed to be inserted at the caret position even if the typing is preceed by switching the keyboard layout.  
** When switching keyboard layouts, the caret might move to a different position. This, again, is unexpected from the point of view of the user, who would expect the text being typed to be inserted at the caret position even if the typing is preceded by switching the keyboard layout.
* The process used by the system to perform visual functions (such as responding to right or left arrow keys) is extremely complicated, as it involves visual-to-logical mapping followed by logical-to-visual mapping, both being ambiguous, complex tasks (and the first of which is undocumented, as far as I know). All of this is to achieve a seemingly simple result: moving the caret visually (e.g.) one place to the left. The complexity of this process makes its implementing code bug-prone and difficult to maintain, as the many dependencies of [https://bugzilla.mozilla.org/show_bug.cgi?id=207186 bug 207186] will attest.
* The process used by the system to perform visual functions (such as responding to right or left arrow keys) is extremely complicated, as it involves visual-to-logical mapping followed by logical-to-visual mapping, both being ambiguous, complex tasks (and the first of which is undocumented, as far as I know). All of this is to achieve a seemingly simple result: moving the caret visually (e.g.) one place to the left. The complexity of this process makes its implementing code bug-prone and difficult to maintain, as the many dependencies of [https://bugzilla.mozilla.org/show_bug.cgi?id=207186 bug 207186] will attest.


Line 79: Line 79:
In this section I'll present an alternative approach to implementing bidi editing (with visual caret movement). I won't go into implementation details, but I'll sketch the basic principles.
In this section I'll present an alternative approach to implementing bidi editing (with visual caret movement). I won't go into implementation details, but I'll sketch the basic principles.


The system I propose has two modes: '''logical mode''' and '''visual mode'''. The system is placed in logical mode following any logical function perofrmed by the user (such as typing or deleting), and is similarly placed in visual mode following any visual function performed by the user (such as pressing arrow keys, or clicking anywhere in the text).
The system I propose has two modes: '''logical mode''' and '''visual mode'''. The system is placed in logical mode following any logical function performed by the user (such as typing or deleting), and is similarly placed in visual mode following any visual function performed by the user (such as pressing arrow keys, or clicking anywhere in the text).


When the system is in logical mode, it stores the location of the caret "logically", i.e. as an offset to the (logically stored) text. When it is in visual mode, it stores the caret location visually, i.e. relative to the text as it is presented on the screen.
When the system is in logical mode, it stores the location of the caret "logically", i.e. as an offset to the (logically stored) text. When it is in visual mode, it stores the caret location visually, i.e. relative to the text as it is presented on the screen.
Line 96: Line 96:
Let's take our example of '''latinHEBREWmore''' again, displayed as '''latinWERBEHmore'''. Suppose the user visually moved the caret to between the '''n''' and the '''W''' (i.e. by using the arrow keys or the mouse). Now, suppose the user types an LTR character ('''x'''). The system has to switch to logical mode, that is, to map the visual position to a logical one. As we recall, this is ambiguous: the logical positions after the '''n''' and after the '''W''' both correspond to the current visual position. However, since at this stage the system knows that the user typed an LTR character, it will prefer the logical position following the '''n''', and the result would be logically '''latinxHEBREWmore''' and visually '''latinx|WERBEHmore'''.
Let's take our example of '''latinHEBREWmore''' again, displayed as '''latinWERBEHmore'''. Suppose the user visually moved the caret to between the '''n''' and the '''W''' (i.e. by using the arrow keys or the mouse). Now, suppose the user types an LTR character ('''x'''). The system has to switch to logical mode, that is, to map the visual position to a logical one. As we recall, this is ambiguous: the logical positions after the '''n''' and after the '''W''' both correspond to the current visual position. However, since at this stage the system knows that the user typed an LTR character, it will prefer the logical position following the '''n''', and the result would be logically '''latinxHEBREWmore''' and visually '''latinx|WERBEHmore'''.


Conversly, if, at the same visual location, the user types an RTL character ('''X'''), the system will prefer the other logical position mapped to this visual position, and the result will be logocally '''latinHEBREWXmore''', and visually '''latin|XWERBEHmore'''.
Conversely, if, at the same visual location, the user types an RTL character ('''X'''), the system will prefer the other logical position mapped to this visual position, and the result will be logically '''latinHEBREWXmore''', and visually '''latin|XWERBEHmore'''.


In both cases, the result will likely be what the user expected. Notice that we are able to do this because unlike the current system, which tries to "guess" what character the user will type next (based, e.g., on the keyboard layout), the proposed system only resolves the visual-to-logical ambiguity when it has all the information, i.e. when it knows what character was, in fact, typed in.
In both cases, the result will likely be what the user expected. Notice that we are able to do this because unlike the current system, which tries to "guess" what character the user will type next (based, e.g., on the keyboard layout), the proposed system only resolves the visual-to-logical ambiguity when it has all the information, i.e. when it knows what character was, in fact, typed in.
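
A minimal sketch of this "resolve only once the character is known" idea follows. It is an illustration under assumptions the proposal does not spell out: an LTR paragraph, no neutral characters, no nested embedding levels, and uppercase letters standing for RTL characters. The function <code>insert_typed</code> and its fallback (taking the first candidate when neither neighbor matches the typed direction) are hypothetical choices made for the example.

```python
def is_rtl(ch):
    # Convention from the examples: uppercase stands for RTL, lowercase for LTR.
    return ch.isupper()


def visual_order(text):
    """Logical indices in left-to-right screen order (LTR paragraph, no nesting)."""
    order, start = [], 0
    for i in range(1, len(text) + 1):
        if i == len(text) or is_rtl(text[i]) != is_rtl(text[start]):
            run = list(range(start, i))
            order.extend(reversed(run) if is_rtl(text[start]) else run)
            start = i
    return order


def display(text):
    return "".join(text[i] for i in visual_order(text))


def insert_typed(text, col, typed):
    """Insert `typed` at the visual caret column `col`; return the new logical text.

    Each visual neighbor of the caret suggests one logical position. The
    suggestion coming from the neighbor whose direction matches the typed
    character is preferred.
    """
    order = visual_order(text)
    candidates = []  # (logical position, neighbor is RTL?)
    if col > 0:  # character drawn to the left of the caret
        i = order[col - 1]
        candidates.append((i if is_rtl(text[i]) else i + 1, is_rtl(text[i])))
    if col < len(order):  # character drawn to the right of the caret
        i = order[col]
        candidates.append((i + 1 if is_rtl(text[i]) else i, is_rtl(text[i])))
    matching = [pos for pos, rtl in candidates if rtl == is_rtl(typed)]
    if matching:
        pos = matching[0]
    else:
        pos = candidates[0][0] if candidates else 0  # assumed fallback
    return text[:pos] + typed + text[pos:]


text = "latinHEBREWmore"           # displayed as latinWERBEHmore, caret at column 5
ltr = insert_typed(text, 5, "x")
rtl = insert_typed(text, 5, "X")
print(ltr, display(ltr))           # latinxHEBREWmore latinxWERBEHmore
print(rtl, display(rtl))           # latinHEBREWXmore latinXWERBEHmore
```

In both runs the new character is drawn at the column where the caret was, which is the property the proposal is after.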


Now consider deletion. In the above example, when that caret is visualy positioned between '''n''' and '''W''', the user presses the "backspace" key. In this case, the system can use the paragraph direction (which we'll assume is LTR) to determine that the expected result is deleting the character on the left (the '''n'''). So the logical position selected will be that between '''n''' and '''H'''.  
Now consider deletion. In the above example, when that caret is visually positioned between '''n''' and '''W''', the user presses the "backspace" key. In this case, the system can use the paragraph direction (which we'll assume is LTR) to determine that the expected result is deleting the character on the left (the '''n'''). So the logical position selected will be that between '''n''' and '''H'''.  
Pressing "delete" in the same position will cause the other possible logical position to be selected, resulting in the deletion of the '''W'''.
Pressing "delete" in the same position will cause the other possible logical position to be selected, resulting in the deletion of the '''W'''.


In a more realistic case, when the character on one of the sides of the caret is of neutral directionality (i.e., a space), and the other is of strong directionality, the directionality of the strong character can be used instead of the paragraph direction to determine the direction of deletion. So for the following visual setting: '''latin |WERBEH''', pressing "backspace" will delete the '''W''', while "delete" will delete the space.
In a more realistic case, when the character on one of the sides of the caret is of neutral directionality (i.e., a space), and the other is of strong directionality, the directionality of the strong character can be used instead of the paragraph direction to determine the direction of deletion. So for the following visual setting: '''latin |WERBEH''', pressing "backspace" will delete the '''W''', while "delete" will delete the space.


In any event, the deleted character will always be visually adjacant to the caret.
In any event, the deleted character will always be visually adjacent to the caret.
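
The deletion rule can be sketched directly on the displayed string, since the character to delete is always one of the caret's two visual neighbors. This is only an illustration: <code>deletion_target</code> is a hypothetical name, it returns a visual index rather than a logical one, and the handling of two strong neighbors of the same direction (using that direction) is an assumption that goes slightly beyond what the text states.

```python
def direction(ch):
    # Convention from the examples: uppercase is RTL, lowercase is LTR,
    # anything else (e.g. a space) is treated as neutral.
    if ch.isupper():
        return "rtl"
    if ch.islower():
        return "ltr"
    return None


def deletion_target(visual, caret, key, paragraph="ltr"):
    """Visual index of the character removed by `key` ("backspace" or "delete").

    `visual` is the text as displayed and `caret` is the column the caret sits
    at. Returns None when there is no character on the chosen side.
    """
    left = visual[caret - 1] if caret > 0 else None
    right = visual[caret] if caret < len(visual) else None
    strong = {direction(c) for c in (left, right) if c is not None} - {None}
    # One unambiguous strong direction decides; otherwise use the paragraph's.
    context = strong.pop() if len(strong) == 1 else paragraph
    backspace_goes_left = context == "ltr"
    go_left = backspace_goes_left if key == "backspace" else not backspace_goes_left
    index = caret - 1 if go_left else caret
    return index if 0 <= index < len(visual) else None


shown = "latinWERBEHmore"
print(shown[deletion_target(shown, 5, "backspace")])  # n
print(shown[deletion_target(shown, 5, "delete")])     # W

shown = "latin WERBEH"
print(shown[deletion_target(shown, 6, "backspace")])  # W
print(repr(shown[deletion_target(shown, 6, "delete")]))  # ' '
```

In every case the returned index is either <code>caret - 1</code> or <code>caret</code>, so the deleted character is visually adjacent to the caret by construction.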