Firefox/Projects/FTS and Awesomebar: Difference between revisions

Line 18: Line 18:


* Gecko has some facilities for i18n word boundary analysis under the  [http://mxr.mozilla.org/mozilla-central/source/intl/lwbrk/ intl/lwbrk/] directory.
** [http://mxr.mozilla.org/mozilla-central/source/intl/lwbrk/idl/nsISemanticUnitScanner.idl <tt>nsISemanticUnitScanner</tt>] is the only interface there, and [http://mxr.mozilla.org/mozilla-central/source/intl/lwbrk/src/nsSemanticUnitScanner.h <tt>nsSemanticUnitScanner</tt>] is its only implementation.  <tt>nsSemanticUnitScanner</tt> is derived from [http://mxr.mozilla.org/mozilla-central/source/intl/lwbrk/src/nsSampleWordBreaker.cpp <tt>nsSampleWordBreaker</tt>], which, as its name implies, is not robust.  It supports ASCII but, as far as I can tell, has only incomplete support for CJK and Thai and no support for other scripts.  Manual testing shows that it does not tokenize Japanese satisfactorily.
** There's [http://mxr.mozilla.org/mozilla-central/source/intl/lwbrk/public/nsIWordBreaker.h <tt>nsIWordBreaker</tt>], but its [http://mxr.mozilla.org/mozilla-central/source/intl/build/nsI18nModule.cpp#65 only implementation] is <tt>nsSampleWordBreaker</tt>.
** The remaining files are less relevant.  The [http://mxr.mozilla.org/mozilla-central/source/intl/lwbrk/src/ src] directory contains several line breakers and [http://mxr.mozilla.org/mozilla-central/source/intl/lwbrk/src/rulebrk.h another word breaker], but the latter is for Thai text only.  There's also a smattering of files related to [http://en.wikipedia.org/wiki/JIS_encoding JIS encoding].
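
To illustrate why a simple word breaker can handle ASCII yet fall short on Japanese, here is a minimal sketch of a breaker that splits only where the character class changes.  It is not Gecko's code: the <tt>char_class</tt> table, the <tt>class_break</tt> function, and the code point ranges are hypothetical simplifications, and the assumption that a sample breaker works roughly along these lines is mine.

```python
def char_class(ch: str) -> str:
    """Hypothetical coarse character classes; not Gecko's actual table."""
    if ch.isspace():
        return "space"
    cp = ord(ch)
    if 0x3040 <= cp <= 0x309F:
        return "hiragana"
    if 0x30A0 <= cp <= 0x30FF:
        return "katakana"
    if 0x4E00 <= cp <= 0x9FFF:
        return "han"
    if ch.isalnum():
        return "alnum"
    return "punct"


def class_break(text: str) -> list:
    """Split text wherever the character class changes; drop whitespace runs."""
    tokens = []
    start = 0
    for i in range(1, len(text) + 1):
        # Close the current run at end of input or at a class transition.
        if i == len(text) or char_class(text[i]) != char_class(text[start]):
            token = text[start:i]
            if not token.isspace():
                tokens.append(token)
            start = i
    return tokens


print(class_break("full text search"))  # ['full', 'text', 'search']
print(class_break("食べました"))          # ['食', 'べました']
```

Space-delimited text comes out as expected, but the Japanese verb form 食べました is cut between its kanji stem and its hiragana ending, so neither token is a useful search term.  Tokenizing such text well generally needs dictionary-based or statistical segmentation rather than class transitions alone.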