Firefox/Projects/FTS and Awesomebar: Difference between revisions

(Massive update to include more details)
Line 20: Line 20:
** [http://mxr.mozilla.org/mozilla-central/source/intl/lwbrk/idl/nsISemanticUnitScanner.idl <tt>nsISemanticUnitScanner</tt>] is the only interface there, and [http://mxr.mozilla.org/mozilla-central/source/intl/lwbrk/src/nsSemanticUnitScanner.h <tt>nsSemanticUnitScanner</tt>] is its only implementation.  <tt>nsSemanticUnitScanner</tt> is derived from [http://mxr.mozilla.org/mozilla-central/source/intl/lwbrk/src/nsSampleWordBreaker.cpp <tt>nsSampleWordBreaker</tt>], which, as its name implies, is not robust.  It supports ASCII but, as far as I can tell, has incomplete support for CJK and Thai and no support for other scripts.
** There's [http://mxr.mozilla.org/mozilla-central/source/intl/lwbrk/public/nsIWordBreaker.h <tt>nsIWordBreaker</tt>], but its [http://mxr.mozilla.org/mozilla-central/source/intl/build/nsI18nModule.cpp#65 only implementation] is <tt>nsSampleWordBreaker</tt>.
** There are some other files.  There are several line breakers in the [http://mxr.mozilla.org/mozilla-central/source/intl/lwbrk/src/ src] directory.  There's [http://mxr.mozilla.org/mozilla-central/source/intl/lwbrk/src/rulebrk.h another word breaker], but it's for Thai text only.  There's a smattering of files related to [http://en.wikipedia.org/wiki/JIS_encoding JIS encoding].
* Thunderbird 3 does FTS with a custom tokenizer in the [http://mxr.mozilla.org/comm-central/source/mailnews/extensions/fts3/ mailnews/extensions/fts3/] directory.
** According to the [http://mxr.mozilla.org/comm-central/source/mailnews/extensions/fts3/src/README.mozilla readme], the tokenizer "supports CJK indexing using bi-gram. So you have to use bi-gram search string if you wanto to search CJK character."  There is no mention of other scripts.
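
To make the bi-gram remark in the readme concrete, here is a minimal Python sketch of bi-gram indexing. It is not Thunderbird's tokenizer: the function names (<tt>bigram_tokenize</tt>, <tt>_is_cjk</tt>, <tt>_emit</tt>), the three Unicode ranges treated as CJK, and the lowercasing of non-CJK words are all assumptions made for illustration.

```python
def _is_cjk(ch: str) -> bool:
    """Rough CJK test; the ranges are an assumption, not Thunderbird's list."""
    cp = ord(ch)
    return (
        0x4E00 <= cp <= 0x9FFF      # CJK Unified Ideographs
        or 0x3040 <= cp <= 0x30FF   # Hiragana and Katakana
        or 0xAC00 <= cp <= 0xD7AF   # Hangul Syllables
    )


def _emit(run: str, run_is_cjk: bool) -> list:
    """Turn one run of same-kind characters into index tokens."""
    if not run:
        return []
    if not run_is_cjk:
        return [run.lower()]        # ordinary word: one token
    if len(run) == 1:
        return [run]                # lone CJK character: keep as-is
    # CJK run: every overlapping pair of adjacent characters
    return [run[i:i + 2] for i in range(len(run) - 1)]


def bigram_tokenize(text: str) -> list:
    """Split text into word tokens and overlapping CJK bi-grams."""
    tokens = []
    run = ""
    run_is_cjk = False
    for ch in text:
        if _is_cjk(ch):
            kind = True
        elif ch.isalnum():
            kind = False
        else:
            # Separator: close the current run, if any.
            tokens.extend(_emit(run, run_is_cjk))
            run = ""
            continue
        if run and kind != run_is_cjk:
            # Script changed mid-run: close the old run first.
            tokens.extend(_emit(run, run_is_cjk))
            run = ""
        run_is_cjk = kind
        run += ch
    tokens.extend(_emit(run, run_is_cjk))
    return tokens


print(bigram_tokenize("全文検索 FTS"))
```

Under these assumptions a four-character CJK run is indexed as three overlapping pairs, so a two-character query such as <tt>検索</tt> matches an index token while the single character <tt>検</tt> does not. That is one plausible reading of why the readme asks for bi-gram search strings.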