Intro
SQLite supports indexed full text search (FTS) through its fts extensions. Indexed FTS makes text searches fast: rather than examining every record in the database to see whether it contains the search string, SQLite finds matching records by looking the search terms up in an index.
But the fts extensions are not suitable for international text by default. This project aims to make them suitable so that we can use indexed FTS to improve the Firefox experience for all of our users.
A good first target feature in Firefox is the awesomebar. It is highly visible, and since it makes extensive use of text searches, it stands to benefit from this work.
Note that full text search does not mean that we store the entire text of pages. We will use it on page titles, URLs, tags, etc., which we already store now; only the lookup will become faster.
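To make the scan-versus-index distinction concrete, here is a minimal conceptual sketch in Python. It is not how SQLite's fts extensions are implemented internally; the sample records, the ASCII-only `tokenize` helper, and the function names are all assumptions made up for illustration.

```python
import re
from collections import defaultdict

# Hypothetical sample records: (id, title, url).
PLACES = [
    (1, "Mozilla Firefox home page", "http://www.mozilla.com/firefox/"),
    (2, "SQLite FTS3 documentation", "http://www.sqlite.org/fts3.html"),
    (3, "ICU - International Components for Unicode", "http://site.icu-project.org/"),
]


def tokenize(text):
    """Split text into lowercase ASCII alphanumeric tokens (deliberately naive)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(places):
    """Build an inverted index mapping each token to the ids of records containing it."""
    index = defaultdict(set)
    for place_id, title, url in places:
        for token in tokenize(title + " " + url):
            index[token].add(place_id)
    return index


def search_scan(places, term):
    """Unindexed search: look at every record for a substring match."""
    term = term.lower()
    return sorted(
        place_id
        for place_id, title, url in places
        if term in (title + " " + url).lower()
    )


def search_index(index, term):
    """Indexed search: one dictionary lookup, no matter how many records exist."""
    return sorted(index.get(term.lower(), ()))


index = build_index(PLACES)
print(search_scan(PLACES, "sqlite"))   # visits all records
print(search_index(index, "sqlite"))   # visits only the index entry
```

Note the trade-off: the index matches whole tokens, so its usefulness depends entirely on how text is split into tokens. The naive tokenizer above drops all non-ASCII text, which is a cruder version of the same i18n problem this project is about.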
- Champion, lead: adw
Status
TAKING OFF
- Gecko has some facilities for i18n word boundary analysis, but they're not comprehensive or suitable for Firefox's 300 million users. (They aren't even used anywhere in the tree.) Too bad.
- Thunderbird does FTS. I talked with asuth about it, and unfortunately their i18n tokenizer doesn't seem appropriate for us either.
- Investigating pulling some components of ICU into our tree. ICU is a large, established, and widely used i18n library that has facilities for word boundary analysis and tokenization. SQLite supports an ICU tokenizer out of the box.
- I was able to build our SQLite with ICU support. It works! I'm building and linking against an ICU build outside of our tree, because I don't want to focus on the grunt work of building and linking inside the tree with our tools right now. I assume doing so is not impossible...
- Next I would like to do some tests to determine its potential for improving awesomebar searches.
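The following sketch illustrates why word boundary analysis matters. It assumes Python's `sqlite3` module is linked against a SQLite build with FTS3 compiled in, and it uses the default "simple" tokenizer, which treats every character above ASCII as part of a token. The table name and sample titles are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# fts3 with its default "simple" tokenizer (assumes FTS3 is available in this build).
conn.execute("CREATE VIRTUAL TABLE titles USING fts3(title)")
conn.executemany(
    "INSERT INTO titles(title) VALUES (?)",
    [("Tokyo Tower photos",), ("東京タワーの写真",)],
)


def match(term):
    """Return the titles that the FTS index reports as matching the term."""
    cursor = conn.execute("SELECT title FROM titles WHERE title MATCH ?", (term,))
    return [row[0] for row in cursor]


# English words are separated by spaces, so "tower" is indexed as its own token.
english_hits = match("tower")
# The Japanese title has no spaces, so the simple tokenizer indexes it as one
# long token and a search for the word inside it finds nothing.
japanese_hits = match("タワー")

print(english_hits)
print(japanese_hits)

# On a SQLite build compiled with ICU support, the table could instead be
# declared with the ICU tokenizer and a locale, for example:
#   CREATE VIRTUAL TABLE titles USING fts3(title, tokenize=icu ja_JP)
# How well ICU segments a given language depends on the ICU version and locale.
```

The commented-out declaration is not run here because a stock SQLite build does not include the ICU tokenizer.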
Goals
- Make awesomebar results come back from the database faster. (Note that async awesomebar prevents the UI from locking up, which is great and necessary, but it doesn't make the database queries any faster.)
Non Goals
- Improving user-facing features in Firefox other than the awesomebar. There is definitely potential, but that's for follow-up work.
- Pulling in parts of ICU (or any other lib) not required for i18n FTS.
Milestones
Note: Dates in the future are only estimates.
- 2010/03 - [Complete] Investigate i18n tokenizers
- 2010/03 - [Complete] Get ICU up and running with our SQLite, Storage
- 2010/03 - Test potential perf impact on awesomebar searches
- 2010/06 - Integrate the required ICU components with our tree, build system
- 2010/08 - Integrate awesomebar with SQLite's fts extension using ICU tokenizer
Delivery Requirements
- Testing to make sure that awesomebar functionality, and especially perf, does not regress.
Constraints
- Have to convince people that pulling in parts of ICU is worth it. Expect pushback...
Dependencies
- Since this project is broadly defined -- improving FTS all the way to using FTS in the awesomebar -- none.
Testing
- Will require manual testing of the awesomebar.
- Maybe we can set up some automated harness to time awesomebar searches.
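As a starting point for such a harness, here is a rough sketch that times an unindexed `LIKE` query against an FTS `MATCH` query on the same synthetic data. It is not the awesomebar's real schema or query: the `places` and `places_fts` tables, the generated titles, and the `timed` helper are all assumptions for illustration, and it requires a SQLite build with FTS3 available.

```python
import sqlite3
import time


def build_db(n_rows=20000):
    """Create an in-memory database with a plain table and an fts3 copy of it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE places (id INTEGER PRIMARY KEY, title TEXT)")
    conn.execute("CREATE VIRTUAL TABLE places_fts USING fts3(title)")
    rows = [
        (i, "page number %d about topic%d" % (i, i % 100))
        for i in range(1, n_rows + 1)
    ]
    conn.executemany("INSERT INTO places(id, title) VALUES (?, ?)", rows)
    conn.executemany("INSERT INTO places_fts(docid, title) VALUES (?, ?)", rows)
    conn.commit()
    return conn


def timed(conn, sql, params, repeat=20):
    """Run a query several times; return the best wall-clock time and the rows."""
    best = float("inf")
    rows = []
    for _ in range(repeat):
        start = time.perf_counter()
        rows = conn.execute(sql, params).fetchall()
        best = min(best, time.perf_counter() - start)
    return best, rows


conn = build_db()
like_time, like_rows = timed(
    conn, "SELECT id FROM places WHERE title LIKE ? ORDER BY id", ("%topic42%",)
)
fts_time, fts_rows = timed(
    conn,
    "SELECT docid FROM places_fts WHERE title MATCH ? ORDER BY docid",
    ("topic42",),
)

print("LIKE scan: %.6f s, %d rows" % (like_time, len(like_rows)))
print("FTS MATCH: %.6f s, %d rows" % (fts_time, len(fts_rows)))
```

Taking the best of several runs reduces noise from caching and scheduling, but numbers from an in-memory toy database say little about real profiles. A real harness would need to run against representative places databases.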