Firefox/Projects/Places DB Creation Scripts: Difference between revisions
Revision as of 20:05, 25 February 2009
Overview
Sprint lead: ddahl
Sprinters: adw
- Description
- Create Python/Perl scripts to generate Places DBs with various characteristics such as "many visits within the same domain", "visits across many domains", "many tags", "many bookmarks", etc.
Goals / Use Cases
The sample data set should actually be quite huge (according to Beltzner and Shaver). We should collect stats from others with Dietrich's extension to see what the average data set looks like at Mozilla.
The chief goal is to be able to automate the generation of these sample sqlite databases for a continuous test to run on Places. We want to be able to reliably set some benchmarks and see what code changes either slow down or speed up queries in Places.
Non Goals
tbd
Design
We should try to use the Django ORM to reverse-engineer the Places database schema into Django Models so creating rows will be easy and we can concentrate on url data collection.
Data collection:
Beltzner envisions a huge dataset made up of perhaps 10k unique urls in bookmarks and a similar data set in history, etc...
We need to brainstorm a method for getting this raw data. Spider/bot? There are many python libs for this.
What are the variables we need to keep in mind when creating this data sample for performance testing? ASK Dietrich and Shawn.
Potential exemplar datasets:
- "Grandma": Very few visits per month, mostly to the same sites. Very few bookmarks.
- "Nerd": Very many visits per month across a wide range of sites with a core of often visited sites. Tons o' bookmarks, maybe lots of tags, too.
- "Random Walk": Many visits to many different sites with no discernible most often visited sites.
- "News Hound": Many visits per month, mostly to the same sites.
Or, to think about it more generally, we have these dimensions:
- Number of places (unique URLs)
- Number of visits
- Nature of visits (visiting same URLs often to the exclusion of others, or visiting all places equally? Visiting same domains often? (Does that matter?) Type of transition?)
- Number of bookmarks
- Number of tags
- Nature of tags (each bookmark has tons of tags, few tags, or varied?)
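The dimensions above could be captured as a small parameter record, with each exemplar dataset being one set of values. The following is a minimal sketch, not a settled design: the `DatasetProfile` class, the `sample_visits` helper, the `visit_skew` knob, and every number in `PROFILES` are hypothetical placeholders (only the 10k bookmark figure echoes Beltzner's estimate). "Nature of visits" is approximated here with a Zipf-like weighting, which is an assumption about how real visits are distributed.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetProfile:
    """One point in the dimension space listed above (all values hypothetical)."""
    name: str
    num_places: int         # unique URLs
    num_visits: int         # total visits to spread across the places
    visit_skew: float       # 0.0 = every place equally likely; larger = a few places dominate
    num_bookmarks: int
    num_tags: int
    tags_per_bookmark: int


# Placeholder numbers only; real values should come from collected stats.
PROFILES = {
    "grandma": DatasetProfile("grandma", 50, 200, 1.5, 10, 0, 0),
    "nerd": DatasetProfile("nerd", 10000, 50000, 1.0, 10000, 500, 5),
    "random_walk": DatasetProfile("random_walk", 5000, 20000, 0.0, 100, 10, 1),
    "news_hound": DatasetProfile("news_hound", 200, 20000, 2.0, 50, 5, 1),
}


def sample_visits(profile, seed=0):
    """Return a list of place indices, one entry per visit.

    Place i is weighted 1 / (i + 1) ** visit_skew, so a skew of 0 visits
    all places uniformly and a larger skew concentrates visits on the
    lowest-numbered places.
    """
    rng = random.Random(seed)  # seeded so a generated DB is reproducible
    weights = [1.0 / (rank + 1) ** profile.visit_skew
               for rank in range(profile.num_places)]
    return rng.choices(range(profile.num_places), weights=weights,
                       k=profile.num_visits)
```

Seeding the generator means the same profile always yields the same visit sequence, which matters if the generated databases are to serve as stable benchmarks.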
Shawn says:
- http://mxr.mozilla.org/mozilla-central/source/toolkit/components/places/src/nsNavHistoryAutoComplete.cpp
- GetAutoCompleteBaseQuery() http://mxr.mozilla.org/mozilla-central/source/toolkit/components/places/src/nsNavHistoryAutoComplete.cpp#190
- see BOOK_TAG_SQL - having a lot of tags will slow stuff down, however that might not be representative of "normal users": http://mxr.mozilla.org/mozilla-central/source/toolkit/components/places/src/nsNavHistoryAutoComplete.cpp#109
- mDBAdaptiveQuery http://mxr.mozilla.org/mozilla-central/source/toolkit/components/places/src/nsNavHistoryAutoComplete.cpp#485
- mDBKeywordQuery http://mxr.mozilla.org/mozilla-central/source/toolkit/components/places/src/nsNavHistoryAutoComplete.cpp#507
- AutoCompleteProcessSearch() http://mxr.mozilla.org/mozilla-central/source/toolkit/components/places/src/nsNavHistoryAutoComplete.cpp#1000
- This file (or at least these queries) is being rewritten in JS: see _processRow() in https://bug455555.bugzilla.mozilla.org/attachment.cgi?id=363641
Some notes on the above funcs and SQL:
- GetAutoCompleteBaseQuery() selects from moz_places(_temp) joined with moz_favicons, where frecency != 0, and orders by column 9 (guessing this is the frecency column...)
- BOOK_TAG_SQL defined in terms of SQL_STR_FRAGMENT_GET_BOOK_TAG http://mxr.mozilla.org/mozilla-central/source/toolkit/components/places/src/nsNavHistoryAutoComplete.cpp#94
- mDBAdaptiveQuery uses moz_inputhistory
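To get a feel for how a generated database might be benchmarked, here is a rough sketch that times a query shaped like the notes above: moz_places joined with moz_favicons, filtered on frecency != 0, ordered by frecency. It is not the actual query text from nsNavHistoryAutoComplete.cpp. The cut-down table definitions, the LEFT OUTER JOIN, the example.com URLs, and the `build_sample_db` and `time_query` helpers are all assumptions made for illustration.

```python
import sqlite3
import time

# Simplified stand-in tables; the real Places schema has more columns.
SCHEMA = """
CREATE TABLE moz_favicons (id INTEGER PRIMARY KEY, url TEXT);
CREATE TABLE moz_places (
    id INTEGER PRIMARY KEY,
    url TEXT,
    title TEXT,
    favicon_id INTEGER,
    frecency INTEGER NOT NULL DEFAULT -1
);
"""

# Assumed query shape only, based on the notes above.
QUERY = """
SELECT h.url, h.title, f.url, h.frecency
FROM moz_places h
LEFT OUTER JOIN moz_favicons f ON f.id = h.favicon_id
WHERE h.frecency <> 0
ORDER BY h.frecency DESC
LIMIT ?
"""


def build_sample_db(num_places):
    """Create an in-memory database holding num_places synthetic rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO moz_favicons (id, url) VALUES (1, ?)",
                 ("http://example.com/favicon.ico",))
    rows = [("http://example.com/page/%d" % i, "Page %d" % i, 1, i % 100)
            for i in range(num_places)]
    conn.executemany(
        "INSERT INTO moz_places (url, title, favicon_id, frecency) "
        "VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def time_query(conn, limit=10, repeats=5):
    """Run QUERY several times; return the rows and the best wall-clock time."""
    best = float("inf")
    rows = []
    for _ in range(repeats):
        start = time.perf_counter()
        rows = conn.execute(QUERY, (limit,)).fetchall()
        best = min(best, time.perf_counter() - start)
    return rows, best
```

Taking the best of several runs reduces noise from caching and scheduling; a real harness would point `sqlite3.connect` at a generated places.sqlite file rather than an in-memory database.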
---
Set up Django:
http://www.djangoproject.com/download/1.0.2/tarball/
Uncompress the tarball and, from the extracted directory, run:
sudo python setup.py install
Add the Django bin directory to your path:
export PATH=$PATH:~/code/python/django/bin:~/code/python
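cd ~/code/python
run this (the app is created inside the project directory so that builddb/models.py exists for the inspectdb step):
django-admin.py startproject places
cd places
django-admin.py startapp builddb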
copy a places.sqlite file to ~/code/python/places
export PLACES_DB_PATH=~/code/python/places/places.sqlite
export DJANGO_SETTINGS_MODULE=places.settings
export PYTHONPATH=$PYTHONPATH:~/code/python
edit the places/settings.py:
import os
DATABASE_ENGINE = 'sqlite3'
DATABASE_NAME = os.environ['PLACES_DB_PATH']
reverse engineer the Django Models from the schema:
cd ~/code/python/places
python manage.py inspectdb >> builddb/models.py
Now, we need to clean up the foreign keys.
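As a starting point for that cleanup, the sketch below lists columns that look like references to other tables, so they can be checked by hand and turned into ForeignKey fields in builddb/models.py. It assumes that such columns come out of inspectdb as plain integer fields, which may not hold for every Django version or schema. The `foreign_key_candidates` helper and its naming heuristic (a name ending in `_id`, plus the extra names `fk` and `parent`) are hypothetical, not anything Django or Places provides.

```python
import sqlite3


def foreign_key_candidates(conn, extra_names=("fk", "parent")):
    """Return (table, column) pairs whose column name suggests a reference.

    Heuristic only: a column qualifies if its name ends in "_id" or is
    listed in extra_names. Every hit still needs a manual check.
    """
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    candidates = []
    for table in tables:
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk)
        for row in conn.execute('PRAGMA table_info("%s")' % table):
            column = row[1]
            if column.endswith("_id") or column in extra_names:
                candidates.append((table, column))
    return candidates
```

To run it against the copied database, something like `foreign_key_candidates(sqlite3.connect(os.environ["PLACES_DB_PATH"]))` should work after `import os`.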
Bugs
tbd