Update:Remora Database Migration

From MozillaWiki
Jump to: navigation, search

« Back to Update:Remora

This page is about the migration from the v1 database to the new (remora) database structures.

The Utility

The Python script that does the migration has all sorts of fun features, exciting options, as well as providing thrilling output. It is a wholesome command line program.

Installation

The script is called migration.py and lives in svn at .../trunk/services/bin

The script requires Python 2.4 to work correctly. Versions before 2.4 will likely produce syntax errors like this one:

File "./migration.py", line 961

    delimitedListOfVersionIdsThatShouldBeDeleted = ",".join((str(x.id) for x in newDB.executeSql("select v.id from versions v where addon_id in (%s)" %  delimitedListOfAddons)))

SyntaxError: invalid syntax

The tests require the use of "generator expressions" that were not available until 2.4.

Also, keep in mind that you shouldn't expect existing data in the target database to persist. You should run the script on a clean target database.

The script has three Python library dependencies:

Use

Disclaimer

migration.py is a sharp edged tool.  It makes it easy to completely screw up the
data in the database, therefore it must be used carefully.  migration.py logs its
every use, so if something bad does happen, you can go back through the log to
see what went wrong and perhaps assign blame.

To use the script, just invoke it on the command line with appropriate options. The complete list of options can be found with this command:

./migration.py --help

Command Line Options

lars@deerlodge:~/project> ./migration.py -?
m1 0.2
This routine migrates data from the old AMO database schems to the new one.
        -a, --addons
                a quoted comma delimited list of the ids of addons OR the word
                "all" (default: all)
            --addonStatusWhenNoFilesApproved
                the status to set an addon if no files approved 
                (null, sandbox, pending, nominated, public, disabled)
                (default: sandbox)
            --addonStatusWhenSomeFilesApproved
                the status to set an addon if some files approved 
                (null, sandbox, pending, nominated, public, disabled)
                (default: public)
            --clear
                clear all exisiting information from the new database
            --clearAddons
                clear all exisiting addons in the "addons" option list
                information from the new database
        -c, --config
                specify the location and name of the config file 
                (default: ./migration.conf)
            --fileCachePath
                a path for the caching of files (blank for no cache)
                the path should no end in /
        -?, --help
                print this message
        -l, --locale
                the locale to use when migrating addons (default: en-US)
            --logPathName
                a progressive log of all runs of the migration script
                (default: ./migration.log)
        -A, --migrateAddons
                if present, this switch causes the addons in the "addons" 
                option list to be migrated
        -M, --migrateMetaData
                if present, this switch causes the metadata tables to 
                be migrated
            --newDatabaseName
                the name of the new database within the server (default: )
            --newPassword
                the password for the user in the new database (default: )
            --newServerName
                the name of the new database server (default: )
            --newUserName
                the name of the user in the new database (default: )
        -n, --not
                reverses the meaning of the "addon" option.  If the "addon"
                option has a list, then specify everything except what's on 
                the list
            --oldDatabaseName
                the name of the old database within the server (default: )
            --oldPassword
                the password for the user in the old database (default: )
            --oldServerName
                the name of the old database server (default: )
            --oldUserName
                the name of the user in the old database (default: )
            --previewURIPrefix
                a prefix to add to the URI in the preview table to enable
                downloading (default: https://addons.mozilla.org)
            --tests
                run the test cases
        -v, --verbose
                print status information as it runs

Usage examples

Migrate all metadata (-M) and all addons (-A) and talk alot about what's going on (-v):

lars@deerlodge:~/project> ./migration.py -MAv
2006-08-21 21:05:56.80 beginning migration version 0.2 with options:
        addons=all
        config=./migration.conf
        migrateAddons
        migrateMetaData
        newDatabaseName=noo
        newPassword=**********
        newServerName=deerlodge
        newUserName=lars
        oldDatabaseName=old
        oldPassword=**********
        oldServerName=deerlodge
        oldUserName=lars
        verbose
        version=0.2
2006-08-21 21:05:56.80 beginning metadata migration
2006-08-21 21:05:56.80  clearing metadata tables...
                        done.
2006-08-21 21:05:58.19  beginning applicationsToApplications...
                        done.
2006-08-21 21:05:58.65  beginning categoriesToTags...
                        done.
2006-08-21 21:05:58.76  beginning osToPlatforms...
                        done.
2006-08-21 21:05:58.79  beginning userprofilesToUsers...
                        done.
2006-08-21 21:06:29.47 beginning addons migration
2006-08-21 21:06:29.47  clearing addons tables...
                        done.
2006-08-21 21:06:29.47  beginning mainToAddOns...
                        done.
2006-08-21 21:06:33.21  beginning authorxrefToAddons_users...
                        done.
2006-08-21 21:06:34.87  beginning versionToVerions...
2006-08-21 21:07:02.08  WARNING -- version ID 15798 of addon 2506 for application 79 fails to migrate.
                        (1452, 'Cannot add or update a child row: a foreign key constraint fails 
                          (`noo/applications_versions`, CONSTRAINT `applications_versions_ibfk_1` FOREIGN KEY 
                          (`application_id`) REFERENCES `applications` (`id`))')
2006-08-21 21:07:02.08  WARNING -- version ID 15799 of addon 2507 for application 79 fails to migrate.
                        (1452, 'Cannot add or update a child row: a foreign key constraint fails 
                          (`noo/applications_versions`, CONSTRAINT `applications_versions_ibfk_1` FOREIGN KEY 
                          (`application_id`) REFERENCES `applications` (`id`))')
                        done.
2006-08-21 21:07:09.55  beginning categoryxrefToAddons_tags...
                        done.
2006-08-21 21:07:13.81  beginning previewsToPreviews...
                        done.
done.

Clear all the Addon information from the new database silently:

lars@deerlodge:~/project> ./migration.py --clearAddons

Copy only addons 3110, 3112 and 3113 to the new database:

lars@deerlodge:~/project> ./m1.py -A -a 3110,3112,3113 -v
2006-08-21 21:14:42.80 beginning migration version 0.2 with options:
        addons=3110,3112,3113
        config=./migration.conf
        migrateAddons
        newDatabaseName=noo
        newPassword=**********
        newServerName=deerlodge
        newUserName=lars
        oldDatabaseName=old
        oldPassword=**********
        oldServerName=deerlodge
        oldUserName=lars
        verbose
        version=0.2
2006-08-21 21:14:42.80  beginning setupAddonSelectionTable...
                        done.
2006-08-21 21:14:42.97 beginning addons migration
2006-08-21 21:14:42.97  clearing addons tables...
                        done.
2006-08-21 21:14:42.97  beginning mainToAddOns...
                        done.
2006-08-21 21:14:43.01  beginning authorxrefToAddons_users...
                        done.
2006-08-21 21:14:43.02  beginning versionToVerions...
                        done.
2006-08-21 21:14:43.14  beginning categoryxrefToAddons_tags...
                        done.
2006-08-21 21:14:43.17  beginning previewsToPreviews...
                        done.
done.

Copy all addons except 3110, 3112 and 3113 to the new database:

lars@deerlodge:~/project> ./migration.py -A -na 3110,3112,3113 -v
2006-08-21 21:16:16.01 beginning migration version 0.2 with options:
        addons=3110,3112,3113
        config=./migration.conf
        migrateAddons
        newDatabaseName=noo
        newPassword=**********
        newServerName=deerlodge
        newUserName=lars
        not
        oldDatabaseName=old
        oldPassword=**********
        oldServerName=deerlodge
        oldUserName=lars
        verbose
        version=0.2
2006-08-21 21:16:16.01  beginning setupAddonSelectionTable...
                        done.
2006-08-21 21:16:18.25 beginning addons migration
2006-08-21 21:16:18.25  clearing addons tables...
                        done.
2006-08-21 21:16:18.25  beginning mainToAddOns...
                        done.
2006-08-21 21:16:21.79  beginning authorxrefToAddons_users...
                        done.
2006-08-21 21:16:23.15  beginning versionToVerions...
2006-08-21 21:16:50.91  WARNING -- version ID 15798 of addon 2506 for application 79 fails to migrate.
                        (1452, 'Cannot add or update a child row: a foreign key constraint fails 
                          (`noo/applications_versions`, CONSTRAINT `applications_versions_ibfk_1` FOREIGN KEY 
                          (`application_id`) REFERENCES `applications` (`id`))')
2006-08-21 21:16:50.91  WARNING -- version ID 15799 of addon 2507 for application 79 fails to migrate.
                        (1452, 'Cannot add or update a child row: a foreign key constraint fails 
                          (`noo/applications_versions`, CONSTRAINT `applications_versions_ibfk_1` FOREIGN KEY 
                          (`application_id`) REFERENCES `applications` (`id`))')
                        done.
2006-08-21 21:16:58.47  beginning categoryxrefToAddons_tags...
                        done.
2006-08-21 21:17:02.80  beginning previewsToPreviews...
                        done.
done.

Remove Addons 2506 and 2507 from the new database:

lars@deerlodge:~/project> ./migration.py --clearAddons -a 2506,2507

Reinsert Addon 2506 as a German language addon:

lars@deerlodge:~/project> ./migration.py -A -a 2506 -l de

Independent Information (metadata)

applications to applications, appversions

Old Database New Database
<thead> </thead> <tbody> </tbody>
AppID applications.id
GUID applications.guid
-
AppName translations.localized_string
insert id into applications.name
-
AppName translations.localized_string
insert id into applications.shortname
-
supported supported
-
major no mapping
-
minor no mapping
-
release no mapping
}
Old Database New Database
<thead> </thead> <tbody> </tbody>
AppID appversions.id
AppID appversions.application_id
use AppID for originally encountered application
-
Version appversions.version
-
appversions.created
time.now
-
appversions.modified
use default value
}

categories to tags

Old Database New Database
<thead> </thead> <tbody> </tbody>
CategoryID id
CatName translations.localized_string
insert id into name
-
CatDesc translations.localized_string
insert id into description
-
CatType addontype_id
lookup in addontypes.name assuming mapping of E->extensions, T->themes...
-
CatApp application_id
lookup in applications.name
-
created
time.now
-
modified
use default value
}

os to platforms

Old Database New Database
<thead> </thead> <tbody> </tbody>
OSID id
OSName translations.localized_string
insert id into name
-
OSName translations.localized_string
insert id into shortname
-
created
time.now
-
modified
use default value
}

userprofiles to users

Old Database New Database
<thead> </thead> <tbody> </tbody>
UserID id
UserName parse to firstname, lastname
lastname is always the last word from Username
firstname is all but the last word or "?" if there are no other words
-
nickname
no source for data, Null not allowed, no default - I put in an empty string
-
UserEmail email
-
UserWebsite homepage
-
UserPass password
-
UserMode no mapping
-
UserTrusted no mapping
-
UserEmailHide emailhidden
-
UserLastLogin no mapping
-
ConfirmationCode confirmationcode
-
created
time.now
-
modified
use default value
-
notes
null
}

authorxref to addons_users

Old Database New Database
<thead> </thead> <tbody> </tbody>
ID addon_id
UserID user_id
-
created
time.now
-
modified
use default value
}

add-on specific information (per addon ID)

main to addons, translations

Old Database New Database
<thead> </thead> <tbody> </tbody>
ID id
GUID guid
-
Name translations.localized_string
insert into translations, then fetch id
name
-
Type addontype_id
will have to find type in type table
-
DateAdded created
-
DateUpdated modified
-
Homepage translations.localized_string
insert id into homepage
-
Description translations.localized_string
insert into translations, then fetch id
description
-
Rating averagerating
-
downloadcount weeklydownloads
-
TotalDownloads totaldownloads
-
devcomments translations.localized_string
insert id into developercomments
-
icon
use default value
-
Description translations.localized_string
take up to 250 characters but not more than the first line, do not break inside a word, add ellipsis if break is not at the end of sentence insert into translations, then fetch id
summary
-
inactive
use default value
-
prerelease
use default value
-
adminreview
use default value
-
sitespecific
use default value
-
externalsoftware
use default value
-
approvalnotes
use NULL
-
eula
use NULL
-
privacypolicy
use NULL
}

version to versions, files, applications_versions

Old Database New Database
<thead> </thead> <tbody> </tbody>
vID versions.id
ID versions.addon_id
-
Version versions.version
-
MinAppVer_int
no mapping
-
MaxAppVer_int
no mapping
-
DateAdded versions.created
-
DateUpdated versions.modified
-
Notes translations.localized_string
insert id into versions.releasenotes
-
addons.approvalnotes
empty string
-
versions.created
time.now
-
modified
use default value
}
Old Database New Database
<thead> </thead> <tbody> </tbody>
AppID applications_versions.application_id
vID applications_versions.version_id
-
MinAppVer applications_versions.min
look up in appversions table
ambiguous versions are translated via the technique outlined in update:ToublesInAddonMigration. Missing versions are added to the appversiontable
-
MaxAppVer applications_versions.max
look up in appversions table
ambiguous versions are translated via the technique outlined in update:ToublesInAddonMigration. Missing versions are added to the appversiontable
-
applications_versions.created
time.now
-
applications_versions.modified
use default value
}
Old Database New Database
<thead> </thead> <tbody> </tbody>
id
auto-increment
vID files.version_id
-
OSID files.platform_id
-
URI files.filename
use base name only
-
Size files.size
-
hash files.hash
-
DateApproved files.dateapproved
-
approved files.approved
mapping -- "YES": 4, "NO": 1, "?":2, "DISABLED":5
-
files.created
time.now
-
files.modified
use default value
}

categoryxref to addons_tags

Old Database New Database
<thead> </thead> <tbody> </tbody>
ID addon_id
CategoryID tag_id
}

previews to previews

Old Database New Database
<thead> </thead> <tbody> </tbody>
PreviewID id
PreviewURI filedata
data from download
-
PreviewURI filetype
the MIME type from download
-
PreviewURI thumbdata
data from download cooked into an image of 200x150 with aspect ratio preserved (add transparent filler if needed)
-
PreviewURI thumbtype
the MIME type from download
-
ID addon_id
-
caption translations.localized_string
insert id from translations into into caption
-
preview highlight
conversion -- "YES":1, "NO":0
-
created
time.now
-
modified
use default value
}

Feedback

If we don't migrate add-ons which are only compatible with fx 1.0, then users can upload add-ons with the same em:ID and then users who might have had the old add-ons still installed but disabled (or who are still using fx 1.0) will be updated to the new one.

Previews

  • No previews were migrated -- but we need to solve this issue to help get images over.
  • In order to pull them we'd need to generate the URI and grab them so we can insert them into the proper blob in the previews table

Old db:

mysql> describe previews;
+------------+------------------+------+-----+---------+----------------+

Type | Null | Key | Default | Extra |

+------------+------------------+------+-----+---------+----------------+

int(11) | | PRI | NULL | auto_increment | varchar(200) | | | | | int(5) | | MUL | 0 | | varchar(255) | | | | | enum('YES','NO') | | | NO | |

+------------+------------------+------+-----+---------+----------------+
5 rows in set (0.00 sec)
  • "PreviewURI" column points to the relative path of the preview.
  • "ID" is the FK to main.ID
  • "caption" would get ported to dynamic l10n
  • "preview" would be "highlight" in the new previews table

Moving to new db:

  • Use "https://addons.mozilla.org".$PreviewURI to grab the image and insert it into the filedata blog in the new db.
  • Filetype should be preserved as well so it can display properly in the images controller.

Duplicate Files

  • (clouserw) I think there were some duplications within the files table. For example:
mysql> select * from files where filename like 'stumbleupon-2.1%';
+--------+------------+-------------+---------------------------------+------+------+------------+--------+---------------------+---------------------+---------------------+

version_id | platform_id | filename | size | hash | codereview | status | datestatuschanged | created | modified |

+--------+------------+-------------+---------------------------------+------+------+------------+--------+---------------------+---------------------+---------------------+

7507 | 1 | stumbleupon-2.1-fx+fl+mz+ns.xpi | 101 | | 0 | 4 | 0000-00-00 00:00:00 | 2006-12-22 00:20:46 | 0000-00-00 00:00:00 | 7507 | 1 | stumbleupon-2.1-fx+fl+mz+ns.xpi | 101 | | 0 | 4 | 0000-00-00 00:00:00 | 2006-12-22 00:20:46 | 0000-00-00 00:00:00 | 7507 | 1 | stumbleupon-2.1-fx+fl+mz+ns.xpi | 101 | | 0 | 4 | 0000-00-00 00:00:00 | 2006-12-22 00:20:46 | 0000-00-00 00:00:00 | 7507 | 1 | stumbleupon-2.1-fx+fl+mz+ns.xpi | 101 | | 0 | 4 | 0000-00-00 00:00:00 | 2006-12-22 00:20:46 | 0000-00-00 00:00:00 |

+--------+------------+-------------+---------------------------------+------+------+------------+--------+---------------------+---------------------+---------------------+
4 rows in set (0.00 sec)
  • (morgamic) I think the script may have created duped rows in files because in the old 'version' table rows would be duped except for the app/appversion info -- in this database that data is spliced off. Might want to start there.
  • (lars) files are duplicated because sometimes the the same physical file is specified for more than one application (Firefox & Mozilla) - it is very easy to prevent this from resulting in two entries in to the file table. However, if the two entries may not match exactly (hash, approval, etc), the one seen later will superceed the one seen earlier...

Duplicate Categories/Tags

  • I haven't looked at the results since last week, but as I recall, there are many duplicate categories. My guess is that this is because in v1/v2, each application has to have its own category, so there might be an "Emo" category for Firefox and a different "Emo" category for Thunderbird, and the add-on is in both of them, but it only shows one because we grouped by name before. We will need to look at this and figure out whether to fix this in the code or migration. Fligtar 16:28, 27 December 2006 (PST)
  • (lars) the database schema for the tags table requires that we duplicate the entries because it has foreign keys to both the addontypes and applications tables. There are six tags called "Miscellaneous" because we need entries for the addontypes "extension" and "theme" for the applications "Firefox", "Mozilla" and "Thunderbird"

Other

  • There are blank filenames in the files table - is this a result of the migration?
    • (lars) empty filenames in the migration are faithful reproductions from the old database. There are entries in the olddb version table where the URI column is blank. What would you like me to do with them?
  • Some sha sums are calculated in the files table, some aren't. Since we're migrating data, would it be useful to calculate hashes while we're at it? (Would it be easily doable?)

Updates, 2007-02-07

Things that we need to tweak:

  • When there are multiple files per version/platform (one for each app, for example) AND there is at least one Firefox file, keep the firefox one and get rid of the non-firefox one.
     SELECT
        addons.guid as guid,
        addons.id as id,
        addons.addontype_id as type,
        applications.guid as appguid,
        appmin.version as min,
        appmax.version as max,
        files.filename,
        files.hash,
        versions.version as version
    FROM
        versions 
    INNER JOIN addons ON versions.addon_id = addons.id AND addons.guid = '{c45c406e-ab73-11d8-be73-000a95be3b12}' AND addons.status = 4
    INNER JOIN applications_versions ON applications_versions.version_id = versions.id 
    INNER JOIN applications ON applications_versions.application_id = applications.id  AND applications.guid = '{ec8030f7-c20a-464f-9b0e-13a3a9e97384}'       
    INNER JOIN appversions appmin ON appmin.id = applications_versions.min AND '2.0.0.2pre' >= appmin.version
    INNER JOIN appversions appmax ON appmax.id = applications_versions.max  
    INNER JOIN files ON files.version_id = versions.id AND (files.platform_id = 1 OR files.platform_id = '3' ) AND files.status = 4
    ORDER BY
        versions.id DESC
    LIMIT 1 

In the case above, when using the migrated data set, the -mz file is returned because it is selected at random. The file should actually be the -fx file.

  • Verify main.description is getting trimmed at max 250 chars. when migrating to summary

Final Migration

Clouserw performed the final migration on 2007-03-05:

I expect this script will be retired at this point, but just to be complete, this is what happened with the final run:

  • Initial AMOv2 database was loaded with a db dump from v2 from 2007-03-05 08:10am
    • Initial AMOv3 database was loaded with remora.sql from r2206
    • migration.conf was setup, etc. Final command run was ./migration.py -MAv --recalculateHash > migration.out 2>&1. We had to run the --recalculateHash flag so the script would download new addons (if it finds a hash in the db, it doesn't download the addon)
    • The following previews failed to migrate (but there are blank entries for them in the db):
    acid_burn-1.jpg for addon_id 453 acid_burn-2.jpg for addon_id 453 futurama-4.jpg for addon_id 1240 kiminbu?-1.jpg for addon_id 2904 ook?_video_ook!-4.jpg for addon_id 2584 webos_xul#_extension-1.jpg for addon_id 2945 ook?_video_ook!-5.jpg for addon_id 2584 print_preview-2.jpg for addon_id 1778
  • Ran the supplemental SQL
  • ./migration.py --clearAddons -a cat amov2_addons_marked_for_deletion to remove the old disabled addons. Just some housekeeping.