BMO/ElasticSearch: Difference between revisions

|}
|}


As humans, we know what happened here:  Kyle (me) changed his email address (somewhere between Jan 2nd and Jan 4th), and then removed himself from the CC list for the bug.  The ETL script has no such domain knowledge, and simply sees an inconsistency.  A naive rebuilding of the CC list history would have to assume '''kyle@lahnakoski.com''' was in the CC list since the beginning (a legitimate situation for the first snapshot of a bug, but an uncommon one).  In aggregate, these mismatches led the naive rebuild to conclude that many bugs started with long CC lists that were eventually pared down over time to what currently exists.  This pattern is the opposite of reality, where a bug usually starts with few people and the list grows.
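To make the naive rebuild concrete, here is a minimal Python sketch, not the actual ETL code. It assumes a simplified, hypothetical change format (one pair of added/removed email sets per edit, oldest first) instead of the real Bugzilla activity schema, and the example history is a hypothetical one modeled on the case described here. The function names `naive_initial_cc` and `implied_initial_counts` are illustrative only. The first function undoes the changes from newest to oldest, starting from the current CC list. The second computes, for every email, the initial presence the history implies (current presence minus net change).

```python
from collections import Counter

# Hypothetical, simplified change record: (emails added, emails removed) for one
# edit of the CC field. Real Bugzilla activity rows carry much more information.
Change = tuple[frozenset[str], frozenset[str]]


def naive_initial_cc(current_cc: set[str], changes: list[Change]) -> set[str]:
    """Rebuild the first CC snapshot by undoing changes from newest to oldest."""
    cc = set(current_cc)
    for added, removed in reversed(changes):
        cc -= added    # undo an addition (silently ignores emails that are absent)
        cc |= removed  # undo a removal: the email is assumed to have been present
    return cc


def implied_initial_counts(current_cc: set[str], changes: list[Change]) -> dict[str, int]:
    """Current presence (0 or 1) minus net change (+1 per add, -1 per remove).

    1 means "must have been in the first snapshot", 0 means "consistent",
    and a negative value is impossible, which flags an inconsistency.
    """
    net: Counter = Counter()
    for added, removed in changes:
        for email in added:
            net[email] += 1
        for email in removed:
            net[email] -= 1
    return {e: int(e in current_cc) - net[e] for e in set(net) | current_cc}


# Hypothetical history, oldest first, modeled on the case in this section.
changes = [
    (frozenset({"klahnakoski@mozilla.com"}), frozenset()),
    (frozenset({"mcote@mozilla.com"}), frozenset()),
    (frozenset(), frozenset({"kyle@lahnakoski.com"})),
]
current = {"mcote@mozilla.com"}

print(naive_initial_cc(current, changes))  # {'kyle@lahnakoski.com'}
print(sorted(implied_initial_counts(current, changes).items()))
# [('klahnakoski@mozilla.com', -1), ('kyle@lahnakoski.com', 1), ('mcote@mozilla.com', 0)]
```

The backward replay quietly places '''kyle@lahnakoski.com''' in the first snapshot. The per-email counts expose the same mismatch as a +1 on one address and a -1 on the other, which is the kind of pairing an alias analysis can exploit.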


I implemented an alias analysis that uses the inconsistency in the history.  Specifically, '''klahnakoski@mozilla.com''' was added to the CC (+1) but does not exist in the current bug state (+0).  By contrast, '''mcote@mozilla.com''' was added (+1) and exists (+1), so its history is consistent.  We must conclude that the removal of '''kyle@lahnakoski.com''' (-1) matches the addition of '''klahnakoski@mozilla.com''' (+1), for zero net effect (0).  Really, we are solving a simple equation: