Sources of data
- Bugzilla (full MySQL data dump available; talk to Josh for details)
  - Interesting data: comments, comment dates, comment authors, attachments (e.g. code contributions), flag for first code contribution
- mozilla-central or GitHub
- Mailing list archives of users who say they wish to write code for Mozilla
  - Usually for correlation with other pieces of data
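
As a rough illustration of the "flag for first code contribution" item, the following Python sketch picks the earliest patch attachment per author. The `Attachment` record and its field names are hypothetical stand-ins for rows extracted from the dump; they are assumptions, not the actual Bugzilla schema.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Attachment:
    # Hypothetical record; field names are assumptions, not the Bugzilla schema.
    bug_id: int
    author: str
    created: date
    is_patch: bool


def first_code_contributions(attachments):
    """Map each author to the earliest patch attachment they submitted."""
    first = {}
    for att in attachments:
        if not att.is_patch:
            continue  # only code contributions count
        current = first.get(att.author)
        if current is None or att.created < current.created:
            first[att.author] = att
    return first
```

An attachment would then carry the flag exactly when it is the value stored for its author in the returned mapping.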
Interesting data to be mined
- Contributors who are no longer active
  - In progress, complete with a ranking algorithm to prioritize prolific contributors
- Mentoring effectiveness
  - Bugs in Bugzilla can have mentor=foo annotations; determine the ratio of fixed to open bugs for mentor foo and dig into the results (number of different people commenting in bugs, number of code contributions, how active the mentor is, etc.)
- Breakdown of volunteer activity across groups of components (such as Core: DOM and friends)
  - Requires access to lists of employee names
- Effectiveness of mentored bugs as a stepping stone
  - Figure out how many new contributors (i.e. people whose first contribution was in 2012) contributed to at least one mentored bug and at least one non-mentored bug
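
The two mentoring measurements above could be sketched in Python roughly as follows. This is only a sketch under stated assumptions: the `Bug` record, its fields, and both function names are hypothetical, and each bug is assumed to be reduced to "fixed" or "open" beforehand (other resolutions are not modelled).

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Bug:
    # Hypothetical record; field names are assumptions for illustration.
    bug_id: int
    mentor: Optional[str]  # None when the bug has no mentor annotation
    fixed: bool            # True if fixed, False if still open


def fixed_to_open_ratio(bugs):
    """Return {mentor: (fixed, open, ratio)}; ratio is None when no bugs are open."""
    counts = defaultdict(lambda: [0, 0])  # mentor -> [fixed, open]
    for bug in bugs:
        if bug.mentor is None:
            continue
        counts[bug.mentor][0 if bug.fixed else 1] += 1
    return {
        mentor: (fixed, open_, fixed / open_ if open_ else None)
        for mentor, (fixed, open_) in counts.items()
    }


def stepping_stone_contributors(contributions, bugs, new_contributors):
    """Return new contributors with both a mentored and a non-mentored bug.

    contributions: iterable of (author, bug_id) pairs.
    new_contributors: set of authors whose first contribution fell in the
    period of interest (e.g. 2012).
    Bug ids missing from `bugs` are treated as non-mentored.
    """
    mentored_ids = {b.bug_id for b in bugs if b.mentor is not None}
    seen = defaultdict(set)  # author -> kinds of bug they contributed to
    for author, bug_id in contributions:
        if author in new_contributors:
            kind = "mentored" if bug_id in mentored_ids else "non-mentored"
            seen[author].add(kind)
    return {author for author, kinds in seen.items() if len(kinds) == 2}
```

The size of the set returned by `stepping_stone_contributors`, compared with the size of `new_contributors`, gives the stepping-stone figure; the per-mentor tuples are a starting point for the deeper dig into commenters and mentor activity.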