EngineeringProductivity/Projects/Treeherder: Difference between revisions

Revision as of 23:25, 20 December 2012

108 bytes added, 20 December 2012

→‎We Need A Data Model

Jeads

@@ Line 43: / Line 43: @@
 == We Need A Data Model ==
-After examining TBPL and its relationship to Buildbot and Mercurial, it's unclear that TBPL should be the "source of truth" for build/branch/revision and os/platform/test nomenclature or for obtaining mappings between build and push data.  The original sources of that information are the build system (Buildbot or any other automated build process) and the source code repository (Mercurial or Git).  There are a variety of third-party applications (TBPL, OrangeFactor, Graph Server, Talos, Datazilla, Go Faster, etc.; the number continues to grow!) that make use of subsets of this information by parsing build logs (many times over and in a multitude of different ways) or by using related services/data.  Each of these applications, including TBPL, attempts to represent the relationship (or a subset of it) between a build/push/branch/revision and os/platform/test in its own way, with its own implementation of a "data model".  These data models often maintain multiple methods of mapping between the different applications' versions of the nomenclature.  Some examples would include branch or os names from Buildbot->TBPL->OrangeFactor or Talos->TBPL->Graph Server/Datazilla.  This makes it impossible to add new automation without breaking an armada of downstream applications.
+After examining TBPL and its relationship to Buildbot and Mercurial, it's unclear that TBPL should be the "source of truth" for build/branch/revision and os/platform/test nomenclature or for obtaining mappings between build and push data.  The original sources of that information are the build system (Buildbot or any other automated build process) and the source code repository (Mercurial or Git).  There are a variety of third-party applications (TBPL, OrangeFactor, Graph Server, Talos, Datazilla, Go Faster, etc.; the number continues to grow!) that make use of subsets of this information by parsing build logs (many times over and in a multitude of different ways) or by using related services/data.  Each of these applications, including TBPL, attempts to represent the relationship (or a subset of it) between a build/push/branch/revision and os/platform/test in its own way, with its own implementation of a "data model".  These data models often maintain multiple methods of mapping between the different applications' versions of the nomenclature.  Some examples would include branch or os names from Buildbot->TBPL->OrangeFactor or Talos->TBPL->Graph Server/Datazilla.  This makes it impossible to add new automation without breaking an armada of downstream applications and severely limits the quality of applications that can be delivered.  We desperately need to standardize.
 We need a data model definition for all of the entities involved (product, build, push, branch, revision, os, platform, test, etc.), their attributes, and the relationships between them.  With that in hand we could start identifying where the correct root sources of truth reside, and at what point in the lifecycle of a source code push and build each piece of information becomes available.  Retrieval of some data will require accessing both a build system and a related source code repository.
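To make the idea of a shared data model concrete, here is a minimal sketch in Python. It is an illustration only, not a proposed schema: every class, field, and function name, the alias table, and all example values are hypothetical and do not come from Buildbot, TBPL, or any of the applications named above. It shows entities with explicit attributes, one explicit build-to-push relationship, and a single place where application-specific names are translated to a canonical name.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Revision:
    branch: str      # canonical branch name
    changeset: str   # identifier assigned by the source code repository


@dataclass(frozen=True)
class Push:
    branch: str
    revisions: Tuple[Revision, ...]  # revisions delivered by this push


@dataclass(frozen=True)
class Platform:
    os: str
    name: str


@dataclass(frozen=True)
class Build:
    product: str
    platform: Platform
    revision: Revision  # the revision this build was made from


@dataclass(frozen=True)
class SuiteRun:
    build: Build
    suite: str  # name of the test suite that ran against the build


# Hypothetical alias table: (application, local name) -> canonical name.
# One shared table replaces a separate mapping inside each application.
BRANCH_ALIASES: Dict[Tuple[str, str], str] = {
    ("app-a", "Example_Branch"): "example-branch",
    ("app-b", "example branch"): "example-branch",
}


def canonical_branch(application: str, local_name: str) -> str:
    """Translate an application-specific branch name to the canonical one.

    Raises KeyError when the name is unknown, so that new nomenclature is
    noticed instead of being silently guessed.
    """
    return BRANCH_ALIASES[(application, local_name)]


def builds_for_push(push: Push, builds: List[Build]) -> List[Build]:
    """Return the builds whose revision belongs to the given push."""
    return [b for b in builds if b.revision in push.revisions]
```

With a definition along these lines, a downstream application would ask for the builds belonging to a push, or for the canonical form of a branch name, rather than re-deriving either by parsing build logs. Where each attribute is actually populated from (build system or source code repository) is exactly the open question raised above.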