Places/Stats

From MozillaWiki

Latest revision as of 17:09, 29 June 2009

Context

See Places-Stats.mozilla


Analysis

Goals:

  • Insights into the usage of bookmarks and history
  • Define characteristics for test Places databases
  • Use open source tools to create and iterate on reproducible analysis of the Places stats data set
  • [Andyed] Investigate potential to gather updated stats for metrics tracked in historical research (% usage of bookmarks, % new URLs visited, etc.)

Toolset

Code

See the Etherpad Page for the scratchpad

Load Data (save https://places-stats.mozilla.com/stats?format=csv locally)

places <- read.csv("...places.csv")

Compute age metrics

# Parse the oldest/newest visit dates, assuming the CSV's "%m/%d/%y %H:%M" format
places$oldest_stamp <- as.POSIXct(strptime(as.character(places$visit_date_oldest), format = "%m/%d/%y %H:%M"))
places$newest_stamp <- as.POSIXct(strptime(as.character(places$visit_date_newest), format = "%m/%d/%y %H:%M"))
# Span of recorded history, in days
places$time_delta <- difftime(places$newest_stamp, places$oldest_stamp, units = "days")
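A minimal, self-contained check of the age computation above. It runs on two hypothetical rows written in the `%m/%d/%y %H:%M` format that the code assumes. Unlike the original, it passes `tz = "UTC"` to `strptime`. That choice is an assumption: it prevents a daylight-saving transition between the two stamps from turning a whole number of days into a fractional one.

```r
# Hypothetical toy rows in the date format the code above assumes (%m/%d/%y %H:%M).
toy <- data.frame(
  visit_date_oldest = c("06/01/09 12:00", "01/15/09 08:30"),
  visit_date_newest = c("06/29/09 12:00", "01/16/09 08:30"),
  stringsAsFactors = FALSE
)

# Parse a character column into POSIXct; tz = "UTC" is an assumption that avoids DST shifts.
parse_stamp <- function(x) {
  as.POSIXct(strptime(as.character(x), format = "%m/%d/%y %H:%M", tz = "UTC"))
}

toy$oldest_stamp <- parse_stamp(toy$visit_date_oldest)
toy$newest_stamp <- parse_stamp(toy$visit_date_newest)
toy$time_delta <- difftime(toy$newest_stamp, toy$oldest_stamp, units = "days")

print(as.numeric(toy$time_delta))
```

If the exported stamps are actually in local time, dropping `tz = "UTC"` restores the original behaviour.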

Tags & Bookmark Metrics

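# Fraction (0 to 1) of bookmarks that carry tags; NaN for profiles with no bookmarks
places$bookmark_tagged_pct <- (places$bookmark_cnt - places$bookmark_nontag_cnt) / places$bookmark_cnt
places$folder_cnt_crrctd <- places$folder_cnt - places$bookmark_cnt
# "1" if the profile has any tags, "0" otherwise
places$user_of_tags <- ifelse(places$tag_cnt > 0, "1", "0")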
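A sketch of the tag metrics on three hypothetical rows that reuse the CSV column names. It departs from the code above in two places, both of them assumptions rather than part of the original analysis. First, profiles with zero bookmarks get an explicit `NA` instead of a `NaN` from 0/0. Second, `user_of_tags` is stored as a factor, so `table()` and `tapply()` report both groups even when one is empty.

```r
# Hypothetical toy rows using the column names from the stats CSV.
toy <- data.frame(
  bookmark_cnt        = c(40, 10, 0),
  bookmark_nontag_cnt = c(30, 10, 0),
  tag_cnt             = c(5, 0, 0)
)

# Share of bookmarks that carry tags; NA (rather than NaN) when a profile has no bookmarks.
toy$bookmark_tagged_pct <- ifelse(
  toy$bookmark_cnt > 0,
  (toy$bookmark_cnt - toy$bookmark_nontag_cnt) / toy$bookmark_cnt,
  NA_real_
)

# Tag users as a two-level factor so both levels survive in summaries.
toy$user_of_tags <- factor(ifelse(toy$tag_cnt > 0, "1", "0"), levels = c("0", "1"))

print(toy)
print(table(toy$user_of_tags))
```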

Other Derived Values

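# Unique places visited per visit (a fraction, not a percentage)
places$percent_visits_new <- places$places_visited_unique_cnt / places$moz_historyvisits_cnt
# Average visits per day over the span between the oldest and newest visit
places$pages_per_day <- places$moz_historyvisits_cnt / as.numeric(places$time_delta)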

Subsets

# Profiles that use tags, use live bookmarks, or have more than 20 bookmarks
taggers <- places[places$tag_cnt > 0, ]
livemarkers <- places[places$livemark_container_cnt > 0, ]
bookmarkers <- places[places$bookmark_cnt > 20, ]
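One way to use these subsets is to compare a derived metric across groups. The sketch below runs on hypothetical toy data and uses `subset()` in place of bracket indexing. This matters because bracket indexing with a logical `NA` returns an all-`NA` row for each missing `tag_cnt`, while `subset()` drops those rows. The choice of `pages_per_day` and of the median is only illustrative.

```r
# Hypothetical toy data with the derived columns computed above.
toy <- data.frame(
  tag_cnt       = c(3, 0, 12, 0, NA),
  pages_per_day = c(40, 25, 60, 15, 30)
)

# subset() drops rows where the condition is NA, unlike toy[toy$tag_cnt > 0, ],
# which would return an all-NA row for each NA in tag_cnt.
taggers    <- subset(toy, tag_cnt > 0)
nontaggers <- subset(toy, tag_cnt == 0)

# Compare a metric across the two groups; the median is less sensitive to heavy users.
comparison <- c(
  taggers    = median(taggers$pages_per_day),
  nontaggers = median(nontaggers$pages_per_day)
)
print(comparison)
```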

Improvements to Data Collection

  • Add visit_type summation to capture usage of bookmarks and overall visitation patterns (link click, bookmark, etc.)
    • select count(*) as N, visit_type from moz_historyvisits group by visit_type
  • Compute distribution metrics on session length to inform the design of trail-style history visualizations
    • Sums of squares, cubes, and 4th powers characterize a distribution better than min, max, mean, and median: together with the count, they yield the mean, variance, skewness, and kurtosis
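A sketch of how the proposed power sums could be turned back into summary statistics, written in R like the code above. It assumes each profile would report the count n and the sums S1 to S4 of its session lengths. These are hypothetical fields that the current data set does not contain. Power sums are additive, so sums from many profiles can be pooled before conversion. The results are population moments. For very large values, the subtractions can lose floating-point precision.

```r
# Moments of a distribution recovered from n and its power sums S1..S4.
# Uses population (not sample-corrected) moments.
moments_from_power_sums <- function(n, s1, s2, s3, s4) {
  m  <- s1 / n                                     # mean
  e2 <- s2 / n; e3 <- s3 / n; e4 <- s4 / n         # raw moments E[x^k]
  m2 <- e2 - m^2                                   # variance
  m3 <- e3 - 3 * m * e2 + 2 * m^3                  # third central moment
  m4 <- e4 - 4 * m * e3 + 6 * m^2 * e2 - 3 * m^4   # fourth central moment
  c(mean = m,
    variance = m2,
    skewness = m3 / m2^1.5,
    excess_kurtosis = m4 / m2^2 - 3)
}

# Hypothetical session lengths (minutes), used only to check the formulas.
x <- c(2, 5, 5, 9, 30)
power_sums <- sapply(1:4, function(k) sum(x^k))
from_sums <- moments_from_power_sums(length(x), power_sums[1], power_sums[2],
                                     power_sums[3], power_sums[4])

# Direct computation from the raw values for comparison.
d <- x - mean(x)
direct <- c(mean = mean(x),
            variance = mean(d^2),
            skewness = mean(d^3) / mean(d^2)^1.5,
            excess_kurtosis = mean(d^4) / mean(d^2)^2 - 3)
print(all.equal(from_sums, direct))

# Additivity: sums from two hypothetical profiles combine into the pooled sums.
a <- x[1:2]
b <- x[3:5]
pooled <- sapply(1:4, function(k) sum(a^k) + sum(b^k))
print(all.equal(pooled, power_sums))
```

The additivity check is the reason to collect power sums rather than per-profile means or medians, because medians cannot be combined across profiles.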

Problems with Places

The following problems in the database schema, the APIs, or Places in general currently prevent interesting experimentation, analysis, or user experience improvements:

  • Tab spawns... (fill me out!)

Related Research

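  • Hartmut Obendorf, Harald Weinreich, Eelco Herder, Matthias Mayer. "Web Page Revisitation Revisited: Implications of a Long-term Click-stream Study of Browser Usage." In: CHI 2007
    • http://vsis-www.informatik.uni-hamburg.de/publications/view.php/280
    • Classifies use of history into three tasks: short-term revisits (backtrack or undo), medium-term revisits (re-utilize or observe), and long-term revisits (rediscover)
    • 70% of revisits occur within 1 hour
    • Bookmarks trigger less than 12% of URL visits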