Places/Stats
Latest revision as of 17:09, 29 June 2009
Context
Analysis
Goals:
- Insights into usage of bookmarks and history
- Define characteristics for test places databases
- Use open source tools to create and iterate on reproducible analysis of the places stats data set
- [Andyed] Investigate potential to gather updated stats for metrics tracked in historical research (% usage of bookmarks, % new urls visited, etc.)
Toolset
Code
See the Etherpad Page for the scratchpad
Load Data (save https://places-stats.mozilla.com/stats?format=csv locally)
places <- read.csv("...places.csv")
Compute age metrics
places$oldest_stamp = as.POSIXct(strptime(as.character(places$visit_date_oldest), format="%m/%d/%y %H:%M"))
places$newest_stamp = as.POSIXct(strptime(as.character(places$visit_date_newest), format="%m/%d/%y %H:%M"))
places$time_delta = difftime(places$newest_stamp, places$oldest_stamp, units="days")
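The age metrics can be tried out on a small made-up data frame before running them on the full CSV. This is only a sketch: the `toy` data frame and its two rows are invented for illustration, and `tz = "UTC"` is an assumption added here so that the result does not depend on the local time zone or on daylight saving changes (the snippet above uses the session's default time zone).

```r
# Toy stand-in for the places data frame; both rows are made up.
toy <- data.frame(
  visit_date_oldest = c("01/15/09 08:30", "03/01/09 12:00"),
  visit_date_newest = c("06/29/09 17:09", "03/01/09 18:00"),
  stringsAsFactors = FALSE
)

# Same parsing as above; tz = "UTC" is an assumption for reproducibility.
fmt <- "%m/%d/%y %H:%M"
toy$oldest_stamp <- as.POSIXct(strptime(toy$visit_date_oldest, format = fmt, tz = "UTC"))
toy$newest_stamp <- as.POSIXct(strptime(toy$visit_date_newest, format = fmt, tz = "UTC"))

# Span of recorded history in days (a difftime vector).
toy$time_delta <- difftime(toy$newest_stamp, toy$oldest_stamp, units = "days")
print(toy[, c("oldest_stamp", "newest_stamp", "time_delta")])
```

A row whose oldest and newest visits fall on the same day gets a fractional `time_delta` (0.25 days for the second toy row), which matters when it is later used as a divisor.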
Tags & Bookmark Metrics
places$bookmark_tagged_pct = (places$bookmark_cnt - places$bookmark_nontag_cnt) / places$bookmark_cnt
places$folder_cnt_crrctd = places$folder_cnt - places$bookmark_cnt
places$user_of_tags <- ifelse(places$tag_cnt > 0, "1", "0")
Other Derived Values
places$percent_visits_new = places$places_visited_unique_cnt / places$moz_historyvisits_cnt
places$pages_per_day = places$moz_historyvisits_cnt / as.numeric(places$time_delta)
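Both derived values are plain ratios, so a profile with no recorded visits or with a zero-length history yields `NaN` or `Inf`. One possible way to handle this, sketched below, is to map those cases to `NA` so they drop out of summaries. The `safe_ratio` helper and the `toy` data frame are hypothetical and not part of the original analysis; in the toy data `time_delta` is already numeric, whereas the real column is a difftime and needs `as.numeric()` first.

```r
# Toy stand-in for the places data frame; all values are made up.
toy <- data.frame(
  places_visited_unique_cnt = c(40, 10, 0),
  moz_historyvisits_cnt     = c(100, 10, 0),
  time_delta                = c(20, 0, 0)   # days, already numeric here
)

# Hypothetical helper: divide, but return NA where the denominator is not positive.
safe_ratio <- function(num, den) ifelse(den > 0, num / den, NA_real_)

# Note: despite its name, percent_visits_new is a fraction between 0 and 1.
toy$percent_visits_new <- safe_ratio(toy$places_visited_unique_cnt, toy$moz_historyvisits_cnt)
toy$pages_per_day      <- safe_ratio(toy$moz_historyvisits_cnt, toy$time_delta)

print(toy)
# Summaries can then skip the undefined rows explicitly.
print(mean(toy$pages_per_day, na.rm = TRUE))
```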
Subsets
taggers <- places[places$tag_cnt > 0,]
livemarkers <- places[places$livemark_container_cnt > 0,]
bookmarkers <- places[places$bookmark_cnt > 20,]
Improvements to Data Collection
- Add visit_type summation to capture usage of bookmarks and overall visitation patterns (link click, bookmark, etc.)
  - select count(*) as N, visit_type from moz_historyvisits group by visit_type
- Compute distribution metrics on session length to inform the design of trail-style history visualizations
  - Sums of squares, cubes, and fourth powers (together with the count and the plain sum) allow variance, skewness, and kurtosis to be derived, which characterizes a distribution better than min, max, mean, and median alone
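The power-sum idea can be sketched as follows. Given only the count and the sums of the first four powers of session length, the standard raw-to-central moment identities recover the mean, variance, skewness, and kurtosis without access to the individual sessions. The session lengths below are made up, and `moments_from_power_sums` is a hypothetical helper name; the variance is the population form (divided by n).

```r
# Hypothetical session lengths in minutes; not real data.
x <- c(2, 4, 4, 5, 10)

# What a client would report: the count and four power sums.
n  <- length(x)
s1 <- sum(x); s2 <- sum(x^2); s3 <- sum(x^3); s4 <- sum(x^4)

# Hypothetical helper: derive distribution shape from the power sums alone.
moments_from_power_sums <- function(n, s1, s2, s3, s4) {
  m  <- s1 / n                  # mean
  r2 <- s2 / n; r3 <- s3 / n; r4 <- s4 / n   # raw moments
  mu2 <- r2 - m^2                               # 2nd central moment
  mu3 <- r3 - 3 * m * r2 + 2 * m^3              # 3rd central moment
  mu4 <- r4 - 4 * m * r3 + 6 * m^2 * r2 - 3 * m^4  # 4th central moment
  list(mean = m, variance = mu2,
       skewness = mu3 / mu2^1.5, kurtosis = mu4 / mu2^2)
}

stats <- moments_from_power_sums(n, s1, s2, s3, s4)
print(stats)
```

Because the count and the power sums are additive, values reported by different profiles can simply be summed before applying the helper, which is not possible with per-profile medians.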
Problems with Places
The following problems in the database schema, the APIs, or Places in general stand in the way of interesting experimentation, analysis, or user experience work:
- Tab spawns... (fill me out!)
Related Research
- Hartmut Obendorf, Harald Weinreich, Eelco Herder, Matthias Mayer. Web Page Revisitation Revisited: Implications of a Long-term Click-stream Study of Browser Usage. In: CHI 2007
  - http://vsis-www.informatik.uni-hamburg.de/publications/view.php/280
  - Classifies use of history into 3 tasks: short-term revisits (backtrack or undo), medium-term revisits (re-utilize or observe), and long-term revisits (rediscover)
  - 70% of revisits occur within 1 hour
  - Bookmarks trigger less than 12% of URL visits