Mobile/Metrics/Redash

This page explains how to use Mozilla's Re:dash query and visualization system.

Overview

Re:dash is a data collaboration and visualization platform. The system can connect to a variety of datasources. Mozilla already connects several datasources (Presto, Crash-DB and Sync) to our instance. Telemetry data (UT and UI) are extracted via Spark, stored as Parquet files and exposed to Presto DB into Re:dash. If you want to query and analyze Mobile event data, make sure you are using "Presto" (the default) when making an SQL query.

Presto is a distributed SQL query engine designed to query large data sets. Browse the documentation to learn more about the SQL syntax in Presto.

Data Tables / Schemas

If you want to explore the exposed data tables in Re:dash, use this query:

show tables

If you want to examine the structure of a specific table, use this query:

describe <table-name>

Mobile telemetry data is split into two tables: android_clients_v1 and android_events_v1.

The android_clients_v1 table holds data about common, mostly stable, information about individual Firefox installs. Each install has a unique clientid UUID.

clientid	varchar		unique identifier
profiledate	timestamp	when the installation was created, based on the profile
submissiondate	timestamp	when the telemetry record was received by the server
creationdate	timestamp	when the telemetry record was generated by the client
appversion	varchar		version of Firefox
osversion	bigint		version of Android SDK
locale		varchar		locale code
defaultsearch	varchar		current default search engine name
device		varchar		device identifier string
arch		varchar		CPU architecture
channel		varchar		channel name (parition)
submission	varchar		submission as a string 'YYYYMMDD' (partition)

Some of these fields can change during the lifetime of a Firefox installation. Some can't. A record is created for each telemetry ping, so it is possible to track the history of a client over time. You can see the default search engine change or the OS get updated, for a given client.

The android_events_v1 table holds event data (UI Telemetry). Each event is a record.

clientid	varchar		unique identifier
submissiondate	timestamp	when the telemetry record was received by the server
ts		bigint		relative timestamp associated with the event
action		varchar		name of the event action
method		varchar		name of the event method
extras		varchar		extra contextual hint associated with the event
sessions	varchar		sessions associated with the event (JSON array)
experiments	varchar		a/b experiments associated with the event (JSON array)
channel		varchar		channel name (parition)
submission	varchar		submission as a string 'YYYYMMDD' (partition)

Useful Queries

This guide doesn't try to teach SQL, but assumes you have a basic understanding.

Looking at Clients

Find the profile creation date for clients:

select distinct
  profiledate,
  clientid
from android_clients_v1
where
  profiledate is not null

Looking at Events

You want to look at basic UI Telemetry over the last 7 days, like the UI Telemetry dashboard:

select
  submissiondate as date,
  channel,
  action,
  method,
  extras,
  count(*) as count
from android_events_v1
where
  submissiondate > current_date - interval '7' day
group by 1, 2, 3, 4, 5

Same, but only those events that happened during a firstrun session:

select
  submissiondate as date,
  channel,
  action,
  method,
  extras,
  count(*) as count
from android_events_v1
where
  submissiondate > current_date - interval '7' day and sessions like '%firstrun%'
group by 1, 2, 3, 4, 5

Let's breakdown loadurl.1 events into some subgroups over the last 7 days:

select
  submissiondate as date,
  count(action) as allloads,
  sum(case when extras = 'bookmarks' then 1 else 0 end) as bookmarks,
  sum(case when extras = 'reading_list' then 1 else 0 end) as readinglist,
  sum(case when extras = 'top_sites' then 1 else 0 end) as topsites,
  sum(case when extras = 'history' then 1 else 0 end) as history,
  sum(case when extras = 'frecency' then 1 else 0 end) as frecency,
  sum(case when method = 'actionbar' and extras = 'user' then 1 else 0 end) as user_typed,
  count(distinct clientid) as users
from
  android_events_v1
where
  submissiondate > current_date - interval '7' day and action = 'loadurl.1'
group by 1

Resources

Turns out that many other companies use event data and SQL, so we can benefit from what they have already done. Here are some blogs and posts that might help you build some SQL queries.

  • Periscope Data has a blog with tons of posts on using SQL and event data for analyzing retention, churn and active users. They also cover some general SQL topics too.
  • Mode Analytics covers using SQL on event data, but has a lot of general analysis posts too.
  • Segment doesn't cover as much SQL as others, but does have posts on the process of data collection and analysis. Worth taking a look.