This page explains how to use Mozilla's Re:dash query and visualization system.
Overview
Re:dash is a data collaboration and visualization platform. The system can connect to a variety of datasources. Mozilla already connects several datasources (Presto, Crash-DB and Sync) to our instance. Telemetry data (UT and UI) are extracted via Spark, stored as Parquet files and exposed to Presto DB into Re:dash. If you want to query and analyze Mobile event data, make sure you are using "Presto" (the default) when making an SQL query.
Presto is a distributed SQL query engine designed to query large data sets. Browse the documentation to learn more about the SQL syntax in Presto.
Data Tables / Schemas
If you want to explore the exposed data tables in Re:dash, use this query:
show tables
If you want to examine the structure of a specific table, use this query:
describe <table-name>
Mobile telemetry data is split into two tables: android_clients_v1
and android_events_v1
.
The android_clients_v1
table holds data about common, mostly stable, information about individual Firefox installs. Each install has a unique clientid
UUID.
clientid varchar unique identifier profiledate timestamp when the installation was created, based on the profile submissiondate timestamp when the telemetry record was received by the server creationdate timestamp when the telemetry record was generated by the client appversion varchar version of Firefox osversion bigint version of Android SDK locale varchar locale code defaultsearch varchar current default search engine name device varchar device identifier string arch varchar CPU architecture channel varchar channel name (parition) submission varchar submission as a string 'YYYYMMDD' (partition)
Some of these fields can change during the lifetime of a Firefox installation. Some can't. A record is created for each telemetry ping, so it is possible to track the history of a client over time. You can see the default search engine change or the OS get updated, for a given client.
The android_events_v1
table holds event data (UI Telemetry). Each event is a record.
clientid varchar unique identifier submissiondate timestamp when the telemetry record was received by the server ts bigint relative timestamp associated with the event action varchar name of the event action method varchar name of the event method extras varchar extra contextual hint associated with the event sessions varchar sessions associated with the event (JSON array) experiments varchar a/b experiments associated with the event (JSON array) channel varchar channel name (parition) submission varchar submission as a string 'YYYYMMDD' (partition)
Useful Queries
This guide doesn't try to teach SQL, but assumes you have a basic understanding.
Looking at Clients
Find the profile creation date for clients:
select distinct profiledate, clientid from android_clients_v1 where profiledate is not null
Looking at Events
You want to look at basic UI Telemetry over the last 7 days, like the UI Telemetry dashboard:
select submissiondate as date, channel, action, method, extras, count(*) as count from android_events_v1 where submissiondate > current_date - interval '7' day group by 1, 2, 3, 4, 5
Same, but only those events that happened during a firstrun
session:
select submissiondate as date, channel, action, method, extras, count(*) as count from android_events_v1 where submissiondate > current_date - interval '7' day and sessions like '%firstrun%' group by 1, 2, 3, 4, 5
Let's breakdown loadurl.1
events into some subgroups over the last 7 days:
select submissiondate as date, count(action) as allloads, sum(case when extras = 'bookmarks' then 1 else 0 end) as bookmarks, sum(case when extras = 'reading_list' then 1 else 0 end) as readinglist, sum(case when extras = 'top_sites' then 1 else 0 end) as topsites, sum(case when extras = 'history' then 1 else 0 end) as history, sum(case when extras = 'frecency' then 1 else 0 end) as frecency, sum(case when method = 'actionbar' and extras = 'user' then 1 else 0 end) as user_typed, count(distinct clientid) as users from android_events_v1 where submissiondate > current_date - interval '7' day and action = 'loadurl.1' group by 1
Resources
Turns out that many other companies use event data and SQL, so we can benefit from what they have already done. Here are some blogs and posts that might help you build some SQL queries.
- Periscope Data has a blog with tons of posts on using SQL and event data for analyzing retention, churn and active users. They also cover some general SQL topics too.
- Mode Analytics covers using SQL on event data, but has a lot of general analysis posts too.
- Segment doesn't cover as much SQL as others, but does have posts on the process of data collection and analysis. Worth taking a look.