Telemetry/Custom analysis with spark



The notebook is set up to work with Spark. See the "Using Spark" section below for more information.
=== Setting Up a Dashboard ===
Scheduled Spark jobs re-run a Jupyter notebook on a regular schedule, which makes for a simple, easy-to-share dashboard.
To schedule a Spark job:
# Visit the analysis provisioning dashboard at telemetry-dash.mozilla.org and sign in using Persona with an @mozilla.com email address.
# Click “Schedule a Spark Job”.
# Enter some details:
## The “Job Name” field should be a short descriptive name, like “chromehangs analysis”.
## Upload your IPython notebook containing the analysis.
## Set the number of workers of the cluster in the “Cluster Size” field.
## Set a schedule frequency using the remaining fields.
Now, the notebook will be updated automatically, and the results can be easily shared.
For reference, see [https://robertovitillo.com/2015/03/13/simple-dashboards-with-scheduled-spark-jobs-and-plotly Simple Dashboard with Scheduled Spark Jobs and Plotly].


== Using Spark ==
Spark is a general-purpose cluster computing system that allows users to run general execution graphs. APIs are available in Python, Scala, and Java; the Jupyter notebook uses the Python API. In a nutshell, Spark provides a way to run functional code (e.g. map, reduce) on large, distributed data.
Check out [https://robertovitillo.com/2015/06/30/spark-best-practices/ Spark Best Practices] for tips on using Spark to its full capabilities.
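As a minimal sketch of this functional style (the data and names here are illustrative, not from the Telemetry datasets; the notebooks provide a preconfigured <code>sc</code>, so the explicit context creation below is only needed when running standalone):

<pre>
from operator import add
from pyspark import SparkContext

# The Telemetry notebooks already provide `sc`; standalone, a local
# context can be created like this.
sc = SparkContext("local[*]", "functional-example")

# Distribute a local collection as an RDD, then chain functional
# transformations: square each element, keep the even squares, and
# reduce the partition results to a single sum.
rdd = sc.parallelize(range(10))
total = (rdd.map(lambda x: x * x)
            .filter(lambda x: x % 2 == 0)
            .reduce(add))

print(total)  # 0 + 4 + 16 + 36 + 64 = 120
sc.stop()
</pre>

Transformations like <code>map</code> and <code>filter</code> are lazy; nothing runs on the cluster until an action such as <code>reduce</code> is called.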


=== SparkContext (sc) ===
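The notebook exposes a ready-made SparkContext as <code>sc</code>, the entry point for creating RDDs and inspecting the cluster. A brief sketch of common <code>sc</code> operations (values here are illustrative; in the notebook, skip the context creation and use the provided <code>sc</code> directly):

<pre>
from pyspark import SparkContext

# Normally provided by the notebook; created here for a standalone run.
sc = SparkContext("local[*]", "sc-example")

# Turn a local list into an RDD and pull results back to the driver.
rdd = sc.parallelize([1, 2, 3, 4])
print(rdd.count())    # 4
print(rdd.collect())  # [1, 2, 3, 4]

# How many tasks Spark will run in parallel by default.
print(sc.defaultParallelism)

sc.stop()
</pre>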