The notebook is set up to work with Spark. See the "Using Spark" section for more information.
=== Setting Up a Dashboard ===
Scheduled Spark jobs allow a Jupyter notebook to be re-run and updated on a regular schedule, which makes for a simple, easy-to-use dashboard.
To schedule a Spark job:
# Visit the analysis provisioning dashboard at telemetry-dash.mozilla.org and sign in using Persona with an @mozilla.com email address.
# Click “Schedule a Spark Job”.
# Enter some details:
## The “Job Name” field should be a short descriptive name, like “chromehangs analysis”.
## Upload your IPython notebook containing the analysis.
## Set the number of workers for the cluster in the “Cluster Size” field.
## Set a schedule frequency using the remaining fields.
Now, the notebook will be updated automatically, and the results can be easily shared.
For reference, see [https://robertovitillo.com/2015/03/13/simple-dashboards-with-scheduled-spark-jobs-and-plotly Simple Dashboards with Scheduled Spark Jobs and Plotly].
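As an illustration, a cell in such a scheduled notebook might look like the following minimal sketch. It assumes the pre-created SparkContext <code>sc</code> (see the "Using Spark" section); the input data is hypothetical, and matplotlib is used here purely as an illustrative stand-in for the plotting step (the blog post above uses Plotly).

<source lang="python">
# Minimal sketch of a notebook cell a scheduled job might re-run.
# Assumes the pre-created SparkContext `sc`; the (date, count) data
# below is a hypothetical placeholder for a real data source.
import matplotlib.pyplot as plt

data = sc.parallelize([("2015-03-01", 120),
                       ("2015-03-02", 95),
                       ("2015-03-03", 143)])

# Aggregate counts per day across the cluster, then bring the
# small result set back to the driver for plotting.
daily = data.reduceByKey(lambda a, b: a + b).sortByKey().collect()

dates, counts = zip(*daily)
plt.plot(range(len(dates)), counts)
plt.xticks(range(len(dates)), dates, rotation=45)
plt.ylabel("count")
plt.title("Daily counts")
plt.show()
</source>

Each time the scheduled job runs, the cell re-executes against fresh data and the rendered notebook serves as the updated dashboard.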
== Using Spark ==
Spark is a general-purpose cluster computing system - it allows users to run general execution graphs. APIs are available in Python, Scala, and Java. The Jupyter notebook utilizes the Python API. In a nutshell, it provides a way to run functional code (e.g. map, reduce, etc.) on large, distributed data.
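To make that concrete, here is a minimal sketch of the functional style, assuming the pre-created SparkContext <code>sc</code> described below; the numeric data is purely illustrative.

<source lang="python">
# Minimal sketch of Spark's functional style, assuming the
# notebook's pre-created SparkContext `sc`.
rdd = sc.parallelize(range(1, 1001))  # distribute 1..1000 across workers

# map: square each element; filter: keep the even squares;
# reduce: sum them into a single result on the driver.
total = (rdd.map(lambda x: x * x)
            .filter(lambda x: x % 2 == 0)
            .reduce(lambda a, b: a + b))

print(total)
</source>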
Check out [https://robertovitillo.com/2015/06/30/spark-best-practices/ Spark Best Practices] for tips on using Spark to its full capabilities.
=== SparkContext (sc) ===