Telemetry/Custom analysis with spark



The notebook is set up to work with Spark. See the "Using Spark" section below for more information.
=== Setting Up a Dashboard ===
Scheduled Spark jobs re-run a Jupyter notebook on a regular schedule, which makes for a simple, easy-to-share dashboard.
To schedule a Spark job:
# Visit the analysis provisioning dashboard at telemetry-dash.mozilla.org and sign in using Persona with an @mozilla.com email address.
# Click “Schedule a Spark Job”.
# Enter some details:
## The “Job Name” field should be a short descriptive name, like “chromehangs analysis”.
## Upload your IPython notebook containing the analysis.
## Set the number of workers of the cluster in the “Cluster Size” field.
## Set a schedule frequency using the remaining fields.
Now, the notebook will be updated automatically, and the results can be easily shared.
For reference, see [https://robertovitillo.com/2015/03/13/simple-dashboards-with-scheduled-spark-jobs-and-plotly Simple Dashboard with Scheduled Spark Jobs and Plotly].


== Using Spark ==
Spark is a general-purpose cluster computing system that allows users to run general execution graphs. APIs are available in Python, Scala, and Java; the Jupyter notebook uses the Python API. In a nutshell, Spark provides a way to run functional code (e.g. map, reduce) on large, distributed data.
Check out [https://robertovitillo.com/2015/06/30/spark-best-practices/ Spark Best Practices] for tips on using Spark to its full capabilities.
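As a minimal sketch of this functional style (the data and names here are illustrative, not from the Telemetry datasets; the notebooks provide a preconfigured <code>sc</code>, so the explicit context creation below is only needed when running standalone):

<pre>
from operator import add
from pyspark import SparkContext

# The Telemetry notebooks already provide `sc`; standalone, a local
# context can be created like this.
sc = SparkContext("local[*]", "functional-example")

# Distribute a local collection as an RDD, then chain functional
# transformations: square each element, keep the even squares, and
# reduce the partition results to a single sum.
rdd = sc.parallelize(range(10))
total = (rdd.map(lambda x: x * x)
            .filter(lambda x: x % 2 == 0)
            .reduce(add))

print(total)  # 0 + 4 + 16 + 36 + 64 = 120
sc.stop()
</pre>

Transformations like <code>map</code> and <code>filter</code> are lazy; nothing runs on the cluster until an action such as <code>reduce</code> is called.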


=== SparkContext (sc) ===
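The notebook exposes a ready-made SparkContext as <code>sc</code>, the entry point for creating RDDs and inspecting the cluster. A brief sketch of common <code>sc</code> operations (values here are illustrative; in the notebook, skip the context creation and use the provided <code>sc</code> directly):

<pre>
from pyspark import SparkContext

# Normally provided by the notebook; created here for a standalone run.
sc = SparkContext("local[*]", "sc-example")

# Turn a local list into an RDD and pull results back to the driver.
rdd = sc.parallelize([1, 2, 3, 4])
print(rdd.count())    # 4
print(rdd.collect())  # [1, 2, 3, 4]

# How many tasks Spark will run in parallel by default.
print(sc.defaultParallelism)

sc.stop()
</pre>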