Telemetry/Custom analysis with spark: Difference between revisions

(Created page with "This page is currently in-progress. == Introduction == Spark is a data processing engine designed to be fast and easy to use. We have setup Jupyter workbooks that use Spark t...")
 
Line 17: Line 17:
# A cluster will be launched on AWS preconfigured with Spark, IPython and some handy data analysis libraries like pandas and matplotlib.


Once the cluster is ready, you can tunnel IPython through SSH by following the instructions on the dashboard, for example by running:
ssh -i my-private-key -L 8888:localhost:8888 hadoop@ec2-54-70-129-221.us-west-2.compute.amazonaws.com
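To make the parts of the tunnel command explicit, here is a minimal Python sketch that assembles the same ssh invocation from its pieces. The helper name build_tunnel_command and its parameter names are hypothetical and are not part of the dashboard or of any Telemetry tooling. The key file name, user and host are the example values from the command above, so substitute the ones shown on your own dashboard.

```python
import shlex


def build_tunnel_command(key_path, host, user="hadoop",
                         local_port=8888, remote_port=8888):
    """Return the ssh argument list for forwarding a local port to the cluster.

    Hypothetical helper for illustration only.
    """
    return [
        "ssh",
        "-i", key_path,  # private key used to authenticate to the cluster
        # -L forwards local_port on this machine to remote_port on the remote host
        "-L", f"{local_port}:localhost:{remote_port}",
        f"{user}@{host}",
    ]


# Example values copied from the command above; yours will differ.
cmd = build_tunnel_command(
    "my-private-key",
    "ec2-54-70-129-221.us-west-2.compute.amazonaws.com",
)
print(shlex.join(cmd))  # shlex.join requires Python 3.8 or later
```

The list form can be passed directly to subprocess.run if you prefer to start the tunnel from a script instead of a terminal.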


Finally, you can launch IPython in Firefox by visiting http://localhost:8888.
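If the page does not load, it can help to check whether anything is listening on the forwarded port before opening the browser. The following is a small sketch using only the Python standard library. The function name port_is_open is hypothetical, and port 8888 is assumed from the tunnel command above.

```python
import socket


def port_is_open(host="localhost", port=8888, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds, else False."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or the host could not be resolved.
        return False


if __name__ == "__main__":
    if port_is_open():
        print("Tunnel is up: open http://localhost:8888 in your browser.")
    else:
        print("Nothing is listening on localhost:8888; is the ssh tunnel running?")
```

A successful connection only shows that the local end of the tunnel is accepting connections. It does not confirm that IPython is running on the cluster.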