Telemetry/Custom analysis with spark: Difference between revisions

(→‎How do I load an external library into the cluster?: Update external library loading instructions to include ipython context + alternate egg downloading method)
Line 182:
Assuming you've got a url for the repo, you can create an egg for it this way:


  import sys
  import os
  # Clone the repo and build an egg from it (IPython shell escape).
  !git clone <repo url> && cd <repo-name> && python setup.py bdist_egg
  # Ship the egg to the workers, then make it importable on the driver.
  sc.addPyFile('<repo-name>/dist/my-egg-file.egg')
  sys.path.append(os.path.join(os.getcwd(), '<repo-name>/dist/my-egg-file.egg'))
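
The name of the egg that <code>bdist_egg</code> writes depends on the package name, its version, and the Python version, so <code>my-egg-file.egg</code> above is only a placeholder. A minimal sketch of looking the file up instead of hard-coding it, assuming the build wrote its output to the usual <code>dist</code> directory (the helper name <code>find_built_egg</code> is hypothetical, not part of Spark or setuptools):

```python
import glob
import os


def find_built_egg(repo_dir):
    """Return the path of the newest .egg under <repo_dir>/dist, or None.

    Hypothetical helper: assumes `python setup.py bdist_egg` was run in
    repo_dir and left its output in the default `dist` directory.
    """
    pattern = os.path.join(repo_dir, 'dist', '*.egg')
    eggs = glob.glob(pattern)
    if not eggs:
        return None
    # If several builds exist, pick the most recently modified one.
    return max(eggs, key=os.path.getmtime)
```

In a notebook this could be used as <code>egg = find_built_egg('<repo-name>')</code>, followed by <code>sc.addPyFile(egg)</code> and <code>sys.path.append(egg)</code> once <code>egg</code> is not <code>None</code>.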


Alternatively, you can build the egg locally, upload it to a web server, then download and install it:


  import os
  import sys
  import requests
  # Download the egg from the web server and save it locally.
  r = requests.get('<url-to-my-egg-file>')
  r.raise_for_status()  # stop here if the download failed
  with open('mylibrary.egg', 'wb') as f:
      f.write(r.content)
  # Ship the egg to the workers, then make it importable on the driver.
  sc.addPyFile('mylibrary.egg')
  sys.path.append(os.path.join(os.getcwd(), 'mylibrary.egg'))


Do this '''before''' you import the library. If the library has already been imported, restart the kernel in the IPython notebook and run these steps first.
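
Both approaches end with the same two steps: <code>sc.addPyFile</code> to distribute the egg to the workers, and <code>sys.path.append</code> to make it importable in the notebook itself. A minimal sketch that wraps the two steps so they can be re-run safely in the same kernel; the function name <code>register_egg</code> is hypothetical, and <code>sc</code> is assumed to be the SparkContext already available in the notebook:

```python
import os
import sys


def register_egg(sc, egg_path):
    """Ship an egg to the workers and make it importable on the driver.

    Hypothetical helper: `sc` is assumed to be an existing SparkContext
    (any object with an addPyFile method works for this sketch).
    """
    abs_path = os.path.abspath(egg_path)
    if not os.path.isfile(abs_path):
        raise IOError('egg not found: %s' % abs_path)
    # Distribute the egg to the worker nodes.
    sc.addPyFile(abs_path)
    # Make the egg importable in the driver (notebook) process too,
    # without adding a duplicate entry when the cell is re-run.
    if abs_path not in sys.path:
        sys.path.append(abs_path)
    return abs_path
```

For example, <code>register_egg(sc, 'mylibrary.egg')</code> would replace the last two lines of either snippet above. It still has to run before the library is imported.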