8/20/2019 Where Does Pip Install Pyspark
The solution for this is to use pip to install ipython. But by default, pip is not installed in the Cloudera VM, and pip cannot be installed with easy_install either. After some searching, I noticed that the Cloudera VM has yum installed, and yum can install pip.
Sagemaker - Pyspark kernel & matplotlib
Hi there,
I don't think this is strictly an AWS question/issue, however was wondering if someone perhaps knows this stuff better than me, or can point me in the right direction.
I've got Jupyter on Sagemaker connecting to my EMR Spark cluster, and it works great. However, now I want to get more fancy with my Notebooks.
If you choose the 'conda' kernels on Jupyter, matplotlib for example is installed already and you can just start creating plots in the notebook. This doesn't seem to be the case for pyspark - it can't import matplotlib by itself. I'm assuming it's the different Python environments.
I've tried adding 'pip install matplotlib' to my EMR bootstrap action, but I still cannot import matplotlib in my notebooks. Is this supposed to work, or am I missing a step?
Does anyone know if it's even possible to use a pyspark kernel in Jupyter?
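One way to check the "different Python environments" guess is to ask the kernel itself which interpreter it runs and whether that interpreter can see matplotlib. The sketch below uses only the standard library; `describe_environment` is a hypothetical helper name, not part of Jupyter, Sagemaker or Spark. With a kernel that forwards code to a remote cluster, the answer describes wherever the cell actually executes, which may not be the notebook instance.

```python
import importlib.util
import sys


def describe_environment(module_name="matplotlib"):
    """Report which interpreter is running and whether it can find a module."""
    # find_spec locates a top-level module without importing it;
    # it returns None when the module is not on this interpreter's path.
    spec = importlib.util.find_spec(module_name)
    return {
        "interpreter": sys.executable,
        "module": module_name,
        "importable": spec is not None,
        "location": getattr(spec, "origin", None),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If the reported interpreter is not the one that pip installed into, installing with that exact interpreter (`<interpreter> -m pip install matplotlib`) is the usual way to make the two agree.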
Get prebuilt spark
Test prebuilt spark (this should open a spark console, use Ctrl+C to exit)
Get virtualenv: We assume your python is installed under your home dir, so no sudo is needed.
If you want to install python under your home dir, get the tarball from here and use ./configure --prefix=any/dir/of/your/choice/where/you/have/write/access. Then, you need to run make install and add python's bin to the $PATH environment variable.
To install virtualenv
Start new virtualenv
Get necessary scientific python packages
Edit bashrc or spark-2.1.0-bin-hadoop2.7/conf/spark-env.sh
Paste the following in spark-2.1.0-bin-hadoop2.7/conf/spark-env.sh (this file doesn't originally exist, you have to create it)
Start a jupyter notebook with pyspark (edit the number of slave processes [4] appropriately)
If you executed all of the above on a remote machine from a local linux box via ssh:
You can open an ssh tunnel as follows. This way, you can open the jupyter notebook in your local browser instead of having to use the browser on the remote machine via ssh -X. With the following tunnel, you need to open your local browser at http://localhost:8889 and enter the token printed in your terminal in the previous step.
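A tunnel of this kind is usually an ssh local port forward. The sketch below only builds and prints such a command. Local port 8889 comes from the text above; remote port 8888 is an assumption (it is Jupyter's usual default), and `user@remote-host` and the helper name `tunnel_command` are placeholders for illustration.

```python
import shlex


def tunnel_command(remote_host, local_port=8889, remote_port=8888):
    """Build an ssh local-port-forwarding command as an argument list."""
    # -N: do not run a remote command, only forward ports.
    # -L: connections to local_port on this machine are forwarded to
    #     localhost:remote_port as seen from the remote machine.
    return [
        "ssh",
        "-N",
        "-L",
        f"{local_port}:localhost:{remote_port}",
        remote_host,
    ]


if __name__ == "__main__":
    # "user@remote-host" is a placeholder; substitute your own login and host.
    print(shlex.join(tunnel_command("user@remote-host")))
```

If the notebook on the remote machine reports a different port in its startup message, pass that value as `remote_port`.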
(Above gist has been successfully tested with Ubuntu 14.04 LTS on Intel Xeon E5-2620 and Intel Celeron N3160)
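For the step "Start a jupyter notebook with pyspark", a common approach with Spark 2.x is to point PySpark's driver at Jupyter through environment variables and start `bin/pyspark` with a local master. The sketch below assembles that command and environment without running anything. It is an assumption about a typical setup, not necessarily what the steps above pasted into spark-env.sh. The helper name and the example paths are hypothetical.

```python
import os


def jupyter_pyspark_launch(spark_home, venv_python, workers=4):
    """Return (command, environment) for starting PySpark with a notebook driver."""
    env = dict(os.environ)
    # Interpreter used by the Spark workers (here: the virtualenv's python).
    env["PYSPARK_PYTHON"] = venv_python
    # Run the driver through Jupyter instead of the plain Python shell.
    env["PYSPARK_DRIVER_PYTHON"] = "jupyter"
    env["PYSPARK_DRIVER_PYTHON_OPTS"] = "notebook --no-browser"
    # local[N] runs Spark locally with N worker threads.
    command = [
        os.path.join(spark_home, "bin", "pyspark"),
        "--master",
        f"local[{workers}]",
    ]
    return command, env


if __name__ == "__main__":
    # Both paths are examples only; adjust them to your own layout.
    cmd, env = jupyter_pyspark_launch(
        "spark-2.1.0-bin-hadoop2.7", os.path.expanduser("~/venv/bin/python")
    )
    print(" ".join(cmd))
    for name in ("PYSPARK_PYTHON", "PYSPARK_DRIVER_PYTHON", "PYSPARK_DRIVER_PYTHON_OPTS"):
        print(f"{name}={env[name]}")
```

Passing the returned pair to `subprocess.run(cmd, env=env)` would start the notebook. The same three variables can instead be exported from spark-env.sh or bashrc.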