PySpark with Spark 2.1.0 - Python cannot be version 3.6.


To get started using Spark with Python, you can:

1. Install Anaconda Python, which has all the goodies you need.

2. Download Spark and unzip it into a folder.

3. After you have these set up, issue the following commands to create and activate a Python 3.5 environment (Spark 2.1.0 does not work with Python 3.6):

conda create -n py35 python=3.5 anaconda

activate py35

4. Go to your Spark installation folder, go into "bin", and run "pyspark".

5. You are probably going to get some exceptions (see the note below if they mention the Python version), but you should still be able to run the following script:


from pyspark import SparkContext

# Reuse the context the pyspark shell already created (or create a new one).
sc = SparkContext.getOrCreate()

# Load a local text file as an RDD of lines and count them.
tf = sc.textFile("j:\\tmp\\data.txt")
tf.count()
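If pyspark starts but the exceptions mention the Python version, the shell may have picked up a different interpreter than the py35 environment. A quick check from the pyspark prompt (plain Python, nothing Spark-specific):

import sys

# Spark 2.1.0 breaks on Python 3.6, so this should report 3.5.x.
print(sys.version)

If it reports 3.6, setting the PYSPARK_PYTHON environment variable to the python.exe inside the py35 environment (the exact path depends on where Anaconda is installed) before running pyspark should fix it.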

Please make sure the path to "data.txt" points at a file that actually exists on your machine.
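Once count() comes back, the rest of the RDD API works from the same object. As a quick illustration (not part of the setup, just to confirm transformations run too), here is a minimal word count over the same file, continuing from the snippet above:

from operator import add

# Split each line into words, pair each word with 1, and sum the counts per word.
words = tf.flatMap(lambda line: line.split())
counts = words.map(lambda w: (w, 1)).reduceByKey(add)
counts.take(5)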

This setup looks easier than it is; I spent a lot of time today getting it up and running.
