PySpark with Spark 2.1.0 - Python cannot be version 3.6.


To get started using Spark with Python, you can:

1. Install Anaconda Python, which has all the goodies you need.

2. Download Spark and unzip it into a folder.

3. After you have these set up, issue the following commands to create and activate a Python 3.5 environment (Spark 2.1.0 does not work with Python 3.6):

conda create -n py35 python=3.5 anaconda

activate py35

4. Go to your Spark installation folder, go into "bin", and run "pyspark".

5. You are probably going to get some exceptions (see the note below if they mention the Python version), but you should still be able to run the following script:


from pyspark import SparkContext

# Reuse the context the pyspark shell already created (or create a new one).
sc = SparkContext.getOrCreate()

# Load a local text file as an RDD of lines and count them.
tf = sc.textFile("j:\\tmp\\data.txt")
tf.count()
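If pyspark starts but the exceptions mention the Python version, the shell may have picked up a different interpreter than the py35 environment. A quick check from the pyspark prompt (plain Python, nothing Spark-specific):

import sys

# Spark 2.1.0 breaks on Python 3.6, so this should report 3.5.x.
print(sys.version)

If it reports 3.6, setting the PYSPARK_PYTHON environment variable to the python.exe inside the py35 environment (the exact path depends on where Anaconda is installed) before running pyspark should fix it.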

Please make sure the path to "data.txt" points at a file that actually exists on your machine.
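Once count() comes back, the rest of the RDD API works from the same object. As a quick illustration (not part of the setup, just to confirm transformations run too), here is a minimal word count over the same file, continuing from the snippet above:

from operator import add

# Split each line into words, pair each word with 1, and sum the counts per word.
words = tf.flatMap(lambda line: line.split())
counts = words.map(lambda w: (w, 1)).reduceByKey(add)
counts.take(5)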

This setup looks easier than it is; I spent a lot of time today getting it up and running.
