How to connect pyspark from a Spark cluster to a DSE Cassandra cluster remotely

Steven Lacerda
Nov 19, 2019

--

This article is meant to help those connecting from a remote Apache Spark cluster to DSE Cassandra using pyspark and DSE's BYOS (Bring Your Own Spark) support.

To do so, you need the byos.properties file and the dse-byos jar from the DSE Cassandra nodes. First, generate the byos.properties file on one of your analytics nodes:

dse client-tool configuration byos-export ~/byos.properties
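The exported file is a plain Spark properties file that captures your cluster's connection settings. The exact keys vary with your DSE version and security configuration, but a few representative entries (with made-up values) look something like this:

spark.cassandra.connection.host=10.10.1.1,10.10.1.2
spark.cassandra.connection.port=9042
spark.cassandra.connection.ssl.enabled=false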

Now, locate the byos jar file:

$ locate dse-byos
/usr/share/dse/clients/dse-byos_2.11-6.7.10.jar
/usr/share/dse/spark/client-lib/dse-byos_2.11-6.7.10.jar

Either copy of the jar will do. Use scp to copy the jar and the byos.properties file over to the remote Spark node where you will be running pyspark:

scp user@dsenode1.example.com:/usr/share/dse/clients/dse-byos_2.11-6.7.10.jar .
scp user@dsenode1.example.com:~/byos.properties .

From the remote Spark node, start pyspark with the following command:

pyspark --jars dse-byos_2.11-6.7.10.jar --properties-file byos.properties

Of course, you’ll need to adjust the file names to match your DSE version.
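Once the shell is up, you can sanity-check the connection by reading a Cassandra table through the Spark Cassandra Connector data source bundled in the BYOS jar. A minimal sketch, assuming a hypothetical keyspace ks with a table tbl (substitute your own):

# Read a Cassandra table into a Spark DataFrame via the connector data source
df = (spark.read
      .format("org.apache.spark.sql.cassandra")
      .options(keyspace="ks", table="tbl")
      .load())

# Quick sanity check: print the schema and the first few rows
df.printSchema()
df.show(5)

The same --jars and --properties-file flags also work with spark-submit if you'd rather run a script than an interactive shell.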
