Spark Deployment Modes

Below is the normal Spark architecture.

Here we have a separate EC2 instance called the edge node. It has a smaller configuration (fewer resources) than the other nodes in the cluster.

Users do not connect directly to the cluster; instead, they connect to the edge node.

They can log in to the edge node and perform tasks there. Kerberos is used for authentication, and the authenticated identity is what the cluster's authorization checks are applied to.
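A minimal sketch of a non-interactive Kerberos login on the edge node, assuming an MIT Kerberos client and a keytab file. The principal `alice@EXAMPLE.COM` and the keytab path are hypothetical. The function only builds the `kinit -kt <keytab> <principal>` command; actually running it needs a configured Kerberos client.

```python
import shlex
from typing import List


def build_kinit_cmd(principal: str, keytab: str) -> List[str]:
    """Build an MIT Kerberos kinit command that obtains a ticket from a keytab.

    Using a keytab (-kt) avoids a password prompt, which suits scripted
    jobs launched from the edge node.
    """
    return ["kinit", "-kt", keytab, principal]


# Hypothetical principal and keytab path, for illustration only.
cmd = build_kinit_cmd("alice@EXAMPLE.COM", "/etc/security/keytabs/alice.keytab")
print(shlex.join(cmd))
```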

Anything that needs to be submitted to the cluster, such as applications and data, also goes through the edge node.

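The Spark installation, including the bin/spark-submit script, lives on the edge node together with the Hadoop client libraries. YARN itself is not installed here (no YARN daemons run on the edge node); the client libraries are only used to talk to YARN on the cluster.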
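From the edge node, a job is handed to YARN with spark-submit. The sketch below only assembles the command line: `--master yarn`, `--deploy-mode` and `--num-executors` are real spark-submit options, while the application file `etl_job.py` and the executor count are hypothetical. Spark's YARN documentation expects `HADOOP_CONF_DIR` (or `YARN_CONF_DIR`) on the submitting machine to point at the cluster's client-side configuration.

```python
import shlex
from typing import List


def build_spark_submit_cmd(app: str, deploy_mode: str, num_executors: int = 2) -> List[str]:
    """Assemble a spark-submit command targeting a YARN cluster.

    deploy_mode must be "client" (driver on the submitting machine, i.e. the
    edge node) or "cluster" (driver inside the cluster).
    """
    if deploy_mode not in ("client", "cluster"):
        raise ValueError(f"unknown deploy mode: {deploy_mode!r}")
    return [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", deploy_mode,
        "--num-executors", str(num_executors),
        app,
    ]


# Hypothetical application file, for illustration only.
print(shlex.join(build_spark_submit_cmd("etl_job.py", "cluster")))
```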

In client mode, the driver is created on the edge node.

In cluster mode, the driver is created on the cluster.
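The difference between the two modes comes down to where the driver process lives. A small sketch of that mapping for a YARN submission from the edge node; the enum and function names are hypothetical, while the values mirror spark-submit's `--deploy-mode` (also settable via `spark.submit.deployMode`). On YARN, the cluster-mode driver runs inside the ApplicationMaster container.

```python
from enum import Enum


class DeployMode(Enum):
    """Values accepted by spark-submit --deploy-mode."""
    CLIENT = "client"
    CLUSTER = "cluster"


def driver_location(mode: DeployMode) -> str:
    """Describe where the Spark driver runs for a YARN submission from the edge node."""
    if mode is DeployMode.CLIENT:
        # The driver lives in the spark-submit process on the edge node.
        return "edge node (inside the spark-submit process)"
    # On YARN, the driver runs inside the ApplicationMaster container on a worker node.
    return "cluster (inside the YARN ApplicationMaster container)"


for mode in DeployMode:
    print(f"{mode.value}: {driver_location(mode)}")
```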

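Pro:

Con:

If the driver on the local machine (the edge node) shuts down, the executors also go down, because they cannot continue without their driver.
When we submit in client mode, there is network latency: the driver on the edge node and the executors on the cluster communicate in both directions, and this back-and-forth adds a lot of delay.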
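A back-of-envelope model of why two-way traffic hurts, assuming the edge node is farther (higher round-trip time) from the executors than a node inside the cluster. All numbers are hypothetical, and the model treats round trips as sequential, which overstates the cost when requests overlap.

```python
def coordination_delay_ms(round_trips: int, rtt_ms: float) -> float:
    """Total time spent waiting on the network for sequential driver<->executor round trips."""
    return round_trips * rtt_ms


# Hypothetical numbers, for illustration only.
ROUND_TRIPS = 10_000
EDGE_RTT_MS = 2.0        # driver on the edge node, executors in the cluster
IN_CLUSTER_RTT_MS = 0.5  # driver inside the cluster (cluster mode)

client = coordination_delay_ms(ROUND_TRIPS, EDGE_RTT_MS)
cluster = coordination_delay_ms(ROUND_TRIPS, IN_CLUSTER_RTT_MS)
print(f"client mode: {client / 1000:.1f} s, cluster mode: {cluster / 1000:.1f} s")
```

The point is the multiplier: per-message latency is paid on every exchange, so even a small RTT gap grows with the number of driver/executor interactions.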

In cluster mode, we are given an application ID, which we can use to look up the application's details in the Spark UI.
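A sketch of picking the application ID out of spark-submit output and turning it into a status query. It assumes only that the output contains a YARN application ID of the form `application_<clusterTimestamp>_<sequence>`; the sample log line is hypothetical. `yarn application -status <id>` is a real YARN CLI command whose report includes a tracking URL, which leads to the Spark UI while the application runs.

```python
import re
from typing import List, Optional

# YARN application IDs look like application_<clusterTimestamp>_<sequence>.
APP_ID_RE = re.compile(r"application_\d+_\d+")


def extract_app_id(log_text: str) -> Optional[str]:
    """Return the first YARN application ID found in the text, or None."""
    match = APP_ID_RE.search(log_text)
    return match.group(0) if match else None


def yarn_status_cmd(app_id: str) -> List[str]:
    """Build a `yarn application -status` command for the given application ID."""
    return ["yarn", "application", "-status", app_id]


# Hypothetical log line, for illustration only.
sample = "INFO Client: Submitted application application_1700000000000_0042"
app_id = extract_app_id(sample)
if app_id is not None:
    print(app_id)
    print(yarn_status_cmd(app_id))
```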