Spark Deployment Modes
Lecture 28: Deployment Modes in Spark
Below is the typical Spark cluster architecture.
Here we have a separate EC2 instance called the edge node. It has a smaller configuration (less CPU and memory) than the other nodes in the cluster.
Users do not connect directly to the cluster; instead, they connect to the edge node.
They log in to the edge node and perform their tasks from there. Kerberos is used for authentication and authorization.
Any data that needs to be submitted to the cluster also goes through the edge node.
The spark-submit script (in Spark's bin/ directory) lives on the edge node, together with the Hadoop client libraries. YARN itself is not installed there.
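As a rough sketch of what "the edge node has spark-submit and the Hadoop client libraries, but not YARN" means in practice, the Python below checks the two things the Spark-on-YARN documentation asks for on the submitting host: that spark-submit is on the PATH, and that HADOOP_CONF_DIR or YARN_CONF_DIR points at the Hadoop client configuration. The function name edge_node_problems is hypothetical, and it only inspects the local environment; it does not contact the cluster.

```python
import os
import shutil
from typing import Callable, Mapping, Optional


def edge_node_problems(
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return setup problems that would block submitting to YARN from this host.

    Hypothetical helper. It only checks the client side: spark-submit must be
    runnable, and HADOOP_CONF_DIR or YARN_CONF_DIR must point at the Hadoop
    client configuration. YARN daemons are not expected on the edge node.
    """
    problems = []
    if which("spark-submit") is None:
        problems.append("spark-submit not found on PATH")
    if not (env.get("HADOOP_CONF_DIR") or env.get("YARN_CONF_DIR")):
        problems.append("neither HADOOP_CONF_DIR nor YARN_CONF_DIR is set")
    return problems
```

Calling edge_node_problems() with no arguments checks the real environment; the env and which parameters exist only so the check can be exercised without a real edge node.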
client mode deployment
In client mode, the driver is created on the edge node (inside the spark-submit process), while the executors run on the cluster.
cluster mode
In cluster mode, the driver is created on the cluster itself. With YARN, it runs inside the ApplicationMaster container on one of the cluster's nodes.
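At submission time, the two modes differ only in the --deploy-mode flag passed to spark-submit. A minimal Python sketch, assuming a YARN master: build_spark_submit and the file and class names in the examples are hypothetical, while --master, --deploy-mode, --class and --name are standard spark-submit options.

```python
from typing import Optional, Sequence


def build_spark_submit(
    app: str,
    deploy_mode: str,
    app_args: Sequence[str] = (),
    main_class: Optional[str] = None,
    name: Optional[str] = None,
) -> list[str]:
    """Build a spark-submit argument list for a YARN cluster (hypothetical helper).

    deploy_mode="client"  -> driver runs in the spark-submit process on the edge node
    deploy_mode="cluster" -> driver runs in a YARN container on the cluster
    """
    if deploy_mode not in ("client", "cluster"):
        raise ValueError(f"unknown deploy mode: {deploy_mode!r}")
    cmd = ["spark-submit", "--master", "yarn", "--deploy-mode", deploy_mode]
    if main_class is not None:
        cmd += ["--class", main_class]  # entry point, needed for JVM (jar) applications
    if name is not None:
        cmd += ["--name", name]
    cmd.append(app)          # the application file (.py or .jar)
    cmd.extend(app_args)     # arguments passed through to the application
    return cmd
```

On the edge node the list could be run with subprocess.run(build_spark_submit("etl_job.py", "cluster"), check=True). In client mode the driver lives inside that spark-submit process, so stopping the process stops the application.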
pros and cons of client mode
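Pro:
- The driver runs locally, so the user sees the driver's logs and output directly on their own terminal.
Cons:
- If the driver on the local machine shuts down (for example, the user's session ends), the executors also go down and the application fails.
- The driver on the edge node and the executors on the cluster communicate continuously over the network, and this two-way communication adds significant latency.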
In cluster mode, we are given an application ID, and we can use it to look up the application's details in the Spark UI.
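A small Python sketch of using that application ID, assuming the spark-submit console output contains a YARN application ID of the form application_&lt;clusterTimestamp&gt;_&lt;sequence&gt;. It extracts the ID and builds the standard yarn application -status and yarn logs -applicationId commands. The function names are hypothetical, and yarn logs usually only returns container logs when YARN log aggregation (yarn.log-aggregation-enable) is turned on.

```python
import re
from typing import Optional

# YARN application IDs look like application_<clusterTimestamp>_<sequence>.
_APP_ID = re.compile(r"\bapplication_\d+_\d+\b")


def find_application_id(submit_output: str) -> Optional[str]:
    """Return the first YARN application ID in spark-submit output, or None."""
    match = _APP_ID.search(submit_output)
    return match.group(0) if match else None


def yarn_followup_commands(app_id: str) -> dict[str, list[str]]:
    """Standard YARN CLI commands for inspecting a submitted application."""
    return {
        "status": ["yarn", "application", "-status", app_id],
        "logs": ["yarn", "logs", "-applicationId", app_id],
    }
```

The status command also reports the application's tracking URL, which is one way to reach its Spark UI.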