🔹 What is Blob Storage?
Blob = Binary Large Object: any file (text, image, video, Parquet, JSON, etc.). Azure Blob Storage is Microsoft's object storage solution for unstructured data.
It's cheap, scalable, and durable: you can store petabytes of data and pay only for what you use.
🔹 Types of Blobs
Azure Blob Storage supports 3 types:
- Block Blob (most common)
  - Optimized for streaming and storing files.
  - Stores data as blocks, so you can upload in chunks.
  - Used for: documents, CSV, Parquet, images, logs.
- Append Blob
  - Optimized for append operations.
  - Great for logs: you can only add to the end, not modify existing content.
- Page Blob
  - Optimized for random read/write.
  - Used for VM disks (VHD files).
For Data Engineering / Delta Lake, you'll almost always use Block Blobs.
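To make "upload in chunks" concrete, here is a minimal local sketch of the block blob idea: split a payload into blocks, give each block a base64 ID of equal length, then assemble the blob from the committed list of IDs. It does not call Azure; the `staged` dictionary stands in for the service, and the helper names `split_into_blocks` and `commit` and the `block-000000` ID format are assumptions made for illustration.

```python
import base64


def split_into_blocks(data: bytes, block_size: int) -> list[tuple[str, bytes]]:
    """Split data into (block_id, chunk) pairs, in upload order."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    blocks = []
    for index, start in enumerate(range(0, len(data), block_size)):
        # Block IDs are base64 strings; zero-padding the counter keeps
        # every ID the same length within one blob.
        block_id = base64.b64encode(f"block-{index:06d}".encode()).decode()
        blocks.append((block_id, data[start:start + block_size]))
    return blocks


def commit(staged: dict[str, bytes], block_ids: list[str]) -> bytes:
    """Assemble the final blob from staged blocks, in the committed order."""
    return b"".join(staged[block_id] for block_id in block_ids)


payload = b"id,value\n" + b"1,alpha\n" * 5
blocks = split_into_blocks(payload, block_size=16)

# Hypothetical stand-in for the storage service: staged blocks keyed by ID.
staged = {block_id: chunk for block_id, chunk in blocks}
assembled = commit(staged, [block_id for block_id, _ in blocks])

print(len(blocks), assembled == payload)
```

Because the blob only exists once the block list is committed, a failed chunk can be retried on its own without re-sending the whole file.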
🔹 Storage Account + Containers
- A Storage Account = the root of your blob storage.
- Inside it, you create containers: logical groups of blobs.
- Inside containers, you can have folders (if Hierarchical Namespace is enabled = ADLS Gen2).
Example path: `abfss://bronze@mydatalake.dfs.core.windows.net/2025/08/data.csv`
Breakdown:
- `abfss` → protocol for ADLS Gen2 secure access.
- `bronze` → container.
- `mydatalake` → storage account.
- `dfs.core.windows.net` → ADLS Gen2 endpoint.
- `2025/08/data.csv` → folder path + file.
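The same breakdown can be sketched in code. The URI below is assembled from the components listed above (container `bronze`, account `mydatalake`, endpoint `dfs.core.windows.net`, path `2025/08/data.csv`); the `AbfssPath` type and `parse_abfss` helper are hypothetical names for this illustration, not part of any Azure SDK.

```python
from typing import NamedTuple
from urllib.parse import urlparse


class AbfssPath(NamedTuple):
    container: str
    account: str
    endpoint: str
    path: str


def parse_abfss(uri: str) -> AbfssPath:
    """Split an abfs(s) URI into container, account, endpoint, and path."""
    parsed = urlparse(uri)
    if parsed.scheme not in ("abfs", "abfss"):
        raise ValueError(f"not an abfs(s) URI: {uri!r}")
    # The authority has the form <container>@<account>.<endpoint>.
    container, sep, host = parsed.netloc.partition("@")
    if not sep or not container or not host:
        raise ValueError(f"expected <container>@<account>.<endpoint> in {uri!r}")
    account, _, endpoint = host.partition(".")
    return AbfssPath(container, account, endpoint, parsed.path.lstrip("/"))


parts = parse_abfss("abfss://bronze@mydatalake.dfs.core.windows.net/2025/08/data.csv")
print(parts)
```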
🔹 Access Tiers (Cost Optimization)
Blob storage offers 3 main tiers:
- Hot: frequently accessed; higher storage cost per GB, lower access cost.
- Cool: infrequently accessed; cheaper storage, higher access charges.
- Archive: very cheap storage, but data must be "rehydrated" before use (which can take hours).
Example:
- Store last 30 days of logs in Hot.
- Move logs > 30 days old to Cool.
- Move logs > 1 year old to Archive.
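The log example above can be sketched two ways: as a plain function that maps a blob's age to a tier, and as a lifecycle management policy document of the kind a storage account accepts. The rule name `tier-logs-by-age` and the `logs/` prefix are hypothetical, and the policy is a sketch to check against the current lifecycle policy reference before use.

```python
import json


def tier_for_age(age_days: int) -> str:
    """Return the tier from the example: Hot up to 30 days, Cool up to 1 year."""
    if age_days > 365:
        return "Archive"
    if age_days > 30:
        return "Cool"
    return "Hot"


# Lifecycle policy expressing the same thresholds for block blobs.
lifecycle_policy = {
    "rules": [
        {
            "name": "tier-logs-by-age",  # hypothetical rule name
            "enabled": True,
            "type": "Lifecycle",
            "definition": {
                "filters": {
                    "blobTypes": ["blockBlob"],
                    "prefixMatch": ["logs/"],  # hypothetical container/prefix
                },
                "actions": {
                    "baseBlob": {
                        "tierToCool": {"daysAfterModificationGreaterThan": 30},
                        "tierToArchive": {"daysAfterModificationGreaterThan": 365},
                    }
                },
            },
        }
    ]
}

print(tier_for_age(10), tier_for_age(90), tier_for_age(400))
print(json.dumps(lifecycle_policy, indent=2))
```

In practice the policy does the work; the function is only there to make the thresholds easy to read and test.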
🔹 Security & Access
- Authentication options:
  - Azure AD, now Microsoft Entra ID (recommended): RBAC roles, Managed Identity.
  - Shared Key (account key): full access, risky.
  - SAS Tokens: temporary, limited access (e.g., read-only link valid for 1 hour).
- Authorization:
  - RBAC roles:
    - `Storage Blob Data Reader` → read only.
    - `Storage Blob Data Contributor` → read/write/delete.
    - `Storage Blob Data Owner` → full control.
- Networking:
  - Private endpoints (VNet integration).
  - Firewalls + IP restrictions.
🔹 Features for Data Engineering
- Hierarchical Namespace (HNS): required for Data Lake Gen2.
  - Allows directories + POSIX-like permissions.
  - Needed for Delta Lake + Databricks Unity Catalog (UC).
- Soft delete / versioning: recover accidentally deleted blobs.
- Lifecycle rules: auto-move data across tiers.
- Event Grid integration: trigger pipelines when new data arrives.
- Immutable blobs (WORM): for compliance; can't be modified or deleted.
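As a rough illustration of the Event Grid integration, the sketch below filters for `Microsoft.Storage.BlobCreated` events and pulls the container and blob path out of the event `subject`. It assumes the Event Grid event schema (where the field is `eventType`); the sample event is trimmed and hand-written, and `new_blob_from_event` is a hypothetical helper name.

```python
from typing import Optional, Tuple

SUBJECT_PREFIX = "/blobServices/default/containers/"


def new_blob_from_event(event: dict) -> Optional[Tuple[str, str]]:
    """Return (container, blob_path) for a BlobCreated event, else None."""
    if event.get("eventType") != "Microsoft.Storage.BlobCreated":
        return None
    subject = event.get("subject", "")
    if not subject.startswith(SUBJECT_PREFIX):
        return None
    # Subject form: /blobServices/default/containers/<container>/blobs/<path>
    container, sep, blob_path = subject[len(SUBJECT_PREFIX):].partition("/blobs/")
    if not sep or not container or not blob_path:
        return None
    return container, blob_path


# Trimmed, hand-written sample event (real events carry more fields).
sample_event = {
    "eventType": "Microsoft.Storage.BlobCreated",
    "subject": "/blobServices/default/containers/bronze/blobs/2025/08/data.csv",
    "data": {"url": "https://mydatalake.blob.core.windows.net/bronze/2025/08/data.csv"},
}

print(new_blob_from_event(sample_event))
```

A handler would typically ignore everything that returns `None` and pass the container and path on to the pipeline trigger.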
🔹 Example Scenario (ETL Pipeline with Blob Storage)
- Raw CSV files land in the `bronze` container.
- Azure Function + Event Grid detects new files.
- Data Factory (ADF) or Databricks picks up files → transforms → saves as Delta in `silver`.
- Aggregated tables are saved in `gold`.
- Access is controlled via a Unity Catalog external location with Managed Identity.
🔹 Quick Analogy
- Block Blob = Lego blocks (you can build files in chunks).
- Append Blob = notebook (you can only keep adding pages).
- Page Blob = hard disk (you can jump to any page and edit).