Microsoft Fabric

Section 1 : Introduction and Installation

1. Introduction

image

image

2. Prerequisites

image

3. Project Architecture

image

image

image

4. Installation

  • Ensure the hierarchical namespace is enabled when creating the storage account, so that it becomes an Azure Data Lake Storage Gen2 resource.
  • We don't get charged for Synapse Analytics until we create compute.

Create Spark Pool

Spark Pool Settings

image

Section 2 : Understanding Microsoft Fabric

5. Evolution of Architecture

image

The metadata and caching layer on top of the data lake brings ACID properties.

6. Delta Lake Structure

image

image

image

What happens under the hood?

image

7. Why Fabric?

image

Without Fabric, a lot of services need to be created individually.

image

image

8. Starting Up Fabric

  • Log in to Fabric with a Microsoft Entra ID account to start the free trial.
  • Go to Settings -> Admin Portal and enable Fabric.

9. Licensing and Costs

image

image

image

Creating Azure Capacity

Microsoft Official Link

If you are not able to select a subscription, follow these steps:

  • If you can open the subscription but cannot perform actions on it:

  • Go to Azure Portal > Subscriptions.

  • Click on the subscription.

  • Go to Access Control (IAM) > Role Assignments.

  • Filter by Role = Owner.

  • You’ll see a list of users, groups, or service principals who are assigned the “Owner” role. One of them can grant you the access you need.

10. Fabric Terms

image

Example

image

11. OneLake in Fabric

Data is stored in a single data lake (OneLake), organized by workspace name.

There is only one storage location.

image

We have only one copy of the data, and nothing is duplicated.

The files are stored in Parquet format, with metadata powered by Delta Lake.
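
As a rough illustration of "one lake, organized by workspace", the sketch below builds the ABFS-style URI under which a lakehouse path is addressed in OneLake. The workspace, lakehouse and table names are hypothetical. The URI shape follows the `abfss://<workspace>@onelake.dfs.fabric.microsoft.com/<item>.<itemtype>/<path>` pattern, and names containing spaces or special characters are not handled here.

```python
ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"


def onelake_uri(workspace: str, lakehouse: str, relative_path: str) -> str:
    """Build an abfss:// URI for a path inside a lakehouse item.

    Every workspace shares the same OneLake endpoint; the workspace name
    sits where a container name would sit in an ADLS Gen2 URI.
    """
    relative_path = relative_path.strip("/")
    return f"abfss://{workspace}@{ONELAKE_HOST}/{lakehouse}.Lakehouse/{relative_path}"


# Hypothetical names, used only for illustration.
sales_uri = onelake_uri("Marketing", "SalesLakehouse", "Tables/orders")
print(sales_uri)
```

Only the workspace and item names change between teams; the endpoint stays the same, which is what "only one storage location" means in practice.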

12. One Copy for all computes

All engines store their data in OneLake.

image

All data is stored in Delta Parquet format.

Section 3 : Fabric Lakehouse

13. Microsoft Fabric Workspaces

image

14. Workspace Roles

image

15. Creating a Lakehouse

When we create a Lakehouse, three items are created:

image

Lakehouse - the data platform that stores the data.

Semantic Model - the dataset presented to Power BI.

SQL Endpoint - lets us run SQL queries against the data.

16. Lakehouse Explorer

image

Data Uploaded to table

image

Table created from the file

image

Files are stored as Parquet with a Delta log

image

Here is the Delta log info:

image
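
To make the role of the Delta log more concrete, here is a minimal sketch of how a reader can work out which Parquet files belong to the current table version by replaying the log. It is a simplification under stated assumptions: the two commits are hypothetical, they are held in memory instead of being read from `_delta_log/*.json`, and real logs also carry entries such as `commitInfo`, `protocol` and checkpoints that are ignored here.

```python
import json

# Two hypothetical commits. In a real table each one is a newline-delimited
# JSON file such as _delta_log/00000000000000000000.json.
commit_0 = "\n".join([
    json.dumps({"metaData": {"id": "demo-table", "format": {"provider": "parquet"}}}),
    json.dumps({"add": {"path": "part-0000.parquet", "size": 1024, "dataChange": True}}),
])
commit_1 = "\n".join([
    json.dumps({"remove": {"path": "part-0000.parquet", "dataChange": True}}),
    json.dumps({"add": {"path": "part-0001.parquet", "size": 2048, "dataChange": True}}),
])


def active_files(commits):
    """Replay commits in order and return the data files of the resulting version."""
    files = set()
    for commit in commits:
        for line in commit.splitlines():
            action = json.loads(line)
            if "add" in action:
                files.add(action["add"]["path"])
            elif "remove" in action:
                files.discard(action["remove"]["path"])
    return sorted(files)


version_0 = active_files([commit_0])
version_1 = active_files([commit_0, commit_1])
print(version_0)
print(version_1)
```

The old Parquet file is not rewritten in place. The log simply stops referencing it, which is how updates and time travel can work on top of immutable files.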

Clicking Properties shows whether the table is managed or not.

image

17. SQL Analytics Endpoint

We can only read data through this endpoint; we cannot write or update it.

image

We can create views

image

18. Default Semantic Model View

image

image

In a Microsoft Fabric lakehouse, the semantic model itself doesn't directly store raw data. Instead, it provides a logical, structured view of the data stored in the underlying data lake or warehouse.

The semantic model acts as an abstraction layer that organizes and simplifies access to the data, making it easier for reporting and analysis tools to query the data efficiently. The raw data is stored in the data lake or data warehouse, and the semantic model helps to structure and shape this data into meaningful formats suitable for analysis and reporting.

Section 4 : Data Factory in Fabric

19. How to Load data in Lakehouse

image

20. Fabric vs Azure Data Factory

image

21. Data Gateway Types

  • Gateway connects two networks

image

Imagine you have a big box of toys at home, and you want to show them to your friends who live far away. You have two ways to show them: one way is through a special window, and the other way is to use a delivery truck.

On-Premises Data Gateway (like a special window):

This is like a window that you open to let your friends see your toys without taking them out of the box. It connects your toys (data) at home to an online game or app that your friends are using. You can think of this as a way to share data that's stored in your house without letting your friends take it out or change it. It keeps your toys safe inside but lets you show them off.

VNet Data Gateway (like a delivery truck):

This method is like using a delivery truck to send some of your toys to your friends' houses. The VNet (Virtual Network) is a big, secure road that connects your house and your friends' houses. When you use this truck, you're moving data across this secure road, allowing your friends to actually play with the toys (data) over at their place, while still keeping it safe and controlled.

So, in simple terms, the on-premises data gateway lets you show your toys to friends securely while the toys stay at home, and the VNet data gateway lets you share some toys by sending them out safely to your friends' houses.

image

image

image

22. Connections

image

Click the gear icon -> Manage Connections and Gateways

image

  • Gateway : Equivalent to Integration Runtime in ADF
  • Connection : Similar to Linked Service in ADF

23. Creating Pipeline

Step 1 : Lookup activity to query the connected SQL database

image

Step 2 : ForEach activity to iterate over both tables

image

Step 3 : For each iteration, run a Copy data activity

image

Destination : Our OneLake lakehouse

image
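
The control flow of the three steps above can be sketched in plain Python. This is only a simulation of the Lookup -> ForEach -> Copy data pattern, not Fabric's pipeline API: the function names, the schema and table names, and the destination naming scheme are all hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


def lookup_tables() -> List[Row]:
    """Stand-in for the Lookup activity: returns one row per table to copy."""
    # Hypothetical rows; a real Lookup would query the connected SQL database.
    return [
        {"schema": "dbo", "table": "Customers"},
        {"schema": "dbo", "table": "Orders"},
    ]


def copy_table(row: Row) -> str:
    """Stand-in for the Copy data activity: returns the destination table name."""
    return f"{row['schema']}_{row['table']}"


def run_pipeline(lookup: Callable[[], List[Row]], copy: Callable[[Row], str]) -> List[str]:
    """Run Lookup, then one Copy data call per row (the ForEach loop)."""
    copied = []
    for row in lookup():          # ForEach over the Lookup output
        copied.append(copy(row))  # one Copy data run per iteration
    return copied


copied_tables = run_pipeline(lookup_tables, copy_table)
print(copied_tables)
```

Driving the loop from a lookup result means a new table only has to show up in the lookup query; the pipeline itself does not change.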

24. Dataflow Gen2

image

Adding a Data Source to Dataflow Gen2

image

Here, Alice has the Storage Blob Data Contributor role, which can be granted from the container's Access Control (IAM) screen.

image

Click Combine

image

Click Add Column -> Custom Column

image

if [State] = "CA" then "California"
else if [State] = "NJ" then "New Jersey"
else if [State] = "FL" then "Florida"
else [State]
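
For readers more familiar with Python than Power Query M, the custom column expression above behaves roughly like a dictionary lookup with a fallback. The function and variable names below are illustrative and are not part of the dataflow.

```python
STATE_NAMES = {"CA": "California", "NJ": "New Jersey", "FL": "Florida"}


def expand_state(state: str) -> str:
    """Return the full state name, or the input unchanged if it is not mapped."""
    return STATE_NAMES.get(state, state)


for code in ["CA", "NJ", "FL", "TX"]:
    print(code, "->", expand_state(code))
```

As in the M expression, any value without a mapping (here "TX") passes through unchanged.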

Click Add Destination -> Lakehouse

image

Next, go to Home -> Save and Run; the refresh should start automatically.

We should be able to see the data once the refresh is completed.

image

Section 5 : Fabric One Lake

25. Shortcuts in Fabric

Shortcuts can be created only at the OneLake level.

Let's say the finance team wants data from the marketing lakehouse.

They can create a shortcut to the marketing lakehouse without copying the data.

Because the shortcut reads from the source, changes made there show up automatically.

image

Shortcuts also work with external sources such as Amazon S3, so there is no need to copy the data when loading from there.

26. How to create a shortcut?

image

image

image

27. Creating Files Shortcut

image

image

Deleting a file in Azure Data Lake Storage

image

The data is deleted in Fabric as well.

Deleting data in Fabric

image

The data is deleted in Azure Blob Storage as well.

image
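
This two-way behaviour is similar to how a symbolic link works on a file system. The sketch below is only an analogy built with local folders and `os.symlink`; it does not call any Fabric or Azure API, and the folder and file names are hypothetical. Creating symlinks may need extra privileges on Windows.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    source = os.path.join(root, "adls_container")            # stands in for the ADLS folder
    lakehouse_files = os.path.join(root, "lakehouse_files")  # stands in for Files in the lakehouse
    os.makedirs(source)
    os.makedirs(lakehouse_files)

    with open(os.path.join(source, "sales.csv"), "w") as f:
        f.write("id,amount\n1,10\n")

    # The "shortcut": a link to the source folder. No data is copied.
    shortcut = os.path.join(lakehouse_files, "adls_shortcut")
    os.symlink(source, shortcut, target_is_directory=True)

    visible_via_shortcut = os.listdir(shortcut)

    # Deleting the file through the shortcut removes it at the source too,
    # because there was only ever one copy.
    os.remove(os.path.join(shortcut, "sales.csv"))
    left_in_source = os.listdir(source)

print(visible_via_shortcut)
print(left_in_source)
```

Since both paths resolve to the same single copy, a delete on either side is visible on the other.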

28. Creating Table Shortcut

image

We can see that this table appears in the Unidentified section.

image

image

In Microsoft Fabric, unidentified tables are entries displayed in the managed section of the lakehouse that lack associated metadata or table references. Here’s a breakdown of the concept:

Managed vs. Unmanaged: In Fabric, the managed section (Tables) holds tables whose metadata and data are both managed by the Fabric engine. In contrast, the unmanaged section (Files) accepts files in any format, and Fabric does not manage them as tables.

Unidentified Tables: If you create a table that is not in Delta format, it is placed in the Unidentified folder. This often occurs when files, such as CSVs, are included without a defined table structure, leading Fabric to categorize them as unidentified.

Purpose: The main goal of the Unidentified folder is to alert users that these files do not conform to the structure required for the managed section and do not back any table. Essentially, it indicates that there are files present that need to be reviewed and potentially removed.
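
One way to picture the distinction: a Delta table is a folder that contains a `_delta_log` directory next to its Parquet files, while a folder holding only a CSV has no such log. The sketch below applies that check to two hypothetical local folders. It is an assumption-laden illustration of the idea, not how Fabric itself classifies tables.

```python
import tempfile
from pathlib import Path


def looks_like_delta_table(folder: Path) -> bool:
    """Return True if the folder contains a _delta_log directory."""
    return (folder / "_delta_log").is_dir()


with tempfile.TemporaryDirectory() as root:
    tables = Path(root) / "Tables"

    # Hypothetical folder written in Delta format.
    delta_folder = tables / "unemployment_delta"
    (delta_folder / "_delta_log").mkdir(parents=True)
    (delta_folder / "part-0000.parquet").touch()

    # Hypothetical folder holding only a raw CSV file.
    csv_folder = tables / "unemployment_csv"
    csv_folder.mkdir(parents=True)
    (csv_folder / "data.csv").touch()

    results = {p.name: looks_like_delta_table(p) for p in sorted(tables.iterdir())}

print(results)
```

A folder that fails a check like this is the kind of content that ends up listed as unidentified.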

If we want files from a subfolder, we can't create a table shortcut.

When we create a shortcut under Files, it can point to subdirectories as well.

Now I dropped a Parquet file in ADLS, and there is no Unidentified warning.

image

29. Creating Delta from Parquet

image

image

  1. Go to the Synapse workspace.

  2. Create a new notebook.

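# Read the source Parquet file from ADLS Gen2 (replace the container and storage account names)
df = spark.read.format("parquet").load("abfss://containername@storageaccountname.dfs.core.windows.net/UnEmployment.parquet")
# Write the same data in Delta format to the container the shortcut will point to
df.write.format("delta").save("abfss://shortcutdelta@msfabriclakehousevedanth.dfs.core.windows.net/")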

30. Creating Shortcut in Fabric

Execute the code above, then create a table-level shortcut that points to the Delta output location.

image

image