Microsoft Fabric
Section 1 : Introduction and Installation
1. Introduction
2. Prerequisites
3. Project Architecture
4. Installation
- Ensure Hierarchical namespace is enabled when creating the Azure Data Lake Storage Gen2 resource.
- We don't get charged for Synapse Analytics until we create compute.
Create Spark Pool
Spark Pool Settings
Section 2 : Understanding Microsoft Fabric
5. Evolution of Architecture
The metadata caching layer brings ACID properties.
6. Delta Lake Structure
What happens under the hood?
7. Why Fabric?
Without Fabric, a lot of services need to be created individually.
8. Starting Up Fabric
- Log in to Fabric with a Microsoft Entra ID account for the free trial.
- Go to Settings -> Admin Portal -> enable Fabric
9. Licensing and Costs
Creating Azure Capacity
If you are not able to select a subscription, follow these steps.
If you can open the subscription but not perform actions:
- Go to Azure Portal > Subscriptions
- Click on the subscription
- Go to Access Control (IAM) > Role Assignments
- Filter by Role = Owner
- You’ll see a list of users, groups, or service principals who are assigned the “Owner” role.
10. Fabric Terms
Example
11. OneLake in Fabric
Data is stored in one data lake, organized by workspace name.
There is only one storage point.
We have only one copy of the data and nothing is duplicated.
The files are stored in Parquet format, with metadata powered by Delta Lake.
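As a rough illustration of "organized by workspace name", here is a minimal sketch that builds a OneLake path from a workspace and a lakehouse. It assumes the `abfss://<workspace>@onelake.dfs.fabric.microsoft.com/<item>.<itemtype>/<path>` addressing form; the workspace and lakehouse names are hypothetical, and `onelake_path` is a helper made up for these notes, not a Fabric API.

```python
# Sketch only: workspace/lakehouse names below are hypothetical examples.
ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"


def onelake_path(workspace: str, lakehouse: str, relative_path: str) -> str:
    """Return abfss://<workspace>@<host>/<lakehouse>.Lakehouse/<relative_path>."""
    relative_path = relative_path.lstrip("/")
    return f"abfss://{workspace}@{ONELAKE_HOST}/{lakehouse}.Lakehouse/{relative_path}"


# Two workspaces, one lake: only the workspace and item parts of the path differ.
finance_table = onelake_path("FinanceWS", "SalesLakehouse", "Tables/orders")
marketing_file = onelake_path("MarketingWS", "CampaignLakehouse", "/Files/raw/leads.csv")

print(finance_table)
print(marketing_file)
```

Both paths share the same host, which is the "one storage point" idea; the workspace name acts like the container.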
12. One Copy for all computes
All engines store data in OneLake.
All data is stored in Delta Parquet format.
Section 3 : Fabric Lakehouse
13. Microsoft Fabric Workspaces
14. Workspace Roles
15. Creating a Lakehouse
When we create a Lakehouse there are three things:
Lakehouse - data platform to store the data.
Semantic Model - dataset to present to Power BI.
SQL Endpoint - lets us run queries.
16. Lakehouse Explorer
Data Uploaded to table
Table created from the file
Files are stored in Parquet with a Delta log
Here is the Delta log info
On clicking Properties we can see whether it's managed or not.
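To make the Delta log less abstract: each commit in `_delta_log` is a JSON file with one action per line, and `add` / `remove` actions say which Parquet files belong to the table. The sketch below replays such actions in plain Python. The commit contents and file names are invented for illustration, and `active_files` is a hypothetical helper; a real reader also handles checkpoints, metadata and protocol actions.

```python
import json


def active_files(commits):
    """Replay commits in order and return the data files of the latest version."""
    files = set()
    for commit in commits:
        for line in commit.splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "add" in action:
                files.add(action["add"]["path"])
            elif "remove" in action:
                files.discard(action["remove"]["path"])
            # other actions (commitInfo, metaData, protocol) are ignored here
    return files


# Hypothetical commit 0: the initial write adds one Parquet file.
commit_0 = "\n".join([
    json.dumps({"commitInfo": {"operation": "WRITE"}}),
    json.dumps({"add": {"path": "part-0000.parquet"}}),
])
# Hypothetical commit 1: an overwrite removes the old file and adds a new one.
commit_1 = "\n".join([
    json.dumps({"remove": {"path": "part-0000.parquet"}}),
    json.dumps({"add": {"path": "part-0001.parquet"}}),
])

current = active_files([commit_0, commit_1])
print(sorted(current))
```

The old Parquet file is still on disk after commit 1; it is only the log that stops referencing it.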
17. SQL Analytics Endpoint
We can only read data from this endpoint, not write or update.
We can create views
18. Default Semantic Model View
In the context of the semantic model in Microsoft Fabric lakehouse, the semantic model itself doesn't directly store raw data. Instead, it provides a logical, structured view of the data stored in the underlying data lake or warehouse.
The semantic model acts as an abstraction layer that organizes and simplifies access to the data, making it easier for reporting and analysis tools to query the data efficiently. The raw data is stored in the data lake or data warehouse, and the semantic model helps to structure and shape this data into meaningful formats suitable for analysis and reporting.
Section 4 : Data Factory in Fabric
19. How to Load data in Lakehouse
20. Fabric vs Azure Data Factory
21. Data Gateway Types
- Gateway connects two networks
Imagine you have a big box of toys at home, and you want to show them to your friends who live far away. You have two ways to show them: one way is through a special window, and the other way is to use a delivery truck.
On-Premise Data Gateway (like a special window):
This is like a window that you open to let your friends see your toys without taking them out of the box. It connects your toys (data) at home to an online game or app that your friends are using. You can think of this as a way to share data that's stored in your house without letting your friends take it out or change it. It keeps your toys safe inside but lets you show them off.
VNet Data Gateway (like a delivery truck):
This method is like using a delivery truck to send some of your toys to your friends' houses. The VNet (Virtual Network) is a big, secure road that connects your house and your friends' houses. When you use this truck, you're moving data across this secure road, allowing your friends to actually play with the toys (data) over at their place, but still keeping it safe and controlled.
So, in simple terms, the on-premise data gateway lets you show your toys to friends securely while they are still at home, and the VNet data gateway lets you share some toys by sending them out safely to your friends' houses.
22. Connections
Click Gear Icon -> Manage Connections and Gateways
- Gateway : Equivalent to Integration Runtime in ADF
- Connection : Similar to Linked Service in ADF
23. Creating Pipeline
Step 1 : Lookup activity to query the connected SQL database
Step 2 : ForEach activity to go over both tables
Step 3 : For each iteration, run the Copy data activity
Destination : Our OneLake Lakehouse
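The control flow of the three steps above can be sketched in plain Python. This is only a conceptual stand-in, not Fabric pipeline code: `lookup_tables`, `copy_table` and `run_pipeline` are hypothetical functions, and the two table names are made up.

```python
def lookup_tables():
    """Stand-in for the Lookup activity: the rows a table-listing query might return."""
    return [
        {"schema": "dbo", "table": "Customers"},  # hypothetical table
        {"schema": "dbo", "table": "Orders"},     # hypothetical table
    ]


def copy_table(row, lakehouse):
    """Stand-in for the Copy data activity: record where the table would land."""
    destination = f"Tables/{row['table']}"
    lakehouse.append(destination)
    return destination


def run_pipeline():
    lakehouse = []                   # stand-in for the Lakehouse destination
    for row in lookup_tables():      # ForEach iterates over the Lookup output
        copy_table(row, lakehouse)   # one Copy data run per iteration
    return lakehouse


print(run_pipeline())
```

The point of the pattern is that adding a third table only changes the Lookup result, not the pipeline itself.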
24. Dataflow Gen2
Adding Data Source to Dataflow Gen2
Alice here has the Storage Blob Data Contributor role, which can be granted on the container screen.
Click Combine
Click Add Column -> Custom Column
if [State] = "CA" then "California" else if
[State] = "NJ" then "New Jersey" else if
[State] = "FL" then "Florida"
else [State]
Click Add Destination -> Lakehouse
Next, go to Home -> Save and Run; the refresh should start automatically.
We should be able to see the data once the refresh is completed.
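For checking the custom column logic outside the dataflow, the if / else chain from the Custom Column step can be mirrored in Python as a dictionary lookup with a fallback. This is an equivalent sketch rather than what Dataflow Gen2 runs; `state_full_name` is a hypothetical helper name.

```python
# Same three mappings as the custom column formula.
STATE_NAMES = {"CA": "California", "NJ": "New Jersey", "FL": "Florida"}


def state_full_name(state: str) -> str:
    """Return the full state name, or the original value when no mapping exists."""
    return STATE_NAMES.get(state, state)


for code in ["CA", "NJ", "FL", "TX"]:
    print(code, "->", state_full_name(code))
```

The fallback to the input value plays the role of the final `else [State]` branch.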
Section 5 : Fabric One Lake
25. Shortcuts in Fabric
Shortcuts can be created only at the OneLake level.
Let's say the finance team wants data from the marketing lakehouse.
They can create a shortcut to the marketing lakehouse without copying the data.
The data is refreshed/updated automatically.
There is no need to copy data when loading from Amazon S3.
26. How to create a shortcut?
27. Creating Files Shortcut
Deleting a file in Azure Data Lake Storage
The data gets deleted in Fabric as well.
Deleting data in Fabric
The data gets deleted in Azure Blob storage as well.
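The two-way delete behavior follows from the shortcut being a pointer rather than a copy. Here is a toy model of that idea in plain Python; `Storage` and `Shortcut` are invented classes for illustration and have nothing to do with the actual Fabric implementation.

```python
class Storage:
    """Toy stand-in for a storage location such as an ADLS container."""

    def __init__(self, files):
        self.files = dict(files)


class Shortcut:
    """Toy stand-in for a shortcut: holds a reference to the target, never a copy."""

    def __init__(self, target):
        self.target = target

    def list_files(self):
        return sorted(self.target.files)

    def delete(self, name):
        # Deleting through the shortcut acts on the target itself.
        del self.target.files[name]


adls = Storage({"a.csv": b"1", "b.csv": b"2"})
shortcut = Shortcut(adls)

del adls.files["a.csv"]                        # delete at the source
after_source_delete = shortcut.list_files()    # the shortcut no longer sees a.csv

shortcut.delete("b.csv")                       # delete through the shortcut
after_shortcut_delete = sorted(adls.files)     # the source no longer has b.csv

print(after_source_delete, after_shortcut_delete)
```

Because there is only one set of files, there is nothing to keep in sync, which is also why the data appears "refreshed" automatically.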
28. Creating Table Shortcut
We can see that this table appears in the Unidentified section.
In Microsoft Fabric, unidentified tables are entries displayed in the managed section of your data environment that lack associated metadata or table references. Here’s a breakdown of the concept:
Managed vs. Unmanaged: In Fabric, the managed section refers to tables that have both metadata and data managed by the Fabric engine. In contrast, the unmanaged section allows you to upload files in any format, which do not have the same management.
Unidentified Tables: If you create a table that is not in the delta format, it will be saved in the unidentified folder. This often occurs when files, such as CSVs, are included without a defined table structure, leading Fabric to categorize them as unidentified.
Purpose: The main goal of the unidentified prompt is to alert users that these files do not conform to the structure required for the managed section and are not recognized as tables. Essentially, it indicates that there are files present that need to be reviewed and potentially removed.
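A quick way to predict whether a folder would be treated as a Delta table is to check for a `_delta_log` directory containing JSON commits. The sketch below does this on a temporary local folder. Treating this as the rule Fabric applies is an assumption of these notes; `looks_like_delta_table` and the folder names are hypothetical.

```python
import tempfile
from pathlib import Path


def looks_like_delta_table(folder: Path) -> bool:
    """True if the folder has a _delta_log directory with at least one JSON commit."""
    log_dir = folder / "_delta_log"
    return log_dir.is_dir() and any(log_dir.glob("*.json"))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)

    # A folder laid out like a Delta table: Parquet data plus a commit log.
    delta_dir = root / "orders"
    (delta_dir / "_delta_log").mkdir(parents=True)
    (delta_dir / "_delta_log" / "00000000000000000000.json").write_text("{}")
    (delta_dir / "part-0000.parquet").write_bytes(b"")

    # A folder with only a CSV file: no log, so not a Delta table.
    csv_dir = root / "raw_csv"
    csv_dir.mkdir()
    (csv_dir / "data.csv").write_text("id,name\n1,a\n")

    results = {p.name: looks_like_delta_table(p) for p in (delta_dir, csv_dir)}

print(results)
```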
If we want files from a subfolder, we can't create a table shortcut.
When we create a shortcut under Files, it can point to subdirectories as well.
After I dropped a Parquet file in ADLS, there is no unidentified error.
29. Creating Delta from Parquet
- Go to the Synapse workspace.
- Create a new notebook.
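# Read the source Parquet file from ADLS Gen2 (`spark` is predefined in a Synapse notebook)
df = spark.read.format("parquet").load("abfss://containername@storageaccountname.dfs.core.windows.net/UnEmployment.parquet")
# Write the same data in Delta format to the container the shortcut will point to
df.write.format("delta").save("abfss://shortcutdelta@msfabriclakehousevedanth.dfs.core.windows.net/")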
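Since a typo in the `abfss://` path is the most common reason this notebook fails, it can help to check the pieces of the URI before running it. A minimal sketch using only the standard library follows; `parse_abfss` is a hypothetical helper, and it assumes the `abfss://<container>@<account>.dfs.core.windows.net/<path>` layout used in the code above.

```python
from urllib.parse import urlsplit


def parse_abfss(uri: str) -> dict:
    """Split an abfss URI into container, storage account and path."""
    parts = urlsplit(uri)
    if parts.scheme != "abfss" or not parts.username or not parts.hostname:
        raise ValueError(f"not an abfss URI: {uri!r}")
    return {
        "container": parts.username,              # the part before '@'
        "account": parts.hostname.split(".")[0],  # first label of the host name
        "path": parts.path.lstrip("/"),
    }


source = parse_abfss(
    "abfss://containername@storageaccountname.dfs.core.windows.net/UnEmployment.parquet"
)
target = parse_abfss(
    "abfss://shortcutdelta@msfabriclakehousevedanth.dfs.core.windows.net/"
)
print(source)
print(target)
```

An empty `path` for the target means the Delta table is written at the root of the container, which is what the table shortcut will later point to.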
30. Creating Shortcut in Fabric
Execute the code above, then create a table-level shortcut to the Delta output.