Name: Azure Databricks & Spark For Data Engineers (PySpark / SQL)
Price: 4127.22916547 RND
Rating: 4.6441293 (15556 reviews)

189 Lessons

Approximately 20h to complete

There are 92039 participants

What you'll learn

You will learn how to build a real-world data project using Azure Databricks and Spark Core. The course is taught using real-world data.

You will acquire professional level data engineering skills in Azure Databricks, Delta Lake, Spark Core, Azure Data Lake Gen2 and Azure Data Factory (ADF)

You will learn how to create notebooks, dashboards, clusters, cluster pools and jobs in Azure Databricks

You will learn how to ingest and transform data using PySpark in Azure Databricks

You will learn how to transform and analyse data using Spark SQL in Azure Databricks

You will learn about Data Lake architecture and Lakehouse Architecture. Also, you will learn how to implement a Lakehouse architecture using Delta Lake.

You will learn how to create Azure Data Factory pipelines to execute Databricks notebooks

You will learn how to create Azure Data Factory triggers to schedule pipelines as well as monitor them.

You will gain the Azure Databricks and Data Factory skills required to pass the Azure Data Engineer Associate certification exam (DP-203)

You will learn how to connect to Azure Databricks from Power BI to create reports

You will gain a comprehensive understanding of Unity Catalog and the data governance capabilities it offers.

You will learn to implement a data governance solution using a Unity Catalog-enabled Databricks workspace.

Course content

28 Sections • 189 Lessons • 20h 1m

1. Introduction • 4 Lessons • 09m 40s
2. Azure Subscription (Optional) • 2 Lessons • 09m 43s
3. Azure Databricks Overview • 4 Lessons • 30m 25s
4. Databricks Clusters • 9 Lessons • 63m 47s
5. Databricks Notebooks • 6 Lessons • 30m 25s
6. Accessing Azure Data Lake from Databricks • 9 Lessons • 65m 12s
7. Securing Access to Azure Data Lake • 7 Lessons • 29m 08s
8. Mounting Data Lake Container to Databricks • 5 Lessons • 32m 21s
9. Formula1 Project Overview • 5 Lessons • 29m 38s
10. Spark Introduction • 2 Lessons • 06m 47s
11. Data Ingestion - CSV • 11 Lessons • 65m 45s
12. Data Ingestion - JSON • 9 Lessons • 37m 08s
13. Data Ingestion - Multiple Files • 4 Lessons • 11m 31s
14. Databricks Workflows • 5 Lessons • 34m 18s
15. Filter & Join Transformations • 8 Lessons • 50m 38s
16. Aggregations • 6 Lessons • 40m 06s
17. Using SQL in Spark Applications • 2 Lessons • 14m 27s
18. Spark SQL - Databases/Tables/Views • 11 Lessons • 72m 45s
19. Spark SQL - Filters/Joins/Aggregations • 5 Lessons • 36m 52s
20. Spark SQL - Analysis • 8 Lessons • 44m 21s
21. Incremental Load • 13 Lessons • 93m 43s
22. Delta Lake • 16 Lessons • 140m 34s
23. Azure Data Factory • 10 Lessons • 83m 43s
24. Connect to Other Services • 1 Lesson • 09m 26s
25. Unity Catalog - Introduction • 13 Lessons • 77m 19s
26. Unity Catalog - Mini Project • 6 Lessons • 35m 07s
27. Unity Catalog - Key Benefits • 6 Lessons • 40m 22s
28. Next Steps • 2 Lessons • 00m 51s

Description

Major updates to the course since the launch

May 2023 - New sections 25, 26 and 27 added to cover Unity Catalog, a recent addition to Databricks that offers a unified data governance solution for a Data Lakehouse. These sections cover all aspects of Unity Catalog and its implementation through a project.

March 2023 - New sections 6 and 7 added; section 8 updated. These changes reflect the latest Databricks recommendations for accessing Azure Data Lake. They also provide a better way to complete the course project for students using an Azure Student Subscription or a corporate subscription with limited access to Azure Active Directory.

December 2022 - Sections 3, 4 and 5 updated to reflect recent UI changes to Azure Databricks. Also added lessons on functionality that Databricks recently added to clusters.

Welcome!

I am looking forward to helping you learn one of the most in-demand data engineering tools in the cloud, Azure Databricks! This course is taught by implementing a data engineering solution using Azure Databricks and Spark Core for a real-world project that analyses and reports on Formula1 motor racing data.

This is like no other course on Udemy for Azure Databricks. Once you have completed the course, including all the assignments, I strongly believe that you will be in a position to start a real-world data engineering project on your own and will be proficient in Azure Databricks. I have also included lessons on Azure Data Lake Storage Gen2, Azure Data Factory and Power BI. The primary focus of the course is Azure Databricks and Spark Core, but it also covers the relevant concepts and connectivity to the other technologies mentioned. Please note that the course doesn't cover other aspects of Spark such as Spark Streaming and Spark ML. Also, the course is taught using PySpark and Spark SQL; it doesn't cover Scala or Java.

The course follows the logical progression of a real-world project implementation, with technical concepts being explained and the Databricks notebooks being built at the same time. Even though this course is not specifically designed to teach you the skills required for passing the Azure Data Engineer Associate certification exam (DP-203), it can help you gain most of the skills required for the exam.

I value your time as much as I do mine, so I have designed this course to be fast-paced and to the point. The course is taught in simple English, without jargon. I start from the basics, and by the end of the course you will be proficient in the technologies used.

Currently the course teaches you the following:

Azure Databricks

Building a solution architecture for a data engineering solution using Azure Databricks, Azure Data Lake Gen2, Azure Data Factory and Power BI
Creating and using Azure Databricks service and the architecture of Databricks within Azure
Working with Databricks notebooks as well as using Databricks utilities, magic commands etc
Passing parameters between notebooks as well as creating notebook workflows
Creating, configuring and monitoring Databricks clusters, cluster pools and jobs
Mounting Azure Storage in Databricks using secrets stored in Azure Key Vault
Working with Databricks Tables, Databricks File System (DBFS) etc
Using Delta Lake to implement a solution using Lakehouse architecture
Creating dashboards to visualise the outputs
Connecting to the Azure Databricks tables from Power BI

Spark (Only PySpark and SQL)

Spark architecture, Data Sources API and DataFrame API
PySpark - Ingestion of CSV, simple and complex JSON files into the data lake as Parquet files/tables.
PySpark - Transformations such as Filter, Join, Simple Aggregations, GroupBy, Window functions etc.
PySpark - Creating local and temporary views
Spark SQL - Creating databases, tables and views
Spark SQL - Transformations such as Filter, Join, Simple Aggregations, GroupBy, Window functions etc.
Spark SQL - Creating local and temporary views
Implementing full refresh and incremental load patterns using partitions
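
The full refresh and incremental load patterns listed above can be sketched in plain Python. This is an illustration of the idea only, not Spark code: a dict stands in for a partitioned table, and the names `full_refresh`, `incremental_load`, `race_id`, `day_two_batch` and the sample rows are all assumptions made for this example.

```python
from collections import defaultdict


def full_refresh(table, incoming_rows):
    """Discard the whole table and reload it from the incoming rows."""
    table.clear()
    for row in incoming_rows:
        table[row["race_id"]].append(row)


def incremental_load(table, incoming_rows):
    """Overwrite only the partitions that appear in the incoming batch."""
    batch = defaultdict(list)
    for row in incoming_rows:
        batch[row["race_id"]].append(row)
    for partition_key, rows in batch.items():
        table[partition_key] = rows  # other partitions stay untouched


# Hypothetical table partitioned by race_id (partition key -> rows).
results = defaultdict(list)

full_refresh(results, [
    {"race_id": 1, "driver": "driver_a", "points": 25},
    {"race_id": 2, "driver": "driver_a", "points": 18},
])

# A later batch corrects race 2 and adds race 3; race 1 is not re-read.
day_two_batch = [
    {"race_id": 2, "driver": "driver_a", "points": 15},
    {"race_id": 3, "driver": "driver_b", "points": 25},
]
incremental_load(results, day_two_batch)
```

Because each affected partition is replaced as a whole, running the same batch twice leaves the table unchanged, which is the main reason to prefer partition overwrite over a blind append in this sketch.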

Delta Lake

Emergence of the Data Lakehouse architecture and the role of Delta Lake.
Read, Write, Update, Delete and Merge to Delta Lake using both PySpark and SQL
History, Time Travel and Vacuum
Converting Parquet files to Delta files
Implementing incremental load pattern using Delta Lake
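
The merge (upsert) operation mentioned in the list above updates rows that already exist in the target and inserts the ones that do not. The following is a minimal plain-Python sketch of those semantics, not the Delta Lake API; the function `merge_upsert`, the `driver_id` key and the sample rows are hypothetical.

```python
def merge_upsert(target, updates, key):
    """Return a new list of rows: matched rows updated, unmatched rows inserted."""
    by_key = {row[key]: dict(row) for row in target}
    for row in updates:
        if row[key] in by_key:
            by_key[row[key]].update(row)  # matched: update existing row
        else:
            by_key[row[key]] = dict(row)  # not matched: insert new row
    return list(by_key.values())


# Hypothetical target table and incoming changes.
drivers = [
    {"driver_id": 1, "team": "team_x"},
    {"driver_id": 2, "team": "team_y"},
]
changes = [
    {"driver_id": 2, "team": "team_z"},  # existing driver changes team
    {"driver_id": 3, "team": "team_x"},  # new driver
]

merged = merge_upsert(drivers, changes, key="driver_id")
```

Applying the same `changes` a second time yields the same result, which is why merge is a common fit for incremental loads that may be rerun.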

Unity Catalog

Overview of Data Governance and Unity Catalog
Create Unity Catalog Metastore and enable a Databricks workspace with Unity Catalog
Overview of the 3-level namespace and creating Unity Catalog objects
Configuring and accessing external data lakes via Unity Catalog
Development of a mini project using Unity Catalog, demonstrating the key data governance capabilities it offers, such as Data Discovery, Data Audit, Data Lineage and Data Access Control.

Azure Data Factory

Creating pipelines to execute Databricks notebooks
Designing robust pipelines to deal with unexpected scenarios such as missing files
Creating dependencies between activities as well as pipelines
Scheduling the pipelines using data factory triggers to execute at regular intervals
Monitoring the triggers/pipelines to check for errors/outputs.

Reviews

4.64

(15,556 Ratings)