Microsoft

Perform data engineering with Azure Synapse Apache Spark Pools

Microsoft via Microsoft Learn

Overview

  • Module 1: Understand big data engineering with Apache Spark in Azure Synapse Analytics
  • After completing this module, you will be able to:

    • Differentiate between Apache Spark and Spark pools
    • Differentiate between Azure Databricks and Spark pools
    • Differentiate between HDInsight and Spark pools
    • Differentiate between Spark pools and SQL pools
    • Understand the use cases of data engineering with Apache Spark in Azure Synapse Analytics
    • Create a Spark pool in Azure Synapse Analytics
  • Module 2: Ingest data with Apache Spark notebooks in Azure Synapse Analytics
  • After completing this module, you will be able to:

    • Understand the use cases for Spark notebooks
    • Create a Spark notebook in Azure Synapse Analytics
    • Understand the supported languages in Spark notebooks
    • Develop Spark notebooks
    • Run Spark notebooks
    • Load data in Spark notebooks
    • Save Spark notebooks
  • Module 3: Transform data with DataFrames in Apache Spark Pools in Azure Synapse Analytics
  • After completing this module, you will be able to:

    • Understand DataFrames in Spark pools in Azure Synapse Analytics
    • Load data into a Spark DataFrame
    • Create a Spark table
    • Read data from and write data to a storage account
    • Load a streaming DataFrame into Apache Spark
    • Flatten nested structures and explode arrays with Apache Spark
  • Module 4: Integrate SQL and Apache Spark pools in Azure Synapse Analytics
  • After completing this module, you will be able to:

    • Describe the integration methods between SQL and Spark pools in Azure Synapse Analytics
    • Understand the use cases for SQL and Spark pool integration
    • Authenticate in Azure Synapse Analytics
    • Transfer data between SQL and Spark pools in Azure Synapse Analytics
    • Authenticate between Spark and SQL pools in Azure Synapse Analytics
    • Integrate SQL and Spark pools in Azure Synapse Analytics
    • Externalize the use of Spark pools within the Azure Synapse workspace
    • Transfer data outside the Synapse workspace using SQL Authentication
    • Transfer data outside the Synapse workspace using the PySpark Connector
    • Transform data in Apache Spark and write back to a SQL pool in Azure Synapse Analytics
  • Module 5: Monitor and manage data engineering workloads with Apache Spark in Azure Synapse Analytics
  • After completing this module, you will be able to:

    • Monitor Spark pools in Azure Synapse Analytics
    • Understand resource utilization of Spark pools in Azure Synapse Analytics
    • Monitor query activity of Spark pools in Azure Synapse Analytics
    • Baseline Apache Spark performance with Apache Spark History Server in Azure Synapse Analytics
    • Optimize Apache Spark jobs in Azure Synapse Analytics
    • Automate scaling of Apache Spark pools in Azure Synapse Analytics

Syllabus

  • Module 1: Understand big data engineering with Apache Spark in Azure Synapse Analytics
    • Introduction
    • What is an Apache Spark pool in Azure Synapse Analytics
    • How do Apache Spark pools work in Azure Synapse Analytics
    • When do you use Apache Spark pools in Azure Synapse Analytics
    • Knowledge check
    • Summary
  • Module 2: Ingest data with Apache Spark notebooks in Azure Synapse Analytics
    • Introduction
    • Introduction to Spark notebooks
    • Understand the use cases for Spark notebooks
    • Exercise: Create a Spark notebook in Azure Synapse Analytics
    • Discover supported languages in Spark notebooks
    • Develop Spark notebooks
    • Exercise: Develop Spark notebooks
    • Run Spark notebooks
    • Exercise: Run Spark notebooks
    • Load data in Spark notebooks
    • Exercise: Load data in Spark notebooks
    • Save Spark notebooks
    • Knowledge check
    • Summary
  • Module 3: Transform data with DataFrames in Apache Spark Pools in Azure Synapse Analytics
    • Introduction
    • Introduction to DataFrames in Spark pools in Azure Synapse Analytics
    • Load data into a Spark DataFrame
    • Exercise: Load data into a Spark DataFrame
    • Exercise: Create a Spark table
    • Flatten nested structures and explode arrays with Apache Spark
    • Exercise: Flatten nested structures and explode arrays with Apache Spark in Synapse
    • Knowledge check
    • Summary
  • Module 4: Integrate SQL and Apache Spark pools in Azure Synapse Analytics
    • Introduction
    • Describe the integration methods between SQL and Spark pools in Azure Synapse Analytics
    • Understand the use cases for SQL and Spark pool integration
    • Authenticate in Azure Synapse Analytics
    • Transfer data between SQL and Spark pools in Azure Synapse Analytics
    • Authenticate between Spark and SQL pools in Azure Synapse Analytics
    • Exercise: Integrate SQL and Spark pools in Azure Synapse Analytics
    • Externalize the use of Spark pools within the Azure Synapse workspace
    • Transfer data outside the Synapse workspace using the PySpark connector
    • Knowledge check
    • Summary
  • Module 5: Monitor and manage data engineering workloads with Apache Spark in Azure Synapse Analytics
    • Introduction
    • Monitor Spark pools in Azure Synapse Analytics
    • Baseline Apache Spark performance with Apache Spark History Server in Azure Synapse Analytics
    • Optimize Apache Spark jobs in Azure Synapse Analytics
    • Automate scaling of Apache Spark pools in Azure Synapse Analytics
    • Knowledge check
    • Summary
