Mastering AWS Elastic Map Reduce (EMR) for Data Engineers
What you’ll learn
Creating Clusters using AWS Elastic Map Reduce Web Console
Setup Remote Application Development using AWS Elastic Map Reduce (EMR) and Visual Studio Code
Develop and Validate Simple Spark Application using Visual Studio Code and AWS Elastic Map Reduce (EMR)
Deploy Spark Application as Step to AWS Elastic Map Reduce (EMR)
Manage AWS Elastic Map Reduce (EMR) based Pipelines using Boto3 and Python
Build End to End AWS Elastic Map Reduce (EMR) based Pipelines using AWS Step Functions
Develop Applications using Spark SQL on AWS EMR Cluster
Build State Machine or Pipeline using AWS Step Functions to run Spark SQL Script on AWS EMR Cluster
Understand how to pass parameters to Spark SQL Scripts deployed on EMR
Requirements
A computer science or IT Degree or 1 or 2 years of IT Experience
Basic Linux Skills with ability to run commands using Terminal
Programming Skills using Python are required
Valid AWS Account to use the AWS Services to learn how to build Data Pipelines using AWS Lambda Functions
Description
AWS Elastic Map Reduce (EMR) is one of the key AWS Services used to build large-scale data processing solutions with Big Data Technologies such as Apache Hadoop, Apache Spark, and Hive. As part of this course, you will learn AWS Elastic Map Reduce (EMR) by building end-to-end data pipelines using Apache Spark and AWS Step Functions.
Here is the detailed outline of the course.
First, you will learn how to Get Started with AWS Elastic Map Reduce (EMR) by using the AWS Web Console to create and manage EMR Clusters. You will also learn the key features of the Web Console, how to connect to the master node of the cluster, and how to validate the important command-line interfaces such as spark-shell, pyspark, and hive, as well as the hdfs and aws CLI commands.
Once you understand how to get started with AWS EMR, you will go through the details of Setting up a Development Cluster using AWS EMR. There are quite a few advantages to using AWS EMR Clusters for development purposes, and most enterprises do so.
After setting up a development cluster using AWS EMR, you will go through the Development Life Cycle of Spark Applications using the AWS EMR Development Cluster, working with Visual Studio Code Remote Development on top of that cluster.
Once the development is done, you will go through the details of Deploying Spark Applications on an AWS EMR Cluster. You will build the zip file and learn how to run it using the CLI in both client and cluster deployment modes. You will also learn how to deploy the Spark application as a step on AWS EMR Clusters, and how to troubleshoot issues in Spark Applications by going through the relevant logs.
Typically, we run Spark Applications programmatically. After going through the details of deploying Spark applications on AWS EMR Clusters, you will learn how to Manage AWS EMR Clusters using Python Boto3. You will learn not only how to create clusters programmatically but also how to deploy Spark Applications as Steps programmatically using Python Boto3.
End-to-end Data Pipelines using AWS EMR are built using AWS Step Functions. Once you understand how to manage EMR Clusters and deploy Spark Applications on them using Python Boto3, it is important to learn how to Build EMR-based Workflows or Pipelines using AWS Step Functions. You will learn how to create the cluster, deploy a Spark Application as a Step onto the cluster, and then terminate the cluster as part of a basic pipeline or State Machine using AWS Step Functions.
You will also learn how to perform validations as part of State Machines by Enhancing the AWS EMR-based State Machine or Pipeline. As part of the validations, you will check whether the specified files already exist.
We can also build Data Processing Applications or Pipelines using Spark SQL on AWS EMR. First, you will learn how to design and develop solutions using a Spark SQL Script and how to validate them by running the appropriate commands with the relevant runtime arguments.
Once you understand the development process for implementing solutions using Spark SQL on AWS EMR, you will learn how to build a Data Pipeline using AWS Step Functions that deploys the Spark SQL Script on an EMR Cluster. You will also learn the concept of Boto3 Waiters, which make sure the steps are executed one after another.
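As an illustration of the Boto3 Waiters idea mentioned in the description, here is a minimal sketch, not taken from the course material. It assumes `emr_client` is an EMR client created elsewhere (for example with `boto3.client("emr")`); the helper names `run_steps_in_order` and `terminate_and_wait` and the default polling values are hypothetical. It relies on the `step_complete` and `cluster_terminated` waiters that the boto3 EMR client provides.

```python
def run_steps_in_order(emr_client, cluster_id, steps, delay=30, max_attempts=120):
    """Submit EMR steps one at a time, waiting for each to complete.

    `steps` is a list of step definitions in the shape expected by
    add_job_flow_steps. The delay and max_attempts defaults are
    illustrative assumptions, not recommended values.
    """
    waiter = emr_client.get_waiter("step_complete")
    step_ids = []
    for step in steps:
        response = emr_client.add_job_flow_steps(JobFlowId=cluster_id, Steps=[step])
        step_id = response["StepIds"][0]
        # Block until this step finishes before submitting the next one.
        waiter.wait(
            ClusterId=cluster_id,
            StepId=step_id,
            WaiterConfig={"Delay": delay, "MaxAttempts": max_attempts},
        )
        step_ids.append(step_id)
    return step_ids


def terminate_and_wait(emr_client, cluster_id):
    """Request cluster termination and block until it has terminated."""
    emr_client.terminate_job_flows(JobFlowIds=[cluster_id])
    emr_client.get_waiter("cluster_terminated").wait(ClusterId=cluster_id)
```

A waiter polls the service until the resource reaches the expected state and raises an error if the step fails or the attempts run out, so the caller may want to wrap these calls in its own error handling.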
Overview
Section 1: Introduction to Mastering AWS Elastic Map Reduce for Data Engineers
Lecture 1 Introduction to Mastering AWS Elastic Map Reduce for Data Engineers
Section 2: Getting Started on Windows with Required Tools
Lecture 2 Overview of PowerShell on Windows 10 or Windows 11
Lecture 3 Install Visual Studio Code on Windows
Lecture 4 Install Remote Development Extension Kit for Visual Studio Code
Section 3: Getting Started with AWS EMR
Lecture 5 Planning of EMR Cluster
Lecture 6 Create EC2 Key Pair
Lecture 7 Setup EMR Cluster with Spark
Lecture 8 Understanding Summary of AWS EMR Cluster
Lecture 9 Review EMR Cluster Application User Interfaces
Lecture 10 Review EMR Cluster Monitoring
Lecture 11 Review EMR Cluster Hardware and Cluster Scaling Policy
Lecture 12 Review EMR Cluster Configurations
Lecture 13 Review EMR Cluster Events
Lecture 14 Review EMR Cluster Steps
Lecture 15 Review EMR Cluster Bootstrap Actions
Lecture 16 Connecting to EMR Master Node using SSH
Lecture 17 Disabling Termination Protection and Terminating the Cluster
Lecture 18 Clone and Create New Cluster
Lecture 19 Listing AWS S3 Buckets and Objects using AWS CLI on EMR Cluster
Lecture 20 Listing AWS S3 Buckets and Objects using HDFS CLI on EMR Cluster
Lecture 21 Managing Files in AWS s3 using HDFS CLI on EMR Cluster
Lecture 22 Review Glue Catalog Databases and Tables
Lecture 23 Accessing Glue Catalog Databases and Tables using EMR Cluster
Lecture 24 Accessing spark-sql CLI of AWS EMR Cluster
Lecture 25 Accessing pyspark CLI of AWS EMR Cluster
Lecture 26 Accessing spark-shell CLI of AWS EMR Cluster
Lecture 27 Create AWS EMR Cluster for Notebooks
Section 4: Setup Development Cluster using AWS EMR
Lecture 28 Create bootstrap script for AWS EMR Cluster
Lecture 29 Provision Elastic IP for Master Node of AWS EMR Cluster
Lecture 30 Create AWS EMR for Development
Lecture 31 Troubleshooting Issues related to Bootstrap of EMR Cluster
Lecture 32 Fix Bootstrap Script for AWS EMR Cluster
Lecture 33 Validate AWS EMR Cluster with Bootstrap Action with updated script
Lecture 34 Setup Python Virtual Environment as part of VS Code Workspace
Lecture 35 Getting Started with Boto3 to Manage AWS EMR Clusters
Lecture 36 Setup boto3 to explore APIs to manage AWS EMR Clusters
Lecture 37 Set AWS Profile using env file in Visual Studio Code
Lecture 38 Get Cluster Details of AWS EMR Development Cluster using boto3
Lecture 39 Getting Instance Id of the Master Node of AWS EMR Cluster using boto3
Lecture 40 Getting Allocation Id of the Elastic Ip using AWS boto3
Lecture 41 Associating Elastic Ip with AWS EMR Master Node using Boto3
Lecture 42 Setup Notebook Environment for EMR Cluster using IAM User
Section 5: Development Life Cycle using AWS EMR Development Cluster
Lecture 43 Open Remote Window on AWS EMR Master Node using VS Code
Lecture 44 Setup Workspace on AWS EMR Master using Git Repository
Lecture 45 Best Practices and Advantages of using AWS EMR Cluster for Team Development
Lecture 46 Install VSCode Extensions in remote Workspace for Python
Lecture 47 Review Python and Pyspark details on EMR Cluster
Lecture 48 Running Applications using local and yarn during development
Lecture 49 Getting Started with Development of Spark Applications on EMR Cluster
Lecture 50 Create Function for Spark Session
Lecture 51 Upload Files to AWS s3 for the development using AWS EMR Cluster
Lecture 52 Develop read logic for the Spark Application
Lecture 53 Process Data Frame using Spark APIs
Lecture 54 Write Data to Files using Spark APIs
Lecture 55 Productionize the Code and setup required data sets for validation
Lecture 56 Resize the AWS EMR Cluster using Web Console
Lecture 57 Validate Changes to productionize the Application Code
Lecture 58 Take the backup and terminate the cluster
Section 6: Deploy Spark Application on AWS EMR Cluster
Lecture 59 Recreate the AWS EMR Cluster to deploy Spark Applications
Lecture 60 Setup Code Repository on the AWS EMR Master Node
Lecture 61 Resize the AWS EMR Cluster to validate application on larger data sets
Lecture 62 Build Zip File for the Spark Application
Lecture 63 Validate the Spark Application using zip file and client as deploy mode
Lecture 64 Run Spark Application on EMR using Cluster Deployment Mode
Lecture 65 Run Spark Application copied to s3 on EMR using Cluster Deployment Mode
Lecture 66 Deploy Spark Application as Step to the AWS EMR Cluster
Lecture 67 Setup Multiple Files to Manage AWS s3 Objects using State Machines
Lecture 68 Validate Spark Application Deployed as Step on AWS EMR Cluster
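The deployment flow covered in this section (zip file, client and cluster deploy modes, EMR Step) could be sketched roughly as follows. This is an illustrative sketch under assumptions, not the course's code: the file names `app.py` and `app.zip`, the bucket `my-bucket`, and the helper names are hypothetical. It only builds the `spark-submit` argument list and wraps it in the step structure that EMR runs through `command-runner.jar`.

```python
def build_spark_submit(entry_point, py_files, deploy_mode="cluster", app_args=()):
    """Return a spark-submit command for YARN as a list of arguments."""
    if deploy_mode not in ("client", "cluster"):
        raise ValueError("deploy_mode must be 'client' or 'cluster'")
    return [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", deploy_mode,
        "--py-files", py_files,  # zip file with the application modules
        entry_point,
        *app_args,
    ]


def as_emr_step(name, command, action_on_failure="CONTINUE"):
    """Wrap a command in the step definition shape used by EMR."""
    return {
        "Name": name,
        "ActionOnFailure": action_on_failure,
        "HadoopJarStep": {"Jar": "command-runner.jar", "Args": list(command)},
    }


# Client mode: the driver runs on the node where the command is issued.
client_cmd = build_spark_submit("app.py", "app.zip", deploy_mode="client")

# Cluster mode: the driver runs inside the cluster; files are read from S3 here.
cluster_cmd = build_spark_submit(
    "s3://my-bucket/apps/app.py",  # hypothetical location
    "s3://my-bucket/apps/app.zip",  # hypothetical location
    deploy_mode="cluster",
)
step = as_emr_step("Run Spark application", cluster_cmd)
print(" ".join(client_cmd))
```

The same `step` dictionary can be pasted into the console's step form as arguments or passed to an API call that adds steps to a cluster.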
Section 7: Manage AWS EMR Clusters using Python Boto3
Lecture 69 Update Material related to Managing AWS EMR using Boto3
Lecture 70 Create AWS EMR Cluster using AWS CLI Command
Lecture 71 Manage AWS EMR Clusters using AWS CLI Commands
Lecture 72 Overview of AWS boto3 to Manage AWS EMR Clusters
Lecture 73 Overview of Run Job Flow API to create AWS EMR Cluster
Lecture 74 Create AWS EMR Cluster or Job Flow Cluster using AWS Boto3
Lecture 75 Prepare Data Sets to add Spark Application as Step to AWS EMR Cluster
Lecture 76 Add Spark Application as Step to AWS EMR Cluster using Boto3
Lecture 77 Exercise to add Spark Application as Step to EMR Cluster using boto3
Lecture 78 Terminate the AWS EMR Cluster used for adding Steps
Lecture 79 Exercise to Create AWS EMR Cluster with Steps for Spark Application
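As a rough illustration of creating a cluster (job flow) with Boto3, optionally with steps attached at creation time, consider the sketch below. It is not the course's code. The release label, instance types and counts, and the helper names are assumptions chosen for the example; the role names are the EMR default roles, which may differ in your account. `emr_client` is assumed to be created elsewhere with `boto3.client("emr")`.

```python
def build_cluster_request(name, log_uri, steps=None, release_label="emr-6.15.0"):
    """Build keyword arguments for the EMR run_job_flow API call.

    All sizing values below are illustrative assumptions.
    """
    request = {
        "Name": name,
        "LogUri": log_uri,
        "ReleaseLabel": release_label,
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": [
                {"Name": "Master", "InstanceRole": "MASTER",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"Name": "Core", "InstanceRole": "CORE",
                 "InstanceType": "m5.xlarge", "InstanceCount": 2},
            ],
            # Keep the cluster alive only when no steps were supplied up front.
            "KeepJobFlowAliveWhenNoSteps": not steps,
            "TerminationProtected": False,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }
    if steps:
        request["Steps"] = list(steps)
    return request


def create_cluster(emr_client, request):
    """Call run_job_flow and return the new cluster (job flow) id."""
    return emr_client.run_job_flow(**request)["JobFlowId"]
```

With steps supplied, this sketch sets `KeepJobFlowAliveWhenNoSteps` to false, so the cluster is expected to shut down once the steps finish; without steps it stays up until terminated explicitly.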
Section 8: Build EMR based Workflows or Pipelines using AWS Step Functions
Lecture 80 Review of Development Environment for AWS Step Functions and EMR
Lecture 81 Quick Overview of Important Terms of AWS Step Functions
Lecture 82 Getting Started with EMR based Pipeline using AWS Step Functions
Lecture 83 Overview of AWS IAM Role associated with State Machine
Lecture 84 Overview of Creating EMR Cluster using AWS Step Functions
Lecture 85 Parameters to Create EMR Cluster using AWS Step Functions
Lecture 86 Attach Permissions to Step Function Role to Create AWS EMR Cluster
Lecture 87 Add Step to AWS EMR Cluster using AWS Step Function
Lecture 88 Validate Adding Step to AWS EMR Cluster using Step Functions
Lecture 89 Add Action to State Machine to Terminate the AWS EMR Cluster
Lecture 90 Validate the execution of State Machine to run Spark Application on AWS EMR
Lecture 91 Terminate AWS EMR Clusters Created to Validate State Machine
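The basic pipeline described in this section (create the cluster, add the Spark application as a step, terminate the cluster) might look roughly like the following state machine definition. This is a hedged sketch rather than the course's definition: the state names, cluster settings, and the `build_state_machine` helper are hypothetical. The resource ARNs are the Step Functions service integrations for EMR, and the definition is built as a Python dictionary and serialized to JSON.

```python
import json


def build_state_machine(step_args):
    """Return a Step Functions definition for a create/run/terminate pipeline."""
    return {
        "Comment": "Illustrative EMR pipeline: create cluster, run step, terminate",
        "StartAt": "Create EMR Cluster",
        "States": {
            "Create EMR Cluster": {
                "Type": "Task",
                "Resource": "arn:aws:states:::elasticmapreduce:createCluster.sync",
                "Parameters": {
                    "Name": "PipelineCluster",  # hypothetical name
                    "ReleaseLabel": "emr-6.15.0",  # assumed release
                    "Applications": [{"Name": "Spark"}],
                    "ServiceRole": "EMR_DefaultRole",
                    "JobFlowRole": "EMR_EC2_DefaultRole",
                    "Instances": {
                        "KeepJobFlowAliveWhenNoSteps": True,
                        "InstanceGroups": [
                            {"Name": "Master", "InstanceRole": "MASTER",
                             "InstanceType": "m5.xlarge", "InstanceCount": 1},
                            {"Name": "Core", "InstanceRole": "CORE",
                             "InstanceType": "m5.xlarge", "InstanceCount": 2},
                        ],
                    },
                },
                "ResultPath": "$.cluster",
                "Next": "Run Spark Application",
            },
            "Run Spark Application": {
                "Type": "Task",
                "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
                "Parameters": {
                    # Cluster id produced by the previous state.
                    "ClusterId.$": "$.cluster.ClusterId",
                    "Step": {
                        "Name": "Spark application",
                        "ActionOnFailure": "CONTINUE",
                        "HadoopJarStep": {
                            "Jar": "command-runner.jar",
                            "Args": list(step_args),
                        },
                    },
                },
                "ResultPath": "$.step",
                "Next": "Terminate EMR Cluster",
            },
            "Terminate EMR Cluster": {
                "Type": "Task",
                "Resource": "arn:aws:states:::elasticmapreduce:terminateCluster.sync",
                "Parameters": {"ClusterId.$": "$.cluster.ClusterId"},
                "End": True,
            },
        },
    }


definition = build_state_machine(
    ["spark-submit", "--deploy-mode", "cluster", "s3://my-bucket/apps/app.py"]
)
print(json.dumps(definition, indent=2))
```

In this sketch a failed step fails the execution before the terminate state is reached, so a production pipeline would likely add a `Catch` that routes to the terminate state as well.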
Section 9: Develop State Machine using AWS Step Functions to manage s3
Lecture 92 Review the current state of AWS EMR based Pipeline or State Machine
Lecture 93 Create State Machine using AWS Step Function to Validate s3
Lecture 94 Attach Policy with Permissions on AWS s3 to Step Function Role
Lecture 95 Setup File in AWS s3 and Validate State Machine to list objects
Lecture 96 Relationship between AWS Boto3 and Actions in Step Functions
Lecture 97 Add State to Delete Object from AWS s3
Lecture 98 Fix Permissions and Run State Machine to Delete Object from AWS s3
Lecture 99 Passing Input to States in AWS Step Functions State Machine
Lecture 100 Setup Multiple Files to Manage AWS s3 Objects using State Machines
Lecture 101 Process AWS s3 Objects using Map in State Machine
Lecture 102 Extract Key of AWS s3 Objects using Step Functions Pass
Lecture 103 Add State to AWS Step Function Delete s3 Object
Lecture 104 Develop AWS Lambda Function to customise State Machine Data
Lecture 105 Add AWS Lambda Function to State Machine to Pass s3 Details for delete
Lecture 106 Add Condition to State Machine to avoid Key Error on AWS s3 List Objects
Lecture 107 Overview of Map Concurrency in State Machines of AWS Step Functions
Lecture 108 Invoking AWS Step Function State Machine from Other State Machines
Lecture 109 Overview of integration of s3 based State Machine with EMR State Machine
Section 10: Adding s3 Validation Logic to AWS EMR based State Machine
Lecture 110 Taking back up of AWS Step Functions State Machines
Lecture 111 Grant Permissions between AWS Step Functions State Machines via IAM Role
Lecture 112 Update AWS Step Function State Machine with EMR to validate s3
Lecture 113 Pass EMR Step Details to AWS Step Functions State
Lecture 114 Validate AWS Step Function EMR based State Machine Execution
Lecture 115 Run AWS Step Function State Machine to validate logic to delete AWS s3 Objects
Lecture 116 Exercise to add validation of source s3 location in AWS Step Function StateMach
Lecture 117 Update AWS Step Function State Machine to Validate Source s3 Location
Lecture 118 Run AWS Step Function State Machine with source s3 Validation Logic
Lecture 119 Develop AWS Lambda Function to check number of files in source s3
Lecture 120 Attach Policy to State Machine Role to Invoke AWS Lambda Function
Lecture 121 Run Updated State Machine to validate source count
Lecture 122 Best Practices to Run AWS Step Functions State Machines
Section 11: Develop Applications using Spark SQL on AWS EMR Cluster
Lecture 123 Setup AWS EMR Cluster to develop applications using Spark SQL
Lecture 124 Setup Visual Studio Code Workspace using AWS EMR Master Node
Lecture 125 Update PYTHONPATH to access Pyspark Libraries or Modules on AWS EMR Master Node
Lecture 126 Setup Required Data Sets for Spark SQL
Lecture 127 Upload Retail DB Files to AWS s3 using AWS CLI commands
Lecture 128 Getting Started with Spark SQL and Temporary Views using Spark SQL on AWS EMR C
Lecture 129 Create Spark SQL Temporary Views for Orders and Order Items
Lecture 130 Join and Aggregate using Spark SQL on AWS EMR Cluster
Lecture 131 Write Query Results back to AWS s3 using Spark SQL on AWS EMR Cluster
Lecture 132 Develop Script using Spark SQL Commands
Lecture 133 Parameterize Bucket Name in Spark SQL Script
Lecture 134 Deploy Spark SQL Script in s3 and Run using CLI on AWS EMR Master Node
Lecture 135 Deploy Spark SQL Script as Step on AWS EMR Cluster
Lecture 136 Conclusion to Develop Spark SQL Applications on EMR Cluster
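To illustrate the idea of parameterizing the bucket name in a Spark SQL script, here is a small sketch under stated assumptions. The script text, S3 paths, and helper names are hypothetical and not taken from the course. It assumes the script references the variable as `${bucket_name}` and that the value is supplied to the `spark-sql` CLI with `-d name=value`. The `preview` helper uses Python's `string.Template` only to show locally what the substituted script would look like; it is not how Spark itself performs the substitution.

```python
from string import Template

# Hypothetical script; the path layout is an assumption for illustration.
SCRIPT = """\
CREATE OR REPLACE TEMPORARY VIEW orders
USING json
OPTIONS (path 's3://${bucket_name}/retail_db_json/orders');
"""


def build_spark_sql_command(script_uri, **variables):
    """Return a spark-sql command that runs a script with runtime variables."""
    command = ["spark-sql", "-f", script_uri]
    for key, value in sorted(variables.items()):
        command += ["-d", f"{key}={value}"]
    return command


def preview(script, **variables):
    """Render the script locally to inspect the effect of the variables."""
    return Template(script).substitute(**variables)


command = build_spark_sql_command(
    "s3://my-bucket/scripts/daily_product_revenue.sql",  # hypothetical location
    bucket_name="my-bucket",
)
print(" ".join(command))
print(preview(SCRIPT, bucket_name="my-bucket"))
```

The same argument list can serve as the `Args` of an EMR step that runs through `command-runner.jar`, which is one way the script could be deployed as a step.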
Section 12: Develop AWS Step Function to deploy Spark SQL Script on EMR Cluster
Lecture 137 Create State Machine to Deploy Spark SQL Script on AWS EMR Cluster
Lecture 138 Overview of Managing AWS EMR Clusters using Boto3
Lecture 139 Overview of AWS boto3 to Manage AWS EMR Clusters
Lecture 140 Create AWS EMR Job Flow Cluster using Python Boto3
Lecture 141 Add Spark SQL Script as Step to AWS EMR Cluster using Boto3
Lecture 142 Overview of AWS EMR Waiters using Python Boto3
Lecture 143 Terminate AWS EMR Cluster using waiters and Python Boto3
Lecture 144 Overview of AWS Step Functions State Machine to execute Spark SQL on EMR
Lecture 145 Create State Machine using AWS Step Function to create EMR Cluster
Lecture 146 Grant Permissions to State Machine via Role to Create AWS EMR Cluster
Lecture 147 Add Spark SQL Script as Step to AWS EMR Cluster using AWS Step Functions
Lecture 148 Add Terminate AWS EMR Cluster Step to AWS Step Functions State Machine
Lecture 149 Pass AWS EMR Step Details as Input to State Machine at Execution Time
Lecture 150 Validate Spark SQL Script Execution as AWS EMR Step using State Machine
Who this course is for
University Students who want to learn AWS Elastic Map Reduce to process heavy volumes of data with hands on and real time examples
Aspiring Data Engineers and Data Scientists who want to master building data pipelines using AWS Elastic Map Reduce for large scale Data Processing
Experienced Application Developers who would like to explore how to build end to end Data Pipelines using Python and AWS Services such as AWS Elastic Map Reduce
Experienced Data Engineers to build end to end data pipelines using Python and AWS Elastic Map Reduce
Any IT Professional who is keen to deep dive into AWS Elastic Map Reduce (EMR) for heavy weight Data Processing
Course Information:
Udemy | English | 11h 18m | 5.54 GB
Created by: Durga Viswanatha Raju Gadiraju