Learn Big Data: The Hadoop Ecosystem Masterclass

Master the Hadoop ecosystem using HDFS, MapReduce, Yarn, Pig, Hive, Kafka, HBase, Spark, Knox, Ranger, Ambari, Zookeeper
File size: 2.60 GB
Total length: 5h 58m



Edward Viaene



What you’ll learn

Process Big Data using batch processing
Process Big Data using realtime processing
Be familiar with the technologies in the Hadoop Stack
Be able to install and configure the Hortonworks Data Platform (HDP)

Requirements


You will need a background in IT. The course is aimed at Software Engineers, System Administrators, and DBAs who want to learn about Big Data.
Knowing any programming language will enhance your course experience.
The course contains demos you can try out on your own machine. To run the Hadoop cluster on your own machine, you will need to run a virtual server. 8 GB of RAM or more is recommended.


Important update: Effective January 31, 2021, all Cloudera software will require a valid subscription and will only be accessible via the paywall. The sandbox can still be downloaded, but the full install requires a Cloudera subscription to get access to the yum repository.

In this course you will learn Big Data using the Hadoop Ecosystem. Why Hadoop? It is one of the most sought-after skills in the IT industry. The average salary in the US is $112,000 per year, up to an average of $160,000 in San Francisco (source: Indeed).

The course is aimed at Software Engineers, Database Administrators, and System Administrators who want to learn about Big Data. Other IT professionals can also take this course, but might have to do some extra research to understand some of the concepts.

You will learn how to use the most popular software in the Big Data industry at the moment, using batch processing as well as realtime processing. This course will give you enough background to be able to talk about real problems and solutions with experts in the industry. Adding these technologies to your LinkedIn profile can attract recruiters and help you get interviews at the most prestigious companies in the world.

The course is very practical, with about 6 hours of lectures. You will want to try out everything yourself, which adds multiple hours of learning. If you get stuck with the technology while trying, there is support available. I will answer your messages on the message boards and we have a Facebook group where you can post questions.


Section 1: Introduction

Lecture 1 Course Introduction

Lecture 2 Course Guide

Section 2: What is Big Data and Hadoop

Lecture 3 What is Big Data

Lecture 4 Examples of Big Data

Lecture 5 What is Data Science

Lecture 6 What is Hadoop

Lecture 7 Hadoop Distributions

Section 3: Introduction to Hadoop

Lecture 8 Hadoop Installation

Lecture 9 Demo: Hortonworks Sandbox

Lecture 10 Demo: Hadoop Installation – Part 1

Lecture 11 Demo: Hadoop Installation – Part 2

Lecture 12 Introduction to HDFS

Lecture 13 DataNode Communications

Lecture 14 Demo: HDFS – Part 1

Lecture 15 Demo: HDFS – Part 2 – Using Ambari

Lecture 16 MapReduce WordCount Example

Lecture 17 Demo: MapReduce WordCount

Lecture 18 Lines that span blocks

Lecture 19 Introduction to Yarn

Lecture 20 Demo: Yarn and ResourceManager UI

Lecture 21 Ambari API and Blueprints

Lecture 22 Demo: Ambari API and Blueprints

Lecture 23 ETL Processing in Hadoop

Section 4: Pig

Lecture 24 Introduction to Pig

Lecture 25 Demo: Part 1 – Pig Installation

Lecture 26 Demo: Part 2 – Pig Commands

Lecture 27 Demo: Part 3 – More Pig Commands

Section 5: Apache Spark

Lecture 28 Introduction to Apache Spark

Lecture 29 Spark WordCount

Lecture 30 Demo: Spark installation and WordCount

Lecture 31 RDDs

Lecture 32 Demo: RDD Transformations and Actions

Lecture 33 Overview of RDD Transformations and Actions

Lecture 34 Spark MLLib

Section 6: Hive

Lecture 35 Introduction to Hive

Lecture 36 Hive Queries

Lecture 37 Demo: Hive Installation and Hive Queries

Lecture 38 Hive Partitioning, Buckets, UDFs, and SerDes

Lecture 39 The Stinger Initiative

Lecture 40 Hive in Spark

Section 7: Real Time Processing

Lecture 41 Introduction to Realtime Processing

Section 8: Kafka

Lecture 42 Introduction to Kafka

Lecture 43 Kafka Topics

Lecture 44 Kafka Messages and Log Compaction

Lecture 45 Kafka Use Cases and Usage

Lecture 46 Demo: Kafka Installation and Usage

Section 9: Storm

Lecture 47 Introduction to Storm

Lecture 48 A Storm Topology

Lecture 49 Demo: Storm installation and Example Topology

Lecture 50 Storm Message Processing and Reliability

Lecture 51 Trident

Section 10: Spark Streaming

Lecture 52 Introduction to Spark Streaming

Lecture 53 Spark Streaming Architecture

Lecture 54 Spark Receivers and WordCount Streaming Example

Lecture 55 Demo: Spark Streaming with Kafka

Lecture 56 Spark Streaming State and Checkpointing

Lecture 57 Demo: Stateful Spark Streaming

Lecture 58 More Spark Streaming Features

Section 11: HBase

Lecture 59 Introduction to HBase

Lecture 60 HBase Tables

Lecture 61 The HBase Meta Table

Lecture 62 HBase Writes

Lecture 63 HBase Reads

Lecture 64 Compactions

Lecture 65 Crash Recovery

Lecture 66 Region Splits

Lecture 67 Hotspotting

Lecture 68 Demo: HBase Install

Lecture 69 Demo: HBase Shell

Lecture 70 Demo: Spark HBase

Section 12: Phoenix

Lecture 71 Introduction to Phoenix

Lecture 72 Salting, Compression, and Indexes in Phoenix

Lecture 73 JOINs, VIEWs, and Phoenix in Spark

Lecture 74 Demo: Phoenix

Section 13: Hadoop Security

Lecture 75 Introduction to Kerberos

Lecture 76 Kerberos on Hadoop

Lecture 77 Kerberos Terminology

Lecture 78 Demo: Enabling Kerberos

Lecture 79 Introduction to SPNEGO

Lecture 80 Demo: SPNEGO

Lecture 81 Introduction to Knox

Section 14: Ranger

Lecture 82 Introduction to Ranger

Lecture 83 Demo: Ranger Installation

Lecture 84 Demo: Ranger with Hive

Section 15: HDFS Encryption

Lecture 85 Introduction to HDFS Transparent Encryption

Lecture 86 Demo: HDFS Encryption using Ranger KMS

Section 16: Advanced Topics

Lecture 87 Yarn Schedulers

Lecture 88 Demo: Capacity Scheduler

Lecture 89 Label based scheduling

Lecture 90 Yarn Sizing

Lecture 91 Hive Query Optimizations

Lecture 92 Join Strategies

Lecture 93 Spark Optimizations

Lecture 94 NameNode High Availability

Lecture 95 Demo: NameNode High Availability Setup

Lecture 96 Database High Availability

Section 17: Thank You

Lecture 97 Thank You!

Lecture 98 Bonus Lecture: My Other Courses

This course is for anyone who wants to know how Big Data works and what technologies are involved
The main focus is on the Hadoop ecosystem. We don’t cover any technologies not on the Hortonworks Data Platform Stack
The course compares MapR, Cloudera, and Hortonworks, but we only use the Hortonworks Data Platform (HDP) in the demos

Course Information:

Udemy | English | 5h 58m | 2.60 GB
Created by: Edward Viaene
