Hadoop Developer In Real World

Free Cluster Access * HDFS * MapReduce * YARN * Pig * Hive * Flume * Sqoop * AWS * EMR * Optimization * Troubleshooting
File size: 10.93 GB
Total length: 20h 22m

What you’ll learn

Understand what Big Data is, the challenges it poses, and how Hadoop proposes a solution to the Big Data problem
Work with and navigate a Hadoop cluster with ease
Install and configure a Hadoop cluster on cloud services like Amazon Web Services (AWS)
Understand the different phases of MapReduce in detail
Write optimized Pig Latin instructions to perform complex data analysis
Write optimized Hive queries to perform data analysis on simple and nested datasets
Work with file formats like SequenceFile, Avro, etc.
Understand Hadoop architecture, Single Points of Failure (SPOF), Secondary/Checkpoint/Backup nodes, HA configuration and YARN
Tune and optimize slow-running MapReduce jobs, Pig instructions and Hive queries
Understand how joins work behind the scenes and write optimized join statements
Wherever possible, students will be introduced to difficult questions that are asked in real Hadoop interviews

Requirements


Although you don’t have to be an expert in Java, basic knowledge of Java programming is required, as we will be looking at programs in Java.
Familiarity with basic Linux commands


From the creators of the successful Hadoop Starter Kit course hosted on Udemy comes the Hadoop In Real World course. This course is designed for anyone who aspires to a career as a Hadoop developer. It covers all the concepts that every aspiring Hadoop developer must know to SURVIVE in REAL WORLD Hadoop environments.

The course covers all the must-know topics, such as HDFS, MapReduce, YARN, Apache Pig and Hive, and explores each of them in depth. We don’t stop at the easy concepts; we take it a step further and cover important and complex topics like file formats, custom Writables, input/output formats, troubleshooting and optimizations. All concepts are backed by interesting hands-on projects, such as analyzing the Million Song Dataset to find less familiar artists with hot songs, ranking pages with page dumps from Wikipedia, and simulating the mutual friends functionality in Facebook, to name just a few.


Section 1: Thank You and Let’s Get Started

Lecture 1 Course Structure

Lecture 2 Tools & Setup (Windows)

Lecture 3 Tools & Setup (Linux)

Section 2: Introduction To Big Data

Lecture 4 What is Big Data?

Lecture 5 Understanding Big Data Problem

Lecture 6 History of Hadoop

Section 3: HDFS

Lecture 7 HDFS – Why Another Filesystem?

Lecture 8 Blocks

Lecture 9 Working With HDFS

Lecture 10 HDFS – Read & Write

Lecture 11 HDFS – Read & Write (Program)

Lecture 12 HDFS Assignment

Section 4: MapReduce

Lecture 13 Introduction to MapReduce

Lecture 14 Dissecting MapReduce Components

Lecture 15 Dissecting MapReduce Program (Part 1)

Lecture 16 Dissecting MapReduce Program (Part 2)

Lecture 17 Combiner

Lecture 18 Counters

Lecture 19 Facebook – Mutual Friends

Lecture 20 New York Times – Time Machine

Lecture 21 MapReduce Assignment

Section 5: Apache Pig

Lecture 22 Introduction to Apache Pig

Lecture 23 Loading & Projecting Datasets

Lecture 24 Solving a Problem

Lecture 25 Complex Types

Lecture 26 Pig Latin – Joins

Lecture 27 Million Song Dataset (Part 1)

Lecture 28 Million Song Dataset (Part 2)

Lecture 29 Page Ranking (Part 1)

Lecture 30 Page Ranking (Part 2)

Lecture 31 Page Ranking (Part 3)

Lecture 32 Apache Pig Assignment

Section 6: Apache Hive

Lecture 33 Introduction to Apache Hive

Lecture 34 Dissect a Hive Table

Lecture 35 Loading Hive Tables

Lecture 36 Simple Selects

Lecture 37 Managed Table vs. External Table

Lecture 38 Order By vs. Sort By vs. Cluster By

Lecture 39 Partitions

Lecture 40 Buckets

Lecture 41 Hive QL – Joins

Lecture 42 Twitter (Part 1)

Lecture 43 Twitter (Part 2)

Lecture 44 Apache Hive Assignment

Section 7: Hive Window and Analytical Functions

Lecture 45 Introduction to Hive Window and Analytical functions

Lecture 46 Kickstarter campaign duplicates and top campaigns

Lecture 47 Kickstarter campaign bands and user sessions

Section 8: Architecture

Lecture 48 HDFS Architecture

Lecture 49 Secondary Namenode

Lecture 50 Highly Available Hadoop

Lecture 51 MRv1 Architecture

Lecture 52 YARN

Section 9: Cluster Setup

Lecture 53 Vendors & Hosting

Lecture 54 Cluster Setup (Part 1)

Lecture 55 Cluster Setup (Part 2)

Lecture 56 Cluster Setup (Part 3)

Lecture 57 Amazon EMR

Section 10: Hadoop Administrator In Real World (Preview)

Lecture 58 Cloudera Manager – Introduction

Lecture 59 Cloudera Manager – Installation

Section 11: File Formats

Lecture 60 Compression

Lecture 61 Sequence File

Lecture 62 AVRO

Lecture 63 File Formats – Pig

Lecture 64 File Formats – Hive

Lecture 65 Introduction to RCFile

Lecture 66 Working with RCFile

Lecture 67 Introduction to ORC

Lecture 68 Working with ORC

Lecture 69 Parquet – Another Columnar Format

Lecture 70 Avro Schema and Its Importance

Lecture 71 Schema Evolution in Avro (Part 1)

Lecture 72 Schema Evolution in Avro (Part 2)

Section 12: Troubleshooting and Optimizations

Lecture 73 Exploring Logs

Lecture 74 MRUnit

Lecture 75 MapReduce Tuning

Lecture 76 Pig Join Optimizations (Part 1)

Lecture 77 Pig Join Optimizations (Part 2)

Lecture 78 Hive Join Optimizations

Section 13: Apache Sqoop

Lecture 79 Sqoop Imports

Lecture 80 Sqoop – File Formats

Lecture 81 Jobs & Incremental Imports

Lecture 82 Hive – Exports

Section 14: Apache Flume

Lecture 83 Introduction to Flume

Lecture 84 Replication

Lecture 85 Consolidation & Multiplexing

Lecture 86 Streaming Twitter with Flume

Section 15: Kafka

Lecture 87 Kafka – The Why & the What?

Lecture 88 Kafka Concepts

Lecture 89 Tolerating Failures – Producers & Consumers

Lecture 90 Tolerating Failures – Brokers

Lecture 91 Kafka Installation

Lecture 92 Experiments with Kafka

Lecture 93 Streaming Meetup with Kafka (Part-1)

Lecture 94 Streaming Meetup with Kafka (Part-2)

Lecture 95 Writing production-ready Kafka application

Lecture 96 Schema management with Kafka Schema Registry

Lecture 97 Schema evolution with Kafka Schema Registry

Section 16: Bonus

Lecture 98 Preparing For Hadoop Interviews

This course is for anyone who aspires to a career as a Hadoop developer
This course is for anyone who wants to learn and understand Hadoop and Big Data in depth

Course Information:

Udemy | English | 20h 22m | 10.93 GB
Created by: Hadoop In Real World
