Maxtrain.com - [email protected] - 513-322-8888 - 866-595-6863


Fundamentals of Apache Hadoop


Apache Hadoop is an open-source software framework, written in Java, for distributed storage and distributed processing of very large data sets (Big Data) on computer clusters built from commodity hardware. All Hadoop modules are designed on the assumption that hardware failures (of individual machines or of whole racks) are commonplace and should therefore be handled automatically in software by the framework.

What is covered in this course:

  • Hadoop Distributed File System (HDFS)
  • HDFS Operations
  • HDFS Management
  • MapReduce
  • MapReduce Types & Formats
  • Counters
  • Hadoop Administration
  • Apache Pig Installation & Configuration
  • Hands on Pig
  • Apache Hive Installation & Configuration
  • Hands on Hive
  • Apache HBase
  • Hands on HBase
  • Apache Zookeeper
  • Apache SQOOP
  • Apache Flume
  • Cloudera
  • Hortonworks


Hadoop Installation:
  • Download and install the JDK & JRE
  • Download and Install Apache Hadoop
  • Add a Path to the profile
  • Configure SSH
  • Configure Common HDFS and MapReduce Configurations
  • Format NameNode and Launch Hadoop Daemons
HDFS Operations:
  • Copy a File from local to HDFS
  • List Files and Directories in HDFS
  • Copy a File from HDFS to Local
  • Display a File's Contents with cat
  • Remove a File or Directory from HDFS
  • Using Administrative Tools
  • Access NameNode via Web User Interface
Managing HDFS:
  • Sourcing data from various Locations
  • Using Hadoop Archives
  • Parallel Copying with distcp
  • HDFS Upgrade Process
  • Configuring Rack Awareness
MapReduce:
  • Installing Eclipse
  • Creating a Mapper Class
  • Creating a Reducer Class
  • Creating a Driver Class
  • Packaging a jar and Running MapReduce
  • Accessing the JobTracker via Web User Interface
MapReduce Types & Formats:
  • Running a Default MapReduce Job
  • Default Mapper
  • Default Partitioner
  • Default Reducer
  • Running a Streaming MapReduce Job
Counters:
  • Understanding Counters
  • Writing User-Defined Counters
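
As a rough illustration of the default mapper, partitioner, reducer and counter topics listed above, here is a minimal single-process sketch in Python. It is not Hadoop code and not course material: it only imitates the documented default behavior (identity map, hash partitioning, identity reduce). The names `run_job`, `NUM_REDUCERS` and the counter `RECORDS_MAPPED` are hypothetical and chosen for illustration.

```python
from collections import Counter, defaultdict

NUM_REDUCERS = 2  # illustrative value, not taken from the course outline

# Stand-in for user-defined job counters (the counter name is hypothetical).
counters = Counter()


def identity_mapper(key, value):
    """Emit the input record unchanged, as a default mapper does."""
    counters["RECORDS_MAPPED"] += 1
    yield key, value


def hash_partitioner(key, num_reducers):
    """Choose a reducer from the key's hash, as a hash partitioner does."""
    return hash(key) % num_reducers


def identity_reducer(key, values):
    """Emit every value for the key unchanged, as a default reducer does."""
    for value in values:
        yield key, value


def run_job(records, num_reducers=NUM_REDUCERS):
    # Map phase plus shuffle: group mapped values by key within each partition.
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in records:
        for out_key, out_value in identity_mapper(key, value):
            index = hash_partitioner(out_key, num_reducers)
            partitions[index][out_key].append(out_value)
    # Reduce phase: each partition handles its keys in sorted order.
    output = []
    for partition in partitions:
        for key in sorted(partition):
            output.extend(identity_reducer(key, partition[key]))
    return output


records = [(0, "hadoop"), (7, "hdfs"), (12, "pig")]
result = run_job(records)
print(result)
print(dict(counters))
```

With integer keys, the even keys land in the first partition and the odd key in the second, so every record passes through unchanged but the output is grouped by partition rather than kept in input order.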
Hadoop Administration:
  • Finding Logs
  • Directory structures of HDFS Components
  • Commissioning & Decommissioning slave nodes
  • Optimizing configuration settings
  • Using Teragen to generate data sets
  • Using Terasort to Benchmark Hadoop Cluster
Apache Pig Installation & Configuration:
  • Downloading Apache Pig
  • Installing Apache Pig
  • Configuring Apache Pig
  • Starting Pig in Local Mode
  • Starting Pig in MapReduce Mode
  • Running a Pig Script
Hands on Pig:
  • Loading & Storing
  • Filtering & Transforming
  • Grouping & Sorting
  • Combining & Splitting
  • Writing User Defined Functions
  • Using Diagnostic Operations
Apache Hive Installation & Configuration:
  • Downloading Apache Hive
  • Installing Apache Hive
  • Configuring Apache Hive
  • Creating a Table in Hive
  • Loading data into the Table
  • Running HiveQL Statements
Hands on Hive:
  • Creating Tables (Managed & External)
  • Using Partitions
  • Creating Views
  • Creating Indexes
  • Writing a Hive UDF
Apache HBase:
  • Downloading Apache HBase
  • Installing Apache HBase
  • Configuring Apache HBase
  • Creating a Table in Apache HBase
Hands on HBase:
  • Installing HBase in Fully Distributed Mode
  • Creating a Table in HBase using HBase Shell
  • Loading Data in HBase using Pig
  • Running Hive Queries on HBase Tables
  • Using REST Server
Apache Zookeeper:
  • Downloading Apache Zookeeper
  • Installing Apache Zookeeper
  • Configuring Apache Zookeeper
  • Using the Zookeeper CLI to Perform Operations
Apache SQOOP:
  • Downloading Apache SQOOP
  • Installing Apache SQOOP
  • Configuring Apache SQOOP
  • Downloading MySQL Connector for SQOOP
  • Importing Data from RDBMS to HDFS and Hive
  • Exporting Data from HDFS to RDBMS
Apache Flume:
  • Downloading Apache Flume
  • Installing Apache Flume
  • Configuring Apache Flume
  • Setting up Twitter Developer Accounts for API Keys
  • Setting the .conf file to stream data to HDFS
  • Streaming Twitter data to HDFS
Cloudera:
  • Download & Install VMWare Player on Windows
  • Download Cloudera CDH VM
  • Load Cloudera CDH using VMWare Player
  • Using Cloudera Manager
  • Using Cloudera HUE
  • Exploring Cloudera CDH VM
Hortonworks:
  • Download the HDP 2.1 Sandbox
  • Load HDP 2.1 Sandbox using VMWare Player
  • Getting Started with HDP 2.1 Sandbox
  • Using Apache Ambari


Prerequisites: Basic Linux knowledge and some knowledge of traditional database systems.


Audience: Developers and administrators interested in working with Big Data and the Big Data ecosystem.
$3000.00 List Price

5-Day Course
