Hadoop: how to create a single-node cluster using Docker

INTRODUCTION

Hadoop is a collection of open-source software utilities that uses a cluster of computers to solve problems involving massive amounts of data (big data).

The Apache Hadoop framework is composed of the following modules:

  • Common
  • HDFS
  • YARN
  • MapReduce

All these modules are included in a single Docker image (for academic use only) created by sequenceiq.

INSTRUCTIONS

Requirements: Docker CE

The following steps will help you create a single-node cluster on your computer.

First, pull the image from its Docker Hub repository:

docker pull sequenceiq/hadoop-docker:2.7.1

Now you can create a Docker container named hadoop-local:

docker run --name hadoop-local -d -t -i \
    -p 50010:50010 -p 50020:50020 -p 50070:50070 -p 50075:50075 -p 50090:50090 \
    -p 8020:8020 -p 9000:9000 -p 19888:19888 -p 8030:8030 -p 8031:8031 \
    -p 8032:8032 -p 8033:8033 -p 8040:8040 -p 8042:8042 -p 8088:8088 -p 49707:49707 \
    -p 2122:2122 \
    sequenceiq/hadoop-docker:2.7.1 /etc/bootstrap.sh -bash

The run command publishes the following ports:
  • HDFS
    • 50010
    • 50020 
    • 50070 (web-ui)
    • 50075 
    • 50090 
    • 8020 
    • 9000
  • MAPREDUCE
    • 19888
  • YARN
    • 8030
    • 8031 
    • 8032 
    • 8033
    • 8040 
    • 8042 
    • 8088 (web-ui)
  • OTHERS
    • 49707
    • 2122
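
The port list above maps one-to-one onto the -p flags of the run command, so the flags can be generated instead of typed by hand. The following is only a minimal sketch and not part of the image's own instructions; the PORTS variable and the publish_flags helper are hypothetical names introduced here for illustration.

```shell
# Ports copied from the list above.
PORTS="50010 50020 50070 50075 50090 8020 9000 19888 8030 8031 8032 8033 8040 8042 8088 49707 2122"

# Hypothetical helper: print one "-p PORT:PORT" flag per argument, space-separated.
publish_flags() {
    flags=""
    for port in "$@"; do
        flags="$flags -p $port:$port"
    done
    # Drop the leading space before printing.
    printf '%s\n' "${flags# }"
}

# Usage (the unquoted expansions are intentional, so the shell splits them into words):
# docker run --name hadoop-local -d -t -i $(publish_flags $PORTS) \
#     sequenceiq/hadoop-docker:2.7.1 /etc/bootstrap.sh -bash
```

Once the container is running, docker port hadoop-local lists the port mappings that are actually in effect.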

EXAMPLE: COPY A FILE INTO HDFS

As an example, we will copy a file into HDFS and then view it in the web UI.

Step 1

Copy the customers.csv file into the /tmp directory inside the Docker container:

docker cp customers.csv hadoop-local:/tmp

Step 2

Copy the customers.csv file from the container into HDFS:

docker exec -t hadoop-local /usr/local/hadoop/bin/hdfs dfs -put /tmp/customers.csv /home/user/customers.csv

Note: the /home/user directory must already exist in HDFS.

Now our file is in HDFS!
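
If Step 2 fails because /home/user does not exist yet, the directory can be created first with hdfs dfs -mkdir -p. The sketch below chains the three operations (create the directory, upload, list), assuming the container name and paths used in this post; hdfs_dfs and prepare_and_put are hypothetical helper names, not commands shipped with the image.

```shell
# Hypothetical helper: run an "hdfs dfs" subcommand inside the hadoop-local container.
hdfs_dfs() {
    docker exec -t hadoop-local /usr/local/hadoop/bin/hdfs dfs "$@"
}

# Create the target directory (with any missing parents), upload the file,
# then list the directory to confirm the upload. Stops at the first failure.
prepare_and_put() {
    hdfs_dfs -mkdir -p /home/user &&
    hdfs_dfs -put /tmp/customers.csv /home/user/customers.csv &&
    hdfs_dfs -ls /home/user
}

# Usage, after Step 1:
# prepare_and_put
```

Note that -put refuses to overwrite an existing file, so running the upload a second time reports an error unless the file is removed first or -put -f is used.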

WEB-UI

The academic Docker image also provides two web user interfaces:
  • YARN
    • http://yourip:8088
  • HDFS
    • http://yourip:50070
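
To check from the command line that both interfaces respond, something like the following sketch may help. It assumes curl is installed on the host; http_status and check_web_uis are hypothetical helper names, and localhost is only a default that should be replaced with your own IP where needed.

```shell
# Hypothetical helper: print only the HTTP status code returned by a URL.
http_status() {
    curl -s -o /dev/null -w '%{http_code}' "$1"
}

# Print "PORT STATUS" for the YARN (8088) and HDFS (50070) web UIs.
# The host defaults to localhost (an assumption; pass your IP as the first argument).
check_web_uis() {
    host="${1:-localhost}"
    for port in 8088 50070; do
        echo "$port $(http_status "http://$host:$port")"
    done
}

# Usage:
# check_web_uis            # checks localhost
# check_web_uis yourip     # checks another host
```

A 200 status, or a redirect code such as 302, suggests the interface is up; curl prints 000 when it cannot connect at all.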

STOP AND REMOVE

To stop and remove the Docker container, use the following commands:

docker stop hadoop-local
docker rm hadoop-local

Note: All your data inside HDFS will be lost!
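
Since removing the container discards everything stored in HDFS, it can be worth copying important files back to the host first. The following is a minimal sketch, assuming the container name and binary path used in this post; backup_from_hdfs is a hypothetical helper, and the /tmp staging path with a .bak suffix is an arbitrary choice.

```shell
# Hypothetical helper: copy one HDFS file to the current directory on the host.
# It first stages the file in the container's /tmp, then copies it out with docker cp.
backup_from_hdfs() {
    hdfs_path="$1"
    name=$(basename "$hdfs_path")
    docker exec -t hadoop-local /usr/local/hadoop/bin/hdfs dfs -get "$hdfs_path" "/tmp/$name.bak" &&
    docker cp "hadoop-local:/tmp/$name.bak" "./$name"
}

# Usage, before running docker stop:
# backup_from_hdfs /home/user/customers.csv
```

The -get step fails if the staging file already exists inside the container, so a repeated backup of the same file needs the old staging copy removed first.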
