Hadoop: how to create a single-node cluster using Docker
INTRODUCTION
Hadoop is a collection of open-source software utilities that uses a cluster of computers to solve problems involving massive amounts of data (Big Data). The Apache Hadoop framework is composed of the following modules:
- Common
- HDFS
- YARN
- MapReduce
All these modules are included in a single Docker image (for academic use only) created by sequenceiq.
INSTRUCTION
Requirements: Docker CE
The following steps will help you create a single-node cluster on your computer.
First pull the image from official repo
docker pull sequenceiq/hadoop-docker:2.7.1
Now you can create a Docker container named hadoop-local:
docker run --name hadoop-local -d -t -i \
-p 50010:50010 -p 50020:50020 -p 50070:50070 -p 50075:50075 -p 50090:50090 \
-p 8020:8020 -p 9000:9000 -p 19888:19888 -p 8030:8030 -p 8031:8031 \
-p 8032:8032 -p 8033:8033 -p 8040:8040 -p 8042:8042 -p 8088:8088 -p 49707:49707 \
-p 2122:2122 \
sequenceiq/hadoop-docker:2.7.1 /etc/bootstrap.sh -bash
The run command exposes the following ports:
- HDFS
- 50010
- 50020
- 50070 (web-ui)
- 50075
- 50090
- 8020
- 9000
- MAP REDUCE
- 19888
- YARN
- 8030
- 8031
- 8032
- 8033
- 8040
- 8042
- 8088 (web-ui)
- OTHERS
- 49707
- 2122
EXAMPLE: COPY A FILE INTO HDFS
As an example, we will copy a file into HDFS and view it in the web UI.
Step 1
Copy the customers.csv file into the /tmp directory inside the Docker container:
docker cp customers.csv hadoop-local:/tmp
Step 2
Copy the customers.csv file into HDFS:
docker exec -t hadoop-local /usr/local/hadoop/bin/hdfs dfs -put /tmp/customers.csv /home/user/customers.csv
Note: the /home/user directory must already exist in HDFS.
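Putting steps 1 and 2 together, here is a sketch that also creates the target directory first. With DRY_RUN=1 (the default here) it only prints the commands so you can review them; run it with DRY_RUN=0 against a live container:

```shell
#!/bin/sh
# Load customers.csv into HDFS inside the hadoop-local container.
# With DRY_RUN=1 (the default here) the commands are only printed.
DRY_RUN=${DRY_RUN:-1}
HDFS=/usr/local/hadoop/bin/hdfs

run() { if [ "$DRY_RUN" = 1 ]; then echo "$@"; else "$@"; fi; }

run docker cp customers.csv hadoop-local:/tmp
# hdfs dfs -put fails if the target directory is missing, so create it first.
run docker exec -t hadoop-local "$HDFS" dfs -mkdir -p /home/user
run docker exec -t hadoop-local "$HDFS" dfs -put /tmp/customers.csv /home/user/customers.csv
# List the directory to confirm the file landed.
run docker exec -t hadoop-local "$HDFS" dfs -ls /home/user
```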
Our file is now in HDFS!
WEB-UI
The academic Docker image also provides two web user interfaces:
- YARN
- http://yourip:8088
- HDFS
- http://yourip:50070
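A quick way to check that both UIs are answering (HOST is an assumption for your Docker host's address; localhost works when the daemon runs locally):

```shell
#!/bin/sh
# Probe the YARN (8088) and HDFS (50070) web UIs.
HOST=${HOST:-localhost}
for port in 8088 50070; do
  url="http://$HOST:$port"
  # -sf: silent, fail on HTTP errors; we only care about reachability.
  if curl -sf -o /dev/null "$url"; then
    echo "$url is up"
  else
    echo "$url is not reachable"
  fi
done
```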
STOP AND REMOVE
To stop and remove the Docker container, use the following commands:
docker stop hadoop-local
docker rm hadoop-local
Note: All your data inside HDFS will be lost!