When technology is progressing well, it becomes easier to use.
So you want to run hadoop locally: not for serious testing or production, just for playing and development. But all of the installation guides look rather complicated, and since every vendor produces its own distribution, why should you be downloading and extracting a tarball yourself?
Anyway, here’s an easy installation on Ubuntu 18.04.
snap install --beta hadoop #yup.
Now check that you actually have something by running an example. Parts of this follow the hadoop docs, but with ubuntu-snap-specific commands, which you might find easier to follow.
hadoop jar /snap/hadoop/current/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.7.3.jar
# ... pick an example that doesn't depend on hdfs for now
sudo hadoop jar /snap/hadoop/current/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.7.3.jar pi 8 500
Even if you already know π to a thousand places, it’s fun to have a Monte Carlo method tell you approximately what it is:
Number of Maps  = 8
Samples per Map = 500
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Starting Job
18/11/04 22:36:13 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
18/11/04 22:36:13 INFO input.FileInputFormat: Total input paths to process : 8
18/11/04 22:36:14 INFO mapreduce.JobSubmitter: number of splits:8
18/11/04 22:36:14 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1541542844479_0004
18/11/04 22:36:14 INFO impl.YarnClientImpl: Submitted application application_1541542844479_0004
18/11/04 22:36:14 INFO mapreduce.Job: The url to track the job: http://local-ubuntu:8088/proxy/application_1541542844479_0004/
18/11/04 22:36:14 INFO mapreduce.Job: Running job: job_1541542844479_0004
18/11/04 22:36:18 INFO mapreduce.Job: Job job_1541542844479_0004 running in uber mode : false
18/11/04 22:36:18 INFO mapreduce.Job:  map 0% reduce 0%
18/11/04 22:36:25 INFO mapreduce.Job:  map 63% reduce 0%
18/11/04 22:36:26 INFO mapreduce.Job:  map 75% reduce 0%
18/11/04 22:36:27 INFO mapreduce.Job:  map 88% reduce 0%
18/11/04 22:36:28 INFO mapreduce.Job:  map 100% reduce 0%
18/11/04 22:36:30 INFO mapreduce.Job:  map 100% reduce 100%
18/11/04 22:36:30 INFO mapreduce.Job: Job job_1541542844479_0004 completed successfully
18/11/04 22:36:30 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=182
		FILE: Number of bytes written=1085904
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=2112
		HDFS: Number of bytes written=215
		HDFS: Number of read operations=35
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=3
	Job Counters
		Launched map tasks=8
		Launched reduce tasks=1
		Data-local map tasks=8
		Total time spent by all maps in occupied slots (ms)=32949
		Total time spent by all reduces in occupied slots (ms)=1508
		Total time spent by all map tasks (ms)=32949
		Total time spent by all reduce tasks (ms)=1508
		Total vcore-milliseconds taken by all map tasks=32949
		Total vcore-milliseconds taken by all reduce tasks=1508
		Total megabyte-milliseconds taken by all map tasks=33739776
		Total megabyte-milliseconds taken by all reduce tasks=1544192
	Map-Reduce Framework
		Map input records=8
		Map output records=16
		Map output bytes=144
		Map output materialized bytes=224
		Input split bytes=1168
		Combine input records=0
		Combine output records=0
		Reduce input groups=2
		Reduce shuffle bytes=224
		Reduce input records=16
		Reduce output records=0
		Spilled Records=32
		Shuffled Maps =8
		Failed Shuffles=0
		Merged Map outputs=8
		GC time elapsed (ms)=1231
		CPU time spent (ms)=2770
		Physical memory (bytes) snapshot=2354618368
		Virtual memory (bytes) snapshot=17203613696
		Total committed heap usage (bytes)=1699741696
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters
		Bytes Read=944
	File Output Format Counters
		Bytes Written=97
Job Finished in 17.162 seconds
Estimated value of Pi is 3.14000000000000000000
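The trick behind that estimate is simple enough to sketch locally. Each mapper samples random points in the unit square and counts how many fall inside the quarter circle; that fraction approximates π/4. Here's a rough single-process illustration with awk (my own sketch, not how the Hadoop example actually partitions the work):

```shell
# Monte Carlo pi: sample random points in the unit square,
# count the ones inside the quarter circle, scale by 4.
awk 'BEGIN {
  srand();
  n = 4000; hits = 0;
  for (i = 0; i < n; i++) {
    x = rand(); y = rand();
    if (x * x + y * y <= 1) hits++;
  }
  printf "Estimated value of Pi is %f\n", 4 * hits / n;
}'
```

The Hadoop job just does this at scale: 8 maps of 500 samples each, with a single reduce summing the hit counts.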
Make a directory and copy an input file, file.txt, to the (pseudo-)distributed filesystem; then, for old times’ sake, run the word-count example on that single file.
$ sudo hadoop.hdfs dfs -mkdir /user/iain
$ sudo hadoop.hdfs dfs -chown -R iain:iain /user/iain
$ hadoop.hdfs dfs -put ~/file.txt /user/iain/
$ hadoop jar /snap/hadoop/current/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount /user/iain/file.txt /etc/hadoop/wc_output
[ output snipped ]
$ hadoop.hdfs dfs -ls /etc/hadoop/wc_output
Found 2 items
-rw-r--r--   1 root supergroup          0 2018-11-13 23:46 /etc/hadoop/wc_output/_SUCCESS
-rw-r--r--   1 root supergroup         32 2018-11-13 23:46 /etc/hadoop/wc_output/part-r-00000
$ hadoop.hdfs dfs -cat /etc/hadoop/wc_output/part-r-00000
This	1
a	1
file	1
is	1
simple	1
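As a sanity check, wordcount is just "split into words, count each one", so you can reproduce the counts with standard shell tools. The file contents here are my guess, reconstructed from the wordcount output above:

```shell
# A guess at file.txt's contents, based on the counts above
printf 'This is a simple file\n' > /tmp/file.txt

# The same computation wordcount performs: split on whitespace, count occurrences.
# LC_ALL=C sorts uppercase before lowercase, matching the part-r-00000 ordering.
tr -s ' ' '\n' < /tmp/file.txt | LC_ALL=C sort | uniq -c
# each word appears once, in the same order as part-r-00000 above
```

One gotcha if you want to rerun the job: Hadoop refuses to write to an existing output directory, so remove it first with something like `hadoop.hdfs dfs -rm -r /etc/hadoop/wc_output`.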