Integration testing Hadoop 2.7 with Ceph 10.2 S3 object storage

The company requested an integration test of Hadoop 2.7 against Ceph 10.2's S3 object storage. Hadoop's S3 support is documented on the project site: [official documentation](http://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html).
After consulting the relevant material, the integration test was completed. The full process, from environment deployment through the integration tests, is written up below.

**Hadoop environment**

Two hosts, named master and slave. master serves as the Hadoop NameNode, and slave serves as a DataNode.
For the Hadoop cluster deployment process, see: [Hadoop cluster deployment](http://www.178pt.com/156.html)

For the Ceph 10.2 radosgw configuration process, see: [radosgw configuration](http://www.178pt.com/250.html)

**Integrating Hadoop with S3**

On the master (NameNode) node, edit core-site.xml and add the configuration below (fill in the endpoint and keys to match your environment):

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/usr/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>

<!-- add the following properties -->
<property>
<name>fs.s3a.access.key</name>
<value>YZ8H5J5B4BS4HGJ6U8YC</value>
<description>AWS access key ID. Omit for Role-based authentication.</description>
</property>

<property>
<name>fs.s3a.secret.key</name>
<value>KzPrV6ytwoZoQCMHzbnXXMQKrjH5MLnD3Wsb0AjJ</value>
<description>AWS secret key</description>
</property>

<property>
<name>fs.s3a.endpoint</name>
<value>192.168.1.31:7480</value>
<description>AWS S3 endpoint to connect to. An up-to-date list is
provided in the AWS Documentation: regions and endpoints. Without this
property, the standard region (s3.amazonaws.com) is assumed.
</description>
</property>

<property>
<name>fs.s3a.connection.ssl.enabled</name>
<value>false</value>
<description>Enables or disables SSL connections to S3.</description>
</property>
<!-- end of additions -->
</configuration>
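As a quick sanity check before restarting anything, the S3A properties can be pulled out of a core-site.xml body with a few lines of standard-library Python. The XML fragment below is an abbreviated, hypothetical copy of the configuration above; only the property names are the real S3A keys:

```python
import xml.etree.ElementTree as ET

# Abbreviated core-site.xml fragment (values are placeholders for illustration).
CORE_SITE = """
<configuration>
  <property>
    <name>fs.s3a.endpoint</name>
    <value>192.168.1.31:7480</value>
  </property>
  <property>
    <name>fs.s3a.connection.ssl.enabled</name>
    <value>false</value>
  </property>
</configuration>
"""

def s3a_properties(xml_text):
    """Return a dict of all fs.s3a.* properties found in a core-site.xml body."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value")
            for p in root.iter("property")
            if p.findtext("name", "").startswith("fs.s3a.")}

props = s3a_properties(CORE_SITE)
print(props["fs.s3a.endpoint"])               # the RGW endpoint, not s3.amazonaws.com
print(props["fs.s3a.connection.ssl.enabled"])  # false: plain HTTP to port 7480
```

A check like this catches typos in property names (for example `fs.s3.` versus `fs.s3a.`) before they surface as confusing runtime errors.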

Copy the S3-related jars into the common classpath on both Hadoop nodes (master and slave); otherwise Hadoop will report errors when accessing S3.

[root@master ~]# cd /usr/hadoop/hadoop-2.7.3
[root@master hadoop-2.7.3]# cp share/hadoop/tools/lib/hadoop-aws-2.7.3.jar share/hadoop/common/lib/
[root@master hadoop-2.7.3]# cp share/hadoop/tools/lib/aws-java-sdk-1.7.4.jar share/hadoop/common/lib/
[root@master hadoop-2.7.3]# cp share/hadoop/tools/lib/joda-time-2.9.4.jar share/hadoop/common/lib/
[root@master hadoop-2.7.3]# cp share/hadoop/tools/lib/jackson-*.jar share/hadoop/common/lib/

Restart Hadoop:

[root@master etc]# stop-all.sh
[root@master etc]# start-all.sh

**Testing the Hadoop S3 integration**

On a Ceph node, create a bucket named hadoop and upload a file:

[root@radosgw1 ~]# s3cmd mb s3://hadoop
Bucket 's3://hadoop/' created
[root@radosgw1 ~]# s3cmd put abc s3://hadoop
upload: 'abc' -> 's3://hadoop/abc'  [1 of 1]
1109 of 1109   100% in    1s  1096.74 B/s  done
[root@radosgw1 ~]# s3cmd ls s3://hadoop
2018-04-25 08:47      1109   s3://hadoop/abc

Check from the Hadoop master node:

[root@master ~]# hadoop fs -ls s3a://hadoop/
Found 1 items
-rw-rw-rw-   1       1109 2018-04-25 16:47 s3a://hadoop/abc
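An `s3a://` URI has the same shape as an `s3://` URI: the bucket name is the authority and the object key is the path, which is why the same object shows up as both `s3://hadoop/abc` and `s3a://hadoop/abc`. A minimal standard-library Python illustration (the bucket and key names simply mirror the listing above):

```python
from urllib.parse import urlparse

def split_s3a(uri):
    """Split an s3a:// URI into a (bucket, key) pair."""
    parts = urlparse(uri)
    assert parts.scheme == "s3a", "expected an s3a:// URI"
    return parts.netloc, parts.path.lstrip("/")

bucket, key = split_s3a("s3a://hadoop/abc")
print(bucket, key)  # hadoop abc
```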

1. Upload a file from the Hadoop client host to object storage

[root@master ~]# ls ceshi.txt
ceshi.txt
[root@master ~]# hadoop fs -put ceshi.txt s3a://hadoop/
[root@master ~]# hadoop fs -ls s3a://hadoop/
Found 2 items
-rw-rw-rw-   1       1109 2018-04-25 16:47 s3a://hadoop/abc
-rw-rw-rw-   1       1083 2018-04-25 16:52 s3a://hadoop/ceshi.txt
[root@master ~]#

Check on the cluster side:

[root@radosgw1 ~]# s3cmd ls s3://hadoop
2018-04-25 08:47      1109   s3://hadoop/abc
2018-04-25 08:52      1083   s3://hadoop/ceshi.txt

2. Download a file from object storage to the local filesystem

[root@master ~]# rm -f ceshi.txt
[root@master ~]# ls ceshi.txt
ls: cannot access ceshi.txt: No such file or directory
[root@master ~]# hadoop fs -get s3a://hadoop/ceshi.txt
[root@master ~]# ls ceshi.txt
ceshi.txt
[root@master ~]#
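The round trip above only confirms the file reappeared; comparing checksums is a stronger check than comparing names or sizes. A small standard-library sketch (the byte strings are made-up stand-ins for the file contents; note the common assumption that a single-part S3 upload's ETag equals the object's MD5 does not hold for multipart uploads):

```python
import hashlib

def md5_hex(data: bytes) -> str:
    """MD5 hex digest, comparable to a single-part S3 ETag (quotes stripped)."""
    return hashlib.md5(data).hexdigest()

local = b"line one\nline two\n"        # contents before the put
downloaded = b"line one\nline two\n"   # contents after hadoop fs -get
print(md5_hex(local) == md5_hex(downloaded))  # True when the copy is intact
```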

3. Copy a file from object storage to HDFS

[root@master ~]# hdfs dfs -ls /
Found 4 items
drwxr-xr-x   - root supergroup          0 2018-04-25 15:21 /haha
drwxr-xr-x   - root supergroup          0 2018-04-25 12:10 /input
drwxr-xr-x   - root supergroup          0 2018-04-25 12:11 /output
drwx------   - root supergroup          0 2018-04-25 12:11 /tmp
[root@master ~]# hdfs dfs -ls /ceshi.txt
ls: `/ceshi.txt': No such file or directory

[root@master ~]# hadoop distcp s3a://hadoop/ceshi.txt /ceshi.txt
18/04/25 17:00:10 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[s3a://hadoop/ceshi.txt], targetPath=/ceshi.txt, targetPathExists=false, preserveRawXattrs=false}
18/04/25 17:00:10 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.1.20:8032
18/04/25 17:00:30 INFO Configuration.deprecation: io.sort.mb is deprecated. Instead, use mapreduce.task.io.sort.mb
18/04/25 17:00:30 INFO Configuration.deprecation: io.sort.factor is deprecated. Instead, use mapreduce.task.io.sort.factor
18/04/25 17:00:31 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.1.20:8032
18/04/25 17:00:31 INFO mapreduce.JobSubmitter: number of splits:1
18/04/25 17:00:32 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1524633996089_0009
18/04/25 17:00:32 INFO impl.YarnClientImpl: Submitted application application_1524633996089_0009
18/04/25 17:00:32 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1524633996089_0009/
18/04/25 17:00:32 INFO tools.DistCp: DistCp job-id: job_1524633996089_0009
18/04/25 17:00:32 INFO mapreduce.Job: Running job: job_1524633996089_0009
18/04/25 17:00:40 INFO mapreduce.Job: Job job_1524633996089_0009 running in uber mode : false
18/04/25 17:00:40 INFO mapreduce.Job:  map 0% reduce 0%
18/04/25 17:00:52 INFO mapreduce.Job:  map 100% reduce 0%
18/04/25 17:01:05 INFO mapreduce.Job: Job job_1524633996089_0009 completed successfully
18/04/25 17:01:05 INFO mapreduce.Job: Counters: 38
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=121596
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=330
HDFS: Number of bytes written=1083
HDFS: Number of read operations=14
HDFS: Number of large read operations=0
HDFS: Number of write operations=4
S3A: Number of bytes read=1083
S3A: Number of bytes written=0
S3A: Number of read operations=3
S3A: Number of large read operations=0
S3A: Number of write operations=0
Job Counters
Launched map tasks=1
Other local map tasks=1
Total time spent by all maps in occupied slots (ms)=20780
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=20780
Total vcore-milliseconds taken by all map tasks=20780
Total megabyte-milliseconds taken by all map tasks=21278720
Map-Reduce Framework
Map input records=1
Map output records=0
Input split bytes=135
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=154
CPU time spent (ms)=1350
Physical memory (bytes) snapshot=113676288
Virtual memory (bytes) snapshot=862224384
Total committed heap usage (bytes)=29032448
File Input Format Counters
Bytes Read=195
File Output Format Counters
Bytes Written=0
org.apache.hadoop.tools.mapred.CopyMapper$Counter
BYTESCOPIED=1083
BYTESEXPECTED=1083
COPY=1

[root@master ~]# hdfs dfs -ls /ceshi.txt
-rw-r--r--   1 root supergroup       1083 2018-04-25 17:00 /ceshi.txt
[root@master ~]#
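A quick way to machine-check a distcp run is to compare the BYTESCOPIED and BYTESEXPECTED counters in the job output. A hedged Python sketch (the sample counter lines mirror the log above; the `NAME=value` layout is an assumption about this log format):

```python
def distcp_ok(log_text):
    """Return True if BYTESCOPIED equals BYTESEXPECTED in distcp counter output."""
    counters = {}
    for line in log_text.splitlines():
        line = line.strip()
        for name in ("BYTESCOPIED", "BYTESEXPECTED"):
            if line.startswith(name + "="):
                counters[name] = int(line.split("=", 1)[1])
    return ("BYTESCOPIED" in counters
            and counters["BYTESCOPIED"] == counters.get("BYTESEXPECTED"))

sample = """BYTESCOPIED=1083
BYTESEXPECTED=1083
COPY=1"""
print(distcp_ok(sample))  # True
```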

4. Copy a file from HDFS to S3 object storage

List all objects in the S3 bucket:
[root@radosgw1 ~]# s3cmd ls s3://hadoop
2018-04-25 08:47      1109   s3://hadoop/abc
2018-04-25 08:52      1083   s3://hadoop/ceshi.txt
[root@radosgw1 ~]#

Upload the anaconda-ks.cfg file from the /haha directory in HDFS to object storage:
[root@master ~]# hdfs dfs -ls /haha
Found 1 items
-rw-r--r--   1 root supergroup       1083 2018-04-25 15:21 /haha/anaconda-ks.cfg

[root@master ~]# hadoop distcp /haha/anaconda-ks.cfg s3a://hadoop/
18/04/25 17:06:18 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[/haha/anaconda-ks.cfg], targetPath=s3a://hadoop/, targetPathExists=true, preserveRawXattrs=false}
18/04/25 17:06:18 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.1.20:8032
18/04/25 17:06:24 INFO Configuration.deprecation: io.sort.mb is deprecated. Instead, use mapreduce.task.io.sort.mb
18/04/25 17:06:24 INFO Configuration.deprecation: io.sort.factor is deprecated. Instead, use mapreduce.task.io.sort.factor
18/04/25 17:06:25 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.1.20:8032
18/04/25 17:06:26 INFO mapreduce.JobSubmitter: number of splits:1
18/04/25 17:06:26 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1524633996089_0010
18/04/25 17:06:26 INFO impl.YarnClientImpl: Submitted application application_1524633996089_0010
18/04/25 17:06:26 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1524633996089_0010/
18/04/25 17:06:26 INFO tools.DistCp: DistCp job-id: job_1524633996089_0010
18/04/25 17:06:26 INFO mapreduce.Job: Running job: job_1524633996089_0010
18/04/25 17:06:35 INFO mapreduce.Job: Job job_1524633996089_0010 running in uber mode : false
18/04/25 17:06:35 INFO mapreduce.Job:  map 0% reduce 0%
18/04/25 17:06:57 INFO mapreduce.Job:  map 100% reduce 0%
18/04/25 17:08:14 INFO mapreduce.Job: Job job_1524633996089_0010 completed successfully
18/04/25 17:08:14 INFO mapreduce.Job: Counters: 38
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=121562
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=1459
HDFS: Number of bytes written=0
HDFS: Number of read operations=10
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
S3A: Number of bytes read=0
S3A: Number of bytes written=1083
S3A: Number of read operations=11
S3A: Number of large read operations=0
S3A: Number of write operations=3
Job Counters
Launched map tasks=1
Other local map tasks=1
Total time spent by all maps in occupied slots (ms)=86489
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=86489
Total vcore-milliseconds taken by all map tasks=86489
Total megabyte-milliseconds taken by all map tasks=88564736
Map-Reduce Framework
Map input records=1
Map output records=0
Input split bytes=134
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=151
CPU time spent (ms)=1760
Physical memory (bytes) snapshot=116514816
Virtual memory (bytes) snapshot=863125504
Total committed heap usage (bytes)=29032448
File Input Format Counters
Bytes Read=242
File Output Format Counters
Bytes Written=0
org.apache.hadoop.tools.mapred.CopyMapper$Counter
BYTESCOPIED=1083
BYTESEXPECTED=1083
COPY=1
[root@master ~]#

Verify on the S3 cluster side:
[root@radosgw1 ~]# s3cmd ls s3://hadoop
2018-04-25 08:47      1109   s3://hadoop/abc
2018-04-25 09:08      1083   s3://hadoop/anaconda-ks.cfg
2018-04-25 08:52      1083   s3://hadoop/ceshi.txt
[root@radosgw1 ~]#

5. Use a file in object storage as MapReduce input, and write the computation result to HDFS

Use the abc object in the hadoop bucket as the MapReduce input, and write the result to the /result directory in HDFS:
[root@master ~]# hadoop fs -ls s3a://hadoop/
Found 3 items
-rw-rw-rw-   1       1109 2018-04-25 16:47 s3a://hadoop/abc
-rw-rw-rw-   1       1083 2018-04-25 17:08 s3a://hadoop/anaconda-ks.cfg
-rw-rw-rw-   1       1083 2018-04-25 16:52 s3a://hadoop/ceshi.txt
[root@master ~]# hdfs dfs -ls /result
ls: `/result': No such file or directory
[root@master ~]#

There is currently no /result directory in HDFS. Now run the computation:

[root@master ~]# hadoop jar /usr/hadoop/hadoop-2.7.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount s3a://hadoop/abc /result
18/04/25 17:19:53 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.1.20:8032
18/04/25 17:19:55 INFO input.FileInputFormat: Total input paths to process : 1
18/04/25 17:19:56 INFO mapreduce.JobSubmitter: number of splits:1
18/04/25 17:19:56 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1524633996089_0011
18/04/25 17:19:57 INFO impl.YarnClientImpl: Submitted application application_1524633996089_0011
18/04/25 17:19:57 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1524633996089_0011/
18/04/25 17:19:57 INFO mapreduce.Job: Running job: job_1524633996089_0011
18/04/25 17:20:06 INFO mapreduce.Job: Job job_1524633996089_0011 running in uber mode : false
18/04/25 17:20:06 INFO mapreduce.Job:  map 0% reduce 0%
18/04/25 17:20:23 INFO mapreduce.Job:  map 100% reduce 0%
18/04/25 17:20:31 INFO mapreduce.Job:  map 100% reduce 100%
18/04/25 17:20:32 INFO mapreduce.Job: Job job_1524633996089_0011 completed successfully
18/04/25 17:20:32 INFO mapreduce.Job: Counters: 54
File System Counters
FILE: Number of bytes read=1442
FILE: Number of bytes written=240937
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=81
HDFS: Number of bytes written=1121
HDFS: Number of read operations=5
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
S3A: Number of bytes read=1109
S3A: Number of bytes written=0
S3A: Number of read operations=1
S3A: Number of large read operations=0
S3A: Number of write operations=0
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Rack-local map tasks=1
Total time spent by all maps in occupied slots (ms)=14541
Total time spent by all reduces in occupied slots (ms)=5450
Total time spent by all map tasks (ms)=14541
Total time spent by all reduce tasks (ms)=5450
Total vcore-milliseconds taken by all map tasks=14541
Total vcore-milliseconds taken by all reduce tasks=5450
Total megabyte-milliseconds taken by all map tasks=14889984
Total megabyte-milliseconds taken by all reduce tasks=5580800
Map-Reduce Framework
Map input records=43
Map output records=104
Map output bytes=1517
Map output materialized bytes=1442
Input split bytes=81
Combine input records=104
Combine output records=79
Reduce input groups=79
Reduce shuffle bytes=1442
Reduce input records=79
Reduce output records=79
Spilled Records=158
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=230
CPU time spent (ms)=2230
Physical memory (bytes) snapshot=324866048
Virtual memory (bytes) snapshot=1723260928
Total committed heap usage (bytes)=162926592
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=1109
File Output Format Counters
Bytes Written=1121
[root@master ~]#

After the job succeeds, listing the HDFS directory shows that /result now exists and contains both the result file and a _SUCCESS marker:

[root@master ~]# hdfs dfs -ls /result
Found 2 items
-rw-r--r--   1 root supergroup          0 2018-04-25 17:20 /result/_SUCCESS
-rw-r--r--   1 root supergroup       1121 2018-04-25 17:20 /result/part-r-00000
[root@master ~]#
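The wordcount example used above is conceptually simple: map each line to (word, 1) pairs, then reduce by summing per word. The same logic can be sketched in a few lines of Python (whitespace tokenization, as in the example jar; the sample lines are made up):

```python
from collections import Counter

def wordcount(lines):
    """Count whitespace-separated tokens, like the Hadoop wordcount example."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())  # map + combine: tokenize and tally
    return dict(counts)              # reduce result: word -> total count

sample = ["hello ceph", "hello hadoop"]
print(wordcount(sample))  # {'hello': 2, 'ceph': 1, 'hadoop': 1}
```

The Hadoop version distributes exactly this computation: mappers tokenize input splits, combiners pre-aggregate, and reducers sum the per-word counts that end up in part-r-00000.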

6. Use a file in object storage as MapReduce input, and write the computation result back to an object storage bucket

First confirm that the result directory does not yet exist under the hadoop bucket; the job output will be written there.
[root@master ~]# hadoop fs -ls s3a://hadoop/result
ls: `s3a://hadoop/result': No such file or directory

[root@master ~]# hadoop jar /usr/hadoop/hadoop-2.7.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount s3a://hadoop/abc s3a://hadoop/result
18/04/25 17:25:27 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.1.20:8032
18/04/25 17:25:39 INFO input.FileInputFormat: Total input paths to process : 1
18/04/25 17:25:40 INFO mapreduce.JobSubmitter: number of splits:1
18/04/25 17:25:41 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1524633996089_0012
18/04/25 17:25:41 INFO impl.YarnClientImpl: Submitted application application_1524633996089_0012
18/04/25 17:25:41 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1524633996089_0012/
18/04/25 17:25:41 INFO mapreduce.Job: Running job: job_1524633996089_0012
18/04/25 17:25:53 INFO mapreduce.Job: Job job_1524633996089_0012 running in uber mode : false
18/04/25 17:25:53 INFO mapreduce.Job:  map 0% reduce 0%
18/04/25 17:26:57 INFO mapreduce.Job:  map 100% reduce 0%
18/04/25 17:27:18 INFO mapreduce.Job:  map 100% reduce 67%
18/04/25 17:27:27 INFO mapreduce.Job:  map 100% reduce 100%
18/04/25 17:32:44 INFO mapreduce.Job: Job job_1524633996089_0012 completed successfully
18/04/25 17:32:44 INFO mapreduce.Job: Counters: 54
File System Counters
FILE: Number of bytes read=1442
FILE: Number of bytes written=240925
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=81
HDFS: Number of bytes written=0
HDFS: Number of read operations=1
HDFS: Number of large read operations=0
HDFS: Number of write operations=0
S3A: Number of bytes read=1109
S3A: Number of bytes written=1121
S3A: Number of read operations=19
S3A: Number of large read operations=0
S3A: Number of write operations=5
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Rack-local map tasks=1
Total time spent by all maps in occupied slots (ms)=22928
Total time spent by all reduces in occupied slots (ms)=198775
Total time spent by all map tasks (ms)=22928
Total time spent by all reduce tasks (ms)=198775
Total vcore-milliseconds taken by all map tasks=22928
Total vcore-milliseconds taken by all reduce tasks=198775
Total megabyte-milliseconds taken by all map tasks=23478272
Total megabyte-milliseconds taken by all reduce tasks=203545600
Map-Reduce Framework
Map input records=43
Map output records=104
Map output bytes=1517
Map output materialized bytes=1442
Input split bytes=81
Combine input records=104
Combine output records=79
Reduce input groups=79
Reduce shuffle bytes=1442
Reduce input records=79
Reduce output records=79
Spilled Records=158
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=256
CPU time spent (ms)=1550
Physical memory (bytes) snapshot=336670720
Virtual memory (bytes) snapshot=1724592128
Total committed heap usage (bytes)=162926592
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=1109
File Output Format Counters
Bytes Written=1121
[root@master ~]#

After the job succeeds, verify the output.

Verify on the Hadoop node:
[root@master ~]# hadoop fs -ls s3a://hadoop/result
Found 2 items
-rw-rw-rw-   1          0 2018-04-25 17:33 s3a://hadoop/result/_SUCCESS
-rw-rw-rw-   1       1121 2018-04-25 17:32 s3a://hadoop/result/part-r-00000
[root@master ~]#

Verify on the Ceph cluster node:
[root@radosgw1 ~]# s3cmd ls s3://hadoop/result/
2018-04-25 09:33         0   s3://hadoop/result/_SUCCESS
2018-04-25 09:32      1121   s3://hadoop/result/part-r-00000

7. Use a file in HDFS as MapReduce input, and write the computation result to object storage

Use ceshi.txt in HDFS as the computation input and write the result to the hadoop/output directory in object storage. First, check the starting state:

[root@master ~]# hdfs dfs -ls /
Found 6 items
-rw-r--r--   1 root supergroup       1083 2018-04-25 17:00 /ceshi.txt

[root@master ~]# hadoop fs -ls s3a://hadoop/output
ls: `s3a://hadoop/output': No such file or directory

Run the computation and write the output:

[root@master ~]# hadoop jar /usr/hadoop/hadoop-2.7.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount /ceshi.txt  s3a://hadoop/output
18/04/25 17:39:55 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.1.20:8032
18/04/25 17:40:04 INFO input.FileInputFormat: Total input paths to process : 1
18/04/25 17:40:05 INFO mapreduce.JobSubmitter: number of splits:1
18/04/25 17:40:05 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1524633996089_0013
18/04/25 17:40:06 INFO impl.YarnClientImpl: Submitted application application_1524633996089_0013
18/04/25 17:40:06 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1524633996089_0013/
18/04/25 17:40:06 INFO mapreduce.Job: Running job: job_1524633996089_0013
18/04/25 17:40:19 INFO mapreduce.Job: Job job_1524633996089_0013 running in uber mode : false
18/04/25 17:40:19 INFO mapreduce.Job:  map 0% reduce 0%
18/04/25 17:41:16 INFO mapreduce.Job:  map 100% reduce 0%
18/04/25 17:41:36 INFO mapreduce.Job:  map 100% reduce 67%
18/04/25 17:41:45 INFO mapreduce.Job:  map 100% reduce 100%
18/04/25 17:46:38 INFO mapreduce.Job: Job job_1524633996089_0013 completed successfully
18/04/25 17:46:38 INFO mapreduce.Job: Counters: 54
File System Counters
FILE: Number of bytes read=1404
FILE: Number of bytes written=240873
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=1176
HDFS: Number of bytes written=0
HDFS: Number of read operations=2
HDFS: Number of large read operations=0
HDFS: Number of write operations=0
S3A: Number of bytes read=0
S3A: Number of bytes written=1091
S3A: Number of read operations=18
S3A: Number of large read operations=0
S3A: Number of write operations=5
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=17302
Total time spent by all reduces in occupied slots (ms)=173487
Total time spent by all map tasks (ms)=17302
Total time spent by all reduce tasks (ms)=173487
Total vcore-milliseconds taken by all map tasks=17302
Total vcore-milliseconds taken by all reduce tasks=173487
Total megabyte-milliseconds taken by all map tasks=17717248
Total megabyte-milliseconds taken by all reduce tasks=177650688
Map-Reduce Framework
Map input records=41
Map output records=102
Map output bytes=1483
Map output materialized bytes=1404
Input split bytes=93
Combine input records=102
Combine output records=77
Reduce input groups=77
Reduce shuffle bytes=1404
Reduce input records=77
Reduce output records=77
Spilled Records=154
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=261
CPU time spent (ms)=1570
Physical memory (bytes) snapshot=325062656
Virtual memory (bytes) snapshot=1724448768
Total committed heap usage (bytes)=162926592
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=1083
File Output Format Counters
Bytes Written=1091

Verify on both the Hadoop node and the Ceph cluster node:

[root@master ~]# hadoop fs -ls s3a://hadoop/output
Found 2 items
-rw-rw-rw-   1          0 2018-04-25 17:47 s3a://hadoop/output/_SUCCESS
-rw-rw-rw-   1       1091 2018-04-25 17:46 s3a://hadoop/output/part-r-00000
[root@master ~]#

[root@radosgw1 ~]# s3cmd ls s3://hadoop/output/
2018-04-25 09:47         0   s3://hadoop/output/_SUCCESS
2018-04-25 09:46      1091   s3://hadoop/output/part-r-00000
[root@radosgw1 ~]#
The output is visible from both the cluster side and the Hadoop node.

This completes the configuration and testing.

Published by

deelaaay

Do not impose on others what you yourself do not desire.