《进击<a target="_blank" href="https://www.huoban.com/news/tags-790.html"style="font-weight:bold;">大数据</a>》系列教程之hadoop大数据基础-伙伴云

《进击大数据》系列教程之hadoop大数据基础

网友投稿 768 2025-04-03

前言

一、使用http的方式访问hdfs

二、hdfs各组件及其作用

三、hdfs中的数据块（Block）

四、java api 操作hdfs

五、java 开发 hdfs 应用时注意的事项

六、DataNode心跳机制的作用

七、NameNode中EditsLog 和FSImage 的作用

八、SecondaryNameNode 帮助NameNode 减负

九、NameNode 如何扩展？

十、创建文件的快照（备份）命令

十一、平衡数据

十二、safemode 安全模式

前言

时隔一年多，忙忙碌碌一直在做java web端的业务开发，大数据基本忘的差不多了，此次出一个大数据系列教程博文将其捡起。

hadoop，hdfs的下载安装以及启动，停止在这里就不一一介绍了，不会的可以查看我的历史博客。

默认我们已经搭建好了一个三节点的主从hadoop节点，一个master，两个salve

一、使用http的方式访问hdfs

在hdfs-site.xml中增加如下配置，然后重启hdfs：

dfs.webhdfs.enabled

true

使得可以使用http的方式访问hdfs

使用http访问：

查询user/hadoop-twq/cmd 文件系统中的error.txt文件

http://master:50070/webhdfs/v1/user/hadoop-twq/cmd/error.txt?op=LISTSTATUS

http://master:50070/webhdfs/v1/user/hadoop-twq/cmd/error.txt?op=OPEN

支持的op见：

http://hadoop.apache.org/docs/r2.7.5/hadoop-project-dist/hadoop-hdfs/WebHDFS.html

二、hdfs各组件及其作用

三、hdfs中的数据块（Block）

数据块的默认大小：128M

设置数据块的大小为：256M = 256*1024*1024

在${HADOOP_HOME}/ect/hadoop/hdfs-site.xml增加配置：

dfs.block.size

268435456

数据块的默认备份数是3

设置数据块的备份数

对指定的文件设置备份数

hadoop fs -setrep 2 /user/hadoop-twq/cmd/big_file.txt

全局文件备份数是直接在hdfs-site进行设置：

dfs.replication

数据块都是存储在每一个datanode 所在的机器本地磁盘文件中

四、java api 操作hdfs

使用javaapi 写数据到文件

package com.dzx.hadoopdemo;

import org.apache.hadoop.conf.Configuration;

import org.apache.hadoop.fs.FSDataOutputStream;

import org.apache.hadoop.fs.FileSystem;

import org.apache.hadoop.fs.Path;

import java.io.IOException;

import java.net.URI;

import java.net.URISyntaxException;

import java.nio.charset.StandardCharsets;

/**

* @author duanzhaoxu

* @ClassName:

* @Description:

* @date 2020年12月17日 17:35:37

public class hdfs {

public static void main(String[] args) throws Exception {

String content = "this is a example";

String dest = "hdfs://master:9999/user/hadoop-twq/cmd/java_writer.txt";

Configuration configuration = new Configuration();

FileSystem fileSystem = FileSystem.get(URI.create(dest), configuration);

FSDataOutputStream fsDataOutputStream = fileSystem.create(new Path(dest));

fsDataOutputStream.write(content.getBytes(StandardCharsets.UTF_8));

fsDataOutputStream.close();

}

使用java api 读取文件

package com.dzx.hadoopdemo;

import org.apache.hadoop.conf.Configuration;

import org.apache.hadoop.fs.FSDataInputStream;

import org.apache.hadoop.fs.FileSystem;

import org.apache.hadoop.fs.Path;

import java.io.BufferedReader;

import java.io.InputStreamReader;

import java.net.URI;

《进击大数据》系列教程之hadoop大数据基础

/**

* @author duanzhaoxu

* @ClassName:

* @Description:

* @date 2020年12月17日 17:35:37

public class hdfs {

public static void main(String[] args) throws Exception {

String dest = "hdfs://master:9999/user/hadoop-twq/cmd/java_writer.txt";

Configuration configuration = new Configuration();

FileSystem fileSystem = FileSystem.get(URI.create(dest), configuration);

FSDataInputStream fsDataInputStream = fileSystem.open(new Path(dest));

BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(fsDataInputStream));

String line = null;

while (bufferedReader.readLine() != null) {

System.out.println(line);

}

fsDataInputStream.close();

bufferedReader.close();

}

使用java api 获取文件状态信息

package com.dzx.hadoopdemo;

import org.apache.hadoop.conf.Configuration;

import org.apache.hadoop.fs.FSDataInputStream;

import org.apache.hadoop.fs.FileStatus;

import org.apache.hadoop.fs.FileSystem;

import org.apache.hadoop.fs.Path;

import org.apache.hadoop.fs.permission.FsAction;

import org.apache.hadoop.fs.permission.FsPermission;

import java.io.BufferedReader;

import java.io.InputStreamReader;

import java.net.URI;

/**

* @author duanzhaoxu

* @ClassName:

* @Description:

* @date 2020年12月17日 17:35:37

public class hdfs {

public static void main(String[] args) throws Exception {

//获取指定文件的文件状态信息

String dest = "hdfs://master:9999/user/hadoop-twq/cmd/java_writer.txt";

Configuration configuration = new Configuration();

FileSystem fileSystem = FileSystem.get(URI.create("hdfs://master:9999/"), configuration);

FileStatus fileStatus = fileSystem.getFileStatus(new Path(dest));

System.out.println(fileStatus.getPath());

System.out.println(fileStatus.getAccessTime());

System.out.println(fileStatus.getBlockSize());

System.out.println(fileStatus.getGroup());

System.out.println(fileStatus.getLen());

System.out.println(fileStatus.getModificationTime());

System.out.println(fileStatus.getOwner());

System.out.println(fileStatus.getPermission());

System.out.println(fileStatus.getReplication());

System.out.println(fileStatus.getSymlink());

//获取指定目录下的所有文件的文件状态信息

FileStatus[] fileStatuses = fileSystem.listStatus(new Path("hdfs://master:9999/user/hadoop-twq/cmd"));

for (FileStatus status : fileStatuses) {

System.out.println(status.getPath());

System.out.println(status.getAccessTime());

System.out.println(status.getBlockSize());

System.out.println(status.getGroup());

System.out.println(status.getLen());

System.out.println(status.getModificationTime());

System.out.println(status.getOwner());

System.out.println(status.getPermission());

System.out.println(status.getReplication());

System.out.println(status.getSymlink());

}

//创建目录

fileSystem.mkdirs(new Path("hdfs://master:9999/user/hadoop-twq/cmd/java"));

//创建目录并指定权限 rwx--x---

fileSystem.mkdirs(new Path("hdfs://master:9999/user/hadoop-twq/cmd/temp"), new FsPermission(FsAction.ALL, FsAction.EXECUTE, FsAction.NONE));

//删除指定文件

fileSystem.delete(new Path("hdfs://master:9999/user/hadoop-twq/cmd/java/1.txt"), false);

//删除指定目录

fileSystem.delete(new Path("hdfs://master:9999/user/hadoop-twq/cmd/java"), true);

}

五、java 开发 hdfs 应用时注意的事项

//需要把core-site.xml文件放到resources目录下，自动读取hdfs的ip端口配置

String dest = "user/hadoop-twq/cmd/java_writer.txt";

Configuration configuration = new Configuration();

FileSystem fileSystem = FileSystem.get(configuration);

FileStatus fileStatus = fileSystem.getFileStatus(new Path(dest));

六、DataNode心跳机制的作用

七、NameNode中EditsLog 和FSImage 的作用

八、SecondaryNameNode 帮助NameNode 减负

九、NameNode 如何扩展？

向主节点master的hdfs-site.xml 增加如下配置

查看 master节点的 clusterId

将hdfs-site.xml 从主节点master 拷贝到 slave1 和 slave2

形成三个节点的集群之后，我们使用java api 就不知道指定哪个 nameNode 的ip：端口了，所以我们需要进行viewFs配置

首先将core-site.xml 中的fs.defaultFS 配置项注释掉，然后添加如下配置

fs.default.name

viewfs://my-cluster

然后增加一个 mountTable.xml 文件( 元数据管理分布映射，相当于将namenode 管理的元数据分散到不同的namenode节点上)

然后将修改好的配置文件同步到 slave1 和 slave2 上，重启hdfs集群即可

重启之后在任意节点都可以使用通用的请求方式，例如：

hadoop fs -ls viewfs:://my-cluster/

十、创建文件的快照（备份）命令

给指定的目录授权创建快照

hadoop dfsadmin -allowSnapshot /user/hadoop-twq/data

创建快照

hadoop fs -createSnapshot /user/hadoop-twq/data data-20180317-snapshot

查看创建的快照文件

hadoop fs -ls /user/hadoop-twq/data/.snapshot/data-20180317-snapshot

其他快照相关命令

十一、平衡数据

当我们对hdfs 集群进行扩展的时候，难免新扩展进来的节点，分配的数据量较少，这个时候为了能够均衡的分配数据，可以使用hdfs balancer命令

十二、safemode 安全模式

开启安全模式之后无法创建，删除目录和文件，只允许查看目录和文件

hadoop dfsadmin -safemode get

Safe mode is OFF

hadoop dfsadmin -safemode enter

Safe mode is ON

hadoop dfsadmin -safemode leave

Safe mode is ON

大数据 Hadoop

大数据 服务上云的思考">大数据 服务上云的思考

768 2025-04-03

公众号文章汇总

768 2025-04-03

国美&华为，战略合作签约！

768 2025-04-03

《进击大数据》系列教程之hadoop大数据基础

大数据 服务上云的思考">大数据 服务上云的思考

公众号文章汇总

国美&华为，战略合作签约！

推荐文章

企业生产管理是什么，企业生产管理软件

进盘点进销存软件排行榜前十名

进销存系统哪个简单好用？进销存系统优点

工厂生产管理（工厂生产管理流程及制度）

生产管理软件，机械制造业生产管理，制造业生产过程管理软件

进销存软件和ERP有什么区别？进销存与erp软件理解

进销存如何进行库存管理

如何利用excel制作销售订单管理系统？

数据库订单管理系统有哪些功能？数据库订单管理系统怎么设计？

什么是数据库管理系统？

最近发表

热评文章

零代码开发是什么？2022低代码平台排行榜">零代码开发是什么？2022低代码平台排行榜

进销存库存管理 系统（智慧进销存）">智能进销存库存管理系统（智慧进销存）

在线文档哪家强？8款在线文档编辑软件推荐">在线文档哪家强？8款在线文档编辑软件推荐

WPS2016怎么绘制简单的价格表?

进销存库存管理盘点">简单进销存库存管理盘点

定制订单管理系统（为特定需求定制的订单管理系统）

友情链接

《进击大数据》系列教程之hadoop大数据基础

微信扫一扫：分享

大数据服务上云的思考">大数据服务上云的思考

推荐文章

最近发表

热评文章

零代码开发是什么？2022低代码平台排行榜">零代码开发是什么？2022低代码平台排行榜

进销存库存管理系统（智慧进销存）">智能进销存库存管理系统（智慧进销存）

在线文档哪家强？8款在线文档编辑软件推荐">在线文档哪家强？8款在线文档编辑软件推荐

进销存库存管理盘点">简单进销存库存管理盘点

友情链接