Reading the Content of a Word File in Node.js and Printing It

By adminmysql360 on November 26, 2024 / November 25, 2024

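In Node.js, a library such as mammoth can read the content of a Word file and print it. (officegen, which is sometimes mentioned alongside it, generates Office documents rather than reading them.) The following example uses mammoth to read the content of a .docx file: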

Step 1: Install the dependency

npm install mammoth

Step 2: Create the Node.js script

const fs = require('fs');
const mammoth = require('mammoth');

// Read the content of a Word file
function readWordFile(filePath) {
    fs.readFile(filePath, (err, data) => {
        if (err) {
            console.error("Error reading file:", err);
            return;
        }

        mammoth.extractRawText({ buffer: data })
            .then((result) => {
                console.log("Word file content:");
                console.log(result.value); // Print the plain text extracted from the Word file
            })
            .catch((err) => {
                console.error("Error processing Word file:", err);
            });
    });
}

// Path to the Word file
const filePath = "example.docx";
readWordFile(filePath);
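
The same read can also be written with async/await. The following is a minimal sketch, not the only way to structure it: mammoth accepts a `path` option and reads the file itself, so `fs` is not needed. The function name `readWordText` and the `extractor` parameter are assumptions added for illustration; `extractor` defaults to the real mammoth module and exists only so the function can be exercised with a stub.

```javascript
// Sketch: async/await variant of the callback-based reader above.
// `extractor` is a hypothetical parameter; it defaults to the mammoth module,
// which is loaded only when no replacement is passed in.
async function readWordText(filePath, extractor = require('mammoth')) {
    // With { path }, mammoth opens and reads the .docx file itself.
    const result = await extractor.extractRawText({ path: filePath });

    // result.messages holds any warnings or errors reported during extraction.
    for (const message of result.messages) {
        console.warn(`[${message.type}] ${message.message}`);
    }

    return result.value; // plain text of the document
}
```

Calling `readWordText("example.docx").then(console.log).catch(console.error)` should print the same text as the callback version.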

Notes

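The mammoth library: mammoth is designed mainly to convert .docx documents to HTML, and it works best on documents that use styles semantically (headings, paragraphs, and so on). Its extractRawText function, used here, returns only the plain text and ignores all formatting.
Supported file formats: mammoth supports only the .docx format, not the older .doc format. To handle .doc files, consider converting them first with unoconv or a similar tool.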
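
Because only .docx is supported, it can help to check the file extension before handing a file to mammoth. The sketch below is one possible way to do that; the function name `detectWordFormat` and its return values are illustrative choices and are not part of mammoth. It looks only at the extension and does not inspect the file contents.

```javascript
const path = require('path');

// Classify a file by its extension (case-insensitive).
// 'docx', 'doc' and 'unsupported' are hypothetical labels chosen for this sketch.
function detectWordFormat(filePath) {
    const ext = path.extname(filePath).toLowerCase();
    if (ext === '.docx') return 'docx';   // can be passed to mammoth
    if (ext === '.doc') return 'doc';     // needs conversion first
    return 'unsupported';
}

console.log(detectWordFormat('example.docx')); // docx
console.log(detectWordFormat('legacy.DOC'));   // doc
console.log(detectWordFormat('notes.txt'));    // unsupported
```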

Running the script

Save the code above as read-word.js, then run the following command:

node read-word.js
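
The script hard-codes the file name. As a small sketch of an alternative, the path could be taken from the command line instead, so that `node read-word.js report.docx` reads a different file. The helper name `getInputPath` and the fallback value are assumptions made for this example.

```javascript
// Pick the input path from the command-line arguments.
// argv[0] is the node binary and argv[1] is the script path,
// so the first user-supplied argument is argv[2].
function getInputPath(argv, fallback = 'example.docx') {
    return argv[2] || fallback;
}

console.log('Reading:', getInputPath(process.argv));
```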

Sample output

Suppose example.docx contains:

Hello, this is a test Word document.

The script prints:

Word file content:
Hello, this is a test Word document.
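
For documents with more than one paragraph, extractRawText follows each paragraph with two newlines, so the printed text contains blank lines between paragraphs. If a list of paragraphs is more convenient, the raw text can be split afterwards. The helper below is a sketch under that assumption; `splitParagraphs` is a hypothetical name, and the sample string stands in for a real `result.value`.

```javascript
// Split raw text from mammoth into an array of non-empty paragraphs.
function splitParagraphs(rawText) {
    return rawText
        .split(/\n{2,}/)                              // paragraphs are separated by blank lines
        .map((paragraph) => paragraph.trim())
        .filter((paragraph) => paragraph.length > 0); // drop empty paragraphs
}

// Stand-in for result.value from a two-paragraph document.
const sample = 'Hello, this is a test Word document.\n\nSecond paragraph.\n\n';
console.log(splitParagraphs(sample));
```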

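To handle more complex content (tables, images, and so on), you can try other libraries such as docxtemplater or word-extractor, but each is aimed at a specific use: docxtemplater generates documents from templates, and word-extractor extracts text from .doc and .docx files.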
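
Before switching libraries, it may be worth trying mammoth's own convertToHtml function, which keeps tables as HTML tables and, by default, embeds images inline. The following is a minimal sketch; the function name `wordToHtml`, the shape of its return value, and the `converter` parameter (which defaults to the real mammoth module and exists only so a stub can be passed) are assumptions for illustration.

```javascript
// Sketch: convert a .docx file to HTML instead of plain text.
// `converter` is a hypothetical parameter; it defaults to the mammoth module,
// which is loaded only when no replacement is passed in.
async function wordToHtml(filePath, converter = require('mammoth')) {
    const result = await converter.convertToHtml({ path: filePath });
    return {
        html: result.value,                                   // generated HTML string
        warnings: result.messages.map((m) => m.message),      // conversion warnings, if any
    };
}
```

Calling `wordToHtml("example.docx").then(({ html }) => console.log(html))` should print the document as an HTML fragment.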
