Document Content Extraction

Applicable versions: >= v7.0.2510.20251023

API description: document content extraction

Tag: File

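Request description

| Field | Value |
| --- | --- |
| Request URL | configured domain + /openapi + /v7/drives/{drive_id}/files/{file_id}/content |
| HTTP method | GET |
| Description | Document content extraction |
| Signature method | KSO-1 (the configured domain and /openapi are not part of the signature) |
| Rate limit policy | - |
| Required permissions | Query and manage files (user authorization): kso.file.readwrite; Query files (user authorization): kso.file.read |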

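Request headers (Headers)

| Name | Type | Required | Description | Allowed values |
| --- | --- | --- | --- | --- |
| X-Kso-Date | string | | Date in RFC 1123 format, e.g. Wed, 23 Jan 2013 06:43:08 GMT | - |
| X-Kso-Authorization | string | | KSO-1 signature value; see "Signature Method" | - |
| Authorization | string | | Authorization credential, in the form: Bearer {access_token} | - |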
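As an illustration of the three headers above, the following minimal sketch assembles them in Python. The function name `build_headers` and its parameters are hypothetical. The KSO-1 signature is treated as an opaque string computed elsewhere according to the "Signature Method" document, since its algorithm is not described on this page.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def build_headers(access_token, kso_signature, now=None):
    """Assemble the request headers listed in the table above.

    kso_signature is assumed to be a KSO-1 signature value that the
    caller has already computed; this sketch does not compute it.
    """
    # format_datetime(..., usegmt=True) requires a UTC-aware datetime
    # and produces an RFC 1123 style date ending in "GMT".
    now = now or datetime.now(timezone.utc)
    return {
        "X-Kso-Date": format_datetime(now, usegmt=True),
        "X-Kso-Authorization": kso_signature,
        "Authorization": f"Bearer {access_token}",
    }
```

Passing `now` explicitly keeps the date used in the header identical to the one used when computing the signature, if the signature depends on it.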

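Path parameters (Path)

| Name | Type | Required | Description | Allowed values |
| --- | --- | --- | --- | --- |
| drive_id | string | | Drive ID | - |
| file_id | string | | File ID | - |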
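A minimal sketch of how the request URL could be assembled from the configured domain and the two path parameters. The helper names and the `https://example.invalid` domain are placeholders, not part of the API. The resource path is kept separate because the configured domain and /openapi are stated to be excluded from the KSO-1 signature; the exact signing input is defined by the "Signature Method" document and is not assumed here.

```python
from urllib.parse import quote


def build_resource_path(drive_id, file_id):
    """Path portion after /openapi, with both IDs percent-encoded."""
    return (
        f"/v7/drives/{quote(drive_id, safe='')}"
        f"/files/{quote(file_id, safe='')}/content"
    )


def build_content_url(base_domain, drive_id, file_id):
    """Full URL: configured domain + /openapi + resource path."""
    return base_domain.rstrip("/") + "/openapi" + build_resource_path(drive_id, file_id)


# Example with placeholder values (hypothetical domain and IDs).
url = build_content_url("https://example.invalid/", "drive1", "file1")
```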

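Query parameters (Query)

| Name | Type | Required | Description | Allowed values |
| --- | --- | --- | --- | --- |
| format | string | | Target format of the document content | kdc, plain, markdown, html |
| include_elements | array | | Elements to extract. para is the default element and is always exported; the other elements are exported only when requested. | para, table, component, textbox, all |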
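The sketch below validates the two query parameters against the allowed values listed above and encodes them. How the `include_elements` array is serialized is not specified on this page; repeating the key once per element is an assumption made here, and the helper name `build_query` is hypothetical.

```python
from urllib.parse import urlencode

ALLOWED_FORMATS = {"kdc", "plain", "markdown", "html"}
ALLOWED_ELEMENTS = {"para", "table", "component", "textbox", "all"}


def build_query(fmt=None, include_elements=()):
    """Return an encoded query string for the content endpoint."""
    if fmt is not None and fmt not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    unknown = [e for e in include_elements if e not in ALLOWED_ELEMENTS]
    if unknown:
        raise ValueError(f"unsupported elements: {unknown!r}")

    params = []
    if fmt is not None:
        params.append(("format", fmt))
    # Assumption: array values are sent as repeated keys.
    params.extend(("include_elements", e) for e in include_elements)
    return urlencode(params)
```

Because para is always exported, a call such as `build_query("markdown", ["table"])` would request tables in addition to paragraphs.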

Response body (Response)

HTTP status code: 200
Response body format: application/json

Response body example (JSON)

{
  "data": {
    "attachment_url": "string",
    "doc": {
      "comments": [
        {
          "blocks": [
            {
              "bounding_box": {
                "x1": 0,
                "x2": 0,
                "y1": 0,
                "y2": 0
              },
              "component": {
                "media_id": "string",
                "type": "image"
              },
              "id": "string",
              "index": 0,
              "page_index": 0,
              "para": {
                "prop": {
                  "alignment": "left",
                  "def_run_prop": {
                    "bold": true,
                    "color": "string",
                    "font_ascii": "string",
                    "font_east_asia": "string",
                    "size": 0
                  },
                  "list_string": "string",
                  "outline_level": 0
                },
                "runs": [
                  {
                    "id": "string",
                    "prop": {
                      "bold": true,
                      "color": "string",
                      "font_ascii": "string",
                      "font_east_asia": "string",
                      "size": 0
                    },
                    "text": "string"
                  }
                ]
              },
              "rotate": 0,
              "table": {
                "rows": [
                  {
                    "cells": [
                      {
                        "blocks": [
                          {}
                        ],
                        "col_span": 0,
                        "id": "string",
                        "row_span": 0
                      }
                    ]
                  }
                ]
              },
              "tags": [
                {
                  "name": "string",
                  "value": "string"
                }
              ],
              "textbox": {
                "blocks": [
                  {}
                ]
              },
              "type": "para"
            }
          ],
          "references": [
            {
              "id": "string",
              "type": "run"
            }
          ]
        }
      ],
      "medias": [
        {
          "data": "string",
          "id": "string",
          "url": "string"
        }
      ],
      "prop": {
        "page_count": 0,
        "page_props": [
          {
            "dpi": 0,
            "offset_angle": 0,
            "rotate": 0,
            "size": {
              "height": 0,
              "width": 0
            }
          }
        ]
      },
      "tree": {
        "blocks": [
          {}
        ],
        "children": [
          {}
        ],
        "outline_level": 0
      }
    },
    "doc_url": "string",
    "dst_format": "kdc",
    "file_info": {
      "is_scan": true,
      "sheet_num": 0,
      "total_page_num": 0
    },
    "html": "string",
    "is_partly_exported": true,
    "markdown": "string",
    "plain": "string",
    "src_format": "docx",
    "src_format_detail": "string",
    "version": "string"
  },
  "code": 0,
  "msg": "string"
}
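
To show how the nested structure in the example might be consumed, here is a minimal sketch that collects run text from `data.doc.tree`. It assumes that items in `tree.blocks` have the same shape as the blocks shown under `comments[].blocks`, and that each entry in `tree.children` is a nested node with its own `blocks` and `children`; the example only shows empty objects there, so both are assumptions. All function names are hypothetical.

```python
def iter_block_text(block):
    """Yield run text from one block, descending into tables and textboxes."""
    for run in (block.get("para") or {}).get("runs") or []:
        if run.get("text"):
            yield run["text"]
    for row in (block.get("table") or {}).get("rows") or []:
        for cell in row.get("cells") or []:
            for inner in cell.get("blocks") or []:
                yield from iter_block_text(inner)
    for inner in (block.get("textbox") or {}).get("blocks") or []:
        yield from iter_block_text(inner)


def iter_tree_text(node):
    """Walk a tree node: its own blocks first, then its children in order."""
    for block in node.get("blocks") or []:
        yield from iter_block_text(block)
    for child in node.get("children") or []:
        yield from iter_tree_text(child)


def extract_text(response):
    """Return all run text found under data.doc.tree as a list of strings."""
    data = response.get("data") or {}
    tree = (data.get("doc") or {}).get("tree") or {}
    return list(iter_tree_text(tree))
```

Missing or empty fields are treated as empty, so the placeholder `{}` objects in the example above simply yield no text.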