DocumentBuilder returns empty JSON string for caption

DocumentBuilder version: document-builder 8.3.3.22
Installation method: document-builder · PyPI
OS: macOS (ARM)

Additional information:
The ToJSON method returns an empty string for a caption paragraph. The following code reproduces the result.

# Initialize document builder
from docbuilder import docbuilder

builder = docbuilder.CDocBuilder()

# Open the existing DOCX file
builder.OpenFile("好的.docx", "")

# Get the document context
context = builder.GetContext()
global_obj = context.GetGlobal()
api = global_obj["Api"]

# Get the document
document = api.Call("GetDocument")

# Get element count
elements_count = document.Call("GetElementsCount")
element_count_val = elements_count.ToInt() if elements_count else 0

for i in range(element_count_val):
    element = document.Call("GetElement", i)
    print(f"Processing element {i + 1}/{element_count_val}")
    text = element.Call("GetText").ToString()
    print(f"Element Text: {text}")
    element_json = element.Call("ToJSON", True, True)
    print(element_json.ToString())
    print("=" * 80)

The output is:

Processing element 1/11
Element Text: 标题1

{"bFromDocument":true,"pPr":{"bFromDocument":true,"type":"paraPr","pStyle":"711"},"rPr":{"lang":{"eastAsia":"zh-CN","val":"en-US"},"rFonts":{"hint":"eastAsia"},"bFromDocument":true,"type":"textPr"},"content":[{"..."}}}
================================================================================
Processing element 2/11
Element Text: 标题2

{"bFromDocument":true,"pPr":{"bFromDocument":true,"type":"paraPr","pStyle":"703"},}}}
================================================================================
Processing element 3/11
Element Text: 正文正文正文正文正文正文正文正文正文正文正文正文正文正文正文正文正文正文正文正文正文正文正文正文正文正文正文正文正文正文正文正文正文正文正文正文正文正文正文正文正文正文正文正文正文正文正文正文正文

{"bFromDocument":true,"pPr":{"bFromDocument":true,"type":"paraPr"},"rPr":{"lang":{"eastAsia":"zh-CN","val":"en-US"},"rFonts":{"hint":"eastAsia"},"bFromDocument":true,"type":"textPr"},"content":[{"bFromDocument":true,"rPr":{"lang":{"eastAsia":"zh-CN","val":"en-US"},"rFonts":{"hint":"eastAsia"},"bFromDocument":true,"type":"textPr"},"content":["正文正文正文正文正文正文正文正文正文正文正文正文正文正文正文正文正文正文正文正文正文正文正文正文正文正文正文正文正文正文正文正文正文正文正文正文正文正文正文正文正文正文正文正文正文正文正文正文正文"],"footnotes":[],"endnotes":[],"type":"run"},{"bFromDocument":true,"rPr":{"lang":{"eastAsia":"zh-CN","val":"en-US"},"rFonts":{"hint":"eastAsia"},"bFromDocument":true,"type":"textPr"},"content":[],"footnotes":[],"endnotes":[],"type":"endRun"}],"changes":[],"type":"paragraph"}
================================================================================
Processing element 4/11
Element Text: 1)	12312312312312

{"bFromDocument":true,"pPr":{"bFromDocument":true,"type":"paraPr","pStyle":"710"},"}}
================================================================================
Processing element 5/11
Element Text: 2)	12312321322222

{"bFromDocument":true,"pPr":{"bFromDocument":true,"type":"paraPr","pStyle":"710"},"rPr":{"l}}}
================================================================================
Processing element 6/11
Element Text: 3)	21312312312222

{"bFromDocument":true,"pPr":{"bFromDocument":true,"type":"paraPr","pStyle":"710"},"rPr":{"}}}
================================================================================
Processing element 7/11
Element Text: 

{"bFromDocument":true,"pPr":{"bFromDocument":true,"type":"paraPr"},"rPr":{"lang":{"eastAsia":"zh-CN","val":"en-US"},"rFonts":{},"bFromDocument":true,"type":"textPr"},"content":[{"bFromDocument":true,"rPr":{"lang":{"eastAsia":"zh-CN","val":"en-US"},"rFonts":{},"bFromDocument":true,"type":"textPr"},"content":[],"footnotes":[],"endnotes":[],"type":"run"},{"bFromDocument":true,"rPr":{"lang":{"eastAsia":"zh-CN","val":"en-US"},"rFonts":{},"bFromDocument":true,"type":"textPr"},"content":[],"footnotes":[],"endnotes":[],"type":"endRun"}],"changes":[],"type":"paragraph"}
================================================================================
Processing element 8/11
Element Text: 
{"bPresentation":false,"tblGrid":[{"w":2899,"type":"gridCol"},{""}}}
================================================================================
Processing element 9/11
Element Text: 

{"bFromDocument":true,"pPr":{"bFromDocument":true,"type":"paraPr"},"rPr":{"lang":{"eastAsia":"zh-CN","val":"en-US"},"rFonts":{},"bFromDocument":true,"type":"textPr"},"content":[{"bFromDocument":true,"rPr":{"lang":{"eastAsia":"zh-CN","val":"en-US"},"rFonts":{},"bFromDocument":true,"type":"textPr"},"content":[],"footnotes":[],"endnotes":[],"type":"run"},{"bFromDocument":true,"rPr":{"lang":{"eastAsia":"zh-CN","val":"en-US"},"rFonts":{},"bFromDocument":true,"type":"textPr"},"content":[],"footnotes":[],"endnotes":[],"type":"endRun"}],"changes":[],"type":"paragraph"}
================================================================================
Processing element 10/11
Element Text: 表 1 test test


================================================================================
Processing element 11/11
Element Text: 

{"bFromDocument":true,"pPr":{"bFromDocument":true,"type":"paraPr"},"rPr":{"lang":{"eastAsia":"zh-CN","val":"en-US"},"rFonts":{"hint":"eastAsia"},"bFromDocument":true,"type":"textPr"},"content":[{"bFromDocument":true,"rPr":{"lang":{"eastAsia":"zh-CN","val":"en-US"},"rFonts":{"hint":"eastAsia"},"bFromDocument":true,"type":"textPr"},"content":[],"footnotes":[],"endnotes":[],"type":"run"},{"bFromDocument":true,"rPr":{"lang":{"eastAsia":"zh-CN","val":"en-US"},"rFonts":{"hint":"eastAsia"},"bFromDocument":true,"type":"textPr"},"content":[],"footnotes":[],"endnotes":[],"type":"endRun"}],"changes":[],"type":"paragraph"}
================================================================================

The ToJSON result for the paragraph “表 1 test test” (element 10) is empty. Note: I truncated some of the JSON strings in the output above because they are too long.
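To spot such elements without reading the whole output, a small check on the string returned by ToJSON can help. This is only a sketch: `classify_json_output` and `find_problem_elements` are hypothetical helper names, not part of the docbuilder API, and they work on plain Python strings (for example the value of `element_json.ToString()` from the script above).

```python
import json


def classify_json_output(raw):
    """Classify a string returned by ToJSON as 'empty', 'invalid', or 'ok'.

    Hypothetical helper: it only inspects the string and knows nothing
    about docbuilder itself.
    """
    if raw is None or not raw.strip():
        return "empty"
    try:
        json.loads(raw)
    except json.JSONDecodeError:
        return "invalid"
    return "ok"


def find_problem_elements(outputs):
    """Given (index, json_string) pairs, return (index, status) for every
    element whose JSON string is empty or cannot be parsed."""
    problems = []
    for index, raw in outputs:
        status = classify_json_output(raw)
        if status != "ok":
            problems.append((index, status))
    return problems
```

In the loop above, one could collect `(i, element_json.ToString())` for each element and pass the list to `find_problem_elements` after the loop, so that only the affected element indexes are printed.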

好的.docx (14.0 KB)

Hey @lotus,

Thank you for reporting the issue with DocumentBuilder.

  1. We recommend starting by updating to the latest version of DocumentBuilder, 9.0.4, which was recently released. This update may address the issue you’re experiencing.
  2. If you have a valid license, please consider reaching out by email for faster and more personalized support.
  3. After updating, we’ll assist you further in troubleshooting. There’s a possibility that the ToJSON method does not properly handle captions (such as “表 1 test test”), resulting in an empty string.

Please update to version 9.0.4 and let us know the results, sharing any new logs or details if the problem persists.

Thank you for your reply.

I tried the updated package, and now there is a different problem: with the same script I posted, the GetElementsCount method returns 0. I also tried other files, with the same result.

My purchase is still being processed, so I can’t use the faster support channel yet. Thank you for your kindness.

Sorry, I made a mistake. It works now. Thank you very much.
