DOCX to PDF and HTML to PDF conversion issue

We are using the community version of the ONLYOFFICE API. When uploading a .docx file and converting it to PDF via the ONLYOFFICE conversion tool, the output PDF sometimes contains corrupted or unreadable content. This happens even with simple, valid .docx files.



Hello,
Please provide the following information:

  1. What version of Document Server are you using?
  2. What is the type of the Document Server installation? (deb/rpm, Docker, Windows Server)
  3. How are you performing the conversion? Please describe the steps in detail.
  4. Attach the original file that reproduces the issue.

1- Version: 8.3.3.18
2- Docker
3- We are following the steps below for document conversion:

a- We use the following data as the payload of a JWT token:
HashMap<String, Object> map = new HashMap<>();
map.put("async", false);
map.put("filetype", fromFileType);
map.put("key", "output_" + fromFileType + "_to_" + toFileType + "_" + System.currentTimeMillis());
map.put("outputtype", toFileType);
map.put("title", docName);
map.put("url", documentURL);

b- documentURL - /rest/only-office-service/files/175421734/contents

While fetching the document URL:
response.addHeader("Content-Disposition", "attachment;filename=" +
        new String(filename.getBytes("utf-8"), "ISO-8859-1"));
response.addHeader("Content-Length", String.valueOf(docAttachMpg.getDocSize()));
StreamUtils.copy(fis, response.getOutputStream());
response.setContentType(MediaType.APPLICATION_OCTET_STREAM_VALUE);

c- REQUEST METHOD- POST
REQUEST URL - http://document-converter.oswas.gov.in/converter
REQUEST BODY - {
"token": "eyJhbGciOiJIUzI1NiJ9.eyJhc3luYyI6ZmFsc2UsImZpbGV0eXBlIjoiZG9jeCIsIm91dHB1dHR5cGUiOiJwZGYiLCJ0aXRsZSI6IkxldHRlcihEcmFmdC0yKS5wZGYiLCJleHAiOjE3NTMxNzg1NzgsImtleSI6Im91dHB1dF9kb2N4X3RvX3BkZl8xNzUzMTc4MjY2NzYyIiwidXJsIjoiL3Jlc3Qvb25seS1vZmZpY2Utc2VydmljZS9maWxlcy8xNzU0MjE3MzQvY29udGVudHMifQ.0Ggw_EpjnYOtxRb-53QVWYSHMSFBGXRoaHU_vSSEUKU"
}
4- We are unable to upload the original document; the forum shows "new user can not reply".
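
For readers following steps a and c, here is a minimal sketch of how a payload like the one in step a could end up as the token sent in step c. It assumes HS256 signing, which matches the header of the token above (`eyJhbGciOiJIUzI1NiJ9` is the base64url form of `{"alg":"HS256"}`). The class name, the `sign` helper, and the secret value are hypothetical and not part of any ONLYOFFICE SDK; the secret would have to match whatever the Document Server is configured with, and a real integration would more likely rely on a JWT library.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper for illustration only.
class ConversionTokenSketch {

    // JWT segments use base64url without padding.
    private static final Base64.Encoder B64 = Base64.getUrlEncoder().withoutPadding();

    /** Signs an already-serialized JSON payload as a compact HS256 JWT. */
    static String sign(String payloadJson, String secret) {
        String header = B64.encodeToString("{\"alg\":\"HS256\"}".getBytes(StandardCharsets.UTF_8));
        String payload = B64.encodeToString(payloadJson.getBytes(StandardCharsets.UTF_8));
        String signingInput = header + "." + payload;
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
            byte[] signature = mac.doFinal(signingInput.getBytes(StandardCharsets.US_ASCII));
            return signingInput + "." + B64.encodeToString(signature);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 is unavailable or the key is invalid", e);
        }
    }

    public static void main(String[] args) {
        // Placeholder payload and secret; the url value is an assumed absolute address.
        String payload = "{\"async\":false,\"filetype\":\"docx\",\"outputtype\":\"pdf\","
                + "\"url\":\"https://files.example.com/sample.docx\"}";
        System.out.println(sign(payload, "placeholder-secret"));
    }
}
```

The resulting string is what goes into the "token" field of the request body in step c.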

You now have permission to upload files. Please share the original document. Also, note that the version you are using is outdated; please update to the latest version (9.0.3) and check whether the issue still occurs.

Thanks, Dmitrii. We are trying the latest version pulled from Docker. Could you share your thoughts on whether the async parameter could be the cause? We currently send "async": false, but we receive multiple concurrent conversion requests at this endpoint. Does the community version's limit of 20 connections also apply to the conversion endpoint? We call the /converter endpoint as a REST call from our main app.
We use this implementation only to convert HTML and DOCX to PDF; we do not use this service to open documents in edit mode.

The number of requests should not cause document corruption.
Please provide the original document so that we can check on our side. But first, make sure that you are actually testing on the latest version. You can verify it here:
info

We are using the API with the /converter endpoint for PDF conversion. This process is handled in the API layer, not on the frontend.

REQUEST METHOD - POST

REQUEST URL - http://document-converter.oswas.gov.in/converter

REQUEST BODY -
HashMap<String, Object> map = new HashMap<>();
map.put("async", false);
map.put("filetype", fromFileType);
map.put("key", "output_" + fromFileType + "_to_" + toFileType + "_" + System.currentTimeMillis());
map.put("outputtype", toFileType);
map.put("title", docName);
map.put("url", documentURL);

The ONLYOFFICE version we are using is "version": "9.0.3.29".

Please provide the original file for analysis as well. All sensitive data can be erased or substituted, as long as the problem remains reproducible.

Please find the attachment with erased customer data.
onlyoffice.docx (10.4 KB)

We are also unable to upload the .html file. We are facing this issue only in our production environment.
Please provide a resolution as soon as possible so that we can proceed.

We are awaiting your response; kindly respond to our concerns.

The file provided above cannot be opened with any editor; it appears to be corrupted.

Thanks for looking into this. The file does not appear to be corrupted, since we can successfully convert the same document to PDF using the ONLYOFFICE API in our test and pre-production environments. The same file also converts to PDF with LibreOffice. The issue occurs only in production when using the ONLYOFFICE API, where the converted PDF contains unreadable, gibberish content.

Could you please help us confirm if there are any known discrepancies or additional prerequisites for document-to-PDF conversion in production environments?

Hello @oswas

It depends on the infrastructure of your environments. Basically, Document Server in Docker has the same dependencies inside in every environment; the only difference could be in the services that work with Document Server in your test and production environments.


In the meantime, I have checked the provided file, and one particular XML file is missing from it. How was it created initially? I have also tried opening the attached file in various processors, and none of them managed to open it. How do you open it?

Also, checking the JWT from the provided sample, I noticed that the url parameter has the following value:

 "url": "/rest/only-office-service/files/175421734/contents"

Can you please make sure that the provided link actually points to a file for download and not a stream of the file? It must be an absolute URL to the file, not a stream.
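
The check described above can be reproduced locally. The following is a minimal sketch that decodes the payload segment of a compact JWT without verifying the signature (inspection only) and tests whether a `url` value is an absolute HTTP(S) address. The class and method names are hypothetical, and the sketch does not confirm how Document Server itself validates the parameter.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper for illustration only.
class TokenInspectSketch {

    /** Returns the decoded JSON payload of a compact JWT; the signature is NOT verified. */
    static String decodePayload(String jwt) {
        String[] parts = jwt.split("\\.");
        if (parts.length != 3) {
            throw new IllegalArgumentException("not a compact JWT (expected three segments)");
        }
        return new String(Base64.getUrlDecoder().decode(parts[1]), StandardCharsets.UTF_8);
    }

    /** True only when the value has an http or https scheme and a host. */
    static boolean isAbsoluteHttpUrl(String value) {
        try {
            URI uri = new URI(value);
            String scheme = uri.getScheme();
            boolean httpLike = "http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme);
            return httpLike && uri.getHost() != null;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The url value quoted in this thread has no scheme and no host.
        String fromThread = "/rest/only-office-service/files/175421734/contents";
        System.out.println(fromThread + " -> absolute: " + isAbsoluteHttpUrl(fromThread));
    }
}
```

Running `decodePayload` on the token from the request body and passing the `url` claim to `isAbsoluteHttpUrl` before sending the request would flag a relative path like the one quoted above.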