Easily Convert PDF File to Word Using {Convert2Docx} R Package

Ifeanyi Idiaye
4 min read · Jul 13, 2023
[Image: Convert2Docx R package logo]

In this post, I will show you how to convert a PDF file to a Word document using the Convert2Docx R package.

Convert2Docx is a lightweight R package that R developers and researchers can use in their workflow to convert PDF files to Word documents.

I wrote the package as an R wrapper for the pdf2docx Python module, which converts PDF files to Word documents in Python.

The package provides just three conversion functions for turning your PDF file into a Word (DOCX) document.

Now, let us see this package in action.

Install Package

For now, the Convert2Docx R package is only available on GitHub. Therefore, to install it, you will first need to install devtools. In your R console, run the code below to install devtools:

# install devtools
install.packages("devtools")
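
If you run this script more than once, you may prefer to skip the installation when devtools is already present. Here is a minimal sketch of that check; `missing_packages` is a hypothetical helper name of my own and is not part of Convert2Docx or devtools.

```r
# Hypothetical helper (not part of any package):
# return the entries of `pkgs` that are not installed yet.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Usage (commented out so the sketch has no side effects):
# pkgs <- missing_packages("devtools")
# if (length(pkgs) > 0) install.packages(pkgs)
```

The length check matters because `install.packages()` should not be called with an empty vector.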

Now that devtools is installed, we can install the Convert2Docx package from GitHub. Run the code below to install it:

# install Convert2Docx from Github
devtools::install_github("Ifeanyi55/Convert2Docx")

Great! We are now ready to run some code. The next step is to install the conversion engine. Run the code below to install it:

# load the package
library(Convert2Docx)

# install engine
install_engine()

Please note that if you encounter any problem installing the engine, go to your terminal and run:

pip install pdf2docx

Make sure you have an up-to-date version of Python installed on your machine before running that command, though.

After doing that, try installing the conversion engine again, and if all goes well, you should be able to access the full functionality of the package.
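
If you are not sure whether Python is reachable from your machine at all, you can do a quick check from R before retrying. This is only a sketch: `find_python` is a hypothetical helper of my own, and it only looks at your PATH, which may not be the same Python that the package ends up using.

```r
# Hypothetical helper: return the path of the first Python executable
# found on the PATH, or NA if none of the candidates is found.
find_python <- function(candidates = c("python3", "python")) {
  paths <- Sys.which(candidates)   # "" for executables that are not found
  found <- paths[nzchar(paths)]
  if (length(found) == 0) NA_character_ else unname(found[1])
}

python_path <- find_python()
if (is.na(python_path)) {
  message("No Python executable found on the PATH.")
} else {
  message("Python found at: ", python_path)
}
```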

Let us now start converting files!

Convert Entire PDF File

Note that you do not have to read the PDF file into your R environment before you can convert it. All you need to do is specify the path to the file, relative to your current working directory, and run the converter:

# convert entire pdf file
pdf_file <- "TechNation.pdf"

Converter(pdf_file = pdf_file,
          docx_filename = "TechNation.docx")

Now, if you check your current working directory, you should see the converted file there.
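
Since everything depends on file paths relative to the working directory, it can help to check the input file before converting. Here is a minimal sketch using base R only; `docx_name_for` and `check_pdf` are hypothetical helper names of my own, not functions from Convert2Docx.

```r
# Hypothetical helper: derive a .docx file name from a PDF path.
docx_name_for <- function(pdf_file) {
  sub("\\.pdf$", ".docx", basename(pdf_file), ignore.case = TRUE)
}

# Hypothetical helper: stop early with a clear message if the PDF is missing.
check_pdf <- function(pdf_file) {
  if (!file.exists(pdf_file)) {
    stop("PDF not found: ", pdf_file, " (working directory: ", getwd(), ")")
  }
  invisible(pdf_file)
}

docx_name_for("TechNation.pdf")
```

You could call `check_pdf(pdf_file)` before the converter, and `file.exists(docx_name_for(pdf_file))` afterwards to confirm that the Word file was written.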

Here is the PDF file:

[Image: PDF file to be converted to Word]

Here is the converted Word file:

[Image: converted Word document]

Nice! Let us explore this package further by converting from one page to another.

Convert From One Page to the Other

Here, we will use another function in the package to convert pages 3 to 5 of the PDF file to Word:

# convert from one page to the other
pdf_file <- "TechNation.pdf"

startANDend(pdf_file = pdf_file,
            docx_filename = "threetofive.docx",
            start = 2,
            end = 5)

Note that the pages of some PDF files might not be correctly numbered.

Therefore, once the conversion to Word is done, especially when converting selected pages, you may find that the page numbering is slightly different from what you expect.

In the example above, page 3 of the PDF file starts at page 2, which is why you see “start = 2” in the code instead of 3.

However, this “problem” does not occur when you convert the entire document to Word as demonstrated earlier.
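
One possible explanation for the offset is page indexing: the Python API of pdf2docx documents its `start` value as zero-based and its `end` value as exclusive. Whether Convert2Docx passes these values through unchanged is an assumption on my part, so treat the sketch below as a convenience under that assumption. `page_range_args` is a hypothetical helper and not part of the package.

```r
# Hypothetical helper: translate a 1-based, inclusive page range into
# start/end values, ASSUMING start is zero-based and end is exclusive.
page_range_args <- function(first_page, last_page) {
  stopifnot(first_page >= 1, last_page >= first_page)
  list(start = first_page - 1, end = last_page)
}

# Pages 3 to 5 give start = 2 and end = 5, matching the example above.
args <- page_range_args(3, 5)
str(args)
```

If the converted pages do not match what you expect, check the output and adjust the values by hand.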

Now, let us convert selected pages in the PDF file.

Convert Selected Pages

To do this, you will need to pass a numeric vector representing the pages of the PDF file you want to convert:

# convert selected pages
pdf_file <- "TechNation.pdf"

selectPages(pdf_file = pdf_file,
            docx_filename = "selected.docx",
            pages = c(2, 4, 6))

In the above code, I selected pages 2, 4, and 6 from the PDF file to be converted to Word, and it did a good job.

As you convert, don’t forget to check in your current working directory for the converted files.

So, there you have it! Now you can easily convert your PDF files to Word documents in the R programming language thanks to the Convert2Docx R package. 😃

Please do not forget to give the Convert2Docx R package’s GitHub repository a ⭐️ if you find the package helpful.

I hope you enjoyed reading this post.

Follow me on Medium: @ifeanyidiaye

Follow me on Twitter: @Ifeanyidiaye

Ifeanyi Idiaye

I am a data scientist who is passionate about AI and building AI-based solutions. I am also a data analytics developer and writer for Statistics Globe.