How to extract text from a PDF file or document

Do you need to extract text from a PDF file but do not know how? This guide is for you. Images, text and fonts can all be pulled out of a PDF file, and free online tools exist for this. If you only want the text, you could import the PDF file into Google Docs and then export it to a friendlier format such as .html, .odt or .rtf.

OCR stands for optical character recognition, a term you have probably heard often. But what if we want to apply this recognition inside a PDF file?

The first thing to do is check whether the file is copy-protected. To do so, open the file and choose Properties from the File menu. If the Copy and Print items are marked as allowed, the file has no protection and extracting the text should cause no problems. If there is no protection and you still cannot select and copy the text, the document was most likely created from plain scanned images.
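The Copy and Print entries shown in the Properties dialog correspond to permission bits stored in the file. As a minimal sketch, assuming you already have the integer permission value (the /P entry of the encryption dictionary, as described in the PDF specification), the flags could be decoded like this. The helper name `decode_permissions` is hypothetical, and reading /P from a real file is left to a PDF library.

```python
# Bit positions follow the user access permissions table of the PDF
# specification. Bits are numbered from 1, so bit 3 has value 4 and
# bit 5 has value 16.
PRINT_BIT = 1 << 2  # bit 3: print the document
COPY_BIT = 1 << 4   # bit 5: copy or extract text and graphics


def decode_permissions(p_value: int) -> dict:
    """Return whether printing and copying are allowed for a /P value.

    Hypothetical helper for illustration; p_value is the signed integer
    stored under /P in the encryption dictionary.
    """
    return {
        "print": bool(p_value & PRINT_BIT),
        "copy": bool(p_value & COPY_BIT),
    }


if __name__ == "__main__":
    # -4 has every permission bit set; -3904 clears both print and copy.
    for value in (-4, -3904):
        print(value, decode_permissions(value))
```

A file with no encryption dictionary has no /P entry at all, which corresponds to the unprotected case described above.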

Before moving on to OCR, check that the selection tool is enabled: right-click anywhere on the PDF page to do so. Once you have selected the text, press CTRL + C to copy it.

With that done, here are some options for running OCR on a PDF.

OCR with Office Lens

Office Lens, the Microsoft Office app, is particularly useful for anyone who often deals with paperwork and wants to run OCR without even using a scanner.

  • Download link | Office Lens for Android
  • Download link | Office Lens for Apple

The Office Lens app is very effective and easy to use: just point the smartphone / tablet camera at the document you want to capture. After capturing, if you choose to save the document as a PDF to OneDrive, its content is automatically run through OCR.

To confirm that everything worked, open the PDF file in Office Online (after signing in to OneDrive) and check that you can now select a piece of text, copy it with CTRL + C and paste it anywhere else without any problem.
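The same "can I select text now?" check can be approximated in a script. The sketch below is only a heuristic, under the assumption that an image-only PDF yields little or no extractable text. The function names and the `min_chars` threshold are hypothetical, and the optional helper assumes the third-party pypdf package is installed.

```python
def has_text_layer(page_texts, min_chars=20):
    """Return True if the extracted page texts contain enough real characters.

    page_texts: one string per page, as returned by a PDF text extractor.
    min_chars: hypothetical threshold; image-only pages usually yield
    nothing or only a few stray characters.
    """
    # Count non-whitespace characters across all pages.
    total = sum(len("".join(text.split())) for text in page_texts)
    return total >= min_chars


def extract_page_texts(pdf_path):
    """Optional helper, assuming the third-party pypdf package is installed."""
    # Imported lazily so has_text_layer() works without pypdf.
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    return [page.extract_text() or "" for page in reader.pages]


if __name__ == "__main__":
    scanned = ["", "  \n"]  # what an image-only PDF tends to give
    ocr_done = ["Invoice number 2041", "Total due: 150.00 EUR"]
    print(has_text_layer(scanned))
    print(has_text_layer(ocr_done))
```

With a real file, `has_text_layer(extract_page_texts("scan.pdf"))` would run the check end to end; `scan.pdf` is a placeholder name.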

Integrated OCR in Office Online

If you already have a PDF document consisting only of images (so you do not need to digitize it as before), you can upload it to OneDrive and then convert it to Word by clicking its name and choosing Edit in Word.

Again, Microsoft handles the conversion automatically; once it is complete, you can open the document in Word Online and use the CTRL + C key combination again to copy the text.

OCR with PDF-XChange Viewer

An alternative to Office Online is the PDF-XChange Viewer program. This application includes a comprehensive OCR module that recognizes the characters inside a PDF and turns them into extractable text.

What makes it a valid (and very effective) tool is the ability to download an Italian dictionary and add it to the program. Note also that the files processed by this program all stay local, so there is no need to rely on the cloud.
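For readers who prefer to script a local, offline OCR pass, the same idea (a local engine plus a downloadable language pack) can be sketched with a different tool, the open-source Tesseract command-line program. This is not PDF-XChange Viewer's OCR module. The sketch assumes that `tesseract` and its Italian language data (`ita`) are installed and that the PDF page has already been exported as an image; the file names and helper names are hypothetical.

```python
import shutil
import subprocess


def build_ocr_command(image_path, output_base, lang="ita"):
    """Build the Tesseract CLI call: tesseract <image> <outputbase> -l <lang>."""
    return ["tesseract", image_path, output_base, "-l", lang]


def run_ocr(image_path, output_base, lang="ita"):
    """Run Tesseract locally; the text is written to <output_base>.txt."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    subprocess.run(build_ocr_command(image_path, output_base, lang), check=True)
    return output_base + ".txt"


if __name__ == "__main__":
    # Hypothetical file names; this only prints the command, nothing is run.
    print(build_ocr_command("page1.png", "page1"))
```

As with the viewer, everything happens on the local machine, so no file is uploaded to the cloud.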

OCR with Microsoft OneNote

Microsoft's well-known OneNote software also lets you run OCR on pages previously scanned with your smartphone. The application is available in the stores at the following links:

  • Download link | OneNote for Android
  • Download link | OneNote for Apple

The operating principle is practically the same as that of Office Lens: use your smartphone / tablet camera to digitize the document and then feed it to the program, which, thanks to its OCR module, makes the text copyable so that it can be extracted.

