C# - Find whether a PDF contains images or text

A quick and simple way (with free code sample) to check if a PDF file contains images and/or text using the Syncfusion PDF class library for .NET


If you often work with PDF files, you've probably already heard of Syncfusion Essential Studio, a .NET-based software product providing solutions to most of the complex problems faced during application development. Among its many components, the product includes a neat PDF framework: a feature-rich .NET PDF class library written in 100% managed C# code. It can be used to create, read and write PDF files in Windows Forms, WPF, ASP.NET Web Forms, ASP.NET MVC, ASP.NET Core, Blazor, UWP, Xamarin and Flutter applications, as well as on the Unity platform, without any external dependency (no Adobe Acrobat required).

I recently used Syncfusion's PDF framework to solve a task I was given a few days ago: determining whether a bunch of PDF files contain images, text, or both. Needless to say, there were a lot of files (more than 500K), so this couldn't be done manually. In this post, I'll share the source code I used to deal with this issue.

Getting the Packages

Let's start with the NuGet packages I used to get the job done.

The first package contains the base classes to handle PDF files, while the second contains specific modules and extension methods to deal with PDF-embedded images.

For both packages I used version 20.4.0.44, which was the latest at the time of writing (and fully compatible with .NET 6 and .NET 7): feel free to update to a newer version!

Creating the App

The next thing I did was create a simple .NET 6 console application using Visual Studio 2022's standard C# Console App template, as shown in the screenshot below.

[Screenshot: creating the project with the C# Console App template in Visual Studio 2022]

I chose a console app since I didn't need a user interface to do the job: however, the core part of the source code you will find in this post can also be used within any ASP.NET Core web app, WPF app, Web Forms app, and so on.

Source Code

Without further ado, here's the source code:

IMPORTANT: for the latest version of the source code, check out the PDFInspector project page on GitHub.

As we can see, the code is quite simple to understand. Here's what we are doing in a nutshell:

  • Retrieve a list of the PDF file paths.
  • Cycle through each one of them.
  • Use Syncfusion PDF to extract the character count and/or the image file count from each PDF page.
  • Use the above counters to determine if the PDF contains text and/or images (or none).
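The last two steps can be sketched without any PDF library at all. The snippet below is only an illustration of the counting logic, not the actual listing: the GetPdfType helper, the label strings and the hard-coded per-page counters are all assumptions made for the example. In the real app, the counters would come from the PDF library.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: maps the two counters to a label.
// The label strings are made up for this sketch.
static string GetPdfType(int charCount, int imageCount)
{
    bool hasText = charCount > 0;
    bool hasImages = imageCount > 0;

    if (hasText && hasImages) return "TextAndImages";
    if (hasText) return "Text";
    if (hasImages) return "Images";
    return "Empty";
}

// Stand-in data: (characters, images) found on each page of one PDF.
// In the real app these numbers would be extracted by the PDF library.
var pages = new List<(int Chars, int Images)> { (1200, 0), (0, 2), (35, 1) };

// Sum the per-page counters into document-level totals.
int totalChars = 0, totalImages = 0;
foreach (var page in pages)
{
    totalChars += page.Chars;
    totalImages += page.Images;
}

string pdfType = GetPdfType(totalChars, totalImages);
Console.WriteLine(pdfType); // prints "TextAndImages"
```

Whether whitespace-only text should count as "text" is a judgment call: trimming the extracted string before counting its characters is one way to avoid flagging scanned pages that only carry a few stray spaces.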

The overall outcome is stored in the pdfType local variable: in the example above I used a string, but you could replace it with an enum, a const value, or anything else you might want to use instead.
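For instance, a flags enum could replace the string. This is just one possible shape, and the PdfContent name and its members are hypothetical (they are not part of the original code):

```csharp
using System;

// Sample counters, hard-coded for the sake of the example.
int charCount = 42, imageCount = 0;

// Start from "nothing found" and switch on one flag per kind of content.
PdfContent content = PdfContent.None;
if (charCount > 0) content |= PdfContent.Text;
if (imageCount > 0) content |= PdfContent.Images;

Console.WriteLine(content);                            // prints "Text"
Console.WriteLine(content.HasFlag(PdfContent.Images)); // prints "False"

// Hypothetical enum: each member is a distinct bit, so values can be combined.
[Flags]
enum PdfContent
{
    None = 0,
    Text = 1,
    Images = 2
}
```

A flags enum keeps the "text and images" case as a combination of the two basic values, so there's no need for a dedicated fourth label.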

As you can see, I have placed some TODO comments to highlight the source code lines where you need to add your own stuff, such as adding the Syncfusion License Key, retrieving the list of PDF file paths, doing something once the pdfType has been determined, and so on.
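As an example of how the "retrieve the list of PDF file paths" TODO might be filled in, here's a minimal sketch that assumes all the files live under a single root folder. The GetPdfPaths helper and the temporary demo folder are made up for the example. It relies on Directory.EnumerateFiles, which yields the paths lazily instead of loading them all up front: a handy feature when dealing with hundreds of thousands of files.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: lazily yields every *.pdf path under rootFolder.
static IEnumerable<string> GetPdfPaths(string rootFolder)
{
    var options = new EnumerationOptions
    {
        RecurseSubdirectories = true,              // look into subfolders too
        MatchCasing = MatchCasing.CaseInsensitive, // also match ".PDF"
        IgnoreInaccessible = true                  // skip folders we cannot read
    };
    return Directory.EnumerateFiles(rootFolder, "*.pdf", options);
}

// Demo: build a small throwaway folder tree, then enumerate it.
string root = Path.Combine(Path.GetTempPath(), "pdf-demo-" + Guid.NewGuid().ToString("N"));
Directory.CreateDirectory(Path.Combine(root, "sub"));
File.WriteAllText(Path.Combine(root, "a.pdf"), "");
File.WriteAllText(Path.Combine(root, "sub", "b.PDF"), "");
File.WriteAllText(Path.Combine(root, "notes.txt"), "");

int count = 0;
foreach (string path in GetPdfPaths(root))
{
    count++;
}
Console.WriteLine(count); // prints 2 (notes.txt is skipped)

// Clean up the demo folder.
Directory.Delete(root, recursive: true);
```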

Syncfusion License Key

The Syncfusion Essential Studio license key can be purchased from the Syncfusion official website. The product is quite expensive, but there's some great news for you: the company offers a FREE community license for all companies and individuals with less than $1 million USD in annual gross revenue and 5 or fewer developers. That's precisely the license I got (since I am poor enough to be eligible!), thus getting the entire product line (worth more than $12K!) at no cost. If you are eligible as well, I strongly suggest you do the same!

Conclusion

That's it, at least for now: I hope that my source code sample will help other .NET developers looking for a way to determine whether a PDF file contains images and/or text!


About Ryan

IT Project Manager, Web Interface Architect and Lead Developer for many high-traffic websites & services hosted in Italy and Europe. Since 2010 he has also been a lead designer for many apps and games for Android, iOS and Windows Phone mobile devices for a number of Italian companies. Microsoft MVP for Development Technologies since 2018.
