ASP.NET - Convert PDF to TXT (Plain-Text) or HTML in C# with iTextSharp

A useful C# code snippet to convert PDF files into TXT plain-text or HTML with iTextSharp, an open-source PDF management library for ASP.NET


Today I had to find a quick way to programmatically convert a bunch of PDF files into plain-text (TXT) format within an ASP.NET web application. Unfortunately, there aren't many open-source libraries that can do that.

After some time struggling with Google, I stumbled upon an old friend of mine - iTextSharp, a great PDF management library for ASP.NET that I used a while ago to fulfill a rather different task involving PDF parsing. Reading the updated SourceForge page, I learned that the (once) open-source code has evolved into a commercial product called iText, available for Java and for .NET through a port of the Java library that is still called iTextSharp. Luckily enough, iText also offers a Community Edition released under an AGPL license model.

Long story short, I installed iTextSharp 5.5.13 from NuGet and used it to put together this simple helper class that extracts the text from any PDF file:
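The following is a minimal sketch of such a helper rather than a definitive implementation. It assumes the iTextSharp 5.5.x NuGet package is referenced; the class name `PdfTextHelper` and the method name `ExtractText` are hypothetical and chosen only for illustration.

```csharp
using System.Text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// Hypothetical helper class: the names are illustrative only.
public static class PdfTextHelper
{
    // Reads the PDF at the given path and returns the text of all its pages.
    public static string ExtractText(string pdfPath)
    {
        var sb = new StringBuilder();
        var reader = new PdfReader(pdfPath);
        try
        {
            // iTextSharp page numbers are 1-based.
            for (int page = 1; page <= reader.NumberOfPages; page++)
            {
                // Use a fresh strategy for each page so that text from
                // one page is not carried over into the next.
                ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
                sb.AppendLine(PdfTextExtractor.GetTextFromPage(reader, page, strategy));
            }
        }
        finally
        {
            // Release the underlying file handle.
            reader.Close();
        }
        return sb.ToString();
    }
}
```

A call such as `string text = PdfTextHelper.ExtractText(@"C:\temp\sample.pdf");` (the path is just an example) would return the extracted text. For documents with multi-column or otherwise complex layouts, swapping in `LocationTextExtractionStrategy` may give a more natural reading order, although results vary from file to file. Scanned PDFs that contain only images will yield little or no text, since no OCR is involved.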

Needless to say, once we extract the plain-text we can easily format and/or style it using some fancy HTML markup in the following way:
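As a rough sketch of that step, the snippet below HTML-encodes the extracted text, wraps each blank-line-separated block in a `<p>` element and turns the remaining line breaks into `<br />` tags. The class name `PlainTextHtmlHelper` and the method name `ToHtml` are hypothetical, and the paragraph-splitting rule is an assumption: real PDF output often needs additional clean-up.

```csharp
using System;
using System.Net;
using System.Text;

// Hypothetical helper class: the names are illustrative only.
public static class PlainTextHtmlHelper
{
    // Converts plain text into simple HTML paragraphs.
    public static string ToHtml(string text)
    {
        if (string.IsNullOrEmpty(text)) return string.Empty;

        // Normalize Windows and old Mac line endings to "\n".
        string normalized = text.Replace("\r\n", "\n").Replace("\r", "\n");

        var sb = new StringBuilder();
        // Assumption: a blank line separates two paragraphs.
        string[] paragraphs = normalized.Split(
            new[] { "\n\n" }, StringSplitOptions.RemoveEmptyEntries);

        foreach (string paragraph in paragraphs)
        {
            string trimmed = paragraph.Trim();
            if (trimmed.Length == 0) continue;

            // Encode first, so that characters such as < and & coming from
            // the PDF cannot be interpreted as markup by the browser.
            string encoded = WebUtility.HtmlEncode(trimmed).Replace("\n", "<br />");
            sb.Append("<p>").Append(encoded).Append("</p>");
        }
        return sb.ToString();
    }
}
```

Encoding before adding any tags is the important design choice here: text extracted from an arbitrary PDF should be treated as untrusted input when it is rendered in a web page.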

That's about it: I sincerely hope that this simple class will help those who are looking for an easy way to convert PDF files into plain-text or HTML.

 

About Ryan

IT Project Manager, Web Interface Architect and Lead Developer for many high-traffic web sites & services hosted in Italy and Europe. Since 2010 he has also been a lead designer for many apps and games for Android, iOS and Windows Phone mobile devices for a number of Italian companies. Microsoft MVP for Development Technologies since 2018.


3 Comments on “ASP.NET - Convert PDF to TXT (Plain-Text) or HTML in C# with iTextSharp”

    1. Hello there,

      this post is specifically about PDF-to-plaintext conversion: if PDF-to-HTML is what you’re looking for, check out the following post:

      • https://www.lifewire.com/pdf-to-html-conversion-tools-3469173
  1. Please give a full example: after converting the PDF text to a string, how can we display it in a rich text editor as proper HTML input in the browser, and then how can we change the content and rewrite the PDF with the new text?
