PHP – How to strip P7M data from an XML.P7M file or string (CAdES, FatturaPA)


A few days ago I had to write some PHP code to extract the contents of some XML files (Italian electronic invoices for Public Administrations, known in Italy as FatturaPA). The work was pretty simple, yet I had to quickly solve two main problems: extracting the XML data from a digitally signed .xml.p7m file, and stripping away some invalid UTF-8 characters from the XML content itself.

Since I had to get the job done quickly, I dealt with both tasks using quick’n’dirty workarounds, taking full advantage of the famous PHP “double-clawed hammer” features: we’ll deal with the first one here, while the second is addressed in another dedicated post.

Image credit: Ian Baker, for this awesome handmade work. See his project on Flickr: https://www.flickr.com/photos/raindrift/sets/72157629492908038

As for the P7M files, I was lucky: all the invoices were digitally signed using the CAdES format, which, as you might already know, works by adding a PKCS#7 header and a signature info footer around the original file. This means we can easily get rid of both, as long as we don’t need to check the signature. That was perfectly fine for my specific scenario, since everything had already been verified, but it may not be the case in most situations, where you do want to check the signature before reading or using the file.

That said, here’s the code that I came up with:

The underlying logic is simple enough to need little explanation: we just strip everything positioned before the opening XML tag (the PKCS#7 header) and after the last XML closing tag (the signature info footer). It’s quite ugly, I admit, yet it gets the job done. I would be happy to replace it with better code as soon as I find the time.
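The stripping logic just described can be sketched as follows. This is a minimal illustration, not the original code, and `strip_p7m_data` is a hypothetical name. It assumes the .p7m file is in binary (DER) form rather than Base64, that the XML starts with an `<?xml` declaration, and that the signing tool stored the XML as one contiguous block (some tools may split the content into chunks, which this approach cannot handle). Like the approach above, it does not verify the signature.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: returns the XML payload embedded in a CAdES-signed
 * .xml.p7m blob, or null if it cannot be located.
 * It does NOT verify the signature.
 */
function strip_p7m_data(string $data): ?string
{
    // Everything before the XML declaration is the PKCS#7 header.
    $start = strpos($data, '<?xml');
    if ($start === false) {
        return null;
    }
    $xml = substr($data, $start);

    // The first start tag after the declaration is the root element
    // (e.g. "p:FatturaElettronica"); processing instructions and
    // comments/doctype (starting with "<?" or "<!") do not match.
    if (preg_match('/<([A-Za-z_][\w.\-]*(?::[A-Za-z_][\w.\-]*)?)[\s\/>]/', $xml, $m) !== 1) {
        return null;
    }

    // Everything after the root element's last closing tag is the signature info footer.
    $closingTag = '</' . $m[1] . '>';
    $end = strrpos($xml, $closingTag);
    if ($end === false) {
        return null;
    }

    return substr($xml, 0, $end + strlen($closingTag));
}

// Example usage (the file name is just a placeholder):
// $xml = strip_p7m_data(file_get_contents('invoice.xml.p7m'));
```

Looking up the root element name first, instead of cutting at the last `>` byte, avoids stopping inside the binary footer, which may contain stray `>` characters.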

A common use case for such a function is a server-side script that receives a multipart POST request containing the XML.P7M file as a parameter, just like in the following example:

… and so on.
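A minimal sketch of such a receiving script might look like this. The form field name `invoice` and the helpers `extract_fattura_xml` and `read_invoice_from_upload` are hypothetical. To keep the snippet self-contained it uses a compact, FatturaPA-specific variant of the stripping logic, assuming the root element is `FatturaElettronica` (with or without a namespace prefix) and that the file starts with an `<?xml` declaration; the signature is not verified.

```php
<?php
declare(strict_types=1);

// Compact, FatturaPA-specific stripping (hypothetical helper): keeps everything
// from the XML declaration up to the last closing FatturaElettronica tag,
// with or without a namespace prefix (e.g. "</p:FatturaElettronica>").
function extract_fattura_xml(string $data): ?string
{
    $pattern = '/<\?xml.*<\/(?:[\w.\-]+:)?FatturaElettronica>/s';
    return preg_match($pattern, $data, $m) === 1 ? $m[0] : null;
}

// Reads one uploaded file entry (an element of $_FILES) and parses it with SimpleXML.
function read_invoice_from_upload(array $file): ?SimpleXMLElement
{
    if (($file['error'] ?? UPLOAD_ERR_NO_FILE) !== UPLOAD_ERR_OK
        || !is_string($file['tmp_name'] ?? null)) {
        return null;
    }
    $data = file_get_contents($file['tmp_name']);
    $xml = $data === false ? null : extract_fattura_xml($data);
    if ($xml === null) {
        return null;
    }
    libxml_use_internal_errors(true); // collect XML errors instead of emitting warnings
    $doc = simplexml_load_string($xml);
    return $doc === false ? null : $doc;
}

// Entry point: runs only when serving a real POST request, not from the CLI.
if (PHP_SAPI !== 'cli' && ($_SERVER['REQUEST_METHOD'] ?? '') === 'POST') {
    $file = $_FILES['invoice'] ?? null; // "invoice" is a hypothetical field name
    if (!is_array($file) || !is_string($file['tmp_name'] ?? null)
        || !is_uploaded_file($file['tmp_name'])) {
        http_response_code(400);
        exit('Missing invoice upload');
    }
    $invoice = read_invoice_from_upload($file);
    if ($invoice === null) {
        http_response_code(422);
        exit('Unreadable invoice');
    }
    // From here on, read whatever you need from $invoice (e.g. via $invoice->xpath()).
    echo 'Parsed ' . $invoice->getName();
}
```

Keeping the parsing in a function that takes a plain array makes it easy to exercise from the command line with a fake `$_FILES` entry, while `is_uploaded_file()` guards the real request path.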

Use it with caution, and… happy parsing!


About Ryan

IT Project Manager, Web Interface Architect and Lead Developer for many high-traffic web sites & services hosted in Italy and Europe. Since 2010 he has also been a lead designer of many apps and games for Android, iOS and Windows Phone mobile devices for a number of Italian companies.
