Today I will raise the ugly-but-working bar even further by publishing the method I wrote as a follow-up, which basically strips (or skips) all the invalid characters from the resulting XML string, so that it can be loaded into a PHP SimpleXML object.
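To give a rough idea of what "stripping invalid characters" can look like, here is a minimal sketch written in Python rather than PHP. It is not the method mentioned above: the names `sanitize_xml` and `_INVALID_XML_CHARS` are hypothetical, and it assumes the input is supposed to be UTF-8 and that silently dropping anything outside the XML 1.0 character range is acceptable.

```python
import re

# Matches every character that XML 1.0 does NOT allow. The allowed set is:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def sanitize_xml(raw: bytes) -> str:
    """Hypothetical helper: decode bytes as UTF-8 and drop invalid characters."""
    # Step 1: discard byte sequences that are not valid UTF-8.
    text = raw.decode("utf-8", errors="ignore")
    # Step 2: discard code points that XML 1.0 forbids (e.g. NUL, other controls).
    return _INVALID_XML_CHARS.sub("", text)
```

The returned string should then be acceptable to a strict XML parser, as long as the markup itself is well-formed. Keep in mind that dropping bytes silently can hide real data corruption, so this only makes sense in an "ugly-but-working" scenario like the one described here.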
Some days ago I had to write some PHP code to extract the contents of some XML files (Italian electronic invoices for Public Administrations, known in Italy as FatturaPA). The work was pretty simple, yet I had to quickly solve two main problems: extracting the XML data from a digitally signed .xml.p7m file and stripping away some invalid UTF-8 characters from the XML content itself.
As for the P7M part, I was lucky: all the invoices were digitally signed using the CAdES format, which (as you might already know) works by adding a PKCS#7 header and a signature info footer to the original file. This means we can easily get rid of both, as long as we don’t need to check the signature. It’s worth noting that, while this was perfectly fine for my specific scenario, since everything had already been verified, it won’t be the case in most situations, where you do want to check the signature before reading or using the file.
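The header-and-footer idea can be sketched as follows, in Python rather than PHP. This is only an illustration under a few loud assumptions: the function name `extract_xml_from_p7m` is hypothetical, the embedded payload is assumed to start with an `<?xml` declaration and to be stored as plain bytes, and no signature verification of any kind takes place.

```python
import re


def extract_xml_from_p7m(data: bytes) -> bytes:
    """Hypothetical helper: cut the XML payload out of a CAdES-signed blob.

    Assumes the payload starts with '<?xml' and ends with the last closing
    tag found in the file. The signature is NOT verified.
    """
    # Everything before the XML declaration is treated as the PKCS#7 header.
    start = data.find(b"<?xml")
    if start == -1:
        raise ValueError("no XML declaration found")

    # The last closing tag (e.g. </p:FatturaElettronica>) marks the end of the
    # payload; everything after it is treated as the signature footer.
    closing_tags = list(re.finditer(rb"</[A-Za-z_][\w:.\-]*\s*>", data[start:]))
    if not closing_tags:
        raise ValueError("no closing tag found")

    end = start + closing_tags[-1].end()
    return data[start:end]
```

A call such as `extract_xml_from_p7m(open("invoice.xml.p7m", "rb").read())` (the file name is just an example) would return the raw XML bytes. Stray non-XML bytes may still be left inside the payload, which is the second problem mentioned above.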
Visual Basic 6 (VB6) is like a mythical creature: no matter how hard you try to get rid of it, it always comes back to haunt you sooner or later. Most of the time it takes the form of some old, outdated home-made software that you would really like to retire, yet still have lying around somewhere. Today we had another resurrection in our office, involving a very old client-based application that we still use here and there.
The issue we had to face was related to the following URLEncode function hidden within the source code:
How can I get the MIME type from a file extension in C#? This is a rather common question among developers, an evergreen requirement that I happen to hear at least once a year from friends and colleagues working with ASP.NET MVC, ASP.NET Web API and (lately) .NET Core. The reason is pretty obvious: whenever you end up working with file object storage in any web-based or client-based application, you will sooner or later have to retrieve the MIME type of the byte array you’re dealing with.
There are a number of ways to do that but, for the sake of simplicity, we will narrow them down to two: looking the MIME type up in the Windows Registry or relying on static, hard-coded MIME type lists. We won’t consider anything that involves querying an external service, as we want an efficient way to deal with this issue.
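The second approach, a static hard-coded list, boils down to a dictionary lookup on the file extension. Here is a minimal sketch of that idea in Python rather than C#. The table is deliberately tiny and purely illustrative, and the names `MIME_TYPES`, `DEFAULT_MIME_TYPE` and `get_mime_type` are hypothetical.

```python
import os

# Illustrative, deliberately incomplete extension-to-MIME-type table.
MIME_TYPES = {
    ".txt": "text/plain",
    ".html": "text/html",
    ".json": "application/json",
    ".xml": "application/xml",
    ".pdf": "application/pdf",
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
}

# Generic fallback for unknown or missing extensions.
DEFAULT_MIME_TYPE = "application/octet-stream"


def get_mime_type(file_name: str) -> str:
    """Return the MIME type for a file name, based on its extension only."""
    # splitext keeps the leading dot; lower() makes the lookup case-insensitive.
    extension = os.path.splitext(file_name)[1].lower()
    return MIME_TYPES.get(extension, DEFAULT_MIME_TYPE)
```

The lookup needs no I/O and no platform-specific API, which is the main appeal of a static list. The trade-off is that the table has to be maintained by hand, and the result depends on the file name alone, not on the actual content of the byte array.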