29 May 2009

Today I ran into a funny issue. I was playing around with PowerShell in order to automate a repetitive task I have to do at work (it is Web UI automation, I’ll post on that later), and when reading an XML file using Get-Content I had encoding issues. When the XML contained a ü character, it was replaced in the output file by the kind of junk that immediately tells you something is wrong with the encoding somewhere.

Let’s see this with a little example.

Here is a sample XML file:


OK, the content is ugly, but you get the point. Let’s read this in PowerShell and see what happens:

$xmlData = [xml] (gc .\Input.xml)

And now, let’s see the content of that variable:



Something was lost in translation. You might think that it is only a display issue, given what the help on Get-Content says about the -encoding switch, but it is not. If you write the content of the $xmlData variable using .Save(), or simply by redirecting it to a file (using the > operator), the content of the file will not be valid.
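To see where that junk can come from, here is a minimal sketch. It assumes the file is stored as UTF-8 and that its bytes get decoded with a single-byte code page instead (ISO-8859-1 stands in here for whatever the default ANSI code page is on a given machine; that choice is mine, not something I checked on the original box).

```powershell
# Build the ü character from its code point so this script does not depend
# on the encoding of the script file itself.
$original = [string][char]0xFC

# In UTF-8, ü is stored as two bytes: 0xC3 0xBC.
$utf8Bytes = [System.Text.Encoding]::UTF8.GetBytes($original)

# Decoding those two bytes with a single-byte encoding (assumed here:
# ISO-8859-1) yields two unrelated characters instead of one ü.
$misread = [System.Text.Encoding]::GetEncoding('iso-8859-1').GetString($utf8Bytes)

# Decoding them as UTF-8 gives the original character back.
$decoded = [System.Text.Encoding]::UTF8.GetString($utf8Bytes)

$misread   # Ã¼
$decoded   # ü
```

The important point is that the damage happens while reading: once the two bytes have been turned into two characters, saving the document just writes those wrong characters back out.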

To make this work, there is a switch in Get-Content: -encoding. It lets you specify which encoding to use when reading the file. So, using this command:

$xmlData = [xml] (gc .\Input.xml -enc UTF8)

The data should be read properly. Now if I display the content of $xmlData once more, here is what I get:


Perfect, I got what I wanted. I can now save or output it properly!
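For completeness, here is a rough round-trip sketch of the whole thing. The element names come from my sample, but the file content, the temp-folder paths and the variable names are made up for this illustration. It also shows an alternative I would expect to work: letting XmlDocument.Load() read the file directly, since the XML parser detects the encoding itself (UTF-8 when there is no declaration or byte order mark).

```powershell
# Hypothetical paths in the temp folder. Full paths are used because .NET
# methods such as .Save() and .Load() resolve relative paths against the
# process working directory, not the current PowerShell location.
$inputPath  = Join-Path ([System.IO.Path]::GetTempPath()) 'Input.xml'
$outputPath = Join-Path ([System.IO.Path]::GetTempPath()) 'Output.xml'

# Hypothetical sample content, written as UTF-8 without a byte order mark.
$sample = '<Test><EvilStringOfDeath>' + [char]0xFC + '</EvilStringOfDeath></Test>'
$utf8NoBom = New-Object System.Text.UTF8Encoding $false
[System.IO.File]::WriteAllText($inputPath, $sample, $utf8NoBom)

# Read with an explicit encoding, as in the post, then save.
$xmlData = [xml] (Get-Content $inputPath -Encoding UTF8)
$xmlData.Save($outputPath)

# Alternative: let the XML parser work out the encoding on its own.
$roundTrip = New-Object System.Xml.XmlDocument
$roundTrip.Load($outputPath)

$roundTrip.Test.EvilStringOfDeath   # ü
```

If the saved file comes back with the ü intact, both the read and the write went through with the right encoding.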

PS: if you do a gm (Get-Member) on $xmlData.Test.EvilStringOfDeath you won’t see the get_innertext() member. I don’t know why, but still, it works. I found it over here.

PS2: I had that issue at work, where I’m running an XP laptop. Now that I’m home on my Windows 7 RC box, the -encoding switch of Get-Content doesn’t even show in the help… There surely is a good explanation, but I don’t have it.
