Get-Content and the -encoding Switch
Today I ran into a funny issue. I was playing around with PowerShell to automate a repetitive task I have to do at work (it is Web UI automation; I’ll post on that later), and when reading an XML file using Get-Content I had encoding issues. When the XML contained a ü character, it was replaced in the output file by junk that immediately tells you something is wrong with the encoding somewhere.
Let’s see this with a little example.
Here is a sample XML file:
<Test> <EvilStringOfDeath><![CDATA[ a'b<'>",!"/%$?$&?%*(()%/"!"/&?%$/"*&$/"?%&?-f¯Ñ112üêù ]]></EvilStringOfDeath> </Test>
OK, the content is ugly, but you get the point. Let’s read this in PowerShell and see what happens:
$xmlData = [xml] (gc .\Input.xml)
And now, let’s see the content of that variable:
Something was lost in translation. You might think that it’s only a display issue, given how the help on Get-Content describes the -encoding switch, but it is not. If you write the content of the $xmlData variable using .Save(), or simply by outputting it to a file (using the > operator), the content of the file will not be valid.
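Here is a minimal sketch of what is probably going on, assuming the file is UTF-8 without a BOM and gets decoded with a single-byte code page instead. ISO-8859-1 is used below only as a stand-in for whatever the system default is, and the variable names are made up for the illustration.

```powershell
# Assumption: the file is UTF-8, but it is decoded with a single-byte code page.
$utf8   = [System.Text.Encoding]::UTF8
$latin1 = [System.Text.Encoding]::GetEncoding('iso-8859-1')  # stand-in for the default code page

# Build the ü character from its code point so the script's own encoding does not matter.
$original = [string][char]0xFC

$bytes   = $utf8.GetBytes($original)    # UTF-8 stores ü as two bytes: 0xC3 0xBC
$misread = $latin1.GetString($bytes)    # the wrong decoder turns each byte into its own character

$bytes.Length      # number of bytes on disk for one character
$misread.Length    # number of characters after the wrong decoding
$misread           # the junk that ends up in the output
```

One character goes in and two come out, which matches the kind of junk seen in the output file.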
To make this work, there is a switch in Get-Content: -encoding. It lets you specify which encoding to use when reading the file. So, using this command:
$xmlData = [xml] (gc .\Input.xml -enc UTF8)
The data should be read properly. Now if I display the content of $xmlData once more, here is what I get:
Perfect, I got what I want. I can now save or output it properly!
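To check the whole round trip without relying on what the console displays, something like the following sketch can help. The temp file names and the shortened sample document are hypothetical stand-ins for Input.xml, and the file is written as UTF-8 without a BOM on the assumption that this is what the original looked like.

```powershell
# Hypothetical temp files; the sample is a simplified stand-in for Input.xml.
$inPath  = Join-Path ([System.IO.Path]::GetTempPath()) 'gc-encoding-in.xml'
$outPath = Join-Path ([System.IO.Path]::GetTempPath()) 'gc-encoding-out.xml'

# Build the ü character from its code point so the script's own encoding does not matter.
$umlaut = [string][char]0xFC
$sample = '<Test><EvilStringOfDeath><![CDATA[112{0}]]></EvilStringOfDeath></Test>' -f $umlaut

# Write the sample as UTF-8 without a BOM, so the reader gets no hint about the encoding.
[System.IO.File]::WriteAllText($inPath, $sample, (New-Object System.Text.UTF8Encoding $false))

# Read it back with the encoding stated explicitly, then cast to XML.
$xmlData = [xml] (Get-Content $inPath -Encoding UTF8)
$text    = $xmlData.SelectSingleNode('/Test/EvilStringOfDeath').InnerText

# Save the document and re-read the raw text to see whether the character survived.
$xmlData.Save($outPath)
$saved = [System.IO.File]::ReadAllText($outPath, [System.Text.Encoding]::UTF8)

$text -eq ('112' + $umlaut)   # is the in-memory value intact?
$saved.Contains($umlaut)      # is the saved file intact?

# Clean up the temp files.
Remove-Item $inPath, $outPath
```

If both checks come back True, the data made it through reading, parsing, and saving. Dropping -Encoding UTF8 from the Get-Content call is a quick way to see whether your own machine reproduces the problem.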
PS: if you do a gm (Get-Member) on $xmlData.Test.EvilStringOfDeath you won’t see the get_innertext() member. I don’t know why, but still, it works. I found it over here.
PS2: I had that issue at work, where I’m running an XP laptop. Now that I’m home on my Windows 7 RC box, the -encoding switch of Get-Content doesn’t even show in the help… There surely is a good explanation, but I don’t have it.