PowerShell: Get-FileEncoding
I often work in PowerShell, and one day I needed to write a script that would determine the encoding of a file.
Encodings
However, this proved to be difficult since most encodings don’t require a BOM (Byte Order Mark). Here’s some good information that I found on the subject:
Automatically determining the correct encoding for a given byte array is notoriously difficult. Sometimes, to be helpful, the author of the data will insert something called a BOM (Byte Order Mark) at the beginning of the data. If a BOM is present, that makes detecting the encoding painless, since each encoding uses a different BOM.
However, the problem remains, how do you automatically detect the correct encoding when there is no BOM? Technically it’s recommended that you don’t place a BOM at the beginning of your data when using UTF-8, and there is no BOM defined for any of the ANSI code pages. So it’s certainly not out of the realm of possibility that a text file may not have a BOM. If all the files that you deal with are in English, it’s probably safe to assume that if no BOM is present, then UTF-8 will suffice. However, if any of the files happen to use something else, without a BOM, then that won’t work.
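To make the passage above concrete, here is a small sketch that prints the BOM bytes .NET reports for a few common encodings. The helper name Get-BomHex and the labels in the table are made up for this illustration; GetPreamble() is the standard System.Text.Encoding method.

```powershell
# Hypothetical helper (not part of the script in this post):
# format an encoding's BOM (its "preamble" in .NET terms) as hex.
function Get-BomHex {
    param ([System.Text.Encoding]$Encoding)
    $preamble = $Encoding.GetPreamble()
    if ($preamble.Length -eq 0) {
        # Encodings such as ASCII define no BOM at all.
        return '(none)'
    }
    ($preamble | ForEach-Object { $_.ToString('X2') }) -join ' '
}

# A handful of encodings to compare; the labels are only for display.
$encodings = [ordered]@{
    'UTF-8'     = [System.Text.Encoding]::UTF8
    'UTF-16 LE' = [System.Text.Encoding]::Unicode
    'UTF-16 BE' = [System.Text.Encoding]::BigEndianUnicode
    'UTF-32 LE' = [System.Text.Encoding]::UTF32
    'ASCII'     = [System.Text.Encoding]::ASCII
}

foreach ($name in $encodings.Keys) {
    '{0,-10} {1}' -f $name, (Get-BomHex $encodings[$name])
}
```

This should print EF BB BF for UTF-8, FF FE for UTF-16 LE, FE FF for UTF-16 BE, FF FE 00 00 for UTF-32 LE, and (none) for ASCII. Note that the UTF-16 LE BOM is a prefix of the UTF-32 LE BOM, so a BOM-based detector has to prefer the longest match.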
The Code
<#
.SYNOPSIS
    Gets file encoding.
.DESCRIPTION
    The Get-FileEncoding function determines encoding by looking at Byte Order Mark (BOM).
    Based on port of C# code from http://www.west-wind.com/Weblog/posts/197245.aspx
.OUTPUTS
    System.Text.Encoding
.PARAMETER Path
    The Path of the file that we want to check.
.PARAMETER DefaultEncoding
    The Encoding to return if one cannot be inferred.
    You may prefer to use the System's default encoding: [System.Text.Encoding]::Default
    List of available Encodings is available here: http://goo.gl/GDtzj7
.EXAMPLE
    # This command gets ps1 files in current directory where encoding is not ASCII
    Get-ChildItem *.ps1 | select FullName, @{n='Encoding';e={Get-FileEncoding $_.FullName}} | where {[string]$_.Encoding -ne 'System.Text.ASCIIEncoding'}
.EXAMPLE
    # Same as previous example but fixes encoding using set-content
    Get-ChildItem *.ps1 | select FullName, @{n='Encoding';e={Get-FileEncoding $_.FullName}} | where {[string]$_.Encoding -ne 'System.Text.ASCIIEncoding'} | foreach {(get-content $_.FullName) | set-content $_.FullName -Encoding ASCII}
.NOTES
    Version History
    v1.0 - 2010/08/10, Chad Miller - Initial release
    v1.1 - 2010/08/16, Jason Archer - Improved pipeline support and added detection of little endian BOMs. (http://poshcode.org/2075)
    v1.2 - 2015/02/03, VertigoRay - Adjusted to use .NET's [System.Text.Encoding Class](http://goo.gl/XQNeuc). (http://poshcode.org/5724)
.LINK
    http://goo.gl/bL12YV
#>
function Get-FileEncoding {
    [CmdletBinding()]
    param (
        [Alias("PSPath")]
        [Parameter(Mandatory = $True, ValueFromPipelineByPropertyName = $True)]
        [String]$Path
        ,
        [Parameter(Mandatory = $False)]
        [System.Text.Encoding]$DefaultEncoding = [System.Text.Encoding]::ASCII
    )
    process {
        # Read up to the first four bytes of the file.
        # Note: PowerShell 6+ replaced "-Encoding Byte" with "-AsByteStream".
        [Byte[]]$bom = Get-Content -Encoding Byte -ReadCount 4 -TotalCount 4 -Path $Path
        $encoding_found = $false
        foreach ($encoding in [System.Text.Encoding]::GetEncodings().GetEncoding()) {
            $preamble = $encoding.GetPreamble()
            # Skip encodings without a BOM, empty files, and BOMs longer than the bytes read.
            if ($preamble -and $bom -and $preamble.Length -le $bom.Length) {
                foreach ($i in 0..($preamble.Length - 1)) {
                    if ($preamble[$i] -ne $bom[$i]) {
                        break
                    } elseif ($i -eq ($preamble.Length - 1)) {
                        # Every BOM byte matched. Prefer the longest BOM, because
                        # FF FE begins both the UTF-16 LE and UTF-32 LE BOMs.
                        if (!$encoding_found -or $preamble.Length -gt $encoding_found.GetPreamble().Length) {
                            $encoding_found = $encoding
                        }
                    }
                }
            }
        }
        if (!$encoding_found) {
            $encoding_found = $DefaultEncoding
        }
        $encoding_found
    }
}
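The function reads the leading bytes with Get-Content -Encoding Byte, which is Windows PowerShell syntax; PowerShell 6 and later use -AsByteStream instead. As a hedged alternative, here is a minimal sketch that reads the bytes through .NET directly, so it does not depend on either switch. The name Get-LeadingBytes and its parameters are hypothetical and are not part of the original script.

```powershell
# Hypothetical helper: return up to $Count leading bytes of a file as a byte array.
function Get-LeadingBytes {
    param (
        [Parameter(Mandatory = $true)]
        [string]$Path,
        [int]$Count = 4
    )
    # Resolve relative paths against PowerShell's current location,
    # which can differ from the process working directory that .NET uses.
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath
    $stream = [System.IO.File]::OpenRead($fullPath)
    try {
        $buffer = New-Object byte[] $Count
        $read = $stream.Read($buffer, 0, $Count)
        if ($read -le 0) {
            # Empty file: return an empty byte array.
            return ,([byte[]]@())
        }
        # Return only the bytes actually read; the file may be shorter than $Count.
        return ,([byte[]]($buffer[0..($read - 1)]))
    } finally {
        $stream.Dispose()
    }
}
```

The unary comma in the return statements keeps PowerShell from unrolling the array, so callers always get a byte array back, even for a one-byte or empty file.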
Credits
I came across some code on a PowerShell sharing site, POSHCode.org, that inspired me to do things a different way, so I posted my amendments there as well. Unfortunately, since I wrote this post, it appears that POSHCode has gone down for the count:

Screenshot taken: June 21, 2017.
There is a typo in line 51 of the code as originally posted: the “m” is missing from the $preamble variable, so the elseif condition reads $preable.Length where it should read
$preamble.Length