PowerShell: Get-FileEncoding

VertigoRay/ February 4, 2015/ Uncategorized/ 1 comments

I often work in PowerShell, and one day I needed to create a script that would pull the file encoding out of a file.

Encodings

However, this proved to be difficult since most encodings don’t require a BOM (Byte Order Mark). Here’s some good information that I found on the subject:

Automatically determining the correct encoding for a given byte array is notoriously difficult. Sometimes, to be helpful, the author of the data will insert something called a BOM (Byte Order Mark) at the beginning of the data. If a BOM is present, that makes detecting the encoding painless, since each encoding uses a different BOM.
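To make the point about BOMs concrete, here is a minimal sketch that asks .NET for the preamble (its name for the BOM) of a few common encodings. The display names and the `$candidates` and `$bomTable` variables are just labels I picked for this example.

```powershell
# Encodings to inspect; the display names are labels chosen for this example.
$candidates = [ordered]@{
    'UTF-8'     = [System.Text.Encoding]::UTF8
    'UTF-16 LE' = [System.Text.Encoding]::Unicode
    'UTF-16 BE' = [System.Text.Encoding]::BigEndianUnicode
    'UTF-32 LE' = [System.Text.Encoding]::UTF32
    'ASCII'     = [System.Text.Encoding]::ASCII
}

$bomTable = foreach ($name in $candidates.Keys) {
    # GetPreamble() returns the BOM bytes, or an empty array if there is none.
    $preamble = $candidates[$name].GetPreamble()
    $hex = if ($preamble.Length -gt 0) {
        [System.BitConverter]::ToString($preamble) -replace '-', ' '
    } else {
        '(none)'
    }
    [PSCustomObject]@{ Encoding = $name; Bom = $hex }
}

$bomTable | Format-Table -AutoSize
```

Note that the UTF-16 LE BOM (FF FE) is a prefix of the UTF-32 LE BOM (FF FE 00 00), so a matcher should prefer the longest BOM that fits.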

However, the problem remains, how do you automatically detect the correct encoding when there is no BOM? Technically it’s recommended that you don’t place a BOM at the beginning of your data when using UTF-8, and there is no BOM defined for any of the ANSI code pages. So it’s certainly not out of the realm of possibility that a text file may not have a BOM. If all the files that you deal with are in English, it’s probably safe to assume that if no BOM is present, then UTF-8 will suffice. However, if any of the files happen to use something else, without a BOM, then that won’t work.
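The no-BOM problem is easy to reproduce. The sketch below is only an illustration, and it assumes ISO-8859-1 (Latin-1) as a stand-in for an ANSI code page: it encodes a short string without a BOM, then decodes the same bytes two ways. The strict UTF-8 check at the end is one common heuristic, not something the script in this post does.

```powershell
# Bytes of "café" as a BOM-less Latin-1 file would store them: 63 61 66 E9.
# Latin-1 is an assumption here, standing in for an ANSI code page.
$latin1 = [System.Text.Encoding]::GetEncoding('iso-8859-1')
[byte[]]$bytes = $latin1.GetBytes("caf$([char]0xE9)")

# The same bytes give different text depending on the encoding assumed.
$asLatin1 = $latin1.GetString($bytes)
$asUtf8   = [System.Text.Encoding]::UTF8.GetString($bytes)  # invalid byte becomes U+FFFD

# A strict UTF-8 decoder throws on invalid sequences instead of replacing them,
# which makes "is this valid UTF-8?" a usable hint when there is no BOM.
$strictUtf8  = New-Object System.Text.UTF8Encoding -ArgumentList $false, $true
$isValidUtf8 = $true
try {
    [void]$strictUtf8.GetString($bytes)
} catch {
    $isValidUtf8 = $false
}

"Decoded as Latin-1: $asLatin1"
"Decoded as UTF-8:   $asUtf8"
"Valid UTF-8:        $isValidUtf8"
```

A passing strict decode does not prove the file is UTF-8, since pure ASCII text is valid in all three encodings, but a failing one rules UTF-8 out.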

The Code

<#
.SYNOPSIS
Gets file encoding.
.DESCRIPTION
The Get-FileEncoding function determines encoding by looking at the Byte Order Mark (BOM).
Based on port of C# code from http://www.west-wind.com/Weblog/posts/197245.aspx
.OUTPUTS
System.Text.Encoding
.PARAMETER Path
The Path of the file that we want to check.
.PARAMETER DefaultEncoding
The Encoding to return if one cannot be inferred.
You may prefer to use the System's default encoding: [System.Text.Encoding]::Default
A list of available encodings is here: http://goo.gl/GDtzj7
.EXAMPLE
# This command gets ps1 files in current directory where encoding is not ASCII
Get-ChildItem *.ps1 | select FullName, @{n='Encoding';e={Get-FileEncoding $_.FullName}} | where {[string]$_.Encoding -ne 'System.Text.ASCIIEncoding'}
.EXAMPLE
# Same as previous example but fixes encoding using set-content
Get-ChildItem *.ps1 | select FullName, @{n='Encoding';e={Get-FileEncoding $_.FullName}} | where {[string]$_.Encoding -ne 'System.Text.ASCIIEncoding'} | foreach {(get-content $_.FullName) | set-content $_.FullName -Encoding ASCII}
.NOTES
Version History
v1.0 - 2010/08/10, Chad Miller - Initial release
v1.1 - 2010/08/16, Jason Archer - Improved pipeline support and added detection of little endian BOMs. (http://poshcode.org/2075)
v1.2 - 2015/02/03, VertigoRay - Adjusted to use .NET's [System.Text.Encoding Class](http://goo.gl/XQNeuc). (http://poshcode.org/5724)
.LINK
http://goo.gl/bL12YV
#>
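function Get-FileEncoding {
    [CmdletBinding()]
    param (
        [Alias("PSPath")]
        [Parameter(Mandatory = $True, ValueFromPipelineByPropertyName = $True)]
        [String]$Path
        ,
        [Parameter(Mandatory = $False)]
        [System.Text.Encoding]$DefaultEncoding = [System.Text.Encoding]::ASCII
    )
    process {
        # Read up to the first four bytes, the length of the longest BOM.
        # -Encoding Byte is Windows PowerShell syntax; PowerShell 6+ uses -AsByteStream instead.
        [Byte[]]$bom = Get-Content -Encoding Byte -ReadCount 4 -TotalCount 4 -Path $Path
        $encoding_found = $null
        foreach ($encoding in [System.Text.Encoding]::GetEncodings().GetEncoding()) {
            $preamble = $encoding.GetPreamble()
            # Skip encodings with no BOM, or with a BOM longer than the bytes read.
            if ($preamble.Length -eq 0 -or $preamble.Length -gt $bom.Length) { continue }
            # Keep the longest match, so UTF-32 LE (FF FE 00 00) beats UTF-16 LE (FF FE).
            if ($encoding_found -and $preamble.Length -le $encoding_found.GetPreamble().Length) { continue }
            $is_match = $true
            foreach ($i in 0..($preamble.Length - 1)) {
                if ($preamble[$i] -ne $bom[$i]) {
                    $is_match = $false
                    break
                }
            }
            if ($is_match) {
                $encoding_found = $encoding
            }
        }
        if (!$encoding_found) {
            $encoding_found = $DefaultEncoding
        }
        $encoding_found
    }
}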
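One portability note: `Get-Content -Encoding Byte` only works in Windows PowerShell, because PowerShell 6 and later replaced it with `-AsByteStream`. A rough sketch of a version-independent way to read the leading bytes is below. `Get-LeadingBytes` is a hypothetical helper name, not part of the original script.

```powershell
# Hypothetical helper (not part of the original script): read up to $Count
# bytes from the start of a file using .NET directly, so it behaves the same
# on Windows PowerShell and PowerShell 6+.
function Get-LeadingBytes {
    param (
        [Parameter(Mandatory = $true)]
        [string]$Path,
        [int]$Count = 4
    )
    # .NET resolves relative paths against the process directory, not the
    # current PowerShell location, so resolve to a full path first.
    $resolved = (Resolve-Path -LiteralPath $Path).ProviderPath
    $stream = [System.IO.File]::OpenRead($resolved)
    try {
        $buffer = New-Object byte[] $Count
        $read = $stream.Read($buffer, 0, $Count)
        # Return only the bytes actually read; the leading comma keeps the
        # array from being unrolled on the pipeline.
        if ($read -gt 0) {
            , ([byte[]]$buffer[0..($read - 1)])
        } else {
            , ([byte[]]@())
        }
    } finally {
        $stream.Dispose()
    }
}

# Usage: write a small UTF-8 file with a BOM, then inspect its first bytes.
$tmp = [System.IO.Path]::GetTempFileName()
[System.IO.File]::WriteAllText($tmp, 'hello', [System.Text.Encoding]::UTF8)
[byte[]]$leading = Get-LeadingBytes -Path $tmp
Remove-Item -LiteralPath $tmp
[System.BitConverter]::ToString($leading)
```

The result could be assigned to `$bom` in place of the `Get-Content` call if the function needs to run on both editions.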

Credits

I came across some code on a PowerShell sharing site, POSHCode.org, that inspired me to do things a different way, so I posted my amendments there as well. Unfortunately, since I wrote this post, it appears that POSHCode has gone down for the count:

Screenshot of poshcode.org, taken June 21, 2017: "poshcode.org is almost here! Upload your website to get started."

1 Comment

  1. There is a typo in line 51. missing the “m” in the $preamble variable. should be
    ($i in 0..$preamble.Length)
