Describe the bug
Scanning a file with Unicode characters in its name outputs the name in a strange encoding that is neither UTF-8 nor the system encoding.
How to reproduce the problem
This was reproduced on a fresh Windows installation from clamav-1.4.2.win.x64.msi, with system encoding CP1252.
I created an empty file called file_öταБЬℓσ.txt.
I ran the following Python script in the same directory as the test file I created. Ensure your editor is set to UTF-8. Replace the path to clamscan.exe:
The output text is encoded as CP437, with some characters replaced by "?". Try it yourself in Python with b'\x94\xe7\xe0??l\xe5.txt'.decode('cp437'). I have no idea why. I expected the name to be output either in UTF-8, which would be best, or in CP1252, which is the system encoding of my Windows installation.
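The same check, spelled out a little further in Python: the bytes quoted above decode cleanly as CP437, turn into mojibake as CP1252, and are not valid UTF-8 at all.

```python
# Bytes quoted in the report: the tail of the file name as clamscan wrote it.
raw = b'\x94\xe7\xe0??l\xe5.txt'

# CP437 (the OEM code page on this machine) turns them back into readable text.
as_oem = raw.decode('cp437')

# CP1252 (the ANSI/system code page) produces mojibake for the same bytes.
as_ansi = raw.decode('cp1252')

# They are not valid UTF-8 either: 0x94 is a continuation byte and cannot
# start a UTF-8 sequence.
try:
    raw.decode('utf-8')
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

print(as_oem)    # öτα??lσ.txt
print(as_ansi)   # ”çà??lå.txt
print(utf8_ok)   # False
```

The "l" standing in for ℓ presumably comes from Windows' best-fit conversion; Python's own cp437 codec has no best-fit table, so "file_öταБЬℓσ.txt".encode("cp437", errors="replace") would produce "?" in that position instead.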
In the documentation it says that "As a side note, console output (stdin and stderr) will always be OEM encoded, even when redirected to a file.".
Output of PowerShell [System.Text.Encoding]::Default:
Output from clamconf.exe -n:
Checking configuration files in C:\Program Files\ClamAV
Config file: clamd.conf
-----------------------
ERROR: Please edit the example config file C:\Program Files\ClamAV\clamd.conf
Config file: freshclam.conf
---------------------------
DatabaseMirror = "database.clamav.net"
clamav-milter.conf not found
Software settings
-----------------
Version: 1.4.2
Optional features supported: MEMPOOL AUTOIT_EA06 RAR
Database information
--------------------
Database directory: C:\Program Files\ClamAV\database
WARNING: freshclam.conf and clamd.conf point to different database directories
bytecode.cvd: version 335, sigs: 86, built on Tue Feb 27 16:37:24 2024
daily.cvd: version 27528, sigs: 2072291, built on Fri Jan 24 10:40:27 2025
main.cvd: version 62, sigs: 6647427, built on Thu Sep 16 14:32:42 2021
Total number of signatures: 8719804
Platform information
--------------------
uname: Microsoft Windows 6.2 SP0.0 Build 9200
OS: Windows, ARCH: AMD64, CPU: AMD64
zlib version: 1.3.1 (1.3.1), compile flags: 65
platform id: 0x1025d4d40800000000000794
Build information
-----------------
Microsoft Visual C++: (0.7.148)
sizeof(void*) = 8
Engine flevel: 212, dconf: 212
Interestingly, running the command directly in the PowerShell terminal as & 'C:\Program Files\ClamAV\clamscan.exe' file_öταБЬℓσ.txt displays the name as file_öτα??lσ.txt, which is probably due to the best-fit encoder/decoder fallback shown above in the output of [System.Text.Encoding]::Default. This can be remedied by running, e.g., [Console]::OutputEncoding = [System.Text.Encoding]::GetEncoding("Windows-1252") or [Console]::OutputEncoding = [System.Text.Encoding]::UTF8 right before clamscan.exe.
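When clamscan's output is captured by a program rather than shown in a console, the corresponding fix is to decode the captured bytes with the OEM code page. Below is a minimal Python sketch, assuming Python 3.7+ (for capture_output; the Windows-only "oem" codec exists since 3.6) and taking at face value the documentation's statement that output stays OEM-encoded when redirected. The names console_encoding and scan are hypothetical helpers, not part of ClamAV.

```python
import subprocess
import sys


def console_encoding() -> str:
    """Codec for the bytes a Windows console program writes to a pipe.

    Assumption: clamscan writes OEM-encoded text even when redirected, so on
    Windows we use Python's Windows-only 'oem' codec; elsewhere, UTF-8.
    """
    return "oem" if sys.platform == "win32" else "utf-8"


def scan(clamscan: str, target: str) -> str:
    """Run the scanner on one file and return its stdout as text."""
    result = subprocess.run([clamscan, target], capture_output=True)
    # Decode the raw bytes explicitly instead of passing text=True, which
    # would use the locale encoding (normally the ANSI code page on Windows).
    return result.stdout.decode(console_encoding(), errors="replace")
```

For example, scan(r"C:\Program Files\ClamAV\clamscan.exe", "file_öταБЬℓσ.txt") should return text containing file_öτα??lσ.txt. Picking the right code page fixes the mojibake, but characters that CP437 cannot represent were already replaced by clamscan and cannot be recovered on the Python side.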
ember91 changed the title from "Strange character encoding of file name" to "Strange character encoding of file name in output" on Jan 25, 2025.
I just looked into this and found out that nothing's wrong. Apparently the OEM code page, whatever that is, is not the same as the system (ANSI) code page: my OEM code page is 437, while the ANSI code page is 1252. GetACP() returns the ANSI code page and GetOEMCP() returns the OEM one. Windows is strange, I guess.
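Both values can be queried from Python through ctypes. A minimal sketch, assuming Python 3.9+; windows_code_pages and same_byte_two_ways are hypothetical helper names, and the 1252/437 defaults simply mirror the values reported above.

```python
import ctypes
import sys


def windows_code_pages() -> tuple[int, int]:
    """Return (ANSI, OEM) code page numbers via kernel32 GetACP/GetOEMCP."""
    if sys.platform != "win32":
        raise OSError("GetACP/GetOEMCP are only available on Windows")
    kernel32 = ctypes.windll.kernel32
    return kernel32.GetACP(), kernel32.GetOEMCP()


def same_byte_two_ways(byte: int, ansi: int = 1252, oem: int = 437) -> tuple[str, str]:
    """Decode a single byte under the ANSI and the OEM code page."""
    raw = bytes([byte])
    return raw.decode(f"cp{ansi}"), raw.decode(f"cp{oem}")
```

On the reporter's machine, windows_code_pages() would be expected to return (1252, 437). same_byte_two_ways(0x94) returns ('”', 'ö'): the first byte of the scanned name means a closing quotation mark in CP1252 but ö in CP437, which is exactly the mismatch seen in the output.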
val-ms changed the title from "Strange character encoding of file name in output" to "Windows: Log outputs OEM format, shows strange character encodings for unsupported utf8 characters in file names" on Jan 29, 2025.