Forcing a file charset on LA[M]P

Today I faced a very odd situation. I was moving an old wiki (DokuWiki, by the way) between servers, and its content was written in ISO-8859-15. Consequently, the Content-Type meta tag had to be set accordingly, and so it was. But Mozilla Firefox and Safari were reporting the file as UTF-8 instead of ISO Latin 9 (= ISO-8859-15).

I double-checked the meta tag and then ran an insightful

file -i doku.php

that returned the following

doku.php: text/plain; charset=us-ascii

I don’t know why the file information had changed after an scp between servers, but I had to find an easy fix, since changing every single file was not feasible. Besides, I didn't know how to change that through the CLI (feel free to post in the comments if you do; I would appreciate it).
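A possible explanation for the us-ascii result, offered as an assumption rather than something verified on the original servers: file -i does not read a stored charset, it guesses one from the bytes in the file. A source file that contains only 7-bit ASCII bytes is equally valid as us-ascii, ISO-8859-15 and UTF-8, so the tool reports the narrowest match. The sketch below illustrates the idea in PHP (assuming PHP 7 or later); is_pure_ascii() is a hypothetical helper, not something from DokuWiki or PHP itself.

```php
<?php
// Hypothetical helper: returns true when every byte of the string is in the
// 7-bit ASCII range, i.e. the content carries no hint of a wider charset.
function is_pure_ascii(string $bytes): bool
{
    // Without the /u modifier, PCRE matches byte by byte.
    return preg_match('/[^\x00-\x7F]/', $bytes) === 0;
}

$plain  = "echo 'hello';";           // typical PHP source: ASCII only
$latin9 = "caf\xE9 costs 2 \xA4";    // 0xE9 = é, 0xA4 = € in ISO-8859-15

var_dump(is_pure_ascii($plain));     // bool(true)
var_dump(is_pure_ascii($latin9));    // bool(false)
```

If that is what happened here, nothing changed during the scp: doku.php simply has no non-ASCII bytes for the tool to detect.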

My first attempt at fixing this targeted Apache. I set a ForceType 'text/html; charset=ISO-8859-15' php but the browsers were still detecting the page as UTF-8. Actually, it was not only the browsers that showed this behaviour: W3C also detected it as Unicode.

My next attempt targeted PHP. Before doing anything more involved, I tried sending a header to set the charset.

header("Content-type: text/html; charset=ISO-8859-15");

This worked. Files were now recognized by their correct charset :-)
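For context, a charset declared in the HTTP Content-Type header takes precedence over the one in the meta tag, which would explain why the meta tag alone was not enough if the server was already sending a UTF-8 charset (that part is my assumption). Here is a slightly more defensive sketch of the same one-liner, assuming PHP 7 or later. The names content_type_header() and send_charset_header() are hypothetical helpers made up for illustration; only header() and headers_sent() are PHP built-ins.

```php
<?php
// Hypothetical helper: builds the Content-Type header line for a charset.
function content_type_header(string $charset, string $mime = 'text/html'): string
{
    return sprintf('Content-Type: %s; charset=%s', $mime, $charset);
}

// Hypothetical helper: sends the header only if output has not started yet,
// because header() cannot change headers once the body is on its way.
function send_charset_header(string $charset): bool
{
    if (headers_sent()) {
        return false;
    }
    header(content_type_header($charset));
    return true;
}

// Must run before any HTML or whitespace is echoed.
send_charset_header('ISO-8859-15');
```

The call has to sit at the very top of the script, before any output, which is the same constraint the single header() line has.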

Here’s a good reference on the subject.

P.S. - For those who may not know, LAMP stands for Linux Apache MySQL PHP, and it is often referred to as the killer combo: a great set of tools. Some also say the ‘P’ stands for Perl. Could it?


2 Responses to “Forcing a file charset on LA[M]P”

  1. andr3
    Published at December 8th, 2005 at 7:29 am

    Are you doing that straight in the code?

    Why not use a local php.ini to override the default_charset on that entire folder? That would be my attempt. ;)

  2. mlopes
    Published at December 8th, 2005 at 5:01 pm

    I did it straight in the code because all the other files are UTF-8 and should be identified by the web browser as Unicode. Using a local php.ini would also work, but I don’t think it’s worth the trouble. Adding a single line to override the charset detection is not that bad, and I don’t really think it adds any overhead worth worrying about.