Igor Kulman

Detecting encoding of uploaded file in ASP.NET MVC

· Igor Kulman

Uploading a file in ASP.NET MVC is very easy, but there is no easy way to detect the encoding of a uploaded text file. However you can use the fact if you try to read the file with a wrong encoding, you get an DecoderFallbackException. So how do you put everything together?

First, get a stream of the uploaded file.

[HttpPost]
public ActionResult FromCSV(HttpPostedFileBase file)
{
    if (file != null && file.ContentLength > 0)
    {
        var stream = file.InputStream;
        ...
    }
}

Next, read the whole file to a byte array

public static byte[] ReadFully(Stream input)
{
    byte[] buffer = new byte[16 * 1024];
    using (MemoryStream ms = new MemoryStream())
    {
        int read;
        while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            ms.Write(buffer, 0, read);
        }
        return ms.ToArray();
     }
}

Finally the trick is to try all the encodings you think the file may be in and chech if if fails or not

private static string[] GetFileContent(Stream input)
{
    Encoding[] encodings = new Encoding[]{
        Encoding.GetEncoding("UTF-8", new EncoderExceptionFallback(), new DecoderExceptionFallback()),
        Encoding.GetEncoding(1250, new EncoderExceptionFallback(), new DecoderExceptionFallback())
    };
    var inputArr = ReadFully(input);
    String result = null;

    foreach (Encoding enc in encodings)
    {
        try
        {
            result = enc.GetString(inputArr);
            break;
        } catch (DecoderFallbackException e){}
     }

     return result;
}

See also