[BUG] - [UTF-8 with BOM doesn't work easily with csv] #973

@ROMYIM

Excel Type

  • XLSX
  • XLSM
  • CSV
  • OTHER

Upload Excel File

example).csv

MiniExcel Version

1.44.1

Description

When writing a UTF-8 CSV and then opening it in Excel, the file needs to be UTF-8 with a BOM; otherwise non-ASCII characters are garbled.
I've tried approaches like this:

    private static readonly Encoding _utf8WithBom = new UTF8Encoding(true); // note the `true` here
    private static readonly CsvConfiguration _csvConfiguration = new()
    {
        // No trailing semicolon inside an object initializer
        StreamWriterFunc = stream => new StreamWriter(stream, _utf8WithBom)
    };

This doesn't work. Another attempt:

    stream.WriteByte(0xEF);
    stream.WriteByte(0xBB);
    stream.WriteByte(0xBF);
    // ... proceeds to SaveAs()

This doesn't work either.

Solution

    private static readonly char _utf8bom = '\ufeff';

    private static readonly CsvConfiguration _csvConfiguration = new()
    {
        StreamWriterFunc = stream =>
        {
            StreamWriter sw = new(stream, _utf8WithBom);
            
            sw.Write(_utf8bom);
            return sw;
        }
    };

The root cause is a lifetime management defect in CsvWriter that prevents StreamWriter from reliably writing its preamble. This is a bug in MiniExcel, not in the .NET runtime.

Detailed Analysis

  1. MiniExcel's declared intent is correct
    In CsvConfiguration.cs, the default encoding is explicitly configured with BOM:
private static readonly Encoding DefaultEncoding = new UTF8Encoding(true);

public Func<Stream, StreamWriter> StreamWriterFunc { get; set; } 
    = (stream) => new StreamWriter(stream, DefaultEncoding);

new UTF8Encoding(true) is the standard .NET way to request a UTF-8 encoding whose GetPreamble() returns 0xEF 0xBB 0xBF.

  2. .NET StreamWriter design is correct (not a .NET bug)
    StreamWriter uses lazy preamble writing, which is a deliberate and correct design:

The constructor does not immediately write the BOM to the stream.
The BOM is deferred until the first Flush() (including implicit flush on Dispose()):

// From .NET StreamWriter.Flush()
if (!_haveWrittenPreamble)
{
    _haveWrittenPreamble = true;
    ReadOnlySpan<byte> preamble = _encoding.Preamble;
    if (preamble.Length > 0)
        _stream.Write(preamble);
}

This behavior is standard across all .NET versions.
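
The lazy preamble behavior described above can be observed in isolation, without MiniExcel. The following is a minimal sketch that assumes a seekable MemoryStream positioned at 0; the class and method names (PreambleDemo, Run) are made up for illustration and are not part of MiniExcel or .NET.

```csharp
using System;
using System.IO;
using System.Text;

public static class PreambleDemo
{
    // Returns the stream length right after the StreamWriter is constructed,
    // and the full stream contents after one Write + Flush.
    public static (long LengthAfterCtor, byte[] BytesAfterFlush) Run()
    {
        using var ms = new MemoryStream();
        using var writer = new StreamWriter(ms, new UTF8Encoding(encoderShouldEmitUTF8Identifier: true));

        long lengthAfterCtor = ms.Length; // the constructor has not written the BOM yet

        writer.Write("a");
        writer.Flush(); // the preamble is emitted here, ahead of the buffered text

        return (lengthAfterCtor, ms.ToArray());
    }

    public static void Main()
    {
        var (length, bytes) = Run();
        Console.WriteLine(length);                       // 0
        Console.WriteLine(BitConverter.ToString(bytes)); // EF-BB-BF-61
    }
}
```

This only shows what a bare StreamWriter does on an empty stream; it says nothing about what happens inside MiniExcel's CsvWriter.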

  3. The bug is in MiniExcel's CsvWriter lifetime management
    The problem is that CsvWriter instances are never explicitly disposed in the SaveAs() call chain:
// In MiniExcel.SaveAs / stream.SaveAs
return ExcelWriterFactory.GetProvider(stream, value, sheetName, excelType, configuration, printHeader).SaveAs();

Here, a CsvWriter is created, SaveAs() is called, and then the instance is simply discarded. There is no using statement and no call to Dispose().

Looking at CsvWriter.Dispose():

protected virtual void Dispose(bool disposing)
{
    if (this._disposedValue)
        return;
    if (disposing)
        this._writer.Dispose();  // Only disposed when disposing == true
    this._disposedValue = true;
}

~CsvWriter() => this.Dispose(false);

When the CsvWriter is garbage collected, the finalizer calls Dispose(false). Per the standard .NET dispose pattern, when disposing is false, managed resources must not be disposed. Therefore, the internal _writer (StreamWriter) is never disposed.

Although CsvWriter.SaveAs() manually calls _writer.Flush(), StreamWriter's preamble is normally written during the complete Dispose() lifecycle. Because CsvWriter never disposes its internal StreamWriter, the preamble (BOM) is not reliably flushed to the underlying file stream, especially when the outer FileStream is closed independently.

  4. Why manual BOM injection also fails
    If a user manually writes BOM bytes to a file and then calls MiniExcel.SaveAs(string path, ...), it still fails because MiniExcel internally opens a new FileStream with FileMode.Create, which truncates the existing file and discards any pre-written bytes.
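
The truncation effect of FileMode.Create is easy to confirm on its own. This is a minimal sketch using a temporary file; it does not call MiniExcel, and the names TruncateDemo and LengthAfterReopen are hypothetical. It only illustrates why bytes written to the file beforehand would not survive if the path-based SaveAs opens the file this way, as described above.

```csharp
using System;
using System.IO;

public static class TruncateDemo
{
    // Pre-writes a UTF-8 BOM to a temp file, reopens it with FileMode.Create,
    // and returns the file length seen through the new stream.
    public static long LengthAfterReopen()
    {
        string path = Path.GetTempFileName();
        try
        {
            // Caller's attempt: put the BOM in the file ahead of time.
            File.WriteAllBytes(path, new byte[] { 0xEF, 0xBB, 0xBF });

            // FileMode.Create overwrites an existing file, so it starts empty again.
            using (var fs = new FileStream(path, FileMode.Create))
            {
                return fs.Length;
            }
        }
        finally
        {
            File.Delete(path);
        }
    }

    public static void Main()
    {
        Console.WriteLine(LengthAfterReopen()); // 0
    }
}
```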

Reproduction

// This configuration is supposed to produce BOM, but it doesn't.
var config = new CsvConfiguration(); // Default: UTF8Encoding(true)
MiniExcel.SaveAs("test.csv", data, excelType: ExcelType.CSV, configuration: config);

// Result: test.csv has NO BOM (0xEF 0xBB 0xBF missing at start)

Current Workaround

Explicitly write the BOM character via a custom StreamWriterFunc:

var csvConfig = new CsvConfiguration
{
    StreamWriterFunc = stream =>
    {
        var sw = new StreamWriter(stream, new UTF8Encoding(true));
        sw.Write('\ufeff'); // Explicitly emit BOM
        return sw;
    }
};
MiniExcel.SaveAs("test.csv", data, excelType: ExcelType.CSV, configuration: csvConfig);
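
To check whether a given output file actually starts with the BOM, the first three bytes can be inspected directly. The helper below is a sketch for verification only; BomCheck and HasUtf8Bom are hypothetical names, not MiniExcel APIs. The demo in Main uses File.WriteAllText to produce sample files rather than MiniExcel.

```csharp
using System;
using System.IO;
using System.Text;

public static class BomCheck
{
    // Returns true when the file begins with the UTF-8 BOM bytes EF BB BF.
    public static bool HasUtf8Bom(string path)
    {
        using var fs = File.OpenRead(path);
        var head = new byte[3];
        int read = 0;
        // Read may return fewer bytes than requested, so loop until 3 bytes or EOF.
        while (read < head.Length)
        {
            int n = fs.Read(head, read, head.Length - read);
            if (n == 0) break;
            read += n;
        }
        return read == 3 && head[0] == 0xEF && head[1] == 0xBB && head[2] == 0xBF;
    }

    public static void Main()
    {
        string withBom = Path.GetTempFileName();
        string withoutBom = Path.GetTempFileName();
        try
        {
            File.WriteAllText(withBom, "a,b", new UTF8Encoding(true));
            File.WriteAllText(withoutBom, "a,b", new UTF8Encoding(false));
            Console.WriteLine(HasUtf8Bom(withBom));    // True
            Console.WriteLine(HasUtf8Bom(withoutBom)); // False
        }
        finally
        {
            File.Delete(withBom);
            File.Delete(withoutBom);
        }
    }
}
```

Running the same check against test.csv before and after applying the workaround would show whether the BOM was emitted once, though it would not detect a doubled BOM.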

Suggested Fix

Wrap CsvWriter usage with using in ExcelWriterFactory.GetProvider() or in the SaveAs()/Insert() call sites to ensure proper disposal.
Alternatively, call _writer.Dispose() in CsvWriter.SaveAs() before returning, or ensure that the _writer.Flush() call there also writes the preamble.

The async methods are affected by the same root cause.
