Download and parse CSV data in C#

30 September 21

The fourtoo GetFootballStats app uses the historical football results published as CSV files at football-data.co.uk. The application needs to:

  • Ensure we can GET the resource at a given URI
  • Open a stream for reading
  • Read through each record in the CSV file, parsing to an object we use to query and update our database

A shortened example of the work done can be found at https://github.com/fourtootrobs/CsvParsingExample

We first implement a typed client, FootballDataService, following the guidance in the ASP.NET Core documentation article "Make HTTP requests using IHttpClientFactory in ASP.NET Core". An HttpClient is injected through the constructor, and the class exposes a method that begins reading the CSV:

public class FootballDataService
    : IFootballDataService
{
    private readonly HttpClient _httpClient;

    public FootballDataService(HttpClient httpClient)
    {
        _httpClient = httpClient;
    }

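    public async Task<HttpStreamCsvReaderContext> BeginReadingCsvAsync(string requestUri)
    {
        // ResponseHeadersRead: complete once the headers arrive and leave the body to be streamed.
        var responseMessage = await _httpClient.GetAsync(
            requestUri,
            HttpCompletionOption.ResponseHeadersRead);

        try
        {
            responseMessage.EnsureSuccessStatusCode();

            var stream = await responseMessage.Content.ReadAsStreamAsync();

            // On success the returned context owns the response and disposes of it.
            return new HttpStreamCsvReaderContext(responseMessage, stream);
        }
        catch
        {
            // On failure nothing else owns the response, so release it here.
            responseMessage.Dispose();
            throw;
        }
    }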
}

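Inside BeginReadingCsvAsync, the call to _httpClient.GetAsync passes HttpCompletionOption.ResponseHeadersRead. With this option the call completes as soon as the response headers have been read, instead of first buffering the entire response body in memory. The body is then read on demand from the response stream.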
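The difference between the two completion options can be observed without touching the network. The following is a minimal sketch, not part of the example repository: CompletionOptionDemo, StubHandler and the example.test URL are hypothetical names introduced for illustration. The stub handler serves a small CSV body from a MemoryStream, and the method reports how far into that stream HttpClient had read by the time GetAsync returned.

```csharp
using System.IO;
using System.Net;
using System.Net.Http;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

public static class CompletionOptionDemo
{
    // Hypothetical stub standing in for the remote server: it serves a small
    // CSV body from memory so the example needs no network access.
    private sealed class StubHandler : HttpMessageHandler
    {
        public MemoryStream Body { get; } = new MemoryStream(
            Encoding.UTF8.GetBytes("HomeTeam,AwayTeam\nLiverpool,Norwich\n"));

        protected override Task<HttpResponseMessage> SendAsync(
            HttpRequestMessage request, CancellationToken cancellationToken)
        {
            var response = new HttpResponseMessage(HttpStatusCode.OK)
            {
                Content = new StreamContent(Body),
            };
            return Task.FromResult(response);
        }
    }

    // Returns how far into the body stream HttpClient had read by the time
    // GetAsync completed.
    public static async Task<long> BodyBytesReadByGetAsync(HttpCompletionOption option)
    {
        var handler = new StubHandler();
        using var client = new HttpClient(handler);
        using var response = await client.GetAsync("http://example.test/E0.csv", option);

        // Read the position before the response (and with it the stream) is disposed.
        return handler.Body.Position;
    }
}
```

Under these assumptions, calling BodyBytesReadByGetAsync with HttpCompletionOption.ResponseHeadersRead should report 0, because the body has not been touched yet, while the default HttpCompletionOption.ResponseContentRead should report the full length of the body, because HttpClient buffers it before returning.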

Once EnsureSuccessStatusCode has confirmed a success status, we obtain the body stream with await responseMessage.Content.ReadAsStreamAsync(). The response message and the stream are then passed to a custom class, HttpStreamCsvReaderContext:

public class HttpStreamCsvReaderContext
    : IDisposable
{
    public CsvReader CsvReader { get; private set; }

    private readonly HttpResponseMessage _responseMessage;
    private bool _disposedValue;        

    public HttpStreamCsvReaderContext(
        HttpResponseMessage responseMessage,
        Stream stream)
    {
        _responseMessage = responseMessage;

        CsvReader = new CsvReader(
            new StreamReader(stream),
            CsvHelperConstants.DefaultCsvConfig);
    }

    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this);
    }

    protected virtual void Dispose(bool disposing)
    {
        if (!_disposedValue)
        {
            if (disposing)
            {
                CsvReader?.Dispose();
                _responseMessage?.Dispose();
            }

            _disposedValue = true;
        }
    }
}

This class holds the response message and a CsvReader initialized with the stream. It implements IDisposable, and its Dispose method disposes of both the CsvReader and the HttpResponseMessage. By default, disposing of the CsvReader also closes the underlying StreamReader and stream. The class that consumes the reader can therefore release every resource in one step once it has finished with them. In the wider application, that consumer is the code that iterates over the records and uses their data to query and update the database.
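The two supporting types referenced here, CsvHelperConstants.DefaultCsvConfig and the row DTO FdHistoricalDataRowDto, are not shown in this post. The following is a minimal sketch of what they might look like, not the definitions from the repository. The property names are taken from the usage later in this post; the CSV header names FTHG and FTAG (full-time home and away goals) and the choice of the invariant culture are assumptions.

```csharp
using System.Globalization;
using CsvHelper.Configuration;
using CsvHelper.Configuration.Attributes;

// Assumed shape of the row DTO. Only four columns are mapped; columns in the
// file that have no matching property are ignored by CsvHelper.
public class FdHistoricalDataRowDto
{
    [Name("HomeTeam")]
    public string HomeTeam { get; set; } = string.Empty;

    [Name("AwayTeam")]
    public string AwayTeam { get; set; } = string.Empty;

    // Assumed header name for full-time home goals.
    [Name("FTHG")]
    public int FullTimeHomeGoals { get; set; }

    // Assumed header name for full-time away goals.
    [Name("FTAG")]
    public int FullTimeAwayGoals { get; set; }
}

public static class CsvHelperConstants
{
    // Assumed configuration: the invariant culture keeps number parsing
    // independent of the machine's regional settings.
    public static readonly CsvConfiguration DefaultCsvConfig =
        new CsvConfiguration(CultureInfo.InvariantCulture);
}
```

Mapping by header name rather than by column index keeps the DTO working if the order of the columns differs from one file to the next.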

In Program we can see a small example of this usage. The using declaration keeps the context, and the resources it owns, alive until the end of the enclosing scope, at which point they are disposed. CsvReader.GetRecordsAsync returns an IAsyncEnumerable, so each record can be read and processed in turn without first loading the entire CSV into memory:

using var readerContext = await footballDataService.BeginReadingCsvAsync(
    "/mmz4281/1920/E0.csv");

await foreach (var row in readerContext.CsvReader.GetRecordsAsync<FdHistoricalDataRowDto>())
{
    logger.LogInformation(
        $"{row.HomeTeam} {row.FullTimeHomeGoals}:{row.FullTimeAwayGoals} {row.AwayTeam}");
}
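
The snippet above uses a relative request URI and assumes that footballDataService and logger are already available. The following is a minimal sketch of how they might be wired up with the typed client pattern, assuming the classes from this post are in the same project and the Microsoft.Extensions.Http and Microsoft.Extensions.Logging.Console packages are referenced. The base address and the IFootballDataService declaration are assumptions inferred from the code in this post, not copied from the repository.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Logging;

var services = new ServiceCollection();
services.AddLogging(builder => builder.AddConsole());

// Typed client registration: the container constructs FootballDataService
// with an HttpClient configured here. The base address is an assumption; it
// is what allows BeginReadingCsvAsync to be called with a relative URI.
services.AddHttpClient<IFootballDataService, FootballDataService>(client =>
{
    client.BaseAddress = new Uri("https://www.football-data.co.uk");
});

using var provider = services.BuildServiceProvider();

var footballDataService = provider.GetRequiredService<IFootballDataService>();
var logger = provider.GetRequiredService<ILogger<Program>>();

// The using / await foreach snippet shown above would follow here.

// Assumed interface, inferred from the FootballDataService class in this post.
public interface IFootballDataService
{
    Task<HttpStreamCsvReaderContext> BeginReadingCsvAsync(string requestUri);
}
```

Registering the service through AddHttpClient lets IHttpClientFactory manage the lifetime of the underlying message handlers, so the application code never creates an HttpClient itself.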


© 2023 Tom Robson