Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This PR makes progress on getodk/central#589, adding utility functions related to CSV files.
parseCSVHeader()
parses the header row of a CSV file,parseCSV()
parses the data, andformatRow()
displays a row in CSV format.What has been done to verify that this works as intended?
This PR just adds utility functions; it doesn't make user-facing changes. I've added unit tests of the utility functions.
Why is this the best possible solution? Were any other approaches considered?
npm package
One important decision was around which npm package to use for CSV parsing. I first tried
csv-parse
, since that's what we use on Backend. However, I ran into a couple of issues:csv-parse
use in the browser, a CSV string is passed to theparse()
function. Ideally, we'd be able to pass aFile
or aFileReader
or aReadableStream
fromBlob.prototype.stream()
. It looks like there may be a way to convert aFile
to a Node-style stream that could be used withcsv-parse
, but I was hoping for something simpler. Related: Add support for WhatWG streams adaltas/node-csv#242to
, I saw an error in the browser console: "Uncaught TypeError: "listener" argument must be a function". I think that error stems fromoption.to
breaks the stream and exits the program adaltas/node-csv#333. That said, it looked like parsing was able to complete despite the error logging. But to avoid the error, I tried doing some preliminary parsing, grabbing the substring of the CSV string before the first newline character and passing that tocsv-parse
. That approach allowed me to avoid optionto
, but it's again at odds with streaming. It also won't return the correct result if a column header contains a newline.I gave a different package Papa Parse a try instead, and I've been liking it. It's easy to stream the file, and I didn't have trouble parsing the header row. It also seems markedly faster than what I was doing with
csv-parse
(where I was callingtext()
on theFile
, then passing the resulting string to theparse()
function). I tried parsing a file with 4 columns (3 properties + label):csv-parse
timeAnother little benefit of Papa Parse is the ability to easily abort parsing. I'm planning to set it up so that if the user leaves the upload modal, any ongoing CSV parsing is aborted.
Parsing the header and data separately
As mentioned above, there are separate functions for parsing the header vs. parsing the data.
parseCSV()
parses the data and expects to receive an array of column headers. That array comes from parsing the header row usingparseCSVHeader()
.I've set it up this way so that if there is an error in the header, the rest of the CSV file is not parsed. The upload modal will show errors about the header, and I want those to be shown as soon as the header has been parsed and validated.
That could still have been done in a single function, say, by passing a
validateHeader
option to the function. However,parseCSV()
is already fairly complex, and I didn't want to make it even more so.parseCSVHeader()
andparseCSV()
also handle errors pretty differently. If Papa Parse finds errors in the header (of codeMissingQuotes
orInvalidQuotes
),parseCSVHeader()
will not return a rejected promise and will instead return the errors along with Papa Parse's best attempt to parse the header. That's because the upload modal will display both the errors and the header: the two are needed together. On the other hand,parseCSV()
returns a rejected promise if there is an error with the data. In that case, the upload modal simply shows a danger alert.parseCSV()
only returns data if it was successful. At that point, it may also return warnings about the data.parseCSVHeader()
has no concept of warnings.I don't think we lose much by separating
parseCSVHeader()
andparseCSV()
. There end up being some conceptual differences between the two, so I think it's useful to do so.Returning promises
Another part of the approach I took was to wrap Papa Parse in a promise. Papa Parse follows more of a callback pattern, but it is more convenient to work with promises.
parseCSVHeader()
andparseCSV()
return promises, using Papa Parse callbacks under the hood. Also, Papa Parse provides its own API for aborting parsing, but in other places in Frontend (around requests), we useAbortSignal
s.parseCSVHeader()
andparseCSV()
will accept an optionalAbortSignal
, then use theAbortSignal
to trigger the abort mechanism in Papa Parse. That way, most code can just use promises andAbortSignal
s, and for the most part, only src/util/csv.js needs to be concerned with the mechanics of Papa Parse.Before submitting this PR, please make sure you have:
npm run test
andnpm run lint
and confirmed all checks still pass OR confirm CircleCI build passes