A Straightforward Way to Extend CSV with Metadata
bitwize 2021-08-19 17:07:30 +0000 UTC [ - ]
iask 2021-08-19 16:52:36 +0000 UTC [ - ]
If you think CSV is too complicated for your app's requirements, choose a different delimiter such as a pipe; otherwise look at the alternatives. Simple as that.
I’ve spent years building parsers for different documents in the retail juggernaut businesses.
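To illustrate the delimiter suggestion: in most CSV libraries this is a one-argument change. A minimal sketch using Python's standard `csv` module; the sample rows and the function names are made up for illustration.

```python
import csv
import io

# Hypothetical sample rows; note the comma inside the second field.
rows = [["sku", "description"], ["A-1", "Widget, large"]]

def write_pipe_delimited(rows):
    """Serialize rows using '|' as the field delimiter."""
    buf = io.StringIO()
    csv.writer(buf, delimiter="|", lineterminator="\n").writerows(rows)
    return buf.getvalue()

def read_pipe_delimited(text):
    """Parse pipe-delimited text back into a list of rows."""
    return list(csv.reader(io.StringIO(text), delimiter="|"))

text = write_pipe_delimited(rows)
print(text)
```

With a pipe as the delimiter, the embedded comma no longer needs quoting. A pipe inside a field would still need it, so this moves the problem rather than removing it.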
karmakaze 2021-08-19 16:24:00 +0000 UTC [ - ]
hinkley 2021-08-19 16:51:03 +0000 UTC [ - ]
For instance in a code signing situation it was socialized that only the archive was 'safe' and you used one of several tools to build them or crack them open. However that company was already used to thinking about physical shipping manifests and so learning by analogy worked, after a fashion.
swader999 2021-08-19 16:47:08 +0000 UTC [ - ]
41209 2021-08-19 16:14:51 +0000 UTC [ - ]
I don't really see a need for a metadata file, nor would I ever see Excel or other tools accepting it. The main problem is adoption: CSV isn't perfect, but it's what we have. Now if you wrote this as a member of the Excel team at Microsoft, and Excel then had the option of exporting CSV files with a metadata file, I'd be a bit more excited.
VenTatsu 2021-08-19 17:06:05 +0000 UTC [ - ]
Where I work we have offices in the US and in Europe, where installing a localized version of Windows will swap ',' and '.' as the group and decimal separators. When loading the value 100,002, Excel in the US will see one hundred thousand and two; in some parts of Europe it will see one hundred and two thousandths.
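The ambiguity is easy to reproduce outside Excel. A minimal sketch, assuming the separators are passed in explicitly instead of being taken from the OS locale; `parse_number` is a hypothetical helper, not part of any library.

```python
from decimal import Decimal

def parse_number(text, group_sep, decimal_sep):
    """Parse a numeric string given explicit group and decimal separators."""
    cleaned = text.replace(group_sep, "").replace(decimal_sep, ".")
    return Decimal(cleaned)

# Same bytes, two conventions, two different values.
us = parse_number("100,002", group_sep=",", decimal_sep=".")
eu = parse_number("100,002", group_sep=".", decimal_sep=",")
print(us, eu)
```

The file itself carries nothing that says which reading is correct, which is the kind of gap a metadata file would have to fill.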
Character set handling can be just as bad: there is no good way to get Excel to automatically open a CSV file as UTF-8 that won't break every other CSV parser in existence. The only cross-platform option is ASCII. Excel will happily load your local OS encoding, likely some variant of ISO-8859, but any other encoding requires jumping through hoops.
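One such hoop, assuming the usual trick of prefixing the file with a UTF-8 byte order mark so Excel detects the encoding, can be sketched in Python. The sample rows are made up; `utf-8-sig` is the standard codec name for UTF-8 with a BOM.

```python
import csv
import io

# Hypothetical sample rows containing non-ASCII characters.
rows = [["name", "city"], ["Zoë", "Köln"]]

def encode_csv(rows, encoding):
    """Serialize rows to CSV text, then encode to bytes."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue().encode(encoding)

with_bom = encode_csv(rows, "utf-8-sig")  # bytes start with EF BB BF

# A parser that assumes plain UTF-8 keeps the BOM as part of the first header.
naive_header = next(csv.reader(io.StringIO(with_bom.decode("utf-8"))))

# A BOM-aware decode strips it.
aware_header = next(csv.reader(io.StringIO(with_bom.decode("utf-8-sig"))))

print(repr(naive_header[0]), repr(aware_header[0]))
```

Any consumer that looks up the `name` column by header will miss it in the naive case, because the key is actually `\ufeffname`.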
IanCal 2021-08-19 16:29:17 +0000 UTC [ - ]
Source: dealing with CSV files people exported from Excel and the horrors that flowed from there.
bitwize 2021-08-19 17:09:18 +0000 UTC [ - ]
It can't do this because it confuses such files with files in SYLK format, which was YET ANOTHER attempt to standardize spreadsheet data interchange, dating from the 80s.
swader999 2021-08-19 16:45:59 +0000 UTC [ - ]
Tagbert 2021-08-19 16:27:49 +0000 UTC [ - ]
Here are a couple of cases that I run into frequently:
* Excel is very aggressive about forcing type conversion based on its own assumptions. It will convert strings to dates or numbers, even if data is lost in the process. It will ignore quotes and convert long numeric IDs into scientific notation, which truncates the ID unrecoverably.
* Excel cannot deal with quoted strings containing line breaks. It treats them as separate records and you get truncated records and partial records on separate rows.
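Both cases can be reproduced in a few lines. A minimal sketch using Python's `csv` module as the well-behaved parser; the sample data is made up, and the float round-trip only approximates the kind of precision loss described above, not Excel's exact behaviour.

```python
import csv
import io

# Hypothetical sample: a 19-digit ID and a quoted field with an embedded line break.
data = 'id,note\r\n"1234567890123456789","line one\nline two"\r\n'

# A quote-aware parser keeps the embedded newline inside one field.
records = list(csv.reader(io.StringIO(data, newline="")))

# Naive line splitting breaks the second record in two.
naive_lines = data.splitlines()

# Keeping the ID as a string is lossless; going through a float is not.
id_text = records[1][0]
id_via_float = int(float(id_text))

print(len(records), len(naive_lines))
print(id_text, id_via_float)
```

The safe default is to treat every field as text and convert only the columns whose type is actually known.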
isoprophlex 2021-08-19 16:39:34 +0000 UTC [ - ]
You can't even hope to keep a file intact upon opening...
Hackbraten 2021-08-19 16:28:52 +0000 UTC [ - ]
* format.txt
* mydata.csv
* .DS_Store
It will be there if the zip was created on a Mac so might as well include it in the standard.
kaeruct 2021-08-19 16:40:33 +0000 UTC [ - ]
If you are serious, then what about just ignoring any files in the zip that are not specified in the standard?
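Ignoring unspecified members is straightforward for a reader to implement. A minimal sketch with Python's `zipfile`, assuming the hypothetical standard defines `format.txt` plus one or more `.csv` files; the contents written to `format.txt` here are invented purely for the demo.

```python
import io
import zipfile

def is_specified(name):
    """Assumption: only format.txt and CSV data files are part of the standard."""
    return name == "format.txt" or name.endswith(".csv")

def read_specified_members(zip_bytes):
    """Return {name: bytes} for specified members, skipping everything else."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return {n: zf.read(n) for n in zf.namelist() if is_specified(n)}

# Build an in-memory archive like the one listed above.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("format.txt", "delimiter=,\n")  # made-up metadata content
    zf.writestr("mydata.csv", "a,b\n1,2\n")
    zf.writestr(".DS_Store", b"\x00")

members = read_specified_members(buf.getvalue())
print(sorted(members))
```

An allow-list like this also covers other stray entries without the standard having to enumerate each one.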
swader999 2021-08-19 16:49:20 +0000 UTC [ - ]
delusional 2021-08-19 16:43:40 +0000 UTC [ - ]
If you can decide this file format, couldn't you just normalize the CSV file instead?
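As a rough idea of what normalizing could look like: read the file with whatever delimiter the producer used and rewrite it in one fixed dialect. A minimal sketch, assuming the target is comma-delimited with minimal quoting and CRLF line endings; `normalize_csv` is a hypothetical helper, and it leaves number and date formats untouched.

```python
import csv
import io

def normalize_csv(text, delimiter):
    """Re-emit delimited text as comma-separated CSV with CRLF line endings."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    out = io.StringIO()
    writer = csv.writer(
        out, delimiter=",", quoting=csv.QUOTE_MINIMAL, lineterminator="\r\n"
    )
    writer.writerows(reader)
    return out.getvalue()

# Semicolon-delimited input with a comma inside a field.
normalized = normalize_csv("name;amount\nWidget, large;3\n", delimiter=";")
print(normalized)
```

The catch is that the caller still has to know the source delimiter, which is exactly the information the metadata file is meant to carry.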
barbazoo 2021-08-19 16:48:54 +0000 UTC [ - ]
I'm assuming it's in response to the post yesterday that outlined all the things that are wrong with CSV and how other formats like Parquet are better.
hinkley 2021-08-19 16:57:06 +0000 UTC [ - ]
I know I had specific conversations about CSV versus XML and those referred to a substantial body of literature on the topic.
dec0dedab0de 2021-08-19 17:02:38 +0000 UTC [ - ]
dreyfan 2021-08-19 17:07:53 +0000 UTC [ - ]