Opened 17 years ago
Closed 17 years ago
#4852 closed defect (fixed)
backslashes and all text after them is lost on importing CSV files
Reported by: | Uwe Stöhr | Owned by: | |
---|---|---|---|
Priority: | high | Milestone: | 1.5.6 |
Component: | converters | Version: | 1.5.4 |
Severity: | major | Keywords: | dataloss |
Cc: | jamatos@… |
Description
Open the attached CSV-file with an editor and import it to LyX.
Result: The backslashes and the text in the line after the backslashes are lost
after importing.
Attachments (9)
Change History (33)
by , 17 years ago
Attachment: | TEK00000.CSV added |
---|
comment:1 by , 17 years ago
Keywords: | dataloss added |
---|---|
Milestone: | → 1.5.6 |
comment:2 by , 17 years ago
csv2lyx currently has a pretty dumb parser (good enough for many cases, but
gloriously failing in others).
Hartmut is working on a complete rewrite of the parser AFAIK.
comment:3 by , 17 years ago
Status: | new → assigned |
---|
1. If you look into "Common Format and MIME Type for Comma-Separated Values
(CSV) Files" (http://tools.ietf.org/html/rfc4180), you will see that \ or
is not allowed.
2. I attach a new version that uses a parser in a csv.so file, but I need help
for testing:
2.a I only get it running if csv.so is located where csv2lyx is located.
2.b It produces a Python error, which I don't understand, but when run from the
command line a .lyx file is still written.
comment:4 by , 17 years ago
Resolution: | → invalid |
---|---|
Status: | assigned → closed |
If you look into "Common Format and MIME Type for Comma-Separated Values
(CSV) Files" (http://tools.ietf.org/html/rfc4180), you will see that \ or
is not allowed.
OK, then I can mark this bug report as invalid.
(I added the backslashes only for testing purposes, not knowing that they aren't
allowed in the CSV specification.)
So from my point of view the CSV-importer works fine, except that the user
cannot specify the column separator via a dialog in LyX. This is now #4584.
Note also that indeed the comma and not the tab is the default column separator
according to
http://en.wikipedia.org/wiki/Comma-separated_values
and
http://de.wikipedia.org/wiki/CSV_%28Dateiformat%29
comment:5 by , 17 years ago
Of course, the "default" column separator is ',' but as the original version of
csv2lyx had no parser at all, Tab as column separator was the best choice
because normally there are not many tabs in tables.
Programs like OO and Tellico (maybe Excel, too) allow you to use your own column
separator.
comment:6 by , 17 years ago
So from my point of view the CSV-importer works fine, except that the user
cannot specify the column separator via a dialog in LyX.
Actually you can do it with Tools->Preferences->File-Handling->Convertor->CSV -
LyX, field Convertor.
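For illustration, here is a minimal sketch of how a converter could honour a user-chosen column separator with Python's csv module. It assumes a current Python 3. The helper name `read_rows` is hypothetical and not taken from csv2lyx.

```python
import csv
import io

def read_rows(text, separator=","):
    """Parse CSV text into a list of rows using the given column separator.

    Hypothetical helper for illustration; not part of csv2lyx.
    """
    return list(csv.reader(io.StringIO(text), delimiter=separator))

# Tab-separated input, as the original csv2lyx expected.
tab_rows = read_rows("a\tb\tc\n1\t2\t3\n", separator="\t")

# Comma-separated input; a quoted cell may itself contain the separator.
comma_rows = read_rows('a,"b,c"\n1,2\n')
```

Passing the separator explicitly avoids any guessing, which is why an option such as -s should take precedence over automatic detection.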
comment:8 by , 17 years ago
new version using Python's csv module
Good work!
You only forgot to close the LyX file. I added a method to automatically detect
the column separator:
http://www.mail-archive.com/lyx-devel@lists.lyx.org/msg140099.html
I guess José will find some issues to improve, so I sent it to the list for review.
I'll try to get some Excel files tomorrow.
comment:9 by , 17 years ago
Uwe, your method to automatically detect the column separator does not work
properly for three reasons:
- Even if the user specifies -s ',' you are detecting
- If there are several counts with the maximum, the last one is taken
- Put a colon in every row of TEK00000.CSV, and see what you get.
The csv module provides a "sniff"-method to deduce the format of a CSV file.
Let me check how it can be used.
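A rough sketch of what using the sniffer might look like, assuming a current Python 3 and its `csv.Sniffer` class. The function name `detect_separator`, the candidate list, and the sample data are illustrative assumptions, not what csv2lyx actually does.

```python
import csv
import io

def detect_separator(sample, candidates=",;\t:"):
    """Return the delimiter guessed by csv.Sniffer, or None if it cannot decide.

    Hypothetical helper; the candidate characters are an assumption.
    """
    try:
        return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
    except csv.Error:
        # The sniffer raises csv.Error when no delimiter can be determined.
        return None

sample = "name;value;unit\nfoo;1;V\nbar;2;A\n"
separator = detect_separator(sample)

# Fall back to the comma if detection fails.
rows = list(csv.reader(io.StringIO(sample), delimiter=separator or ","))
```

Detection should only run when the user has not given a separator explicitly, which addresses the first objection above.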
comment:10 by , 17 years ago
Subject: Re: backslashes and all text after them is lost on importing
CSV files
Uwe, your method to automatically detect the column separator does not work
properly for three reasons:
- Even if the user specifies -s ',' you are detecting
Agreed.
- If there are several counts with the maximum, the last one is taken
Yes, but when a file has this, it is broken anyway.
- Put a colon in every row of TEK00000.CSV, and see what you get.
What do you mean? I get the colon as separator as expected. Note that when one uses commas as separator,
one doesn't use any of the other possible separator characters in every line.
The csv module provides a "sniff"-method to deduce the format of a CSV file.
Yes José has reworked the csv2lyx script using this - works fine. Please also test.
comment:11 by , 17 years ago
- If there are several counts with the maximum, the last one is taken
Yes, but when a file has this, it is broken anyway.
No, e.g. you can have blanks in your cells.
- Put a colon in every row of TEK00000.CSV, and see what you get.
What do you mean?
see the attached file
The csv module provides a "sniff"-method to deduce the format of a
CSV file.
Yes José has reworked the csv2lyx script using this - works fine. Please
also test.
Where is it?
comment:13 by , 17 years ago
Resolution: | invalid |
---|---|
Status: | closed → reopened |
After a discussion with Hartmut we decided to also fix bugs that occur with CSV
files that do not conform to the CSV specification, like in comment 18.
So this bug, that text after a backslash is lost, is a valid one.
comment:14 by , 17 years ago
I don't see any possibility to fix this.
José, can you have a look at
http://docs.python.org/lib/module-csv.html
and at
/usr/lib/python2.5/csv.py
please. Maybe you will find something.
comment:16 by , 17 years ago
Actually it is a LyX problem. If you take Uwe's test file (attachment 2545),
and cat the produced .lyx-file or view it in an editor, you will see that the
\ (backslash) characters are still there.
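A quick check along these lines, assuming a current Python 3 and the csv module's default dialect (no escape character set), suggests that the parser itself passes backslashes through unchanged. The sample line is made up for illustration.

```python
import csv
import io

# One CSV line whose first cell contains a literal backslash.
line = "a\\b,c\n"

# With the default dialect no escapechar is set, so the backslash
# is treated as an ordinary character.
rows = list(csv.reader(io.StringIO(line)))
first_cell = rows[0][0]
```

If the backslash survives parsing, the loss must happen later, when LyX reads the generated file.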
comment:17 by , 17 years ago
Thanks for the report.
We need to escape the backslash in the imported text. I have committed
the fix both to 1.6 and to 1.5.
Please test.
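As a sketch of what such escaping could look like: the code below assumes that LyX represents a literal backslash in document text as the token `\backslash` followed by a line break. The function name `escape_for_lyx` is hypothetical, and the committed change may differ in detail.

```python
def escape_for_lyx(cell):
    """Replace each backslash so LyX reads it as text, not as a command.

    Assumption: LyX stores a literal backslash as the token
    "\\backslash" followed by a line break. Hypothetical helper.
    """
    return cell.replace("\\", "\\backslash\n")

# A cell with one backslash, and a cell without any.
escaped = escape_for_lyx(r"50\60 Hz")
untouched = escape_for_lyx("plain text")
```

Without this step LyX would take the backslash as the start of a command in its file format, which would explain why the rest of the line disappeared on import.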
comment:19 by , 17 years ago
I tried the test file above and it works for me in rev. 24872.
Could you send me the generated lyx file, please?
comment:20 by , 17 years ago
Sorry, José, I hadn't seen that you modified csv2lyx.py. Everything's fine now.
comment:22 by , 17 years ago
Keywords: | fixedinbranch fixedintrunk added |
---|
fixedintrunk and branch:
http://www.lyx.org/trac/changeset/24862
http://www.lyx.org/trac/changeset/24863
comment:23 by , 17 years ago
Keywords: | fixedinbranch fixedintrunk removed |
---|---|
Resolution: | → fixed |
Status: | reopened → closed |
1.5.6 is ready. Marking FIXED.
comment:24 by , 11 years ago
Component: | import → converters |
---|
CSV-testfile