Opened 16 years ago

Closed 16 years ago

#4946 closed defect (fixed)

unicode combining characters export problems

Reported by: ps@… Owned by: ben.bob@…
Priority: high Milestone: 1.5.6
Component: latex export Version: 1.5.5
Severity: normal Keywords:
Cc: j.spitzmueller@…, georg.baum@…

Description

1. Create a new file and display the View Source pane.
2. Set Document->Settings->Language->Encoding to utf8x.
3. Put into the command buffer:
accent-tilde w
accent-tilde w
4. Now the second tilde gets the wrong metrics where it is painted.

Many different odd paintings can be obtained this way.

I guess it has something to do with unicode combining characters.

Attachments (4)

4946.diff (575 bytes ) - added by Juergen Spitzmueller 16 years ago.
patch (against branch)
4926-2.diff (17.6 KB ) - added by Juergen Spitzmueller 16 years ago.
consider force flag in non-plain utf8 encodings
x.diff (17.7 KB ) - added by Georg Baum 16 years ago.
consider force flag in utf8 encodings handled by inputenc
4926-3.diff (17.9 KB ) - added by Juergen Spitzmueller 16 years ago.
yet another patch


Change History (24)

comment:1 by milde@…, 16 years ago

This also applies to 1.5.5 with encodings utf8, UTF8, utf8-plain.

It has to do with unicode combining characters used by LyX for the accent-*
lfun.

While in non-unicode encodings, this is translated to an accent command like
\~{w}, the latex export in unicode encodings is clearly broken in 1.5.5
(is this fixed in 1.6?):

The accented character is surrounded by braces { } which prevents the
combining character from getting on top of it.

It not only looks odd in the source code but leads to errors when converting.

However, except for XeTeX, no latex variant is really ready to handle
combining characters in the input:

with utf8:

Unicode char \u8~ not set up for use with LaTeX

with utf8x:

~{w}
Composed characters can only be rendered correctly, when the option
'combine' is activated.


As the option 'combine' is fragile (see the documentation to the ucs package),
LyX should rather:

  • not use combining characters but convert to corresponding accented unicode character at input if such a character exists. (also facilitates editing and display in LyX)


  • export combining characters as LaTeX accent commands for unicode encodings as well as for non-unicode encodings.


Exception: utf8-plain, where it should most probably be passed as-is.
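The second suggestion above (export combining characters as LaTeX accent commands) can be sketched roughly as follows. This is only an illustration, not LyX's exporter: the table `kAccentCommands` and the function `toLatexAccents` are hypothetical names, the table lists just a few accents, and only ASCII base characters are handled.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical lookup table (not LyX's actual data):
// combining code point -> LaTeX accent command.
const std::map<char32_t, std::string> kAccentCommands = {
    {U'\u0300', "\\`"},   // combining grave accent
    {U'\u0301', "\\'"},   // combining acute accent
    {U'\u0302', "\\^"},   // combining circumflex
    {U'\u0303', "\\~"},   // combining tilde
    {U'\u0308', "\\\""},  // combining diaeresis
};

// Converts "base character + combining marks" into nested accent commands,
// e.g. 'w' followed by U+0303 becomes \~{w}.
// Assumption of this sketch: base characters are plain ASCII.
std::string toLatexAccents(const std::u32string& text)
{
    std::string out;
    std::size_t i = 0;
    while (i < text.size()) {
        const char32_t c = text[i++];
        if (c > 0x7f)
            throw std::invalid_argument("code point outside this sketch's scope");
        std::string unit(1, static_cast<char>(c));
        // Wrap the base character in one command per following combining mark;
        // the first mark ends up innermost.
        while (i < text.size()) {
            const auto it = kAccentCommands.find(text[i]);
            if (it == kAccentCommands.end())
                break;
            unit = it->second + "{" + unit + "}";
            ++i;
        }
        out += unit;
    }
    return out;
}

// Usage: the input typed in the report ("accent-tilde w" twice).
void demoAccentExport()
{
    assert(toLatexAccents(U"w\u0303w\u0303") == "\\~{w}\\~{w}");
}
```

Here the braces enclose the base character as the argument of the accent command, instead of separating the base character from a raw combining code point.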



comment:2 by Georg Baum, 16 years ago

Günter, what you describe is a different problem; this bug is about the display
of combining characters inside LyX. This problem does not appear in version 1.5.5.

Concerning your problem: The export is indeed broken in 1.5.5 (please file a
new report): The braces are only meant to be used in case the accent is output
as a command (\~) and not the unicode code point of the combining character.

comment:3 by Juergen Spitzmueller, 16 years ago

Cc: j.spitzmueller@… georg.baum@… added
Summary: View Source unicode characters painting problems → unicode combining characters export problems
Version: 1.6.0svn → 1.5.5

I think Pavel and Günter are talking about the same thing: the LaTeX export.
Note that Pavel is talking about the view source representation (but "painting"
might be misleading). Correct, Pavel?

Anyway, I do not see a painting problem (i.e. inside the main view) in either
trunk or branch, but I see the export problem in both trees. Changing the
summary.

by Juergen Spitzmueller, 16 years ago

Attachment: 4946.diff added

patch (against branch)

comment:4 by Juergen Spitzmueller, 16 years ago

Component: dialogs → latex export

comment:5 by Georg Baum, 16 years ago

Yes, I see nothing wrong with the patch.

comment:6 by Juergen Spitzmueller, 16 years ago

I've committed a slightly better patch to trunk that fixes the export:
http://www.lyx.org/trac/changeset/25347

Note that LaTeX produces an error, since inputenc does not support combining
characters. Unfortunately, setting the "force" flag for the combining character
does not help, since the utf8 encodings actually ignore that flag (which might
be a separate bug).

comment:7 by Georg Baum, 16 years ago

If they ignore the flag it was probably by design. Concerning the incomplete
utf8 support in standard LaTeX I believe it would be a good idea to ignore the
force flag only in the utf8-plain encoding.

comment:8 by Juergen Spitzmueller, 16 years ago

FWIW, I completely agree with comment 7.

by Juergen Spitzmueller, 16 years ago

Attachment: 4926-2.diff added

consider force flag in non-plain utf8 encodings

comment:9 by Georg Baum, 16 years ago

The patch looks good, but there is one detail that I would do differently:
Instead of

if (!forced_.empty() || iconvName_ != "UTF-8" || name_ == "utf8-plain")

I'd rather use

if (!forced_.empty() || iconvName_ != "UTF-8" || package_ != inputenc)

It does not make any difference for the existing encodings, but the default
for new encodings a user might define is IMHO better: Obey the force flag iff
inputenc is used.

And "forced_.empty() || forced_.find(c) == forced_.end()" is equivalent
to "forced_.find(c) == forced_.end()".
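
The equivalence claimed above can be checked with a small sketch. It assumes the forced list behaves like a `std::set` of code points; the alias `CharSet` and both function names are hypothetical and only stand in for the real member.

```cpp
#include <cassert>
#include <set>

// Hypothetical stand-in for the forced_ member discussed above.
using CharSet = std::set<char32_t>;

// Short form: a lookup in an empty set already returns end().
bool notForced(const CharSet& forced, char32_t c)
{
    return forced.find(c) == forced.end();
}

// Long form with the explicit empty() check in front.
bool notForcedWithEmptyCheck(const CharSet& forced, char32_t c)
{
    return forced.empty() || forced.find(c) == forced.end();
}

// Usage: both forms agree for an empty and a non-empty set.
void demoEquivalence()
{
    const CharSet none;
    const CharSet some = {U'\u0303'};
    assert(notForced(none, U'w') == notForcedWithEmptyCheck(none, U'w'));
    assert(notForced(some, U'w') == notForcedWithEmptyCheck(some, U'w'));
    assert(notForced(some, U'\u0303') == notForcedWithEmptyCheck(some, U'\u0303'));
}
```

The two forms return the same result; any difference between them can only be one of speed.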

by Georg Baum, 16 years ago

Attachment: x.diff added

consider force flag in utf8 encodings handled by inputenc

comment:10 by Juergen Spitzmueller, 16 years ago

> I'd rather use
>
> if (!forced_.empty() || iconvName_ != "UTF-8" || package_ != inputenc)
>
> It does not make any difference for the existing encodings,

Actually, it does make a difference in the UTF8 encoding, which does not use
inputenc but CJK. I just tested: in this encoding, the fix is needed as well.
Should we check for package == none instead?

> And "forced_.empty() || forced_.find(c) == forced_.end()" is equivalent
> to "forced_.find(c) == forced_.end()".

I have the impression that the extra empty() check makes the generation of the
output quite a bit faster. Try view source with the UG.

> I overlooked that the complete_ flag was not used anymore for utf8 encodings.

I don't understand.

comment:11 by Georg Baum, 16 years ago

Subject: Re: unicode combining characters export problems

> Should we check for package == none instead?

IMHO yes.

> I have the impression that the extra empty() check makes the generation
> of the output quite a bit faster. Try view source with the UG.

Then leave it as is of course. I did not imagine that this makes a
difference. Maybe the creation of the std::map iterators is too expensive,
since std::map::find() itself should be quite fast if the map is empty.

> > I overlooked that the complete_ flag was not used anymore for utf8
> > encodings.
>
> I don't understand.

complete_ tells whether the information about a specific encoding is
complete. In your patch you do not use this for the utf8 case, but rather
look whether forced_ is empty. IMHO it is better to use complete_ also for
the utf8 encodings. BTW I forgot to set complete_ = false for the utf8
encodings in the Encoding constructor.

comment:12 by Juergen Spitzmueller, 16 years ago

The problem with your approach is that forced_ is rebuilt for any (non-plain)
utf8 encoding, which is not needed, since the result will be always identical.

Checking for forced_.empty() ensures that the list is built only once, which
results in a significant speedup.
(Theoretically, we could also build the list in the constructor.)
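
The build-once idea can be sketched as below. This is a simplified illustration, not the code from the patch: the class `ForcedList`, its members, and the placeholder content of the list are all hypothetical.

```cpp
#include <cassert>
#include <set>

// Hypothetical holder of a forced-character list that is expensive to build.
class ForcedList {
public:
    // Returns true if c is in the list; builds the list on first use.
    bool isForced(char32_t c) const
    {
        // Guard: build only while the list is still empty.
        if (forced_.empty())
            build();
        return forced_.find(c) != forced_.end();
    }

    // Number of times the list was built (for illustration only).
    int buildCount() const { return builds_; }

private:
    void build() const
    {
        ++builds_;
        // Placeholder content; a real list would be read from data files.
        forced_.insert(U'\u0303');
    }

    mutable std::set<char32_t> forced_;
    mutable int builds_ = 0;
};

// Usage: repeated lookups trigger exactly one build.
void demoBuildOnce()
{
    ForcedList list;
    list.isForced(U'w');
    list.isForced(U'\u0303');
    assert(list.buildCount() == 1);
}
```

One caveat of guarding on empty(): if the built list were legitimately empty, it would be rebuilt on every lookup. A separate flag such as the complete_ member mentioned in comment 11 would avoid that.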

comment:13 by ps@…, 16 years ago

FYI, the bug was not intended to be about an export problem, but about the view
source pane. Anyway, I can't reproduce the problem now, so I'm OK with changing
the summary. Note also that editing combining characters is broken, for example
removing accent-tilde via the delete key when this character is not at the
beginning of the line.

All this is part of an extensive thread on the users list:
http://www.mail-archive.com/lyx-users@lists.lyx.org/msg65303.html
http://www.mail-archive.com/lyx-users@lists.lyx.org/msg65311.html

I will try to raise the Greek support question on the devel list soon.

comment:14 by Juergen Spitzmueller, 16 years ago

Note that the view source pane _uses_ the exported LaTeX, thus it is about export.

by Juergen Spitzmueller, 16 years ago

Attachment: 4926-3.diff added

yet another patch

comment:15 by Georg Baum, 16 years ago

You obviously know this stuff better than I do, as you have a good answer to
every nitpick of mine. Therefore I have nothing to complain about anymore ;-)

comment:16 by Juergen Spitzmueller, 16 years ago

Keywords: fixedintrunk added

comment:17 by Uwe Stöhr, 16 years ago

Milestone: 1.6.0

comment:18 by Juergen Spitzmueller, 16 years ago

Keywords: fixedinbranch added
Milestone: 1.6.0 → 1.5.6

comment:19 by Juergen Spitzmueller, 16 years ago

Keywords: fixedinbranch removed
Resolution: fixed
Status: new → closed

1.5.6 is ready. Marking FIXED.

comment:20 by Vincent van Ravesteijn, 10 years ago

Keywords: fixedintrunk removed