Opened 16 years ago

Closed 15 years ago

Last modified 8 years ago

#4971 closed defect (fixed)

Document>Settings>Language Encoding obscurity

Reported by: milde@…
Owned by: nobody@…
Priority: high
Milestone: 1.6.0
Component: dialogs
Version: 1.5.5
Severity: normal
Keywords:
Cc: j.spitzmueller@…

Description

The Language tab of the Document>Settings dialog should be made more
straightforward and brought in line with the other Settings dialogs.

  1. Replace the separate

[x] Use languages default encoding

with a [Language Default] entry in the drop down list.

A separate tick box for the default entry only makes sense if it toggles a
more complex form, as in the "Page Margins" tab.

In the Language tab, the situation is rather comparable to "Fonts", where
the Default is also included in the drop-down list.

  2. Change [LaTeX default] GUI name for \inputencoding default

The somewhat misleading GUI name triggered bug report 4968.
Comment #6 clarifies:

The "LaTeX default" encoding is defined to be identical with "Use
language encoding" with the only difference that "LaTeX default" does
not load any inputenc related packages.

It is the user's responsibility to load the needed packages

[and possibly insert ERT to tell LaTeX the correct encoding or
text parts in a different language].

Otherwise LaTeX errors like the reported ones are expected. This
encoding is only useful for experts in rare cases (I forgot which ones,
but they were useful, there have been discussions during the 1.5
development cycle).

I can follow the choice of the name element "default" in analogy to
other document settings where "default" signifies settings that do not
use extra packages.

However, in my understanding, "default" implies a recommended or fallback
setting which is safe to use in most circumstances, not a setting
"only useful for experts in rare cases".

The "safe bet" for LaTeX without the inputenc package is to use 7-bit
ASCII chars only, i.e. \inputencoding ascii. Therefore I would regard the
[ascii] encoding to be "LaTeX default".

My recommendation would be:

New                                  Current

[Language Default              ]     [x] Use languages default encoding
[Language Default (no inputenc)]     LaTeX default
[ASCII (7-bit)                 ]     ascii
[armscii8                      ]     no change
...

but this is open for discussion.

Change History (18)

comment:1 by milde@…, 16 years ago

* #4968 has been marked as a duplicate of this bug. *

comment:2 by Juergen Spitzmueller, 16 years ago

Note that "Language Default" != "LaTeX Default"

comment:3 by Juergen Spitzmueller, 16 years ago

Cc: j.spitzmueller@… added

My opinion: The UI should be kept as is. The extra checkbox indicates that the
user should stick with the language default encoding, if he doesn't have a
rationale not to do so. In the combo, the default would just be one of several
choices.

I agree, however, that "LaTeX default" is misleading and should be renamed.

comment:4 by milde@…, 16 years ago

After reading the comments, some more thoughts and input from #4966,
I put forward a revised proposal with the 3 main points:

  1. Keep the extra check-box (it indicates the default that is the "safe bet" in most cases and recommended by the LyX developers).
  2. Use clear and descriptive names to avoid the impression of "Word-like behaviour, where things work in a certain way depending on the phase of the moon." [Comment #33 in #4966]
  3. Re-order the combo items so that the "latex default" expert-option is no longer the default once the check-box is unchecked.

Tooltips for the different combo box items would be an added bonus.

Problematic GUI names
=====================

Language Default



This setting uses the encodings defined in LYXDIR/languages

  1. The (global) document encoding depends on the document language.
  2. Text parts in a different language use this language's default encoding (inputenc commands for encoding change are inserted in the text).


But:

Manual setting of the Encoding to the document language's default will
use this encoding also for parts in a different language.

Consequence:

"Already now things work in an unacceptable way, as if the language's
default encoding is latin1, for example, selecting "Use default
language's encoding" or unchecking it and directly selecting "latin1",
leads to different behaviour." [Comment #33 in #4966]
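
The discrepancy described above can be sketched in a few lines of Python. This is only an illustration, not LyX code: the table and the function name are hypothetical, the German and Russian defaults are the ones mentioned elsewhere in this ticket, and the latin1 default for English is an assumption made for the example.

```python
# Hypothetical stand-in for the defaults defined in LYXDIR/languages.
LANGUAGE_DEFAULT_ENCODING = {
    "english": "latin1",      # assumed for this example
    "german": "iso8859-15",
    "russian": "koi8-r",
}

def encodings_used(doc_language, part_languages, manual_encoding=None):
    """Return (global encoding, encoding of each text part).

    manual_encoding=None models the ticked "Use language's default
    encoding" box: each part is written in its own language's default.
    A manually selected encoding is applied uniformly to all parts.
    """
    if manual_encoding is None:
        global_enc = LANGUAGE_DEFAULT_ENCODING[doc_language]
        per_part = [LANGUAGE_DEFAULT_ENCODING[lang] for lang in part_languages]
    else:
        global_enc = manual_encoding
        per_part = [manual_encoding] * len(part_languages)
    return global_enc, per_part

parts = ["english", "russian", "english"]
ticked = encodings_used("english", parts)
manual = encodings_used("english", parts, manual_encoding="latin1")
print(ticked)
print(manual)
```

Both calls report latin1 as the global encoding, but only the first one switches encoding for the Russian part, which is the difference the GUI name should make visible.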


Proposal:

Use a name that makes this difference obvious, e.g. one of

  • Language dependent
  • Adaptive
  • Mixed

LaTeX Default


Note that "Language Default" != "LaTeX Default"

well, otherwise there would be no need for both of them ;-)

However, if there is any such thing as a LaTeX *Default* encoding, this is
IMO either

  • the encoding babel would choose for a given language (i.e. language default), or


  • pure 7-bit ASCII.

The name "LaTeX Default" in no way conveys the features of this
setting:

  1. The Unicode characters in the LyX source are converted into the encodings defined in LYXDIR/languages.
  2. Text parts in a different language are converted into a different encoding (if not by chance they happen to have the document encoding as default).
  3. TeX is not informed about the encoding of the text (and text parts) (no loading of inputenc, no global encoding set, no encoding change commands)
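
The three points above can be made concrete with a rough Python sketch. It is not how LyX exports files; the mapping and the function `export_raw` are hypothetical, and the codec names are Python's names for the German and Russian defaults mentioned in this ticket.

```python
# Hypothetical per-language defaults (stand-in for LYXDIR/languages).
LANGUAGE_DEFAULT_ENCODING = {"german": "iso8859-15", "russian": "koi8-r"}

def export_raw(parts):
    """Encode each (language, text) part in its language's default encoding.

    Nothing in the returned bytes records which encoding was used where,
    mirroring "no inputenc, no encoding change commands".
    """
    return b"".join(
        text.encode(LANGUAGE_DEFAULT_ENCODING[lang]) for lang, text in parts
    )

parts = [("german", "Grüße, "), ("russian", "привет")]
raw = export_raw(parts)

# A reader that assumes a single encoding for the whole file decodes the
# German part correctly but turns the Russian part into Latin letters.
as_latin9 = raw.decode("iso8859-15")
print(as_latin9 == "Grüße, привет")
```

The byte stream is valid in either encoding, so nothing fails loudly; the mix is simply invisible unless the user supplies the missing encoding information by hand.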

As long as LyX source and LaTeX source were both in the same
locale-dependent 8-bit encoding, this was acceptable.

With the change to Unicode, the encodings of LyX source and LaTeX source
differ (except for languages like Farsi and Vietnamese) and the encoding of
the LaTeX file no longer depends on the locale setting but on the content of
LYXDIR/languages.

Consequence:

Hard to predict behaviour in a setting that announces itself as "Default"
and is pre-selected when unticking "Language default".


LaTeX output in an encoding that is possibly un-processable on the
host system because of a "strange" encoding.


Mix of encodings in the LaTeX source without indication.


Proposal:

a) Remove this setting (replace with utf8-plain, the UTF-8 equivalent of
   the "do not use inputenc" feature)

or, find out what it is needed for and

b) • Do not use the content-language-dependent encoding but the system
     locale's default encoding for the lyx-source (Unicode)->latex-source
     conversion. This also automatically ensures that the whole LaTeX
     source uses one single encoding even if the text uses different
     languages (with different default encodings).
   • Rename to "Locale default" or "Locale default (plain)"
   • Put in last position of the combo box items.


UTF8


The difference between utf8 and UTF8 is not obvious.

Proposal:

Rename UTF8 to UTF8 (CJK) in the GUI

Alternative:

utf8
utf8x (ucs)
UTF8 (CJK)
utf8-plain (XeTeX)


8859


Please normalise the names

iso88595
8859-6
iso-8859-7
8859-8

in the GUI.

Proposal:

iso8859-<digit> (as used in LYXDIR/languages)
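
A minimal sketch of such a normalisation, assuming the variants are limited to the spellings listed above (optional "iso" prefix, optional separators). The function name is hypothetical and this is not taken from the LyX sources.

```python
import re

def normalise_iso8859(name):
    """Map variant spellings of an ISO 8859 part to 'iso8859-<n>'.

    Names that do not look like an ISO 8859 encoding are returned unchanged.
    """
    match = re.fullmatch(
        r"(?:iso)?[-_ ]?8859[-_ ]?(\d{1,2})", name.strip(), re.IGNORECASE
    )
    if match is None:
        return name
    return f"iso8859-{int(match.group(1))}"

for raw_name in ["iso88595", "8859-6", "iso-8859-7", "8859-8", "koi8-r"]:
    print(raw_name, "->", normalise_iso8859(raw_name))
```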


comment:5 by Juergen Spitzmueller, 16 years ago

We have to ponder all of this carefully, but in general, the critique is valid.

comment:6 by Georg Baum, 16 years ago

IMHO neither option a) nor option b) makes sense. I remember lengthy discussions
about how to convert old documents in "LaTeX default" encoding to unicode, and
the outcome was that it was worth keeping this choice. Why would you want to
change this to the locale encoding? Then the document may be uncompilable,
depending on the user's environment. Note that

"As long as LyX source and LaTeX source were both in the same
locale-dependent 8-bit encoding, this was acceptable."

is wrong (at least since version 1.2). As far as the LaTeX output is concerned,
it works in LyX 1.5 exactly the same as in older releases. The only difference
is that the encoding of the .lyx file may now be different from the one of
the .tex file (since the .lyx file always uses utf8), and now you are not limited to
single-byte encodings like the iso8859 family, and therefore it is easier to
shoot yourself in the foot.

IMHO, the following items make sense in addition to numbers 1.-3. at the
beginning of comment 4:

c) Don't display the LaTeX name of an encoding in the GUI, but introduce a
translatable GUI name and use that. The LaTeX names are inconsistent for
historical reasons. A good example of how to name encodings GUI-wise is kate
(e.g. "western european (iso 8859-1)"). In addition to the names you already
mentioned I'd throw in "Automatic" for "Language default" and "Automatic (raw)"
for "LaTeX default". IMHO there does not need to be any encoding named "LaTeX
default".

d) Optionally, give better guidance in the docs about encoding choice: List
common problems and solutions.

e) Optionally, re-evaluate the reasons for having "LaTeX default", and if they
are no longer valid, ditch it. Otherwise, document it.

comment:7 by milde@…, 16 years ago

Thanks for the comments and for setting me right about the "locale
dependent" encoding.

A new proposal:

Mixed encoding option


Introduce a separate tick box in the Encodings tab for

[x] Flexible: language-dependent encodings for text parts in
    different languages.

as opposed to a uniform encoding for the whole document.

Currently,

  • "Language default" implies "flexible", while
  • all but one of the items from the combo box imply a "uniform" encoding.


Exception: "LaTeX default" also uses "flexible" encoding
(with the need to specify the encoding as ERT).


(Question: Did "LaTeX default" in LyX < 1.5 use a "uniform" document-wide
encoding?)

With the new option:

  • It would be possible to specify a document encoding other than the "language default" (latin-1 instead of latin-9, say) but still have "flexible" encoding for parts in foreign languages.
  • It would be possible to let LyX choose the default encoding for the document language but force a uniform encoding for the whole document.
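
A rough sketch of how the two independent options would combine. Everything here is hypothetical (function name, parameters, the small defaults table); it only models the proposal and does not describe existing LyX behaviour.

```python
# Hypothetical per-language defaults (stand-in for LYXDIR/languages).
LANGUAGE_DEFAULT_ENCODING = {"german": "iso8859-15", "russian": "koi8-r"}

def plan_encodings(doc_language, part_languages, main_encoding=None, flexible=True):
    """Return the encoding for each text part under the proposed options.

    main_encoding=None lets the document language pick the main encoding.
    flexible=True encodes foreign-language parts in their own default;
    flexible=False forces the main encoding everywhere.
    """
    main = main_encoding or LANGUAGE_DEFAULT_ENCODING[doc_language]
    result = []
    for lang in part_languages:
        if flexible and lang != doc_language:
            result.append(LANGUAGE_DEFAULT_ENCODING[lang])
        else:
            result.append(main)
    return result

parts = ["german", "russian"]
# Main encoding chosen by hand, foreign parts still flexible.
custom_flexible = plan_encodings("german", parts, main_encoding="latin1")
# Main encoding from the language default, but uniform for the whole document.
default_uniform = plan_encodings("german", parts, flexible=False)
print(custom_flexible)
print(default_uniform)
```

The two calls correspond to the two bullet points above: neither combination can be expressed with the current single checkbox plus combo box.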

Advantages:

  • with the transformations defined in unicodesymbols, a uniform encoding is more predictable and "in our hands" (See problems with iso-8859-7 in bug #4966)
  • Avoid the behaviour criticised in Comment #33 in #4966

... if the language's default encoding is latin1, for example, selecting
"Use default language's encoding" or unchecking it and directly selecting
"latin1", leads to different behaviour.

GUI names


An advantage of using LaTeX names in the GUI is, that it is easier to track
down the used definition file and find documentation.

This could be further facilitated by appending the used package from the
<package> field in encodings (replacing "none" with "plain").

Maybe the encodings file could be amended with a <comment> field for
some documentation that is either shown in the combo-box list or as tool-tip.
Many encodings definitions already have small comments now.

For the current "Language default" and "LaTeX default", I propose

"Language default"

and

"Language default (plain)"

in the style of the established names "utf8" and "utf8 (plain)", where
(plain) indicates (and should be documented as) "do not use inputenc".

Misc


Some feedback about the "Language default" would be nice as well.
Maybe pre-selection of the language default encoding in the combo-box.

Documentation is discussed in the lyx-docs list and worked on.

comment:8 by Georg Baum, 16 years ago

(Question: Did "LaTeX default" in LyX < 1.5 use a "uniform" document-wide
encoding?)

No. The only two differences between "LaTeX default" in 1.5 and earlier
versions are

  • the encoding of the .lyx file (mixed in pre-1.5, utf8 in 1.5)
  • the possibility to use all unicode characters in the .lyx file in 1.5

Advantages:

  • with the transformations defined in unicodesymbols, a uniform encoding is more predictable and "in our hands" (See problems with iso-8859-7 in bug #4966)

You could easily achieve the same goal by setting a fixed encoding manually.

  • Avoid the behaviour criticised in Comment #33 in #4966

... if the language's default encoding is latin1, for example, selecting
"Use default language's encoding" or unchecking it and directly selecting
"latin1", leads to different behaviour.

That should not happen of course. Unfortunately the new option would not change
this bug, it would only allow another workaround.

An advantage of using LaTeX names in the GUI is, that it is easier to track
down the used definition file and find documentation.

This could be further facilitated by appending the used package from the
<package> field in encodings (replacing "none" with "plain").

The disadvantage is the non-uniform naming you criticised. I still believe that
an extra GUI name would be good, but of course it could be augmented with LaTeX
name and package name in a tool tip.
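
As a sketch of that idea: a descriptive GUI name in the combo box, with the LaTeX name and package relegated to the tooltip. The class and field names are hypothetical and the two entries are illustrative only; they are not copied from the encodings file.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EncodingEntry:
    latex_name: str  # name used on the LaTeX side
    gui_name: str    # translatable, descriptive name shown in the combo box
    package: str     # package handling the encoding, or "none"

    def tooltip(self):
        # Show "plain" instead of "none", as suggested earlier in this ticket.
        package = "plain" if self.package == "none" else self.package
        return f"LaTeX name: {self.latex_name}, package: {package}"

entries = [
    EncodingEntry("latin1", "Western European (ISO 8859-1)", "inputenc"),
    EncodingEntry("utf8-plain", "Unicode (XeTeX) (utf8)", "none"),
]
for entry in entries:
    print(f"{entry.gui_name}  [{entry.tooltip()}]")
```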

comment:9 by milde@…, 16 years ago

  • with the transformations defined in unicodesymbols, a uniform encoding is more predictable and "in our hands" ...

You could easily achieve the same goal by setting a fixed encoding manually.

However, there is also comment #3:

The extra checkbox indicates that the user should stick with the
language default encoding, if he doesn't have a rationale not to do so.

With the transformations defined in unicodesymbols, is there still a
rationale to prefer a mixed, language default encoding in the LaTeX
source?

(In view of the major difference between the mixed and uniform encoding,
I agree that it is better to keep the checkbox.)

Currently, "mixed" encoding is tied to "language default" (and the
misnamed "LaTeX default"), while specifying an encoding implies a global,
uniform document encoding.

This does not care for 2 cases:

  1. uniform encoding in the language default

This can be achieved by manually setting the encoding to the language
default.


It would be made easier, if the language default were preselected in
the combo box. (Is this doable with reasonable effort?)


  2. mixed encoding with main encoding different from language default

This is currently impossible.


E.g. German document in iso8859-1 encoding (instead of the German
default of iso8859-15) with Russian parts in Russian default (koi8).

Is there a workaround for case 2? Is it never needed?

comment:10 by Juergen Spitzmueller, 16 years ago

As a first step, I've implemented GUI encoding names to trunk:
http://www.lyx.org/trac/changeset/25558

comment:11 by milde@…, 16 years ago

Thanks for the GUI names change. It's a step in the right direction.

Still problematic is the [LaTeX default] entry:

"only useful for experts in rare cases" but on a prominent first (and
pre-selected) place in the list of Other encodings.

  • Could this entry be placed at the *last* position in the encodings list?
  • Could its GUI name be changed?

Proposals:

  • Language default (raw)
  • Language default (no inputenc)
  • Language default (plain)

comment:12 by Juergen Spitzmueller, 16 years ago

I agree with the general proposal, however, I'm not convinced by the naming
proposals.

How about "Pass-through (no inputenc)"?

comment:13 by milde@…, 16 years ago

How about "Pass-through (no inputenc)"?

No, "Pass-through" would (since 1.5) imply UTF8, i.e.
"Pass-through (no inputenc)" is a correct description of
what is currently called "Unicode (XeTeX) (utf8)"¹.

The misnamed "LaTeX default", however, uses the language default encoding
(as defined in LYXDIR/languages) which in most cases implies a recoding from
utf8 to something else.

Also, both "Language default" and "LaTeX default" use "local encodings":
text parts in a different language will be encoded in this language's default
encoding which might differ from the rest of the document.

Therefore, I vote for a similar naming of these two settings, e.g. one of

  • Language dependent
  • Language specific
  • Automatic

with one of (plain) or (no inputenc) appended for the current
"LaTeX default".

¹ The name of "Unicode (XeTeX) (utf8)" might better be changed to e.g.
"Unicode (XeTeX) (no inputenc)", as this setting also does away with the

\usepackage[utf8x]{inputenc}

line in the document preamble.

comment:14 by Juergen Spitzmueller, 16 years ago

Keywords: fixedintrunk added
Milestone: 1.6.0

OK, I renamed it to "Language Default (no inputenc)":
http://www.lyx.org/trac/changeset/26281

Placing it in the last position of the combo is too tricky. I think the
renaming is sufficient.

The name of "Unicode (XeTeX) (utf8)" might better be changed to e.g.
"Unicode (XeTeX) (no inputenc)"

No. utf8 is important, since Unicode does not necessarily mean utf8. And "no
inputenc" is redundant, since LaTeX-literate XeTeX users know that XeTeX does
not use inputenc, while for others, inputenc itself is too cryptic.

I'm closing this bug now.

comment:15 by milde@…, 16 years ago

OK, I renamed it to "Language Default (no inputenc)"

Thanks a lot.

Placing it in the last position of the combo is too tricky.

This is a pity, as this experts-only setting
(no one seems to remember what it is useful for) stays
very visible and pre-selected.

Would it be possible to place "ASCII" first and
"Language Default (no inputenc)" second?
(Maybe by removing it from the encodings list and hard-coding?)

Or could (as a stop-gap measure) the preceding comment:

Always put the default encoding in the first position.

be changed to something like

Special option (not in the encodings list)
TODO: place at last position or pre-select the document language's default

I think the renaming is sufficient.

Hopefully the 'cryptic' "no inputenc" will deter
the 'uninitiated' from ever using this.

The name of "Unicode (XeTeX) (utf8)" might better be changed ...
No. utf8 is important, since Unicode does not necessarily mean utf8.

To make clear that this does not imply "utf8.def", it would be better to use the
common spelling "(UTF-8)" for *all* Unicode settings (including "utf8x").
http://www.unicode.org/versions/Unicode4.0.0/appC.pdf

comment:16 by Juergen Spitzmueller, 16 years ago

Would it be possible to place "ASCII" first and
"Language Default (no inputenc)" second?

No. "default" is a special case that is added separately, the others are from
the encodings file. Placing it anywhere other than the first position is likely
to cause trouble.

To make clear that this does not imply "utf8.def", it would be better to use the
common spelling "(UTF-8)" for *all* Unicode settings (including "utf8x")

I've used the naming model of KDE and prefer to stick with that.

comment:17 by Juergen Spitzmueller, 15 years ago

Keywords: fixedintrunk removed
Resolution: fixed
Status: new → closed

LyX 1.6.0 is released.

comment:18 by Guenter Milde, 8 years ago

Remaining issues are now in ticket #9883.
