From owner-ntemacs-users@june  Tue Aug 27 17:24:38 1996
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Tue" "27" "August" "1996" "16:45:00" "PDT" "George V. Reilly" "georger@microcrafts.com" nil "27" "RE: More ctrl-M stuff" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from joker.cs.washington.edu (joker.cs.washington.edu [128.95.1.42]) by june.cs.washington.edu (8.7.5/7.2ju) with SMTP id RAA29214 for <voelker@june.cs.washington.edu>; Tue, 27 Aug 1996 17:24:38 -0700
Received: from june.cs.washington.edu (june.cs.washington.edu [128.95.1.4]) by joker.cs.washington.edu (8.6.12/7.2ws+) with ESMTP id RAA30852 for <voelker@joker.cs.washington.edu>; Tue, 27 Aug 1996 17:24:37 -0700
Received: from halcyon.com (smtp2.halcyon.com [198.137.231.18]) by june.cs.washington.edu (8.7.5/7.2ju) with SMTP id QAA25586 for <ntemacs-users@cs.washington.edu>; Tue, 27 Aug 1996 16:46:01 -0700
Received: from ms-smtp.wa.com by halcyon.com with SMTP id AA11191   (5.65c/IDA-1.4.4 for <ntemacs-users@cs.washington.edu>); Tue, 27 Aug 1996 16:46:00 -0700
Received: by ms-smtp.wa.com with Microsoft Mail 	id <32238978@ms-smtp.wa.com>; Tue, 27 Aug 96 16:49:12 PDT
Message-Id: <32238978@ms-smtp.wa.com>
Encoding: 27 TEXT
X-Mailer: Microsoft Mail V3.0
From: "George V. Reilly" <georger@microcrafts.com>
To: ntemacs-users <ntemacs-users@cs.washington.edu>
Subject: RE: More ctrl-M stuff
Date: Tue, 27 Aug 96 16:45:00 PDT


The solution that Vim uses, which works well in practice,
is to have two variables, textauto (global) and textmode
(buffer-local).  If textmode is set, a file is written with
DOS-style (CR-LF) line separators; if it's off, the file is
written with Unix-style (LF) line separators.  By default,
textmode is set on all new buffers for DOS-like systems
(DOS, OS/2, Win32) and cleared on all other systems.  If
textauto is set, then textmode is set for a buffer when
a file is read in which has every line separated by CR-LFs
and cleared otherwise.  In either case, the file looks fine
on screen.

If you edit and write a file, the line separator settings will
remain the same unless you explicitly override them.  This is
something I find very annoying with NT Emacs---especially
when diffing a modified file against an original file which
came from Unix and having diff report the whole file has
changed.  If the file has non-standard separator settings for
the OS (e.g., LFs on NT), you'll see a note about it in the
message line.
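
A rough Emacs Lisp sketch of the scheme described above, for anyone who
wants to experiment.  The sketch-* names are made up for illustration
and are neither Vim's nor Emacs's own; the code also assumes the buffer
holds the raw, untranslated bytes of the file.

```elisp
;; Sketch only: the sketch-* names are hypothetical.
(defvar sketch-textauto t
  "Non-nil means set `sketch-textmode' from a file's contents when read.")

(defvar sketch-textmode (and (memq system-type '(ms-dos windows-nt)) t)
  "Non-nil means write this buffer with CR-LF line separators.")
(make-variable-buffer-local 'sketch-textmode)

(defun sketch-all-lines-crlf-p ()
  "Return non-nil if the buffer has line breaks and all of them are CR-LF."
  (save-excursion
    (goto-char (point-min))
    (and (search-forward "\n" nil t)
         (progn
           (goto-char (point-min))
           ;; A bare LF sits at the buffer start or after a non-CR character.
           (not (re-search-forward "\\(\\`\\|[^\r]\\)\n" nil t))))))

(defun sketch-set-textmode-on-read ()
  "Update `sketch-textmode' from the raw buffer contents; return it."
  (when sketch-textauto
    (setq sketch-textmode (sketch-all-lines-crlf-p)))
  sketch-textmode)
```

With sketch-textauto off, the per-buffer flag is left at whatever the
user (or the platform default) set it to, which is the "settings remain
the same unless you override them" behaviour.
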
 --
/George V. Reilly   <georger@microcrafts.com>   <gvr@halcyon.com>
MicroCrafts, Inc., 17371 NE 67th Ct #205, Redmond, WA 98052, USA.
Tel: +1 206/250-0014  Fax: 206/250-0100  Web: www.microcrafts.com
Vim 4 (vi clone) for NT & Windows 95: http://www.halcyon.com/gvr/
pgp fingerprint: e2 b4 83 64 11 52 21 ea  bf d8 51 c2 11 00 78 fc  

From owner-ntemacs-users@june  Fri Nov  1 08:18:26 1996
X-VM-v5-Data: ([nil nil nil nil t nil nil nil nil]
	[nil "Fri" " 1" "November" "1996" "16:34:38" "+0100" "Frederic Corne" "frederic.corne@erli.fr" "<9611011534.AA07747@orme.sunserv>" "28" "Pb of crlf with Samba and untranslate" "^From:" nil nil "11" nil nil nil nil]
	nil)
Received: from joker.cs.washington.edu (joker.cs.washington.edu [128.95.1.42]) by june.cs.washington.edu (8.7.6/7.2ju) with SMTP id IAA10826 for <voelker@june.cs.washington.edu>; Fri, 1 Nov 1996 08:18:26 -0800
Received: from june.cs.washington.edu (june.cs.washington.edu [128.95.1.4]) by joker.cs.washington.edu (8.6.12/7.2ws+) with ESMTP id IAA25748 for <voelker@joker.cs.washington.edu>; Fri, 1 Nov 1996 08:18:24 -0800
Received: from polaris.gsi.fr (polaris.gsi.fr [150.175.128.2]) by june.cs.washington.edu (8.7.6/7.2ju) with ESMTP id HAA08122 for <ntemacs-users@cs.washington.edu>; Fri, 1 Nov 1996 07:34:55 -0800
Received: from erli.fr ([150.175.65.76]) by polaris.gsi.fr (8.7.3/8.6.12) with SMTP id QAA04907 for <ntemacs-users@cs.washington.edu>; Fri, 1 Nov 1996 16:35:53 +0100 (MET)
Received: from orme.sunserv by erli.fr (4.1/SMI-4.1) 	id AA19201; Fri, 1 Nov 96 16:34:40 +0100
Received: by orme.sunserv (5.x/SMI-SVR4) 	id AA07747; Fri, 1 Nov 1996 16:34:38 +0100
Message-Id: <9611011534.AA07747@orme.sunserv>
Reply-To: frederic.corne@erli.fr
From: Frederic Corne <frederic.corne@erli.fr>
To: ntemacs-users@cs.washington.edu
Subject: Pb of crlf with Samba and untranslate
Date: Fri, 1 Nov 1996 16:34:38 +0100


NOTE : This is a repost. It seems my previous message was lost.



I have installed Samba 1.9.16p7 on my unix box and I use 
untranslate.el with emacs19.31.1 on my NT machine.

(load "untranslate")
(add-untranslated-filesystem "E:")

at the top of my .emacs file

When I read and write a simple file (for example, a README file),
everything is OK: no CRLF before or after.

But when the file has a particular mode (C, C++, text, ...), the read is
correct (no CRLF), but when I save the file after modifying it, CRLF is added.


Any idea ?


FC

-- 

****  Frederic CORNE   GSI-ERLI  frederic.corne@erli.fr ****

From da@dcs.ed.ac.uk  Wed Jan 22 04:21:09 1997
X-VM-v5-Data: ([nil nil nil nil t nil nil nil nil]
	[nil "Wed" "22" "January" "1997" "12:20:18" "+0000" "David Aspinall" "da@dcs.ed.ac.uk" "<199701221221.EAA09932@june.cs.washington.edu>" "35" "Re: DOS (text) mode" "^From:" nil nil "1" nil nil nil nil]
	nil)
Received: from rainich.dcs.ed.ac.uk (rainich.dcs.ed.ac.uk [129.215.160.105]) by june.cs.washington.edu (8.8.3+CSE/7.2ju) with ESMTP id EAA09932 for <voelker@cs.washington.edu>; Wed, 22 Jan 1997 04:21:04 -0800
Message-Id: <199701221221.EAA09932@june.cs.washington.edu>
Received: from INVOKE.demon.co.uk (actually host modem3.dcs.ed.ac.uk)            by rainich.dcs.ed.ac.uk with SMTP (PP);           Wed, 22 Jan 1997 12:19:57 +0000
X-Mailer: emacs 19.34.1 (via feedmail 3 Q) 
In-Reply-To: <199701220756.XAA25816@joker.cs.washington.edu>
References: <199701151430.GAA18597@june.cs.washington.edu>	<199701220756.XAA25816@joker.cs.washington.edu>
From: David Aspinall <da@dcs.ed.ac.uk>
To: voelker@cs.washington.edu (Geoff Voelker)
Cc: da@dcs.ed.ac.uk
Subject: Re: DOS (text) mode
Date: Wed, 22 Jan 1997 12:20:18 +0000

 > I'm unfamiliar with format-alist; what support is missing?

format-alist:  "List of information about understood file formats."

I think it was added to deal with enriched mode where text properties
are saved to the file.

I don't know much about it --- I just read the doc string.  From that
it seems as if it might cope nicely with DOS text files, if a regular
expression could be used to match the start of a file.  (If not,
perhaps format-alist could be extended to use a regexp or a function
argument).  Then it will automatically call hooks to encode and decode
the buffer.

I don't think this would add anything new to existing mechanisms
(whether the built-in handling of binary files, or the "DOS" minor
mode), but since Emacs now provides a hook for decoding different file
formats it might seem wise to integrate with it?

After discussions on the list about various DOS translation ideas I
thought I should mention this variable.
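
For illustration only, here is roughly what such an entry might look
like.  This is a sketch based on the documented entry layout (NAME
DOC-STRING REGEXP FROM-FN TO-FN MODIFY MODE-FN); the sketch-* names are
hypothetical, and the entry is only defined here, not installed.

```elisp
(defun sketch-dos-text-decode (begin end)
  "Strip the CR from each CR-LF between BEGIN and END; return the new end."
  (save-excursion
    (save-restriction
      (narrow-to-region begin end)
      (goto-char (point-min))
      (while (search-forward "\r\n" nil t)
        (replace-match "\n" t t))
      (point-max))))

(defun sketch-dos-text-encode (begin end &optional _buffer)
  "Turn each LF between BEGIN and END into CR-LF; return the new end.
Assumes the region holds no CR-LF pairs already."
  (save-excursion
    (save-restriction
      (narrow-to-region begin end)
      (goto-char (point-min))
      (while (search-forward "\n" nil t)
        (replace-match "\r\n" t t))
      (point-max))))

;; The regexp is matched at the start of the file: a first line that
;; ends in CR-LF is taken to mean DOS text.
(defvar sketch-dos-text-format
  '(sketch-dos-text "Hypothetical DOS text format (CR-LF line endings)."
                    "[^\n]*\r\n"
                    sketch-dos-text-decode sketch-dos-text-encode t nil)
  "A candidate `format-alist' entry; not installed by this sketch.")

;; To try it: (setq format-alist (cons sketch-dos-text-format format-alist))
```

Whether the regexp ever gets to see the CRs depends on what the file
reading code has already translated by the time the format hooks run.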

Personally I dislike the current mechanism: I would rather that files
were handled in "binary" mode by default, and only in DOS-text mode if
they can be deduced to be in DOS-text mode when visited.  (Perhaps
some file extensions should trigger DOS-text mode, but I am not
convinced).  There should be an easy way to switch to DOS-text mode,
just as with enriched mode.  I think this would be a nice behaviour
for those of us that use mixed text-formats; for people who use only
DOS-text, perhaps there could be a variable to enable the current
DOS-loving behaviour.

 - David.
 


From waider@autodealing.com  Wed Mar  5 04:25:54 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Wed" " 5" "March" "1997" "11:24" "GMT" "Ronan Waide" "waider@autodealing.com" nil "20" "bug in load from ange-ftp directory?" "^From:" nil nil "3" nil nil nil nil]
	nil)
Received: from trout.cs.washington.edu (trout.cs.washington.edu [128.95.1.178]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id EAA15893 for <voelker@cs.washington.edu>; Wed, 5 Mar 1997 04:25:48 -0800
Received: from mail (gate.autodealing.com [194.125.131.131]) by trout.cs.washington.edu (8.8.5+CS/7.2ws+) with SMTP id DAA04609 for <voelker@cs.washington.edu>; Wed, 5 Mar 1997 03:59:38 -0800 (PST)
Received: from waider.cognotec.com by mail with smtp 	(Smail3.1.29.1 #3) id m0w2EoI-002mKGC; Wed, 5 Mar 97 11:24 GMT
Message-Id: <m0w2EoI-002mKGC@mail>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Organization: AutoDealing Software, Ltd.
From: Ronan Waide <waider@autodealing.com>
To: Geoff Voelker <voelker@cs.washington.edu>,         Andrew Innes <andrewi@harlequin.co.uk>
Subject: bug in load from ange-ftp directory?
Date: Wed, 5 Mar 97 11:24 GMT

Hiho,

I'm using the recent patched version of emacs 19.34 on win95 at the
moment. In an attempt to consolidate disparate emacs src and lib
directories, I've put a lot of stuff on a local ftp-able machine, and
I load it from there. However, emacs seems to have some trouble
loading .elc files via the ftp link; it successfully downloads them to
the local drive, but then fails to load them, usually complaining of
a missing bracket. Doing a find-file followed by eval-current-buffer
works fine, however. I suspect it may be loading the downloaded file
in text-mode, since ange-ftp creates the downloaded file as a
temporary file with no extension. Could either of you confirm this
suspicion?

Regards,
Waider. I'll try hacking ange-ftp-load (again!) in the meantime.
-- 
waider@autodealing.com / AutoDealing Software Ltd / +353-1-6766455

Never attribute to malloc that which can be adequately explained by stupidity

From owner-ntemacs-users@trout  Tue Apr  8 06:12:18 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Tue" " 8" "April" "1997" "13:29:59" "+0100" "Andrew Innes" "andrewi@harlequin.co.uk" nil "67" "Re: Attachments via ange-ftp" "^From:" nil nil "4" nil nil nil nil]
	nil)
Received: from joker.cs.washington.edu (joker.cs.washington.edu [128.95.1.42]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id GAA04047 for <voelker@june.cs.washington.edu>; Tue, 8 Apr 1997 06:12:17 -0700
Received: from trout.cs.washington.edu (trout.cs.washington.edu [128.95.1.178]) by joker.cs.washington.edu (8.6.12/7.2ws+) with ESMTP id GAA30228 for <voelker@joker.cs.washington.edu>; Tue, 8 Apr 1997 06:12:17 -0700
Received: from june.cs.washington.edu (june.cs.washington.edu [128.95.1.4]) by trout.cs.washington.edu (8.8.5+CS/7.2ws+) with ESMTP id FAA27051 for <ntemacs-users@trout.cs.washington.edu>; Tue, 8 Apr 1997 05:31:51 -0700 (PDT)
Received: from holly.cam.harlequin.co.uk (holly.cam.harlequin.co.uk [193.128.4.58]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id FAA03036 for <ntemacs-users@cs.washington.edu>; Tue, 8 Apr 1997 05:31:48 -0700
Received: from propos.long.harlequin.co.uk (propos.long.harlequin.co.uk [193.128.93.50]) by holly.cam.harlequin.co.uk (8.8.4/8.7.3) with ESMTP id NAA01533; Tue, 8 Apr 1997 13:30:46 +0100 (BST)
Received: from elan.long.harlequin.co.uk (elan.long.harlequin.co.uk [193.128.93.78]) by propos.long.harlequin.co.uk (8.8.4/8.6.12) with SMTP id NAA29309; Tue, 8 Apr 1997 13:29:59 +0100 (BST)
Message-Id: <199704081229.NAA29309@propos.long.harlequin.co.uk>
In-reply-to: <QQcjly00819.199704020234@crystal.WonderWorks.COM> (message from 	Kyle Jones on Tue, 1 Apr 1997 21:34:19 -0500 (EST))
From: Andrew Innes <andrewi@harlequin.co.uk>
To: kyle_jones@wonderworks.com
CC: gray@austin.apc.slb.com, info-vm@uunet.uu.net,         ntemacs-users@cs.washington.edu
Subject: Re: Attachments via ange-ftp
Date: Tue, 8 Apr 1997 13:29:59 +0100 (BST)

On Tue, 1 Apr 1997 13:42:36 -0600, gray@austin.apc.slb.com (Douglas Gray Stephens) said:
>I suspect that my problem is PC related, but I'm not sure if it
>can/should be fixed in VM, or nt-emacs, hence I'm cross posting this
>to ntemacs-users@cs.washington.edu to see if the nt-emacs side have
>any suggestions.

Yes, this problem is PC specific (for the most part).

On Tue, 1 Apr 1997 21:34:19 -0500 (EST), Kyle Jones <kyle_jones@wonderworks.com> said:
>Douglas Gray Stephens writes:
>>[...]
>>This ^M will be causing vm to encode the message in base64.
>>
>>I am not sure why you've used
>>insert-file-contents-literally
>>instead of
>>insert-file-contents
>
>To avoid problems with file handlers uncompressing or otherwise
>fiddling with the input.  Maybe this is the wrong thing to do.
>I'm willing to switch to insert-file-contents and see if that
>works better.

Given that we are talking about including files as MIME attachments, I
think using insert-file-contents-literally is, in principle, the right
thing to do; the "problem" in this context is that it disables the
(imperfect) file type detection code used on Windows as well as
inhibiting the various handlers and hook functions.

Strictly speaking, if the original file uses DOS line endings, then that
is what should be transmitted (in base64 encoding if required).
However, if it is simply a plain text file, it would generally be more
helpful to treat it as such, and convert it to whatever line ending
convention is most suitable - in this case, convert to Unix line endings
so that the contents are transmitted in the clear.

So, although insert-file-contents-literally is strictly correct, in this
instance it would be more helpful to use a modified version which only
inhibits the handlers and hook functions, but leaves the file type code
in place.  Such a change should be safe to make, since it will only
affect Windows where it will generally do the right thing.

Aside:

The whole issue of how text files are handled, by the DOS and Windows
ports of Emacs at least, is really overdue for a major rethink.

The current method for determining whether a file is text (implicitly
meaning DOS text) or binary is based on regular expression matching
against the file name.  This leads to all sorts of hassles, most of
which could be easily avoided by using a simple content scanning
heuristic to identify whether a file is text or binary, and the line
ending convention (DOS, Mac, Unix) if text.

Personally, I would like to see this heuristic incorporated into Emacs
(on all platforms, not just DOS and Windows) - it would make editing and
manipulating text files from different sources mostly transparent.  I
don't know how likely it is this will happen though, since the Mule
capabilities currently being added to Emacs (which must deal with the
more general language/charset encoding properties of files and other
data streams) will probably subsume this issue, and may do so in a
completely different and more general way.

Still, I expect that the line ending convention is usually orthogonal to
charset encoding, so maybe there is a chance to do this anyway.
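
To make the idea concrete, here is a minimal sketch of one possible
content scanning heuristic.  It is not the code of any Emacs port: the
sketch-* names are hypothetical, treating a NUL character as the sign of
a binary file is an assumption, and the buffer is assumed to hold the
raw file contents.

```elisp
(defvar sketch-scan-distance 1000
  "Hypothetical limit on how many characters of the buffer to examine.")

(defun sketch-classify-buffer ()
  "Guess the type of the current buffer's raw contents.
Return `binary', `dos', `mac', `unix' or `mixed'.  Text without any
line breaks is reported as `unix'."
  (save-excursion
    (goto-char (point-min))
    (let ((limit (min (point-max) (+ (point-min) sketch-scan-distance)))
          (crlf 0) (cr 0) (lf 0) (binary nil))
      (while (and (not binary) (< (point) limit))
        (let ((c (char-after)))
          (cond ((= c 0) (setq binary t))   ; NUL: assume binary
                ((= c ?\r)
                 (if (eq (char-after (1+ (point))) ?\n)
                     ;; Count the pair once and step over its CR.
                     (progn (setq crlf (1+ crlf)) (forward-char 1))
                   (setq cr (1+ cr))))
                ((= c ?\n) (setq lf (1+ lf)))))
        (forward-char 1))
      (cond (binary 'binary)
            ((and (> crlf 0) (= cr 0) (= lf 0)) 'dos)
            ((and (> cr 0) (= crlf 0) (= lf 0)) 'mac)
            ((and (= crlf 0) (= cr 0)) 'unix)
            (t 'mixed)))))
```

Reporting `mixed' separately lets the caller leave such files
untranslated rather than guess.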

AndrewI

From owner-ntemacs-users@trout  Tue Apr  8 09:48:12 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Tue" " 8" "April" "1997" "12:06:41" "-0400" "John R. Dennis" "jdennis@ultranet.com" nil "80" "Re: Attachments via ange-ftp" "^From:" nil nil "4" nil nil nil nil]
	nil)
Received: from joker.cs.washington.edu (joker.cs.washington.edu [128.95.1.42]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id JAA18123 for <voelker@june.cs.washington.edu>; Tue, 8 Apr 1997 09:48:11 -0700
Received: from trout.cs.washington.edu (trout.cs.washington.edu [128.95.1.178]) by joker.cs.washington.edu (8.6.12/7.2ws+) with ESMTP id JAA30317 for <voelker@joker.cs.washington.edu>; Tue, 8 Apr 1997 09:48:10 -0700
Received: from june.cs.washington.edu (june.cs.washington.edu [128.95.1.4]) by trout.cs.washington.edu (8.8.5+CS/7.2ws+) with ESMTP id JAA02649 for <ntemacs-users@trout.cs.washington.edu>; Tue, 8 Apr 1997 09:06:54 -0700 (PDT)
Received: from cinna.ultra.net (cinna.ultra.net [199.232.56.8]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id JAA14789 for <ntemacs-users@cs.washington.edu>; Tue, 8 Apr 1997 09:06:52 -0700
Received: from DAKOTA (d9.dial-3.wor.ma.ultra.net [146.115.69.73]) by cinna.ultra.net (8.8.5/ult1.04) with SMTP id MAA04163; Tue, 8 Apr 1997 12:06:41 -0400 (EDT)
Message-Id: <199704081606.MAA04163@cinna.ultra.net>
In-reply-to: <199704081229.NAA29309@propos.long.harlequin.co.uk> (message from 	Andrew Innes on Tue, 8 Apr 1997 13:29:59 +0100 (BST))
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
From: "John R. Dennis" <jdennis@ultranet.com>
To: andrewi@harlequin.co.uk, John Dennis <jdennis@ultranet.com>
CC: kyle_jones@wonderworks.com, gray@austin.apc.slb.com,         ntemacs-users@cs.washington.edu
Subject: Re: Attachments via ange-ftp
Date: Tue, 8 Apr 1997 12:06:41 -0400 (EDT)

>>>>> "Andrew" == Andrew Innes <andrewi@harlequin.co.uk> writes:

    Andrew> Given that we are talking about including files as MIME
    Andrew> attachments, I think using insert-file-contents-literally
    Andrew> is, in principle, the right thing to do; the "problem" in
    Andrew> this context is that it disables the (imperfect) file type
    Andrew> detection code used on Windows as well as inhibiting the
    Andrew> various handlers and hook functions.

    Andrew> The whole issue of how text files are handled, by the DOS
    Andrew> and Windows ports of Emacs at least, is really overdue for
    Andrew> a major rethink.

I cannot believe how topical this issue is. I just spent all Friday
morning debugging a similar problem in mime.el.

Even though I had set all the variables I knew of that caused CRLF
translation when inserting into a buffer...

      (let ((start (point))
	    (emx-binary-mode t)		;Stop LF to CRLF conversion in OS/2
	    (buffer-file-type t)	;Stop LF to CRLF conversion in DOS/NT
	    (binary-process-input t))	;Stop LF to CRLF conversion in DOS/NT

the conversion was still happening because in fileio.c the
implementation of insert-file-contents overwrites the user supplied
value of buffer-file-type:

    current_buffer->buffer_file_type
      = call1 (Qfind_buffer_file_type, filename);

The elisp code knew it wanted to insert the contents of the file as
binary so it explicitly set buffer-file-type, but the implementation
of insert-file-contents ignored that setting and tried to determine
the translation mode by a regular expression match on the filename. I
fixed the problem by calling insert-file-contents-literally which
undefines find-buffer-file-type so the call in insert-file-contents to
find-buffer-file-type won't succeed. But I don't think the C code in
insert-file-contents should ignore the documented variable
(buffer-file-type) that is supposed to toggle the CRLF translation!

All of this is pretty ugly, prone to failure, and more to the point
undocumented for the most part as far as I can tell. After spending
the better part of a day digging through the binary vs. text issues I
was left with the distinct impression that most of this code is a
"hack" waiting to break. I absolutely agree with Andrew that this is
in need of a major rethink.

To begin the discussion I will make the following observations:

* Determining binary/text based on regular expression matching of filenames
  is fundamentally flawed. There is not enough naming discipline with
  filenames and extensions to make this work reliably. I have been
  burned by this more times than I care to remember.

* The only way to tell if a file is binary is to scan the file and
  look for non-ascii bytes.

* The documentation on the text/binary issues is woefully inadequate
  and the implementation is inconsistent.

* The binary/text translation should be controlled by a user settable
  variable that is ALWAYS respected. After all, the user is ultimately
  more knowledgeable about the contents of a file than the
  implementation.

* There should be a second user settable variable that toggles whether
  the translation variable is automatically set based on the contents of
  the file. In this way you get automatic translation in the 99% of
  the cases you want it AND you can force the translation on/off when you
  have to.

* We'd all be happier without operating systems that make the
  artificial distinction between text and binary files and attempt to
  insert/delete/modify bytes that are not in the actual file to undo
  the damage introduced by this ill-conceived distinction in the first
  place (sorry, this last point was a completely personal soapbox
  comment :-)
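
As a starting point, the observations above might be sketched like
this.  The sketch-* names are hypothetical, the two proposed switches
are folded into one three-valued variable for brevity, and the
non-ASCII test is taken literally, so 8-bit text such as Latin-1 would
be classed as binary.

```elisp
(defvar sketch-file-translation 'auto
  "Hypothetical user setting: `text', `binary' or `auto'.
An explicit `text' or `binary' is always respected; `auto' means
decide by scanning the contents.")

(defun sketch-buffer-has-non-ascii-p ()
  "Return non-nil if the current buffer holds a character outside ASCII."
  (save-excursion
    (goto-char (point-min))
    (re-search-forward "[^\000-\177]" nil t)))

(defun sketch-choose-translation ()
  "Return `text' or `binary' for the current buffer's raw contents."
  (if (memq sketch-file-translation '(text binary))
      ;; The user's explicit choice wins, whatever the contents say.
      sketch-file-translation
    (if (sketch-buffer-has-non-ascii-p) 'binary 'text)))
```

Code that knows better can then bind sketch-file-translation with let
around the read and rely on that binding being honoured.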

John Dennis

From owner-ntemacs-users@trout  Wed Mar 26 09:53:47 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Wed" "26" "March" "1997" "09:07:47" "-0800" "Don Erway" "derway@ndc.com" nil "27" "Re: > toggle binary/text mode of current buffer" "^From:" nil nil "3" nil nil nil nil]
	nil)
Received: from joker.cs.washington.edu (joker.cs.washington.edu [128.95.1.42]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id JAA25357 for <voelker@june.cs.washington.edu>; Wed, 26 Mar 1997 09:53:47 -0800
Received: from trout.cs.washington.edu (trout.cs.washington.edu [128.95.1.178]) by joker.cs.washington.edu (8.6.12/7.2ws+) with ESMTP id JAA23324 for <voelker@joker.cs.washington.edu>; Wed, 26 Mar 1997 09:53:46 -0800
Received: from june.cs.washington.edu (june.cs.washington.edu [128.95.1.4]) by trout.cs.washington.edu (8.8.5+CS/7.2ws+) with ESMTP id JAA12324 for <ntemacs-users@trout.cs.washington.edu>; Wed, 26 Mar 1997 09:07:50 -0800 (PST)
Received: from maya.ndc.com (maya.ndc.com [192.101.92.41]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id JAA22697 for <ntemacs-users@cs.washington.edu>; Wed, 26 Mar 1997 09:07:49 -0800
Received: from heidi.ndc-new.com (heidi [192.101.92.15]) by maya.ndc.com (8.7.5/8.7.3) with SMTP id JAA12674 for <ntemacs-users@cs.washington.edu>; Wed, 26 Mar 1997 09:06:16 -0800 (PST)
Received: from HAL.ndc.com by heidi.ndc-new.com (SMI-8.6/SMI-SVR4) 	id JAA13517; Wed, 26 Mar 1997 09:07:47 -0800
Message-Id: <199703261707.JAA13517@heidi.ndc-new.com>
In-reply-to: <199703261320.AA11627@lambda.unx.sas.com> (message from David 	Biesack on Wed, 26 Mar 1997 08:20:33 -0500)
Mime-Version: 1.0 (generated by tm-edit 7.92)
Content-Type: text/plain; charset=US-ASCII
From: Don Erway <derway@ndc.com>
To: ntemacs-users@cs.washington.edu
Subject: Re: > toggle binary/text mode of current buffer
Date: Wed, 26 Mar 1997 09:07:47 -0800


>>>>> "db" == David Biesack <sasdjb@unx.sas.com> writes:

 db> suggested:

 db> (defvar binary-mode-distance 500
 db>       "Number of characters to search for CR/LF when looking for a binary file.")

 db> (defun check-buffer-file-type (filename)
 db>   (if (and (looking-at ".*\r\n") ;; It has CR-LF sequence
 db>            ;; and has no LF w/o CR within sight
 db>            (not (re-search-forward "[^\r]\n" binary-mode-distance t)))
 db>       nil ;; so use text mode
 db>     t))   ;; else use binary mode

This works fine.  However, auto detection still does not work under unix.  I
am running 19.32 on NT, and 19.33 on Solaris.  In the 19.33 solaris version,
there is no file-name-buffer-file-type-alist defined.  So without this
alist, and some code to process it, there is no surprise that it doesn't work.

Is the idea to use winnt.el even when running on unix?

If I load winnt.el into the unix version, it complains that the
set-message-beep function doesn't exist.  But I can always work around that if
this is even the right approach.

Don

From owner-ntemacs-users@trout  Wed Mar 26 06:03:34 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Wed" "26" "March" "1997" "08:20:33" "-0500" "David Biesack" "sasdjb@unx.sas.com" nil "41" "> toggle binary/text mode of current buffer" "^From:" nil nil "3" nil nil nil nil]
	nil)
Received: from joker.cs.washington.edu (joker.cs.washington.edu [128.95.1.42]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id GAA14317 for <voelker@june.cs.washington.edu>; Wed, 26 Mar 1997 06:03:33 -0800
Received: from trout.cs.washington.edu (trout.cs.washington.edu [128.95.1.178]) by joker.cs.washington.edu (8.6.12/7.2ws+) with ESMTP id GAA17020 for <voelker@joker.cs.washington.edu>; Wed, 26 Mar 1997 06:03:32 -0800
Received: from june.cs.washington.edu (june.cs.washington.edu [128.95.1.4]) by trout.cs.washington.edu (8.8.5+CS/7.2ws+) with ESMTP id FAA07763 for <ntemacs-users@trout.cs.washington.edu>; Wed, 26 Mar 1997 05:20:43 -0800 (PST)
Received: from lamb.sas.com (lamb.sas.com [192.35.83.8]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id FAA13537 for <ntemacs-users@cs.washington.edu>; Wed, 26 Mar 1997 05:20:41 -0800
Received: from mozart by lamb.sas.com (5.65c/SAS/Gateway/01-23-95) 	id AA11423; Wed, 26 Mar 1997 08:20:39 -0500
Received: from lambda.unx.sas.com by mozart (5.65c/SAS/Domains/5-6-90) 	id AA21315; Wed, 26 Mar 1997 08:20:33 -0500
Received: by lambda.unx.sas.com (5.65c/SAS/Generic 9.01/3-26-93) 	id AA11627; Wed, 26 Mar 1997 08:20:33 -0500
Message-Id: <199703261320.AA11627@lambda.unx.sas.com>
In-Reply-To: <199703252229.OAA06160@sampras.isi.com> (message from Kin Cho on Tue, 25 Mar 1997 14:29:49 -0800)
From: David Biesack <sasdjb@unx.sas.com>
To: ntemacs-users@cs.washington.edu
Subject: > toggle binary/text mode of current buffer
Date: Wed, 26 Mar 1997 08:20:33 -0500


> ;;; This examines the actual contents of the loaded file to see if 
> ;;;  it should use text mode or binary:
> (defun check-buffer-file-type (filename)
>   (if (and (looking-at ".*\r\n")		    ;; It has CR-LF sequence
>            (not (search-forward "[^\r]\n]" nil t))) ;; and has no LF w/o CR
>       nil					    ;; so use text mode
>     t))						    ;; else use binary mode

Someone else pointed out that the search-forward should be a
re-search-forward. However, also note that passing nil to the search
will cause inspection of the entire buffer, which is not always
negligible. It might be better to make this a variable as is done in
dos-mode.el

;;;    LCD Archive Entry:
;;;    dos-mode|Andy Norman|ange@hplb.hpl.hp.com
;;;    |MSDOS minor mode for GNU Emacs
;;;    |$Date: 93/06/01 19:54:29 $|$Revision: 1.12 $|

which passes (min (point-max) dos-mode-distance) to re-search-forward
where 

    (defvar dos-mode-distance 200
      "Number of characters to search for RETURN when looking for a DOS file.")

to determine if a file is in DOS CR/LF mode.  You can change
dos-mode-distance to 1000 or some other reasonable value in your .emacs

suggested:


(defvar binary-mode-distance 500
      "Number of characters to search for CR/LF when looking for a binary file.")

(defun check-buffer-file-type (filename)
  (if (and (looking-at ".*\r\n") ;; It has CR-LF sequence
           ;; and has no LF w/o CR within sight
           (not (re-search-forward "[^\r]\n" binary-mode-distance t)))
      nil ;; so use text mode
    t))   ;; else use binary mode

From owner-ntemacs-users@trout  Tue Mar 25 18:23:53 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Tue" "25" "March" "1997" "20:20:21" "-0500" "Geoff Odhner" "odhner@recom.com" nil "69" "Re: toggle binary/text mode of current buffer" "^From:" nil nil "3" nil nil nil nil]
	nil)
Received: from joker.cs.washington.edu (joker.cs.washington.edu [128.95.1.42]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id SAA20274 for <voelker@june.cs.washington.edu>; Tue, 25 Mar 1997 18:23:53 -0800
Received: from trout.cs.washington.edu (trout.cs.washington.edu [128.95.1.178]) by joker.cs.washington.edu (8.6.12/7.2ws+) with ESMTP id SAA17699 for <voelker@joker.cs.washington.edu>; Tue, 25 Mar 1997 18:23:51 -0800
Received: from june.cs.washington.edu (june.cs.washington.edu [128.95.1.4]) by trout.cs.washington.edu (8.8.5+CS/7.2ws+) with ESMTP id RAA27386 for <ntemacs-users@trout.cs.washington.edu>; Tue, 25 Mar 1997 17:19:46 -0800 (PST)
Received: from recom.recom.com (freeholders.co.camden.nj.us [204.213.88.1]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id RAA15955 for <ntemacs-users@cs.washington.edu>; Tue, 25 Mar 1997 17:19:46 -0800
Received: from odhner (dial31.mt-holly.emanon.net [204.213.88.131]) by recom.recom.com (8.6.12/8.6.9) with SMTP id UAA02882; Tue, 25 Mar 1997 20:25:16 -0500
Message-ID: <333879D5.2FEC@recom.com>
X-Mailer: Mozilla 2.01Gold (Win95; I)
MIME-Version: 1.0
References: <199703241934.LAA10013@heidi.ndc-new.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
From: Geoff Odhner <odhner@recom.com>
To: Don Erway <derway@ndc.com>
CC: kin@isi.com, ntemacs-users@cs.washington.edu
Subject: Re: toggle binary/text mode of current buffer
Date: Tue, 25 Mar 1997 20:20:21 -0500

Don Erway wrote:
> > The one funny is that files to which I have only read-only access come up as
> > writeable.
> 
> I spoke too soon.  It appears that the visiting read-only files works on NT,
> but under unix, a read-only file is not translated correctly.
> check-buffer-file-type works, but translation does not occur.
> 
> On NT, the files do get translated, and do not come up as writeable.

Try my latest version.  It should address this problem.  It works on
win95, but I haven't yet tested it on unix, though I'm expecting no problem.

BTW, one caveat about using this on unix:  Though this code works to
toggle the buffer type, the mode line indicator doesn't work on unix,
at least not on SunOS.  If you add the mode line %t indicator, it always
indicates T on the mode line.  I expect that requires a fix to the
C code and a recompile.  I guess they figured no one would ever use it
on unix. :-)

Happy editing...

-Geoff


And here's the new version, as promised:

;;;  If you have loaded a file as binary that actually has the ^M's in it,
;;;  then switching to text mode will remove them in the buffer.  Of course
;;;  now that it's in text mode, it will save with the ^M's inserted.
;;;  Switching to binary mode does NOT have a reverse effect.  If you want
;;;  to disable that change on entering text mode, then use a negative
;;;  prefix argument, as described below.

;;;  A prefix argument will force the mode change in a particular
;;;  direction.  A positive prefix argument forces it to binary.  A zero
;;;  prefix argument forces text mode allowing the removal of ^M's (only
;;;  preceding ^J's).  A negative prefix argument forces text mode
;;;  disallowing the removal of ^M's.  

;;;  When the mode is changed the state of modification of the buffer is
;;;  preserved, even if the ^M's are removed.

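(defun toggle-buffer-file-type (arg)
  "Toggle `buffer-file-type' between text (nil) and binary (t).
A positive prefix ARG forces binary.  A zero ARG forces text and
removes each ^M that precedes a ^J.  A negative ARG forces text
but leaves the ^M's alone."
  (interactive "P")
  (let ((old buffer-file-type)
	(mod (buffer-modified-p))
	(buffer-read-only nil)
	;; "P" passes the raw prefix, which may be a list such as (4)
	;; or the symbol `-', so convert it before comparing numbers.
	(n (and arg (prefix-numeric-value arg))))
    (setq buffer-file-type
	  (if n (>= n 1)
	    (not buffer-file-type)))
    ;; Strip the ^M's only when going from binary to text and the
    ;; prefix argument, if any, is not negative.
    (if (and old
	     (not buffer-file-type)
	     (or (not n)
		 (>= n 0)))
	(save-excursion
	  (goto-char (point-min))
	  (while (search-forward "\r\n" nil t)
	    (replace-match "\n" nil t))
	  (set-buffer-modified-p mod))))
  (force-mode-line-update))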

;; And my preferred key binding:
(global-set-key [?\A-t]	'toggle-buffer-file-type)

From kin@isi.com  Tue Mar 25 14:28:47 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Tue" "25" "March" "1997" "14:29:49" "-0800" "Kin Cho" "kin@isi.com" nil "30" "Re: toggle binary/text mode of current buffer" "^From:" nil nil "3" nil nil nil nil]
	nil)
Received: from sampras.isi.com (sampras.isi.com [192.103.53.29]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id OAA03847; Tue, 25 Mar 1997 14:28:47 -0800
Received: (from kin@localhost) by sampras.isi.com (8.6.10/8.6.10) id OAA06160; Tue, 25 Mar 1997 14:29:49 -0800
Message-Id: <199703252229.OAA06160@sampras.isi.com>
In-reply-to: <33355BAF.2353@recom.com> (message from Geoff Odhner on Sun, 23 	Mar 1997 11: 34:55 -0500)
From: Kin Cho <kin@isi.com>
To: odhner@recom.com, voelker@cs.washington.edu
CC: derway@ndc.com, ntemacs-users@cs.washington.edu
Subject: Re: toggle binary/text mode of current buffer
Date: Tue, 25 Mar 1997 14:29:49 -0800

Thanks, this is good!

A real solution as compared to the workarounds that came before.
If only this works in UNIX as well!

Please put it in the FAQ, or even better, integrate it with main line code.

-kin

p.s., this is my mod:
	      (list (cons "" 'check-buffer-file-type))))

;;; Associate the universal match regexp "" with the
;;; function check-buffer-file-type, so any file will be
;;; examined to automatically select the appropriate mode.
;;; Add this check only after known filename patterns are
;;; treated the way they should be.  (That's why we append
;;; to the list, instead of replacing it).  You might want
;;; to use more restrictive pattern(s) for doing this
;;; check.
(setq file-name-buffer-file-type-alist
      (append file-name-buffer-file-type-alist
	      (list (cons "" 'check-buffer-file-type))))
;;; This examines the actual contents of the loaded file to see if 
;;;  it should use text mode or binary:
(defun check-buffer-file-type (filename)
  (if (and (looking-at ".*\r\n")		    ;; It has CR-LF sequence
	   (not (re-search-forward "[^\r]\n" nil t))) ;; and has no LF w/o CR
      nil					    ;; so use text mode
    t))						    ;; else use binary mode

From owner-ntemacs-users@trout  Sun Mar 23 09:13:15 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Sun" "23" "March" "1997" "11:34:55" "-0500" "Geoff Odhner" "odhner@recom.com" nil "33" "Re: toggle binary/text mode of current buffer" "^From:" nil nil "3" nil nil nil nil]
	nil)
Received: from joker.cs.washington.edu (joker.cs.washington.edu [128.95.1.42]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id JAA20826 for <voelker@june.cs.washington.edu>; Sun, 23 Mar 1997 09:13:15 -0800
Received: from trout.cs.washington.edu (trout.cs.washington.edu [128.95.1.178]) by joker.cs.washington.edu (8.6.12/7.2ws+) with ESMTP id JAA26846 for <voelker@joker.cs.washington.edu>; Sun, 23 Mar 1997 09:13:13 -0800
Received: from june.cs.washington.edu (june.cs.washington.edu [128.95.1.4]) by trout.cs.washington.edu (8.8.5+CS/7.2ws+) with ESMTP id IAA08257 for <ntemacs-users@trout.cs.washington.edu>; Sun, 23 Mar 1997 08:34:38 -0800 (PST)
Received: from recom.recom.com (freeholders.co.camden.nj.us [204.213.88.1]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id IAA19986 for <ntemacs-users@cs.washington.edu>; Sun, 23 Mar 1997 08:34:37 -0800
Received: from odhner (dial15.mt-holly.emanon.net [204.213.88.115]) by recom.recom.com (8.6.12/8.6.9) with SMTP id LAA11941; Sun, 23 Mar 1997 11:39:44 -0500
Message-ID: <33355BAF.2353@recom.com>
X-Mailer: Mozilla 2.01Gold (Win95; I)
MIME-Version: 1.0
References: <199703222055.MAA14453@heidi.ndc-new.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
From: Geoff Odhner <odhner@recom.com>
To: Don Erway <derway@ndc.com>
CC: kin@isi.com, ntemacs-users@cs.washington.edu
Subject: Re: toggle binary/text mode of current buffer
Date: Sun, 23 Mar 1997 11:34:55 -0500

Don Erway wrote:
> Finally, it needs an auto option, to make it possible to
> automatically go into text mode if the content is strictly
> text and crlfs are already present in a file.  This should
> not be based on file name extensions or file systems, but
> only on file content.

I have written a few more bits of code that help automate
binary/text mode selection.  This approach is possible due
to Geoff Voelker's foresight in designing the
infrastructure to be configurable in this way.  Thanks,
Geoff.



;;; Associate the universal match regexp "" with the
;;; function check-buffer-file-type, so any file will be
;;; examined to automatically select the appropriate mode.
;;; Add this check only after known filename patterns are
;;; treated the way they should be.  (That's why we append
;;; to the list, instead of replacing it).  You might want
;;; to use more restrictive pattern(s) for doing this
;;; check.
(setq file-name-buffer-file-type-alist
      (append file-name-buffer-file-type-alist
	      (cons "" 'check-buffer-file-type)))
;;; This examines the actual contents of the loaded file to see if 
;;;  it should use text mode or binary:
(defun check-buffer-file-type (filename)
  (if (and (looking-at ".*\r\n")		    ;; It has CR-LF sequence
	   (not (re-search-forward "[^\r]\n" nil t))) ;; and has no LF w/o CR
      nil					    ;; so use text mode
    t))						    ;; else use binary mode

From owner-ntemacs-users@trout  Sat Mar 22 13:28:17 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Sat" "22" "March" "1997" "12:55:31" "-0800" "Don Erway" "derway@ndc.com" nil "23" "Re: toggle binary/text mode of current buffer" "^From:" nil nil "3" nil nil nil nil]
	nil)
Received: from joker.cs.washington.edu (joker.cs.washington.edu [128.95.1.42]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id NAA14305 for <voelker@june.cs.washington.edu>; Sat, 22 Mar 1997 13:28:17 -0800
Received: from trout.cs.washington.edu (trout.cs.washington.edu [128.95.1.178]) by joker.cs.washington.edu (8.6.12/7.2ws+) with ESMTP id NAA23216 for <voelker@joker.cs.washington.edu>; Sat, 22 Mar 1997 13:28:15 -0800
Received: from june.cs.washington.edu (june.cs.washington.edu [128.95.1.4]) by trout.cs.washington.edu (8.8.5+CS/7.2ws+) with ESMTP id MAA24518 for <ntemacs-users@trout.cs.washington.edu>; Sat, 22 Mar 1997 12:56:39 -0800 (PST)
Received: from maya.ndc.com (maya.ndc.com [192.101.92.41]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id MAA13538 for <ntemacs-users@cs.washington.edu>; Sat, 22 Mar 1997 12:56:38 -0800
Received: from heidi.ndc-new.com (heidi [192.101.92.15]) by maya.ndc.com (8.7.5/8.7.3) with SMTP id MAA05675; Sat, 22 Mar 1997 12:54:00 -0800 (PST)
Received: from HAL.ndc.com by heidi.ndc-new.com (SMI-8.6/SMI-SVR4) 	id MAA14453; Sat, 22 Mar 1997 12:55:31 -0800
Message-Id: <199703222055.MAA14453@heidi.ndc-new.com>
In-reply-to: <33343F98.2A7@recom.com> (message from Geoff Odhner on Sat, 22 	Mar 1997 15:22:48 -0500)
Mime-Version: 1.0 (generated by tm-edit 7.92)
Content-Type: text/plain; charset=US-ASCII
From: Don Erway <derway@ndc.com>
To: odhner@recom.com
CC: kin@isi.com, ntemacs-users@cs.washington.edu
Subject: Re: toggle binary/text mode of current buffer
Date: Sat, 22 Mar 1997 12:55:31 -0800


This is good.

I can now happily make everything binary by default, and use your toggle
function for the few cases it is really needed.  This is better than using
crypt's DOS mode, because it is faster.

Now, if only it would work in unix emacs, we could completely share files
either way.

Finally, it needs an auto option, to make it possible to automatically go into
text mode if the content is strictly text and crlfs are already present in a
file.  This should not be based on file name extensions or file systems, but
only on file content.

Thanks for the useful hack.

Don

	Don Erway			derway@ndc.com
	NDC Systems			818-939-3847
	5314 N. Irwindale Ave		Fax:939-3870
	Irwindale, CA, 91706

From owner-ntemacs-users@trout  Sat Mar 22 13:01:33 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Sat" "22" "March" "1997" "15:22:48" "-0500" "Geoff Odhner" "odhner@recom.com" nil "52" "Re: toggle binary/text mode of current buffer" "^From:" nil nil "3" nil nil nil nil]
	nil)
Received: from joker.cs.washington.edu (joker.cs.washington.edu [128.95.1.42]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id NAA13680 for <voelker@june.cs.washington.edu>; Sat, 22 Mar 1997 13:01:33 -0800
Received: from trout.cs.washington.edu (trout.cs.washington.edu [128.95.1.178]) by joker.cs.washington.edu (8.6.12/7.2ws+) with ESMTP id NAA25772 for <voelker@joker.cs.washington.edu>; Sat, 22 Mar 1997 13:01:31 -0800
Received: from june.cs.washington.edu (june.cs.washington.edu [128.95.1.4]) by trout.cs.washington.edu (8.8.5+CS/7.2ws+) with ESMTP id MAA23716 for <ntemacs-users@trout.cs.washington.edu>; Sat, 22 Mar 1997 12:22:12 -0800 (PST)
Received: from recom.recom.com (recom.recom.com [204.213.88.1]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id MAA12359 for <ntemacs-users@cs.washington.edu>; Sat, 22 Mar 1997 12:22:12 -0800
Received: from odhner (dial5.mt-holly.emanon.net [204.213.88.105]) by recom.recom.com (8.6.12/8.6.9) with SMTP id PAA01331; Sat, 22 Mar 1997 15:27:36 -0500
Message-ID: <33343F98.2A7@recom.com>
X-Mailer: Mozilla 2.01Gold (Win95; I)
MIME-Version: 1.0
References: <199703212025.MAA03727@sampras.isi.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
From: Geoff Odhner <odhner@recom.com>
To: Kin Cho <kin@isi.com>
CC: ntemacs-users@cs.washington.edu
Subject: Re: toggle binary/text mode of current buffer
Date: Sat, 22 Mar 1997 15:22:48 -0500

Kin Cho wrote:
> 
> Is there a function that does this?
> Trying to work around yet another PC<->UNIX integration problem.
> 
> Thanks.
> 
> -kin

I have yet another version of my toggle-buffer-file-type function.
This one updates the status bar, which is necessary if you bind it 
to a key.  

-Geoff

;;;  If you have loaded a file as binary that actually has the ^M's in it,
;;;  then switching to text mode will remove them in the buffer.  Of course
;;;  now that it's in text mode, it will save with the ^M's inserted.
;;;  Switching to binary mode does NOT have a reverse effect.  If you want
;;;  to disable that change on entering text mode, then use a negative
;;;  prefix argument, as described below.

;;;  A prefix argument will force the mode change in a particular
;;;  direction.  A positive prefix argument forces it to binary.  A zero
;;;  prefix argument forces text mode allowing the removal of ^M's (only
;;;  preceding ^J's).  A negative prefix argument forces text mode
;;;  disallowing the removal of ^M's.  

;;;  When the mode is changed the state of modification of the buffer is
;;;  preserved, even if the ^M's are removed.

(defun toggle-buffer-file-type (arg)
  "Toggle `buffer-file-type' between binary and text; ARG forces a direction."
  (interactive "P")
  (let ((old buffer-file-type)
	(mod (buffer-modified-p)))
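    ;; "P" passes the raw prefix argument, which may be a list such as (4)
    ;; or the symbol `-', so convert it to a number before comparing.
    (setq buffer-file-type
	  (if arg (>= (prefix-numeric-value arg) 1)
	    (not buffer-file-type)))
    (if (and old
	     (not buffer-file-type)
	     (or (not arg)
		 (>= (prefix-numeric-value arg) 0)))  ; negative: keep ^M's
	(save-excursion
	  (goto-char (point-min))
	  (while (search-forward "\r\n" nil t)
	    (replace-match "\n" nil t))
	  (set-buffer-modified-p mod))))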
  (force-mode-line-update))

;;;  Here's my personal selection for a key binding for this function:
(global-set-key [?\A-t]		'toggle-buffer-file-type)

From andrewi@harlequin.co.uk  Tue Apr 15 06:15:20 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Tue" "15" "April" "1997" "14:14:36" "+0100" "Andrew Innes" "andrewi@harlequin.co.uk" nil "38" "Questions about MULE" "^From:" nil nil "4" nil nil nil nil]
	nil)
Received: from holly.cam.harlequin.co.uk (holly.cam.harlequin.co.uk [193.128.4.58]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id GAA22624 for <voelker@cs.washington.edu>; Tue, 15 Apr 1997 06:15:19 -0700
Received: from propos.long.harlequin.co.uk (propos.long.harlequin.co.uk [193.128.93.50]) by holly.cam.harlequin.co.uk (8.8.4/8.7.3) with ESMTP id OAA28494; Tue, 15 Apr 1997 14:15:10 +0100 (BST)
Received: from elan.long.harlequin.co.uk (elan.long.harlequin.co.uk [193.128.93.78]) by propos.long.harlequin.co.uk (8.8.4/8.6.12) with SMTP id OAA25514; Tue, 15 Apr 1997 14:14:36 +0100 (BST)
Message-Id: <199704151314.OAA25514@propos.long.harlequin.co.uk>
In-reply-to: <199704150439.AAA16856@psilocin.gnu.ai.mit.edu> (message from 	Richard Stallman on Tue, 15 Apr 1997 00:39:58 -0400)
From: Andrew Innes <andrewi@harlequin.co.uk>
To: rms@gnu.ai.mit.edu
cc: voelker@cs.washington.edu
Subject: Questions about MULE
Date: Tue, 15 Apr 1997 14:14:36 +0100 (BST)

On Tue, 15 Apr 1997 00:39:58 -0400, Richard Stallman <rms@gnu.ai.mit.edu> said:
>I see nothing problematical in these changes.
>The ones that have to do with cr conversion will have to be
>redone totally differently for the next Emacs release, though,
>because MULE affects this very much.

I am only dimly aware of what the MULE work for 19.35 entails, so if you
have time I would like to ask a few questions about it.

Since the issue of DOS vs Unix line ending conventions for text files is
currently handled poorly in 19.34 (in the DOS and Windows ports), there
has been a fair bit of discussion recently on the ntemacs-users mailing
list about possible mechanisms for improving this in the future.  This
applies primarily in the context of working with a mixture of text files
using both line ending conventions.

The main difficulties at present are that Emacs doesn't, in general,
correctly identify text vs. binary files, and for text files doesn't
remember which line ending convention was used.  The general thrust of
suggestions for improvement is to implement some kind of
mostly-automatic mechanism to detect which files are text, and remember
the line ending convention in use (DOS, Unix or possibly Mac).  Obvious
heuristics based on scanning the first part of each file when loaded for
"funny" characters could be used.  More sophisticated extensions which
detect mistakes in the assumed format follow on from that.
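
A minimal sketch of the kind of heuristic described above, assuming the
file's raw bytes are already in the current buffer with no conversion
applied.  The function name `my-guess-eol-convention', the 4096-character
sample size and the use of a NUL byte as the only "funny" character are
illustrative assumptions, not part of Emacs:

```elisp
;; Hypothetical helper: classify the current buffer by scanning its first
;; LIMIT characters (default 4096).  Returns `binary', `dos', `mac' or `unix'.
(defun my-guess-eol-convention (&optional limit)
  (save-excursion
    (goto-char (point-min))
    (let ((end (min (point-max) (+ (point-min) (or limit 4096))))
          (crlf 0) (lf 0) (cr 0) (funny nil))
      (while (and (not funny) (< (point) end))
        (let ((ch (char-after)))
          (cond
           ;; Assumption: a NUL byte marks the file as binary.  A real
           ;; heuristic would probably look at more than this.
           ((= ch 0) (setq funny t))
           ((and (= ch ?\r) (eq (char-after (1+ (point))) ?\n))
            (setq crlf (1+ crlf))
            (forward-char 1))           ; skip the LF of the pair
           ((= ch ?\r) (setq cr (1+ cr)))
           ((= ch ?\n) (setq lf (1+ lf)))))
        (forward-char 1))
      (cond (funny 'binary)
            ;; Every line end in the sample is CR-LF.
            ((and (> crlf 0) (= lf 0) (= cr 0)) 'dos)
            ;; Every line end in the sample is a lone CR.
            ((and (> cr 0) (= lf 0) (= crlf 0)) 'mac)
            ;; Anything else, including mixed endings, is left alone.
            (t 'unix)))))
```

Mixed line endings fall through to `unix' on purpose, so that a file is
only converted when the sample is consistent.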

I know this issue overlaps somewhat with the more general language and
character encoding issues that are handled by MULE, but I'm not sure how
exactly.  Is there any documentation about MULE, as being implemented in
19.35, that I could read?

It seems to me that the line ending convention employed by text files is
often orthogonal to the character encoding convention (at least for
single-and multi-byte encoding, and for Unicode as well after allowing
for wider characters), and so a mechanism for automatically detecting
and propagating the convention in use could still be of value.

AndrewI

From rms@gnu.ai.mit.edu  Tue Jul  1 17:55:36 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Tue" " 1" "July" "1997" "20:55:47" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "20" "New way of handling CRLF" "^From:" nil nil "7" nil nil nil nil]
	nil)
Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id RAA13267 for <voelker@cs.washington.edu>; Tue, 1 Jul 1997 17:55:35 -0700
Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id UAA26226; Tue, 1 Jul 1997 20:55:47 -0400
Message-Id: <199707020055.UAA26226@psilocin.gnu.ai.mit.edu>
From: Richard Stallman <rms@gnu.ai.mit.edu>
To: eliz@is.elta.co.il, voelker@cs.washington.edu
Subject: New way of handling CRLF
Date: Tue, 1 Jul 1997 20:55:47 -0400

The MULE features include a new way of handling CRLF conversion.
It detects the need to convert CRLF using the same mechanism that
detects the need to convert international character sets.

One consequence of this is that it ought to succeed in editing files
that use LF or files that use CRLF.  Regardless of what type of system
you are on and what type of file system you are using, the file will
appear in the normal Emacs way, with newlines between the lines.

Does this mean that some of the features for text vs binary files
and untranslated file systems are now unnecessary?  Can I simplify
the "Text Files and Binary Files" in the manual?

Please answer me as soon as you can; I am trying to finish the manual.

Note: currently there is a bug: when you visit a file on Unix which
uses CRLF between lines, it recognizes that, but
buffer-file-coding-system is set to nil, which is not right.  I will
forward you the fix for this as soon as I get it.


From eliz@is.elta.co.il  Wed Jul  2 01:03:33 1997
X-VM-v5-Data: ([nil nil nil nil t nil nil nil nil]
	[nil "Wed" " 2" "July" "1997" "11:03:09" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" "<Pine.SUN.3.91.970702110228.27453E-100000@is>" "34" "Re: New way of handling CRLF" "^From:" nil nil "7" nil nil nil nil]
	nil)
Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id BAA28595 for <voelker@cs.washington.edu>; Wed, 2 Jul 1997 01:03:30 -0700
Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) 	id LAA27499; Wed, 2 Jul 1997 11:03:10 +0300
X-Sender: eliz@is
In-Reply-To: <199707020055.UAA26226@psilocin.gnu.ai.mit.edu>
Message-ID: <Pine.SUN.3.91.970702110228.27453E-100000@is>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
From: Eli Zaretskii <eliz@is.elta.co.il>
To: Richard Stallman <rms@gnu.ai.mit.edu>
cc: voelker@cs.washington.edu
Subject: Re: New way of handling CRLF
Date: Wed, 2 Jul 1997 11:03:09 +0300 (IDT)


On Tue, 1 Jul 1997, Richard Stallman wrote:

The following is purely theoretical, based on what you said in your
message.  I haven't yet had time to download and install the pretest,
nor do I know how MULE detects and converts the file format.

> Does this mean that some of the features for text vs binary files
> and untranslated file systems are now unnecessary?  Can I simplify
> the "Text Files and Binary Files" in the manual?

I would guess that the manual needs to be changed, but not necessarily
simplified.  The text vs binary thing has two aspects: reading them
into Emacs and writing them back to the filesystem.  No matter how
smart the CRLF detection mechanism is, there will be cases when users
will want a buffer to be written in a specific format of their
preference, which might be different from the format of the original
file as read by Emacs.

I'm also not sure that the CRLF detection can be made fully automatic.
Imagine a binary file (like an executable program) that includes a
CRLF pair somewhere; would Emacs 20 strip the CR from it when it reads
that file and treat it as text?

So I think Emacs 20 will need to keep the special varieties of
`find-file' that specify text or binary explicitly (btw, it seems as
if they aren't mentioned anywhere in the 19.34 manual).  There should
also be a way to tell Emacs to write a buffer (or a region) with LFs
translated to CRLFs.  In particular, the (un)?translated filesystem
feature should be kept IMHO.

If the above reasoning is true, there should be minor changes to the
manual (to explain the automatic CRLF detection feature), but the bulk
of the text should be kept.

From eliz@is.elta.co.il  Thu Jul  3 08:39:40 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Thu" " 3" "July" "1997" "18:36:11" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" nil "14" "Re: New way of handling CRLF" "^From:" nil nil "7" nil nil nil nil]
	nil)
Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id IAA18394 for <voelker@cs.washington.edu>; Thu, 3 Jul 1997 08:39:39 -0700
Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) 	id SAA01572; Thu, 3 Jul 1997 18:36:11 +0300
X-Sender: eliz@is
In-Reply-To: <199707030040.UAA03584@psilocin.gnu.ai.mit.edu>
Message-ID: <Pine.SUN.3.91.970703183248.1458I-100000@is>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
From: Eli Zaretskii <eliz@is.elta.co.il>
To: Richard Stallman <rms@gnu.ai.mit.edu>
cc: voelker@cs.washington.edu
Subject: Re: New way of handling CRLF
Date: Thu, 3 Jul 1997 18:36:11 +0300 (IDT)


On Wed, 2 Jul 1997, Richard Stallman wrote:

> We have two mechanisms for deciding, based on the file name, whether a
> file should have LF rather than CRLF.  One looks for "binary files" and
> one looks for untranslated file systems.
> 
> Could these be unified, I wonder?

On "translated" file systems, Emacs should decide whether the file is 
text (and then convert CRLF -> LF) or binary.

On "untranslated" file systems, all files should be read and written 
verbatim.

From rms@gnu.ai.mit.edu  Wed Jul  2 17:39:07 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Wed" " 2" "July" "1997" "20:39:37" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "11" "Re: New way of handling CRLF" "^From:" nil nil "7" nil nil nil nil]
	nil)
Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id RAA17875 for <voelker@cs.washington.edu>; Wed, 2 Jul 1997 17:39:06 -0700
Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id UAA03573; Wed, 2 Jul 1997 20:39:37 -0400
Message-Id: <199707030039.UAA03573@psilocin.gnu.ai.mit.edu>
In-reply-to: <199707021953.MAA19844@joker.cs.washington.edu> 	(voelker@cs.washington.edu)
References: <199707020055.UAA26226@psilocin.gnu.ai.mit.edu> 	<Pine.SUN.3.91.970702110228.27453E-100000@is> <199707021953.MAA19844@joker.cs.washington.edu>
From: Richard Stallman <rms@gnu.ai.mit.edu>
To: voelker@cs.washington.edu
Subject: Re: New way of handling CRLF
Date: Wed, 2 Jul 1997 20:39:37 -0400

    I agree with Eli that users will still want a mechanism by which files
    are written in a format automatically determined by Emacs.

I agree.

Still, I would really really appreciate it if you would tell
me how things DO work now!  Does Emacs automatically figure
out whether a file has CRLF or LF?

(There is a bug in the pretest that fails to save a file with CRLF if
it was recognized with CRLF.  That has been fixed.)

From rms@gnu.ai.mit.edu  Wed Jul  2 17:40:15 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Wed" " 2" "July" "1997" "20:40:45" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "6" "Re: New way of handling CRLF" "^From:" nil nil "7" nil nil nil nil]
	nil)
Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id RAA17925 for <voelker@cs.washington.edu>; Wed, 2 Jul 1997 17:40:14 -0700
Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id UAA03584; Wed, 2 Jul 1997 20:40:45 -0400
Message-Id: <199707030040.UAA03584@psilocin.gnu.ai.mit.edu>
In-reply-to: <199707021953.MAA19844@joker.cs.washington.edu> 	(voelker@cs.washington.edu)
References: <199707020055.UAA26226@psilocin.gnu.ai.mit.edu> 	<Pine.SUN.3.91.970702110228.27453E-100000@is> <199707021953.MAA19844@joker.cs.washington.edu>
From: Richard Stallman <rms@gnu.ai.mit.edu>
To: voelker@cs.washington.edu
CC: eliz@is.elta.co.il
Subject: Re: New way of handling CRLF
Date: Wed, 2 Jul 1997 20:40:45 -0400

We have two mechanisms for deciding, based on the file name, whether a
file should have LF rather than CRLF.  One looks for "binary files" and
one looks for untranslated file systems.

Could these be unified, I wonder?  And could they both be done using
file-coding-system-alist now?

From rms@gnu.ai.mit.edu  Thu Jul  3 12:18:41 1997
X-VM-v5-Data: ([nil nil nil nil t nil nil nil nil]
	[nil "Thu" " 3" "July" "1997" "15:19:06" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" "<199707031919.PAA10787@psilocin.gnu.ai.mit.edu>" "12" "Re: New way of handling CRLF" "^From:" nil nil "7" nil nil nil nil]
	nil)
Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id MAA07559 for <voelker@cs.washington.edu>; Thu, 3 Jul 1997 12:18:39 -0700
Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id PAA10787; Thu, 3 Jul 1997 15:19:06 -0400
Message-Id: <199707031919.PAA10787@psilocin.gnu.ai.mit.edu>
In-reply-to: <Pine.SUN.3.91.970703183248.1458I-100000@is> (message from Eli 	Zaretskii on Thu, 3 Jul 1997 18:36:11 +0300 (IDT))
References:  <Pine.SUN.3.91.970703183248.1458I-100000@is>
From: Richard Stallman <rms@gnu.ai.mit.edu>
To: eliz@is.elta.co.il
CC: voelker@cs.washington.edu
Subject: Re: New way of handling CRLF
Date: Thu, 3 Jul 1997 15:19:06 -0400

    On "translated" file systems, Emacs should decide whether the file is 
    text (and then convert CRLF -> LF) or binary.

    On "untranslated" file systems, all files should be read and written 
    verbatim.

That is what it does now--doesn't it?  So what are you trying to say?
Perhaps you misunderstood my question and answered a completely
different one.

Right now we have two separate mechanisms to do two similar jobs.
Can we replace them with one mechanism that can do both jobs?

From eliz@is.elta.co.il  Sun Jul  6 07:22:25 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Sun" " 6" "July" "1997" "17:22:00" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" nil "18" "Re: New way of handling CRLF" "^From:" nil nil "7" nil nil nil nil]
	nil)
Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id HAA24478 for <voelker@cs.washington.edu>; Sun, 6 Jul 1997 07:22:23 -0700
Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) 	id RAA08656; Sun, 6 Jul 1997 17:22:01 +0300
X-Sender: eliz@is
In-Reply-To: <199707031919.PAA10787@psilocin.gnu.ai.mit.edu>
Message-ID: <Pine.SUN.3.91.970706172036.8624C-100000@is>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
From: Eli Zaretskii <eliz@is.elta.co.il>
To: Richard Stallman <rms@gnu.ai.mit.edu>
cc: voelker@cs.washington.edu
Subject: Re: New way of handling CRLF
Date: Sun, 6 Jul 1997 17:22:00 +0300 (IDT)


On Thu, 3 Jul 1997, Richard Stallman wrote:

>     On "translated" file systems, Emacs should decide whether the file is 
>     text (and then convert CRLF -> LF) or binary.
> 
>     On "untranslated" file systems, all files should be read and written 
>     verbatim.
> 
> That is what it does now--doesn't it?  So what are you trying to
> say?

I was trying to say that the two should be combined.  (un)?translated
says whether the CRLF<->LF conversion is at all an issue, and the
detection of the file type says whether this particular file needs the
conversion, given that it belongs to a "translated" filesystem.
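
The combination described above can be sketched as a single decision
function.  This is an illustration only: the names
`my-untranslated-prefixes' and `my-file-conversion', the example
prefixes, and the caller-supplied LOOKS-LIKE-TEXT flag are all
hypothetical.

```elisp
;; Hypothetical list of directory prefixes on "untranslated" filesystems.
(defvar my-untranslated-prefixes '("/mnt/unix/" "z:/"))

(defun my-file-conversion (filename looks-like-text)
  "Return `verbatim' or `crlf-to-lf' for FILENAME.
LOOKS-LIKE-TEXT is the result of whatever file-type detection is in use."
  (let ((untranslated nil))
    ;; Step 1: does the file live on an untranslated filesystem?
    (dolist (prefix my-untranslated-prefixes)
      (when (and (>= (length filename) (length prefix))
                 (string= prefix (substring filename 0 (length prefix))))
        (setq untranslated t)))
    ;; Step 2: only on translated filesystems does the file type matter.
    (cond (untranslated 'verbatim)
          (looks-like-text 'crlf-to-lf)
          (t 'verbatim))))
```

With these example prefixes, (my-file-conversion "z:/notes.txt" t)
yields `verbatim', while (my-file-conversion "c:/notes.txt" t) yields
`crlf-to-lf'.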

If it already works this way, then my comments are redundant.

From eliz@is.elta.co.il  Sun Jul  6 07:23:08 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Sun" " 6" "July" "1997" "17:22:44" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" nil "35" "Re: New way of handling CRLF" "^From:" nil nil "7" nil nil nil nil]
	nil)
Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id HAA24484 for <voelker@cs.washington.edu>; Sun, 6 Jul 1997 07:23:05 -0700
Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) 	id RAA08662; Sun, 6 Jul 1997 17:22:44 +0300
X-Sender: eliz@is
In-Reply-To: <199707042102.OAA34672@joker.cs.washington.edu>
Message-ID: <Pine.SUN.3.91.970706172213.8624D-100000@is>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
From: Eli Zaretskii <eliz@is.elta.co.il>
To: Geoff Voelker <voelker@cs.washington.edu>
cc: rms@gnu.ai.mit.edu, Andrew Innes <andrewi@harlequin.co.uk>
Subject: Re: New way of handling CRLF
Date: Sun, 6 Jul 1997 17:22:44 +0300 (IDT)


On Fri, 4 Jul 1997, Geoff Voelker wrote:

> correctly (e.g., on a text file with CRLF, both a find-file and a
> find-file-binary create a buffer with the text file stripped of
> CRLF,

What about binary files with embedded CRLFs?  How can Emacs tell which
files are and which aren't ``text''?  If it can't, then the above
behavior is wrong: I might want to use `find-file-binary' to read a
binary file (e.g., an executable program) that just happens to have
embedded CRLF pairs, either as part of text messages or even just an
opcode that happens to look like CRLF.  Will I then be presented with
the file with all CRs in CRLF pairs removed?

> a text file without CRLF, Emacs reads it in correctly, but the
> buffer-file-type is "text", and so the file gets written out with LF
> converted to CRLF.  This is not a bug in the coding-system code, but
> rather due to the fact that, internally, Emacs under DOS_NT looks at
> the buffer-file-type, sees "text", and opens the file in text mode,
> and the operating system changes LF to CRLF.

I'm not sure this is a bug, either.  I can imagine cases where the
user would like Unix-style text files to be written as DOS-style text.  I
haven't decided yet what the default should be here, but at least a
user-definable option should be available to get the non-default
behavior.

> Given the new coding-system framework, I think that all file I/O under
> DOS_NT should now be done in binary mode since the data that Emacs
> gives to the operating system does not need any conversion.

If that is the case, how would a user tell Emacs that a file which
originally had no CRs should have them added on output (assuming that
you agree that such cases are possible)?

From rms@gnu.ai.mit.edu  Sun Jul  6 17:08:03 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Sun" " 6" "July" "1997" "20:08:26" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" "<199707070008.UAA15380@psilocin.gnu.ai.mit.edu>" "15" "Re: New way of handling CRLF" "^From:" nil nil "7" nil nil nil nil]
	nil)
Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id RAA07584 for <voelker@cs.washington.edu>; Sun, 6 Jul 1997 17:08:02 -0700
Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id UAA15380; Sun, 6 Jul 1997 20:08:26 -0400
Message-Id: <199707070008.UAA15380@psilocin.gnu.ai.mit.edu>
In-reply-to: <Pine.SUN.3.91.970706172213.8624D-100000@is> (message from Eli 	Zaretskii on Sun, 6 Jul 1997 17:22:44 +0300 (IDT))
References:  <Pine.SUN.3.91.970706172213.8624D-100000@is>
From: Richard Stallman <rms@gnu.ai.mit.edu>
To: eliz@is.elta.co.il
cc: voelker@cs.washington.edu, andrewi@harlequin.co.uk
Subject: Re: New way of handling CRLF
Date: Sun, 6 Jul 1997 20:08:26 -0400

    What about binary files with embedded CRLFs?

Specifying that a file is binary means specifying no conversion.
Therefore, CRLF in these files will not be converted.

    > Given the new coding-system framework, I think that all file I/O under
    > DOS_NT should now be done in binary mode since the data that Emacs
    > gives to the operating system does not need any conversion.

    If that is the case, how would a user tell Emacs that a file which
    originally had no CRs should have them added on output (assuming that
    you agree that such cases are possible)?

You can certainly do this by specifying a different coding system
when you save the file.
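
For instance, a sketch of forcing CRLF on the next save, assuming the
function `set-buffer-file-coding-system' and the `undecided-dos' coding
system behave as in current Emacs releases (the command name
`my-save-buffer-as-dos' is hypothetical):

```elisp
(defun my-save-buffer-as-dos ()
  "Save the current buffer with CRLF line endings."
  (interactive)
  ;; Assumption: a coding system that specifies only the end-of-line
  ;; type leaves the buffer's text encoding as it was.
  (set-buffer-file-coding-system 'undecided-dos)
  (save-buffer))
```

A matching command for LF output would pass `undecided-unix' instead.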

From eliz@is.elta.co.il  Sun Jul 13 10:59:56 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Sun" "13" "July" "1997" "20:59:41" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" nil "58" "New way to handle CRLF in Emacs 20.0" "^From:" nil nil "7" nil nil nil nil]
	nil)
Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id KAA13469 for <voelker@cs.washington.edu>; Sun, 13 Jul 1997 10:59:55 -0700
Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) 	id UAA28768; Sun, 13 Jul 1997 20:59:41 +0300
X-Sender: eliz@is
Message-ID: <Pine.SUN.3.91.970713205820.28618S-100000@is>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
From: Eli Zaretskii <eliz@is.elta.co.il>
To: Geoff Voelker <voelker@cs.washington.edu>
cc: Richard Stallman <rms@gnu.ai.mit.edu>,         Andrew Innes <andrewi@harlequin.co.uk>
Subject: New way to handle CRLF in Emacs 20.0
Date: Sun, 13 Jul 1997 20:59:41 +0300 (IDT)


> Actually, the new coding-system framework appears to obviate the need
> for buffer-file-type; file-coding-system-alist and
> buffer-file-coding-system appear to be flexible enough to supersede
> it.  I will need to think more about this, though, since it is a
> rather drastic change under DOS_NT.  (Eli and Andrew, if you get a
> chance to look at the coding system support, I would like to hear what
> you think about doing away with buffer-file-type, too.)

Here's what I think, after spending an evening reading the code and
playing with Emacs.

I also think that the coding system can supersede buffer-file-type.
We need to make a list of filename patterns that will automatically
guess the coding system given a filename.
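
As an illustration only, such a list might be expressed through
`file-coding-system-alist', assuming the coding-system names
`undecided-dos', `undecided-unix' and `no-conversion'.  The filename
patterns below are hypothetical examples, not a proposed default:

```elisp
;; Each entry maps a filename regexp to a coding system.  New entries are
;; put at the front so that they are found before any existing ones.
(setq file-coding-system-alist
      (append '(("\\.bat\\'" . undecided-dos)     ; DOS batch files: CRLF
                ("\\.sh\\'"  . undecided-unix)    ; shell scripts: LF
                ("\\.exe\\'" . no-conversion))    ; executables: verbatim
              file-coding-system-alist))
```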

If a given file is not in the list, Emacs should try to guess its
EOL format, like it does now.  Since this guess might be wrong (for
example, Emacs decides that the file is CRLF-style when it sees the
first CRLF pair, and might thus be fooled by a binary file), it would
be nice to have options e.g. to ask the user whether the guess is
correct, or require more than a single CRLF before a decision is
made.  (I didn't think about this too much, so I might be wrong.)

Emacs should only do the above for filesystems that aren't in the
untranslated list (for which all file I/O should be unconverted).

I'd like to see user options (other than telling users to set the
coding system) to have Emacs write files in a specific (CRLF or LF)
format.  The default behavior of preserving the original EOL encoding
seems reasonable.  The options would of course just set the coding
system, but I'd rather people who need to do this didn't have to know
too much about coding systems.

I also think that the (un)?translated filesystem feature might be
useful to Unix users as well.  I can imagine NT or even DOS disks
mounted via networks, or people who run Linux-based systems and access
DOS partitions there (I actually see quite a few complaints from the
latter on gnu.emacs.help).  These users might benefit from adding such
disks to the list of translated filesystems and having Emacs handle
the conversion.

So maybe it's a good idea to move this feature to lisp/files.el?

> Currently, the default for file-coding-system-alist is 'undecided.
> Under DOS_NT, this should probably be 'emacs-mule so that CRLF is
> decoded and encoded by default.

I agree.  But shouldn't we also set coding.eol_type, for the EOL
conversion to take place?  I thought 'emacs-mule alone is not enough, no?

> file-name-buffer-file-type-alist and the untranslate functions.  The
> last issue is whether to remove buffer-file-type, but I won't do
> anything about that until more people agree that it is no longer
> necessary.

I think that it can go once the coding system handles everything.  We
need to decide whether the T: or B: in the modeline is necessary (it
seems that the coding system characters show the same information).

From eliz@is.elta.co.il  Sun Jul 13 11:02:20 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Sun" "13" "July" "1997" "21:02:01" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" nil "65" "CRLF on DOS_NT" "^From:" nil nil "7" nil nil nil nil]
	nil)
Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id LAA13569 for <voelker@cs.washington.edu>; Sun, 13 Jul 1997 11:02:19 -0700
Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) 	id VAA28793; Sun, 13 Jul 1997 21:02:01 +0300
X-Sender: eliz@is
Message-ID: <Pine.SUN.3.91.970713210059.28618U-100000@is>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
From: Eli Zaretskii <eliz@is.elta.co.il>
To: Richard Stallman <rms@gnu.ai.mit.edu>
cc: Geoff Voelker <voelker@cs.washington.edu>,         Andrew Innes <andrewi@harlequin.co.uk>
Subject: CRLF on DOS_NT
Date: Sun, 13 Jul 1997 21:02:01 +0300 (IDT)


The following changes are required to make CRLF <-> LF conversion work
in most common cases.  I have deliberately not tried to get them into
final shape, since I need to learn more about the coding systems, and
because Geoff said he will work on that.  I didn't install these
changes, for these reasons (and also because I didn't have enough time
to do that today).

See also my other mail about the file format translation.

(Geoff, the `callproc.c' patch is DOS-specific, since that fragment is
for DOS only; you might consider looking up the relevant code for the
NT subprocess support.)

1997-07-10  Eli Zaretskii  <eliz@is.elta.co.il>

	* fileio.c (Fwrite_region) [DOS_NT]: Always use binary mode since
	coding conversion now takes care of NL -> CRLF.

*** src/fileio.c~0	Tue Jul  8 11:36:00 1997
--- src/fileio.c	Thu Jul 10 23:16:14 1997
*************** to the file, instead of any buffer conte
*** 3799,3806 ****
    struct gcpro gcpro1, gcpro2, gcpro3, gcpro4, gcpro5;
    struct buffer *given_buffer;
  #ifdef DOS_NT
!   int buffer_file_type
!     = NILP (current_buffer->buffer_file_type) ? O_TEXT : O_BINARY;
  #endif /* DOS_NT */
    struct coding_system coding;
  
--- 3799,3805 ----
    struct gcpro gcpro1, gcpro2, gcpro3, gcpro4, gcpro5;
    struct buffer *given_buffer;
  #ifdef DOS_NT
!   int buffer_file_type = O_BINARY;
  #endif /* DOS_NT */
    struct coding_system coding;

1997-07-11  Eli Zaretskii  <eliz@is.elta.co.il>

	* callproc.c (Fcall_process) [MSDOS]: Request EOL conversion of
	the process output, unless we were promised it is binary.

*** src/callproc.c~0	Mon Jul  7 00:56:00 1997
--- src/callproc.c	Fri Jul 11 21:48:30 1997
*************** If you quit, the process is killed with 
*** 295,300 ****
--- 295,311 ----
  	      val = Qnil;
  	  }
  	setup_coding_system (Fcheck_coding_system (val), &process_coding);
+ #ifdef MSDOS
+ 	/* FIXME: this probably should be moved into the guts of
+ 	   `Ffind_operation_coding_system' for the case of `call-process'.  */
+ 	if (NILP (Vbinary_process_output))
+ 	  {
+ 	    process_coding.eol_type = CODING_EOL_CRLF;
+ 	    if (process_coding.type == coding_type_no_conversion)
+ 	      /* FIXME: should we set type to undecided?  */
+ 	      process_coding.type = coding_type_emacs_mule;
+ 	  }
+ #endif
        }
    }

From rms@gnu.ai.mit.edu  Sun Jul 13 14:41:06 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Sun" "13" "July" "1997" "17:41:39" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "6" "Re: New way to handle CRLF in Emacs 20.0" "^From:" nil nil "7" nil nil nil nil]
	nil)
Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id OAA20208 for <voelker@cs.washington.edu>; Sun, 13 Jul 1997 14:41:05 -0700
Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id RAA25670; Sun, 13 Jul 1997 17:41:39 -0400
Message-Id: <199707132141.RAA25670@psilocin.gnu.ai.mit.edu>
In-reply-to: <Pine.SUN.3.91.970713205820.28618S-100000@is> (message from Eli 	Zaretskii on Sun, 13 Jul 1997 20:59:41 +0300 (IDT))
References:  <Pine.SUN.3.91.970713205820.28618S-100000@is>
From: Richard Stallman <rms@gnu.ai.mit.edu>
To: eliz@is.elta.co.il
CC: voelker@cs.washington.edu, andrewi@harlequin.co.uk
Subject: Re: New way to handle CRLF in Emacs 20.0
Date: Sun, 13 Jul 1997 17:41:39 -0400

    be nice to have options e.g. to ask the user whether the guess is
    correct, or require more than a single CRLF before a decision is
    made.  (I didn't think about this too much, so I might be wrong.)

I think that is not worth the trouble, given that we still have
find-file-text and find-file-binary.

From rms@gnu.ai.mit.edu  Sun Jul 13 14:43:43 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Sun" "13" "July" "1997" "17:44:11" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "9" "Re: New way to handle CRLF in Emacs 20.0" "^From:" nil nil "7" nil nil nil nil]
	nil)
Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id OAA20278 for <voelker@cs.washington.edu>; Sun, 13 Jul 1997 14:43:42 -0700
Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id RAA25707; Sun, 13 Jul 1997 17:44:11 -0400
Message-Id: <199707132144.RAA25707@psilocin.gnu.ai.mit.edu>
In-reply-to: <Pine.SUN.3.91.970713205820.28618S-100000@is> (message from Eli 	Zaretskii on Sun, 13 Jul 1997 20:59:41 +0300 (IDT))
References:  <Pine.SUN.3.91.970713205820.28618S-100000@is>
From: Richard Stallman <rms@gnu.ai.mit.edu>
To: eliz@is.elta.co.il
CC: voelker@cs.washington.edu, andrewi@harlequin.co.uk
Subject: Re: New way to handle CRLF in Emacs 20.0
Date: Sun, 13 Jul 1997 17:44:11 -0400

    I also think that the (un)?translated filesystem feature might be
    useful to Unix users as well.  I can imagine NT or even DOS disks
    mounted via networks,

A feature like this could be useful; but some of the present details
don't fit this new context.  If you are running Emacs on a GNU system,
"untranslated" file systems are the usual case; file systems for which
new files should be translated are the special case.  This is the
opposite of the situation for MSDOS.

From rms@gnu.ai.mit.edu  Sun Jul 13 14:44:49 1997
X-VM-v5-Data: ([nil nil nil nil t nil nil nil nil]
	[nil "Sun" "13" "July" "1997" "17:45:24" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" "<199707132145.RAA25719@psilocin.gnu.ai.mit.edu>" "16" "Re: New way to handle CRLF in Emacs 20.0" "^From:" nil nil "7" nil nil nil nil]
	nil)
Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id OAA20297 for <voelker@cs.washington.edu>; Sun, 13 Jul 1997 14:44:49 -0700
Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id RAA25719; Sun, 13 Jul 1997 17:45:24 -0400
Message-Id: <199707132145.RAA25719@psilocin.gnu.ai.mit.edu>
In-reply-to: <Pine.SUN.3.91.970713205820.28618S-100000@is> (message from Eli 	Zaretskii on Sun, 13 Jul 1997 20:59:41 +0300 (IDT))
References:  <Pine.SUN.3.91.970713205820.28618S-100000@is>
From: Richard Stallman <rms@gnu.ai.mit.edu>
To: eliz@is.elta.co.il
CC: voelker@cs.washington.edu, andrewi@harlequin.co.uk
Subject: Re: New way to handle CRLF in Emacs 20.0
Date: Sun, 13 Jul 1997 17:45:24 -0400

    > Currently, the default for file-coding-system-alist is 'undecided.
    > Under DOS_NT, this should probably be 'emacs-mule

No, definitely not.

       so that CRLF is
    > decoded and encoded by default.

CRLF encoding is supposed to happen just the same for undecided
as it does for emacs-mule.

      We
    need to decide whether the T: or B: in the modeline is necessary (it
    seems that the coding system characters show the same information).

Yes, that is something we should decide right now.

From rms@gnu.ai.mit.edu  Fri Jul 18 20:12:22 1997
X-VM-v5-Data: ([nil nil nil nil t t nil nil nil]
	[nil "Fri" "18" "July" "1997" "23:13:02" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" "<199707190313.XAA18973@psilocin.gnu.ai.mit.edu>" "71" "Re: CRLF on DOS_NT" "^From:" nil nil "7" nil nil nil nil]
	nil)
Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id UAA22532 for <voelker@cs.washington.edu>; Fri, 18 Jul 1997 20:12:21 -0700
Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id XAA18973; Fri, 18 Jul 1997 23:13:02 -0400
Message-Id: <199707190313.XAA18973@psilocin.gnu.ai.mit.edu>
In-reply-to: <199707182334.QAA39176@joker.cs.washington.edu> 	(voelker@cs.washington.edu)
References: <Pine.SUN.3.91.970713210059.28618U-100000@is> 	<199707132042.QAA25113@psilocin.gnu.ai.mit.edu> 	<199707160153.SAA23676@joker.cs.washington.edu> 	<199707182305.TAA17659@psilocin.gnu.ai.mit.edu> <199707182334.QAA39176@joker.cs.washington.edu>
From: Richard Stallman <rms@gnu.ai.mit.edu>
To: voelker@cs.washington.edu
Subject: Re: CRLF on DOS_NT
Date: Fri, 18 Jul 1997 23:13:02 -0400

    I've always interpreted the semantics of specifying 'nil' (text) in
    file-name-buffer-file-type-alist as being that you explicitly want
    CRLF separating lines.  For example, no matter what, you want
    config.sys to have CRLFs between the lines.

That is a good point.  So here's the change I've made.

But I wonder whether emacs-mule-dos is the right coding system
in other respects.  You've argued that -dos is right, but is
emacs-mule right?

*** dos-w32.el	1997/07/18 22:54:23	1.6
--- dos-w32.el	1997/07/19 03:10:17
***************
*** 102,128 ****
      If the match is nil (for text):			'emacs-mule-dos'
    Otherwise:
      If the file exists:					'undecided'
!     If the file does not exist:				'emacs-mule-dos'
  
  If COMMAND is 'write-region', the coding system is chosen based
  upon the value of 'buffer-file-type': If t, the coding system is
  'no-conversion', otherwise it is 'emacs-mule-dos'."
    (let ((op (nth 0 command))
  	(target)
! 	(binary)
  	(undecided nil))
      (cond ((eq op 'insert-file-contents) 
  	   (setq target (nth 1 command))
  	   (setq binary (find-buffer-file-type target))
! 	   (if (not binary)
! 	       (setq undecided 
! 		     (and (file-exists-p target)
! 			  (not (find-buffer-file-type-match target))))))
  	  ((eq op 'write-region) 
  	   (setq binary buffer-file-type)))
      (cond (binary '(no-conversion . no-conversion))
  	  (undecided '(undecided . undecided))
! 	  (t '(emacs-mule-dos . emacs-mule-dos)))))
  
  (modify-coding-system-alist 'file "" 'find-buffer-file-type-coding-system)
  
--- 102,129 ----
      If the match is nil (for text):			'emacs-mule-dos'
    Otherwise:
      If the file exists:					'undecided'
!     If the file does not exist:				'undecided-dos'
  
  If COMMAND is 'write-region', the coding system is chosen based
  upon the value of 'buffer-file-type': If t, the coding system is
  'no-conversion', otherwise it is 'emacs-mule-dos'."
    (let ((op (nth 0 command))
  	(target)
! 	(binary nil) (text nil)
  	(undecided nil))
      (cond ((eq op 'insert-file-contents) 
  	   (setq target (nth 1 command))
  	   (setq binary (find-buffer-file-type target))
! 	   (unless binary
! 	     (if (find-buffer-file-type-match target)
! 		 (setq text t)
! 	       (setq undecided (file-exists-p target)))))
  	  ((eq op 'write-region) 
  	   (setq binary buffer-file-type)))
      (cond (binary '(no-conversion . no-conversion))
+ 	  (text '(emacs-mule-dos . emacs-mule-dos))
  	  (undecided '(undecided . undecided))
! 	  (t '(undecided-dos . undecided-dos)))))
  
  (modify-coding-system-alist 'file "" 'find-buffer-file-type-coding-system)
  

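The decision order in the patched function above can be restated as a
small Python sketch.  The function name and its boolean parameters are
hypothetical stand-ins for the Lisp calls named in the comments, and the
sketch returns a single name where the Lisp returns a (decode . encode)
pair with the same name twice.

```python
def choose_coding_system(op, binary=False, matched=False, file_exists=False):
    """Return the coding-system name chosen for a file operation.

    binary      stands in for find-buffer-file-type (insert-file-contents)
                or for buffer-file-type (write-region)
    matched     stands in for a non-nil find-buffer-file-type-match
    file_exists stands in for file-exists-p on the target
    """
    text = False
    undecided = False
    if op == "insert-file-contents":
        if not binary:
            if matched:
                text = True              # explicit text entry in the alist
            else:
                undecided = file_exists  # no entry: sniff existing files
    elif op != "write-region":
        binary = False                   # other operations leave binary nil

    # Same order as the final cond in the patch.
    if binary:
        return "no-conversion"
    if text:
        return "emacs-mule-dos"
    if undecided:
        return "undecided"
    return "undecided-dos"
```

Note that, as the patch reads, a non-binary write-region falls through
to `undecided-dos', although the unchanged docstring still says
`emacs-mule-dos'.
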
From Marc.Fleischeuers@kub.nl  Fri Aug  1 01:07:26 1997
X-VM-v5-Data: ([nil nil nil nil t nil nil nil nil]
	[nil "" " 1" "August" "1997" "10:07:17" "+0200" "Marc Fleischeuers" "Marc.Fleischeuers@kub.nl" "<uk9i6s4wa.fsf@kub.nl>" "69" "Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from mailnews.kub.nl (mailnews.kub.nl [137.56.0.220]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id BAA21136 for <voelker@cs.washington.edu>; Fri, 1 Aug 1997 01:07:24 -0700
Received: from PI0737.kub.nl (pi0737.kub.nl [137.56.38.229]) by mailnews.kub.nl (8.8.5/8.7.1) with SMTP id KAA27228; Fri, 1 Aug 1997 10:07:19 +0200 (MET DST)
References: <199707291709.NAA14978@psilocin.gnu.ai.mit.edu> 	<199707310605.XAA15156@joker.cs.washington.edu> <uyb6niore.fsf@kub.nl> 	<199707312038.NAA15222@joker.cs.washington.edu> 	<199707312342.TAA19727@psilocin.gnu.ai.mit.edu>
In-Reply-To: Richard Stallman's message of Thu, 31 Jul 1997 19:42:35 -0400
Message-ID: <uk9i6s4wa.fsf@kub.nl>
Lines: 69
X-Mailer: Gnus v5.3/Emacs 19.33
From: Marc Fleischeuers <Marc.Fleischeuers@kub.nl>
Sender: marcf@PI0737.kub.nl
To: Richard Stallman <rms@gnu.ai.mit.edu>
Cc: voelker@cs.washington.edu, Marc.Fleischeuers@kub.nl,         andrewi@harlequin.co.uk
Subject: Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]
Date: 01 Aug 1997 10:07:17 +0200

>     I'm not sure I understand what you are trying to do.  When a file is
>     inside of Emacs, lines are always terminated by newlines.  The line
>     termination that exists when the file is in the filesystem is only
>     placed there when the file is written out.  There is no need to
>     explicitly place CR or LF characters in a file to change the
>     termination used.
> 
> You're right--but perhaps this can be a clue to finding a place
> where the documentation needs to be made clearer.  So it is worth figuring
> out why Marc got the wrong idea.

My intent was too straightforward, obviously. I noticed some problems
with msdos-type files (note: these files were not created with emacs,
but with other programs most notably netscape). I knew about the cr-lf
line ending convention so in an attempt to create a msdos file in
emacs, I ended lines with an explicit `C-q C-m C-q C-j'. Please note
that this works as expected in emacs 19.33 (i386-*-Win NT 4.0).

The variables Geoff mentioned (buffer-file-type and
coding-system-for-write) have sent me off on a chase through emacs'
help. Skip to the last paragraph if you are not interested in the dead
ends.

First, `C-h v buffer-file-type' mentions that this is a MS-DOG and
Windows NT-only variable, and that its value is nil. I tried to set
the variable with M-x set-variable RET buffer-file-type but when I
press return all I get is [no match]. I don't think this is a great
loss though, surely with so many advanced encoding and decoding
facilities, there is no more need for MSDOG as a special case?

On to `coding-system-for-write'. The documentation mentions that this
is a variable of internal use only. Setting it would probably require
lisp. The appropriate values for this variable should be taken from
`coding-system-alist'. There is however no documentation for this
variable (`C-h v coding-system-alist' -> [no match]). Still, an
internal variable is not the first thing to use if I want to create an
ms-dos file. 

Apropos'ing around I found another promising variable,
`buffer-file-format', valid values for which are found in
`format-alist'. In this alist there seems to be an appropriate format,
`ibm'. However, `M-x set-variable RET buffer-file-format' again gives
[no match]. 

What I should have used all along was `M-x
set-buffer-file-coding-system RET iso-latin-1-dos'. This function is
accessible from the menu ([menu-bar mule set-various-coding-systems
set-buffer-file-coding-system]) and from the C-x RET keymap. However,
it was only from the resulting file that I could see that it was what
I wanted (in fact there may still be a better way). The documentation
for `set-buffer-file-coding-system' does not mention to what values it
can be set, and the description in `M-x describe-coding-system' does
not mention what any of the listed coding systems do. In fact, after
I selected iso-latin-1-dos, it was described as

Current buffer file: buffer-file-coding-system
 - -- undecided-dos

The short answer is that the documentation for describe-coding-system
and set-*-coding-system could be improved. For describe-coding-system,
why is it necessary to mention the priority of coding systems? Instead,
use the space to explain what the selected coding systems do. For
`set-*-coding-system', it could be mentioned to what values it can be
set, and perhaps what they do.

Marc
-- 
Computer! End program!
Computer! Create _new_ program!

From rms@gnu.ai.mit.edu  Sat Aug  2 03:18:33 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Sat" " 2" "August" "1997" "06:18:47" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "13" "Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id DAA24743 for <voelker@cs.washington.edu>; Sat, 2 Aug 1997 03:18:33 -0700
Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id GAA13689; Sat, 2 Aug 1997 06:18:47 -0400
Message-Id: <199708021018.GAA13689@psilocin.gnu.ai.mit.edu>
In-reply-to: <uk9i6s4wa.fsf@kub.nl> (message from Marc Fleischeuers on 01 Aug 	1997 10:07:17 +0200)
References: <199707291709.NAA14978@psilocin.gnu.ai.mit.edu> 	<199707310605.XAA15156@joker.cs.washington.edu> <uyb6niore.fsf@kub.nl> 	<199707312038.NAA15222@joker.cs.washington.edu> 	<199707312342.TAA19727@psilocin.gnu.ai.mit.edu> <uk9i6s4wa.fsf@kub.nl>
From: Richard Stallman <rms@gnu.ai.mit.edu>
To: Marc.Fleischeuers@kub.nl
cc: voelker@cs.washington.edu, andrewi@harlequin.co.uk
Subject: Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]
Date: Sat, 2 Aug 1997 06:18:47 -0400

    What I should have used all along was `M-x
    set-buffer-file-coding-system RET iso-latin-1-dos'. This function is
    accessible from the menu ([menu-bar mule set-various-coding-systems
    set-buffer-file-coding-system]) and from the C-x RET keymap. However,
    it was only from the resulting file that I could see that it was what

I improved the doc of this command.
But that won't fully solve the problem.

What I really should do is to point you at this command from somewhere
else that you would naturally look.

Any suggestions for where that could be?

From rms@gnu.ai.mit.edu  Sat Aug  2 21:23:03 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Sun" " 3" "August" "1997" "00:23:09" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "8" "Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id VAA20521 for <voelker@cs.washington.edu>; Sat, 2 Aug 1997 21:23:03 -0700
Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id AAA25764; Sun, 3 Aug 1997 00:23:09 -0400
Message-Id: <199708030423.AAA25764@psilocin.gnu.ai.mit.edu>
In-reply-to: <uk9i6s4wa.fsf@kub.nl> (message from Marc Fleischeuers on 01 Aug 	1997 10:07:17 +0200)
References: <199707291709.NAA14978@psilocin.gnu.ai.mit.edu> 	<199707310605.XAA15156@joker.cs.washington.edu> <uyb6niore.fsf@kub.nl> 	<199707312038.NAA15222@joker.cs.washington.edu> 	<199707312342.TAA19727@psilocin.gnu.ai.mit.edu> <uk9i6s4wa.fsf@kub.nl>
From: Richard Stallman <rms@gnu.ai.mit.edu>
To: Marc.Fleischeuers@kub.nl
CC: voelker@cs.washington.edu, Marc.Fleischeuers@kub.nl,         andrewi@harlequin.co.uk, handa@etl.go.jp
Subject: Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]
Date: Sun, 3 Aug 1997 00:23:09 -0400

    I knew about the cr-lf
    line ending convention so in an attempt to create a msdos file in
    emacs, I ended lines with an explicit `C-q C-m C-q C-j'. Please note
    that this works as expected in emacs 19.33 (i386-*-Win NT 4.0).

Is that really true?  What algorithm does 19.33 use for LF to CRLF
conversion?  Maybe we should change the Emacs 20 EOL conversion to do
the same thing.

From handa@etl.go.jp  Sun Aug  3 18:32:56 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Mon" " 4" "August" "1997" "10:33:49" "+0900" "Kenichi Handa" "handa@etl.go.jp" nil "33" "Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from mail1-im.etl.go.jp (mail1-im.etl.go.jp [192.50.105.9]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id SAA20790 for <voelker@cs.washington.edu>; Sun, 3 Aug 1997 18:32:53 -0700
Received: from etlpom.etl.go.jp (etlpom.etl.go.jp [192.31.200.9]) by mail1-im.etl.go.jp (8.8.5/3.5Wpl1-96112918) with ESMTP 	id KAA06878; Mon, 4 Aug 1997 10:32:33 +0900 (JST)
Received: from etlken.etl.go.jp (etlken.etl.go.jp [192.31.197.11]) by etlpom.etl.go.jp (8.8.5/3.5Wpl4-ETL_MASTER) with SMTP id KAA00812; Mon, 4 Aug 1997 10:32:33 +0900 (JST)
Received: by etlken.etl.go.jp (SMI-8.6/6.4J.6-ETL.SLAVE) 	id KAA04718; Mon, 4 Aug 1997 10:33:49 +0900
Message-Id: <199708040133.KAA04718@etlken.etl.go.jp>
In-reply-to: <199708030423.AAA25764@psilocin.gnu.ai.mit.edu> (message from 	Richard Stallman on Sun, 3 Aug 1997 00:23:09 -0400)
References: <199707291709.NAA14978@psilocin.gnu.ai.mit.edu> 	<199707310605.XAA15156@joker.cs.washington.edu> <uyb6niore.fsf@kub.nl> 	<199707312038.NAA15222@joker.cs.washington.edu> 	<199707312342.TAA19727@psilocin.gnu.ai.mit.edu> <uk9i6s4wa.fsf@kub.nl> <199708030423.AAA25764@psilocin.gnu.ai.mit.edu>
From: Kenichi Handa <handa@etl.go.jp>
To: rms@gnu.ai.mit.edu
CC: Marc.Fleischeuers@kub.nl, voelker@cs.washington.edu,         Marc.Fleischeuers@kub.nl, andrewi@harlequin.co.uk
Subject: Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]
Date: Mon, 4 Aug 1997 10:33:49 +0900

   Date: Sun, 3 Aug 1997 00:23:09 -0400
   From: Richard Stallman <rms@gnu.ai.mit.edu>

       I knew about the cr-lf
       line ending convention so in an attempt to create a msdos file in
       emacs, I ended lines with an explicit `C-q C-m C-q C-j'. Please note
       that this works as expected in emacs 19.33 (i386-*-Win NT 4.0).

   Is that really true?  What algorithm does 19.33 use for LF to CRLF
   conversion?  Maybe we should change the Emacs 20 EOL conversion to do
   the same thing.

Since the above is the first mail I have received in this thread, this
reply may miss the point...

I don't know why the above doesn't work for Emacs 20.

I've just tried the following.
1) At first, visit a new file.
2) type `a b c C-q C-m C-q C-j'
3) save it.
4) visit it again.

Then the file is read as `undecided-dos' and the buffer contents are
these 4 bytes:
	abc\C-j
This means that CR LF is decoded to single LF.  But, since
buffer-file-coding-system is undecided-dos, when I edit this file and
save it, all LFs are encoded back to CR LF.
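
The round trip described here can be sketched in a few lines of Python.
The helper names `decode_dos' and `encode_dos' are hypothetical, and
plain byte replacement stands in for the real coding-system code.

```python
def decode_dos(raw: bytes) -> bytes:
    """Reading with a -dos coding system: each CR LF becomes one LF."""
    return raw.replace(b"\r\n", b"\n")


def encode_dos(buf: bytes) -> bytes:
    """Writing with a -dos coding system: each LF becomes CR LF."""
    return buf.replace(b"\n", b"\r\n")


on_disk = b"abc\r\n"             # 5 bytes in the file
in_buffer = decode_dos(on_disk)  # 4 bytes in the buffer: abc plus LF
edited = in_buffer + b"def\n"    # the user adds a second line
saved = encode_dos(edited)       # every LF goes back out as CR LF
```

The buffer never holds a CR; the CR LF pairs exist only in the bytes
read from or written to the file.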

---
Ken'ichi HANDA
handa@etl.go.jp

From Marc.Fleischeuers@kub.nl  Mon Aug  4 01:42:41 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "" " 4" "August" "1997" "10:42:27" "+0200" "Marc Fleischeuers" "Marc.Fleischeuers@kub.nl" nil "25" "Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from mailnews.kub.nl (mailnews.kub.nl [137.56.0.220]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id BAA03503 for <voelker@cs.washington.edu>; Mon, 4 Aug 1997 01:42:36 -0700
Received: from PI0737.kub.nl (pi0737.kub.nl [137.56.38.229]) by mailnews.kub.nl (8.8.5/8.7.1) with SMTP id KAA27167; Mon, 4 Aug 1997 10:42:25 +0200 (MET DST)
References: <199707291709.NAA14978@psilocin.gnu.ai.mit.edu> 	<199707310605.XAA15156@joker.cs.washington.edu> <uyb6niore.fsf@kub.nl> 	<199707312038.NAA15222@joker.cs.washington.edu> 	<199707312342.TAA19727@psilocin.gnu.ai.mit.edu> <uk9i6s4wa.fsf@kub.nl> 	<199708021018.GAA13689@psilocin.gnu.ai.mit.edu>
In-Reply-To: Richard Stallman's message of Sat, 2 Aug 1997 06:18:47 -0400
Message-ID: <u4t96730s.fsf@kub.nl>
Lines: 25
X-Mailer: Gnus v5.3/Emacs 19.33
From: Marc Fleischeuers <Marc.Fleischeuers@kub.nl>
Sender: marcf@PI0737.kub.nl
To: Richard Stallman <rms@gnu.ai.mit.edu>
Cc: Marc.Fleischeuers@kub.nl, voelker@cs.washington.edu,         andrewi@harlequin.co.uk
Subject: Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]
Date: 04 Aug 1997 10:42:27 +0200

Richard Stallman <rms@gnu.ai.mit.edu> writes:

> What I really should do is to point you at this command from somewhere
> else that you would naturally look.
> 
> Any suggestions for where that could be?


The command is in the menu and in the advertised C-x RET keymap; I
think anyone investigating emacs' new features should find these
functions easily (I didn't go there straightaway because I first
followed the suggestions by Geoff Voelker). In the menu-bar there is
even a corresponding `describe' entry for input method and coding
systems. I think this is a good thing, this is where I would look if I
were a user. In fact I did look there when I first started emacs 20;
it's just that the descriptions are not very informative about what
the functions actually do for me (input methods do not work (yet?) so
I cannot comment on that).

If the documentation for set-buffer-file-coding-system and `M-x
describe-coding-system' described the available and the selected
coding systems, respectively, and what they do for me, I think that
would do it.

Marc

From Marc.Fleischeuers@kub.nl  Mon Aug  4 02:40:35 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "" " 4" "August" "1997" "11:40:11" "+0200" "Marc Fleischeuers" "Marc.Fleischeuers@kub.nl" nil "45" "Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from mailnews.kub.nl (mailnews.kub.nl [137.56.0.220]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id CAA04516 for <voelker@cs.washington.edu>; Mon, 4 Aug 1997 02:40:28 -0700
Received: from PI0737.kub.nl (pi0737.kub.nl [137.56.38.229]) by mailnews.kub.nl (8.8.5/8.7.1) with SMTP id LAA00950; Mon, 4 Aug 1997 11:40:09 +0200 (MET DST)
References: <199707291709.NAA14978@psilocin.gnu.ai.mit.edu> 	<199707310605.XAA15156@joker.cs.washington.edu> <uyb6niore.fsf@kub.nl> 	<199707312038.NAA15222@joker.cs.washington.edu> 	<199707312342.TAA19727@psilocin.gnu.ai.mit.edu> <uk9i6s4wa.fsf@kub.nl> 	<199708030423.AAA25764@psilocin.gnu.ai.mit.edu> 	<199708040133.KAA04718@etlken.etl.go.jp>
In-Reply-To: Kenichi Handa's message of Mon, 4 Aug 1997 10:33:49 +0900
Message-ID: <u3eoq70ck.fsf@kub.nl>
Lines: 45
X-Mailer: Gnus v5.3/Emacs 19.33
From: Marc Fleischeuers <Marc.Fleischeuers@kub.nl>
Sender: marcf@PI0737.kub.nl
To: Kenichi Handa <handa@etl.go.jp>
Cc: rms@gnu.ai.mit.edu, Marc.Fleischeuers@kub.nl, voelker@cs.washington.edu,         andrewi@harlequin.co.uk
Subject: Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]
Date: 04 Aug 1997 11:40:11 +0200

Kenichi Handa <handa@etl.go.jp> writes:

> I've just tried the following.
> 1) At first, visit a new file.
> 2) type `a b c C-q C-m C-q C-j'
> 3) save it.
> 4) visit it again.
> 
> Then the file is read as `undecided-dos' and the buffer contents are
> 4-byte of:
> 	abc\C-j
> This means that CR LF is decoded to single LF.  But, since
> buffer-file-coding-system is undecided-dos, when I edit this file and
> save it, all LFs are encoded back to CR LF.

This is the way it should be; unfortunately, that is not what I see. I
have repeated the four steps above.

When I first open a new file, the buffer-file-coding-system is nil and
the mode-line indicator is `:'.  If I insert `a b c C-q C-m C-q C-j'
in the buffer and then save the file, the buffer contains the five
bytes `abc\C-m\C-j', buffer-file-coding-system is still nil, and the
mode-line indicator is still `:'. With `c:\emacs\bin\hexl abc', the
contents of the file are `6162 630d 0d0a'.

If I then re-visit the file (`C-x C-v RET') it contains six bytes
`abc\C-j\C-j\C-j', buffer-file-coding-system is `- -- undecided-mac',
and the mode-line indicator is `/'.

I started emacs with `c:\emacs\bin\emacs.bat --no-site-file
--no-init-file'. The batch file sets a number of environment
variables. It is not modified from the one generated by the install
process. I use emacs 20.0.92 on Windows NT 4.0, compiled with MS VC++
4.2. 

I have also used the following version, started the same way, to
perform exactly the same steps:
In GNU Emacs 19.33.1 (i386-*-nt4.0) of Wed Aug 14 1996 on BANANA-FISH
configured using `configure NT'

The file is written and read back in as the five bytes `abc\C-m\C-j'.
There is a mode-line indicator `(T:', both when I first open the file
and when I read it back in.

Marc

From handa@etl.go.jp  Mon Aug  4 04:43:24 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Mon" " 4" "August" "1997" "20:37:31" "+0900" "Kenichi Handa" "handa@etl.go.jp" nil "69" "Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from mail1-im.etl.go.jp (mail1-im.etl.go.jp [192.50.105.9]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id EAA06611 for <voelker@cs.washington.edu>; Mon, 4 Aug 1997 04:43:23 -0700
Received: from etlpom.etl.go.jp (etlpom.etl.go.jp [192.31.200.9]) by mail1-im.etl.go.jp (8.8.5/3.5Wpl1-96112918) with ESMTP 	id UAA11557; Mon, 4 Aug 1997 20:36:15 +0900 (JST)
Received: from etlken.etl.go.jp (etlken.etl.go.jp [192.31.197.11]) by etlpom.etl.go.jp (8.8.5/3.5Wpl4-ETL_MASTER) with SMTP id UAA07947; Mon, 4 Aug 1997 20:36:15 +0900 (JST)
Received: by etlken.etl.go.jp (SMI-8.6/6.4J.6-ETL.SLAVE) 	id UAA05301; Mon, 4 Aug 1997 20:37:31 +0900
Message-Id: <199708041137.UAA05301@etlken.etl.go.jp>
In-reply-to: <u3eoq70ck.fsf@kub.nl> (message from Marc Fleischeuers on 04 Aug 	1997 11:40:11 +0200)
References: <199707291709.NAA14978@psilocin.gnu.ai.mit.edu> 	<199707310605.XAA15156@joker.cs.washington.edu> <uyb6niore.fsf@kub.nl> 	<199707312038.NAA15222@joker.cs.washington.edu> 	<199707312342.TAA19727@psilocin.gnu.ai.mit.edu> <uk9i6s4wa.fsf@kub.nl> 	<199708030423.AAA25764@psilocin.gnu.ai.mit.edu> 	<199708040133.KAA04718@etlken.etl.go.jp> <u3eoq70ck.fsf@kub.nl>
From: Kenichi Handa <handa@etl.go.jp>
To: Marc.Fleischeuers@kub.nl
CC: rms@gnu.ai.mit.edu, Marc.Fleischeuers@kub.nl, voelker@cs.washington.edu,         andrewi@harlequin.co.uk
Subject: Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]
Date: Mon, 4 Aug 1997 20:37:31 +0900

   From: Marc Fleischeuers <Marc.Fleischeuers@kub.nl>
   Date: 04 Aug 1997 11:40:11 +0200
   Kenichi Handa <handa@etl.go.jp> writes:
   > I've just tried the following.
   > 1) At first, visit a new file.
   > 2) type `a b c C-q C-m C-q C-j'
   > 3) save it.
   > 4) visit it again.
   > 
   > Then the file is read as `undecided-dos' and the buffer contents are
   > 4-byte of:
   > 	abc\C-j
   > This means that CR LF is decoded to single LF.  But, since
   > buffer-file-coding-system is undecided-dos, when I edit this file and
   > save it, all LFs are encoded back to CR LF.

   This is the way it should be, unfortunately it is not for me. I have
   repeated the four steps above. 

   When I first open a new file, the buffer-file-coding-system is nil and
   the mode-line indicator is `:'.  If I insert `a b c C-q C-m C-q C-j'
   in the buffer and then save the file, the buffer contains the five
   bytes `abc\C-m\C-j', buffer-file-coding-system is still nil, and the
   mode-line indicator is still `:'. With `c:\emacs\bin\hexl abc', the
   contents of the file is `6162 630d 0d0a'.

Hmm, the sequence CR LF was written out as CR CR LF.  It seems that
the file is opened by O_TEXT instead of O_BINARY.  But, this should
have been fixed in 20.0.92 already.  Strange...

Could you please check the file src/fileio.c?  Has the following
patch made by <eliz@is.elta.co.il> been applied to it?

------------------------------------------------------------
RCS file: RCS/fileio.c,v
retrieving revision 1.250
retrieving revision 1.251
diff -u -r1.250 -r1.251
--- fileio.c    1997/07/12 06:43:08     1.250
+++ fileio.c    1997/07/13 20:37:01     1.251
@@ -3799,8 +3799,7 @@
   struct gcpro gcpro1, gcpro2, gcpro3, gcpro4, gcpro5;
   struct buffer *given_buffer;
 #ifdef DOS_NT
-  int buffer_file_type
-    = NILP (current_buffer->buffer_file_type) ? O_TEXT : O_BINARY;
+  int buffer_file_type = O_BINARY;
 #endif /* DOS_NT */
   struct coding_system coding;
 
------------------------------------------------------------

   If I then re-visit the file (`C-x C-v RET') it contains six bytes
   `abc\C-j\C-j\C-j', buffer-file-coding-system is `- -- undecided-mac',
   and the mode-line indicator is `/'.

This is the expected behaviour when Emacs reads `a b c CR CR LF'.  When
Emacs encounters a CR not followed by LF, it decides that the end-of-line
format for the file is CR (the Mac convention), and translates each CR to
LF.  LF is read as is.  So, the buffer contains three LFs.
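
The reading rule described above can be modelled in a few lines of
Python.  This is only a toy sketch of the rule as stated in this message,
not the Emacs decoder; the function name `decode_eol' is made up for
illustration, and it assumes that a single lone CR anywhere in the data
is enough to select the Mac convention.

```python
import re
from typing import Tuple


def decode_eol(data: bytes) -> Tuple[str, bytes]:
    """Toy model: guess the EOL format, then translate to LF-only text."""
    # Any CR that is NOT followed by LF makes the whole file "mac":
    # every CR becomes LF, and LF is kept as is.
    if re.search(rb"\r(?!\n)", data):
        return "mac", data.replace(b"\r", b"\n")
    # Otherwise, CR LF pairs mean "dos": each pair becomes a single LF.
    if b"\r\n" in data:
        return "dos", data.replace(b"\r\n", b"\n")
    # No CR at all: Unix format, nothing to translate.
    return "unix", data


# The file written in the report: a b c CR CR LF
print(decode_eol(b"abc\r\r\n"))
# A plain DOS file: a b c CR LF
print(decode_eol(b"abc\r\n"))
```

Under this model the six bytes `a b c CR CR LF' come back as `abc'
followed by three LFs, which is what the report shows.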

So, the problem seems to be in writing a file.

Anyway, I don't have Windows NT.  I have asked a Windows expert to
check the code.

---
Ken'ichi HANDA
handa@etl.go.jp

From Marc.Fleischeuers@kub.nl  Mon Aug  4 05:21:28 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "" " 4" "August" "1997" "14:18:51" "+0200" "Marc Fleischeuers" "Marc.Fleischeuers@kub.nl" nil "22" "Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error           converting cr-lf]" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from mailnews.kub.nl (mailnews.kub.nl [137.56.0.220]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id FAA07716 for <voelker@cs.washington.edu>; Mon, 4 Aug 1997 05:21:26 -0700
Received: from PI0737.kub.nl (pi0737.kub.nl [137.56.38.229]) by mailnews.kub.nl (8.8.5/8.7.1) with SMTP id OAA10741; Mon, 4 Aug 1997 14:18:49 +0200 (MET DST)
References: <199707291709.NAA14978@psilocin.gnu.ai.mit.edu> 	<199707310605.XAA15156@joker.cs.washington.edu> <uyb6niore.fsf@kub.nl> 	<199707312038.NAA15222@joker.cs.washington.edu> 	<199707312342.TAA19727@psilocin.gnu.ai.mit.edu> <uk9i6s4wa.fsf@kub.nl> 	<19 <199708041137.UAA05301@etlken.etl.go.jp>
In-Reply-To: Kenichi Handa's message of Mon, 4 Aug 1997 20:37:31 +0900
Message-ID: <uvi1m5efo.fsf@kub.nl>
Lines: 22
X-Mailer: Gnus v5.3/Emacs 19.33
From: Marc Fleischeuers <Marc.Fleischeuers@kub.nl>
Sender: marcf@PI0737.kub.nl
To: Kenichi Handa <handa@etl.go.jp>
Cc: Marc.Fleischeuers@kub.nl, rms@gnu.ai.mit.edu, voelker@cs.washington.edu,         andrewi@harlequin.co.uk
Subject: Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error           converting cr-lf]
Date: 04 Aug 1997 14:18:51 +0200

Kenichi Handa <handa@etl.go.jp> writes:

> Hmm, the sequence CR LF was written out as CR CR LF.  It seems that
> the file is opened by O_TEXT instead of O_BINARY.  But, this should
> have been fixed in 20.0.92 already.  Strange...
> 
> Could you please check the file src/fileio.c?  Has the following
> patch made by <eliz@is.elta.co.il> been applied to it?

This patch is applied (that is, it says ``int buffer_file_type =
O_BINARY'', that's what it should be I think)

> So, the problem seems to be in writing a file.

In a previous post today, I described how 19.33 and 20.0.92 both
write the same bytes to disk. If the way in which this file is
read in is correct (as I understand from your and RMS' posts) then a)
the way the cr and lf sequences are interpreted differs between 19.33
and 20.0.92, and b) this difference in reading is not matched
by an appropriate difference in writing.



From handa@etl.go.jp  Mon Aug  4 05:54:20 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Mon" " 4" "August" "1997" "21:54:40" "+0900" "Kenichi Handa" "handa@etl.go.jp" nil "33" "Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error           converting cr-lf]" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from mail1-im.etl.go.jp (mail1-im.etl.go.jp [192.50.105.9]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id FAA08453 for <voelker@cs.washington.edu>; Mon, 4 Aug 1997 05:54:19 -0700
Received: from etlpom.etl.go.jp (etlpom.etl.go.jp [192.31.200.9]) by mail1-im.etl.go.jp (8.8.5/3.5Wpl1-96112918) with ESMTP 	id VAA13372; Mon, 4 Aug 1997 21:53:24 +0900 (JST)
Received: from etlken.etl.go.jp (etlken.etl.go.jp [192.31.197.11]) by etlpom.etl.go.jp (8.8.5/3.5Wpl4-ETL_MASTER) with SMTP id VAA09960; Mon, 4 Aug 1997 21:53:25 +0900 (JST)
Received: by etlken.etl.go.jp (SMI-8.6/6.4J.6-ETL.SLAVE) 	id VAA05378; Mon, 4 Aug 1997 21:54:40 +0900
Message-Id: <199708041254.VAA05378@etlken.etl.go.jp>
In-reply-to: <uvi1m5efo.fsf@kub.nl> (message from Marc Fleischeuers on 04 Aug 	1997 14:18:51 +0200)
From: Kenichi Handa <handa@etl.go.jp>
To: Marc.Fleischeuers@kub.nl
CC: rms@gnu.ai.mit.edu, voelker@cs.washington.edu, andrewi@harlequin.co.uk
Subject: Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error           converting cr-lf]
Date: Mon, 4 Aug 1997 21:54:40 +0900

   From: Marc Fleischeuers <Marc.Fleischeuers@kub.nl>
   Date: 04 Aug 1997 14:18:51 +0200
   > Could you please check the file src/fileio.c?  Has the following
   > patch made by <eliz@is.elta.co.il> been applied to it?

   This patch is applied (that is, it says ``int buffer_file_type =
   O_BINARY'', that's what it should be I think)

I have just found that lisp/dos-w32.el is doing something about
deciding coding system.  Although I have not yet read the code in
detail, I suspect that the code decides that coding system for writing
a file on NT is undecided-dos.  If it is true, it explains everything,
because Emacs writes CR as is and converts LF to CR LF when it writes
a file by undecided-dos.
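
A minimal Python sketch of the writing rule just described (CR written
as is, every LF written as CR LF).  It is an illustration only, not the
Emacs encoder, and the name `encode_dos' is hypothetical.

```python
def encode_dos(buf: bytes) -> bytes:
    """Toy model of DOS-style EOL encoding: every LF becomes CR LF.

    CR is copied through unchanged, and the character before an LF
    is never inspected.
    """
    return buf.replace(b"\n", b"\r\n")


# Buffer with a hand-inserted CR before the newline: a b c CR LF
print(encode_dos(b"abc\r\n").hex())
# Buffer with only the newline: a b c LF
print(encode_dos(b"abc\n").hex())
```

With a hand-inserted CR in the buffer, the model yields the bytes
`6162 630d 0d0a' reported by hexl; without it, the result is a normal
DOS line ending.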

Perhaps Mr. Voelker wrote this code so that NT users don't have to do
anything special to create a DOS file.  In your case, you don't have to
insert \C-m by hand to create a DOS file.

Please just try the following:
1) visit a new file
2) type `a b c RET'
3) save the file.
4) visit the file again.

You should be able to create a file of `a b c CR LF' by step 3, and
buffer-file-coding-system is set to undecided-dos by step 4.

Mr. Voelker?  Is this correct?

---
Ken'ichi HANDA
handa@etl.go.jp

From Marc.Fleischeuers@kub.nl  Mon Aug  4 05:59:03 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "" " 4" "August" "1997" "14:58:42" "+0200" "Marc Fleischeuers" "Marc.Fleischeuers@kub.nl" nil "33" "Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error           converting cr-lf]" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from mailnews.kub.nl (mailnews.kub.nl [137.56.0.220]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id FAA08582 for <voelker@cs.washington.edu>; Mon, 4 Aug 1997 05:58:53 -0700
Received: from PI0737.kub.nl (pi0737.kub.nl [137.56.38.229]) by mailnews.kub.nl (8.8.5/8.7.1) with SMTP id OAA14494; Mon, 4 Aug 1997 14:58:40 +0200 (MET DST)
References: <199707291709.NAA14978@psilocin.gnu.ai.mit.edu> 	<199707310605.XAA15156@joker.cs.washington.edu> <uyb6niore.fsf@kub.nl> 	<199707312038.NAA15222@joker.cs.washington.edu> 	<199707312342.TAA19727@psilocin.gnu.ai.mit.edu> <uk9i6s4wa.fsf@kub.nl> 	<19 <199708041137.UAA05301@etlken.etl.go.jp> <uvi1m5efo.fsf@kub.nl>
In-Reply-To: Marc Fleischeuers's message of 04 Aug 1997 14:18:51 +0200
Message-ID: <uraca5cl9.fsf@kub.nl>
Lines: 33
X-Mailer: Gnus v5.3/Emacs 19.33
From: Marc Fleischeuers <Marc.Fleischeuers@kub.nl>
Sender: marcf@PI0737.kub.nl
To: Marc Fleischeuers <Marc.Fleischeuers@kub.nl>
Cc: Kenichi Handa <handa@etl.go.jp>, rms@gnu.ai.mit.edu,         voelker@cs.washington.edu, andrewi@harlequin.co.uk
Subject: Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error           converting cr-lf]
Date: 04 Aug 1997 14:58:42 +0200

Marc Fleischeuers <Marc.Fleischeuers@kub.nl> writes:

> Kenichi Handa <handa@etl.go.jp> writes:
> 
> > Hmm, the sequence CR LF was written out as CR CR LF.  It seems that
> > the file is opened by O_TEXT instead of O_BINARY.  But, this should
> > have been fixed in 20.0.92 already.  Strange...
> > 
> > Could you please check the file src/fileio.c?  Has the following
> > patch made by <eliz@is.elta.co.il> been applied to it?
> 
> This patch is applied (that is, it says ``int buffer_file_type =
> O_BINARY'', that's what it should be I think)
> 
> > So, the problem seems to be in writing a file.

I have examined the value of the lisp-variable `buffer-file-type' in
several stages after reading and writing files containing cr and lf
sequences. The value of this variable was always nil, indicating a
text (i.e., non-binary) file. In buffer-file-type-alist it is set that
files with extension '.tpu' are interpreted as binary, so I tried 

C-x C-f new.tpu a b c C-q C-m C-q C-j C-x C-s C-x C-v RET

This file is created containing the intended 5 bytes, and it is read
back in "correctly" (buffer contains `abc^M'). After reading the file
in, the mode line indicator is `=:', and buffer-file-coding-system is
`= -- no-conversion (alias: binary)'.

May I suggest that the concept of `buffer-file-type' and its associated
variables and functions be removed from Emacs 20?

Marc

From Marc.Fleischeuers@kub.nl  Mon Aug  4 06:36:36 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "" " 4" "August" "1997" "15:36:27" "+0200" "Marc Fleischeuers" "Marc.Fleischeuers@kub.nl" nil "14" "Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error           converting cr-lf]" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from mailnews.kub.nl (mailnews.kub.nl [137.56.0.220]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id GAA10461 for <voelker@cs.washington.edu>; Mon, 4 Aug 1997 06:36:35 -0700
Received: from PI0737.kub.nl (pi0737.kub.nl [137.56.38.229]) by mailnews.kub.nl (8.8.5/8.7.1) with SMTP id PAA16644; Mon, 4 Aug 1997 15:36:25 +0200 (MET DST)
References: <199708041254.VAA05378@etlken.etl.go.jp>
In-Reply-To: Kenichi Handa's message of Mon, 4 Aug 1997 21:54:40 +0900
Message-ID: <upvru5auc.fsf@kub.nl>
Lines: 14
X-Mailer: Gnus v5.3/Emacs 19.33
From: Marc Fleischeuers <Marc.Fleischeuers@kub.nl>
Sender: marcf@PI0737.kub.nl
To: Kenichi Handa <handa@etl.go.jp>
Cc: Marc.Fleischeuers@kub.nl, rms@gnu.ai.mit.edu, voelker@cs.washington.edu,         andrewi@harlequin.co.uk
Subject: Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error           converting cr-lf]
Date: 04 Aug 1997 15:36:27 +0200

Kenichi Handa <handa@etl.go.jp> writes:

> I have just found that lisp/dos-w32.el is doing something about
> deciding coding system.  Although I have not yet read the code in
> detail, I suspect that the code decides that coding system for writing
> a file on NT is undecided-dos.  If it is true, it explains everything,

I was reading there too. It appears that emacs does a lot of thinking
for me. I have done some light testing, and it looks like everything
acts like I expect it to, when (untranslated-file-p filename) returns
t. This is what I'll be doing for a while, until something else
breaks.. 

Marc

From rms@gnu.ai.mit.edu  Mon Aug  4 12:46:38 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Mon" " 4" "August" "1997" "15:46:48" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "19" "Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error           converting cr-lf]" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id MAA03249 for <voelker@cs.washington.edu>; Mon, 4 Aug 1997 12:46:37 -0700
Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id PAA22721; Mon, 4 Aug 1997 15:46:48 -0400
Message-Id: <199708041946.PAA22721@psilocin.gnu.ai.mit.edu>
In-reply-to: <uvi1m5efo.fsf@kub.nl> (message from Marc Fleischeuers on 04 Aug 	1997 14:18:51 +0200)
References: <199707291709.NAA14978@psilocin.gnu.ai.mit.edu> 	<199707310605.XAA15156@joker.cs.washington.edu> <uyb6niore.fsf@kub.nl> 	<199707312038.NAA15222@joker.cs.washington.edu> 	<199707312342.TAA19727@psilocin.gnu.ai.mit.edu> <uk9i6s4wa.fsf@kub.nl> 	<19 <199708041137.UAA05301@etlken.etl.go.jp> <uvi1m5efo.fsf@kub.nl>
From: Richard Stallman <rms@gnu.ai.mit.edu>
To: Marc.Fleischeuers@kub.nl
CC: handa@etl.go.jp, Marc.Fleischeuers@kub.nl, voelker@cs.washington.edu,         andrewi@harlequin.co.uk
Subject: Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error           converting cr-lf]
Date: Mon, 4 Aug 1997 15:46:48 -0400

    > Hmm, the sequence CR LF was written out as CR CR LF.  It seems that
    > the file is opened by O_TEXT instead of O_BINARY.

I would expect this is because of the usual DOS eol conversion,
and not because of O_TEXT.

    This patch is applied (that is, it says ``int buffer_file_type =
    O_BINARY'', that's what it should be I think)

I am not surprised.


The same code in Emacs that converts just LF to CR LF
will of course do so when the preceding character is a CR--
unless there is something special to stop it.

As far as I know, there is nothing special
to avoid encoding LF as CR LF based on the preceding character.
Handa, is there?

From rms@gnu.ai.mit.edu  Mon Aug  4 12:55:40 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Mon" " 4" "August" "1997" "15:55:46" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "11" "Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error           converting cr-lf]" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id MAA03818 for <voelker@cs.washington.edu>; Mon, 4 Aug 1997 12:55:39 -0700
Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id PAA22807; Mon, 4 Aug 1997 15:55:46 -0400
Message-Id: <199708041955.PAA22807@psilocin.gnu.ai.mit.edu>
In-reply-to: <199708041254.VAA05378@etlken.etl.go.jp> (message from Kenichi 	Handa on Mon, 4 Aug 1997 21:54:40 +0900)
References:  <199708041254.VAA05378@etlken.etl.go.jp>
From: Richard Stallman <rms@gnu.ai.mit.edu>
To: handa@etl.go.jp
CC: Marc.Fleischeuers@kub.nl, voelker@cs.washington.edu,         andrewi@harlequin.co.uk
Subject: Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error           converting cr-lf]
Date: Mon, 4 Aug 1997 15:55:46 -0400

      If it is true, it explains everything,
    because Emacs writes CR as is and converts LF to CR LF when it writes
    a file by undecided-dos.

Yes, of course.  I've been telling both of you this over and over.
You've been trying to unravel a mystery which is not a mystery at all.

The real question is, should we put in a special feature to override
that behavior when the buffer contains a CR?  Should DOS-style EOL
conversion recognize when the buffer contains CR LF, and output it as
CR LF (rather than CR CR LF)?
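
For concreteness, the special feature asked about here could look
roughly like the following Python sketch.  This is only one possible
reading of the question, not a proposal for the actual Emacs code; the
function name `encode_dos_keep_crlf' is hypothetical.

```python
import re


def encode_dos_keep_crlf(buf: bytes) -> bytes:
    """Toy DOS-style encoder that leaves an existing CR LF alone.

    Only an LF that is NOT already preceded by CR gets a CR added.
    """
    return re.sub(rb"(?<!\r)\n", b"\r\n", buf)


# Buffer already containing CR LF: written unchanged.
print(encode_dos_keep_crlf(b"abc\r\n"))
# Buffer containing a bare LF: written as CR LF.
print(encode_dos_keep_crlf(b"abc\n"))
```

One consequence of such a rule is that two different buffers, `abc LF'
and `abc CR LF', are written as the same file, so the buffer contents
can no longer be recovered exactly by reading the file back.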

From rms@gnu.ai.mit.edu  Mon Aug  4 13:05:08 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Mon" " 4" "August" "1997" "16:05:23" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "18" "Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error           converting cr-lf]" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id NAA04450 for <voelker@cs.washington.edu>; Mon, 4 Aug 1997 13:05:07 -0700
Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id QAA22977; Mon, 4 Aug 1997 16:05:23 -0400
Message-Id: <199708042005.QAA22977@psilocin.gnu.ai.mit.edu>
In-reply-to: <upvru5auc.fsf@kub.nl> (message from Marc Fleischeuers on 04 Aug 	1997 15:36:27 +0200)
References: <199708041254.VAA05378@etlken.etl.go.jp> <upvru5auc.fsf@kub.nl>
From: Richard Stallman <rms@gnu.ai.mit.edu>
To: Marc.Fleischeuers@kub.nl
CC: handa@etl.go.jp, Marc.Fleischeuers@kub.nl, voelker@cs.washington.edu,         andrewi@harlequin.co.uk
Subject: Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error           converting cr-lf]
Date: Mon, 4 Aug 1997 16:05:23 -0400

    I was reading there too. It appears that emacs does a lot of thinking
    for me. I have done some light testing, and it looks like everything
    acts like I expect it to, when (untranslated-file-p filename) returns
    t.

That sentence is ambiguous; it could mean there is no problem, or it
could mean there is a serious problem.

untranslated-file-p is supposed to return t only when the file resides
on a file system that is mounted on a Unix-like system.  That is an
unusual case for an MSDOS user; therefore, it is not the really
important case.  The really important case is when untranslated-file-p
returns nil.

So let's focus on the most important question first: what happens when
untranslated-file-p returns nil?  Do you get correct behavior in all
cases?  If not, can you tell us precisely which case is still
incorrect?

From handa@etl.go.jp  Mon Aug  4 17:28:58 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Tue" " 5" "August" "1997" "09:29:44" "+0900" "Kenichi Handa" "handa@etl.go.jp" nil "34" "Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error           converting cr-lf]" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from mail1-im.etl.go.jp (mail1-im.etl.go.jp [192.50.105.9]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id RAA20984 for <voelker@cs.washington.edu>; Mon, 4 Aug 1997 17:28:57 -0700
Received: from etlpom.etl.go.jp (etlpom.etl.go.jp [192.31.200.9]) by mail1-im.etl.go.jp (8.8.5/3.5Wpl1-96112918) with ESMTP 	id JAA25767; Tue, 5 Aug 1997 09:28:29 +0900 (JST)
Received: from etlken.etl.go.jp (etlken.etl.go.jp [192.31.197.11]) by etlpom.etl.go.jp (8.8.5/3.5Wpl4-ETL_MASTER) with SMTP id JAA29964; Tue, 5 Aug 1997 09:28:29 +0900 (JST)
Received: by etlken.etl.go.jp (SMI-8.6/6.4J.6-ETL.SLAVE) 	id JAA05934; Tue, 5 Aug 1997 09:29:44 +0900
Message-Id: <199708050029.JAA05934@etlken.etl.go.jp>
In-reply-to: <199708041946.PAA22721@psilocin.gnu.ai.mit.edu> (message from 	Richard Stallman on Mon, 4 Aug 1997 15:46:48 -0400)
References: <199707291709.NAA14978@psilocin.gnu.ai.mit.edu> 	<199707310605.XAA15156@joker.cs.washington.edu> <uyb6niore.fsf@kub.nl> 	<199707312038.NAA15222@joker.cs.washington.edu> 	<199707312342.TAA19727@psilocin.gnu.ai.mit.edu> <uk9i6s4wa.fsf@kub.nl> 	<19 <199708041137.UAA05301@etlken.etl.go.jp> <uvi1m5efo.fsf@kub.nl> <199708041946.PAA22721@psilocin.gnu.ai.mit.edu>
From: Kenichi Handa <handa@etl.go.jp>
To: rms@gnu.ai.mit.edu
CC: Marc.Fleischeuers@kub.nl, voelker@cs.washington.edu,         andrewi@harlequin.co.uk
Subject: Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error           converting cr-lf]
Date: Tue, 5 Aug 1997 09:29:44 +0900

   Date: Mon, 4 Aug 1997 15:46:48 -0400
   From: Richard Stallman <rms@gnu.ai.mit.edu>

   The same code in Emacs that converts just LF to CR LF
   will of course do so when the preceding character is a CR--
   unless there is something special to stop it.

   As far as I know, there is nothing special
   to avoid encoding LF as CR LF based on the preceding character.
   Handa, is there?

You are right.  I did not write any such special code.

	 If it is true, it explains everything,
       because Emacs writes CR as is and converts LF to CR LF when it writes
       a file by undecided-dos.

   Yes, of course.  I've been telling both of you this over and over.
   You've been trying to unravel a mystery which is not a mystery at all.

Please note that I joined this discussion halfway through.

   The real question is, should we put in a special feature to override
   that behavior when the buffer contains a CR?  Should DOS-style EOL
   conversion recognize when the buffer contains CR LF, and output it as
   CR LF (rather than CR CR LF)?

I don't like it because it's too kluge (can I use this word as an
adjective?).  But, if DOS users want it, it's not that difficult to
implement it.

---
Ken'ichi HANDA
handa@etl.go.jp

From rms@gnu.ai.mit.edu  Mon Aug  4 23:31:17 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Tue" " 5" "August" "1997" "02:30:07" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "24" "Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id XAA03522 for <voelker@cs.washington.edu>; Mon, 4 Aug 1997 23:31:16 -0700
Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id CAA31410; Tue, 5 Aug 1997 02:30:07 -0400
Message-Id: <199708050630.CAA31410@psilocin.gnu.ai.mit.edu>
In-reply-to: <u3eoq70ck.fsf@kub.nl> (message from Marc Fleischeuers on 04 Aug 	1997 11:40:11 +0200)
References: <199707291709.NAA14978@psilocin.gnu.ai.mit.edu> 	<199707310605.XAA15156@joker.cs.washington.edu> <uyb6niore.fsf@kub.nl> 	<199707312038.NAA15222@joker.cs.washington.edu> 	<199707312342.TAA19727@psilocin.gnu.ai.mit.edu> <uk9i6s4wa.fsf@kub.nl> 	<199708030423.AAA25764@psilocin.gnu.ai.mit.edu> 	<199708040133.KAA04718@etlken.etl.go.jp> <u3eoq70ck.fsf@kub.nl>
From: Richard Stallman <rms@gnu.ai.mit.edu>
To: Marc.Fleischeuers@kub.nl
CC: handa@etl.go.jp, Marc.Fleischeuers@kub.nl, voelker@cs.washington.edu,         andrewi@harlequin.co.uk, rms@gnu.ai.mit.edu
Subject: Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]
Date: Tue, 5 Aug 1997 02:30:07 -0400

    When I first open a new file, the buffer-file-coding-system is nil and
    the mode-line indicator is `:'.  If I insert `a b c C-q C-m C-q C-j'
    in the buffer and then save the file, the buffer contains the five
    bytes `abc\C-m\C-j', buffer-file-coding-system is still nil, and the
    mode-line indicator is still `:'. With `c:\emacs\bin\hexl abc', the
    contents of the file is `6162 630d 0d0a'.

This is the right behavior, as Emacs is currently designed.
It may not be quite the best feature, but it is not a bug.

    If I then re-visit the file (`C-x C-v RET') it contains six bytes
    `abc\C-j\C-j\C-j', buffer-file-coding-system is `- -- undecided-mac',
    and the mode-line indicator is `/'.

This is a bug.  The presence of CR CR LF in the file
should not cause mac EOL conversion to be used.

I think that Emacs is being too quick to use mac EOL conversion.
I suspect that right now any CR not followed by LF does this.

If the file contains a LF anywhere near the beginning,
then mac EOL conversion should not be used.

Handa, can you fix this?

From rms@gnu.ai.mit.edu  Mon Aug  4 23:35:32 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Tue" " 5" "August" "1997" "02:31:37" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "18" "Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id XAA03608 for <voelker@cs.washington.edu>; Mon, 4 Aug 1997 23:35:32 -0700
Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id CAA31418; Tue, 5 Aug 1997 02:31:37 -0400
Message-Id: <199708050631.CAA31418@psilocin.gnu.ai.mit.edu>
In-reply-to: <u3eoq70ck.fsf@kub.nl> (message from Marc Fleischeuers on 04 Aug 	1997 11:40:11 +0200)
References: <199707291709.NAA14978@psilocin.gnu.ai.mit.edu> 	<199707310605.XAA15156@joker.cs.washington.edu> <uyb6niore.fsf@kub.nl> 	<199707312038.NAA15222@joker.cs.washington.edu> 	<199707312342.TAA19727@psilocin.gnu.ai.mit.edu> <uk9i6s4wa.fsf@kub.nl> 	<199708030423.AAA25764@psilocin.gnu.ai.mit.edu> 	<199708040133.KAA04718@etlken.etl.go.jp> <u3eoq70ck.fsf@kub.nl>
From: Richard Stallman <rms@gnu.ai.mit.edu>
To: Marc.Fleischeuers@kub.nl
CC: handa@etl.go.jp, Marc.Fleischeuers@kub.nl, voelker@cs.washington.edu,         andrewi@harlequin.co.uk
Subject: Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]
Date: Tue, 5 Aug 1997 02:31:37 -0400

    I have also used the following version, started the same way, to
    perform exactly the same steps:
    In GNU Emacs 19.33.1 (i386-*-nt4.0) of Wed Aug 14 1996 on BANANA-FISH
    configured using `configure NT'

    The file is written and read back in as the five bytes `abc\C-m\C-j'.
    There is a mode-line indicator `(T:', both when I first open the file
    and when I read it back in.

If you write the file out and then visit it again,
you are performing two experiments in series
and you are telling only the result of the two of them.

That isn't really useful.  You need to tell us the result
of each experiment.

In other words, what exactly is in the file when you write it with
Emacs 19 in this way?  Is it a b c CR CR LF or a b c CR LF or what?

From handa@etl.go.jp  Tue Aug  5 01:10:29 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Tue" " 5" "August" "1997" "17:10:48" "+0900" "Kenichi Handa" "handa@etl.go.jp" nil "30" "Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from mail1-im.etl.go.jp (mail1-im.etl.go.jp [192.50.105.9]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id BAA06992 for <voelker@cs.washington.edu>; Tue, 5 Aug 1997 01:10:27 -0700
Received: from etlpom.etl.go.jp (etlpom.etl.go.jp [192.31.200.9]) by mail1-im.etl.go.jp (8.8.5/3.5Wpl1-96112918) with ESMTP 	id RAA23247; Tue, 5 Aug 1997 17:09:34 +0900 (JST)
Received: from etlken.etl.go.jp (etlken.etl.go.jp [192.31.197.11]) by etlpom.etl.go.jp (8.8.5/3.5Wpl4-ETL_MASTER) with SMTP id RAA26427; Tue, 5 Aug 1997 17:09:34 +0900 (JST)
Received: by etlken.etl.go.jp (SMI-8.6/6.4J.6-ETL.SLAVE) 	id RAA06812; Tue, 5 Aug 1997 17:10:48 +0900
Message-Id: <199708050810.RAA06812@etlken.etl.go.jp>
In-reply-to: <199708050630.CAA31410@psilocin.gnu.ai.mit.edu> (message from 	Richard Stallman on Tue, 5 Aug 1997 02:30:07 -0400)
References: <199707291709.NAA14978@psilocin.gnu.ai.mit.edu> 	<199707310605.XAA15156@joker.cs.washington.edu> <uyb6niore.fsf@kub.nl> 	<199707312038.NAA15222@joker.cs.washington.edu> 	<199707312342.TAA19727@psilocin.gnu.ai.mit.edu> <uk9i6s4wa.fsf@kub.nl> 	<199708030423.AAA25764@psilocin.gnu.ai.mit.edu> 	<199708040133.KAA04718@etlken.etl.go.jp> <u3eoq70ck.fsf@kub.nl> <199708050630.CAA31410@psilocin.gnu.ai.mit.edu>
From: Kenichi Handa <handa@etl.go.jp>
To: rms@gnu.ai.mit.edu
CC: Marc.Fleischeuers@kub.nl, Marc.Fleischeuers@kub.nl,         voelker@cs.washington.edu, andrewi@harlequin.co.uk, rms@gnu.ai.mit.edu
Subject: Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]
Date: Tue, 5 Aug 1997 17:10:48 +0900

Richard Stallman <rms@gnu.ai.mit.edu> writes:
>     If I then re-visit the file (`C-x C-v RET') it contains six bytes
>     `abc\C-j\C-j\C-j', buffer-file-coding-system is `- -- undecided-mac',
>     and the mode-line indicator is `/'.

> This is a bug.  The presence of CR CR LF in the file
> should not cause mac EOL conversion to be used.
> I think that Emacs is being too quick to use mac EOL conversion.
> I suspect that right now any CR not followed by LF does this.

Right.

> If the file contains a LF anywhere near the beginning,
> then mac EOL conversion should not be used.
> Handa, can you fix this?

Yes.  But how about LF CR LF or CR LF LF?  Should they be recognized
as DOS format or Unix format?

Hmmm, how about counting how many times each possible end-of-line
format appears, and selecting the one which first occurs 3 times?  If
none occurs 3 times, perhaps we should select the one that occurs last.
Then,
	CR CR LF -> DOS
	LF CR LF -> DOS
	CR LF LF -> Unix
	CR LF CR -> Mac
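
The counting idea above can be tried out with a short Python sketch.
It is a toy model under stated assumptions, not Emacs code: the name
`guess_eol' and the `threshold' parameter are made up for illustration,
and it assumes that a CR immediately followed by LF is always counted
as one DOS line ending rather than as a Mac ending plus a Unix ending.

```python
import re
from collections import Counter
from typing import Optional

# CR LF is listed first so that a CR followed by LF is taken as one token.
_EOL = re.compile(rb"\r\n|\r|\n")
_NAMES = {b"\r\n": "dos", b"\r": "mac", b"\n": "unix"}


def guess_eol(data: bytes, threshold: int = 3) -> Optional[str]:
    """Return the first EOL format seen `threshold` times, else the last seen."""
    counts = Counter()
    last = None
    for match in _EOL.finditer(data):
        last = _NAMES[match.group()]
        counts[last] += 1
        if counts[last] >= threshold:
            return last
    # No format reached the threshold: fall back to the last one seen
    # (None if the data contains no line ending at all).
    return last


for sample in (b"\r\r\n", b"\n\r\n", b"\r\n\n", b"\r\n\r"):
    print(sample, guess_eol(sample))
```

Run on the four short sequences listed above, this model gives DOS, DOS,
Unix and Mac respectively, matching the table.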
---
Ken'ichi HANDA
handa@etl.go.jp

From Marc.Fleischeuers@kub.nl  Tue Aug  5 01:11:15 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "" " 5" "August" "1997" "10:11:00" "+0200" "Marc Fleischeuers" "Marc.Fleischeuers@kub.nl" nil "42" "Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from mailnews.kub.nl (mailnews.kub.nl [137.56.0.220]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id BAA07028 for <voelker@cs.washington.edu>; Tue, 5 Aug 1997 01:11:09 -0700
Received: from PI0737.kub.nl (pi0737.kub.nl [137.56.38.229]) by mailnews.kub.nl (8.8.5/8.7.1) with SMTP id KAA22959; Tue, 5 Aug 1997 10:11:00 +0200 (MET DST)
References: <199707291709.NAA14978@psilocin.gnu.ai.mit.edu> 	<199707310605.XAA15156@joker.cs.washington.edu> <uyb6niore.fsf@kub.nl> 	<199707312038.NAA15222@joker.cs.washington.edu> 	<199707312342.TAA19727@psilocin.gnu.ai.mit.edu> <uk9i6s4wa.fsf@kub.nl> 	<199708030423.AAA25764@psilocin.gnu.ai.mit.edu> 	<199708040133.KAA04718@etlken.etl.go.jp> <u3eoq70ck.fsf@kub.nl> 	<199708050631.CAA31418@psilocin.gnu.ai.mit.edu>
In-Reply-To: Richard Stallman's message of Tue, 5 Aug 1997 02:31:37 -0400
Message-ID: <uk9i1jbhn.fsf@kub.nl>
Lines: 42
X-Mailer: Gnus v5.3/Emacs 19.33
From: Marc Fleischeuers <Marc.Fleischeuers@kub.nl>
Sender: marcf@PI0737.kub.nl
To: Richard Stallman <rms@gnu.ai.mit.edu>
Cc: Marc.Fleischeuers@kub.nl, handa@etl.go.jp, voelker@cs.washington.edu,         andrewi@harlequin.co.uk
Subject: Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]
Date: 05 Aug 1997 10:11:00 +0200

Richard Stallman <rms@gnu.ai.mit.edu> writes:

> If you write the file out and then visit it again,
> you are performing two experiments in series
> and you are telling only the result of the two of them.
> 
> That isn't really useful.  You need to tell us the result
> of each experiment.

The emacsen are on different machines, hence I can safely use the same
pathnames.

GNU Emacs 19.33.1 (i386-*-nt4.0) 	GNU Emacs 20.0.92.1 (i386-*-nt4.0)
started with:				started with:
C:\> c:\emacs\bin\emacs.bat -nw 	C:\> c:\emacs\bin\emacs.bat -nw
     --no-site-file --no-init-file	     --no-site-file --no-init-file 
					     
Input:					Input:
C-x C-f t . t RET a b c C-q RET RET	C-x C-f t . t RET a b c C-q RET RET
d e f C-q RET RET C-x C-s		d e f C-q RET RET C-x C-s

Buffer looks like:			Buffer looks like:

abc^M					abc^M
def^M					def^M

File contents:				File contents:
6162 630d 0d0a 6465 660d 0d0a		6162 630d 0d0a 6465 660d 0d0a

Input:					Input:
C-x C-v RET				C-x C-v RET

Buffer looks like:			Buffer looks like:
abc^M					abc
def^M
-------
					def


                                        -------
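
The byte dump above is consistent with DOS-style EOL encoding applied to a
buffer that already holds a literal CR before each newline.  A minimal sketch
of that encoding step, assuming a hypothetical helper encode_eol_dos (this is
not the Emacs implementation, only an illustration of the rule "every LF is
written as CR LF, everything else is copied"):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not taken from Emacs: write the DOS-encoded
   form of BUF (LEN bytes) into OUT, which must have room for
   2 * LEN bytes.  Every LF becomes CR LF; all other bytes, including
   a CR that was typed into the buffer, are copied unchanged.
   Returns the number of bytes written.  */
size_t
encode_eol_dos (const unsigned char *buf, size_t len, unsigned char *out)
{
  size_t n = 0;
  size_t i;

  assert (buf != NULL && out != NULL);
  for (i = 0; i < len; i++)
    {
      if (buf[i] == '\n')
        out[n++] = '\r';        /* Insert the CR that DOS format adds.  */
      out[n++] = buf[i];
    }
  return n;
}
```

Given the 10-byte buffer "abc\r\ndef\r\n", this sketch produces the 12
bytes shown under "File contents", with 0d 0d 0a at the end of each line.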

Marc

From Marc.Fleischeuers@kub.nl  Tue Aug  5 01:23:15 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "" " 5" "August" "1997" "10:23:02" "+0200" "Marc Fleischeuers" "Marc.Fleischeuers@kub.nl" nil "14" "Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from mailnews.kub.nl (mailnews.kub.nl [137.56.0.220]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id BAA07239 for <voelker@cs.washington.edu>; Tue, 5 Aug 1997 01:23:14 -0700
Received: from PI0737.kub.nl (pi0737.kub.nl [137.56.38.229]) by mailnews.kub.nl (8.8.5/8.7.1) with SMTP id KAA23774; Tue, 5 Aug 1997 10:23:02 +0200 (MET DST)
References: <199707291709.NAA14978@psilocin.gnu.ai.mit.edu> 	<199707310605.XAA15156@joker.cs.washington.edu> <uyb6niore.fsf@kub.nl> 	<199707312038.NAA15222@joker.cs.washington.edu> 	<199707312342.TAA19727@psilocin.gnu.ai.mit.edu> <uk9i6s4wa.fsf@kub.nl> 	<199708030423.AAA25764@psilocin.gnu.ai.mit.edu> 	<199708040133.KAA04718@etlken.etl.go.jp> <u3eoq70ck.fsf@kub.nl> 	<199708050630.CAA31410@psilocin.gnu.ai.mit.edu> 	<199708050810.RAA06812@etlken.etl.go.jp>
In-Reply-To: Kenichi Handa's message of Tue, 5 Aug 1997 17:10:48 +0900
Message-ID: <uiuxljaxl.fsf@kub.nl>
Lines: 14
X-Mailer: Gnus v5.3/Emacs 19.33
From: Marc Fleischeuers <Marc.Fleischeuers@kub.nl>
Sender: marcf@PI0737.kub.nl
To: Kenichi Handa <handa@etl.go.jp>
Cc: rms@gnu.ai.mit.edu, Marc.Fleischeuers@kub.nl, voelker@cs.washington.edu,         andrewi@harlequin.co.uk
Subject: Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]
Date: 05 Aug 1997 10:23:02 +0200

Kenichi Handa <handa@etl.go.jp> writes:

> Yes.  But how about LF CR LF or CR LF LF?  Should they be recognized
> as DOS format or Unix format?

And what about CR CR LF LF? LF CR CR LF? 
Yes I'm joking. However, after spending two days chasing a bug
eventually finding myself outwitted by emacs' intelligence in
dos-w32.el, I tend to think that emacs should not be too smart. 

If the distribution of CR and LF throughout the file does not form a
clear pattern, would `no conversion' be an option?

Marc

From rms@gnu.ai.mit.edu  Tue Aug  5 01:39:26 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Tue" " 5" "August" "1997" "04:38:02" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "22" "Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id BAA08178 for <voelker@cs.washington.edu>; Tue, 5 Aug 1997 01:39:25 -0700
Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id EAA00520; Tue, 5 Aug 1997 04:38:02 -0400
Message-Id: <199708050838.EAA00520@psilocin.gnu.ai.mit.edu>
In-reply-to: <199708040133.KAA04718@etlken.etl.go.jp> (message from Kenichi 	Handa on Mon, 4 Aug 1997 10:33:49 +0900)
References: <199707291709.NAA14978@psilocin.gnu.ai.mit.edu> 	<199707310605.XAA15156@joker.cs.washington.edu> <uyb6niore.fsf@kub.nl> 	<199707312038.NAA15222@joker.cs.washington.edu> 	<199707312342.TAA19727@psilocin.gnu.ai.mit.edu> <uk9i6s4wa.fsf@kub.nl> <199708030423.AAA25764@psilocin.gnu.ai.mit.edu> <199708040133.KAA04718@etlken.etl.go.jp>
From: Richard Stallman <rms@gnu.ai.mit.edu>
To: handa@etl.go.jp
CC: Marc.Fleischeuers@kub.nl, voelker@cs.washington.edu,         Marc.Fleischeuers@kub.nl, andrewi@harlequin.co.uk
Subject: Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]
Date: Tue, 5 Aug 1997 04:38:02 -0400

    I've just tried the following.
    1) At first, visit a new file.
    2) type `a b c C-q C-m C-q C-j'
    3) save it.
    4) visit it again.

    Then the file is read as `undecided-dos' and the buffer contents are
    these 4 bytes:
	    abc\C-j
    This means that CR LF is decoded to single LF.

Are you doing this on DOS, or on Unix?

If you are on Unix, this behavior is correct, because on Unix new
files are normally written with no EOL conversion.

But if this happened on DOS, it would be a bug.

I think Marc told us this is not what happens for him.
Rather, the file is written with DOS EOL conversion,
which is correct according to the current specs.
Then reading the file again mistakenly uses Mac EOL conversion.

From handa@etl.go.jp  Tue Aug  5 05:01:43 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Tue" " 5" "August" "1997" "21:00:37" "+0900" "Kenichi Handa" "handa@etl.go.jp" nil "103" "Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from mail1-im.etl.go.jp (mail1-im.etl.go.jp [192.50.105.9]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id FAA11469 for <voelker@cs.washington.edu>; Tue, 5 Aug 1997 05:01:42 -0700
Received: from etlpom.etl.go.jp (etlpom.etl.go.jp [192.31.200.9]) by mail1-im.etl.go.jp (8.8.5/3.5Wpl1-96112918) with ESMTP 	id UAA02607; Tue, 5 Aug 1997 20:59:22 +0900 (JST)
Received: from etlken.etl.go.jp (etlken.etl.go.jp [192.31.197.11]) by etlpom.etl.go.jp (8.8.5/3.5Wpl4-ETL_MASTER) with SMTP id UAA06694; Tue, 5 Aug 1997 20:59:22 +0900 (JST)
Received: by etlken.etl.go.jp (SMI-8.6/6.4J.6-ETL.SLAVE) 	id VAA07192; Tue, 5 Aug 1997 21:00:37 +0900
Message-Id: <199708051200.VAA07192@etlken.etl.go.jp>
In-reply-to: <199708050838.EAA00520@psilocin.gnu.ai.mit.edu> (message from 	Richard Stallman on Tue, 5 Aug 1997 04:38:02 -0400)
References: <199707291709.NAA14978@psilocin.gnu.ai.mit.edu> 	<199707310605.XAA15156@joker.cs.washington.edu> <uyb6niore.fsf@kub.nl> 	<199707312038.NAA15222@joker.cs.washington.edu> 	<199707312342.TAA19727@psilocin.gnu.ai.mit.edu> <uk9i6s4wa.fsf@kub.nl> <199708030423.AAA25764@psilocin.gnu.ai.mit.edu> <199708040133.KAA04718@etlken.etl.go.jp> <199708050838.EAA00520@psilocin.gnu.ai.mit.edu>
From: Kenichi Handa <handa@etl.go.jp>
To: rms@gnu.ai.mit.edu
CC: Marc.Fleischeuers@kub.nl, voelker@cs.washington.edu,         Marc.Fleischeuers@kub.nl, andrewi@harlequin.co.uk
Subject: Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]
Date: Tue, 5 Aug 1997 21:00:37 +0900

Richard Stallman <rms@gnu.ai.mit.edu> writes:
>     I've just tried the following.
>     1) At first, visit a new file.
>     2) type `a b c C-q C-m C-q C-j'
>     3) save it.
>     4) visit it again.
>     Then the file is read as `undecided-dos' and the buffer contents are
>     these 4 bytes:
> 	    abc\C-j
>     This means that CR LF is decoded to single LF.

> Are you doing this on DOS, or on Unix?

I'm using Unix.

> If you are on Unix, this behavior is correct, because on Unix new
> files are normally written with no EOL conversion.
> But if this happened on DOS, it would be a bug.

Right.

> I think Marc told us this is not what happens for him.
> Rather, the file is written with DOS EOL conversion,
> which is correct according to the current specs.
> Then reading the file again mistakenly uses Mac EOL conversion.

Yes, I now know it.

Marc Fleischeuers <Marc.Fleischeuers@kub.nl> writes:
>> Yes.  But how about LF CR LF or CR LF LF?  Should they be recognized
>> as DOS format or Unix format?

> And what about CR CR LF LF? LF CR CR LF? 
> Yes I'm joking. However, after spending two days chasing a bug
> eventually finding myself outwitted by emacs' intelligence in
> dos-w32.el, I tend to think that emacs should not be too smart. 

I agree.

> If the distribution of CR and LF throughout the file does not form a
> clear pattern, would `no conversion' be an option?

Deciding whether the pattern is clear is very difficult.  In addition,
we had better not do exhaustive scanning throughout the file.

So, I suggest the following code.  This scans the buffer until it
encounters 3 end-of-lines.  If it finds two different patterns while
scanning, it decides not to decode end-of-line (by returning
CODING_EOL_LF).  So, in any of the following cases, it doesn't decode
end-of-line.
	CR CR LF,  LF CR LF, CR LF LF, CR CR LF LF, LF CR CR LF.
I think it is clear enough, and users won't be surprised that much.

What do you all think?

---
Ken'ichi HANDA
handa@etl.go.jp

--in src/coding.c------------------------------------------------------------

/* Detect how end-of-line of a text of length SRC_BYTES pointed by SRC
   is encoded.  Return one of CODING_EOL_LF, CODING_EOL_CRLF,
   CODING_EOL_CR, and CODING_EOL_UNDECIDED.  */

#define MAX_EOL_CHECK_COUNT 3

int
detect_eol_type (src, src_bytes)
     unsigned char *src;
     int src_bytes;
{
  unsigned char *src_end = src + src_bytes;
  unsigned char c;
  int total = 0;		/* How many end-of-lines are found so far.  */
  int eol_type = CODING_EOL_UNDECIDED;
  int this_eol_type;

  while (src < src_end && total < MAX_EOL_CHECK_COUNT)
    {
      c = *src++;
      if (c == '\n' || c == '\r')
	{
	  total++;
	  if (c == '\n')
	    this_eol_type = CODING_EOL_LF;
	  else if (src >= src_end || *src != '\n')
	    this_eol_type = CODING_EOL_CR;
	  else
	    this_eol_type = CODING_EOL_CRLF, src++;

	  if (eol_type == CODING_EOL_UNDECIDED)
	    /* This is the first end-of-line.  */
	    eol_type = this_eol_type;
	  else if (eol_type != this_eol_type)
	    /* This type differs from the one found before.
	       We had better not decode end-of-line.  */
	    return CODING_EOL_LF;
	}
    }

  return (total ? eol_type : CODING_EOL_UNDECIDED);
}

From andrewi@harlequin.co.uk  Tue Aug  5 09:01:30 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Tue" " 5" "August" "1997" "17:00:38" "+0100" "Andrew Innes" "andrewi@harlequin.co.uk" nil "63" "Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from holly.cam.harlequin.co.uk (holly.cam.harlequin.co.uk [193.128.4.58]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id JAA21469 for <voelker@cs.washington.edu>; Tue, 5 Aug 1997 09:01:26 -0700
Received: from propos.long.harlequin.co.uk (propos.long.harlequin.co.uk [193.128.93.50])           by holly.cam.harlequin.co.uk (8.8.4/8.8.4) with ESMTP 	  id RAA15639; Tue, 5 Aug 1997 17:01:13 +0100 (BST)
Received: from woozle.long.harlequin.co.uk (woozle.long.harlequin.co.uk [193.128.93.77]) by propos.long.harlequin.co.uk (8.8.4/8.6.12) with SMTP id RAA14620; Tue, 5 Aug 1997 17:00:38 +0100 (BST)
Message-Id: <199708051600.RAA14620@propos.long.harlequin.co.uk>
In-reply-to: <199708051200.VAA07192@etlken.etl.go.jp> (message from Kenichi 	Handa on Tue, 5 Aug 1997 21:00:37 +0900)
From: Andrew Innes <andrewi@harlequin.co.uk>
To: handa@etl.go.jp
CC: rms@gnu.ai.mit.edu, Marc.Fleischeuers@kub.nl, voelker@cs.washington.edu,         Marc.Fleischeuers@kub.nl
Subject: Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]
Date: Tue, 5 Aug 1997 17:00:38 +0100 (BST)

(Replying to several messages.)

Aside: I haven't had a chance to study the new coding support yet, so my
comments may be based on misunderstanding or ignorance of it.

On Mon, 4 Aug 1997 15:55:46 -0400, Richard Stallman <rms@gnu.ai.mit.edu> said:
>The real question is, should we put in a special feature to override
>that behavior when the buffer contains a CR?  Should DOS-style EOL
>conversion recognize when the buffer contains CR LF, and output it as
>CR LF (rather than CR CR LF)?

I agree with Handa that Emacs should not do this.  Nearly all text files
will use a single end-of-line convention throughout, and thus pose no
problem.  I can't think of circumstances in which a user would encounter
text files containing extra CR characters like this.  If such
circumstances really are rare, then having to edit in "binary" mode where
all CRs are explicit seems reasonable.

(BTW, does Emacs 20 distinguish between text files in CODING_EOL_LF, and
binary files?  I think such a distinction is useful - a binary file
might contain all sorts of odd combinations of CR and LF, but a text
file should normally use a single convention throughout.)

>On Tue, 5 Aug 1997 21:00:37 +0900, Kenichi Handa <handa@etl.go.jp> said:
>>If the distribution of CR and LF throughout the file does not form a
>>clear pattern, would `no conversion' be an option?
>
>Deciding whether the pattern is clear is very difficult.  In addition,
>we had better not do exhaustive scanning throughout the file.

We don't want to do exhaustive scanning, but we can easily detect if the
end-of-line convention we choose based on the initial scan is not used
uniformly.  I would want any text file which appears not to use a single
convention to be handled using an information-preserving convention,
i.e. CODING_EOL_LF (or preferably marked as a binary file, not a text
file using Unix line-endings, if that distinction is made).

>So, I suggest the following code.  This scans the buffer until it
>encounters 3 end-of-lines.  If it finds two different patterns while
>scanning, it decides not to decode end-of-line (by returning
>CODING_EOL_LF).  So, in any of the following cases, it doesn't decode
>end-of-line.
>	CR CR LF,  LF CR LF, CR LF LF, CR CR LF LF, LF CR CR LF.
>I think it is clear enough, and users won't be surprised that much.
>
>What do you all think?

I like this proposal.  In addition, or perhaps instead, I would want
insert-file-contents to notice if the chosen convention is not used
uniformly, and either report an error or possibly reread the file using
CODING_EOL_LF.  (Indeed, the existing buffer contents could be patched
up without rereading since it would be known that all previous lines
used the original convention.)

If it is not appropriate to signal an error, or revert the coding in
situ, then at least insert-file-contents should indicate in some way
(eg. by setting a variable to the number of non-conforming end-of-lines)
so that other functions could inform the user about the discrepancy.
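
The count of non-conforming end-of-lines suggested above could be computed
along these lines.  This is only a sketch: the enum and the function
count_nonconforming_eols are hypothetical names of mine, not part of
coding.c or insert-file-contents, and the enum values merely stand in for
the CODING_EOL_* constants.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-ins for the CODING_EOL_* constants; the real
   values in Emacs may differ.  */
enum eol_kind { KIND_LF, KIND_CRLF, KIND_CR };

/* Hypothetical helper: count the end-of-lines in SRC (LEN bytes)
   that do not match EXPECTED.  A CR immediately followed by LF is
   one CRLF end-of-line; a lone CR or a lone LF is one end-of-line
   of the corresponding kind.  */
size_t
count_nonconforming_eols (const unsigned char *src, size_t len,
                          enum eol_kind expected)
{
  size_t bad = 0;
  size_t i = 0;

  assert (src != NULL);
  while (i < len)
    {
      unsigned char c = src[i++];
      enum eol_kind found;

      if (c == '\n')
        found = KIND_LF;
      else if (c == '\r')
        {
          if (i < len && src[i] == '\n')
            {
              found = KIND_CRLF;
              i++;              /* Consume the LF of the CR LF pair.  */
            }
          else
            found = KIND_CR;
        }
      else
        continue;               /* Not an end-of-line byte.  */

      if (found != expected)
        bad++;
    }
  return bad;
}
```

A caller could store the result in a variable and warn the user when it is
nonzero, instead of silently decoding a file that mixes conventions.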

I'm not sure whether the same sort of argument applies to subprocess
output or not.

AndrewI

From rms@gnu.ai.mit.edu  Tue Aug  5 10:30:38 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Tue" " 5" "August" "1997" "13:30:33" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "7" "Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id KAA28221 for <voelker@cs.washington.edu>; Tue, 5 Aug 1997 10:30:37 -0700
Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id NAA05204; Tue, 5 Aug 1997 13:30:33 -0400
Message-Id: <199708051730.NAA05204@psilocin.gnu.ai.mit.edu>
In-reply-to: <199708050810.RAA06812@etlken.etl.go.jp> (message from Kenichi 	Handa on Tue, 5 Aug 1997 17:10:48 +0900)
References: <199707291709.NAA14978@psilocin.gnu.ai.mit.edu> 	<199707310605.XAA15156@joker.cs.washington.edu> <uyb6niore.fsf@kub.nl> 	<199707312038.NAA15222@joker.cs.washington.edu> 	<199707312342.TAA19727@psilocin.gnu.ai.mit.edu> <uk9i6s4wa.fsf@kub.nl> 	<199708030423.AAA25764@psilocin.gnu.ai.mit.edu> 	<199708040133.KAA04718@etlken.etl.go.jp> <u3eoq70ck.fsf@kub.nl> <199708050630.CAA31410@psilocin.gnu.ai.mit.edu> <199708050810.RAA06812@etlken.etl.go.jp>
From: Richard Stallman <rms@gnu.ai.mit.edu>
To: handa@etl.go.jp
CC: Marc.Fleischeuers@kub.nl, Marc.Fleischeuers@kub.nl,         voelker@cs.washington.edu, andrewi@harlequin.co.uk
Subject: Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]
Date: Tue, 5 Aug 1997 13:30:33 -0400

    Yes.  But how about LF CR LF or CR LF LF?  Should they be recognized
    as DOS format or Unix format?

This question is less important.  Neither choice is horrible.

So please fix the Mac-format problem right away
and don't let it be delayed by other questions like this one.

From rms@gnu.ai.mit.edu  Tue Aug  5 10:34:47 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Tue" " 5" "August" "1997" "13:34:45" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "18" "Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id KAA28453 for <voelker@cs.washington.edu>; Tue, 5 Aug 1997 10:34:46 -0700
Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id NAA05225; Tue, 5 Aug 1997 13:34:45 -0400
Message-Id: <199708051734.NAA05225@psilocin.gnu.ai.mit.edu>
In-reply-to: <199708050810.RAA06812@etlken.etl.go.jp> (message from Kenichi 	Handa on Tue, 5 Aug 1997 17:10:48 +0900)
References: <199707291709.NAA14978@psilocin.gnu.ai.mit.edu> 	<199707310605.XAA15156@joker.cs.washington.edu> <uyb6niore.fsf@kub.nl> 	<199707312038.NAA15222@joker.cs.washington.edu> 	<199707312342.TAA19727@psilocin.gnu.ai.mit.edu> <uk9i6s4wa.fsf@kub.nl> 	<199708030423.AAA25764@psilocin.gnu.ai.mit.edu> 	<199708040133.KAA04718@etlken.etl.go.jp> <u3eoq70ck.fsf@kub.nl> <199708050630.CAA31410@psilocin.gnu.ai.mit.edu> <199708050810.RAA06812@etlken.etl.go.jp>
From: Richard Stallman <rms@gnu.ai.mit.edu>
To: handa@etl.go.jp
CC: Marc.Fleischeuers@kub.nl, Marc.Fleischeuers@kub.nl,         voelker@cs.washington.edu, andrewi@harlequin.co.uk
Subject: Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]
Date: Tue, 5 Aug 1997 13:34:45 -0400

    Hmmm, how about accumulating how many times each possible end-of-line
    format appears, and select the one which first occurs 3 times?  If
    none occurs 3 times, perhaps we should select the one that occurs last.
    Then,

This does not fit the practical needs.

The most important practical need is to avoid ever using mac format
for a file which really should be in dos format.

Therefore, the right solution is never to use mac format if you can see
any linefeed at all.

So if all you can find is CR, never LF, then use mac format.

Otherwise, if every LF has a CR before it, use dos format.

Otherwise, use Unix format.
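
The three rules above can be sketched in C roughly as follows.  This is an
illustration only, not the code that was installed in coding.c: the enum,
the function name guess_eol_no_mac_if_lf, and the choice to scan whatever
span the caller passes in are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-ins for the CODING_EOL_* constants.  */
enum eol_guess { GUESS_UNDECIDED, GUESS_LF, GUESS_CRLF, GUESS_CR };

/* Hypothetical sketch of the rule: mac format only if no LF is seen
   at all; dos format if every LF has a CR before it; otherwise Unix
   format.  Scans all LEN bytes of SRC; a caller that wants to avoid
   an exhaustive scan could pass only a prefix of the file.  */
enum eol_guess
guess_eol_no_mac_if_lf (const unsigned char *src, size_t len)
{
  int seen_cr = 0, seen_lf = 0, seen_bare_lf = 0;
  size_t i;

  assert (src != NULL || len == 0);
  for (i = 0; i < len; i++)
    {
      if (src[i] == '\r')
        seen_cr = 1;
      else if (src[i] == '\n')
        {
          seen_lf = 1;
          if (i == 0 || src[i - 1] != '\r')
            seen_bare_lf = 1;   /* An LF with no CR before it.  */
        }
    }

  if (!seen_lf)
    return seen_cr ? GUESS_CR : GUESS_UNDECIDED;
  return seen_bare_lf ? GUESS_LF : GUESS_CRLF;
}
```

Under this sketch, a file containing abc CR CR LF def CR CR LF is classified
as dos format rather than mac format, since every LF has a CR before it.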

From rms@gnu.ai.mit.edu  Tue Aug  5 11:17:53 1997
X-VM-v5-Data: ([nil nil nil nil t nil nil nil nil]
	[nil "Tue" " 5" "August" "1997" "14:18:03" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" "<199708051818.OAA05876@psilocin.gnu.ai.mit.edu>" "9" "Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id LAA01720 for <voelker@cs.washington.edu>; Tue, 5 Aug 1997 11:17:52 -0700
Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id OAA05876; Tue, 5 Aug 1997 14:18:03 -0400
Message-Id: <199708051818.OAA05876@psilocin.gnu.ai.mit.edu>
In-reply-to: <199708051200.VAA07192@etlken.etl.go.jp> (message from Kenichi 	Handa on Tue, 5 Aug 1997 21:00:37 +0900)
References: <199707291709.NAA14978@psilocin.gnu.ai.mit.edu> 	<199707310605.XAA15156@joker.cs.washington.edu> <uyb6niore.fsf@kub.nl> 	<199707312038.NAA15222@joker.cs.washington.edu> 	<199707312342.TAA19727@psilocin.gnu.ai.mit.edu> <uk9i6s4wa.fsf@kub.nl> <199708030423.AAA25764@psilocin.gnu.ai.mit.edu> <199708040133.KAA04718@etlken.etl.go.jp> <199708050838.EAA00520@psilocin.gnu.ai.mit.edu> <199708051200.VAA07192@etlken.etl.go.jp>
From: Richard Stallman <rms@gnu.ai.mit.edu>
To: handa@etl.go.jp
CC: Marc.Fleischeuers@kub.nl, voelker@cs.washington.edu,         Marc.Fleischeuers@kub.nl, andrewi@harlequin.co.uk
Subject: Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]
Date: Tue, 5 Aug 1997 14:18:03 -0400

    So, I suggest the following code.  This scans the buffer until it
    encounters 3 end-of-lines.  If it finds two different patterns while
    scanning, it decides not to decode end-of-line (by returning
    CODING_EOL_LF).  So, in any of the following cases, it doesn't decode
    end-of-line.
	    CR CR LF,  LF CR LF, CR LF LF, CR CR LF LF, LF CR CR LF.
    I think it is clear enough, and users won't be surprised that much.

I think this is good enough.  I'll install it now.

From rms@gnu.ai.mit.edu  Tue Aug  5 12:53:09 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Tue" " 5" "August" "1997" "15:53:20" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "11" "Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id MAA09274 for <voelker@cs.washington.edu>; Tue, 5 Aug 1997 12:53:09 -0700
Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id PAA07564; Tue, 5 Aug 1997 15:53:20 -0400
Message-Id: <199708051953.PAA07564@psilocin.gnu.ai.mit.edu>
In-reply-to: <199708051600.RAA14620@propos.long.harlequin.co.uk> (message from 	Andrew Innes on Tue, 5 Aug 1997 17:00:38 +0100 (BST))
References:  <199708051600.RAA14620@propos.long.harlequin.co.uk>
From: Richard Stallman <rms@gnu.ai.mit.edu>
To: andrewi@harlequin.co.uk
CC: handa@etl.go.jp, Marc.Fleischeuers@kub.nl, voelker@cs.washington.edu,         Marc.Fleischeuers@kub.nl
Subject: Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]
Date: Tue, 5 Aug 1997 15:53:20 -0400

    (BTW, does Emacs 20 distinguish between text files in CODING_EOL_LF, and
    binary files?

There is a distinction which perhaps you could interpret in this way:
whether no-conversion is specified as the coding system.

		   I think such a distinction is useful - a binary file
    might contain all sorts of odd combinations of CR and LF, but a text
    file should normally use a single convention throughout.)

What, specifically, is it useful for?

From eliz@is.elta.co.il  Tue Aug  5 21:11:10 1997
X-VM-v5-Data: ([nil nil nil nil t nil nil nil nil]
	[nil "Wed" " 6" "August" "1997" "07:10:51" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" "<Pine.SUN.3.91.970806071013.4498F-100000@is>" "18" "Re: New way to handle CRLF in Emacs 20.0" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id VAA04658 for <voelker@cs.washington.edu>; Tue, 5 Aug 1997 21:11:08 -0700
Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) 	id HAA04550; Wed, 6 Aug 1997 07:10:52 +0300
X-Sender: eliz@is
In-Reply-To: <199707132141.RAA25670@psilocin.gnu.ai.mit.edu>
Message-ID: <Pine.SUN.3.91.970806071013.4498F-100000@is>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
From: Eli Zaretskii <eliz@is.elta.co.il>
To: Richard Stallman <rms@gnu.ai.mit.edu>
cc: voelker@cs.washington.edu, andrewi@harlequin.co.uk
Subject: Re: New way to handle CRLF in Emacs 20.0
Date: Wed, 6 Aug 1997 07:10:51 +0300 (IDT)


On Sun, 13 Jul 1997, Richard Stallman wrote:

>     be nice to have options e.g. to ask the user whether the guess is
>     correct, or require more than a single CRLF before a decision is
>     made.  (I didn't think about this too much, so I might be wrong.)
> 
> I think that is not worth the trouble, given that we still have
> find-file-text and find-file-binary.

A typical DOS/NT user doesn't even know these functions exist.  I
think that `find-file' should in most cases do the right thing
automatically, leaving the text- and binary-specific functions for the
marginal cases.

I will return to this issue when the more urgent problems are solved.
Hopefully, by then I will also have enough experience to judge this
objectively.

From eliz@is.elta.co.il  Tue Aug  5 21:11:56 1997
X-VM-v5-Data: ([nil nil nil nil t nil nil nil nil]
	[nil "Wed" " 6" "August" "1997" "07:11:46" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" "<Pine.SUN.3.91.970806071110.4498G-100000@is>" "8" "Re: New way to handle CRLF in Emacs 20.0" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id VAA04692 for <voelker@cs.washington.edu>; Tue, 5 Aug 1997 21:11:54 -0700
Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) 	id HAA04561; Wed, 6 Aug 1997 07:11:47 +0300
X-Sender: eliz@is
In-Reply-To: <199707170631.XAA39945@joker.cs.washington.edu>
Message-ID: <Pine.SUN.3.91.970806071110.4498G-100000@is>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
From: Eli Zaretskii <eliz@is.elta.co.il>
To: Geoff Voelker <voelker@cs.washington.edu>
cc: rms@gnu.ai.mit.edu, andrewi@harlequin.co.uk
Subject: Re: New way to handle CRLF in Emacs 20.0
Date: Wed, 6 Aug 1997 07:11:46 +0300 (IDT)


On Wed, 16 Jul 1997, Geoff Voelker wrote:

> I vote for removing the T:/B: from the modeline since they will be
> redundant to the coding system characters.

I agree.  I didn't yet have time to download 20.0.92, but if this
change isn't already there, I can install it.

From handa@etl.go.jp  Tue Aug  5 18:09:55 1997
X-VM-v5-Data: ([nil nil nil t nil nil nil nil nil]
	[nil "Wed" " 6" "August" "1997" "10:10:27" "+0900" "Kenichi Handa" "handa@etl.go.jp" nil "38" "Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from mail1-im.etl.go.jp (mail1-im.etl.go.jp [192.50.105.9]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id SAA28619 for <voelker@cs.washington.edu>; Tue, 5 Aug 1997 18:09:40 -0700
Received: from etlpom.etl.go.jp (etlpom.etl.go.jp [192.31.200.9]) by mail1-im.etl.go.jp (8.8.5/3.5Wpl1-96112918) with ESMTP 	id KAA20137; Wed, 6 Aug 1997 10:09:11 +0900 (JST)
Received: from etlken.etl.go.jp (etlken.etl.go.jp [192.31.197.11]) by etlpom.etl.go.jp (8.8.5/3.5Wpl4-ETL_MASTER) with SMTP id KAA00905; Wed, 6 Aug 1997 10:09:11 +0900 (JST)
Received: by etlken.etl.go.jp (SMI-8.6/6.4J.6-ETL.SLAVE) 	id KAA07868; Wed, 6 Aug 1997 10:10:27 +0900
Message-Id: <199708060110.KAA07868@etlken.etl.go.jp>
In-reply-to: <199708051818.OAA05876@psilocin.gnu.ai.mit.edu> (message from 	Richard Stallman on Tue, 5 Aug 1997 14:18:03 -0400)
References: <199707291709.NAA14978@psilocin.gnu.ai.mit.edu> 	<199707310605.XAA15156@joker.cs.washington.edu> <uyb6niore.fsf@kub.nl> 	<199707312038.NAA15222@joker.cs.washington.edu> 	<199707312342.TAA19727@psilocin.gnu.ai.mit.edu> <uk9i6s4wa.fsf@kub.nl> <199708030423.AAA25764@psilocin.gnu.ai.mit.edu> <199708040133.KAA04718@etlken.etl.go.jp> <199708050838.EAA00520@psilocin.gnu.ai.mit.edu> <199708051200.VAA07192@etlken.etl.go.jp> <199708051818.OAA05876@psilocin.gnu.ai.mit.edu>
From: Kenichi Handa <handa@etl.go.jp>
To: rms@gnu.ai.mit.edu
CC: Marc.Fleischeuers@kub.nl, voelker@cs.washington.edu,         Marc.Fleischeuers@kub.nl, andrewi@harlequin.co.uk
Subject: Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]
Date: Wed, 6 Aug 1997 10:10:27 +0900

Richard Stallman <rms@gnu.ai.mit.edu> writes:
>     So, I suggest the following code.  This scans the buffer until it
>     encounters 3 end-of-lines.  If it finds two different patterns while
>     scanning, it decides not to decode end-of-line (by returning
>     CODING_EOL_LF).  So, in any of the following cases, it doesn't decode
>     end-of-line.
> 	    CR CR LF,  LF CR LF, CR LF LF, CR CR LF LF, LF CR CR LF.
>     I think it is clear enough, and users won't be surprised that much.

> I think this is good enough.  I'll install it now.

I have just made a small change as below in FSF's code:

diff -c -r1.30 coding.c
*** coding.c    1997/08/05 18:19:33     1.30
--- coding.c    1997/08/06 01:06:38
***************
*** 2739,2745 ****
        }
      }
  
!   return (total ? eol_type : CODING_EOL_UNDECIDED);
  }
  
  /* Detect how end-of-line of a text of length SRC_BYTES pointed by SRC
--- 2739,2745 ----
        }
      }
  
!   return eol_type;
  }
  
  /* Detect how end-of-line of a text of length SRC_BYTES pointed by SRC


---
Ken'ichi HANDA
handa@etl.go.jp

From rms@gnu.ai.mit.edu  Thu Jul 31 16:42:11 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Thu" "31" "July" "1997" "19:42:35" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "10" "Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]" "^From:" nil nil "7" nil nil nil nil]
	nil)
Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id QAA01460 for <voelker@cs.washington.edu>; Thu, 31 Jul 1997 16:42:11 -0700
Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id TAA19727; Thu, 31 Jul 1997 19:42:35 -0400
Message-Id: <199707312342.TAA19727@psilocin.gnu.ai.mit.edu>
In-reply-to: <199707312038.NAA15222@joker.cs.washington.edu> 	(voelker@cs.washington.edu)
References: <199707291709.NAA14978@psilocin.gnu.ai.mit.edu> 	<199707310605.XAA15156@joker.cs.washington.edu> 	<uyb6niore.fsf@kub.nl> <199707312038.NAA15222@joker.cs.washington.edu>
From: Richard Stallman <rms@gnu.ai.mit.edu>
To: voelker@cs.washington.edu
CC: Marc.Fleischeuers@kub.nl, andrewi@harlequin.co.uk
Subject: Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]
Date: Thu, 31 Jul 1997 19:42:35 -0400

    I'm not sure I understand what you are trying to do.  When a file is
    inside of Emacs, lines are always terminated by newlines.  The line
    termination that exists when the file is in the filesystem is only
    placed there when the file is written out.  There is no need to
    explicitly place CR or LF characters in a file to change the
    termination used.

You're right--but perhaps this can be a clue to finding a place
where the documentation needs to be made clearer.  So it is worth figuring
out why Marc got the wrong idea.

From eliz@is.elta.co.il  Thu Aug 14 09:31:56 1997
X-VM-v5-Data: ([nil nil nil nil t nil nil nil nil]
	[nil "Thu" "14" "August" "1997" "19:31:42" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" "<Pine.SUN.3.91.970814192216.8080A-100000@is>" "19" "EOL conversion on MSDOS and MS-Windows" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id JAA24370 for <voelker@cs.washington.edu>; Thu, 14 Aug 1997 09:31:55 -0700
Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) 	id TAA08084; Thu, 14 Aug 1997 19:31:43 +0300
X-Sender: eliz@is
Message-ID: <Pine.SUN.3.91.970814192216.8080A-100000@is>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
From: Eli Zaretskii <eliz@is.elta.co.il>
To: Geoff Voelker <voelker@cs.washington.edu>
cc: Andrew Innes <andrewi@harlequin.co.uk>,         Richard Stallman <rms@gnu.ai.mit.edu>
Subject: EOL conversion on MSDOS and MS-Windows
Date: Thu, 14 Aug 1997 19:31:42 +0300 (IDT)

Geoff, there's something that bothers me in the way Emacs 20.0.93 
computes and displays the EOL conversion.  There is a seeming 
inconsistency between the setting of coding system for reading and for 
writing.

When Emacs reads a file that is not in the alist of known file types, it 
sets the coding system to undecided.  If the file happens to be
Unix-style (like, e.g., those in the Emacs source distribution), it ends up
being *-unix, and Emacs displays `:' in the modeline.  However, if you
then save the file, the coding system is set to undecided-dos, and Emacs 
adds CR characters.  But the coding system displayed on the modeline does 
not change.  It will only change if you revert the buffer or restart 
Emacs.

I think this is confusing.  When users look at the modeline, they should 
be able to determine how the file will be written to disk.  So I think
the coding system should be by default set to undecided-dos on reading
the file (unless the file matches in the buffer-type-alist or is on
an untranslated filesystem etc.).

From eliz@is.elta.co.il  Mon Aug 18 08:31:44 1997
X-VM-v5-Data: ([nil nil nil nil t nil nil nil nil]
	[nil "Mon" "18" "August" "1997" "18:31:36" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" "<Pine.SUN.3.91.970818181955.17661C-100000@is>" "35" "Re: EOL conversion on MSDOS and MS-Windows" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id IAA18273 for <voelker@cs.washington.edu>; Mon, 18 Aug 1997 08:31:42 -0700
Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) 	id SAA17735; Mon, 18 Aug 1997 18:31:37 +0300
X-Sender: eliz@is
In-Reply-To: <199708162215.PAA28649@joker.cs.washington.edu>
Message-ID: <Pine.SUN.3.91.970818181955.17661C-100000@is>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
From: Eli Zaretskii <eliz@is.elta.co.il>
To: Geoff Voelker <voelker@cs.washington.edu>
cc: rms@gnu.ai.mit.edu, andrewi@harlequin.co.uk
Subject: Re: EOL conversion on MSDOS and MS-Windows
Date: Mon, 18 Aug 1997 18:31:36 +0300 (IDT)


On Sat, 16 Aug 1997, Geoff Voelker wrote:

> +   if (CODING_REQUIRE_EOL_CONVERSION (&coding))

This doesn't compile: there is no macro named 
CODING_REQUIRE_EOL_CONVERSION (or thereabouts).  I replaced this line 
with this:

	if (coding.eol_type == CODING_EOL_CRLF)

Geoff, is this what you meant, or did you miss something from the diffs?

After the above change, a quick test indicates that this now works as 
Richard suggested.  But there is another problem: `write-region' always 
takes the coding system from `buffer-file-type' even if I write the 
region to another file.  To reproduce:

		emacs -q
		C-x C-f src/xfns.c
		C-SPC
		C-u 10 C-n
		M-x write-region RET xyzzy RET

Assuming src/xfns.c is in Unix format, this creates xyzzy also in Unix 
format.

I think this is wrong.  Since xyzzy did not exist, it should have been 
created in DOS format.

Do you agree?  Richard, how would this behave on Unix if src/xfns.c was 
in DOS format and xyzzy didn't exist?

What about the case where xyzzy already exists and Emacs is overwriting 
it?

From rms@gnu.ai.mit.edu  Mon Aug 18 21:34:38 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Tue" "19" "August" "1997" "00:35:46" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "27" "Re: EOL conversion on MSDOS and MS-Windows" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id VAA27454 for <voelker@cs.washington.edu>; Mon, 18 Aug 1997 21:34:37 -0700
Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id AAA21653; Tue, 19 Aug 1997 00:35:46 -0400
Message-Id: <199708190435.AAA21653@psilocin.gnu.ai.mit.edu>
In-reply-to: <Pine.SUN.3.91.970818181955.17661C-100000@is> (message from Eli 	Zaretskii on Mon, 18 Aug 1997 18:31:36 +0300 (IDT))
References:  <Pine.SUN.3.91.970818181955.17661C-100000@is>
From: Richard Stallman <rms@gnu.ai.mit.edu>
To: eliz@is.elta.co.il
CC: voelker@cs.washington.edu, andrewi@harlequin.co.uk
cc: handa@etl.go.jp, rms@gnu.ai.mit.edu
Subject: Re: EOL conversion on MSDOS and MS-Windows
Date: Tue, 19 Aug 1997 00:35:46 -0400

		    emacs -q
		    C-x C-f src/xfns.c
		    C-SPC
		    C-u 10 C-n
		    M-x write-region RET xyzzy RET

    Assuming src/xfns.c is in Unix format, this creates xyzzy also in Unix 
    format.

    I think this is wrong.  Since xyzzy did not exist, it should have been 
    created in DOS format.

I am not sure.

Note that you can use C-x RET c to specify a different coding system
for this command.

      Richard, how would this behave on Unix if src/xfns.c was 
    in DOS format and xyzzy didn't exist?

As far as I know, it would do exactly the same thing as on DOS.

    What about the case where xyzzy already exists and Emacs is overwriting 
    it?

Emacs would not notice whether the file already exists.
Maybe it should, but I am not sure.

From handa@etl.go.jp  Mon Aug 18 22:27:15 1997
X-VM-v5-Data: ([nil nil nil nil t nil nil nil nil]
	[nil "Tue" "19" "August" "1997" "14:28:03" "+0900" "Kenichi Handa" "handa@etl.go.jp" "<199708190528.OAA24843@etlken.etl.go.jp>" "40" "Re: EOL conversion on MSDOS and MS-Windows" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from mail1-im.etl.go.jp (mail1-im.etl.go.jp [192.50.105.9]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id WAA29533 for <voelker@cs.washington.edu>; Mon, 18 Aug 1997 22:27:14 -0700
Received: from etlpom.etl.go.jp (etlpom.etl.go.jp [192.31.200.9]) by mail1-im.etl.go.jp (8.8.5/3.5Wpl1-96112918) with ESMTP 	id OAA09429; Tue, 19 Aug 1997 14:26:57 +0900 (JST)
Received: from etlken.etl.go.jp (etlken.etl.go.jp [192.31.197.11]) by etlpom.etl.go.jp (8.8.5/3.5Wpl4-ETL_MASTER) with SMTP id OAA21185; Tue, 19 Aug 1997 14:26:55 +0900 (JST)
Received: by etlken.etl.go.jp (SMI-8.6/6.4J.6-ETL.SLAVE) 	id OAA24843; Tue, 19 Aug 1997 14:28:03 +0900
Message-Id: <199708190528.OAA24843@etlken.etl.go.jp>
In-reply-to: <199708190435.AAA21653@psilocin.gnu.ai.mit.edu> (message from 	Richard Stallman on Tue, 19 Aug 1997 00:35:46 -0400)
References: <Pine.SUN.3.91.970818181955.17661C-100000@is> <199708190435.AAA21653@psilocin.gnu.ai.mit.edu>
From: Kenichi Handa <handa@etl.go.jp>
To: rms@gnu.ai.mit.edu
CC: eliz@is.elta.co.il, voelker@cs.washington.edu, andrewi@harlequin.co.uk,         rms@gnu.ai.mit.edu
Subject: Re: EOL conversion on MSDOS and MS-Windows
Date: Tue, 19 Aug 1997 14:28:03 +0900

Richard Stallman <rms@gnu.ai.mit.edu> writes:
> 		    emacs -q
> 		    C-x C-f src/xfns.c
> 		    C-SPC
> 		    C-u 10 C-n
> 		    M-x write-region RET xyzzy RET
>     Assuming src/xfns.c is in Unix format, this creates xyzzy also in Unix 
>     format.
>     I think this is wrong.  Since xyzzy did not exist, it should have been 
>     created in DOS format.

> I am not sure.

I think write-region should write in the format of
buffer-file-coding-system of the current buffer if no coding system is
specified explicitly by C-x RET c or in file-coding-system-alist.

> Note that you can use C-x RET c to specify a different coding system
> for this command.

>       Richard, how would this behave on Unix if src/xfns.c was 
>     in DOS format and xyzzy didn't exist?

> As far as I know, it would do exactly the same thing as on DOS.

In this case, "the same thing as on DOS" is "to write in the format of
buffer-file-coding-system".  So, xyzzy is written in DOS format.

>     What about the case where xyzzy already exists and Emacs is overwriting 
>     it?

> Emacs would not notice whether the file already exists.
> Maybe it should, but I am not sure.

I think write-region should not follow the format of an already existing
file, but append-to-file had better follow that format.

---
Ken'ichi HANDA
handa@etl.go.jp

From eliz@is.elta.co.il  Tue Aug 19 05:59:27 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Tue" "19" "August" "1997" "15:59:17" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" nil "13" "Re: EOL conversion on MSDOS and MS-Windows" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id FAA12235 for <voelker@cs.washington.edu>; Tue, 19 Aug 1997 05:59:25 -0700
Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) 	id PAA21411; Tue, 19 Aug 1997 15:59:18 +0300
X-Sender: eliz@is
In-Reply-To: <199708190707.AAA25484@joker.cs.washington.edu>
Message-ID: <Pine.SUN.3.91.970819155649.21250L-100000@is>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
From: Eli Zaretskii <eliz@is.elta.co.il>
To: Geoff Voelker <voelker@cs.washington.edu>
cc: rms@gnu.ai.mit.edu, andrewi@harlequin.co.uk
Subject: Re: EOL conversion on MSDOS and MS-Windows
Date: Tue, 19 Aug 1997 15:59:17 +0300 (IDT)


On Mon, 18 Aug 1997, Geoff Voelker wrote:

> !   if (coding.eol_type != CODING_EOL_UNDECIDED 
> !       && coding.eol_type != CODING_EOL_LF)
>       current_buffer->buffer_file_type = Qnil;
>     else
>       current_buffer->buffer_file_type = Qt;

Hmm...  If the EOL coding is still undecided, why should the file be 
marked as binary?  Shouldn't it be text by default?  The above code means 
that if I read a file which has no newlines, it will be treated as 
binary.  Is this correct?

From eliz@is.elta.co.il  Tue Aug 19 06:05:56 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Tue" "19" "August" "1997" "15:56:43" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" nil "35" "Re: EOL conversion on MSDOS and MS-Windows" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id GAA12367 for <voelker@cs.washington.edu>; Tue, 19 Aug 1997 06:05:54 -0700
Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) 	id PAA21383; Tue, 19 Aug 1997 15:56:43 +0300
X-Sender: eliz@is
In-Reply-To: <199708190435.AAA21653@psilocin.gnu.ai.mit.edu>
Message-ID: <Pine.SUN.3.91.970819155536.21250K-100000@is>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
From: Eli Zaretskii <eliz@is.elta.co.il>
To: Richard Stallman <rms@gnu.ai.mit.edu>
cc: voelker@cs.washington.edu, andrewi@harlequin.co.uk, handa@etl.go.jp
Subject: Re: EOL conversion on MSDOS and MS-Windows
Date: Tue, 19 Aug 1997 15:56:43 +0300 (IDT)


On Tue, 19 Aug 1997, Richard Stallman wrote:

>     I think this is wrong.  Since xyzzy did not exist, it should have been 
>     created in DOS format.
> 
> I am not sure.

I'm not sure either, so let me explain why I think it's wrong.

First, there's a general rule about non-existent files: they are
created with the default coding system.  This is documented behavior,
and users could think it applies to this case as well.

The other consideration that I think goes against inheriting the
coding system from the current buffer is that Emacs tailors many
aspects of its operation using the *name* of the file.  For example,
suppose Emacs were to do something special when writing a C source
file.  Would we expect that action to take place when I write a region
of an e-mail message to a .c file?  I think we would.  If you agree
with that, I think we should also expect the coding system (for
writing) to be derived from the name of the file the region is being
written to.  This means that if the name of the file to which the
region is written is not found in the various alists which define
specific coding systems, Emacs should fall back to the default coding
system, which is undecided-dos on DOS_NT platforms (unless Emacs is
customized).

Here's another aspect of this dilemma.  Suppose I visit a file that is
in Unix EOL format, and then use `C-x C-w' to write the entire buffer
to a file whose name *is* found in `file-name-buffer-type-alist'.
Would we expect the coding system to change to reflect the change in
the buffer's filename?  I think we would, but Emacs doesn't behave
this way now, because Emacs consults `file-name-buffer-type-alist'
only when it reads files.

From rms@gnu.ai.mit.edu  Tue Aug 19 08:23:57 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Tue" "19" "August" "1997" "11:25:18" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "5" "Re: EOL conversion on MSDOS and MS-Windows" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id IAA17840 for <voelker@cs.washington.edu>; Tue, 19 Aug 1997 08:23:57 -0700
Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id LAA29210; Tue, 19 Aug 1997 11:25:18 -0400
Message-Id: <199708191525.LAA29210@psilocin.gnu.ai.mit.edu>
In-reply-to: <199708190711.AAA25742@joker.cs.washington.edu> 	(voelker@cs.washington.edu)
References: <Pine.SUN.3.91.970818181955.17661C-100000@is> 	<199708190435.AAA21653@psilocin.gnu.ai.mit.edu> 	<199708190528.OAA24843@etlken.etl.go.jp> <199708190711.AAA25742@joker.cs.washington.edu>
From: Richard Stallman <rms@gnu.ai.mit.edu>
To: voelker@cs.washington.edu
CC: handa@etl.go.jp, eliz@is.elta.co.il, andrewi@harlequin.co.uk
Subject: Re: EOL conversion on MSDOS and MS-Windows
Date: Tue, 19 Aug 1997 11:25:18 -0400

    Perhaps we should add a warning message to the user when a
    write-region will write a buffer in an eol format that is not in the
    default format of the file system being used.

I added text about this.  Thanks.

From rms@gnu.ai.mit.edu  Tue Aug 19 09:08:53 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Tue" "19" "August" "1997" "12:10:08" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "6" "Re: EOL conversion on MSDOS and MS-Windows" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id JAA20726 for <voelker@cs.washington.edu>; Tue, 19 Aug 1997 09:08:52 -0700
Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id MAA29916; Tue, 19 Aug 1997 12:10:08 -0400
Message-Id: <199708191610.MAA29916@psilocin.gnu.ai.mit.edu>
In-reply-to: <199708190528.OAA24843@etlken.etl.go.jp> (message from Kenichi 	Handa on Tue, 19 Aug 1997 14:28:03 +0900)
References: <Pine.SUN.3.91.970818181955.17661C-100000@is> <199708190435.AAA21653@psilocin.gnu.ai.mit.edu> <199708190528.OAA24843@etlken.etl.go.jp>
From: Richard Stallman <rms@gnu.ai.mit.edu>
To: handa@etl.go.jp
CC: eliz@is.elta.co.il, voelker@cs.washington.edu, andrewi@harlequin.co.uk
Subject: Re: EOL conversion on MSDOS and MS-Windows
Date: Tue, 19 Aug 1997 12:10:08 -0400

    I think write-region should not follow the format of already existing
    file, but append-to-file had better follow the format.

I agree, append-to-file needs to do this.

Does it already, or is this a bug that needs fixing?

From eliz@is.elta.co.il  Tue Aug 19 10:05:35 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Tue" "19" "August" "1997" "20:05:05" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" nil "18" "Re: EOL conversion on MSDOS and MS-Windows" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id KAA23786 for <voelker@cs.washington.edu>; Tue, 19 Aug 1997 10:05:33 -0700
Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) 	id UAA21957; Tue, 19 Aug 1997 20:05:06 +0300
X-Sender: eliz@is
In-Reply-To: <199708191610.MAA29916@psilocin.gnu.ai.mit.edu>
Message-ID: <Pine.SUN.3.91.970819200145.21938B-100000@is>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
From: Eli Zaretskii <eliz@is.elta.co.il>
To: Richard Stallman <rms@gnu.ai.mit.edu>
cc: handa@etl.go.jp, voelker@cs.washington.edu, andrewi@harlequin.co.uk
Subject: Re: EOL conversion on MSDOS and MS-Windows
Date: Tue, 19 Aug 1997 20:05:05 +0300 (IDT)


On Tue, 19 Aug 1997, Richard Stallman wrote:

>     I think write-region should not follow the format of already existing
>     file, but append-to-file had better follow the format.
> 
> I agree, append-to-file needs to do this.
> 
> Does it already, or is this a bug that needs fixing?

No, it doesn't.  It just seeks to the end of file and writes the region 
with the coding system determined as usual.

I tried to append a portion of a Unix-style file to a DOS-style file and 
got the appended part in Unix format.

Unfortunately, I don't have time to fix this right now.  I will fix it
by tomorrow, if nobody does before that.
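
For illustration only, the append case could look at the tail of the
existing file and reuse its line-ending convention, roughly as sketched
below in C.  This is a sketch under stated assumptions, not the eventual
fix: `eol_for_append' and the `EOL_*' names are hypothetical, the
existing file's contents are assumed to be in memory already, and only
the EOL convention is considered, not the rest of the coding system.

```c
#include <stddef.h>

/* Hypothetical EOL classes for this sketch.  */
enum eol_type { EOL_UNDECIDED, EOL_LF, EOL_CRLF };

/* Return the convention of the last line ending found in EXISTING
   (LEN bytes), scanning backwards from the end.  If the existing
   contents have no line ending, return FALLBACK.  */
enum eol_type
eol_for_append (const char *existing, size_t len, enum eol_type fallback)
{
  size_t i = len;

  while (i > 0)
    {
      i--;
      if (existing[i] == '\n')
        return (i > 0 && existing[i - 1] == '\r') ? EOL_CRLF : EOL_LF;
    }
  return fallback;
}
```

With this, appending a region from a Unix-style buffer to a DOS-style
file would select CRLF for the appended part, instead of the buffer's
own convention.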

From rms@gnu.ai.mit.edu  Tue Aug 19 11:41:24 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Tue" "19" "August" "1997" "14:42:28" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "3" "Re: EOL conversion on MSDOS and MS-Windows" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id LAA00204 for <voelker@cs.washington.edu>; Tue, 19 Aug 1997 11:41:23 -0700
Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id OAA32279; Tue, 19 Aug 1997 14:42:28 -0400
Message-Id: <199708191842.OAA32279@psilocin.gnu.ai.mit.edu>
In-reply-to: <Pine.SUN.3.91.970819155536.21250K-100000@is> (message from Eli 	Zaretskii on Tue, 19 Aug 1997 15:56:43 +0300 (IDT))
References:  <Pine.SUN.3.91.970819155536.21250K-100000@is>
From: Richard Stallman <rms@gnu.ai.mit.edu>
To: eliz@is.elta.co.il
CC: voelker@cs.washington.edu, andrewi@harlequin.co.uk, handa@etl.go.jp
Subject: Re: EOL conversion on MSDOS and MS-Windows
Date: Tue, 19 Aug 1997 14:42:28 -0400

It seems to me that this is one of those cases where the concept of
what is "really right" is so complex that it may be better to do
something simple and not try to do what is "really right".

From rms@gnu.ai.mit.edu  Tue Aug 19 11:44:14 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Tue" "19" "August" "1997" "14:45:24" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "13" "Re: EOL conversion on MSDOS and MS-Windows" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id LAA00404 for <voelker@cs.washington.edu>; Tue, 19 Aug 1997 11:44:14 -0700
Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id OAA32310; Tue, 19 Aug 1997 14:45:24 -0400
Message-Id: <199708191845.OAA32310@psilocin.gnu.ai.mit.edu>
In-reply-to: <Pine.SUN.3.91.970819155649.21250L-100000@is> (message from Eli 	Zaretskii on Tue, 19 Aug 1997 15:59:17 +0300 (IDT))
References:  <Pine.SUN.3.91.970819155649.21250L-100000@is>
From: Richard Stallman <rms@gnu.ai.mit.edu>
To: eliz@is.elta.co.il
CC: voelker@cs.washington.edu, andrewi@harlequin.co.uk, rms@gnu.ai.mit.edu,         handa@etl.go.jp
Subject: Re: EOL conversion on MSDOS and MS-Windows
Date: Tue, 19 Aug 1997 14:45:24 -0400

    Hmm...  If the EOL coding is still undecided, why should the file be 
    marked as binary?  Shouldn't it be text by default?  The above code means 
    that if I read a file which has no newlines, it will be treated as 
    binary.  Is this correct?

If this question concerns ONLY the case of a file with no newlines,
then I agree with you, that should be considered a text file by default.

But there is a related question: what will happen with the actual
writing of the file?  If buffer-file-coding-system is undecided
as regards the eol conversion, what will happen if the user
inserts some newlines and then saves the file?  What eol convention
will be used for saving the file?

From eliz@is.elta.co.il  Wed Aug 20 07:26:55 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Wed" "20" "August" "1997" "17:25:19" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" nil "9" "Re: EOL conversion on MSDOS and MS-Windows" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id HAA22381 for <voelker@cs.washington.edu>; Wed, 20 Aug 1997 07:26:53 -0700
Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) 	id RAA24053; Wed, 20 Aug 1997 17:25:20 +0300
X-Sender: eliz@is
In-Reply-To: <199708191842.OAA32279@psilocin.gnu.ai.mit.edu>
Message-ID: <Pine.SUN.3.91.970820172430.24006C-100000@is>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
From: Eli Zaretskii <eliz@is.elta.co.il>
To: Richard Stallman <rms@gnu.ai.mit.edu>
cc: voelker@cs.washington.edu, andrewi@harlequin.co.uk, handa@etl.go.jp
Subject: Re: EOL conversion on MSDOS and MS-Windows
Date: Wed, 20 Aug 1997 17:25:19 +0300 (IDT)


On Tue, 19 Aug 1997, Richard Stallman wrote:

> It seems to me that this is one of those cases the concept of what is
> "really right" is so complex, that it may be better to do something
> simple and not try to do what is "really right".

I agree, but computing EOL conversion for writing from the filename
when it's not the default buffer file name doesn't seem too complex.

From eliz@is.elta.co.il  Wed Aug 20 07:37:51 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Wed" "20" "August" "1997" "17:37:14" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" nil "33" "Re: EOL conversion on MSDOS and MS-Windows" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id HAA22809 for <voelker@cs.washington.edu>; Wed, 20 Aug 1997 07:37:49 -0700
Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) 	id RAA24061; Wed, 20 Aug 1997 17:37:15 +0300
X-Sender: eliz@is
In-Reply-To: <199708191845.OAA32310@psilocin.gnu.ai.mit.edu>
Message-ID: <Pine.SUN.3.91.970820172534.24006D-100000@is>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
From: Eli Zaretskii <eliz@is.elta.co.il>
To: Richard Stallman <rms@gnu.ai.mit.edu>
cc: voelker@cs.washington.edu, andrewi@harlequin.co.uk, handa@etl.go.jp
Subject: Re: EOL conversion on MSDOS and MS-Windows
Date: Wed, 20 Aug 1997 17:37:14 +0300 (IDT)


On Tue, 19 Aug 1997, Richard Stallman wrote:

> But there is a related question: what will happen with the actual
> writing of the file?  If buffer-file-coding-system is undecided
> as regards the eol conversion, what will happen if the user
> inserts some newlines and then saves the file?  What eol convention
> will be used for saving the file?

I think Emacs will use the EOL convention that is determined by 
buffer-file-type.

Here's the last fragment from find-buffer-file-type-coding-system (in 
lisp/dos-w32.el) in its current incarnation:

	((eq op 'write-region)
	 (if buffer-file-coding-system
	     (cons buffer-file-coding-system
		   buffer-file-coding-system)
	   (if buffer-file-type
	       '(no-conversion . no-conversion)
	     '(undecided-dos . undecided-dos)))))))

So if the buffer type is text (nil), Emacs will add CR characters; if 
it's binary (t), it won't.
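
For readers who prefer C, the decision made by the Lisp fragment above
can be restated as the following sketch.  It mirrors only the quoted
branch; `write_region_coding' is a hypothetical name, coding systems are
represented as plain strings, and NULL stands for a nil
buffer-file-coding-system.

```c
#include <stddef.h>

/* Restatement of the quoted write-region branch.  CODING is the
   buffer's coding-system name, or NULL when none is set; IS_BINARY is
   nonzero when buffer-file-type is t (binary).  */
const char *
write_region_coding (const char *coding, int is_binary)
{
  /* A non-nil buffer-file-coding-system takes precedence.  */
  if (coding != NULL)
    return coding;
  /* Otherwise fall back on buffer-file-type.  */
  return is_binary ? "no-conversion" : "undecided-dos";
}
```

Note that the buffer-file-type test is reached only when no coding
system is set on the buffer.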

Personally, I tend to make it a text file, so it will be written in DOS
text format.  But I'm afraid that this tendency is a left-over from
the previous Emacs behavior whereby it would rewrite LF-only files
with CRLF EOLs.  And we have changed that behavior.  So now I'm not
sure whether my gut feelings are correct.

Geoff, what do you think?

From rms@gnu.ai.mit.edu  Wed Aug 20 09:39:43 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Wed" "20" "August" "1997" "12:40:57" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "4" "Re: EOL conversion on MSDOS and MS-Windows" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id JAA28870 for <voelker@cs.washington.edu>; Wed, 20 Aug 1997 09:39:42 -0700
Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id MAA11228; Wed, 20 Aug 1997 12:40:57 -0400
Message-Id: <199708201640.MAA11228@psilocin.gnu.ai.mit.edu>
In-reply-to: <Pine.SUN.3.91.970820172430.24006C-100000@is> (message from Eli 	Zaretskii on Wed, 20 Aug 1997 17:25:19 +0300 (IDT))
References:  <Pine.SUN.3.91.970820172430.24006C-100000@is>
From: Richard Stallman <rms@gnu.ai.mit.edu>
To: eliz@is.elta.co.il
CC: voelker@cs.washington.edu, andrewi@harlequin.co.uk, handa@etl.go.jp
Subject: Re: EOL conversion on MSDOS and MS-Windows
Date: Wed, 20 Aug 1997 12:40:57 -0400

    I agree, but computing EOL conversion for writing from the filename
    when it's not the default buffer file name doesn't seem too complex.

I am not sure that is always right either.

From eliz@is.elta.co.il  Wed Aug 20 09:54:59 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Wed" "20" "August" "1997" "19:54:14" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" nil "15" "Re: EOL conversion on MSDOS and MS-Windows" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id JAA29976 for <voelker@cs.washington.edu>; Wed, 20 Aug 1997 09:54:55 -0700
Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) 	id TAA24420; Wed, 20 Aug 1997 19:54:15 +0300
X-Sender: eliz@is
In-Reply-To: <Pine.SUN.3.91.970819200145.21938B-100000@is>
Message-ID: <Pine.SUN.3.91.970820195130.24402B-100000@is>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
From: Eli Zaretskii <eliz@is.elta.co.il>
To: Richard Stallman <rms@gnu.ai.mit.edu>, handa@etl.go.jp,         voelker@cs.washington.edu, andrewi@harlequin.co.uk
Subject: Re: EOL conversion on MSDOS and MS-Windows
Date: Wed, 20 Aug 1997 19:54:14 +0300 (IDT)


On Tue, 19 Aug 1997, Eli Zaretskii wrote:

On Tue, 19 Aug 1997, Richard Stallman wrote:

>     I think write-region should not follow the format of already existing
>     file, but append-to-file had better follow the format.
> 
> I agree, append-to-file needs to do this.

When you say that appending to a file should follow the format, do you 
mean only the EOL encoding, or the entire coding system?

It seems to me that if the EOLs are taken from the file to which the 
region is appended, the rest of the coding system should be also, no?

From rms@gnu.ai.mit.edu  Wed Aug 20 10:21:50 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Wed" "20" "August" "1997" "13:20:31" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "23" "Re: EOL conversion on MSDOS and MS-Windows" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id KAA02218 for <voelker@cs.washington.edu>; Wed, 20 Aug 1997 10:21:49 -0700
Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id NAA11905; Wed, 20 Aug 1997 13:20:31 -0400
Message-Id: <199708201720.NAA11905@psilocin.gnu.ai.mit.edu>
In-reply-to: <Pine.SUN.3.91.970820172534.24006D-100000@is> (message from Eli 	Zaretskii on Wed, 20 Aug 1997 17:37:14 +0300 (IDT))
References:  <Pine.SUN.3.91.970820172534.24006D-100000@is>
From: Richard Stallman <rms@gnu.ai.mit.edu>
To: eliz@is.elta.co.il
CC: voelker@cs.washington.edu, andrewi@harlequin.co.uk, handa@etl.go.jp,         rms@gnu.ai.mit.edu
Subject: Re: EOL conversion on MSDOS and MS-Windows
Date: Wed, 20 Aug 1997 13:20:31 -0400

    I think Emacs will use the EOL convention that is determined by 
    buffer-file-type.

Looking at the code, I think that is true only if
buffer-file-coding-system is nil:

	((eq op 'write-region)
	 (if buffer-file-coding-system
	     (cons buffer-file-coding-system
		   buffer-file-coding-system)
	   (if buffer-file-type
	       '(no-conversion . no-conversion)
	     '(undecided-dos . undecided-dos)))))))

If buffer-file-coding-system is non-nil, that overrides
buffer-file-type.  And buffer-file-coding-system should always be
non-nil, if you have visited an existing file.

When you visit a file that contains no newlines,
buffer-file-coding-system gets set to undecided.  It will still be
undecided when you save the buffer.  So the question is, what does
saving the buffer do when buffer-file-coding-system is undecided?
Handa, can you tell us?

From eliz@is.elta.co.il  Wed Aug 20 11:37:01 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Wed" "20" "August" "1997" "21:36:12" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" nil "18" "Re: EOL conversion on MSDOS and MS-Windows" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id LAA08273 for <voelker@cs.washington.edu>; Wed, 20 Aug 1997 11:36:59 -0700
Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) 	id VAA24776; Wed, 20 Aug 1997 21:36:13 +0300
X-Sender: eliz@is
In-Reply-To: <199708201720.NAA11905@psilocin.gnu.ai.mit.edu>
Message-ID: <Pine.SUN.3.91.970820212512.24550E-100000@is>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
From: Eli Zaretskii <eliz@is.elta.co.il>
To: Richard Stallman <rms@gnu.ai.mit.edu>
cc: voelker@cs.washington.edu, andrewi@harlequin.co.uk, handa@etl.go.jp
Subject: Re: EOL conversion on MSDOS and MS-Windows
Date: Wed, 20 Aug 1997 21:36:12 +0300 (IDT)


On Wed, 20 Aug 1997, Richard Stallman wrote:

> When you visit a file that contains no newlines,
> buffer-file-coding-system gets set to undecided.  It will still be
> undecided when you save the buffer.

Hmm... for some reason when I read a file without newlines, the coding 
system gets set to undecided-dos, although that function in dos-w32.el 
indeed sets it to undecided.  I will have to debug this.

>  So the question is, what does
> saving the buffer do when buffer-file-coding-system is undecided?
> Handa, can you tell us?

I have set the coding system manually to undecided, added a newline and 
saved it.  It got a Unix-style linefeed (as I'd expect, since that is the 
default when EOL type is not set).

From rms@gnu.ai.mit.edu  Wed Aug 20 16:33:38 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Wed" "20" "August" "1997" "19:35:08" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "7" "Re: EOL conversion on MSDOS and MS-Windows" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id QAA26599 for <voelker@cs.washington.edu>; Wed, 20 Aug 1997 16:33:37 -0700
Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id TAA16877; Wed, 20 Aug 1997 19:35:08 -0400
Message-Id: <199708202335.TAA16877@psilocin.gnu.ai.mit.edu>
In-reply-to: <Pine.SUN.3.91.970820195130.24402B-100000@is> (message from Eli 	Zaretskii on Wed, 20 Aug 1997 19:54:14 +0300 (IDT))
References:  <Pine.SUN.3.91.970820195130.24402B-100000@is>
From: Richard Stallman <rms@gnu.ai.mit.edu>
To: eliz@is.elta.co.il
CC: handa@etl.go.jp, voelker@cs.washington.edu, andrewi@harlequin.co.uk
Subject: Re: EOL conversion on MSDOS and MS-Windows
Date: Wed, 20 Aug 1997 19:35:08 -0400

    When you say that appending to a file should follow the format, do you 
    mean only the EOL encoding, or the entire coding system?

    It seems to me that if the EOLs are taken from the file to which the 
    region is appended, the rest of the coding system should be also, no?

I think so too.

From handa@etl.go.jp  Wed Aug 20 17:38:10 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Thu" "21" "August" "1997" "09:39:02" "+0900" "Kenichi Handa" "handa@etl.go.jp" nil "12" "Re: EOL conversion on MSDOS and MS-Windows" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from mail1-im.etl.go.jp (mail1-im.etl.go.jp [192.50.105.9]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id RAA00315 for <voelker@cs.washington.edu>; Wed, 20 Aug 1997 17:38:09 -0700
Received: from etlpom.etl.go.jp (etlpom.etl.go.jp [192.31.200.9]) by mail1-im.etl.go.jp (8.8.5/3.5Wpl1-96112918) with ESMTP 	id JAA03179; Thu, 21 Aug 1997 09:37:50 +0900 (JST)
Received: from etlken.etl.go.jp (etlken.etl.go.jp [192.31.197.11]) by etlpom.etl.go.jp (8.8.5/3.5Wpl4-ETL_MASTER) with SMTP id JAA05469; Thu, 21 Aug 1997 09:37:49 +0900 (JST)
Received: by etlken.etl.go.jp (SMI-8.6/6.4J.6-ETL.SLAVE) 	id JAA27152; Thu, 21 Aug 1997 09:39:02 +0900
Message-Id: <199708210039.JAA27152@etlken.etl.go.jp>
In-reply-to: <199708202335.TAA16877@psilocin.gnu.ai.mit.edu> (message from 	Richard Stallman on Wed, 20 Aug 1997 19:35:08 -0400)
References: <Pine.SUN.3.91.970820195130.24402B-100000@is> <199708202335.TAA16877@psilocin.gnu.ai.mit.edu>
From: Kenichi Handa <handa@etl.go.jp>
To: rms@gnu.ai.mit.edu
CC: eliz@is.elta.co.il, voelker@cs.washington.edu, andrewi@harlequin.co.uk
Subject: Re: EOL conversion on MSDOS and MS-Windows
Date: Thu, 21 Aug 1997 09:39:02 +0900

Richard Stallman <rms@gnu.ai.mit.edu> writes:
>     When you say that appending to a file should follow the format, do you 
>     mean only the EOL encoding, or the entire coding system?
>     It seems to me that if the EOLs are taken from the file to which the 
>     region is appended, the rest of the coding system should be also, no?
> I think so too.

I agree too.

---
Ken'ichi HANDA
handa@etl.go.jp

From rms@gnu.ai.mit.edu  Wed Aug 20 17:42:47 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Wed" "20" "August" "1997" "20:44:10" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "16" "Re: EOL conversion on MSDOS and MS-Windows" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id RAA00529 for <voelker@cs.washington.edu>; Wed, 20 Aug 1997 17:42:47 -0700
Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id UAA18067; Wed, 20 Aug 1997 20:44:10 -0400
Message-Id: <199708210044.UAA18067@psilocin.gnu.ai.mit.edu>
In-reply-to: <Pine.SUN.3.91.970820212512.24550E-100000@is> (message from Eli 	Zaretskii on Wed, 20 Aug 1997 21:36:12 +0300 (IDT))
References:  <Pine.SUN.3.91.970820212512.24550E-100000@is>
From: Richard Stallman <rms@gnu.ai.mit.edu>
To: eliz@is.elta.co.il
CC: voelker@cs.washington.edu, andrewi@harlequin.co.uk, handa@etl.go.jp
Subject: Re: EOL conversion on MSDOS and MS-Windows
Date: Wed, 20 Aug 1997 20:44:10 -0400

    Hmm... for some reason when I read a file without newlines, the coding 
    system gets set to undecided-dos,

That might be a good default on DOS.

Or perhaps, on DOS, if the buffer-file-coding-system is still
undecided when you first save the file, save it using undecided-dos
instead.  More precisely, if the eol conversion is still undecided
when saving the file, on DOS, then save it using the DOS eol
conversion.

(I am assuming that this will have no effect on files whose names or
file systems are recognized as determining which eol convention to
use; I'm assuming that in those cases buffer-file-coding-system will
specify the eol convention precisely.  If that's not true,
my proposal won't be good.)

From handa@etl.go.jp  Wed Aug 20 17:45:41 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Thu" "21" "August" "1997" "09:46:23" "+0900" "Kenichi Handa" "handa@etl.go.jp" nil "11" "Re: EOL conversion on MSDOS and MS-Windows" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from mail1-im.etl.go.jp (mail1-im.etl.go.jp [192.50.105.9]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id RAA00618 for <voelker@cs.washington.edu>; Wed, 20 Aug 1997 17:45:40 -0700
Received: from etlpom.etl.go.jp (etlpom.etl.go.jp [192.31.200.9]) by mail1-im.etl.go.jp (8.8.5/3.5Wpl1-96112918) with ESMTP 	id JAA03434; Thu, 21 Aug 1997 09:45:12 +0900 (JST)
Received: from etlken.etl.go.jp (etlken.etl.go.jp [192.31.197.11]) by etlpom.etl.go.jp (8.8.5/3.5Wpl4-ETL_MASTER) with SMTP id JAA05730; Thu, 21 Aug 1997 09:45:11 +0900 (JST)
Received: by etlken.etl.go.jp (SMI-8.6/6.4J.6-ETL.SLAVE) 	id JAA27166; Thu, 21 Aug 1997 09:46:23 +0900
Message-Id: <199708210046.JAA27166@etlken.etl.go.jp>
In-reply-to: <Pine.SUN.3.91.970820212512.24550E-100000@is> (message from Eli 	Zaretskii on Wed, 20 Aug 1997 21:36:12 +0300 (IDT))
References:  <Pine.SUN.3.91.970820212512.24550E-100000@is>
From: Kenichi Handa <handa@etl.go.jp>
To: eliz@is.elta.co.il
CC: rms@gnu.ai.mit.edu, voelker@cs.washington.edu, andrewi@harlequin.co.uk
Subject: Re: EOL conversion on MSDOS and MS-Windows
Date: Thu, 21 Aug 1997 09:46:23 +0900

Eli Zaretskii <eliz@is.elta.co.il> writes:
> I have set the coding system manually to undecided, added a newline and 
> saved it.  It got a Unix-style linefeed (as I'd expect, since that is the 
> default when EOL type is not set).

How about setting default-buffer-file-coding-system to 'undecided-dos
on DOS?

---
Ken'ichi HANDA
handa@etl.go.jp

From rms@gnu.ai.mit.edu  Wed Aug 20 22:56:34 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Thu" "21" "August" "1997" "01:54:49" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "16" "Re: EOL conversion on MSDOS and MS-Windows" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id WAA11507 for <voelker@cs.washington.edu>; Wed, 20 Aug 1997 22:56:33 -0700
Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id BAA21551; Thu, 21 Aug 1997 01:54:49 -0400
Message-Id: <199708210554.BAA21551@psilocin.gnu.ai.mit.edu>
In-reply-to: <199708210046.JAA27166@etlken.etl.go.jp> (message from Kenichi 	Handa on Thu, 21 Aug 1997 09:46:23 +0900)
References: <Pine.SUN.3.91.970820212512.24550E-100000@is> <199708210046.JAA27166@etlken.etl.go.jp>
From: Richard Stallman <rms@gnu.ai.mit.edu>
To: handa@etl.go.jp
To: eliz@is.elta.co.il
CC: rms@gnu.ai.mit.edu, voelker@cs.washington.edu, andrewi@harlequin.co.uk
Subject: Re: EOL conversion on MSDOS and MS-Windows
Date: Thu, 21 Aug 1997 01:54:49 -0400

    How about setting default-buffer-file-coding-system to 'undecided-dos
    on DOS?

That might be right, but I am not sure.
What effect would that have, in the various cases?

For example, what effect would this have when you visit a file
with no line separators in it?

What would happen if you add some newlines and save the file?

What effect would this have when you visit a file
that uses the Unix EOL convention?

What effect would this have when you create a buffer
with C-x b, and then save it in a file?

From handa@etl.go.jp  Thu Aug 21 03:59:57 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Thu" "21" "August" "1997" "20:00:51" "+0900" "Kenichi Handa" "handa@etl.go.jp" nil "39" "Re: EOL conversion on MSDOS and MS-Windows" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from mail1-im.etl.go.jp (mail1-im.etl.go.jp [192.50.105.9]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id DAA19186 for <voelker@cs.washington.edu>; Thu, 21 Aug 1997 03:59:56 -0700
Received: from etlpom.etl.go.jp (etlpom.etl.go.jp [192.31.200.9]) by mail1-im.etl.go.jp (8.8.5/3.5Wpl1-96112918) with ESMTP 	id TAA06719; Thu, 21 Aug 1997 19:59:40 +0900 (JST)
Received: from etlken.etl.go.jp (etlken.etl.go.jp [192.31.197.11]) by etlpom.etl.go.jp (8.8.5/3.5Wpl4-ETL_MASTER) with SMTP id TAA09321; Thu, 21 Aug 1997 19:59:39 +0900 (JST)
Received: by etlken.etl.go.jp (SMI-8.6/6.4J.6-ETL.SLAVE) 	id UAA27712; Thu, 21 Aug 1997 20:00:51 +0900
Message-Id: <199708211100.UAA27712@etlken.etl.go.jp>
In-reply-to: <199708210554.BAA21551@psilocin.gnu.ai.mit.edu> (message from 	Richard Stallman on Thu, 21 Aug 1997 01:54:49 -0400)
References: <Pine.SUN.3.91.970820212512.24550E-100000@is> <199708210046.JAA27166@etlken.etl.go.jp> <199708210554.BAA21551@psilocin.gnu.ai.mit.edu>
From: Kenichi Handa <handa@etl.go.jp>
To: rms@gnu.ai.mit.edu
CC: eliz@is.elta.co.il, rms@gnu.ai.mit.edu, voelker@cs.washington.edu,         andrewi@harlequin.co.uk
Subject: Re: EOL conversion on MSDOS and MS-Windows
Date: Thu, 21 Aug 1997 20:00:51 +0900

Richard Stallman <rms@gnu.ai.mit.edu> writes:
>     How about setting default-buffer-file-coding-system to 'undecided-dos
>     on DOS?

> That might be right, but I am not sure.
> What effect would that have, in the various cases?

> For example, what effect would this have when you visit a file
> with no line separators in it?

buffer-file-coding-system of the new buffer is set to 'undecided-dos
if the file contains only ASCII.  If the file contains Japanese text
encoded in iso-2022-7bit, buffer-file-coding-system is set to
iso-2022-7bit-dos.

Thus,

> What would happen if you add some newlines and save the file?

the file is saved by DOS EOL convention.

> What effect would this have when you visit a file
> that uses the Unix EOL convention?

Unix EOL convention is detected correctly, and
buffer-file-coding-system of the new buffer is set to XXXX-unix.

> What effect would this have when you create a buffer
> with C-x b, and then save it in a file?

buffer-file-coding-system of the new buffer is still nil, but since
default-buffer-file-coding-system is undecided-dos, it is saved by DOS
EOL convention.

I think all these behaviours are appropriate.
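
The four cases above can be sketched in C roughly as follows.  This is
only a simplified model, not the actual Emacs code: the names
eol_type, detect_eol and eol_for_save are hypothetical, and the
detection here looks only at the first line terminator.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the three EOL states discussed above. */
enum eol_type { EOL_UNDECIDED, EOL_UNIX, EOL_DOS };

/* Simplified detection: the first line terminator decides.  A file
   with no line separators at all stays undecided. */
enum eol_type detect_eol(const char *text, size_t len)
{
    assert(text != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if (text[i] == '\n')
            return (i > 0 && text[i - 1] == '\r') ? EOL_DOS : EOL_UNIX;
    }
    return EOL_UNDECIDED;
}

/* At save time a detected convention wins; otherwise the platform
   default applies (the DOS convention when the default is
   undecided-dos). */
enum eol_type eol_for_save(enum eol_type detected, enum eol_type dflt)
{
    return detected != EOL_UNDECIDED ? detected : dflt;
}
```

With a DOS default, a file without newlines and a fresh C-x b buffer
both come out as EOL_DOS, while a file that already uses Unix EOLs
keeps EOL_UNIX.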

---
Ken'ichi HANDA
handa@etl.go.jp

From rms@gnu.ai.mit.edu  Thu Aug 21 14:47:03 1997
X-VM-v5-Data: ([nil nil nil nil t nil nil nil nil]
	[nil "Thu" "21" "August" "1997" "17:48:24" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" "<199708212148.RAA30605@psilocin.gnu.ai.mit.edu>" "4" "Re: EOL conversion on MSDOS and MS-Windows" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id OAA23496 for <voelker@cs.washington.edu>; Thu, 21 Aug 1997 14:47:02 -0700
Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id RAA30605; Thu, 21 Aug 1997 17:48:24 -0400
Message-Id: <199708212148.RAA30605@psilocin.gnu.ai.mit.edu>
In-reply-to: <199708211100.UAA27712@etlken.etl.go.jp> (message from Kenichi 	Handa on Thu, 21 Aug 1997 20:00:51 +0900)
References: <Pine.SUN.3.91.970820212512.24550E-100000@is> <199708210046.JAA27166@etlken.etl.go.jp> <199708210554.BAA21551@psilocin.gnu.ai.mit.edu> <199708211100.UAA27712@etlken.etl.go.jp>
From: Richard Stallman <rms@gnu.ai.mit.edu>
To: handa@etl.go.jp
CC: eliz@is.elta.co.il, voelker@cs.washington.edu, andrewi@harlequin.co.uk
Subject: Re: EOL conversion on MSDOS and MS-Windows
Date: Thu, 21 Aug 1997 17:48:24 -0400

    >     How about setting default-buffer-file-coding-system to 'undecided-dos
    >     on DOS?

Ok, this sounds like a good idea.  Could someone please make the change?

From eliz@is.elta.co.il  Sun Aug 24 22:08:33 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Mon" "25" "August" "1997" "08:07:49" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" nil "9" "Re: EOL conversion on MSDOS and MS-Windows" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id WAA12224 for <voelker@cs.washington.edu>; Sun, 24 Aug 1997 22:08:27 -0700
Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) 	id IAA01344; Mon, 25 Aug 1997 08:07:50 +0300
X-Sender: eliz@is
In-Reply-To: <199708212148.RAA30605@psilocin.gnu.ai.mit.edu>
Message-ID: <Pine.SUN.3.91.970825080711.522B-100000@is>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
From: Eli Zaretskii <eliz@is.elta.co.il>
To: Richard Stallman <rms@gnu.ai.mit.edu>
cc: handa@etl.go.jp, voelker@cs.washington.edu, andrewi@harlequin.co.uk
Subject: Re: EOL conversion on MSDOS and MS-Windows
Date: Mon, 25 Aug 1997 08:07:49 +0300 (IDT)


On Thu, 21 Aug 1997, Richard Stallman wrote:

>     >     How about setting default-buffer-file-coding-system to 'undecided-dos
>     >     on DOS?
> 
> Ok, this sounds like a good idea.  Could someone please make the change?

No need to do anything, it is already set this way.  See lisp/dos-w32.el.

From eliz@is.elta.co.il  Sun Aug 24 22:24:58 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Mon" "25" "August" "1997" "08:24:22" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" nil "26" "Coding system issues (1)" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id WAA12770 for <voelker@cs.washington.edu>; Sun, 24 Aug 1997 22:24:55 -0700
Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) 	id IAA01772; Mon, 25 Aug 1997 08:24:22 +0300
X-Sender: eliz@is
Message-ID: <Pine.SUN.3.91.970825082144.522G-100000@is>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
From: Eli Zaretskii <eliz@is.elta.co.il>
To: Richard Stallman <rms@gnu.ai.mit.edu>
cc: Geoff Voelker <voelker@cs.washington.edu>,         Andrew Innes <andrewi@harlequin.co.uk>,         Kenichi Handa <handa@etl.go.jp>
Subject: Coding system issues (1)
Date: Mon, 25 Aug 1997 08:24:22 +0300 (IDT)


It seems that (setq-default enable-multibyte-characters nil) also
disables part of the DOS EOL conversions.  Specifically, if you create
a new buffer, type text there, then save the buffer, you get
Unix-style linefeeds at EOL, although the modeline quite deceptively
says "\".  E.g., try this:

	   emacs -q
	   M-: (setq-default enable-multibyte-characters nil) RET
	   C-x b my-own-buffer RET

Now type a few lines of text, then press C-x C-s foobar RET.  Exit or
suspend Emacs and look at the file foobar; you will see a Unix-style
file.

Is this so by design?

Disabling EOL conversion when multibyte characters aren't supported
might make sense on Unix (since it returns to the pre-20 behavior),
but not on DOS_NT, I think.

If you agree, then when multibyte characters support is disabled,
Emacs on DOS_NT needs either to bind coding-system-for-read/write or
call find-operation-coding-system (and then the latter should test the
value of enable-multibyte-characters to return emacs-mule-dos/unix
when it's nil).  I prefer the latter solution.

From eliz@is.elta.co.il  Sun Aug 24 22:28:45 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Mon" "25" "August" "1997" "08:28:23" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" nil "33" "Coding system issues (2)" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id WAA12878 for <voelker@cs.washington.edu>; Sun, 24 Aug 1997 22:28:43 -0700
Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) 	id IAA01802; Mon, 25 Aug 1997 08:28:24 +0300
X-Sender: eliz@is
Message-ID: <Pine.SUN.3.91.970825082425.522H-100000@is>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
From: Eli Zaretskii <eliz@is.elta.co.il>
To: Richard Stallman <rms@gnu.ai.mit.edu>
cc: Geoff Voelker <voelker@cs.washington.edu>,         Andrew Innes <andrewi@harlequin.co.uk>,         Kenichi Handa <handa@etl.go.jp>
Subject: Coding system issues (2)
Date: Mon, 25 Aug 1997 08:28:23 +0300 (IDT)


I wonder whether insert-file-contents needs to inherit the coding
system from the buffer, if it is set already (not undecided)?  Right
now, the coding system is computed afresh every time, even if REPLACE
is non-nil, or if we are inserting into a buffer which already has
some text in it.  (I'm not talking merely about DOS_NT EOL conversion 
here.)

One case where this subtlety might bite you is when you byte-compile a
.el file.  The byte compiler erases the buffer and re-reads the file
before it begins the compilation (why, btw?), so even if you had set
the coding system before that, you need to set it again with C-x RET c
before compiling.  If you forget, you might get subtle bugs when
running the .elc file, because the strings get written into it in
converted form.

I had this problem with lisp/term/internal.el which leads Emacs to
believe it's encoded in sjis.  Even if I set the coding to emacs-mule
when I visit the file, Emacs will use sjis when it re-reads the file
before compiling it.  The converted strings were used to set
case-conversion tables, so the effect of this was that Fdowncase
mysteriously stopped working for some characters: a particularly nasty
and hard-to-debug problem.

Do you agree that inserting a file into a buffer that already has a
decided coding system should use the same coding system?

If not, what about the case with byte-compiling?  Should the coding
system be bound to that of the buffer during the compilation?  I think
users should not be required to know whether a certain command calls
insert-file-contents or not.  When I set the coding system for a
buffer, I'd expect that all the operations thereafter will use that
coding system.  Wouldn't you?

From eliz@is.elta.co.il  Sun Aug 24 22:33:11 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Mon" "25" "August" "1997" "08:30:10" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" nil "55" "Re: EOL conversion on MSDOS and MS-Windows" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id WAA13012 for <voelker@cs.washington.edu>; Sun, 24 Aug 1997 22:33:10 -0700
Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) 	id IAA01814; Mon, 25 Aug 1997 08:30:11 +0300
X-Sender: eliz@is
In-Reply-To: <199708202335.TAA16877@psilocin.gnu.ai.mit.edu>
Message-ID: <Pine.SUN.3.91.970825082850.522J-100000@is>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
From: Eli Zaretskii <eliz@is.elta.co.il>
To: Richard Stallman <rms@gnu.ai.mit.edu>
cc: handa@etl.go.jp, voelker@cs.washington.edu, andrewi@harlequin.co.uk
Subject: Re: EOL conversion on MSDOS and MS-Windows
Date: Mon, 25 Aug 1997 08:30:10 +0300 (IDT)


On Wed, 20 Aug 1997, Richard Stallman wrote:

>     It seems to me that if the EOLs are taken from the file to which the 
>     region is appended, the rest of the coding system should be also, no?
> 
> I think so too.

To make append-to-file use the coding system of that file, I need to
decide where to put the test for this.

The relevant fragment from fileio.c is attached below for your
reference.

I think coding-system-for-write should take precedence over the file
to which we are appending, otherwise there would be no way for the
caller to force a specific coding system for this operation.  When
enable-multibyte-characters is nil, we shouldn't look at the coding
system of the file either, even if buffer-file-coding-system is local.
Do you agree?

If so, it seems to me that testing for the file's coding system before
the else clause and falling back to Ffind_operation_coding_system if
the file leaves the coding undecided, is the correct way.

----------- from fileio.c ------------------------------------------

  /* Decide the coding-system to be encoded to.  */
  {
    Lisp_Object val;

    if (auto_saving)
      val = Qnil;
    else if (!NILP (Vcoding_system_for_write))
      val = Vcoding_system_for_write;
    else if (NILP (current_buffer->enable_multibyte_characters))
      val = (NILP (Flocal_variable_p (Qbuffer_file_coding_system, Qnil))
	     ? Qnil
	     : Fsymbol_value (Qbuffer_file_coding_system));
    else
      {
	Lisp_Object args[7], coding_systems;

	args[0] = Qwrite_region, args[1] = start, args[2] = end,
	  args[3] = filename, args[4] = append, args[5] = visit,
	  args[6] = lockname;
	coding_systems = Ffind_operation_coding_system (7, args);
	val = (CONSP (coding_systems) && !NILP (XCONS (coding_systems)->cdr)
	       ? XCONS (coding_systems)->cdr
	       : current_buffer->buffer_file_coding_system);
      }
    setup_coding_system (Fcheck_coding_system (val), &coding); 
    if (!STRINGP (start) && !NILP (current_buffer->selective_display))
      coding.selective = 1;
  }
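
The precedence proposed above for append-to-file could look roughly
like the following stand-alone sketch.  It is not a patch against
fileio.c: struct write_ctx and choose_write_coding are hypothetical
names, coding systems are modelled as plain strings, and NULL stands
for nil or "undecided".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical snapshot of the inputs to the decision. */
struct write_ctx {
    int auto_saving;
    const char *coding_system_for_write; /* NULL means nil */
    int multibyte_enabled;
    const char *local_buffer_coding;     /* NULL if not buffer-local */
    int appending;
    const char *target_file_coding;      /* NULL if undecided */
    const char *operation_coding;        /* NULL if none was found */
    const char *buffer_coding;
};

/* Pick the coding system to encode with, in the proposed order. */
const char *choose_write_coding(const struct write_ctx *c)
{
    assert(c != NULL);
    if (c->auto_saving)
        return NULL;
    if (c->coding_system_for_write != NULL)
        return c->coding_system_for_write;   /* caller's choice wins */
    if (!c->multibyte_enabled)
        return c->local_buffer_coding;       /* target file is ignored */
    if (c->appending && c->target_file_coding != NULL)
        return c->target_file_coding;        /* follow the target file */
    if (c->operation_coding != NULL)
        return c->operation_coding;          /* fallback lookup */
    return c->buffer_coding;
}
```

The only new step compared with the fragment above is the test of
target_file_coding, placed just before the fallback lookup.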

From eliz@is.elta.co.il  Sun Aug 24 22:37:01 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Mon" "25" "August" "1997" "08:36:03" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" nil "22" "Coding system issues (3)" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id WAA13129 for <voelker@cs.washington.edu>; Sun, 24 Aug 1997 22:36:59 -0700
Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) 	id IAA01868; Mon, 25 Aug 1997 08:36:04 +0300
X-Sender: eliz@is
Message-ID: <Pine.SUN.3.91.970825083345.522K-100000@is>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
From: Eli Zaretskii <eliz@is.elta.co.il>
To: Richard Stallman <rms@gnu.ai.mit.edu>
cc: Geoff Voelker <voelker@cs.washington.edu>,         Andrew Innes <andrewi@harlequin.co.uk>,         Kenichi Handa <handa@etl.go.jp>
Subject: Coding system issues (3)
Date: Mon, 25 Aug 1997 08:36:03 +0300 (IDT)


There's something in autodetection of a file's coding system which I
find deeply disturbing: it gets in my way when I edit e.g. C sources
with strings that include ASCII characters with the high bit set.  For
example, try to load src/msdos.c or lisp/term/internal.el.  You will
get no-conversion in the first case (which means CRLFs won't be
converted if that file is in DOS format) and in the second you get sjis.
Which is dead wrong in both cases: these are just tables of ASCII
characters with codes beyond 127.

Now, I understand that Emacs cannot possibly know what I meant when
I put such strings into the file.  These strings might as well be text
in some language other than English, right?  But what annoys me is that
I need to set the coding system explicitly each time I visit these files
to see them as God intended.

Am I missing some function or variable?

If not, then do I have any other way except local file variables to
tell Emacs these are ASCII files?  With major modes, we can specify
the mode on the first nonblank line if we don't like Emacs' choice,
but there seems to be no such feature for coding systems.

From handa@etl.go.jp  Sun Aug 24 23:00:50 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Mon" "25" "August" "1997" "15:01:28" "+0900" "Kenichi Handa" "handa@etl.go.jp" nil "39" "Re: Coding system issues (3)" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from mail1-im.etl.go.jp (mail1-im.etl.go.jp [192.50.105.9]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id XAA13966 for <voelker@cs.washington.edu>; Sun, 24 Aug 1997 23:00:49 -0700
Received: from etlpom.etl.go.jp (etlpom.etl.go.jp [192.31.200.9]) by mail1-im.etl.go.jp (8.8.5/3.5Wpl1-96112918) with ESMTP 	id PAA14641; Mon, 25 Aug 1997 15:00:23 +0900 (JST)
Received: from etlken.etl.go.jp (etlken.etl.go.jp [192.31.197.11]) by etlpom.etl.go.jp (8.8.5/3.5Wpl4-ETL_MASTER) with SMTP id PAA09544; Mon, 25 Aug 1997 15:00:20 +0900 (JST)
Received: by etlken.etl.go.jp (SMI-8.6/6.4J.6-ETL.SLAVE) 	id PAA01874; Mon, 25 Aug 1997 15:01:28 +0900
Message-Id: <199708250601.PAA01874@etlken.etl.go.jp>
In-reply-to: <Pine.SUN.3.91.970825083345.522K-100000@is> (message from Eli 	Zaretskii on Mon, 25 Aug 1997 08:36:03 +0300 (IDT))
References:  <Pine.SUN.3.91.970825083345.522K-100000@is>
From: Kenichi Handa <handa@etl.go.jp>
To: eliz@is.elta.co.il
CC: rms@gnu.ai.mit.edu, voelker@cs.washington.edu, andrewi@harlequin.co.uk
Subject: Re: Coding system issues (3)
Date: Mon, 25 Aug 1997 15:01:28 +0900

Eli Zaretskii <eliz@is.elta.co.il> writes:
> Now, I understand that Emacs cannot possibly know what I meant when
> I put such strings into the file.  These strings might as well be text
> in some language other than English, right?  But what annoys me is that
> I need to set the coding system explicitly each time I visit these files
> to see them as God intended.

> Am I missing some function or variable?

> If not, then do I have any other way except local file variables to
> tell Emacs these are ASCII files?  With major modes, we can specify
> the mode on the first nonblank line if we don't like Emacs' choice,
> but there seems to be no such feature for coding systems.

In the latest pretest, 

Richard Stallman <rms@gnu.ai.mit.edu> writes:
> I have made a new pretest, which is tarring up now.
> It will soon be in gnu/emacs/{emacs.xtar.gz,leim.xtar.gz}
> on alpha.gnu.ai.mit.edu.

> It has an important new feature:
> You can specify the coding system for a file using the -*-
> construct.  Include `coding: CODINGSYSTEM;' inside the -*-...-*-
> to specify use of coding system CODINGSYSTEM.

So, we can use this feature for src/msdos.c and lisp/term/internal.el.
But, since I followed the way the `mode' tag is handled, the `coding'
tag must also be on the first line of a file, which requires making
the first line of internal.el very long.
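
As an illustration of the first-line restriction, a lookup of the
`coding' tag might be sketched like this.  It is a stand-alone
approximation, not the code in the pretest: find_coding_tag is a
hypothetical helper, and it only handles the `coding: NAME' form
between the first pair of -*- markers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy the value of a `coding:' tag found between the first pair of
   -*- markers on the first line of TEXT into OUT (capacity OUTSZ).
   Returns 1 on success and 0 if no usable tag is found. */
int find_coding_tag(const char *text, char *out, size_t outsz)
{
    assert(text != NULL && out != NULL && outsz > 0);

    /* Only the first line is consulted. */
    const char *end_of_line = text + strcspn(text, "\n");

    const char *open = strstr(text, "-*-");
    if (open == NULL || open >= end_of_line)
        return 0;
    open += 3;

    const char *close = strstr(open, "-*-");
    if (close == NULL || close >= end_of_line)
        return 0;

    const char *tag = strstr(open, "coding:");
    if (tag == NULL || tag >= close)
        return 0;
    tag += strlen("coding:");
    while (tag < close && (*tag == ' ' || *tag == '\t'))
        tag++;

    /* The value ends at `;', at whitespace, or at the closing marker. */
    size_t n = 0;
    while (tag + n < close && tag[n] != ';' && tag[n] != ' '
           && tag[n] != '\t')
        n++;
    if (n == 0 || n >= outsz)
        return 0;
    memcpy(out, tag, n);
    out[n] = '\0';
    return 1;
}
```

A tag on the second line is ignored by this sketch, which is the
restriction in question; consulting more lines would mean moving
end_of_line further down.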

Richard, shouldn't we loosen this restriction at least for `coding'
tag?  How about consulting at least the first three lines?

By the way, src/msdos.c has Unix-like EOL now, doesn't it?

---
Ken'ichi HANDA
handa@etl.go.jp

From handa@etl.go.jp  Sun Aug 24 23:35:15 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Mon" "25" "August" "1997" "15:36:07" "+0900" "Kenichi Handa" "handa@etl.go.jp" nil "47" "Re: Coding system issues (2)" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from mail1-im.etl.go.jp (mail1-im.etl.go.jp [192.50.105.9]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id XAA15294 for <voelker@cs.washington.edu>; Sun, 24 Aug 1997 23:35:13 -0700
Received: from etlpom.etl.go.jp (etlpom.etl.go.jp [192.31.200.9]) by mail1-im.etl.go.jp (8.8.5/3.5Wpl1-96112918) with ESMTP 	id PAA16683; Mon, 25 Aug 1997 15:34:57 +0900 (JST)
Received: from etlken.etl.go.jp (etlken.etl.go.jp [192.31.197.11]) by etlpom.etl.go.jp (8.8.5/3.5Wpl4-ETL_MASTER) with SMTP id PAA11933; Mon, 25 Aug 1997 15:34:56 +0900 (JST)
Received: by etlken.etl.go.jp (SMI-8.6/6.4J.6-ETL.SLAVE) 	id PAA01967; Mon, 25 Aug 1997 15:36:07 +0900
Message-Id: <199708250636.PAA01967@etlken.etl.go.jp>
In-reply-to: <Pine.SUN.3.91.970825082425.522H-100000@is> (message from Eli 	Zaretskii on Mon, 25 Aug 1997 08:28:23 +0300 (IDT))
References:  <Pine.SUN.3.91.970825082425.522H-100000@is>
From: Kenichi Handa <handa@etl.go.jp>
To: eliz@is.elta.co.il
CC: rms@gnu.ai.mit.edu, voelker@cs.washington.edu, andrewi@harlequin.co.uk
Subject: Re: Coding system issues (2)
Date: Mon, 25 Aug 1997 15:36:07 +0900

Eli Zaretskii <eliz@is.elta.co.il> writes:
> I wonder whether insert-file-contents needs to inherit the coding
> system from the buffer, if it is set already (not undecided)?  Right
> now, the coding system is computed afresh every time, even if REPLACE
> is non-nil, or if we are inserting into a buffer which already has
> some text in it.  (I'm not talking merely about DOS_NT EOL conversion 
> here.)

I don't agree.  The coding system of a file being read should be
decided only by the file contents unless the coding system is
specified explicitly.

> One case where this subtlety might bite you is when you byte-compile a
> .el file.  The byte compiler erases the buffer and re-reads the file
> before it begins the compilation (why, btw?), so even if you had set
> the coding system before that, you need to set it again with C-x RET c
> before compiling.  If you forget, you might get subtle bugs when
> running the .elc file, because the strings get written into it in
> converted form.

> I had this problem with lisp/term/internal.el which leads Emacs to
> believe it's encoded in sjis.  Even if I set the coding to emacs-mule
> when I visit the file, Emacs will use sjis when it re-reads the file
> before compiling it.  The converted strings were used to set
> case-conversion tables, so the effect of this was that Fdowncase
> mysteriously stopped working for some characters: a particularly nasty
> and hard-to-debug problem.

This can be avoided by putting `coding' tag at the head of a file as I
wrote before.

By the way, I don't think it is a good idea to have random binary
codes in a source file.  For instance, in the case of internal.el (I
have just noticed the existence of this file), we can use backslash
notation (e.g. "\207") instead of putting raw binary codes in strings
to keep the case information.
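
The effect of the backslash notation can be shown with C, where octal
escapes in strings work the same way.  This is only an illustration:
the value 207 is taken from the example above, and is_pure_ascii,
as_typed and as_read are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Return 1 if every one of the LEN bytes in BUF is 7-bit ASCII. */
int is_pure_ascii(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if (buf[i] > 127)
            return 0;
    }
    return 1;
}

/* The six characters as they appear in the source file: a quote, a
   backslash, the digits 2 0 7, and a quote. */
const unsigned char as_typed[] = "\"\\207\"";

/* What the reader produces from them: one byte, octal 207 (0x87),
   followed by the terminating NUL. */
const unsigned char as_read[] = "\207";
```

The source text stays pure ASCII, so no editor or coding-system
detection can be confused by it, yet the string still carries the
byte 0x87.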

With the current file, if a user of a Japanese or Chinese version of
Windows opens the file in their own editor (not Emacs), they will
surely corrupt the file contents.

And, I think the file name "internal.el" is not appropriate.
Something like "codepage.el" is better.  What do you think?

---
Ken'ichi HANDA
handa@etl.go.jp

From rms@gnu.ai.mit.edu  Mon Aug 25 00:08:41 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Mon" "25" "August" "1997" "02:42:12" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "3" "New pretest" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id AAA16339 for <voelker@cs.washington.edu>; Mon, 25 Aug 1997 00:08:40 -0700
Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id CAA12978; Mon, 25 Aug 1997 02:42:12 -0400
Message-Id: <199708250642.CAA12978@psilocin.gnu.ai.mit.edu>
Sent-via-bcc-to: Emacs pretesters
From: Richard Stallman <rms@gnu.ai.mit.edu>
To: rms@gnu.ai.mit.edu
Subject: New pretest
Date: Mon, 25 Aug 1997 02:42:12 -0400

There is a new pretest in the usual place:
gnu/emacs/emacs.xtar.gz and gnu/emacs/leim.xtar.gz
on alpha.gnu.ai.mit.edu.

From handa@etl.go.jp  Mon Aug 25 01:38:49 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Mon" "25" "August" "1997" "17:39:23" "+0900" "Kenichi Handa" "handa@etl.go.jp" nil "18" "Re: EOL conversion on MSDOS and MS-Windows" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from mail1-im.etl.go.jp (mail1-im.etl.go.jp [192.50.105.9]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id BAA20012 for <voelker@cs.washington.edu>; Mon, 25 Aug 1997 01:38:47 -0700
Received: from etlpom.etl.go.jp (etlpom.etl.go.jp [192.31.200.9]) by mail1-im.etl.go.jp (8.8.5/3.5Wpl1-96112918) with ESMTP 	id RAA23762; Mon, 25 Aug 1997 17:38:14 +0900 (JST)
Received: from etlken.etl.go.jp (etlken.etl.go.jp [192.31.197.11]) by etlpom.etl.go.jp (8.8.5/3.5Wpl4-ETL_MASTER) with SMTP id RAA18640; Mon, 25 Aug 1997 17:38:12 +0900 (JST)
Received: by etlken.etl.go.jp (SMI-8.6/6.4J.6-ETL.SLAVE) 	id RAA02071; Mon, 25 Aug 1997 17:39:23 +0900
Message-Id: <199708250839.RAA02071@etlken.etl.go.jp>
In-reply-to: <Pine.SUN.3.91.970825082850.522J-100000@is> (message from Eli 	Zaretskii on Mon, 25 Aug 1997 08:30:10 +0300 (IDT))
References:  <Pine.SUN.3.91.970825082850.522J-100000@is>
From: Kenichi Handa <handa@etl.go.jp>
To: eliz@is.elta.co.il
CC: rms@gnu.ai.mit.edu, voelker@cs.washington.edu, andrewi@harlequin.co.uk
Subject: Re: EOL conversion on MSDOS and MS-Windows
Date: Mon, 25 Aug 1997 17:39:23 +0900

Eli Zaretskii <eliz@is.elta.co.il> writes:
> I think coding-system-for-write should take precedence over the file
> to which we are appending, otherwise there would be no way for the
> caller to force a specific coding system for this operation.

I agree.

> When enable-multibyte-characters is nil, we shouldn't look at the
> coding system of the file either, even if buffer-file-coding-system
> is local.  Do you agree?

I'm not sure.  Even if enable-multibyte-characters is nil, at least
EOL format is detected by insert-file-contents.  So, append-to-file
had better detect at least EOL format.  What do you think?

---
Ken'ichi HANDA
handa@etl.go.jp

From eliz@is.elta.co.il  Mon Aug 25 02:04:55 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Mon" "25" "August" "1997" "12:04:22" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" nil "14" "Re: EOL conversion on MSDOS and MS-Windows" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id CAA21974 for <voelker@cs.washington.edu>; Mon, 25 Aug 1997 02:04:53 -0700
Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) 	id MAA03044; Mon, 25 Aug 1997 12:04:23 +0300
X-Sender: eliz@is
In-Reply-To: <199708250839.RAA02071@etlken.etl.go.jp>
Message-ID: <Pine.SUN.3.91.970825120245.3032C-100000@is>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
From: Eli Zaretskii <eliz@is.elta.co.il>
To: Kenichi Handa <handa@etl.go.jp>
cc: rms@gnu.ai.mit.edu, voelker@cs.washington.edu, andrewi@harlequin.co.uk
Subject: Re: EOL conversion on MSDOS and MS-Windows
Date: Mon, 25 Aug 1997 12:04:22 +0300 (IDT)


On Mon, 25 Aug 1997, Kenichi Handa wrote:

> > When enable-multibyte-characters is nil, we shouldn't look at the
> > coding system of the file either, even if buffer-file-coding-system
> > is local.  Do you agree?
> 
> I'm not sure.  Even if enable-multibyte-characters is nil, at least
> EOL format is detected by insert-file-contents.  So, append-to-file
> had better detect at least EOL format.  What do you think?

Maybe I just don't understand well enough why that test for 
buffer-file-coding-system being a local variable is at all
required?  Can you explain it to me?

From eliz@is.elta.co.il  Mon Aug 25 03:10:41 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Mon" "25" "August" "1997" "13:10:16" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" nil "19" "Re: Coding system issues (3)" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id DAA24018 for <voelker@cs.washington.edu>; Mon, 25 Aug 1997 03:10:39 -0700
Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) 	id NAA03366; Mon, 25 Aug 1997 13:10:17 +0300
X-Sender: eliz@is
In-Reply-To: <199708250601.PAA01874@etlken.etl.go.jp>
Message-ID: <Pine.SUN.3.91.970825130559.3327C-100000@is>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
From: Eli Zaretskii <eliz@is.elta.co.il>
To: Kenichi Handa <handa@etl.go.jp>
cc: rms@gnu.ai.mit.edu, voelker@cs.washington.edu, andrewi@harlequin.co.uk
Subject: Re: Coding system issues (3)
Date: Mon, 25 Aug 1997 13:10:16 +0300 (IDT)


On Mon, 25 Aug 1997, Kenichi Handa wrote:

> > construct.  Include `coding: CODINGSYSTEM;' inside the -*-...-*-
> > to specify use of coding system CODINGSYSTEM.
> 
> So, we can use this feature for src/msdos.c and lisp/term/internal.el.

I've seen Richard's announcement after I wrote the message.  I will add 
"coding: " settings to those two files.

> By the way, src/msdos.c has Unix-like EOL now, doesn't it?

Yes.  So if you stay with the same version of Emacs, you won't see any 
problem.  But I also loaded msdos.c edited by a previous version of 
Emacs, which always added CRs, and then I saw all those ^M characters.

Besides, even with Unix EOLs, msdos.c causes Emacs to put "=" on the 
modeline, which means binary file.  This is not quite right.

From eliz@is.elta.co.il  Mon Aug 25 03:22:20 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Mon" "25" "August" "1997" "13:21:54" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" nil "36" "Re: Coding system issues (2)" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id DAA24251 for <voelker@cs.washington.edu>; Mon, 25 Aug 1997 03:22:18 -0700
Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) 	id NAA03393; Mon, 25 Aug 1997 13:21:55 +0300
X-Sender: eliz@is
In-Reply-To: <199708250636.PAA01967@etlken.etl.go.jp>
Message-ID: <Pine.SUN.3.91.970825131532.3327E-100000@is>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
From: Eli Zaretskii <eliz@is.elta.co.il>
To: Kenichi Handa <handa@etl.go.jp>
cc: rms@gnu.ai.mit.edu, voelker@cs.washington.edu, andrewi@harlequin.co.uk
Subject: Re: Coding system issues (2)
Date: Mon, 25 Aug 1997 13:21:54 +0300 (IDT)


On Mon, 25 Aug 1997, Kenichi Handa wrote:

> > I had this problem with lisp/term/internal.el which leads Emacs to
> > believe it's encoded in sjis.  Even if I set the coding to emacs-mule
> > when I visit the file, Emacs will use sjis when it re-reads the file
> > before compiling it.  The converted strings were used to set
> > case-conversion tables, so the effect of this was that Fdowncase
> > mysteriously stopped working for some characters: a particularly nasty
> > and hard-to-debug problem.
> 
> This can be avoided by putting `coding' tag at the head of a file as I
> wrote before.

No, it's not good enough.  Users can override the coding tag with
C-x c RET  when they loaded the file.  It is IMHO not nice to request 
that they use C-x c RET again before invoking the byte compiler.

> By the way, I don't think it is a good idea to have random binary
> codes in a source file.  For instance, in the case of internal.el (I
> have just noticed the existence of this file), we can use backslash
> notation (e.g. "\207") instead of putting raw binary code in string to
> keep the information of cases.

Sure, but this is much harder for the programmer ;-).

> And, I think the file name "internal.el" is not appropriate.
> Something like "codepage.el" is better.  What do you think?

The truth is, those case tables should be nuked.  As soon as I learn 
enough about international languages support in Emacs, I will change that 
part to set a specific language environment according to the DOS 
codepage.

Other than that, internal.el does perform terminal-specific stuff, like 
key remapping etc.

From handa@etl.go.jp  Mon Aug 25 03:34:17 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Mon" "25" "August" "1997" "19:34:59" "+0900" "Kenichi Handa" "handa@etl.go.jp" nil "26" "Re: EOL conversion on MSDOS and MS-Windows" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from mail1-im.etl.go.jp (mail1-im.etl.go.jp [192.50.105.9]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id DAA24541 for <voelker@cs.washington.edu>; Mon, 25 Aug 1997 03:34:16 -0700
Received: from etlpom.etl.go.jp (etlpom.etl.go.jp [192.31.200.9]) by mail1-im.etl.go.jp (8.8.5/3.5Wpl1-96112918) with ESMTP 	id TAA00163; Mon, 25 Aug 1997 19:33:51 +0900 (JST)
Received: from etlken.etl.go.jp (etlken.etl.go.jp [192.31.197.11]) by etlpom.etl.go.jp (8.8.5/3.5Wpl4-ETL_MASTER) with SMTP id TAA24346; Mon, 25 Aug 1997 19:33:50 +0900 (JST)
Received: by etlken.etl.go.jp (SMI-8.6/6.4J.6-ETL.SLAVE) 	id TAA02242; Mon, 25 Aug 1997 19:34:59 +0900
Message-Id: <199708251034.TAA02242@etlken.etl.go.jp>
In-reply-to: <Pine.SUN.3.91.970825120245.3032C-100000@is> (message from Eli 	Zaretskii on Mon, 25 Aug 1997 12:04:22 +0300 (IDT))
References:  <Pine.SUN.3.91.970825120245.3032C-100000@is>
From: Kenichi Handa <handa@etl.go.jp>
To: eliz@is.elta.co.il
CC: rms@gnu.ai.mit.edu, voelker@cs.washington.edu, andrewi@harlequin.co.uk
Subject: Re: EOL conversion on MSDOS and MS-Windows
Date: Mon, 25 Aug 1997 19:34:59 +0900

Eli Zaretskii <eliz@is.elta.co.il> writes:
>> I'm not sure.  Even if enable-multibyte-characters is nil, at least
>> EOL format is detected by insert-file-contents.  So, append-to-file
>> had better detect at least EOL format.  What do you think?

> Maybe I just don't understand well enough why that test for 
> buffer-file-coding-system being a local variable is at all
> required?  Can you explain it to me?

buffer-file-coding-system being set locally means that the file was
read with some kind of code conversion regardless of the current value
of enable-multibyte-characters.  There are two cases which cause this
situation.

1) The file was read before enable-multibyte-characters is set to nil.

2) EOL format of the file was not that of Unix files.

In both cases, I thought it was safer to encode the file with the same
coding system used for decoding.
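
A minimal sketch of that idea, written in Python rather than Emacs
Lisp and not taken from the Emacs sources: remember the EOL format
found while decoding and reuse it when encoding.  The function names
are hypothetical, and the sketch assumes the file uses one EOL format
consistently.

```python
def decode_eol(data: bytes):
    """Return (text with LF-only line ends, EOL sequence found in data).

    Assumes a single, consistent EOL format throughout the data.
    """
    if b"\r\n" in data:
        eol = b"\r\n"      # DOS-style
    elif b"\r" in data:
        eol = b"\r"        # Mac-style
    else:
        eol = b"\n"        # Unix-style (or no line ends at all)
    return data.replace(eol, b"\n"), eol


def encode_eol(text: bytes, eol: bytes) -> bytes:
    """Write LF-only text back with the EOL sequence used for decoding."""
    return text.replace(b"\n", eol)
```

Text appended to the decoded buffer then comes out in the file's
original EOL format when it is encoded again.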

Does this explanation help you?

---
Ken'ichi HANDA
handa@etl.go.jp

From handa@etl.go.jp  Mon Aug 25 03:53:45 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Mon" "25" "August" "1997" "19:54:42" "+0900" "Kenichi Handa" "handa@etl.go.jp" nil "28" "Re: Coding system issues (3)" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from mail1-im.etl.go.jp (mail1-im.etl.go.jp [192.50.105.9]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id DAA24934 for <voelker@cs.washington.edu>; Mon, 25 Aug 1997 03:53:45 -0700
Received: from etlpom.etl.go.jp (etlpom.etl.go.jp [192.31.200.9]) by mail1-im.etl.go.jp (8.8.5/3.5Wpl1-96112918) with ESMTP 	id TAA01114; Mon, 25 Aug 1997 19:53:32 +0900 (JST)
Received: from etlken.etl.go.jp (etlken.etl.go.jp [192.31.197.11]) by etlpom.etl.go.jp (8.8.5/3.5Wpl4-ETL_MASTER) with SMTP id TAA25158; Mon, 25 Aug 1997 19:53:31 +0900 (JST)
Received: by etlken.etl.go.jp (SMI-8.6/6.4J.6-ETL.SLAVE) 	id TAA02279; Mon, 25 Aug 1997 19:54:42 +0900
Message-Id: <199708251054.TAA02279@etlken.etl.go.jp>
In-reply-to: <Pine.SUN.3.91.970825130559.3327C-100000@is> (message from Eli 	Zaretskii on Mon, 25 Aug 1997 13:10:16 +0300 (IDT))
References:  <Pine.SUN.3.91.970825130559.3327C-100000@is>
From: Kenichi Handa <handa@etl.go.jp>
To: eliz@is.elta.co.il
CC: rms@gnu.ai.mit.edu, voelker@cs.washington.edu, andrewi@harlequin.co.uk
Subject: Re: Coding system issues (3)
Date: Mon, 25 Aug 1997 19:54:42 +0900

Eli Zaretskii <eliz@is.elta.co.il> writes:
> I've seen Richard's announcement after I wrote the message.  I will add 
> "coding: " settings to those two files.

I recommend using backslash notation in those files.  Then there's
no need for `coding:' tags.

>> By the way, src/msdos.c has Unix-like EOL now, doesn't it?

> Yes.  So if you stay with the same version of Emacs, you won't see any 
> problem.  But I also loaded msdos.c edited by a previous version of 
> Emacs, which always added CRs, and then I saw all those ^M characters.

Yah!  Hmmm.  If a file contains random 8-bit code which doesn't fit
the coding system emacs-mule, it is detected as binary file.  This is
a difficult problem.  How can we distinguish such a file from a truly
binary file which doesn't require any EOL conversion?

> Besides, even with Unix EOLs, msdos.c causes Emacs to put "=" on the 
> modeline, which means binary file.  This is not quite right.

Why?  The file doesn't need EOL conversion.  In addition, the file
contains random 8bit codes.  So, it should be read/written without any
code conversion.

---
Ken'ichi HANDA
handa@etl.go.jp

From eliz@is.elta.co.il  Mon Aug 25 03:58:16 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Mon" "25" "August" "1997" "13:57:54" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" nil "14" "Re: EOL conversion on MSDOS and MS-Windows" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id DAA25028 for <voelker@cs.washington.edu>; Mon, 25 Aug 1997 03:58:14 -0700
Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) 	id NAA03490; Mon, 25 Aug 1997 13:57:55 +0300
X-Sender: eliz@is
In-Reply-To: <199708251034.TAA02242@etlken.etl.go.jp>
Message-ID: <Pine.SUN.3.91.970825135656.3327M-100000@is>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
From: Eli Zaretskii <eliz@is.elta.co.il>
To: Kenichi Handa <handa@etl.go.jp>
cc: rms@gnu.ai.mit.edu, voelker@cs.washington.edu, andrewi@harlequin.co.uk
Subject: Re: EOL conversion on MSDOS and MS-Windows
Date: Mon, 25 Aug 1997 13:57:54 +0300 (IDT)


On Mon, 25 Aug 1997, Kenichi Handa wrote:

> 1) The file was read before enable-multibyte-characters is set to nil.
> 
> 2) EOL format of the file was not that of Unix files.
> 
> In both cases, I thought it was safer to encode the file with the same
> coding system used for decoding.
> 
> Does this explanation help you?

Yes, thanks.  I will make the patches when I build the next pretest and 
send them to you all for reviewing.

From eliz@is.elta.co.il  Mon Aug 25 04:07:44 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Mon" "25" "August" "1997" "14:07:13" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" nil "24" "Re: Coding system issues (3)" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id EAA25222 for <voelker@cs.washington.edu>; Mon, 25 Aug 1997 04:07:42 -0700
Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) 	id OAA03537; Mon, 25 Aug 1997 14:07:14 +0300
X-Sender: eliz@is
In-Reply-To: <199708251054.TAA02279@etlken.etl.go.jp>
Message-ID: <Pine.SUN.3.91.970825140112.3327O-100000@is>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
From: Eli Zaretskii <eliz@is.elta.co.il>
To: Kenichi Handa <handa@etl.go.jp>
cc: rms@gnu.ai.mit.edu, voelker@cs.washington.edu, andrewi@harlequin.co.uk
Subject: Re: Coding system issues (3)
Date: Mon, 25 Aug 1997 14:07:13 +0300 (IDT)


On Mon, 25 Aug 1997, Kenichi Handa wrote:

> Yah!  Hmmm.  If a file contains random 8-bit code which doesn't fit
> the coding system emacs-mule, it is detected as binary file.  This is
> a difficult problem.  How can we distinguish such a file from a truly
> binary file which doesn't require any EOL conversion?

This has been my concern since I first looked at src/coding.c.  The
`coding' tag will have to be the stopgap for now.  Another solution is to
use `find-file-text'.  Perhaps some heuristic could be added in future
based on the relative frequency of CRLF pairs and the binary characters. 
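
A rough sketch of what such a heuristic might look like, in Python
rather than Emacs code.  Nothing here comes from src/coding.c: the
function name and the 5% threshold are assumptions made up for
illustration.  The idea is to treat a file as DOS text when every LF
is part of a CRLF pair and control characters are rare, even if the
file contains 8-bit bytes.

```python
def looks_like_dos_text(data: bytes, max_control_ratio: float = 0.05) -> bool:
    """Guess whether data is text with CRLF line ends (hypothetical heuristic)."""
    if not data:
        return False
    crlf = data.count(b"\r\n")
    lone_lf = data.count(b"\n") - crlf
    # Control bytes other than TAB, LF, FF and CR hint at a binary file.
    control = sum(1 for b in data
                  if b < 0x20 and b not in (0x09, 0x0A, 0x0C, 0x0D))
    return (crlf > 0
            and lone_lf == 0
            and control / len(data) <= max_control_ratio)
```

Bytes above 0x7F are deliberately not counted as binary, so a file
written in a DOS codepage still qualifies as text.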

> > Besides, even with Unix EOLs, msdos.c causes Emacs to put "=" on the 
> > modeline, which means binary file.  This is not quite right.
> 
> Why?  The file doesn't need EOL conversion.  In addition, the file
> contains random 8bit codes.  So, it should be read/written without any
> code conversion.

The modeline is not for Emacs, it's for the user.  The user should NOT 
see "=" when the file is a text file.  Otherwise, we will need to 
resurrect the T: and B: that we have just nuked in 20.0.93 (because we 
agreed that the coding system tells enough).

From handa@etl.go.jp  Mon Aug 25 04:14:30 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Mon" "25" "August" "1997" "20:14:49" "+0900" "Kenichi Handa" "handa@etl.go.jp" nil "49" "Re: Coding system issues (2)" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from mail1-im.etl.go.jp (mail1-im.etl.go.jp [192.50.105.9]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id EAA25420 for <voelker@cs.washington.edu>; Mon, 25 Aug 1997 04:14:29 -0700
Received: from etlpom.etl.go.jp (etlpom.etl.go.jp [192.31.200.9]) by mail1-im.etl.go.jp (8.8.5/3.5Wpl1-96112918) with ESMTP 	id UAA02229; Mon, 25 Aug 1997 20:13:39 +0900 (JST)
Received: from etlken.etl.go.jp (etlken.etl.go.jp [192.31.197.11]) by etlpom.etl.go.jp (8.8.5/3.5Wpl4-ETL_MASTER) with SMTP id UAA26297; Mon, 25 Aug 1997 20:13:38 +0900 (JST)
Received: by etlken.etl.go.jp (SMI-8.6/6.4J.6-ETL.SLAVE) 	id UAA02309; Mon, 25 Aug 1997 20:14:49 +0900
Message-Id: <199708251114.UAA02309@etlken.etl.go.jp>
In-reply-to: <Pine.SUN.3.91.970825131532.3327E-100000@is> (message from Eli 	Zaretskii on Mon, 25 Aug 1997 13:21:54 +0300 (IDT))
References:  <Pine.SUN.3.91.970825131532.3327E-100000@is>
From: Kenichi Handa <handa@etl.go.jp>
To: eliz@is.elta.co.il
CC: rms@gnu.ai.mit.edu, voelker@cs.washington.edu, andrewi@harlequin.co.uk
Subject: Re: Coding system issues (2)
Date: Mon, 25 Aug 1997 20:14:49 +0900

Eli Zaretskii <eliz@is.elta.co.il> writes:
>> > I had this problem with lisp/term/internal.el which leads Emacs to
>> > believe it's encoded in sjis.  Even if I set the coding to emacs-mule
>> > when I visit the file, Emacs will use sjis when it re-reads the file
>> > before compiling it.  The converted strings were used to set
>> > case-conversion tables, so the effect of this was that Fdowncase
>> > mysteriously stopped working for some characters: a particularly nasty
>> > and hard-to-debug problem.
>> 
>> This can be avoided by putting `coding' tag at the head of a file as I
>> wrote before.

> No, it's not good enough.  Users can override the coding tag with
> C-x c RET  when they loaded the file.  It is IMHO not nice to request 
> that they use C-x c RET again before invoking the byte compiler.

I don't understand why they dare to load the file by C-x RET c (not
C-x c RET)?  Anyway, if there's a reason to use C-x RET c, it means
that coding tag is not correct and the tag should be modified
correctly.

>> By the way, I don't think it is a good idea to have random binary
>> codes in a source file.  For instance, in the case of internal.el (I
>> have just noticed the existence of this file), we can use backslash
>> notation (e.g. "\207") instead of putting raw binary code in string to
>> keep the information of cases.

> Sure, but this is much harder for the programmer ;-).

I don't know why putting those raw 8bit codes is easier for
programmers.  When you add more dos-codepage support (e.g. Slavic,
Turkish), you anyway can't see correct characters.  And, the current
code of internal.el doesn't work for multibyte characters.  To make it
work for multibyte characters, I think the better way is to change
those strings to vectors of multibyte characters.

> The truth is, those case tables should be nuked.  As soon as I learn 
> enough about international languages support in Emacs, I will change that 
> part to set a specific language environment according to the DOS 
> codepage.

> Other than that, internal.el does perform terminal-specific stuff, like 
> key remapping etc.

I see.

---
Ken'ichi HANDA
handa@etl.go.jp

From handa@etl.go.jp  Mon Aug 25 05:38:34 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Mon" "25" "August" "1997" "21:39:16" "+0900" "Kenichi Handa" "handa@etl.go.jp" nil "56" "Re: Coding system issues (3)" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from mail1-im.etl.go.jp (mail1-im.etl.go.jp [192.50.105.9]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id FAA27871 for <voelker@cs.washington.edu>; Mon, 25 Aug 1997 05:38:33 -0700
Received: from etlpom.etl.go.jp (etlpom.etl.go.jp [192.31.200.9]) by mail1-im.etl.go.jp (8.8.5/3.5Wpl1-96112918) with ESMTP 	id VAA05277; Mon, 25 Aug 1997 21:38:04 +0900 (JST)
Received: from etlken.etl.go.jp (etlken.etl.go.jp [192.31.197.11]) by etlpom.etl.go.jp (8.8.5/3.5Wpl4-ETL_MASTER) with SMTP id VAA29258; Mon, 25 Aug 1997 21:38:04 +0900 (JST)
Received: by etlken.etl.go.jp (SMI-8.6/6.4J.6-ETL.SLAVE) 	id VAA02414; Mon, 25 Aug 1997 21:39:16 +0900
Message-Id: <199708251239.VAA02414@etlken.etl.go.jp>
In-reply-to: <Pine.SUN.3.91.970825140112.3327O-100000@is> (message from Eli 	Zaretskii on Mon, 25 Aug 1997 14:07:13 +0300 (IDT))
References:  <Pine.SUN.3.91.970825140112.3327O-100000@is>
From: Kenichi Handa <handa@etl.go.jp>
To: eliz@is.elta.co.il
CC: rms@gnu.ai.mit.edu, voelker@cs.washington.edu, andrewi@harlequin.co.uk
Subject: Re: Coding system issues (3)
Date: Mon, 25 Aug 1997 21:39:16 +0900

Eli Zaretskii <eliz@is.elta.co.il> writes:
>> Yah!  Hmmm.  If a file contains random 8-bit code which doesn't fit
>> the coding system emacs-mule, it is detected as binary file.  This is
>> a difficult problem.  How can we distinguish such a file from a truly
>> binary file which doesn't require any EOL conversion?

> This has been my concern since I first looked at src/coding.c.  The
> `coding' tag will have to be the stopgap for now.  Another solution is to
> use `find-file-text'.  Perhaps some heuristic could be added in future
> based on the relative frequency of CRLF pairs and the binary characters. 

Hmm, perhaps, we must now give up detecting a coding system of a file
in an incremental manner as being done now, but have to read the whole
file with no conversion, detect a coding system by running
sophisticated Emacs Lisp code on the whole buffer, then decode the
whole buffer at once.  This requires a lot more memory and is
time-consuming when reading a huge file, but the advantage of more
appropriate code detection may outweigh this disadvantage.

>> > Besides, even with Unix EOLs, msdos.c causes Emacs to put "=" on the 
>> > modeline, which means binary file.  This is not quite right.
>> 
>> Why?  The file doesn't need EOL conversion.  In addition, the file
>> contains random 8bit codes.  So, it should be read/written without any
>> code conversion.

> The modeline is not for Emacs, it's for the user.  The user should NOT 
> see "=" when the file is a text file.  Otherwise, we will need to 
> resurrect the T: and B: that we have just nuked in 20.0.93 (because we 
> agreed that the coding system tells enough).

The modeline doesn't say anything about whether the file is text or
binary.  It just says how the file was encoded.  They are different things.
Although we have a coding system `binary' (alias of no-conversion),
the term `binary' doesn't mean that of DOS file type.  But, hmmm,
perhaps DOS users are too familiar with the concept of file type (text
or binary).  For Unix users, usually there's no difference.

Anyway, these discussions suggest that we have to detect EOL type
even after we detect that a text contains random 8-bit code.  How
about adding new coding systems raw-text, raw-text-dos,
raw-text-unix, and raw-text-mac, and setting coding-category-binary to
raw-text unless we are in a language environment such as Vietnamese,
which uses such random 8-bit files for its own language files?

Please try the following:

(make-coding-system 'raw-text 0 ?t "Raw text")
(setq coding-category-binary 'raw-text)

and find-file msdos.c of LF format and of CRLF format.
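
To illustrate the intended behavior only (this is Python, not the
Emacs implementation, and the function name is hypothetical): the EOL
type is decided from the line ends alone, no matter what 8-bit bytes
the text contains.  Leaving mixed or missing line ends as plain
`raw-text' is an assumption of this sketch.

```python
def raw_text_variant(data: bytes) -> str:
    """Pick a raw-text variant name from the line ends found in data."""
    crlf = data.count(b"\r\n")
    lone_lf = data.count(b"\n") - crlf
    lone_cr = data.count(b"\r") - crlf
    if crlf and not lone_lf and not lone_cr:
        return "raw-text-dos"
    if lone_cr and not lone_lf and not crlf:
        return "raw-text-mac"
    if lone_lf and not crlf and not lone_cr:
        return "raw-text-unix"
    # Mixed or absent line ends: EOL type left undecided (assumption).
    return "raw-text"
```

With this, msdos.c in LF format maps to raw-text-unix and in CRLF
format to raw-text-dos, regardless of the 8-bit codes in its strings.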


---
Ken'ichi HANDA
handa@etl.go.jp

From eliz@is.elta.co.il  Mon Aug 25 06:56:24 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Mon" "25" "August" "1997" "16:56:01" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" nil "10" "Re: Coding system issues (2)" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id GAA01050 for <voelker@cs.washington.edu>; Mon, 25 Aug 1997 06:56:22 -0700
Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) 	id QAA03868; Mon, 25 Aug 1997 16:56:02 +0300
X-Sender: eliz@is
In-Reply-To: <199708251114.UAA02309@etlken.etl.go.jp>
Message-ID: <Pine.SUN.3.91.970825165454.3844A-100000@is>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
From: Eli Zaretskii <eliz@is.elta.co.il>
To: Kenichi Handa <handa@etl.go.jp>
cc: rms@gnu.ai.mit.edu, voelker@cs.washington.edu, andrewi@harlequin.co.uk
Subject: Re: Coding system issues (2)
Date: Mon, 25 Aug 1997 16:56:01 +0300 (IDT)


On Mon, 25 Aug 1997, Kenichi Handa wrote:

> > Sure, but this is much harder for the programmer ;-).
> 
> I don't know why putting those raw 8bit codes is easier for
> programmers.

Because when it's my codepage, I just type the characters on my keyboard 
;-).

From eliz@is.elta.co.il  Mon Aug 25 07:10:18 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Mon" "25" "August" "1997" "17:09:56" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" "<Pine.SUN.3.91.970825165614.3844B-100000@is>" "31" "Re: Coding system issues (3)" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id HAA02011 for <voelker@cs.washington.edu>; Mon, 25 Aug 1997 07:10:15 -0700
Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) 	id RAA03892; Mon, 25 Aug 1997 17:09:57 +0300
X-Sender: eliz@is
In-Reply-To: <199708251239.VAA02414@etlken.etl.go.jp>
Message-ID: <Pine.SUN.3.91.970825165614.3844B-100000@is>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
From: Eli Zaretskii <eliz@is.elta.co.il>
To: Kenichi Handa <handa@etl.go.jp>
cc: rms@gnu.ai.mit.edu, voelker@cs.washington.edu, andrewi@harlequin.co.uk
Subject: Re: Coding system issues (3)
Date: Mon, 25 Aug 1997 17:09:56 +0300 (IDT)


On Mon, 25 Aug 1997, Kenichi Handa wrote:

> The modeline doesn't say anything about whether the file is text or binary.
> It just says how the file was encoded.  They are different things.

Originally, yes.  Previous versions of Emacs on MS-DOS would display T: 
or B:, accordingly, for text and binary files.  In Emacs 20, we all 
agreed that the coding system/EOL info on the modeline makes those T:/B: 
redundant.  So now they aren't displayed by default.

But this means that the coding system and EOL part of the modeline now
has an additional meaning on DOS and NT.

> Anyway, these discussions suggest that we have to detect EOL type
> even after we detect that a text contains random 8-bit code.

Yep, seems this could be a solution that won't slow down the file loading 
too much.

> (make-coding-system 'raw-text 0 ?t "Raw text")
> (setq coding-category-binary 'raw-text)
> 
> and find-file msdos.c of LF format and of CRLF format.

This works (displays "t:" and "t\" respectively on the modeline).

I will have to dig deeper to understand what this does exactly.

Richard, Geoff and Andrew, do you agree that creating such a coding system 
is the way to go? 

From rms@gnu.ai.mit.edu  Mon Aug 25 10:49:40 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Mon" "25" "August" "1997" "13:51:00" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "9" "Re: Coding system issues (2)" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id KAA15521 for <voelker@cs.washington.edu>; Mon, 25 Aug 1997 10:49:39 -0700
Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id NAA15665; Mon, 25 Aug 1997 13:51:00 -0400
Message-Id: <199708251751.NAA15665@psilocin.gnu.ai.mit.edu>
In-reply-to: <Pine.SUN.3.91.970825131532.3327E-100000@is> (message from Eli 	Zaretskii on Mon, 25 Aug 1997 13:21:54 +0300 (IDT))
References:  <Pine.SUN.3.91.970825131532.3327E-100000@is>
From: Richard Stallman <rms@gnu.ai.mit.edu>
To: eliz@is.elta.co.il
CC: handa@etl.go.jp, voelker@cs.washington.edu, andrewi@harlequin.co.uk
Subject: Re: Coding system issues (2)
Date: Mon, 25 Aug 1997 13:51:00 -0400

    No, it's not good enough.  Users can override the coding tag with
    C-x c RET  when they loaded the file.

I do not see a problem with this.  Users can do whatever they want to.

      It is IMHO not nice to request 
    that they use C-x c RET again before invoking the byte compiler.

Could you explain what you are talking about?

From rms@gnu.ai.mit.edu  Mon Aug 25 13:54:30 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Mon" "25" "August" "1997" "16:55:55" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "9" "Re: Coding system issues (2)" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id NAA28556 for <voelker@cs.washington.edu>; Mon, 25 Aug 1997 13:54:24 -0700
Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id QAA16638; Mon, 25 Aug 1997 16:55:55 -0400
Message-Id: <199708252055.QAA16638@psilocin.gnu.ai.mit.edu>
In-reply-to: <199708250636.PAA01967@etlken.etl.go.jp> (message from Kenichi 	Handa on Mon, 25 Aug 1997 15:36:07 +0900)
References: <Pine.SUN.3.91.970825082425.522H-100000@is> <199708250636.PAA01967@etlken.etl.go.jp>
From: Richard Stallman <rms@gnu.ai.mit.edu>
To: handa@etl.go.jp
CC: eliz@is.elta.co.il, voelker@cs.washington.edu, andrewi@harlequin.co.uk
Subject: Re: Coding system issues (2)
Date: Mon, 25 Aug 1997 16:55:55 -0400

      The coding system of a file being read should be
    decided only by the file contents unless the coding system is
    specified explicitly.

Yes, that is right.

Eli, if you disagree, would you please describe *in full* a case where
you think it is wrong?  It isn't useful to have a discussion if we
are not sure we are talking about the same thing.

From rms@gnu.ai.mit.edu  Mon Aug 25 13:56:16 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Mon" "25" "August" "1997" "16:57:47" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "9" "Re: EOL conversion on MSDOS and MS-Windows" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id NAA28679 for <voelker@cs.washington.edu>; Mon, 25 Aug 1997 13:56:15 -0700
Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id QAA16652; Mon, 25 Aug 1997 16:57:47 -0400
Message-Id: <199708252057.QAA16652@psilocin.gnu.ai.mit.edu>
In-reply-to: <199708250839.RAA02071@etlken.etl.go.jp> (message from Kenichi 	Handa on Mon, 25 Aug 1997 17:39:23 +0900)
References: <Pine.SUN.3.91.970825082850.522J-100000@is> <199708250839.RAA02071@etlken.etl.go.jp>
From: Richard Stallman <rms@gnu.ai.mit.edu>
To: handa@etl.go.jp
CC: eliz@is.elta.co.il, voelker@cs.washington.edu, andrewi@harlequin.co.uk
Subject: Re: EOL conversion on MSDOS and MS-Windows
Date: Mon, 25 Aug 1997 16:57:47 -0400

    > When enable-multibyte-characters is nil, we shouldn't look at the
    > coding system of the file either, even if buffer-file-coding-system
    > is local.  Do you agree?

    I'm not sure.  Even if enable-multibyte-characters is nil, at least
    EOL format is detected by insert-file-contents.  So, append-to-file
    had better detect at least EOL format.  What do you think?

That is right.

From rms@gnu.ai.mit.edu  Mon Aug 25 14:10:53 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Mon" "25" "August" "1997" "17:12:25" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "8" "Re: EOL conversion on MSDOS and MS-Windows" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id OAA29570 for <voelker@cs.washington.edu>; Mon, 25 Aug 1997 14:10:52 -0700
Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id RAA16778; Mon, 25 Aug 1997 17:12:25 -0400
Message-Id: <199708252112.RAA16778@psilocin.gnu.ai.mit.edu>
In-reply-to: <199708251034.TAA02242@etlken.etl.go.jp> (message from Kenichi 	Handa on Mon, 25 Aug 1997 19:34:59 +0900)
References: <Pine.SUN.3.91.970825120245.3032C-100000@is> <199708251034.TAA02242@etlken.etl.go.jp>
From: Richard Stallman <rms@gnu.ai.mit.edu>
To: handa@etl.go.jp
CC: eliz@is.elta.co.il, voelker@cs.washington.edu, andrewi@harlequin.co.uk
Subject: Re: EOL conversion on MSDOS and MS-Windows
Date: Mon, 25 Aug 1997 17:12:25 -0400

    1) The file was read before enable-multibyte-characters is set to nil.

    2) EOL format of the file was not that of Unix files.

    In both cases, I thought it was safer to encode the file with the same
    coding system used for decoding.

This reasoning makes sense to me.

From rms@gnu.ai.mit.edu  Mon Aug 25 14:54:48 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Mon" "25" "August" "1997" "17:55:58" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "17" "Re: Coding system issues (2)" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id OAA02218 for <voelker@cs.washington.edu>; Mon, 25 Aug 1997 14:54:47 -0700
Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id RAA17030; Mon, 25 Aug 1997 17:55:58 -0400
Message-Id: <199708252155.RAA17030@psilocin.gnu.ai.mit.edu>
In-reply-to: <Pine.SUN.3.91.970825082425.522H-100000@is> (message from Eli 	Zaretskii on Mon, 25 Aug 1997 08:28:23 +0300 (IDT))
References:  <Pine.SUN.3.91.970825082425.522H-100000@is>
From: Richard Stallman <rms@gnu.ai.mit.edu>
To: eliz@is.elta.co.il
CC: voelker@cs.washington.edu, andrewi@harlequin.co.uk, handa@etl.go.jp
Subject: Re: Coding system issues (2)
Date: Mon, 25 Aug 1997 17:55:58 -0400

    I wonder whether insert-file-contents needs to inherit the coding
    system from the buffer, if it is set already (not undecided)?

This would be incorrect.  If I insert file foo into a buffer visiting
bar, foo should be decoded in the right coding system of file foo,
which has absolutely nothing to do with the coding system of file bar.

      The byte compiler erases the buffer and re-reads the file
    before it begins the compilation (why, btw?),

Because that buffer probably had some other text in it,
perhaps from another file that you compiled.

Remember that byte-compile-file does NOT visit the input file.
It uses a temporary buffer.  This is a normal technique;
many Emacs commands that look at files use temporary buffers.
Some read many different files.

From rms@gnu.ai.mit.edu  Mon Aug 25 14:55:19 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Mon" "25" "August" "1997" "17:56:43" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "4" "Re: Coding system issues (2)" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id OAA02250 for <voelker@cs.washington.edu>; Mon, 25 Aug 1997 14:55:18 -0700
Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id RAA17045; Mon, 25 Aug 1997 17:56:43 -0400
Message-Id: <199708252156.RAA17045@psilocin.gnu.ai.mit.edu>
In-reply-to: <Pine.SUN.3.91.970825082425.522H-100000@is> (message from Eli 	Zaretskii on Mon, 25 Aug 1997 08:28:23 +0300 (IDT))
References:  <Pine.SUN.3.91.970825082425.522H-100000@is>
From: Richard Stallman <rms@gnu.ai.mit.edu>
To: eliz@is.elta.co.il
CC: voelker@cs.washington.edu, andrewi@harlequin.co.uk, handa@etl.go.jp
Subject: Re: Coding system issues (2)
Date: Mon, 25 Aug 1997 17:56:43 -0400

    Do you agree that inserting a file into a buffer that already has a
    decided coding system should use the same coding system?

That would be completely wrong.

From rms@gnu.ai.mit.edu  Mon Aug 25 15:00:22 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Mon" "25" "August" "1997" "18:01:49" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "24" "Re: Coding system issues (1)" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id PAA02642 for <voelker@cs.washington.edu>; Mon, 25 Aug 1997 15:00:21 -0700
Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id SAA17074; Mon, 25 Aug 1997 18:01:49 -0400
Message-Id: <199708252201.SAA17074@psilocin.gnu.ai.mit.edu>
In-reply-to: <Pine.SUN.3.91.970825082144.522G-100000@is> (message from Eli 	Zaretskii on Mon, 25 Aug 1997 08:24:22 +0300 (IDT))
References:  <Pine.SUN.3.91.970825082144.522G-100000@is>
From: Richard Stallman <rms@gnu.ai.mit.edu>
To: eliz@is.elta.co.il, handa@etl.go.jp
CC: voelker@cs.washington.edu, andrewi@harlequin.co.uk, rms@gnu.ai.mit.edu
Subject: Re: Coding system issues (1)
Date: Mon, 25 Aug 1997 18:01:49 -0400

    It seems that (setq-default enable-multibyte-characters nil) also
    disables part of the DOS EOL conversions.  Specifically, if you create
    a new buffer, type text there, then save the buffer, you get
    Unix-style linefeeds at EOL, although the modeline quite deceptively
    says "\".  E.g., try this:

	       emacs -q
	       M-: (setq-default enable-multibyte-characters nil) RET
	       C-x b my-own-buffer RET

    Now type a few lines of text, then press C-x C-s foobar RET.  Exit or
    suspend Emacs and look at the file foobar; you will see a Unix-style
    file.

This is definitely a bug.  Saving this buffer should perform EOL conversion
even though enable-multibyte-characters is nil.

Handa, can you please work on this with highest priority?

    Disabling EOL conversion when multibyte characters aren't supported
    might make sense on Unix (since it returns to the pre-20 behavior),

It is wrong on Unix too.  EOL conversion should work for all formats
on all systems, regardless of enable-multibyte-characters.

From handa@etl.go.jp  Mon Aug 25 18:00:52 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Tue" "26" "August" "1997" "10:01:51" "+0900" "Kenichi Handa" "handa@etl.go.jp" nil "26" "Re: Coding system issues (1)" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from mail1-im.etl.go.jp (mail1-im.etl.go.jp [192.50.105.9]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id SAA13003 for <voelker@cs.washington.edu>; Mon, 25 Aug 1997 18:00:51 -0700
Received: from etlpom.etl.go.jp (etlpom.etl.go.jp [192.31.200.9]) by mail1-im.etl.go.jp (8.8.5/3.5Wpl1-96112918) with ESMTP 	id KAA20440; Tue, 26 Aug 1997 10:00:41 +0900 (JST)
Received: from etlken.etl.go.jp (etlken.etl.go.jp [192.31.197.11]) by etlpom.etl.go.jp (8.8.5/3.5Wpl4-ETL_MASTER) with SMTP id KAA22888; Tue, 26 Aug 1997 10:00:40 +0900 (JST)
Received: by etlken.etl.go.jp (SMI-8.6/6.4J.6-ETL.SLAVE) 	id KAA03036; Tue, 26 Aug 1997 10:01:51 +0900
Message-Id: <199708260101.KAA03036@etlken.etl.go.jp>
In-reply-to: <199708252201.SAA17074@psilocin.gnu.ai.mit.edu> (message from 	Richard Stallman on Mon, 25 Aug 1997 18:01:49 -0400)
References: <Pine.SUN.3.91.970825082144.522G-100000@is> <199708252201.SAA17074@psilocin.gnu.ai.mit.edu>
From: Kenichi Handa <handa@etl.go.jp>
To: rms@gnu.ai.mit.edu
CC: eliz@is.elta.co.il, voelker@cs.washington.edu, andrewi@harlequin.co.uk,         rms@gnu.ai.mit.edu
Subject: Re: Coding system issues (1)
Date: Tue, 26 Aug 1997 10:01:51 +0900

Richard Stallman <rms@gnu.ai.mit.edu> writes:
> 	       emacs -q
> 	       M-: (setq-default enable-multibyte-characters nil) RET
> 	       C-x b my-own-buffer RET
>     Now type a few lines of text, then press C-x C-s foobar RET.  Exit or
>     suspend Emacs and look at the file foobar; you will see a Unix-style
>     file.

> This is definitely a bug.  Saving this buffer should perform EOL conversion
> even though enable-multibyte-characters is nil.

> Handa, can you please work on this with highest priority?

Ok, I'm now working on that.

>     Disabling EOL conversion when multibyte characters aren't supported
>     might make sense on Unix (since it returns to the pre-20 behavior),

> It is wrong on Unix too.  EOL conversion should work for all formats
> on all systems, regardless of enable-multibyte-characters.

I agree.

---
Ken'ichi HANDA
handa@etl.go.jp

From rms@gnu.ai.mit.edu  Mon Aug 25 20:58:05 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Mon" "25" "August" "1997" "23:59:33" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "12" "Re: Coding system issues (3)" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id UAA20132 for <voelker@cs.washington.edu>; Mon, 25 Aug 1997 20:58:04 -0700
Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id XAA18563; Mon, 25 Aug 1997 23:59:33 -0400
Message-Id: <199708260359.XAA18563@psilocin.gnu.ai.mit.edu>
In-reply-to: <199708251054.TAA02279@etlken.etl.go.jp> (message from Kenichi 	Handa on Mon, 25 Aug 1997 19:54:42 +0900)
References: <Pine.SUN.3.91.970825130559.3327C-100000@is> <199708251054.TAA02279@etlken.etl.go.jp>
From: Richard Stallman <rms@gnu.ai.mit.edu>
To: handa@etl.go.jp
CC: eliz@is.elta.co.il, voelker@cs.washington.edu, andrewi@harlequin.co.uk
Subject: Re: Coding system issues (3)
Date: Mon, 25 Aug 1997 23:59:33 -0400

    > Yes.  So if you stay with the same version of Emacs, you won't see any 
    > problem.  But I also loaded msdos.c edited by a previous version of 
    > Emacs, which always added CRs, and then I saw all those ^M characters.

That was a bug in the old version of Emacs.  Let's not worry about old
bugs that have been fixed.

    Yah!  Hmmm.  If a file contains random 8-bit code which doesn't fit
    the coding system emacs-mule, it is detected as binary file.

Could you be more precise?  What does "detected as binary file" really
mean?  What does Emacs DO in this case?

From handa@etl.go.jp  Mon Aug 25 22:08:07 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Tue" "26" "August" "1997" "14:08:25" "+0900" "Kenichi Handa" "handa@etl.go.jp" nil "58" "Re: Coding system issues (3)" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from mail1-im.etl.go.jp (mail1-im.etl.go.jp [192.50.105.9]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id WAA22058 for <voelker@cs.washington.edu>; Mon, 25 Aug 1997 22:08:06 -0700
Received: from etlpom.etl.go.jp (etlpom.etl.go.jp [192.31.200.9]) by mail1-im.etl.go.jp (8.8.5/3.5Wpl1-96112918) with ESMTP 	id OAA06286; Tue, 26 Aug 1997 14:07:14 +0900 (JST)
Received: from etlken.etl.go.jp (etlken.etl.go.jp [192.31.197.11]) by etlpom.etl.go.jp (8.8.5/3.5Wpl4-ETL_MASTER) with SMTP id OAA10146; Tue, 26 Aug 1997 14:07:14 +0900 (JST)
Received: by etlken.etl.go.jp (SMI-8.6/6.4J.6-ETL.SLAVE) 	id OAA03270; Tue, 26 Aug 1997 14:08:25 +0900
Message-Id: <199708260508.OAA03270@etlken.etl.go.jp>
In-reply-to: <199708260359.XAA18563@psilocin.gnu.ai.mit.edu> (message from 	Richard Stallman on Mon, 25 Aug 1997 23:59:33 -0400)
References: <Pine.SUN.3.91.970825130559.3327C-100000@is> <199708251054.TAA02279@etlken.etl.go.jp> <199708260359.XAA18563@psilocin.gnu.ai.mit.edu>
From: Kenichi Handa <handa@etl.go.jp>
To: rms@gnu.ai.mit.edu
CC: eliz@is.elta.co.il, voelker@cs.washington.edu, andrewi@harlequin.co.uk
Subject: Re: Coding system issues (3)
Date: Tue, 26 Aug 1997 14:08:25 +0900

Richard Stallman <rms@gnu.ai.mit.edu> writes:
>> Yes.  So if you stay with the same version of Emacs, you won't see any 
>> problem.  But I also loaded msdos.c edited by a previous version of 
>> Emacs, which always added CRs, and then I saw all those ^M characters.

> That was a bug in the old version of Emacs.  Let's not worry about old
> bugs that have been fixed.

No.  This bug (feature?) still remains.  If a file is formatted with
DOS-like EOLs, and there are random 8-bit codes somewhere in the
file, the current version detects it as coding-category-binary, and
does no code conversion because coding-category-binary is set to
`no-conversion' by default.

>     Yah!  Hmmm.  If a file contains random 8-bit code which doesn't fit
>     the coding system emacs-mule, it is detected as binary file.

> Could you be more precise?  What does "detected as binary file" really
> mean?  What does Emacs DO in this case?

It does as I wrote above because decode_coding (in coding.c) has the
following code.
----------------------------------------------------------------------
  if (coding->type == coding_type_undecided)
    detect_coding (coding, source, src_bytes);

  if (coding->eol_type == CODING_EOL_UNDECIDED)
    detect_eol (coding, source, src_bytes);
----------------------------------------------------------------------
So, Emacs first tries to detect the text coding.  At this point, if the
file contains random 8-bit codes, Emacs decides that the coding
category is coding-category-binary and sets up the coding system
no-conversion in the structure `coding' (coding->eol_type is set to
CODING_EOL_LF).  So, it skips detect_eol.

Even now, if we set coding-category-binary to `emacs-mule', this
problem is avoided, but it is like setting coding-category-sjis to
iso-latin-1 (Richard, do you remember the previous discussion about
handling Microsoft extra latin code?), and is not the right thing.
In addition, the mnemonic of `emacs-mule' is `=' (same as no-conversion),
which won't help DOS users.

So, I proposed a new coding system `raw-text' (though I'm not sure
whether this is a good name), which requires only EOL conversion, and
proposed setting coding-category-binary to raw-text by default.

Then, the call to detect_coding sets up raw-text in `coding'
(coding->eol_type is set to CODING_EOL_UNDECIDED), and Emacs calls
detect_eol, which may set coding->eol_type correctly.

The drawback of this method is that a truly binary file is detected as
`raw-text-XXX'.  But this can be avoided, except in very rare cases,
by changing the code of detect_eol so that it sets up no-conversion in
the struct `coding' if the EOL format is not consistent.
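
The decision order described above can be sketched roughly as follows.
This is only an illustration, not the code in coding.c: the enum and
function names are hypothetical, the result of the EOL scan is passed
in as a parameter instead of being computed from the file, and
"has_random_8bit" stands in for whatever detect_coding really checks.

```c
#include <assert.h>

/* Simplified stand-ins for the real definitions in coding.h.
   All names here are hypothetical.  */
enum kind { KIND_TEXT, KIND_RAW_TEXT, KIND_NO_CONVERSION };
enum eol { EOL_UNDECIDED, EOL_LF, EOL_CRLF, EOL_CR, EOL_INCONSISTENT };

struct coding_sketch { enum kind kind; enum eol eol; };

/* HAS_RANDOM_8BIT: nonzero if the text holds 8-bit codes that fit no
   known coding system.  SCANNED_EOL: what an EOL scan of the text
   would report.  USE_RAW_TEXT: 0 models the current default
   (coding-category-binary = no-conversion), 1 models the proposal
   (coding-category-binary = raw-text).  */
struct coding_sketch
choose_coding (int has_random_8bit, enum eol scanned_eol, int use_raw_text)
{
  struct coding_sketch c = { KIND_TEXT, EOL_UNDECIDED };

  assert (has_random_8bit == 0 || has_random_8bit == 1);

  /* Step 1: what detect_coding decides.  */
  if (has_random_8bit)
    {
      if (use_raw_text)
        c.kind = KIND_RAW_TEXT;        /* EOL type stays undecided.  */
      else
        {
          c.kind = KIND_NO_CONVERSION; /* EOL type is fixed to LF here, */
          c.eol = EOL_LF;              /* so step 2 is skipped.  */
        }
    }

  /* Step 2: EOL detection runs only while the EOL type is undecided.  */
  if (c.eol == EOL_UNDECIDED)
    {
      if (scanned_eol == EOL_INCONSISTENT)
        {
          /* Mixed EOLs in a raw-text file: treat it as truly binary.  */
          if (c.kind == KIND_RAW_TEXT)
            c.kind = KIND_NO_CONVERSION;
          c.eol = EOL_LF;
        }
      else
        c.eol = scanned_eol;
    }
  return c;
}
```

With use_raw_text = 0, a DOS file containing random 8-bit codes ends
up as no-conversion with LF, which is the case where the ^M characters
show up.  With use_raw_text = 1 the same input ends up as raw-text
with CRLF.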

---
Ken'ichi HANDA
handa@etl.go.jp

From eliz@is.elta.co.il  Mon Aug 25 23:53:17 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Tue" "26" "August" "1997" "09:52:54" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" nil "18" "Re: Coding system issues (3)" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id XAA25266 for <voelker@cs.washington.edu>; Mon, 25 Aug 1997 23:53:15 -0700
Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) 	id JAA05300; Tue, 26 Aug 1997 09:52:55 +0300
X-Sender: eliz@is
In-Reply-To: <199708251801.OAA15763@psilocin.gnu.ai.mit.edu>
Message-ID: <Pine.SUN.3.91.970826095143.5294A-100000@is>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
From: Eli Zaretskii <eliz@is.elta.co.il>
To: Richard Stallman <rms@gnu.ai.mit.edu>
cc: handa@etl.go.jp, voelker@cs.washington.edu, andrewi@harlequin.co.uk
Subject: Re: Coding system issues (3)
Date: Tue, 26 Aug 1997 09:52:54 +0300 (IDT)


On Mon, 25 Aug 1997, Richard Stallman wrote:

>     > Besides, even with Unix EOLs, msdos.c causes Emacs to put "=" on the 
>     > modeline, which means binary file.  This is not quite right.
> 
>     Why?  The file doesn't need EOL conversion.  In addition, the file
>     contains random 8bit codes.  So, it should be read/written without any
>     code conversion.
> 
> Eli, please send him a precise test case; tell Handa *exactly* what to
> type so he can observe this.  Describing actions in abstract ways is a
> VERY bad idea, almost guaranteed to lead to misunderstandings.

I did send a precise test case: `C-x C-f src/msdos.c RET'.  After
that, look at the modeline: it says the coding is no-conversion ("="),
as if this were a binary file.  If the file has DOS EOLs, you will see
^M characters at the end of each line.

From eliz@is.elta.co.il  Mon Aug 25 23:59:28 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Tue" "26" "August" "1997" "09:58:49" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" nil "40" "Re: Coding system issues (2)" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id XAA25410 for <voelker@cs.washington.edu>; Mon, 25 Aug 1997 23:59:26 -0700
Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) 	id JAA05307; Tue, 26 Aug 1997 09:58:49 +0300
X-Sender: eliz@is
In-Reply-To: <199708251751.NAA15665@psilocin.gnu.ai.mit.edu>
Message-ID: <Pine.SUN.3.91.970826095752.5294C-100000@is>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
From: Eli Zaretskii <eliz@is.elta.co.il>
To: Richard Stallman <rms@gnu.ai.mit.edu>
cc: handa@etl.go.jp, voelker@cs.washington.edu, andrewi@harlequin.co.uk
Subject: Re: Coding system issues (2)
Date: Tue, 26 Aug 1997 09:58:49 +0300 (IDT)


On Mon, 25 Aug 1997, Richard Stallman wrote:

>       It is IMHO not nice to request 
>     that they use C-x c RET again before invoking the byte compiler.
> 
> Could you explain what you are talking about?

I did that, in my original message.  This message was one of a series
that was generated by a discussion which followed.

The problem (which took me a couple of days to debug, btw) was that
lisp/term/internal.el, when byte-compiled and loaded, would cause
Fdowncase to behave erratically.  Specifically, the letters A, O, and
U would stay in upper case.  To reproduce:

	     emacs -q
	     M-x load-file lisp/term/internal.elc RET
	     C-x b *scratch* RET
	     (downcase "AOU")^J

(I hope that internal.el can be loaded on Unix with no problems, so
you could try it.)

It turned out that internal.el looks to Emacs as sjis-encoded file.  I
then set the coding system to emacs-mule manually when I visited that
file (`C-x RET c emacs-mule RET C-x C-f lisp/term/internal.el RET'),
but the byte-compiled file was still wrong, because
`emacs-lisp-byte-compile' re-reads the file, and when it does, it
decodes it again as sjis.  So the user must set the coding system
again before compiling the file.  This is IMHO counter-intuitive,
since users might have no idea that the byte compiler reads the file
again.

With the introduction of the `coding' tag in the -*- line, this
problem with internal.el is solved.  But I'm still concerned with the
more general case whereby setting coding system for the .el file is
not enough to byte-compile it as that coding system says.  In the case
where a user needs to override Emacs coding detection, this might lead
to subtle bugs.

From eliz@is.elta.co.il  Tue Aug 26 00:06:29 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Tue" "26" "August" "1997" "10:06:05" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" nil "10" "Re: Coding system issues (2)" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id AAA25732 for <voelker@cs.washington.edu>; Tue, 26 Aug 1997 00:06:27 -0700
Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) 	id KAA05319; Tue, 26 Aug 1997 10:06:06 +0300
X-Sender: eliz@is
In-Reply-To: <199708252155.RAA17030@psilocin.gnu.ai.mit.edu>
Message-ID: <Pine.SUN.3.91.970826095923.5294D-100000@is>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
From: Eli Zaretskii <eliz@is.elta.co.il>
To: Richard Stallman <rms@gnu.ai.mit.edu>
cc: voelker@cs.washington.edu, andrewi@harlequin.co.uk, handa@etl.go.jp
Subject: Re: Coding system issues (2)
Date: Tue, 26 Aug 1997 10:06:05 +0300 (IDT)


On Mon, 25 Aug 1997, Richard Stallman wrote:

> Remember that byte-compile-file does NOT visit the input file.
> It uses a temporary buffer.

I was using `emacs-lisp-byte-compile' (also available from the menu bar),
which is supposed to compile the file in the current buffer.  I understand
that it calls `byte-compile-file' internally, but it is still not obvious
that it should re-read the file which in this case is already visited. 

From rms@gnu.ai.mit.edu  Tue Aug 26 09:55:56 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Tue" "26" "August" "1997" "12:56:31" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "1" "Re: Coding system issues (2)" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id JAA16283 for <voelker@cs.washington.edu>; Tue, 26 Aug 1997 09:55:52 -0700
Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id MAA23352; Tue, 26 Aug 1997 12:56:31 -0400
Message-Id: <199708261656.MAA23352@psilocin.gnu.ai.mit.edu>
In-reply-to: <Pine.SUN.3.91.970826095923.5294D-100000@is> (message from Eli 	Zaretskii on Tue, 26 Aug 1997 10:06:05 +0300 (IDT))
References:  <Pine.SUN.3.91.970826095923.5294D-100000@is>
From: Richard Stallman <rms@gnu.ai.mit.edu>
To: eliz@is.elta.co.il
CC: voelker@cs.washington.edu, andrewi@harlequin.co.uk, handa@etl.go.jp
Subject: Re: Coding system issues (2)
Date: Tue, 26 Aug 1997 12:56:31 -0400

byte-compile-file should not be influenced by the current buffer.

From rms@gnu.ai.mit.edu  Tue Aug 26 21:29:52 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Wed" "27" "August" "1997" "00:31:18" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "19" "Re: Coding system issues (3)" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id VAA27973 for <voelker@cs.washington.edu>; Tue, 26 Aug 1997 21:29:51 -0700
Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id AAA26954; Wed, 27 Aug 1997 00:31:18 -0400
Message-Id: <199708270431.AAA26954@psilocin.gnu.ai.mit.edu>
In-reply-to: <Pine.SUN.3.91.970826095143.5294A-100000@is> (message from Eli 	Zaretskii on Tue, 26 Aug 1997 09:52:54 +0300 (IDT))
References:  <Pine.SUN.3.91.970826095143.5294A-100000@is>
From: Richard Stallman <rms@gnu.ai.mit.edu>
To: eliz@is.elta.co.il
CC: handa@etl.go.jp, voelker@cs.washington.edu, andrewi@harlequin.co.uk
Subject: Re: Coding system issues (3)
Date: Wed, 27 Aug 1997 00:31:18 -0400

    I did send a precise test case: `C-x C-f src/msdos.c RET'.  After
    that, look at the modeline: it says the coding is no-conversion ("="),
    as if this were a binary file.

Yes, I see this.  The buffer has buffer-file-coding-system = no-conversion,
but enable-multibyte-characters is t.

This is not right.  If the file happens to have a \201 in it, 
Emacs could get quite confused.

The right thing to do, for a file which has byte codes 200-377 which
Emacs can't understand, is to turn off enable-multibyte-characters.
That way it is safe to read in the file no matter what byte values it
has.

In addition, I agree with Eli that it should do EOL conversion
according to the data in the file.

Handa, can you please implement this?

From eliz@is.elta.co.il  Mon Sep 29 01:43:40 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Mon" "29" "September" "1997" "10:43:12" "+0200" "Eli Zaretskii" "eliz@is.elta.co.il" nil "23" "EOL encoding and C-x i" "^From:" nil nil "9" nil nil nil nil]
	nil)
Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id BAA13207 for <voelker@cs.washington.edu>; Mon, 29 Sep 1997 01:43:38 -0700
Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) 	id KAA02087; Mon, 29 Sep 1997 10:43:13 +0200
X-Sender: eliz@is
Message-ID: <Pine.SUN.3.91.970929103310.1917N-100000@is>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
From: Eli Zaretskii <eliz@is.elta.co.il>
To: Richard Stallman <rms@gnu.ai.mit.edu>
cc: Kenichi Handa <handa@etl.go.jp>, Geoff Voelker <voelker@cs.washington.edu>,         Andrew Innes <andrewi@harlequin.co.uk>
Subject: EOL encoding and C-x i
Date: Mon, 29 Sep 1997 10:43:12 +0200 (IST)

`C-x i' changes the EOL encoding in a way that I find unexpected.  This 
is in Emacs 20.2 on MS-DOS.

To reproduce:

	emacs -q
	C-x C-f foobar.txt

(I assume `foobar.txt' doesn't exist, but I don't think that it matters.)
Put some text into the buffer, then save it.  The file foobar.txt is 
saved with DOS EOLs, like it should.

Now insert another file into the buffer:

	C-x i foo.bar

If `foo.bar' has Unix EOLs, the coding system of the current buffer is 
changed to *-dos, and the file is saved as such.

Is this done on purpose?  If so, I would like to know the reason.  I
would expect that, at least for a buffer which has been saved already in 
a file with DOS EOLs and have enough of them to qualify as a DOS text 
file, `C-x i' won't change the EOL encodings. 

From handa@etl.go.jp  Mon Sep 29 04:29:08 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Mon" "29" "September" "1997" "20:28:54" "+0900" "Kenichi Handa" "handa@etl.go.jp" nil "79" "Re: EOL encoding and C-x i" "^From:" nil nil "9" nil nil nil nil]
	nil)
Received: from mail1-im.etl.go.jp (mail1-im.etl.go.jp [192.50.105.9]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id EAA16676 for <voelker@cs.washington.edu>; Mon, 29 Sep 1997 04:29:06 -0700
Received: from etlpom.etl.go.jp (etlpom.etl.go.jp [192.31.200.9]) by mail1-im.etl.go.jp (8.8.5/3.5Wpl1-96112918) with ESMTP 	id UAA06440; Mon, 29 Sep 1997 20:27:51 +0900 (JST)
Received: from etlken.etl.go.jp (etlken.etl.go.jp [192.31.197.11]) by etlpom.etl.go.jp (8.8.5/3.5Wpl4-ETL_MASTER) with SMTP id UAA12486; Mon, 29 Sep 1997 20:27:50 +0900 (JST)
Received: by etlken.etl.go.jp (SMI-8.6/6.4J.6-ETL.SLAVE) 	id UAA23617; Mon, 29 Sep 1997 20:28:54 +0900
Message-Id: <199709291128.UAA23617@etlken.etl.go.jp>
In-reply-to: <Pine.SUN.3.91.970929103310.1917N-100000@is> (message from Eli 	Zaretskii on Mon, 29 Sep 1997 10:43:12 +0200 (IST))
References:  <Pine.SUN.3.91.970929103310.1917N-100000@is>
From: Kenichi Handa <handa@etl.go.jp>
To: eliz@is.elta.co.il
CC: rms@gnu.ai.mit.edu, voelker@cs.washington.edu, andrewi@harlequin.co.uk
Subject: Re: EOL encoding and C-x i
Date: Mon, 29 Sep 1997 20:28:54 +0900

Eli Zaretskii <eliz@is.elta.co.il> writes:
> `C-x i' changes the EOL encoding in a way that I find unexpected.  This 
> is in Emacs 20.2 on MS-DOS.
> To reproduce:
> 	emacs -q
> 	C-x C-f foobar.txt
> (I assume `foobar.txt' doesn't exist, but I don't think that it matters.)
> Put some text into the buffer, then save it.  The file foobar.txt is 
> saved with DOS EOLs, like it should.
> Now insert another file into the buffer:
> 	C-x i foo.bar
> If `foo.bar' has Unix EOLs, the coding system of the current buffer is 
> changed to *-dos, and the file is saved as such.

> Is this done on purpose?  If so, I would like to know the reason.

This is not done on purpose.  But the reason for this behaviour is
simple: buffer-file-coding-system is not set locally just by saving
the buffer.

> I would expect that, at least for a buffer which has been saved
> already in a file with DOS EOLs and have enough of them to qualify
> as a DOS text file, `C-x i' won't change the EOL encodings.

I agree that what you expect is quite reasonable, and agree that the
buffer-file-coding-system of a buffer once saved should not be changed
by inserting something later.  This can be achieved by binding
buffer-file-coding-system locally in that buffer.

The question is at which point we should bind it.  How about the
following change to basic-save-buffer?

---lisp/ChangeLog---------------------------------------------------------
	* files.el (basic-save-buffer): Set buffer-file-coding-system to
	the coding system actually used for saving.
---patch for lisp/files.el------------------------------------------------
diff -acrN --exclude=ChangeLog --exclude=*.elc --exclude=*~ --exclude=TAGS --exclude=loaddefs.el ../emacs-20.2.fsf/lisp/files.el ../emacs-20.2/lisp/files.el
*** ../emacs-20.2.fsf/lisp/files.el	Tue Sep  9 14:32:49 1997
--- ../emacs-20.2/lisp/files.el	Mon Sep 29 19:59:39 1997
***************
*** 2181,2186 ****
--- 2181,2190 ----
  		;; If a hook returned t, file is already "written".
  		;; Otherwise, write it the usual way now.
  		(setq setmodes (basic-save-buffer-1)))
+ 	    ;; Now we have saved the current buffer.  Let's make sure
+ 	    ;; that buffer-file-coding-system is fixed to what
+ 	    ;; actually used for saving by binding it locally.
+ 	    (setq buffer-file-coding-system last-coding-system-used)
  	    (setq buffer-file-number
  		  (nthcdr 10 (file-attributes buffer-file-name)))
  	    (if setmodes
------------------------------------------------------------------------

I've just tested this change on Unix by:
(1) At first, set default value of buffer-file-coding-system to
undecided-dos.

(2) Then, I visited a new file.  At this moment,
buffer-file-coding-system was not set locally.

(3) Entered "abc\n" in the buffer and saved it.  Now
buffer-file-coding-system was set to undecided-dos locally.

(4) Inserted some ascii file of Unix-like end-of-line codes.
buffer-file-coding-system was still undecided-dos.

(5) Inserted a file of iso-latin-1-unix.  Then
buffer-file-coding-system was changed to iso-latin-1-dos.

So, it seems that this change works well.

If you test it in your environment and find no problem, someone please
update FSF's code.  Right now I'm on a very slow line, so it's quite
difficult for me to do the updating myself.

---
Ken'ichi HANDA
handa@etl.go.jp

From rms@gnu.ai.mit.edu  Tue Sep 30 19:14:13 1997
X-VM-v5-Data: ([nil nil nil nil t nil nil nil nil]
	[nil "Tue" "30" "September" "1997" "22:15:00" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" "<199710010215.WAA12036@psilocin.gnu.ai.mit.edu>" "17" "Re: EOL encoding and C-x i" "^From:" nil nil "9" nil nil nil nil]
	nil)
Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id TAA29544 for <voelker@cs.washington.edu>; Tue, 30 Sep 1997 19:14:12 -0700
Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id WAA12036; Tue, 30 Sep 1997 22:15:00 -0400
Message-Id: <199710010215.WAA12036@psilocin.gnu.ai.mit.edu>
In-reply-to: <Pine.SUN.3.91.970929103310.1917N-100000@is> (message from Eli 	Zaretskii on Mon, 29 Sep 1997 10:43:12 +0200 (IST))
Reply-to: rms@gnu.ai.mit.edu
From: Richard Stallman <rms@gnu.ai.mit.edu>
To: eliz@is.elta.co.il
CC: handa@etl.go.jp, voelker@cs.washington.edu, andrewi@harlequin.co.uk
Subject: Re: EOL encoding and C-x i
Date: Tue, 30 Sep 1997 22:15:00 -0400

    Now insert another file into the buffer:

	    C-x i foo.bar

    If `foo.bar' has Unix EOLs, the coding system of the current buffer is 
    changed to *-dos, and the file is saved as such.

Do you mean it is changed to *-unix?

    Is this done on purpose?  If so, I would like to know the reason.  I
    would expect that, at least for a buffer which has been saved already in 
    a file with DOS EOLs and have enough of them to qualify as a DOS text 
    file, `C-x i' won't change the EOL encodings. 

In that situation you are mixing the two kinds of EOLs, which means
that it is hard to be sure that either alternative is really right.
But I tend to agree with you.  Handa, how hard would this be?

From rms@gnu.ai.mit.edu  Sun Jul  6 01:33:31 1997
X-VM-v5-Data: ([nil nil nil nil t nil nil nil nil]
	[nil "Sun" " 6" "July" "1997" "04:33:59" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" "<199707060833.EAA10252@psilocin.gnu.ai.mit.edu>" "45" "Re: New way of handling CRLF" "^From:" nil nil "7" nil nil nil nil]
	nil)
Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id BAA18916 for <voelker@cs.washington.edu>; Sun, 6 Jul 1997 01:33:30 -0700
Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id EAA10252; Sun, 6 Jul 1997 04:33:59 -0400
Message-Id: <199707060833.EAA10252@psilocin.gnu.ai.mit.edu>
In-reply-to: <199707042102.OAA34672@joker.cs.washington.edu> 	(voelker@cs.washington.edu)
References: <Pine.SUN.3.91.970703183248.1458I-100000@is> 	<199707031919.PAA10787@psilocin.gnu.ai.mit.edu> <199707042102.OAA34672@joker.cs.washington.edu>
From: Richard Stallman <rms@gnu.ai.mit.edu>
To: voelker@cs.washington.edu
CC: eliz@is.elta.co.il, andrewi@harlequin.co.uk
Subject: Re: New way of handling CRLF
Date: Sun, 6 Jul 1997 04:33:59 -0400

    Given the new coding-system framework, I think that all file I/O under
    DOS_NT should now be done in binary mode

Yes, that is true.

If it doesn't work that way now, could someone send me a fix?


    Actually, the new coding-system framework appears to obviate the need
    for buffer-file-type; file-coding-system-alist and
    buffer-file-coding-system appear to be flexible enough to supersede
    it.  I will need to think more about this, though, since it is a
    rather drastic change under DOS_NT.

Another alternative would be to modify some of the Mule functions so
that, on DOS/NT, they look at the same variables which now control the
decision about the buffer file type.  For example, the list of special
extensions and the list of untranslated file systems.

    > (There is a bug in the pretest that fails to save a file with CRLF if
    > it was recognized with CRLF.  That has been fixed.)

    Can you send me the patches for this so that I can test assuming that
    this case should work?

I don't know what the patch is.  If you want, you can log in here
and try to figure out.

    Currently, the default for file-coding-system-alist is 'undecided.
    Under DOS_NT, this should probably be 'emacs-mule so that CRLF is
    decoded and encoded by default.

I don't follow the reasoning.  Why would changing from undecided
to emacs-mule have any effect on EOL conversion?

Perhaps you're being fooled by the bug that Handa fixed (see above)
which made EOL conversion not work when saving a file, since that
may have been only for `undecided'.


At this point, I don't think we should try to eliminate
file-name-buffer-file-type-alist and untranslated file systems, rather
just make them have their effect via the coding system mechanism which
is the right way for now.  file-name-buffer-file-type-alist and
untranslated file systems are documented.

From handa@etl.go.jp  Tue Aug  5 18:09:55 1997
X-VM-v5-Data: ([nil nil nil t nil nil nil nil nil]
	[nil "Wed" " 6" "August" "1997" "10:10:27" "+0900" "Kenichi Handa" "handa@etl.go.jp" nil "38" "Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from mail1-im.etl.go.jp (mail1-im.etl.go.jp [192.50.105.9]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id SAA28619 for <voelker@cs.washington.edu>; Tue, 5 Aug 1997 18:09:40 -0700
Received: from etlpom.etl.go.jp (etlpom.etl.go.jp [192.31.200.9]) by mail1-im.etl.go.jp (8.8.5/3.5Wpl1-96112918) with ESMTP 	id KAA20137; Wed, 6 Aug 1997 10:09:11 +0900 (JST)
Received: from etlken.etl.go.jp (etlken.etl.go.jp [192.31.197.11]) by etlpom.etl.go.jp (8.8.5/3.5Wpl4-ETL_MASTER) with SMTP id KAA00905; Wed, 6 Aug 1997 10:09:11 +0900 (JST)
Received: by etlken.etl.go.jp (SMI-8.6/6.4J.6-ETL.SLAVE) 	id KAA07868; Wed, 6 Aug 1997 10:10:27 +0900
Message-Id: <199708060110.KAA07868@etlken.etl.go.jp>
In-reply-to: <199708051818.OAA05876@psilocin.gnu.ai.mit.edu> (message from 	Richard Stallman on Tue, 5 Aug 1997 14:18:03 -0400)
References: <199707291709.NAA14978@psilocin.gnu.ai.mit.edu> 	<199707310605.XAA15156@joker.cs.washington.edu> <uyb6niore.fsf@kub.nl> 	<199707312038.NAA15222@joker.cs.washington.edu> 	<199707312342.TAA19727@psilocin.gnu.ai.mit.edu> <uk9i6s4wa.fsf@kub.nl> <199708030423.AAA25764@psilocin.gnu.ai.mit.edu> <199708040133.KAA04718@etlken.etl.go.jp> <199708050838.EAA00520@psilocin.gnu.ai.mit.edu> <199708051200.VAA07192@etlken.etl.go.jp> <199708051818.OAA05876@psilocin.gnu.ai.mit.edu>
From: Kenichi Handa <handa@etl.go.jp>
To: rms@gnu.ai.mit.edu
CC: Marc.Fleischeuers@kub.nl, voelker@cs.washington.edu,         Marc.Fleischeuers@kub.nl, andrewi@harlequin.co.uk
Subject: Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]
Date: Wed, 6 Aug 1997 10:10:27 +0900

Richard Stallman <rms@gnu.ai.mit.edu> writes:
>     So, I suggest the following code.  This scans buffer until it
>     encounters 3 end-of-lines.  If it finds two different patterns while
>     scanning, it decides not to decode end-of-line (by returning
>     CODING_EOL_LF).  So, in any of the following cases, it doesn't decode
>     end-of-line.
> 	    CR CR LF,  LF CR LF, CR LF LF, CR CR LF LF, LF CR CR LF.
>     I think it is clear enough, and users won't be surprised that much.

> I think this is good enough.  I'll install it now.

I have just made a small change as below in FSF's code:

diff -c -r1.30 coding.c
*** coding.c    1997/08/05 18:19:33     1.30
--- coding.c    1997/08/06 01:06:38
***************
*** 2739,2745 ****
        }
      }
  
!   return (total ? eol_type : CODING_EOL_UNDECIDED);
  }
  
  /* Detect how end-of-line of a text of length SRC_BYTES pointed by SRC
--- 2739,2745 ----
        }
      }
  
!   return eol_type;
  }
  
  /* Detect how end-of-line of a text of length SRC_BYTES pointed by SRC
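
As an illustration of the rule quoted at the top of this message (scan
until three end-of-lines are seen, and give up on decoding as soon as two
different patterns appear), here is a minimal standalone sketch in C.  It
is not the code in coding.c: the enum, the limit macro, and the function
name `sketch_detect_eol' are hypothetical, chosen only for this example.

```c
#include <stddef.h>

/* Hypothetical names for this sketch only; not the coding.c identifiers.  */
enum sketch_eol { EOL_UNDECIDED, EOL_LF, EOL_CRLF, EOL_CR };

#define SKETCH_MAX_EOL_CHECK 3	/* stop after this many end-of-lines */

/* Classify the end-of-line convention of the LEN bytes at SRC.
   Returns EOL_UNDECIDED if no end-of-line is seen, and EOL_LF (meaning
   "do not decode end-of-line") as soon as two different patterns appear.  */
static enum sketch_eol
sketch_detect_eol (const unsigned char *src, size_t len)
{
  enum sketch_eol seen = EOL_UNDECIDED;
  size_t i = 0;
  int total = 0;

  while (i < len && total < SKETCH_MAX_EOL_CHECK)
    {
      unsigned char c = src[i++];
      enum sketch_eol this_eol;

      if (c == '\n')
        this_eol = EOL_LF;
      else if (c == '\r')
        {
          if (i < len && src[i] == '\n')
            {
              this_eol = EOL_CRLF;
              i++;		/* consume the LF of the pair */
            }
          else
            this_eol = EOL_CR;
        }
      else
        continue;		/* ordinary byte, keep scanning */

      total++;
      if (seen == EOL_UNDECIDED)
        seen = this_eol;	/* first end-of-line found */
      else if (seen != this_eol)
        return EOL_LF;		/* inconsistent: do not decode */
    }
  return seen;
}
```

Under this sketch, the bytes abc CR CR LF are classified as EOL_LF: the
first CR counts as a CR-style end-of-line, and the CR LF that follows is a
different pattern, so no end-of-line decoding would be attempted.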


---
Ken'ichi HANDA
handa@etl.go.jp

From Marc.Fleischeuers@kub.nl  Wed Aug  6 01:32:59 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "" " 6" "August" "1997" "10:32:13" "+0200" "Marc Fleischeuers" "Marc.Fleischeuers@kub.nl" nil "50" "Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from mailnews.kub.nl (mailnews.kub.nl [137.56.0.220]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id BAA14030 for <voelker@cs.washington.edu>; Wed, 6 Aug 1997 01:32:53 -0700
Received: from PI0737.kub.nl (pi0737.kub.nl [137.56.38.229]) by mailnews.kub.nl (8.8.5/8.7.1) with SMTP id KAA20618; Wed, 6 Aug 1997 10:32:12 +0200 (MET DST)
References: <199707291709.NAA14978@psilocin.gnu.ai.mit.edu> 	<199707310605.XAA15156@joker.cs.washington.edu> <uyb6niore.fsf@kub.nl> 	<199707312038.NAA15222@joker.cs.washington.edu> 	<199707312342.TAA19727@psilocin.gnu.ai.mit.edu> <uk9i6s4wa.fsf@kub.nl> 	<199708030423.AAA25764@psilocin.gnu.ai.mit.edu> 	<199708040133.KAA04718@etlken.etl.go.jp> <u3eoq70ck.fsf@kub.nl> 	<199708050630.CAA31410@psilocin.gnu.ai.mit.edu> <ulo2hjd2t.fsf@kub.nl> 	<199708051721.NAA05128@psilocin.gnu.ai.mit.edu>
In-Reply-To: Richard Stallman's message of Tue, 5 Aug 1997 13:21:52 -0400
Message-ID: <ud8nrww36.fsf@kub.nl>
Lines: 50
X-Mailer: Gnus v5.3/Emacs 19.33
From: Marc Fleischeuers <Marc.Fleischeuers@kub.nl>
Sender: marcf@PI0737.kub.nl
To: Richard Stallman <rms@gnu.ai.mit.edu>
Cc: Marc.Fleischeuers@kub.nl, voelker@cs.washington.edu, handa@etl.go.jp,         andrewi@harlequin.co.uk
Subject: Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]
Date: 06 Aug 1997 10:32:13 +0200

voelker@cs.washington.edu (Geoff Voelker) writes:

> But at some point, Marc and Handa expressed dismay at the code I added
> in dos-w32.el for determining the coding system for a file.  Do you
> still think this?

Yes, I still think this. But I think this little exercise has helped
in making the issue clearer.

Richard Stallman <rms@gnu.ai.mit.edu> writes:

>     I would like this behaviour to be dependent on the
>     buffer-file-coding-system in effect for a buffer, *not* the operating
>     system emacs runs on.
> 
> It already is, I believe.  The default choice for buffer-file-coding-system
> is different on MSDOS.
> 
> Does this seem to be untrue in your experience?

My gripe comes down to this: I understand a different default for
buffer-file-coding-system; the problem is that as a simple user, I
don't *see* this. M-x describe-coding-system says
buffer-file-coding-system is nil. The mode line indicator is `:'. I
only found out what was going on when I looked at
`find-buffer-file-type-coding-system' in lisp/dos-w32.el.

At this point, I added the following definitions to my ~/.emacs:

(defun untranslated-file-p (filename)
  "Return t if FILENAME is on a filesystem that does not require 
CR/LF translation, and nil otherwise."
  t)

(setq-default buffer-file-coding-system 'undecided-dos)

These definitions are a gross hack and by no means do I recommend them
for use in emacs. However, my purpose was the following: first, I
wanted to disable the selection of a coding system based on file
system, file name and file existence in
`find-buffer-file-coding-system', second, I wanted to have an explicit,
user-visible, default. The above definitions serve this purpose
well.  This is how I like emacs to be.

Marc


-- 
Computer! End program!
Computer! Create _new_ program!

From Marc.Fleischeuers@kub.nl  Wed Aug  6 02:01:42 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "" " 6" "August" "1997" "11:01:38" "+0200" "Marc Fleischeuers" "Marc.Fleischeuers@kub.nl" nil "40" "Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from mailnews.kub.nl (mailnews.kub.nl [137.56.0.220]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id CAA14752 for <voelker@cs.washington.edu>; Wed, 6 Aug 1997 02:01:40 -0700
Received: from PI0737.kub.nl (pi0737.kub.nl [137.56.38.229]) by mailnews.kub.nl (8.8.5/8.7.1) with SMTP id LAA22366; Wed, 6 Aug 1997 11:01:38 +0200 (MET DST)
References: <199708051600.RAA14620@propos.long.harlequin.co.uk>
In-Reply-To: Andrew Innes's message of Tue, 5 Aug 1997 17:00:38 +0100 (BST)
Message-ID: <ubu3bwuq5.fsf@kub.nl>
Lines: 40
X-Mailer: Gnus v5.3/Emacs 19.33
From: Marc Fleischeuers <Marc.Fleischeuers@kub.nl>
Sender: marcf@PI0737.kub.nl
To: Andrew Innes <andrewi@harlequin.co.uk>
Cc: handa@etl.go.jp, rms@gnu.ai.mit.edu, Marc.Fleischeuers@kub.nl,         voelker@cs.washington.edu
Subject: Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]
Date: 06 Aug 1997 11:01:38 +0200

Andrew Innes <andrewi@harlequin.co.uk> writes:

> (Replying to several messages.)

> I agree with Handa that Emacs should not do this.  Nearly all text files
> will use a single end-of-line convention throughout, and thus pose no
> problem.  I can't think of circumstances in which a user would encounter
> text files containing extra CR characters like this.  If such
> circumstances really are rare, then having to edit in "binary" mode where
> all CRs are explicit seems reasonable.

True, this is what you would like to see. If emacs is in
"no-conversion" or "unix" eol-mode (`buffer-file-coding-system'
matches "-unix", and the mode line indicator is `:'), then I think it
is not unreasonable to have to enter `C-q C-m C-q C-j' if you want to
make a "dos" file.

Incidentally, this actually works if `untranslated-file-p' returns `t'
indiscriminately, and the default value of `buffer-file-coding-system'
is `undecided-dos' (that is, my .emacs settings since
yesterday). Given the input

C-x C-f M-backspace M-backspace n e w . f i l e return C-x return f u
n d e c i d e d - u n i x return a b c C-q RET C-q C-j a b c C-q RET
C-q C-j C-x C-s C-x C-v return

File new.file is written containing exactly the bytes I input (as I
wanted to, i.e., abc\C-m\C-jabc\C-m\C-j) and it's read in the way it
should be: as a dos-file. Heck, I can even create a Mac-file on an
MS-DOS machine like this!

> (BTW, does Emacs 20 distinguish between text files in CODING_EOL_LF, and
> binary files?  I think such a distinction is useful - a binary file
> might contain all sorts of odd combinations of CR and LF, but a text
> file should normally use a single convention throughout.)

No. There is a `CODING_EOL_UNDECIDED', but in src/coding.c this case
is treated the same as `CODING_EOL_LF'.

Marc

From Marc.Fleischeuers@kub.nl  Wed Aug  6 02:13:59 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "" " 6" "August" "1997" "11:11:30" "+0200" "Marc Fleischeuers" "Marc.Fleischeuers@kub.nl" nil "24" "Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from mailnews.kub.nl (mailnews.kub.nl [137.56.0.220]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id CAA15107 for <voelker@cs.washington.edu>; Wed, 6 Aug 1997 02:13:56 -0700
Received: from PI0737.kub.nl (pi0737.kub.nl [137.56.38.229]) by mailnews.kub.nl (8.8.5/8.7.1) with SMTP id LAA22961; Wed, 6 Aug 1997 11:11:30 +0200 (MET DST)
References: <199707291709.NAA14978@psilocin.gnu.ai.mit.edu> 	<199707310605.XAA15156@joker.cs.washington.edu> <uyb6niore.fsf@kub.nl> 	<199707312038.NAA15222@joker.cs.washington.edu> 	<199707312342.TAA19727@psilocin.gnu.ai.mit.edu> <uk9i6s4wa.fsf@kub.nl> 	<199708030423.AAA25764@psilocin.gnu.ai.mit.edu> 	<199708040133.KAA04718@etlken.etl.go.jp> 	<199708050838.EAA00520@psilocin.gnu.ai.mit.edu> 	<199708051200.VAA07192@etlken.etl.go.jp> 	<199708051818.OAA05876@psilocin.gnu.ai.mit.edu> 	<199708051840.LAA26604@joker.cs.washington.edu>
In-Reply-To: voelker@cs.washington.edu's message of Tue, 05 Aug 1997 11:34:09 -0700 (PDT)
Message-ID: <uafivwu9p.fsf@kub.nl>
Lines: 24
X-Mailer: Gnus v5.3/Emacs 19.33
From: Marc Fleischeuers <Marc.Fleischeuers@kub.nl>
Sender: marcf@PI0737.kub.nl
To: voelker@cs.washington.edu (Geoff Voelker)
Cc: Marc.Fleischeuers@kub.nl, handa@etl.go.jp, rms@gnu.ai.mit.edu,         andrewi@harlequin.co.uk
Subject: Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]
Date: 06 Aug 1997 11:11:30 +0200

voelker@cs.washington.edu (Geoff Voelker) writes:

> I've been gone from Friday until this morning, and so I've been
> catching up on the mail that's been sent on this thread.  From what I
> can tell, Handa's latest patch fixes Marc's problem (have you had a
> chance to try this, Marc?).  

Yes I installed this new function. Emacs is a little more predictable
and I have seen no big surprises. The (admittedly incorrectly made)
msdos-file that contains \C-m\C-m\C-j line separator chars is still
not displayed "correctly", it is shown as

abc^M^M
abc^M^M

and buffer-file-coding-system is nil. I think however that this is
ok. For one, \C-m\C-m\C-j as a line separator does not agree with any
convention so the way that a file containing these characters is
displayed is arbitrary anyway. Instead, I think it is more fruitful to
avoid situations where files with these erroneous line-endings are
created, i.e. making it clearer for users what the eol-conventions at
any given time are.

Marc

From Marc.Fleischeuers@kub.nl  Wed Aug  6 02:58:58 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "" " 6" "August" "1997" "11:58:54" "+0200" "Marc Fleischeuers" "Marc.Fleischeuers@kub.nl" nil "30" "Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from mailnews.kub.nl (mailnews.kub.nl [137.56.0.220]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id CAA16012 for <voelker@cs.washington.edu>; Wed, 6 Aug 1997 02:58:56 -0700
Received: from PI0737.kub.nl (pi0737.kub.nl [137.56.38.229]) by mailnews.kub.nl (8.8.5/8.7.1) with SMTP id LAA26126 for <voelker@cs.washington.edu>; Wed, 6 Aug 1997 11:58:54 +0200 (MET DST)
References: <199707291709.NAA14978@psilocin.gnu.ai.mit.edu> 	<199707310605.XAA15156@joker.cs.washington.edu> <uyb6niore.fsf@kub.nl> 	<199707312038.NAA15222@joker.cs.washington.edu> 	<199707312342.TAA19727@psilocin.gnu.ai.mit.edu> <uk9i6s4wa.fsf@kub.nl> 	<199708051851.LAA18429@joker.cs.washington.edu>
In-Reply-To: voelker@cs.washington.edu's message of Tue, 05 Aug 1997 11:46:18 -0700 (PDT)
Message-ID: <u90yfws2p.fsf@kub.nl>
Lines: 30
X-Mailer: Gnus v5.3/Emacs 19.33
From: Marc Fleischeuers <Marc.Fleischeuers@kub.nl>
Sender: marcf@PI0737.kub.nl
To: voelker@cs.washington.edu (Geoff Voelker)
Subject: Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]
Date: 06 Aug 1997 11:58:54 +0200

voelker@cs.washington.edu (Geoff Voelker) writes:

> > the variable with M-x set-variable RET buffer-file-type but when I
> > press return all I get is [no match].
> >
> > ...
> >
> > Apropos'ing around I found another promising variable,
> > `buffer-file-format', valid values for which are found in
> > `format-alist'. In this alist there seems to be an appropriate format,
> > `ibm'. However, `M-x set-variable RET buffer-file-format' again gives
> > [no match]. 
> 
> Marc,
> 
> I couldn't quite tell if you had figured this out yet or not, but
> set-variable works on variables that have been defvar'd (which these
> have not).  For these, you would want to use setq.
> 
> -geoff

Er, yes, I was mimicking an average user. I think your advice was
addressed to an experienced emacs debugger so I was not really fair. I
do think however that end-of-line stuff should be at the control of a
user, without having to resort to lisp.

Marc
-- 
Computer! End program!
Computer! Create _new_ program!

From andrewi@harlequin.co.uk  Wed Aug  6 05:32:07 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Wed" " 6" "August" "1997" "13:30:20" "+0100" "Andrew Innes" "andrewi@harlequin.co.uk" nil "42" "Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from holly.cam.harlequin.co.uk (holly.cam.harlequin.co.uk [193.128.4.58]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id FAA16523 for <voelker@cs.washington.edu>; Wed, 6 Aug 1997 05:32:04 -0700
Received: from propos.long.harlequin.co.uk (propos.long.harlequin.co.uk [193.128.93.50])           by holly.cam.harlequin.co.uk (8.8.4/8.8.4) with ESMTP 	  id NAA25198; Wed, 6 Aug 1997 13:30:56 +0100 (BST)
Received: from woozle.long.harlequin.co.uk (woozle.long.harlequin.co.uk [193.128.93.77]) by propos.long.harlequin.co.uk (8.8.4/8.6.12) with SMTP id NAA02181; Wed, 6 Aug 1997 13:30:20 +0100 (BST)
Message-Id: <199708061230.NAA02181@propos.long.harlequin.co.uk>
In-reply-to: <199708051953.PAA07564@psilocin.gnu.ai.mit.edu> (message from 	Richard Stallman on Tue, 5 Aug 1997 15:53:20 -0400)
From: Andrew Innes <andrewi@harlequin.co.uk>
To: rms@gnu.ai.mit.edu
CC: handa@etl.go.jp, Marc.Fleischeuers@kub.nl, voelker@cs.washington.edu,         Marc.Fleischeuers@kub.nl
Subject: Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]
Date: Wed, 6 Aug 1997 13:30:20 +0100 (BST)

On Tue, 5 Aug 1997 15:53:20 -0400, Richard Stallman <rms@gnu.ai.mit.edu> said:
>    (BTW, does Emacs 20 distinguish between text files in CODING_EOL_LF, and
>    binary files?
>
>There is a distinction which perhaps you could interpret in this way:
>whether no-conversion is specified as the coding system.

I'm not familiar with this; does no-conversion imply no character set
conversion, or no EOL conversion (or both)?  If the latter (no charset
or EOL conversion), then that nicely expresses the distinction I have
between binary and text.

>		   I think such a distinction is useful - a binary file
>    might contain all sorts of odd combinations of CR and LF, but a text
>    file should normally use a single convention throughout.)
>
>What, specifically, is it useful for?

It is probably only useful (in practical terms) in small ways, such as
if insert-file-contents were to check whether the chosen/specified EOL
coding is used consistently for all lines in text files; obviously that
would not be appropriate for binary files.

I guess the distinction is meaningful to me as a user, though of little
consequence to the way Emacs handles files.  That is, I think of certain
types of file (eg. .gz, .tar, .zip, .obj files etc) as binary, and I
therefore expect to do different things with such files in Emacs than I
would with text files.

So, while I would think it quite natural to change the EOL coding for a
buffer visiting a text file, I would want Emacs to query me if I tried
to do the same for a buffer visiting a binary file.  If I visited a text
file that had mixed EOL coding, I would want to be told about it, and
probably would want the option to have all lines converted to the same
coding.

I might like to have a find-file-hooks function that sets truncate-lines
to t for text files, and to nil for binary files.  That sort of thing.

It is not a big deal, but to my mind makes sense.

AndrewI

From rms@gnu.ai.mit.edu  Wed Aug  6 10:51:10 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Wed" " 6" "August" "1997" "13:50:52" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "8" "Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id KAA20275 for <voelker@cs.washington.edu>; Wed, 6 Aug 1997 10:51:10 -0700
Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id NAA24101; Wed, 6 Aug 1997 13:50:52 -0400
Message-Id: <199708061750.NAA24101@psilocin.gnu.ai.mit.edu>
In-reply-to: <ud8nrww36.fsf@kub.nl> (message from Marc Fleischeuers on 06 Aug 	1997 10:32:13 +0200)
References: <199707291709.NAA14978@psilocin.gnu.ai.mit.edu> 	<199707310605.XAA15156@joker.cs.washington.edu> <uyb6niore.fsf@kub.nl> 	<199707312038.NAA15222@joker.cs.washington.edu> 	<199707312342.TAA19727@psilocin.gnu.ai.mit.edu> <uk9i6s4wa.fsf@kub.nl> 	<199708030423.AAA25764@psilocin.gnu.ai.mit.edu> 	<199708040133.KAA04718@etlken.etl.go.jp> <u3eoq70ck.fsf@kub.nl> 	<199708050630.CAA31410@psilocin.gnu.ai.mit.edu> <ulo2hjd2t.fsf@kub.nl> 	<199708051721.NAA05128@psilocin.gnu.ai.mit.edu> <ud8nrww36.fsf@kub.nl>
From: Richard Stallman <rms@gnu.ai.mit.edu>
To: Marc.Fleischeuers@kub.nl
CC: Marc.Fleischeuers@kub.nl, voelker@cs.washington.edu, handa@etl.go.jp,         andrewi@harlequin.co.uk
Subject: Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]
Date: Wed, 6 Aug 1997 13:50:52 -0400

    second, I wanted to have an explicit,
    user-visible, default.

I agree it is better to make the decision about the default coding
system for a new file when the buffer is created--not delay it until
saving the file.

Can someone write this?

From rms@gnu.ai.mit.edu  Wed Aug  6 11:06:18 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Wed" " 6" "August" "1997" "14:06:15" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "8" "Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id LAA21460 for <voelker@cs.washington.edu>; Wed, 6 Aug 1997 11:06:17 -0700
Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id OAA24241; Wed, 6 Aug 1997 14:06:15 -0400
Message-Id: <199708061806.OAA24241@psilocin.gnu.ai.mit.edu>
In-reply-to: <199708061230.NAA02181@propos.long.harlequin.co.uk> (message from 	Andrew Innes on Wed, 6 Aug 1997 13:30:20 +0100 (BST))
References:  <199708061230.NAA02181@propos.long.harlequin.co.uk>
From: Richard Stallman <rms@gnu.ai.mit.edu>
To: andrewi@harlequin.co.uk
CC: handa@etl.go.jp, Marc.Fleischeuers@kub.nl, voelker@cs.washington.edu,         Marc.Fleischeuers@kub.nl
Subject: Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]
Date: Wed, 6 Aug 1997 14:06:15 -0400

    >What, specifically, is it useful for?

    It is probably only useful (in practical terms) in small ways, such as
    if insert-file-contents were to check whether the chosen/specified EOL
    coding is used consistently for all lines in text files; obviously that
    would not be appropriate for binary files.

no-conversion is useful for that.

From rms@gnu.ai.mit.edu  Wed Aug  6 11:07:10 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Wed" " 6" "August" "1997" "14:07:17" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "5" "Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id LAA21544 for <voelker@cs.washington.edu>; Wed, 6 Aug 1997 11:07:09 -0700
Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id OAA24249; Wed, 6 Aug 1997 14:07:17 -0400
Message-Id: <199708061807.OAA24249@psilocin.gnu.ai.mit.edu>
In-reply-to: <199708061230.NAA02181@propos.long.harlequin.co.uk> (message from 	Andrew Innes on Wed, 6 Aug 1997 13:30:20 +0100 (BST))
References:  <199708061230.NAA02181@propos.long.harlequin.co.uk>
From: Richard Stallman <rms@gnu.ai.mit.edu>
To: andrewi@harlequin.co.uk
CC: handa@etl.go.jp, Marc.Fleischeuers@kub.nl, voelker@cs.washington.edu,         Marc.Fleischeuers@kub.nl
Subject: Re: [Marc.Fleischeuers@kub.nl: Emacs 20.0.92 on Windows NT 4.0: error converting cr-lf]
Date: Wed, 6 Aug 1997 14:07:17 -0400

    I might like to have a find-file-hooks function that sets truncate-lines
    to t for text files, and to nil for binary files.  That sort of thing.

I think it would work, now, to do this by testing
whether buffer-file-coding-system is no-conversion.

From eliz@is.elta.co.il  Tue Aug 26 23:55:55 1997
X-VM-v5-Data: ([nil nil nil nil t nil nil nil nil]
	[nil "Wed" "27" "August" "1997" "09:55:17" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" "<Pine.SUN.3.91.970827095430.7942A-100000@is>" "29" "Re: Coding system issues (3)" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id XAA02550 for <voelker@cs.washington.edu>; Tue, 26 Aug 1997 23:55:53 -0700
Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) 	id JAA07948; Wed, 27 Aug 1997 09:55:18 +0300
X-Sender: eliz@is
In-Reply-To: <199708270431.AAA26954@psilocin.gnu.ai.mit.edu>
Message-ID: <Pine.SUN.3.91.970827095430.7942A-100000@is>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
From: Eli Zaretskii <eliz@is.elta.co.il>
To: Richard Stallman <rms@gnu.ai.mit.edu>
cc: handa@etl.go.jp, voelker@cs.washington.edu, andrewi@harlequin.co.uk
Subject: Re: Coding system issues (3)
Date: Wed, 27 Aug 1997 09:55:17 +0300 (IDT)


On Wed, 27 Aug 1997, Richard Stallman wrote:

> The right thing to do, for a file which has byte codes 200-377 which
> Emacs can't understand, is to turn off enable-multibyte-characters.
> That way it is safe to read in the file no matter what byte values it
> has.

When Emacs detects binary characters that don't fit into any known
coding system, it assumes that coding-category-binary has been
assigned an appropriate coding system.  Currently, this is
no-conversion by default.  Handa suggested to change that default and
assign to it the (new) coding system to be called raw-text that would
still do EOL conversions.

Will this solve the problem?

If it will, then the only problem that remains is how do we make sure
that a truly binary file that happens to have a few CRLF pairs
doesn't get detected as raw-text-dos.  This seems to call for some
kind of heuristic in detect_eol_type, slightly more complicated than
what's there today (which just compares the number of CRLF pairs
against a compile-time threshold, currently set to 3).

I also think the solution proposed by Handa is better than turning off
enable-multibyte-characters, because the latter would probably mean
that users won't be able to assign something different to
coding-category-binary, in case they need to customize the handling of
binary files.

From eliz@is.elta.co.il  Wed Aug 27 00:28:27 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Wed" "27" "August" "1997" "10:28:01" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" nil "18" "Re: Coding system issues (3)" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id AAA04042 for <voelker@cs.washington.edu>; Wed, 27 Aug 1997 00:28:24 -0700
Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) 	id KAA08074; Wed, 27 Aug 1997 10:28:01 +0300
X-Sender: eliz@is
In-Reply-To: <199708270721.AAA16471@joker.cs.washington.edu>
Message-ID: <Pine.SUN.3.91.970827102338.7942M-100000@is>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
From: Eli Zaretskii <eliz@is.elta.co.il>
To: Geoff Voelker <voelker@cs.washington.edu>
cc: rms@gnu.ai.mit.edu, handa@etl.go.jp, andrewi@harlequin.co.uk
Subject: Re: Coding system issues (3)
Date: Wed, 27 Aug 1997 10:28:01 +0300 (IDT)


On Wed, 27 Aug 1997, Geoff Voelker wrote:

> This does not seem correct since a file could be on an "untranslated"
> filesystem and still need a coding system (the untranslated only
> refers to EOL).  Do people agree that "binary" in this context really
> means use LFs for EOL?  If so, then undecided-unix should probably be
> used instead of no-conversion.

I agree, but only for files created by Emacs.  An existing file should set
the EOL type according to its content when it is visited and use that EOL
type when it is saved, even on untranslated systems, because that's what
Emacs would do on Unix (where all filesystems are currently treated as
untranslated). 

I think the automatic decoding of EOLs has taken most of the sting out of
the untranslated filesystems feature, except for the case of files created by
Emacs.

From handa@etl.go.jp  Wed Aug 27 00:33:02 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Wed" "27" "August" "1997" "16:33:26" "+0900" "Kenichi Handa" "handa@etl.go.jp" nil "38" "Re: Coding system issues (3)" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from mail1-im.etl.go.jp (mail1-im.etl.go.jp [192.50.105.9]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id AAA04164 for <voelker@cs.washington.edu>; Wed, 27 Aug 1997 00:32:57 -0700
Received: from etlpom.etl.go.jp (etlpom.etl.go.jp [192.31.200.9]) by mail1-im.etl.go.jp (8.8.5/3.5Wpl1-96112918) with ESMTP 	id QAA01585; Wed, 27 Aug 1997 16:32:31 +0900 (JST)
Received: from etlken.etl.go.jp (etlken.etl.go.jp [192.31.197.11]) by etlpom.etl.go.jp (8.8.5/3.5Wpl4-ETL_MASTER) with SMTP id QAA16416; Wed, 27 Aug 1997 16:32:29 +0900 (JST)
Received: by etlken.etl.go.jp (SMI-8.6/6.4J.6-ETL.SLAVE) 	id QAA04792; Wed, 27 Aug 1997 16:33:26 +0900
Message-Id: <199708270733.QAA04792@etlken.etl.go.jp>
In-reply-to: <Pine.SUN.3.91.970827095430.7942A-100000@is> (message from Eli 	Zaretskii on Wed, 27 Aug 1997 09:55:17 +0300 (IDT))
References:  <Pine.SUN.3.91.970827095430.7942A-100000@is>
From: Kenichi Handa <handa@etl.go.jp>
To: eliz@is.elta.co.il
CC: rms@gnu.ai.mit.edu, voelker@cs.washington.edu, andrewi@harlequin.co.uk
Subject: Re: Coding system issues (3)
Date: Wed, 27 Aug 1997 16:33:26 +0900

Eli Zaretskii <eliz@is.elta.co.il> writes:
> When Emacs detects binary characters that don't fit into any known
> coding system, it assumes that coding-category-binary has been
> assigned an appropriate coding system.  Currently, this is
> no-conversion by default.  Handa suggested to change that default and
> assign to it the (new) coding system to be called raw-text that would
> still do EOL conversions.

> Will this solve the problem?

I think so, and I believe Richard does too.  Richard asked me to implement
my suggestion, and also asked me to turn off
enable-multibyte-characters when Emacs detects a file is raw-text.
But, of course, we notice that this can't solve all of the situations.

I've just done it.  But, I found one problem.  When we set
enable-multibyte-characters to nil, mode-line doesn't show any
information about coding system (except for EOL format).  Perhaps, we
have to modify mode-line-format so that it shows `B' if
buffer-file-coding-system is no-conversion and `T' in the other cases.

> If it will, then the only problem that remains is how do we make sure
> that a truly binary file that happens to have a few CRLF pairs
> doesn't get detected as raw-text-dos.  This seems to call for some
> kind of heuristic in detect_eol_type, slightly more complicated than
> what's there today (which just compares the number of CRLF pairs
> against a compile-time threshold, currently set to 3).

I don't think it's worth implementing that kind of heuristic, because
there will always be cases that we can't detect correctly.  In addition,
such code makes Emacs slower when reading a normal file.

Or, do you have any idea on detecting EOL format without making Emacs
much slower?

---
Ken'ichi HANDA
handa@etl.go.jp

From handa@etl.go.jp  Wed Aug 27 00:45:42 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Wed" "27" "August" "1997" "16:46:50" "+0900" "Kenichi Handa" "handa@etl.go.jp" nil "32" "Re: Coding system issues (3)" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from mail1-im.etl.go.jp (mail1-im.etl.go.jp [192.50.105.9]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id AAA04483 for <voelker@cs.washington.edu>; Wed, 27 Aug 1997 00:45:41 -0700
Received: from etlpom.etl.go.jp (etlpom.etl.go.jp [192.31.200.9]) by mail1-im.etl.go.jp (8.8.5/3.5Wpl1-96112918) with ESMTP 	id QAA02426; Wed, 27 Aug 1997 16:45:39 +0900 (JST)
Received: from etlken.etl.go.jp (etlken.etl.go.jp [192.31.197.11]) by etlpom.etl.go.jp (8.8.5/3.5Wpl4-ETL_MASTER) with SMTP id QAA17012; Wed, 27 Aug 1997 16:45:39 +0900 (JST)
Received: by etlken.etl.go.jp (SMI-8.6/6.4J.6-ETL.SLAVE) 	id QAA04804; Wed, 27 Aug 1997 16:46:50 +0900
Message-Id: <199708270746.QAA04804@etlken.etl.go.jp>
In-reply-to: <199708270721.AAA16471@joker.cs.washington.edu> 	(voelker@cs.washington.edu)
References: <199708270431.AAA26954@psilocin.gnu.ai.mit.edu> 	<Pine.SUN.3.91.970827095430.7942A-100000@is> <199708270721.AAA16471@joker.cs.washington.edu>
From: Kenichi Handa <handa@etl.go.jp>
To: voelker@cs.washington.edu
CC: eliz@is.elta.co.il, rms@gnu.ai.mit.edu, andrewi@harlequin.co.uk
Subject: Re: Coding system issues (3)
Date: Wed, 27 Aug 1997 16:46:50 +0900

voelker@cs.washington.edu (Geoff Voelker) writes:
> This discussion about no-conversion has made me rethink part of
> find-buffer-file-type-coding-system.  For files that are specified to
> be "binary" in file-name-buffer-file-type-alist or
> untranslated-filesystem-list, the no-conversion coding system is used.
> This does not seem correct since a file could be on an "untranslated"
> filesystem and still need a coding system (the untranslated only
> refers to EOL).  Do people agree that "binary" in this context really
> means use LFs for EOL?  If so, then undecided-unix should probably be
> used instead of no-conversion.

I don't know what you mean by `"binary" in this context'.

If you mean coding system `binary' or you use the word in a normal
context, e.g. "This is a text file, that is a binary file", then
`binary' means "no code conversion (including EOL format) required".

But I have now set coding-category-binary to raw-text.  So, in the context
of coding-category-binary, `binary' refers only to the text part, and EOL
is automatically detected.

So, your mail makes me think that we had better:
o treat `coding-category-binary' as truly binary even for EOL format,
o set it to `no-conversion',
o make a new category `coding-category-raw-text',
o and set it to raw-text.

What do you think?

---
Ken'ichi HANDA
handa@etl.go.jp

From eliz@is.elta.co.il  Wed Aug 27 00:46:31 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Wed" "27" "August" "1997" "10:46:08" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" nil "39" "Re: Coding system issues (3)" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id AAA04520 for <voelker@cs.washington.edu>; Wed, 27 Aug 1997 00:46:29 -0700
Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) 	id KAA08109; Wed, 27 Aug 1997 10:46:09 +0300
X-Sender: eliz@is
In-Reply-To: <199708270733.QAA04792@etlken.etl.go.jp>
Message-ID: <Pine.SUN.3.91.970827103541.7942O-100000@is>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
From: Eli Zaretskii <eliz@is.elta.co.il>
To: Kenichi Handa <handa@etl.go.jp>
cc: rms@gnu.ai.mit.edu, voelker@cs.washington.edu, andrewi@harlequin.co.uk
Subject: Re: Coding system issues (3)
Date: Wed, 27 Aug 1997 10:46:08 +0300 (IDT)


On Wed, 27 Aug 1997, Kenichi Handa wrote:

> Richard asked me to implement
> my suggestion, and also asked me to turn off
> enable-multibyte-characters when Emacs detects a file is raw-text.

I thought setting coding-category-binary to raw-text is enough.  Your 
two-line test was all I needed to read msdos.c correctly.  So why do you 
also need to turn enable-multibyte-characters off?

> I've just done it.

Could you please send me the diffs so I could test this?  Thanks.

> > If it will, then the only problem that remains is how do we make sure
> > that a truly binary file that happens to have a few CRLF pairs
> > doesn't get detected as raw-text-dos.  This seems to call for some
> > kind of heuristic in detect_eol_type, slightly more complicated than
> > what's there today (which just compares the number of CRLF pairs
> > against a compile-time threshold, currently set to 3).
> 
> I don't think it's worth implementing such kind of heuristics, because
> there's anyway a case that we can't detect correctly.

Then how would you suggest to solve the case of a true binary file (say, 
an executable program) that happens to have 3 or more CRLF pairs in it?  
As far as I understand, Emacs will convert the CRLF pairs on input and 
add a CR to any LF on output, which is disastrous in such cases.

> Or, do you have any idea on detecting EOL format without making Emacs
> much slower?

The idea is to not give up checking the file after you've seen the first 3
CRLF pairs, but look into the file some more.  I didn't think about this
enough to have a working solution.  I wanted first to be sure that people
agree that this is the way to go.  But generally, I don't think this would
make the input much slower than it is already, if the heuristic is 
implemented in C (inside decode_coding or thereabouts). 
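
A rough Python sketch of the idea, not the actual detect_eol_type code.
Calling the hypothetical `detect_eol` with `limit=3` mimics the behaviour
described above (decide after the first 3 line terminators); calling it
without a limit keeps scanning and reports an inconsistency when later
data contradicts the early guess.  The function names and return labels
are assumptions made for illustration; only the threshold of 3 comes from
this thread.

```python
def eol_kinds(data: bytes):
    """Yield 'dos', 'unix' or 'mac' for each line terminator in data."""
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b == 0x0D and i + 1 < n and data[i + 1] == 0x0A:
            yield "dos"    # CR LF pair
            i += 2
        elif b == 0x0A:
            yield "unix"   # bare LF
            i += 1
        elif b == 0x0D:
            yield "mac"    # bare CR
            i += 1
        else:
            i += 1


def detect_eol(data: bytes, limit=None) -> str:
    """Return the EOL kind, 'inconsistent', or 'undecided'.

    With limit=N the scan stops after N agreeing terminators;
    with limit=None the whole input is examined.
    """
    first = None
    for seen, kind in enumerate(eol_kinds(data), start=1):
        if first is None:
            first = kind
        elif kind != first:
            return "inconsistent"
        if limit is not None and seen >= limit:
            break
    return first or "undecided"


# A made-up "binary" blob whose first three terminators happen to be CRLF.
blob = b"\x00\r\n\x01\r\n\x02\r\n\xff\n\x00\r\x07"
early = detect_eol(blob, limit=3)   # stops early and trusts the CRLF pairs
full = detect_eol(blob)             # keeps looking and sees the bare LF
```

The full scan costs one extra pass over the data, which is the speed
concern raised in this thread.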

From handa@etl.go.jp  Wed Aug 27 00:47:52 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Wed" "27" "August" "1997" "16:48:35" "+0900" "Kenichi Handa" "handa@etl.go.jp" nil "11" "Re: Coding system issues (3)" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from mail1-im.etl.go.jp (mail1-im.etl.go.jp [192.50.105.9]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id AAA04554 for <voelker@cs.washington.edu>; Wed, 27 Aug 1997 00:47:47 -0700
Received: from etlpom.etl.go.jp (etlpom.etl.go.jp [192.31.200.9]) by mail1-im.etl.go.jp (8.8.5/3.5Wpl1-96112918) with ESMTP 	id QAA02490; Wed, 27 Aug 1997 16:47:25 +0900 (JST)
Received: from etlken.etl.go.jp (etlken.etl.go.jp [192.31.197.11]) by etlpom.etl.go.jp (8.8.5/3.5Wpl4-ETL_MASTER) with SMTP id QAA17064; Wed, 27 Aug 1997 16:47:24 +0900 (JST)
Received: by etlken.etl.go.jp (SMI-8.6/6.4J.6-ETL.SLAVE) 	id QAA04817; Wed, 27 Aug 1997 16:48:35 +0900
Message-Id: <199708270748.QAA04817@etlken.etl.go.jp>
In-reply-to: <Pine.SUN.3.91.970827102338.7942M-100000@is> (message from Eli 	Zaretskii on Wed, 27 Aug 1997 10:28:01 +0300 (IDT))
References:  <Pine.SUN.3.91.970827102338.7942M-100000@is>
From: Kenichi Handa <handa@etl.go.jp>
To: eliz@is.elta.co.il
CC: voelker@cs.washington.edu, rms@gnu.ai.mit.edu, andrewi@harlequin.co.uk
Subject: Re: Coding system issues (3)
Date: Wed, 27 Aug 1997 16:48:35 +0900

Eli Zaretskii <eliz@is.elta.co.il> writes:
> I think the automatic decoding of EOLs has taken most of the sting out of 
> untranslated filesystems feature, except for the case of files created by 
> Emacs.

I'm sorry I don't understand the wording "has taken most of the sting
out of ...".  Could you please tell it in an easier English?  ^.^;;;

---
Ken'ichi HANDA
handa@etl.go.jp

From eliz@is.elta.co.il  Wed Aug 27 00:54:41 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Wed" "27" "August" "1997" "10:54:00" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" nil "23" "Re: Coding system issues (3)" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id AAA04724 for <voelker@cs.washington.edu>; Wed, 27 Aug 1997 00:54:40 -0700
Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) 	id KAA08125; Wed, 27 Aug 1997 10:54:01 +0300
X-Sender: eliz@is
In-Reply-To: <199708270746.QAA04804@etlken.etl.go.jp>
Message-ID: <Pine.SUN.3.91.970827104903.7942P-100000@is>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
From: Eli Zaretskii <eliz@is.elta.co.il>
To: Kenichi Handa <handa@etl.go.jp>
cc: voelker@cs.washington.edu, rms@gnu.ai.mit.edu, andrewi@harlequin.co.uk
Subject: Re: Coding system issues (3)
Date: Wed, 27 Aug 1997 10:54:00 +0300 (IDT)


On Wed, 27 Aug 1997, Kenichi Handa wrote:

> But, now I set coding-category-binary to raw-text.  So, in the context
> of coding-category-binary, `binary' refer only to text part, and EOL
> is automatically detected.
> 
> So, your mail makes me think that we had better:
> o treat `coding-category-binary' as truly binary even for EOL format,
> o set it to `no-conversion',
> o make a new category `coding-category-raw-text',
> o and set it to raw-text.

This might be an OK solution, but I'm afraid I don't understand how would 
Emacs distinguish between these two coding categories (binary and 
raw-text)?

Let's take msdos.c and emacs.exe as two examples.  The former is a text 
file where EOLs should be decoded, the latter is a binary file where EOLs 
should NOT be converted.

Assuming that emacs.exe has at least 3 CRLF pairs in it, how would Emacs 
know which conversion to apply in each of these two cases?

From eliz@is.elta.co.il  Wed Aug 27 00:56:07 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Wed" "27" "August" "1997" "10:55:45" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" nil "13" "Re: Coding system issues (3)" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id AAA04746 for <voelker@cs.washington.edu>; Wed, 27 Aug 1997 00:56:05 -0700
Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) 	id KAA08130; Wed, 27 Aug 1997 10:55:45 +0300
X-Sender: eliz@is
In-Reply-To: <199708270748.QAA04817@etlken.etl.go.jp>
Message-ID: <Pine.SUN.3.91.970827105414.7942Q-100000@is>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
From: Eli Zaretskii <eliz@is.elta.co.il>
To: Kenichi Handa <handa@etl.go.jp>
cc: voelker@cs.washington.edu, rms@gnu.ai.mit.edu, andrewi@harlequin.co.uk
Subject: Re: Coding system issues (3)
Date: Wed, 27 Aug 1997 10:55:45 +0300 (IDT)


On Wed, 27 Aug 1997, Kenichi Handa wrote:

> Eli Zaretskii <eliz@is.elta.co.il> writes:
> > I think the automatic decoding of EOLs has taken most of the sting out of 
> > untranslated filesystems feature, except for the case of files created by 
> > Emacs.
> 
> I'm sorry I don't understand the wording "has taken most of the sting
> out of ...".  Could you please tell it in an easier English?  ^.^;;;

I mean it made the untranslated-filesystems feature unnecessary in many
cases where it would have been used in Emacs 19.

From handa@etl.go.jp  Wed Aug 27 01:25:00 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Wed" "27" "August" "1997" "17:24:33" "+0900" "Kenichi Handa" "handa@etl.go.jp" nil "68" "Re: Coding system issues (3)" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from mail1-im.etl.go.jp (mail1-im.etl.go.jp [192.50.105.9]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id BAA05431 for <voelker@cs.washington.edu>; Wed, 27 Aug 1997 01:24:59 -0700
Received: from etlpom.etl.go.jp (etlpom.etl.go.jp [192.31.200.9]) by mail1-im.etl.go.jp (8.8.5/3.5Wpl1-96112918) with ESMTP 	id RAA04228; Wed, 27 Aug 1997 17:23:22 +0900 (JST)
Received: from etlken.etl.go.jp (etlken.etl.go.jp [192.31.197.11]) by etlpom.etl.go.jp (8.8.5/3.5Wpl4-ETL_MASTER) with SMTP id RAA19038; Wed, 27 Aug 1997 17:23:21 +0900 (JST)
Received: by etlken.etl.go.jp (SMI-8.6/6.4J.6-ETL.SLAVE) 	id RAA04887; Wed, 27 Aug 1997 17:24:33 +0900
Message-Id: <199708270824.RAA04887@etlken.etl.go.jp>
In-reply-to: <Pine.SUN.3.91.970827103541.7942O-100000@is> (message from Eli 	Zaretskii on Wed, 27 Aug 1997 10:46:08 +0300 (IDT))
References:  <Pine.SUN.3.91.970827103541.7942O-100000@is>
From: Kenichi Handa <handa@etl.go.jp>
To: eliz@is.elta.co.il
CC: rms@gnu.ai.mit.edu, voelker@cs.washington.edu, andrewi@harlequin.co.uk
Subject: Re: Coding system issues (3)
Date: Wed, 27 Aug 1997 17:24:33 +0900

Eli Zaretskii <eliz@is.elta.co.il> writes:
>> Richard asked me to implement
>> my suggestion, and also asked me to turn off
>> enable-multibyte-characters when Emacs detects a file is raw-text.

> I thought setting coding-category-binary to raw-text is enough.  Your 
> two-line test was all I needed to read msdos.c correctly.  So why do you 
> also need to turn enable-multibyte-characters off?

By turning enable-multibyte-characters off, you avoid seeing garbage
characters when some part of the buffer contents happens to match
Emacs' internal format, and you also avoid incorrect cursor movement
in such a case.

>> I've just done it.

> Could you please send me the diffs so I could test this?  Thanks.

I'll attach the current diff at the tail.  Please note that it
also contains patches not related to the current discussion.  I'll
update FSF's code as soon as we reach some agreement.

>> I don't think it's worth implementing such kind of heuristics, because
>> there's anyway a case that we can't detect correctly.

> Then how would you suggest to solve the case of a true binary file (say, 
> an executable program) that happens to have 3 or more CRLF pairs in it?  
> As far as I understand, Emacs will convert the CRLF pairs on input and 
> add a CR to any LF on output, which is disastrous in such cases.

The 100% safe way is:
o set the default value of enable-multibyte-characters to nil,
o or register the target file name in file-coding-system-alist,
o or visit the file by C-x RET c no-conversion RET FILENAME.

>> Or, do you have any idea on detecting EOL format without making Emacs
>> much slower?

> The idea is to not give up checking the file after you've seen the first 3
> CRLF pairs, but look into the file some more.  I didn't think about this
> enough to have a working solution.  I wanted first to be sure that people
> agree that this is the way to go.  But generally, I don't think this would
> make the input much slower than it is already, if the heuristic is 
> implemented in C (inside decode_coding or thereabouts). 

The problem with the current file-reading mechanism is that it doesn't
read the whole text at once, instead it does:
1) reads one bunch
2) detects coding
3) if coding is decided, decodes the bunch just read, goto 5)
4) goto 1)
5) reads the remaining bunches while decoding them by the decided coding.

So, currently, if the detecting routine at step 2 can't decide the EOL
format, it inserts the text as is into the buffer.
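
The five steps can be sketched roughly as follows in Python.  This is
only an illustration of the control flow, not the real Emacs code: the
names `guess_eol`, `decode_eol` and `insert_file`, the chunk size, and
the trivial detector are all assumptions.  The one detail added beyond
the steps above is that a trailing CR is held back so that a CRLF pair
split across two chunks is still seen as a pair.

```python
import io


def guess_eol(chunk: bytes):
    """Hypothetical step-2 detector: 'dos', 'unix', or None if undecided."""
    if b"\r\n" in chunk:
        return "dos"
    if b"\n" in chunk:
        return "unix"
    return None


def decode_eol(chunk: bytes, eol) -> bytes:
    """Convert CRLF to LF for 'dos'; leave everything else untouched."""
    return chunk.replace(b"\r\n", b"\n") if eol == "dos" else chunk


def insert_file(stream, chunk_size=4096):
    """Read stream chunk by chunk; return (buffer contents, decided EOL)."""
    buffer = bytearray()
    eol = None
    pending = b""
    while True:
        chunk = stream.read(chunk_size)          # steps 1 and 5: read a chunk
        data = pending + chunk
        pending = b""
        if chunk and data.endswith(b"\r"):
            data, pending = data[:-1], b"\r"     # keep a possible CRLF together
        if eol is None and data:
            eol = guess_eol(data)                # step 2: try to detect
        buffer += decode_eol(data, eol)          # undecided text goes in as is
        if not chunk:
            break
    return bytes(buffer), eol


# Once the first chunk has decided "unix", later CRLF pairs are left alone.
mixed = insert_file(io.BytesIO(b"one\ntwo\r\n"), chunk_size=4)
```

Because the decision is made on the first chunk that contains any line
terminator, whatever comes later in the file cannot change it.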

I understand that we had better change this mechanism.  But, it
requires another big change in the current code, and I'm afraid it
will delay shipping Emacs much more.

And I believe the patch I attached will handle most cases.  Considering
the trade-off between keeping code detection fast and making it more
intelligent, I think the former is more important as long as we can't
have 100% correct code detection.

---
Ken'ichi HANDA
handa@etl.go.jp

From handa@etl.go.jp  Wed Aug 27 01:27:11 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Wed" "27" "August" "1997" "17:27:58" "+0900" "Kenichi Handa" "handa@etl.go.jp" nil "30" "Re: Coding system issues (3)" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from mail1-im.etl.go.jp (mail1-im.etl.go.jp [192.50.105.9]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id BAA06114 for <voelker@cs.washington.edu>; Wed, 27 Aug 1997 01:27:10 -0700
Received: from etlpom.etl.go.jp (etlpom.etl.go.jp [192.31.200.9]) by mail1-im.etl.go.jp (8.8.5/3.5Wpl1-96112918) with ESMTP 	id RAA04510; Wed, 27 Aug 1997 17:26:48 +0900 (JST)
Received: from etlken.etl.go.jp (etlken.etl.go.jp [192.31.197.11]) by etlpom.etl.go.jp (8.8.5/3.5Wpl4-ETL_MASTER) with SMTP id RAA19240; Wed, 27 Aug 1997 17:26:47 +0900 (JST)
Received: by etlken.etl.go.jp (SMI-8.6/6.4J.6-ETL.SLAVE) 	id RAA04900; Wed, 27 Aug 1997 17:27:58 +0900
Message-Id: <199708270827.RAA04900@etlken.etl.go.jp>
In-reply-to: <Pine.SUN.3.91.970827104903.7942P-100000@is> (message from Eli 	Zaretskii on Wed, 27 Aug 1997 10:54:00 +0300 (IDT))
References:  <Pine.SUN.3.91.970827104903.7942P-100000@is>
From: Kenichi Handa <handa@etl.go.jp>
To: eliz@is.elta.co.il
CC: voelker@cs.washington.edu, rms@gnu.ai.mit.edu, andrewi@harlequin.co.uk
Subject: Re: Coding system issues (3)
Date: Wed, 27 Aug 1997 17:27:58 +0900

Eli Zaretskii <eliz@is.elta.co.il> writes:
>> But, now I set coding-category-binary to raw-text.  So, in the context
>> of coding-category-binary, `binary' refer only to text part, and EOL
>> is automatically detected.
>> 
>> So, your mail makes me think that we had better:
>> o treat `coding-category-binary' as truly binary even for EOL format,
>> o set it to `no-conversion',
>> o make a new category `coding-category-raw-text',
>> o and set it to raw-text.

> This might be an OK solution, but I'm afraid I don't understand how would 
> Emacs distinguish between these two coding categories (binary and 
> raw-text)?

Only by the consistency of the EOL format.  If it is consistent, the
file is raw-text; if not, it's no-conversion.
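
As a minimal sketch of this rule in Python (the function name `classify`
is hypothetical, and the real detection works on a limited prefix of the
file rather than on the whole contents):

```python
import re

# CR LF must come first in the alternation so a pair is not split in two.
_EOL = re.compile(rb"\r\n|\r|\n")

_SUFFIX = {b"\r\n": "dos", b"\n": "unix", b"\r": "mac"}


def classify(data: bytes) -> str:
    """Return 'no-conversion' for mixed EOLs, else a raw-text variant."""
    kinds = {m.group() for m in _EOL.finditer(data)}
    if len(kinds) > 1:
        return "no-conversion"        # inconsistent EOLs: treat as binary
    if not kinds:
        return "raw-text"             # no terminator seen: EOL undecided
    return "raw-text-" + _SUFFIX[kinds.pop()]


source_file = classify(b"int main()\r\n{\r\n}\r\n")
mixed_blob = classify(b"MZ\x90\r\n\x00\n\x03")
# A binary blob that happens to contain only CRLF pairs still looks like text.
unlucky_blob = classify(b"\x7fELF\r\n\x00\r\n")
```

The last call shows the kind of rare case that no consistency check can
catch.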

> Let's take msdos.c and emacs.exe as two examples.  The former is a text 
> file where EOLs should be decoded, the latter is a binary file where EOLs 
> should NOT be converted.
> Assuming that emacs.exe has at least 3 CRLF pairs in it, how would Emacs 
> know which conversion to apply in each of these two cases?

There is no way.  We can always imagine rare cases that make any
code-detection mechanism fail.

---
Ken'ichi HANDA
handa@etl.go.jp

From handa@etl.go.jp  Wed Aug 27 01:29:03 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Wed" "27" "August" "1997" "17:29:18" "+0900" "Kenichi Handa" "handa@etl.go.jp" nil "166" "Re: Coding system issues (3)" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from mail1-im.etl.go.jp (mail1-im.etl.go.jp [192.50.105.9]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id BAA06187 for <voelker@cs.washington.edu>; Wed, 27 Aug 1997 01:29:01 -0700
Received: from etlpom.etl.go.jp (etlpom.etl.go.jp [192.31.200.9]) by mail1-im.etl.go.jp (8.8.5/3.5Wpl1-96112918) with ESMTP 	id RAA04680; Wed, 27 Aug 1997 17:28:08 +0900 (JST)
Received: from etlken.etl.go.jp (etlken.etl.go.jp [192.31.197.11]) by etlpom.etl.go.jp (8.8.5/3.5Wpl4-ETL_MASTER) with SMTP id RAA19303; Wed, 27 Aug 1997 17:28:07 +0900 (JST)
Received: by etlken.etl.go.jp (SMI-8.6/6.4J.6-ETL.SLAVE) 	id RAA04906; Wed, 27 Aug 1997 17:29:18 +0900
Message-Id: <199708270829.RAA04906@etlken.etl.go.jp>
In-reply-to: <Pine.SUN.3.91.970827105414.7942Q-100000@is> (message from Eli 	Zaretskii on Wed, 27 Aug 1997 10:55:45 +0300 (IDT))
References:  <Pine.SUN.3.91.970827105414.7942Q-100000@is>
From: Kenichi Handa <handa@etl.go.jp>
To: eliz@is.elta.co.il
CC: voelker@cs.washington.edu, rms@gnu.ai.mit.edu, andrewi@harlequin.co.uk
Subject: Re: Coding system issues (3)
Date: Wed, 27 Aug 1997 17:29:18 +0900

Oops, sorry, I forgot to attach the patch.  Here it is.

---
Ken'ichi HANDA
handa@etl.go.jp

------------------------------------------------------------
begin 664 all.diff.gz
M'XL("![=`S0``V%L;"YD:69F`-P]^W?31K,_FW/_B,6%QJ:6X[<3YZ-M,('/
M-`\:!]J>TA,4:6V+R)*1Y(3<POW;[\SLKK229<<)I:3-H8D>,[.[,[/SVEW5
M=D8C9IA6<,@,@W^PW+G-'_<GIC?F^_Y8>_:HREU+O_\_[>9D]_E0NW5]T[;Y
M*`045JUN\JEIA4:C5JU5MSO543C:=+DSW3PPS_G(<7G5\1:@%B#N/7KT:#U:
MA9,Y9[OS,6MT6*/6J[=ZS2ZK;V]W[QF&<7U#A5^X+=`!J]5K-WOUFD!_E/ZA
M'C5;E5:-T2UC\.\;=N2Q7[GG?&"F9[-HPMG@R0$['G9JM5J%V?[\S.6&[4=L
MS*.0A5;`+Z&]^:P*R/CX,:O>,_!*W#SX$_Y^$K]5"[],'&O"]G`4+/+9/.3X
MQ_*]"QY$[&1PPG`H]$X`[3OA3#RK$(&SJX@;EC^=P:-%$.KXF'L\,"-.(\#G
M#/EDN$X8@5"KQ$H8>W,;I&X8N:QI=2OM9HHUP_F9[03<BOS`$?T[XVQJVIPY
M(QAI&%CP^A-S0F:#5O*`>Q$;!?Z4.F'-`WQ`A!25J^J]^VSXZLG3P?'P\?NY
MZ;@L/#]7S3TC+JB1V((4LL=VK,CQ/9-Z,?(#UI\X'@<V/G_"K(D9F%;$`Q;R
M"(4"",;S)X_?T)!;G4J[(8;\]XWI"XTG3V;U1K-2;VPKJ0'LXP=_]O\[.-P;
M[AEPASIX>'1HI%_()Y]@O"]V7^[B,[S^Z>AX;_<0K_9>'1^]E-<'@V&?5/D^
M,UVWAV\/=OM#?"5'C9>_'!WO/_V44CK%AA.?<2^<!ZB;9L0N@8VFQX*Y)U2Y
M"MV>`+LC,X`IAHQWQIX?`+]*'H<9(F8`=[PQFSA1&<5DXAP*D%5@]4!0?&3.
MW8A=F"Y8$G_$J(-"Y^N-!C!H2ZG`UV,0D;M3[,G5IV:G4N^T4E;@>.Z2P8I-
M#"GY)OH7H=+]7T_VC@^,IX,^/(P<TEHQX!Y<%1[\>?SJT)`\,5QX=39W7!O8
M,)N9T>33IN.!NGLF30EW$PC`_#`L[X*]070&/^"IH/=LHW1F1M;$4"#2@D;E
M#6;`=!0=2R;Q9M*QQ7Z,F*"5LJVBUT*6W^B6-O3G@<6EG?;(&ZR>U6;`B88-
MP@Z<LSE"FB`[E.?0KZ"L(I19I,EVRDTO%%J@&R))9YDQT@P18P-/4+3,D%=0
ME][-0R`,WI*HN(YW'J*]>@C"Z\'?PH_[AZ?#'S6>/8";'Y>\L>"5!>^(UA,.
MEHNCHGHXD/G,!G88L88;R"MPH-SEH#(P,9COVI*!E^00%8N0G2#*R`0;:#.(
M+90UP8[J,Z:79W*PJVA"4?Z$JR!VP"F#!MU'#4*`$;T&SXQC0>75`1C.&VY-
M_,24?63C@,\08_2)?<\V;7ZQZ<U=0`/F`YN#.=])T+D+YCN8HEX1QB.V`X.-
M`6S?4]!XB7]O."V$;L?MR1E1+.7RG;TI5M\4RT4QA1TOC,@RP2^<"##4W]F'
MMR7+AD8'A\.3W?U]8!EP;//,\39GEW;Y+;O_&$&T!W\P.7+A6>O-=J4.\8+N
M6S_+4GS'"K_#-):3^`_V\2.;GL>S^A]L2%:H,`'=STA$-IJ2Q@IAD$+\'<K$
MI`*3JO]M[<+OF"5Z+T;.%U+N7+?8`;?8[2BW")W'J1Z,TDUL*K71GT%\2UTF
M++(PQ;X_NT*3B<--0O\4&J0]U6*"9\U26I2!C:T,^!%F6)`>JGB#T`L?60XW
MYE,S/&>U'8'TX0*QOOT6'$>D&[LRVY&TTZJ9TV`<,_TUC3*I:,+.*FE;+OA(
M-O7#R+VBZYXP1FTP1IVN,D9W7SX9Z-@D_(4"N]:FK)H$<H+F"CTM]K^XZ]^1
M&'3!2R:OIPSVUZF.@+G+6+HIN$+#FMJY2-<AK*B=K$;,EE)JO5IC52GE&FKI
MRDH+J'565%;:6Y5F=TO92,:*$#E'F&^,YAY%QC2-+`P%L0I")I_M[PT.&$X$
MFF@8#^R9$!LJ%$:QK.M"1RZ=:,(@=H*X<5P1>''DRSQSRJO%LG"X)1`6I$_Y
M3J4$2&4`@OZ]RNL"1HD)7=#?:I%&4X(XEI5*R5@6R:M7Y3)A``X$ND`Q?D'-
MT@M\A(PH63`C$D16XA]FIF=35PT<%/8EI@>((8_>:PR%R:;CX\\]@\&_-`^@
MJ0P?`!4A$RZ06"99;H0P>H0JD9:8T,P%%WB2'R[#V6&@LR^+%_&P77F+V"/H
M.>1!WMB88K#$BMB;HAB^2X@P[9>$`0JHG-`C+KAB]*X8-5D$'/4%0,O4R'"\
MV3PRICR:^#;S*(J$$9]HR1-!,`F!:0),@,B!/&D\A\@DXA\B63M#[>YL*P]S
M9[7[VX"'E!*&ZRHY9JMHP;^(GINSF7NUH.6B>S?1Z2\HW3OF,GQO=#.7(1!N
MX3($8F$(7AB-?!UL>ZW7;O5JV[=Q&9*:[C+:O7JCUVPM=QF-5JW2:+6%R\#F
MQ(-DEA5%"4:VYZ((P7Y@>8L*!6=S+(I0_5M8EZ+2%*QX0)]LM#CA51CQ*406
M;",P+PV4.FNS'R)\4CPV+TD/*K(H(6LP""-K$B$+H`%(7K>,,P>?VF@5RX`-
M__(:@GYO.*$/+&LTC"[B--@/+V@X@^$1P\>0,N(0ND21>P(?AH6_?0]FS/,:
MS<:-4LD,+<>!S!057/V7[W_KW4JST4S\+P5PHF<0:?$Q6!(#^]4U,)8JI/MH
MN+YUOA)M*XNVA6AAV%B*=>:,VX7"AB6*VW0;6X9%6,\,K@#:\V4Z'H)V*7&"
MA5!,G@4.V*SH2K`G=WS"8#?JP(YN8K#_F>Q0&GM+3MPIXW83NW8;DX;VYUG@
M"/O38+7M7KO1:[1N;LVRAJS3J]=[[?JJV+=1@0@AF7OU'AM.G%%DO!@,60D<
MS\'0^,GTWCEE8;RF8%A@FD,(\,*$@`_7?E[V*X3:Z#$P$Z138..0XV@5IJ9W
MQ<#Y.:87A0*PV6-/0(>R!-5:$M)#]6KU&&C*!>A%A?7[^W#CCP-S&N+%A6/S
MV/Y`(B8-D>F.4;,F4XQ8#@[W#HX.!WT,5DQM>4JLG-G`0=>\@A[XF(H!.?"N
MG'PMU:AURXC4GA[UC>')\>#PN:!G^]9\"JZ:V"^M^'+L9_NHB>&,6\X(XY8I
M5J!G$,<X(<8TPCT@(8C+.$9:)[^]W%,!'(I("^#^)2(2%-H]=BPM1<:3X6K"
MM>Z,_3ODG#LUMYN5YO9V,C5W=M@N1L%!"*P"<PJ=BU?&WJ9ZLH'\G_$@NJKJ
MH;&"@=Y)_W]!@30$%>";*5@M9*QQ61IZ2H0`LH1K'I13C7DP8]'5C)<A^?L/
M7;$:7'XO+EN"7(D'`:`5<9B8(1`K(9B%<!YD4ZM66\4D&,]O8^KQJ>\Y%K7S
M.+YE/S!J+;ZO-[KI-F.]P';Y5&O71*T%D9Y![!_K2Q5[0A-N&[=:U)()]X]B
M?/N?R?B\&=#JU"NM3C<N7E,)KR18-Y.YF6N.13JW\-:.WY;5(@HE=BA`72`M
M":9JA$P-8^"!E!U;SNF8ASCWB:LM5@*;5R[*/(]0,6M+:03;2-]J+:^!I+0A
M&V;EH<:PZ19#N9VD46EU8YV^PZP4N*6H`#\[S*GR*BL]5JHM"8O$>]FX,['H
M'9=.KN)W0?&[W93I!UL3SL]"QW;,N$4FB52$+['\(.#AS/=L*NF8:2A%2/DN
MFT<@8/"+W+,-?V0HMP@.2AJOG)%SWS50%G'\+XS'?Z2$FDH_A)5+.FSD\YGN
M!$Y-5DM(6[M@@;?2%OA.CQ^-9\(#36$_GQ]Y^M%I;54Z:LL;,JS3KE<ZG9JV
MC(1-`E$(0FP#IK`H.!C)(SF5L??9(H#`QGQ-8E%-=U6'L:90B!G!WV>YEDF-
M`;B`4*44&/$U0Y@X^)WH$8AN$('[!7F+X'`Q',2=&2S@IEU52,)2<`\]CD$U
M-+':K7Q/F#A@?<0:Z[(L8X2!DKDS*2JM#JZ;HRK@&R6I"JEP`*$K)9=MK+G5
MZCV\N$F6&E/2T]1FKUF/%WSR%+[>@9BLWNTF^P5QKQ#%YS"GC*3@#%8B78*N
MJLK:SL[.-]]\8\XC'UF<E*.I0\;R)1<L8:<KTN"YXOV//]/"[,RTSDW<L>9X
MV7HU="BU*'.X>["'VX,(3"08\6N9Z*#)4N7#I`H<[L`@/#1/KGM58:#RX<2?
MN[9,-J[(IKW%`6P@3V*JT!;T`5\*FG)#6!1PK@KGVA()*SY[FF5FCQ4%`\1D
MDCQAI7BIQS!#302Y*T'$Q;(D(X)BP7C<M`)VP,,E544Y?B,H&@(9?3^U*5K/
MX*0EIZ'0,YC6;'8^)@"ZH`=^F(JR*?HE;/X!X$*8_'%/!"VL8-[7[7Z"@E:'
MC,P"$EIB'H:@'ZS8-[V-B%U"_DO;C>1FD8?AFV*1J:&)@::I(X:D'H/%4=8M
MR,<K48HYT$PL,C_D+G!^H26!$K.QE`A(K,LE`MT8S5V7%:N/WKR!J?Z@B!X@
M](-HD2$TI5#A'X9B9X+>30&<RK[$ROE\%N\6KZJ^F1<H-VM./B9F36+3X[$F
M?!O[D4^N@-63YH2C\4+:5D4I?H##9!,0,.0J,3(*R/7]<_16H(>E`)*F#S/C
M_=R/=&T4:&+@!:W%$BTG&A![H*>/`V:<C-1T/H7$I8'R0H#BP]35WL1*$T"L
MP<T`&@`C=6D&MD8/S=>5(;I+=7E)H2`6HI*N`!^.(<4#HS`V@S.R;^:("BO`
ME@P[4MR48SOC8T=LL<R,$`/&A0$4:)D,]UQBW[!$HD.4E_0,5RT7[+'>*['`
M=S.&;"#I91);&%4A7R`%M1X))AO;-^U$`Y,1(R_X>US3*WG1A#4I`"VSC0WI
MED*8\L*Y*!01F0;^F)2\("P%A*4S"#OEV$0H6Y>M%%:Q52Q2:E--:OXRGDI^
M*B.08X(FW#K7)S3EE@I>TU5<-S8@QIO)^;DP!538"<[)B\)E=-**1\,RIHZ7
M`I`ZD%&`HO)`'&Q?S.:BG!+K2W]!S"6US"L&5M;$O6BE5INI@FX22G(1L8BV
M!J`@3M47DM&7/QSBOXV'87PC+LIOO*)0!D$1=:TN=`V2[)>[_9]VG^\9&)MD
MH1HQU/[NX?-7`*9!+&HIP.WV3P:O=T_VC&>O#OM9<LV%1D\&)_NI5DFW:9;'
M,$_WAOWCP<N3P=$AI!B>*.&6-226W:51?504#;9%@QE@(4Q9S*VE(750Y=M=
M$X,C$<TDW7HVV-_+\$R/BD+3"T'8H+XH[#P%5DA)@%0N+QCZV-]B&2;!CYWR
M+1P?$5Z>W*57%#4T;$H2C4UZZ=QQW25-+?7SM*V\&*>E2:BHF4A95MO!N+>D
MUA.$QL6%`I$7;-7CDU)_;U[`XDTK>)?9N+(Z37B+W=E8L7M+)0KR\JOG"L))
M4`:.9C1_8',/@@+VEGB7)HV$Y5!PX0=W/LBM1O&YOS=%?7/IFR*R1&,$_/L%
M]WC*?4>T+=]4*TX5(A/PJ>2)*C&*D@"DU([(/X%(>G<258K$D&XV(,$KJCMI
M2O59"=7RQ$G/FV(FY2#D9$$I(H4X1@_9;1(C+0^)FY*9F:STKDI!TBCIN&$A
M$4B`%>SU&8O>`,6+,EB\)G](C)MN1QE;94@78@^50L01ZLH<0@N%;YA$Y$7:
M6A:Q,G]@&B,S\:K6H5MG$%0:D\'DNLG#&B'6DH2!^GM]QK!.KO"96<*:R<^R
MQ`U+&<(^&7CZ33)1A2K79A'0>;"*-\PBKLT:5O,V$0!-^[B[P.RGXMQ<VAS'
M4?P9MTP\1N[HE5P=G0X!"H<(T(%_SKTDG4O,Q7.E5.`C'H;RL)Z=L1H2:=DX
M(%;\D,J`9!>&B3?(,?S2)9#AE]YIF!@1S9Z')-@P-N]QE!`;-+41>>&-MA/Y
MOBA,TT[B'(.?;#&6@*O<25Y=+4/!&2W6L[0BJ!8^:IXD/<[DQ6*Y!T7E\Q!=
MQ,0,V=N'8=JI5L2Q3WE0N*A2(-G'_,*@/KD6.H435>^0QB=M%W;&-.I4O`#/
M*H=\@4JB+]+0I[;LJ@Q-;A:9AK&*",$FU#)MQ]+#(6=8&:L&]Q1<,ISE=3E%
M4E;E<FMRN7G]3;/ZI3G]C3+Z:_+Y6V?S"Q[[FC)5OE4NY)GDY1G_\IS_NJP_
MYIKVY*\I`2A:6A&`Y14!=+A&#)<N`TB814?#<@L!.LGF0M-:*4`!4C&@L&XE
M8-TBP#KY?\SQ&U0`;IK\T]/$^L=ZDWC5`AD8,H?+7"IE;-)I5@5.SA1"5R=?
MQK-ZZ,-@9F9`IT%$D)YXW6*J*I&*8%95)'+LL)TR9+I[M-.C3UE5B-E=$T-V
MD4OX8%NSAU(^*W*_<0V$::VEBB#T>%D5A%Y>4P;1TI:5%8^ON.#LFGCR9,PW
MK:L`)WJT:L$Y'_B:!>=\)'U;-"XX=]?8%KV$4OJ`1Z/6:Z]8<&XU*BW]1."0
M1Y`TI@[A<._""7R/-C*5#E[M[Y7%%W&P4=>QV.[^R1Z$AV`##W][01\V"7/J
M`V)+`6KC?&;05@9#)UQ4Y,!?6_(2=\AO;;6WC79LJ9)WIBM7VR^\JW>L&#^_
M,B]Q6V)1'4GZGWLB6A>[8^6V&QBT?E#LWS-HK=J2OTEL&;%E[+H+4Y%[8W@P
M66,>)I#K3L($HS"<BRT?^'&RYGI;/O+(9,Y7M59/OW:EG7RT0&[C4B(#Z6<W
M=A6,&__D[+D51T\*A93IQQ.>^<=H&@(2(P//J"^CMV74UX*3IWER#_,L(9Q&
MT([Q+,`+&>$)$59(KG$F+3N_D]JRE4<R?.>$A<([><+`".E$`CS+@Q4'B%+G
MA\"\")M#DDX+Y&M)>ET)?O=Y&G'')*T.:GTA(>?N7VQ7NO7,],X;0F%#&\+2
M[FW<I'_94VQ+V9)[GH\M/<>V^DR?&/`R9;PC`\Z<V+O-6.^"6[QPH-=K!J@Z
M[+JN4<>A\/3%W*7]D)U>;9W/5N03RKC';J_67.$>:Y56YU;1Z6MHD_(MSEX/
MAOW!X-8Q6D))%%DV+N('QH6#9W_C0*V8>O4^B(-0B*A>#WX^%FN;0%:<)XF/
M%<MC5)8X,97N?O6>VA$<\%D`#S`5MAU0/2<"#)=-S>`<<MDKMDO#U';\FEC^
M=UW_,A3?X$%N=F\5]MX1;JX1W2Z0NSLB^#HV(PRL33'&JK4X4?6WR^V"#I7S
M`9M587(*-3WWVYU>8\77:CK=2E<[*U_"5:WY;,8#K%*6Z=.)>(!2.YXH@R?Q
MN5TJ'@,,556$ON-!3ZKHSIS(=)W_Y7:63/;+KJ1PK2H[@K>!\A:#^/N=EWB.
M%%=_L3BZ*>HWICC+28HEOH&0.9Z!7U,D.E@(P6].GOD7O,(FXJNBX9R^QV$R
M.D<J/YT`SP6]@#33P1H\'D,5+A?XI"6S7Y5/ZEL-JL"UFQDZ+I3#`,S4>5=\
MOW#$H<K$U@PB@TLIP#9ZP9)8!0^ZAG/\3J7D^0>+S\1A*^VH"U%0QUUDO]IW
M5Z1Y4Z'1ZE0:[;KV%8YV`Q[$W\S%#Z">'IV]PZ7WGT7/3I6,?^:^>XHG3G:R
M@**(=XHUP=,4T@+DS`^C4V3(:<)\H`RV\)0XI#W>@:0AA>OYJ;?$<"P7!UX:
M3JP!J-Z(DB^T(>B+)<V%?N$.F=-9X%L\1%C]5J)0+\4#M5-F)Y_%G:UNI='9
MSGR=8_,1ZH8\<4T^`LP]I3TXB(#T@%?H+!%*,42Y#H9'=.Y<N132\\!WA?KZ
M0*[.2L-A`S&&PR;,K7Q\B6:$_/T<=(4;#M[;<PMTIM0?#LH5*E;#Q)2;GDPW
MI(^6AXXM]Q^`X$+G#+Q[1'MR0G_*X4_2``H=OW?[:#,>,2Z,EE[O]4^.CE^R
MTFL:ZRF-%96$G]*V&'7:Z]MOV?W#P3X`_BI0EF,8WRNI_F[]H8Z+?<2/T;''
MC[%+I_VCIWNGR!=XFGW8%">6&[@-KM'5SRS_^Z3TSQ1/[J3JUIH@KGHM.=G\
MD0'*X/#Y:7_W9._YT?%OIP>[PY].GPP.=X]_*^\H@XP_6`M`WH%T2;1B)U.`
M!ZRO(A+1$#\,D3@CDC)*R)2KFRZ8+(H2I4C5Y[J)GP7ZY.%CW#:!YR-C$PA)
M)L28@55A\.L4'$E9=CP-1V'.*=4?<J%7#Y.4&1B#W$D^BOGEN2,90SXXESL;
M"Y]R*B69>OE.\"Y7T[9J8+ZWZBWM`[#`(^GZH*>X<GBJNT+U$4NY/TC`W<\`
MQJ?`@<OXU;*1/\?_!<7_-W=UOVG#0/R9_R+2'KIH>0`"6R6T21V$*1(-&]!]
M2),09:B)5(A&@SI4\;_O/GRQ\P&$3>MX:A5\/OO.OKN?<[Y@T^+G!Q[QKB:W
MN*7*V\;U#.O+P@IG^$."NL#,,XX&S$B%I:MHUH"@P$<J47C#P730IW'OU!+8
M26XP3AOFWM3G+F<V]\$BN:":[-&<7^Y29NL:PZ1-PN]THQ591(@^5O/M$3GX
M07<8C/WQQ`LF18F4KPZW`1)R7QL1U*5["8_:AB?)2`/LF\'R)NAY7;_G]>3&
M.\@$5<HS7L69J]41!\F24XPE[`P3SO/I<'&R@TS->=K4V+*>U%_Q>8^XK"CE
M#\.SK;[KS55E=,@?8549*3\#S^G>L`2QRC'-DPWF46/]BLR9(`]?&!.TSP:,
MULMLF.>HGNV.HMJ)Y<J2O148_^YAN[R-[SO41N6]*,^5H;#_/X@.#V+<L!*(
M#HLE_1IN-1`=YD!TZ_#KW:8+IJ'5,(-:)5!<<-/H(<90PT$O<K7*51LP8A&R
MRW0H(]63S"5M]HB'KTX-^WOO?VCG.C0^I")[/$\_G]\3]22/5G$T7#9)P2@]
M)L13`J8P"QXQ`Q@2A:CT2'?*@4+W??)ME%-2,,!D'DAN;3/,/'NY.?]$<*\*
MK,".3-&.T.1+1LIPF8H?$)<,S%]F83Z/3+=52!]/VGYNHO4B#_EURW*P+T#_
M1)57!-UU$W2_X`P\TV)W1[4F!62S-6C[#M.IR'EAP4A1?`E9ZETLES4(QIKK
M+L$_5)@LB4%YJYC3C;7^I$@'MD.A;1<45@"P^"'**V%G^A6K957AF$J]R#/U
MVXG9*@G7\>8NU+</U.Q9%5TC*%T"^HE4R;%9LGE0YK]03ZTHO^N/PS'.*1@"
MTSKVR^.A'F&Q[=&KZ]8=5[]5X!&-6%$-*RT3(BN:!6?1EWQD52(`W.:7YAX%
MC[Q/-_[(@U$"6`H^>Z.Q/PR45[-KM>]\IBVELP"'221@;KF,=Q4BA<\.T>E`
MVQ8BP(2:9D_D,>C;U26SG&T/[=@]8KF^^G9$-,\EF1.(,".>=E>J.!)G!7EJ
M)GD^1R-.K0H?V/]BB\=5@#(Y!'393HU6GC&XORC50(JN_-[7*7T?9(HO?&IU
MK@WGOG'<=NH`SV27_,%20&$?(336`GU`X>@Z$`]HFUJMN+'.:&>=B3337?6W
5TJ^\#VNG;\+GWX6_`2X)YP='=```
`
end

From eliz@is.elta.co.il  Wed Aug 27 02:13:22 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Wed" "27" "August" "1997" "12:12:01" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" nil "20" "Re: Coding system issues (3)" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id CAA07268 for <voelker@cs.washington.edu>; Wed, 27 Aug 1997 02:13:21 -0700
Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) 	id MAA08303; Wed, 27 Aug 1997 12:12:03 +0300
X-Sender: eliz@is
In-Reply-To: <199708270824.RAA04887@etlken.etl.go.jp>
Message-ID: <Pine.SUN.3.91.970827120729.8289A-100000@is>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
From: Eli Zaretskii <eliz@is.elta.co.il>
To: Kenichi Handa <handa@etl.go.jp>
cc: rms@gnu.ai.mit.edu, voelker@cs.washington.edu, andrewi@harlequin.co.uk
Subject: Re: Coding system issues (3)
Date: Wed, 27 Aug 1997 12:12:01 +0300 (IDT)


On Wed, 27 Aug 1997, Kenichi Handa wrote:

> By turning enable-multibyte-character off, you can avoid seeing some
> garbage characters when some part of the buffer contents matches
> Emacs' internal format incidentally, can avoid incorrect cursor moving
> in such a case.

I see.  In that case, I agree that it would be better to have the modeline
still show the coding system in the case where Emacs sees unknown binary
characters in the file.  ?t and ?b (for text and binary files,
respectively) are good enough for me.

> And, I believe the patch I attached will save most cases.  Considering
> the trade off between making code-detection not that slow and making
> code-detection more intelligent, I think the former is important as
> far as we can't have a 100% correct code-detection.

I agree.  Any heuristic should be based on user experience, and we cannot 
have that unless Emacs is released ;-).

From handa@etl.go.jp  Wed Aug 27 04:08:38 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Wed" "27" "August" "1997" "20:08:58" "+0900" "Kenichi Handa" "handa@etl.go.jp" nil "30" "Re: Coding system issues (3)" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from mail1-im.etl.go.jp (mail1-im.etl.go.jp [192.50.105.9]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id EAA09013 for <voelker@cs.washington.edu>; Wed, 27 Aug 1997 04:08:37 -0700
Received: from etlpom.etl.go.jp (etlpom.etl.go.jp [192.31.200.9]) by mail1-im.etl.go.jp (8.8.5/3.5Wpl1-96112918) with ESMTP 	id UAA11196; Wed, 27 Aug 1997 20:07:48 +0900 (JST)
Received: from etlken.etl.go.jp (etlken.etl.go.jp [192.31.197.11]) by etlpom.etl.go.jp (8.8.5/3.5Wpl4-ETL_MASTER) with SMTP id UAA26086; Wed, 27 Aug 1997 20:07:47 +0900 (JST)
Received: by etlken.etl.go.jp (SMI-8.6/6.4J.6-ETL.SLAVE) 	id UAA05054; Wed, 27 Aug 1997 20:08:58 +0900
Message-Id: <199708271108.UAA05054@etlken.etl.go.jp>
In-reply-to: <Pine.SUN.3.91.970827120729.8289A-100000@is> (message from Eli 	Zaretskii on Wed, 27 Aug 1997 12:12:01 +0300 (IDT))
References:  <Pine.SUN.3.91.970827120729.8289A-100000@is>
From: Kenichi Handa <handa@etl.go.jp>
To: eliz@is.elta.co.il
CC: rms@gnu.ai.mit.edu, voelker@cs.washington.edu, andrewi@harlequin.co.uk
Subject: Re: Coding system issues (3)
Date: Wed, 27 Aug 1997 20:08:58 +0900

Eli Zaretskii <eliz@is.elta.co.il> writes:
> On Wed, 27 Aug 1997, Kenichi Handa wrote:
>> By turning enable-multibyte-character off, you can avoid seeing some
>> garbage characters when some part of the buffer contents matches
>> Emacs' internal format incidentally, can avoid incorrect cursor moving
>> in such a case.

> I see.  In that case, I agree that it would be better to have the modeline
> still show the coding system in the case where Emacs sees unknown binary
> characters in the file.  ?t and ?b (for text and binary files,
> accordingly) are good enough for me. 

The remaining matter is how to show the state of
enable-multibyte-characters in the mode line.  Currently, two letters
(`-' and the coding system mnemonic) before the EOL indicator (`:', `\',
or `/') mean that enable-multibyte-characters is t, and the single
character `-' means that it is nil.  If we just show `t' or `b', it's
hard for users to know the status of enable-multibyte-characters.  My
idea is to turn the first letter `-' into `='.  And I prefer `=' to `b'
because the mnemonic letter of `no-conversion' is `='.

Another change I want to make is to change the mnemonic letter of
`emacs-mule' from `=' to `M', so that `=' always indicates that the
buffer contents are binary.

Richard, what do you think?  May I change the current code as above?

---
Ken'ichi HANDA
handa@etl.go.jp

From eliz@is.elta.co.il  Wed Aug 27 04:17:15 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Wed" "27" "August" "1997" "14:16:30" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" nil "18" "Re: Coding system issues (3)" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id EAA09124 for <voelker@cs.washington.edu>; Wed, 27 Aug 1997 04:17:13 -0700
Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) 	id OAA08460; Wed, 27 Aug 1997 14:16:31 +0300
X-Sender: eliz@is
In-Reply-To: <199708271108.UAA05054@etlken.etl.go.jp>
Message-ID: <Pine.SUN.3.91.970827141304.8439A-100000@is>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
From: Eli Zaretskii <eliz@is.elta.co.il>
To: Kenichi Handa <handa@etl.go.jp>
cc: rms@gnu.ai.mit.edu, voelker@cs.washington.edu, andrewi@harlequin.co.uk
Subject: Re: Coding system issues (3)
Date: Wed, 27 Aug 1997 14:16:30 +0300 (IDT)


On Wed, 27 Aug 1997, Kenichi Handa wrote:

> My
> idea is to turn the first letter `-' to `='.  And, I prefer `=' to `b'
> because mnemonic letter of `no-conversion' is `='.

Seems OK to me.

> Another change I want to do is to change mnemonic letter of
> `emacs-mule' from `=' to `M', then `=' always tells that the buffer
> contents are binary code.

It has bothered me for some time that emacs-mule and no-conversion both
have the same mnemonic, so the change is welcome.  However, I would
suggest leaving emacs-mule as `=' and inventing a new letter (`b'? `B'?)
for the binary case, since `=' means ``the usual case'', which is
emacs-mule.  The binary case is the exception.

From handa@etl.go.jp  Wed Aug 27 05:14:07 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Wed" "27" "August" "1997" "21:09:41" "+0900" "Kenichi Handa" "handa@etl.go.jp" nil "22" "Re: Coding system issues (3)" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from mail1-im.etl.go.jp (mail1-im.etl.go.jp [192.50.105.9]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id FAA10528 for <voelker@cs.washington.edu>; Wed, 27 Aug 1997 05:14:06 -0700
Received: from etlpom.etl.go.jp (etlpom.etl.go.jp [192.31.200.9]) by mail1-im.etl.go.jp (8.8.5/3.5Wpl1-96112918) with ESMTP 	id VAA13952; Wed, 27 Aug 1997 21:08:29 +0900 (JST)
Received: from etlken.etl.go.jp (etlken.etl.go.jp [192.31.197.11]) by etlpom.etl.go.jp (8.8.5/3.5Wpl4-ETL_MASTER) with SMTP id VAA28850; Wed, 27 Aug 1997 21:08:29 +0900 (JST)
Received: by etlken.etl.go.jp (SMI-8.6/6.4J.6-ETL.SLAVE) 	id VAA05105; Wed, 27 Aug 1997 21:09:41 +0900
Message-Id: <199708271209.VAA05105@etlken.etl.go.jp>
In-reply-to: <Pine.SUN.3.91.970827141304.8439A-100000@is> (message from Eli 	Zaretskii on Wed, 27 Aug 1997 14:16:30 +0300 (IDT))
References:  <Pine.SUN.3.91.970827141304.8439A-100000@is>
From: Kenichi Handa <handa@etl.go.jp>
To: eliz@is.elta.co.il
CC: rms@gnu.ai.mit.edu, voelker@cs.washington.edu, andrewi@harlequin.co.uk
Subject: Re: Coding system issues (3)
Date: Wed, 27 Aug 1997 21:09:41 +0900

Eli Zaretskii <eliz@is.elta.co.il> writes:
>> Another change I want to do is to change mnemonic letter of
>> `emacs-mule' from `=' to `M', then `=' always tells that the buffer
>> contents are binary code.

> It has bothered me for some time that emacs-mule and no-conversion both 
> have the same mnemonic, so the change is welcome.  However, I would suggest 
> leaving emacs-mule as `=' and inventing a new letter (`b'? `B'?) for the 
> binary case, since `=' means ``the usual case'', which is emacs-mule.  
> The binary case is the exception.

Unfortunately `B' is already used by `chinese-big5'.  And I want to
keep the letter `b' for a coding system which we may support in the
future.  And I think `=' is a good mnemonic for `no-conversion'
because the files' external and internal (to Emacs) codings are
`equal'.  In addition, all the other frequently used coding systems
have non-symbol mnemonics.  So, using the symbol `=' for the exception
(no-conversion) seems reasonable.

---
Ken'ichi HANDA
handa@etl.go.jp

From rms@gnu.ai.mit.edu  Wed Aug 27 09:23:57 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Wed" "27" "August" "1997" "12:25:17" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "8" "Re: Coding system issues (3)" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id JAA21124 for <voelker@cs.washington.edu>; Wed, 27 Aug 1997 09:23:56 -0700
Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id MAA29840; Wed, 27 Aug 1997 12:25:17 -0400
Message-Id: <199708271625.MAA29840@psilocin.gnu.ai.mit.edu>
In-reply-to: <Pine.SUN.3.91.970827095430.7942A-100000@is> (message from Eli 	Zaretskii on Wed, 27 Aug 1997 09:55:17 +0300 (IDT))
References:  <Pine.SUN.3.91.970827095430.7942A-100000@is>
From: Richard Stallman <rms@gnu.ai.mit.edu>
To: eliz@is.elta.co.il
CC: handa@etl.go.jp, voelker@cs.washington.edu, andrewi@harlequin.co.uk
Subject: Re: Coding system issues (3)
Date: Wed, 27 Aug 1997 12:25:17 -0400

      Handa suggested to change that default and
    assign to it the (new) coding system to be called raw-text that would
    still do EOL conversions.

    Will this solve the problem?

NO!  It is impossible to solve the problem unless
enable-multibyte-characters is set to nil.

From rms@gnu.ai.mit.edu  Wed Aug 27 09:37:08 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Wed" "27" "August" "1997" "12:38:39" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "22" "Re: Coding system issues (3)" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id JAA21924 for <voelker@cs.washington.edu>; Wed, 27 Aug 1997 09:37:07 -0700
Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id MAA29926; Wed, 27 Aug 1997 12:38:39 -0400
Message-Id: <199708271638.MAA29926@psilocin.gnu.ai.mit.edu>
In-reply-to: <199708270721.AAA16471@joker.cs.washington.edu> 	(voelker@cs.washington.edu)
References: <199708270431.AAA26954@psilocin.gnu.ai.mit.edu> 	<Pine.SUN.3.91.970827095430.7942A-100000@is> <199708270721.AAA16471@joker.cs.washington.edu>
From: Richard Stallman <rms@gnu.ai.mit.edu>
To: voelker@cs.washington.edu
CC: eliz@is.elta.co.il, handa@etl.go.jp, andrewi@harlequin.co.uk,         rms@gnu.ai.mit.edu
Subject: Re: Coding system issues (3)
Date: Wed, 27 Aug 1997 12:38:39 -0400

    This discussion about no-conversion has made me rethink part of
    find-buffer-file-type-coding-system.  For files that are specified to
    be "binary" in file-name-buffer-file-type-alist or
    untranslated-filesystem-list, the no-conversion coding system is used.

If a file is binary, no-conversion is right; but in addition,
enable-multibyte-characters should be turned off, so that no sequence
of bytes gets misinterpreted as a multibyte character.

    This does not seem correct since a file could be on an "untranslated"
    filesystem and still need a coding system (the untranslated only
    refers to EOL).

That is true.  Files which are on an untranslated file system, whose
individual names do not imply binary files, are not really "binary".

The best thing to do with them is this: if the file exists, read it
normally; if it does not exist, use undecided-unix as the coding
system for creating it.

Can you implement that?
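
A minimal sketch of the rule proposed above, written in Python rather
than Emacs Lisp.  The function name pick_coding_system, the pattern
list, and the use of plain `undecided' to stand for "read it normally"
are assumptions made for illustration; this is not the actual
find-buffer-file-type-coding-system code.

```python
import fnmatch

# Hypothetical stand-in for file-name-buffer-file-type-alist;
# the patterns are made up for illustration.
BINARY_PATTERNS = ["*.exe", "*.o", "*.zip"]


def pick_coding_system(name, exists, untranslated,
                       binary_patterns=BINARY_PATTERNS):
    """Return (coding_system, multibyte_enabled) for visiting NAME.

    exists       -- whether the file is already on disk
    untranslated -- whether it lives on an "untranslated" filesystem
    """
    base = name.lower()
    # A name that implies a binary file: no conversion at all, and
    # multibyte interpretation is switched off as well.
    if any(fnmatch.fnmatchcase(base, pat) for pat in binary_patterns):
        return ("no-conversion", False)
    # A new file on an untranslated filesystem: still text, but it
    # should be created with Unix-style line ends.
    if untranslated and not exists:
        return ("undecided-unix", True)
    # Everything else is read normally (assumed here to mean the
    # usual detection, i.e. `undecided').
    return ("undecided", True)
```

With these assumptions, an existing text file on an untranslated
filesystem is treated like any other text file; only newly created
files there get forced Unix line ends.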


From rms@gnu.ai.mit.edu  Wed Aug 27 09:43:57 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Wed" "27" "August" "1997" "12:45:22" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "6" "Re: Coding system issues (3)" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id JAA22345 for <voelker@cs.washington.edu>; Wed, 27 Aug 1997 09:43:55 -0700
Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id MAA29949; Wed, 27 Aug 1997 12:45:22 -0400
Message-Id: <199708271645.MAA29949@psilocin.gnu.ai.mit.edu>
In-reply-to: <199708270733.QAA04792@etlken.etl.go.jp> (message from Kenichi 	Handa on Wed, 27 Aug 1997 16:33:26 +0900)
References: <Pine.SUN.3.91.970827095430.7942A-100000@is> <199708270733.QAA04792@etlken.etl.go.jp>
From: Richard Stallman <rms@gnu.ai.mit.edu>
To: handa@etl.go.jp
CC: eliz@is.elta.co.il, voelker@cs.washington.edu, andrewi@harlequin.co.uk
Subject: Re: Coding system issues (3)
Date: Wed, 27 Aug 1997 12:45:22 -0400

    > If it will, then the only problem that remains is how do we make sure
    > that a truly binary file that happens to have a few CRLF pairs
    > doesn't get detected as raw-text-dos.

We cannot do that.  For true binary files, the user has to say
somehow that "this is a binary file".

From eliz@is.elta.co.il  Wed Aug 27 09:45:26 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Wed" "27" "August" "1997" "19:44:35" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" nil "9" "Re: Coding system issues (3)" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id JAA22400 for <voelker@cs.washington.edu>; Wed, 27 Aug 1997 09:45:22 -0700
Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) 	id TAA09240; Wed, 27 Aug 1997 19:44:36 +0300
X-Sender: eliz@is
In-Reply-To: <199708271638.MAA29926@psilocin.gnu.ai.mit.edu>
Message-ID: <Pine.SUN.3.91.970827194215.9184D-100000@is>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
From: Eli Zaretskii <eliz@is.elta.co.il>
To: Richard Stallman <rms@gnu.ai.mit.edu>
cc: voelker@cs.washington.edu, handa@etl.go.jp, andrewi@harlequin.co.uk
Subject: Re: Coding system issues (3)
Date: Wed, 27 Aug 1997 19:44:35 +0300 (IDT)


On Wed, 27 Aug 1997, Richard Stallman wrote:

> If a file is binary, no-conversion is right; but in addition,
> enable-multibyte-characters should be turned off, so that no sequence
> of bytes gets misinterpreted as a multibyte character.

I thought that no-conversion already prevents interpretation of multibyte 
sequences in the file, since it does I/O verbatim.  What am I missing?

From rms@gnu.ai.mit.edu  Wed Aug 27 13:19:51 1997
X-VM-v5-Data: ([nil nil nil nil t nil nil nil nil]
	[nil "Wed" "27" "August" "1997" "16:21:09" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" "<199708272021.QAA31689@psilocin.gnu.ai.mit.edu>" "15" "Re: Coding system issues (3)" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id NAA09533 for <voelker@cs.washington.edu>; Wed, 27 Aug 1997 13:19:50 -0700
Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id QAA31689; Wed, 27 Aug 1997 16:21:09 -0400
Message-Id: <199708272021.QAA31689@psilocin.gnu.ai.mit.edu>
In-reply-to: <199708270827.RAA04900@etlken.etl.go.jp> (message from Kenichi 	Handa on Wed, 27 Aug 1997 17:27:58 +0900)
References: <Pine.SUN.3.91.970827104903.7942P-100000@is> <199708270827.RAA04900@etlken.etl.go.jp>
From: Richard Stallman <rms@gnu.ai.mit.edu>
To: handa@etl.go.jp
CC: eliz@is.elta.co.il, voelker@cs.washington.edu, andrewi@harlequin.co.uk
Subject: Re: Coding system issues (3)
Date: Wed, 27 Aug 1997 16:21:09 -0400

    > This might be an OK solution, but I'm afraid I don't understand how would 
    > Emacs distinguish between these two coding categories (binary and 
    > raw-text)?

    Only by consistency of EOL format.  If consistent, it's raw-text, if
    not, it's no-conversion.

I think it is a mistake to try to distinguish this automatically.
It cannot be done right, so let's NOT TRY.

Instead, we should simply tell users that they must specify explicitly
which files are true binary files, one way or another.

Handa, please forget about trying to do this, and install the other
changes now.

From rms@gnu.ai.mit.edu  Wed Aug 27 14:15:33 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Wed" "27" "August" "1997" "17:16:46" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "9" "Re: Coding system issues (3)" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id OAA14163 for <voelker@cs.washington.edu>; Wed, 27 Aug 1997 14:15:32 -0700
Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id RAA32099; Wed, 27 Aug 1997 17:16:46 -0400
Message-Id: <199708272116.RAA32099@psilocin.gnu.ai.mit.edu>
In-reply-to: <Pine.SUN.3.91.970827194215.9184D-100000@is> (message from Eli 	Zaretskii on Wed, 27 Aug 1997 19:44:35 +0300 (IDT))
References:  <Pine.SUN.3.91.970827194215.9184D-100000@is>
From: Richard Stallman <rms@gnu.ai.mit.edu>
To: eliz@is.elta.co.il
CC: voelker@cs.washington.edu, handa@etl.go.jp, andrewi@harlequin.co.uk
Subject: Re: Coding system issues (3)
Date: Wed, 27 Aug 1997 17:16:46 -0400

    I thought that no-conversion already prevents interpretation of multibyte 
    sequences in the file, since it does I/O verbatim.

You're lumping together two entirely different issues.  no-conversion
means that the bytes are not translated when they are read in.  What
they mean in the buffer is another matter!

But perhaps no-conversion SHOULD turn off enable-multibyte-characters.
Handa, what do you think?
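
The distinction between the two issues can be illustrated with a small
Python sketch.  It is only an analogy: UTF-8 is used as a stand-in for
a multibyte internal encoding and Latin-1 for a one-byte-per-character
view, which is an assumption for illustration and not how Emacs
represents buffer text.

```python
# Five bytes as they might appear in a file.
raw = bytes([0xE3, 0x81, 0x82, 0x0D, 0x0A])

# "no-conversion": the bytes reach the buffer exactly as they were
# in the file.  Nothing is translated on input.
buffer_bytes = raw

# What the bytes MEAN in the buffer is a separate question.

# Multibyte interpretation: the first three bytes are taken as a
# single character (UTF-8 is the stand-in encoding here).
as_multibyte = buffer_bytes.decode("utf-8")

# Unibyte interpretation: every byte is a character of its own.
as_unibyte = buffer_bytes.decode("latin-1")

print(len(as_multibyte), len(as_unibyte))
```

The same untranslated bytes give a different number of characters
depending on how the buffer interprets them, which is why turning off
translation on input does not by itself settle the matter.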

From handa@etl.go.jp  Wed Aug 27 17:59:04 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Thu" "28" "August" "1997" "09:59:59" "+0900" "Kenichi Handa" "handa@etl.go.jp" nil "31" "Re: Coding system issues (3)" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from mail1-im.etl.go.jp (mail1-im.etl.go.jp [192.50.105.9]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id RAA00905 for <voelker@cs.washington.edu>; Wed, 27 Aug 1997 17:59:03 -0700
Received: from etlpom.etl.go.jp (etlpom.etl.go.jp [192.31.200.9]) by mail1-im.etl.go.jp (8.8.5/3.5Wpl1-96112918) with ESMTP 	id JAA00494; Thu, 28 Aug 1997 09:58:49 +0900 (JST)
Received: from etlken.etl.go.jp (etlken.etl.go.jp [192.31.197.11]) by etlpom.etl.go.jp (8.8.5/3.5Wpl4-ETL_MASTER) with SMTP id JAA22179; Thu, 28 Aug 1997 09:58:48 +0900 (JST)
Received: by etlken.etl.go.jp (SMI-8.6/6.4J.6-ETL.SLAVE) 	id JAA05760; Thu, 28 Aug 1997 09:59:59 +0900
Message-Id: <199708280059.JAA05760@etlken.etl.go.jp>
In-reply-to: <199708272021.QAA31689@psilocin.gnu.ai.mit.edu> (message from 	Richard Stallman on Wed, 27 Aug 1997 16:21:09 -0400)
References: <Pine.SUN.3.91.970827104903.7942P-100000@is> <199708270827.RAA04900@etlken.etl.go.jp> <199708272021.QAA31689@psilocin.gnu.ai.mit.edu>
From: Kenichi Handa <handa@etl.go.jp>
To: rms@gnu.ai.mit.edu
CC: eliz@is.elta.co.il, voelker@cs.washington.edu, andrewi@harlequin.co.uk
Subject: Re: Coding system issues (3)
Date: Thu, 28 Aug 1997 09:59:59 +0900

Richard Stallman <rms@gnu.ai.mit.edu> writes:
>> This might be an OK solution, but I'm afraid I don't understand how would 
>> Emacs distinguish between these two coding categories (binary and 
>> raw-text)?

>     Only by consistency of EOL format.  If consistent, it's raw-text, if
>     not, it's no-conversion.

> I think it is a mistake to try to distinguish this automatically.
> It cannot be done right, so let's NOT TRY.

> Instead, we should simply tell users that they must specify explicitly
> which files are true binary files, one way or another.

I agree.  But we still have to define Emacs' behaviour when it
encounters a file that has random 8-bit codes in its text but a
consistent EOL format, or one that has an inconsistent EOL format.
What should Emacs do in these cases?  I think using raw-text-XXX in
the former case and no-conversion in the latter case is reasonable.
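
A rough Python sketch of the consistency rule described here.  The
function names are hypothetical and this is not the code in the diffs
mentioned in this thread; it only shows the idea of choosing by EOL
consistency.

```python
def eol_counts(data: bytes) -> dict:
    """Count CRLF pairs, lone CRs and lone LFs in DATA."""
    crlf = data.count(b"\r\n")
    return {
        "dos": crlf,
        "mac": data.count(b"\r") - crlf,   # CRs not followed by LF
        "unix": data.count(b"\n") - crlf,  # LFs not preceded by CR
    }


def guess_coding_system(data: bytes) -> str:
    """Pick raw-text-XXX for a consistent EOL format, else no-conversion."""
    used = [kind for kind, n in eol_counts(data).items() if n > 0]
    if not used:
        return "raw-text"            # no line ends seen at all
    if len(used) == 1:
        return "raw-text-" + used[0]
    return "no-conversion"           # mixed EOL formats


print(guess_coding_system(b"a\r\nb\r\n"))
print(guess_coding_system(b"a\r\nb\n"))
print(guess_coding_system(b"\x00\x01\r\n\xff\xfe\r\n"))
```

The last call shows the weak spot raised earlier in the thread: data
that is really binary but happens to contain only CRLF pairs passes
the consistency test and is classified as raw-text-dos.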

> Handa, please forget about trying to do this, and install the other
> changes now.

What do you mean by "this"?
o introducing coding-category-raw-text?
o implementing some more intelligent EOL detector?
o all the changes about handling raw-text?

---
Ken'ichi HANDA
handa@etl.go.jp

From handa@etl.go.jp  Wed Aug 27 18:13:06 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Thu" "28" "August" "1997" "10:14:04" "+0900" "Kenichi Handa" "handa@etl.go.jp" nil "12" "Re: Coding system issues (3)" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from mail1-im.etl.go.jp (mail1-im.etl.go.jp [192.50.105.9]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id SAA01601 for <voelker@cs.washington.edu>; Wed, 27 Aug 1997 18:13:05 -0700
Received: from etlpom.etl.go.jp (etlpom.etl.go.jp [192.31.200.9]) by mail1-im.etl.go.jp (8.8.5/3.5Wpl1-96112918) with ESMTP 	id KAA01489; Thu, 28 Aug 1997 10:12:54 +0900 (JST)
Received: from etlken.etl.go.jp (etlken.etl.go.jp [192.31.197.11]) by etlpom.etl.go.jp (8.8.5/3.5Wpl4-ETL_MASTER) with SMTP id KAA23642; Thu, 28 Aug 1997 10:12:53 +0900 (JST)
Received: by etlken.etl.go.jp (SMI-8.6/6.4J.6-ETL.SLAVE) 	id KAA05777; Thu, 28 Aug 1997 10:14:04 +0900
Message-Id: <199708280114.KAA05777@etlken.etl.go.jp>
In-reply-to: <199708272116.RAA32099@psilocin.gnu.ai.mit.edu> (message from 	Richard Stallman on Wed, 27 Aug 1997 17:16:46 -0400)
References: <Pine.SUN.3.91.970827194215.9184D-100000@is> <199708272116.RAA32099@psilocin.gnu.ai.mit.edu>
From: Kenichi Handa <handa@etl.go.jp>
To: rms@gnu.ai.mit.edu
CC: eliz@is.elta.co.il, voelker@cs.washington.edu, andrewi@harlequin.co.uk
Subject: Re: Coding system issues (3)
Date: Thu, 28 Aug 1997 10:14:04 +0900

Richard Stallman <rms@gnu.ai.mit.edu> writes:
> But perhaps no-conversion SHOULD turn off enable-multibyte-characters.
> Handa, what do you think?

I agree, and your mail just reminded me that you actually asked that
kind of change long ago.

I'll do this change.

---
Ken'ichi HANDA
handa@etl.go.jp

From handa@etl.go.jp  Wed Aug 27 18:48:37 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Thu" "28" "August" "1997" "10:49:29" "+0900" "Kenichi Handa" "handa@etl.go.jp" nil "22" "Re: Terminal coding systems on DOS_NT" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from mail1-im.etl.go.jp (mail1-im.etl.go.jp [192.50.105.9]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id SAA03311 for <voelker@cs.washington.edu>; Wed, 27 Aug 1997 18:48:36 -0700
Received: from etlpom.etl.go.jp (etlpom.etl.go.jp [192.31.200.9]) by mail1-im.etl.go.jp (8.8.5/3.5Wpl1-96112918) with ESMTP 	id KAA04465; Thu, 28 Aug 1997 10:48:20 +0900 (JST)
Received: from etlken.etl.go.jp (etlken.etl.go.jp [192.31.197.11]) by etlpom.etl.go.jp (8.8.5/3.5Wpl4-ETL_MASTER) with SMTP id KAA26471; Thu, 28 Aug 1997 10:48:19 +0900 (JST)
Received: by etlken.etl.go.jp (SMI-8.6/6.4J.6-ETL.SLAVE) 	id KAA05857; Thu, 28 Aug 1997 10:49:29 +0900
Message-Id: <199708280149.KAA05857@etlken.etl.go.jp>
In-reply-to: <Pine.SUN.3.91.970827170625.8908C-100000@is> (message from Eli 	Zaretskii on Wed, 27 Aug 1997 17:10:50 +0300 (IDT))
References:  <Pine.SUN.3.91.970827170625.8908C-100000@is>
From: Kenichi Handa <handa@etl.go.jp>
To: eliz@is.elta.co.il
CC: rms@gnu.ai.mit.edu, voelker@cs.washington.edu
Subject: Re: Terminal coding systems on DOS_NT
Date: Thu, 28 Aug 1997 10:49:29 +0900

Eli Zaretskii <eliz@is.elta.co.il> writes:
> mule-cmds.el disables terminal and input coding systems in the Mule menus 
> when window-system is non-nil.  I think this should be enabled for 
> MS-DOS.  I'm not sure about NT, but it probably should be enabled there 
> also.  Geoff?

> Btw, why are these coding systems irrelevant for X-Windows?

The terminal coding system is used only when Emacs is running on some
terminal.  The keyboard coding system is for accepting multibyte
characters sent from the terminal (perhaps via some input method
embedded in the terminal).  For instance, kterm (Japanized xterm) can
have Umm input method which sends iso-2022-jp or euc-japan to a program
running under kterm, cxterm (Chinese xterm) sends big5 or euc-china,
hanterm sends euc-korea.

So neither of them has any meaning when (at least) the X window system
is being used.  But, I don't know about DOS and NT.

---
Ken'ichi HANDA
handa@etl.go.jp

From rms@gnu.ai.mit.edu  Wed Aug 27 18:53:03 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Wed" "27" "August" "1997" "21:54:34" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "4" "Re: get-file-buffer and find-buffer-visiting" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id SAA03433 for <voelker@cs.washington.edu>; Wed, 27 Aug 1997 18:53:02 -0700
Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id VAA00564; Wed, 27 Aug 1997 21:54:34 -0400
Message-Id: <199708280154.VAA00564@psilocin.gnu.ai.mit.edu>
In-reply-to: <199708272054.NAA17158@joker.cs.washington.edu> 	(voelker@cs.washington.edu)
References:  <199708272054.NAA17158@joker.cs.washington.edu>
From: Richard Stallman <rms@gnu.ai.mit.edu>
To: voelker@cs.washington.edu
CC: eliz@is.elta.co.il, andrewi@harlequin.co.uk
Subject: Re: get-file-buffer and find-buffer-visiting
Date: Wed, 27 Aug 1997 21:54:34 -0400

    Since DOS_NT is case insensitive, does it make sense to change
    get-file-buffer to ignore case?

I think so, if there is no other difficulty.

From rms@gnu.ai.mit.edu  Wed Aug 27 20:19:30 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Wed" "27" "August" "1997" "23:20:49" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "16" "Re: Coding system issues (3)" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id UAA06035 for <voelker@cs.washington.edu>; Wed, 27 Aug 1997 20:19:29 -0700
Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id XAA01157; Wed, 27 Aug 1997 23:20:49 -0400
Message-Id: <199708280320.XAA01157@psilocin.gnu.ai.mit.edu>
In-reply-to: <199708271108.UAA05054@etlken.etl.go.jp> (message from Kenichi 	Handa on Wed, 27 Aug 1997 20:08:58 +0900)
References: <Pine.SUN.3.91.970827120729.8289A-100000@is> <199708271108.UAA05054@etlken.etl.go.jp>
From: Richard Stallman <rms@gnu.ai.mit.edu>
To: handa@etl.go.jp
CC: eliz@is.elta.co.il, voelker@cs.washington.edu, andrewi@harlequin.co.uk
Subject: Re: Coding system issues (3)
Date: Wed, 27 Aug 1997 23:20:49 -0400

      My
    idea is to turn the first letter `-' to `='.

This change would be ok with me, but it isn't what the manual says,
and changes in this area of the code have tended to introduce bugs.

So please don't change this.

    Another change I want to do is to change mnemonic letter of
    `emacs-mule' from `=' to `M', then `=' always tells that the buffer
    contents are binary code.

The manual is already printed and says it is =, so don't change this.

Please don't make any change in this area of Emacs, and work on
the other issues which are more important.

From rms@gnu.ai.mit.edu  Wed Aug 27 23:18:48 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Thu" "28" "August" "1997" "02:16:53" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "15" "Re: Coding system issues (3)" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id XAA12200 for <voelker@cs.washington.edu>; Wed, 27 Aug 1997 23:18:47 -0700
Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id CAA01806; Thu, 28 Aug 1997 02:16:53 -0400
Message-Id: <199708280616.CAA01806@psilocin.gnu.ai.mit.edu>
In-reply-to: <199708280059.JAA05760@etlken.etl.go.jp> (message from Kenichi 	Handa on Thu, 28 Aug 1997 09:59:59 +0900)
References: <Pine.SUN.3.91.970827104903.7942P-100000@is> <199708270827.RAA04900@etlken.etl.go.jp> <199708272021.QAA31689@psilocin.gnu.ai.mit.edu> <199708280059.JAA05760@etlken.etl.go.jp>
From: Richard Stallman <rms@gnu.ai.mit.edu>
To: handa@etl.go.jp
CC: eliz@is.elta.co.il, voelker@cs.washington.edu, andrewi@harlequin.co.uk
Subject: Re: Coding system issues (3)
Date: Thu, 28 Aug 1997 02:16:53 -0400

Please implement the raw-text coding system as we have already
described it.  Please DO NOT try to distinguish heuristically "real
binary" files from "raw-text files".

    I agree.  But we still have to define Emacs' behaviour when it
    encounters a file that has random 8-bit codes in its text but a
    consistent EOL format, or one that has an inconsistent EOL format.
    What should Emacs do in these cases?

Distinguish raw-text-unix and raw-text-dos and raw-text-mac
just the same way as you do for most other coding systems.

This issue is NOT IMPORTANT!  Stop spending time on it!
Implement raw-text in the natural way, as I have explained it,
and STOP PAYING ATTENTION TO IT and MOVE ON TO SOMETHING ELSE.

From handa@etl.go.jp  Thu Aug 28 00:17:42 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Thu" "28" "August" "1997" "16:18:28" "+0900" "Kenichi Handa" "handa@etl.go.jp" nil "23" "Re: Coding system issues (3)" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from mail1-im.etl.go.jp (mail1-im.etl.go.jp [192.50.105.9]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id AAA13937 for <voelker@cs.washington.edu>; Thu, 28 Aug 1997 00:17:41 -0700
Received: from etlpom.etl.go.jp (etlpom.etl.go.jp [192.31.200.9]) by mail1-im.etl.go.jp (8.8.5/3.5Wpl1-96112918) with ESMTP 	id QAA23158; Thu, 28 Aug 1997 16:17:19 +0900 (JST)
Received: from etlken.etl.go.jp (etlken.etl.go.jp [192.31.197.11]) by etlpom.etl.go.jp (8.8.5/3.5Wpl4-ETL_MASTER) with SMTP id QAA13875; Thu, 28 Aug 1997 16:17:18 +0900 (JST)
Received: by etlken.etl.go.jp (SMI-8.6/6.4J.6-ETL.SLAVE) 	id QAA06591; Thu, 28 Aug 1997 16:18:28 +0900
Message-Id: <199708280718.QAA06591@etlken.etl.go.jp>
In-reply-to: <199708280616.CAA01806@psilocin.gnu.ai.mit.edu> (message from 	Richard Stallman on Thu, 28 Aug 1997 02:16:53 -0400)
References: <Pine.SUN.3.91.970827104903.7942P-100000@is> <199708270827.RAA04900@etlken.etl.go.jp> <199708272021.QAA31689@psilocin.gnu.ai.mit.edu> <199708280059.JAA05760@etlken.etl.go.jp> <199708280616.CAA01806@psilocin.gnu.ai.mit.edu>
From: Kenichi Handa <handa@etl.go.jp>
To: rms@gnu.ai.mit.edu
CC: eliz@is.elta.co.il, voelker@cs.washington.edu, andrewi@harlequin.co.uk
Subject: Re: Coding system issues (3)
Date: Thu, 28 Aug 1997 16:18:28 +0900

Richard Stallman <rms@gnu.ai.mit.edu> writes:
> Please implement the raw-text coding system as we have already
> described it.  Please DO NOT try to distinguish heuristically "real
> binary" files from "raw-text files".

>     I agree.  But we still have to define Emacs' behaviour when it
>     encounters a file that has random 8-bit codes in its text but a
>     consistent EOL format, or one that has an inconsistent EOL format.
>     What should Emacs do in these cases?

> Distinguish raw-text-unix and raw-text-dos and raw-text-mac
> just the same way as you do for most other coding systems.

> This issue is NOT IMPORTANT!  Stop spending time on it!
> Implement raw-text in the natural way, as I have explained it,
> and STOP PAYING ATTENTION TO IT and MOVE ON TO SOMETHING ELSE.

Ok, then I've already done what is necessary.  I'll update FSF's code
soon.

---
Ken'ichi HANDA
handa@etl.go.jp

From eliz@is.elta.co.il  Thu Aug 28 02:06:03 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Thu" "28" "August" "1997" "12:05:25" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" nil "40" "Re: Coding system issues (3)" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id CAA18163 for <voelker@cs.washington.edu>; Thu, 28 Aug 1997 02:06:01 -0700
Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) 	id MAA10627; Thu, 28 Aug 1997 12:05:26 +0300
X-Sender: eliz@is
In-Reply-To: <199708272021.QAA31689@psilocin.gnu.ai.mit.edu>
Message-ID: <Pine.SUN.3.91.970828120420.10622A-100000@is>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
From: Eli Zaretskii <eliz@is.elta.co.il>
To: Richard Stallman <rms@gnu.ai.mit.edu>
cc: handa@etl.go.jp, voelker@cs.washington.edu, andrewi@harlequin.co.uk
Subject: Re: Coding system issues (3)
Date: Thu, 28 Aug 1997 12:05:25 +0300 (IDT)


On Wed, 27 Aug 1997, Richard Stallman wrote:

>     Only by consistency of EOL format.  If consistent, it's raw-text, if
>     not, it's no-conversion.
> 
> I think it is a mistake to try to distinguish this automatically.
> It cannot be done right, so let's NOT TRY.

There's nothing wrong IMHO with a mechanism that does detect binary
files most of the time, even if it doesn't work in all cases.  I have
added such capabilities in various DOS ports of GNU tools (e.g., see
the DJGPP port of Grep) and never heard any complaints.  The diffs
that Handa has sent me seem to implement this consistency test
already, and don't seem too resource-consuming.

I agree that further refinement of the binary file detection could be
delayed until more user experience is available, but I don't think it
can be dismissed altogether.

> Instead, we should simply tell users that they must specify explicitly
> which files are true binary files, one way or another.

I believe this will prove to be a nuisance.  It was enough of a
nuisance in the DOS_NT world to introduce the
file-name-buffer-file-type-alist so frequently-used binary files will
be recognized automatically.  This solution is IMHO not good enough in
the presence of coding, since e.g. *.c and even *.text files could
include strings encoded in non-English languages.  But I believe that
with small changes in the coding-detection code we could make Emacs
recognize most of the binary files.

I don't think users will like the requirement to have in effect two
different ways of visiting files.  Unix users have never before
distinguished between binary files and the other kind, and I think
they will want to keep it that way.

I'm afraid that if we neglect to take some reasonable care of this
issue, it might become the single most important drive for users to
setq enable-multibyte-characters nil.

From eliz@is.elta.co.il  Thu Aug 28 02:24:59 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Thu" "28" "August" "1997" "12:24:51" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" nil "35" "Re: get-file-buffer and find-buffer-visiting" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id CAA18542 for <voelker@cs.washington.edu>; Thu, 28 Aug 1997 02:24:57 -0700
Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) 	id MAA10739; Thu, 28 Aug 1997 12:24:52 +0300
X-Sender: eliz@is
In-Reply-To: <199708272054.NAA17158@joker.cs.washington.edu>
Message-ID: <Pine.SUN.3.91.970828122413.10704E-100000@is>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
From: Eli Zaretskii <eliz@is.elta.co.il>
To: Geoff Voelker <voelker@cs.washington.edu>
cc: rms@gnu.ai.mit.edu, andrewi@harlequin.co.uk
Subject: Re: get-file-buffer and find-buffer-visiting
Date: Thu, 28 Aug 1997 12:24:51 +0300 (IDT)


On Wed, 27 Aug 1997, Geoff Voelker wrote:

> I have a question about get-file-buffer and find-buffer-visiting.  A
> user has encountered a situation where get-file-buffer is invoked in
> different situations with the same filename, except that the filename
> differs in case in the different situations.  get-file-buffer only
> returns the associated buffer for the situation where the case
> matches, and find-buffer-visiting returns the buffer independent of
> case.
> 
> Since DOS_NT is case insensitive, does it make sense to change
> get-file-buffer to ignore case?

Similar problems have popped up before.  My impression from the
discussions back then is that it boils down to this: should we
consider file names which only differ in the letter-case as the *same*
file name, or *different* names that refer to the same file?

Emacs currently supports the former interpretation.  get-file-buffer
is documented to require exact match of the file name, and
find-buffer-visiting is documented to test for other buffers that
might visit the same file, perhaps under different names.

If we want to interpret file names case-insensitively, I would suggest
introducing a special function for filename comparison that on DOS_NT
(and VMS?) will be case-insensitive, and change all the places where
file names are compared with string-equal to use this new function
instead.  Places which use string-match will then need to ignore case
as well.

I'm afraid such a change would be a lot of work.  However, changing a
single function to ignore the case will make Emacs inconsistent in its
treatment of file names on DOS_NT.  If we think Emacs should be
case-insensitive on DOS_NT, it should do that consistently.
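
To make the suggestion concrete, here is a small Python sketch of a
single comparison function that every lookup goes through.  The names
file_names_equal and get_file_buffer_sketch, and the dictionary of
buffers, are hypothetical; this only illustrates the idea and says
nothing about how Emacs stores its buffer list.

```python
def file_names_equal(a: str, b: str, case_insensitive: bool) -> bool:
    """Compare two file names, ignoring letter-case when asked to.

    case_insensitive would be true on DOS_NT-like systems and false
    elsewhere; here the caller passes it in explicitly.
    """
    if case_insensitive:
        return a.lower() == b.lower()
    return a == b


def get_file_buffer_sketch(buffers: dict, filename: str,
                           case_insensitive: bool):
    """Return the name of the buffer visiting FILENAME, or None.

    buffers maps buffer names to the file names they visit.
    """
    for buffer_name, visited in buffers.items():
        if file_names_equal(visited, filename, case_insensitive):
            return buffer_name
    return None


buffers = {"Foo.C": "c:/src/Foo.C"}
print(get_file_buffer_sketch(buffers, "C:/SRC/foo.c", True))
print(get_file_buffer_sketch(buffers, "C:/SRC/foo.c", False))
```

Routing every comparison through one function is what keeps the
treatment consistent: the platform decision is made in one place
instead of being repeated, or forgotten, at each call site.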

From eliz@is.elta.co.il  Thu Aug 28 06:59:26 1997
X-VM-v5-Data: ([nil nil nil nil t nil nil nil nil]
	[nil "Thu" "28" "August" "1997" "16:58:53" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" "<Pine.SUN.3.91.970828162025.11126B-100000@is>" "141" "\"Binary\" I/O and subprocesses on DOS_NT" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id GAA26767 for <voelker@cs.washington.edu>; Thu, 28 Aug 1997 06:59:24 -0700
Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) 	id QAA11433; Thu, 28 Aug 1997 16:58:54 +0300
X-Sender: eliz@is
Message-ID: <Pine.SUN.3.91.970828162025.11126B-100000@is>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
From: Eli Zaretskii <eliz@is.elta.co.il>
To: Richard Stallman <rms@gnu.ai.mit.edu>
cc: Geoff Voelker <voelker@cs.washington.edu>,         Andrew Innes <andrewi@harlequin.co.uk>,         Kenichi Handa <handa@etl.go.jp>
Subject: "Binary" I/O and subprocesses on DOS_NT
Date: Thu, 28 Aug 1997 16:58:53 +0300 (IDT)

I believe I've found a bug in call-process-region.  To reproduce, call 
hexl-find-file on a DOS-style text file (with CRLF EOLs), change it a 
bit, then save it: the file is written with Unix-style EOLs.

I think this is because call-process-region was incorrectly setting the 
coding systems for writing the region that serves as input to the process 
and for reading process output.  

First, if binary-process-input is nil, that means the input to the process
is text, so setting coding-system-for-write to nil (no conversion) is the
opposite of what should be done.

Also, call-process-region was setting the coding system for reading the
process output using binary-process-input (instead of
binary-process-output).  Actually, this latter part seems entirely
unnecessary, since call-process does it itself, and does it correctly.
So for now, I just ifdef'ed that part away; see the patch below.

I didn't install this change, because I would like you all to look at it 
carefully, in case I made some error.  But please read my other message 
about this before you look at the patch.  The whole issue is complicated, 
and I think it's a good idea that somebody else looks at what I've done.

Here's the patch:

1997-08-28 +03  Eli Zaretskii  <eliz@is.elta.co.il>

	* callproc.c (Fcall_process): Set EOL conversion type to LF when
	binary-process-output is non-nil.
	(Fcall_process_region): binary-process-XXXput only determines EOL
	conversion; if it is nil, convert LF <-> CRLF.  Don't bind
	coding-system-for-read, it is done in Fcall_process.

diff -c src/callproc.c~0 src/callproc.c
*** src/callproc.c~0	Sun Aug 24 00:19:34 1997
--- src/callproc.c	Thu Aug 28 15:44:20 1997
***************
*** 296,303 ****
  	  }
  	setup_coding_system (Fcheck_coding_system (val), &process_coding);
  #ifdef MSDOS
! 	/* On MSDOS, if the user did not ask for binary,
! 	   treat it as "text" which means doing CRLF conversion.  */
  	/* FIXME: this probably should be moved into the guts of
  	   `Ffind_operation_coding_system' for the case of `call-process'.  */
  	if (NILP (Vbinary_process_output))
--- 296,311 ----
  	  }
  	setup_coding_system (Fcheck_coding_system (val), &process_coding);
  #ifdef MSDOS
! 	/* On MSDOS, if the user did not ask for binary, treat it as
! 	   "text" which means doing CRLF conversion.  Otherwise, leave
! 	   the EOLs alone.
! 
! 	   Note that ``binary'' here only means whether EOLs should or
! 	   should not be converted, since that's what Vbinary_process_XXXput
! 	   meant in the days before the coding systems were introduced.
! 
! 	   For other conversions, the caller should set coding-system
! 	   variables explicitly, or rely on auto-detection.  */
  	/* FIXME: this probably should be moved into the guts of
  	   `Ffind_operation_coding_system' for the case of `call-process'.  */
  	if (NILP (Vbinary_process_output))
***************
*** 307,312 ****
--- 315,322 ----
  	      /* FIXME: should we set type to undecided?  */
  	      process_coding.type = coding_type_emacs_mule;
  	  }
+ 	else
+ 	  process_coding.eol_type = CODING_EOL_LF;
  #endif
        }
    }
***************
*** 801,813 ****
    start = args[0];
    end = args[1];
    /* Decide coding-system of the contents of the temporary file.  */
  #ifdef DOS_NT
!   specbind (Qbuffer_file_type, Vbinary_process_input);
!   if (NILP (Vbinary_process_input))
!     val = Qnil;
!   else
  #endif
-     {
        if (!NILP (Vcoding_system_for_write))
  	val = Vcoding_system_for_write;
        else if (NILP (current_buffer->enable_multibyte_characters))
--- 811,822 ----
    start = args[0];
    end = args[1];
    /* Decide coding-system of the contents of the temporary file.  */
+     {
  #ifdef DOS_NT
!       /* This is to cause find-buffer-file-type-coding-system (see
! 	 dos-w32.el) to choose correct EOL translation for write-region.  */
!       specbind (Qbuffer_file_type, Vbinary_process_input);
  #endif
        if (!NILP (Vcoding_system_for_write))
  	val = Vcoding_system_for_write;
        else if (NILP (current_buffer->enable_multibyte_characters))
***************
*** 825,834 ****
--- 834,860 ----
  	  else
  	    val = Qnil;
  	}
+ #ifdef DOS_NT
+       /* binary-process-input tells whether the buffer needs to be
+ 	 written with EOL conversions, but it doesn't say anything
+ 	 about the rest of text encoding.
+ 
+ 	 Don't let binary-process-input determine the EOL conversion if the
+ 	 coding system was set explicitly and it specified EOL handling.  */
+       if (NILP (val)
+ 	  || VECTORP (Fget (val, Qeol_type))
+ 	  || NILP (Vcoding_system_for_write))
+ 	{
+ 	  Fput (val, Qeol_type,
+ 		make_number (NILP (Vbinary_process_input) ? 1 : 0));
+ 	}
+ #endif
      }
    specbind (intern ("coding-system-for-write"), val);
    Fwrite_region (start, end, filename_string, Qnil, Qlambda, Qnil);
  
+   /* This is done by Fcall_process.  */
+ #if 0
  #ifdef DOS_NT
    if (NILP (Vbinary_process_input))
      val = Qnil;
***************
*** 853,858 ****
--- 879,885 ----
  	}
      }
    specbind (intern ("coding-system-for-read"), val);
+ #endif
  
    record_unwind_protect (delete_temp_file, filename_string);

From eliz@is.elta.co.il  Thu Aug 28 07:20:23 1997
X-VM-v5-Data: ([nil nil nil nil t nil nil nil nil]
	[nil "Thu" "28" "August" "1997" "17:19:31" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" "<Pine.SUN.3.91.970828165856.11209A-100000@is>" "28" "\"Binary\" I/O and subprocesses on DOS_NT" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id HAA28112 for <voelker@cs.washington.edu>; Thu, 28 Aug 1997 07:20:21 -0700
Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) 	id RAA11546; Thu, 28 Aug 1997 17:19:31 +0300
X-Sender: eliz@is
Message-ID: <Pine.SUN.3.91.970828165856.11209A-100000@is>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
From: Eli Zaretskii <eliz@is.elta.co.il>
To: Richard Stallman <rms@gnu.ai.mit.edu>
cc: Geoff Voelker <voelker@cs.washington.edu>,         Andrew Innes <andrewi@harlequin.co.uk>,         Kenichi Handa <handa@etl.go.jp>
Subject: "Binary" I/O and subprocesses on DOS_NT
Date: Thu, 28 Aug 1997 17:19:31 +0300 (IDT)

Here are my thoughts about this.

First, the word ``binary'' is loaded, and it got in the way when I worked 
on this.  binary-process-XXXput being non-nil doesn't really mean that 
data should be read or written with no conversions; it just means that 
EOLs should not be converted.  I can imagine cases where the rest of the 
text should be encoded or decoded even though the EOLs should be left 
alone.

Therefore, it is IMHO incorrect to set coding system to nil when binary 
I/O is specified.  We should only set the eol-type property.

The patch that I sent to you also avoids setting the EOL conversion if the
coding system was specified explicitly, to let the callers override the
value of binary-process-XXXput, if they need to do so.
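
The rule described in the last three paragraphs can be sketched in
isolation as follows.  This is an illustration only, not the code from
callproc.c: the struct, the enum and the function name are all
hypothetical, and the text encoding is shown as a plain string just to
make the point that it is never touched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical types for this sketch; not the Emacs coding-system API.  */
enum sk_eol { SK_EOL_UNSPECIFIED, SK_EOL_LF, SK_EOL_CRLF };

struct sk_coding
{
  const char *text_encoding;  /* e.g. "iso-latin-1"; left alone here */
  enum sk_eol eol;            /* the only field the flag may change */
};

/* Apply a binary-process-XXXput style flag to CODING.
   The flag decides EOL handling only, and only when the caller did
   not already pick an EOL type explicitly.  */
void
sk_apply_binary_flag (struct sk_coding *coding, bool binary_flag)
{
  assert (coding != NULL);
  if (coding->eol != SK_EOL_UNSPECIFIED)
    return;                   /* explicit choice overrides the flag */
  /* "Binary" means leave EOLs alone (LF); "text" means LF <-> CRLF.  */
  coding->eol = binary_flag ? SK_EOL_LF : SK_EOL_CRLF;
}
```

With a coding whose EOL type is unspecified, a nil (false) flag gives
CRLF conversion and a non-nil flag gives LF; a coding that already
specifies CRLF stays CRLF whatever the flag says.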

Geoff, I think that the code which sets the coding system on dos-w32.el 
should also be revised, so that it doesn't fall into this trap of 
``binary'' files.  The patterns for names of binary files in 
file-name-buffer-file-type-alist designate true binary files which should 
be read with no conversions at all, but the untranslated filesystems only 
specify the EOL conversion.  As far as I can see, the current code 
doesn't make that distinction.  In particular, when buffer-file-type is 
non-nil, it does NOT mean the coding system for write should be 
no-conversion, as dos-w32 sets it now.

And btw, why is the coding system for ASCII buffers (such as C source) 
set to undecided?  Shouldn't it be emacs-mule?

From rms@gnu.ai.mit.edu  Thu Aug 28 09:19:31 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Thu" "28" "August" "1997" "12:20:51" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "15" "Re: Coding system issues (3)" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id JAA03868 for <voelker@cs.washington.edu>; Thu, 28 Aug 1997 09:19:30 -0700
Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id MAA04533; Thu, 28 Aug 1997 12:20:51 -0400
Message-Id: <199708281620.MAA04533@psilocin.gnu.ai.mit.edu>
In-reply-to: <Pine.SUN.3.91.970828120420.10622A-100000@is> (message from Eli 	Zaretskii on Thu, 28 Aug 1997 12:05:25 +0300 (IDT))
References:  <Pine.SUN.3.91.970828120420.10622A-100000@is>
From: Richard Stallman <rms@gnu.ai.mit.edu>
To: eliz@is.elta.co.il
CC: handa@etl.go.jp, voelker@cs.washington.edu, andrewi@harlequin.co.uk
Subject: Re: Coding system issues (3)
Date: Thu, 28 Aug 1997 12:20:51 -0400

    There's nothing wrong IMHO with a mechanism that does detect binary
    files most of the time, even if it doesn't work in all cases.

If it delays the Emacs 20 release even one day, that is something
very wrong with it.

Please drop the subject so that Handa will go back to fixing what he
needs to fix.

    I believe this would prove to be a nuisance.  It was enough of a
    nuisance in the DOS_NT world to introduce the
    file-name-buffer-file-type-alist so frequently-used binary files will
    be recognized automatically.

Too bad.  Nothing can be done.

From eliz@is.elta.co.il  Fri Aug 29 03:14:41 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Fri" "29" "August" "1997" "13:14:28" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" nil "23" "Re: \"Binary\" I/O and subprocesses on DOS_NT" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id DAA26830 for <voelker@cs.washington.edu>; Fri, 29 Aug 1997 03:14:40 -0700
Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) 	id NAA13247; Fri, 29 Aug 1997 13:14:28 +0300
X-Sender: eliz@is
In-Reply-To: <199708290418.VAA17387@joker.cs.washington.edu>
Message-ID: <Pine.SUN.3.91.970829131407.13196H-100000@is>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
From: Eli Zaretskii <eliz@is.elta.co.il>
To: Geoff Voelker <voelker@cs.washington.edu>
cc: rms@gnu.ai.mit.edu, andrewi@harlequin.co.uk, handa@etl.go.jp
Subject: Re: "Binary" I/O and subprocesses on DOS_NT
Date: Fri, 29 Aug 1997 13:14:28 +0300 (IDT)


On Thu, 28 Aug 1997, Geoff Voelker wrote:

> Eli, I can't reproduce this.  I did
> 
> M-x hexl-find-file /tmp/text
> changed some characters
> C-x C-s
> y
> 
> then, in a command prompt, "od -a /tmp/text".  the lines still ended
> with CRLFs.

Maybe because hexl is called differently on NT?  At least one of the
places I patched (in call-process) is MSDOS-only, and there are
numerous other ``ifdef MSDOS'' there.

But anyway, please look at the patches for call-process-region (which
are DOS_NT) and tell me whether they seem to be correct.  Maybe you
could come up with your own test case when you look at the present
code of call-process-region (for starters, it uses
Vbinary_process_input instead of Vbinary_process_output in the
fragment that I ifdef'ed away).

From eliz@is.elta.co.il  Fri Aug 29 03:15:14 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Fri" "29" "August" "1997" "13:15:07" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" nil "16" "Re: \"Binary\" I/O and subprocesses on DOS_NT" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id DAA26840 for <voelker@cs.washington.edu>; Fri, 29 Aug 1997 03:15:12 -0700
Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) 	id NAA13253; Fri, 29 Aug 1997 13:15:07 +0300
X-Sender: eliz@is
In-Reply-To: <199708290427.VAA26349@joker.cs.washington.edu>
Message-ID: <Pine.SUN.3.91.970829131436.13196I-100000@is>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
From: Eli Zaretskii <eliz@is.elta.co.il>
To: Geoff Voelker <voelker@cs.washington.edu>
cc: rms@gnu.ai.mit.edu, andrewi@harlequin.co.uk, handa@etl.go.jp
Subject: Re: "Binary" I/O and subprocesses on DOS_NT
Date: Fri, 29 Aug 1997 13:15:07 +0300 (IDT)


On Thu, 28 Aug 1997, Geoff Voelker wrote:

> > And btw, why is the coding system for ASCII buffers (such as C source) 
> > set to undecided?  Shouldn't it be emacs-mule?
> 
> I don't think that we can assume that it is ASCII.

No, I'm talking about what decode_coding returns when the file is read
in.  It returns undecided if only ASCII characters are seen in the
buffer.

If that is because the user can add non-ASCII characters after that,
then the coding system for writing should be decided by looking at the
buffer contents when it is saved, but I don't see that this is
actually done.

From handa@etl.go.jp  Fri Aug 29 03:29:18 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Fri" "29" "August" "1997" "19:29:34" "+0900" "Kenichi Handa" "handa@etl.go.jp" nil "20" "Re: \"Binary\" I/O and subprocesses on DOS_NT" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from mail1-im.etl.go.jp (mail1-im.etl.go.jp [192.50.105.9]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id DAA27014 for <voelker@cs.washington.edu>; Fri, 29 Aug 1997 03:29:12 -0700
Received: from etlpom.etl.go.jp (etlpom.etl.go.jp [192.31.200.9]) by mail1-im.etl.go.jp (8.8.5/3.5Wpl1-96112918) with ESMTP 	id TAA07998; Fri, 29 Aug 1997 19:28:27 +0900 (JST)
Received: from etlken.etl.go.jp (etlken.etl.go.jp [192.31.197.11]) by etlpom.etl.go.jp (8.8.5/3.5Wpl4-ETL_MASTER) with SMTP id TAA24216; Fri, 29 Aug 1997 19:28:26 +0900 (JST)
Received: by etlken.etl.go.jp (SMI-8.6/6.4J.6-ETL.SLAVE) 	id TAA11118; Fri, 29 Aug 1997 19:29:34 +0900
Message-Id: <199708291029.TAA11118@etlken.etl.go.jp>
In-reply-to: <Pine.SUN.3.91.970829131436.13196I-100000@is> (message from Eli 	Zaretskii on Fri, 29 Aug 1997 13:15:07 +0300 (IDT))
References:  <Pine.SUN.3.91.970829131436.13196I-100000@is>
From: Kenichi Handa <handa@etl.go.jp>
To: eliz@is.elta.co.il
CC: voelker@cs.washington.edu, rms@gnu.ai.mit.edu, andrewi@harlequin.co.uk
Subject: Re: "Binary" I/O and subprocesses on DOS_NT
Date: Fri, 29 Aug 1997 19:29:34 +0900

Eli Zaretskii <eliz@is.elta.co.il> writes:
> No, I'm talking about what decode_coding returns when the file is read
> in.  It returns undecided if only ASCII characters are seen in the
> buffer.

> If that is because the user can add non-ASCII characters after that,
> then the coding system for writing should be decided by looking at the
> buffer contents when it is saved, but I don't see that this is
> actually done.

When one inserts a new file in that buffer, and that new file is
encoded in, for instance, iso-latin-1, then buffer-file-coding-system
is changed to iso-latin-1.

But, if buffer-file-coding-system is emacs-mule before inserting that
new file, it doesn't change even after the insertion.

---
Ken'ichi HANDA
handa@etl.go.jp

From rms@priam.CS.Berkeley.EDU  Fri Aug 29 23:58:06 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Thu" "28" "August" "1997" "17:08:22" "-0400" "Richard M. Stallman" "rms@priam.cs.berkeley.edu" nil "8" "Re: get-file-buffer and find-buffer-visiting" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from priam.CS.Berkeley.EDU (priam.CS.Berkeley.EDU [128.32.34.48]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id XAA18927 for <voelker@cs.washington.edu>; Fri, 29 Aug 1997 23:58:05 -0700
Received: (from rms@localhost) by priam.CS.Berkeley.EDU (8.8.3/8.8.2) id XAA13297; Fri, 29 Aug 1997 23:57:58 -0700 (PDT)
Message-Id: <199708300657.XAA13297@priam.CS.Berkeley.EDU>
In-reply-to: <Pine.SUN.3.91.970828122413.10704E-100000@is> (message from Eli 	Zaretskii on Thu, 28 Aug 1997 12:24:51 +0300 (IDT))
References:  <Pine.SUN.3.91.970828122413.10704E-100000@is>
From: "Richard M. Stallman" <rms@priam.CS.Berkeley.EDU>
To: eliz@is.elta.co.il
CC: voelker@cs.washington.edu, andrewi@harlequin.co.uk,         rms@priam.CS.Berkeley.EDU
Subject: Re: get-file-buffer and find-buffer-visiting
Date: Thu, 28 Aug 1997 17:08:22 -0400

    I'm afraid such a change would be a lot of work.  However, changing a
    single function to ignore the case will make Emacs inconsistent in its
    treatment of file names on DOS_NT.

This convinces me that no change should be made now
in this aspect of Emacs.

Thanks.

From rms@priam.CS.Berkeley.EDU  Sat Aug 30 12:55:23 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Sat" "30" "August" "1997" "12:55:04" "-0700" "Richard M. Stallman" "rms@priam.cs.berkeley.edu" nil "10" "Re: Coding system issues (3)" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from priam.CS.Berkeley.EDU (priam.CS.Berkeley.EDU [128.32.34.48]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id MAA05888 for <voelker@cs.washington.edu>; Sat, 30 Aug 1997 12:55:23 -0700
Received: (from rms@localhost) by priam.CS.Berkeley.EDU (8.8.3/8.8.2) id MAA13636; Sat, 30 Aug 1997 12:55:04 -0700 (PDT)
Message-Id: <199708301955.MAA13636@priam.CS.Berkeley.EDU>
In-reply-to: <199708251239.VAA02414@etlken.etl.go.jp> (message from Kenichi 	Handa on Mon, 25 Aug 1997 21:39:16 +0900)
Reply-to: rms@priam.CS.Berkeley.EDU
References: <Pine.SUN.3.91.970825140112.3327O-100000@is> <199708251239.VAA02414@etlken.etl.go.jp>
From: "Richard M. Stallman" <rms@priam.CS.Berkeley.EDU>
To: handa@etl.go.jp
CC: eliz@is.elta.co.il, voelker@cs.washington.edu, andrewi@harlequin.co.uk
Subject: Re: Coding system issues (3)
Date: Sat, 30 Aug 1997 12:55:04 -0700 (PDT)

    Hmm, perhaps, we must now give up detecting a coding system of a file
    in an incremental manner as being done now, but have to read the whole
    file with no conversion, detect a coding system by running
    sophisticated Emacs Lisp code on the whole buffer, then decode the
    whole buffer at once.

What worries me in this, is that this method could be used for
insert-file-contents, but cannot be used for a synchronous subprocess.
I am not sure that it is a good idea to use different methods for
subprocesses and files.

From rms@priam.CS.Berkeley.EDU  Sun Aug 31 01:09:08 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Fri" "29" "August" "1997" "14:19:59" "-0400" "Richard M. Stallman" "rms@priam.cs.berkeley.edu" nil "11" "Re: \"Binary\" I/O and subprocesses on DOS_NT" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from priam.CS.Berkeley.EDU (priam.CS.Berkeley.EDU [128.32.34.48]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id BAA23093 for <voelker@cs.washington.edu>; Sun, 31 Aug 1997 01:09:08 -0700
Received: (from rms@localhost) by priam.CS.Berkeley.EDU (8.8.3/8.8.2) id BAA14078; Sun, 31 Aug 1997 01:08:17 -0700 (PDT)
Message-Id: <199708310808.BAA14078@priam.CS.Berkeley.EDU>
In-reply-to: <Pine.SUN.3.91.970828165856.11209A-100000@is> (message from Eli 	Zaretskii on Thu, 28 Aug 1997 17:19:31 +0300 (IDT))
From: "Richard M. Stallman" <rms@priam.CS.Berkeley.EDU>
To: eliz@is.elta.co.il
CC: voelker@cs.washington.edu, andrewi@harlequin.co.uk, handa@etl.go.jp
Subject: Re: "Binary" I/O and subprocesses on DOS_NT
Date: Fri, 29 Aug 1997 14:19:59 -0400

    Therefore, it is IMHO incorrect to set coding system to nil when binary 
    I/O is specified.  We should only set the eol-type property.

I agree.

    The patch that I sent to you also avoids setting the EOL conversion if the
    coding system was specified explicitly, to let the callers override the
    value of binary-process-XXXput, if they need to do so.

I agree in principle.  Implementing this in a fully satisfactory
way may be hard.

From rms@priam.CS.Berkeley.EDU  Sun Aug 31 01:14:45 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Fri" "29" "August" "1997" "14:21:43" "-0400" "Richard M. Stallman" "rms@priam.cs.berkeley.edu" nil "106" "Re: \"Binary\" I/O and subprocesses on DOS_NT" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from priam.CS.Berkeley.EDU (priam.CS.Berkeley.EDU [128.32.34.48]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id BAA23092 for <voelker@cs.washington.edu>; Sun, 31 Aug 1997 01:09:08 -0700
Received: (from rms@localhost) by priam.CS.Berkeley.EDU (8.8.3/8.8.2) id BAA14081; Sun, 31 Aug 1997 01:08:18 -0700 (PDT)
Message-Id: <199708310808.BAA14081@priam.CS.Berkeley.EDU>
In-reply-to: <Pine.SUN.3.91.970828165856.11209A-100000@is> (message from Eli 	Zaretskii on Thu, 28 Aug 1997 17:19:31 +0300 (IDT))
From: "Richard M. Stallman" <rms@priam.CS.Berkeley.EDU>
To: eliz@is.elta.co.il
CC: voelker@cs.washington.edu, andrewi@harlequin.co.uk, handa@etl.go.jp
Subject: Re: "Binary" I/O and subprocesses on DOS_NT
Date: Fri, 29 Aug 1997 14:21:43 -0400

    Geoff, I think that the code which sets the coding system on dos-w32.el 
    should also be revised, so that it doesn't fall into this trap of 
    ``binary'' files.  The patterns for names of binary files in 
    file-name-buffer-file-type-alist designate true binary files which should 
    be read with no conversions at all, but the untranslated filesystems only 
    specify the EOL conversion.  As far as I can see, the current code 
    doesn't make that distinction.

I've made changes (below) that I think solve these problems.
Can you please test them?  (These are in the .97 pretest too.)

  In particular, when buffer-file-type is 
    non-nil, it does NOT mean the coding system for write should be 
    no-conversion, as dos-w32 sets it now.

Please note that that code is used only when buffer-file-coding-system
is nil, which means, in a buffer that is not file-visiting.
As far as I can see, no-conversion is the right choice for that case.

*** dos-w32.el	1997/08/17 01:49:50	1.10
--- dos-w32.el	1997/08/29 18:17:38
***************
*** 72,89 ****
  	(setq alist (cdr alist)))
        found)))
  
  (defun find-buffer-file-type (filename)
!   ;; First check if file is on an untranslated filesystem, then on the alist.
!   (if (untranslated-file-p filename)
!       t ; for binary
!     (let ((match (find-buffer-file-type-match filename))
! 	  (code))
!       (if (not match)
! 	  default-buffer-file-type
! 	(setq code (cdr match))
! 	(cond ((memq code '(nil t)) code)
! 	      ((and (symbolp code) (fboundp code))
! 	       (funcall code filename)))))))
  
  (setq-default buffer-file-coding-system 'undecided-dos)
  
--- 72,87 ----
  	(setq alist (cdr alist)))
        found)))
  
+ ;; Don't check for untranslated file systems here.
  (defun find-buffer-file-type (filename)
!   (let ((match (find-buffer-file-type-match filename))
! 	(code))
!     (if (not match)
! 	default-buffer-file-type
!       (setq code (cdr match))
!       (cond ((memq code '(nil t)) code)
! 	    ((and (symbolp code) (fboundp code))
! 	     (funcall code filename))))))
  
  (setq-default buffer-file-coding-system 'undecided-dos)
  
***************
*** 123,142 ****
    (let ((op (nth 0 command))
  	(target)
  	(binary nil) (text nil)
! 	(undecided nil))
      (cond ((eq op 'insert-file-contents) 
  	   (setq target (nth 1 command))
! 	   (if (untranslated-file-p target)
! 	       (if (file-exists-p target)
! 		   (setq undecided t)
! 		 (setq binary t))
! 	     (setq binary (find-buffer-file-type target))
! 	     (unless binary
! 		     (if (find-buffer-file-type-match target)
! 			 (setq text t)
! 		       (setq undecided (file-exists-p target)))))
  	   (cond (binary '(no-conversion . no-conversion))
  		 (text '(undecided-dos . undecided-dos))
  		 (undecided '(undecided . undecided))
  		 (t '(undecided-dos . undecided-dos))))
  	  ((eq op 'write-region)
--- 121,145 ----
    (let ((op (nth 0 command))
  	(target)
  	(binary nil) (text nil)
! 	(undecided nil) (undecided-unix nil))
      (cond ((eq op 'insert-file-contents) 
  	   (setq target (nth 1 command))
! 	   ;; First check for a file name that indicates
! 	   ;; it is truly binary.
! 	   (setq binary (find-buffer-file-type target))
! 	   (cond (binary)
! 		 ;; Next check for files that MUST use DOS eol conversion.
! 		 ((find-buffer-file-type-match target)
! 		  (setq text t))
! 		 ;; For any other existing file, decide based on contents.
! 		 ((file-exists-p target)
! 		  (setq undecided t))
! 		 ;; Next check for a non-DOS file system.
! 		 ((untranslated-file-p target)
! 		  (setq undecided-unix t)))
  	   (cond (binary '(no-conversion . no-conversion))
  		 (text '(undecided-dos . undecided-dos))
+ 		 (undecided-unix '(undecided-unix . undecided-unix))
  		 (undecided '(undecided . undecided))
  		 (t '(undecided-dos . undecided-dos))))
  	  ((eq op 'write-region)
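
The order of the checks in the new insert-file-contents branch can be
restated as a small decision function.  This is a sketch in C rather
than the Lisp from dos-w32.el, with hypothetical names throughout; the
four booleans stand in for the results of find-buffer-file-type,
find-buffer-file-type-match, file-exists-p and untranslated-file-p.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for this sketch; the real logic is Lisp.  */
enum ft_coding
{
  FT_NO_CONVERSION, FT_UNDECIDED_DOS, FT_UNDECIDED, FT_UNDECIDED_UNIX
};

struct ft_facts
{
  bool name_is_binary;      /* find-buffer-file-type returned non-nil;
                               note it falls back to
                               default-buffer-file-type on no match */
  bool name_matches_alist;  /* find-buffer-file-type-match found an entry */
  bool file_exists;         /* file-exists-p */
  bool on_untranslated_fs;  /* untranslated-file-p */
};

/* Pick the coding used for reading, testing the facts in the same
   order as the patched code: the first check that succeeds wins.  */
enum ft_coding
ft_read_coding (const struct ft_facts *f)
{
  assert (f != NULL);
  if (f->name_is_binary)
    return FT_NO_CONVERSION;   /* truly binary by name */
  if (f->name_matches_alist)
    return FT_UNDECIDED_DOS;   /* must use DOS EOL conversion */
  if (f->file_exists)
    return FT_UNDECIDED;       /* decide based on contents */
  if (f->on_untranslated_fs)
    return FT_UNDECIDED_UNIX;  /* new file on a non-DOS file system */
  return FT_UNDECIDED_DOS;     /* default for everything else */
}
```

In this reading, an existing file on an untranslated file system is
detected from its contents, and only a file that does not exist yet
gets undecided-unix there.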

From rms@gnu.ai.mit.edu  Sun Aug 31 12:39:25 1997
X-VM-v5-Data: ([nil nil nil nil t nil nil nil nil]
	[nil "Sun" "31" "August" "1997" "15:40:56" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" "<199708311940.PAA15457@psilocin.gnu.ai.mit.edu>" "7" "Re: \"Binary\" I/O and subprocesses on DOS_NT" "^From:" nil nil "8" nil nil nil nil]
	nil)
Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id MAA06278 for <voelker@cs.washington.edu>; Sun, 31 Aug 1997 12:39:24 -0700
Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id PAA15457; Sun, 31 Aug 1997 15:40:56 -0400
Message-Id: <199708311940.PAA15457@psilocin.gnu.ai.mit.edu>
In-reply-to: <199708290418.VAA17387@joker.cs.washington.edu> 	(voelker@cs.washington.edu)
References: <Pine.SUN.3.91.970828162025.11126B-100000@is> <199708290418.VAA17387@joker.cs.washington.edu>
From: Richard Stallman <rms@gnu.ai.mit.edu>
To: voelker@cs.washington.edu
CC: eliz@is.elta.co.il, andrewi@harlequin.co.uk, handa@etl.go.jp
Subject: Re: "Binary" I/O and subprocesses on DOS_NT
Date: Sun, 31 Aug 1997 15:40:56 -0400

    > I believe I've found a bug in call-process-region.  To reproduce, call 
    > hexl-find-file on a DOS-style text file (with CRLF EOLs), change it a 
    > bit, then save it: the file is written with Unix-style EOLs.

    Eli, I can't reproduce this.  I did

Can you try this in the new pretest?

From handa@etl.go.jp  Mon Sep  1 01:11:37 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Mon" " 1" "September" "1997" "17:12:04" "+0900" "Kenichi Handa" "handa@etl.go.jp" nil "27" "Re: Coding system issues (3)" "^From:" nil nil "9" nil nil nil nil]
	nil)
Received: from mail1-im.etl.go.jp (mail1-im.etl.go.jp [192.50.105.9]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id BAA24451 for <voelker@cs.washington.edu>; Mon, 1 Sep 1997 01:11:35 -0700
Received: from etlpom.etl.go.jp (etlpom.etl.go.jp [192.31.200.9]) by mail1-im.etl.go.jp (8.8.5/3.5Wpl1-96112918) with ESMTP 	id RAA02811; Mon, 1 Sep 1997 17:10:53 +0900 (JST)
Received: from etlken.etl.go.jp (etlken.etl.go.jp [192.31.197.11]) by etlpom.etl.go.jp (8.8.5/3.5Wpl4-ETL_MASTER) with SMTP id RAA28502; Mon, 1 Sep 1997 17:10:53 +0900 (JST)
Received: by etlken.etl.go.jp (SMI-8.6/6.4J.6-ETL.SLAVE) 	id RAA15939; Mon, 1 Sep 1997 17:12:04 +0900
Message-Id: <199709010812.RAA15939@etlken.etl.go.jp>
In-reply-to: <199708301955.MAA13636@priam.CS.Berkeley.EDU> 	(rms@priam.CS.Berkeley.EDU)
References: <Pine.SUN.3.91.970825140112.3327O-100000@is> <199708251239.VAA02414@etlken.etl.go.jp> <199708301955.MAA13636@priam.CS.Berkeley.EDU>
From: Kenichi Handa <handa@etl.go.jp>
To: rms@priam.CS.Berkeley.EDU
CC: eliz@is.elta.co.il, voelker@cs.washington.edu, andrewi@harlequin.co.uk
Subject: Re: Coding system issues (3)
Date: Mon, 1 Sep 1997 17:12:04 +0900

"Richard M. Stallman" <rms@priam.CS.Berkeley.EDU> writes:
>     Hmm, perhaps, we must now give up detecting a coding system of a file
>     in an incremental manner as being done now, but have to read the whole
>     file with no conversion, detect a coding system by running
>     sophisticated Emacs Lisp code on the whole buffer, then decode the
>     whole buffer at once.

> What worries me in this, is that this method could be used for
> insert-file-contents, but cannot be used for a synchronous subprocess.
> I am not sure that it is a good idea to use different methods for
> subprocesses and files.

?? A synchronous subprocess has no problem, we can read the whole
output into a buffer, and then process it.

The problem is with an asynchronous subprocess because we must detect
a coding system on the fly.  But, I think it is enough to give just a
bunch of data Emacs receives from the subprocess to the sophisticated
Emacs Lisp code which I mentioned above.

And, even in file-reading (insert-file-contents), if BEG and END are
specified, that Emacs Lisp code will detect a coding system only from
that part of the file.

---
Ken'ichi HANDA
handa@etl.go.jp

From rms@gnu.ai.mit.edu  Mon Sep  1 10:05:47 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Mon" " 1" "September" "1997" "13:07:20" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "19" "Re: Coding system issues (3)" "^From:" nil nil "9" nil nil nil nil]
	nil)
Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id KAA06618 for <voelker@cs.washington.edu>; Mon, 1 Sep 1997 10:05:46 -0700
Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id NAA04706; Mon, 1 Sep 1997 13:07:20 -0400
Message-Id: <199709011707.NAA04706@psilocin.gnu.ai.mit.edu>
In-reply-to: <199709010812.RAA15939@etlken.etl.go.jp> (message from Kenichi 	Handa on Mon, 1 Sep 1997 17:12:04 +0900)
References: <Pine.SUN.3.91.970825140112.3327O-100000@is> <199708251239.VAA02414@etlken.etl.go.jp> <199708301955.MAA13636@priam.CS.Berkeley.EDU> <199709010812.RAA15939@etlken.etl.go.jp>
From: Richard Stallman <rms@gnu.ai.mit.edu>
To: handa@etl.go.jp
CC: eliz@is.elta.co.il, voelker@cs.washington.edu, andrewi@harlequin.co.uk
Subject: Re: Coding system issues (3)
Date: Mon, 1 Sep 1997 13:07:20 -0400

    > What worries me in this, is that this method could be used for
    > insert-file-contents, but cannot be used for a synchronous subprocess.

    ?? A synchronous subprocess has no problem, we can read the whole
    output into a buffer, and then process it.

Right, I meant to say asynchronous.

    The problem is with an asynchronous subprocess because we must detect
    a coding system on the fly.  But, I think it is enough to give just a
    bunch of data Emacs receives from the subprocess to the sophisticated
    Emacs Lisp code which I mentioned above.

I am not sure what "enough" means.  It would do something, it would
choose a coding system, but it would do so in an inconsistent way.

In effect, what I am saying is this: if this method is acceptable for
an asynchronous subprocess, doesn't that mean it is also acceptable
for a file?

From handa@etl.go.jp  Mon Sep  1 17:50:35 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Tue" " 2" "September" "1997" "09:50:57" "+0900" "Kenichi Handa" "handa@etl.go.jp" nil "27" "Re: Coding system issues (3)" "^From:" nil nil "9" nil nil nil nil]
	nil)
Received: from mail1-im.etl.go.jp (mail1-im.etl.go.jp [192.50.105.9]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id RAA19068 for <voelker@cs.washington.edu>; Mon, 1 Sep 1997 17:50:34 -0700
Received: from etlpom.etl.go.jp (etlpom.etl.go.jp [192.31.200.9]) by mail1-im.etl.go.jp (8.8.5/3.5Wpl1-96112918) with ESMTP 	id JAA27257; Tue, 2 Sep 1997 09:49:48 +0900 (JST)
Received: from etlken.etl.go.jp (etlken.etl.go.jp [192.31.197.11]) by etlpom.etl.go.jp (8.8.5/3.5Wpl4-ETL_MASTER) with SMTP id JAA02661; Tue, 2 Sep 1997 09:49:47 +0900 (JST)
Received: by etlken.etl.go.jp (SMI-8.6/6.4J.6-ETL.SLAVE) 	id JAA16888; Tue, 2 Sep 1997 09:50:57 +0900
Message-Id: <199709020050.JAA16888@etlken.etl.go.jp>
In-reply-to: <199709011707.NAA04706@psilocin.gnu.ai.mit.edu> (message from 	Richard Stallman on Mon, 1 Sep 1997 13:07:20 -0400)
References: <Pine.SUN.3.91.970825140112.3327O-100000@is> <199708251239.VAA02414@etlken.etl.go.jp> <199708301955.MAA13636@priam.CS.Berkeley.EDU> <199709010812.RAA15939@etlken.etl.go.jp> <199709011707.NAA04706@psilocin.gnu.ai.mit.edu>
From: Kenichi Handa <handa@etl.go.jp>
To: rms@gnu.ai.mit.edu
CC: eliz@is.elta.co.il, voelker@cs.washington.edu, andrewi@harlequin.co.uk
Subject: Re: Coding system issues (3)
Date: Tue, 2 Sep 1997 09:50:57 +0900

Richard Stallman <rms@gnu.ai.mit.edu> writes:
>     The problem is with an asynchronous subprocess because we must detect
>     a coding system on the fly.  But, I think it is enough to give just a
>     bunch of data Emacs receives from the subprocess to the sophisticated
>     Emacs Lisp code which I mentioned above.

> I am not sure what "enough" means.  It would do something, it would
> choose a coding system, but it would do so in an inconsistent way.

> In effect, what I am saying is this: if this method is acceptable for
> an asynchronous subprocess, doesn't that mean it is also acceptable
> for a file?

For an asynchronous subprocess whose communication is hidden from
users, such as ispell and nntp, we have to specify a coding system
explicitly anyway.  It is dangerous to trust automatic code
detection.  But for a shell (for instance), users can see the output
from the subprocess, and they can easily notice that something is
wrong when Emacs fails to detect the correct coding system
automatically.  And they can change the coding system interactively
with C-x RET p.  The sophisticated Emacs Lisp code mentioned above
won't be very accurate when it is given only a little data (perhaps
only one line), but I think that is acceptable for subprocesses.
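
As a rough illustration of why detection from a small chunk is
unreliable, here is a minimal Python sketch.  It is not the Emacs
Lisp code discussed above; the candidate list and the function name
plausible_codings are assumptions made up for this example.  It only
shows that a first line of pure ASCII is consistent with every
candidate, so nothing can be decided until more data arrives.

```python
# Hypothetical candidate list; a real detector would have its own priorities.
CANDIDATES = ["iso2022_jp", "euc_jp", "shift_jis"]


def plausible_codings(data: bytes) -> list[str]:
    """Return every candidate coding system that decodes `data` without error."""
    plausible = []
    for name in CANDIDATES:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        plausible.append(name)
    return plausible


# The first chunk from a shell is often plain ASCII: every candidate fits.
first_line = b"$ ls\n"
print(plausible_codings(first_line))

# A later chunk with 8-bit bytes rules out the 7-bit ISO-2022-JP at least.
more_output = "日本語.txt\n".encode("euc_jp")
print(plausible_codings(first_line + more_output))
```

With only the first line, any choice is a guess; the list narrows only
once non-ASCII bytes have been seen.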

---
Ken'ichi HANDA
handa@etl.go.jp

From eliz@is.elta.co.il  Sun Sep 14 07:12:32 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Sun" "14" "September" "1997" "17:12:07" "+0300" "Eli Zaretskii" "eliz@is.elta.co.il" nil "19" "Re: EOL conversion in call-process-region" "^From:" nil nil "9" nil nil nil nil]
	nil)
Received: from is.elta.co.il (is.elta.co.il [199.203.121.2]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with SMTP id HAA11931 for <voelker@cs.washington.edu>; Sun, 14 Sep 1997 07:12:31 -0700
Received: by is.elta.co.il (SMI-8.6/SMI-SVR4) 	id RAA20127; Sun, 14 Sep 1997 17:12:08 +0300
X-Sender: eliz@is
In-Reply-To: <199709111953.PAA04064@psilocin.gnu.ai.mit.edu>
Message-ID: <Pine.SUN.3.91.970914171101.20078F-100000@is>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
From: Eli Zaretskii <eliz@is.elta.co.il>
To: Richard Stallman <rms@gnu.ai.mit.edu>
cc: handa@etl.go.jp, voelker@cs.washington.edu
Subject: Re: EOL conversion in call-process-region
Date: Sun, 14 Sep 1997 17:12:07 +0300 (IDT)


On Thu, 11 Sep 1997, Richard Stallman wrote:

> Suppose you read a DOS file and then use M-| to do something to the
> text.  Chances are you don't want that to be affected by what the
> file's EOL conversion was.

That is exactly what I am not sure about.  Won't people in such cases
expect to get the same behavior as if the region was cut out of the
original file, e.g. by a sed script?  At least when the region is the
entire buffer, they probably would.  Passing the region without the
original EOLs breaks this.

It is true that in many cases, particularly when the buffer contains
text in the native format, things will generally work both ways (DOS
programs that work on text usually drop the CR characters when they
read the file).  But reading non-text files, or reading DOS-style text
files on Unix, causes M-| to behave differently than if the file were
submitted to the invoked program outside Emacs.
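
To make the difference concrete, here is a small Python sketch.  The
helper names and the byte-counting stand-in for the invoked program
are assumptions for illustration only; this is not how Emacs
implements call-process-region.  It compares what a program would
receive when the region is sent as the buffer holds it (LF only) with
what it would receive if the file's CR-LF line ends were restored.

```python
def decode_dos_file(raw: bytes) -> str:
    """Model reading a DOS file: CR-LF on disk becomes LF in the buffer."""
    return raw.decode("ascii").replace("\r\n", "\n")


def region_as_buffer_text(text: str) -> bytes:
    """Send the region as the buffer holds it, with LF line ends."""
    return text.encode("ascii")


def region_with_file_eols(text: str) -> bytes:
    """Send the region with the file's original CR-LF line ends restored."""
    return text.replace("\n", "\r\n").encode("ascii")


def count_bytes(data: bytes) -> int:
    """Stand-in for an external program that counts its input bytes."""
    return len(data)


on_disk = b"one\r\ntwo\r\n"
buffer_text = decode_dos_file(on_disk)

print(count_bytes(on_disk))                             # 10, program run on the file
print(count_bytes(region_as_buffer_text(buffer_text)))  # 8, region sent without CRs
print(count_bytes(region_with_file_eols(buffer_text)))  # 10, matches the file again
```

A program that drops CRs on input would hide the difference; one that
counts or compares bytes would not.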

From rms@gnu.ai.mit.edu  Sun Sep 14 10:04:06 1997
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
	[nil "Sun" "14" "September" "1997" "13:05:54" "-0400" "Richard Stallman" "rms@gnu.ai.mit.edu" nil "17" "Re: EOL conversion in call-process-region" "^From:" nil nil "9" nil nil nil nil]
	nil)
Received: from psilocin.gnu.ai.mit.edu (psilocin.gnu.ai.mit.edu [128.52.46.62]) by june.cs.washington.edu (8.8.5+CS/7.2ju) with ESMTP id KAA15021 for <voelker@cs.washington.edu>; Sun, 14 Sep 1997 10:04:05 -0700
Received: by psilocin.gnu.ai.mit.edu (8.8.5/8.6.12GNU) id NAA08358; Sun, 14 Sep 1997 13:05:54 -0400
Message-Id: <199709141705.NAA08358@psilocin.gnu.ai.mit.edu>
In-reply-to: <Pine.SUN.3.91.970914171101.20078F-100000@is> (message from Eli 	Zaretskii on Sun, 14 Sep 1997 17:12:07 +0300 (IDT))
References:  <Pine.SUN.3.91.970914171101.20078F-100000@is>
From: Richard Stallman <rms@gnu.ai.mit.edu>
To: eliz@is.elta.co.il
CC: handa@etl.go.jp, voelker@cs.washington.edu
Subject: Re: EOL conversion in call-process-region
Date: Sun, 14 Sep 1997 13:05:54 -0400

    > Suppose you read a DOS file and then use M-| to do something to the
    > text.  Chances are you don't want that to be affected by what the
    > file's EOL conversion was.

    That is exactly what I am not sure about.  Won't people in such cases
    expect to get the same behavior as if the region was cut out of the
    original file, e.g. by a sed script?

If you are using M-| to do something to the text as you see it,
you won't care what it looked like in the file.

If you are using M-| as a substitute for doing something with the file
itself, then you would care.

Basically it seems that neither choice is perfect, so I might as
well not change the one we have.


