
12/28/94

This README file describes version 7.4a1, the first alpha release
of BinarIO.  This version is compatible with Tcl 7.4b1.  It is
considered alpha because not all of the planned features have been
implemented, it has not been tested on a wide variety of platforms,
and the documentation is incomplete.  Still, the parts that have
been implemented are believed to be fairly stable.

I wrote this several months ago, but hadn't done much with it
for some time.  With the recent release of Tcl 7.4b1 and the
tclbin extension, and since it is quite usable in its current state,
I've decided to package it up and release it.

BinarIO (pronounced bee-NAH-ree-oh, for those of you who don't
speak Italian) is a package for performing unstructured binary
input and output in Tcl.

This program is Copyright 1994 by Joseph V. Moss (joe@italia.rain.com).
See the file LICENSE for distribution terms.

BACKGROUND

	Tcl uses null-terminated strings throughout, and thus
cannot deal with nulls embedded within strings.  It does however
handle any other character, so only '\0' is special.

While it would be possible to alter Tcl so that, internally, it
no longer used null-terminated strings, it would be a great deal
of work and, among other disadvantages, it would break the
ease of extending the language that is currently enjoyed.

This package works around the problem by storing the data that
is input in one of two different forms:

	In strings, but with the nulls escaped.  Specifically,
	this package allows you to set an escape character.
	That character is doubled whenever an actual occurrence
	of it is found in the input stream (it escapes itself),
	and whenever a null is found in the input stream it is
	stored as the escape character followed by a zero (0).

	In arrays, where each element is indexed by the position
	of the character in the I/O stream, and the contents of
	each element is a decimal number corresponding to the
	character.

As data is input, it is translated to one of the above forms.
The data can then be manipulated in whatever manner you choose
from within Tcl.  The data is translated back, on output.

(Actually, the current implementation only does I/O in the first
form, but includes commands to convert between the two forms
for use when the second form is more convenient for data
modification/access.)

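To make the two forms concrete, here is a rough sketch in plain Tcl
of the translation between them.  It is not the package's code (the
real work is done in C), and the proc names esc_from_bytes and
esc_to_bytes are made up for this illustration.  A list of decimal
byte values stands in for the array form, so that no actual null
ever has to appear in a Tcl string.

```tcl
# Hypothetical helpers, not part of BinarIO.

# Build the escaped-string form from a list of decimal byte values.
proc esc_from_bytes {bytes {esc ~}} {
    set out ""
    foreach b $bytes {
        if {$b == 0} {
            # A null is stored as the escape character followed by "0".
            append out $esc 0
        } else {
            set ch [format %c $b]
            if {[string compare $ch $esc] == 0} {
                # The escape character escapes itself by doubling.
                append out $esc $esc
            } else {
                append out $ch
            }
        }
    }
    return $out
}

# Recover the list of decimal byte values from an escaped string.
# Assumes well-formed input (no lone escape character at the end).
proc esc_to_bytes {str {esc ~}} {
    set bytes {}
    set len [string length $str]
    for {set i 0} {$i < $len} {incr i} {
        set ch [string index $str $i]
        if {[string compare $ch $esc] == 0} {
            # Look at the character that follows the escape.
            incr i
            set next [string index $str $i]
            if {[string compare $next 0] == 0} {
                lappend bytes 0
                continue
            }
            # Otherwise it is a doubled escape character.
            set ch $next
        }
        scan $ch %c code
        lappend bytes $code
    }
    return $bytes
}
```

With the default escape character:

	esc_from_bytes {72 105 0 126 66 121 101}	-> Hi~0~~Bye
	esc_to_bytes Hi~0~~Bye		-> 72 105 0 126 66 121 101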

BUILDING

	You'll need to compile all of the C files (which requires
header files from the Tcl distribution that are not normally
installed) and then link them with the Tcl library, for example:

   cc -O binAppInit.o bin_Misc.o bin_UnixAZ.o -ltcl -lm -o tclsh

(binAppInit.c is just the standard AppInit file from the Tcl
distribution plus a call to "Bin_Init" - you can modify/replace
it if you want to add in other extensions)

The header files may require that you set certain defines
(e.g. HAVE_UNISTD_H) to match the value of the AC_FLAGS in the
Makefile in the Tcl distribution source directory, as generated
by the configure script.

You can use the Makefile included in this directory to aid in
building the software.  It includes some comments and sample
variable settings that may be of use.

USAGE

The extension consists of two global variables and the commands
listed below:

Variables-

	bin_EscChar -	The character used to escape nulls;
			it defaults to '~', but can be pretty
			much any character that doesn't have
			special meaning to the Tcl parser

	bin_Mode -	If set to a non-zero value, files opened
			with the "bin_open" command will be
			opened in binary mode by default

Commands-

The first four commands work just like their standard counterparts
(i.e. open, gets, puts, read) with the differences listed below:

	bin_open -	Opens a file.  The last argument can accept
			the additional options b, a, BINARY, and
			ASCII, where b & BINARY force binary mode
			and a & ASCII force standard mode.  If you
			don't specify one, you'll get whatever mode
			is set by the "bin_Mode" variable.

			For example, to open a file (R/W) in binary
			I/O mode you can use either of these forms:

				set fd [bin_open $filename r+b]

				set fd [bin_open $filename \
				    [list RDWR BINARY]]


	bin_gets -	Inputs string and escapes nulls

	bin_puts -	Outputs string with escaped nulls

	bin_read -	Inputs a fixed number of bytes and escapes
			nulls

	bin_arr2str -	Converts an array of numeric values to a
			string w/ escaped nulls

	bin_str2arr -	Converts the other way

Since several of these commands are totally backward compatible
with the standard Tcl versions, by using the rename command, they
can actually be substituted for the originals.  The included script
"binrename.tcl" does just that.

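The substitution could look something like the sketch below.  This
is only an assumed outline, not the contents of binrename.tcl; the
proc name swap_in_bin_cmds and the "std_" prefix used to keep the
originals around are invented for the example.

```tcl
# Hypothetical sketch; the real binrename.tcl may differ.
# For each name, keep the standard command as std_<name> and
# install the bin_ version under the standard name.
proc swap_in_bin_cmds {names} {
    foreach cmd $names {
        rename $cmd std_$cmd
        rename bin_$cmd $cmd
    }
}
```

It would be called with whichever commands you want replaced, e.g.
"swap_in_bin_cmds {open gets puts read}", after the extension has
been loaded.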

EXAMPLES

If you have a file named foo with the contents:

Hex:	48 69 00 f8 7e 03 00 42  79 65 0a          Ascii:  Hi..~..Bye.

then running this in standard Tcl:

	set fd [open foo r]
	gets $fd line
	puts stdout "[string length $line] $line"

results in:

	2 Hi


Using the binary I/O extensions:

	set fd [bin_open foo rb]
	bin_gets $fd line
	puts stdout "[string length $line] $line"

results in (with X substituted for the non-printable ASCII chars):

	13 Hi~0X~~X~0Bye

which is not perfect, but usable.  The string length includes the
extra tildes (I'm planning on making a new string length command
that will return the length without the escape chars), and the
newline is stripped off of the end (but would be added back by
calling bin_puts).

You can also:

	bin_str2arr $line myarray

and then:

	puts "$myarray(0) $myarray(1) $myarray(2)"	-> 72 105 0
	array size myarray				-> 10
	puts "$myarray(8) $myarray(9)"			-> 121 101
but
	puts $myarray(10)
results in:
	can't read "myarray(10)": no such element in array

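The escape-aware length mentioned above could be computed along
these lines.  Again, this is just a sketch in plain Tcl and not a
command provided by the package; the name esc_length is made up.

```tcl
# Hypothetical helper: length of an escaped string, counting each
# escape sequence (escape char plus the char after it) as one byte.
proc esc_length {str {esc ~}} {
    set n 0
    set len [string length $str]
    for {set i 0} {$i < $len} {incr i} {
        if {[string compare [string index $str $i] $esc] == 0} {
            # Skip the character that follows the escape.
            incr i
        }
        incr n
    }
    return $n
}
```

For the line read in the example (with X standing in for the
non-printable characters), "esc_length Hi~0X~~X~0Bye" returns 10,
which agrees with the array size shown above.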

SAMPLE SCRIPTS

The file "bincopy" is an example of a script that copies binary
files a line at a time.  (Using bin_read with large block sizes
would, of course, be more efficient, but I wrote this script
before I implemented the bin_read command.)

"bytedist" reads in a file and then outputs a table showing all
256 possible byte values, along with a count of how
many times each occurred in the file.

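A block-at-a-time copy using bin_read, as suggested above, might
look like the following.  It is a sketch rather than a replacement
for bincopy: the proc name and the 4096-byte default are arbitrary,
and it assumes that bin_open accepts the mode "wb" by analogy with
the "rb" used earlier and that bin_puts accepts -nonewline like the
standard puts.

```tcl
# Sketch only: copy src to dst in fixed-size blocks.
# Requires the BinarIO commands to be loaded.
proc bin_blockcopy {src dst {blocksize 4096}} {
    set in  [bin_open $src rb]
    set out [bin_open $dst wb]
    while {![eof $in]} {
        # bin_read returns the block with nulls escaped;
        # bin_puts translates it back on output.
        bin_puts -nonewline $out [bin_read $in $blocksize]
    }
    close $in
    close $out
}
```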

FUTURE DIRECTIONS

In addition to those things already mentioned (allowing the I/O
commands to read/write arrays of numbers directly and making
a new version of the string command that properly deals with
escaped strings), I had been thinking of implementing a way of
specifying record layouts in a file.  You would then define the
structure of a record and could read/write a record at a time.
Perhaps something like this:

	set recfmt [list \
		{string	2	id} \
		{int	4	controlnum} \
		{string 30	name} ]

then reading from a file and specifying the format as $recfmt would
read 36 bytes from the file and place the first two interpreted as
a string into the variable "id", the next 4 bytes would be considered
an int and placed as a decimal value into "controlnum", and lastly
the variable "name" would contain the remaining 30 bytes.
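
As a rough sketch of how such a format list could be walked (nothing
like this exists in the package yet, and the proc name rec_layout is
made up), the following computes the offset of each field and the
total record size:

```tcl
# Hypothetical helper: return a two-element list holding a list of
# {name type offset size} entries and the total record size.
proc rec_layout {fmt} {
    set offset 0
    set fields {}
    foreach field $fmt {
        set type [lindex $field 0]
        set size [lindex $field 1]
        set name [lindex $field 2]
        lappend fields [list $name $type $offset $size]
        incr offset $size
    }
    return [list $fields $offset]
}
```

For the recfmt shown above, "lindex [rec_layout $recfmt] 1" gives
36, with "id" at offset 0, "controlnum" at offset 2 and "name" at
offset 6.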

This could, in fact, be used to read kernel structures.

Making a script to parse C header files and create format
specifications could come later.

However, since Laurent Demailly (dl@hplyot.obspm.fr) has recently
released a package that provides this capability, I may not bother.

