12/28/94

This README file describes version 7.4a1, the first alpha release of
BinarIO.  This version is compatible with Tcl 7.4b1.  It is considered
alpha because not all of the planned features have been implemented, it
has not been tested on a wide variety of platforms, and the
documentation is incomplete.  Still, the parts that have been
implemented are believed to be fairly stable.

I wrote this several months ago but had not done much with it for some
time.  With the recent release of Tcl 7.4b1 and the tclbin extension,
and since it is quite usable in its current state, I've decided to
package it up and release it.

BinarIO (pronounced bee-NAH-ree-oh, for those of you who don't speak
Italian) is a package for performing unstructured binary input and
output in Tcl.

This program is Copyright 1994 by Joseph V. Moss (joe@italia.rain.com).
See the file LICENSE for distribution terms.


BACKGROUND

Tcl uses null-terminated strings throughout, and thus cannot deal with
nulls embedded within strings.  It does, however, handle any other
character, so only '\0' is special.  While it would be possible to
alter Tcl so that it no longer used null-terminated strings internally,
that would be a great deal of work and, among other disadvantages,
would break the ease of extending the language that is currently
enjoyed.

This package works around the problem by storing the data that is input
in one of two different forms:

    In strings, with the nulls escaped.  The package lets you set an
    escape character.  Whenever the escape character itself occurs in
    the input stream it is doubled (it escapes itself), and whenever a
    null occurs in the input stream it is stored as the escape
    character followed by a zero (0).

    In arrays, where each element is indexed by the position of the
    character in the I/O stream and contains the decimal value of that
    character.

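The string form described above can be sketched in plain Tcl.  This is
only an illustration of the escaping rule, not the package's actual C
implementation; the proc name escape_bytes and its arguments are
hypothetical and are not part of BinarIO.  Because a Tcl string cannot
hold a null here, the input is given as a list of decimal byte values.

```tcl
# Hypothetical helper, NOT part of BinarIO: encode a list of decimal byte
# values into the escaped string form (escape character defaults to "~").
proc escape_bytes {bytes {esc ~}} {
    set out ""
    foreach b $bytes {
        if {$b == 0} {
            # A null byte becomes the escape character followed by "0".
            append out $esc 0
        } else {
            set ch [format %c $b]
            if {[string compare $ch $esc] == 0} {
                # A literal escape character is doubled.
                append out $esc $esc
            } else {
                append out $ch
            }
        }
    }
    return $out
}

# The ten bytes 48 69 00 f8 7e 03 00 42 79 65, written in decimal.
set line [escape_bytes {72 105 0 248 126 3 0 66 121 101}]
puts [string length $line]   ;# prints 13
```

The two nulls and the one literal '~' each take two characters in the
escaped form, which is why ten input bytes give a string of length 13.
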
As data is input, it is translated to one of the above forms.  The data
can then be manipulated in whatever manner you choose from within Tcl.
The data is translated back on output.  (Actually, the current
implementation only does I/O in the first form, but it includes
commands to convert between the two forms, for use when the second form
is more convenient for data modification/access.)


BUILDING

You'll need to compile all of the C files (which requires header files
from the Tcl distribution that are not normally installed) and then
link them with the Tcl library, for example:

    cc -O binAppInit.o bin_Misc.o bin_UnixAZ.o -ltcl -lm -o tclsh

(binAppInit.c is just the standard AppInit file from the Tcl
distribution plus a call to "Bin_Init"; you can modify or replace it if
you want to add in other extensions.)

The header files may require that you set certain defines (e.g.
HAVE_UNISTD_H) to match the value of AC_FLAGS in the Makefile in the
Tcl distribution source directory, as generated by the configure
script.

You can use the Makefile included in this directory to aid in building
the software.  It includes some comments and sample variable settings
that may be of use.


USAGE

The extension consists of two global variables and the commands listed
below:

    Variables-
        bin_EscChar - The character used to escape nulls.  It defaults
                      to '~', but can be pretty much any character that
                      doesn't have special meaning to the Tcl parser.
        bin_Mode    - If set to a non-zero value, files opened with the
                      "bin_open" command will be opened in binary mode
                      by default.

    Commands-
        The first four commands work just like their standard
        counterparts (i.e. open, gets, puts, read) with the differences
        listed below:

        bin_open    - Open file.  The last argument accepts the
                      additional options b, a, BINARY, and ASCII, where
                      b and BINARY force binary mode and a and ASCII
                      force standard mode.  If you don't specify one,
                      you'll get whatever mode is set by the "bin_Mode"
                      variable.
                      For example, to open a file (R/W) in binary I/O
                      mode you can use either of these forms:
                          set fd [bin_open $filename r+b]
                          set fd [bin_open $filename \
                                      [list RDWR BINARY]]
        bin_gets    - Inputs a string and escapes nulls
        bin_puts    - Outputs a string with escaped nulls
        bin_read    - Inputs a fixed number of bytes and escapes nulls
        bin_str2arr - Converts a string with escaped nulls to an array
                      of numeric values
        bin_arr2str - Converts the other way

Since several of these commands are totally backward compatible with
the standard Tcl versions, they can be substituted for the originals by
using the rename command.  The included script "binrename.tcl" does
just that.


EXAMPLES

If you have a file named foo with the contents:

    Hex:    48 69 00 f8 7e 03 00 42 79 65 0a
    Ascii:  Hi..~..Bye.

then using standard Tcl:

    set fd [open foo r]
    gets $fd line
    puts stdout "[string length $line] $line"

results in:

    2 Hi

Using the binary I/O extensions:

    set fd [bin_open foo rb]
    bin_gets $fd line
    puts stdout "[string length $line] $line"

results in (with X substituted for the non-printable ASCII chars):

    13 Hi~0X~~X~0Bye

which is not perfect, but usable.  The string length includes the extra
tildes (I'm planning on making a new string length command that will
return the length without the escape chars), and the newline is
stripped off of the end (but would be restored by calling bin_puts).

You can also:

    bin_str2arr $line myarray

and then:

    puts "$myarray(0) $myarray(1) $myarray(2)"   -> 72 105 0
    array size myarray                           -> 10
    puts "$myarray(8) $myarray(9)"               -> 121 101

but

    puts $myarray(10)

results in:

    can't read "myarray(10)": no such element in array


SAMPLE SCRIPTS

The file "bincopy" is an example of a script that copies binary files a
line at a time.  (Using bin_read with large block sizes would, of
course, be more efficient, but I wrote this script before I implemented
the bin_read command.)

bytedist reads in a file and then outputs a table showing all 256
possible byte values, along with a count of how many times each
occurred in the file.


FUTURE DIRECTIONS

In addition to those things already mentioned (allowing the I/O
commands to read/write arrays of numbers directly and making a new
version of the string command that properly deals with escaped
strings), I had been thinking of implementing a way of specifying
record layouts in a file.  You would define the structure of a record
and could then read/write a record at a time.  Perhaps something like
this:

    set recfmt [list \
        {string 2 id} \
        {int 4 controlnum} \
        {string 30 name} ]

Reading from a file with the format given as $recfmt would then read 36
bytes from the file.  The first two bytes would be interpreted as a
string and placed in the variable "id", the next 4 bytes would be
treated as an int and placed as a decimal value in "controlnum", and
the variable "name" would contain the remaining 30 bytes.

This could, in fact, be used to read kernel structures.  Making a
script to parse C header files and create format specifications could
come later.  However, since Laurent Demailly (dl@hplyot.obspm.fr) has
recently released a package that provides this capability, I may not
bother.
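
As a rough illustration of how the record format proposed above might
be interpreted, the field list can be walked in plain Tcl to work out
each field's byte offset and the total record size.  This is only a
sketch: the format syntax is a proposal, no BinarIO command consumes it,
and the variable names used in the loop are hypothetical.

```tcl
# Hypothetical sketch: the record format is only proposed in this README,
# and nothing in BinarIO reads it.
set recfmt [list \
    {string 2 id} \
    {int 4 controlnum} \
    {string 30 name} ]

# Walk the field list, printing each field's byte offset and summing
# the field widths to get the record size.
set offset 0
foreach field $recfmt {
    set type  [lindex $field 0]
    set width [lindex $field 1]
    set name  [lindex $field 2]
    puts [format "%-10s %-6s offset %2d width %2d" $name $type $offset $width]
    incr offset $width
}
puts "record size: $offset bytes"   ;# prints "record size: 36 bytes"
```
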