<html>
<head>
<title>Unofficial Guide To CLR Metadata</title>
</head>
<body bgcolor="#ffffff">
<h1>Unofficial Guide To CLR Metadata</h1>

Rhys Weatherley, <a href="mailto:rweather@southern-storm.com.au">rweather@southern-storm.com.au</a>.<br>
Last Modified: $Date: 2001/07/12 06:16:54 $<p>

Copyright &copy; 2001 Southern Storm Software, Pty Ltd.<br>
Permission to distribute unmodified copies of this work is hereby granted.<p>

<h2>1. Introduction</h2>

<blockquote>
Note: this document is now obsolete.  It pertained to the Beta 1 and earlier
versions of Microsoft's .NET framework.  Since Beta 2, the format has
changed considerably and is now fairly well documented by the ECMA.
The information in this document is being made available for historical
purposes only.
</blockquote>

Microsoft's published information on the metadata section within
IL (Intermediate Language) binaries is quite sparse, and it only gives
part of the picture.  Apparently they will be releasing full information
later, but at the time of writing there is very little on the format.<p>

The purpose of this document is to describe the contents of the metadata
section, and to identify any areas for which we still do not have complete
information.  The intended audience is authors of compiler tools and
authors of runtime engines for non-Microsoft platforms.  Contributions are
welcome from those who have discovered other aspects of metadata.<p>

This information has not been "blessed" by Microsoft, and could be
inaccurate.  It is based on the Beta 1 release.  Microsoft could
change the format radically at any moment, making this information
quickly obsolete.  There is some evidence that they will be changing
some of the details for Beta 2.  The author assumes no responsibility
for errors in the description, or for problems that may occur due
to using this information in a product.<p>

Some of the features described here may have been designated as
deprecated by Microsoft.  It is not the purpose of this document
to pass judgement on which features are good or bad: we document
all of them.  In the sections that follow, unknown or guessed
information is marked with "(?)".<p>

Unless otherwise stated, all values in the format are stored in
little-endian order.  We will use the following type names to
describe fields of various sizes:<p>

<table border=1>
<tr><td>Name</td><td>Description</td></tr>
<tr><td>BYTE</td><td>A single 8-bit byte quantity</td></tr>
<tr><td>UINT16</td><td>An unsigned 16-bit word quantity</td></tr>
<tr><td>UINT32</td><td>An unsigned 32-bit double-word quantity</td></tr>
<tr><td>UINT64</td><td>An unsigned 64-bit quad-word quantity</td></tr>
<tr><td>CHAR[n]</td><td>A character array of exactly n bytes in size</td></tr>
<tr><td>ZSTR</td><td>A NUL-terminated string.</td></tr>
<tr><td>PAD</td><td>Indicates padding to the next 4-byte boundary, if
not currently aligned on a 4-byte boundary.</td></tr>
<tr><td>COMPLEN</td><td>Compressed length encoding of a 32-bit value.</td></tr>
<tr><td>STRREF</td><td>A reference to the "<code>#Strings</code>" blob, which
can be either 16 or 32 bits in size</td></tr>
<tr><td>BLOBREF</td><td>A reference to the "<code>#Blob</code>" blob, which
can be either 16 or 32 bits in size</td></tr>
<tr><td>GUIDREF</td><td>A reference to the "<code>#GUID</code>" blob, which
can be either 16 or 32 bits in size</td></tr>
<tr><td>TKREF_nn</td><td>An index into token table nn (a hexadecimal value),
which can be either 16 or 32 bits in size.  For example, "TKREF_08" refers
to an index into the ParamDef token table.  Indexes typically start
at 1, instead of 0.</td></tr>
<tr><td>MIXEDREF_n</td><td>An index into one of a number of different
token tables.  The value n indicates the number of bits that are used
to identify the table.  The upper bits of the value contain the index
into the specified table.<p>

The size (16 or 32 bits) is determined by finding the largest table
amongst those that can be referenced, and then checking whether an index
into that table, together with the n-bit table identifier, fits into
16 bits.  If it does fit, then the size is 16 bits.  Otherwise
it is 32 bits.<p>

When the MIXEDREF_n type is used, it is typically followed by a description
of the table identifiers to be used in that situation.</td></tr>
</table><p>
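As an illustration of the MIXEDREF_n rules above, the following Python
sketch splits a value into its table identifier and index, and chooses the
field width.  The function names are our own, and the exact boundary used
for the 16-bit test is an assumption drawn from the description above, not
something Microsoft has confirmed.<p>

```python
def mixedref_size(tag_bits, row_counts):
    """Return the width in bytes (2 or 4) of a MIXEDREF_n field.

    row_counts lists the number of rows in each table the field may refer to.
    Assumption: the field is 16 bits wide when the largest row count still
    fits into the (16 - tag_bits) bits left over after the table identifier.
    """
    largest = max(row_counts, default=0)
    return 2 if largest < (1 << (16 - tag_bits)) else 4


def decode_mixedref(value, tag_bits):
    """Split a MIXEDREF_n value into (table identifier, index)."""
    tag = value & ((1 << tag_bits) - 1)  # low n bits select the table
    index = value >> tag_bits            # upper bits hold the index
    return tag, index
```

For instance, a MIXEDREF_2 value of 0x0006 decodes to table identifier 2
with index 1.<p>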

The compressed length encoding is as follows:<p>

<table border=1>
<tr><td>0-127</td><td>Encode as the byte itself</td></tr>
<tr><td>128-16383</td><td>Encode as (0x80 | (len &gt;&gt; 8)),
(len &amp; 0xFF)<br>
Note: 16383 is (2^14 - 1).</td></tr>
<tr><td>16384-536870911</td><td>Encode as (0xC0 | (len &gt;&gt; 24)),
((len &gt;&gt; 16) &amp; 0xFF), ((len &gt;&gt; 8) &amp; 0xFF),
(len &amp; 0xFF)<br>
Note: 536870911 is (2^29 - 1).</td></tr>
</table><p>

Lengths greater than 536870911 are not supported by Microsoft's compressed
length encoding.  The functions they use are called
"<code>CorSigCompressData</code>" and "<code>CorSigUncompressData</code>".
<a href="http://www.southern-storm.com.au/portable_net.html">Portable.NET</a>
introduced a 5-byte encoding just in case:<p>

<table border=1>
<tr><td>536870912-4294967295</td><td>Encode as 0xE0, ((len &gt;&gt; 24)
&amp; 0xFF), ((len &gt;&gt; 16) &amp; 0xFF), ((len &gt;&gt; 8) &amp; 0xFF),
(len &amp; 0xFF)</td></tr>
</table><p>

It normally isn't necessary to compress lengths of this size, but it
is probably better to be safe than sorry.<p>
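The encoding rules above can be sketched in Python as follows.  This is
an illustration only, not Microsoft's "<code>CorSigCompressData</code>"
code: the function names and the <code>allow_extended</code> parameter are
hypothetical, and the 5-byte form is the Portable.NET extension described
above.<p>

```python
def encode_complen(value, allow_extended=False):
    """Encode a length with the compressed length encoding."""
    if value < 0:
        raise ValueError("length must be non-negative")
    if value <= 0x7F:                      # 0-127: the byte itself
        return bytes([value])
    if value <= 0x3FFF:                    # 128-16383: two bytes
        return bytes([0x80 | (value >> 8), value & 0xFF])
    if value <= 0x1FFFFFFF:                # 16384-536870911: four bytes
        return bytes([0xC0 | (value >> 24), (value >> 16) & 0xFF,
                      (value >> 8) & 0xFF, value & 0xFF])
    if allow_extended and value <= 0xFFFFFFFF:
        # Portable.NET's 5-byte form: 0xE0 followed by the 32-bit value.
        return bytes([0xE0]) + value.to_bytes(4, "big")
    raise ValueError("length too large for the compressed encoding")


def decode_complen(data, pos=0):
    """Decode a compressed length; return (value, position after it)."""
    first = data[pos]
    if (first & 0x80) == 0:
        return first, pos + 1
    if (first & 0xC0) == 0x80:
        return ((first & 0x3F) << 8) | data[pos + 1], pos + 2
    if (first & 0xE0) == 0xC0:
        value = ((first & 0x1F) << 24) | (data[pos + 1] << 16) \
            | (data[pos + 2] << 8) | data[pos + 3]
        return value, pos + 4
    if first == 0xE0:                      # Portable.NET extension
        return int.from_bytes(data[pos + 1:pos + 5], "big"), pos + 5
    raise ValueError("unrecognised compressed length prefix")
```

Note that the multi-byte forms store the remaining bytes most significant
first, unlike most other values in the format.<p>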

This document does not yet have details of the flag bits and identifiers
used to represent attributes for types, assemblies, fields, methods, etc.
See Microsoft's "<code>corhdr.h</code>" file, or Portable.NET's
"<code>il_meta.h</code>" file for that information.<p>

<h2>2. Locating the metadata section</h2>

The metadata section is referenced by the IL runtime header, starting
at offset 8:<p>

<table border=1>
<tr><td>UINT32</td><td>Relative virtual address (RVA) of the beginning
of the metadata section.</td></tr>
<tr><td>UINT32</td><td>Size of the metadata section in bytes.</td></tr>
</table>
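<p>A minimal Python sketch of reading these two fields follows.  It assumes
that the bytes of the IL runtime header have already been located and read
into memory; the function name is hypothetical.  Converting the RVA into a
file offset requires the PE section table, which is not shown here.<p>

```python
import struct


def metadata_location(il_header):
    """Return (rva, size) of the metadata section.

    il_header holds the raw bytes of the IL runtime header; the two
    little-endian UINT32 fields start at offset 8.
    """
    rva, size = struct.unpack_from("<II", il_header, 8)
    return rva, size
```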

<h2>3. Metadata section header</h2>

The metadata section begins with a "<code>COM+</code>" header.  Don't be
fooled though.  It has absolutely nothing to do with COM.  It is used solely
as a container format for other blob-oriented data.  The "<code>COM+</code>"
header begins with the following 12 bytes:<p>

<table border=1>
<tr><td>CHAR[4]</td><td>The string "<code>COM+</code>", without a NUL
terminator.</td></tr>
<tr><td>UINT32</td><td>The value 1.  May be a version number.  (?)</td></tr>
<tr><td>UINT16</td><td>The value 0.  Don't know what this is.  May be padding.
(?)</td></tr>
<tr><td>UINT16</td><td>The number of index records that follows
this 12 byte header.</td></tr>
</table><p>

Following the main header are zero or more index records for the various
blobs that are stored in the container:<p>

<table border=1>
<tr><td>UINT32</td><td>Offset from the start of the metadata section to the
beginning of the record's blob data.  Offset 0 corresponds to the position
of the "<code>COM+</code>" string.</td></tr>
<tr><td>UINT32</td><td>Size of the record's blob data</td></tr>
<tr><td>ZSTR</td><td>The name of the blob.</td></tr>
<tr><td>PAD</td><td>Pad to the next 4-byte boundary.</td></tr>
</table><p>

Currently known blob names are "<code>#~</code>", "<code>#Strings</code>",
"<code>#Blob</code>", "<code>#US</code>", and "<code>#GUID</code>",
usually in that order.  The order is probably unimportant.  Extra NUL
characters are inserted after the string to pad the record to a 4-byte
boundary.  The next index record begins after the padding.  If the record
is the last one in the list, then the padding is followed by the data
for the first blob.  Each blob is padded to a 4-byte boundary, and
padding is NOT included in the index record's size field.<p>

The following is an example of a metadata section header:<p>

<pre>
000b25b4: 43 4f 4d 2b 01 00 00 00 00 00 05 00 58 00 00 00  COM+........X...
000b25c4: 0c 07 08 00 23 7e 00 00 64 07 08 00 d0 9a 02 00  ....#~..d.......
000b25d4: 23 53 74 72 69 6e 67 73 00 00 00 00 34 a2 0a 00  #Strings....4...
000b25e4: bc 08 01 00 23 42 6c 6f 62 00 00 00 f0 aa 0b 00  ....#Blob.......
000b25f4: d4 b3 01 00 23 55 53 00 c4 5e 0d 00 10 00 00 00  ....#US..^......
000b2604: 23 47 55 49 44 00 00 00                          #GUID...
</pre><p>
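The header layout can be checked against the example above with a short
Python sketch.  The function and variable names are our own, and the code
assumes that the padding of each index record is measured from the start
of the metadata section, which is consistent with the dump.<p>

```python
import struct


def parse_metadata_header(data):
    """Parse the "COM+" header; return {blob name: (offset, size)}."""
    magic, version, reserved, count = struct.unpack_from("<4sIHH", data, 0)
    if magic != b"COM+":
        raise ValueError("missing COM+ signature")
    pos = 12                                  # index records follow the header
    blobs = {}
    for _ in range(count):
        offset, size = struct.unpack_from("<II", data, pos)
        pos += 8
        end = data.index(b"\x00", pos)        # ZSTR: find the NUL terminator
        name = data[pos:end].decode("ascii")
        pos = (end + 1 + 3) & ~3              # PAD: skip to a 4-byte boundary
        blobs[name] = (offset, size)
    return blobs


# The bytes of the example metadata section header shown above.
EXAMPLE_HEADER = bytes.fromhex(
    "43 4f 4d 2b 01 00 00 00 00 00 05 00 58 00 00 00"
    "0c 07 08 00 23 7e 00 00 64 07 08 00 d0 9a 02 00"
    "23 53 74 72 69 6e 67 73 00 00 00 00 34 a2 0a 00"
    "bc 08 01 00 23 42 6c 6f 62 00 00 00 f0 aa 0b 00"
    "d4 b3 01 00 23 55 53 00 c4 5e 0d 00 10 00 00 00"
    "23 47 55 49 44 00 00 00"
)

for name, (offset, size) in parse_metadata_header(EXAMPLE_HEADER).items():
    print("%-10s offset=0x%06x size=0x%06x" % (name, offset, size))
```

In this example the first blob starts at offset 0x58, which is exactly the
length of the header plus its five index records.<p>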

We now describe the data blobs.  We leave "<code>#~</code>" till later,
even though it is normally the first blob.  This is because it relies
on information in the other blobs, and so it is easier to explain them first.

<h2>4. String pool blob: "#Strings"</h2>

This blob is the simplest to understand.  It contains 8-bit strings that
are referenced by other metadata blobs, particularly "<code>#~</code>".
The blob begins with a NUL byte, followed by zero or more NUL-terminated
strings.  At the end of the blob, extra NUL bytes are inserted to pad
the blob to a 4-byte boundary.<p>

Strings are referenced by their offset from the start of the "#Strings" blob.
The offset zero is used to represent the empty string, as the first byte is
always NUL.  This can sometimes make it difficult to tell if a structure
member is not specified, or if it is specified but set to the empty string.
We assume that empty string values are equivalent to "not specified"
in most cases.<p>
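Resolving a string reference is then a matter of scanning for the NUL
terminator, as in the following Python sketch.  The function name is
hypothetical, and decoding the bytes as UTF-8 is an assumption on our
part; the format only tells us that the strings are 8-bit.<p>

```python
def read_string(strings_blob, offset):
    """Return the string stored at the given offset of "#Strings"."""
    end = strings_blob.index(b"\x00", offset)   # NUL terminator
    # Assumption: the 8-bit strings are UTF-8 encoded.
    return strings_blob[offset:end].decode("utf-8")


# A made-up blob: leading NUL, three strings, and padding to 4 bytes.
SAMPLE_STRINGS = b"\x00<Module>\x00hello.dll\x00mscorlib\x00\x00\x00\x00"
```

With this sample data, offset 0 yields the empty string and offset 1
yields "<code>&lt;Module&gt;</code>".<p>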

Following the initial NUL byte, Microsoft typically inserts four special
strings:<p>

<pre>
Version of the runtime against which the binary is built : 1.0.2204.21
&lt;Module&gt;
filename
mscorlib
</pre>

Here "<code>1.0.2204.21</code>" may be replaced with some other version
number.  The version appears to be for informational purposes only, because
there are other fields elsewhere in an IL binary that indicate the actual
version of IL in use.<p>

The "<code>filename</code>" is the name of the assembly in which the
metadata is found, e.g. "<code>System.IO.dll</code>".  The
"<code>mscorlib</code>" string is typically used in "AssemblyRef" metadata to
indicate the builtin runtime library.  Nearly every example that we have seen
has assembly reference information for "<code>mscorlib</code>".
The one exception is "<code>mscorlib.dll</code>", which does not reference itself.

<h2>5. Signature blob: "#Blob"</h2>

This is actually the only blob that Microsoft seems to have documented
reasonably well in the "Metadata Structures" (PDC name) or
"Metadata API" (Beta 1) document.  It contains signature and other
information which is referenced by the "<code>#~</code>" blob.  Structures
such as "FieldSig", "LocalVarSig", custom attributes, etc, appear here.<p>

The signature blob begins with a zero byte, so that a blob reference
of zero can be used to indicate "no information".<p>

<h2>6. Unicode string pool blob: "#US"</h2>

The "<code>#US</code>" ("user strings") blob contains 16-bit Unicode strings
that are used by the program itself, rather than by the metadata.  The
metadata uses the "<code>#Strings</code>" blob to store its strings.<p>

Each string consists of an encoded length value, followed by that
many bytes.  The length value is encoded as a "COMPLEN" value, and it
counts the bytes of the 16-bit characters plus a single terminating NUL byte.
For example, the string "<code>*.*</code>" is encoded as:<p>

<pre>
07 2a 00 2e 00 2a 00 00
</pre><p>

i.e. a length byte, the 16-bit little-endian representations of the three
characters, and a terminating NUL.  The real length of the string in Unicode
characters is ((len - 1) / 2).<p>

The blob begins with a NUL byte, which allows the zero offset to be used
to indicate "no string".  The empty string is represented by the byte
sequence "<code>01 00</code>".<p>

A Unicode string at offset N within the "<code>#US</code>" blob is
represented by the token "(0x70000000 | N)" within method code.
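<p>The following Python sketch decodes a string from the "<code>#US</code>"
blob using the rules above, and builds the corresponding token.  The function
names are our own, and treating offset 0 as <code>None</code> is simply our
way of modelling "no string".<p>

```python
def _read_complen(data, pos):
    """Decode a COMPLEN value; return (value, position after it)."""
    first = data[pos]
    if (first & 0x80) == 0:
        return first, pos + 1
    if (first & 0xC0) == 0x80:
        return ((first & 0x3F) << 8) | data[pos + 1], pos + 2
    value = ((first & 0x1F) << 24) | (data[pos + 1] << 16) \
        | (data[pos + 2] << 8) | data[pos + 3]
    return value, pos + 4


def read_user_string(us_blob, offset):
    """Return the Unicode string at the given offset of "#US"."""
    if offset == 0:
        return None                        # offset 0 means "no string"
    length, pos = _read_complen(us_blob, offset)
    raw = us_blob[pos:pos + length]
    # Drop the single terminating NUL byte, then decode 16-bit characters.
    return raw[:-1].decode("utf-16-le")


def user_string_token(offset):
    """Return the token that method code uses for the string at offset."""
    return 0x70000000 | offset


# Leading NUL, the "*.*" example from above, then an empty string.
SAMPLE_US = b"\x00" + bytes.fromhex("07 2a 00 2e 00 2a 00 00") + b"\x01\x00"
```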

<h2>7. GUID blob: "#GUID"</h2>

The "<code>#GUID</code>" blob is always 16 bytes in length and indicates
the GUID for the module.  We think.  Microsoft's "ildasm" tool prints
it as "MVID", whatever that is.  It could be a GUID for the assembly,
or the file, or something else.  In any case, there only seems to be
one GUID in the example files that we've seen, even files that contain
multiple classes, so it definitely is not the GUID of a class.

<h2>8. Token information blob: "#~"</h2>

This blob contains the meta information for all of the tokens that are
used by IL instructions and the runtime engine.  This includes type
definitions, methods, fields, assembly information, etc.  Almost any
kind of data can be stored here.  The format is still a little bit of
a mystery, but the following describes what we have learnt so far.<p>

The blob begins with a 24 byte header:<p>

<table border="1">
<tr><td>UINT32</td><td>Usually the value 0, but we've seen examples
with other values.  No idea what this is. (?)</td></tr>
<tr><td>BYTE</td><td>Major version number.  Usually 0.</td></tr>
<tr><td>BYTE</td><td>Minor version number.  Usually 0x14.</td></tr>
<tr><td>BYTE</td><td>Size flags.  The bit 0x01 will be set if the
"<code>#Strings</code>" blob is greater than 65535 bytes in
length.  This tells the loader to use 32-bit STRREF's instead
of 16-bit STRREF's.<p>

The bit 0x02 will be set if the "<code>#GUID</code>" blob is
greater than 65535 bytes in length.  This tells the loader to use
32-bit GUIDREF's instead of 16-bit GUIDREF's.<p>

The bit 0x04 will be set if the "<code>#Blob</code>" blob is
greater than 65535 bytes in length.  This tells the loader to use
32-bit BLOBREF's instead of 16-bit BLOBREF's.</td></tr>
<tr><td>BYTE</td><td>No idea what this is.  Usually 0x10. (?)</td></tr>
<tr><td>UINT64</td><td>64-bit value that indicates which token types
are present.  If bit n is set, then token type n is present.  For example,
0x0000000900001407 indicates that the following types are present: 0,
1, 2, 10, 12, 32, and 35.  These correspond to "Module", "TypeRef",
"TypeDef", "MemberRef", "CustomAttr", "Assembly", and "AssemblyRef".</td></tr>
<tr><td>UINT64</td><td>The value 0x000040003301fa00.  No idea what
this is.  Flags?  Some Microsoft code seems to call this "sorted".
Sort what? (?)</td></tr>
</table><p>

Following the header is a list of token counts for each of the token
types that are present.  Each count is a 32-bit value.  For example,
with the above token bits, the header might be followed by "1, 2, 1, 2,
2, 1, 1", which indicates 1 Module, 2 TypeRef's, 1 TypeDef, 2 MemberRef's,
2 CustomAttr's, 1 Assembly, and 1 AssemblyRef.<p>
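A Python sketch of reading the header and the token counts is shown below.
The function name and the returned tuple are hypothetical, and the code
assumes that the counts are stored in ascending order of token type, which
matches the example above but has not been confirmed.<p>

```python
import struct


def parse_token_header(blob):
    """Parse the "#~" header.

    Returns (size_flags, {token type: count}, offset of the first record).
    """
    (unknown1, major, minor, size_flags, unknown2,
     present, sorted_bits) = struct.unpack_from("<IBBBBQQ", blob, 0)
    pos = 24                                   # counts follow the header
    counts = {}
    for token_type in range(64):
        if present & (1 << token_type):        # bit n set: type n is present
            (counts[token_type],) = struct.unpack_from("<I", blob, pos)
            pos += 4
    return size_flags, counts, pos


# A header built from the example values above, followed by the counts.
SAMPLE_TOKEN_HEADER = struct.pack(
    "<IBBBBQQ", 0, 0, 0x14, 0, 0x10,
    0x0000000900001407, 0x000040003301FA00,
) + struct.pack("<7I", 1, 2, 1, 2, 2, 1, 1)
```

For the sample data, the counts come back keyed by token types 0, 1, 2, 10,
12, 32, and 35, and the first token record starts at offset 52.<p>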

The header and the token counts can be used to allocate memory for the
token table that is used by the IL runtime to represent metadata tokens.<p>

After the token counts are the records for each of the tokens.  There
doesn't appear to be any fixed format for these records.  Each token
type has its own representation, and we have only managed to decode
some of them so far.<p>

Token records vary in length, because some of their fields
may be either 16-bit or 32-bit.  The names STRREF, BLOBREF, and GUIDREF
are used to indicate these variable-length types.  The precise size of
these types can be determined from the flags in the header, or from
the size of the blobs themselves.<p>
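The size flags translate into field widths as in the following Python
sketch.  The helper names are our own; <code>read_ref</code> is a
hypothetical convenience for reading a field of either width.<p>

```python
def heap_ref_sizes(size_flags):
    """Return the width in bytes of each blob reference type."""
    return {
        "STRREF": 4 if size_flags & 0x01 else 2,    # "#Strings" over 65535 bytes
        "GUIDREF": 4 if size_flags & 0x02 else 2,   # "#GUID" over 65535 bytes
        "BLOBREF": 4 if size_flags & 0x04 else 2,   # "#Blob" over 65535 bytes
    }


def read_ref(record, pos, width):
    """Read a little-endian reference; return (value, next position)."""
    return int.from_bytes(record[pos:pos + width], "little"), pos + width
```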

<h3>8.1.  Module record (0x00000000)</h3>

<table border=1>
<tr><td>STRREF</td><td>Offset of the string that represents the
"<code>filename</code>" for the module.  This is normally the third
special string in the "<code>#Strings</code>" blob.</td></tr>
<tr><td>UINT16</td><td>Normally the value 0.  Edit and
continue generation count.</td></tr>
<tr><td>GUIDREF</td><td>Index into the "<code>#GUID</code>" blob
for the GUID of the module.  1 indicates the first entry, 2 the
second, etc.  Each entry is 16 bytes in length.  A value of 0
indicates that no GUID has been specified.</td></tr>
</table><p>
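As a sketch, a Module record and its GUID could be read in Python as
follows.  The function names are hypothetical, the field widths must be
supplied by the caller, and interpreting the 16 bytes in the usual Windows
GUID byte order is an assumption.<p>

```python
import uuid


def parse_module_record(record, str_width, guid_width):
    """Return (name offset, generation count, GUID index)."""
    pos = 0
    name_offset = int.from_bytes(record[pos:pos + str_width], "little")
    pos += str_width
    generation = int.from_bytes(record[pos:pos + 2], "little")
    pos += 2
    guid_index = int.from_bytes(record[pos:pos + guid_width], "little")
    return name_offset, generation, guid_index


def lookup_guid(guid_blob, index):
    """Return the GUID for a 1-based GUIDREF, or None for index 0."""
    if index == 0:
        return None                        # no GUID specified
    start = (index - 1) * 16               # each entry is 16 bytes long
    # Assumption: the bytes use the Windows (mixed-endian) GUID layout.
    return uuid.UUID(bytes_le=guid_blob[start:start + 16])
```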

If the "<code>.module <i>name</i></code>" assembler directive is
supplied, then "<code><i>name</i></code>" overrides the filename
of the module.  In the examples we have examined, the filename is
still present in the string table, even though it isn't used.
This may just be an oversight in existing code generators, but it
is probably safest to always include it.<p>

This type of record is not used for "<code>.module extern</code>"
directives.  See the description of the "ModuleRef record"
below for the encoding to use in that case.<p>

<!--

BETA2:

<table border=1>
<tr><td>UINT16</td><td>Edit and continue generation count.</td></tr>
<tr><td>STRREF</td><td>Name of the module.</td></tr>
<tr><td>GUIDREF</td><td>First GUID.</td></tr>
<tr><td>GUIDREF</td><td>Second GUID.</td></tr>
<tr><td>GUIDREF</td><td>Third GUID.</td></tr>
</table>

-->

<h3>8.2.  TypeRef record (0x01000000)</h3>

<table border=1>
<tr><td>MIXEDREF_2</td><td>Name of the scope in which the type can
be found.</td></tr>
<tr><td>STRREF</td><td>Name of the type.</td></tr>
<tr><td>STRREF</td><td>Namespace identifier.</td></tr>
</table><p>

For example, a type with the name
"<code>System.Diagnostics.DebuggableAttribute</code>"
would encode offsets for "<code>DebuggableAttribute</code>" and
"<code>System.Diagnostics</code>" within the record.<p>

The scope's MIXEDREF_2 type uses the following type indicators:<p>

<table border=1>
<tr><td>0</td><td>Module scope: the token indexes the Module table.</td></tr>
<tr><td>1</td><td>External module scope: the token indexes the ModuleRef
table.</td></tr>
<tr><td>2</td><td>Assembly scope: the token indexes the AssemblyRef
table.</td></tr>
<tr><td>3</td><td>Nested scope: the token indexes the TypeRef
table, and indicates the name of the class in which this type
is nested.</td></tr>
</table><p>

We've seen at least one example in the PDC release (<code>wmiclient.dll</code>)
that uses a scope value of 0x0003, which refers to TypeRef 0, which isn't
legal.  We are not sure what it means.  (?)  For now, we have assumed that
it means the same as "unknown scope", or was simply a bug in PDC.

<h3>8.3.  TypeDef record (0x02000000)</h3>

<table border=1>
<tr><td>UINT32</td><td>Type definition flags (e.g. public, private,
abstract, etc).</td></tr>
<tr><td>UINT32</td><td>No idea what this is.  Usually 0.  (?)</td></tr>
<tr><td>UINT32</td><td>No idea what this is.  Usually 0.  (?)</td></tr>
<tr><td>STRREF</td><td>Offset of the type name within the
"<code>#Strings</code>" blob.</td></tr>
<tr><td>STRREF</td><td>Offset of the namespace name within the
"<code>#Strings</code>" blob.</td></tr>
<tr><td>UINT16</td><td>No idea what this is.  Usually 0.  (?)</td></tr>
<tr><td>MIXEDREF_1</td><td>Parent type, which may either be a TypeDef (0)
or a TypeRef (1).</td></tr>
<tr><td>TKREF_04</td><td>Index of the first field within the TypeDef.  The
index of the first field in the FieldDef table is 1.  The number of
fields in the type is determined by subtracting this value from the
corresponding value in the next type.  If this is the last type in
the file, then use all fields from this point until the end of the
FieldDef table.</td></tr>
<tr><td>TKREF_06</td><td>Index of the first method within the TypeDef.  The
index of the first method in the MethodDef table is 1.  The number of
methods in the type is determined by subtracting this value from the
corresponding value in the next type.  If this is the last type in
the file, then use all methods from this point until the end of the
MethodDef table.</td></tr>
</table><p>
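The rule for finding the fields or methods that belong to each type can be
sketched in Python as follows.  The function name is our own; the same
calculation applies to both the FieldDef and MethodDef indexes.<p>

```python
def member_ranges(first_indexes, table_rows):
    """Return the range of 1-based member indexes owned by each TypeDef.

    first_indexes[i] is the "first field" (or "first method") value of
    TypeDef i, and table_rows is the number of rows in the FieldDef
    (or MethodDef) table.
    """
    ranges = []
    for i, first in enumerate(first_indexes):
        if i + 1 < len(first_indexes):
            end = first_indexes[i + 1]     # stop where the next type starts
        else:
            end = table_rows + 1           # last type: run to end of table
        ranges.append(range(first, end))
    return ranges
```

For example, with first-field values of 1, 1, and 4 and a FieldDef table
of 6 rows, the first type has no fields, the second owns fields 1 to 3, and
the third owns fields 4 to 6.<p>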

The first TypeDef is typically "<code>.&lt;Module&gt;</code>".
i.e. the namespace name is the empty string (offset 0), and the type
name is "<code>&lt;Module&gt;</code>", which is normally the second
special string in the string table.  All global methods and fields
are attached to this TypeDef.<p>

Information about the interfaces that a TypeDef implements is contained
in the "InterfaceImpl" records.  Information about the properties within
a TypeDef is contained in "PropertyAssociation" records.<p>

<!--

BETA2:

In Beta 2, the "No idea" fields are removed, and the MIXEDREF_1 is actually
a MIXEDREF_2.  0 = TypeDef, 1 = TypeRef, 2 = TypeSpec.

-->

<h3>8.4.  FieldPtr record (0x03000000)</h3>

<table border=1>
<tr><td>TKREF_04</td><td>Index of the referenced field.</td></tr>
</table>

<h3>8.5.  FieldDef record (0x04000000)</h3>

<table border=1>
<tr><td>UINT16</td><td>Field definition flags (e.g. public,
static, etc).</td></tr>
<tr><td>STRREF</td><td>Offset of the name of the field within the
"<code>#Strings</code>" blob.</td></tr>
<tr><td>BLOBREF</td><td>Offset of the signature definition in the
"<code>#Blob</code>" blob.</td></tr>
</table>

<h3>8.6.  MethodPtr record (0x05000000)</h3>

<table border=1>
<tr><td>TKREF_06</td><td>Index of the referenced method.</td></tr>
</table>

<h3>8.7.  MethodDef record (0x06000000)</h3>

<table border=1>
<tr><td>UINT32</td><td>RVA of the beginning of the method's code.</td></tr>
<tr><td>UINT16</td><td>Implementation attributes.  (e.g. il, native,
synchronized, etc).</td></tr>
<tr><td>UINT16</td><td>Method definition flags (e.g. public,
static, etc).</td></tr>
<tr><td>STRREF</td><td>Name of the method.</td></tr>
<tr><td>BLOBREF</td><td>Offset of the signature definition in the
"<code>#Blob</code>" blob.</td></tr>
<tr><td>TKREF_08</td><td>Index into the ParamDef table for the
parameters supplied to this method.  The number 1 indicates the
first ParamDef.  If the method has multiple parameters, as
indicated by the signature, then there will be N consecutive
ParamDef's starting at this index in the table.  N is the number
of parameters in the signature.</td></tr>
</table>
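<p>To show how the fixed and variable-width fields combine, here is a Python
sketch that reads one MethodDef record.  The function name and dictionary
keys are hypothetical, and the three reference widths are passed in by the
caller because they depend on the blob and table sizes.<p>

```python
import struct


def parse_methoddef(record, str_width, blob_width, param_width):
    """Parse one MethodDef record; return (fields, record length)."""
    rva, impl_attrs, flags = struct.unpack_from("<IHH", record, 0)
    pos = 8
    refs = []
    for width in (str_width, blob_width, param_width):
        refs.append(int.from_bytes(record[pos:pos + width], "little"))
        pos += width
    fields = {
        "rva": rva,                  # RVA of the method's code
        "impl_attrs": impl_attrs,    # implementation attributes
        "flags": flags,              # method definition flags
        "name": refs[0],             # STRREF into "#Strings"
        "signature": refs[1],        # BLOBREF into "#Blob"
        "first_param": refs[2],      # TKREF_08: first ParamDef index
    }
    return fields, pos
```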

<h3>8.8.  ParamPtr record (0x07000000)</h3>

<table border=1>
<tr><td>TKREF_08</td><td>Index of the referenced parameter.</td></tr>
</table>

<h3>8.9.  ParamDef record (0x08000000)</h3>

<table border=1>
<tr><td>UINT16</td><td>Parameter attributes.  (e.g. in, out, lcid,
etc)</td></tr>
<tr><td>UINT16</td><td>Parameter number within the referring method.
1 indicates the first parameter.  The value 0 indicates the return
value, if there is a parameter definition for it.</td></tr>
<tr><td>STRREF</td><td>Name of the parameter.</td></tr>
</table><p>

It is not currently known what happens when the number of parameters
to a method is 65535 or higher.  (?)  Microsoft's version of "ilasm" seems
to choke on a method with that many parameters.  However, since it
is extremely unlikely that methods will have such a signature, it
shouldn't be a problem in practice.  For safety, loaders should
probably have a maximum limit on the number of parameters that a
method is permitted to have.

<h3>8.10.  InterfaceImpl record (0x09000000)</h3>

<table border=1>
<tr><td>TKREF_02</td><td>Index of the TypeDef to which this interface
implementation record applies.  The number 1 indicates the first TypeDef.
</td></tr>
<tr><td>MIXEDREF_1</td><td>Reference to the TypeDef (0) or TypeRef (1)
for the name of the interface which the TypeDef is implementing.</td></tr>
</table>

<!--

BETA2:

In Beta 2, the MIXEDREF_1 is replaced with a MIXEDREF_2.  TypeDef = 0,
TypeRef = 1, TypeSpec = 2.

-->

<h3>8.11.  MemberRef record (0x0A000000)</h3>

<table border=1>
<tr><td>MIXEDREF_3</td><td>Parent resolution scope for the member that
is being referenced.</td></tr>
<tr><td>STRREF</td><td>Name of the member that is being referenced.</td></tr>
<tr><td>BLOBREF</td><td>Signature of the member that is being
referenced.</td></tr>
</table><p>

The parent resolution scope uses the following type indicators:<p>

<table border=1>
<tr><td>0</td><td>MemberRef</td></tr>
<tr><td>1</td><td>TypeRef</td></tr>
<tr><td>2</td><td>ModuleRef</td></tr>
<tr><td>3</td><td>MethodDef for a <code>vararg</code> method in
the same module.</td></tr>
<tr><td>4</td><td>TypeSpec for a constructed type. (e.g. an array
type)</td></tr>
</table><p>

<h3>8.12.  FieldInit record (0x0B000000)</h3>

<table border=1>
<tr><td>UINT16</td><td>Element type for the field.</td></tr>
<tr><td>MIXEDREF_2</td><td>Token reference for the field to which
this initialization record applies.</td></tr>
<tr><td>BLOBREF</td><td>Offset of the initialization data within
the "<code>#Blob</code>" blob.</td></tr>
</table><p>

<!--

BETA2:

The first field may be a BYTE in Beta 2.

-->

The token reference uses the following type indicators:<p>

<table border=1>
<tr><td>0</td><td>FieldDef</td></tr>
<tr><td>1</td><td>ParamDef</td></tr>
<tr><td>2</td><td>Property</td></tr>
</table><p>

<h3>8.13.  CustomAttr record (0x0C000000)</h3>

<table border=1>
<tr><td>MIXEDREF_5</td><td>Token reference for the token that this
custom attribute is attached to.  i.e. its owner.</td></tr>
<tr><td>MIXEDREF_3</td><td>Token reference for the name of the
attribute.</td></tr>
<tr><td>BLOBREF</td><td>Offset of the attribute's value, or zero
if there is no value.</td></tr>
</table><p>

The owner token reference uses the following type indicators:<p>

<table border=1>
<tr><td>0</td><td>MethodDef</td></tr>
<tr><td>1</td><td>FieldDef</td></tr>
<tr><td>2</td><td>TypeRef</td></tr>
<tr><td>3</td><td>TypeDef</td></tr>
<tr><td>4</td><td>ParamDef</td></tr>
<tr><td>5</td><td>InterfaceImpl</td></tr>
<tr><td>6</td><td>MemberRef</td></tr>
<tr><td>7</td><td>ModuleDef</td></tr>
<tr><td>8</td><td>Permission</td></tr>
<tr><td>9</td><td>Property</td></tr>
<tr><td>10</td><td>Event</td></tr>
<tr><td>11</td><td>Signature</td></tr>
<tr><td>12</td><td>ModuleRef</td></tr>
<tr><td>13</td><td>TypeSpec</td></tr>
<tr><td>14</td><td>Assembly</td></tr>
<tr><td>15</td><td>AssemblyRef</td></tr>
<tr><td>16</td><td>File</td></tr>
<tr><td>17</td><td>ComType</td></tr>
<tr><td>24</td><td>ManifestResource</td></tr>
</table><p>

<!--

BETA2:

Beta 2 uses 18 for ManifestResource.

-->

The name token reference uses the following type indicators:<p>

<table border=1>
<tr><td>0</td><td>TypeRef</td></tr>
<tr><td>1</td><td>TypeDef</td></tr>
<tr><td>2</td><td>MethodDef</td></tr>
<tr><td>3</td><td>MemberRef</td></tr>
<tr><td>4</td><td>String</td></tr>
</table>

<h3>8.14.  MarshalDef record (0x0D000000)</h3>

<table border=1>
<tr><td>MIXEDREF_1</td><td>Token reference for the item to which
this marshal definition applies.  Field = 0, ParamDef = 1.</td></tr>
<tr><td>BLOBREF</td><td>Native type that describes how to marshal.</td></tr>
</table><p>

<h3>8.15.  Permission record (0x0E000000)</h3>

<table border=1>
<tr><td>UINT16</td><td>Permission type.  (e.g. request, deny,
reqmin, etc)</td></tr>
<tr><td>MIXEDREF_2</td><td>Token to which the permission has been
attached.</td></tr>
<tr><td>BLOBREF</td><td>Serialized data for the permission
information.</td></tr>
</table><p>

The token reference in the second field uses the following type indicators:<p>

<table border=1>
<tr><td>0</td><td>TypeDef</td></tr>
<tr><td>1</td><td>MethodDef</td></tr>
<tr><td>2</td><td>Assembly</td></tr>
</table>

<h3>8.16.  LayoutDef record (0x0F000000)</h3>

<table border=1>
<tr><td>UINT16</td><td>Data packing for the object's representation,
which should be one of 1, 2, 4, 8, or 16.  [<code>.pack</code>]</td></tr>
<tr><td>UINT32</td><td>Size of the object's representation.
[<code>.size</code>]</td></tr>
<tr><td>TKREF_02</td><td>TypeDef for which this layout definition
applies.  The number 1 indicates the first TypeDef.</td></tr>
<tr><td>UINT16</td><td>No idea what this is.  Usually zero.  (?)</td></tr>
</table><p>

<!--

BETA2:

The last field is missing in Beta 2.

-->

<h3>8.17.  FieldOffset record (0x10000000)</h3>

<table border=1>
<tr><td>UINT32</td><td>Offset of the field.</td></tr>
<tr><td>TKREF_04</td><td>Field definition to which this record
applies.</td></tr>
</table><p>

This record is emitted when the assembler directive "<code>.field [N]
Type Id</code>" is used, where "<code>N</code>" is the offset.
A LayoutDef record for the field will normally be output also.

<h3>8.18.  Signature record (0x11000000)</h3>

<table border=1>
<tr><td>BLOBREF</td><td>Offset of the signature definition</td></tr>
</table>

<h3>8.19.  EventAssociation record (0x12000000)</h3>

<table border=1>
<tr><td>TKREF_02</td><td>Index of the TypeDef that owns the events
covered by this association.</td></tr>
<tr><td>TKREF_14</td><td>Index of the first event in the Event table
that is covered by this association.  The number of events is
determined by subtracting this value from the index of the first event
in the next EventAssociation record.  If this is the last record in
the file, then the events start at this index and continue to the end of
the Event table.</td></tr>
</table><p>

This type of record is used to associate event definitions with
their owning types.

<h3>8.20.  EventPtr record (0x13000000)</h3>

<table border=1>
<tr><td>TKREF_14</td><td>Index of the referenced event</td></tr>
</table>

<h3>8.21.  Event record (0x14000000)</h3>

<table border=1>
<tr><td>UINT16</td><td>Event attributes.  (e.g. specialname,
rtspecialname)</td></tr>
<tr><td>STRREF</td><td>Name of the event.</td></tr>
<tr><td>MIXEDREF_1</td><td>Index of the TypeDef (0) or TypeRef (1) for
the event's type, or zero if the event does not have a type.  Note: this
is the type of the event itself, not the type in which it is contained.  The
type in which it is contained is defined by an EventAssociation
record.</td></tr>
</table>

<h3>8.22.  PropertyAssociation record (0x15000000)</h3>

<table border=1>
<tr><td>TKREF_02</td><td>Index of the TypeDef that owns the properties
covered by this association.</td></tr>
<tr><td>TKREF_17</td><td>Index of the first property in the Property table
that is covered by this association.  The number of properties is
determined by subtracting this value from the index of the first property
in the next PropertyAssociation record.  If this is the last record in
the file, then the properties start at this index and continue to the end of
the Property table.</td></tr>
</table><p>

This type of record is used to associate property definitions with
their owning types.

<h3>8.23.  PropertyPtr record (0x16000000)</h3>

<table border=1>
<tr><td>TKREF_17</td><td>Index of the referenced property</td></tr>
</table>

<h3>8.24.  Property record (0x17000000)</h3>

<table border=1>
<tr><td>UINT16</td><td>Property attributes.  (e.g. specialname,
rtspecialname).</td></tr>
<tr><td>STRREF</td><td>Name of the property.</td></tr>
<tr><td>BLOBREF</td><td>Signature for the property.</td></tr>
<tr><td>UINT16</td><td>No idea what this is.  Usually 0.  (?)</td></tr>
</table><p>

<!--

BETA2:

The last field is missing in Beta 2.

-->

<h3>8.25.  MethodAssociation record (0x18000000)</h3>

<table border=1>
<tr><td>UINT16</td><td>Method semantics.  (e.g. getter, setter, addon,
etc)</td></tr>
<tr><td>TKREF_06</td><td>Index of the method that implements the requested
property or event association.</td></tr>
<tr><td>MIXEDREF_1</td><td>Reference to the Event (0) or Property (1)
that owns this method association.</td></tr>
</table><p>

This type of record is used to associate methods with either property
get and set actions or with event actions.

<h3>8.26.  MethodImpl record (0x19000000)</h3>

<table border=1>
<tr><td>TKREF_02</td><td>Type</td></tr>
<tr><td>MIXEDREF_1</td><td>MethodDef = 0, MemberRef = 1</td></tr>
<tr><td>MIXEDREF_1</td><td>MethodDef = 0, MemberRef = 1</td></tr>
</table>

<h3>8.27.  ModuleRef record (0x1A000000)</h3>

<table border=1>
<tr><td>STRREF</td><td>Name of the referenced module.</td></tr>
</table><p>

The referenced module record is produced using
the "<code>.module extern</code>" assembler directive.

<h3>8.28.  TypeSpec record (0x1B000000)</h3>

<table border=1>
<tr><td>BLOBREF</td><td>Signature that represents the
type specification.</td></tr>
</table>

<h3>8.29.  PInvoke record (0x1C000000)</h3>

<table border=1>
<tr><td>UINT16</td><td>PInvoke attributes.  (e.g. ansi, winapi,
ole, etc)</td></tr>
<tr><td>MIXEDREF_1</td><td>Token index for the item that this PInvoke
record is attached to.  FieldDef = 0, MethodDef = 1.</td></tr>
<tr><td>STRREF</td><td>Alias name for the method.</td></tr>
<tr><td>TKREF_1A</td><td>Index into the ModuleRef table for the
name of the DLL that contains the platform implementation.</td></tr>
</table><p>

PInvoke records are produced using the "<code>pinvokeimpl(<i>dllname</i>
[as <i>alias</i>] <i>attrs</i>)</code>" specification on a method.
If the alias is not supplied, then the third field will be set to the
method name.<p>

<h3>8.30.  Data record (0x1D000000)</h3>

<table border=1>
<tr><td>UINT32</td><td>RVA of the data within either the "<code>.sdata</code>"
or "<code>.tls</code>" sections of the PE file.</td></tr>
<tr><td>TKREF_04</td><td>Field definition to which this record
applies.</td></tr>
</table><p>

This type of record is emitted when the "<code>at</code>" keyword is used
with the "<code>.field</code>" assembler directive.

<h3>8.31.  EncLog record (0x1E000000)</h3>

<table border=1>
<tr><td>UINT32</td><td>No idea what this is.  (?)</td></tr>
<tr><td>UINT32</td><td>No idea what this is.  (?)</td></tr>
</table>

<h3>8.32.  EncAssociation record (0x1F000000)</h3>

<table border=1>
<tr><td>UINT32</td><td>No idea what this is.  (?)</td></tr>
</table>

<h3>8.33.  Assembly record (0x20000000)</h3>

In the table below, the ilasm assembly directive that produces the
information is listed in square brackets:<p>

<table border=1>
<tr><td>UINT32</td><td>Hash algorithm identifier.
[<code>.hash algorithm</code>]</td></tr>
<tr><td>UINT16</td><td>First word of the assembly version, or
zero if not specified.  [<code>.ver</code>]</td></tr>
<tr><td>UINT16</td><td>Second word of the assembly version, or
zero if not specified.  [<code>.ver</code>]</td></tr>
<tr><td>UINT16</td><td>Third word of the assembly version, or
zero if not specified.  [<code>.ver</code>]</td></tr>
<tr><td>UINT16</td><td>Fourth word of the assembly version, or
zero if not specified.  [<code>.ver</code>]</td></tr>
<tr><td>UINT32</td><td>Assembly attribute flags.
e.g. "<code>implicitcom</code>", "<code>noappdomain</code>", etc.
[<code>.assembly</code>]</td></tr>
<tr><td>BLOBREF</td><td>Originator public key, or zero if not specified.
[<code>.originator</code>]<br>The public key is encoded in the
signature blob as a COMPLEN value followed by the bytes of the key.</td></tr>
<tr><td>STRREF</td><td>Name of the assembly.
[<code>.assembly</code>]</td></tr>
<tr><td>STRREF</td><td>Name of the locale, or zero if not specified.
[<code>.locale</code>]</td></tr>
<tr><td>STRREF</td><td>Configuration name.  [<code>.config</code>]</td></tr>
<tr><td>STRREF</td><td>Title of the assembly, or zero if not specified.
[<code>.title</code>]</td></tr>
<tr><td>STRREF</td><td>Description of the assembly, or zero
if not specified.  [<code>.description</code>]</td></tr>
<tr><td>STRREF</td><td>Alternative name for the assembly.
[<code>.assembly name as "altname"</code>]</td></tr>
</table>
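
<p>
The first six fields of the record have fixed widths, so they can be
decoded without knowing how wide the BLOBREF and STRREF fields are.
The following Python sketch shows one way to do this.  It assumes
little-endian byte order; the function name
<code>parse_assembly_prefix</code> and the sample bytes are invented
for illustration and do not come from a real assembly.<p>

<pre>
```python
def parse_assembly_prefix(record):
    """Decode the fixed-width fields at the start of an Assembly record.

    Assumption: all integers are stored little-endian.
    """
    fixed = record[:16]
    if len(fixed) != 16:
        raise ValueError("record is too short for the fixed-width fields")

    def uint(offset, size):
        return int.from_bytes(fixed[offset:offset + size], "little")

    return {
        "hash_algorithm": uint(0, 4),                            # UINT32
        "version": (uint(4, 2), uint(6, 2), uint(8, 2), uint(10, 2)),  # 4 x UINT16
        "flags": uint(12, 4),                                    # UINT32
    }


# Hypothetical record prefix: hash algorithm 0x8004, version 1:2:3:4, no flags.
sample = bytes.fromhex("04800000" "0100" "0200" "0300" "0400" "00000000")
info = parse_assembly_prefix(sample)
print(".ver %d:%d:%d:%d" % info["version"])
```
</pre>

The remaining fields would have to be read with whatever widths the
blob and string pools dictate for the file in question.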

<!--

BETA2:

The last four fields are missing in Beta 2.

-->

<h3>8.34.  ProcessorDef record (0x21000000)</h3>

<table border=1>
<tr><td>UINT32</td><td>Processor number.</td></tr>
</table><p>

The processor record is produced using the "<code>.processor</code>"
assembler directive within an assembly definition.

<h3>8.35.  OSDef record (0x22000000)</h3>

<table border=1>
<tr><td>UINT32</td><td>Operating system identifier.</td></tr>
<tr><td>UINT32</td><td>Operating system major version.</td></tr>
<tr><td>UINT32</td><td>Operating system minor version.</td></tr>
</table><p>

The operating system definition record is produced using the
"<code>.os</code>" assembler directive within an assembly definition.

<h3>8.36.  AssemblyRef record (0x23000000)</h3>

<table border=1>
<tr><td>UINT16</td><td>First word of the assembly reference version, or
zero if not specified.  [<code>.ver</code>]</td></tr>
<tr><td>UINT16</td><td>Second word of the assembly reference version, or
zero if not specified.  [<code>.ver</code>]</td></tr>
<tr><td>UINT16</td><td>Third word of the assembly reference version, or
zero if not specified.  [<code>.ver</code>]</td></tr>
<tr><td>UINT16</td><td>Fourth word of the assembly reference version, or
zero if not specified.  [<code>.ver</code>]</td></tr>
<tr><td>UINT32</td><td>Assembly reference attributes.  e.g.
"<code>fullorigin</code>".  [<code>.assembly extern</code>]</td></tr>
<tr><td>BLOBREF</td><td>Originator public key, or zero if not
specified.  [<code>.originator</code>]</td></tr>
<tr><td>STRREF</td><td>Name of the referenced assembly.
[<code>.assembly extern</code>]</td></tr>
<tr><td>STRREF</td><td>Locale name for the referenced assembly, or
zero if not specified.  [<code>.locale</code>]</td></tr>
<tr><td>STRREF</td><td>Configuration name.  [<code>.config</code>]</td></tr>
<tr><td>BLOBREF</td><td>Hash value for the referenced assembly.
[<code>.hash</code>]</td></tr>
<tr><td>UINT16</td><td>No idea what this is.  Usually zero.  (?)</td></tr>
</table><p>

<!--

BETA2:

The third-last and last fields are missing in Beta 2.

-->

The AssemblyRef record is missing an entry for the "alternative name",
even though the assembler syntax does support specifying such names.
The Beta 1 version of Microsoft's assembler appears to ignore alternative
names if they are supplied.<p>

We would be tempted to assume that the last field is the alternative
name, and that the current assembler has a bug and is not setting it
properly.  However, the last field does not widen when the
"<code>#Strings</code>" blob gets large, so its type does not
appear to be "STRREF".

<h3>8.37.  ProcessorRef record (0x24000000)</h3>

<table border=1>
<tr><td>UINT32</td><td>Processor number.</td></tr>
<tr><td>TKREF_23</td><td>Index of the AssemblyRef record that contains
this ProcessorRef record.  1 indicates the first assembly
reference.</td></tr>
</table><p>

The processor reference record is produced using the
"<code>.processor</code>" assembler directive within an assembly
reference definition.  It is possible to have more than one
ProcessorRef record referring to the same AssemblyRef record.

<h3>8.38.  OSRef record (0x25000000)</h3>

<table border=1>
<tr><td>UINT32</td><td>Operating system identifier.</td></tr>
<tr><td>UINT32</td><td>Operating system major version.</td></tr>
<tr><td>UINT32</td><td>Operating system minor version.</td></tr>
<tr><td>TKREF_23</td><td>Index of the AssemblyRef record that contains
this OSRef record.  1 indicates the first assembly reference.</td></tr>
</table><p>

The operating system reference record is produced using the
"<code>.os</code>" assembler directive within an assembly reference
definition.  It is possible to have more than one OSRef record
referring to the same AssemblyRef record.

<h3>8.39.  File record (0x26000000)</h3>

<table border=1>
<tr><td>UINT32</td><td>File flags.  e.g. "<code>nometadata</code>"
and "<code>readonly</code>".</td></tr>
<tr><td>STRREF</td><td>Filename.</td></tr>
<tr><td>BLOBREF</td><td>The hash value for the file.  The value is
stored in the signature blob as a COMPLEN followed by the bytes
of the hash.</td></tr>
</table><p>

<h3>8.40.  ComType record (0x27000000)</h3>

<table border=1>
<tr><td>UINT32</td><td>Type definition attributes.  This uses the
same values as for TypeDef attributes.  (e.g. public, nested private,
etc)</td></tr>
<tr><td>UINT32</td><td>Specifies an integer class identifier.
[<code>.class</code>]</td></tr>
<tr><td>STRREF</td><td>Type name.</td></tr>
<tr><td>STRREF</td><td>Type namespace.</td></tr>
<tr><td>STRREF</td><td>Name to use to export the type.</td></tr>
<tr><td>MIXEDREF_2</td><td>Token reference.  File = 0, AssemblyRef = 1,
ComType = 2.</td></tr>
<tr><td>UINT16</td><td>(?)</td></tr>
</table>

<!--

BETA2:

This structure is very different for Beta 2.  No real details yet.

-->

<h3>8.41.  ManifestResource record (0x28000000)</h3>

<table border=1>
<tr><td>UINT32</td><td>Byte offset within the file that holds
the resource.  [<code>.file</code>]</td></tr>
<tr><td>UINT32</td><td>Manifest resource attribute flags.  (e.g. public,
private).  [<code>.manifestres</code>]</td></tr>
<tr><td>STRREF</td><td>Name of the locale for the resource.
[<code>.locale</code>]</td></tr>
<tr><td>STRREF</td><td>Name of the resource.
[<code>.manifestres</code>]</td></tr>
<tr><td>STRREF</td><td>Description of the resource.
[<code>.manifestres</code>]</td></tr>
<tr><td>UINT16</td><td>No idea what this is.  Usually 0.  (?)</td></tr>
<tr><td>STRREF</td><td>MIME type for the resource.
[<code>.mime</code>]</td></tr>
</table>

<!--

BETA2:

This structure is very different for Beta 2.  No real details yet.

-->

<h3>8.42.  ExeLocation record (0x29000000)</h3>

<table border=1>
<tr><td>UINT32</td><td>No idea what this is.  Usually zero.  (?)</td></tr>
<tr><td>STRREF</td><td>Name 1.</td></tr>
<tr><td>STRREF</td><td>Name 2.</td></tr>
<tr><td>STRREF</td><td>Name 3.</td></tr>
</table><p>

The executable location record is produced using the "<code>.exeloc</code>"
assembler directive.  The two forms of the directive are:

<blockquote>
<code>.exeloc Name1 ("Name2") at "Name3"</code><br>
<code>.exeloc Name1 at "Name3"</code>
</blockquote>

<h3>8.43.  SourceFile record (0x2A000000)</h3>

<h3>8.44.  (?) record (0x2B000000)</h3>

<h3>8.45.  LocalVarScope record (0x2C000000)</h3>

<h3>8.46.  LocalVar record (0x2D000000)</h3>

<h3>8.47.  NestedClass record (0x2E000000)</h3>

<table border=1>
<tr><td>TKREF_02</td><td>TypeDef for the child that is being nested.</td></tr>
<tr><td>TKREF_02</td><td>TypeDef for the class that the child is
being nested within.</td></tr>
</table>

<h2>9. Owner Names</h2>

Microsoft's tools allow an "owner name" to be placed on a file when
it is assembled with Microsoft's "ilasm" program.  This name must be
supplied to disassemble an IL binary with Microsoft's "ildasm" program.<p>

The algorithm used to encrypt the owner name is extremely weak.
We have dubbed it the "Kid Sister Security Algorithm", or KSSA, because
it would only be effective against one's kid sister, and only if she
knew nothing about computer programming or mathematics:

<ul>
	<li>Assume that the plaintext is in the array <code>p[0..N-1]</code>,
		where <code>N</code> is the length of the plaintext.</li>
	<li>Initialize <code>key</code> to zero.</li>
	<li>For <code>i</code> = <code>0</code> to <code>N-1</code> do:
		<ul>
			<li>Set <code>c[i]</code> to <code>((p[i] + key)
			    &amp; 255)</code>.</li>
			<li>Set <code>key</code> to <code>((key + p[i]) &amp;
			    255)</code>.</li>
		</ul>
	</li>
	<li>The ciphertext is now in <code>c[0..N-1]</code>.</li>
</ul>

The KSSA decryption algorithm is as follows:

<ul>
	<li>Initialize <code>key</code> to zero.</li>
	<li>For <code>i</code> = <code>0</code> to <code>N-1</code> do:
		<ul>
			<li>Set <code>p[i]</code> to <code>((c[i] - key)
			    &amp; 255)</code>.</li>
			<li>Set <code>key</code> to <code>((key + p[i]) &amp;
			    255)</code>.</li>
		</ul>
	</li>
	<li>The plaintext is now in <code>p[0..N-1]</code>.</li>
</ul>
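
<p>
The two loops above can be sketched in Python as follows.  The sketch
works on raw bytes, because the character encoding of the owner name
is not covered here; the function names and the sample owner name are
made up for the example.<p>

<pre>
```python
def kssa_encrypt(plaintext):
    """Encrypt bytes with the running-sum scheme listed above."""
    key = 0
    out = bytearray()
    for p in plaintext:
        out.append((p + key) & 255)   # c[i] = (p[i] + key) & 255
        key = (key + p) & 255         # key accumulates the plaintext bytes
    return bytes(out)


def kssa_decrypt(ciphertext):
    """Invert kssa_encrypt by rebuilding the same running key."""
    key = 0
    out = bytearray()
    for c in ciphertext:
        p = (c - key) & 255           # p[i] = (c[i] - key) & 255
        out.append(p)
        key = (key + p) & 255
    return bytes(out)


owner = b"Example Owner"              # hypothetical owner name
cipher = kssa_encrypt(owner)
assert kssa_decrypt(cipher) == owner  # round trip recovers the plaintext
print(cipher.hex())
```
</pre>

Because the key starts at zero, the first ciphertext byte is always
equal to the first plaintext byte.<p>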

The details of KSSA were discovered in under 20 minutes using a chosen
plaintext attack, and the author is by no means an expert in cryptanalysis.
It provides no real security against either determined or simple adversaries.
It obfuscates the owner name so that it does not appear in plaintext
during casual viewing of a file, and that is all.<p>

The encrypted owner name is placed in a custom attribute called
"<code>Copyrighted.Material.Disassembly.Disabled</code>".  The value
begins with the length of the ciphertext, followed by the ciphertext itself.
The length is encoded as a "COMPLEN" value.  If "<code>/OWNER</code>"
was supplied to "ilasm", but no owner name was specified, then the
custom attribute will be present but won't have a value associated with it.

<h2>10. Resources</h2>

Although the resource section is not strictly part of the metadata,
we have documented it here because it is reasonably simple.
The resource section begins with the following structure:<p>

<table border=1>
<tr><td>UINT32</td><td>Length of the resource data that follows.  This
length includes the next field, but not the length field.</td></tr>
<tr><td>UINT32</td><td>The magic number value 0xBEEFCACE.</td></tr>
</table><p>

When the resources are stored in a separate file outside an assembly,
the length field is omitted.<p>

All strings in the section are encoded as a length value followed by
the bytes of the string.  There is no NUL terminator on the strings.
The strings are assumed to be in the UTF-8 encoding.<p>

The length is encoded as a multi-byte value.  The value is first
split into a list of 7-bit values.  The values are output from the
least significant 7 bits to the most significant.  Each byte except
the last has its high bit set.  Thus, 127 is encoded as 0x7F, and
130 is encoded as 0x82 0x01.<p>
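
The length encoding can be sketched in Python as shown below.  The
function names <code>encode_length</code> and
<code>decode_length</code> are hypothetical; the two worked values
from the paragraph above are used as a check.<p>

<pre>
```python
def encode_length(value):
    """Encode a non-negative integer as 7-bit groups, least significant first."""
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)  # more groups follow: set the high bit
        value >>= 7
    out.append(value)                      # last group: high bit clear
    return bytes(out)


def decode_length(data, pos=0):
    """Decode a length starting at data[pos]; return (value, next position)."""
    value = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:                # high bit clear marks the last byte
            return value, pos


print(encode_length(127).hex())            # 7f
print(encode_length(130).hex())            # 8201
```
</pre>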

Following the header is the name of the class to use to parse the
file.  The usual value is "System.Resources.ResourceReader",
encoded as described above.  This is followed by a sub-header:<p>

<table border=1>
<tr><td>UINT32</td><td>Version number.  The usual value is 4.</td></tr>
<tr><td>UINT32</td><td>The number of strings in the resource
section.</td></tr>
</table><p>

Next, the names of all of the resources are placed in the file.  Each
name consists of a string and a 32-bit offset.  The offset is NOT
aligned on a 32-bit boundary: it is placed immediately after the
string.<p>

The resource names are followed by a 32-bit value indicating the
number of class names, and then the class names themselves.  Usually
the list has a single element, "System.String".<p>

Finally, the string values themselves appear.  Each string value
is referenced by an offset that is measured from the position just
after the "System.String" string described above.<p>

Each string value consists of a 32-bit type value, followed by
the string itself.  The type is usually 0.<p>

Multiple resource files can be embedded in the same resource section.
The length value in the header allows the loader to skip from file
to file.
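<p>

The layout described in this section can be walked with a short
Python sketch.  It assumes little-endian 32-bit values and treats
every value as a string; the names <code>Reader</code>,
<code>parse_resource_file</code>, <code>lp</code> and <code>u32</code>
are invented for the example, and the sample data is built by hand
rather than taken from a real assembly.<p>

<pre>
```python
MAGIC = 0xBEEFCACE


class Reader:
    """Cursor over a byte string holding a resource section."""

    def __init__(self, data, pos=0):
        self.data, self.pos = data, pos

    def uint32(self):
        raw = self.data[self.pos:self.pos + 4]
        if len(raw) != 4:
            raise ValueError("truncated resource section")
        self.pos += 4
        return int.from_bytes(raw, "little")   # assumption: little-endian

    def string(self):
        size = shift = 0
        while True:                            # 7-bit groups, low bits first
            byte = self.data[self.pos]
            self.pos += 1
            size |= (byte & 0x7F) << shift
            shift += 7
            if not byte & 0x80:
                break
        raw = self.data[self.pos:self.pos + size]
        self.pos += size
        return raw.decode("utf-8")


def parse_resource_file(data, pos=0, embedded=True):
    """Return (name -> string value, position of the next resource file).

    The second item is None when the length field is absent.
    """
    r = Reader(data, pos)
    end = None
    if embedded:                               # length is omitted in separate files
        size = r.uint32()
        end = r.pos + size                     # length excludes the length field
    if r.uint32() != MAGIC:
        raise ValueError("bad magic number")
    r.string()                                 # reader class name
    r.uint32()                                 # version, usually 4
    count = r.uint32()
    names = [(r.string(), r.uint32()) for _ in range(count)]
    for _ in range(r.uint32()):                # class names, usually System.String
        r.string()
    base = r.pos                               # offsets are measured from here
    values = {}
    for name, offset in names:
        r.pos = base + offset
        r.uint32()                             # type value, usually 0
        values[name] = r.string()
    return values, end


def u32(value):
    return value.to_bytes(4, "little")


def lp(text):
    """Length-prefixed string; only handles lengths below 128."""
    raw = text.encode("utf-8")
    return bytes([len(raw)]) + raw


body = (u32(MAGIC) + lp("System.Resources.ResourceReader") + u32(4) + u32(1)
        + lp("greeting") + u32(0)
        + u32(1) + lp("System.String")
        + u32(0) + lp("Hello"))
sample = u32(len(body)) + body
values, next_pos = parse_resource_file(sample)
print(values, next_pos == len(sample))
```
</pre>

When several resource files share one section, the returned position
can be passed back in as <code>pos</code> to reach the next file.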

</body>
</html>
