I wrote about version control systems in the last issue of the ASNA Newsletter. The article was inspired in part by a desperate customer who had lost his source code and hoped that we had a magic way to recover it. Alas, we don’t. Thus the article about version control. Astute reader Stanley Marcus pointed out in his comments on that article that I hadn’t mentioned that source can generally be inferred by reading a .NET assembly. I did mention that to the customer at the time, but as you’ll see, it’s not a realistic way to recover source (AVR or otherwise) when the goal is to reconstruct an entire project. Still, Stanley is correct that it can be done. Let’s take a closer look.
Disassemblers and assembly browsers
There is an interesting type of tool that the .NET Framework makes possible because of the way it compiles and executes code. These tools are called decompilers and/or assembly browsers. When .NET source code is compiled, the compiler output isn’t directly executable machine instructions. Rather, an intermediate “assembly” is produced that, at runtime, works with .NET’s Common Language Runtime (CLR) to produce the actual executable machine instructions. These intermediate assemblies have a misleading appearance: given their “EXE” or “DLL” extensions, it would be quite reasonable to assume the files are the actual machine executables. But they are not. They are simply .NET’s intermediate files, and they won’t execute without the CLR present.
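You can check from code whether a given EXE or DLL is a .NET assembly rather than a native executable. The following is a minimal sketch, not a definitive test: the class name `AssemblyProbe` and the method name `IsManagedAssembly` are made up for illustration, and the only framework call it relies on is `AssemblyName.GetAssemblyName`, which throws `BadImageFormatException` when the file is not a valid assembly.

```csharp
using System;
using System.Reflection;

// Hypothetical helper for illustration; not part of AVR or the .NET Framework.
public static class AssemblyProbe
{
    // Returns true when the file at 'path' carries .NET assembly metadata.
    // Other failures (for example, a missing file) still surface as exceptions.
    public static bool IsManagedAssembly(string path)
    {
        try
        {
            // Reads the assembly's identity without loading it for execution.
            AssemblyName.GetAssemblyName(path);
            return true;
        }
        catch (BadImageFormatException)
        {
            // The file exists but is not a valid .NET assembly
            // (for example, a native EXE or an arbitrary data file).
            return false;
        }
    }
}
```

Calling `AssemblyProbe.IsManagedAssembly` with the path to one of your AVR DLLs should return true, while a native Windows executable should return false.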
The immediate output of a .NET compiler (regardless of the language used to create the program) is Microsoft Intermediate Language (MSIL). When you look at MSIL, it resembles a mutant strain of assembler. A utility called ILDASM.EXE (for Intermediate Language Disassembler) is provided with .NET. You can open one of your AVR EXEs or DLLs with ILDASM and see the MSIL produced for that assembly. For example, let’s consider a very simple AVR function. Its job is to report whether there are more records forward from the last record read.
BegFunc HasMoreRowsForward Type(*Boolean)
SetGT CustomerByName Key(CustomerByName_CMName, CustomerByName_CMCustNo)
If we were to examine the compiled version of this routine with ILDASM, the code looks like this:
.method private hidebysig instance bool HasMoreRowsForward() cil managed
// Code size 94 (0x5e)
.locals init ( bool V_0)
IL_0001: ldfld class [ASNA.VisualRPG.Runtime]ASNA...
IL_0007: isinst Acme.Lists.CustomerByName/CustomerByName
IL_000c: ldfld class '!File_Acme.Lists'.CustomerByName_CustomerByName/RCMMastL2/Key Acme.Lists.CustomerByName/CustomerByName::KeyRCMMastL2
IL_0013: ldfld string Acme.Lists.CustomerByName::CustomerByName_CMName
IL_0018: callvirt instance void '!File_Acme.Lists'.CustomerByName_CustomerByName/RCMMastL2/Key::set_CMName(string)
IL_001f: ldfld valuetype [mscorlib]System.Decimal Acme.Lists.CustomerByName::CustomerByName_CMCustNo
IL_0024: callvirt instance void '!File_Acme.Lists'.CustomerByName_CustomerByName/RCMMastL2/Key::set_CMCustNo(valuetype [mscorlib]System.Decimal)
IL_002a: callvirt instance void [ASNA.DataGate.Client]ASNA.DataGate.Client.AdgKeyTable::set_KeyPartCount(int32)
IL_002f: ldstr "*FILE"
IL_0035: ldfld class [ASNA.VisualRPG.Runtime]ASNA.VisualRPG.Runtime.DBFile Acme.Lists.CustomerByName::CustomerByName
IL_003a: isinst Acme.Lists.CustomerByName/CustomerByName
IL_003f: ldfld class '!File_Acme.Lists'.CustomerByName_CustomerByName/RCMMastL2/Key Acme.Lists.CustomerByName/CustomerByName::KeyRCMMastL2
IL_0044: ldc.i4.s 46
IL_0046: callvirt instance void [ASNA.VisualRPG.Runtime]ASNA.VisualRPG.Runtime.DBFile::Seek(string,
IL_004c: ldfld class [ASNA.VisualRPG.Runtime]ASNA.VisualRPG.Runtime.DBFile Acme.Lists.CustomerByName::CustomerByName
IL_0051: callvirt instance bool [ASNA.VisualRPG.Runtime]ASNA.VisualRPG.Runtime.RpgFile::get_IsFound()
IL_0057: br IL_005c
} // end of method CustomerByName::HasMoreRowsForward
The lines of the code above wrap because IL lines are generally very long. I normally try to present code online in a more formatted fashion. In the case of IL, though, I don’t think it matters. Very few of us can make sense of it, formatted well or not! The interesting thing to notice is how much code was generated from our little four-line AVR function. This gives you an appreciation of the abstractions provided by the AVR compiler (which we’ll reconfirm in just a moment).
ILDASM can only show an assembly expressed in Intermediate Language. It isn’t smart enough to walk that IL backwards to produce the AVR that created it. As an aside, it’s interesting to note that for all the gobbledygook the IL rendering seems to present, a gifted IL coder can read that code as well as you or I can read AVR code. I can remember many times in the early days of .NET when one of our R&D members would spin up ILDASM to examine the IL that the AVR compiler produced. Those brainiacs can read, and make sense of, IL in their sleep!
Because ILDASM’s feature set is limited, many alternatives have surfaced over the years. The granddaddy of them all is a product called Reflector. Lutz Roeder wrote Reflector many years ago and it was a free download. Redgate acquired Reflector two or three years ago and despite initial promises that it would continue to be freely distributed, now charges for it. .NET developers didn’t take kindly to this capitalistic move and the result is that there are now several freely available alternatives to Reflector. These include ILSpy and dotPeek, but there are many others.
Here is a screenshot of dotPeek showing the contents of an AVR assembly. Down the left-hand side is a listing of all of the classes and their members in the assembly, and on the right is the C# source code. dotPeek decompiled the underlying IL into this C#.
The full source for the dotPeek-generated C# version of the HasMoreRowsForward() function is shown below. The disassembler included with .NET, ILDASM, isn’t as capable as the free, third-party alternatives and doesn’t produce C# like this. The original four-line AVR function grew to more than four times its original size when expressed in C#. That is because the AVR compiler expands AVR’s RPG-like abstractions into many lower-level calls when it produces the MSIL. The SetGT operation in AVR, which requires only one line of code, needs about five or six lines of C# to do the same thing. So, if you’re desperate and need to get the C# source for the HasMoreRowsForward() function, as shown below, you can indeed do that with a third-party decompiler. But now you have the challenge of translating that C# back into AVR.
private bool HasMoreRowsForward()
DBFile dbFile = this.CustomerByName;
CustomerByName_CustomerByName.RCMMastL2.Key key1 =
string str = this.CustomerByName_CMName;
key1.CMName = str;
Decimal num1 = this.CustomerByName_CMCustNo;
key1.CMCustNo = num1;
int num2 = 2;
key1.KeyPartCount = num2;
string formatName = "*FILE";
CustomerByName_CustomerByName.RCMMastL2.Key key2 =
int readMode = 46;
dbFile.Seek(formatName, (AdgKeyTable) key2, readMode);
Wait! My competitors can read my source?!
Well, sorta. They can’t decompile a .NET assembly to AVR, but they can decompile it to C# pretty easily. This exposure has given rise to a category of products called “obfuscators.” Visual Studio includes the community version of Dotfuscator, a third-party obfuscator. Here is a handy article explaining its use. An obfuscator obscures the intent of your assemblies by removing all meaningful names in the code. Thus, the code can still be decompiled, but it’s much harder to infer meaning from the decompiled result. If you distribute code and want to minimize the possibility of your source being stolen or otherwise used inappropriately, consider running an obfuscator on your assemblies before deployment. Java, and any other language that compiles to an intermediate language, suffers from the same exposure.
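To get a feel for what name removal does to readability, compare the two hand-written classes below. This is only an illustration, not actual Dotfuscator output: the names `Readable`, `ApplyDiscount`, `A`, and `a` are invented, and a real obfuscator may rename differently or apply other transformations as well. Both methods compute exactly the same thing.

```csharp
// Hypothetical "before" code with meaningful names.
public static class Readable
{
    // Returns the order total after subtracting a percentage discount.
    public static decimal ApplyDiscount(decimal orderTotal, decimal discountRate)
    {
        decimal discount = orderTotal * discountRate;
        return orderTotal - discount;
    }
}

// The same logic with every meaningful name replaced by a short,
// meaningless one, roughly what a decompiler might show for a renamed assembly.
public static class A
{
    public static decimal a(decimal b, decimal c)
    {
        decimal d = b * c;
        return b - d;
    }
}
```

The second version still decompiles and still runs, but a reader has to work out from the arithmetic alone that it applies a discount. Multiply that by thousands of members and the stolen source becomes far less useful.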
A good disassembler/assembly browser is a great tool to have in your .NET toolbox. You can use it to confirm version dependencies, prove the visibility of members in a class, examine compiler-generated code, and better understand inheritance hierarchies. I’ve used a disassembler several times to prove that a member isn’t public, contrary to the programmer’s insistence otherwise. As for recovering lost source, though, you can see from the decompiled code above that unless you are very desperate, that source isn’t going to be good for much.
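For the narrow job of settling a visibility argument, a few lines of reflection can answer the question without opening an assembly browser at all. The sketch below makes some assumptions: `Sample` is a hypothetical stand-in for your own class, and `VisibilityProbe.DescribeVisibility` is an illustrative helper, not an ASNA or .NET API. It relies only on `Type.GetMethod` and the visibility properties of `MethodInfo`.

```csharp
using System;
using System.Reflection;

// Hypothetical class standing in for one of your own types.
public class Sample
{
    public bool IsOpen() { return true; }
    private bool HasMoreRowsForward() { return false; }
}

// Illustrative helper; the name is an assumption, not a framework API.
public static class VisibilityProbe
{
    // Reports the declared visibility of a method found by name.
    // Note: GetMethod throws AmbiguousMatchException if the name is overloaded.
    public static string DescribeVisibility(Type type, string methodName)
    {
        const BindingFlags all = BindingFlags.Instance | BindingFlags.Static |
                                 BindingFlags.Public | BindingFlags.NonPublic;
        MethodInfo method = type.GetMethod(methodName, all);
        if (method == null) return "not found";
        if (method.IsPublic) return "public";
        if (method.IsPrivate) return "private";
        if (method.IsFamily) return "protected";
        if (method.IsAssembly) return "internal";
        return "other"; // for example, protected internal
    }
}
```

Here `VisibilityProbe.DescribeVisibility(typeof(Sample), "HasMoreRowsForward")` reports that the method is private. A decompiler remains the better choice when you also want to read the member’s body.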
Stanley suggested in his comments that ASNA should provide a decompiler. The limited appeal of such a feature, as well as the effort it would require (not to mention the contingent liability of it all), conspire to keep that feature off our AVR roadmap. The bottom line for you: please, please back up your code periodically and use a version control system!