Saturday, April 9, 2011
We all say that we want our applications to be "portable", but what do we mean by that? Usually, we're talking about some sort of practical portability: "My code will work anywhere I try to use it." This vague notion of portability can lead to many issues. For example, for someone used to the unixy world, "portable" probably means that it builds and runs on most Linux distributions and *BSDs. For someone in the Windows world, it likely means that the binary they ship will run on all versions of Windows after a certain version, for any hardware that Windows supports (for the moment--this may change with the upcoming ARM support in Windows 8 and NVidia's Project Denver). Why do we have such limited notions of portability? Wouldn't it be ideal if we could ship our program, and have a person using any OS, with any underlying hardware, be able to use it? Past attempts to solve this problem all have various drawbacks. For the purposes of this post, I am defining portability to mean exactly that: a single shipped program that a user on any operating system, with any underlying hardware, can run.
Code Representation
When we send a program to another computer, there are a number of forms it could take. The most common at the moment is machine code packed into an object file to allow linking on the target host. However, this scheme is unable to deal with diverse architectures or operating systems. A number of mechanisms have been explored to deal with this, but they have generally received only niche adoption.
Fat Binaries
This is the name generally given to the scheme wherein one extends the object file format to contain multiple versions of the compiled code, one for each of a number of platforms that the developer wishes to support.
One modern system that used this technique extensively is OS X. During the transition from PPC to x86, Apple wanted users to not have to think about which type of machine they had. The solution was to have everything, from the OS interface libraries down to the individual applications, ship as a fat binary. They additionally added a PPC interpreter to allow legacy applications to continue to function, but that technique will be explored in another section. They used this technique again for the transition from 32-bit to 64-bit within the x86/x86_64 architecture, simply shipping both versions inside an object file that had extensions to explain what to do.
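To make the mechanism concrete, here is a minimal sketch in C that lists the architectures inside a Mach-O universal (fat) binary. It assumes an OS X system providing <mach-o/fat.h>, and it only handles the plain big-endian FAT_MAGIC layout:

/* fatinfo.c - list the architectures inside a Mach-O universal binary.
 * A sketch; assumes an OS X system with <mach-o/fat.h> available.
 * Build: cc fatinfo.c -o fatinfo */
#include <stdio.h>
#include <stdint.h>
#include <arpa/inet.h>   /* ntohl: fat headers are stored big-endian */
#include <mach-o/fat.h>

int main(int argc, char **argv)
{
    if (argc != 2) { fprintf(stderr, "usage: %s binary\n", argv[0]); return 1; }
    FILE *f = fopen(argv[1], "rb");
    if (!f) { perror("fopen"); return 1; }

    struct fat_header fh;
    if (fread(&fh, sizeof fh, 1, f) != 1) { perror("fread"); return 1; }
    if (ntohl(fh.magic) != FAT_MAGIC) {
        fprintf(stderr, "not a fat binary\n");
        return 1;
    }

    /* One fat_arch record per embedded architecture follows the header. */
    uint32_t n = ntohl(fh.nfat_arch);
    for (uint32_t i = 0; i < n; i++) {
        struct fat_arch fa;
        if (fread(&fa, sizeof fa, 1, f) != 1) { perror("fread"); return 1; }
        printf("arch %u: cputype %d, offset %u, size %u\n",
               i, (int)ntohl(fa.cputype), ntohl(fa.offset), ntohl(fa.size));
    }
    fclose(f);
    return 0;
}

The loader performs essentially the same walk at exec time, picking the fat_arch entry that matches the running CPU and mapping only that slice.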
The benefits of this system are fairly clear: users see one binary, the developers just need to build twice, and minimal modification needs to be made to the target OS's toolchain and loader in order to accommodate this. However, a number of tricky issues show up.
Firstly, this means that miscompiled or malicious binaries may do different things on different architectures. This leads to both security issues (detecting bad behavior requires running the binary under every architecture it supports) and support issues, in that the same shipped product may act differently on different systems, which the user will not be expecting.
Another weakness is the requirement that the program developers predict all architectures the user may wish to use. This is not as much of a problem at the moment, with a relatively low number of architectures dominating the market, and was even less of an issue for Apple, who controlled all the hardware released, but it could become more of a problem in the future. Specifically, current x86-style chip designs are hitting a somewhat fundamental speed barrier, which means other architecture implementations can begin to catch up in performance, and that may lead to a larger number of architectures available to the user. Even if we assume that x86_64 will dominate the market for years to come, Intel and AMD regularly add special instructions which allow for increased performance. Using these under this scheme requires some developer knowledge of which chips his users are running in order to build each optimized version.
Additionally, fat binaries suffer from a file size problem. This does not seem so bad for most applications, as their .text section is usually somewhat limited, and most deployments of fat binaries at the moment only support two or three architectures. However, if you want to support different generations of an architecture, or many architectures, this rapidly spirals out of control. Worse, all of the libraries the program depends on are also multiplied in size, which compounds the problem, as libraries tend to be composed almost entirely of code and so have rather large .text sections. For reference, I have about 4G of libraries on the machine I am writing this on. While this may not seem like a particularly big deal (hard drive space is cheap), it can make a difference in network-based execution, such as when executing off a network filesystem, or via a system similar to NaCl.
Additionally, by throwing extra seeks into execution, fat binaries can slow down the load time of a program that loaded quickly to begin with. Finally, most OS vendors are strongly attached to a particular object format (Windows has WinPE, OS X has Mach-O, Linux has ELF), and it would be difficult to get them to cooperate and agree on one. Reversing remains hard for this format; I will return to the importance of this later.
Bytecode
Most of you are likely familiar with the bytecode technique for portability, and may have even thought of it first when I mentioned portability. The idea here is to design a simple low-level instruction set upon which your application can be based, then write a VM/interpreter which can execute it on each system you choose to support. The most famous use of bytecode is in the Java Runtime Environment. Most legacy phones used it for games; plugins which could run it were the execution environment of choice for applications and so-called "rich content" delivered over the web (until Flash usurped this position, though it also uses a bytecode-based execution model); and some desktop applications still use it today, especially in realms where neither a polished, designer-grade UI nor performance is vital, but portability is a must. Bytecode is also used in Flash, the Dalvik VM (an alternative Java target used for Android phones), and the OCaml runtime (though a native version is also available), and the PPC emulation used by Apple during their architecture switch can be seen as a form of bytecode, in that the system could not execute it natively, but treated it as an acceptable instruction set.
The straightforward benefits of this model are that the application developer need not guess which machines people will want to run the code on, platform equivalence (given a correctly implemented runtime), and smaller code size than fat binaries (though bytecode will still usually be larger than a single native binary unless a particularly clever bytecode format is in use).
The bytecode can also carry additional information beyond the actual instructions to be executed. A simple example would be tagged types in the bytecode. These would make no sense to a CPU, but might be of use to an interpreter (for instance in garbage collection). Such metadata makes writing analysis tools for a language easier, which is useful for being able to prove various properties of code before running it.
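As a sketch of what such a tag might look like in an interpreter's own data structures (the representation here is invented for illustration):

#include <stdint.h>

/* An invented tagged-value representation, as a bytecode interpreter might
 * use internally. The tag is metadata a CPU would ignore, but it lets the
 * runtime (e.g. the garbage collector) distinguish raw integers from
 * pointers it must trace. */
typedef enum { TAG_INT, TAG_PTR } tag_t;

typedef struct {
    tag_t tag;
    union {
        int64_t i;   /* TAG_INT: not scanned by the GC */
        void   *p;   /* TAG_PTR: traced during collection */
    } as;
} value;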
Of course, this method also requires a VM to be written for every target OS/architecture combination in order to actually execute the code. The naive strategy for implementing this can result in huge slowdowns, as many CPU instructions become required for every bytecode instruction (see the sketch below). Most modern bytecode execution environments use some form of what is called "just-in-time compilation", frequently referred to as JIT. This allows portions of the bytecode to be transformed into native code, and then executed as such, during runtime. With this advantage, the bytecode approach becomes much faster, but it remains slower than the other options.
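A minimal sketch of such a naive interpreter in C, for an invented toy stack-machine instruction set, shows where the cost comes from: every bytecode instruction pays for a fetch, a switch dispatch, and a loop back-edge in native code:

#include <stdio.h>
#include <stdint.h>

/* A toy stack-machine bytecode; the opcode set is invented for illustration. */
enum { OP_PUSH, OP_ADD, OP_PRINT, OP_HALT };

static void run(const uint8_t *code)
{
    int64_t stack[256];
    int sp = 0;        /* stack pointer */
    size_t pc = 0;     /* program counter */

    /* Naive dispatch: every bytecode instruction costs a fetch, a switch
     * (often an indirect branch), and a loop back-edge -- many native
     * instructions per bytecode instruction. */
    for (;;) {
        switch (code[pc++]) {
        case OP_PUSH:  stack[sp++] = code[pc++]; break;
        case OP_ADD:   sp--; stack[sp-1] += stack[sp]; break;
        case OP_PRINT: printf("%lld\n", (long long)stack[--sp]); break;
        case OP_HALT:  return;
        }
    }
}

int main(void)
{
    const uint8_t program[] = { OP_PUSH, 2, OP_PUSH, 40, OP_ADD, OP_PRINT, OP_HALT };
    run(program);   /* prints 42 */
    return 0;
}

A JIT removes exactly this dispatch overhead by emitting the bodies of the cases as native code and branching between them directly.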
Reversing usually gets easier with bytecode representations, but this does not have to be the case (consider the earlier view of another platform's native code as a bytecode).
Source
This may have been the simple solution some of you thought of when thinking about different architectures, especially if you are used to the open source world, where getting the bleeding edge version of something usually involves a local build. Simply put, if we use source as an interchange format, then any system which has a compiler for that language can compile and run it.
This has a number of obvious advantages. We get all of the typing, control flow, and modularity information the programmer originally had, allowing analysis tools or code rewriting schemes to work easily. The author of malicious code has a much harder time hiding the nature of his code from us. Per-machine optimizations occur as a matter of course. And compressed source is usually smaller than the resulting compiled binary.
The biggest downside to this scheme is that every user must compile it. This may not seem like a big deal for those of you who have used a binary distribution (e.g. Debian, Fedora, RHEL) and compiled only what you need from source, but those who have lived in the Gentoo world have already seen the problem. Compilation of large software takes a lot of CPU time and RAM. If, when I decide to download Firefox, I have to wait an additional ten hours after the download finishes and hope I have the RAM to build the product successfully, there is a serious issue.
This is the easiest to reverse, in that it is essentially already reversed. An obfuscator could be used, but source code level obfuscators are usually straightforward to defeat.
LLVM Bitcode
LLVM, or Low Level Virtual Machine, was designed as a generalized compiler backend. Most compilers accomplish their goal by transforming the source through a series of intermediate representations until it is simple enough to be transformed into assembly. LLVM seeks to generalize the last few stages of most compilers: it provides a standardized compiler IR that it can optimize and then target at any of a large number of platforms. It also has the capability to JIT when a full compilation is not required. In order to allow its optimizer to work, the IR focuses on analyzability and on preserving the important structures and information inherent in the program. While its bitcode format (the machine form of the compiler IR) was not intended as a program interchange format, it would be fairly well suited to the purpose. The LLVM code would be transferred; then, if it was suspected to be a one-time load of a low-performance application (such as the role Flash plays today), it could be JIT'd as it comes in to allow rapid application start. If it was intended as a more intensive application, or one that would be run many times, it would be compiled on the local machine.
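As a sketch of the "JIT it as it arrives" half of that workflow, the following C program loads a bitcode file and runs its main() through LLVM's JIT via the C API. The entry points named here exist in LLVM's C bindings of this era, but their exact names and availability have shifted between releases, so treat this as approximate:

/* runbc.c - load an LLVM bitcode file and JIT-execute its main().
 * A sketch against LLVM's C API circa this era; names vary by version.
 * Build (roughly): cc runbc.c `llvm-config --cflags --ldflags --libs` */
#include <stdio.h>
#include <llvm-c/Core.h>
#include <llvm-c/BitReader.h>
#include <llvm-c/ExecutionEngine.h>
#include <llvm-c/Target.h>

int main(int argc, char **argv)
{
    char *err = NULL;
    LLVMMemoryBufferRef buf;
    LLVMModuleRef mod;
    LLVMExecutionEngineRef ee;
    LLVMValueRef fn;

    if (argc != 2) { fprintf(stderr, "usage: %s file.bc\n", argv[0]); return 1; }

    LLVMLinkInJIT();                  /* pull the JIT into the link */
    LLVMInitializeNativeTarget();     /* enable codegen for the host CPU */

    if (LLVMCreateMemoryBufferWithContentsOfFile(argv[1], &buf, &err) ||
        LLVMParseBitcode(buf, &mod, &err) ||
        LLVMCreateExecutionEngineForModule(&ee, mod, &err)) {
        fprintf(stderr, "error: %s\n", err ? err : "unknown");
        return 1;
    }

    /* Functions are compiled lazily: native code for main() (and anything
     * it calls) is generated the first time it is actually run. */
    if (LLVMFindFunction(ee, "main", &fn)) {
        fprintf(stderr, "no main() in module\n");
        return 1;
    }
    LLVMRunFunction(ee, fn, 0, NULL); /* assumes a main taking no arguments */
    return 0;
}

The "compile it fully" path is the same front half, followed by static code generation to a native object file instead of LLVMRunFunction.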
To allow LLVM to do analysis, the bitcode retains typing data, control flow data, and implicit dataflow information (due to being in SSA form). While explaining what SSA is and why it is useful is far beyond the scope of this post, just take my word that it is very nice for analysis. All of this makes the code much easier to analyze, modify, or instrument if necessary.
As the end target for system-resident programs is the regular compiled code format for that particular system, it does not require that the entire system be built or transferred in this way, making it much easier to have compatible systems than with the universal binary or bytecode systems.
The ability to execute in a bytecode-like fashion or compile to native speed is nice: when receiving a small or low-performance application, one need not wait for a full compilation, but can just start executing. At the same time, one can dedicate some CPU time to optimizing the program for a particular machine, enabling the user to take advantage of new features of his CPU without the developers needing to know the architecture even exists.
Compilation times from LLVM bitcode are much less of an issue than compilation from source. Most of the expensive work has already been done; only the machine-dependent work remains. If desired, the onus can be placed on the developer to pre-optimize the bitcode (LLVM can do this already) so that nothing but machine-dependent work is left. Of course, it will still take nonzero time if the user wants full compiled speed instead of JIT speed.
As with the runtime environment for the bytecode systems and the compilers for source, we need a per-architecture component here. Luckily, however, the OS interface required is minimal, making it much less likely that the development of the support environment really has O(operating systems * architectures) complexity.
This is easier to reverse than machine code, but doesn't just give everything away like source does.
Aside on Reversibility
From a user perspective, reversibility is a very good thing. It makes it much easier for a user to tell what's going on on their own system, to have someone else audit the code if necessary, and to modify the behavior if they want, and it makes it much easier for automated security tools to enforce various policies on programs.
From a practical standpoint, software vendors are unlikely to be willing to adopt a scheme which is too easy to reverse. Given that they put DRM in their products, and go so far as to lobby for laws making tampering with their DRM illegal, giving the user something as easy to modify as the source (where the DRM check could likely be easily commented out) would be out of the question. They additionally wish to prevent their competitors from seeing how they've solved certain problems so that they cannot use the same techniques.
System Interface
Every program has to at some point produce output (and usually take input) to be interesting. If it produced no output, we could simply say that the program had run, and the user would have no way of telling we were lying. So, the program needs a way to interface with the system, which is quite awkward for portability, given that every system has its own interface. There are many ways to approach this problem, and it is partially solved in many modern languages, but nobody has really gotten it perfect yet.
Fat Binaries, Part Deux
We can extend the fat binary strategy to different OSes, rewriting our code to use the interfaces available on each system we want it to run on. While this will work if done perfectly, it makes the codebase much harder to maintain, loses even more of the guarantee of code having the same effect on different systems, and is another multiplicative factor over the number of architectures. It also has the drawback of being unable to take advantage of new features as they arise. This is really not a serious solution, but it is the technique implicitly in use by most multiplatform software today: write interface code for each platform's system, produce n different builds, and ship n different pieces of software. Things are done this way because it is the simplest solution for porting an application. It has problems in the long run in terms of maintainability, efficiency, and scaling, but if your enterprise simply wants to take its existing Linux-tied product and re-release it on Windows, this is going to be the easiest way to do it.
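That per-platform interface code usually takes the shape of a thin shim selected at compile time. A minimal sketch of the pattern in C, wrapping thread creation over pthreads and Win32 (app_thread_spawn and its types are invented names):

/* thread_shim.c - an invented portability shim; one implementation per
 * platform, selected at compile time. The rest of the codebase calls only
 * app_thread_spawn(), never CreateThread or pthread_create directly. */
#include <stdlib.h>

typedef void (*app_thread_fn)(void *);
struct call { app_thread_fn fn; void *arg; };

#ifdef _WIN32
#include <windows.h>

static DWORD WINAPI trampoline(LPVOID p)   /* adapt to Win32's signature */
{
    struct call c = *(struct call *)p;
    free(p);
    c.fn(c.arg);
    return 0;
}

int app_thread_spawn(app_thread_fn fn, void *arg)
{
    struct call *c = malloc(sizeof *c);
    if (!c) return -1;
    c->fn = fn; c->arg = arg;
    HANDLE h = CreateThread(NULL, 0, trampoline, c, 0, NULL);
    if (!h) { free(c); return -1; }
    CloseHandle(h);                        /* fire-and-forget for the sketch */
    return 0;
}

#else
#include <pthread.h>

static void *trampoline(void *p)           /* adapt to pthreads' signature */
{
    struct call c = *(struct call *)p;
    free(p);
    c.fn(c.arg);
    return NULL;
}

int app_thread_spawn(app_thread_fn fn, void *arg)
{
    struct call *c = malloc(sizeof *c);
    if (!c) return -1;
    c->fn = fn; c->arg = arg;
    pthread_t t;
    if (pthread_create(&t, NULL, trampoline, c)) { free(c); return -1; }
    pthread_detach(t);                     /* fire-and-forget for the sketch */
    return 0;
}
#endif

Every such wrapper is one more thing to write, test, and keep semantically identical on each platform, which is exactly the maintenance burden described above.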
Standard Interface
Another solution is to have all of the systems implement the same interface, so that we can write code once and be done with it. Examples of this in practice are POSIX, OpenGL, and OpenCL. They allow programmatic, efficient access to some resource in a particular way, and are portable across some number of systems.
Given that the system implementors are also implementing this interface, the developer gets an efficiency boost from the knowledge of the system implementors, who know what is fastest on their platform. The developer also benefits from new OS features, as the implementors will use the facilities they have added underneath the interface. This takes the previous strategy, where each development group creates its own abstraction layer and maintains it on all platforms, and reduces it to one abstraction layer per platform, which massively decreases the total amount of work that needs to be done.
However, at the same time, developers do not have direct access to new features exposed by systems, which prevents them from using new abstractions that may have been produced in the meantime. Spec revisions are slow, and this yields problems for developers who want efficiency. As an example, when kqueue and epoll came out, their semantics differed slightly from those of the corresponding POSIX functions, so the standard interface could not be implemented on top of them, even though the small semantic changes gave massive performance boosts.
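The trade-off is visible in code. The helper below uses only POSIX poll() and compiles unchanged on Linux, the BSDs, and OS X; an epoll or kqueue version would scale far better to large descriptor sets, but each would be a separate platform-specific code path:

#include <poll.h>

/* Wait until fd is readable, using only the POSIX interface. An epoll
 * (Linux) or kqueue (BSD/OS X) version would avoid rescanning the fd set
 * on every call, but each would be platform-specific code. */
int wait_readable(int fd, int timeout_ms)
{
    struct pollfd p = { .fd = fd, .events = POLLIN };
    int n = poll(&p, 1, timeout_ms);

    if (n < 0)  return -1;                  /* error (check errno) */
    if (n == 0) return 0;                   /* timed out */
    return (p.revents & POLLIN) ? 1 : -1;   /* 1 = data ready */
}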
A poorly constructed standard may also fail to provide everything a program needs to do, making it impossible for the program to communicate exclusively through standards-compliant layers. This was a problem OpenGL faced; it has since been solved with a series of extensions to the spec, for which platforms indicate their support. This method, however, is only a stopgap measure, in that you no longer have complete portability, as old systems are unlikely to support all the extensions needed.
Standards also have the eternal problem of people ignoring the standard or lying about their compliance with it. The GNU Autotools system is almost entirely derived from this problem: it probes for a large number of idiosyncrasies of nearly-standards-compliant systems in order to allow code to build correctly. The system is widely regarded as a pile of hacks, but a necessary one for portability. In build-everything setups, running this tool, or systems like it, takes up a large portion of the build time.
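The results of that probing are typically consumed in C through a generated config.h; the HAVE_* macro names below follow the usual Autoconf convention (e.g. from AC_CHECK_HEADERS):

/* Selecting an event mechanism at build time from Autoconf's probe
 * results. config.h is generated by the configure script; the HAVE_*
 * macros follow the standard AC_CHECK_HEADERS naming convention. */
#include "config.h"

#if defined(HAVE_SYS_EPOLL_H)
#  include <sys/epoll.h>      /* Linux: use epoll */
#elif defined(HAVE_SYS_EVENT_H)
#  include <sys/event.h>      /* BSD/OS X: use kqueue */
#else
#  include <poll.h>           /* fall back to plain POSIX poll() */
#endif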
In the real world, POSIX is used as a sort of partway measure. It almost never provides all the features you need, but by mostly staying within that layer, the amount of code that needs to be rewritten for a port is vastly reduced. OpenGL is actually fairly well accepted, and is fully featured enough to be used for mainline development, though it has started to fall a little behind the DirectX framework (not portable at all) in recent years. Finally, OpenCL is being largely overlooked in favor of CUDA for a combination of reasons ranging from poor performance to poor abstractions. This too is unfortunate, because it results in vendor lock-in.
Language runtime libraries
This strategy is a sort of medium between the standardized system interface strategy and having everyone write their own abstraction layers. Here, your language group provides abstractions that make sense for your language when interacting with the system. A language group can be far more nimble in responding to the needs of its community than a standards body, as only a library needs to be re-implemented rather than part of a core system. Runtime libraries are also characterized by a higher-level interface than one might see in a standard. For example, message passing between threads might be provided in a runtime library, whereas signals, shared memory, and locks might be provided in a standard. This is in large part because building appropriate abstractions for a language frequently requires the use of such lower-level primitives.
Anywhere your language is able to run, it will be able to talk to the system. This provides fairly good portability, though it of course requires per-system action on the part of the language implementors. Additionally, this allows easy tweaking of certain operations for certain platforms, allowing the kinds of optimizations we saw in the standard interface version. Moreover, the ability to do fast revisions of the interface library allows constraints to be relaxed or tightened where appropriate, permitting the use of new developments in lower layers.
This is also good for portability from the perspective of keeping the programmer from thinking about the underlying machine too much. A well designed interface layer in a language will not give much indication of what's going on underneath, which helps prevent the programmer from making assumptions he shouldn't. For example, if the threading abstraction is sufficiently complete, the programmer won't have "pthreads" in the back of his mind as he writes his code; he will simply be conforming to the language's threading interface.
Of course, this still leaves some things to be desired. Some operations are done in different ways on every system, and are not even available on some. A simple example is the tracing facilities provided by Linux's ptrace, Solaris/OS X's DTrace, and Windows's tracing events (via ETW). They all work just a little bit differently, and as a result, most languages do not have an OS-agnostic tracing facility. This is not to say it isn't possible. It is. However, the interfaces provided are so dissimilar that an operation which would be idiomatic in one could be quite expensive in another.
What really needs to be done is a lifting of system interfaces as a whole, likely from the language runtime position, to describe what you want to do rather than how to do it. For example, a tracing library in which you provide hooks to be triggered on syscalls, decide whether or not to allow syscalls to occur, and have the ability to set and remove breakpoints, perform register access, and perform remote memory access would work across all the underlying interfaces and cover most use cases of the lower-level libraries. In Windows, tracing is structured as event handlers; under Linux with ptrace, it is structured as a loop that gets unblocked when certain conditions in the traced process are met (with the process becoming blocked at the same time). How the code is written should be idiomatic to the language, with a translation to the appropriate underlying structure. (A sketch of what such an interface could look like appears below.)

Another example comes from file IO. An mmapped region, a file descriptor, a file stream, and a buffer containing the contents of a file are all valid ways to represent a file which was opened for reading. Really, the programmer should be unaware of which is occurring; he should simply be deciding that a file is to be read and going from there.

However, care must be taken to be sure that the details being abstracted away can be dealt with in some way. Haskell has run into this recently with its lazy "readFile" function, which allows the contents of a file to be accessed lazily, with the semantics that data will not be read from the file until it is accessed in the resulting stream. This led to problematic behavior, in which the IO manager would run out of file descriptors but had no legal way of regaining them, as it could not assume the files were finite, and so was not allowed to clean up after itself. (There are other issues with the current form of lazy IO, but those are not the subject of this post.)
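To make the tracing idea concrete, here is a sketch of what such a lifted interface could look like as a C header. Every name in it is hypothetical; the point is that the caller states intent, and the runtime maps it onto ptrace, DTrace, or ETW underneath:

/* trace.h - a hypothetical "describe what, not how" tracing interface.
 * None of these names exist in any real library; a runtime would map
 * them onto ptrace, DTrace, or ETW behind the scenes. */
#include <stddef.h>
#include <stdint.h>
#include <stdbool.h>

typedef struct trace_session trace_session;   /* opaque per-target handle */

/* Called on every syscall entry; return false to deny the syscall. */
typedef bool (*syscall_hook)(trace_session *s, int sysno, void *user);

trace_session *trace_attach(int pid);
void           trace_detach(trace_session *s);

int  trace_on_syscall(trace_session *s, syscall_hook hook, void *user);
int  trace_set_breakpoint(trace_session *s, uintptr_t addr);
int  trace_clear_breakpoint(trace_session *s, uintptr_t addr);

/* Register and remote-memory access, expressed by intent rather than by
 * platform mechanism (PTRACE_PEEKDATA, ReadProcessMemory, ...). */
int  trace_read_regs(trace_session *s, void *regs, size_t len);
int  trace_read_mem(trace_session *s, uintptr_t addr, void *buf, size_t len);
int  trace_write_mem(trace_session *s, uintptr_t addr, const void *buf, size_t len);

A ptrace backend might implement trace_on_syscall with PTRACE_SYSCALL stops, while an ETW backend would register the hook as an event callback; the caller never sees the difference.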
In summary, interfaces should be much higher level than they are now and idiomatic to their languages, and care must be taken to ensure that your runtime can gracefully provide the facilities you wish to offer.
User Interface
This is probably the hardest problem to solve, but in my opinion also the least interesting. The basic issue is that we want the user interface to look like it belongs on the system and to interact properly, but we also want to avoid complexity and have it run on all systems. Different systems have different interface paradigms and different ways of rendering and hooking into UIs.
Write N Interfaces
The easiest solution, and the one used by high-end companies when they port their software at the moment, is to write a UI for each operating system by hand, so that it feels right on that system. This is pretty much the simplest way of dealing with the problem from an organization's perspective, as it uses only skills which are widely available.
On the good side, this causes everything to look and feel right to the user. It can use native UI paradigms and interact well with other native applications, as it is essentially native. This means that you will have a fairly polished finished product, and your users are unlikely to take issue with you using this method.
Unfortunately, this means a lot of extra coding time. Writing a good UI is nontrivial, and it will need to be done for each targeted system. Additionally, the developers have to predict all the systems their users might want to use. This isn't a huge issue today, as there are few of them, but many groups fail to anticipate the desire for a GTK- or Qt-based version even now. Finally, you run into consistency problems: it is not necessarily easy for users of the application to use it on multiple systems, as it may be completely different on each. This also makes testing and debugging scale awkwardly.
System Independent UI
In the vein of the strategies tried earlier, the next less naive approach is to make a library that renders the same thing on all platforms. This allows platforms to essentially opt in to supporting your class of applications, rather than requiring the developer to add support for each platform. The most famous example is Java's Swing UI library, which embodies both the advantages and disadvantages, and is a good example of this strategy carried out well.
This means that it will look and act the same on every system, leading to easier debugging, testing, and support. It also decreases the complexity of development, as only one UI team is needed rather than three or four. Finally, it allows new platforms to run your application without you needing to explicitly add support for them.
However, as any developer using Swing will tell you, users do not usually like this decision. The application will always look out of place, no matter which system it is on. It will frequently fail to match the interface paradigm (e.g. the file-open dialogs on Macs, Linux, and Swing all behave differently). Its support for interaction with native apps (e.g. drag and drop) tends to be rather poor. Finally, this type of library can be tricky to write, especially if the underlying platform does not export the ability to draw directly onto the screen in the window manager.
Mimic the system
Another strategy is to describe a user interface, then use the local widgets, dialog boxes, etc. where possible. The most common example of this is wxWidgets. While this is nice for code complexity, and seems like it could be a good idea, there are reasons it is not usually used in practice. The layout will almost always make some widgets look oddly aligned, as packing and layout for differently shaped widgets are done differently on each platform. Additionally, you can only use the minimum of the features that exist on all the target systems, or the widgets will not be selectable. Finally, you will fail to preserve second-order interface paradigms (e.g. interfaces which depend on each other).
Descriptive System
This may be hard, or even impossible, but ideally it would be nice if a system similar to LaTeX for UIs were available. The idea is to describe what you have (e.g. sections, data to be displayed, a type of value you want to read in). Hopefully, this would allow the layout to be chosen differently for each system, allowing the interface to "fit in" everywhere. Generalized input could potentially allow for application interaction on certain platforms while not on others. Additionally, this style of design would allow more semantic meaning in the UI code, possibly enabling more automatic manual generation. Of course, no such system exists, and producing one would need designer-tier knowledge of all accepted interface paradigms, a fancy layout compiler (in that it would be transforming from the input layout to the layout engine of the target), and PL work to create a language for UI description that would make sense to use. Finally, real applications would likely still fall back to platform-specific implementations for minor tweaks to make things look just a little bit better.