Interpretation Versus Compilation
This is part of a larger series known as “How To Program Anything: Core Rulebook”
But I digress right out of the gate. One of the key characteristics of PHP, as it currently stands, is that it is an interpreted language. On the other hand, something like C (which, at the time of this writing, I’m covering elsewhere) was designed to be a compiled language. What does that mean?
You will find some individuals who split languages into these two categories based on whether a language is usually interpreted or compiled. However, the truth of the matter is quite the opposite. Whether a language is compiled or interpreted is actually a choice independent of the nature of the language itself. Any language can be interpreted by what is known as an interpreter, or compiled by what is known as a compiler. Before I go on too much further I must explain these terms, lest I start not making sense.
The Lifecycle of a Program
In the use of any programming language there is a certain lifecycle. You write the program, then you run the program, find errors, and debug the program; thus writing more of the program, scrub, rinse, repeat. What we’re concerned with today is the “run” part of the program.
When you write a program your ultimate intention is to have the instructions, constraints, or what have you that you outline and detail in that program be processed by the computer. As I’ve covered in previous tutorials, a computer comes down to the processor which goes step by step executing instructions encoded in binary that are stored in memory. How do we get from you typing “a = 3;” to these specific encoded instructions that the processor can understand?
We do that through what is known as the compilation process. There is a special piece of software known as a compiler that takes as input the program that you have written. It then parses and steps through each part of the program and constructs assembly and machine code, specific to the intended processor, from those steps. This is often called object code. Also in this space there is the linker, which takes the various parts of the program that have been turned into object code separately and ties them all together into one executable: a file or application on the target computer that can be run for the desired result. Here is a diagram of the process:
The finished piece from this process is the executable. It is basically a string of machine code, specific bits, 1s and 0s, that make sense to the specific computer’s processor. When you run (or tell the computer to execute) this executable, the processor takes the first instruction straight from it, unfiltered and untranslated, and follows the instructions to the letter without any additional translation. That’s the key characteristic of the compilation process: its result, the executable, does not have to be further translated or filtered in order for the processor to take the first instruction and go full steam ahead with it.
An astute reader may ask: where did the compiler come from? Well, somebody programmed it. Programming compilers is highly technical and can be very complex; in fact, the first compilers were written directly in machine code or using assembly. But the aim of the compiler is clear: to translate an input program into executable machine code for a particular processor.
Some programming languages were designed with this process, that of compilation, in mind. C, for instance, was meant to empower programmers with its ease and expressiveness, but was ultimately designed in such a way as to be easily translated into assembly or machine code. Not all languages had this in mind at their conception. Java, for example, was meant to run in a sort of “interpretive” environment, and Python, as another example, was always meant to be interpreted.
There is another method of getting a program to execute on a given computer. The alternative to compilation is interpretation. The biggest difference between a compiler and an interpreter is in how each relates to the input program. A compiler, as we’ve seen above, takes the entire program and converts it into object code, basically straight machine code the processor understands. An interpreter, on the other hand, is an executable that reads in your program step-by-step (or however much it needs to read in) and then processes, immediately executing, those particular instructions. In other words, it takes your program step by step and immediately performs those steps as part of its own execution. This means there is no object code to hand to the processor; in a manner of speaking, the interpreter is the object code, built in such a way as to be called upon when the time comes.
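To make this concrete, here is a minimal sketch of an interpreter in Python. The two-command language it understands (“print” and “add”) is invented purely for illustration; the point is that each instruction is performed the moment it is read, with no object code ever produced.

```python
# A toy interpreter: reads a made-up language line by line and
# performs each action immediately as part of its own execution.

def interpret(source):
    for line in source.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        command, args = parts[0], parts[1:]
        # Each command is executed the moment it is read -- there is
        # no object code; the interpreter itself does the work.
        if command == "print":
            print(" ".join(args))
        elif command == "add":
            print(int(args[0]) + int(args[1]))
        else:
            raise SyntaxError(f"unknown command: {command}")

program = """print hello
add 2 3"""
interpret(program)  # prints "hello", then 5
```

Notice that the processor never sees `program` at all; it only ever runs the interpreter’s own machine code, which happens to be steered by the text of the script.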
This breaks up the “run” part of the lifecycle that was diagrammed above. In fact, we have a new diagram:
Here we can see that, unlike the compiler, which we could disregard once our program was converted into a machine code executable, we must have the interpreter program on hand to invoke in order to run our program. In a way, the interpreter becomes the processor. Programs that are written to be interpreted are commonly called “scripts” because, in a way, they are scripts that another program is following, as opposed to direct machine code.
For example, this is how languages like Python traditionally work. You write a program using the rules and such of the Python programming language, and then when you’re ready to execute it, you input it into the Python interpreter, which subsequently does all the steps you’ve outlined. On the command line you might write something like “python myprogram.py”.
Here, python is the executable you are running, which in turn reads in the myprogram.py file and performs its instructions. You couldn’t expect the computer to run myprogram.py without running python, because myprogram.py isn’t object code; it isn’t machine code the processor understands. It is possible to compile Python programs to object or machine code and run them directly on the processor, but that usually involves compiling and bundling the entire Python interpreter with them.
It may seem silly at first glance to do this extra step or require this extra program, but there are benefits to doing it this way. I’ll explore a few of them in the following sections.
The Nature of an Interpreter
Interpreters can be built many different ways. There are interpreters that read in the source program (the one you’ve written) and do no additional processing: they simply take in characters, a chunk at a time, and perform the given actions. However, some interpreters do a bit of compiling of their own, usually into what is traditionally called byte code, which only makes sense to the interpreter. It’s a sort of pseudo-machine language that only the interpreter would understand. Interpreters do this for a number of reasons: byte code is faster to process, and it’s easier to write an executor (the part of the interpreter that performs the actions) that reads byte code as opposed to the raw source input.
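Python itself works this way: CPython compiles your source to an internal byte code before its executor runs it, and the standard library’s dis module lets you peek at that byte code. The exact instruction listing varies between Python versions, so I’ll only say it contains instructions such as these:

```python
# Peek at the byte code CPython generates for a simple function.
import dis

def assign():
    a = 3  # the "a = 3" from earlier in this article
    return a

# Prints a listing of byte code instructions; among them you'll see
# LOAD_CONST (push the constant 3) and STORE_FAST (store it into a) --
# a pseudo-machine language only the CPython interpreter understands.
dis.dis(assign)
```

This is the "certain bit of compiling" described above: the source is translated once into a form the executor can step through quickly.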
However, there are interpreters that make this kind of byte code a much larger deal than simply a means of performing actions swiftly. For example, the Java programming language has traditionally “run” on what is known as a virtual machine. A virtual machine is in essence an executable, or part of a program, that reads a specific byte code and emulates the workings of a processor, executing that byte code as if the virtual processor were the computer’s actual processor. If you’ve ever used an emulator you are actually quite familiar with this prospect. Let us say I have an emulator for a Nintendo Entertainment System. When I load in a ROM file, such as, say, Dragon Warrior, it is formatted in a machine code that only the NES processor understands. But if I build a fake processor (that is, a processor in abstract terms, in programming) that interprets that code while running on another processor, I can run Dragon Warrior on any machine for which I can compile the emulator.
That’s powerful. And that’s exactly the concept Java takes advantage of, and that all interpreters take advantage of: if I can compile the interpreter, or the virtual machine executable, on a given machine/processor, then any code written for that interpreter or virtual machine will run on that machine. Thus, the famous “write once, run anywhere.” Any processor for which I successfully build an interpreter/emulator is a candidate for running my interpreted programs/byte code. That is the biggest benefit of an interpreter over a compiler.
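A virtual machine can itself be sketched in a few lines. Here is a toy “fake processor” in Python: the instruction set (PUSH/ADD/PRINT) and opcode numbers are invented for this example, but the stack-and-program-counter structure is the same principle real VMs like the JVM are built on.

```python
# A toy virtual machine: a fake stack processor executing a tiny,
# made-up byte code.  Opcode numbers are arbitrary inventions here.
PUSH, ADD, PRINT = 0, 1, 2

def run(bytecode):
    stack = []
    pc = 0  # program counter, just like a real processor
    while pc < len(bytecode):
        op = bytecode[pc]
        if op == PUSH:
            pc += 1
            stack.append(bytecode[pc])  # operand follows the opcode
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == PRINT:
            print(stack.pop())
        pc += 1

# The same byte code runs anywhere this function runs --
# "write once, run anywhere" in miniature.
run([PUSH, 2, PUSH, 3, ADD, PRINT])  # prints 5
```

Port `run` to a new processor (here, anywhere Python itself runs) and every byte code program written for it comes along for free.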
Pros and Cons
The biggest pro for the compilation process is speed. Being able to compile any target language to a machine code that the actual processor of the machine can understand eliminates any intermediary code, translation, or filter. By eliminating all of that (basically everything an interpreter does) we gain computational speed. We can do deliberate useful computations without any extra steps, thus increasing the number of them possible in a given timespan.
However, the largest con for the compilation process is specificity. When you compile a program to run on a particular processor, you are creating object code that will only run on that processor. If you want a program to run on another machine, say an ARM processor as opposed to an Intel 8086 based processor, you have to re-compile for that processor, and unfortunately, recompilation can sometimes be an arduous process if your new processor has limitations or idiosyncrasies not present in the first.
The biggest pro for the interpretation business is flexibility. Not only can you run an interpreted program on any processor or platform for which that interpreter has been compiled, but the very way an interpreter is written can offer additional flexibility. Because, in some cases, interpreters are easier to understand and code than compilers (in my opinion because the action to be performed stays so close to the program being input), it is oftentimes easier to re-implement interpreters, add additional features, toy with implementations of things like garbage collectors, and otherwise extend the language. This is a side benefit of interpreters alongside the larger benefit of “write once, run anywhere.”
Another benefit of interpreters is that they can be more easily rewritten or re-compiled for future platforms. Writing a compiler for a processor that’s added a bunch of features, or is entirely different from what’s come before it, is very difficult, but once that compiler is written you can then compile a bunch of interpreters and voila: a forward-looking language. You don’t have to re-implement an interpreter at a foundational level because a processor has changed.
However, the biggest drawback of interpreters is speed. With every program there’s so much translation, filtering, and byte code compiling going on that it slows down and gets in the way of the actual core deliberate computations. This is a huge concern for real-time applications such as high-end games, simulations, or sensory-responsive robotics. Some interpreters have what is called a just-in-time (JIT) compiler, which compiles parts of the program to machine code as it runs, but these are special and above and beyond basic interpreter building. However, as processors become more and more powerful this becomes less of a concern.
Remember what I said in the preface: a language is independent of whether it’s interpreted or compiled. With that in mind, though, some languages were specifically designed to be compiled, such as C, while other languages were always meant to be interpreted, such as Java.
To me, it doesn’t really matter whether something is compiled or interpreted, as long as it can get the job done with the least amount of stress. Some systems don’t really offer the technical requirements to effectively (not just theoretically) run interpreters, such as some smaller microcontrollers or embeddable processors, and thus you must program them with something directly compilable such as C (something I’m doing in a current project as I write this). Sometimes you want to do something so computationally intensive as fast as possible, such as accurate voice recognition on a robot, and that speed requirement can become an issue. In other cases, speed or computational power may not be as big of an issue, and writing something like a natural language processor might be easier in something like Python or R, which are definitely interpreted. Or you might find yourself in an ecosystem where the current tools are so powerful, such as the back-end systems of PHP or Node.js, that you’re practically required to use them regardless of whether they’re interpreted or not.
You tell me though: which would you prefer, interpreted or compiled? Does the thought of having extra computations hanging around what you’re trying to get done make you crazy? Or does having to worry about low-level processor details drive you into tedium, making you wish the computer would handle all of that? Thanks for reading!