What's the difference between "scripting languages" (like Perl or Tcl) and "general purpose languages" (like C++, Java, or LISP)? For that matter, how do you define the terms "scripting language" and "general purpose language"?
Most people who address this topic [Ousterhout] say things like:
- general purpose languages require you to declare your variables and give them explicit types; scripting languages don't
- general purpose languages are compiled; scripting languages are interpreted
- general purpose languages allow for complex user-defined types (structs, unions, classes); scripting languages only provide simple types (everything's a string, or everything's a list)
But these miss the point: they overlook the real differences between current general purpose and scripting languages.
For purposes of this article, I'll define scripting language simply as any programming language used to write scripts. Scripts are short programs, usually written quickly to solve some task at hand.
You might notice that those definitions are awfully vague. That's because there isn't actually a hard boundary between scripts on the one hand and "real programs" or "applications" on the other. In reality, there is a continuum. Many scripts start out small and grow up to become full-fledged applications. Developers have been known to write large programs using what most programmers would consider to be a scripting language. Stallman points out that extensions (which are sort of like scripts) can often be "large, complex programs in their own right" [TclWar].
Rather than argue over definitions, I think it's more interesting to look at the features that are unique to languages generally considered to be scripting languages (which I will shorten to simply "scripting languages"), and also at the features peculiar to general purpose languages.
Scripting Language Features
What is it that makes it easier and faster to write code in a scripting language? There are several features involved. A language which has most (or even many) of these features will generally "feel like" a scripting language. A language which has few (or none) of them will feel like a general purpose language.
Interpreter. An interpreter makes it fast and easy to run a program. Just type "perl foo.pl" and off it goes. Compare this to the typical compiler scenario. First, you have to compile everything: "gcc -g -O -Wall -c foo.c" (repeat for each module). Then link the program: "gcc -o foo foo.o ... -l...". And finally, run the resulting executable.
Native Complex Types. Scripting languages tend to provide native string, list, and dictionary (aka hash or a-list) types. These types can be implemented in pretty much any language - the point here is that scripting languages provide good native implementations. You should be able to create a string, list, or dictionary with a minimum of extra syntax. For example, "... 3)" creates a list (note: no explicit function calls). You should also be able to concatenate two strings or two lists with a builtin operator (again, with no explicit function calls).
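As a rough illustration of what "native" means here, the sketch below uses Python, a language not covered in this article and chosen only for its compact syntax. The variable names are made up for the example. It builds a string, a list, and a dictionary from literals and concatenates with a built-in operator, with no constructor or library calls.

```python
# Literal syntax: no constructor or library calls are needed to build values.
greeting = "hello"
primes = [2, 3, 5]                  # list literal
ages = {"alice": 30, "bob": 25}     # dictionary (hash) literal

# Concatenation uses a built-in operator rather than a function call.
message = greeting + ", world"
more_primes = primes + [7, 11]

# Dictionary lookup also uses built-in syntax.
alice_age = ages["alice"]

print(message)
print(more_primes)
print(alice_age)
```

The same operations in a language without native complex types would typically go through library types and explicit method or function calls.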
Garbage Collection. Garbage collection relieves the programmer of the duty of keeping track of which objects need to be freed. It also makes memory leaks less likely (certain types of memory leaks anyway; and to be fair, it creates the potential for some nasty new kinds of memory leaks).
This table lists four general purpose languages on the left (C, C++, Java, Common Lisp) and two scripting languages on the right (Tcl, Perl). The rows are the features described above, and the table entries are "yes" or "no", indicating whether each language has that feature. As you'd expect, the scripting languages have lots of "yes"es, the general purpose languages have lots of "no"s.
|Feature|C|C++|Java|Common Lisp|Tcl|Perl|
|---|---|---|---|---|---|---|
|Native Complex Types|no|no|no|yes|no|yes|
Some notes on the table:
- C++ with the STL has complex types, but doesn't provide good, "native" syntax for them. Ditto for Java with respect to lists and dictionaries. (Java has reasonable syntax for strings.) Tcl has strings and lists, but not dictionaries.
- Common Lisp has the complex data types, although it's a little hard to talk about native operators without function calls, since everything in Lisp is an explicit function call.
- Java and Common Lisp are often interpreted rather than compiled, but don't provide the one-step "run my program" feature, which is what I mean here.
- Tcl and Perl do GC using reference-counting. One can argue (and I'd tend to agree) that this isn't real garbage collection, but the important point here is that the programmer doesn't have to deal with it.
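The distinction between reference counting and "real" garbage collection can be made concrete with a small experiment. The sketch below assumes CPython, which (like Tcl and Perl) frees objects by reference counting but also ships a separate cycle collector in its gc module. The Node class is hypothetical and exists only for this example. With the cycle collector switched off, an ordinary object is freed as soon as its last reference goes away, while two objects that point at each other are never freed.

```python
import gc
import weakref


class Node:
    """A trivial object that can point at another Node."""

    def __init__(self):
        self.other = None


gc.disable()  # switch off the cycle collector so only reference counting runs

# Case 1: no cycle. Dropping the last reference frees the object at once.
a = Node()
a_ref = weakref.ref(a)   # a weak reference does not keep the object alive
del a
freed_without_cycle = a_ref() is None

# Case 2: a two-object cycle. Each node keeps the other's count above zero.
b, c = Node(), Node()
b.other, c.other = c, b
b_ref = weakref.ref(b)
del b, c
leaked_by_refcount = b_ref() is not None

# A tracing pass finds the unreachable cycle and reclaims it.
gc.collect()
freed_by_tracing = b_ref() is None

gc.enable()
print(freed_without_cycle, leaked_by_refcount, freed_by_tracing)
```

In a purely reference-counted implementation, the second case is a leak unless the programmer breaks the cycle by hand, which is one reason to hesitate before calling reference counting full garbage collection.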
General Purpose Language Features
There are several interesting programming language features missing from that list. These are things which are usually associated with general purpose languages.
Compiler. Compilers make code run faster (all else being equal, i.e., given that we're comparing a compiler and interpreter for the same language). They also allow you to generate standalone executables.
Structs. Structs (aka objects or records) are a generic building block for user-defined types. Languages without structs generally end up forcing the programmer to make everything look like a list (or a string, or whatever the language happens to provide).
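To make the contrast concrete, here is a minimal sketch in Python (used only for illustration, and not one of the languages compared in this article). The Employee record and its fields are invented for the example. The first version squeezes a record into a list, so the meaning of each position lives only in the programmer's head. The second uses a struct-like type with named fields.

```python
from dataclasses import dataclass

# Without structs: a record squeezed into a list.
# The meaning of each position is a convention, not something the language knows.
employee_as_list = ["Ada", "Engineering", 52000]
salary_from_list = employee_as_list[2]   # the reader must remember index 2 is salary


# With a struct-like type: fields have names, and the definition documents the record.
@dataclass
class Employee:
    name: str
    department: str
    salary: int


employee = Employee(name="Ada", department="Engineering", salary=52000)
salary_from_struct = employee.salary

print(salary_from_list, salary_from_struct)
```

Both versions hold the same data. The difference shows up as the program grows: adding or reordering a field in the list version silently breaks every positional index, while the named-field version keeps working.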
Compile-Time Type Checking. A language that requires typed variables and does a certain amount of compile-time checking will catch programmer errors earlier, and save on debug time. Explicit types also make it easier to implement polymorphic functions. ("Explicit" is something of a fuzzy concept here - SML does compile-time type checking without requiring variables to be typed. The important thing here is the checking.)
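The cost of doing without this checking is easiest to see in a language that lacks it. The sketch below uses Python purely as an illustration, with a hypothetical describe function. The function contains a type error on a rarely used branch. Nothing reports the error when the program is loaded, and the common path runs correctly, so the bug surfaces only when the branch is finally executed.

```python
def describe(count, verbose=False):
    """Return a short description of count."""
    if verbose:
        # Bug: this concatenates a string and an integer.
        # Nothing flags it until this branch actually runs.
        return "count is " + count
    return "count: %d" % count


# The common path works, so the bug can go unnoticed for a long time.
common_path = describe(3)

# The rarely used path fails only when it is finally exercised.
try:
    describe(3, verbose=True)
    failed_at_runtime = False
except TypeError:
    failed_at_runtime = True

print(common_path, failed_at_runtime)
```

A compiler that checks types would reject the faulty branch before the program ever ran, regardless of how rarely that branch is reached.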
Here is the continuation of the previous table, listing the general purpose language features. This time, the "yes"es are on the left side of the table.
|Feature|C|C++|Java|Common Lisp|Tcl|Perl|
|---|---|---|---|---|---|---|
|Compile-Time Type Checking|yes|yes|yes|no|no|no|
Why Not Take Them All?
There's absolutely no reason that all of the features listed here - the so-called scripting language features, as well as the general purpose language features - can't be incorporated into one programming language. Such a language could (with the appropriate development and runtime tools) be used as both a scripting language and a general purpose language. In fact, this hypothetical language could beat out existing languages (in both categories) for many tasks.
Note that all of the features discussed here are orthogonal to various other debates. For example, both imperative and functional languages could be designed with all of these features - and this wouldn't change the relative advantages and disadvantages that functional languages have compared to imperative languages.
The hypothetical language would have both a compiler and an interpreter. The interpreter would serve to run shorter programs ("scripts") as well as to do quick testing of larger programs. The compiler would be used to generate higher-performance, standalone executables. As an aside, the compiler would ideally incorporate some features currently more common in interpreters:
- the ability to locate modules (set PERLLIB once, rather than having to add several -I flags to the compiler command line)
- automatic dependence resolution (figuring out what needs to be recompiled)
- figuring out which libraries to link against (I've already typed "#include <X11/Xlib.h>" - why do I also have to type "-lX11"?)
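The last item in the list above is simple enough to sketch. The Python code below is only an illustration of the idea, not a description of any existing compiler feature. The function infer_link_flags and the HEADER_TO_LINK_FLAG table are hypothetical, and a real tool would need to get the header-to-library mapping from the libraries themselves rather than from a hand-written table.

```python
import re

# Hypothetical table: which link flag each header implies.
# A real tool would need this information from the library's own metadata.
HEADER_TO_LINK_FLAG = {
    "X11/Xlib.h": "-lX11",
    "math.h": "-lm",
}

# Matches lines of the form: #include <some/header.h>
INCLUDE_PATTERN = re.compile(r'^\s*#\s*include\s*<([^>]+)>', re.MULTILINE)


def infer_link_flags(source_text):
    """Return the link flags implied by the #include lines, in first-seen order."""
    flags = []
    for header in INCLUDE_PATTERN.findall(source_text):
        flag = HEADER_TO_LINK_FLAG.get(header)
        if flag is not None and flag not in flags:
            flags.append(flag)
    return flags


example_source = """
#include <stdio.h>
#include <X11/Xlib.h>
#include <math.h>
#include <X11/Xlib.h>
"""

print(infer_link_flags(example_source))
```

Headers with no entry in the table (such as stdio.h here) contribute nothing, and repeated includes yield a single flag.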
Native complex types would be just as useful in general purpose languages as they are in scripting languages. C++ programmers end up using awkward STL types for lists and dictionaries. It wouldn't be that hard to incorporate these into the language in a syntactically clean way. (Various general purpose languages already get this mostly right - Lisp and SML come to mind here.)
Structs (records, objects, whatever you want to call them) would be very useful in a scripting language. Perl has made a kludgey attempt to add them with its pseudohashes and the fields pragma. As has been pointed out before, one should never design a language on the assumption that people will only write short scripts in it [TclWar]. Any language that becomes at all useful (whether because it's a good language or because it's embedded in an otherwise good application) will get used to write ever larger programs.
Garbage collection is useful. There really shouldn't be argument over this anymore. (There are clearly some applications where GC cannot or should not be used - these applications should be able to disable it, or perhaps use a more appropriate language.) Even the performance issue can be solved [Appel].
Compile-time type checking is also pretty clearly a good thing. In general purpose languages, it's generally agreed that type checking catches bugs earlier and ends up saving debug time. Why would anyone not want this for a scripting language too? Granted, it adds a few seconds to the time it takes to write a function, but the payoff in debug time is well worth it.
Someone needs to design a language that gets all of these things right. I haven't seen anyone attempt this yet. Even fairly recent languages like Java and C# don't address all of these issues. (For example, they provide garbage collection, but don't do so well with complex types or an interpreter.)
[Appel] Andrew W. Appel, "Garbage Collection Can Be Faster Than Stack Allocation", Information Processing Letters 25(4):275-279, 17 June
[Ousterhout] John K. Ousterhout, "Scripting: Higher Level Programming for the 21st Century", IEEE Computer, March 1998.
[TclWar] Richard Stallman, "Why you should not use Tcl" (and various followup posts), September 1994.