At the beginning of the Spring 2020 semester, I had established a clear research direction: examining the role that aesthetics play in the process of understanding source code. The definition of aesthetics on which this research relies is that of a physical manifestation which can be grasped by the senses. This limitation of scope is justified mainly by the object of this study: by approaching source code as an object (or, rather, as a multiplicity of objects, “texts” written and read) rather than as a concept, I put its graspable aspects in the foreground. While the social, cultural, intellectual and emotional components remain significant in the appreciation of beauty in source code, the comparative lack of close examination of how code is actually written justifies such a definition of aesthetics.
What still needed to be defined, however, was the meaning of “understanding”. The work conducted this semester has therefore focused on gathering and examining a corpus of source code texts, along with their accompanying explanations, justifications and meta-texts, in order to find out how references to “beauty” and “understanding” are made. In the vast majority of the corpus, practitioners present or discuss a piece of source code which they consider “beautiful” or “aesthetically pleasing”, and accompany this presentation with justifications about how to make a piece of code beautiful and/or why to do so. It is these discourses that are used to elaborate on what source code aesthetics look like in practice, as well as on the role they play in the life of a source code text. This process has also led to the constitution of an initial set of aesthetic properties that are repeatedly highlighted by a certain subset of practitioners. In effect, the group of those who write and read source code is far from homogeneous, and can be divided into at least three distinct categories: computer science, computational science and software development[1]. While additional sources establish their own distinctions[2][3][4], the multiplicity of contexts within which code is written leaves little doubt as to this heterogeneity. Leaving aside a thorough definition of each of these, I have identified five main categories of individuals writing and reading source code, which I group under the umbrella term code practitioners: computer scientists, software engineers, hackers, amateurs/students and artists. These categories are intended as heuristics rather than strict definitions, and several of them can overlap within one individual or group of individuals.
The subsets of practitioners examined so far include software engineers and artists. Thanks to the aforementioned overlap of categories, I hypothesize that the initial findings drawn from comparing how engineers and artists conceive of aesthetics will be reinforced and further qualified during the examination of how the remaining categories (scientists, hackers, amateurs) conceive of them[5].
The vast majority of code written today is written by software engineers. As previously mentioned, while they are not the only group of people to write and read code, they are by far the most significant. The appearance of the profession in the late 1950s and early 1960s, emerging from a purely academic or military activity, brought with it a change in the discourse on how code should be written, articulated most eloquently by E. W. Dijkstra[6], along with Knuth[7], Kernighan[8] and Martin[9], amongst others. Since the earliest of these texts (from the late 1960s) focused on defining the practice of software development at a professional level, the inclusion from the outset of an artistic component (“The Art of Computer Programming”) as well as a cognitive one (“Go To Statement Considered Harmful”) is a significant indicator that writing software isn't an exclusively mechanical activity. Despite its title, The Art of Computer Programming doesn't address what exactly makes writing code an artistic process and source code a beautiful object. Since then, this question of “beautiful code” has been addressed by members of the profession itself at various levels: a couple of monographs[10][11], conferences, academic articles, blog posts and Q&A websites.
The analysis of this corpus has led to multiple insights for this research project. First, it has helped to further anchor the role that aesthetics play for one of the categories of software practitioners. These findings complement the statements of more prominent figures in software development, and either confirm or qualify them. Second, it has allowed me to constitute a significant part of a taxonomy of the features of beautiful code, based on John Cayley's distinction between structures, syntaxes and vocabularies[12]. When offering their opinions on the oft-repeated topic of beautiful code, practitioners do not put forward one-dimensional criteria, but rather criteria which can be applied at multiple levels of reading. Some of these relate more to the overarching design of the code examined, while others, closer to our working definition of aesthetics, focus on the specific formal features exhibited by a piece of source code. Cayley's framework is used here to build a bridge with literature and literary studies without imposing too rigid a grid (leaving aside, for the time being, concepts such as fiction, account, authorship, literariness, etc.). Third, it has refined the relationship between “aesthetics” and “understanding”. The necessity for code to be understood and the desire for code to be beautiful are often intertwined, and highlight how the latter facilitates the former. While this relationship seems to be an argument in favor of Goodman's aesthetic theory (aesthetics as a means of communicating concepts), the hypothesis I have formed here is that there are multiple (and sometimes conflicting) aesthetic criteria among and across software practitioners because there are multiple concepts to be communicated.
“Aesthetics alleviate cognitive pain”[13]. This excerpt from The Art of Readable Code, presented as an industry manual for professional software developers, sums up the overall sentiment of software developers as surveyed through my corpus analysis. While most code can be understood, sometimes only after considerable effort, beautiful code bypasses any need for additional commentary, reaching the highly sought-after status of “self-explanatory”. This capacity of aesthetics to ease the understanding of the text also answers one of the early research questions of this thesis, regarding the necessity for code to be beautiful. While the existence of beautiful code quickly manifested itself at the beginning of this research, two questions then arose from its tight connection to complexity, intelligibility and understandability. First, what do aesthetics in code make intelligible, and how? What is made intelligible isn't exclusively what the program (and, by extension, the computer) does, but can also be the knowledge of an existing algorithm, a given idiom of a programming language, a hardware architecture, or the reading and writing practices of fellow programmers. That is, what should be made intelligible is an intent, within a particular socio-technical context composed at least of a writer, a reader, a language and a piece of hardware. How aesthetics make such an intent understandable is addressed in the following section, in which I sketch out a typology of aesthetic criteria. These criteria are the backdrop against which a practical, empirical examination of source code can take place. They are established from the analysis of a given community, to which the vast majority of practitioners belong, and which has consequently informed the aesthetic standards of other communities of software practitioners (e.g. obfuscation contests, esolangs, code poetry). Second, do aesthetics in any kind of code always aim at making something intelligible?
That is, is the aesthetic of code exclusively transitive, relating to something other than itself, or can it be intransitive, referring only to itself? This opens onto a discussion of whether functionality is an essential part of aesthetics (i.e. “Beauty that you can use”[14]), and of the extent to which there are contexts and media in which beauty cannot exist without an external aim.
There therefore seems to be a necessity for beauty: since programming is an inherently complex activity[15], dealing with abstract concepts as raw materials, an aesthetically pleasing piece of source code is an easily understandable one. In other words, aesthetics enable the development of mental models[16], mainly through the process of metaphor, connecting immediate sensual manifestations to abstract, shared ideas[17]. The constraint here is to communicate what the program is (as a conjunction of what a program does and how it does it), based on the writer-programmer's mental model of the problem at hand, to the reader-programmer, using a symbolic language which is theoretically able to express any computation, while at the same time being limited in practice to instruction sets and syntaxes which don't adequately cover the need to express the intent of the work.
Source code thus needs to communicate something beyond itself: what the code does, how it does it, and how it approaches the problem domain. The object, the manner and the context are the primary foci of the reader; that is, if the code were written differently, this would not ultimately harm the purpose of reading it (even though it would still slow down the process of doing so). There remains, however, the question of the writer's skill. Even though a significant part of code can be considered a semantic interface which should ultimately become as invisible as possible[18] (being self-explanatory, it allows the reader to get directly to the problem without stumbling on the syntax), the aesthetic nature of a code-text can be a testament to the skill of its writer: as the number of lines of code diminishes, each of them becomes more and more essential[19]. Throughout the corpus analysis, references to Antoine de Saint-Exupéry's quote appear regularly: “Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away.” Beyond highlighting a desire for a literary connection, this citation also hints at the iterative process of writing code: adding, then removing (possibly in a loop).
Such a process echoes the practices of craftsmanship rather than those of science, and it seems necessary not to overlook that connection, and to ask: what are the relationships between the aesthetic criteria of craftsmanship and those of source code? Indeed, several[20] authors[21] have alluded to programming as a craft[22], but haven't focused specifically on the parallels in form. Some of these criteria include clarity, mastery, cooperativity and utility, and will be further explored in this research. In particular, the criterion of mastery excludes, for some, any code-text which doesn't solve an interesting problem (e.g. “a user-login form cannot be beautiful”). More generally, deeply ingrained mottos and heuristics such as DRY (Don't Repeat Yourself), KISS (Keep It Simple, Stupid) and SOLID[23] act as essential guides for writing software in a professional context, and underpin all other aesthetic standards in this community of practitioners.
As mentioned above, the corpus of textbooks, essays, online blog posts and comments addressing what makes source code beautiful or aesthetically pleasing has been analyzed according to three main categories. These categories of structure, syntax and vocabulary are all related to formal manifestations in source code. Structure is defined by the relative location of a particular statement within the broader context of the code-text, as well as by the groupings of particular statements in relation to each other and to the other groups of statements within the code-text. This also includes questions of formatting, indenting and linting as purely pattern-based formal arrangements. Syntax is defined by the local arrangement of tokens within a statement, including control-flow statements (and is therefore not restricted to single-line statements). It also includes language-specific choices, referred to as idioms, and more generally the kind of statements needed to achieve the task at hand (e.g. using an `Array` or a `struct`). Finally, vocabulary refers to the user-defined elements of the source code, in the form of variables, functions, classes and interfaces. Unlike the two preceding categories, this is the only one in which the writer can come up with new tokens.
Structure, as the highest-level group of criteria, is both easy to grasp and somewhat superficial: most of the criteria which compose it are indicators rather than proof of beautiful code, necessary but not sufficient. It can be further divided into surface structure and deep structure. The criterion for beauty in surface structure is layout, as the spatial organization of statements through the use of line breaks and indentation. While serving additional ends towards understanding, proper layout (whether it follows conventions or deliberately positions itself against them) seems to be the first requirement for beautiful code. In terms of aiding understanding, blank space creates semantic groupings which enable the reader to see, at a glance, what the decisive moments[24] are in the code's execution, and is presented by some as akin to paragraphs in literature[25].
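As a minimal illustration of surface structure, the hypothetical Python function below uses blank lines to separate its decisive moments (parsing, filtering, summarising), much as paragraphs separate steps in prose. The function name and its `name:score` input format are assumptions made for the example, not drawn from the corpus.

```python
# Hypothetical example: names and input format are illustrative only.
def summarize_scores(raw: str) -> dict:
    """Parse 'name:score' lines and summarise the valid scores."""
    # 1. Parse: turn each non-empty line into a (name, score) pair.
    pairs = []
    for line in raw.splitlines():
        if line.strip():
            name, score = line.split(":")
            pairs.append((name.strip(), float(score)))

    # 2. Filter: keep only scores within the expected 0-100 range.
    valid = [score for _, score in pairs if 0 <= score <= 100]

    # 3. Summarise: report the count and mean of the remaining scores.
    mean = sum(valid) / len(valid) if valid else 0.0
    return {"count": len(valid), "mean": mean}
```

For instance, `summarize_scores("ada: 90\nbob: 120\n\ncy: 70")` discards the out-of-range score and returns a count of 2 and a mean of 80.0. Removing the blank lines would not change the behaviour, only the reader's ability to see the three moments at a glance.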
This aid to understanding is further highlighted by a deep-structure paradigm of conceptual distancing: the criterion according to which statements that relate to each other are located close to each other, so that visual appearance reflects the conceptual structure of the code (some argue that the data which the code processes precedes the code itself in dictating its layout). While an overarching principle, it is vague enough to be open to interpretation by practitioners, and is therefore unable to act as a strict normative criterion (e.g. should every code-text follow the stepdown rule of function declaration when written in a language which doesn't enforce it? Should local variables all be declared at the beginning of the highest scope to which they belong, or as close as possible to where they are used? Should all data be prepared and then processed, or should each piece of data be prepared and processed in its own context?). The explicit criterion of local coherence (what is next to each other is related to each other) echoes similar questions about coherence and consistency in traditional aesthetic theory[26]. Indeed, proponents of local coherence in source code imply that a beautiful piece of code should not have to rely on input and output, and should therefore be entirely autotelic. Such an assumption runs contrary to the reality of software development as a practice, and of software as an object entangled in the world, and therefore could not qualify as an aesthetic criterion applying to codebases across communities of practitioners.
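The stepdown rule mentioned above (as popularised by Martin's Clean Code) can be sketched as follows: the high-level function comes first and each helper it calls is declared just below it, so that the text reads top-down, from intention to detail. Python does not enforce this ordering, since a name inside a function body is only looked up when the function is called; the order is therefore purely the writer's choice. The invoice functions are hypothetical.

```python
# Hypothetical example of the stepdown rule: read from top to bottom.
def render_invoice(items):
    """Top level: reads as a summary of what happens."""
    lines = format_lines(items)
    total = compute_total(items)
    return "\n".join(lines + [f"TOTAL {total:.2f}"])


def format_lines(items):
    """One level down: how each (name, price) item becomes a line of text."""
    return [f"{name} {price:.2f}" for name, price in items]


def compute_total(items):
    """One level down: how the total is obtained."""
    return sum(price for _, price in items)
```

Calling `render_invoice([("tea", 2.5), ("cake", 3.0)])` works even though the helpers are defined after the function that uses them, which is precisely why the ordering remains a matter of interpretation rather than a rule imposed by the language.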
A corollary of conceptual distancing is conceptual symmetry, which states that groups of statements which do the same thing should look the same. It then becomes possible to catch a glimpse of patterns, through which readers know what does what from a brief overview. Conceptual distancing can be further improved by conceptual uniqueness (or unicity), which demands that all the statements grouped together refer to a single action: complex enough to be useful, and simple enough to be graspable. Following this, beautiful code is “the code that does the job while using the least amount of different ideas”, which, according to the DRY principle, implies the ideal of a linear relationship between the number of lines of code and the amount to be understood. This is possibly an aesthetic standard, but it is unclear to what extent it is a sublime standard.
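A minimal sketch of conceptual symmetry and the DRY principle, with hypothetical names: the first version repeats the same idea (clamping a value to a range) three times with slight variations, while the second names that idea once and applies it uniformly, so that statements which do the same thing literally look the same.

```python
# Hypothetical example. Repetitive version: one idea written out three times.
def normalize_rgb_repetitive(r, g, b):
    if r < 0:
        r = 0
    if r > 255:
        r = 255
    if g < 0:
        g = 0
    if g > 255:
        g = 255
    if b < 0:
        b = 0
    if b > 255:
        b = 255
    return (r, g, b)


# DRY version: the idea is named once, then applied to every channel.
def clamp(value, low=0, high=255):
    return max(low, min(value, high))


def normalize_rgb(r, g, b):
    return tuple(clamp(channel) for channel in (r, g, b))
```

Both functions behave identically; the second simply brings the number of lines closer to the number of ideas to be understood.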
Interestingly, this last statement contradicts a different aesthetic, held by another group of software practitioners: hackers. In their case, beautiful code is code which manages to pack the maximum number of ideas into a minimal number of lines, both in obfuscation practices and in the writing of one-liners. This apparent conflict between clarity and complexity can be resolved in certain codebases, in which the lines of code are few but the implications are many[27].
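The contrast can be made concrete with a small, hypothetical example: both functions below list the prime numbers below a bound using trial division. The first packs every idea into a single expression, in the spirit of a one-liner; the second unfolds the same algorithm one idea per line. Neither is presented here as the more beautiful one, since that judgment depends on the community of readers.

```python
# Hypothetical example. One-liner: range, trial division and the
# square-root bound are all folded into a single expression.
def primes_below_compact(n):
    return [k for k in range(2, n) if all(k % d for d in range(2, int(k ** 0.5) + 1))]


# Unfolded version: the same trial-division algorithm, step by step.
def primes_below_explicit(n):
    primes = []
    for candidate in range(2, n):
        is_prime = True
        divisor = 2
        while divisor * divisor <= candidate:
            if candidate % divisor == 0:
                is_prime = False
                break
            divisor += 1
        if is_prime:
            primes.append(candidate)
    return primes
```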
Syntax, as the mid-level group of criteria, deals most specifically with the two main components of the implementation: the algorithm and the language. Beautiful syntax would denote a conceptual understanding (Knuth states that such understanding is what makes writing code an art, and has devoted his magnum opus to the study and communication of algorithms). However, I will argue that algorithms cannot be separated from the language in which they are written. Indeed, most algorithms are expressed first as pseudo-code and then implemented in the language most suited to them; this seems to be a contemporary version of the practice of the 1950s, when computer scientists would devise algorithms with pencil and paper and then leave their implementation in the hands of entirely different individuals: computers.
Beautiful syntax in code responds to this limitation. Since algorithms must be implemented in a certain context, with a certain language, it is the task of the writer to do so as well as possible with respect to the language she is currently working in. Therefore, beautiful syntax is idiomatic syntax. This involves knowing the possibilities that a given language offers and, in the spirit of the craftsmanship ethos noted previously, working with the language rather than against it. These sets of aesthetic criteria thus become entirely dependent on the syntactic context of the language itself, and can only be established with regard to each language (e.g. knowing which keywords or operators should be avoided, such as `unless` in Perl or `*` in C, or knowing when to use decorators in Python or the `...` operator in ECMAScript). Here, syntax also follows the idea of conciseness touched upon at the structural level: a writer can only be concise if she knows how the language enables her to be concise. Knowing the algorithm and the problem domain will not, by itself, help to meet this criterion. The extent to which a syntax is idiomatic is a good indicator of the aesthetic value of a code-text.
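Since the paragraph mentions knowing when to use decorators in Python, here is a minimal sketch of what idiomatic syntax can look like there. Both functions memoise a recursive Fibonacci computation: the first manages its cache by hand, as one might in a language without decorators, while the second relies on the standard library's `functools.lru_cache` decorator. The Fibonacci example is an illustrative choice, not taken from the corpus.

```python
from functools import lru_cache

# Hypothetical example. Hand-managed cache: bookkeeping and algorithm are interleaved.
_fib_cache = {0: 0, 1: 1}


def fib_manual(n):
    if n not in _fib_cache:
        _fib_cache[n] = fib_manual(n - 1) + fib_manual(n - 2)
    return _fib_cache[n]


# Idiomatic Python: the decorator states the intent (memoisation),
# and the body states only the algorithm.
@lru_cache(maxsize=None)
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)
```

The two versions compute the same values; the difference lies in how much of the text is devoted to the idea itself rather than to the machinery around it.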
It is difficult to establish a hierarchy between separate idioms, since they operate under different paradigms and assumptions. A developer who finds that she can best communicate her ideas in Java will find Java beautiful; a developer who finds that she can best communicate her ideas in Go will find Go beautiful. This state of affairs seems to be part of the reason why online platforms are full of endless “which language is better?” discussions. A syntactic criterion which acts as a response to these discussions is consistency. While there might be minor reasons to prefer calling functions on objects rather than calling functions from objects (in order to avoid output arguments), this minor increase in aesthetic value, through a display of skill and the removal of cognitive noise, doesn't compensate for the possible increase in cognitive noise if those two ways of calling functions are alternated arbitrarily. This hints at the possibility that aesthetics here are also a game of trade-offs.
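The distinction between calling a function from outside an object, passing the object as an output argument, and calling it on the object can be sketched as follows; the `Report` class and its methods are hypothetical. The point made above is that either style, applied consistently, reads more easily than an arbitrary alternation between the two.

```python
# Hypothetical example class.
class Report:
    def __init__(self, body):
        self.lines = [body]

    # Function called "on" the object: the report changes itself.
    def append_footer(self, text):
        self.lines.append(f"-- {text}")

    def render(self):
        return "\n".join(self.lines)


# Function called "from" outside, with the report as an output argument:
# the reader has to infer that `report` is modified in place.
def append_footer(report, text):
    report.lines.append(f"-- {text}")
```

Usage: `report.append_footer("page 1")` and `append_footer(report, "page 1")` produce the same result; mixing both forms in one codebase is what adds cognitive noise.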
Beyond syntactic consistency, the question of bringing heuristics from one language into another raises yet another aesthetic criterion, that of linguistic reference. Implicitly referencing another language in a code-text (e.g. “this is how we do it now that we have C++, but the current code is written in C, so one can bring in ideas and syntax that are native to C++” or “since Ruby can qualify as a Lisp-like language, one can write lambda functions in an otherwise object-oriented language”), a code-switching of sorts, can communicate a deep understanding not just of one language but of an ecosystem of languages, while still maintaining clarity. This conveys a feeling of higher understanding, akin to perceiving all programming languages as ultimately just “tools for the job”, whose purpose is nonetheless to get a concept across minds as fully and clearly as possible. However, a misguided intention to switch between two languages, or a mishandled implementation, can propel a code-text further down the gradient of ugliness. In such a case, the concept communicated would be obscured by the conflicting idioms (e.g. writing p5.js, a JavaScript implementation of a Java-based syntax, within an HTML document forces the coexistence of two distinct syntaxes, made to cohabit for purposes of platform distribution rather than code clarity), would reveal a lack of mastery of the unique aspects of the working language, and would therefore fail to fulfill the aesthetic criterion of idiomaticity.
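A minimal sketch of a mishandled linguistic reference, using a hypothetical `Temperature` class: the first version carries Java's getter/setter idiom into Python, while the second uses Python's own `property`, which keeps plain attribute syntax while still validating values. This is offered as one commonly discussed case; whether a given transfer between languages is felicitous remains a judgment of the reading community.

```python
# Hypothetical example. Java idiom transplanted into Python:
# explicit getter and setter methods.
class TemperatureJavaStyle:
    def __init__(self, celsius):
        self.set_celsius(celsius)

    def get_celsius(self):
        return self._celsius

    def set_celsius(self, value):
        if value < -273.15:
            raise ValueError("below absolute zero")
        self._celsius = value


# Python idiom: plain attribute access, with validation behind a property.
class Temperature:
    def __init__(self, celsius):
        self.celsius = celsius  # goes through the setter below

    @property
    def celsius(self):
        return self._celsius

    @celsius.setter
    def celsius(self, value):
        if value < -273.15:
            raise ValueError("below absolute zero")
        self._celsius = value
```

With the second version, readers write `t.celsius = 25` as they would for any attribute, rather than `t.set_celsius(25)`.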
Finally, a syntax with high aesthetic value is a syntax which favors a natural-language reading flow. For instance, of the two alternatives in Ruby, `if people.include? person` and `if person.in? people` (the latter provided by Rails' ActiveSupport extensions rather than by core Ruby), the second is considered more beautiful than the first, since it adapts to the reader's habit of reading human languages. However, the succinctness and clarity of compilable code is not to be sacrificed for the sake of human-like reading qualities. Indeed, a criterion for ugliness in a code-text is verbosity, the useless addition of statements without an equivalent addition of functionality. This testifies to the precedence of machine idioms over human idioms, the latter only coming into play if the verbosity of the statement remains the same.
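Python offers a comparable case to the Ruby example above: its `in` operator lets a membership test read close to the English sentence, while an explicit loop adds statements without adding functionality, which is the kind of verbosity described as ugly. The function names below are hypothetical.

```python
# Hypothetical example. Verbose: several statements to express one membership test.
def is_invited_verbose(person, people):
    for someone in people:
        if someone == person:
            return True
    return False


# Reads like the sentence "person in people": same behaviour, no extra statements.
def is_invited(person, people):
    return person in people
```

Here the human-like reading and the machine idiom coincide, so nothing is traded away for readability.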
Vocabulary, as the only component in this framework which involves words that can be (almost) entirely invented by the writer, is often the most discussed in the literature on beautiful code among professional software developers. Aesthetics here deal mostly with beautiful names and respect for conventional knowledge. It is the level of aesthetic standards which takes into account, first and foremost, the readership of a given code-text.
Of the two hard problems of programming (the other being cache invalidation), the most frequent one is naming[28]. One reason for this might be that naming, as a linguistic act, is an inherently social activity[29], and a name is therefore an utterance which only makes sense when made in the expectation of someone else. Coming up with a variable or function name is often more time-consuming when done alone[30]. Naming, furthermore, aims not just at describing, but at capturing the essence of an object or of a concept. Flusser sees poetry as the bringing-forth of that which is conceivable but not yet speakable, through its essence, in order to make it speakable in prose[31]. It can therefore be argued that good, efficient names in code are those which can communicate the essence of the concept “at hand”, and through this touch of the sublime are the key to beautiful code.
On a purely sensory level (visual and auditory), the aesthetic criteria related to naming are character length and pronounceability. Visually, character length can indicate the relative importance of a named concept within the greater structure of the code-text. Variables with shorter names are more important: they demand more cognitive attention and offer greater intelligibility than longer variable names, which only need to be “stored in memory” by the reader for a shorter amount of time. These visual cues, again, alleviate cognitive pain when trying to understand code and therefore, within the aims of source code aesthetics, hold greater aesthetic value when respected. Pronounceability, meanwhile, takes into account the basic human action of “speaking in one's head”, and therefore acknowledges the requirement for source code to be communicable amongst human readers. This criterion belongs to the category of aesthetic criteria which are required, but not sufficient, for beautiful code.
Equally visual, but aesthetically pleasing for a different, typographical reason, is the casing of names. Dealing with the constraint that variable names cannot contain whitespace characters, casing has resulted in the establishment of conventions which pre-exist the precise understanding of what a word denotes, by first placing that word into a category (all-caps denotes a constant, camelCase denotes a multi-word variable, and words with an initial capital indicate classes or interfaces). By using multiple cues (here, typographical, then semantic, as explained below), casing again helps with understandability. Furthermore, casing, by its existence as a convention, implies a social community of writers and readers, acknowledges that both writer and reader belong to that community, and turns the code-text from a readerly text further into a writerly one[32].
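Casing conventions vary between language communities. The sketch below follows Python's PEP 8 style guide, which, as in the paragraph above, uses all-caps for constants and initial capitals (CapWords) for classes, but uses snake_case rather than camelCase for multi-word variables and functions; camelCase is the Java and JavaScript convention. The names are hypothetical. Python enforces none of this, so the casing signals a category to the reader, not to the interpreter.

```python
# Hypothetical names, cased according to PEP 8.
MAX_RETRIES = 3  # all caps: a constant, not meant to be reassigned


class HttpClient:  # CapWords: a class
    def __init__(self, base_url):
        self.base_url = base_url  # snake_case: a multi-word attribute

    def build_url(self, path):  # snake_case: a multi-word method
        return f"{self.base_url.rstrip('/')}/{path.lstrip('/')}"


retry_count = 0  # snake_case: an ordinary variable
```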
Following these visual, auditory and typographical criteria, an aesthetically pleasing vocabulary is one which strictly names functions as verbs and variables as nouns. In the vein of making a correspondence between machine language and human language, there is here a clear mapping between the two: functions do things and variables are things. The other way around, even when applied consistently, functions named as nouns and variables named as verbs hint at what they are not: they are counter-intuitive and ultimately confusing, and confusion brings ugliness. Furthermore, the noun given to a variable should be a hint towards the concept addressed, and should ideally address what it is, how it is used, and why it is present. These three aims are not necessarily easy to achieve at the same time, but finding one word which, through multiple means, points to the same end is an aesthetic goal of source code writers. In particular, limiting a name to answering only one of those questions (only what, only how, or only why) confuses the reader more than it enlightens her. A beautiful name is one which differentiates between value (obvious, decontextualized, and therefore unhelpful, as seen in the general disapproval of magic numbers) and intention, informing the reader not just about the current use, but also about possible future uses, in code that is written or yet to be written. Here, then, we see a paradox between the direct conceptual relationship of a name to what it denotes, and the multiple meanings that it embodies (its description, its desired immediate behaviour, and its purpose).
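A minimal sketch of these naming criteria, with hypothetical names and a hypothetical 30-day trial period: the first function is named as a noun and relies on a magic number, so the reader sees the value but not the intention; the second names the function with a verb and names the constant after its purpose, answering what the value is and why it is there.

```python
from datetime import datetime, timedelta


# Hypothetical example. Noun-named function and a magic number:
# the value 30 carries no intention.
def expiry(created):
    return created + timedelta(days=30)


# Verb-named function; the constant's name states what it is and why it exists.
TRIAL_PERIOD_DAYS = 30


def compute_trial_end(signup_date):
    return signup_date + timedelta(days=TRIAL_PERIOD_DAYS)
```

For a signup on `datetime(2020, 1, 15)`, both functions return 14 February 2020; only the second tells the reader why.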
While, in the community of software developers, variable names should then have a 1:1 mapping with the object or concept they denote, this isn't the case in other communities, whether in those that rely on obfuscation, in which confusion becomes beautiful, or in poetic code, in which double meaning brings an additional, different understanding which ultimately enriches the complexity of the reading[^32]. This aesthetic criterion of double meaning comes from poetry in human languages, in which layered meanings are aesthetically pleasing because they point to the unutterable, and as such, to the sublime. The way these two communities address the aesthetic problem of naming will be explored further.
A final aesthetic criterion for vocabularies is the limitation of function arguments. Going back to the structural criterion above of limiting input/output and keeping groups of statements conceptually independent, limiting function arguments addresses this requirement at the level of vocabulary, demonstrating in passing the relative porosity of those categories. Indeed, the naming of variables also reveals the choice of adequate data structures, echoing those who claim that the data on which the code operates can never be ignored, and that beautiful code is code which takes that data into account and communicates it, and its mutations, in the clearest (most intelligible) possible way.
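A minimal sketch of limiting function arguments, with hypothetical names and a made-up pricing rule: grouping related values into a single named data structure (here a standard-library `dataclass`) reduces the number of inputs a reader must track, and names the data on which the function operates.

```python
from dataclasses import dataclass


# Hypothetical example. Many positional arguments: the reader must
# remember their order and meaning at every call site.
def shipping_cost_long(weight_kg, length_cm, width_cm, height_cm, express):
    volume_l = length_cm * width_cm * height_cm / 1000
    base = max(weight_kg, volume_l / 5) * 2.0  # made-up pricing rule
    return base * (1.5 if express else 1.0)


# The same inputs grouped into one named data structure.
@dataclass
class Parcel:
    weight_kg: float
    length_cm: float
    width_cm: float
    height_cm: float

    @property
    def volume_l(self):
        return self.length_cm * self.width_cm * self.height_cm / 1000


def shipping_cost(parcel, express=False):
    base = max(parcel.weight_kg, parcel.volume_l / 5) * 2.0  # same made-up rule
    return base * (1.5 if express else 1.0)
```

The call `shipping_cost(Parcel(2.0, 20, 20, 20), express=True)` reads as a sentence about a parcel, whereas the five-argument version leaves the reader to count positions.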
Comments in code do not seem to fall clearly into any of the three categories above. Ignored by definition by the compiler or interpreter, comments can be erroneous statements which persist in an otherwise functional codebase, and are therefore not trusted by experienced, professional software practitioners. In this configuration, comments seem to exist as compensation for a lack of functional aesthetic exchange. By functional aesthetic exchange I mean an exchange in which a skilled writer is able to be understood by a skilled reader with regard to what is being done and how. If either of these conditions fails (the writer isn't skilled enough and relies on comments to explain what is going on and how it is happening, or the reader isn't skilled enough to understand the code without comments), then comments are there to remedy that failure, and are therefore an indicator (but, again, not a proof) of non-beautiful code. For instance, referencing a variable name in a comment is a sure indicator of a message which refers to the what/how of a group of statements and is on the verge of stating the obvious (if not already stating it).
The situation in which comments seem to be tolerated is when they provide contextual information, thereby (re-)anchoring the code in a broader world. For instance, this is achieved by offering an indication as to why a given action is taken at a particular moment in the code. This particular use of comments seems to bypass the aesthetic criterion of code being self-explanatory, but nonetheless integrates the criterion of code being writable: a piece of code which, by its appearance, invites the reader to contribute to it, to modify it. As such, in an educational setting (from a classroom to an open-source project), comments are welcome, but they are rarely cited as a criterion for beautiful code, which seems to indicate that the appreciation of beautiful code does require a certain level of skill, even though this requirement isn't one-dimensional: skills can exist along multiple, independent axes.
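The two uses of comments can be contrasted in a short, hypothetical sketch: the first comment restates the variable name and the operation, and so states the obvious; the second gives contextual information about why the code pauses, which the code alone cannot convey. The function, its parameters and the rate-limit rationale are all invented for the example.

```python
import time


# Hypothetical example function.
def fetch_all(fetch_page, page_count):
    results = []
    for page in range(page_count):
        # Append the page to results.
        # (Redundant: this comment only restates the line below.)
        results.append(fetch_page(page))

        # Pause between requests because the (hypothetical) upstream API
        # rejects clients that send too many requests per second.
        # (Contextual: explains why, which the code itself cannot.)
        time.sleep(0.01)
    return results
```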
expand from just practitioners
these understandings could be the interplay between theoretical, craft and literature
https://www.americanscientist.org/article/cultures-of-code
https://josephg.com/blog/3-tribes/
https://blog.codinghorror.com/the-two-types-of-programmers/
https://mkdev.me/en/posts/the-three-types-of-programmers
to what extent these categories match what Wittgenstein has called forms of life, socio-cultural contexts of use, underpinned by normative activities (e.g. technical writing)
Dijkstra, Notes on Structured Programming
Knuth, The Art of Computer Programming
Kernighan and Plauger, The Elements of Programming Style
Martin, Clean Code
Oram and Wilson (eds.), Beautiful Code
Chandra, Geek Sublime
Cayley, The Code Is Not The Text (Unless It Is The Text)
Boswell and Foucher, The Art of Readable Code
Oram and Wilson (eds.), Beautiful Code
Dijkstra, luncheon talk
Jay Wright Forrester
Goodman, Languages of Art
Galloway, The Interface Effect
Nabokov on Flaubert
Sennett
Dijkstra, craftsman vs. scientist
Software Craftsmanship
SOLID: single responsibility principle, open/closed principle, Liskov substitution principle, interface segregation principle, dependency inversion principle
Sennett
Matz
further-research-needed
Doom
(attributed to Phil Karlton), http://karlton.hamilton.com/quotes/index.cgi
Bakhtin
why is it that functions are easier to name than variables?
Flusser, on doubt
Barthes
- **syntax**
- idiomatic usage and practice is a requirement.
- some vocabulary in languages is off-limits (`unless` in Perl, `*` in C, etc.) (related to vocabulary, except not quite, because it's a `keyword`)
- cross-referencing language knowledge is a tricky subject. using the idioms of one language in another one (using ideas from C++ in C code is good, but writing Ruby the way you write Java is not good).
- do not reference variable names in comments (redundancy, *DRY*)
- natural language flows (Ruby can shine at this, because Ruby has multiple ways of doing it: `if people.include? person` vs. `if person.in? people`)
- as similar as possible to natural language while retaining as much as possible of the CLARITY of code
- eliminate nesting (one line instead of three)
- ways of calling functions *on* objects or calling functions *from* objects (aka no output arguments)
giving up the understanding in order to understand better (cognitive noise is related to the level of skill)
a problem of a lot of research is inventing a new language without inventing new ideas
words as problem solvers (in reference to the book pat found, where the author states that programming is treated as a management problem, and not a mathematical one: but in both cases, we’re still talking about source code (and to what extent is it always the “same” source code?), a unique object/practice/paradigm which enables us to solve problems (and now, to what extent are these problems dependent on source code? or is it because source code (as computation) can solve theoretically any problem? and is that beautiful in itself?))
~~while it’s nice to have an overview of humanities at the beginning, it would be interesting to have reconciliation/reconsideration of those views by the end of this paper~~
relationship to art? yes, but tangential question
question to the understanding of beauty? yes, but tangential question
~~apparent symmetry vs. asymmetry of reading vs. writing~~ taken into account, see above
information is value, or information has value? (just because you have it, or because you know how to use it?)