Tuesday, May 22, 2012

Why We're Writing the Same Code Over and Over

As we saw in the previous article "Understanding at a lower price", Scala provides finer-grained code reuse than a Java-like language. Where in Java you either reuse a whole module as-is or write a similar one from scratch, in Scala you can often reuse just the smaller sub-modules.

You might ask, "What stops me from doing the same thing in Java?" Well, you can define similar sub-modules in Java. But defining and using them is often so much more complex than copy-pasting that it is not worth it.

Let's continue with the collection-processing example. There are libraries for Java collections that let you assemble collection-processing code from predefined parts, much as you do in Scala. But almost nobody uses them. Let's write an equivalent of the one simple line of Scala from the previous article. This Java code uses the Guava (formerly Google Collections) library:
import com.google.common.base.Function;
import com.google.common.base.Predicate;
import com.google.common.collect.FluentIterable;
import com.google.common.base.Joiner;
import java.util.Collection;
import java.util.List;

public class UserHelpers {

  public String formatUsers(final List<User> users) {
    final Collection<String> filteredUserNames = FluentIterable
      .from(users)
      .transform(new Function<User, String>() {
        @Override
        public String apply(final User user) {
          return user.getUserName();
        }
      })
      .filter(new Predicate<String>() {
        @Override
        public boolean apply(final String userName) {
          return !userName.equals("test");
        }
      })
      .toList(); // toList() since Guava 14.0; it was called toImmutableList() in 12.0
   
    return Joiner.on(", ").join(filteredUserNames);
  }

}

This code seems like too much effort for such a simple case. It is an example of how reusing code can actually require more code than writing it from scratch. That is true for Java but not for Scala. So we can say that a programming language can either encourage or discourage writing reusable code.
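For comparison, a from-scratch version with a plain loop might look like the sketch below. The `User` class is a hypothetical stand-in (the article never shows it; only `getUserName()` is assumed), and `PlainUserHelpers` is just an illustrative name.

```java
import java.util.List;

// Hypothetical stand-in for the article's User class; only getUserName() is assumed.
class User {
  private final String userName;

  User(final String userName) {
    this.userName = userName;
  }

  String getUserName() {
    return userName;
  }
}

class PlainUserHelpers {

  // Same result as the Guava version: user names except "test", joined by ", ".
  static String formatUsers(final List<User> users) {
    final StringBuilder result = new StringBuilder();
    boolean first = true;
    for (final User user : users) {
      final String userName = user.getUserName();
      if (userName.equals("test")) {
        continue; // skip the test account
      }
      if (!first) {
        result.append(", ");
      }
      result.append(userName);
      first = false;
    }
    return result.toString();
  }
}
```

The loop is not shorter by much, but it needs no library, no anonymous classes and no extra imports, which is roughly why copy-pasting it tends to win in practice.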

If creating reusable code to eliminate duplication is much harder than just writing similar code again and again, then we usually write similar code again and again, probably accumulating technical debt, until the interest on that debt becomes so high that even painful efforts to repay it become worthwhile.

Less reusable code also means less understandable and buggier code. The human brain understands the world by some kind of divide-and-conquer algorithm. If code cannot easily be divided into meaningful, mostly self-contained pieces, the brain will have a hard time understanding it. Understandability and reusability are linked very tightly. For an example, see Gilles Dubochet's paper from the previous article. One way to think about code reuse is as replacing "How to do"s with "What to be"s, thus lowering the maximum cognitive load while reading code.


The collections example mainly highlighted how easy it is to define and compose functions in Scala. But similar examples can be shown for Scala's easy way to define and compose traits, and for some other features. For example, a clever compiler makes it easier to reuse complex abstractions than is possible in a dynamically typed language.

Sometimes the losses from using either copy-paste or plain Java abstraction rules are so big that it becomes worthwhile to create and use frameworks that extend Java into a completely different language with very different abstraction rules. While this helps in some cases, it brings its own bag of problems, which I'm not going to talk about right now.


Let's define abstraction cost as the additional effort required to write reusable code compared to single-use code. By reusable code I mean code that can be used in different use-cases without copy-paste and more easily than writing it from scratch. Code is more reusable when it is useful in more use-cases. Writing more reusable code usually requires more effort. That is, more reusable code usually has a higher abstraction cost.

At higher levels of abstraction (further from the details of the computer), each new use-case is more often slightly different from the previous one. So code at higher levels of abstraction usually has a higher abstraction cost. The average abstraction cost of a programming language therefore limits the level of abstraction at which code reuse is still profitable. That is why so much similar code is written from scratch today, even though most of it seems to do nothing new.


The high abstraction cost of Java-like languages is also the main reason for the popularity of the "Gang of Four" patterns. As we saw before, code reuse that you get for free in Scala sometimes comes at a high price in Java. Those design patterns suggest that there are cases when it is better to pay that price. But there is no reason why a programming language should require you to pay such a high price. That's why those design patterns have received so much criticism from people such as Peter Norvig (now Director of Research at Google Inc.).
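To make that price concrete, here is a minimal sketch of the Visitor pattern in Java. All names (`Shape`, `Circle`, `Square`, `ShapeVisitor`, `AreaVisitor`) are hypothetical and chosen only for illustration; they come from no library and not from the article.

```java
// One visit method per concrete type; every new operation implements this interface.
interface ShapeVisitor<R> {
  R visitCircle(Circle circle);
  R visitSquare(Square square);
}

// Every shape exposes accept() only so that a visitor can be dispatched on its type.
interface Shape {
  <R> R accept(ShapeVisitor<R> visitor);
}

final class Circle implements Shape {
  final double radius;

  Circle(final double radius) {
    this.radius = radius;
  }

  @Override
  public <R> R accept(final ShapeVisitor<R> visitor) {
    return visitor.visitCircle(this);
  }
}

final class Square implements Shape {
  final double side;

  Square(final double side) {
    this.side = side;
  }

  @Override
  public <R> R accept(final ShapeVisitor<R> visitor) {
    return visitor.visitSquare(this);
  }
}

// The only class that carries actual logic: computing an area.
final class AreaVisitor implements ShapeVisitor<Double> {
  @Override
  public Double visitCircle(final Circle circle) {
    return Math.PI * circle.radius * circle.radius;
  }

  @Override
  public Double visitSquare(final Square square) {
    return square.side * square.side;
  }
}
```

A call looks like `new Square(2.0).accept(new AreaVisitor())`. Only the two method bodies in `AreaVisitor` express the actual computation; the rest is the ceremony the pattern asks you to pay for.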

In the Scala world these patterns get far less attention, because writing reusable code is much easier. You can find example Scala implementations of Singleton, Strategy, Factory, Visitor and Decorator in this blog, quoted here:
amitdev.in/blog/?p=13
"I agree that patterns translate to quite natural and concise code in some languages, but the intend of the pattern may still be important. So lets consider some of the popular patterns in Scala – a language which like Ruby/Lisp etc reduces Pattern implementation to triviality (though its not a dynamic language!).
...
"Design Patterns are good recipes for designing software. However, most of them generally solve a language issue than a design issue. If you have a good Language patterns (at least most of them) will become trivial. For example, in a structured language, the concepts of virtual methods or classes may be a ‘Design Pattern’. Once you move to a powerful language, the design patterns that you deal with will also change. They will be at a higher level of abstraction."

Summing up, we can say that Scala has a lower abstraction cost than Java. Thus in Scala it becomes worthwhile to reuse things we didn't consider reusing before, and as a result we understand them better and have fewer bugs. The next two articles are "Software Engineering is Engineering" and "Are Variables Evil?".

1 comment:

  1. I think you're right about the relative overheads of code-reuse in Java and Scala. This cost is partly in the sheer volume of text you have to type in Java, and partly cognitive in that the intended composition isn't always explicitly expressed or easy to see for the boilerplate.

    To pick a worst-offender for GOF design patterns that are really about language expressivity, the Visitor in Java is almost never needed in Scala, because pattern matching subsumes it.

    I do see factories in Scala code. Implicit parameters can help hide them and lambdas make implementing factory methods almost free.

    Neither Java nor Scala has good out-of-the-box support for the Command design pattern. Perhaps when 2.10 comes out, this could be addressed. Summer-of-code anyone?

    Lastly, while heavyweight GOF-style design patterns are not as needed in Scala, other abstractions are very prevalent. Scalaz collects a whole bunch of them together. These tend to abstract on `behaviour' composition - building up complex but semantically understandable function and data flows, whereas GOF-style design patterns abstract on `aggregation' composition - building up complex but semantically understandable object graphs.
