Thursday, May 10, 2012

Scala. Understanding at a Lower Price.

I hope "Why Scala?" article sparked enough interest to dive into even more code.

Here is an edited quote from Oleg Ilyenko's blog:
hacking-scala.posterous.com/essential-scala-collection-functions
Let me ask you a question. How many times you wrote similar code?
case class User(id: Int, userName: String)

val users: List[User] = // ....
val resultUsers = new ArrayBuffer[User]

for (i <- 0 until users.size) {
    if (users(i).userName != "test") {
        resultUsers += users(i)
    }
}
Many imperative languages like Java, C, C++ have very similar approach for iteration. Some of them have some syntactic sugar (like Java for each loop). There are many problems with this approach. I think the most important annoyance is that it’s hard to tell, what this code actually does. Of course, if you have 10+ years Java experience, you can tell it within a second, but what if the body of the for is more involving and has some state manipulatin? Let me show you another example:
def formatUsers(users: List[User]): String = {
    val result = new StringBuilder 

    for (i <- 0 until users.size) {
        val userName = users(i).userName

        if (userName != "test") {
            result.append(userName)

            if (i < users.size - 1) {
                result.append(", ")
            }
        }
    }

    return result.toString
}

Was it easy to spot the purpose of this code? I think more important question: how do you read and try to understand this code? In order to understand it you need to iterate in your head – you create simple trace table in order to find out how values of i and result are changing with each iteration. And now the most important question: is this code correct? If you think, that it is correct, then you are wrong! It has an error, and I leave it to you to find it. It’s pretty large amount of code for such a simple task, isn’t it? I even managed to make an error it.

I’m going to show you how this code can be improved, so that intent is more clear and a room for the errors is much smaller. I’m not going to show you complete Scala collection API. Instead I will show you 6 most essential collection functions. These 6 function can bring you pretty far in your day-to-day job and can handle most of your iteration needs. I going to cover following functions: filter, map, flatMap, foldLeft, foreach and mkString.

Also note, that these functions will work on any collection type like List, Set or even Map (in this case they will work on tuple (key, value))

filter

filter is easy one. Let me show you how it can be used in order to simplify my first example:
val resultUsers = users filter (_.userName != "test")
_
As you can see, what left is most essential part of my algorithm. filter takes a predicate function as it’s argument (function that returns Boolean: User => Boolean in my example) and uses it to filter the collection. 
map

map converts one collection into another with the help of your function. Here is an example:
val userNames = users map (_.userName)
_
In this code I just converting my list of users in list of user names. As you can guess userNameshas type List[String].

...

Improving formatUsers

Now that you already know all of the essential tools that collection framework provides us, it should be straightforward to write a better implementation of the formatUsers method. Here is the whole implementation:
def formatUsers(users: List[User]): String = 
    users map (_.userName) filter (_ != "test") mkString ", "
_
The first and most important thing to notice – it’s correct. I had not enough room to make the same error. I also was able to extract the most essential parts of the algorithm and put it in front of you. Boilerplate and language ceremony is also reduced dramatically.

Another important aspect of this implementation is how you reading and writing it. Now you don’t think in terms of iterations and variable mutation. You are concentrating yourself on the set of collection transformations. Each transformation/step produces some correct and meaningful result. In my example I have three distinct steps: map, filter and make string. Each of them produces something that is valid and self contained.

In imperative version I had only 2 meaningful states: at the beginning and at the end of the method. Everything in between is broken/incomplete state, that can’t be used for anything else. When I’m trying to understand it, I need to keep the whole algorithm in head in order to understand how it works and what results it can produce. In contrast, in my function version each step is self contained and can be analyzed separately.

In my new implementation I can, without any code modifications, extract users map (_.userName), userNames filter (_ != "test") or safeUserNames mkString ", " to some methods and they can potentially be reused by other code. So my ability to reuse this code is also increased greatly.

...

And with one more method call you can run your computations in parallel on many CPU cores. Or maybe even use ScalaCL lib and speed up large collections processing up to 10x-100x by running on a GPU. The Java-like version of the code can't be reused in such scenarios and must be rewritten from scratch.


Research: Programming Style and Productivity
scala-lang.org/node/3069
Assessing the effect of different programming languages and programming styles on programmer productivity is of critical interest. In his paper, Gilles Dubochet, describes how he investigated two aspects of programming style using eye movement tracking. He found that it is, on average, 30% faster to comprehend algorithms that use for-comprehensions and maps, as in Scala, rather than those with the iterative while-loops of Java.

Interestingly, he also finds that for within small scopes, 2 to 16 lines of code, there is no advantage to having intermediate variables with "comprehensible" or "meaningful" names: Dense code with no such names, using arbitrary short names, is just as good.
...
Alex McGuire, who writes mission critical projects in Scala for power trading companies, says of Scala "The conciseness means I can see more of a program on one screen. You can get a much better overview. When I have some mathematical model to write with Java I have to keep two models in my head, the mathematical model itself and the second a model of how to implement it in Java. With Scala one model, the mathematical one, will do. Much more productive.”

With more work, like that of Gilles, the influence of programming style and language design may become more quantifiable.

Gilles Dubochet presented the paper “Computer Code as a Medium for Human Communication: Are Programming Languages Improving?” at the 21st Annual Psychology of Programming Interest Group Conference, Limerick, Ireland, June 24-26, 2009.

I hope you liked Oleg's collection processing example. But Scala features used here to increase understandability, reusability and correctness are useful for much more tasks than just collection processing. I try drawing a bigger picture in the next article.

No comments:

Post a Comment