Wednesday, May 30, 2012

Are Variables Evil?

Now that we know that "Software Engineering is Engineering", let's get back to Scala. Scala is often called a postfunctional or object-functional language. Its designers noticed that object orientation and functional orientation do not contradict each other. Look, for example, at these assumptions:

Object orientation
  • Every value is an object
  • Every operation is a method call
Functional orientation
  • Every operation is a function call
  • Every function is a value you can assign to variables or pass to other functions
Scala
  • Every value is an object
  • Every operation is a method call
  • Every method is a function
  • Every function is an object
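Here is a minimal Scala sketch of the combined list. It assumes Scala 2.13 or 3, and the names `ObjectFunctional`, `double` and `increment` are made up for illustration.

```scala
object ObjectFunctional {
  // Every operation is a method call: 1 + 2 is sugar for calling the + method on 1
  val sum: Int = 1.+(2)

  // Every function is an object: this literal is an instance of Function1[Int, Int]
  val double: Int => Int = x => x * 2

  // Calling a function is a method call on that object: double(21) means double.apply(21)
  val doubled: Int = double.apply(21)

  // A method can be turned into a function value and passed to another function
  def increment(x: Int): Int = x + 1
  val incremented: List[Int] = List(1, 2, 3).map(increment)
}
```

Strictly speaking, a Scala method is not itself an object. When `increment` is passed to `map`, the compiler wraps it in a function value (eta-expansion). That is the sense in which every method can be used as a function.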

But there is more to what is meant by the words "functional programming". Sometimes it means programming without variables, or at least without variables that change. Again, this does not contradict OO programming: we can program without changing variables in a pure OO language, but languages designed for that style usually make it much easier. I bet you've already benefited from an absence of variables. Think SQL. Or even Java. Look at this example from the book:

Programming in Scala, First Edition
by Martin Odersky, Lex Spoon, and Bill Venners
The second main idea of functional programming is that the operations of a program should map input values to output values rather than change data in place. To see the difference, consider the implementation of strings in Ruby and in Java. In Ruby, a string is an array of characters. Characters in a string can be changed individually. For instance you can change a semicolon character in a string to a period inside the same string object. In Java and Scala, on the other hand, a string is a sequence of characters in the mathematical sense. Replacing a character in a string using an expression like s.replace(';', '.') yields a new string object, which is different from s. Another way of expressing this is that strings are immutable in Java whereas they are mutable in Ruby. So looking at just strings, Java is a functional language, whereas Ruby is not. Immutable data structures are one of the cornerstones of functional programming. The Scala libraries define many more immutable data types on top of those found in the Java APIs. For instance, Scala has immutable lists, tuples, maps, and sets.

Another way of stating this second idea of functional programming is that methods should not have any side effects. They should communicate with their environment only by taking arguments and returning results. For instance, the replace method in Java's String class fits this description. It takes a string and two characters and yields a new string where all occurrences of one character are replaced by the other. There is no other effect of calling replace. Methods like replace are called referentially transparent, which means that for any given input the method call could be replaced by its result without affecting the program's semantics.

Functional languages encourage immutable data structures and referentially transparent methods. Some functional languages even require them. Scala gives you a choice. When you want to, you can write in an imperative style, which is what programming with mutable data and side effects is called. But Scala generally makes it easy to avoid imperative constructs when you want, because good functional alternatives exist.
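The excerpt's two ideas can be checked in Scala: immutable data, and referentially transparent methods. This is a sketch assuming Scala 2.13 or 3, and `StringsAndLists` and `countSemicolons` are illustrative names that do not come from the book. `replace` returns a new string and leaves the original alone, and `:+` on an immutable `List` returns a new list. A call to a referentially transparent method can be swapped for its result.

```scala
object StringsAndLists {
  val s = "a;b;c"
  // replace builds a new String; s itself is never modified
  val t = s.replace(';', '.')

  val xs = List(1, 2, 3)
  // :+ returns a new List with 4 appended; xs still holds List(1, 2, 3)
  val ys = xs :+ 4

  // Referentially transparent: the result depends only on the argument
  def countSemicolons(str: String): Int = str.count(_ == ';')

  // So countSemicolons(s) can be replaced by its value 2 without changing the program
  val total = countSemicolons(s) + countSemicolons(s)
  val sameTotal = 2 + 2
}
```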

But why is it useful to avoid mutable state, i.e. changing variables? Do you remember our Scala example of collection processing? By refusing to use mutable state, we made the code easier to understand, more reusable, less prone to bugs and parallelizable. We even lowered the abstraction cost of our code. Immutable state means that every value is assigned to a variable only once. Nothing a later line does can change what an earlier line meant, so we have to understand each line of code only once.
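The collection-processing example mentioned above is not reproduced in this post. The following is a separate, hypothetical sketch of the "assign once" idea. It assumes Scala 2.13 or 3, and both `TwoStyles` and the sum-of-even-squares task are made up. In the imperative version, `total` and `i` hold different values on every pass through the loop. In the functional version, each `val` is bound exactly once, so each line means the same thing wherever you read it.

```scala
object TwoStyles {
  // Imperative style: total and i are reassigned on every iteration
  def imperativeSumOfEvenSquares(xs: Vector[Int]): Int = {
    var total = 0
    var i = 0
    while (i < xs.length) {
      if (xs(i) % 2 == 0) total += xs(i) * xs(i)
      i += 1
    }
    total
  }

  // Functional style: no reassignment; each intermediate value is bound once
  def functionalSumOfEvenSquares(xs: Vector[Int]): Int = {
    val evens = xs.filter(_ % 2 == 0)
    val squares = evens.map(x => x * x)
    squares.sum
  }
}
```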

I've selected some more answers on that topic from Stack Overflow:

Well, there are a couple of aspects to this. Number one, mutable objects without reference identity can cause bugs at odd times. For example, consider a Person bean with a value-based equals method:
Map<Person, String> map = new HashMap<>();
Person p = new Person();
map.put(p, "Hey, there!");

p.setName("Daniel");  // changes a field that equals() and hashCode() depend on
map.get(p);           // => null
The Person instance gets "lost" in the map when used as a key because its hashCode and equality were based upon mutable values. Those values changed outside the map and all of the hashing became obsolete. Theorists like to harp on this point, but in practice I haven't found it to be too much of an issue.
Another aspect is the logical "reasonability" of your code. This is a hard term to define, encompassing everything from readability to flow. Generically, you should be able to look at a piece of code and easily understand what it does. But more important than that, you should be able to convince yourself that it does what it does correctly. When objects can change independently across different code "domains", it sometimes becomes difficult to keep track of what is where and why ("spooky action at a distance"). This is a more difficult concept to exemplify, but it's something that is often faced in larger, more complex architectures.
Finally, mutable objects are killer in concurrent situations. Whenever you access a mutable object from separate threads, you have to deal with locking. This reduces throughput and makes your code dramatically more difficult to maintain. A sufficiently complicated system blows this problem so far out of proportion that it becomes nearly impossible to maintain (even for concurrency experts).
Immutable objects (and more particularly, immutable collections) avoid all of these problems. Once you get your mind around how they work, your code will develop into something which is easier to read, easier to maintain and less likely to fail in odd and unpredictable ways. Immutable objects are even easier to test, due not only to their easy mockability, but also the code patterns they tend to enforce. In short, they're good practice all around!
With that said, I'm hardly a zealot in this matter. Some problems just don't model nicely when everything is immutable. But I do think that you should try to push as much of your code in that direction as possible, assuming of course that you're using a language which makes this a tenable opinion (C/C++ makes this very difficult, as does Java). In short: the advantages depend somewhat on your problem, but I would tend to prefer immutability.
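For contrast with the mutable `Person` key in the answer above, here is a Scala sketch in which the key is an immutable case class. It assumes Scala 2.13 or 3, and `ImmutableKeys`, `Person` and its `name` field are illustrative. A case class gets value-based `equals` and `hashCode` automatically. Because its fields are `val`s, "changing" a person means creating a new one with `copy`, and the key already in the map keeps its hash.

```scala
object ImmutableKeys {
  // Case class: fields are vals, equals/hashCode are derived from them
  final case class Person(name: String)

  val p = Person("Anonymous")
  val map: Map[Person, String] = Map(p -> "Hey, there!")

  // "Renaming" produces a new Person; p is untouched
  val renamed = p.copy(name = "Daniel")

  val original: Option[String] = map.get(p)     // Some("Hey, there!")
  val other: Option[String] = map.get(renamed)  // None: a different key
}
```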

Advantages of stateless programming?

The more pieces of your program are stateless, the more ways there are to put pieces together without having anything break. The power of the stateless paradigm lies not in statelessness (or purity) per se, but the ability it gives you to write powerful, reusable functions and combine them.

Consider all the difficult bugs you've spent a long time debugging.
Now, how many of those bugs were due to "unintended interactions" between two separate components of a program? (Nearly all threading bugs have this form: races involving writing shared data, deadlocks, ... Additionally, it is common to find libraries that have some unexpected effect on global state, or read/write the registry/environment, etc.) I would posit that at least 1 in 3 'hard bugs' fall into this category.
Now if you switch to stateless/immutable/pure programming, all those bugs go away. You are presented with some new challenges instead (e.g. when you do want different modules to interact with the environment), but in a language like Haskell, those interactions get explicitly reified into the type system, which means you can just look at the type of a function and reason about the type of interactions it can have with the rest of the program.
That's the big win from 'immutability' IMO. In an ideal world, we'd all design terrific APIs and even when things were mutable, effects would be local and well-documented and 'unexpected' interactions would be kept to a minimum. In the real world, there are lots of APIs that interact with global state in myriad ways, and these are the source of the most pernicious bugs. Aspiring to statelessness is aspiring to be rid of unintended/implicit/behind-the-scenes interactions among components.
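Here is a small illustration of the composability argument. It assumes Scala 2.13 or 3, and the text-cleaning functions are invented for the example. Pure functions can be chained with `andThen` in different arrangements, because none of them touches shared state that another one depends on.

```scala
object Composition {
  // Three small pure functions: the output depends only on the input
  val trim: String => String = _.trim
  val collapseSpaces: String => String = _.replaceAll("\\s+", " ")
  val lower: String => String = _.toLowerCase

  // Combine them in different ways without any risk of hidden interaction
  val normalize: String => String = trim andThen collapseSpaces andThen lower
  val tidy: String => String = collapseSpaces andThen trim
}
```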

Someone once said that overwriting a mutable value means that you are explicitly garbage collecting/freeing the previous value. In some cases other parts of the program weren't done using that value. When values cannot be mutated, this class of bugs also goes away. – shapr May 11 '09 at 17:05
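The comment's point can be sketched in Scala. This assumes Scala 2.13 or 3, and `Overwriting`, `Report` and `ImmutableReport` are hypothetical names. Suppose another part of the program still holds a shared mutable buffer. Clearing that buffer for reuse silently "frees" data the other part is still using. With an immutable `Vector`, starting over only binds a new value, and the old one stays intact.

```scala
import scala.collection.mutable.ArrayBuffer

object Overwriting {
  // Mutable: the report keeps a reference to the same buffer the producer reuses
  final class Report(val lines: ArrayBuffer[String])

  def mutableVersion(): Int = {
    val buffer = ArrayBuffer("line 1", "line 2")
    val report = new Report(buffer)
    buffer.clear() // producer "starts over", but the report's data is gone too
    report.lines.size
  }

  // Immutable: the report holds a Vector that nobody can clear
  final case class ImmutableReport(lines: Vector[String])

  def immutableVersion(): Int = {
    var current = Vector("line 1", "line 2")
    val report = ImmutableReport(current)
    current = Vector.empty // rebinds the local name; the old Vector is untouched
    report.lines.size
  }
}
```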

How can you do anything useful without mutable state?

I've been reading a lot of stuff about functional programming lately, and I can understand most of it, but the one thing I just can't wrap my head around is stateless coding. It seems to me that simplifying programming by removing mutable state is like "simplifying" a car by removing the dashboard: the finished product may be simpler, but good luck making it interact with end-users.
Just about every user application I can think of involves state as a core concept. If you write a document (or a SO post), the state changes with every new input. Or if you play a video game, there are tons of state variables, beginning with the positions of all the characters, who tend to move around constantly. How can you possibly do anything useful without keeping track of changing values?


Short answer: you can't.
So what's the fuss about immutability then?
If you're well-versed in imperative languages, then you know that "globals are bad". Why? Because they introduce (or have the potential to introduce) some very hard-to-untangle dependencies in your code. And dependencies are not good; you want your code to be modular. Parts of a program should influence other parts as little as possible. And FP brings you to the holy grail of modularity: no side effects at all. You just have your f(x) = y. Put x in, get y out. No changes to x or anything else. FP makes you stop thinking about state and start thinking in terms of values. All of your functions simply receive values and produce new values.
This has several advantages.
First off, no side effects means simpler programs that are easier to reason about. No worrying that introducing a new part of the program is going to interfere with and crash an existing, working part.
Second, this makes programs trivially parallelizable (efficient parallelization is another matter).
Now, even though creating new values every time seems incredibly wasteful in terms of memory for complex values, it doesn't have to be. If you have f(x) = y, and the values x and y are "mostly the same" (e.g. trees which differ only in a few leaves), then x and y can share parts of memory, because neither of them will mutate.
So if this immutable thing is so great, why did I answer that you can't do anything useful without mutable state? Well, without mutability, your entire program would be one giant f(x) = y function. And the same would go for all parts of your program: just functions, and functions in the "pure" sense at that. As I said, this means f(x) = y every time. So e.g. readFile("myFile.txt") would need to return the same string value every time. Not too useful.
Therefore, every functional language provides some means of mutating state. "Pure" functional languages (e.g. Haskell) do this using somewhat scary concepts such as monads, while "impure" ones (e.g. ML) allow it directly.
And of course, functional languages come with a host of other goodies which make programming more efficient, such as first-class functions etc.
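The answer's memory-sharing argument can be checked directly with Scala's immutable `List`. This sketch assumes Scala 2.13 or 3, and `StructuralSharing` is an illustrative name. Prepending an element creates one new cell and reuses the entire old list as its tail, which is safe only because neither list can change. `eq` tests reference identity, not equality.

```scala
object StructuralSharing {
  val xs = List(2, 3, 4)
  // 1 :: xs allocates a single new cell whose tail is xs itself
  val ys = 1 :: xs

  // Dropping the head of ys gives back exactly the same object as xs
  val sharesTail: Boolean = ys.tail eq xs

  // The old value is still usable and unchanged
  val xsUnchanged: Boolean = xs == List(2, 3, 4)
}
```

Lists are the simplest case. Scala's immutable `Vector` and `HashMap` apply the same idea to tree-shaped structures, much like the answer's trees that differ only in a few leaves.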

But sometimes the effort required to avoid mutable state far outweighs its benefits. And of course, there are situations where you can't avoid mutable state at all. Think dynamic GUI. But you can minimize the negatives of mutable state. Think reactive programming. You can read about it in the next article.
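One common way to keep mutable state small is sketched below. It is not the reactive approach mentioned above. It assumes Scala 2.13 or 3, and `ConfinedState`, `Document`, `Event`, `update` and `Editor` are hypothetical names. The idea has three parts: describe the state as an immutable value, compute each new state with a pure function, and confine the only `var` to one small class.

```scala
object ConfinedState {
  // Immutable snapshot of the whole state, e.g. the text of a document being edited
  final case class Document(text: String)

  sealed trait Event
  final case class Typed(c: Char) extends Event
  case object Backspace extends Event

  // Pure transition: the same document and event always give the same new document
  def update(doc: Document, e: Event): Document = e match {
    case Typed(c)  => Document(doc.text + c)
    case Backspace => Document(doc.text.dropRight(1))
  }

  // Replaying a history of events needs no mutation at all
  def replay(events: List[Event]): Document =
    events.foldLeft(Document(""))(update)

  // The only var, confined to one small class at the edge of the program
  final class Editor(initial: Document) {
    private var state: Document = initial
    def handle(e: Event): Unit = { state = update(state, e) }
    def current: Document = state
  }
}
```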
