Friday, May 15, 2009

Scala Functions vs Methods

Scala has both functions and methods. Most of the time we can ignore this distinction, but sometimes we have to deal with the fact that they are not quite the same thing.

In my Scala Syntax Primer I mention that I use the terms method and function interchangeably in the discussion. This is a simplification. In many situations you can ignore the difference and think of them as the same thing, but occasionally the difference matters. This is analogous to how most of us treat mass and weight. In our daily lives on the surface of planet Earth, we treat them as interchangeable, with 1 kg being the same as 2.2 pounds. But they are not quite the same thing: when an astronaut walks on the surface of the moon, his mass (kilograms) has not changed but his weight (pounds) is only about one sixth of what it was on Earth.

In contrast to kilograms vs pounds, you are rather more likely to come across a situation in which the distinction between Scala functions and methods is important than you are to be walking on the surface of the moon. So when can you ignore the difference between functions and methods, and when do you need to pay attention to it? You can answer that question once you understand the difference.

A Scala method, as in Java, is a part of a class. It has a name, a signature, optionally some annotations, and some bytecode.

A function in Scala is a complete object. There is a series of traits in Scala to represent functions with various numbers of arguments: Function0, Function1, Function2, etc. As an instance of a class that implements one of these traits, a function object has methods. One of these methods is the apply method, which contains the code that implements the body of the function. Scala has special "apply" syntax: if you write a symbol name followed by an argument list in parentheses (or just a pair of parentheses for an empty argument list), Scala converts that into a call to the apply method for the named object. When we create a variable whose value is a function object and we then reference that variable followed by parentheses, that gets converted into a call to the apply method of the function object.
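As a minimal sketch of the apply syntax described above, the following writes a function object out by hand and calls it both ways. The names ApplyDemo, addThree and addThreeLiteral are made up for this illustration, and the code assumes a reasonably recent Scala 2 or Scala 3 compiler.

```scala
object ApplyDemo {
  // A function object written out by hand: an instance of Function1
  // whose apply method holds the body of the function.
  val addThree: Function1[Int, Int] = new Function1[Int, Int] {
    def apply(x: Int): Int = x + 3
  }

  // The same function written as a function literal; Int => Int is
  // shorthand for Function1[Int, Int].
  val addThreeLiteral: Int => Int = (x: Int) => x + 3

  // Name followed by an argument list: the compiler turns this into addThree.apply(2).
  val viaSugar: Int = addThree(2)

  // The explicit call to the apply method.
  val viaApply: Int = addThree.apply(2)
}
```

Both viaSugar and viaApply hold the same value, because the first form is only shorthand for the second.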

When we treat a method as a function, such as by assigning it to a variable, Scala actually creates a function object whose apply method calls the original method, and that is the object that gets assigned to the variable. Defining a function object and assigning it to an instance variable this way consumes more memory than just defining the functionally equivalent method because of the additional instance variable and the overhead of another object instance for the function. Thus you would not want every method to be a function; but functions give you a great deal of power that is not available with just methods, and in those situations that power is well worth the additional memory used.
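To make the wrapping concrete, here is a hand-written approximation of the kind of object that gets created when a method is treated as a function. It is only a sketch: the class name WrapperDemo and the field name f are invented for the example, and the class the compiler actually generates differs in name and detail.

```scala
class WrapperDemo {
  // An ordinary method: part of the class, not an object in its own right.
  def m1(x: Int): Int = x + 3

  // Hand-written stand-in for the function object created from m1.
  // It is a separate object whose apply method simply calls m1 on the
  // enclosing instance; the extra object and the field holding it are
  // the additional memory mentioned above.
  val f: Int => Int = new Function1[Int, Int] {
    def apply(x: Int): Int = m1(x)
  }
}
```

Calling f(2) on an instance of WrapperDemo gives the same result as calling m1(2) on it, but goes through one more object to get there.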

Let's look at some details of this mechanism. Create a file test.scala with this in it:
    class test {
      def m1(x:Int) = x+3
      val f1 = (x:Int) => x+3
    }
Compile that file with scalac and list the resulting files. Scala creates two files: test.class and test$$anonfun$1.class. That strange extra class file is the anonymous class for the function object that Scala created in response to the function expression assigned to f1. If you use more than one function expression in your test class, there will be more than one anonymous class file, even if you write the same function expression over again.

If you run javap on the test class, you will see this:
    Compiled from "test.scala"
    public class test extends java.lang.Object implements scala.ScalaObject{
        public test();
        public scala.Function1 f1();
        public int m1(int);
        public int $tag() throws java.rmi.RemoteException;
    }
Running javap on the function class test$$anonfun$1 yields this:
    Compiled from "test.scala"
    public final class test$$anonfun$1 extends java.lang.Object implements scala.Function1,scala.ScalaObject{
        public test$$anonfun$1(test);
        public final java.lang.Object apply(java.lang.Object);
        public final int apply(int);
        public int $tag() throws java.rmi.RemoteException;
        public scala.Function1 andThen(scala.Function1);
        public scala.Function1 compose(scala.Function1);
        public java.lang.String toString();
    }
Because this class implements the Function1 interface we know it is a function of one argument. You can see that it contains a handful of methods, including the apply method.

You can also define a function in terms of an existing method by referencing that method name followed by a space and an underscore. Modify test.scala to add another line that does this:
    class test {
      def m1(x:Int) = x+3
      val f1 = (x:Int) => x+3
      val f2 = m1 _
    }
The m1 _ syntax tells Scala to treat m1 as a function rather than taking the value generated by a call to that method. Alternatively, you can explicitly declare the type of f2, in which case you don't need to include the trailing underscore:
val f2 : (Int) => Int = m1
In general, if Scala expects a function type, you can pass it a method name and have it automatically converted to a function. For example, if you are calling a method that accepts a function as one of its parameters, you can supply as that argument a method of the appropriate signature without having to include the trailing underscore.
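A short sketch of that automatic conversion follows. The object name PassMethodDemo and the helper applyTwice are hypothetical, introduced only for this example; List.map is the standard library method.

```scala
object PassMethodDemo {
  // An ordinary method.
  def m1(x: Int): Int = x + 3

  // Hypothetical higher-order method: its first parameter has the
  // function type Int => Int, and it applies that function twice.
  def applyTwice(f: Int => Int, x: Int): Int = f(f(x))

  // A function type is expected for the first argument, so the bare
  // method name is converted to a function object; no underscore needed.
  val twice: Int = applyTwice(m1, 2)

  // The same conversion happens when calling library methods such as map.
  val mapped: List[Int] = List(1, 2, 3).map(m1)
}
```

Here twice is 8 (3 is added two times to 2) and mapped is List(4, 5, 6).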

Back to our test file: when you now compile test.scala there will be two anonymous class files, one for the f1 class and one for the f2 class. You can use either definition for f2; they generate identical class files.

If you use the -c option to javap to get the code of the second anonymous class, you can see the call to the m1 method of the test class in the apply method:
    public final int apply(int);
      Code:
       0:   aload_0
       1:   getfield        #17; //Field $outer:Ltest;
       4:   astore_2
       5:   aload_0
       6:   getfield        #17; //Field $outer:Ltest;
       9:   iload_1
       10:  invokevirtual   #51; //Method test.m1:(I)I
       13:  ireturn
Let's fire up the scala interpreter and see how this works. In the following examples, each input follows a scala> prompt and the interpreter's output follows it.
    scala> def m1(x:Int) = x+3
    m1: (Int)Int

    scala> val f1 = (x:Int) => x+3
    f1: (Int) => Int = <function>

    scala> val f2 = m1 _
    f2: (Int) => Int = <function>

    scala> m1(2)
    res0: Int = 5

    scala> f1(2)
    res1: Int = 5

    scala> f2(2)
    res2: Int = 5
Note the difference in the signatures between m1 and f1: the signature (Int)Int is for a method that takes one Int argument and returns an Int value, whereas the signature (Int) => Int is for a function that takes one Int argument and returns an Int value.

At this point we seem to have a method m1 and two functions f1 and f2 that all do the same thing. But f1 and f2 are actually variables that contain an instance of a generated class that implements the Function1 interface, and that object instance has methods that m1 does not have.
    scala> f1.toString
    res3: java.lang.String = <function>

    scala> (f1 andThen f2)(2)
    res4: Int = 8
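The andThen and compose methods seen in the javap output differ only in the order of application. The sketch below uses two different functions so that the order is visible in the result; the names ComposeDemo, addThree and timesTwo are assumptions made for this illustration.

```scala
object ComposeDemo {
  val addThree: Int => Int = (x: Int) => x + 3
  val timesTwo: Int => Int = (x: Int) => x * 2

  // andThen applies the receiver first: timesTwo(addThree(x)).
  val addThenDouble: Int => Int = addThree andThen timesTwo

  // compose applies the argument first: addThree(timesTwo(x)).
  val doubleThenAdd: Int => Int = addThree compose timesTwo
}
```

With an input of 2, addThenDouble gives (2 + 3) * 2 = 10, while doubleThenAdd gives (2 * 2) + 3 = 7.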
Because m1 is itself a method, unlike f1 you can't call methods on it:
    scala> m1.toString
    <console>:6: error: missing arguments for method m1 in object $iw;
    follow this method with `_' if you want to treat it as a partially applied function
           m1.toString
           ^
Note that each time you separately reference a method as a function Scala will create a separate object.
    scala> val f3 = m1 _
    f3: (Int) => Int = <function>

    scala> f2 == f3
    res6: Boolean = false
Even though f2 and f3 both refer to m1, and both do the same thing, they are not considered equal by Scala because function objects inherit the default equality method that compares identity, and these are two different objects. If you want two function values to be equal, you must ensure that they refer to the same instance of function object:
    scala> val f4 = f2
    f4: (Int) => Int = <function>

    scala> f2 == f4
    res7: Boolean = true
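The same comparison can be written as a small class, which may be handy if you want to try it outside the interpreter. This is a sketch with an invented class name, EqualityDemo; it declares the function types explicitly, so the trailing underscore is not needed. That two separate conversions of m1 yield two distinct objects is what the compilers I know of do in practice, so treat that part as an observation rather than a guarantee.

```scala
class EqualityDemo {
  def m1(x: Int): Int = x + 3

  // Two separate conversions of the same method to a function.
  val f2: Int => Int = m1
  val f3: Int => Int = m1

  // A second reference to the object already held by f2.
  val f4: Int => Int = f2
}
```

For an instance e of EqualityDemo, e.f2 == e.f4 is true because both names refer to one object, whereas e.f2 == e.f3 is expected to be false even though the two functions return the same results.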
Here are a few other examples showing that the expression m1 _ is in fact a function object:
    scala> m1 _
    res8: (Int) => Int = <function>

    scala> (m1 _).toString
    res9: java.lang.String = <function>

    scala> (m1 _).apply(3)
    res10: Int = 6
As of Scala version 2.8.0, the expression (m1 _)(3) will also return the same value (there is a bug in previous versions that causes this syntax to give a type mismatch error).

There are some other differences between methods and functions. A method can be type-parameterized, but an anonymous function cannot:
    scala> def m2[T](x:T) = x.toString.substring(0,4)
    m2: [T](T)java.lang.String

    scala> m2("abcdefg")
    res11: java.lang.String = abcd

    scala> m2(1234567)
    res12: java.lang.String = 1234
However, if you are willing to define an explicit class for your function, then you can type-parameterize it similarly:
    scala> class myfunc[T] extends Function1[T,String] {
         |     def apply(x:T) = x.toString.substring(0,4)
         | }
    defined class myfunc

    scala> val f5 = new myfunc[String]
    f5: myfunc[String] = <function>

    scala> f5("abcdefg")
    res13: java.lang.String = abcd

    scala> val f6 = new myfunc[Int]
    f6: myfunc[Int] = <function>

    scala> f6(1234567)
    res14: java.lang.String = 1234
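Another way to get from a type-parameterized method to a function object, sketched here under the assumption of a recent Scala 2 or Scala 3 compiler, is to fix the type parameter at the point where the method is converted. The names GenericDemo, firstFourOfString and firstFourOfInt are made up for the example.

```scala
object GenericDemo {
  // The type-parameterized method from the session above.
  def m2[T](x: T): String = x.toString.substring(0, 4)

  // Each function object is tied to one concrete type, so T is fixed
  // when the method is converted to a function.
  val firstFourOfString: String => String = m2[String]
  val firstFourOfInt: Int => String = m2[Int]
}
```

Each resulting value is an ordinary function of one concrete argument type; the genericity stays with the method, not with the function object.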
So go ahead and keep converting pounds to kilograms by dividing by 2.2 (unless you are an astronaut), but when you start mixing functions and methods in Scala, keep in mind that they are not quite the same thing.

Updated 2010-12-08: clarify memory usage in response to comment by jqb.

23 comments:

dm3 said...

You're right about the kilogram - it is a unit of mass in the SI system, but the pound is a unit of mass too. The relation of kilogram to pound can be found here.
That said, method and function distinction in Scala is on point.

Jim McBeath said...

dm3: Indeed, pound can be used as a measure of mass, but then my analogy would not make sense. So, to all my readers, when reading this post, please use only the more typical first definition of pound as a unit of force.

Unknown said...

Thank you for this post! It really enlightened me.

Y-Knot said...

A Pound is also a unit of money that one can put in the collection plate at Mass.

Tristam MacDonald said...

perhaps more relevantly, the SI unit of weight is the Newton.

Not that anyone but an engineer would use such a thing...

Anantha Kumaran said...

nice post

Mike said...

What is the difference between m1 and f1 when it comes to inheritance?

class test {
def m1(x:Int) = x+3
val f1 = (x:Int) => x+3
}

Mike said...

Why is there no implicit type conversion when it comes to
m1.toString
?

Jim McBeath said...

Mike: Regarding inheritance, you can override either a method (def m1) or a variable (val f1); you can override a def with a val, but you can't override a val with a def. Methods and vals occupy the same namespace, so if focusing just on inheritance, I can't think of any other differences.

Regarding implicit conversions, those are applied to values, and a method is not a value. In particular for your example, if m1 is defined as a method with no parameter list, the expression m1 alone will refer to the value returned by the method, so m1.toString is already a valid expression.

jqb said...

"Functions consume more memory than their functionally equivalent methods because they include not only the code that implements the function, but all of the surrounding code for the accompanying methods and class structure."

That's nonsense; objects do not contain copies of the code of inherited methods -- the only code specific to function objects is the concrete implementation of apply. And you don't even mention the object's header and vtable, which are the cause of the extra consumption of memory.

This whole article suffers from a confusion that it propagates while allegedly addressing it: it confuses methods with functions, which are not at all the same sort of thing. A function is simply an object that wraps a method, viz. apply, and Scala provides convenience syntax for invoking the apply method of a function without having to actually name it.

jqb said...

P.S. You should read http://creativekarma.com/ee.php/weblog/comments/scala_function_objects_from_a_java_perspective/ -- a clear and technically accurate article on Scala functions.

Jim McBeath said...

jqb: Classes do not contain copies of the code of methods inherited from parent classes, but when the class inherits from a trait the code for the methods defined within the trait does in fact appear in the class file for the derived class. In the case of a function object compiled under Scala 2.7 (which was current when this blog entry was posted), in which Function1 and its siblings are traits, that means the class file contains a handful of other methods in addition to the apply methods, as you can see in my post where I show the output of javap on the function class. This situation has been improved in Scala 2.8 by making functions extend an abstract class rather than a trait.

I used the term "class structure" to refer to all of the memory overhead of objects rather than trying to explain the details. I have replaced that sentence in the article with what I believe is a more accurate explanation of why "Defining a function ... this way consumes more memory ..."

I admit to being confused about your comment regarding confusion. I clearly state that methods and functions are not the same thing in my opening paragraph, and in the fifth paragraph I address the fact that a function is an object with an apply method and that Scala provides special syntax for that method, although I did not dwell on that, as that was not the focus of this post.

I read Doug's post that you referenced. It's a fine post about functions. He covers a lot of different aspects of functions, whereas my post is focused just on a comparison of functions and methods.

Unknown said...

Perhaps pounds as a unit of force are more "typical" in your part of the world but it sure confused the heck out of me :-)
I've never used anything besides Newton myself (starting high school, so definitely not only for engineers)

Not decided yet said...

I liked this article a lot. Jim has explained it so nicely. Thanks.

toddaaro said...

Something I didn't see much about in this post was generic functions.

Say you have a compose method:

def compose[A,B,C](f: A=>B, g: B=>C): A=>C = (x: A) => g(f(x))

Now I want to create an actual function object that is still generic:

val myCompose = compose(_,_)

This fails with a type error.

My understanding is that this is because a function is an instantiated object that needs concrete type information. As a result a proper generic function object can't actually be created.

Is this correct?

Jim McBeath said...

toddaaro: A function in Scala is an instance of a class that extends one of the FunctionN traits. Any type parameters on the class must be concretely specified in order to instantiate the class, so in that way you are correct.

I'm not sure what you mean by "a proper generic function object", so I can't directly answer that question. Type parameters can be used on classes and on methods, so you can make generic versions of those.

In Scala a function carries its own compose method (defined in Function1) that gets a compatible type signature, so you don't need to write your own compose method.

Andrew Pimlott said...

Most of this discussion focuses on going from a method to a function. I was interested in going from a function to a method. I was pleased to find you can do this with no obvious restrictions. For example:

class C1(f2: Int => Int) {
def foo = f2
}

object MyApp extends App {
println(new C1({ x => x + 2 }).foo(2))
}

Outputs "4". :-) This may be worth adding to the article.

Andrew Pimlott said...

Oops, that's just a no-argument method returning a function!

Nirmalya Sengupta said...

Hello Jim, I am a Scala newbie, trying to get the concepts clear while unlearning many things acquired over many years of professional imperative programming. I just want to mention that I have found the article extremely useful in my quest to understand the language better, even though I have stumbled upon it quite late (~ 4 years). Thanks.

Vishal said...

Nice article!!

Shibaji said...

Nice one.... Thanks a lot, you cleared up all the dark and shadows on that topic that I had from the starting day of Scala learning.