I've been threatening to dive into functional programming for a long time. I've been skirting around the edges for the last few years, but I've moved this to the top of my queue. I'm currently going through Tomas Petricek's book Real World Functional Programming (Amazon link).
mention in back in Feb 2014). Based on my previous bookmark, it looks like I made it about halfway through before I got distracted by something else.
So I started back at the beginning, and I just finished up Chapter 6 today (about higher order functions). I've still got a ways to go, but I thought it would be good to put down some thoughts about the book and also some thoughts about F#.
Tomas' book has been very helpful for me to get ramped up with F#. Granted, I'm in a bit of a different position today than I was when I first got it. I've been playing a bit with F#, and I also feel like I have a better grip on functional programming concepts in general.
What I Like - Tuples vs. Individual Parameters
One thing that I really like about this book is that it's answering my questions as they come up. For example, in the early chapters, there's a mix between using tuples for parameters and using parameters individually.
Here's an example using a tuple (with the input from a script file and output in F# Interactive):
This shows an example of an "add1" function that takes a tuple as a parameter. The tuple itself consists of 2 integers. When we call the "add1" function, we need to supply a tuple as a parameter (2 integer values inside parentheses in this case).
By looking at the output, we can see that the function signature takes "a:int * b:int" as a parameter. This denotes the tuple that we described above. The output in this case is an integer.
Here's the same function written with separate parameters:
The "add2" function takes 2 separate parameters (both integers). When we call this function, we pass in the 2 integer values separately -- and we don't need parentheses around the parameters.
This brings up the question of which of these approaches should we prefer? And just when I was asking myself that question, Tomas answered that.
If we have values that should be kept together (for example, the X & Y coordinates of a point), it makes sense use a tuple. This is a single parameter that happens to contain 2 values. We probably won't be doing partial application on these values (more on partial application in just a bit).
If our parameter values do not need to travel together, then it's often better to have separate parameters. This leaves things open to partial application -- where we create a new function that provides some (but not all) of the parameters.
Here's an example of partial application:
The "addTen" function is a partial application of our "add2" function. We supply just one of the parameters (10). This means that to call "addTen" we need to supply the second parameter.
So when we do partial application, we can create functions that have specific functionality. We would not be able to use partial application with the "add1" function because it only takes 1 parameter (the tuple). This example is a bit contrived, but it can be useful when we're trying to write small, reusable functions.
Reading Function Signatures
The other thing that I like about Tomas' book is that it spends time helping us read the function signatures (actually, the type signatures). Let's compare the ones that we've seen so far.
When using the tuple parameter, we get the following signature:
Again, for "add1", we have a single parameter (represented by "a:int * b:int") which is a tuple of two integer values. Then we have the output (denoted by "-> int") which tells us that we get an integer back from this.
Let's compare this to the function with separate parameters:
The signature for "add2" is a bit more interesting (and it takes some getting use to). This a function takes 2 parameters (denoted by "a:int -> b:int") and outputs an integer ("-> int"). This is a bit more confusing since we have the arrow used between all of the values here.
But that's also the point. Technically, we have a function that takes a parameter "a" that returns a function that takes another parameter "b", and that function returns an integer.
And this is what allows us to do partial application. Let's look at the signature for "addTen" again:
This function only takes one parameter (because one parameter has already been "applied"). So this takes an "int" as a parameter and outputs an "int".
It does take a while to wrap your head around -- especially for more complex functions. But this is starting to make sense to me.
What I Can Do Without - C# Examples
The full title of Tomas' book includes "With examples in F# and C#". When I first picked up the book, I was glad to see the C# examples. But as I go through this time, I find that they are just getting in my way.
I'm not going to implement a Tuple class, a functional List class, or a Map function in C#. I would avoid it actually. I understand why the examples are there: they are giving the C# developer something to grab on to. But since these become overly complex in C# (with the use of explicit typing and generics), I've found that it isn't worth my time to understand them.
I'm all about the right tool for the job. Instead of trying to wedge functional types into C#, I'd rather concentrate on F#. So I've been skimming over the C# examples.
You'll notice that in the function declarations above that there are no types for the parameters or the output. This is because F# has a very strong type inference mechanism. This means that most of the time, F# can figure out the type based on how it is used. In these functions, since we are using "+", it makes the assumption that we will be adding numbers together. And "int" is the default number type.
There are ways to explicitly declare types. But it is common to only do this when needed -- meaning, when F# cannot figure out the type based on the context.
This has an added advantage: many functions end up using implicit generic types. This means that the functions will works with a variety of parameter types the same way as if we declare a generic type parameter in C#.
A Bit Concerned
This is a huge feature, and it is really cool. But I have some concerns about it, specifically around readability. When no types are explicitly declared, it can be difficult to figure out what's going on without tracking back through various functions or exploring the types using the editor tooltips in Visual Studio.
Here's an example: when I picked up the Digit Recognizer code again. It took me a while to figure out what was going on. Granted, I didn't write the code originally; I transcribed it from Mathias Brandewinder's book Machine Learning Projects for .NET Developers (Amazon link).
When I did the original transcription, I understood what was going on. But when I picked it up again, I had to do some exploration of the code.
For example, here's the "manhattanClassifier" that was used to create predictions based on the training set (with the tooltip for the "classifier" item):
This shows that "manhattanClassifier" consists of a "classifier" and a "manhattanDistance". When we hover over "classifier", we see that it is a "Distance -> int -> string". So this takes a "Distance" and an integer array (that represents the digit to be recognized) and outputs a string (the prediction).
Then we need to look at what a "Distance" is. This happens to be an explicitly declared type:
This represents a function that takes a tuple with 3 values (an integer array, another integer array, and an integer) and outputs an integer. The integer arrays represent the incoming data and a value from the training set (ignore the 3rd value for now -- that was some performance experimentation that I didn't quite complete).
But what about the "manhattanDistance" that we have?
This represents a "Distance" type (notice that it has the same signature as "Distance").
If we take a step back and look at what "classifier" is, we see this declaration:
This refers us to the "train" function which has a pretty complex signature.
This leads us to the declaration for "train":
Here we can see some parameter types that are declared explicitly: "trainingset" is an array of Observations, and "dist" is a Distance.
If we follow these declarations back up, we can see that the "classifier" function is a partial application of the "train" function. And "manhattanClassifier" completes that application by providing the "Distance" parameter that "train" needs.
The result is that we have very flexible code. We can use and reuse these functions over and over. In fact, many of these functions are re-used to declare a "euclideanClassifier".
My big concern here is readability. I'm sure that this is something that I will get used to as I get more comfortable with functional programming in general and F# in particular. But what does this mean for other developers who are walking up to this code? How approachable is it?
I've written a little bit about this before when talking about the "var" keyword. "var" allows for type inference in C# (although extremely limited compared to what's available in F#). I try to constrain the use of "var" in my code to make sure that the types are easy to see just by looking at the code. (Check the previous article for some examples there.)
And that leaves me really torn when it comes to F#. Type inference is extremely powerful, and it makes code very compact. But it also hinders readability. This is particularly true when we don't have a programming environment available to us.
I generally like code to be readable in a text editor -- meaning that if I just have Notepad (or even a printout of the code), I like the code to be understandable. But this is not possible when we're relying on type inference as heavily as we do with F#.
To see this for yourself, just take a look at the code samples above directly on GitHub: jeremybytes/digit-display/.../FunRecognizer.fs. It can be tricky to follow without any tooltips or compiler hints.
For now, I'll withhold judgment. I know a lot of F# developers who are big fans of the type inference system. And I have to believe that this is something that you do get used to with enough practice and experience.
At the same time, I think about how I will teach other developers how to use this. If I find F# so awesome (which is what's happening at the moment), I'll want to share this with other developers. There are a lot of hurdles to get over when moving from OO programming to functional programming (it's definitely a shift in mindset).
I'm worried that type inference just makes things harder to approach -- particularly when you're brand new to the environment.
So I am actually doing my deep dive into F#. I'm planning on finishing up Tomas' book, and then heading on to Dave Fancher's The Book of F# (Amazon link). I'm also going to get back to Mathias' machine learning book as well. (Last time, I kind of petered out when the math started to get a bit complex. But I need to hunker down and work on understanding it.)
Functional programming is definitely keeping me interested. And it's making me happy (particularly compared to some other technologies that I've been trying to stay away from). I'll keep exploring, and I'm sure that I'll come to grips with the things that work for me and the things that I'll start to recommend to others.
But I don't feel like I can make any real recommendations until I understand what I'm doing quite a bit better.