Friday, July 29, 2016

Discover the Good: Object Oriented & Functional Programming

I've been spending a lot of time this month with functional programming - F# in particular. And I've been exploring functional programming for a while now (I just collected the articles, and there are more than I remembered).

Since I've written quite a bit this month, it has spurred some good conversations. I wanted to expand a bit on a question from Scott Nimrod (@Bizmonger).

As a side note, I got to meet Scott in person at NDC London back in January.

Are You Done with OO?
Here's a question Scott asked me a few days ago (Twitter link):


I've done Object-Oriented Programming for many years, and I've been successful with it. I've done Functional Programming for a much shorter time; and I really like it. But I don't see one paradigm replacing the other in my world.

These are different techniques with different strengths.

Discovering the Good
The thing that has impressed me most about the Functional Programming community is the emphasis on discovering the good. And I've written about this before: An Outsider's View of the Functional Community.

Rather than getting caught up in "my language is better than your language", the people I know are more interested in what each does well. When we find out what each tool is really good at, we know what to reach for when a particular problem comes up in our environment.

Sporadic Discouragement
Even though there is an overwhelming amount of positive that I see in the functional community, every so often, I'll come across a really depressing article like "Why I'll Never Do OO Programming Again". This is often a rant about failed projects and problems that arose in the author's environment.

And this makes me really sad.

Object Oriented Programming has served a good purpose over the years. And it will continue to do that.

I Really Love Object Oriented Programming
Object-Oriented programming techniques have enabled a lot of successes in my career. There are situations where having state and behavior travel together works very well.

I've programmed a lot of line-of-business applications over the years -- and having a mutable object that knows how to validate itself based on a set of rules is really great in that environment. Encapsulation is great for exposing behaviors to the outside world but hiding the implementation. When you combine this with objects that are easily data-bound to a UI, then things get even better.

Object-oriented programming is the right technique for a lot of situations.

It's also the wrong technique for others. It's difficult to multi-thread processes when there is shared, mutable state. But that's okay. When we recognize that, we can look for other techniques.

I Really Love Functional Programming
Functional programming techniques are awesome. I managed to pick up on a lot of functional-ish ideas in my C# programming with delegates, lambdas, and LINQ. I didn't realize it at the time, but these were pushing me toward a better understanding of functional programming.

By having functions with discrete inputs (parameters) and discrete outputs (return values) and no shared state, things get *extremely* easy to parallelize. And I've used these techniques in certain areas to get precisely this advantage.

Functional programming is the right technique for a lot of situations.

It's also the wrong technique for others. Data-binding immutable objects doesn't make sense since data-binding is all about notification of state change. But that's okay. When we recognize that, we can look for other techniques.

There's Enough Love to Go Around
I purposely stayed away from referring directly to languages here. In the .NET world, C# is an object-oriented language, but it also has a lot of features that enable functional programming. At the same time, F# is a functional language, but it has features that enable object-oriented programming (particularly since it needs to interop with .NET libraries that are OO-focused).
Object-Oriented Programming and Functional Programming are both awesome. They just happen to be awesome at different things.
We should really strive to use the right tool for the job. This means that we keep learning new things to add to our toolbox. I don't throw out my angle grinder because I got a new cordless drill. And I don't throw out perfectly working programming techniques just because I pick up a new one.

Keep learning; keep expanding; keep experimenting. What's best for your environment isn't necessarily what's best for my environment. And that's okay.

Happy Coding!

Tuesday, July 26, 2016

Sequences vs. Lists in F# (with Euler Problems #7, #8, #9, and #10)

In going through the Euler Problems in F#, I've found myself using lists and sequences a lot. Probably because I'm comfortable with the map, filter, collect, sum and similar functions because they are so much like the functions I love in LINQ in C#.

This means that my solutions have tended to be a bit "brute force". But I have done some refinement along the way, and that will continue.

I ended up working on the next set of Euler Problems at the same time -- bouncing between them and sharing ideas where I could -- and came away with solutions that are reasonably performant. But I picked up something else from this process:
I saw how choosing a sequence over a list (or a list over a sequence) can drastically impact performance.
Let's take a look at the solutions, and then we'll look at my learnings about sequences and lists.

[Update: Collected articles for the first 10 Euler Problems in F# are available here: Jeremy Explores Functional Programming.]

Euler Problem #7
Here's Euler Problem #7:
By listing the first six prime numbers: 2, 3, 5, 7, 11, and 13, we can see that the 6th prime is 13.
What is the 10,001st prime number?
And here's my solution:


And here's running the function with the target value:


This is a bit of a brute force solution. The "isPrime" function determines whether a number has any factors. This is what we used when looking at Euler Problem #3 (the "factor" function), but with a "Seq.isEmpty" check added onto the end. If there are no factors, then the number is prime.

Why Sequence is Important Here
The reason that using a sequence is important here is that sequences are lazy-evaluated -- similar to how IEnumerable in C# is only evaluated once you start enumerating through it.

So when we use "isEmpty", this pulls the first item in the sequence (and only the first item). If it finds an item, then it short-circuits the evaluation of the rest of the sequence. So, we don't actually end up calculating all the factors for a number, we only calculate the *first* factor. If it exists, then the number is not prime, and we return false. If there are no factors, then the number is prime, and we return true.

Why the "match"?
In the "euler7" function, we use pattern matching on our number. The reason is that I wanted to do a bit of optimization.

The second pattern uses another sequence. It goes from 3 to 1 million, but I specified "[3..2..1000000]". The middle term means "step 2", so we're only taking every other number in this sequence. This skips all of the even numbers (which we know are not prime) and cuts our processing time in half.

But we have one exception: "2" (an even number) is prime. So, I manually add a pattern for that in case someone asks for the first prime number.

Why Sequence is Important Here (Again)
You'll notice that we're using a sequence in the "euler7" function as well. This is because we ask for a particular "item". This will evaluate the sequence up to that item (and no further).

The "n-2" looks odd. That's because "item" is zero-based. So to get the 10001st prime, we need to get the item at index 10000. But we also skipped "2" in this sequence, so we need to subtract one more number from our index to get an accurate match.
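Putting those pieces together, here is a sketch of the solution (the factor bound inside "isPrime" is an assumption -- the original builds on the "factor" function from Euler Problem #3):

```fsharp
// isPrime: a number is prime if the sequence of its factors is empty.
// The upper bound (n/2) is an assumption; any valid factor bound works.
let isPrime n =
    seq { 2 .. n / 2 }
    |> Seq.filter (fun x -> n % x = 0)
    |> Seq.isEmpty                  // short-circuits on the first factor found

let euler7 n =
    match n with
    | 1 -> 2                        // "2" is the only even prime, handled separately
    | _ ->
        seq { 3 .. 2 .. 1000000 }   // odd numbers only ("step 2")
        |> Seq.filter isPrime
        |> Seq.item (n - 2)         // zero-based index, minus one more for the skipped "2"
```

Because everything here is a lazy sequence, "Seq.item" only evaluates as far as it needs to.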

Performance
It takes 18 seconds to evaluate the 10,001st prime number. That's not too bad for a brute force attack. There's sure to be a better mathematical approach to this problem, but I haven't researched that yet.

Euler Problem #8
Let's move on to Euler Problem #8:
The four adjacent digits in the 1000-digit number that have the greatest product are 9 × 9 × 8 × 9 = 5832.
73167176531330624919225119674426574742355349194934
96983520312774506326239578318016984801869478851843
85861560789112949495459501737958331952853208805511
12540698747158523863050715693290963295227443043557
66896648950445244523161731856403098711121722383113
62229893423380308135336276614282806444486645238749
30358907296290491560440772390713810515859307960866
70172427121883998797908792274921901699720888093776
65727333001053367881220235421809751254540594752243
52584907711670556013604839586446706324415722155397
53697817977846174064955149290862569321978468622482
83972241375657056057490261407972968652414535100474
82166370484403199890008895243450658541227588666881
16427171479924442928230863465674813919123162824586
17866458359124566529476545682848912883142607690042
24219022671055626321111109370544217506941658960408
07198403850962455444362981230987879927244284909188
84580156166097919133875499200524063689912560717606
05886116467109405077541002256983155200055935729725
71636269561882670428252483600823257530420752963450
Find the thirteen adjacent digits in the 1000-digit number that have the greatest product. What is the value of this product?
First of all, 1000 digit number?!?

My solution starts out by treating this as a giant string:


When we run this to find the 13 numbers that produce the largest product, we get this result:


Just a warning when you're looking for Euler solutions online. For this particular problem, there are different target values. This one uses 13, but there are also solutions that are looking for 5 adjacent numbers or another variation.

We don't need to treat the original number as a number; we can treat it as a string. What I decided to do was create a 13-character "window" on the string that I could slide through the entire 1000 characters.

So the "windowProduct" function takes a starting index and the size of the window (13 in our case), takes those characters and turns them into integers, and then multiplies them together.

A couple notes: I extracted out a "charToInt64" function to make converting a character to a number a bit more obvious. And I'm using 64-bit values here because our product will overflow a 32-bit integer.

For the "euler8" function, I create a sequence that includes all of the possible windows in our 1000-digit number. It finds the product of all those numbers, and picks out the biggest one.
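Here's a sketch of those functions, reconstructed from the description above:

```fsharp
// The 1000-digit number, treated as one long string.
let digits =
    "73167176531330624919225119674426574742355349194934" +
    "96983520312774506326239578318016984801869478851843" +
    "85861560789112949495459501737958331952853208805511" +
    "12540698747158523863050715693290963295227443043557" +
    "66896648950445244523161731856403098711121722383113" +
    "62229893423380308135336276614282806444486645238749" +
    "30358907296290491560440772390713810515859307960866" +
    "70172427121883998797908792274921901699720888093776" +
    "65727333001053367881220235421809751254540594752243" +
    "52584907711670556013604839586446706324415722155397" +
    "53697817977846174064955149290862569321978468622482" +
    "83972241375657056057490261407972968652414535100474" +
    "82166370484403199890008895243450658541227588666881" +
    "16427171479924442928230863465674813919123162824586" +
    "17866458359124566529476545682848912883142607690042" +
    "24219022671055626321111109370544217506941658960408" +
    "07198403850962455444362981230987879927244284909188" +
    "84580156166097919133875499200524063689912560717606" +
    "05886116467109405077541002256983155200055935729725" +
    "71636269561882670428252483600823257530420752963450"

// Convert a single digit character to a 64-bit integer
// (64-bit because the product of thirteen 9s overflows a 32-bit int).
let charToInt64 (c: char) = int64 c - int64 '0'

// Multiply together the digits in one "window" of the string.
let windowProduct (startIndex: int) (windowSize: int) =
    digits.[startIndex .. startIndex + windowSize - 1]
    |> Seq.map charToInt64
    |> Seq.reduce (*)

// Slide the window across every position and take the biggest product.
let euler8 windowSize =
    [ 0 .. digits.Length - windowSize ]
    |> List.map (fun i -> windowProduct i windowSize)
    |> List.max
```

Running "euler8 4" reproduces the 5832 from the problem statement, and "euler8 13" gives the answer.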

Why Sequence is Important Here
The sequence inside "windowProduct" is important because we're dealing with a string here. In C#, a string implements the IEnumerable<char> interface; in F#, string is a "seq<char>". Since we already have a sequence, we'll just keep working with it.

Why List is Important Here
Inside the "euler8" function, we use a list rather than a sequence. The reason is that we need all of the possible values in order to find the "max" value. This means that all of the items need to be evaluated. Because of this (and the fixed size of the collection), a list is a bit more efficient here.

Performance is good (less than a 10th of a second), but in my (extremely unscientific) tests, using a sequence in this function took about twice as long. With these particular values, it's not a significant difference. But it's something to keep in mind.

Euler Problem #9
So let's move on to Euler Problem #9:
A Pythagorean triplet is a set of three natural numbers, a < b < c, for which,
a² + b² = c²
For example, 3² + 4² = 9 + 16 = 25 = 5².
There exists exactly one Pythagorean triplet for which a + b + c = 1000.
Find the product abc.
I did a very severe brute-force on this:


This does give the correct answer:


(And in case you're wondering, the three values are 200, 375, and 425.)

The "squareList" is a list that contains the squares for the numbers from 1 to half of our target value (I used half because it sounded reasonable since we were adding and squaring numbers in various ways -- I can't say that I have a mathematical proof for why this is a good value).

The "collect" function creates a Cartesian product of 2 sets of these lists of squares (similar to what we did for Euler Problem #4). This creates all of the permutations of values -- adding each square together and recording the value. The result is a list of tuples with the first square, the second square, and the sum of the two squares.

The "filter" compares the sum of the two squares against our "squareList" (since this is a list of known squares). It hangs on to only those values where the sum is also a square.

The "map" takes the square root of all of our values (yes, I know this isn't very efficient, but when I was trying to come up with a data structure that held both the squares and the original values, I ended up with a bit of a mess).

The "find" function gets the values where the original (non-squared) values add up to our target (1000).

Finally, we have the "prodTuple" function which will multiply the 3 values in our tuple together to get us the ultimate result.
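Reconstructed from that description, the pipeline looks roughly like this (a sketch; the exact tuple handling is an assumption):

```fsharp
let euler9 target =
    // squares of the numbers from 1 to half of the target
    let squareList = [ 1 .. target / 2 ] |> List.map (fun x -> x * x)
    // multiply the three values of the triplet together
    let prodTuple (a, b, c) = a * b * c
    squareList
    // Cartesian product: (first square, second square, their sum)
    |> List.collect (fun a2 -> squareList |> List.map (fun b2 -> (a2, b2, a2 + b2)))
    // keep only the pairs whose sum is itself a square
    |> List.filter (fun (_, _, c2) -> List.contains c2 squareList)
    // square roots take us back to the original (non-squared) values
    |> List.map (fun (a2, b2, c2) ->
        (int (sqrt (float a2)), int (sqrt (float b2)), int (sqrt (float c2))))
    // the triplet whose values add up to the target
    |> List.find (fun (a, b, c) -> a + b + c = target)
    |> prodTuple
```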

Why Lists are Important Here
Like with our previous example, we have a fixed number of values (the Cartesian product of all of the squares from 1 to half of our target). Since this is a fixed size, and we're evaluating everything, a sequence wouldn't buy us any short-circuiting advantages.

Performance
The performance is okay. It takes a little over a second to evaluate this. Again, there is probably a better mathematical approach to this. But computers are very good at brute-forcing things (that's why they're so good at playing chess).

Euler Problem #10
Finally today, we have Euler Problem #10:
The sum of the primes below 10 is 2 + 3 + 5 + 7 = 17.
Find the sum of all the primes below two million.
Here's my solution (that performs very well):


And when we run this with our target value, we get the correct answer:


This isn't my first try at this. My first try was a brute force approach. That didn't come out so well:


It worked, but it took almost an hour and a half to complete. This is really not acceptable.

Sieve of Eratosthenes
To get acceptable performance, I abandoned the brute-force approach and used something known as the Sieve of Eratosthenes. This is really good at finding the prime numbers below a certain value.

The short version is that we can eliminate all values that are multiples of a prime. Let's take the values from 2 to 20:

[2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]

First, we can eliminate everything evenly divisible by 2 (other than 2 itself, our first prime):

[2, 3, 5, 7, 9, 11, 13, 15, 17, 19]

Then we can eliminate everything evenly divisible by 3 (other than 3, our second prime):

[2, 3, 5, 7, 11, 13, 17, 19]

We only have to go up to the square root of our target value (that's the number I mistakenly applied to a different problem previously). Since the square root of 20 is 4.x, we can stop at 4.

What's left is the set of primes less than 20:

[2, 3, 5, 7, 11, 13, 17, 19]

A Clumsy Approach
I'll admit that my implementation of the Sieve of Eratosthenes is a bit clumsy. But it works. You'll see that we have the same "isPrime" function that we used in Euler Problem #7 (although it's just on a single line here).

Then I calculate a set of "sievePrimes". This uses brute-force to get the prime numbers up to 1414 (which is roughly the square root of our target number).

The "divisibleByRange" function takes a number and sees if it is evenly divisible by any value in a range of values. When we use our "sievePrimes" as the range, we can find out which values can be eliminated in our sieve.

The "euler10" function applies the "divisibleByRange" against the entire set of numbers from 2 to our target value. We have to use the "not" because "divisibleByRange" returns true if it's divisible; and these are the numbers we do *not* want to include in our list.

Why Sequence is Important
Our "isPrime" uses a sequence. The importance of this is the short-circuiting that we saw earlier.

The sequence in "sievePrimes" is important because we don't know how many values we'll end up with. All we know is that there will be fewer than 1414. So rather than allocating a list, we'll use a sequence.

Then we take that sequence and turn it into a list. Why?

Why List is Important
The reason why we turn this sequence into a list is that once we've evaluated it one time, we *do* know how many items are in the list (and there's no reason to re-evaluate them later).

"divisibleByRange" works with a list because it's designed to work with our "sievePrimes" value, which we just saw is a list.

Why Sequence is Important
In "euler10" I start with a sequence because I don't know how many prime numbers will be in our collection.

The rest of the function will "map" the values to 64-bit integers (because our sum will overflow a 32-bit integer), and then we call "sum" to get our result.
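Putting it all together, here is a sketch of the fast version (the exact divisibility check is an assumption -- a prime shouldn't count as "divisible" by itself):

```fsharp
// brute-force primality check (same idea as in Euler Problem #7)
let isPrime n =
    seq { 2 .. n / 2 } |> Seq.filter (fun x -> n % x = 0) |> Seq.isEmpty

let euler10 target =
    // primes up to 1414 (roughly sqrt(2,000,000)): start as a sequence
    // (unknown count), then lock into a list because we reuse it constantly
    let sievePrimes =
        seq { 2 .. 1414 } |> Seq.filter isPrime |> List.ofSeq
    // is n evenly divisible by any value in the range (other than itself)?
    let divisibleByRange n range =
        range |> List.exists (fun x -> x <> n && n % x = 0)
    seq { 2 .. target }
    |> Seq.filter (fun n -> not (divisibleByRange n sievePrimes))
    |> Seq.map int64    // the sum overflows a 32-bit integer
    |> Seq.sum
```

Trying it on the sample from the problem statement, "euler10 10" gives 17.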

Sequences and Lists
In going through these examples, I had to think quite a bit about whether I wanted to use a sequence or a list. I found that if I picked the wrong one, I would end up taking a performance hit.

For example, when I used a sequence for the "sievePrimes" (instead of converting it to a list), things slowed down significantly -- so much that I didn't even wait for the process to finish. Since this value is used over and over again, it's better for it to be in a list that's fully evaluated.

At the same time, for "isPrime" it works much better to use a sequence. As soon as the first value returns from the sequence, the "isEmpty" function returns false. This short-circuits evaluation of the rest of the collection.

Short Version:
Sequences are good when we don't know how many values we'll have or when we want to short-circuit evaluation.
Lists are good when we know how many values we have or when we want to evaluate everything (such as a max or sum).
This is something that I "knew" from reading. But when I was working with these particular problems, I saw the difference in action.

At times, the differences were insignificant. But at other times, the performance differences were dramatic -- to the point of making one solution completely unusable.

Wrap Up
I've been doing a lot of experimentation lately. This has been a great help in showing me the real differences in different solutions. In this case, I feel like I've got a much better handle on when sequences are appropriate and when lists are appropriate. (You'll notice that I didn't include any arrays here -- I guess that's something I'll have to look into.)

I know that there are better solutions to the Euler Problems than the ones that I've got here -- for example, I know that there are recursive implementations of the Sieve of Eratosthenes. But I'm not done looking into these problems just yet. It's time for me to do a bit more exploration online to see what other folks have come up with.

If you have solutions or ideas that you'd like to share, feel free. The comments on the other Euler articles that I've written have been very helpful to me (and hopefully to others as well).

Happy Coding!

Monday, July 25, 2016

Learnings from The Book of F# by Dave Fancher

As you've probably guessed, I've been doing a lot of work with F# recently. A big part of that learning came from The Book of F# by Dave Fancher (Amazon link).

The Book of F# by Dave Fancher
I set aside this month to dive deeper into F# (since I've been threatening to do it for a while but never put it high enough on my priority list). I started the month with Tomas Petricek's Real World Functional Programming, and that gave me a lot of things to think about. But as I got further in, I recognized that I really needed to get to the basics of F#. I was going through some samples and trying to explore a bit, but I felt like I was fumbling a bit.

This problem came to the forefront when I started working on the Euler Problems. I found that I was having a hard time knowing where to go next; I just didn't know enough about the language itself.

So, I put Tomas' book on hold (after 8 chapters), and picked up Dave's book. And that turned out to be exactly what I needed to do.

Starting with the Basics
I've picked up quite a few of the basics of F# from looking at samples and listening to other developers. But it has been rather hit-and-miss.

The Book of F# started right where I needed it to: how to use the tools. The first 2 chapters are short, but they are extremely useful when you're not familiar with the F# tools in Visual Studio.

Project Layout
One thing that stood out was file layout in F# projects. Because of the way that F# parses things, file order is important. Dave shows the various ways to add files and move them up and down in a project (something that you never have to do in C#).

Another part of that was how to use modules and namespaces. This is something I could have used about a month ago when I was working with my Digit Recognizer project. In that project, I needed to take F# code that was in a script and move it into a dll that I could use from a C# project. I fumbled my way through that (and also asked some advice from Mathias Brandewinder when I saw him at NDC Oslo). But this book really helped to formalize that.

Using F# Interactive
The other thing I really appreciated were the instructions on how to use F# Interactive in Visual Studio (I've been using Visual Studio as opposed to the command-line tools. I may start using those in the future.)

Particularly, Dave shows the various commands: #help, #quit, #load, #r, #I, and my favorite, #time. I've been using #time quite a bit when I've been looking at how to improve performance when solving the Euler Problems.

I'm also glad that Dave points out the "it" value that gets automatically assigned by F# Interactive. I've found myself using that a few times to get some more information about something that was just evaluated.

One Parameter, One Return
Another thing that I appreciate is the statement that every function has one parameter and one return value. This is a great way of framing things (something I clumsily tried to describe when talking about partial application).

By thinking about functions this way, it makes partial application and composition a bit more clear -- although I still have a long way to go with function composition.

Semi-Colons (or Not)
Another one of the basics that I really appreciated had to do with the use of semi-colons. Because white-space is significant in F#, there are times where semi-colons are optional.

For example, when declaring a record type, you don't need semi-colons between the labels if you declare them on separate lines. These two declarations are equivalent:
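For example (an illustrative type of my own here, not the book's exact sample):

```fsharp
// Single line: semi-colons are required between the labels
type Companion = { FirstName: string; LastName: string }

// Separate lines: significant whitespace makes the semi-colons optional
type Companion2 =
    { FirstName: string
      LastName: string }
```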


This is also true when creating records with assignment pairs. I ran into this issue when I was going through an example from somewhere else. I was putting things on a single line and just leaving spaces, and that didn't work. When I added semi-colons it did. But the example didn't have semi-colons (because things were on separate lines).

This cleared up quite a bit of confusion for me.

Lots, Lots, More
Obviously, the book covers a lot more information about F# than what I've listed here. These are just a few of the things that have been tripping me up lately, so they were especially appreciated.

After going through The Book of F#, I know I've got a much better grip on the language.

A Bit of Fun
Another thing that I appreciated about The Book of F# is that the samples use rather nerdy references. This included references from "The Day the Earth Stood Still" and "Doctor Who" (in addition to a list of Arnold Schwarzenegger movies).

What I liked about the references is that they were just there (they weren't pointed out). So, if you had no idea who Rory Williams was, it wouldn't detract from the example. But if you do, then it's a nice nod to a bit of fun.

Wrap Up
The Book of F# has been a great help in getting me a formalized version of the basics of F# that I needed to move forward. A big thanks to Dave Fancher for going to the work of putting this together. (And Dave's a really great guy if you ever get a chance to meet him in person).

The Book of F# also makes a good reference. The material is well organized, and I've found myself flipping to the chapter on Collections to remind myself of a particular function.

Overall, this is a great place to get a solid foundation.

Happy Coding!

Friday, July 22, 2016

Struggling with Readability in Functional Programming (with Euler Problem #6 in F#)

I've been working through Project Euler as a learning tool to get better with F#. When I looked at Euler Problem #6, I thought, "That will be pretty easy" (I even mentioned that at the end of my last article). But I ended up thinking about readability and barriers that make it difficult for folks to learn functional programming.

This goes to show that we can learn from even "easy" problems.

Here's Euler Problem #6:
The sum of the squares of the first ten natural numbers is,
     1² + 2² + ... + 10² = 385
The square of the sum of the first ten natural numbers is,
     (1 + 2 + ... + 10)² = 55² = 3025
Hence the difference between the sum of the squares of the first ten natural numbers and the square of the sum is 3025 − 385 = 2640.
Find the difference between the sum of the squares of the first one hundred natural numbers and the square of the sum.
This seems trivial compared to the other problems that we've looked at so far.

[Update: Collected articles for the first 10 Euler Problems in F# are available here: Jeremy Explores Functional Programming.]

A Simple Solution
We basically need two functions: one to get the sum of the squares for a range, and one to get the square of the sum for a range. Then we just take the difference of running those functions.

Sum of the Squares
To calculate the sum of the squares, I used pipelining to run a range through a couple simple functions (showing snippets from a script file and the output in the F# Interactive window):


This maps each item in our range through the "pown" function. This raises a number to a specified power (in this case, 2). So this will calculate the square of each number.

Then we pipe this list of squares through the "sum" function. And we can see that when we do this on our sample range (1 to 10), we get the expected result: 385.
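In code, that pipeline looks like this (reconstructed from the description):

```fsharp
let sumOfSquares range =
    range
    |> List.map (fun x -> pown x 2)   // square each item
    |> List.sum                       // then add up the squares

sumOfSquares [1..10]   // 385
```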

Square of the Sum
Calculating the square of the sum is also pretty easy. Here's my first pass at that:


We can't use pipelining quite the same as in our previous function. That's because with the "pown" function, we want to pipe in the first parameter (as opposed to the last parameter). So instead, I created a local "square" function that would only take a single parameter.

Then we take our range, pipe it to "sum" (to get the sum of our items) and then pipe it into our "square" function.

For our sample range (1 to 10), this gives us the expected output of 3025.
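Reconstructed, the first pass looks like this:

```fsharp
let squareOfSums range =
    // local function so we can pipe a value into pown's *first* parameter
    let square x = pown x 2
    range
    |> List.sum   // sum of the range...
    |> square     // ...then squared

squareOfSums [1..10]   // 3025
```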

Getting the Difference
Then we just have to get the difference of these two functions:


This subtracts the square of the sums from the sum of the squares. Then I run it through the "abs" function to get the absolute value. The reason I do this is because I don't want to worry about which of the values is bigger (it actually varies based on the range). By taking the absolute value, we get the difference between the two values, regardless of which is bigger.

Done!
Trivial. Easy. Obvious. Time to write the article.

Nope. Then I started thinking about other options.

Other Options
I wasn't very happy with the "squareOfSums" function. I felt like I should be able to write this more cleanly. I came up with a few different options, and I wasn't happy with any of them:


I know that I'm not very good at function composition yet (I need to get my brain used to working that way). So, I sent out a tweet to see if someone could point me in a better direction.

I got a response back from Mark Heath (@mark_heath - who also provided direction when I was looking at prime factors):


Using function composition like this seemed much cleaner than the options I came up with, so I gave it a try:


This gives us the same results as our other functions. And there's really no need to use the "pown" function when we're just squaring the value; x * x works just as well.
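Mark's composition version comes out to something like this:

```fsharp
// function composition: sum first, then square (no explicit parameter)
let squareOfSums = List.sum >> (fun x -> x * x)

squareOfSums [1..10]   // 3025
```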

But where did the "range" parameter go?

Let's compare the signature to our original version:


These both have the same signature: int list -> int -- the parameter is a list of integers, and the output is an integer.

To get a better idea of what's going on, let's make a similar change to our other function.

Losing the Parameter
As a reminder, here is our original "sumOfSquares" function:


The "map" and "sum" functions can be combined into a single "fold" function:


The "fold" goes through each item in our list, squares it (x * x) and then adds the value to our accumulator (which ends up as our sum).
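The fold version:

```fsharp
// fold replaces the separate map and sum: square each item (x * x)
// and add it to the accumulator (which starts at 0)
let sumOfSquares range =
    range |> List.fold (fun acc x -> acc + x * x) 0

sumOfSquares [1..10]   // 385
```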

Note: As Mathias Brandewinder (@brandewinder) points out in the comments, "sumBy" is a more direct combination of "map" and "sum". I like this because the intent is more clear (the result is a sum), and the code is cleaner since we don't have to deal with the accumulator.

Rather than using pipelining, we can move "range" to the last parameter of the "fold":


Then we can make things a bit more interesting. Rather than providing the "range" parameter explicitly, we can remove it entirely:


This makes "sumOfSquares" a partial application of the "List.fold" function. "fold" takes 3 parameters (the function, the accumulator, and the list), but we're only supplying the first 2 parameters (the function and the accumulator).

Because of this, our "sumOfSquares" function still needs an "int list" parameter to complete the evaluation. But rather than being an explicit parameter in our code, it becomes an implicit parameter needed to complete the partial application.
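So the final, point-free version is just:

```fsharp
// partial application: List.fold's third parameter (the list) is left off,
// so "sumOfSquares" is a function still waiting for an int list
let sumOfSquares = List.fold (fun acc x -> acc + x * x) 0

sumOfSquares [1..10]   // 385
```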

When we look at the signatures of all of these versions, we see they are identical: int list -> int -- the parameter is a list of integers and the result is an integer.

What About Readability?
This made me really start thinking about readability. I've already been thinking about readability in F#, but for other reasons -- specifically, type inference. But this gave me something else to think about.

I'm not questioning whether this is a "right" way of doing functional programming; I've seen a lot of code that is similar to this while I've been doing my exploration. But I'm wondering if other people have concerns about readability.

Mark was nice enough to provide his view:


This seems to be a common theme when talking to folks about functional programming: It seems weird at first, but you get used to it.

Removing Barriers
For OO developers, there are a lot of barriers to get past when learning functional programming. To be successful with functional programming, you have to think about problems much differently than with object-oriented programming.

I'm not trying to change the way people do functional programming. But I'm wondering if there's a way to make things a bit easier for OO developers.

I'm always thinking about ways to make things more approachable; that's why I've been so successful in teaching people about things like lambda expressions, interfaces, and dependency injection. As I'm learning about functional programming, I'm thinking about ways to make things easier to learn -- to smooth out some of the steep barriers to make more of an even slope.

I don't have answers at this point. But this "easy" problem brought these to the forefront of my mind. I'll definitely be working on this more as I dig deeper into the functional world.

Completing Euler #6
This kind of got off-track for "solving Euler Problem #6". For those who are still interested, here's the combined function:


There's probably a better way to do the difference part of this function (the last two lines), and I'm open to suggestions for that. But we see that this is working, and we do get the correct answer for the sample case: 2640.

Last step is to run this against the target case, from 1 to 100:


We can verify online that the answer to this is correct.

Wrap Up
Just because something is "easy" doesn't mean that we won't still learn something from going through the process. In this case, I learned a bit more about function composition (which I need to work on much more).

More importantly, I'm really thinking about readability and barriers-to-entry for developers who are new to functional programming. It's probably going to take much thought and a lot of conversations with people who have been doing functional programming for a while.

If you have any ideas, I'd love to hear them. The folks I know in the functional community are very helpful and want to bring more developers into the community. There are quite a few barriers that make things difficult. Let's have the conversation about how we can make things easier.

Happy Coding!

Thursday, July 21, 2016

Functional Practice: Euler Problem #5 - Faster with Math & Help from the Community

Rather than moving on to the next problem from Project Euler, we'll revisit Euler Problem #5. Last time, I did a brute-force attack of the problem. I got the right answer, but I knew there was a better way of approaching the problem: by using math.

Here's Euler Problem #5 (again):
2520 is the smallest number that can be divided by each of the numbers from 1 to 10 without any remainder.
What is the smallest positive number that is evenly divisible by all of the numbers from 1 to 20?
It's possible to solve this problem using prime factors. And since I have a good way to find prime factors in F#, I figured it would be good to explore this approach.

[Update: Collected articles for the first 10 Euler Problems in F# are available here: Jeremy Explores Functional Programming.]

Manually Solving the Problem
We'll start by looking at the prime factors for the values 2 to 10. (We'll skip 1 because that divides evenly into everything):
  • 2 = 2
  • 3 = 3
  • 4 = 2 * 2
  • 5 = 5
  • 6 = 2 * 3
  • 7 = 7
  • 8 = 2 * 2 * 2
  • 9 = 3 * 3
  • 10 = 2 * 5
From here, we want to create a set of prime factors that covers all of these individual sets of prime factors. That looks like this:
  • 2 * 2 * 2 * 3 * 3 * 5 * 7
If we look at this set, we see that it covers all of the cases from our list. If we look at 8, we see that its factors are three 2s; and we have three 2s in our result set. In addition, 10 consists of a 2 and a 5, and we see that we have (at least) one 2 and one 5 in our result set.

We can go through each of the numbers from 2 to 10 and find that the prime factors are represented in our result set.

And this is where the magic happens. If we multiply the numbers of our result set, we get...
  • 2520
Hey, that looks significant. In fact, it's the answer to the sample case of Euler Problem #5.
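We can check that multiplication quickly in F# Interactive:

```fsharp
// multiplying the reduced factor set together
let product = [2; 2; 2; 3; 3; 5; 7] |> List.fold (*) 1
// product = 2520
```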

We can use this same technique to get the solution to the target case. We just need to figure out how to get the prime factors for the numbers 2 to 20 and then reduce the factors.

Getting the Prime Factors
It's pretty easy to get the prime factors of our values. First, we'll use the "primeFactors" function that we came up with earlier:
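The original "primeFactors" came from an earlier post; here's a rough sketch of that kind of function. Note two assumptions in this sketch: building the result with `::` leaves the factors largest-first (matching the lists shown in this post), and it simply returns an empty list for values below 2 -- unlike the original, which misbehaved for those inputs, as the Troubleshooting section below shows.

```fsharp
// A sketch of a primeFactors function (the original may differ).
// Divides out candidate factors starting from 2.
let primeFactors n =
    let rec loop n candidate acc =
        if n < 2 then acc
        elif n % candidate = 0 then loop (n / candidate) candidate (candidate :: acc)
        else loop n (candidate + 1) acc
    loop n 2 []
```

For example, `primeFactors 20` gives [5; 2; 2], and `[2..20] |> List.map primeFactors` produces the list of lists we work with below.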


It's pretty easy to use a map with this function to get all of the prime factors for the values 2 to 20:


This gives us a list of lists. Now we just have to figure out how to reduce this list to a single collection.

We can't use a "concat" because that just appends the lists together. If we did that, then we would have too many values -- for example, we would have eighteen 2s instead of just the four 2s that we need.

And we can't use a "distinct" or "union" operation because that would eliminate values that we need -- it would give us just one 2.

I thought about a few ways that I might approach this problem. But I felt like I was doing things the hard way. Maybe I was missing something that would make this really easy.

The F# Community is Awesome
I decided to put send out a tweet with my problem:


And I had several people from the F# community jump on this (even though I didn't really provide any specification at all).

Huge thanks to everyone who responded including Cameron Presley (@pcameronpresley), Reid Evans (@ReidNEvans), Jack Fox (@foxyjackfox), Tea Driven Dev (@TeaDrivenDev -- sorry, I don't know your real name), and Llewellyn Pritchard (@leppie).

In the end, Tea Driven Dev provided the solution I needed (again, especially good since my "spec" was pretty lacking):


If nothing else, I felt better about not missing an "easy" solution. But as Jack Fox pointed out:


If you'd like to take a look at the Twitter interaction, you can find the thread starting here: https://twitter.com/jeremybytes/status/755758533648322560.

Breaking Down the Solution
There are quite a few steps in this solution. Let's take things one line at a time to try to understand what's going on. As a reminder, here's "whatIhave" (a list of lists):
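For reference, here's "whatIhave" written out as literal data so the steps are easy to follow (the values are the prime factors of 2 through 20, as listed earlier; normally this would be produced by something like `[2..20] |> List.map primeFactors`):

```fsharp
// the prime factors of 2 through 20, largest-first
let whatIhave =
    [[2]; [3]; [2; 2]; [5]; [3; 2]; [7]; [2; 2; 2]; [3; 3]; [5; 2];
     [11]; [3; 2; 2]; [13]; [7; 2]; [5; 3]; [2; 2; 2; 2]; [17];
     [3; 3; 2]; [19]; [5; 2; 2]]
```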


Getting Lists with Single Numbers
So, we'll send this in to the first line of the function:
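As a sketch, on a small sample (sample data assumed), that first line behaves like this:

```fsharp
// keep only the lists whose distinct values collapse to one number
let sample = [[2]; [2; 2]; [3; 2]; [19]; [5; 2; 2]]
let singles =
    sample |> List.filter (fun l -> l |> List.distinct |> List.length = 1)
// singles = [[2]; [2; 2]; [19]]
```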


This combines several functions to filter the initial list. First it does a distinct on each of our lists:

[2] --> [2]
[2; 2] --> [2]
[3; 2] --> [3; 2]
[19] --> [19]
[5; 2; 2] --> [5; 2]

Then it keeps only the lists whose distinct version has a length of 1. This eliminates any lists that contain multiple different numbers:

[2] --> kept (distinct is [2])
[2; 2] --> kept (distinct is [2])
[3; 2] --> eliminated (distinct is [3; 2])
[19] --> kept (distinct is [19])
[5; 2; 2] --> eliminated (distinct is [5; 2])

If we look at the resulting set of lists, we see that each list only contains one number (although some have multiple occurrences of that one number).

We can safely do this because the lists with multiple numbers are represented by lists with single numbers. For example, the number 20 has factors of [5; 2; 2] (one 5 and two 2s). These factors are represented by other numbers in our list as well: 5 has a factor of one 5, and 4 has factors of two 2s.

This is true of all of our items with multiple numbers; the factors of the eliminated lists are already represented by single-number lists earlier in the list. This may be a bit hard to follow, but it does work.

Grouping by Number
Now let's head to the next step:
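A sketch of this grouping step on a small sample (sample data assumed):

```fsharp
// group the single-number lists by their first (and only) number
let singles = [[2]; [3]; [2; 2]; [3; 3]]
let grouped = singles |> List.groupBy List.head
// grouped = [(2, [[2]; [2; 2]]); (3, [[3]; [3; 3]])]
```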


This groups our lists according to the number they contain. "head" pulls out the first number of each list; since each list now contains only one distinct number, the result is that the lists are grouped by that number.

Each item is now a tuple with the number and the list of lists. For example, we have two lists that contain 3s. These are grouped together as (3, [[3]; [3; 3]]).

Sorting by Number
The next step is to sort them according to the number:
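A sketch of the sorting step (sample data assumed):

```fsharp
// sort the (number, lists) tuples by the number
let grouped = [(3, [[3]; [3; 3]]); (2, [[2]; [2; 2]])]
let sorted = grouped |> List.sortBy fst
// sorted = [(2, [[2]; [2; 2]]); (3, [[3]; [3; 3]])]
```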


"fst" represents the first value in our tuple, and that's what we're sorting by. This is the same order that they were already in. But doing the explicit sorting is a good safety measure.

Getting the Longest Lists
The next step is to get the longest list in each grouping:
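A sketch of this step on a small sample (sample data assumed):

```fsharp
// for each group, keep only the longest list of factors
let sorted = [(2, [[2]; [2; 2]; [2; 2; 2]]); (3, [[3]; [3; 3]])]
let longest =
    sorted |> List.map (snd >> List.sortByDescending List.length >> List.head)
// longest = [[2; 2; 2]; [3; 3]]
```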


This has quite a few functions combined together, so let's look at what each one does.

The "map" will run this against each item in our overall list -- so it will run for each number.

The "snd" pulls out the second item of the tuple (which is our list of lists):

(2, [[2]; [2; 2]; [2; 2; 2]; [2; 2; 2; 2]])
becomes
[[2]; [2; 2]; [2; 2; 2]; [2; 2; 2; 2]]

Then we sort by descending length:

[[2]; [2; 2]; [2; 2; 2]; [2; 2; 2; 2]]
becomes
[[2; 2; 2; 2]; [2; 2; 2]; [2; 2]; [2]]

Then we take the "head" item of the sorted list, which leaves us with [2; 2; 2; 2] -- the list with the most occurrences of this number.

Combining the Lists
The last step is to combine the lists using "concat":
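A sketch of the final concat (sample data assumed):

```fsharp
// flatten the per-number lists into one combined factor list
let combined = [[2; 2; 2]; [3; 3]; [5]; [7]] |> List.concat
// combined = [2; 2; 2; 3; 3; 5; 7]
```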



This gives us the output that we're looking for.

A Function to Reduce Prime Factors
Now that we've seen all the steps, let's look at the entire function:
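Here's a sketch of how those steps fit together as one function, assuming the pipeline described above (the original from the Twitter thread may differ in details):

```fsharp
// reduce a list of prime-factor lists to a single covering list
let reduceFactors (factorLists: int list list) =
    factorLists
    |> List.filter (fun l -> l |> List.distinct |> List.length = 1)
    |> List.groupBy List.head
    |> List.sortBy fst
    |> List.map (snd >> List.sortByDescending List.length >> List.head)
    |> List.concat
```

For example, feeding in the factor lists for 2 through 10 gives back [2; 2; 2; 3; 3; 5; 7].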


Now we can take our list of lists of factors and get a single list back.

Solving the Problem
Now that we have our reduced prime factor list, all we have to do is multiply the items together to get the solution:
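A sketch of that multiplication, using the reduced factor list from the sample case:

```fsharp
// accumulator "a" starts at 1; "x" is each factor in turn
let answer = [2; 2; 2; 3; 3; 5; 7] |> List.fold (fun a x -> a * x) 1
// answer = 2520
```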


For this we'll use a "fold". This uses an accumulator value ("1" in this case), and then runs a function against every item in our list. In this case, "x" represents our list item and "a" represents the accumulator.

Update: As Thomas Levesque points out in the comments, this can be shortened to simply "|> List.fold (*) 1", and we'll get the same result.

The result of this is that we multiply each item in our list. And we can see that the result is what is expected (based on our previous look at Euler Problem #5).

Troubleshooting
Now that we have the pieces working, let's start putting them together into a function that will take an arbitrary range as a parameter.

We'll start with a hard-coded list:


When we run this, we get the right answer. But look at the timing; it took 13 seconds to get there. Something doesn't seem right about that.

Let's run just the first two lines to see what happens:


Look at the first value that comes out: [-1; -1]. This is our problem. In our earlier tests, we were starting with "2" because "1" evenly divides into everything. But we didn't take that into consideration here.

We tried to run our "primeFactors" function against "1", and that caused a problem. In fact, if we were to walk through the function, we would find that it loops through all of the integers, overflows, and then works its way back to -1. And that's why it takes 13 seconds.

It's easy enough to fix this. Just to be safe, we'll skip over everything that's less than 2:
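A quick sketch of how skipWhile handles the leading values:

```fsharp
// skipWhile drops leading values that are less than 2
let cleaned = [-1..5] |> List.skipWhile (fun x -> x < 2)
// cleaned = [2; 3; 4; 5]
```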


The "skipWhile" will make sure that we don't have "0", "1", or any negative numbers in our range. And now we see that our function takes almost no time at all to complete.

The Final Function
Now let's put everything together:
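Here's a sketch of the full solution with the helpers as local functions (again, assuming the shapes discussed above; the original code may differ in details):

```fsharp
// smallest number evenly divisible by every value in the range
let smallestMultiple range =
    // sketch of the primeFactors helper (factors come out largest-first)
    let primeFactors n =
        let rec loop n candidate acc =
            if n < 2 then acc
            elif n % candidate = 0 then loop (n / candidate) candidate (candidate :: acc)
            else loop n (candidate + 1) acc
        loop n 2 []
    // sketch of the reduceFactors helper from the Twitter solution
    let reduceFactors factorLists =
        factorLists
        |> List.filter (fun l -> l |> List.distinct |> List.length = 1)
        |> List.groupBy List.head
        |> List.sortBy fst
        |> List.map (snd >> List.sortByDescending List.length >> List.head)
        |> List.concat
    range
    |> List.skipWhile (fun x -> x < 2)   // drop 0, 1, and negatives
    |> List.map primeFactors
    |> reduceFactors
    |> List.fold (*) 1
```

With this sketch, `smallestMultiple [1..10]` gives 2520 (the sample case) and `smallestMultiple [1..20]` gives 232792560 (the target case).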


This puts "primeFactors" and "reduceFactors" into our function as local functions. These functions are actually useful on their own, so it would probably make sense to keep them separate. But we'll put them together here so we have everything in one place.

When we run this, we get the expected output:


And look at the timing; this took almost no time at all. Much better than waiting over 1 minute for our brute force approach.

Learnings
There were a couple of big take-aways from this particular process.

The F# Community is Awesome
The F# community is really awesome. I already knew that, but this experience reinforced it. I was very excited to see so many people take interest in my problem. And I think the reason for that is two-fold. First, there are people who are willing and able to help out folks who have questions. Second, there are people who like the challenge of solving a problem.

In this case, Euler Problem #5 is a well-defined problem, and it has many well-defined solutions. But it's still an interesting challenge. Once Reid Evans found that I was working on Euler #5, he sent over his solution: FunctionalKnox/Euler.

I have met so many people that are willing to help and share. I'm hoping that I'll be able to pass that along in the not-too-distant future.

Debugger vs. REPL
The other thing that I noticed here was the difference in how to troubleshoot an issue. I had read about (and heard people talk about) how in F# you don't really use the debugger. Instead, you use the REPL to investigate and find things.

And I experienced that first-hand with this problem. When I ran into the timing issue (why is it taking 13 seconds to calculate prime numbers?), I didn't fire up the debugger. Instead, I just ran a few lines of the code to see what the intermediate output is.

It makes more sense now. You don't need to inspect intermediate variables because values aren't changing -- there's no need for a "watch". Instead, you're mostly dealing with discrete inputs and outputs, and many processes are several functions composed together. So you can run them in bits and pieces and get observable results. That's pretty cool.

Wrap Up
I'm obviously still headed forward with F#. And I'll be exploring with more of the Euler problems. I've taken a peek at #6 and it looks pretty easy. We'll see how it goes when I dive into it.

We get better by getting help from those who are ahead of us on the path, and we make our community stronger by helping those who are behind us on the path.

Happy Coding!