Monday, August 29, 2016

Being Present - Mid-Year Review

At the beginning of this year, I made a commitment to Being Present at events where I'm speaking. I've been thinking about this the last couple days, so it's probably time to put down some specifics. (And yes, I know it's a bit past mid-year, but we'll just ignore that.)

Lots of Time Away from Home
In case you haven't figured it out, I really like speaking. I like helping developers take an easier path around the hurdles that I had to get over when I was learning these topics. Since January I've spoken at 22 events. These range from local user groups that are 8 miles from home to conferences that are 9 time zones away from where I live.

In all, I've spent 54 days on the road. A regular job would advertise that as 25% travel. That's a lot more than I've done in the past (and I've still got several more trips before the year is out). Fortunately, I don't have much that keeps me from traveling (the cats pretend to not get lonely), so I'm taking advantage of the opportunity while I can.

So how are things going?

Awesome Interactions
I've had a ton of awesome interactions this year. I first made it to the central time zone last year, and I've made some really good friends who make the rounds in that area.

Music City Code is freshest in my mind (since I was there a little over a week ago). It was really great to spend some time with Eric Potter (@pottereric), who I think I first met at Nebraska.Code() last year, and with Cameron Presley (@pcameronpresley), who I spent some time with at Code PaLOUsa in Kentucky earlier this year. I also had some good conversations with Chris Gardner (@freestylecoder) and Heather Tooill (@HeatherTooill) -- I've seen both of them at other events, but never really sat down to talk. It was great to get to know them better.

Other people I got to know for the first time included Hussein Farran (@Idgewoo), Jesse Phelps (@jessephelps), Paul Gower (@paulmgower), and Spencer Schneidenbach (@schneidenbach).

In addition, I got to catch up with people who I know well from other events, including (but not limited to) Ondrej Balas, Justin James, Jim Wooley, Jeff Strauss, James Bender, Duane Newman, Kirsten Hunter, David Neal, Phil Japikse, and Paul Sheriff. (Sorry, I'm too lazy to include links; I know I've mentioned them in the past.)

And this is really just from Music City Code. If I look back at the other events I've been to, I've met some great people and been able to get to know them better (as an example, I met Matt Renze (@MatthewRenze) when we shared a ride from the airport to CodeMash; we both went to NDC London; and we hung out again at KCDC). And speaking of KCDC, it was great to spend some time with Cori Drew (@coridrew) who I first met at That Conference last year, and Heather Downing (@quorralyne) who I first met at Nebraska.Code() last year and got to hang out with again at Code PaLOUsa. (More on KCDC: A Look Back at June 2016.)

This makes it sound like I only hang out with other speakers, but that's definitely not the case. I tend to spend a bit of additional time with speakers because we're often staying at the same hotel and/or looking for things to do after all the local folks go home for the night. And repeated interactions at different events reinforce these relationships.

I have great conversations with the non-speaker folks, too.

Other Interactions
I'm always surprised at the folks that I end up running into over and over at an event. At Music City Code, I had a conversation with Eric Anderson one morning, and we kept running into each other throughout the event.

At Visual Studio Live! in Austin, I ended up having dinner with Andrew, Mike, and Mike (differentiated as "Jersey Mike" and "Baltimore Mike"). None of us knew each other before the event, but we walked over to an event dinner together, ended up talking throughout the week, and even rounded things out with really excellent barbecue on the last night.

I made a ton of new friends at NDC Oslo (I mentioned just a few of them previously). CodeMash was awesome because I got to sit down with Maggie Pint to get to know her better (and you can read more about that in the NDC article).

Okay, so I'm going to stop now. I've been going through my notes from the various conferences and there are too many awesome people to mention. I've met a ton of great people. The conversations are useful because I get to hear what other people are successful with and what they are struggling with. Even if those relationships don't continue, we're still the better for having had the conversation.

And when the relationships do continue, it's a great thing.

Being Present
I credit these conversations and these relationships to "being present" at the event. I'm around during the morning coffee time before the event. I'm around during lunch time. I'm around at the breaks. I'm around for the dinners and after parties (with some caveats). And because I know that I can sleep when I get home, I try to be around for the hotel lobby conversations late in the evening.

This gives me a lot of opportunities to interact. I'm not always successful, but the more I'm available, the more conversations I have.

Stepping Out Early
I have stepped out early from parts of events. This is actually something that I put into my original commitment:
  • This also means that I will be available at the noisy, crowded receptions that clash with my introvert nature (although I reserve the right to find a small group of people to go have coffee with in a quieter location).
I don't usually last very long at receptions or after parties. As an introvert, the noise and activity are overwhelming and suck out all of my energy. So I usually try to find a group of folks where I can "anchor". Sometimes this lets me stay at the party, sometimes it means that we go off somewhere else.

For example, at CodeMash there was a reception at a bar that was *very* loud. But I managed to get into a circle of 4 or 5 people (and stay in that circle), so I was able to manage by focusing on the conversation with the people around me. I managed to do the same thing at the KCDC party. I walked around the venue a little bit and had some good (short) conversations. But when I saw that I was running out of energy (I even stepped outside for a bit), I found a table of folks where I could "anchor". I could focus on the 5 or 6 people at the table and block out the rest of the activity.

Other events played out a bit differently. At the Music City Code party, things were extremely loud. I had a couple good conversations, but it was overwhelming. A few of us ended up going upstairs to the restaurant (which was a bit quieter) -- our group kept getting bigger as more people stepped out for a "break". I think we ended up with 6 folks having dinner. I went back down to the party for a little while to make sure I had a chance to say goodbye to folks I wouldn't be seeing again. And I ended up talking with Erin Orstrom (@eeyorestrom) about introvert & extrovert developers.

The party at NDC Oslo had a couple bands. I kind of wanted to stay for a little while to hear them, but I ran into a group of folks who were going out to dinner. Since I knew I wouldn't last long at the party, I decided to take the opportunity to go to dinner with Evelina Gabasova (@evelgab), Tomas Petricek (@tomaspetricek), and Jamie Dixon (@jamie_dixon).

I'm still working on how I can best deal with the overwhelming situations. I'd like to be present for the entirety of those, but I know that I need to take care of myself as well.

Tough Decisions
As expected, there have been some tough decisions this year. This is the first year that I've had to decline events because I was accepted at more than one for the same time period. That's a good problem to have, but I want to avoid it as much as possible. It's hard enough for organizers to select speakers and put together a conference schedule; it's even worse when one of the selected speakers can't make it.

When there are multiple events during a particular week, I've decided to submit to only one of them. This has been tough because I don't always make the right decisions. I've "held" a week for a particular event (that I was pretty sure I'd get selected for), and then I don't get selected. By that time, it's too late to submit to the other event for that week. The result is that I had some gaps in my schedule that I would rather have not had. But I'm just playing things by ear at this point. I'm not sure what the "right" events are.

As an example, I would really like to be part of the first TechBash (particularly since Alvin Ashcraft (@alvinashcraft) has been such a great support in sending folks to my blog). But I held that week for another event that I had submitted to (actually 2 more local events that were back-to-back). One of those events didn't accept me; had I known that, I would have planned differently. But it also opened up an opportunity for me to do a workshop at Code Stars Summit (there's still space available), so I've got something good to look forward to that week.

It has been hard getting rejected by events that I really wanted to be at. And it's even harder when the event is happening, and I'm watching folks online talk about how awesome it is. Rejection is part of the process, though. It's normal, and it doesn't reflect on who you are as a person -- at least I keep telling myself that :)

There are some events that I went to this year (and I'm really glad that I did), but I won't be submitting again next year. These are also tough decisions. If you really want me to be at your event, contact me personally, and I'll see what I can arrange. I try to move stuff around for people who send me an invitation.

Falling into Success
I think that my decision to only submit for events that I really want to attend helps me stick with my commitment. If I'm at an event that I want to be at, then I'm more likely to be engaged and excited about it.

I've had some awesome opportunities as a speaker this year. I'm very thankful to everyone who comes to hear me speak and for those who tell me that it was useful to them. I'm looking forward to the opportunities that are still coming this year (Upcoming Events). And I'm also excited about some events that are coming up next year -- I'll announce those as soon as they are official.

In the meantime, I'm glad that I'm conscious about "being present" at the events I'm speaking at. It gives me lots of opportunities to meet new people, catch up with old friends, and expand the amount of awesome that I get from each event. And hopefully it expands the amount of awesome for the folks I talk to as well.

Happy Coding!

Monday, August 15, 2016

Recognizing Hand-Written Digits: Getting Worse Before Getting Better

I took a stab at improving some machine-learning functions for recognizing hand-written digits. I actually made things less accurate, but it's pointing in a promising direction.

It's been a long time since I first took a look at recognizing hand-written digits using machine learning. Back when I first ran across the problem, I had no idea where to start. So instead of doing the machine learning bits, I did some visualization instead.

Then I got my hands on Mathias Brandewinder's book Machine Learning Projects for .NET Developers, and he showed some basics that I incorporated into my visualization. I still didn't know where to go from there. Recently, I've been doing some more F# exploration, and that inspired some ideas on how I might improve the digit recognizers.

To take a look at the history of the Digit Display & Recognition project, check out the "Machine Learning (sort of)" articles listed here: Jeremy Explores Functional Programming.

Blurring the Results
My first stab at trying to improve the recognizers came from reading Tomas Petricek's book Real-World Functional Programming. In the book, he shows a simple function for "blurring" an array:


There's a lot going on here, and I won't walk through it. But this takes an array of values and then averages each item with its neighbors.
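Since the original screenshot isn't reproduced here, this is a rough sketch of that kind of neighbor-averaging function (my own reconstruction, not the exact code from the book):

let blurArray (arr: int[]) =
    arr
    |> Array.mapi (fun i _ ->
        let lo = max 0 (i - 1)
        let hi = min (arr.Length - 1) (i + 1)
        // average each item with whatever neighbors it has
        (arr.[lo..hi] |> Array.sum) / (hi - lo + 1))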

Here's an example that creates an array of random values and then runs it through the "blurArray" function:
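(Again, this is my own sketch standing in for the original screenshot:)

let rnd = System.Random()
let values = Array.init 10 (fun _ -> rnd.Next(100))

printfn "%A" values                                         // the original random values
printfn "%A" (blurArray values)                             // blurred once
printfn "%A" (values |> blurArray |> blurArray |> blurArray) // blurred three times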


If we look at the output, the first array is a set of random numbers. The second output shows the result of running it through our blur function one time.

The last output shows the result of running it through the blur function three times. And we can see that the values get "smoother" (or "blurrier") with each step.

Applying Blur to the Digit Recognizer
When I saw this, I thought of the digit recognition problem. Our data was simply an array of numbers. What would happen if I ran a similar "blur" over the digit data?

Note: this code is available in the "BlurClassifier" branch of the "digit-display" project on GitHub: jeremybytes/digit-display "Blur Classifier".

The reason I thought of this is because the current algorithms are doing strict comparisons between 2 images (one pixel at a time). But if the images are offset (meaning translated horizontally or vertically by several pixels), then the current recognizers would not pick it up. If I added a "blur", then it's possible that it would account for situations like this.

Blurring the Data
Here's my function to blur the data that we have:


This is a bit more complex than the function we have above. That's because we're really dealing with 2-dimensional data. Each pixel has 8 adjacent pixels (including the row above and below).

I won't go into the details here. I skipped over the edges to make things a bit simpler, and I also weighted the "center" pixel so that it was averaged in 4 times more than the other pixels.
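The actual code is in the "BlurClassifier" branch linked above; here's a rough sketch of the idea (assuming the digit data is laid out as 28x28 pixel images -- that layout is my assumption):

let blurPixels (pixels: int[]) =
    let size = 28                       // assumption: 28x28 digit images
    pixels
    |> Array.mapi (fun i value ->
        let row, col = i / size, i % size
        // leave the edge pixels alone to keep things simple
        if row = 0 || row = size - 1 || col = 0 || col = size - 1 then
            value
        else
            // the 8 surrounding pixels
            let neighbors =
                [ for r in row - 1 .. row + 1 do
                    for c in col - 1 .. col + 1 do
                        if r <> row || c <> col then
                            yield pixels.[r * size + c] ]
            // weight the center pixel 4 times as heavily as its neighbors
            (List.sum neighbors + value * 4) / 12)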

The New Distance Function
With this in place, I could create a new "distance" function:


This takes 2 pixel arrays, blurs them, and then passes them to our Manhattan Distance function that we already have in place. This means that we can do a direct comparison between our Manhattan Distance recognizer and our new Blur Distance recognizer.
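Something along these lines (reusing the blurPixels sketch from above, with a minimal stand-in for the project's existing Manhattan Distance function):

let manhattanDistance (pixels1: int[]) (pixels2: int[]) =
    Array.fold2 (fun acc p1 p2 -> acc + abs (p1 - p2)) 0 pixels1 pixels2

let blurDistance (pixels1: int[]) (pixels2: int[]) =
    manhattanDistance (blurPixels pixels1) (blurPixels pixels2)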

The Results
Unfortunately, the results were less than stellar. Here's the output using our Digit Display application:


Note: When comparing the results, the numbers aren't in the same order due to the parallelization in the application. But they should be in the same general area in both sets of data.

There is both good and bad in the results. The good news is that we correctly identified several of the digits that the Manhattan Classifier got wrong.

The bad news is that there are new errors that the original classifier got right. But even with the new errors, it didn't perform any "worse" overall than the original. That tells me that there may be some good things that we can grab from this technique.

But now let's look at another approach.

Adding Some Weight
The other idea that I came up with had to do with how the "best" match was selected. Here's the basic function:


This runs the "distance" function (the "dist" right in the middle) to compare our target item against every item in the training set. In the distance calculation, smaller is better, so this just takes the smallest one that it can find.

But the "best" match isn't always the correct one. So I came up with the idea of looking at the 5 closest matches to come up with a consensus.

Note: this code is available in the "WeightedClassification" branch of the "digit-display" project on GitHub: jeremybytes/digit-display "Weighted Classification".

Here's that function:


This has quite a few steps to it. There's probably a much shorter way of doing this, but this makes it easy to run step-by-step using F# Interactive.

Instead of pulling the smallest value (using "minBy" in the original), it gets the 5 smallest values. It looks something like this (there are some bits left out to make it more readable):


Then it counts up how many of each value. In this case, we have three 6s and two 5s. Then it pulls out the one with the most entries in the list. (And 6 is correct in this case.)
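Put together, the consensus version looks roughly like this (again my reconstruction, reusing the Observation type from the sketch above):

let classifyByConsensus dist (trainingSet: Observation seq) (target: int[]) =
    trainingSet
    |> Seq.sortBy (fun obs -> dist obs.Pixels target)   // closest matches first
    |> Seq.take 5                                       // keep the 5 nearest
    |> Seq.countBy (fun obs -> obs.Label)               // e.g. [("6", 3); ("5", 2)]
    |> Seq.maxBy snd                                    // label with the most votes
    |> fst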

To put this into the application, I composed the functions a bit differently to come up with a "weighted" classifier that still used the Manhattan Distance.

The results were not very good:


This actually makes things less accurate overall. But looking at these results, there are still some promising signs.

First, several of the items that the standard Manhattan Classifier got wrong were correctly identified by the weighted classifier. This did reinforce that the smallest number was not always the correct number.

But there were also a lot of items that this new classifier identified incorrectly. So overall, the performance was worse than the original.

More Refinement
Although this looks like a failure, I think I'm actually headed in the right direction. One thing that I can do to make this more accurate is to add a true "weight" to the calculation. Here's another example from our current approach:


If we look at these values, the distance calculations are fairly close together (within about 1,500 of each other). In this case, we can pretty confidently take the one with the "most" values (which is 2 in this case).

But compare that to this:


Here we have a much bigger gap between our best value and our worst value (over 5000). And there is even a big gap between the first value and the next best value (over 4000). Because of this, I really want to weight the first value higher. A simple consensus doesn't work in this case (especially since we have a "tie").

So even though we get worse results with the current implementation, I think this really shows some promise.

If I can add some "weight" to each value (rather than simply counting them), I think it can improve the accuracy by eliminating some of the outliers in the data.

Wrap Up
I really like having the visualization for what the machine-learning algorithms are doing. This gives me a good idea of where things are going right and where they are going wrong. This is not something that I could get just from looking at "percentage correct" values.

These two approaches to improving the results didn't have the intended effect. But because we could see where they went right and where they went wrong, it's possible to refine these into something better.

I'll be working on adding actual weights to the weighted classifier. I think this holds the most promise right now. And maybe adding a bit of "blur" will help as well. More experimentation is needed. That means more fun for me to explore!

Happy Coding!

Sunday, August 14, 2016

Recognizing Hand-Written Digits: Easier Side-By-Side Comparison

I've been working some more on my code that recognizes hand-written digits. I've actually been trying out a few different approaches to try to improve the machine learning algorithm. But before talking about those, we'll take a look at a few improvements that I've made to the UI.

Note: Articles in this series are collected under the "Machine Learning (sort of)" heading here: Jeremy Explores Functional Programming.

New Features
I added a couple of features to the application. Here's a screenshot:


This code is available on GitHub in the "DetailedComparison" branch of the "digit-display" project: jeremybytes/digit-display "DetailedComparison". (This has been rolled into the "master" branch as well, but the code there may have been updated by the time you read this.)

Record Count
There's now an input field for the number of records to process. Previously, this was a value in the code. This makes it much easier to pick new values based on the screen size.

(Note: I just noticed the typo in the header. D'oh.)

Offset
Previously, we were only able to use records from the beginning of our data set. This offset allows us to start at an arbitrary location in our data file (we'll see why this is important in just a bit).

"Go" Button
Instead of processing things when the application starts up, we now have a button to kick things off. This also means that we can change the parameters and re-run the process without needing to restart the application.

Separate Duration
Each classifier now has its own duration timer. (The previous version just had a single timer for the entire process.)

Error Counts
This is actually the same as our previous version. But I wanted to point out that we can go through and click on the incorrect items. This changes the color of the item and increments our error count. This makes it easy to compare our two different recognizers.

This is still a manual process. We need a human to make a determination on whether the computer was correct. If I can't make out the hand-written digit, then I do not count it as an error.

Different Offsets
I wanted to have the offset available because I knew that if you keep trying to improve a recognizer against a small dataset, eventually you get really good at that small set. But that doesn't necessarily translate into being a good general-purpose recognizer.

I had done some experimentation by changing the offset in code. But having the parameter available makes things much easier. Here's an example of how different data makes a big difference:


Both of our recognizers performed significantly better with this set of data (which starts at item 500) compared to the data above (which starts at 0).

This is why it's important for us to look at different sets of data. When we start changing things, it may get better in some areas but worse in others.

Code Updates
I made a fairly significant change to the UI: I created a user control to run the recognizer and process the data. This moved a bunch of code out of the main form, and it also reduced quite a bit of duplication.

The application displays 2 of the user controls side-by-side, but we could display 3 or more (as many as we'd like, really). The user control makes that really easy.

Here's the code behind the "Go" button:


In our UI, we have 2 panels: LeftPanel and RightPanel. We start by clearing these out.

Then we grab the data. Since we have the parameters in the UI, I figured it was best to get the data in this central location (and only do it one time), and then we can pass the data to each of our user controls. The "LoadDataStrings" method grabs the data from the file based on our parameters.

Then we create 2 "RecognizerControl" objects (this is our user control). This has three parameters: (1) the string for the header, (2) the classifier (our recognizer), and (3) the data that will be processed.

We just create a user control for each of our recognizers and then add them to the panels in the UI. I'll be tweaking this part a bit in the future, but this works for us for now.

As a reminder, this code is available on GitHub in the "DetailedComparison" branch of the "digit-display" project: jeremybytes/digit-display "DetailedComparison".

Wrap Up
These changes aren't all that exciting. But they do put us in a place where it's very easy to swap out different recognizers. I originally wanted to add some drop-downs so that we could pick different recognizers, but I wanted to prioritize some other things before tweaking the UI further. That may get added in the future.

I've been playing with a couple of ideas to improve the recognizers. These have been inspired by some of the F# reading that I've been doing as well as some ideas of my own. We'll take a look at those next time.

[Update 08/15/2016: Here's the experimentation with those ideas: Getting Worse Before Getting Better.]

Happy Coding!

Tuesday, August 2, 2016

Jeremy at Live! 360 Orlando 2016

I'll be speaking at Live! 360 in Orlando, FL in December. It will be a great time. I had the chance to speak there last year, and it was a week packed with lots of great sessions, lots of great people, and a bit of fun, too.

I've got three sessions scheduled for Live! 360 Orlando in December. Check out my talks along with all the other great speakers here: Speakers - Live! 360.


If you're not convinced that this is an event you want to attend, you can get a preview by taking a look at the Live! 360 Learning Library.

If you head out there right now, you'll see me! To watch my video for FREE, just fill in your name and email address and click the "Access Now" button.


This recording is from Visual Studio Live! in Austin, TX this past May. You'll be able to see this talk (plus, tons of others) by signing up to go to Live! 360 Orlando in December.

Happy Coding!

Monday, August 1, 2016

August 2016 Speaking Engagements

I'm back on the road in August, and the rest of the year is filling in as well. If you'd like me to come to your event, be sure to drop me a line. Here are some of the things that I'm most passionate about: Presentation Topics.

Thu-Sat, Aug 18-20, 2016
Music City Code
Nashville, TN
Conference Site
o DI Why? Getting a Grip on Dependency Injection
o Unit Testing Makes Me Faster: Convincing Your Boss, Your Co-Workers, and Yourself

I'm really excited about heading to Music City Code in a couple weeks. It's my first visit to Nashville, so I'm looking forward to some new sights and the chance to share with a new group of people. I'm also looking forward to seeing some of my friends from the area.

I'm talking about two things that have been really useful to me: Dependency Injection and Unit Testing. I love talking about DI because I get to watch the light bulbs go on as people finally "get it". It's not a difficult topic; it's just that we're generally introduced to it completely backwards. When we look at it from the front, it makes a lot more sense.

I also love talking about Unit Testing. Lots of people have had bad experiences with unit testing. But if we pay attention to what we're doing, we can get some amazing benefits from it -- it's made me a faster developer.

A Full Day with Jeremy
If you've wanted to spend a full day with me, here's your chance. On September 30, I'll be conducting a full-day workshop as part of the Code Stars Summit in San Jose, CA.

Friday, Sep 30, 2016
Code Stars Summit
San Jose, CA
Workshop Site
o Getting Better at C#: Interfaces and Dependency Injection

Loosely coupled code is easier to maintain, extend, and test. Interfaces and Dependency Injection (DI) help us get there. In this workshop, we'll see how interfaces add "seams" to our code that make it easier to swap out functionality. We'll also see how DI gives us loose coupling for extensibility and testing. And this doesn't have to be complicated; just a few simple changes to our constructors and properties give us huge benefits. More...

Sign up soon to take advantage of Early Bird discounts.

Coming Soon
This fall will be pretty busy. In September, I'll be at AIM hdc in Nebraska. At the end of the month, I'll be speaking at the SouthBay.NET User Group in Mountain View, CA. And I'll also be doing a full-day workshop as part of Code Stars Summit in San Jose (as already mentioned).

October is filling up, including trips to San Jose, CA for the Silicon Valley Code Camp, to Chandler, AZ for the Desert Code Camp, and to St. Louis, MO for DevUp.

To get details and see a full list of places you can come see me, just take a look at my website: Upcoming Events.

Happy Coding!

Friday, July 29, 2016

Discover the Good: Object Oriented & Functional Programming

I've been spending a lot of time this month with functional programming - F# in particular. And I've been exploring functional programming for a while now (I just collected the articles, and there are more than I remembered).

Since I've written quite a bit this month, it has spurred some good conversations. I wanted to expand a bit on a question from Scott Nimrod (@Bizmonger).

As a side note, I got to meet Scott in person at NDC London back in January.

Are You Done with OO?
Here's a question Scott asked me a few days ago (Twitter link):


I've done Object-Oriented Programming for many years, and I've been successful with it. I've done Functional Programming for a much shorter time; and I really like it. But I don't see one paradigm replacing the other in my world.

These are different techniques with different strengths.

Discovering the Good
The thing I've been most impressed about in the Functional Programming community is the emphasis on discovering the good. And I've written about this before: An Outsider's View of the Functional Community.

Rather than getting caught up in "my language is better than your language", the people I know are more interested in what each does well. When we find out what each tool is really good at, we know what to reach for when a particular problem comes up in our environment.

Sporadic Discouragement
Even though there is an overwhelming amount of positive that I see in the functional community, every so often, I'll come across a really depressing article like "Why I'll Never Do OO Programming Again". This is often a rant about failures of projects and problems that arose in their environment.

And this makes me really sad.

Object Oriented Programming has served a good purpose over the years. And it will continue to do that.

I Really Love Object Oriented Programming
Object-Oriented programming techniques have enabled a lot of successes in my career. There are situations where having state and behavior travel together works very well.

I've programmed a lot of line-of-business applications over the years -- and having a mutable object that knows how to validate itself based on a set of rules is really great in that environment. Encapsulation is great for exposing behaviors to the outside world but hiding the implementation. When you combine this with objects that are easily data-bound to a UI, then things get even better.

Object-oriented programming is the right technique for a lot of situations.

It's also the wrong technique for others. It's difficult to multi-thread processes when there is shared, mutable state. But that's okay. When we recognize that, we can look for other techniques.

I Really Love Functional Programming
Functional programming techniques are awesome. I managed to pick up on a lot of functional-ish ideas in my C# programming with delegates, lambdas, and LINQ. I didn't realize it at the time, but these were pushing me toward a better understanding of functional programming.

By having functions with discrete inputs (parameters) and discrete outputs (return values) with no shared state, things get *extremely* easy to parallelize. And I've used these techniques in certain areas to get precisely this advantage.
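As a tiny illustration of that point (a made-up example, not code from any particular project), a pure function can be handed straight to Array.Parallel.map with no locks and no coordination:

// a self-contained, CPU-bound function: inputs in, result out, no shared state
let score (n: int) =
    [ 1 .. 10000 ] |> List.sumBy (fun i -> (n * i) % 7)

// safe to run in parallel precisely because score touches no shared state
let results = [| 1 .. 1000 |] |> Array.Parallel.map score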

Functional programming is the right technique for a lot of situations.

It's also the wrong technique for others. Data-binding immutable objects doesn't make sense since data-binding is all about notification of state change. But that's okay. When we recognize that, we can look for other techniques.

There's Enough Love to Go Around
I purposely stayed away from referring directly to languages here. In the .NET world, C# is an object-oriented language, but it also has a lot of features that enable functional programming. At the same time, F# is a functional language, but it has features that enable object-oriented programming (particularly since it needs to interop with .NET libraries that are OO-focused).
Object-Oriented Programming and Functional Programming are both awesome. They just happen to be awesome at different things.
We should really strive to use the right tool for the job. This means that we keep learning new things to add to our toolbox. I don't throw out my angle grinder because I got a new cordless drill. And I don't throw out perfectly working programming techniques just because I pick up a new one.

Keep learning; keep expanding; keep experimenting. What's best for your environment isn't necessarily what's best for my environment. And that's okay.

Happy Coding!

Tuesday, July 26, 2016

Sequences vs. Lists in F# (with Euler Problems #7, #8, #9, and #10)

In going through the Euler Problems in F#, I've found myself using lists and sequences a lot. That's probably because I'm comfortable with the map, filter, collect, sum, and similar functions -- they're so much like the LINQ functions I love in C#.

This means that my solutions have tended to be a bit of "brute force". But I have done some refinement later; this will continue.

I ended up working on the next set of Euler Problems at the same time -- bouncing between them and sharing ideas where I could. I came away with solutions that were reasonably performant. But I picked up something else from this process:
I saw how choosing a sequence over a list (or a list over a sequence) can drastically impact performance.
Let's take a look at the solutions, and then we'll look at my learnings about sequences and lists.

[Update: Collected articles for the first 10 Euler Problems in F# are available here: Jeremy Explores Functional Programming.]

Euler Problem #7
Here's Euler Problem #7:
By listing the first six prime numbers: 2, 3, 5, 7, 11, and 13, we can see that the 6th prime is 13.
What is the 10,001st prime number?
And here's my solution:


And here's running the function with the target value:


This is a bit of a brute force solution. The "isPrime" determines whether there are any factors of a number. This is what we used when looking at Euler Problem #3 (the "factor" function), but then I added the "Seq.isEmpty" check on to the end. If there are no factors, then the number is prime.
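Since the screenshots aren't reproduced here, this is a rough reconstruction of that solution based on the description (the actual code may differ in the details):

// prime check: look for any factor; if the sequence of factors is empty, it's prime
let isPrime n =
    seq { 2 .. n - 1 }
    |> Seq.filter (fun x -> n % x = 0)
    |> Seq.isEmpty

let euler7 n =
    match n with
    | 1 -> 2                                  // special case: the first prime is 2
    | n -> seq { 3 .. 2 .. 1000000 }          // odd numbers only
           |> Seq.filter isPrime
           |> Seq.item (n - 2)                // "item" is zero-based, and 2 is skipped

// euler7 10001 returns the 10,001st prime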

Why Sequence is Important Here
The reason that using a sequence is important here is that sequences are lazy-evaluated -- similar to how IEnumerable in C# is only evaluated once you start enumerating through it.

So when we use "isEmpty", this pulls the first item in the sequence (and only the first item). If it finds an item, then it short-circuits the evaluation of the rest of the sequence. So, we don't actually end up calculating all the factors for a number, we only calculate the *first* factor. If it exists, then the number is not prime, and we return false. If there are no factors, then the number is prime, and we return true.

Why the "match"?
In the "euler7" function, we use pattern matching on our number. The reason is that I wanted to do a bit of optimization.

The second pattern uses another sequence. It goes from 3 to 1 million, but I specified "[3..2..1000000]". The middle term means "step 2", so we're only taking every other number in this sequence. This skips all of the even numbers (which we know are not prime), and cuts our processing time in half.

But we have one exception: "2" (an even number) is prime. So, I manually add a pattern for that in case someone asks for the first prime number.

Why Sequence is Important Here (Again)
You'll notice that we're using a sequence in the "euler7" function as well. This is because we ask for a particular "item". This will evaluate the sequence up to that item (and no further).

The "n-2" looks odd. That's because "item" is zero-based. So to get the 10001st prime, we need to get the item at index 10000. But we also skipped "2" in this sequence, so we need to subtract one more number from our index to get an accurate match.

Performance
It takes 18 seconds to evaluate the 10,001st prime number. That's not too bad for a brute force attack. There's sure to be a better mathematical approach to this problem, but I haven't researched that yet.

Euler Problem #8
Let's move on to Euler Problem #8:
The four adjacent digits in the 1000-digit number that have the greatest product are 9 × 9 × 8 × 9 = 5832.
73167176531330624919225119674426574742355349194934
96983520312774506326239578318016984801869478851843
85861560789112949495459501737958331952853208805511
12540698747158523863050715693290963295227443043557
66896648950445244523161731856403098711121722383113
62229893423380308135336276614282806444486645238749
30358907296290491560440772390713810515859307960866
70172427121883998797908792274921901699720888093776
65727333001053367881220235421809751254540594752243
52584907711670556013604839586446706324415722155397
53697817977846174064955149290862569321978468622482
83972241375657056057490261407972968652414535100474
82166370484403199890008895243450658541227588666881
16427171479924442928230863465674813919123162824586
17866458359124566529476545682848912883142607690042
24219022671055626321111109370544217506941658960408
07198403850962455444362981230987879927244284909188
84580156166097919133875499200524063689912560717606
05886116467109405077541002256983155200055935729725
71636269561882670428252483600823257530420752963450
Find the thirteen adjacent digits in the 1000-digit number that have the greatest product. What is the value of this product?
First of all, a 1000-digit number?!?

My solution starts out by treating this as a giant string:


When we run this to find the 13 numbers that produce the largest product, we get this result:


Just a warning when you're looking for Euler solutions online. For this particular problem, there are different target values. This one uses 13, but there are also solutions that are looking for 5 adjacent numbers or another variation.

We don't need to treat the original number as a number, we can treat it as a string. What I decided to do was create a 13-character "window" on the string that I could slide through the entire 1000 characters.

So the "windowProduct" function takes a starting index and the size of the window (13 in our case), takes those characters and turns them into integers, and then multiplies them together.

A couple notes: I extracted out a "charToInt64" function to make converting a character to a number a bit more obvious. And I'm using 64-bit values here because our product will overflow a 32-bit integer.

For the "euler8" function, I create a sequence that includes all of the possible windows in our 1000-digit number. It finds the product of all those numbers, and picks out the biggest one.

Why Sequence is Important Here
The sequence inside "windowProduct" is important because we're dealing with a string here. In C#, a string implements the IEnumerable<char> interface; in F#, string is a "seq<char>". Since we already have a sequence, we'll just keep working with it.

Why List is Important Here
Inside the "euler8" function, we use a list rather than a sequence. The reason is that we need all of the possible values in order to find the "max" value. This means that all of the items need to be evaluated. Because of this (and the fixed size of the collection), a list is a bit more efficient here.

Performance is good (less than a 10th of a second), but in my (extremely unscientific) tests, using a sequence in this function took about twice as long. With these particular values, it's not a significant difference. But it's something to keep in mind.

Euler Problem #9
So let's move on to Euler Problem #9:
A Pythagorean triplet is a set of three natural numbers, a < b < c, for which,
a² + b² = c²
For example, 3² + 4² = 9 + 16 = 25 = 5².
There exists exactly one Pythagorean triplet for which a + b + c = 1000.
Find the product abc.
I did a very severe brute-force on this:


This does give the correct answer:


(And in case you're wondering, the three values are 200, 375, and 425.)

The "squareList" is a list that contains the squares for the numbers from 1 to half of our target value (I used half because it sounded reasonable since we were adding and squaring numbers in various ways -- I can't say that I have a mathematical proof for why this is a good value).

The "collect" function creates a Cartesian product of 2 sets of these lists of squares (similar to what we did for Euler Problem #4). This creates all of the permutations of values -- adding each square together and recording the value. The result is a list of tuples with the first square, the second square, and the sum of the two squares.

The "filter" compares the sum of the two squares against our "squareList" (since this is a list of known squares). It hangs on to only those values where the sum is also a square.

The "map" takes the square root of all of our values (yes, I know this isn't very efficient, but when I was trying to come up with a data structure that held both the squares and the original values, I ended up with a bit of a mess).

The "find" function gets the values where the original (non-squared) values add up to our target (1000).

Finally, we have the "prodTuple" function which will multiply the 3 values in our tuple together to get us the ultimate result.

Why Lists are Important Here
Like with our previous example, we have a fixed number of values (the Cartesian product of all of the squares from 1 to half of our target). Since this is a fixed size, and we're evaluating everything, a sequence wouldn't buy us any short-circuiting advantages.

Performance
The performance is okay. It takes a little over a second to evaluate this. Again, there is probably a better mathematical approach to this. But computers are very good at brute-forcing things (that's why they're so good at playing chess).

Euler Problem #10
Finally today, we have Euler Problem #10:
The sum of the primes below 10 is 2 + 3 + 5 + 7 = 17.
Find the sum of all the primes below two million.
Here's my solution (that performs very well):


And when we run this with our target value, we get the correct answer:


This isn't my first try at this. My first try was a brute force approach. That didn't come out so well:


It worked, but it took almost an hour and a half to complete. This is really not acceptable.

Sieve of Eratosthenes
To get acceptable performance, I abandoned the brute-force approach and used something known as the Sieve of Eratosthenes. This is really good at finding the prime numbers below a certain value.

The short version is that we can eliminate all values that are multiples of a prime. Let's take the values from 2 to 20:

[2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]

First, we can eliminate everything evenly divisible by 2 (our first prime), which leaves:

[2, 3, 5, 7, 9, 11, 13, 15, 17, 19]

Then we can eliminate everything evenly divisible by 3 (our second prime), which leaves:

[2, 3, 5, 7, 11, 13, 17, 19]

We only have to go up to the square root of our target value (that's the number I mistakenly applied to a different problem previously). Since the square root of 20 is 4.x, we can stop at 4.

What's left is the set of primes less than 20:

[2, 3, 5, 7, 11, 13, 17, 19]

A Clumsy Approach
I'll admit that my implementation of the Sieve of Eratosthenes is a bit clumsy. But it works. You'll see that we have the same "isPrime" function that we used in Euler Problem #7 (although it's just on a single line here).

Then I calculate a set of "sievePrimes". This uses brute-force to get the prime numbers up to 1414 (which happens to be the square root of our target number).

The "divisibleByRange" function takes a number and sees if it is evenly divisible by any value in a range of values. When we use our "sievePrimes" as the range, we can find out which values can be eliminated in our sieve.

The "euler10" function applies the "divisibleByRange" against the entire set of numbers from 2 to our target value. We have to use the "not" because "divisibleByRange" returns true if it's divisible; and these are the numbers we do *not* want to include in our list.

Why Sequence is Important
Our "isPrime" uses a sequence. The importance of this is the short-circuiting that we saw earlier.

The sequence in "sievePrimes" is important because we don't know how many values we'll end up with. All we know is that there will be fewer than 1414. So rather than allocating a list, we'll use a sequence.

Then we take that sequence and turn it into a list. Why?

Why List is Important
The reason why we turn this sequence into a list is that once we've evaluated it one time, we *do* know how many items are in the list (and there's no reason to re-evaluate them later).

"divisibleByRange" works with a list because it's designed to work with our "sievePrimes" value, which we just saw is a list.

Why Sequence is Important
In "euler10" I start with a sequence because I don't know how many prime numbers will be in our collection.

The rest of the function will "map" the values to 64-bit integers (because our sum will overflow a 32-bit integer), and then we call "sum" to get our result.

Sequences and Lists
In going through these examples, I had to think quite a bit about whether I wanted to use a sequence or a list. I found that if I picked the wrong one, I would end up taking a performance hit.

For example, when I used a sequence for the "sievePrimes" (instead of converting it to a list), things slowed down significantly -- so much that I didn't even wait for the process to finish. Since this value is used over and over again, it's better for it to be in a list that's fully evaluated.

At the same time, for "isPrime" it works much better to use a sequence. As soon as the first value returns from the sequence, the "isEmpty" function returns false. This short-circuits evaluation of the rest of the collection.

Short Version:
Sequences are good when we don't know how many values we'll have or when we want to short-circuit evaluation.
Lists are good when we know how many values we have or when we want to evaluate everything (such as a max or sum).
This is something that I "knew" from reading. But when I was working with these particular problems, I saw the difference in action.

At times, the differences were insignificant. But at other times, the performance differences were dramatic -- to the point of making one solution completely unusable.

Wrap Up
I've been doing a lot of experimentation lately. This has been a great help in showing me the real differences in different solutions. In this case, I feel like I've got a much better handle on when sequences are appropriate and when lists are appropriate. (You'll notice that I didn't include any arrays here -- I guess that's something I'll have to look into.)

I know that there are better solutions to the Euler Problems than the ones that I've got here -- for example, I know that there are recursive implementations of the Sieve of Eratosthenes. But I'm not done looking into these problems just yet. It's time for me to do a bit more exploration online to see what other folks have come up with.

If you have solutions or ideas that you'd like to share, feel free. The comments on the other Euler articles that I've written have been very helpful to me (and hopefully to others as well).

Happy Coding!