Association Rule Learning Via F# (Part 1)

I was reading the most recent MSDN when I came across this article.  How awesome is this?  McCaffrey did a great job explaining a really interesting area of analytics and I am loving the fact that MSDN is including articles about data analytics.  When I was reading the article, I ran across this sentence “The demo program is coded in C# but you should be able to refactor the code to other .NET languages such as Visual Basic or Iron Python without too much difficulty”  Iron Python?  Iron Python!  What about F#, the language that matches analytics the way peanut butter goes with chocolate?  Challenge accepted!

The first thing I did was to download his source code from here.  When I first opened the source code, I realized that the code would be a little bit hard to port because it is written from a scientific angle, not a business application point of view.  34 FxCop errors in 259 lines of code confirmed this:

image

Also, there are tons of comments which is very distracting. I generally hate comments, but I figure that since it is a MSDN article and it is supposed to explain what is going on, comments are OK. However, many of the comments can be refactored into more descriptive variables and method names. For example:

imageimage

In any event, let’s look at the code. The first thing I did was change the CS project from a console app to a library and move the test data into an another project . I then moved the console code to the UI. I also moved the Rule class code into its own file, made sure the namespaces matched, and made the AssociationRuleProgram public.  Yup it still runs:

imageimage

So then I created a FSharp library in the solution and set up the class with the single method:

image

A couple of things to note:

1) I left the parameter naming the same, even though it is not particularly intention-revealing

2) F# is typed inferred, so I don’t have to assign the types to the parameters

Next, started looking at the supporting functions to GetHighConfRules.  Up first was the function call NextCombination.  Here is the side by side between the imperative style and the functional style:

imageimage

The next function was NextCombination was more difficult for me to understand.  I stopped what I was doing and built a unit test project that proved correctness using the commented examples as the expected values.  I used 1 test project for both the C# and F# project so I could see both side by side.  An interesting side not is that the unit test naming is different than usual –> instead of naming the class XXXXTests where XXXX is the name of another class, XXXX is the function name that both classes are implementing:

So going back to the example,

image

I wrote two unit tests that match the two comments

image

When I ran the tests, the 1st test passed but the second did not:

image

The problem with the failing test is that null is not being returned, rather {3,4,6}.  So now I have a problem, do I base the F# implementation on the code comments or the code itself?  I decided to base it on the code, because comments often lie but CODE DON”T LIE (thanks ‘sheed).  I adjusted the unit test, got green. 

One of the reasons the code is pretty hard to read/understand is because of the use of ‘i’,’j’,’k’,’n’ variables.  I went back to the article and McCaffrey explains what is going on at the bottom left of page 60.  Another name for the function ‘NextCombination’ could be called ‘GetLexicographicalSuccessor’ and the variable ‘n’ could be called ‘numberOfPossibleItems’.   With that mental vocabulary in place, I went through the functional and divided it into 4 parts:

1Checking to see if the value of the first element is of a certain length

image

2) Creating a result array that is seeded with the values of the input array

image

3) Looping backwards to identify the 1st number in the array that will be adjusted

image

4) From that target element, looping forward and adjusting all subsequent items

image

#1 I will not worry about now and #2 is not needed in F#, so #3 is the first place to start.  What I need is a way of splitting the array into two parts.  Part 1 has the original values that will not change and part 2 has the values that will change.  Seq.Take and Seq.Skip are perfect for this:

  1. let i = Array.LastIndexOf(comb,n)
  2. let i' = if i = – 1 then 0 else i
  3. let comb' = comb |> Seq.take(i') |> Seq.toArray
  4. let comb'' = comb |> Seq.skip(i') |> Seq.toArray

Looking at #4, I now need to increment the values in part 2 by 1.  Seq.take will work:

image

And then putting part 1 and part 2 back together via Array.Append, we have equivalence*:

imageimage

*Equivalence is defined by my unit tests, which both pass green.  I have no idea if other inputs will not work.  Note that the second unit test runs red, so I really think that the code is wrong and that the comment to return null is correct.  The value I am getting for (3;4;5)(5) is (3;4;1) which seems to make sense.

I am not crazy about these explanatory variables (comb’, comb’’, and comb’’’) but I am not sure how to combine them without sacrificing readability.  I definitely want to combine the i and i’ into 1 statement…

I am not sure why Scan is returning 4 items in an array when I am passing in an array that has a length of 3.  I am running out of time today so I just hacked in a Seq.Take.

I’ll continue this exercise in my blog next week.

 

Advertisements

One Response to Association Rule Learning Via F# (Part 1)

  1. Pingback: F# Weekly #20, 2014 | Sergey Tihon's Blog

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: