Using Subsets for Association Rule Learning
June 3, 2014 1 Comment
I finished up writing the association rule program from MSDN in F# last week. One of the things bothering me about the way I implemented the algorithms is that I hard-coded the combinations (antecedent and consequent) from the item-sets:
I thought it would be a fun exercise to make a function that returns the combinations for an N number of itemSets. My first several attempts failed because I started off with the wrong vocabulary. I spent several days trying to determine how to create all of the combinations and/or permutations from the itemSet. It then hit me that I would be looking at getting all subsets and what do you know, there are some excellent examples out there.
So if I was going to use the yield and yield-bang method of calculating the subsets in my class, I first needed to remove the rec and just let the class call itself.
I then needed a way of translating the itemSet which is a an int array into a set and back again. Fortunately, the set module has ofArray and toArray functions so I wrote my code exactly the way I just described the problem:
Note that I had to call toArray twice because the Subsets returns a Set<Set<Int>>.
In any event, I then needed a way of spitting the itemSet into antecedents and consequents (called combinations) based on the current subset. I toyed around with a couple different ways of solving the problem when I stumbled upon a way that makes alot of sense to me. I changed the itemset from an array of int to an array of tuple<int*bool>. If the subset is in the itemSet, then the bool flag is true, if not it is false. Then, I would apply an Seq.filter to the array and separate it out into antecedents and consequents.
The major downside of this approach is that I am using Array.exists for my filter flag so if there is more than one of the same value in the itemset, it does not work. However, the original example had each itemset being unique so think I am OK.
So with these tow methods, I now have a way of dealing with N number of itemsets. Interestingly, the amount of code (even with my verbose F#) is significantly less than the C# equivalent and closer to how I think I think.