# Using Subsets for Association Rule Learning

June 3, 2014 1 Comment

I finished up writing the association rule program from MSDN in F# last week. One of the things bothering me about the way I implemented the algorithms is that I hard-coded the combinations (antecedent and consequent) from the item-sets:

- static member GetCombinationsForDouble(itemSet: int[]) =
- let combinations = new List<int[]*int[]*int[]>()
- combinations.Add(itemSet, [|itemSet.[0]|],[|itemSet.[1]|])
- combinations
- static member GetCombinationsForTriple(itemSet: int[]) =
- let combinations = new List<int[]*int[]*int[]>()
- combinations.Add(itemSet, [|itemSet.[0]|],[|itemSet.[1];itemSet.[2]|])
- combinations.Add(itemSet, [|itemSet.[1]|],[|itemSet.[0];itemSet.[2]|])
- combinations.Add(itemSet, [|itemSet.[2]|],[|itemSet.[0];itemSet.[1]|])
- combinations.Add(itemSet, [|itemSet.[0];itemSet.[1]|],[|itemSet.[2]|])
- combinations.Add(itemSet, [|itemSet.[0];itemSet.[2]|],[|itemSet.[1]|])
- combinations.Add(itemSet, [|itemSet.[1];itemSet.[2]|],[|itemSet.[0]|])
- combinations

I thought it would be a fun exercise to make a function that returns the combinations for an N number of itemSets. My first several attempts failed because I started off with the wrong vocabulary. I spent several days trying to determine how to create all of the combinations and/or permutations from the itemSet. It then hit me that I would be looking at getting all subsets and what do you know, there are some excellent examples out there.

So if I was going to use the yield and yield-bang method of calculating the subsets in my class, I first needed to remove the rec and just let the class call itself.

- static member Subsets s =
- set [
- yield s
- for e in s do
- yield! AssociationRuleProgram2.Subsets (Set.remove e s) ]

I then needed a way of translating the itemSet which is a an int array into a set and back again. Fortunately, the set module has ofArray and toArray functions so I wrote my code exactly the way I just described the problem:

- static member GetAntcentAndConsequent(itemSet: int[]) =
- let combinations = new List<int[]*int[]*int[]>()
- let itemSet' = Set.ofArray itemSet
- let subSets = AssociationRuleProgram2.Subsets itemSet'
- let subSets' = Set.toArray subSets
- let subSets'' = Array.map(fun s-> Set.toArray s)
- let subSets''' = Array.map(fun s -> Seq.toArray s, AssociationRuleProgram2.GetAntcentAndConsequent s)

Note that I had to call toArray twice because the Subsets returns a Set<Set<Int>>.

In any event, I then needed a way of spitting the itemSet into antecedents and consequents (called combinations) based on the current subset. I toyed around with a couple different ways of solving the problem when I stumbled upon a way that makes alot of sense to me. I changed the itemset from an array of int to an array of tuple<int*bool>. If the subset is in the itemSet, then the bool flag is true, if not it is false. Then, I would apply an Seq.filter to the array and separate it out into antecedents and consequents.

- static member GetCombination array subArray =
- let array' = array |> Seq.map(fun i -> i, subArray |> Array.exists(fun j -> i = j))
- let antecedent = array' |> Seq.filter(fun (i,j) -> j = true) |> Seq.toArray
- let consquent = array' |> Seq.filter(fun (i,j) -> j = false) |> Seq.toArray
- let antecedent' = antecedent|> Seq.map(fun (i,j) -> i)
- let consquent' = consquent|> Seq.map(fun (i,j) -> i)
- Seq.toArray antecedent', Seq.toArray consquent'

The major downside of this approach is that I am using Array.exists for my filter flag so if there is more than one of the same value in the itemset, it does not work. However, the original example had each itemset being unique so think I am OK.

So with these tow methods, I now have a way of dealing with N number of itemsets. Interestingly, the amount of code (even with my verbose F#) is significantly less than the C# equivalent and closer to how I think I think.

Pingback: F# Weekly #23, 2014 | Sergey Tihon's Blog