“Word Counts”: Using FSharp and HDInsight

 

I decided to learn a bit more about HDInsight, Microsoft’s implementation of Hadoop on Azure.  I was surprised by the dearth of tutorials online (not even on Pluralsight), with only this one seemingly having what I wanted.  I started down the tutorial path and rewrote the map and reduce programs in F#.

Here is the original mapper code (in C#)

    static void Main(string[] args)
    {
        if (args.Length > 0)
        {
            Console.SetIn(new StreamReader(args[0]));
        }

        string line;
        string[] words;

        while ((line = Console.ReadLine()) != null)
        {
            words = line.Split(' ');

            foreach (string word in words)
                Console.WriteLine(word.ToLower());
        }
    }

And here it is in F#

    open System
    open System.IO

    [<EntryPoint>]
    let main argv =
        // Read from a file instead of stdin when a path is supplied
        if argv.Length > 0 then
            let inputString = argv.[0]
            Console.SetIn(new StreamReader(inputString))
        let mutable continueLooping = true
        while continueLooping do
            let line = Console.ReadLine()
            // Stop only at end of input (null), as the C# does;
            // testing IsNullOrEmpty would end the loop at the first blank line
            match isNull line with
            | true ->
                continueLooping <- false
            | false ->
                let words = line.Split(' ')
                words |> Seq.iter (fun w -> Console.WriteLine(w.ToLower()))
        0

 

And here is the original reducer in C#

    static void Main(string[] args)
    {
        string word, lastWord = null;
        int count = 0;

        if (args.Length > 0)
        {
            Console.SetIn(new StreamReader(args[0]));
        }

        while ((word = Console.ReadLine()) != null)
        {
            if (word != lastWord)
            {
                if(lastWord != null)
                    Console.WriteLine("{0}[{1}]", lastWord, count);

                count = 1;
                lastWord = word;
            }
            else
            {
                count += 1;
            }
        }
        Console.WriteLine(count);
    }

and here it is in F#

    open System
    open System.IO

    [<EntryPoint>]
    let main argv =
        if argv.Length > 0 then
            let inputString = argv.[0]
            Console.SetIn(new StreamReader(inputString))
        let mutable continueLooping = true
        // null (not String.Empty) marks "no word seen yet", as in the C#,
        // so an empty input line is still counted as a word
        let mutable lastWord : string = null
        let mutable count = 0
        while continueLooping do
            let word = Console.ReadLine()
            match isNull word, word = lastWord, isNull lastWord with
            | true,_,_ ->
                continueLooping <- false
            | false,true,_ ->
                count <- count + 1
            | false,false,true ->
                count <- 1
                lastWord <- word
            | false,false,false ->
                // A new word starts: emit the finished one, then reset
                Console.WriteLine("{0}[{1}]",lastWord,count)
                count <- 1
                lastWord <- word
        // As in the C# original, the final count is printed without its word
        Console.WriteLine(count)
        0

 

The biggest difference is that the conditional if..then statements of the imperative-style C# are replaced by pattern matching, which I feel makes the logic much more understandable.  The use of the mutable keyword is a smell, but I am not sure how to loop over user input in a Console app without it.
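One possible way around the mutable flag, offered as a sketch rather than a tested replacement for the programs above, is to wrap the reader in a lazy sequence with Seq.unfold and keep the mapping logic in a pure function. The names readLines and mapLine are made up for this illustration.

```fsharp
open System
open System.IO

// Hypothetical helper: lazily yields lines until ReadLine returns null (end of input).
let readLines (reader: TextReader) : seq<string> =
    Seq.unfold (fun () ->
        match reader.ReadLine() with
        | null -> None
        | line -> Some (line, ())) ()

// Hypothetical helper: the mapper's logic as a pure function,
// producing one lower-cased word per element.
let mapLine (line: string) : seq<string> =
    line.Split(' ') |> Seq.map (fun w -> w.ToLower())

// Wiring it to the console could then look like this (left commented out
// so the snippet does not block waiting on stdin):
// readLines Console.In
// |> Seq.collect mapLine
// |> Seq.iter (fun (w: string) -> Console.WriteLine w)
```

The loop and the mutable state are still there, but they live inside the sequence machinery instead of in the program, and the mapping step can be exercised without a console.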

In any event, with the programs complete and pushed out to the Hadoop file system, I ran the job via Azure PowerShell.

Looking at the output, nothing came down.

Drat.  I then ran the C# program, and nothing came down either.  I wonder if it is a problem with the original code or perhaps with the data I am using.  The tutorial does not include a link to a dataset that works with the programs, so I am a bit out of luck.  More investigation needed, as it were.
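One way to start narrowing down code versus data would be to mimic the streaming flow locally: Hadoop streaming feeds the mapper's output, sorted, into the reducer, so the same three steps can be run in memory on a few lines of text. The sketch below is only that, a sketch: mapWords, countSorted, wordCount and the sample lines are all invented for illustration and are not part of the tutorial or its dataset.

```fsharp
open System

// Hypothetical helper mirroring the mapper: one lower-cased word per element.
let mapWords (line: string) : string[] =
    line.Split(' ') |> Array.map (fun w -> w.ToLower())

// Hypothetical helper standing in for the reducer. On sorted input, counting
// occurrences gives the same totals as counting runs of adjacent equal words.
let countSorted (sortedWords: seq<string>) : (string * int) list =
    sortedWords |> Seq.countBy id |> List.ofSeq

// Simulates the streaming flow: map every line, sort, then reduce.
let wordCount (lines: seq<string>) : (string * int) list =
    lines
    |> Seq.collect mapWords
    |> Seq.sort
    |> countSorted

// Made-up sample input, not the tutorial's data.
let sample = [ "the quick brown fox"; "The lazy dog"; "the end" ]

// Print in the same word[count] shape the reducer uses.
wordCount sample |> List.iter (fun (w, n) -> printfn "%s[%d]" w n)
```

If the counts look right here, the logic is probably sound and attention can shift to the input data or the job configuration.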