Microsoft Language Stack Analogy

I am getting ready for my presentations at Charlotte Code Camp next Saturday.  My F# session is a business-case driven one: reasons why the average C# developer might want to take a look at F#.  I break the session down into 5 sections:  F# is integrated, fast, expressive, bug-resistant, and analytical.  In the fast piece, I am going to make the analogy of Visual Studio to a garage. 

Consider a man who lives in a nice house in a suburban neighborhood with a three-car garage. Every morning when he gets ready for his commute to work, he opens the door that goes from his house into his garage, and there, sitting in the 1st bay, is a minivan.

Now there is nothing wrong with the minivan – it is dependable, all of the neighbors drive one, and it does many things pretty well.  However, consider that right next to the minivan, never used, is a Ferrari.  Our suburban programmer has heard about the Ferrari, and has perhaps even glanced at it curiously when he pulls out in the morning, but he:

  • Doesn’t see the point of driving it because the minivan suits him just fine
  • Is afraid to try driving it because he doesn’t drive stick and taking the time to learn would slow him down
  • Doesn’t want to drive it because then he would have to explain to his project manager wife why he is driving around town in such a car

So the Ferrari sits unused.  To round out the analogy, in the 3rd bay is a helicopter that no one in their right mind will touch.  Finally, there is a junked car around back that no one uses anymore that he has to keep around because it is too expensive to haul it to the junkyard.

So this is what happens to a majority of .NET developers when they open their garage called Visual Studio.  They go with the comfortable language of the C# minivan, ignoring the power and expressiveness of the F# Ferrari and certainly not touching the C++ helicopter.  I picked a helicopter for C++ b/c helicopters can go places cars cannot, are notoriously difficult to pilot, and when they crash, it is often spectacular and brings down others with them.  The junked car is VB.NET, which makes me sad on certain days….

Also, since C# 2.0, the minivan has tried to become more Ferrari-like.  It has added a turbo engine called LINQ, the var keyword, anonymous types, and the dynamic keyword, all in an attempt to become the one minivan that shall rule them all.

I don’t know much about Roslyn, but from what I have seen, I think I can remove language syntax and the code will still compile.  If so, I will try to write a C# program that removes all curly braces and semicolons and replaces the var keyword with let.  Is it still C# then?

OT: can you tell which session I am doing at the Hartford Code Camp in 2 weeks?

image

(And no, I did not submit in all caps.  I guess the organizer is very excited about the topic?)

F# and List Manipulations

I am preparing for a Beginning F# dojo for TRINUG tomorrow and I decided to do a presentation of Seq.groupBy, Seq.countBy, and Seq.sumBy for tuples.  The difference among these functions is not apparent from their names, and I think knowing them is indispensable when doing any kind of list analysis.

I started with a basic list like so:

    let data = [("A",1);("A",3);("B",2);("C",1)]

I then ran a GroupBy through the REPL and got the following results:

    let grouping =
        data
        |> Seq.groupBy(fun (letter,number) -> letter)
        |> Seq.iter (printfn "%A")

    ("A", seq [("A", 1); ("A", 3)])
    ("B", seq [("B", 2)])
    ("C", seq [("C", 1)])

I then ran a CountBy through the REPL and got the following results:

    let counting =
        data
        |> Seq.countBy(fun (letter,number) -> letter)
        |> Seq.iter (printfn "%A")

    ("A", 2)
    ("B", 1)
    ("C", 1)

I then ran a SumBy through the REPL and got the following results:

    let summing =
        data
        |> Seq.sumBy(fun (letter,number) -> number)
        |> printfn "%A"

    7

Now the fun begins.  I combined a GroupBy and a CountBy through the REPL and got the following results:

    let groupingAndCounting =
        data
        |> Seq.groupBy(fun (letter,number) -> letter)
        |> Seq.map(fun (letter,sequence) -> (letter,sequence |> Seq.countBy snd))
        |> Seq.iter (printfn "%A")

    ("A", seq [(1, 1); (3, 1)])
    ("B", seq [(2, 1)])
    ("C", seq [(1, 1)])

Next I combined a GroupBy and a SumBy through the REPL and got the following results:

    let groupingAndSumming =
        data
        |> Seq.groupBy(fun (letter,number) -> letter)
        |> Seq.map(fun (letter,sequence) -> (letter,sequence |> Seq.sumBy snd))
        |> Seq.iter (printfn "%A")

    ("A", 4)
    ("B", 2)
    ("C", 1)

I then combined all three:

    let groupingAndCountingSummed =
        data
        |> Seq.groupBy(fun (letter,number) -> letter)
        |> Seq.map(fun (letter,sequence) -> (letter,sequence |> Seq.countBy snd))
        |> Seq.map(fun (letter,sequence) -> (letter,sequence |> Seq.sumBy snd))
        |> Seq.iter (printfn "%A")

    ("A", 2)
    ("B", 1)
    ("C", 1)
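The three-step pipeline above ends up with the number of tuples per letter: the inner countBy produces one (number, count) pair per distinct number, and summing those counts gives the size of the group. For comparison, here is a minimal sketch that keys Seq.countBy on the first element of the original tuples, which should produce the same pairs for this data. The name countsByLetter is just for illustration.

```fsharp
let data = [("A",1);("A",3);("B",2);("C",1)]

// Count the tuples per letter directly, keyed on the first element of each tuple.
let countsByLetter =
    data
    |> Seq.countBy fst
    |> Seq.toList

countsByLetter |> List.iter (printfn "%A")
// ("A", 2)
// ("B", 1)
// ("C", 1)
```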

With this in hand, I created a way of both counting and summing the second value of a tuple, which is a pretty common task:

    let revisedData =
        let summed =
            data
            |> Seq.groupBy(fun (letter,number) -> letter)
            |> Seq.map(fun (letter,sequence) -> (letter,sequence |> Seq.sumBy snd))
        let counted =
            data
            |> Seq.groupBy(fun (letter,number) -> letter)
            |> Seq.map(fun (letter,sequence) -> (letter,sequence |> Seq.countBy snd))
            |> Seq.map(fun (letter,sequence) -> (letter,sequence |> Seq.sumBy snd))
        // Both sequences list the letters in the same order, so zip pairs them up.
        // The second letter is ignored with _ because a name cannot be bound twice in one pattern.
        Seq.zip summed counted
        |> Seq.map(fun ((letter,summed),(_,counted)) -> letter,summed,counted)
        |> Seq.iter (printfn "%A")

    ("A", 4, 2)
    ("B", 2, 1)
    ("C", 1, 1)
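The same sum-and-count result can also be sketched with a single groupBy, computing both values from each group instead of zipping two pipelines. This is only an alternative under the assumption that a count of tuples per letter is what is wanted; the name sumAndCountByLetter is made up for the example.

```fsharp
let data = [("A",1);("A",3);("B",2);("C",1)]

// Group once, then compute the sum and the count from the same group.
let sumAndCountByLetter =
    data
    |> Seq.groupBy fst
    |> Seq.map (fun (letter, group) ->
        letter, Seq.sumBy snd group, Seq.length group)
    |> Seq.toList

sumAndCountByLetter |> List.iter (printfn "%A")
// ("A", 4, 2)
// ("B", 2, 1)
// ("C", 1, 1)
```

Grouping once avoids relying on both pipelines returning the letters in the same order.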

Finally, Mathias pointed out that I could use this as an entry to Deedle.  Which is a really good idea….

F# and the Open/Closed Principle

One of the advantages of using F# is that it is a .NET language.  Although F# is a functional-first language, it also supports object-oriented constructs.  One of the most powerful (indeed, the most powerful) techniques in OO programming is using interfaces to follow the Open/Closed principle.  If you are not familiar with it, a good explanation of the Open/Closed principle is found here.

As part of the F# for beginners dojo I am putting on next week, we are consuming and then analyzing Twitter.  The problem with always making calls to Twitter is that

1) The data changes every call

2) You might get throttled

Therefore, it makes good sense to have an in-memory representation of the data for testing and some Twitter data on disk so that different experiments can be run on the same data to see the result.  Using Interfaces in F# makes this a snap.

First, I created an interface:

    namespace NewCo.TwitterAnalysis

    open System
    open System.Collections.Generic

    type ITweeetProvider =
       abstract member GetTweets : string -> IEnumerable<DateTime * int * string>

Next, I created the actual Twitter feed.  Note that I am using Tweetinvi (available on NuGet) and that this file has to be below the interface file in the Solution Explorer, because F# compiles files in order:

    namespace NewCo.TwitterAnalysis

    open System
    open System.Configuration
    open Tweetinvi

    type TwitterProvider() =
        interface ITweeetProvider with
            member this.GetTweets(stockSymbol: string) =
                let consumerKey = ConfigurationManager.AppSettings.["consumerKey"]
                let consumerSecret = ConfigurationManager.AppSettings.["consumerSecret"]
                let accessToken = ConfigurationManager.AppSettings.["accessToken"]
                let accessTokenSecret = ConfigurationManager.AppSettings.["accessTokenSecret"]

                TwitterCredentials.SetCredentials(accessToken, accessTokenSecret, consumerKey, consumerSecret)
                let tweets = Search.SearchTweets(stockSymbol)
                tweets
                    |> Seq.map(fun t -> t.CreatedAt, t.RetweetCount, t.Text)

I then hooked up a unit (integration, really) test:

    using Microsoft.VisualStudio.TestTools.UnitTesting;
    using NewCo.TwitterAnalysis;

    [TestClass]
    public class UnitTest1
    {
        [TestMethod]
        public void GetTweetsUsingIBM_returnsExpectedValue()
        {
            ITweeetProvider provider = new TwitterProvider();
            var actual = provider.GetTweets("IBM");
            Assert.IsNotNull(actual);
        }
    }

Sure enough, it ran green with actual Twitter data coming back:

image

I then created an In-Memory Tweet provider that can be used to:

1) Provide repeatable results

2) Have 0 external dependencies so that I can monkey with the code and a red unit test really does mean red

Here is its implementation:

    namespace NewCo.TwitterAnalysis

    open System
    open System.Collections.Generic

    type InMemoryProvider() =
        interface ITweeetProvider with
            member this.GetTweets(stockSymbol: string) =
                let list = new List<(DateTime*int*string)>()
                list.Add(DateTime.Now, 1,"Test1")
                list.Add(DateTime.Now, 0,"Test2")
                list :> IEnumerable<(DateTime*int*string)>

The only really interesting thing is the smiley/bird operator (:>), which casts the list up to the interface's return type.  F# handles interfaces a bit differently than what I was used to: it implements them explicitly.  I then fired up a true unit test and it also ran green:

    using System;
    using System.Linq;
    using Microsoft.VisualStudio.TestTools.UnitTesting;
    using NewCo.TwitterAnalysis;

    [TestClass]
    public class InMemoryProviderTests
    {
        [TestMethod]
        public void GetTweetsUsingValidInput_ReturnsExpectedValue()
        {
            ITweeetProvider provider = new InMemoryProvider();
            var tweets = provider.GetTweets("TEST");
            var tweetList = tweets.ToList();
            Int32 expected = 2;
            Int32 actual = tweetList.Count;
            Assert.AreEqual(expected, actual);
        }
    }
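The explicit interface implementation mentioned above also matters when the provider is called from F# rather than from the C# test. Here is a minimal sketch, written as a script without the namespace, that repeats the interface and the in-memory provider so it stands alone. The names concrete, provider, and tweetCount are made up for the example.

```fsharp
open System
open System.Collections.Generic

type ITweeetProvider =
    abstract member GetTweets : string -> IEnumerable<DateTime * int * string>

type InMemoryProvider() =
    interface ITweeetProvider with
        member this.GetTweets(stockSymbol: string) =
            let list = new List<(DateTime * int * string)>()
            list.Add((DateTime.Now, 1, "Test1"))
            list.Add((DateTime.Now, 0, "Test2"))
            list :> IEnumerable<(DateTime * int * string)>

let concrete = InMemoryProvider()
// concrete.GetTweets("TEST") would not compile here:
// GetTweets is only reachable through the interface type.

// Upcast with :> to get at the interface member.
let provider = concrete :> ITweeetProvider
let tweetCount = provider.GetTweets("TEST") |> Seq.length

printfn "%d" tweetCount
// 2
```

The C# tests get the same effect by declaring the variable as ITweeetProvider.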

Finally, I created a file-system bound provider so that I can download and then hold static a large dataset.  Based on past experience dealing with on-line data sources, getting data local to run multiple tests against is generally a good idea.  Here is the implementation:

    namespace NewCo.TwitterAnalysis

    open System
    open System.Collections.Generic
    open System.IO

    type FileSystemProvider(filePath: string) =
        interface ITweeetProvider with
            member this.GetTweets(stockSymbol: string) =
                let fileContents = File.ReadLines(filePath)
                                    |> Seq.map(fun line -> line.Split([|'\t'|]))
                                    |> Seq.map(fun values -> DateTime.Parse(values.[0]),int values.[1], string values.[2])
                fileContents

And the covering unit (integration really) tests look like this:

    using System;
    using System.IO;
    using System.Linq;
    using System.Reflection;
    using Microsoft.VisualStudio.TestTools.UnitTesting;
    using NewCo.TwitterAnalysis;

    [TestClass]
    public class FileSystemProviderTests
    {
        [TestMethod]
        public void GetTweetsUsingValidInput_ReturnsExpectedValue()
        {
            var baseDir = Path.GetDirectoryName(Assembly.GetExecutingAssembly().Location);
            var testFile = Path.Combine(baseDir, "TweetData.csv");
            ITweeetProvider provider = new FileSystemProvider(testFile);
            var tweets = provider.GetTweets("TEST");
            var tweetList = tweets.ToList();
            Int32 expected = 2;
            Int32 actual = tweetList.Count;
            Assert.AreEqual(expected, actual);
        }
    }

Note that I had to add the actual file in the test project. 

image

Finally, the F# code needs to include try...with blocks for the external calls (web service and disk) and some argument validation for the strings that come in.
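As a rough sketch of what that hardening could look like for the disk case, the following validates the path, skips malformed lines, and handles I/O failures. The helper names tryParseLine and readTweets are hypothetical, and returning an empty list on failure is only one possible policy. It assumes the same tab-delimited layout (date, retweet count, text) as the file provider.

```fsharp
open System
open System.IO

// Hypothetical helper: Some tuple for a well-formed line, None otherwise.
let tryParseLine (line: string) =
    match line.Split([|'\t'|]) with
    | [| created; retweets; text |] ->
        match DateTime.TryParse(created), Int32.TryParse(retweets) with
        | (true, date), (true, count) -> Some (date, count, text)
        | _ -> None
    | _ -> None

// Hypothetical helper: validates the argument and guards the disk access.
let readTweets (filePath: string) =
    if String.IsNullOrWhiteSpace(filePath) then
        invalidArg "filePath" "A file path is required."
    try
        // ReadAllLines is eager, so any I/O failure surfaces inside the try block.
        File.ReadAllLines(filePath)
        |> Array.choose tryParseLine
        |> Array.toList
    with
    | :? IOException as ex ->
        printfn "Could not read %s: %s" filePath ex.Message
        []
```

The sketch reads eagerly on purpose: File.ReadLines is lazy, so an exception could be raised later, when the sequence is enumerated outside the try block.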

In any event, I now have 3 different implementations that I can swap out depending on my needs.  I love having the power of Interfaces combined with benefits of using a functional-first language.
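To make the swapping concrete, here is a small sketch in which the analysis code depends only on the interface. The function totalRetweets and the two stand-in providers (built with object expressions) are made up for the example; the real providers above could be passed in the same way.

```fsharp
open System
open System.Collections.Generic

type ITweeetProvider =
    abstract member GetTweets : string -> IEnumerable<DateTime * int * string>

// Analysis code sees only the interface, never a concrete provider.
let totalRetweets (provider: ITweeetProvider) (stockSymbol: string) =
    provider.GetTweets(stockSymbol)
    |> Seq.sumBy (fun (_, retweetCount, _) -> retweetCount)

// Hypothetical stand-in providers built with object expressions.
let twoTweets =
    { new ITweeetProvider with
        member this.GetTweets(stockSymbol) =
            seq [ (DateTime(2014, 3, 1), 1, "Test1")
                  (DateTime(2014, 3, 2), 0, "Test2") ] }

let noTweets =
    { new ITweeetProvider with
        member this.GetTweets(stockSymbol) = Seq.empty }

printfn "%d" (totalRetweets twoTweets "TEST")
// 1
printfn "%d" (totalRetweets noTweets "TEST")
// 0
```

Adding a new data source means writing another implementation. totalRetweets itself stays untouched, which is the Open/Closed idea in miniature.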

Consuming Twitter With F#

I set up a meetup for TRINUG’s F#/data analytics SIG to center around consuming and analyzing Tweets.  Since Twitter is just JSON, I assumed it would be easy enough to search Tweets for a given subject in a given time period.  How wrong I was.  I spent several hours researching different ways to consume Twitter, with varying degrees of success.  My 1st stop was to investigate some of the more common libraries that C# developers use to consume Twitter.  Here is my survey of some of the more popular ones:

Twitterizer: No longer maintained

    // Install-Package twitterizer -Version 2.4.2
    // Update-Package Newtonsoft.Json -Reinstall
    open System
    open System.Configuration
    open Twitterizer

    type public TwitterProvider() =
        member this.GetTweetsForDateRange(ticker:string, startDate: DateTime, endDate: DateTime) =
            let consumerKey = ConfigurationManager.AppSettings.["consumerKey"]
            let consumerSecret = ConfigurationManager.AppSettings.["consumerSecret"]
            let accessToken = ConfigurationManager.AppSettings.["accessToken"]
            let accessTokenSecret = ConfigurationManager.AppSettings.["accessTokenSecret"]

            // Set the properties with <- rather than calling the set_ methods directly
            let tokens = new OAuthTokens()
            tokens.ConsumerKey <- consumerKey
            tokens.ConsumerSecret <- consumerSecret
            tokens.AccessToken <- accessToken
            tokens.AccessTokenSecret <- accessTokenSecret

            let searchOptions = new SearchOptions()
            searchOptions.SinceDate <- startDate
            searchOptions.UntilDate <- endDate
            let results = TwitterSearch.Search(tokens, ticker,searchOptions)
            results.ResponseObject
                        |> Seq.map(fun r -> r.CreatedDate, r.Text)

TweetSharp: No longer maintained

    open System
    open System.Configuration
    open TweetSharp

    type public TwitterProvider() =
        member this.GetTweetsForDateRange(ticker:string, startDate: DateTime, endDate: DateTime) =
            let consumerKey = ConfigurationManager.AppSettings.["consumerKey"]
            let consumerSecret = ConfigurationManager.AppSettings.["consumerSecret"]
            let accessToken = ConfigurationManager.AppSettings.["accessToken"]
            let accessTokenSecret = ConfigurationManager.AppSettings.["accessTokenSecret"]

            let service = new TwitterService(consumerKey, consumerSecret)
            service.AuthenticateWith(accessToken, accessTokenSecret)

            let searchOptions = new SearchOptions()
            searchOptions.Q <- "IBM%20since%3A2014-03-01&src=typd"
            service.Search(searchOptions).Statuses
                                            |> Seq.map(fun s -> s.CreatedDate, s.Text)

Note that I did try and add a date range the way the Twitter API instructs, but it still came back with only 20 tweets.

LinqToTwitter: Active, but you have to use LINQ syntax.  Ugh!

Tweetinvi: Active but does not have date range functionality

    open System
    open System.Configuration
    open Tweetinvi

    type public TwitterProvider() =
        member this.GetTodaysTweets(ticker: string) =
            let consumerKey = ConfigurationManager.AppSettings.["consumerKey"]
            let consumerSecret = ConfigurationManager.AppSettings.["consumerSecret"]
            let accessToken = ConfigurationManager.AppSettings.["accessToken"]
            let accessTokenSecret = ConfigurationManager.AppSettings.["accessTokenSecret"]

            TwitterCredentials.SetCredentials(accessToken, accessTokenSecret, consumerKey, consumerSecret)
            let tweets = Search.SearchTweets(ticker)
            tweets |> Seq.map(fun t -> t.CreatedAt, t.RetweetCount)

        member this.GetTweetsForDateRange(ticker: string, startDate: DateTime)=
            let consumerKey = ConfigurationManager.AppSettings.["consumerKey"]
            let consumerSecret = ConfigurationManager.AppSettings.["consumerSecret"]
            let accessToken = ConfigurationManager.AppSettings.["accessToken"]
            let accessTokenSecret = ConfigurationManager.AppSettings.["accessTokenSecret"]

            TwitterCredentials.SetCredentials(accessToken, accessTokenSecret, consumerKey, consumerSecret)
            let searchParameter = Search.GenerateSearchTweetParameter(ticker)
            searchParameter.Until <- startDate
            let tweets = Search.SearchTweets(searchParameter)
            tweets |> Seq.map(fun t -> t.CreatedAt, t.RetweetCount)
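One partial workaround, sketched here under the assumption that the tweets of interest are already in whatever the search returns, is to filter the (created-at, retweet count) tuples by date on the client. It cannot fetch tweets the API never sent back. The sample data and the function name inDateRange are made up for the example.

```fsharp
open System

// Hypothetical sample of (created-at, retweet count) pairs, shaped like the tuples above.
let tweets =
    [ (DateTime(2014, 3, 1), 5)
      (DateTime(2014, 3, 10), 2)
      (DateTime(2014, 3, 20), 7) ]

// Keep only the tweets created between startDate and endDate, inclusive.
let inDateRange (startDate: DateTime) (endDate: DateTime) tweets =
    tweets
    |> Seq.filter (fun (createdAt: DateTime, _) ->
        createdAt >= startDate && createdAt <= endDate)

let march1To15 =
    tweets
    |> inDateRange (DateTime(2014, 3, 1)) (DateTime(2014, 3, 15))
    |> Seq.toList

printfn "%d" (List.length march1To15)
// 2
```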

So without an out-of-the-box API to use, I thought about using a JSON type provider the way Lincoln Atkinson did.  The problem is that his example is for V1 of the Twitter API and V1.1 uses OAuth.  If you run his code, you get

image

I then thought about a 3rd party API that captures Tweets.  I ran across Gnip ($500!) and Topsy (no longer accepting new licenses b/c Apple bought them), so I am back to square one.

So finally I thought about rolling my own (with OAuth being the hard part) but I am quickly running out of time to get ready for the SIG and I don’t want to spend the time on only this part. 
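For a sense of what the OAuth part involves, here is a rough sketch of the OAuth 1.0a HMAC-SHA1 signing step only, as I understand it from the spec: build a signature base string from the method, URL, and sorted parameters, then sign it with the two secrets. It is not a working Twitter client. The function names, secrets, and parameter values are placeholders, and a real request also needs the full set of oauth_* parameters and an Authorization header.

```fsharp
open System
open System.Security.Cryptography
open System.Text

// Percent-encode a value; assumes Uri.EscapeDataString follows RFC 3986 (.NET 4.5 and later).
let encode (value: string) = Uri.EscapeDataString(value)

// Build the signature base string: METHOD & encoded URL & encoded sorted parameters.
let signatureBaseString (httpMethod: string) (baseUrl: string) (parameters: (string * string) list) =
    let parameterString =
        parameters
        |> List.map (fun (key, value) -> encode key, encode value)
        |> List.sort
        |> List.map (fun (key, value) -> key + "=" + value)
        |> String.concat "&"
    String.concat "&" [ httpMethod.ToUpperInvariant(); encode baseUrl; encode parameterString ]

// Sign the base string with HMAC-SHA1, keyed on the two secrets, and Base64 the hash.
let sign (consumerSecret: string) (tokenSecret: string) (baseString: string) =
    let key = encode consumerSecret + "&" + encode tokenSecret
    use hmac = new HMACSHA1(Encoding.ASCII.GetBytes(key))
    hmac.ComputeHash(Encoding.ASCII.GetBytes(baseString))
    |> Convert.ToBase64String

// Placeholder values, for illustration only.
let exampleBaseString =
    signatureBaseString "get" "https://api.twitter.com/1.1/search/tweets.json"
        [ ("q", "IBM"); ("oauth_nonce", "abc") ]

printfn "%s" exampleBaseString
printfn "%s" (sign "consumer-secret" "token-secret" exampleBaseString)
```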

Why isn’t there a Twitter type provider?  I’ll add it to the list….