Analytics in the Microsoft Stack

Disclaimer:  I really don’t know what I am talking about

I received an email from a coworker/friend yesterday with this in the body:

So, I have a friend who works for a major supermarket chain. In IT, they are straight out of the year 2000. They have tons and tons of data in SQL Server and I think Oracle. The industrial engineers (who do all of the planning) ask the IT group to run queries throughout the day, which takes hours to run. They use Excel for most of their processing. On the weekends, they run reporting queries which take hours and hours to run – all to get just basic information.

This got my wheels spinning about how I would approach the problem with the analytics toolset that I know is available.  The supermarket chain has a couple of problems:

  • Lots of data that takes too long to munge through
  • The planners are dependent on IT group for processing the data

I would expect the official Microsoft answer is that they should implement SQL Server Analysis Services with Power BI.  I would assume if the group threw enough resources at this solution, it would work.  I then thought of a couple of alternative paths:

The first thing that comes to mind is using HDInsight (Microsoft’s Hadoop offering) on Azure.  That way the queries can run in a distributed manner and they can provision machines as they need them -> and when they are not running their queries, they can deallocate the machines.

The second thought is using AzureML to do their model generation.  However, depending on the size of the datasets, AzureML may not be able to scale.  I have only used Azure ML on smaller datasets.

My third thought was R.  I don’t think R is the best answer here.  Everything I know about R says it is designed for data exploration and analysis of datasets that comfortably fit into the local machine’s memory.  R’s performance is poor on large data and scaling R out is a real challenge.

What about F#?  This might be a good answer.  If you use the Hive type provider, you can get the benefits of HDInsight for the processing and then have the goodness of the language syntax and REPL for data exploration.  Also, the group could look at MBrace for some kick-butt distributed processing that can scale on Azure. Finally, if they do come up with some kind of insight that lends itself to building analytics or models into an app, you can take the code out of the script file and put it into a compilable assembly, all within Visual Studio.
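To give a flavor of that REPL-driven exploration: here is a made-up example (hypothetical sales figures, not from the supermarket's actual data) of the kind of quick aggregation you can type straight into F# Interactive and evaluate immediately.

```fsharp
// Hypothetical daily sales figures, just to show the F# Interactive workflow
let dailySales = [ ("milk", 120); ("eggs", 80); ("bread", 95); ("milk", 60) ]

// Group by item and sum the quantities
let totalsByItem =
    dailySales
    |> Seq.groupBy fst
    |> Seq.map (fun (item, sales) -> item, sales |> Seq.sumBy snd)
    |> Seq.toList
// [("milk", 180); ("eggs", 80); ("bread", 95)]
```

Once an exploration like this proves useful, the same code can move unchanged from the script file into a compiled assembly.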

What about Python?  No idea, I don’t know enough about it.

What about MATLAB, SAS, etc.?  No idea.  I stopped using those tools when R showed up.

What about Watson?  No idea.  I think I will have a better idea once I go to this.

Parsing Wake County School System Attendance Assignment Site With F#

As a follow-up to this post, I then turned my attention to parsing the Wake County Public School System assignment site.  If you are not familiar, large school districts in America have a concept of ‘nodes’, where a child is assigned to a school pyramid (elementary, middle, and high school) based on their home address.  This gives the school assignment tremendous power because a house’s value is directly tied to how “good” (real or perceived) its assigned school pyramid is.  WCPSS has a site here where you can enter your address and find out the school pyramid.

Since there is not a public API or even a publicly available dataset, I decided to see if I could screen-scrape the site.  The first challenge is that you need to navigate through 2 pages to get to your answer.  Here is the Fiddler trace:

image

The first mistake you will notice is that they are using PHP.  The second is that they are using the same URI and parameterizing the requests via the form values:

image

Finally, their third mistake is that the pages come back in an inconsistent way, making the DOM traversal more challenging.

Undaunted, I fired up Visual Studio. Because there are 2 pages that need to be used, I imported both of them as the models for the HtmlProvider.

image

I then pulled the form query strings out and placed them into some values.  The code so far:

#r "../packages/FSharp.Data.2.1.1/lib/net40/FSharp.Data.dll"

open System.Net
open FSharp.Data

type context = HtmlProvider<"../data/HouseSearchSample01.html">
type context' = HtmlProvider<"../data/HouseSearchSample02.html">

let uri = "http://wwwgis2.wcpss.net/addressLookup/index.php"
let streetLookup = "StreetTemplateValue=STRATH&StreetName=Strathorn+Dr+Cary&StreetNumber=904&SubmitAddressSelectPage=CONTINUE&DefaultAction=SubmitAddressSelectPage"
let streetLookup' = "SelectAssignment%7C2014%7CCURRENT=2014-15&DefaultAction=SelectAssignment%7C2014%7CCURRENT&DefaultAction=SelectAssignment%7C2015%7CCURRENT&CatchmentCode=CA+0198.2&StreetName=Strathorn+Dr+Cary&StreetTemplateValue=STRATH&StreetNumber=904&StreetZipCode=27519"

Skipping the 1st page, I decided to make a request and see if I could get the school information out of the DOM.  It worked well enough, but you can see the immediate problem –> the page’s structure varies, so just targeting the nth element of the table will not work.

let webClient = new WebClient()
webClient.Headers.Add("Content-Type", "application/x-www-form-urlencoded")
let result = webClient.UploadString(uri, "POST", streetLookup')
let body = context'.Parse(result).Html.Body()

let tables = body.Descendants("TABLE") |> Seq.toList
let schoolTable = tables.[0]
let schoolRows = schoolTable.Descendants("TR") |> Seq.toList
let elementaryDatas = schoolRows.[0].Descendants("TD") |> Seq.toList
let elementarySchool = elementaryDatas.[1].InnerText()
let middleSchoolDatas = schoolRows.[1].Descendants("TD") |> Seq.toList
let middleSchool = middleSchoolDatas.[1].InnerText()
//Need to skip for the enrollment cap message
let highSchoolDatas = schoolRows.[3].Descendants("TD") |> Seq.toList
let highSchool = highSchoolDatas.[1].InnerText()


image

I decided to take the dog for a walk, and that time away from the keyboard was very helpful because I realized that although the table is not consistent, I don’t need it to be for my purposes.  All I need are the school names for a given address.  What I need to do is remove all of the noise and just find the rows of the table with useful data:

let webClient = new WebClient()
webClient.Headers.Add("Content-Type", "application/x-www-form-urlencoded")
let result = webClient.UploadString(uri, "POST", streetLookup')
let body = context'.Parse(result).Html.Body()

let tables = body.Descendants("TABLE") |> Seq.toList
let schoolTable = tables.[0]
let schoolRows = schoolTable.Descendants("TR") |> Seq.toList
let schoolData = schoolRows |> Seq.collect(fun r -> r.Descendants("TD")) |> Seq.toList
let schoolData' = schoolData |> Seq.map(fun d -> d.InnerText().Trim())
let schoolData'' = schoolData' |> Seq.filter(fun s -> s <> System.String.Empty)

//Strip out noise
let removeNonEssentialData (s:string) =
    let markerPosition = s.IndexOf('(')
    match markerPosition with
    | -1 -> s
    | _ -> s.Substring(0, markerPosition).Trim()

let schoolData''' = schoolData'' |> Seq.map(fun s -> removeNonEssentialData(s))

let unimportantPhrases = [|"Neighborhood Busing"; "This school has an enrollment cap"|]
let containsUnimportantPhrase (s:string) =
    unimportantPhrases |> Seq.exists(fun p -> s.Contains(p))

let schoolData'''' = schoolData''' |> Seq.filter(fun s -> containsUnimportantPhrase(s) = false)

schoolData''''

And Boom goes the dynamite:

image

So working backwards, I need to parse the 1st page to get the CatchmentCode for an address, build the second page’s form data, and then parse the results.  Parsing the 1st page for the CatchmentCode was very straightforward:

let result = webClient.UploadString(uri, "POST", streetLookup)
let body = context.Parse(result).Html.Body()
let inputs = body.Descendants("INPUT") |> Seq.toList
//(assumed, the original listing skipped this step) project each INPUT node
//into a (name, value) tuple for the lookups below
let inputs' = inputs |> List.map(fun i -> i.AttributeValue("name"), i.AttributeValue("value"))

image

let catchmentCode = inputs' |> Seq.filter(fun (n,v) -> n = "CatchmentCode")
                            |> Seq.map(fun (n,v) -> v)
                            |> Seq.head
let streetName = inputs' |> Seq.filter(fun (n,v) -> n = "StreetName")
                         |> Seq.map(fun (n,v) -> v)
                         |> Seq.head
let streetTemplateValue = inputs' |> Seq.filter(fun (n,v) -> n = "StreetTemplateValue")
                                  |> Seq.map(fun (n,v) -> v)
                                  |> Seq.head
let streetNumber = inputs' |> Seq.filter(fun (n,v) -> n = "StreetNumber")
                           |> Seq.map(fun (n,v) -> v)
                           |> Seq.head
let streetZipCode = inputs' |> Seq.filter(fun (n,v) -> n = "StreetZipCode")
                            |> Seq.map(fun (n,v) -> v)
                            |> Seq.head


image

So the answer is there, just the code sucks.  I refactored it into a single function:

let getValueFromInput(nameToFind:string) =
    inputs' |> Seq.filter(fun (n,v) -> n = nameToFind)
            |> Seq.map(fun (n,v) -> v)
            |> Seq.head
let catchmentCode = getValueFromInput("CatchmentCode")
let streetName = getValueFromInput("StreetName")
let streetTemplateValue = getValueFromInput("StreetTemplateValue")
let streetNumber = getValueFromInput("StreetNumber")
let streetZipCode = getValueFromInput("StreetZipCode")

With page 1 out of the way, I was ready to start altering the form query string.  I pulled the values out of the string and set it up like this:

let streetTemplateValue = "STRAT"
let street = "Strathorn"
let suffix = "Dr"
let city = "Cary"
let streetNumber = "904"
let streetName = street + "+" + suffix + "+" + city
let streetLookup = "StreetTemplateValue=" + streetTemplateValue + "&StreetName=" + streetName + "&StreetNumber=" + streetNumber + "&SubmitAddressSelectPage=CONTINUE&DefaultAction=SubmitAddressSelectPage"

let streetLookup' = "SelectAssignment%7C2014%7CCURRENT=2014-15&DefaultAction=SelectAssignment%7C2014%7CCURRENT&DefaultAction=SelectAssignment%7C2015%7CCURRENT&CatchmentCode=" + catchmentCode + "&StreetName=" + streetName + "&StreetTemplateValue=" + streetTemplateValue + "&StreetNumber=" + streetNumber + "&StreetZipCode=" + streetZipCode
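Hand-concatenating these bodies works, but it is easy to typo an ampersand or forget to escape a value. A small helper (my own sketch, not part of FSharp.Data) can assemble a form body from (name, value) pairs and handle the URL encoding:

```fsharp
open System.Net

// Build an application/x-www-form-urlencoded body from (name, value) pairs.
// WebUtility.UrlEncode handles escaping (spaces, '|', etc.) for us.
let toFormBody (pairs: (string * string) list) =
    pairs
    |> List.map (fun (n, v) -> n + "=" + WebUtility.UrlEncode(v))
    |> String.concat "&"

toFormBody [ "StreetNumber", "904"; "StreetZipCode", "27519" ]
// "StreetNumber=904&StreetZipCode=27519"
```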

So now it was just a matter of creating some data structures to pass into the 1st query string

type SearchCriteria = {streetTemplateValue:string; street:string; suffix:string; city:string; streetNumber:string}

let searchCriteria = {streetTemplateValue="STRAT"; street="Strathorn"; suffix="Dr"; city="Cary"; streetNumber="904"}
//Page1 Query String
let streetName = searchCriteria.street + "+" + searchCriteria.suffix + "+" + searchCriteria.city
let streetLookup = "StreetTemplateValue=" + searchCriteria.streetTemplateValue + "&StreetName=" + streetName + "&StreetNumber=" + searchCriteria.streetNumber + "&SubmitAddressSelectPage=CONTINUE&DefaultAction=SubmitAddressSelectPage"

and we now have the basis for a series of functions to do the school lookup.  You can see the gist here.

Parsing Wake County Tax Site With F#

Based on the response to my last post on Wake County school scores, I decided to look at each school’s revenue base.   Instead of looking at free and reduced lunch as a correlating factor for school scores, I wanted to look at the aggregate home valuations of each school’s population.

To do that, I thought of the Wake County Tax Department’s web site found here, where you can look up an address and see the tax value of the property.  Although they don’t have an API, their web site’s search result page has a predictable URI like this: http://services.wakegov.com/realestate/Account.asp?id=0000001 –> so by plugging in a 7-digit integer, I could theoretically look at all of the tax records for the county.  Also, the HTML of the result page is standardized, so parsing it should be fairly straightforward.
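Since the id is just a zero-padded integer, generating the URI for any record is a one-liner (a helper of my own, sketched here for clarity):

```fsharp
// Build the predictable account uri; the site zero-pads ids to 7 digits
let buildAccountUri (id: int) =
    sprintf "http://services.wakegov.com/realestate/Account.asp?id=%07d" id

buildAccountUri 1
// "http://services.wakegov.com/realestate/Account.asp?id=0000001"
```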

So I fired up Visual Studio and opened up the F# REPL. The first thing I did was to bring in the Html type provider and wire up a standard page for the type.

#r "../packages/FSharp.Data.2.1.1/lib/net40/FSharp.Data.dll"
open FSharp.Data
type context = HtmlProvider<"../data/RealEstateSample.html">

I then could bring down all of the DOM elements for the page and find all of the <TABLE> elements:

let uri = "http://services.wakegov.com/realestate/Account.asp?id=0000001"
let body = context.Load(uri).Html.Body()
let tables = body.Descendants("TABLE") |> Seq.toList
tables |> Seq.length

image

So there are 14 tables on the page.  After some manual inspection, the table that holds the address information is table number 7:

let addressTable = tables.[7]

image

My first thought was to parse the text to see if there are key words that I can search on:

let baseText = taxTable.ToString()
let marker = baseText.IndexOf("Total Value Assessed")
let remainingText = baseText.Substring(marker)
let marker' = remainingText.IndexOf("$")
let remainingText' = remainingText.Substring(marker')
let marker'' = remainingText'.IndexOf("<")
let finalText = remainingText'.Substring(0, marker'')

I then thought, “Jamie, you are being stupid”.  Since the DOM is structured consistently, I can just use the type provider and search on tags:

let addressTable = tables.[7]
let fonts = addressTable.Descendants("font") |> Seq.toList
let addressOne = fonts.[1].InnerText()
let addressTwo = fonts.[2].InnerText()
let addressThree = fonts.[3].InnerText()

and sure enough

image

And then going to table number 11, I can get the assessed value:

let taxTable = tables.[11]
let fonts' = taxTable.Descendants("font") |> Seq.toList
let assessedValue = fonts'.[3].InnerText()

and how cool is this?

image

So with the data elements in place, I needed a way of saving the data.  Fortunately, FSharp.Data also includes JSON support, so I could do this:

let valuation = JsonValue.Record [|
    "addressOne", JsonValue.String addressOne
    "addressTwo", JsonValue.String addressTwo
    "addressThree", JsonValue.String addressThree
    "assessedValue", JsonValue.String assessedValue |]
open System.IO
File.AppendAllText(@"C:\Data\dataTest.json", valuation.ToString())

And in the file:

image

So now I have the pieces to make requests to the Wake County site and put the values into a json file.  I decided to push the data to the file after each request so that if there is a transient fault, I would not lose everything.  So here is the gist and here are the results:

image

I then decided to see how long it would take to download the 1st 1,000 ids.

#time
[1..1000] |> Seq.iter(fun id -> doValuation id)

and with fiddler running

image

It took about 5 minutes for 1,000 ids

image

so extrapolating to the maximum possible id (9,999,999), it would take roughly 830 hours.
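The back-of-the-envelope math, assuming the 5 minutes per 1,000 ids holds all the way up:

```fsharp
let minutesPerThousand = 5.0
let maxPossibleId = 9999999.0
// batches of 1,000, times 5 minutes each, converted to hours
let totalHours = (maxPossibleId / 1000.0) * minutesPerThousand / 60.0
// roughly 833 hours, or about 35 days of continuous requests
```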

image

Two thoughts come to mind for the next step:

1) Use MBrace with some VMs on Azure to do the requests in parallel

2) Do a binary search to find the actual highest id for Wake County.
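The binary-search idea in (2) can be sketched with a stand-in predicate (the real predicate would request the page and check whether the account exists; this assumes valid ids are contiguous from 1 up to the unknown maximum):

```fsharp
// Find the highest id for which `exists` is true, assuming ids are
// contiguous from 1 up to some unknown maximum <= maxPossible.
let findHighestId (exists: int -> bool) (maxPossible: int) =
    let rec search lo hi =
        if lo >= hi then lo
        else
            let mid = (lo + hi + 1) / 2   // bias upward so the search terminates
            if exists mid then search mid hi
            else search lo (mid - 1)
    search 1 maxPossible

// Pretend Wake County's records stop at a made-up 425,000:
findHighestId (fun id -> id <= 425000) 9999999
// 425000
```

With about 24 page requests instead of 10 million, that search would take seconds rather than weeks.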

Tune in next week to see if that works.

Wake County School Report Cards Using R

Recently the Wake County Public School System released a “school report card” that can be used to compare how well a school is doing relative to the other schools in the state. As expected, it made front-page news in our local newspaper here.  The key theme was that schools with kids from poorer families have worse results than schools with kids from more affluent families.  Though this shouldn’t come as a surprise, the follow-up op-eds were equally predictable: more money for poorer schools, changing the rating scale, etc.

I thought it would be an interesting dataset to analyze, to see if the conclusion that the N&O came up with was, in fact, the only conclusion you can get out of the data.  Since they did a simple crosstab analysis, perhaps there was some other analysis that could be done?  I know that newspaper articles are written at a pretty low reading level, and perhaps they are also written at a low analytical level?  I went to the website to download the data here and quickly ran into two items:

1) The dataset is very limited –> there are only 3 credible variables in the dataset (county, free and reduced lunch percent, and the school score).  It is almost as if the dataset was purposely limited to only support the conclusion.

2) The dataset is presented in a way that makes alternative analysis very hard.  You have to install Tableau if you want to look at the data yourself.  Parsing Tableau was a pain because, even with Fiddler, they don’t render the results as HTML tags but as images.

Side Note –> My guess is that Tableau is trying to be the Flash of the analytics space.  I find it curious that companies/organizations think they are just “one tool away” from good analytics.   Even in the age of Watson, it is never the tooling – it is always the analyst that determines the usefulness of a dataset.  It would be much better if WCPSS embraced open data and had higher expectations of the people using the datasets.

In any event, with the 14-day trial of Tableau, I could download the data into Access.  I then exported the data into a .txt file (where it should have been in the 1st place).  I then pumped it into RStudio like so:

image

I then created 2 variables from the FreeAndReducedLunch and SchoolScores vectors.  When I ran the correlation the 1st time, I got an NA, meaning that there is some malformed data.

image

I re-ran the correlation using only complete observations and, sure enough, there is a credible correlation –> the higher the percent of free and reduced lunch, the lower the score.  The N&O is right.

image

I then added a filter to only look at Wake County, and the correlation is even stronger in Wake County than in the state as a whole:

image

As I mentioned earlier, the dataset was set up for a pre-decided conclusion by limiting the number of independent variables and by the choice of Tableau as the reporting mechanism.  I decided to augment the dataset with additional information.  My son plays in TYO, and I unsuccessfully tried to set up an orchestra at our local elementary school 8 years ago.  I also thought of this article where some families tried to get more orchestras into Wake County schools.  Fortunately, the list of schools with an orchestra can be found here, and it did not take very long to add an “HasAnStringsProgram” field to the dataset.

image

Running a correlation for just the WCPSS schools shows that there is no relationship between a school having an orchestra and its performance grade.

image

So the statement by the parents in the N&O like this

… that music students have higher graduation rates, grades and test scores …

might be true for music in general, but a specialized strings program does not seem to impact the school’s score –> at least with this data.

Record Types and Serialization -> F# to C#

I was working on an exercise where I have an F# component being consumed by an MVC web application written in C#.   As I already asked about here, the CLIMutable attribute is a great addition to the F# language spec.  I created a basic record type with the CLIMutable attribute like so:

[<CLIMutable>]
type Account = {Number:int; Holder:string; Amount:float}

I then created an instance of that type in the controller written in C#:

public class AccountController : ApiController
{
    [HttpGet]
    public Account Index()
    {
        var account = new Account();
        account.Amount = 100;
        account.Holder = "Homer";
        account.Number = 1;
        return account;
    }
}

When I run the site and call the method via Fiddler, I get a nasty “@” symbol added to the end of each name:

image

which is no good.  I then thought of my question on SO and Mark Seemann’s blog post on this topic found here.  I added this to the WebApiConfig class:

public static void Register(HttpConfiguration config)
{
    // Web API configuration and services
    GlobalConfiguration.Configuration.Formatters.JsonFormatter.SerializerSettings.ContractResolver =
        new Newtonsoft.Json.Serialization.CamelCasePropertyNamesContractResolver();

    // Web API routes
    config.MapHttpAttributeRoutes();

    config.Routes.MapHttpRoute(
        name: "DefaultApi",
        routeTemplate: "api/{controller}/{id}",
        defaults: new { id = RouteParameter.Optional }
    );
}

And the “@” symbol goes away

image

The next task was to add in option types.  I updated the record like so:

[<CLIMutable>]
type Account = {Number:int; Holder:string option; Amount:float}

I then added an F# class to do the assignment of the value:

type AccountRepository() =
    member this.GetAccount() =
        {Number=1; Holder=Some "Homer"; Amount=100.}

And then updated the controller:

[HttpGet]
public Account Index()
{
    AccountRepository repository = new AccountRepository();
    return repository.GetAccount();
}

And I get this:

image

Which is not what my UI friends want.  They want the same shape of json –> they want to ignore the existence of the option type.  Fair enough. I hit up Stack Overflow here, and Sven had a great answer pointing me to Isaac Abraham’s serializer found here.  I popped that puppy into the project and updated the WebApiConfig like so:

var formatter = GlobalConfiguration.Configuration.Formatters.JsonFormatter;
formatter.SerializerSettings.ContractResolver = new DefaultContractResolver();
formatter.SerializerSettings.Converters.Add(new IdiomaticDuConverter());

And boom goes the dynamite.

clip_image002

Logentries.com and F#

I recently was working on a project that has a fair bit of legacy code.  One of the pieces of the project is a logging service whose interface is this:

public interface ILoggingRepository
{
    void LogMessage(String message);
    void LogException(String message, Exception exception);
}


There are a few different implementations of the logging repository – one that writes to the Windows event log, one that writes to Azure Service Bus, and one that writes to nothing (an in-memory one used for testing).  I thought about using Logentries as a place to write the messages to.  I created an account and set up my first log:

image image

Note that the log also gets a token (a GUID), shown at the bottom of the page, that I will use to send messages to the log.

I then fired up Visual Studio, created a new F# project, and added a reference from the C# project to the F# project.  I then added an associated unit test class to the existing unit test project:

image

I then went back to Logentries and read the API documentation about posting to the log here.   They suggested either log4net or NLog.  For no particular reason, I picked NLog.  I fired up NuGet and installed the Logentries.NLog package:

image

I then read further down the documentation and, yuck, there are tons of places where you have to add to the configuration file.  I am trying to maintain a clean separation of concerns in the app, and this intertwines the working code with the .config file.  Also, the other implementations don’t use the .config, so I would like to stay consistent there.  After bouncing around in the API for a bit, I went to Stack Overflow and asked if there was a way I could implement this without the .config file.  Sure enough, the dev team was kind enough to answer.  I went ahead and implemented their code (after porting it from C#) in my project like so:

namespace ChickenSoftware.LoggingExample.FS

open NLog
open System
open NLog.Targets
open NLog.Config
open ChickenSoftware.LoggingExample

type LogEntriesLoggingRepository(logEntriesToken:string) =
    let target = new LogentriesTarget()
    let config = new LoggingConfiguration()
    do target.Token <- logEntriesToken
    do target.Ssl <- true
    do target.Debug <- true
    do target.Name <- "Logentries"
    let layout = Layouts.Layout.FromString("${date:format=ddd MMM dd} ${time:format=HH:mm:ss} ${date:format=zzz yyyy} ${logger} : ${LEVEL}, ${message}")
    do target.Layout <- layout
    do target.HttpPut <- false
    do config.AddTarget("Logentries2", target)
    let loggingRule = new LoggingRule("*", LogLevel.Debug, target)
    do LogManager.Configuration.AddTarget("targetName", target)
    do LogManager.Configuration.LoggingRules.Add(loggingRule)
    do LogManager.Configuration.Reload() |> ignore
    let logger = LogManager.GetCurrentClassLogger()

    interface ILoggingRepository with
        member this.LogMessage(message) =
            logger.Log(LogLevel.Warn, message)
        member this.LogException(message, exn) =
            logger.LogException(LogLevel.Error, message, exn)

I then went into the unit test and attempted to generate a log message:

public class LogEntriesLoggingRepositoryTests
{
    ILoggingRepository _repository = null;
    public LogEntriesLoggingRepositoryTests()
    {
        string logEntriesToken = "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX";
        _repository = new LogEntriesLoggingRepository(logEntriesToken);
    }

    [TestMethod]
    public void LogMessage_ReturnsExpected()
    {
        _repository.LogMessage("This is a test");
    }
}

Unfortunately, when I ran it, I got the following exception, even though I had marked the .dlls to be copied:

image image

So back to NuGet, where I added the Logentries.NLog package to the Tests project.  I felt really dirty doing it:

image

I then ran the test again, but I got this exception:

image

When I added a breakpoint and stepped through the code, I found it was failing on LogManager.Configuration.

image

Apparently, the only way out of this pickle is to add some basic entries to the .config file <sigh>:

<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <configSections>
    <section name="nlog" type="NLog.Config.ConfigSectionHandler, NLog" />
  </configSections>
  <runtime>
    <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
      <dependentAssembly>
        <assemblyIdentity name="NLog" publicKeyToken="5120e14c03d0593c" culture="neutral" />
        <bindingRedirect oldVersion="0.0.0.0-2.1.0.0" newVersion="2.1.0.0" />
      </dependentAssembly>
    </assemblyBinding>
  </runtime>
  <nlog>
    <extensions>
      <add assembly="LogentriesNLog" />
    </extensions>
    <targets>
      <target name="logentries" type="Logentries" debug="true" httpPut="false" ssl="false" layout="${date:format=ddd MMM dd} ${time:format=HH:mm:ss} ${date:format=zzz yyyy} ${logger} : ${LEVEL}, ${message}" />
    </targets>
    <rules>
      <logger name="*" minLevel="Debug" appendTo="logentries" />
    </rules>
  </nlog>
</configuration>

After I added it, the test ran green.

image

Alas, nothing was showing up in the log!

image

After some back and forth with the Logentries team, it became clear that the test thread was terminating before the Logentries library had a chance to post the message to the service.  This was proven by adding a Thread.Sleep to the test:

public void LogMessage_ReturnsExpected()
{
    _repository.LogMessage("This is a test");
    Thread.Sleep(500);
}

image

So what to do?  The API does not have an async implementation, so I can’t await it, and if I leave that Thread.Sleep as-is, the main thread will be blocked.  I decided to add an async implementation to the interface:

public interface ILoggingRepository
{
    void LogMessage(String message);
    Task LogMessageAsync(String message);
    void LogException(String message, Exception exception);
    Task LogExceptionAsync(String message, Exception exception);
}

I then updated the repository like so:

interface ILoggingRepository with
    member this.LogMessage(message) =
        logger.Log(LogLevel.Warn, message)
    member this.LogMessageAsync(message) =
        Tasks.Task.Run(fun _ -> logger.Log(LogLevel.Warn, message)
                                Thread.Sleep(500))
    member this.LogException(message, exn) =
        logger.LogException(LogLevel.Error, message, exn)
    member this.LogExceptionAsync(message, exn) =
        Tasks.Task.Run(fun _ -> logger.LogException(LogLevel.Error, message, exn)
                                Thread.Sleep(500))

And then I added an async unit test like so:

[TestMethod]
public void LogMessageAsync_ReturnsExpected()
{
    var task = _repository.LogMessageAsync("This is an async test");
    task.Wait();
}

And sure enough, green (note that the async test takes longer than 500 ms) and the expected side effect:

image 

image

So now another C# shop has some F# sprinkled into its code base.  Note that the code actually used is slightly different, because the code as written will keep adding more and more targets, which is not what we want.

F# Record Types with SqlProvider Code-Last

As I talked about last week, I was looking at different ways of using the Entity Framework type provider to map to my domain model.  While I was working on the process, Ross McKinlay saw some of my whining on Twitter and suggested that I take a look at SqlProvider.

 image

He made a good case to use this type provider over entity framework.  Specifically:

  • There are none of the code bloat/file bloat/code-gen issues that you get with EF
  • It targets Sql Server like EF, but can also handle Oracle, Postgres, MySql, and other RDBMSs
  • It has had an update in the last year

So that was a good enough reason to take a look.  The project site is a bit lacking in terms of examples, but between what is on GitHub and on Ross’s blog, you can get a pretty good idea of how to accomplish basic CRUD tasks.  I was interested in how well it handles nested types and F# choice types.  I fired up Visual Studio and installed it from NuGet.

I then created the same domain types I was working with earlier –> note the Choice type for gender.

#r "../packages/SQLProvider.0.0.9-alpha/lib/net40/FSharp.Data.SqlProvider.dll"

open System.Linq
open FSharp.Data.Sql
open System.Security.Principal

type sqlSchema = SqlDataProvider<
                    ConnectionString = @"Server=.;Database=FamilyDomain;Trusted_Connection=True;",
                    UseOptionTypes = true >

let context = sqlSchema.GetDataContext()

//Local Idiomatic Types
type Gender = Male | Female
[<CLIMutable>]
type Pet = {Id:int; ChildId:int; GivenName:string}
[<CLIMutable>]
type Child = {Id:int; FirstName:string; Gender:Gender; Grade:int; Pets: Pet list}
[<CLIMutable>]
type Address = {Id:int; State:string; County:string; City:string}
[<CLIMutable>]
type Parent = {Id:int; FirstName:string}
[<CLIMutable>]
type Family = {Id:int; LastName:string; Parents:Parent list; Children: Child list; Address:Address}

I then added in the same code that I used for the Entity Framework type provider, with some changes (like getting subtypes via querying the foreign key, and not using Linq to query the data store):

let MapPet(efPet: entity.dataContext.``[dbo].[Pet]Entity``) =
    {Id=efPet.Id; ChildId=efPet.ChildId; GivenName=efPet.GivenName}

let MapGender(efGender) =
    match efGender with
    | "Male" -> Male
    | _ -> Female

let MapChild(efChild: entity.dataContext.``[dbo].[Child]Entity``) =
    let pets = efChild.fk_Pet_Child |> Seq.map(fun p -> MapPet(p))
                                    |> Seq.toList
    {Id=efChild.Id; FirstName=efChild.FirstName;
     Gender=MapGender(efChild.Gender);
     Grade=efChild.Grade; Pets=pets}

let GetPet(id: int) =
    context.``[dbo].[Pet]``
    |> Seq.where(fun p -> p.Id = id)
    |> Seq.head
    |> MapPet

let GetChild(id: int) =
    context.``[dbo].[Child]``
    |> Seq.where(fun c -> c.Id = id)
    |> Seq.head
    |> MapChild

let myPet = GetPet(1)

let myChild = GetChild(1)

image

And then I added some code to insert a new pet

let SavePet(pet: Pet) =
    let ssPet = context.``[dbo].[Pet]``.Create()
    ssPet.ChildId <- pet.ChildId
    ssPet.GivenName <- pet.GivenName
    context.SubmitUpdates()

let newPet = {Id=0; ChildId=1; GivenName="Kiss"}
SavePet(newPet)

let failurePet = {Id=0; ChildId=0; GivenName="Should Fail"}
SavePet(failurePet)

And pow on the expected happy path

image

and pow pow on the expected exception

System.Data.SqlClient.SqlException (0x80131904): The INSERT statement conflicted with the FOREIGN KEY constraint "fk_Pet_Child". The conflict occurred in database "FamilyDomain", table "dbo.Child", column ‘Id’.

The statement has been terminated.

   at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)

So this is pretty cool.  But then it got better: I showed some of this code to Ross and he told me I was doing everything wrong.  Basically, I need to think about the get code less as imperative Linq and more as computation expressions.  The biggest downside to how I wrote the gets is that the type provider would pull all of the records from the database locally before filtering them.  So going back to the documentation, I changed the GetPet function to this:

let GetPet(id: int) =
    query {for p in context.``[dbo].[Pet]`` do
           where (p.Id = id)
           select {Id=p.Id; ChildId=p.ChildId; GivenName=p.GivenName}}
    |> Seq.head

And it still works

image

The nice thing is that I no longer need the MapPet function, as the projection happens in the select clause.  So this is pretty cool and very powerful.  Time to learn some more query syntax!
