Parsing Microsoft MVP Pages and Uploading Photos to Sky Biometry

As a piece of the Terminator project that I am bringing to the MVP Summit, I wanted to load all of the MVP photographs into Sky Biometry so that if a person matches a photo with high confidence, the Terminator can terminate them.  I asked my Microsoft contact if I could get all of the MVP photos to load into the app and they politely told me no.

Not being one who takes no lightly, I decided to see if I could load the photos from the MVP website.  Each MVP has a profile photo like here and all of the MVPs are listed here with their MVP IDs specified.  So if I can get the Id from the search page and then create a Uri to the photo, I can then load it into Sky Biometry.

I first created a new FSharp project and fired up a script window.  I created a function that gets the entire contents of a page with the only variable being the index number of the pagination.

    open System
    open System.IO
    open System.Net

    // Download the raw HTML of one page of the MVP search results.
    let getPageContents(pageNumber:int) =
        let uri = new Uri("http://mvp.microsoft.com/en-us/search-mvp.aspx?lo=United+States&sl=0&browse=False&sc=s&ps=36&pn=" + pageNumber.ToString())
        let request = WebRequest.Create(uri)
        request.Method <- "GET"
        use response = request.GetResponse()
        use stream = response.GetResponseStream()
        use reader = new StreamReader(stream)
        reader.ReadToEnd()

I then parsed the page for all instances of the MVPId.  Fortunately, I found this post that helped me understand how the pattern match works in .NET.  Note that the regex for the tag mvpid=123456 is “mvpid=\d+”

    open System.Text.RegularExpressions

    // Pull every "mvpid=<digits>" token out of the page and keep only the digits.
    let getMVPIdsFromPageContents(pageContents:string) =
        let pattern = @"mvpid=\d+"
        let matchCollection = Regex.Matches(pageContents, pattern)
        matchCollection
        |> Seq.cast<Match>
        |> Seq.map(fun m -> m.Value)
        |> Seq.map(fun s -> s.Split('='))
        |> Seq.map(fun a -> a.[1])
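As a quick way to check the pattern without hitting the website, here is a minimal sketch that runs the same idea against a made-up page fragment.  The `samplePage` markup and the `extractIds` name are assumptions for illustration, and the capture group plus `Seq.distinct` are a variation on the Split approach rather than what the original script does.

```fsharp
open System.Text.RegularExpressions

// Hypothetical page fragment; the real search page markup may differ.
let samplePage =
    """<a href="profile.aspx?mvpid=123456">A</a> <a href="profile.aspx?mvpid=5001234">B</a> <a href="photo.aspx?mvpid=123456">A</a>"""

// Capture only the digits after "mvpid=" so no Split is needed,
// and drop IDs that appear more than once on the page.
let extractIds (pageContents: string) =
    Regex.Matches(pageContents, @"mvpid=(\d+)")
    |> Seq.cast<Match>
    |> Seq.map (fun m -> m.Groups.[1].Value)
    |> Seq.distinct
    |> Seq.toList

let ids = extractIds samplePage
printfn "%A" ids
```

If the real page repeats an ID (for example once for the name link and once for the photo), de-duplicating up front would avoid sending the same photo twice.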

With that out of the way, I could get a Seq of all MVP IDs (at least from America) and then collect each of the pages together:

    let getGetMVPIds(pageNumber: int) =
        let pageContents = getPageContents(pageNumber)
        getMVPIdsFromPageContents pageContents

    let pageList = [1..17]
    let mvpIds =
        pageList
        |> Seq.collect(fun i -> getGetMVPIds(i))

So far so good:

image

I then could create a method that generates the MVP Photo Uri:

    let getMvpImageUri(mvpId: int) =
        new Uri("http://mvp.microsoft.com/private/en-us/PublicProfile/Photo/" + mvpId.ToString())

With that out of the way, it was time to point the photos to Sky Biometry for facial detection and tagging.  I used the code found in this post with a couple of changes to account for the fact that a face might not be found in the photo (hence the option type) and that bad things might happen (like too big of a photo).

    type skybiometryFaceDetection = JsonProvider<".\SkyBiometryImageJson\FaceDetection.json">
    type skybiometryAddTags = JsonProvider<".\SkyBiometryImageJson\AddTags.json">
    type skybiometryFaceTraining = JsonProvider<".\SkyBiometryImageJson\FaceTraining.json">

    // Returns the temporary tag ID of the first face found, or None when no face
    // is detected or the request fails (for example, when the photo is too large).
    let detectFace (imageUri:string) =
        let stringBuilder = new StringBuilder()
        stringBuilder.Append(skyBiometryUri) |> ignore
        stringBuilder.Append("/fc/faces/detect.json?urls=") |> ignore
        stringBuilder.Append(imageUri) |> ignore
        stringBuilder.Append("&api_key=") |> ignore
        stringBuilder.Append(skyBiometryApiKey) |> ignore
        stringBuilder.Append("&api_secret=") |> ignore
        stringBuilder.Append(skyBiometryApiSecret) |> ignore
        try
            let faceDetection = skybiometryFaceDetection.Load(stringBuilder.ToString())
            if faceDetection.Photos.[0].Tags.Length > 0 then
                Some faceDetection.Photos.[0].Tags.[0].Tid
            else
                None
        with
        | _ -> None

I then added the other two methods to tag and train:

    let saveTag(uid:string, tid:string)=
        let stringBuilder = new StringBuilder()
        stringBuilder.Append(skyBiometryUri) |> ignore
        stringBuilder.Append("/fc/tags/save.json?uid=") |> ignore
        stringBuilder.Append(uid) |> ignore
        stringBuilder.Append("&tids=") |> ignore
        stringBuilder.Append(tid) |> ignore
        stringBuilder.Append("&api_key=") |> ignore
        stringBuilder.Append(skyBiometryApiKey) |> ignore
        stringBuilder.Append("&api_secret=") |> ignore
        stringBuilder.Append(skyBiometryApiSecret) |> ignore
        let tags = skybiometryAddTags.Load(stringBuilder.ToString())
        tags.Status

    let trainFace(uid:string)=
        let stringBuilder = new StringBuilder()
        stringBuilder.Append(skyBiometryUri) |> ignore
        stringBuilder.Append("/fc/faces/train.json?uids=") |> ignore
        stringBuilder.Append(uid) |> ignore
        stringBuilder.Append("&api_key=") |> ignore
        stringBuilder.Append(skyBiometryApiKey) |> ignore
        stringBuilder.Append("&api_secret=") |> ignore
        stringBuilder.Append(skyBiometryApiSecret) |> ignore
        let training = skybiometryFaceTraining.Load(stringBuilder.ToString())
        training.Status

Upon reflection, this would have been a perfect place for Scott W’s railway-oriented programming (ROP), but I just created a covering function:

    let saveToSkyBiometry(mvpId:string, imageUri:string) =
        let tid = detectFace(imageUri)
        match tid with
        | Some x ->
            saveTag(mvpId + "@terminatorChicken",x) |> ignore
            trainFace(mvpId + "@terminatorChicken")
        | None -> "Failure"

    let results =
        mvpIds
        |> Seq.map(fun mvpId -> mvpId, getMvpImageUri(Int32.Parse(mvpId)))
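For comparison, here is a rough sketch of what a railway-oriented version of the covering function might look like.  It assumes a newer F# that ships the built-in `Result` type, and the three `...Stub` functions are hypothetical stand-ins for the real Sky Biometry calls, so treat it as an illustration of the shape rather than working integration code.

```fsharp
// Stand-ins for the real Sky Biometry calls; names and behaviour are hypothetical.
let detectFaceStub (imageUri: string) : Result<string, string> =
    if imageUri.EndsWith(".jpg") then Ok "TEMP_TID_1" else Error "no face detected"

let saveTagStub (uid: string) (tid: string) : Result<string, string> =
    if tid.StartsWith("TEMP") then Ok uid else Error "tag not saved"

let trainFaceStub (uid: string) : Result<string, string> =
    Ok (sprintf "trained %s" uid)

// Each step runs only if the previous one succeeded; the first Error short-circuits.
let saveToSkyBiometryRop (mvpId: string) (imageUri: string) =
    let uid = mvpId + "@terminatorChicken"
    detectFaceStub imageUri
    |> Result.bind (saveTagStub uid)
    |> Result.bind trainFaceStub

let good = saveToSkyBiometryRop "123456" "http://example.com/photo.jpg"
let bad = saveToSkyBiometryRop "123456" "http://example.com/photo.png"
printfn "%A" good
printfn "%A" bad
```

The gain over the plain match is that a failure carries its reason along the pipeline instead of collapsing into a generic "Failure" string.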

I then created a Seq.map to process all of the photos in order, but I quickly ran into this:

Capture

So I changed my Seq.map to a for loop so I could throttle the requests:

    for (mvpId,uri) in results do
        let result= saveToSkyBiometry(mvpId, uri.ToString())
        printfn "%s" result
        Thread.Sleep(TimeSpan.FromMinutes(1.))

And sure enough

Capture1Capture2

And you can see the load every hour

Capture3

You can see the full code here.

Hacking the Dream Cheeky Thunder Missile Launcher, Part 2

One of the things that the Terminator will have is a missile launcher, which I started hacking here.  The missile launcher API is controlled by time.  Specifically, you tell it to turn in a certain direction for a certain amount of time.

    member this.moveMissleLauncher(data, interval:int) =
        if devicePresent then
            this.SwitchLed(true)
            this.sendUSBData(data)
            Thread.Sleep(interval)
            this.sendUSBData(this.STOP)
            this.SwitchLed(false)

The challenge is converting that duration into X,Y Cartesian coordinates the way the Kinect and the Phidget laser system do.  And before getting Cartesian coordinates, we needed to get the polar coordinates.  This is how we did it.

First, we tackled the pan (X coordinate) of the missile launcher.  The launcher is on a square base and the full range of the launcher is 45 degrees to 315 degrees.

image

With some experimentation, we determined that the total time it takes the turret to traverse from 45 degrees to 315 degrees (270 total degrees) is 6,346 milliseconds.  Assuming that the motor is consistent (which is a big if using cheap electronics), it takes the motor about 23.5 milliseconds to move 1 degree on the X axis.

The tilt was more of a challenge.  We needed a way of measuring the total range along the Y axis.  To that end, we placed the turret 300 millimeters away from the wall.  We then placed a laser pointer on the turret, put a level on it to ensure that it was at 0 degrees, and marked the wall.  We then moved the turret to its highest position and then to its lowest, marking the wall at those points.  We then measured the distance to the highest and lowest points on the wall.

WP_20141010_002WP_20141010_001

 

image

Assuming the wall was vertical, we could then use this site to figure out the angle of the turret.  We first calculated the length of the unknown side using the Pythagorean theorem: $300^2 + 235^2 = X^2$, so $X = \sqrt{90000 + 55225} \approx 381$.

With all three sides known, we went over to this great site to use some basic trigonometry to help solve the angle problem.  Since we are looking at the angle between the adjacent side and the hypotenuse, we need the inverse cosine via this formula: $\cos(\theta) = \text{Adjacent} / \text{Hypotenuse}$.  Here $\cos(\theta) = 300/381 \approx 0.7874$, so $\theta = \cos^{-1}(0.7874) \approx 0.6642$.  We read this as our rocket launcher being able to move up about 66 degrees and, by doing the same calculation for down, about 8 degrees down.  (Strictly, 0.6642 is an angle in radians, which is about 38 degrees; the figures that follow keep the 66 and 8 that we worked with.)  Since 66 + 8 equals 74, we decided to round to 75.
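The arithmetic can be double-checked in a few lines of F#.  This is a small sketch using the 300 mm and 235 mm measurements from the text; the variable names are made up for illustration.  One thing worth keeping in mind is that `Math.Acos` returns radians, so the result needs converting before it is treated as degrees.

```fsharp
open System

let distanceToWall = 300.0   // mm, adjacent side
let heightOnWall = 235.0     // mm, opposite side

// Pythagorean theorem for the hypotenuse, then inverse cosine for the angle.
let hypotenuse = sqrt (distanceToWall ** 2.0 + heightOnWall ** 2.0)
let angleRadians = Math.Acos (distanceToWall / hypotenuse)
let angleDegrees = angleRadians * 180.0 / Math.PI

printfn "hypotenuse = %.1f mm" hypotenuse
printfn "angle = %.4f rad = %.1f degrees" angleRadians angleDegrees
```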

We then determined that the total time it takes the rocket launcher to traverse from its max up position to level was 710 milliseconds.  Dividing 710 by 66, each degree takes about 10.7 milliseconds.

So with a handy chart in place, we are ready to map the polar coordinates to the missile launcher

image

I added the adjustment values to the type

    let tiltMultiplier = 10.7
    let panMultiplier = 23.1
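To make the role of the two multipliers concrete, here is a minimal sketch of turning a requested change in degrees into a motor run time.  The `toInterval` helper is hypothetical (the post does not show how Up, Down, Left and Right do this conversion); the multiplier values are the ones from the post.

```fsharp
// Milliseconds of motor time per degree, as given in the post.
let tiltMultiplier = 10.7
let panMultiplier = 23.1

// Convert a change in degrees into a run time in whole milliseconds.
// The sign is dropped here because direction is chosen separately.
let toInterval (msPerDegree: float) (degrees: float) : int =
    int (round (abs degrees * msPerDegree))

let panInterval = toInterval panMultiplier 90.0    // quarter turn
let tiltInterval = toInterval tiltMultiplier -10.0 // ten degrees, either direction
printfn "pan: %d ms, tilt: %d ms" panInterval tiltInterval
```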

So now it is a question of keeping track of where the launcher is pointed at and then calling the correct adjustments.  To that end, I created a couple of mutable variables and set them to 90 when the missile launcher initializes

    let mutable currentPan = 0.
    let mutable currentTilt = 0.

    member this.Reset() =
        if devicePresent then
            this.moveMissleLauncher(this.LEFT,6346)
            this.moveMissleLauncher(this.RIGHT,3173)
            this.moveMissleLauncher(this.UP,807)
            this.moveMissleLauncher(this.DOWN,710)
            currentPan <- 90.
            currentTilt <- 90.
            ()

I then implemented the method for the interface to acquire the target.  Note that I am pretty liberal with my use of explanatory variables.

    member this.AquireTarget(X:float, Y:float) =
        match X = 0.0, Y = 0.0 with
        | true,true -> false
        | true,false -> false
        | false,true -> false
        | false, false ->
            // X drives the pan and Y drives the tilt
            let pan = X
            let tilt = Y

            let tiltChange = currentTilt - tilt
            let panChange = currentPan - pan

            // truncated copies decide the direction; changes under one degree are ignored
            let tiltChange' = int tiltChange
            let panChange' = int panChange

            // absolute values give the size of the move
            let tiltChange'' = abs(tiltChange)
            let panChange'' = abs(panChange)

            match tiltChange' with
            | tiltChange' when tiltChange' > 0 -> this.Down(tiltChange''); currentTilt <- tilt
            | tiltChange' when tiltChange' < 0 -> this.Up(tiltChange''); currentTilt <- tilt
            | _ -> ()

            match panChange' with
            | panChange' when panChange' > 0 -> this.Left(panChange''); currentPan <- pan
            | panChange' when panChange' < 0 -> this.Right(panChange''); currentPan <- pan
            | _ -> ()
            true
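The direction-and-size decision is the same for both axes, so it could be pulled out into a small pure function that is easy to test without the hardware attached.  The sketch below is one possible shape; the `Move` type and `planMove` name are made up for illustration and are not part of the launcher code.

```fsharp
type Move =
    | Increase of float   // raise the angle by this many degrees
    | Decrease of float   // lower the angle by this many degrees
    | Hold                // less than one degree of change, do nothing

// Decide how one axis should move to get from the current angle to the target.
let planMove (current: float) (target: float) : Move =
    let change = target - current
    if change >= 1.0 then Increase change
    elif change <= -1.0 then Decrease (abs change)
    else Hold

printfn "%A" (planMove 90.0 120.0)
printfn "%A" (planMove 90.0 45.0)
printfn "%A" (planMove 90.0 90.4)
```

In the method above, a decrease in tilt corresponds to Down and a decrease in pan corresponds to Left, so the caller would only need to map `Increase` and `Decrease` onto the right pair of commands for each axis.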

And with that, we have another weapons system we can add to our Kinect Terminator.

Smart Nerd Dinner

I think there is general agreement that the age of the ASP.NET wire-framing post-back web dev is over.  If you are going to write web applications in 2015 on the .NET stack, you have to be able to use JavaScript and associated JavaScript frameworks like Angular.  Similarly, the full-stack developer needs to have a much deeper understanding of the data that is passing in and out of their application.  With the rise of analytics in applications, the developer needs different tools and approaches.  Just as you need to know JavaScript if you are going to be in the browser, you need to know F# if you are going to be building industrial-grade domain and data layers.

I decided to refactor an existing ASP.NET postback website to see how hard it would be to introduce F# to the project and apply some basic statistics to make the site smarter.  It was pretty easy and the payoffs were quite large.

If you are not familiar, Nerd Dinner is the canonical example of an MVC application that was created to show Microsoft web devs how to create a website using the .NET stack.  The original project was put into a book with the Mount Rushmore of MSFT uber-devs

image

The project was so successful that it actually was launched into a real website

image

and you can find the code on Codeplex here

image

When you download the source code from the repository, you will notice a couple of things:

1) It is not a very big project – with only 1100 lines of code

image

2) There are 191 FxCop violations

image

3) It does compile coming out of source, but some of the unit tests fail

image

4) There is pretty low code coverage (21%)

image

Focusing on the code coverage issue, it makes sense that there is not much code coverage because there is not much code that can be covered.  There are maybe 15 lines of “business logic” if the term business logic is expanded to include input validation.  This is an example

image

Also, there are maybe ten lines of code that do some basic filtering

image

So step one in the quest to refactor Nerd Dinner to be a bit smarter was to rename the projects.  Since MVC is a UI framework, it made sense to call the project that.  I then changed the namespaces to reflect the new structure

image

The next step was to take the domain classes out of the UI and put them into the application.  First, I created another project

image

I then took all of the interfaces that were in the UI and placed them into the application

    namespace NerdDinner.Models

    open System
    open System.Linq
    open System.Linq.Expressions

    type IRepository<'T> =
        abstract All : IQueryable<'T>
        abstract AllIncluding
            : [<ParamArray>] includeProperties:Expression<Func<'T, obj>>[] -> IQueryable<'T>
        abstract member Find: int -> 'T
        abstract member InsertOrUpdate: 'T -> unit
        abstract member Delete: int -> unit
        abstract member SubmitChanges: unit -> unit

    type IDinnerRepository =
        inherit IRepository<Dinner>
        abstract member FindByLocation: float*float -> IQueryable<Dinner>
        abstract FindUpcomingDinners : unit -> IQueryable<Dinner>
        abstract FindDinnersByText : string -> IQueryable<Dinner>
        abstract member DeleteRsvp: 'T -> unit

I then took all of the data structures/models and placed them in the application.

    namespace NerdDinner.Models

    open System
    open System.Web.Mvc
    open System.Collections.Generic
    open System.ComponentModel.DataAnnotations
    open System.ComponentModel.DataAnnotations.Schema

    type public LocationDetail (latitude:float, longitude:float, title:string, address:string) =
        let mutable latitude = latitude
        let mutable longitude = longitude
        let mutable title = title
        let mutable address = address

        member public this.Latitude
            with get() = latitude
            and set(value) = latitude <- value

        member public this.Longitude
            with get() = longitude
            and set(value) = longitude <- value

        member public this.Title
            with get() = title
            and set(value) = title <- value

        member public this.Address
            with get() = address
            and set(value) = address <- value

    type public RSVP () =
        let mutable rsvpID = 0
        let mutable dinnerID = 0
        let mutable attendeeName = ""
        let mutable attendeeNameId = ""
        let mutable dinner = null

        member public self.RsvpID
            with get() = rsvpID
            and set(value) = rsvpID <- value

        member public self.DinnerID
            with get() = dinnerID
            and set(value) = dinnerID <- value

        member public self.AttendeeName
            with get() = attendeeName
            and set(value) = attendeeName <- value

        member public self.AttendeeNameId
            with get() = attendeeNameId
            and set(value) = attendeeNameId <- value

        member public self.Dinner
            with get() = dinner
            and set(value) = dinner <- value


    and public Dinner () =
        let mutable dinnerID = 0
        let mutable title = ""
        let mutable eventDate = DateTime.MinValue
        let mutable description = ""
        let mutable hostedBy = ""
        let mutable contactPhone = ""
        let mutable address = ""
        let mutable country = ""
        let mutable latitude = 0.
        let mutable longitude = 0.
        let mutable hostedById = ""
        let mutable rsvps = List<RSVP>() :> ICollection<RSVP>

        [<HiddenInput(DisplayValue=false)>]
        member public self.DinnerID
            with get() = dinnerID
            and set(value) = dinnerID <- value

        [<Required(ErrorMessage="Title Is Required")>]
        [<StringLength(50,ErrorMessage="Title may not be longer than 50 characters")>]
        member public self.Title
            with get() = title
            and set(value) = title <- value

        [<Required(ErrorMessage="EventDate Is Required")>]
        [<Display(Name="Event Date")>]
        member public self.EventDate
            with get() = eventDate
            and set(value) = eventDate <- value

        [<Required(ErrorMessage="Description Is Required")>]
        [<StringLength(256,ErrorMessage="Description may not be longer than 256 characters")>]
        [<DataType(DataType.MultilineText)>]
        member public self.Description
            with get() = description
            and set(value) = description <- value

        [<StringLength(256,ErrorMessage="Hosted By may not be longer than 256 characters")>]
        [<Display(Name="Hosted By")>]
        member public self.HostedBy
            with get() = hostedBy
            and set(value) = hostedBy <- value

        [<Required(ErrorMessage="Contact Phone Is Required")>]
        [<StringLength(20,ErrorMessage="Contact Phone may not be longer than 20 characters")>]
        [<Display(Name="Contact Phone")>]
        member public self.ContactPhone
            with get() = contactPhone
            and set(value) = contactPhone <- value

        // length limit matches the 50 characters stated in the error message
        [<Required(ErrorMessage="Address Is Required")>]
        [<StringLength(50,ErrorMessage="Address may not be longer than 50 characters")>]
        [<Display(Name="Address")>]
        member public self.Address
            with get() = address
            and set(value) = address <- value

        [<UIHint("CountryDropDown")>]
        member public self.Country
            with get() = country
            and set(value) = country <- value

        [<HiddenInput(DisplayValue=false)>]
        member public self.Latitude
            with get() = latitude
            and set(value) = latitude <- value

        [<HiddenInput(DisplayValue=false)>]
        member public self.Longitude
            with get() = longitude
            and set(value) = longitude <- value

        [<HiddenInput(DisplayValue=false)>]
        member public self.HostedById
            with get() = hostedById
            and set(value) = hostedById <- value

        member public self.RSVPs
            with get() = rsvps
            and set(value) = rsvps <- value

        member public self.IsHostedBy (userName:string) =
            System.String.Equals(hostedBy,userName,System.StringComparison.Ordinal)

        member public self.IsUserRegistered(userName:string) =
            rsvps |> Seq.exists(fun r -> r.AttendeeName = userName)


        [<UIHint("Location Detail")>]
        [<NotMapped()>]
        member public self.Location
            with get() = new LocationDetail(self.Latitude,self.Longitude,self.Title,self.Address)
            and set(value:LocationDetail) =
                // copy the location values back onto the dinner
                latitude <- value.Latitude
                longitude <- value.Longitude
                title <- value.Title
                address <- value.Address

Unlike C#, where there is a class per file, all of the related elements are placed in the same location.  Also, notice the absence of semicolons, curly braces, and other distracting characters.  Finally, you can see that because we are in the .NET framework, all of the data annotations are the same.  Sure enough, pointing the MVC UI to the application and hitting run, the application just works.

image

With the separation complete, it was time to make our app much smarter.  The first thing that I thought of was that when a person creates an account, they enter their first and last name

 

This seems like an excellent opportunity to add some user manipulation, or rather personalization, to our site.  Going back to this analysis of names given to newborns in the United States, if I know your first name, I have a pretty good chance of guessing your age, gender, and state of birth.  For example, ‘Jose’ is probably a male in his twenties born in either Texas or California.  ‘James’ is probably a male in his 40s or 50s.

I added 6 pictures to the site for young, middle-aged, and old males and females.

image

 

I then modified the logonStatus partial view like so

    @using NerdDinner.UI;


    @if(Request.IsAuthenticated) {
        <text>Welcome <b>@(((NerdIdentity)HttpContext.Current.User.Identity).FriendlyName)</b>!
        [ @Html.ActionLink("Log Off", "LogOff", "Account") ]</text>
    }
    else {
        @:[ @Html.ActionLink("Log On", "LogOn", new { controller = "Account", returnUrl = HttpContext.Current.Request.RawUrl }) ]
    }

    @if (Session["adUri"] != null)
    {
        <img alt="product placement" title="product placement" src="@Session["adUri"]" height="40" />
    }

Then, I created a session variable called adUri that the picture will reference in the Logon controller

    public ActionResult LogOn(LogOnModel model, string returnUrl)
    {
        if (ModelState.IsValid)
        {
            if (ValidateLogOn(model.UserName, model.Password))
            {
                // Make sure we have the username with the right capitalization
                // since we do case sensitive checks for OpenID Claimed Identifiers later.
                string userName = MembershipService.GetCanonicalUsername(model.UserName);

                FormsAuth.SignIn(userName, model.RememberMe);

                AdProvider adProvider = new AdProvider();
                String catagory = adProvider.GetCatagory(userName);
                Session["adUri"] = "/Content/images/" + catagory + ".png";

And finally, I added an implementation of the adProvider back in the application:

    type AdProvider () =
        member this.GetCatagory personName: string =
            "middleAgedMale"

So running the app, we have a product placement for a Middle Aged Male

image

So the last thing to do is to turn names into those categories.  I thought of a couple of different implementations.  First I considered loading the entire census data set and searching it on demand.  I then thought about using Azure ML and making an API request each time.  I finally decided on just creating a lookup table that can be searched.  In any event, since I am using an interface, swapping out implementations is easy and since I am using F#, creating implementations is easy.

I went back to my script file that analyzed the baby names from the US census and created a new script.  I loaded the names into memory like before

    #r "C:/Git/NerdChickenChicken/04_mvc3_Working/packages/FSharp.Data.2.0.14/lib/net40/FSharp.Data.dll"

    open FSharp.Data

    type censusDataContext = CsvProvider<"https://portalvhdspgzl51prtcpfj.blob.core.windows.net/censuschicken/AK.TXT">
    type stateCodeContext = CsvProvider<"https://portalvhdspgzl51prtcpfj.blob.core.windows.net/censuschicken/states.csv">

    let stateCodes = stateCodeContext.Load("https://portalvhdspgzl51prtcpfj.blob.core.windows.net/censuschicken/states.csv")

    let fetchStateData (stateCode:string)=
        let uri = System.String.Format("https://portalvhdspgzl51prtcpfj.blob.core.windows.net/censuschicken/{0}.TXT",stateCode)
        censusDataContext.Load(uri)

    let usaData =
        stateCodes.Rows
        |> Seq.collect(fun r -> fetchStateData(r.Abbreviation).Rows)
        |> Seq.toArray

I then created a function that returns the probability that a name is male

    let genderSearch name =
        let nameFilter =
            usaData
            |> Seq.filter(fun r -> r.Mary = name)
            |> Seq.groupBy(fun r -> r.F)
            |> Seq.map(fun (n,a) -> n,a |> Seq.sumBy(fun (r) -> r.``14``))

        let nameSum = nameFilter |> Seq.sumBy(fun (n,c) -> c)
        nameFilter
        |> Seq.map(fun (n,c) -> n, c, float c/float nameSum)
        |> Seq.filter(fun (g,c,p) -> g = "M")
        |> Seq.map(fun (g,c,p) -> p)
        |> Seq.head

    genderSearch "James"
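To see the calculation without downloading the census files, here is a small sketch over made-up rows.  The `sampleRows` data and the `maleShare` name are assumptions for illustration.  It also swaps `Seq.head` for `List.tryFind`, because `Seq.head` throws when a name has no male rows at all; returning 0.0 in that case is a design choice of this sketch, not something the original function does.

```fsharp
// Made-up rows in the shape (name, gender, year, count); not real census figures.
let sampleRows =
    [ ("James", "M", 1950, 900)
      ("James", "M", 1960, 800)
      ("James", "F", 1950, 10)
      ("Mary", "F", 1950, 950) ]

// Share of births with the given name that were recorded as male.
let maleShare (name: string) =
    let counts =
        sampleRows
        |> List.filter (fun (n, _, _, _) -> n = name)
        |> List.groupBy (fun (_, g, _, _) -> g)
        |> List.map (fun (g, rows) -> g, rows |> List.sumBy (fun (_, _, _, c) -> c))
    let total = counts |> List.sumBy snd
    counts
    |> List.tryFind (fun (g, _) -> g = "M")
    |> Option.map (fun (_, c) -> float c / float total)
    |> Option.defaultValue 0.0

printfn "James: %.3f" (maleShare "James")
printfn "Mary: %.3f" (maleShare "Mary")
```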

image

I then created a function that calculated the last year the name was popular (using 1 standard deviation above the average as the cutoff)

    let ageSearch name =
        let nameFilter =
            usaData
            |> Seq.filter(fun r -> r.Mary = name)
            |> Seq.groupBy(fun r -> r.``1910``)
            |> Seq.map(fun (n,a) -> n,a |> Seq.sumBy(fun (r) -> r.``14``))
            |> Seq.toArray
        let nameSum = nameFilter |> Seq.sumBy(fun (n,c) -> c)
        nameFilter
        |> Seq.map(fun (n,c) -> n, c, float c/float nameSum)
        |> Seq.toArray

    let variance (source:float seq) =
        let mean = Seq.average source
        let deltas = Seq.map(fun x -> pown(x-mean) 2) source
        Seq.average deltas

    let standardDeviation(values:float seq) =
        sqrt(variance(values))

    let standardDeviation' name =
        ageSearch name
        |> Seq.map(fun (y,c,p) -> float c)
        |> standardDeviation

    let average name =
        ageSearch name
        |> Seq.map(fun (y,c,p) -> float c)
        |> Seq.average

    let attachmentPoint name = (average name) + (standardDeviation' name)

    let popularYears name =
        let allYears = ageSearch name
        let attachmentPoint' = attachmentPoint name
        let filteredYears =
            allYears
            |> Seq.filter(fun (y,c,p) -> float c > attachmentPoint')
            |> Seq.sortBy(fun (y,c,p) -> y)
        filteredYears

    let lastPopularYear name = popularYears name |> Seq.last
    let firstPopularYear name = popularYears name |> Seq.head

    lastPopularYear "James"
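The cutoff idea is easier to follow on a handful of numbers.  The sketch below uses invented yearly counts for a single name (they are not census figures) and keeps only the years whose count is more than one standard deviation above the average, which is the same rule as the attachment point in the script.

```fsharp
// Made-up (year, count) pairs for a single name; not real census figures.
let yearlyCounts =
    [ 1940, 100.0; 1950, 450.0; 1960, 500.0; 1970, 120.0; 1980, 80.0 ]

let mean = yearlyCounts |> List.averageBy snd

// Population standard deviation, matching the variance function in the script.
let stdDev =
    yearlyCounts
    |> List.averageBy (fun (_, c) -> (c - mean) ** 2.0)
    |> sqrt

// A year counts as "popular" when it clears the average plus one standard deviation.
let threshold = mean + stdDev
let popular =
    yearlyCounts
    |> List.filter (fun (_, c) -> c > threshold)
    |> List.map fst

printfn "threshold = %.1f, popular years = %A" threshold popular
```

With these numbers only 1950 and 1960 clear the cutoff, so the last popular year would be 1960.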

image

 

And then I created a function that takes in the probability of the name being male and the last year the name was popular and assigns the name to a category:

    let nameAssignment (malePercent, lastYearPopular) =
        match malePercent > 0.75, malePercent < 0.75, lastYearPopular < 1945, lastYearPopular > 1980 with
        | true, false, true, false -> "oldMale"
        | true, false, false, false -> "middleAgedMale"
        | true, false, false, true -> "youngMale"
        | false, true, true, false -> "oldFemale"
        | false, true, false, false -> "middleAgedFemale"
        | false, true, false, true -> "youngFeMale"
        | _,_,_,_ -> "unknown"
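Since gender and age are decided independently, the same rules can also be written as two small decisions that are then glued together.  This is only an alternative sketch under the same thresholds (0.75, 1945 and 1980); the `classify` name is made up, and note that it spells the last category "youngFemale", which differs from the "youngFeMale" string used above.

```fsharp
// Same thresholds as nameAssignment, split into a gender part and an age part.
let classify (malePercent: float) (lastYearPopular: int) =
    let gender =
        if malePercent > 0.75 then Some "Male"
        elif malePercent < 0.75 then Some "Female"
        else None   // exactly 0.75 falls through to "unknown", as in the original
    let age =
        if lastYearPopular < 1945 then "old"
        elif lastYearPopular > 1980 then "young"
        else "middleAged"
    match gender with
    | Some g -> age + g
    | None -> "unknown"

printfn "%s" (classify 0.99 1960)
printfn "%s" (classify 0.10 1990)
printfn "%s" (classify 0.75 1960)
```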

And then it was a matter of tying the functions together for each of the names in the master list:

    let nameList =
        usaData
        |> Seq.map(fun r -> r.Mary)
        |> Seq.distinct

    nameList
    |> Seq.map(fun n -> n, genderSearch n)
    |> Seq.map(fun (n,mp) -> n,mp, lastPopularYear n)
    |> Seq.map(fun (n,mp,(y,c,p)) -> n, mp, y)

    let nameList' =
        nameList
        |> Seq.map(fun n -> n, genderSearch n)
        |> Seq.map(fun (n,mp) -> n,mp, lastPopularYear n)
        |> Seq.map(fun (n,mp,(y,c,p)) -> n, mp, y)
        |> Seq.map(fun (n,mp,y) -> n,nameAssignment(mp,y))

image

And then write the list out to a file

    open System.IO
    let outFile = new StreamWriter(@"c:\data\nameList.csv")

    nameList' |> Seq.iter(fun (n,c) -> outFile.WriteLine(sprintf "%s,%s" n c))
    // Flush is a method, so it needs the parentheses to actually run
    outFile.Flush()
    outFile.Close()

Thanks to this Stack Overflow post for the file write (I wish the CSV type provider had this ability).  With the file created, I can then use the file as a lookup for my name function back in the MVC app using a CSV type provider

    type nameMappingContext = CsvProvider<"C:/data/nameList.csv">

    type AdProvider () =
        member this.GetCatagory personName: string =
            let nameList = nameMappingContext.Load("C:/data/nameList.csv")
            let foundName =
                nameList.Rows
                |> Seq.filter(fun r -> r.Annie = personName)
                |> Seq.map(fun r -> r.oldFemale)
                |> Seq.toArray
            if foundName.Length > 0 then
                foundName.[0]
            else
                "middleAgedMale"

And now I have some (basic) personalization in Nerd Dinner.  (Emma is a young female name, so they get a picture of a campground.)

image

So this is rather crude.  There is no provision for nicknames, case-sensitivity, etc.  But the site is on its way to becoming smarter…
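One of those gaps, case-sensitivity, could be closed fairly cheaply.  The following is a minimal sketch, assuming the name/category pairs have already been read from the CSV into memory; the `rows` list, the `lookup` dictionary and the `getCategory` function are hypothetical names, not part of the Nerd Dinner code.

```fsharp
open System
open System.Collections.Generic

// Hypothetical rows in the shape written to nameList.csv: (name, category).
let rows = [ "Annie", "oldFemale"; "Emma", "youngFeMale"; "James", "middleAgedMale" ]

// Build the dictionary once with a case-insensitive comparer,
// instead of re-reading and scanning the file on every call.
let lookup = Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
for (name, category) in rows do
    lookup.[name] <- category

let getCategory (personName: string) =
    match lookup.TryGetValue(personName.Trim()) with
    | true, category -> category
    | _ -> "middleAgedMale"   // same fallback as the AdProvider above

printfn "%s" (getCategory "emma")
printfn "%s" (getCategory " JAMES ")
```

Nicknames would still need a separate mapping (for example "Jim" to "James"), which this sketch does not attempt.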

The code can be found on github here.

Wake County Restaurant Inspection Data with Azure ML and F#

With Azure ML now available, I was thinking about some of the analysis I did last year and how I could do even more things with the same data set.  One such analysis that came to mind was the restaurant inspection data that I analyzed last year.  You can see the prior analysis here.

I uploaded the restaurant data into Azure and thought of a simple question –> can we predict inspection scores based on some easily available data?  This is an interesting dataset because there are some categorical data elements (zip code, restaurant type, etc…) and there are some continuous ones (priority foundation, etc…).

Here is the base dataset:

image

I created a new experiment and I used a boosted regression model and a neural network regression and used a 70/30 train/test split.

image

After running the models and inspecting the model evaluation, I don’t have a very good model

image

I then decided to go back and pull some of the X variables out of the dataset and concentrate on only a couple of variables.  I added a project column module and then selected Restaurant Type and Zip Code as the X variables and left the Inspection Score as the Y variable. 

image

With this done, I added a couple more models (Bayesian Linear Regression and a Decision Forest Regression) and gave it a whirl

image

image

Interestingly, adding these models did not give us any better of a prediction, and dropping the variables to two made a less accurate model.  Without doing any more analysis, I picked the model with the lowest MAE (Boosted Decision Tree Regression) and published it as a web service:
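For readers who have not met the metric, MAE (mean absolute error) is just the average size of the misses between predicted and actual scores.  Here is a tiny sketch with invented scores; the numbers are for illustration only and are not taken from the Wake County data or from the Azure ML evaluation.

```fsharp
// Mean absolute error: the average absolute difference between actual and predicted.
let meanAbsoluteError (actual: float list) (predicted: float list) =
    List.map2 (fun a p -> abs (a - p)) actual predicted
    |> List.average

// Made-up inspection scores and predictions, for illustration only.
let actualScores = [ 96.0; 92.5; 98.0; 89.0 ]
let predictedScores = [ 95.0; 94.5; 97.0; 93.0 ]

let mae = meanAbsoluteError actualScores predictedScores
printfn "MAE = %.2f" mae
```

A lower MAE means the predictions sit closer to the real scores on average, which is why it is a reasonable single number for comparing the four models.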

image

With the web service published, I can now consume it from a client app.  I used the code that I used for voting analysis found here as a template and sure enough:

["27519","Restaurant","0","96.0897827148438"]

["27612","Restaurant","0","95.5728530883789"]

So restaurants in Cary, NC have a higher predicted inspection score than the ones found in Northwest Raleigh.  However, before we start alerting the Cary Chamber of Commerce to create a marketing campaign (“Eat in Cary, we are safer”), the difference is within the MAE.

In any event, it would be easy to create a phone app so that if you don’t know a restaurant’s score, you can punch in the establishment type and the zip code and have a good idea about the score of the restaurant.

This is an academic exercise b/c the establishments have to show you their card and Yelp has their score on them, but a fun exercise nonetheless.  Happy eating.

Consuming Azure ML web api endpoint from an array

Last week, I blogged about creating an Azure ML experiment, publishing it as a web service, and then consuming it from F#.  I then wanted to consume the web service using an array, passing in several values and seeing the results.  I added on to my existing F# script with the following code:

    let input1 = new Dictionary<string,string>()
    input1.Add("Zip Code","27519")
    input1.Add("Race","W")
    input1.Add("Party","UNA")
    input1.Add("Gender","M")
    input1.Add("Age","45")
    input1.Add("Voted Ind","1")

    let input2 = new Dictionary<string,string>()
    input2.Add("Zip Code","27519")
    input2.Add("Race","W")
    input2.Add("Party","D")
    input2.Add("Gender","F")
    input2.Add("Age","47")
    input2.Add("Voted Ind","1")

    let inputs = new List<Dictionary<string,string>>()
    inputs.Add(input1)
    inputs.Add(input2)

    inputs
    |> Seq.map(fun i -> invokeService(i))
    |> Async.Parallel
    |> Async.RunSynchronously
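The fan-out pattern at the end can be tried without an Azure ML endpoint.  In the sketch below, `invokeServiceStub` is a hypothetical stand-in that just echoes two of the inputs after a short delay; the point is only to show that `Async.Parallel` runs the calls together and hands back the results as an array in the same order as the inputs.

```fsharp
// Stand-in for the real web service call; it only echoes part of the input.
let invokeServiceStub (input: Map<string, string>) =
    async {
        do! Async.Sleep 10
        return sprintf "%s/%s" input.["Party"] input.["Gender"]
    }

let inputs =
    [ Map.ofList [ "Party", "UNA"; "Gender", "M" ]
      Map.ofList [ "Party", "D"; "Gender", "F" ] ]

// One async per input, run in parallel, results collected in input order.
let results =
    inputs
    |> List.map invokeServiceStub
    |> Async.Parallel
    |> Async.RunSynchronously

printfn "%A" results
```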

And sure enough, I can run the model using multiple inputs:

image

Consuming Azure ML With F#

(This post is a continuation of this one)

So with a model that works well enough,  I selected only that model and saved it

image

 

image

I created a new experiment and used that model with the base data.  I then marked the project columns as the input and the score as the output (green and blue circles, respectively)

image

After running it, I published it as a web service

image

And voila, an endpoint ready to go.  I then took the auto generated script and opened up a new Visual Studio F# project to use it.  The problem was that this is the data structure that the model needs

FeatureVector = new Dictionary<string, string>() { { "Precinct", "0" }, { "VRN", "0" }, { "VRstatus", "0" }, { "VRlastname", "0" }, { "VRfirstname", "0" }, { "VRmiddlename", "0" }, { "VRnamesufx", "0" }, { "VRstreetnum", "0" }, { "VRstreethalfcode", "0" }, { "VRstreetdir", "0" }, { "VRstreetname", "0" }, { "VRstreettype", "0" }, { "VRstreetsuff", "0" }, { "VRstreetunit", "0" }, { "VRrescity", "0" }, { "VRstate", "0" }, { "Zip Code", "0" }, { "VRfullresstreet", "0" }, { "VRrescsz", "0" }, { "VRmail1", "0" }, { "VRmail2", "0" }, { "VRmail3", "0" }, { "VRmail4", "0" }, { "VRmailcsz", "0" }, { "Race", "0" }, { "Party", "0" }, { "Gender", "0" }, { "Age", "0" }, { "VRregdate", "0" }, { "VRmuni", "0" }, { "VRmunidistrict", "0" }, { "VRcongressional", "0" }, { "VRsuperiorct", "0" }, { "VRjudicialdistrict", "0" }, { "VRncsenate", "0" }, { "VRnchouse", "0" }, { "VRcountycomm", "0" }, { "VRschooldistrict", "0" }, { "11/6/2012", "0" }, { "Voted Ind", "0" }, }, GlobalParameters = new Dictionary<string, string>() { } };

And since I am only using 6 of the columns, it made sense to reload the Wake County Voter Data with just the needed columns.  I went back to the original CSV and did that.  Interestingly, I could not set the original dataset as the publish input so I added a project column module that does nothing

image

With that in place, I republished the service and opened Visual Studio.  I decided to start with a script.  I was struggling through the async when Tomas P helped me on Stack Overflow here.  I’ll say it again, the F# community is tops.  In any event, here is the initial script:

#r @"C:\Program Files (x86)\Reference Assemblies\Microsoft\Framework\.NETFramework\v4.5\System.Net.Http.dll"
#r @"..\packages\Microsoft.AspNet.WebApi.Client.5.2.2\lib\net45\System.Net.Http.Formatting.dll"

open System
open System.Net.Http
open System.Net.Http.Headers
open System.Net.Http.Formatting
open System.Collections.Generic

type scoreData = {FeatureVector:Dictionary<string,string>;GlobalParameters:Dictionary<string,string>}
type scoreRequest = {Id:string; Instance:scoreData}

let invokeService () = async {
    let apiKey = ""
    let uri = "https://ussouthcentral.services.azureml.net/workspaces/19a2e623b6a944a3a7f07c74b31c3b6d/services/f51945a42efa42a49f563a59561f5014/score"
    use client = new HttpClient()
    client.DefaultRequestHeaders.Authorization <- new AuthenticationHeaderValue("Bearer",apiKey)
    client.BaseAddress <- new Uri(uri)

    let input = new Dictionary<string,string>()
    input.Add("Zip Code","27519")
    input.Add("Race","W")
    input.Add("Party","UNA")
    input.Add("Gender","M")
    input.Add("Age","45")
    input.Add("Voted Ind","1")

    let instance = {FeatureVector=input; GlobalParameters=new Dictionary<string,string>()}
    let scoreRequest = {Id="score00001";Instance=instance}

    let! response = client.PostAsJsonAsync("",scoreRequest) |> Async.AwaitTask
    let! result = response.Content.ReadAsStringAsync() |> Async.AwaitTask

    if response.IsSuccessStatusCode then
        printfn "%s" result
    else
        printfn "FAILED: %s" result
    response |> ignore
}

invokeService() |> Async.RunSynchronously

 

Unfortunately, when I run it, it fails.  Below is the Fiddler trace:

image

 

So it looks like the Json Serializer is appending the “@” symbol to the field names.  I changed the records to types and voila:

image

You can see the final script here.
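
For readers who cannot follow the link, here is a minimal sketch of what swapping the records for plain types might look like. F# records are compiled with backing fields whose names end in “@”, which appears to be what the serializer picked up. The type names ScoreData and ScoreRequest and their layout are my assumption for illustration; the final script may differ.

```fsharp
open System.Collections.Generic

// Hypothetical class-based versions of the scoreData and scoreRequest records.
// Each value is exposed as an explicit public property.
type ScoreData(featureVector: Dictionary<string,string>, globalParameters: Dictionary<string,string>) =
    member this.FeatureVector = featureVector
    member this.GlobalParameters = globalParameters

type ScoreRequest(id: string, instance: ScoreData) =
    member this.Id = id
    member this.Instance = instance

// Build a request the same way the script does, with a shortened input for brevity.
let input = new Dictionary<string,string>()
input.Add("Zip Code","27519")
input.Add("Age","45")

let request = ScoreRequest("score00001", ScoreData(input, new Dictionary<string,string>()))
printfn "%s has %d features" request.Id request.Instance.FeatureVector.Count
```

The request object can then be handed to PostAsJsonAsync in place of the record.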

So then I threw in some different numbers:

  • A millennial: ["27519","W","D","F","25","1","1","0.62500011920929"]
  • A senior citizen: ["27519","W","D","F","75","1","1","0.879632294178009"]

I wonder why social security never gets cut?

In any event, just to check the model:

  • A 15 year old: ["27519","W","D","F","15","1","0","0.00147285079583526"]
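
Reading those arrays by eye gets old quickly, so here is a rough sketch of pulling the score out of a response string. I am assuming that the last two elements are the scored label and the scored probability, and that no field contains a comma or a quote. The function names parseFields and parseScore are made up for this example.

```fsharp
open System
open System.Globalization

// Splits a response such as ["27519","W","D","F","25","1","1","0.625"] into its fields.
// Assumes no field contains a comma or an embedded quote.
let parseFields (response: string) =
    response.Trim().TrimStart('[').TrimEnd(']').Split(',')
    |> Array.map (fun s -> s.Trim().Trim('"'))

// Assumption: the last two fields are the scored label and the scored probability.
let parseScore (response: string) =
    let fields = parseFields response
    let label = fields.[fields.Length - 2]
    let probability = Double.Parse(fields.[fields.Length - 1], CultureInfo.InvariantCulture)
    label, probability

let label, probability =
    parseScore "[\"27519\",\"W\",\"D\",\"F\",\"15\",\"1\",\"0\",\"0.00147285079583526\"]"
printfn "label=%s probability=%f" label probability
```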

Azure ML and Wake County Election Data

I have spent the last couple of weeks using Azure ML and I think it is one of the most exciting technologies for business developers and analysts since ODBC and FSharp type providers.  If you remember, when ODBC came out, every relational database in the world became accessible and therefore usable and analyzable.  When type providers came out, programming against, exploring, and analyzing data sources became much easier, and that reach expanded from RDBMS to all formats (notably Json).  So getting data was no longer a problem, but analyzing it still was.

Enter Azure ML. 

I downloaded the Wake County Voter History data from here.  I took the Excel spreadsheet and converted it to a .csv locally.  I then logged into Azure ML and imported the data

image

I then created an experiment and added the dataset to the canvas

image

 

And looked at the basic statistics of the data set

image

(Note that I find the FSharp REPL a better way to explore the data, as I can just dot each element I am interested in and view the results.)

In any event, the first question I want to answer is

“given a person’s ZipCode, Race, Party, Gender, and Age, can I predict if they will vote in November”

To that end, I first narrowed down the columns using a Column Projection and picked only the columns I care about.  I picked “11/6/2012” as the Y variable because that was the last national election and that is what we are going to have in November.  I probably should have done 2010 because that is a national election without a President, but that can be analyzed at a later date.

image

image

I then ran my experiment so the data would be available in the Project Column step.

image

 

I then renamed the columns to make them a bit more readable by using a series of Metadata Editors (it does not look like you can do all renames in 1 step.  Equally annoying is that you have to add each module, run it, then add the next.)

image

(one example)

image

 

I then added a Missing Values scrubber for the voted column.  So instead of a null field, people who didn’t vote get an “N”

image

The problem is that it doesn’t work –> looks like we can’t change the values per column.

image

I asked the question on the forum but in the interest of time, I decided to change the voted column from a categorical column to an indicator. That way I can do binary analysis.  That also failed.  I went back to the original spreadsheet and added an Indicator column and then also renamed the column headers so I am not cluttering up my canvas with those metadata transforms.  Finally, I realized I want only active voters but there does not seem to be a filtering ability (remove rows only works for missing) so I removed those also from the original dataset.  I think the ability to scrub and munge data is an area for improvement, but since this is release 1, I understand.
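
I did this munging by hand in the spreadsheet, but the same two steps (build a 0/1 indicator from the voted column, keep only active voters) could be sketched in F# roughly like this. The VoterRow shape, the "A" status code, and the sample rows are all made up for illustration and are not the dataset's actual headers or values.

```fsharp
open System

// Hypothetical row shape; the field names are illustrative, not the dataset's actual headers.
type VoterRow = { Status: string; Age: int; Voted: string }

// 1 when the voted field has any value, 0 when it is blank.
let toIndicator (voted: string) =
    if String.IsNullOrWhiteSpace voted then 0 else 1

// Keep only active voters and replace the raw voted value with the indicator.
let prepare (rows: VoterRow list) =
    rows
    |> List.filter (fun r -> r.Status = "A")   // assumption: "A" marks an active voter
    |> List.map (fun r -> r.Age, toIndicator r.Voted)

// Made-up sample rows, only to show the shape of the output.
let sample =
    [ { Status = "A"; Age = 45; Voted = "V" }
      { Status = "I"; Age = 30; Voted = "" }
      { Status = "A"; Age = 22; Voted = "" } ]

prepare sample |> List.iter (fun (age, voted) -> printfn "age=%d voted=%d" age voted)
```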

After re-importing the data, I changed my experiment like so

image

I then split the dataset into Training/Validation/Testing using a 60/20/20 split

image

So the left point on the second split is 60% of the original dataset, and the right point on the second split is 20% of the original dataset (or 75%/25% of the 80% of the first split).

I then added an SVM with a train and score module.  Note that I am training with 60% of the original dataset and I am validating with 20%
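
The arithmetic behind the two chained splits can be checked with a quick sketch. The splitByFraction helper is a made-up name, and it cuts the rows in order just to show the sizes, where a real split would pick rows at random.

```fsharp
open System

// First split: 80% kept for modelling, 20% held out for testing.
let firstSplit = 0.80
// Second split: 75% of the kept rows for training, the rest for validation.
let secondSplit = 0.75

let training   = firstSplit * secondSplit          // about 0.60 of the original
let validation = firstSplit * (1.0 - secondSplit)  // about 0.20 of the original
let testing    = 1.0 - firstSplit                  // about 0.20 of the original
printfn "training=%.2f validation=%.2f testing=%.2f" training validation testing

// Hypothetical helper: cuts the rows in order at the given fraction (no shuffling).
let splitByFraction (fraction: float) (rows: 'a[]) =
    let n = int (Math.Round(fraction * float rows.Length))
    Array.splitAt n rows

let kept, testRows = splitByFraction firstSplit [| 1 .. 10 |]
let trainRows, validationRows = splitByFraction secondSplit kept
printfn "%d training, %d validation, %d testing" trainRows.Length validationRows.Length testRows.Length
```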

 

image

After it runs, there are 2 new columns in the dataset –> Scored labels and probabilities so each row now has a score.

 

image

With the model in place, I can then evaluate it using an evaluation module

image

And we can see an AUC of .666, which immediately made me think of this

image

In any event, I added a Logistic Regression and a Boosted Decision Tree to the canvas and hooked them up to the training and validation sets

image

And this is what we have

image image

 

SVM: .666 AUC

Regression: .689 AUC

Boosted Decision Tree: .713 AUC
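
As a reminder of what those numbers mean: AUC can be read as the chance that a randomly picked voter gets a higher score than a randomly picked non-voter, so .5 is a coin flip and 1 is perfect. Below is a small sketch of that pairwise definition. It is not how Azure ML computes it internally (I don't know that), the scored rows are made up, and the double loop is only sensible for small samples.

```fsharp
// AUC as the fraction of (positive, negative) pairs ranked correctly; ties count as half.
// Each item is (voted, predicted probability of voting).
// Returns NaN when there are no positives or no negatives.
let auc (scored: (bool * float) list) =
    let positives = scored |> List.filter fst |> List.map snd
    let negatives = scored |> List.filter (fst >> not) |> List.map snd
    let wins =
        [ for p in positives do
            for n in negatives ->
                if p > n then 1.0 elif p = n then 0.5 else 0.0 ]
        |> List.sum
    wins / float (positives.Length * negatives.Length)

// Made-up scores: three of the four voter/non-voter pairs are ranked correctly.
let example = [ (true, 0.9); (true, 0.4); (false, 0.5); (false, 0.1) ]
printfn "AUC = %.2f" (auc example)
```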

So with Boosted Decision Tree ahead, I added a Sweep Parameter module to see if I can tune it more.  I am using AUC as the performance metric

image

image

So the best AUC I am going to get is .7134 with the highlighted parameters.  I then added 1 more Model that uses those parameters against the entire training dataset (80% of the total) and then evaluates it against the remaining 20%.

image

With the final answer of

image

With that in hand, I can create a new experiment that will be the basis of a real-time voting app.
