Parsing Microsoft MVP Pages and Uploading Photos to Sky Biometry

As part of the Terminator project that I am bringing to the MVP Summit, I wanted to load all of the MVP photographs into Sky Biometry and, if a person matches a photo with high confidence, terminate them. I asked my Microsoft contact whether I could get all of the MVP photos to load into the app, and they politely told me no.

Not being one who takes no lightly, I decided to see if I could load the photos from the MVP website. Each MVP has a public profile photo, and the MVP search page lists all of the MVPs along with their MVP IDs. So if I can get the ID from the search page and then create a Uri to the photo, I can load it into Sky Biometry.

I first created a new F# project and fired up a script window. I created a function that gets the entire contents of a search-results page; its only parameter is the page number in the pagination.

```fsharp
open System
open System.IO
open System.Net

// Download the raw HTML of one page of the United States MVP search results.
let getPageContents(pageNumber:int) =
    let uri = new Uri("http://mvp.microsoft.com/en-us/search-mvp.aspx?lo=United+States&sl=0&browse=False&sc=s&ps=36&pn=" + pageNumber.ToString())
    let request = WebRequest.Create(uri)
    request.Method <- "GET"
    // 'use' disposes the response, stream and reader once the page has been read.
    use response = request.GetResponse()
    use stream = response.GetResponseStream()
    use reader = new StreamReader(stream)
    reader.ReadToEnd()
```
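WebRequest does the job, but on .NET 6 and later it is marked obsolete in favor of HttpClient. Here is a minimal sketch of the same download with HttpClient, assuming a current .NET SDK. The names searchPageUri and getPageContentsHttp are hypothetical, and the search URL is simply the one used above, which may no longer be served. Splitting the URL into its own function also lets you check it without touching the network.

```fsharp
open System
open System.Net.Http

// Hypothetical helper: the search-results URL for one page (same query string as above).
let searchPageUri (pageNumber: int) =
    Uri(sprintf "http://mvp.microsoft.com/en-us/search-mvp.aspx?lo=United+States&sl=0&browse=False&sc=s&ps=36&pn=%d" pageNumber)

// One shared HttpClient is the usual recommendation rather than one per request.
let httpClient = new HttpClient()

// Download the raw HTML of one results page, blocking until the response arrives.
let getPageContentsHttp (pageNumber: int) : string =
    httpClient.GetStringAsync(searchPageUri pageNumber)
    |> Async.AwaitTask
    |> Async.RunSynchronously
```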

I then parsed the page for every instance of the MVP ID. Fortunately, I found a post that helped me understand how regular-expression matching works in .NET. Note that the regex for a tag such as mvpid=123456 is “mvpid=\d+”.

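```fsharp
open System.Text.RegularExpressions

// Find every "mvpid=123456" token on the page and keep only the digits after '='.
let getMVPIdsFromPageContents(pageContents:string) =
    // Verbatim string so "\d" reaches the regex engine unchanged.
    let pattern = @"mvpid=\d+"
    let matchCollection = Regex.Matches(pageContents, pattern)
    matchCollection
    |> Seq.cast<Match>
    |> Seq.map(fun m -> m.Value)
    |> Seq.map(fun s -> s.Split('='))
    |> Seq.map(fun a -> a.[1])
```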
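A capture group can pull out the digits directly instead of splitting on '='. The sketch below also adds Seq.distinct in case an ID appears more than once on a page (for example on both the photo and the name link); whether the real search page repeats IDs is an assumption, and the HTML fragment is made up for illustration.

```fsharp
open System.Text.RegularExpressions

// Group 1 captures just the digits, so no Split is needed.
// Seq.distinct drops repeats while keeping first-seen order.
let getDistinctMvpIds (pageContents: string) : string list =
    Regex.Matches(pageContents, @"mvpid=(\d+)")
    |> Seq.cast<Match>
    |> Seq.map (fun m -> m.Groups.[1].Value)
    |> Seq.distinct
    |> List.ofSeq

// Made-up HTML fragment in which one ID appears twice.
let sample = """<a href="profile.aspx?mvpid=123456">A</a><img src="x?mvpid=123456"/><a href="p?mvpid=7890">B</a>"""
let sampleIds = getDistinctMvpIds sample   // ["123456"; "7890"]
```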

With that in place, I could get a Seq of all MVP IDs (at least for the United States) by collecting the IDs from each of the pages together:

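```fsharp
// Fetch one results page and extract its MVP IDs.
let getMVPIds(pageNumber: int) =
    let pageContents = getPageContents(pageNumber)
    getMVPIdsFromPageContents pageContents

// The United States results spanned pages 1 through 17 at the time.
let pageList = [1..17]
let mvpIds =
    pageList
    |> Seq.collect(fun i -> getMVPIds(i))
```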
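One thing to keep in mind: a Seq built with Seq.collect is lazy, so every time mvpIds is enumerated (say, once to inspect it in the script window and again in the upload loop) all 17 pages are downloaded again. The sketch below shows the effect with a hypothetical fakeFetch standing in for getMVPIds, and how Seq.cache (or List.ofSeq) avoids it.

```fsharp
// fakeFetch is a hypothetical stand-in for getMVPIds: it returns two made-up IDs
// per page and counts how many times it has been called.
let fetchCount = ref 0
let fakeFetch (pageNumber: int) =
    fetchCount.Value <- fetchCount.Value + 1
    [ sprintf "%d01" pageNumber; sprintf "%d02" pageNumber ]

// Without caching, every enumeration re-runs fakeFetch for every page.
let lazyIds = [1..3] |> Seq.collect fakeFetch
lazyIds |> Seq.length |> ignore
lazyIds |> Seq.length |> ignore
let lazyCalls = fetchCount.Value      // 6: two passes over three pages

// Seq.cache keeps the results of the first pass, so later passes do not fetch again.
fetchCount.Value <- 0
let cachedIds = [1..3] |> Seq.collect fakeFetch |> Seq.cache
cachedIds |> Seq.length |> ignore
cachedIds |> Seq.length |> ignore
let cachedCalls = fetchCount.Value    // 3: only the first pass fetched
```

In the real script, appending |> Seq.cache to the mvpIds pipeline would have the same effect.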

So far, so good.

I could then create a function that generates the MVP photo Uri:

```fsharp
// Build the public profile-photo URL for a given MVP ID.
let getMvpImageUri(mvpId: int) =
    new Uri("http://mvp.microsoft.com/private/en-us/PublicProfile/Photo/" + mvpId.ToString())
```

With that out of the way, it was time to send the photos to Sky Biometry for face detection and tagging. I used the code from an earlier post, with a couple of changes: a face might not be found in the photo (hence the option type), and bad things might happen (like a photo that is too big), so any exception is caught and treated as no face.

```fsharp
open System.Text
open FSharp.Data   // requires a reference to the FSharp.Data package

// The JSON type providers are generated from sample Sky Biometry responses saved next to the script.
type skybiometryFaceDetection = JsonProvider<"./SkyBiometryImageJson/FaceDetection.json">
type skybiometryAddTags = JsonProvider<"./SkyBiometryImageJson/AddTags.json">
type skybiometryFaceTraining = JsonProvider<"./SkyBiometryImageJson/FaceTraining.json">

// skyBiometryUri, skyBiometryApiKey and skyBiometryApiSecret are defined elsewhere in the script.
// Returns Some tid for the first face found, or None if no face is found or the call fails.
let detectFace (imageUri:string) =
    let stringBuilder = new StringBuilder()
    stringBuilder.Append(skyBiometryUri) |> ignore
    stringBuilder.Append("/fc/faces/detect.json?urls=") |> ignore
    stringBuilder.Append(imageUri) |> ignore
    stringBuilder.Append("&api_key=") |> ignore
    stringBuilder.Append(skyBiometryApiKey) |> ignore
    stringBuilder.Append("&api_secret=") |> ignore
    stringBuilder.Append(skyBiometryApiSecret) |> ignore
    try
        let faceDetection = skybiometryFaceDetection.Load(stringBuilder.ToString())
        if faceDetection.Photos.[0].Tags.Length > 0 then
            Some faceDetection.Photos.[0].Tags.[0].Tid
        else
            None
    with _ -> None
```
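The control flow in detectFace, where both "no face in the photo" and "the call blew up" collapse into None, can be seen in isolation with a network-free sketch. Here tryFirstTid is a hypothetical name, and the load function stands in for the Sky Biometry call, returning the tag IDs it found.

```fsharp
// Hypothetical, network-free version of detectFace's control flow.
// `load` stands in for the Sky Biometry request and returns the tag IDs found.
let tryFirstTid (load: unit -> string[]) : string option =
    try
        // Array.tryHead returns None for an empty array instead of throwing.
        load () |> Array.tryHead
    with _ ->
        // Any failure (network error, photo too large, bad JSON) also becomes None.
        None
```

Callers then only have to match on Some and None, which is exactly what the wrapper function further down does.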

I then added the other two functions, one to save a tag and one to train on it:

```fsharp
// Attach the detected face (tid) to a user id (uid) in Sky Biometry.
let saveTag(uid:string, tid:string) =
    let stringBuilder = new StringBuilder()
    stringBuilder.Append(skyBiometryUri) |> ignore
    stringBuilder.Append("/fc/tags/save.json?uid=") |> ignore
    stringBuilder.Append(uid) |> ignore
    stringBuilder.Append("&tids=") |> ignore
    stringBuilder.Append(tid) |> ignore
    stringBuilder.Append("&api_key=") |> ignore
    stringBuilder.Append(skyBiometryApiKey) |> ignore
    stringBuilder.Append("&api_secret=") |> ignore
    stringBuilder.Append(skyBiometryApiSecret) |> ignore
    let tags = skybiometryAddTags.Load(stringBuilder.ToString())
    tags.Status

// Train Sky Biometry's face recognition on the tags saved for this uid.
let trainFace(uid:string) =
    let stringBuilder = new StringBuilder()
    stringBuilder.Append(skyBiometryUri) |> ignore
    stringBuilder.Append("/fc/faces/train.json?uids=") |> ignore
    stringBuilder.Append(uid) |> ignore
    stringBuilder.Append("&api_key=") |> ignore
    stringBuilder.Append(skyBiometryApiKey) |> ignore
    stringBuilder.Append("&api_secret=") |> ignore
    stringBuilder.Append(skyBiometryApiSecret) |> ignore
    let training = skybiometryFaceTraining.Load(stringBuilder.ToString())
    training.Status
```
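All three request functions build their URLs the same way with StringBuilder and append values without escaping. A hypothetical helper, buildRequestUrl, could replace that repetition and run each value through Uri.EscapeDataString so characters such as ':', '/', '&' or '@' in an image URL or uid cannot break the query string. This is a defensive variation, not how the original code works; the base address and credentials below are placeholders.

```fsharp
open System

// Hypothetical helper: base address + path + escaped name/value query parameters.
let buildRequestUrl (baseUri: string) (path: string) (parameters: (string * string) list) =
    let query =
        parameters
        |> List.map (fun (name, value) -> name + "=" + Uri.EscapeDataString(value))
        |> String.concat "&"
    baseUri + path + "?" + query

// Example with a placeholder host and placeholder credentials.
let detectUrl =
    buildRequestUrl "http://api.example.invalid" "/fc/faces/detect.json"
        [ ("urls", "http://mvp.microsoft.com/private/en-us/PublicProfile/Photo/123456");
          ("api_key", "KEY");
          ("api_secret", "SECRET") ]
```

With it, saveTag's URL would become buildRequestUrl skyBiometryUri "/fc/tags/save.json" [("uid", uid); ("tids", tid); ("api_key", skyBiometryApiKey); ("api_secret", skyBiometryApiSecret)].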

In hindsight, this would have been a perfect place for Scott Wlaschin's Railway Oriented Programming (ROP), but I just created a wrapper function:

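```fsharp
// Detect the face, tag it with the MVP ID, then train on that tag.
// Sky Biometry uids take the form user@namespace; "terminatorChicken" is the namespace here.
let saveToSkyBiometry(mvpId:string, imageUri:string) =
    let tid = detectFace(imageUri)
    match tid with
    | Some x ->
        saveTag(mvpId + "@terminatorChicken", x) |> ignore
        trainFace(mvpId + "@terminatorChicken")
    | None -> "Failure"

// Pair every MVP ID with its photo Uri.
let results =
    mvpIds
    |> Seq.map(fun mvpId -> mvpId, getMvpImageUri(Int32.Parse(mvpId)))
```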
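For comparison, a Railway Oriented Programming version might look like the sketch below. Scott Wlaschin usually presents ROP with a custom two-track type; this sketch swaps in F#'s built-in Result and Result.bind, which skip the remaining steps as soon as one returns Error. The three step functions are hypothetical, network-free stand-ins for detectFace, saveTag and trainFace.

```fsharp
open System

// Hypothetical stand-in for detectFace: pretend photos ending in "/0" have no face.
let detectStep (imageUri: string) : Result<string, string> =
    if imageUri.EndsWith("/0") then Error "No face found" else Ok "tid-1"

// Hypothetical stand-in for saveTag: passes the uid along on success.
let saveTagStep (uid: string) (tid: string) : Result<string, string> =
    if String.IsNullOrWhiteSpace tid then Error "Empty tag id" else Ok uid

// Hypothetical stand-in for trainFace.
let trainStep (uid: string) : Result<string, string> =
    Ok "success"

// Each step runs only if the previous one succeeded; the first Error short-circuits.
let saveToSkyBiometryRop (mvpId: string) (imageUri: string) : Result<string, string> =
    let uid = mvpId + "@terminatorChicken"
    detectStep imageUri
    |> Result.bind (saveTagStep uid)
    |> Result.bind trainStep
```

Unlike the "Failure" string, an Error value also says which step failed.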

I then used Seq.map to send all of the photos in order, but I quickly ran into Sky Biometry's rate limit.

So I replaced Seq.map with a for loop so I could throttle the requests:

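```fsharp
open System
open System.Threading

// Send one MVP at a time, pausing a minute between calls to stay under the rate limit.
for (mvpId, uri) in results do
    let result = saveToSkyBiometry(mvpId, uri.ToString())
    printfn "%s" result
    Thread.Sleep(TimeSpan.FromMinutes(1.))
```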
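If you want to keep the results instead of only printing them, the loop can be wrapped in a small reusable helper. The sketch below is a hypothetical throttledMap: it applies a function to each item in order, sleeps for a given delay between items (but not after the last one), and collects the results into a list.

```fsharp
open System
open System.Threading

// Hypothetical helper: map over items in order, sleeping `delay` between calls
// so a rate-limited API is hit at a steady pace. Results are collected eagerly.
let throttledMap (delay: TimeSpan) (action: 'a -> 'b) (items: seq<'a>) : 'b list =
    items
    |> Seq.mapi (fun i item ->
        if i > 0 then Thread.Sleep(delay)
        action item)
    |> List.ofSeq
```

In the script it would be called as results |> throttledMap (TimeSpan.FromMinutes 1.) (fun (mvpId, uri) -> saveToSkyBiometry(mvpId, uri.ToString())), which returns one status string per MVP.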

And sure enough, the photos started going through to Sky Biometry, with the load spread out hour by hour.

You can see the full code here.


2 Responses to Parsing Microsoft MVP Pages and Uploading Photos to Sky Biometry

  1. Grant Crofton says:

    Cool, the Sky Biometry service looks fun!

    Next time you’re making the calls, try Http.fs – implement your getPageContents in one line, and make the URI construction a bit nicer too! ;-D. FSharp.Data has something similar.

    (I guess you know about at least one of those and didn’t want any dependencies in your blog code, but I hate to miss a chance to plug my thing..)

  2. Pingback: Anniversary edition of F# Weekly #43, 2014 – Two years together | Sergey Tihon's Blog
