Global Azure Bootcamp: Car Lab Analysis
April 28, 2015
As part of the Global Azure Bootcamp, the organizers created a hands-on lab where individuals could install a racing game and compete against other drivers. The cool thing was the amount of telemetry that the game pushed to Azure (I assume using Event Hubs to Azure Tables). The lab also had a basic “hello world” web app that could read data from the Azure Table REST endpoints so newcomers could see how easy it was to create and then deploy a website on Azure.
I decided to take a bit of a jaunt through the data endpoint to see what analytics I could run on it using Azure ML. I went to the initial endpoint here and sure enough, the data comes down in the browser. Unfortunately, when I set it up in Azure ML using a data reader:
I got 0 records returned. I think this has something to do with how the datareader deals with XML. I quickly used F# in Visual Studio with the XML type provider:
// Reference the FSharp.Data package that contains the XML type provider
#r "../packages/FSharp.Data.2.2.0/lib/net40/FSharp.Data.dll"

open FSharp.Data

// The type provider needs a compile-time constant to use as its sample
[<Literal>]
let uri = "https://reddoggabtest-secondary.table.core.windows.net/TestTelemetryData0?tn=TestTelemetryData0&sv=2014-02-14&si=GabLab&sig=GGc%2BHEa9wJYDoOGNE3BhaAeduVOA4MH8Pgss5kWEIW4%3D"

// Infer the types from the endpoint's XML, then load the data
type CarTelemetry = XmlProvider<uri>
let carTelemetry = CarTelemetry.Load(uri)
I reached out to the creator of the lab and he put a summary file on Azure Blob Storage that was very easy to consume with AzureML. You can find it here. I created a regression model to predict the amount of damage a car will sustain based on the country and car type:
This was great, but I wanted to work on my R chops some, so I decided to play around with the data in RStudio. I imported the data into RStudio and then fired up the scripting window. The first question I wanted to answer was “how do the countries stack up against each other in terms of car crashes?”
I did some basic data exploration like so:
summary(PlayerLapTimes)

aggregate(Damage ~ Country, PlayerLapTimes, sum)
aggregate(Damage ~ Country, PlayerLapTimes, FUN=length)
And then I got down to the business of answering the question:
# Total damage and number of records per country
dfSum <- aggregate(Damage ~ Country, PlayerLapTimes, sum)
dfCount <- aggregate(Damage ~ Country, PlayerLapTimes, FUN=length)

# Join the two on Country; the merged columns are Country, Damage.x, Damage.y
dfDamage <- merge(x=dfSum, y=dfCount, by.x="Country", by.y="Country")
names(dfDamage) <- c("Country", "Sum", "Count")
dfDamage$Avg <- dfDamage$Sum/dfDamage$Count
# Sort ascending by average damage
dfDamage2 <- dfDamage[order(dfDamage$Avg),]
So it is kinda interesting that France has the most damage per race. I have to ask Mathias Brandewinder about that.
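As an aside, the sum/count/merge steps above can be collapsed into a single call by aggregating with mean. The following is only a sketch: the small data frame is a made-up stand-in for PlayerLapTimes (the country labels and damage values are hypothetical, not from the lab data), and it assumes the Country and Damage columns used above.

```r
# Hypothetical sample standing in for the real PlayerLapTimes data frame
PlayerLapTimes <- data.frame(
  Country = c("CountryA", "CountryA", "CountryB", "CountryB", "CountryB"),
  Damage  = c(10, 30, 5, 10, 15)
)

# Average damage per country in one aggregate call
dfAvg <- aggregate(Damage ~ Country, PlayerLapTimes, mean)
names(dfAvg)[2] <- "Avg"

# Sort ascending by average damage, as in the longer version
dfAvg <- dfAvg[order(dfAvg$Avg), ]
print(dfAvg)
```

The longer version is still handy when you want the Sum and Count columns alongside the average.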
In any event, I then wanted to ask “what country finished first”. I decided to apply some R charting to the same boilerplate that I created earlier:
# Total lap time and number of laps per country
dfSum <- aggregate(LapTimeMs ~ Country, PlayerLapTimes, sum)
dfCount <- aggregate(LapTimeMs ~ Country, PlayerLapTimes, FUN=length)
dfSpeed <- merge(x=dfSum, y=dfCount, by.x="Country", by.y="Country")
names(dfSpeed) <- c("Country", "Sum", "Count")
dfSpeed$Avg <- dfSpeed$Sum/dfSpeed$Count
# Sort ascending by average lap time
dfSpeed2 <- dfSpeed[order(dfSpeed$Avg),]
# With Country as a factor, plot() draws one box plot per country
plot(PlayerLapTimes$Country,PlayerLapTimes$Damage)
So even though France appears to have the slowest drivers, the average is skewed by 2 pretty bad races. Perhaps the person never finished.
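One way to check how much a couple of bad races move the ranking is to compare the mean with the median, which is far less sensitive to outliers. This is a rough sketch on invented numbers (the laps data frame and its values are hypothetical, not the lab data), assuming the same Country and LapTimeMs columns:

```r
# Hypothetical lap times: CountryA has two very slow laps, CountryB is steady
laps <- data.frame(
  Country   = c(rep("CountryA", 5), rep("CountryB", 5)),
  LapTimeMs = c(60000, 61000, 62000, 300000, 320000,
                70000, 71000, 72000, 73000, 74000)
)

# Mean and median lap time per country
dfMean   <- aggregate(LapTimeMs ~ Country, laps, mean)
dfMedian <- aggregate(LapTimeMs ~ Country, laps, median)

# Side by side: columns become LapTimeMs.mean and LapTimeMs.median
dfCompare <- merge(dfMean, dfMedian, by = "Country",
                   suffixes = c(".mean", ".median"))
print(dfCompare)
```

In this toy data CountryA looks slower by mean but faster by median, which is the kind of flip a handful of unfinished races can cause.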
In any event, this was a fun exercise and I hope to continue with the data to show the awesomeness of Azure, F#, and R…