Wake County Restaurant Inspection Data with Azure ML and F#

With Azure ML now available, I was thinking about some of the analysis I did last year and how I could do even more things with the same data set.  One such analysis that came to mind was the restaurant inspection data that I analyzed last year.  You can see the prior analysis here.

I uploaded the restaurant data into Azure and thought of a simple question –> can we predict inspection scores based on some easily available data?  This is an interesting dataset because there are some categorical data elements (zip code, restaurant type, etc…) and there are some continuous ones (priority foundation, etc…).

Here is the base dataset:

image

I created a new experiment and I used a boosted regression model and a neural network regression and used a 70/30 train/test split.

image

After running the models and inspecting the model evaluation, I don’t have a very good model

image

I then decided to go back and pull some of the X variables out of the dataset and concentrate on only a couple of variables.  I added a project column module and then selected Restaurant Type and Zip Code as the X variables and left the Inspection Score as the Y variable. 

image

With this done, I added a couple of more models (Bayesian Linear Regression and a Decision Forest Regression) and gave it a whirl

image

image

Interesting, adding these models did not give us any better of a prediction and dropping the variables to two made a less accurate model.  Without doing any more analysis, I picked the model with the lowest MAE )Boosted Decision Tree Regression) and published it at a web service:

image

I published it as a web service and now I can consume if from a client app.   I used the code that I used for voting analysis found here as a template and sure enough:

["27519","Restaurant","0","96.0897827148438"]

["27612","Restaurant","0","95.5728530883789"]

So restaurants in Cary,NC have a higher inspection score than the ones found in Northwest Raleigh.   However, before we start  alerting the the Cary Chamber of Commerce to create a marketing campaign (“Eat in Cary, we are safer”), the difference is within the MAE.

In any event, it would be easy to create a  phone app and you don’t know a restaurant score, you can punch in the establishment type and the zip code and have a good idea about the score of the restaurant. 

This is an academic exercise b/c the establishments have to show you their card and yelp has their score on them, but a fun exercise none the less.  Happy eating.

Tech Jam and Team Islington Green

IslingtonGreen

 

So a couple of friends from TRINUG – Ian Cillay and David Green – invited me to join them at a 36 hour continuous code fest called Tech Jam. Tech Jam was put on Met Life and the Department of Veterans Affairs on Nov 1 and 2 in RTP. This team of 3 developers was joined by Ian Henshaw who helped with some primary data provision and much of the user story development. David handled the MongoDB part, Ian took care of the UI (Bootstrap and Knockout), and I did the analytics (F# and R) and security piece.

The problem domain was that the Veterans Administration has this format of medical records called Blue Button. Blue Button is an unstructured format which make typical parsing and analysis very difficult. Here is a sample:

clip_image002

Also, the VA wanted some kind of mutli-platform solution that a vet can use that can make sense of this data and allow him/her to provide it to a care giver when needed.

Some random thoughts about the contest:

  • MongoDB makes a lot of sense for the data because of it being so unstructured. Note that the VA has now introduced BlueButton+, which is XML format – so that is a step in the right direction. I was impressed how easy Mongo was to use and how powerful it is to tackle unstructured data – but note that even MongoDb still needs some kind of structure – just not xNF relational…
  • Bootstrap was awesome – we used an out of the box template and it was great to knock out an easy design.
  • F# made analytics and predictive analysis a snap. 
  • There were 15 teams registered, 10 actually presented at the end. There were 2 community teams (us and another), 1 high school team (awesome), and a bunch of corporate teams (Deutsche Bank, IBM, Tata(X3!), etc…).
  • One of the teams (Infusion) was a vendor for Met Life already (they worked on the wall project and the infinity project). Unsurprisingly, they won the grand prize.  My only comment on that is that civic/community hackers already view events like this with a skeptical eye – and this did nothing to help MetLife in the eyes of those kind of people – in fact probably did the opposite.
  • I was amazed by how many teams did not actually address the primary problems that the VA needed fixed.  The problems were taming unstructured data and presenting it in a platform-agnostic way.  Most teams used HTML5/Phonegap for req #2 – which is the easier one.  I think only 3 teams actually addressed requirement #1?
  • I am proud to say that my team did address both requirements.  As I sometimes say “I listen to two things in life: my wife and the requirements.  It goes better for me if I do that.”
  • Another team was from a company where the boss showed up on Saturday to present the team’s work. I guess that is the difference with a corporate hack-a-thon.  Also, you can tell the corporate teams because they had more powerpoint and less code in their final presentation.  Not that there is anything wrong with that….
  • The best line this weekend was when I asked a D level person at Metlife why no one at Metlife was retweeting my tweets with their hashtags (#techjam) and he said "we have a guy for that." Sure enough, they had 1 person who was their tweet guy.
  • MetLife really know how to put on a contest. The MC for this – Gary Hoberman – was awesome – a V-Level techie that really could communicate with the coders. The food was good (they took into account different dietary needs), the working space was good and the swag was useable.  Also, they had dev mentors circulating around the room, though they seemed to spend their time with other teams – so we didn’t interact with them.  They also had reps from MongoDB and MSFT helping out – which is great because we leaned on them for specific problems.

The problem with a code contest is that you have little time to learn from your fellow devs – it is pretty much heads down and check-in. In any event, the best part was hanging out great developers like Ian and David and working on a worthwhile project for our country’s vets.  At the end, it was a fun time and my team delivered that the VA can use to springboard into a real application.