Restaurant Classification Via the Yellow Pages API Using F#

As part of the restaurant analysis I did for Open Data Day, I built a crude classifier to identify Chinese restaurants.  The classifier looked at the name of the establishment, and if certain keywords were in the name, it was tagged as a Chinese restaurant.

    member public x.IsEstablishmentAChineseRestraurant (establishmentName:string) =
        // Upper-case the name so the keyword comparison is case-insensitive.
        let upperCaseEstablishmentName = establishmentName.ToUpper()
        // Count how many words in the name match a known keyword.
        let numberOfMatchedWords =
            upperCaseEstablishmentName.Split(' ')
            |> Seq.map (fun word ->
                match word with
                | "ASIA" -> 1
                | "ASIAN" -> 1
                | "CHINA" -> 1
                | "CHINESE" -> 1
                | "PANDA" -> 1
                | "PEKING" -> 1
                | "WOK" -> 1
                | _ -> 0)
            |> Seq.sum
        // Any match at all tags the establishment as a Chinese restaurant.
        match numberOfMatchedWords with
        | 0 -> false
        | _ -> true
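
As a quick illustration of how this behaves, here is a standalone sketch of the same keyword test. The function name isChineseByName and the second restaurant name are made up for the example. It shows why the approach is crude: a name containing no keyword is never tagged, whatever the restaurant actually serves.

```fsharp
// Keywords taken from the method above.
let chineseKeywords =
    set [ "ASIA"; "ASIAN"; "CHINA"; "CHINESE"; "PANDA"; "PEKING"; "WOK" ]

// Hypothetical standalone equivalent: true if any word in the name is a keyword.
let isChineseByName (establishmentName: string) =
    establishmentName.ToUpper().Split(' ')
    |> Array.exists (fun word -> chineseKeywords.Contains word)

printfn "%b" (isChineseByName "Jumbo China #5")   // true
printfn "%b" (isChineseByName "Hunan Garden")     // false: no keyword in the name
```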

Although this worked well enough for the analysis, I wanted to see whether there was a more precise approach.  To that end, I thought of the Yellow Pages: they classify restaurants into categories, so assuming the restaurant is listed there, a lookup is a better way to determine its category than a name search.

The first thing I did was head over to the Yellow Pages (YP.com) website and sure enough, they have an API and a developers program.  I signed up and had an API key within a couple of minutes.

Next, I tried searching for a restaurant in the browser.  I picked the first restaurant I came across in the dataset, Jumbo China #5, and created a request URI based on their API like so:

http://pubapi.atti.com/search-api/search/devapi/search?term=Jumbo+China+5&searchloc=6108+Falls+Of+Neuse+Rd+27609&format=json&key=XXXXXXXXXX

When I first plugged the request into the browser, with the name exactly as written, I got this:

[image: the API response reporting an invalid key]

After screwing around with the code for about ten minutes thinking it was my API key ("Invalid Key" would lead you to believe that, no?), Mike Thomas came over and told me that the URL encoding was messing with my request, specifically the '#' in Jumbo China #5.  When I removed the # symbol, I got JSON back:

[image: the JSON response in the browser]
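
The "Invalid Key" message is most likely explained by the fact that '#' starts the URI fragment, so a browser never sends anything after it (including the key parameter) to the server. The sketch below is not part of the original module; it illustrates the split with System.Uri and shows Uri.EscapeDataString as one way to percent-encode the name instead of stripping the character. The host example.com and the key value abc are placeholders.

```fsharp
open System

// Placeholder URL: everything after '#' is parsed as the fragment, not the query.
let raw = Uri("http://example.com/search?term=Jumbo+China+#5&key=abc")
printfn "query:    %s" raw.Query     // ?term=Jumbo+China+
printfn "fragment: %s" raw.Fragment  // #5&key=abc

// Percent-encoding the value keeps the '#' as data (%23) inside the query string.
let encodedTerm = Uri.EscapeDataString("Jumbo China #5")
printfn "encoded:  %s" encodedTerm   // Jumbo%20China%20%235
```

Note that encoding the name this way sends the '#' to the API rather than dropping it, so whether the search still finds the listing would need to be checked against the real service.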

I threw the JSON into Json2CSharp, and the results looked great:

[image: the classes generated by Json2CSharp]

I then took this URL and tried to load it into an F# type provider, but I couldn’t understand why I was getting a red squiggly line of disapprobation (with both the JSON and XML providers):

[image: the type provider declaration underlined in red]

 

So I pulled out Fiddler and saw that I was getting a 400.  Digging into the response value, I found that “User Agent” was a required field.

[image: the 400 response in Fiddler]

The problem was then compounded because the FSharp.Data JSON type provider does not allow you to enter a User Agent into the constructor.  I headed over to Stack Overflow, where Tomas Petricek was kind enough to answer the question: basically, you have to use the FSharp Http class to make the request (to which you can add the user agent) and then parse the response via the JsonProvider using the “Parse” method rather than the “Load” method.  So I spun up the method like so:

[image: the method that makes the request and parses the response]

This gave me the results that I wanted.  I then created a couple of methods to clean up any characters that might screw up the URL encoding, added some argument validation, and I had a pretty good module to consume the YP.com listings:

    namespace ChickenSoftware.RestaurantClassifier

    open System
    open FSharp.Data
    open FSharp.Net

    // YP.txt is the local sample document the type provider infers its types from.
    type ypProvider = JsonProvider< @"YP.txt">

    type RestaurantCatagoryRepository() =
        member this.GetCatagories(restaurantName: string, restaurantAddress: string) =
            if(String.IsNullOrEmpty(restaurantName)) then
                failwith("restaurantName cannot be null or empty.")
            if(String.IsNullOrEmpty(restaurantAddress)) then
                failwith("restaurantAddress cannot be null or empty.")
            let cleanedName = this.CleanName(restaurantName)
            let cleanedAddress = this.CleanAddress(restaurantAddress)
            let uri = "http://pubapi.atti.com/search-api/search/devapi/search?term=" + cleanedName + "&searchloc=" + cleanedAddress + "&format=json&key=XXXXXX"
            // The API requires a User-Agent header, so request the text directly and parse it.
            let response = FSharp.Net.Http.Request(uri, headers=["user-agent", "None"])
            let ypResult = ypProvider.Parse(response)
            // An unrecognized restaurant comes back as differently shaped JSON; fall back to an empty string.
            try
                ypResult.SearchResult.SearchListings.SearchListing.[0].Categories
            with
                | ex -> String.Empty

        // Strip '#' (it breaks the request) and join words with '+'.
        member this.CleanName(name: string) =
            name.Replace("#","").Replace(" ","+")

        member this.CleanAddress(address: string)=
            address.Replace("#","").Replace(" ","+")

        // Substring test against the category string returned by GetCatagories.
        member this.IsCatagoryInCatagories(catagories: string, catagory: string) =
            if(String.IsNullOrEmpty(catagories)) then false
            else if (String.IsNullOrEmpty(catagory)) then false
            else catagories.Contains(catagory)

        member this.IsRestaurantInCatagory(restaurantName: string, restaurantAddress: string, restaurantCatagory: string) =
            if(String.IsNullOrEmpty(restaurantName)) then
                failwith("restaurantName cannot be null or empty.")
            if(String.IsNullOrEmpty(restaurantAddress)) then
                failwith("restaurantAddress cannot be null or empty.")
            if(String.IsNullOrEmpty(restaurantCatagory)) then
                failwith("restaurantCatagory cannot be null or empty.")

            // Wait one second before each request.
            System.Threading.Thread.Sleep(new System.TimeSpan(0,0,1))
            let catagories = this.GetCatagories(restaurantName, restaurantAddress)
            if(String.IsNullOrEmpty(catagories)) then false
            else this.IsCatagoryInCatagories(catagories,restaurantCatagory)

        member this.IsRestaurantInCatagoryAsync(restaurantName: string, restaurantAddress: string, restaurantCatagory: string) =
            async {
                if(String.IsNullOrEmpty(restaurantName)) then
                    failwith("restaurantName cannot be null or empty.")
                if(String.IsNullOrEmpty(restaurantAddress)) then
                    failwith("restaurantAddress cannot be null or empty.")
                if(String.IsNullOrEmpty(restaurantCatagory)) then
                    failwith("restaurantCatagory cannot be null or empty.")

                let catagories = this.GetCatagories(restaurantName, restaurantAddress)
                if(String.IsNullOrEmpty(catagories)) then return false
                else return this.IsCatagoryInCatagories(catagories,restaurantCatagory)
            }

The associated unit and integration tests that I made in building this module look like this:

    using System;
    using Microsoft.VisualStudio.TestTools.UnitTesting;
    using ChickenSoftware.RestaurantClassifier;

    [TestClass]
    public class CatagoryBuilderTests
    {

        [TestMethod]
        public void CleanName_ReturnsExpectedValue()
        {
            RestaurantCatagoryRepository repository = new RestaurantCatagoryRepository();
            String restaurantName = "Jumbo China #5";

            String expected = "Jumbo+China+5";
            String actual = repository.CleanName(restaurantName);
            Assert.AreEqual(expected, actual);
        }

        [TestMethod]
        public void CleanAddress_ReturnsExpectedValue()
        {
            RestaurantCatagoryRepository repository = new RestaurantCatagoryRepository();
            String restaurantAddress = "6108 Falls Of Neuse Rd 27609";

            String expected = "6108+Falls+Of+Neuse+Rd+27609";
            String actual = repository.CleanAddress(restaurantAddress);
            Assert.AreEqual(expected, actual);
        }

        // Integration test: calls the live YP.com API.
        [TestMethod]
        public void GetCatagories_ReturnsExpectedValue()
        {
            string restaurantName = "Jumbo China #5";
            String restaurantAddress = "6108 Falls Of Neuse Rd 27609";

            RestaurantCatagoryRepository repository = new RestaurantCatagoryRepository();
            var result = repository.GetCatagories(restaurantName, restaurantAddress);
            Assert.IsNotNull(result);
        }

        [TestMethod]
        public void CatagoryIsContainedInCatagoriesUsingValidTrueData_ReturnsExpectedValue()
        {
            RestaurantCatagoryRepository repository = new RestaurantCatagoryRepository();

            String catagories = "Chinese Restaurants|Restaurants|";
            String catagory = "Chinese";

            Boolean expected = true;
            Boolean actual = repository.IsCatagoryInCatagories(catagories, catagory);

            Assert.AreEqual(expected, actual);
        }

        [TestMethod]
        public void CatagoryIsContainedInCatagoriesUsingValidFalseData_ReturnsExpectedValue()
        {
            RestaurantCatagoryRepository repository = new RestaurantCatagoryRepository();

            String catagories = "Chinese Restaurants|Restaurants|";
            String catagory = "Seafood";

            Boolean expected = false;
            Boolean actual = repository.IsCatagoryInCatagories(catagories, catagory);

            Assert.AreEqual(expected, actual);
        }

        // Integration test: calls the live YP.com API.
        [TestMethod]
        public void IsJumboChinaAChineseRestaurant_ReturnsTrue()
        {
            RestaurantCatagoryRepository repository = new RestaurantCatagoryRepository();

            string restaurantName = "Jumbo China #5";
            String restaurantAddress = "6108 Falls Of Neuse Rd 27609";
            String restaurantCatagory = "Chinese";

            Boolean expected = true;
            Boolean actual = repository.IsRestaurantInCatagory(restaurantName, restaurantAddress, restaurantCatagory);

            Assert.AreEqual(expected, actual);
        }

        // Integration test: calls the live YP.com API.
        [TestMethod]
        public void IsJumboChinaAnItalianRestaurant_ReturnsFalse()
        {
            RestaurantCatagoryRepository repository = new RestaurantCatagoryRepository();

            string restaurantName = "Jumbo China #5";
            String restaurantAddress = "6108 Falls Of Neuse Rd 27609";
            String restaurantCatagory = "Italian";

            Boolean expected = false;
            Boolean actual = repository.IsRestaurantInCatagory(restaurantName, restaurantAddress, restaurantCatagory);

            Assert.AreEqual(expected, actual);
        }

        // Integration test: calls the live YP.com API with a name it will not recognize.
        [TestMethod]
        public void IsUnknownAnItalianRestaurant_ReturnsFalse()
        {
            RestaurantCatagoryRepository repository = new RestaurantCatagoryRepository();

            string restaurantName = "Some Unknown Restaurant";
            String restaurantAddress = "Some Address";
            String restaurantCatagory = "Italian";

            Boolean expected = false;
            Boolean actual = repository.IsRestaurantInCatagory(restaurantName, restaurantAddress, restaurantCatagory);

            Assert.AreEqual(expected, actual);
        }

        [TestMethod]
        public void CatagoryIsContainedInCatagoriesUsingEmptyCatagory_ReturnsExpectedValue()
        {
            RestaurantCatagoryRepository repository = new RestaurantCatagoryRepository();

            String catagories = "Chinese Restaurants|Restaurants|";
            String catagory = String.Empty;

            Boolean expected = false;
            Boolean actual = repository.IsCatagoryInCatagories(catagories, catagory);

            Assert.AreEqual(expected, actual);
        }
    }

The hardest test to get to run green was the negative test, passing in a restaurant name that is not recognized:

    [TestMethod]
    public void IsUnknownAnItalianRestaurant_ReturnsFalse()
    {
        RestaurantCatagoryRepository repository = new RestaurantCatagoryRepository();

        string restaurantName = "Some Unknown Restaurant";
        String restaurantAddress = "Some Address";
        String restaurantCatagory = "Italian";

        Boolean expected = false;
        Boolean actual = repository.IsRestaurantInCatagory(restaurantName, restaurantAddress, restaurantCatagory);

        Assert.AreEqual(expected, actual);
    }

To code around the fact that a different set of JSON came back while the original code expects a specific structure, I finally resorted to a try…catch:

    try
        ypResult.SearchResult.SearchListings.SearchListing.[0].Categories
    with
        | ex -> String.Empty
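
The exception handling above can at least be confined to one small helper, so that the rest of the code deals with an option value instead of an exception. The following is a minimal sketch under stated assumptions, not the module's actual code: tryGet and firstCategories are hypothetical names, and a plain string array stands in for the listing array on the parsed response.

```fsharp
// Hypothetical helper: run a lookup that may throw and turn any failure into None.
let tryGet (lookup: unit -> 'T) : 'T option =
    try Some (lookup ())
    with _ -> None

// Stand-in for the listing array on the parsed response; empty when nothing matched.
let firstCategories (listings: string[]) =
    tryGet (fun () -> listings.[0])
    |> Option.defaultValue System.String.Empty

printfn "%s" (firstCategories [| "Chinese Restaurants|Restaurants|" |])
printfn "[%s]" (firstCategories [||])   // prints [] because the lookup failed
```

This still relies on try…with underneath; it only keeps the catch in a single place.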

I feel dirty, but I don’t know how else to get around it.  In any event, I then coded up a module that pulled the list of restaurants from Azure and put them through the classifier:

    namespace ChickenSoftware.RestaurantClassifier

    open FSharp.Data
    open System.Linq
    open System.Configuration
    open Microsoft.FSharp.Linq
    open Microsoft.FSharp.Data.TypeProviders

    type internal SqlConnection = SqlEntityConnection<ConnectionStringName="azureData">

    type public RestaurantBuilder () =

        let connectionString = ConfigurationManager.ConnectionStrings.["azureData"].ConnectionString

        // Returns a (name, "address zip") pair for every restaurant in the database.
        member public this.GetRestaurants () =
            SqlConnection.GetDataContext(connectionString).Restaurants
                |> Seq.map(fun x -> x.EstablishmentName, x.EstablishmentAddress + " " + x.EstablishmnetZipCode)
                |> Seq.toArray

        // Keeps only the restaurants that the Yellow Pages lists under "Chinese".
        member public this.GetChineseRestaurants () =
            let catagoryRepository = new RestaurantCatagoryRepository()
            let catagory = "Chinese"
            this.GetRestaurants()
                |> Seq.filter(fun (name, address) -> catagoryRepository.IsRestaurantInCatagory(name, address,catagory))
                |> Seq.toList

This code is almost identical to the code I posted two weeks ago.  Sure enough, when I threw my integration tests at the functions, Fiddler showed this:

[image: the requests captured in Fiddler]

I was getting responses.  I ran into a problem on the 50th request, though:

[image: the timeout on the 50th request]

To get around this occasional timeout issue, I threw in a one-second delay between each request, which seemed to solve the problem:

    System.Threading.Thread.Sleep(new System.TimeSpan(0,0,1))
    let catagories = this.GetCatagories(restaurantName, restaurantAddress)
    if(String.IsNullOrEmpty(catagories)) then false
    else this.IsCatagoryInCatagories(catagories,restaurantCatagory)

However, this then introduced a new problem.  There are 4,000 or so restaurants, so at one second apiece that is over 66 minutes of running.  Not good.  Next week, I hope to add some parallelism to speed things up…
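
For a rough idea of what that could look like, here is a sketch of batched parallelism, not the follow-up implementation. It assumes a hypothetical stub, isInCategoryAsync, in place of the real API call so that it runs offline, and the batch size and pause are arbitrary parameters. Requests inside a batch run in parallel, and a pause between batches keeps the overall request rate bounded.

```fsharp
// Hypothetical stand-in for the real lookup, so the sketch runs without the API.
let isInCategoryAsync (name: string, _address: string) (category: string) =
    async { return name.ToUpper().Contains(category.ToUpper()) }

// Classify in batches: each batch runs in parallel, then the thread pauses.
let filterInBatches batchSize (pauseMs: int) category restaurants =
    restaurants
    |> Seq.chunkBySize batchSize
    |> Seq.collect (fun batch ->
        let flags =
            batch
            |> Array.map (fun restaurant -> isInCategoryAsync restaurant category)
            |> Async.Parallel
            |> Async.RunSynchronously
        System.Threading.Thread.Sleep(pauseMs)
        // Keep only the restaurants whose flag came back true.
        Array.zip batch flags
        |> Array.filter snd
        |> Array.map fst)
    |> Seq.toList

let sample =
    [ ("Jumbo China #5", "a"); ("Luigi Italian", "b"); ("China Wok", "c") ]
printfn "%A" (filterInBatches 2 0 "China" sample)
```

With a batch size of 10 and a one-second pause, 4,000 lookups would spend roughly 400 seconds pausing instead of 4,000, but whether the API tolerates that request rate without the timeouts returning is untested here.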

 

 

 

 

 
