The Counted: Initial Analysis Using FSharp and R

(Note: this is post one of three.  Next week is a deeper dive into the data and the following week is an analysis of law enforcement officers killed in the line of duty)

Andrew Oliver hit me up on Twitter with a new dataset that he stumbled across.  The dataset is called “The Counted” and it is an attempt to count all of the deaths at the hand of police in America in 2015.  Apparently, this data is not collected systematically by the US government, which is kind of puzzling.  You can read about and download the data here.  A sample looks like:

image

John asked what we could do with the dataset –> esp when comparing to other variables like socio-economic status.  Step #1 in my mind was to geo-locate the data.  Since this is a .csv, the first-first thing was to remove extra commas and replace them with semi-colons or blank spaces (for example, US Marshals Service, Pennsylvania State Police, Allegheny County Sheriff’s Office became US Marshals Service; Pennsylvania State Police; Allegheny County Sheriff’s Office and Corrections Department, 1400 E 4th Ave became Corrections Department 1400 E 4th Ave)

Adding Geolocations

Drawing on my code that I wrote using Texas A&M’s Geoservice found here, I converted the json type provider script into a function that takes address info and returns a geolocation:

1 let getGeoCoordinates(streetAddress:string, city:string, state:string) = 2 let apiKey = "xxxxx" 3 let stringBuilder = new StringBuilder() 4 stringBuilder.Append("https://geoservices.tamu.edu/Services/Geocode/WebService/GeocoderWebServiceHttpNonParsed_V04_01.aspx") |> ignore 5 stringBuilder.Append("?streetAddress=") |> ignore 6 stringBuilder.Append(streetAddress) |> ignore 7 stringBuilder.Append("&city=") |> ignore 8 stringBuilder.Append(city) |> ignore 9 stringBuilder.Append("&state=") |> ignore 10 stringBuilder.Append(state) |> ignore 11 stringBuilder.Append("&apiKey=") |> ignore 12 stringBuilder.Append(apiKey) |> ignore 13 stringBuilder.Append("&version=4.01") |> ignore 14 stringBuilder.Append("&format=json") |> ignore 15 16 let searchUri = stringBuilder.ToString() 17 let searchResult = GeoLocationServiceContext.Load(searchUri) 18 19 let firstResult = searchResult.OutputGeocodes |> Seq.head 20 firstResult.OutputGeocode.Latitude, firstResult.OutputGeocode.Longitude, firstResult.OutputGeocode.MatchScore

I then loaded in the dataset via the .csv type provider:

1 [<Literal>] 2 let theCountedSample = "..\Data\TheCounted.csv" 3 type TheCountedContext = CsvProvider<theCountedSample> 4 let theCountedData = TheCountedContext.Load(theCountedSample) 5

I then mapped the geofunction to the imported dataset:

1 let theCountedGeoLocated = theCountedData.Rows 2 |> Seq.map(fun r -> r, getGeoCoordinates(r.Streetaddress, r.City, r.State)) 3 |> Seq.toList 4 |> Seq.map(fun (r,(lat,lon,ms)) -> String.Format("{0},{1},{2},{3},{4},{5},{6},{7},{8},{9},{10},{11},{12},{13},{14},{15}", 5 r.Name,r.Age,r.Gender,r.Raceethnicity,r.Month,r.Day,r.Year, r.Streetaddress, r.City,r.State,r.Cause,r.Lawenforcementagency,r.Armed,lat,lon,ms)) 6

And then finally exported the data

1 let baseDirectory = __SOURCE_DIRECTORY__ 2 let baseDirectory' = Directory.GetParent(baseDirectory) 3 let filePath = "Data\TheCountedWithGeo.csv" 4 let fullPath = Path.Combine(baseDirectory'.FullName, filePath) 5 File.WriteAllLines(fullPath,theCountedGeoLocated)

image

The gist is here.  Using the csv and json type providers made the analysis a snap –> a majority code is just building up the string for the service call.  +1 for simplicity.

Analyzing The Results

After adding geolocations to the dataset, I opened R studio and imported the dataset.

1 theCounted <- read.csv("./Data/TheCountedWithGeo.csv") 2 summary(theCounted) 3

 

image

So this is good news that we have good confidence on all of the observations so we don’t have to drop any records (making the counted, un-counted, as it were).

I then googled how to create a US map and put some data points on them and ran across this post.  I copied and pasted the code, changed the variable names, said “there is no way it is this easy” out loud, and hit CTRL+ENTER.

1 library(ggplot2) 2 library(maps) 3 4 all.states <- map_data("state") 5 plot <- ggplot() 6 plot <- plot + geom_polygon(data=all.states, aes(x=long, y=lat, group = group), 7 colour="grey", fill="white" ) 8 plot <- plot + geom_point(data=theCounted, aes(x=lon, y=lat), 9 colour="#FF0040") 10 plot <- plot + guides(size=guide_legend(title="Homicides")) 11 plot

 

image

The gist is here.

Advertisements

3 Responses to The Counted: Initial Analysis Using FSharp and R

  1. Nice example 🙂

    You can probably simplify the code a little if you use the “Http.RequestString” function available in F# Data – it lets you specify the query parameters as an F# list and so you do not need to construct the list with string builder – it would look something like this:

    Http.RequestString(…, query=[“streetAddress”, streetAddress; “city”, city; … “version”, “4.01”])

  2. Pingback: Dew Drop – July 8, 2015 (#2049) | Morning Dew

  3. Pingback: F# Weekly #28, 2015 | Sergey Tihon's Blog

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: