Crime Analytics: Visualization of Incident Reports

Key Finding
Prostitution in Seattle occurs primarily along North Aurora Ave.

Project Description

This is an analysis of Seattle crime data from summer 2014, covering the months of June, July and August. There are 32,779 observations in the data set, and each observation corresponds to a single crime report. Each observation consists of 17 variables, including the timing of the report, the timing of the incident, location data and the type of crime reported. In this analysis, I focus on prostitution arrests, specifically crime reports from the data set with the offense types “PROSTITUTION” and “PROSTITUTION LOITERING”.

Getting the data

For the analysis, I downloaded the subset of Seattle crime data from the course GitHub repository as the file “seattle_incidents_summer_2014.csv”. To get an overall view of the “crime scene”, my initial approach was to load all of the data into a Google Fusion table.

Exploratory analysis

Because the Seattle crime data contains latitude and longitude fields, Google Fusion Tables automatically plotted all of the crime reports in the data set on a Google map. The results are in the figure below.



Crime reports are spread fairly uniformly throughout the city, with a few dense clusters, most notably in downtown Seattle. I wanted to know whether particular types of crime occurred more frequently in specific parts of the city. Using the filter feature in Google Fusion Tables, I filtered the data by the “Offense Type” field and observed the report locations on the map. Most offense types followed the overall uniform pattern in the figure above. Prostitution, however, was quite different, as seen in the figure below.

Records of prostitution arrests in Seattle follow a nearly straight line through the middle of the metropolitan area, and the vast majority of those from the summer of 2014 were along a very specific stretch of North Aurora Ave. Aurora Ave. was formerly the main route for automobile travelers heading north out of town, and many motels sprang up to cater to these travelers. Following the construction of Interstate 5, however, this area of Aurora Ave. has fallen into decay.

I was also interested to know the time of day at which prostitution arrests occurred in Seattle. This required some manipulation of the data, for which I used R. I first read the data from the CSV file:

seattle <- read.csv('seattle_incidents_summer_2014.csv')

Then, I converted the character strings from the “Occurred.Date.or.Date.Range.Start” field into R date-time (POSIXct) objects:

times = as.POSIXct(seattle$Occurred.Date.or.Date.Range.Start, format="%m/%d/%Y %I:%M:%S %p")
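A conversion like this returns NA for any string that does not match the format, without raising an error, so it can be worth counting the failures before going further. The following is only a sketch on made-up strings: the sample values, the variable names and the choice of tz = "UTC" (used here just to make the result reproducible) are assumptions and do not come from the data set. Parsing of the AM/PM marker (%p) can also depend on the system locale.

```r
# Hypothetical sample strings laid out like the CSV field (not taken from the data set)
sample_stamps <- c("06/28/2014 10:31:00 PM", "07/04/2014 12:05:00 AM", "not a date")

# Same format string as above; tz = "UTC" is an assumption for reproducibility
parsed <- as.POSIXct(sample_stamps, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")

# Strings that do not match the format become NA instead of raising an error
n_failed <- sum(is.na(parsed))

# Hour of day on a 24-hour clock, as a number
parsed_hours <- as.numeric(format(parsed, "%H"))
```

If n_failed is greater than zero on the real column, the corresponding rows would silently drop out of any later plot by hour or by weekday.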

I separated the times into hour-long bins:

hours<-as.numeric(format(times, "%H"))

and created a new field in the data frame representing the hour in which the crime occurred:

seattle$hours = hours

Finally, I made a subset of the data representing only the prostitution arrests:

prostitution = subset(seattle, seattle$Offense.Type == "PROSTITUTION" | seattle$Offense.Type == "PROSTITUTION LOITERING")

and plotted those by hour using the ggplot2 package:

library(ggplot2)
ggplot(data=prostitution, aes(x=hours)) + geom_bar(colour="black", fill="red") + xlab("Hour") + ylab("Arrests") + scale_x_continuous(breaks=0:23, limits=c(-0.5, 23.5)) + ggtitle('Seattle prostitution arrests by hour')

This resulted in the figure below:


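The counts behind a bar chart like this can also be tabulated directly, which makes it easy to confirm that the filter matched the rows it was meant to. The sketch below runs on a toy data frame whose values are entirely made up; it assumes the offense labels are spelled with a space, as quoted in the project description, and uses %in% as one alternative way of writing the element-wise test.

```r
# Toy rows for illustration only; the column names mirror the data frame built above,
# but every value here is hypothetical.
toy <- data.frame(
  Offense.Type = c("PROSTITUTION", "THEFT-CARPROWL", "PROSTITUTION LOITERING",
                   "PROSTITUTION", "BURGLARY-FORCE-RES"),
  hours = c(20, 14, 21, 20, 3),
  stringsAsFactors = FALSE
)

# %in% compares every row against the set of wanted labels (element-wise)
wanted <- c("PROSTITUTION", "PROSTITUTION LOITERING")
toy_subset <- subset(toy, Offense.Type %in% wanted)

# Fixing the levels at 0..23 keeps hours with zero reports in the table
hour_counts <- table(factor(toy_subset$hours, levels = 0:23))

# Hour with the most reports in the toy subset
peak_hour <- as.integer(names(hour_counts)[which.max(hour_counts)])
```

An empty subset (zero rows) at this stage usually points to a label that does not match the values in the column exactly.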
Apart from a small bump near lunchtime, prostitution arrests are clustered between 7:00 PM and 11:00 PM (19:00 to 23:00) local time, indicating that prostitution in Seattle occurs mostly at night. I was also interested to know during which parts of the week prostitution arrests were most common. Again using R, I converted the date-time objects I had produced earlier into weekdays:

days = weekdays(times)

added these to the data frame:

seattle$days = days

filtered the data frame for prostitution arrests:

prostitution = subset(seattle, seattle$Offense.Type == "PROSTITUTION" | seattle$Offense.Type == "PROSTITUTION LOITERING")

and plotted the results of the subset:

ggplot(data=prostitution, aes(x=days, fill=days)) + geom_bar(colour="black", fill="red") + xlab("Day of week") + ylab("Arrests") + ggtitle('Seattle prostitution arrests by day of week') + theme(plot.title = element_text(lineheight=1, face='bold')) + scale_x_discrete(limits=c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"))

This resulted in the following figure:


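One caveat with this approach is that weekdays() returns day names in the current system locale, so hard-coded English names on the x axis may not match on a non-English setup. A locale-independent variant, sketched here on two made-up timestamps, reads the weekday number from POSIXlt (0 is Sunday) and maps it onto fixed labels. The variable names and sample dates are assumptions for illustration.

```r
# Two hypothetical timestamps: 2014-06-01 was a Sunday, 2014-06-05 a Thursday
demo_times <- as.POSIXct(c("2014-06-01 21:00:00", "2014-06-05 20:00:00"), tz = "UTC")

# Fixed English labels in the order used on the plot's x axis
day_labels <- c("Sunday", "Monday", "Tuesday", "Wednesday",
                "Thursday", "Friday", "Saturday")

# POSIXlt stores the weekday as 0 (Sunday) through 6 (Saturday)
day_index <- as.POSIXlt(demo_times)$wday + 1

# An ordered set of levels keeps the bars in calendar order when plotted
demo_days <- factor(day_labels[day_index], levels = day_labels)
```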
I had expected prostitution arrests to occur mostly on Friday and Saturday nights, but was surprised to find that most prostitution arrests occurred on Thursdays and Sundays. Exploring the reasons for this would be an interesting endeavor, but outside of the scope of this project. It should, however, be noted that prostitution arrests are rarely the result of citizen reports, but more commonly the result of sting operations by the police department. The timing of the prostitution arrests may be more of an indication of when sting operations were underway than it is of when prostitution was actually occurring.
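One rough way to probe the sting-operation explanation would be to count arrests per calendar date: if a handful of dates account for most of the reports, that would be consistent with organized operations, though it would not prove it. The sketch below uses invented dates purely to show the mechanics; none of the values or names come from the data set.

```r
# Hypothetical occurrence dates for illustration only (not from the data set)
demo_dates <- as.Date(c("2014-06-05", "2014-06-05", "2014-06-05",
                        "2014-06-08", "2014-06-08", "2014-07-12"))

# Number of reports on each calendar date, busiest dates first
arrests_per_date <- sort(table(demo_dates), decreasing = TRUE)

# Share of all reports that fall on the two busiest dates
top2_share <- sum(head(arrests_per_date, 2)) / length(demo_dates)
```

A share close to 1 would mean the arrests are concentrated on very few dates; a share close to 2 divided by the number of distinct dates would mean they are spread evenly.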
