-
Notifications
You must be signed in to change notification settings - Fork 2
/
Copy pathpaper.Rmd
194 lines (145 loc) · 12.3 KB
/
paper.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
---
title: "Data Analysis of Crime Statistics"
subtitle: "Toronto’s Police Service Annual Statistical Report, 2014 - 2019"
author: "Youjing Li"
thanks: "Code and data are available at https://github.com/lilydia/Toronto_Crime_Rates."
date: "`r format(Sys.time(), '%d %B %Y')`"
abstract: "This report examines reported and cleared crime offenses using open-source crime data provided by Toronto Police Services on Open Data Toronto. Using various methods of data analysis, we estimate emerging crime patterns and suggest limitations toward the current collection of data. Our results provide evidence that detailed collection and proper storage of criminal offense data may provide an alternative information source to improve crime investigations."
output:
bookdown::pdf_document2:
toc: FALSE
bibliography: references.bib
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
#remotes::install_github('yihui/tinytex')
#tinytex:::install_prebuilt()
# Load libraries
library(tidyverse)
library(kableExtra)
# Load dataset
clean_data <- readr::read_csv("../../inputs/data/clean_data.csv")
```
# Introduction
Analysis on crime statistics allows a greater understanding of the dynamics of criminal activities and is important to governments, law enforcement agencies, and the residents themselves [@Phillips2011]. Although law enforcement agencies have the primary responsibility to monitor and reduce the rate of criminal activities [@Marzan2017], residents may increase their awareness of local crime activities using open-source data [@Smith2014]. Therefore, data analysis on publicly available data provides an alternative information source to inform residents of local crime activities. This report examines data provided by Toronto’s Police Service Annual Statistical Report and suggest important trends in law enforcement—the number of crimes reported are increasing while the number of crimes cleared are decreasing.
In the following sections, we will first explain the source of the data--what the dataset is, where the dataset came from, and how the dataset was obtained. Second, we will have a discussion around any biases and limitations that exist within the dataset. Lastly, we will compute statistics and generate visualizations using the dataset to show emerging trends.
# Data
This paper uses the R statistical programming language [@citeR] to perform data analysis, and more specifically, the `opendatatoronto` package for data imports [@citeOpenData], the `tidyverse` package for data manipulation [@citeTidyverse], the `kableExtra` package for table formats [@citeKableExtra].
## Data Source
This paper uses the R package `opendatatoronto` [@citeOpenData] to obtain Toronto’s Police Service Annual Statistical Report on reported crimes. The data includes all crime offenses reported to the Toronto Police Service between 2014 to 2019, and has been aggregated by year, category, subtype, and geographic division [@citeData]. For this paper, we will examine the number of crimes reported and the number of crimes cleared using filters like year, category, and subtype.
Description of key data features below:
* `ReportedYear`: Year crime was reported
* `Category`: Crime category
* `Subtype`: Crime category subtype
* `Count_`: Total number of crimes
* `CountCleared`: Total number of crimes identified as cleared
## Bias and Limitation
There are a few considerations when using this dataset to estimate crime rate patterns. First, the data includes all crimes reported to the Toronto Police Service, including, and not limited to, those that may have been deemed unfounded after investigation, those that have no verified location, and those that may have occurred outside the City of Toronto limits [@citeData]. While a detailed record containing all data points can provide a better picture of reported criminal offenses, these unfounded or not valid cases will skew crime rates and present an inaccurate picture of activities that constitute criminal offenses—behaviors that are prohibited by law and considered violate the moral standards of society [@citeLegal]. One suggestion is to provide case specific details to the current dataset so that crime rate patterns can be better studied.
Another bias is that the variable count does not indicate the number of distinct crimes—if an offense fits under multiple categories, it would be included [@citeData]. This again adds a layer of bias because the same case can be counted multiple times and increase the overall crime rate.
Lastly, the geographic division label of No Specified Address also includes occurrences that have no verified locations, which is not a good representation of local criminal occurrences [@citeData]. In this paper, we are involving all data presented in The Annual Statistical Report to provide an estimate of reported cases processed by the Toronto Police Service.
## Data Analysis
The tables and figures presented in this section are all created using R [@citeR]. The `tidyverse` package [@citeTidyverse] is used and more specifically, `ggplot2` [@citePlot] for generating plots, `dplyr` [@citeDp] for hiding system messages, `readr` [@citeRead] for importing the dataset. Additional table formatting support is also provided by `kableExtra` [@citeKableExtra].
In crime statistics, an offense is cleared if one person has been arrested, charged with the commission of the offense, or turned over to the court for prosecution [@citeClear]. Table \@ref(tab:year) outlines the cleared cases in relation to the reported cases to consider the progression of resolved cases as the number of reported cases increase. In comparison, we see that there is a steady increase in cases reported over time while the increase in cases cleared remains the same, causing a 10% drop in cleared crimes from 2014 to 2019.
```{r year, echo = FALSE}
# Suppress summarise info
options(dplyr.summarise.inform = FALSE)
#aggregate data to put into a table:
#total reported cases, total cleared cases, and percentage cleared grouped by different years
crimes_year <-
clean_data %>%
group_by(reported_year) %>%
summarise( Total_Cases = sum(count),
Total_Cleared = sum(count_cleared),
Percent_Cleared = signif(sum(count_cleared)/sum(count),2)
)
#display the aggregated table with specified column names
#package kableExtra used here
crimes_year %>%
knitr::kable(
col.names = c("Year", "Cases Committed", "Cases Cleared", "Percent Cleared"),
align = "cccc",
caption = "Total Number of Crimes Over Time (2014-2019)") %>%
kable_styling(latex_options = "hold_position")
```
To look at the potential causes of these unsolved (uncleared) crimes, we aggregate this target, the percent cleared, by different categories of offenses. Table \@ref(tab:category) shows the average annual cases by 6 main categories of crimes. While majority of the categories indicate a high clearance percentage, crimes that are labeled "Crimes Against Property" shows a low clearance percentage at 30%--only 30% of the property crimes are resolved (cleared).
\newpage
```{r category, echo = FALSE}
# Get the number of years included in the dataset
year <- length(unique(clean_data$reported_year))
# Suppress summarise info
options(dplyr.summarise.inform = FALSE)
#aggregate data to put into a table:
#Average reported cases, average cleared cases, and percentage cleared grouped by different categories
crimes_category <-
clean_data %>%
group_by(category) %>%
summarise( Average_Annual_Cases = round(sum(count)/year, 0),
Average_Annual_Cases_Cleared = round(sum(count_cleared)/year, 0),
Average_Percent_Cleared = signif(sum(count_cleared)/sum(count),2)
)
#display the aggregated table with specified column names
#package kableExtra used here
crimes_category %>%
knitr::kable(
col.names = c("Category", "Cases Committed", "Cases Cleared", "Percent Cleared"),
align = "cccc",
caption = "Average Annual Cases by Crime Type (2014-2019)") %>%
kable_styling(latex_options = "hold_position")
```
Furthermore, Figure \@ref(fig:time) looks at the progression of each crime category over the years. It is shown that property crimes not only have the highest count of cases, but also have the greatest rise in numbers over the years.
```{r time, fig.width=10, fig.height=7, fig.cap="Total Annual Cases Plotted by Crime Categories (2014-2019)", echo = FALSE}
# Plot reported crimes over time (x-axis is years reported, y-axis is number of reported cases)
clean_data %>%
ggplot(aes(x = reported_year, y = count)) +
geom_bar(stat = "identity", fill = "darkorchid4") +
# Do the same graph over the 6 crime categories
facet_wrap(~ category, ncol = 2) +
labs(title = "Total Annual Cases (2014-2019)",
subtitle = "Data plotted by crime types",
y = "Number of Cases Reported",
x = "Year") + theme_bw(base_size = 15) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) #rotate x-axis labels to display vertically
```
A deeper dive into crimes against property is presented in Figure \@ref(fig:report) and Figure \@ref(fig:cleared), where we mapped the number of cases reported and the number of cases cleared according to specific subtypes. As shown in Figure \@ref(fig:report), subtypes like theft under %500 and fraud have a positive slope in crimes reported; however, similar increases are not observed in Figure \@ref(fig:cleared) when we look at the cleared cases. The reported-cleared relationship is clear from these two plots where the crimes reported are continuing to increase in recent years while the number of cleared cases remain the same.
```{r report, fig.width=10, fig.height=5, fig.cap="Total Reported Crime Cases (2014-2019)", echo = FALSE}
# aggregate data for plotting:
# number of reported crimes by year and subtype
crimes_over_time <-
clean_data %>%
filter(category == "Crimes Against Property") %>%
group_by(reported_year, subtype) %>%
summarise(
reported_cases = sum(count)
)
# Plot the aggregation of reported crimes over time (by subtype)
ggplot( data = crimes_over_time, mapping = aes(x = reported_year, y = reported_cases) ) +
geom_line( aes(color=subtype)) +
geom_point( aes(color=subtype)) +
labs(title = "Total Annual Cases by sub-category (2014-2019)",
y = "Number of Cases Reported",
x = "Year") + theme_bw(base_size = 15) +
theme_minimal()
```
```{r cleared, fig.width=10, fig.height=5, fig.cap="Total Cleared Crime Cases (2014-2019)", echo = FALSE, regfloat=TRUE}
# aggregate data for plotting:
# number of cleared crimes by year and subtype
crimes_over_time <-
clean_data %>%
filter(category == "Crimes Against Property") %>%
group_by(reported_year, subtype) %>%
summarise(
reported_cases = sum(count_cleared)
)
# Plot the aggregation of cleared crimes over time (by subtype)
ggplot( data = crimes_over_time, mapping = aes(x = reported_year, y = reported_cases) ) +
geom_line( aes(color=subtype)) +
geom_point( aes(color=subtype)) +
labs(title = "Total Annual Cases Cleared by sub-category (2014-2019)",
y = "Number of Cases Reported",
x = "Year") + theme_bw(base_size = 15) +
theme_minimal()
```
\newpage
Perhaps what this specific dataset tells us is that the Toronto police capacity is not meeting the rise in demand--more and more cases are being reported over the years but the number of cases cleared stay consistent (see Figure \@ref(fig:report) and Figure \@ref(fig:cleared)). Theft under 5000 dollars appears to be the most commonly reported crimes amongst all subtypes within the property crime category and resolution of such crime might not be prioritized in relation to other types of offenses. Moreover, the property crime category contributes more than 62% of the the total crimes reported, which suggests that the drop in percentage cleared that we witnessed in Table \@ref(tab:year) is mainly caused by unresolved crimes reported under this category. However, limitations exist such that we cannot detect distinct crimes. As discussed in Section 2.2 Bias and Limitation, a single crime can be reported multiple times as long as they fit the category description. The great number of cases under property crimes might be caused due to the broad inclusion of activities [@citeLegal]. Furthermore, the inclusion of unfounded cases might also increase the number of property crimes that are of little significance (eg. Theft under $5000).
\newpage
# References