Data Science For Social Good, Spring 2020
Students: Andy Jeong, Minyoung Na, Stella Blue Ponzulogo, Justin Rooney (& Zoe Bowhay)
We received datasets from The Bail Project for four states: Arkansas, Louisiana, Michigan, and New York, detailing crimes and associated bail bond amounts. Each state has its own legislative charge definitions, so we went through a manual verification and mapping process in which we mapped state charge descriptions to national Bureau of Justice Statistics (BJS) charge codes. The bubbles in the map below are proportional to the size of each dataset.
1. What is bail in the United States? [Wikipedia]
Bail is a set of pre-trial restrictions that are imposed on a suspect to ensure that they comply with the judicial process. Suspects can be released from custody before their hearing if they make bail, that is, pay a sum of money set by a judge. The amount of bail is generally based on the severity of the crime(s), but the process is convoluted and arbitrary. The bail money is returned if the suspect appears at trial.
2. What is wrong with the bail system?
The current bail system was shaped under the Bail Reform Act of 1966, when the political climate favored the detention of all defendants. The public has viewed the bail system negatively because it can exploit minorities and the poor. For instance, some people cannot afford bail even for the most minor offenses, which can lead to unreasonably long detention. It is important to note that these detentions are pre-trial, meaning the suspect has not yet been found guilty.
3. What is The Bail Project's mission?
The issues with the bail system have spurred a number of movements seeking change. The Bail Project pays bail for strategically chosen groups of people who are awaiting trial for minor crimes, effectively removing them from detention and restoring the presumption of innocence. The Bail Project has already freed thousands of people in the United States.
4. How can one categorize crimes?
One of our initial goals was to build an ontology from these datasets. To that end, we categorized the disparate state-level penal codes into a national scheme so that we could compare states more consistently. The Bureau of Justice Statistics (BJS) has created a framework for categorizing crime descriptions with the three levels listed below. Throughout the ensuing analyses, we convert state-level crime descriptions into the standardized BJS codes.
Broad Category:
Crimes are binned at the highest level into Violent, Property, Drug, Public Order, or Other categories.
BJS Category:
Additional categorization granularity is provided. For example, Violent can be subdivided into Murder, Kidnapping, Assault, etc.
BJS Description:
The finest level of granularity is provided. For instance, Assault can be subdivided into Aggravated Assault, Simple Assault on a Public Safety Officer, etc.
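The following minimal Python sketch makes the three-level scheme concrete. It shows how a hand-built mapping from state-level descriptions to BJS codes could be stored and queried.

- The `BJSCode` tuple, the `STATE_TO_BJS` table, and the `normalize` helper are hypothetical names used for illustration.
- The two state-level keys are invented examples.
- Only the BJS labels come from the hierarchy above.

```python
from typing import NamedTuple, Optional


class BJSCode(NamedTuple):
    broad: str        # e.g. "Violent"
    category: str     # e.g. "Assault"
    description: str  # e.g. "Aggravated Assault"


# Hypothetical excerpt of a hand-built mapping. The state-level keys are
# illustrative placeholders, not entries copied from the real datasets.
STATE_TO_BJS = {
    "AGG ASSAULT": BJSCode("Violent", "Assault", "Aggravated Assault"),
    "ASSAULT ON POLICE OFFICER": BJSCode(
        "Violent", "Assault", "Simple Assault on a Public Safety Officer"
    ),
}


def normalize(description: str) -> str:
    """Upper-case and collapse whitespace so trivial data-entry variants share a key."""
    return " ".join(description.upper().split())


def to_bjs(state_description: str) -> Optional[BJSCode]:
    """Look up the BJS code for a state-level description; None if it is unmapped."""
    return STATE_TO_BJS.get(normalize(state_description))


code = to_bjs("  agg   assault ")
print(code.broad, "/", code.category, "/", code.description)
```

Storing all three levels in one record lets an analysis roll a charge up to its BJS Category or Broad Category without a second lookup. Unmapped descriptions return `None`, so they can be flagged for manual review.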
Datasets
We were provided with web-crawled datasets from four regions:
(1) Washington, Arkansas (AR)
(2) East Baton Rouge, Louisiana (LA)
(3) Wayne, Michigan (MI)
(4) New York City, New York (NY)
Each region had a different demographic composition, illustrated below:
The datasets contained bail bond amounts tied to descriptions of the crimes charged in each region. Some datasets also contained information on race, sex, and time held in jail.
For each region, we started by analyzing demographic composition, focusing on suspect race and sex. We also analyzed the distribution of crime types, which were categorized into BJS Broad Categories.
Arkansas
- For AR, WHITE and MALE were dominant.
After pre-processing, percentage breakdown by BJS Broad Category:
Louisiana
- For LA, BLACK and MALE were dominant.
After pre-processing, percentage breakdown by BJS Broad Category:
Michigan
- Michigan did not provide demographic information about charged cases.
After pre-processing, percentage breakdown by BJS Broad Category:
New York
- For NY, BLACK and MALE were dominant.
After pre-processing, percentage breakdown by BJS Broad Category:
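The percentage breakdowns above are simple shares of cases per BJS Broad Category. Below is a minimal sketch, assuming each pre-processed case already carries its Broad Category label. The `percentage_breakdown` name and the toy labels are hypothetical and are not drawn from the datasets.

```python
from collections import Counter


def percentage_breakdown(labels) -> dict:
    """Return {label: percent of all cases}, largest share first, rounded to 2 decimals."""
    counts = Counter(labels)
    total = sum(counts.values())
    # An empty input yields an empty dict, so there is no division by zero.
    return {label: round(100 * n / total, 2) for label, n in counts.most_common()}


toy_labels = ["Violent", "Drug", "Drug", "Public Order", "Property", "Drug", "Other", "Violent"]
print(percentage_breakdown(toy_labels))
```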
Assumptions/Caveats throughout our data analysis
-
The data are not exhaustive and were collected only over a limited period of time.
-
For NY, we observed that each detained person has only one row of data, with all charges concatenated. If the detained person was charged with more than one crime, the bail bond amounts appeared as single dollars (multiple $1 entries). From this observation, we treated any amount under $10 as a "placeholder" dollar bail. Because all charges are combined into one row per person, for this analysis we extracted only the first charge, assuming that the first is the most significant and most heavily weighted.
-
"Placeholder" bail bonds for AR, LA, and MI are the cases with a bail bond amount of $0. Entering $0 bail for the additional charges of a single detained person is observed across these state datasets, and we assume it is a common practice in other states as well.
-
To classify state-level charge descriptions (which were prone to data-entry errors) into national Bureau of Justice Statistics (BJS) codes, we took the list of unique state-level charge descriptions and mapped each one to a BJS code by hand. This mapping reflects our group's judgment.
-
Units: bail bond amounts in USD ($); time spent in jail in days.
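The placeholder rules in the caveats above can be written as two small predicates. This is a sketch under stated assumptions:

- The function names are hypothetical.
- The `;` separator for concatenated NY charges is an assumed format, since the actual delimiter in the crawled data is not specified here.

```python
PLACEHOLDER_NY_THRESHOLD = 10  # NY: any amount under $10 is treated as dollar bail


def is_placeholder(state: str, amount: float) -> bool:
    """Flag 'placeholder' bail bonds following the caveats above.

    AR, LA, and MI record placeholders as $0; NY uses single dollars,
    so any amount under $10 is flagged there.
    """
    if state == "NY":
        return amount < PLACEHOLDER_NY_THRESHOLD
    return amount == 0


def first_charge(concatenated: str, sep: str = ";") -> str:
    """Keep only the first charge of a concatenated NY charge field.

    The separator is a hypothetical choice; the first charge is assumed to be
    the most significant, as in the caveats above.
    """
    return concatenated.split(sep)[0].strip()
```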
Pre-processing Stage
-
Arkansas: (1) dropped duplicates (2) computed time in jail by subtracting booking date from release date
-
Louisiana: (1) dropped duplicates (2) computed time in jail by subtracting booking date from release date based on status
-
Michigan: (1) dropped duplicates (2) computed the expected time in jail from sentence length
-
New York: (1) dropped duplicates (2) removed cases that have 'released', 'remanded', 'sentenced', or 'unknown' in the bond information column
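Below is a minimal sketch of the shared pre-processing steps using only the Python standard library. With pandas, `DataFrame.drop_duplicates` plays the same role as the helper below.

- The row layout and function names are hypothetical.
- The Louisiana status logic and the Michigan sentence-length conversion are not reproduced, because their exact rules are not described here.

```python
from datetime import date

# Hypothetical row layout: (person_id, charge, bail_usd, booking_date, release_date)


def drop_duplicates(rows: list) -> list:
    """Remove exact duplicate rows, keeping the first occurrence and the original order."""
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique


def days_in_jail(booking: date, release: date) -> int:
    """Time in jail as release date minus booking date, in whole days."""
    return (release - booking).days


# NY cases whose bond information mentions any of these words are removed.
EXCLUDED_BOND_INFO = ("released", "remanded", "sentenced", "unknown")


def keep_ny_case(bond_info: str) -> bool:
    """True if the bond information contains none of the excluded words (case-insensitive)."""
    text = bond_info.lower()
    return not any(word in text for word in EXCLUDED_BOND_INFO)
```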
Arkansas (Total: 49,960)
Race: 68.2% White, 11.5% Hispanic, 1.85% Other, 0.37% Indian
Gender: 16.9% Female, 0.1% Other
Louisiana (Total: 280,240)
Race: 17.8% White, 2.01% Hispanic, 0.82% Other
Gender: 85.7% Male, 0.03% Other
Michigan
Race and gender: no demographic data provided
New York (Total: 5,182)
Race: 27.8% Other, 11.2% White, 0.23% Indian
For each BJS Broad Category, we further grouped cases by the following criteria:
- (1) What are the raw distributions before pre-processing?
- (2) Is this bail bond a "placeholder" or actually something significant?
- (3) What are the distributions for the cases with bail bonds in the (0, 1000] and (1000, Inf) groups?
- (4) What are the distributions for the cases with time spent in jail in the (0, 20] and (20, Inf) groups?
- (5) What are the distributions for the cases with bail bonds in the (0, 1000] and (1000, Inf) groups AND time spent in jail in the (0, 20] and (20, Inf) groups?
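The bail and jail-time groupings above amount to a 2x2 cross-tabulation. Below is a minimal sketch, assuming placeholder bonds have already been removed and each case is reduced to a hypothetical `(bail_usd, days_in_jail)` pair. Note that a case with 0 days in jail would fall into the lower jail bin here, whereas the interval (0, 20] excludes it.

```python
from collections import Counter

BAIL_THRESHOLD_USD = 1000
JAIL_THRESHOLD_DAYS = 20


def bail_bin(amount: float) -> str:
    """Bail at or below $1,000 goes to the lower bin; anything above goes to the upper bin."""
    return "<=1000" if amount <= BAIL_THRESHOLD_USD else ">1000"


def jail_bin(days: float) -> str:
    """Jail time at or below 20 days goes to the lower bin; anything above goes to the upper bin."""
    return "<=20" if days <= JAIL_THRESHOLD_DAYS else ">20"


def crosstab(cases) -> Counter:
    """Count (bail bin, jail bin) cells for an iterable of (bail_usd, days_in_jail) pairs."""
    return Counter((bail_bin(bail), jail_bin(days)) for bail, days in cases)


toy_cases = [(500, 3), (1000, 20), (1500, 21), (2500, 45), (800, 30)]
for cell, n in sorted(crosstab(toy_cases).items()):
    print(cell, n)
```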

For the following analyses, we provide visualizations for each of the five BJS Broad Categories, as described in the [Background] section.
- Two demographic distribution heat maps per category show which race and sex profiles (for each state) are most commonly charged with crimes in that category
- Four bail bond amount comparison charts display the average bail for each subgroup (race, sex)
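The average-bail charts reduce to a group-by mean. Below is a minimal sketch, assuming each case is a dict with hypothetical `race`, `sex`, and `bail` fields. It also assumes placeholder bonds have been filtered out first; otherwise $0 and $1 entries would drag the averages down.

```python
from collections import defaultdict
from statistics import mean


def average_bail_by(cases, key: str) -> dict:
    """Mean bail per subgroup; `key` names the grouping field, e.g. "race" or "sex"."""
    groups = defaultdict(list)
    for case in cases:
        groups[case[key]].append(case["bail"])
    return {group: mean(amounts) for group, amounts in groups.items()}


toy_cases = [
    {"race": "White", "sex": "Male", "bail": 1000},
    {"race": "White", "sex": "Female", "bail": 3000},
    {"race": "Black", "sex": "Male", "bail": 2000},
]
print(average_bail_by(toy_cases, "race"))
print(average_bail_by(toy_cases, "sex"))
```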
Public Order
In this category, the demographic distributions (race and gender) are shown below.
The graphs below show a clear difference between states in which demographic groups are charged. In Arkansas, white defendants account for a much larger share of Public Order charges than people of color, while in New York, and especially in Louisiana, the opposite is true.
Those charged are mostly male, with little to no data available for females. There was no data available for people who identified as ‘other’ or as gender-nonconforming.
In Arkansas, cases were largely concentrated in the groups with bail greater than $1,000, and those charged spent at least 20 days in jail.
In Louisiana, the data were split across all four groups. The largest group had bail greater than $1,000 and 20 or more days spent in jail; the smallest had bail of $1,000 or less and more than 20 days spent in jail.
In Michigan, most of the data falls into two groups: bail greater than $1,000 with 20 or fewer days in jail, or bail of $1,000 or less with 20 or fewer days in jail.
In New York, nearly all of the data falls in the group with bail greater than $1,000, with a single outlier at $1,000 or less.
Drug
The graphs below show that the demographics of those charged with drug-related crimes closely resemble the Public Order heat maps above. Arkansas again charges a larger share of white people than Louisiana and New York, where most of the people processed for drug-related crimes are people of color. Note that for the New York data we received, we assume that the race category “Other” refers to people of Hispanic heritage.
As with the Public Order graphs, the majority of those charged are male, and no data were collected for individuals identifying as “other.”
In Arkansas, the bail bond data was divided between two groups: the larger had bail greater than $1,000 and 20 or more days spent in jail, and the smaller had bail of $1,000 or less and 20 or fewer days spent in jail.
In Louisiana, the data spans all four groups. Most cases involved 20 or fewer days in jail, and more cases had bail greater than $1,000 than bail of $1,000 or less.
In Michigan, a disproportionately large share of cases involved 20 or fewer days in jail, and more cases had bail of $1,000 or less than bail greater than $1,000.
In New York, the majority of cases had bail greater than $1,000.
Property
The graphs below show that the demographics of those charged with property-related crimes closely resemble the drug heat maps above. Arkansas again charges a larger share of white people than Louisiana and New York, where most of the people processed for property-related crimes are people of color. As before, for the New York data we assume that the race category “Other” refers to people of Hispanic heritage. One difference is that in Louisiana slightly more white people, and in New York slightly more Black people, are charged with property-related crimes.
As with the Public Order graphs, the majority of those charged are male, and no data were collected for individuals identifying as “other.” Arkansas charged more females than the other states.
In Arkansas, only a small share of cases had bail of $1,000 or less and 20 or fewer days in jail; most cases had bail greater than $1,000.
In Louisiana, most of the bail amount data was recorded in the categories of just spending time in jail or just paying bail, with most people paying more than $1,000 for bail and spending more than 20 days in jail.
In Michigan, most cases involved 20 or fewer days in jail, with a similar number of cases having bail greater than $1,000 and bail of $1,000 or less.
In New York, most cases had bail greater than $1,000, with a small share at $1,000 or less.
Violent
The graphs below show that the demographics for violent crimes resemble the property and drug heat maps above. Arkansas charges a larger share of white people than Louisiana and New York, where most of the people processed for violent crimes are people of color. Arkansas has slightly more charges associated with White people, and New York has slightly more charges associated with “Other” races.
As with the Property graphs, the majority of those charged with violent crimes are male, and no data were collected for individuals identifying as “other.” Compared with the other Broad Categories, Arkansas charged more males and Louisiana charged more females.
In Arkansas, the data fall into only two groups: bail of $1,000 or less with 20 or fewer days in jail, or bail greater than $1,000 with no jail time.
In Louisiana, the majority of cases had bail greater than $1,000 and 20 or fewer days in jail.
In Michigan, the majority of cases had bail greater than $1,000 and 20 or fewer days in jail.
In New York, the majority of cases had bail greater than $1,000.
In summary, New York is the state where bail most consistently exceeds $1,000: in each of the four Broad Categories above, the majority of its cases had bail greater than $1,000.
While organizing the NYC dataset, we found an unusually high number of bail bond amounts of $1 and other single-digit values. This turned out to be a practice limited to NYC, known as "dollar bail." Dollar bail is used "when a defendant has 2 cases open simultaneously where one is more serious." In such cases, "defense attorneys will often ask that a $1 bail be set on the more minor case" (dollar bail brigade). Despite this extremely low amount, between August 2018 and December 2018 "the city has locked up 149 detainees on just $1 bail, according to Correction Department records" (news reference). This happens because defendants are unable to pay the dollar bail themselves when they are arrested.
Many testimonies illustrate how anachronistic dollar bail is. For instance, a "Queens man languished on Rikers Island for five months in 2014 without knowing his bail was a mere $1" (news reference). There have been more recent efforts to establish a more transparent system. Representative Lancman has introduced a bill to "ensure that the Department of Correction is promptly communicating with defendants detained solely for $1 bail" (news reference). More direct approaches have also been taken, such as the dollar bail brigade, which uses volunteers to pay dollar bail for those incarcerated. However, we do not know whether the individuals in the NYC data with dollar bond amounts were affected by these changes.
We analyzed all web-crawled NY cases with bail amounts under $10 by crime severity. Crime severity was assessed by sorting suspects into misdemeanor, felony, or both bins, where "both" represents a single suspect with simultaneous felony and misdemeanor charges. The Venn diagram below has relatively equal bin sizes, suggesting no clear relationship between crime severity and the use of dollar bail. This hints that the bail system is arbitrary and opaque in general.
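The severity bins behind the Venn diagram can be sketched as follows. The `F`/`M` charge-level codes and the dict layout are hypothetical stand-ins for however felony and misdemeanor charges are marked in the crawled NY data. The under-$10 filter mirrors the analysis described above.

```python
from collections import Counter


def severity_bin(charge_levels) -> str:
    """Classify one suspect from the levels of their charges.

    'F' (felony) and 'M' (misdemeanor) are hypothetical codes.
    """
    levels = set(charge_levels)
    has_felony = "F" in levels
    has_misdemeanor = "M" in levels
    if has_felony and has_misdemeanor:
        return "both"
    if has_felony:
        return "felony"
    if has_misdemeanor:
        return "misdemeanor"
    return "unknown"


def venn_counts(suspects, max_bail: float = 10) -> Counter:
    """Count suspects with bail under max_bail in each severity bin."""
    return Counter(severity_bin(s["levels"]) for s in suspects if s["bail"] < max_bail)


toy_suspects = [
    {"bail": 1, "levels": ["M"]},
    {"bail": 1, "levels": ["F", "M"]},
    {"bail": 2, "levels": ["F"]},
    {"bail": 5000, "levels": ["F"]},  # excluded: not a dollar bail
]
print(venn_counts(toy_suspects))
```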

We label cases with a $0 bail bond amount as "placeholder" bail, having observed that such a person typically has more than one charge. For AR, LA, and MI, these are entered as $0, whereas for NY each such charge carries $1, and these amounts accumulate.
An ontology is a formal, explicit specification of a shared conceptualization of a domain of interest.
- formal: should be readable by machines without human intervention
- explicit: should disambiguate terms that carry multiple meanings
- shared: should be available to and shared among people
- conceptualization: should represent an abstract model that describes the concepts of entities
- domain: is limited to its own domain and cannot represent entities of other domains
We linked the states' charge descriptions to the Bureau of Justice Statistics (BJS) charge descriptions.
There are five BJS Broad Categories: Violent, Drug, Public Order, Property, and Other.
Each of these categories has cases with $0 or $1 bail bonds, which we label 'placeholder bonds'; we label the rest 'meaningful bonds' in this tree.
In each of these subsets, we further group by bond amount (<= $1,000 and > $1,000) and by time spent in jail (<= 20 days and > 20 days).
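The tree described above has four levels: Broad Category, then placeholder versus meaningful bond, then bail amount, then jail time. It can be represented as nested dictionaries with counts at the leaves. Below is a minimal sketch with hypothetical field names and toy cases. Following the description above, a bond of $0 or $1 is treated as a placeholder.

```python
PLACEHOLDER_AMOUNTS = (0, 1)  # $0 or $1 bonds are treated as placeholders


def tree_path(case: dict) -> tuple:
    """Return the four-level path of one case through the tree."""
    bond = "placeholder" if case["bail"] in PLACEHOLDER_AMOUNTS else "meaningful"
    bail = "<=$1,000" if case["bail"] <= 1000 else ">$1,000"
    jail = "<=20 days" if case["days"] <= 20 else ">20 days"
    return (case["broad"], bond, bail, jail)


def build_tree(cases) -> dict:
    """Nest cases as broad -> bond type -> bail bin -> jail bin, counting at the leaves."""
    root = {}
    for case in cases:
        *branches, leaf = tree_path(case)
        node = root
        for key in branches:
            node = node.setdefault(key, {})
        node[leaf] = node.get(leaf, 0) + 1
    return root


toy_cases = [
    {"broad": "Drug", "bail": 0, "days": 2},
    {"broad": "Drug", "bail": 1500, "days": 30},
    {"broad": "Drug", "bail": 2000, "days": 40},
    {"broad": "Violent", "bail": 500, "days": 5},
]
print(build_tree(toy_cases))
```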
Make decisions as someone who is detained on bail.
You choose your race, gender, and crime category.
See the average bail you must pay to be released from detention.