We are publishing data from Court Watch every week on our website and also sharing that same information directly with the Rollins administration. We’re working to make the information sharing process consistent and transparent. To that end, we also want to make our process for collecting, cleaning, and analyzing the data consistent and transparent. Here’s how we do it:
For the First 100 Days project, courtwatchers are trained to listen for specific information focused on charging and bail decisions. We created a single-page form courtwatchers can use to track this information for each hearing they observe. The data is later uploaded through an online form, so we advise courtwatchers that they may take notes however they like, provided they capture the information we’re tracking for each hearing they watch.
Courtwatchers are asked to enter their data directly into an online form themselves after each shift, preferably within 24 hours of observation. We’ve provided instructions on how to use the form. This method resolves issues with accuracy, capacity, and timeliness. Our three-person steering team does not have the time to do data entry for hundreds of arraignments each week. Courtwatchers only have to read their own handwriting and recall their own experience, and the short turnaround means the hearing is still fresh in their minds. This efficient, people-powered method also enables us to track and release data in real time on Twitter and in the weekly digest. We are really thankful that courtwatchers have been willing and able to do this. We do it this way to give our community and the Rollins administration the opportunity to address injustices in the courts in real time, rather than in a report written months later, and to document the pace of change.
Reviewing, Cleaning, and Analyzing Data
Because DA Rachael Rollins was inaugurated on a Wednesday, we pull and analyze weekly data for periods stretching from Wednesday of one week to Tuesday of the following week.
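For readers who want to reproduce the Wednesday-to-Tuesday window outside of a spreadsheet, here is a minimal sketch in Python. The function name `reporting_week` is our own illustration and is not part of the Court Watch workflow, which is done in Excel.

```python
from datetime import date, timedelta

WEDNESDAY = 2  # date.weekday() counts Monday as 0, so Wednesday is 2


def reporting_week(day: date) -> tuple[date, date]:
    """Return the (Wednesday, Tuesday) window that contains `day`."""
    days_since_wednesday = (day.weekday() - WEDNESDAY) % 7
    start = day - timedelta(days=days_since_wednesday)
    return start, start + timedelta(days=6)
```

Any date from a Wednesday through the following Tuesday maps to the same window, so the function can be used to label each form response with the week it belongs to.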
We download an Excel spreadsheet of all the form responses entered online and keep a copy of each week’s master spreadsheet, which lets us avoid cleaning things up within the online interface itself. Here’s the process:
Step 1 – narrow the period: we delete any entries that don’t apply to the week we’re working on (e.g., tests from December or entries from prior weeks).
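The same narrowing could be done programmatically. The sketch below assumes each form response is a dictionary with a `hearing_date` field; that field name and the sample rows are hypothetical and made up for illustration, not taken from the real form.

```python
from datetime import date


def narrow_to_week(rows, week_start, week_end):
    """Keep only rows whose hearing date falls inside the week (inclusive)."""
    return [row for row in rows if week_start <= row["hearing_date"] <= week_end]


# Made-up sample rows, for illustration only.
sample = [
    {"hearing_date": date(2018, 12, 20), "court": "Test entry"},
    {"hearing_date": date(2019, 1, 16), "court": "Court A"},
    {"hearing_date": date(2019, 1, 22), "court": "Court B"},
    {"hearing_date": date(2019, 1, 23), "court": "Court A"},
]
this_week = narrow_to_week(sample, date(2019, 1, 16), date(2019, 1, 22))
```

Filtering into a new list, rather than deleting in place, leaves the downloaded master data untouched, which mirrors keeping a copy of each week’s master spreadsheet.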
Step 2 – de-duplicate: Sometimes more than one volunteer observes the same arraignment, but we only want each person being arraigned/each case to appear in the dataset once. Therefore, in Excel we sort by date/court/defendant name so we can easily spot duplicate arraignments based on the defendant’s name and docket number. To be able to go back to earlier versions, we save a second copy of the worksheet, and in that copy we manually clean the data: we remove the extra entries after consolidating any information across the two watchers that is worth preserving.
Sometimes the process takes some creativity because courtwatchers didn’t quite catch the name or wrote down dramatically different spellings. This is why docket numbers can be especially helpful in cleaning the data. However, docket numbers aren’t always announced and courtwatchers don’t always catch them.
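As a rough illustration of the de-duplication logic, the sketch below groups rows by docket number when one was recorded and otherwise by date, court, and a normalized name, then fills blanks in the first entry from later ones. All field names (`docket`, `defendant`, `court`, `hearing_date`, `comments`) are hypothetical. This is not a replacement for the manual review described above: it will not catch dramatically different spellings, or a pair of entries where only one watcher caught the docket number.

```python
def normalize_name(name):
    """Lowercase a name and collapse repeated whitespace."""
    return " ".join(name.lower().split())


def dedup_key(row):
    """Prefer the docket number; fall back to date, court, and name."""
    docket = (row.get("docket") or "").strip()
    if docket:
        return ("docket", docket.upper())
    return ("name", row["hearing_date"], row["court"],
            normalize_name(row["defendant"]))


def consolidate(rows):
    """Merge duplicate observations, keeping the first non-empty value per field."""
    merged = {}
    for row in rows:
        key = dedup_key(row)
        if key not in merged:
            merged[key] = dict(row)  # copy so the input rows stay unchanged
            continue
        kept = merged[key]
        for field, value in row.items():
            if field == "comments":
                # Preserve both watchers' notes instead of picking one.
                existing = kept.get("comments") or ""
                if value and value not in existing:
                    kept["comments"] = f"{existing}; {value}" if existing else value
            elif not kept.get(field):
                kept[field] = value
    return list(merged.values())


# Made-up sample rows, for illustration only.
observed = [
    {"hearing_date": "2019-01-16", "court": "Court A", "defendant": "Jane  Doe",
     "docket": "1901cr0001", "bail": "", "comments": "Arrived late"},
    {"hearing_date": "2019-01-16", "court": "Court A", "defendant": "Jane Doe",
     "docket": "1901CR0001", "bail": "500", "comments": "Bail argued"},
    {"hearing_date": "2019-01-16", "court": "Court A", "defendant": "John Roe",
     "docket": "", "bail": "", "comments": ""},
]
unique_cases = consolidate(observed)
```

When two watchers record conflicting non-empty values for the same field, this sketch silently keeps the first one; a real workflow would want to flag those rows for a person to resolve.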
Step 3 – verify charges: next we sift through the open-ended charges box, back-filling any “decline to prosecute” charge that wasn’t properly entered. We also manually scan every row of data to be sure at least one charge is filled in, so that every entry includes the charge the person was facing and no case gets counted or categorized inaccurately.
Because of the limited capacity of the CourtWatch MA team, we cannot verify every case against the limited charge information available on masscourts.org. We do verify the charge in any case where the courtwatcher left the charge blank, indicated they did not hear the charge, or noted that the reading of the charge was waived in court. It is of course possible that courtwatchers mishear the charge(s) in any given case; our data collection represents our trained volunteers' best efforts and is not a substitute for open government. We are very hopeful that DA Rollins will release the data tracked by the Suffolk County District Attorney’s Office and make it publicly available for community review, much as State’s Attorney Kim Foxx has done in Cook County, IL.
During this step, we also try to weed out hearings that aren’t actually arraignments, such as status hearings, restraining orders, and probation violations. We also try to separate out specific dockets when a person was appearing on multiple open cases. If the courtwatcher noted a specific disposition by docket, we split those cases into separate rows so we can track outcomes for each case in the dataset.
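The rule in Step 3 for deciding which cases get checked against masscourts.org could be sketched as a simple flag. The `charges` field name and the marker phrases are assumptions made for illustration; real entries are free text and would likely need a longer list and a human eye.

```python
# Hypothetical phrases a courtwatcher might type when the charge was not captured.
NOT_CAPTURED_MARKERS = ("did not hear", "didn't hear", "waived")


def needs_charge_check(row):
    """Flag rows where the charge is blank, was not heard, or was waived."""
    charges = (row.get("charges") or "").strip().lower()
    if not charges:
        return True
    return any(marker in charges for marker in NOT_CAPTURED_MARKERS)


# Made-up sample rows, for illustration only.
entries = [
    {"charges": ""},
    {"charges": "Did not hear"},
    {"charges": "Reading waived"},
    {"charges": "Trespassing"},
]
flags = [needs_charge_check(entry) for entry in entries]
```

The flag only builds the list of cases to look up; the lookup itself remains a manual step.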
Step 4 – count the total number of cases: once this second, de-duplicated version of the data is cleaned, we use it to get a total tally of the unique arraignments observed that week. We also use this version to run comparisons against the decline to prosecute (“DTP”) subset of cases.
Each week we start with more than 200 rows of data, but after cleaning and categorizing we review an average (over these past 3 weeks) of 86 cases in the DTP subset.
So far there have been hundreds of cases we have not critically examined because we lack capacity to review them and they are not the focus of our project; however, it would be incredibly illuminating if Suffolk County made these data public.
Step 5 – DTP subset: next, we create a third version of the data containing cases whose only charges are on the DTP list. We find these cases based on whether the “other charges” field was left blank. If that field is empty, the row of data for that arraignment gets included in this third version of the dataset. Within that third version, we focus on outcomes, reorganizing the data so that a single column reflects whether the case was dismissed, diverted, bail was set, the person was released, or there was some other outcome. That makes it easier to get relevant tallies. Sometimes courtwatchers fill in two mutually exclusive fields regarding the disposition, so we read the open-ended comments to get a better sense of the actual outcome of the case and fill in the accurate information.
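A minimal sketch of Step 5 follows, assuming hypothetical field names (`other_charges`, plus one true/false field per outcome). It builds the DTP-only subset and collapses the outcome fields into a single value, marking any row with conflicting fields for the manual read-through of comments described above rather than guessing.

```python
# Hypothetical outcome fields, one per checkbox on the form.
OUTCOME_FIELDS = ("dismissed", "diverted", "bail_set", "released")


def dtp_subset(rows):
    """Keep rows where the 'other charges' field was left blank."""
    return [row for row in rows if not (row.get("other_charges") or "").strip()]


def single_outcome(row):
    """Collapse the outcome fields into one value for tallying."""
    marked = [field for field in OUTCOME_FIELDS if row.get(field)]
    if len(marked) == 1:
        return marked[0]
    if not marked:
        return "other"
    return "needs review"  # conflicting fields: read the comments by hand


# Made-up sample rows, for illustration only.
cases = [
    {"other_charges": "", "dismissed": True},
    {"other_charges": "Assault", "bail_set": True},
    {"other_charges": "  ", "bail_set": True, "released": True},
    {"other_charges": ""},
]
dtp_cases = dtp_subset(cases)
outcomes = [single_outcome(case) for case in dtp_cases]
```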
Since the administration has not released any official policies, we have had to try to figure out the difference between dismissal and diversion. As reflected in our week 3 digest, we have decided to try to differentiate between cases that end at that arraignment (a true dismissal) and cases where the person is required to come to another court date to fulfill a condition or show proof of compliance (diversion). We acknowledge these are judgement calls courtwatchers have to make in the moment and are grateful they approach their work so thoughtfully.
Step 6 – analyze: we write a set of functions in Excel to get tallies for the number of charges, racial demographics, release decisions and dispositions, bail amounts, etc.
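The tallies in Step 6 are computed in Excel, but the same counts could be sketched in Python with the standard library. The field names (`race`, `outcome`, `bail_amount`) and the sample rows below are invented for illustration and are not real Court Watch data.

```python
from collections import Counter
from statistics import median


def tally(rows, field):
    """Count how often each value of `field` appears; blanks become 'unknown'."""
    return Counter(row.get(field) or "unknown" for row in rows)


def bail_amounts(rows):
    """Return the recorded bail amounts, sorted from lowest to highest."""
    return sorted(row["bail_amount"] for row in rows if row.get("bail_amount"))


# Made-up sample rows, for illustration only.
week_rows = [
    {"race": "Black", "outcome": "bail_set", "bail_amount": 500},
    {"race": "White", "outcome": "released", "bail_amount": None},
    {"race": "Black", "outcome": "bail_set", "bail_amount": 1000},
    {"race": "", "outcome": "bail_set", "bail_amount": 250},
]
by_outcome = tally(week_rows, "outcome")
by_race = tally(week_rows, "race")
amounts = bail_amounts(week_rows)
typical_bail = median(amounts)
```

Counting blanks as "unknown" keeps the totals equal to the number of rows, which makes it easier to check each tally against the case count from Step 4.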
Step 7 – write the digest: next, we take that analysis, put it into prose, and add charts and tables to present the data for public consumption.
Step 8 – respond & revise: the team reviews the data and offers some analysis as well as recommendations for how to address serious injustices volunteers witnessed that week and/or departures from the administration’s stated policy goals. And voila! We have a digest.
Step 9 – publicize & disseminate: we put the digest out over our Twitter account (@CourtWatchMA) and highlight some of the most significant findings in a thread. This complements our daily stories from court, which we find each morning by reading through the open-ended comments courtwatchers have recorded in our online data collection form. Follow along with the hashtags #first100days and #storiesfromcourt.