Title: ICE Air Data Appendix
Author/s: University of Washington Center for Human Rights
Language/s: pmd -- Python + Markdown
Year/s of development: 2019
The ICE Air Data Appendix is a literate-programming document (Python code interleaved with Markdown) that accompanied the University of Washington Center for Human Rights' report "Hidden in Plain Sight: ICE Air and the Machinery of Mass Deportation." It processes almost 2 million passenger records from the deportation flight database of ICE (U.S. Immigration and Customs Enforcement), obtained through a Freedom of Information Act (FOIA) request. The pmd file is audit code: it loads the cleaned ICE Air dataset, enforces its schema, runs consistency checks, and generates groupings to verify the credibility of the data used in the appendix.
Source: ARTS (Alien Repatriation Tracking System), ICE's internal database for managing deportation charter flights, which tracks passengers, flights, passenger traits, and pickup and drop-off locations.
The data work follows "Principled Data Processing," a methodology developed by the Human Rights Data Analysis Group (HRDAG).
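In HRDAG's approach, each processing step lives in a self-contained task directory whose subdirectories separate upstream data, hand-curated reference files, code, and outputs; the input/ and hand/ paths in the code below follow this convention. A sketch of the layout this implies (src/ and output/ are HRDAG's published convention, assumed rather than confirmed for this repository):

task/
├── input/    # data from upstream tasks, e.g. ice-air.csv.gz, dtypes.yaml
├── hand/     # hand-made reference files, e.g. arts_cols.yaml
├── src/      # the code itself, e.g. ice-air-data-appendix.pmd
└── output/   # results passed on to downstream tasks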
What aspects of ICE operations and ideology (deport/punish/oppress/purify) can we critique through this code? What is the value of critiquing the pipeline around code we cannot see? How does this code model ways of monitoring and auditing the State?
Source File: ice-air-data-appendix.pmd
This is an appendix to the report Hidden in Plain Sight: ICE Air and the Machinery of Mass Deportation, which uses data from ICE's Alien Repatriation Tracking System (ARTS) released by ICE Enforcement and Removal Operations pursuant to a Freedom of Information Act request by the University of Washington Center for Human Rights. This appendix is intended to provide readers with greater detail on the contents, structure, and limitations of this dataset, and on the process our researchers followed to render it suitable for social scientific analysis. The appendix is a living document that will be updated over time in order to make ICE Air data as widely accessible and transparently documented as possible.
The project repository contains all the data and code used for the production of the report.
import pandas as pd
import yaml

# Get optimal data types before reading in the ARTS dataset
with open('input/dtypes.yaml', 'r') as yamlfile:
    column_types = yaml.safe_load(yamlfile)

read_csv_opts = {'sep': '|',
                 'quotechar': '"',
                 'compression': 'gzip',
                 'encoding': 'utf-8',
                 'dtype': column_types,
                 'parse_dates': ['MissionDate'],
                 'infer_datetime_format': True}

df = pd.read_csv('input/ice-air.csv.gz', **read_csv_opts)
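The contents of dtypes.yaml are not shown here. As a hypothetical sketch (the dtypes, and any field names beyond those asserted on below, are illustrative rather than taken from the released file), declaring low-cardinality string fields as pandas 'category' types is what makes almost 2 million rows comfortable to hold in memory:

# hypothetical excerpt of input/dtypes.yaml; keys must match the CSV headers
AlienMasterID: int64
MissionID: int64
MissionNumber: int64
Status: category    # illustrative low-cardinality field; 'category' saves memory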
# The ARTS Data Dictionary as released by ICE
data_dict = pd.read_csv('input/ARTS_Data_Dictionary.csv.gz', compression='gzip', sep='|')
data_dict.columns = ['Field', 'Definition']
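Once loaded, the data dictionary serves as a lookup table for ICE's own definitions of each field. A minimal sketch, assuming MissionDate appears as a field in the released dictionary:

# index by field name so released definitions can be looked up directly
definitions = data_dict.set_index('Field')['Definition']
print(definitions.loc['MissionDate'])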
# A YAML file containing the field names in the original ARTS dataset
with open('hand/arts_cols.yaml', 'r') as yamlfile:
    arts_cols = yaml.safe_load(yamlfile)
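A natural use of arts_cols, in line with the schema enforcement described above, is to confirm that every original ARTS field survives into the cleaned dataset. A hedged sketch, assuming arts_cols loads as a flat list of column names (a subset check rather than equality, since cleaning may add derived columns):

# any original ARTS field missing from the cleaned data would signal a problem
missing = set(arts_cols) - set(df.columns)
assert not missing, f"ARTS fields absent from cleaned data: {sorted(missing)}"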
# Asserting characteristics of key fields
assert df['AlienMasterID'].notnull().all()   # no missing AlienMasterID values
assert df['AlienMasterID'].is_unique         # one record per AlienMasterID
assert df['MissionID'].notnull().all()       # no missing mission identifiers
assert df['MissionNumber'].notnull().all()
# MissionID and MissionNumber pick out the same number of distinct missions
assert df['MissionID'].nunique() == df['MissionNumber'].nunique()
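The groupings mentioned above are cross-tabulations meant to check the data's plausibility against known figures. A minimal sketch using only the fields already asserted on (note this groups by calendar year, whereas ICE's own statistics use the federal fiscal year beginning October 1):

# passenger records per mission: implausible mission sizes warrant review
passengers_per_mission = df.groupby('MissionNumber')['AlienMasterID'].count()

# passenger records per calendar year of MissionDate
passengers_per_year = df.groupby(df['MissionDate'].dt.year).size()
print(passengers_per_year)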