Description of Data Sets:
I volunteer with the Environmental Reporting Collective (https://www.investigative.earth/), applying machine learning methods, open source data, and open source tools to investigative journalism on environmental crimes around the world. The ERC runs cross-border investigative projects that highlight the role of large-scale environmental crimes in the global climate crisis. I partnered with the ERC to produce an investigative report exposing the locations and frequency of gas flaring in the Autonomous Region of Iraqi Kurdistan.
Using open source satellite imagery, I built a machine learning pipeline to identify flaring hotspots and track the level of flaring activity over time. Although flaring activity dropped off in the winter months, historical data show that flaring levels have not decreased relative to previous years (2018-19). My findings project that flaring will increase, not decrease, relative to 2021.
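Because flaring dips seasonally, a fair year-on-year comparison sets each month against the same month in earlier years rather than against the previous month. A minimal sketch, assuming flare detections have already been reduced to a list of observation dates; the function names are hypothetical and not part of the original pipeline:

```python
from collections import Counter
from datetime import date


def monthly_counts(detection_dates):
    """Count flare detections per (year, month)."""
    return Counter((d.year, d.month) for d in detection_dates)


def year_over_year(counts, year, baseline_years):
    """For each month of `year`, return detections minus the mean count of the
    same month across `baseline_years`. Comparing like months keeps a winter
    drop-off from being mistaken for a long-term decline."""
    result = {}
    for month in range(1, 13):
        current = counts.get((year, month), 0)
        baseline = sum(counts.get((y, month), 0) for y in baseline_years) / len(baseline_years)
        result[month] = current - baseline
    return result


# Usage with a handful of synthetic detection dates
dates = [date(2019, 1, 5), date(2019, 1, 20), date(2020, 1, 3),
         date(2021, 1, 10), date(2021, 1, 11), date(2021, 1, 12), date(2021, 7, 1)]
print(year_over_year(monthly_counts(dates), 2021, [2019, 2020]))
```

A positive value means that month had more detections than its historical average, so a seasonal low can still be an increase relative to previous years.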
To develop this visualization, I collected three years of data from VIIRS Nightfire, a dataset produced by the Earth Observation Group, to isolate locations with known fire and flaring activity. I applied a 1600 K temperature threshold to this dataset to separate isolated flares from the spatially diffuse clusters of fire that would indicate wildfire. The result of this analysis was the location of flares within the Kurdistan region across the last three years. To validate the results, I checked the proximity of each flare to oil and gas infrastructure using OpenStreetMap and Open Infrastructure Map (both open source), and compared flare locations against geolocated photos on Google Maps.
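The filtering and validation steps can be sketched as follows. This is a minimal sketch, assuming detections have been exported as latitude, longitude, temperature (K) and observation night. Two choices are assumptions rather than documented settings: that flares are kept as the hotter detections (gas flares typically burn hotter than biomass fires), and that "spatially diffuse clusters" means many detections close together on the same night. The radius, neighbour limit, and infrastructure coordinates are hypothetical.

```python
import math
from dataclasses import dataclass
from datetime import date


@dataclass
class Detection:
    lat: float
    lon: float
    temp_k: float  # VIIRS Nightfire temperature estimate, in kelvin
    night: date    # observation night


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    r = 6371.0  # mean Earth radius (km)
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def flare_candidates(detections, min_temp_k=1600.0, cluster_radius_km=2.0, max_neighbours=3):
    """Keep hot detections that are spatially isolated on their own night.

    Assumption: flares appear at or above ~1600 K, while wildfire shows up as
    many nearby detections on the same night. Neighbours are only counted on
    the same night so that a persistent flare seen on many nights is not
    mistaken for a cluster. Both cut-offs are illustrative.
    """
    hot = [d for d in detections if d.temp_k >= min_temp_k]
    kept = []
    for d in hot:
        neighbours = sum(
            1 for o in detections
            if o is not d and o.night == d.night
            and haversine_km(d.lat, d.lon, o.lat, o.lon) <= cluster_radius_km
        )
        if neighbours <= max_neighbours:
            kept.append(d)
    return kept


def near_infrastructure(d, sites, max_km=1.0):
    """Validation check: is a candidate within `max_km` of a known oil and gas
    site, given as (lat, lon) pairs (e.g. points exported from OpenStreetMap)?"""
    return any(haversine_km(d.lat, d.lon, lat, lon) <= max_km for lat, lon in sites)
```

Candidates that fail the infrastructure check are the ones worth reviewing manually, for example against geolocated photos.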
To understand whether proximity to flaring could lead to negative environmental and health outcomes, I selected additional satellite-derived datasets, including:
GSMaP Operational: Global Satellite Mapping of Precipitation
Terra & Aqua MAIAC Land Aerosol Optical Depth Daily 1 km (AOD)
GPWv411: Gridded Population of the World Version 4.11 (Population Density)
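Before gridded layers like these can feed a model, they have to be summarized around each flare, for example over the 10 km buffer zone used for the PM2.5 predictions. A minimal sketch, assuming each layer has already been exported as a 2-D NumPy array on a regular latitude/longitude grid (the export step is not shown, and the function name is hypothetical):

```python
import numpy as np


def buffer_mean(grid, lats, lons, flare_lat, flare_lon, radius_km=10.0):
    """Mean of a gridded layer (e.g. AOD or precipitation) over cells whose
    centres fall within `radius_km` of a flare.

    grid has shape (len(lats), len(lons)). Distances use an equirectangular
    approximation, which is adequate at ~10 km scales. NaN cells (e.g.
    cloud-masked AOD pixels) are ignored.
    """
    lat_grid, lon_grid = np.meshgrid(lats, lons, indexing="ij")
    dy = (lat_grid - flare_lat) * 111.32  # km per degree of latitude
    dx = (lon_grid - flare_lon) * 111.32 * np.cos(np.radians(flare_lat))
    mask = np.hypot(dx, dy) <= radius_km
    return float(np.nanmean(grid[mask]))
```

Repeating this per flare and per week gives one feature row per flare-week, which is the shape a tabular model expects.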
I used these datasets to predict air pollution (PM2.5 concentrations) within a 10 km buffer zone around each flare using machine learning (Last et al. 2019, https://arxiv.org/abs/2103.12505). We developed a stacked ensemble machine learning model, which leverages a range of well-performing models to improve the accuracy of PM2.5 predictions over any individual model. Base models were trained on 80% of the OpenAQ PM2.5 concentration data, and their predictions were aggregated to serve as the input features for the meta-model, a linear regression trained with 5-fold temporal cross-validation. This ensemble approach reduces variance, making our machine learning solution more robust and stable in performance. We used the model to generate predictions of weekly averaged PM2.5 concentrations and compared them against WHO guideline thresholds to identify high PM2.5 concentrations. These predictions are visualized as average PM2.5 concentrations around the flare locations.
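The stacking scheme can be sketched with scikit-learn. This is a minimal sketch under stated assumptions, not the published model: the base models (random forest, gradient boosting, ridge), the chronological 80/20 split, and reading "5-fold temporal cross-validation" as scikit-learn's `TimeSeriesSplit` are all assumptions, and `X`/`y` stand for hypothetical time-ordered satellite features and OpenAQ PM2.5 targets.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import TimeSeriesSplit, cross_val_score


def fit_stacked_ensemble(X, y, train_frac=0.8, n_splits=5):
    """Stacked ensemble for rows sorted in time order.

    Base models are fit on the first `train_frac` of rows; their predictions
    on the remaining rows become the meta-model's input features. The
    linear-regression meta-model is scored with temporal CV, where each fold
    trains on earlier rows and tests on later ones.
    """
    cut = int(len(y) * train_frac)
    base_models = [
        RandomForestRegressor(n_estimators=100, random_state=0),
        GradientBoostingRegressor(random_state=0),
        Ridge(alpha=1.0),
    ]
    for model in base_models:
        model.fit(X[:cut], y[:cut])
    # One column per base model: these are the meta-model's features
    meta_X = np.column_stack([m.predict(X[cut:]) for m in base_models])
    meta_y = y[cut:]
    meta = LinearRegression()
    scores = cross_val_score(meta, meta_X, meta_y,
                             cv=TimeSeriesSplit(n_splits=n_splits), scoring="r2")
    meta.fit(meta_X, meta_y)
    return base_models, meta, scores


def predict(base_models, meta, X_new):
    """Feed base-model predictions through the fitted meta-model."""
    features = np.column_stack([m.predict(X_new) for m in base_models])
    return meta.predict(features)
```

Scikit-learn's built-in `StackingRegressor` is avoided here because it builds its meta-features with `cross_val_predict`, which requires folds that partition the data, and `TimeSeriesSplit` does not.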
Scientific Potential of Presentation:
We calculated exposure metrics to understand how many individuals had been directly exposed (population within one kilometer) to particulate matter from gas flares since October 2018, using open source population density data (GPWv411). We found that 1.19 million people in Iraq lived within 1 km of more than 10 flaring events. In Russia (the nation with the most flares in the world), only 275,000 people experienced that level of exposure (more than 10 flaring events) over the same period.
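The exposure metric amounts to counting flaring events near each population grid cell. A minimal sketch, assuming a flaring event is one nightly detection and that a cell counts as exposed when its centre lies within 1 km of more than 10 events; the function names and input layout are hypothetical, and the toy inputs do not reproduce any results from the study:

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    r = 6371.0  # mean Earth radius (km)
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def exposed_population(cells, flare_events, radius_km=1.0, min_events=10):
    """Sum the population of grid cells lying within `radius_km` of more than
    `min_events` flaring events.

    cells: iterable of (lat, lon, population). For a density layer such as
        GPWv411, population = density (persons/km^2) * cell area (km^2).
    flare_events: iterable of (lat, lon), one entry per flaring event.
    """
    events = list(flare_events)
    total = 0.0
    for lat, lon, pop in cells:
        n_events = sum(1 for e_lat, e_lon in events
                       if haversine_km(lat, lon, e_lat, e_lon) <= radius_km)
        if n_events > min_events:
            total += pop
    return total
```

The double loop is fine for a sketch; at country scale a spatial index (or a raster of event counts) would avoid comparing every cell with every event.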
The findings of my research in the Kurdistan region of Iraq have implications for air quality monitoring internationally. Globally, 93% of children live where air pollution levels exceed World Health Organization (WHO) guidelines. My open source library allows scientists and researchers to query NASA's satellite imagery and generate predictions of air pollution concentrations using an open source model. My visualization enables scientists and researchers to understand the spatial and temporal distribution of pollution, estimate concentrations in regions with sparse sensor coverage, and locate affected communities.
In addition to the negative externalities associated with oil and gas refineries, my investigation into flaring in the region also uncovered gas refineries that have significantly reduced flaring by investing in infrastructure. For example, this open source pipeline identified a plant near the Sarqala field (Garmian block), in the southeast of the Region, that cut flaring by a third during the study period. Through interviews, we learned that the Glasgow-headquartered energy firm Aggreko had recently completed one of the largest flare gas-to-power projects in the Middle East. As more researchers and scientists develop visualization tools to assess pollution from oil and gas operations, they are more likely to identify successful technologies with larger environmental and health benefits, increasing the potential of these tools to encourage positive change.
Current Institute of Study/Organization: MIT
Currently Pursuing: Master's
- Grand Prize