9 Apr 2021 SAFS Quantitative Seminar

Why Southeast coast of India?

2014-2019 NOAA Fisheries/India Ministry of Earth Sciences joint research on improving forecasts of the indian oil sardine

  • Very important fishery for India.
  • India fishery produces 66–96% (average 80%) of the global oil sardine catch.
  • Southeastern Arabian Sea produces the vast majority of this.
  • Upwelling drives productivity in this system.
  • I know a lot about this upwelling system.

Improving forecasts using environmental covariates: a case study on the Indian oil sardine (Sardinella longiceps), May 24th, UW Fish and Wildlife Ecology Seminars.

Study Area Southeast Arabian Sea (SEAS) off the coast of Kerala. ca 350 miles of coastline. Dispersed non-motorized and motorized fishery.

Productivity in the SEAS is driven by seasonal upwelling

Summer monsoon brings wind and rain

Strong upwelling starts from the tip and moves north

Coastal upwelling

  • Nutrient rich water brought to the surface
  • Phytoplankton blooms
  • Sea surface temperature (SST) differential: Cold water along the coast and warm water off-shore

Changing coastal upwelling patterns

Land warms faster than the ocean -> changes in coastal winds

Projected warming between 2015 and 2050

Projected warming between 2015 and 2050

Di Lorenzo, E. 2015. The future of coastal ocean upwelling. Nature 518, 310–311.

Can we use unsupervised image classification of sea surface temperature (SST) to study upwelling patterns and changes to those patterns?

Why unsupervised classification?

  • Looking for novel patterns
  • Looking for change rather than specific patterns

Overview

Pilot project - fall 2020 by Jacob Zikan (Dartmouth College), k-means analyses as an undergrad project for a machine learning class.

Phase 2 - extending that to other classification algorithms

Three types of unsupervised image classification

  • Principal Components Analysis (PCA) decomposition
  • K-means clustering
  • Hierarchical clustering

Notes

  • Changes to the pattern not absolute temperature. I removed the mean from each image.
  • Arabian Sea has warmed 1-2 deg C in the past 2 decades.
  • monthly images in most of the talk, but daily at the end
  • Monthly SST from ERA5 global reanalysis 0.25 deg grid; Daily SST for NOAA OISST AVHRR-only 0.25 deg grid.

Working with images

  • An image has \(p\) pixels
  • Each row is an image. Each column is a pixel.
  • Get rid of the NA (land) columns.
  • Values are temperature minus image mean temperature.

Here are 5 images and just the first 10 pixels of the image.

##               p1    p2    p3    p4    p5    p6    p7    p8    p9   p10
## 1950-01-01  0.17  0.19  0.18  0.13  0.05 -0.02 -0.08 -0.10 -0.07 -0.06
## 1950-02-01  0.05  0.09  0.10  0.06 -0.01 -0.05 -0.11 -0.13 -0.09 -0.05
## 1950-03-01 -0.26 -0.22 -0.19 -0.22 -0.26 -0.27 -0.25 -0.17 -0.10 -0.04
## 1950-04-01 -0.23 -0.17 -0.14 -0.17 -0.21 -0.23 -0.19 -0.13 -0.06 -0.03
## 1950-05-01 -0.19 -0.14 -0.11 -0.13 -0.17 -0.19 -0.17 -0.12 -0.05 -0.01

PCA (and EOF)

  • Each image (a data set with \(p\) variables) can be expressed as the weighted sum of orthogonal images (\(\lambda\)).
  • The first \(\lambda\) captures most of the variance in the images.

\[\text{image} = \alpha_1 \lambda_1 + \alpha_2 \lambda_2 + \alpha_3 \lambda_3 + \dots\] For those familiar with singular value decomposition \[ \underbrace{\mathbf{X}}_{\text{data}}= \underbrace{\mathbf{U}\,\,\mathbf{D}}_{\alpha} \underbrace{\mathbf{V}^\top}_{\lambda} \]

For those familiar with PDO index That’s another example of PCA decomposition on gridded SST data.

Classic example is facial recognition: eigenfaces

Let’s see how to do it

Let’s use prcomp in R. Using prcomp means you can use the visualization tools in R for PCA.

prcomp.pca <- prcomp(X_norm, scale = FALSE, center=FALSE)
  • The \(\lambda\) are in prcomp.pca$rotation with each column an “eigen image”.
  • The \(\alpha\) are in prcomp.pca$x. One for each image and each \(\lambda\).

Eigen images. Just first 10 pixels of the image are shown.

eigenimages <- t(prcomp.pca$rotation)
round(eigenimages[1:5, 1:10], digits=2)
##        p1    p2    p3    p4    p5    p6    p7    p8    p9   p10
## PC1  0.06  0.06  0.05  0.05  0.04  0.04  0.04  0.04  0.04  0.03
## PC2  0.05  0.06  0.06  0.07  0.08  0.08  0.07  0.07  0.07  0.06
## PC3 -0.03 -0.02 -0.01  0.00  0.00  0.00  0.00 -0.01 -0.02 -0.02
## PC4  0.03  0.03  0.03  0.04  0.06  0.07  0.07  0.07  0.05  0.04
## PC5 -0.02 -0.02 -0.02 -0.03 -0.03 -0.03 -0.03 -0.03 -0.03 -0.02
# if image is say 25x25
library(raster)
img <- as.raster(matrix(eigenimages[1,], byrow=TRUE, ncol=25))
plot(img)

Reconstructing the SST images

image = \(\alpha_1 \times\) + \(\alpha_2 \times\) + \(\alpha_3 \times\) + \(\dots\)

August

March

May

Variance (in data set) explained

SST Anomaly pattern in the PC1-PC2 space

1980s versus 2010s

PC1: PC2:

PC1:

PC1:

Sep and Aug seem to be changing

August images

Outlier identification using distance from center

March images

PCA summary

  • Nice way to summarize an image with a few dimensions
  • Nice visual of the seasonal cycle
  • Dimension reduction and deals with spatial correlation
  • July and August seem to be changing
  • A few outlier months

K-means clustering

Let’s try it

We need to choose the number of centers. I will start with 12 — big enough to capture most of the variability (which I discovered through trial and error).

set.seed(1221966)
n_K <- 12
out_norm <- kmeans(X_norm, n_K, iter.max=25, nstart=100)
  • The centroid images are in out_norm$centers. Each row is an image.
  • Which cluster each image belongs to is in out_norm$cluster

First 5 pixels of first 5 centroids

centroidimages_norm <- out_norm$centers
rownames(centroidimages_norm) <- paste("Centroid", 1:n_K)
round(centroidimages_norm[1:5, 1:5], digits=2)
##              p1    p2    p3    p4    p5
## Centroid 1 0.21  0.20  0.18  0.15  0.13
## Centroid 2 0.18  0.16  0.12  0.08  0.05
## Centroid 3 0.03  0.00 -0.06 -0.12 -0.19
## Centroid 4 0.00 -0.03 -0.07 -0.11 -0.13
## Centroid 5 0.17  0.13  0.11  0.09  0.10

Distances between the images

Distance based on Euclidian distance.

Seasonal heat map

Different # of centers (5 main groups)

K-means summary

  • How the pattern is changing depends somewhat on how may centers (how fine a scale).
  • July, August and September have consistent upwelling patterns but strength of the temperature differential changes
  • October hard to place in a group (moves around depending on number of centers)
  • May also different in first versus second half
  • Transition months

Raw temperature patterns

Raw temperature seasonal pattern

Blue, green, grey are upwelling patterns (colder to warmer). Red (hot) and orange (warm) are uniform patterns.

Hierarchical clustering

Let’s try it

  • I am going to cluster on the raw images, but later I will be clustering on the Principal Components so that I have orthogonal variables.
  • I am also going to use Euclidian distance. I’ve tried a bunch. At this point, I don’t see that huge of a difference.
ncomp <- 20
d <- dist(X_norm, method = "euclidian")
clus <- hclust(d)

method="ward.D2" default

This method tries to make compact clusters of similar size.

method="complete" default

This method allows you to find outliers.

method="complete" default

centroid image is the mean (by pixel) of all images within a cluster.

method="ward.D2"

Summary for hierarchical clustering

  • Allows you to get a better understanding of the structure
  • Overall not that different from patterns from k-means
  • Does better than k-means at outlier identification using method="complete"
  • method="complete" also doesn’t try to make similar size clusters

Upwelling changes in the SEAS

Daily SST from the NOAA OISST AVHRR-only data set.

August 1st demeaned images 1982-2020

Jun-Sep # of days with upwelling signal

  • PCA on the full data set: daily Jan 1982 to Dec 2020
  • Use the first 20 Principal Components to decribe the images
  • Look at June to Sept months only
  • Divide Jul-Sep images using hierarchical Clustering with method="complete" into 3 groups.
  • Number of days that are in each cluster

Jun-Sep # of days with upwelling signal

Pattern relative to 2 recent oil sardine collapses

Jun-Sep day of first and last strong upwelling signal

  • Same image processing
  • Identify the first and last day that image classified as strong upwelling

Jun-Sep day of first and last strong upwelling signal

Summary - Changes in the SEAS

  • Image classification worked on the daily data!
  • There is evidence that the upwelling season is changing
    • shortening, or upwelling width is narrowing, or northern extent shrinking, or tip getting more intense
    • Observed this with other approaches also
  • Rethink the image de-meaning and scaling

Summary - Image Classification

  • PCA
    • Very helpful in reducing the dimensionality
    • Not clear that PC1-PC2 space captures enough variation
    • Need to retry with just June-Sept data
    • Distance from centroid might be useful…
  • k-means and hierarchical clustering
    • Identifying specific centroid images is very helpful
    • Which image is most like xyz seems like a promising approach
    • Need to think more on how to choose number of clusters