This case study examines data collected from a functional magnetic resonance imaging (fMRI) experiment performed on a single individual by the Gallant neuroscience lab at UC Berkeley (see the Gallant lab papers, reference 1 and reference 2).

fMRI measures the oxygenated blood flow in the brain, which can be considered an indirect measure of neural activity (the two processes are highly correlated). The measurements obtained from an fMRI experiment correspond to the aggregated response of hundreds of thousands of neurons within cube-like voxels of the brain. Subdividing the brain into 3D voxels is analogous to subdividing an image into 2D pixels.

The data

The data consist of fMRI measurements for approximately 1,300 voxels in the visual cortex of a single individual, recorded in response to viewings of 1,750 different images (such as a picture of a baby, a house, or a painting).

To visualize what is going on, the figure below depicts an individual viewing an image while in an fMRI machine.

An individual viewing an image while in an fMRI machine

Each image is a 128 × 128 pixel grayscale image, which can be represented as a vector of length 128² = 16,384, but which can be reduced to a vector of length 10,921 via a Gabor wavelet transformation.

The raw data are stored in the Collaborative Research in Computational Neuroscience (CRCNS) data sharing repository and can be found here. To access the data, you need to request a CRCNS account and describe what you intend to do with the data, but this is fairly straightforward.

Loading Superheat

Installing the superheat package from github is easy, assuming you have the devtools package installed in R. Simply type the following command:
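For instance (assuming the package lives in the rlbarter/superheat repository on GitHub):

```r
# Install superheat from GitHub using devtools
library(devtools)
install_github("rlbarter/superheat")
```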

Assuming that you didn’t run into any unfortunate errors when installing the package, you can load the package into R in the normal way.
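That is:

```r
library(superheat)
```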

Viewing the raw images

It is helpful to have an idea of what kind of images the subject was viewing. The raw images are contained in the Stimuli.mat file, which is separated into a set of 1,750 training images (used to train our models) and 120 validation images (used to evaluate model performance).

The code below loads in the data and extracts the training and validation images.
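A sketch of this step, assuming the Stimuli.mat file follows the CRCNS vim-1 layout with stimTrn and stimVal arrays:

```r
library(R.matlab)
# Read the stimuli file. The object names stimTrn and stimVal are
# assumptions based on the CRCNS vim-1 dataset layout.
stimuli <- readMat("Stimuli.mat")
train.stimuli <- stimuli$stimTrn  # 1750 x 128 x 128 array
val.stimuli <- stimuli$stimVal    # 120 x 128 x 128 array
```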

We display four of the training images below using our superheat package.
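One way to plot a single image (a sketch; the row reversal to keep the image right side up is an assumption about how the images are stored):

```r
# Plot one training image as a black-and-white heatmap;
# repeat with other indices for the remaining images.
superheat(train.stimuli[1, 128:1, ],
          heat.pal = c("black", "white"),
          legend = FALSE)
```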

Preparing the data for analysis

Despite best efforts, sadly not all of the data is publicly available. The Gabor wavelet filters, for example, are not available on the CRCNS website (see above), but the dedicated reader can try to generate their own Gabor wavelets using the raw images contained in Stimuli.mat (see above).

Gabor wavelet features

I, however, have access to a file, fMRIdata.RData, which contains the Gabor wavelet features for the training and validation set images. These Gabor feature matrices are contained in the fit_feat and val_feat objects from the fMRIdata.RData file (again, this is unfortunately not publicly available).

The voxel responses

The voxel responses to each image were collected for two subjects, “S1” and “S2”. We will restrict our analysis to predicting the voxel responses in the V1 region for Subject 1 only (see the image below taken from Matthew Schmolesky’s Webvision).

Loading in the voxel responses (from the EstimatedResponses.mat file) involves first converting the .mat file to a format readable by R. Specifically, I had to convert the data to a version 6 MATLAB file in Octave:
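A sketch of the Octave conversion (the variable names and output file name are assumptions based on the CRCNS vim-1 layout for Subject 1):

```octave
% Load the original file, then re-save it as a version 6
% MATLAB file that R.matlab can read.
load EstimatedResponses.mat
save -v6 EstimatedResponsesV6.mat dataTrnS1 dataValS1 roiS1
```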

Having converted the original MATLAB file to a version 6 MATLAB file, we can load it into R using the R.matlab package.
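For example (the converted file name is an assumption):

```r
library(R.matlab)
# Load the converted (version 6) MATLAB file into R as a list
responses <- readMat("EstimatedResponsesV6.mat")
```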

We can then filter through the objects contained in this file to extract only the responses of the 1,331 V1 voxels to the training and validation images.
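A sketch of this step, assuming V1 voxels are labeled 1 in the roiS1 ROI vector:

```r
# Identify the V1 voxels and extract their responses to the
# training and validation images.
v1 <- which(responses$roiS1 == 1)
train.resp <- responses$dataTrnS1[v1, ]  # 1331 x 1750
val.resp <- responses$dataValS1[v1, ]    # 1331 x 120
```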

Cleaning the data

The final data objects in our workspace are

where

  • train.stimuli: a 1750 × 128 × 128 array corresponding to the 1,750 raw training images, each of dimension 128 × 128.
  • val.stimuli: a 120 × 128 × 128 array corresponding to the 120 raw validation images, each of dimension 128 × 128.
  • train.feat: a 1750 × 10921 matrix corresponding to the 10,921 Gabor wavelet features for each of the 1,750 training images.
  • val.feat: a 120 × 10921 matrix corresponding to the 10,921 Gabor wavelet features for each of the 120 validation images.
  • train.resp: a 1331 × 1750 matrix containing the responses of the 1,331 voxels in the V1 region to each of the 1,750 training images.
  • val.resp: a 1331 × 120 matrix containing the responses of the 1,331 voxels in the V1 region to each of the 120 validation images.

Missing values

Note that of the 1,331 voxels in the V1 region, 37 have at least 40% missing responses, so we will remove these voxels.
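A sketch of the filtering step:

```r
# Remove voxels whose responses are missing for at least 40% of
# the training images.
missing.prop <- rowMeans(is.na(train.resp))
keep <- missing.prop < 0.4
train.resp <- train.resp[keep, ]
val.resp <- val.resp[keep, ]
```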

This leaves us with 1,294 voxels in the V1 region.

Modeling

For each of our models, our goal is to predict the response to the viewing of an image for each of the 1,294 voxels (cubic region in the brain).

That is, we have 1,294 separate models (one for each voxel), where the predictors/variables correspond to the 10,921 Gabor features from the training images.

Feature selection

To make the problem computationally feasible, we decided to filter the 10,921 Gabor features to the 500 that were most correlated with each voxel.

To identify the correlation of each Gabor feature with each voxel response, we ran the following code (also contained in the code/voxel_cor.R file that can be found here) on the statistics department computer cluster at UC Berkeley. You could probably run it on your laptop, but I would advise against it unless you are incredibly patient.
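A sketch of what that code computes (the saved file name below is an assumption; the original script lives in code/voxel_cor.R):

```r
# For each voxel, compute the correlation of every Gabor feature
# with the voxel's responses across the 1,750 training images.
voxel.cor <- apply(train.resp, 1, function(voxel) {
  cor(train.feat, voxel, use = "pairwise.complete.obs")
})
# voxel.cor is a 10921 x 1294 matrix of feature-voxel correlations
save(voxel.cor, file = "voxel_cor.RData")
```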

Next, to identify the 500 Gabor features that are most correlated with each voxel, we can simply load the correlation data saved above and run the following code.
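For example:

```r
load("voxel_cor.RData")
# For each voxel, keep the indices of the 500 features with the
# largest absolute correlation.
top.features <- apply(voxel.cor, 2, function(cors) {
  order(abs(cors), decreasing = TRUE)[1:500]
})
# top.features is a 500 x 1294 matrix of feature indices
```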

Lasso + OLS model

We now fit a lasso linear model to predict the voxel responses to the image stimuli.

The 1750 × 500 design matrix (feature matrix) can be represented as follows:

$$X = \begin{bmatrix}
x_{1,1} & x_{1,2} & \dots & x_{1,500} \\
x_{2,1} & x_{2,2} & \dots & x_{2,500} \\
\vdots & \vdots & \ddots & \vdots \\
x_{1750,1} & x_{1750,2} & \dots & x_{1750,500}
\end{bmatrix}$$

where $x_{i,j}$ is the $j$th Gabor wavelet feature value for the $i$th image.

We have 1,294 response vectors (as we fit 1,294 separate models), each of which contains the responses of an individual voxel to the 1,750 training images. The response vector for voxel $v \in \{1, \dots, 1294\}$ can be written as:

$$y^v = \begin{bmatrix}
y_1^v \\
y_2^v \\
\vdots \\
y_{1750}^v
\end{bmatrix}$$

To fit these 1,294 models, the following code was run using the Statistics Department cluster at UC Berkeley. You will not be able to run this code on your laptop.
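A sketch of what the cluster code does, using cv.glmnet from the glmnet package (each fit stores the full lambda path, which matters below; the exact script is not reproduced here):

```r
library(glmnet)
# Fit a cross-validated lasso model for each voxel on its 500
# most correlated features.
lasso.results <- lapply(1:nrow(train.resp), function(v) {
  X <- train.feat[, top.features[, v]]
  y <- train.resp[v, ]
  complete <- !is.na(y)  # glmnet does not accept missing responses
  cv.glmnet(X[complete, ], y[complete], alpha = 1)
})
save(lasso.results, file = "lasso_results_top500.RData")
```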

Having run the above code on the cluster and saved the results in the file lasso_results_top500.RData, we can load the results into the current R session and calculate the corresponding voxel response predictions.

First, we can extract the model coefficients for each model (since we saved the results for each possible lambda, we need to filter to the model with the “best” lambda).
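For example:

```r
load("lasso_results_top500.RData")
# Extract each model's coefficients at the cross-validated
# "best" lambda (lambda.min).
lasso.coef <- sapply(lasso.results, function(fit) {
  as.vector(coef(fit, s = "lambda.min"))
})
# lasso.coef is a 501 x 1294 matrix (intercept plus 500 features)
```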

We can also identify which feature was selected (had a non-zero lasso coefficient) by each voxel.
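For example:

```r
# For each voxel, record which of the 500 candidate features
# received a non-zero lasso coefficient (dropping the intercept).
selected.features <- apply(lasso.coef[-1, ], 2, function(beta) {
  which(beta != 0)
})
```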

Next, for each voxel we re-fit a classical un-regularized OLS model based on the features selected (those with non-zero coefficients) by the corresponding lasso model.
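A sketch (assuming every voxel's lasso model selects at least one feature):

```r
# Re-fit an un-regularized OLS model for each voxel using only
# its lasso-selected features.
ols.models <- lapply(seq_along(selected.features), function(v) {
  X <- train.feat[, top.features[selected.features[[v]], v],
                  drop = FALSE]
  lm(train.resp[v, ] ~ X)
})
```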

Evaluating the model on the validation set

The code below extracts the predicted voxel response for the validation set.
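A sketch of the prediction step, computing the predictions directly from the OLS coefficients:

```r
# Predict each voxel's response to the 120 validation images.
val.pred <- sapply(seq_along(ols.models), function(v) {
  X <- val.feat[, top.features[selected.features[[v]], v],
                drop = FALSE]
  cbind(1, X) %*% coef(ols.models[[v]])  # intercept + features
})
# val.pred is a 120 x 1294 matrix of predicted responses
```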

Below we calculate and plot the correlation of the true voxel responses with the predicted voxel responses for each of the 1,294 voxels.
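For example:

```r
# Correlation between observed and predicted validation responses
# for each voxel.
pred.cor <- sapply(1:nrow(val.resp), function(v) {
  cor(val.resp[v, ], val.pred[, v], use = "pairwise.complete.obs")
})
hist(pred.cor, breaks = 30,
     xlab = "Correlation between observed and predicted responses",
     main = "")
```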

A histogram of the correlation of the true voxel response on the validation images and the predicted voxel response on the validation images for each of the 1294 voxels.

We see that there appear to be two groups of voxels: those whose true and predicted responses have quite low correlation (around 0.2), and another group whose correlations are around 0.6.

Visualizing voxel performance across all voxels

We will now use superheat to simultaneously evaluate the performance of the lasso models in predicting the responses to the validation images for each of the 1,294 voxels.

First we will cluster the images and voxels into two groups for visualization purposes.
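A sketch, using k-means (the seed and the clustering method are assumptions; the original code is not shown):

```r
set.seed(92617)  # arbitrary seed for reproducibility
# Cluster the 120 validation images and the 1,294 voxels into two
# groups each, based on the validation response matrix
# (this assumes no missing values remain in val.resp).
image.clusters <- kmeans(t(val.resp), centers = 2)$cluster
voxel.clusters <- kmeans(val.resp, centers = 2)$cluster
```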

Below we plot some example images from each cluster. The following presents four images from the first cluster.

The next four images are from the second cluster.

Next, we plot a heatmap of the validation response matrix, that is, the response of each voxel to each validation image. Above each column in the heatmap (each column corresponds to a voxel), we plot the correlation of the observed voxel response with the predicted voxel response.
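A sketch using superheat's membership and yt arguments:

```r
# Heatmap of the validation responses (rows = images, columns =
# voxels), grouped by cluster, with each voxel's observed-vs-
# predicted correlation plotted above its column.
superheat(t(val.resp),
          membership.rows = image.clusters,
          membership.cols = voxel.clusters,
          yt = pred.cor,
          yt.plot.type = "scatter",
          yt.axis.name = "Cor(observed, predicted)")
```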

The heatmap above is very noisy as a lot of information is being crammed into a small number of pixels. It is thus often much easier to “smooth” the heatmap within its clusters to highlight the “big picture”.
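The smoothed version simply adds smooth.heat = TRUE:

```r
# The same heatmap, smoothed within clusters to highlight the
# big picture.
superheat(t(val.resp),
          membership.rows = image.clusters,
          membership.cols = voxel.clusters,
          smooth.heat = TRUE,
          yt = pred.cor,
          yt.plot.type = "scatter",
          yt.axis.name = "Cor(observed, predicted)")
```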

What we can see is that the two clusters of voxels behave very differently:

  • the first cluster of voxels is much more active in response to viewings of images from the second cluster of images than to images from the first cluster.
  • the second cluster of voxels does not appear to exhibit a strong difference in response to the two clusters of images.

Furthermore, the predicted responses for the first cluster of voxels are much more correlated with the true responses for these voxels, with an average correlation of around 0.5. The predicted responses to the second cluster of voxels, on the other hand, have approximately zero correlation with the true responses exhibited by these voxels.

Exploring the voxel clusters

Notice that we found two clusters of voxels that respond to the visual stimuli differently: the first cluster of voxels is highly sensitive to visual stimuli, whereas the second cluster is not.

Our goal is to explore the physical locations of the voxels. We can load in the locations data that can be found on Yuval Benjamini’s website.
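For example (the file name is an assumption; the v1_locations object it contains is described below):

```r
# Load the voxel location data from Yuval Benjamini's website
load("v1_locations.RData")
```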

Note that v1_locations appears to hold the (x, y, z)-locations of each V1 voxel. However, recall that we had a number of voxels with mostly missing values. We remove these from our location data frame below.
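For example, re-using the keep vector from the missing-value filtering sketch above:

```r
# Keep only the locations of the voxels we retained earlier
v1_locations <- v1_locations[keep, ]
```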

Next, we can plot the voxels in space.
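One possible approach (scatterplot3d is an assumption; the original plotting code is not shown):

```r
# A 3D scatterplot of the voxel locations, colored by voxel cluster
library(scatterplot3d)
scatterplot3d(v1_locations[, 1], v1_locations[, 2], v1_locations[, 3],
              color = voxel.clusters + 1, pch = 16,
              xlab = "x", ylab = "y", zlab = "z")
```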


Source: Superheat