程序代写案例-VOLUME 18

欢迎使用51辅导,51作业君孵化低价透明的学长辅导平台,服务保持优质,平均费用压低50%以上! 51fudao.top
1664 VOLUME 18 | NUMBER 11 | NOVEMBER 2015 nature neurOSCIenCe
a r t I C l e S
All individuals are unique. Nevertheless, human neuroimaging stud-
ies have traditionally collapsed data from many subjects to draw infer-
ences about general patterns of brain activity that are common across
people. Studies that contrast two populations—such as patients and
healthy controls—typically ignore the considerable heterogeneity
within each group.
Despite the predominance of such population-level inferences,
researchers have long recognized that even among the neurologically
healthy, brain structure1–3 and function4–6 show high individual vari-
ability. In terms of function, variability is found in activation patterns
during cognitive tasks4–6 as well as intrinsic organization as measured by
functional connectivity analyses of data acquired while subjects are rest-
ing7. Recently, the Human Connectome Project (HCP) set out to map
the connections in the human brain by acquiring high-quality structural
and functional MRI scans from a large number of healthy subjects8.
Many of the early analyses of HCP data have focused on elucidating the
general blueprint for brain connectivity that is shared across people. Yet
despite the gross similarities, there is reason to believe that a substantial
portion of the brain connectome is unique to each individual9.
An open question is whether this uniqueness is sufficiently observ-
able by fMRI to enable a transition from population-level studies to
investigations of single subjects. Here we show that functional connec-
tivity profiles act as an identifying fingerprint, establishing that indi-
vidual variability in connectivity is both substantial and reproducible.
Using HCP data from 126 subjects, each scanned during six separate
sessions over 2 d, we demonstrate that a functional connectivity pro-
file obtained from one session can be used to uniquely identify a given
individual from the set of profiles obtained from the second session.
We show that identification is successful between rest sessions, task
sessions and even across rest and task. These results indicate that
although changes in brain state may modulate connectivity patterns to
some degree, an individual’s underlying intrinsic functional architec-
ture is reliable enough across sessions and distinct enough from that
of other individuals to identify him or her from the group regardless
of how the brain is engaged during imaging.
Furthermore, we establish the relevance of these connectivity
profiles to behavior by demonstrating, in a fully cross-validated
analysis, that functional connectivity profiles can be used to predict
the fundamental cognitive trait of fluid intelligence in subjects. These
results provide a critical foundation for future work to novel test infer-
ences about single subjects to reveal how individual functional brain
organization relates to distinct behavioral phenotypes.
RESULTS
Data for this study consisted of scans from 126 subjects provided
in the Q2 data release of the HCP8. Each subject was scanned over
a period of 2 d. We used data from six separate imaging conditions:
two rest sessions (one on each day, here referred to as Rest1 and
Rest2) and four task sessions (working memory, emotion, motor and
language; two on each day). Functional connectivity was assessed using a
functional brain atlas10 consisting of 268 nodes covering the whole brain;
this atlas was defined on the basis of a separate population of healthy
subjects (Online Methods, Yale data set). The Pearson correlation coef-
ficients between the time courses of each possible pair of nodes were
calculated and used to construct 268 × 268 symmetrical connectivity
matrices, where each element represents a connection strength, or edge,
1Interdepartmental Neuroscience Program, Yale University, New Haven, Connecticut, USA. 2Department of Diagnostic Radiology, Yale School of Medicine,
New Haven, Connecticut, USA. 3Department of Psychology, Yale University, New Haven, Connecticut, USA. 4Department of Neurobiology, Yale University, New Haven,
Connecticut, USA. 5Department of Biomedical Engineering, Yale University, New Haven, Connecticut, USA. 6Department of Neurosurgery, Yale School of Medicine,
New Haven, Connecticut, USA. 7These authors contributed equally to this work. Correspondence should be addressed to E.S.F. ([email protected]).
Received 11 August; accepted 11 September; published online 12 October 2015; doi:10.1038/nn.4135
Functional connectome fingerprinting: identifying
individuals using patterns of brain connectivity
Emily S Finn1,7, Xilin Shen2,7, Dustin Scheinost2, Monica D Rosenberg3, Jessica Huang2, Marvin M Chun1,3,4,
Xenophon Papademetris2,5 & R Todd Constable1,2,6
Functional magnetic resonance imaging (fMRI) studies typically collapse data from many subjects, but brain functional
organization varies between individuals. Here we establish that this individual variability is both robust and reliable, using
data from the Human Connectome Project to demonstrate that functional connectivity profiles act as a ‘fingerprint’ that can
accurately identify subjects from a large group. Identification was successful across scan sessions and even between task and
rest conditions, indicating that an individual’s connectivity profile is intrinsic, and can be used to distinguish that individual
regardless of how the brain is engaged during imaging. Characteristic connectivity patterns were distributed throughout the brain,
but the frontoparietal network emerged as most distinctive. Furthermore, we show that connectivity profiles predict levels of fluid
intelligence: the same networks that were most discriminating of individuals were also most predictive of cognitive behavior.
Results indicate the potential to draw inferences about single subjects on the basis of functional connectivity fMRI.
np
g
©
20
15
N
at
ur
e A
m
er
ic
a,
In
c.
A
ll
rig
ht
s
re
se
rv
ed
.
nature neurOSCIenCe VOLUME 18 | NUMBER 11 | NOVEMBER 2015 1665
a r t I C l e S
between two nodes. This was done for each subject for each condition
separately, such that each subject had a total of six matrices reflecting
connectivity patterns during each of the different scan sessions.
Identification was performed across pairs of scans consisting of
one ‘target’ and one ‘database’ session, with the requirement that the
target and database sessions be taken from different days: for example,
Rest1 matrices were used as the target and compared to a database
of Rest2 matrices (Fig. 1a). In an iterative process, one individual’s
connectivity matrix was selected from the target set and compared
against each of the connectivity matrices in the database to find the
matrix that was maximally similar (Fig. 1b). Similarity was defined
as the Pearson correlation coefficient between vectors of edge values
taken from the target matrix and each of the database matrices. Once
an identity had been predicted, the true identity of the target matrix
was decoded and that iteration was assigned a score of 1, if the pre-
dicted identity matched the true identity, or 0, if it did not. Within a
target-database pair, each individual target connectivity matrix was
tested against the database in an independent trial.
We tested identification across all possible pairs of scan sessions
(Fig. 1a; nine pairs, each with two possible configurations created by
exchanging the roles of target and database session). In each case, the
success rate was measured as the percentage of subjects whose identity
was correctly predicted out of the total number of subjects.
Connectivity-based identification of individual subjects
Whole-brain identification. As a first pass, identification was
performed using the whole-brain connectivity matrix (268 nodes;
35,778 edges), with no a priori network definitions. The success
rate was 117/126 (92.9%) and 119/126 (94.4%) based on a target-
database of Rest1-Rest2 and the reverse Rest2-Rest1, respectively.
The success rate ranged from 68/126 (54.0%) to 110/126 (87.3%)
with other database and target pairs, including rest-to-task and task-
to-task comparisons.
Given that the 126 identification trials were not independent
from one another, we performed nonparametric permutation
testing to assess the statistical significance of these results
(Online Methods). Across 1,000 iterations, the highest success
rate achieved was 6/126, or roughly 5%. Thus the P value associated
with obtaining at least 68 correct identifications (the minimum rate
we achieved) is 0.
Network-based identification. We next tested identification
accuracy on the basis of each of eight specific functional networks
to test the hypothesis that certain brain networks contribute more
to individual subject discriminability than others. These networks
were derived from the same set of healthy subjects used to define
the 268-node atlas (Fig. 1c). Two networks emerged as the most
successful in individual subject identification: the medial frontal
network (network 1) and the frontoparietal network (network 2),
both comprised of higher-order association cortices in the fron-
tal, parietal and temporal lobes. We also tested a combination of
these networks to determine whether this combination would
afford even higher predictive power than each network on its own.
Thus, we determined identification rates based on each network
separately, the combination of networks 1 and 2 (referred to for
convenience as the frontoparietal networks) as well as the whole
brain for each of the nine database and target pairs (Fig. 2a,b).
Frontoparietal-based identification accuracy was extremely high
between Rest1 and Rest2 (98–99%). Accuracy dropped slightly
when identification was performed between rest and task or two task
conditions, but remained high (80–90% for most condition pairs).
The combination of these two frontoparietal networks significantly
outperformed either network alone and the whole brain across all 18
database-to-target pairs (one-tailed paired t-test versus network 1,
t17 = 10.4, P < 10−9; versus network 2, t17 = 1.97, P = 0.03; versus whole
brain, t17 = 5.1, P < 10−5).
Day 1
WM MtR1
R2 Lg Em
Day 2
Subj 1 Subj 2 Subj N
Target
matrix
r1 r2 rN
ID* = argmax({r1, r2, ..., rN})
1. Medial frontal
2. Frontoparietal
3. Default mode
4. Subcortical-cerebellum
5. Motor
6. Visual l
7. Visual ll
8. Visual association
Database matrices
b
c
aFigure 1 Identification analysis procedure
and network definitions. (a) Database and
target design. Data for each subject was
collected in six fMRI sessions: a resting-state
session (R1), a working-memory task (WM)
and a motor task (Mt) on day 1 and a resting-
state session (R2), a language task (Lg)
and an emotion task (Em) on day 2.
For identification, we used a set of connectivity
matrices from one session for the database
and connectivity matrices from a second
session acquired on a different day as the
target set. All possible combinations of
database and target sessions are indicated
by the arrows connecting session pairs.
(b) Identification procedure. Given a query
connectivity matrix from the target set, we
computed the correlations between this
matrix and all the connectivity matrices in the
database. The predicted identity ID* is the
one with the highest correlation coefficient
(argmax). Subj, subject. (c) Node and network
definitions. We used a 268-node functional
atlas defined on an independent data set of
healthy control subjects using a group-wise
spectral clustering algorithm10. Nodes were
further grouped into networks (1–10) using the
same clustering algorithm, and networks were
named according to their correspondence to
other existing resting-state network definitions.
np
g
©
20
15
N
at
ur
e A
m
er
ic
a,
In
c.
A
ll
rig
ht
s
re
se
rv
ed
.
1666 VOLUME 18 | NUMBER 11 | NOVEMBER 2015 nature neurOSCIenCe
a r t I C l e S
Factors affecting identification accuracy
We next explored several factors affecting identification accuracy,
including quantifying contributions of individual edges to subject
discriminability, varying the length of the time courses used to calcu-
late connectivity matrices, expanding the database set to two matri-
ces, comparing different node and network schemes and ruling out
potential confounds (motion and anatomic differences).
Quantifying edgewise contributions to identification. To quantify
the extent to which different edges contribute to subject identification,
we derived two measures: the differential power (DP) and the group
consistency (Φ). DP reflects each edge’s ability to distinguish an indi-
vidual subject by quantifying how ‘characteristic’ that edge tends to be,
such that edges with a high DP tend to have similar values within an
individual across conditions, but different values across individuals
regardless of condition. Φ quantifies the consistency of a connection
within a subject and across the group (Online Methods).
Restricting analysis to Rest1 and Rest2, we calculated DP and Φ
for all edges in the brain and determined which edges were in
the 99.5 percentile on each of these two measures (Fig. 3a). We
found that majority of edges in the 99.5 percentile of DP are in
the frontal, temporal and parietal lobes and involve nodes in
the frontoparietal networks (networks 1 and 2) or default mode
(network 3). This result was stable across percentile thresh-
olds (Supplementary Table 1). This data-driven mathematical
definition of characteristic edges recapitulates the results of our
network-based analysis, showing that in general, connections involving
higher-order association cortices are the most discriminating of
subjects. Approximately 28% of the edges with high DP were within
and between networks 1 and 2. Another 48% were edges linking
these two networks to other networks (Fig. 3a), suggesting that
levels of interaction between the frontoparietal networks and the
rest of the brain are also highly discriminating.
Conversely, the majority of high-Φ edges connected cross-
hemisphere homologs in the occipital and parietal lobes (Fig. 3a).
Many of these edges were within the motor network (network 5) or
the primary visual network (network 6; Fig. 3a). These edges are
highly consistent both within and across subjects, and thus do not
contribute substantially to individual subject identification.
Identification using shorter time courses. Scan sessions differed in
duration, meaning that the amount of data used to compute the con-
nectivity profiles varied between sessions. Sessions Rest1 and Rest2
contained two runs of 1,200 brain volumes (time points) each, and
were substantially longer than the working memory scan (two runs of
405 time points), the language scan (316), the motor scan (284) and
the emotion scan (176). To investigate the effect of the number of time
points on identification power, we performed frontoparietal-based
identification between the two rest sessions while varying the number
of time points used to calculate connectivity matrices between 100
and 1,100. We observed that longer time courses better preserved
individual characteristics in connectivity profiles (Fig. 3b) and that
temporal variability in the connectivity profiles degraded identifica-
tion based on shorter time courses, especially those with fewer than
~500 time points.
Identification based on two-matrix database. We found, as expected,
that individual identification is more challenging when performed
across connectivity matrices obtained from different task conditions
or from task and rest conditions (even after taking into account the
fact that task runs contain fewer time points) (Fig. 2). It is well known
that connectivity patterns can be modulated by different imaging
conditions11–13, and such modulation contributes to intra-subject
variation, making identification more difficult. In an effort to cap-
ture the intra-subject variation, we performed identification using
an expanded database that included two connectivity matrices per
1.0
R1
R2
0.8
0.6
0.4
0.2
0
1 2 3 4 5 6 7 8
1+
2 Al
l 1 2 3 4 5 6 7 8
1+
2 Al
l 1 2 3 4 5 6 7 8
1+
2 Al
l
1 2 3 4 5 6 7 8
1+
2 Al
l 1 2 3 4 5 6 7 8
1+
2 Al
l 1 2 3 4 5 6 7 8
1+
2 Al
l
1 2 3 4 5 6 7 8
1+
2 Al
l 1 2 3 4 5 6 7 8
1+
2 Al
l 1 2 3 4 5 6 7 8
1+
2 Al
l
1.0
0.8
0.6
0.4
0.2
0
1.0
0.8
0.6
0.4
0.2
0
1.0
0.8
0.6
0.4
0.2
0I
de
nt
ifi
ca
tio
n
ra
te
Network
0.99 0.98
Frontoparietal networks
Testing
T
esting
Database
D
at
ab
as
e
R2
R1
WM
Mt
R1
WM
Mt
R2Lg
0.85 0.82 0.89 0.92
0.790.900.89
0.92
ID rate
0.5 1.0
0.86 0.93
0.86
0.79
0.90
0.64
0.87
0.83
LgEm Em
1.0
0.8
0.6
0.4
0.2
0
1.0
0.8
0.6
0.4
0.2
0
1.0
0.8
0.6
0.4
0.2
0
1.0
0.8
0.6
0.4
0.2
0
R1
Lg 1.0
0.8
0.6
0.4
0.2
0
R1
Em
Em
WM
Em
Mt
Lg
Mt
Lg
WMWM
R2
R2
Mt
a b
Figure 2 Identification accuracy across session pairs and networks. (a) Identification accuracy based on all nine database and target pairs, where
each row shares the same database session and each column shares the same target session. Bar shading (black or gray) indicates which session was
used as the database (with the other session serving as the target), according to the legend above each graph. Graphs show accuracy based on each
individual network as well as a combination of networks 1 and 2 (the frontoparietal networks) and the whole brain (all). (b) Identification (ID) results
from the combined frontoparietal networks (top) are highlighted in color-coded matrices (bottom) to compare accuracy across rest-rest, rest-task and
task-task session pairs. R1, Rest1; WM, working-memory task; Mt, motor task; R2, Rest2; Lg, language task; Em, emotion task.
np
g
©
20
15
N
at
ur
e A
m
er
ic
a,
In
c.
A
ll
rig
ht
s
re
se
rv
ed
.
nature neurOSCIenCe VOLUME 18 | NUMBER 11 | NOVEMBER 2015 1667
a r t I C l e S
subject (one rest session and one task session from the same day, with
the target matrix from a task session on the other day; networks 1 and
2 were used for this analysis). In all cases, accuracy was improved
using the two-matrix database over a database of either the rest or
task session alone (rank sum = 68, two-sided P = 0.004 versus rest
alone; rank sum = 100, two-sided P = 0.0002 versus task alone, Mann-
Whitney U test). The average success rate increased to 97.2%, with
a maximum rate of 100% (average rates were 82.1% with a task-only
database and 86.9% with a rest-only database) (Fig. 3c). This suggests
that the within-subject variability is well captured by a linear model
that includes a baseline (connectivity at rest) and a deviation intro-
duced by an independent task (connectivity during task).
Effects of parcellation scheme. To investigate the sensitivity of iden-
tification to the specific choice of parcellation atlas and network defi-
nitions, we repeated the identification experiments using connectivity
matrices calculated from the 68-node FreeSurfer atlas14 and grouped
into seven networks using the scheme described in Yeo et al.15 (below
referred to as the FreeSurfer-Yeo scheme), where networks 3, 4 and 6
represent the frontoparietal networks. Between the two rest sessions,
the identification rate based on this atlas was ~89% when the whole
brain was used and ~75% when the frontoparietal networks were used
(Fig. 4a). The reduction in identification accuracy compared to our
268-node functional parcellation and corresponding network defi-
nitions, especially in the case of frontoparietal-based identification,
suggests that a relatively high-resolution parcellation contributes to
the detection of individual variability and boosts identification rate.
To further investigate the difference in identification accuracy,
we calculated correlations between connectivity matrices of all 126
subjects from Rest1 and Rest2 on the basis of our atlas and network
scheme and the FreeSurfer-Yeo scheme. In a comparison of the raw
cross-subject correlation coefficients for the frontoparietal networks
(Fig. 4b), we found that with the FreeSurfer-Yeo scheme, the raw coe-
fficients were larger for both matched (diagonal) subjects (t250 = −4.3,
P < 10−5) and unmatched (off-diagonal) subjects (t31,750 = −18.0,
P < 10−72) (Fig. 4b), which does not explain the difference in identifi-
cation accuracy. To control for equal global distribution of correlation
coefficients, we normalized both matrices (Fig. 4c). After normalization,
there was no difference between schemes in off-diagonal elements
(t31,750 = 0.27, P = 0.79); however, using our parcellation and network
scheme, the diagonal elements were significantly larger than with
the FreeSurfer-Yeo scheme (t250 = 14.6, P < 10−35), underpinning the
increase in identification rate (Fig. 4c).
Effects of head motion. As head motion is a known confound of
connectivity analyses16, we tested whether subjects could be identi-
fied on the basis of the distribution of their frame-to-frame motion
during functional scans (Online Methods). Identification based on
each subject’s motion distribution had an average success rate of 2.4%,
well below the identification accuracy achieved using connectivity
profiles. Thus it is unlikely that identification power is based on
idiosyncratic patterns related to motion in the scanner.
Effects of anatomical differences. The HCP provides functional data
that have been normalized to a common space. However, small errors
are inherent to the normalization process, and such errors result in
part from individual differences in anatomy, which are constant in
time. Thus, the influence of individual brain anatomy is hard to elimi-
nate completely from the functional data, as these errors could bias
the individual identification to select the same subject on two different
days. Nonetheless, any such influence from anatomy should remain
largely static and should not be modulated by task conditions.
To confirm that identification power came from true differences
in functional connectivity rather than anatomic idiosyncrasies, we
recalculated connectivity matrices using different smoothing kernels
(4 mm, 6 mm and 8 mm) for the blood oxygen level–dependent (BOLD)
data. With larger smoothing kernels, the registration advantages for
the same brain compared to different brains should be vastly reduced
or eliminated. Yet we saw only a slight drop in identification power:
with Rest1 and Rest2 pairs, identification using the frontoparietal
Figure 3 Factors affecting identification
accuracy. (a) Highly unique (DP, top, red)
and highly consistent (Φ, bottom, blue)
edges in individual connectivity profiles.
For visualization, both sets of edges were
thresholded at the 99.5 percentile. In the
circle plots (left), the 268 nodes (inner circle)
are organized into a lobe scheme (outer circle)
roughly reflecting brain anatomy from anterior
(top of circle) to posterior (bottom of circle),
and split into left and right hemispheres; lines
indicate edges. In the colored matrices (right),
the same data are plotted as percentage of
edges within and between each pair of networks;
a darkly shaded cell indicates a relative over-
representation of that network pair in the DP
(top) or Φ (bottom) masks. PFC, prefrontal
cortex; mot, motor; ins, insula; par, parietal;
tem, temporal; occ, occipital; lim, limbic
(including cingulate cortex, amygdala and
hippocampus); cer, cerebellum; sub, subcortical
(including thalamus and striatum); bsm, brainstem; L, left hemisphere; R, right hemisphere. (b) Longer time series improve identification accuracy.
To control for the fact that task sessions contained fewer time points than rest sessions, we recalculated rest connectivity matrices using truncated
time series containing between 100 and 1,100 time points. Results shown are from 500 randomizations using Rest1 and Rest2 as the database
and target sessions, respectively. Boxplots represent median (stripe), 25th and 75th percentiles (box) and range (whiskers). (c) Use of a two-matrix
database improves identification rate relative to a single matrix (task or rest). Dots and error bars represent mean and range of identification rate across
all possible database and target pairs, where the target matrix was always from a task session and the database consisted of a rest-task pair (n = 8
combinations), task only (n = 8) or rest only (n = 4). *P < 0.01, Mann-Whitney U test.
1
DP
L
PFC
Mot
Ins
Par
Tem
Occ
Lim
Sub
Bsm
Cer
R
L R
2
3
4
5
6
7
8
1 2 3 4 5 6 7 8
0.12
1.0
0.9
0.8
0.7
0.6
0.5
1 2 3 4
Number of time points (×100)
ID
r
at
e
1.0
0.9
0.8
0.7
0.6
0.5
ID
r
at
e
5 6 7 8 9 10 11
0.06
0
0.2
0.1
0
Networks
F
ra
ct
io
n
D
P
e
dg
es
F
ra
ct
io
n
Φ
e
dg
es
1
2
3
4
5
6
7
8
1 2 3 4 5 6 7 8
Networks

Task
only
Rest +
task
Rest
only
* *
a b
c
np
g
©
20
15
N
at
ur
e A
m
er
ic
a,
In
c.
A
ll
rig
ht
s
re
se
rv
ed
.
1668 VOLUME 18 | NUMBER 11 | NOVEMBER 2015 nature neurOSCIenCe
a r t I C l e S
networks remained above 96% for all three smoothing levels
(Supplementary Table 2).
We also investigated whether individuals could be identified by
BOLD signal variance in each node (Online Methods). Although it is
likely that BOLD variance reflects metabolic function to a substantial
degree, it could also be influenced by anatomic factors, such as
differences among subjects in number of gray-matter voxels per node.
This identification was successful, ranging 48–87% across different
combinations of database and target sessions, but in all cases it was
less successful than connectivity-based identification (Fig. 2b and
Supplementary Table 3). Crucially, if BOLD variance were driven
mostly by anatomic properties, it should be fairly constant regardless
of how the brain is engaged during imaging, and thus the accuracy
of variance-based identification should not differ according to brain
state. That there was considerable variability with brain state further
suggests that functional rather than anatomic features are responsible
for the high degree of identification accuracy observed.
Connectivity profiles predict cognitive behavior
To determine whether individual differences in functional connec-
tivity are relevant to individual differences in behavior, we tested
whether connectivity profiles could be used to predict subjects’
level of fluid intelligence. Fluid intelligence (gF) is the capacity
for on-the-spot reasoning to discern patterns and solve problems
independently of acquired knowledge17. Levels of gF vary widely
in the population18 and correlate with many other cognitive abilities
and life outcomes19–22; investigating the biological basis of gF is
of interest, as it is thought to reflect intrinsic cognitive ability.
In the HCP protocol, gF was assessed using a form of Raven’s progres-
sive matrices with 24 items23 (scores are integers indicating number
of correct items).
We used leave-one-subject-out cross-validation to demonstrate
that gF can be predicted based solely on the connectivity profile of a
previously unseen individual. In this iterative process, data from one
subject is set aside as the test set, and data from the remaining n − 1
subjects is used as the training set. Each iteration consisted of three
steps: (i) feature selection, in which edges with a significant relation-
ship to gF are identified in the training set and separated into two tails
according to sign (positive or negative); (ii) model building, in which
training data are used to fit two simple linear regressions between gF
and a summary statistic of connectivity strength in the positive- and
negative-feature network, respectively, and (iii) prediction, in which
data from the excluded subject are input into each model to generate
a predicted gF score (Online Methods). Following all iterations, we
assessed the predictive power of each model by correlating predicted
and observed gF scores across all subjects. Data from the Rest1 ses-
sion were used here.
Similarly to the identification analysis, as a first pass, we tested
whether whole-brain connectivity could predict gF in novel subjects.
On the basis of a feature-selection threshold of P < 0.01, the correla-
tion between predicted and observed gF scores was r = 0.50 (P < 10−9)
for the positive-feature model (Fig. 5a) and r = 0.26 (P = 0.005) for the
negative-feature model. Although both models generated significant
predictions, the positive-feature model was more accurate than the
negative-feature model (z = 2.15, two-tailed P = 0.03). Prediction
was significant across all three feature-selection thresholds tested:
all r > 0.29, P ≤ 0.001 for the positive tail; all r > 0.22, P ≤ 0.01 for the
negative tail. (The predicted range was narrower than the observed
range in Fig. 5a; thus the model was most successful at generating
predictions of gF level relative to other subjects.)
Owing to the nature of our cross-validated approach, a slightly
different set of edges was selected in each iteration. To explore which
networks contributed the most predictive power, for each of the eight
networks we calculated the average number of within-network edges
selected across all iterations and normalized by the total number of
within-network edges (to control for differences in overall network
sizes). Networks 1 and 2 contributed the highest fraction of edges to
the positive model, and network 3 contributed the highest fraction
of edges to the negative model; this was consistent across statistical
thresholds for feature selection (Fig. 5b). Thus, edges that show a con-
sistent positive correlation with gF are disproportionately located in
the frontoparietal networks, and edges that show a consistent negative
correlation with gF are mostly in the default-mode network.
As a second-pass analysis, we tested directly whether predictive
power varied across the different networks; specifically, whether the
networks that performed best for identification (the frontoparietal
networks) also performed best for predicting cognitive behavior. To
do this, we repeated the leave-one-subject-out cross-validated proce-
dure described above, this time restricting the feature selection step
to features (within-network edges) from each of the eight networks
in turn. We also tested a combination of networks 1 and 2. Thus,
nine sets of predicted gF scores were generated. Each of these was
correlated with observed gF scores to assess the predictive power of
each network or network combination (Fig. 5c,d).
Figure 4 Effect of node and network scheme
on identification accuracy. (a) Identification
accuracy with the node atlas and network
definitions described in Shen et al.10 (Shen)
versus the FreeSurfer-Yeo scheme14,15
(FS + Yeo). (b) Raw 126 × 126 cross-subject
correlations of frontoparietal connectivity
patterns from Rest1 and Rest2 (top).
Row and column subject order is symmetric;
thus, diagonal elements are correlation scores
from matched subjects. Mean correlation
coefficients for both diagonal (matched) and
off-diagonal (unmatched) elements are also
shown (bottom, error bars represent ±s.d.).
The overall correlation coefficients are higher
using the FreeSurfer-Yeo scheme, for both
diagonal elements (n = 126) and off-diagonal elements (n = 15,876). **P < 10−5, two-tailed t-test. (c) Cross-subject correlation matrices after z-score
normalization (top). The global difference in correlation values is eliminated as there is no significant (n.s.) difference in the off-diagonal z scores.
However, correlations between diagonal elements are significantly higher using the Shen scheme than the FreeSurfer-Yeo scheme (bottom; error bars
represent ±s.d.), which helps account for the increase in identification accuracy using the Shen scheme. **P < 10−5, two-tailed t-test.
1.0
0.9
0.8
0.7
0.6
0.5
ID
r
at
e
Frontoparietal
Whole brain
Shen FS + Yeo
Non-normalized
Shen FS + Yeo
1
0
Diagonal Off-diagonal
1.0
0.8
0.6
0.4
0.2
0
**
**
Sh
en
FS
+
Y
eo
Sh
en
FS
+
Y
eo
Sh
en
FS
+
Y
eo
Sh
en
FS
+
Y
eo
2.0
1.5
1.0
0.5
0
z
sc
or
e
z score
**
n.s.
Diagonal Off-diagonal
Normalized
Shen FS + Yeo
2.2
0
r
va
lu
e
r value
a b c
np
g
©
20
15
N
at
ur
e A
m
er
ic
a,
In
c.
A
ll
rig
ht
s
re
se
rv
ed
.
nature neurOSCIenCe VOLUME 18 | NUMBER 11 | NOVEMBER 2015 1669
a r t I C l e S
As hypothesized, predictive power based on the positive features
was highest for the frontoparietal networks (at a feature-selection
threshold of P < 0.01, r = 0.42, P < 10−6 and r = 0.39, P < 10−5 for
networks 1 and 2, respectively; r = 0.50, P < 10−9 for the two net-
works combined; this pattern was consistent across different statistical
thresholds used for feature selection). The subcortical-cerebellar
network (network 4) also had significant predictive power at a feature-
selection threshold of P < 0.01 (r = 0.22, P = 0.01) but this result was
less consistent across feature-selection thresholds. Predictive power
based on the negative features was significant only for the default
mode (network 3; r = 0.35, P < 10−5 at P < 0.01); this result was con-
sistent across feature-selection thresholds.
These results reinforce the functional relevance of our identifica-
tion analyses, in that the networks most discriminating of individu-
als are also the most relevant to individual differences in behavior.
Crucially, the relationship between connectivity and cognitive ability
is sufficiently robust to generalize to previously unseen subjects.
DISCUSSION
Here we show that an individual’s functional brain connectivity
profile is both unique and reliable, similarly to a fingerprint. We
demonstrate that it is possible, with near-perfect accuracy in many
cases, to identify an individual from a large group of subjects solely on
the basis of his or her connectivity matrix. Although inter-individual
consistency in functional brain networks has been well characterized
across task and rest conditions24,25 and even states of consciousness26,
the intra-individual reliability observed here suggests that the gen-
eral blueprint may be shared, but functional organization within
individual subjects is idiosyncratic and relatively robust to changes
in brain state, and provides meaningful information beyond the
common template27.
We also demonstrate that this individual variability is relevant to
individual differences in behavior, in that connectivity profiles can
be used to predict the fundamental cognitive trait of fluid intelligence
in novel subjects. These results underscore the potential to discover
fMRI-based connectivity ‘neuromarkers’ of present or future behavior
that may eventually be used to personalize educational and clinical
practices and improve outcomes28–30.
Anatomic loci of distinguishing connectivity features
Although identification based on the whole-brain connectivity matrix
was highly successful, performance was best using a combination of
the two frontoparietal networks. These networks are comprised of
higher-order association cortices rather than primary sensory regions;
the cortical regions are also the most evolutionarily recent31 and show
the highest inter-subject variance7,32,33.
That the frontoparietal networks were most distinguishing of indi-
viduals—and the most predictive of behavior—is consistent with their
function in cognition. Nodes in these networks tend to act as flexible
hubs, switching connectivity patterns according to task demands34.
Additionally, broadly distributed across-network connectivity has been
reported in these regions35, suggesting a role in large-scale coordination
0.3
0.2
0.1
0
Negative features
Negative-feature model
1 2 3 4 5 6 7 8
Network
M
ea
n
fr
ac
tio
n
of
e
dg
es
P < 0.01
P < 0.05
P < 0.10
P < 0.01
P < 0.05
P < 0.10
0.6
0.4
0.2
0
–0.2
1+2 1 2 3 4 5 6 7 8
Network
r
va
lu
e * *
25
20
15
10
5
25
20
15
10
5
5 1510 20 25
Observed gF
Frontoparietal networks
P
re
di
ct
ed
g
F
P
re
di
ct
ed
g
F
Whole brain
5 10 15 20 25
Observed gF
r = 0.50
P < 10–9
r = 0.50
P < 10–9
a
c
0.3
0.2
0.1
0
Network
1 2 3 4 5 6 7 8
M
ea
n
fr
ac
tio
n
of
e
dg
es
P < 0.01
P < 0.05
P < 0.10
Positive features
Positive-feature model
0.6
0.4
0.2
0
–0.2
1+2 1 2 3 4 5 6 7 8
Network
P < 0.01
P < 0.05
P < 0.10
r
va
lu
e
***
*** ***
** *
b
d
* * *
Figure 5 Individual connectivity profiles predict cognitive behavior. (a) Results from a leave-one-subject-out cross-validation (LOOCV) analysis comparing
predicted and observed fluid intelligence (gF) scores (n = 118 subjects). Scatter plot shows predictions based on the whole brain in the positive-feature
network at a feature-selection threshold of P < 0.01. Each dot represents one subject; gray area represents 95% confidence interval for best-fit line, used to
assess predictive power of the model. (b) Mean fraction of within-network edges selected in the whole-brain positive-feature (left, red) and negative-feature
(right, blue) models, shown at a range of statistical thresholds for feature selection. y axis indicates mean fraction of edges selected across all LOOCV
iterations; x axis indicates network (labeled as in Fig. 1c). (c) Results from a LOOCV analysis in which feature selection was restricted to within-network
edges in the frontoparietal networks (1 and 2), at a feature-selection threshold of P < 0.01. Each dot represents one subject; gray area represents 95%
confidence interval for best-fit line. (d) Results from nine LOOCV analyses in which feature selection was restricted to within-network edges in each of the
eight networks plus a combination of networks 1 and 2. y axis indicates correlation between predicted and observed gF scores; x axis indicates network
label. *P < 0.05, uncorrected. Results based on a range of feature-selection thresholds (P values) are shown to demonstrate consistency across thresholds.
For some networks, no features passed the statistical thresholding step and thus it was not possible to generate predictions; this is reflected by missing bars.
np
g
©
20
15
N
at
ur
e A
m
er
ic
a,
In
c.
A
ll
rig
ht
s
re
se
rv
ed
.
1670 VOLUME 18 | NUMBER 11 | NOVEMBER 2015 nature neurOSCIenCe
a r t I C l e S
of brain activity. Although the frontoparietal network is particularly
active in tasks requiring a high degree of cognitive control, here we
show that it can identify individuals regardless of whether the data
are collected during a task or at rest. Training cross-subject classifiers
based on frontoparietal connectivity to predict which task a subject is
performing yields classification accuracy that is statistically significant
but still low34; the present findings of high inter-individual variability
may help explain these results. In light of this, future work might use
within-subject classification to explore how the frontoparietal net-
works reorganize according to task demands in individual subjects.
Similarly, the frontoparietal networks emerged as most predictive
of gF, a finding that is consistent with previous reports that structural
and functional properties of these networks relate to intelligence36–38.
Aberrant functional connectivity in the frontoparietal networks has
been linked to a variety of neuropsychiatric illnesses39,40; the work
presented here, although focused on healthy subjects, suggests that
sensitivity may be compromised in studies of disease if inferences
are drawn only at the group level. New insights into neuropsychiatric
illnesses may be gained from an approach that links individual func-
tional connectivity profiles to a spectrum of behavioral and symptom
measures rather than a single diagnosis41,42.
Additional considerations
The discriminating power of connectivity profiles here is a result of inte-
gration over a relatively long period of time (that is, runs that last several
minutes). There is a growing body of literature43 showing that in the rest-
ing state, functional connectivity is dynamic and varies considerably over
short periods of time (that is, intervals shorter than 1 min). Future work
may seek to characterize individuals on the basis of properties of these
dynamic fluctuations; however, our results indicate that single measures
of time-averaged functional connectivity based on relatively long scan
sessions provide meaningful information about individuals.
We note also that our cross-session identification was done between
sessions separated by a single day; it remains unclear to what degree
individual connectivity profiles are consistent across the lifespan.
Cross-sectional studies have shown changes in functional connectivity
with age44–46, but future work should employ longitudinal designs to
test the stability or evolution of the functional connectivity fingerprint
over months or years rather than days.
From a methodological perspective, the scale of the node atlas
seems to influence identification accuracy. The parcellation used in
our primary analysis consisted of 268 nodes across the whole brain,
with each node optimized to contain voxels with similar resting-state
time courses10. This number is consistent with the range postulated
by other groups47,48 but represents a more fine-grained scheme
than other atlases, such as the automatic anatomic labeling atlas (90
nodes)49 or the FreeSurfer atlas (68 nodes). In a comparison of iden-
tification rates, network definitions based on our high-resolution
parcellation outperformed networks based on the FreeSurfer atlas; it
is likely that the coarser node size of the latter diminishes accuracy
by averaging out individual variability.
Conclusion
Together, these findings suggest that analysis of individual fMRI
data is possible and indeed desirable. Given this foundation, human
neuroimaging studies have an opportunity to move beyond popula-
tion-level inferences, in which general networks are derived from the
whole sample, to inferences about single subjects, examining how
individuals’ networks are functionally organized in unique ways and
relating this functional organization to behavioral phenotypes in both
health and disease.
METHODS
Methods and any associated references are available in the online
version of the paper.
Note: Any Supplementary Information and Source Data files are available in the
online version of the paper.
AckNoWlEdgmENTS
Data were provided in part by the Human Connectome Project, WU-Minn
Consortium (principal investigators, D. Van Essen and K. Ugurbil;
1U54MH091657) funded by the 16 US National Institutes of Health (NIH)
institutes and centers that support the NIH Blueprint for Neuroscience Research;
and by the McDonnell Center for Systems Neuroscience at Washington University.
This work was also supported by NIH grant EB009666 (R.T.C.), T32 DA022975
(D.S.) and the US National Science Foundation Graduate Research Fellowship
Program (E.S.F. and M.D.R.).
AUTHoR coNTRIBUTIoNS
E.S.F., X.S., D.S., X.P. and R.T.C. conceptualized the study. X.S. designed and
performed the identification analyses with support from E.S.F. and D.S. E.S.F.
designed and performed the behavioral analyses with support from M.D.R. and
J.H. X.S. and X.P. contributed unpublished data analysis tools and visualization
software. X.P., M.M.C. and R.T.C. provided support and guidance with data
interpretation. E.S.F. wrote the manuscript, with contributions from X.S. and
comments from all other authors.
comPETINg FINANcIAl INTERESTS
The authors declare no competing financial interests.
Reprints and permissions information is available online at http://www.nature.com/
reprints/index.html.
1. Mangin, J.F. et al. A framework to study the cortical folding patterns. Neuroimage
23 (suppl. 1): S129–S138 (2004).
2. Amunts, K., Malikovic, A., Mohlberg, H., Schormann, T. & Zilles, K. Brodmann’s
areas 17 and 18 brought into stereotaxic space-where and how variable? Neuroimage
11, 66–84 (2000).
3. Bürgel, U. et al. White matter fiber tracts of the human brain: three-dimensional
mapping at microscopic resolution, topography and intersubject variability.
Neuroimage 29, 1092–1105 (2006).
4. Grabner, R.H. et al. Individual differences in mathematical competence predict
parietal brain activation during mental calculation. Neuroimage 38, 346–356
(2007).
5. Newman, S.D., Carpenter, P.A., Varma, S. & Just, M.A. Frontal and parietal
participation in problem solving in the Tower of London: fMRI and computational
modeling of planning and high-level perception. Neuropsychologia 41, 1668–1682
(2003).
6. Rypma, B. & D’Esposito, M. The roles of prefrontal brain regions in components
of working memory: effects of memory load and individual differences. Proc. Natl.
Acad. Sci. USA 96, 6558–6563 (1999).
7. Mueller, S. et al. Individual variability in functional connectivity architecture of the
human brain. Neuron 77, 586–595 (2013).
8. Van Essen, D.C. et al. The WU-Minn Human Connectome Project: an overview.
Neuroimage 80, 62–79 (2013).
9. Barch, D.M. et al. Function in the human connectome: Task-fMRI and individual
differences in behavior. Neuroimage 80, 169–189 (2013).
10. Shen, X., Tokoglu, F., Papademetris, X. & Constable, R Groupwise whole-brain
parcellation from resting-state fMRI data for network node identification. Neuroimage
82, 403–415 (2013).
11. Bianciardi, M. et al. Modulation of spontaneous fMRI activity in human visual cortex
by behavioral state. Neuroimage 45, 160–168 (2009).
12. Jiang, T., He, Y., Zang, Y. & Weng, X. Modulation of functional connectivity during
the resting state and the motor task. Hum. Brain Mapp. 22, 63–71 (2004).
13. Stevens, W.D., Buckner, R.L. & Schacter, D.L. Correlated low-frequency BOLD
fluctuations in the resting human brain are modulated by recent experience in
category-preferential visual regions. Cereb. Cortex 20, 1997–2006 (2010).
14. Fischl, B. et al. Automatically parcellating the human cerebral cortex. Cereb. Cortex
14, 11–22 (2004).
15. Buckner, R.L., Krienen, F.M., Castellanos, A., Diaz, J.C. & Yeo, B.T. The organization
of the human cerebellum estimated by intrinsic functional connectivity.
J. Neurophysiol. 106, 2322–2345 (2011).
16. Van Dijk, K.R., Sabuncu, M.R. & Buckner, R.L. The influence of head motion on
intrinsic functional connectivity MRI. Neuroimage 59, 431–438 (2012).
17. Cattell, R.B. Intelligence: Its Structure, Growth and Action: Its Structure, Growth
and Action (Elsevier, 1987).
18. Deary, I.J., Whalley, L.J., Lemmon, H., Crawford, J.R. & Starr, J.M. The stability of
individual differences in mental ability from childhood to old age: follow-up of the
1932 Scottish Mental Survey. Intelligence 28, 49–55 (2000).
np
g
©
20
15
N
at
ur
e A
m
er
ic
a,
In
c.
A
ll
rig
ht
s
re
se
rv
ed
.
nature neurOSCIenCe VOLUME 18 | NUMBER 11 | NOVEMBER 2015 1671
a r t I C l e S
19. Colom, R. & Flores-Mendoza, C.E. Intelligence predicts scholastic achievement
irrespective of SES factors: evidence from Brazil. Intelligence 35, 243–251 (2007).
20. Strenze, T. Intelligence and socioeconomic success: a meta-analytic review of
longitudinal research. Intelligence 35, 401–426 (2007).
21. Gottfredson, L.S. Intelligence: is it the epidemiologists’ elusive “fundamental cause”
of social class inequalities in health? J. Pers. Soc. Psychol. 86, 174 (2004).
22. Chandola, T., Deary, I., Blane, D. & Batty, G. Childhood IQ in relation to obesity
and weight gain in adult life: the National Child Development (1958) Study.
Int. J. Obes. (Lond.) 30, 1422–1432 (2006).
23. Bilker, W.B. et al. Development of abbreviated nine-item forms of the Raven’s
Standard Progressive Matrices Test. Assessment 19, 354–369 (2012).
24. Cole, M.W., Bassett, D.S., Power, J.D., Braver, T.S. & Petersen, S.E. Intrinsic and
task-evoked network architectures of the human brain. Neuron 83, 238–251 (2014).
25. Smith, S.M. et al. Correspondence of the brain’s functional architecture during
activation and rest. Proc. Natl. Acad. Sci. USA 106, 13040–13045
(2009).
26. Martuzzi, R., Ramani, R., Qiu, M., Rajeevan, N. & Constable, R.T. Functional
connectivity and alterations in baseline brain state in humans. Neuroimage 49,
823–834 (2010).
27. Laumann, T.O. et al. Functional system and areal organization of a highly sampled
individual human brain. Neuron 87, 657–670 (2015).
28. Gabrieli, John D.E., Ghosh, Satrajit S. & Whitfield-Gabrieli, S. Prediction as a
humanitarian and pragmatic contribution from human cognitive neuroscience.
Neuron 85, 11–26 (2015).
29. Castellanos, F.X., Di Martino, A., Craddock, R.C., Mehta, A.D. & Milham, M.P.
Clinical applications of the functional connectome. Neuroimage 80, 527–540
(2013).
30. Kelly, C., Biswal, B.B., Craddock, R.C., Castellanos, F.X. & Milham, M.P.
Characterizing variation in the functional connectome: promise and pitfalls. Trends
Cogn. Sci. 16, 181–188 (2012).
31. Zilles, K., Armstrong, E., Schleicher, A. & Kretschmann, H.J. The human pattern of
gyrification in the cerebral cortex. Anat. Embryol. (Berl.) 179, 173–179 (1988).
32. Hill, J. et al. A surface-based analysis of hemispheric asymmetries and folding of
cerebral cortex in term-born human infants. J. Neurosci. 30, 2268–2276 (2010).
33. Miranda-Dominguez, O. et al. Connectotyping: model based fingerprinting of the
functional connectome. PLoS ONE 9, e111048 (2014).
34. Cole, M.W. et al. Multi-task connectivity reveals flexible hubs for adaptive task
control. Nat. Neurosci. 16, 1348–1355 (2013).
35. Power, J.D. et al. Functional network organization of the human brain. Neuron 72,
665–678 (2011).
36. Cole, M.W., Yarkoni, T., Repovš, G., Anticevic, A. & Braver, T.S. Global connectivity
of prefrontal cortex predicts cognitive control and intelligence. J. Neurosci. 32,
8988–8999 (2012).
37. Choi, Y.Y. et al. Multiple bases of human intelligence revealed by cortical thickness
and neural activation. J. Neurosci. 28, 10323–10329 (2008).
38. Kanai, R. & Rees, G. The structural basis of inter-individual differences in human
behaviour and cognition. Nat. Rev. Neurosci. 12, 231–242 (2011).
39. Fornito, A. & Harrison, B.J. Brain connectivity and mental illness. Front. Psychiatry
3, 72 (2012).
40. Greicius, M. Resting-state functional connectivity in neuropsychiatric disorders.
Curr. Opin. Neurol. 21, 424–430 (2008).
41. Cuthbert, B.N. & Insel, T.R. Toward the future of psychiatric diagnosis: the seven
pillars of RDoC. BMC Med. 11, 126 (2013).
42. Insel, T. et al. Research domain criteria (RDoC): toward a new classification framework
for research on mental disorders. Am. J. Psychiatry 167, 748–751 (2010).
43. Hutchison, R.M. et al. Dynamic functional connectivity: promise, issues, and
interpretations. Neuroimage 80, 360–378 (2013).
44. Hampson, M. et al. Intrinsic brain connectivity related to age in young and middle
aged adults. PLoS ONE 7, e44067 (2012).
45. Meunier, D., Achard, S., Morcom, A. & Bullmore, E. Age-related changes in modular
organization of human brain functional networks. Neuroimage 44, 715–723 (2009).
46. Scheinost, D. et al. Sex differences in normal age trajectories of functional brain
networks. Hum. Brain Mapp. 36, 1524–1535 (2014).
47. Craddock, R.C., James, G.A., Holtzheimer, P.E., Hu, X.P. & Mayberg, H.S. A whole
brain fMRI atlas generated via spatially constrained spectral clustering. Hum. Brain
Mapp. 33, 1914–1928 (2012).
48. Van Essen, D.C., Glasser, M.F., Dierker, D.L., Harwell, J. & Coalson, T. Parcellations
and hemispheric asymmetries of human cerebral cortex analyzed on surface-based
atlases. Cereb. Cortex 22, 2241–2262 (2012).
49. Tzourio-Mazoyer, N. et al. Automated anatomical labeling of activations in SPM
using a macroscopic anatomical parcellation of the MNI MRI single-subject brain.
Neuroimage 15, 273–289 (2002).
np
g
©
20
15
N
at
ur
e A
m
er
ic
a,
In
c.
A
ll
rig
ht
s
re
se
rv
ed
.
nature neurOSCIenCe doi:10.1038/nn.4135
ONLINE METHODS
Subject information. The primary data set used in this work is from the Human
Connectome Project (HCP). A second data set (the Yale data set), was used for
node and network definitions. These two data sets are described below.
HCP data. We used the Q2 HCP data release, which was all the HCP data
publicly available at the time that this project began. The full Q2 release contains
data on 142 healthy subjects; we restricted our analysis to subjects for whom all
six fMRI sessions were available (n = 126; 40 males, age 22–35). This represents
a relatively large sample size compared to most neuroimaging studies and has the
advantage of being an open-source data set, facilitating replication and extension
of this work by other researchers. Most subjects have at least one blood relative in
the group, many of them twins. A more heterogeneous population should make
the identification problem easier, and therefore the high accuracy rate observed
here, despite the homogeneity of the sample, underscores the power of functional
connectivity–based identification.
The resting-state runs (HCP filenames: rfMRI_REST1 and rfMRI_REST2)
were acquired in separate sessions on two different days. Task runs included the
following: working memory (tfMRI_WM), motor (tfMRI_MOTOR), language
(tfMRI_LANGUAGE, including both a story-listening and arithmetic task) and
emotion (tfMRI_EMOTION). The working memory task and motor task were
acquired on the first day, and the language and emotion tasks were acquired on
the second day. In total, there were six conditions: day 1 rest, day 2 rest, work-
ing memory, motor, language and emotion. The HCP scanning protocol was
approved by the local Institutional Review Board at Washington University in
St. Louis. For all sessions, data from both the left-right (LR) and right-left (RL)
phase-encoding runs were used to calculate connectivity matrices. Full details
on the HCP data set have been published previously8.
Yale data. This data set consisted of 45 healthy subjects scanned on a Siemens 3T
Tim Trio at Yale University (28 males; age = 31 ± 7.3 (s.d.), range 19–50). Resting-
state fMRI data were acquired using a multiband gradient echo EPI sequence
with similar parameters to the HCP acquisition (FOV = 210 mm, matrix size
84 × 84, TR = 0.956 ms, TE = 30 ms, resolution = 2.5 mm3). Eight 5.6-min
runs were acquired, totaling 45 min of data. Additionally, an MPRAGE image
(TR = 2,530 ms, TE = 2.77 ms, TI = 1,100 ms, flip angle = 7 degrees, resolution = 1 mm3)
and a two-dimensional anatomical T1 image (TR = 285 ms, TE = 2.61 ms, resolu-
tion = 2.5 mm3) were acquired for registration purposes. All participants pro-
vided written informed consent in accordance with a protocol approved by the
Human Research Protection Program of Yale University.
Preprocessing. The HCP minimal preprocessing pipeline was used50 for the HCP
data set. This pipeline includes artifact removal, motion correction and registra-
tion to standard space. For the Yale data set, images were motion corrected using
SPM8 and were warped to common space using a series of linear and nonlinear
transformations as previously described10.
For both the HCP and Yale data sets, standard preprocessing procedures
were applied to the fMRI data and included removal of linear components
related to the six motion parameters (Yale data) or 12 motion parameters (HCP
data; these include first derivatives, given as Movement_Regressors_dt.txt),
regression of the mean time courses of the white matter and CSF as well as the
global signal, removal of the linear trend, and low-pass filtering. For the HCP
data set, we investigated a range of spatial smoothing Gaussian kernel sizes—
from no smoothing to a full-width half-maximum (FWHM) of 4 mm, 6 mm
or 8 mm—and found that smoothing level had essentially no effect on identi-
fication accuracy (Supplementary Table 2); thus, results based on data with
no spatial smoothing are presented in the main text. (Our node-based analysis,
in contrast to a voxel-wise analysis, contains a considerable degree of inherent
smoothing because the time courses of many contiguous voxels are averaged
into a single node.)
Image preprocessing and calculation of connectivity matrices was done using
BioImage Suite51. Pearson correlation coefficients between pairs of node time
courses were calculated and normalized to z scores using the Fisher transformation,
resulting in a 268 × 268 symmetric connectivity matrix for each session for each
subject. Connectivity matrices were not thresholded or binarized in any way.
Functional parcellation and network definition. Using the 45 subjects in the
Yale data set, we constructed a 268-node functional atlas using a group-wise
spectral clustering algorithm10. The two hemispheres were segmented separately
into a target number of 150 regions. The final parcellation was examined to ensure
each node contained a reasonable number of voxels. Note that this single whole-
brain parcellation atlas was defined in MNI space and was applied to all subjects
in the HCP data set via traditional registration techniques. The parcellation image
is publicly available on the BioImage Suite NITRC page (https://www.nitrc.org/
frs/?group_id=51).
In addition to parcellating the brain into 268 functionally coherent nodes, we
further clustered these nodes into large-scale networks. To define the networks,
the same group-wise spectral clustering algorithm was applied to connectivity
matrices from the 45 Yale subjects to group the 268 nodes into eight networks10.
The eight networks were evaluated and compared visually to existing definitions
of resting-state networks published by other groups15,25. Despite the fact that we
included subcortical regions and cerebellum whereas other definitions excluded
these regions, our network configuration matched well with these other network
definitions. Our eight clusters represent approximately the following networks:
(i) medial frontal, (ii) frontoparietal, (iii) default mode, (iv) subcortical-cerebellum,
(v) motor, (vi) visual I, (vii) visual II and (viii) visual association (Fig. 1c).
Identification analysis. Figure 1b illustrates the prediction procedure. First, a
database was created that consisted of all the individual subjects’ connectivity
matrices from a single condition, D = =[ , , , ]X ii 1 126… , where Xi is a 268 × 268
correlation matrix and the subscript i denotes subject. In the identification step,
the identity of the target matrix was predicted using a correlation matrix obtained
from a different session. To predict the subject identity, the similarity between
the current target matrix Yi and all other matrices in D was computed, and the
predicted identity was that with the maximal similarity score. Similarity was
defined as the Pearson correlation between two vectors of edge values taken
from the target matrix and each of the database matrices. Note that we performed
prediction with replacement, such that the algorithm was not forced to predict a
unique subject on each iteration within a condition.
To assess the statistical significance of identification accuracy, we performed
nonparametric permutation testing. In each iteration, we first randomly selected
one condition from day 1 to serve as the database set and a second condition
from day 2 to serve as the target set. Next, subject identity was permuted—such
that each subject in the target set was assigned a ‘correct’ identity corresponding
to a different subject in the database set—and identification performed. Then
the roles of database and target sets were reversed. This procedure was repeated
1,000 times.
To investigate the contributions of individual networks (as described above) to
identification accuracy, a sub-matrix was used corresponding to a single network
or combination of networks. If we denote the set of nodes belonging to network j
as V k Kj jk j= =[ , , , ]u 1… , where Kj is the total number of nodes in network j, the
sub-matrix of network j is X(Vj, Vj), thus only connections within the selected
network(s) are included.
Quantifying edgewise contributions to identification. Edge-based analyses
were performed to determine whether specific edges contribute more to individ-
ual identification than other edges. When performing subject identification, the
Pearson correlation coefficients were computed between the target connectivity
pattern and all connectivity patterns in the database, and the subject identity was
chosen to be the one that had the largest correlation coefficient. Computationally,
the Pearson correlation of two vectors is the sum of element-wise products, given
that the two vectors are z-score normalized (0 mean with unit s.d.). Therefore,
this score can be broken down to quantify the individual amount contributed
from each entry in the vector, where some edges contribute positively to the total
coefficient and others contribute negatively.
Given two sets of connectivity matrices [ ],[ ]X Xi
R
i
R1 2 obtained from Rest1
and Rest2 runs after z-score normalization, we computed the corresponding
edge-wise product vector (ϕi),
ji iR iRe X e X e e M( ) ( ) * ( ), , ,= =1 2 1…
where i indexes subject, e indexes edge and M is the total number of edges in the
selected network (or the whole brain). The sum of ϕi over all edges Σe i ej ( ) is
the correlation between [ ]Xi
R1 and [ ]Xi
R2 . The group consistency measure (Φ)
was computed as the mean of ϕi across all subjects. Large positive entries in Φ are
edges that are consistent both within a subject and across the group.
np
g
©
20
15
N
at
ur
e A
m
er
ic
a,
In
c.
A
ll
rig
ht
s
re
se
rv
ed
.
nature neurOSCIenCedoi:10.1038/nn.4135
In the same way, we can calculate ϕi between patterns from different subjects,
for example
jij iR jRe X e X e i j( ) ( ) * ( ),= ≠1 2
It is possible that an edge e is equally correlated within the same subject and
between different subjects. In other words, ϕij(e) (when subject subscript is not
matched) and ϕii(e) (when subject subscript is matched) are of similar value.
Therefore, such an edge will not contribute to distinguishing an individual from
the other subjects.
For an edge to be truly helpful in individual identification, the following prop-
erty must hold,
j j j jii ij ii jie e e e i j( ) ( ), ( ) ( ),> > ≠and
In this way, the particular edge contributes to maximize the correlation between
connectivity patterns from matched subjects. To quantify the differential power
of an edge for the purpose of subject identification, we computed an empirical
probability
P e P e e e e
e e e
i ji ii ij ii
ji ii ij
( ) ( ) ( ) ( ) ( )
( ( ) ( ) ( )
= > >
= > + >
j j j j
j j j
or
jii e N N( ) )/ ( ),2 1 126− =
We defined Pi(e) in this way so that it can be interpreted similarly as the
P value in a standard statistical test. The smaller the Pi(e) value the better differ-
ential power the edge has to identify a single subject i. The overall differential
power of an edge across all subjects is then defined by the differential power
measure (DP),
DP e P ei
i
( ) { ln( ( ))}= −∑
The results of these edgewise analyses are visualized in Figure 3a.
Identification using shorter time courses. Scan sessions differed in duration:
rest sessions were substantially longer than task sessions, and thus the discrepancy
in amount of data could at least partially account for the differences in identifi-
cation accuracy between rest-rest, rest-task and task-task pairs. To explore this
possibility, we tested frontoparietal-based identification between the two rest
sessions while varying the number of time points used to calculate connectivity
matrices between 100 and 1,100 in increments of 100. For each number of time
points n, identification accuracy was tested for each of 500 randomizations; these
randomizations were generated by choosing among 50 starting points for each of
the two rest runs and using n brain volumes beginning with that starting point
to calculate matrices. Results based on a database and target of Rest1 and Rest2,
respectively, are shown in Figure 3b; results based on reversing the database and
target were extremely similar.
Identification based on two-matrix database. We also tested a database design
option in which two matrices were included for each subject. The current design
of the database and target set makes identification challenging because the two
sets of data were not only acquired on separate days, but also under different con-
ditions (except for the Rest1 and Rest2 pair) with the brain engaged in a different
cognitive task. Inputting additional information about the difference between
task and rest could potentially improve identification performance. Therefore,
we created a database that included a connectivity matrix obtained from a
resting-state session and another matrix obtained from a task session acquired on
the same day: D = =[( , ), , , ]X X iiR iT 1 126… . For identification, we always required
that the target matrix be obtained on a different day. To predict the subject iden-
tity, we projected the current target matrix Yi to the subspace spanned by the pair
[ , ]X Xi
R
i
T , to obtain a projection Yi , and then computed the similarity between
Yi and Yi to find the best match (Fig. 3c).
Effects of parcellation scheme. To investigate the effect of brain parcellation
and network assignment on identification accuracy, we also computed correla-
tion matrices based on the 68-node FreeSurfer atlas included as part of the HCP
data. We used Yeo et al.’s previously published seven-network definition15 which
included the following networks: (i) visual, (ii) motor, (iii) pre-motor/parietal
(dorsal attention), (iv) ventral attention, (v) ventromedial prefrontal, (vi) fron-
tal-parietal control and (vii) default mode. These networks do not include sub-
cortical structures or cerebellum. Network labels were assigned to each node in
the FreeSurfer atlas as follows. Given a node, we counted the number of voxels
belonging to each network. The network with the largest number of voxels was
designated the primary network label for the given node. Because the 68-node
parcellation is created independently from the seven networks, their boundaries
do not align in a one-to-one manner. Therefore, we also allowed a secondary
network association for a node when the number of voxels in this network ranked
second and the proportion to the total number of voxels in the node exceeded
30%. All 68 nodes have at least one primary network association. When defining
a sub-matrix that represents a single network, we included nodes for which the
target network was either the primary or secondary label.
Comparisons of identification accuracy between the Shen and FreeSurfer and
Yeo schemes are shown in Figure 4.
Effects of head motion. In order to rule out the possibility that successful iden-
tification was driven simply by characteristic movement patterns leading to
predictable motion artifacts, we performed prediction using motion estimates
only. HCP data collection provides an estimate of frame-to-frame displacement
for each run (Movement_RelativeRMS.txt). These data were used to generate
a discrete motion distribution vector from each of the six relative RMS vectors
(from the six conditions, using data from the left-right phase encoding run).
We computed the mean and s.d. of the relative RMS across all conditions and
subjects. We then specified 60 bins that spanned the grand mean ± 3 s.d., and the
motion distribution vectors were calculated accordingly. The motion distribution
vectors were then used in the same way as the correlation matrices for individual
identity prediction purposes.
Effects of anatomic differences. Although connectivity calculations are based
on functional BOLD data, subtle effects of anatomic variability could potentially
confer a preference between the same subject on different days when it comes to
applying the 268-node parcellation, defined in standard space, to each individual
subject. To help rule out confounds of anatomy introduced at the registration step,
we recalculated connectivity matrices based on BOLD data smoothed with three
different kernel sizes (4 mm, 6 mm and 8 mm, keeping all other preprocessing
steps the same), and performed the identification analysis again; at higher levels
of smoothing, any registration advantage for the same brain relative to a different
brain should be eliminated or vastly reduced. The resulting identification accura-
cies are presented in Supplementary Table 2.
We also performed a second analysis to help rule out the possibility
that identification was driven mainly by anatomic rather than functional
differences between subjects. Rather than connectivity between pairs of nodes,
we tested whether individuals could be identified on the basis of a measure
of BOLD variance in each node (calculation described below). In principle,
although BOLD variance probably reflects baseline metabolic function to a
substantial degree, it could also be influenced by anatomic factors such as
partial volume effects introduced by the gray- and white-matter segmentation
and/or differing numbers of gray-matter voxels per node owing to underlying
variation in regional tissue volumes and gyral folding patterns. This analysis
helps to address these potential confounds. BOLD variance was calculated and
identification performed as follows:
1. Within each node, we computed the mean BOLD signal in each
frame. This yields an N × 268 matrix of node-wise mean BOLD intensities
for each subject for each condition, where N is the number of frames
(1,200 for the resting-state runs and fewer for the task runs). (This is
identical to the first step in calculating connectivity matrices. Note that
the mean across the time dimension is zero because of the drift removal
and band-pass steps.)
2. For each node, using its N × 1 time course vector, we compute its variance
as follows:
variance sum mean time course= ( )/2 N
This results in a single 1 × 268 vector of node-wise BOLD variances for
each brain state for each subject. In signal processing terms, this variance
is also known as mean energy.
np
g
©
20
15
N
at
ur
e A
m
er
ic
a,
In
c.
A
ll
rig
ht
s
re
se
rv
ed
.
nature neurOSCIenCe doi:10.1038/nn.4135
3. We then performed identification by computing the correlation between
these mean BOLD variance vectors across the different conditions (states)
and sessions (days).
The results of this analysis are presented in Supplementary Table 3.
Behavioral prediction. In the HCP protocol, fluid intelligence (gF) was assessed
using a form of Raven’s progressive matrices with 24 items23 (scores are inte-
gers indicating number of correct items; mean = 16.8, s.d. = 4.7, median = 18,
mode = 20, range 5–24; HCP: PMAT24_A_CR).
In light of evidence that head motion is a substantial confound in functional
connectivity analyses, we first checked for any correlation between head motion
and gF score. In the full sample of n = 126, this correlation was trending toward
significance (r = −0.15, P = 0.09). Upon further examination, we found that
this relationship was driven by a small number of individuals with both high
head motion and low gF score. Thus, for purposes of the behavioral analysis, we
excluded subjects with particularly high motion during the Rest1 run; specifically,
eight subjects with >0.14 frame-to-frame head motion estimate (averaged across
both day 1 rest runs; HCP: Movement_RelativeRMS_mean) were excluded. There
was no correlation between head motion and gF in the remaining set of n = 118
subjects (r = −0.05, P = 0.55).
Leave-one-subject-out cross-validation was used for the prediction analysis.
In this iterative analysis, features are selected and a predictive model is built
based on n − 1 subjects (the training set) and the model is then tested on the
remaining subject (the test set). Each subject is left out once. Each iteration
consisted of (i) feature selection, (ii) model building, and (iii) prediction,
described in turn below.
In the feature-selection step, Pearson correlation was performed between
each edge in the connectivity matrices and gF score across subjects in the
training set. (Note that it is not necessary to correct for multiple comparisons
in this step because the nature of a predictive analysis includes a built-in guard
against false positives: if the proportion of false positives in the feature-selection
step is high, the model should not generalize well to independent data.) On the
basis of the signs of the resulting correlation values, edges were separated into
two tails: those positively correlated with gF and those inversely correlated with
gF. Edges were then thresholded on the basis of the statistical significance of
their correlation with gF, resulting in two sets of features (positive and negative).
Because the choice of statistical threshold at this step is somewhat arbitrary,
a range of thresholds was tested (P < 0.01, 0.05, 0.10) to ensure that results
were consistent.
In the model-building step, we first defined a single-subject summary statistic,
‘network strength’, by summing values of all edges in each feature set in individual
connectivity matrices. In graph-theoretic terms, this statistic can be thought of
as a type of weighted degree for each feature network52. This summary statistic
can be represented as follows:
[ ] ( )
,
Positive feature network strength s ij ij
i j
c m= +∑
[ ] ( )
,
Negative feature network strength s ij ij
i j
c m= −∑
Where c is an individual s’s connectivity matrix, and m(+) and m(−) are binary
matrices indexing the edges (i,j) that are significantly positively or negatively
correlated with gF, respectively.
After obtaining network strength for each subject in the training set, simple
linear regression was used to model the relationship between network strength
(the explanatory variable) and gF (the dependent variable). Two models were
built: one based on strength in the positive-feature network and one based
on strength in the negative-feature network. Each model—positive and nega-
tive—consisted of a first-degree polynomial that fit the training data best in a
least-squares sense:
[ ] *( )Predicted gF score a Network strength bpos pos= +
[ ] *( )Predicted gF score a Network strength bneg neg= +
Finally, in the prediction step, positive and negative network strengths from
the excluded subject were calculated and input into each of the two respective
models to generate predicted gF scores for that subject.
These three steps were repeated iteratively such that each subject was excluded
once. Finally, we assessed predictive power of both models by correlating observed
versus predicted gF scores across all subjects (Fig. 5).
code availability. The 268-node functional parcellation is available online on the
BioImage Suite NITRC page (https://www.nitrc.org/frs/?group_id=51). Matlab
scripts were written to perform the analyses described; this code is available from
the authors upon request.
A Supplementary methods checklist is available.
50. Glasser, M.F. et al. The minimal preprocessing pipelines for the Human Connectome
Project. Neuroimage 80, 105–124 (2013).
51. Joshi, A. et al. Unified framework for development, deployment and robust testing
of neuroimaging algorithms. Neuroinformatics 9, 69–84 (2011).
52. Rubinov, M. & Sporns, O. Complex network measures of brain connectivity: uses
and interpretations. Neuroimage 52, 1059–1069 (2010).
np
g
©
20
15
N
at
ur
e A
m
er
ic
a,
In
c.
A
ll
rig
ht
s
re
se
rv
ed
.

欢迎咨询51作业君
51作业君

Email:51zuoyejun

@gmail.com

添加客服微信: abby12468