System 124 (2024) 103381
Available online 13 June 2024
0346-251X/© 2024 Elsevier Ltd. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
Working memory and prior vocabulary knowledge in incidental
vocabulary learning from listening, reading,
reading-while-listening, and viewing captioned videos
Mark Feng Teng
Faculty of Languages and Translation, Macao Polytechnic University, Macau SAR, China
ARTICLE INFO
Keywords:
Incidental vocabulary learning
Retention
Listening
Reading
Reading while listening
Viewing captioned videos
Working memory
Prior vocabulary knowledge
ABSTRACT
This study explores how four input modes (i.e., listening, reading, reading while listening, and
viewing captioned videos) affect incidental vocabulary learning in a foreign language context. It
also examines the roles of learners’ prior vocabulary knowledge and working memory in incidental
vocabulary learning through these input modes. A total of 150 EFL students at a
Chinese university were randomly assigned in equal numbers to the four input modes or to a
control group that only took the tests. Forty-eight words were chosen as target words. Participants
listened to, read, or read while listening to the transcript of a documentary video, or watched the
video with captions. Incidental vocabulary learning outcomes were assessed through a two-part vocabulary test
(i.e., form and meaning recognition). Mixed effects model results showed that incidental learning
and retention of form and meaning recognition were greatest in the caption-viewing condition,
followed by the reading-while-listening, reading, and listening conditions. Findings also
revealed that prior vocabulary knowledge and working memory play distinct roles in incidental
learning and retention of form and meaning recognition for each input mode. Relevant
implications for vocabulary instruction are provided.
1. Introduction
Increasing attention has been given to incidental vocabulary learning in a foreign language context. This type of vocabulary
learning is a by-product of meaning-focused activities (e.g., reading, listening, or viewing) for interest, information, and enjoyment
purposes (Webb, 2020). Scholars have begun exploring incidental vocabulary learning via multiple input modes, including listening,
reading, and viewing (Feng & Webb, 2020). This young line of research is beginning to clarify how much input is needed and how much
potential each mode holds for incidental vocabulary learning.
Most work in this vein has concerned the role of reading input (e.g., Pellicer-Sánchez & Schmitt, 2010; Waring & Takaki, 2003). A
strong connection exists between reading and the incidental learning of word forms and meanings. Reading can engage learners while
helping them develop reading fluency as they consolidate prior lexical knowledge. Researchers are also interested in incidental
vocabulary learning from spoken input, given findings that such learning is possible through listening (e.g., van Zeeland &
Schmitt, 2013; Vidal, 2011). Brown et al. (2008) pointed out that listening yields smaller gains than reading and suggested
reading while listening as an input mode for incidental vocabulary learning; this mode has likewise proven effective for the incidental
learning of multiword items relative to reading or listening alone (Webb & Chang, 2022). Recent research has described the utility of viewing
second language (L2) TV programs for incidental vocabulary learning (e.g., Peters & Webb, 2018). L2 videos, which combine
multimodal input such as text and images, can help learners develop the skills needed to interpret multimodal texts in which visuals
support vocabulary acquisition. To enhance video’s promise for incidental vocabulary learning, empirical investigations have also added
captions to L2 videos (e.g., Montero Perez et al., 2014; Teng, 2022). Studies on incidental vocabulary learning from viewing are
important, as watching TV is the preferred input mode for out-of-class L2 learning (Peters, 2018; Vanderplank, 2016). Studying
captions is crucial as well: they can help learners process and remember information for incidental vocabulary learning (Teng, 2021).
Dang et al. (2022) innovatively explored this type of learning through input modes such as listening, reading, reading while listening,
viewing, and viewing with captions. However, little is known about how individual differences in working memory (WM) and prior
vocabulary knowledge affect incidental vocabulary learning across different input modes.
E-mail address: [email protected].
https://doi.org/10.1016/j.system.2024.103381
Received 12 July 2023; Received in revised form 22 April 2024; Accepted 12 June 2024
Given the importance of frequency and prior vocabulary knowledge in incidental vocabulary learning from reading, listening, reading
while listening, and captioned viewing (Teng, 2024), it is also essential to understand how learners’ individual
differences in working memory and prior vocabulary knowledge may affect this form of learning. One’s prior vocabulary knowledge
level may determine incidental vocabulary learning outcomes (e.g., Peters & Webb, 2018). WM, which underpins the ability to
maintain and rehearse information (Baddeley, 2000), may further shape how learners consolidate lexical knowledge from
viewing input (Montero Perez, 2020). Therefore, we consider the impacts of different input modes (i.e., reading, listening, reading
while listening, and viewing captioned videos) on incidental vocabulary learning. We also assess to what extent learners’ prior vocabulary
knowledge and WM influence such learning. Examining these two factors across input modes enriches the domain of incidental
vocabulary learning.
2. Literature review
2.1. Incidental vocabulary learning from reading
Reading provides rich contexts for exposure to and interaction with vocabulary, which can lead to the incidental learning of
unknown words (Pellicer-Sánchez, 2017; Pellicer-Sánchez & Schmitt, 2010; Teng, 2020; Waring & Takaki, 2003; Webb, 2008). Waring
and Takaki (2003) tested incidental vocabulary learning from graded readers. Participants recognized the meanings of 10.6 (42.4%)
and recalled the meanings of 4.6 (18.4%) target words in a 25-word set. The delayed tests, which were administered three months
later, demonstrated a substantial decay trend. Frequency was deemed a core element of incidental vocabulary learning from reading.
Pellicer-Sánchez and Schmitt (2010) explored incidental vocabulary learning from reading a novel. Participants made gains in
spelling, word class recall, meaning recognition, and meaning recall. Pellicer-Sánchez (2017) subsequently examined the incidental
learning of collocational knowledge from reading and found that collocations were learned at a rate similar to single words. Teng
(2020) examined learners’ retention in recognizing and recalling word forms and meanings incidentally gained from reading. Findings
highlighted the power of glosses and repeated target word encounters in maximizing incidental vocabulary learning. Webb (2008)
evaluated frequency and contextual clues for incidental vocabulary learning from reading. The quality of the context appeared
important for acquiring meaning, whereas frequency tended to affect form learning. Contextual quality may explain “why gains in
knowledge of meaning have varied from word to word ... and study to study” (p. 238).
The above studies underline the role of reading input for incidental vocabulary learning. However, outcomes related to this style of
learning have been inconsistent. Discrepancies may be due to frequency, the tests employed, contextual quality, or learners’ vocabulary
knowledge. Incidental vocabulary learning from reading seems cumulative, and more effort is needed to identify how new words’
form–meaning links are incidentally learned through this task.
2.2. Incidental vocabulary learning from listening
Aural input provides learners with several types of knowledge required for language learning, including phonology, grammar, and
vocabulary. Researchers have paid growing attention to incidental vocabulary learning from spoken input. Vidal (2003) explored
incidental vocabulary learning through listening. Results from 116 university students showed that learners can achieve significant
vocabulary gains from doing so: their performance on a delayed posttest, administered four weeks after the treatment, was
significantly better than on the pretest. van Zeeland and Schmitt (2013) also studied incidental vocabulary learning outcomes from
listening. Vocabulary learning was assessed using a dimensional approach spanning form, grammar, and meaning. Across these three
dimensions, learning was detected in 29.2% of cases (i.e., an average of 7.05 out of 24 target items) on the immediate posttest and in
19% of cases (4.56 target items) on the delayed test. Participants primarily learned to recognize word forms, followed by
grammar and finally word meaning. Jin and Webb (2020) more recently examined incidental vocabulary learning from
listening through a unique medium: teacher talk. Approximately 2.85 words (15.8%) and 2 words (12%) were known at the posttest
and delayed posttest, respectively. Listening to teacher talk can thus be a fruitful source of incidental vocabulary learning. Meanwhile,
consistent with van Zeeland and Schmitt (2013), input frequency did not significantly affect participants’ incidental vocabulary
learning from listening. Jin and Webb (2020) also reinforced the importance of explaining target word meanings in one’s first language
for incidental vocabulary learning from listening.
Overall, the above research implies the potential of listening input for incidental vocabulary learning, including single words and
collocations. Yet participants’ learning gains were quite small, perhaps because of challenges in speech segmentation while listening.
For example, learners may struggle with the demand to process the meaning of spoken words quickly, because listening allows less
time to process linguistic information than reading does (van Zeeland & Schmitt, 2013). Moreover, the quality of context may affect
vocabulary meaning comprehension from listening more than from reading. Aural input is nonetheless integral for optimizing incidental
vocabulary learning and warrants a closer look.
2.3. Incidental vocabulary learning from reading while listening
Along with the aforementioned types of incidental vocabulary acquisition, scholars have started to scrutinize reading while
listening to an audio recording. Some learners tend to break sentences into small, incoherent parts when reading. Reading while
listening can help learners retain sentence integrity, resulting in better comprehension. Webb and Chang (2012) explored vocabulary
learning through assisted and unassisted repeated reading. Eighty-two students read or read and listened to 28 short texts several
times. Reading while listening significantly influenced vocabulary learning. Chang (2011) specified the effect of reading while
listening to audiobooks. During a 26-week study, seven students voluntarily took part in the reading-while-listening treatment while
12 received the usual formal instruction (control group). The reading-while-listening group gained 17 marks, whereas learners in the
control group only gained four. The aural–written correspondence in reading while listening is particularly beneficial for incidental
vocabulary acquisition among students learning English as a foreign language (EFL). In an empirical study, Webb and Chang’s (2014)
participants read and listened to the same graded readers in class and then worked on language activities with teacher involvement.
Students’ vocabulary knowledge increased significantly after the reading-while-listening treatment. Specifically, this group learned
19.68 words on average from pre- to posttest, while the comparison group only learned 4.43 words. Webb et al. (2013) considered
participants’ incidental learning of collocations from reading while listening to a graded reader. Target collocations were encountered
1, 5, 10, or 15 times. Receptive and productive knowledge of collocations could be gained incidentally through reading while listening to a
graded reader; repeated encounters with target collocations positively affected participants’ incidental vocabulary acquisition.
In summary, studies indicate that reading while listening generates pronounced vocabulary learning outcomes. Brown et al. (2008)
explained the benefits of reading while listening as follows. First, this treatment might help learners segment information into
meaningful chunks, leading to effective vocabulary acquisition. Second, learners must read at the pace of the audio input when reading
while listening, and this pace is likely faster than students’ own. Third, reading while listening may help EFL students match a word’s
spoken and written forms to establish more robust auditory discrimination and word recognition.
2.4. Incidental vocabulary learning from viewing
Peters and Webb (2018) examined incidental vocabulary learning from viewing L2 TV programs. After controlling for word- and
learner-related factors, findings showed the potential of viewing a long TV program for incidental vocabulary learning as measured by
meaning recall and meaning recognition. Word-related aspects (e.g., occurrence frequency and cognateness) and learner-related
features (e.g., prior vocabulary knowledge) partly predicted incidental vocabulary learning from viewing. No captioning group was
included. Captions, which were first used to facilitate video content comprehension among the deaf and hard of hearing (Vanderplank,
2016), are now gaining traction in a foreign language context. Captioned videos promote EFL students’ incidental vocabulary learning
because simultaneously presenting visual and verbal input stimulates information processing and recall. Incidental vocabulary
learning then becomes feasible (Teng, 2021). Several empirical studies have confirmed the role of captions in vocabulary learning. For
example, Peters et al. (2016) compared the use of captions and subtitles. Results from 31 secondary school EFL students showed that
captions led to markedly better form recognition than subtitles, while meaning recall was comparable: participants in the captioning
group achieved correct responses of 19.3% for meaning recall and 48.2% for form recognition, whereas the subtitling group achieved
20.8% for meaning recall and 32.4% for form recognition. Teng (2022) further verified captions’ utility for incidental vocabulary
learning; participants in the captioning group
outperformed the non-captioning group in terms of word form and meaning recognition and recall. Learner-related factors, including
proficiency level and aptitude, may influence incidental vocabulary learning from captioned videos. Some scholars have compared
captioning conditions in this regard (Montero Perez et al., 2014, 2018). Montero Perez et al. (2014) found positive effects of keyword
captioning and of full captioning with highlighted keywords on meaning recognition, and in Montero Perez et al. (2018), students in the
glossed keyword captions group scored best on the form recognition and meaning recall tests. Recently, Teng (2023a) supported the
effectiveness of glossed captions for young learners’ incidental vocabulary learning, and Teng (2023b) suggested that full captions and
keyword captions contributed significantly to incidental learning and retention of form recognition and to incidental learning of
meaning recall, but not to delayed meaning recall. The effects of different captioning conditions remain inconclusive,
possibly because of test modality and video genre (Teng, 2023c). Nevertheless, captions do seem to play a part in vocabulary acquisition.
Despite these benefits of captions, Winke et al. (2010) contended that learners experience a split-attention effect when processing
verbal and nonverbal input. The merits of captioning are constrained by differences in script, vocabulary knowledge, and learners’
language proficiency. It is accordingly necessary to explore how individual differences affect captioning’s contributions to learners’
incidental vocabulary acquisition.
2.5. Comparing input modes for incidental vocabulary learning
Researchers have empirically compared input modes for incidental vocabulary learning. Vidal (2011) compared listening and
reading and found incidental vocabulary learning from reading to be superior. Learners needed more repetitions while listening (e.g., at least
5–6) than while reading (e.g., 2–3) to achieve marked vocabulary gains. Teng (2018) compared reading and reading while listening for
incidental vocabulary learning. Participants who read while listening performed much better on four tests of vocabulary knowledge
than participants who only read. The tests measured several types of vocabulary knowledge in L2 students: form recognition, grammar
recognition, meaning recall, and collocation recognition. Webb and Chang (2012) compared reading and reading while listening as
well. Participants in the reading-while-listening condition acquired significantly more vocabulary knowledge incidentally compared
with their counterparts in the reading condition. Feng and Webb (2020) compared how listening, reading, and viewing a TV program
affected incidental vocabulary learning. While these three input modes indeed influenced such learning, no significant differences
were detected between them. Two other studies (Brown et al., 2008; Webb & Chang, 2022) also compared reading, listening, and
reading while listening and documented a strong effect of reading while listening on incidental vocabulary learning, both for
single words (Brown et al., 2008) and for collocations (Webb & Chang, 2022). Dang et al. (2022) compared listening, reading, reading
while listening, viewing, and viewing with captions for incidental collocation learning. Reading, viewing, and viewing with captions
each led to evident learning at the form recognition level. Even so, the effectiveness of these modes did not differ significantly. Teng
(2024) compared reading, listening, reading while listening, and captioned viewing for incidental vocabulary learning. Results supported
the pronounced effects of captioned viewing while highlighting the effects of frequency and prior vocabulary knowledge.
In general, incidental vocabulary learning occurs via multiple input modes. Combining written and aural input might be
particularly useful, but this assumption is tentative. Conflicting findings point to the need to better compare input modes’ potential for
incidental vocabulary learning.
2.6. Working memory (WM) and prior vocabulary knowledge
Individual differences (e.g., WM and prior vocabulary knowledge) must be accounted for when exploring incidental vocabulary
learning from input modes. Such differences highlight the need to consider learners’ cognitive abilities and linguistic background when
investigating the efficacy of captioned viewing for vocabulary acquisition. By considering factors such as WM capacity and prior
vocabulary knowledge, researchers can gain a deeper understanding of the underlying mechanisms and determine how to make
input exposure most effective for different students.
WM crucially influences learners’ caption reading (Gass et al., 2019). It refers to one’s ability to briefly maintain and operate on a
limited amount of information while completing mentally demanding tasks (Wen et al., 2015). WM can be understood in light of
Baddeley and Hitch’s (1974) multicomponent model. This model asserts that WM has three components: (a) the central executive,
which directs attention, maintains task goals, makes decisions, and retrieves memory; (b) the phonological loop, which is responsible
for temporarily storing verbal information; and (c) the visuospatial sketchpad, which stores information in visual and spatial forms.
Baddeley (2000) later added the episodic buffer as a fourth component. This component stores and integrates visual, spatial, and verbal
information; it also connects information with long-term memory. Malone (2018) verified WM’s role in form recognition outcomes from
reading while listening. Although a captioning group was not included, Montero Perez (2020) supported the potential of viewing a
documentary video for incidental vocabulary learning. In addition, participants’ prior vocabulary knowledge and complex WM
positively correlated with incidental vocabulary learning from viewing. Teng and Zhang (2023) explored vocabulary learning using
multimodal input. They attended to phonological short-term memory and executive WM, two popular components of WM in L2
acquisition research. Both components influenced vocabulary learning. Several recent studies have supported the role of WM
in incidental vocabulary learning in the captioned viewing context (Teng, 2023a–c; Teng & Cui, 2023), documenting the influence of
WM on learning either single words or collocations. However, WM’s impact on incidental vocabulary learning from caption viewing
remains to be confirmed.
Prior vocabulary knowledge is another aspect of interest. Horst et al. (1998) underscored its role in vocabulary learning from
reading. Later, Webb and Chang (2015) indicated its significance during extensive reading. Peters and Webb (2018) also argued for the
role of prior vocabulary knowledge when viewing L2 programs. Dang et al. (2022) explored this attribute via input modes including
reading, listening, reading while listening, viewing, and viewing with captions. Participants’ prior vocabulary knowledge did not
significantly contribute to their incidental learning of collocations. Puimège and Peters (2020) noted that prior vocabulary knowledge
influenced vocabulary learning from viewing L2 TV programs. Moreover, Teng and Mizumoto (2023) highlighted that depth of
vocabulary knowledge can make a unique contribution to the prediction of incidental vocabulary learning at the form and meaning
recognition level, beyond the prediction afforded by the breadth of vocabulary knowledge. These inconclusive findings may be
attributable to input modes’ characteristics, hence the need for greater scrutiny.
3. The present study
The present study tested students’ incidental learning of form and meaning recognition across control, listening, reading,
reading-while-listening, and captioned video viewing groups. The present study also examined the roles of prior vocabulary knowledge and WM in
incidental vocabulary learning gains. Two research questions were addressed.
1. Do different input modes lead to incidental learning of single words? If so, to what extent?
2. What relationships exist between incidental vocabulary learning through different input modes, prior vocabulary knowledge, and
working memory?
4. Methods
4.1. Participants
The study sample consisted of 150 students at a university in China. All participants were English majors who learned English as
a foreign language (EFL). Their ages ranged from 18.1 to 19.8 (M = 19.1, SD = 1.01), and the students were from six classes. They were
gathered and then randomly assigned in equal numbers (n = 30 each) to one of five conditions (i.e., listening, reading, reading while
listening, caption viewing, and a control group that only took the tests). Their first language was Mandarin Chinese. The
participants described themselves as intermediate English learners (i.e., at the B1–B2 levels of the Common European Framework
of Reference for Languages).
Participants joined the study voluntarily and signed a consent form. They were briefly told they would need to watch, listen to,
or read a text or video and complete some exercises. The study’s true purpose, namely to test incidental vocabulary learning from
different input modes, was disclosed after the study. The vocabulary tests thus came as a surprise to students and reflected incidental
vocabulary learning. Each participant received a supermarket coupon worth 50 Chinese Yuan as a token of gratitude. No participants
withdrew from the study.
4.2. Video selection
The chosen video was a documentary titled Ancient World, available on YouTube (https://www.youtube.com/watch?v=Ml7lgPw-X3E).
This video was selected based on several criteria, following previous studies (e.g., Montero Perez et al., 2018). First, it needed to
be appealing enough to maintain participants’ interest. Second, its language dimensions (e.g., lexical coverage and speed of dialogue)
had to be suitable for students. Third, the video needed to contain some words with which learners were unlikely to be familiar. Ancient
World described the top 10 enigmas of the ancient world. A pilot group of 10 English majors chose this topic after watching the video
and finding it interesting and suitable for L2 learning. The video runs for 1 h, 6 min, and 27 s. Its spoken and written text correspond.
We used VocabProfile (https://www.lextutor.ca/) to determine its lexical profile. The script contained 8039 running words. The
most frequent 1000, 2000, 3000, and 4000 word families cumulatively covered 73.67%, 81.20%, 89.01%, and 95.8% of the running
words in the script, respectively. Following the cut-off point of mastery (24/30; Hu & Nation, 2000), updated Vocabulary Levels Test (VLT)
results (see Results section) showed that participants had reached the 4000-word level. Because knowledge of the first 4000 word
families provided 95.8% coverage of the script, the target learners were expected to understand this video.
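The coverage figures reported for the script are cumulative: each band's percentage includes all more frequent bands. The sketch below illustrates that calculation under stated assumptions: the script has already been reduced to word-family headwords, and a hypothetical `band_of` mapping assigns each family to a 1000-word band. VocabProfile's own tokenization and family lists are not reproduced here, and the 95% default threshold in `band_needed` is an illustrative assumption rather than a value stated in this section.

```python
from collections import Counter


def cumulative_coverage(tokens, band_of, max_band):
    """Cumulative lexical coverage (%) of a script by 1000-word frequency band.

    tokens   -- running words, already reduced to word-family headwords
    band_of  -- hypothetical mapping: headword -> band (1 = most frequent 1000)
    max_band -- highest band to report
    Tokens missing from band_of (off-list words, proper nouns) count as uncovered.
    """
    total = len(tokens)
    if total == 0:
        raise ValueError("empty script")
    per_band = Counter(band_of.get(token) for token in tokens)
    coverage, running = {}, 0
    for band in range(1, max_band + 1):
        running += per_band.get(band, 0)          # add this band's tokens
        coverage[band] = 100.0 * running / total  # cumulative percentage
    return coverage


def band_needed(coverage, threshold=95.0):
    """Smallest band whose cumulative coverage reaches threshold, or None."""
    for band in sorted(coverage):
        if coverage[band] >= threshold:
            return band
    return None


if __name__ == "__main__":
    # Toy script of 10 running words with hypothetical band assignments.
    script = ["the", "world", "the", "ancient", "world",
              "temple", "the", "enigma", "the", "world"]
    bands = {"the": 1, "world": 1, "ancient": 2, "temple": 3, "enigma": 5}
    cov = cumulative_coverage(script, bands, max_band=4)
    print(cov)               # {1: 70.0, 2: 80.0, 3: 90.0, 4: 90.0}
    print(band_needed(cov))  # None: 95% is not reached by band 4
```

Applied to the reported profile (73.67%, 81.20%, 89.01%, 95.8%), `band_needed` returns 4, which mirrors the reasoning that learners at the 4000-word level should cover enough of the script.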
4.3. Target words
We selected 48 words as test items based on the VocabProfile output. Approximately 56% of the target words (nouns, verbs, and
adjectives) were beyond the 3000-word level. Each test item occurred only once in the video (see Table 1).
Table 1
Target words.
Item Frequency Item Frequency
Zigzag 1 Meddled 1
Worship 1 Incarnation 1
Withstand 1 Illusion 1
Vicinity 1 Fierce 1
Verdict 1 Fascinating 1
Unreinforced 1 Explosion 1
Tribute 1 Execution 1
Thunderbolt 1 Erosion 1
Testament 1 Equivalent 1
Symbolic 1 Magnificent 1
Staggering 1 Enslaved 1
Speculate 1 Elaborate 1
Sophisticated 1 Distraction 1
Shipwreck 1 Depiction 1
Sculptor 1 Decipher 1
Screw 1 Crushed 1
Sacrilege 1 Companion 1
Revolt 1 Combat 1
Resilient 1 Coalition 1
Renaissance 1 Brutal 1
Reenactment 1 Besiege 1
Ransack 1 Alignment 1
Perplexing 1 Accuse 1
Mock 1 Acoustic 1
4.4. Learner-related factors
We explored incidental vocabulary learning through reading, listening, reading while listening, and viewing captioned videos. We
also considered the roles of prior vocabulary knowledge and WM in participants’ incidental vocabulary learning.
4.4.1. Prior vocabulary knowledge
Prior vocabulary knowledge was evaluated via the updated VLT (Webb et al., 2017), administered in paper-and-pencil format. This test
measures receptive vocabulary knowledge at the 1000-, 2000-, 3000-, 4000-, and 5000-word levels. Each level includes 30
test items, so the full test contains 150 items worth one point each. Test takers must match each definition with the word it defines. The
measure’s Cronbach’s alpha value was 0.91, indicating sound reliability.
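A minimal sketch of the VLT scoring described above (five levels of 30 items, one point each), combined with the 24/30 mastery cut-off mentioned in Section 4.2. The option labels are placeholders, and the rule that a level counts as reached only when every lower level also meets the cut-off is an assumption of this sketch, not a procedure stated in the study.

```python
LEVELS = (1000, 2000, 3000, 4000, 5000)
ITEMS_PER_LEVEL = 30
MASTERY_CUTOFF = 24  # 24/30, the cut-off cited in Section 4.2


def score_vlt(responses, answer_key):
    """One point per correct item; returns (per-level scores, total out of 150).

    responses, answer_key -- dict: level -> list of 30 option labels
    """
    level_scores = {}
    for level in LEVELS:
        given, key = responses[level], answer_key[level]
        if len(given) != ITEMS_PER_LEVEL or len(key) != ITEMS_PER_LEVEL:
            raise ValueError(f"level {level} must have {ITEMS_PER_LEVEL} items")
        level_scores[level] = sum(g == k for g, k in zip(given, key))
    return level_scores, sum(level_scores.values())


def highest_mastered_level(level_scores, cutoff=MASTERY_CUTOFF):
    """Highest level reached, requiring every lower level to meet the cut-off.

    Treating mastery as consecutive levels is an assumption of this sketch.
    """
    mastered = None
    for level in LEVELS:
        if level_scores[level] < cutoff:
            break
        mastered = level
    return mastered


if __name__ == "__main__":
    key = {level: ["a"] * 30 for level in LEVELS}
    answers = {level: ["a"] * 30 for level in LEVELS}
    answers[5000] = ["a"] * 20 + ["b"] * 10  # 20/30 at the 5000 level
    scores, total = score_vlt(answers, key)
    print(scores, total)                   # 5000-level score 20, total 140
    print(highest_mastered_level(scores))  # 4000
```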
4.4.2. WM
The assessment of WM was based on a reading span task (RST) adapted from Daneman and Carpenter (1980) and van den Noort
et al. (2008). An RST is a verbal memory test often used to examine WM, cognitive processing, and reading comprehension. The
participants were required to read a series of unconnected sentences aloud and judge whether each sentence made sense. This part
captured the processing element of WM. Participants were also instructed to recall the end-of-sentence words in their original order at
the end of each series, representing the storage element of WM. This RST therefore served as a complex verbal WM test. The number of
sentences in a series, called the “set size,” increased incrementally from 3 to 7, and the task comprised 80 sentences and 80 target
words in total. The words to be recalled were unrelated. All sentences to be processed were in participants’ first language, Mandarin
Chinese, which minimized the potential for individual differences in L2 proficiency and reading comprehension to confound the results.
Half of the sentences were plausible and half were not.
We followed Conway et al. (2005) in granting partial credit during scoring: each item recalled in the correct order was
awarded one point, even if participants could not remember all items in the trial. As in Unsworth et al. (2005), we set an 85%
accuracy criterion: the recalled items in a trial were scored only when sentence judgment accuracy in that trial reached this threshold.
This per-trial accuracy criterion mitigated the possibility that participants might sacrifice their sentence judgment to deliberately
memorize the target words. This test was administered through E-Prime. Its Cronbach’s alpha value was 0.86, indicating good
reliability.
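The partial-credit rule and the per-trial 85% criterion can be sketched as follows. Two points are assumptions of this sketch rather than details given in the text: “recalled in the correct order” is interpreted as the word appearing in its original serial position, and a trial that fails the accuracy criterion contributes zero points (rather than being dropped from some other denominator). The `Trial` record and the English placeholder words are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One RST series; its set size is the number of sentences."""
    judgments_correct: list  # one bool per sentence: was the sense judgment right?
    targets: list            # end-of-sentence words in presentation order
    recalled: list           # words the participant recalled, in the order given


def trial_points(trial, accuracy_criterion=0.85):
    """Partial-credit points for one trial under the per-trial accuracy criterion."""
    accuracy = sum(trial.judgments_correct) / len(trial.judgments_correct)
    if accuracy < accuracy_criterion:
        return 0  # processing too inaccurate: recall in this trial is not counted
    # One point per word recalled in its original serial position.
    return sum(
        1
        for position, word in enumerate(trial.recalled)
        if position < len(trial.targets) and word == trial.targets[position]
    )


def rst_score(trials, accuracy_criterion=0.85):
    """Total WM score: sum of partial-credit points over all trials."""
    return sum(trial_points(t, accuracy_criterion) for t in trials)


if __name__ == "__main__":
    trials = [
        Trial([True] * 3, ["cat", "river", "book"], ["cat", "river", "lamp"]),
        Trial([True, True, False, True], ["sun", "door", "tree", "milk"],
              ["sun", "door", "tree", "milk"]),  # 75% accuracy: scored as 0
        Trial([True] * 5, list("abcde"), list("abcde")),
    ]
    print(rst_score(trials))  # 2 + 0 + 5 = 7
```

Note that with a set size of 3, an 85% criterion effectively requires all three judgments to be correct, whereas with a set size of 7 one error (6/7, about 86%) is still tolerated.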
4.5. Vocabulary test and scoring
Incidental vocabulary learning was measured with a two-part test (i.e., form and meaning recognition) in paper-and-pencil format.
It was administered as a pretest, an immediate posttest, and a delayed posttest. Each test also included a different set of 20
high-frequency filler words from the first 1000-word level of the BNC/COCA word lists (Nation, 2017), intended to keep participants
focused on the assessment. These filler words were not scored. The test included both written and aural input so that no specific
mode of exposure was favored.
The form recognition measure was based on the yes/no EFL vocabulary test (Meara, 1992). Participants indicated whether
they recognized each word after hearing it read twice by a native English speaker. The meaning recognition test contained five response
options: the correct meaning, three distractors, and an “I don’t know this word” option. The “I don’t know” option was meant to reduce
wild guessing. Participants were required to choose one option after hearing the target word. Table 2 presents sample items for each
test section.
The test took 40 min. Each correct answer was given 1 point, each incorrect answer was given 0 points, and the maximum score on
each section was 48 points. The Cronbach’s alpha value was 0.92, demonstrating strong reliability.
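The scoring rule and the reliability coefficient reported above can be sketched together. This is a minimal illustration, assuming that a “yes” on the form section counts as correct for a target word (the section describes no pseudoword correction) and that only target words are scored, so filler responses are simply ignored. The answer key in the usage example is taken from the Table 2 sample items (Zigzag b, Worship a, Withstand c); the response data are invented for illustration. Cronbach's alpha is computed with the standard formula, α = k/(k − 1) × (1 − Σ item variances / variance of totals).

```python
from statistics import pvariance


def score_section(responses, key):
    """1 point per target word answered correctly, 0 otherwise (max = len(key)).

    key       -- target word -> correct response ("yes" for form recognition,
                 the correct option letter for meaning recognition)
    responses -- word -> participant's response; filler words may appear here
                 but are ignored because only words in key are scored.
    An "I don't know" choice never matches the key, so it scores 0.
    """
    return sum(1 for word, correct in key.items() if responses.get(word) == correct)


def cronbach_alpha(item_matrix):
    """Cronbach's alpha; rows = participants, columns = item scores (0/1)."""
    k = len(item_matrix[0])
    if k < 2:
        raise ValueError("need at least two items")
    item_vars = [pvariance([row[j] for row in item_matrix]) for j in range(k)]
    total_var = pvariance([sum(row) for row in item_matrix])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


if __name__ == "__main__":
    meaning_key = {"zigzag": "b", "worship": "a", "withstand": "c"}
    answers = {"zigzag": "b", "worship": "e", "withstand": "c", "water": "a"}
    print(score_section(answers, meaning_key))  # 2 ("e" = I don't know; filler ignored)
    print(cronbach_alpha([[1, 1], [0, 0], [1, 1]]))  # about 1.0 (items agree perfectly)
```

Population and sample variances give the same alpha as long as one kind is used consistently, because the n/(n − 1) factor cancels in the ratio.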
4.6. Procedure
This study was completed over three sessions. The first session, held during the first week, involved the pretest and the VLT. The
second session occurred two weeks later, when participants were gathered and randomly assigned in equal numbers to a group. They
also completed the informed consent form at that time. Each experimental treatment took approximately 1 h and 50 min. All
participants used a separate computer. Learners in the reading group read the text online at a pace similar to that of the other
conditions. Learners in the listening group listened to the text without visual support. Learners in the reading-while-listening group
read the text online with audio support. Learners in the viewing group watched the captioned video. The presentation pace was similar
for these three groups.
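Table 2
Sample items for the vocabulary test.
Form recognition test instructions: Have you ever heard of the word? Do you recognize the word? If you are sure, please choose “yes”. If you are not sure, choose “no”.
Meaning recognition test instructions: Please choose the appropriate meaning for each word you have heard. If you are not sure, please choose the “I don’t know” option.
Item Form recognition Meaning recognition
Zigzag * Yes * No a. 开心前进 (advance happily) b. 曲折行进 (move in a zigzag) c. 照顾 (take care of) d. 恳求 (plead) e. 我不知道 (I don’t know)
Worship * Yes * No a. 崇拜 (worship) b. 出名 (become famous) c. 高尚 (noble) d. 愤怒 (anger) e. 我不知道 (I don’t know)
Withstand * Yes * No a. 站住 (stand still) b. 一起 (together) c. 抵挡 (resist) d. 煎熬 (suffer) e. 我不知道 (I don’t know)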
Participants were told to focus on content comprehension and were allowed to take notes during the treatment. Learners in the control group took only the tests. The third session, in which the delayed posttest was administered, took place two weeks after the second session.
4.7. Data analysis
The Kolmogorov–Smirnov test indicated no significant departures from normality (p > 0.05 in all cases). The first question asked which input modes led to greater gains in incidental vocabulary learning. Linear mixed effects models were fitted using the lme4 package (Bates & Maechler, 2010) in the R language and environment (R Development Core Team, 2009). Mixed effects models are preferable to analyses of variance (ANOVA) and covariance (ANCOVA) because they can include time (pretest, immediate posttest, and delayed posttest) and group (the control group and the four experimental groups) in a single model. These models also account for variance due to individual differences through random effects. The fixed effects were group, time, and the group-by-time interaction, and participants were entered as random effects. Group and time were categorical variables: the control group served as the reference level for group, and the pretest served as the reference level for time. The variance inflation factors for time and group were approximately 1.0, so multicollinearity was not a concern.
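To make the reference coding concrete, the following plain-Python sketch builds the 0/1 fixed-effect indicators for any group-by-time cell. It is an illustration only: the models themselves were fitted with lme4 in R, and the level labels (e.g., "reading_while_listening") are hypothetical names chosen to mirror the rows of Tables 5 and 6. With five groups and three test times, treatment coding yields an intercept, four group dummies, two time dummies, and eight interaction dummies, which matches the 15 fixed-effect rows reported in each table.

```python
from itertools import product

# Hypothetical level labels mirroring the rows of Tables 5 and 6.
GROUPS = ["control", "listening", "reading", "reading_while_listening", "viewing"]
TIMES = ["pre", "post", "delay"]
REF_GROUP, REF_TIME = "control", "pre"  # reference levels named in the text


def design_row(group: str, time: str) -> dict[str, int]:
    """Return the 0/1 fixed-effect indicators for one group-by-time cell
    under treatment (reference-level) coding."""
    if group not in GROUPS or time not in TIMES:
        raise ValueError(f"unknown cell: {group!r}, {time!r}")
    row = {"(Intercept)": 1}
    # One dummy per non-reference group level.
    for g in GROUPS:
        if g != REF_GROUP:
            row[f"group:{g}"] = int(group == g)
    # One dummy per non-reference time level.
    for t in TIMES:
        if t != REF_TIME:
            row[f"time:{t}"] = int(time == t)
    # Interaction dummies: products of the group and time dummies.
    for g, t in product(GROUPS, TIMES):
        if g != REF_GROUP and t != REF_TIME:
            row[f"group:{g} x time:{t}"] = int(group == g and time == t)
    return row


if __name__ == "__main__":
    print(len(design_row("control", "pre")), "fixed-effect columns")
    for cell in [("control", "pre"), ("viewing", "delay")]:
        active = [name for name, value in design_row(*cell).items() if value]
        print(cell, "->", active)
```

For the control group at pretest only the intercept is active. For the viewing group at the delayed test, the intercept, the viewing dummy, the delayed-time dummy, and their interaction are active, so the interaction coefficient captures how the viewing group's change from pretest differs from the control group's change.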
The second question concerned the extent to which prior vocabulary knowledge and WM explain incidental vocabulary learning outcomes in each input mode. Separate logistic regressions were conducted for the immediate posttest and the delayed posttest. The analysis was based on individual responses (cases) rather than on total test scores or overall learning gains per participant. Odds ratios were calculated to estimate how each predictor changed the odds of a correct response. Prior vocabulary knowledge and WM were entered into each model as predictors.
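The percentages reported in Section 5.2 rest on the standard relation for logistic regression: Exp(B) is the multiplicative change in the odds of a correct response per one-unit increase in a predictor, so the percentage change in the odds is (Exp(B) − 1) × 100, and a Wald-type 95% interval for Exp(B) is exp(B ± 1.96 × S.E.). The sketch below assumes this standard formulation; the example values are the B and S.E. reported for WM on the delayed form recognition posttest in the caption-viewing condition (Table 7).

```python
import math


def odds_ratio_summary(b: float, se: float, z: float = 1.96) -> dict[str, float]:
    """Convert a logistic-regression coefficient B into Exp(B), the
    percentage change in the odds per one-unit increase in the predictor,
    and a Wald-type 95% confidence interval for Exp(B)."""
    odds_ratio = math.exp(b)
    return {
        "exp_b": odds_ratio,
        "pct_change_in_odds": (odds_ratio - 1.0) * 100.0,
        "ci_lower": math.exp(b - z * se),
        "ci_upper": math.exp(b + z * se),
    }


if __name__ == "__main__":
    # B and S.E. for WM, caption viewing, delayed form recognition (Table 7).
    summary = odds_ratio_summary(0.087, 0.011)
    for key, value in summary.items():
        print(f"{key}: {value:.3f}")
```

This gives Exp(B) ≈ 1.091, that is, roughly 9.1% higher odds of a correct response per one-point increase in WM, and an interval close to the reported 1.067 to 1.116; the small discrepancies reflect rounding of B and S.E. in the table.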
5. Results
5.1. Question 1: Incidental vocabulary learning across input modes
To begin, descriptive statistics were computed for the baseline measures (prior vocabulary knowledge and WM), and one-way ANOVAs were performed to determine whether the groups differed at baseline (Table 3).
Table 3 shows some variation in participants' prior vocabulary knowledge and WM, but the between-group differences were not significant (VLT: F = 0.593, p = 0.669; WM: F = 1.741, p = 0.144). Figs. 1 and 2 present the corresponding graphical results for the VLT and WM.
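Because each group contained 30 participants (150 students assigned equally to five groups), the F statistics in Table 3 can be reproduced from the group means and standard deviations alone. The sketch below is a standard one-way ANOVA computed from summary statistics under that equal-n assumption; it is a consistency check on the reported values, not the author's analysis script.

```python
def one_way_anova_f(means: list[float], sds: list[float], n: int) -> float:
    """One-way ANOVA F statistic from group means and SDs, assuming
    every group has the same size n."""
    k = len(means)
    grand_mean = sum(means) / k
    # Between-groups mean square: n * squared deviations / (k - 1).
    ms_between = n * sum((m - grand_mean) ** 2 for m in means) / (k - 1)
    # Within-groups mean square: pooled variance over N - k = k * (n - 1) df.
    ms_within = sum((n - 1) * s ** 2 for s in sds) / (k * (n - 1))
    return ms_between / ms_within


if __name__ == "__main__":
    # Group order: control, listening, reading, reading while listening,
    # caption viewing (Table 3).
    vlt_f = one_way_anova_f([105.73, 110.67, 109.37, 105.8, 111.5],
                            [17.02, 20.26, 19.2, 20.33, 19.34], n=30)
    wm_f = one_way_anova_f([43, 44.83, 45.87, 42.87, 49.09],
                           [6.76, 8.3, 13.66, 8.39, 13.68], n=30)
    print(f"VLT: F = {vlt_f:.3f}; WM: F = {wm_f:.3f}")
```

With the VLT values this yields F ≈ 0.593, and with the WM values F ≈ 1.74 (the small gap from 1.741 comes from rounding in the reported SDs). The corresponding p-value requires the F distribution with 4 and 145 degrees of freedom, for example via scipy.stats.f.sf(F, 4, 145), which is not computed here.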
Table 4 displays descriptive statistics for the form and meaning recognition tests. In all cases, the experimental groups' mean scores increased from the pretest to the immediate posttest and then declined on the delayed posttest.
Table 5 reports the first mixed effects model, which compared form recognition scores across the five groups and three test times. There was a significant main effect of group for the listening, reading, reading-while-listening, and viewing groups on form recognition (p < 0.001), but no significant main effect of time (p > 0.05). A significant group-by-time (posttest) interaction emerged for all experimental groups (p < 0.001), and a significant group-by-time (delayed test) interaction emerged for the reading, reading-while-listening, and viewing groups (p < 0.001). Fig. 3 illustrates this group-by-time interaction for word form recognition. With the control group and pretest as reference points, all experimental groups showed greater gains in word form recognition on the posttest, but only the reading, reading-while-listening, and viewing groups maintained significantly greater gains on the delayed test.
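The estimates in Table 5 appear to be on a log scale, as in a count (e.g., Poisson) mixed model, although the text does not state the link function: exp(1.38629) ≈ 4.00 matches the control group's pretest mean in Table 4, and exp(1.38629 + 0.88583) ≈ 9.70 matches the viewing group's pretest mean. Under that reading, which is an inference from the two tables, the two Time coefficients must be negative, because the control group's mean fell from 4.00 at pretest to 3.40 at posttest (ln(3.40/4.00) ≈ −0.163). The sketch below treats them as negative and recovers cell means by summing the relevant coefficients and exponentiating; the coefficient keys are hypothetical labels.

```python
import math

# Form recognition estimates from Table 5 (log scale assumed), with the
# Time coefficients taken as negative; keys are hypothetical labels.
COEF = {
    "(Intercept)": 1.38629,
    "viewing": 0.88583,
    "post": -0.16252,
    "delay": -0.17237,
    "viewing:post": 1.38184,
    "viewing:delay": 0.89764,
}


def predicted_mean(*terms: str) -> float:
    """Add the intercept and the named coefficients on the log scale,
    then exponentiate to return to the scale of words recognized."""
    eta = COEF["(Intercept)"] + sum(COEF[t] for t in terms)
    return math.exp(eta)


if __name__ == "__main__":
    cells = {
        "control, pretest": (),
        "control, posttest": ("post",),
        "viewing, pretest": ("viewing",),
        "viewing, posttest": ("viewing", "post", "viewing:post"),
        "viewing, delayed": ("viewing", "delay", "viewing:delay"),
    }
    for label, terms in cells.items():
        print(f"{label}: {predicted_mean(*terms):.2f}")
```

Summing the intercept, the viewing coefficient, the posttest coefficient, and the viewing-by-posttest interaction gives about 32.83, the viewing group's immediate posttest mean in Table 4, which is the pattern one would expect if this interpretation is correct.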
A series of pairwise comparisons was conducted with the emmeans package in R (Lenth, 2019), using the Bonferroni adjustment for multiple comparisons. For each experimental group, the estimated mean pretest score was significantly lower than the estimated mean posttest score (p < 0.001). The estimated means for the immediate and delayed posttests differed significantly for the reading, reading-while-listening, and viewing groups (p < 0.05). The estimated pretest means did not differ significantly among the five groups (p > 0.05); that is, the groups had similar knowledge of the target words at the form recognition level before treatment. The viewing group scored significantly higher than the other groups on both the immediate posttest (p < 0.001) and the delayed posttest (p < 0.001), showing the most pronounced vocabulary learning at the form recognition level.
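The Bonferroni adjustment used in these pairwise comparisons multiplies each raw p-value by the number of comparisons m and caps the result at 1. The sketch below assumes m = 10, the number of pairwise contrasts among five groups at a single test time; the family size that emmeans actually used depends on how the comparisons were specified, which the text does not report, and the raw p-values are hypothetical.

```python
from math import comb
from typing import Optional


def bonferroni(p_values: list[float], m: Optional[int] = None) -> list[float]:
    """Bonferroni adjustment: multiply each raw p-value by the number of
    comparisons m (default: the number of p-values) and cap at 1.0."""
    if m is None:
        m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


if __name__ == "__main__":
    n_groups = 5
    m = comb(n_groups, 2)  # 10 pairwise group contrasts at one time point
    raw = [0.001, 0.004, 0.03]  # hypothetical raw p-values
    for p, adj in zip(raw, bonferroni(raw, m)):
        print(f"raw p = {p:.3f} -> adjusted p = {adj:.3f}")
```

In this example a raw p-value of 0.004 becomes 0.04 and remains below 0.05, whereas 0.03 becomes 0.30 and would no longer be significant.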
Table 6 presents the mixed effects model comparing meaning recognition across the five groups and three test times. There were significant main effects of the listening, reading, reading-while-listening, and viewing groups on meaning recognition (p < 0.001), but no significant main effect of time (p > 0.05). The results also showed a significant group-by-time (posttest) interaction for all experimental groups (p < 0.001), whereas a significant group-by-time (delayed test) interaction emerged only for the viewing group (p < 0.001). As Fig. 4 shows, the group-by-time interaction affected word meaning recognition. With the control group and pretest as references, all experimental groups performed better in word meaning recognition on the posttest, and the viewing group showed the best performance on the delayed test.
Table 3
Results for VLT and WM.
Group                      VLT M     VLT SD    WM M     WM SD
Control                    105.73    17.02     43       6.76
Listening                  110.67    20.26     44.83    8.3
Reading                    109.37    19.2      45.87    13.66
Reading while listening    105.8     20.33     42.87    8.39
Caption viewing            111.5     19.34     49.09    13.68
ANOVA                      F = 0.593, p = 0.669  F = 1.741, p = 0.144
Pairwise comparisons were also conducted. For each experimental group, the estimated mean pretest score was lower than the estimated mean posttest score (p < 0.001). The estimated means for the immediate and delayed posttests differed significantly only for the viewing group (p < 0.05). The groups' estimated pretest means did not differ significantly (p > 0.05); put simply, all groups had similar knowledge of the target words at the meaning recognition level before treatment. The viewing group scored significantly higher than the other groups on the immediate posttest (p < 0.001) and the delayed posttest (p < 0.001) and thus also showed the most pronounced vocabulary learning at the meaning recognition level.
Fig. 1. Graphical results for VLT.
Fig. 2. Graphical results for WM.
Table 4
Descriptive statistics for dependent variables, M (SD).
                           Form recognition                             Meaning recognition
Group                      Pre          Post           Delayed          Pre          Post           Delayed
Control                    4 (2.07)     3.4 (1.96)     3.37 (1.83)      3.97 (1.87)  3.53 (2.05)    3.43 (1.92)
Listening                  5.93 (1.8)   13.73 (4.88)   6.17 (2.94)      5.9 (1.73)   11.13 (3.59)   4.9 (2.31)
Reading                    6.73 (1.7)   23.23 (10.42)  12.4 (6.77)      6.63 (1.65)  14.73 (6.83)   7.9 (3.71)
Reading while listening    7.33 (2.2)   23.43 (7.9)    12.17 (5.85)     7.33 (2.23)  16.17 (4.73)   8.37 (3.02)
Caption viewing            9.7 (2.6)    32.83 (8.64)   20.03 (7.32)     9.57 (1.57)  22.27 (5.58)   14.1 (3.82)
Table 5
Comparisons of form recognition for the five groups over three test times.
Term                                                Estimate   Std. Error   z value   Pr(>|z|)
(Intercept)                                         1.38629    0.09128      15.187    <2.00E-16 ***
Group (listening)                                   0.39429    0.11811      3.338     0.000843 ***
Group (reading)                                     0.52078    0.11525      4.519     6.23E-06 ***
Group (reading while listening)                     0.60614    0.11348      5.341     9.23E-08 ***
Group (viewing)                                     0.88583    0.10849      8.165     3.20E-16 ***
Time (post)                                         -0.16252   0.13467      -1.207    0.227499
Time (delay)                                        -0.17237   0.13503      -1.277    0.201758
Group (listening) × Time (post)                     1.00176    0.1618       6.191     5.97E-10 ***
Group (reading) × Time (post)                       1.40104    0.15659      8.947     <2.00E-16 ***
Group (reading while listening) × Time (post)       1.32425    0.15525      8.53      <2.00E-16 ***
Group (viewing) × Time (post)                       1.38184    0.15029      9.195     <2.00E-16 ***
Group (listening) × Time (delay)                    0.21094    0.17104      1.233     0.217473
Group (reading) × Time (delay)                      0.783      0.16084      4.868     1.13E-06 ***
Group (reading while listening) × Time (delay)      0.67864    0.15974      4.248     2.15E-05 ***
Group (viewing) × Time (delay)                      0.89764    0.15275      5.877     4.19E-09 ***
Note. *** p < 0.001.
Fig. 3. Time and group interaction for form recognition.
Table 6
Comparisons of meaning recognition for the five groups over three test times.
Term                                                Estimate   Std. Error   z value   Pr(>|z|)
(Intercept)                                         1.37793    0.09167      15.031    <2.00E-16 ***
Group (listening)                                   0.39703    0.11855      3.349     0.000811 ***
Group (reading)                                     0.51418    0.11588      4.437     9.12E-06 ***
Group (reading while listening)                     0.6145     0.11379      5.4       6.66E-08 ***
Group (viewing)                                     0.88036    0.10903      8.074     6.78E-16 ***
Time (post)                                         -0.11568   0.13356      -0.866    0.386389
Time (delay)                                        -0.14439   0.13458      -1.073    0.283308
Group (listening) × Time (post)                     0.75068    0.16273      4.613     3.97E-06 ***
Group (reading) × Time (post)                       0.91369    0.15851      5.764     8.20E-09 ***
Group (reading while listening) × Time (post)       0.90621    0.15635      5.796     6.79E-09 ***
Group (viewing) × Time (post)                       0.96049    0.15106      6.358     2.04E-10 ***
Group (listening) × Time (delay)                    -0.04132   0.17483      -0.236    0.813151
Group (reading) × Time (delay)                      0.31915    0.1654       1.93      0.053659 .
Group (reading while listening) × Time (delay)      0.27622    0.16322      1.692     0.090591 .
Group (viewing) × Time (delay)                      0.53228    0.15479      3.439     0.000584 ***
Note. *** p < 0.001; . p < 0.1.
5.2. Question 2: Relationships between incidental vocabulary learning through different input modes and prior vocabulary knowledge and working memory
The second question concerned how prior vocabulary knowledge and WM contribute to incidental vocabulary learning in each input mode. Regarding form recognition, Table 7 indicates that prior vocabulary knowledge significantly contributed to the model for
the delayed posttest (p < 0.001) but not for the immediate posttest (p > 0.05) in the caption-viewing condition. The odds ratios (Exp(B)) revealed that, when participants' prior vocabulary knowledge increased by one unit, the odds of a correct response on the delayed posttest rose by 9.6%. In the reading-while-listening condition, prior vocabulary knowledge significantly contributed to the model for the delayed posttest (p < 0.05) but not for the immediate posttest (p > 0.05); the odds ratio (Exp(B)) showed that a one-unit increase in prior vocabulary knowledge was associated with a 7.4% increase in the odds of a correct response on the delayed posttest. In the reading condition, prior vocabulary knowledge significantly contributed to the model for both the immediate posttest (p < 0.05) and the delayed posttest (p < 0.05); as the odds ratios (Exp(B)) show, a one-unit increase was associated with 10.1% and 9.7% higher odds of a correct response on the immediate and delayed posttests, respectively. In the listening condition, prior vocabulary knowledge significantly contributed to the model for both the immediate posttest (p < 0.001) and the delayed posttest (p < 0.001); a one-unit increase was associated with 10.2% and 10.3% higher odds of a correct response on the immediate and delayed posttests, respectively. WM also significantly contributed to the model for the immediate and delayed posttests in all input modes, with the exception of the immediate posttest in the reading-while-listening condition (p > 0.05).
Table 8 presents the results for meaning recognition. In the caption-viewing condition, prior vocabulary knowledge significantly contributed to the model for the delayed posttest (p < 0.001) but not for the immediate posttest (p > 0.05); the odds ratios (Exp(B)) showed that as learners' prior vocabulary knowledge increased by one unit, the odds of a correct response on the delayed posttest increased by 10.5%. In the reading-while-listening condition, prior vocabulary knowledge significantly contributed to the model for both the immediate posttest (p < 0.05) and the delayed posttest (p < 0.001); the odds ratios (Exp(B)) indicated that a one-unit rise in prior vocabulary knowledge was associated with 8.1% and 6% higher odds of a correct response on the immediate and delayed posttests, respectively. In the reading condition, prior vocabulary knowledge significantly contributed to the model for the immediate posttest (p < 0.05) but not the delayed posttest (p > 0.05); a one-unit increase was associated with 10.2% higher odds of a correct response on the immediate posttest. In the listening condition, prior vocabulary knowledge likewise significantly contributed to the model for the immediate posttest (p < 0.05) but not the delayed posttest (p > 0.05); a one-unit increase was associated with 10.1% higher odds of a correct response on the immediate posttest. WM significantly contributed to the model for both the immediate and delayed posttests, and this pattern was consistent across all input modes.
Fig. 4. Time and group interaction for meaning recognition.
Table 7
Logistic regression for form recognition.
Input mode, test, predictor                          B        S.E.    Wald     df   Sig.    Exp(B)   95% CI lower   95% CI upper
Caption viewing, immediate posttest, VLT             -0.002   0.007   0.091    1    0.763   0.998    0.984          1.012
Caption viewing, immediate posttest, WM              0.064    0.011   35.672   1    0.000   1.066    1.044          1.088
Caption viewing, delayed posttest, VLT               -0.034   0.007   20.480   1    0.000   0.967    0.953          0.981
Caption viewing, delayed posttest, WM                0.087    0.011   57.559   1    0.000   1.091    1.067          1.116
Reading while listening, immediate posttest, VLT     -0.094   0.074   1.611    1    0.204   0.910    0.787          1.053
Reading while listening, immediate posttest, WM      0.290    0.180   2.596    1    0.107   1.336    0.939          1.902
Reading while listening, delayed posttest, VLT       -0.295   0.096   9.500    1    0.002   0.745    0.618          0.898
Reading while listening, delayed posttest, WM        0.791    0.233   11.586   1    0.001   2.207    1.399          3.480
Reading, immediate posttest, VLT                     0.015    0.007   4.223    1    0.040   1.015    1.001          1.030
Reading, immediate posttest, WM                      0.043    0.011   16.805   1    0.000   1.044    1.023          1.066
Reading, delayed posttest, VLT                       -0.025   0.009   8.450    1    0.004   0.976    0.959          0.992
Reading, delayed posttest, WM                        0.082    0.013   42.805   1    0.000   1.085    1.059          1.112
Listening, immediate posttest, VLT                   0.023    0.004   36.020   1    0.000   1.023    1.015          1.030
Listening, immediate posttest, WM                    0.023    0.009   6.844    1    0.009   1.023    1.006          1.041
Listening, delayed posttest, VLT                     0.035    0.009   16.562   1    0.000   1.036    1.018          1.053
Listening, delayed posttest, WM                      0.060    0.019   10.290   1    0.001   1.062    1.024          1.102
6. Discussion
The discussion first addresses incidental vocabulary learning across input modes and then turns to the roles of WM and prior vocabulary knowledge. By comparing the findings with those of previous studies, I also derive new arguments and possible directions for research.
6.1. Incidental vocabulary learning across input modes
The first research question examined the potential of the input modes to promote incidental vocabulary learning, as measured by form and meaning recognition. First, the results identified listening to spoken input as a source of such learning: immediately after listening, participants recognized, on average, the form of about 13.73 (28.6%) of the 48 target words and the meaning of about 11.13 (23.2%). These encouraging results are similar to those of van Zeeland and Schmitt (2013). Consistent with earlier work on incidental vocabulary learning from listening (Jin & Webb, 2020; Vidal, 2003), they suggest that listening to spoken input can help learners build an initial form–meaning link. Second, the findings provide evidence for the power of reading in incidental vocabulary learning. Such results were not surprising, considering that previous studies have supported this role (e.g., Horst, 2005; Horst et al., 1998; Pellicer-Sánchez & Schmitt, 2010; Pigada & Schmitt, 2006; Teng, 2020; Waring & Takaki, 2003; Webb, 2008). Incidental vocabulary learning at the form and meaning recognition levels appears to occur through reading, which is both encouraging and expected. Being able to choose a word's meaning from a list of plausible options (as in a multiple-choice test) shows that at least some knowledge of form and meaning has been retained, and being able to recognize which words occurred in the text and which did not indicates some familiarity with word form. Recognizing a word's form after reading is therefore a substantial step in incidental vocabulary learning.
Third, encountering words during reading while listening contributed to participants' incidental vocabulary learning. Other studies have reached similar conclusions regarding single words (Teng, 2018; Webb & Chang, 2012, 2014) and collocations (Webb et al., 2013). The participants in the present study made sizeable gains in receptive knowledge of the form–meaning link by reading transcripts while listening to them: mean scores increased by 16.10 words (from 7.33 to 23.43) for form recognition and by 8.84 words (from 7.33 to 16.17) for meaning recognition. These gains are considerably larger than the relatively small ones reported in prior research on reading (e.g., Horst et al., 1998; Waring & Takaki, 2003). Reading while listening may thus provide greater opportunities to consolidate knowledge of unknown and partially known words.
Finally, the results (e.g., at the immediate posttest, the caption-viewing group recognized the form of 68.4% and the meaning of 46.4% of the target words) highlight the role of viewing captioned videos in incidental vocabulary learning (Montero Perez et al., 2014, 2018; Teng, 2022). This benefit of watching L2 programs is in line with studies that did not include a captioning group (Peters & Webb, 2018). The present findings thus support the use of L2 captioned videos as a preferred input mode for incidental vocabulary learning (Teng, 2021; Teng, 2023a–c; Teng & Cui, 2023). These videos, which combine verbal and visual mental representations, may help learners organize and store information in WM and activate prior knowledge, thereby increasing incidental vocabulary acquisition.
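The percentages and gains cited in this section can be recomputed directly from the group means in Table 4 and the 48 target words. The sketch below does so. Here "gain" is the raw difference between posttest and pretest means and, unlike the mixed effects models, ignores the control group's change.

```python
TARGET_WORDS = 48  # number of target words (maximum score per test section)

# Group means from Table 4 as (pretest, immediate posttest, delayed posttest).
FORM = {
    "listening": (5.93, 13.73, 6.17),
    "reading": (6.73, 23.23, 12.40),
    "reading while listening": (7.33, 23.43, 12.17),
    "caption viewing": (9.70, 32.83, 20.03),
}
MEANING = {
    "listening": (5.90, 11.13, 4.90),
    "reading": (6.63, 14.73, 7.90),
    "reading while listening": (7.33, 16.17, 8.37),
    "caption viewing": (9.57, 22.27, 14.10),
}


def summarize(pre: float, post: float, delayed: float) -> dict[str, float]:
    """Raw pre-to-post gain, posttest score as a percentage of the target
    words, and the part of the gain still present at the delayed test."""
    return {
        "gain": post - pre,
        "post_pct": 100 * post / TARGET_WORDS,
        "delayed_gain": delayed - pre,
    }


if __name__ == "__main__":
    for label, table in (("form", FORM), ("meaning", MEANING)):
        for group, means in table.items():
            s = summarize(*means)
            print(f"{label:7} {group:24} gain={s['gain']:5.2f} "
                  f"post={s['post_pct']:5.1f}% delayed gain={s['delayed_gain']:5.2f}")
```

For example, it returns a form recognition gain of 16.10 words for the reading-while-listening group and a posttest form score of about 68.4% of the target words for the caption-viewing group. The listening group's delayed meaning gain is negative because its delayed mean (4.90) fell below its pretest mean (5.90).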
The identified incidental vocabulary learning and retention profile can be summarized as follows: caption viewing > reading while listening > reading > listening > control. This ranking held across the form and meaning recognition tests and mirrors studies that have compared acquisition across input modes. For instance, reading is preferable to listening for incidental vocabulary learning (Vidal, 2011), reading while listening is better than reading only (Teng, 2018), and reading while listening is more effective than both reading and listening for incidentally learning single words (Brown et al., 2008) and collocations (Webb & Chang, 2022). Consistent with Teng (2024), caption viewing yielded better incidental vocabulary learning than reading while listening, which in turn outperformed reading and listening. Dang et al. (2022) found significant differences among reading, viewing, and viewing with captions on an immediate form recognition test, but the groups' performance on the delayed posttest did not differ significantly. The present results partially echo these outcomes but do not align with Feng and Webb's (2020) finding of no significant differences among three input modes (i.e., viewing without captions, reading, and listening) for incidental vocabulary learning. Several arguments can be put forth to explain these inconsistencies. First, EFL learners may rely more on written than on spoken input when processing information for incidental vocabulary learning. Second, reading while listening may ease the demands of comprehending content solely through spoken input. Finally, compared with reading while listening, the on-screen text in captioned videos may better help learners understand L2 programs, thereby maximizing incidental vocabulary learning.
Table 8
Logistic regression for meaning recognition.
Input mode, test, predictor                          B        S.E.    Wald     df   Sig.    Exp(B)   95% CI lower   95% CI upper
Caption viewing, immediate posttest, VLT             -0.005   0.007   0.442    1    0.506   0.995    0.981          1.009
Caption viewing, immediate posttest, WM              0.050    0.011   22.271   1    0.000   1.051    1.030          1.073
Caption viewing, delayed posttest, VLT               -0.023   0.009   6.629    1    0.010   0.978    0.961          0.995
Caption viewing, delayed posttest, WM                0.077    0.013   34.924   1    0.000   1.080    1.053          1.107
Reading while listening, immediate posttest, VLT     -0.208   0.083   6.320    1    0.012   0.813    0.691          0.955
Reading while listening, immediate posttest, WM      0.552    0.200   7.579    1    0.006   1.736    1.172          2.572
Reading while listening, delayed posttest, VLT       -0.504   0.134   14.179   1    0.000   0.604    0.465          0.785
Reading while listening, delayed posttest, WM        1.284    0.325   15.555   1    0.000   3.610    1.907          6.832
Reading, immediate posttest, VLT                     0.023    0.008   7.390    1    0.007   1.023    1.006          1.040
Reading, immediate posttest, WM                      0.025    0.012   4.522    1    0.033   1.025    1.002          1.048
Reading, delayed posttest, VLT                       0.008    0.012   0.397    1    0.529   1.008    0.984          1.032
Reading, delayed posttest, WM                        0.054    0.016   11.040   1    0.001   1.056    1.022          1.090
Listening, immediate posttest, VLT                   0.015    0.004   11.770   1    0.001   1.015    1.006          1.023
Listening, immediate posttest, WM                    0.027    0.010   7.205    1    0.007   1.028    1.007          1.048
Listening, delayed posttest, VLT                     -0.002   0.019   0.008    1    0.927   0.998    0.961          1.037
Listening, delayed posttest, WM                      0.151    0.057   7.109    1    0.008   1.163    1.041          1.299
6.2. Prior vocabulary knowledge and incidental vocabulary learning
The findings imply that prior vocabulary knowledge plays distinct roles in form and meaning recognition in each input mode. These outcomes partly support research revealing significant relationships between prior vocabulary knowledge and incidental vocabulary learning through reading (Horst et al., 1998), listening (Vidal, 2011), viewing L2 TV programs (Peters & Webb, 2018), and captioned viewing (Teng & Mizumoto, 2023). Feng and Webb (2020) identified a partial effect of prior vocabulary knowledge on incidental vocabulary learning: they found a significant correlation between the two for reading and viewing but not for listening. In the present study, prior vocabulary knowledge partially predicted incidental vocabulary learning in the caption-viewing, reading-while-listening, reading, and listening conditions. The exceptions were the immediate form and meaning recognition posttests in the caption-viewing condition, the immediate form recognition posttest in the reading-while-listening condition, and the delayed meaning recognition posttests in the reading and listening conditions. According to Feng and Webb (2020), prior vocabulary knowledge may not have significantly influenced incidental vocabulary learning from listening because a written test of such knowledge might not reflect students' familiarity with spoken form–meaning links. I argue that EFL learners' prior vocabulary knowledge could help them segment connected speech and cope with the speech rate of spoken input, although the immediate and delayed posttests may pose different demands.
Dang et al. (2022) assessed how prior vocabulary knowledge affects incidental learning of collocations after watching an academic lecture and found that this knowledge did not significantly contribute to participants' incidental learning of single words and collocations. In contrast, Teng and Cui (2023) argued for the importance of prior vocabulary knowledge in learning single words and collocations. Peters and Webb (2018) also underlined the importance of prior vocabulary knowledge in incidental vocabulary learning from viewing L2 TV programs, and Puimège and Peters (2020) found similar results for learning collocations. In the present study, this type of knowledge significantly contributed to delayed posttest scores in form and meaning recognition under the caption-viewing condition. These inconsistent findings could be attributed to several factors. First, the present study and that by Peters and Webb (2018) examined nonacademic genres (e.g., L2 TV programs), whereas Dang et al. (2022) examined an academic genre. Learners may need specialized vocabulary to follow academic lectures, but the updated VLT did not include specialized terms; instead, it tapped into knowledge of form–meaning links for words between the 1000-word and 5000-word levels. Second, Dang et al. (2022) explored the effects of prior knowledge on recognizing collocations, whereas the present study and Peters and Webb (2018) investigated the form and meaning of single words.
6.3. WM and incidental vocabulary learning
The results showed that WM significantly predicted incidental vocabulary learning in every input mode, with the exception of the immediate form recognition test in the reading-while-listening condition. Overall, WM appears to be important for incidental vocabulary learning in all input conditions. Gass et al. (2019) found that WM's impact on captioned video comprehension varied across learner groups: Spanish L2 learners' WM capacity did not influence comprehension, whereas it had a moderate effect for ESL learners. The authors contended that although participants used captions irrespective of WM capacity, caption use partly depended on learners' WM capacities and L2 proficiency levels. The significant effects of WM on incidental vocabulary learning from listening, reading, reading while listening, and caption viewing may be explained as follows: the WM measure used here is a verbal task that draws on learners' ability to store and process information while reading. Such tasks may be more directly related to language proficiency and reading comprehension. Individual differences in WM may therefore account for variation in incidental vocabulary learning from the examined input modes. Montero Perez (2020) also documented WM's role in incidental vocabulary learning from viewing a documentary: complex WM positively correlated with incidental vocabulary learning from viewing. The RST used in the present study also measured complex WM; learners had to hold information (i.e., end-of-sentence words) in phonological memory while processing other information. The task also requires learners to apply background knowledge to judge whether each sentence is plausible. Participants who performed better on the complex WM test were more likely to score higher on incidental vocabulary learning from the examined input modes, presumably because higher complex WM scores reflect a "greater ability to focus, divide, and switch attention among various task demands" (Révész, 2012, p. 123). The capacity to store and manipulate information in memory, which complex WM tasks measure, appears essential to allocating attentional resources to input during incidental vocabulary learning, enabling learners to notice word forms and infer the meanings of new words. Notably, Malone (2018) pointed out that reading while listening places higher WM demands on learners than reading alone, because adding an aural component increases the memory load.
The findings complement prior studies by showing that complex WM significantly affects incidental vocabulary learning from reading, listening, reading while listening, and caption viewing. Because the present study did not measure phonological short-term memory, the results cannot verify its role in incidental vocabulary learning through different input modes. Montero Perez (2020) asserted that this type of short-term memory is less important for incidental vocabulary acquisition from viewing. However, Teng and Zhang (2023) highlighted the significance of both phonological short-term memory and complex WM for vocabulary learning via multimodal input. The varied roles of WM in incidental vocabulary learning in captioned viewing contexts were also noted in Teng (2023a, 2023b) and Teng and Cui (2023). Further research on this topic is needed.
It is also important to acknowledge that this study may have oversimplified WM by concentrating on verbal WM. To enhance theoretical understanding, researchers should consider how the visuospatial sketchpad operates in conjunction with verbal WM. Integrating verbal and visuospatial components is vital to fully capturing learners' WM capacity, and considering the interplay between these components would provide a more nuanced view of how learners process information across modalities. By acknowledging the roles of both verbal and visuospatial WM, researchers can more fully uncover the cognitive processes involved in incidental vocabulary learning, which may yield a more robust theoretical framework regarding learners' WM capacity.
7. Conclusions
Overall, the results show that incidental vocabulary learning can occur through diverse input modes, including listening, reading, reading while listening, and viewing captioned videos. Notably, the data highlight the advantage of the on-screen text in captioned videos for facilitating incidental vocabulary learning. The study also examined how prior vocabulary knowledge and working memory capacity influence learning across input modes: both play critical and distinct roles in learners' ability to learn and retain the form and meaning of new words. The interplay between prior vocabulary knowledge, working memory, and input mode presents a complex yet insightful picture of how incidental vocabulary learning occurs.
This study has several limitations. First, repeated viewing was not examined across input modes, although it may influence how WM and prior vocabulary knowledge affect incidental vocabulary learning. Second, word-related factors (e.g., cognateness, word relevance, and contextual clues surrounding target items) were not evaluated; examining these characteristics may enhance the understanding of incidental vocabulary learning. The nature of the target words also deserves consideration, as the acquisition of different word types, such as abstract nouns, verbs, and adjectives, may vary. Some words may be essential for comprehending the video (and for capturing students' interest), whereas others may matter less for conceptual understanding and thus receive less attention. Third, the chosen video was an L2 documentary. Future studies could include additional video genres, as lexical coverage may differ by genre. Fourth, follow-up work could examine productive knowledge, which is also a crucial part of vocabulary learning. Finally, learners' look-up behavior when encountering target words was not addressed. Eye-tracking technology could be used in future studies to deepen our understanding of incidental vocabulary learning from different input modes.
Despite its limitations, this study offers theoretical and pedagogical implications for incidental vocabulary learning. Theoretically, the findings offer insights into how learners use WM resources when processing audio and visual input. Exploring dual processing mechanisms may reveal how learners engage with auditory and visual stimuli concurrently, and the interplay between WM and dual processing appears central to the effectiveness of incidental vocabulary learning from audiovisual materials. This perspective encourages consideration of how learners allocate their WM resources when simultaneously processing information from different modalities, and the findings could inform future research on the cognitive aspects of language acquisition.
The study also has three pedagogical implications. First, the results underscore the value of combining spoken, written, and audiovisual input with captions to improve incidental vocabulary learning. This recommendation aligns with the literature stressing reading as a primary avenue for such knowledge acquisition. Teachers can leverage these findings by integrating audio support into reading and captions into video viewing, an approach that should foster incidental vocabulary learning in foreign language contexts. With the growing accessibility of audiobooks and captioned videos (e.g., on YouTube and Netflix), educators can easily bring these resources into classroom and independent EFL learning. Second, even with the gains observed here, participants' vocabulary learning was still relatively limited. Teachers may need to diversify their instructional materials across input modes and tailor them to students' current vocabulary knowledge. Finally, the present study emphasizes the importance of considering learners' WM resources in incidental vocabulary learning. Teachers should take students' WM capacities into account when students are expected to recall word forms and meanings. Although learners draw on WM resources in every input mode, using these modes does not necessarily exclude students with lower WM capacities. Scholars should further explore incidental vocabulary learning across input modes; investigating high- and low-WM participants separately could deepen our understanding of how WM shapes vocabulary acquisition.
CRediT authorship contribution statement
Mark Feng Teng: Writing – review & editing, Writing – original draft, Visualization, Validation, Software, Resources, Methodology, Investigation, Funding acquisition, Formal analysis, Data curation, Conceptualization.
Declaration of competing interest
The author declares that there are no competing financial interests or personal relationships that could have appeared to influence
the work reported in this paper.
Acknowledgement
This research was supported by the National Social Science Fund of China project entitled "Cross-sectional effects and longitudinal development of working memory and vocabulary acquisition" (Grant number: 22BYY182).
References
Baddeley, A. D. (2000). The episodic buffer: A new component of working memory? Trends in Cognitive Sciences, 4, 417–423.
Baddeley, A. D., & Hitch, G. J. (1974). Working memory. In G. Bower (Ed.), The psychology of learning and motivation (pp. 47–90). New York, NY: Academic Press.
Bates, D., & Maechler, M. (2010). lme4: Linear mixed-effects models using S4 classes (R package version 0.999375-33). http://CRAN.R-project.org/package=lme4
Brown, R., Waring, R., & Donkaewbua, S. (2008). Incidental vocabulary acquisition from reading, reading-while-listening, and listening to stories. Reading in a Foreign Language, 20, 136–163.
Chang, A. C. (2011). The effect of reading while listening to audiobooks: Listening fluency and vocabulary gain. Asian Journal of English Language Teaching, 21, 43–64.
Conway, A., Kane, M., Bunting, M., Hambrick, Z., Wilhelm, D., & Engle, R. (2005). Working memory span tasks: A methodological review and user’s guide.
Psychonomic Bulletin & Review, 12, 769–786.
Daneman, M., & Carpenter, P. A. (1980). Individual differences in working memory and reading. Journal of Verbal Learning and Verbal Behavior, 19, 450–466.
Dang, T., Lu, C., & Webb, S. (2022). Incidental learning of collocations in an academic lecture through different input modes. Language Learning. Online advance
publication. https://doi.org/10.1111/lang.12499
Feng, Y., & Webb, S. (2020). Learning vocabulary through reading, listening, and viewing: Which mode of input is most effective? Studies in Second Language
Acquisition, 42, 499–523.
Gass, S., Winke, P., Isbell, D. R., & Ahn, J. (2019). How captions help people learn languages: A working-memory, eye-tracking study. Language Learning & Technology, 23(2), 84–104.
Horst, M. (2005). Learning L2 vocabulary through extensive reading: A measurement study. The Canadian Modern Language Review, 61, 355–382.
Horst, M., Cobb, T., & Meara, P. (1998). Beyond A Clockwork Orange: Acquiring second language vocabulary through reading. Reading in a Foreign Language, 11, 207–223.
Hu, M., & Nation, I. S. P. (2000). Vocabulary density and reading comprehension. Reading in a Foreign Language, 13, 403–430.
Jin, Z., & Webb, S. (2020). Incidental vocabulary learning through listening to teacher talk. The Modern Language Journal, 104(3), 550–565.
Lenth, R. (2019). emmeans: Estimated marginal means, aka least-squares means. Retrieved from https://CRAN.R-project.org/package=emmeans
Malone, J. (2018). Incidental vocabulary learning in SLA: Effects of frequency, aural enhancement, and working memory. Studies in Second Language Acquisition, 40,
651–675.
Meara, P. (1992). EFL vocabulary tests. Clearing House.
Montero Perez, M. (2020). Incidental vocabulary learning through viewing video: The role of vocabulary knowledge and working memory. Studies in Second Language Acquisition. Online preprint. https://doi.org/10.1017/S0272263119000706
Montero Perez, M., Peters, E., Clarebout, G., & Desmet, P. (2014). Effects of captioning on video comprehension and incidental vocabulary learning. Language Learning & Technology, 18, 118–141.
Montero Perez, M., Peters, E., & Desmet, P. (2018). Vocabulary learning through viewing video: The effect of two enhancement techniques. Computer Assisted Language
Learning, 31(1–2), 1–26.
Nation, I. S. P. (2017). The BNC/COCA Level 6 word family lists (Version 1.0.0) [Data file]. http://www.victoria.ac.nz/lals/staff/paul-nation.aspx
Pellicer-Sánchez, A. (2017). Learning L2 collocations incidentally from reading. Language Teaching Research, 21, 381–402.
Pellicer-Sánchez, A., & Schmitt, N. (2010). Incidental vocabulary acquisition from an authentic novel: Do things fall apart? Reading in a Foreign Language, 22, 31–55.
Peters, E. (2018). The effect of out-of-class exposure to English language media on learners’ vocabulary knowledge. ITL - International Journal of Applied Linguistics,
169, 142–168.
Peters, E., Heynen, E., & Puimège, E. (2016). Learning vocabulary through audiovisual input: The differential effect of L1 subtitles and captions. System, 63, 134–148.
Peters, E., & Webb, S. (2018). Incidental vocabulary acquisition through viewing L2 television and factors that affect learning. Studies in Second Language Acquisition, 40, 551–577.
Pigada, M., & Schmitt, N. (2006). Vocabulary acquisition from extensive reading: A case study. Reading in a Foreign Language, 18, 1–28.
Puimège, E., & Peters, E. (2020). Learning formulaic sequences through viewing L2 television and factors that affect learning. Studies in Second Language Acquisition, 42, 525–549.
R Development Core Team. (2009). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. ISBN 3-900051-07-0. http://www.R-project.org
Révész, A. (2012). Working memory and the observed effectiveness of recasts on different L2 outcome measures. Language Learning, 62, 93–132.
Teng, M. F. (2018). Incidental vocabulary acquisition from reading-only and reading-while-listening: A multi-dimensional approach. Innovation in Language Learning
and Teaching, 12(3), 274–288.
Teng, M. F. (2020). Retention of new words learned incidentally from reading: Word exposure frequency, L1 marginal glosses, and their combination. Language Teaching Research, 24(6), 785–812. https://doi.org/10.1177/1362168819829026
Teng, M. F. (2021). Language learning through captioned videos: Incidental EFL vocabulary acquisition. New York: Routledge.
Teng, M. F. (2022). Incidental L2 vocabulary learning from viewing captioned videos: Effects of learner-related factors. System. https://doi.org/10.1016/j.system.2022.102736
Teng, M. F. (2023a). Effectiveness of captioned videos for incidental vocabulary learning and retention: The role of working memory. Computer Assisted Language Learning.
https://doi.org/10.1080/09588221.2023.2173613.
Teng, M. F. (2023b). Incidental vocabulary learning from captioned videos: Learners’ prior vocabulary knowledge and working memory. Journal of Computer Assisted
Learning, 39(2), 517–531. https://doi.org/10.1111/jcal.12756
Teng, M. F. (2023c). Incidental vocabulary learning from captioned video genres: Vocabulary knowledge, comprehension, repetition, and working memory. Computer
Assisted Language Learning. https://doi.org/10.1080/09588221.2023.2275158
Teng, M. F. (2024). Incidental vocabulary learning from listening, reading, and viewing captioned videos: Frequency and prior vocabulary knowledge. Applied Linguistics Review. https://doi.org/10.1515/applirev-2023-0106
Teng, M. F., & Cui, Y. (2023). Comparing incidental learning of single words and collocations from different captioning conditions: The role of vocabulary knowledge and working memory. Journal of Computer Assisted Learning. https://doi.org/10.1111/jcal.12910
Teng, M. F., & Mizumoto, A. (2023). The role of spoken vocabulary knowledge in language minority students' incidental vocabulary learning from captioned television. Australian Review of Applied Linguistics, 46(2), 253–278. https://doi.org/10.1075/aral.22033.ten
Teng, M. F., & Zhang, D. (2023). The associations between working memory and the effects of multimedia input on L2 vocabulary learning. International Review of Applied Linguistics in Language Teaching (IRAL), 61(3), 1021–1049. https://doi.org/10.1515/iral-2021-0130
Unsworth, N., Heitz, R., Schrock, J., & Engle, R. (2005). An automated version of the operation span task. Behavior Research Methods, 37, 498–505.
van den Noort, M., Bosch, P., Haverkort, M., & Hugdahl, K. (2008). A standard computerized version of the Reading Span Test in different languages. European Journal
of Psychological Assessment, 24, 35–42.
Vanderplank, R. (2016). Captioned media in foreign language learning and teaching: Subtitles for the deaf and hard-of-hearing as tools for language learning. Palgrave.
van Zeeland, H., & Schmitt, N. (2013). Incidental vocabulary acquisition through L2 listening: A dimensions approach. System, 41, 609–624.
Vidal, K. (2003). Academic listening: A source of vocabulary acquisition? Applied Linguistics, 24, 56–89.
Vidal, K. (2011). A comparison of the effects of reading and listening on incidental vocabulary acquisition. Language Learning, 61, 219–258.
Waring, R., & Takaki, M. (2003). At what rate do learners learn and retain new vocabulary from reading a graded reader? Reading in a Foreign Language, 15, 130–163.
Webb, S. (2008). The effects of context on incidental vocabulary learning. Reading in a Foreign Language, 20(2), 232–245.
Webb, S. (2020). Incidental vocabulary learning. In S. Webb (Ed.), The Routledge handbook of vocabulary studies (pp. 225–239). London: Routledge.
Webb, S., & Chang, A. C. (2012). Vocabulary learning through assisted and unassisted repeated reading. The Canadian Modern Language Review, 68, 267–290.
Webb, S., & Chang, A. C. (2014). Second language vocabulary learning through extensive reading with audio support: How do frequency and distribution of
occurrence affect learning? Language Teaching Research, 19(6), 667–686.
Webb, S., & Chang, A. C. (2015). How does prior word knowledge affect vocabulary learning progress in an extensive reading program? Studies in Second Language
Acquisition, 37, 651–675.
Webb, S., & Chang, A. C. (2022). How does mode of input affect the incidental learning of collocations? Studies in Second Language Acquisition, 44, 35–56.
Webb, S., Newton, J., & Chang, A. C. (2013). Incidental learning of collocation. Language Learning, 63, 91–120.
Webb, S., Sasao, Y., & Ballance, O. (2017). The updated vocabulary levels test. ITL - International Journal of Applied Linguistics, 168, 33–69.
Wen, Z., Borges Mota, M., & McNeill, A. (2015). Working memory in second language acquisition and processing. Bristol, UK: Multilingual Matters.
Winke, P., Gass, S., & Sydorenko, T. (2010). The effects of captioning videos used for foreign language listening activities. Language Learning & Technology, 14, 65–86.
Mark Feng Teng, Ph.D., is Associate Professor at Macao Polytechnic University. He received the 2017 Best Paper Award from the Hong Kong Association for Applied Linguistics (HAAL) and the 2023 Best Paper Award in social sciences from the Ministry of Education of China. His research focuses mainly on computer-assisted vocabulary learning and on L2 writing from the perspective of metacognition. His publications have appeared in international journals, including Applied Linguistics, TESOL Quarterly, Language Teaching Research, System, Applied Linguistics Review, Computer Assisted Language Learning, Computers & Education, Foreign Language Annals, and IRAL, among others. His recent monographs have been published by Routledge, Springer, and Bloomsbury. He has also edited and co-edited special issues of international journals, including Journal of Writing Research, Studies in Second Language Learning and Teaching, and TESOL Journal.