ALL GOOD THINGS COME IN THREES: EARLY ENGLISH LEARNING, CLIL AND MOTIVATION IN SWITZERLAND

In this study, I examine the strength of the association between L3 English performance and starting age, on the one hand, and motivation and different types of provision of foreign language teaching, on the other, in Swiss learners of EFL with a long learning experience (between 6–11 years). Multilevel analyses were performed to investigate whether early starters in instructional settings achieve a long-term advantage over late starters and to examine how motivation and type of instruction (regular EFL instruction vs. Content and Language Integrated Learning, or CLIL) factor into this process. Results show that starting age alone does not seem to be the distinguishing variable and that type of instruction and, above all, motivation are stronger predictors of L3 proficiency than starting age. Furthermore, qualitative analyses reveal a bi-directional causal link between CLIL and motivation and between CLIL and learner outcomes. The study thus complements previous research by offering a critical empirical examination of age effects as well as CLIL outcomes, and by investigating second-order interactions between individual difference variables and linguistic and contextual variables, which are still under-researched both in educational psychology and in the study of second language acquisition.


INTRODUCTION
Education policy makers in many European countries tend to assume that age of instruction onset (AO) is the most important and robust predictor of success in foreign language learning in an instructional setting, "irrespective of what research findings suggest" (Mihaljević Djigunović, 2014: 420).[1] However, recent measures implemented in Switzerland to improve students' communicative skills and intercultural competence, such as the early teaching of English (English as a foreign language (EFL) is now taught from as early as the age of eight or nine in 15 out of 26 cantons; EDK 2014), have yielded rather disappointing first results, and it has become clear that the early teaching of English may not, and indeed cannot, be the sole means of improving students' English language competence. It is therefore particularly important at this time to revisit the linguistic and affective characteristics of early starters vs. late starters in various types of foreign language (FL) programs in Switzerland. Educational authorities in Europe have recently brought forward the starting age of language instruction in elementary schools, mainly as a result of the "younger-is-better" view and the steady growth of English as a lingua franca. Official Swiss language policy documents also mention other reasons, such as the political and cultural significance of the four national languages (German, French, Italian, Rhaeto-Romanic) on a national level, later-learned languages (particularly French; see Haenni Hoti et al., 2011), the goal of multilingualism in Europe, parental encouragement, globalization, integration and the world-wide network, and favorable attitudes to other languages, people and cultures (see EDK 2014, 2015; Eurobarometer 2006; European Commission 1995). This has led to small amounts of second language (L2) instruction stretched over a rather long period of time, which may have an impact on students' motivation, especially in the long term (Lasagabaster, 2011: 13).

[1] To give an example of this line of argumentation: in 2003, the Bildungsrat (Education Council) of the Canton of Zurich explained that early English was introduced because "younger learners are capable of acquiring and storing a language unconsciously, provided they are exposed to regular and rich input. Language skills that are stored in this manner will automatically be available to the learners later in life" (Bildungsratsbeschluss 18/3/2003, my translation). For a recent publication on language policy documents in Europe, the interested reader is referred to Nikolov and Mihaljević Djigunović (2011).
Around the same time as the Swiss Conference of Education Directors decided to lower the starting age of English instruction, they also started to implement Content and Language Integrated Learning programs, generally known as CLIL, in which three content subjects (such as mathematics or biology) are taught through the FL.[2] The introduction of CLIL in Switzerland reflected a general need in Europe to provide students with enhanced opportunities in school to acquire competence in additional languages (see Marsh, 2002). Since then, the very positive associations of CLIL (e.g., its perceived success and effectiveness) have attracted researchers, administrators, teacher educators and teachers, particularly those in the field of English as an L2/FL (Cenoz, Genesee & Gorter, 2014: 247). Anton Näf, Emeritus Professor at the University of Neuchâtel, even went so far as to call CLIL programs in Switzerland "the egg of Columbus" (Tages-Anzeiger 27/1/2014) because of their potential to improve students' FL skills.
One of the main goals of this study is to offer a critical empirical examination of age effects in interaction with CLIL in state educational institutions in order to better identify the strengths and weaknesses of different FL programs. However, any study of CLIL outcomes has to take into account one of the most crucial factors interacting with type of instruction, namely motivation: in many European countries, CLIL programs are not available to all students, which leads to the selection for these programs of students "who will be academically motivated to succeed in the FL, as in other subjects" (Bruton, 2011: 524). Examining the impact of starting age, type of instruction and motivation in the same study may help us understand the relative importance of each of these factors for FL proficiency, an insight that previous studies have not been able to provide.
Owing to the scope of this paper and the focus of this special issue, it was not possible to consider English in its broader multilingual context (e.g., in relation to French as an additional foreign language). The interested reader is referred to Pfenninger and Singleton (in prep.) and Pfenninger and Singleton (submitted), where we analyze in detail the causes of and constraints on cross-linguistic influence in the Zurich system, including the French-English interaction and the socio-affective dimension (motivation, attitudes, awareness, anxiety and learning strategies).

AGE-BY-TREATMENT INTERACTION RESEARCH
Age-by-treatment interaction research has traditionally shown that different learning processes are at work at different ages, which explains the need for different "treatments", both in the broad sense (exposure in a naturalistic setting vs. instruction in a classroom) and in the narrow sense (e.g., meaning-focused vs. form-focused instruction) (see DeKeyser, 2012). In this section, I will first focus on the macro level, that is, age * treatment (context) interaction,[3] followed by a discussion of the micro level, i.e. age * (instructional) treatment interaction.
Numerous classroom studies in Europe and indeed across the world (see, e.g., Al-Thubaiti, 2010 for Saudi Arabia; Muñoz, 2006; Larson-Hall, 2008 for Japan; Myles & Mitchell, 2012 for GB; Unsworth, de Bot, Persson & Prins, 2012 for the Netherlands, to name just a few) have found no correlation between starting age and FL outcomes in formal instructional settings, in contrast to the situation in naturalistic settings (for a review, see DeKeyser & Larson-Hall, 2005). The main goals of research in FL learning settings have been to examine FL outcomes as a function of starting age; the size and characteristics of older learners' advantage for different language dimensions and after different amounts of exposure; and, more recently, the interplay of the age factor with social, affective and personal variables. Contextual factors, such as amount and intensity of input (see, e.g., the collection in Muñoz, 2012a), high-quality input (e.g., Winitz, Gillespie & Starcev, 1995; Flege & Liu, 2001), range of contexts of L2 use (e.g., Moyer, 2004) and co-habitation with native speakers (e.g., Muñoz & Singleton, 2007; Kinsella & Singleton, 2014), have been shown to have a significant impact on learners' attainment (for a review, see Muñoz & Singleton, 2011). In the following, I will focus on amount and intensity of input, i.e. different types of provision of FL teaching in a classroom, notably CLIL vs. regular EFL instruction.
Launched in Europe in the 1990s by "a group of experts from different backgrounds" (Cenoz et al., 2014: 243), CLIL is "a dual-focused educational approach in which an additional language is used for the learning and teaching of both content and language" (Coyle, Hood & Marsh, 2010: 1; see Cenoz et al., 2013, for a description of the wide range of educational CLIL practices). The dual role of language and content thus means that proficiency is to be developed both in the non-language subject and in the language in which it is taught (Lasagabaster, 2011), although it is notoriously difficult to strike a strict balance between language and content, which leads to "a lack of cohesion around CLIL pedagogies" (Coyle, 2008: 101; see also Cenoz et al., 2014; Mehisto, 2008; Pérez-Vidal & Juan-Garau, 2010). Since the definition of CLIL now also includes reference to partial immersion (Cenoz et al., 2014: 246; Maillat, 2010; see Pérez-Cañado, 2012, for an opposing view), the term CLIL will be used in the following as a cover term for both CLIL and immersion.[4]
A considerable amount of CLIL research has been carried out in intensive primary and secondary school classes in the last twenty years, and various benefits of CLIL have been pointed out, such as the following:
(1) Owing to the greater amount and intensity of exposure to the FL, on the one hand, and the opportunities for engaging in authentic and meaningful interaction in real-life contexts, on the other, immersion students have traditionally been found to be highly successful in comparison with students who have received regular FL instruction, particularly with respect to receptive skills (listening and reading), oral fluency, syntactic complexity, lexical range and confidence/risk-taking in the target language (e.g., Collins & White, 2011; Dalton-Puffer, 2007; Spada & Lightbown, 1989; Pfenninger, 2014; Ruiz de Zarobe & Jiménez Catalán, 2009; Serrano & Muñoz, 2007).
(2) CLIL students have been reported to demonstrate better verbal and non-verbal communication skills, cognitive skills and divergent thinking than their non-CLIL counterparts (Vesterbacka, 1991).
(3) The above benefits have emerged both when exposure has been concentrated in short intensive experiences and when it has been distributed across time (e.g., Collins & White, 2011).
(4) CLIL is said to be able to minimize the role that individual differences, such as language learning aptitude, may play in situations of more limited exposure (e.g., Collins & White, 2011).
(5) CLIL increases exposure to the target language without taking up more time in an already crowded school timetable (e.g., Lasagabaster, 2011).
(6) Content knowledge appears to remain on a par with that learned through the L1 (e.g., Admiraal, Westhoff & de Bot, 2006; see also Cummins, 1995; Genesee, 1987, 2004).
(7) L1 skills are very similar in CLIL and non-CLIL classes (e.g., Seikkula-Leino, 2007; Vesterbacka, 1991).
(8) Because exposure to the FL is greater than in regular programs, CLIL programs are known to foster implicit learning,[5] which has been identified as a highly effective way of learning (Coyle, 2008; de Graaff & Housen, 2009; DeKeyser, 2000; Hulstijn, 2002).
(9) Related to point (8), CLIL is age-appropriate in elementary schools, since younger children (e.g., in an early FL program) cannot attend to formal, explicit L2 instruction to the same extent as older children, as prepubertal learning is less reliant on analytic ability (e.g., N. Ellis, 2002).
Of course there are numerous well-known issues with the implementation of CLIL in the classroom, particularly with implicit learning in connection with maturational effects, but I do not wish to go very deeply into this here (the interested reader is referred to Pfenninger, 2011, 2014; Pfenninger & Singleton, in prep.). The important point here is that AO-by-treatment interaction research shows more than the importance of starting age or of a particular treatment. It can show why a treatment works best (or, more precisely, why it sometimes does and sometimes does not): because of the learning processes it involves, the treatment works well only with certain AO groups.

[5] According to R. Ellis (2005), implicit knowledge "is procedural, is held unconsciously, and can be verbalized only if it is made explicit. It is accessed rapidly and easily and thus is available for use in rapid, fluent communication" (p. 214). By contrast, explicit knowledge "is conscious and declarative and can be verbalized. It is typically accessed through controlled processing when learners experience some kind of linguistic difficulty in the use of the second language" (p. 214).

AGE * (INSTRUCTIONAL) TREATMENT * MOTIVATION INTERACTION
From the above discussion it has become clear that valid comparisons of learner outcomes in CLIL vs. non-CLIL classes require either controlling for students' motivational levels or, preferably, including motivation as a further fixed effect in the statistical model. The theoretical framework for motivation in this study is based on the L2 Motivational Self System proposed by Dörnyei (2005, 2009), which hypothesizes that students' motivated learning behavior is largely shaped by three components. The Ideal L2 Self, the person the learner wants to become, incorporates "traditional integrative and internalised instrumental motives". The Ought-to L2 Self, the side which wants to avoid punishment and meet expectations, incorporates "more extrinsic (i.e. less internalised) types of instrumental motives" (Dörnyei, 2009: 9).[6] A third component, the L2 Learning Experience, covers the more immediate learning situation (syllabus, teacher, etc.), which is important to any study of L2 motivation in a classroom context. Instrumentality is thus partly related to the Ideal L2 Self, particularly instrumentality with a promotion focus (Dörnyei, 2005: 30). Other forms of instrumentality (e.g., instrumentality with a prevention focus) may be more closely associated with the Ought-to L2 Self, the image of oneself which avoids punishment, i.e. external regulation.
As Dörnyei and Chan (2013: 439) point out, numerous studies in recent decades have confirmed the overall explanatory power of the L2 Motivational Self System, with the Ideal L2 Self in particular seen as a strong predictor of various criterion measures related to language learning and thus as playing a substantive role in determining motivated behavior. For instance, research by Csizér and Lukács (2010) confirmed the central role of the Ideal L2 Self in predicting motivated learning behavior and its paramount influence on motivation. By contrast, it has often been suggested that the Ought-to L2 Self has no significant impact on results (e.g., Csizér & Dörnyei, 2005: 29). However, Csizér and Lukács (2010: 12) argue that further research on, and reformulation of, the concept of the Ought-to L2 Self may clarify this aspect of the L2 Motivational Self System. What is more, age plays a role in the formation of selves: Kormos and Csizér (2008) found that secondary school students in Hungary had lower scores for the Ideal L2 Self than university students or adult workers, speculating that "students' self-image is relatively stable, and because they have to acquire the L2 in adulthood, the L2 self is also under transformation at this stage" (2008: 346).

RESEARCH QUESTION AND VARIABLES
On the basis of what has been discussed so far, the present study aims to enrich our knowledge of the effects of input in long-term FL learning by exploring which of the three predictors (starting age, type of instruction, motivation) has the strongest predictive power. The main research question is the following: (1) What is the strength of the association between L3 performance and starting age, on the one hand, and type of instruction and motivation, on the other, in learners with a long learning experience (between 6-11 years)?
The individual-difference factors in this study are AO and motivation. The context-level factor, CLIL, is hypothesized to influence individual EFL proficiency through its moderating effect on the association between the individual-difference factors and L2 proficiency. Although I hypothesize motivation and CLIL to have a positive effect, and starting age a neutral effect, on EFL proficiency at both the individual and the contextual level, how and to what extent individual- and contextual-level factors interact with each other are open empirical questions.

PARTICIPANTS: NESTING STRUCTURE
A total of 200 participants (89 males and 111 females) were clustered in 12 classes in five schools, most classes consisting of 10-20 learners. All participants had similar characteristics: they were in grade 12 English classes in academically oriented secondary schools, they were between 17 and 20 years old (mean 18;9), they came from similar socioeconomic backgrounds, and they did not take any private English classes outside school.[7] Through this clustering, participants were streamed into two different instruction types: on the one hand, students enrolled in CLIL programs (100 students in six classes) and, on the other, students who followed an EFL approach and only had exposure to EFL in the traditional way (100 students in six classes). As mentioned above, students in CLIL classes received additional exposure to the foreign language: English classes as well as the school subjects taught in English. Furthermore, participants were divided into four groups according to age of onset and learning constellation in primary and secondary school: 50 were early starters who attended an immersion (CLIL) program in secondary school (EARLY CLIL); 50 had followed the same elementary school program but then received regular EFL instruction after elementary school (EARLY NON-CLIL); 50 were late starters who began learning English immersively in secondary school (LATE CLIL); and the remaining 50 attended a regular EFL program (LATE NON-CLIL). Note that the early starters (EARLY CLIL and EARLY NON-CLIL) and the late starters (LATE CLIL and LATE NON-CLIL) had dissimilar amounts of exposure: because of their earlier start, the EARLY CLIL and EARLY NON-CLIL groups had had more instruction time. By the end of secondary school, the EARLY CLIL group had spent an average of 1,770 hours learning English, followed by the LATE CLIL group with 1,330 hours, the EARLY NON-CLIL group with 1,170 hours, and the LATE NON-CLIL group with 730 hours.
Other recent studies of maturational effects in the classroom have used shorter periods (from 600 to 800 hours) in their longest-term comparisons (e.g., García Mayo & García Lecumberri, 2003; Larson-Hall, 2008; Muñoz, 2006). In the present study, moreover, early starters were not mixed with late starters in the same class. A biodata questionnaire was administered to collect biographical data and quantifiable information on participants' language learning experience (e.g., starting age, number of instructional hours in school, frequency of contact with L2 speakers, time spent abroad).
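The four average hour totals reported above follow a simple additive pattern that makes the 2 × 2 design easier to read. The sketch below is a plain-Python check on those reported averages only (the dictionary layout and the function name `additive_decomposition` are illustrative, not part of the study): it takes LATE NON-CLIL as the baseline and splits the other totals into an early-start increment and a CLIL increment.

```python
# Average cumulative hours of English instruction by the end of secondary
# school, as reported for the four groups (keys: starting age, instruction).
HOURS = {
    ("early", "clil"): 1770,
    ("late", "clil"): 1330,
    ("early", "non-clil"): 1170,
    ("late", "non-clil"): 730,
}


def additive_decomposition(hours):
    """Split the 2 x 2 table of hours into a baseline (LATE NON-CLIL),
    an early-start increment, a CLIL increment and a non-additive residual."""
    baseline = hours[("late", "non-clil")]
    early_increment = hours[("early", "non-clil")] - baseline
    clil_increment = hours[("late", "clil")] - baseline
    # If the two increments simply add up, EARLY CLIL = baseline + both.
    residual = hours[("early", "clil")] - (baseline + early_increment + clil_increment)
    return {
        "baseline": baseline,
        "early_increment": early_increment,
        "clil_increment": clil_increment,
        "residual": residual,
    }


print(additive_decomposition(HOURS))
```

On these averages the early start adds about 440 hours and secondary-school CLIL about 600 hours, with a residual of zero, so the EARLY CLIL total is exactly the baseline plus both increments. In this design, then, each level of starting age and of instruction type comes bundled with a fixed amount of additional exposure.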
The school track under investigation here, which I refer to as 'academically oriented secondary school', represents the main, but not the only, pathway to university entry. It is an elite and selective publicly funded school type, representing one of three main secondary school tracks (the highest educational level). In the canton of Zurich, admission is based on students' average grades and an entrance examination. The number of students taking the matura or maturité exam (i.e. the final graduation exam) has increased in recent years: between 1986 and 2013, the percentage awarded this certificate almost doubled, to 20 percent (http://www.bfs.admin.ch/bfs/portal/de/index/themen/15/01/pan.html, accessed 12/7/2015). There are three main reasons why it was decided to assess the development of EFL skills in this group of learners:
(1) This particular secondary school track is roughly equivalent to grammar schools, Baccalaureate schools and high schools in other countries in terms of length of instruction (six years until graduation), institutional design (e.g., number and kinds of compulsory subjects, assessment of students, final certificate) and purpose (e.g., they do not lead to professional qualifications but prepare students for tertiary education programs). This is important for comparisons with related previous work in Europe and elsewhere.
(2) Lower secondary levels, which only take three years, are not ideal for testing the long-term effects of an early foreign language program. In age-related research, one of the most basic and important tasks is to identify predictors of both short-term AND long-term FL attainment. Furthermore, it has been suggested that it takes a substantial accumulation of input for the advantages of an early start to manifest themselves (e.g., Larson-Hall, 2008; Muñoz & Singleton, 2011; Singleton, 1995a, 1995b, 2005).
(3) Assessing "good and motivated" learners,[8] who (ideally!) involve themselves in the language-learning process and take into account the demands that FL learning imposes, is not considered a limitation in this kind of study: strong learners can provide key data on the effectiveness of a new FL program and yield revealing results in the search for influential factors in the process of FL learning (see, e.g., Muñoz, 2014). The insights thus gained can then also help learners who are not obtaining such good results.
It goes without saying that the complexity of the Swiss educational system makes generalizations difficult; this, however, is a general problem in studies of foreign language learning, which we discuss in detail in Pfenninger and Singleton (forthcoming) and Pfenninger and Singleton (in prep.).
It is also important to bear in mind that in Switzerland a distinction is made between CLIL and immersion: while activities are undertaken in English in the CLIL classroom, these activities relate to the learning of the second language. As such, the CLIL program in Swiss primary schools is similar to the "intensive English programs" in Canada (see, e.g., Netten & Germain, 2004), albeit with considerably fewer hours of instruction per week (two 45-minute lessons). The emphasis is placed on L2 sensitization, oral fluency, comprehension, cultural awareness, vocabulary and formulaic language. However, the strong focus on meaning in comprehensible input and the communication of authentic messages resemble the main goal of immersion programs. By contrast, the CLIL program that Swiss students later attend in secondary school is a partial immersion program in which three content subjects (e.g., mathematics, biology and history) are taught through the FL (L3 English) in order to maximize the quantity of comprehensible input and the purposeful use of English, in line with Swain's (1985) Output Hypothesis and Long's (1981) Interaction Hypothesis. In addition, English is taught formally as a separate school subject. Learners thus experience a combination of formal and informal learning, which offers them what seems to be an ideal opportunity to learn an FL in a classroom: a combination of explicit learning, or "focus on forms", and implicit learning, or "focus on meaning", to use Long and Robinson's (1998) terms. Even though in many Swiss schools a student's average school grade serves as a criterion for deciding who can join the program, the immersion students in this study did not have significantly better grades in English before they entered the program.
This suggests, to a certain extent, that the results are not confounded by the EARLY CLIL and LATE CLIL groups having been more proficient than the EARLY NON-CLIL and LATE NON-CLIL groups to start with.
Finally, English is considered an L3 here because of the particular linguistic landscape of Switzerland: although Swiss German is a High Alemannic variety of German, it is hardly intelligible to someone who knows only Standard German, as the two differ to some extent in lexicon, phonology and syntax (for a discussion, see, e.g., Berthele, 2010). According to Lüdi (2007: 161), most Swiss citizens are monolingual during childhood, but they usually become bilingual in the early primary grades at the latest, when they receive formal literacy training in L2 (Standard) German from 1st grade on (age 7). This means that German-speaking Swiss children have to learn to read, write and use a relatively unknown language all at once.

MEASURES
Because the Ideal L2 Self/Ought-to L2 Self distinction in the L2 Motivational Self System proposed by Dörnyei (2005, 2009), as well as the Integrativeness/Instrumentality distinction in Gardner's Socio-Educational Model of Language Learning (e.g., Gardner, 2008; Gardner & Lambert, 1959), proved fuzzy in the language experience essays written by the 200 participants (see Pfenninger & Singleton, in prep., for a detailed description of this task), this study distinguishes between learners' Future selves and their Present selves rather than between L2 Self, Integrativeness and Instrumentality. Future selves encompass students' wish to become similar to native speakers of English as well as the future usefulness of the L2 skills they learn. Present selves refer to the current attitudes learners display toward EFL and the L2 community and their reactions to a world in which English plays a predominant role, as well as the extent to which learners want to be involved in cross-cultural contact situations and to travel to English-speaking countries. This dimension also includes factors of external regulation that lead to action in order to avoid punishment or bad grades or to assuage one's guilty conscience. Participants completed a 15-item Likert-type questionnaire with five response options (totally disagree, disagree, neutral, agree, totally agree) for each statement. The 15 items were taken from the motivation questionnaire of a large-scale study (see Pfenninger & Singleton, forthcoming). A third of the statements were formulated negatively, and the resulting list was translated into German and presented in random order. Table 1 shows the Cronbach's alpha reliability coefficients for the two multi-item scales of the present study; all of them are above the recommended .70 threshold.
[…] Laufer and Nation (1999), and a listening comprehension task (see Pfenninger, 2014).
The tasks had been aligned with Levels B2/C1 of the Common European Framework of Reference for Languages (CEFR). The grammaticality judgment task included morphosyntactic structures that have been found to be particularly age-sensitive, such as articles and inflections, as well as structures that are not particularly age-sensitive, such as word order and do-support (see, e.g., McDonald, 2006).
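The reliability check on the motivation scales rests on Cronbach's alpha, computed after the negatively worded items have been reverse-coded. The sketch below shows the standard formula in plain Python; the response matrix is invented for illustration, the function names are hypothetical, and this is not the software used in the study.

```python
from statistics import variance


def reverse_code(score, scale_min=1, scale_max=5):
    """Reverse a Likert response: on a 1-5 scale, 1 <-> 5 and 2 <-> 4."""
    return scale_min + scale_max - score


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    Sample variances are used for both terms; because the n/(n - 1) factor
    cancels in the ratio, population variances would give the same alpha.
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*responses))  # one tuple of scores per item
    item_var_sum = sum(variance(item) for item in items)
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical answers of four respondents to three items; item 3 is
# negatively worded and must be reversed before scoring.
raw = [[4, 5, 2], [2, 2, 4], [5, 4, 1], [3, 3, 3]]
recoded = [[a, b, reverse_code(c)] for a, b, c in raw]
print(round(cronbach_alpha(recoded), 3))  # -> 0.951
```

In this toy matrix the reversed third item moves in step with the other two, giving an alpha of about 0.95; left un-reversed, the same item pulls alpha below zero, which is why reverse-coding has to precede any reliability estimate.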

METHOD
I used R (R Development Core Team 2014) and lme4 (Bates, Maechler & Bolker, 2008) to perform a linear mixed-effects analysis (also called multilevel analysis) of the relationship between the English test scores and AO, type of instruction (CLIL vs. regular EFL) and motivation (see Pfenninger & Singleton, forthcoming, for a discussion of the benefits of such models). AO, type of instruction and motivation were entered into the model as fixed effects. When continuous predictors such as motivation are included in a mixed-effects model, it is often useful to center each predictor around its mean value (Cunnings, 2012). This involves subtracting the predictor's overall mean from each of its individual values, and it helps to reduce collinearity within the model (e.g., between main effects and interactions; see Jaeger, 2010). The final models had random effects (intercepts) to account for class-to-class and school-to-school differences, which induce correlation among the scores of students within the same school and the same class. In other words, the data on all skills tested had a three-level hierarchical structure: student (level 1), class (level 2) and school (level 3). The test scores entered the model at the student level. There were significant random school and class effects for all dependent variables. Likelihood ratio tests showed that random slope models (subject-specific slopes for the fixed effect AO) were not necessary for any dependent measure, so I constructed random intercept models. None of the interactions included (age * motivation; age * instruction; instruction * motivation) improved model fit, except in two areas (listening comprehension and productive vocabulary; see below).
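The centring step can be illustrated with a toy calculation. The sketch below assumes a 0/1 indicator for CLIL and a 1-5 motivation score (hypothetical values, not the study's data; the helper names are illustrative) and compares the correlation between motivation and the motivation × CLIL product term before and after centring.

```python
from statistics import mean


def center(values):
    """Subtract the predictor's overall mean from every value."""
    m = mean(values)
    return [v - m for v in values]


def pearson(x, y):
    """Pearson correlation coefficient, written out to avoid dependencies."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical toy data: motivation on a 1-5 scale, CLIL coded 0/1.
motivation = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
clil = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

# Interaction term built from the raw predictors.
raw_interaction = [m * c for m, c in zip(motivation, clil)]

# Interaction term built after centring both predictors.
mot_c, clil_c = center(motivation), center(clil)
centered_interaction = [m * c for m, c in zip(mot_c, clil_c)]

print(round(pearson(motivation, raw_interaction), 3))   # uncentred
print(round(pearson(mot_c, centered_interaction), 3))   # centred
```

In this balanced example the correlation between the main effect and the interaction term drops from about 0.39 to zero. The reduction depends on centring both variables in the product: centring motivation alone would not help here, because with an uncentred 0/1 indicator the product simply equals motivation within the CLIL group. Whether the study also centred the instruction variable is not stated in the text.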
For the listening comprehension task, the grammaticality judgment task, the productive vocabulary and receptive vocabulary tasks, I added random intercepts for subjects and items in order to account for the fact that some participants may generally have attained higher scores in this particular task than others, and some items may generally have yielded lower scores than others (see Cunnings, 2012: 374).
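For readers who prefer an explicit formulation, the random-intercept structure described above can be written as follows for student i in class j of school k. This is a sketch under stated assumptions: AO and CLIL are taken to be 0/1 indicators (early start; CLIL in secondary school), MOT is the motivation score centred on its grand mean, and the notation is illustrative rather than the study's own. In lme4, a model of this shape would typically be specified with a formula along the lines of `score ~ AO + instruction + motivation_c + (1 | school/class)`, where the variable names are hypothetical.

```latex
y_{ijk} = \beta_0 + \beta_1\,\mathrm{AO}_{ijk} + \beta_2\,\mathrm{CLIL}_{ijk}
        + \beta_3\,\bigl(\mathrm{MOT}_{ijk} - \overline{\mathrm{MOT}}\bigr)
        + v_k + u_{jk} + \varepsilon_{ijk},
\qquad
v_k \sim \mathcal{N}(0, \sigma^2_{\mathrm{school}}),\quad
u_{jk} \sim \mathcal{N}(0, \sigma^2_{\mathrm{class}}),\quad
\varepsilon_{ijk} \sim \mathcal{N}(0, \sigma^2)
```

For the item-level tasks described in the preceding paragraph, crossed random intercepts for participants and items would be added to the right-hand side in the same way.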
Visual inspection of residual plots did not reveal any obvious deviations from homoscedasticity or normality. P-values were obtained by likelihood ratio tests of the full model with the effect in question against the model without the effect in question. All models reported were fitted using Laplace estimation with the R software. All models were first evaluated with likelihood ratio tests (test model vs. null model with only the control variables); only if this comparison reached significance do I present p-values based on likelihood ratio tests. Because degrees of freedom are not straightforwardly defined for mixed models, I refrain from reporting df. Table 2 presents the mean scores, standard deviations and intergroup differences for the seven language measures and the motivation measure.
To answer the research question regarding the strength of the association between English proficiency and starting age, on the one hand, and type of instruction and motivation, on the other, mixed linear regression models with the test scores as dependent variables were fitted. A summary of all models is presented in Tables 3 and 4.
Note. *Statistically significant at p < .05; **statistically significant at p < .01. LC = listening comprehension; PV = productive vocabulary; RV = receptive vocabulary; W/TU = written fluency: words per T-unit; CL/TU = written syntactic complexity: clauses per T-unit; ERR/TU = written accuracy: morphosyntactic errors per T-unit; GJT = grammaticality judgment task.
It is clear from Table 3 that there were no age effects for any of the dependent variables, and that AO did not interact with type of instruction or motivation for any of the seven measures, with one exception: there was a significant interaction between AO and type of instruction for listening comprehension, which reflects the advantage of the EARLY CLIL group over all the other groups (Figure 1). The early starters in the immersion program (EARLY CLIL) significantly outperformed all the other groups, including the late starters in the same program (LATE CLIL). Furthermore, older starters did not show greater variation in their L2 performance, as Table 2 above shows (see also Pfenninger, 2011, 2014).
CLIL significantly affected five out of seven dependent variables: listening comprehension (increasing it by about 6.5±2.4 points on a 20-point scale), productive vocabulary (raising it by about 5.8±5.89 points on the 54-point scale), receptive vocabulary (increasing it by 10.32±5.40 points on the 60-point scale), fluency (increasing it by 0.63±2.83 words per T-unit) and complexity (increasing it by 0.21±0.25 clauses per T-unit). Interestingly, both early starters and late starters benefited from immersion with respect to these measures. Accuracy as measured by errors per T-unit and grammaticality judgments was not affected by CLIL (see also Pfenninger, 2014). In fact, productive accuracy (errors per T-unit) was not affected by AO, type of instruction or motivation: all four groups had similar scores despite their dissimilar profiles (see Figure 2).
Motivation also affected five out of seven dependent variables: listening comprehension (improving it by 1.54±0.41 points on a 20-point scale), productive and receptive vocabulary (increasing them by 6.17±1.02 points and 0.54±0.93 points, respectively), fluency (improving it by 0.96±0.49 words per T-unit) and grammaticality judgments (improving them by 0.42±0.35 points), but it did not have an effect on complexity or on accuracy as measured by errors per T-unit. Interestingly, motivation did not interact with AO, and its only interaction with type of instruction was found for productive vocabulary, where it was due to the CLIL students' higher motivation. Irrespective of starting age or type of instruction received, then, students with a higher motivation level outperformed less motivated students.
To estimate the effects of AO and type of instruction on motivation, a mixed model was fitted with AO and type of instruction as fixed effects and school and class as random effects (intercepts). The results showed that whereas AO did not affect motivation (χ²(1) = 0.00, p = 0.949), instruction type (that is, CLIL in secondary school) had a significant impact (χ²(1) = 12.77, p = 0.0004), raising motivation by about 0.40 ± 0.10 points on a 5-point scale, as Figure 4 illustrates. Thus, with respect to the second variable under investigation, motivation, the findings suggest the following: the CLIL students were more motivated than their non-CLIL counterparts; however, generally speaking, students with greater motivation performed better on the English tests, irrespective of type of instruction and AO; and as for FL competence, CLIL had more beneficial effects than regular EFL instruction.
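The χ²(1) statistics reported above are of the kind produced by a likelihood-ratio test comparing two nested mixed models that differ by a single fixed effect; this section does not state the test procedure explicitly, so that is an assumption here. The Python sketch below shows the arithmetic under that assumption: the statistic is twice the difference in log-likelihoods, and for one degree of freedom its p-value equals erfc(√(χ²/2)), because a χ²(1) variable is the square of a standard normal one. The function names and the two log-likelihood values in the example are hypothetical; only the statistic 12.77 comes from the text.

```python
import math


def lr_statistic(loglik_full: float, loglik_reduced: float) -> float:
    """Likelihood-ratio statistic for two nested models fitted by maximum likelihood."""
    return 2.0 * (loglik_full - loglik_reduced)


def chi2_sf_df1(stat: float) -> float:
    """Upper-tail p-value of a chi-square distribution with 1 degree of freedom.

    For df = 1, X = Z**2 with Z standard normal, so
    P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    Non-positive statistics (e.g. numerical noise) give p = 1.
    """
    if stat <= 0:
        return 1.0
    return math.erfc(math.sqrt(stat / 2.0))


if __name__ == "__main__":
    # Reported statistic for the effect of type of instruction on motivation.
    print(f"chi2(1) = 12.77 -> p = {chi2_sf_df1(12.77):.4f}")

    # Hypothetical log-likelihoods for a model with and without the instruction term.
    stat = lr_statistic(loglik_full=-410.0, loglik_reduced=-416.385)
    print(f"LR = {stat:.2f}, p = {chi2_sf_df1(stat):.4f}")
```

The computed p-value for 12.77 is about 0.00035, which rounds to the reported 0.0004. Likewise, the reported χ²(1) = 0.00 for AO is presumably a rounded small positive value, since a statistic of exactly zero would yield p = 1 rather than 0.949.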

DISCUSSION
The current study has found that the late-starting groups (LATE CLIL and LATE NON-CLIL) were able to catch up with the EARLY NON-CLIL group, which supports the hypothesis that the initially faster rate of FL learning of older learners may persist for several years in an input-impoverished environment (Larson-Hall, 2008; Muñoz & Singleton, 2011; Singleton, 1995a, 1995b, 2005). Even though early learners (such as the EARLY CLIL and EARLY NON-CLIL groups in this study) may in theory have greater potential than late starters due to their earlier AO and larger amount of cumulative input, this does not translate into better performance unless formal instruction in English in secondary school is supported by late immersion, as the results reported above show.
Probably more surprising than the EARLY CLIL students outperforming the students in non-CLIL programs (EARLY NON-CLIL and LATE NON-CLIL) is the finding that the LATE CLIL group made significant progress in a variety of skill areas, to the extent that they were able to catch up with the EARLY CLIL group. Thus, it seems to be access to late CLIL, regardless of early instruction, that makes the difference here. The oral-based, communicative pedagogical approach used in secondary-school CLIL programs could explain the significant differences in productive and receptive vocabulary knowledge, as well as in written complexity and fluency, between the students who received immersion education in secondary school (EARLY CLIL and LATE CLIL) and the traditionally instructed participants (EARLY NON-CLIL and LATE NON-CLIL). The benefits of CLIL for vocabulary in particular have been well documented in the literature (see literature review above). The overall success of the LATE CLIL group in these various skills is yet another indicator that instruction seems capable of overriding the age factor in a classroom setting.
The findings also confirm previous studies (e.g., Collins et al., 2012; Genesee, 1987, 2004; Pica, 2011; Spada & Lightbown, 1989) showing that (morphosyntactic) accuracy remains challenging for CLIL students. In addition, they corroborate the positive effects of form-focused instruction on acquisition, that is, the effectiveness of explicit instruction for students' acquisition and use of specific morphosyntactic features of English. The lack of significant differences between the groups in morphosyntactic accuracy might be due to the fact that all four groups practiced English grammar to the same extent: since all the participants attended formal, explicit EFL instruction, they were required to read and write in English equally often and paid great attention to accuracy.
With respect to motivation, the findings confirm previous CLIL research (e.g. Lasagabaster, 2011) suggesting that learning through the FL increases motivation. The novel aspect of this study is that CLIL and motivation had similar effects on language competence and, except for productive vocabulary, did not interact. Finally, the results do not confirm previous findings that the more years students spend studying a subject, the more disenchanted with it they become (see e.g. Davies & Brember, 2001): AO (and therefore also length of instruction) did not have a significant effect on motivation. CLIL in secondary school, on the other hand, had a significant impact on students' motivation levels at the end of secondary education. 10

CONCLUSION
My goal in this study was not to view CLIL "in a vacuum", as Bruton (2011: 531) fears happens in most CLIL studies, but to examine four different real-life educational scenarios that have been or are currently practiced in the Swiss system. As DeKeyser (2012: 190) rightly points out, interactions between individual difference variables and external, educational or contextual variables allow for more fine-tuned (and hence more generalizable) predictions that can inform both the adaptation of teaching methodologies to students and curriculum design.
Like so many previous studies (e.g. Freed, Dewey, Segalowitz & Halter, 2004; Lasagabaster, 2011), my analysis has shown that CLIL programs exert a very positive influence on learners' FL achievement and should therefore be expanded. Since CLIL is not yet well established in Switzerland, it still has to struggle for recognition and support. To date, intensive EFL is an optional program available only to a minority of (high-achieving) students. Given the finding that it is particularly low-level learners who make the most impressive progress in an intensive program (see e.g. White & Collins, 2012), offering intensive EFL to more secondary school students in Switzerland is highly recommended.
Furthermore, as in previous studies of the outcomes of CLIL programs, a number of (well-known) problems emerged in this study: (1) One obvious limitation of this study is that, since the CLIL groups had not only English language classes but also three school subjects taught in English, two variables were conflated in the CLIL groups: type of provision and exposure (see Bruton, 2011; Cenoz et al., 2014). In other words, the CLIL students received many more hours of (formal and informal) EFL instruction than any of the other groups. This is probably one of the most fundamental issues for CLIL researchers and can only be resolved with complementary qualitative analyses (see Pfenninger & Singleton, in prep.).
(2) One factor that can (and must) be controlled for in the future, however, is aptitude, given the insight that "CLIL can attract a disproportionally large number of academically bright students" (Mehisto, 2007: 63). It would greatly enrich the CLIL field to analyze the impact of aptitude at the beginning and at the end of immersion programs, so that the effect of intensive learning contexts could be assessed more effectively from a theoretical perspective. Even though the four groups in this study started with similar overall academic achievement according to previous grades in English, the CLIL participants might have profited from cognitive advantages that could not be captured in this research design.
(3) Related to (2), another caveat that needs to be mentioned is that there was no pretest. Even though the four groups in this study started with the same percentage averages in English, there might have been differences in language competence between the CLIL and non-CLIL groups that were not reflected in the students' grades, as was the case in Alonso, Grisaleña & Campo (2008).
(4) Because of the diversity of CLIL programs in Europe and the lack of conceptual clarity (see Cenoz et al., 2014), it is difficult for researchers to provide a clear and detailed description of CLIL classrooms and programs. This calls for further (critical) research into the methodological approaches within which foreign language teaching takes place.
In a next step, it would also be worthwhile to analyze which input measures (length of instruction in years, use of English as the language of instruction, number of curricular and extracurricular lessons, amount of time spent in a naturalistic immersion situation abroad, current informal contact with the target language) are most strongly associated with long-term L3 performance and how aptitude factors into this process (see Pfenninger & Singleton, in prep.).