Linguistic Thresholds in the CLIL Classroom?
The Threshold Hypothesis Revisited

Wolfgang Zydatiß
Freie Universität Berlin


The article summarises some important findings of an evaluation project on extensive bilingual courses at grammar schools in Berlin (using English as the working language). It focuses on interdependencies between foreign language competencies and academic discourse competencies relevant to subject-matter teaching at the lower secondary level. Employing a comparative design, the study relates the performance of two pupil samples: one taken from regular classes (taught in German) and one drawn from bilingual streams at the same schools. Both groups of students took an English achievement & proficiency test plus a test (in either German or English) probing learners’ subject-matter literacy pertaining to different curricular areas. Statistical analysis shows that significant correlations exist between linguistic competencies (especially lexicogrammar and/or general proficiency) and task performance in English-medium content learning. The article suggests that there may be a double language threshold (a lower one and an upper one) which acts as an intervening variable for transferable academic discourse competencies, thereby accounting for learners’ low or high scores in a test setting which tries to tap the underlying construct of a general academic proficiency relevant to German CLIL classrooms.


Keywords: CLIL, bilingual education, threshold hypothesis, task-based language assessment, bilingualer Sachfachunterricht, general academic (second / foreign) language proficiency

1. Academic success and limited language proficiency

1.1 CLIL in its wider context

International research on bilingual education focused fairly early on the question of whether there are interdependencies between second language proficiency and learners’ academic achievements (Figure 1):

Figure 1: Interdependencies between second language ability and scholastic success in bilingual educational contexts


Having reviewed the empirical data on both the Canadian immersion programmes and on bilingual contexts in Europe involving school learners with a limited proficiency in the respective majority language, Cummins formulated his famous threshold hypothesis (1979) which differentiated between a lower and an upper threshold: “…if a bilingual child attains a very low level of competence in the second (or first) language, interaction with the environ­ment through that language, both in terms of input and output, is likely to be impoverished” (Cummins, 1979: 229f.). These learners may risk, in other words, becoming underachievers at school (when compared to their intellectual potential). By contrast, scholastic achievement will in general not be hampered provided the following condition holds: “The attainment of a lower level threshold of bilingual competence would be sufficient to avoid any negative cognitive effects” (Cummins, 1979: 230). Balanced bilingual speakers, however, who have reached the second, upper level with an age-appropriate competence in both languages might even have cognitive advantages: “The attainment of a second higher level of bilingual competence might be neces­sary to lead to accelerated cognitive growth” (Cummins, 1979: 230).

The threshold hypothesis has been criticised on the grounds that it cannot be quantified in strict linguistic terms; i.e. how many lexical items a learner may need to have to ‘be above’ that “ominous” lower threshold (to avoid a negative impact on learning), or what complexity and/or range of grammatical structures (s)he should be able to handle both receptively and productively to be considered a ‘top floor’ balanced bilingual. Intuitively, Cummins’ hypothe­sis has a lot of plausibility, though, especially these days when European school systems have to cater for a high percentage of second language learners who either have a migrant back­ground or come from families distant to scholastic endeavour (groups of students labelled “vulnerable learners” by the Council of Europe: Vollmer and Thürmann, 2010: 107).

1.2 CLIL classrooms in Germany

In Germany, ‘bilingual classrooms’ (in the form of a ‘bilingual stream’ = “bilingualer Zweig/Zug”) developed in the wake of the 1963 German-French treaty, predominantly in humanities teaching at selected grammar schools along the Rhine valley, using French as the ‘working language’. It was a genuine grass roots movement adopted by highly committed subject teachers with a well above average competence in the foreign language and a special interest in intercultural encounters. Meanwhile, the curricular concept has spread to other types of secondary schools, other subjects (notably Biology) and other foreign languages (cf. Werner, 2009), such that some 700 secondary schools now teach one or two subjects for several years of the secondary curriculum in this way (what Canadians might call late immersion). The concept has attracted some criticism, particularly from experts in the field of history teaching (Hasberg, 2004), who see their subject instrumentalised for the goal of content-based foreign language instruction and remain sceptical about the outcomes of the actual content teaching itself.

It is in this light that Berlin’s Ministry of Education commissioned an evaluation project of the most common, prototypical CLIL variant in Germany (called “bilingualer Sachfachunterricht”), which was still in its experimental stage in the capital after the turn of the 21st century. Three grammar schools were chosen which were already offering ‘regular’ and ‘bilingual’ classrooms (with samples of 85 and 106 pupils respectively). All testees were Grade 10 students (age 16+) having English as their first foreign language (taught from Grade 5 onwards). Throughout the first two years of a ‘bilingual stream’ the foreign language is given additional weight in the pupils’ timetable, usually with two extra lessons per week. It is only after this bridging support (= “Vorlauf”) that the first subject is taught through the foreign language (normally Geography, sometimes History), often bolstered by an additional lesson because curricular progress in the subject tends to be slower at the beginning. The following year, a second subject is introduced (increasingly Biology instead of History), such that two subjects are taught ‘bilingually’ at the lower secondary level. (The French and German terms are somewhat misleading, as the lessons are effectively monolingual.) At the upper secondary level the working language is carried over to Political Science, which is a compulsory subject for all students.

2. Research methodology

2.1 Design of the study

Since the evaluation project was meant to investigate the summative and the formative dimension of the curricular concept, the two samples of the comparative study had to be drawn from the natural settings of their respective schools. That is to say that the testees of the two samples could not be matched with regard to their socioeconomic background and/or cognitive dispositions. Randomisation was not possible either, because during the evaluation (2001-2004) the three schools selected were the only ones in Berlin which had fully developed ‘bilingual streams’ offering the three core subjects. Otherwise, the samples would have become too small to conduct a reliable statistical analysis. All testees took the same English test (see 3.1 below), whereas the learners’ subject-matter competence was assessed in the respective working language: pupils drawn from regular classes used German input materials and answered tasks in German; CLIL students, however, processed the ‘same’ texts and tasks in English (otherwise test content was identical). For the fully-fledged (English) “Achievement & Proficiency Test” (= APT) the schools granted 180 minutes (two double lessons of 90 minutes each). Another 45 minutes of the second block of 2 x 90 minutes were reserved for two questionnaires which elicited data relevant to learners’ family background as well as their attitudes, interests and learning strategies in the subjects under investigation (cf. Zydatiß, 2007, for a detailed description). The remaining 135 minutes were allocated to the assessment of subject-matter competence. The development of this test (for which no forerunner was available) faced an additional unexpected difficulty: the choice and sequence of CLIL subjects and the content topics for each subject were decided by school staff. Not a single curricular topic (e.g. the tropical rainforest, the Cuban missile crisis or photosynthesis) was covered by all of the schools in the sample.

2.2 The research question

This situation had two consequences. First, I had to reflect on the superordinate educational value of the CLIL approach (= “Bildungswert”), considering the fact that the curricular topic was somehow ‘secondary’ to the teachers in charge, although the subject-specific learning objectives had to be accomplished. Secondly, I had to find a topical field of my own that not only covered thematic aspects of geography, history and biology, but which also tapped transferable academic competencies pertaining to all three CLIL subjects. Input materials and tasks were meant to focus, in other words, on literacy skills (that is, on academic discourse competencies) considered essential across the whole range of learners’ ‘bilingual’ subjects. Thus, the hypothesis to be tested is two-pronged. On the one hand, CLIL learners significantly exceed regular learners with regard to their proficiency in English. On the other hand, CLIL learners only achieve adequate levels of academically relevant discourse competence (comparable to those attained by mother-tongue learners) if their proficiency in the (foreign) working language is high (as indicated by their scores in the APT and/or some of its relevant sub-scales).

3. Results of the assessment in the two curricular areas

3.1 A comparison of regular and CLIL learners’ English proficiency

The integrated “Achievement & Proficiency Test” (APT) consists of four major components, as can be seen from Figure 2:

  1. a scale related to linguistic competence regarding vocabulary and grammar calling up the contextualised use of lexicogrammatical language exponents (= “Use of English”),
  2. a scale pertaining to learners’ general proficiency in English operationalised in a C-test with six independent texts and a cloze test tapping everyday colloquial language,
  3. a scale assessing students’ receptive communicative skills (i.e. different modes of listening and reading comprehension) and
  4. a scale probing pupils’ writing skills involving them in the production of three genres considered essential at the lower secondary level – namely a written summary of a listening text, a comment (a piece of subjective argumentation in the shape of a letter to the editor) and a picture story (i.e. a narrative text).

By adding up these sub-scales (see top of Figure 2) we can generate superordinate scales; for example, “communicative competencies”, “overall proficiency” and “APT”. Speaking was also assessed, via a communicative oral test involving about a quarter of the total student sam­ple. The results exhibited a substantial difference in students’ oral proficiency in favour of pupils coming from CLIL classrooms (Zydatiß, 2007: 239-265) as tested by two communica­tive tasks: participation in a small-group simulation game (= interactive speaking) and the realisation of an extended narrative turn (= spoken production) requiring an oral retelling of the content of the preceding role play from the participant’s specific role perspective.

Table 1 summarises the results of a statistical analysis testing the different test scores of the two samples for significance. The analysis draws upon the Chi²-test and the Lambda-test, the latter being a directional measure of association (= “Richtungsmaß”) showing the difference in achievement between the two samples in terms of percentage. Statistically this is similar to a regression equation or gradient, because the percentage value reveals the extent to which testees of one sample outperform the learners in the other sample. Here up to 60 per cent of students from CLIL classrooms show markedly better developed linguistic and communicative competences in English than pupils from regular classes of the same schools (on the highest level of statistical significance: *** or p < 0.001). Empirical studies in the educational domain hardly ever employ this test because dependencies on instructional variables of the magnitude reported here are very rare indeed. These findings are a great surprise, since learners in regular classes of German grammar schools can also have a fairly high level of proficiency in English.
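
To make the Lambda measure more tangible, here is a minimal Python sketch. It assumes that the “Lambda-test” corresponds to Goodman and Kruskal's lambda, a proportional-reduction-in-error measure; the article itself does not spell out the formula. The function name and the 2 x 2 counts are invented for illustration and are not data from the study.

```python
def goodman_kruskal_lambda(table):
    """Lambda for predicting the column category from the row category.

    `table` is a list of rows with observed counts. The result is the share
    of prediction errors that is removed once the row category is known.
    """
    n = sum(sum(row) for row in table)
    col_totals = [sum(col) for col in zip(*table)]
    # Best guess without row information: always predict the modal column.
    correct_without_rows = max(col_totals)
    # Best guess with row information: predict the modal column of each row.
    correct_with_rows = sum(max(row) for row in table)
    return (correct_with_rows - correct_without_rows) / (n - correct_without_rows)


# Hypothetical counts (NOT from the study):
# rows = class type (regular, CLIL), columns = score band (lower, upper).
example = [
    [60, 20],
    [20, 64],
]
lam = goodman_kruskal_lambda(example)
print(f"{lam:.0%}")  # 50%
```

Read this way, a lambda of 50 per cent would mean that knowing the class type halves the errors made when predicting a testee's score band.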

Figure 2: Relative means in APT-scales (regular classes v. CLIL classes)

Table 1: Differences between regular and CLIL classes as regards learners’ linguistic and communicative competences in English

Scales of the English Test        Level of Significance        Lambda in per cent (in favour of CLIL)
APT (N = 164)
Overall Proficiency
Communicative Competences
Written Production (N = 163)
  • Summary (N = 147)
  • Comment (N = 154)
  • Picture Story (N = 162)
Reception (N = 164)
  • Listening
  • Reading
General Proficiency (N = 164)
  • C-Test
  • Conversational Cloze
Use of English (N = 164)
  • Grammar
  • Vocabulary

A very impressive picture of the difference in achievement can be obtained if testees’ raw scores in the APT (the theoretical maximum was 331 points) are transformed into percentile ranks. With N = 164 learners, splitting the sample into six groups (sextiles) yielded a sensible size of 27 or 28 testees for each achievement segment. Converting the absolute number of students in each segment into a percentage value, we obtain Figure 3, which shows a highly symmetrical but inverse distribution of English proficiency scores across the six attainment segments (*** p < .001, ** p < .01, * p < .05). The differences in scholastically acquired attainment in English are remarkable.
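
The segment sizes of 27 or 28 follow directly from dividing 164 testees into six rank-ordered groups. A minimal Python sketch of that arithmetic is given below; the function name is hypothetical, and which segments receive the extra testee is an assumption, since the article does not say.

```python
def segment_sizes(n_testees, n_segments):
    """Sizes of rank-ordered segments that differ by at most one testee."""
    base, extra = divmod(n_testees, n_segments)
    # Assumption: the first `extra` segments take one additional testee.
    return [base + 1 if i < extra else base for i in range(n_segments)]


sizes = segment_sizes(164, 6)
print(sizes)       # [28, 28, 27, 27, 27, 27]
print(sum(sizes))  # 164
```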

Figure 3: Distribution of regular and CLIL students across sextiles in the APT

3.2 A comparison of regular and CLIL learners’ academic discourse competence relevant to subject-matter learning

As was pointed out above, the conception of an assessment tool probing CLIL learners’ academic literacy skills across the range of ‘bilingual’ subjects offered at Berlin’s secondary schools led to the design of a (combined) teaching & testing unit, which was new to all students involved in the evaluation. It integrated a representative choice of continuous and discontinuous text types, where the latter category comprises not only statistics, graphs, diagrams, charts and maps but also pictures and cartoons. The input materials are accompanied by a mix of tasks of varying complexity involving a range of prototypical study skills and cognitive processing activities judged as being realistic, necessary and/or distinctive by subject specialists for the age group, grades, school type and particular curricular aims under investigation. In terms of content, I opted for the population explosion in the 19th century, because this topic could be linked to historical, geographical and biological phenomena: industrialisation, employment changes, urban v. rural developments, emigration and immigration, infectious diseases, paths of infection, birth and mortality rates, public health and personal hygiene etc. The resulting test (adding up to a maximum score of 220 points) was coded and marked in three ways in order to capture learners’ performance on subject-matter tasks (Table 2), i.e. with regard to:

  • the features of the tasks in relation to both the linguistic and the content profile of the input material (= “materialbezogene Leistungsanforderungen”),
  • the cognitive operations called upon for solving the tasks successfully and
  • the degree of openness of the questions inducing a specific response genre (bounded, half-open or open).

It is hoped that students’ performances on a test of this kind (with a range of authentic sources, documents and carefully designed tasks) can be attributed to certain commonalities of under­lying competencies developed by the learners via the preceding systematic and cumulative subject-matter teaching.

Table 2: Scales of the ADC-test (Academic Discourse Competencies)

Coding and Marking Modality: Analysis according to …        Points

A Cognitive operations involved        220
  Retrieving information
    Single details in a clause or complex sentence
    Single details in a paragraph
    Single details in the text as a whole
  Making inferences
    Drawing simple conclusions
      Clause-based inferences of single details
      Inferences within a complex sentence (1 variable)
    Drawing complex conclusions
      Context-bound inferences (> 1 variable)
      Conclusions using text-external knowledge
  Concept formation
    Matching technical terms and definitions
    Using newly introduced technical terms
    Forming a new conceptual field
  Reflection & evaluation
    Evaluating the form and message of a cartoon
    Making notes for a personal comment
    Making notes for an expository essay

B Task features and profile of the input material        220
  Understanding continuous expository texts
    Global comprehension of key ideas
    Reading for detailed understanding
    Search reading (scanning)
  Genres relevant to subject-matter learning (focus on discontinuous texts)
    Single tables, graphs and bar charts
    Flow chart
    Short expository text & multiple diagrams (tables, maps)
    Push-and-pull diagram
    Giving evidence from various sources
    Numerical calculation (exponential growth of bacteria)
    Vector diagram
  Text interpretation and text production
    Describing and interpreting a cartoon
    Written text production (personal comment, expository essay)
  8 Understanding and using technical terms        30.0

C / 9 Degree of openness of questions        220
  Bounded items
  Half-open items
  Open answers

The students’ success with this array of tasks might then qualify as evidence of their ability to activate knowledge of a more abstract, transferable nature: a construct which might be called ‘Academic Discourse Competence’ or ‘Cross-Curricular Subject-Matter Literacy’ (Cummins, 1978; 1979, was surely on the right track with his CALP concept: “Cognitive-Academic Language Proficiency”). Personally, I am convinced that such a general academic (cognition & language-based) proficiency is highly relevant to CLIL contexts as well, because the overriding purpose of the CLIL approach in our multilingual and highly mobile societies would seem to be the empowerment of school learners (through the performance of scholastic tasks) to acquire subject knowledge, study skills and cognitive operations (based on verbal thought) via a foreign language, almost regardless of which particular school subject or topic may be chosen in a specific instructional setting. CLIL settings must be liberated from their traditional self-imposed conceptual straitjacket of wanting to maximise, first and foremost, everyday foreign language proficiency (named BICS by Cummins, 1978; 1979). As such, a CLIL concept like “bilingualer Sachfachunterricht” must also face (similar to other variants of task-based language assessment) the problem of generalisability of its assessment procedures (cf. Bachman 2002; Mislevy et al., 2002): from concrete task performance to abstract underlying constructs and from specific subject knowledge to transferable academic abilities. As for me, Academic Discourse Competence would be a close and valid approximation to this goal.

Testing the scores of the two sub-samples for significant differences with respect to the ADC-tasks, we note a fairly balanced level of achievement in the two groups (N = 133). The 75 CLIL learners never fall behind regular learners (N = 58), and on a limited number of scales (out of nine major scales and 30 sub-scales) they even attain scores which are significantly higher (Table 3). Chi²-values are much lower, though, than in the English test (see Table 1). With some scales the rather rigorous Lambda-test reveals higher test scores for learners from CLIL classes; however, apart from information retrieval, text interpretation and text production, these percentage differences are statistically not significant (= ns). In summary, it can be stated that the sample of CLIL learners as a whole, embedded in the curricular infrastructure of Berlin’s ‘bilingual streams’ (which tended to have a rather selected student population), appears to have developed academic discourse competencies on the same or on similar levels compared to those observed with pupils who attended regular classes getting subject-matter instruction in German.

Table 3: Differences between regular and CLIL classes as regards learners’ academic discourse competencies

No.    Scales of the ADC (N = 133)    Chi²    Level of significance    Lambda in % (in favour of CLIL)
ADC (test as a whole)
Retrieving information
Making inferences
Complex conclusions
New conceptual field
Discontinuous text types
Text interpretation and text production
Cartoon interpretation

Looking at the internal distribution of test scores in the two sub-samples (in analogy to Figure 3 but drawing upon quartiles to realise a sensible comparison: N = 75 and 58 respectively), a rather surprising picture emerges for some of the more demanding tasks requiring open-ended, discursive answers (Figure 4). With these tasks, learners in the top half of the achievement range tend to come from CLIL classes (about 60-65 per cent of their group), whereas about the same percentage of pupils from regular classes (taught in German) attain test scores which place them in the bottom half of the distribution. This inversely skewed distribution of the two sub-samples holds not only for “Inferencing” (scale no. 2) but also for “Discontinuous Texts” (no. 6), “Interpreting a Cartoon” (no. 7.1) and “Open Answers” (no. 9.3). The nearly balanced nature of most sub-scales of the ADC derives from the fact that students from regular classes gain an adequate number of points with bounded and half-open items, whereas CLIL learners excel at the cognitively more complex tasks involving interpretation, drawing conclusions and/or text-bound writing. Since the testees of the two groups could not be matched with regard to their cognitive dispositions, this result may well be a consequence of the selective nature of CLIL learners’ admittance to these classes (which demands further investigation with matched or randomised samples).

Figure 4: Distribution of regular and CLIL students across quartiles in the ADC-scale “Making Inferences” (no. 2)


There is a problem, however, with the CLIL sample: 15-20 per cent of these learners are located in the lowest segment of the attainment distribution (throughout the open-ended, more challenging tasks), which is not the case in the English test (see Figure 3). This raises the crucial question of whether there are relationships between a limited proficiency in the working language and subject-matter achievements.

4. Empirical evidence for a double language threshold

4.1 Relationships between CLIL learners’ performance on ADC- and APT-tasks

Statistically, a relationship between two variables with numerical data can be explored via a correlation. Relating ADC- and APT-scores (see Table 4), the Pearson correlation (= r) denotes the direction and consistency of testees’ performance in either test (or its sub-scales). In order to say something about the strength of the relationship, the correlation has to be squared, yielding the coefficient of determination (= r²). This value expresses the overlap between the two variables (in per cent); i.e., it measures the proportion of variability in one set of test scores that is accounted for by its relationship with the other set of performance scores (correlations > 0.5 are considered to be high or ‘substantial’).
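
The step from r to r² can be illustrated with a short Python sketch. The score lists are invented purely for demonstration and are not data from the study; `pearson_r` is a hypothetical helper that implements the textbook formula without external libraries.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equally long score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / math.sqrt(spread_x * spread_y)


# Hypothetical scores for five testees (NOT from the study).
apt_scores = [150, 180, 210, 240, 270]
adc_scores = [95, 90, 120, 115, 140]

r = pearson_r(apt_scores, adc_scores)
r_squared = r ** 2  # coefficient of determination
print(f"r = {r:.2f}, r² = {r_squared:.1%}")
```

The squared value is what licenses a statement about shared variability; the correlation itself only indicates direction and consistency.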

Table 4: Relationships between CLIL learners’ test scores on the ADC and selected APT-scales

Relationships between ADC-test scores (i.e. the test as a whole) and …:
CLIL sample (N = 70)

Scales of the APT                 r       r²       p        Level of significance
Achievement & Proficiency Test    0.64    40.6%    0.000    ***
                                  0.60    35.5%    0.000    ***
C-Test                            0.55    29.8%    0.000    ***
Use of English                    0.52    27.0%    0.000    ***
Reading Comprehension             0.49    24.4%    0.000    ***
Genre Writing                     0.44    19.2%    0.000    ***
Grammar                           0.37    13.8%    0.002    **
Vocabulary                        0.57    32.6%    0.000    ***

Taking the correlation between the ADC and the APT (in their entirety) as a point of departure (r = .64, r² = 40.6%), it can be observed that the strongest relationships between CLIL learners’ performance on ADC-tasks and tasks in the English test are with scales which tap linguistic competencies (notably the vocabulary test, which amounts to only 15 per cent of the total test score) and general foreign language proficiency (i.e. the C-test). In the CLIL sample, lexical competencies and discourse competence relevant to subject-matter learning share about a third of their variance (r² = 32.6%) at the highest level of statistical significance. Or, to put it another way, roughly a third of the variability in the ADC-scores can be predicted from the vocabulary scores.

4.2 Quantifying two linguistic thresholds

Apart from Cummins’ seminal work on the threshold hypothesis there is a forerunner to the analysis presented here; namely Clapham’s (1996) work on developing the latest variant of the IELTS test (= International English Language Testing System), the widely used university entrance exam for (potential) students whose mother tongue is not English. Her research on the question whether advanced learners’ proficiency in English for Academic Purposes may be contingent on a language threshold is summarised as follows by Alderson (2000: 104):

As part of a study of the effect of content, specifically subject matter knowledge, she [Clapham, WZ] investigated the relationship between the language ability of students taking the IELTS test of reading for academic purposes and their ability to understand texts in and out of their own subject discipline. She discovered two linguistic thresholds, not one.

The first one, at a score of roughly 60% on her grammar test, represented a level of lin­guistic knowledge below which students were unable to understand texts even in their own subject discipline. The second, at a score of roughly 80% on the same test, repre­sented a level of linguistic knowledge above which students had little difficulty reading texts outside their own discipline.

Considering the facts that the ADC attempts to probe more than reading comprehension of expository texts and that ‘linguistic knowledge’ comprises more than grammatical compe­tence (notably the functional availability of a rich vocabulary, of word formation and collocations: see Zydatiß, 2007: 157), a contingency table was set up (Table 5), which matches CLIL learners’ scores on the ADC-test as a whole and their performance on the APT-scale “Use of English”. With a sample of N = 70 testees a frequency table with three achievement segments (= terciles) seems appropriate to avoid (if at all possible) very small numbers of ‘cases’ in certain cells of that table. The category “adjusted residual” indicates the difference between the empirical and the theoretical distribution for each cell; where the [–] stands for fewer cases than statistically expected (and vice versa for the implicit +).

Table 5: Frequency distribution of CLIL learners in relation to their attainments in the entire ADC-test and the APT-scale “Use of English”

Rows: Academic Discourse Competencies (= ADC) [subject-matter relevant]
Columns: Use of English

                           1. Tercile    2. Tercile    3. Tercile    Total
                           (< 38.5 P)    (39-52 P)     (> 52.5 P)
1. Tercile (< 90 P)        6             6             6             18
  % of ADC                 75%           25%           15.8%         25.7%
  Adjusted residual        3.4**         -0.1          -2.1
2. Tercile (91 - 110 P)    2             12            6             20
  % of ADC                 25%           50%           15.8%         28.6%
  Adjusted residual        -0.2          2.9*          -2.6*
3. Tercile (> 111 P)       0             6             26            32
  % of ADC                 0%            25%           68.4%         45.7%
  Adjusted residual        -2.8*         -2.5*         4.2***
Total                      8             24            38            70
  % of ADC                 100%          100%          100%          100%

Chi² = 24.3 ***; r² = 27% ***; N = 70
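
How an adjusted residual is derived from such a frequency table can be sketched in Python as follows. The counts are the observed cell frequencies of Table 5; the formula is the standard adjusted (standardised) residual for contingency tables, which is assumed here to be what the “adjusted residual” rows report. The function name is hypothetical.

```python
import math

# Observed counts from Table 5:
# rows = ADC terciles, columns = "Use of English" terciles.
observed = [
    [6, 6, 6],
    [2, 12, 6],
    [0, 6, 26],
]


def adjusted_residuals(table):
    """Adjusted residual (observed minus expected, standardised) per cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    residuals = []
    for i, row in enumerate(table):
        out_row = []
        for j, count in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / n
            std_error = math.sqrt(
                expected * (1 - row_totals[i] / n) * (1 - col_totals[j] / n)
            )
            out_row.append((count - expected) / std_error)
        residuals.append(out_row)
    return residuals


for row in adjusted_residuals(observed):
    print([round(value, 1) for value in row])
# [3.4, -0.1, -2.1]
# [-0.2, 2.9, -2.6]
# [-2.8, -2.5, 4.2]
```

A negative value marks a cell with fewer cases than expected under independence, a positive value a cell with more.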

Note that an evenly balanced frequency distribution among the nine cells is not obtained, mainly because test scores in the ‘lower’ percentile ranks (related to the linguistic competences of vocabulary and grammar) go together with lower scores on the ADC-tasks; this is despite the fact that the APT-test of English proficiency had to be adjusted (in parts) to a fairly high level of global competence in English to suit learners from CLIL classrooms as well (over and above the level aimed at for regular learners). In a second step of the analysis, the tercile boundaries of Table 5 are interpreted as heuristic levels of lexicogrammatical competence, such that 38.5 points count as the ‘lower threshold’ and 52.5 points as the ‘upper threshold’. Given that the maximum score for “Use of English” is 80.5 points, the lower language threshold is pitched at 48 per cent and the upper threshold at 65 per cent (an interval of 17 per cent between the two thresholds). An analogous calculation for the other linguistically relevant scales of the APT (see Table 4) yields the following values for the two language thresholds (Table 6).

Table 6: Linguistic thresholds in relation to analogous ADC-performance

Scales of the APT: Double Linguistic Threshold
Sub-tests & entire test Lower Upper Difference
Vocabulary 51% 72% 21%
Grammar 39% 57% 18%
Use of English 48% 65% 17%
C-Test 48% 65% 17%
APT (as a whole) 56% 74% 18%
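
The “Use of English” row of Table 6 can be retraced from the tercile boundaries of Table 5 and the scale maximum of 80.5 points. The Python sketch below shows the conversion; the function and constant names are hypothetical, and the maxima of the other scales are not stated in this section, so only this one row is reproduced.

```python
def threshold_percent(boundary_points, max_points):
    """Express a tercile boundary as a percentage of the scale maximum."""
    return 100 * boundary_points / max_points


USE_OF_ENGLISH_MAX = 80.5  # maximum score of the "Use of English" scale

lower = round(threshold_percent(38.5, USE_OF_ENGLISH_MAX))
upper = round(threshold_percent(52.5, USE_OF_ENGLISH_MAX))
print(lower, upper, upper - lower)  # 48 65 17
```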

To summarise, I would maintain that adequate levels of academic discourse proficiency in Ger­man CLIL classrooms (of the extensive type realised by a ‘bilingual stream’) are dependent on rather high levels of linguistic competence (especially regarding vocabulary and grammar) and/or general proficiency in the working language (as mirrored in a C-test), if this academic proficiency is to be developed and used in a range of different subjects at sec­ondary school level. If, however, the condition of being above a ‘lower’ language threshold is not met (which in fact has to be quite high), CLIL learners run the risk of insufficient success with subject-matter instruction through a foreign language: Learners below the first threshold of lexicogrammatical competence are disproportionately often located in the lowest tercile of subject-matter task performance; whereas top achievers on the ADC-tasks have a high profi­ciency in the working language.

5. Consequences and perspectives

We need a lot more empirical research on the interdependencies between limited second / foreign language proficiencies and scholastic attainments in bilingual education settings, of which CLIL classrooms are only one type among others. To come back to the last point raised in 4.2: As reference points for ‘insufficient’ scholastic attainments we might have to consider the relevant Grade / curriculum objectives, the achievement levels attained by stu­dents in regular classes and CLIL learners’ current intellectual capacities (they may be ‘underachiev­ers’ in content learning due to their limited proficiency in the ‘working language’). I am not entirely convinced that we should open the fully fledged CLIL concept of the long-term ‘bilingual stream’ to all learners at the secondary school level, because it may adversely affect their academic success at school. The data from this empirical study strongly suggest (see already Cummins 1978, 1979 with regard to limited proficiency learners in majority language school settings) that there are language thresholds which can and must be interpreted as intervening variables that either impede or support subject-matter learning (see Clapham 1996 and Alderson 2000 for similar results with regard to non-native university candidates applying for higher studies in Britain). CLIL is not (primarily) a curricular concept aimed at maximising everyday foreign language proficiency (called BICS by Cummins 1978, 1979), since there seems to be a certain risk (see 4.2) that it impinges upon the goals of subject-matter teaching (see 1.2 above, especially the criticism voiced by Hasberg 2004). The curricular CLIL concept known as the ‘bilingual stream’ in Germany is primarily a means of acquiring knowledge, understanding, learning strategies and higher-order thinking skills in a limited range of domains of subject-matter learning. 
What is needed, then, are quasi-experimental studies with randomised, non-selective samples of students or comparative projects matching testees with regard to their cognitive dispositions, motivational potential and/or socio-economic background. Since the CLIL concept has functionally diversified quite substantially, empirical evidence will have to be collected about subject-matter achievements in ‘bilingual modules’ (temporally limited CLIL units), in ‘competence, seminar or project courses’ (which are increasingly offered in German sixth form colleges, i.e. at the upper secondary level), in vocational schools (for example, through business studies or technology in English) and in bilingual immersion schools that start topic work at the primary level, where several content subjects are learnt through an additional language. Keeping the hypothesis of a (highly probable) double language threshold in mind, stakeholders in CLIL projects should firmly insist on maintaining the present organisational structure of the ‘bilingual stream’ (especially the ‘bridging support’ at the beginning and the parallel foreign language lessons in the subsequent years). It is well known that politicians want to and have to economise, but CLIL concepts must not be used as ‘piggy banks’ for cuts in pupils’ timetables (by replacing foreign language pedagogy and majority language subject lessons with CLIL classes in order ‘to get two for the price of one’).

BICS and CALP (Cummins’ terms) are two very different types of footwear. What can and must be improved, however, is the communication in the staff room, because chances for synergy effects between different curricular areas are often wasted due to a lack of cooperation among teachers. In terms of the didactic implications of the research at hand, we need more ‘basic’ pre-service teacher education regarding CLIL, but also more in-service teacher training at the later stages of professional development. CLIL is not a surefire recipe for success, but has to be organised carefully: by way of bifocal lesson planning (with due regard to content and language), adequate input materials and tasks of varying complexity (to cater for both differential student ability and differential demands of tasks), visual, verbal and social scaffolding, changing levels of abstraction and (last but not least) systematic but sensible content and language integrated modes of assessment. Talking of assessment matters adequate for CLIL concepts forces us to pause for some core theory-building in order to ponder the question of what may, in fact, constitute the pedagogic and curricular specificity of the extensive ‘bilingual stream’. If a successful CLIL approach can be equated (in simple exclusive terms) neither with the added value of a higher foreign language proficiency nor with scholastic attainments in individual subjects deemed comparable to those achieved by ‘regular’ learners, we must make sure that our assessment procedures become attuned to something ‘more abstract, deeper or more general’ (i.e. a theoretical construct). If the CLIL approach aims at cross-curricular, transferable competencies, a CLIL concept like “bilingualer Sachfachunterricht” must also address (like other variants of task-based language assessment) the problem of generalisability of its assessment procedures (cf. Bachman, 2002; Mislevy et al., 2002): from concrete task performance to abstract underlying constructs and from specific subject knowledge to transferable academic abilities. The construct of Academic Discourse Competence would be a close and valid approximation to this goal.

Despite some open questions or even shortcomings, ‘bilingual streams’ are an important facet of many secondary schools in Germany: they enhance a school’s educational profile, making it a very attractive choice to parents, pupils and teachers alike. Let us hope that this demanding curricular concept will receive all the official support it needs and deserves in terms of teacher education and development, organisational infrastructure, textbooks and other learning materials as well as additional extracurricular activities promoting not only transnational encounters and intercultural learning but also student, parent and teacher identification with the school programme.

