Prof Gui Shichun
The project has two phases: Phase One
is the development of CLEC (Chinese Learner English Corpus) which consists
of one million words.
-- Secondary school students: 300,000
words
-- Non-major college students: 300,000
words
-- English majors (intermediate): 200,000 words
-- English majors (advanced): 200,000 words
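As a quick check on the sampling plan above, the four quotas can be summed to confirm that they reach the one-million-word target. This is only a minimal sketch: the names `QUOTAS`, `quota_shares`, `TOTAL_WORDS` and `SHARES` are illustrative and not part of the project.

```python
# Planned word quotas per learner group, as listed above.
# All names in this block are illustrative assumptions.
QUOTAS = {
    "Secondary school students": 300_000,
    "Non-major college students": 300_000,
    "English majors (intermediate)": 200_000,
    "English majors (advanced)": 200_000,
}


def quota_shares(quotas):
    """Return each group's share of the planned total as a fraction."""
    total = sum(quotas.values())
    return {group: words / total for group, words in quotas.items()}


TOTAL_WORDS = sum(QUOTAS.values())
SHARES = quota_shares(QUOTAS)

if __name__ == "__main__":
    print(f"Planned corpus size: {TOTAL_WORDS:,} words")
    for group, share in SHARES.items():
        print(f"{group}: {share:.0%}")
```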
-- The samples should be genuine work
of the students; all errors should be kept intact.
-- The samples should be well distributed
and proportionate over all levels.
-- The samples should be collected from various sources, so that the written work reflects the learner's spontaneous production; candidates writing papers under test conditions tend to adopt the avoidance strategy.
-- A concise but logical and systematic
scheme
-- A clear conceptual framework for defining
errors
-- Detailed categorization of common
grammatical errors, and rougher categorization of those that occur less frequently
-- Exclusion of stylistic errors (which can't be tagged objectively and consistently)
-- Providing adequate amount of error
information for further analysis
-- Open-endedness of error tags for
the convenience of later addition or revision
-- Easy / natural recognition and consistent
tagging operation by different project members
-- Word form: errors concerning individual
words only
-- Word class: errors found in a larger
linguistic context (from phrase to discourse), analysed at two levels:
* level 1 = word class division: verb phrases, noun phrases, adjectival phrases, prepositional phrases, pronouns, adverbs and conjunctions
* level 2 = specific errors for each
word class
-- Wording: errors concerning a larger linguistic
context (from phrase to discourse), which include order, choice,
quantity and clarity
-- Collocation: errors concerning the co-occurrence
of notional words in a linguistic context (from phrase to discourse)
-- Sentence: structural and semantic errors
concerning a whole sentence, punctuation included
-- Uncertainty: erroneous or doubtful expressions whose classification/tagging awaits further consideration
An error tag provides the following information:
1) error type
2) erroneous word(s)
3) minimal error recognition
context
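The three pieces of information carried by an error tag can be pictured as a small record. The sketch below is an assumption made for illustration only: the class `ErrorTag`, its field names, the type labels and the example errors are all invented here and do not reproduce the project's actual tag notation.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorTag:
    """Hypothetical record mirroring the three items listed above."""
    error_type: str  # 1) error type (labels here are placeholders)
    words: tuple     # 2) erroneous word(s)
    context: str     # 3) minimal error recognition context


def count_by_type(tags):
    """Count how many tags fall under each error type."""
    return Counter(tag.error_type for tag in tags)


# Invented examples, not taken from the corpus.
EXAMPLES = [
    ErrorTag("word_form", ("recieve",), "I recieve a letter"),
    ErrorTag("collocation", ("learn", "knowledge"), "learn much knowledge"),
    ErrorTag("word_form", ("studing",), "I am studing English"),
]
COUNTS = count_by_type(EXAMPLES)

if __name__ == "__main__":
    for error_type, n in COUNTS.most_common():
        print(error_type, n)
```

Keeping the erroneous words and their context together in one record is what allows errors to be grouped by type and still be read in context afterwards.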
Finally, an error tagging scheme consisting of 63 error types was designed.
The idea is that errors can be retrieved
in different contexts and studied.
| Code | Type |
| --- | --- |
| ST (student) | = 1 (Junior Middle School) |
| | = 2 (Senior Middle School) |
| | = 3 (Non-Major English, Level 4, undergraduate) |
| | = 4 (Non-Major English, Level 6, undergraduate) |
| | = 5 (English Major, 1st to 2nd year, undergraduate) |
| | = 6 (English Major, 3rd to 4th year, undergraduate) |
| | = 7 (English Major, postgraduate) |

| Code | Type |
| --- | --- |
| SEX | = 1 (Male) |
| | = 2 (Female) |
| Y (number of years in learning English) | = accumulative years (e.g. 6, 9) |
| | = DN (Don't know) |
| SCH (the name of the school) | = Provided by the sample collector as an acronym built from the first letters of the Chinese Pinyin; its total length should not exceed 3 letters (e.g. Hanmin Senior Middle School as HMS) |
| | = DN (Don't know) |

| Code | Type |
| --- | --- |
| AGE | = Natural age (e.g. 15, 20, ...) |
| | = DN (Don't know) |
| WAY (the way the paper is written) | = 1 (test paper) |
| | = 2 (classroom assignment) |
| | = 3 (homework) |
| | = DN (Don't know) |
| DIC (Have dictionaries been used?) | = 1 (Yes) |
| | = 2 (No) |
| | = DN (Don't know) |

| Code | Type |
| --- | --- |
| TYP (essay type) | = 1 (argumentative, expository) |
| | = 2 (narrative, descriptive) |
| | = 3 (practical: letters, diaries, notes, form-filling, etc.) |
| | = 4 (others) |
| | = DN (Don't know) |
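
The header codes above lend themselves to a simple lookup. The sketch below assumes that a sample's header has already been parsed into a dictionary of code/value strings; how headers are actually stored in the corpus files is not described here, so that representation, like the names `CODEBOOK` and `decode_header`, is an assumption.

```python
# Code tables transcribed from the header description above.
CODEBOOK = {
    "ST": {
        "1": "Junior Middle School",
        "2": "Senior Middle School",
        "3": "Non-Major English, Level 4, undergraduate",
        "4": "Non-Major English, Level 6, undergraduate",
        "5": "English Major, 1st to 2nd year, undergraduate",
        "6": "English Major, 3rd to 4th year, undergraduate",
        "7": "English Major, postgraduate",
    },
    "SEX": {"1": "Male", "2": "Female"},
    "WAY": {"1": "test paper", "2": "classroom assignment", "3": "homework"},
    "DIC": {"1": "Yes", "2": "No"},
    "TYP": {
        "1": "argumentative, expository",
        "2": "narrative, descriptive",
        "3": "practical",
        "4": "others",
    },
}
# Fields whose value is free text or a number rather than a fixed code.
FREE_TEXT = {"Y", "SCH", "AGE"}
# Only these fields list DN (Don't know) as a permitted value.
DN_ALLOWED = {"Y", "SCH", "AGE", "WAY", "DIC", "TYP"}


def decode_header(raw):
    """Translate a dict of header codes into readable labels."""
    decoded = {}
    for code, value in raw.items():
        if value == "DN" and code in DN_ALLOWED:
            decoded[code] = "Don't know"
        elif code in CODEBOOK:
            if value not in CODEBOOK[code]:
                raise ValueError(f"unknown value {value!r} for {code}")
            decoded[code] = CODEBOOK[code][value]
        elif code in FREE_TEXT:
            decoded[code] = value
        else:
            raise ValueError(f"unknown header code {code!r}")
    return decoded


# Hypothetical header for one sample.
EXAMPLE = decode_header({
    "ST": "3", "SEX": "2", "Y": "6", "SCH": "HMS",
    "AGE": "DN", "WAY": "1", "DIC": "2", "TYP": "1",
})

if __name__ == "__main__":
    for code, label in EXAMPLE.items():
        print(f"{code}: {label}")
```

Rejecting unknown codes instead of passing them through makes inconsistencies between sample collectors visible early.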
-- Researchers can also use other concordancers,
such as Longman's, MicroConcord and TACT, to retrieve the texts
-- Putting the corpus on the internet
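
To illustrate the kind of retrieval a concordancer performs, here is a minimal keyword-in-context (KWIC) sketch. It is not the behaviour of any of the tools named above; the function `kwic`, its `width` parameter and the sample sentence are invented for illustration.

```python
import re


def kwic(text, keyword, width=3):
    """Return (left context, match, right context) for each hit of keyword.

    Matching is case-insensitive and works on whole word tokens;
    `width` is the number of tokens kept on each side.
    """
    tokens = re.findall(r"[A-Za-z']+", text)
    hits = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, token, right))
    return hits


# Invented learner-style sample, not taken from the corpus.
SAMPLE = "He go to school every day. Yesterday he go to the park."
HITS = kwic(SAMPLE, "go", width=2)

if __name__ == "__main__":
    for left, match, right in HITS:
        print(f"{left:>15} [{match}] {right}")
```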
* The correlation between error tags
and grammatical tags
* Stylistic differences in English
between Chinese learners and native speakers
* Identification of errors specific
to learner level
* Error sources (influence of the mother
tongue, over-generalization, etc.)
* Qualitative studies of specific error
types (verbs, prepositions, patterns, etc.)
* Implications for English teaching
in the Chinese context