zhongwen
Hello everyone (大家好)! This repository curates a collection of Mandarin Chinese vocabulary, idioms (成语), and characters (汉字), drawing on HSK 3.0, Remembering Simplified Hanzi (RSH), and other frequency lists. Entries include meanings, pinyin pronunciation, character decomposition, frequency order, and relevant tags.
| csv file name | words | characters (汉字) | total |
|---|---|---|---|
| hsk 3.0 - characters.csv | | x | 3,000 |
| hsk 3.0 - words.csv | x | | 11,092 |
| rsh.csv | | x | 3,000 |
| chengyu_by_theme.csv | x | | 2,350 |
| mega_hanzi_compilation.csv | | x | 11,266 |
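The CSV files can be read with Python's standard `csv` module. The following is a minimal sketch, assuming each file is UTF-8 encoded and starts with a header row. The column names are not documented here, so the helper only reports whatever headers are present; the `summarize_csv` helper and the sample rows are hypothetical.

```python
import csv
import io
from typing import Iterable


def summarize_csv(lines: Iterable[str]) -> tuple[list[str], int]:
    """Return the header names and the number of data rows of a CSV stream."""
    reader = csv.DictReader(lines)
    row_count = sum(1 for _ in reader)
    # fieldnames is filled in once the header row has been read
    return list(reader.fieldnames or []), row_count


# Hypothetical stand-in for open("rsh.csv", encoding="utf-8", newline="");
# the real column names may differ.
sample = io.StringIO("hanzi,pinyin,meaning\n一,yī,one\n二,èr,two\n")
headers, count = summarize_csv(sample)
print(headers, count)  # ['hanzi', 'pinyin', 'meaning'] 2
```

If a file turns out to start with a byte-order mark, opening it with `encoding="utf-8-sig"` keeps the BOM out of the first header name.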
Table of Contents
HSK 3.0 (11,092 words and 3,000 characters)
Remembering Simplified Hanzi - RSH (3,000 characters)
Chengyu 成语 - Chinese Idioms ordered by theme
General Standard Chinese Characters 通用规范汉字表 (8,105 characters)
Jun Da's Modern Chinese Character Frequency List (9,933 characters)
Character frequency list compilation of 11,266 characters
Additional Language Learning Resources

HSK 3.0 汉语水平考试 (Chinese Proficiency Test)
The People's Republic of China's standardized test of proficiency in Standard Chinese for non-native speakers.
Both lists include characters, pinyin, and definitions:
- Characters (recognition) - 3,000
- Words - 11,092
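Because both HSK lists are available, one quick consistency check is to find characters that occur in the word list but are missing from the character list. This is a sketch with made-up sample data. Loading the real words from `hsk 3.0 - words.csv` would depend on its column layout, which is assumed rather than documented. Only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) counts as Han here, so Latin letters inside a word are skipped.

```python
def is_cjk(ch: str) -> bool:
    # Basic CJK Unified Ideographs block only; extension blocks are not checked
    return "\u4e00" <= ch <= "\u9fff"


def characters_in_words(words: list[str]) -> set[str]:
    """Collect every distinct Han character that appears in a list of words."""
    return {ch for word in words for ch in word if is_cjk(ch)}


def uncovered_characters(words: list[str], characters: set[str]) -> set[str]:
    """Han characters used in the word list but absent from the character list."""
    return characters_in_words(words) - characters


# Hypothetical sample data, not taken from the HSK 3.0 files
words = ["你好", "中国", "学生", "卡拉OK"]
char_list = {"你", "好", "中", "国", "学", "卡", "拉"}
missing = uncovered_characters(words, char_list)
print(sorted(missing))  # ['生']
```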
Remembering Simplified Hanzi (RSH)
3,000 characters across Books 1 and 2.
By James W. Heisig and Timothy W. Richardson. Book 1 of Remembering Simplified Hanzi covers the writing and meaning of the 1,000 most commonly used characters in the simplified Chinese writing system, plus another 500 that are best learned at an early stage. (Book 2 adds another 1,500 characters for a total of 3,000.)
Chengyu 成语
Chengyu are traditional Chinese idiomatic expressions, most of which consist of four characters.
Data source: 中国成语大全,值得收藏! ("A complete collection of Chinese idioms, worth bookmarking!")
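Since `chengyu_by_theme.csv` orders idioms by theme, a natural first step is to group them by that field and check how many follow the usual four-character pattern. This is only a sketch: the column names `chengyu` and `theme`, the theme labels, and the sample rows are hypothetical and may not match the actual file.

```python
import csv
import io
from collections import defaultdict


def group_by_theme(lines):
    """Group idioms by theme and count how many are exactly four characters long."""
    groups = defaultdict(list)
    for row in csv.DictReader(lines):
        # "theme" and "chengyu" are hypothetical column names
        groups[row["theme"]].append(row["chengyu"])
    four_char = sum(len(idiom) == 4 for idioms in groups.values() for idiom in idioms)
    return dict(groups), four_char


sample = io.StringIO("chengyu,theme\n一心一意,专心\n全神贯注,专心\n百闻不如一见,学习\n")
groups, four_char = group_by_theme(sample)
print(groups["专心"], four_char)  # ['一心一意', '全神贯注'] 2
```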
Jun Da's Modern Chinese Character Frequency List
This website provides character frequency lists generated from a large corpus of Chinese texts collected from online sources.
https://lingua.mtsu.edu/chinese-computing/statistics/
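Frequency lists of this kind rank characters by raw corpus count and usually report a cumulative percentage: the share of all character occurrences covered by the top N characters. The sketch below shows how those two columns relate. The counts are toy numbers rather than values from Jun Da's corpus, and the helper names are hypothetical.

```python
def cumulative_coverage(counts: dict[str, int]) -> list[tuple[int, str, int, float]]:
    """Rank characters by raw count and attach cumulative coverage in percent."""
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    rows = []
    running = 0
    for rank, (char, count) in enumerate(ranked, start=1):
        running += count
        rows.append((rank, char, count, 100 * running / total))
    return rows


def characters_needed(rows, target_percent: float) -> int:
    """Smallest rank whose cumulative coverage reaches the target percentage."""
    for rank, _char, _count, cumulative in rows:
        if cumulative >= target_percent:
            return rank
    return len(rows)


# Toy counts for illustration only
rows = cumulative_coverage({"的": 50, "一": 30, "是": 20})
for row in rows:
    print(row)
# (1, '的', 50, 50.0)
# (2, '一', 30, 80.0)
# (3, '是', 20, 100.0)
print(characters_needed(rows, 80.0))  # 2
```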
General Standard Chinese Characters 通用规范汉字表
The Table of General Standard Chinese Characters is the current standard list of 8,105 Chinese characters published by the government of the People's Republic of China and promulgated in June 2013.
Compilation of 11,266 Hanzi
The compilation draws on the following sources:
Lists
Dictionaries
Parsers, modules, and libraries
- CC-CEDICT parser
- Pinyin Parser (Python/pip), GitHub
- Chinese Character Module - Character decomposition (npm), GitHub
- Cantonese Module (Python/pip)
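CC-CEDICT stores one entry per line in the form `Traditional Simplified [pin1 yin1] /gloss 1/gloss 2/`, with comment lines starting with `#`. The following is a minimal sketch of parsing that line format with a regular expression. It is not the CC-CEDICT parser listed above, and the `Entry` and `parse_line` names are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# CC-CEDICT entry layout: Traditional Simplified [pin1 yin1] /gloss 1/gloss 2/
ENTRY_RE = re.compile(r"^(\S+) (\S+) \[([^\]]*)\] /(.*)/\s*$")


class Entry(NamedTuple):
    traditional: str
    simplified: str
    pinyin: str  # numbered-tone pinyin as stored in CC-CEDICT
    definitions: list[str]


def parse_line(line: str) -> Optional[Entry]:
    """Parse one CC-CEDICT line; return None for comments and malformed lines."""
    if line.startswith("#"):
        return None
    match = ENTRY_RE.match(line)
    if match is None:
        return None
    trad, simp, pinyin, glosses = match.groups()
    return Entry(trad, simp, pinyin, glosses.split("/"))


entry = parse_line("中國 中国 [Zhong1 guo2] /China/")
print(entry.simplified, entry.pinyin, entry.definitions)
# 中国 Zhong1 guo2 ['China']
```

Converting the numbered pinyin (`Zhong1 guo2`) into tone marks is a separate step, which is the kind of job a dedicated pinyin library handles.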