Adaptive Chinese Vocabulary Learning System
Version: 2.2.0 · Status: Production · Last Updated: March 2026
Try it now: learnchinese.kzwbelieve.top. No installation required. Works on mobile and desktop.
An intelligent, adaptive vocabulary learning system for intermediate-level Chinese as a Foreign Language (CFL) learners. Built as part of a master's thesis at Peking University ("Research and Design of an Adaptive Intermediate Chinese Vocabulary Learning System"), this project implements a full-stack learning platform with AI-driven personalized learning paths, spaced repetition, and comprehensive learning analytics.
Key Features
- Adaptive Recommendation Engine: AI-powered personalized learning path based on user proficiency, learning patterns, and performance history
- Spaced Repetition (SM-2): Scientific review scheduling based on the SuperMemo-2 algorithm with personalized intervals
- Learning Analytics Dashboard: Real-time data visualization with mastery heatmaps, trend analysis, and predictive insights
- VKS-based Assessment: Vocabulary Knowledge Scale testing to determine optimal learning entry points
- Millisecond-precision Tracking: Fine-grained learning behavior recording for research-grade data collection
- TTS Audio Pronunciation: Built-in text-to-speech for characters, words, collocations, and example sentences
- Multi-module Learning Chain: Character → Vocabulary → Collocation → Sentence progressive learning flow
- SLA-informed Curriculum Design: Learning materials grounded in Second Language Acquisition theory, including word frequency-based difficulty grading via the BCC corpus (billions of tokens), NLP-powered collocation extraction using dependency parsing and mutual information, automated sentence complexity scoring, and interlanguage corpus-based confused word identification
- PWA Support: Install as a native-like app on iOS, Android, and desktop; works offline with Service Worker caching
- Cross-device Progress Sync: Learning state persisted to backend; switch devices without losing progress
Screenshots
Nine screenshots cover the Home Page, VKS Assessment, Character Learning, Word Learning, Collocation Learning, Sentence Learning, Vocabulary Exercise, Learning Dashboard, and Today's Review. The images are collected at the end of this README.
Tech Stack
| Layer | Technology |
|---|---|
| Frontend | Next.js 14, React, TypeScript, Tailwind CSS, shadcn/ui |
| Backend | Flask, SQLAlchemy, SQLite |
| PWA | Service Worker, Web App Manifest, offline caching |
| Algorithm | Modified SuperMemo-2, Multi-factor recommendation engine |
| ML Models | AdaBoost (Multinomial NB), Gaussian NB, XGBoost with voting ensemble |
| NLP Pipeline | BCC corpus frequency analysis, dependency parsing, mutual information scoring |
| Deployment | Nginx, PM2, VPS with HTTPS |
Research Foundation
This system is built on rigorous academic research at Peking University, combining SLA theory, NLP techniques, and adaptive learning algorithms:
- Corpus-driven vocabulary selection: Word frequency analysis across the BCC corpus (billions of tokens) and a self-collected CFL textbook corpus (165K characters from 13 intermediate-level textbooks) using Pandas and SQL
- Frequency-difficulty modeling: Implements Stewart's finding that log(corpus frequency) strongly correlates with word difficulty (r = 0.8), enabling automated difficulty grading
- NLP-based collocation extraction: Collocations sourced from a knowledge base built with dependency parsing and mutual information filtering, ranked by collocation strength
- Automated sentence selection: Sentence complexity computed by summing normalized word difficulties, selecting the lowest-complexity example sentences from textbook corpora
- Interlanguage error analysis: Confused words extracted from the HSK Dynamic Composition Corpus based on learner error frequency, with separated learning to avoid semantic clustering interference
- "Relative Character-based" pedagogy: Following Bai Lesan's theory of learning characters through words (以词带字) at intermediate level, covering pronunciation, form, and high-frequency meanings
- Cognitive load balancing: High/mid/low frequency words and confused words distributed evenly across learning sessions
- Validated with real learners: Two-month teaching experiment with 17 HSK-4 learners, 51 users total, producing statistically significant improvements in vocabulary acquisition, collocation learning, and word proficiency
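The frequency-difficulty modeling and automated sentence selection steps above can be sketched roughly as follows. This is a minimal illustration, not the project's pipeline: the function names, the min-max normalization, the default score for unknown words, and the frequency counts are all assumptions made for the example.

```python
import math

def difficulty_from_frequency(freqs):
    """Map corpus frequencies to difficulty scores in [0, 1].

    Higher log-frequency gives lower difficulty. Min-max normalization
    is an assumption of this sketch, not a documented project choice.
    """
    logs = {word: math.log(count) for word, count in freqs.items()}
    lo, hi = min(logs.values()), max(logs.values())
    span = (hi - lo) or 1.0  # avoid division by zero when all counts match
    return {word: (hi - value) / span for word, value in logs.items()}

def sentence_complexity(tokens, difficulty, unknown=1.0):
    """Sum the normalized difficulties of the tokens in one sentence."""
    return sum(difficulty.get(token, unknown) for token in tokens)

def easiest_sentences(sentences, difficulty, k=1):
    """Return the k lowest-complexity sentences (each a list of tokens)."""
    return sorted(sentences, key=lambda s: sentence_complexity(s, difficulty))[:k]

# Hypothetical frequency counts, for illustration only.
freqs = {"我": 1_000_000, "喜欢": 100_000, "学习": 10_000, "斟酌": 100}
difficulty = difficulty_from_frequency(freqs)

candidates = [["我", "喜欢", "学习"], ["我", "斟酌"]]
best = easiest_sentences(candidates, difficulty, k=1)[0]
print(best)
```

Because complexity is a plain sum, shorter sentences tend to score lower; a per-token average would be one alternative if sentence length should not matter.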
Live Demo
No installation needed! Visit the live deployment directly:
learnchinese.kzwbelieve.top
The system is deployed on a VPS with Nginx reverse proxy, PM2 process management, and full backend/frontend services running 24/7.
Quick Start (Local Development)
Prerequisites
- Python 3.11+ (conda recommended)
- Node.js 18+
Installation
# Clone the repository
git clone https://github.com/1137043480/word-learning-system.git
cd word-learning-system
# Install backend dependencies
pip install -r requirements.txt
# Install frontend dependencies
npm install
Running the System
Option 1: One-click Start (Recommended)
# Auto-generate test data and start API server
./start_system.sh
# In another terminal, start the frontend
npm run dev
Option 2: Manual Start
# Start Phase 2 API server (port 5004)
python app_phase2.py
# Start frontend dev server (port 3000)
npm run dev
Option 3: Docker Deployment
# Production deployment with Docker Compose
docker-compose -f docker-compose.prod.yml up -d
Access
- Local: http://localhost:3000 (dev) or http://localhost:3002 (Docker)
- Live: http://learnchinese.kzwbelieve.top
Feature Tour
Recommended Experience Path
- System Status (/system-status): Check service health and architecture overview
- Phase 2 Demo (/phase2-demo): Interactive demo of the adaptive recommendation engine
- Learning Dashboard (/learning-dashboard): Full learning analytics and visualization
- Start Learning (/word-learning-entrance): VKS-guided personalized learning experience
Core Pages
| Page | Route | Description |
|---|---|---|
| Home | / | Welcome page and learning entry |
| VKS Assessment | /word-learning-entrance | Vocabulary Knowledge Scale test |
| Character Learning | /character-learning | Chinese character module |
| Vocabulary Learning | /word-learning | Word meaning and usage |
| Collocation Learning | /collocation-learning | Word collocation patterns |
| Sentence Learning | /sentence-learning | Contextual sentence practice |
| Exercises | /exercise | Three exercise types |
| Learning Dashboard | /learning-dashboard | Analytics and insights |
| Phase 2 Demo | /phase2-demo | Feature demonstration |
| System Status | /system-status | Health check |
API Reference
Service Ports
| Port | Service |
|---|---|
| 3000 | Next.js Frontend |
| 5004 | Phase 2 API (primary) |
| 5002 | Phase 1 Extended API |
| 5001 | Original API |
Key Endpoints
# System statistics
GET /api/stats
# Adaptive recommendations for a user
GET /api/adaptive/recommendation/{user_id}
# Learning dashboard data
GET /api/analytics/user/{user_id}/dashboard
# Due review items
GET /api/review/user/{user_id}/due
# User list
GET /api/users
# Learning state persistence (cross-device sync)
GET /api/users/{user_id}/learning-state
PUT /api/users/{user_id}/learning-state
# Learning session management
POST /api/learning/session/start
POST /api/learning/session/end
POST /api/learning/events/batch
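
As a rough client-side illustration, the sketch below builds requests for two of the endpoints listed above using only the Python standard library. The base URL assumes a local Phase 2 API on port 5004; the helper names, the user ID format, and the JSON payload fields are hypothetical, since the request and response schemas are not documented here.

```python
import json
from urllib.request import Request

# Assumes the Phase 2 API is running locally on its documented port.
BASE_URL = "http://localhost:5004"

def recommendation_request(user_id):
    """Build a GET request for a user's adaptive recommendations."""
    return Request(f"{BASE_URL}/api/adaptive/recommendation/{user_id}", method="GET")

def save_state_request(user_id, state):
    """Build a PUT request that persists learning state for cross-device sync."""
    body = json.dumps(state).encode("utf-8")
    return Request(
        f"{BASE_URL}/api/users/{user_id}/learning-state",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )

# Hypothetical user ID and payload: the real schema may differ.
req = save_state_request("u001", {"current_word": "学习", "module": "collocation"})
print(req.get_method(), req.full_url)
```

With the server running, either request could be sent with `urllib.request.urlopen(req)`; the example stops short of that so it runs without a backend.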
How the Adaptive Engine Works
Recommendation Logic
The system uses a multi-layer recommendation strategy:
- Urgent Review: Items at risk of being forgotten (based on memory decay model)
- Scheduled Review: Items due for spaced repetition review
- New Content: Fresh material matched to the learner's proficiency level
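
One way to picture the three layers is as a priority queue: urgent reviews first, then scheduled reviews, then new content. The sketch below is only an illustration under stated assumptions. The exponential decay formula, the `stability_days` parameter, the 0.5 urgency threshold, and all names are hypothetical and are not taken from `adaptive_engine.py`.

```python
import math
from dataclasses import dataclass

@dataclass
class Item:
    word: str
    days_since_review: float  # time elapsed since the last review
    stability_days: float     # hypothetical memory-strength parameter
    interval_days: float      # currently scheduled review interval

def retention(item):
    """Estimated recall probability, assuming exponential memory decay."""
    return math.exp(-item.days_since_review / item.stability_days)

def recommend(reviewed, new_words, limit=5, urgent_below=0.5):
    """Order items: urgent reviews, then scheduled reviews, then new content."""
    # Most-forgotten items come first within the urgent layer.
    urgent = sorted((i for i in reviewed if retention(i) < urgent_below), key=retention)
    due = [
        i for i in reviewed
        if retention(i) >= urgent_below and i.days_since_review >= i.interval_days
    ]
    queue = [("urgent", i.word) for i in urgent]
    queue += [("scheduled", i.word) for i in due]
    queue += [("new", word) for word in new_words]
    return queue[:limit]

# Illustrative data only.
reviewed = [
    Item("经济", days_since_review=10, stability_days=5, interval_days=6),
    Item("环境", days_since_review=6, stability_days=20, interval_days=6),
    Item("发展", days_since_review=1, stability_days=10, interval_days=6),
]
plan = recommend(reviewed, ["竞争", "合作"])
print(plan)
```

Here 经济 falls below the retention threshold and is treated as urgent, 环境 has reached its scheduled interval, and 发展 is neither urgent nor due, so it is left out of the plan.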
Key Algorithms
- Modified SM-2: Personalized interval scheduling based on individual performance
- Memory Strength Model: Multi-factor assessment of retention probability
- User Pattern Recognition: Classifies learners by efficiency, accuracy, and preferences
- Confidence Scoring: Each recommendation includes a confidence rating
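
For reference, the sketch below shows one common reading of the standard SuperMemo-2 update that the modified scheduler builds on. The project's modifications and personalization are not described in this README, so none of them are reflected here; the `Card` type and `sm2_review` function are names chosen for this example.

```python
from dataclasses import dataclass

@dataclass
class Card:
    repetitions: int = 0   # consecutive successful reviews
    interval: int = 0      # days until the next review
    easiness: float = 2.5  # SM-2 easiness factor (EF)

def sm2_review(card, quality):
    """Apply one standard SM-2 update for a response quality in 0..5."""
    if not 0 <= quality <= 5:
        raise ValueError("quality must be between 0 and 5")
    if quality < 3:
        # Failed recall: restart the sequence and leave EF unchanged.
        return Card(repetitions=0, interval=1, easiness=card.easiness)
    repetitions = card.repetitions + 1
    if repetitions == 1:
        interval = 1
    elif repetitions == 2:
        interval = 6
    else:
        # Uses the EF from before this review; some variants use the new one.
        interval = round(card.interval * card.easiness)
    delta = 0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02)
    easiness = max(1.3, card.easiness + delta)  # EF never drops below 1.3
    return Card(repetitions, interval, easiness)

# Three successful reviews followed by a lapse.
card = Card()
history = []
for quality in (5, 5, 4, 2):
    card = sm2_review(card, quality)
    history.append((card.repetitions, card.interval))
print(history)
```

A personalized variant could, for example, scale the interval by a per-user factor, but that would be a design choice beyond what is documented here.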
Performance Metrics
Algorithm Performance
| Metric | Value |
|---|---|
| Recommendation response time | < 300ms |
| Recommendation accuracy | > 85% |
| Review timing accuracy | > 90% |
| Learning efficiency improvement | > 25% |
System Performance
| Metric | Value |
|---|---|
| Dashboard load time | < 1.5s |
| Concurrent request handling (100 req) | < 2s |
| Data accuracy | 99.5% |
| Real-time update latency | < 100ms |
Project Structure
├── pages/                          # Next.js pages
│   ├── index.tsx                   # Home page
│   ├── word-learning-entrance.tsx  # VKS assessment
│   ├── learning-dashboard.tsx      # Analytics dashboard
│   ├── phase2-demo.tsx             # Feature demo
│   └── exercise.tsx                # Practice exercises
├── components/ui/                  # UI component library (shadcn)
├── src/
│   ├── context/                    # React Context providers
│   ├── hooks/                      # Custom React hooks
│   └── lib/                        # Utility functions
├── app_phase2.py                   # Phase 2 API server
├── adaptive_engine.py              # Adaptive recommendation engine
├── models_extended.py              # Database models
├── start_system.sh                 # One-click startup script
└── README.md                       # This file
Dataset Scale
| Metric | Count |
|---|---|
| Test Users | 51 |
| Learning Sessions | 4,050 |
| Exercise Records | 15,200 |
| Learning Events | 50,100 |
Contributing
Contributions are welcome! Please feel free to submit issues and pull requests.
Development Guidelines
- React components: Functional components + TypeScript
- Code style: 2-space indentation, PascalCase file naming
- Python: PEP 8 compliant
- Commits: Conventional Commits format
License
This project is open source and available under the MIT License.
Documentation
Built with ❤️ for language learners worldwide. Based on a master's thesis at Peking University: "Research and Design of an Adaptive Intermediate Chinese Vocabulary Learning System"








