🎓 Adaptive Chinese Vocabulary Learning System

Version: 2.2.0 · Status: Production · Last Updated: March 2026

🌐 Try it now → learnchinese.kzwbelieve.top — No installation required! Works on mobile & desktop.

An intelligent, adaptive vocabulary learning system for intermediate-level Chinese as a Foreign Language (CFL) learners. Built as part of a master's thesis at Peking University — "Research and Design of an Adaptive Intermediate Chinese Vocabulary Learning System" — this project implements a full-stack learning platform with AI-driven personalized learning paths, spaced repetition, and comprehensive learning analytics.

✨ Key Features

🧠 Adaptive Recommendation Engine — AI-powered personalized learning path based on user proficiency, learning patterns, and performance history
🔄 Spaced Repetition (SM-2) — Scientific review scheduling based on the SuperMemo-2 algorithm with personalized intervals
📊 Learning Analytics Dashboard — Real-time data visualization with mastery heatmaps, trend analysis, and predictive insights
📝 VKS-based Assessment — Vocabulary Knowledge Scale testing to determine optimal learning entry points
⏱️ Millisecond-precision Tracking — Fine-grained learning behavior recording for research-grade data collection
🔊 TTS Audio Pronunciation — Built-in text-to-speech for characters, words, collocations, and example sentences
🔗 Multi-module Learning Chain — Character → Vocabulary → Collocation → Sentence progressive learning flow
📖 SLA-informed Curriculum Design — Learning materials grounded in Second Language Acquisition theory: word frequency-based difficulty grading via BCC corpus (billions of tokens), NLP-powered collocation extraction using dependency parsing and mutual information, automated sentence complexity scoring, and interlanguage corpus-based confused word identification
📱 PWA Support — Install as a native-like app on iOS, Android, and desktop; works offline with Service Worker caching
☁️ Cross-device Progress Sync — Learning state persisted to backend; switch devices without losing progress

📸 Screenshots

Screenshots (images not included here): Home Page, VKS Assessment, Character Learning, Word Learning, Collocation Learning, Sentence Learning, Vocabulary Exercise, Learning Dashboard, Today's Review.

🛠️ Tech Stack

Layer	Technology
Frontend	Next.js 14, React, TypeScript, Tailwind CSS, shadcn/ui
Backend	Flask, SQLAlchemy, SQLite
PWA	Service Worker, Web App Manifest, offline caching
Algorithm	Modified SuperMemo-2, Multi-factor recommendation engine
ML Models	AdaBoost (Multinomial NB), Gaussian NB, XGBoost with voting ensemble
NLP Pipeline	BCC corpus frequency analysis, dependency parsing, mutual information scoring
Deployment	Nginx, PM2, VPS with HTTPS

📚 Research Foundation

This system is built on rigorous academic research at Peking University, combining SLA theory, NLP techniques, and adaptive learning algorithms:

Corpus-driven vocabulary selection — Word frequency analysis across BCC corpus (billions of tokens) and a self-collected CFL textbook corpus (165K characters from 13 intermediate-level textbooks) using Pandas and SQL
Frequency-difficulty modeling — Implements Stewart's finding that log(corpus frequency) strongly correlates with word difficulty (r=0.8), enabling automated difficulty grading
NLP-based collocation extraction — Collocations sourced from a knowledge base built with dependency parsing and mutual information filtering, ranked by collocation strength
Automated sentence selection — Sentence complexity computed by summing normalized word difficulties, selecting the lowest-complexity example sentences from textbook corpora
Interlanguage error analysis — Confused words extracted from the HSK Dynamic Composition Corpus based on learner error frequency, with separated learning to avoid semantic clustering interference
"Relative Character-based" pedagogy — Following Bai Lesan's theory: learning characters through words (以词带字) at intermediate level, covering pronunciation, form, and high-frequency meanings
Cognitive load balancing — High/mid/low frequency words and confused words distributed evenly across learning sessions
Validated with real learners — Two-month teaching experiment with 17 HSK-4 learners, 51 users total, producing statistically significant improvements in vocabulary acquisition, collocation learning, and word proficiency
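
As a rough illustration of the frequency-difficulty modeling and automated sentence selection described above, the sketch below turns corpus counts into a normalized difficulty score using log frequency, then scores each candidate sentence by summing the difficulties of its words. The function names, the min-max normalization, the toy counts, and the fallback score for unseen words are all assumptions made for this example; none of them are taken from the project code or the thesis.

```python
import math

def difficulty_from_frequency(freqs: dict[str, int]) -> dict[str, float]:
    """Map raw corpus counts to a difficulty in [0, 1]; rarer words score higher."""
    logs = {word: math.log(count) for word, count in freqs.items()}
    lo, hi = min(logs.values()), max(logs.values())
    span = (hi - lo) or 1.0  # avoid division by zero when all counts are equal
    return {word: (hi - lf) / span for word, lf in logs.items()}

def sentence_complexity(tokens: list[str], difficulty: dict[str, float],
                        unknown: float = 1.0) -> float:
    """Sum of word difficulties; unseen words get the (assumed) maximum score."""
    return sum(difficulty.get(tok, unknown) for tok in tokens)

def easiest_sentence(candidates: list[list[str]],
                     difficulty: dict[str, float]) -> list[str]:
    """Pick the pre-tokenized candidate with the lowest complexity."""
    return min(candidates, key=lambda toks: sentence_complexity(toks, difficulty))

# Toy counts for illustration only (hypothetical, not BCC corpus values)
toy_freqs = {"我": 1_000_000, "喜欢": 100_000, "研究": 10_000, "词汇": 1_000}
difficulty = difficulty_from_frequency(toy_freqs)

candidates = [["我", "喜欢", "研究"], ["我", "研究", "词汇"]]
best = easiest_sentence(candidates, difficulty)
```

Because the score is a plain sum, longer sentences tend to rank as more complex than shorter ones, which may or may not match how the thesis handles sentence length.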

🌐 Live Demo

No installation needed! Visit the live deployment directly:

👉 learnchinese.kzwbelieve.top

The system is deployed on a VPS with Nginx reverse proxy, PM2 process management, and full backend/frontend services running 24/7.

🚀 Quick Start (Local Development)

Prerequisites

Python 3.11+ (conda recommended)
Node.js 18+

Installation

# Clone the repository
git clone https://github.com/1137043480/word-learning-system.git
cd word-learning-system

# Install backend dependencies
pip install -r requirements.txt

# Install frontend dependencies
npm install

Running the System

Option 1: One-click Start (Recommended)

# Auto-generate test data and start API server
./start_system.sh

# In another terminal, start the frontend
npm run dev

Option 2: Manual Start

# Start Phase 2 API server (port 5004)
python app_phase2.py

# In a second terminal, start the frontend dev server (port 3000)
npm run dev

Option 3: Docker Deployment

# Production deployment with Docker Compose
docker-compose -f docker-compose.prod.yml up -d

Access

Local: http://localhost:3000 (dev) or http://localhost:3002 (Docker)
Live: http://learnchinese.kzwbelieve.top

🎯 Feature Tour

Recommended Experience Path

System Status → /system-status — Check service health and architecture overview
Phase 2 Demo → /phase2-demo — Interactive demo of the adaptive recommendation engine
Learning Dashboard → /learning-dashboard — Full learning analytics and visualization
Start Learning → /word-learning-entrance — VKS-guided personalized learning experience

Core Pages

Page	Route	Description
Home	`/`	Welcome page and learning entry
VKS Assessment	`/word-learning-entrance`	Vocabulary Knowledge Scale test
Character Learning	`/character-learning`	Chinese character module
Vocabulary Learning	`/word-learning`	Word meaning and usage
Collocation Learning	`/collocation-learning`	Word collocation patterns
Sentence Learning	`/sentence-learning`	Contextual sentence practice
Exercises	`/exercise`	Three exercise types
Learning Dashboard	`/learning-dashboard`	Analytics and insights ⭐
Phase 2 Demo	`/phase2-demo`	Feature demonstration ⭐
System Status	`/system-status`	Health check

🔌 API Reference

Service Ports

Port	Service
3000	Next.js Frontend
5004	Phase 2 API (primary) ⭐
5002	Phase 1 Extended API
5001	Original API

Key Endpoints

# System statistics
GET /api/stats

# Adaptive recommendations for a user
GET /api/adaptive/recommendation/{user_id}

# Learning dashboard data
GET /api/analytics/user/{user_id}/dashboard

# Due review items
GET /api/review/user/{user_id}/due

# User list
GET /api/users

# Learning state persistence (cross-device sync)
GET  /api/users/{user_id}/learning-state
PUT  /api/users/{user_id}/learning-state

# Learning session management
POST /api/learning/session/start
POST /api/learning/session/end
POST /api/learning/events/batch
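
A minimal Python client sketch for the GET endpoints listed above, using only the standard library. It assumes the Phase 2 API is running locally on port 5004 and that responses are JSON; the helper names (`endpoint_url`, `get_json`, `fetch_due_reviews`) and the `demo_user` ID are hypothetical, and the response schema is not documented here.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://localhost:5004"  # assumed: local Phase 2 API server

def endpoint_url(template: str, base_url: str = BASE_URL, **params) -> str:
    """Fill {placeholders} in an endpoint template and prefix the base URL."""
    escaped = {key: quote(str(value), safe="") for key, value in params.items()}
    return base_url.rstrip("/") + template.format(**escaped)

def get_json(url: str, timeout: float = 5.0):
    """GET a URL and decode the body as JSON (assumes the API returns JSON)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))

def fetch_due_reviews(user_id: str):
    """Fetch the due review items for one user; requires a running server."""
    return get_json(endpoint_url("/api/review/user/{user_id}/due", user_id=user_id))

# Building a URL needs no network access; "demo_user" is a made-up ID
due_url = endpoint_url("/api/review/user/{user_id}/due", user_id="demo_user")
```

With the server running, `fetch_due_reviews("demo_user")` would issue the request; the same `endpoint_url` helper works for the other templated GET routes such as `/api/adaptive/recommendation/{user_id}`.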

🧠 How the Adaptive Engine Works

Recommendation Logic

The system uses a multi-layer recommendation strategy:

Urgent Review — Items at risk of being forgotten (based on memory decay model)
Scheduled Review — Items due for spaced repetition review
New Content — Fresh material matched to the learner's proficiency level
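
The three layers above could be combined roughly as follows. This is only a sketch: it assumes the layers are applied in the order listed (urgent review first, new content last), that "at risk" means an estimated retention below a fixed threshold, and that each item carries a due date and a retention estimate. The `Item` fields, the 0.5 threshold, and the `limit` parameter are hypothetical and not taken from `adaptive_engine.py`.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class Item:
    word: str
    due: Optional[date] = None         # None means the item was never studied
    retention: Optional[float] = None  # estimated recall probability, if studied

def layer(item: Item, today: date, urgent_below: float = 0.5) -> int:
    """0 = urgent review, 1 = scheduled review, 2 = new content, 3 = not needed yet."""
    if item.due is None:
        return 2
    if item.retention is not None and item.retention < urgent_below:
        return 0
    if item.due <= today:
        return 1
    return 3

def recommend(items: list[Item], today: date, limit: int = 10) -> list[Item]:
    """Drop items that need no action, then order the rest by layer."""
    candidates = [item for item in items if layer(item, today) < 3]
    candidates.sort(key=lambda item: layer(item, today))  # stable within a layer
    return candidates[:limit]

today = date(2026, 3, 1)
items = [
    Item("研究"),                                        # new content
    Item("词汇", due=date(2026, 2, 28), retention=0.8),  # scheduled review
    Item("搭配", due=date(2026, 3, 5), retention=0.3),   # urgent review
    Item("句子", due=date(2026, 3, 10), retention=0.9),  # nothing to do yet
]
plan = recommend(items, today)
```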

Key Algorithms

Modified SM-2: Personalized interval scheduling based on individual performance
Memory Strength Model: Multi-factor assessment of retention probability
User Pattern Recognition: Classifies learners by efficiency, accuracy, and preferences
Confidence Scoring: Each recommendation includes a confidence rating
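
For reference, here is a compact sketch of the baseline SM-2 update that the modified scheduler builds on. It follows one common reading of the published SuperMemo-2 rules (intervals of 1 and 6 days, then the previous interval times the E-Factor, with the E-Factor floored at 1.3). The project's personalization changes are not described in this README, so none are shown; the `CardState` and `sm2_review` names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class CardState:
    repetitions: int = 0    # consecutive successful reviews
    interval_days: int = 0  # current gap between reviews
    easiness: float = 2.5   # SM-2 E-Factor, never allowed below 1.3

def sm2_review(state: CardState, quality: int) -> CardState:
    """Return the next state after a review graded 0 (blackout) to 5 (perfect)."""
    if not 0 <= quality <= 5:
        raise ValueError("quality must be in 0..5")
    if quality < 3:
        # Failed recall: restart the sequence and leave the E-Factor unchanged
        return CardState(0, 1, state.easiness)
    if state.repetitions == 0:
        interval = 1
    elif state.repetitions == 1:
        interval = 6
    else:
        # Uses the E-Factor from before this review; some variants update it first
        interval = round(state.interval_days * state.easiness)
    delta = 0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02)
    easiness = max(1.3, state.easiness + delta)
    return CardState(state.repetitions + 1, interval, easiness)

# Three successful reviews in a row, graded 5, 4, 5
state = CardState()
for q in (5, 4, 5):
    state = sm2_review(state, q)

lapsed = sm2_review(state, 2)  # a failed review resets the schedule
```

After the three successful reviews the interval has grown from 1 to 6 to 16 days; the failed review drops it back to 1 day while keeping the accumulated E-Factor.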

📊 Performance Metrics

Algorithm Performance

Metric	Value
Recommendation response time	< 300ms
Recommendation accuracy	> 85%
Review timing accuracy	> 90%
Learning efficiency improvement	> 25%

System Performance

Metric	Value
Dashboard load time	< 1.5s
Concurrent request handling (100 req)	< 2s
Data accuracy	99.5%
Real-time update latency	< 100ms

📂 Project Structure

├── pages/                    # Next.js pages
│   ├── index.tsx            # Home page
│   ├── word-learning-entrance.tsx  # VKS assessment
│   ├── learning-dashboard.tsx      # Analytics dashboard ⭐
│   ├── phase2-demo.tsx             # Feature demo ⭐
│   └── exercise.tsx                # Practice exercises
├── components/ui/            # UI component library (shadcn)
├── src/
│   ├── context/             # React Context providers
│   ├── hooks/               # Custom React hooks
│   └── lib/                 # Utility functions
├── app_phase2.py            # Phase 2 API server ⭐
├── adaptive_engine.py       # Adaptive recommendation engine
├── models_extended.py       # Database models
├── start_system.sh          # One-click startup script
└── README.md                # This file

📈 Dataset Scale

Metric	Count
Test Users	51
Learning Sessions	4,050
Exercise Records	15,200
Learning Events	50,100

🤝 Contributing

Contributions are welcome! Please feel free to submit issues and pull requests.

Development Guidelines

React components: Functional components + TypeScript
Code style: 2-space indentation, PascalCase file naming
Python: PEP 8 compliant
Commits: Conventional Commits format

📄 License

This project is open source and available under the MIT License.

📚 Documentation

中文文档 (Chinese README)

Built with ❤️ for language learners worldwide.

Based on a master's thesis at Peking University: "Research and Design of an Adaptive Intermediate Chinese Vocabulary Learning System"
