word-learning-system

Open source · MIT · TypeScript · 24 stars · 1 fork · 2 issues · 0 watchers · last commit 2 months ago

About word-learning-system

An intelligent adaptive vocabulary learning system for Chinese as a Foreign Language (CFL), built from master's thesis research at Peking University. Features AI-driven personalized learning, SM-2 spaced repetition, and real-time analytics.

Platforms

Web Self-hosted

Languages

TypeScript

🎓 Adaptive Chinese Vocabulary Learning System

Version: 2.2.0 · Status: Production · Last Updated: March 2026

🌐 Try it now → learnchinese.kzwbelieve.top. No installation required; works on mobile and desktop.

An intelligent, adaptive vocabulary learning system for intermediate-level Chinese as a Foreign Language (CFL) learners. Built as part of a master's thesis at Peking University ("Research and Design of an Adaptive Intermediate Chinese Vocabulary Learning System"), this project implements a full-stack learning platform with AI-driven personalized learning paths, spaced repetition, and comprehensive learning analytics.


✨ Key Features

  • 🧠 Adaptive Recommendation Engine: AI-powered personalized learning path based on user proficiency, learning patterns, and performance history
  • 🔄 Spaced Repetition (SM-2): Review scheduling based on the SuperMemo-2 algorithm with personalized intervals
  • 📊 Learning Analytics Dashboard: Real-time data visualization with mastery heatmaps, trend analysis, and predictive insights
  • 📝 VKS-based Assessment: Vocabulary Knowledge Scale testing to determine optimal learning entry points
  • ⏱️ Millisecond-precision Tracking: Fine-grained learning behavior recording for research-grade data collection
  • 🔊 TTS Audio Pronunciation: Built-in text-to-speech for characters, words, collocations, and example sentences
  • 🔗 Multi-module Learning Chain: Character → Vocabulary → Collocation → Sentence progressive learning flow
  • 📖 SLA-informed Curriculum Design: Learning materials grounded in Second Language Acquisition theory: word frequency-based difficulty grading via the BCC corpus (billions of tokens), NLP-powered collocation extraction using dependency parsing and mutual information, automated sentence complexity scoring, and interlanguage corpus-based confused word identification
  • 📱 PWA Support: Install as a native-like app on iOS, Android, and desktop; works offline with Service Worker caching
  • ☁️ Cross-device Progress Sync: Learning state persisted to the backend; switch devices without losing progress
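As an illustration of the millisecond-precision tracking feature, here is a minimal sketch of a client-side event buffer that timestamps events and hands them back in batches (for example, to post to a batch endpoint). The `LearningEvent` shape, the `EventBuffer` class, and the event type names are hypothetical and are not taken from the project's code.

```typescript
// Hypothetical event shape; the project's real schema is not shown in this README.
interface LearningEvent {
  type: string;      // e.g. "card_shown" or "answer_submitted"
  itemId: string;    // identifier of the word, character, or sentence
  elapsedMs: number; // milliseconds since the buffer was created
}

class EventBuffer {
  private events: LearningEvent[] = [];
  private readonly batchSize: number;
  private readonly now: () => number;
  private readonly startedAt: number;

  // `now` is injectable so a fake clock can be used in tests.
  constructor(batchSize: number, now: () => number = () => Date.now()) {
    this.batchSize = batchSize;
    this.now = now;
    this.startedAt = now();
  }

  // Records one event; returns a full batch once `batchSize` is reached, else null.
  record(type: string, itemId: string): LearningEvent[] | null {
    this.events.push({ type, itemId, elapsedMs: this.now() - this.startedAt });
    return this.events.length >= this.batchSize ? this.flush() : null;
  }

  // Hands back everything buffered so far and empties the buffer.
  flush(): LearningEvent[] {
    const batch = this.events;
    this.events = [];
    return batch;
  }
}
```

Batching keeps the number of network requests low while preserving per-event timing; a caller would typically also call `flush()` when a learning session ends.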

📸 Screenshots

  • Home Page - Mobile-first welcome interface with navigation to all learning modules
  • VKS Assessment - Vocabulary Knowledge Scale test to determine learning entry point
  • Character Learning - Chinese character breakdown with pinyin, stroke order, and definitions
  • Word Learning - Deep dive into word meanings, collocations, and usage
  • Collocation Learning - Mastering native-like phrasing combinations
  • Sentence Learning - Contextual reading and listening practice
  • Vocabulary Exercise - Interactive quizzes with immediate feedback
  • Learning Dashboard - AI-powered smart recommendations with confidence scoring
  • Spaced Repetition Review (Today's Review) - Daily personalized review tasks

🛠️ Tech Stack

| Layer | Technology |
| --- | --- |
| Frontend | Next.js 14, React, TypeScript, Tailwind CSS, shadcn/ui |
| Backend | Flask, SQLAlchemy, SQLite |
| PWA | Service Worker, Web App Manifest, offline caching |
| Algorithm | Modified SuperMemo-2, multi-factor recommendation engine |
| ML Models | AdaBoost (Multinomial NB), Gaussian NB, XGBoost with voting ensemble |
| NLP Pipeline | BCC corpus frequency analysis, dependency parsing, mutual information scoring |
| Deployment | Nginx, PM2, VPS with HTTPS |

📚 Research Foundation

This system is built on rigorous academic research at Peking University, combining SLA theory, NLP techniques, and adaptive learning algorithms:

  • Corpus-driven vocabulary selection: Word frequency analysis across the BCC corpus (billions of tokens) and a self-collected CFL textbook corpus (165K characters from 13 intermediate-level textbooks) using Pandas and SQL
  • Frequency-difficulty modeling: Implements Stewart's finding that log(corpus frequency) strongly correlates with word difficulty (r=0.8), enabling automated difficulty grading
  • NLP-based collocation extraction: Collocations sourced from a knowledge base built with dependency parsing and mutual information filtering, ranked by collocation strength
  • Automated sentence selection: Sentence complexity computed by summing normalized word difficulties, selecting the lowest-complexity example sentences from textbook corpora
  • Interlanguage error analysis: Confused words extracted from the HSK Dynamic Composition Corpus based on learner error frequency, with separated learning to avoid semantic clustering interference
  • "Relative Character-based" pedagogy: Follows Bai Lesan's theory of learning characters through words (以词带字) at intermediate level, covering pronunciation, form, and high-frequency meanings
  • Cognitive load balancing: High/mid/low frequency words and confused words distributed evenly across learning sessions
  • Validated with real learners: Two-month teaching experiment with 17 HSK-4 learners, 51 users total, producing statistically significant improvements in vocabulary acquisition, collocation learning, and word proficiency
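The frequency-difficulty grading, sentence selection, and mutual information steps listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the thesis implementation: difficulty is taken to be a min-max scaling of log frequency, unknown words are treated as maximally difficult, and "mutual information" is interpreted as pointwise mutual information over raw counts. All function names are hypothetical.

```typescript
type Scores = Record<string, number>;

// Maps corpus frequencies to a difficulty in [0, 1] using log frequency:
// the most frequent word gets 0 and the rarest gets 1 (assumed scaling).
function difficultyFromFrequency(freq: Scores): Scores {
  const words = Object.keys(freq);
  const logs = words.map((w) => Math.log(freq[w] ?? 1));
  const lo = Math.min(...logs);
  const hi = Math.max(...logs);
  const span = hi - lo || 1; // avoid dividing by zero when all frequencies match
  const out: Scores = {};
  words.forEach((w, i) => {
    out[w] = (hi - (logs[i] ?? hi)) / span;
  });
  return out;
}

// Sentence complexity: the sum of per-word difficulties.
// Words missing from the table count as hardest (1); this is an assumption.
function sentenceComplexity(tokens: string[], difficulty: Scores): number {
  return tokens.reduce((sum, t) => sum + (difficulty[t] ?? 1), 0);
}

// Picks the tokenized sentence with the lowest complexity.
function easiestSentence(sentences: string[][], difficulty: Scores): string[] | undefined {
  let best: string[] | undefined;
  let bestScore = Infinity;
  for (const s of sentences) {
    const score = sentenceComplexity(s, difficulty);
    if (score < bestScore) {
      best = s;
      bestScore = score;
    }
  }
  return best;
}

// Pointwise mutual information (base 2) of a word pair from raw counts:
// log2( p(x, y) / (p(x) * p(y)) ) with probabilities estimated as count / total.
function pmi(pairCount: number, xCount: number, yCount: number, total: number): number {
  return Math.log((pairCount * total) / (xCount * yCount)) / Math.LN2;
}
```

A PMI of 0 means the two words co-occur exactly as often as chance predicts; larger values indicate a stronger collocation candidate.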

🌐 Live Demo

No installation needed! Visit the live deployment directly:

👉 learnchinese.kzwbelieve.top

The system is deployed on a VPS behind an Nginx reverse proxy, with PM2 managing the backend and frontend processes, which run 24/7.


🚀 Quick Start (Local Development)

Prerequisites

  • Python 3.11+ (conda recommended)
  • Node.js 18+

Installation

# Clone the repository
git clone https://github.com/1137043480/word-learning-system.git
cd word-learning-system

# Install backend dependencies
pip install -r requirements.txt

# Install frontend dependencies
npm install

Running the System

Option 1: One-click Start (Recommended)

# Auto-generate test data and start API server
./start_system.sh

# In another terminal, start the frontend
npm run dev

Option 2: Manual Start

# Start Phase 2 API server (port 5004)
python app_phase2.py

# In another terminal, start the frontend dev server (port 3000)
npm run dev

Option 3: Docker Deployment

# Production deployment with Docker Compose
docker-compose -f docker-compose.prod.yml up -d

Access

  • Frontend: http://localhost:3000
  • Phase 2 API: http://localhost:5004


🎯 Feature Tour

Recommended Experience Path

  1. System Status → /system-status: Check service health and architecture overview
  2. Phase 2 Demo → /phase2-demo: Interactive demo of the adaptive recommendation engine
  3. Learning Dashboard → /learning-dashboard: Full learning analytics and visualization
  4. Start Learning → /word-learning-entrance: VKS-guided personalized learning experience

Core Pages

| Page | Route | Description |
| --- | --- | --- |
| Home | / | Welcome page and learning entry |
| VKS Assessment | /word-learning-entrance | Vocabulary Knowledge Scale test |
| Character Learning | /character-learning | Chinese character module |
| Vocabulary Learning | /word-learning | Word meaning and usage |
| Collocation Learning | /collocation-learning | Word collocation patterns |
| Sentence Learning | /sentence-learning | Contextual sentence practice |
| Exercises | /exercise | Three exercise types |
| Learning Dashboard | /learning-dashboard | Analytics and insights ⭐ |
| Phase 2 Demo | /phase2-demo | Feature demonstration ⭐ |
| System Status | /system-status | Health check |

🔌 API Reference

Service Ports

| Port | Service |
| --- | --- |
| 3000 | Next.js Frontend |
| 5004 | Phase 2 API (primary) ⭐ |
| 5002 | Phase 1 Extended API |
| 5001 | Original API |

Key Endpoints

# System statistics
GET /api/stats

# Adaptive recommendations for a user
GET /api/adaptive/recommendation/{user_id}

# Learning dashboard data
GET /api/analytics/user/{user_id}/dashboard

# Due review items
GET /api/review/user/{user_id}/due

# User list
GET /api/users

# Learning state persistence (cross-device sync)
GET  /api/users/{user_id}/learning-state
PUT  /api/users/{user_id}/learning-state

# Learning session management
POST /api/learning/session/start
POST /api/learning/session/end
POST /api/learning/events/batch
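
As a rough sketch of how a TypeScript client might address a few of these endpoints, the helpers below build request descriptors for the due-review and learning-state routes. The base URL assumes a local Phase 2 API on port 5004; the `ApiRequest` type, the helper names, and the shape of the learning-state payload are hypothetical, since the README does not document request or response bodies.

```typescript
// Assumption: the Phase 2 API is running locally on port 5004.
const API_BASE = "http://localhost:5004";

interface ApiRequest {
  method: "GET" | "PUT" | "POST";
  url: string;
  body?: string; // JSON-encoded payload, when the endpoint takes one
}

// Builds the per-user prefix, escaping the id so it is safe inside a path.
function userPath(userId: string): string {
  return `${API_BASE}/api/users/${encodeURIComponent(userId)}`;
}

// GET /api/review/user/{user_id}/due
function dueReviews(userId: string): ApiRequest {
  return {
    method: "GET",
    url: `${API_BASE}/api/review/user/${encodeURIComponent(userId)}/due`,
  };
}

// GET /api/users/{user_id}/learning-state
function loadLearningState(userId: string): ApiRequest {
  return { method: "GET", url: `${userPath(userId)}/learning-state` };
}

// PUT /api/users/{user_id}/learning-state
// The payload shape is not documented here, so `state` is left untyped.
function saveLearningState(userId: string, state: unknown): ApiRequest {
  return {
    method: "PUT",
    url: `${userPath(userId)}/learning-state`,
    body: JSON.stringify(state),
  };
}
```

A descriptor can then be passed to any HTTP client, for example `fetch(req.url, { method: req.method, body: req.body, headers: { "Content-Type": "application/json" } })`.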

🧠 How the Adaptive Engine Works

Recommendation Logic

The system uses a multi-layer recommendation strategy:

  1. Urgent Review: Items at risk of being forgotten (based on memory decay model)
  2. Scheduled Review: Items due for spaced repetition review
  3. New Content: Fresh material matched to the learner's proficiency level
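
The three layers above can be read as a priority order. The sketch below illustrates one way to implement that ordering; it is not the code in `adaptive_engine.py`. The `Item` fields, the 0.5 retention threshold for "urgent", and the rule that new content must not exceed the learner's level are all assumptions made for the example.

```typescript
// Hypothetical item shape for illustration only.
interface Item {
  id: string;
  seen: boolean;     // false for content the learner has not studied yet
  dueAt: number;     // next scheduled review, epoch ms (ignored when unseen)
  retention: number; // estimated recall probability in [0, 1]
  level: number;     // difficulty level of the item
}

type Tier = "urgent" | "scheduled" | "new";

// Assumed cut-off below which a studied item counts as at risk of being forgotten.
const URGENT_RETENTION = 0.5;

// Assigns an item to a tier, or null when it should not be recommended now.
function tierOf(item: Item, now: number, learnerLevel: number): Tier | null {
  if (item.seen) {
    if (item.retention < URGENT_RETENTION) return "urgent";
    if (item.dueAt <= now) return "scheduled";
    return null;
  }
  return item.level <= learnerLevel ? "new" : null;
}

// Fills the recommendation list tier by tier: urgent, then scheduled, then new.
function recommend(items: Item[], now: number, learnerLevel: number, limit: number): Item[] {
  const order: Tier[] = ["urgent", "scheduled", "new"];
  const picked: Item[] = [];
  for (const tier of order) {
    for (const item of items) {
      if (picked.length >= limit) return picked;
      if (tierOf(item, now, learnerLevel) === tier) picked.push(item);
    }
  }
  return picked;
}
```

Filling tier by tier means new content only appears once pending reviews fit within the session limit, which matches the priority implied by the list.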

Key Algorithms

  • Modified SM-2: Personalized interval scheduling based on individual performance
  • Memory Strength Model: Multi-factor assessment of retention probability
  • User Pattern Recognition: Classifies learners by efficiency, accuracy, and preferences
  • Confidence Scoring: Each recommendation includes a confidence rating
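
For reference, here is a minimal sketch of the standard SuperMemo-2 update rule that the "Modified SM-2" item builds on. The project's personalization changes are not described in this README, so the code shows only the textbook algorithm; the `CardState` type and the `sm2` function name are hypothetical.

```typescript
interface CardState {
  repetitions: number;  // consecutive successful reviews
  intervalDays: number; // days until the next review
  easiness: number;     // E-Factor, never allowed below 1.3
}

const INITIAL_STATE: CardState = { repetitions: 0, intervalDays: 0, easiness: 2.5 };

// Applies one review with `quality` from 0 (complete blackout) to 5 (perfect recall).
function sm2(state: CardState, quality: number): CardState {
  if (quality < 0 || quality > 5) {
    throw new RangeError("quality must be between 0 and 5");
  }
  let { repetitions, intervalDays } = state;
  if (quality >= 3) {
    // Successful recall: 1 day, then 6 days, then grow by the E-Factor.
    if (repetitions === 0) intervalDays = 1;
    else if (repetitions === 1) intervalDays = 6;
    else intervalDays = Math.round(intervalDays * state.easiness);
    repetitions += 1;
  } else {
    // Failed recall: restart the repetition sequence.
    repetitions = 0;
    intervalDays = 1;
  }
  // E-Factor update from the SM-2 description, clamped at 1.3.
  const miss = 5 - quality;
  const easiness = Math.max(1.3, state.easiness + 0.1 - miss * (0.08 + miss * 0.02));
  return { repetitions, intervalDays, easiness };
}
```

This variant updates the E-Factor after every review, including failed ones, which is a common reading of the SM-2 description; a personalized variant would presumably adjust the intervals or the E-Factor step per learner.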

📊 Performance Metrics

Algorithm Performance

| Metric | Value |
| --- | --- |
| Recommendation response time | < 300ms |
| Recommendation accuracy | > 85% |
| Review timing accuracy | > 90% |
| Learning efficiency improvement | > 25% |

System Performance

| Metric | Value |
| --- | --- |
| Dashboard load time | < 1.5s |
| Concurrent request handling (100 req) | < 2s |
| Data accuracy | 99.5% |
| Real-time update latency | < 100ms |

📂 Project Structure

├── pages/                    # Next.js pages
│   ├── index.tsx            # Home page
│   ├── word-learning-entrance.tsx  # VKS assessment
│   ├── learning-dashboard.tsx      # Analytics dashboard ⭐
│   ├── phase2-demo.tsx             # Feature demo ⭐
│   └── exercise.tsx                # Practice exercises
├── components/ui/            # UI component library (shadcn)
├── src/
│   ├── context/             # React Context providers
│   ├── hooks/               # Custom React hooks
│   └── lib/                 # Utility functions
├── app_phase2.py            # Phase 2 API server ⭐
├── adaptive_engine.py       # Adaptive recommendation engine
├── models_extended.py       # Database models
├── start_system.sh          # One-click startup script
└── README.md                # This file

📈 Dataset Scale

| Metric | Count |
| --- | --- |
| Test Users | 51 |
| Learning Sessions | 4,050 |
| Exercise Records | 15,200 |
| Learning Events | 50,100 |

🤝 Contributing

Contributions are welcome! Please feel free to submit issues and pull requests.

Development Guidelines

  • React components: Functional components + TypeScript
  • Code style: 2-space indentation, PascalCase file naming
  • Python: PEP 8 compliant
  • Commits: Conventional Commits format

📄 License

This project is open source and available under the MIT License.


📚 Documentation


Built with ❤️ for language learners worldwide.
Based on a master's thesis at Peking University: "Research and Design of an Adaptive Intermediate Chinese Vocabulary Learning System"