Network-Traffic-Classification

About Network-Traffic-Classification

Network Traffic Classification is a machine learning project for classifying network flows captured from a Docker-based SDN lab environment. The dataset includes over 300,000 flows analyzed with nDPI, with more than 100 application protocols grouped into 10 traffic classes such as Web, Streaming, Social Media, and DNS. Seven flow features are used for training, spanning protocol metadata, packet counts, and byte counts. The repository supports reproducible experiments across multiple supervised algorithms. Reported results include KNN at 97.24%, Random Forest at 96.69%, Decision Tree at 95.80%, and a custom PAA method achieving 99.29% accuracy. It provides a modern modular Python package with a command-line interface for training and evaluation, alongside original per-algorithm folders containing legacy scripts, notebooks, saved models, and confusion matrices. DNN experiments using TensorFlow are supported through optional dependencies. Typical workflows include training classifiers on the provided CSV datase

Network Traffic Classification

Machine-learning experiments for classifying network traffic flows collected from a Docker-based SDN lab network. The dataset contains more than 300,000 flows analyzed with nDPI, grouped from 100+ application protocols into 10 traffic classes.

This repository includes the original research artifacts, saved models, confusion matrices, and a cleaner Python workflow for reproducing common experiments.
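The protocol-to-class grouping described above can be pictured as a simple lookup. The sketch below is illustrative only: the four entries, the `to_traffic_class` helper, and the `"Unknown"` fallback are assumptions made for this example, and the grouping actually used in the experiments is the one defined in the paper and in test.txt.

```python
# Hypothetical excerpt of a protocol-to-class mapping. The real grouping
# covers 100+ nDPI protocols and may assign these labels differently.
PROTOCOL_TO_CLASS = {
    "HTTP": "Web",
    "YouTube": "Streaming",
    "Facebook": "Social Media",
    "DNS": "DNS",
}


def to_traffic_class(ndpi_proto, default="Unknown"):
    """Return the traffic class for an nDPI protocol label.

    Protocols missing from the mapping fall back to `default`
    (an assumption of this sketch, not documented project behavior).
    """
    return PROTOCOL_TO_CLASS.get(ndpi_proto, default)


labels = [to_traffic_class(p) for p in ["DNS", "YouTube", "SomeNewProto"]]
print(labels)
```

Keeping the grouping in one table like this means the class definitions can change without touching the training code.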

Project Impact

As of June 2, 2026, this public research repository has accumulated measurable academic and open-source interest:

| Signal | Count | Notes |
| --- | --- | --- |
| GitHub stars | 40 | First public star recorded on March 10, 2021. |
| GitHub forks | 11 | Fork activity spans September 2020 through September 2025. |
| Crossref cited-by count | 14 | Citation metadata for the associated Connection Science article. |
| Google Scholar profile | Available | Author profile provides a complementary citation view. |
| Paper references | 42 | References registered in Crossref metadata. |

See Repository Impact for the star/fork timeline, citation graph, and data-source notes.

Research Summary

The project evaluates supervised machine-learning models for flow-level network traffic classification using seven selected flow features:

| Feature group | Columns |
| --- | --- |
| Protocol metadata | protocol, src_port, dst_port |
| Packet counts | src2dst_packets, dst2src_packets |
| Byte counts | src2dst_bytes, dst2src_bytes |
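As a rough illustration, the seven features can be pulled out of a flow table with pandas as shown below. This is a sketch rather than the package's own code: the `split_features_and_target` helper and the two sample rows are made up, and it assumes the label column is named `class` and that `protocol` is stored as a number.

```python
import pandas as pd

# The seven feature columns listed in the table above.
FEATURE_COLUMNS = [
    "protocol", "src_port", "dst_port",
    "src2dst_packets", "dst2src_packets",
    "src2dst_bytes", "dst2src_bytes",
]
TARGET_COLUMN = "class"  # assumed name of the label column


def split_features_and_target(flows):
    """Return (X, y) using only the seven selected flow features."""
    required = FEATURE_COLUMNS + [TARGET_COLUMN]
    missing = [c for c in required if c not in flows.columns]
    if missing:
        raise KeyError(f"missing columns: {missing}")
    return flows[FEATURE_COLUMNS], flows[TARGET_COLUMN]


# Two made-up flows, for illustration only (not taken from the dataset).
flows = pd.DataFrame({
    "protocol": [6, 17],
    "src_port": [51512, 40022],
    "dst_port": [443, 53],
    "src2dst_packets": [12, 1],
    "dst2src_packets": [10, 1],
    "src2dst_bytes": [1840, 74],
    "dst2src_bytes": [9120, 130],
    "ndpi_proto": ["TLS", "DNS"],
    "class": ["Web", "DNS"],
})

X, y = split_features_and_target(flows)
print(X.shape)
```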

Reported results from the original experiments:

| Method | Accuracy |
| --- | --- |
| Decision Tree | 95.80% |
| Random Forest | 96.69% |
| KNN | 97.24% |
| PAA | 99.29% |

For the full methodology, class grouping, and experimental setup, read the paper:

P. K. Mondal, L. P. Aguirre Sanchez, E. Benedetto, Y. Shen, and M. Guo, "A dynamic network traffic classifier using supervised ML for a Docker-based SDN network," Connection Science, 2021. https://doi.org/10.1080/09540091.2020.1870437

Repository Layout

.
├── DecisionTree/                  # Original decision-tree scripts, notebook, model, outputs
├── RandomForest/                  # Original random-forest scripts, model, outputs
├── KNN/                           # Original KNN scripts, model, outputs
├── DNN/                           # Original neural-network scripts, model, outputs
├── Dataset/                       # Dataset access notes
├── network_traffic_classification/ # Modern reusable Python package
├── docs/                          # Project documentation
├── dictionary.py                  # Legacy protocol-to-class helper
├── test.txt                       # Protocol-to-class mapping used by legacy scripts
└── README.md

Quick Start

Create a virtual environment and install the dependencies:

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

For the legacy DNN scripts, install the optional TensorFlow dependency:

pip install -r requirements-dnn.txt

Train a model with the modern CLI:

python -m network_traffic_classification train \
  --data path/to/total_class.csv \
  --model random-forest \
  --output models/random_forest.joblib

Available model names:

decision-tree
random-forest
knn

The CLI prints accuracy and a classification report. It can also save the trained model and class labels:

python -m network_traffic_classification train \
  --data path/to/total_class.csv \
  --model knn \
  --output artifacts/knn.joblib \
  --labels-output artifacts/classes.txt \
  --test-size 0.33 \
  --random-state 42
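The flags above suggest a hold-out workflow: split, fit, report, save. The sketch below walks through those steps with plain scikit-learn and joblib. It is not the package's implementation, synthetic data stands in for total_class.csv, and the file names are placeholders written to a temporary directory.

```python
import tempfile
from pathlib import Path

import joblib
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the seven flow features; NOT the real dataset.
X, y = make_classification(
    n_samples=300, n_features=7, n_informative=5, n_redundant=0,
    n_classes=3, random_state=42,
)

# Mirrors --test-size 0.33 and --random-state 42.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)

model = KNeighborsClassifier().fit(X_train, y_train)
predictions = model.predict(X_test)

accuracy = accuracy_score(y_test, predictions)
print(f"accuracy: {accuracy:.4f}")
print(classification_report(y_test, predictions))

with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "knn.joblib"      # cf. --output
    labels_path = Path(tmp) / "classes.txt"    # cf. --labels-output

    joblib.dump(model, model_path)
    labels_path.write_text("\n".join(str(c) for c in model.classes_))

    # Reload to confirm the saved artifact reproduces the predictions.
    restored = joblib.load(model_path)
    same_predictions = bool((restored.predict(X_test) == predictions).all())
    saved_labels = labels_path.read_text().splitlines()

print(same_predictions, saved_labels)
```

Fixing the random state makes the split reproducible, so accuracy figures from different runs or models are measured on the same held-out flows.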

Dataset

The raw .pcap files and processed CSV are large, so the dataset is not committed to this repository. See Dataset/How to get the data.txt for access instructions and research-use conditions.

Expected labeled training file:

total_class.csv

Expected columns:

#flow_id, protocol, src_ip, src_port, dst_ip, dst_port, ndpi_proto_num,
src2dst_packets, src2dst_bytes, dst2src_packets, dst2src_bytes,
ndpi_proto, class
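Before training, it can help to check that a CSV header carries all of these columns. The following standard-library sketch assumes a comma-delimited file whose first row is the header; `missing_columns` is a hypothetical helper, and the in-memory samples are made up.

```python
import csv
import io

EXPECTED_COLUMNS = [
    "#flow_id", "protocol", "src_ip", "src_port", "dst_ip", "dst_port",
    "ndpi_proto_num", "src2dst_packets", "src2dst_bytes",
    "dst2src_packets", "dst2src_bytes", "ndpi_proto", "class",
]


def missing_columns(csv_file):
    """Return the expected columns absent from the file's header row."""
    header = next(csv.reader(csv_file), [])
    present = {name.strip() for name in header}
    return [c for c in EXPECTED_COLUMNS if c not in present]


# In-memory samples stand in for total_class.csv.
complete = io.StringIO(",".join(EXPECTED_COLUMNS) + "\n")
incomplete = io.StringIO("#flow_id,protocol,src_ip,src_port\n")

complete_result = missing_columns(complete)
incomplete_result = missing_columns(incomplete)
print(complete_result)
print(incomplete_result)
```

With a real file, the same function can be called on a handle opened with `open(path, newline="")`.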

Original Scripts

The original scripts are preserved for traceability:

python DecisionTree/decisiontree.py
python RandomForest/randomforest.py
python KNN/knn.py
python DNN/dnn.py

Some legacy scripts contain machine-specific Windows paths. Prefer the modern CLI for new experiments, or update file_dir in the legacy scripts before running them.

Citation

If this repository or dataset supports your work, please cite:

@article{mondal2021dynamic,
  title={A dynamic network traffic classifier using supervised ML for a Docker-based SDN network},
  author={Mondal, Pritom Kumar and Aguirre Sanchez, Lizeth P. and Benedetto, Emmanuele and Shen, Yao and Guo, Minyi},
  journal={Connection Science},
  pages={1--26},
  year={2021},
  publisher={Taylor \& Francis},
  doi={10.1080/09540091.2020.1870437}
}

Contributing

Contributions are welcome, especially improvements to reproducibility, documentation, and model evaluation. Please read CONTRIBUTING.md before opening a pull request.