Computer Vision · Safety Systems · Completed (Undergrad)

UAV / Drone Detection using Deep Learning

Exploring VGG16, ResNet50 and YOLO for vision-based drone identification, with a focus on real-time readiness and responsible deployment.


My Role

Modeling · Training · Evaluation

When

2023

Context

Defense-inspired student project

Transfer Learning · Object Detection · TensorFlow/Keras · OpenCV

Built With

Python · TensorFlow/Keras · NumPy · OpenCV · Matplotlib

Motivation

Low-altitude UAVs enable logistics and inspection but also introduce risks when misused. We explored whether compact vision models can reliably identify drones in varied scenes as a step toward real-time monitoring.

Goal: Detect and localize drones in images/video frames.
Constraint: Favor smaller backbones and transfer learning for quicker convergence on limited data.
Outcome: Transfer-learned baseline classifiers with bounding-box regressors, plus a YOLO pathway planned for real-time detection.

Model Architecture

We compared transfer-learned CNN backbones (VGG16, ResNet50), each with a classification head and a bounding-box regression head, and scoped a YOLO single-stage detector for real-time use.

VGG16: Pretrained on ImageNet; frozen early blocks; Dense head with 4-value bounding box (x₁, y₁, x₂, y₂). Best stability on this dataset.
ResNet50: Residual connections ease gradient flow through its deeper stack, but the extra capacity needed stronger augmentation to avoid overfitting.
YOLO (single-stage): Grid-based detection for real-time scenarios; earmarked for future integration after backbone benchmarking.
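
A minimal Keras sketch of the VGG16 classifier + box-regressor setup described above. Several details are assumptions, not recorded project settings: two classes, freezing every layer before `block5_conv1`, a `GlobalAveragePooling2D` + 256-unit Dense head with dropout, and a sigmoid box output so predictions stay in the normalized [0, 1] range. Inputs are assumed to be preprocessed with `keras.applications.vgg16.preprocess_input` in the data pipeline.

```python
from tensorflow import keras

IMG_SIZE = 258  # input resolution from the Dataset & Preprocessing section


def build_vgg16_detector(num_classes=2, trainable_from="block5_conv1", weights="imagenet"):
    """VGG16 backbone + softmax class head + 4-value (x1, y1, x2, y2) box head.

    Assumed: images were already passed through
    keras.applications.vgg16.preprocess_input before reaching the model.
    """
    backbone = keras.applications.VGG16(
        include_top=False, weights=weights, input_shape=(IMG_SIZE, IMG_SIZE, 3)
    )
    # Freeze the early blocks; layers from `trainable_from` onward are fine-tuned.
    trainable = False
    for layer in backbone.layers:
        if layer.name == trainable_from:
            trainable = True
        layer.trainable = trainable

    inputs = keras.Input(shape=(IMG_SIZE, IMG_SIZE, 3))
    x = backbone(inputs)
    x = keras.layers.GlobalAveragePooling2D()(x)
    x = keras.layers.Dense(256, activation="relu")(x)  # hypothetical head width
    x = keras.layers.Dropout(0.3)(x)  # hypothetical dropout rate

    class_out = keras.layers.Dense(num_classes, activation="softmax", name="class")(x)
    # Sigmoid keeps predicted corners in [0, 1], matching normalized labels.
    bbox_out = keras.layers.Dense(4, activation="sigmoid", name="bbox")(x)
    return keras.Model(inputs, [class_out, bbox_out], name="vgg16_detector")
```

Passing `weights=None` skips the ImageNet download, which is convenient for smoke tests. A ResNet50 variant would swap in `keras.applications.ResNet50`, whose layer names differ, so `trainable_from` would need to change.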

Dataset & Preprocessing

Kaggle UAV imagery with text annotations for bounding boxes; images standardized to 258×258 RGB and split into train/validation/test.

Parsed TXT labels → [x1, y1, x2, y2]; normalized coordinates to image size.
Applied basic augmentation: random flip, modest rotation/zoom; color jitter held back to preserve silhouette cues.
Train/validation/test = 80/10/10; batched loaders using tf.data.
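
A NumPy sketch of the label-parsing, flip, and 80/10/10 split steps listed above. It assumes the TXT labels follow the YOLO convention (`class cx cy w h`, already normalized to image size), which the write-up does not state; the helper names and the seed are hypothetical. Only a horizontal flip is shown, because rotation and zoom also require transforming the box corners.

```python
import numpy as np


def yolo_line_to_corners(line):
    """'class cx cy w h' (normalized, assumed YOLO format) -> [x1, y1, x2, y2]."""
    _cls, cx, cy, w, h = (float(v) for v in line.split()[:5])
    box = np.array([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], dtype=np.float32)
    return np.clip(box, 0.0, 1.0)  # keep corners inside the image


def hflip(image, box):
    """Mirror an H x W x C image left-to-right and remap its normalized box."""
    x1, y1, x2, y2 = box
    # After mirroring, the old right edge becomes the new left edge.
    flipped_box = np.array([1.0 - x2, y1, 1.0 - x1, y2], dtype=np.float32)
    return image[:, ::-1, :], flipped_box


def split_indices(n, fractions=(0.8, 0.1, 0.1), seed=42):
    """Shuffle 0..n-1 once and cut it into train/val/test index arrays."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_train = int(fractions[0] * n)
    n_val = int(fractions[1] * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]
```

Index arrays like these can slice the image, label, and box arrays before each split is wrapped with `tf.data.Dataset.from_tensor_slices` and batched with `.batch(16)`, the batch size listed under Training & Metrics; the order of the targets should match the model's outputs.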

Training & Metrics

Adam optimizer with small learning rates, early stopping, and model checkpoints. We tracked accuracy for classification and MSE for box regression.

Losses: categorical cross-entropy (class), MSE (bbox).
Batch size / Epochs: 16 / 10–16 with patience-based stopping.
Reported: Train/val accuracy & loss curves; test accuracy; MSE/MAE; qualitative frame outputs.
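
A sketch of the compile and callback setup described above, assuming a two-output model whose outputs are ordered `[class, bbox]`. The learning rate of 1e-4, the patience of 3, and the checkpoint filename are illustrative values, not the project's recorded settings.

```python
from tensorflow import keras


def compile_for_training(model, learning_rate=1e-4):
    """Adam + per-output losses: cross-entropy for the class, MSE for the box."""
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=learning_rate),
        # List order must match the model's outputs: [class, bbox].
        loss=["categorical_crossentropy", "mse"],
        metrics=[["accuracy"], ["mae"]],
    )
    return model


def training_callbacks(checkpoint_path="best_model.keras", patience=3):
    """Stop when val_loss plateaus and keep the best checkpoint on disk."""
    return [
        keras.callbacks.EarlyStopping(
            monitor="val_loss", patience=patience, restore_best_weights=True
        ),
        keras.callbacks.ModelCheckpoint(
            checkpoint_path, monitor="val_loss", save_best_only=True
        ),
    ]
```

With batched `train_ds` and `val_ds` datasets (hypothetical names), training would look like `model.fit(train_ds, validation_data=val_ds, epochs=16, callbacks=training_callbacks())`; early stopping may end the run before the epoch cap.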

~0.66–0.70 Val Accuracy (VGG16)

Stable BBox Regression (MSE)

Real-time YOLO Pathway (planned)

VGG16 — Transfer Learning

Classifier + BBox

Val Acc ~0.68 · MSE ↓ · Overfit risk: low

ResNet50 — Transfer Learning

Classifier + BBox

Val Acc ~0.60 · Regularization ↑ · Augmentation ↑

YOLO — Single-Stage Detector

Real-time Pathway

FPS target set · mAP: TBD · Future integration
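
To make "grid-based detection" concrete, here is a simplified, YOLOv1-style target encoder: the grid cell containing a box's center is responsible for that box and stores objectness, the center offset within the cell, and the box size. The 7×7 grid, the single class, and the one-box-per-cell layout are assumptions for illustration; anchor-based YOLO versions encode targets differently.

```python
import numpy as np


def encode_yolo_target(boxes, grid_size=7):
    """Simplified YOLOv1-style target of shape (S, S, 5).

    Each cell holds [objectness, x_offset, y_offset, w, h]; boxes are
    normalized [x1, y1, x2, y2]. Single class, one box per cell.
    """
    target = np.zeros((grid_size, grid_size, 5), dtype=np.float32)
    for x1, y1, x2, y2 in boxes:
        cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
        # The cell containing the box center is responsible for predicting it.
        col = min(int(cx * grid_size), grid_size - 1)
        row = min(int(cy * grid_size), grid_size - 1)
        # Offsets are relative to the cell; width/height stay image-relative.
        # If two centers share a cell, the later box overwrites the earlier one.
        target[row, col] = [1.0, cx * grid_size - col, cy * grid_size - row, x2 - x1, y2 - y1]
    return target
```

A detector head trained against targets like these predicts every cell in one forward pass, which is what makes the single-stage design attractive for real-time use.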

Results

VGG16 converged consistently and generalized best; ResNet50 required more regularization; YOLO earmarked for real-time extension.

VGG16 (transfer-learned): Final val accuracy ≈ 0.66–0.70 on our split; smooth loss decay; stable bbox regression.
ResNet50: Higher capacity led to overfitting without heavier augmentation; validation metrics lagged VGG16.
Takeaway: For limited datasets, a simpler pretrained backbone plus careful augmentation can outperform a deeper one, as VGG16 did over ResNet50 in this project.

Ethics & Governance

Detection tech has dual-use concerns. Any deployment must prioritize accountability, privacy, and human oversight.

Intended use: Safety & monitoring in controlled environments; not for autonomous engagement.
Human-in-the-loop: Model outputs treated as alerts, not final decisions.
Privacy: Avoid storing personally identifiable footage; retain only labeled frames for evaluation when necessary.
Evaluation: Document failure modes (occlusion, distance, lighting) and publish limitations alongside metrics.

Resources

Code, slides, and references.

Dataset: “Drone Dataset (UAV)” — Kaggle.
Models: VGG16, ResNet50 (ImageNet pretrain), YOLO (single-stage detector).