Computer Vision · Safety Systems · Completed (Undergrad)

UAV / Drone Detection using Deep Learning

Exploring VGG16, ResNet50 and YOLO for vision-based drone identification, with a focus on real-time readiness and responsible deployment.

My Role

Modeling · Training · Evaluation

When

2023

Context

Defense-inspired student project

Transfer Learning · Object Detection · TensorFlow/Keras · OpenCV

Built With

Python · TensorFlow/Keras · NumPy · OpenCV · Matplotlib

Motivation

Low-altitude UAVs enable logistics and inspection but also introduce risks when misused. We explored whether compact vision models can reliably identify drones in varied scenes as a step toward real-time monitoring.

  • Goal: Detect and localize drones in images/video frames.
  • Constraint: Favor smaller backbones and transfer learning for quicker convergence on limited data.
  • Outcome: Baseline classifiers with bounding-box regressors, plus a planned YOLO pathway for real-time detection.

Model Architecture

We compared two transfer-learned CNN backbones (VGG16, ResNet50), each paired with a classification head and a box-regression head, and scoped a single-stage YOLO detector as the real-time option.

  • VGG16: Pretrained on ImageNet; frozen early blocks; Dense head with 4-value bounding box (x1, y1, x2, y2). Best stability on this dataset.
  • ResNet50: Residual connections improved gradient flow; needed stronger augmentation to avoid overfitting.
  • YOLO (single-stage): Grid-based detection for real-time scenarios; earmarked for future integration after backbone benchmarking.
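
The VGG16 variant described above can be sketched in Keras roughly as follows. This is a minimal illustration, not the project's actual code: the function name `build_vgg16_detector`, the two-class output, the pooling + 128-unit Dense head, the sigmoid on the box output (which assumes normalized coordinates), and the choice to freeze blocks 1 to 3 are all assumptions made for the example.

```python
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import VGG16


def build_vgg16_detector(input_shape=(258, 258, 3), num_classes=2,
                         weights="imagenet",
                         frozen_blocks=("block1", "block2", "block3")):
    """VGG16 backbone with a class head and a 4-value bbox head (hypothetical sketch)."""
    backbone = VGG16(include_top=False, weights=weights, input_shape=input_shape)

    # Freeze the early convolutional blocks; later blocks stay trainable.
    for layer in backbone.layers:
        layer.trainable = not layer.name.startswith(frozen_blocks)

    # Shared feature vector feeding both heads (assumed head design).
    x = layers.GlobalAveragePooling2D()(backbone.output)
    x = layers.Dense(128, activation="relu")(x)

    # Class probabilities and a normalized (x1, y1, x2, y2) box.
    class_out = layers.Dense(num_classes, activation="softmax", name="class")(x)
    bbox_out = layers.Dense(4, activation="sigmoid", name="bbox")(x)
    return Model(inputs=backbone.input, outputs=[class_out, bbox_out])
```

Because the two outputs are named, a separate loss can be attached to each one by name when the model is compiled. Swapping `VGG16` for `ResNet50` would need a different freezing rule, since ResNet50 layer names do not follow the `blockN_` pattern.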

Dataset & Preprocessing

Kaggle UAV imagery with text annotations for bounding boxes; images standardized to 258×258 RGB and split into train/validation/test sets.

  • Parsed TXT labels → [x1, y1, x2, y2]; normalized coordinates to image size.
  • Applied basic augmentation: random flip, modest rotation/zoom; color jitter held back to preserve silhouette cues.
  • Train/validation/test = 80/10/10; batched loaders using tf.data.
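
The preprocessing steps above can be sketched with NumPy alone. The label layout is an assumption here (one box per line as whitespace-separated pixel values `x1 y1 x2 y2`); the real Kaggle TXT format may differ. The helper names and the fixed seed are likewise hypothetical.

```python
import numpy as np


def parse_label_line(line, img_w, img_h):
    """Parse 'x1 y1 x2 y2' (pixels, assumed layout) into a box normalized to [0, 1]."""
    x1, y1, x2, y2 = (float(v) for v in line.split()[:4])
    return np.array([x1 / img_w, y1 / img_h, x2 / img_w, y2 / img_h],
                    dtype=np.float32)


def hflip_box(box):
    """Adjust a normalized box when its image is flipped left-to-right."""
    x1, y1, x2, y2 = box
    return np.array([1.0 - x2, y1, 1.0 - x1, y2], dtype=np.float32)


def split_indices(n, train=0.8, val=0.1, seed=0):
    """Shuffle sample indices and cut them into train/validation/test parts."""
    idx = np.random.default_rng(seed).permutation(n)
    n_train, n_val = int(n * train), int(n * val)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]
```

Normalized coordinates are unaffected by resizing the image to 258×258, so the labels do not need to be rescaled separately. Geometric augmentations such as the flip do move the box, which is why `hflip_box` is applied alongside the image flip.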

Training & Metrics

Adam optimizer with small learning rates, early stopping, and model checkpoints. We tracked accuracy for classification and MSE for box regression.

  • Losses: categorical cross-entropy (class), MSE (bbox).
  • Batch size 16; 10–16 epochs, with patience-based early stopping.
  • Reported: Train/val accuracy & loss curves; test accuracy; MSE/MAE; qualitative frame outputs.
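
For reference, the quantities tracked above can be written out in plain NumPy. This is an illustrative sketch rather than the training code itself: in practice the Keras built-ins would compute these, and the patience value of 3 is an assumed placeholder, not a setting reported for this project.

```python
import numpy as np


def categorical_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean cross-entropy between one-hot labels and predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return float(-np.mean(np.sum(y_true * np.log(y_pred), axis=1)))


def accuracy(y_true, y_pred):
    """Fraction of samples whose arg-max class matches the one-hot label."""
    return float(np.mean(np.argmax(y_true, axis=1) == np.argmax(y_pred, axis=1)))


def bbox_mse(b_true, b_pred):
    """Mean squared error over all box coordinates."""
    return float(np.mean((b_true - b_pred) ** 2))


def bbox_mae(b_true, b_pred):
    """Mean absolute error over all box coordinates."""
    return float(np.mean(np.abs(b_true - b_pred)))


def should_stop(val_losses, patience=3):
    """True once the best validation loss is at least `patience` epochs old."""
    if not val_losses:
        return False
    best_epoch = int(np.argmin(val_losses))
    return (len(val_losses) - 1 - best_epoch) >= patience
```

MSE penalizes large coordinate errors more heavily than MAE, which is one reason to report both: MAE reads directly as the average offset in normalized image units.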

At a glance:

  • VGG16 (transfer learning, classifier + bbox): val accuracy ~0.68 (range ~0.66–0.70); bbox MSE decreasing and stable; overfitting risk low.
  • ResNet50 (transfer learning, classifier + bbox): val accuracy ~0.60; needed more overfitting control and heavier augmentation.
  • YOLO (single-stage detector, real-time* pathway): FPS target ✓; mAP: TBD; future integration.

Results

VGG16 converged consistently and generalized best; ResNet50 required more regularization; YOLO earmarked for real-time extension.

  • VGG16 (transfer-learned): Final val accuracy ≈ 0.66–0.70 on our split; smooth loss decay; stable bbox regression.
  • ResNet50: Higher capacity led to overfitting without heavier augmentation; validation metrics lagged VGG16.
  • Takeaway: For limited datasets, simpler pretrained backbones + careful augmentation can outperform deeper stacks.

Ethics & Governance

Detection technology raises dual-use concerns. Any deployment must prioritize accountability, privacy, and human oversight.

  • Intended use: Safety & monitoring in controlled environments; not for autonomous engagement.
  • Human-in-the-loop: Model outputs treated as alerts, not final decisions.
  • Privacy: Avoid storing personally identifiable footage; retain only labeled frames for evaluation when necessary.
  • Evaluation: Document failure modes (occlusion, distance, lighting) and publish limitations alongside metrics.

Resources

Code, slides, and references.

  • Dataset: “Drone Dataset (UAV)” — Kaggle.
  • Models: VGG16, ResNet50 (ImageNet pretrain), YOLO (single-stage detector).