UAV / Drone Detection using Deep Learning
Exploring VGG16, ResNet50 and YOLO for vision-based drone identification, with a focus on real-time readiness and responsible deployment.
My Role
Modeling · Training · Evaluation
When
2023
Context
Defense-inspired student project
Built With
Python · TensorFlow/Keras · NumPy · OpenCV · Matplotlib
Motivation
Low-altitude UAVs enable logistics and inspection but also introduce risks when misused. We explored whether compact vision models can reliably identify drones in varied scenes as a step toward real-time monitoring.
- Goal: Detect and localize drones in images/video frames.
- Constraint: Favor smaller backbones and transfer learning for quicker convergence on limited data.
- Outcome: Baseline classifiers with bounding-box regressors, plus a planned YOLO pathway for real-time detection.
Model Architecture
We compared transfer-learned CNN backbones (VGG16, ResNet50) for joint classification and box regression, and scoped a YOLO head for single-stage detection.
- VGG16: Pretrained on ImageNet; frozen early blocks; Dense head regressing a 4-value bounding box (x1, y1, x2, y2). Most stable backbone on this dataset.
- ResNet50: Residual connections improved gradient flow; needed stronger augmentation to avoid overfitting.
- YOLO (single-stage): Grid-based detection for real-time scenarios; earmarked for future integration after backbone benchmarking.
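A minimal Keras sketch of the VGG16 variant described above. It assumes a two-class softmax head (drone / no drone) next to a sigmoid box head over normalized (x1, y1, x2, y2) coordinates. The function name, the 256-unit Dense layer, the dropout rate, and unfreezing from `block4` onward are illustrative assumptions, not the project's exact configuration.

```python
from tensorflow import keras
from tensorflow.keras import layers

IMG_SIZE = 258  # matches the 258×258 RGB standardization used for the dataset


def build_vgg16_detector(num_classes=2, weights="imagenet", first_trainable_block="block4"):
    """Hypothetical helper: frozen-early-blocks VGG16 with class + bbox heads.

    num_classes=2 and first_trainable_block="block4" are assumptions.
    Inputs are expected to be preprocessed with
    keras.applications.vgg16.preprocess_input before reaching the model.
    """
    base = keras.applications.VGG16(
        include_top=False, weights=weights, input_shape=(IMG_SIZE, IMG_SIZE, 3)
    )
    # Freeze every layer before the first layer of `first_trainable_block`;
    # layers from that block onward stay trainable for fine-tuning.
    trainable = False
    for layer in base.layers:
        if layer.name.startswith(first_trainable_block):
            trainable = True
        layer.trainable = trainable

    inputs = keras.Input(shape=(IMG_SIZE, IMG_SIZE, 3))
    x = base(inputs)
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dense(256, activation="relu")(x)
    x = layers.Dropout(0.3)(x)

    # Classification head: softmax over classes (categorical cross-entropy).
    class_probs = layers.Dense(num_classes, activation="softmax", name="class_probs")(x)
    # Regression head: sigmoid keeps the 4 coordinates in [0, 1] (MSE loss).
    bbox = layers.Dense(4, activation="sigmoid", name="bbox")(x)

    return keras.Model(inputs, {"class_probs": class_probs, "bbox": bbox})
```

Keeping the box head separate from the classifier lets each output take its own loss, and the sigmoid activation matches box targets that were normalized to image size.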
Dataset & Preprocessing
Kaggle UAV imagery with text annotations for bounding boxes; standardized to 258×258 RGB and split into train/validation/test sets.
- Parsed TXT labels → [x1, y1, x2, y2]; normalized coordinates to image size.
- Applied basic augmentation: random flip, modest rotation/zoom; color jitter held back to preserve silhouette cues.
- Train/validation/test = 80/10/10; batched loaders using tf.data.
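The label parsing, flip augmentation, and 80/10/10 split can be sketched in plain NumPy. This assumes the TXT files follow the YOLO convention of one `class cx cy w h` line per object with values already normalized to [0, 1]; if the files store pixel corners instead, divide by the image width and height. It also shows why a horizontal flip must move the box along with the pixels. Function names and the seed are hypothetical.

```python
import numpy as np


def parse_yolo_txt(text):
    """Parse assumed YOLO-format lines 'class cx cy w h' into [x1, y1, x2, y2]."""
    boxes = []
    for line in text.strip().splitlines():
        parts = line.split()
        if len(parts) != 5:
            continue  # skip blank or malformed lines
        _, cx, cy, w, h = map(float, parts)
        # Convert center/size to corners and clip to the normalized image.
        x1, y1 = max(0.0, cx - w / 2), max(0.0, cy - h / 2)
        x2, y2 = min(1.0, cx + w / 2), min(1.0, cy + h / 2)
        boxes.append([x1, y1, x2, y2])
    return np.array(boxes, dtype=np.float32).reshape(-1, 4)


def hflip(image, box):
    """Mirror an HxWxC image and its normalized box horizontally.

    During training this would be applied at random (e.g. p=0.5, assumed).
    """
    flipped = image[:, ::-1, :]
    x1, y1, x2, y2 = box
    # The left edge of the new box is the mirrored right edge of the old one.
    return flipped, np.array([1.0 - x2, y1, 1.0 - x1, y2], dtype=np.float32)


def split_80_10_10(n, seed=42):
    """Shuffle indices 0..n-1 and cut them into train/val/test index arrays."""
    idx = np.random.default_rng(seed).permutation(n)
    n_train, n_val = int(0.8 * n), int(0.1 * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]
```

The resulting index arrays can then select images and boxes for each split before they are wrapped in `tf.data` datasets.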
Training & Metrics
Adam optimizer with small learning rates, early stopping, and model checkpoints. We tracked accuracy for classification and MSE for box regression.
- Losses: categorical cross-entropy (class), MSE (bbox).
- Batch size / Epochs: 16 / 10–16 with patience-based stopping.
- Reported: Train/val accuracy & loss curves; test accuracy; MSE/MAE; qualitative frame outputs.
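A sketch of the training setup, assuming a Keras model with two named outputs, `class_probs` and `bbox` (hypothetical names). The learning rate of 1e-4, the patience of 3, and the checkpoint path are placeholder values; the source only specifies small learning rates and patience-based stopping.

```python
from tensorflow import keras


def compile_for_training(model, learning_rate=1e-4):
    """Compile a two-output model: cross-entropy for the class, MSE for the box.

    Assumes output names 'class_probs' and 'bbox'; learning_rate is a placeholder.
    """
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=learning_rate),
        loss={"class_probs": "categorical_crossentropy", "bbox": "mse"},
        metrics={"class_probs": ["accuracy"], "bbox": ["mae"]},
    )
    return model


def training_callbacks(ckpt_path="best_model.keras", patience=3):
    """Early stopping on validation loss plus a best-only checkpoint."""
    return [
        # Stop once val_loss fails to improve for `patience` epochs and
        # roll back to the best epoch's weights.
        keras.callbacks.EarlyStopping(
            monitor="val_loss", patience=patience, restore_best_weights=True
        ),
        # Save the full model only when val_loss reaches a new best.
        keras.callbacks.ModelCheckpoint(ckpt_path, monitor="val_loss", save_best_only=True),
    ]
```

A typical call would then be `model.fit(train_ds, validation_data=val_ds, epochs=16, callbacks=training_callbacks())`, where `train_ds` and `val_ds` are `tf.data` datasets batched at 16 that yield `(image, {"class_probs": ..., "bbox": ...})` pairs. With `restore_best_weights=True`, the model keeps its best validation epoch even when training stops early.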
Results
VGG16 converged consistently and generalized best; ResNet50 required more regularization; YOLO earmarked for real-time extension.
- VGG16 (transfer-learned): Final val accuracy ≈ 0.66–0.70 on our split; smooth loss decay; stable bbox regression.
- ResNet50: Higher capacity led to overfitting without heavier augmentation; validation metrics lagged VGG16.
- Takeaway: For limited datasets, simpler pretrained backbones + careful augmentation can outperform deeper stacks.
Ethics & Governance
Detection tech has dual-use concerns. Any deployment must prioritize accountability, privacy, and human oversight.
- Intended use: Safety & monitoring in controlled environments; not for autonomous engagement.
- Human-in-the-loop: Model outputs treated as alerts, not final decisions.
- Privacy: Avoid storing personally identifiable footage; retain only labeled frames for evaluation when necessary.
- Evaluation: Document failure modes (occlusion, distance, lighting) and publish limitations alongside metrics.
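One way to document failure modes is to tag each test frame with the condition it exercises and report accuracy per tag, with sample counts, next to the headline metric. The tag names and the `(condition, y_true, y_pred)` record layout below are hypothetical; the source does not describe how failure cases were labeled.

```python
from collections import defaultdict


def accuracy_by_condition(records):
    """Group (condition, y_true, y_pred) records and return {condition: (accuracy, n)}.

    Reporting n alongside accuracy keeps small, unreliable groups visible.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for condition, y_true, y_pred in records:
        totals[condition] += 1
        hits[condition] += int(y_true == y_pred)
    return {c: (hits[c] / totals[c], totals[c]) for c in totals}


# Illustrative records only, not project results.
example = [("occlusion", 1, 1), ("occlusion", 1, 0), ("low_light", 0, 0)]
print(accuracy_by_condition(example))
```

Publishing this table with the overall accuracy makes it harder for a single aggregate number to hide weak conditions such as distant or occluded drones.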
Resources
Code, slides, and references.
- Dataset: “Drone Dataset (UAV)” — Kaggle.
- Models: VGG16, ResNet50 (ImageNet pretrain), YOLO (single-stage detector).