
Deep-Learning Image Reconstruction

U-Net architecture with self-attention trained to reconstruct still images from event-camera noise, serving as an independent validation signal for the underlying sensor model.

Python Deep Learning U-Net Self-Attention Image Reconstruction

Problem

Event-based cameras produce asynchronous streams of brightness-change events rather than conventional image frames. Reconstructing a recognizable still image from such a stream, especially from its noise component alone, is a challenging inverse problem. The goal here is twofold: (1) build a reconstruction pipeline that recovers images from event-camera data, and (2) use the quality and consistency of those reconstructions as a validation signal for the companion analytical sensor model. If the probabilistic model correctly captures how the sensor generates events, then reconstructions from model-simulated noise should closely match reconstructions from real sensor noise.

Physical System

The input data consists of event streams from a neuromorphic camera, where each event encodes a pixel location, a timestamp, and a polarity (brightness increase or decrease). The reconstruction task takes a temporal window of events and produces a grayscale intensity image of the scene at that moment. The key physical constraint is that the event stream encodes only changes, not absolute brightness, so the network must learn to integrate temporal information and to resolve the ambiguities that noise introduces.
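To make that constraint concrete, the simplest non-learned baseline is to integrate events directly. The sketch below assumes the idealized contrast-threshold event model, in which each event of polarity p in {-1, +1} marks a log-intensity step of p * C at its pixel; the threshold value and the name `integrate_events` are illustrative assumptions, not code from this project.

```python
import numpy as np

def integrate_events(xs, ys, ps, height, width, contrast_threshold=0.2, log_init=None):
    """Naively integrate events into a log-intensity image.

    Assumes the idealized contrast-threshold model (an assumption, not the
    project's sensor model): each event with polarity p in {-1, +1} at
    pixel (x, y) marks a log-intensity change of p * contrast_threshold.
    """
    # Without a known starting image, integration is only relative to zero
    if log_init is None:
        log_img = np.zeros((height, width), dtype=np.float64)
    else:
        log_img = np.array(log_init, dtype=np.float64)  # copy; input untouched
    steps = contrast_threshold * np.asarray(ps, dtype=np.float64)
    # np.add.at accumulates correctly when the same pixel fires several times
    np.add.at(log_img, (np.asarray(ys), np.asarray(xs)), steps)
    return log_img
```

The result is defined only up to the unknown initial log intensity (`log_init`), and every noise event adds its own ±C step, so under independent, symmetric noise the per-pixel error drifts like a random walk. These are the failure modes a learned reconstruction is meant to handle.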

What I Built

I developed a deep-learning reconstruction pipeline based on the U-Net encoder-decoder architecture, augmented with self-attention layers. The key components are:

  • Event representation: Preprocessing step that converts raw event streams into tensor representations suitable for convolutional processing (e.g., event histograms, time surfaces, or voxel grids).
  • U-Net backbone: Encoder-decoder architecture with skip connections that captures both local detail and global context in the event data.
  • Self-attention modules: Attention layers that enable the network to model long-range spatial dependencies in the event data, improving reconstruction quality for large-scale scene structures.
  • Training pipeline: End-to-end training workflow with data loading, augmentation, loss computation, and evaluation metrics.
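The self-attention modules can be illustrated with one common formulation: scaled dot-product attention over all spatial positions of a feature map, with 1x1 projections for queries, keys, and values and a residual connection scaled by `gamma`. This is a minimal NumPy sketch under those assumptions; the project's exact layer design, projection sizes, and function names are not specified, and `spatial_self_attention` is hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def spatial_self_attention(feat, w_q, w_k, w_v, gamma=1.0):
    """Self-attention across all pixels of one feature map.

    feat: (C, H, W). w_q, w_k: (d, C). w_v: (C, C).
    The 1x1-convolution projections are written as matrix products
    over the channel axis.
    """
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)   # (C, N): one column per pixel
    q = w_q @ x                  # (d, N) queries
    k = w_k @ x                  # (d, N) keys
    v = w_v @ x                  # (C, N) values
    # attn[i, j]: weight that output position i gives to input position j
    attn = softmax(q.T @ k / np.sqrt(q.shape[0]), axis=-1)  # (N, N)
    out = v @ attn.T             # (C, N): attention-weighted sum of values
    return feat + gamma * out.reshape(c, h, w)  # residual connection
```

Because every pixel attends to every other pixel, the attention matrix has N x N entries for N pixels. That quadratic cost is why such layers are typically placed at the low-resolution stages of a U-Net (for example, near the bottleneck), which is one of the placement trade-offs explored when iterating on the architecture.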

Computational Workflow

  1. Data preparation: Converted recorded event streams into structured input tensors, paired with ground-truth intensity images for supervised training.
  2. Architecture design: Implemented U-Net with self-attention in Python, iterating on layer depth, attention placement, and skip-connection design.
  3. Training: Trained the network on event-camera data using standard reconstruction losses; monitored convergence and reconstruction quality across training iterations.
  4. Evaluation: Assessed reconstruction quality using quantitative image similarity metrics and visual inspection against ground-truth images.
  5. Model validation: Applied the trained network to noise-only event streams generated both by the real sensor and by the companion analytical model, comparing reconstruction outputs to test whether the model accurately describes the noise process.
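Step 1 turns each temporal window of events into a fixed-size tensor. One widely used option among the representations listed earlier is a voxel grid, in which each event's polarity is split between the two nearest temporal bins with linear weights so that timing survives discretization. The sketch below assumes that scheme; the project's actual representation, bin count, and normalization are not stated, and `events_to_voxel_grid` is a hypothetical name.

```python
import numpy as np

def events_to_voxel_grid(xs, ys, ts, ps, num_bins, height, width):
    """Accumulate one window of events into a (num_bins, height, width) grid.

    Each event's polarity (+1 / -1) is split between the two temporal bins
    nearest its normalized timestamp, weighted linearly by distance.
    """
    voxel = np.zeros((num_bins, height, width), dtype=np.float64)
    ts = np.asarray(ts, dtype=np.float64)
    if ts.size == 0:
        return voxel  # empty window: nothing to accumulate
    xs = np.asarray(xs, dtype=np.int64)
    ys = np.asarray(ys, dtype=np.int64)
    ps = np.asarray(ps, dtype=np.float64)

    # Map timestamps in the window onto the continuous range [0, num_bins - 1]
    span = ts.max() - ts.min()
    if span > 0:
        t_norm = (num_bins - 1) * (ts - ts.min()) / span
    else:
        t_norm = np.zeros_like(ts)

    lower = np.floor(t_norm).astype(np.int64)
    frac = t_norm - lower                        # distance past the lower bin
    upper = np.minimum(lower + 1, num_bins - 1)  # clamp at the last bin

    # np.add.at accumulates repeated (bin, y, x) indices instead of overwriting
    np.add.at(voxel, (lower, ys, xs), ps * (1.0 - frac))
    np.add.at(voxel, (upper, ys, xs), ps * frac)
    return voxel
```

The two weights for each event sum to 1, so summing the grid over its time axis recovers the signed per-pixel event histogram; the voxel grid keeps that information and adds coarse timing on top.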

Validation

The reconstruction pipeline is validated at two levels:

  • Reconstruction fidelity: The network produces visually recognizable images from event-camera data, with quantitative metrics confirming improvement over baseline methods.
  • Cross-model consistency: Reconstructions from model-simulated event noise are visually and metrically consistent with reconstructions from real sensor noise, confirming that the analytical model captures the relevant noise statistics. This provides an independent, ML-based validation of the physical model.
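The cross-model check needs a concrete score. A minimal sketch using peak signal-to-noise ratio (PSNR), assuming reconstructions are scaled to [0, 1]: it scores the real-noise and simulated-noise reconstructions against each other and, when a reference image exists, each against that reference. The source does not name the metrics used, so PSNR here is an assumption, and `consistency_report` and its keys are hypothetical.

```python
import numpy as np

def psnr(img, ref, data_range=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, data_range]."""
    mse = np.mean((np.asarray(img, dtype=np.float64) - np.asarray(ref, dtype=np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(data_range ** 2 / mse)

def consistency_report(recon_real, recon_sim, ground_truth=None):
    """Compare reconstructions driven by real-sensor and model-simulated noise."""
    report = {"psnr_real_vs_sim": psnr(recon_sim, recon_real)}
    if ground_truth is not None:
        report["psnr_real_vs_gt"] = psnr(recon_real, ground_truth)
        report["psnr_sim_vs_gt"] = psnr(recon_sim, ground_truth)
    return report
```

PSNR only measures pixel-wise error. Structural metrics such as SSIM (`skimage.metrics.structural_similarity` in scikit-image) are a common complement, and the threshold at which two reconstructions count as "consistent" remains a judgment call.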

Key Outcomes

  • Architecture: U-Net with self-attention, designed for event-camera reconstruction.
  • Dual purpose: both a reconstruction tool and an independent validation signal for the sensor model.
  • End-to-end pipeline: complete workflow from raw events through preprocessing, training, and evaluation.
  • Cross-validation: demonstrated consistency between model-simulated and real-sensor reconstructions.

Relevance

This project demonstrates end-to-end ML engineering for an imaging application: from data pipeline design through architecture selection, training, and quantitative evaluation. It combines deep learning with physics-based reasoning—the reconstruction network does not operate in isolation but interacts with a physical sensor model. This workflow directly applies to roles involving ML for imaging, sensor data processing, and computational photography in defense and aerospace settings.