Gabriele Caslini | Deep Learning Engineer | MLOps

About Me

I am a Deep Learning Engineer with a strong passion for artificial intelligence, MLOps, and autonomous systems. I graduated Cum Laude in both my Bachelor's and Master's programs in Automation and Control Engineering at Politecnico di Milano, and gained international experience at TU Delft (Netherlands) and Télécom Paris.

I am driven to build reliable, high-impact intelligent systems at the intersection of deep learning, AI, and automation. My main interests are modern machine learning methods, AI architectures, and robotics.

I am convinced that AI will reshape how we work, move, learn, and make decisions, and that the people who understand how to develop and deploy it responsibly will define the next decade.

Python · C++ · PyTorch · Transformers · Computer Vision · ROS · Docker · CI/CD · Git · Slurm · MATLAB/Simulink · Linux

Background

Experience Mar 2025 - Dec 2025

Deep Learning Engineer | MLOps

AIDA - Artificial Intelligence Driving Autonomous · Master Thesis

Deployed a unified multimodal sensor fusion pipeline for automatic HD map generation. Designed, trained, and deployed traffic sign recognition models to improve perception robustness and support autonomous vehicle planning.

97.5% recall over 14 km of real-world testing on Italian roads.

Upcoming publication: Camera-LiDAR Fusion for Traffic Sign and Road Marking Extraction for Autonomous Driving (Feb 2026, CCTA).

CI/CD · MLOps · Slurm · Docker · ROS · Deep Learning · Computer Vision

Experience Oct 2024 - Mar 2025

Deep Learning Engineer

Epoch V · Delft, NL

MIT Challenge “Goodnight Moon: Hello Early Literacy Screening”. Developed ML models to automatically score children's literacy screener audio using ASR, transformers, and contrastive learning.

Ranked 8th out of 399 teams in an international competition, helping enable faster early intervention in education.

Whisper · HuggingFace · Wandb · PyTorch · Audio Processing

Experience Sep 2023 - Jul 2024

Software & Hardware Engineer

Polimi Sailing Team · Milan

Designed and integrated the boat's electronic architecture, including automated height control for stable foil flight, sensor fusion, and embedded control logic.

1st place at SuMoth Challenge 2024.

Real-Time Control · Sensor Fusion · CAN Bus · PCB · Embedded Systems

Education Sep 2023 - Dec 2025

Master of Science in Automation and Control Engineering

Politecnico di Milano

GPA: 29.6/30 · Final Score: 110/110 Cum Laude.
Focus: statistical modeling, ML, optimization, system robustness and safety.

Education Nov 2024

Practice in Deep Learning Course

Athens, Télécom Paris

Focus: Transformers, CNN, RNN, PyTorch, Keras.

Education Aug 2024 - Mar 2025

Erasmus Exchange

TU Delft, Netherlands · Computer Science Faculty

Exchange semester at the Computer Science Faculty of TU Delft.

Education Sep 2020 - Jul 2023

Bachelor of Science in Automation and Control Engineering

Politecnico di Milano

GPA: 29.4/30 · Final Score: 110/110 Cum Laude.

Projects & HP Teams

Camera-LiDAR Fusion for HD Map Generation

Offline camera-LiDAR fusion pipeline that automatically detects traffic signs and road markings, estimates their 3D pose, and annotates HD maps for autonomous vehicles. Achieved 97.5% recall over 14 km of real Italian roads.

Python · PyTorch · YOLOv11 · ROS · Docker · Slurm · LiDAR · Computer Vision

Overview

This Master's thesis project, carried out within the AIDA (Artificial Intelligence Driving Autonomous) research group at Politecnico di Milano, tackles the automatic generation of regulation-aware HD maps for autonomous driving in urban environments. The core problem: autonomous vehicles rely on pre-built HD maps for localization and planning, but annotating traffic regulations on those maps by hand is slow and error-prone.

Pipeline Architecture

The system is composed of four main stages that process recorded drives:

Detection: A YOLOv11-M detector trained on a custom dataset of 20,930 images (34,000+ annotations) spanning 31 traffic sign classes. Hyperparameters were tuned via Bayesian Optimization (20 runs of 30 epochs each), targeting mAP50-95. The trained model reached an mAP50 of 0.917, a recall of 0.841, and a precision of 0.923.
Tracking: BoT-SORT multi-object tracker with appearance re-identification and global motion compensation. A custom multi-camera mosaic approach fuses detections from 3 cameras with a selection layer to avoid duplicates.
3D Localization & Pose Estimation: LiDAR points are projected onto the camera frames and filtered per bounding box. Background points are then removed via statistical filtering. Next, the 3D pose is estimated through normal-vector extraction for vertical signs, while horizontal markings are handled with side refinement and ROI extraction.
HD Map Route Annotation: Detected signs are converted into regulatory Special Nodes (stop, yield, crosswalk, speed limit) placed at precise distances along the vehicle's route with a hierarchical fallback system that prioritizes horizontal markings over vertical signs.
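The projection-and-filtering part of the 3D localization stage can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the thesis code. The pinhole intrinsics, the LiDAR-to-camera rotation (zero translation), and the bounding box are hypothetical placeholders. The median/MAD depth window is a simple stand-in for the statistical background filter, and pose estimation (normal extraction) is not shown.

```python
from statistics import median

# Hypothetical calibration (placeholders, not the vehicle's real values).
FX, FY, CX, CY = 1000.0, 1000.0, 640.0, 360.0   # pinhole intrinsics [px]
# Rotation mapping LiDAR axes (x fwd, y left, z up) to camera axes
# (x right, y down, z fwd); translation assumed zero for simplicity.
R_CAM_LIDAR = ((0.0, -1.0, 0.0),
               (0.0, 0.0, -1.0),
               (1.0, 0.0, 0.0))
T_CAM_LIDAR = (0.0, 0.0, 0.0)


def to_camera(p):
    """Transform a LiDAR point (x, y, z) into the camera frame."""
    return tuple(sum(R_CAM_LIDAR[r][c] * p[c] for c in range(3)) + T_CAM_LIDAR[r]
                 for r in range(3))


def project(pc):
    """Pinhole projection of a camera-frame point to pixel coordinates (u, v)."""
    x, y, z = pc
    return FX * x / z + CX, FY * y / z + CY


def points_in_box(points, box, k=3.0, min_tol=0.5):
    """Keep camera-frame points that project inside box = (u0, v0, u1, v1),
    then drop background points whose depth is far from the median depth.
    The depth window is max(k * MAD, min_tol) metres (a simple stand-in filter)."""
    u0, v0, u1, v1 = box
    inside = []
    for p in points:
        pc = to_camera(p)
        if pc[2] <= 0.1:                 # behind or too close to the camera
            continue
        u, v = project(pc)
        if u0 <= u <= u1 and v0 <= v <= v1:
            inside.append(pc)
    if not inside:
        return []
    depths = [pc[2] for pc in inside]
    med = median(depths)
    mad = median(abs(d - med) for d in depths)
    tol = max(k * mad, min_tol)
    return [pc for pc in inside if abs(pc[2] - med) <= tol]


# Usage: a sign 10 m ahead, two background points at 30 m inside the same box,
# one point outside the box and one point behind the vehicle.
sign = [(10.0, y, z) for y in (-0.2, 0.0, 0.2) for z in (-0.2, 0.0, 0.2)]
cloud = sign + [(30.0, 0.1, 0.1), (30.0, -0.1, 0.0), (10.0, 5.0, 0.0), (-5.0, 0.0, 0.0)]
kept = points_in_box(cloud, box=(600, 320, 680, 400))
print(len(kept))  # 9: only the sign points survive
```

The surviving points would then feed the pose step, for example a plane fit whose normal gives the sign's orientation.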

Results

Validated on 14 km of Italian roads, with recall ranging from 93% to 100% across classes. In the fallback configuration, the pipeline missed only 3 of 161 events and produced 13 false positives. Processing time: 5 minutes per kilometer.
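As a quick consistency check, the fallback counts can be turned into event-level recall and precision. This assumes that the 161 events are the ground-truth instances and that every detected event is either one of them or one of the 13 false positives. The figures are derived from those counts, not separately measured, and they are distinct from the per-class recall range.

```python
# Event-level counts reported for the fallback configuration.
total_events = 161      # ground-truth regulatory events along the route
missed = 3              # false negatives
false_positives = 13    # spurious events

true_positives = total_events - missed                            # 158
recall = true_positives / total_events                            # 158 / 161
precision = true_positives / (true_positives + false_positives)   # 158 / 171

print(f"recall = {recall:.3f}, precision = {precision:.3f}")
# recall = 0.981, precision = 0.924
```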

Upcoming publication: Camera-LiDAR Fusion for Traffic Sign and Road Marking Extraction for Autonomous Driving (Feb 2026, CCTA).

Goodnight Moon: Early Literacy Screening

Custom Transformer scoring model built on Whisper embeddings with interleaved self/cross-attention for automated children's literacy assessment. Ranked 8th out of 399 teams in the MIT “Goodnight Moon” challenge.

Python · PyTorch · Whisper · Wav2Vec2 · HuggingFace · Wandb · ASR

Overview

The “Goodnight Moon” challenge, organized by Reach Every Reader (Harvard, MIT & Florida State University), asks teams to build ML models that automatically score children's early literacy assessments from audio recordings. Children's speech introduces greater acoustic variability due to smaller vocal tracts and unpredictable pronunciations, making standard ASR systems unreliable. Accurate automated scoring enables faster early intervention for students who may benefit from additional literacy support.

Approach

Main model (Whisper Transformer): A custom encoder-decoder Transformer that takes the expected text (character-level tokenizer, 40-token vocab) and pre-computed Whisper-base audio embeddings as inputs. Four interleaved self-attention / cross-attention layers (S-C-S-C pattern) progressively fuse text and audio representations. A learnable CLS token aggregates the final sequence into a single vector, fed through a LayerNorm + MLP head to predict a score in [0, 1]. Training uses AdamW with linear warmup + cosine decay, EMA weight averaging (decay ≈ 0.983), and Bayesian hyperparameter search via W&B.
Contrastive model: Dual-branch architecture (Wav2Vec2 for audio, FastText for text) projecting both modalities into a shared 256-d space and scoring via cosine similarity.
Phonetic Transformer: Variant replacing continuous Whisper embeddings with discrete phoneme tokens, enabling a symbolic text-vs-phoneme comparison through a shared embedding space.
Experiment tracking: Full lifecycle managed with Weights & Biases sweeps and HuggingFace model hub.
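The main model's training schedule (linear warmup, cosine decay, EMA of the weights) can be sketched in framework-free Python. It is a minimal sketch: base_lr, warmup_steps and total_steps are hypothetical values, and only the EMA decay of 0.983 comes from the description. In a PyTorch loop, the same EMA update would be applied to each parameter tensor under torch.no_grad().

```python
import math


def lr_at(step, total_steps, base_lr=3e-4, warmup_steps=500, min_lr=0.0):
    """Learning rate at `step`: linear warmup to base_lr, then cosine decay to min_lr."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def ema_update(ema_params, params, decay=0.983):
    """In-place exponential moving average of the weights:
    ema <- decay * ema + (1 - decay) * current."""
    for name, value in params.items():
        ema_params[name] = decay * ema_params[name] + (1.0 - decay) * value


# Usage with plain floats standing in for parameter tensors.
total = 10_000
schedule = [lr_at(s, total) for s in range(total + 1)]
weights = {"w": 1.0}
ema = {"w": 0.0}
for _ in range(300):
    ema_update(ema, weights)
print(schedule[0], schedule[500], schedule[total], ema["w"])
```

With decay 0.983, the EMA averages over roughly 1 / (1 - 0.983) ≈ 59 recent updates. This smooths late-training noise without lagging far behind the live weights.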

Results

The team ranked 8th out of 399 teams in the international competition, contributing to making automated early literacy screening more accessible and scalable. The project was carried out with Epoch V during the Erasmus exchange at TU Delft.

SuMoth Challenge 2024 - Polimi Sailing Team

Full electronic architecture design for a moth-class sailing boat with automated height control for stable foil flight, sensor fusion, and embedded control logic. The team won 1st place at SuMoth Challenge 2024.

C++ · ROS · MATLAB/Simulink · CAN Bus · Sensor Fusion · Embedded

Overview

The Polimi Sailing Team is Politecnico di Milano's competitive sailing team that designs and builds high-performance moth-class sailboats. The moth is a single-handed foiling dinghy that lifts entirely out of the water on hydrofoils, requiring precise real-time control to maintain stable flight.

Technical Contribution

Electronic Architecture: Designed and integrated the entire onboard electronic system from scratch, including distributed sensor nodes and communication infrastructure.
Automated Height Control: Developed the real-time control algorithm for stable foil flight, managing ride height and pitch stability at high speeds.
Sensor Fusion: Implemented multi-sensor data fusion for accurate state estimation, combining inertial measurements, height sensors, and speed data.
CAN Bus & Embedded: Built a CAN bus network for communication between distributed nodes, handled PCB integration, debugging, and waterproofing of all electronics.
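The height-estimation idea can be illustrated with a fixed-gain (alpha-beta) observer. It uses the IMU's vertical acceleration for prediction and the height sensor for correction. This is a swapped-in textbook technique, not the team's estimator: the gains, the 100 Hz rate and the noise level are hypothetical, and speed data are left out.

```python
import random


def fuse_height(h_meas, a_z, dt, alpha=0.1, beta=0.005):
    """Fixed-gain (alpha-beta) observer: predicts ride height and vertical speed
    from the IMU vertical acceleration, then corrects with the height sensor.
    Returns the list of height estimates [m]."""
    h, v = h_meas[0], 0.0                  # start from the first height reading
    estimates = []
    for z, a in zip(h_meas, a_z):
        # Prediction step: acceleration is treated as a known input.
        h_pred = h + v * dt + 0.5 * a * dt * dt
        v_pred = v + a * dt
        # Correction step: blend in the height-sensor residual.
        r = z - h_pred
        h = h_pred + alpha * r
        v = v_pred + (beta / dt) * r
        estimates.append(h)
    return estimates


# Usage: steady flight at 0.5 m with a noisy height sensor (hypothetical 100 Hz).
rng = random.Random(0)
dt = 0.01
meas = [0.5 + rng.gauss(0.0, 0.02) for _ in range(2000)]
acc = [0.0] * len(meas)
est = fuse_height(meas, acc, dt)
print(f"last estimate: {est[-1]:.3f} m")
```

A Kalman filter would compute these two gains from the noise statistics instead of fixing them by hand. The fixed-gain form is shown only because it fits in a few lines.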

Results

The team achieved 1st place at the SuMoth Challenge 2024, a competitive event where university teams race self-designed moth sailboats. The electronic control system was a key differentiator for the team's performance.

Coursework

Rotary Inverted Pendulum Control

Full control design for a rotary inverted pendulum: from Euler–Lagrange modeling and parameter identification to frequency-based controllers, state-space stabilization, and energy-based swing-up. Our team’s code was featured in the official promotional video of the MSc in Automation and Control Engineering.

Automation and Control Laboratory · Politecnico di Milano · 2025

MATLAB · Simulink · Control Theory · State-Space · Loop Shaping · LQG · Real-Time

Overview

The rotary inverted pendulum is a classic nonlinear, underactuated mechanical system with an intrinsically unstable upright equilibrium. The physical setup consists of a Quanser SRV02 base unit driven by a DC servo motor. Two optical encoders measure the angular positions, and the system is controlled from MATLAB/Simulink through a DAQ board. The lab project was carried out in a team of four as part of the Automation and Control Laboratory course at Politecnico di Milano.

Approach

Modeling: Derived the equations of motion via Euler–Lagrange formulation, including DC motor dynamics. Identified physical parameters and friction coefficients from experimental data. Validated the model in both time and frequency domains.
Frequency-based control: Designed P, PI, and loop-shaped controllers for horizontal arm position tracking. Loop shaping eliminated hunting motion caused by stick-slip friction and achieved fast rise time with no actuator saturation.
State observers: Implemented and compared a Luenberger observer, linear Kalman Filter, and Extended Kalman Filter for velocity estimation. The Luenberger observer was selected for its simplicity and robust performance.
State-space control: Designed Pole Placement and LQG controllers for pendulum stabilization in the upright position, then extended to simultaneous reference tracking with an LQI (LQG + integral action) controller.
Swing-up: Implemented an energy-based nonlinear swing-up strategy managed by a finite state machine that transitions to the LQR stabilizer once the pendulum reaches the upright region. Successfully demonstrated on the real system with disturbance rejection.
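The swing-up logic can be sketched as a two-mode state machine. An energy-pumping law of the Åström and Furuta type drives the pendulum's energy toward that of the upright rest state, and a linear state-feedback law takes over inside a catch region. This is a swapped-in textbook form, not necessarily the lab's exact law. All parameters, the 20° catch angle and the gains K_LQR are hypothetical placeholders. The sign of u assumes the motor input and the pendulum energy rate are related as dE/dt ∝ -u·θ̇·cos θ; a real rig may need the opposite sign.

```python
import math

# Hypothetical parameters (placeholders, NOT the lab's identified values).
M_P = 0.127    # pendulum mass [kg]
L_C = 0.156    # pivot-to-centre-of-mass distance [m]
J_P = 0.0012   # pendulum inertia about its pivot [kg m^2]
G = 9.81       # gravity [m/s^2]
# Placeholder state-feedback gains for x = (arm, pendulum, arm_rate, pendulum_rate);
# a real design would compute them with LQR on the linearised model.
K_LQR = (-2.0, 30.0, -1.5, 3.0)


def wrap(angle):
    """Wrap an angle to [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


def pendulum_energy(theta, theta_dot):
    """Pendulum energy relative to upright rest (theta = 0 upright, +-pi hanging)."""
    return 0.5 * J_P * theta_dot ** 2 + M_P * G * L_C * (math.cos(theta) - 1.0)


def control(state, k_swing=50.0, u_max=5.0, catch=math.radians(20.0)):
    """Two-mode finite state machine. Returns (mode, saturated input u)."""
    arm, theta, arm_dot, theta_dot = state
    theta = wrap(theta)
    if abs(theta) < catch:
        # Near upright: linear state feedback u = -K x.
        x = (arm, theta, arm_dot, theta_dot)
        mode, u = "stabilize", -sum(k * xi for k, xi in zip(K_LQR, x))
    else:
        # Far from upright: pump energy towards E_ref = 0 (upright energy),
        # u = k * (E - E_ref) * sign(theta_dot * cos(theta)).
        e = pendulum_energy(theta, theta_dot)
        mode = "swing_up"
        u = k_swing * e * math.copysign(1.0, theta_dot * math.cos(theta))
    return mode, max(-u_max, min(u_max, u))


for s in [(0.0, math.pi, 0.0, 0.0), (0.0, 2.5, 0.0, 1.0), (0.0, 0.05, 0.0, 0.0)]:
    print(control(s))
```

Saturating u keeps the swing-up within actuator limits. The catch angle trades off how early the linear controller takes over against how far it must act from the linearisation point.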

Results

All controllers successfully stabilized the pendulum and tracked reference signals. The swing-up achieved steady state within 4 seconds on the real hardware. The team’s code and results were selected for the promotional video of the MSc in Automation and Control Engineering at Politecnico di Milano.

Solving the 2D Schnakenberg Reaction-Diffusion System

Numerical solution of the coupled Schnakenberg PDE system on a 2D domain with Neumann boundary conditions. Compared explicit Forward Euler and implicit Backward Euler with Newton–Raphson, including stability analysis and empirical convergence bounds. The simulation produces Turing patterns from random initial perturbations.

Numerical Analysis · TU Delft · Jan 2025

Python · NumPy · SciPy · PDEs · Finite Differences · Newton–Raphson

Overview

The Schnakenberg system is a two-species reaction-diffusion model that, for suitable parameters, exhibits Turing instabilities: starting from a nearly uniform initial condition with small random perturbations, the solution evolves into stable spatial patterns. The assignment required solving this coupled PDE system numerically on the square domain [0, 4]² with zero Neumann boundary conditions, using two different time-integration strategies.
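For reference, a common nondimensional form of the Schnakenberg system is given below. Here γ, a, b and d are the usual textbook parameters and may differ from the assignment's notation and values. The homogeneous steady state follows from setting both reaction terms to zero.

```latex
\begin{aligned}
\frac{\partial u}{\partial t} &= \Delta u + \gamma\,\bigl(a - u + u^{2} v\bigr), \\
\frac{\partial v}{\partial t} &= d\,\Delta v + \gamma\,\bigl(b - u^{2} v\bigr), \qquad (x, y) \in [0, 4]^{2}, \\
\frac{\partial u}{\partial n} &= \frac{\partial v}{\partial n} = 0 \quad \text{on the boundary}, \\
(u^{*}, v^{*}) &= \Bigl(a + b,\; \frac{b}{(a + b)^{2}}\Bigr) \quad \text{(homogeneous steady state)}.
\end{aligned}
```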

Methods

Spatial discretization: Constructed the 2D negative-Laplacian matrix with Neumann BCs via Kronecker products of 1D operators, ensuring symmetry through boundary equation multipliers.
Forward Euler (explicit): Conditionally stable; the maximum time step was derived via Jacobian eigenvalue analysis and Gerschgorin bounds. Required ∼50,130 steps on a 100×100 grid (CPU time ≈ 9.3 s).
Backward Euler + Newton–Raphson (implicit): Unconditionally stable; each time step requires solving a nonlinear system via Newton iterations with a block-Jacobian matrix. Needed only 346 steps on the same grid (CPU time ≈ 84 s), with ∼3 Newton iterations per step.
Stability analysis: Full linearization around the trivial steady state, Jacobian block structure, eigenvalue negativity proof, and Gerschgorin-based upper bound for the Forward Euler time step.
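The explicit scheme for u_t = Δu + γ(a - u + u²v), v_t = dΔv + γ(b - u²v) can be sketched in dependency-free Python. This is a minimal sketch under assumptions. It uses a cell-centred grid with mirrored ghost cells and applies the 5-point stencil directly instead of assembling the Kronecker-product matrix. The boundary scaling may therefore differ from the assignment's symmetric, multiplier-based version. The parameter values are hypothetical, and the step-size helper covers only the diffusion part of the Gerschgorin argument.

```python
import random

# Hypothetical textbook-style parameters (not the assignment's values).
A, B, GAMMA, D = 0.1, 0.9, 1.0, 10.0


def neg_laplacian(w, h):
    """5-point negative Laplacian on a cell-centred N x N grid with zero-flux
    (Neumann) boundaries, imposed by mirroring each boundary cell into its ghost."""
    n = len(w)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            c = w[i][j]
            up = w[i - 1][j] if i > 0 else c
            down = w[i + 1][j] if i < n - 1 else c
            left = w[i][j - 1] if j > 0 else c
            right = w[i][j + 1] if j < n - 1 else c
            out[i][j] = (4.0 * c - up - down - left - right) / (h * h)
    return out


def forward_euler_step(u, v, dt, h):
    """One explicit step of u_t = -L u + f(u, v), v_t = -D L v + g(u, v)."""
    lu, lv = neg_laplacian(u, h), neg_laplacian(v, h)
    n = len(u)
    u_new = [[0.0] * n for _ in range(n)]
    v_new = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            uu, vv = u[i][j], v[i][j]
            u2v = uu * uu * vv
            u_new[i][j] = uu + dt * (-lu[i][j] + GAMMA * (A - uu + u2v))
            v_new[i][j] = vv + dt * (-D * lv[i][j] + GAMMA * (B - u2v))
    return u_new, v_new


def max_dt_diffusion(h):
    """Gerschgorin places the eigenvalues of the negative Laplacian in [0, 8/h^2],
    so explicit Euler on the diffusion terms alone needs dt <= h^2 / (4 max(1, D)).
    Reaction terms are ignored in this bound."""
    return h * h / (4.0 * max(1.0, D))


# Usage: small random perturbation of the homogeneous state on [0, 4]^2.
n, length = 20, 4.0
h = length / n
u_star, v_star = A + B, B / (A + B) ** 2
rng = random.Random(1)
u = [[u_star + 0.01 * rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
v = [[v_star + 0.01 * rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
dt = 0.9 * max_dt_diffusion(h)
for _ in range(100):
    u, v = forward_euler_step(u, v, dt, h)
```

The D = 10 factor in the bound shows why the explicit scheme needs so many steps. Fast diffusion of v, together with a fine grid (h² in the numerator), forces a tiny dt.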

Results

Both methods correctly produce Turing patterns from the perturbed initial condition. Forward Euler is cheaper per step but its conditional stability forces very small time increments. Backward Euler–Newton–Raphson takes far fewer steps at the cost of solving a large sparse linear system at each iteration, making it more attractive for stiffer problems or longer simulations.
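The reported timings make the trade-off concrete: 9.3 s / 50,130 ≈ 0.19 ms per explicit step versus 84 s / 346 ≈ 0.24 s per implicit step. The Newton–Raphson loop inside one Backward Euler step can be illustrated on the spatially homogeneous kinetics (no diffusion), where the Newton matrix I - Δt·J is only 2×2 and can be solved with Cramer's rule. This is a simplified sketch with hypothetical parameters. The full scheme solves the same kind of system, but with the large sparse block Jacobian.

```python
# Hypothetical textbook-style parameters (not the assignment's values).
A, B, GAMMA = 0.1, 0.9, 1.0


def reaction(u, v):
    """Schnakenberg kinetics f(u, v), g(u, v)."""
    u2v = u * u * v
    return GAMMA * (A - u + u2v), GAMMA * (B - u2v)


def reaction_jacobian(u, v):
    """Analytic 2x2 Jacobian of the kinetics."""
    return ((GAMMA * (-1.0 + 2.0 * u * v), GAMMA * u * u),
            (GAMMA * (-2.0 * u * v), GAMMA * (-u * u)))


def backward_euler_step(u0, v0, dt, tol=1e-12, max_iter=20):
    """Solve y - y0 - dt * F(y) = 0 for y = (u, v) with Newton-Raphson.
    Returns (u, v, newton_iterations)."""
    u, v = u0, v0                                    # initial guess: previous state
    for it in range(1, max_iter + 1):
        fu, fv = reaction(u, v)
        g1, g2 = u - u0 - dt * fu, v - v0 - dt * fv  # residual G(y)
        (j11, j12), (j21, j22) = reaction_jacobian(u, v)
        m11, m12 = 1.0 - dt * j11, -dt * j12         # Newton matrix M = I - dt * J
        m21, m22 = -dt * j21, 1.0 - dt * j22
        det = m11 * m22 - m12 * m21
        du = (g1 * m22 - m12 * g2) / det             # Cramer's rule for M * delta = G
        dv = (m11 * g2 - m21 * g1) / det
        u, v = u - du, v - dv
        if max(abs(du), abs(dv)) < tol:
            return u, v, it
    raise RuntimeError("Newton-Raphson did not converge")


# Usage: relax a perturbed state towards (a + b, b / (a + b)^2) = (1.0, 0.9).
u, v = 1.05, 0.85
iterations = []
for _ in range(300):
    u, v, it = backward_euler_step(u, v, dt=0.5)
    iterations.append(it)
print(f"u = {u:.6f}, v = {v:.6f}, max Newton iterations = {max(iterations)}")
```

Because the previous state is already close to the new one, Newton typically needs only a few iterations per step. This matches the small per-step iteration count reported for the full grid.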