πŸ‘¨πŸ»β€πŸ”¬ Connect With Me

  • 🎯 Seeking Roles: Research or industry roles where I can leverage my multi-disciplinary toolkit, combining low-level systems logic, machine learning workflows, and modern web architecture, to solve real-world problems in engineering, finance, and adjacent fields.
  • πŸ“ CV / Resume: View my Resume
  • πŸ“ Location: Bologna, Italy
  • πŸ“± Phone: (+39) 366 4207296
  • πŸ“§ Email: giovannigravili112@gmail.com
  • πŸ”— LinkedIn: linkedin.com/in/giovanni-gravili
  • πŸ™ GitHub: github.com/ghovax

Other Projects

πŸ”Ž Minimizing Aberration in Spherical Lenses with PyTorch

To better learn PyTorch, I set out to develop an optimization program that combines the physics of lenses with machine learning. The goal is to design the shape of a single glass lens that takes a given bundle of parallel incoming light rays and focuses them to a single, theoretically infinitesimal point on a screen at a specified distance.

Physics of Ray Optics

Because the apertures have diameters hundreds of times larger than the wavelength of light, we operate in the regime of geometrical optics, which treats light as rays. The underlying assumption is that light travels in straight lines within a uniform medium and that each ray is independent, with no interactions between rays. Wave effects such as diffraction and interference are neglected because the lens features are large compared with the wavelength. However, we will apply the full Snell’s law rather than the paraxial approximation, so the model remains accurate for rays at steep angles relative to the optical axis.

A light ray is mathematically represented by a vector with a given origin $\vec{P}_0 = (x_0, y_0, z_0)$ and a normalized direction $\vec{D} = (D_x, D_y, D_z)$ with $|\vec{D}| = 1$. The position along the ray can be parameterized by the distance $t$ traveled:

$$\vec{P}(t) = \vec{P}_0 + t \vec{D}$$

This parametric form is computationally convenient because finding where a ray intersects a surface reduces to solving for the scalar tt.
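As a minimal sketch (the function name and tensor layout here are illustrative, not the project's actual code), the parametric form is a couple of tensor operations:

```python
import torch

def ray_point(P0, D, t):
    """Point reached after traveling a distance t along the ray (P0, D)."""
    return P0 + t * D

P0 = torch.tensor([0.0, 0.0, 0.0])   # ray origin
D = torch.tensor([0.0, 0.0, 1.0])    # unit direction along +z
print(ray_point(P0, D, 2.5))         # the point (0, 0, 2.5)
```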

A lens is defined by two spherical surfaces separated by a thickness $t_{lens}$, each surface specified by its curvature $c = 1/R$, where $R$ is the radius of curvature. The front surface has curvature $c_1$ with vertex at $z = 0$, and the back surface has curvature $c_2$ with vertex at $z = t_{lens}$. A positive curvature means the center of curvature lies in the $+z$ direction from the surface vertex. For a sphere centered at $(0, 0, R)$, the implicit equation $x^2 + y^2 + (z - R)^2 = R^2$ can be solved for the sag (the z-displacement from the vertex):

$$z = R - \sqrt{R^2 - r^2}$$

where $r^2 = x^2 + y^2$. Rewriting in terms of curvature and rationalizing to avoid numerical instability when $c \to 0$:

$$z = \frac{c r^2}{1 + \sqrt{1 - c^2 r^2}}$$

This is the non-paraxial sag formula, valid for all ray heights. The paraxial approximation $z \approx \frac{c r^2}{2}$ is accurate only for small $r$.
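The two forms can be compared directly. A small sketch (plain Python; the numbers are illustrative):

```python
import math

def sag(c, r):
    """Non-paraxial sag z(r) of a spherical surface with curvature c = 1/R."""
    return c * r**2 / (1.0 + math.sqrt(1.0 - c**2 * r**2))

c = 1.0 / 100.0              # R = 100 mm
print(sag(c, 10.0))          # exact sag at r = 10 mm, about 0.5013
print(c * 10.0**2 / 2.0)     # paraxial approximation: 0.5
```

The rationalized form also behaves smoothly as $c \to 0$ (a flat surface), where the naive $R - \sqrt{R^2 - r^2}$ would subtract two huge nearly-equal numbers.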

To apply Snell’s law at a surface, we need the normal vector. For a sphere centered at $\vec{C} = (0, 0, 1/c)$, the outward normal at any point $\vec{P}$ on the surface is simply:

Nβ†’=Pβ†’βˆ’Cβ†’|Pβ†’βˆ’Cβ†’|\vec{N} = \frac{\vec{P} - \vec{C}}{|\vec{P} - \vec{C}|}

Snell’s law states that when light passes between media with different refractive indices, it bends according to $n_1 \sin(\theta_1) = n_2 \sin(\theta_2)$. For computation, a vector form is more practical. We decompose the incident ray $\vec{D}_{inc}$ into components parallel and perpendicular to the surface:

Dβ†’inc=Dβ†’βˆ₯+Dβ†’βŠ₯\vec{D}_{inc} = \vec{D}_{\parallel} + \vec{D}_{\perp}

where Dβ†’βŠ₯=(Dβ†’incβ‹…Nβ†’)Nβ†’\vec{D}_{\perp} = (\vec{D}_{inc} \cdot \vec{N})\vec{N} and Dβ†’βˆ₯=Dβ†’incβˆ’Dβ†’βŠ₯\vec{D}_{\parallel} = \vec{D}_{inc} - \vec{D}_{\perp}. The magnitudes relate to angles: |Dβ†’βˆ₯|=sinΞΈ1|\vec{D}_{\parallel}| = \sin\theta_1 and |Dβ†’βŠ₯|=cosΞΈ1|\vec{D}_{\perp}| = \cos\theta_1. With ΞΌ=n1/n2\mu = n_1/n_2, the refracted direction becomes:

Dβ†’refr=ΞΌDβ†’inc+(ΞΌcos(ΞΈ1)βˆ’cos(ΞΈ2))Nβ†’ \vec{D}_{refr} = \mu \vec{D}_{inc} + \left(\mu \cos(\theta_1) - \cos(\theta_2)\right) \vec{N}

where cos(ΞΈ1)=βˆ’Dβ†’incβ‹…Nβ†’\cos(\theta_1) = -\vec{D}_{inc} \cdot \vec{N} and cos(ΞΈ2)=1βˆ’ΞΌ2(1βˆ’cos2(ΞΈ1))\cos(\theta_2) = \sqrt{1 - \mu^2 (1 - \cos^2(\theta_1))}.

A crucial edge case appears when the term inside this square root becomes negative. This occurs when light travels from a denser to a less dense medium with an angle of incidence beyond the critical angle $\theta_c = \arcsin(n_2/n_1)$. For glass ($n \approx 1.5$) to air, $\theta_c \approx 41.8°$. Beyond this, total internal reflection (TIR) occurs and the ray never exits the lens.
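The vector form and the TIR check fit in one small function. A sketch in PyTorch (the function name and batching convention are mine, not the project’s):

```python
import torch

def refract(D_inc, N, mu):
    """Vector Snell's law. mu = n1/n2; N is the unit normal oriented
    against the incident ray (so D_inc . N < 0). Returns the refracted
    directions and a mask that is False where TIR occurs."""
    cos1 = -(D_inc * N).sum(dim=-1, keepdim=True)
    disc = 1.0 - mu**2 * (1.0 - cos1**2)
    valid = disc.squeeze(-1) > 0          # False beyond the critical angle
    cos2 = torch.sqrt(torch.clamp(disc, min=0.0))
    return mu * D_inc + (mu * cos1 - cos2) * N, valid

# Normal incidence, air -> glass: the ray passes through undeviated
D = torch.tensor([[0.0, 0.0, 1.0]])
N = torch.tensor([[0.0, 0.0, -1.0]])
D_refr, ok = refract(D, N, 1.0 / 1.5)
```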

The Computational Model

With the physics established, we can build an algorithm to trace rays through the lens. To properly sample the lens aperture, we need rays distributed uniformly across a circular cross-section. A naive approach using concentric rings creates artificial clustering. Instead, we use the Vogel spiral distribution: for $N$ rays indexed by $k$, the polar coordinates are $\phi_k = k \cdot \varphi$ and $r_k = R_{beam} \sqrt{k/N}$, where $\varphi = \pi(3 - \sqrt{5})$ is the golden angle. The $\sqrt{k/N}$ scaling ensures uniform area density, since area grows as $r^2$. All initial ray directions are set to $\vec{D}_0 = (0, 0, 1)$ for a collimated beam.
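The spiral takes only a few lines; a sketch (plain Python, name illustrative):

```python
import math

def vogel_spiral(N, R_beam):
    """Uniform-area-density sampling of a disc of radius R_beam (Vogel spiral)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))  # golden angle, ~2.39996 rad
    pts = []
    for k in range(N):
        r = R_beam * math.sqrt(k / N)          # sqrt keeps area density uniform
        phi = k * golden
        pts.append((r * math.cos(phi), r * math.sin(phi)))
    return pts

pts = vogel_spiral(500, 10.0)
```

A quick sanity check of the $\sqrt{k/N}$ scaling: the inner disc of half the area should hold about half the points.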

The first step in tracing is determining where a ray hits a surface. For a differentiable implementation, an iterative Newton-Raphson method is more numerically stable than the closed-form quadratic solution and generalizes to arbitrary surface shapes. We seek the root of $f(t) = z(t) - z_{surface}(x(t), y(t)) = 0$, iterating $t_{n+1} = t_n - f(t_n)/f'(t_n)$. The derivative is:

fβ€²(t)=Dzβˆ’c21βˆ’c2r2β‹…2(xDx+yDy)f'(t) = D_z - \frac{c}{2\sqrt{1 - c^2 r^2}} \cdot 2(x D_x + y D_y)

Starting from an initial guess at the planar intersection, 3-5 iterations suffice for convergence.
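A sketch of the Newton solve for one surface, assuming every ray actually hits it (the function name and batch layout are mine):

```python
import torch

def intersect_newton(P0, D, c, z_vertex, iters=5):
    """Distance t at which rays (P0, D) meet the sag surface with curvature c
    and vertex at z = z_vertex, via Newton-Raphson."""
    t = (z_vertex - P0[..., 2]) / D[..., 2]    # initial guess: the vertex plane
    for _ in range(iters):
        P = P0 + t.unsqueeze(-1) * D
        x, y, z = P[..., 0], P[..., 1], P[..., 2]
        r2 = x**2 + y**2
        root = torch.sqrt(1.0 - c**2 * r2)
        f = z - (z_vertex + c * r2 / (1.0 + root))   # signed miss distance
        fp = D[..., 2] - (c / root) * (x * D[..., 0] + y * D[..., 1])
        t = t - f / fp
    return t

# Two collimated rays: one on-axis, one at height r = 5 mm
P0 = torch.tensor([[0.0, 0.0, -10.0], [5.0, 0.0, -10.0]])
D = torch.tensor([[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]])
t = intersect_newton(P0, D, c=0.01, z_vertex=0.0)
```

The on-axis ray meets the vertex at $t = 10$; the off-axis ray travels slightly farther, by exactly the sag at $r = 5$.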

The full ray-tracing algorithm proceeds as:

  1. Find the intersection $\vec{P}_1$ with the front surface $S_1$ via Newton’s method
  2. Verify $x_1^2 + y_1^2 < R_{aperture}^2$ (the ray hits the lens)
  3. Compute the normal $\vec{N}_1$, apply Snell’s law to get the refracted direction $\vec{D}_1$, and check for TIR
  4. Find the intersection $\vec{P}_2$ with the back surface $S_2$
  5. Verify the aperture, compute the normal $\vec{N}_2$ (negated since we’re exiting), and apply Snell’s law for $\vec{D}_2$
  6. Propagate to the target plane: $\vec{P}_{final} = \vec{P}_2 + t_{target} \vec{D}_2$, where $t_{target} = (z_{target} - z_2)/D_{2,z}$

A ray is valid if it passes all aperture checks, experiences no TIR, and has $t_{target} > 0$.
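The final propagation step and validity test, in isolation (all values here are invented for illustration):

```python
import torch

P2 = torch.tensor([[1.0, 0.0, 5.0]])         # exit points on the back surface
D2 = torch.tensor([[-0.01, 0.0, 0.99995]])   # refracted exit directions
z_target = 105.0

t_target = (z_target - P2[..., 2]) / D2[..., 2]
P_final = P2 + t_target.unsqueeze(-1) * D2   # endpoints on the target plane
valid = t_target > 0                         # ANDed with the aperture/TIR masks
```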

Optimization with PyTorch

We have three parameters, $c_1$, $c_2$, and $t_{lens}$, and a complex simulation mapping them to a spot size. How do we find optimal values? Brute-force search over continuous parameters is hopeless. Gradient descent is smarter: if we know how the spot size changes when we tweak each parameter, we can iteratively adjust them to reduce it. The quantity we need is the gradient:

βˆ‡L=(βˆ‚Lβˆ‚c1,βˆ‚Lβˆ‚c2,βˆ‚Lβˆ‚tlens)\nabla L = \left( \frac{\partial L}{\partial c_1}, \frac{\partial L}{\partial c_2}, \frac{\partial L}{\partial t_{lens}} \right)

where $L$ is a loss function measuring lens quality. We define $L$ as the mean squared distance of the ray endpoints from the origin:

$$L(c_1, c_2, t_{lens}) = \frac{1}{K} \sum_{k=1}^{K} (x_k^2 + y_k^2)$$

This equals the squared RMS spot radius. We also add a penalty $\lambda(1 - f_{valid})$, where $f_{valid}$ is the fraction of rays that remain valid, to discourage configurations where rays undergo TIR.
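Both terms together, as a sketch (the function name and the masked-mean convention are mine):

```python
import torch

def spot_loss(x, y, valid, lam=1.0):
    """Mean squared radial distance of the valid endpoints, plus a penalty
    on the fraction of invalid (e.g. TIR) rays."""
    w = valid.float()
    mse = (w * (x**2 + y**2)).sum() / w.sum().clamp(min=1.0)
    return mse + lam * (1.0 - w.mean())

x = torch.tensor([0.1, 0.0, -0.1])
y = torch.tensor([0.0, 0.1, 0.0])
valid = torch.tensor([True, True, True])
print(spot_loss(x, y, valid))   # mean r^2 = 0.01, no penalty
```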

The structure of the optimization is straightforward:

  1. Initialize the parameters $\theta = (c_1, c_2, t_{lens})$ and mark them as requiring gradients
  2. Repeat until convergence:
    • Run the ray tracer: obtain the endpoints $\{(x_k, y_k)\}$ as a function of the parameters $\theta$
    • Compute the loss: $L = \frac{1}{K}\sum_k (x_k^2 + y_k^2)$
    • Backpropagate: compute $\nabla_\theta L$ automatically
    • Update the parameters: $\theta \leftarrow \theta - \alpha \cdot \text{Adam}(\nabla_\theta L)$

The code module contains all the physics (Newton’s method, Snell’s law, propagation), but to PyTorch it is just a sequence of primitive operations (multiply, add, sqrt, divide) whose derivatives are known. By marking parameters as requiring gradients, every operation gets recorded into a computation graph. Calling `backward()` then walks through this graph in reverse, applying the chain rule to compute how much each parameter contributed to the final loss.

Computing βˆ‡ΞΈL\nabla_\theta L by hand would require pages of chain rule through Newton iterations, Snell’s law, and thousands of rays. PyTorch automates this entirely. To illustrate, consider a toy example L=(c1β‹…r2)2L = (c_1 \cdot r^2)^2. Letting u=c1β‹…r2u = c_1 \cdot r^2:

$$c_1 \xrightarrow{\times\, r^2} u \xrightarrow{(\cdot)^2} L$$

When we call L.backward(), PyTorch walks backward through this graph applying the chain rule:

βˆ‚Lβˆ‚c1=βˆ‚Lβˆ‚uβ‹…βˆ‚uβˆ‚c1=2uβ‹…r2=2c1r4\frac{\partial L}{\partial c_1} = \frac{\partial L}{\partial u} \cdot \frac{\partial u}{\partial c_1} = 2u \cdot r^2 = 2 c_1 r^4

For our full ray tracer, the graph contains hundreds of operations from the Newton iterations, Snell’s law, and averaging. PyTorch traverses it automatically, computing all gradients without us writing any derivative code. One subtlety is that the derivative of $\sqrt{x}$ is $\frac{1}{2\sqrt{x}}$, which explodes as $x \to 0$. We use a regularized square root $\sqrt{\max(x, \epsilon)}$ with $\epsilon \approx 10^{-9}$ to keep gradients bounded.
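The regularization is one clamp; a sketch (the helper name is illustrative):

```python
import torch

def safe_sqrt(x, eps=1e-9):
    """Square root with the argument clamped away from zero, so the
    backward pass never produces an infinite gradient."""
    return torch.sqrt(torch.clamp(x, min=eps))

x = torch.tensor(0.0, requires_grad=True)
safe_sqrt(x).backward()
print(x.grad)   # finite, unlike the gradient of torch.sqrt at exactly 0
```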

We mark parameters for optimization using `torch.nn.Parameter`, telling PyTorch to track computations involving them. The optimizer we use is Adam, which maintains adaptive per-parameter learning rates based on gradient statistics. It tracks exponential moving averages of the gradients ($m_t$) and the squared gradients ($v_t$), applying bias correction before updating:

ΞΈt+1=ΞΈtβˆ’Ξ±mΜ‚tvΜ‚t+Ο΅\theta_{t+1} = \theta_t - \alpha \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}

The optimization loop is: zero the gradients, run the forward pass (building the graph), compute the loss, call `backward()` (computing gradients via the chain rule), call `optimizer.step()` (updating the parameters), and repeat.
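The loop skeleton, with the ray tracer swapped for a toy quadratic so the snippet runs on its own (the target values are invented, not the lens results):

```python
import torch

params = torch.nn.Parameter(torch.tensor([0.01, -0.01, 5.0]))  # c1, c2, t_lens
target = torch.tensor([0.0104, -0.0106, 4.9])                  # made-up optimum
opt = torch.optim.Adam([params], lr=1e-3)

initial = ((params - target) ** 2).sum().item()
for step in range(500):
    opt.zero_grad()                          # 1. zero the gradients
    loss = ((params - target) ** 2).sum()    # 2. forward pass (tracer stand-in)
    loss.backward()                          # 3. backpropagate through the graph
    opt.step()                               # 4. Adam parameter update
```

In the real program, the quadratic is replaced by the full trace: Vogel-spiral rays in, endpoint coordinates out, spot loss on top.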

Results

The optimization started with a symmetric biconvex lens: initial curvatures $c_1 = 0.01\ \text{mm}^{-1}$ and $c_2 = -0.01\ \text{mm}^{-1}$ (corresponding to radii $R_1 = 100$ mm and $R_2 = -100$ mm) and thickness $t_{lens} = 5$ mm. The initial loss was $0.1258$ (RMS spot radius $\sigma \approx 0.35$ mm). After 300 iterations, the loss dropped to $0.001126$ ($\sigma \approx 0.034$ mm), a roughly 100× improvement. The final curvatures correspond to radii $R_1 = 95.95$ mm and $R_2 = -94.51$ mm, with thickness $4.99$ mm. The optimizer slightly increased both curvatures (shorter radii) and reduced the thickness. Valid rays remained at 100% throughout.

The solution reveals interesting physics. Starting symmetric ($|c_1| = |c_2|$), the optimizer converged to a slightly asymmetric configuration, its attempt to minimize spherical aberration. The Coddington shape factor $q = (R_2 + R_1)/(R_2 - R_1)$ shifted from $0$ to $-0.0076$. For a single lens focusing collimated light, theory predicts that an asymmetric shape is optimal, and the optimizer discovered this independently.

The loss history shows characteristic dynamics: a rapid but oscillatory descent in the early iterations as large gradients drive big steps, then slowing convergence as the minimum is approached, and finally a plateau around iteration 150. Convergence in roughly 150 iterations (rather than thousands) demonstrates the efficiency of gradient-based optimization on smooth, differentiable problems.

The first figure shows a side view of the optimized lens with ray paths converging to a tight spot, alongside the spot diagram showing the final $(x, y)$ positions on the target plane, zoomed in on the region where the rays land.