πŸ‘¨πŸ»β€πŸ”¬ Connect With Me

  • 🎯 Seeking Roles: Research or industry roles where I can leverage my multi-disciplinary toolkit, combining low-level systems logic, machine learning workflows, and modern web architecture, to solve real-world problems in engineering, finance, and adjacent fields.
  • πŸ“ CV / Resume: View my Resume
  • πŸ“ Location: Bologna, Italy
  • πŸ“± Phone: (+39) 366 4207296
  • πŸ“§ Email: giovannigravili112@gmail.com
  • πŸ”— LinkedIn: linkedin.com/in/giovanni-gravili
  • πŸ™ GitHub: github.com/ghovax

Selected Projects

πŸ’  Computational Design of Caustic Optics using PyTorch

I’ve been learning PyTorch extensively, and I believe the best way to learn is to combine physics with machine learning. My goal in this article is to show how to model a lens, while enforcing physical constraints, so that it reproduces a custom pattern via caustics: the concentrations of light produced when rays passing through the lens are focused in specific directions. The problem appears simple but conceals considerable mathematical complexity, which I will explain step by step. The code is publicly available at the following link.

The setup: a lens with two free-form surfaces, thickness $t_{lens}$, and radius $r_{lens}$ is positioned at distance $d_{back}$ from a screen where the caustics are projected. The reference axis is defined so that the axis through the lens center is positive when moving away from the screen; thus $z = 0$ is the screen, and $z$ increases toward the back of the lens. A parallel bundle of rays from infinity strikes the lens under geometric optics, so we treat light only by refraction (Snell’s law), neglecting diffraction because the aperture is much larger than the wavelength. We seek the mathematical formulation, subject to physical constraints, for the front surface height $h_{front}(x, y)$ that produces the desired screen pattern; the lens is glass with refractive index $n_{lens} \approx 1.49$.

Given this setup, the goal is to ensure that the surface distribution of light intensity projected by the lens shape $h_{front}(x, y)$ closely matches a specified target distribution, which is the pattern we intend to recreate. To do this, we must first understand, in practical terms, the formulas and procedures for tracking the positions of light rays as they propagate through the lens and project onto the screen.

First of all, the light rays arrive from infinity as a parallel bundle, with unit direction vector

$$\hat{d}_{in} = (0, 0, -1)$$

directed toward the projection surface at $z = 0$. Each ray comes from a specific point along the $x$ and $y$ axes, $x_{in}$ and $y_{in}$, respectively, and intersects the lens surface at a generic position

$$z_{front}(x, y) = d_{back} + t_{lens} + h_{front}(x, y)$$

so the intersection point with the front surface of the lens is given in coordinate space by

$$\vec{P}_{front} = (x_{in}, y_{in}, z_{front}(x_{in}, y_{in}))$$

To apply Snell’s law and refract the ray, we need to find the surface normal $\hat{n}_{front}$ at that point, given by

$$\hat{n}_{front} = \frac{\nabla F}{\|\nabla F\|}$$

where

$$F(x, y, z) = z - (d_{back} + t_{lens} + h_{front}(x, y)) = 0$$

is the surface function. This means we need to calculate the gradients of the height function. Because the height function is expressed in a differentiable form using Zernike polynomials, PyTorch’s automatic differentiation computes each component of the derivative, making the calculation trivial. Then the ray passes from the front surface of the lens into the glass: Snell’s law says that the direction of propagation of light within the glass, $\hat{d}_{lens}$, is related to $\hat{d}_{in}$ through the refractive index ratio $\eta = \frac{1}{n_{lens}}$, since $n_{air} = 1$ by definition. Through geometric identities based on the dot product, one can derive that

dΜ‚lens=Ξ·dΜ‚in+(Ξ·cosΞΈinβˆ’1βˆ’Ξ·2sin2ΞΈin)nΜ‚front\hat{d}_{lens} = \eta \hat{d}_{in} + \left(\eta \cos{\theta_{in}} - \sqrt{1 - \eta^2 \sin^2{\theta_{in}}}\right) \hat{n}_{front}

where

$$\cos{\theta_{in}} = -\hat{d}_{in} \cdot \hat{n}_{front}$$

After this calculation, the resulting direction must be normalized. To find where the ray hits the back surface of the lens, we solve the equation

$$z_{front} + t_{back} (\hat{d}_{lens} \cdot \hat{z}) = z_{back}$$

for $t_{back}$, which yields the intersection point

$$\vec{P}_{back} = \vec{P}_{front} + t_{back} \hat{d}_{lens}$$

At the back surface

$$z_{back}(x, y) = d_{back} + h_{back}(x, y)$$

a second refraction occurs as light exits the lens back into air. Again we apply Snell’s law, but now the refractive index ratio is

$$\eta' = \frac{n_{lens}}{n_{air}} = n_{lens}$$

The back surface normal $\hat{n}_{back}$ is computed identically from the gradient of $h_{back}(x, y)$, and the refracted direction in air is

$$\hat{d}_{out} = \eta' \hat{d}_{lens} + \left(\eta' \cos{\theta_{lens}} - \sqrt{1 - \eta'^2 \sin^2{\theta_{lens}}}\right) \hat{n}_{back}$$

where

$$\cos{\theta_{lens}} = -\hat{d}_{lens} \cdot \hat{n}_{back}$$

Finally, the ray propagates from $\vec{P}_{back}$ in direction $\hat{d}_{out}$ until it hits the screen at $z = 0$. The screen intersection is simply

$$\vec{P}_{screen} = \vec{P}_{back} + t_{screen} \hat{d}_{out}$$

where

$$t_{screen} = \frac{0 - z_{back}}{\hat{d}_{out} \cdot \hat{z}}$$

The $(x, y)$ coordinates of $\vec{P}_{screen}$ determine where the ray contributes intensity to the caustic pattern.
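The ray trace above can be sketched in PyTorch. This is a simplified illustration, not the project's actual code: the function names (`refract`, `trace_to_screen`) and the flat-surface placeholder normals are my own; in the real model the normals come from the gradients of the Zernike height functions.

```python
import torch

def refract(d, n, eta):
    """Snell's law in vector form: d_out = eta*d + (eta*cos_in - sqrt(k))*n.
    Returns the (normalized) refracted direction and a validity mask;
    k < 0 marks total internal reflection."""
    cos_in = -(d * n).sum(-1, keepdim=True)           # cos(theta_in) = -d . n
    k = 1.0 - eta**2 * (1.0 - cos_in**2)              # 1 - eta^2 sin^2(theta_in)
    valid = (k > 0).squeeze(-1)
    d_out = eta * d + (eta * cos_in - torch.sqrt(k.clamp(min=0.0))) * n
    return torch.nn.functional.normalize(d_out, dim=-1), valid

def trace_to_screen(xy, h_front, h_back, d_back=10.0, t_lens=1.0, n_lens=1.49):
    """Trace parallel rays through two surfaces to the screen at z = 0.
    For this sketch the normals are the flat-surface +z (placeholder);
    d_back and t_lens are illustrative values."""
    N = xy.shape[0]
    z_front = d_back + t_lens + h_front               # front-surface hit heights
    P = torch.cat([xy, z_front[:, None]], dim=-1)     # P_front
    d_in = torch.tensor([0.0, 0.0, -1.0]).expand(N, 3)
    n_hat = torch.tensor([0.0, 0.0, 1.0]).expand(N, 3)
    d1, v1 = refract(d_in, n_hat, 1.0 / n_lens)       # air -> glass
    z_back = d_back + h_back
    t = (z_back - P[:, 2]) / d1[:, 2]                 # solve for t_back
    P = P + t[:, None] * d1                           # P_back
    d2, v2 = refract(d1, n_hat, n_lens)               # glass -> air
    t = (0.0 - P[:, 2]) / d2[:, 2]                    # solve for t_screen
    P = P + t[:, None] * d2                           # P_screen
    return P[:, :2], v1 & v2
```

With flat placeholder surfaces the rays pass straight through, so the screen hits coincide with the entry points; any nonzero Zernike heights bend them.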

Zernike Polynomial Parameterization

The critical question is: how do we represent $h_{front}(x, y)$ and $h_{back}(x, y)$ in a form that is both differentiable and physically reasonable? The answer lies in Zernike polynomials, an orthogonal basis over the unit disk that is standard in optical surface description. A Zernike polynomial $Z_n^m(\rho, \theta)$ is indexed by radial degree $n$ and azimuthal frequency $m$, where

$$\rho = \sqrt{x^2 + y^2}/r_{lens}$$

is the normalized radial coordinate and

$$\theta = \arctan(y/x)$$

is the angular coordinate. The polynomial is defined as:

Znm(ρ,ΞΈ)=Rn|m|(ρ)Γ—{cos(mΞΈ)mβ‰₯0sin(|m|ΞΈ)m<0Z_n^m(\rho, \theta) = R_n^{|m|}(\rho) \times \begin{cases} \cos(m\theta) & m \geq 0 \\ \sin(|m|\theta) & m < 0 \end{cases}

where $R_n^m(\rho)$ is the radial polynomial:

$$R_n^m(\rho) = \sum_{k=0}^{(n-m)/2} \frac{(-1)^k (n-k)!}{k! \left(\frac{n+m}{2}-k\right)! \left(\frac{n-m}{2}-k\right)!} \rho^{n-2k}$$

Each surface is expressed as a weighted sum:

$$h(x, y) = \sum_{i} c_i Z_{n_i}^{m_i}(\rho, \theta)$$

where the coefficients $c_i$ are the learnable parameters optimized by PyTorch. For instance, with maximum radial order $n_{max} = 6$, we obtain 28 Zernike modes per surface. Low-order modes (e.g., $Z_2^0$, corresponding to defocus) have large-scale effects, while high-order modes introduce fine features. To compute the surface normal, we need $\nabla h(x, y)$. PyTorch’s automatic differentiation handles this transparently: the forward pass builds the computation graph, and a call to backward() computes

$$\frac{\partial h}{\partial c_i}$$

for each coefficient, propagating gradients through the entire ray-tracing pipeline.
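A minimal evaluation of a single Zernike mode, following the radial-polynomial formula above. The function names are illustrative; note the use of `atan2` rather than `arctan(y/x)` so the angular coordinate is well defined in all quadrants.

```python
import math
import torch

def zernike_radial(n, m, rho):
    """R_n^m(rho) for m >= 0, assuming n - m is even (otherwise the mode vanishes)."""
    R = torch.zeros_like(rho)
    for k in range((n - m) // 2 + 1):
        coeff = ((-1) ** k * math.factorial(n - k)
                 / (math.factorial(k)
                    * math.factorial((n + m) // 2 - k)
                    * math.factorial((n - m) // 2 - k)))
        R = R + coeff * rho ** (n - 2 * k)
    return R

def zernike(n, m, x, y, r_lens=1.0):
    """Z_n^m(rho, theta): radial polynomial times cos(m*theta) or sin(|m|*theta)."""
    rho = torch.sqrt(x**2 + y**2) / r_lens
    theta = torch.atan2(y, x)
    R = zernike_radial(n, abs(m), rho)
    return R * (torch.cos(m * theta) if m >= 0 else torch.sin(-m * theta))
```

For example, the defocus mode $Z_2^0$ evaluates to $2\rho^2 - 1$: $-1$ at the center of the aperture and $1$ at its edge.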

Differentiable Histogram via Gaussian Splatting

After tracing $N$ rays, we obtain a set of screen hit positions $\{\vec{p}_k\}_{k=1}^N$ with validity weights $w_k \in \{0, 1\}$ indicating whether ray $k$ successfully reached the screen (some rays may undergo total internal reflection or miss the screen bounds). To compare against the target pattern, we must convert these discrete points into a continuous 2D intensity distribution. Traditional histogram binning is non-differentiable due to the discrete assignment of points to bins. Instead, I use Gaussian splatting: each ray $k$ contributes a Gaussian kernel centered at $\vec{p}_k$ to nearby grid cells. Formally, the histogram at grid cell $(i, j)$ with center $\vec{g}_{ij}$ is:

$$H(i, j) = \sum_k w_k \exp\left(-\frac{\|\vec{p}_k - \vec{g}_{ij}\|^2}{2\sigma^2}\right)$$

where $\sigma$ is the kernel width (typically 1–2 grid cells). This operation is fully differentiable: gradients flow from $H$ back to $\vec{p}_k$, then through the ray-tracing equations to the Zernike coefficients. The choice of $\sigma$ balances resolution and smoothness; smaller $\sigma$ gives sharper features but noisier gradients.
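The splatting formula can be written as a short sketch. The grid geometry (a square screen of half-width `extent` divided into `G × G` cells) and the default $\sigma$ of 1.5 cells are assumptions for illustration; the dense $(N, G^2)$ distance matrix is the simple but memory-hungry way to do it.

```python
import torch

def splat_histogram(points, weights, G=32, extent=1.0, sigma=None):
    """Differentiable 2D histogram: each point splats a Gaussian onto the grid.
    points: (N, 2) screen hits; weights: (N,) validity; returns a (G, G) tensor."""
    cell = 2 * extent / G
    sigma = sigma or 1.5 * cell                              # ~1-2 grid cells
    centers = torch.linspace(-extent + cell / 2, extent - cell / 2, G)
    gy, gx = torch.meshgrid(centers, centers, indexing="ij")
    grid = torch.stack([gx, gy], dim=-1).reshape(-1, 2)      # (G*G, 2) cell centers
    d2 = ((points[:, None, :] - grid[None, :, :]) ** 2).sum(-1)   # (N, G*G)
    H = (weights[:, None] * torch.exp(-d2 / (2 * sigma**2))).sum(0)
    return H.reshape(G, G)
```

Because `H` is built from smooth operations only, calling `backward()` on any loss over `H` yields gradients with respect to the hit positions, and from there to the Zernike coefficients.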

Loss Function Design

The optimization objective is a weighted combination of several terms, each enforcing different physical and aesthetic constraints. The primary term is data fidelity, which measures how closely the predicted histogram $H$ matches the target $T$. I use a combination of the L1 loss

$$\|H - T\|_1$$

for robustness and Sinkhorn divergence

$$W_\epsilon(H, T)$$

for spatial transport. The Sinkhorn divergence, an entropic approximation to the optimal transport distance, is particularly effective because it measures the β€œwork” needed to transform one distribution into another, naturally handling spatial shifts. It is computed via iterative Sinkhorn scaling:

$$W_\epsilon(H, T) = \min_{P \in \Pi(H, T)} \langle P, C \rangle - \epsilon H(P)$$

where

$$C_{ij} = \|\vec{g}_i - \vec{g}_j\|^2$$

is the cost matrix and $\epsilon$ is the entropic regularization parameter (typically 0.01). This converges in approximately 100 iterations.
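A bare-bones sketch of the Sinkhorn scaling loop between two flattened histograms. This is my own minimal version for illustration, not the project's implementation: it returns only the transport cost $\langle P, C \rangle$ (no entropy term, no debiasing), and a production run would more likely use a library such as GeomLoss.

```python
import torch

def sinkhorn_cost(h, t, grid, eps=0.01, iters=100):
    """Entropic OT cost between histograms h and t supported on `grid` (M, 2).
    Alternating scaling updates u, v until the plan P = diag(u) K diag(v)
    approximately satisfies the marginal constraints."""
    h = h / h.sum()
    t = t / t.sum()
    C = ((grid[:, None, :] - grid[None, :, :]) ** 2).sum(-1)  # squared distances
    K = torch.exp(-C / eps)                                   # Gibbs kernel
    u = torch.ones_like(h)
    v = torch.ones_like(t)
    for _ in range(iters):                                    # Sinkhorn scaling
        u = h / (K @ v)
        v = t / (K.T @ u)
    P = u[:, None] * K * v[None, :]                           # transport plan
    return (P * C).sum()                                      # <P, C>
```

For very small $\epsilon$ the kernel `K` underflows numerically; stabilized implementations work in log space instead, which is one reason to prefer an off-the-shelf library here.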

Surface smoothness regularization prevents unphysical high-frequency oscillations by penalizing high-order Zernike coefficients more heavily via

$$\sum_i (n_i + 1)^2 c_i^2$$

where $n_i$ is the radial degree of mode $i$. This biases the optimizer toward low-order aberrations, which are easier to manufacture. To ensure the predicted pattern has a spatial spread similar to the target’s, I match both the Shannon entropy

$$H(P) = -\sum_i p_i \log p_i$$

and the second spatial moments (variance). Rays undergoing total internal reflection fail to reach the screen, reducing light efficiency, so the term

$$(1 - f_{valid})^2$$

encourages designs that minimize TIR, where $f_{valid}$ is the fraction of rays that successfully reach the screen. Finally, a surface separation constraint ensures the two free-form surfaces maintain a minimum separation $\Delta z_{min}$ (e.g., 0.5 mm) to prevent physical overlap. I sample random points across the aperture and penalize violations via

$$\sum_k \text{ReLU}\left(\Delta z_{min} - (t_{lens} + h_{front}(x_k, y_k) - h_{back}(x_k, y_k))\right)^2$$
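The smoothness and separation terms are one-liners in PyTorch. The function names and sample values below are illustrative placeholders, not the project's code.

```python
import torch

def smoothness_penalty(coeffs, radial_degrees):
    """sum_i (n_i + 1)^2 c_i^2: penalizes high-order Zernike modes more heavily."""
    n = torch.as_tensor(radial_degrees, dtype=coeffs.dtype)
    return ((n + 1) ** 2 * coeffs**2).sum()

def separation_penalty(h_front, h_back, t_lens, dz_min=0.5):
    """Squared-hinge penalty wherever the surfaces come closer than dz_min,
    evaluated at sampled aperture points (h_front, h_back are height samples)."""
    gap = t_lens + h_front - h_back
    return torch.relu(dz_min - gap).pow(2).sum()
```

Both are differentiable in the Zernike coefficients (through the height samples), so they slot directly into the total loss below.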

The total loss is

$$\mathcal{L} = \lambda_1 \mathcal{L}_{data} + \lambda_2 \mathcal{L}_{smooth} + \lambda_3 \mathcal{L}_{entropy} + \lambda_4 \mathcal{L}_{TIR} + \lambda_5 \mathcal{L}_{separation}$$

where the weights $\lambda_i$ are hyperparameters tuned to balance competing objectives.

Optimization Loop

The optimization uses the Adam optimizer with learning rate

$$\alpha = 0.001$$

and a ReduceLROnPlateau scheduler that halves the learning rate when the loss plateaus for 50 iterations. Gradient clipping (max norm = 1.0) prevents instabilities from sharp refractions. A typical run executes 500-1000 iterations, taking a few minutes on a GPU. Each iteration:

  1. Samples ray entry points uniformly on a grid within the lens aperture
  2. Traces rays through both lens surfaces using the current Zernike coefficients
  3. Creates the predicted histogram via Gaussian splatting
  4. Computes the loss
  5. Backpropagates gradients through the entire pipeline to update Zernike coefficients
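The five steps above fit the standard PyTorch training skeleton. In this sketch `loss_fn` is a trivial placeholder standing in for the real pipeline (trace rays, splat the histogram, evaluate the weighted loss); the optimizer, scheduler, and clipping settings mirror the ones described in the text.

```python
import torch

coeffs = torch.full((28,), 0.1, requires_grad=True)   # Zernike coefficients (n_max = 6)
opt = torch.optim.Adam([coeffs], lr=1e-3)
sched = torch.optim.lr_scheduler.ReduceLROnPlateau(opt, factor=0.5, patience=50)

def loss_fn(c):
    # Placeholder: the real loss traces rays with c, splats the histogram,
    # and combines the data, smoothness, entropy, TIR, and separation terms.
    return (c**2).sum() + 1.0

for step in range(200):
    opt.zero_grad()
    loss = loss_fn(coeffs)
    loss.backward()                                   # gradients w.r.t. coefficients
    torch.nn.utils.clip_grad_norm_([coeffs], max_norm=1.0)
    opt.step()
    sched.step(loss.item())                           # halve lr on plateau
```

With the placeholder loss the coefficients are simply driven toward zero; in the real run each step reshapes the lens surfaces toward the target caustic pattern.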

The code exports the optimized lens surfaces as STL files for 3D printing and generates animated GIFs showing the caustic pattern evolving during optimization.

Results and Observations

Running the optimizer on a target pattern (e.g., a logo or simple shape), I observe several phenomena. The loss typically decreases rapidly in the first 100 iterations as low-order Zernike modes (defocus, astigmatism) adjust the overall ray distribution. Later iterations refine fine details via higher-order modes, demonstrating the hierarchical nature of the Zernike basis. The Gaussian splatting kernel width $\sigma$ directly controls pattern sharpness: smaller $\sigma$ produces crisper edges but requires more rays to avoid noisy gradients, revealing a fundamental tradeoff between resolution and optimization stability. The surface separation penalty proves crucial in practice: without it, the optimizer occasionally produces overlapping surfaces that are physically impossible to manufacture, highlighting the importance of encoding domain constraints directly into the loss function. Like most non-convex optimizations, the final result depends on initialization. Starting with negative defocus (a concave front surface) helps spread rays, providing better initial coverage of the screen and reducing the likelihood of getting trapped in poor local minima.

This project demonstrates how modern automatic differentiation frameworks enable inverse design in classical physics domains. The key insight is that ray tracing, despite involving geometric intersections and conditional logic, can be made differentiable through careful formulation. The same approach extends naturally to more complex optical systems: adding more refractive surfaces, diffractive elements, or wavelength-dependent dispersion simply requires expanding the forward model, while PyTorch handles the gradient computation automatically.