Reinforcement Learning Accelerates Model-free Training of Optical AI Systems
LOS ANGELES - Californer -- Optical computing has emerged as a powerful approach to high-speed, energy-efficient information processing. Diffractive optical networks, in particular, enable large-scale parallel computation using passive, structured phase masks and the free-space propagation of light. One major challenge remains, however: systems trained in model-based simulations often underperform in real experimental settings, where misalignments, noise, and model inaccuracies are difficult to capture.

In a new paper, researchers at the University of California, Los Angeles (UCLA) introduce a model-free, in situ training framework for diffractive optical processors driven by Proximal Policy Optimization (PPO), a reinforcement learning algorithm known for its stability and sample efficiency. Rather than relying on a digital twin or an approximate physical model, the system learns directly from real optical measurements, optimizing its diffractive features on the hardware itself.
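The paper's exact training pipeline is not reproduced here, but the core idea can be sketched: a Gaussian policy over the adjustable phase values is sampled, each sampled phase pattern is "measured" on the hardware, and the measured rewards drive PPO's clipped policy update. In the sketch below, `measure_reward` is a hypothetical stand-in for a real camera readout (a toy quadratic so the example runs), and all hyperparameters are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the optical measurement. In a real setup the
# reward would come from a detector reading (e.g. focused intensity at a
# target spot); here a toy quadratic keeps the sketch self-contained.
TARGET = rng.uniform(0, 2 * np.pi, size=16)      # unknown "best" phase mask

def measure_reward(phases):
    return -np.mean((phases - TARGET) ** 2)      # higher is better

def ppo_train(n_iters=200, batch=64, sigma=0.3, lr=0.05,
              clip_eps=0.2, epochs=4):
    mu = np.zeros(TARGET.size)                   # learnable phase means
    for _ in range(n_iters):
        # Roll out: sample phase patterns from the current Gaussian policy
        # and measure each one on the (stand-in) hardware.
        actions = mu + sigma * rng.standard_normal((batch, mu.size))
        rewards = np.array([measure_reward(a) for a in actions])
        adv = rewards - rewards.mean()           # simple mean baseline
        logp_old = -0.5 * np.sum(((actions - mu) / sigma) ** 2, axis=1)
        # Reuse the same measured batch for several clipped updates.
        for _ in range(epochs):
            logp = -0.5 * np.sum(((actions - mu) / sigma) ** 2, axis=1)
            ratio = np.exp(logp - logp_old)
            unclipped = ratio * adv
            clipped = np.clip(ratio, 1 - clip_eps, 1 + clip_eps) * adv
            # Gradient flows only through samples on the unclipped branch.
            active = unclipped <= clipped
            grad = np.mean((active * ratio * adv)[:, None]
                           * (actions - mu) / sigma ** 2, axis=0)
            mu += lr * grad                      # gradient ascent on L^CLIP
    return mu

mu = ppo_train()
```

With the toy reward, the learned phase pattern `mu` converges toward `TARGET`; on real hardware, the same loop would instead converge toward whatever phase mask maximizes the detector-derived reward, with no physical model of the setup in the loop.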

"Instead of trying to simulate complex optical behavior perfectly, we allow the device to learn from experience or experiments," said Aydogan Ozcan, Chancellor's Professor of Electrical and Computer Engineering at UCLA and the corresponding author of the study. "PPO makes this in situ process fast, stable, and scalable to realistic experimental conditions."

To demonstrate that PPO can teach an optical processor to perform a computational task without any knowledge of the underlying physics of the experimental setup, the UCLA researchers carried out comprehensive tests spanning multiple optical tasks. For example, the system learned to focus optical energy through a random, unknown diffuser faster than standard policy-gradient optimization, demonstrating its ability to explore the optical parameter space efficiently. The same framework was also applied to hologram generation and aberration correction. In another demonstration, the diffractive processor was trained on the optical hardware to classify handwritten digits using optical measurements alone. As the in situ training progressed, the output patterns became clearer and more distinct for each input digit, showing correct classification without any digital processing.

Because PPO reuses each batch of measured data for multiple update steps while constraining how far the policy can shift, it significantly reduces the number of experimental samples required and prevents unstable behavior during training, making it well suited to noisy optical environments. The approach is not limited to diffractive optics; it can be applied to many other physical systems that provide measurable feedback and can be adjusted in real time.
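For reference, the constraint on policy shifts described above corresponds to PPO's standard clipped surrogate objective (standard RL notation; not reproduced from the paper itself):

```latex
L^{\mathrm{CLIP}}(\theta)
  = \mathbb{E}_t\!\left[\min\!\left(r_t(\theta)\,\hat{A}_t,\;
    \operatorname{clip}\!\big(r_t(\theta),\,1-\epsilon,\,1+\epsilon\big)\,\hat{A}_t\right)\right],
\qquad
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}
```

Because the probability ratio r_t is clipped to [1 − ε, 1 + ε], the same batch of measurements can safely drive several gradient steps without the updated policy drifting far from the one that generated the data, which is exactly what makes the experimental sample budget small.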

"This work represents a step toward intelligent physical systems that autonomously learn, adapt, and compute without requiring detailed physical models of an experimental setup," said Ozcan. "The approach could expand to photonic accelerators, nanophotonic processors, adaptive imaging systems, and real-time optical AI hardware."

This research was funded by the U.S. Army Research Office (ARO). Ozcan is also an Associate Director of the California NanoSystems Institute (CNSI).

Article: https://www.nature.com/articles/s41377-025-02148-7

Source: ucla ita
Filed Under: Science
