Designs of diffractive complex field imagers
Figure 1a illustrates a spatially multiplexed design of our diffractive complex field imager, termed design I. This diffractive imager is composed of 5 diffractive layers (i.e., L1, L2, …, L5), where each of these layers is spatially coded with 200 × 200 diffractive features, each with a lateral dimension of approximately half of the illumination wavelength, i.e., ~λ/2. These diffractive layers are positioned in a cascaded manner along the optical axis, resulting in a total axial length of 150λ for the entire design. A complex input object, illuminated at the wavelength λ, is placed at the input plane in front of the diffractive layers. This complex object field exhibits an amplitude distribution with a value range of [ADC, 1], along with a phase distribution ranging within [0, απ]. Here, ADC denotes the minimum amplitude value of the input complex field, and α is the phase contrast parameter of the input complex field. Without loss of generality, we selected default values of ADC and α as 0.2 and 1, respectively, for our numerical demonstrations. Note that it is essential to work with ADC ≠ 0 since otherwise the phase would become undefined. After the input complex fields are collectively modulated by the diffractive layers L1–L5, the resulting optical fields at the output plane are measured by the detectors within two spatially separated output FOVs, i.e., FOVPhase and FOVAmp, which produce intensity distributions that correspond to the phase and amplitude patterns of each input complex field, respectively. In addition, we also defined a reference signal region at the periphery of FOVPhase, wherein the average measured intensity across this region is used as the reference signal for normalizing the quantitative phase signal. This normalization step is essential to ensure that the detected phase information is independent of input light intensity fluctuations, yielding a quantitative phase image regardless of the diffracted output power.
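The reference-based normalization described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `quantitative_phase`, the division-by-reference form, and the toy intensity values are assumptions; the source states only that the average intensity over the peripheral reference region serves as the normalization signal.

```python
import numpy as np

def quantitative_phase(i_phase, ref_mask):
    """Normalize the raw phase-channel intensity by the mean intensity
    measured inside the peripheral reference region (hypothetical helper)."""
    sigma = i_phase[ref_mask].mean()   # reference signal
    return i_phase / sigma             # quantitative phase image

# Toy 15 x 15 output FOV: uniform background with a brighter central signal
raw = np.full((15, 15), 0.5)
raw[4:11, 4:11] = 2.0
ref = np.zeros((15, 15), dtype=bool)
ref[0, :] = True                       # reference pixels at the periphery
q = quantitative_phase(raw, ref)
```

Because both the signal and the reference scale with the input power, scaling `raw` by any constant leaves `q` unchanged, which is exactly the intensity-fluctuation independence the design targets.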
Overall, the objective of our training process is to have the phase image channel output approximate the ground truth phase distribution of the input complex field, demonstrating an effective phase-to-intensity (P → I) transformation. Concurrently, the training of the diffractive layers also aims to have the diffractive output image in the amplitude channel proportionally match the ground truth amplitude distribution of the input complex field after subtracting the amplitude DC component ADC, thereby achieving a successful amplitude-to-amplitude (A → A) transformation performed by the diffractive processor. Note that the phase-to-intensity transformation is inherently a nonlinear function58. In the phase imaging channel of our diffractive complex-field imager, the amplitude-squared operation as part of the intensity measurement at the sensor plane represents the only occurrence of nonlinearity within the processing pipeline.
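The two training targets above can be summarized in a single scalar objective. The sketch below assumes equal weighting of the two channels and the function name `imager_loss`; the source specifies only that MSE terms against the phase ground truth and the DC-subtracted amplitude ground truth are minimized.

```python
import numpy as np

def imager_loss(i_phase, i_amp, phase_gt, amp_gt, a_dc=0.2):
    """Combined training objective (sketch, equal weights assumed):
    P -> I channel matched to the ground-truth phase, and A -> A channel
    matched to the DC-subtracted ground-truth amplitude."""
    loss_phase = np.mean((i_phase - phase_gt) ** 2)     # phase-to-intensity
    loss_amp = np.mean((i_amp - (amp_gt - a_dc)) ** 2)  # amplitude-to-amplitude
    return loss_phase + loss_amp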
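The two training targets above can be summarized in a single scalar objective. The sketch below assumes equal weighting of the two channels and the function name `imager_loss`; the source specifies only that MSE terms against the phase ground truth and the DC-subtracted amplitude ground truth are minimized.

```python
import numpy as np

def imager_loss(i_phase, i_amp, phase_gt, amp_gt, a_dc=0.2):
    """Combined training objective (sketch, equal weights assumed):
    P -> I channel matched to the ground-truth phase, and A -> A channel
    matched to the DC-subtracted ground-truth amplitude."""
    loss_phase = np.mean((i_phase - phase_gt) ** 2)     # phase-to-intensity
    loss_amp = np.mean((i_amp - (amp_gt - a_dc)) ** 2)  # amplitude-to-amplitude
    return loss_phase + loss_amp
```

A perfect imager output drives both terms to zero, which is the condition the diffractive layers are optimized toward.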
In addition to the spatially multiplexed design I described above, we also created an alternative complex field imager design, named design II, by incorporating wavelength multiplexing to construct the amplitude and phase imaging channels. As illustrated in Fig. 1b, this approach utilizes a dual-color scheme, where the amplitude and phase of the input images are captured separately at two distinct wavelengths, with λPhase dedicated to the phase imaging channel and λAmp dedicated to the amplitude imaging channel. As an empirical parameter, without loss of generality, we selected λAmp = λPhase × 1.28 for our numerical diffractive designs, with the mean wavelength defined as λm = (λPhase + λAmp)/2. With this wavelength multiplexing strategy in design II, the amplitude and phase imaging FOVs can be combined into a single FOV – as opposed to the 2 spatially separated FOVs employed by design I shown in Fig. 1a. Consequently, the output amplitude and phase images can be recorded by the same group of sensor pixels.
As illustrated in Fig. 1c, we also developed an additional complex field imager design, referred to as design III, which integrates both space and wavelength multiplexing strategies in constructing the amplitude and phase imaging channels. Specifically, design III incorporates two FOVs that are spatially separated at the output plane (similar to design I) for amplitude and phase imaging, also utilizing two different wavelength channels (akin to design II) to encode the output amplitude/phase images separately.
Following the design configurations (I, II and III) described above, we performed their numerical modeling and conducted the training of our diffractive imager models. For this training, we constructed an image dataset comprising 55,000 images of EMNIST handwritten English capital letters, and within each training epoch, we randomly grouped these images in pairs – one representing the amplitude image and another representing the phase image – thereby forming 27,500 training input complex fields. The phase contrast parameter αtr used for constructing these training input complex fields was set to 1. We utilized deep learning-based optimization with stochastic gradient descent to optimize the thickness values of the diffractive features on the diffractive layers. This training was targeted at minimizing a custom-designed loss function defined by the mean squared error (MSE) between the diffractive imager output amplitude and phase images and their corresponding ground truth. More information about the structural parameters of the diffractive complex field imagers, the specific loss functions employed, and additional aspects of the training methodology can be found in the Methods section.
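The synthesis of a training input complex field from an image pair, with the amplitude and phase value ranges stated earlier, can be sketched as follows. The helper name `make_complex_field` and the linear mapping of pixel values into the stated ranges are assumptions for illustration.

```python
import numpy as np

def make_complex_field(img_amp, img_phase, a_dc=0.2, alpha=1.0):
    """Map an image pair (pixel values in [0, 1]) to a complex field with
    amplitude in [a_dc, 1] and phase in [0, alpha*pi] (hypothetical helper)."""
    amp = a_dc + (1.0 - a_dc) * img_amp    # amplitude channel
    phase = alpha * np.pi * img_phase      # phase channel
    return amp * np.exp(1j * phase)

rng = np.random.default_rng(0)
a_img, p_img = rng.random((2, 28, 28))     # stand-ins for an EMNIST image pair
field = make_complex_field(a_img, p_img)
```

Pairing two independently drawn images per field is what forces a single trained device to generalize over arbitrary amplitude/phase combinations.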
Numerical results and quantitative performance analysis of diffractive complex field imagers
After the training phase, the resulting diffractive layers of our complex field imager models following designs I, II and III are visualized in Supplementary Figs. S1a, S2a and Fig. 2a, respectively, showing their thickness value distributions. To evaluate and quantitatively compare the complex field imaging performances of these diffractive processors, we first conducted blind testing by selecting 10,000 test images from the EMNIST handwritten letter dataset that were never used in the training set and randomly grouping them in pairs to synthesize 5000 complex test objects. To compare the structural fidelity of the resulting output amplitude and phase images produced by our diffractive complex field imager models, we quantified the peak signal-to-noise ratio (PSNR) between these diffractive output images and their corresponding ground truth. Our results revealed that, for the diffractive imager model using design I that performs space-multiplexed complex field imaging, the amplitude and phase imaging channels provided PSNR values of 16.47 ± 0.96 and 14.90 ± 1.60, respectively, demonstrating decent imaging performance. For the diffractive imager models using designs II (and III), these performance metrics became 16.46 ± 1.02 and 14.98 ± 1.51 (17.04 ± 1.06 and 15.06 ± 1.63), respectively. Therefore, design III, which uses both the space and wavelength multiplexing strategies, demonstrated a notable performance advantage over the other two models in both the phase and amplitude imaging channels. Beyond these quantitative results, we also present exemplary diffractive output images for the three models of designs I, II and III in Supplementary Figs. S1b, S2b and Fig. 2b, respectively.
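For reference, the PSNR metric used for these comparisons can be sketched as below. The peak value of 1 is an assumption (consistent with output images normalized to a unit range); the Methods section defines the exact formula.

```python
import numpy as np

def psnr(output, target, peak=1.0):
    """PSNR sketch for images with an assumed peak value of 1:
    10 * log10(peak^2 / MSE)."""
    mse = np.mean((output - target) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```

For instance, a uniform error of 0.1 per pixel (MSE = 0.01) yields a PSNR of 20, which puts the ~15-17 values reported above in context.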
These visualization results clearly show that our diffractive output images in both the amplitude and phase channels exhibit structural similarity to their ground truth counterparts, even though these input complex fields were never seen by our diffractive models before. These analyses demonstrate the internal generalization of our diffractive complex field imagers, indicating their capability to process new complex fields that share similar statistical distributions with the training dataset.
where the first and second terms in Eq. (2) denote the resulting output phase images obtained when the same grating pattern is encoded within the phase and the amplitude channel of the input complex field, respectively; the first term represents the true signal, and the latter represents the crosstalk term. Similarly, for SCRAmp, the corresponding terms denote the resulting output amplitude images obtained when the same grating pattern is encoded within the amplitude and the phase channel of the input complex field, respectively. Σ(·) denotes the intensity summation operation across all the pixels. Following these definitions, we quantified the SCRPhase and SCRAmp values for all the grating imaging outputs in Supplementary Figs. S3a, b, S4a, b and Fig. 3a, b. These SCR analyses reveal that, for all the diffractive imager models, the grating inputs with 1.5λ linewidth and α = 1 present a ~30% lower SCRPhase and a ~53% lower SCRAmp when compared to their counterparts with 3λ linewidth, revealing that imaging of finer, higher-resolution patterns is more susceptible to crosstalk. Furthermore, we found that an increase in the input phase contrast (α) leads to more crosstalk in the output amplitude channel, which results in a lower SCRAmp value; for example, from >3.5 for α = 0.25 down to 2.5–3 for α = 1. Additionally, we calculated the average SCRPhase and SCRAmp values across these grating images for the different diffractive imager models; for the diffractive models using designs I, II and III, the average SCRPhase values are 2.805, 3.178 and 3.155, respectively, and the average SCRAmp values are 2.331, 2.262 and 2.252, respectively.
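Operationally, the SCR values above reduce to a ratio of spatially summed intensities. The sketch below assumes a function name `scr` and that Eq. (2) is a plain ratio of the summed true-signal and crosstalk output images, as the definitions above suggest.

```python
import numpy as np

def scr(signal_img, crosstalk_img):
    """Signal-to-crosstalk ratio sketch: summed intensity of the true-signal
    output image divided by that of the crosstalk output image."""
    return signal_img.sum() / crosstalk_img.sum()
```

For example, a true-signal channel carrying twice the summed intensity of its crosstalk counterpart gives SCR = 2, comparable to the SCRAmp values reported for α = 1.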
These analyses were performed using amplitude-only and phase-only grating objects. Beyond that, we also used complex-valued gratings to further inspect the imaging performance of our diffractive models. Specifically, we created complex test fields that have the same grating patterns encoded in both the amplitude and phase channels. The results reported in the top rows of Supplementary Figs. S3c, S4c and Fig. 3c revealed that all our diffractive imager models are capable of distinctly resolving complex gratings with 3λ linewidth, while being largely able to resolve those with 1.5λ linewidth, albeit with occasional failures. We further created complex fields by orthogonally placing horizontal and vertical gratings, with one of these gratings encoded in the phase channel of the input field and the other encoded in the amplitude channel. As evidenced by the bottom rows of Supplementary Figs. S3c, S4c and Fig. 3c, our diffractive models could successfully reconstruct the amplitude and phase patterns of the input complex fields with a grating linewidth of 3λ.
Impact of input phase contrast on the performance of diffractive complex field imagers
In the analyses conducted so far, all the input fields fed into our diffractive models maintained a consistent phase contrast of α = 1 during both the training and testing phases. Next, we investigated the impact of greater input phase contrast on the performance of diffractive complex field imagers. For this analysis, we utilized the same diffractive design III model shown in Fig. 2 and tested its imaging performance using the same set of complex test objects used in Fig. 2b, but with an increased object phase contrast αtest chosen within a range between 1 and 1.999 (i.e., input phase values spanning [0, 2π)). The corresponding results are shown as the blue curve in Fig. 4a, illustrating a degradation in the imaging performance of the diffractive model as αtest increases. This degradation is relatively minor in the PSNR results for the amplitude channel but more pronounced for the phase channel. Specifically, as αtest increases from 1 to 1.5, the average amplitude PSNR value slightly drops from 17.04 to 15.82, while the phase PSNR falls from 15.06 to 11.96. When αtest approaches 2, the average amplitude and phase PSNR values further decrease to 15.34 and 9.87, respectively. The visual examples in Fig. 4b and Supplementary Fig. S5a, which correspond to the cases of αtest = 1.5 and 1.25, respectively, reveal that the amplitude channel of the diffractive model can consistently resolve the amplitude patterns of the objects, which were never encountered during the training phase. However, in the phase channel, although the patterns remain distinguishable and match the ground truth well, their intensities were lower than the correct level, leading to incorrect quantitative phase values and, thus, a drop in the phase PSNR values.
The numerical analyses and experimental validation presented in our work showcased a compact complex field imager design through deep learning-based optimization of diffractive surfaces. We explored three variants of this design strategy, with comparative analyses indicating that the design employing both spatial and wavelength multiplexing (design III) achieves the best balance between complex field imaging performance and diffraction efficiency, albeit with a minor increase in hardware complexity. Leveraging the all-optical information processing capabilities of multiple spatially engineered diffractive layers, diffractive complex field imagers reconstruct the amplitude and phase distributions of the input complex field in a complete end-to-end manner, without any digital image recovery algorithm, setting them apart from other designs in the existing literature for similar applications. This capability enables direct recording of the amplitude and phase information in a single snapshot using an intensity-only sensor array, which obviates the need for additional computational processing at the back-end, thereby significantly enhancing the frame rate and reducing the latency of the imaging process. Furthermore, our diffractive imager designs feature a remarkably compact form factor, with dimensions of ~100λ in both the axial and lateral directions, offering a substantial volumetric advantage. In contrast, conventional methods based on interferometry and holography often involve relatively bulky optical components and necessitate multiple measurements, leading to optical and mechanical configurations that require a large physical footprint. While some of the recent single-shot complex amplitude imaging efforts using metasurfaces have aimed for greater compactness, they typically require metalenses with large lateral sizes of >1000λ32,37,38.
Moreover, achieving a similar FOV (covering several tens of wavelengths) as in our work would require imaging path lengths of thousands of wavelengths.
In our previous research, we developed diffractive processor designs tailored for imaging either amplitude distributions of amplitude-only objects51 or phase distributions of phase-only objects53,59,60. However, these designs would become ineffective for imaging complex objects with independent and non-uniform distributions in the amplitude and phase channels. In this work, we have overcome this limitation by training our diffractive imager designs using complex objects with random combinations of amplitude and phase patterns, thus allowing a single imager device to effectively generalize to complex optical fields with various distributions in the amplitude and phase channels.
The diffractive complex field imager designs that we presented also exhibit certain limitations. Our results revealed residual errors in their targeted operations, particularly manifesting as crosstalk leaking from the amplitude channel into the phase channel. This suggests that the actual phase-to-intensity transformation represented by our diffractive imager, while effective, is an approximation with errors that depend on the object amplitude distribution. A mitigation approach for this limitation might involve further enhancement of the information processing capacity of our diffractive imagers, which can be achieved by employing a larger number of diffractive layers (forming a deeper diffractive architecture), thus increasing the overall number of diffractive features/neurons that are efficiently utilized64. Additionally, we believe another performance improvement strategy could be to increase the lateral distance between the two output FOVs dedicated to the phase and amplitude channels, thereby allowing the trainable diffractive features to better specialize for the individual tasks of phase/amplitude imaging; this approach, however, would increase the size of the output FOV at the focal plane array and also demand larger diffractive layers.
Moreover, in our experimental results, we observed the emergence of noise patterns within certain regions, which did not exist in our numerical simulations. This discrepancy can be attributed to potential misalignments and fabrication imperfections in the assembled diffractive layers. A mitigation strategy could be to perform “vaccination” of these diffractive imager models, which involves modeling these errors as random variables and incorporating them into the physical forward model during the training process42,50,65. This approach has been proven effective in providing substantial resilience against misalignment errors for diffractive processors, exhibiting a noticeably better match between numerical and experimental results42,50,65.
Leveraging its unique attributes, our presented complex field imaging system can open up various practical applications across diverse fields. For biomedical applications, it can be seamlessly integrated into endoscopic devices66 and miniature microscopes67,68,69 to enable real-time, non-invasive quantitative imaging of tissues and cells, which might also be useful for, e.g., point-of-care diagnostics with its compactness and efficiency. This might potentially pave the way for their use in intraoperative imaging, providing surgeons with critical, high-resolution insights during a medical procedure70,71. For environmental monitoring, as another example, the presented system may facilitate the development of portable lab-on-a-chip sensors capable of quickly identifying microorganisms and pollutants, streamlining on-site quantitative analysis without delicate and tedious sample preparation steps72,73,74,75. Furthermore, the portability and compactness of these diffractive designs can make them a valuable tool for rapid inspection of materials in industrial settings76,77. Overall, this compact and efficient complex field imager design could be used in various settings, opening new avenues in scientific research and expanding the measurement capabilities for practical, real-world applications.
where N is the total number of pixels within the image. Another metric employed to quantify the performance of our diffractive complex field imaging is the grating image contrast (Q), whose definition is provided in Eq. (1). In Eq. (1), Imax is determined by taking the average intensity of the grating images along the grating orientation and then finding the maximum intensity values within the bar regions, while Imin is computed in a similar way but by locating the minimum values.
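The Imax/Imin extraction described above can be sketched as follows. Since Eq. (1) is not reproduced here, the Michelson-type combination (Imax − Imin)/(Imax + Imin) is an assumption, as are the function name and the toy grating values.

```python
import numpy as np

def grating_contrast(img, axis=0):
    """Sketch of the grating contrast computation: average the intensity
    along the grating orientation (`axis`), then take the max/min of the
    resulting profile. A Michelson-type combination is assumed here."""
    profile = img.mean(axis=axis)
    i_max, i_min = profile.max(), profile.min()
    return (i_max - i_min) / (i_max + i_min)

# Toy vertical grating: alternating bright (1.0) and dark (0.2) bars
bars = np.tile(np.array([1.0, 1.0, 0.2, 0.2]), (8, 2))
```

Averaging along the bar orientation first suppresses pixel noise before the extrema are located, which makes the contrast estimate robust for the low-resolution output FOVs used here.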
Implementation details of diffractive complex field imagers
For the diffractive imager models used for numerical analyses in this manuscript, we used a minimum sampling period of 0.3 mm for simulating the complex optical fields (i.e., 0.375λ for λ = 0.8 mm). The lateral size of each feature on the diffractive layers was also selected as 0.3 mm. Both the input and output FOVs, including FOVPhase and FOVAmp, were set to 18 mm × 18 mm. These fields were discretized into arrays of 15 × 15 pixels, with each pixel measuring 1.2 mm (i.e., ~1.5λ).
For simulating the diffractive imager models used for experimental validation, both the sampling period for the optical fields and the lateral dimensions of the diffractive features were set to 0.4 mm (i.e., ~0.516λm for λm = 0.775 mm). The input and output FOVs in these models were 24 mm × 24 mm (i.e., ~30.97λm × 30.97λm). These fields were discretized into arrays of 5 × 5 pixels, with each pixel measuring 4.8 mm (i.e., ~6.19λm).
For training our diffractive imager models, we randomly extracted 55,000 handwritten English capital letter images from the EMNIST Letters dataset to form our training set. During the training stage, we also implemented an image augmentation technique to enhance the generalization capabilities of the diffractive models: the input images were randomly flipped vertically and horizontally, each with a probability of 50%. For testing our diffractive models, we used a testing image dataset of 10,000 handwritten English capital letter images, which were also randomly extracted from the EMNIST Letters dataset while ensuring no overlap with the training set. In addition, for preparing the blind test images used for evaluating the external generalization capabilities of our models, we used 10,000 handwritten digit images from the MNIST testing dataset and 10,000 QuickDraw images from the QuickDraw dataset63. Before being fed into our diffractive models, all these training and testing images further underwent bilinear downsampling and normalization to match the corresponding dimensions and value ranges of the input amplitude or phase images.
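The flip augmentation described above is a standard operation; a NumPy stand-in for the PyTorch implementation might look like the following, with the helper name `augment` assumed.

```python
import numpy as np

def augment(img, rng):
    """Randomly flip an input image vertically and horizontally, each with
    50% probability (hypothetical helper mirroring the described scheme)."""
    if rng.random() < 0.5:
        img = np.flipud(img)   # vertical flip
    if rng.random() < 0.5:
        img = np.fliplr(img)   # horizontal flip
    return img
```

Because the two flips are sampled independently, each of the four orientations of an input image is equally likely, which is what broadens the training distribution.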
Our diffractive models presented in this paper were implemented using Python and PyTorch. In the training phase, each mini-batch consisted of 64 handwritten letter images randomly selected from the EMNIST dataset78, which were then randomly grouped in pairs to synthesize complex fields. Within each training iteration, the loss value was calculated, and the resulting gradients were back-propagated to update the thickness profiles of each diffractive layer using the Adam optimizer79 with a learning rate of 10−3. The entire training process lasted for 100 epochs, which took ~6 h to complete on a workstation equipped with a GeForce RTX 3090 GPU.
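For completeness, a single Adam update of the trainable thickness values can be sketched in NumPy as below. This mirrors the standard Adam rule with the stated learning rate of 10−3; all other hyperparameters are assumed to be the usual defaults, and the function name is illustrative.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One standard Adam update (sketch) applied to the thickness values
    `theta` given their gradient `grad` at iteration t (1-indexed)."""
    m = b1 * m + (1 - b1) * grad          # first-moment estimate
    v = b2 * v + (1 - b2) * grad ** 2     # second-moment estimate
    m_hat = m / (1 - b1 ** t)             # bias correction
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

In the actual PyTorch implementation this is handled by `torch.optim.Adam`; the sketch only makes explicit what a learning rate of 10−3 implies for the per-step thickness change.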
Experimental terahertz set-up
For our proof-of-concept experiments, we fabricated both the diffractive layers and the test objects using a 3D printer (PR110, CADworks3D). The phase objects were fabricated with spatially varying thickness profiles to define their phase distributions. The amplitude objects were printed with a uniform thickness and then manually coated with aluminum foil to define the light-blocking areas, while the uncoated sections formed the transmission areas, creating the desired amplitude profiles of the test objects. Additionally, we 3D-printed a holder using the same 3D printer, which facilitated the assembly of the printed diffractive layers and input objects to match the relative positions specified in our numerical design. To more precisely control the beam profile illuminating the complex input objects, we 3D-printed a square aperture of 5 × 5 mm and padded the area around it with aluminum foil. This aperture, positioned 120 mm away from the object plane, served as an input spatial filter to clean the beam originating from the source.
To test our fabricated diffractive complex field imager, we employed a THz continuous-wave scanning system, with its schematic presented in Fig. 7c. To generate the incident terahertz wave, we used a WR2.2 modular amplifier/multiplier chain (AMC) followed by a compatible diagonal horn antenna (Virginia Diode Inc.) as the source. Each time, we transmitted a 10 dBm sinusoidal signal at 11.111 or 10.417 GHz (fRF1) to the source, which was then multiplied by a factor of 36 to generate continuous-wave (CW) radiation at 0.4 or 0.375 THz, respectively, corresponding to the illumination wavelengths of 0.75 and 0.8 mm used for the phase and amplitude imaging tasks, respectively. The AMC output was also modulated with a 1 kHz square wave for lock-in detection. We positioned the source antenna very close to the 3D-printed spatial filter aperture to maximize the illumination power input to the system. Next, using a single-pixel detector with an aperture size of ~0.1 mm, we scanned the resulting diffraction patterns at the output plane of the diffractive complex field imager with a step size of 0.8 mm. This detector was mounted on an XY positioning stage constructed from two perpendicularly aligned linear motorized stages (Thorlabs NRT100) for precise control of the detector’s position. For illumination at λ = 0.75 mm or λ = 0.8 mm, a 10 dBm sinusoidal signal was also generated at 11.083 or 10.389 GHz (fRF2), respectively, as a local oscillator and sent to the detector to down-convert the output signal to 1 GHz. The resulting signal was then channeled into a low-noise amplifier (Mini-Circuits ZRL-1150-LN+) with an 80 dB gain, followed by a bandpass filter at 1 GHz (±10 MHz) (KL Electronics 3C40-1000/T10-O/O), effectively mitigating noise from undesired frequency bands.
Subsequently, the signal passed through a tunable attenuator (HP 8495B) for linear calibration before being directed to a low-noise power detector (Mini-Circuits ZX47-60). The voltage output from the detector was measured using a lock-in amplifier (Stanford Research SR830), which utilized a 1 kHz square wave as the reference signal. The readings from the lock-in amplifier were then calibrated into a linear scale. In our post-processing, we further applied linear interpolation to each intensity field measurement to align with the pixel size of the output FOV used in the design phase. This process finally resulted in the output measurement images presented in Fig. 6c, f.
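The frequency plan of the source described above can be sanity-checked with a short calculation: the AMC multiplies the RF input frequency by 36, and the illumination wavelength follows from λ = c/f. The function name below is illustrative.

```python
C = 299_792_458.0  # speed of light, m/s

def illumination_wavelength_mm(f_rf_ghz, mult=36):
    """Convert the AMC input frequency (GHz) to the emitted CW wavelength
    (mm), given the chain's multiplication factor."""
    f_out_hz = f_rf_ghz * 1e9 * mult
    return C / f_out_hz * 1e3  # meters -> millimeters

w_phase = illumination_wavelength_mm(11.111)  # ~0.75 mm (0.4 THz channel)
w_amp = illumination_wavelength_mm(10.417)    # ~0.8 mm (0.375 THz channel)
```

This confirms that the two stated fRF1 settings reproduce the 0.75 mm (phase) and 0.8 mm (amplitude) illumination wavelengths within a fraction of a percent.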