Abstract
We demonstrate transfer learning as a tool to improve the efficacy of training deep learning models based on residual neural networks (ResNets). Specifically, we examine its use for the study of multi-scale, electrically large metasurface arrays under open boundary conditions in electromagnetic metamaterials. Our aim is to assess the efficiency of transfer learning across a range of problem domains that vary in their resemblance to the original base problem for which the ResNet model was initially trained. We use a quasi-analytical discrete dipole approximation (DDA) method to simulate electrically large metasurface arrays and obtain ground truth data for training and testing of our deep neural network. Our approach can save significant time when examining novel metasurface designs by harnessing the power of transfer learning, as it effectively mitigates the pervasive data bottleneck issue commonly encountered in deep learning. We demonstrate that in the best case, when the source task is sufficiently similar to the target task, a new task can be effectively trained using only a few data points yet still achieve a test mean absolute relative error of 3 % with a pre-trained neural network, realizing a data reduction by a factor of 1000.
1 Introduction
The fields of photonics, plasmonics, metamaterials, and metasurfaces, along with other artificial electromagnetic material (AEM) systems, have witnessed remarkable advancements facilitated by the maturation of deep learning (DL) techniques [1], [2], [3]. Numerous studies have highlighted the benefits of employing data-driven approaches, such as accelerated modeling and inverse design of complex systems [4], [5], [6], [7], [8]. Traditionally, computational electromagnetic solvers (CEMS) such as finite difference time domain (FDTD) and the finite element method (FEM) have been extensively used in AEM research, effectively solving Maxwell's equations and enabling accurate design for both applied and basic research pursuits [9], [10], [11], [12]. However, CEMS operate as grid solvers, requiring complete re-simulation of the entire problem even for minor changes in geometry. While this may not pose significant challenges for simple geometric arrangements or single unit-cell periodic systems, it becomes burdensome for electrically large arrays, often taking hours or even days for simulation [13], [14], [15]. In contrast, deep learning offers a compelling alternative by training neural networks on a relatively small set of ground truth CEMS simulations to create surrogate (proxy) models for a specific metasurface system [16]. When using deep learning, one typically creates a dataset of the form $\mathcal{D} = \{(g_i, s_i)\}_{i=1}^{N}$, where each $g_i$ encodes the geometric parameters of one design and $s_i$ is its simulated scattering response.
Deep learning – and DNNs in particular – have found success as surrogate models for a wide variety of problems, enabling the rapid exploration of the scattering properties of AEMs of unprecedented complexity. DNNs have also displayed considerable potential in tackling inverse problems within AEM studies, wherein the goal is to infer the geometric structure needed to obtain some designed scattering response (i.e., find g given some setting of s) [1], [8]. However, generating the dataset to train the DNNs can be time-consuming and even impracticable in computational, and especially experimental, scenarios depending on the problem's complexity and the design space's size. It is important to note that the required dataset size for a specific level of accuracy in deep learning is unknown a priori, potentially introducing significant risk into its use in resource-constrained settings. Additionally, this data bottleneck issue persists even when deploying deep learning to solve new – but highly similar – problems, as this requires the procurement of a completely new training dataset. For example, two problems may have identical geometric parameterizations but need to be solved in different frequency ranges (i.e., same g but different s), or they may share the same spectral range but have distinct geometric parameterizations (i.e., same s but different g). In both cases, data generation must start anew, as a well-trained network's accuracy is limited to the specific problem it was trained on. However, when two problems are related, it may be possible to leverage patterns observed in one problem's data – referred to as the source problem – to reduce the amount of new training data needed to achieve satisfactory accuracy on the second distinct problem, called the target problem.
This general concept, known as transfer learning in machine learning literature [17], [18], has proven highly effective in mitigating the data bottleneck problem across various applications, such as image processing [19], [20] and natural language processing [21].
DNNs are typically trained using mini-batch gradient descent, which is an iterative process where the model parameters, Θ, are repeatedly updated using small subsets of the training data. The model parameter settings at the outset of training can have a significant impact on the final performance of the model. In conventional training the model parameters are initialized to random values, using some predefined sampling distribution, such as Xavier or Kaiming initialization [22]. The approach for transfer learning, however, is a two-stage process. The first step involves training a DNN on the source task, and the second stage is to use the resultant accurate model parameters as an initialization for the training process on the target task. In this approach, the model is said to be pre-trained on the source task, and then fine-tuned on the target task. The transfer learning method has been found to be highly effective in a wide variety of application domains, often dramatically improving model performance on the target tasks, see Figure 1.
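The two-stage recipe above can be sketched with a deliberately tiny toy model. The sine-fitting "tasks", the three-parameter model, and the numeric-gradient trainer below are all invented for illustration; they stand in for the ResNet and the metasurface tasks rather than reproduce them.

```python
import math

def predict(theta, x):
    a, b, c = theta                     # toy model: y = a*sin(b*x + c)
    return a * math.sin(b * x + c)

def mse(theta, data):
    return sum((predict(theta, x) - y) ** 2 for x, y in data) / len(data)

def train(theta, data, lr=0.01, steps=2000, eps=1e-5):
    # Plain gradient descent with numeric gradients -- a stand-in for the
    # mini-batch gradient descent used to train the DNN.
    theta = list(theta)
    for _ in range(steps):
        base = mse(theta, data)
        grad = []
        for i in range(len(theta)):
            bumped = list(theta)
            bumped[i] += eps
            grad.append((mse(bumped, data) - base) / eps)
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta

# Source task: plentiful data; target task: a related problem, 3 points only.
src = [(x / 10, math.sin(2 * x / 10)) for x in range(-30, 31)]
tgt = [(x, math.sin(2 * x + 0.3)) for x in (-1.0, 0.2, 1.3)]

theta0 = [0.5, 1.0, 0.0]                     # "random" initialization
theta_src = train(theta0, src)               # stage 1: pre-train on source
theta_ft = train(theta_src, tgt, steps=500)  # stage 2: fine-tune on target
```

The key point is the last two lines: fine-tuning starts gradient descent from the pre-trained parameters rather than from the random initialization `theta0`.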
Recently, several attempts have been made to apply transfer learning methods to AEM problems [23], [24], [25], [26]. These studies use a common input parameterization space and treat a metamaterial element or array as an image. For example, in [23], [24], transfer learning is applied to different discrete image representations of a two-dimensional shape consisting of a single metamaterial element. Another method examined an array-level problem in an inverse approach, solving for intermediate physical quantities, such as the phase map, for the target scattering properties, such as far-field patterns [25], [26]. However, our study distinguishes itself from other works by accepting a much more general input, which permits the study of drastically different external physical conditions. In our work, each entry in the input array represents an abstract geometric parameter of the element, such as the radius or the length, resulting in a simple and scalable representation of the geometric space. We solve the direct problem to obtain the scattering properties resulting from a diverse set of geometries. Our work presents a more versatile transfer learning application valid for a much broader range of AEM problems than the aforementioned works.
Despite the effectiveness of transfer learning shown in some recent works, it is still limited to relatively similar problems with a shared input space. In addition, pre-training is seldom used in AEM applications for several reasons. First, the geometric parameterizations of AEM problems vary widely, meaning that the definition and dimensionality of the input space, g, can differ significantly. This makes it difficult to apply transfer learning when the dimensionality of the source and target problems is different. Second, transfer learning is most effective when the meaning of the input parameters is similar across the source and target problems. This condition is rarely satisfied across AEM problems that have been studied with deep learning. In [27], stacks of both spherical and planar layers were studied where the definition of the thickness was different between the two problems, but still related via a coordinate transform. However, for more significant changes in an input geometrical parameter the settings of g can refer to fundamentally different geometric structures of the AEM, making pre-training difficult or less effective. Finally, it is not currently common practice for authors in the AEM community to share their CEMS datasets or pre-trained model parameters, which makes it difficult for other researchers to fine-tune these models on new AEM tasks. In this work, we aim to explore the use of transfer learning on a much wider range of AEM problem types where physical conditions change and the input space does not have the same meaning for different problems.
1.1 Contributions of this work
In this work we demonstrate the tremendous potential benefits of transfer learning for AEM tasks, motivating its broader adoption in the AEM community. To study the benefits of transfer learning, we use it to develop surrogate DNN-based models for the multi-scale problem of scattering from electrically large random metasurface arrays – a problem commonly encountered in applications such as metamaterial lens design [28], [29], disordered metasurfaces [30], [31], and Huygens metasurfaces [32]. We begin by generating a large number of simulations for a source task, and then explore several different target tasks with varying problem features – the base material, geometric shape, incident angle, and polarization of the excitation. We evaluate the accuracy of DNN-based surrogate models on each target task as a function of the quantity of data used to train the model, and perform a direct quantitative comparison to the accuracy of source tasks with and without transfer learning. Our results indicate that, using transfer learning, we can often dramatically reduce the quantity of training data that is needed to achieve a desired prediction accuracy for new problems, e.g., often over an 80 % reduction, and as high as 90 %. Crucially, we also find that transfer learning is never detrimental over our diverse target problem settings, suggesting that there is little risk in utilizing it.
To enable the benefits of transfer learning, we propose a geometric parameterization that makes transfer learning straightforward across all of the proposed tasks, and allows us to leverage existing successful DNN-based models from the computer vision literature. We therefore represent the geometric structure of the AEMs as a 2D array – similar to an image – where each entry in the array corresponds to a geometric parameter of the unit-cell, as illustrated in Figure 1. This structure allows us to adopt convolutional DNN model architectures from the computer vision community, which can take advantage of the spatial structure of the underlying metasurface when making predictions. This parameterization also means that little or no modification is required to utilize pre-trained model parameters from the source task, making it easy to use pre-trained weights for each target task.
This paper is organized as follows. First, we present the analysis of using a residual neural network structure to solve a base metamaterial array scattering problem, which serves as the source problem from which we transfer knowledge. Second, we briefly introduce the discrete dipole approximation (DDA) method for efficient data generation. Next, we explore five target tasks involving various condition changes, including element material, shape, and excitation. Finally, we perform a quantitative analysis on the effectiveness of transfer learning in terms of error reduction as a function of dataset size, as well as data reduction for given levels of accuracy. For each target task we also compare the transfer learning approach against the case where the network is trained from random initialization without any transferred knowledge.
2 Methods
2.1 Array scattering problem settings
We aim to solve the scattering problem for different random arrays under similar, yet distinct, physical conditions. Specifically, one class of problems, or equivalently one task, examines scattering from a random metasurface array consisting of elements with distinct morphology but with fixed periodicity and excitation fields. Another problem class involves array scattering from the base problem, but with different excitation fields. In each task instance, the input variables are the geometric parameters of different random arrays, denoted as g, and the output variables are the far-field patterns, i.e., the electromagnetic responses of the array, denoted as s.
2.2 Deep convolutional neural network
The scattering patterns from the metamaterial array are determined by the specific spatial arrangement of single elements in the array subject to different excitations and other environmental conditions. Therefore to make our method as general as possible, i.e. transferable to as many different types of metamaterials and physical scenarios as possible, we represent the geometry space g as a numeric matrix. This numeric matrix g is of the size C × N × M, where C represents the total number of tunable geometric parameters for each single element, e.g. some critical length or radius, and N, M are the 2D array sizes. Therefore, large arrays of metamaterial elements can be characterized in a compact and scalable fashion even if the elements are three-dimensional with fine features. The use of a CNN also permits preservation of the spatial relationship between elements in the input numeric matrix, thereby allowing the convolutional filters to learn element interactions.
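A minimal sketch of this encoding follows: each unit cell carries C abstract geometric parameters (here C = 1, a sphere radius in micrometres, drawn uniformly over the range used for the datasets in this paper), arranged on an N × M grid that mirrors the physical layout. The helper function is our illustration, not code from the paper.

```python
import random

def random_array_geometry(n=50, m=50, c=1, r_min=0.45, r_max=1.10, seed=0):
    """Build a C x N x M geometry matrix g for one random array instance."""
    rng = random.Random(seed)
    # g[channel][row][col] -> one geometric parameter per array element
    return [[[rng.uniform(r_min, r_max) for _ in range(m)]
             for _ in range(n)]
            for _ in range(c)]

g = random_array_geometry()
print(len(g), len(g[0]), len(g[0][0]))  # C, N, M
```

Because g is shaped like a (multi-channel) image, convolutional architectures can be applied to it directly, and its shape stays fixed across tasks, which is what makes weight transfer between tasks mechanical.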
While the geometric layout can be readily represented using matrices, the random-array far-field radiation pattern is generally a function of the 3D spherical angles, as depicted in Figure 1. For simplicity, and without loss of generality, we consider only one representative slice of the 3D radiation pattern along a constant forward-scattered azimuthal angle. The output far-field patterns |E| are sampled along the forward scattering direction at elevation (polar) angles from 0° (North Pole) to 180° (South Pole), represented by a 181 × 1 vector of positive numbers, so that $s \in \mathbb{R}^{181}_{>0}$ with $s_i = |E(\theta_i)|$ at $\theta_i = (i - 1)°$ for $i = 1, \dots, 181$.
Due to the large magnitude variations of the far-field patterns, we formulated a mixed linear-log loss, which we found to be more effective than the traditional MSE loss (applied in either the linear or log scale). We define our training loss as

$$\mathcal{L}(s, \hat{s}) = \frac{1}{181}\sum_{i=1}^{181}\left[\left(s_i - \hat{s}_i\right)^2 + \alpha\,\big(h(s_i) - h(\hat{s}_i)\big)^2\right], \quad (1)$$

where h(x) = ln(1 + x) and $\hat{s}$ is the network prediction. The log-scale term is used to decrease the loss for the scattering pattern expressed in a dB scale. We found that α = 5 is a suitable choice of hyperparameter for the far-field scattering problem in this paper, balancing the loss between the linear and logarithmic scales. For the test set we use the mean absolute relative error (MARE), defined as

$$\mathrm{MARE}(s, \hat{s}) = \frac{1}{181}\sum_{i=1}^{181}\frac{|s_i - \hat{s}_i|}{|s_i|}.$$
This metric ensures that the reported loss is independent of the absolute magnitude level of the far-field signal, making it suitable for comparisons across different problems.
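The two metrics can be sketched in a few lines. MARE follows its standard definition; for the mixed linear-log loss, the particular combination below – linear MSE plus an α-weighted MSE of h(x) = ln(1 + x), with α = 5 as stated in the text – is our assumption of one plausible realization, not necessarily the paper's exact Eq. (1).

```python
import math

ALPHA = 5.0  # balance between linear and log terms (value from the text)

def h(x):
    return math.log1p(x)  # h(x) = ln(1 + x)

def mixed_loss(pred, true, alpha=ALPHA):
    # Assumed form: linear MSE + alpha * MSE in the h-transformed scale.
    n = len(pred)
    linear = sum((p - t) ** 2 for p, t in zip(pred, true)) / n
    log_scale = sum((h(p) - h(t)) ** 2 for p, t in zip(pred, true)) / n
    return linear + alpha * log_scale

def mare(pred, true):
    # Mean absolute relative error: magnitude-independent test metric.
    return sum(abs(p - t) / abs(t) for p, t in zip(pred, true)) / len(pred)

true = [10.0, 1.0, 0.1]   # far-field magnitudes spanning two decades
pred = [11.0, 0.9, 0.12]
print(round(mare(pred, true), 4))
```

Note how MARE weights the 0.1-magnitude sample as heavily as the 10.0-magnitude one, which is exactly why it suits comparisons across problems with different signal levels.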
CNNs can be constructed with different architectures, which refer to the order in which data is processed (e.g., its connectivity) and the specific processing that takes place. In this work, we use a residual convolutional neural network (ResNet), which has been found highly effective for computer vision problems [33] by exploiting the local information between close neighbors – similar to the input data we consider here. Using a 2-dimensional matrix encoding of our material design, we can leverage state-of-the-art CNN architectures from the computer vision literature. Furthermore, we hypothesize that the 2-D ResNet convolution operations resemble the far-field inference procedure employed for Green's functions, and therefore may be well-suited for modeling far-field radiation patterns. Our ResNet consists of one base convolution layer, followed by six residual blocks with (64, 64, 128, 128, 256, 256) convolution channels. Each residual block in the ResNet architecture consists of two identically sized convolutional layers connected by a skip connection. Each convolutional (Conv) layer is followed by a batch normalization (BN) layer and a ReLU activation. The base convolution kernel has a size of 7 × 7, and the remaining convolutional layers have a kernel size of 3 × 3. These residual blocks are followed by one max pooling layer of 2 × 2, and one fully-connected layer to yield the 181 × 1 vector output. If sufficient training data are collected and used to train this CNN model such that its predictions $\hat{s} = f_\Theta(g)$ closely match the ground truth on held-out designs, the trained network can then serve as a fast surrogate for the numerical solver on that task.
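As a rough sanity check on model size, the layer list above can be turned into a parameter count. The strides, padding, the 1 × 1 skip projections at channel changes, and the 25 × 25 feature map feeding the fully-connected head (a 50 × 50 input after a single 2 × 2 pooling) are our assumptions, so the total is indicative rather than an exact reproduction of the paper's network.

```python
def conv_params(c_in, c_out, k):
    return c_in * c_out * k * k + c_out  # kernel weights + biases

def bn_params(c):
    return 2 * c                         # batch-norm scale + shift

total = conv_params(1, 64, 7) + bn_params(64)  # base 7x7 conv, 1 input channel
c_in = 64
for c_out in (64, 64, 128, 128, 256, 256):     # six residual blocks
    total += conv_params(c_in, c_out, 3) + bn_params(c_out)   # first 3x3 conv
    total += conv_params(c_out, c_out, 3) + bn_params(c_out)  # second 3x3 conv
    if c_in != c_out:                          # 1x1 projection on the skip path
        total += conv_params(c_in, c_out, 1)
    c_in = c_out

total += 256 * 25 * 25 * 181 + 181             # fully-connected 181-output head
print(f"approximate trainable parameters: {total:,}")
```

Under these assumptions the fully-connected head dominates the count, a common property of networks that map pooled feature maps to long output vectors.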
2.3 Transfer learning using fine-tuning
As mentioned, the trained CNN model is accurate only for the dataset on which it was trained, and therefore cannot be used directly to solve a new task. An entirely new dataset D_tgt for the target problem needs to be generated, and a new CNN must be trained using D_tgt. However, if the new task shares some common underlying physics, transfer learning can leverage the information incorporated in the trained CNN for the new task. We begin with an initial modeling problem, termed the source problem, for which a large training dataset exists, denoted D_src. We then train a ResNet model on D_src, resulting in a trained CNN with weights θ_src that achieve highly accurate predictions for (g, s) ∈ D_src. In the target problem, with dataset denoted D_tgt, we vary some aspects of the experimental scenario of the base problem, such as the geometry or excitation. In the traditional approach, we would need to collect a large quantity of data again to obtain accurate model parameters for D_tgt; however, we hypothesize that initializing the gradient descent training procedure with the model parameters θ_src will lead to improved performance compared to randomly initializing the model parameters – the procedure typically used when training CNNs for a new task.
2.4 Discrete dipole approximation method for scattering
To generate the dataset in a practically realizable time, a semi-analytic method called the discrete dipole approximation (DDA) is used as an alternative to commercial full-wave simulation software. A finite-sized metasurface array may consist of dielectric or metallic elements and may be of large electrical size. It is computationally expensive or even impossible to simulate the entire array depending on the electrical length and desired wavelength range, which specifies the grid size and therefore the total problem size. The local slowly-varying approximation (SVA) is usually applied to these types of array problems by solving for the electromagnetic response of a single metamaterial element with periodic boundary conditions, thereby modeling an infinite identical array [34]. The local periodic response is then used to approximate the actual response for the element when embedded in a random array. The SVA is valid if the response fields from element to element are smooth, and the change in the coupling between elements, due to the random arrangement, is negligible. However, this method may result in poor accuracy since the local response of one element can be drastically different from the periodic response of the same element with different neighboring elements. The problem we explore here cannot be studied with SVA and therefore we use the polarizability retrieval method together with the discrete dipole approximation method to extract the response of the elements embedded in a random array. The DDA method, along with the retrieval method, enables accurate and fast large-area metasurface simulation.
Many sub-wavelength metasurface elements can be accurately approximated by an electric dipole moment $\mathbf{p}$ and a magnetic dipole moment $\mathbf{m}$ induced by the local fields, where, for example, $\mathbf{p} = \bar{\bar{\alpha}}_e \mathbf{E}$ with $\bar{\bar{\alpha}}_e$ the electric polarizability tensor (and analogously $\mathbf{m} = \bar{\bar{\alpha}}_m \mathbf{H}$). The polarizability tensors are extracted for each element geometry using the retrieval method discussed above. With a polarizability tensor determined for every desired metamaterial element, the random metasurface array scattering problem is then reduced to solving a linearized integral equation system which, for an arbitrary $j$th metamaterial element in the array, is described by

$$\mathbf{p}_j = \bar{\bar{\alpha}}_e^{\,j}\Big(\mathbf{E}_0^{\,j} + \sum_{k \neq j}\big[\bar{\bar{G}}_{EE}^{\,jk}\,\mathbf{p}_k + \bar{\bar{G}}_{EM}^{\,jk}\,\mathbf{m}_k\big]\Big), \qquad \mathbf{m}_j = \bar{\bar{\alpha}}_m^{\,j}\Big(\mathbf{H}_0^{\,j} + \sum_{k \neq j}\big[\bar{\bar{G}}_{ME}^{\,jk}\,\mathbf{p}_k + \bar{\bar{G}}_{MM}^{\,jk}\,\mathbf{m}_k\big]\Big),$$

where $\mathbf{E}_0^{\,j}$, $\mathbf{H}_0^{\,j}$ are the incident electric and magnetic fields at the $j$th element location, and the $\bar{\bar{G}}^{\,jk}$ are the free-space dyadic Green's functions coupling elements $k$ and $j$. The far-field pattern is the coherent summation of the radiated fields from all the electric and magnetic dipoles within the far-field approximation; the field radiated by an electric dipole is given by

$$\mathbf{E}_p^{\mathrm{ff}} \propto \frac{k^2}{4\pi\varepsilon_0}\,\frac{e^{ikr}}{r}\,(\hat{\mathbf{n}} \times \mathbf{p}) \times \hat{\mathbf{n}},$$

and for a magnetic dipole by

$$\mathbf{E}_m^{\mathrm{ff}} \propto -\frac{k^2}{4\pi}\sqrt{\frac{\mu_0}{\varepsilon_0}}\,\frac{e^{ikr}}{r}\,\hat{\mathbf{n}} \times \mathbf{m},$$

where $\hat{\mathbf{n}}$ is the unit observation direction, $k$ is the free-space wavenumber, and $r$ is the distance to the observation point.
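A minimal numerical sketch of the coherent far-field summation follows: scalar dipole amplitudes on a regular grid, summed with the far-field phase factor $e^{-ik\,\mathbf{r}_j \cdot \hat{\mathbf{n}}}$. The grid size, period, and wavelength are toy values for illustration – not the paper's 50 × 50, 11 μm configuration – and polarization and vector effects are ignored.

```python
import cmath
import math

def far_field_slice(p, period, wavelength, thetas_deg, phi_deg=180.0):
    """|E| along a constant-azimuth slice for a grid of scalar dipoles p."""
    k = 2 * math.pi / wavelength
    phi = math.radians(phi_deg)
    pattern = []
    for theta_deg in thetas_deg:
        theta = math.radians(theta_deg)
        nx = math.sin(theta) * math.cos(phi)  # in-plane components of the
        ny = math.sin(theta) * math.sin(phi)  # unit observation direction
        total = 0j
        for i, row in enumerate(p):
            for j, pij in enumerate(row):
                phase = -1j * k * (nx * i * period + ny * j * period)
                total += pij * cmath.exp(phase)  # coherent summation
        pattern.append(abs(total))
    return pattern

dipoles = [[1.0] * 8 for _ in range(8)]  # uniform 8 x 8 toy array
pattern = far_field_slice(dipoles, period=2.0, wavelength=11.0,
                          thetas_deg=range(0, 181, 5))
print(len(pattern), round(pattern[0], 3))  # broadside: all 64 dipoles in phase
```

At broadside (θ = 0°) every phase factor is unity, so the magnitudes add directly; away from broadside the phases interfere, producing the angular pattern.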
3 Results
In this section, we present the results of applying transfer learning from one base problem to five target problems (see Table 1).
Task | Name | Material | Shape | Incident angle | Polarization
---|---|---|---|---|---
0 | Base | Silicon | Sphere | 13° | TE
1 | 0° incident angle | Silicon | Sphere | 0° | TE
2 | 30° incident angle | Silicon | Sphere | 30° | TE
3 | TM polarization | Silicon | Sphere | 13° | TM
4 | PEC spheres | PEC | Sphere | 13° | TE
5 | Al crosses | Aluminum | Cross | 13° | TE
3.1 Tasks and datasets
All six problems, including the original base problem, are shown in Table 1; each consists of the computed scattered far fields of a 2D metamaterial array of 50 × 50 randomly chosen elements in free space. As mentioned, each problem differs in either the type of radiation illuminating the array, or in the material and geometry of the array elements. Different instances of these problems – the samples in the dataset – differ in their individual element sizes, with all other conditions fixed. The base problem involves calculating the scattered far field of an array of silicon spheres under a TE-polarized plane wave incident at 13° from the surface normal at a wavelength of 11 μm. The dataset for each problem consists of 50,000 training samples and another 5000 test samples, all generated by the DDA method described here. The geometry parameter for each metamaterial element in the array is randomly drawn from a uniform distribution spanning 0.45 μm to 1.1 μm, representing the radius for a sphere array or the half-length for an array of metallic cross elements. We have chosen transfer tasks which we expect will have varying similarity to the base problem, and therefore which will yield varying degrees of success. The first three target problems shown in Table 1 each keep the metasurface geometry and material constant, but allow the state of external radiation to vary. In tasks 4 and 5 the base material is changed to PEC and aluminum (Al), respectively, and in task 5 both the material and shape of the metasurface element change, thus representing a two-factor change from the base problem.
In Figure 3(a) we show a schematic of the ResNet used in our study. The geometry of the metasurface array is represented here as a colormap matrix with the colorbar showing the radius of each sphere in the array. The matrix then feeds into the ResNet, which we depict here in AlexNet style [37]. As mentioned, the ResNet is trained on a dataset obtained from our DDA solver and thus the output predicts the far field scattering. To validate the accuracy of our DDA modeling approach we perform a direct comparison to a commercial computational electromagnetic solver. However, CEMS is limited in the maximum array size which can be studied; therefore here we reduce the metasurface array size to 30 × 30. On the right side of Figure 3(a) we show a slice of the far-field radiation taken along ϕ = 180° and for polar angles from 0° ≤ θ ≤ 180°, where the ground truth is shown as a dashed blue curve and the ResNet prediction is shown as the solid red curve.
Figure 3(b) presents a schematic depicting the transfer learning approach used for our study. The top AlexNet diagram in Figure 3(b) represents the base problem from which we transfer the parameters of the trained ResNet to the new problem under study, shown as the middle AlexNet diagram. We also compare the transfer learning approach to training a ResNet from scratch, which is shown at the bottom of Figure 3(b). In Figure 3(c) we compare the output of our DDA-trained ResNet to computational simulations obtained from CST Microwave Studio for all target problems listed in Table 1. Here the network predictions are shown as solid red curves, while the ground truth simulations are shown as dashed blue curves. All the neural network predictions are from the best model trained from random initializations using a total of 50,000 data points.
3.2 Comparison of training from random initialization and transfer learning
To evaluate the effectiveness of the transfer learning approach for the aforementioned five target tasks, two training schemes are implemented and compared for each problem. The first approach involves training a neural network with randomly initialized parameters, while the second is the transfer learning approach, which involves fine-tuning a neural network initialized with the best model from the base problem – see Table 1. In the experiments conducted, the transferred neural network parameters are obtained from the best training results of the base problem, which used a total of 50,000 data points. Identical datasets of varying sizes are used for both training schemes: from 500 to 1000 in steps of 100, from 1000 to 10,000 in steps of 1000, and from 10,000 to 50,000 in steps of 5000. The optimal model for each case is determined by searching over a grid of learning rates ([0.1, 0.01, 0.001, 0.0001]) and regularizers ([0.01, 0.001, 0.0001, 0.00001]). Each experiment is trained five times using the best hyperparameter combination to account for randomness. The resulting best models for both training schemes at different dataset sizes are then evaluated using a test dataset of size 5000. Although the mixed loss shown in Eq. (1) is employed during the training phase, we use a linear MARE loss in the testing phase, and for reporting and visualization purposes. Moreover, the MARE test loss is a normalized measure which enables a comparison among different problem types.
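The model-selection loop just described can be sketched as a grid search with repeated runs. Here `train_and_eval` is a hypothetical stand-in for the actual ResNet training routine; its toy score simply has a unique minimum so the loop has something to find.

```python
import itertools

LEARNING_RATES = [0.1, 0.01, 0.001, 0.0001]
REGULARIZERS = [0.01, 0.001, 0.0001, 0.00001]
N_REPEATS = 5  # repeated runs per configuration to average out randomness

def train_and_eval(lr, reg, seed):
    # Placeholder for "train the ResNet with (lr, reg), return test MARE".
    # This toy score is minimized at lr = 0.001, reg = 0.0001.
    return (lr - 0.001) ** 2 + (reg - 0.0001) ** 2 + 1e-6 * seed

def grid_search():
    best = None
    for lr, reg in itertools.product(LEARNING_RATES, REGULARIZERS):
        score = sum(train_and_eval(lr, reg, s)
                    for s in range(N_REPEATS)) / N_REPEATS
        if best is None or score < best[0]:
            best = (score, lr, reg)
    return best

score, lr, reg = grid_search()
print(lr, reg)
```

Averaging over the repeats before comparing configurations keeps a single lucky initialization from winning the grid search.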
Figure 4 shows the test MARE of the transfer learning (orange) and random (blue) models as a function of dataset size for each target problem. Here the solid curve is the average of the five aforementioned experiments, and the associated shaded area represents the 95 % confidence interval. We find that the transfer learning scheme consistently results in lower test loss for all target problems compared to random initialization. Further, the test MARE difference between transfer learning and random initialization is largest for small dataset sizes and gradually diminishes as the dataset grows. Notably, the advantage of transfer learning varies among the five target tasks, which can be intuitively observed through the gap between the two loss curves. The benefits of transfer learning are more modest for the two angle-variation cases shown in Figure 4(b) and (c), while the TM polarization case shows the most significant improvement.
3.3 Quantitative analysis of the advantage of transfer learning
We comprehensively examine the advantages of the transfer learning approach by comparing its performance using two metrics. In the first case we measure the relative reduction in error between the transfer learning case and the random case, assuming a fixed quantity of target-task training data is available – denoted r_e. We also examine the relative reduction in training data required in the transfer learning case to achieve a desired level of prediction error, as compared to the random case – denoted r_D. To compare the relationship between the dataset size and error, we fit the data from Figure 4 using a third-order polynomial to obtain a smooth e = g(|D|), where |D| denotes the size of a dataset D. Using the fitted functions, both r_e and r_D can be calculated, and the results are shown in Figure 5. The error reduction ratio is a vertical cut of the data presented in Figure 4(b)–(f) and is defined as

$$r_e(|D|) = \frac{e_{TL}(|D|)}{e_{RI}(|D|)},$$

where e_TL is the test MARE using transfer learning at a dataset size of |D|, and e_RI is the test MARE using random initialization, also at |D|. In Figure 5(a) we see that r_e is most favorable for small dataset sizes; for example, at |D| = 300 we find r_e ≤ 10 % for Tasks 3–5 and r_e ≈ 65 % for Tasks 1 and 2 – see Table 1. For all dataset sizes explored, transfer learning always leads to improved accuracy on the test set in comparison to training from random initialization, i.e., r_e is less than unity. However, the advantage of transfer learning gradually diminishes with increasing dataset size, and as we approach |D| ≈ 50,000 we find r_e → 1.
The data reduction ratio r_D compares the data required for the transfer learning approach to achieve a specific test MARE with the data required for random initialization to achieve the same test MARE. Therefore r_D corresponds to horizontal cuts of the data presented in Figure 4(b)–(f), and is defined as

$$r_D = \frac{|D_{TL}|}{|D_{RI}|},$$

where |D_TL| is the dataset size for transfer learning and |D_RI| is the dataset size for random initialization. In Figure 5(b) we plot r_D as a function of new sample simulations. If we take a test MARE of 3 % as a point of comparison, we find that the data can be reduced by over a factor of 2000 compared to random initialization for the TM case (Task 3), whereas the data reductions range from 94 % for the PEC spheres (Task 4) down to approximately 22 % for the 0° incidence case (Task 1). For all tasks explored we find that a significant reduction in data is achieved with transfer learning compared to random initialization. The exact dataset sizes and MARE values are compared in Table 2 at two reference points, namely MARE = 0.03 and a dataset size of D = 1000 points. For this comparison, the data of Figure 4(b)–(f) have been interpolated and, as observed from Table 2, achieving a test MARE of 3 % requires drastically different amounts of data across tasks. For example, Task 3 needs only D = 3, whereas Tasks (4, 5, 2, 1) require D = (776, 924, 13,691, 17,353). We note that for the TM case (Task 3) a third-order polynomial fit – rather than the interpolated data – is used for Table 2, as denoted by the table footnote.
Task | Name | Data required for MARE = 0.03 (random init.) | Data required for MARE = 0.03 (transfer learning) | MARE for D = 1000 (random init.) | MARE for D = 1000 (transfer learning)
---|---|---|---|---|---
3 | TM | 7441 | 3^a | 0.101 | 0.015
4 | PEC spheres | 12,050 | 776 | 0.103 | 0.029
5 | Al crosses | 7102 | 924 | 0.094 | 0.028
2 | 30° | 18,113 | 13,691 | 0.127 | 0.086
1 | 0° | 22,158 | 17,353 | 0.103 | 0.029

^a Value determined using a third-order polynomial fit described in the text.
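The mechanics of the two metrics can be sketched directly: r_e is a ratio of the two fitted error curves at a fixed dataset size (a vertical cut), while r_D inverts each curve at a fixed error level (a horizontal cut). The two error curves below are hypothetical smooth fits e = g(|D|) standing in for the polynomial fits to Figure 4, not the paper's measured data.

```python
import math

def e_RI(d):  # toy fit: random-initialization error vs dataset size |D|
    return 0.5 / math.sqrt(d)

def e_TL(d):  # toy fit: transfer-learning error vs dataset size |D|
    return 0.2 / math.sqrt(d)

def r_e(d):
    # Vertical cut: error ratio at a fixed dataset size.
    return e_TL(d) / e_RI(d)

def required_data(err_fn, target, lo=1.0, hi=1e7):
    # Bisection on a monotonically decreasing error curve: the dataset
    # size at which err_fn first reaches the target error.
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if err_fn(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def r_D(target_mare):
    # Horizontal cut: data ratio at a fixed test MARE.
    return required_data(e_TL, target_mare) / required_data(e_RI, target_mare)

print(round(r_e(1000), 3), round(r_D(0.03), 3))
```

For these particular toy curves r_e is constant in |D| and r_D equals r_e squared; the real fitted curves in Figure 5 have no such closed-form relationship, which is why both metrics are reported separately.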
The large variance in the data required to achieve a certain error level, presented in Table 2, hints that some target tasks share more underlying physics with the base problem than others. To elucidate this connection, we perform a similarity analysis on the output far-field scattering for each of the transfer tasks. Figure 6(a) shows the results of a principal component analysis (PCA) performed on the 181 × 1 far-field scattering patterns. The analysis utilized a total of 60,000 samples, with each task – including the base task – contributing 10,000. As evident in the feature-wise basis, there is some overlap between the TM Task 3 (orange) and Base Task 0 (brown) data points in the principal components' space. The other target tasks (blue, purple, green, and red) have centroids at varying distances from the base task. However, it is crucial to remember that the PCA operates on the output (i.e., the far-field scattering patterns). Therefore, close proximity in the output space (codomain) does not guarantee the same in the input space (domain); in other words, two overlapping points from the TM and base problems may not correspond to the same input geometry features. Nonetheless, we find that the centroid distance of the target-task clusters in Figure 6(a) correlates relatively well with the critical transfer learning metrics presented in Table 2. In Figure 6(b) we plot the PCA centroid distance together with r_D to reach a test MARE = 0.03 (open circle symbols) and r_e for D = 1000 (open triangle symbols).
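The similarity measure itself is simple: the Euclidean distance between task-cluster centroids in the principal-component plane. The Gaussian point clouds below are synthetic stand-ins for the projected far-field patterns of two tasks; the helper merely illustrates the centroid-distance computation.

```python
import math
import random

def centroid(points):
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def centroid_distance(cloud_a, cloud_b):
    (ax, ay), (bx, by) = centroid(cloud_a), centroid(cloud_b)
    return math.hypot(ax - bx, ay - by)

rng = random.Random(0)
base_task = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(1000)]
near_task = [(rng.gauss(0.3, 1.0), rng.gauss(0.0, 1.0)) for _ in range(1000)]
far_task = [(rng.gauss(5.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(1000)]

# A task whose cluster sits closer to the base task is expected to transfer
# better (smaller r_e and r_D), mirroring the trend seen in Figure 6(b).
print(centroid_distance(base_task, near_task) <
      centroid_distance(base_task, far_task))
```

Note that even heavily overlapping clouds can have well-separated centroids, which is why the centroid distance is a coarse but usable proxy for task similarity.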
4 Discussion
We have demonstrated that transfer learning can be useful for solving different types of AEM problems. Specifically, we showed that the knowledge gained from one AEM problem, stored in the parameters of a pre-trained DNN, can be applied to solve a different AEM problem with alternative settings. The method can be adapted to various changes in material choice, element shape, and disparate incident excitations. Although our study covers only a small subset of all possible scenarios, we believe that transfer learning can be valuable for AEM problems in general. A crucial consideration for transfer learning is how dissimilar the target problem is from the base problem. A significant difference between base and target problems, such as a change in incident angle, can lead to decreased transfer learning effectiveness. We also found that as the ground truth dataset size for the target problem increased, the advantage of transfer learning diminished. While our five test cases showed that transfer learning is highly effective in all cases, there may be situations where it is not beneficial, or even yields worse performance, if the problems being compared are too different. A positive correlation between the two critical metrics r_D and r_e and the PCA centroid distance of the far field was found, but future research should look for such correlations in the input dimension.
Although transfer learning proved useful between every pair of simulated datasets we examined, we believe it can be applied in real-world situations as well. We propose using this method with models based on ideal simulations to address issues related to fabrication and measurement, i.e., adjusting the model to account for realistic factors using a small set of experimental data. One significant benefit of this approach is that we can predict uncertainties without needing the large number of individual experiments, of order 10³, typically required to develop an accurate deep learning model. By transferring a model from simulation to experiment we can save time and resources by estimating these uncertainties, suggesting that transfer learning can be a valuable tool in both theoretical studies and practical applications of metamaterials.
5 Conclusions
We have demonstrated the transferability of knowledge from a well-trained neural network to problems with different materials and external conditions. The transfer learning technique presented here allows a neural network surrogate model to reach a desired level of accuracy using substantially less data – in some cases by two orders of magnitude. A key to enabling this approach is keeping the dimensionality of the input and output spaces consistent across tasks, ensuring that the same neural network (ResNet) architecture can be utilized for all tasks. Our study highlights a viable approach for metamaterial problems that share the same underlying physics: a neural network trained on one problem can be reused and fine-tuned for a new problem with only a small number of new data points. The sheer breadth of AEMs and associated phenomena warrants further investigation, paving the way for exciting future studies. These findings also open the door to dynamically updating neural network models in response to changing environments or design targets, thereby helping to mitigate the data bottleneck problem in deep learning.
Funding source: Basic Energy Sciences
Award Identifier / Grant number: DESC0014372
Research funding: We acknowledge support from the Department of Energy under U.S. Department of Energy (DOE) (DESC0014372).
Author contributions: All authors have accepted responsibility for the entire content of this manuscript and approved its submission.
Conflict of interest: Authors state no conflicts of interest.
Informed consent: Informed consent was obtained from all individuals included in this study.
Ethical approval: The conducted research is not related to either human or animals use.
Data availability: Data sharing is not applicable to this article as no datasets were generated or analyzed during the current study.
References
[1] O. Khatib, S. Ren, J. Malof, and W. J. Padilla, “Deep learning the electromagnetic properties of metamaterials—a comprehensive review,” Adv. Funct. Mater., vol. 31, no. 31, p. 2101748, 2021. https://doi.org/10.1002/adfm.202101748.
[2] W. Ma, Z. Liu, Z. A. Kudyshev, A. Boltasseva, W. Cai, and Y. Liu, “Deep learning for the design of photonic structures,” Nat. Photonics, vol. 15, no. 2, pp. 77–90, 2021. https://doi.org/10.1038/s41566-020-0685-y.
[3] W. Ji, et al., “Recent advances in metasurface design and quantum optics applications with machine learning, physics-informed neural networks, and topology optimization methods,” Light: Sci. Appl., vol. 12, no. 1, p. 169, 2023. https://doi.org/10.1038/s41377-023-01218-y.
[4] W. Ma, F. Cheng, and Y. Liu, “Deep-learning-enabled on-demand design of chiral metamaterials,” ACS Nano, vol. 12, no. 6, pp. 6326–6334, 2018. https://doi.org/10.1021/acsnano.8b03569.
[5] C. C. Nadell, B. Huang, J. M. Malof, and W. J. Padilla, “Deep learning for accelerated all-dielectric metasurface design,” Opt. Express, vol. 27, no. 20, pp. 27523–27535, 2019. https://doi.org/10.1364/oe.27.027523.
[6] C. Qian, et al., “Deep-learning-enabled self-adaptive microwave cloak without human intervention,” Nat. Photonics, vol. 14, no. 6, pp. 383–390, 2020. https://doi.org/10.1038/s41566-020-0604-2.
[7] C. Liu, et al., “A programmable diffractive deep neural network based on a digital-coding metasurface array,” Nat. Electron., vol. 5, no. 2, pp. 113–122, 2022. https://doi.org/10.1038/s41928-022-00719-9.
[8] S. Ren, A. Mahendra, O. Khatib, Y. Deng, W. J. Padilla, and J. M. Malof, “Inverse deep learning methods and benchmarks for artificial electromagnetic material design,” Nanoscale, vol. 14, no. 10, pp. 3958–3969, 2022. https://doi.org/10.1039/d1nr08346e.
[9] A. Taflove, S. C. Hagness, and M. Piket-May, “Computational electromagnetics: the finite-difference time-domain method,” in The Electrical Engineering Handbook, vol. 3, 2005. https://doi.org/10.1002/0471654507.eme123.
[10] T. Kokkinos, C. D. Sarris, and G. V. Eleftheriades, “Periodic finite-difference time-domain analysis of loaded transmission-line negative-refractive-index metamaterials,” IEEE Trans. Microwave Theory Tech., vol. 53, no. 4, pp. 1488–1495, 2005. https://doi.org/10.1109/tmtt.2005.845197.
[11] J.-M. Jin, The Finite Element Method in Electromagnetics, Nashville, TN, John Wiley & Sons, 2015.
[12] J. Li and A. Wood, “Finite element analysis for wave propagation in double negative metamaterials,” J. Sci. Comput., vol. 32, no. 2, pp. 263–286, 2007. https://doi.org/10.1007/s10915-007-9131-2.
[13] T. W. Hughes, M. Minkov, V. Liu, Z. Yu, and S. Fan, “A perspective on the pathway toward full wave simulation of large area metalenses,” Appl. Phys. Lett., vol. 119, no. 15, p. 150502, 2021. https://doi.org/10.1063/5.0071245.
[14] Y. Zhao, S. Xiang, and L. Li, “Fast electromagnetic validations of large-scale digital coding metasurfaces accelerated by recurrence rebuild and retrieval method,” IEEE Trans. Antennas Propag., vol. 70, no. 12, pp. 11999–12009, 2022. https://doi.org/10.1109/tap.2022.3215230.
[15] M. Mansouree, A. McClung, S. Samudrala, and A. Arbabi, “Large-scale parametrized metasurface design using adjoint optimization,” ACS Photonics, vol. 8, no. 2, pp. 455–463, 2021. https://doi.org/10.1021/acsphotonics.0c01058.
[16] Y. Deng, S. Ren, K. Fan, J. M. Malof, and W. J. Padilla, “Neural-adjoint method for the inverse design of all-dielectric metasurfaces,” Opt. Express, vol. 29, no. 5, p. 7526, 2021. https://doi.org/10.1364/oe.419138.
[17] Y. Bengio, “Deep learning of representations for unsupervised and transfer learning,” in Proceedings of ICML Workshop on Unsupervised and Transfer Learning, JMLR Workshop and Conference Proceedings, 2012, pp. 17–36.
[18] A. R. Zamir, A. Sax, W. Shen, L. J. Guibas, J. Malik, and S. Savarese, “Taskonomy: disentangling task transfer learning,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 3712–3722. https://doi.org/10.1109/CVPR.2018.00391.
[19] A. Quattoni, M. Collins, and T. Darrell, “Transfer learning for image classification with sparse prototype representations,” in 2008 IEEE Conference on Computer Vision and Pattern Recognition, IEEE, 2008, pp. 1–8. https://doi.org/10.1109/CVPR.2008.4587637.
[20] Y. Zhu, et al., “Heterogeneous transfer learning for image classification,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 25, 2011, pp. 1304–1309. https://doi.org/10.1609/aaai.v25i1.8090.
[21] Y. Xu, J. Du, L.-R. Dai, and C.-H. Lee, “Cross-language transfer learning for deep neural network based speech enhancement,” in The 9th International Symposium on Chinese Spoken Language Processing, IEEE, 2014, pp. 336–340. https://doi.org/10.1109/ISCSLP.2014.6936608.
[22] I. Sutskever, J. Martens, G. Dahl, and G. Hinton, “On the importance of initialization and momentum in deep learning,” in International Conference on Machine Learning, PMLR, 2013, pp. 1139–1147.
[23] J. Zhang, et al., “Heterogeneous transfer-learning-enabled diverse metasurface design,” Adv. Opt. Mater., vol. 10, no. 17, p. 2200748, 2022. https://doi.org/10.1002/adom.202200748.
[24] R. Zhu, et al., “Phase-to-pattern inverse design paradigm for fast realization of functional metasurfaces via transfer learning,” Nat. Commun., vol. 12, no. 1, p. 2974, 2021. https://doi.org/10.1038/s41467-021-23087-y.
[25] Z. Fan, et al., “Transfer-learning-assisted inverse metasurface design for 30% data savings,” Phys. Rev. Appl., vol. 18, no. 2, p. 024022, 2022. https://doi.org/10.1103/physrevapplied.18.024022.
[26] Y. Jia, C. Qian, Z. Fan, T. Cai, E.-P. Li, and H. Chen, “A knowledge-inherited learning for intelligent metasurface design and assembly,” Light: Sci. Appl., vol. 12, no. 1, p. 82, 2023. https://doi.org/10.1038/s41377-023-01131-4.
[27] Y. Qu, L. Jing, Y. Shen, M. Qiu, and M. Soljacic, “Migrating knowledge between physical scenarios based on artificial neural networks,” ACS Photonics, vol. 6, no. 5, pp. 1168–1174, 2019. https://doi.org/10.1021/acsphotonics.8b01526.
[28] P. R. West, et al., “All-dielectric subwavelength metasurface focusing lens,” Opt. Express, vol. 22, no. 21, pp. 26212–26221, 2014. https://doi.org/10.1364/oe.22.026212.
[29] W. J. Padilla and R. D. Averitt, “Imaging with metamaterials,” Nat. Rev. Phys., vol. 4, no. 2, pp. 85–100, 2022. https://doi.org/10.1038/s42254-021-00394-3.
[30] M. Xu, et al., “Emerging long-range order from a freeform disordered metasurface,” Adv. Mater., vol. 34, no. 12, p. 2108709, 2022. https://doi.org/10.1002/adma.202108709.
[31] H. Zhang, Q. Cheng, H. Chu, O. Christogeorgos, W. Wu, and Y. Hao, “Hyperuniform disordered distribution metasurface for scattering reduction,” Appl. Phys. Lett., vol. 118, no. 10, p. 101601, 2021. https://doi.org/10.1063/5.0041911.
[32] A. Leitis, et al., “All-dielectric programmable Huygens’ metasurfaces,” Adv. Funct. Mater., vol. 30, no. 19, p. 1910259, 2020. https://doi.org/10.1002/adfm.201910259.
[33] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778. https://doi.org/10.1109/CVPR.2016.90.
[34] S. An, et al., “A deep learning approach for objective-driven all-dielectric metasurface design,” ACS Photonics, vol. 6, no. 12, pp. 3196–3207, 2019. https://doi.org/10.1021/acsphotonics.9b00966.
[35] X.-X. Liu, Y. Zhao, and A. Alù, “Polarizability tensor retrieval for subwavelength particles of arbitrary shape,” IEEE Trans. Antennas Propag., vol. 64, no. 6, pp. 2301–2310, 2016. https://doi.org/10.1109/tap.2016.2546958.
[36] G. Dural and M. I. Aksun, “Closed-form Green’s functions for general sources and stratified media,” IEEE Trans. Microw. Theory Tech., vol. 43, no. 7, pp. 1545–1552, 1995. https://doi.org/10.1109/22.392913.
[37] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems, vol. 25, F. Pereira, C. J. Burges, L. Bottou, and K. Q. Weinberger, Eds., Curran Associates, Inc., 2012.
© 2024 the author(s), published by De Gruyter, Berlin/Boston
This work is licensed under the Creative Commons Attribution 4.0 International License.