Abstract
Green hydrogen production is crucial for a sustainable future, but current catalysts for the oxygen evolution reaction (OER) suffer from slow kinetics, despite many efforts to produce optimal designs, particularly through the calculation of descriptors for activity. In this study, we develop a dataset of density functional theory calculations of bulk and surface perovskite oxides, and adsorption energies of OER intermediates, which includes compositions up to quaternary and facets up to (555). We demonstrate that per-site properties of perovskite oxides such as Bader charge or band center can be tuned through element substitution and faceting, and develop a machine learning model that accurately predicts these properties directly from the local chemical environment. We leverage these per-site properties to identify promising perovskites with high theoretical OER activity. The identified design principles and promising materials provide a roadmap for closing the gap between current artificial catalysts and biological enzymes such as photosystem II.
Similar content being viewed by others
Introduction
Atomic-level design of highly active catalysts that rival the efficiency of natural enzymes is one of the ultimate goals of chemistry and materials science. Whereas enzymes have been fine-tuned residue-by-residue through biological evolution for millions of years, human-designed homo- and hetero-geneous catalysts to date remain far from optimal. For example, for the oxygen evolution reaction (OER), one of the most important reactions in both photosynthesis and renewable chemical and fuel production, the turnover frequency in photosystem II is at least two or three orders of magnitude higher than state-of-the-art heterogeneous catalysts1,2,3. Moreover, enzymatic catalysis demonstrates the significance of site-dependent reactivity, as reactions occur within highly specialized pockets and tailored chemical environments3,4.
Efforts to design heterogeneous catalysts with activities rivaling enzymes will require breaking scaling relations, which dictate that binding energies of reaction intermediates are linearly dependent and thus limit the minimum theoretical overpotential to ~0.37 V5,6. Of the various strategies for breaking scaling, facet engineering7 and element substitution8 remain some of the simplest and most widely used. Multicomponent metal oxide surfaces, which take advantage of both of these strategies, thus offer an opportunity to move beyond these limitations, due to their vast design space comprising billions of possible local atomic structures and active site environments. However, exploration of such a vast chemical space to design optimal artificial active sites will require atom-by-atom design beyond current approaches using trial-and-error coupled with human intuition9.
One effective strategy to go beyond Edisonian design of complex oxide catalysts is to use a descriptor-centered approach, which captures the quantitative relationship between structure, energetics, and function9. Descriptors are especially powerful as they can be calculated in silico with low cost, such that materials with desired properties can be predicted before they are even synthesized. For instance, catalytic activity has been shown to correlate with d-band center for metals10 and oxygen 2p-band center for metal oxides11,12. Moreover, eg orbital filling13, spin state14,15, and magnetic ordering14 have also been shown to have a profound effect on catalytic activity. While the use of such descriptors in catalyst designs has already led to successful catalyst discoveries11,16,17,18, most of these previous efforts have mainly relied on bulk descriptors, which assumes that surface electronic structure is correlated with bulk electronic structure characteristics19.
Despite their effectiveness, average bulk descriptors alone cannot capture local effects from substitution or surface effects, limiting the space of atom-by-atom design for surface reactivity20. Such limitation is particularly severe for complex multimetallic oxide surfaces21 due to the diverse and near-continuous distributions of surface sites. Moreover, it has been recently shown that for RuO2, a state-of-the-art catalyst for the oxygen evolution reaction (OER), the catalytic activity is highly dependent on the local environment of coordinatively unsaturated surface oxygen7,22. Specifically, CUS-oxygen as the active sites on (100) is shown to have an order of magnitude higher specific OER activity that of (110)7. Design of heterogeneous catalysts with activity rivaling that of enzymes thus requires the understanding and efficient prediction of site-specific activity.
Budding efforts to go beyond bulk descriptors and to link local, site-dependent descriptors to catalytic activity include the report that the site-dependent 2p-band center of adsorbed oxygen correlates strongly with OER-intermediate binding energies23,24. Nevertheless, this proposed descriptor requires the same DFT calculation as calculating the binding energy itself so it does not result in a significant computational advantage. Site-dependent structural descriptors, such as generalized coordination number, are cheaper to calculate since they only require the pristine slab and have been shown to correlate with intermediate binding energies25,26. While coordination number is able to capture local structure-property trends, it alone cannot be generalized across different local chemistry and coordination. Recent efforts to overcome these limitations have included encoding elemental as well as coordination information27,28, often relying on hand-crafted features that are difficult to generalize and extend to new materials classes29,30. Machine learning approaches have revolutionized computational materials science31,32 given their ability to learn very complex non-linear functions, efficiently process structural information without hand-tuning33,34, and combine this information with other data fidelities35 or physics-based priors, typically through graph-convolutional neural networks. Some recent works have leveraged these tools to predict OER-intermediate binding energies directly from structure36, circumventing the need for hand-crafted features, but the models to date have not taken advantage of site-dependent descriptors beyond local structure37.
In this work, we develop a graph-based neural network model that accurately learns site-dependent descriptors from structure and leverages these descriptors to predict OER intermediate binding energies automatically from structure, without the need for hand-crafted features. We consider density functional theory (DFT)-computed site-specific descriptors based on local electronic structure (Bader charges38,39, site-projected O 2p-band23,24, and metal d-band centers10), local magnetic moments40, and local phonon band centers41. By combining in-house data with previous datasets12,36,42,43,44, we develop and curate, to the best of our knowledge, the largest and most diverse dataset of OER energetics on perovskite surfaces to date. We show that per-site descriptors across this dataset vary widely with composition and surface coordination, and can be predicted purely from structure. We demonstrate per-site properties that are both independent of one another and linearly correlated to OER binding energies, such that they can be leveraged independently to tune OER energetics. We compare the OER energetics across this dataset with previous efforts on metals45, rutile RuO27,22 and IrO246, metal hydroxide-organic frameworks (MHOFs)18, single atom catalysts47, and enzymes48, and show that by tuning per-site properties of the active site and adsorbed species it is theoretically possible to extend beyond the scaling relations and bridge the gap between current state-of-the-art heterogeneous catalysts and enzymes such as photosystem II. Finally, we list several of the most promising materials identified through this work.
Results and discussion
The local environment around the active site strongly influences the materials properties of that site. For example, the Mn atoms in bulk CaMnO3 (Fig. 1a), on the CaMnO3(111) surface (Fig. 1b), and inside photosystem II (PSII) have different local environments and thus different d-band centers (Fig. 1c). To study the effects of the local environment on the properties of the active site, we curate, to the best of our knowledge, the largest dataset of multicomponent perovskite oxides by combining in-house DFT calculations with results compiled from the literature12,36,42,43,44. This dataset includes over 20,000 substituted bulk perovksite oxides up to quaternary, and over 10,000 surfaces cut along facets up to and including (555). The dataset additionally includes over 10,000 O*, OH* and OOH* adsorption energies. For all bulk and surface structures we calculate per-site properties relevant to OER, including the oxygen 2p-band center and metal d-band center (Fig. 1d), and magnetic moment (Fig. 1g). For surface structures we additionally calculate Bader charge (Fig. 1f). Atomic vibration frequencies (Fig. 1e), not calculated for our dataset but available from previous works41, are also considered (Fig. 1b). Using this dataset, we develop and train per-site graph-convolutional neural network machine learning models that are able to learn per-site properties (see Fig. 1h) and ultimately OER intermediate binding energies (see Fig. 1i) directly and automatically from atomic structure without the need for hand-crafted features.
Learning compositional effects on per-site properties in bulk perovskites
We show that by automatically encoding the local environment around each site, graph-based per-site models are able to capture the effect of chemical environment on per-site properties with high accuracy and without requiring hand-tuned features. We train and hyper-parameter tune several different site-aware deep learning models (see more details in Supplementary Information) on the bulk perovskites in our dataset to predict a number of DFT-derived atomic properties (metal d-band center, O 2p-band center, and magnetic moment), which are functions of the geometry and composition of the local chemical environment of each atom. We additionally train these models on external datasets for predicting Bader charge44 and phonon band center41, which are described in more detail in the Methods section. Performance is reported as Pearson’s correlation coefficient on a held-out test-set for these per-site properties in Table 1, and parity plots for each property are reported in Supplementary Fig. 1. The per-site model based on Crystal Graph Convolutional Neural Network (CGCNN)33 and Polarizable Atom Interaction Neural Network (PAINN)49 architectures have very similar performance, and out-perform per-site CrabNET50 which is based on stoichiometry alone without encoding local environments. The best results are shown in bold in Table 1, and are 0.952, 0.969, 0.950, 0.962 and 0.997 for metal d-band center, oxygen 2p-band center, magnetic moment, phonon band center, and Bader charge, respectively. While for magnetic moment, Bader charge and phonon band center, element type alone maps well to the property with correlation coefficients of 0.904, 0.957, and 0.924, respectively, the site-projected metal d-band and oxygen 2p-band centers are highly dependent on the environment around the site as these properties are a result of orbital overlaps with neighboring atoms. Moreover, we consider predictions from the bond-valence method51, where per-site properties are estimated from neighbor distances (Pi = \({\sum }_{j}exp(\frac{{r}_{0}-{r}_{ij}}{b})\), having rij as the bond distance between site i and neighbor j, r0 as the fit for a given cation-anion pair, and b as an empirical constant typically set to 0.37 Å51). Bond-valence method parameters were fit for each property using our dataset, and an example of this fitting for predicting oxygen 2p-band center and oxygen Bader charge is shown for O-Ti, O-Mn, O-Fe and O-Co bonds in Supplementary Fig. 2. Fitted bond-valence parameters are reported in Supplementary Table 1. It is shown that R0 for predicting oxygen Bader charge decreases from 2.12 to 2.08, to 2.05, to 2.02 for O-Ti, O-Mn, O-Fe and O-Co bonds, respectively, suggesting that oxygen 2p-band center is increasingly pushed down by oxygen-neighboring elements from left to right across the d-band. The bond valence method often outperforms simple element averages by accounting for neighboring atom types and distances. By encoding for several shells of neighbors through multiple graph convolutions, the per-site model is able to significantly outperform both element averages and the bond-valence method without the need for hand-tuned parameters (Table 1).
We compare the calculated and predicted per-site materials properties across a held-out test set on an element-wise basis to highlight the ability of the per-site model to explicitly learn physical principles dictated by local environments. Calculated and predicted magnetic moments (Fig. 2a), Bader charges (Fig. 2b), atomic vibration frequencies (Fig. 2c), metal d-band centers (Fig. 2d), and oxygen 2p-band centers (Fig. 2e) on a held-out test set are shown as a function of element for a sample of elements (V through Ni for metal d-band center, magnetic moment, and Bader charge; H through O for atomic vibration frequency; and O for oxygen 2p-band center). The models are shown to accurately capture both overall periodic trends (predictable from basic physical principles) as well as element-wise distributions (non-trivial effects of local environment). For example, the magnetic moments follow Hund’s rules52: the number of unpaired electrons increases from V to Mn as separate d states are filled, then decreases from Mn to Ni as electrons co-fill d-states and spin is canceled (Fig. 2a). The model recreates that elements become more negatively charged going from V to Ni as electronegativity increases going from left to right along the periodic table, as reflected by decreasing Bader charge (Fig. 2b). Per-site atomic vibration frequency is shown to generally decay for the first 10 elements in the periodic table as mass of the element increases (Fig. 2c). Periodic trends in d-band center are also captured: as the number of electrons in the d-band increases across the periodic table, the d-band width must widen to maintain the Fermi level thus pushing the d-band center more negative5 (Fig. 2d).
The per-site model accurately captures per-element distributions in material properties, which is remarkable as these distributions represent complex effects of substitution and local environment not obvious from basic physical principles alone. Additionally, the nodes in the calculated and predicted magnetic moment (Fig. 2a) and Bader charge (Fig. 2b) for each element represent different oxidation and spin states which are typically difficult to predict. The element-wise distributions in atomic vibration frequency (Fig. 2c), metal d-band center (Fig. 2d), and oxygen 2p-band center (Fig. 2e) are even less obvious, as these spreads do not include nodes corresponding to discrete oxidation states. For example, the model captures that oxygen atoms in Ca0.5La0.5MnO3 have a spread of 2p-band centers between –2.85 and –2.50 eV vs. Fermi (Fig. 2e). Indeed, this range is an interpolation of the oxygen 2p-band centers of –1.67 and –3.87 eV vs. Fermi in CaMnO3 and LaMnO3, respectively. Notably, the 2p-band center is higher for oxygen atoms with more Ca neighbors and lower for oxygen atoms with more La neighbors.
Despite the non-obvious nature of the site-properties discussed here, the presented models capture different local environments, match the calculated data with high accuracy, and are a powerful tool for tuning site properties through substitution effects.
Learning facet and composition dependence of surface site properties
Moving beyond bulk towards a more realistic representation of catalytic sites, we use the surfaces in our dataset to show the strong influence of surface coordination and composition on per-site properties of surface atoms (adsorbed oxygens and active sites). The results for learning metal d-band center and magnetic moment of the active sites, oxygen 2p-band center of adsorbed oxygens, and Bader charge of the active sites and adsorbed oxygens are summarized in terms of Pearson’s correlation coefficient in Table 2, and parity plots for each property are also reported in Supplementary Fig. 3. The per-site graph neural network models are predictive of these properties with a correlation coefficient of 0.980, 0.955, 0.969, and 0.995, respectively, and are thus able to capture surface effects. The per-site CGCNN and PAINN models have similarly high performance, analogously to that presented for bulk atoms in Table 1. It is shown that for surface atoms the bond valence method loses some of its predictability, likely due to lack of accounting for undercoordination. We additionally compare the per-site model results to a baseline of the element average in the corresponding bulk structure, which is able to capture some composition effects but cannot account for local environment effects such as facet dependence (Table 2). That the per-site model significantly outperforms this reasonable baseline shows that the model has successfully encoded local environment of surface atoms specifically.
We compare the calculated and predicted per-site surface properties on an element-wise basis across a held-out test set of benchmark surfaces to further highlight the ability of the per-site model to explicitly learn composition and facet dependence. The predicted and calculated 2p-band centers of surface oxygens in LaBO3 is compared as a function of B going across the 3d band of the periodic table from V to Ni, where the model has not been trained on any of these LaBO3 materials (Fig. 3a). Spreads in each of the violin plots are due solely to different surface coordinations in the dataset, which range from MO3 to MO6. Interestingly, the oxygen 2p-band center of surface oxygens varies significantly with facet, even more than the variation due to changing the B site. This wide spread is due to the significantly different coordination environments of facets that are close-packed versus not. Generally, the oxygen 2p-band centers of adsorbed oxygens become less negative as the active site is more highly coordinated (i.e. oxygens within MO6 have higher 2p-band centers than those in MO4). These spreads, which cannot be captured from bulk band center calculations (shown as a single point in white for each composition), provide compelling evidence for the need to account for surface coordination. In contrast to surface oxygen 2p-band centers, surface metal d-band centers are shown to have a much more limited dependence on facet and surface atom coordination, and are more strongly a function of composition (Supplementary Fig. 4). The model is able to capture both compositional effects (2p-band centers of surface-oxygen generally become less negative as B goes from left to right across the periodic table) as well as effects of faceting (calculated versus predicted distributions visibly match).
We highlight a few example surfaces in the test set, namely the (110), (100), and (111) facets of LaMnO3 (or MO4, MO5, and MO6 environments, respectively), to show that the 2p-band centers of adsorbed oxygens are most negative for the high coordination environment of (111) oxygen atoms (MO6), and least negative for the lower coordination environment of (110) oxygen atoms (MO4), and that the per-site model is able to capture this effect (Fig. 3b). This trend also matches the findings of Rao et al.7, who showed that bridge oxygen atoms on the edge-sharing (110) facet of unary RuO2 had a less negative 2p-band center than the bridge oxygen atoms on the (101) corner-sharing facet. We show that the per-site model is able to capture these facet effects, both by matching the distributions in Fig. 3a and by correctly ordering the highlighted surface oxygen atoms in Fig. 3b.
Atom-by-atom design strategies for OER
We next demonstrate the ability of per-site descriptors to predict DFT-calculated binding energies of OER intermediates, and ultimately catalytic activity. We fit a multi-linear function of per-site properties of the active site (magnetic moment, Bader charge, d-band center, d-band width, and coordination number) and of the adsorbed oxygen (Bader charge, 2p-band center, 2p-band width, and coordination number) to OER intermediate binding energies. The predicted versus calculated binding energies are shown in Fig. 4a, with the linear model achieving excellent predictive power with a correlation coefficient of 0.872. These simple linear relationships between binding energy and per-site properties capture most of the variation in the data. A complex, non-linear model combining a per-site graph neural network and per-site properties (Fig. 4a inset for results on a held-out test set) moderately outperforms the linear model with a correlation coefficient of 0.898. These results are in agreement with and extend beyond previous studies1,23,38,53, which have shown that OER intermediate binding energies correlate to bulk oxygen 2p-band center1,53 or to adsorbed oxygen 2p-band center23,38. We show these same correlations between binding energies and bulk descriptors1,53, and between binding energies and individual per-site descriptors23,38 in Supplementary Fig. 5 and Supplementary Fig. 6. Combining these models with the above-presented per-site property prediction models, and with models that predict relaxed structure54,55, would enable end-to-end learning of binding energies without the need for DFT calculations. Moreover, unlike these previous efforts1,23,38,53 which have only included the (001) facet of perovskites, our work includes facets up to (555) and multimetal complex perovskites up to quaternary.
By leveraging multiple per-site properties of the active site and adsorbed oxygen simultaneously, we are able to improve upon these previous correlations (e.g. correlation coefficient of 0.61 for bulk oxygen 2p-band center53, 0.66 for surface oxygen 2p-band center23,38, and 0.87 for the linear model) as several of the per-site properties are found to simultaneously and independently influence the binding energy of OER intermediates. A polar plot of the relative importance of each per-site property assigned by normalizing the coefficients of the linear model is shown in Fig. 4b. The two most important per-site properties are the 2p-band center of the adsorbed oxygen (in agreement with previous work1,23,56), and Bader charge of adsorbed oxygen, which is similar to eg filling13 and has been shown to be useful for both prediction of OER activity and metal oxide formation energy13,57,58. The relative importance of these two per-site properties differs among adsorbates. For example, for OH*, the least oxidizing adsorbate, the 2p-band center is the most influencial while for OOH*, the most oxidizing adsorbate, Bader charge is the most influential. For O*, these two descriptors and the oxygen 2p-band width are of more equal weight, as shown in Fig. 4b. For all three adsorbates, the remaining parameters studied in this work (i.e. coordination number of adsorbed oxygen and Bader charge, d-band center, d-band width, coordination number, and magnetic moment of the active site) are found to be of limited importance for predicting DFT-derived OER binding energetics.
A correlation matrix of the per-site properties is shown in Fig. 4c. While the 2p-band width and 2p-band center of adsorbed oxygen are negatively correlated with a coefficient of -0.6, most of the other per-site properties are not found to correlate strongly with any of the others. Further support comes from the sparsity (median degree of three) of the correlation network after correcting for multiple hypothesis testing and filtering for connections with greater than 0.01 p-value (Fig. 4d). Pairwise p-values were calculated across surface per-site descriptors and normalized by total comparison count as directed by the Bonferroni method for multiple hypothesis correction (Supplementary Fig. 7). Of interest, the metal d-band center does not significantly correlate with other metal descriptors suggesting it cannot be easily tuned by changing metal coordination. Further, the metal d-band center is of low feature importance to binding energy (Fig. 4b) counter to common metal catalyst behavior59. Most notably, the 2p-band center and Bader charge of adsorbed oxygen, the two per-site properties that most influence the binding energies, are found to be only weakly correlated with a correlation coefficient of 0.3 (Fig. 4c), suggesting that these two per-site properties can be tuned independently.
We compare the energetics for OER across the surfaces dataset to identify the most promising catalysts for OER, and develop design strategies for navigating this landscape as a function of per-site properties. The binding energies of OER intermediates (O*, OH*, and OOH*) are known to be linearly dependent60, fundamentally limiting the OER overpotential to above 0.37 eV and preventing catalyst design via DFT, assuming proton-coupled electron transfer61. For example, ΔEOH is known to scale with ΔEO with a slope of 0.5 eV and intercept of 0.44 eV, and ΔEOOH scales with ΔEOH with a slope of 0.88 eV and intercept of 3.21 eV19. These well-known scaling relationships19,60 compare well with our calculated values of ΔGO − ΔGOH and ΔGOOH − ΔGO for the dataset of complex multimetal perovskites studied in this work (Fig. 5a), with data-points colored by their calculated theoretical overpotential. An additional scaling relationship, the “ideal" scaling, is plotted in orange in Fig. 5a, where the well-known slope remains the same but the coefficient is adjusted such that scaling passes through 0 V overpotential. Interestingly, the spread in the perovksite oxides dataset is wide enough to touch the ideal scaling line due to the large diversity in composition and facet. We highlight several substituted rutile oxides of interest, which deviate from typical scaling and are closer to the ideal line due to substitution effects on the per-site properties and thus on OER binding energetics. Additional scaling relationships between ΔGOH and ΔGO, and between ΔGOH and ΔGOOH are shown in Supplementary Fig. 8, where the materials in this work are additionally compared to previous efforts on metals (purple circles)45, single-atom catalysts (pink circles)47, RuO2 facets (dark green)7, IrO2 facets (light green)46, metal hydroxide-organic frameworks (MHOFs, light gray)18, and PSII (orange star)48. While previously studied single-atom and metal catalysts fall more tightly on the typical scaling line (black, Supplementary Fig. 8), the oxides studied in this work show more deviation from the well-known scaling relations due to the large variation in both element substitution and faceting. RuO2 facets from the work of Rao et al.7 are also shown to deviate from this scaling, due to corner-sharing versus edge sharing effects as well as stabilization of OOH* by neighboring coordinately unsaturated oxygen. The effect of this deviation from scaling is that some surfaces studied in this work are able to outperform the well-known OER thermodynamic overpotential limit, with overpotentials as low as 0.21 V La0.5Sr0.5CoO3 (211). We note that the deviation from scaling for some of these perovskites is even as extreme as the deviation for photosystem II (PSII)48 (Supplementary Fig. 8) The typical activity volcano for theoretical potential as a function of ΔGO − ΔGOH is also shown (Fig. 5b), including standard and ideal scaling (black and orange lines, respectively), perovskites (dark gray), and rutile oxides (green). Perovksite oxides in our dataset that are unstable (energy above hull of the corresponding bulk structure above 0.3 eV) are semi-transparent while stable perovskites (energy above hull below 0.3 eV) are fully opaque. Energetics for PSII48 are additionally compared to the perovskites and rutiles in this work, and several materials with overpotentials approaching that of PSII are plotted in Fig. 5b and listed in Table 3. This activity volcano is based on a very simple mechanism for OER that may not hold for all oxides. Further work will need to be done to consider more complex OER mechanisms.
A version of the activity volcano as a function of relevant per-site properties rather than binding energies is shown (Fig. 5c), to show the physical origin of the deviation from the standard scaling relationships. The 2p-band center and Bader charge of adsorbed oxygen are considered, because these properties have the strongest influence on binding energies, are not correlated, and can be tuned independently to break the scaling relationships. While the 2p-band center of adsorbed oxygen is a good descriptor for traversing the standard scaling line, by simultaneously adjusting the Bader charge of the adsorbed oxygen closer to 0 qe, it is possible to move toward ideal scaling. We compare the R0 fitted bond-valence parameters for how transition metal neighbors affect oxygen Bader charge and oxygen 2p-band center, to build a map for simultaneously tuning Bader charge and 2p-band center of adsorbed oxygen through substitution (Fig. 5d). R0 gives us a measure of how an individual neighboring atom will affect the site properties of the oxygen in question, and by comparing R0 for different metal types we can tune the per-site properties of the oxygen (neighbor atoms with lower R0 increase the value of the site property). We compare the R0 for band-center and Bader charge to show how it is possible to substitute neighbor atoms to control 2p-band center and Bader charge independently. It is shown that as the atomic number of neighboring atoms increases, the R0 fitted parameter for predicting oxygen 2p-band center increases and the oxygen 2p-band center therefore decreases. In contrast, the R0 fitted parameter for predicting oxygen Bader charge given the neighbor atoms decreases (and therefore oxygen Bader charge increases) going from left to right across the periodic table. This map helps guide researchers for what substitutions to make in a perovskite to modify the per-site properties of adsorbed oxygen, and therefore to tune the catalytic activity. On the strong binding side of the volcano, metal substitutions that increase the 2p-band center of adsorbed oxygen while decreasing Bader charge will be appropriate, so in this region Co substitution is recommended. On the weak binding side of the volcano, metal substitutions that decrease the 2p-band center of adsorbed oxygen while decreasing Bader charge will be appropriate, so in this region Ir/Ru substitution is recommended. This matches with our intuition, as many promising perovskites for OER contain these metals38,58. We compare Ru0.5In0.5O3 and Ru0.5Co0.5O3 to illustrate this point. Replacing neighboring In with Co slightly decreases the R0 for oxygen 2p-band center, while dramatically decreasing the R0 for oxygen Bader charge (Fig. 5d). By swapping out In (Ru0.5In0.5O3) for Co (Ru0.5Co0.5O3), the 2p-band center increases slightly (moving toward the peak along the scaling line in Fig. 5c) and the Bader charge of adsorbed oxygen also increases. The result is that Ru0.5Co0.5O3 has a much higher activity than Ru0.5In0.5O3. This example illustrates a strategy for navigating activity space by tuning per-site properties through composition tuning. Additional strategies could involve facet engineering, which modulates the 2p-band center of adsorbed oxygen while having less effect on the Bader charge7.
Finally, the thermodynamic overpotential as well as the energy above hull of the corresponding bulk structure are used to screen the dataset for materials showing promising activity and stability (see Supplementary Fig. 9), and the most promising stable (energy above hull less than 0.3 eV) and active (thermodynamic overpotential below 0.6 V) materials are summarized in Table 3. While many of the interesting materials identified and listed in Table 3 have not yet, to the best of our knowledge, been tested for OER activity, several identified materials (such as La0.5Sr0.5CoO3, LaNiO3, PrNiO3, and La0.75Sr0.25MnO3) have already been tested and shown to be promising OER catalysts13,58. Although the dataset curated and analyzed in this work includes over 10,000 binding energies, the phase space of multicomponent perovskite surfaces remains far from exhausted and we remain hopeful that many more promising perovskite materials may yet to be discovered. Toward this end, future work could include pairing per-site descriptors with effective strategies for intelligent search of the remaining phase space, such as Bayesian optimization62. Also of interest are high entropy oxide systems with five or more metals, where the phase space is even more daunting and rich with opportunities for catalyst optimization63,64 through control of chemical ordering65.
Methods
Lead contact
Further information and requests for resources should be directed to and will be fulfilled by the lead contacts, Rafael Gomez-Bombarelli, rafagb@mit.edu and Yang Shao-Horn, shaohorn@mit.edu.
Materials availability
This study did not generate any new materials.
Model details
In this work, we implement machine learning models to both predict per-site properties directly from structure (see Fig. 1h) and leverage these properties to predict OER intermediate binding energies (see Fig. 1i).
Per-site graph-based and composition only neural network models (extensions of previous models33,49,50) were individually trained on several calculated per-site properties including metal d-band center, oxygen 2p-band center, Bader charge, magnetic moment, and phonon band center. For graph-based models, convolutional layers are used to automatically featurize the structure based on local environments and a linear layer is used to convert this featurization to predicted per-site properties. Although all models in this study are trained on DFT-relaxed structures, we propose that an extension to unrelaxed structures will be possible using cheap surrogate models for predicting relaxed structure from unrelaxed structure54,55.
For all properties, we perform a benchmark comparison between per-site modified versions of CGCNN33, PAINN49, and CrabNET50. Per-site CGCNN is initialized with physics-informed, curated atomic descriptors which improves generalizability and performance, especially with smaller datasets. For each per-site property in bulk structures, the per-site model performance is additionally compared against a baseline of element average across the dataset. For surface atoms, the per-site model performance is additionally compared against a baseline of the element average in the corresponding bulk structure, and to the bond-valence method.
Linear models and neural networks were implemented to predict OER intermediate binding energies from calculated per-site properties. The linear model was regressed on calculated Bader charges, d-band centers, 2p-band centers, magnetic moments, and coordination numbers of surface active sites and adsorbed oxygens. A neural network architecture was also developed, using these properties appended to learned embeddings from the per-site graph-based models as input. This input was passed though graph convolutional layers and a final pooling to predict binding energies.
All results presented are on held-out test data. The models are trained on 60% of the data, validated on 20%, and tested on 20%. Random train-validation-test splits were used in all cases. Model weights were chosen based on optimal validation set performance. Error ranges were calculated using statistics from three different experiments, where model weights and train-test splits are seeded randomly, and were found to always be below 0.02. Hyperparameter tuning was performed on all models using SigOpt66. All models were run on Nvidia GeForce RTX 2080 Ti GPUs.
Datasets
Multicomponent bulk perovskite oxides were compiled from the literature12,42,43,44, including > 3,600 binary, > 1300 ternary, and > 900 quaternary oxides. This dataset was augmented by in-house DFT calculations up to quaternary with unit cell size 2 × 2 × 2, resulting in > 4600 quaternary oxides and > 19,000 total bulk perovskite structures. Cation substitutions were arranged both in the rock salt configuration as well as randomly distributed. Models for predicting metal d-band centers, oxygen 2p-band centers, and magnetic moments were trained and tested on this bulk perovskite dataset using a multi-task learning scheme that learns these three properties simultaneously for every atom. Moreover, to train and test the model on Bader charges, over 170,000 structures with Bader charges available were compiled from across the entirety of Materials Project44. This dataset includes a much more diverse set of structures than just perovskites. Lastly, to train and test the model on phonon band centers, a dataset of over 9,800 structures with phonon calculations was downloaded from the Phonon Database at Kyoto University41. All structures available in this database were used, not limited to oxides and perovskites.
We prepared a dataset of perovskite oxide surfaces for predicting surface per-site properties and OER intermediate adsorption energies by augmenting the Open Catalyst 2022 (OC22)36 dataset with data generated by an in-house automated pipeline (see Supplementary Fig. 10). A subset of the bulk perovskite structures are randomly sampled from the bulk perovskite dataset described above, cut with a random choice of miller indices between (001) and (555), and optimized with and without adsorbed O*, OH* and OOH*. All in-house adsorbate calculations consider adsorption to B-sites at low surface coverage, and further work should be done to additionally consider adsorption on lattice oxygen sites67. This augmented dataset includes over 10,000 clean surfaces and 7,800 surfaces with single adsorbates (O*, OH* or OOH*).
All DFT calculations were performed using projector augmented wave pseudopotentials, within the Vienna ab initio simulation package. Perdew-Burke-Ernzerhof functionals were used, and Hubbard-U corrections were employed to correctly capture the localization of metal d-electrons. Ueff values consistent with the defaults from Materials Project were chosen for each metal44. Forces were converged to within 0.02 eV/Å. Slabs were cut to be greater than 10 Å. The 2p-band and d-band center values of each atom were obtained by projecting the density of states onto each site and integrating between -10 below and 2 above the Fermi level. Magnetic moments were obtained directly from VASP output, and moments for f-band elements were left out of the test set due to unreliable pseudopotentials. To account for the well-known overbinding of O2, the oxygen reference was corrected to the experimental formation enthalpy of water, calculated as +0.29 V correction per O. Standard formation energies of H2O(l), OH(g), and HO2(g) were taken from NIST-JANAF thermochemical tables rather than being computed68. All binding energies were converted to free energies by using a standard contribution of adsorbate atoms to the ZPE and vibrational entropy, calculated as +0.30, –0.01, and +0.34 V correction for O*, OH* and OOH* respectively. Rotational and translational entropy of adsorbates on surfaces were assumed to be negligible, given that adsorption on surfaces typically results in loss of translational and rotational entropy in favor of vibrational entropy5. Additional electronic structure calculations were performed on all structures from OC22 to generate per-site metal d-band center, O2p-band center and magnetic moment.
Data availability
All data from this study are made available at https://github.com/learningmatter-mit/atom_by_atom. Instructions for downloading the datasets are included in the repository README file.
Code availability
All code from this study is made available at https://github.com/learningmatter-mit/atom_by_atom.
References
Hong, W. T. et al. Toward the rational design of non-precious transition metal oxides for oxygen electrocatalysis. Energy Environ. Sci. 8, 1404–1428 (2015).
Dau, H. et al. The mechanism of water oxidation: From electrolysis via homogeneous to biological catalysis. ChemCatChem 2, 724–761 (2010).
Cox, N. et al. Eectronic structure of the oxygen- evolving complex in photosystem ii prior to o-o bond formation. Science 345, 804–808 (2014).
Vogt, C. & Weckhuysen, B. M. The concept of the active site in heterogeneous catalysis. Nat. Rev. Chem. 6, 89–111 (2022).
Norskov, J. K., Studt, F., Abild-Pedersen, F. & Bligaard, T. Fundamental concepts in heterogeneous catalysis. John Wiley Sons 1, 32 (2014).
Ferri, P. et al. Approaching enzymatic catalysis with zeolites or how to select one reaction mechanism competing with others. Nat. Commun. 14, 2878 (2023).
Rao, R. R. et al. Operando identification of site-dependent water oxidation activity on ruthenium dioxide single-crystal surfaces. Nat. Catal. 3, 516–525 (2020).
Halck, N. B., Petrykin, V., Krtil, P. & Rossmeisl, J. Beyond the volcano limitations in electrocatalysis – oxygen evolution reaction. Phys. Chem. Chem. Phys. 16, 13682–13688 (2014).
Peng, J. et al. Human-and machine-centred designs of molecules and materials for sustainability and decarbonization. Nat. Rev. Mater. 7, 991–1009 (2022).
Hammer, B. & Norskov, J. K. Why gold is the noblest of all the metals. Nature 376, 238–240 (1995).
Grimaud, A. et al. Double perovskites as a family of highly active catalysts for oxygen evolution in alkaline solution. Nat. Commun. 4, 2439 (2013).
Jacobs, R., Mayeshiba, T., Booske, J. & Morgan, D. Material discovery and design principles for stable, high activity perovskite cathodes for solid oxide fuel cells. Adv. Energy Mater. 8, 1702708 (2018).
Suntivich, J., May, K. J., Gasteiger, H. A., Goodenough, J. B. & Shao-Horn, Y. A perovskite oxide optimized for oxygen evolution catalysis from molecular orbital principles. Science 334, 1383–1385 (2011).
Biz, C., Fianchini, M. & Gracia, J. Strongly correlated electrons in catalysis: Focus on quantum exchange. ACS Catal. 11, 14249–14261 (2021).
Vennelakanti, V., Nandy, A. & Kulik, H. J. The effect of hartree-fock exchange on scaling relations and reaction energetics for c–h activation catalysts. Top. Catal. 65, 296–311 (2022).
Grimaud, A. et al. Activating lattice oxygen redox reactions in metal oxides to catalyse oxygen evolution. Nat. Chem. 9, 457–465 (2017).
Kuznetsov, D. A., Peng, J., Giordano, L., Román-Leshkov, Y. & Shao-Horn, Y. Bismuth substituted strontium cobalt perovskites for catalyzing oxygen evolution. J. Phys. Chem. C. 124, 6562–6570 (2020).
Yuan, S. et al. Tunable metal hydroxide–organic frameworks for catalysing oxygen evolution. Nat. Mater. 21, 673–680 (2022).
Calle-Vallejo, F., Inoglu, N. G., Su, H. Y., Kitchin, J. R. & Rossmeisl, J. Number of outer electrons as descriptor for adsorption processes on transition metals and their oxides. Chem. Sci. 4, 1245–1249 (2013).
Dickens, C. F. & Norksov, J. K. A theoretical investigation into the role of surface defects for oxygen evolution on ruo2. J. Phys. Chem. C. 121, 18516–18524 (2017).
Choubisa, H. et al. Accelerated chemical space search using a quantum-inspired cluster expansion approach. Matter 6, 605–625 (2023).
Rao, R. R. et al. Towards identifying the active sites on ruo2(110) in catalyzing oxygen evolution. Energy Environ. Sci. 10, 2626–2637 (2017).
Dickens, C. et al. An electronic structure descriptor for oxygen reactivity at metal and metal- oxide surfaces. Surf. Sci. 681, 122–129 (2019).
Hwang, J. et al. Regulating oxygen activity of perovskites to promote nox oxidation and reduction kinetics. Nat. Catal. 4, 663–673 (2021).
Calle-Vallejo, F., Loffreda, D., Koper, M. T. M. & Sautet, P. Introducing structural sensitivity into adsorption–energy scaling relations by means of coordination numbers. Nat. Chem. 7, 403–410 (2015).
Ruck, M. et al. Oxygen reduction reaction: Rapid prediction of mass activity of nanostructured platinum electrocatalysts. J. Phys. Chem. Lett. 9, 4463–4468 (2018).
Batchelor, T. A. A. et al. High-entropy alloys as a discovery platform for electrocatalysis. Joule 3, 834–845 (2019).
Tran, K. & Ulissi, Z. W. Active learning across intermetallics to guide discovery of electrocatalysts for co2 reduction and h2 evolution. Nat. Catal. 1, 696–703 (2011).
Fernandez, M., Barron, H. & S. Barnard, A. Artificial neural network analysis of the catalytic efficiency of platinum nanoparticles. RSC Adv. 7, 48962 (2017).
Li, Z., Ma, X. & Xin, H. Feature engineering of machine-learning chemisorption models for catalyst design. Catal. Today 280, 232–238 (2017).
Axelrod, S. et al. Learning matter: Materials design with machine learning and atomistic simulations. Acc. Mater. Res. 3, 343–357 (2022).
Kitchin, J. R. Machine learning in catalysis. Nat. Catal. 1, 230–232 (2018).
Xie, T. & Grossman, J. C. Crystal graph convolutional neural networks for an accurate and interpretable prediction of material properties. Phys. Rev. Lett. 120, 145301 (2018).
Damewood, J. et al. Representations of materials for machine learning. Annu. Rev. Mater. Res. 53, 399–426 (2023).
Greenman, K. P., Green, W. H. & Gómez-Bombarelli, R. Multi-fidelity prediction of molecular optical peaks with deep learning. Chem. Sci. 13, 1152–1162 (2022).
Tran, R. et al. The open catalyst 2022 (oc22) dataset and challenges for oxide electrocatalysts. ACS Catal. 13, 3066–3084 (2022).
Back, S. et al. Convolutional neural network of atomic surface structures to predict binding energies for high-throughput screening of catalysts. J. Phys. Chem. Lett. 10, 4401–4408 (2019).
Hwang, J. et al. Perovskites in catalysis and electrocatalysis. Science 358, 751–756 (2017).
Rawal, T. B. et al. Interaction of zinc oxide nanoparticles with water: Implications for catalytic activity. ACS Appl. Nano Mater. 2, 4257–4266 (2019).
Zhong, W. et al. Electronic spin moment as a catalytic descriptor for Fe single-atom catalysts supported on C2N. J. Am. Chem. Soc. 143, 4405–4413 (2021).
Phonon database at kyoto university – phonondb documentation. Phonondb.mtl.kyoto-u.ac.jp [Online] (2022).
Castelli, I. E. et al. Computational screening of perovskite metal oxides for optimal solar light capture. Energy Environ. Sci. 5, 5814–5819 (2012).
Emery, A. A., Saal, J. E., Kirklin, S., Hegde, V. I. & Wolverton, C. High-throughput computational screening of perovskites for thermochemical water splitting applications. Chem. Mater. 28, 5621–5634 (2016).
Jain, A. et al. Commentary: The materials project: A materials genome approach to accelerating materials innovation. APL Mater. 1, 011002 (2013).
Kulkarni, A., Siahrostami, S., Patel, A. & Norskov, J. K. Understanding catalytic activity trends in the oxygen reduction reaction. Chem. Rev. 5, 2302–2312 (2018).
Gauthier, J. A., Dickens, C. F., Chen, L. D., Doyle, A. D. & Norskov, J. K. Solvation effects for oxygen evolution reaction catalysis on iro2(110). J. Phys. Chem. C. 121, 11455–11463 (2017).
Deng, Q. et al. 2d transition metal–tcnq sheets as bifunctional single-atom catalysts. J. Catal. 370, 378–384 (2019).
Siegbahn, P. E. M. Water oxidation mechanism in photosystem ii, including oxidations, proton release pathways, o—o bond formation and o2 release. Biochimica et. Biophysica Acta 1827, 1003–1019 (2013).
Schutt, K. T., Unke, O. T. & Gastegger, M. Equivariant message passing for the prediction of tensorial properties and molecular spectra. International Conference on Machine Learning 9377–9388 (2021).
Wang, A. Y. T., Kauwe, S. K., Murdock, R. J. & Sparks, T. D. Compositionally restricted attention-based network for materials property predictions. npj Comput. Mater. 7, 77 (2021).
O’Keefe, M. & Breese, N. E. Atom sizes and bond lengths in molecules and crystals. J. Am. Chem. Soc. 113, 3226–3229 (1991).
Kutzelnigg, W. & Morgan, J. D. I. Hund’s rules. Z. Phys. D. 36, 197–214 (1996).
Jacobs, R., Hwang, J., Shao-Horn, Y. & Morgan, D. Assessing correlations of perovskite catalytic performance with electronic structure descriptors. Chem. Mater. 3, 785–797 (2019).
Chen, C. & Ong, S. A universal graph deep learning interatomic potential for the periodic table. Nat. Comput. Sci. 2, 718–728 (2022).
Deng, B. et al. Chgnet: Pretrained universal neural network potential for charge-informed atomistic modeling. Nat. Mach. Intell. 5, 1031–1041 (2023).
Lee, Y. L. et al. Prediction of solid oxide fuel cell cathode activity with first-principles descriptors. Energy Environ. Sci. 4, 3966 (2011).
Craig, M. J., Kleuker, F., Badjdich, M. & Garcia-Melchor, M. Fefos: A method to derive oxide formation energies from oxidation states. Catal. Sci. Technol. 13, 3427–3435 (2023).
Hong, W. T., Welsch, R. E. & Shao-Horn, Y. Descriptors of oxygen-evolution activity for oxides: A statistical evaluation. J. Phys. Chem. C. 120, 78–86 (2016).
Sun, S., Zhou, X., Cong, B., Hong, W. & Chen, G. Tailoring the d-band centers endows (nixfe1–x)2p nanosheets with efficient oxygen evolution catalysis. ACS Catal. 10, 9086–9097 (2020).
Man, I. C. et al. Universality in oxygen evolution electrocatalysis on oxide surfaces. ChemCatChem 3, 1159–1165 (2011).
Rossmeisl, J., Dimitrievski, K., Siegbahn, P. & Norskov, J. K. Comparing electrochemical and biological water splitting. J. Phys. Chem. Lett. C. 111, 18821–18823 (2007).
Pedersen, J. K. et al. Bayesian optimization of high-entropy alloy compositions for electrocatalytic oxygen reduction. Angew. Chem. 1333, 24346–24354 (2021).
Svane, K. & Rossmeisl, J. High-entropy oxides for the oxygen evolution reaction. ChemRxiv. Cambridge Open Engage 1 (2022).
Nguyen, T. X., Liao, Y. C., Lin, C. C., Su, Y. H. & Ting, J. M. Advanced high entropy perovskite oxide electrocatalyst for oxygen evolution reaction. Adv. Funct. Mater. 31, 2101632 (2021).
Peng, J., Damewood, J. & Gómez-Bombarelli, R. Data-driven, physics-informed descriptors of cation ordering in multicomponent oxides. arXiv (2023).
Clark, S. & Hayes, P. SigOpt Web page. https://sigopt.com (2019).
Mueller, D. N., Machala, M. L., Bluhm, H. & Chueh, W. C. Redox activity of surface oxygen anions in oxygen-deficient perovskite oxides during electrochemical reactions. Nat. Commun. 6, 6097 (2015).
Chase, M. Nist-janaf thermochemical tables. Am. Inst. Phys. 4, 1261–1359 (1998).
Lohmiller, T. et al. Structure, ligands and substrate coordination of the oxygen-evolving complex of photosystem ii in the s2 state: a combined epr and dft study. Phys. Chem. Chem. Phys. 16, 11877–11892 (2014).
Acknowledgements
This work was supported by the Advanced Research Projects Agency-Energy (ARPA-E), US Department of Energy under award number DE-AR0001220. This research used resources of the National Energy Research Scientific Computing Center (NERSC), a U.S. Department of Energy Office of Science User Facility operated under Contract no. DE-AC02-05CH11231. We would additionally like to thank James Damewood for many fruitful discussions and suggestions, especially with regard to the bulk perovskite oxide data.
Author information
Authors and Affiliations
Contributions
Conceptualization, R.G.B., Y.S.H., J.R.L. and J.K.; Methodology, R.G.B., Y.S.H., J.R.L., J.K.; Formal analysis, J.R.L., J.K. and H.C.; Investigation, J.R.L., J.K., H.C., Y. T. and C.H.S.; Resources, R.G.B., Y.S.H. and B.H.; Writing - original draft, J.R.L., J.K., J.P., R.G.B. and Y.S.H.; Visualization, J.R.L. and J.K.; Supervision, R.G.B., Y.S.H. and B.H.; Funding Acquisition, R.G.B., Y.S.H. and B.H.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Lunger, J.R., Karaguesian, J., Chun, H. et al. Towards atom-level understanding of metal oxide catalysts for the oxygen evolution reaction with machine learning. npj Comput Mater 10, 80 (2024). https://doi.org/10.1038/s41524-024-01273-y
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41524-024-01273-y