Key Points

The present systematic review and meta-analysis of randomized controlled trials (34 studies, n = 2830 participants aged ≥ 60 years) shows that supervised exercise intervention (SUP) brings significantly superior benefits compared with unsupervised exercise intervention (UNSUP) on physical function and well-being outcomes.

Significant benefits of SUP over UNSUP were still found for knee extension strength in those studies that applied a similar exercise program in both groups. In addition, greater benefits of SUP were observed compared with UNSUP when participants performed at least 66% of the training sessions in the assigned condition.

Given that both programs are safe and show similar attendance rates, UNSUP could represent a cost-effective tool for improving physical function and well-being in this population when SUP is not feasible.

1 Introduction

The population aged 60 years or over is rapidly growing, with the number of older adults worldwide expected to reach 1.4 billion in 2030 and 3.1 billion by 2100 [1]. This epidemiological shift is accompanied by a concomitant increase in the so-called aging-related diseases, notably frailty [2, 3]. Consequently, efforts are needed to attenuate aging-related deterioration and its associated burden.

Strong evidence supports the benefits of regular physical exercise for attenuating aging-related multisystem deterioration [4, 5]. Despite being overall beneficial, supervised exercise intervention (SUP)—the most widely analyzed type of intervention in the scientific literature—might have some drawbacks, as older adults can face difficulties in joining these interventions due to variables such as physical or financial constraints, low availability of facilities, weather conditions, distance from home, time commitments, the intimidating gym environment or, more recently, the lockdowns imposed by the COVID-19 pandemic [6]. In this context, unsupervised exercise intervention (UNSUP) appears to be a practical and potentially effective alternative [7, 8]. Indeed, a recent meta-analysis by our research group concluded that, despite being associated with modest adherence rates (67%), UNSUP might be effective for improving some important physical fitness outcomes in older adults compared with performing no exercise [9]. Similar results were reported by a recent meta-analysis that found a beneficial effect of UNSUP on physical fitness measures in healthy older adults [10].

There is therefore evidence to suggest that UNSUP is effective for improving physical fitness and overall health in older adults. It is worth noting, however, that controversy exists as to whether UNSUP might provide comparable benefits to those provided by SUP [11, 12]. A meta-analysis by Fisher et al. [12] reported that supervised strength training induces small benefits on muscle strength compared to unsupervised training in adolescents and adults, with little or no additional benefits on body composition. On the other hand, a meta-analysis by Lacroix et al. [13] found greater benefits on muscle strength/power and balance with SUP compared with UNSUP in healthy older adults. Nevertheless, the abovementioned results can be confounded by several factors, notably that most studies comparing SUP versus UNSUP have performed different interventions in each group (e.g., both groups did not perform the same type of training or the exercises performed were not comparable) [14,15,16]. Moreover, it is noteworthy that groups are frequently classified as SUP despite including unsupervised sessions, and conversely, the designation of UNSUP is sometimes applied to groups that also receive some supervised sessions [17, 18].

In this context, the aim of the present systematic review and meta-analysis of randomized controlled trials (RCTs) was to compare the safety, attendance/adherence rates, and effectiveness of SUP versus UNSUP on physical function and well-being measures in older adults, as well as to confirm whether differences are still present after accounting for potential confounding factors.

2 Methods

This systematic review and meta-analysis was reported according to the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) statement [19] and was conducted following the principles proposed elsewhere [20]. The review protocol was registered in PROSPERO (CRD42022326420).

2.1 Data Sources and Search Strategies

A systematic search was performed in the electronic databases PubMed, Web of Science, CINAHL, SPORTDiscus, and APA PsycINFO for relevant articles written in English (from inception to 4 September 2022). Screening of the articles was performed independently by two authors (AM, PGR). The complete search strategy is summarized in Online Supplementary Material (OSM) Table S1. The search was supplemented by a manual review of reference lists from included primary studies and review articles to find additional studies on the subject.

2.2 Study Selection

Eligibility criteria are reported according to the Population, Intervention, Comparison, Outcome and Study design (PICOS) approach [21]. The review was limited to studies that met the criteria shown in OSM Table S2.

Studies were first retrieved and preliminarily screened by title and abstract, and the full texts of those studies that met the inclusion criteria were assessed (AM, PGR). Disagreements between authors were resolved through consensus or after consultation with a third reviewer (PLV).

An exercise session was considered supervised when participants received synchronous supervision from a professional (e.g., an initial instructional session showing the exercises to ensure the correct technique, or individual/group supervised sessions conducted over the intervention period), whether in a face-to-face or video-call format. On the other hand, an exercise session was considered unsupervised if it did not include synchronous supervision by a sports scientist (e.g., phone calls asking about the exercises performed, assessing exercise frequency).

In the sub-analysis performed in this study, a supervised exercise group (SUP) was considered applicable if most training sessions performed had synchronous supervision by a sports scientist (i.e., at least 66% of the training sessions were supervised). An unsupervised exercise group (UNSUP) was considered applicable if the main part of the training sessions was conducted without synchronous supervision by an exercise professional (i.e., at least 66% of the training sessions were conducted without real-time supervision). Two independent reviewers (JSM, PGR) checked information from the included studies to calculate ratios in Table 1. In cases of disagreement, a third author (PLV) was consulted for clarification. This 66% cut-off has been applied in previous systematic reviews and meta-analyses comparing supervised and unsupervised exercise training [13].
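The 66% rule described above can be illustrated with a minimal sketch. The function name `classify_arm` and the "mixed" label for arms that meet neither threshold are hypothetical choices made for illustration; they are not taken from the review.

```python
def classify_arm(supervised_sessions: int, total_sessions: int,
                 cutoff: float = 0.66) -> str:
    """Label a study arm by its share of synchronously supervised sessions.

    The 0.66 cut-off follows the text; the "mixed" label is an assumption
    for arms that reach neither threshold.
    """
    if total_sessions <= 0:
        raise ValueError("total_sessions must be positive")
    if not 0 <= supervised_sessions <= total_sessions:
        raise ValueError("supervised_sessions must lie in [0, total_sessions]")

    supervised_ratio = supervised_sessions / total_sessions
    unsupervised_ratio = (total_sessions - supervised_sessions) / total_sessions

    if supervised_ratio >= cutoff:
        return "SUP"
    if unsupervised_ratio >= cutoff:
        return "UNSUP"
    return "mixed"


if __name__ == "__main__":
    # Hypothetical arms: fully supervised, two instruction sessions, half and half.
    for supervised, total in [(24, 24), (2, 36), (12, 24)]:
        print(supervised, total, classify_arm(supervised, total))
```

Under this sketch, an arm with two instructional sessions out of 36 would still count as UNSUP, which matches the way the text allows a few supervised sessions in unsupervised groups.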

Table 1 Supervised training sessions ratio over total number of training sessions in SUP and UNSUP groups

2.3 Outcomes Assessment

Safety included the number of adverse events (e.g., injury, pain, discomfort, worsening of an existing condition) as well as the number of falls during the intervention period.

Attendance rates refer to whether or not the participant carries out the exercise sessions. On the other hand, adherence refers to whether the participant, in addition to attending the exercise sessions, has achieved the intended objectives (i.e., volume, intensity, duration, exercises) [22].

To evaluate effectiveness, studies included should assess at least one of the following health-related endpoints: (1) muscle strength (e.g., knee extension strength, handgrip strength), (2) balance (e.g., one leg stance, tandem stance), (3) physical performance (e.g., timed-up-and-go test, maximum gait speed), (4) body composition (e.g., body fat, lean mass), or (5) health-related quality of life (e.g., European Quality of Life 5 Dimensions (EQ-5D-5L), 36-Item Short-Form Health Survey). If studies reported multiple variables within one of the endpoints categories, all variables were included. Only those variables that were included in at least three studies were used for meta-analysis.

2.4 Data Extraction

Two authors (JSM, PGR) independently extracted the following data from each study: participants’ characteristics, characteristics of the exercise interventions, attendance and adherence rates, outcomes assessed, and main results. This information was reviewed by a third author (AM) to ensure accuracy and completeness. Data comparing baseline and post-intervention assessments were used. We contacted the authors when studies reported the calculated change. Data were extracted, when available, as mean, standard deviation (SD), and number of participants per group. When data were provided as intervention effects and/or using other measures of dispersion (e.g., standard error, 95% confidence interval (CI)), the required information was estimated following the guidelines reported elsewhere [23]. When available, we used the results based on “intention-to-treat” analyses. We had to contact the authors of 30 studies [15, 16, 18, 24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50] because the required data were not reported. Of these, the authors of 11 studies [15, 18, 25, 26, 31, 36, 37, 39, 44, 49, 50] provided the required information.
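As an illustration of how an SD can be recovered from other measures of dispersion, the sketch below uses the standard large-sample conversions (SD = SE × √n, and SE = CI width / (2 × 1.96) for a 95% CI). This is an assumption about the kind of conversion involved; the exact procedure in the cited guidelines [23] may differ (for example, a t-quantile is often preferred for small samples). The function names are hypothetical.

```python
import math

Z_95 = 1.96  # normal quantile for a two-sided 95% interval (large-sample assumption)


def sd_from_se(se: float, n: int) -> float:
    """Recover a group SD from the standard error of its mean."""
    if n <= 0:
        raise ValueError("n must be positive")
    return se * math.sqrt(n)


def sd_from_ci(lower: float, upper: float, n: int, z: float = Z_95) -> float:
    """Recover a group SD from a confidence interval around its mean."""
    if upper < lower:
        raise ValueError("upper must be >= lower")
    se = (upper - lower) / (2 * z)
    return sd_from_se(se, n)


if __name__ == "__main__":
    # Hypothetical group of 25 participants.
    print(sd_from_se(2.0, 25))
    print(sd_from_ci(6.08, 13.92, 25))
```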

2.5 Quality Assessment

Two authors (AM, PGR) independently assessed the methodological quality of the included studies with the Tool for the assEssment of Study qualiTy and reporting in Exercise (TESTEX) scale [51]. This is a 15-point scale specifically designed for use in exercise training studies, including 5 points for study quality and 10 points for reporting. Thus, the quality of the studies was classified according to their total TESTEX score as “high” (≥ 12 points), “good” (7–11), or “low” (≤ 6). All the studies were used for data synthesis independently of their methodological quality. A third author (JSM) resolved any potential disagreement.

2.6 Statistical Analysis

A random-effects meta-analysis (DerSimonian and Laird method) was performed when at least three studies assessed a given outcome. The pooled standardized mean difference (SMD, post- minus pre-intervention data) between interventions was computed along with the 95% CI, and if the studies reported a given outcome using the same measurement units (e.g., kg, meters), the absolute mean difference (MD) was computed. A conservative correlation coefficient (Pearson’s r-value) of 0.7 between pre- and post-intervention data was used for the computation of the within-group SD, and sensitivity analyses with an r-value of 0.2 and 0.5 were performed when a significant result was found (not reported unless results became non-significant) [52]. When a study provided effect sizes separately for a given outcome divided in different sub-scales (i.e., health-related quality of life (HRQoL) divided into its different subdomains), results for that study were combined following a conservative approach by using a random-effects model assuming total dependency between measures (r = 1) as explained elsewhere [53]. Sensitivity analyses were also conducted by testing significance when removing one study at a time to check if findings were mostly driven by an individual study. Finally, sub-analyses were performed focusing solely on those studies with high and good quality according to the TESTEX scale, for those studies that applied a similar intervention in both SUP and UNSUP groups (e.g., both groups including exercise interventions targeting the same muscle groups and with similar characteristics) and for those studies in which participants performed more than two-thirds of the sessions in the assigned condition (i.e., the SUP group performed at least 66% of the sessions under supervision; Table 1, OSM Tables S3 and S4). Begg’s test was used to determine the presence of publication bias, and the I² statistic was used to assess heterogeneity across studies. I² values > 25%, 50%, and 75% were considered indicative of low, moderate, and high heterogeneity, respectively. The level of significance was set at 0.05. All statistical analyses were performed using the statistical software package Comprehensive Meta-analysis 2.0 (Biostat, Englewood, NJ, USA).
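To make the pooling steps concrete, the sketch below shows a textbook DerSimonian and Laird random-effects estimate, the usual change-score SD formula for an assumed pre/post correlation r, the I² statistic with the heterogeneity bands given above, and a leave-one-out sensitivity loop. It is an illustration under stated assumptions, not a reproduction of the software used in the review: all function names, the `Pooled` container, and the "negligible" label for I² ≤ 25% are hypothetical, and the inputs are per-study effect sizes with their variances.

```python
import math
from typing import List, NamedTuple, Sequence


class Pooled(NamedTuple):
    estimate: float
    ci_low: float
    ci_high: float
    tau2: float
    i2: float  # percentage


def change_sd(sd_pre: float, sd_post: float, r: float = 0.7) -> float:
    """SD of the pre-to-post change for an assumed pre/post correlation r."""
    return math.sqrt(sd_pre**2 + sd_post**2 - 2 * r * sd_pre * sd_post)


def dersimonian_laird(effects: Sequence[float],
                      variances: Sequence[float]) -> Pooled:
    """Random-effects pooled estimate using the DerSimonian-Laird tau^2."""
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two studies with matching variances")

    w = [1.0 / v for v in variances]                      # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = k - 1
    c = sw - sum(wi**2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)                         # between-study variance

    w_star = [1.0 / (v + tau2) for v in variances]        # random-effects weights
    sw_star = sum(w_star)
    estimate = sum(wi * yi for wi, yi in zip(w_star, effects)) / sw_star
    se = math.sqrt(1.0 / sw_star)
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return Pooled(estimate, estimate - 1.96 * se, estimate + 1.96 * se, tau2, i2)


def heterogeneity_label(i2: float) -> str:
    """Bands from the text; 'negligible' is an assumed label for I^2 <= 25%."""
    if i2 > 75:
        return "high"
    if i2 > 50:
        return "moderate"
    if i2 > 25:
        return "low"
    return "negligible"


def leave_one_out(effects: Sequence[float],
                  variances: Sequence[float]) -> List[Pooled]:
    """Re-pool after removing each study in turn."""
    e, v = list(effects), list(variances)
    return [dersimonian_laird(e[:i] + e[i + 1:], v[:i] + v[i + 1:])
            for i in range(len(e))]


if __name__ == "__main__":
    # Hypothetical SMDs and variances, for illustration only.
    smd = [0.10, 0.45, 0.60, 0.25]
    var = [0.04, 0.05, 0.09, 0.03]
    print(dersimonian_laird(smd, var))
    for result in leave_one_out(smd, var):
        print(round(result.ci_low, 3), round(result.ci_high, 3))
```

In the leave-one-out output, a confidence interval that crosses zero after dropping a single study corresponds to what the text calls a result that is not confirmed in sensitivity analyses.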

3 Results

3.1 Study Characteristics

From the retrieved studies, 34 studies derived from 30 RCTs (n = 2830 participants) met all eligibility criteria and were included in the systematic review (Fig. 1). Seven studies analyzed the same sample from three RCTs [15, 18, 25, 26, 28, 29, 54], and they were only counted once for the final sample size. The characteristics of the included studies are summarized in Table 2.

Fig. 1 PRISMA 2020 flow diagram for new systematic reviews which included searches of databases, registers, and other sources. RCT randomized controlled trials

Table 2 Characteristics of the included studies

Participants’ average age ranged from 65 to 83 years (weighted average 72). Participants’ characteristics were highly heterogeneous, including different types of populations such as participants with hip osteoarthritis [18, 25, 26], sarcopenia [44], pre-frailty and frailty [16, 17, 41, 49], osteopenia and osteoporosis [14], Parkinson’s disease [31], intermittent claudication [33, 48], chronic obstructive pulmonary disease [36], previous falls [24, 46], type 2 diabetes [40], peripheral artery disease [34], postmenopausal women [27], sedentary individuals [15, 32, 54], individuals undergoing hemodialysis [43], independent individuals who resided in retirement villages [28, 29] or in a nursing home [30], and community-dwelling older adults [35, 37,38,39, 45, 47, 55].

SUP and UNSUP included two to five training sessions per week (~ 15–90 min per session) and two to six training sessions per week (~ 15–60 min per session), respectively, and lasted between 4 and 52 weeks for both interventions. Only 18 studies (53%) included the same type of exercise intervention for both groups (in some studies the SUP and UNSUP groups performed exactly the same exercise program so only differed in the amount of guidance received while in other studies the exercises were similar but adapted to be performed at home) including strength training in two studies [45, 49], balance training in four studies [35, 38, 40, 46], aerobic training in one study [48] or multicomponent training (i.e., mainly strength, balance, aerobic, and/or stretching exercises combined in the same session) in the remaining 11 studies [24, 27,28,29,30, 36, 43, 44, 47, 55, 56]. For the other 16 studies, the exercise intervention performed between SUP and UNSUP groups was different. SUP was not fully supervised in eight out of 34 studies [18, 25, 26, 36, 37, 45, 49, 55]. Most studies provided between one and eight supervised sessions as instruction or as follow-up during the intervention to ensure that the exercise program was properly implemented [15, 18, 25, 26, 28,29,30,31, 33, 35, 38, 40, 41, 43, 47, 50, 54, 56], while others provided telephone follow-up [27,28,29, 38]. Additionally, in some studies, the UNSUP group also included from three up to 12 supervised training sessions [16, 17, 24, 44, 49].

3.2 Quality Assessment and Publication Bias

The quality of the included studies was overall good (mean TESTEX score of 10, range 4–14; Table 3). Three (9%) of the studies showed low methodological quality, 17 (50%) were of good quality, and 14 (41%) were deemed to be of high quality. Most studies did not specify allocation concealment (76% of the studies) or did not report adverse events (56% of the studies). Also, only 41% of the studies had a completion rate of at least 85% and only 41% reported details on assessors’ blinding.

Table 3 Quality of the included studies using the Tool for the assEssment of Study qualiTy and reporting in Exercise (TESTEX) scale

3.3 Endpoints

3.3.1 Main Analyses

3.3.1.1 Safety

Thirteen of the 34 included studies registered the incidence of adverse events [14, 17, 24, 32,33,34, 36, 37, 39, 41, 44, 47, 55] and four registered the number of falls [17, 24, 37, 46] during the study period, with none of them reporting significant differences between groups.

3.3.1.2 Attendance and Adherence Rates

Twenty-eight studies involving 25 RCTs reported attendance rates. Twenty-one studies computed attendance as the proportion of sessions completed from those initially prescribed for SUP [14, 17, 18, 26, 28,29,30,31,32,33,34,35, 39, 41, 44, 46,47,48,49,50, 55], reporting a weighted average attendance of 81% (range 60–100%). On the other hand, 20 studies registered the attendance rates for UNSUP, reporting a weighted attendance of 81% (range 38–100%) [14, 17, 18, 26,27,28,29,30, 32,33,34,35, 38, 39, 44, 46, 47, 49, 50, 55]. Two studies considered whether participants complied with variables such as intensity, duration, and prescribed exercises in addition to attendance at sessions (adherence) [47, 56]. Sandberg et al. [56] reported that 24% and 26% of the participants in SUP and UNSUP groups, respectively, were classified as fully adherent (i.e., achieved ≥ 80% of the exercise sessions at prescribed intensity), while 71% and 48%, respectively, were partially adherent (i.e., attendance at ≥ 20% to < 80% of exercise sessions regardless of intensity). Iliffe et al. [37] reported that 17% and 25% of the participants in the SUP and UNSUP groups, respectively, completed ≥ 75% of the prescribed sessions. Morrison et al. [40] reported that > 50% of subjects in the UNSUP group did not adequately complete the training sessions. Opdenacker et al. [54] and Van Roie et al. [15] reported that 80% and 78% of participants adhered to the exercise program (i.e., completed ≥ 80% of their program) in the SUP and UNSUP groups, respectively. The remaining six studies [16, 24, 25, 36, 43, 45] did not provide data on compliance rates for any of the groups.

3.3.1.3 Muscle Strength

Seventeen studies assessed different strength-related measures [14,15,16,17, 25, 26, 29, 30, 32, 39,40,41, 43,44,45, 49, 54], of which 13 could be meta-analyzed (OSM Figs. S1–S4). SUP induced significantly superior benefits to UNSUP on knee extension strength, which was confirmed in sensitivity analyses (Table 4). Furthermore, significant benefits of SUP were found for the sit-to-stand test (STS), but this was not significant in sensitivity analyses. No significant differences between SUP and UNSUP groups were found for handgrip strength.

Table 4 Summary of pooled results

3.3.1.4 Balance

Seventeen studies evaluated balance-related endpoints [14, 16, 24, 28, 30, 35, 37,38,39,40,41, 43, 45, 46, 49, 50, 55], of which 14 could be included in the analyses (OSM Figs. S5–S9). Significantly superior benefits were found for SUP when pooling the four studies that assessed the Berg balance scale, but significance was not confirmed in sensitivity analyses. A non-significant trend towards beneficial effects of SUP was observed for the functional reach test (FRT), although when removing the study by Watson et al. [14], the result was far from significant (Table 4). No significant differences between interventions were found for one leg stance or tandem stance with eyes closed or open.

3.3.1.5 Physical Performance

Thirty-one studies measured endpoints related to physical performance (i.e., timed-up-and-go test (TUG), usual and maximum gait speed, 6-min walk test, and maximal oxygen uptake) [14,15,16,17,18, 24,25,26, 28,29,30,31,32,33,34,35,36,37,38, 41, 43,44,45,46,47,48,49,50, 54,55,56] and 25 of them could be meta-analyzed (OSM Figs. S10–S14). SUP induced significantly superior benefits to UNSUP on TUG and usual gait speed, although these results were not significant in sensitivity analyses (Table 4). On the other hand, no significant benefits were observed for maximum gait speed, 6-min walk test, or maximal oxygen uptake.

3.3.1.6 Body Composition

Ten studies assessed different markers of body composition [14, 15, 25, 27, 41, 44, 45, 47, 54, 55] and seven could be included in the analyses (OSM Figs. S15–S18). No significant differences were found between SUP and UNSUP for body mass index, body mass, or body fat (Table 4). Nevertheless, significantly superior benefits of SUP were found for lean mass, although these differences became non-significant in sensitivity analyses when removing the study by Watson et al. [14]. Some studies analyzed other body composition variables such as bone mineral density [14] or body circumferences [15, 41, 44, 54], but they could not be meta-analyzed (Table 2).

3.3.1.7 Health-Related Quality of Life

Twelve studies assessed HRQoL [18, 31,32,33,34, 37, 39, 43, 44, 46, 48, 50], of which nine could be included in the analyses (OSM Fig. S19). Significant benefits of SUP over UNSUP were found for HRQoL, but sensitivity analysis showed that the removal of almost each individual study (except for Kakkos et al. [48], Iliffe et al. [37], or Pérez-Dominguez et al. [43]) made the results non-significant.

The results obtained in the main analyses remained essentially the same after removing the low-quality studies except for usual gait speed, which became non-significant, and the functional reach test, which became significant (see OSM Table S4).

3.3.2 Sub-Analyses of Confounding Factors

3.3.2.1 Muscle Strength

Beneficial effects of SUP on knee extension strength were observed when separately analyzing those studies that applied a similar intervention [30, 40, 44, 45, 49] and in the nine studies [16, 17, 26, 30, 40, 41, 44, 49, 54] in which participants performed ≥ 66% of the sessions in the assigned condition in both groups (OSM Table S3). Sub-analysis also confirmed significantly superior benefits of SUP in those studies [29, 43,44,45, 55, 56] where participants performed ≥ 66% of the sessions in the assigned condition (Table 1) for STS. No significant differences were found for handgrip strength in sub-analyses.

3.3.2.2 Balance

There were no significant benefits of SUP on the FRT when analyzing those studies [38, 45, 55] that applied a comparable intervention, although significant benefits were found in those studies in which participants performed ≥ 66% of the sessions in the assigned condition [14, 16, 38, 55] (OSM Table S3). No additional benefits were found when pooling the three studies that applied a similar training intervention in the SUP and UNSUP groups for one leg stance [28, 30, 43], balance scales [24, 28, 30], and tandem stance with eyes closed [30, 40, 55]. Participants in all studies completed ≥ 66% of the sessions in the allocated intervention for one leg stance [16, 28, 30, 41, 43], balance scales [16, 24, 28, 30, 50], and tandem stance with eyes closed [16, 30, 40, 55] and open [28, 30, 40, 49].

3.3.2.3 Physical Performance

A trend towards statistical significance was observed in those studies that applied a similar intervention in both SUP and UNSUP groups on TUG [24, 28, 30, 43,44,45,46, 49, 55] and usual gait speed [24, 43,44,45, 49], but not on maximum gait speed [35, 38, 45] (p = 0.274; OSM Table S3). Sub-analysis also confirmed significantly superior benefits of SUP in those studies in which participants performed ≥ 66% of the sessions in the assigned condition for TUG [14, 18, 24, 28, 30, 31, 41, 43, 44, 46, 49, 50, 55], and usual gait speed [16, 17, 24, 31, 43, 44, 49], but not for maximum gait speed [16, 17, 35, 38, 41], 6-min walk test [18, 34, 41, 43, 56], and maximal oxygen uptake [33, 47, 54].

3.3.2.4 Body Composition

Three studies performed a comparable training intervention in SUP and UNSUP for lean mass [27, 44, 47], and no significant benefits were found when pooling these studies. In all studies, participants performed ≥ 66% of the sessions in the assigned condition and significant benefits of SUP over UNSUP were found for lean mass (OSM Table S3).

3.3.2.5 Health-Related Quality of Life

Four studies [43, 44, 46, 48] applied a similar intervention in both SUP and UNSUP groups, and no differences were found when separately analyzing them. In seven [18, 31, 39, 43, 44, 46, 50] out of the nine studies participants performed ≥ 66% of the sessions in the assigned condition, and their separate analyses revealed significant benefits of SUP, whereas non-significant benefits were found for the two studies that did not meet this criterion (OSM Table S3).

4 Discussion

The present systematic review and meta-analysis compared the safety, attendance/adherence rates, and effectiveness of SUP versus UNSUP on measures of physical function and well-being outcomes in older adults. The incidence of adverse events and falls as well as the attendance to the program (81%) were similar in the SUP and UNSUP groups. Compared to UNSUP, SUP provided significantly superior benefits in knee extension strength, STS, TUG, usual gait speed, lean mass, and HRQoL, but only knee extension strength was still significant after sensitivity analyses. No benefits were found for the remaining outcomes. These results highlight the potential additional benefits that SUP can provide over UNSUP in older adults. However, for those unable to perform SUP, UNSUP may represent a safe and cost-effective alternative to ensure physical exercise.

4.1 Safety

We found that most of the included studies reported overall similar rates of adverse events and falls in SUP and UNSUP. For example, Almeida et al. [24], Costa et al. [17], and Lacroix et al. [56] performed a multicomponent exercise intervention (combining strength and aerobic training) during 12 weeks and reported no adverse events (e.g., falls, muscle soreness, or injuries) during the study for either group. Furthermore, one of the included studies with the longest duration (i.e., 35 weeks) did not register any adverse events as a result of the exercise programs [14]. Our results are in accord with a systematic review and meta-analysis analyzing the safety and effectiveness of long‑term (≥ 1 year) exercise interventions in older adults, which concluded that regardless of supervision or intervention structure (i.e., supervised group-based, unsupervised home-based, or a combination thereof), exercise reduces the number of falls and fall-associated injuries in this population [5]. However, it must be noted that the number of adverse events and falls reported in this study may be underestimated given that exercise dose variables are generally not equated between SUP and UNSUP groups (i.e., more difficult exercise selection, higher intensity, and volume for SUP).

4.2 Attendance and Adherence to the Exercise Program

In the present study we observed attendance rates of 81% for both SUP and UNSUP groups. In this regard, Lacroix et al. [13] compared the effects of SUP versus UNSUP programs including resistance and balance exercises on different physical fitness measures in older adults, finding a lack of association between attendance rates and the total number of supervised sessions. However, it is worth noting that there are at least two factors that might bias attendance rates. Firstly, most of the studies reported attendance using diaries in the UNSUP group, so the data obtained may not be accurate. Secondly, the fact that 21 of the 34 studies involved some level of supervision in the UNSUP group may affect the attendance rates obtained.

Another relevant finding is that only two studies considered whether participants complied with the prescribed parameters (i.e., intensity, duration, exercises) as well as their attendance to the training sessions (adherence). There are factors that can promote greater long-term adherence to the exercise program [57]. Although in the present study attendance rates were similar between groups, it is unclear whether this attendance rate could be maintained in the long term. One of the studies that showed the greatest benefits of SUP versus UNSUP lasted 35 weeks and had attendance rates over 85% in both groups [14]. Therefore, the limited duration of most interventions or the low attendance to the program in other studies may explain the lack of significant benefits observed in the remaining outcomes. In this sense, previous research has shown that people may be more likely to adhere to an UNSUP program compared to a SUP in the long term because UNSUP programs are easier to integrate into their lives [57]. Nevertheless, other factors associated with SUP might also be of relevance, such as obtaining direct feedback from a professional, the social component of being with other participants, or having greater material resources. Further research is therefore needed to confirm the role of attendance of SUP versus UNSUP in the long term in addition to studies that analyze adherence to training rather than only attendance rates.

4.3 Effectiveness of Supervised Exercise Intervention (SUP) Versus Unsupervised Exercise Intervention (UNSUP)

Our findings based on preliminary meta-analytical evidence suggest that SUP could provide greater benefits compared to UNSUP in different physical functions (i.e., knee extension strength, STS, TUG, usual gait speed, and lean mass) and well-being (i.e., HRQoL) measures. In line with our results, the meta-analysis of Lacroix et al. [13] found that SUP could provide additional benefits on some strength/power and balance measures. However, most of our findings became non-significant after sensitivity analyses, with the exception of knee extension strength. The observed improvement in knee extension strength is potentially relevant, as this outcome has proven to be critical for preventing osteopenia or osteoporosis [58]. Knee extension strength is also an important predictor of functional performance in older adults, as it is essential for activities of daily living and general well-being [59]. Remarkably, a systematic review and meta-analysis including data from two million adults concluded that higher levels of knee extension strength were associated with a lower risk of mortality, regardless of age and follow-up period [60].

On the other hand, no significant benefits were found for the remaining physical function outcomes (i.e., handgrip, FRT, one leg stance, balance scales, tandem stance, maximum gait speed, 6-min walk test, maximal oxygen uptake, body mass index, body mass, and body fat). There are different hypotheses that may partially explain the lack of benefits obtained. Firstly, participants will improve to a greater extent what they specifically train in their workouts (e.g., if most training sessions include lower-body exercises such as the Otago Exercise Program [37], participants will improve more in outcomes such as knee extension or STS since exercises with similar movement patterns are included). Secondly, it is possible that significant improvements will only be observed in those outcomes in which participants show more possibility of improvement because they have a lower starting level. This is consistent with the results obtained since, as we age, we tend to lose power, strength, and muscle mass due to the natural phenomenon of sarcopenia [61], so participants may be more likely to improve outcomes such as knee extension strength or STS if their baseline level is low. Lastly, there was large heterogeneity in the characteristics of the included studies and the applied interventions, as well as some potentially confounding factors that may influence the lack of additional benefits observed.

4.4 Confounding Factors in SUP and UNSUP Exercise Interventions

Of note, in many studies the exercise intervention applied in SUP and UNSUP differed substantially. We observed that training variables (i.e., volume, frequency, intensity, and type of exercise) were overall better reported in the SUP group than in the UNSUP group, which hinders drawing strong conclusions on the influence of these factors. In line with previous systematic reviews and meta-analyses [12, 62], a higher exercise intensity was usually applied in SUP than in UNSUP. For example, Iliffe et al. [37] reported that the SUP group trained at a higher intensity than the UNSUP group, and in the study by Watson et al. [14], the SUP group trained using weights equivalent to > 80–85% repetition maximum (RM) while the UNSUP group trained using a lower intensity (< 60% RM). This may be due to the fact that the target population are older people, which might lead professionals to be more conservative when prescribing intensities for the UNSUP group to avoid the potential risk of adverse events (e.g., injuries, falls) or to the participants themselves self-selecting a lower intensity during UNSUP. In a few studies the exercise selection for the SUP group was similar to the UNSUP group, taking into account the limitations and advantages in terms of facilities and equipment involved when training at a center versus training at home [40, 43]. Additionally, in most studies the type of exercise intervention was not comparable between groups. For example, Cecchi et al. [32] compared a SUP multicomponent physical exercise program (i.e., strength, balance, aerobic, and stretching) versus an UNSUP program consisting solely of regular walking (only aerobic). To account for this issue, a sub-analysis was performed comparing those studies that equated the exercise intervention as closely as possible (i.e., similar volume, frequency, intensity, and type of exercise) between the SUP and UNSUP groups. Significant differences were found for knee extension strength in those studies that performed a similar exercise program in both groups. However, the number of studies included in each of the 12 analyzed outcomes ranged from three to nine, and the remaining six outcomes could not be analyzed due to the low number of studies available.

Moreover, in some cases participants in the SUP or the UNSUP groups performed only part of the exercise sessions with or without supervision, respectively (only 13/34 studies did not include any supervised sessions during the intervention). Rarely, the two exercise groups only differed in the amount of guidance they received [30, 49]. Therefore, we conducted a second sub-analysis comparing those studies in which participants performed at least 66% of the sessions in the assigned condition (e.g., the UNSUP group performed at least 66% of the sessions without supervision), and significant differences in knee extension strength, STS, FRT, TUG, usual gait speed, lean mass, and HRQoL were observed. Eighteen outcomes could be analyzed in this sub-analysis, but the number of studies included in each outcome was reduced (ranging from three to 13 studies). The reduced number of studies included in both sub-analyses makes it difficult to reach definitive conclusions.
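The inclusion rule for this second sub-analysis can be illustrated with a minimal sketch. It assumes that each study reports, for each arm, the number of supervised sessions and the total number of sessions. The names `Arm`, `supervised_fraction`, and `meets_supervision_criterion`, as well as the session counts in the usage note, are hypothetical and are not taken from the included studies or from the analysis software used in this review.

```python
from dataclasses import dataclass

# Threshold stated in the text: at least 66% of sessions in the assigned condition.
THRESHOLD = 0.66


@dataclass
class Arm:
    """Session counts for one study arm (hypothetical data structure)."""
    supervised_sessions: int
    total_sessions: int

    def supervised_fraction(self) -> float:
        # Share of all sessions that were performed under supervision.
        return self.supervised_sessions / self.total_sessions


def meets_supervision_criterion(sup: Arm, unsup: Arm,
                                threshold: float = THRESHOLD) -> bool:
    """Return True if both arms performed at least `threshold` of their
    sessions in the assigned condition (supervised for SUP,
    unsupervised for UNSUP)."""
    sup_ok = sup.supervised_fraction() >= threshold
    unsup_ok = (1.0 - unsup.supervised_fraction()) >= threshold
    return sup_ok and unsup_ok
```

For instance, under these assumptions a study with a fully supervised SUP arm (24/24 sessions) and an UNSUP arm with 4 supervised sessions out of 24 would be retained, whereas one whose UNSUP arm performed half of its sessions under supervision would be excluded.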

Future research should examine the safety, attendance/adherence rates, and effectiveness of SUP versus UNSUP focusing on comparable training parameters, including volume, frequency, intensity, and exercise modality. These studies should specifically compare programs that differ solely in the presence or absence of supervision (i.e., fully supervised vs. fully unsupervised). This approach will provide valuable insights into the potential benefits and limitations associated with supervision, shedding light on the optimal design of exercise interventions for various populations.

4.5 Practical Implications

Our results have shown that SUP could provide additional benefits over UNSUP on some specific outcomes. The reason why UNSUP may not be as effective as SUP for improving some outcomes might be partly that workouts conducted under the supervision of a professional may be performed with a higher quality. For example, participants in SUP usually train with better technical execution of the exercises, higher intensity and rating of perceived effort, better implementation of individualization and progression principles, and higher motivation due to direct feedback, resulting in greater improvements [63, 64]. Therefore, given its potential superiority, SUP might be recommended over UNSUP when possible. However, there are some barriers usually associated with SUP in this population. For example, a systematic review conducted in the oldest old (i.e., people aged 80 years and over) showed that some of the main barriers to exercise identified were costs, transport, lack of access to exercise facilities, no exercise companion or being alone, care of siblings or others, fatigue, and embarrassment [65]. In addition, Costello et al. [66] reported lack of time and discipline, potential for injury, inadequate motivation, boredom, and intimidation as main barriers to regular physical activity.

Previous meta-analyses have shown that UNSUP is effective for improving health-related outcomes in older adults [7, 10, 67]. Our research group showed that, compared with no exercise, UNSUP could be safe and effective for improving measures of muscle strength/power and balance in community-dwelling older adults, although the adherence to these programs was low [9]. Similarly, a meta-analysis including 17 studies also reported that UNSUP was effective for enhancing physical fitness in healthy older adults [10]. More recently, a meta-analysis including 12 studies (performed in both adolescents and adults) concluded that supervised resistance training could provide small additional benefits over unsupervised training on muscle strength, but no consistent differences were found for body composition [12]. Thus, when SUP is not feasible, UNSUP could be a safe and cost-effective alternative for improving the fitness and health of older adults.

4.6 Strengths and Limitations

One of the main strengths of the present study is that it provides novel information, as we focused solely on older adults, included a large number of studies (34 RCTs and 2830 participants), and analyzed both physical function and well-being outcomes. Another major strength is that previous systematic reviews and meta-analyses have often focused on a single specific type of exercise (e.g., strength or balance alone), whereas the present review included studies that examined all exercise types (i.e., strength, balance, flexibility, aerobic, or a combination thereof). Conversely, some limitations of the present study should be acknowledged. Notably, the low number of available studies for some of the meta-analyzed outcomes makes the conclusions preliminary. One potential confounding factor is that most studies did not equate all training variables (i.e., volume, frequency, intensity, and type of exercise) for the SUP and UNSUP groups. To account for this issue, we performed sub-analyses comparing those studies that performed a similar exercise intervention in both groups. The lack of a consistent terminology regarding the degree of supervision in exercise interventions can be considered another confounding factor, since most of the UNSUP programs included some supervised sessions. Therefore, we performed additional sub-analyses using an objective definition of training supervision (≥ 66% supervised sessions in the SUP group and ≥ 66% unsupervised sessions in the UNSUP group). There was also a lack of homogeneity in the tests used for assessment, making it difficult to reach definitive conclusions due to the small number of studies included in each meta-analyzed outcome. Future studies should take all these limitations into consideration. Finally, it is worth noting that the included studies analyzed both healthy and diseased populations, but given the heterogeneity of the populations assessed within and between studies, and the low number of studies included, we were unable to perform sub-analyses to determine how participants’ characteristics (i.e., healthy vs. clinical populations) moderate exercise benefits. Indeed, given the advanced age of the participants included, setting an objective definition of “healthy” is highly complex, since most participants presented some comorbidities (e.g., diabetes, hypertension, obesity, osteoarthritis, sarcopenia, frailty).

5 Conclusion

The present study suggests that SUP may offer certain advantages over UNSUP in enhancing physical function and well-being outcomes among older adults. Nevertheless, given that both interventions show high attendance rates and similar levels of safety, UNSUP appears to be an accessible approach for older adults, which might overcome some of the limitations associated with SUP. Future research should aim to examine the safety, attendance/adherence rates, and effectiveness of SUP versus UNSUP, equating training parameters between the two groups so that they differ only in the presence or absence of supervision (i.e., the UNSUP group should not include supervised sessions, or vice versa). These studies will provide valuable information about the benefits and limitations of supervision, informing the optimal design of exercise interventions. Figure 2 summarizes the findings obtained in this research.

Fig. 2 Graphical summary of the study findings. CI confidence interval, HRQoL health-related quality of life, SMD standardized mean difference, SUP supervised exercise interventions, UNSUP unsupervised exercise interventions