Effects of resistance training on self-reported disability in older adults with functional limitations or disability – a systematic review and meta-analysis

Background Self-reported disability has a strong negative impact on older people’s quality of life and is often associated with the need for assistance and health care services. Resistance training (RT) has been repeatedly shown to improve muscle function (e.g. strength) and functional capacity (e.g. gait speed, chair-rise) in older adults with functional limitations. Nevertheless, it is unclear whether such objectively assessed improvements translate into a reduction in self-reported disability. Objectives To assess: i) whether and to what extent RT interventions have an effect on self-reported disability in older adults (≥65 years) with functional limitations or disability; and ii) whether the effects on self-reported disability are associated with changes in objective measures of muscle strength and functional capacity across studies. Methods PubMed, Embase, Web of Science, CINAHL and SPORTDiscus electronic databases were searched in June 2018. Randomized controlled trials reporting effects of RT on self-reported disability/function in adults aged ≥65 years with defined functional limitations or self-reported disability were eligible. Data on self-reported disability/function were pooled by calculating adjusted standardized mean differences (SMD) using Hedges' g. Likewise, effect sizes for three secondary outcomes (knee extensor muscle strength, gait capacity and lower body functional capacity) were calculated and fitted as covariates in separate meta-regressions with self-reported disability as the dependent variable. Results Fourteen RCTs were eligible for the primary meta-analysis on self-reported disability. The total number of participants was 651 (intervention n = 354; control n = 297). A significant moderate positive effect of RT was found (SMD: 0.59, 95% CI: 0.253 to 0.925, p = 0.001). Between-study heterogeneity was present (I² = 75.1%, p < 0.001). RT effects on objective measures of lower body functional capacity were significantly associated with effects on self-reported disability (Adj. R² = 99%, p = 0.002, n = 12 studies), whereas no significant associations with gait capacity or knee extensor strength were found. Conclusions This review provides evidence that RT has a moderate positive effect on self-reported disability/function in older people with or at risk for disability. The effects are strongly associated with effects on objective measures of lower body functional capacity.


Background
The prevalence of disability increases with age [1] and is a serious societal challenge because of the projected demographic trend towards an ageing population [2]. In European countries, the costs related to the additional need for assistance with activities of daily living (ADL) and long-term institutionalization of older people are projected to rise by 1.1% of gross domestic product between 2013 and 2060 [3].
Disability has been defined as a deficit between the capacities of an individual and that individual's contextual factors [4]. Self-reported disability reflects an individual's perception of this relationship, and refers to experienced difficulties in executing a task or being involved in a socially defined role including household, self-care and social life. Self-reported measures of disability and function in ADL reflect dependency on assistance, which is linked to reduced quality of life in older adults [5]. Moreover, older adults' perception of function and disability plays a role in the allocation of health care services. Health care providers commonly use interviews or questionnaires when rating older persons' need for assistance in basic ADL (eating, dressing, bathing, transferring from bed to chair and using the toilet) as well as more complex activities required for independent living such as housekeeping, cooking and shopping (instrumental ADL; IADL) [6]. Self-reported disability is therefore a highly relevant outcome which impacts the quality of life of the individual and challenges the sustainability of the health care sector [3,7]. Preventing disability and maintaining independent living are therefore two high priorities in global health strategies [8].
Resistance training (RT), defined as exercise that causes muscles to work or hold against an applied force or weight [9], has been consistently reported to improve neuromuscular function (e.g. muscle strength) [10-13] and functional capacity (e.g. gait speed, chair-rise) [13-19] in older adults. These findings draw attention to RT as a means of preventing disability and dependency [10], and current health recommendations encourage older adults to engage in RT on a weekly basis [9,20]. Nevertheless, whether improvements in neuromuscular function and functional capacity translate into reduced self-reported disability in older adults has not yet been conclusively established. This mismatch was investigated in a randomized controlled trial (RCT) by Chandler et al. [16] in 1998. Their study showed a relationship between gains in strength and functional capacity (i.e. gait speed), but not between strength and self-reported disability after 10 weeks of RT [16]. A similar mismatch was later confirmed in older women with coronary heart disease by Brochu et al. [21]. The authors found a correlation between strength gain and functional capacity improvements, but not between strength gain and self-reported ADL-function [21]. While RT-interventions have often used assessments of neuromuscular function (e.g. muscle strength) and objective measures of functional capacity such as gait, chair-rise, reaching, stooping and lifting [6,14,22], self-reported disability outcomes have received less attention and have shown inconsistent results [14,23,24]. In a systematic review, Weening-Dijksterhuis and colleagues [24] found that resistance-type exercise of moderate to high intensity had small to moderate effects on ADL-disability in frail institutionalized elderly (effect sizes < 0.50).
In line with that, a Cochrane review and meta-analysis by Liu and Latham [23] showed that the effects of RT on self-reported disability measured by the functional domain of the 36-Item Short Form Health Survey Instrument (SF-36) and self-reported measures of ADL were significant, but small (33 trials, 2172 participants; SMD 0.14, 95% CI 0.05 to 0.22) [23]. However, the study populations in this comprehensive Cochrane review were heterogeneous in terms of health, functional status and age [23].
A ceiling effect has been suggested as a potential explanation for why RT-interventions fail to detect changes in self-reported ADL-disability in relatively well-functioning older adults [22]. Improvements in self-reported disability following RT-interventions may therefore only be detected in older adults with limitations or existing disability.
Understanding whether RT-interventions translate into better self-perception of ability in participants with pre-existing limitations is therefore highly valuable for older adults, the health care sector and recommendation guidelines.

Aims
The primary aim of this systematic review and meta-analysis is to investigate the effects of RT on self-reported disability in older people aged 65 years and above with functional limitations or self-reported disability. The secondary aim is to assess associations between the effects of RT on objective and subjective measures of disability across studies.

Study design and methods
This literature review was conducted according to the PRISMA guidelines for systematic reviews and meta-analyses [25,26]. The eligibility criteria were specified in accordance with the PICOS approach (participants, intervention, comparison, outcomes, study design) as recommended by the Centre for Evidence-Based Medicine [27] (details of the search strategy are outlined in Additional file 1). The quality assessment of eligible records was based on a validated tool for quality appraisal in reviews of physical therapy interventions, the Physiotherapy Evidence Database scale (PEDro scale) [28].

Eligibility criteria
Participants/study population
Studies including participants aged 65 years and above and of any residential status, sex and ethnicity were eligible for inclusion. To ensure that all studies only included participants who met the age criterion, those that enrolled participants < 65 years were excluded, even if the sample mean age was > 65 years. The study population had to be characterized by functional limitations (e.g. low gait speed) or self-reported disability according to criteria set by the authors of the original paper. Studies examining the impact of RT in populations with significant medical conditions such as cancer, renal and hepatic diseases, obstructive pulmonary disease, or neural abnormalities were not eligible. Additionally, study samples characterized by cognitive impairments, amputation or permanent use of a wheelchair were excluded.

Intervention
To be considered for inclusion, RT needed to be the dominant component of the exercise intervention, with more than 50% of the intervention involving RT. The warm-up and cool-down periods were not considered part of the intervention, and therefore the use of aerobic training, stretching, functional training and balance exercises during these phases was not used as a basis for exclusion. Interventions of any frequency, intensity and duration were eligible. Trials applying multifaceted interventions combining RT with macro-nutritional, behavioural, psychological, or medical approaches in all of the experimental groups were excluded, as this could impact the results. Also, early rehabilitation interventions after joint surgery were ineligible, since the influence of the natural post-surgery recovery process has been identified as a confounding factor [29].

Comparison
Trials were eligible if they comprised at least one intervention group (IG) receiving an intervention that fulfilled the above listed criteria, and a control group (CG) that: i) did not receive any treatment or ii) was provided attention control, standard therapy, sham intervention or usual care. In case of the latter two, trials were eligible only if it was explicitly stated in the original article that this control treatment was expected to have no effect on the outcome measures.

Outcomes
This study adopted the language of the International Classification of Functioning, Disability and Health (ICF) [4] as recommended by international health policy makers and the research community [30]. The World Health Organisation (WHO) clearly states that the activity component (i.e. execution of a task or action) and the participation component (i.e. involvement in a life situation) in the ICF-model are overlapping [4]. The lack of clear cut definitions between the domains has been identified as a shortcoming of the ICF-model when used in research [30]. To enable a clear definition of outcomes in this review, the following categorization of outcomes was used.

Primary outcome
Self-reported function/disability: measures aiming to quantify either the degree of functioning or disability in an individual in his/her life setting. These measures should be obtained from questionnaires, either self-administered or by interview, or by proxy (observation), as this approach is common in studies of the oldest old and institutionalized. The primary outcome of the present review is self-reported disability. Trials were eligible if self-reported disability was measured by specific disability questionnaires or by a subscale within questionnaires comprising multiple aspects of health status or health-related quality of life. If the data from the relevant subscale were not presented separately in the original article, authors were contacted. In cases where more than one eligible measure of disability was presented in a trial, the measure with most items reflecting ADL and IADL was selected for analysis. Eligible outcomes could also include items related to mobility-disability.

Trial design
Only RCTs were eligible for inclusion. Trials comprising more than two study arms were eligible if the relevant outcome data were provided in the article separately for all groups, or could be obtained on request.

Information sources and searches
In order to ensure an optimal search strategy for inclusion of all relevant records, a pilot search matrix was drafted and continually edited from December 2015 to October 2016 by two reviewers (PO and ADT). Subsequently, a final comprehensive, generic matrix was developed and adapted for each database (Additional file 1). It comprised the following key search terms: older adults, resistance training, and self-reported ADL-disability, representing participants, interventions and outcomes respectively, in accordance with the PICOS approach. Blocks of relevant synonyms for each key term were created and subsequently combined using Boolean operators. The electronic databases PubMed, Embase, Web of Science, CINAHL and SPORTDiscus were searched on June 27th, 2018. The title and abstract fields were searched in all databases. Moreover, the "key words" field and relevant subject headings were searched when applicable. Articles of any western language and publication date were eligible.

Study selection
Upon completion of the search, results were combined in EndNote X7.7.1 and duplicates were removed. The remaining records were then transferred to a web-based software for review management (Covidence, Veritas Health Innovation Ltd., 2019 [31]) and two reviewers (PO and MB) screened titles and abstracts to identify potentially eligible trials for full-text assessment. PO and MB independently included or excluded records according to defined criteria. In the case of disagreement at any level in the selection process, PO and MB discussed the inclusion of a trial until consensus was reached or a third reviewer (PC) was consulted.

Data collection and extraction
PO, ADT and MB carried out the quality assessment and data extraction from the eligible records.

Extraction of primary and secondary outcome data
Change from baseline in self-reported disability and all objective measures of muscle function/impairment and functional capacity/limitation data were extracted from each study. The most frequently reported categories of secondary outcome data were selected for the secondary analysis. These were: i) measures of isometric and dynamic knee extensor strength (KE-strength); ii) measures from tests of gait capacity (i.e. max gait speed, self-selected gait speed, time to cover set distance or distance covered in set time); and iii) lower body functional capacity assessed by any objective test of functional capacity relying mainly on the lower body. In order to decrease outcome heterogeneity, tests including chair-rise (e.g. chair-rise tests, timed up-and-go, Short Physical Performance Battery tests) were prioritized when studies reported various functional tests.
When available, within-group mean change scores ($\mathrm{mean}_{\mathrm{change}}$), baseline means ($\mathrm{mean}_{\mathrm{base}}$), follow-up means ($\mathrm{mean}_{\mathrm{end}}$), all corresponding measures of variability (i.e. standard deviation, SD; standard error, SE; confidence interval, CI; coefficient of variation, CV; ranges, interquartile ranges) and p-values of within-group change for each group were extracted. As this review focusses on the short-term effects of RT-interventions, the baseline and follow-up measures were defined as the time points nearest the initiation and termination of the interventions respectively, and the within-group mean change score was defined as $\mathrm{mean}_{\mathrm{change}} = \mathrm{mean}_{\mathrm{end}} - \mathrm{mean}_{\mathrm{base}}$. When both the change score including its variability measure, and a complete set of baseline and follow-up means and measures of variability were missing, the authors were contacted for additional data. When the data could not be retrieved, the study was excluded from the meta-analysis, but kept in the review to be included in a vote-counting analysis (a supplementary conservative approach) to summarize intervention effects (described below).

Descriptive data
Descriptive items of the included studies
First author; year; country; setting; study design; aims of the study; participant characteristics (health status, residential status and distribution of sex); sample size in analysis; drop-out rate; compliance rate; short description of the intervention; experimental and control conditions; and the direction of the effect for the self-reported disability outcomes were extracted and complemented by relevant additional notes (displayed in Table 1). For use in a post hoc sub-analysis, participant mean age was dichotomized as 65-79 years and ≥ 80 years. Likewise, categories of gait speed at baseline (≥ 0.8 vs. < 0.8 m/second) were used as a surrogate measure to quantify the degree of functional capacity across study samples.

RT-intervention items
In order to describe the extent of heterogeneity in the included RT-programs, data regarding: i) training intensity; ii) duration; iii) frequency; iv) supervision; and v) progression protocols were extracted. A previous review [12] found that training intensity and duration were significant predictors of the effect of RT on muscle strength. Therefore, these two variables were selected as covariates in two independent meta-regressions in order to investigate whether, and to what extent, the two variables predict self-reported ADL-disability. For training intensity, percentage of one repetition maximum (%1RM) was selected as the standardized unit and used as a continuous covariate [48]. When a range of intensities was provided, the mean intensity, rounded up to the nearest 5%, was used as the estimate. Intensities based on Rate of Perceived Exertion (RPE) scales did not form part of the covariate.
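The rule for reducing a reported intensity range to a single %1RM covariate value can be illustrated with a short sketch. The function name `intensity_covariate` is hypothetical, and "rounded up" is interpreted here as a ceiling to the next multiple of 5, which is an assumption about how the rule was applied.

```python
import math

def intensity_covariate(low_pct, high_pct=None, step=5):
    """Return one %1RM value for use as a meta-regression covariate.

    A single reported intensity is returned unchanged. A reported range is
    reduced to its mean and rounded UP to the next multiple of `step`
    (assumed interpretation of "rounded up to the nearest 5%").
    """
    if high_pct is None:
        return float(low_pct)
    mean_pct = (low_pct + high_pct) / 2
    return float(math.ceil(mean_pct / step) * step)

# Illustrative values only, not taken from the included trials.
print(intensity_covariate(80))      # single intensity
print(intensity_covariate(60, 80))  # range whose mean is already a multiple of 5
print(intensity_covariate(70, 85))  # range whose mean (77.5) is rounded up
```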

Quality assessment/risk of bias in individual studies
Risk of bias in individual studies was evaluated using the validated PEDro scale [28,49]. The PEDro quality assessment tool rates the internal validity of RCTs on a scale from 0 (low quality) to 10 (high quality), with a score of ≥6 representing a cut-off for high-quality studies. For this review, two modifications to the scale were made. First, when awarding points for item 4 ("the groups were similar at baseline regarding the most important prognostic indicators"), original texts and tables were screened for evidence that baseline differences were assessed. However, equality in baseline levels of self-reported disability was not always investigated, as this was a secondary outcome in many studies. Consequently, we performed a meta-analysis of baseline scores to screen for potential baseline differences in self-reported disability that were not addressed in the original articles [50]. When this analysis revealed a baseline value in the active intervention group that differed significantly from that of the control group in a study, that study was awarded a "no" for item 4 of the PEDro scale, regardless of whether this baseline difference was addressed in the original paper. Second, true participant blinding by a placebo intervention is not possible given the nature of the active treatment (i.e. RT). Item 5 ("there was blinding of all participants") was therefore considered satisfied if a sham intervention or attention control was applied in the control group.

Handling of missing data
Missing SDs were imputed from other available measures of variability (SE, CI) or from the exact p-values using methods proposed by Fu et al. [50]. When only a baseline mean, a follow-up mean and the corresponding measures of variability were reported, the mean change was calculated based on these data, whereas the SD of the change score ($SD_{\mathrm{change}}$) was imputed from correlation estimates (Corrs) from other studies using the following equation ([50]; [51], ch. 16.1.3.2):
$$SD_{\mathrm{change}} = \sqrt{SD_{\mathrm{base}}^{2} + SD_{\mathrm{end}}^{2} - 2 \cdot Corr_{\mathrm{mean}} \cdot SD_{\mathrm{base}} \cdot SD_{\mathrm{end}}}$$
$Corr_{\mathrm{mean}}$ is the mean of all Corrs that could be calculated for the given category of outcome (i.e. self-reported disability, KE-strength, lower body physical function, or gait capacity). The Corrs for the individual studies were calculated by the equation below when studies provided sufficient information:
$$Corr = \frac{SD_{\mathrm{base}}^{2} + SD_{\mathrm{end}}^{2} - SD_{\mathrm{change}}^{2}}{2 \cdot SD_{\mathrm{base}} \cdot SD_{\mathrm{end}}}$$
If very few studies provided the data needed to calculate $Corr_{\mathrm{mean}}$, missing SDs were imputed directly from the other treatment group within the same study or from another included study. Where studies reported multiple intervention groups of more than one RT modality (i.e. varying level of intensity, supervision or frequency) versus a control condition, data were combined according to existing recommendations ([51], ch. 7.7.3.8 and 16.5.4). Non-parametric summaries were used to estimate means and SDs in two studies [39,43] regardless of skewed distribution. This approach is supported by Fu et al. [50] provided the variable of interest has a symmetric distribution in most included studies, as was the case in this meta-analysis. An exact description of how missing data were handled for each study can be obtained from the corresponding author.
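The imputation steps in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the correlation formulas are the standard ones for change-from-baseline SDs, and `sd_from_ci` uses a normal critical value of 1.96 (an assumption; a t-value would be more appropriate for small samples).

```python
import math

def sd_from_se(se, n):
    """SD of a group mean recovered from its standard error."""
    return se * math.sqrt(n)

def sd_from_ci(lower, upper, n, crit=1.96):
    """SD recovered from a 95% CI of a group mean (large-sample assumption)."""
    return math.sqrt(n) * (upper - lower) / (2 * crit)

def corr_from_sds(sd_base, sd_end, sd_change):
    """Baseline/follow-up correlation implied by three reported SDs."""
    return (sd_base**2 + sd_end**2 - sd_change**2) / (2 * sd_base * sd_end)

def sd_change_from_corr(sd_base, sd_end, corr):
    """SD of the change score for a given baseline/follow-up correlation."""
    return math.sqrt(sd_base**2 + sd_end**2 - 2 * corr * sd_base * sd_end)

def impute_sd_change(sd_base, sd_end, known_corrs):
    """Impute a missing change-score SD from the mean of correlations
    calculated in other studies of the same outcome category."""
    corr_mean = sum(known_corrs) / len(known_corrs)
    return sd_change_from_corr(sd_base, sd_end, corr_mean)

# Illustrative numbers only, not data from the included trials.
corr = corr_from_sds(4.0, 5.0, 3.0)
print(round(corr, 3))
print(round(impute_sd_change(4.0, 5.0, [0.7, 0.9]), 3))
```

Because the imputed SD depends on the assumed correlation, a sensitivity check with a lower and a higher correlation is a common companion to this kind of imputation.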

Synthesis of results
The data synthesis was carried out using Stata statistical software (Stata/IC 15.1). The results from the individual studies were combined and pooled by calculating adjusted SMDs using Hedges' g. Accordingly, meta-analyses were performed for the primary and the secondary outcomes. For scales where low scores are favourable, the means were multiplied by −1. Considering the broad inclusion criteria for the resistance training interventions, true heterogeneity in intervention effects was expected and the DerSimonian-Laird random-effects method for continuous outcomes was applied accordingly. The extent of between-study heterogeneity was tested with the standard Q statistic and the I² index [52]. There is some agreement across references ([51], ch. 9.5.2; [53]) that heterogeneity should be assumed if I² is > 50%, indicating that more than 50% of the variability in the outcome cannot be explained by sampling variation, and cut points of I² values of 25, 50, and 75% may be used to categorize low, moderate, and high amounts of heterogeneity [54].
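To make the pooling procedure concrete, the following sketch computes Hedges' g from group change scores and pools studies with the DerSimonian-Laird random-effects estimator, including Q and I². It is an illustration only: the function names are hypothetical, the variance of g uses a common textbook approximation that may differ slightly from the formula implemented in Stata, and the input numbers are invented.

```python
import math

def hedges_g(mean_ig, sd_ig, n_ig, mean_cg, sd_cg, n_cg):
    """Bias-adjusted SMD (Hedges' g) and its approximate variance."""
    df = n_ig + n_cg - 2
    sd_pooled = math.sqrt(((n_ig - 1) * sd_ig**2 + (n_cg - 1) * sd_cg**2) / df)
    d = (mean_ig - mean_cg) / sd_pooled
    j = 1 - 3 / (4 * df - 1)  # small-sample correction factor
    var_d = (n_ig + n_cg) / (n_ig * n_cg) + d**2 / (2 * (n_ig + n_cg))
    return j * d, j**2 * var_d

def dersimonian_laird(effects, variances):
    """Random-effects pooling with the DerSimonian-Laird tau^2 estimator."""
    k = len(effects)
    if k < 2:
        raise ValueError("at least two studies are needed")
    w = [1 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    c = sw - sum(wi**2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)
    w_re = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return {"pooled": pooled, "se": se, "tau2": tau2, "q": q, "i2": i2,
            "ci": (pooled - 1.96 * se, pooled + 1.96 * se)}

# Invented change scores (mean, SD, n) for intervention vs. control groups.
studies = [(5.0, 8.0, 30, 1.0, 8.0, 28), (2.0, 6.0, 20, 1.5, 6.5, 22),
           (9.0, 10.0, 40, 2.0, 9.0, 38)]
gs, vs = zip(*(hedges_g(*s) for s in studies))
print(dersimonian_laird(gs, vs))
```

For instruments where a lower score is favourable, the group means would be multiplied by −1 before calling `hedges_g`, so that a positive g always favours the intervention group.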

Secondary analysis
To investigate whether the intervention effects on objective measures of functioning were also associated with the changes in self-reported ADL-disability, meta-regressions were performed as follows: the effect sizes (i.e. SMDs) calculated in the meta-analyses on KE-strength, gait capacity and lower body functional capacity were fitted as continuous covariates in three separate meta-regressions (metareg command), using the effect size (i.e. SMD) on self-reported disability/function as the dependent variable and the standard error of that SMD to weight the studies. Three measures from the meta-regressions were used to interpret the results [55]: i) I²_res (%), the percentage of the residual variation that is due to between-study heterogeneity (the remainder, 100% − I²_res, is due to within-study sampling variability); ii) adj. R² (%), the proportion of the heterogeneity in the dependent variable that can be explained by the covariate fitted in the meta-regression; and iii) the p-value of the overall test of the covariate in the random-effects model.
In addition, the predictive value of specific intervention parameters (duration of intervention and load intensity) for the size of the RT effect on self-reported disability/function was tested by meta-regression as described above [55]. Sources of heterogeneity were also investigated by performing stratified analyses according to participant age (65 to < 80 years vs. ≥ 80 years), residential status and relevant study quality parameters (parameters selected post hoc) [53].
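As a compact restatement of the meta-regression set-up described in this section, the model and the two heterogeneity summaries can be written as follows. This is a sketch of the standard random-effects meta-regression formulation with one covariate; the symbols ($g_i$, $x_i$, $\beta_0$, $\beta_1$, $\tau^2$, $\tau_0^2$, $Q_{res}$, $k$) are notation introduced here for illustration, and the estimator actually used by the software may differ in detail.

```latex
% Study i = 1, ..., k: g_i is the SMD for self-reported disability,
% x_i is the covariate (e.g. the SMD for lower body functional capacity),
% SE_i is the standard error of g_i.
g_i = \beta_0 + \beta_1 x_i + u_i + \varepsilon_i,
\qquad u_i \sim N(0, \tau^2), \quad \varepsilon_i \sim N(0, SE_i^2)

% Proportion of between-study variance explained by the covariate,
% where \tau_0^2 is the between-study variance without the covariate.
R^2_{adj} = \frac{\tau_0^2 - \tau^2}{\tau_0^2} \times 100\%

% Share of residual variation attributable to between-study heterogeneity,
% with Q_{res} the residual heterogeneity statistic on k - 2 degrees of freedom.
I^2_{res} = \max\!\left(0, \frac{Q_{res} - (k - 2)}{Q_{res}}\right) \times 100\%
```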

Risk of publication bias across studies
To assess whether publication bias influenced the results of the primary outcome, a funnel plot was created [56] and Egger's test [57] (metabias command in Stata) was applied to assess small-study effects.
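The regression underlying Egger's test can be sketched as an ordinary least-squares fit of the standardized effect (effect divided by its SE) on precision (1/SE); an intercept far from zero suggests funnel-plot asymmetry. The function below is a hypothetical illustration in plain Python, not the Stata routine, and it returns the intercept with its standard error rather than a p-value (the ratio of the two would be compared against a t distribution with k − 2 degrees of freedom).

```python
import math

def egger_regression(effects, standard_errors):
    """OLS fit of effect/SE on 1/SE; returns (intercept, slope, se_intercept)."""
    k = len(effects)
    if k < 3:
        raise ValueError("at least three studies are needed")
    x = [1.0 / se for se in standard_errors]                  # precision
    y = [g / se for g, se in zip(effects, standard_errors)]   # standardized effect
    x_bar = sum(x) / k
    y_bar = sum(y) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("standard errors must not all be equal")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    resid_var = sse / (k - 2)
    se_intercept = math.sqrt(resid_var * (1.0 / k + x_bar**2 / sxx))
    return intercept, slope, se_intercept

# Invented data in which smaller studies (larger SE) show larger effects.
intercept, slope, se_int = egger_regression([0.2, 0.3, 0.6, 0.9],
                                            [0.1, 0.2, 0.3, 0.4])
print(intercept, slope, se_int)
```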

Search and study selection
The result of the search is outlined in the PRISMA diagram (Fig. 1). The search yielded 12,970 records, of which 5051 were duplicates. Thus, the titles/abstracts of 7919 records were screened for possible eligibility, leading to the exclusion of 7604 records based on the inclusion and exclusion criteria. The full texts of 315 records were assessed for eligibility and 295 records were excluded (Fig. 1). Of the 20 eligible records, 14 included complete data that enabled their inclusion in the primary meta-analysis. In five of the remaining six records, the relevant data were incomplete and were not provided upon request [16,38,41,42,46]. The sixth trial [40] was initially included in the main meta-analysis. However, the exceptionally small variation in change in this study heavily distorted the results of the meta-analysis (a forest plot of the meta-analysis including this study is provided in Additional file 2, Fig. 1a). Consequently, it was decided to exclude the data from this study from the quantitative pooling. Not taking these six studies into account could, however, increase the risk of systematic, selective reporting, potentially leading to an overly positive conclusion. Consequently, we made a post hoc decision to keep the six trials in the qualitative synthesis of data and include them in an additional vote-counting analysis ([51], ch. 9.4.11; [58]). The vote-counting procedure involves simply comparing the number of studies reporting a positive effect (intervention favours experimental group), no effect (the effect was non-significant) and a negative effect (intervention favours control group). If a majority of studies fell into any of these three categories, this category was declared the best estimate of the direction of the true relationship between the independent variable (i.e. RT-intervention) and the dependent variable (i.e. self-reported ADL) [58]. This method has major limitations, as it does not take into account the quality of the studies, the size of the samples or the size and variability of the effect. Bushman & Wang [58] advocate that this method should be used only as a supplementary data synthesis approach to complement the primary meta-analysis of SMDs, as was the case in this study.
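The vote-counting rule described above amounts to tallying effect directions and checking whether one category holds a majority. A minimal sketch follows; the function name `vote_count`, the category labels and the example tallies are hypothetical, and "majority" is interpreted as strictly more than half of the studies.

```python
from collections import Counter

def vote_count(directions):
    """Tally effect directions and return (counts, majority_category).

    `directions` holds one label per study: "positive", "none" or "negative".
    The majority category is None when no label covers more than half
    of the studies.
    """
    counts = Counter(directions)
    total = sum(counts.values())
    if total == 0:
        return counts, None
    label, n = counts.most_common(1)[0]
    majority = label if n > total / 2 else None
    return counts, majority

# Invented example, not the result reported in this review.
counts, majority = vote_count(["positive", "positive", "none", "positive", "negative"])
print(dict(counts), majority)
```

As the surrounding text notes, this tally ignores sample size, study quality and effect magnitude, which is why it only complements the pooled SMD.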

Study characteristics
A detailed description of the characteristics of the eligible studies is presented in Table 1.
Overall, a total of 1422 participants were included in the 20 RCTs, with 747 and 675 participants in the intervention and control groups respectively. In the subsample of trials included in the primary meta-analysis (n = 14), the total number of participants was 651 (IG: n = 354, CG: n = 297). Sample size ranged from 21 to 222 participants (median/mean 50/71). In three trials, the mean age of participants was under 75 years [32,35,39], all of which were included in the meta-analysis. Eight trials recruited participants who were community-dwelling [15,16,32,35,38-40,46], eight trials recruited institutionalized elderly [17,33,36,37,41,44,45,47], three trials recruited from sheltered housing [18,34,43], and one trial did not specify residential status [42]. In the five trials that were not included in the main meta-analysis, the participant mean age was above 75 years and they were not institutionalized. The distribution of participant sex was unequal, with females being more represented than males. Three trials included only women [32,46,47], eight trials had > 60% female participants [33,34,36,37,39,41,43,44], five trials included men and women nearly evenly (50-60% female) [15,16,35,38,42] and four studies did not report sex distribution [17,18,40,45]. Three trials [17,34,59] had two eligible RT-intervention groups for which data were collapsed as described previously. Regarding the comparison condition, nine trials provided a sham intervention, such as stretching or exercises without load, or attention control equalling or approximating the time spent in the RT-intervention [15,17,32,37,38,42-44,46]. The control groups in the remaining 11 studies were asked to maintain their current activity level and were provided with usual care when relevant. Four studies did not report compliance [16,40,43,44]. Compliance in the remaining 16 studies was relatively high, with mean compliance rates of approximately 75% or above. Compliance was in most cases expressed as (number of attended sessions / number of planned sessions) × 100%. In one study, however, a specific compliance criterion was set and compliance below this threshold was characterised as drop-out, making 100% compliance inevitable [32]. All but two studies [40,44] reported drop-out. The mean drop-out rate was higher among IGs than among CGs (IG mean drop-out: 20%, range 6 to 37% vs. CG mean drop-out: 15%, range 0 to 45%). One study reported participants dropping out for reasons related to the intervention (e.g. exercise intensity [37]). Seven studies reported some type of adverse event. Major adverse events were reported in two of these studies, with one case of rotator cuff injury [15] and one inguinal hernia [19]. Muscle injury and occasional exacerbation of musculoskeletal conditions such as arthrosis were the most frequently reported adverse events [15,41,42]. In six studies, no data on adverse events were presented [16,34,40,44-46].

Primary outcome - self-reported ADL-function/disability
A full list of all self-reported ADL outcomes extracted from the articles is presented in the additional material (Additional file 2, Table 1A). In total, 10 different instruments for self- or proxy rating of function/disability were represented in the 20 included studies; of these, nine were represented in the primary meta-analysis. All instruments were generic (i.e. not condition-specific). The physical function domain (PFD) of the SF-36 and the Barthel Index (BI) [60] were the most frequent outcomes, used in five trials each (SF-36 [16,32,40-42]; BI [33,36,43,45,47]). However, the SF-36 was only represented in the meta-analysis by one study [32]. Different scoring systems were applied across studies for both of these instruments. The Groningen Activity Restriction Scale was applied in two studies [18,34], one of them presenting separate data on lower extremity-related ADL, which was the target body part of that study intervention [18]. Lawton and Brody's IADL Scale was likewise used in two studies, but not in its full version. Six identified instruments were only applied in one study each [15,17,22,35,37,39,44,46]. Four instruments were referred to by the study authors as validated, supported by a reference (the SF-36 [32,40], the BI [36], the Groningen Activity Restriction Scale [18,34], the Joensuu classification of ADL/IADL skills [46]).

Secondary outcomes -objective study outcomes
Objective study outcomes were only collected from the trials that were included in the main meta-analysis (n = 14). KE-strength was reported in nine studies [15, 17, 18, 34-37, 39, 45]. Eligible data for the lower body functional capacity were available from 12 studies [15, 17, 18, 32-37, 39, 43, 45]. Chair-rise as a single task or chair-rise in combination with walking and/or balance were the most frequent outcomes in this covariate [17,18,33,34,36,37,39,43,45]. Two studies [15,32] used a battery of functional tests for the entire body. However, one of them [32] reported a separate score for the tests mainly related to the lower body, but it was not specified in the text which tests were selected for this score. Nine studies presented data from gait performance tests [18,32-37,39]. Three trials measured the distance reached in six minutes of walking [17,32,33], and the remaining six trials measured time to complete a pre-set short walking distance [18,34-37,39]. A descriptive overview of secondary outcomes is available in the additional material (Additional file 2, Table 2A).

Risk of bias within studies
Rating of methodological quality
A full overview of the assessment of study quality by the PEDro tool is given in Table 3. The majority of studies (15 out of 20) were of high quality. Nine trials [15, 16, 18, 36-39, 41, 43] provided some information about the method of randomization, suggesting that randomization was properly concealed (i.e. concealed envelopes were used or the randomization was generated by an independent person). Consequently, this item was the least frequently met in the studies. Baseline imbalance between groups was reported in two studies [44,47], which accounted for this by using analysis of covariance (i.e. ANCOVA). The meta-analysis of baseline data revealed baseline differences in the self-reported ADL-disability outcome in two further studies [18,33]. The quality criterion "8: Measures of at least one key outcome was obtained from 85% of the subject initially allocated to groups" was only fulfilled by 10 trials [15-17, 33, 35, 39, 41, 42, 44, 45]; death and severe illness unrelated to the intervention were often reported as major reasons for dropout.

Risk of bias across studies
Publication bias
The Egger's test for small study effects was not significant (p = 0.05), but visual inspection of the funnel plot showed one outlier [45] (Fig. 2). When that outlier was removed from the dataset, there was no indication of publication bias (symmetric funnel plot and Egger's test, p = 0.11). The outlying study [45] was of high quality according to the PEDro score (score = 7).
Table 3 PEDro scale items: 1. Eligibility criteria were specified; subjects were randomly allocated to groups (in a crossover study, subjects were randomly allocated an order in which treatments were received); the groups were similar at baseline regarding the most important prognostic indicators; there was blinding of all subjects; there was blinding of all assessors who measured at least one key outcome; measures of at least one key outcome were obtained from more than 85% of the subjects initially allocated to groups; all subjects for whom outcome measures were available received the treatment or control condition as allocated or, where this was not the case, data for at least one key outcome was analysed by intention to treat; the results of between-group statistical comparisons are reported for at least one key outcome; the study provides both point measures and measures of variability. Total score for internal validity (item 2-15).
Complementary analysis - vote-count procedure
All eligible trials (n = 20) were included in the secondary analysis of vote-counts. Seven trials demonstrated a significant effect of RT on self-reported disability, while no trials found effects favouring the control group. Thus, no effect of RT on self-reported disability was the most frequent finding among the studies in this review (n = 13 studies). None of the six trials that were ineligible for the meta-analysis found a significant intervention effect on self-reported disability (Table 1).

Secondary analysis
Investigation of between-study heterogeneity
The apparent statistical heterogeneity in the pooled data (I² = 75.1%; Q: p < 0.001) was explored using sub-group analysis according to three participant characteristics (age; residential status; and gait speed at baseline), two training modality parameters (workload intensity and session frequency) and six study-quality items (allocation concealment, baseline imbalance, subject blinding, assessor blinding, end point data on 85% of participants, and intention-to-treat analysis). The latter were selected post-hoc because they were not satisfied in all trials (see Table 3). All of these subgroup analyses resulted in either non-significant pooled SMDs, heterogeneity, or a very low number of trials (< 4) within the new subgroups (see results in Table 5).
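As an illustration of how the heterogeneity statistics quoted above are obtained, the following is a minimal sketch of random-effects pooling by the DerSimonian and Laird method (the method named in the Table 5 legend), returning Cochran's Q and I². The function name and the input values are hypothetical and chosen for illustration only; they are not the trial data of this review, and the sketch is not the authors' analysis script.

```python
import math

def dersimonian_laird(effects, variances):
    """Pool study-level effects (e.g. Hedges' g) with DerSimonian-Laird weights.

    effects   -- list of study effect sizes (hypothetical values in this sketch)
    variances -- list of within-study sampling variances of those effects
    """
    k = len(effects)
    w = [1.0 / v for v in variances]                     # inverse-variance weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                        # between-study variance
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0  # I² in percent
    w_re = [1.0 / (v + tau2) for v in variances]         # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return {
        "pooled": pooled,
        "se": se,
        "ci95": (pooled - 1.96 * se, pooled + 1.96 * se),
        "Q": q,
        "df": df,
        "tau2": tau2,
        "I2": i2,
    }

# Hypothetical example: two equally precise studies with very different effects
result = dersimonian_laird([0.0, 1.0], [0.04, 0.04])
# Here Q = 12.5 on 1 degree of freedom, so I² is about 92%
```

I² expresses the share of the total variation in effect sizes that exceeds what sampling error alone would produce, which is why a value of 75.1% motivated the subgroup analyses described above.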
In addition to the subgroup analysis, meta-regressions were performed to explore the differences in treatment effect by participant characteristics (mean age as a continuous variable, and gait speed at baseline as a categorical variable, < 0.8 or ≥ 0.8 m/second) and by the RT modalities (load intensity (%1RM, continuous) and duration (weeks, continuous)). Two studies [17,45] had two intervention groups that differed only by load intensity. These groups were represented separately in the meta-regression on RT load intensity. Intervention duration and mean age were significantly associated with the effects on self-reported disability, in that shorter duration and higher age predicted greater effects (duration: coefficient −0.074, p = 0.024, Adj. R² = 65.1%, I² res. = 44.9%, n = 13 studies; and age: coefficient 0.088, p = 0.027, Adj. R² = 43.7%, I² res. = 64.8%, n = 14 studies, Table 6).
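The logic of a study-level meta-regression such as the one on intervention duration can be sketched as a weighted least-squares fit of each study's SMD on a single covariate. The sketch below is a simplified illustration under stated assumptions: the between-study variance `tau2` is supplied by the caller rather than estimated, the p-value uses a normal approximation, and the function name and all data values are hypothetical. It is not the software or estimation routine used in this review.

```python
import math

def meta_regression(effects, variances, covariate, tau2=0.0):
    """Weighted least-squares regression of study effects on one covariate.

    Weights are 1 / (within-study variance + tau2). tau2 is taken as given
    here (an assumption of this sketch); dedicated packages estimate it.
    """
    w = [1.0 / (v + tau2) for v in variances]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, covariate)) / sw   # weighted means
    y_bar = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, covariate))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar)
              for wi, xi, yi in zip(w, covariate, effects))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    se_slope = math.sqrt(1.0 / sxx)
    z = slope / se_slope
    p_value = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided, normal approximation
    return {"slope": slope, "intercept": intercept,
            "se_slope": se_slope, "p": p_value}

# Hypothetical data: SMDs that shrink as intervention duration (weeks) grows
weeks = [8, 12, 24]
smd = [1.1, 0.9, 0.3]
var = [0.05, 0.05, 0.05]
fit = meta_regression(smd, var, weeks)
# A negative slope means that longer programmes are associated with smaller effects
```

A negative coefficient for duration, as reported above, therefore describes a between-study pattern; it does not show that shortening a programme would increase the benefit for an individual participant.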

Association between effects on objective study outcomes and self-reported ADL-disability
The random effects meta-analyses on KE-strength (n = 9) and lower body functional capacity (n = 12) revealed significant positive effects of RT of large (SMD = 0.97) and moderate (SMD = 0.63) size, respectively (Table 4). The effect on gait capacity was small (SMD = 0.36) and did not reach significance (9 studies). Heterogeneity was moderate or high in all three analyses. The effects of RT on KE-strength and gait capacity were not associated with change in self-reported ADL-disability (p = 0.196 and 0.152, respectively). However, effects on lower body functional capacity were significantly associated with SMD in self-reported disability (coefficient: 0.772, 95% CI: 0.49 to 1.06; p < 0.001, Adj. R² = 99.2%, I² res. = 0.0%, n = 12 studies, Fig. 4). An overview of results from the meta-regressions is presented in Table 6.
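A reader can check that the reported coefficient, confidence interval and p-value are mutually consistent by recovering the standard error from the interval width. The sketch below assumes a normal-based 95% interval (critical value 1.96); if the interval was derived from a t-distribution instead, the recovered standard error would be somewhat smaller. The function name is hypothetical.

```python
import math

def wald_from_ci(estimate, lower, upper, z_crit=1.96):
    """Recover the standard error, z statistic and two-sided p-value from a
    symmetric confidence interval, assuming it is estimate +/- z_crit * SE."""
    se = (upper - lower) / (2.0 * z_crit)
    z = estimate / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return se, z, p_value

# Values reported for lower body functional capacity: 0.772 (95% CI 0.49 to 1.06)
se, z, p = wald_from_ci(0.772, 0.49, 1.06)
# se is about 0.145 and z about 5.3, in line with the reported p < 0.001
```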

Summary of evidence
This systematic review and meta-analysis showed that resistance training has a significant moderate positive effect on self-reported disability in older adults with pre-existing functional limitations or disability. This evidence was based on 14 trials investigating the effect of resistance training on self-reported function or disability related to ADL and IADL. The quality of the studies was generally high, with 11 out of 14 trials scoring ≥6 on the PEDro scale. There was a trend for publication bias, but this was caused by a single medium-size study. Meta-regressions performed on a subset of the studies revealed that there was: i) a strong association between objectively assessed improvements in lower body functional capacity (e.g. chair-rise) and reduced self-reported disability (12 trials), and ii) no association between reduced self-reported disability and improvements in either knee extensor strength or gait capacity (9 trials each). Higher age predicted greater effects on self-reported disability. In general, RT seems to be a safe method in this population. Transient inconveniences (i.e. muscle soreness and exacerbation of musculoskeletal conditions such as arthritis) are the most frequently reported adverse events, and serious adverse events appear to be rare (two injuries reported in 1422 participants).

Primary results
Table 5 Results from stratified meta-analyses on the effects of resistance training on self-reported disability/function in older adults with functional limitation or disability. *Results by the DerSimonian and Laird random-effects method using Hedges' g. **In this sub-analysis two studies [17,45] are represented by two intervention groups that exercised at different intensities. SMD Standardised Mean Difference, CI Confidence Intervals, d.f. degrees of freedom, Q Heterogeneity statistics, I² the variation in SMD attributable to heterogeneity, CD Community-Dwelling, GI Geriatric Institution, SH sheltered housing, ND no data, RM repetition maximum, m/s meter per second.
The eligibility criteria that were applied in this study increase participant homogeneity in terms of age (≥65 years) and functional state at baseline compared to an earlier meta-analysis [23]. This may explain the four-fold larger effect on self-reported disability seen in the current meta-analysis compared with the earlier Cochrane review by Liu and Latham (SMD 0.59 vs. SMD 0.14) [23]. Poor performance in objectively assessed functional tests (e.g. gait, chair-rise or balance) is a well-known predictor of future disability in ADL [61][62][63]. Therefore, in populations with no or only subtle functional limitation, the baseline levels of perceived ADL-function are expected to be high, and improvements following a strength training intervention may be difficult to detect [6]. More than 15 years ago, in a systematic review of exercise effects on disability, Keysor & Jette [22] addressed the inadequacies of the existing tools to measure perceived disability. These authors stressed that low responsiveness and ceiling and floor effects were major shortcomings in the self-reported ADL-disability outcomes (amongst others the Barthel Index and the SF-36).
Although the two most frequent outcomes in the present review were the Barthel Index and the SF-36, ceiling effects were only discussed as an issue in one of the included articles [35]. Only six studies stated that the selected tool was validated, and the validations were not necessarily performed in the same population (i.e. older adults with functional limitation or disability) or for the same purpose (i.e. detecting the effects of an RT intervention). The supplementary vote-count analysis allowed us to include six additional studies. This additional analysis indicated that there is no effect of strength training on self-reported measures of disability. This is in opposition to the moderate positive effects shown by the meta-analysis without these six additional studies. However, the result of the vote-count analysis must be interpreted with caution, as this procedure does not take into account the quality of the studies, the sample size, or the size and variation of the effects in the individual studies. Also, one major advantage of a meta-analysis over the vote-count procedure is the increased power to show significant effects by pooling data from several smaller studies. Nevertheless, the vote-count result indicates that studies with negative results may more frequently report selective or incomplete data, and the primary results of the present study could potentially be biased by this in a positive direction.
Table 6 note: For the load intensity covariate, two studies [17,45] are represented by two intervention groups that exercised at different intensities. SMD standardised mean difference from random effects model using Hedges' g.
Fig. 4 Association between self-reported disability and lower body functional capacity. Bubble-plot from meta-regression on the association between self-reported disability and lower body functional capacity. Dependent factor: Standardised mean differences (SMD) of resistance training effects on self-reported disability. Covariate: SMDs of the effect of resistance training on lower body functional capacity.
Associations between changes in self-reported disability and objectively measured function following RT
Lower body functional capacity
The meta-analysis on lower body functional capacity showed a significant moderate effect (12 studies, SMD 0.625, p = 0.002) of a similar magnitude to that in the meta-analysis on self-reported disability. Moreover, the meta-regression between these two outcomes revealed that the size of the effect of RT programs on lower body functional capacity almost completely explained the heterogeneity in effects on self-reported disability across studies (Adj. R² = 99.2%). Visual inspection of the bubble-plot of this meta-regression (Fig. 4) indicates that one study may be the main cause of this nearly perfect linear relationship. Therefore, a robustness analysis omitting that study [45] from the meta-regression was performed. In the new analysis without this study, the association was still significant (p = 0.025), the proportion of between-study heterogeneity explained was high (85.84%) and the residual variation due to heterogeneity was minimal (3.78%). The strong association between RT effects on self-reported function and lower body functional capacity is a novel finding. It is surprising because it contrasts with previous meta-analyses that found large effects of RT programs on objective measures of function but small effects on self-reported disability.
Knee extensor strength
The effect size for KE-strength was large (9 studies, SMD 0.97, p < 0.001). This is in agreement with previous studies in healthy older adults [12] and in older adults not specifically characterized by functional limitation or disability [23], indicating that changes in muscle strength may be independent of pre-existing limitations or disability, and that ceiling effects may not be an issue for muscle strength. The present study found no association between RT effects on KE-strength and self-reported disability. Muscle power of the lower limbs has been shown to be more strongly associated with functional capacity than muscle strength [64], and recent studies suggest that muscle power and strength of the hip muscles in particular are linked to balance control in older adults [65]. Because muscle power may be a stronger predictor of future disability, studies investigating the changes in muscle power and self-reported disability following RT programs may provide valuable information. Nevertheless, data available for a meta-regression on this specific association were sparse, and it was not the aim of this meta-analysis.
Gait capacity
In the current study, the effect of RT on gait was not statistically significant (9 studies, SMD = 0.357, p = 0.061). This finding is in contrast with previous meta-analyses, which demonstrated large significant effects of RT on gait capacity in older adults who were not selected by functional limitation or disability criteria [23,66]. Gait capacity was a secondary outcome in the present review, and the articles eligible for the meta-analysis on gait capacity were restricted to those including both self-reported disability and an objective measure of gait. The purpose of performing this secondary meta-analysis was to use the calculated SMDs in a subsequent meta-regression. This was also the case for the meta-analyses on KE-strength and lower body functional capacity. Because of this methodological aspect, the meta-analyses on gait capacity, KE-strength and lower body functional capacity do not represent comprehensive systematic literature searches on these three outcomes and should not be interpreted as such. This may also account for the difference between the RT effects on gait in the present study and those in previous literature.
Meta-regression revealed no association between gait speed and self-reported disability. This may be explained by the fact that the self-reported tools cover a broader set of aspects of function and disability, which are better reflected in lower body functional capacity than in gait capacity. One could argue that it would have been more suitable to investigate associations between self-reported mobility and gait capacity. However, since gait speed is a well-known predictor of incident disability [61], we found it relevant to specifically investigate whether increases in gait speed are associated with improvements in self-reported disability.
Investigations of heterogeneity
This study specifically aimed at investigating the effect of resistance training in older adults with pre-existing functional limitations or disability. This possibly increases the external validity and the generalizability of our results for this target group, compared to previous meta-analyses which included both well-functioning and functionally limited older adults. The effect of resistance training on older adults with pre-existing limitations or disability is less likely to be affected by ceiling effects in the self-report assessment of disability and may translate into more clinically meaningful benefits than for their higher-functioning peers.
Nevertheless, the moderate positive effect on self-reported disability observed in this study was associated with statistical heterogeneity (Table 4). This was not significantly reduced when the data from trials utilizing different training modalities or features of the study design (e.g. blinding) were pooled separately in sub-group meta-analyses. The number of studies in each of the subgroups was low. This impairs the strength of these sub-group analyses and the conclusions that can be drawn from them. When meta-regressions were performed to investigate the influence of duration and load intensity of training as well as participant characteristics (age and gait speed at baseline), we found that shorter duration and higher age were associated with larger effects. Contrary to what has been found in meta-analyses investigating the effects of RT on muscle strength [12,23,67], there was no evidence in these data that load intensity predicts the effects of RT on self-reported disability. Possibly, changing the perception of disability in older adults is not fully mediated by muscle function (e.g. muscle strength or power), indicating a more complex interplay between various components that is still to be uncovered and understood by further research.

Strengths and limitations
To our knowledge, this is the first systematic review to investigate associations between objectively measured performance-based outcomes and self-reported disability on a study-level using meta-regressions in older adults with pre-existing functional limitation or disability.
A limitation of these meta-regressions is that they were performed on a low number of studies (ranging from 9 to 12 studies). The probability of finding a positive association by chance alone is higher when running multiple sub-analyses in a meta-analysis with a low number of studies. However, the results of the various additional analyses do highlight factors that may be important in understanding the heterogeneity of effects and point to questions that need to be addressed in further research.
A novel finding was the good methodological quality of the included studies. Most of the studies (19 out of 20), however, failed to satisfy at least one of the applied quality criteria that are known to increase internal validity (i.e. intention-to-treat analysis, blinded outcome assessors, attention control groups (subject blinding), or allocation concealment). Sub-analysis showed no evidence that studies not fulfilling these criteria systematically over- or underestimated the effects. One study [40] had a score of 3 on the PEDro scale. The low-quality rating of this trial did not affect the results of the meta-analysis and meta-regressions, as this trial was excluded from these analyses for other reasons.
The most frequently unsatisfied quality criterion was "missing follow-up data from more than 15% of the initially allocated participants", which might be due to selective drop-out because of very old age (i.e. mean age was > 80 years in 60% of the included studies). For this population, critical illness and death are to be expected even during shorter intervention periods. A clear example of this is a study in which more than 15% of the initially allocated participants were lost to death alone [36]. In spite of that, the mean drop-out rate across studies in this systematic review was lower than what has been reported in exercise studies in general [68]. Another strength of the present systematic review is that eligible studies which could not be included in the meta-analysis because of analysis design or incomplete reporting of data were still addressed in the review. This feature adds valuable information to the statistical results, increasing the transparency of the evidence provided. The characteristics of the articles that were eligible only for the vote-count analysis did not differ from those included in the statistical meta-analysis. Also, the mean PEDro quality score and the proportion of studies below the threshold of six on the PEDro scale were similar between included and non-included studies.

Conclusion
Based on the current evidence, RT has moderate positive effects on self-reported disability in older adults with functional limitations and/or self-reported disability. Shorter duration of the RT intervention as well as higher age predicted greater effects. Additionally, gains in lower body functional capacity are associated with positive effects on self-reported disability. However, no such association was found regarding gait capacity or muscle strength.
The results demonstrate that the continuously growing population of older adults at risk for or with existing ADL-disability can benefit from RT in terms of improvements in self-reported ADL-function. Moreover, the diversity in intervention modalities and settings evident from these data supports the view that implementation of effective RT interventions is feasible and can be incorporated into routine health care services.
The finding of a clear association between effects in objective lower body functional capacity and self-reported function/disability supports the relevance of objective tests of physical function for evaluating RT intervention effectiveness, also in terms of detecting changes that are perceivable for the individual. In this review, the associations of effects on different outcomes were investigated on a study-level. The material and data synthesized here provide no insight into how and by which factors such associations may be moderated or modified in individuals. We call for more research to investigate these relationships with approaches allowing for investigation of direction, potential modifiers and moderators of interactions. Gaining more knowledge about such underlying mechanisms is imperative to enable optimization of future exercise interventions in producing effects that are clinically meaningful to the individual and society.