Study design and participants
We conducted a retrospective cohort study of older adults who expressed a willingness to participate in independent community-based physical exercise groups. We analysed data from the registry of physical tests, recorded between April 2010 and December 2019, of community-based physical exercise groups in Sumoto City, located in the central part of Awaji Island, Hyogo, Japan. In 2019, there were 85 small sub-groups and a total of 2407 participants in the “lively 100-years-old physical exercise” program. This study received approval from the Ethics Committee of the University of Hyogo (No. 2019F21).
The 40-min program consisted of stretching exercises and seven types of muscle strengthening exercises. The muscle strengthening exercises were lifting both arms up, lifting both arms to the side, getting up from a chair, knee extension exercises, knee lift exercises, lateral leg lifting exercises, and standing hip extension exercises. Weights were attached to the limbs and could be increased to 2.2 kg over 10 steps. If muscle pain persisted for > 1 week, participants reduced the weights and increased them again once the pain had subsided. The timing of any change to a higher weight was determined by the participants themselves. Exercises were conducted once a week; however, the decision to participate was left to the individual.
The physical fitness measurements of the participants were included for analysis. The participants were informed of the date and time of the physical fitness tests in advance. We assessed weight (kg), knee extension muscle strength (kgf), and timed up and go (TUG) time (s). Knee extension muscle strength (kgf) was expressed as a percentage of body weight (%). As an indicator of lower extremity muscle strength, the strength of the knee extensors was measured using a hand-held dynamometer (μTas F-1; Anima Co., Tokyo, Japan) (Supplemental image 1). The reproducibility and validity of knee extension force measurements obtained with a hand-held dynamometer have been described in young healthy subjects and hemiplegic patients. All measurements were recorded by a single trained physical therapist. No preparatory exercises were performed. With the participants seated in a chair, the length of the belt was adjusted so that the knee joint was at 90° when a force was applied. Measurements were taken with a fixation belt worn over the two lateral digits from the outer edge of the tibia (Supplemental image 2). The maximum value of two consecutive measurements was registered for further analysis. TUG was measured as a direct physical performance test. The TUG test is a reliable, cost-effective, safe, and time-efficient way to evaluate overall functional mobility; it measures the time taken to stand up from a chair, walk to and turn around at a mark 3 m ahead, and return to being fully seated in the chair.
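The normalization of knee extension strength described above can be sketched as follows. This is a minimal illustration, not the study's code: the function name `relative_knee_strength` and the example values are hypothetical, and the sketch assumes that the larger of the two trials is divided by body weight and multiplied by 100.

```python
def relative_knee_strength(trials_kgf, body_weight_kg):
    """Return knee extension strength as a percentage of body weight.

    Assumption (not stated in detail in the text): the maximum of the
    recorded trials (kgf) is divided by body weight (kg) and scaled to %.
    """
    if not trials_kgf:
        raise ValueError("at least one trial is required")
    if body_weight_kg <= 0:
        raise ValueError("body weight must be positive")
    best_trial = max(trials_kgf)  # maximum of the consecutive measurements
    return best_trial / body_weight_kg * 100.0


# Hypothetical participant: trials of 21.5 and 23.0 kgf, body weight 57.5 kg.
value = relative_knee_strength([21.5, 23.0], 57.5)
print(f"{value:.1f} % of body weight")
```

Expressing strength relative to body weight makes values comparable between lighter and heavier participants.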
In addition, we used the motor fitness scale (MFS) score to evaluate the physical ability of the participants. The MFS is based on mobility and is considered equivalent to the direct measurement of physical performance among older residents who had improved levels of health status and functioning. The highest attainable total score was 14 points, with higher scores indicating better physical performance. The MFS displays high internal consistency (α = 0.92) and test-retest reliability (intraclass correlation coefficient [ICC] = 0.92); it comprises three subscales: mobility, muscle strength, and balance.
The covariates controlled for were age, sex, the frailty index (activities of daily living, motor skills, low nutrition, dysphagia, cognitive function, depression, and home-boundedness), and the Kihon Checklist (KCL) scores. The checklist is a self-administered questionnaire comprising 25 questions.
Duration of exercises
In the “lively 100-years-old physical exercise” program, participants were required to take physical fitness tests every 4 months during the first year and once annually thereafter. The total number of physical fitness tests per participant ranged from 1 to 13. The 25th percentile was three tests and the 75th percentile was seven tests. Therefore, in this analysis, all participants were divided into three groups according to the total number of physical tests they completed. Participants who took the tests one to three times were placed in the short-term participation group, those who took the tests four to six times were placed in the mid-term participation group, and those who took the tests seven to 13 times were placed in the long-term participation group.
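The grouping rule can be written as a small function. This is a sketch only: the function name `participation_group` is hypothetical, and it assumes that a participant with exactly three tests belongs to the short-term group, which is the reading under which the three groups cover the full range of 1 to 13 tests.

```python
def participation_group(n_tests):
    """Map a participant's total number of fitness tests to a group label.

    Assumed boundaries: 1-3 short-term, 4-6 mid-term, 7-13 long-term.
    """
    if not 1 <= n_tests <= 13:
        raise ValueError("number of tests must be between 1 and 13")
    if n_tests <= 3:
        return "short-term"
    if n_tests <= 6:
        return "mid-term"
    return "long-term"


# Hypothetical test counts for five participants.
for count in [1, 3, 4, 7, 13]:
    print(count, participation_group(count))
```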
All statistical analyses were conducted using R version 3.6.3 (R Foundation for Statistical Computing, Vienna, Austria), with the lme4 package used to fit the mixed-effects models. Mixed-effects models can accommodate missing outcome data under the missing at random (MAR) assumption. Maximum likelihood methods were used for the analysis of missing data because the pattern of the missing data was not missing completely at random (MCAR). Multilevel modelling (MLM) is an extension of regression that allows for simultaneous estimation of fixed and random effects and is additionally robust to unbalanced data (i.e., missing observations) [22,23,24]. The t-test and the chi-square test were used to compare the baseline data. The level of significance was set at p < 0.05.
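For orientation, a generic mixed-effects model with a random intercept and a random slope can be written as below. This is a textbook form given only as an illustration; the symbols are assumptions and are not taken from the study: y_{ij} is the outcome of participant i at occasion j, t_{ij} is the time (or age) variable, the β terms are fixed effects, and the u terms are participant-specific random effects.

```latex
\begin{align}
y_{ij} &= (\beta_0 + u_{0i}) + (\beta_1 + u_{1i})\, t_{ij} + \varepsilon_{ij}, \\
(u_{0i},\, u_{1i})^{\top} &\sim \mathcal{N}(\mathbf{0},\, \boldsymbol{\Sigma}), \qquad
\varepsilon_{ij} \sim \mathcal{N}(0,\, \sigma^{2}).
\end{align}
```

Because each participant contributes only the occasions actually observed, a model of this form can be fitted by maximum likelihood to unbalanced data without discarding participants who have missing visits.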
MLM was used to identify differences in the slope of physical performance between groups of different ages. Age was centred on the mean age of all participants. Centring changes the estimated intercept but does not influence the estimated regression coefficients in the model. The intercept can therefore be interpreted as the predicted value of the outcome when age equals the mean age and the other predictors are zero. We used the model to estimate the mean at each point in the reference grid for each age from 65 to 90 years. Marginal means were estimated as equally weighted means of these predictions at specified margins.
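The effect of centring can be checked numerically. The sketch below uses ordinary least squares with a single predictor rather than a mixed-effects model, and the ages and TUG times are invented for illustration; the helper `ols_fit` is hypothetical. It shows that subtracting the mean age leaves the slope unchanged while the intercept becomes the predicted outcome at the mean age.

```python
def ols_fit(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Invented example data: age in years and TUG time in seconds.
ages = [66, 70, 74, 78, 82, 86]
tug_times = [6.1, 6.6, 7.0, 7.7, 8.1, 8.9]

mean_age = sum(ages) / len(ages)
centred_ages = [age - mean_age for age in ages]

raw_intercept, raw_slope = ols_fit(ages, tug_times)
centred_intercept, centred_slope = ols_fit(centred_ages, tug_times)

# Same slope in both fits; only the intercept moves.
print("slopes:", raw_slope, centred_slope)
# The centred intercept equals the raw model's prediction at the mean age.
print("prediction at mean age:", raw_intercept + raw_slope * mean_age)
print("centred intercept:", centred_intercept)
```

With an uncentred age, the intercept would refer to a predicted value at age zero, which has no practical meaning in a cohort of older adults.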
We investigated the associations between the participation level and changes in lower extremity muscle strength, TUG time, and MFS using the following three models: Model 1 was an unadjusted random-slope model. Model 2 was a random-slope model adjusted for age, sex, and frailty index components (depression, home-boundedness, dysphagia, poor nutrition, and cognitive status).