
How Machine Learning Statistics Can Change the Game of Data Analysis in Rheumatology [...]

Guyard, F., Gossec, L., Leroy, D., Lafargue, T., Seiler, M., Jacquemin, C., Molto, A., Sellam, J., Foltz, V., Gandjbakhch, F., Hudry, C., Mitrovic, S., Fautrel, B., Servy, H. (September 2017).

Background/Purpose: A link between flares and physical activity would confirm the objective consequences of flares. In the ActConnect study of patients with RA or axSpA, the initial analyses, made with traditional statistical tools on aggregated data, found a low-magnitude link between flares and physical activity (1). The objective of this reanalysis was to determine whether applying Machine Learning techniques (i.e., Big Data statistics) to this dataset could lead to more accurate flare prediction.

Methods: In the ActConnect study, physical activity (steps) was collected with an activity tracker at the minute level over 3 months for 170 patients, yielding 27 million data points that were aggregated at the 24-hour level (1). Patients also reported their perceived flares weekly. In this reanalysis, multi-class Bayesian classifications were performed to find a link between physical activity and flare / no-flare states, using Machine Learning software belonging to Orange (2). A normalization was performed to calibrate, for each patient, the activity pattern associated with no flares. Because the data are sampled by the minute, models were designed using several aggregation intervals (24h, 12h, 4h, 1h); for each interval, a model was trained on a random 70% of the data and tested on the remaining 30%. To evaluate the stability of the models, the complete analysis was repeated 10 times for each aggregation interval. Model performance was evaluated using patient-reported flares (assessed weekly) as the gold standard. Sensitivity, specificity and kappa were assessed.
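The aggregation and random 70/30 split described in the Methods could be sketched as follows. This is only an illustration under stated assumptions, not the study's actual pipeline: the function names `aggregate_steps` and `split_train_test`, the fixed seed, and the toy step counts are all hypothetical, and the Bayesian classifier and per-patient normalization are omitted.

```python
import random


def aggregate_steps(minute_steps, interval_minutes):
    """Sum per-minute step counts into consecutive windows of interval_minutes."""
    return [
        sum(minute_steps[i:i + interval_minutes])
        for i in range(0, len(minute_steps), interval_minutes)
    ]


def split_train_test(samples, train_fraction=0.7, seed=0):
    """Shuffle the samples, then split them into training and test subsets."""
    rng = random.Random(seed)  # fixed seed is an assumption, for reproducibility
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical toy data (not study data): one day of 1 step per minute.
one_day = [1] * 1440

# One feature vector per aggregation interval used in the abstract.
windows_by_interval = {
    hours: aggregate_steps(one_day, hours * 60) for hours in (24, 12, 4, 1)
}

# Random 70/30 split of ten hypothetical sample indices.
train, test = split_train_test(range(10))
```

Shorter intervals keep more of the within-day activity pattern: the same day yields a single value at 24h but 24 values at 1h. Repeating the split with different seeds would mirror the 10 repetitions used to assess model stability.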

Results: Modeling performance increased as the aggregation interval decreased. The best performance was observed for the 1-hour interval (table 1). The improved agreement between true and predicted classes was also reflected in a substantial increase in the kappa coefficient as the aggregation interval decreased (for 24 hours: kappa=0.51 [95% confidence interval 0.47, 0.56]; for 1 hour: kappa=0.90 [0.87, 0.92]).
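For readers unfamiliar with the reported metrics, the sketch below shows how sensitivity, specificity and Cohen's kappa can be derived from confusion-matrix counts. It assumes a simplified two-class (flare / no-flare) setting, whereas the study used multi-class classification; the function name `binary_metrics` and the counts are hypothetical and are not study results.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return (sensitivity, specificity, kappa) from 2x2 confusion counts."""
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)  # flares correctly predicted as flares
    specificity = tn / (tn + fp)  # non-flares correctly predicted as non-flares
    observed = (tp + tn) / n      # observed agreement
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (observed - expected) / (1 - expected)
    return sensitivity, specificity, kappa


# Hypothetical counts for illustration only.
sensitivity, specificity, kappa = binary_metrics(tp=40, fp=10, fn=10, tn=40)
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is more informative than accuracy alone when flare and no-flare periods are unevenly represented.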

Conclusion: Connected devices generate huge data flows that traditional statistical tools cannot handle without data aggregation. Machine Learning techniques, which can process raw datasets with minimal aggregation, yield more accurate predictions. They may contribute to a more precise quantification of existing links or to the identification of new links in rheumatological datasets.

