COVID19 patient diagnosis and treatment data mining algorithm based on association rules – Wiley

3.1 Research objects and criteria

General information mainly included gender, age, underlying disease, and contact history, among others. Clinical data mainly included first symptoms and signs, MuLBSTA score, critical time intervals during diagnosis and treatment (from onset to dyspnea, first diagnosis, admission, mechanical ventilation, and death, and from first diagnosis to admission, among others), laboratory examinations, complications, main treatments, and cause of death. The frequency of drug use was counted, and an association rule algorithm was used to analyse the effect of drug treatment. The results can provide a reference for rational drug use in COVID-19 patients and help reduce both the cost of treatment and the suffering of patients.

A retrospective analysis was conducted on 49 cases of COVID-19 deaths diagnosed between January 29, 2020 and March 6, 2020 in our hospital. Inclusion criteria: all enrolled patients met the diagnostic criteria for confirmed cases in the Novel Pneumonia Diagnosis and Treatment Protocol for Coronavirus Infection (Trial Seventh Edition) issued by the National Health Commission on March 3, 2020. Clinical manifestations: fever and/or respiratory symptoms. COVID-19 imaging features: multiple small patchy shadows and interstitial changes in the early stage, most obvious in the outer lung zones; with further development, ground-glass shadows and infiltrating shadows in both lungs; in severe cases, lung consolidation. Exclusion criteria: successfully treated cases and cases without a definitive diagnosis were excluded. The study was non-interventional and did not require patients to sign informed consent.

According to the course records, laboratory examination results were recorded on admission (D1±1), on day 4±1 (D4±1), on day 7±1 (D7±1), and on day 14±2 (D14±2). They included routine blood tests, blood gas analysis, procalcitonin (PCT), hypersensitive C-reactive protein (HSCRP), myocardial enzymes, liver enzymes, renal function, coagulation indexes, electrolytes, and etiological data.

In this study, the data were obtained from the regional health information platform, which is built on health records. The platform is ultimately a medical information system, so its data closely reflect real-world practice. To make data mining more efficient, the data must be pre-processed. Besides the past history records, the table of personal basic information contains fields unrelated to the research, such as the person who created the file, the filing date, and the medical institution. Only the past history records are needed in this application, so the irrelevant fields are left unprocessed and only the past history fields are pre-processed.

Secondly, the main objective of this data mining application is to extract association rules for COVID-19 complications, so the attributes to be mined are the individual diseases, and each disease type has to be classified separately. A person's stored disease history usually consists of several diseases in one field, so it must be split up and labelled. For example, if the past history column for Zhang San in the database reads "hypertension, COVID-19", Zhang San has previously had hypertension and COVID-19. In Zhang San's record, the Hypertension column is therefore marked as A and the COVID-19 column is marked as B.
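The labelling step can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the comma-separated field layout, the function names, and the extra label "C" for diabetes are assumptions; only the labels A (hypertension) and B (COVID-19) come from the example above.

```python
# Item label per disease. "A" and "B" follow the Zhang San example;
# "C" (diabetes) is a made-up extra entry for illustration only.
DISEASE_LABELS = {"hypertension": "A", "covid-19": "B", "diabetes": "C"}


def history_to_items(history):
    """Split a free-text past-history field into a sorted list of item labels."""
    items = set()
    for part in history.split(","):
        key = part.strip().lower()
        if key in DISEASE_LABELS:  # unknown diseases are skipped in this sketch
            items.add(DISEASE_LABELS[key])
    return sorted(items)


def to_indicator_row(history):
    """Build one 0/1 flag per label, a common input shape for rule mining."""
    items = history_to_items(history)
    return {label: int(label in items) for label in sorted(DISEASE_LABELS.values())}


zhang_san = "hypertension, COVID-19"
print(history_to_items(zhang_san))
print(to_indicator_row(zhang_san))
```

Each patient then becomes a set of items (or a row of flags), which is the form an association rule algorithm operates on.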

The data cleaning process removes noise from the original data, discards data that is not relevant to association rule mining, and handles missing data. It mainly comprises missing-data processing and erroneous-data processing, together with some data type conversion.

Electronic health records are large, are generated in different places, and are produced by a complicated process, so missing, duplicated, and even wrong data are inevitable. The data therefore has to be cleaned.

Fill the void value: some attributes in a record may be related to the novel coronavirus to some degree and yet be empty, so the void value needs to be filled. This can be handled in the following ways.

Ignore the record: when a data row lacks the class label required for classification, the row can be ignored and deleted. If a very large number of tuples are missing a class label, this approach becomes hard to apply.

Manually fill in missing values: this method is time-consuming, especially if the data set is very large.

Global constant padding: records in which an attribute is missing are filled with one uniform constant. Although this is easy to do, it is not reliable.

Mean padding: the average value of an attribute is calculated, and records with a missing value in that attribute are filled in with this average.
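Three of these strategies (ignoring the record, global constant padding, and mean padding) can be sketched as below. This is an illustrative sketch under assumptions: records are represented as dictionaries, missing values as `None`, and the field names `crp` and `outcome` are hypothetical and do not come from the study's database.

```python
from statistics import mean


def drop_unlabelled(rows, label_key):
    """Ignore the record: drop rows whose class label is missing."""
    return [dict(r) for r in rows if r.get(label_key) is not None]


def fill_constant(rows, key, constant):
    """Global constant padding: replace every missing value with one constant."""
    return [{**r, key: constant if r.get(key) is None else r[key]} for r in rows]


def fill_mean(rows, key):
    """Mean padding: replace missing values with the mean of the observed ones."""
    observed = [r[key] for r in rows if r.get(key) is not None]
    if not observed:  # nothing to average, leave the rows unchanged
        return [dict(r) for r in rows]
    average = mean(observed)
    return [{**r, key: average if r.get(key) is None else r[key]} for r in rows]


# Hypothetical records: "crp" is a lab value, "outcome" is the class label.
rows = [
    {"crp": 10.0, "outcome": "x"},
    {"crp": None, "outcome": "x"},
    {"crp": 30.0, "outcome": None},
]
print(drop_unlabelled(rows, "outcome"))
print(fill_constant(rows, "crp", -1.0))
print(fill_mean(rows, "crp"))
```

All three helpers return new lists and leave the input untouched, so the strategies can be compared on the same raw data.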

Modify error value: much of the data in the medical information system is entered manually by medical workers, so some values contain errors and need to be corrected. Values of attributes that have a standard permitted range can be corrected against that range.
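A range check of this kind might look like the sketch below. The field names and limits are hypothetical plausibility bounds chosen for illustration, not clinical reference ranges and not values from the paper. The sketch also makes one design assumption: an out-of-range value is blanked and reported rather than silently replaced, so that it can be reviewed or filled by one of the missing-value strategies.

```python
# Hypothetical plausibility limits per field (illustrative, not clinical standards).
PLAUSIBLE_RANGES = {
    "temperature_c": (30.0, 45.0),
    "age": (0, 120),
}


def flag_out_of_range(row, ranges=PLAUSIBLE_RANGES):
    """Return a cleaned copy of the row and the list of fields that were blanked."""
    cleaned = dict(row)
    flagged = []
    for field, (low, high) in ranges.items():
        value = cleaned.get(field)
        if value is not None and not (low <= value <= high):
            cleaned[field] = None  # treat the erroneous value as missing
            flagged.append(field)
    return cleaned, flagged


# 375.0 looks like a misplaced decimal point for 37.5.
cleaned, flagged = flag_out_of_range({"temperature_c": 375.0, "age": 67})
print(cleaned, flagged)
```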

Even after cleaning, the original data cannot be used directly; some attributes must first be converted into the required form. The original data does not store an individual's age, only the date of birth, so age is determined from the date of birth and the date of filing. In some records these two dates are not written in the same format: different records use different year, month, day notations. For convenience, all dates are converted to a single year, month, day format, and the individual's age is then calculated as the difference between the date of birth and the date of filing. The calculated age is a continuous attribute, which is not well suited to classification over discrete attributes, so it needs to be discretized. The transformation of the age attribute is shown in Table 1.
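The date normalization, age calculation, and discretization can be sketched as follows. Several details here are assumptions: the accepted date notations in `DATE_FORMATS` are guesses at what the records might contain, and the age bins are placeholders, since the actual intervals are the ones given in Table 1.

```python
from datetime import date, datetime

# Assumed notations for year, month, day; the real records may use others.
DATE_FORMATS = ("%Y-%m-%d", "%Y/%m/%d", "%Y.%m.%d")

# Placeholder bins (inclusive bounds); the study's own bins are those in Table 1.
AGE_BINS = [
    (0, 17, "minor"),
    (18, 44, "young adult"),
    (45, 64, "middle-aged"),
    (65, 150, "elderly"),
]


def parse_date(text):
    """Normalize a date string in any accepted notation to a date object."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date format: {text!r}")


def age_in_years(birth, filing):
    """Whole years between the date of birth and the date of filing."""
    years = filing.year - birth.year
    if (filing.month, filing.day) < (birth.month, birth.day):
        years -= 1  # birthday not yet reached in the filing year
    return years


def age_group(age):
    """Map a continuous age to a discrete label."""
    for low, high, label in AGE_BINS:
        if low <= age <= high:
            return label
    raise ValueError(f"age out of range: {age}")


birth = parse_date("1950/03/15")
filing = parse_date("2020-02-01")
print(age_in_years(birth, filing), age_group(age_in_years(birth, filing)))
```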
