Forecasting the Number of Tuberculosis Patients Using Automatic Clustering and Fuzzy Logical Relationship Method

The lungs are respiratory organs with an important role in the human body, so a lung infection can be very dangerous. Tuberculosis is an infectious disease caused by the bacillus Mycobacterium tuberculosis, which infects the lungs and can cause death. The goal of this study is to predict how many people in Kampar Regency will have tuberculosis in 2022. The Automatic Clustering and Fuzzy Logical Relationship (ACFLR) technique is employed. ACFLR is an approach for modeling time series data that incorporates fuzzy logic. According to a number of previous studies, this method is accurate. The analyzed data are secondary data from the Kampar District Health Office for 2017 to 2021. The study forecasts 944 tuberculosis patients in 2022, with a MAPE of 0.0882% and a forecasting accuracy of 99.9118%, which is an increase of 4 patients from 2021 to 2022. The government and medical staff can use the forecast to take appropriate early steps to curb the rising rate of tuberculosis.


INTRODUCTION
Tuberculosis (TB) is an infectious disease caused by the bacillus Mycobacterium tuberculosis, which infects the lungs [1] and can cause death. Infectious diseases are very dangerous if not handled properly, so it is necessary to know their initial cause [2]. TB is transmitted through the air when a person with TB expels mucus or phlegm while coughing or sneezing. The bacteria leave the body with the mucus, are carried into the air, and then enter other people's bodies through the air they breathe [3]. Indonesia ranks third in the world for the number of TB sufferers, after India and China. It accounts for around 5.8% of TB sufferers worldwide, with an estimated 528,000 TB sufferers and around 91,000 deaths every year [4].
According to reports from the Kampar District Health Office, Riau Province, there were 1,069 TB sufferers in 2017 and 1,079 in 2018, an increase from the previous year. There were 1,006 TB sufferers in 2019, 938 in 2020, and 940 in 2021. The number therefore rose again in 2021, although by less than in 2018. Any increase in TB sufferers, whether large or small, requires all parties to commit to and cooperate in TB management, because the disease has a considerable impact not only on health but also on social and economic aspects [5]. Forecasting is therefore needed to see how TB will progress in the following year. Forecasting is defined as a way of systematically and pragmatically predicting what will happen in the future based on relevant past data, so forecasting results are expected to provide greater objectivity [6].
The problem of forecasting the number of tuberculosis sufferers can be addressed with fuzzy methods, namely the Automatic Clustering and Fuzzy Logical Relationship (ACFLR) method. ACFLR is a modification of the fuzzy time series. The advantage of fuzzy time series modeling is that it can formulate a problem based on expert knowledge or empirical data [26]. The fuzzy time series was proposed by Song and Chissom to solve forecasting problems in which the historical data are linguistic values [7] [8], and it is a forecasting method that uses fuzzy principles as its basis [9] [10]. In ACFLR, an automatic clustering algorithm determines the interval lengths [11]. The interval lengths are fixed at the start of the calculation because they strongly influence the fuzzy relationships that are formed and the final result [12]. A second algorithm then determines the forecast [13] [14]. The advantage of the ACFLR method is that it can produce forecasts from short data series, and previous research reports that it is more accurate than other time series methods. Its disadvantages lie in determining the interval length and in interpreting the output of the forecasting model: different interpretations produce different forecasting results. Fuzzy logic is also related to artificial intelligence [15], and fuzzy methods can be used in grouping to diagnose a disease [16] [17].
Fuzzy clustering is a technique for determining optimal clusters in a vector space based on the normal Euclidean form for the distance between vectors [18]. There are many fuzzy clustering methods, such as k-means, c-means, and others [19]. Many other fuzzy theories exist as well: [20] [21] cover the Sugeno FIS, [22] [23] the Mamdani fuzzy method, and [24] the Tsukamoto FIS, while [25] discusses fuzzy linear programming, a model often used to solve optimization problems.
In general, a fuzzy set is defined as a class of numbers with fuzzy boundaries [26]. Research by [11] in 2018 on gold price forecasting with the ACFLR method produced a forecast with a MAPE of 5.3%, which indicates excellent forecast accuracy. Research by [13] on the population of Makassar City forecast a decrease from 2017 to 2019 and an increase from 2020 to 2021, with a MAPE below 10%, which again indicates excellent accuracy. Research by [27] in 2021 on forecasting the number of foreign tourist visits with the fuzzy time series ACFLR produced a MAPE of 6.22%, which also places the forecast in the excellent category. Fuzzy theory likewise plays a major role in the health sector [28]. Tuberculosis is an infectious disease whose most serious impact is death. To reduce TB mortality, the number of TB sufferers needs to be forecast so that the health department can anticipate cases or counsel the community on maintaining cleanliness and health. The purpose of this study is therefore to forecast the number of TB sufferers in Kampar Regency in 2022 using the ACFLR method. The results are expected to support program planning and to serve as a benchmark for the number of TB sufferers in Kampar Regency, Riau Province, so that the government and medical personnel can take appropriate early steps to curb the increase in TB.

METHODS
The ACFLR method combines interval formation and forecasting into a single procedure [29]. The ACFLR algorithm is divided into three main stages. The first stage forms intervals using an automatic clustering algorithm, where an interval is the range covered by each cluster. The steps are:

1. Sort the data from smallest to largest, assuming that no two values are equal:

$$d_1, d_2, d_3, \ldots, d_i, \ldots, d_n$$

From this sequence, calculate the value of average_diff using the following equation:

$$\text{average\_diff} = \frac{\sum_{i=1}^{n-1} (d_{i+1} - d_i)}{n - 1} \qquad (1)$$

The average_diff value is the average difference between adjacent values in the sorted data.

2. Put the first value (the smallest in the sorted sequence) into a cluster, called the current cluster. Based on average_diff, decide whether the next value can be added to the current cluster or must be placed in a new cluster, according to the following principles:

i. Principle 1: Assume that the current cluster is the first cluster and that $d_2$ is the value adjacent to $d_1$:

$$\{d_1\}, d_2, d_3, \ldots, d_i, \ldots, d_n$$

If $d_2 - d_1 \le \text{average\_diff}$, then $d_2$ is put into the current cluster that contains $d_1$. If not, a new cluster is created for $d_2$, and this new cluster becomes the current cluster.

ii. Principle 2: Assume that the current cluster is not the first cluster, that $d_j$ is the largest value in the current cluster, and that $d_k$ is the value adjacent to $d_j$:

$$\{d_1, \ldots\}, \ldots, \{\ldots\}, \{\ldots, d_j\}, d_k, \ldots, d_n$$

If $d_k - d_j \le \text{average\_diff}$ and $d_k - d_j \le \text{cluster\_diff}$, then $d_k$ is put into the current cluster that contains $d_j$. If not, a new cluster is created for $d_k$, and this new cluster becomes the current cluster. cluster_diff is the average difference between adjacent values in a cluster that has already been formed:

$$\text{cluster\_diff} = \frac{\sum_{i=1}^{m-1} (c_{i+1} - c_i)}{m - 1} \qquad (2)$$

where $c_1, c_2, \ldots, c_m$ are the data in the existing cluster.

iii. Principle 3: Assume that the current cluster is not the first cluster and that $d_j$ is the only value in the current cluster. Assume that $d_k$ is the value adjacent to $d_j$ and that $d_i$ is the largest value in the previous cluster:

$$\{d_1, \ldots\}, \ldots, \{\ldots, d_i\}, \{d_j\}, d_k, \ldots, d_n$$

Step 4, sub-step iii: Repeat the check of the current interval and the current cluster until every cluster has been transformed into an interval.

5. Each interval obtained in Step (4) is divided into $p$ sub-intervals, where $p$ is chosen according to the amount of data used.

After the intervals have been formed with the automatic clustering algorithm, the second stage calculates the forecast value with the ACFLR algorithm using the following steps:

1. Set of Universes
The universe of discourse $U$ is determined from the historical data by finding the minimum and maximum values. Next, calculate the midpoint of each interval obtained from Step (5).
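The clustering rules of Steps 1 and 2 can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, and the singleton rule follows the source's Principle 3 (the gap to the next value is compared with average_diff and with the distance back to the previous cluster). The text does not say how a first cluster that already holds several values is treated, so the sketch assumes the cluster_diff rule applies to it as well.

```python
def average_diff(sorted_values):
    """Average gap between adjacent values of an ascending sequence.

    Applied to the whole data set this is average_diff (Equation 1);
    applied to a single cluster it is cluster_diff.
    Needs at least two values.
    """
    gaps = [b - a for a, b in zip(sorted_values, sorted_values[1:])]
    return sum(gaps) / len(gaps)


def automatic_clustering(data):
    """Group distinct numbers into clusters using the three principles."""
    values = sorted(data)
    avg = average_diff(values)
    clusters = [[values[0]]]          # smallest value opens the first cluster
    for value in values[1:]:
        current = clusters[-1]
        gap = value - current[-1]
        if len(current) > 1:
            # Several values in the current cluster: the gap must not
            # exceed average_diff or cluster_diff (assumed to hold for
            # the first cluster too).
            limit = min(avg, average_diff(current))
        elif len(clusters) == 1:
            # First cluster with a single value: only average_diff counts.
            limit = avg
        else:
            # Singleton cluster after the first: also compare with the
            # distance to the largest value of the previous cluster.
            limit = min(avg, current[0] - clusters[-2][-1])
        if gap <= limit:
            current.append(value)
        else:
            clusters.append([value])  # the new cluster becomes current
    return clusters


patients = [1069, 1079, 1006, 938, 940]   # 2017-2021 counts quoted in the text
print(average_diff(sorted(patients)))
print(automatic_clustering(patients))
```

Run on the five yearly counts quoted in the paper, the sketch groups them into the same three clusters that the results section reports.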

Defining Fuzzy Sets
Assume that the $n$ sub-intervals obtained from Step (5) are $u_1, u_2, \ldots, u_n$. Each fuzzy set $A_i$, with $1 \le i \le n$, is then defined on these intervals [30].

Fuzzification Process
Based on the fuzzy sets defined in Step (2), each datum is then fuzzified into a fuzzy set. If a datum belongs to the interval $u_i$, where $1 \le i \le n$, then it is fuzzified into $A_i$, and so on until all data are fuzzified.
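A minimal sketch of the sub-interval division and the fuzzification step, assuming the five intervals and the value of 5 sub-intervals per interval reported in the results. The function names, the symbol `P`, and the use of `bisect` for the lookup are illustrative choices, not part of the source.

```python
from bisect import bisect_right

# Intervals reported in the results; the last one is closed on the right.
INTERVALS = [(938, 940), (940, 970.75), (970.75, 1041.25),
             (1041.25, 1069), (1069, 1079)]
P = 5  # sub-intervals per interval, as chosen in the paper


def sub_interval_edges(intervals, p):
    """Boundaries of all sub-intervals u_1..u_n, in ascending order."""
    edges = []
    for low, high in intervals:
        width = (high - low) / p
        edges.extend(low + k * width for k in range(p))
    edges.append(intervals[-1][1])    # right end of the universe
    return edges


def midpoints(edges):
    """Midpoint of every sub-interval, in the same order as the edges."""
    return [(a + b) / 2 for a, b in zip(edges, edges[1:])]


def fuzzify(value, edges):
    """Return i such that value lies in u_i (1-based), i.e. the index of A_i."""
    index = bisect_right(edges, value)
    # Clamp so that the maximum falls in the last (closed) sub-interval.
    return min(max(index, 1), len(edges) - 1)


edges = sub_interval_edges(INTERVALS, P)
counts = {2017: 1069, 2018: 1079, 2019: 1006, 2020: 938, 2021: 940}
labels = {year: fuzzify(v, edges) for year, v in counts.items()}
print(labels)
```

With these inputs the yearly counts fall into $A_{21}$, $A_{25}$, $A_{13}$, $A_1$, and $A_6$, which is consistent with the fuzzy logical relationship groups reported in the results.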

Fuzzy Logical Relationship
In this step, fuzzy logical relationships are formed from the fuzzified data obtained in Step (3). If the fuzzified data for years $t$ and $t + 1$ are $A_j$ and $A_k$, then the fuzzy logical relationship "$A_j \to A_k$" is formed, where $A_j$ is called the current state and $A_k$ the next state. Fuzzy logical relationship groups are then formed from the current states: fuzzy logical relationships with the same current state are placed in the same group.
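The relationship-building step can be illustrated with a short Python sketch. The label sequence below is an assumption reconstructed from the relationship groups reported in the results (it is not quoted from a table), and the function names are hypothetical.

```python
from collections import Counter


def build_flrs(labels):
    """Pair each year's fuzzy-set index with the next year's index."""
    return list(zip(labels, labels[1:]))


def group_flrs(labels):
    """Group relationships by current state.

    Each group maps a next state to the number of times the relationship
    occurs. A state with no known successor keeps an empty group, which
    corresponds to "A_j -> Ø".
    """
    groups = {label: Counter() for label in labels}
    for current, nxt in build_flrs(labels):
        groups[current][nxt] += 1
    return groups


# Fuzzified sequence for 2017-2021 (assumed: A21, A25, A13, A1, A6).
sequence = [21, 25, 13, 1, 6]
for current, successors in sorted(group_flrs(sequence).items()):
    print(current, dict(successors))
```

Keeping the counts per next state, rather than a plain set, is a design choice that makes the weighted forecasting rule of the next step easy to apply.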

Defuzzification Process
In this step, the forecast is determined from the historical data according to the following principles:

i. If the fuzzified value for year $t$ is $A_j$ and the fuzzy logical relationship group with current state $A_j$ contains only one fuzzy logical relationship, $A_j \to A_k$, then the forecast for year $t + 1$ is $m_k$, where $m_k$ is the midpoint of the interval $u_k$.

ii. If the fuzzified value for year $t$ is $A_j$ and the fuzzy logical relationship group with current state $A_j$ is $A_j \to A_{k_1}(x_1), A_{k_2}(x_2), \ldots, A_{k_r}(x_r)$, then the forecast for year $t + 1$ is calculated using the following formula:

$$\hat{Y}_{t+1} = \frac{x_1 m_{k_1} + x_2 m_{k_2} + \cdots + x_r m_{k_r}}{x_1 + x_2 + \cdots + x_r} \qquad (4)$$

where $x_1, x_2, \ldots, x_r$ are the numbers of occurrences of each fuzzy logical relationship in the group, and $m_{k_1}, m_{k_2}, \ldots, m_{k_r}$ are the midpoints of the intervals $u_{k_1}, u_{k_2}, \ldots, u_{k_r}$.

iii. If the fuzzified value for year $t$ is $A_j$ and the fuzzy logical relationship group with current state $A_j$ is $A_j \to \varnothing$, where the symbol $\varnothing$ denotes an unknown value, then the forecast for year $t + 1$ is $m_j$, where $m_j$ is the midpoint of the interval $u_j$.

Evaluation of the forecasting results is the last stage of the ACFLR method. It compares the forecasts with the actual data in order to measure forecasting accuracy. Several measures of forecast accuracy are available, one of which is the Mean Absolute Percentage Error (MAPE) [31]. MAPE is a suitable error measure because it expresses the forecast error as a percentage of the actual values over a given period. It is calculated by taking the absolute difference between each actual value and its forecast, dividing by the actual value, averaging over the data, and expressing the result as a percentage:

$$\text{MAPE} = \frac{1}{n} \sum_{t=1}^{n} \left| \frac{Y_t - \hat{Y}_t}{Y_t} \right| \times 100\% \qquad (5)$$

where
$n$: amount of data
$\hat{Y}_t$: forecast value in year $t$
$Y_t$: actual value in year $t$

According to [31], a MAPE below 10% indicates an excellent forecast, a MAPE between 10% and 20% a good forecast, and a MAPE between 20% and 50% a sufficient forecast. A MAPE above 50% indicates an inaccurate forecast that should not be used.
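The three forecasting principles and the MAPE calculation can be sketched as below. The function names and all numbers are made up for illustration and are not taken from the paper. The sketch assumes each group is stored as a mapping from next state to occurrence count.

```python
def forecast_next(current, groups, mid):
    """Forecast for year t+1 given the fuzzy-set index of year t.

    groups: {current_index: {next_index: occurrence_count}}
    mid:    {index: midpoint of the corresponding sub-interval}
    """
    successors = groups.get(current, {})
    total = sum(successors.values())
    if total == 0:
        # Principle iii: no known next state, use the midpoint of u_j.
        return mid[current]
    # Principles i and ii: count-weighted mean of the successor midpoints
    # (with a single relationship this is just that midpoint).
    return sum(n * mid[nxt] for nxt, n in successors.items()) / total


def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    errors = [abs(y - f) / y for y, f in zip(actual, forecast)]
    return sum(errors) / len(errors) * 100


# Illustrative values only.
toy_mid = {1: 10.0, 2: 20.0, 3: 30.0}
toy_groups = {1: {2: 2, 3: 1}, 2: {3: 1}, 3: {}}
for state in (1, 2, 3):
    print(state, forecast_next(state, toy_groups, toy_mid))
print(round(mape([100, 200], [110, 190]), 2))
```

Principle i does not need its own branch in this sketch because the weighted mean in Equation (4) reduces to a single midpoint when the group holds one relationship.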

RESULTS AND DISCUSSION
The data used in this study are secondary data on the number of tuberculosis sufferers from 2017 to 2021, sourced from the Kampar District Health Office, Riau Province. The study population was Kampar District, and the sample comprised 21 health centers in Kampar Regency. The number of tuberculosis patients in Kampar Regency, Riau Province, fluctuated from 2017 to 2021 (Figure 2). Forecasting the number of TB patients using the ACFLR method consists of three main stages. The first stage is as follows:

1. Apply the automatic clustering algorithm to cluster the historical data on the number of TB sufferers from 2017 to 2021 into intervals. The steps are:

i. Sort the data from smallest to largest. Data order: 938, 940, 1,006, 1,069, 1,079.
Based on the data, the average_diff value is obtained using the formula in Equation (1). Based on research [15], the following results are obtained:

iii. Based on the definition of the fuzzy sets in step (ii), the number of TB patients is fuzzified: the figure for each year is assigned to a fuzzy set according to the sub-intervals from step (v) of the first stage. The results are shown in Table 2.

Based on the results of step (iv) in the second stage, the forecasts of the number of TB patients are obtained from the fuzzy logical relationship between year $t$ and year $t + 1$: the forecast value is the midpoint of the interval of the fuzzified result for year $t + 1$. The results are shown in Table 4. Based on Table 4, the forecast number of TB sufferers in 2022 is 944 people. A comparison between the actual data and the forecasting results is shown in Figure 3, where the forecasting data pattern is almost the same as the actual data pattern.

3. The last stage of the ACFLR method is to determine the forecasting error using Equation (5). The forecasting error for each year is given in Table 5. The average error calculated from Table 5 shows that the forecasts are close to the actual data. Applying Equation (5) to the summed relative error of 0.00441 gives a MAPE of 0.0882%, which falls in the excellent category because it is below 10%. Based on the analysis, the forecast for 2022 is 944 tuberculosis sufferers. This forecast can be used to support program planning and as a benchmark for the number of tuberculosis sufferers in Kampar Regency, Riau Province, so that the government and medical professionals can take appropriate early steps to stop the rise in tuberculosis.

CONCLUSION
Based on the analysis and discussion, forecasting with the ACFLR method gives 944 TB patients in Kampar Regency in 2022. The forecasting error is 0.0882% and the forecasting accuracy is 99.9118%, which means that the ACFLR forecasts have an excellent level of accuracy. The number of TB sufferers in Kampar Regency is forecast to increase by 4 people from 2021 to 2022.

Figure 2. Data on the Number of TB Patients in 2017-2021 in Kampar Regency

ii. Based on the average_diff value, clusters are formed from the sorted data using the three principles and the cluster_diff equation. The clustering results are:
a. First Cluster = {938; 940}
b. Second Cluster = {1,006}
c. Third Cluster = {1,069; 1,079}

iii. Based on the clustering results in step (ii), the cluster members are adjusted again, giving:
a. First Cluster = {938; 940}
b. Second Cluster = {970.75; 1,041.25}
c. Third Cluster = {1,069; 1,079}

iv. Based on the cluster results in step (iii), the following intervals are obtained:
a. First Interval = [938; 940)
b. Second Interval = [940; 970.75)
c. Third Interval = [970.75; 1,041.25)
d. Fourth Interval = [1,041.25; 1,069)
e. Fifth Interval = [1,069; 1,079]

v. Divide each interval into sub-intervals. In this study, the author used $p = 5$ because the data cover 5 years. Dividing the intervals above in this way yields the sub-intervals.
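As a check on steps (ii) and (iii), the average_diff value can be recomputed from the sorted data in step (i). The arithmetic below is a worked reconstruction, not a figure quoted from the source, but it is consistent with the adjusted second cluster {970.75; 1,041.25} reported above.

```latex
\[
\text{average\_diff}
  = \frac{(940 - 938) + (1006 - 940) + (1069 - 1006) + (1079 - 1069)}{5 - 1}
  = \frac{2 + 66 + 63 + 10}{4}
  = 35.25
\]
\[
1006 - 35.25 = 970.75, \qquad 1006 + 35.25 = 1041.25
\]
```

The gaps of 66 and 63 both exceed 35.25, which is why 1,006 ends up alone in the second cluster before the adjustment step.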

Figure 3. Graph of the Difference between the Actual Data Pattern and the Forecasting Results Data

Principle 3 of Step (2) applies to the arrangement

$$\{d_1, \ldots\}, \ldots, \{\ldots, d_i\}, \{d_j\}, d_k, \ldots, d_n$$

If $d_k - d_j \le \text{average\_diff}$ and $d_k - d_j \le d_j - d_i$, then $d_k$ is put into the current cluster that contains $d_j$. If not, a new cluster is created for $d_k$, and this new cluster becomes the current cluster.

3. Based on the clustering in Step (2), readjust the members of each cluster using the following principles:

i. Principle 1: If a cluster has more than two values, keep the smallest and the largest values and delete the others.

ii. Principle 2: If a cluster has exactly two values, keep both.

iii. Principle 3: If a cluster has only one value $d_q$, put the values "$d_q - \text{average\_diff}$" and "$d_q + \text{average\_diff}$" into the cluster and remove $d_q$ from it. The cluster needs to be adjusted again in the following situations:
a. Situation 1: If Principle 3 applies to the first cluster, delete the value "$d_q - \text{average\_diff}$" and use $d_q$ in its place.
b. Situation 2: If Principle 3 applies to the last cluster, delete the value "$d_q + \text{average\_diff}$" and use $d_q$ in its place.
c. Situation 3: If the value "$d_q - \text{average\_diff}$" is smaller than the smallest value in the previous cluster, then Principle 3 does not apply.

4. Transform the clusters into adjacent intervals. Assume that the result of clustering in Step (3) is as follows:

$$\{d_1, d_2\}, \{d_3, d_4\}, \{d_5, d_6\}, \ldots, \{d_r\}, \{d_s, d_t\}, \ldots, \{d_{n-1}, d_n\}$$

The clusters are converted into intervals with the following sub-steps:

i. The first cluster $\{d_1, d_2\}$ is converted into the interval $[d_1, d_2)$.

ii. If the current interval is $[d_i, d_j)$ and the current cluster is $\{d_k, d_l\}$, then:
a. If $d_j \ge d_k$, the interval $[d_j, d_l)$ is formed and becomes the current interval, and the next cluster $\{d_m, d_n\}$ becomes the current cluster.
b. If $d_j < d_k$, then $\{d_k, d_l\}$ is converted into the interval $[d_k, d_l)$ and a new interval $[d_j, d_k)$ is formed between $[d_i, d_j)$ and $[d_k, d_l)$. The interval $[d_k, d_l)$ becomes the current interval and the next cluster $\{d_m, d_n\}$ becomes the current cluster. If the current interval is $[d_i, d_j)$ and the current cluster is $\{d_k\}$, then the current interval is converted into $[d_i, d_k)$, which becomes the current interval, and the next cluster becomes the current cluster.
c. The last interval is a closed interval, which includes both of its endpoints.
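Steps 3 and 4 can be sketched in Python as follows, taking the clusters and the average_diff value as given. The function names are hypothetical. The sketch assumes that when Situation 3 occurs the cluster keeps its single value, and that the first adjusted cluster holds two values.

```python
def adjust_clusters(clusters, avg_diff):
    """Step 3: reduce every cluster to at most two boundary values."""
    adjusted = []
    last = len(clusters) - 1
    for idx, cluster in enumerate(clusters):
        if len(cluster) >= 2:
            # Principles 1 and 2: keep the smallest and largest values.
            adjusted.append([cluster[0], cluster[-1]])
            continue
        d = cluster[0]
        low, high = d - avg_diff, d + avg_diff   # Principle 3
        if idx == 0:
            low = d                              # Situation 1
        if idx == last:
            high = d                             # Situation 2
        if idx > 0 and low < adjusted[-1][0]:
            adjusted.append([d])                 # Situation 3: keep d alone
        else:
            adjusted.append([low, high])
    return adjusted


def clusters_to_intervals(clusters):
    """Step 4: turn adjusted clusters into adjacent (low, high) intervals.

    Every interval is half-open except the last, which is closed.
    """
    intervals = [(clusters[0][0], clusters[0][1])]
    for cluster in clusters[1:]:
        cur_low, cur_high = intervals[-1]
        if len(cluster) == 1:
            # Single value d_k: stretch the current interval to [d_i, d_k).
            intervals[-1] = (cur_low, cluster[0])
            continue
        k, l = cluster
        if cur_high >= k:
            intervals.append((cur_high, l))      # overlap: start at d_j
        else:
            intervals.append((cur_high, k))      # gap interval [d_j, d_k)
            intervals.append((k, l))
    return intervals


raw = [[938, 940], [1006], [1069, 1079]]         # clusters from the results
adjusted = adjust_clusters(raw, 35.25)           # 35.25 = recomputed average_diff
print(adjusted)
print(clusters_to_intervals(adjusted))
```

Applied to the three clusters reported for the Kampar data, the sketch returns the same five intervals listed in the results.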

Table 1. Results of the universe set and middle value

The second stage is to calculate the forecasting value with the following steps:

i. Determine the universe of discourse, $U = [938; 1{,}079]$. Then calculate the midpoint $m_i$ of each sub-interval $u_i$ obtained in step (v) of the first stage, with $1 \le i \le 25$. The midpoints are given in Table 1.

ii. Assume that there are 25 sub-intervals $u_1, u_2, \ldots, u_{25}$. Next, define each fuzzy set $A_i$, where $1 \le i \le n$ and $n = 25$, using the fuzzy set definition.

Table 2. Fuzzification Results of the Number of TB Patients in 2017-2021

The next step is to determine the fuzzy logical relationships from the fuzzification results in Table 2 by connecting the fuzzified result of year $t$ to that of year $t + 1$. The results are shown in Table 3.

Table 3. FLR Results on the Number of TB Patients in 2017-2021

After the fuzzy logical relationships are formed, the next step is to group them into fuzzy logical relationship groups, ordered from the smallest to the largest current state, so that the following results are obtained:
Group 1: $A_1 \to A_6$
Group 2: $A_6 \to \varnothing$
Group 3: $A_{13} \to A_1$
Group 4: $A_{21} \to A_{25}$
Group 5: $A_{25} \to A_{13}$

Table 4. Results of Forecasting the Number of TB Patients in 2022

Table 5. Results of Calculating the Forecasting Error Value for 2018-2021