1 Introduction

Fruit quality is one of the most important factors determining the economic value of fruit, and it directly affects market competitiveness. China remains the world’s leading country in terms of fruit tree resources, fruit yield, and planting area [1, 2]. However, its share of fruit exports trails that of other countries [3]. A study found that, on the one hand, because the country’s fruit quality evaluation and sorting technology lags behind [4], the grading of fruit quality is disordered and the resulting quality is uneven. Scholars have analyzed single fruit indices, such as the soluble solids of strawberry [5] and apple [6], the titratable acid of sweet orange [7], and the hardness of kiwi fruit [8]. However, a single indicator evaluates only one aspect of fruit quality and therefore cannot meet the requirements of comprehensive fruit quality evaluation. On the other hand, fruit quality evaluation involves numerous factors, including internal and external quality factors, and these factors are both closely correlated and relatively independent, which makes evaluation and grading difficult. To address these problems, data mining analysis of fruit quality indicators, together with hierarchical methods, classification systems, and related evaluation methods, can be used to reduce the number of quality indicators, extract the main evaluation factors, and simplify the evaluation process. The comprehensive evaluation of fruit quality by data mining has become a research hotspot in recent years. Scholars have applied data mining methods to the fruit quality of Nanfeng tangerine [9], apple [10], pineapple [11], pear [12], and other fruits. The results show that data mining methods can evaluate fruit quality effectively.
At present, reviews of fruit quality data mining are rarely reported. This paper reviews and analyzes the data mining methods applied to fruit quality in recent years. Finally, the main evaluation factors of common fruits are consolidated to provide a reference for research on fruit quality evaluation.

2 Main Fruit Quality Indicators and Access Methods

Fruit quality comprises appearance quality and intrinsic quality. The main evaluation indices include fruit shape index, fruit weight, fruit color, fruit firmness, soluble solids, vitamin C, and others, as shown in Table 1. These indicators represent different aspects of fruit characteristics, and close relationships exist among them. For example, total sugar and soluble solids (represented by sucrose and reducing carbohydrates) denote different attributes that are nonetheless related. At present, quality indices are obtained mainly by chemical and instrumental measurement, but indices that are difficult to quantify, such as fruit flavor, can be obtained only through expert scoring.

Table 1. Main indices of fruit quality and methods for obtaining them

3 Data Mining Overview

3.1 Simple Mathematical Method

For small data sets with simple features, unknown and potentially useful information can be extracted with simple mathematical processing, such as means, percentages, and classification. Even such simple processing can uncover valuable latent information in a data set.

3.2 Mathematical Statistics Method

Statistical analysis is mainly used to summarize knowledge and to mine relational knowledge. Some data follow a functional relationship, whereas other data are related in ways that cannot be expressed as a function. In such cases, the implicit information can be extracted with mathematical statistics. Common methods include regression analysis, correlation analysis, and principal component analysis.

3.3 Artificial Intelligence Method

For large and particularly complex data sets, general data mining methods cannot extract the implicit information. In such cases, artificial intelligence methods, which are considerably more complex, can be used. The main methods include fuzzy evaluation, association rules, and clustering analysis.

4 Fruit Quality Data Mining Method

4.1 Single Evaluation Method

(1) Fuzzy evaluation method

Fuzzy evaluation is an effective multi-factor decision method for the comprehensive evaluation of objects that are influenced by numerous factors. One characteristic of this method is that the results are represented by fuzzy sets rather than by an absolutely positive or negative verdict [13]. Its advantages are that it quantifies qualitative indices, overcomes the drawbacks of purely qualitative analysis, and evaluates the strengths and weaknesses of varieties objectively and accurately [14]. Its main disadvantage is information duplication, which arises because the correlation among the evaluation indices is not resolved. The determination of the membership function and the fuzzy correlation matrix, among other issues, therefore requires further study [15]. This method is mainly used in fruit quality identification and the breeding of good varieties, and it has been applied to the quality evaluation of longan [16], persimmon [17], and other fruits.
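The idea of expressing the result as a fuzzy set can be illustrated with a minimal sketch. It assumes the common weighted-average composition B = W · R, where W holds the index weights and R is the membership matrix. The function name `fuzzy_evaluate`, the three indices, the three grades, and all numbers are hypothetical and are not taken from the cited studies.

```python
def fuzzy_evaluate(weights, membership):
    """Weighted-average fuzzy composition B = W . R.

    weights:    one weight per evaluation index, summing to 1
    membership: membership[i][j] is the degree to which index i
                belongs to grade j
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    n_grades = len(membership[0])
    return [
        sum(w * row[j] for w, row in zip(weights, membership))
        for j in range(n_grades)
    ]


# Hypothetical data: three indices rated against three grades.
grades = ["good", "medium", "poor"]
weights = [0.5, 0.3, 0.2]
membership = [
    [0.7, 0.2, 0.1],  # index 1
    [0.2, 0.5, 0.3],  # index 2
    [0.1, 0.4, 0.5],  # index 3
]

result = fuzzy_evaluate(weights, membership)
# Maximum-membership principle: report the grade with the largest degree.
best_grade = grades[result.index(max(result))]
print(result, best_grade)
```

The output is a vector of membership degrees over the grades rather than a single verdict, so the full vector can be reported alongside the grade chosen by the maximum-membership principle.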

(2) Analytic hierarchy process method

The analytic hierarchy process is a multi-objective decision analysis method that combines qualitative and quantitative analysis [18]. Its main idea is to decompose the complex problem of fruit quality evaluation into several levels and factors and to judge the indices by pairwise comparison. A judgment matrix is established from these comparisons, and its largest eigenvalue and the corresponding eigenvector are computed. The eigenvector gives the importance weight of each index and provides a basis for selecting the optimal evaluation indices. One advantage of the analytic hierarchy process is that it not only yields a weight coefficient for each evaluation index but also filters out differences in perception caused by accidental factors and brings factors of different dimensions into a unified evaluation system, with high reliability and small error. One disadvantage is that the number of fruit indicators is limited, with the maximum generally being 9. This method has been applied in the cultivation of good varieties of jinxixiaozao [19], pear [20], and other fruits.
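The eigenvector step described above can be sketched as follows. This is a minimal illustration, assuming power iteration to approximate the principal eigenvector and Saaty's commonly tabulated random index values for the consistency check. The function name `ahp_weights` and the example judgment matrix are hypothetical.

```python
# Saaty's random index (RI) values for matrix orders 1 to 9.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def ahp_weights(matrix, iterations=100):
    """Return (weights, lambda_max, consistency_ratio) for a
    pairwise-comparison (judgment) matrix."""
    n = len(matrix)
    if n not in RANDOM_INDEX:
        raise ValueError("this sketch supports 1 to 9 indices")
    w = [1.0 / n] * n
    for _ in range(iterations):
        # Power iteration: multiply by the matrix, then renormalize.
        v = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        w = [x / total for x in v]
    # Estimate the largest eigenvalue as the mean of (A w)_i / w_i.
    aw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / w[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1) if n > 1 else 0.0
    ri = RANDOM_INDEX[n]
    cr = ci / ri if ri > 0 else 0.0
    return w, lambda_max, cr


# Hypothetical judgments: index A is 3x as important as B, 5x as C, etc.
judgment = [
    [1.0, 3.0, 5.0],
    [1 / 3, 1.0, 2.0],
    [1 / 5, 1 / 2, 1.0],
]
weights, lambda_max, cr = ahp_weights(judgment)
print([round(x, 3) for x in weights], round(lambda_max, 4), round(cr, 4))
```

A consistency ratio below 0.1 is the usual rule of thumb for accepting the judgments. The lookup table stops at 9, which mirrors the practical limit on the number of indices mentioned above.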

(3) Grey correlation degree analysis

Grey correlation analysis applies grey system theory to the comprehensive evaluation of the research object, with correlation analysis as its main tool. The correlation coefficient and correlation degree between each comparison sequence and the reference sequence are calculated to determine the primary and secondary factors and their degree of correlation [21]. This method offers the advantages of simplicity, ease of operation, and intuitiveness. Its disadvantages include strong subjectivity and difficulty in determining the optimal values of certain indices. The method is mainly used in situations where the correlation between indices is too high. Grey correlation degree analysis has played an important role in the comprehensive evaluation of muskmelon [22], peach [23], amomum [24], and other fruits.
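The comparison against a reference sequence can be sketched as follows. This is a minimal illustration, assuming the widely used grey relational coefficient with distinguishing coefficient ρ = 0.5 and assuming that all sequences have already been made dimensionless. The function name `grey_relational_grades` and the sample values are hypothetical.

```python
def grey_relational_grades(reference, candidates, rho=0.5):
    """Grey relational grade of each candidate sequence with
    respect to the reference sequence.

    Coefficient for candidate i at index k:
        (d_min + rho * d_max) / (d_ik + rho * d_max)
    where d_ik = |reference[k] - candidate_i[k]| and d_min, d_max are
    taken over all candidates and all indices.
    """
    deltas = [[abs(r - x) for r, x in zip(reference, seq)]
              for seq in candidates]
    flat = [d for row in deltas for d in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0:
        # Every candidate coincides with the reference.
        return [1.0] * len(candidates)
    return [
        sum((d_min + rho * d_max) / (d + rho * d_max) for d in row) / len(row)
        for row in deltas
    ]


# Hypothetical normalized data: an ideal variety and two candidates.
reference = [1.0, 1.0, 1.0]
candidates = [
    [0.9, 0.8, 1.0],  # variety A
    [0.6, 0.7, 0.5],  # variety B
]
grades = grey_relational_grades(reference, candidates)
print([round(g, 4) for g in grades])
```

A larger grade means the variety lies closer to the reference. The choice of reference values is exactly where the subjectivity noted above enters.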

(4) Principal component analysis

The goal of principal component analysis is to transform a larger number of correlated original indicators, with minimum loss of information, into a smaller number of new comprehensive indices that are mutually orthogonal or only slightly correlated, thereby simplifying the evaluation process [25, 26]. Its advantages are that it provides a computed standard for comparison and can readily be carried out on a computer with dedicated analysis software. Its disadvantage is that the new comprehensive indices are difficult to interpret, so the method is generally combined with clustering. Principal component analysis is mainly used when there are many quality indicators and the correlation among them is strong. Multiple correlated random variables are reduced to a few variables according to the contribution rates of the principal components, which avoids evaluation errors caused by correlated traits [29]. At present, the analysis is used for selecting fruit quality evaluation factors and for the comprehensive evaluation of fruit quality [27, 28].
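The selection of components by contribution rate can be sketched as follows. This is a minimal illustration that works on the correlation matrix of standardized indices and uses power iteration with deflation in place of the dedicated software mentioned above. The function names, the 85 % threshold, and the sample measurements are hypothetical.

```python
import math


def correlation_matrix(data):
    """Correlation matrix of the columns of `data` (rows are samples)."""
    n = len(data)
    cols = list(zip(*data))
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / (n - 1))
            for c, m in zip(cols, means)]
    z = [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in data]
    p = len(cols)
    return [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(p)]
            for i in range(p)]


def leading_components(matrix, threshold=0.85, iterations=1000):
    """Extract principal components until the cumulative contribution
    rate reaches `threshold`.

    Returns a list of (eigenvalue, unit eigenvector, cumulative rate).
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    p = len(matrix)
    a = [row[:] for row in matrix]
    total = sum(matrix[i][i] for i in range(p))  # trace = total variance
    components, cumulative = [], 0.0
    for _component in range(p):
        v = [1.0 + 0.1 * i for i in range(p)]  # arbitrary start vector
        for _step in range(iterations):
            w = [sum(a[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        # Rayleigh quotient of the unit vector v.
        eigenvalue = sum(v[i] * a[i][j] * v[j]
                         for i in range(p) for j in range(p))
        cumulative += eigenvalue / total
        components.append((eigenvalue, v, cumulative))
        if cumulative >= threshold:
            break
        # Deflation: remove the component that was just found.
        a = [[a[i][j] - eigenvalue * v[i] * v[j] for j in range(p)]
             for i in range(p)]
    return components


# Hypothetical measurements: weight (g), soluble solids (%), acid (%).
data = [
    [180, 12.1, 0.45],
    [210, 13.0, 0.40],
    [165, 11.5, 0.52],
    [230, 13.6, 0.36],
    [195, 12.4, 0.47],
    [150, 11.0, 0.55],
]
components = leading_components(correlation_matrix(data))
for eigenvalue, loadings, cumulative in components:
    print(round(eigenvalue, 3), [round(x, 3) for x in loadings],
          round(cumulative, 3))
```

The loadings show how much each original index contributes to a component, which is the information typically used when picking a small set of main evaluation factors.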

4.2 Hybrid Evaluation Method

(1) Principal component cluster analysis method

In the multi-index evaluation and ranking of fruit quality, the variance contribution rate of the first principal component F1 is often not sufficiently high. In other words, the first principal component does not capture enough of the information in the original data, so ranking samples by their first principal component scores alone is one-sided. In this case, principal component analysis and clustering analysis are combined to form the “principal component clustering analysis method”. Its advantages are that it retains most of the information in multiple indicators simultaneously, avoids the subjectivity of manually selecting evaluation factors, and truly reflects the comprehensive characteristics of varieties, thereby offering an objective basis for selecting breeding materials [29]. One disadvantage is that clustering becomes difficult when the data set is too large. The principal component cluster analysis method can effectively extract the main quality factors, simplify fruit quality evaluation, and provide a theoretical basis for rapid fruit measurement. The method has been used for the rapid quality detection of tomato [30] and Lee apricot [31].
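The clustering step can be sketched as follows. This is a minimal illustration that groups samples by their scores on several principal components instead of ranking them by F1 alone. It assumes average-linkage agglomerative clustering on Euclidean distance, which is only one of several possible choices. The function name `agglomerative_clusters` and the scores are hypothetical.

```python
import math


def agglomerative_clusters(points, n_clusters):
    """Average-linkage agglomerative clustering.

    points:     one tuple of principal component scores per sample
    n_clusters: number of clusters to stop at
    Returns a list of clusters, each a sorted list of sample indices.
    """
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Mean pairwise distance between the members of two clusters.
        return sum(math.dist(points[i], points[j])
                   for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find and merge the closest pair of clusters.
        _, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical scores of six varieties on components F1 and F2.
scores = [
    (2.1, 0.3), (1.9, 0.5),
    (-0.2, 1.8), (0.0, 2.0),
    (-1.8, -1.1), (-2.0, -0.9),
]
clusters = agglomerative_clusters(scores, 3)
print(clusters)
```

Every pair of clusters is compared at each merge, so the cost grows quickly with the number of samples, which is consistent with the difficulty on large data sets noted above.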

(2) Rationalization-satisfaction degree and multiple value method

The so-called “reasonable–satisfaction” degree refers to the extent to which the characteristics of a fruit variety satisfy people’s needs. The reasonable degree is 1 if a characteristic of the variety fully complies with the “rule” and 0 if it does not comply [32]. The advantages of the algorithm are simplicity, ease of calculation, and the ability to distinguish between good quality and poor quality. Its disadvantage is a relatively large error. The algorithm objectively and accurately reflects people’s needs and their degree of satisfaction with fruit quality. The method can be used not only to identify fruit quality but also as a reference for fruit tree breeding, especially the breeding of commercial varieties. At present, the method has been used on pear [33] and other fruits for the cultivation of good varieties.

(3) Principal component cluster combined with rationalization-satisfaction multidimensional value analysis theory of merger rules

The algorithm offers a new approach to the comprehensive evaluation of fruit quality by combining principal component analysis, cluster analysis, and the “reasonable–satisfaction” multidimensional value theory into a composite evaluation method. This method can be used to extract the main quality factors of common fruits and thus simplify the evaluation process. Moreover, the method can be used for fruit breeding. Its disadvantage is that the computation is tedious, complex, and large in volume. This combined method has been applied to select the fruit quality assessment factors of mango [34], which simplifies the evaluation of mango fruit quality.

4.3 Parts of Comprehensive Fruit Evaluation Factors

Table 2 summarizes, for selected studies in the literature, the evaluation factors considered, the data mining method used, and the main evaluation factors obtained. Because fruit quality indicators are numerous, evaluation with a single index has clear limitations, whereas evaluating all indicators inevitably requires too much work. Fruit quality data mining methods can effectively reduce the number of evaluation indices and simplify the evaluation process. Jiyun Nie et al. [35] used principal component analysis to select five indicators as the main evaluation factors of apple quality, based on the first four principal components, whose cumulative contribution rate was more than 95.75 %. Haying Zhang et al. [36] reduced 19 peach quality indicators to five according to principal component analysis, clustering analysis, and the indicator requirements of the national standard GB - 10653-1989 regarding “the fresh peach”. Table 2 shows that data mining methods can effectively reduce the number of evaluation indices, provide a good evaluation of fruit quality, and solve both the limitations of single-index evaluation and the heavy workload of multi-index evaluation. The approach provides new ideas and methods for the evaluation of fruit quality.

Table 2. Evaluation factors and main evaluation factors of selected fruits

5 Conclusion and Prospect

As people’s living standards continue to improve, the demand for high-quality fruit keeps growing. The search for rapid, simple methods of evaluating fruit quality has become a hot topic in the field of fruit quality analysis. Fruit quality comprises numerous evaluation factors, and different degrees of correlation and relative independence exist among them. Using a single quality index to evaluate fruit has clear limitations, because a single indicator can explain only one aspect of quality and cannot evaluate the overall quality of fruit. Data mining methods can combine multiple quality metrics to obtain a comprehensive and objective assessment of fruit quality. In practical fruit quality assessment, using appropriate data mining methods to determine the main quality evaluation factors of common fruits can substantially reduce the workload of fruit quality appraisal.

Data mining also provides a new approach to the selection and breeding of fruit. First, data mining can identify the good traits of outstanding fruit varieties, which provides a reference for choosing hybrid parents to improve fruit quality. Second, for a single specific variety, data mining methods can distinguish differences in fruit quality and provide a basis for directional breeding and the further improvement of fine varieties, thereby yielding more excellent varieties. Finally, for varieties whose overall quality is poor or average but which show one highly prominent individual quality trait, selecting them as specialized varieties for a specific use not only helps improve the comprehensive utilization of fruit but also helps determine different uses and make the most of fruits on the basis of their quality characteristics.

With the development of cloud platforms, computers, massive databases, networking, and other technologies in recent years, data mining will play a more important role in fruit production, distribution, sales, and consumption. Although research on fruit quality assessment has been extensive and the various assessment methods have become relatively mature, the existing methods still struggle to meet the needs of actual production and consumption. Therefore, finding a rapid and easy method for evaluating fruit quality remains a hot topic in the field of fruit quality analysis. Applying data mining methods to more fruits in order to extract the major evaluation factors and simplify the evaluation process will become a new direction in fruit quality research.