Title: Performance evaluation and comparison of data-mining methods in estimating SAR quality index (Case study: Ajichay river in East Azerbaijan)
Journal: Iranian Water Research Journal
Issue: Volume 10, Issue 3
Authors: Rezazadeh Joudi, Ali; Sattari, Mohammad Taghi
Keywords: Support Vector Machine, Water Quality, Sodium Adsorption Ratio, M5 Model Tree Rules
Abstract:

Clean water is a key factor in the development of any region. Because Iran lies in a hot, dry region with scarce water resources, protecting water supplies and ensuring the quality required for various uses is doubly important. Quality assessment of surface waters is usually costly and time-consuming, so a method that can predict water quality fairly accurately from a minimum number of hydrochemical parameters is preferred. One of the most important water quality parameters for agriculture is the sodium adsorption ratio (SAR), which must be estimated and evaluated accurately. This study examined the feasibility of estimating the SAR quality index of the Ajichay River in East Azerbaijan from various hydrochemical parameters using M5 model tree rules and a support vector machine (SVM). Four statistics were used to assess the accuracy of the M5 and SVM models: the correlation coefficient (R), the Nash-Sutcliffe coefficient (NSC), the root mean square error (RMSE), and the mean absolute error (MAE). These statistics were R = 0.98, NSC = 0.97, RMSE = 6.22 mg/l, and MAE = 6.06 mg/l for the SVM, and R = 0.98, NSC = 0.96, RMSE = 7.33 mg/l, and MAE = 3.9 mg/l for the M5 model. The comparison showed that both methods performed well in estimating SAR, but within the range of the data used, the M5 model tree rules provide simpler and more practical linear relationships.
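For reference, SAR is conventionally computed from sodium, calcium, and magnesium concentrations expressed in meq/L as SAR = Na⁺ / √((Ca²⁺ + Mg²⁺) / 2). The minimal sketch below assumes that standard definition; the helper names, the approximate mg/L-to-meq/L conversion weights, and the sample values are illustrative assumptions, not data or code from this study.

```python
import math

# Approximate equivalent weights (mg per meq): atomic mass divided by ionic charge.
EQUIVALENT_WEIGHT = {"Na": 22.99, "Ca": 40.08 / 2, "Mg": 24.305 / 2}


def mg_to_meq(ion: str, mg_per_l: float) -> float:
    """Convert a concentration from mg/L to meq/L (hypothetical helper)."""
    return mg_per_l / EQUIVALENT_WEIGHT[ion]


def sodium_adsorption_ratio(na_meq: float, ca_meq: float, mg_meq: float) -> float:
    """Standard SAR = Na / sqrt((Ca + Mg) / 2), with all inputs in meq/L."""
    if ca_meq + mg_meq <= 0:
        raise ValueError("Ca + Mg must be positive")
    return na_meq / math.sqrt((ca_meq + mg_meq) / 2.0)


# Hypothetical sample, not taken from the Ajichay data set.
print(round(sodium_adsorption_ratio(10.0, 4.0, 4.0), 2))  # 5.0
```

Measured mg/L values would first be passed through `mg_to_meq` for each ion before computing SAR.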

English abstract:

Clean water is one of the key factors in any region's development. Because Iran lies in an arid and semi-arid area with scarce water resources, protecting the water required for various uses and maintaining its quality is doubly important. Evaluating surface water quality is normally costly and time-consuming, so a method is preferred that needs a minimum number of hydrochemical parameters yet yields a relatively accurate prediction of water quality. One of the most significant water quality parameters for agricultural use is the sodium adsorption ratio (SAR), which should be estimated and evaluated accurately. This research used various hydrochemical parameters with an M5-Rules model tree and a support vector machine (SVM) to study the feasibility of estimating the SAR quality index of the Ajichai River in East Azerbaijan Province. The accuracy of the M5 model and the SVM was assessed with four statistics: the correlation coefficient (R), the Nash-Sutcliffe coefficient (NSC), the root mean square error (RMSE), and the mean absolute error (MAE).
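The four accuracy statistics can be computed as in the following sketch. It assumes the usual definitions (Pearson R, Nash-Sutcliffe efficiency relative to the mean of the observations, RMSE, and MAE); the source does not give its exact formulas, and the function name is hypothetical.

```python
import math
from typing import Dict, Sequence


def evaluation_stats(observed: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """Return R, NSC (Nash-Sutcliffe), RMSE and MAE for paired values."""
    n = len(observed)
    if n < 2 or n != len(predicted):
        raise ValueError("need two equal-length sequences of at least 2 values")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("observed and predicted values must not be constant")
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return {
        "R": cov / math.sqrt(var_o * var_p),  # Pearson correlation coefficient
        "NSC": 1.0 - sse / var_o,             # 1 = perfect; below 0 = worse than the mean
        "RMSE": math.sqrt(sse / n),
        "MAE": sum(abs(o - p) for o, p in zip(observed, predicted)) / n,
    }
```

Because RMSE squares each error before averaging, it is always at least as large as MAE; a large gap between the two indicates that a few large errors dominate.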
The study region was the Ajichai River on the northern slopes of Mount Sahand. Hydrochemical data from the Vanyar hydrometric station were used to evaluate and predict the river's SAR. The Vanyar station is located at 46°24′ E, 38°7′ N, at an elevation of 1460 m. For SAR estimation, the effects of total dissolved solids (TDS), electrical conductivity (EC), pH, chloride (Cl⁻), sulfate (SO₄²⁻), calcium (Ca²⁺), magnesium (Mg²⁺), and sodium (Na⁺) on SAR were determined.
The M5-Rules model tree is a data-mining method whose main idea derives from regression trees. The difference is that a model tree has regression functions in its leaves instead of constant values or class labels. The major advantage of M5-Rules model trees over regression trees is that they are much smaller, and their regression functions normally include few parameters. A decision tree usually consists of four parts: the root, branches, nodes, and leaves. Each node corresponds to a particular attribute, and the branches represent intervals of that attribute's known values. Each split is made on one of the predictor variables, with the split intervals chosen so that the sum of squared deviations from the mean of the data in each node is minimized. The splitting criterion treats the variability of the target values reaching a node as a measure of the error at that node, and the model calculates the expected reduction in this error from testing each attribute at that node. Model error is generally assessed by measuring the accuracy of predictions for unseen target values. In this research, the M5 models were built with the WEKA software, developed at the University of Waikato in New Zealand, using its M5-Rules option, which produces simple linear rules.
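The splitting step described above can be sketched with the standard deviation reduction (SDR) criterion from Quinlan's M5 algorithm, SDR = sd(T) − Σ (|Tᵢ| / |T|) · sd(Tᵢ), where T is the set of cases reaching a node and Tᵢ are the subsets produced by a candidate split. This sketch, stated as background rather than taken from the source, covers only the choice of a binary split on one numeric attribute; it omits the leaf regression models, pruning, smoothing, and rule extraction of M5-Rules, and the function name and data are hypothetical.

```python
import statistics
from typing import Optional, Sequence, Tuple


def best_split(x: Sequence[float], y: Sequence[float]) -> Tuple[Optional[float], float]:
    """Return (threshold, SDR) for the split `x <= threshold` that maximizes SDR."""
    pairs = sorted(zip(x, y))
    n = len(pairs)
    if n < 2:
        return None, 0.0
    total_sd = statistics.pstdev([v for _, v in pairs])
    best_threshold, best_sdr = None, 0.0
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal attribute values
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        sdr = (total_sd
               - len(left) / n * statistics.pstdev(left)
               - len(right) / n * statistics.pstdev(right))
        if sdr > best_sdr:
            best_threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_sdr = sdr
    return best_threshold, best_sdr


# Hypothetical data: an EC-like attribute against a SAR-like target.
print(best_split([1, 2, 3, 4], [1, 1, 10, 10]))  # (2.5, 4.5)
```

A full model tree would apply this search to every attribute, split on the best one, and recurse into each subset.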
Like M5 model trees and artificial neural networks, support vector machines are data-mining algorithms. There are two types of support vector machines: support vector classification (SVC) and support vector regression (SVR). Support vector machines are based on the concept of decision planes that define decision boundaries; a decision plane separates data with different class labels. In linear support vector regression, given input values xᵢ and output values yᵢ, the goal is to find a function f(x) that deviates from each observed yᵢ by at most ε, where ε is the allowed deviation. In this research, the Statistica software was used to model SAR values with support vector regression.
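The ε-deviation idea above corresponds to the ε-insensitive loss used in ε-SVR: errors no larger than ε cost nothing, and larger errors are penalized linearly beyond ε. This minimal sketch only evaluates that loss for given predictions; it does not train an SVR (the study used Statistica for that), and the function name and values are hypothetical.

```python
from typing import Sequence


def epsilon_insensitive_loss(y_true: Sequence[float], y_pred: Sequence[float], epsilon: float) -> float:
    """Mean of max(0, |y - f(x)| - epsilon) over all samples."""
    if epsilon < 0:
        raise ValueError("epsilon must be non-negative")
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("need two non-empty sequences of equal length")
    losses = [max(0.0, abs(t - p) - epsilon) for t, p in zip(y_true, y_pred)]
    return sum(losses) / len(losses)


# Only the second prediction falls outside the ±0.1 tube.
print(epsilon_insensitive_loss([1.0, 2.0, 3.0], [1.05, 2.5, 3.0], 0.1))
```

Points inside the tube contribute no loss, which is why the fitted SVR function depends only on the points at or outside its boundary (the support vectors).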
In the M5-Rules modeling of SAR values, the best results were obtained when 66 percent of the data were allocated to training and the remaining 34 percent to testing. For SVM modeling, several kernel functions were tested, and the radial basis function (RBF) kernel performed best. The best of the 10 scenarios studied in this research was selected. The resulting statistics were R = 0.98, NSC = 0.97, RMSE = 6.22 mg/l, and MAE = 6.06 mg/l for the SVM, and R = 0.98, NSC = 0.96, RMSE = 7.33 mg/l, and MAE = 3.9 mg/l for the M5 model. The comparison indicated that both methods studied in this work, support vector regression and the M5 model, were highly capable of predicting SAR values in the Ajichai River from the available data. However, the M5 model is recommended because its formulas are simple and linear.
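Two ingredients of the modeling setup can be sketched briefly: the RBF kernel, K(a, b) = exp(−γ‖a − b‖²), and a 66/34 train/test split. The source reports neither the kernel parameter γ nor whether the split was random or chronological; this sketch assumes a seeded random shuffle, and the function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def rbf_kernel(a: Sequence[float], b: Sequence[float], gamma: float) -> float:
    """Radial basis function kernel: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * squared_distance)


def split_train_test(rows: Sequence[T], train_fraction: float = 0.66,
                     seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle rows reproducibly (assumed random split) into training and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


train, test = split_train_test(range(100))
print(len(train), len(test))  # 66 34
print(rbf_kernel([0.0, 0.0], [1.0, 1.0], gamma=0.5))
```

The kernel value is 1 for identical inputs and decays toward 0 as inputs move apart; γ controls how quickly, so it is usually tuned alongside ε and the SVR penalty parameter.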

Publisher: Shahrekord University
Director-in-Charge: Dr. Hossein Samadi
Editor-in-Chief: Dr. Manouchehr Heidarpour
Executive Manager: Dr. Mohammad Ali Nasr Esfahani