Volume 18, Issue 1, Winter and Spring 2021

Robust Weighted Approaches to Detect and Deal with Outliers in Estimating Principal Component Regression Model

Esraa Najeeb Alsaraf; Bashar Abdulaziz AL-Talib

IRAQI JOURNAL OF STATISTICAL SCIENCES, 2021, Volume 18, Issue 1, Pages 1-13
DOI: 10.33899/iqjoss.2021.168371

This paper proposes an approach for dealing with multicollinearity among the explanatory variables together with outliers in the data. Principal component regression is used to handle the multicollinearity, and robust weighting functions are applied to the objective function to handle the outliers. To verify the efficiency of the estimators, an experimental study was conducted by simulation, and the methods were also applied to real data collected from the files of the Badoush Cement Factory in Nineveh Governorate for the period 2008-2014, with nine explanatory variables representing the chemical properties of cement and a dependent variable representing the physical properties of cement (hardness).
The data were tested for multicollinearity, and the model was estimated by least squares using the principal components as explanatory variables. The variables were found to suffer from multicollinearity, and because the data also contain outlying values, the problem was treated by applying principal component regression weighted by robust weights.
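The abstract does not say which multicollinearity test was used. As a minimal sketch of one common diagnostic, the variance inflation factor (VIF) for the special case of two predictors can be computed from their correlation r as VIF = 1 / (1 - r^2). The function names and the toy data below are illustrative assumptions, not taken from the paper.

```python
import math

def pearson_r(a, b):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    s_ab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    s_aa = sum((x - mean_a) ** 2 for x in a)
    s_bb = sum((y - mean_b) ** 2 for y in b)
    return s_ab / math.sqrt(s_aa * s_bb)

def vif_two_predictors(x1, x2):
    """VIF shared by both predictors when the model has exactly two of them."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)

# Hypothetical toy data, used only to exercise the functions.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
print(pearson_r(x1, x2), vif_two_predictors(x1, x2))
```

With more than two predictors, each VIF comes from regressing one predictor on all the others, which this sketch does not cover. Large values are usually read as a sign of collinearity; the threshold to use is a judgment call.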

Output Error Dynamic Models Identification and Transfer Function - A comparative study –

Sarah Muwaffaq Abdel Qader; Heyam Abdel Majeed Hayawi

IRAQI JOURNAL OF STATISTICAL SCIENCES, 2021, Volume 18, Issue 1, Pages 14-20
DOI: 10.33899/iqjoss.2021.168372

In this research, transfer function models were identified from seasonal data, with solar brightness as the input and temperature as the output. A double seasonal model was obtained and used in identifying the transfer function models, and output error dynamic models, represented by the OE and BJ models, were also identified. The results were compared, and, judged by some statistical criteria, the dynamic systems gave a better identification than the transfer function models.

Identification State Space models and some Time Series models

Zeina Assem; Heyam Abdel Majeed Hayawi

IRAQI JOURNAL OF STATISTICAL SCIENCES, 2021, Volume 18, Issue 1, Pages 30-37
DOI: 10.33899/iqjoss.2021.168374

This research compares the identification process for time series models, represented by ARIMA models, with that for dynamic models, represented by state space models. For the ARIMA models, several models were identified and the best was chosen based on some statistical criteria; for the state space models, several models of different orders were identified and the best was likewise chosen based on statistical criteria. The comparison uses data studied by Box & Jenkins, consisting of 150 input-output pairs in which the input variable is the leading indicator and the output variable is sales.

On Pareto Set for a Bi-criterion Scheduling Problem Under Fuzziness

Ayad Ramadan

IRAQI JOURNAL OF STATISTICAL SCIENCES, 2021, Volume 18, Issue 1, Pages 38-44
DOI: 10.33899/iqjoss.2021.168375

This paper presents a bi-criterion fuzzy scheduling problem whose objectives are the total fuzzy completion time and the maximum earliness, where the processing times and the due dates are triangular fuzzy numbers. Each job is to be processed without interruption on a single machine and becomes available for processing at time zero. A new class of fuzzy numbers, called m-strongly positive fuzzy numbers, is defined. Using this definition, we found an interval that restricts the range of the fuzzy lower bound and presented a relation between the fuzzy lower bound and the fuzzy optimal solution with the number of efficient solutions. We also found the exact solution of the problem by finding the Pareto set.
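To make the notion of a Pareto set concrete, here is a minimal sketch for the crisp (non-fuzzy) version of the problem: it enumerates every job order on a single machine, computes total completion time and maximum earliness, and keeps the non-dominated pairs. It assumes integer processing times and due dates, no machine idle time, and earliness defined as max(0, due date - completion time); the function names and the three-job instance are hypothetical, and the paper's fuzzy method is not reproduced here.

```python
from itertools import permutations

def objectives(order, jobs):
    """Return (total completion time, maximum earliness) for one job order.

    jobs[j] is a (processing_time, due_date) pair; jobs run back to back from time 0.
    """
    t = 0
    total_completion = 0
    max_earliness = 0
    for j in order:
        p, d = jobs[j]
        t += p                      # completion time of job j
        total_completion += t
        max_earliness = max(max_earliness, max(0, d - t))
    return total_completion, max_earliness

def pareto_set(jobs):
    """Map each non-dominated objective pair to one job order that attains it."""
    points = {}
    for order in permutations(range(len(jobs))):
        points.setdefault(objectives(order, jobs), order)

    def dominated(a):
        # a is dominated if another point is no worse in both objectives
        return any(b != a and b[0] <= a[0] and b[1] <= a[1] for b in points)

    return {pt: order for pt, order in points.items() if not dominated(pt)}

# Hypothetical instance: (processing time, due date) for three jobs.
jobs = [(2, 3), (1, 6), (3, 4)]
for point, order in sorted(pareto_set(jobs).items()):
    print(point, order)
```

Full enumeration grows factorially with the number of jobs, so this is only practical for very small instances.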

ODED: Outlier Detection in Educational Data

Ammar Thaher Yaseen Al Abd Alazeez

IRAQI JOURNAL OF STATISTICAL SCIENCES, 2021, Volume 18, Issue 1, Pages 45-54
DOI: 10.33899/iqjoss.2021.168376

Clustering data streams is one of the prominent tasks of discovering hidden patterns in data streams. It refers to the process of clustering newly arrived data into continuously and dynamically changing segmentation patterns. Current data stream clustering algorithms lack clear general steps for analysing new incoming data chunks, and the majority of existing data stream solutions adapt clustering methods for static data to the data stream setting. The main concern is to propose a solution that can improve the performance of existing approaches and present correct clusters and outliers. Data arriving in streams often contain outliers, which may be as important as the clusters. Thus, it is desirable for data stream clustering algorithms to detect the outliers as well as the clusters, and to minimise the effects of noise and outliers in a given dataset. This article presents a stream mining algorithm to cluster the data stream and monitor its evolution. Even though outliers are expected to be present in data streams, explicit outlier detection is rarely done in stream clustering algorithms. The proposed method is capable of explicit outlier detection and cluster evolution analysis. The relationship between outlier detection and the occurrence of physical events has been studied by applying the algorithm to an education data stream. Experiments led to the conclusion that outlier detection accompanied by a change in the number of clusters indicates a significant education event. This kind of online monitoring and its results can be utilized in education systems in various ways. Education data streams produced by Viber groups are used to conduct this study.

Using the linear and non-linear discriminant function with cluster analysis to study the level of education for the completed stages (governmental – private) In Nineveh Governorate

Zainab Adel; Safwan Nathem Rashed

IRAQI JOURNAL OF STATISTICAL SCIENCES, 2021, Volume 18, Issue 1, Pages 88-98
DOI: 10.33899/iqjoss.2021.168377

The basic idea of this research is to study the influential variables behind the differences in performance between public and private schools. Linear statistical methods, represented by the discriminant analysis function, are used together with nonlinear methods, represented by the logistic regression function, and cluster analysis. The two groups of public and private schools are compared, and the suitability of each method is assessed on the study data, which cover the completed stages for the school year 2018-2019 in the primary, middle, and preparatory (Bio) schools of the Nineveh Governorate Education Directorate. The aim is to determine the best group and stage and to identify the variables that affect performance, in order to improve and standardize the educational path.

Beta Control Chart for Monitoring Proportions With Application

Aseel Abed Al-Kareem; Khalida Mohammed

IRAQI JOURNAL OF STATISTICAL SCIENCES, 2021, Volume 18, Issue 1, Pages 56-64
DOI: 10.33899/iqjoss.2021.168379

In this paper we use a beta chart to monitor fraction (proportion) data. The beta chart sets its control limits based on the beta probability distribution. The chart was applied to a data set of the proportions of peanuts contaminated with toxic substances for 34 batches weighing 120 pounds, and it was compared with the traditional Shewhart chart (p-chart). A sensitivity study was then performed to compare the two charts in two cases, in control and out of control, using several values of the average proportion and different sample sizes. The evaluation is based on a criterion that measures the efficiency of a chart, the Average Run Length (ARL). In the in-control case the ARL is a function of the type I error, and in the out-of-control case, where the aim is to detect a shift, it is a function of the type II error. The sensitivity analysis confirmed the superior performance of the beta chart compared with the p-chart: with the proposed approximation, the in-control average run length ARL0 is slightly greater, and the out-of-control average run length ARL1 is much smaller.
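For readers unfamiliar with the comparison baseline, the following is a minimal sketch of the conventional p-chart limits and of the usual ARL relations for a chart with independent samples: ARL0 = 1 / alpha when in control and ARL1 = 1 / (1 - beta) when out of control, where alpha and beta are the type I and type II error probabilities. The 3-sigma multiplier, the function names, and the numbers are illustrative assumptions; the beta chart itself, which would take its limits from beta distribution quantiles, is not implemented here.

```python
import math

def p_chart_limits(p_bar, n, k=3.0):
    """Lower and upper k-sigma control limits of a Shewhart p-chart.

    p_bar is the average proportion and n the sample size; limits are
    clipped to the [0, 1] range of a proportion.
    """
    half_width = k * math.sqrt(p_bar * (1.0 - p_bar) / n)
    return max(0.0, p_bar - half_width), min(1.0, p_bar + half_width)

def average_run_length(signal_prob):
    """Expected number of samples until a signal, given the per-sample signal probability."""
    return 1.0 / signal_prob

# Hypothetical values, chosen only to illustrate the calls.
lcl, ucl = p_chart_limits(p_bar=0.10, n=100)
arl0 = average_run_length(0.0027)       # in control: signal probability = alpha
arl1 = average_run_length(1.0 - 0.60)   # out of control: signal probability = 1 - beta
print(lcl, ucl, arl0, arl1)
```

A good chart has a large ARL0 (few false alarms) and a small ARL1 (fast detection), which is the sense in which the abstract compares the two charts.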

Using fuzzy dynamic programming in finding the best solution for sales for Badoush cement factory stores.

Ahmed Jalal; Zena Yahya

IRAQI JOURNAL OF STATISTICAL SCIENCES, 2021, Volume 18, Issue 1, Pages 66-73
DOI: 10.33899/iqjoss.2021.168380

Dynamic programming is considered one of the most appropriate ways to manage storage systems because of the random and linear nature of these systems, which makes it difficult to reach the optimal solution with other optimization techniques. Moran was the first to clarify the problem of reservoir management, in 1945, and Little formulated the optimal operation of reservoirs as a dynamic programming problem in 1955. In this research, the data available at the Badoush Cement Factory were used to construct a dynamic programming model to find the optimal operation of the plant after treating the fuzziness surrounding demand.

Employing Robust MM-estimators in Estimating Principal Component Regression Model - A Comparative Study

Esraa Alsaraf; Bashar A. AL-TALIB

IRAQI JOURNAL OF STATISTICAL SCIENCES, 2021, Volume 18, Issue 1, Pages 74-87
DOI: 10.33899/iqjoss.1970.170002

This paper proposes the use of robust MM estimators in estimating the parameters of the principal component regression model, which is usually used to estimate the regression model when the explanatory variables are not independent. The approach remains applicable even in the presence of leverage points in the data and gives estimators with good efficiency. The estimator is called the MM estimator because more than one M estimator is used to obtain the final estimator, with the estimation carried out by the Iteratively Re-weighted Least Squares (IRLS) method.
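The IRLS idea mentioned in the abstract can be sketched for a simple straight-line fit. This is not a full MM estimator: it starts from ordinary least squares rather than from a high-breakdown initial estimate, and it works on a single predictor rather than on principal components. The Tukey bisquare weight with tuning constant 4.685, the scale estimate median(|residual|) / 0.6745, the function names, and the data are all illustrative assumptions.

```python
import statistics

def bisquare_weight(u, c=4.685):
    """Tukey bisquare weight for a standardized residual u."""
    return (1.0 - (u / c) ** 2) ** 2 if abs(u) < c else 0.0

def weighted_line_fit(x, y, w):
    """Weighted least squares fit of y = intercept + slope * x."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

def irls_line(x, y, max_iter=50, tol=1e-10):
    """Iteratively re-weighted least squares, started from the unweighted fit."""
    intercept, slope = weighted_line_fit(x, y, [1.0] * len(x))
    for _ in range(max_iter):
        resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
        scale = statistics.median([abs(r) for r in resid]) / 0.6745
        if scale < 1e-12:           # residuals are essentially zero: nothing left to reweight
            break
        w = [bisquare_weight(r / scale) for r in resid]
        new_intercept, new_slope = weighted_line_fit(x, y, w)
        change = abs(new_intercept - intercept) + abs(new_slope - slope)
        intercept, slope = new_intercept, new_slope
        if change < tol:
            break
    return intercept, slope

# Hypothetical data: points on y = 1 + 2x with one gross outlier at x = 4.
x = [float(i) for i in range(10)]
y = [1.0 + 2.0 * xi for xi in x]
y[4] = 50.0
print(weighted_line_fit(x, y, [1.0] * len(x)))  # ordinary least squares, pulled by the outlier
print(irls_line(x, y))                          # reweighted fit
```

Because the outlier receives zero weight after the first reweighting, the fit is no longer driven by it. An outlier at a high-leverage point can defeat an ordinary least squares start, which is why a robust initial estimate matters in practice.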