Factor Analysis (like the similar technique of Principal Components Analysis) is a multivariate statistical method used primarily for data reduction. It is usually employed to reduce the columns of our data matrix, that is, the variables measured for each respondent. The reduction is done by grouping together those variables which are intercorrelated, as measured by the coefficient of correlation.
In our data matrix we have one variable of particular interest and we wish to see how it is affected by movements in a single explanatory variable. For example, the variable we might be interested in is Sales in £mn and we may wish to know how it is affected by advertising expenditure in £000's. The rows of our data matrix may consist of the last seven years of data for these two variables.
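As a minimal sketch of the idea, assuming an ordinary least squares straight line is the model of interest, the relationship could be estimated as follows. The function name `fit_line` and the seven yearly figures are hypothetical and purely illustrative; they are not data from the paper.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares about the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: seven years of advertising (£000s) and sales (£mn)
advertising = [10, 12, 15, 17, 20, 22, 25]
sales = [2.1, 2.4, 2.9, 3.2, 3.8, 4.0, 4.6]

intercept, slope = fit_line(advertising, sales)
print(f"Sales = {intercept:.3f} + {slope:.3f} x Advertising")
```

The slope estimates the change in sales (in £mn) associated with each additional £000 of advertising, under the assumption that the relationship is roughly linear over the years observed.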
This paper looks at the various sources of bias and discusses how to reduce them or, at least, how to measure them.
In this and the following paper, the nature of "survey error" and the phenomena associated with it are explored.
In survey research it is very rare for all respondents in a given population to be interviewed. We usually take a sample of that population. We can do this because a sample can give us, not necessarily the accuracy of a census (or full count), but sufficient accuracy for prediction purposes. This is true if the sample is representative of the population from which it is drawn. There are various sampling methods that can be used to obtain a representative sample. Such samples can give results to given levels of precision, depending mainly on the size of the sample.
One of the most important uses of statistical analysis is to investigate the associations or relationships between different variables. Understanding these relationships is important to an investigator for several reasons: it helps in understanding the phenomenon or phenomena under investigation; it gives insights into possible causal mechanisms between the variables; it is an important step in the construction of statistical models which relate the variables to each other; and these models may be used to improve the quality of predictions. The study of association in Statistics falls into two broad areas: 1. Correlation between two or more variables, and 2. Association between two or more categories in a frequency table.
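For the first of these areas, a minimal sketch of the usual product-moment (Pearson) correlation coefficient between two variables is given here. The function name `correlation` and the example values are hypothetical, chosen only to illustrate the calculation.

```python
from math import sqrt


def correlation(x, y):
    """Pearson product-moment correlation coefficient between x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical ratings of two attributes by six respondents
quality = [3, 4, 5, 2, 4, 5]
value = [2, 4, 4, 1, 3, 5]
print(round(correlation(quality, value), 3))
```

The coefficient lies between -1 and +1; values near either extreme indicate a strong linear association, which is not in itself evidence of a causal mechanism.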
Questions may be presented as open-ended, dichotomous or multiple choice. Which is the best format? This depends very much on the aim of the particular question, for instance whether you are looking for as wide a range of responses as possible, or sensitivity in highlighting different answers, or just a filter for questions which will follow. One factor is certain: in survey research there is a strong practical case for closed or multi-choice questions because they allow pre-coding. This means faster production and lower costs. Pre-coding also means more likelihood of valid data: there are fewer opportunities for lapses of memory on the part of the respondent and for incorrect recording by the interviewer. Having said this, it is equally true to say that too many questions in one format can lead to a boring questionnaire, so try to be like a good cook and provide some variety!
How should I decide on the sample size for a survey? That is a question often posed by survey researchers to statisticians. It is difficult to answer simply, because in market research we carry out surveys which more often than not carry a large number of different questions. Some questions may be more important than others and hence need to be answered with a higher level of precision. A good starting point therefore is to consider the most important item to be measured by a proposed survey. For the moment we will assume that the survey is to be carried out using a Simple Random Sample and that the survey result is a percentage.
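Under those two assumptions, a rough sketch of the calculation uses the standard formula n = z² × p × (100 − p) / e², where p is the anticipated percentage and e is the required margin of error in percentage points. The function name `sample_size`, the default 95% confidence level (z = 1.96) and the omission of any finite population correction are assumptions made here for illustration.

```python
from math import ceil


def sample_size(p, margin, z=1.96):
    """Sample size for estimating a percentage p (0-100) to within
    +/- margin percentage points under simple random sampling.

    z = 1.96 corresponds to an assumed 95% confidence level.
    No finite population correction is applied.
    """
    return ceil(z ** 2 * p * (100 - p) / margin ** 2)


# The most cautious case is p = 50, where p * (100 - p) is largest
print(sample_size(50, 5))
print(sample_size(50, 2))
```

With p = 50 and a margin of 5 points this gives 385 respondents. Halving the margin of error roughly quadruples the sample size needed, which is why the precision required for the most important item tends to drive the overall design.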
Questionnaire design must be regarded as one of the most critical phases of a market research survey because, if the required information is not covered adequately or if the questions are not asked properly, then no amount of clever interviewing or ingenious post-analysis, either by man or by computer, can put things right. In this respect it allows even less flexibility than sampling. To construct a good questionnaire is a task which is both difficult and interesting. There is a veritable minefield of obstacles to be overcome, but at the same time there exists a series of quite simple rules to guide us through these successfully. Therein lies part of the interest. The other interest lies in the fact that each new survey brings with it its own particular set of problems. The challenge is a real one each time.
The simplest form of analysing data is to form survey tabulations. This is done by counting the number (and percentage) of people that fall into the predefined categories of our questionnaire. The basic tool for the survey analyst is the cross tabulation, in which one or more questions on the questionnaire form the rows and one or more different items form the columns. The simplest form would be where one question forms the rows and one demographic forms the columns.
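The simplest case described above, one question by one demographic, can be sketched as follows. The function names `cross_tab` and `column_percentages`, and the six respondents, are hypothetical and only illustrate the counting involved.

```python
from collections import Counter


def cross_tab(row_answers, col_answers):
    """Count respondents in each (row category, column category) cell."""
    counts = Counter(zip(row_answers, col_answers))
    row_labels = sorted(set(row_answers))
    col_labels = sorted(set(col_answers))
    return {r: {c: counts[(r, c)] for c in col_labels} for r in row_labels}


def column_percentages(table):
    """Express each cell as a percentage of its column total."""
    totals = Counter()
    for cells in table.values():
        for col, n in cells.items():
            totals[col] += n
    return {
        r: {c: 100 * n / totals[c] for c, n in cells.items()}
        for r, cells in table.items()
    }


# Hypothetical respondents: one question (rows) by one demographic (columns)
bought = ["Yes", "No", "Yes", "Yes", "No", "Yes"]
gender = ["M", "M", "F", "F", "F", "M"]

table = cross_tab(bought, gender)
percentages = column_percentages(table)
for answer in table:
    print(answer, table[answer], percentages[answer])
```

Column percentages are used here because the demographic forms the columns, so each column sums to 100% and the groups can be compared directly.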
Discriminant Analysis models are very similar to regression models, but differ in one important respect. In Discriminant Analysis the key variable of interest, Y, is categorical (e.g. Buyer = 1, Non-buyer = 0) rather than continuous, like sales.
Clustering normally operates on the rows of the data matrix. It seeks to find groupings or clusters of respondents who exhibit similar patterns in terms of the variables measured.
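One common way of forming such groupings is k-means, sketched here purely as an illustration; the text above does not say which clustering method is intended. The function name `k_means`, the choice of the first k respondents as starting centroids, and the six respondents' scores are all assumptions made for the example.

```python
def k_means(points, k, iterations=20):
    """Very small k-means sketch: returns (assignment, centroids).

    Starting centroids are simply the first k points (an assumption
    made to keep the example deterministic).
    """
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Step 1: assign each respondent to the nearest centroid
        for i, p in enumerate(points):
            distances = [
                sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids
            ]
            assignment[i] = distances.index(min(distances))
        # Step 2: move each centroid to the mean of its members
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centroids[j] = [sum(v) / len(members) for v in zip(*members)]
    return assignment, centroids


# Hypothetical respondents (rows) scored on two variables (columns)
respondents = [(1, 1), (1, 2), (8, 8), (9, 8), (2, 1), (8, 9)]
assignment, centroids = k_means(respondents, k=2)
print(assignment)
```

Each respondent ends up with a cluster label, so the rows of the data matrix are grouped while the columns (the variables) are left as they are.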