Page length requirements: 2–3 pages
This exercise is a continuation of the data mining project introduced in the Module Two Exercise.
Your Assignment
Open the Bubba Gump survey data in JMP. Examine the data set and prepare an analytics project plan that describes the survey data set and how it will be used to address the stated business problem.
Specifically, the summary should:
* Include a description of the population from which the sample was drawn, the sources of data that were combined to construct the sample, the number of customers in the sample, and descriptions of the variables that exist in the data set.
* Use JMP to plot the distribution of values for each variable in the Bubba Gump sample, binning continuous variables appropriately. From these plots, describe where data may be missing or defective and where variables contain extreme outliers that reduce the usefulness of the survey in a data mining exercise.
* Identify correlations and associations, using pairwise correlations and principal components analysis, that would be useful to measure as part of the pre-analytics process, including descriptions of the benefits of each.
* Describe how the data set supports analyses that address the stated business problem, and also describe any shortcomings in the data set that might limit its usefulness in a data mining exercise.
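
The assignment calls for JMP, but the checks named in the bullets above (missing values, extreme outliers, pairwise correlations, principal components) can be illustrated outside it. The following is a minimal Python sketch under stated assumptions: the rows, the column names (`age`, `visits`, `spend`), and the 1.5 × IQR outlier rule are all hypothetical choices made for illustration and are not taken from the Bubba Gump data set or from JMP's defaults.

```python
import math
import statistics

# Hypothetical toy rows standing in for survey responses; None marks a missing value.
# Column names and values are illustrative only, not from the Bubba Gump data set.
rows = [
    {"age": 34, "visits": 2, "spend": 41.0},
    {"age": 51, "visits": 5, "spend": 98.5},
    {"age": None, "visits": 3, "spend": 60.0},
    {"age": 27, "visits": 1, "spend": 22.0},
    {"age": 45, "visits": 4, "spend": 79.0},
    {"age": 38, "visits": 3, "spend": 950.0},  # deliberately extreme value
    {"age": 62, "visits": 6, "spend": 120.0},
    {"age": 30, "visits": 2, "spend": 39.5},
]
NAMES = ["age", "visits", "spend"]


def missing_count(name):
    """Number of rows where the variable is missing."""
    return sum(r[name] is None for r in rows)


def iqr_outliers(name, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed rule of thumb)."""
    present = sorted(r[name] for r in rows if r[name] is not None)
    q1, _, q3 = statistics.quantiles(present, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [v for v in present if v < low or v > high]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def corr_matrix(data):
    """Pairwise correlation matrix over NAMES for rows with no missing values."""
    cols = [[r[n] for r in data] for n in NAMES]
    return [[pearson(a, b) for b in cols] for a in cols]


def first_component_share(corr, steps=200):
    """Share of total variance on the first principal component of a
    correlation matrix, found by power iteration for the leading eigenvalue."""
    v = [1.0] * len(corr)
    for _ in range(steps):
        w = [sum(c * x for c, x in zip(row, v)) for row in corr]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    cv = [sum(c * x for c, x in zip(row, v)) for row in corr]
    eigenvalue = sum(a * b for a, b in zip(v, cv))
    return eigenvalue / len(corr)


missing = {n: missing_count(n) for n in NAMES}
outliers = {n: iqr_outliers(n) for n in NAMES}

# Complete cases, then the same rows with flagged outliers dropped.
complete = [r for r in rows if all(r[n] is not None for n in NAMES)]
clean = [r for r in complete if all(r[n] not in outliers[n] for n in NAMES)]

r_with = corr_matrix(complete)[1][2]    # visits vs spend, outlier included
r_without = corr_matrix(clean)[1][2]    # visits vs spend, outlier removed
share = first_component_share(corr_matrix(clean))

print("missing per variable:", missing)
print("flagged outliers:", outliers)
print("visits-spend correlation with/without outlier:", r_with, r_without)
print("variance share of first principal component:", share)
```

In this toy data a single extreme `spend` value is enough to hide an otherwise strong relationship between `visits` and `spend`, which is why the distribution review belongs before the correlation and principal components steps. A high variance share on the first component suggests the variables carry overlapping information, something worth noting when describing the data set's usefulness.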