Importing, Sampling

Importing data, starting Enterprise Miner, and drawing samples

SAS describes the data mining process with the acronym SEMMA, which stands for five actions: Sample, Explore, Modify, Model, and Assess. These correspond roughly to the middle steps of the data mining process outlined in the first lecture.

Before we can begin these steps, we need to make the data available to SAS. We do this by creating SAS data libraries and storing our data in them as SAS data sets. A SAS library is a location (such as a directory on your hard drive or zip drive) where files are stored in SAS data format. If you have data from another application, such as a database file or a spreadsheet file, follow these two steps:

1. Create a SAS data library.
2. Import your file into the library as a SAS data set.

The computer material for this lecture shows how to carry out these two steps. Once the data is stored in a SAS data library, we can access it from Enterprise Miner.
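To make the idea of a library concrete outside SAS, here is a rough Python analogue of the two steps. It is only an illustration, not how SAS stores data: a plain directory stands in for the library, a CSV export from a spreadsheet stands in for the foreign file, and a pickled list of rows stands in for the native data set format. The function names (`create_library`, `import_csv`, `load_data_set`) and the file names in the demo are hypothetical.

```python
import csv
import pickle
import tempfile
from pathlib import Path


def create_library(path):
    """Step 1 analogue (hypothetical): a 'library' is just a directory
    in which native data files are kept."""
    lib = Path(path)
    lib.mkdir(parents=True, exist_ok=True)
    return lib


def import_csv(lib, csv_path, name):
    """Step 2 analogue (hypothetical): read a CSV file exported from
    another application and store its rows in the library as a native
    (pickled) file named <name>.pkl."""
    with open(csv_path, newline="") as f:
        rows = list(csv.DictReader(f))  # one dict per record, keyed by column name
    out = lib / f"{name}.pkl"
    with open(out, "wb") as f:
        pickle.dump(rows, f)
    return out


def load_data_set(lib, name):
    """Once stored, a data set is read back by name from its library."""
    with open(lib / f"{name}.pkl", "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "customers.csv"  # hypothetical spreadsheet export
        src.write_text("id,age\n1,34\n2,51\n")
        lib = create_library(Path(tmp) / "mylib")
        import_csv(lib, src, "customers")
        print(load_data_set(lib, "customers"))
```

The point of the split is the same as in SAS: the conversion happens once, and afterwards every tool reads the data by library and data set name rather than re-parsing the original file.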

When we start Enterprise Miner, we see two windows: the Project Navigator and the Diagram Workspace. Most of the data mining process consists of building a diagram in the Diagram Workspace whose nodes correspond to the data mining actions. The first action is to drag an Input Data Source node into the Diagram Workspace, then right-click it, choose Open, and select a SAS data set.

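The next node we add to the diagram is the Data Partition node, which we connect to the Input Data Source node. The Data Partition node splits the data into separate subsets (training, validation, and test) so that models are built on one part of the data and evaluated on records they have not seen. The Two Crows booklet (page 28) gives a good discussion of why partitioning is needed.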
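What a random partition does can be sketched in a few lines of Python. This is a minimal sketch, not Enterprise Miner's implementation: it assumes simple random assignment, illustrative proportions of 40% training, 30% validation, and 30% test, and an arbitrary fixed seed. The function name `partition` and its defaults are hypothetical. The training set is used to fit models, the validation set to compare and tune them, and the test set to give a final estimate of performance on unseen data.

```python
import random


def partition(records, train=0.4, valid=0.3, test=0.3, seed=12345):
    """Randomly split records into training, validation, and test lists.

    The proportions and the seed are illustrative assumptions. A copy of
    the records is shuffled with a fixed seed, so the same input always
    gives the same split, and then cut at the requested proportions.
    """
    if abs(train + valid + test - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    shuffled = list(records)              # leave the caller's data untouched
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = round(n * train)
    n_valid = round(n * valid)
    return (
        shuffled[:n_train],                       # training set
        shuffled[n_train:n_train + n_valid],      # validation set
        shuffled[n_train + n_valid:],             # test set (the remainder)
    )


if __name__ == "__main__":
    data = list(range(1000))
    tr, va, te = partition(data)
    print(len(tr), len(va), len(te))
```

Fixing the seed is a design choice: it makes the split reproducible, so models fitted in later sessions are compared on exactly the same validation and test records.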