Introduction to Data Mining

Data Mining is the process of extracting meaningful patterns from large databases.

An overview of the Data Mining Process:

1. Define the problem

2. Construct a database to address the problem

3. Explore, clean, and prepare the data

4. Build models and use other analytical methods to address the problem

5. Assess the model

6. Implement the model

This list is adapted from the excellent Introduction to Data Mining and Knowledge Discovery tutorial booklet, which can be downloaded for free from the Two Crows website. The list also closely resembles the steps involved in developing scientific experiments shown at the top of this Stat 401 webpage. Both processes are iterative: discoveries made at later stages can lead back to earlier stages in the process.
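The six steps above can be sketched in a few lines of code. The example below is only a minimal illustration under stated assumptions: the toy "database" of customer spend and responses, the threshold model, and every function name (make_data, clean, fit_stump, accuracy, predict) are hypothetical and do not come from the Two Crows booklet.

```python
import random

# Step 1 (define the problem): predict whether a customer responds to an offer.

def make_data(n=200, seed=0):
    """Step 2: construct a toy 'database' of (spend, responded) records."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        spend = rng.uniform(0, 100)
        # Assumed rule for the toy data: higher spenders respond more often.
        responded = 1 if rng.random() < spend / 100 else 0
        rows.append({"spend": spend, "responded": responded})
    # Add a few incomplete records so there is something to clean.
    rows.extend({"spend": None, "responded": 0} for _ in range(5))
    return rows

def clean(rows):
    """Step 3: drop records with a missing spend value."""
    return [r for r in rows if r["spend"] is not None]

def accuracy(rows, threshold):
    """Step 5: fraction of records where (spend > threshold) matches the outcome."""
    hits = sum((r["spend"] > threshold) == bool(r["responded"]) for r in rows)
    return hits / len(rows)

def fit_stump(train):
    """Step 4: choose the spend threshold with the best training accuracy."""
    best_t, best_acc = 0.0, -1.0
    for t in range(0, 101, 5):
        acc = accuracy(train, t)
        if acc > best_acc:
            best_t, best_acc = float(t), acc
    return best_t

def predict(spend, threshold):
    """Step 6: the 'implemented' model, applied to a new customer."""
    return spend > threshold

rows = clean(make_data())
split = int(0.7 * len(rows))
train, test = rows[:split], rows[split:]  # hold out data for assessment
threshold = fit_stump(train)
print(f"threshold={threshold}, holdout accuracy={accuracy(test, threshold):.2f}")
```

If the holdout accuracy is poor, the natural next move is to go back to an earlier step, for example to collect more fields in step 2 or to try a different model in step 4, which is the iteration described above.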

Data Mining methods are used in many applications, from Marketing and Customer Relations to developing new drugs in the Pharmaceutical Industry.

Perhaps more so than in other applications of fitting models to data, Data Mining methods yield predicted associations that carry no implied claim of causality. The goal is typically to make more efficient use of resources by focusing effort on the subsets of the data that have a higher chance of a successful outcome.
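The idea of focusing effort on a promising subset can be illustrated with a small simulation. This is a sketch under assumptions, not a method from the sources cited above: the customer scores and outcomes are randomly generated, the 20% cutoff is arbitrary, and the ratio computed at the end is what is commonly called "lift" (a term not used in the text above).

```python
import random

rng = random.Random(1)

# Hypothetical customers: a model score in [0, 1] and an observed outcome.
# The outcome is only loosely tied to the score (an assumption of this toy data).
customers = []
for _ in range(1000):
    score = rng.random()
    responded = rng.random() < 0.3 * score
    customers.append((score, responded))

def response_rate(group):
    """Share of customers in the group with a successful outcome."""
    return sum(responded for _, responded in group) / len(group)

# Rank by score and keep only the top 20% for follow-up effort.
ranked = sorted(customers, key=lambda c: c[0], reverse=True)
top = ranked[: len(ranked) // 5]

overall = response_rate(customers)
targeted = response_rate(top)
lift = targeted / overall  # how much better the targeted subset does

print(f"overall rate:  {overall:.3f}")
print(f"targeted rate: {targeted:.3f}")
print(f"lift:          {lift:.2f}")
```

Note that the ranking uses only an association between score and outcome. Nothing in the calculation claims that the score causes the response, which is the point made in the paragraph above.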