Once we establish our objectives for using analytics, as discussed in First Step in Data Analytics: Defining the Business Objective, the next logical step is exploring the data. During that step, a technique that is often useful, especially when analyzing retail transaction data, is association rule mining (ARM).
Association rules have been widely used in many application domains to find patterns in data. These patterns reveal combinations of events that occur together. Business is one of the most prominent of these domains: discovering patterns or associations supports effective decision making and marketing.
An association rule X ⇒ Y expresses that in those transactions in the database where X occurs, there is a high probability of Y occurring as well. X and Y are called the antecedent and the consequent of the rule, respectively.
Association rules provide information of this type in the form of “if-then” statements. These rules are computed from the data and, unlike the if-then rules of logic, association rules are probabilistic in nature. In addition to the antecedent (the “if” part) and the consequent (the “then” part), an association rule has two numbers that express the degree of uncertainty about the rule. In association analysis the antecedent and consequent are sets of items (called itemsets) that are disjoint (do not have any items in common).
Support: The support is simply the number of transactions that include all items in the antecedent and consequent parts of the rule. (The support is sometimes expressed as a percentage of the total number of records in the database.)
Confidence: Confidence is the ratio of the number of transactions that include all items in the consequent as well as the antecedent (namely, the support) to the number of transactions that include all items in the antecedent.
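Lift: Lift is the ratio of confidence to expected confidence, where expected confidence is the fraction of all transactions that include the consequent. Lift tells us how much the probability of the “then” part (consequent) increases given the “if” part (antecedent). A lift above 1 means the consequent appears more often alongside the antecedent than it does in the data overall.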
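To make the three measures concrete, here is a minimal Python sketch that computes them for a single rule. The five transactions and the function names (support_count, rule_metrics) are made up for illustration and do not come from any particular library.

```python
# Hypothetical toy data: each transaction is the set of items bought together.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)


def rule_metrics(antecedent, consequent, transactions):
    """Support, confidence and lift for the rule antecedent => consequent."""
    antecedent, consequent = frozenset(antecedent), frozenset(consequent)
    if antecedent & consequent:
        raise ValueError("antecedent and consequent must be disjoint")

    n = len(transactions)
    both = support_count(antecedent | consequent, transactions)
    ante = support_count(antecedent, transactions)
    cons = support_count(consequent, transactions)
    if ante == 0 or cons == 0:
        raise ValueError("antecedent and consequent must each occur at least once")

    confidence = both / ante          # share of antecedent transactions with the consequent
    expected_confidence = cons / n    # share of all transactions with the consequent
    return {
        "support_count": both,        # support as a raw count
        "support": both / n,          # support as a fraction of all transactions
        "confidence": confidence,
        "lift": confidence / expected_confidence,
    }


metrics = rule_metrics({"bread"}, {"butter"}, transactions)
print(metrics)
```

On this toy data, bread and butter appear together in 3 of the 5 transactions, bread appears in 4 and butter in 3, so the rule bread ⇒ butter has a confidence of 3/4 and a lift slightly above 1.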
There are essentially two steps in mining association rules from data:
1. Identification of all item sets having support above a minimum support level
2. Discovery of all derived association rules having confidence above a minimum confidence level
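The two steps can be sketched in a few lines of Python. This is an illustrative, unoptimized sketch that uses a level-wise, Apriori-style search for step 1; the article does not prescribe a particular algorithm, and the toy transactions, thresholds and function names (frequent_itemsets, derive_rules) are assumptions made for the example.

```python
from itertools import combinations

# Hypothetical toy data: each transaction is the set of items bought together.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]


def frequent_itemsets(transactions, min_support):
    """Step 1: map every itemset with support >= min_support to its count."""
    n = len(transactions)
    if n == 0:
        return {}
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([item]) for item in items]
    frequent = {}
    while candidates:
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: cnt for c, cnt in counts.items() if cnt / n >= min_support}
        if not level:
            break
        frequent.update(level)
        k = len(next(iter(level))) + 1
        # Join frequent itemsets of size k-1 into candidates of size k ...
        joined = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # ... and keep only those whose (k-1)-subsets are all frequent.
        candidates = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return frequent


def derive_rules(frequent, min_confidence):
    """Step 2: split each frequent itemset into antecedent => consequent rules."""
    rules = []
    for itemset, count in frequent.items():
        for size in range(1, len(itemset)):
            for combo in combinations(sorted(itemset), size):
                antecedent = frozenset(combo)
                # Every subset of a frequent itemset is itself frequent,
                # so its count was already recorded in step 1.
                confidence = count / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return rules


frequent = frequent_itemsets(transactions, min_support=0.4)
for antecedent, consequent, confidence in derive_rules(frequent, min_confidence=0.7):
    print(sorted(antecedent), "=>", sorted(consequent), round(confidence, 2))
```

Separating the two steps means the expensive counting against the database happens only in step 1; step 2 works entirely from the stored counts.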
For meaningful and accurate prediction, a high volume of preprocessed data, typically in the form of a data warehouse, is a prerequisite. Mining association rules is particularly useful for discovering relationships among items in large databases. Implementing ARM procedures helps in building descriptive data mining models.
Several issues arise in association rule mining:
1. In single-level or multiple-level association rules, the first and most important issue is obtaining an accurate data source in an appropriate data format. A key question is which encoding method to use when converting the transaction tables, because the encoded tables are used to support the concept hierarchy of the multiple levels.
2. Another issue is designing algorithms for multiple-level association rules that reduce the number of iterations and achieve time efficiency. Time efficiency can be achieved by reducing the number of database scans at each level. The redundancy of association rules is also a major issue in association rule discovery.
3. If the interestingness parameters, i.e. the support and confidence thresholds, are small, the number of frequent itemsets increases and the number of rules presented to the user typically increases with it. Many of these rules may be redundant, so selecting appropriate values for the interestingness parameters is an important issue in association rule mining.
4. There are many measures of the interestingness of an association, including support, confidence, gain, Laplace value, conviction, lift, entropy gain, Gini, and chi-squared value. These measures indicate the degree to which the items in an association are related to each other. The challenge for users who concentrate on finding associations is to choose the user-specified constraints.
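The effect of the support threshold described in issue 3 is easy to see on a small example. The sketch below simply enumerates every possible itemset by brute force, which is only feasible for tiny data; the transactions and the function name count_frequent_itemsets are hypothetical.

```python
from itertools import combinations

# Hypothetical toy data: each transaction is the set of items bought together.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]


def count_frequent_itemsets(transactions, min_support):
    """Count itemsets whose support (as a fraction) is at least min_support."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    total = 0
    # Brute force: test every non-empty combination of items.
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            occurrences = sum(1 for t in transactions if itemset <= t)
            if occurrences / n >= min_support:
                total += 1
    return total


# Lowering the threshold can only keep or grow the set of frequent itemsets.
for threshold in (0.6, 0.4, 0.2):
    print(threshold, count_frequent_itemsets(transactions, threshold))
```

Even with four distinct items the count grows quickly as the threshold drops, which is why threshold selection matters on real transaction data with thousands of items.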
Association rule mining has been applied to e-learning systems for traditional association analysis (finding correlations between items in a dataset), including, for example, the following tasks: building recommender agents for on-line learning activities or shortcuts; automatically guiding the learner’s activities and intelligently generating and recommending learning materials; identifying attributes that characterize patterns of performance disparity between various groups of students; discovering interesting relationships in students’ usage information in order to provide feedback to the course author; finding the relationships between patterns of learner behavior; finding student mistakes that often occur together; guiding the search for the best-fitting transfer model of student learning; optimizing the content of an e-learning portal by determining the content of most interest to the user; extracting useful patterns to help educators and web masters evaluate and interpret on-line course activities; and personalizing e-learning based on aggregate usage profiles and a domain ontology. Here is the link to the paper.