Since when mining for association rules we have no idea which attribute we are trying to predict, or which attributes and values it will depend on, bluntly going through all the data looking for every possible combination of attributes and their values is not feasible. So the first step in finding association rules is finding frequent item sets. In other words, we are just looking for items that frequently appear together, without inferring any relationship between them. Taking a grocery store example, we'd be looking for item sets like {milk, bread, peanut butter}.

There are several algorithms that find frequent item sets; the most common is Apriori (Agrawal, Srikant; VLDB 1994). I'm not going to go through the details of the algo, since it is described pretty well in the paper and on Wikipedia, but generally all you need besides your data is a minimum support for the set (i.e. the minimum number of times an item set must appear in the data). The algo starts with one-item item sets, i.e. looking at each individual item and rejecting those that don't meet minimum support. With every iteration it adds one more item to the item sets, again rejecting those that don't meet minimum support, and stops when no larger item sets that meet minimum support can be found.
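To make that concrete, here is a minimal Python sketch of the Apriori idea (the function and variable names are mine, not from the paper, and it trades efficiency for readability): count support by scanning the transactions, keep the single items that meet minimum support, then repeatedly join, prune, and filter larger candidates until nothing survives.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Toy Apriori-style search for item sets that appear in at least
    min_support transactions (names and structure are illustrative)."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    # Level 1: individual items that meet minimum support
    items = {i for t in transactions for i in t}
    current = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]
    frequent = list(current)

    k = 2
    while current:
        # Join step: combine frequent (k-1)-item sets into k-item candidates
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        prev = set(current)
        candidates = [c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))]
        # Keep candidates that meet minimum support, then try to grow again
        current = [c for c in candidates if support(c) >= min_support]
        frequent.extend(current)
        k += 1
    return frequent

baskets = [
    {"milk", "bread", "peanut butter"},
    {"milk", "bread"},
    {"bread", "peanut butter"},
    {"milk", "bread", "peanut butter", "eggs"},
]
for itemset in apriori(baskets, min_support=2):
    print(set(itemset))
```

With min_support=2 this keeps {milk}, {bread}, {peanut butter}, all their pairs, and the full {milk, bread, peanut butter} set, while {eggs} is dropped in the first pass.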
Once the frequent item sets are found, each one is turned into one or more association rules with a specified minimum confidence (accuracy). But I think that in most cases finding the frequent item sets already solves the data analysis problem, and there is no need to turn the sets into rules.
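If you do want the rules, a frequent item set can be split into antecedent → consequent pairs and scored by confidence: the support of the whole set divided by the support of the antecedent. A toy sketch, again with made-up names, reusing the baskets from the example above:

```python
from itertools import combinations

def rules_from_itemset(itemset, transactions, min_confidence):
    """Toy sketch: split one frequent item set into antecedent -> consequent
    rules and keep those whose confidence meets the threshold."""
    transactions = [frozenset(t) for t in transactions]
    itemset = frozenset(itemset)

    def support(s):
        return sum(1 for t in transactions if s <= t)

    whole = support(itemset)
    rules = []
    for r in range(1, len(itemset)):
        for antecedent in map(frozenset, combinations(itemset, r)):
            # Confidence = support of the whole set / support of the antecedent
            confidence = whole / support(antecedent)
            if confidence >= min_confidence:
                rules.append((set(antecedent), set(itemset - antecedent), confidence))
    return rules

baskets = [
    {"milk", "bread", "peanut butter"},
    {"milk", "bread"},
    {"bread", "peanut butter"},
    {"milk", "bread", "peanut butter", "eggs"},
]
for lhs, rhs, conf in rules_from_itemset({"milk", "bread", "peanut butter"}, baskets, 0.6):
    print(lhs, "->", rhs, f"(confidence {conf:.2f})")
```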