Microsoft User Group Винница feedback Association rules: apriori algorithm - Работа с данными, базы данных, Data Mining - Microsoft User Group Винница

 

Association rules: apriori algorithm

Association Rules Overview

From Wikipedia:

In data mining, association rule mining is a popular and well researched method for discovering interesting relations between variables in large databases. Piatetsky-Shapiro describes analyzing and presenting strong rules discovered in databases using different measures of interestingness. Based on the concept of strong rules, Agrawal et al. introduced association rules for discovering regularities between products in large scale transaction data recorded by point-of-sale (POS) systems in supermarkets. For example, the rule {onions, potatoes} => {beef} found in the sales data of a supermarket would indicate that if a customer buys onions and potatoes together, he or she is likely to also buy beef. Such information can be used as the basis for decisions about marketing activities such as, e.g., promotional pricing or product placements. In addition to the above example from market basket analysis association rules are employed today in many application areas including Web usage mining, intrusion detection and bioinformatics. 

Apriori Algorithm Overview

The Apriori algorithm is used to analyze a list of transactions for items that are frequently purchased together. Considering a transaction where the sale of software is increased by the sale of e-books, Support and Confidence are two measures used to describe market based analysis association rules created with an Apriori algorithm. E.g. a Support measure of 1% and a Confidence measure of 50% means that 1% of transactions analyzed contain purchases of e-books and software and 50% of customers who bought an e-book also bought a software.

A set of items is known as an itemset. An itemset which contains k items is known as a k-itemset. E.g. a set of items {Books, CD, DVD, Video} is a 4-itemset.

The number of transactiobns that contain an itemset is known as the Frequency or Support Count of the itemset. If the number of transactions containing an itemset satisfies the minimum support count specified then the itemset is known as a Frequent Itemset. E.g. the 2-itemset {Books, DVD} has a support count of 5 in the database of transactions below.

The database below contains 9 transactions. Find the support count and confidence for the the 2-itemset {Books, DVD}. Using the market based analysis apriori algorithm create an assocation data mining rule between {Books} and {DVD}.

Firstly the number of transactions that contain the 2-itemset {Books, DVD} is 5. The number of transactions containing the itemset {Books} is 6.

Consequently the support for the 2-itemset {Books, DVD} is (5/9) * (100%) = 55.6%

The confidence for the 2-itemset {Books, DVD} is = (Support Count({Books, DVD}) / Support Count({Books}) * (100%) .

Consequently the confidence for the 2-itemset {Books, DVD} is = ((5/6) * 100%) = 83.3%

Transaction 1:  {Books, CD, DVD}
Transaction 2:  {CD, Games}
Transaction 3:  {CD, DVD}
Transaction 4:  {Books, CD, Games}
Transaction 5:  {Books, DVD}
Transaction 6:  {CD, DVD}
Transaction 7:  {Books, DVD}
Transaction 8:  {Books, CD, DVD, Video}
Transaction 9:  {Books, CD, DVD}

The Apriori Algorithm basically finds the support count and confidence of itemsets eliminating those itemsets that do not meet a minimum support count and confidence measure from a final list of rules created. Considering the list of transactions above, the algorithm will perform the following steps for a minimum support count of 3:

  • The APriori algorithm creates a list of unique items in a 1-itemset Candidate Itemset corresponding to {Books, CD, DVD, Games, Video}
  • The support count of each item in the list above is obtained and any item that does not satisfy the minimum support count is eliminated from further analysis creating a 1-itemset Frequent Itemset
  • The 1-itemset frequent itemset is joined with itself to create a 2-itemset candidate itemset.
  • The steps taken for the 1-itemset candidate itemset is repeated for the 2-itemset candidate itemset.
  • The steps above are repeated until a frequent itemset is empty and no new candidate itemsets can be generated.

A confidence measure is created for each rule generated from the frequent itemsets.

1. Apriori Algorithm for Market Basket Analysis C# Implementation

Source: http://www.kdkeys.net/forums/thread/2043.aspx

You can download source code from the codeplex site.

2. Apriori Algorithm with JavaScript

Source: http://allmybrain.com/2007/11/12/implementing-the-apriori-data-mining-algorithm-with-javascript/

You can download source code from the codeplex site.

3. Association Rules Demo 1.0 in Russian

Program for association rules mining in Russian without source code from BaseGroup Labs. Binaries are available on the codeplex site.

References

Downloads

 

P.S. I will update materials and references about the apriori algorithm. You also may write your materials and references in the comments or send to the msugvn[at]gmail.com.


Поділитись


Related Posts with Thumbnails Posted апр 10 2009, 05:23 by Краковецкий А. View 1 838 4

Comments

ram on 06-30-2009 13:00

some better

Pham Son on 11-18-2009 10:49

Thanks this post

EZHAVIN on 12-04-2009 23:31

CAN U SUGGEST ABOUT ASSOCIATION RULES MINING USING NOVEL GENETIC ALGORITHM ( ARMNGA  (ie) AR+ GA ) ..

supot on 01-09-2010 1:56

thanks

Add a Comment

(required)  
(optional)
(required)  
Remember Me?
Please add 1 and 1 and type the answer here:

Enter captcha:

Информация

О нас
Timeline
Спонсоры
Поддержать

Разделы

Блоги
Медиа
Форумы
Вики
Презентации

Работа

Вакансии
Компании

Проекты

TechPosters
Data Mining SDK
Численные методы на C#
iPhoner
Data Extracting SDK

Контакты

msugvn@gmail.com
krakovetsky.alex
@msugvnua
ВКонтакте
LinkedIn
Facebook
INETA

Разработка логотипа: Helen

Статистика

Powered by Community Server (Non-Commercial Edition), by Telligent Systems