Question 1: Business Applications of Data Mining
(Web/Library research) List some business applications of data mining techniques. Case studies
and success stories will be helpful to you.
• See for example: http://www-01.ibm.com/software/success/cssdb.nsf/CS/STRD-8QSKHK?OpenDocument&Site=spss&cty=en_us
  o This is just one example of a place to find case studies and success stories. You should find
    at least one other reference to use for this assignment.
• Create a table that lists:
  • Each business application (including a brief description of the business objective, and the
    company/companies that have used or could use data mining for these
    applications/business-objectives)
    o Example business application: “Predicting responses to a marketing campaign”
  • The data mining techniques/algorithm(s) that are/were helpful in achieving the business
    objective for each business application
    o Your syllabus lists some of the most popular data mining techniques
  • Possible/typical outputs of data mining in that business application area
    o Example outputs: “Married women with one or more children are more likely to
      respond to the campaign”, “People who buy chili are more likely to buy antacids”,
      etc.
  • The web address of the page where you found the information, or the citation for the
    article or book where you found the information. You are required to provide at least 3
    different web addresses/citations.
(300-400 words [approx. 1 page]; 10 points).
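The sample output “People who buy chili are more likely to buy antacids” is an association rule. As a rough illustration only, the Python sketch below shows how such a rule is commonly quantified with support, confidence and lift. The baskets are invented toy data and the function names are hypothetical, not taken from any of the case studies linked above.

```python
from typing import FrozenSet, List

# Hypothetical toy transactions: each basket is the set of items bought together.
BASKETS: List[FrozenSet[str]] = [
    frozenset({"chili", "antacids", "bread"}),
    frozenset({"chili", "antacids"}),
    frozenset({"chili", "milk"}),
    frozenset({"bread", "milk"}),
    frozenset({"antacids", "milk"}),
    frozenset({"bread"}),
]


def support(itemset: FrozenSet[str], baskets: List[FrozenSet[str]]) -> float:
    """Fraction of baskets that contain every item in `itemset`."""
    return sum(1 for b in baskets if itemset <= b) / len(baskets)


def confidence(lhs: FrozenSet[str], rhs: FrozenSet[str],
               baskets: List[FrozenSet[str]]) -> float:
    """Estimated P(rhs | lhs) = support(lhs and rhs together) / support(lhs)."""
    return support(lhs | rhs, baskets) / support(lhs, baskets)


def lift(lhs: FrozenSet[str], rhs: FrozenSet[str],
         baskets: List[FrozenSet[str]]) -> float:
    """Confidence divided by the base rate of rhs; > 1 means "more likely than average"."""
    return confidence(lhs, rhs, baskets) / support(rhs, baskets)


chili, antacids = frozenset({"chili"}), frozenset({"antacids"})
print(f"support={support(chili | antacids, BASKETS):.2f} "
      f"confidence={confidence(chili, antacids, BASKETS):.2f} "
      f"lift={lift(chili, antacids, BASKETS):.2f}")
# prints: support=0.33 confidence=0.67 lift=1.33
```

A lift above 1 is what justifies the phrase “more likely”: in this toy data, chili buyers purchase antacids at 1.33 times the overall rate. An algorithm such as Apriori searches for all rules above chosen support and confidence thresholds instead of checking one rule by hand.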
Question 2: The Data Mining Process
Describe the industry standard CRISP-DM data mining process model
(http://www.dataminingtechniques.net/data-mining-tutorial/data-mining-processes/) and
SAS’s SEMMA model
(http://www.sas.com/offices/europe/uk/technologies/analytics/datamining/miner/semma.html).
• Choose one of your business applications from Question 1 and illustrate the usage of either
  CRISP-DM or SEMMA for that application: e.g. in the “data preparation” phase of CRISP-DM,
  describe the specific data sources you would use for that application, etc.
Further Instructions for Assignment 1
• For question 1, you should format your answer as a table with the headings “business
  objective”, “companies (that do or could pursue this data mining objective)”, “data mining
  algorithm/technique”, “sample output”, and “web address” as requested in the question.
  o Also, you should give specific answers. For example, a business objective of “provide
    payment processing solutions” is not specific enough; rather say “produce a model to
    score transactions and identify the transactions most likely to be fraudulent”.
  o Similarly, for outputs of data mining, “company lowers percentage of fraudulent
    transactions” is fine as a general goal, but give me more details of possible specific
    outputs: e.g. give example rules that could be produced like “large transactions by
    people in Queens who have held accounts for less than 6 weeks are likely to be
    fraudulent”.
• For question 2, on the application of data mining techniques, make sure to describe both the
  CRISP-DM and SEMMA processes. However, you only need to apply one of them
  (preferably CRISP-DM).
  o The goal is to provide good descriptions of how specifically to apply each of the 6
    stages of the process to your particular case.
  o More importantly, you should give specific actions for each phase of the data mining
    process when explaining how the process could be applied to your particular case.
    • For example, “understand data” is not specific enough and would not earn you
      any of the points for the “application” part of the question; instead, under the
      heading “data understanding”, give details like “gather customer data from
      internal database, including customer identifier, age, purchase history, …”.
    • Similarly, “prepare data” is not detailed enough. Instead, under the heading
      “data preparation”, write “bin the customers into 5 equal bins by income
      attribute; compute aggregates for past 3 months, past 6 months, and past 12
      months customer purchases, …”.
    • Under the heading “evaluation”, you might explain that a model that picks
      fraudulent transactions with 98% recall and 80% precision is probably
      sufficient, and that low recall is costly because each fraudulent transaction that
      falls through our checks is expensive, whereas low precision is not too costly,
      as falsely rejected transactions do not lose us much profit.
      • (Model evaluation is dealt with in more detail in lectures after the
        assignment is due; you should have picked up knowledge about
        evaluation criteria from the Two Crows reading.)
  o For the deployment phase (of CRISP-DM), you should explain how the company
    could exploit (profit from) the model produced and what specific actions it did or
    could take: e.g. it could use the model to score prospects and email the high-scoring
    prospects. Obviously the details will depend on the application you chose, but the
    important thing is to be specific and apply the process to your particular application.
• Always cite the source of any comparative performance figures, or seemingly
  unsubstantiated data, which you give.
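The data preparation example in the instructions (income bins plus 3/6/12-month purchase aggregates) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the customer and purchase records are hypothetical, “5 equal bins” is read as equal-frequency (quintile) bins rather than equal-width bins, and “months” are approximated as 90/180/365-day windows.

```python
from collections import defaultdict
from datetime import date

# Hypothetical customer table: customer_id -> annual income.
INCOMES = {"c1": 18_000, "c2": 35_000, "c3": 52_000, "c4": 61_000, "c5": 75_000,
           "c6": 90_000, "c7": 120_000, "c8": 150_000, "c9": 210_000, "c10": 400_000}

# Hypothetical purchase log: (customer_id, purchase date, amount).
PURCHASES = [
    ("c1", date(2024, 5, 20), 40.0),
    ("c1", date(2024, 1, 10), 25.0),
    ("c1", date(2023, 8, 1), 60.0),
    ("c2", date(2023, 11, 15), 100.0),
]

AS_OF = date(2024, 6, 30)
# Assumption: "past 3/6/12 months" approximated as 90/180/365-day windows.
WINDOWS = {"3m": 90, "6m": 180, "12m": 365}


def income_bins(incomes, n_bins=5):
    """Equal-frequency binning: rank customers by income, split into n_bins
    groups of (nearly) equal size. Bin 1 holds the lowest incomes."""
    ranked = sorted(incomes, key=incomes.get)
    n = len(ranked)
    return {cid: rank * n_bins // n + 1 for rank, cid in enumerate(ranked)}


def purchase_aggregates(purchases, as_of, windows):
    """Total spend per customer in each look-back window ending at as_of.
    Customers with no purchases are absent; treat them as zero downstream."""
    totals = defaultdict(lambda: {name: 0.0 for name in windows})
    for cid, when, amount in purchases:
        age_days = (as_of - when).days
        for name, days in windows.items():
            if 0 <= age_days < days:
                totals[cid][name] += amount
    return dict(totals)


bins = income_bins(INCOMES)
aggs = purchase_aggregates(PURCHASES, AS_OF, WINDOWS)
print(bins["c1"], aggs["c1"])
# prints: 1 {'3m': 40.0, '6m': 65.0, '12m': 125.0}
```

Equal-frequency bins keep every bin populated even when income is heavily skewed; equal-width bins would be the other reasonable reading of “equal bins”, and your answer should say which one you mean.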
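To make the evaluation example concrete: at 98% recall, 2 of every 100 fraudulent transactions slip through, and at 80% precision, 1 in every 5 flagged transactions is legitimate. The Python sketch below evaluates a rule like the Queens example from the question 1 instructions on a handful of invented, labeled transactions and adds an asymmetric error cost. The 1,000 threshold for “large”, the transactions and the per-error costs are all hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Transaction:
    amount: float
    borough: str
    account_age_weeks: float
    is_fraud: bool  # ground-truth label, e.g. from past investigations


def queens_rule(t: Transaction) -> bool:
    """Hypothetical rule; "large" is assumed to mean an amount >= 1,000."""
    return t.amount >= 1_000 and t.borough == "Queens" and t.account_age_weeks < 6


def confusion_counts(flag: Callable[[Transaction], bool],
                     data: List[Transaction]) -> Tuple[int, int, int]:
    """Return (true positives, false positives, false negatives)."""
    tp = sum(1 for t in data if flag(t) and t.is_fraud)
    fp = sum(1 for t in data if flag(t) and not t.is_fraud)
    fn = sum(1 for t in data if not flag(t) and t.is_fraud)
    return tp, fp, fn


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of flagged that are fraud
    recall = tp / (tp + fn) if tp + fn else 0.0     # share of all fraud that is flagged
    return precision, recall


def error_cost(fp: int, fn: int, cost_missed_fraud: float = 500.0,
               cost_false_alarm: float = 10.0) -> float:
    """Asymmetric cost of errors; both per-error costs are made-up placeholders."""
    return fn * cost_missed_fraud + fp * cost_false_alarm


TRANSACTIONS = [
    Transaction(2500, "Queens", 2, True),
    Transaction(1800, "Queens", 4, False),
    Transaction(3000, "Queens", 1, True),
    Transaction(1200, "Brooklyn", 3, True),
    Transaction(50, "Queens", 1, False),
    Transaction(5000, "Queens", 52, False),
]

tp, fp, fn = confusion_counts(queens_rule, TRANSACTIONS)
precision, recall = precision_recall(tp, fp, fn)
print(f"precision={precision:.2f} recall={recall:.2f} cost={error_cost(fp, fn)}")
# prints: precision=0.67 recall=0.67 cost=510.0
```

Because a missed fraud is priced here at 50 times a false alarm, the single false negative dominates the total cost, which is the same reasoning the evaluation bullet uses to favour high recall over high precision.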
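For the deployment example (score prospects, then email the high scorers), one simple sketch is to take the scores a trained model has already produced and select the top candidates within a mailing budget. The prospect IDs, scores, budget and minimum score below are hypothetical; a real deployment would obtain the scores from whatever model the earlier phases produced.

```python
import heapq
from typing import Dict, List

# Hypothetical model scores (e.g. estimated probability of responding).
SCORES: Dict[str, float] = {"p1": 0.91, "p2": 0.15, "p3": 0.67, "p4": 0.78, "p5": 0.05}


def select_for_campaign(scores: Dict[str, float], budget: int,
                        min_score: float = 0.0) -> List[str]:
    """Return up to `budget` prospect IDs with the highest scores,
    skipping anyone scored below `min_score`."""
    eligible = [(score, pid) for pid, score in scores.items() if score >= min_score]
    return [pid for _, pid in heapq.nlargest(budget, eligible)]


print(select_for_campaign(SCORES, budget=3, min_score=0.5))
# prints: ['p1', 'p4', 'p3']
```

The budget reflects how many emails the company is willing to send, while the minimum score keeps low-probability prospects off the list even when the budget is not used up.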