The impact of domain knowledge-driven variable derivation on classifier performance for corporate data mining
- Authors: Welcker, Laura Joana Maria
- Date: 2015
- Subjects: Data mining , Business -- Data processing , Database management
- Language: English
- Type: Thesis , Doctoral , DPhil
- Identifier: http://hdl.handle.net/10948/5009 , vital:20778
- Description: Technological progress, in the form of increasing computational power and growing virtual space for collecting data, offers great potential for businesses to benefit from data mining applications. Data mining can create a competitive advantage for corporations by discovering business-relevant information such as patterns, relationships, and rules. The role of the human user within the data mining process is crucial, which is why the research area of domain knowledge is becoming increasingly important. This thesis investigates the impact of domain knowledge-driven variable derivation on classifier performance for corporate data mining. Domain knowledge is defined as methodological, data, and business know-how. The thesis approaches the topic from a new perspective by shifting the focus from a one-sided approach, namely a purely analytic or purely theoretical one, towards a target group-oriented (researcher and practitioner) approach that places the methodological aspect, in the form of a scientific guideline, at the centre of the research. To ensure the feasibility and practical relevance of the guideline, it is adapted and applied to the requirements of a practical business case. The thesis thus examines the topic from both a theoretical and a practical perspective, and thereby overcomes the limitation of a one-sided approach, which mostly lacks practical relevance or generalisability of the results. The primary objective of this thesis is to provide a scientific guideline that enables both practitioners and researchers to advance domain knowledge-driven research on variable derivation in a corporate setting. In the theoretical part, a broad overview is given of the main aspects necessary to undertake the research, such as the concept of domain knowledge, the data mining task of classification, variable derivation as a subtask of data preparation, and evaluation techniques. This part of the thesis refers to the methodological aspect of domain knowledge. In the practical part, a research design is developed for testing six hypotheses related to domain knowledge-driven variable derivation. The major contribution of the empirical study is testing the impact of domain knowledge on a real business data set compared to the impact of a standard and randomly derived data set. The business application of the research is a binary classification problem in the insurance domain, namely the prediction of damages in legal expenses insurance. Domain knowledge is expressed by deriving the corporate variables by means of the business and data-driven constructive induction strategy. Six variable derivation steps are investigated: normalisation, instance relation, discretisation, categorical encoding, ratio, and multivariate mathematical function. The impact of the domain knowledge is examined by pairwise (with and without derived variables) performance comparisons for five classification techniques (decision trees, naive Bayes, logistic regression, artificial neural networks, k-nearest neighbours). The impact is measured by two classifier performance criteria: sensitivity and area under the ROC curve (AUC). The McNemar significance test is used to verify the results. Based on the results, two hypotheses are clearly verified and accepted, three hypotheses are partly verified, and one hypothesis had to be rejected on the basis of the case study results. The thesis reveals a significant positive impact of domain knowledge-driven variable derivation on classifier performance for options of all six tested steps. Furthermore, the findings indicate that the classification technique influences the impact of the variable derivation steps, and that the bundling of steps has a significantly higher performance impact if the variables are derived using domain knowledge (compared to a non-knowledge application). Finally, the research shows that an empirical examination of the domain knowledge impact is very complex due to a high level of interaction between the selected research parameters (variable derivation step, classification technique, and performance criteria).
- Date Issued: 2015
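
The pairwise evaluation described in the abstract above (comparing a classifier with and without derived variables on sensitivity, then checking the difference with the McNemar test) might be sketched as follows. This is only an illustrative sketch, not the thesis's implementation: the abstract does not say which variant of the McNemar test was used, so the exact binomial form is an assumption here, and the function names and the small label vectors are hypothetical, made-up data.

```python
from math import comb


def sensitivity(y_true, y_pred):
    """True positive rate: TP / (TP + FN), with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else float("nan")


def mcnemar_exact(y_true, pred_a, pred_b):
    """Exact (binomial) McNemar test on the discordant pairs.

    Returns (b, c, p_value), where b counts cases only classifier A gets
    right and c counts cases only classifier B gets right.
    """
    b = sum(1 for t, a, m in zip(y_true, pred_a, pred_b) if a == t and m != t)
    c = sum(1 for t, a, m in zip(y_true, pred_a, pred_b) if a != t and m == t)
    n = b + c
    if n == 0:
        # No discordant pairs: the two classifiers are indistinguishable.
        return b, c, 1.0
    # Two-sided p-value under H0: discordant pairs split 50/50.
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Hypothetical labels and predictions (not data from the thesis).
y_true = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
pred_without = [1, 0, 0, 0, 1, 0, 0, 0, 1, 0]  # classifier on original variables
pred_with = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]     # same classifier plus derived variables

print("sensitivity without derived variables:", sensitivity(y_true, pred_without))
print("sensitivity with derived variables:   ", sensitivity(y_true, pred_with))
print("McNemar (b, c, p):", mcnemar_exact(y_true, pred_without, pred_with))
```

The test uses only the cases on which the two classifiers disagree, which is why it suits a paired "with versus without" comparison on the same test instances.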
Adoption of business information systems in an automotive manufacturing environment: a case study
- Authors: Dyer, Shirley
- Date: 2008
- Subjects: Management information systems , Technology -- Information services , Information resources management , Business -- Data processing
- Language: English
- Type: Thesis , Masters , MTech
- Identifier: vital:9772 , http://hdl.handle.net/10948/892
- Description: Dorbyl Automotive Technologies (DAT) is a manufacturing company that supplies parts and components to the local and international motor vehicle market. The automotive components market is very competitive, and customers require more from the industry in order to stay competitive themselves. Customers require full integration throughout the supply chain. DAT and its Information Systems Department have ensured that the necessary business information systems are available to assist the company in staying competitive. One problem, though, is that the users of these systems are not using and adopting the technologies available. This research examines the reasons for this by making use of a technology acceptance model called the Unified Theory of Acceptance and Use of Technology (UTAUT), which is an integrated model based on eight different available acceptance models. The aim is to understand which factors influence the use of systems. The research also proposes a way forward by suggesting a model to assist DAT in new system implementations as well as in correcting the current situation. The only way DAT will stay competitive is by ensuring that the company becomes lean. Customers demand this as more and more of them move to just-in-time delivery. This implies that suppliers must react to changes in real time. The use of business information systems will become the main focus area for reacting to changes quickly and correctly. Effective and accurate systems depend on users making good use of these systems. Remaining competitive will depend on how effectively Information and Communication Technologies (ICT) are used.
- Date Issued: 2008