Data mining in organizations: concepts, tools and techniques, and benefits
1. Introduction
Data is vital to the success of modern organizations. It comes from various sources and in various forms, such as paper documents and electronic formats, and it plays an essential role in supporting decision making: good data can lead to higher productivity, faster problem solving and improved organizational performance. This paper looks at data mining, a relatively new area of information management that offers great promise for making sense of large data sets. Data mining is concerned with discovering new and meaningful information so that decision makers can learn as much as possible from their data assets. As a result of the rapid growth in data and databases, there is a need for new technologies and tools that turn data into practical and valuable information. Data mining has therefore become an important topic of research.
2. Data Mining and Data Warehouse concepts
Data mining is the process of using pattern recognition technologies, such as neural networks and genetic algorithms, to find actionable and significant patterns, profiles and trends by sifting through organizational data. It can also be described as the search for valuable information in large amounts of data: it looks for hidden relationships, patterns, associations and interdependencies in large databases that conventional information gathering techniques may fail to notice. In other words, data mining is the automated recognition of relevant patterns in a database. Reeves (1995) similarly identifies data mining as the method of applying artificial intelligence techniques to large data sets in order to identify patterns in the data.
A data warehouse, meanwhile, is closely related to a database and can be considered a very large physical database that stores a large volume of information from a variety of sources. It is basically a repository for relevant business data. Mattison describes a data warehouse as a database that is prepared to serve as a neutral data storage area and is used by data mining and other applications. A data warehouse also meets a specific set of business requirements and uses data that satisfies a predefined set of business criteria.
The data warehouse is one of the important research areas related to data mining. Data warehouses were developed for the purpose of evaluating business situations and supporting decision making. A data warehouse holds large volumes of information and data about the organization, drawn from a variety of sources, and can be accessed in a timely manner. Data mining works together with the data warehouse to organize historical information gathered from large client/server based applications. Data mining can therefore add value to information assets through the successful navigation of a large corporate data warehouse. The relationship between data mining and the data warehouse improves the efficiency and quality of decisions and permits decision makers to leverage their massive data assets (Baron and Smith, 1997).
In recent years, business databases have grown enormously, but the capabilities for analyzing such large amounts of data have not developed at the same rate as the capabilities for collecting and storing it. As a result, businesses are becoming increasingly interested in knowledge discovery methods. Data mining is an interactive process that begins with assembling the data into a format conducive to analysis. Once the data are configured, they must be cleaned by checking for obvious errors or flaws and removing them. Once the data have been mined to establish patterns or make predictions, the results must be tested and verified. This involves running the models on test data sets to see how well they match. Once the results are verified, they must be put into terms that the business user can use easily.
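The clean, mine and verify sequence described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `spend` and `bought` fields, the toy records and the single-threshold "model" are all hypothetical and are not taken from the sources cited here.

```python
def clean(records):
    """Drop records with missing fields or obviously invalid values."""
    cleaned = []
    for r in records:
        if r.get("spend") is None or r.get("bought") is None:
            continue  # missing field
        if r["spend"] < 0:
            continue  # obvious error: spending cannot be negative
        cleaned.append(r)
    return cleaned


def accuracy(records, threshold):
    """Share of records where 'spend >= threshold' matches the recorded outcome."""
    hits = sum((r["spend"] >= threshold) == r["bought"] for r in records)
    return hits / len(records)


def fit_threshold(train):
    """Mine a very simple pattern: the spend threshold that best separates buyers."""
    best_t, best_acc = None, -1.0
    for t in sorted({r["spend"] for r in train}):
        acc = accuracy(train, t)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


# Hypothetical raw data, including one missing value and one invalid value.
raw = [
    {"spend": 10, "bought": False}, {"spend": 20, "bought": False},
    {"spend": 80, "bought": True}, {"spend": 95, "bought": True},
    {"spend": None, "bought": True}, {"spend": -5, "bought": False},
    {"spend": 15, "bought": False}, {"spend": 90, "bought": True},
]

data = clean(raw)
train, test = data[:4], data[4:]   # hold back part of the data for verification
threshold = fit_threshold(train)   # pattern found on the training portion only
print("threshold:", threshold, "test accuracy:", accuracy(test, threshold))
```

The point of the split is that the threshold is judged on records it was not fitted to, which is the "testing models on test data sets" step in the paragraph above.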
Moreover, one of the most important business goals of data mining is discovering or classifying new patterns within the data. Data mining also aims to offer users an intuitive, graphical tool for creating new analyses and navigating the data warehouse. In addition, it aims to focus the user's analysis so that relevant and important information can be acquired more quickly and efficiently.
Data mining characteristics
Having discussed the concepts of data mining and its relationship with the data warehouse, we now turn to the characteristics of data mining. The foremost characteristics are as follows:
* Data pattern determination. A data access language or data manipulation language (DML) identifies the exact data that users wish to pull into the program for processing or display.
* Formatting capability. Data mining has a strong formatting capability: it can produce raw data formats, database forms, tabular output, multidimensional displays and visualizations.
* Content analysis capability. It enables the client to process the specifications written by the end user.
* Synthesis capability. Data mining has a strong synthesis capability, which allows data synthesis to be carried out in a timely manner.
3. Data Mining Tools and Techniques
Data mining is important to organizations because it has great potential to help companies extract hidden predictive information from large databases and so focus on the most important information in their data warehouses. Many data mining tools and techniques have been reviewed in the literature. Some authors speak of data mining tools while others prefer data mining techniques, but in essence the two terms refer to the same thing. Data mining tools and techniques can forecast future trends and behaviors, allowing organizations to make proactive, knowledge-driven decisions. Applying data mining methods takes knowledge workers to the next logical step in data analysis by yielding even deeper insights than those provided by production reports, managed queries and executive information systems.
The specific steps of data mining differ from one organization to another and from one researcher to another, but in general there are three major steps: preparing the data, reducing the data and, finally, looking for valuable information. Fayyad et al. (1996), for example, suggest the following steps:
1) Retrieving the data from a large database.
2) Selecting the appropriate subset to work with.
3) Deciding on a suitable sampling strategy, cleaning the data and dealing with missing fields and records.
4) Applying the appropriate transformations, dimensionality reduction and projections.
5) Fitting models to the preprocessed data.
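The five steps listed above can be walked through on a toy example. The sketch below is a hedged illustration, not the procedure of Fayyad et al.: the `sales` table, its columns and values are invented, an in-memory SQLite database stands in for the large database, sampling is skipped because the table is tiny, dimensionality reduction is reduced to dropping columns that never vary, and the model is a simple least-squares line.

```python
import sqlite3
import statistics

# Step 1: retrieve data from a database (in-memory SQLite as a stand-in).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (region TEXT, ad_spend REAL, store_id INTEGER, revenue REAL)"
)
rows = [
    ("north", 1.0, 7, 3.0), ("north", 2.0, 7, 5.0), ("north", None, 7, 6.0),
    ("north", 3.0, 7, 7.0), ("south", 9.0, 8, 1.0), ("north", 4.0, 7, 9.0),
]
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)

# Step 2: select the subset to work with (one region only).
subset = conn.execute(
    "SELECT ad_spend, store_id, revenue FROM sales WHERE region = 'north'"
).fetchall()

# Step 3: clean the data; here records with missing fields are simply dropped.
clean = [r for r in subset if None not in r]

# Step 4: reduce dimensionality by discarding columns that carry no variation.
columns = list(zip(*clean))
keep = [i for i, col in enumerate(columns) if statistics.pvariance(col) > 0]
reduced = [tuple(r[i] for i in keep) for r in clean]

# Step 5: fit a model to the preprocessed data (revenue = a * ad_spend + b).
xs = [r[0] for r in reduced]
ys = [r[1] for r in reduced]
mx, my = statistics.fmean(xs), statistics.fmean(ys)
a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
b = my - a * mx
print(f"revenue ~ {a:.2f} * ad_spend + {b:.2f}")
```

In a real setting each step would be considerably more involved; the sketch only shows how the output of one step becomes the input of the next.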
3.1 Categorizing Data Mining Techniques
The literature highlights many data mining techniques used in organizations. A number of data mining techniques and systems have been proposed, designed and developed. Each of them can be categorized according to the databases it works on, the knowledge to be discovered and the techniques to be used. Chen et al. (1996) outline one such scheme for classifying data mining techniques:
* Based on the database. A data mining system can be classified according to the kind of database it is intended for. Organizations use many database systems to support their activities, such as transaction databases, multimedia databases, web databases and many more. If a data mining system discovers knowledge from a relational database, for example, it is considered a relational data mining system.
* Based on the knowledge. A data mining system can discover various types of knowledge and can also be classified according to the abstraction level of the discovered knowledge: generalized knowledge, primitive-level knowledge and multiple-level knowledge.
* Based on the techniques. A data mining system can also be classified according to its underlying mining approach, such as pattern-based mining, integrated approaches and other approaches.
3.2 Major data mining tools and techniques
Some of the tools used in data mining are simple, concise and easy-to-use algorithms that model non-random relationships in large amounts of data. Inspired by diverse paradigms, there is a wide range of data mining tools. The most commonly used include Artificial Intelligence methods (Firebaugh, 1998), Decision Trees (Ginsberg, 1993), Genetic Algorithms (Koza, 1992), Back propagation (Wasserman, 1989), Rule Induction Methods (Michalski et al., 1983), Visualization, Hybrid Systems and Artificial Neural Networks (Vemuri, 1988; Chester, 1993).
Data mining tools share some common characteristics even though they approach the data from different standpoints. Each adapts progressively as the model learns from the data set and accumulates knowledge. Every technique has a training phase, in which patterns and relationships hidden in a historical data set are discovered, followed by an implementation phase, in which the model is executed. For the most part these techniques require more than a superficial knowledge of the raw data, which is essential in order to preprocess the data appropriately, choose the right technique for the organization and interpret the results. A range of data mining tools and techniques commonly used in organizations is presented below.
Artificial Intelligence (AI) Techniques
AI techniques are widely used in data mining. They relate to various steps of the data mining process and include pattern recognition, machine learning, neural networks, knowledge acquisition, knowledge representation and search. Other AI techniques that can be used for data mining are case-based reasoning, which draws on historical cases to recognize patterns, and the intelligent agent approach, which uses a computer program to sift through data.
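As a rough illustration of case-based reasoning, the sketch below retrieves the stored historical case most similar to a new situation and reuses its outcome. The case structure, the two numeric features and the squared-distance similarity measure are assumptions made for this example only.

```python
def most_similar_case(cases, query):
    """Return the stored case whose features are closest to the query."""
    def distance(case):
        # Squared Euclidean distance between the case and the query.
        return sum((a - b) ** 2 for a, b in zip(case["features"], query))
    return min(cases, key=distance)


# Hypothetical historical cases: (customer age, months as a customer).
history = [
    {"features": (25, 1), "outcome": "churned"},
    {"features": (40, 12), "outcome": "stayed"},
    {"features": (33, 7), "outcome": "stayed"},
]

match = most_similar_case(history, (27, 2))
print(match["outcome"])
```

A fuller case-based system would also adapt the retrieved solution and store the new case once its outcome is known.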
Decision Trees
Decision trees are tree-shaped structures that represent sets of decisions. They use a simple tree model in which, at every branch during tree growth, the data set is deliberately split into different classes and subclasses. A decision tree can generate rules for the classification of a data set. It can also be used to get a feel for the data when the analyst needs a quick and dirty exploratory partitioning of the data set. Decision trees are easy to understand, straightforward to build and simple to explain, and they give users an intuitive, high-level and very understandable overview of the structure and organization of their data set. Classification and Regression Trees (CART) and Chi Square Automatic Interaction Detection (CHAID) are among the specific decision tree methods used for the classification of data.
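The way a tree splits a data set and then yields classification rules can be sketched as follows. This is a minimal illustration and not an implementation of CART or CHAID: it assumes categorical attributes, chooses splits by Gini impurity and does no pruning. The attribute names and records are hypothetical.

```python
from collections import Counter


def gini(rows):
    """Gini impurity of the class labels (last element of each row)."""
    counts = Counter(r[-1] for r in rows)
    return 1.0 - sum((c / len(rows)) ** 2 for c in counts.values())


def partition(rows, i):
    """Group rows by the value of attribute i."""
    groups = {}
    for r in rows:
        groups.setdefault(r[i], []).append(r)
    return groups


def best_split(rows, attrs):
    """Attribute whose partition has the lowest weighted impurity."""
    def weighted(i):
        return sum(len(g) / len(rows) * gini(g) for g in partition(rows, i).values())
    return min(attrs, key=weighted)


def build(rows, attrs):
    """A leaf is a class label; a node is (attribute index, {value: subtree})."""
    labels = Counter(r[-1] for r in rows)
    if len(labels) == 1 or not attrs:
        return labels.most_common(1)[0][0]
    i = best_split(rows, attrs)
    rest = [a for a in attrs if a != i]
    return (i, {v: build(g, rest) for v, g in partition(rows, i).items()})


def rules(tree, names, path=()):
    """Flatten the tree into readable if-then rules."""
    if not isinstance(tree, tuple):
        return [("IF " + " AND ".join(path) if path else "ALWAYS") + f" THEN {tree}"]
    i, branches = tree
    out = []
    for value, subtree in branches.items():
        out += rules(subtree, names, path + (f"{names[i]}={value}",))
    return out


# Hypothetical records: (income, existing_customer, response to a campaign).
rows = [
    ("high", "yes", "respond"), ("high", "no", "respond"),
    ("high", "yes", "respond"), ("low", "yes", "respond"),
    ("low", "no", "ignore"), ("low", "no", "ignore"),
    ("high", "no", "respond"),
]
tree = build(rows, [0, 1])
rule_list = rules(tree, ["income", "existing_customer"])
for rule in rule_list:
    print(rule)
```

Each printed rule corresponds to one path from the root to a leaf, which is what makes the result easy to explain to a business user.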
Genetic Algorithm
The strengths of this technique are that it is an easy-to-explain, robust and intuitively appealing model that explores the search space broadly and can easily handle multidimensional problems. The technique was inspired by Darwin's theory of evolution and is an optimization technique that uses processes such as genetic combination, mutation and natural selection. A set of rules, each representing a possible solution to a problem, is initially formed at random. Pairs of rules are then combined to produce offspring for the next generation. A mutation process is used to randomly modify the genetic structure of some members of each new generation, and the system runs for dozens or hundreds of generations. The process terminates when an acceptable or optimum solution is found or after some fixed time limit. Genetic algorithms are useful for finding solutions to hard optimization problems with respect to some computable criterion. One advantage of the technique is that it can suggest numerous possible solutions.
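The loop described above (random initial population, recombination of pairs, mutation, selection, termination) can be sketched in Python. This is an illustrative toy under stated assumptions: candidates are bit strings rather than rules, the fitness criterion is simply the number of 1 bits, and the population size, mutation rate and elitism step are arbitrary choices made for the example.

```python
import random


def fitness(bits):
    """Computable criterion to maximize: the number of 1 bits (toy objective)."""
    return sum(bits)


def crossover(a, b, rng):
    """Combine two parents at a single random cut point."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(bits, rate, rng):
    """Flip each bit independently with a small probability."""
    return [1 - b if rng.random() < rate else b for b in bits]


def evolve(length=20, pop_size=30, generations=50, rate=0.02, seed=0):
    rng = random.Random(seed)
    # Initial candidate solutions are formed at random.
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        if history[-1] == length:        # stop once an optimum is found
            break
        parents = pop[: pop_size // 2]   # selection: the fitter half reproduces
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rate, rng)
            for _ in range(pop_size - 1)
        ]
        pop = [pop[0]] + children        # elitism: keep the best unchanged
    return max(pop, key=fitness), history


best, history = evolve()
print("best fitness:", fitness(best), "after", len(history), "generations")
```

Because the best individual is carried forward unchanged, the best fitness never gets worse from one generation to the next. The fixed generation limit plays the role of the time limit mentioned in the text.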
Visualization
Visual data mining techniques have proved valuable in exploratory data analysis and also have good potential for mining large databases. Visualization is the process of clearly presenting the results found using data mining tools, and it is an effective way of presenting complex interdependencies among attributes so that users gain an intuitive feel for the data and the knowledge resulting from the analysis. It allows users to quickly assess and make sense of vast amounts of data, and to easily evaluate, rescale and get an overview. These techniques can be used standalone or in various combinations for visualizing multidimensional data sets (scatter plot matrices, parallel coordinates, projection matrices etc.) in either two- or three-dimensional images.
Rule Induction Method
This technique uses statistical discovery methods to build a system that depends on the frequency of a relationship, its accuracy rate and the accuracy of prediction. Rule induction is the extraction of useful if-then rules from data based on statistical significance. The technique is highly unsupervised, yet the rules generated still need to be assessed by experts. It is frequently used when new rules need to be created. First, simple rules are created from a set of statistics; more complex rules are then formed and combined with old rules using standard rule-combining techniques. Because rule induction considers all possibilities, it requires continuous expert assessment of the rules generated by the system, which can be a very lengthy and costly exercise.
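A minimal sketch of inducing simple if-then rules from data is shown below. It assumes that "frequency of a relationship" can be measured as support (the share of records containing both items) and "accuracy rate" as confidence (how often the consequent holds when the antecedent does). These measures, the thresholds and the shopping-basket data are assumptions for illustration; only single-item rules are generated.

```python
from itertools import permutations


def induce_rules(transactions, min_support=0.4, min_confidence=0.8):
    """Return (antecedent, consequent, support, confidence) for single-item rules."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    found = []
    for a, c in permutations(items, 2):       # consider every ordered pair
        both = sum(1 for t in transactions if a in t and c in t)
        has_a = sum(1 for t in transactions if a in t)
        if has_a == 0:
            continue
        support = both / n                    # how often the relationship occurs
        confidence = both / has_a             # how often "IF a THEN c" is right
        if support >= min_support and confidence >= min_confidence:
            found.append((a, c, support, confidence))
    return found


# Hypothetical transaction data.
baskets = [
    {"bread", "butter"}, {"bread", "butter", "milk"}, {"bread", "butter"},
    {"milk"}, {"bread", "milk"},
]

induced = induce_rules(baskets)
for a, c, s, conf in induced:
    print(f"IF {a} THEN {c}  (support={s:.2f}, confidence={conf:.2f})")
```

Since every ordered pair of items is considered, the number of candidate rules grows quickly with the number of items, which is one reason the generated rules still need expert review.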
Back propagation
Back propagation is based on the biological theory of learning and memory creation using a highly interconnected group of neurons (nerve cells). Such models attempt to mimic brain activity by adapting the weights of the interconnections among the neurons in the network, allowing learning and memory creation to take place. Back propagation systems are highly supervised. The back propagation neural network model is well suited to prediction and classification in situations where a good deal of historical data is available for training. The strengths of this technique are that it can handle multidimensional data, it is robust and it can handle noisy data.
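The weight adaptation described above can be sketched with a very small network in plain Python. This is an illustrative toy, not a production implementation: the network has one hidden layer of sigmoid units, the training data are four invented labeled examples, and the learning rate, layer size, epoch count and seed are arbitrary assumptions.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train(samples, hidden=3, epochs=2000, lr=0.5, seed=1):
    """Train a one-hidden-layer network; returns weights and the loss per epoch."""
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    # w1[j] holds the weights into hidden unit j; the last entry is its bias.
    w1 = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(hidden)]
    w2 = [rng.uniform(-1, 1) for _ in range(hidden + 1)]
    losses = []
    for _ in range(epochs):
        total = 0.0
        for x, target in samples:
            # Forward pass through the hidden layer to the output unit.
            xi = list(x) + [1.0]
            h = [sigmoid(sum(w * v for w, v in zip(row, xi))) for row in w1]
            hb = h + [1.0]
            y = sigmoid(sum(w * v for w, v in zip(w2, hb)))
            total += (y - target) ** 2
            # Backward pass: propagate the output error back to the hidden layer.
            d_out = (y - target) * y * (1 - y)
            d_hid = [d_out * w2[j] * h[j] * (1 - h[j]) for j in range(hidden)]
            # Adapt the interconnection weights in proportion to their error share.
            w2 = [w - lr * d_out * v for w, v in zip(w2, hb)]
            for j in range(hidden):
                w1[j] = [w - lr * d_hid[j] * v for w, v in zip(w1[j], xi)]
        losses.append(total / len(samples))
    return w1, w2, losses


# Hypothetical history: (opened_email, visited_store) -> made a purchase.
samples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w1, w2, losses = train(samples)
print(f"mean squared error: first epoch {losses[0]:.3f}, last epoch {losses[-1]:.3f}")
```

The supervision lies in the `target` values: every weight update is driven by the difference between the network's output and the known historical outcome.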
4. Benefits of data mining
Data mining is gaining acceptance and widespread use because of the tangible benefits it offers, such as the accurate identification of buying trends and the accurate characterization of market segments. It is particularly important to organizations that want to develop more usable and accessible data in order to improve the quality of decision making and gain critical competitive advantages. The benefits to organizations can be discussed in terms of effectiveness, quality and the cost of employing data mining.
In terms of effectiveness, data mining users can evaluate a method by testing its robustness, determining the accuracy of its output and investigating its explanatory power. A good data mining method should be scalable, extensible and flexible enough to meet the requirements of strategic development arising from a continuously growing data warehouse and from new sources of external data.
The quality, security, completeness, authority, reliability and relevance of the data in the warehouse have a great influence on the explanatory power and accuracy of a model, and they therefore determine the quality of data mining. The degree of data quality has a significant effect on the outcome of a model.
Knowledge workers also benefit from employing data mining, as the relevance and flexibility of information improve thanks to both rapid model development and the accumulated value of a continuously evolving model. By using data mining, users can increase their efficiency, their productivity and the quality of their decision making. Predictions become more accurate and overall decision making becomes more effective. As for cost, although data mining is an expensive and long-term proposition, it is a better alternative than clinging to older technologies or doing nothing at all. Leveraging data assets for the highest possible payoff is worthwhile, since even a small improvement can turn out to be a significant benefit to the organization.
The application of data mining techniques to customer databases has also had a direct and extensive impact on marketing. The most important impact has been the rise of database marketing. Managers now benefit from it and are seeking ways to shift their focus from making the right products to targeting existing products at the right customers. The tools are used to forecast customer behavior by predicting which customers will be most responsive to promotional and sales campaigns, and hence to increase the organization's total revenue.
Data mining applications have also brought benefits in many areas such as finance, telecommunications, marketing and web analysis. The benefits of data mining can be summarized as follows:
1. Correct data identification and analysis improve the quality of decision making.
2. Strong navigation, calculation and synthesis capabilities make it possible to achieve critical competitive advantages.
3. Relevant information is acquired more quickly and time is used more effectively.
5. Conclusion
Extraordinary new applications of customer information are emerging as a result of advances in database technology, just as the efficiency of mass marketing is declining. The processes of data collection and mining are complex but well worth the effort for organizations. The new technologies have had a massive impact on how businesses market products to customers. A data mining and data warehouse strategy is critical to realizing competitive advantages through the effective use of information. Knowledge workers must understand how the organization accesses and uses data and how business intelligence tools can help attain organizational goals. Data mining and data warehousing turn data into information in a consistent and intelligent manner across the organization, and that information is important as a major shared asset. Data mining will clearly be one of the foremost competitive edges of organizations, so understanding its concepts and techniques is very useful and important.