Data Warehouse and Data Mining Concepts - A complete guide for CSIT students

Data Warehouse:

A data warehouse is a relational database that is designed for query and analysis rather than for transaction processing. It usually contains historical data derived from transaction data, but it can include data from other sources.
According to William Inmon, a data warehouse is a subject-oriented, integrated, nonvolatile, and time-variant collection of data in support of management's decision-making process.

Characteristics of Data Warehouse:

Subject Oriented:
A data warehouse is organized around a particular subject area so that it can be used to analyze that subject.
For example, to learn more about your company's sales data, you can build a warehouse that concentrates on sales.

Integrated:
Integration is closely related to subject orientation. Data warehouses must put data from disparate sources into a consistent format. They must resolve such problems as naming conflicts and inconsistencies among units of measure. When they achieve this, they are said to be integrated.
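
As a minimal sketch of what integration involves, the Python example below maps records from two hypothetical source systems into one consistent format. The source names (CRM, ERP), the field names, and the target schema are assumptions made for illustration; they do not come from any particular warehouse product.

```python
# Hypothetical records from two source systems that describe the same kind
# of fact with different field names and different units of measure.
crm_rows = [{"cust_id": 101, "weight_kg": 2.5}]
erp_rows = [{"customerNumber": 101, "weight_lb": 11.0}]

LB_TO_KG = 0.45359237  # definition of the international pound in kilograms


def from_crm(row):
    """Map a CRM record to the warehouse format (already in kilograms)."""
    return {"customer_id": row["cust_id"], "weight_kg": row["weight_kg"]}


def from_erp(row):
    """Map an ERP record: resolve the naming conflict and convert pounds."""
    return {
        "customer_id": row["customerNumber"],
        "weight_kg": round(row["weight_lb"] * LB_TO_KG, 3),
    }


# Every integrated record now uses the same names and the same unit.
integrated = [from_crm(r) for r in crm_rows] + [from_erp(r) for r in erp_rows]
```

Keeping one small mapping function per source is only one possible design; it makes each naming or unit conflict visible in a single place.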

Nonvolatile:
Nonvolatile means that, once entered into the warehouse, data should not change.

Time Variant:
A large amount of historical data is kept in a data warehouse for analysis. This focus on change over time is what the term time variant means.
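
The nonvolatile and time variant properties can be illustrated together with a small Python sketch. It assumes a hypothetical in-memory list, `sales_history`, standing in for a warehouse table: rows are only appended, each with a date, and analysis compares one period with the next. The names and figures are invented for illustration.

```python
from datetime import date

# Hypothetical append-only history: each load adds dated rows and never
# updates or deletes rows that are already stored (nonvolatile).
sales_history = []


def load_snapshot(snapshot_date, region, total_sales):
    """Append one dated record; existing records are left untouched."""
    sales_history.append(
        {"date": snapshot_date, "region": region, "total_sales": total_sales}
    )


load_snapshot(date(2023, 1, 31), "East", 1200)
load_snapshot(date(2023, 2, 28), "East", 1500)
load_snapshot(date(2023, 3, 31), "East", 1350)


def change_over_time(region):
    """Return the period-to-period change in sales for one region."""
    rows = sorted(
        (r for r in sales_history if r["region"] == region),
        key=lambda r: r["date"],
    )
    return [b["total_sales"] - a["total_sales"] for a, b in zip(rows, rows[1:])]


# Time variant: the history makes change over time directly queryable.
changes = change_over_time("East")
```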

Functionality of Data Warehouse:

The main data warehouse functionalities are described as follows:
  • Data Consolidation
    Data consolidation is the process of collecting data from different sources and in various formats so that it can be integrated into one place. Its main advantage is that analysts can work from a single store instead of querying each source separately, which improves productivity and efficiency.

  • Data Cleansing
    Data cleansing, also known as data scrubbing, is an essential step in managing data quality: it ensures that the loaded data is standardized across all records. A cleansing step detects and corrects invalid or inaccurate records based on rules you define, in order to provide a clean and consistent data set.

  • Data Integration
    Data integration is the process of combining data from different sources into a unified view. It ensures that information is timely, accurate, and consistent across complex systems.
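
The following Python sketch shows, under simplifying assumptions, how consolidation and rule-based cleansing might fit together. The two sources, the `email` and `country` fields, and the validation rule are all hypothetical; a real warehouse would normally rely on a dedicated ETL tool instead of hand-written functions.

```python
# Hypothetical extracts from two sources that arrive in different shapes.
csv_like_rows = [("alice@example.com", "NP"), ("BOB@EXAMPLE.COM ", "np")]
api_rows = [
    {"email": "carol@example", "country": "IN"},
    {"email": "alice@example.com", "country": "NP"},
]


def consolidate():
    """Collect rows from every source into one list of dictionaries."""
    rows = [{"email": e, "country": c} for e, c in csv_like_rows]
    rows.extend(dict(r) for r in api_rows)
    return rows


def cleanse(rows):
    """Apply user-defined rules: standardize, reject invalid, drop duplicates."""
    clean, rejected, seen = [], [], set()
    for row in rows:
        email = row["email"].strip().lower()
        country = row["country"].strip().upper()
        # Assumed rule: a usable email has one "@" and a dot in the domain.
        local, _, domain = email.partition("@")
        if not local or "." not in domain or "@" in domain:
            rejected.append(row)
            continue
        if email in seen:  # duplicate of a record that was already accepted
            continue
        seen.add(email)
        clean.append({"email": email, "country": country})
    return clean, rejected


clean_rows, rejected_rows = cleanse(consolidate())
```

Returning the rejected rows separately, instead of silently dropping them, is a design choice that lets invalid records be reviewed and corrected at the source.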


    Data Mining:


    Data mining is the technique of extracting non-trivial, implicit, previously unknown, and potentially useful information, such as relationships among data, from a large database. Data mining is also known as Knowledge Discovery in Data (KDD). The key properties of data mining are:
    1. Automatic discovery of patterns
    2. Prediction of likely outcomes
    3. Creation of actionable information
    4. Focus on large data sets and databases

    Data Mining Functionality

    Each data mining function specifies a class of problems that can be modeled and solved. Data mining functions fall generally into two categories:
  • Descriptive Task (Unsupervised)
    Descriptive tasks present the general properties of the data stored in a database. They are used to find patterns in data, such as clusters, correlations, trends, and anomalies.


    Data Mining Unsupervised Functions are as follows:
        Anomaly Detection: Identifies items (outliers) that do not satisfy the characteristics of "normal" data. It is commonly implemented through one-class classification.

        Association Rules: Finds items that tend to co-occur in the data and specifies the rules that govern their co-occurrence.

        Clustering: Finds natural groupings in the data.

        Feature Extraction: Creates new attributes (features) using linear combinations of the original attributes.
  • Predictive Task (Supervised)
    Predictive data mining tasks predict the value of one attribute on the basis of the values of other attributes. The attribute being predicted is known as the target or dependent variable, and the attributes used for making the prediction are known as independent variables.

    Data Mining Supervised Functions are as follows:
        Classification: Assigns items to discrete classes and predicts the class to which an item belongs.

        Regression: Approximates and forecasts continuous values.
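
To make the two categories concrete, the Python sketch below shows one function from each: an association rule measured by support and confidence (descriptive) and a straight-line regression fitted by ordinary least squares (predictive). The transactions, the `ad_spend` and `sales` figures, and the function names are invented for illustration, and the calculations are the textbook formulas, not the method of any specific data mining product.

```python
# Descriptive (unsupervised): support and confidence of an association rule.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def rule_stats(antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent."""
    both = sum(1 for t in transactions if antecedent <= t and consequent <= t)
    ante = sum(1 for t in transactions if antecedent <= t)
    support = both / len(transactions)
    confidence = both / ante if ante else 0.0
    return support, confidence


support, confidence = rule_stats({"bread"}, {"milk"})


# Predictive (supervised): fit y = slope * x + intercept by least squares.
def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least squares line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


ad_spend = [1.0, 2.0, 3.0, 4.0]  # independent variable
sales = [3.0, 5.0, 7.0, 9.0]     # target (dependent variable)
slope, intercept = fit_line(ad_spend, sales)
forecast = slope * 5.0 + intercept  # predicted sales for an unseen input
```

The association rule needs no target attribute, while the regression needs known `sales` values to learn from; that is the practical difference between the unsupervised and supervised functions.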
