Data Warehouse:
A data warehouse is a relational database that is designed for query and analysis rather than for transaction processing. It usually contains historical data derived from transaction data, but it can include data from other sources. According to William Inmon, a data warehouse is a subject-oriented, integrated, nonvolatile, and time-variant collection of data in support of management's decision-making process.
Characteristics of Data Warehouse:
Subject Oriented:
A data warehouse can be used to analyze a particular subject matter. For example, to learn more about your company's sales data, you can build a warehouse that concentrates on sales.
Integrated:
Integration is closely related to subject orientation. Data warehouses must put data from disparate sources into a consistent format. They must resolve such problems as naming conflicts and inconsistencies among units of measure. When they achieve this, they are said to be integrated.
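As a rough illustration of integration, the sketch below maps records from two sources onto one naming scheme and one unit of measure. The source names (CRM, ERP), the field names, and the sample values are all hypothetical and chosen only for this example.

```python
# Hypothetical source records; field names and values are invented for illustration.
crm_rows = [{"cust_id": 1, "weight_lb": 10.0}]      # source A: pounds, "cust_id"
erp_rows = [{"customer_no": 2, "weight_kg": 3.0}]   # source B: kilograms, "customer_no"

LB_TO_KG = 0.45359237  # kilograms per pound


def from_crm(row):
    """Rename the key column and convert pounds to kilograms."""
    return {"customer_id": row["cust_id"], "weight_kg": row["weight_lb"] * LB_TO_KG}


def from_erp(row):
    """Rename the key column; the weight is already in kilograms."""
    return {"customer_id": row["customer_no"], "weight_kg": row["weight_kg"]}


# Every integrated row now uses the same column names and the same unit.
integrated = [from_crm(r) for r in crm_rows] + [from_erp(r) for r in erp_rows]
```

After this step, queries against `integrated` no longer need to know which source a row came from.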
Nonvolatile:
Nonvolatile means that, once entered into the warehouse, data should not change.
Time Variant:
A large amount of historical data is kept in a data warehouse for analysis. A data warehouse's focus on change over time is what is meant by the term time variant.
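The nonvolatile and time-variant properties can be sketched together with an append-only history table. This is a minimal example using Python's built-in sqlite3 module; the table name `price_history`, its columns, and the sample prices are assumptions made for illustration, not part of any particular warehouse design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE price_history (product TEXT, price REAL, valid_from TEXT)"
)


def record_price(product, price, valid_from):
    """Append a new dated row; existing rows are never updated or deleted."""
    conn.execute(
        "INSERT INTO price_history VALUES (?, ?, ?)",
        (product, price, valid_from),
    )


def price_as_of(product, date):
    """Return the price in effect on `date` (ISO format), or None if unknown."""
    row = conn.execute(
        "SELECT price FROM price_history "
        "WHERE product = ? AND valid_from <= ? "
        "ORDER BY valid_from DESC LIMIT 1",
        (product, date),
    ).fetchone()
    return row[0] if row else None


# Hypothetical sample data: a price change is stored as a second row,
# so the earlier value stays available for historical analysis.
record_price("widget", 9.99, "2023-01-01")
record_price("widget", 11.49, "2024-01-01")
```

Because a price change adds a row instead of overwriting one, `price_as_of("widget", "2023-06-15")` can still answer questions about the past.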
Functionality of Data Warehouse:
Data warehouse functionalities are described as follows:
Data Mining:
Data mining is the technique of extracting non-trivial, implicit, valuable, and potentially useful information, including previously unknown relationships among data, from a large database. Data mining is also known as Knowledge Discovery in Data (KDD). The key properties of data mining are:
- Automatic discovery of patterns
- Prediction of likely outcomes
- Creation of actionable information
- Focus on large data sets and databases
Data Mining Functionality:
Each data mining function specifies a class of problems that can be modeled and solved. Data mining functions fall generally into two categories: unsupervised (descriptive) and supervised (predictive). Descriptive tasks present the general properties of the data stored in a database; they are used to find patterns in the data, such as clusters, correlations, trends, and anomalies.
Data Mining Unsupervised Functions are as follows:
Anomaly Detection: Identifies items (outliers) that do not satisfy the characteristics of "normal" data (implemented through one-class classification).
Association Rules: Finds items that tend to co-occur in the data and specifies the rules that govern their co-occurrence.
Clustering: Finds natural groupings in the data.
Feature Extraction: Creates new attributes (features) using linear combinations of the original attributes.
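To make one of these functions concrete, here is a small sketch of how association rules are commonly scored with support and confidence. The market-basket transactions are invented for illustration, and the function names are this example's own, not a specific library's API.

```python
# Hypothetical market-basket transactions, invented for illustration.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"milk"},
]


def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent):
    """Share of transactions containing `antecedent` that also contain `consequent`."""
    return support(set(antecedent) | set(consequent)) / support(antecedent)


# Rule "butter -> bread": every basket with butter also has bread.
butter_implies_bread = confidence({"butter"}, {"bread"})
# Rule "bread -> butter": only two of the three bread baskets have butter.
bread_implies_butter = confidence({"bread"}, {"butter"})
```

A real association-rule miner would also search for frequent itemsets (for example with the Apriori algorithm); this sketch only scores rules that are given to it.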
Data Mining Supervised Functions are as follows:
Classification: Assigns items to discrete classes and predicts the class to which an item belongs.
Regression: Approximates and forecasts continuous values.
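The difference between the two supervised functions can be sketched in a few lines. The example below uses a one-nearest-neighbor rule for classification and an ordinary least-squares line for regression; both techniques and all sample data are chosen here purely for illustration, and many other algorithms serve the same purposes.

```python
def nearest_neighbor_class(point, labeled_points):
    """Classification: return the label of the closest labeled example."""
    nearest = min(labeled_points, key=lambda item: abs(item[0] - point))
    return nearest[1]


def fit_line(xs, ys):
    """Regression: least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data: (value, class label) pairs.
labeled = [(1.0, "low"), (2.0, "low"), (8.0, "high"), (9.0, "high")]
predicted_class = nearest_neighbor_class(7.5, labeled)  # a discrete class

# Hypothetical training data that lies exactly on y = 2x + 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
forecast = slope * 5 + intercept  # a continuous value
```

Classification returns one of a fixed set of labels, while regression returns a number on a continuous scale.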