Previous |
Next |
Attribute analysis seeks to discover both general and detailed information about the structure and content of data stored within a given column or attribute.
Attribute analysis consists of:
Pattern analysis attempts to discover patterns and common types of records by analyzing the string of data stored in the attribute. It generates several regular expressions that match many of the values in the attribute, and reports the percentages of the data that comply with each candidate regular expression. OWB can also search for data that conforms to common regular expressions, such as dates, e-mail addresses, telephone numbers and Social Security numbers.
Table: Sample Columns Used for Pattern Analysis shows a sample attribute, Job Code, that could be used for pattern analysis.
Sample Columns Used for Pattern Analysis
| Job ID | Job Code |
|---|---|
|
7 |
337-A-55 |
|
9 |
740-B-74 |
|
10 |
732-C-04 |
|
20 |
43-D-4 |
Table: Pattern Analysis Results shows the possible results from pattern analysis, where D represents a digit and X represents a character. After looking at the results and knowing that it is company policy for all job codes be in the format of 999-A-99, you can derive a data rule that requires all values in this attribute to conform to this pattern.
Domain analysis identifies a domain or set of commonly used values within the attribute by capturing the most frequently occurring values. For example, the Status column in the Customers table is profiled and the results reveal that 90% of the values are among the following values: "MARRIED", "SINGLE", "DIVORCED". Further analysis and drilling down into the data reveal that the other 10% contains misspelled versions of these words with few exceptions. Configuration of the profiling determines when something is qualified as a domain; therefore, be sure to review the configuration before accepting domain values. You can then let OWB derive a rule that requires the data stored in this attribute to be one of the three values that were qualified as a domain.
Data type analysis enables you to discover information about the data types found in the attribute. This type of analysis reveals metrics such as minimum and maximum character length values as well as scale and precision ranges. In some cases, the database column is of data type VARCHAR2, but the values in this column are all numbers. Then you may want to ensure that you only load numbers. Using data type analysis, you can have OWB derive a rule that requires all data stored within an attribute to be of the same data type.
Unique key analysis provides information to assist you in determining whether or not an attribute is a unique key. It does this by looking at the percentages of distinct values that occur in the attribute. You might determine that attributes with a minimum of 70% distinct values should be flagged for unique key analysis. For example, using unique key analysis you could discover that 95% of the values in the EMP_ID column are unique. Further analysis of the other 5% reveals that most of these values are either duplicates or nulls. You could then derive a rule that requires that all entries into the EMP_ID column be unique and not null.