X4DB data quality framework
"Over 66% of enterprises are not confident their organization has a single view of their legacy data, while they recognize that data quality is imperative to business and compliance."

Glossary Data Cleansing

Ad-Hoc Query
Any spontaneous or unplanned question or query. An ad-hoc query consists of dynamically constructed SQL, usually generated by desktop-resident query tools.
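A minimal sketch of how a query tool might assemble ad-hoc SQL at run time, written in Python against the standard-library sqlite3 module. The `ad_hoc_query` helper and the `customers` table are hypothetical; the table name is assumed to be trusted, while column names are checked against the table and values are passed as bound parameters.

```python
import sqlite3


def ad_hoc_query(conn, table, filters):
    """Build and run a SELECT statement from user-chosen filters (hypothetical helper).

    `table` is assumed to be a trusted name. Column names are checked against the
    table's real columns, and values are bound as parameters, never pasted into the SQL.
    """
    columns = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    clauses, params = [], []
    for column, value in filters.items():
        if column not in columns:
            raise ValueError(f"unknown column: {column}")
        clauses.append(f"{column} = ?")
        params.append(value)
    sql = f"SELECT * FROM {table}"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, conn.execute(sql, params).fetchall()


# Usage: the WHERE clause is constructed from whatever the user picked at run time.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, state TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [("Ann", "NY"), ("Bob", "NJ")])
sql, rows = ad_hoc_query(conn, "customers", {"state": "NY"})
print(sql)   # SELECT * FROM customers WHERE state = ?
print(rows)  # [('Ann', 'NY')]
```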

Algorithm
A defined, finite set of steps, operations, or procedures that produces a particular outcome (e.g., computer programs, mathematical formulas, and recipes).

Application
A program or group of programs designed for end users. Software can be divided into two general classes: systems software and applications software. Systems software consists of low-level programs that interact with the computer at a very basic level. This includes operating systems, compilers, and utilities for managing computer resources. In contrast, applications software (also called end-user programs) includes database programs, word processors, and spreadsheets. Figuratively speaking, applications software sits on top of systems software because it is unable to run without the operating system and system utilities.

Application Integration
The process of enabling independently designed applications to work together. This can range from simple approaches - such as providing users with access to data and functionality from multiple applications through a single user interface - to more sophisticated approaches involving integration brokers or middleware. See integration broker and middleware.

Application Server
1. A hardware server designated to run applications (but not a database).
2. System software used to host the business logic tier of applications. In three-tier applications, the application server manages business logic and enables it to be accessed from the user interface tier. In a service-oriented architecture (SOA), an application server hosts the application services and also plays the role of a fundamental enabling technology.

Architecture
1. The overall design of a hardware, software or network system and the logical and physical relationships among its components. The architecture specifies the hardware, software, access methods and protocols used throughout the system.
2. A set of principles, guidelines and rules used by an enterprise to direct the process of acquiring, building, modifying and interfacing IT resources throughout the enterprise. These resources can include equipment, software, communications, development methodologies, modeling tools and organizational structures.

Back Office Solution
Software applications designed to assist organizations with the management of "behind the scenes" tasks and processes related to accounting, human resources, distribution and manufacturing. These processes do not usually involve direct interaction with customers; however, when integrated with a front office application such as CRM, they increase the benefits of both the back office and the CRM solution.

Batch Processing
The processing of application programs and their data individually, with one batch being completed before the next is started. It is a planned processing procedure typically used for purposes such as preparing payrolls and maintaining inventory records.
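A minimal Python sketch of batch processing, assuming a hypothetical payroll job: records are split into fixed-size batches, and each batch is processed completely before the next one starts. The `batches` and `run_payroll` names and the record fields are illustrative only.

```python
from typing import Iterator, List


def batches(records: List[dict], size: int) -> Iterator[List[dict]]:
    """Yield consecutive, fixed-size batches of records (the last batch may be smaller)."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def run_payroll(records: List[dict], size: int = 2) -> list:
    """Process each batch to completion before starting the next (hypothetical payroll job)."""
    log = []
    for number, batch in enumerate(batches(records, size), start=1):
        total = sum(r["hours"] * r["rate"] for r in batch)
        log.append((number, len(batch), total))  # (batch number, records, gross pay)
    return log


employees = [
    {"hours": 40, "rate": 20.0},
    {"hours": 35, "rate": 30.0},
    {"hours": 10, "rate": 15.0},
]
print(run_payroll(employees))  # [(1, 2, 1850.0), (2, 1, 150.0)]
```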

Benchmark
A metric used to quantify performance for comparative purposes. See benchmarking.

Benchmarking
1. Measuring the performance of hardware components or systems (such as processors or servers) using standard benchmarks maintained by an independent organization, such as the Transaction Processing Performance Council (see TPC).
2. Measuring performance qualities (such as efficiency or spending) of enterprise organizations or processes (such as IS) against comparative benchmarks. Such benchmarks can be external (for example, averages of industry peer performance) or internal (for example, measurements of an organization's performance in different time periods, or comparison to other organizations in the same enterprise).

Best Practice
A group of tasks that optimizes the efficiency or effectiveness of the business discipline or process to which it contributes. Best practices are generally adaptable and replicable across similar organizations or enterprises - and sometimes across different functions or industries.

Binary Code
Code that uses combinations of two base values (generally represented using the digits "0" and "1") to represent information. For example, the number 17 is represented as "10001" in binary notation.
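The conversion can be checked with a short Python sketch that repeatedly divides by 2 and collects the remainders. `to_binary` is a hypothetical helper; Python's built-in `int(text, 2)` converts back.

```python
def to_binary(n: int) -> str:
    """Convert a non-negative integer to its base-2 digit string by repeated division."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        digits.append(str(n % 2))  # the remainder is the next lowest-order bit
        n //= 2
    return "".join(reversed(digits))


print(to_binary(17))    # 10001
print(int("10001", 2))  # 17
```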

Business Intelligence (BI)
An interactive process of exploring and analyzing customer data in order to discern trends or patterns resulting in gains such as identifying new sales opportunities and employee commitment.

Business Process
A sequence of defined steps necessary to achieve a business objective. Business objectives can include any business operation, including product design, marketing, sales, finance, accounting, manufacturing, logistics, supply chain management, customer relationship management and other special business relationships.

Business Process Automation (BPA)
The automation of complex business processes and functions beyond conventional data manipulation and record-keeping activities, usually through the use of advanced technologies. It focuses on "run the business" as opposed to "count the business" types of automation efforts and often deals with event-driven, mission-critical, core processes. BPA usually supports an enterprise's knowledge workers in satisfying the needs of its many constituencies.

Business Process Modeling (BPM)
A process that links business strategy to IT system development to ensure business value. It combines workflow, functional, organizational and data/resource views with underlying metrics such as costs, cycle times and responsibilities to provide a foundation for analyzing value chains, activity-based costs, bottlenecks, critical paths and inefficiencies.

Business Process Outsourcing (BPO)
Business Process Outsourcing (BPO) occurs when an organization turns over the management and optimization of a business function to a third party that conducts the activity based on a set of predetermined performance metrics. A BPO vendor manages people and processes, while traditional outsourcers focus on life cycle management and hardware uptime.

Business Process Re-engineering or Redesign (BPR)
This strategy combines process and system change to achieve company goals. Through a fundamental analysis and the redesign of business processes and management systems, companies can often make large gains in productivity and performance.

Business Rules
Policies by which a business is run. Business rules contain constraints on the behavior of the business and are the assertions that define data from a business point of view (e.g., the state code business rule might allow the 50 United States, the District of Columbia and the U.S. territories).
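A minimal sketch of the state code rule expressed as a validation check in Python. The territory list is an assumption (the five inhabited U.S. territories), and `check_state_code` is a hypothetical helper, not part of any product.

```python
STATES = set("""AL AK AZ AR CA CO CT DE FL GA HI ID IL IN IA KS KY LA ME MD
MA MI MN MS MO MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC SD TN TX UT VT VA
WA WV WI WY""".split())
DISTRICT = {"DC"}
TERRITORIES = {"AS", "GU", "MP", "PR", "VI"}  # assumption: inhabited territories only
VALID_STATE_CODES = STATES | DISTRICT | TERRITORIES


def check_state_code(record: dict) -> list:
    """Return a list of rule violations for the record's 'state' field (empty if it passes)."""
    code = record.get("state")
    if code is None:
        return ["state is missing"]
    # Surrounding spaces and letter case are tolerated; anything else must be a listed code.
    if code.strip().upper() not in VALID_STATE_CODES:
        return [f"state {code!r} is not a recognized U.S. code"]
    return []


print(check_state_code({"state": "NY"}))  # []
print(check_state_code({"state": "ZZ"}))  # ["state 'ZZ' is not a recognized U.S. code"]
```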

Business Rule Engine (BRE)
A software tool used to record, track, manage and revise enterprise business processes. Rules are set to stipulate and outline processes, and the BRE "externalizes" these rules for quick and easy modification. BREs (also known simply as "rule engines") can be used independently or in conjunction with other technology - such as business process management (BPM) and business activity monitoring (BAM) tools - to help achieve business goals and enable organizational change. The use of BREs can support business process re-engineering (BPR) and help an enterprise meet operational objectives, such as reducing maintenance costs, facilitating straight-through processing (STP) and enabling exception-based processing.

Component
Technically, a dynamically bindable package of functionality that is managed as a unit and accessed through documented interfaces that can be discovered at runtime. Pragmatically, components tend to fall into two major groups: technical components, which perform a technology-specific task that is application-independent (e.g., a graphical user interface control), and business components, which encapsulate a piece of business functionality.

Content Management System, or CMS
Software that facilitates the maintenance of content on a website.

CDI, or Customer Data Integration
A relatively new buzzword for customer reference data projects.

Critical Success Factors
Key areas of business activity in which favorable results are necessary for a company to reach its goals.

CRM (Customer Relationship Management)
An enterprise-wide strategy and solution that impacts customer-facing departments and processes in a company. It is designed to improve customer service, loyalty and retention, optimize profitability and help companies better manage communication and interaction between their employees and their customers, partners and suppliers. A standard CRM solution includes three modules, Sales Force Automation (SFA), Marketing Automation and Customer Service.

Database
A collection of information that is organized so that it can easily be accessed, managed, and updated. Databases are sometimes classified according to their organizational approach, for example relational databases and object-oriented databases.

Data Analysis
The systematic study of data so that its meaning, structure, relationships, origins, etc. are understood.

Data Profiling

Data Profiling is a process whereby one examines the data available in an existing database and collects statistics and information about that data.
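A minimal Python sketch of the kind of statistics a profiling pass collects for a single column: counts, nulls, distinct values, most frequent values and value lengths. `profile_column` is a hypothetical helper; real profiling tools gather many more measures.

```python
from collections import Counter


def profile_column(values):
    """Collect simple statistics about one column of values."""
    non_null = [v for v in values if v not in (None, "")]  # treat None and "" as missing
    lengths = [len(str(v)) for v in non_null]
    return {
        "count": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "most_common": Counter(non_null).most_common(3),
        "min_length": min(lengths) if lengths else None,
        "max_length": max(lengths) if lengths else None,
    }


# Inconsistent spellings such as "ny" and "New York" show up as extra distinct values.
states = ["NY", "NY", "ny", None, "New York", "NJ"]
profile = profile_column(states)
print(profile)
```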

Data Architecture
The framework for organizing the planning and implementation of data resources. The set of data, processes, and technologies that an enterprise has selected for the creation and operation of information systems.

Data Conversion
The process of changing data from one form of representation to another.

Data Integration
The process of consolidating and managing customer information from all available sources.

Data Loading
The process of populating a data warehouse. It may be accomplished by utilities, user-written programs, or specialized software from independent vendors.

Data Mapping
The process of identifying a source data element for each data element in the target environment.
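A minimal sketch of a source-to-target data map in Python. The legacy field names (`CUST_NM`, `ZIP`, `ST_CD`), the target names and the transformations are hypothetical examples.

```python
# Hypothetical map: each target element names its source element and a transformation.
FIELD_MAP = {
    "customer_name": ("CUST_NM", str.strip),
    "postal_code": ("ZIP", lambda z: z.zfill(5)),  # restore leading zeros lost in the source
    "state": ("ST_CD", str.upper),
}


def map_record(source: dict) -> dict:
    """Build a target record by looking up the mapped source element for each target element."""
    target = {}
    for target_field, (source_field, transform) in FIELD_MAP.items():
        value = source.get(source_field)
        target[target_field] = transform(value) if value is not None else None
    return target


legacy_row = {"CUST_NM": "  Ann Lee ", "ZIP": "2134", "ST_CD": "ma"}
print(map_record(legacy_row))
# {'customer_name': 'Ann Lee', 'postal_code': '02134', 'state': 'MA'}
```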

Data Marts
A repository of data gathered from operational data and other sources that is designed to serve a particular community of knowledge workers. The emphasis of a data mart is on meeting the specific demands of a particular group of knowledge users in terms of analysis, content, presentation, and ease of use.

Data Migration
Also referred to as Data Conversion or Data Import. This process involves moving data from an old system into the new CRM system. Existing data from the old system will be cleansed and mapped to the new CRM system prior to starting this process.

Data Mining
Sorting through data to identify patterns and establish relationships.

Data Modeling
The analysis of data objects that are used in a business or other context and the identification of the relationships among these data objects. Data modeling is a first step in object-oriented programming.

Data Ownership
Responsibility for determining the required quality of the data, for establishing security and privacy for the data and determining the availability and performance requirements for the data. Data originators who have the authority, accountability, and responsibility to create and enforce organizational rules and policies for business data.

Data Quality
The degree of excellence of data. Factors contributing to data quality include: the data is stored according to its data types, is consistent, is not redundant, follows business rules, corresponds to established domains, is timely, is well understood, is complete, contains no duplicate records, and satisfies the needs of the business; and the user is satisfied with the validity of the data and the information derived from it. For example, a customer's name is spelled correctly and the address is correct.
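Several of these factors can be measured directly. The Python sketch below, assuming records are dictionaries, counts records that fail completeness, domain and duplicate checks; the function name, fields and matching key are hypothetical.

```python
def quality_report(records, required, domains, key):
    """Count records that fail completeness, domain, and duplicate checks."""
    report = {"incomplete": 0, "out_of_domain": 0, "duplicates": 0}
    seen = set()
    for rec in records:
        # Completeness: every required field must be present and non-empty.
        if any(rec.get(f) in (None, "") for f in required):
            report["incomplete"] += 1
        # Domain: populated fields must hold one of the allowed values.
        if any(rec.get(f) is not None and rec.get(f) not in allowed
               for f, allowed in domains.items()):
            report["out_of_domain"] += 1
        # Duplicates: compare a normalized matching key.
        k = tuple(str(rec.get(f, "")).strip().lower() for f in key)
        if k in seen:
            report["duplicates"] += 1
        seen.add(k)
    return report


customers = [
    {"name": "Ann Lee", "state": "NY", "email": "ann@example.com"},
    {"name": "ann lee ", "state": "NY", "email": "ann@example.com"},
    {"name": "Bob Ray", "state": "XX", "email": ""},
]
report = quality_report(customers, required=["name", "email"],
                        domains={"state": {"NY", "NJ"}}, key=["name", "email"])
print(report)  # {'incomplete': 1, 'out_of_domain': 1, 'duplicates': 1}
```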

Data Source

An electronic collection of data, usually in the form of a database (such as a relational, object-oriented or flat-file database), but sometimes in the form of a program.

Data Standardization

A data processing method used to reduce or eliminate custom, one-time and seldom-used data elements that introduce variability and can add costs and data quality problems.
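A minimal Python sketch of standardization, assuming a hypothetical synonym table for state names and a ten-digit North American phone format. Values the rules do not recognize are returned unchanged so they can be reviewed.

```python
import re

# Hypothetical lookup table: many source spellings collapse to one standard code.
STATE_SYNONYMS = {
    "new york": "NY", "n.y.": "NY", "ny": "NY",
    "new jersey": "NJ", "n.j.": "NJ", "nj": "NJ",
}


def standardize_state(raw: str) -> str:
    """Map a free-text state value to its standard code, collapsing extra whitespace."""
    key = re.sub(r"\s+", " ", raw).strip().lower()
    return STATE_SYNONYMS.get(key, raw)  # unrecognized values pass through for review


def standardize_phone(raw: str) -> str:
    """Rewrite a 10-digit number (optionally prefixed with country code 1) as (NXX) NXX-XXXX."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    if len(digits) != 10:
        return raw
    return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"


print(standardize_state("  New   York "))  # NY
print(standardize_phone("718.555.0100"))   # (718) 555-0100
```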

Operational Data Store (ODS)
A type of database often used as an interim area for a data warehouse. An ODS is designed to quickly perform relatively simple queries on small amounts of data, rather than the complex queries on large amounts of data typical of the data warehouse.

Data Synchronization
A form of embedded middleware that allows applications to update data on two systems so that the data sets are identical. These services can run via a variety of different transports but typically require some application-specific knowledge of the context and notion of the data being synchronized.
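A simplified, one-way sketch in Python: it computes the inserts, updates and deletes that make a target data set identical to a source data set, then applies them. Real synchronization services also handle two-way changes and conflicts; the `sync` function and sample records are hypothetical.

```python
def sync(source: dict, target: dict) -> dict:
    """Make `target` identical to `source` (both keyed by record ID) and report the changes."""
    # Compute every change before mutating the target.
    changes = {
        "insert": [k for k in source if k not in target],
        "update": [k for k in source if k in target and source[k] != target[k]],
        "delete": [k for k in target if k not in source],
    }
    for k in changes["insert"] + changes["update"]:
        target[k] = dict(source[k])  # copy so the two systems do not share objects
    for k in changes["delete"]:
        del target[k]
    return changes


crm = {1: {"name": "Ann"}, 2: {"name": "Bob Ray"}, 3: {"name": "Cy"}}
erp = {1: {"name": "Ann"}, 2: {"name": "Bob"}, 4: {"name": "Dee"}}
print(sync(crm, erp))  # {'insert': [3], 'update': [2], 'delete': [4]}
print(erp == crm)      # True
```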

Data Transfer
Data transfer is any item (graphic, sound, HTML, file or database file) that is delivered from your account on the IT Dimensions Web Hosting web server to a visitor through your web pages. If your web page is 10 KB, each time a web browser downloads it, 10 KB of the data transfer quota is used. For an account with a monthly quota of 5 GB of data transfer, this page would have to be downloaded 500,000 times in the month to reach the quota.
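The quota arithmetic, worked in Python. It assumes decimal units (1 GB = 1,000,000 KB), which is what the 500,000 figure implies; with binary units (1 GB = 1,048,576 KB) the result differs slightly.

```python
page_size_kb = 10
quota_gb = 5
KB_PER_GB = 1_000_000  # assumption: decimal units

downloads_to_quota = quota_gb * KB_PER_GB // page_size_kb
print(downloads_to_quota)  # 500000

# The same quota measured in binary units allows a few more downloads.
print(quota_gb * 1024 * 1024 // page_size_kb)  # 524288
```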

Data Visualization
This term refers to presenting data and summary information using graphics, animation, 3-D displays, and other multimedia tools.

Data Warehouse
Central repository for all or significant parts of the data that an enterprise's various business systems collect. Data from various online transaction processing applications and other sources is selectively extracted and organized on the data warehouse database for use by analytical applications and user queries.

Data Warehouse Integration

The process of reconciling each data warehouse increment with the strategic data warehouse architecture.

Database Conversion
A process of changing one database format to another. The database could be an RDBMS (a relational database such as Oracle, MySQL, Postgres, Sybase or MS SQL Server), a desktop database (MS Access, FoxPro) or a file-based format such as XML, XLS, CSV, tab-delimited text or HTML. The process can include transformation, data cleansing, de-duplication, structure changes and other steps.
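A minimal Python sketch of one such conversion, from delimited text (CSV or tab-delimited) to a SQLite table, with whitespace trimming and exact-duplicate removal as the cleansing steps. `csv_to_sqlite` is a hypothetical helper; table and column names are assumed to come from a trusted source, since they are inserted into the SQL text.

```python
import csv
import io
import sqlite3


def csv_to_sqlite(text, conn, table, delimiter=","):
    """Load delimited text into a new SQLite table; return the number of rows loaded."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    columns = reader.fieldnames
    conn.execute(f"CREATE TABLE {table} ({', '.join(c + ' TEXT' for c in columns)})")
    placeholders = ", ".join("?" for _ in columns)
    seen, loaded = set(), 0
    for row in reader:
        values = tuple((row[c] or "").strip() for c in columns)  # trim stray whitespace
        if values in seen:
            continue  # de-duplication: skip rows already loaded
        seen.add(values)
        conn.execute(f"INSERT INTO {table} VALUES ({placeholders})", values)
        loaded += 1
    conn.commit()
    return loaded


conn = sqlite3.connect(":memory:")
loaded = csv_to_sqlite("name,state\nAnn,NY\nAnn ,NY\nBob,NJ\n", conn, "customers")
print(loaded)  # 2
print(conn.execute("SELECT * FROM customers ORDER BY name").fetchall())
# [('Ann', 'NY'), ('Bob', 'NJ')]
```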

Database Management System
A program that lets one or more computer users create and access data in a database.

Decision Systems
Computer-based programs and technologies intended to make routine decisions, monitor and control processes, and aid or assist decision makers in semi-structured and/or non-routine decision situations.

Decision Support Systems (DSS)
Interactive computer-based systems intended to help decision makers utilize data and models to identify and solve problems and make decisions.

Drill Down Reporting
Reporting based on an analytical technique that lets a user navigate among levels of data, ranging from the most summarized (up) to the most detailed (down).
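A minimal Python sketch of drilling down, assuming a hypothetical three-level hierarchy (region, state, city) over in-memory sales rows. Each call returns totals one level below the selected path.

```python
from collections import defaultdict

SALES = [
    {"region": "East", "state": "NY", "city": "Astoria", "amount": 120},
    {"region": "East", "state": "NY", "city": "Buffalo", "amount": 80},
    {"region": "East", "state": "NJ", "city": "Newark", "amount": 50},
    {"region": "West", "state": "CA", "city": "Fresno", "amount": 70},
]
LEVELS = ["region", "state", "city"]  # most summarized -> most detailed


def drill(rows, path=()):
    """Totals at the level below `path`; an empty path gives the most summarized level."""
    level = LEVELS[len(path)]
    totals = defaultdict(int)
    for row in rows:
        # Keep only rows that match every value chosen so far on the way down.
        if all(row[LEVELS[i]] == value for i, value in enumerate(path)):
            totals[row[level]] += row["amount"]
    return dict(totals)


print(drill(SALES))                  # {'East': 250, 'West': 70}
print(drill(SALES, ("East",)))       # {'NY': 200, 'NJ': 50}
print(drill(SALES, ("East", "NY")))  # {'Astoria': 120, 'Buffalo': 80}
```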

Dynamic content
Web page content that changes or is changed automatically based on database content or user information. You can usually spot dynamic sites when the URL ends with .asp, .cfm, .cgi or .shtml, but it's also possible to serve dynamic content with standard static pages (.htm or .html). Many search engines index dynamic content, but some don't if there's a "?" character in the URL.


Data Quality

EAI - Enterprise Application Integration
Acronym for enterprise application integration. Originally defined as technology that connected enterprise-wide systems within a company, it has evolved to refer to technologies used to connect systems wherever they may be located.

EJB - Enterprise JavaBeans
EJB is a powerful component model for server-based applications as defined by Sun's reference. This framework provides the basis for component-based, multi-tier applications that benefit from the "Write Once, Run Anywhere"™ capability inherent in all Java-based programs. EJBs provide server-side functionality while separating the presentation layer from the business layer, simplifying application development and speeding deployment.

End User
An individual who uses a computer to perform a business or personal activity. Technical personnel are generally not considered end users when they are programming or operating the computer for technical purposes, though they are when they perform other tasks.

Enterprise Application
A software product designed to integrate computer systems that run all phases of an enterprise's operations to facilitate cooperation and coordination of work across the enterprise. The intent is to integrate core business processes (such as sales, accounting, finance, human resources, inventory and manufacturing). The ideal enterprise system could control all major business processes in real time via a single software architecture. Enterprise software is expanding its scope to link the enterprise with suppliers, business partners and customers.

ERM (Enterprise Relationship Management)
An enterprise-wide strategy and solution that impacts a company's "back office". It is designed to improve the management and flow of these operations by integrating and automating back office departments and processes.

ERP (Enterprise Resource Planning)
Equivalent to ERM, although it refers more specifically to operational planning and resource optimization.

ETL (extraction, transformation and loading)
Tools for extracting data and its metadata from one data store, transforming the record structure and content of this data, and loading the transformed data to another data store. These tools are sometimes referred to as extraction/transformation/transport or ETT technology.
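A minimal ETL sketch in Python, using two in-memory SQLite databases as the source and target data stores. The table layouts and transformation rules (splitting a full name, upper-casing state codes) are hypothetical, and metadata handling is omitted.

```python
import sqlite3


def extract(conn):
    """Read the raw records from the source store."""
    return conn.execute("SELECT id, full_name, state FROM legacy_customers").fetchall()


def transform(rows):
    """Restructure each record: split the full name, clean and upper-case the state code."""
    out = []
    for cid, full_name, state in rows:
        first, _, last = full_name.strip().partition(" ")
        out.append((cid, first, last, (state or "").strip().upper()))
    return out


def load(conn, rows):
    """Write the transformed records to the target store."""
    conn.executemany(
        "INSERT INTO customers (id, first_name, last_name, state) VALUES (?, ?, ?, ?)", rows)
    conn.commit()


source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE legacy_customers (id INTEGER, full_name TEXT, state TEXT)")
source.executemany("INSERT INTO legacy_customers VALUES (?, ?, ?)",
                   [(1, "Ann Lee", "ny"), (2, "Bob Ray ", " nj")])
target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE customers "
               "(id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT, state TEXT)")

load(target, transform(extract(source)))
rows = target.execute("SELECT * FROM customers ORDER BY id").fetchall()
print(rows)  # [(1, 'Ann', 'Lee', 'NY'), (2, 'Bob', 'Ray', 'NJ')]
```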

Exception Reporting
A reporting philosophy and approach that supports Management by Exception. Reports should be designed to display significant exceptions in results and data. The idea is to "flag" important information and bring it quickly to the attention of managerial users of the report.
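A minimal sketch of exception reporting in Python: only items whose actual results deviate from the expected value by more than a tolerance are reported, largest deviation first. The 10% tolerance, the `exception_report` function and the sample figures are assumptions for illustration.

```python
def exception_report(results, expected, tolerance=0.10):
    """Flag items whose actual result deviates from expected by more than `tolerance`."""
    flagged = []
    for name, actual in results.items():
        target = expected[name]
        deviation = (actual - target) / target
        if abs(deviation) > tolerance:
            flagged.append((name, actual, target, round(deviation * 100, 1)))
    # Put the most significant exceptions at the top of the report.
    return sorted(flagged, key=lambda item: abs(item[3]), reverse=True)


monthly_sales = {"East": 98_000, "West": 60_000, "North": 130_000}
budget = {"East": 100_000, "West": 80_000, "North": 100_000}
for name, actual, target, pct in exception_report(monthly_sales, budget):
    print(f"{name}: {actual:,} vs {target:,} ({pct:+}%)")
```

East (2% under budget) stays off the report; only North and West are flagged.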

Extranet
A collaborative, Internet-based network that facilitates intercompany relationships by linking an enterprise with its suppliers, customers or other external business partners. Extranets use Internet-derived applications and technology to provide secured extensions of internal business processes to external business partners.

Front Office Solution
Software applications designed to assist organizations with the management of tasks and processes related to customer-facing departments. This is usually a CRM solution, but may include any other applications used in the customer lifecycle.

Group Decision Support Systems (GDSS)
An interactive, computer-based system that facilitates solution of unstructured problems by a set of decision-makers working together as a group. It aids groups, especially groups of managers, in analyzing problem situations and in performing group decision making tasks.

Groupware
Software designed to support more than one person working on a shared task. Groupware is an evolving concept that is more than multi-user software which allows access to the same data. Groupware provides a mechanism that helps users coordinate and keep track of ongoing projects. It allows people to work together through computer-supported communication, collaboration, and coordination.

Heterogeneous systems

Systems that contain more than one kind of database [RDBMS], more than one kind of middleware or more than one type of operating system

Heuristic Data Cleansing

Using speculative, rule-of-thumb strategies to solve data quality problems.
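One common heuristic is approximate string matching. This Python sketch uses the standard library's `difflib.SequenceMatcher` to flag probable duplicate names above a similarity threshold; the 0.85 threshold and the normalization rules are arbitrary assumptions, and flagged pairs still need human review.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lower-case, drop periods and commas, and collapse whitespace."""
    return " ".join(name.lower().replace(".", "").replace(",", "").split())


def likely_duplicates(names, threshold=0.85):
    """Return pairs of names whose normalized similarity ratio is at least `threshold`."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            score = SequenceMatcher(None, normalize(names[i]), normalize(names[j])).ratio()
            if score >= threshold:
                pairs.append((names[i], names[j], round(score, 2)))
    return pairs


pairs = likely_duplicates(["Jon Smith", "John Smith", "Mary Jones"])
print(pairs)  # [('Jon Smith', 'John Smith', 0.95)]
```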

Information
Data that has been processed to add or create meaning and hopefully knowledge for the person who receives it. Information is the output of information systems.

Information Retrieval
The study of systems for indexing, searching, and recalling data, particularly text or other unstructured forms.

Information Systems Architecture
A formal definition of the business processes and rules, systems structure, technical framework, and product technologies for business information systems. An information systems architecture consists of four layers: business process architecture, systems architecture, technical architecture, and product delivery architecture.

Integration
The activity of combining data from multiple data sources to present a single collection of data to the warehouse.

The division of a project in which functionality is provided to the users in a series of phases.

Intranet
An internal organizational network with at least one web server that is only accessible by an organization's members or others who have specific authorization. A firewall and password protection limit access to the network. The intranet is used to share corporate information.

Interface
The method by which you control anything. The screen is the interface to your computer, just as a dashboard is the interface to your car and a doorknob is the interface to a door.

JavaScript
A client-side programming or scripting language. It is used to create interactive and dynamic effects on a web page, as well as to handle and manipulate form data. JavaScript is a separate language from Java.

JVM - Java Virtual Machine
The JVM is an abstract computing machine, or virtual machine. It is a platform-independent execution environment that converts Java bytecode into machine language and executes it. Using a JVM, you can run Java code on many different computer platforms, including Macintosh, Windows 95, and Unix. JVMs execute bytecode instructions by interpreting them one at a time or by compiling them to native code at run time.

Keyword search
A search for documents containing one or more words specified by a user in a search engine text box.

Knowledge Base
A collection of facts, rules, and procedures organized into schemas. The assembly of all the information and knowledge of a specific field of interest.

Knowledge Management
The formal strategy and software designed to manage and leverage a company's intellectual assets. This strategy promotes a collaborative and integrative approach to the creation, capture, organization, access and use of information assets. In CRM systems, a product information Knowledge Base is used by customer service or by customers directly in a self-service model.

Knowledge Transfer
The act of transferring knowledge from one individual to another by means of mentoring, training, documentation, and other collaboration.

Legacy Applications and Data
Applications and data inherited from languages, platforms, and techniques earlier than current technology. Currently, many companies are migrating their legacy applications to new programming languages and operating systems that follow open or standard programming interfaces. This will make it easier in the future to update applications without having to rewrite them entirely and will allow a company to use its applications on any operating system.

Legacy Data
Critical organizational data stored in mainframes and minicomputers (legacy systems).

Legacy system

An existing system that is designated for closure when its capability is absorbed by an interim or core system. It is not cost-effective to modify or enhance legacy systems.

Legacy system takedown

A gradual process of eliminating a legacy system. During this process a new system runs in pilot mode and the data are converted or synchronized with the new system. Usually, the legacy system is replaced by a newer, better-performing system.

Libraries (queries and reports)
Sets of programs that have been created, fully tested, quality assured, documented, and made available to the user community. The programs in these libraries are variously called canned, predefined, parameterized, or skeleton queries/reports. They are launched by the user, who only enters a variable such as a date, region number, range of activity or some other set or sets of values the program needs to generate a query or report.

Logical Data Model
An abstract formal representation of the categories of data and their relationships in the form of a diagram, such as an entity-relationship diagram. A logical data model is process independent, which means that it is fully normalized, and therefore does not represent a process dependent (e.g. access-path) database schema.

Metadata or Meta Data
Data about the data in a data warehouse. Metadata provides a directory to help to locate the contents of the data warehouse; it is a guide to mapping data as it is transformed from the operational environment to the data warehouse environment; and it serves as a guide to the algorithms used for summarization of current detailed data. Metadata is semantic information associated with a given variable. Metadata must include business definitions of the data and clear, accurate descriptions of data types, potential values, original source system, data formats, and other characteristics. Metadata defines and describes business data. Examples of metadata include data element descriptions, data type descriptions, attribute/property descriptions, range/domain descriptions, and process/method descriptions. The repository environment encompasses all corporate metadata resources: database catalogs, data dictionaries, and navigation services. Metadata includes things like the name, length, valid values, and description of a data element. Metadata is stored in a data dictionary and repository. It insulates the data warehouse from changes in the schema of operational systems.

Methodology
A system of principles, practices, and procedures applied to a specific branch of knowledge.

Metrics
A set of traditional and non-traditional business measurements such as rating customer satisfaction and order throughput time. A critical aspect of a CRM strategy and solution is the definition, tracking and reporting of a company's metrics.

Middleware
A communications layer that allows applications to interact across hardware and network environments.

Milestone
A tangible event used to measure the status of the project; a marker during the execution of a project that shows the project is moving in the right direction.

Model Base
A collection of preprogrammed quantitative models (e.g., statistical, financial, optimization) organized as a single unit.

Master Data Management (MDM)
Another fancy term for reference data management.

Modeling Tools
Software programs that help developers/users build mathematical models quickly. Spreadsheets and planning languages like IFPS are modeling tools.

MySQL
A popular open source SQL (Structured Query Language) database implementation, available for many platforms, including Windows, Unix/Linux and Mac OS X.

Multidimensional database (MDB)
A type of database that is optimized for data warehouse and OLAP applications. Conceptually, a multidimensional database uses the idea of a data cube to represent the dimensions of data available to a user. For example, "sales" could be viewed in the dimensions of product model, geography, time, or some additional dimension.

Object-oriented Database Management System (ODBMS)
A database management system that supports modeling and creation of data as objects.

Open Source
Programming code that can be read, viewed, modified, and distributed by anyone who desires.

Optimization
The decision strategy of choosing the alternative that gives the best or optimal overall value.

Perl
An acronym for Practical Extraction and Report Language. Perl is a very popular and powerful scripting language used for web applications; one of its strengths is its fast and effective use of regular expressions.

Pilot Conversion
The new system is installed for a few users, who evaluate it and help decide whether it is suitable for the rest of the organization to follow suit. This method is handy for new products, as it ensures functionality is at a level that can perform in real operation.

Project plan
A management document describing the approach taken for a project. The plan typically describes work to be done, resources required, methods to be used, the configuration management and quality assurance procedures to be followed, the schedules to be met, the project organization, etc. Project in this context is a generic term. Some projects may also need integration plans, security plans, test plans, quality assurance plans, etc.

Proof of Concept
Software trial that allows a prospect to try out the product before buying it. Delivers a realistic slice of functionality and is often used as the foundation for the first application. A quickly built system to show the capabilities of an idea. A proof of concept should not become a live system, but usually does. The terms pilot, proof of concept and prototype are sometimes used synonymously.

Prototyping
A strategy in system development in which a scaled-down system or portion of a system is constructed in a short time, tested, and improved in several iterations. A prototype is an initial version of a system that is quickly developed to test the effectiveness of the overall design being used to solve a particular problem.

Quality Assurance
The department, role or process responsible for validating that which is proposed to ensure a correct outcome. The planned and systematic activities to provide confidence that a product or service will fulfill requirements for quality.

Query
Generically, a query is a question. Usually it refers to a complex SQL SELECT statement for decision support. See Ad-Hoc Query or Ad-Hoc Query Software.

Reference Data

Database(s) or datasets that uniquely identify key entities for a business. Reference data can be customer records, security reference and other support reference files.

Rapid Application Development (RAD)
Part of a methodology that specifies incremental development with constant feedback from the customers. The point is to keep projects focused on delivering value and to keep clear and open lines of communication. Oral and written communication is not completely adequate for specification of computer systems. RAD overcomes the limitations of language by minimizing the time between concept and implementation.

Real Time
Data that is captured and made available as it happens. Real-time data reflects the latest status of the organization's operational transaction data: the current moment in time. Real time refers to what is happening to any piece of data right now. For analysis, some people want to see current rather than historical data, which is what most data warehouses hold.

Recursive Relationship
A relationship between two instances of the same entity, as in "recursive data design".
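A minimal sketch of a recursive relationship in SQLite (via Python's sqlite3 module), assuming a hypothetical `employee` table in which `manager_id` points to another row of the same table. A recursive common table expression then walks up the management chain.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# manager_id refers back to the same table: an employee's manager is also an employee.
conn.execute("""CREATE TABLE employee (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    manager_id INTEGER REFERENCES employee(id))""")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [(1, "Dana", None), (2, "Eli", 1), (3, "Fay", 2)])

# Start at one employee and repeatedly join to that employee's manager.
chain = conn.execute("""
    WITH RECURSIVE chain(id, name, manager_id, depth) AS (
        SELECT id, name, manager_id, 0 FROM employee WHERE name = ?
        UNION ALL
        SELECT e.id, e.name, e.manager_id, c.depth + 1
        FROM employee e JOIN chain c ON e.id = c.manager_id)
    SELECT name FROM chain ORDER BY depth""", ("Fay",)).fetchall()
names = [name for (name,) in chain]
print(names)  # ['Fay', 'Eli', 'Dana']
```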

Relational Database
A collection of data items organized as a set of formally described tables from which data can be accessed or reassembled in many different ways without having to reorganize the database tables.

Relational Database Management System (RDBMS)
A program that lets you create, update, and administer a relational database. Most commercial RDBMSs use the Structured Query Language (SQL) to access the database.

SAP
SAP applications, built around their latest R/3 system, provide the capability to manage financial, asset, and cost accounting, production operations and materials, personnel, plants, and archived documents.

Scalability
The ability to scale hardware and software to support larger or smaller volumes of data and more or fewer users. The ability to increase or decrease size or capability in cost-effective increments with minimal impact on the unit cost of business and the procurement of additional services.

Server Cluster
A cluster is a group of physical servers that run front-end web server software and contain back-end database-type facilities. A cluster can be as simple as two servers, one front end and one back end, but normally a cluster has several servers. Clusters contain facilities for services such as load balancing and data replication. Clusters offer enhanced security, redundancy and performance and are a natural evolution for any successful complex web site or Internet-based application.

Service Oriented Architecture (SOA)
SOA comprises loosely coupled, highly interoperable software components that allow an application to be flexible and to respond more quickly to changes in your business.

Shared Server
Shared web servers are a very popular way of providing low-cost web hosting services. Instead of requiring a separate computer for each site, dozens of sites can co-reside on the same computer. In most cases, performance is not affected and each web site behaves as if served by a dedicated server.

SQL (Structured Query Language)
A standard interactive and programming language for getting information from and updating a database. Queries take the form of a command language that lets you select, insert, update, find out the location of data, and so forth.

Strategic Planning
A decision-making process in which decisions are made about establishing organizational purposes/mission, determining objectives, selecting strategies and setting policies.

Systems Development Life Cycle (SDLC)
A process by which systems analysts, software engineers, programmers, and end users build systems. It is a project management tool used to plan, execute, and control systems development projects. The steps in the cycle include: 1) Determine user requirements; 2) Systems analysis; 3) Overall system design; 4) Detailed system design; 5) Programming; 6) Testing; and 7) Implementation.

System Integration

A process that makes two or more software applications work together. Data conversion and synchronization processes are part of system integration.

Total Cost of Ownership (TCO)
TCO is a type of calculation designed to help consumers and enterprise managers assess both direct and indirect costs and benefits related to the purchase of any IT component. The intention is to arrive at a final figure that will reflect the effective cost of purchase, all things considered.

Unix
A computer operating system initially designed with the objective of creating an OS written in a high-level language rather than assembly. A majority of web servers currently run on different "flavors" of this high-performance OS, or on Linux, developed as a Unix-like operating system.

User-Friendly
An evaluative term for a Decision Support System's user interface. The phrase indicates that users judge the user interface to be easy to learn, understand, and use.

User Interface
The component of a computerized support system that allows bidirectional communication between the system and its user. This is also called the dialogue component of a DSS. An interface is a set of commands or menus through which a user communicates with a program.

Web-Based Application
A computerized system or tool that uses a "thin-client" web browser like Netscape Navigator or Internet Explorer. The server hosting the application is linked to the user's computer by a network using the TCP/IP protocol.

Workflow
The automation of a process or series of processes through the linking of tasks and activities. The software that automatically routes tasks, notifications and records to predefined or user-selected destinations such as users, departments, or business units.

© 2015 IT Dimensions, Inc. All Rights Reserved
If your DQ initiative failed, call us: 1-718-777-3710 | Astoria | New York | USA