Data preparation involves the specification, cleansing,
and distillation of one or more source data feeds into a central data repository
that the dashboard solution will use for its calculations. Specific feeds
are identified, translation and validation rules are created, and in most
instances raw source data is transformed into one or more high-performance
OLAP cubes. Data preparation is usually completed in partnership with a
client’s technical team. This process is not unlike data preparation efforts
for other reporting or database integration projects. Specific responsibilities
are usually dictated by technical resource availability, source of data,
and existing BI infrastructure.
Click on any Step above to see details
Dimensional Modeling
In this stage of the process we look at all of the
ways in which we will have to query and filter data based on the working
prototypes developed in the previous phase.
A dimensional model takes one or more data feeds from source transactional
systems and creates a data schema that is based on the concept of "measures"
and "dimensions." A measure is numeric piece of data that is relevant to
our dashboard, and a dimension is a mechanism by which we can slice and
filter the results of our measures. Typically this is accomplished with
a modified star or snowflake schema.
The end result of this stage is a comprehensive data model that integrates
all source system data into one high-performance data repository that our
dashboards can use to render their data visualizations.
Data Feed Definition
While the working prototypes provide a set of functional requirements for our Dimensional Model, the Dimensional Model helps define the specifications for the format of the source data feeds.
Based on the number of measures, dimensions, and the frequency of the data, we create a set of data feed definition files which clearly enumerate all data required for the system to operate on a field by field basis. This document is often used by the client's internal IT department as the basis for developing their source data feeds.
Data Extracts
Data Extracts are the execution of the Data Feed definitions against source systems. This step is most often performed by the client’s internal IT department as it requires a deep understanding of the source system data and structures.
The client uses the data feed definition as guide in building their source data extracts, and tests the accuracy of their extracts against the data feed definition..
ETL & Staging.
Once data extracts have been provided (either in
sample form or live) we then build the "Extraction - Transformation - Load"
routines that take the data extracts and transform them into our dimensional
model.
The dimensional model itself is a relational database and used as
a staging area for batch updates prior to us moving the data into an OLAP
engine.
This ETL step is where all the critical logic takes place to make sure all of the source systems data forms the necessary relationships between each other. This is also the place where
we validate the accuracy of the data and perform the needed logic to handle such things as slowly changing dimensions and exception rules.
OLAP Solution
OLAP stands for "On-Line Analytical Processing" and is a specialized database primarily used to pre-calculate the aggregates/sums/groupings of the intersection of each dimension and measure in the dimensional model.
Often times this database is referred to as a "cube" as the data is no longer in a 2 dimensional row/column format.
OLAP solutions are often necessary when having to analyze large sets of data in aggregate form. These data cubes are usually necessary to build the high performing foundation for dashboards to calculate, render, and provide on demand interactivity at high speed.