Data Acquisition Improvements


Part of maintaining the Data Warehouse means keeping data current. We achieve this by processing and loading new or changed data from the source applications on a nightly basis. We report the most recent data load completion times in the Data Availability table on the System Status page.

Since its inception, Decision Support worked toward the goal of having data consistently loaded and available by 8 a.m. We are pleased to announce our efforts have payed off and we have reached our goal! We aren't done yet, though. We will continue work toward additional improvements.

Overall Progress

As of March 14, 2008, the Data Warehouse processing has been consistently completing more than 7 hours earlier than before these changes were implemented, bringing us 100% toward our goal.

Current Efforts
The current phase of our processing improvement plan focuses on eliminating a processing step during the data acquisition (whenever possible), optimizing job schedules, and tuning long-running Data Warehouse processing programs.

I. Eliminate a Processing Step
Previously, acquiring data involved three separate steps for most tables:
1. Extract data from the source tables (e.g., Banner)
2. Transform the data from the source tables format to the target tables format
3. Load the formatted data into the production environment users access

In most cases, we are eliminating the third step and are instead loading the data directly to the final destination as part of the second step. This process reduces the overall Data Warehouse processing time and frees machine/database resources for other work, as described below.


As of March 14, 2008, we have applied this change to 73% of the Data Warehouse table load processes (509 of 700).

II. Optimizing Job Schedules
We have identified 23 opportunities to realign processing schedules to allow more jobs to run at the same time. This utilizes the machine/database resources freed up by the elimination of the processing step described above, and allows the overall process to complete sooner.


As of March 14, 2008, we have implemented 77% of these schedule changes (20 of 26).

III. Tuning Long-Running Data Warehouse Processing Programs
We have identified the 20 longest-running ETL programs. Each will be analyzed for opportunities to improve efficiency and reduce run time. This will allow jobs to complete sooner, thereby improving the completion time of the overall process.


As of March 14, 2008, we have implemented tuning changes for 40% of the long-running processing programs (8 of 20).

Improvement Forecast
We are pleased to announce that we have exceeded our goals for the third quarter, and the daily process to load new or changed data from the source applications is now consistently completing before 8 a.m.!

Decision Support would like to thank everyone who has been working so hard to make these significant improvements possible, including the DBA and Production Scheduling groups in AITS who have been assisting us with these efforts. The team has done an excellent job of finding creative ways to improve the processing completion times, and we commend them for their persistence and skill.

We look forward to making additional improvements in the processing run times. New goals will be set and work planned to reach them. If you have any questions or comments, we invite you to contact us.

Related Information