OO MSSQL DB performance issues

  • KM03208776
  • 24-Jul-2018

Archived Content: This information is no longer maintained and is provided "as is" for your convenience.




Over the past week we have been experiencing severe OO database performance issues. Prior to this week, our system could easily run 35,000+ workflows per day, but lately it has not been able to keep up. We've noticed the DB server is almost always pegged at 100% CPU usage.
In order to keep the system running we have had to clear the running queue several times per day using a stored procedure provided by support.
We suspect the performance issues are due to very large tables and poor database maintenance. Most of the OO tables had significant fragmentation (>90%), so we stopped our OO Central servers for several hours to rebuild most of the highly fragmented indexes. We were able to do this for many of the large tables, but not for the largest table, OO_STEP_LOG_BINDINGS. That brings up question #1.
1a) How important is OO_STEP_LOG_BINDINGS for performance?
1b) Would rebuilding the indexes on this table fix our performance issues?
Currently our OO_STEP_LOG_BINDINGS table contains over 2 billion records.
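For reference, the fragmentation check and rebuild described above can be sketched in T-SQL roughly as follows. This is a generic SQL Server pattern, not an OO-specific procedure; the table name comes from this environment, and the 30% threshold and the ONLINE option are assumptions (ONLINE = ON requires Enterprise Edition; on other editions the rebuild blocks access to the table).

```sql
-- Sketch: find fragmented indexes on the problem table...
SELECT i.name AS index_name,
       ps.avg_fragmentation_in_percent,
       ps.page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID('dbo.OO_STEP_LOG_BINDINGS'),
         NULL, NULL, 'LIMITED') AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id AND i.index_id = ps.index_id
WHERE ps.avg_fragmentation_in_percent > 30;   -- assumed threshold

-- ...and rebuild them (ONLINE = ON only on Enterprise Edition):
ALTER INDEX ALL ON dbo.OO_STEP_LOG_BINDINGS
REBUILD WITH (ONLINE = ON, SORT_IN_TEMPDB = ON);
```

On a table with 2 billion rows the rebuild itself can run for hours and needs substantial tempdb and log space, which is consistent with not having managed to finish it in the maintenance window described above.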
When our DB server's CPU is pegged at 100%, we've noticed the following query is always running. The query seems to run for several minutes, longer depending on the number of items in the "EXECUTION_ID IN ()" clause; the IN clause often contains several hundred items, sometimes up to 999. While it is running, the Central servers' activity is noticeably reduced. It seems the Central servers are almost hung waiting for this query to complete.
select distinct hibsteplog0_.EXECUTION_ID as col_0_0_,
                hibsteplog0_.INVOKED_UUIDS as col_1_0_
from OO_STEP_LOG_STARTED hibsteplog0_
where (hibsteplog0_.EXECUTION_ID in ( @P0 , @P1 , <long_list_of_items> ))
  and (hibsteplog0_.INVOKED_UUIDS is not null)
2) What is this query doing, and how can we prevent it from hanging the system?
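A standard way to confirm that this statement really is the one burning the CPU is to query the SQL Server execution DMVs while the server is pegged. This is a generic diagnostic sketch (it requires the VIEW SERVER STATE permission), not an OO-supplied tool:

```sql
-- Sketch: show the currently executing statements, most CPU-hungry first.
SELECT r.session_id,
       r.cpu_time,              -- ms of CPU consumed so far
       r.total_elapsed_time,    -- ms since the request started
       r.wait_type,
       SUBSTRING(t.text,
                 r.statement_start_offset / 2 + 1,
                 (CASE WHEN r.statement_end_offset = -1
                       THEN DATALENGTH(t.text)
                       ELSE r.statement_end_offset
                  END - r.statement_start_offset) / 2 + 1) AS running_statement
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.session_id <> @@SPID
ORDER BY r.cpu_time DESC;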
FYI: we have been using the OO database maintenance workflows to purge workflow history, but lately these workflows have been running constantly and exiting with errors. It seems the purge workflows are not able to keep up with the incoming data. We are looking at the DB maintenance scripts to see whether they provide any performance benefit over the maintenance workflows.


Mitigating the database size in OO 10
OO 10 comes with a new architecture that allows for a very scalable and easy-to-deploy installation. One aspect of this architecture that can be significantly noticeable for some customers is the increase in the size of the database used by OO (database sizes of 200 GB are not uncommon).
In OO 9, the information required by the run engine to execute flow logic was stored in RAM on the host machine. This had advantages but also disadvantages, especially when a scalable or highly available OO deployment was required, because OO had to rely on third-party technologies (Terracotta). To decouple OO from third-party technologies, in OO 10 all information required to execute flow logic is stored in the database. A consequence of this design is that after a flow finishes, this information, which is otherwise no longer needed, remains persisted in the database.
The table consuming most of the database disk space is OO_STEP_LOG_BINDINGS. This table contains all inputs and results of all run steps that have already ended, with a separate entry per input/result.
Values of some inputs or results may be BLOBs. If there are worker failures and recoveries during a run, some steps will be duplicated together with all their bindings, which adds to the table growth. So the table size, and by implication the database size, does not depend on flow complexity; rather, it is a function of the number of actual steps and the number and size of inputs/results. A very simple flow with a loop and large inputs/results can cause the table to grow faster than a more complex flow would.
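To see where the space is actually going, the OO tables can be ranked by reserved space with a generic SQL Server query such as the sketch below (standard DMV usage, not an OO tool; run in the OO database):

```sql
-- Sketch: rank user tables by reserved space and report row counts.
SELECT o.name AS table_name,
       -- row_count only from the heap/clustered index (index_id 0 or 1),
       -- so nonclustered index entries are not double-counted:
       SUM(CASE WHEN ps.index_id IN (0, 1) THEN ps.row_count ELSE 0 END) AS row_count,
       SUM(ps.reserved_page_count) * 8 / 1024 AS reserved_mb   -- 8 KB pages
FROM sys.dm_db_partition_stats AS ps
JOIN sys.objects AS o
  ON o.object_id = ps.object_id
WHERE o.is_ms_shipped = 0
GROUP BY o.name
ORDER BY reserved_mb DESC;
```

In an installation matching the description above, OO_STEP_LOG_BINDINGS would be expected to dominate this list.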
The OO R&D team has developed several tools and libraries to cope with database growth that can otherwise become unmanageable.
0. Upgrade to latest OO version
The first and foremost recommendation is to upgrade OO to the latest available version. Database size is one of OO R&D's top priorities, and each version brings improvements in the database area (for instance, OO 10.20 comes with an improvement of around 10%).
1. Database guide
Available at https://hpln.hp.com/node/21996/attachment, this document is the first step when dealing with database issues. It contains extensive information on configuring the database schema used by OO and covers all supported database vendors.
2. Tutorial: Controlling the database size
Available at https://hpln.hp.com/node/22841/attachment, this document provides an overview of options and actions that can be taken to reduce the database size
3. Database purging scripts
Available for download at https://hpln.hp.com/node/21/otherfiles/?dir=22470, this is a set of scripts, specific to each database vendor, that should be handed to a DBA. The scripts remove historical data. Please do not skim over the details in the documentation that accompanies the purging scripts.
Running the purging scripts can take a considerable amount of time (multiple hours), and it is highly recommended that all flow execution be suspended while the scripts are running.
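For a supported purge, the vendor scripts above should always be used. Purely as an illustration of why such scripts take hours yet can run without locking up the server, historical data is typically removed with a batched-delete pattern like the following sketch. The table name, date column, batch size, and retention cutoff here are all hypothetical placeholders, not the actual script contents:

```sql
-- Generic batched-delete pattern (illustration only, NOT the vendor scripts).
-- Small batches keep each transaction short, reduce blocking, and let the
-- transaction log truncate between batches.
DECLARE @cutoff DATETIME = DATEADD(DAY, -30, GETDATE());  -- assumed retention
DECLARE @rows INT = 1;

WHILE @rows > 0
BEGIN
    DELETE TOP (10000)                      -- assumed batch size
    FROM dbo.SOME_HISTORY_TABLE             -- hypothetical table name
    WHERE CREATED_TIME < @cutoff;           -- hypothetical date column
    SET @rows = @@ROWCOUNT;
END
```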
Note: The scripts are available for OO 10.10 and OO 10.20, and the two versions are not compatible with each other.
4. HP Solutions content pack
Available for download at: https://hpln.hp.com/contentoffering/community-hp-solutions.
The HP Solutions content pack contains a list of flows that can be incorporated into an automated solution for purging the OO database.
These purging flows are available starting with hp-solutions-cp-1.3.0, and that content is compatible only with OO 10.20. Before downloading, check which CP version is appropriate for your environment.
Note that this option has the drawback of being slower than the purging scripts because of the overhead created by communication with Central. For more details, please consult the Controlling the Database Size document from step 2.
Other considerations
Before upgrading OO, it is highly recommended to reduce the database size to the absolute minimum. During the upgrade procedure the database schema is altered, and on a large database the upgrade can take a considerable amount of time, which can require a longer change-management window than expected.
On a large database, of the order of hundreds of gigabytes, the upgrade can take as much as 12 hours due to the time required to update the schema and all records affected by the change.