Recovering data stored in eventInsertbuffer/expired directory

  • 3705530
  • 14-Jan-2008
  • 28-Aug-2012

Environment


Sentinel 5.1.xx Sentinel Server
Sentinel 6.0.xx Sentinel Server 
Sentinel 7.0.xx Sentinel Server 
Sentinel Log Manager 1.2

Situation

The Sentinel DAS service failed, or the database was down, and while attempting to insert events into the database or file system, Sentinel created files in %ESEC_HOME%/sentinel/bin/eventfiles/data and in %ESEC_HOME%/sentinel/bin/eventfiles/expired.

The events in %ESEC_HOME%/sentinel/bin/eventfiles/data were recovered automatically by Sentinel, while those in %ESEC_HOME%/sentinel/bin/eventfiles/expired are still there. Is there a way to consume the events in the expired directory?

Resolution

Two directories can be involved in these scenarios. Both are located under %ESEC_HOME%/sentinel/bin:
eventInsertbuffer contains all the events cached to file after an error while trying to insert into the database.
eventProcessingbuffer contains all the events cached to file after an error while trying to write to eventfiles using the eventredirect service.
Here is an explanation of what goes in each of those folders and in their "expired" subdirectories:
In Sentinel, event insertion can fail at two points:
1) The first place it can fail is when Sentinel tries to write the event to a file that will later be processed by DAS Aggregation. It must successfully write the event to this file, or else the event summaries calculated by DAS Aggregation could be incorrect. This step also requires some interaction with the database in order to update some aggregation file status fields. So, if the event fails to get written to the file because of a file error, a database error, or some other error, the event is put in the "eventProcessingbuffer" folder.
2) After the event is successfully written to the aggregation file, Sentinel attempts to insert it into the EVENTS table in the database. If this fails, the event is put in the "eventInsertbuffer" folder. This is usually where things fail, because event insertion takes a lot of resources in the database.
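
The two failure points above can be summarized in a small sketch. This is only an illustration of the routing described here, not Sentinel's actual code: the function name route_event and the two callables passed to it are hypothetical stand-ins for the aggregation-file write and the database insert.

```python
from enum import Enum


class Buffer(Enum):
    """Where an event ends up, using the folder names from this article."""
    NONE = "inserted normally"
    PROCESSING = "eventProcessingbuffer"
    INSERT = "eventInsertbuffer"


def route_event(write_aggregation_file, insert_into_db, event):
    """Return the buffer folder (if any) an event would be cached in.

    Both callables are hypothetical placeholders; each is expected to
    raise an exception when its stage fails.
    """
    try:
        # Stage 1: write to the file later processed by DAS Aggregation.
        write_aggregation_file(event)
    except Exception:
        return Buffer.PROCESSING
    try:
        # Stage 2: insert into the EVENTS table.
        insert_into_db(event)
    except Exception:
        return Buffer.INSERT
    return Buffer.NONE
```

Note that stage 2 is only reached once stage 1 has succeeded, which is why a given event is cached in one buffer folder or the other, not both.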

In either case, if there was some sort of event failure, you will see a "data" sub folder in the folders mentioned above. If the event failure continues beyond the expiration threshold (default 8 hours; configurable through the "eventTimeoutSec" property of das_binary, which appears twice in the xml file), the event is moved to the "expired" sub folder and the system no longer tries to process it.
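
To see how many files are waiting in each sub folder, a short script along the following lines may help. It is a minimal sketch that assumes the layout described above (a "data" and an "expired" sub folder under each buffer folder); the function name count_buffered_files is made up for this example, and the script only reads the directories.

```python
import os


def count_buffered_files(bin_dir):
    """Count files under <buffer>/data and <buffer>/expired.

    bin_dir is assumed to be the directory that holds the two buffer
    folders named in this article. Missing folders count as 0 files.
    """
    counts = {}
    for buffer_name in ("eventInsertbuffer", "eventProcessingbuffer"):
        for sub in ("data", "expired"):
            root = os.path.join(bin_dir, buffer_name, sub)
            total = 0
            # os.walk yields nothing if root does not exist.
            for _dirpath, _dirnames, filenames in os.walk(root):
                total += len(filenames)
            counts[(buffer_name, sub)] = total
    return counts
```

For example, calling count_buffered_files on the sentinel/bin directory under ESEC_HOME returns a dictionary keyed by (buffer folder, sub folder); a non-zero count for an "expired" entry means there are events the system has stopped trying to process.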

To get the system to process the events that have expired, you must stop the service, move the expired files into the "data" directory, and start the service again.

In Sentinel 6.0, the procedure is as follows:
1) Log into Sentinel Control Center.
2) Go to the Admin tab.
3) Open the Servers Viewer.
4) Find the running DAS_Binary process, right click on it, and select Stop. If there is more than one DAS_Binary process running, do this for all of them.
5) In a terminal, log in as the esecadm user into the machine where the DAS_Binary process is running.
6) Move all the files that are under the "expired" directory into the "data" directory. I've seen people interpret this in many different ways, so let me explain:
- Say you have the following directory structure:
eventInsertbuffer/expired/1/...
eventInsertbuffer/expired/2/1/...
eventInsertbuffer/data/2/2/...
eventInsertbuffer/data/3/...
- The correct move operation would result in the following:
eventInsertbuffer/expired/
eventInsertbuffer/data/1/...
eventInsertbuffer/data/2/1/...
eventInsertbuffer/data/2/2/...
eventInsertbuffer/data/3/...

- Note that there are sub directories under the directories numbered "1", "2", "3", etc. If directory names under the "expired" directory collide with those already under the "data" directory, merge the directories using commands appropriate to your OS so that no files get overwritten (otherwise events will be lost). The actual leaf file names should never collide.
7) Start the DAS_Binary process (or processes) again so that the moved events are processed.
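
The move in step 6 can be sketched as a script that merges the two trees file by file instead of moving whole directories, so that a directory name collision such as "2" in the example never replaces existing content. This is a hedged sketch, not a supported tool: the function name merge_expired_into_data and the dry_run parameter are invented for this example, and it should only be run while DAS_Binary is stopped.

```python
import os
import shutil


def merge_expired_into_data(buffer_dir, dry_run=True):
    """Move every file under <buffer_dir>/expired to the same relative
    path under <buffer_dir>/data, refusing to overwrite anything.

    Returns the list of (source, destination) pairs. With dry_run=True
    (the default) nothing is moved; the plan is only computed.
    """
    expired_root = os.path.join(buffer_dir, "expired")
    data_root = os.path.join(buffer_dir, "data")

    # Pass 1: build the full plan and check for file collisions first,
    # so a collision aborts before any file has been touched.
    plan = []
    for dirpath, _dirnames, filenames in os.walk(expired_root):
        rel = os.path.relpath(dirpath, expired_root)
        target_dir = os.path.normpath(os.path.join(data_root, rel))
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(target_dir, name)
            if os.path.exists(dst):
                raise FileExistsError("refusing to overwrite " + dst)
            plan.append((src, dst))

    if dry_run:
        return plan

    # Pass 2: perform the moves, creating target directories as needed.
    for src, dst in plan:
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        shutil.move(src, dst)

    # Remove the now-empty sub directories, keeping "expired" itself.
    for dirpath, _dirnames, _filenames in os.walk(expired_root, topdown=False):
        if dirpath != expired_root and not os.listdir(dirpath):
            os.rmdir(dirpath)
    return plan
```

A reasonable way to use it is to call it once with the default dry_run=True, review the returned pairs, and then call it again with dry_run=False for each buffer folder that has expired files.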

Additional Information

This happens when the system runs into high utilization and cannot write the data to the DB or disk. Expiration is calculated as follows: Sentinel takes the current time minus the event time, and if the result is greater than eventTimeoutSec, it stores the event as expired.
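
The expiration check described above amounts to a single comparison. The sketch below assumes both times are expressed in seconds (for example, Unix epoch seconds) and uses the default threshold of 8 hours mentioned earlier; the function name is_expired is illustrative only.

```python
DEFAULT_EVENT_TIMEOUT_SEC = 8 * 60 * 60  # default of 8 hours = 28800 seconds


def is_expired(event_time, now, event_timeout_sec=DEFAULT_EVENT_TIMEOUT_SEC):
    """Return True if (current time - event time) exceeds the timeout.

    event_time and now are assumed to be in the same unit (seconds).
    """
    return (now - event_time) > event_timeout_sec
```

With the default threshold, an event that is exactly 28800 seconds old is not yet expired; it expires once its age is greater than that value.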