Sed and Regular expressions

  • 7011189
  • 24-Mar-2011
  • 19-Oct-2012

Resolution

NetIQ's Access Governance Suite is a convenient tool in that it uses XML to describe objects and relationships within the repository. This allows you great flexibility in working with those objects, as they can be described and configured with much more ease through XML. Modifications to the XML objects are straightforward and easy to understand. The objects are not cryptic by default, though they do contain GUIDs when pulled from the repository.

The use of GUIDs may present challenges to you, especially if you are handed a set of objects to import into a system that were not cleaned, or if you need to recover a system from another known-good system when something goes wrong, and all you have are copies of the files from the other system with the GUIDs included.

Fortunately, if you have access to a UNIX shell account or if you are a Linux or Unix user, you should have access to sed, the Stream Editor, a basic POSIX tool that will process files according to a script. By using sed together with a simple regular expression, I will show you how to remove all instances of the GUIDs from any NetIQ Object XML that you receive when you want a clean file.

In my example, I receive a file of some 500 Applications that are defined in a Link Configuration object. I need to clean out the GUIDs due to a system issue that has left the production server in a poor state - the current Link Configuration has somehow been wiped clean.

The only existing Link Configuration Object that I can find in the organization comes from a different server, run by a different group. They have the staging environment, but the Administrator does not use the iiq console, so they copy and paste their Link Configuration from the debug page. This leaves me with a Link Configuration file that I need to clean and remove some 500 GUIDs from as they will not be globally unique for my servers.

Should I do this by hand? In one word: no. Technology makes things easier for us, and sed will be very handy here. Let's take a look at how we can remove everything that is a GUID in the file quite easily, and within a few seconds. I should caution you that you will want to try a few tests first to ensure that your output will be what you require. It is surprisingly easy for a regular expression to remove items that you did not intend, and an extra space (or a missing one) in the expression can have strange consequences in the output.

So, as a recommended best practice, create a working copy of the file that you wish to modify and leave the original untouched. That way, if you make a mistake in processing, you always have the original as a fallback and can retry your operations without losing the file that needed the modification.
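As a minimal sketch of that habit, the commands below make a working copy and confirm that it matches the original before any editing starts. The scratch directory, the stand-in file content, and the name LinkConfig_work.xml are assumptions for illustration only; substitute the path of your own export.

```shell
# Work in a scratch directory so the sketch runs without touching real files.
workdir=$(mktemp -d)
original="$workdir/LinkConfig_stage.xml"
working="$workdir/LinkConfig_work.xml"   # hypothetical name for the working copy

# Stand-in content for the export received from the other server.
printf '%s\n' '<ObjectConfig name="Link" id="8a50082522192a480122192a6cae0082">' > "$original"

# Copy the original; -p keeps its timestamps and permissions on the copy.
cp -p "$original" "$working"

# cmp -s prints nothing and exits 0 only when both files are byte-identical.
if cmp -s "$original" "$working"; then
    copy_status="identical"
else
    copy_status="different"
fi
echo "working copy is $copy_status to the original"
```

All later edits would then be run against the working copy only.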

In my example, I note the following information as I look at the start of the file:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE ObjectConfig PUBLIC "sailpoint.dtd" "sailpoint.dtd">
<ObjectConfig name="Link" id="8a50082522192a480122192a6cae0082" modified="1289241006537" created="1245962726574">
  <ObjectAttribute displayName="RSC-0088 Groups" editMode="ReadOnly" multi="true" name="groupAIMS11" type="string">
    <AttributeSource name="User Groups">
      <ApplicationRef>
        <Reference class="sailpoint.object.Application" id="8a50082523b637120123b638096a0003" name="RSC-0088"/>
      </ApplicationRef>
    </AttributeSource>
  </ObjectAttribute>
  <ObjectAttribute displayName="RSC-0088-22 Groups" editMode="ReadOnly" multi="true" name="groupAIMS21" type="string">
    <AttributeSource name="User Groups">
      <ApplicationRef>
        <Reference class="sailpoint.object.Application" id="8a50082523b637120123b6380b300007" name="RSC-0088-21"/>
      </ApplicationRef>
    </AttributeSource>
  </ObjectAttribute>
  <ObjectAttribute displayName="RSC-00088-22 Groups" editMode="ReadOnly" multi="true" name="groupAIMS22" type="string">
    <AttributeSource name="User Groups">
      <ApplicationRef>
        <Reference class="sailpoint.object.Application" id="8a50082523b637120123b6380cb6000b" name="RSC-0088-22"/>
      </ApplicationRef>
    </AttributeSource>

Now, I know that I need to remove anything with a GUID (id="*"), a modified date (modified="*"), and a created date (created="*") in the file, as these will cause problems immediately. Of these, the modified and created dates occur only once in this file, on the ObjectConfig line, so it is convenient to remove them in a step of their own.
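Before changing anything, it may help to count how many lines carry each of these attributes, so that you know what to expect afterwards. The sketch below does this with grep on a three-line stand-in file built from the sample above; the scratch file is an assumption for illustration, and on a real export you would point grep at your working copy instead.

```shell
# Three lines from the sample above, written to a scratch file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<ObjectConfig name="Link" id="8a50082522192a480122192a6cae0082" modified="1289241006537" created="1245962726574">
  <ObjectAttribute displayName="RSC-0088 Groups" editMode="ReadOnly" multi="true" name="groupAIMS11" type="string">
        <Reference class="sailpoint.object.Application" id="8a50082523b637120123b638096a0003" name="RSC-0088"/>
EOF

# grep -c prints the number of lines that contain the pattern.
id_lines=$(grep -c 'id="' "$sample")
modified_lines=$(grep -c 'modified="' "$sample")
created_lines=$(grep -c 'created="' "$sample")

echo "id: $id_lines  modified: $modified_lines  created: $created_lines"
rm -f "$sample"
```

In this stand-in, modified and created each appear on a single line, the ObjectConfig line, while id appears on that line and on the Reference line.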

Using sed, I can remove the modified and created attributes from the ObjectConfig line with the following script:

sed -e 's/modified=".*\"//' LinkConfig_stage.xml > LinkConfig_production.xml

This command tells the stream editor to delete the text that matches the regular expression, which starts with modified=" and, because .* is greedy, ends with the last " in the line. The syntax is important here: in s/regex/replacement/, the first / begins the regular expression, the second / ends it, and the replacement between the last two slashes is empty, so the match is simply deleted. The single quotes protect the script from the shell, and -e tells sed that the next argument is the script to run.

I encourage you to try this for yourself. You will find that this does indeed remove the modified and created attributes in the ObjectConfig declaration, but leaves the GUID followed by a space and then the closing angle bracket >.
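One way to try this without touching any file is to feed sed a single line through a pipe. The sketch below uses the ObjectConfig line from the sample; the variable names are only illustrative.

```shell
# The ObjectConfig line from the sample file.
line='<ObjectConfig name="Link" id="8a50082522192a480122192a6cae0082" modified="1289241006537" created="1245962726574">'

# s/REGEX/REPLACEMENT/ with an empty replacement deletes the match.
# .* is greedy, so the match runs from modified=" to the last " on the line,
# which takes the created attribute with it.
stripped=$(printf '%s\n' "$line" | sed -e 's/modified=".*"//')

printf '%s\n' "$stripped"
# prints: <ObjectConfig name="Link" id="8a50082522192a480122192a6cae0082" >
```

The space that stood before modified= is not part of the match, which is why it remains in front of the closing angle bracket.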

Now, we need to remove the GUIDs from the file. It would be easy if all we had to do was remove everything from the id=" portion of the line onwards, but there is useful information after that in the reference that we want to keep. So this requires a little more dexterity in editing:

sed -e 's/id=".*\" //' LinkConfig_production.xml > LinkConfig_final.xml

Once that is complete, I can check my LinkConfig_final.xml file and see the first few lines are now:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE ObjectConfig PUBLIC "sailpoint.dtd" "sailpoint.dtd">
<ObjectConfig name="Link" >
  <ObjectAttribute displayName="RSC-0088 Groups" editMode="ReadOnly" multi="true" name="groupAIMS11" type="string">
    <AttributeSource name="User Groups">
      <ApplicationRef>
        <Reference class="sailpoint.object.Application" name="RSC-0088"/>
      </ApplicationRef>
    </AttributeSource>
  </ObjectAttribute>
  <ObjectAttribute displayName="RSC-0088-22 Groups" editMode="ReadOnly" multi="true" name="groupAIMS21" type="string">
    <AttributeSource name="User Groups">
      <ApplicationRef>
        <Reference class="sailpoint.object.Application" name="RSC-0088-21"/>
      </ApplicationRef>
    </AttributeSource>
  </ObjectAttribute>
  <ObjectAttribute displayName="RSC-00088-22 Groups" editMode="ReadOnly" multi="true" name="groupAIMS22" type="string">
    <AttributeSource name="User Groups">
      <ApplicationRef>
        <Reference class="sailpoint.object.Application" name="RSC-0088-22"/>
      </ApplicationRef>
    </AttributeSource>

It is advisable to check the whole file though, as a missed space in the script or regular expression can cause the output to be incorrect.
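A rough way to check the whole file rather than the first few lines is to count what is left and to review what changed. The sketch below runs the two sed commands from this article on a three-line stand-in file in a scratch directory (both are assumptions for illustration), then uses grep and diff for the check.

```shell
workdir=$(mktemp -d)
stage="$workdir/LinkConfig_stage.xml"
production="$workdir/LinkConfig_production.xml"
final="$workdir/LinkConfig_final.xml"

# Stand-in for the real export: three lines from the sample above.
cat > "$stage" <<'EOF'
<ObjectConfig name="Link" id="8a50082522192a480122192a6cae0082" modified="1289241006537" created="1245962726574">
  <ObjectAttribute displayName="RSC-0088 Groups" editMode="ReadOnly" multi="true" name="groupAIMS11" type="string">
        <Reference class="sailpoint.object.Application" id="8a50082523b637120123b638096a0003" name="RSC-0088"/>
EOF

# The two commands from this article, applied to the stand-in file.
sed -e 's/modified=".*\"//' "$stage" > "$production"
sed -e 's/id=".*\" //' "$production" > "$final"

# grep -c prints the number of matching lines; it exits non-zero when that
# number is 0, hence the "|| true".
leftover_ids=$(grep -c 'id="' "$final" || true)
leftover_dates=$(grep -c -e 'modified="' -e 'created="' "$final" || true)
echo "lines still carrying id: $leftover_ids, dates: $leftover_dates"

# diff lists every changed line for review; it exits 1 when the files differ.
diff "$stage" "$final" || true
```

Both counts should be 0 on a clean file, and every pair of lines in the diff output should differ only by the attributes you meant to remove.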

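Looking at the regular expression itself, it starts with id=" and then matches everything up to a " followed by a space. Because .* is greedy, the match actually runs to the last " followed by a space on the line. In the lines shown here, that is the closing quotation mark of the GUID, so the GUID, its closing quotation mark, and the space after it are removed together. The trailing space in the expression is important to making the script evaluate correctly and remove the information (and only the information) that you want removed.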
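Since .* matches as far along the line as it can, the expression depends on the GUID's closing quotation mark being the last one on the line that is followed by a space. That holds for the sample lines shown here, but it would not hold if id came before other attributes. As a sketch of a more defensive variant, [^"]* matches only characters that are not quotation marks, so it stops at the first closing quote. The input line below is hypothetical, with the attributes reordered to show the difference.

```shell
# Hypothetical line: id comes first, so a later attribute is also followed by a space.
line='<Reference id="8a50082523b637120123b638096a0003" class="sailpoint.object.Application" name="RSC-0088"/>'

# Greedy: .* runs to the LAST quote-plus-space, so the class attribute is lost too.
greedy=$(printf '%s\n' "$line" | sed -e 's/id=".*" //')

# Bounded: [^"]* cannot cross a quotation mark, so only the GUID is removed.
bounded=$(printf '%s\n' "$line" | sed -e 's/id="[^"]*" //')

printf 'greedy:  %s\n' "$greedy"
printf 'bounded: %s\n' "$bounded"
# prints:
# greedy:  <Reference name="RSC-0088"/>
# bounded: <Reference class="sailpoint.object.Application" name="RSC-0088"/>
```

On the lines in this article both forms give the same result; the bounded form simply does not rely on the order of the attributes.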

Try it out for yourself, and let us know if this is helpful as well.