Whether you are new to AEM or an experienced developer, committing changes to AEM makes the hair on the back of your neck stand up. You have to make sure another process won’t come along and commit changes at the same time in a different session. As soon as you kick off asynchronous processes which can commit, hold on to your hat and watch your logs closely.
And of course the codebase I’m currently working on has concurrent jobs running all over the place. While working my way through reproducing an inconsistent issue I noticed some PersistenceExceptions which made my hair stand on end. I was dealing with a workflow which kicked off a series of Sling Jobs and each time there was a failure the same exceptions would show up in the logs. I had stumbled upon a race condition.
NOTE: Typically OakState issues come from long running ResourceResolver sessions. That wasn’t the case in this instance. All resource resolvers were closed once an operation completed. If you have a similar issue there are plenty other articles on that topic and it’s the first place you should look. Make sure if you create a session or resource resolver that you are also closing them. And DO NOT store them as a class level variable.
So back to the issue at hand. An interesting Java Class I noticed in the stack trace was the ConflictValidator class. After searching for more information on this class and only finding the Javadoc, I decided to find the jar and decompile it. Let’s see what it actually does.
Once I knew the Class was in the oak-core bundle, I went to my local crx-quickstart/launchpad/felix directory and searched for oak-core.
I then went to the enclosing folder and found the jar to decompile. Typically, I use JD-GUI to look at files, but you can use any preferred decompiler.
Next, I noticed that there was a debug message which may show more information.
Setup a Logger
So, naturally it was time to setup a separate log file in AEM to start capturing these debug messages:
- On your local environment go to the OSGi configurations http://localhost:4502/system/console/configMgr
- Search for “Apache Sling Logging Logger Configuration“
- Click the + button to add a new logging configuration
- Set the Log Level to Debug, give the log file a name, and put org.apache.jackrabbit.oak.plugins.commit in the Logger field
- Save the configuration
- Search the configurations for “Apache Sling Logging Writer Configuration“
- Click the + button to add a new log writer configuration
- Here’s an example configuration:
- Save the configuration
- You should now be ready to capture additional information the next time it occurs
Once I started logging, I also noticed there were debug messages coming from MergingNodeStateDiff in the same package. With these debug messages, I was able to determine that the merge conflict was related to deleting an asset and trying to update it at the same time. To find the root cause, I had to add additional Sling Job logging. I needed to see what Sling Jobs were executing and in which order. Then, I could see what was happening when everything was working and when the race condition occurred. To setup Sling Job logging, do the same steps as above, but change the Logger field to “org.apache.sling.event”
This type of issue can take a lot of analysis to get to a root cause. I hope this helps you troubleshoot your issue more quickly.