Channel: Archives des Alfresco - dbi Blog

Alfresco – Removing JMX settings via the AttributeService since the Revert doesn’t work


At a customer, I recently faced a problem with JMX settings that couldn’t be reverted. If you have already worked with Alfresco, you know that it is possible to save configurations through the Alfresco Administration Console and that doing so stores these settings in the database. This has several drawbacks: it becomes impossible to manage Alfresco through its configuration files, since the DB settings always take precedence, which can lead to nasty surprises. Another potential problem is that the settings are then managed globally, meaning that you cannot have distinct values on distinct Alfresco nodes (in case of clustering). Therefore, a best practice is to always use the Alfresco configuration files and not the JMX settings, except for some quick testing, and you must never forget to revert them afterwards.

During an upgrade to Alfresco Content Services 7.1.0.1 at that customer, I had the unpleasant surprise of finding JMX settings in use for the Solr configuration (and one for Lucene as well, though not shown in the Administration Console) that I couldn’t remove. These settings were leftovers from a very old system that we didn’t manage at the time. As part of the upgrade process, to match our best practices, the JMX settings were checked and all removed using the “Revert” button in the Alfresco Administration Console. There was no problem with that on a few environments, but it was, unfortunately, not the case for PROD, where it just didn’t work. Trying to use the “Revert” button in the Alfresco Administration Console ended up with the following error and logs:

2022-02-03 20:15:14,962  DEBUG [jscript.RhinoScriptProcessor.calls] [exec-6] admin-systemsummary.get.js Start
2022-02-03 20:15:15,052  DEBUG [jscript.RhinoScriptProcessor.calls] [exec-6] admin-systemsummary.get.js End 90 ms
2022-02-03 20:15:18,165  DEBUG [jscript.RhinoScriptProcessor.calls] [exec-4] admin-jmx-settings.get.js Start
2022-02-03 20:15:18,185  DEBUG [jscript.RhinoScriptProcessor.calls] [exec-4] admin-jmx-settings.get.js End 19 ms
2022-02-03 20:15:37,373  DEBUG [repo.jscript.RhinoScriptProcessor] [exec-1] Resolving and compiling script path: jar:file:/opt/alfresco/tomcat/webapps/alfresco/WEB-INF/lib/alfresco-enterprise-remote-api-11.153.jar!/alfresco/enterprise/webscripts/org/alfresco/enterprise/repository/admin/support-tools/admin-jmx-settings.post.js
2022-02-03 20:15:37,373  DEBUG [repo.jscript.RhinoScriptProcessor] [exec-1] Found script resource import: classpath:alfresco/enterprise/webscripts/org/alfresco/enterprise/repository/admin/admin-common.lib.js
2022-02-03 20:15:37,374  DEBUG [repo.jscript.RhinoScriptProcessor] [exec-1] Succesfully located script 'classpath:alfresco/enterprise/webscripts/org/alfresco/enterprise/repository/admin/admin-common.lib.js'
2022-02-03 20:15:37,374  DEBUG [repo.jscript.RhinoScriptProcessor] [exec-1] Found script resource import: classpath:alfresco/templates/webscripts/org/alfresco/repository/admin/admin-common.lib.js
2022-02-03 20:15:37,374  DEBUG [repo.jscript.RhinoScriptProcessor] [exec-1] Succesfully located script 'classpath:alfresco/templates/webscripts/org/alfresco/repository/admin/admin-common.lib.js'
2022-02-03 20:15:37,374  DEBUG [repo.jscript.RhinoScriptProcessor] [exec-1] Imports resolved, adding resource 'classpath:alfresco/templates/webscripts/org/alfresco/repository/admin/admin-common.lib.js
2022-02-03 20:15:37,374  DEBUG [repo.jscript.RhinoScriptProcessor] [exec-1] Imports resolved, adding resource 'classpath:alfresco/enterprise/webscripts/org/alfresco/enterprise/repository/admin/admin-common.lib.js
2022-02-03 20:15:37,374  DEBUG [repo.jscript.RhinoScriptProcessor] [exec-1] Imports resolved, adding resource '_root
2022-02-03 20:15:37,383  DEBUG [jscript.RhinoScriptProcessor.calls] [exec-1] admin-jmx-settings.post.js Start
2022-02-03 20:15:37,384  DEBUG [repo.jscript.ScriptLogger] [exec-1] beanName: Alfresco_Category=Search,Type=Configuration,id1=managed,id2=solr
2022-02-03 20:15:37,415  INFO  [management.subsystems.ChildApplicationContextFactory] [exec-1] Starting 'Search' subsystem, ID: [Search, managed, solr]
2022-02-03 20:15:37,735  WARN  [management.subsystems.ChildApplicationContextFactory$ChildApplicationContext] [exec-1] Exception encountered during context initialization - cancelling refresh attempt: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'search.solrIndexCheckService' defined in URL [jar:file:/opt/alfresco/tomcat/webapps/alfresco/WEB-INF/lib/alfresco-enterprise-repository-11.153.jar!/alfresco/subsystems/Search/solr/solr-jmx-context.xml]: Cannot resolve reference to bean 'solrAdminClient' while setting bean property 'solrAdminClient'; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'solrAdminClient' defined in URL [jar:file:/opt/alfresco/tomcat/webapps/alfresco/WEB-INF/lib/alfresco-repository-11.140.jar!/alfresco/subsystems/Search/solr/solr-search-context.xml]: Error setting property values; nested exception is org.springframework.beans.NotWritablePropertyException: Invalid property 'solrHost' of bean class [org.alfresco.repo.solr.SOLRAdminClient]: Bean property 'solrHost' is not writable or has an invalid setter method. Does the parameter type of the setter match the return type of the getter?
2022-02-03 20:15:38,237  WARN  [management.subsystems.ChildApplicationContextFactory] [exec-1] Startup of 'Search' subsystem, ID: [Search, managed, solr] failed
org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'search.solrIndexCheckService' defined in URL [jar:file:/opt/alfresco/tomcat/webapps/alfresco/WEB-INF/lib/alfresco-enterprise-repository-11.153.jar!/alfresco/subsystems/Search/solr/solr-jmx-context.xml]: Cannot resolve reference to bean 'solrAdminClient' while setting bean property 'solrAdminClient'; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'solrAdminClient' defined in URL [jar:file:/opt/alfresco/tomcat/webapps/alfresco/WEB-INF/lib/alfresco-repository-11.140.jar!/alfresco/subsystems/Search/solr/solr-search-context.xml]: Error setting property values; nested exception is org.springframework.beans.NotWritablePropertyException: Invalid property 'solrHost' of bean class [org.alfresco.repo.solr.SOLRAdminClient]: Bean property 'solrHost' is not writable or has an invalid setter method. Does the parameter type of the setter match the return type of the getter?
        at org.springframework.beans.factory.support.BeanDefinitionValueResolver.resolveReference(BeanDefinitionValueResolver.java:342)
        at org.springframework.beans.factory.support.BeanDefinitionValueResolver.resolveValueIfNecessary(BeanDefinitionValueResolver.java:113)
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.applyPropertyValues(AbstractAutowireCapableBeanFactory.java:1689)
        ...
Caused by: org.springframework.beans.NotWritablePropertyException: Invalid property 'solrHost' of bean class [org.alfresco.repo.solr.SOLRAdminClient]: Bean property 'solrHost' is not writable or has an invalid setter method. Does the parameter type of the setter match the return type of the getter?
        at org.springframework.beans.BeanWrapperImpl.createNotWritablePropertyException(BeanWrapperImpl.java:243)
        at org.springframework.beans.AbstractNestablePropertyAccessor.processLocalProperty(AbstractNestablePropertyAccessor.java:432)
        at org.springframework.beans.AbstractNestablePropertyAccessor.setPropertyValue(AbstractNestablePropertyAccessor.java:278)
        ...
2022-02-03 20:15:38,238  ERROR [management.subsystems.PropertyBackedBeanAdapter] [exec-1] java.lang.RuntimeException: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'search.solrIndexCheckService' defined in URL [jar:file:/opt/alfresco/tomcat/webapps/alfresco/WEB-INF/lib/alfresco-enterprise-repository-11.153.jar!/alfresco/subsystems/Search/solr/solr-jmx-context.xml]: Cannot resolve reference to bean 'solrAdminClient' while setting bean property 'solrAdminClient'; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'solrAdminClient' defined in URL [jar:file:/opt/alfresco/tomcat/webapps/alfresco/WEB-INF/lib/alfresco-repository-11.140.jar!/alfresco/subsystems/Search/solr/solr-search-context.xml]: Error setting property values; nested exception is org.springframework.beans.NotWritablePropertyException: Invalid property 'solrHost' of bean class [org.alfresco.repo.solr.SOLRAdminClient]: Bean property 'solrHost' is not writable or has an invalid setter method. Does the parameter type of the setter match the return type of the getter?
2022-02-03 20:15:38,239  DEBUG [jscript.RhinoScriptProcessor.calls] [exec-1] admin-jmx-settings.post.js Exception
org.mozilla.javascript.WrappedException: Wrapped javax.management.RuntimeMBeanException: java.lang.RuntimeException: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'search.solrIndexCheckService' defined in URL [jar:file:/opt/alfresco/tomcat/webapps/alfresco/WEB-INF/lib/alfresco-enterprise-repository-11.153.jar!/alfresco/subsystems/Search/solr/solr-jmx-context.xml]: Cannot resolve reference to bean 'solrAdminClient' while setting bean property 'solrAdminClient'; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'solrAdminClient' defined in URL [jar:file:/opt/alfresco/tomcat/webapps/alfresco/WEB-INF/lib/alfresco-repository-11.140.jar!/alfresco/subsystems/Search/solr/solr-search-context.xml]: Error setting property values; nested exception is org.springframework.beans.NotWritablePropertyException: Invalid property 'solrHost' of bean class [org.alfresco.repo.solr.SOLRAdminClient]: Bean property 'solrHost' is not writable or has an invalid setter method. Does the parameter type of the setter match the return type of the getter? (classpath*:alfresco/enterprise/webscripts/org/alfresco/enterprise/repository/admin/support-tools/admin-jmx-settings.post.js#486)
        at org.alfresco.enterprise.repo.management.script.ScriptableMBeanOperations$4.call(ScriptableMBeanOperations.java:398)
        at org.mozilla.javascript.optimizer.OptRuntime.callProp0(OptRuntime.java:98)
        at org.mozilla.javascript.gen.classpath__alfresco_enterprise_webscripts_org_alfresco_enterprise_repository_admin_support_tools_admin_jmx_settings_post_js_20._c_main_17(classpath*:alfresco/enterprise/webscripts/org/alfresco/enterprise/repository/admin/support-tools/admin-jmx-settings.post.js:486)
        ...
Caused by: org.springframework.beans.NotWritablePropertyException: Invalid property 'solrHost' of bean class [org.alfresco.repo.solr.SOLRAdminClient]: Bean property 'solrHost' is not writable or has an invalid setter method. Does the parameter type of the setter match the return type of the getter?
        at org.springframework.beans.BeanWrapperImpl.createNotWritablePropertyException(BeanWrapperImpl.java:243)
        at org.springframework.beans.AbstractNestablePropertyAccessor.processLocalProperty(AbstractNestablePropertyAccessor.java:432)
        at org.springframework.beans.AbstractNestablePropertyAccessor.setPropertyValue(AbstractNestablePropertyAccessor.java:278)
        ...
2022-02-03 20:15:38,240  DEBUG [jscript.RhinoScriptProcessor.calls] [exec-1] admin-jmx-settings.post.js End 857 ms
2022-02-03 20:15:38,242  ERROR [extensions.webscripts.AbstractRuntime] [exec-1] Exception from executeScript: 01030006 Wrapped Exception (with status template): 01030131 Failed to execute script 'classpath*:alfresco/enterprise/webscripts/org/alfresco/enterprise/repository/admin/support-tools/admin-jmx-settings.post.js': java.lang.RuntimeException: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'search.solrIndexCheckService' defined in URL [jar:file:/opt/alfresco/tomcat/webapps/alfresco/WEB-INF/lib/alfresco-enterprise-repository-11.153.jar!/alfresco/subsystems/Search/solr/solr-jmx-context.xml]: Cannot resolve reference to bean 'solrAdminClient' while setting bean property 'solrAdminClient'; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'solrAdminClient' defined in URL [jar:file:/opt/alfresco/tomcat/webapps/alfresco/WEB-INF/lib/alfresco-repository-11.140.jar!/alfresco/subsystems/Search/solr/solr-search-context.xml]: Error setting property values; nested exception is org.springframework.beans.NotWritablePropertyException: Invalid property 'solrHost' of bean class [org.alfresco.repo.solr.SOLRAdminClient]: Bean property 'solrHost' is not writable or has an invalid setter method. Does the parameter type of the setter match the return type of the getter?
org.springframework.extensions.webscripts.WebScriptException: 01030006 Wrapped Exception (with status template): 01030131 Failed to execute script 'classpath*:alfresco/enterprise/webscripts/org/alfresco/enterprise/repository/admin/support-tools/admin-jmx-settings.post.js': java.lang.RuntimeException: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'search.solrIndexCheckService' defined in URL [jar:file:/opt/alfresco/tomcat/webapps/alfresco/WEB-INF/lib/alfresco-enterprise-repository-11.153.jar!/alfresco/subsystems/Search/solr/solr-jmx-context.xml]: Cannot resolve reference to bean 'solrAdminClient' while setting bean property 'solrAdminClient'; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'solrAdminClient' defined in URL [jar:file:/opt/alfresco/tomcat/webapps/alfresco/WEB-INF/lib/alfresco-repository-11.140.jar!/alfresco/subsystems/Search/solr/solr-search-context.xml]: Error setting property values; nested exception is org.springframework.beans.NotWritablePropertyException: Invalid property 'solrHost' of bean class [org.alfresco.repo.solr.SOLRAdminClient]: Bean property 'solrHost' is not writable or has an invalid setter method. Does the parameter type of the setter match the return type of the getter?
        at org.springframework.extensions.webscripts.AbstractWebScript.createStatusException(AbstractWebScript.java:1139)
        at org.springframework.extensions.webscripts.DeclarativeWebScript.execute(DeclarativeWebScript.java:171)
        at org.alfresco.repo.web.scripts.RepositoryContainer.lambda$transactionedExecute$2(RepositoryContainer.java:556)
        ...
Caused by: org.springframework.beans.NotWritablePropertyException: Invalid property 'solrHost' of bean class [org.alfresco.repo.solr.SOLRAdminClient]: Bean property 'solrHost' is not writable or has an invalid setter method. Does the parameter type of the setter match the return type of the getter?
        at org.springframework.beans.BeanWrapperImpl.createNotWritablePropertyException(BeanWrapperImpl.java:243)
        at org.springframework.beans.AbstractNestablePropertyAccessor.processLocalProperty(AbstractNestablePropertyAccessor.java:432)
        at org.springframework.beans.AbstractNestablePropertyAccessor.setPropertyValue(AbstractNestablePropertyAccessor.java:278)
        ...

 

So, what can you do when JMX settings cannot be reverted like that? There is always the possibility to clean the database directly (while Alfresco isn’t running), but the DB tables that store the JMX settings are rather complex. It’s not impossible, but it’s definitely not the recommended option if you can avoid it. Of course, if Alfresco cannot start at all anymore, you will have to go through the database and remove things there, so look at the following tables: alf_prop_link, alf_prop_value, alf_prop_string_value, alf_prop_root and alf_prop_unique_ctx. However, if Alfresco can start, then you have a much nicer and much safer way to remove JMX settings: using the AttributeService!

The AttributeService is an interface of the Alfresco Java API used to manage attributes. There isn’t much documentation about it, but there are a few technical presentations: Alfresco’s Attribute Service Primer (June 2013) & Alfresco Tech Talk Live (Episode 75) (April 2014). The idea to use this service initially came from Bindu Wavell while we were discussing this topic on the Alfresco Discord (don’t hesitate to join us there) with other members of the Alfresco Community.

This AttributeService is very good for specific purposes, but it is part of the Java API, meaning that you would normally need to write a piece of Java to use it. That might be a problem for some customers, or for you directly, if you aren’t really proficient in development. Fortunately, there is something that (almost?) everybody in the Alfresco Community knows, called the JavaScript Console. It’s an addon that allows you to use the Alfresco JavaScript API on the Repository directly, by executing the code in the Share UI (i.e. in your browser). This addon also supports JavaScript-Java interoperability, meaning that you can use the Java API through the JavaScript Console. Here is an example of how to use the AttributeService.getAttributes method in the JavaScript Console:

var context = Packages.org.springframework.web.context.ContextLoader.getCurrentWebApplicationContext();
var attributeService = context.getBean("attributeService", Packages.org.alfresco.repo.attributes.AttributeServiceImpl);

attributeService.getAttributes(function(id, value, keys) {
    print(' > id: ' + id);
    print('  >> key: ["' + keys[0] + '", "' + keys[1] + '"]');
    print('  >> value: ' + value);
    print('');
    return true;
}, ["keyCheck"]);

The first two lines above are related to the interoperability; the rest simply lists all attributes that share the same key(s) (in the above example, the key is [“keyCheck”]). Using the AttributeService in this way is really simple; the harder part is finding the correct key(s) to use, because there is no documentation on them. Obviously, if it’s related to your custom attributes, then you should know the keys since you created them, but what about the Alfresco out-of-the-box ones? Alfresco made it so that it is unfortunately impossible (as far as I could see) to retrieve and list all attributes in one go. You first need to find at least the top-level key of the attribute you are looking for before you can retrieve the associated value(s). It took me some time but, in the end, I was able to find what I was looking for inside the alf_prop_string_value DB table. The key(s) and their associated values are stored in this table, so to find them, you can just list its content. Most of the Alfresco OOTB keys appear to have an alf_prop_string_value.string_value starting with a dot. It’s not always the case, as you can see above (it’s not [“.keyCheck”] but simply [“keyCheck”]), but it already gives a certain list of keys to go through. Here is an example:

SQL> SELECT id, string_value
 2   FROM alf_prop_string_value
 3   WHERE string_value like '.%'
 4   ORDER BY 1 ASC;
  id              string_value
------  ---------------------------------
    21  .PropertyBackedBeans
    46  .ChainingUserRegistrySynchronizer
    55  .empty
    58  .repoUsages
    62  .SHARD_STATE
  1372  .clusterInfo
  1373  .cluster_name
  1375  .clusterMembers
  1377  .host_name
  1379  .ip_address
  1381  .port
  1382  .clustering_enabled
  1383  .last_registered
  1384  .cluster_node_type

 

For the JMX settings specifically, the top-level key to be used is [“.PropertyBackedBeans”]. It is also possible to filter the list of attributes to retrieve by specifying a sub-level key. Going back to my customer case, even if the Alfresco Administration Console showed only one MBean (Alfresco:Category=Search,Type=Configuration,id1=managed,id2=solr), the AttributeService returned two sets of attributes stored in the DB, as I said before: one for Solr and one for Lucene:

The sub-level keys can be seen on the above screenshot, but you can also find their values in the DB, as explained earlier, as well as derive them from the MBean name. For example, the MBean “Alfresco:Category=Search,Type=Configuration,id1=managed,id2=solr” translates to the key [“.PropertyBackedBeans”, “Search$managed$solr”] (cat$id1$id2 in this case). Here are some more examples of how to retrieve the attributes using the AttributeService methods (getAttributes using a top-level key only, getAttributes using a top- and sub-level key, and another method, getAttribute (without the s), to retrieve the value of a single attribute):
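Since the translation from MBean name to key pair is purely mechanical, it can be sketched as a small standalone helper. This function is illustrative only (it is not part of any Alfresco API); it simply applies the cat$id1$id2 rule described above:

```javascript
// Hypothetical helper: derives the AttributeService key pair from a
// PropertyBackedBean MBean name such as
// "Alfresco:Category=Search,Type=Configuration,id1=managed,id2=solr".
function mbeanToAttributeKey(mbeanName) {
    var props = {};
    mbeanName.split(':')[1].split(',').forEach(function (pair) {
        var kv = pair.split('=');
        props[kv[0]] = kv[1];
    });
    // The sub-level key is built as Category$id1$id2... (cat$id1$id2).
    var parts = [props['Category']];
    var i = 1;
    while (props['id' + i] !== undefined) {
        parts.push(props['id' + i]);
        i++;
    }
    return ['.PropertyBackedBeans', parts.join('$')];
}

console.log(mbeanToAttributeKey('Alfresco:Category=Search,Type=Configuration,id1=managed,id2=solr'));
// → [ '.PropertyBackedBeans', 'Search$managed$solr' ]
```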

So, retrieving attributes is good and all but we still have our JMX Settings present… The next step is therefore to remove them and for that purpose, we will use the same top-level and sub-level keys that we found. First, removing the Lucene JMX Settings:

var context = Packages.org.springframework.web.context.ContextLoader.getCurrentWebApplicationContext();
var attributeService = context.getBean("attributeService", Packages.org.alfresco.repo.attributes.AttributeServiceImpl);

attributeService.getAttributes(function(id, value, keys) {
    print(' > id: ' + id);
    print('  >> key: ["' + keys[0] + '", "' + keys[1] + '"]');
    print('  >> value: ' + value);
    //if ((keys[0] == ".PropertyBackedBeans") && (keys[1] == "Search$managed$solr")) {
    if ((keys[0] == ".PropertyBackedBeans") && (keys[1] == "Search$managed$lucene")) {
        attributeService.removeAttribute([keys[0], keys[1]]);
        print('   >>> The attribute for ["' + keys[0] + '", "' + keys[1] + '"] has been removed');
    }
    print('');
    return true;
}, [".PropertyBackedBeans"]);

 

Then, removing the Solr JMX Settings:

var context = Packages.org.springframework.web.context.ContextLoader.getCurrentWebApplicationContext();
var attributeService = context.getBean("attributeService", Packages.org.alfresco.repo.attributes.AttributeServiceImpl);

attributeService.getAttributes(function(id, value, keys) {
    print(' > id: ' + id);
    print('  >> key: ["' + keys[0] + '", "' + keys[1] + '"]');
    print('  >> value: ' + value);
    if ((keys[0] == ".PropertyBackedBeans") && (keys[1] == "Search$managed$solr")) {
    //if ((keys[0] == ".PropertyBackedBeans") && (keys[1] == "Search$managed$lucene")) {
        attributeService.removeAttribute([keys[0], keys[1]]);
        print('   >>> The attribute for ["' + keys[0] + '", "' + keys[1] + '"] has been removed');
    }
    print('');
    return true;
}, [".PropertyBackedBeans"]);

 

It is also possible to obtain the same result by simply filtering on the sub-level key in the method arguments (using [“.PropertyBackedBeans”, “Search$managed$solr”] as the last argument of getAttributes instead of [“.PropertyBackedBeans”]). The result would be exactly the same, but the above “if” statement makes sure that you only remove what you expect to remove. It’s just another level of check, to prevent human error. Another alternative is to use the removeAttribute method directly (so just the first two lines and the removeAttribute call above): since the top-level and sub-level keys are known, there is no real need to retrieve them first via getAttributes. Therefore, proceed as preferred.
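For completeness, that direct approach would look like this in the JavaScript Console (same bean lookup as in the earlier snippets; double-check the keys first, since there is no “if” safety net here):

```javascript
// Direct removal via the AttributeService, without listing first.
// Only run this once you are sure of the top-level and sub-level keys.
var context = Packages.org.springframework.web.context.ContextLoader.getCurrentWebApplicationContext();
var attributeService = context.getBean("attributeService", Packages.org.alfresco.repo.attributes.AttributeServiceImpl);
attributeService.removeAttribute([".PropertyBackedBeans", "Search$managed$solr"]);
```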

If you want the output of the JavaScript Console to appear in the Tomcat logs, you can replace “print” with “logger.warn” or “logger.info”. Here is an example of the logs generated by these actions:

  1. List JMX Attributes (cd2b1a49208e85e40e4978d80d59afd8.js) >> 2 sets of properties shown
  2. Remove Lucene JMX Attributes (cca083e1fc36fd7eb6691b47193a3885.js)
  3. Remove Solr JMX Attributes (716300901e28026ffc632490eed87be1.js)
  4. List JMX Attributes (cd2b1a49208e85e40e4978d80d59afd8.js) >> no properties remaining
2022-02-14 20:17:32,555  DEBUG [repo.jscript.RhinoScriptProcessor] [exec-49] Resolving and compiling script path: cd2b1a49208e85e40e4978d80d59afd8.js
2022-02-14 20:17:32,562  DEBUG [jscript.RhinoScriptProcessor.calls] [exec-49] cd2b1a49208e85e40e4978d80d59afd8.js Start
2022-02-14 20:17:32,564  INFO  [repo.jscript.ScriptLogger] [exec-49]  > id: 21
2022-02-14 20:17:32,564  INFO  [repo.jscript.ScriptLogger] [exec-49]   >> key: [".PropertyBackedBeans", "Search$managed$solr"]
2022-02-14 20:17:32,564  INFO  [repo.jscript.ScriptLogger] [exec-49]   >> value: {solr.backup.alfresco.numberToKeep=3, search.solrTrackingSupport.enabled=true, solr.backup.archive.remoteBackupLocation=${dir.root}/solrBackup/archive, solr.backup.archive.cronExpression=0 0 4 * * ?, solr.host=localhost, solr.backup.alfresco.remoteBackupLocation=${dir.root}/solrBackup/alfresco, solr.backup.archive.numberToKeep=3, solr.backup.alfresco.cronExpression=0 0 2 * * ?, solr.port=8080, solr.port.ssl=8443}
2022-02-14 20:17:32,564  INFO  [repo.jscript.ScriptLogger] [exec-49]
2022-02-14 20:17:32,564  INFO  [repo.jscript.ScriptLogger] [exec-49]  > id: 22 - key: .PropertyBackedBeans - Search$managed$lucene - value: {index.recovery.maximumPoolSize=5}
2022-02-14 20:17:32,564  INFO  [repo.jscript.ScriptLogger] [exec-49]   >> key: [".PropertyBackedBeans", "Search$managed$lucene"]
2022-02-14 20:17:32,564  INFO  [repo.jscript.ScriptLogger] [exec-49]   >> value: {index.recovery.maximumPoolSize=5}
2022-02-14 20:17:32,564  INFO  [repo.jscript.ScriptLogger] [exec-49]
2022-02-14 20:17:32,564  DEBUG [jscript.RhinoScriptProcessor.calls] [exec-49] cd2b1a49208e85e40e4978d80d59afd8.js End 2 ms
2022-02-14 20:17:52,197  DEBUG [repo.jscript.RhinoScriptProcessor] [exec-41] Resolving and compiling script path: cca083e1fc36fd7eb6691b47193a3885.js
2022-02-14 20:17:52,204  DEBUG [jscript.RhinoScriptProcessor.calls] [exec-41] cca083e1fc36fd7eb6691b47193a3885.js Start
2022-02-14 20:17:52,206  INFO  [repo.jscript.ScriptLogger] [exec-41]  > id: 21
2022-02-14 20:17:52,206  INFO  [repo.jscript.ScriptLogger] [exec-41]   >> key: [".PropertyBackedBeans", "Search$managed$solr"]
2022-02-14 20:17:52,206  INFO  [repo.jscript.ScriptLogger] [exec-41]   >> value: {solr.backup.alfresco.numberToKeep=3, search.solrTrackingSupport.enabled=true, solr.backup.archive.remoteBackupLocation=${dir.root}/solrBackup/archive, solr.backup.archive.cronExpression=0 0 4 * * ?, solr.host=localhost, solr.backup.alfresco.remoteBackupLocation=${dir.root}/solrBackup/alfresco, solr.backup.archive.numberToKeep=3, solr.backup.alfresco.cronExpression=0 0 2 * * ?, solr.port=8080, solr.port.ssl=8443}
2022-02-14 20:17:52,207  INFO  [repo.jscript.ScriptLogger] [exec-41]
2022-02-14 20:17:52,207  INFO  [repo.jscript.ScriptLogger] [exec-41]  > id: 22 - key: .PropertyBackedBeans - Search$managed$lucene - value: {index.recovery.maximumPoolSize=5}
2022-02-14 20:17:52,207  INFO  [repo.jscript.ScriptLogger] [exec-41]   >> key: [".PropertyBackedBeans", "Search$managed$lucene"]
2022-02-14 20:17:52,207  INFO  [repo.jscript.ScriptLogger] [exec-41]   >> value: {index.recovery.maximumPoolSize=5}
2022-02-14 20:17:52,210  INFO  [repo.jscript.ScriptLogger] [exec-41]    >>> The attribute for [".PropertyBackedBeans", "Search$managed$lucene"] has been removed
2022-02-14 20:17:52,210  INFO  [repo.jscript.ScriptLogger] [exec-41]
2022-02-14 20:17:52,211  DEBUG [jscript.RhinoScriptProcessor.calls] [exec-41] cca083e1fc36fd7eb6691b47193a3885.js End 6 ms
2022-02-14 20:18:41,840  DEBUG [repo.jscript.RhinoScriptProcessor] [exec-12] Resolving and compiling script path: 716300901e28026ffc632490eed87be1.js
2022-02-14 20:18:41,847  DEBUG [jscript.RhinoScriptProcessor.calls] [exec-12] 716300901e28026ffc632490eed87be1.js Start
2022-02-14 20:18:41,849  INFO  [repo.jscript.ScriptLogger] [exec-12]  > id: 21
2022-02-14 20:18:41,849  INFO  [repo.jscript.ScriptLogger] [exec-12]   >> key: [".PropertyBackedBeans", "Search$managed$solr"]
2022-02-14 20:18:41,849  INFO  [repo.jscript.ScriptLogger] [exec-12]   >> value: {solr.backup.alfresco.numberToKeep=3, search.solrTrackingSupport.enabled=true, solr.backup.archive.remoteBackupLocation=${dir.root}/solrBackup/archive, solr.backup.archive.cronExpression=0 0 4 * * ?, solr.host=localhost, solr.backup.alfresco.remoteBackupLocation=${dir.root}/solrBackup/alfresco, solr.backup.archive.numberToKeep=3, solr.backup.alfresco.cronExpression=0 0 2 * * ?, solr.port=8080, solr.port.ssl=8443}
2022-02-14 20:18:41,850  INFO  [repo.jscript.ScriptLogger] [exec-12]    >>> The attribute for [".PropertyBackedBeans", "Search$managed$solr"] has been removed
2022-02-14 20:18:41,850  INFO  [repo.jscript.ScriptLogger] [exec-12]
2022-02-14 20:18:41,850  DEBUG [jscript.RhinoScriptProcessor.calls] [exec-12] 716300901e28026ffc632490eed87be1.js End 3 ms
2022-02-14 20:18:57,199  DEBUG [repo.jscript.RhinoScriptProcessor] [exec-37] Resolving and compiling script path: cd2b1a49208e85e40e4978d80d59afd8.js
2022-02-14 20:18:57,205  DEBUG [jscript.RhinoScriptProcessor.calls] [exec-37] cd2b1a49208e85e40e4978d80d59afd8.js Start
2022-02-14 20:18:57,207  DEBUG [jscript.RhinoScriptProcessor.calls] [exec-37] cd2b1a49208e85e40e4978d80d59afd8.js End 1 ms

 

As shown above, the execution completed successfully, without errors, and all the JMX settings were finally gone:

A quick restart of Alfresco, to make sure nothing is broken and to clean the caches, and you are good to go.

 

The article Alfresco – Removing JMX settings via the AttributeService since the Revert doesn’t work first appeared on the dbi Blog.


Monitor Alfresco with Zabbix


In this blog post, I will share a few obstacles I encountered while trying to monitor a complete Alfresco setup with all its components. As an overview, here are the templates I will use:

  • Apache ActiveMQ by JMX
  • Apache Solr by HTTP
  • Apache Tomcat by JMX
  • Generic Java JMX (for transformation service)
  • Linux by Zabbix agent
  • Nginx by Zabbix agent
  • PostgreSQL by Zabbix agent 2
  • Website certificate by Zabbix agent 2

Until now, I only did simple monitoring:

One component, one host.

As a matter of fact, having multiple Java Virtual Machines (JVMs) on the same host brings challenges, which we will see in the next chapter. The Alfresco server was provisioned with YaK and its Alfresco role.

Multiple JVM Monitoring

In my previous blogs, I usually declared at most one or two host interfaces:

  • An Agent interface to communicate with Zabbix Agent
  • A JMX interface to communicate with a JVM via Zabbix Java Gateway (setup example here)

If you look at the list of templates, three of them are JMX-based, thus I will have to declare three JMX interfaces:

Each interface has its own port.

For an item, one of the parameters I had overlooked is “Host interface”. As a matter of fact, each item addresses one interface to get its value from. When linking a template, you can’t directly set which JVM it will get data from; this has to be done in a second step.

I decided to assign the following ports; as Tomcat has the most discovery rules, this way I will not need to modify them to point to a different interface:

  • 10011 for Apache Tomcat
  • 10021 for ActiveMQ
  • 10031 for Generic Java JMX

Going the “manual” Way

The Apache ActiveMQ by JMX template contains the following elements to update:

  • 2 discovery rules
  • 21 item prototypes

Fortunately, the Zabbix front-end has a “Mass update” feature. It is still missing for discovery rules, but a feature request exists here. Vote for it if you want to have it in a next release 🙂 .

To do it for ActiveMQ, I proceed as follows:

  1. Click on the host.
  2. Select Discovery rules.
  3. Click on one of the two ActiveMQ discovery rules:
    • Brokers discovery
    • Destinations discovery
  4. In the Host interface drop-down list, change from 10011 to 10021:

Next, inside each DR, there are item definitions, so-called item prototypes:

Here we have a Mass update button, so let’s use it.

  1. Select all item prototypes.
  2. Click the “Mass update” button.
  3. Tick the Host interface checkbox and select the 10021 interface:
Mass Update

Finally, click “Update”.

Repeat this for the second DR.

Also repeat this for the Generic Java JMX template, where we will use the 10031 interface.

JMX RMI Setup

RMI is a powerful protocol, but also a very sensitive one, as many things can be done besides monitoring. To protect resources correctly, I will enable password protection and encryption.

Password Protection

First, I need to create two files:

  • jmxremote.access:
monitorUser readonly
  • jmxremote.password:
monitorUser P@ssword?

Permissions on these files must be limited to the owner. One might think it is not safe to leave a password in clear text but, no worries, it will be hashed at the first connection attempt with that user.
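As a sketch (the location and user name below are examples to adapt to your setup), the two files can be created and locked down as follows:

```shell
# Create the JMX access and password files with owner-only permissions
# (location and user name are examples; adapt them to your setup)
conf_dir="/tmp/jmx-demo"
mkdir -p "${conf_dir}"
printf 'monitorUser readonly\n'  > "${conf_dir}/jmxremote.access"
printf 'monitorUser P@ssword?\n' > "${conf_dir}/jmxremote.password"
# The JVM refuses to start if the password file is readable by other users
chmod 600 "${conf_dir}/jmxremote.access" "${conf_dir}/jmxremote.password"
```

Note that the same principal name must appear in both files, since the access file grants the permissions for the user defined in the password file.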

To use these files, the following JVM parameters must be added:

-Dcom.sun.management.jmxremote.access.file=/path-to-file/jmxremote.access
-Dcom.sun.management.jmxremote.password.file=/path-to-file/jmxremote.password

Traffic Encryption

To enable traffic encryption, I will need a key store and a trust store. These are provided using these parameters:

-Djavax.net.ssl.keyStore=/path-to-file/keystore.p12
-Djavax.net.ssl.keyStorePassword=${KS_PASSWORD}
-Djavax.net.ssl.trustStore=/opt/openjdk-x.y.z/lib/security/cacerts
-Djavax.net.ssl.trustStorePassword=${TS_PASSWORD}
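The keystore itself must exist beforehand. As a sketch (paths, the CN and the demo password are placeholders, and the JDK's keytool -genkeypair would work just as well), a self-signed PKCS12 keystore can be generated with openssl:

```shell
# Sketch: generate a self-signed certificate and bundle it into a PKCS12 keystore.
# Paths, CN and the demo password are placeholders; keytool -genkeypair achieves the same.
openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
  -subj "/CN=jmx-demo" -keyout /tmp/jmx.key -out /tmp/jmx.crt
openssl pkcs12 -export -name jmx -inkey /tmp/jmx.key -in /tmp/jmx.crt \
  -out /tmp/keystore.p12 -passout pass:changeit
```

In a real setup, a certificate signed by your internal CA is preferable, so that the Zabbix Java Gateway can validate it through its trust store.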

Extra Parameters

To enable remote JMX, SSL and force client authentication, some extra parameters must be added:

-Dcom.sun.management.jmxremote.local.only=false
-Dcom.sun.management.jmxremote.ssl=true
-Dcom.sun.management.jmxremote.authenticate=true
-Dcom.sun.management.jmxremote.ssl.need.client.auth=true
-Djava.rmi.server.hostname=${HOSTNAME}

Finally, for each JVM, I provide the two ports required (the example below is for ActiveMQ):

-Dcom.sun.management.jmxremote.port=10021
-Dcom.sun.management.jmxremote.rmi.port=10022
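Putting all of the above together, the options for one JVM can be assembled into a single variable, e.g. to append to the ActiveMQ startup options (the variable names and the /path-to-file placeholder below are assumptions, not the exact setup):

```shell
# Sketch: assemble all JMX options for one JVM (ActiveMQ here) into a single variable
jmx_conf="/path-to-file"
JMX_OPTS="-Dcom.sun.management.jmxremote.local.only=false"
JMX_OPTS="${JMX_OPTS} -Dcom.sun.management.jmxremote.ssl=true"
JMX_OPTS="${JMX_OPTS} -Dcom.sun.management.jmxremote.authenticate=true"
JMX_OPTS="${JMX_OPTS} -Dcom.sun.management.jmxremote.ssl.need.client.auth=true"
JMX_OPTS="${JMX_OPTS} -Djava.rmi.server.hostname=$(hostname)"
JMX_OPTS="${JMX_OPTS} -Dcom.sun.management.jmxremote.access.file=${jmx_conf}/jmxremote.access"
JMX_OPTS="${JMX_OPTS} -Dcom.sun.management.jmxremote.password.file=${jmx_conf}/jmxremote.password"
JMX_OPTS="${JMX_OPTS} -Dcom.sun.management.jmxremote.port=10021"
JMX_OPTS="${JMX_OPTS} -Dcom.sun.management.jmxremote.rmi.port=10022"
echo "${JMX_OPTS}"
```

Only the last two lines (the ports) differ between the JVMs; the rest is common to Tomcat, ActiveMQ and the generic Java processes.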

Zabbix Server

On the Zabbix Java Gateway side, there are also extra parameters to set. For that, I added the following lines to /etc/zabbix/zabbix_java_gateway.conf:

JAVA_OPTIONS="${JAVA_OPTIONS} \
 -Djavax.net.ssl.keyStore=/etc/zabbix/security/<ks_name>.p12 \
 -Djavax.net.ssl.keyStorePassword=<ks_password> \
 -Djavax.net.ssl.trustStorePassword=<ts_password>"

Followed by a service restart:

systemctl restart zabbix-java-gateway.service

Once this is completed, I will see all JMX interfaces up when hovering over the JMX box in the hosts list:

Finally, I must set macros for JMX user and password. Each template has its own macros:

Going Further

A possible improvement, which I am already working on, is to use the Zabbix API to perform every step programmatically. Another improvement would be to replace all JMX user/password related macros with only two.

Monitoring a full stack taught me a lot on monitoring in general and Zabbix in particular.

The article Monitor Alfresco with Zabbix appeared first on dbi Blog.

Hyland Summit 2023 – Düsseldorf


Introduction

The event took place in a very pleasant street: the Königsallee, also known as the Kö, a shopping street considered to be the most beautiful avenue in Düsseldorf. Wow!!

The Hyland Summit 2023 in Düsseldorf is the right place to learn and seek innovative ways to manage, secure, and harness the power of the content service platforms. 

As the digital transformation journey continues to reshape the way we work, collaborate, and serve our customers, this event promises to be exceptional. From visionary keynotes to real-world success stories, it’s an opportunity to gain insights into the latest technologies, strategies, and best practices that are redefining the way businesses approach content management. 

Both Guillaumes from dbi, him and me or me and him, were invited to participate this year. 

So let’s read this blog to follow us during this adventure and get the latest news about content excellence and innovation!

Chapter 1: The partner’s meet up

It started with a partner party on Monday evening, where we met the nice Hyland people, of course, but it was also the opportunity to discuss with experts from different horizons and to exchange about the new trends and innovations, if we face the same challenges and the way we overcome them.

Too impatient to participate, we were the first to arrive…

The discussion was so interesting that we decided to go with the fme AG team and share one of the biggest pizzas we have ever seen!

 

Chapter 2: The Summit

After a good night of sleep, we were full of energy, ready to enjoy the day and discover the awesome topics prepared by the speakers.

We started with a short stretching session led by Hyland’s Vice President Tim HOOD, followed by his warm welcome speech that set the scene for the show to come.

Welcome speech from Hyland VP Tim Hood

Then Chris McLaughlin, Executive Vice President & Chief Revenue Officer, talked about the Hyland vision: “shaping the future of content services”.

AI everywhere, mostly everywhere

As you will see throughout this blog, this vision is largely based on AI and also Machine Learning.

Chapter 3 : The Customers

After that, we got 4 sessions provided by Hyland customers explaining how their partnership with Hyland led to success stories, allowing complex scenarios to become real.

VGZ – OnBase: Document archiving and accessing in the cloud

Dutch Health Insurer VGZ shared with us their remarkable Hyland success story. Learn about the challenges they faced, their requirements for a solution, rollout of the project and some facts and figures from the VGZ team.

Sparebank 1 Insurance: The Road to Digital Success with Hyland’s OnBase

In 2017, faced with managing three separate end-of-life document repositories, Sparebank 1 partnered with Hyland to seamlessly integrate them into a unified OnBase solution. They further advanced their digital operations, migrating to Microsoft Azure in 2021 and incorporating case flow and document composing solutions.

During the session, Bente highlighted the benefits of their cloud-based solution. Notably, the streamlined case management process requires minimal training for claims managers and enables direct customer communication through SMS, email and letter, with communications added to cases in real time. In doing so, they’ve slashed the time cases take to be resolved, resulting in happier customers and reduced costs.

Future-proofed banking: FUIB’s transformation with Hyland’s cloud-native Nuxeo platform

The First Ukrainian International Bank (FUIB) is one of the leading banks in Ukraine, with a focus on providing top-quality customer service, customized banking offerings and innovation for both individuals and businesses.

Despite the testing times that the country is currently facing, FUIB strives to remain a stable and reliable service for customers, constantly improving its processes for their convenience and safety.

Alongside this already significant challenge, the company was also looking to modernize operations and move to a paperless approach to business.

As a large, legacy financial institution with over twenty years in service, the bank had generated a vast array of paper documents that formed a key part of the company’s processes. They now needed a solution that could centralize document storage, provide secure access to authorized users with a unified integration protocol and enable scalability for future growth.

In this session, you will hear why FUIB selected Hyland’s cloud-native solution, Nuxeo, to address these challenges.

Ethias: Scaling an Alfresco archive to 200 million documents… and counting

This customer experience was the one we were most interested in, as it was about Alfresco.

Ethias is an insurance company that decided to use Alfresco to archive Documentum data, a mission that was challenging due to the volume of documents and their criticality.

But with a strong containerized architecture, they were able to achieve the target.

Containerized Alfresco architecture

In addition, the indexing part was also key in this success, with SOLR sharding using the dbid-range technique (more information here), split into 25M nodes per shard, with 3 shards on each of the 6 SOLR servers.

Solr powerful sharding

To ensure the security of the platform, some mechanisms have been put in place, like an ABAC module for GDPR compliance, integration with CA SiteMinder and also content encryption at the Alfresco level.

Security overview

And to guarantee the good performance of this platform, monitoring was implemented with open-source applications like Elasticsearch, Grafana and Alerta…

Monitoring is the key

Chapter 4 : AWS Evangelists

We attended two presentations from AWS. The first one was from Philippe WANNER, introducing us to workload modernization and the benefits that Cloud solutions can bring, but also how important it is to break away from legacy technologies when you want your transformation journey to succeed.

After that, John MOUSA explained that successful organizations are the ones driving outcomes and value based on insights from data.

We explored patterns for data extraction, data lakes and lake house architectures, and finally distributed data architectures with some examples.

Chapter 5 : the live (fun) demo

Arsalan MINHAS and his happy team showed us an “almost” real case scenario of what could be (will be) the near future of content management combined with Artificial Intelligence.

The live demo scenario was pretty simple: a customer wants a loan to buy a new car and exchanges with the banker through a chat bot and automated forms. It was fun and lively, but it also demonstrated the power of AI and all the possibilities that come with it.

Chapter 6: Discover what’s new in the Hyland product portfolio

The idea is clear and reassuring: even if Hyland has three ECMs in its portfolio, they plan to continue developing each of them, because each has its advantages depending on the needs.

The main updates for each application are:

OnBase

Release 23.1 available since October 2023

  • New OnBase App Builder
  • New Outlook web Add-in
  • New front end

Nuxeo

LTS2023 since July 2023

The most scalable Platform!

  • Target to reach DR capabilities in 30 Minutes
  • An embedded 3D Viewer
  • Media Editor
  • Keys and secrets management

Alfresco

23.1 – November 2023

  • Farewell to Solr and welcome to Elasticsearch for the indexing part
  • New User Experience
  • Better Workflows integrations
  • Integration of AI with Hyland RPA
The future of Alfresco platform

Final Chapter : The Keynote

The Keynote title was “Unleash the ‘Rebel Technologist’ within and discover the future of digital transformation”

Presented by Brett StClair, CEO & Co-Founder of teraflow.ai.

He explained to us how to shift our conventional thinking to revolutionize our business.

Brett revealed how the fearless spirit of rebel technologists can fuel digital transformation and overcome adoption barriers. Focus on the basics and manage one problem at a time; a clear vision and a well-defined roadmap are the keys to success!

In substance: if you have an idea to move things forward, go ahead, make it known and fight to get it approved by everyone.

Conclusion

The Hyland Summit 2023 in Düsseldorf was undoubtedly an incredibly good event: the promises were kept, with speakers passionate about their field, eager to share their enthusiasm and expertise in data management.

Out of scope…

A special thanks to the Deutsche Bahn, which enabled me to take this superb photo thanks to the cancellation of our train(s) home… ;-)

  • Co-written with love by Guillaume FUCHS (the other).

The article Hyland Summit 2023 – Düsseldorf appeared first on dbi Blog.

Alfresco – Use Alfresco for a Trivia Game – Repository skeleton


Have you ever wondered how you could use Alfresco to run a Trivia Game? No? Strange, I believe that should be a very important question to ask yourself when you are looking for an ECM for your company! If you have been working or following Alfresco in the past 10 years, you might have heard about the ContentCraft project, which is an integration between Alfresco and Minecraft, using CMIS. A few months ago, I was preparing a presentation about REST-API in Alfresco and to conclude the talk on a funnier note (a 45min talk about REST-API IS fun!), I thought about which game I could play using Alfresco REST-API.

With half a day to set up my Alfresco environment and implement the game, I obviously didn’t want to do something too complex and therefore, I thought about a Trivia Game. It’s essentially a question-and-answer game, so knowing that Alfresco stores metadata and documents and that the REST-API can be used to fetch them, it appeared to be something easily feasible. In addition to that, it would help me and my presentation by running a small quiz to “test” the attendees and make sure they understood (/followed :D) the talk.

In this first blog, I will talk about the Alfresco preparation needed, which mainly consists of the meta-model. In a second blog, I will go through the REST-API calls needed to add questions into Alfresco with their associated good/bad answers of course. And the final blog will be around really “playing” the game. I will be using REST-API because it’s the simplest/fastest way to interact with Alfresco. A more advanced version of the game would most probably be using Web Scripts/Services so that it’s not up to the client to know how many answers are needed or to check whether the answer is good or bad, since on the client I could obviously just disregard the real answer and display that I selected the correct answer (but I don’t cheat! ;)).

First things first: to have something done quickly for my purpose, I designed a small meta-model using the Share Model Manager. I started with the creation of the model “dbi”:

Create Model

In terms of XML, it should be something like:

<model xmlns="http://www.alfresco.org/model/dictionary/1.0" name="dbi:dbi">
  <description>Model for demo</description>
  <author>Morgan Patou</author>
  ...
  <namespaces>
    <namespace uri="http://www.dbi-services.com/model/content/1.0" prefix="dbi"/>
  </namespaces>
  ...
</model>

Then I added an aspect “dbi_trivia”, so it can be added into existing nodes:

Create Aspect

In terms of XML, it should add something like:

  <aspect name="dbi:dbi_trivia">
    <title>dbi trivia</title>
    <properties>
      ...
    </properties>
    ...
  </aspect>

This game could be done using documents: for example, a document being a question, with the different answers and the correct one as content. Or it could be a type of document with the details as metadata. In this blog, I will use an aspect that will be applied to folders. Basically, a question is a folder, and the aspect is assigned to the folder so that it has access to the different metadata for questions and answers. I thought that would be one of the fastest/simplest approaches, so I went with that. Therefore, the next step is to create the different properties for the aspect: 1 for the question, then 4 for the different answers (it could be a multi-valued one as well) and finally a last one to specify which is the correct answer:

Aspect Properties

In terms of XML, it should add something like:

    <properties>
      <property name="dbi:dbi_question">
        <title>Question</title>
        <type>d:text</type>
        <mandatory>true</mandatory>
        ...
      </property>
      <property name="dbi:dbi_answer1">
        <title>Answer#1</title>
        <type>d:text</type>
        <mandatory>false</mandatory>
        ...
      </property>
      ...
      <property name="dbi:dbi_correct_answer">
        <title>Correct Answer</title>
        <type>d:text</type>
        <mandatory>false</mandatory>
        ...
      </property>
    </properties>

The “dbi:dbi_correct_answer” here is again a “d:text” metadata, in the sense that it will again contain the textual answer (the same value as answer #1, 2, 3 OR 4). It would of course be possible to have this property as an integer instead, linking to the answer. I selected a text so that it is slightly easier to show/see which one is the correct one if you want to randomise the display of possible answers, for example.

The next step is to create the layout to display these properties, which is optional, if you want them to be visible through Share. Again, something very simple:

Aspect Layout

As mentioned previously, the questions of a quiz would be different folders. To be able to handle/support multiple quizzes, it would be possible to either separate the quizzes based on a parent folder (each parent folder containing the different questions) or through another metadata item that indicates a unique ID for a specific quiz, in which case you could find the different questions through searches and wouldn’t really mind where the files are stored in Alfresco. The simplest being the parent folder (=quiz) and sub-folder (=questions) approach, I went with that, created one parent folder for now and noted its “uuid” (the last part of the “nodeRef“).
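The resulting repository structure can be pictured as follows (a sketch; the folder names match the naming convention used in the next part of this series):

```shell
# Sketch of the intended repository layout: one parent folder per quiz,
# one sub-folder per question carrying the dbi:dbi_trivia aspect
layout=$(cat <<'EOF'
TriviaGame/          <- parent folder = quiz ("uuid" noted for later)
+-- Q1/              <- question folder with the aspect applied
+-- Q2/
+-- Q3/
EOF
)
echo "${layout}"
```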

That concludes the first part of this blog. It’s basically a 10/15-minute setup to have the skeleton needed to play the game on the repository side. The second part of this blog can be found here and the third part here.

The article Alfresco – Use Alfresco for a Trivia Game – Repository skeleton appeared first on dbi Blog.

Alfresco – Use Alfresco for a Trivia Game – Add questions


In a previous blog, I talked about using Alfresco to setup a simple Trivia Game and more specifically the Repository part of it, i.e., the meta-model and the structure I will use. In this second part, I will go through the REST-API commands that can be used to create new questions with their associated good/bad answers.

First of all, I would need to define the target endpoint and set up my credentials so that I can exchange with Alfresco. To keep things simple, I will use Basic authentication with username/password. You could of course use a ticket or another supported authentication mechanism as you see fit. The “folder_id” is the “uuid” of the folder in which I prepared a few questions at the beginning of the year for my REST-API presentation and in which I will now add a new one for this blog:

$ base_url="https://alf-trivia.dbi-services.com"
$ endpoint="${base_url}/alfresco"
$ folder_id="e6395354-d38e-489b-b112-3549b521b04c"
$ username="admin"
$ read -s -p "Please enter the password of the '${username}' user for '${endpoint}': " password
Please enter the password of the 'admin' user for 'https://alf-trivia.dbi-services.com/alfresco':
$
$ auth=$(echo -n ${username}:${password} | base64)
$

I decided that the name of the folders inside Alfresco would be “QX” where X is the number of the question, from 1 to infinity. Therefore, to be able to create a new question, I first need to find out how many are currently present in my quiz. For that purpose, I’m using the GET node’s children REST-API (listNodeChildren) that will list all the children of a specific node, here “${folder_id}“, with some of their details such as the child’s name, creation/modification date, creator/modifier, type, etc. In the result, all I care about is the current total count of children, which I can get easily using jq (a JSON processor command-line utility):

$ response=$(curl -k -s -X GET "${endpoint}/api/-default-/public/alfresco/versions/1/nodes/${folder_id}/children" \
>   -H "Authorization: Basic ${auth}" \
>   -H "Accept: application/json")
$
$ echo ${response} | jq
{
  "list": {
    "pagination": {
      "count": 6,
      "hasMoreItems": false,
      "totalItems": 6,
      "skipCount": 0,
      "maxItems": 100
    },
    "entries": [
      {
        "entry": {
          "createdAt": "2023-01-18T10:42:14.957+0000",
          "isFolder": true,
          ...
          "name": "Q1",
          "id": "d2afbdb6-2afb-4acc-85e7-61a800e96db3",
          "nodeType": "cm:folder",
          "parentId": "e6395354-d38e-489b-b112-3549b521b04c"
        }
      },
      ...
    ]
  }
}
$
$ echo ${response} | jq -r ".list.pagination.totalItems"
6
$
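The jq extraction above can be tried offline against a canned response, for instance to derive the name of the next question folder (sample values only, no live Alfresco call needed):

```shell
# Offline sketch: derive the next question folder name from a canned
# listNodeChildren response (sample JSON, no live call)
response='{"list":{"pagination":{"count":6,"totalItems":6},"entries":[]}}'
num_children=$(echo "${response}" | jq -r ".list.pagination.totalItems")
folder_nb=$((num_children+1))
echo "Q${folder_nb}"
```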

Here, there are currently 6 children, and therefore the next question to create would be “Q7” (I store the number 7 in the “folder_nb” variable used below). For that purpose, I’m using the POST node’s children REST-API (createNode). The HTTP method changes compared to the previous request: with a POST, it is possible to ask Alfresco to create a child instead of listing them. Here, I can directly assign the “dbi:dbi_trivia” aspect to it, so that it enables the specific metadata for this node:

$ response=$(curl -k -s -X POST "${endpoint}/api/-default-/public/alfresco/versions/1/nodes/${folder_id}/children" \
>   -H "Authorization: Basic ${auth}" \
>   -H "Content-Type: application/json" \
>   -H "Accept: application/json" \
>   -d "{
>     \"name\": \"Q${folder_nb}\",
>     \"nodeType\": \"cm:folder\",
>     \"aspectNames\": [
>       \"dbi:dbi_trivia\"
>     ],
>     \"properties\": {
>       \"cm:title\":\"Question ${folder_nb}\",
>       \"cm:description\":\"Question #${folder_nb} of the TriviaGame\"
>     }
>   }")
$
$ new_folder_id=$(echo ${response} | jq -r ".entry.id"); echo ${new_folder_id}
46d78e31-c796-4839-bac8-e6d5a7ff5973
$

With that, the new folder exists, and it has the associated aspect. The next step is purely optional, but I wanted to put a specific tag on all my questions, so that I can find them quicker in Share. For that purpose, I’m using the POST node’s tags REST-API (createTagForNode) on the newly created folder:

$ tag_name="triviaquestion"
$ response=$(curl -k -s -X POST "${endpoint}/api/-default-/public/alfresco/versions/1/nodes/${new_folder_id}/tags" \
>   -H "Authorization: Basic ${auth}" \
>   -H "Content-Type: application/json" \
>   -H "Accept: application/json" \
>   -d "{
>     \"tag\": \"${tag_name}\"
>   }")
$

The last step for the first script is to set the different aspect metadata on the newly created folder. For that purpose, I’m using the PUT node REST-API (updateNode), after prompting for the exact values to add:

$ question="Is it possible to use Alfresco to play a Trivia Game?"
$ answer1="Obviously not, it's just an ECM!"
$ answer2="Only if using the Enterprise version..."
$ answer3="Of course, Alfresco can do everything"
$ answer4="I don't know"
$ correct_answer="Of course, Alfresco can do everything"
$
$ response=$(curl -k -s -X PUT "${endpoint}/api/-default-/public/alfresco/versions/1/nodes/${new_folder_id}" \
>   -H "Authorization: Basic ${auth}" \
>   -H "Content-Type: application/json" \
>   -H "Accept: application/json" \
>   -d "{
>     \"properties\": {
>       \"dbi:dbi_question\":\"${question}\",
>       \"dbi:dbi_answer1\":\"${answer1}\",
>       \"dbi:dbi_answer2\":\"${answer2}\",
>       \"dbi:dbi_answer3\":\"${answer3}\",
>       \"dbi:dbi_answer4\":\"${answer4}\",
>       \"dbi:dbi_correct_answer\":\"${correct_answer}\"
>     }
>   }")
$
$ echo ${response} | jq
{
  "entry": {
    "aspectNames": [
      "cm:titled",
      "cm:auditable",
      "dbi:dbi_trivia",
      "cm:taggable"
    ],
    "createdAt": "2023-11-03T10:12:49.804+0000",
    "isFolder": true,
    ...
    "name": "Q7",
    "id": "46d78e31-c796-4839-bac8-e6d5a7ff5973",
    "nodeType": "cm:folder",
    "properties": {
      "cm:title": "Question 7",
      "dbi:dbi_question": "Is it possible to use Alfresco to play a Trivia Game?",
      "dbi:dbi_correct_answer": "Of course, Alfresco can do everything",
      "dbi:dbi_answer4": "I don't know",
      "dbi:dbi_answer3": "Of course, Alfresco can do everything",
      "dbi:dbi_answer2": "Only if using the Enterprise version...",
      "dbi:dbi_answer1": "Obviously not, it's just an ECM!",
      "cm:description": "Question #7 of the TriviaGame",
      "cm:taggable": [
        "6017bd2f-05d2-4828-9a1d-a418cf43a84e"
      ]
    },
    "parentId": "e6395354-d38e-489b-b112-3549b521b04c"
  }
}
$

Putting everything together into a small bash script and then executing it again to make sure everything works, to create the 8th question:

$ cat triviaAdd.sh
#!/bin/bash

# Define endpoint, credentials, and folder ID
base_url="https://alf-trivia.dbi-services.com"
endpoint="${base_url}/alfresco"
share="${base_url}/share/page"
folder_id="e6395354-d38e-489b-b112-3549b521b04c"
username="admin"
read -s -p "Please enter the password of the '${username}' user for '${endpoint}': " password
auth=$(echo -n ${username}:${password} | base64)
tag_name="triviaquestion"

# Get number of children
echo
echo
echo "Fetching the current number of questions from Alfresco..."
echo
response=$(curl -k -s -X GET "${endpoint}/api/-default-/public/alfresco/versions/1/nodes/${folder_id}/children" \
  -H "Authorization: Basic ${auth}" \
  -H "Accept: application/json")
num_children=$(echo ${response} | jq -r ".list.pagination.totalItems")
folder_nb="$((num_children+1))"

# Create new folder
response=$(curl -k -s -X POST "${endpoint}/api/-default-/public/alfresco/versions/1/nodes/${folder_id}/children" \
  -H "Authorization: Basic ${auth}" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json" \
  -d "{
    \"name\": \"Q${folder_nb}\",
    \"nodeType\": \"cm:folder\",
    \"aspectNames\": [
      \"dbi:dbi_trivia\"
    ],
    \"properties\": {
      \"cm:title\":\"Question ${folder_nb}\",
      \"cm:description\":\"Question #${folder_nb} of the TriviaGame\"
    }
  }")
new_folder_id=$(echo ${response} | jq -r ".entry.id")
echo -e "\033[32;1m  --> A new folder 'Q${folder_nb}' has been created: ${share}/folder-details?nodeRef=workspace://SpacesStore/${new_folder_id}\033[0m"
echo

# Add the tag
response=$(curl -k -s -X POST "${endpoint}/api/-default-/public/alfresco/versions/1/nodes/${new_folder_id}/tags" \
  -H "Authorization: Basic ${auth}" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json" \
  -d "{
    \"tag\": \"${tag_name}\"
  }")
echo -e "\033[32;1m  --> The tag '${tag_name}' has been added to the folder\033[0m"
echo

# Add question, answers and correct answer
read -p "Enter the question: " question
read -p "Enter the answer #1: " answer1
read -p "Enter the answer #2: " answer2
read -p "Enter the answer #3: " answer3
read -p "Enter the answer #4: " answer4
read -p "Enter the correct answer: " correct_answer
response=$(curl -k -s -X PUT "${endpoint}/api/-default-/public/alfresco/versions/1/nodes/${new_folder_id}" \
  -H "Authorization: Basic ${auth}" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json" \
  -d "{
    \"properties\": {
      \"dbi:dbi_question\":\"${question}\",
      \"dbi:dbi_answer1\":\"${answer1}\",
      \"dbi:dbi_answer2\":\"${answer2}\",
      \"dbi:dbi_answer3\":\"${answer3}\",
      \"dbi:dbi_answer4\":\"${answer4}\",
      \"dbi:dbi_correct_answer\":\"${correct_answer}\"
    }
  }")
echo -e "\033[32;1m  --> The question and answers have been added to the folder\033[0m"
echo
$
$
$ # Execute the script to add a question
$ ./triviaAdd.sh
Please enter the password of the 'admin' user for 'https://alf-trivia.dbi-services.com/alfresco':

Fetching the current number of questions from Alfresco...

  --> A new folder 'Q8' has been created: https://alf-trivia.dbi-services.com/share/page/folder-details?nodeRef=workspace://SpacesStore/0b147c71-70fd-498e-9bf6-d40a738699fa

  --> The tag 'triviaquestion' has been added to the folder

Enter the question: Is this working as it should?
Enter the answer #1: Of course...
Enter the answer #2: Maybe?
Enter the answer #3: Definitively not
Enter the answer #4: Why do you ask me???
Enter the correct answer: Of course...
  --> The question and answers have been added to the folder

$

Unfortunately, I cannot display the colors of the code sections on our blog platform but, if you look at the source code, you will see the script includes them so that the output is clearer. Looking into Alfresco Share shows the newly created folder with the proper metadata:

That concludes the second part of this blog. If you are familiar with REST-API, writing this kind of small script shouldn’t take you more than a few minutes, let’s say 30 minutes if you want to add colors as I did and some wrapping/information around it, to have something that can be presented. The first part of this blog is available here and the third part here.

The article Alfresco – Use Alfresco for a Trivia Game – Add questions appeared first on dbi Blog.

Alfresco – Use Alfresco for a Trivia Game – Play the game


In previous blogs (here & here), I talked about the Repository skeleton necessary and then about how to use REST-API to add questions and answers into Alfresco for the Trivia Game. In this third part, I will go through what would be needed to really play the game since that was the initial goal.

This time again, the first thing to do is to define the target endpoint and set up my credentials so that I can exchange with Alfresco. No changes on the authentication method: I will continue to use Basic authentication and I will of course use the same parent folder, which represents the quiz I prepared earlier (reminder: multiple parent folders = multiple quizzes):

$ base_url="https://alf-trivia.dbi-services.com"
$ endpoint="${base_url}/alfresco"
$ folder_id="e6395354-d38e-489b-b112-3549b521b04c"
$ username="admin"
$ read -s -p "Please enter the password of the '${username}' user for '${endpoint}': " password
Please enter the password of the 'admin' user for 'https://alf-trivia.dbi-services.com/alfresco':
$
$ auth=$(echo -n ${username}:${password} | base64)
$

To be able to play the game, I need to retrieve all the questions with their associated answers and, to do that, I need to loop over all the children of the parent folder. Therefore, the next step is to retrieve the list of all child node IDs (“uuid“) present in the folder, i.e., the unique identifier of each question. For that purpose, I’m using the GET node’s children REST-API (listNodeChildren) that I used in the previous blog to retrieve the number of existing questions, but this time, I will retrieve the list of IDs from it:

$ response=$(curl -k -s -X GET "${endpoint}/api/-default-/public/alfresco/versions/1/nodes/${folder_id}/children" \
>   -H "Authorization: Basic ${auth}" \
>   -H "Accept: application/json")
$
$ echo ${response} | jq
{
  "list": {
    "pagination": {
      "count": 8,
      "hasMoreItems": false,
      "totalItems": 8,
      "skipCount": 0,
      "maxItems": 100
    },
    "entries": [
      {
        "entry": {
          "createdAt": "2023-01-18T10:42:14.957+0000",
          "isFolder": true,
          ...
          "name": "Q1",
          "id": "d2afbdb6-2afb-4acc-85e7-61a800e96db3",
          "nodeType": "cm:folder",
          "parentId": "e6395354-d38e-489b-b112-3549b521b04c"
        }
      },
      ...
      {
        "entry": {
          "createdAt": "2023-11-03T09:38:45.786+0000",
          "isFolder": true,
          ...
          "name": "Q8",
          "id": "0b147c71-70fd-498e-9bf6-d40a738699fa",
          "nodeType": "cm:folder",
          "parentId": "e6395354-d38e-489b-b112-3549b521b04c"
        }
      }
    ]
  }
}
$
$ echo ${response} | jq -r ".list.entries[].entry.id"
d2afbdb6-2afb-4acc-85e7-61a800e96db3
c4655fb0-5eef-494c-8b1e-5f160ca53558
d53bb919-ce60-42a7-a4db-5c8e3c5bfdac
c4e9826a-a1f4-4bf7-9768-137205c01045
a45e1b98-5780-448e-b26d-a603f9b03a85
4218c4b7-0be1-4948-9ae6-50e1575d1185
46d78e31-c796-4839-bac8-e6d5a7ff5973
0b147c71-70fd-498e-9bf6-d40a738699fa
$
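The list of IDs produced by the jq filter above feeds the game loop; as an offline sketch with a canned response (the per-question getNode call inside the loop is elided here, since it is shown further down):

```shell
# Offline sketch: iterate over the question IDs extracted from a canned
# listNodeChildren response (sample IDs, no live call)
response='{"list":{"entries":[{"entry":{"id":"aaa-111"}},{"entry":{"id":"bbb-222"}}]}}'
for node in $(echo "${response}" | jq -r ".list.entries[].entry.id"); do
  # in the real game, a getNode call on ${node} would go here
  echo "Fetching question ${node}"
done
```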

If I wanted to, I could just use the above single REST-API call to get the properties of all nodes (without retrieving their IDs first), by using “/nodes/${folder_id}/children?include=properties” instead of “/nodes/${folder_id}/children“. This would automatically include all the properties of the nodes and therefore I would see the aspect’s properties in this single command, for all the questions:

$ response=$(curl -k -s -X GET "${endpoint}/api/-default-/public/alfresco/versions/1/nodes/${folder_id}/children?include=properties" \
>   -H "Authorization: Basic ${auth}" \
>   -H "Accept: application/json")
$
$ echo ${response} | jq -r ".list.entries[].entry.properties"
{
  "cm:title": "Question 1",
  ...
}
...
{
  "cm:title": "Question 7",
  "dbi:dbi_question": "Is it possible to use Alfresco to play a Trivia Game?",
  "dbi:dbi_correct_answer": "Of course, Alfresco can do everything",
  "dbi:dbi_answer4": "I don't know",
  "dbi:dbi_answer3": "Of course, Alfresco can do everything",
  "dbi:dbi_answer2": "Only if using the Enterprise version...",
  "dbi:dbi_answer1": "Obviously not, it's just an ECM!",
  "cm:description": "Question #7 of the TriviaGame",
  "cm:taggable": [
    "6017bd2f-05d2-4828-9a1d-a418cf43a84e"
  ]
}
{
  "cm:title": "Question 8",
  "dbi:dbi_question": "Is this working as it should?",
  "dbi:dbi_correct_answer": "Of course...",
  "dbi:dbi_answer4": "Why do you ask me???",
  "dbi:dbi_answer3": "Definitively not",
  "dbi:dbi_answer2": "Maybe?",
  "dbi:dbi_answer1": "Of course...",
  "cm:description": "Question #8 of the TriviaGame",
  "cm:taggable": [
    "6017bd2f-05d2-4828-9a1d-a418cf43a84e"
  ]
}
$

The above query including the properties is probably the optimal way to retrieve the information, since it’s a single REST-API call and you have all you need there… But I couldn’t just complete the blog with a single REST-API call, it would be too fast ;). Therefore, I will use the list of question IDs, going through them all and retrieving the question and all the possible answers for each one separately, so they can be displayed to the “player” to test their knowledge. For that purpose, I’m using the GET node REST-API (getNode):

$ node="0b147c71-70fd-498e-9bf6-d40a738699fa"
$ response=$(curl -k -s -X GET "${endpoint}/api/-default-/public/alfresco/versions/1/nodes/${node}" \
>   -H "Authorization: Basic ${auth}" \
>   -H "Accept: application/json")
$
$ echo ${response} | jq
{
  "entry": {
    "aspectNames": [
      "cm:titled",
      "cm:auditable",
      "dbi:dbi_trivia",
      "cm:taggable"
    ],
    "createdAt": "2023-11-03T09:38:45.786+0000",
    "isFolder": true,
    ...
    "name": "Q8",
    "id": "0b147c71-70fd-498e-9bf6-d40a738699fa",
    "nodeType": "cm:folder",
    "properties": {
      "cm:title": "Question 8",
      "dbi:dbi_question": "Is this working as it should?",
      "dbi:dbi_correct_answer": "Of course...",
      "dbi:dbi_answer4": "Why do you ask me???",
      "dbi:dbi_answer3": "Definitively not",
      "dbi:dbi_answer2": "Maybe?",
      "dbi:dbi_answer1": "Of course...",
      "cm:description": "Question #8 of the TriviaGame",
      "cm:taggable": [
        "6017bd2f-05d2-4828-9a1d-a418cf43a84e"
      ]
    },
    "parentId": "e6395354-d38e-489b-b112-3549b521b04c"
  }
}
$

From that point, it’s only a matter of display and verification, which is simple scripting not related to Alfresco, so I won’t go through it in detail. Here is the final content of the small bash script and its execution to play the game:

$ cat triviaPlay.sh
#!/bin/bash

# Define endpoint, credentials, and folder ID
base_url="https://alf-trivia.dbi-services.com"
endpoint="${base_url}/alfresco"
folder_id="e6395354-d38e-489b-b112-3549b521b04c"
username="admin"
read -s -p "Please enter the password of the '${username}' user for '${endpoint}': " password
auth=$(echo -n ${username}:${password} | base64)

# Get all nodes in the TriviaGame folder
echo
echo
echo "Fetching all existing questions from Alfresco..."
response=$(curl -k -s -X GET "${endpoint}/api/-default-/public/alfresco/versions/1/nodes/${folder_id}/children" \
  -H "Authorization: Basic ${auth}" \
  -H "Accept: application/json")
nodes=$(echo ${response} | jq -r ".list.entries[].entry.id")

# Iterate through all nodes
nb_correct=0
nb_incorrect=0
for node in ${nodes}; do
  # Get question, answers, and correct answer from node's custom aspect
  response=$(curl -k -s -X GET "${endpoint}/api/-default-/public/alfresco/versions/1/nodes/${node}" \
    -H "Authorization: Basic ${auth}" \
    -H "Accept: application/json")
  question=$(echo ${response} | jq -r '.entry.properties."dbi:dbi_question"')
  answer1=$(echo ${response} | jq -r '.entry.properties."dbi:dbi_answer1"')
  answer2=$(echo ${response} | jq -r '.entry.properties."dbi:dbi_answer2"')
  answer3=$(echo ${response} | jq -r '.entry.properties."dbi:dbi_answer3"')
  answer4=$(echo ${response} | jq -r '.entry.properties."dbi:dbi_answer4"')
  correct_answer=$(echo ${response} | jq -r '.entry.properties."dbi:dbi_correct_answer"')

  # Ask question and get user input
  echo
  echo -e "\033[4mQuestion #$((nb_correct+nb_incorrect+1)):\033[0m"
  echo "${question}"
  echo "  1) ${answer1}"
  echo "  2) ${answer2}"
  echo "  3) ${answer3}"
  echo "  4) ${answer4}"
  read -p "Please enter your answer (1, 2, 3 or 4): " user_answer
  answer=$(eval "echo \${answer$user_answer}")

  # Check if answer is correct
  if [[ "${answer}" == "${correct_answer}" ]]; then
    echo -e "\033[32;1m  --> Correct!\033[0m"
    nb_correct=$((nb_correct+1))
  else
    echo -e "\033[31m  --> Incorrect... The correct answer is: \033[31;1m${correct_answer}\033[31m.\033[0m"
    nb_incorrect=$((nb_incorrect+1))
  fi
done

# Print final score
echo
if [[ "${nb_incorrect}" == "0" ]]; then
  echo -e "\033[32;1m==> Congratulations, your final score is a perfect ${nb_correct}/$((nb_correct+nb_incorrect))!\033[0m"
else
  if [[ "${nb_correct}" -gt "${nb_incorrect}" ]]; then
    echo -e "\033[32;1m==> Your final score is an acceptable ${nb_correct}/$((nb_correct+nb_incorrect)). You can still do better!\033[0m"
  else
    echo -e "\033[31;1m==> Oops, your final score is ${nb_correct}/$((nb_correct+nb_incorrect))... You will do better next time!\033[0m"
  fi
fi
echo
$
$
$ # Execute the script to play the game
$ ./triviaPlay.sh
Please enter the password of the 'admin' user for 'https://alf-trivia.dbi-services.com/alfresco':

Fetching all existing questions from Alfresco...

Question #1:
What is the best ECM?
  1) Documentum
  2) Alfresco
  3) Nuxeo
  4) SharePoint (lol)
Please enter your answer (1, 2, 3 or 4): 2
  --> Correct!

Question #2:
Why?
  1) Because
  2) Because
  3) Because it is the best
  4) Because
Please enter your answer (1, 2, 3 or 4): 3
  --> Correct!

Question #3:
How can you interact with Alfresco REST-API?
  1) Using a browser
  2) Using a script (bash/python/java/etc)
  3) Using Postman
  4) All of the above
Please enter your answer (1, 2, 3 or 4): 4
  --> Correct!

Question #4:
What is the correct HTTP Verb to use to perform a search?
  1) POST
  2) GET
  3) PUT
  4) DELETE
Please enter your answer (1, 2, 3 or 4): 1
  --> Correct!

Question #5:
What is the correct URI to use to perform a search?
  1) /alfresco/api/search
  2) /alfresco/api/-default-/versions/1/search
  3) /alfresco/api/-default-/public/search
  4) /alfresco/api/-default-/public/search/versions/1/search
Please enter your answer (1, 2, 3 or 4): 4
  --> Correct!

Question #6:
How can you create a Node with content in a single API call?
  1) Using the content API and a multipart body
  2) Using the content API and a json body
  3) Using the children API and a multipart body
  4) Using the children API and a json body
Please enter your answer (1, 2, 3 or 4): 3
  --> Correct!

Question #7:
Is it possible to use Alfresco to play a Trivia Game?
  1) Obviously not, it's just an ECM!
  2) Only if using the Enterprise version...
  3) Of course, Alfresco can do everything
  4) I don't know
Please enter your answer (1, 2, 3 or 4): 3
  --> Correct!

Question #8:
Is this working as it should?
  1) Of course...
  2) Maybe?
  3) Definitively not
  4) Why do you ask me???
Please enter your answer (1, 2, 3 or 4): 1
  --> Correct!

==> Congratulations, your final score is a perfect 8/8!

$

I can also force some wrong answers, just to make sure it detects it properly:

$ ./triviaPlay.sh
...
Question #2:
Why?
  1) Because
  2) Because
  3) Because it is the best
  4) Because
Please enter your answer (1, 2, 3 or 4): 1
  --> Incorrect... The correct answer is: Because it is the best.
...

==> Your final score is an acceptable 7/8. You can still do better!

$
$
$ ./triviaPlay.sh
...
Question #8:
Is this working as it should?
  1) Of course...
  2) Maybe?
  3) Definitively not
  4) Why do you ask me???
Please enter your answer (1, 2, 3 or 4): 4
  --> Incorrect... The correct answer is: Of course....

==> Oops, your final score is 0/8... You will do better next time!

$

As mentioned earlier, you could simply use the first REST-API command to get all the details and then create arrays using “JQ“, containing all the metadata needed for the game. Both approaches are very easy to implement/script and give a fun ending to a presentation about the Alfresco REST-API, so it was good enough for me! In case you missed them, the previous parts of this blog can be found here and here.
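
That array-based variant could look like the sketch below (assumption: “${response}“ holds the “?include=properties“ JSON from the first call; a one-entry sample is used as a fallback, with hypothetical data, so the snippet runs standalone):

```shell
# Fallback sample so the sketch is runnable on its own (hypothetical data)
if [ -z "${response:-}" ]; then
  response='{"list":{"entries":[{"entry":{"properties":{"dbi:dbi_question":"Is this working as it should?","dbi:dbi_correct_answer":"Of course..."}}}]}}'
fi

# Build parallel bash arrays in one pass, one entry per question
mapfile -t questions < <(echo "${response}" | jq -r '.list.entries[].entry.properties."dbi:dbi_question"')
mapfile -t correct_answers < <(echo "${response}" | jq -r '.list.entries[].entry.properties."dbi:dbi_correct_answer"')
echo "Loaded ${#questions[@]} question(s); first one: ${questions[0]}"
```

The quiz loop would then index into “questions“ and “correct_answers“ instead of issuing one getNode call per question.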

The article Alfresco – Use Alfresco for a Trivia Game – Play the game appeared first on dbi Blog.

Alfresco – A never ending transformation


At the beginning of the week, while working for our ServiceDesk (SLA support for our customers), I saw a few dozen emails generated over the weekend by our monitoring of a Production Alfresco 7.x Cluster, whose RAM and disk space were doing the yo-yo. Nothing was down, just some strange behavior where 20GB of free space would be gone and then re-appear after a few minutes, and the same thing for the RAM/SWAP.

The first thing I checked was the disk space mentioned in the alert. We received alerts from all members of the cluster one by one, in an almost perfect round-robin manner. On the second node, I saw the issue occurring in real-time, so I looked into what exactly was generating all the noise:

alfresco@alf-p2:~# date; df -h /tmp
Mon Feb 12 07:27:41 UTC 2024
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb2        19G    7G   12G  35% /tmp
alfresco@alf-p2:~#
alfresco@alf-p2:~# date; df -h /tmp
Mon Feb 12 07:28:20 UTC 2024
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb2        19G    9G    9G  49% /tmp
alfresco@alf-p2:~#
alfresco@alf-p2:~# du -sm /tmp/
9427    /tmp/
alfresco@alf-p2:~#
alfresco@alf-p2:~# du -sm /tmp/
9484    /tmp/
alfresco@alf-p2:~#
alfresco@alf-p2:~# du -sm /tmp/
9541    /tmp/
alfresco@alf-p2:~#

In less than a minute, around 2 to 3GB of temporary files were generated, which doesn’t seem very healthy:

alfresco@alf-p2:~# cd /tmp
alfresco@alf-p2:/tmp#
alfresco@alf-p2:/tmp# ls -ltr
total 480
...
-rw-r-----   1 alfresco  alfresco    115 Feb 11 21:26 scheduler.json
drwxr-x---   2 alfresco  alfresco   4096 Feb 12 07:28 Alfresco/
drwxrwxrwt 117 root      root      12288 Feb 12 07:28 ./
alfresco@alf-p2:/tmp#
alfresco@alf-p2:/tmp# cd Alfresco/
alfresco@alf-p2:/tmp/Alfresco# ls -l
total 10553428
drwxr-x---   2 alfresco alfresco        4096 Feb 12 07:29 ./
drwxrwxrwt 117 root     root           12288 Feb 12 07:29 ../
-rw-r-----   1 alfresco alfresco     1897650 Feb 12 07:23 source_11877384286747332767_tmp.pdf
-rw-r-----   1 alfresco alfresco 10804789248 Feb 12 07:29 target_18121744399232974935_tmp.txt
alfresco@alf-p2:/tmp/Alfresco#
alfresco@alf-p2:/tmp/Alfresco#
alfresco@alf-p2:/tmp/Alfresco# ls -l
total 10686460
drwxr-x---   2 alfresco alfresco        4096 Feb 12 07:29 ./
drwxrwxrwt 117 root     root           12288 Feb 12 07:29 ../
-rw-r-----   1 alfresco alfresco     1897650 Feb 12 07:23 source_11877384286747332767_tmp.pdf
-rw-r-----   1 alfresco alfresco 10941014016 Feb 12 07:29 target_18121744399232974935_tmp.txt
alfresco@alf-p2:/tmp/Alfresco#

At that point in time, it looked like Alfresco was doing something that was causing the disk space issue, at least. Here, we can see a PDF file as the “source” and a TXT file, apparently still being generated, as the “target”. So of course, my first thought was that the Alfresco Transformation Service was causing this issue, trying to transform a PDF into TXT, most probably to index the content of this file.

Looking at the RAM/SWAP usage on this server showed the same thing: the Java process of the ATS was using 100% CPU (fortunately, the host has multiple CPUs) and going overboard with its RAM, forcing the host to SWAP.

Therefore, I looked at the ATS logs and saw 2 types of errors. First, there were a few IOExceptions on PDFBox (“Error: End-Of-File: expected line“), but there weren’t a lot of those… Then there was another error, much more frequent, that was the consequence of the FileSystem being full:

alfresco@alf-p2:~# cat $ATS_HOME/logs/transform-core-aio.log
...
2024-02-12 07:18:37.380 ERROR 23713 --- [o-8090-exec-141] o.a.transformer.TransformController      : Error writing: Seite 1

org.alfresco.transform.exceptions.TransformException: Error writing: Seite 1
        at org.alfresco.transformer.executors.Transformer.transform(Transformer.java:83) ~[alfresco-transformer-base-2.5.3.jar!/:2.5.3]
        at org.alfresco.transformer.AIOController.transformImpl(AIOController.java:118) ~[classes!/:2.5.3]
        at org.alfresco.transformer.AbstractTransformerController.transform(AbstractTransformerController.java:173) ~[alfresco-transformer-base-2.5.3.jar!/:2.5.3]
        at jdk.internal.reflect.GeneratedMethodAccessor75.invoke(Unknown Source) ~[na:na]
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:na]
        at java.base/java.lang.reflect.Method.invoke(Method.java:566) ~[na:na]
        ...
Caused by: java.lang.IllegalStateException: Error writing: Seite 1
        at org.alfresco.transformer.executors.Tika.transform(Tika.java:697) ~[alfresco-transform-tika-2.5.3.jar!/:2.5.3]
        at org.alfresco.transformer.executors.Tika.transform(Tika.java:673) ~[alfresco-transform-tika-2.5.3.jar!/:2.5.3]
        at org.alfresco.transformer.executors.Tika.transform(Tika.java:617) ~[alfresco-transform-tika-2.5.3.jar!/:2.5.3]
        at org.alfresco.transformer.executors.TikaJavaExecutor.call(TikaJavaExecutor.java:141) ~[alfresco-transform-tika-2.5.3.jar!/:2.5.3]
        at org.alfresco.transformer.executors.TikaJavaExecutor.transform(TikaJavaExecutor.java:131) ~[alfresco-transform-tika-2.5.3.jar!/:2.5.3]
        at org.alfresco.transformer.executors.Transformer.transform(Transformer.java:70) ~[alfresco-transformer-base-2.5.3.jar!/:2.5.3]
        ... 55 common frames omitted
Caused by: org.xml.sax.SAXException: Error writing: Seite 1
        at org.apache.tika.sax.ToTextContentHandler.characters(ToTextContentHandler.java:110) ~[tika-core-1.26.jar!/:1.26]
        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146) ~[tika-core-1.26.jar!/:1.26]
        at org.apache.tika.sax.WriteOutContentHandler.characters(WriteOutContentHandler.java:136) ~[tika-core-1.26.jar!/:1.26]
        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146) ~[tika-core-1.26.jar!/:1.26]
        ...
        at org.alfresco.transformer.executors.Tika.transform(Tika.java:693) ~[alfresco-transform-tika-2.5.3.jar!/:2.5.3]
        ... 60 common frames omitted
        Suppressed: java.io.IOException: No space left on device
                at java.base/java.io.FileOutputStream.writeBytes(Native Method) ~[na:na]
                at java.base/java.io.FileOutputStream.write(FileOutputStream.java:354) ~[na:na]
                at java.base/sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:233) ~[na:na]
                at java.base/sun.nio.cs.StreamEncoder.implClose(StreamEncoder.java:337) ~[na:na]
                at java.base/sun.nio.cs.StreamEncoder.close(StreamEncoder.java:161) ~[na:na]
                at java.base/java.io.OutputStreamWriter.close(OutputStreamWriter.java:255) ~[na:na]
                at java.base/java.io.BufferedWriter.close(BufferedWriter.java:269) ~[na:na]
                at org.alfresco.transformer.executors.Tika.transform(Tika.java:684) ~[alfresco-transform-tika-2.5.3.jar!/:2.5.3]
                ... 60 common frames omitted
Caused by: java.io.IOException: No space left on device
        at java.base/java.io.FileOutputStream.writeBytes(Native Method) ~[na:na]
        at java.base/java.io.FileOutputStream.write(FileOutputStream.java:354) ~[na:na]
        at java.base/sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:233) ~[na:na]
        at java.base/sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:303) ~[na:na]
...
alfresco@alf-p2:~#

As you can see above, at 07:18, the FileSystem /tmp was 100% full, and when I checked a few minutes later, a new transformation had already started at 07:23 and was producing a text file of more than 10GB, still growing. So, it was clear that this happens repeatedly, most probably for the same document. According to the monitoring, the issue started just before the weekend. Looking at the first occurrences of the full FileSystem in the ATS logs gave the following:

alfresco@alf-p2:~# grep '2024.*Error writing' $ATS_HOME/logs/transform-core-aio.log
2024-02-09 19:20:51.628 ERROR 23713 --- [o-8090-exec-166] o.a.transformer.TransformController      : Error writing:
2024-02-09 19:41:29.954 ERROR 23713 --- [o-8090-exec-156] o.a.transformer.TransformController      : Error writing: Seite 1
2024-02-09 20:02:11.764 ERROR 23713 --- [o-8090-exec-160] o.a.transformer.TransformController      : Error writing: Seite 1
2024-02-09 20:23:08.828 ERROR 23713 --- [o-8090-exec-163] o.a.transformer.TransformController      : Error writing:
2024-02-09 20:44:05.313 ERROR 23713 --- [o-8090-exec-141] o.a.transformer.TransformController      : Error writing: Seite 1
2024-02-09 21:04:52.642 ERROR 23713 --- [o-8090-exec-162] o.a.transformer.TransformController      : Error writing: Seite 1
...
2024-02-12 07:18:37.380 ERROR 23713 --- [o-8090-exec-152] o.a.transformer.TransformController      : Error writing: Seite 1
alfresco@alf-p2:~#

With the above, it is pretty much confirmed that it’s always the same document failing, since it’s blocking on “Seite 1“, which is German for “Page 1“.

To find which document is causing the issue in Alfresco, there aren’t many details available, since the ATS isn’t really telling you much about what it is doing. All I had was a temporary file name (which obviously doesn’t trace back to anything in the Repository) and a size. Therefore, I searched the Alfresco Data (“alf_data“) for documents with a size equal to that of “/tmp/Alfresco/source_11877384286747332767_tmp.pdf” (i.e. 1897650 bytes) and created in the last few days. I expected it to have been created on the 9-Feb, a little before 19:20, and I indeed found one:

alfresco@alf-p2:~# find /alf_data/contentstore/2024/2/ -type f -ls | grep 1897650
 34508512  1856 -rw-r----- 1 alfresco alfresco 1897650 Feb 9 19:02 /alf_data/contentstore/2024/2/9/19/02/174f569e-93a3-4829-8ad5-bd3d6e78447b.bin
alfresco@alf-p2:~#
alfresco@alf-p2:~# md5sum /tmp/Alfresco/source_11877384286747332767_tmp.pdf /alf_data/contentstore/2024/2/9/19/02/174f569e-93a3-4829-8ad5-bd3d6e78447b.bin
45ed40bd5f84b7c68e246885f2b6a55f  /tmp/Alfresco/source_11877384286747332767_tmp.pdf
45ed40bd5f84b7c68e246885f2b6a55f  /alf_data/contentstore/2024/2/9/19/02/174f569e-93a3-4829-8ad5-bd3d6e78447b.bin
alfresco@alf-p2:~#
alfresco@alf-p2:~# diff /tmp/Alfresco/source_11877384286747332767_tmp.pdf /alf_data/contentstore/2024/2/9/19/02/174f569e-93a3-4829-8ad5-bd3d6e78447b.bin
alfresco@alf-p2:~#

Therefore, this is the same content file. There is of course the possibility that a duplicate node was using the same content before February (as I searched only inside /2024/2, i.e. February 2024), but since the issue appeared only over the weekend, it’s pretty safe to assume it’s this document/node.
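
To rule out size collisions entirely, the lookup could also be done by checksum instead of size. A small helper sketch (the paths in the usage comment are the ones from this case):

```shell
# find_by_checksum: print every file under a directory whose md5 matches a reference file
find_by_checksum() {  # $1 = reference file, $2 = directory to scan
  local sum
  sum=$(md5sum "$1" | awk '{print $1}')
  find "$2" -type f -exec md5sum {} + | awk -v s="${sum}" '$1 == s {print $2}'
}
# usage: find_by_checksum /tmp/Alfresco/source_11877384286747332767_tmp.pdf /alf_data/contentstore/2024/2
```

It is slower than matching on size (every candidate file is hashed), so restricting the scanned directory to the relevant month, as above, keeps it manageable.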

alfresco@alf-p2:~# stat /alf_data/contentstore/2024/2/9/19/02/174f569e-93a3-4829-8ad5-bd3d6e78447b.bin
  File: /alf_data/contentstore/2024/2/9/19/02/174f569e-93a3-4829-8ad5-bd3d6e78447b.bin
  Size: 1897650         Blocks: 3712       IO Block: 262144 regular file
Device: 34h/52d Inode: 34508512    Links: 1
Access: (0640/-rw-r-----)  Uid: (  113/alfresco)   Gid: (  116/alfresco)
Access: 2024-02-09 19:02:12.153002964 +0000
Modify: 2024-02-09 19:02:12.157983495 +0000
Change: 2024-02-09 19:02:12.157983635 +0000
 Birth: -
alfresco@alf-p2:~#

From that point, I had the “content_url“ of a Node. I could therefore have used the Database (see useful database queries) to find the NodeRef of this Alfresco Node, but at this customer, I don’t have easy access to the DB, so I went through Share instead.

I knew the node was created (or modified) at 19:02:12 (+/- 1s) on the 9-Feb, and even if the content isn’t indexed, its metadata should still be searchable. Therefore, I simply performed a search on Alfresco Share to find documents created (or modified) at that exact time, i.e. cm:created:’2024-02-09T19:02:12′.

That gave me 4 results, out of which only 1 had a size around 2MB. To validate that this was indeed the document causing the issue, I simply used the JavaScript Console to dump this file, and it gave me the exact same “content_url“. I could also validate on Share that this specific file hadn’t been content-indexed yet (despite being in the repository for 2.5 days).

As a temporary workaround, to stop the OS from going crazy, I set this document as metadata-indexed only (no content), using the “Index Control” aspect. If you don’t know how this works, it’s pretty simple for a node:

  • Click on “Manage Aspect”
  • From the list of “Available to Add”, find “Index Control (cm:indexControl)”
  • Click on “+” to add it to the list of “Currently Selected”
  • Click on “Apply changes”
  • Click on “Edit Properties”
  • Uncheck the “Is Content Indexed” option

After doing that, you should be able to see something like that on the node’s properties:

Alfresco Index Control
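
The same workaround can also be scripted against the v1 REST API instead of going through Share. A hedged sketch (assumptions: the node’s uuid is known, the “endpoint“/“auth“ variables are set up as in the previous parts of this blog, and updating an aspect-defined property via PUT /nodes/{id} applies the “cm:indexControl“ aspect automatically — verify on a throwaway node first):

```shell
# Hypothetical placeholder: replace with the real node uuid found via the search
node_id="<uuid-of-the-node>"

# Property payload: metadata stays indexed, content does not
body='{"properties":{"cm:isContentIndexed":false}}'

# Guarded so the sketch can be sourced without firing a call by accident
if [ -n "${endpoint:-}" ]; then
  curl -k -s -X PUT "${endpoint}/api/-default-/public/alfresco/versions/1/nodes/${node_id}" \
    -H "Authorization: Basic ${auth:-}" \
    -H "Content-Type: application/json" \
    -d "${body}"
fi
```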

In case a transformation of this document is already in progress, you will need to wait for the FileSystem to get full for the ATS (java) process to remove its temporary file and realize that this document doesn’t need to be transformed anymore. You can also simply restart the process, if you prefer.

That’s only a workaround of course, not a real solution. Therefore, even if I was fairly sure the issue was around “Seite 1“, I replicated it on TEST by uploading the same file into the TEST environment and then looked inside the TXT content, to validate that assumption:

alfresco@alf-t1:/tmp/Alfresco# ls -l
total 23960
drwxr-x---  2 alfresco alfresco      4096 Feb 12 09:10 ./
drwxrwxrwt 25 root     root         36864 Feb 12 09:10 ../
-rw-r-----  1 alfresco alfresco   1897650 Feb 12 09:10 source_2995534351432950419_tmp.pdf
-rw-r-----  1 alfresco alfresco  22593536 Feb 12 09:10 target_7429882841367188802_tmp.txt
alfresco@alf-t1:/tmp/Alfresco#
alfresco@alf-t1:/tmp/Alfresco# wc -l target_7429882841367188802_tmp.txt
2509490 target_7429882841367188802_tmp.txt
alfresco@alf-t1:/tmp/Alfresco#
alfresco@alf-t1:/tmp/Alfresco# grep -v "^[[:space:]]*Seite 1$" target_7429882841367188802_tmp.txt | wc -l
1913
alfresco@alf-t1:/tmp/Alfresco#
alfresco@alf-t1:/tmp/Alfresco# sleep 30
alfresco@alf-t1:/tmp/Alfresco#
alfresco@alf-t1:/tmp/Alfresco# wc -l target_7429882841367188802_tmp.txt
83418233 target_7429882841367188802_tmp.txt
alfresco@alf-t1:/tmp/Alfresco#
alfresco@alf-t1:/tmp/Alfresco# grep -v "^[[:space:]]*Seite 1$" target_7429882841367188802_tmp.txt | wc -l
1913
alfresco@alf-t1:/tmp/Alfresco#

As shown above, there are 1913 lines of actual text, and then the rest of the millions of lines are exactly “ Seite 1“. This text comes from page 34 of the PDF (it seems to be a merge of multiple PDFs). By removing page 34 from the document, it can be indexed properly. In the end, the “quick” solution for this customer was to fix the PDF (e.g. transform page 34 into an image, then back into a PDF, and OCRize it so it is indexed and searchable).
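
To spot this kind of runaway content quickly in any extracted text, ranking the most frequent lines does the job. A small sketch:

```shell
# top_lines: show the 5 most frequent lines of a file, most repeated first
top_lines() {  # $1 = file to analyze
  sort "$1" | uniq -c | sort -rn | head -5
}
# usage: top_lines /tmp/Alfresco/target_7429882841367188802_tmp.txt
```

On a multi-GB target file this takes a while (sort has to process the whole file), but it immediately names the flooding line without any manual grep guessing.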

The article Alfresco – A never ending transformation appeared first on dbi Blog.

Alfresco – Mass removal/cleanup of documents


At a customer, I recently had a case where a mass-import job was executed against an interface that, in the background, uses Alfresco for document and metadata storage. From the point of view of the interface team, there was no problem, as documents were properly being created in Alfresco (although performance wasn’t exceptional). However, after some time, our monitoring started sending us alerts that Solr indexing had nearly stopped / was very slow. I might talk about the Solr part in a future blog, but what happened is that the interface was configured to import documents into Alfresco in a way that caused too many documents to end up in a single folder.

Too many documents in the same folder of Alfresco

The interface was trying to import documents into the folder “YYYY/MM/DD/HH” (YYYY being the year, MM the month, DD the day and HH the hour). This might be fine for Business-As-Usual (BAU), when the load isn’t too high, but mass-importing documents meant several thousand documents per folder (5’000, 10’000, 20’000, …), the limit being what Alfresco can ingest in an hour or what the interface manages to send. As you probably know, Alfresco definitely doesn’t like folders with much more than a thousand nodes inside (in particular because of the associations and indexing design)… When I saw that, I asked the interface team to stop the import job, but unfortunately, it wasn’t stopped right away and almost 190’000 documents were already imported into Alfresco.
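
As a side note, checking how full a folder already is doesn’t require listing it: a single one-item children call returns the “totalItems“ counter. A sketch (assumptions: “fetch_one_child“ wraps the call so the logic can be tested offline, and “alf_base_url“/“auth“ are the variables prepared in the REST-API environment section below):

```shell
# fetch_one_child: one-item children page, only used for its pagination counters
fetch_one_child() {  # $1 = folder node id
  curl -k -s "${alf_base_url}/api/-default-/public/alfresco/versions/1/nodes/$1/children?maxItems=1" \
    -H "Authorization: Basic ${auth}" \
    -H "Accept: application/json"
}

# child_count: total number of children of a folder
child_count() {
  fetch_one_child "$1" | jq -r '.list.pagination.totalItems'
}
# usage: child_count "<folder-uuid>"
```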

Alfresco APIs for the win?

You cannot really leave Alfresco in this state, since Solr would be heavily impacted and any change to a document in such a folder could result in heavy load. Therefore, from my point of view, the best option is to remove the documents and execute a new/correct import with a better distribution of documents per folder.

A first solution could be to restore the DB to a point in time before the activity started, but that means downtime, and anything else that happened in the meantime would be lost. A second option is to find all the imported documents and remove them through the API. As you might know, the Share UI will not be of much use in this case, since it will either crash or take way too long to open such a folder, so don’t even try… And even if it somehow manages to open a folder containing XX’XXX nodes, you probably shouldn’t try to delete it from there, because it will take forever and you will have no visibility on the status of this background process. Therefore, from my point of view, the only reasonable solution is through the API.

Finding documents to delete

As mentioned, Solr indexing was nearly dead, so I couldn’t rely on it to find what was imported recently. Using the REST-API could be possible, but there are some limitations when working with huge sets of results. In this case, I decided to go with a simple DB query (if you are interested in useful Alfresco DB queries), listing all documents created since the start of the mass-import by the interface user:

SQL> SELECT n.id AS "Node ID",
  n.store_id AS "Store ID",
  n.uuid AS "Document ID (UUID)",
  n.audit_creator AS "Creator",
  n.audit_created AS "Creation Date",
  n.audit_modifier AS "Modifier",
  n.audit_modified AS "Modification Date",
  n.type_qname_id
FROM alfresco.alf_node n,
  alfresco.alf_node_properties p
WHERE n.id=p.node_id
  AND p.qname_id=(SELECT id FROM alf_qname WHERE local_name='content')
  AND n.audit_created>='2023-11-23T19:00:00Z'
  AND n.audit_creator='itf_user'
  AND n.audit_created is not null;

In case the interface isn’t using a dedicated user for the mass-import process, it might be a bit more difficult to find the correct list of documents to remove, as you would need to take care not to remove the BAU documents… Maybe using a recursive query based on the folder into which the documents were imported, or some custom type/metadata, or similar. The result of the above query was put in a text file for processing:

alfresco@acs01:~$ cat alfresco_documents.txt
  Node ID Store ID Document ID (UUID)                   Creator   Creation Date             Modifier  Modification Date         TYPE_QNAME_ID
--------- -------- ------------------------------------ --------- ------------------------- --------- ------------------------- -------------
156491155        6 0f16ef7a-4cf1-4304-b578-71480570c070 itf_user  2023-11-23T19:01:02.511Z  itf_user  2023-11-23T19:01:03.128Z            265
156491158        4 2f65420a-1105-4306-9733-210501ae7efb itf_user  2023-11-23T19:01:03.198Z  itf_user  2023-11-23T19:01:03.198Z            265
156491164        6 a208d56f-df1a-4f2f-bc73-6ab39214b824 itf_user  2023-11-23T19:01:03.795Z  itf_user  2023-11-23T19:01:03.795Z            265
156491166        4 908d385f-d6bb-4b94-ba5c-6d6942bb75c3 itf_user  2023-11-23T19:01:03.918Z  itf_user  2023-11-23T19:01:03.918Z            265
...
159472069        6 cabf7343-35c4-4e8b-8a36-0fa0805b367f itf_user  2023-11-24T07:50:20.355Z  itf_user  2023-11-24T07:50:20.355Z            265
159472079        4 1bcc7301-97ab-4ddd-9561-0ecab8d09efb itf_user  2023-11-24T07:50:20.522Z  itf_user  2023-11-24T07:50:20.522Z            265
159472098        6 19d1869c-83d9-449a-8417-b460ccec1d60 itf_user  2023-11-24T07:50:20.929Z  itf_user  2023-11-24T07:50:20.929Z            265
159472107        4 bcd0f8a2-68b3-4cc9-b0bd-2af24dc4ff43 itf_user  2023-11-24T07:50:21.074Z  itf_user  2023-11-24T07:50:21.074Z            265
159472121        6 74bbe0c3-2437-4d16-bfbc-97bfa5a8d4e0 itf_user  2023-11-24T07:50:21.365Z  itf_user  2023-11-24T07:50:21.365Z            265
159472130        4 f984679f-378b-4540-853c-c36f13472fac itf_user  2023-11-24T07:50:21.511Z  itf_user  2023-11-24T07:50:21.511Z            265
159472144        6 579a2609-f5be-47e4-89c8-daaa983a314e itf_user  2023-11-24T07:50:21.788Z  itf_user  2023-11-24T07:50:21.788Z            265
159472153        4 7f408815-79e1-462a-aa07-182ee38340a3 itf_user  2023-11-24T07:50:21.941Z  itf_user  2023-11-24T07:50:21.941Z            265

379100 rows selected.
alfresco@acs01:~$

The Store ID ‘6’ above corresponds to ‘workspace://SpacesStore‘ (live document store) and ‘4’ to ‘workspace://version2Store‘ (version store):

SQL> SELECT id, protocol, identifier FROM alf_store;
 ID PROTOCOL   IDENTIFIER
--- ---------- ----------
  1 user       alfrescoUserStore
  2 system     system
  3 workspace  lightWeightVersionStore
  4 workspace  version2Store
  5 archive    SpacesStore
  6 workspace  SpacesStore

Counting the number of rows for each Store ID gives the exact same number for the live and version stores, and confirms there are no deleted documents yet (the archive store ‘5’ is empty):

alfresco@acs01:~$ grep "  4 " alfresco_documents.txt | wc -l
189550
alfresco@acs01:~$
alfresco@acs01:~$ grep "  5 " alfresco_documents.txt | wc -l
0
alfresco@acs01:~$
alfresco@acs01:~$ grep "  6 " alfresco_documents.txt | wc -l
189550
alfresco@acs01:~$

Therefore, there are around 190k documents to remove in total, which is roughly the same number as seen on the filesystem. The Alfresco ContentStore obviously contains a bit more, since it also holds the BAU documents.

REST-API environment preparation

Now that the list is complete, the next step is to extract the IDs of the documents, so that we can use them in REST-API calls. The IDs are simply the third column of the file (Document ID (UUID)):

alfresco@acs01:~$ grep "  6 " alfresco_documents.txt | awk '{print $3}' > input_file_6_id.txt
alfresco@acs01:~$
alfresco@acs01:~$ wc -l alfresco_documents.txt input_file_6_id.txt
   379104 alfresco_documents.txt
   189550 input_file_6_id.txt
   568654 total
alfresco@acs01:~$

Now, to be able to execute REST-API calls, we also need to define the username/password as well as the URL to use. I executed the REST-API calls from the Alfresco server itself, so I didn’t need to think too much about security and just used BASIC authentication over HTTPS against localhost. If you are executing this remotely, you might want to use tickets instead (and obviously keep the HTTPS protocol). To prepare for the removal, I defined the needed environment variables as follows:

alfresco@acs01:~$ alf_user=admin
alfresco@acs01:~$ read -s -p "Enter ${alf_user} password: " alf_passwd
Enter admin password:
alfresco@acs01:~$
alfresco@acs01:~$ auth=$(echo -n "${alf_user}:${alf_passwd}" | base64)
alfresco@acs01:~$
alfresco@acs01:~$ alf_base_url="https://localhost:8443/alfresco"
alfresco@acs01:~$ alf_node_url="${alf_base_url}/api/-default-/public/alfresco/versions/1/nodes"
alfresco@acs01:~$
alfresco@acs01:~$ input_file="$HOME/input_file_6_id.txt"
alfresco@acs01:~$ output_file="$HOME/output_file_6.txt"
alfresco@acs01:~$

With the above, we have our authorization string (base64 encoding of ‘username:password‘) as well as the Alfresco API URL. In case you wonder, you can find the definition of the REST-APIs in the Alfresco API Explorer. I also defined the input file, which contains all document IDs and an output file, which will contain the list of all documents processed, with the outcome of the command, to be able to check for any issues and follow the progress.
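
As a side note, the ticket-based alternative mentioned above could look like the following. This is a hedged sketch: the host, port and credentials are placeholders, and the `/tickets` endpoint shown is the standard v1 authentication API:

```shell
# Sketch: obtain an Alfresco ticket and use it instead of user:password.
# Host/port below are placeholders, adapt to your own setup.
alf_base_url="https://localhost:8443/alfresco"
alf_auth_url="${alf_base_url}/api/-default-/public/authentication/versions/1/tickets"

# The POST returns a JSON body whose 'entry.id' field is the ticket:
#   ticket=$(curl -k -s -X POST "${alf_auth_url}" \
#                 -H "Content-Type: application/json" \
#                 -d '{"userId":"admin","password":"***"}' \
#            | sed 's/.*"id":"\([^"]*\)".*/\1/')
# The ticket is then passed base64-encoded in the Basic Authorization header:
#   auth=$(echo -n "${ticket}" | base64)
```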

Deleting documents with REST-API

The last step is now to create a small command/script that will execute the deletion of the documents via REST-API. One thing to note here is that I’m using ‘permanent=true‘ so that the documents will not end up in the trashcan but will be completely and permanently deleted. Therefore, you need to make sure the list of documents is correct! You can obviously set that parameter to false if you really want to, but please be aware that it will impact the performance quite a bit… Otherwise the command is fairly simple: it loops over the input file, executes the deletion query, gets its output and logs it:

alfresco@acs01:~$ while read -u 3 line; do
  out=$(curl -k -s -X DELETE "${alf_node_url}/${line}?permanent=true" -H "accept: application/json" -H "Authorization: Basic ${auth}" | sed 's/.*\(statusCode":[0-9]*\),.*/\1/')
  echo "${line} -- ${out}" >> "${output_file}"
done 3< "${input_file}"

The above is the simplest way/form of removal, with a single thread executed on a single server. You can obviously do multi-threaded deletions by splitting the input file into several and triggering commands in parallel, either on the same host or even on other hosts (if you have an Alfresco Cluster). In this example, I was able to get a consistent throughput of ~3130 documents deleted every 5 minutes, which means ~10.4 documents deleted per second. Again, that was on a single server with a single thread:

alfresco@acs01:~$ while true; do
  echo "$(date) -- $(wc -l output_file_6.txt)"
  sleep 300
done
Fri Nov 24 09:57:38 CET 2023 -- 810 output_file_6.txt
...
Fri Nov 24 10:26:55 CET 2023 -- 18920 output_file_6.txt
Fri Nov 24 10:31:55 CET 2023 -- 22042 output_file_6.txt
Fri Nov 24 10:36:55 CET 2023 -- 25180 output_file_6.txt
Fri Nov 24 10:41:55 CET 2023 -- 28290 output_file_6.txt
...

Since the cURL output (‘statusCode‘) is also recorded in the log file, I was able to confirm that 100% of the queries were successfully executed and all my documents were permanently deleted. With multi-threading and offloading to other members of the Cluster, it would have been possible to increase that by a lot (x5? x10? x20?) but that wasn’t needed in this case since the interface job needed to be updated before a new import could be triggered.
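
The multi-threaded variant mentioned above can be sketched as follows. This is a hedged example: the `delete_chunk` helper, the chunk naming and the thread count are illustrative, not the commands actually run at the customer:

```shell
# Sketch: wrap the deletion loop in a function, split the input file into
# chunks and run one background loop per chunk. Names/counts are illustrative.
delete_chunk() {
  local chunk="$1" id out
  while read -r id; do
    out=$(curl -k -s -X DELETE "${alf_node_url}/${id}?permanent=true" \
               -H "accept: application/json" -H "Authorization: Basic ${auth}" \
          | sed 's/.*\(statusCode":[0-9]*\),.*/\1/')
    echo "${id} -- ${out}" >> "${output_file}.$(basename "${chunk}")"
  done < "${chunk}"
}

# Split into 4 chunks of (roughly) equal line count and run them in parallel:
#   split -n l/4 -d "${input_file}" "${input_file}.part_"
#   for part in "${input_file}".part_*; do delete_chunk "${part}" & done
#   wait
```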

The article Alfresco – Mass removal/cleanup of documents appeared first on dbi Blog.


Self-Signed SSL Certificate is blocked on Chrome or Edge


If you are using Self-Signed SSL Certificates and recently updated your Google Chrome / Edge browser, you might have come across a new kind of error, i.e.: “ERR_SSL_KEY_USAGE_INCOMPATIBLE”. In my case, it happened a few months ago on an Alfresco installation, and more specifically on the Apache Solr URL. Since this URL is mostly used/accessed by Administrators, there is not much traffic going on there and no need to use a real signed SSL Certificate. The error on the browser looks like this:

This is, at the moment, specific to Chrome / Edge but it will most probably be coming for other browsers as well. As mentioned on this previous blog, I have been using a pretty standard OpenSSL request file to generate all my Self-Signed SSL Certificates for years. This is an example of configuration I used to use:

[req]
distinguished_name = dn
x509_extensions = v3_req
prompt = no

[dn]
C = CH
ST = JU
L = Delemont
O = dbi services
OU = IT
CN = dms.poc.it.dbi-services.com

[v3_req]
keyUsage = keyEncipherment, dataEncipherment
extendedKeyUsage = serverAuth
subjectAltName = @alt_names

[alt_names]
DNS.1 = dms.poc.it.dbi-services.com
DNS.2 = alfresco1.it.dbi-services.com
DNS.3 = alfresco2.it.dbi-services.com
DNS.4 = solr1.it.dbi-services.com
DNS.5 = solr2.it.dbi-services.com

This has been sufficient so far and it still works properly on Firefox for example, but not on Chrome anymore. After a bit of research, I found this page from Chrome which explains this change in behavior, for security purposes. As it states:

Connections which fail this check will fail with the error ERR_SSL_KEY_USAGE_INCOMPATIBLE. Sites which fail with this error likely have a misconfigured certificate. Modern ECDHE_RSA cipher suites use the “digitalSignature” key usage option, while legacy RSA decryption cipher suites use the “keyEncipherment” key usage option. If unsure, administrators should include both in RSA certificates meant for HTTPS.

Therefore, depending on the cipher suite that will be used, your Self-Signed SSL Certificate with only “keyEncipherment“, might not be sufficient anymore. If you want more details on the “keyUsage“, please see the OpenSSL documentation.

You can, of course, set the registry entry (on Windows clients) to disable this new behavior, but that’s not the way to go in the long term. You should therefore re-generate all your Self-Signed SSL Certificates to add the “digitalSignature” key usage. For that purpose, you just have to edit line 15 of the above request file from “keyUsage = keyEncipherment, dataEncipherment” to “keyUsage = keyEncipherment, dataEncipherment, digitalSignature“. The result is this one:

[req]
distinguished_name = dn
x509_extensions = v3_req
prompt = no

[dn]
C = CH
ST = JU
L = Delemont
O = dbi services
OU = IT
CN = dms.poc.it.dbi-services.com

[v3_req]
keyUsage = keyEncipherment, dataEncipherment, digitalSignature
extendedKeyUsage = serverAuth
subjectAltName = @alt_names

[alt_names]
DNS.1 = dms.poc.it.dbi-services.com
DNS.2 = alfresco1.it.dbi-services.com
DNS.3 = alfresco2.it.dbi-services.com
DNS.4 = solr1.it.dbi-services.com
DNS.5 = solr2.it.dbi-services.com

Then re-execute the shell script (or just the OpenSSL command) and replace the certificate in your Web Server / Application Server to fix the issue for good. In my case, my Solr URL was reachable again through Chrome:
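
For completeness, the regeneration boils down to a single OpenSSL command. Here is a self-contained sketch, with a trimmed-down request file; the /tmp paths, 2048-bit key and 365-day validity are assumptions, adapt them to your own script:

```shell
# Self-contained sketch: write a minimal request file, generate the key and
# the self-signed certificate, then verify the key usages it contains.
cat > /tmp/dms_ssl.conf <<'EOF'
[req]
distinguished_name = dn
x509_extensions = v3_req
prompt = no

[dn]
CN = dms.poc.it.dbi-services.com

[v3_req]
keyUsage = keyEncipherment, dataEncipherment, digitalSignature
extendedKeyUsage = serverAuth
EOF

# Generate a key and a self-signed certificate valid for 1 year:
openssl req -x509 -nodes -newkey rsa:2048 -days 365 \
        -config /tmp/dms_ssl.conf \
        -keyout /tmp/dms.key -out /tmp/dms.crt 2>/dev/null

# Confirm that "Digital Signature" is now part of the key usages:
openssl x509 -in /tmp/dms.crt -noout -text | grep -A1 "Key Usage"
```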

The article Self-Signed SSL Certificate is blocked on Chrome or Edge appeared first on dbi Blog.

Alfresco – Impact of SSL and LB on the import perf.


Have you ever wondered what the impact of SSL communications, or of a specific Load Balancing mechanism, is on the performance of your Alfresco environment? Alfresco (the Company at that time, before it became Hyland) ran some benchmarks and published the results a few years ago but that might not be very relevant to you, as you might be running your infrastructure very differently from what they used. Networking & latency, CPUs, memory, disks, virtualization, etc… All of that will have an impact on the performance, so you cannot really take external data for granted. In this blog, I will look specifically at the import side of things.

I. Setup details

Recently, I had a customer who wanted to migrate 3 million documents to Alfresco and they wanted to know how long it could take. This specific environment has 2 Alfresco Content Services (7.x) Nodes in Cluster as well as 2 Alfresco Search Services (Solr6) Nodes using Sharding. It’s not a new environment, it has been running for several years already and has around 10TB of content stored inside. At dbi services, we have a team that can help customers execute Load Tests / Stress Tests for their applications (e.g. this blog). However, that usually requires a certain amount of time to integrate the Load Test software (like JMeter) with the target application as well as to design the needed scenarios beforehand. This customer didn’t really need to pull out the big guns as it was just to get an idea of the import speed. Instead, I proposed to simply script a small importer to be as close as possible to the exact performance that the migration would have, using the REST-API from outside of the Alfresco Cluster Nodes (to take the networking & latency into account), using the same user/permissions/ACLs, etc.

To give a bit more details regarding the setup, there is an Apache HTTPD installed on each Alfresco Node. The customer doesn’t have any global load balancer solution (neither hardware nor software) and therefore, to avoid a single point of failure (SPOF), the DNS would redirect the traffic to any of the 2 Apache HTTPD, which would in turn redirect the traffic to any of the 2 Apache Tomcat hosting the alfresco.war application. That’s one way to do it but other solutions are possible. Therefore, the question came up about what the exact impact of the SSL communications was, as well as what the difference would be with other Load Balancing mechanisms. Like, for example, only redirecting the requests to the local Tomcat and not caring about the second Node. If you do that, of course you might introduce a SPOF, but for the migration purpose, which is very short-lived and can use a dedicated URL/PORT, it could be an option (assuming it brings a non-negligible performance gain).

II. Test cases

On a TEST environment, I decided to slightly update the Apache HTTPD and Apache Tomcat configurations to allow for these test cases:

  • Apache HTTPD in HTTPS with Load Balancing (mod_jk) >> the standard day-to-day configuration used so far, to avoid SPOF
  • Apache HTTPD in HTTP with Load Balancing (mod_jk) >> normally redirects the traffic to HTTPS (above config) but I modified that to send the requests to Tomcat instead
  • Apache Tomcat in HTTP (bypassing Apache HTTPD) >> normally blocked but I allowed it
  • Apache HTTPD in HTTPS without Load Balancing (proxy) >> normally doesn’t exist but I added a simple proxy config to send the requests to Tomcat instead
  • Apache HTTPD in HTTP without Load Balancing (proxy) >> normally doesn’t exist but I added a simple proxy config to send the requests to Tomcat instead
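
These access methods only differ by a few lines of Apache configuration. As a purely illustrative sketch (worker names, hosts and ports are assumptions, not the customer’s actual setup), the mod_jk Load Balancing and the plain Reverse Proxy variants could look like this:

```apacheconf
# workers.properties for mod_jk Load Balancing (first two test cases):
#   worker.list=lb
#   worker.node1.type=ajp13
#   worker.node1.host=alfresco1.it.dbi-services.com
#   worker.node1.port=8009
#   worker.node2.type=ajp13
#   worker.node2.host=alfresco2.it.dbi-services.com
#   worker.node2.port=8009
#   worker.lb.type=lb
#   worker.lb.balance_workers=node1,node2
# and in the Apache HTTPD configuration:
JkMount /alfresco/* lb

# Plain Reverse Proxy without Load Balancing (last two test cases),
# always targeting the local Tomcat only:
ProxyPass        /alfresco http://localhost:8080/alfresco
ProxyPassReverse /alfresco http://localhost:8080/alfresco
```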

III. Documents, metadata & import script

To be as close as possible to the real migration for this customer, I took, as input, a few of the smallest documents that would be imported, a few of the biggest (300 times bigger than the smallest) and a few around the average size (30 times bigger than the smallest). I also took different mimetypes like PDF, XML and TXT, with the associated expected metadata for all of these. These documents all use a custom type with ~10 custom properties.

I love bash/shell scripting; it’s definitely not the fastest solution (C/C++, Go, Python, Perl or even Java would be faster) but it’s still decent and, above all, it’s simple, so that’s what I used. The goal isn’t to have the best performance here, but just to compare apples to apples. The script itself is pretty simple: it defines a few variables like the REST-API URL to use (which depends on the Access Method chosen), the parent folder under which imports will be done and the username, and it asks for a password. It takes three parameters as command line arguments: the Access Method to be used, the type of documents to import (small/average/large sizes) and the number of documents to create in Alfresco. For example:

## script_name --access_method --doc_size nb_doc_to_import
./alf_import.sh --apache-https-lb --small 1000
./alf_import.sh --apache-http-lb --average 1000
./alf_import.sh --direct --large 1000
./alf_import.sh --apache-https-nolb --small 1000
./alf_import.sh --apache-http-nolb --small 1000

With these parameters, the script would select the templates to use and their associated metadata and then start a timer and a loop to import all the documents in a single thread (meaning one after the other). As soon as the last document has been imported, it stops the timer and provides the import time as outcome. It’s really nothing complex, around 20 lines of code, simple and straightforward.
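
The core of such a script could be sketched as follows. This is a hedged reconstruction, not the actual customer script: the parent node ID, the custom type `my:customType` and the file names are placeholders:

```shell
# Sketch of the single-thread importer: create one node per iteration via the
# v1 REST-API and report the elapsed time. All names below are placeholders.
build_create_url() {
  # v1 endpoint used to create children under a given parent node
  local base="$1" parent="$2"
  echo "${base}/api/-default-/public/alfresco/versions/1/nodes/${parent}/children"
}

import_docs() {
  local url="$1" template="$2" nb="$3" i
  local start=${SECONDS}
  for ((i=1; i<=nb; i++)); do
    curl -k -s -X POST "${url}" \
         -H "Authorization: Basic ${auth}" \
         -F "filedata=@${template}" \
         -F "name=import_${i}_${RANDOM}.pdf" \
         -F "nodeType=my:customType" > /dev/null
  done
  echo "Imported ${nb} document(s) in $((SECONDS - start))s"
}
```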

IV. Results – HTTPS vs HTTP & different access methods – Single-thread

I did 3 runs of 1000 documents for each possible combination (of access method and document size). I then took the average execution time of the 3 runs, which I transformed into an import speed (Docs/sec). The resulting graph looked like this:

As a reminder, this is a single-thread import using REST-API inside a bash/shell script executed from a remote server (through the Network). So, what can we see on these results?

  • First, and as I expected, we can see around 10/15% degradation when using HTTPS instead of HTTP.
  • Then, between the smallest and the average size documents, we can only see a very small difference (with the same access method): around 1-3%. Which could indicate that the network might not be the limiting factor when documents aren’t too big, since the documents size increased by 30 times while the import speed is “only” 1-3% slower.
  • A third interesting point is that with bigger files, the import is noticeably slower. That’s especially true when using the Load Balancing methods, as that means that irrespective of which Apache HTTPD we are talking to, there will be 50% of the requests going to the local Alfresco Node while the remaining 50% will be redirected to the second, remote, Alfresco Node. Therefore, the bigger the document, the slower it will be compared to other methods, as for 50% of the requests, it will need to transfer the document through the network twice (client -> Apache HTTPD + Apache HTTPD -> remote Alfresco Node). With the large documents, the size increased by 10 times (vs average) and the import speed is 10-25% slower.
  • In relation to the previous point, there is another interesting thing to note for the small/medium documents. Indeed, even with a single-thread execution, using the Load Balancing method is actually 3-4% faster than direct access to Tomcat or a plain Reverse Proxy. How can that be? If we consider the network, it should be slower, no? I believe this shows that the Apache HTTPD implementation of Load Balancing via “mod_jk” is really the most efficient way to access an Apache Tomcat. This difference would probably be even more exacerbated with multiple threads, while doing Load Tests / Stress Tests.

V. Import script with Multi-threads?

With the previous script, it was possible to test different import/access methods, but it was only using a single thread. This means that a new request would only come in once the previous one was completed and its result returned to the client. That’s obviously not how it works in reality, as you might have several users working at the same time on different things. For a Migration, to increase the import speed, you will also most probably have a multi-threaded architecture as it can drastically reduce the time required. In a similar approach, the customer also wanted to see how the system behaves when we add several importers running in parallel.

Therefore, I used a second script, a wrapper of sorts, that would trigger/manage/monitor multiple threads executing the first script. The plan is, of course, to provide the exact same command line arguments as before, but we also need a new one for the number of threads to start. For example:

## script_name --access_method --doc_size nb_doc_to_import nb_threads
./alf_multi_thread.sh --apache-https-lb --small 1000 2
./alf_multi_thread.sh --apache-http-lb --average 1000 6

Most parameters would just be forwarded to the first script, except for the number of threads (obviously) and the number of documents to import. To keep things consistent, the parameter “nb_doc_to_import” should still represent the total number of documents to import and not the number per thread. This is because if you try to import 1000 documents on 6 threads, for example, you will be able to do either 996 (6×166) documents or 1002 (6×167) but not 1000… Given 1000 documents to import, the script would do a division with remainder so that threads #1, #2, #3 and #4 would import 167 documents each while threads #5 and #6 would only import 166. This distribution would be calculated first and then all threads would be started at the same time (+/- 1ms). The script would then monitor the progress of the different threads and report the execution time when everything is completed.
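
The division with remainder described above can be sketched like this (an illustrative helper, not the actual wrapper code):

```shell
# Sketch: compute how many documents each thread should import so that the
# total matches exactly. The first 'remainder' threads get one extra document.
distribute() {
  local nb_docs="$1" nb_threads="$2" t count
  local base=$((nb_docs / nb_threads)) rem=$((nb_docs % nb_threads))
  for ((t=1; t<=nb_threads; t++)); do
    count=${base}
    [ "${t}" -le "${rem}" ] && count=$((base + 1))
    echo "thread ${t}: ${count} documents"
  done
}

distribute 1000 6
# -> threads 1 to 4 get 167 documents each, threads 5 and 6 get 166
```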

VI. Results – Scaling the import – Multi-threads

As previously, I did 3 imports of 1000 documents each and took the average time. I executed the imports for 1 to 10 threads as well as 15, 20, 30, 50, 70, 90 and 100 threads. In addition, I did all that with either 1 or 2 Alfresco Nodes, to be able to compare the speed when only 1 Tomcat is serving 100% of the requests versus when the load is shared 50/50. The resulting graph looked like this:

So, what can we see on these results?

  • It’s pretty clear that the ingestion speed increases in an almost linear way from 1 to ~8/10 threads. The increase then slows down between 10 and 50 threads before the import speed actually starts decreasing from 70 parallel threads. The limit reached and the number of threads at which it appears might just be related to the fact that I was using a bash/shell script and that the OS on which I was running the importer (my client workstation) was obviously limited in terms of processing power. I had only 4 CPUs, so when you try to run 20/50/70 threads on it, it’s bound to reach a threshold where your threads are actually just waiting for some CPU time before they get executed. Therefore, adding more threads might not improve the performance and might actually have the opposite effect.
  • There isn’t much difference in terms of ingestion speed whether we used only 1 or 2 Alfresco Node(s). With 1 to 4 threads, it was ~6% faster to use 2 Alfresco Nodes. From 5 to 10 threads, the gap widens a bit but, in the end, the maximum difference was only ~10%. After 10 parallel threads, the gap reduces again and then the threshold/limit is pretty much the same. You might think something like: “Why is it not 2x faster to use 2 Alfresco Nodes?”. Well, it’s just not enough threads. Whether you are running 6 threads on 1 Alfresco Node or 3 threads on each of 2 Alfresco Nodes (3×2=6), it’s just not enough to see a big difference. The number of threads is fixed, so you need to compare 6 threads in total. With that in mind, this test isn’t sufficient because we are far from what Tomcat can handle and that means the small difference seen is most probably coming from the client I was using and not Alfresco.

In summary, what is the limiting factor here? The CPU of my client workstation? The networking? The Clustering? The DB? It’s pretty hard to say without further testing. For example, adding more CPU and/or other client workstations to spread the source of the requests. Or removing the clustering on this environment so that Alfresco doesn’t need to maintain the clustering-related caches and other behaviours required. In the end, the customer just wanted to get an idea of how the import speed increases with additional threads, so the limit wasn’t really relevant here.

As a closing comment, it actually took much more time to run the tests and gather/analyse the results than to create the scripts used. As mentioned previously, if you would like to do real Load Tests / Stress Tests (or something quick & dirty like here :D), don’t hesitate to contact us, some of my colleagues would be very happy to help.

The article Alfresco – Impact of SSL and LB on the import perf. appeared first on dbi Blog.
