DolphinScheduler Project Management Page Loads Slowly? Here's How to Optimize It

Problem Description Sometimes, the Apache DolphinScheduler project management page fails to load. When inspecting in the browser, the request [http://ip:12345/dolphinscheduler/projects?pageSize=10&pageNo=1&searchVal=] times out. Troubleshooting Steps 1. Check DolphinScheduler Logs (No Exceptions Found) tail /home/dolphinscheduler/api-server/logs/dolphinscheduler-api.log 2. Restart the DolphinScheduler api-server Module sh /home/dolphinscheduler/bin/dolphinscheduler-daemon.sh stop api-server sh /home/dolphinscheduler/bin/dolphinscheduler-daemon.sh start api-server Result: The page still fails to load. 3. Test the Timeout Request and Observe the Response Testing reveals that data can be returned, but it takes over ten seconds. 4. Check the Source Code for Potential Slow Queries Code location: 5. Use Arthas to Analyze Runtime Performance curl -O https://arthas.aliyun.com/arthas-boot.jar java -jar arthas-boot.jar # Identify the DolphinScheduler API process; in my case, it was the first one. Trace the runtime to find where the delay occurs. It was identified in the queryProjectListPaging method. trace org.apache.dolphinscheduler.api.service.impl.ProjectServiceImpl queryProjectListPaging -n 1 Inspecting the queryProjectListPaging method: 6. Run the SQL Query and Analyze Testing the SQL query reveals that it is not inherently slow. SELECT * FROM t_ds_project p WHERE 1=1 AND p.id IN (1, 2, 3, 5, 6, 8, 9, 12, 13, 14, 16, 17, 19, 21, 22, 23, 25, 26) GROUP BY p.id ORDER BY p.id DESC; Testing the SQL being executed by DolphinScheduler, using Arthas and the CTBots MyBatis SQL Parser: 7. Identify the Slow Query The slow SQL query was identified as follows: SELECT p.id, p.name, p.code, p.description, p.user_id, p.flag, p.create_time, p.update_time, u.user_name AS user_name, (SELECT COUNT(*) FROM t_ds_process_definition AS def WHERE def.project_code = p.code) AS def_count, (SELECT COUNT(*) FROM t_ds_process_definition_log def, t_ds_process_instance inst WHERE def.code = inst.process_definition_code AND def.version = inst.process_definition_version AND def.project_code = p.code AND inst.state = 1) AS inst_running_count FROM t_ds_project p LEFT JOIN t_ds_user u ON u.id = p.user_id WHERE 1=1 AND p.id IN (1, 2, 3, 5, 6, 8, 9, 12, 13, 14, 16, 17, 19, 21, 22, 23, 25, 26) GROUP BY p.id, u.user_name ORDER BY p.id DESC LIMIT 100; Analysis using EXPLAIN revealed that the bottleneck was in the inst_running_count subquery. The t_ds_process_definition table contains approximately 280,000 rows, while t_ds_process_definition_log has 1,000 rows. 8. Optimize the Query with Indexes Adding an index to the t_ds_process_instance table significantly improved performance: CREATE INDEX qwl_diy_index ON t_ds_process_instance (process_definition_code, process_definition_version, state); 9. Verify the Results After adding the index, the query execution time improved, and the page loaded correctly. By addressing the slow query with an appropriate index, the issue was resolved. If this article was helpful, feel free to like, bookmark, or share it!

Jan 15, 2025 - 04:57
DolphinScheduler Project Management Page Loads Slowly? Here's How to Optimize It

Image description

Problem Description

Sometimes, the Apache DolphinScheduler project management page fails to load. When inspecting in the browser, the request [http://ip:12345/dolphinscheduler/projects?pageSize=10&pageNo=1&searchVal=] times out.

Troubleshooting Steps

1. Check DolphinScheduler Logs (No Exceptions Found)

tail /home/dolphinscheduler/api-server/logs/dolphinscheduler-api.log  

Image description

2. Restart the DolphinScheduler api-server Module

sh /home/dolphinscheduler/bin/dolphinscheduler-daemon.sh stop api-server  
sh /home/dolphinscheduler/bin/dolphinscheduler-daemon.sh start api-server  

Result: The page still fails to load.

3. Test the Timeout Request and Observe the Response

Testing reveals that data can be returned, but it takes over ten seconds.

Image description

4. Check the Source Code for Potential Slow Queries

Code location:

Image description

5. Use Arthas to Analyze Runtime Performance

curl -O https://arthas.aliyun.com/arthas-boot.jar  
java -jar arthas-boot.jar  
# Identify the DolphinScheduler API process; in my case, it was the first one.  

Trace the runtime to find where the delay occurs. It was identified in the queryProjectListPaging method.

trace org.apache.dolphinscheduler.api.service.impl.ProjectServiceImpl queryProjectListPaging -n 1  

Image description

Inspecting the queryProjectListPaging method:

Image description

6. Run the SQL Query and Analyze

Testing the SQL query reveals that it is not inherently slow.

SELECT * FROM t_ds_project p  
WHERE 1=1 AND p.id IN (1, 2, 3, 5, 6, 8, 9, 12, 13, 14, 16, 17, 19, 21, 22, 23, 25, 26)  
GROUP BY p.id  
ORDER BY p.id DESC;  

Image description

Testing the SQL being executed by DolphinScheduler, using Arthas and the CTBots MyBatis SQL Parser:

Image description

7. Identify the Slow Query

The slow SQL query was identified as follows:

SELECT  
    p.id, p.name, p.code, p.description, p.user_id, p.flag, p.create_time, p.update_time,  
    u.user_name AS user_name,  
    (SELECT COUNT(*) FROM t_ds_process_definition AS def WHERE def.project_code = p.code) AS def_count,  
    (SELECT COUNT(*) FROM t_ds_process_definition_log def, t_ds_process_instance inst  
     WHERE def.code = inst.process_definition_code  
       AND def.version = inst.process_definition_version  
       AND def.project_code = p.code  
       AND inst.state = 1) AS inst_running_count  
FROM t_ds_project p  
LEFT JOIN t_ds_user u ON u.id = p.user_id  
WHERE 1=1  
  AND p.id IN (1, 2, 3, 5, 6, 8, 9, 12, 13, 14, 16, 17, 19, 21, 22, 23, 25, 26)  
GROUP BY p.id, u.user_name  
ORDER BY p.id DESC LIMIT 100;  

Analysis using EXPLAIN revealed that the bottleneck was in the inst_running_count subquery.

The t_ds_process_definition table contains approximately 280,000 rows, while t_ds_process_definition_log has 1,000 rows.

8. Optimize the Query with Indexes

Adding an index to the t_ds_process_instance table significantly improved performance:

CREATE INDEX qwl_diy_index  
    ON t_ds_process_instance (process_definition_code, process_definition_version, state);  

9. Verify the Results

After adding the index, the query execution time improved, and the page loaded correctly.

Image description

By addressing the slow query with an appropriate index, the issue was resolved. If this article was helpful, feel free to like, bookmark, or share it!