By David McPhail, Principal Architect at Rulesware
A client recently asked me the following question: How can we guarantee that 80% of transaction responses will not take longer than five seconds from the time of user request?
For this client, the measurement excluded network time and processing time on the client PC (aside from any mutually agreed-upon exception transactions).
Others ask: can we ensure that 100% of transaction responses take no longer than 10 seconds from the time of user request, excluding network time and processing time on the client PC, except for any mutually agreed-upon exceptions? These are difficult questions, and we see them asked, in varying forms, over and over again.
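Pinning the question down helps: a target like "80% within five seconds" is simply a check against the 80th percentile of measured response times. Here is a minimal sketch in Java; the sample timings and the five-second target are illustrative, not from any client engagement:

```java
import java.util.Arrays;

// Minimal sketch: check an "80% of responses within 5 seconds" target
// against a sample of measured server-side response times (milliseconds).
public class SlaCheck {

    // Nearest-rank percentile: smallest sample value such that at least
    // p percent of the samples are at or below it.
    static long percentile(long[] samplesMs, double p) {
        long[] sorted = samplesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Hypothetical timings gathered from a load-test run.
        long[] responseMs = {900, 1200, 2100, 3400, 4800, 5100, 1700, 2600, 4200, 6900};

        long p80 = percentile(responseMs, 80.0);
        System.out.printf("P80 = %d ms; target met: %b%n", p80, p80 <= 5000);
    }
}
```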
It comes down to how we design the system and how we monitor it. It is difficult to guarantee a specific response time when there are often many interfaces to many disparate systems, both new and old.
Perception is also key: what some users call fast, others call slow. What, then, do we measure? The total round trip? The entire business process? Individual HTTP calls? Whether the system feels instantly responsive?
In my experience, all of these considerations matter. It boils down to design, how you monitor, and how you set yourself up to be proactive.
10 Key Principles for Designing and Monitoring Performant Systems
- Use established design patterns that help a system respond faster, such as caching, asynchronous processing, and paginating large result sets.
- Have an environment in place that matches your production system, or comes as close to it as possible. This lets you load test and identify issues earlier.
- Know what you are measuring and testing. Is it screen-to-screen time? AJAX traffic? Load test early and often in all environments, and keep track of past runs so you can measure change over time (see the run-comparison sketch after this list).
- There are tools that monitor web traffic. Fiddler and HttpWatch come to mind for monitoring single sessions. Aternity, CA Wily Introscope, and IBM Tivoli are a few tools I’ve seen used at various client sites for active monitoring. I can’t recommend a particular product, but I can tell you that enterprise-class monitoring exists and is being used.
- Pega has its own monitoring tools, but they have their limitations. Use AES (Autonomic Event Services) for active monitoring, and PAL (the Performance Analyzer) and PLA (the PegaRULES Log Analyzer) for passive monitoring. Monitor logs in production using AES; often there are early outliers that lead to performance degradation. And don’t forget to check your other environments! AES is itself built on Pega, so workflows can be created around its alerts and integrated with bug-tracking systems.
- Server-side metrics matter as well: JVM health, threading, server errors, database run statistics, and so on (see the JVM snapshot sketch after this list).
- Set alert thresholds appropriately. Set them too high and we miss the issues people are reporting; set them too low and we drown in noise (see the threshold-tuning sketch after this list).
- Understand what the user is doing with respect to the system. When something runs long, tell the user and give them a way to escape, so they can move on to other work and come back later (see the cancellable-call sketch after this list).
- Make performance part of your process, from development through production and into sustainment and ongoing maintenance. It should never become an afterthought.
- Use out-of-the-box (OOB) controls and capabilities, and follow the guardrails and best practices. This reduces the chance of introducing problems into the system.
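First, the run-comparison sketch from principle 3: store a percentile summary of each load-test run and flag drift against a chosen baseline. The numbers and the 20% tolerance below are illustrative assumptions, not recommendations:

```java
import java.util.Map;

// Sketch: compare a new load-test run's latency percentiles against a
// saved baseline and flag regressions beyond a tolerance. The values
// and the 20% tolerance are illustrative assumptions.
public class RunComparison {
    public static void main(String[] args) {
        Map<String, Long> baselineMs = Map.of("p50", 800L, "p80", 2400L, "p95", 4100L);
        Map<String, Long> currentMs  = Map.of("p50", 850L, "p80", 3300L, "p95", 4300L);
        double tolerance = 0.20; // allow 20% drift before flagging

        for (String p : baselineMs.keySet()) {
            long base = baselineMs.get(p);
            long curr = currentMs.get(p);
            if (curr > base * (1 + tolerance)) {
                System.out.printf("REGRESSION %s: %d ms -> %d ms%n", p, base, curr);
            }
        }
    }
}
```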
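Next, the JVM snapshot from principle 6: the standard JMX platform beans expose heap, thread, and deadlock data without any third-party agent. A minimal sketch:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;
import java.lang.management.ThreadMXBean;

// Sketch: a point-in-time JVM health snapshot using the standard JMX
// platform beans -- no third-party agent required.
public class JvmSnapshot {
    public static void main(String[] args) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();

        // Note: getMax() can be -1 if the maximum heap is undefined.
        MemoryUsage heap = memory.getHeapMemoryUsage();
        System.out.printf("Heap: %d / %d MB used%n",
                heap.getUsed() >> 20, heap.getMax() >> 20);
        System.out.printf("Threads: %d live, %d peak%n",
                threads.getThreadCount(), threads.getPeakThreadCount());

        long[] deadlocked = threads.findDeadlockedThreads();
        System.out.println("Deadlocked threads: "
                + (deadlocked == null ? 0 : deadlocked.length));
    }
}
```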
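For principle 7, one honest way to pick a threshold is to replay candidate values against historical response times and see how many alerts each would have fired. The history and candidates below are made up:

```java
import java.util.Arrays;

// Sketch: replay candidate alert thresholds against historical response
// times to see the noise/miss trade-off before committing to one.
public class ThresholdTuning {
    public static void main(String[] args) {
        // Hypothetical historical response times in milliseconds.
        long[] historyMs = {600, 900, 1200, 1500, 2100, 2600, 3400, 4200, 5100, 9800};
        long[] candidatesMs = {1000, 3000, 5000, 8000};

        for (long threshold : candidatesMs) {
            long alerts = Arrays.stream(historyMs).filter(t -> t > threshold).count();
            System.out.printf("Threshold %5d ms -> %d of %d requests would alert%n",
                    threshold, alerts, historyMs.length);
        }
    }
}
```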
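And for principle 8, a language-level sketch of the escape hatch: run the slow work asynchronously, warn the user after a grace period, and keep a handle that can be cancelled. The 3-second grace period and the simulated 10-second call are assumptions for illustration:

```java
import java.util.concurrent.*;

// Sketch: run a slow operation asynchronously, warn after a grace period,
// and keep a Future the user (or UI) can cancel to move on to other work.
public class EscapableCall {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<String> result = pool.submit(() -> {
            Thread.sleep(10_000); // simulated slow downstream call
            return "done";
        });

        try {
            // Wait up to 3 seconds before telling the user it's running long.
            System.out.println(result.get(3, TimeUnit.SECONDS));
        } catch (TimeoutException e) {
            System.out.println("Still running... offering the user an escape.");
            result.cancel(true); // user chose to bail out; interrupt the work
        } finally {
            pool.shutdown();
        }
    }
}
```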
It is also important to design the system so that we can tell where performance issues originate. Because Pega often sits at the center of the design, it is the first place people want to blame. We should instrument the system so that when a downstream sub-system is slow to respond, we can correlate the calls and know exactly where the problem is.
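A minimal version of that instrumentation is a correlation ID carried through every downstream call, with the elapsed time logged per sub-system; when a transaction breaches its target, the log shows which hop spent the time. A sketch, with invented sub-system names standing in for real service clients:

```java
import java.util.UUID;
import java.util.function.Supplier;

// Sketch: tag each transaction with a correlation ID and time every
// downstream call, so slow sub-systems are identified from the logs
// rather than blamed on the platform in the middle.
public class DownstreamTiming {

    static <T> T timed(String correlationId, String subsystem, Supplier<T> call) {
        long start = System.nanoTime();
        try {
            return call.get();
        } finally {
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.printf("corr=%s subsystem=%s elapsed=%dms%n",
                    correlationId, subsystem, elapsedMs);
        }
    }

    public static void main(String[] args) {
        String corr = UUID.randomUUID().toString();
        // Hypothetical downstream calls; real ones would be service clients.
        String customer = timed(corr, "customer-service", () -> "cust-42");
        String score = timed(corr, "credit-bureau", () -> "score-700");
        System.out.println(customer + " / " + score);
    }
}
```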
Following these principles will make questions like the ones above much easier to answer. It won’t always be easy, but the investment will surely pay off.