Some companies make insightful IT business decisions because they have the right data, processes and software. Because ManageEngine fits into the software bucket, I'll address that part head-on. It never fails to amaze me how many IT departments run a hodge-podge of tools. Recently, I ran into a company that was using What's Up, Altiris, some basic MRTG and Tivoli... and all of it was only giving them up/down status. They had also purchased Applications Manager and were liking the results.
I've heard this siloed, multi-tool story many times. It happens for a variety of reasons: the organization's IT infrastructure maturity level, how its needs evolved over time, or IT decision makers coming and going. The attitude of the moment is: "Got a problem? I'll solve it!" Reminds me of that line in the Vanilla Ice song...
Management software is not cheap, and a pile of it can cause neck and back problems (from swiveling your head back and forth between multiple consoles). Eventually this company reached a point where it wanted to become a performance-based IT shop. They learned more about Applications Manager, added VMware and storage monitoring, then added Service Desk to support their trouble ticketing, incident management and change management processes.
There comes a time when intuition-based troubleshooting does not scale. Data needs to be collected to get a sense of what's going on in the IT infrastructure. Then you must identify strengths and weaknesses and measure progress against goals and historical data, all of which supports good decision making.
Selecting Metrics to Predict Performance
IT metrics should be defined to fit the individual need; not all infrastructures are the same. Within ManageEngine products, one can collect hundreds of arcane metrics. In some cases, IT shops are fire fighting all day and no one is aware of the performance metrics. Managing and controlling IT metrics has big implications: downtime and lost productivity definitely hit the financial bottom line. Selecting just a few critical metrics is key to moving toward a performance-driven organization. Revisit the metrics continually to keep them aligned with your decision-making strategy. Then make the metrics visible for all to see. Some of our customers put up a dashboard in high-traffic areas. People became more aware of, and more engaged with, the goals of the IT strategy, which made everyone more accountable.
Below are customer examples that drive the point home.
Jamie Gilbert, Director, CIO of CD Baby, the largest online distributor of independent music, is using ManageEngine Applications Manager and Service Desk. He said the organization expects zero downtime, so uptime metrics and SLA reporting, with long-term trending of site performance based on URL sequence testing, are invaluable. Beyond tracking performance, he also uses these tools for troubleshooting analysis.
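To make the uptime and SLA idea a little more concrete, here is a minimal Python sketch. It is not how Applications Manager computes its reports; it assumes each URL sequence test is stored as a hypothetical `CheckResult` that simply passed or failed, and it approximates uptime as the share of passing checks. The `allowed_downtime_minutes` helper (also hypothetical) converts an SLA target into a downtime budget for a reporting period.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    """One run of a URL sequence test (hypothetical record format)."""
    timestamp: int  # seconds since the start of the reporting period
    passed: bool    # True if every step in the sequence succeeded


def uptime_percent(results: list[CheckResult]) -> float:
    """Approximate uptime as the percentage of checks that passed."""
    if not results:
        raise ValueError("no check results to evaluate")
    passed = sum(1 for r in results if r.passed)
    return 100.0 * passed / len(results)


def meets_sla(results: list[CheckResult], target_percent: float) -> bool:
    """True if measured uptime is at or above the SLA target."""
    return uptime_percent(results) >= target_percent


def allowed_downtime_minutes(target_percent: float, period_days: float) -> float:
    """Downtime budget implied by an SLA target over a period."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - target_percent / 100)


if __name__ == "__main__":
    # One day of checks every 5 minutes (288 runs), with a single failure.
    results = [CheckResult(timestamp=i * 300, passed=(i != 100)) for i in range(288)]
    print(f"uptime: {uptime_percent(results):.3f}%")
    print("meets 99.5% SLA:", meets_sla(results, 99.5))
    print("meets 99.9% SLA:", meets_sla(results, 99.9))
    print(f"99.9% over 30 days allows {allowed_downtime_minutes(99.9, 30):.1f} minutes down")
```

With checks every five minutes, a single failed run in a day already drops measured uptime below 99.9%, which is one reason long-term trending matters more than any single day's number.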
In a previous position, he implemented Applications Manager to monitor 450 physical and virtual servers in a mixed Windows and Linux environment with MS SQL and Oracle databases. He ran into issues with a new application running on Apache, Tomcat and Java. Using real-time performance reporting in Applications Manager, along with long-term trending and comparative analysis reporting across servers, the team was able to home in on the root cause: an application programming issue combined with Tomcat connection limits and JVM memory allocation. It was a multifaceted problem, and Applications Manager made the individual problems easy to see and allowed the team to come up with a path to resolution.
Darren Qualls, CTO of Premier Global Technologies and a user of ManageEngine IT360, explains database performance this way: slow-performing databases can be extremely tricky to chase down. For example, take 9 terabytes of SQL Server data and throw a $20k piece of hardware at it; chances are you're still going to have problems. There are a few common issues you will run into with database servers. In most cases, you will want to start with lock waits, one of the standard metrics in any monitoring product. There are so many ways to mess up record locking without knowing it for a year or two.
In 90% of cases, though, record lock issues are only a drop in the bucket. The next thing I run into is the disks. Slow disk access will bring a half-million-dollar blade system to its knees EVERY time! There are so many things that I have to categorically rate as self-imposed: incorrect normalization of data, bugs in code, incorrect commit placement or parameters, etc. Underpowered hardware, whether from incorrect initial specs, organic growth or expired systems, will also cause problems. Another is telecom issues, which can be anything external to the system itself, such as the network setup or remote pulls on queries for reports.
These are the three common problem areas on a database server, and you'll need network maps and traffic monitoring to isolate the data and determine which one is causing the issue. Do not skimp: without them, you may end up spending about three times the effort to resolve it.