Thursday, August 18, 2011

Performance-Based IT Shop, Part 2

Not all IT problems fall within the network engineer's domain. In my previous blog post, I talked about IT shops having a hodge-podge set of tools. There are various reasons for this, but the real inefficiency arises when these tools perform the same functions. At some point, it becomes necessary to look at IT problems from different perspectives. A few examples follow:


Kenn Nied, Senior Network Engineer at the WA State Board for Community and Technical Colleges, illustrates this. Looking at OpManager from a networking point of view, the operator sees alerts that a few switches and a firewall are unresponsive. Is it faulty equipment or an attack? Switching to a security mindset, he turns to ManageEngine Device Expert to see real-time and historical configuration changes. In one case, this revealed a firewall rule change, which turned out to be a misconfiguration that had made the switch unresponsive. Diagnostic time was minimal.

Albert E. Whale, CHS CISA CISSP, Senior Technology & Security Director for ABS Computer Technology, Inc., explains the security aspect further. When you are managing the security of a business, you need several essential tools to manage the environment and get a better handle on its design, information flow, and stability. The first step is a baseline review of all of the network devices. ManageEngine Device Expert captures the current configuration of the network switches and firewalls. It's an invaluable tool for managing change control on configurations and for evaluating all of the configurations at a glance. Building on the baseline report, ManageEngine EventLog Analyzer and Firewall Analyzer identify bottlenecks in network throughput and surface attack information within the enterprise. Being proactive on security allows for protection before break-ins occur.

Bill Duffy, CTO of Northwind Technology, describes the compliance angle. IT departments face compliance oversight, whether from internal audit and risk management or from external regulatory bodies overseeing a particular industry, and they share common goals in meeting these requirements:

* Ability to incorporate compliance reporting aims into the overall monitoring and system administration strategy, optimizing technology investments as requirements change and grow.
* Need to reduce the time spent on compliance and audit reporting.
* Use the monitoring toolset to proactively manage risk across the organization.
* Demonstrate adherence to compliance controls with clear, objective and easily accessible evidence.

Central to achieving these aims is finding a comprehensive suite of tools that covers all areas of IT security and infrastructure and provides easy access to administrators and auditors. Moreover, it is paramount to provide a rich reporting framework to address ad-hoc and historical data requests as part of evidence gathering during audits. To meet compliance requirements, IT departments need to demonstrate service availability, tracking of IT administration staff activity, change management, asset management, access control, and audit trails and logging (security, system, application, maintenance, etc.).

The ManageEngine suite of products is unique in being able to effectively bridge the IT landscape to meet these compliance demands. By using ManageEngine ServiceDesk Plus, OpManager, and AD Manager Plus, along with the AD Audit Plus and Asset Explorer modules, in an integrated fashion, we are able to provide a complete compliance approach that limits the audit and administration burden on people and systems while delivering risk management and satisfying audit controls.





Wednesday, August 3, 2011

Performance-Based IT Shop

Some companies make insightful IT business decisions because they have the right data, processes, and software. Because ManageEngine fits into the software bucket, I’ll address that straight up. It never fails to amaze me how many IT departments have a hodge-podge set of tools. Recently, I ran into a company that was using What’s Up, Altiris, some basic MRTG, and Tivoli, and all of it was only giving them up/down status. They had also purchased Applications Manager and were liking the results.

I’ve heard this siloed, multi-tool story many times. It happens for a variety of reasons, usually the organization’s IT infrastructure maturity level, how its needs evolved over time, or IT decision makers coming and going. The attitude of the moment is: “Got a problem, I’ll solve it!” It reminds me of that line in the Vanilla Ice song...

Management software is not cheap, and it can cause neck and back problems (from swiveling your head back and forth between multiple consoles). Eventually, this company wanted to become a performance-based IT shop. They learned more about Applications Manager, added VMware and storage monitoring, and then added Service Desk to support their trouble ticketing and their incident and change management processes.

At some point, intuition-based troubleshooting does not scale. Data needs to be collected to get a sense of what’s going on in the IT infrastructure. Then one must identify strengths and weaknesses and measure progress against goals and historical data, all of which supports good decision making.

Selecting Metrics to Predict Performance

IT metrics should be defined to fit individual needs; not all infrastructures are the same. Within ManageEngine products, one can collect hundreds of arcane metrics. In some cases, IT shops are firefighting all day and no one is aware of the performance metrics. Managing and controlling IT metrics has big implications: downtime and lost productivity definitely hit the financial bottom line. Selecting just a few critical metrics is key to moving toward a performance-driven organization. Revisit the metrics continually to keep them aligned with the decision-making strategy. Then make the metrics visible for all to see. Some of our customers put up a dashboard in high-traffic areas; people became more aware of and engaged with the goals of the IT strategy, which made everyone more accountable.
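To make "a few critical metrics, measured against goals and historical data" a bit more concrete, here is a minimal Python sketch. It assumes each metric has an agreed goal and a list of past samples, and it flags a value as an anomaly when it is more than two standard deviations worse than its historical mean. The `Metric` class, the `status` function, the metric names, the sample numbers, and the threshold are all hypothetical illustrations, not a ManageEngine feature or API.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class Metric:
    """One dashboard metric with an agreed goal and past samples (hypothetical model)."""
    name: str
    current: float
    goal: float
    history: list[float]
    higher_is_better: bool = True


def status(m: Metric, k: float = 2.0) -> str:
    """Return 'ANOMALY', 'OK', or 'MISSES GOAL' for a metric."""
    base = mean(m.history)
    spread = stdev(m.history) if len(m.history) > 1 else 0.0
    # Anomaly: more than k standard deviations worse than the historical mean.
    if m.higher_is_better:
        anomalous = m.current < base - k * spread
        meets_goal = m.current >= m.goal
    else:
        anomalous = m.current > base + k * spread
        meets_goal = m.current <= m.goal
    if anomalous:
        return "ANOMALY"
    return "OK" if meets_goal else "MISSES GOAL"


if __name__ == "__main__":
    # Made-up sample values for illustration only.
    dashboard = [
        Metric("uptime_pct", 99.96, 99.9, [99.97, 99.99, 99.96, 99.98]),
        Metric("avg_response_ms", 850, 500, [310, 290, 330, 305], higher_is_better=False),
        Metric("cpu_util_pct", 78, 70, [72, 75, 80, 76], higher_is_better=False),
    ]
    for m in dashboard:
        print(f"{m.name:16} {status(m)}")
```

With these sample numbers, uptime comes out OK, average response time is flagged as an anomaly against its own history, and CPU utilization is within its normal range but still misses its goal. A two-standard-deviation band is just one simple choice; the point is that each dashboard metric carries both a goal and a baseline.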

Below are customer examples to drive the point home.

Jamie Gilbert, Director, CIO of CD Baby, the largest online distributor of independent music, uses ManageEngine Applications Manager and Service Desk. He said the organization expects no downtime. For him, uptime metrics and SLA reporting, with long-term trending of site performance based on URL sequence testing, are invaluable. Beyond tracking performance, he also uses the tools for troubleshooting analysis.

In a previous position, he implemented Applications Manager to monitor 450 physical and virtual servers in a mixed Windows and Linux environment with MS-SQL and Oracle databases. He ran into issues with a new application running on Apache, Tomcat, and Java. Using real-time performance reporting in Applications Manager, along with long-term trending and comparative analysis reporting across servers, the team was able to home in on the root cause: an application programming issue combined with Tomcat connection limits and JVM memory allocation. It was a multifaceted problem, and Applications Manager made the problems easy to see and allowed the team to come up with a path to resolution.
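Uptime and SLA reporting of the kind Jamie Gilbert describes boils down to simple arithmetic over check results. The Python sketch below assumes that a URL sequence test is a scripted series of page requests (for example home page, search, checkout) run on a schedule, and that a run counts as "up" only if every step succeeds. The `SequenceCheck` record and the helper functions are hypothetical; they are not Applications Manager's actual data model.

```python
from dataclasses import dataclass


@dataclass
class SequenceCheck:
    """One scheduled run of a scripted URL sequence (hypothetical record)."""
    timestamp: int        # epoch seconds when the run started
    step_ok: list[bool]   # pass/fail for each URL in the sequence


def availability_pct(checks: list[SequenceCheck]) -> float:
    """Percentage of runs in which every step of the sequence succeeded."""
    if not checks:
        raise ValueError("no checks recorded")
    passed = sum(1 for c in checks if all(c.step_ok))
    return 100.0 * passed / len(checks)


def meets_sla(availability: float, sla_pct: float) -> bool:
    """True if measured availability is at or above the SLA target."""
    return availability >= sla_pct


def allowed_downtime_minutes(sla_pct: float, period_days: int = 30) -> float:
    """Downtime budget implied by an SLA target over a reporting period."""
    return period_days * 24 * 60 * (1 - sla_pct / 100)


if __name__ == "__main__":
    # A made-up day of 5-minute checks (288 runs) with two failed runs.
    runs = [SequenceCheck(i * 300, [True, True, True]) for i in range(288)]
    runs[10] = SequenceCheck(3000, [True, False, True])
    runs[11] = SequenceCheck(3300, [False, True, True])
    pct = availability_pct(runs)
    print(f"availability {pct:.2f}%, meets 99.9% SLA: {meets_sla(pct, 99.9)}")
```

For example, a 99.9% target over a 30-day month leaves a budget of about 43.2 minutes of downtime, while 2 failed runs out of 288 five-minute checks in one day works out to roughly 99.31% availability for that day, well short of the target.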

Darren Qualls, CTO of Premier Global Technologies and a user of ManageEngine IT360, explains database performance this way. Slow-performing databases can be extremely tricky to chase down. For example, take 9 terabytes of SQL Server data and throw a $20k piece of hardware at it: the likelihood is you’re still going to have problems. There are a few common issues you will run into with database servers. In most cases, you will want to start with lock waits, one of the standard metrics for any database product. There are so many ways to mess up record locking without knowing it for a year or two.

In 90% of cases, record lock issues are only a drop in the bucket. The next thing I run into is the disks. Slow disk access will bring a half-million-dollar blade system to its knees EVERY time! There are many causes I have to categorically rate as self-imposed: incorrect data normalization, bugs in code, incorrect commit placement or parameters, etc. Underpowered hardware, whether from incorrect initial specs, organic growth, or expired systems, will also cause problems. Another category is telecom issues, which can be anything external to the system itself, such as network setup or remote query pulls for reports.

These are the three common server problem areas, and you’ll need network maps and traffic monitoring to isolate the data and pin down the issue. Do not skimp; without them, you may end up spending about three times the effort to resolve it.
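Darren Qualls’ advice to start with lock waits and then look at the disks can be illustrated with SQL Server’s wait statistics. The Python sketch below assumes you have already fetched `(wait_type, wait_time_ms)` rows, for example with `SELECT wait_type, wait_time_ms FROM sys.dm_os_wait_stats`, and groups them: `LCK_M_*` waits as record locking, and `PAGEIOLATCH_*`, `WRITELOG`, and `IO_COMPLETION` as disk I/O. The grouping, the short list of ignored idle waits, the function names, and the sample rows are simplifying assumptions for illustration, not how IT360 or any other product classifies waits.

```python
from collections import defaultdict

# Partial, illustrative list of benign idle waits to skip; a real analysis uses a longer list.
IGNORED = {"SLEEP_TASK", "LAZYWRITER_SLEEP", "WAITFOR"}


def categorize(wait_type: str) -> str:
    """Map a SQL Server wait type to a coarse category (simplified assumption)."""
    if wait_type.startswith("LCK_M_"):
        return "locks"
    if wait_type.startswith("PAGEIOLATCH_") or wait_type in {"WRITELOG", "IO_COMPLETION"}:
        return "disk_io"
    return "other"


def wait_breakdown(rows: list[tuple[str, int]]) -> dict[str, float]:
    """Percentage of total wait time per category from (wait_type, wait_time_ms) rows."""
    totals: dict[str, int] = defaultdict(int)
    for wait_type, wait_ms in rows:
        if wait_type in IGNORED:
            continue
        totals[categorize(wait_type)] += wait_ms
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {cat: 100.0 * ms / grand_total for cat, ms in sorted(totals.items())}


if __name__ == "__main__":
    # Made-up sample rows for illustration only.
    sample = [("LCK_M_X", 1200), ("LCK_M_S", 300), ("PAGEIOLATCH_SH", 9000),
              ("WRITELOG", 2500), ("CXPACKET", 2000), ("SLEEP_TASK", 500000)]
    for category, share in wait_breakdown(sample).items():
        print(f"{category:8} {share:5.1f}%")
```

In this made-up sample, lock waits account for 10% of wait time while disk I/O accounts for about 77%, the "drop in the bucket" pattern described above. Note that `sys.dm_os_wait_stats` is cumulative since the last restart (or since the counters were cleared), so in practice you would compare two snapshots taken some time apart rather than read the raw totals.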