Back to Basics: Network Operations

By Bruno Wollmann, Contributor

This blog on network operations is the fourth in my Back to Basics series. The first three blogs covered Gathering Requirements, Network Design and Network Implementation.

Once ACME Corporation’s data center network has been implemented, it will need care and feeding, which consists of operations and optimizations. There are a few key ingredients to successful network operations.

ACME’s data center project produced several documents that should be stored in two locations. First, all project documents should be stored in a read-only project archive. It will be used to answer questions like “Why did you do it this way?” or “What were you thinking?” This archive may prove useful to future projects as they try to follow established standards. Second, a copy of all documents deemed useful to network operations (i.e., cable plans, diagrams, configuration templates, IP addressing plans, and naming and numbering conventions) should be turned over to the network operations team.

Some of ACME’s staff would rather be launched from one of the company’s giant slingshots than update documentation, but treating these as living documents is critical to the success of the entire team. No network change should be considered complete until all corresponding documentation has been updated.

Unless a security policy prevents it, sharing read-only versions of these documents with other operations and support teams goes a long way toward helping them understand the network they rely on. Other teams are far more likely to blame the network for all their problems when all they can see is a black box. Demystify the network whenever and wherever you can.

Network Visibility
Network visibility comes in many forms. A few are critical to the success of the network and can shift many network operations from being reactive to being proactive. The following sections describe forms of network visibility that should be pursued for ACME. General categories of network visibility tools are root cause analysis, capacity planning and security.

Syslog Collector: A syslog collector that can index, store and allow querying of all syslog messages from all network devices should be implemented. To be most effective, this syslog collector should collect logs from servers, storage systems, databases, applications and security appliances in addition to network equipment. Querying such a collection of syslog messages would greatly help with correlating events when troubleshooting and performing root cause analysis.
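
To make the correlation idea concrete, here is a minimal sketch of the kind of cross-source query a syslog collector enables. It assumes RFC 3164-style message lines ("Mmm dd hh:mm:ss host message"); the function names and the 60-second grouping window are illustrative choices, not part of any particular product.

```python
import re
from datetime import datetime

# Minimal RFC 3164-style parser: "Mmm dd hh:mm:ss host message"
LINE_RE = re.compile(r"^(\w{3}\s+\d+\s[\d:]{8})\s+(\S+)\s+(.*)$")

def parse_line(line, year=2024):
    """Parse one syslog line into (timestamp, host, message); None if unparseable."""
    m = LINE_RE.match(line)
    if not m:
        return None
    ts = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
    return ts, m.group(2), m.group(3)

def correlate(lines, keyword, window_s=60):
    """Return matching events from any source, sorted by time and grouped
    whenever consecutive matches fall within window_s of each other."""
    events = sorted(
        (e for e in (parse_line(l) for l in lines) if e and keyword in e[2]),
        key=lambda e: e[0],
    )
    groups, current = [], []
    for ev in events:
        if current and (ev[0] - current[-1][0]).total_seconds() > window_s:
            groups.append(current)
            current = []
        current.append(ev)
    if current:
        groups.append(current)
    return groups
```

Because the collector holds messages from switches, servers and databases in one place, a single query like this can reveal that a database connection error landed seconds after an interface flap.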

Configuration Management: Every network device requires a configuration. For fast and easy recovery during equipment replacement and failure, all configurations should be stored and archived with a history of changes that have occurred. This type of tool is invaluable during an audit.
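
The core behavior such a tool needs can be sketched in a few lines: archive a new revision only when the configuration actually changed, and keep the full history for audits and recovery. This is a toy in-memory illustration, not any specific product; the class and method names are invented for the example.

```python
import hashlib
from datetime import datetime, timezone

class ConfigArchive:
    """Toy config archive: keeps every distinct revision per device."""

    def __init__(self):
        self._history = {}  # device name -> list of (timestamp, sha256, config text)

    def save(self, device, config):
        """Archive config only if it differs from the last stored revision."""
        digest = hashlib.sha256(config.encode()).hexdigest()
        revs = self._history.setdefault(device, [])
        if revs and revs[-1][1] == digest:
            return False  # unchanged, nothing new to archive
        revs.append((datetime.now(timezone.utc), digest, config))
        return True

    def latest(self, device):
        """Return the most recent configuration for a device."""
        return self._history[device][-1][2]

    def revisions(self, device):
        """Number of distinct revisions stored for a device."""
        return len(self._history.get(device, []))
```

A real deployment would persist revisions to durable storage and record who made each change, but the change-detection-plus-history pattern is the same.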

Network Performance Monitor (NPM): A network performance monitor is a category of tools that can help take the guesswork out of capacity planning. These tools can report on link utilization, latency, and round-trip times and generate alerts when thresholds have been crossed. Data from these tools can serve as input for sizing security appliances and help with the creation and maintenance of accurate quality of service (QoS) policies. Network statistics may be gleaned from both real-world and synthetic traffic and are usually gathered in the form of IP Flow Information Export (IPFIX) records.
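
The arithmetic behind link-utilization reporting is worth seeing once. The sketch below, under the assumption of SNMP-style octet counters sampled at a fixed interval, converts two counter samples into a percentage and flags threshold crossings; the function names are illustrative.

```python
def utilization_pct(prev_octets, curr_octets, interval_s, speed_bps, counter_bits=64):
    """Percent link utilization from two octet-counter samples,
    tolerating a single counter wrap between samples."""
    delta = curr_octets - prev_octets
    if delta < 0:  # counter wrapped around its maximum
        delta += 2 ** counter_bits
    return (delta * 8) / (interval_s * speed_bps) * 100  # octets -> bits

def check_threshold(samples, threshold_pct):
    """Return the indices of utilization samples at or above the threshold."""
    return [i for i, u in enumerate(samples) if u >= threshold_pct]
```

For example, 18.75 GB transferred over a 5-minute polling interval on a 1 Gb/s link works out to 50% utilization, which is the kind of figure that feeds capacity-planning and QoS decisions.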

Application Performance Monitor (APM): Application performance monitors take performance information a few steps deeper and can provide information about applications that an NPM tool cannot. An APM can collect information about transaction times and report on what the overall user experience is.

Health Monitor: A useful health monitor reports on information such as up/down status, CPU utilization, link/bandwidth utilization, memory utilization, power consumption and temperature, to name a few. Any candidate product should be evaluated to make sure it meets the needs of monitoring a highly available data center network.
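
At its core, a health monitor polls those metrics and compares them against thresholds. A minimal sketch of that evaluation step follows; the threshold values and metric names here are hypothetical examples, and real limits would come from the vendor and ACME's design documents.

```python
# Hypothetical thresholds; real values come from the vendor and the design docs.
THRESHOLDS = {"cpu_pct": 80, "mem_pct": 85, "temp_c": 45}

def evaluate_health(device, metrics, thresholds=THRESHOLDS):
    """Compare one device's polled metrics against thresholds; return alert strings."""
    alerts = []
    if not metrics.get("up", False):
        alerts.append(f"{device}: DOWN")
    for key, limit in thresholds.items():
        value = metrics.get(key)
        if value is not None and value >= limit:
            alerts.append(f"{device}: {key}={value} exceeds {limit}")
    return alerts
```

A production monitor adds scheduling, retries and notification routing on top, but this compare-and-alert loop is the part that turns raw polling data into actionable events.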

Packet Capture: Packet captures are one of the main tools used to perform a deep dive into network and application performance problems. There are many packet capture tools available; some are open source, and some are commercial.

Network Visibility Fabric (NVF): Implementing an NVF in ACME’s data center network would provide an excellent foundation for network visibility. Through a series of test access points (TAPs) installed at all layers of the network, a copy of every packet can be sent, out-of-band, to a central aggregation point or packet broker, where it can be used by the tools mentioned above. Because an NVF sends copies of packets out-of-band, there is no performance impact on production traffic or the production network; traffic is copied for monitoring, security and analysis while it continues to pass through the network unimpeded. Also, because there is a central collection point for all packets, it is simpler to broker packets to the tools that need them. An organization usually requires fewer of each type of tool (i.e., monitoring, security, analysis) because initial packet collection is performed by the broker rather than each individual tool, and no re-design is required to implement a new tool. The key is the central collection of packets.

An NVF can place tools in-line with production traffic which allows tools such as security devices (i.e., intrusion prevention system (IPS), anti-malware, data loss prevention (DLP)) to be implemented without a network re-design.

Using virtual TAPs, an NVF can tap into virtualized environments, where troubleshooting has historically been challenging.

IPFIX generation can be very resource intensive on network devices and sending IPFIX statistics to a collector consumes bandwidth. Offloading IPFIX generation to the NVF removes both problems.
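
The work being offloaded is essentially 5-tuple flow aggregation. The sketch below illustrates, in simplified form, what an IPFIX exporter does with observed packets; the record layout is a pared-down stand-in for real IPFIX fields, not the wire format defined by RFC 7011.

```python
from collections import defaultdict

def aggregate_flows(packets):
    """Aggregate packets into IPFIX-like flow records keyed by the 5-tuple.
    Each packet is (src_ip, dst_ip, src_port, dst_port, proto, byte_count)."""
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for src, dst, sport, dport, proto, nbytes in packets:
        rec = flows[(src, dst, sport, dport, proto)]
        rec["packets"] += 1
        rec["bytes"] += nbytes
    return dict(flows)
```

Maintaining these per-flow counters for every packet is exactly the per-packet bookkeeping that taxes a switch's control plane, which is why moving it onto the NVF's packet broker is attractive.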

Figure 1 displays what an NVF would look like in ACME’s data center network. The green lines show where TAPs have been installed in the traffic path and are cabled to a central collector or packet broker. The orange lines show how traffic from the packet broker is sent to the appropriate tools.

Figure 1: Network Visibility Fabric

Change is the Only Constant
As business and application requirements evolve, the network will also need to evolve. Security vulnerabilities will no doubt be exposed that need to be remediated to avoid exploitation. Network operating system lifecycles and bug fixes will force upgrades. Capacity may need to be added. Whatever the reason, there will be change.

All significant changes should send the network through a battery of tests to ensure it still supports all business requirements, and periodic testing ensures that even minor changes are verified in batches. In a perfect world, every aspect of the data center network would be duplicated in a lab environment so that all changes could be tested there first. But even if ACME had the funds to do this, it could not duplicate the traffic types, load and flow patterns experienced in production. Some level of risk accompanies every change, so ACME has duplicated only a subset of the network in a lab environment.
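
Even without a full lab replica, a post-change smoke test can automate the most basic question: are the services the business depends on still reachable? A minimal sketch, assuming TCP-reachable targets; the target list and function names are invented for illustration.

```python
import socket

def tcp_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def smoke_test(targets):
    """Check (name, host, port) targets; return the names that failed."""
    return [name for name, host, port in targets if not tcp_reachable(host, port)]
```

Running a check like this before and after every change window gives a quick pass/fail signal, with deeper validation (performance, failover, QoS behavior) reserved for the lab subset.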

Documentation should not be immune to change. Be a team player and document, document, document.

Read My Other Blogs in the Series

Gathering Requirements

Design Fundamentals

Network Implementation


The Network Lifecycle