Using Nornir for Network OS Upgrades (Part 2)

Author: Kirk Byers
Date: 2019-05-01

This is an updated version of the original "Using Nornir for OS Upgrades (Part 2)" article dated 2018-09-18.

This article has been updated to reflect Nornir 2.x inventory format.

Once more unto the breach, dear friends, once more;

In Part1, I wrote about Nornir's inventory and about using Nornir and the netmiko_file_transfer plugin to transfer a file to twelve devices across three different platforms.

Now we are going to continue this OS upgrade process. In this article, we are going to do the following:

  • Expand the file transfer to select different files (based on the platform);
  • Set the boot variable;
  • Save the running configuration;
  • Reload the device.

For the first task (expanding the file transfer), we will use all twelve devices. For the remaining three tasks, we will only focus on Cisco IOS. In other words, we will actually perform the OS upgrade on the two Cisco IOS devices.

Expanding the file transfer to select different files.

In Part1, we transferred the exact same file to all of the devices. We would expect, however, that different platforms will require different OS image files. Consequently, we need to handle this in our code.

Since I am grouping my devices by platform, the easiest location to specify different files is in the groups.yaml file. As a result, our groups.yaml will look as follows (which is identical to what we had in Part1):

---
cisco-ios:
  platform: 'cisco_ios'
  data:
    img: 'c880data-universalk9-mz.155-3.M8.bin'
    backup_img: 'c880data-universalk9-mz.154-2.T1.bin'

arista:
  platform: 'arista_eos'
  data:
    img: 'test_arista.txt'

nxos:
  platform: 'cisco_nxos'
  data:
    img: 'test_nxos.txt'

Note, for the "cisco-ios" group, we are specifying the name of the image file that we will be transferring to the device (using the 'img' variable). We also specify the 'backup_img' file, which is the image that was in use on the device (prior to me doing this upgrade process). We will use this 'backup_img' variable when we are configuring the boot settings. For both Arista and NX-OS, we will just transfer text files, but it will be a unique text file per platform.
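
To make the lookup concrete, here is a simplified model of how a per-host value such as 'img' can be resolved through a group. It is a plain-Python sketch and not Nornir's implementation; the "hosts" dictionary, the "override.txt" value, and the lookup() helper are assumptions made purely for illustration.

```python
# Simplified model of inventory inheritance (NOT Nornir's real code).
# The group data mirrors the groups.yaml shown above.
groups = {
    "cisco-ios": {
        "img": "c880data-universalk9-mz.155-3.M8.bin",
        "backup_img": "c880data-universalk9-mz.154-2.T1.bin",
    },
    "arista": {"img": "test_arista.txt"},
    "nxos": {"img": "test_nxos.txt"},
}

# Hypothetical host entries: each host lists its groups and may carry its own data.
hosts = {
    "pynet-rtr1": {"groups": ["cisco-ios"], "data": {}},
    "arista1": {"groups": ["arista"], "data": {}},
    "nxos1": {"groups": ["nxos"], "data": {"img": "override.txt"}},
}


def lookup(host_name, key, default=None):
    """Return host-level data first, then fall back to the host's groups."""
    host = hosts[host_name]
    if key in host["data"]:
        return host["data"][key]
    for group_name in host["groups"]:
        if key in groups[group_name]:
            return groups[group_name][key]
    return default


print(lookup("pynet-rtr1", "img"))     # value inherited from the group
print(lookup("arista1", "backup_img"))  # None: Arista defines no backup image
print(lookup("nxos1", "img"))           # host-level value wins over the group
```

The second lookup is the case worth noticing: a key that only exists for one platform comes back empty for the others, so code that uses 'backup_img' should only run against the Cisco IOS devices.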

Here is my updated code to accomplish this file transfer. Note, once again, I am using the Nornir 2.0 branch as of today.

from nornir import InitNornir
from nornir.plugins.tasks.networking import netmiko_file_transfer

from nornir_test.nornir_utilities import nornir_set_creds, std_print


def os_upgrade(task):
    file_name = task.host.get('img')
    result = task.run(
        task=netmiko_file_transfer,
        source_file=file_name,
        dest_file=file_name,
        direction='put',
    )
    return result


def main():
    # Initialize Nornir object using default "SimpleInventory" plugin
    nr = InitNornir()
    nornir_set_creds(nr)
    result = nr.run(
        task=os_upgrade,
        num_workers=20,
    )
    std_print(result)


if __name__ == "__main__":
    main()

From my main() function, I use "nr.run()" to execute the "os_upgrade" function that I have defined inside the program.

If you look at the os_upgrade() function, you see that it takes a single argument named "task". Here, I am relying on an important aspect of Nornir: the nr.run() call inside of main() executes the task in child threads (or, in the case of "num_workers=1", entirely in the main thread). Either way, once "os_upgrade" is running, it is executing in the Nornir "task" context. This implies that the "task" argument passed in will actually be a Nornir Task object. Additionally, task.host inside of "os_upgrade" will be the host that this task is operating on (technically, a Nornir Host object).

In our case, the value returned by task.host.get('img') will be the file that we want to transfer. Consequently, we can use this file_name variable as both the source_file and the dest_file in the netmiko_file_transfer plugin.

We also specify that we are doing a "put" operation i.e. that we are transferring the file from the Nornir control machine to the remote network device. Note, I have all of the specified "img" files in the same directory as my Nornir script. Note, the Cisco IOS image file is not in GitHub as it is a copyrighted file.
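
Since the transfer depends on both an 'img' value in inventory and a matching file next to the script, it can be worth failing early when either one is missing. The following is a minimal sketch of such a check; resolve_image() and its base_dir parameter are hypothetical names, not part of Nornir or Netmiko.

```python
from pathlib import Path


def resolve_image(host_data, base_dir="."):
    """Return the local path of the image for one host, or raise early.

    host_data is any object with a dict-style .get(), for example task.host.
    """
    file_name = host_data.get("img")
    if not file_name:
        raise ValueError("no 'img' value found in inventory for this host")
    path = Path(base_dir) / file_name
    if not path.is_file():
        raise FileNotFoundError(f"image file not found locally: {path}")
    return path
```

Inside os_upgrade(), a call such as resolve_image(task.host) before the netmiko_file_transfer step would surface a missing file as a clear error for that host instead of a failure partway through the transfer.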

Executing this program confirms that the correct files are on all of the remote devices.

Setting the boot variable.

We have successfully transferred unique files per platform to the remote devices. Now we need to set the boot variable on the remote devices.

A good pattern to use here is to first determine the exact configuration that you want to achieve (via your program). In other words, actually configure a test device to the desired end-state. This will ensure (hopefully) that your configuration commands are 100% correct.

The desired end configuration for my two Cisco IOS routers is:

boot-start-marker
boot system flash c880data-universalk9-mz.155-3.M8.bin
boot system flash c880data-universalk9-mz.154-2.T1.bin
boot-end-marker

My current configuration on both IOS devices is:

boot-start-marker
boot system flash c880data-universalk9-mz.154-2.T1.bin
boot-end-marker

Now, I am going to expand my code to filter down my "nr" object to a single device. From this point forward (in the code), I want to be careful about my actions. I want to step through them and check them carefully. Because of this, at least temporarily, I want to disable the concurrency and only operate on a single device.

In addition to filtering my nornir object down to a single device, I also ran the "set_boot_var" task on this host. "set_boot_var" is a function I have defined that uses the netmiko_send_config plugin.

In my main function, this code looks as follows:

    # Filter to only a single device
    nr_ios = nr.filter(hostname="cisco1.domain.com")

    aggr_result = nr_ios.run(task=set_boot_var)

    # If setting the boot variable failed
    # (assumes single device at this point)
    for hostname, val in aggr_result.items():
        if val[0].result is False:
            sys.exit("Setting the boot variable failed")
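
The check above assumes a single device. Once the filter is widened to more hosts, it may be more useful to collect the names of the hosts that reported False before deciding whether to stop. A minimal sketch follows; hosts_reporting_false() is a hypothetical helper, and the SimpleNamespace objects only stand in for Nornir results so that the logic can be run offline.

```python
from types import SimpleNamespace


def hosts_reporting_false(aggr_result):
    """Return the names of hosts whose top-level task result is exactly False."""
    return [
        name
        for name, multi_result in aggr_result.items()
        if multi_result[0].result is False
    ]


# Stand-in for an aggregated result: host name -> list of per-task results
fake_result = {
    "pynet-rtr1": [SimpleNamespace(result=True)],
    "pynet-rtr2": [SimpleNamespace(result=False)],
}
print(hosts_reporting_false(fake_result))  # ['pynet-rtr2']
```

In main(), a non-empty list would be the signal to call sys.exit() and name the affected hosts.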

And my set_boot_var function is:

def set_boot_var(task):
    """
    Set the boot variable for Cisco IOS.

    return True if boot variable set

    return False if staging verification steps failed
    """
    primary_img = task.host.get('img')
    backup_img = task.host.get('backup_img')

    # Check images are on the device
    for img in (primary_img, backup_img):
        result = task.run(
            task=netmiko_send_command,
            command_string=f"dir flash:/{img}"
        )
        output = result[0].result
        # Drop the "Directory of" line as that line always contains the filename
        parts = re.split(r"Directory of.*", output, flags=re.M)
        # No "Directory of" line means the dir command did not list the file
        if len(parts) < 2 or img not in parts[1]:
            return False

    commands = f"""
default boot system
boot system flash {primary_img}
boot system flash {backup_img}
"""
    command_list = commands.strip().splitlines()
    task.run(
        task=netmiko_send_config,
        config_commands=command_list
    )
    return True

The "set_boot_var()" code retrieves the primary and backup image file names from inventory. It then ensures that both of those image files exist in the "flash:" file system. Note, the earlier file transfer process already verified the MD5 of the primary image file. This check ensures that I will only specify image files that exist on the device.

If I were creating a more general program, then I would not hard-code the file system to "flash:". Additionally, I would probably incorporate a final MD5 check before setting the boot variable. In other words, ensure not only that the files exist, but that they are exactly correct.
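
A final MD5 check along those lines could be sketched as follows. The helper names are hypothetical, the file system is a parameter that would come from inventory, and the parsing assumes the device prints the digest after an "=" sign in the "verify /md5" output; confirm that against your own devices before relying on it.

```python
import hashlib
import re


def local_md5(path, chunk_size=1024 * 1024):
    """Compute the MD5 of a local file without loading it all into memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_command(img, file_system="flash:"):
    """Build the verify command; file_system is no longer hard-coded."""
    return f"verify /md5 {file_system}/{img}"


def remote_md5(output):
    """Extract a 32-character hex digest from the command output, if present."""
    match = re.search(r"=\s*([0-9a-fA-F]{32})\b", output)
    return match.group(1).lower() if match else None
```

Inside set_boot_var(), the string from md5_command() would be sent with netmiko_send_command and the boot variable would only be set when remote_md5() of the output equals local_md5() of the file on the control machine. Computing the digest of a full image on the device can take a while, so the command may need a longer timeout than the default.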

After I have done those verifications, I then use a Python f-string to construct the boot commands that I want to send (yes, it is nice to use Python 3.6). Finally, I use the netmiko_send_config plugin to send the configuration commands down the channel.
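
To see exactly what gets handed to netmiko_send_config, the f-string step can be pulled out into a small function and run on its own. build_boot_commands() is a hypothetical name; the logic is the same strip-and-split pattern used in set_boot_var().

```python
def build_boot_commands(primary_img, backup_img):
    """Return the boot configuration commands as a list, primary image first."""
    commands = f"""
default boot system
boot system flash {primary_img}
boot system flash {backup_img}
"""
    # strip() removes the leading and trailing newlines of the triple-quoted string
    return commands.strip().splitlines()


for line in build_boot_commands(
    "c880data-universalk9-mz.155-3.M8.bin",
    "c880data-universalk9-mz.154-2.T1.bin",
):
    print(line)
```

The order matters here: the first "boot system" line is the image that the router tries first, with the previous image left in place as the fallback.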

I also added the following to the main() section of the code:

    # Verify the boot variable
    result = nr_ios.run(
        task=netmiko_send_command,
        command_string="show run | section boot",
        num_workers=20,
    )
    std_print(result)
    continue_func()

In other words, I execute "show run | section boot" and print this out to the screen. I then call a function named "continue_func" which just prompts if I want to continue.
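
The article does not show "continue_func" itself, so the following is only a guess at what such a helper might look like. The "ask" parameter is an addition made here so that the prompt can be replaced in tests; it is not implied by the original code.

```python
import sys


def continue_func(msg="Do you want to continue (y/n)? ", ask=input):
    """Prompt the operator and exit the program unless the answer is yes."""
    response = ask(msg).strip().lower()
    if response not in ("y", "yes"):
        sys.exit("Exiting at operator request")
```

Treating anything other than an explicit "y" or "yes" as a refusal is a deliberate choice in this sketch: an accidental Enter key should stop the upgrade, not continue it.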

The entire program at this point is here.

Executing this program yields:

$ python nornir_os_upgrade.py
Enter username: pyclass
Password:
Transferring files

.... omitting file transfer output ....


--------------------------------------------------
pynet-rtr1
boot-start-marker
boot system flash c880data-universalk9-mz.155-3.M8.bin
boot system flash c880data-universalk9-mz.154-2.T1.bin
boot-end-marker
--------------------------------------------------


Do you want to continue (y/n)? y

We can see that our boot section looks correct and, on the router, I see the configuration was changed from/to:

pynet-rtr1#show run | section boot
boot-start-marker
boot system flash c880data-universalk9-mz.154-2.T1.bin
boot-end-marker
pynet-rtr1#

pynet-rtr1#show run | section boot
boot-start-marker
boot system flash c880data-universalk9-mz.155-3.M8.bin
boot system flash c880data-universalk9-mz.154-2.T1.bin
boot-end-marker

Reload, reload, let's try a reload.

Our final steps are to complete the "write memory" and "reload".

In order to do the "wr mem" and "reload", I added the following code into main():

    # Save the config
    result = nr_ios.run(
        task=netmiko_send_command,
        command_string="write mem",
    )
    std_print(result)

    # Reload
    continue_func(msg="Do you want to reload the device (y/n)? ")
    result = nr_ios.run(
        task=netmiko_send_command,
        use_timing=True,
        command_string="reload",
    )

    # Confirm the reload (if 'confirm' is in the output)
    for device_name, multi_result in result.items():
        if 'confirm' in multi_result[0].result:
            result = nr_ios.run(
                task=netmiko_send_command,
                use_timing=True,
                command_string="y",
            )

    print("Devices reloaded")

Before doing the reload, I explicitly ask whether the user wants to do this (or not).

In practice, I would probably decouple the file transfer and staging process from the actual reload.

In other words, the file transfer process is slow (potentially very slow if you are not using concurrency), but relatively low risk. The reload process is high-risk and you probably want to do it in a very controlled manner.

Additionally, I recommend integrating more checks and verifications into the process. Basically, I would ask myself, "what verifications would be required if I was doing this manually." I would then bake those verifications into the automated process. Even with additional verifications, make sure you follow reasonable operations practices to minimize the downside if things go wrong.
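
One verification that is easy to automate is confirming the running version after the device comes back. The sketch below parses the first line of "show version"; running_version() is a hypothetical helper, and the sample line is taken from the "show version" output shown later in this article.

```python
import re


def running_version(show_version_output):
    """Return the IOS version string from 'show version' output, or None."""
    match = re.search(r"Version\s+([^,\s]+),", show_version_output)
    return match.group(1) if match else None


sample = (
    "Cisco IOS Software, C880 Software (C880DATA-UNIVERSALK9-M), "
    "Version 15.5(3)M8, RELEASE SOFTWARE (fc1)"
)
print(running_version(sample))  # 15.5(3)M8
```

Comparing this value against the version expected for the 'img' file turns "log in and look" into a check that the program can fail on.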

Note, the "reload" command requires an extra confirmation in IOS. Consequently, the associated code handles that confirmation and sends a "y" as required.
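
The code in main() sends the "y" with a second nr_ios.run() call, which goes to every host in nr_ios. That is fine while nr_ios holds a single device. With more hosts, one option is to keep the prompt and the answer together per host. The sketch below shows only that decision logic; "send" is a hypothetical callable, and the fake device exists purely to exercise the function offline.

```python
def reload_device(send):
    """Issue 'reload' and answer the confirmation prompt if one appears.

    send is a hypothetical callable: it sends one command using timing-based
    reads and returns whatever text the device printed in response.
    """
    output = send("reload")
    if "confirm" in output:
        output += send("y")
    return output


# Fake device used only to exercise the logic offline
sent = []


def fake_send(command):
    sent.append(command)
    return "Proceed with reload? [confirm]" if command == "reload" else ""


reload_device(fake_send)
print(sent)  # ['reload', 'y']
```

Inside a Nornir task, "send" could wrap task.run(task=netmiko_send_command, use_timing=True, command_string=...) and return result[0].result, so that each host only receives a "y" in response to its own prompt.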

All right, let's see what happens when we execute this program...

$ python nornir_os_upgrade.py
Enter username: pyclass
Password:
Transferring files

--------------------------------------------------
pynet-rtr1

True
--------------------------------------------------

--------------------------------------------------
pynet-rtr2

True
--------------------------------------------------

--------------------------------------------------
arista1

True
--------------------------------------------------

--------------------------------------------------
arista2

True
--------------------------------------------------

--------------------------------------------------
arista3

True
--------------------------------------------------

--------------------------------------------------
arista4

True
--------------------------------------------------

--------------------------------------------------
arista5

True
--------------------------------------------------

--------------------------------------------------
arista6

True
--------------------------------------------------

--------------------------------------------------
arista7

True
--------------------------------------------------

--------------------------------------------------
arista8

True
--------------------------------------------------

--------------------------------------------------
nxos1

True
--------------------------------------------------

--------------------------------------------------
nxos2

True
--------------------------------------------------



--------------------------------------------------
pynet-rtr1
boot-start-marker
boot system flash c880data-universalk9-mz.155-3.M8.bin
boot system flash c880data-universalk9-mz.154-2.T1.bin
boot-end-marker
--------------------------------------------------


Do you want to continue (y/n)? y

--------------------------------------------------
pynet-rtr1
Building configuration...
[OK]
--------------------------------------------------


Do you want to reload the device (y/n)? y
Devices reloaded

And on the device itself:

pynet-rtr1#
pynet-rtr1#Connection to cisco1.domain.com closed by remote host.
Connection to cisco1.domain.com closed.

...wait some amount of time...

(pynet36) [gituser@ip-10-178-21-224 ~]$ ssh -l pyclass cisco1.domain.com
Password:

pynet-rtr1#show version
Cisco IOS Software, C880 Software (C880DATA-UNIVERSALK9-M), Version 15.5(3)M8, RELEASE SOFTWARE (fc1)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2018 by Cisco Systems, Inc.
Compiled Thu 02-Aug-18 12:15 by prod_rel_team

ROM: System Bootstrap, Version 12.4(22r)YB5, RELEASE SOFTWARE (fc1)

pynet-rtr1 uptime is 1 minute

"cisco1" was already running 15.5(3)M8 both before and after the reload (as I had previously used essentially this same code to perform the upgrade earlier). For this run, I set the boot section back to the old configuration and then performed the actual reload. I also performed the upgrade on "cisco2" using essentially this same process.

Reference code and inventory files used in this article (with some minor modifications).

Kirk Byers

@kirkbyers
