The first part of the book focuses on collaboration with development and testing. I’ll cover some of the tools and techniques to improve how you work and communicate about your work.
You can no longer be the lone sysadmin known for saying “no.” The work may start with understanding operating systems, but it extends to understanding services across different platforms while collaborating with other teams within the organization and potentially outside it. You must adopt tools and practices from across the organization to better perform your job.
You need to be comfortable with using the terminal and graphical interfaces. Just about every tool I’ll cover has some aspect of command line usage. Being able to explore and use the tools helps you understand when problems arise with the automation. When you have to debug the automation, you need to know whether it’s the tool or your use of the tool.
You can’t ignore version control. For years, DORA’s annual State of DevOps report has found that the use of version control correlates strongly with high IT performance. Version control is fundamental to collaboration with other parts of the organization, whether you’re writing code to set up local development and test environments or deploying applications in a consistent and repeatable manner. Version control is also critical for managing your documentation, whether it’s READMEs embedded in a project repository or a separate project that spans content for the organization. You also keep the tests for the code you write, and for the infrastructure you build, in version control.
The second part of this book covers managing infrastructure. Systems administration practices that work well when managing isolated systems are generally not transferable to cloud environments. Storage and networking are fundamentally different in the cloud, changing how you architect reliable systems and plan to remediate disasters.
For example, network tuning that you might handcraft with ttcp testing between nodes in your data centers is no longer applicable when your cloud provider limits your network capacity. Instead, balance the skills gained from administering networks in the data center with in-depth knowledge of the cloud provider’s limits to build reliable systems in the cloud.
In addition to version control, you need to build reusable, versioned artifacts from source. This will include building and configuring a continuous integration and continuous delivery pipeline. Automation of your infrastructure reduces the cost of creating and maintaining environments, reduces the risk of single points of critical knowledge, and simplifies the testing and upgrading of environments.
Scaling Production Readiness
The third part of the book covers the different practices and processes that enable scaling system administration. As a company grows, monitoring and observability, capacity planning, log management and analysis, security and compliance, and on-call and incident management become critical areas for monitoring and managing risk to the organization.
The landscape of user expectations and reporting has changed with services such as Facebook, Twitter, and Yelp providing areas for individuals to report their dissatisfaction. To maintain the trust of your users (and potential users), in addition to improving how you manage and analyze your logs, you need to update security and compliance tools and processes. You also need to establish a robust incident response for issues when you discover them (or worse, when your users find them).
Detailed systems monitoring adds application insights, deeper observability, and tracing. In the past, system administration focused more on system metrics, but as you scale to larger and more complex environments, system metrics are less helpful and in some cases not available. Individual systems are less critical as you focus on the quality of the application and the impact on your users.
Capacity planning goes beyond spreadsheets that examine hardware projections and network bandwidth utilization. With cloud computing, you don’t have the long lead times between analysis of need and delivery of infrastructure. You may not spend time performing traditional tasks such as ordering hardware, and “racking and stacking” of hardware in a data center. Instance availability is near instantaneous, and you don’t need to pay for idle systems anymore.
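To make the point about idle systems concrete, here is a minimal sketch of the arithmetic. Every number in it is hypothetical: the hourly demand profile and the price of 10 cents per instance-hour are assumptions chosen for illustration, not figures from any cloud provider. It compares paying for peak capacity around the clock with paying only for the instance-hours actually used.

```python
def fixed_cost_cents(hourly_demand, rate_cents):
    """Cost of keeping enough instances for peak demand running every hour."""
    return max(hourly_demand) * len(hourly_demand) * rate_cents


def on_demand_cost_cents(hourly_demand, rate_cents):
    """Cost of paying only for the instance-hours actually used."""
    return sum(hourly_demand) * rate_cents


# Hypothetical day: 4 instances for 20 quiet hours, 10 instances for 4 peak hours.
hourly_demand = [4] * 20 + [10] * 4
RATE_CENTS = 10  # assumed price per instance-hour, in cents

fixed = fixed_cost_cents(hourly_demand, RATE_CENTS)
elastic = on_demand_cost_cents(hourly_demand, RATE_CENTS)
print(f"peak-provisioned: {fixed} cents, on-demand: {elastic} cents")
```

The gap between the two totals is the cost of idle capacity in this made-up profile. A real comparison would also need to account for pricing models and startup time, which this sketch ignores.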
Whether you run containerized microservices, serverless functions, or monolithic applications, log management and analysis needs have become more complex. Understanding the matrix of possible events, and how they provide additional context for your testing, debugging, and utilization of services, is critical to the functioning of the business.
The system administrator role is a critical role that encompasses a wide range of ever-evolving skills. Throughout this book, I share the fundamental skills to support architecting robust, highly scalable services. I’ll focus on the tools and technologies to integrate into your work so that you can be a more effective systems administrator.
A Role by Any Other Name
Over the last ten years, I have experienced a dissonance around the role of “sysadmin”. There is so much confusion about what a sysadmin is. Is a sysadmin an operator? Is a sysadmin the person with root? There has been an explosion in terms and titles as people try to divorce themselves from the past. When someone said to me “I’m not a sysadmin, I’m an infrastructure engineer”, I realized that I’m not the only one feeling this.
To keep current with the tides of change within the industry, organizations have taken to retitling their system administration postings to devops engineer or site reliability engineer (SRE). Sometimes this is a change in name only with the original sysadmin roles and responsibilities remaining the same. Other times these new titles encompass an entirely new role with similar responsibilities. Often it’s an amalgamation of old and new positions within operations, testing, and development. Let’s talk a little about the differences in these role titles and set some common context around them.
While devops and SRE have been around for approximately ten years, the role of system administrator (sysadmin) has been around for much longer. Whether you manage one, hundreds, or thousands of systems, if you have elevated privileges on the system, you are a sysadmin. Many definitions describe system administration in terms of the tasks involved or the work the individual does, often because the role is not well defined and takes on the outsized responsibility of everything that no one else wants to do.
Many describe system administration as the digital janitor role. While the janitor role in an organization is absolutely a critical role, it’s a disservice to both roles to equate the two. It minimizes the roles and responsibilities of each.
A sysadmin is someone who is responsible for building, configuring, and maintaining reliable systems, where systems can be specific tools, applications, or services. While everyone within the organization should care about uptime, performance, and security, the sysadmin focuses on these measurements within the constraints of the organization’s or team’s budget and the specific needs of the tool, application, or service consumer.
Finding Your Next Opportunity
One of the reasons you might have picked up this book is that you’ve been in your position for a while, and you’re looking for your next opportunity. How do you identify positions that would be good for your skills, experiences, and desired growth? Across organizations, different roles mean different things, so it’s not as straightforward as just substituting a new title and doing a search. Often it seems the person writing a job posting isn’t doing the job being described, as the postings will occasionally include a mishmash of technology and tools.
A danger to avoid is thinking that there is some inherent hierarchy implied by the different roles, even though some folks in the industry, or even within an organization, assume this. There is a wide range of potential titles. Don’t limit yourself by the role title itself, and don’t limit your search to just “sysadmin” or even “sre” and “devops”. From “IT Operations” to “Cloud Engineer”, the potential roles are diverse.
Before you even examine jobs, think about the skills you have. As a primer, which technical stacks are you familiar with? How familiar are you with the various technologies described in this book? Think about where you want to grow. Write all of this down.
As you review job reqs, write down any skills you don’t have but would like to have. Compare your skill evaluation with the job requirements and work towards improving those areas. Even if you don’t have experience in these areas, being able to talk clearly during interviews about where you are compared to where you want to be for those skills goes a long way toward showing your pursuit of continuous learning (which is a desirable skill).
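If you keep your skill list and a posting’s requirements as plain lists, the comparison described above can be sketched in a few lines. This is only an illustration: the function name `skill_gaps` and the example skills are hypothetical, and matching on lowercased names is a simplifying assumption.

```python
def skill_gaps(my_skills, job_requirements):
    """Return the required skills missing from my_skills, sorted alphabetically.

    Names are compared case-insensitively after trimming whitespace.
    """
    have = {skill.strip().lower() for skill in my_skills}
    wanted = {req.strip().lower() for req in job_requirements}
    return sorted(wanted - have)


# Hypothetical lists, for illustration only.
my_skills = ["Linux", "Git", "Bash", "Monitoring"]
posting = ["Linux", "Git", "Terraform", "Kubernetes"]

print(skill_gaps(my_skills, posting))  # ['kubernetes', 'terraform']
```

The resulting list is a starting point for the areas to work on, not a reason to rule yourself out of a role.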