Towards a single PKI for Configuration Management - DRAFT 3
Background
For a general introduction to SSL and Certificate Authority (CA), see SSL vs. SSH, the Sysadmin's Basic Guide to SSL Certificates and Authorities and Client Authentication with SSL.
For a general introduction to Public Key Crypto and PKIs, see Wikipedia's SSL article and Redhat's Introduction to Public-Key Cryptography. Kerberized Credential Translation: A Solution to Web Access Control is also good to read.
For a general introduction to configuration management systems, see UKUUG 2007 System Configuration presentation, SAGE Short Topics Booklet: System Configuration and Comparison of open source configuration management software. You may also want to subscribe to the config-mgmt mailing list.
Machine Identity
How do you know a machine is the machine you wanted to connect to? How can machines assert and obtain the identity of other machines they communicate with, when no user is around?
The tragedy of IP address based "security"
If you just trust the network, you could easily become the victim of IP Address Spoofing, a Man in the Middle Attack, DNS Cache Poisoning, a rogue DHCP server, network snooping, someone just taking down the machine you are trying to connect to and setting a new one up in its place, or other attacks. There are toolkits available for most of these attacks that make implementing them trivial. Often these attacks can be carried out from places you wouldn't think possible, using other widely-available attacks against low-level networking protocols and networking equipment.
One usual refrain is "we trust our employees"; but if that is true, why doesn't everyone with access to your physical network have the root/Administrator passwords to your production servers? The difference between everyone having the passwords and trusting the network is more slim than most choose to know, acknowledge, or think about a lot. And in most cases, your security is only as good as the weakest link in the chain, because a malicious individual or program can use the weakest link as a vector to obtain access to or harvest passwords that will probably work on other computers and services.
You might think that this would not be a problem any more since the security vulnerabilities of applications that trust the network such as NFSv3, rsh and telnet have been known for decades, but people and large corporations continue to use these old protocols, and to produce client/server and peer to peer software with root/administrator level access to the local machine that just trusts the network.
In many cases this seems to be because projects are started in "research mode", with robust security not being an initial design consideration (which conventional wisdom suggests is a good way to create a piece of software that will be very difficult to secure in the future). When the project later gets more widespread use, security may be added on somehow, but for backwards compatibility reasons the software is often still not Secure by Default when shipped/configured in the default manner. Since many Sys Admins are lazy (in the bad way), clueless, disgustingly busy, or work for people who are clueless and/or on deadline, the default is what gets implemented in a large number of cases.
Security is Hard
Another reason is that difficult-to-crack security is hard. Really hard. Even protocols that are considered very secure, like SSH, have deployment problems that are usually ignored, such as key distribution; so they tend to rely on the level of security paranoia of the user (i.e. does the user just type "yes" when presented with a prompt to accept a new or changed key?).
With SSH, one example is a recent attack that precomputes billions of SSH keys, computes their key fingerprints, and then uses an algorithm to choose a key whose fingerprint looks a lot like the key fingerprint of a specific SSH server (roughly the same letter/word shapes, the same leading and trailing characters, etc.). In practice this works quite well, giving man in the middle attacks a high success rate; so unless you configure SSH to never accept keys on trust and instead use some Configuration Management application to distribute them, you could be in trouble. (This attack, and an inventive solution that is unfortunately not yet implemented in OpenSSH, is described in this presentation, starting at "Wetware Bug"; there is also video of the talk, which is really fun.)
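To make the "looks a lot like" problem concrete, here is a minimal Python sketch. It is not the attack tool from the presentation; both fingerprints are made up, and glance_match is only a hypothetical model of a hurried person who checks the first and last few characters. It contrasts that glance with a full comparison.

```python
import hmac

def glance_match(shown: str, expected: str, edge: int = 4) -> bool:
    """Hypothetical model of a hurried human: compare only the first
    and last `edge` characters of the two fingerprints."""
    return shown[:edge] == expected[:edge] and shown[-edge:] == expected[-edge:]

def exact_match(shown: str, expected: str) -> bool:
    """Compare every character of the two fingerprints."""
    return hmac.compare_digest(shown.encode("ascii"), expected.encode("ascii"))

# Made-up fingerprints for illustration; the second one only shares
# its leading and trailing characters with the first.
expected = "43:51:43:a1:b5:fc:8b:b7:0a:3a:a9:b1:0f:66:73:a8"
lookalike = "43:51:9f:02:77:e1:30:4c:d8:15:6b:c2:9a:41:73:a8"

print(glance_match(lookalike, expected))  # True
print(exact_match(lookalike, expected))   # False
```

Software that distributes known keys ahead of time only ever needs the exact comparison, which is why taking the human out of the loop helps here.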
A secondary example of the difficulty of security is one of the Bcfg2 author's favorite examples: using Bcfg2 to distribute OpenSSH keypairs, including private keys. Unless you wait for 0.9.3 and then do a setup that includes separate passwords for each Bcfg2 client, you will make your OpenSSH security posture (arguably) worse if you do this, because any Bcfg2 clients that share the same password (or anyone who knows that password) can, using some of the cracking techniques mentioned above to defeat IP-based "security", obtain the SSH private key of any other machine that uses that password, and then use that private key to facilitate a man in the middle or machine spoofing attack. So you would be solving one problem, but introducing another; and the fact that you are introducing another was nonobvious to very intelligent people.
Configuration Management Software
My personal interest is in the Configuration Management space. I've used cfengine and looked at puppet, both of which seem to have had good security designed in from the beginning (or at least have authors that cared a lot about security early on; I'm not a canonical security expert, esp. at the source code level). I also looked at IBM Tivoli Configuration Manager several years ago; as I recall it is really insecure by default (it even has rsh built into the product for some things, in a manner that made it impossible to replace with ssh), but had an add-on that let it use normal SSL certificates. However the complexity (and fragility) of the product made me decide that it would only really be usable if there was a large team of people dedicated full time to it (the IBM documentation actually explicitly says a TCM deployment will fail without a team dedicated to it), preferably with a large-dollar contract with IBM Global Services.
I am currently most interested in Bcfg2, which I helped to prod towards a better security stance (the next release, 0.9.3, which should come out in around a week, has client authentication of the server cert, and per-client passwords).
For a different problem domain, black box testing of configuration management software, I am very interested in STAF, a quality engineering automation programming framework (which at my day job we are also pushing into service as a cross-platform Configuration Management solution, since none of the other FLOSS options mentioned in this post have good Windows support) which has pretty atrocious security (think rsh .rhosts/hosts.equiv), although it has gotten slightly better (think telnet) over time, and shows signs of improving more in the future.
A Plethora of Identity Documents
A slightly more subtle problem is that many applications have their own certificate authorities and methods/formats for the public/private keypairs/certificates, which makes using multiple applications annoying and difficult to manage, esp. in terms of revocations. For example, you could in theory deploy these 6 applications/services:
- Kerberos V (not public key encryption; has a keytab file to prove machine identity). Kerberos V is (pretty much) a requirement for OpenAFS and NFSv4.
- OpenSSH (OpenSSH-specific keypair)
- Cfengine (Cfengine-specific keypair)
- Puppet (Standard SSL Private Key/Certificate)
- Stunnel (Standard SSL Private Key/Certificate)
- Bcfg2 (Standard SSL Private Key/Certificate for the server, per-client password stored in bcfg2.conf and known by the server).
You would then have 5 separate files on each machine which prove the machine's identity, only 2 of which will be usable by applications other than those that created them (Kerberos V and Standard SSL Private Key/Certificate).
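The count above can be checked with a small Python sketch. The grouping is my reading of the list, not something the applications enforce: it assumes Puppet and Stunnel share one standard SSL key/certificate file, and that the Bcfg2 client's identity is its per-client password.

```python
from collections import namedtuple

# kind = the credential file/format; reusable = usable by other applications
Credential = namedtuple("Credential", "application kind reusable")

CREDENTIALS = [
    Credential("Kerberos V", "keytab", True),
    Credential("OpenSSH", "OpenSSH keypair", False),
    Credential("Cfengine", "Cfengine keypair", False),
    Credential("Puppet", "SSL key/certificate", True),
    Credential("Stunnel", "SSL key/certificate", True),
    Credential("Bcfg2", "per-client password", False),
]

# Collapse applications that can share one credential file.
kinds = {c.kind: c.reusable for c in CREDENTIALS}

print(len(kinds))                                       # 5
print(sorted(k for k, ok in kinds.items() if ok))       # ['SSL key/certificate', 'keytab']
```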
But I want to use the pretty software!
My problem is that often the projects with poorer security are also the most interesting and/or have the most management buy-in and/or are the only projects that support the required platforms; I want to use them, but I don't want to sacrifice the security of production machines. I am not a great coder, but I am pretty good at finding infrastructure solutions to complex problems using free (libre) software.
My evaluation of the situation is that in the GNU/Linux and Unix worlds, OpenSSH is pretty much a requirement, and is often installed by default, with a new keypair (for the machine's sshd) created on bootup (or by your Configuration Management system). Thus you absolutely need an OpenSSH keypair; so it would be great if this could be the base for as many other applications as possible. It turns out that it is possible to convert an SSL Certificate to an SSH keypair, so presumably that process could also be reversed. If possible, that covers OpenSSH, Puppet, Stunnel, and the Bcfg2 server. In theory Cfengine could use the RSA keypair generated by OpenSSH, but people with Cfengine knowledge thought that that would be silly. While Puppet has a CA, you aren't forced to use it, and it is completely standards-based.
For Kerberos V (and NFSv4 and OpenAFS), you need to deploy a Kerberos V server; in a shop that includes Windows, you probably already have an Active Directory server, so that is usually the most practical option until Samba 4 comes out. This is of course deeply unfortunate from an ethical point of view, as aside from any other arguments against Microsoft and its business model in general, it is specifically the result of Microsoft ignoring/postponing/stonewalling responses to anti-trust rulings in the US and EU that force it to release information to allow interoperability with its products (Samba 4's authors spend a lot of time reverse engineering), and Microsoft embracing and extending the Kerberos 5 protocol.
It is possible to run OpenSSH as a Kerberos 5 service (to do this fully, you need to apply a patch; I have some encap profiles that do that here); so in that case it would be in the Kerberos 5 camp instead of the OpenSSL camp. IMHO if you are in a Kerberos 5 environment, this makes much more sense than relying on a Configuration Management app to handle your OpenSSH known hosts file etc.
There is a system to create short-lived SSL x.509 certificates (called kx.509) from Kerberos tickets (another site here). There are two components to this; the Kerberos Credential Authority (KCA) and the Kerberos Credential Translator (a.k.a. Kerberos Certificate Translator / KCT). In theory you could probably hack that to create long-lived certificates, and thus have a "single source" for all identity information for a machine. There is a mailing list thread on lopsa-tech with a little bit more information. There is a good overview of kx.509 starting on page 27 of REST Project - Final Report.
It is also possible to get Kerberos credentials from SSL/TLS/x.509 credentials using PKINIT. Here is the Heimdal Kerberos doc and the RFC which defines PKINIT.
Now I have to manage machine accounts too?
Kerberos 5 has a mature suite of administration utilities, and its centralized nature (the server has full state), longevity, and the fact that it isn't a public-key based system make it relatively easy to manage. If you run a Windows Active Directory domain and your machines are all part of the domain, you are done on the Windows side. Public Key management and distribution (aka PKI, or Public Key Infrastructure) on the other hand is widely considered a problem for which optimal/mature solutions have not yet been found. Because of this, some of the individual products that use SSL certificates have their own Certificate Authorities (CA).
In the Unix tradition of small tools that do one thing well (well okay, maybe not the small part) I think it would make more sense to have the CA independent of the software that makes use of SSL certificates, so your security solution isn't tied to a specific application. The most mature FLOSS solutions in this space look to be the OpenCA PKI Project and EJBCA - The J2EE Certificate Authority. Of the two, OpenCA's community of users seems to be much larger (around 5x as many mailing list subscribers, messages, and Google hits). OpenCA also has the advantage of not relying upon layers of Java middleware cruft or falling into the java trap (it will be a long time before (a) a free/libre JVM is released and (b) EJBCA is updated to be compatible with that JVM). Debian and Ubuntu users may be interested in how to create OpenCA Debian Packages. The OpenXPKI Project (a fork of OpenCA) and (if you can do without a web interface) TinyCA and CSP look like the other main competitors in this area.
Chatting with Luke Kanies on #puppet (freenode), it also turns out that puppet integrates easily with third party CAs, and Puppet's Certificate authority, puppetca, is very usable as a CA for other purposes.
- djbclark: In theory, it's possible to use a CA other than the one built into puppet with puppet, right?
- lkanies: Yep. Just drop the certs into the right place, and puppet will load them automatically. Puppet's CA is only used when generating and signing certs. You could even drop your own CA Cert onto the puppetmaster, and it will be used to sign certs.
- djbclark: Not that you'd advocate this, but would it make any sense to use puppetca without using puppet at all (e.g. as an alternative to TinyCA/OpenCA/EJBCA etc)?
- lkanies: Yeah, and I would advocate that. I use it for my web and mail servers. I'm actually about to generate a cert for use on my blog, in fact, using puppetca.
Puppet uses a Certificate Revocation List (CRL), but I'm guessing it would be pretty easy to also get the certificate revocation information in a format usable by one of the OCSP daemons, so that it would be easily usable by other applications, such as stunnel (Stunnel-4.20 Man Page, search for OCSP = url).
- djbclark: Is Puppet's CRL in a format only immediately usable by puppet, or is it something that other apps / a OCSP daemon could use?
- lkanies: It should be standard. Puppet uses openssl's support for a CRL, it doesn't do anything special.
It looks like it would be possible to use a Simple Certificate Enrollment Protocol (SCEP) client such as AutoSscep or SSCEP to replicate puppetca's "waiting for certificate" type behavior.
Another intriguing possibility is The MyProxy Certificate Authority; this may help to interface the Kerberos and SSL/x.509 worlds. There is a Python/OpenSSL MyProxy client. There is a mailing list thread on the topic of using the MyProxy CA for long term certificates / machine identity.
In general the Globus Grid project has spawned a lot of interesting work and papers on Kerberos and SSL/TLS/x.509/PKI based authentication, such as Simplifying Public Key Credential Management Through Online Certificate Authorities and PAM.
You of course want to be very careful with the SSL Private keys; they should be readable only by the local SYSTEM or root account, and you don't want to ever use these client SSL certificates with web browsers.
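On Unix-like systems the "readable only by root" rule is easy to audit. A minimal Python sketch follows, assuming POSIX permission bits (it says nothing about Windows ACLs or the SYSTEM account); the function name is hypothetical.

```python
import os
import stat

def private_key_problems(path, expected_uid=0):
    """Return a list of reasons why the key file at `path` is too exposed.

    An empty list means: regular file, no group/other permission bits,
    and (unless expected_uid is None) owned by expected_uid.
    """
    st = os.stat(path)
    problems = []
    if not stat.S_ISREG(st.st_mode):
        problems.append("not a regular file")
    if st.st_mode & (stat.S_IRWXG | stat.S_IRWXO):
        problems.append("accessible by group or others")
    if expected_uid is not None and st.st_uid != expected_uid:
        problems.append(f"owned by uid {st.st_uid}, expected {expected_uid}")
    return problems
```

A configuration management run could call this on each machine's key file and refuse to start the dependent service if the list is not empty.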
A note on keypair vs. certificate trust
- lkanies: SSH and Cfengine keys only have mutual-key trust, which makes it far harder to build a secure system. There have been some interesting but impractical talks I've seen recently that could help to overcome the poor trust model of key pairs; the main point, though, is that it's tough to manage and it's always going to be a bit insecure, because a human will generally have to approve each host.
- djbclark: I didn't quite get your mutual-key trust comment; what would the situation be where a human wouldn't have to approve each request (or use some heuristic like Cfengine's I-trust-X-network-exactly-once)?
- lkanies: Well, with certs you don't have to because every host trusts the CA, and every host's cert is from the CA, so each host trusts the same cert, and all trust derives from that cert. That's the primary reason why Puppet uses certs and not SSH keys - so two puppet clients could talk to each other and not have to worry about trust.
- djbclark: Yes, but for the client to get the cert that the server creates in the first place, doesn't a human need to do something? Otherwise, how does the server know the client is the client?
- lkanies: Yes, but what about if clientA wants to talk to clientB? That is, what about communications other than just the primary server and each client?
- djbclark: Oh okay, so is it just that you get N-way trust "for free" after the initial human interaction, or am I totally missing something?
- lkanies: And what about overlapping trust domains, like a hierarchical server model, where you've got a primary server in your central DC, and a separate CA in different DC's, with all DC's trusting the primary DC, and only the local DC's trusting the local CA.
- lkanies: Yep, you got it. But don't forget, also, that 1) you can trust any number of CAs, each of which bring their own pools of certs, and 2) CAs can create new CAs, and the certs signed by the sub-ca are automatically trusted by clients of the main ca. There's already somebody in the puppet community using multiple CAs.
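The difference between the two trust models can be sketched in a few lines of Python. This is a toy model under stated assumptions: there is no real signature verification, and the host and CA names and the ISSUER table are invented. It only shows how many approvals each model needs and how trust follows the chain of issuers up to a trusted CA.

```python
def pairwise_trust_decisions(hosts: int) -> int:
    """Mutual-key trust: every pair of hosts needs its own approval."""
    return hosts * (hosts - 1) // 2

def ca_trust_decisions(hosts: int) -> int:
    """CA trust: one signing approval per host, then N-way trust for free."""
    return hosts

# Invented example: who signed whose certificate.
ISSUER = {
    "root-ca": "root-ca",    # self-signed
    "dc2-ca": "root-ca",     # sub-CA created by the main CA
    "clientA": "root-ca",
    "clientB": "dc2-ca",
    "stranger": "unknown-ca",
}

def is_trusted(name, trusted_roots, issuer=ISSUER, max_depth=8):
    """Walk up the issuer chain until a trusted CA is reached."""
    for _ in range(max_depth):
        if name in trusted_roots:
            return True
        parent = issuer.get(name)
        if parent is None or parent == name:
            return False
        name = parent
    return False

print(pairwise_trust_decisions(100))        # 4950
print(ca_trust_decisions(100))              # 100
print(is_trusted("clientB", {"root-ca"}))   # True
print(is_trusted("stranger", {"root-ca"}))  # False
```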
A possible workaround for insecure software
So, back to the problem: we have some method set up to create and manage SSL private keys, certificate signing requests, and certificates for each machine, and now we want to be able to use software that uses IP address based, plaintext password based, or other less-than-ideal security via secure, encrypted, authenticated channels.
At first I thought that it looked like stunnel was the best contender for this job. I didn't want to use meta-solutions like OpenVPN or IPsec/DNSsec, as they would be intrusive. I wasn't going to use OpenSSH, because I must support Windows systems, and there is no really clean OpenSSH implementation for Windows; also OpenSSH is more geared towards server-to-user communications, whereas stunnel is geared towards server-to-server. However the problem with any tunnelling solution at the level in the network stack of stunnel or ssh port forwarding is that, to the server, it looks like all requests are coming from localhost (127.0.0.1). Stunnel has a transparent proxy (-T) option that works around this, but it only works on GNU/Linux. So it looks like the only solution that will work would be something that is completely transparent to the application that depends on IP Address security.
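The localhost problem is easy to reproduce without stunnel. The following Python sketch is a stand-in, not stunnel itself: it puts a trivial one-shot TCP forwarder in front of a backend and records the peer address the backend sees. All function names are invented for the example. Because everything here runs on loopback, the IP is 127.0.0.1 on both hops; what the demo shows is that the backend sees the forwarder's outgoing socket rather than the client's, which is exactly why address-based checks stop working behind a tunnel.

```python
import socket
import threading

def _listener():
    """Create a listening TCP socket on an ephemeral loopback port."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)
    return srv

def start_backend(seen):
    """Backend that records the peer address it sees, then replies."""
    srv = _listener()
    def run():
        conn, peer = srv.accept()
        seen.append(peer)          # recorded before the reply is sent
        with conn:
            conn.sendall(b"ok")
        srv.close()
    threading.Thread(target=run, daemon=True).start()
    return srv.getsockname()

def start_forwarder(backend_addr):
    """One-shot forwarder: relays the backend's reply to the client."""
    srv = _listener()
    def run():
        client, _ = srv.accept()
        with client, socket.create_connection(backend_addr, timeout=5) as upstream:
            client.sendall(upstream.recv(16))
        srv.close()
    threading.Thread(target=run, daemon=True).start()
    return srv.getsockname()

def demo():
    """Return (client's own address, peer address the backend saw, reply)."""
    seen = []
    forwarder_addr = start_forwarder(start_backend(seen))
    with socket.create_connection(forwarder_addr, timeout=5) as client:
        client_addr = client.getsockname()
        reply = client.recv(16)
    return client_addr, seen[0], reply
```

Calling demo() returns the client's own address alongside the address the backend recorded; the second one belongs to the forwarder's connection, so the backend has no way to tell which client is really on the other end.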
This leaves OpenVPN and IPsec. AIX is our main server platform, and OpenVPN doesn't support AIX, so this leaves IPsec. There is a great Debian IPsec Micro‐Howto, and AIX has had a very good IPsec implementation since version 4.3.3 (current version is 5.3), which is documented in the AIX Virtual Private Networks page and the AIX Security Guide, page 160. There is also some doc on AIX and Linux interoperability in the AIX and Linux Interoperability Redbook, page 198; however I think it references an old version of the GNU/Linux implementation (the book is circa 2003).
A good thing about IPsec is that it allows use on a per-port basis (using something it calls filters), so it should be possible to only secure the ports used by the configuration management apps. I am now in the process of testing IPsec, and setting it up on a set of machines.
Once the initial setup is complete, I will be testing the solution with Bcfg2 (ironically using the old-style single-password security; I may even bug the author to create a flag to disable the built-in ssl :-) and STAF. I think it can be assumed that Puppet would just work as well if those 2 applications work, although using stunnel with puppet would probably be a bit of overkill.
Daniel Joseph Barnhart Clark is a supporter of Free Software Activism and the evolution of System Administration as a profession. As a hobby he maintains OpenSysAdmin.com and works on interesting Sys Admin problems; by day he is a Storage Administrator with IBM. He encourages everyone to obtain low-cost (think monthly) warm fuzzies by becoming a member of the Free Software Foundation.
