Hacker News
Bhyve – BSD Hypervisor (bhyve.org)
122 points by lelf on Aug 29, 2014 | 61 comments



In what sometimes feels like a Linux world, it is good to be reminded of some of the excellent, high-quality projects and codebases among the BSDs. We still have a few OpenBSD and FreeBSD servers running, as they are basically rock solid, with a much slower rate of evolution and decent stability, unlike some of the Linux distros we run, where the constant rejiggling of the underpinnings, the init system, etc. has been a bit painful.


One aspect that's always stood out is the system/userland divide. The BSDs have both kernel and userland developed by the same folks. This leads to a consistent, enjoyable user experience. In Linux land the divide between kernel and userland is quite stark. Throwing problems over the fence, and blaming the "other side", is far too common.

The state of documentation and man pages is indicative. Nearly everything in the BSDs has well-written, up-to-date docs. Determining behavior in Linux all too often comes down to reading opaque kernel code and old mailing list threads that may or may not be relevant.


That's the standard line, but I have never felt that as a user. Debian and FreeBSD feel about equally integrated to me as systems. Some stuff in both is developed by different upstreams (large parts of both the Debian and FreeBSD base installs are developed elsewhere), but the "buck stops here" when it comes to integration, integration-testing, engineering a consistent user experience, and generally making the system work for end-users. If anything Debian feels slightly more tightly integrated to me, though it depends on whether you frequently use a lot of non-base packages. FreeBSD's model is tight integration of the base install and light curation of everything else in userland, while Debian's model is to "Debianize" everything in the archive and ensure it plays well together, even non-base packages.


Facebook uses Linux "where it is faster than FreeBSD, which is basically everywhere", per a recent whitepaper or presentation I saw. What I found interesting is that Facebook runs FreeBSD in production at all.

It just goes to show how solid the BSDs really are. I use the hell out of keepalived (an OSS VRRP daemon) + conntrackd on Linux, but strongly prefer CARP and pf. FreeBSD moves a lot slower, but it caters to a different audience.
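For anyone who hasn't seen it, a CARP failover pair with pf state sync fits in a handful of declarative lines on FreeBSD 10. A minimal sketch — interface names, VHID, password, and addresses below are made up for illustration:

```shell
# /etc/rc.conf on the primary node (hypothetical addresses)
ifconfig_em0="inet 192.0.2.11/24"
# Shared virtual IP; the node with the lowest advskew becomes MASTER
ifconfig_em0_alias0="inet vhid 1 pass examplepass alias 192.0.2.1/32"
pfsync_enable="YES"
pfsync_syncdev="em1"

# /etc/pf.conf -- let CARP advertisements and pfsync state-sync
# traffic through, so failover doesn't drop existing connections
pass quick on em0 proto carp
pass quick on em1 proto pfsync
```

The point of the comparison: on Linux the equivalent needs keepalived's VRRP config plus a separately configured conntrackd daemon, while here the kernel, CARP, pfsync, and rc.conf are all maintained together.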


An anecdote.

When the AC failed in the machine room at the company I was working for at the time and melted three racks of kit, all 7 of our FreeBSD machines were back online from scratch on hot spare kit in under two hours.

Our Windows guys were still working on restoring the master Active Directory pair and DFS cluster TWO DAYS LATER even with a DR strategy in place.

Quality engineering, documentation and automation ability shit on everything else IMHO, and that's where FreeBSD lands every time. Performance isn't a major thing for me.

Linux, on the other hand, is heading in the direction of Windows. I'll probably get flamed by the systemd proponents here, but it took me three hours to work out how to debug this command:

   timedatectl set-ntp yes
Which returned:

   Failed to issue message call
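For reference, `timedatectl` is just a D-Bus client talking to the `systemd-timedated` service, so a generic bus error like that has to be chased through the service and the bus itself. A rough debugging sketch on an affected box (the service and interface names are the standard systemd ones; the actual failure mode here was never identified):

```shell
# Is the backing service even running? What do its recent logs say?
systemctl status systemd-timedated
journalctl -u systemd-timedated -n 20

# Watch the bus while re-running the failing command in another
# terminal; org.freedesktop.timedate1 is what timedatectl talks to
dbus-monitor --system "interface='org.freedesktop.timedate1'"
```

None of which is obvious from a five-word error message, which is rather the poster's point.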


> Linux on the other hand is heading in the direction of windows.

The reasons for this, in part, are the different mentalities in the Linux and BSD communities. The Linux community used to be quite adamant about how "Linux is not Windows" and sticking to their paradigms, but more recently they've been absolutely obsessed with pipe dreams of conquering the desktop, and have been completely undermining their technical workflow in the process. Instead of "Linux is not Windows", it seems companies and communities alike have decided to pander to a crowd that has no respect for computing and Unix culture, who are used to closed systems that hold your hand, and quality is suffering in the process.

The BSD people on the other hand, have stayed true to their heritage, and show no signs of compromise. BSD people also tend to be very familiar with Linux as well, whereas the opposite is much less common.


Reminds me of the old saying that "Linux is for people who hate Windows, and *BSD is for those who love Unix."


> Failed to issue message call

Reminds me of some mad attempts to debug some problem in Unity/GNOME... I gave up pretty fast after getting stranded on orphaned freedesktop.org pages. I distinctly remember ranting on reddit about the problems, and I've got a few .odt presentations from GNOME developers that didn't really explain some concepts... it's a different world. I can't judge if it's better this way for the desktop, but I've found it impossible to debug without sacrificing enormous time and energy on things I'd never wanted to deal with.

A stark contrast has been xCAT by IBM. It's used to manage clusters of 10,000 machines, and it's only small, readable Perl and shell scripts. Totally awkward at the beginning, but once you get a grasp it's a flexible power tool.


A friend of mine calls it "desktrap loonix".

The sad part is it's still better than windows.


Despite punching Windows in the face a while back, I don't agree that GNOME is anywhere near Windows. Not even going back to Windows 2000.


>GNOME

Oh dear Eris, don't get me wrong, I don't claim GNOME in particular is software fit for any purpose.

My point was more like that the base system is still really good and systemd even with its failings is better than dealing with Windows. As for the desktop, there's enough choice that you should be able to find something that's maximally convenient while minimally retarded, with a small amount of customization. That something will most surely not be GNOME.

That's kind of what I mean - it's gotten worse in some ways, but you can still pick your poison and it's not as bad as having to deal with whatever retardation Microsoft come up with.


>'Our Windows guys were still working on restoring the master Active Directory pair and DFS cluster TWO DAYS LATER even with a DR strategy in place.'

I'd be overjoyed if I never had to admin Windows again, but it's a bit unfair to frame that as an OS comparison.

The most basic of direct from TechNet 'best practices' would reduce the impact of a failure of that sort to minutes.

That said, that sort of failure is not at all uncommon. Further, I expect such disasters would happen far less often if an existing, poorly implemented Microsoft environment could be remediated into something better as directly as anything Unix-like.


I think it's completely fair. Windows is completely non-deterministic due to the insane amount of coupling inside it and the sheer snowball of crap that has stuck to it over the years.

In this case, they had tested DR strategy etc in place. Unfortunately when it came to doing it for real (onto the same backup hardware the DR plan was tested against), TSHTF and Windows threw an obscure COM error loading the AD catalog. That required contacting MS 1st line who didn't know what it was so had to escalate it to the AD guys. One obscure registry key change and an ACL change (related to ESE) and it was back.

All best practices followed, yet complexity, poor design and secret knowledge crept in and shot the whole process.

Stuff like that scares the shit out of me having been on the end of it way too many times now.

Many a night have I spent up at 2AM trying to work out why the hell something odd has gone on inside Windows and taken something out in production. Not once in 20 years has a proper Unix (Solaris/FreeBSD) box woken me up or shafted me for hours.


The scenario described would indicate that there were deltas between the testing environment (where the DR strategy was tested) and the production environment, probably from OS updates/changes being applied over time, resulting in configuration drift.

I've always found it good practice to build and maintain a staging environment that mimics the production environment in all aspects. When configuration changes (e.g. a security patch) are needed, they are tested and validated on the staging environment before they are deployed to production. This gives you an opportunity to test and validate the results in a non-production (no rules, no SLAs) environment. Part of this involves validating that procedures such as DR will continue to work as expected on the new configuration, before it gets rolled out to production.

From my experience, this methodology minimizes unexpected behavior in the production environment (i.e. downtime). I would recommend it (or anything similar) regardless of the OS/distribution you're using.


That's exactly what was done. The guys we had were shit-hot ex-Microsoft admins.

The DR environment was the staging environment for the patches. Periodically the production kit would be block copied back to the DR environment and sysprepped.

Every step to prevent differences between the two clustered environments was taken.


You're just not trying hard enough. There was a Xen hypervisor bug for years that could, when the stars aligned and the moon was full, jump the clock ahead 50 minutes. Nobody knew what was causing it until, during a debugging session, the right guy was looking at the right bit of code and noticed an obscure problem in some inline assembly.

Every system has bugs.


Every system has bugs, but as the other reply said, quality engineering and most importantly transparency determine the impact.

Never been a fan of Xen. It doesn't strike me as quality software. Then again no virtualization solution has to me, yet. I'd rather just deploy all the services to the base machine and use MAC to isolate them.


This isn't about bugs, it's about system transparency.


"Our Windows guys were still working on restoring the master Active Directory pair "

I was a Solaris admin at a previous employer. We ran annual DR exercises, and saw similar issues with the Windows kit. Even with months to prepare for the exercise, it took the entire exercise, sometimes longer, to bring 'Windows stuff' back to life.

What we (the unix team) did was to write into _our_ DR plan a few paragraphs on how to be up and running without DNS, or Active Directory.


I see DHCP, DNS and AD (in that order), at least in a windows environment as step one of anything.

Solaris was nice and easy to get back apart from one machine I dealt with: the starfire E10k. The SSP failed on one that we had and the entire system uses crypto key exchange so only the original SSP can bring the system up. Fortunately we had an E15k being installed at the time so it was an emergency migration job.

I miss big iron. I had an old maxed out 1000E and a disk tray full of 4.3Gb SCSI disks at home for a bit (until I got the electricity bill!)


"I see DHCP, DNS and AD (in that order), at least in a windows environment as step one of anything."

It gets _messy_ but I think it's worthwhile to write out in biz continuity plan how to function without DHCP, DNS, and AD.

Take my case: we sometimes had the tier 1 and 2 services (the unix-based ones at least) up _days_ before DNS and AD were available. In the real world the business can't wait for DNS and AD to have access to their tier one and two apps: customers are waiting for product.

Note that a lot of work went into getting the Solaris stuff to this point: in particular, using ZFS and zones made the process of restoration a breeze. By the time of our fourth annual DR exercise, getting stuff up and running was essentially waiting for it to restore from tape, adjusting for things like 'DNS y/n', and done.
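The ZFS half of that workflow is what makes bare-metal restores quick: replicate snapshots ahead of time, then the "restore" is just activating what's already on the DR box. A rough sketch of the pattern (pool name, snapshot names, and DR hostname are hypothetical):

```shell
# On production: take a recursive snapshot of the whole pool and
# stream it (incrementally, after the first full send) to the DR host
zfs snapshot -r tank@dr-2014-08-29
zfs send -R -i tank@dr-2014-08-22 tank@dr-2014-08-29 | \
    ssh drhost zfs receive -F tank

# On the DR box, "restoring" is rolling back to the last good snapshot
zfs rollback -r tank@dr-2014-08-29
```

Zones layered on top of ZFS datasets get the same treatment for free, since each zone's root is just another dataset in the stream.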


  Failed to issue message call
What was that, a dbus configuration problem? Are you quoting it correctly? Because if you really ran into a failure message that's not googleable, that's scary.



No idea. I never solved the problem. It just worked again randomly after a period of time (a couple of hours).

That was enough to scare me away from CentOS 7 and systemd to be honest.


On the whole I don't think systemd is that much more complicated to understand than a collection of bash scripts. It's a domain-specific declarative language for dependency-based execution. At least, once you understand it, it has fewer gotchas than bash scripting.

It's the rest of the utilities that they're trying to couple to it that are bullshit. Starting with the journal: asking for the last few lines of the journal can take half a minute for some reason. Fortunately you can ignore those parts and just use the smart init system, i.e. you can ignore the cron replacement and run plain cron, you can ignore the ntpd replacement and run plain ntpd, etc.
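To illustrate the "declarative language for dependency-based execution" point: a minimal unit file (the service name and binary path below are made up) replaces a page of init-script boilerplate with a few key-value declarations:

```ini
# /etc/systemd/system/example-daemon.service (hypothetical service)
[Unit]
Description=Example daemon
# Declarative ordering: start only after the network is up
After=network.target

[Service]
ExecStart=/usr/local/sbin/example-daemon --foreground
# Supervision for free, no PID-file juggling in shell
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

Dependencies, ordering, and restart policy are data rather than code, which is where the "fewer gotchas than bash" claim comes from.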


reddit originally ran on FreeBSD. They switched to Linux before I got there for ease of administration. I wanted to switch it back to BSD, but never got around to it as it wasn't very high priority. I tried to switch again when we moved to Amazon, but sadly it wasn't an option so I went to Ubuntu instead.

I've always thought that BSD is superior from an admin perspective, as in it's easier to do it right and keep it solid.

I may be biased though as my first job was IT for a company that had BSD committers so everything we did was on BSD and I learned how to admin from arguably some of the best BSD admins in the world.


I would be interested in interviewing (for the world's smallest blog audience) some of "the world's best BSD admins" - is there any chance of an introduction?


They have a job ad out [1] saying

"Facebook is seeking a Linux Kernel Software Engineer to join our Kernel team, with a primary focus on the networking subsystem. Our goal over the next few years is for the Linux kernel network stack to rival or exceed that of FreeBSD."

[1] https://www.facebook.com/careers/department?req=a0IA000000Cz...


That's very interesting, but isn't very convincing. Is there some kind of study on the supposed differences? I like BSD, but I've always felt from admin experience that the Linux networking stack is far more capable in terms of features.


Well, obviously it is a brief statement. Linux generally does have an enormous number of features, but they are no use if they do not apply to you. My guess about Facebook is that the network stack issues would be around IPv6, performance, or large scale (e.g. lots of routes).


BSD is the reference implementation of the networking stack.


From the API design / pedigree side, sure. But that's a given, and a tangent to the curiosity I am trying to have sated :)


BSDs do change stuff too. OpenBSD recently replaced sendmail with OpenSMTPD, and apache with nginx. FreeBSD 10 has some big changes[1], depending on what you're doing with it.

Still, in general I agree. I run OpenBSD as my main desktop system because it works and mostly stays out of my way.

[1] http://www.freebsd.org/releases/10.0R/announce.html


Just a heads-up, OpenBSD is actually removing nginx from the base install in favor of httpd.

http://undeadly.org/cgi?action=article&sid=20140827065755

Edit: Adding in the link to the actual commit: http://marc.info/?l=openbsd-cvs&m=140908174910713&w=2


Wow that was pretty fast, by OpenBSD standards. nginx was just added to base a release or two ago. Thanks.


It's understandable when you know that httpd was spun out of relayd: they basically had almost all the work done and figured they might as well do the last few bits needed to make it a standalone HTTP daemon.


Actually, OpenBSD just replaced nginx with their own httpd.


What httpd would that be?

I thought they replaced their custom Apache with (also custom) Nginx.


http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/usr.sbin/httpd/...

For some rationale, see this comment (by Reyk): http://undeadly.org/cgi?action=article&sid=20140827065755&pi...

Nginx is considerably larger than the old Apache was. Commits that took it to a new version were pretty big. I didn't see a whole lot of OpenBSD developers hardening the code. Simple and small seems to be the OpenBSD way. I like it.


They wrote their own. A completely new implementation.


Be interested to see that paper / presentation if you can track it down, especially given https://www.facebook.com/careers/department?req=a0IA000000Cz...


WhatsApp was a FreeBSD shop, and they weren't purchased that long ago.


WhatsApp is still a FreeBSD shop AFAIK.


Define faster though.


There are 2 common things where "faster" makes sense.

1. Throughput

2. Latency

Both are relevant for webapps like facebook.


I've used FreeBSD since the 6.0 days, and agree that it's absolutely rock solid. The main problem is that the support cycles are way too short, for instance 10.0 is EOL in 4 months (1 year after it was released), whereas RHEL is EOL in 10 years.


As others have commented, the effective EOL is not really that short. FBSD 10.0 was released Jan 2014, so 10.0 will be supported for a year or so. Updating to 10.1 (when it is finalized in Q4) will very likely be painless, no incompatibilities, so as a practical matter, EOL is extended at least another year or so.

This has been the history of FBSD since my exposure to 4.x back in 2000. A more recent example, FBSD 8.0 was announced Nov 2009. And now, nearly 5 years later, FBSD 8.4 is still a supported version.

Sure, technically each minor version of 8.x reached EOL after a year. But since keeping an 8.x installation up-to-date has been so seamless, it achieves an EOL >=5 years from introduction.

Anyway, how I see it based on real-world experience.


This is being worked on, in that 10.x is going to be supported for at least 2 years, and there have been talks about 5 years of support per major release in the future. The minor releases will get dropped sooner, but upgrading is mostly painless anyway (9.1 to 9.2 was very simple), so it shouldn't make much of a difference.


I think some of the reason for that is that enterprise usage of FreeBSD is frequently assumed to be via an intermediary, not directly as an actual FreeBSD install. RHEL directly supports enterprise installs, but FreeBSD is doing something more like producing a reference implementation. The enterprise installs that need N years of support lifecycle are supposed to run on a derived system (one of NetApp's, say), with its own EOL policies.


There is now going to be commercial support for 15 years. There was a discussion with the company doing it on bsdnow.tv recently.


Worse, many now seem to equate GNU/Linux with UNIX, forgetting that not all UNIXes are alike, POSIX is not as portable as it seems, X is not everywhere, and there are many other differences.


I think market share has mostly led to that. As long as the commercial unixes had significant market share (esp. among paying customers), people developing Unix software had an incentive to write it portably, so it could run on at least Solaris and Linux. As the bottom dropped out of the Solaris market, a lot of packages decided not to bother, as >90% of Unix customers were on Linux. Some don't even bother supporting more than one specific Linux distribution (among commercial software packages, that's often RHEL, sometimes RHEL+Ubuntu).

There was a period when non-x86 compatibility went through a similar low period (due to the decline of Alpha, SPARC, etc.), but increasing ARM market share is making architecture portability more practically relevant again.


Another thing that bothers me a lot is that people equate GNU or Linux or GNU/Linux to free software.


Not to mention, you can run this on Linux too, so you are not limited to using BSD, considering it is a type 2 hypervisor.


You can run Linux as a guest on bhyve, but bhyve has not been ported to run on a Linux host yet.
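For the curious, booting a Linux guest under bhyve on FreeBSD 10 involves loading the guest's bootloader with the sysutils/grub2-bhyve port, then launching the VM proper. A rough sketch — the disk image, partition, tap interface, and VM name are made up, and exact flags may vary by version:

```shell
# Load the guest's GRUB config from the disk image
printf '(hd0) ./linux.img\n' > device.map
grub-bhyve -m device.map -r hd0,msdos1 -M 1024M linuxvm

# Launch the guest: virtio disk and NIC, serial console on stdio
bhyve -A -H -P \
  -s 0:0,hostbridge -s 1:0,lpc \
  -s 2:0,virtio-net,tap0 \
  -s 3:0,virtio-blk,./linux.img \
  -l com1,stdio -c 2 -m 1024M linuxvm
```

The two-step dance exists because bhyve has no BIOS emulation at this point, so something outside the VM has to place the guest kernel in memory first (bhyveload does the same job for FreeBSD guests).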


Oh shoot, read that wrong!


I tried OpenBSD the other day. I was so impressed that I immediately donated $20 (via BTC :)) to the OpenBSD foundation.

Everything was extremely straightforward and consistent, much more so than Linux (which I normally use).

I am now migrating my servers to OpenBSD. Very happy so far.


bhyve has been in FreeBSD 10.0 for a while now. The link to the frontpage of bhyve.org contains no hint of why this was just posted. Is there some news?

Further, it [sounds][1] like things may be starting to migrate from bhyve.org to the freebsd wiki.

[1]: https://twitter.com/michaeldexter/status/504044404424716288


FreeBSD probably pays for the fact that (at least in the EU) it is much simpler to find a Linux kernel developer than a BSD one. And I think that for a company this is far more important than any debate about GNU/Linux vs. FreeBSD.

In terms of FreeBSD shops, I think Netflix is actually one of the biggest out there.


'Tis true. Netflix uses FreeBSD.

EDIT for the downvoter: https://www.netflix.com/openconnect/software


[deleted]


Just an FYI, this got posted to the thread on BHyve, not IBLT.



