
For reference (if only for my very own reference), an abcde.conf for FLAC + space-based filenames:

PADTRACKS=y
ACTIONS=default,replaygain
OUTPUTTYPE=flac
OUTPUTFORMAT='${ARTISTFILE}/${ALBUMFILE}/${TRACKNUM} - ${ARTISTFILE} - ${TRACKFILE}'
VAOUTPUTFORMAT='VA/${ALBUMFILE}/${TRACKNUM} - ${ARTISTFILE} - ${TRACKFILE}'
ONETRACKOUTPUTFORMAT=$OUTPUTFORMAT
VAONETRACKOUTPUTFORMAT=$VAOUTPUTFORMAT
MAXPROCS=8
mungefilename ()
{
    echo "$@" | sed s,:,\ -,g | tr /\* _+ | tr -d \'\"\?\[:cntrl:\]
}
pre_read ()
{
  eject -t
}
EJECTCD=y
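
To make it clearer what the mungefilename function in this config does to a track title, here is a rough Python equivalent. It is only a sketch of the same three steps (sed, tr, tr -d), and the function name munge_filename is my own; abcde itself only ever runs the shell version.

```python
def munge_filename(name: str) -> str:
    """Mimic the mungefilename shell function from the abcde.conf above."""
    # sed s,:,\ -,g  -> every ":" becomes " -"
    name = name.replace(":", " -")
    # tr /\* _+      -> "/" becomes "_", "*" becomes "+"
    name = name.translate(str.maketrans("/*", "_+"))
    # tr -d \'\"\?\[:cntrl:\]  -> drop quotes, question marks, control characters
    return "".join(
        ch for ch in name
        if ch not in "'\"?" and not (ord(ch) < 32 or ord(ch) == 127)
    )


if __name__ == "__main__":
    print(munge_filename("AC/DC: Who Made Who?"))
```

Spaces survive untouched, which is the whole point of the "space-based filenames" setup.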

This post was prompted by yet another lost abcde.conf file.

Lessons learned with Supermicro's remote management/IPMI view


Supermicro's recent IPMI/KVM ("remote server management with graphical console") violates all good design principles and what you would expect from such a solution.

Basically, it works like this: there is a management controller on the mainboard, with its own dedicated network port. It has an HTTP interface for use & configuration. For use it offers basic power control (off, on, reset), a serial-over-LAN transport, and a graphical console which can also provide disk services to the host (CD/ISO, USB key, floppy).

For the basic feature set, this sounds like what you want to use.

Unfortunately Supermicro's implementation adds a great many obstacles which make using it nearly impossible. Here's why:

  • The HTML UI makes extensive use of JavaScript and AJAX, and fails to provide progress and error messages when something goes wrong.
  • The client part of the graphical console appears to be implemented in Java and native code. The native code parts are only available for the platforms Supermicro has chosen to support (i386/amd64 of Windows and "Linux").
  • Different servers appear to require different management controller firmware versions. While the interface looks much the same, it seems to do completely different things under the hood. ("This one works on a Mac, the others don't?")

None of this does any good.

Details:

  • The graphical console requires you to use Sun Java 6u17. Using a newer Java version plainly doesn't work, and you get either no window and no error message or "Authentication failed".
  • The underlying protocol seems to be VNC, but with a different authentication scheme, making standard VNC clients useless. (Also it appears to be an OEM version of ATEN's KVM/VNC stuff.)

A friend pointed me to the so called "IPMIView" tool, which basically is a standalone version of the graphical console and some other bonus features. Compared to the Java applet stuff, it feels rather stable, but has the same platform limitations (i.e. Windows + "Linux" only). It appears to be available only from SM's FTP server: ftp://ftp.supermicro.com/utility/IPMIView/

Also, to compare this situation with HP: HP's "ILO 2" is very slow, went through a few firmware versions to fix rather odd bugs, but: the basic features (== what you depend on during emergencies) work and worked all the time. Their graphical console also is Java, but with no native code, and therefore works fine on a Mac and IIRC it also worked fine on ppc Linux.

Sidebar:

This has cost a client about 12 man hours. They're using Macs in the office, and those are now basically useless during emergency times.

A Dyson has arrived


After >5 years with some old-style Siemens vacuum cleaner, I've now replaced it with a Dyson DC32.

Pros:

  • The normal strainer is very effective, especially on carpets
  • Bar/staff is rather flexible -> more freedom for cleaning under the bed
  • Very long cable -> more freedom

Cons:

  • Very loud
  • Getting the bin clean afterwards is a bit challenging. Might not be a problem compared to normal vacuum cleaners though (which also tend to be full of dust after replacing the dustbag)
  • Needs more stowing space

Have a MySQL (Replication) Setup?


If so, you should invest some time in researching the tools already available.

At the very least, you should consider using Maatkit:

Most of Maatkit's functionality is designed for MySQL. It makes MySQL easier and safer to manage. It provides simple, predictable ways to do things you cannot otherwise do. That's why Maatkit is now shipping by default with many GNU/Linux distributions such as Debian and CentOS. You can use Maatkit to prove replication is working correctly, fix corrupted data, automate repetitive tasks, speed up your servers, and much more.

I found it especially useful in MySQL replication setups. mk-table-checksum and mk-table-sync will save you headaches in these scenarios.

(This entry was prompted by a conversation.)

Setting up GemPlus USB reader on Linux


For reference.

SmartCard reader is a gemalto PC USB-SL Reader, P/N HWP108841C.

Install these packages:

  • sys-apps/pcsc-lite (+usb +hal)
  • app-crypt/ccid

It might be beneficial to install sys-apps/pcsc-tools too.

After this, start pcscd. You don't need to put anything into /etc/reader.conf, pcscd should pick up the USB reader, and load the ccid driver.

If you happen to use MOCCA, the Austrian "Buergerkarte" software, it should now find the card reader. Might need to restart it though.

"Domain Renewal Group"/"Domain Registry of America" = scam


The beloved DROA is again sending "renewal notices". Also, they will transfer your domain name to them and charge you lots of money.

So - if you receive letters from them, discard them.

Linux on the Intel DP55KG board


Now owning an Intel DP55KG board (http://www.intel.com/products/desktop/motherboards/DP55KG/DP55KG-overview.htm), I naturally tried running Linux on it. Unfortunately this was not one of those "works out of the box" experiences.

The current issues are (all tested with 2.6.31.4):

Other stuff to know:

  • The extra Marvell controller exposes an AHCI interface, so just use the AHCI SATA driver for it. Hot-plugging eSATA drives works fine.
  • There are apparently issues with Noctua fans, but I haven't verified that yet.

root-on-LVM2 with Gentoo


For various reasons I had to reinstall my home desktop, this time using Gentoo Linux.

My desktop systems usually have their root-fs on an LVM2 volume. Alas, such a setup is not covered in the Gentoo Installation Guide. Here are the details:

Setting up root-on-LVM2 with Gentoo

Fact: root-on-LVM2 needs an initramfs to work.

Therefore:

  • emerge lvm2 in your chroot before doing any kernel work.
  • Set up LVM2 as usual (create type 8e PV partitions, pvcreate them, vgcreate, lvcreate, mkfs)
  • Use genkernel --lvm to build your kernel.
  • Specify root=/dev/mapper/VGNAME-LVNAME and dolvm on the kernel command line.
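
One trap with the root= parameter: device-mapper joins the VG and LV names with a single "-", so any hyphen inside either name shows up doubled under /dev/mapper. A small Python sketch of how the kernel command line fragment comes together (the function name and the example VG/LV names are made up for illustration):

```python
def root_lv_cmdline(vg: str, lv: str) -> str:
    """Build the root= / dolvm fragment for a root filesystem on LVM2."""
    # Hyphens inside the VG or LV name are doubled by device-mapper,
    # because a single "-" is the separator between the two names.
    dm_name = f"{vg.replace('-', '--')}-{lv.replace('-', '--')}"
    return f"root=/dev/mapper/{dm_name} dolvm"


if __name__ == "__main__":
    print(root_lv_cmdline("vg0", "root"))
    print(root_lv_cmdline("vg-main", "root"))
```

So a VG called vg-main with an LV called root ends up as /dev/mapper/vg--main-root, which is easy to get wrong when typing the boot loader entry by hand.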

You may need to set these things in your kernel config, too:

  • Disable asynchronous SCSI device scanning
  • Build SCSI/SATA device drivers into your kernel
  • Build device mapper as a module

(These last things are what I did, without testing other options.)

If the initramfs complains about not finding your root-LV, check that there is an /etc/lvm/lvm.conf inside the initramfs. Otherwise, pvscan/vgscan will scan no devices for PVs/VGs.

Configuring Hudson for grml autobuilds on EC2

Suppose you want to do automated builds of grml using the excellent grml-live framework, and host this in a nice autobuilder, like, Hudson. Also you don't have the necessary disk space, RAM, etc. locally so you want to use Amazon EC2 to host the worker machine.

Install Hudson

  • Download Hudson. (Actually grab hudson.war.)
  • apt-get install sun-java6-jdk
  • adduser --system --group --disabled-password hudson
  • su - hudson
  • java -jar hudson.war
The Hudson web interface should now be listening on port 8080. Go there and configure it.

Configure Hudson

  • Use the plugin manager (Click 'Manage Hudson', -> 'Manage Plugins') to install the EC2 plugin. Restart hudson afterwards. (It may take a while until all available plugins are listed. Be patient.)
  • Configure the basics ('Manage Hudson' -> 'Configure system'):
  • Set "# of executors" to 0. This effectively disables any builds on the master.
  • In the "Cloud" section add "Amazon EC2".
  • Configure Access Key, Secret Key and EC2 RSA private key. (First two are in your Amazon EC2 Credentials, the RSA private key can be created using the EC2 Management Console by using the 'Create keypair' function.)
  • Add an AMI:
    • AMI ID: ami-fcf61595 (current AMI ID from alestic.com for Debian squeeze server 64bit)
    • Instance Type: LARGE (the Alestic AMI won't work with the SMALL type)
    • Description: Debian 6.0 server 64bit (Alestic) US
    • Remote FS Root: /mnt/hudson (where the Hudson slave will store its local data. /mnt is the large disk for EC2.)
    • Labels: debian-6.0-amd64 (AMIs with the same label will be grouped by Hudson)
    • Init Script: wget -q http://your.web.server/hudson-slave.run && bash hudson-slave.run (Get my hudson-slave.run and copy it to a web server reachable by your EC2 instances.)
  • Save.
By now you should be able to manually add a node on EC2 from 'Manage Hudson' -> 'Manage Nodes' (click "Provision from EC2"). If this works well, you're mostly done.

Set up a build job

Now create a new job for building grml. Job name can be "grml-small amd64 testing" or whatever you actually build :-)
Choose "Build a free-style software project" as the proper option.

Configure your job

From the job dashboard choose your job, and select "Configure".

Check "this build is parameterized" and add two String parameters:
  • Name: FLAVOUR
  • Default Value: grml-small
  • Name: CLASSES
  • Default Value: GRMLBASE,GRML_SMALL,AMD64
For the build, you'll need to add two shell steps, with the following script contents:
Execute shell step #1:
#!/bin/bash
echo "setup system and cleanup"
set +e
set +x
apt-get install -y mksh fai-client fai-server fakeroot squashfs-tools squashfs-lzma-tools bc perl
apt-get install -y grml-live grml-live-addons

cat > /etc/grml/grml-live.local << EOF
GRML_LIVE_SOURCES="
deb http://localhost/apt-cacher/http.us.debian.org/debian squeeze main contrib non-free
deb http://localhost/apt-cacher/deb.grml.org/ grml-stable  main
deb http://localhost/apt-cacher/deb.grml.org/ grml-testing main
"
FAI_DEBOOTSTRAP="squeeze http://localhost/apt-cacher/http.us.debian.org/debian"
#SQUASHFS_OPTIONS="-nolzma"
SUITE="squeeze"
CLASSES="${CLASSES}"
VERSION="${BUILD_ID}"
EOF

grep /grml /proc/mounts | awk '{print $2}' | sort -r | xargs umount
rm -rf /grml

rm -rf work
mkdir -p work

Execute shell step #2:
#!/bin/bash
echo "actual build"
set -x
set -e
mkdir -p /grml
mount -t tmpfs -o suid,dev none /grml
cd /grml
set +e
grml-live -g ${FLAVOUR} -F
RC=$?
set -e
cd -
mv /grml/grml-live/grml_isos/* work/
umount /grml
exit $RC


For post-build Actions you'll want to check "Archive the artifacts" and use "work/**/*.iso" as the files to archive. This way the built ISO will be copied to the Hudson master.

Test it

After saving your job config, do a test run by clicking "Build now". After a few moments you should see a build running, and console output should show grml-live doing its work!


You obviously want to customize the parameters to your job as well as the first shell fragment, if you want to build something different than some grml-small amd64 ISO ;-)

Testing in mobile phone browsers

Browsers on mobile phones are a pain. It feels a bit like the old browser wars, except that there are no fights.

Testing all those browsers is even more painful: you'd need one physical device for each released mobile phone OS + version out there. PPK can do this, but you probably don't have all those devices.

So, what about emulating...?

Yeah. It works for Android.

The Android SDK (free download!) actually contains an emulator which seems to run the original binaries. You can also hook up a debugger (adb logcat) to the emulated device, and start debugging javascript in the browser. Nice. (Also on Mac.)

The iPhone SDK (free download from Apple. Mac only) contains a simulator: the "device" will run x86 code, so it's not the same as the physical device to start with. I need to check if the browser behaves the same as on the phone.

For Symbian phones, you need to download a Symbian / Series60 SDK from Nokia (forum.nokia.com - how weird is that? After 14 days you need to register it). It's Win32 only, and again only a simulator which runs x86 code. It didn't work at all on my Windows 7 desktop. (Also it's the SDK with the strangest feeling. I can fully understand why no one is developing apps for S60.)

Doing OpenID right

So, everyone* is now accepting OpenID for authentication. 

But with doing so, many sites are actually hurting OpenID and themselves. 
Here's why: they've integrated it badly.

There are obviously many parts of the OpenID integration people can fuck up, but here are the three common cases:

#1 Your login page shoots a large text box right into your user's face.
Also it goes on to explain a long way what OpenID is, etc, yadda.

This is plainly wrong for two good reasons: 
a) your site breaks user expectations of a login page (where's the username + password boxes?)
Live sample for doing it wrong: stackoverflow. They also get bonus points for listing Google as the first provider but not having read Google's freely available research about UI design for OpenID.
(You should understand that most users out there have an account with one of the large providers and don't host their own OpenID stuff. Really.)

b) no ordinary user cares for OpenID.
This is just a simple fact of life. Users want to get things done, not think about some cool whizzbang technology your site's using to make your life simpler. They do care a lot about you making their lives easier, though - so let's move on to case #2 and see what that means.


#2 Your site uses OpenID only for authentication.
OpenID can do so much more for you and your users, please actually use it.

Authentication is probably the number one reason for implementing OpenID. Yep, it makes your life easier: you (as a site operator) no longer need to store user credentials and keep them safe and so on. But why stop there?
In the same request you make for authenticating your users, you can also request profile data (if it's available), like an email address, a nickname, user's real name, etc. Obviously not every OpenID provider will have this info, but if it does and you use it to pre-fill user profiles on your site, your users will thank you.
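
As a rough illustration of "the same request": with the OpenID Simple Registration extension, asking for profile data is just a couple of extra parameters on the authentication request. The Python sketch below builds such a redirect URL. The endpoint and site URLs are placeholders, the function name is mine, and a real site would use an OpenID library (which also handles discovery, associations and verifying the response) instead of gluing parameters together by hand.

```python
from urllib.parse import urlencode

OPENID_NS = "http://specs.openid.net/auth/2.0"
SREG_NS = "http://openid.net/extensions/sreg/1.1"


def build_auth_url(endpoint, return_to, realm,
                   optional_fields=("nickname", "email", "fullname")):
    """Build an OpenID 2.0 checkid_setup URL that also asks for profile data."""
    params = {
        "openid.ns": OPENID_NS,
        "openid.mode": "checkid_setup",
        # identifier_select lets the provider pick the identity (the
        # "log in with <large provider>" case).
        "openid.claimed_id": OPENID_NS + "/identifier_select",
        "openid.identity": OPENID_NS + "/identifier_select",
        "openid.return_to": return_to,
        "openid.realm": realm,
        # Simple Registration: fields are optional, so providers that do
        # not have them can still authenticate the user.
        "openid.ns.sreg": SREG_NS,
        "openid.sreg.optional": ",".join(optional_fields),
    }
    # Assumes the endpoint URL has no query string of its own.
    return endpoint + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_auth_url(
        "https://op.example/auth",                 # hypothetical provider endpoint
        "https://www.example.com/openid/return",   # hypothetical return URL
        "https://www.example.com/",
    ))
```

Whatever comes back in the openid.sreg.* response fields can then pre-fill the new user's profile form.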

Again, stackoverflow is a nice example of doing this completely wrong.
Their welcome screen for new users: stackoverflow-openid-signup.png
Yep, a scary "are you sure you want to create a new account" page, when they could be asking users how they would like to appear on the site instead of as "unknown <providername>". Really: the way to go - for creating new users - is: ask about profile data and provide a (small) link to merge with an existing account.


#3 Require separate logins across sister sites.

This obviously only applies to you if you have sister sites, but you probably do if you're targeting multiple countries or multiple interest groups.

Yep, there are OpenID providers (like Google) out there which will give you per-realm tokens. And this is good (privacy, you know). But this is not something you want to slam into your user's face. It's your site's problem, so solve it.

And it's easy to solve: make a central realm, probably your main site (e.g. "amazon.com" if you were Amazon and had amazon.(com|co.uk|de|fr) as sites) for logging in. This way you'll have only one token and be done with it.


* obviously greatly exaggerated, but the time to solve these problems is now.

DMI data on consumer hardware

Desktop Management Interface (DMI) is a standard for exposing base data about the system (hardware) to the running software. Usually you'd get the manufacturer name, etc.

At least in theory.

On typical consumer hardware (like my Zotac board), you instead get this:

$ dmidecode
...
Handle 0x0002, DMI type 2, 15 bytes
Base Board Information
        Manufacturer: To be filled by O.E.M.
        Product Name: To be filled by O.E.M.
        Version: To be filled by O.E.M.
        Serial Number: To be filled by O.E.M.
        Asset Tag: To Be Filled By O.E.M.

...

I don't need to say that there is no way of finding out what hardware you've actually got from this info.
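
If you rely on DMI data in scripts (inventory, auto-deployment), it is worth checking for these placeholders before trusting a value. A minimal Python sketch, assuming a Linux box that exposes the board fields under /sys/class/dmi/id; the placeholder list only contains the string from the output above and should be extended with whatever your own boards report:

```python
from pathlib import Path

# Placeholder values that carry no information. Only the first one is
# taken from the dmidecode output above; extend the set as needed.
PLACEHOLDERS = {"to be filled by o.e.m.", ""}


def is_placeholder(value: str) -> bool:
    """Return True if a DMI string is an unfilled vendor placeholder."""
    return value.strip().lower() in PLACEHOLDERS


def read_board_info(base: str = "/sys/class/dmi/id") -> dict:
    """Read the base board fields from sysfs, mapping unreadable ones to ''."""
    info = {}
    for field in ("board_vendor", "board_name", "board_version"):
        try:
            info[field] = (Path(base) / field).read_text().strip()
        except OSError:
            info[field] = ""
    return info


if __name__ == "__main__":
    for field, value in read_board_info().items():
        state = "placeholder" if is_placeholder(value) else "ok"
        print(f"{field}: {value!r} ({state})")
```

The comparison is case-insensitive on purpose, since the output above mixes "To be filled" and "To Be Filled".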

Atom+ION board replaces my old fileserver

I've got myself a Zotac ION ITX A board (with an Atom 330) to replace my old home fileserver's hardware (P4). This worked out well, but there are some traps:

Old 533 MHz memory does not work - but does not fail completely. Instead, the system does not boot on some attempts. Holding down the reset switch eventually makes it boot. After replacing the memory with fresh 667MHz DIMMs, this was resolved. (The board is actually spec'd for 667/800 DIMMs only, but I didn't notice this at first.)

The shipped CPU fan is - as you'd expect - crap. It's so noisy that even the old system was quieter. I've removed the CPU fan and instead hooked up an 80mm Noctua fan which indirectly cools the CPU cooler. This is now as quiet as it gets (within the time and money I'm willing to spend on this).

BIOS flashing only works from an (emulated) DOS boot disk. The flasher definitely is scary.

The WLAN card (it's actually a card, so I guess I could replace it) is an ath9k - AP mode doesn't really seem to work with the current Linux drivers. I was hoping to get rid of my AP, but this has to wait...

"Regular Expressions Cookbook" may not be what you want

I bought this book for reasons I can no longer remember. The intro chapters are not very useful: they do not cover anything besides 'Perl-style regexes' (they do cover differences between different Perl-style RE implementations though).
I'll probably keep it as a reference, if I ever need an RE for an already solved problem and Google would fail me. 

Doing what one suggests

During my recent mini-talk I suggested that one should always use the latest version of Puppet.

But, our own production setup was still on 0.24.4. 
Yep, 0.24.4.
That old.

This got us into some trouble during deployment of the Nagios/Naginator types (in 0.24.4+0.24.5 they are dead slow and missing features which make them unusable). Therefore, we've now upgraded to 0.24.8 and enjoy fast deployment of those types :-)

Blog now reachable via IPv6

This blog is now reachable using IPv6, too.
If you're using IPv6, it should say so here: 

"Configuration Management using Puppet"

I gave a very short talk on Puppet during FrOSCon 2009, as part of the PostgreSQL project line.


(As OOo cannot do a reasonable export to PDF, I redid the slides with Keynote.)

Monitoring static values

Sometimes statically configured values/limits are not so static as you'd think:

tomcat_jvmheap-day.png

The 'Maximum Limit' is configured to be 9728M. But maybe this isn't what we think it is :-)

Looking back at 5-year-old code

Looking back at 5-year-old code makes me a bit sad. How embarrassing!
At least I've now cleaned up the mess of where the code is located. No more outdated CVS & SVN repos, 4 backup copies, etc.

Replacing nscd for "hosts" caching

I'm currently looking for a replacement for caching the Linux NSS "hosts" "database". Basically, my goal is caching DNS lookups, but with a few restrictions:
  • The cache should not hold entries until the DNS TTL expires, but only up to a configurable maximum TTL. This way I can flush the DNS caches on the central resolvers without having to worry about the cache on each and every machine.
  • The cache should ask one or more centrally installed resolvers, as they have special configurations for some domains.
  • The cache should be running locally, as everything else will break at some point. *
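
To pin down the first requirement: what I want is a cache lifetime of min(DNS TTL, configured maximum). A minimal Python sketch of that behaviour follows. The class name and the resolve callback are made up for illustration; the callback stands in for "ask the central resolvers" and is assumed to return the addresses plus the TTL from the DNS answer. This is not a replacement for nscd, just the caching rule spelled out.

```python
import time
from typing import Callable, Dict, List, Tuple


class CappedTTLCache:
    """Cache lookups for min(DNS TTL, max_ttl) seconds."""

    def __init__(self,
                 resolve: Callable[[str], Tuple[List[str], float]],
                 max_ttl: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._resolve = resolve      # hypothetical upstream lookup
        self._max_ttl = max_ttl
        self._clock = clock
        self._entries: Dict[str, Tuple[float, List[str]]] = {}

    def lookup(self, name: str) -> List[str]:
        now = self._clock()
        hit = self._entries.get(name)
        if hit is not None and hit[0] > now:
            return hit[1]            # still fresh, no upstream query
        addresses, dns_ttl = self._resolve(name)
        # Never keep an entry longer than max_ttl, whatever the DNS TTL says.
        expires = now + min(dns_ttl, self._max_ttl)
        self._entries[name] = (expires, addresses)
        return addresses
```

With max_ttl set to 30, a record with a one-hour TTL is asked for again after 30 seconds, so flushing the central resolvers is enough to get rid of a stale answer everywhere within half a minute.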

nscd is designed to do this, but unfortunately it has serious bugs which make it unsuitable for use, including this bug (which also features a nice "Drepper response").

I'm not sure which software to use yet, maybe it doesn't even exist.

Any recommendations?

Making the case for 15 boxes per rack

We are now putting exactly 15 1U machines into each rack. This number fits nicely for our setup.

15 machines translate to:
  • 15x3 switch ports. Each machine gets out-of-band management, in-band management and a VLAN trunk port.
  • 16x2 power ports, as each machine gets two PSUs for redundancy requirements. The switch obviously also has redundant PSUs, so this totals 32 power ports. With CEE7/4 plugs this already takes lots of space just for the plugs/power bars.
  • ~4000 W (about 4 kW) power requirement; we calculate with ~260W per server (actually more during boot etc).
  • 15x5 cables. Two power cables, 3 Ethernet patch cables per machine. Plus a few more for switch power and switch/rack interconnects.
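
The arithmetic behind these numbers, as a small Python sketch. All inputs are the figures from the list (3 switch ports, 2 PSUs and ~260 W per machine, plus one switch with redundant PSUs); the function name is just for illustration.

```python
def rack_budget(machines: int = 15) -> dict:
    """Per-rack totals for a given number of 1U machines plus one switch."""
    return {
        # out-of-band management, in-band management, VLAN trunk
        "switch_ports": machines * 3,
        # two PSUs per machine, plus two for the switch
        "power_ports": (machines + 1) * 2,
        # ~260 W per server, not counting the switch or boot-time peaks
        "power_watts": machines * 260,
        # two power cables and three patch cables per machine
        "machine_cables": machines * (2 + 3),
    }


if __name__ == "__main__":
    for key, value in rack_budget().items():
        print(f"{key}: {value}")
```

For 15 machines this gives 45 switch ports, 32 power ports, 3900 W and 75 cables. Running it with 20 machines shows why that number depends on the power situation: the servers alone would then draw about 5200 W.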

It's already a challenge to actually handle all this stuff in a single rack. It gets really messy if you surpass 15 machines per rack.

We group these 15 1U machines in groups of three. After each group we leave 1U empty. After the second group we leave an extra 1U empty and mount the 4U switch on the rear side. Then again we leave 2U empty and mount the remaining three groups below. Power bars go to the rear sides, heavier machines go to the bottom.
My colleague detailed this layout on the amd.co.at/AdminWiki.


Maybe we can do 20 machines per rack when we know more about power on the new HP ProLiant server generation.

Switched to VoIP

At work we've now successfully switched from an Alcatel OmniPCX Enterprise to a VoIP-/SIP-based solution. This has worked pretty well, some users were excited, some not so (as always).

We're not even missing features, and it's way better for us to handle now. (Even though the LDAP-backed provisioning is still missing right now.)

Overall summary basically boils down to:
  • Freeswitch (inside OpenVZ on a HP ProLiant)
  • Polycom 331 hardware phones
  • Some users are running twinkle instead
  • LDAP-based directory on the Polycom phones (needs a software license though)
  • Phones are provisioned via DHCP+HTTP, they automatically switch to the Voice VLAN after getting their initial DHCP lease
  • PoE
  • LDAP-based provisioning of Users (not yet ready, but real soon now)

Of course, all of this was made possible & implemented mostly by my colleagues, not by myself.

Backwards incompatible change in Firefox 3.0.13

Seems like Firefox 3.0.13 has been released, as well as the updates for Ubuntu, probably fixing the BlackHat related SSL problems.

This is of course fine, but the fix seems to have broken behavior we've relied on for all too long:

Previously, given a certificate for *.domain.com, this would (in Firefox, not in other browsers) also be valid for a host called "foo.bar.domain.com". As of 3.0.13 this is no longer true, and therefore we're getting loads of SSL errors now...
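
To spell out the difference, here is a small Python sketch of the strict rule, where "*" stands for exactly one DNS label. This is only my reading of the new behaviour described above, not Firefox's actual matching code, and it ignores details like partial-label wildcards and internationalized names.

```python
def wildcard_matches(pattern: str, hostname: str) -> bool:
    """Match a certificate name against a hostname, "*" covering one label."""
    p_labels = pattern.lower().split(".")
    h_labels = hostname.lower().split(".")
    # A wildcard never spans a dot, so the label counts must be equal.
    if len(p_labels) != len(h_labels):
        return False
    # Everything right of the first label has to match literally.
    if p_labels[1:] != h_labels[1:]:
        return False
    return p_labels[0] == "*" or p_labels[0] == h_labels[0]


if __name__ == "__main__":
    for host in ("foo.domain.com", "foo.bar.domain.com", "domain.com"):
        print(host, wildcard_matches("*.domain.com", host))
```

Under this rule foo.domain.com is still covered by *.domain.com, while foo.bar.domain.com would need its own certificate (or one for *.bar.domain.com).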

(I've not validated this with Firefox on Windows.)

Resetting your bitlbee password

If you're using bitlbee, and forgot your password, it's relatively easy to reset it - if you actually host your own bitlbee server.

  • /var/lib/bitlbee$ mv nickname.* old/
  • reconnect
  • register 
  • disconnect
  • /var/lib/bitlbee$ mv old/* .
  • reconnect
  • reset your account passwords

Good luck!

PXE boot decisions

SYSLINUX/PXELINUX is officially awesome. You've almost certainly already used it - while booting from a Linux installation CD - and probably already use it if you do any PXE stuff.
What you may not know is that it provides an API programs can use: it can directly execute specially crafted DOS command binaries (COM), and even has a special 32-bit mode for these programs (the collection of these is the "Comboot API").

There are lots of cool things in the API and the extensive sample programs. Especially nice is the readily available DMI support in the API.

This made my task an easy journey. First, what I wanted to actually do:

  • If a machine gets booted from PXE (from any location, we have a central TFTP server),
  • if it is a server of a known model, auto-boot into a customized grml environment for auto-deployment,
  • else present our standard PXE boot menu for OS installation/recovery purposes.

And this basically boils down to:

  • Write a small Comboot program which checks the DMI product_name against a pre-defined string (for HP ProLiant servers this unsurprisingly starts with "ProLiant") and dispatches to two different PXELINUX configuration files. I named this program proliant.c32.
  • Write pxelinux.cfg/default which does exactly one thing: auto-load proliant.c32.
  • Write the separate PXELINUX configuration files.
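
The decision itself is tiny. The real thing is a Comboot program written in C against the SYSLINUX API, which I'm not reproducing here; the Python sketch below only shows the dispatch logic, and both configuration file names are made up for illustration.

```python
# Hypothetical file names, relative to the TFTP root.
AUTODEPLOY_CONFIG = "pxelinux.cfg/autodeploy"
MENU_CONFIG = "pxelinux.cfg/menu"


def choose_config(product_name: str) -> str:
    """Pick the PXELINUX config based on the DMI product name."""
    # Known server models (HP ProLiant) go straight to auto-deployment,
    # everything else gets the standard boot menu.
    if product_name.strip().startswith("ProLiant"):
        return AUTODEPLOY_CONFIG
    return MENU_CONFIG


if __name__ == "__main__":
    for name in ("ProLiant DL360 G5", "To be filled by O.E.M."):
        print(name, "->", choose_config(name))
```

Falling back to the menu for anything unknown keeps the failure mode harmless: a misdetected server gets a boot menu and is not reinstalled by accident.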

FRITZ!Box Fon WLAN + WRT54GL Wireless Bridge

After spending two hours fighting different WRT54GL firmware versions, WDS, Client Bridge, and other stuff, here's my solution for bridging Ethernet to WLAN on a Linksys WRT54GL with a Fritz!Box Fon WLAN 7140 as the WLAN base station.

  • Get dd-wrt for the WRT54GL. I used version 24-sp1.
  • Do a full factory reset on the WRT.
  • dd-wrt Setup: Basic Setup/Network Setup: configure an IP in the same network as your Fritz!box is in.
  • Security/Firewall: SPI Firewall -> Disable
  • Wireless/Basic Settings:
    • Wireless Mode -> Repeater Bridge
    • Wireless Network Mode -> G-Only
    • SSID -> needs to match your Fritz!box SSID
    • Network Configuration -> Bridged
  • Wireless/Wireless Security:
    • Security Mode: if your Fritz!box is configured for WPA or WPA2/Mixed, select WPA here (NOT WPA2! - it won't work)
    • WPA Algo should be TKIP for a Fritz!box
    • WPA Shared Key -> same as on your Fritz!box

You probably need to reboot the WRT after changing all this stuff.

Now go to Status/Wireless, you should see the Fritz!box in the Access Points & Clients list, with a Signal Quality > 0.
If it doesn't work yet, try the "Site Survey" and click "Join" next to your WLAN network.

There isn't anything special to configure on the Fritz!box. If it finds your WRT it should say "Repeater" next to it in the WLAN Monitor menu.

Probably it's better to use WDS instead of "Repeater Bridge", but I haven't got that working. If you do, let me know!

Introducing lastfmproxy-rb

Last.fm released a new Radio API which must be used instead of the old one now. Unfortunately most programs have not been updated to the new API, including lastfmproxy, which I wanted to use to listen to last.fm from my squeezebox (there are some limitations in the native squeezebox/last.fm radio stuff which prevent me from using it).

Therefore, I want to introduce: lastfmproxy-rb (git repo).

This piece of software aims to provide a real Shoutcast/ICY-style "internet radio" stream based on the last.fm radio API. In its current state it's mostly a big hack, but the basics (== listening) work already. ICY metadata is still on my todo list, for example.

Requirements:
  • ruby 1.8 installed
  • last.fm subscriber (limitation of last.fm)
  • have a last.fm API key (get it here for free)

Quick start:
  • fetch proxy.rb
  • create config.rb:
    config = {
      :username => 'yaddayadda',
      :password => 'PASSWORD',
      :station => 'lastfm://artist/MGMT/similarartists',
      :api_key => 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx',
      :api_secret => 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx',
    }
    
  • ruby ./proxy.rb
  • point your audio client to http://localhost:2000/listen.mp3

Data Centers in Vienna

Vienna has a relatively small data center density compared to other cities, but there are still a few options worth exploring:

DanubeDC, Floridsdorf: very competent looking, fast interaction possible. They are building a second DC on the south side of the Danube, too, probably with high-density cooling zones.

IBM ODC21, Floridsdorf: Bureaucracy; fast; very competent looking. Can do high-density cooling zones.

InterXion, Floridsdorf: "the datacenter". VIX2 is hosted here. A bureaucracy. Can do large-scale projects, as well as high-density cooling zones.

interoute, Liesing

Invitel, Voesendorf: Can do large-scale projects, unfortunately not directly located in Vienna.

Nessus, Favoriten: probably very flexible, small company.

SIL, Heiligenstadt: okay. Sometimes a pita, but things get better.

UPC, Favoriten: probably okay. Flexible pricing.


There are rumors that Verizon and Telia also operate data centers in Vienna, but I couldn't get pricing/info from them.

Most data centers offer reasonable pricing while not being capable of doing proper pre-sales. You'd better know exactly what you want before calling/visiting anyone.

Polycom SoundPoint IP 330: a first look

soundpoint_ip330_320.jpg

Got my Polycom SoundPoint IP 330 today, and already had a chance to take a first look.

I'm not that impressed yet, but I guess this is a good sign. Provisioning "just worked" as documented on the web, as well as the PoE, SIP, firmware upgrade (which is really just part of the provisioning), dialing stuff, etc. German translation is included, but it's (like on all SIP phones) very confusing.

But it's already less confusing than the Snom 360 I was previously using.

EDAC i5000 NON-FATAL ERRORs on HP ProLiant Hardware

If you're seeing messages like 'EDAC i5000 NON-FATAL ERROR' in your kernel.log, and that is on HP ProLiant hardware (DL360 G5 in this case), take them seriously and PANIC ^W act immediately. (These messages are neither a kernel bug nor a hardware bug, but the plain truth.)

Just had a machine panic with an NMI today, and we've delayed ^W ignored the kernel messages for a few days...

Causes could be faulty RAM, a faulty system board, or something else. If you don't get a lit 'memory faulty' LED, the IML will helpfully save a bit of info when an NMI occurs, so you at least have something to tell HP.

Puppet 0.25.0 beta1 + Passenger

The first beta version of Puppet 0.25.0 was released. If you're using Passenger, please read ext/rack/README for setup instructions, and keep in mind that the old config.ru file from the wiki will no longer work. (There's a new one in ext/rack.)

ruby-ldap + SSL

If you have trouble getting ruby-ldap to connect to an SSL-only LDAP server, there can be lots of reasons. From what I've seen today, the next time I have problems like this I'd check these things first:

  • does ldapsearch -x -H ldaps://your.ldap.hostname work?
    • if not, fix this. usually you need to set TLS_CACERT in /etc/ldap/ldap.conf
  • check the underlying ldap library. ruby's ldap library can be linked against the OpenLDAP libldap or against the Netscape LDAP SDK. Make sure the binaries supplied with the correct library can connect to your ldap server.

  • check that the minimum amount of code works, an example would be:

    require 'ldap'

    conn = LDAP::SSLConn.new( 'your.ldap.hostname', 636 )
    conn.set_option( LDAP::LDAP_OPT_PROTOCOL_VERSION, 3 )
    conn.bind('cn=loginuser,o=foo','FOOPASSWORD') {
      conn.perror("bind")
    }

In my case, I was missing the TLS_CACERT config option in /etc/ldap/ldap.conf and was only getting a useless "Connect error" from ruby.

Why we're using GRML in the datacenter

grml basically is a live Linux CD optimized for text-tool users and sysadmins. We are using it in the datacenter, and here's why:

  • It's a breeze to use for unprepared tasks: it contains a lot of useful tools, and it's just a CD you need to carry. Got a broken machine? Suddenly some box is acting strange? Fire up grml and check what's going on.
  • It's easy to integrate into your existing infrastructure. You probably already have some PXE server for network booting, and you can boot grml off it after following a few simple steps.
  • It's easy to extend. We've just recently replaced our old preseeded debian installer for deploying new machines with grml (from PXE boot) + a simple script I wrote in an hour or so.
  • It supports all our hardware.
  • The grml64 variant is a true 64-bit Linux, which is great if you want to deploy 64-bit installations.
  • It's based on Debian, so our people are already familiar with the environment.
Try grml out - grab your copy of grml or grml64 from here.

Also, grml provides a nice pre-configured zsh (and related shell environment). If you like it, you may want to use it permanently for your workstation (or even servers). Get the instructions from http://grml.org/console/

HP support Debian lenny on ProLiant G5 series

| 2 Comments | No TrackBacks
Debian 5.0 ("lenny") has landed, and now HP officially supports it.

The first downloads are available on the regular support pages. I've been waiting for this to happen for a very long time, and will now test what has been delivered.

Update: the ILO components seem to be missing from this initial release, but hpasm, hpacucli are fully functional, and I suspect the other software packages will work too (hpsmh, hpadu, hp-snmp-agents, cpqacuxe). Manually installing ia32-libs was necessary on amd64, as hpbootcfg crashed without it.

PowerDNS 2.9.22 released

| No TrackBacks
After a very long time, PowerDNS 2.9.22 has been released by its author; a release I've been looking forward to because of various issues.

Unfortunately this is too late for the Debian lenny release, so I'll probably need to maintain custom packages for both etch & lenny.

Creating a multi-floppy USB Key (for flashing HP DL140/145 G2 systems)

| No TrackBacks
I'm a (not-so) proud owner of one HP ProLiant DL140 G2 (Intel-based) and one HP ProLiant DL145 G2 (AMD-based). These were, at the time, good machines: not so expensive, with two fixed 3.5" SATA drives, allowing for cheap disk upgrades. (BTW, this blog runs on one of these.)

HP had outfitted these machines with so-called "Lights-Out 100i management", basically delivering IPMI-based out-of-band management, with serial-over-LAN. If you take a look at what Supermicro delivers nowadays, you would not want the LO100i. I've also had lots of problems with the LO100i, so I stopped using it.

A few days ago I decided to give the LO100i another try. In the meantime, HP had issued firmware upgrades for both the system ROMs and the LO100i management processor (BMC). The journey begins...

Interesting facts:
  • The DL140/145G2 were made in 2005, and were shipped without a floppy drive.
  • There is no way to add a floppy drive to the machine.
  • The downloaded EXE (containing the flash update) requires a floppy drive to write its self-contained floppy image to.
  • The EXE is a 16-bit DOS executable which runs in full-screen mode.
  • System ROM and BMC updates are separate, so each one needs a separate floppy disk.

What's not working:
  • 64-bit Windows can NOT execute 16-bit DOS executables. All the Windows machines at work are 64-bit. (They do not have a floppy drive anyway.)
  • Full-screen DOS applications can NOT run inside a terminal services connection.
  • DOSEMU crashes on my 64-bit Linux desktop.
  • Unzipping the downloaded EXE ("SoftPAQ") file does not work; it's packed with some kind of proprietary Compaq tool.

Conclusions so far:
  • I've got one DL140 and one DL145, so I'd need 4 floppies.
  • I'd need a USB floppy drive. I checked at a local store, such a drive costs about 30 euros.
  • It would take lots of time in the data center.
  • I'd need even more than 4 floppies, because floppy disks are very unreliable.

I decided to use a single USB key instead. I've been recently given such a (very cheap) key from my employer - nothing to lose if something goes wrong with it.

Preparations:
  • Get a Windows desktop, install WinImage
  • Get a Linux desktop with the following software:
    • VirtualBox (Ubuntu: apt-get install virtualbox-ose and login again)
    • mkisofs
    • syslinux
    • The FreeDOS installation ISO. You'll only need the small base-cd ISO.
  • Download the relevant updates from hp.com. For me those were:
    • DL140G2 BIOS: SP32670.EXE
    • DL140G2 BMC: SP33955.EXE
    • DL145G2 BIOS: SP33884.EXE
    • DL145G2 BMC: SP33956.EXE

Now let's do this:

1) Use WinImage to create an empty image of a 1.44MB floppy disk. Save it as floppy.img (select uncompressed before saving it). floppy.img will serve as a virtual floppy disk for the SoftPAQs.

2) Copy floppy.img to your Linux desktop, put the Windows machine aside.
Yes, you could use dd + mtools/mformat to do this, but it's so much easier with WinImage.

3) Create an ISO from the updates. mkdir temp_dir, copy all the EXEs into it; run:
mkisofs -o updates.iso temp_dir
This updates.iso will serve as the source for our DOS VM, so we don't have to set up networking inside the VM.

4) Start VirtualBox, create a new VM, profile type DOS. Create a virtual hard drive for it.

5) Add the floppy.img and the FreeDOS ISO to your newly created VirtualBox VM.

6) Boot the VM, install FreeDOS to the virtual hard drive. Follow the on-screen instructions; xfdisk hung for me at reboot time, resetting the VM worked fine.

7) Mount updates.iso in VirtualBox instead of the FreeDOS ISO.

8) Inside FreeDOS: mkdir c:\tmp ; copy all files from the virtual CD drive (usually D:) to c:\tmp (the ROMpaq stuff won't work properly from the virtual CD drive).

9) For every single update:
9a) format a:  (without this, the extractor will fail to recognize the floppy disk after the first run)
9b) run the SPxxxx.exe, type Agree and have it write the update to drive A:
9c) unmount the floppy.img from VirtualBox, save it away with a meaningful filename. We'll put these images onto the USB key later on.
9d) re-mount floppy.img in VirtualBox

10) Shut down VirtualBox. Save your VM for the future.

11) Prepare the USB key. If there's something on it, make a backup - we'll wipe it in the next step.

12) Write a superfloppy-style DOS filesystem onto the USB key. This will allow most BIOSes to boot from the key. Run:
mkdosfs -I /dev/sdX

13) Put the syslinux bootloader onto it:
syslinux /dev/sdX

14) Mount the new filesystem on your USB key, so we can copy more stuff onto it:
mount /dev/sdX /mnt

15) Copy the floppy images you created above to /mnt. Remember that you can't use long filenames (so 8.3 filenames must do; directories are okay).

16) Create /mnt/syslinux.cfg. This is the configuration file for the bootloader. We'll want syslinux to stop after loading and issue a prompt:
echo "PROMPT 1" >/mnt/syslinux.cfg

17) Copy the memdisk kernel. It's a floppy emulator (for A:), which operates in RAM.
cp /usr/lib/syslinux/memdisk /mnt/

18) For every floppy image you have, put the following lines into your syslinux.cfg. They will tell syslinux what to do with all the files:
    label IDENTIFIER
        kernel memdisk
        append initrd=FLOPPYIMAGE.NAME

Obviously, you need to replace FLOPPYIMAGE.NAME and IDENTIFIER. IDENTIFIER will be the string you type at the syslinux prompt after booting from the USB key, to select this particular floppy image.
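
With more than a handful of images, the stanzas from step 18 can be generated instead of typed. A minimal Ruby sketch, assuming made-up labels and image names (neither the helper names nor the file names come from the steps above); it also refuses names that do not fit the 8.3 limit mentioned in step 15:

```ruby
# Sketch: build a syslinux.cfg that boots floppy images through memdisk.
# Labels and file names are hypothetical examples.

# True if the name fits the 8.3 scheme needed on the DOS filesystem.
def dos_8_3?(name)
  !!(name =~ /\A[A-Za-z0-9_-]{1,8}(\.[A-Za-z0-9_-]{1,3})?\z/)
end

# images: hash of syslinux label => floppy image file name
def syslinux_cfg(images)
  lines = ["PROMPT 1"]
  images.each do |label, file|
    raise ArgumentError, "not an 8.3 filename: #{file}" unless dos_8_3?(file)
    lines << "label #{label}"
    lines << "    kernel memdisk"
    lines << "    append initrd=#{file}"
  end
  lines.join("\n") + "\n"
end

puts syslinux_cfg({ "140bios" => "140bios.img", "140bmc" => "140bmc.img" })
```

Redirect the output to /mnt/syslinux.cfg; it already contains the PROMPT line from step 16.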

19) umount /mnt and try your fresh 4floppy-in-1key.


New Tunnelblick version fixes Nameserver problems

| 1 Comment | No TrackBacks
If your company uses OpenVPN as the VPN-solution for road warriors, and you are using a Mac, you are probably using Tunnelblick. Tunnelblick is a nice GUI wrapper for OpenVPN on OS X. (I understand it consists of a bit more than a wrapper, but you never see those parts.)

I recently upgraded to version 3.0b9 which fixed all those nasty crash bugs, but created a new problem for me: it would no longer correctly set the nameserver if told to. Even worse, it would somehow destroy /etc/resolv.conf, so all name resolution went out while using the VPN. Not very useful, so I lived without the company nameservers and tried to remember the important IP addresses instead (uh).
On November 20th a new version was released - 3.0b10 - which fixes this problem, but this is not noted in the ReleaseNotes. I suspect the bug was fixed in openvpn and not in Tunnelblick, and the new openvpn version which is included in the new Tunnelblick version no longer suffers from this problem.

Yay.

Puppet: managing directories recursively

| No TrackBacks

This is not very obvious from Puppet's TypeReference, but you can manage directories in a very interesting way:

  • Recursively copy a directory from the filestore to a client _and_
  • remove all unmanaged files
Still not very interesting, but please see the light:
  • You can deploy an empty directory,
  • Fill this directory using separate file resources, possibly from other modules (or even other nodes, if you use exported resources)
  • Everything puppet did not put into the directory gets removed.
This yields, very effectively, a fully managed directory with lots of flexibility.
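
"Everything puppet did not put there gets removed" boils down to a set difference. The following Ruby sketch is only a model of that basic idea, not Puppet's implementation; the function name and the sample paths are made up:

```ruby
require 'set'

# Model: which entries below a purged directory would be removed?
# on_disk: paths currently present below the directory
# managed: paths deployed by file resources or copied from the source directory
def purge_candidates(on_disk, managed)
  keep = Set.new
  managed.each do |path|
    # a managed file keeps all of its parent directories alive, too
    parts = path.split("/")
    (1..parts.size).each { |n| keep << parts[0, n].join("/") }
  end
  on_disk.reject { |path| keep.include?(path) }.sort
end

on_disk = [
  "/etc/exim4/conf.d/router",
  "/etc/exim4/conf.d/router/400_testrouter",
  "/etc/exim4/conf.d/router/999_leftover",
]
managed = ["/etc/exim4/conf.d/router/400_testrouter"]
puts purge_candidates(on_disk, managed)
```

Here only 999_leftover is reported, because nothing manages it.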
We're using this approach for all sorts of configuration directories, including:
  • APT's sources.list.d and apt.conf.d
  • Debian-Apache2's sites-available/sites-enabled
  • Debian-Exim4's conf.d (including subdirectories)
  • Bacula director/Munin configuration (in combination with the concatenated_file type)
For this to work, you need to do a little bit of work:
  • prepare a directory in your module filestore which will be the (usually empty) source directory
    I often put a README file in there, explaining what's going on.
  • add this code snippet for managing the target directory:
    file { "/etc/exim4/conf.d":
      ensure => directory, # so make this a directory
      recurse => true, # enable recursive directory management
      purge => true, # purge all unmanaged junk
      force => true, # also purge subdirs and links etc.
      owner => "root",
      group => "root",
      mode => 0644, # this mode will also apply to files from the source directory
      # puppet will automatically set +x for directories
      source => "puppet:///exim/exim4-conf.d-empty",
    }
    
  • add one or more file resources which deploy files into the target directory, example:
    file { "/etc/exim4/conf.d/router/400_testrouter":
      ensure => present,
      owner => "root",
      group => "root",
      mode => 0644,
      source => "puppet:///exim/exim4-conf.d/router/400_testrouter",
    }
    
Because puppet takes into account file resources which manage a sub-directory of the managed directory, it is also possible to define a sub-directory with unmanaged files, which will then not get removed - no magic involved here:
file { "/etc/exim4/conf.d/acl":
  ensure => directory,
  owner => "root",
  group => "root",
  mode => 0755,
}

tcpdump: go non-promiscuous

| No TrackBacks
A colleague of mine reminded me today that one does not want tcpdump to put the NIC into promiscuous mode (the default for tcpdump, turn it off with -p) when debugging problems. And really, there is no other use for tcpdump than debugging problems. (You don't sniff for fun, do you? And I'd want to use Wireshark in that case anyway.)

So, why is promiscuous mode a bad idea?
Because tcpdump will show you a very different truth: it will show you what's on the wire, but not which ethernet packets your machine really accepts under normal conditions. Normally the NIC only accepts packets whose destination address is set to the machine's ethernet address (plus broadcast and some multicast stuff, but I'm usually not interested in those). This will especially get you in trouble if you rely on the IP addresses in tcpdump's output to determine if "this packet is for me". You will fool yourself into thinking "oh, these packets all arrive here", and all your further conclusions from this point forward are wrong.

Typical situation for this:
  • Machine M is connected to Router R
  • Machine M has got more than one IP address, but only the first one is directly bound to the interface
  • You swap the ethernet card in machine M (or migrate the whole machine to new hardware, probably more common these days)
  • IP connectivity works, but the secondary IPs are not reachable
    • because router R caches the ethernet address for all IPs
    • only the primary/first IP got updated in the router's ARP cache (= IP address to ethernet address cache)
  • tcpdump will show you different truths:
    • without ethernet address display turned on, plus NIC in promiscuous mode:
      • shows that everything is fine
    • NIC not in promiscuous mode:
      • would have shown you that the packets don't arrive ...
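
The two views can be modelled in a few lines of Ruby. This is a toy model of a NIC's receive filter, not packet capture code; the MAC and IP addresses and the Frame structure are invented for illustration:

```ruby
# Toy model: does the NIC hand a frame to the host?
Frame = Struct.new(:dst_mac, :dst_ip)

def accepted?(frame, own_mac, promiscuous)
  return true if promiscuous
  dst = frame.dst_mac.downcase
  return true if dst == own_mac.downcase
  # broadcast and multicast addresses have the lowest bit of the first octet set
  (dst.split(":").first.to_i(16) & 1) == 1
end

old_mac = "00:11:22:33:44:55"   # NIC that was swapped out
new_mac = "00:11:22:33:44:66"   # NIC now in machine M

# Router R still has the old ethernet address cached for the secondary IP
stale = Frame.new(old_mac, "192.0.2.11")

puts accepted?(stale, new_mac, true)    # promiscuous: the frame shows up in tcpdump
puts accepted?(stale, new_mac, false)   # with -p: the frame never reaches the host
```

The destination IP is the same in both cases, which is exactly why looking at IP addresses alone is misleading here.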
I have failed to recognize this problem twice by now. The first time it got us quite some outage time. Yesterday I saw the symptoms, got the feeling that the ARP cache was acting up again, and finally resolved the issue, but I had no proof that this was the problem. And I could have had it... if I'd used non-promiscuous mode or turned on ethernet address display in tcpdump.

No more double-clicks

| No TrackBacks
During a shower thought, I realized that I'm no longer double-clicking for most of my computer usage.
What are double clicks useful for? Maximizing windows, opening files, starting apps, etc.
But I'm just not doing this stuff any more - window management is only a keyboard thingie with xmonad, maximizing windows on OS X is a single-click anyway, and on Windows I've moved to using the (single-click) buttons, too.
The other tasks are pretty much non-existent with xmonad (and no "desktop manager").
On OS X app starts only come via quicksilver/the dock; opening files can also be done using drag&drop and I'm increasingly doing this.

Collecting config from your Puppet clients

| No TrackBacks
I'm doing more boring stuff using Puppet this week, but there are some highlights anyway. I needed to configure Nagios/NRPE-checks on all clients for disk usage, process count, swap space and system load.
In our setup the NRPE daemon needs to have the warning and critical values for those on the client. Configuring this for new clients is trivial, but for old ones it's quite a bit of manual work to collect the old configuration (instead of deploying the defaults and see what happens - avoiding Nagios alert storms is ++).

Facter + storedconfigs wipes all the manual stuff away:
  • deploy a new fact which collects the interesting stuff from nrpe.cfg
  • write a simple SELECT against the puppet database
  • (optional) write a simple script which tells you, what "default config" actually means
  • enjoy your config values

The collector fact:
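Facter.add("nrpe_collect") do
	setcode do
		fn = "/etc/nagios/nrpe_local.cfg"
		str = ""
		# String#each is gone in Ruby 1.9+, so read the file line by line
		File.readlines(fn).each {|x|
			# keep only the check definitions we care about
			if x =~ /check_(disk|procs|swap|load)/
				str += ";" + x.chomp
			end
		}
		str
	end
end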
(Not the most beautiful ruby, but that took me like one minute to think and write.)

Query the puppet db (syntax for Postgres):
select hosts.name||' '||value from fact_values 
inner join hosts on fact_values.host_id=hosts.id
 inner join fact_names on fact_values.fact_name_id=fact_names.id
 where fact_names.name='nrpe_collect';
Results:
 vnode02.in.domain.at ;command[check_disk]=/usr/lib/nagios/plugins/check_disk -X nfs -w 20% -c 10%;command[check_load]=/usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20;command[check_swap]=/usr/lib/nagios/plugins/check_swap -w75% -c50%;command[check_procs]=/usr/lib/nagios/plugins/check_procs -w 1200 -c 1450
 vnode03.in.domain.at ;command[check_disk]=/usr/lib/nagios/plugins/check_disk -X nfs -w 20% -c 10%;command[check_load]=/usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20;command[check_swap]=/usr/lib/nagios/plugins/check_swap -w75% -c50%;command[check_procs]=/usr/lib/nagios/plugins/check_procs -w 1200 -c 1450
 (...)
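
To compare the collected values against the defaults, the fact value can be split back into single commands. A small Ruby sketch, assuming the ";"-joined format shown in the results above; the helper name parse_nrpe_collect is made up, and a command line that itself contains a ";" would be split incorrectly:

```ruby
# Split a nrpe_collect fact value into { check name => command line }.
def parse_nrpe_collect(value)
  result = {}
  value.split(";").each do |entry|
    # entries look like: command[check_disk]=/usr/lib/nagios/plugins/check_disk ...
    if entry =~ /\Acommand\[(\w+)\]=(.*)\z/
      result[$1] = $2
    end
  end
  result
end

value = ";command[check_disk]=/usr/lib/nagios/plugins/check_disk -X nfs -w 20% -c 10%" \
        ";command[check_swap]=/usr/lib/nagios/plugins/check_swap -w75% -c50%"
parse_nrpe_collect(value).each { |name, cmd| puts "#{name}: #{cmd}" }
```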

This will probably result in the following manifest tomorrow:
In our allhosts class:
# nagios plugin for disk usage
if $disk_warning {
	$disk_warning = $disk_warning	# need this for puppet <0.24.6
} else {
	$disk_warning = "20%"
}
if $disk_critical {
	$disk_critical = $disk_critical
} else {
	$disk_critical = "10%"
}
nagios::plugin { "check_disk":
	check_script => "check_disk",
	args => "-w $disk_warning -c $disk_critical -X nfs",
}
And the clients which are determined to have special values will get this in their node files:
	$disk_critical = "5%"
	$disk_warning = "10%"

Puppet 0.24.4 + Passenger in production

| No TrackBacks
Today we moved our Puppetmaster 0.24.4 installation to Passenger.

We had previously been running just plain WEBrick, and after adding a few more clients yesterday, we ran into some troubling issues. A few clients just failed fetching files from the fileserver with "Connection reset by peer" errors.

Those errors seem to be gone now, and a few short puppetrun-s show that 6 master processes handle our (for now) 30 clients fine - and quick.

storedconfigs got us into some trouble at first: after the first client run, the master failed with a PGError saying that the PostgreSQL connection went away. I band-aided this with an ActiveRecord::Base.remove_connection in rack.rb after the client request has been executed; this should not do any harm, and works fine so far.

Puppet: Exported Resources

| No TrackBacks
As of today we're using Exported Resources to let our Munin and Bacula servers know about their clients.

It's really easy to set up. Enable stored configuration on the puppetmaster, create a resource the client exports and a place to collect them in the server config.

Looks like this for the client node config:
  @@file { "/var/local/puppet/munin-nodes/$fqdn":
    content => "[$fqdn]\n other munin stuff here",
    tag => "munin",
  }
And for the server node:
  File <<| tag == 'munin' |>>

So, what does this do, really?
  • when puppet runs on the client node:
    • encounter the @@file resource
    • save the encounter as well as the parameters to the storedconfigs db on the puppetmaster (in our case PostgreSQL of course).
    • that's it for the client node
  • when puppet runs on the server node:
    • encounter the File <<||>> directive
    • query all the stored @@file encounters from the storedconfigs db
    • only those matching the specified tag will be used
    • realize all the matching files onto the server node
    • => lots of files in /var/local/puppet/munin-nodes/
Easy, huh.

Note though, that the client node does not send a fully realized template back to puppetmaster, but will send the encounter of the @@file resource and the available $variables etc.
Also note that updates to the @@file resource will only become visible on the server node, after both the client node and the server node had a puppet run. (The exporting client node run must come before the server node run.)
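
The ordering constraint is easier to see in a toy model. The Ruby sketch below imitates the storedconfigs database with a plain array; it says nothing about how Puppet implements this, and the class, method and host names are invented:

```ruby
# Toy model of exported resources: clients export, the server collects by tag.
class StoredConfigs
  def initialize
    @exported = []
  end

  # client run: record (or update) the exported resource
  def export(host, title, params)
    @exported.reject! { |r| r[:host] == host && r[:title] == title }
    @exported << { :host => host, :title => title, :params => params }
  end

  # server run: the equivalent of File <<| tag == '...' |>>
  def collect(tag)
    @exported.select { |r| r[:params][:tag] == tag }
  end
end

db = StoredConfigs.new
puts db.collect("munin").size   # server runs first: nothing to realize yet
db.export("web1.example.com", "/var/local/puppet/munin-nodes/web1.example.com",
          { :tag => "munin", :content => "[web1.example.com]\n" })
puts db.collect("munin").size   # after the client run the file shows up
```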

Setup notes for puppetmaster on Debian etch:
  • You probably already run puppet and puppetmaster from backports.org.
  • That version requires the rails package from testing. It's not in bpo, so either fetch it from testing and directly install it or rebuild it yourself on etch (needs 2 or 3 other packages as well, _if_ you rebuild it). Rebuilding was painless though.
One more thing: if you want to manage the munin server, you'll have to use something like concatenated_file [from git.black.co.at] to generate munin.conf (as munin can't include a directory into its configuration).

Detecting OpenVZ

| No TrackBacks
Reliably detecting if there is an OpenVZ Environment is pretty easy:

Just check for existence of /proc/user_beancounters.

But this will only tell you that OpenVZ is there. It won't tell you, if you are inside an unprivileged Virtual Environment (VE) or on the privileged Hardware Node (HN or VE0).
Still an easy check:

Read /proc/$PID/status and check for "envID: $VEID". $VEID will be 0 for the Hardware Node (hence the VE0 name). If it's greater than 0, you are inside an unprivileged VE.

Facter 1.5.3 will probably have support for this.
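
Both checks fit into a few lines of Ruby. This is a sketch of the procedure described above, not the code that went into Facter; the function names and return values are made up, and the parsing assumes the "envID:" line format mentioned above:

```ruby
# Extract the VEID from the contents of /proc/<pid>/status, or nil if absent.
def parse_envid(status_text)
  status_text.each_line do |line|
    return $1.to_i if line =~ /\AenvID:\s*(\d+)/
  end
  nil
end

# :none    - no OpenVZ
# :hn      - hardware node (VE0)
# :ve      - inside an unprivileged VE
# :unknown - OpenVZ is there, but no envID line was found
def openvz_role(beancounters = "/proc/user_beancounters", status = "/proc/self/status")
  return :none unless File.exist?(beancounters)
  veid = parse_envid(File.read(status))
  return :unknown if veid.nil?
  veid == 0 ? :hn : :ve
end

puts openvz_role
```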

Puppet 0.25.x + Passenger

| No TrackBacks
In my previous entry about Puppet I was talking about using Puppet 0.24.6 (unreleased) inside Apache. (More specifically: running puppetmaster with Passenger in Apache).

I've now got a working code base for Puppet 0.25.x + Passenger.

See for yourself:
http://github.com/zeha/puppet-rack/commits/feature/rack/

Unfortunately it seems like the current Puppet 0.25.x client can't talk to the corresponding master (regardless of WEBrick or Passenger), as not all required handlers/methods are implemented in the new REST interface.

HP ProLiant Resources

| No TrackBacks
Your tools of the trade if you're working with HP ProLiant 3xx+ hardware. I've only got DL360s, DL380s and a few DL320s here at work, so I can't say anything about the bigger ones.

HP SmartStart or Firmware Update CDs
You should know them. SmartStart usually comes with the server, but hp.com obviously has newer versions. SmartStart comes in handy when configuring complex RAID stuff (more than one logical drive per array, something you can't do from the ROM-based tool). Firmware maintenance CDs are one-shot firmware upgrades for your machine, so get them from hp.com too.
Yep, you can PXE-boot those. This HP-ITRC forum entry has the details including an awesome PDF (local copy of awesome pdf).

My recommendation: have both CDs on your boot server (and in your toolbox), but don't rely on the firmware update being functional when booted from the network - I saw problems when the NIC firmware got updated. Also I had problems with NFS with Release 8.20, but CIFS worked fine for me.

Also, keep copies of the older CD versions, if you've got older hardware. HP sometimes drop support for older hardware from newer CD releases. (Space constraints, etc.)

QuickSpecs: Hardware Specifications
Bookmark the QuickSpecs links for your hardware. Need to know exact physical dimensions? Maximum RAM module count? How RAMs need to be installed for your required RAM configuration? That one box is maxed out on CPU and you need to find faster CPUs? QuickSpecs have the answers. 

Proliant ILO2 Hardware Health Monitoring using Nagios

IT Resource Center
Hosts Warranty Check, KB, Support Case Manager. You will need one of them at some point.
ITRC logins are not the hp.com passport, so one more login to save.

Care Packs / Hardware support / Parts replacement
Care Packs are warranty extensions. And sometimes also software support.
Lookup tool for your hardware: Here. IE only. Doesn't work sometimes. Have hardware Part and Serial number ready.
If your product is wrong in their database, mail them, they can fix that. (I had an MSA60 recognized as a 6412 enclosure.)

Your hardware QuickSpecs tell you about the included warranty (for most Proliant models this is 3Yrs 9x5 NBD by now), and what Care Pack options are available.

Also, when contacting HP always have your hardware P/N and S/N ready, you'll need them. Sometimes you will also need the purchase date and (maybe) a copy of the invoice (mostly only if you are approaching end of warranty period). Best to document all of this when you deploy the server for the first time.
Store your HP customer number in your documentation. Saves you quite some time on the phone.


Another time I'll talk a bit about the basic software components that ship with ProLiants.

Debian Installer preseeding: autostart from PXE

| No TrackBacks
Assume that you have a fully working Debian Installer preseed configuration. Your x86 target machines do not have CD-ROM drives (and handling CDs is cumbersome), booting from elsewhere is not really an option.

Solution: boot from PXE.
DHCP, tftpd-hpa and pxelinux are set up easily in just a couple of minutes.

Here's the pxelinux.cfg configuration as needed to make the installer enter silent mode:
label auto
kernel debian/etch/amd64/linux
append vga=normal initrd=debian/etch/amd64/initrd.gz DEBCONF_DEBUG=5 -- auto url=http://debian.namespace.at/d-i/etch/./preseed.cfg locale=en_US interface=auto console-keymaps-at/keymap=us debian-installer/country=AT hostname=installme domain=namespace.at
This will set up an English/US locale, US keyboard, set the preseed.cfg path and kick off installation. Hostname and domain are optional, but can be used to override the values from DHCP. DEBCONF_DEBUG=5 is quite useful to see what's currently happening. While the installer is running you can switch to console 4 to see what's going on. After the installation has finished you can take a look at /var/log/installer and see what happened. Saves quite some time while debugging late_scripts.

Debian Installer preseeding: partitioning

| No TrackBacks
For automated Debian installations you usually have two choices:
  • scripted install
  • image based
I actually don't like image based installs - they are usually a pain to update (with security updates, etc). For our automated installations I therefore chose a scripted approach, based on preseeding.
Most things are quite obvious to implement, but there are a few tricks still.

Partitioning is one of the tasks which is really tricky. It's so easy to get something wrong, and the installer will just not tell you why it failed.

What I wanted to achieve:
  • 16GB root fs on the first HP SmartArray device, outside LVM
  • rest of the first hpsa device as an LVM PV
  • one LVM VG called "vg1"
  • 1GB swap inside that
  • leave the other devices untouched
After taking a look at the relevant docs it looked like I could preseed the LVM stuff, but in the end I gave up doing that. Time needed to figure out what's wrong just isn't worth it.

So I ended up doing a simple recipe, which creates a 16GB rootfs (becoming /dev/cciss/c0d0p1) and a swap partition. The swap partition usually ends up being a logical partition, spanning the rest of the blockdevice (becoming /dev/cciss/c0d0p5).

Example:
d-i partman-auto/expert_recipe string regularvnode :: 16000 16000 16000 ext3 $primary{ } $bootable{ } method{ format } format{ } use_filesystem{ } filesystem{ ext3 } mountpoint{ / } .  100 10000 1000000000 linux-swap method{ swap } format{ } .


This example actually works. So save it for reference!


In the late script I then run parted to drop the swap partition and prepare LVM:

Example late_script:

echo "Configuring LVM"
swapoff -a
swapoff /dev/cciss/c0d0p5
parted /dev/cciss/c0d0 -- rm 2
parted /dev/cciss/c0d0 -- mkpart primary ext2 16GB -1s
parted /dev/cciss/c0d0 -- toggle 2 lvm
pvcreate /dev/cciss/c0d0p2
apt-install lvm2 # make sure target knows about lvm
All of this was only tested with Debian etch. The lenny installer has a new share of problems, and I haven't successfully seeded it yet.

One thing to know: you can't leave out the swap partition. While this works when doing a manual install, it doesn't when preseeded. In my experience the installer would just loop endlessly in the partitioner.

Puppet + Passenger

| 1 TrackBack

I've been working the last few days on getting puppetmaster (the puppet server) running inside Apache, using Passenger.

Why Passenger?

Simple answer: for the performance. We currently have over 150 servers (many of them virtual) to manage. Right now only a small subset of these servers is running the puppet client, but I'm looking forward to the point where we will manage all of them using puppet.

The puppet docs have to say this about scaling:

Mongrel scales much better than WEBrick, at least partially because it allows you to run multiple processes serving the same pool of clients on the same host. WEBrick only uses Ruby's threading, which does not scale beyond one processor, and it appears that WEBrick starts dropping connections beyond about 2 concurrent connections.

If you're getting connection-reset or End-of-file errors, you should try Mongrel. As more people try it and it proves to be stable, it will eventually become the preferred serving platform for the master.

While I understand that WEBrick is more or less just a development web server, I also know from other projects that Mongrel just doesn't cut it. The puppet way of running mongrel also seems to be even more cumbersome than running mongrel with mongrel-cluster. But, in any case, there is no one monitoring your mongrel processes to see if they die, and to restart them if they do. And I have already seen lots of mongrels dying for various reasons. (None of them were puppet mongrels though, I didn't even bother trying that.)

First Results

The first result of my effort is a fully working puppetmaster for puppet 0.24.x running as a Passenger app. Technically, it's behaving like a rack application (and my config.ru is using the rack library), so Passenger just auto-discovers it and launches a puppetmaster instance on the first client connect.

All the usual Passenger configuration should apply, including process limits etc.

Caveats

You may wonder how SSL is handled in this configuration - Apache handles it, just like in a puppetmaster with mongrel setup. This has a few implications: Apache won't start up if the standalone puppetmasterd never started up and created the SSL certificates and CA. Everything else should work just fine.

There's also another catch: Passenger will not start an application as root, but always as the designated application user. Therefore puppetmaster will not create all the usual stuff (== no manifest check). This needs to be done by the standalone puppetmasterd, at least once.

Trying it out

What about 0.25.x?

I'm still working on that. 0.25.x changed the whole server side, so I've got a lot to do here.