tl;dr: Sanity-checking your input should go beyond making sure the input is acceptable. Try to predict contextually-likely mistakes your user will make (even – especially! – when your user is you) and prevent them, particularly when your operation is destructive.
I’m back on a macOS machine more of the time again, which means I’m using Music.app (née iTunes) more again. It’s not my favourite, but I don’t hate it. (Actually, I normally use Plexamp, but my Plex server is offline at the moment – long story.) My biggest frustration with Music.app has always been the lack of native FLAC support, which prevents me from simply network-mounting my music library from my file server and playing it back directly. I used to use Max to batch-convert FLAC to MP4-ensconced AAC (.m4a), because CoreAudio has about the best AAC encoder going, and afconvert is too inscrutable to use directly. But these days, FFmpeg’s native encoder is good enough, so I wrote a simple shell script to:
find all FLAC files under the current directory and – in parallel – use FFmpeg to convert them all to AAC-in-MP4;
find all M4A files under the current directory and use AtomicParsley to populate their album art from an artwork file in the same directory;
delete all non-M4A files under the current directory;
add everything left under the current directory into my Music.app library (by moving it into ~/Music/Music/Media.localized/Automatically Add to Music.localized/).
If those last two steps haven’t already set off alarm bells in your head, they should’ve! That’s some pretty destructive stuff, particularly because:
I’m applying it to the current directory, not some fixed “ingest” directory, and
this script probably lives somewhere in my $PATH, so I can invoke it (maybe accidentally) from anywhere.
In fact, I have accidentally invoked it, from my Downloads directory (which contains nothing of consequence), and from my home directory (which contains approximately everything of value stored locally on my machine). But: I didn’t lose any data in either instance. Why? Because I thought ahead, and added two preconditions to the conversion script:
No files are permitted in the immediate working directory. When I convert music, I do so album by album, not collections of loose files, and those albums are always in their own directories. E.g., I’m always working from a structure like this: . ./Casualties of Cool/Casualties of Cool (2014) ./Casualties of Cool/Casualties of Cool (2014)/CD 1 ./Casualties of Cool/Casualties of Cool (2014)/CD 1/01 - Daddy.flac ./Casualties of Cool/Casualties of Cool (2014)/CD 1/02 - ... So a stray file at the top level of the working directory is distinctly abnormal, and might mean that I’m in the wrong place.
The number of FLAC files (files with names ending in .flac) contained within the working directory must exceed the number of all other files.
These checks take more space in the script than the actual conversion work (15 lines vs. 4). And they’re totally worth it.
$WORKPLACE is using Hasura for a project. The core server component of Hasura – the “GraphQL Engine” – is distributed only as a Docker container. This works well enough on my Linux machine, but I have a strong aversion to running Docker on non-Linux machines. I find idea that containers can still be considered “lightweight” when they’re running inside a full-fat VM to be a bit laughable. I prefer to run things on “bare metal” in my development environment where I can. And while Hasura don’t distribute first-party native binaries, there’s hypothetically nothing stopping us from building our own; it is, after all, open source (APL 2.0). So let’s do that.
The Engine is written in Haskell, so as a prerequisite, install ghcup. If it’s been added to your PATH correctly, you should be able to run ghcup list and get a list of available Haskell compiler (GHC) and build/package manager (Cabal) versions.
$ git clone https://github.com/hasura/graphql-engine.git # Make sure we're building a specific release version, not just the master branch. $ git checkout v1.3.3 # The server-side source code is all in the server/ directory. $ cd graphql-engine/server
The Haskell build manager, Cabal, is almost capable of building the project without intervention. The fly in the ointment is that Hasura seems to be quite picky about the GHC version it’s built with, and Cabal isn’t high enough up the dependency food chain to pick which compiler version gets used. So instead, we’ll need to pull desired GHC and Cabal versions out of the freeze file, and install/set those as the defaults with ghcup:
# Install the GHC version corresponding to the required Haskell language base version. $ grep 'any.base ==' cabal.project.freeze any.base ==22.214.171.124, $ ghcup install ghc base-126.96.36.199 [ Info ] downloading: https://downloads.haskell.org/~ghc/8.10.1/ghc-8.10.1-x86_64-apple-darwin.tar.xz ... [ Info ] GHC installation successful $ ghcup set ghc base-188.8.131.52 [ Info ] GHC 8.10.1 successfully set as default version $ grep 'any.Cabal ==' cabal.project.freeze constraints: any.Cabal ==184.108.40.206, $ ghcup install cabal 220.127.116.11 [ Info ] downloading: https://downloads.haskell.org/~cabal/cabal-install-18.104.22.168/cabal-install-22.214.171.124-x86_64-apple-darwin17.7.0.tar.xz ... [ Info ] Cabal installation successful $ ghcup set cabal 126.96.36.199 [ Info ] Cabal 188.8.131.52 successfully set as default version
Before you can off and run make, you’ll also need a couple libraries installed: unixODBC and libpq are the important ones. On a Mac, you can brew install unixodbc libqp.
Then you can kick Cabal into action:
$ cabal v2-update (git junk happens) Downloading the latest package list from hackage.haskell.org To revert to previous state run: cabal v2-update 'hackage.haskell.org,...' $ cabal v2-build (git junk happens) Resolving dependencies... Build profile: -w ghc-8.10.1 -O1 In order, the following will be built (use -v for more details): (looong build process happens) Building executable 'graphql-engine' for graphql-engine-1.0.0.. Linking /path/to/graphql-engine/server/dist-newstyle/build/x86_64-osx/ghc-8.10.1/graphql-engine-1.0.0/x/graphql-engine/opt/build/graphql-engine/graphql-engine ...
And that’s it. The long path given by the last line (“Linking…”) is the final executable. It’ll be about 64MB. You can run it as-is and it’ll complain about missing arguments. The documented configuration arguments and environment variables are really just passed through to the binary running in the container, so they’ll work just fine with the binary running out of the container, too. For example, to connect to the local database testdb as the user testuser, and enable the console:
Note that the graphql-engine executable is dynamically linked against a number of other libraries:
$ otool -L /path/to/graphql-engine /path/to/graphql-engine: /usr/local/opt/postgresql/lib/libpq.5.dylib (compatibility version 5.0.0, current version 5.13.0) /usr/lib/libz.1.dylib (compatibility version 1.0.0, current version 1.2.11) /usr/lib/libiconv.2.dylib (compatibility version 7.0.0, current version 7.0.0) /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1252.250.1) /usr/lib/libcharset.1.dylib (compatibility version 2.0.0, current version 2.0.0)
Not very portable. To physically relocate these libraries and modify the linking of the graphql-engine executable for easier distribution, you can either do this yourself with install_name_tool, or use something like locallink to do it for you.
A question came up on reddit: why is my webpage being interpreted with the incorrect character encoding? The question (which has since been removed, or else I’d link to it) involved some specifics about how the page was being served, and, paraphrased, the answer was that PHP-generated pages were served with an HTTP Content-Type header which included encoding information, and static HTML pages weren’t.
But the markup included a <meta http-equiv="Content-Type"> tag. Shouldn’t the encoding have gotten interpreted correctly regardless of the Content-Type header, then? I threw off a quick, cargo-cult-tinged remark about being sure to place a <meta> tag to specifying character encoding before the <title> tag, and someone else said they thought it was interpreted anywhere in the first 128 bytes of the document.
Rather than continue to perpetrate hearsay and questionably-shiny pearls of wisdom, I’m going to try to nail down some factual observations about the behaviour of content-encoding-information-bearing <meta> tags. I want to know how the tag interacts with real HTTP headers, how it behaves when it’s placed at different offsets in the document (before rendered content? After rendered content? A long way after the start of the document?), and whether different browsers handle it differently.
It’s been a slower couple of weeks. M is back into her studies, so my own days are following a bit more of a pattern. I still regular events keeping me up until the wee hours with people back in Canada on Tuesday and Thursday nights, but I’m trying to get in and out of bed at reasonable times otherwise. I went through a few days where I was pretty into Slime Rancher, and now Newto is pulling me back into TIS-100 and Infinifactory. The end of the summer is visible. Days are slowly getting shorter; it’s dark before 22:00.
Cottage cleaning continued
With the cottage empty for a few days and a bit of better weather, I undertook to power wash the paving tiles around the cottage.
Satisfying results, for sure. Need to do around the house next.
A café in Aberlour has started serving bubble waffles, so of course we had to take a walk down to try them. Hard to measure up to Moo Shu, but it was pretty good. In my excitement, I neglected to take any photos; but I did get a couple on the walk back up to the house.
Employment and writing
It’s time to start thinking about the fall. M is planning on moving to Strathkinness for the winter, and I need to start looking for something to do – something which will look good on a rental application. “Occasional open source contributor” probably isn’t enough. And I’m struggling to push ahead on that work, anyway, without anyone else interested in me doing it. Guess it’s time to start looking for real work. I started poking around for contract opportunities on Upwork. Lots of people looking for lots of things I can’t do. Imposter syndrome hits hard and fast. It’s the sort of thing to make one dream of switching careers. Too bad my obvious backup, live sound, is still in such a overwhelming state of COVID-induced decline.
I wrote an exploratory article on the performance of a particular design decision I made in passing on a project years ago. I don’t really remember what I was doing which brought it to mind, but I’m glad it was did. It was fun to write up, especially because I hadn’t put much thought into the performance of the choice at the time: it was originally stylistically-motivated. While writing the code samples for the post, I found that the amount of material I wanted to provide was unpleasantly long to read casually. Rather than provide a condensed version of the code in the post and link to a longer form, I annotated the unimportant sections and wrote a plugin to control visibility. I’m still struggling with the language to use to describe this feature: is it expansion? Compression? “Squeezing” is the term I used in the plugin name, but that was mostly for the pun value. And what are the bits that get hidden? Layers? Blocks? Expansions? Would like to get back to it and give it a proper readme, a configuration interface, and release it to the WordPress Plugin Directory.
Cullen and the Knock
We spent most of yesterday afternoon in Cullen, sitting on a bench overlooking the harbour. We had lunch, and M read while I watched the goings-on and enjoyed the day. People watching isn’t usually my thing, but people don’t usually move about that much, either.
In the evening, we climbed the Knock and had a picnic supper. That’s Ben Rinnes in the background.
It’s been a beautiful couple of days here. Clear, bright and sunny, just enough of a breeze.
I’d like to land at least one contract on Upwork, even if just something small.
I’d like to beat (or at least match) Newto’s cycle count on Interrupt Handler.
I’d like to write two more sections for the library logging analysis project/post I started on months ago.
I’d like to at least look into building a universal macOS binary for NRJS to support both x86-64 and ARM64.
I’d like to go at least one whole day without having an existential crisis.
Rebooting Seefilms by way of DCGSFT, and on the use of Syncplay
This is the eleventh summer some friends and I have run the DCGSFT drama group. Due to COVID-19 inhibiting its usual format, the group isn’t doing a performance this year; instead, we’re taking the opportunity (and advantage of everyone’s newly-developed familiarity with video conference calls) to do group readings of plays. It’s not very formal; whoever shows up on the call gets a part, and parts are assigned randomly (and can rotate from scene to scene). So far, we’ve read through Dear Brutus by J. M. Barrie, and Saint Joan by George Bernard Shaw.
This week, we took a break from reading plays, and watched a film adaptation of one instead: the 2002 adaptation of The Importance of Being Earnest by Oscar Wilde. We performed this play last summer, so it was pretty familiar to most; beyond that, I’d already re-read the script and watched this cinematic adaptation a month or two ago with KJJ’s Windstone book club. It was neat to rewatch it again with a different crew, and see different things pulled out of it.
Outside of the actual watching, there was a little technical effort involved on my part to get things set up for synchronous watching. I wanted to do something a little more structured than someone in the call doing a countdown and having everyone try to hit “play” at the same time. And because some people watching (self included, but also our fearless director) are doing so from fairly crappy Internet connections, I wanted a system where some people could stream the movie directly, while other people could download a copy ahead of time. The obvious solution is Syncplay: it’s free (in both of the usual senses), it’s cross-platform, and it supports a number of media players.
Syncplay, however, is only one piece of the puzzle (the other being a media player). Setup requires a little more attention than I (pessimistically, and perhaps uncharitably) expected all participants to be able to muddle through by themselves. I spent several hours trying to package up a turnkey distribution of Syncplay, mpv, and a configuration file for Syncplay which would automatically log into the right server and room with a preconfigured username. The intent was to throw together a web interface which would ask for a username, then patch the configuration file, and deliver a customized archive to the user. Unfortunately, the Syncplay configuration dialog doesn’t appear to be skippable. When loading the configuration file, Syncplay substitutes default values for unspecified configuration options, so I could generate a file specifying only the things I care about (username, server, room, etc.) – but the configuration dialog would always pop up anyway when launching Syncplay. Rightly or wrongly, I felt that if I couldn’t totally achieve my objective of a one-click launch then it wasn’t worth trying to build my own package at all. In the end, I wrote some instructions for installing the individual bits and pieces, and it worked out okay. We had a few issues with the stream pausing and spuriously rewinding/skipping, and I’m not sure it was really worth the effort to get everyone to use it; but it did function in the end.
Most of the big effort of the week went into cleaning up around the cottage in anticipation of the arrival of the first Airbnb guests of the season. This involved the usual mundane deep-clean things we hadn’t done earlier in the year, like windows and the fridge, along with extra, pandemic-related disinfecting.
Earlier in the year, M and I cut down a whole bunch of Scotch broom near the cottage:
We had piled it all the in cottage driveway, because we thought that would be the easiest place from which to load it all into the trailer and take it to the dump. Unfortunately, the dump has stopped taking trailer-loads for now, so there the pile sat. We ended up hand-carrying it down the hill and re-piling it in the main parking lot next to the drive shed.
The trouble with cleaning things is that you keep finding more things to clean. I took a brush to the garden bench thinking I could quickly knock some of the moss off it; a couple hours (and one serious application of wire brush and elbow grease) later, the bench is much cleaner, but really ought to be properly sanded and stained. To be done some time when there’s no one in the cottage to miss the bench – maybe in the fall.
M spent a lot of her time trying to beat the overgrown cottage garden into submission. In particular, one large rosebush had pulled itself off the wall and needed reanchoring. And a barrel planter needed replanting; and the interior of the cottage is much improved for the addition of several little plants. It’s not a great time to be cutting things back, horticulturally, but you’ve gotta do what you’ve gotta do. My thumbs are pretty brown, so I keep getting pretty surprised by what a few green things can accomplish.
It does feel really good to get this stuff off of the list. Some of it has been on there since we got here.
I’d still like to do the things which I said last week that I wanted to do this week. This week has been pretty physically demanding, so it was hard to muster the interest in expending much mental effort even when I did get the time. Hopefully, with the cottage pressure off, I’ll feel a bit more like tackling some deeper technical challenges in the coming week.
I made steaks (and frites). First time trying a reverse sear. The ribeyes we got were a bit thinner than necessary to take full advantage of the technique, but it was a good first stab at it. Even if they did come out closer to medium-well, they tasted phenomenal, and stayed wonderfully juicy. I’ll definitely be doing that again. I was a bit too ginger with the temperature for the fries, and while they were well-cooked and seasoned, they were a bit on the limp side. Would’ve been a good first pass for double-fried fries, but I was trying to do it in one. Live and learn.
M found what she believed to be an expansive patch of wild blueberries in the woods, and, after confirming that they were indeed wild blueberries, we collected probably half a cup of them. A bit bitter by themselves, but fantastic with (somewhat more than half a cup of) ice cream.
We repeated the experiment with raspberries from the garden, and appreciated the results similarly (although not quite as much – the blueberries really were great).
I’ve been mulling over whether and how best to get involved with the local community radio station. I mocked up a new website layout, but it’s not done yet and I ran into a few snags, so I put it back on the shelf. I did build prototype apps for both Android and iOS:
I tried to build something functionally equivalent to this many years ago for Axe Radio, but never got it finished. This seemed like a good opportunity to prove to myself that I could do it, given the time.
I ran into frustrations building both. The Android player was built over parts of three days. The first day was a false start based on the template Android Studio gives you if you start a new app and accept all of its default proposals. I ended up with Kotlin, which sounds fine, and the AppCompat/AndroidX support libraries. That sounded all well and good (I was looking forward to playing with Kotlin), but combined, those options carry over 2MB overhead with them in your published app package. To me, for an app which should be able to get by on media playback functionality built into Android for years now, that’s unacceptable. I spent the rest of the day in a new, Java-based project trying to pare back the Gradle build definition to a point where I could understand everything it was doing.
On day two, I got down to business wiring things up. On Android, you need to manage your own background task for long-living audio playback. It’s not hard, but it is a little tedious to get set up properly. Beyond that, the built-in programmatic media player, the aptly-named MediaPlayer, doesn’t give very good error messages. I stopped in frustration when I got to the point where older versions of Android were working fine, but Android 9+ failed with an obscure error code (1, meaning MEDIA_ERROR_UNKNOWN, with extra information -2147483648, meaning MEDIA_ERROR_SYSTEM – helpful and specific, I’m sure you’ll agree).
On day three, I proved that my implementation worked on some level when I succeeded in playing back media embedded in the app. After much fumbling and floundering, I tried side-stepping MediaPlayer’s built-in HTTP retrieval mechanism: I made my own HTTP request for a fixed-length chunk of the audio stream, and fed the bytes into MediaPlayer. And… it didn’t even make it to MediaPlayer, because I got an error making the HTTP request. As of Android 9, all HTTP traffic must be secure (HTTPS), or you need to configure your app to allow plaintext traffic. And yet, I had already configured that back on day 2. Turns out that redeploying the app through the Android Studio debugger may not cause the device to pick up on changes to the app manifest. After I uninstalled the app from the emulated device and reinstalled it, it worked perfectly. Aaugh.
As for iOS, I ran into slightly similar backwards-compatibility issues trying to start the project. The default Xcode template for a new Swift app relies on SwiftUI, which is only available as of iOS 13. I wanted to target something at least a version or maybe two older than that. Even after starting afresh with a Storyboard-based app, I still had to rip out a lot of references to scenes before it would compile. After that, it was mostly straightforward to get something working. iOS has similar restrictions on HTTP content in recent versions, but it gives helpful error messages about it. It’s also is much more restrictive about what can run in the background than is Android, but the trade-off is that the process for doing that background work much simpler: tick off the “Audio, AirPlay, and Picture in Picture” background mode permission, and your AVPlayer will do the right thing. No coordinating state between your app’s UI and some background task/service, because it’s managed for you. Much less flexible, but much easier to do – as long as what you want to do is on the path well trod. Then again, not in all cases: error handling for AVPlayer is a mess, and the iOS version of the app mostly doesn’t do it right now. HTTP errors from the source (e.g., 404 if the stream is offline) are handled separately from playback/stream errors (e.g., bad decoding), and are accessed through clumsy, Objective-C-style key-value observers. Even knowing when the player has started actually playing back media versus just starting to buffer it is tedious.
On one hand, it’s nice to have knocked out proofs-of-concept for both of these. On the other hand, neither experience was welcoming. Both Android Studio and Xcode were massive downloads on my wee, limited Internet connection here. Android Studio started off better, with a “mere” 850 MB installer, but immediately after install kicked off downloading a few more hundred megs of updates and plugins. Then, to emulate a device, you need to download an image for each Android version you want to run, which are 700 MB to 1.1 GB. Total weight, by the time all was said and done, was probably in the neighbourhood of 5.5 GB. Xcode, on the other hand, is a single massive 7.8 GB package, but it does have all of the bits in the box, so to speak.
I don’t like using either IDE much. IntelliJ (even for more traditional Java projects) always feels like death by a thousand cuts, with every little thing being just a little different than I’m used to. And without getting into its performance issues, Xcode continues to make worse and worse use of screen real estate. I feel like I could see more code on the screen at once in the QBASIC editor on a CGA monitor than in a contemporary Xcode session. This whole experience has been a good reminder of why I never finished doing this for Axe Radio: too much tooth-pulling to go through voluntarily, unless you’re feeling particularly stubborn.
I also sent an e-mail to KCR asking if and how I could get involved. Haven’t heard back yet, but maybe something beyond my own edification will come of this, eventually.
I added a “The webcam is located at…” note to the webcam page. I realized that I’ve been sending this link to friends who don’t really have a precise idea of where I am, so this answers the question nicely.
I’m hoping to get a timelapse interface built soon. I have weeks of images now at five-minute intervals, and it would be fun to be able to browse through them. In addition to the obvious daily timelapse (animation of all of the images from the last 24 hours in sequence), I think it would be neat to do a time-by-day timelapse, showing images from (around) the same time across multiple days. It wouldn’t look as fluid (the clouds would jump around a lot), but it would be an easy way to compare day-by-day weather.
I’m pretty frustrated with the reflections on the inside of the window. I’d really like to put the camera outside the house. The webcam I’m using (Logitech C920) is not weather-protected in the slightest, but maybe I can build a box to put it in. Or maybe I can replace the plastic housing with aluminum. Then I could put a nicer lens on it, too. But that’s a fairly expensive option, as cool as the result would be. I’ve also thought about replacing the camera entirely with a security camera. Some of those are pretty cheap. But they can also pretty sketchy, with unknown data leakage, and sometimes the image can only be accessed through some cloud service, not directly. A CUBE would be perfect, only that it isn’t out yet. I could even resort to a Raspberry Pi High Quality Camera + a lens + a PoE HAT + some waterproof case, and have a self-contained unit. But that’s probably the most expensive solution of the lot.
M and I climed up Ben Rinnes.
A difference of 525 m in altitude from base to peak. Took minutes shy of three hours to make the round trip, with about an hour and three-quarters of that being the trek up, and maybe ten or fifteen minutes spent at the top.
We were pretty spent afterwards, but after staring out the window at it for a few months, it felt pretty good to climb it and look down from the opposite perspective.
I’m hoping to get back to some research/analysis I was doing for NRJavaSerial. It would also be nice to finish up error handling for the iOS version of the KCR app, and maybe start on design work for both the Android and iOS versions. Displaying some information on the currently-playing show and music would be nice, too.
I was making dinner and needed to time something. My phone already had another timer going, and running two timers simultaneously is clearly beyond the capacity of a modern smartphone, so I reached for my laptop instead. And as I Googled “macos desktop timer” (because having any timer at all built into macOS is clearly an extravagance, and not one Apple has seen fit to bequeath us with), my Wi-Fi dropped. So I grabbed a terminal, and ran:
1 and a bit cups of farfalle or maybe rotini? Idk I’m not Italian
Get home a bit late from work. Decide you want pasta. Realize that if you wanted pasta, you should’ve probably gotten something out of the freezer this morning or you’re going to have to suck it up and go to the grocery store. Do neither and submerge the freezer-bag-ensconced ground beef in tepid water with the hope that it’ll defrost quickly.
Begin veg prep. Do something somewhat like mincing to the mushrooms; that’s cool, right? Crush the garlic with the flat of the knife because it’s satisfying then question how much that really affects anything in the overall swing of things while continuing to mangle it finer with the actual, sharp, you’re-supposed-to-use-this-part of the blade.
Get bored waiting for the beef to thaw. You’re going to cook it pretty much well-done anyway, right, so why not just let it continue to thaw in the pan? Toss it in at medium-high heat.
Ah, that’s why not. Go throw in some olive oil much too late in the process (“It’s medium so it’s got lots of fat in it, right?”) and get the batteries out of the fire alarms (“Oh, that’s where that one is!”). Do your best to tear apart the burnt chunks while continuing to break up the larger, still-mostly-frozen pieces.
Add the garlic in here somewhere.
Once the beef is thoroughly cooked and not a moment before (because that would be the right time to do it), add the can of soup. Spread it around for a few seconds while trying to decide what other liquid would make this less of a balled-up disaster then slosh in a cup of milk from the measuring-ish implement you can find. Panic in the face of your decisions: it will truly be a miracle if this results in something that is in any way edible.
Make a huge mess getting all the various bits in the pan melded together. Realize you should’ve used a larger pan. Reduce heat to medium. Reduce the sauce, too. Taste the sauce. Bad call on the cream of celery there, mate. Remember you forgot the mushrooms. Add the mushrooms. Add black pepper.
Around the time the mixture has been reduced to a more normal sauce-like consistency and you’ve been reduced nearly to a nervous breakdown, remember the pasta. Get the water on to boil. Decide the reduction has gone on for long enough and bring the heat down as low as it goes.
Do some dishes to kill time. Nervously eye the quantity of pasta you’ve deemed appropriate. Will it be enough? Will it be far, far, far, far too much? Literally no one is capable of knowing and anyone who says otherwise is a damned liar.
When the water boils, add the pasta. Shortly thereafter, remember the red pepper. Add the red pepper to the sauce. Good thing you don’t like it too cooked. Add some more black pepper while you’re at it too, ‘cause why not? Remember you forgot to set a timer when you added the pasta. Make a note of whatever the clock on the stove reads and decide against setting a timer because it’s too late for that now.
Chop some parsley that you’ve had in the crisper drawer for six weeks which is miraculously still green because that’s a thing people put on top of pasta sauce, right?
Do more dishes.
Oh look, the pasta’s done. Sweet. Drain the pasta.
Start writing blog post.
Remember the pasta. Plate pasta, add sauce, top with parsley. Take photo for Instagram. Realize photo looks like dogfood. Decide to use photo for blog post instead.
Excel 2016 includes the exciting new ability to save files as “CSV UTF-8 (Comma delimited)” in addition to the old “Comma Separated Values (.csv)” format. In magic fairy dreamland, this output format solves the age-old CSV character set problem: CSVs produced by Excel on Windows encode text using Windows-1252, and CSVs produced by Excel on Mac use MacRoman. Unfortunately, over here in the real world, this new export format creates more problems than it solves. Files saved in this format still suffer from two other problems: incorrect line ending, and a shocking corruption issue resulting in potentially irrecoverable data loss. And, on top of that, Excel still has issues even opening files containing special characters.
Let’s start with an example. Here’s the CSV:
Pretty standard stuff. If you inspect the raw file, you’ll find Unix-style line endings (that is, a single <LF> at the end of each line), quotes escaped with quotes, UTF-8 encoding (note the two-byte “é”), and a trailing newline.
Now, let’s bring this file into Excel, as one does. I’m using a fully-updated (as of this post) copy of Excel for Mac 2016. Here’s what happens if I just open the file:
Well, that’s a great start. To get special characters to decode correctly, I need to create a new workbook, then from the “Data” ribbon, select “From Text” – then, after selecting the file, I can set the “File origin” to “Unicode (UTF-8)”. Of course, that method of import doesn’t support line breaks in fields:
To produce a file Excel can open the normal way, I had to re-encode the reference as UTF-16LE:
Alright, onward to output. First, let’s see what happens when we save the file using the oldschool “Comma Separated Values (.csv)” format:
Okay, so it’s close. Our <LF>s have been replaced with <CR><LF>s (with the exception of the the linebreak within the second column of the third row, which has bizarely been replaced with a <CR>) and we lose our trailing newline, but that’s no big deal. The quotes are still correctly quoted. And as expected, the special character has been replaced with 8E. All completely reversible:
(First iconv performs the character set conversion, then perl invocation fixes the newlines, and finally sed adds the trailing newline.)
There’s just one problem: Excel for Mac itself doesn’t properly reopen this file!
That’s… That’s super special, guys.
Okay, moving along. Let’s see what the “CSV UTF-8 (Comma delimited) (.csv)” format does:
Well, to its credit, the file is UTF-8-encoded. Unfortunately, that’s also where things start going wrong: the first three bytes of the file are a BOM, because Microsoft tools are crazy. Still, given the above experience trying to open a UTF-8 file, I’m not surprised.
The next problem is that it’s replaced all the line endings with… just carriage returns?! How can you possibly regress on something like line endings? Windows- or Unix-style, I really don’t care, but why on God’s green earth would you use Classic Mac-style, <CR>-only line termination? That’s just offensive.
But that’s not the biggest problem: no, the biggest problem, as alluded to in the image above, is the behaviour produced when escaping quotes within a field. Why is the first quote not escaped, and where did that extra et" come from? I’ll give you a hint: it’s from earlier in the field.
Let’s try a series of small tests, each with only a single row and column:
I think you get the picture. The output, when quotes within fields are involved, is completely untrustworthy. The original input is all but unrecoverable (if there is a pattern for “unwinding” the damage, it isn’t obvious to me), so any file saved this way is basically useless if there are quotes anywhere other than the header. And I haven’t tried, but it looks like it would be pretty easy to generate collisions (the same output for different input) meaning that even if there is a pattern, it’s almost certainly not bijective.
I can understand that these sorts of issues could have flown under the radar during the runup to one of the biggest revisions to Office for Mac. What I don’t get is how nobody’s noticed this nearly eighteen months after Office for Mac 2016 was released to Office 365 subscribers. At time of release, Office 2016 seemed like a huge step forward for Office for Mac, including far better compatibility with the Windows version of Office applications and a significantly improved UI. But the persistence of issues like these do raise doubts about Microsoft’s continuing commitment to bugfixes outside of major releases. I hope this one gets fixed soon.