CSV Support in Excel for Mac 2016 is a Travesty

Excel 2016 includes the exciting new ability to save files as “CSV UTF-8 (Comma delimited)” in addition to the old “Comma Separated Values (.csv)” format. In magic fairy dreamland, this output format solves the age-old CSV character set problem: CSVs produced by Excel on Windows encode text using Windows-1252, and CSVs produced by Excel on Mac use MacRoman. Unfortunately, over here in the real world, this new export format creates more problems than it solves. Files saved in this format still suffer from two other problems: incorrect line ending, and a shocking corruption issue resulting in potentially irrecoverable data loss. And, on top of that, Excel still has issues even opening files containing special characters.

Opening

Let’s start with an example. Here’s the CSV:

Pretty standard stuff. If you inspect the raw file, you’ll find Unix-style line endings (that is, a single <LF> at the end of each line), quotes escaped with quotes, UTF-8 encoding (note the two-byte “é”), and a trailing newline.

Now, let’s bring this file into Excel, as one does. I’m using a fully-updated (as of this post) copy of Excel for Mac 2016. Here’s what happens if I just open the file:

Well, that’s a great start. To get special characters to decode correctly, I need to create a new workbook, then from the “Data” ribbon, select “From Text” – then, after selecting the file, I can set the “File origin” to “Unicode (UTF-8)”. Of course, that method of import doesn’t support line breaks in fields:

To produce a file Excel can open the normal way, I had to re-encode the reference as UTF-16LE:

$ iconv -f utf-8 -t utf-16le <csv-good.csv >csv-converted.csv

Or prefix the file with a byte order mark (BOM):

$ printf $'\xef\xbb\xbf' | cat - csv-good.csv >csv-converted.csv

Saving

Alright, onward to output. First, let’s see what happens when we save the file using the oldschool “Comma Separated Values (.csv)” format:

Okay, so it’s close. Our <LF>s have been replaced with <CR><LF>s (with the exception of the the linebreak within the second column of the third row, which has bizarely been replaced with a <CR>) and we lose our trailing newline, but that’s no big deal. The quotes are still correctly quoted. And as expected, the special character has been replaced with 8E. All completely reversible:

$ iconv -f mac -t utf8 <csv-mac.csv \
    | perl -pe 's/\r\n?/\n/g' \
    | sed -e 's/$//' \
    | diff csv-good.csv -
$ echo $?
0

(First iconv performs the character set conversion, then perl invocation fixes the newlines, and finally sed adds the trailing newline.)

There’s just one problem: Excel for Mac itself doesn’t properly reopen this file!

That’s… That’s super special, guys.

Okay, moving along. Let’s see what the “CSV UTF-8 (Comma delimited) (.csv)” format does:

Well, that’s…

Oh boy.

Well, to its credit, the file is UTF-8-encoded. Unfortunately, that’s also where things start going wrong: the first three bytes of the file are a BOM, because Microsoft tools are crazy. Still, given the above experience trying to open a UTF-8 file, I’m not surprised.

The next problem is that it’s replaced all the line endings with… just carriage returns?! How can you possibly regress on something like line endings? Windows- or Unix-style, I really don’t care, but why on God’s green earth would you use Classic Mac-style, <CR>-only line termination? That’s just offensive.

But that’s not the biggest problem: no, the biggest problem, as alluded to in the image above, is the behaviour produced when escaping quotes within a field. Why is the first quote not escaped, and where did that extra et" come from? I’ll give you a hint: it’s from earlier in the field.

Let’s try a series of small tests, each with only a single row and column:

Input	Output
`0123456789"`	`"0123456789""""`
`0123456789""`	`"0123456789""""1"`
`0123456789""""`	`"0123456789""""""123"`
`0123456789""""""""""`	`"0123456789""""""""""""""""""""""123456789""""""""""""23456789""""`
`01234"56789`	`"01234"56789""1234""""`
`01234""""56789`	`"01234""""56789""1234""""""234"`
`012""345""6789`	`"012""345""6789""12""345""6""2""345"""""345"""345""345"`

I think you get the picture. The output, when quotes within fields are involved, is completely untrustworthy. The original input is all but unrecoverable (if there is a pattern for “unwinding” the damage, it isn’t obvious to me), so any file saved this way is basically useless if there are quotes anywhere other than the header. And I haven’t tried, but it looks like it would be pretty easy to generate collisions (the same output for different input) meaning that even if there is a pattern, it’s almost certainly not bijective.

I can understand that these sorts of issues could have flown under the radar during the runup to one of the biggest revisions to Office for Mac. What I don’t get is how nobody’s noticed this nearly eighteen months after Office for Mac 2016 was released to Office 365 subscribers. At time of release, Office 2016 seemed like a huge step forward for Office for Mac, including far better compatibility with the Windows version of Office applications and a significantly improved UI. But the persistence of issues like these do raise doubts about Microsoft’s continuing commitment to bugfixes outside of major releases. I hope this one gets fixed soon.

DIY Audio Snake, Part 2

(Alternate title: “As If I Didn’t Hate Myself Enough Already.”)

All the bits showed up, a heaping pile of small bubblepack envelopes with mixed Mandarin and English labeling. I’m sure I’m on at least two watchlists now due to the cumulative volume. I didn’t really track arrival dates, but the pieces trickling in at a rate of one or two per day over most of a week. After writing the last post, I revisited the project box dimensions and ultimately decided I’d better preemptively order larger ones. It’s a good thing I did, too, because the ones I originally got were far too small.

So it all got here, and then I put it on a shelf for a couple more weeks while I worked up the nerve to start the build. On December 7th I finally got sufficiently brave – foolhardy? – to make a thing. It all went pretty much as previously described: HDMI to DB-25 in the boxes, XLR to DB-25 as pigtails, and pins as I thought they ought to go. What I wasn’t prepared for was the amount of time taken building the pigtails. The microphone cable ended up being much smaller in diameter than I thought it was which meant space wasn’t an issue but stripping it sure was. In order to give myself enough spare that I could afford to make mistakes, I used 16cm lengths between the DB-25 and the XLR connectors. Good thing, too, as I ended up with only around 50cm left (of the original ~300cm – ordering in Imperial units then measuring in metric is hard). If I ever do anything like this again, I’ll figure out how to strip shielded wire without accidentally cutting through half the shielding. (Really, the big problem was that the largest hole on my wire strippers was still too small. By “figure out”, I really mean “buy a more appropriate tool”.)

I used 10cm lengths inside the box because unlike the mic cable, I’d overestimated how much I’d need for that. The boxes I ended up using were more shallow than wide (which was perfect) and I could’ve physically gotten away with 8cm or less – I just don’t know that I’d have been able to have worked with a more constrained length. Stripping short wire is a bit difficult because you don’t have much to hold onto. As for mounting the connectors onto the end plates of the box, Dad cut (drilled, really) appropriately-sized holes, then I mounted both the HDMI and DB-25 connectors with threaded anchors. The HDMI connector looks a bit funny from the outside as there’s no such thing as a screw-down HDMI cable but I used one anchor through the board (with a nut on the backside) as a spacer then ran another anchor through the end plate into the first one. The threading in the anchors isn’t as deep as their screws are long so there’s almost a millimetre of slack between the end plate and the anchor head. It doesn’t matter, though, as the mounting holes were drilled sufficiently small that the screws won’t move without rotational torque.

And then I tested the boxes with a multimeter, labeled the connectors, and took them to Team Night without ever having run audio through it. Yeah, I live life on the edge alright. More like, after seven hours of assembly, I needed some affirmation of my life decisions regardless of how the final product turned out. And the response was as positive as I’d hoped. A surprising number of people thought I was somehow running a real HDMI signal, not just using an HDMI cable as an analogue carrier. But once we got that sorted out, we found a speaker and a 50′ HDMI cable and took them for a spin. We didn’t actually try with microphone-level signals, but with line-level signals they worked admirably. Signal loss was barely audible for the first four channels, and got progressively worse (but not by much) for the last two.

There’s a couple other things I’d still like to do. The pigtails are especially fragile as the mic cable is too small in diameter to fill the holes in the plastic sheathes of the XLR connectors, and even six of them don’t fill the hole in the plastic DB-25 hood. This means the connections on both ends are supported only by the shortest wire in the cable. I made sure in all cases that it’s the ground connector (physically, the cable shielding) carrying the stress but that’s only marginally better. I’d like to add some better stress relief before (if ever) using the units “for realsies”.

I’d also like to get some hard numbers on signal loss. Probably just straightfoward as connecting each line to the ins and outs of an audio interface, generating some signal, and comparing what comes back to the interface against what was sent out. But it would be nice to do the test with a variety of lengths of HDMI cables to see how much the cable plays in versus shoddy soldering.

Up next: probably nothing. Even if the signal test does say nice things, the product is still kind of pointless. Six channels isn’t worth much, and HDMI is ultimately pretty fragile. Even with reinforcement, I’d still be wary of throwing something as half-baked as this onto a stage. It certainly wouldn’t be an important stage. Still, it was a neat project in a lot of other ways: I got to try some 100% Chineseum XLR connectors (terrible), solder a lot of wires (backbreaking until you get your table set up correctly), and learn about how HDMI electrically works under the hood (before promptly ignoring that and doing something else). A+ would recommend.

Feel It

I’m a pretty hands-on guy. I like to twist knobs, push levers, turn dials. A whiteboard or flip chart is my best friend when scribbling down ideas, not because I particularly subscribe to the tenets of the corporate office space, but because I like scratching down words and diagrams on something I can feel.

I’ve been incredibly lucky to work at the same place for four summers running, and they’ve asked me back for a fifth. Lucky not because of the stability (although that’s a definite plus), but because the company is great and the product is really cool. The details of the product are, while interesting in and of themselves, irrelevant to this discussion; what matters is that it’s a real, physical product. It’s a beautiful blend of engineering: electrical, mechanical, and more than a little software. And I can feel it, and I can interact with it.

Which brings me to a huge disappointment I find in software: the impossibility of picking up, of playing with the things I’ve made. The intangibility of software is one of the leading reasons I often wish my aptitudes lay elsewhere.

I don’t think this trait is particularly limited to me. For example, despite the vast improvements computers have brought to the world of audio engineering over the last few decades, people still spend large bags of money on buttons and knobs and keys not because they have any impact on the range of sounds the computer can product but because they provide a more natural means of interaction with the machine. On-screen widgets which don’t provide the same tactile feedback as physical controls are simply not as satisfying to use nor as rapid at manipulating the underlying values.

It’s in our nature to want to interact with things physically. After all, fingers are pretty good at it, and we do have eight of those plus another two thumbs to help us hang onto the things we’re interacting with. Wouldn’t it be nice to be able to interact with computers in ways more akin to our nature? There have been some attempts to manufacture controls adaptable to the wide range of input tasks that software can demand; on the opposite end of the spectrum, some static controllers just have inputs for everything imaginable.

Still, these are all pretty specialized pieces of equipment: generally either specialized for gaming or multimedia production. It’d be super to see something like a USB version of the Big Knob on every desk, as standard as a keyboard. Some keyboard manufacturers have attempted to achieve something like this by jamming a million extra buttons onto keyboards. I don’t think buttons are the answer: binary states are too limiting. But then, they’re also very general, and software is a very general universe. There are so many applications for better input solutions, and then software would have to expose itself to the new controllers, and it’d be the same headache as the introduction of the mouse all over again.

When can we have our interactive hologram displays, please?

Words

I have an enthusiasm for regular languages which borders on fetishism. PineConeBoy has had reason to half-chastise, half-mock me on several different occasions due to my well-meaning but frustratingly clumsy attempts to enforce regularity on the English language by intentionally breaking grammatical rules or restricting myself from using more-inconsistent word forms. For example, I used to capitalize every word of titles, without exception; even though I realized capitalizing “of” and “the” was grossly incorrect, it was a form of standardization. More recently, I tried to eliminate the “-er”/”-ier” adjective suffix on the basis that it doesn’t apply evenly; for instance, “happier” is a word, but “funner” isn’t. However, “more happy” and “more fun” are both valid, so for a while, that’s what I tried to stick to. (That stopped working the second someone pointed out how stupid “more easy” sounds. Or something like that. And the someone was probably PineConeBoy.)

The point being, even though I did horribly in my formal languages/translators class, I do find the topic near and dear to me. I’ve spent hours completing regex crosswords (and I’d recommend anyone else do as well — they’re a great way to get your head around some of the more complex regex operations like backreferences). I once cleared a competitive programming problem in the first ten minutes of the competition because I was able to quickly construct a pattern to match the solution. Maybe it’s because of this, or maybe it’s the other way around, but it frustrates greatly when others overlook the importance of syntax in effective communication. (I know I go on about that topic a lot, but I do feel it’s significant!)

Some friends and I were hanging out earlier, and the topic came up of how Pickett22 crowdsourced a portion of the proofreading and editing of her master’s thesis to her Facebook friends. It was noted how she had to spend a significant amount of effort (although not as much as if she’d had to do it all herself) proofreading and correcting the returned proofreading and corrections.

Nicho and Rinnly and I have had numerous conversations about the importance of clarity in written informal communication, especially with regards to comments on websites. Nicho sides strongly with the viewpoint that your ability or otherwise to effectively verbalize your message should not decrease its importance as perceived by its recipients. This on the basis that the voices those who haven’t been as privileged in their ability to learn their target language shouldn’t be silenced. To me, that also assumes a great number of ESL communicators. My perspective is maybe a little more cynical: assuming instead that these are native speakers, I feel that if you can’t be bothered to put effort into coherently stating your opinion, it’s probably not worth it to me to spend more time deciphering it than you spent writing it.

I don’t really have a pithy conclusion or even much of a point. Guess I’m just contributing further to the noise. At least it’s mostly syntactically correct.

Oops

I broke my tablet about a month ago. The screen, specifically. Left it charging on the corner of my bedside table then went to grab it without considering the length of the cable. Even Gorilla Glass doesn’t much like a three foot drop onto ceramic tile, it would seem. (Sorry, I didn’t drop it — I used gravity to place it on the floor, as quoth Rinnly in reference to her own recently similarly-affected cellphone.) Funny, second time in a matter of hours that I’d done that, except the first time was with my plastic-screened phone and I don’t know that it even noticed — it’s seen worse. So there’s Oops #1: lack of attention.

Oops #2, it seems, is having ordered a replacement screen as a standalone piece. I knew it was going to be fun to detach the bezel from the old screen, but I didn’t know it was going to be this much fun. Decided not to get one because it would’ve been another $25 after $100 on the screen. Might’ve been worth it. “Recklessness”, let’s call it — the Oops, not the bezel. After half an hour of alternately waving a decades-old hair drier around the edges of the screen in an attempt to loosen up the glue and delicately prying it further and further back, I gave it a little more of a tug, fractured the screen (not that it wasn’t already fractured before, but it’s more to clean up), and realized the glue isn’t simply on the bezel — oh no, that would be too simple. ASUS, in their infinite wisdom, decided that the bezel should be adhered to the screen (and vice versa) by means of a sort of stencil layer of double-sided adhesive plastic. Gee, lovely. So I’ve already torn that in one place. I hope the replacement screen sticks down OK. As far as detaching the old screen goes, I’m going to cross my fingers and hope it’s sunny tomorrow so I can just leave it out in a warm place.

The replacement screen shipped out with something that might help get it to stick: three pieces of 3M 300LSE. Never run into it before, but Google tells me it’s “adhesive transfer tape”. Peeled back a little corner; seems to be really high-quality double-sided tape, like the stuff that’s on there that I’ve partially ruined, but not cut to shape. Might work. Might have to work. I guess that makes Oops #3 my apparent inability to retain anything from the disassembly/replacement video that, for once, I did bother to look up before I was half-way through the process — again, lack of attentiveness.

Something else the replacement came with was a spudger/prying tool. Nice. I’ve been too cheap to get one. Guess this means I don’t have to. Maybe Oops #4 is that I end up trying to do this crap on a semi-regular basis without adequate tools — unpreparedness. Every time I get new tools, I outgrow the problems that required those tools. Never seems to be because I’ve just solved everything. They say that when all you’ve got is a hammer all problems are nails, but it’s a weird sort of hammer to deal with sixteen-foot tall nails made of grapefruit.

I really like The Codeless Code. They’re software development koans. A cool concept. Sometimes they’re a bit far out, but other times, they’re bang on. The most recent one, Ten Thousand Mistakes, is a good one. Reviewing the four Oopses I’ve just mentioned (is immediate retrospect bad? Is that Oops #5?), none of them are new problems to me and #1 and #3 are even duplicates. Eh, I’ve got another nine-thousand nine-hundred and ninety-six mistakes with which to be original.

Windows Hate

I tried installing Windows 8 on my netbook last night. Everything went great: the install only took about fifteen minutes, and it came up with Wi-Fi support, the proper screen resolution, etc. Except, on my 1280×720 netbook, every single Metro app fails to launch, politely informing me that “The screen resolution is too low for this app to run.” Really, Microsoft? You’ll support 1024×768 as a minimum, but 1280×720 – 48 pixels shorter – is just right out? Instead, you’re endorsing the marketing-drone-constructed bastard that is 1366×768 as the lowest-supported widescreen resolution? Well, alright then.

Motivation and Side Projects

For a long long time — starting in 2009, actually — I’ve been hacking away on a PHP framework, writing loosely-connected bits and pieces of code here and there as the mood struck me, repeatedly reconsidering the implementation of components as web development practices continue to evolve and as I continue to learn more as a developer. And at the end of the day, what I consider the “trunk” of the project (including the literal trunk in source control) has never consisted of more than a handful of parts that benchmark blank pages nicely but don’t really do anything.

And every once in a while, some job thing will come along with requirements perfectly suited for the application of that perfect framework, the manifestation of which exists only in my mind. And so I sit down, and in a few days, I hammer out something pretty close to what I’ve got in mind. Sometimes I’ll reuse existing components — I’ve got a database singleton I’ve been hauling around for at least a couple years now just because it works and I don’t have to think about it — but it’ll mostly be original to that project. And when the project’s done, I move on with my endless conceptualizing of something better. I may update the external original of anything I dragged in that that saw updates, but for the most part, I feel that my employer (none of these situations have ever been contract work) owns the implementation I’ve written on their time, so I take concepts with me, not code. Those concepts get rolled into the never-eventual “what I want to have”, and I go back to tinkering until I next need to pull out the stops.

I’m not really sure why I keep this up. Based on the time it’s taken the times I’ve had to finish an implementation — two or three, now — I could, if not finish this, at least have something ready to rumble in about a week straight of plugging away at it. Maybe some part of me enjoys the perpetually unfinished state: it’s a chance to continually explore (or at least contemplate) new development methods and design patterns. I think it’s more likely that without an objective clearly in sight, I find it hard to make progress on milestones. Every time I’ve had to get something operational for work purposes, it’s been a subquest: “vanquish the demon” requires “craft a website”, which has one completion path involving a frameworky backend. But at the same time, I think defining a use for the framework would somewhat rob it of its usefulness; despite the futility and inevitable over-complication involved in any attempt to craft a perfect be-all, end-all base for everything else, something reasonably close to that would be nice. Maybe I can settle for more clearly defining what it must provide.

Or maybe I’m just lazy and get distracted easily and someone throwing money at me makes me more likely to finish something. That could be it too.

LARS

LooseDetective and I completed our project for Software Engineering Thursday night. It was due at midnight, but I didn’t get it e-mailed off until 00:02:55 Friday; multitasking live sound for a band at the student bar* and frantically debugging a JTable model does that to you.

LARS — Like a Rental System — was a huge undertaking, and I’m still not quite sure how we managed to finish it at all, even a couple minutes late. The topic of the course was software engineering: the process by which software is designed in relation to its development cycle. As a class, we were tasked with the architecture of a video store rental system, and split off into self-selected teams to execute it. Thanks to the Asian professor’s thick accent, none of us clued into the fact he really wanted us to implement the system until about a month before its due date; even still, Loose and I didn’t actually sit down and start writing code until last Sunday afternoon.

Still, I’m fairly happy with how it turned out: despite what one might think, there are very few really hackish parts to be found in the codebase (probably to Loose’s dismay, due to the amount of extra time consumed in doing it “the right way”); actually, at the moment, I can’t think of a really good example of one. The design went through several changes as development progressed; the largest similarity can be found between the original design documents and the finished product in the data structures. Its largest issue is severe over-engineering: earlier in the semester, when we weren’t aware that we’d have to implement LARS, we designed an excessive number of features into it, including the ability for some items to be rentable, others just purchasable; price modifiers, allowing the modification of purchase/rental price and rental duration on a per-item basis; and a couple other things like that. In retrospect, had we known that we’d have to implement it, we probably wouldn’t have done that, but I do feel I was the driving factor in adding those, which makes me a bit guilty of the added work put on both of us.

Despite the crunch and stress associated with completing a 4,500-line-of-code project in five days, I’m not sure I, at least, would have been motivated to complete it had I started it any sooner: I feel like I would’ve put it off with thoughts of “oh, that’s started, I can finish that later” and then would’ve ended up rushing to get most of it done in two or three days and made an utter mess of it. But I don’t know; this is the first big project I’ve worked on side-by-side with another coder since Slyf and I did the Smiths Falls Chamber of Commerce website in 2008 (which I won’t link to, as it’s since had its interesting functionality gutted). That was an interesting aspect: even though I didn’t see this article until yesterday, I was most definitely the guy in a room. One may argue that that’s OK in a crunch-time scenario where the others involved don’t have experience with Methodology A or Technology B but I still feel badly about it, and it’s not the first time I’ve noticed this sort of thing: even though Rinnly and I don’t work side-by-side in the same codebase, I feel there are times where I’ve held up project progression or at least hindered teamwork with the magic words “no don’t worry about how x is going to work, I’ll take care of it”.

Anyway, it’s done now. All teams are to present their finished work to the professor (and, as I understand it, the class) tomorrow morning; as over-engineered as ours is, I think we can manage to show its ins and outs pretty quickly and still manage to impress. And then we wait for the mark — it would really suck if the almost-three minutes late does count against us and we get a 15% deduction on what may be an otherwise-perfect submission.

* Speaking of which, Adam and Steve did an absolutely fantastic job and I can’t wait until they next perform. Hopefully they don’t get the clever idea of doing a run of concerts over summer while I’m not here (please? Please don’t?).

Years later edit: I’ve converted the project’s Subversion repository to Git and uploaded it to GitHub.