IMAGEOPTIMISER: POST MORTEM OF A GITHUB BOT

I envisage a future where we pair-program with our computers. I don’t know how to get there, but I know it won’t happen by itself.

In my last post I talked about imageoptimiser. Well, I made the thing and gave it a trial run limited to ~50 projects. The majority of cases were very positive, with maintainers generally being intrigued and grateful. There were two exceptions:

The first was from the maintainers of MacVim. MacVim isn’t a product that benefits from image optimisation, and the measly 89KB saved was a bit of a waste of time. The bot triggered in this case because that was 48% of their assets – over the threshold I had set. I added a minimum saving in bytes for the scanner, but fair enough, maybe some human supervision is needed.
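
The trigger rule described above (a relative threshold, plus the minimum saving in bytes added after the MacVim case) could be sketched roughly as follows. This is not the bot's actual code: the function name and both default thresholds are hypothetical values chosen for illustration, since the real numbers are not stated here.

```python
def should_open_pull_request(total_bytes, saved_bytes,
                             min_ratio=0.25,
                             min_saved_bytes=200 * 1024):
    """Decide whether a saving is worth a pull request.

    Both defaults are assumptions for this sketch, not the bot's real settings:
      min_ratio       -- smallest fraction of total image bytes saved
      min_saved_bytes -- smallest absolute saving, in bytes
    """
    if total_bytes <= 0:
        # No image assets at all: nothing to optimise.
        return False
    ratio = saved_bytes / total_bytes
    # The saving must be large both relatively and absolutely.
    return ratio >= min_ratio and saved_bytes >= min_saved_bytes


# MacVim-like case: 89KB saved, roughly 48% of the image assets.
macvim_saved = 89 * 1024
macvim_total = 189_867  # approximate total implied by 89KB being 48%

print(should_open_pull_request(macvim_total, macvim_saved))
print(should_open_pull_request(macvim_total, macvim_saved, min_saved_bytes=0))
```

With these assumed thresholds the first call returns False, because the absolute saving is too small, while the second call, with the byte floor switched off, returns True. That second result mirrors what the bot originally did for MacVim.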

The second case was that of rails_admin. Given that the bot was designed to target popular repos, rails_admin was a smart choice. It saved 411KB and submitted the pull request. rails_admin’s creator, @sferik, was aware that this was against GitHub’s policies, yet it also had a clear benefit to him in this case. Should he close the request in disgust, or accept it anyway? He tweets about it, and the tweet gets picked up by Robert McMillan of Wired.

Robert contacts sferik, me, and GitHub. His article appears on the Wired site and reaches the top of Hacker News. There are suddenly a load of eyes on everyone involved. On HN and Twitter, there’s a back and forth between developers and GitHub employees. Should this be allowed? Does it open a Pandora’s box? We’ll come to that later.

Now understand that there’s a difference between a company’s Terms of Service, and the clauses that they feel obliged to enforce. I didn’t get a chance to find out where that line lies in this case. It’s not that GitHub’s response was opaque, but there is an obvious frustration from the GitHub side that they’ve been portrayed as the bad guys. See: “We don’t actually block nice bots”.

I’ve explained a lot, and not said much – so here’s the meat of this post. If I were GitHub I’d have the same policy. It’s easy to imagine swarms of bots on GitHub, fixing all sorts of stupid things, conflicting with each other and getting in the way of the humans. To quote their email to me: “social coding requires people” – and I think this needs at least some level of supervision. It needs somebody to select suitable projects and respond to feedback and comments on a per-issue basis.

GitHub allows API access to repos (including service hooks), but that doesn’t solve the problem of discovery and comes at the effort of the maintainer. People build these things to serve a need, and I want to see GitHub promoting automated services in a way that works for them. Why shouldn’t they have some sort of registry or marketplace like Heroku Add-ons? With more and more services integrating with GitHub (think Travis CI), this should be a no-brainer. With any sense, they’re working on it behind closed doors.

It’s clear that people want this. They want to tidy and optimise their repos. With that in mind, I see 4 options going forward:

  1. I make the service opt-in. I have a website with a text box where you request that your repo be scanned. It still operates from the imageoptimiser account. This would let developers pick repos that they think are suitable, a bit like an entirely human-organised bot.
  2. I do the service hook stuff. This is what GitHub want me to do. Repo admins (and only repo admins) can set up the service to scan their repository whenever they push. Although post-commit scanning is a good idea, I can’t help but feel it limits the service’s potential.
  3. I keep the same codebase, and use it as a tool from my personal account. I select suitable repositories, check that the savings are decent, and hand-craft the pull request text. I’d actually be pretty happy to do this; my only concern is how it would clutter my GitHub account.
  4. Not a ‘bot’, not so much a ‘service’, but a tool. A tool that anyone can sign in to, and use to optimise repositories. This is similar to option 1, but would see the contributor take personal responsibility for the pull request. They select the repo they want to scan, write the pull request, respond to feedback and answer comments from their own account. Imagine someone who is new to open source, or who feels like they have little to contribute. They’re suddenly empowered to improve open source projects.