More Than Nothing

“You mean you can't take less; it's very easy to take more than nothing.”

If it’s not threadsafe, just don’t try

Posted on November 1, 2011 by [ICR] under Java

I recently got burnt by this at work. Even though I knew HashMap isn’t threadsafe it was apparently working and rewriting to not use HashMap was work I didn’t have time for now and was scheduled for later.

It appeared to be working because, whenever I printed out all the keys and values in the HashMap they were all correct, even when the overall result of the algorithm was wrong.

After much debugging it turned out it was the problem. This little sample should demonstrate it:

import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class Test {
	static final int NUM_THREADS = 4;
	static final int NUM_ELEMENTS = 4;
	static final int NUM_RUNS = 1000000;

	static class Tester {
		Map map = new HashMap();

		public void run() {
			for (int i = 1; i < NUM_ELEMENTS; i += 1) {
				map.put(i, i);
			}

			Runnable runnable = new Runnable() {
				public void run() {
					for (int i = 0; i < NUM_RUNS; i += 1) {
						map.put(0, 0);
						Set elements = map.keySet();
						if (elements.size() != NUM_ELEMENTS) {
							System.err.println("[" + Thread.currentThread().getId() + "] numElements=" + elements.size());

							for (Object element : elements) {
								System.err.println("[" + Thread.currentThread().getId() + "] " + element + "=" + map.get(element));
							}
						}
					}
				}
			};

			for (int i = 0; i < NUM_THREADS; i += 1) {
				new Thread(runnable).start();
			}
		}
	}

	public static void main(String[] args) {
		while (true) {
			new Tester().run();
		}
	}
}

If you leave it running for a bit (the higher NUM_THREADS is the quicker it should happen) you'll eventually start to see that map.keySet().size() is way off - 11, 14, 23 - each thread will start to have its own incorrect value for elements.size(). But - and here's the weird bit - the keys and values inside map are all correct.

The take-away lesson? Even if you think something is working fine, if it's documented as not being threadsafe just don't try it.

1 Comment »

FizzBuzz Golf

Posted on September 10, 2011 by [ICR] under C#, golf, Programming

Someone linked to http://golf.shinh.org/p.rb?FizzBuzz today and I couldn’t help but have a go. Currently I’m ranked 20 in the C# solutions, but I just can’t seem to shave off those last few characters. [The following code is obviously unminified]

class X {
	static void Main() {
		for (int i=0; i<100;) {
			var z = ++i % 5 > 0 ? null : "Buzz";
			System.Console.WriteLine(i%3 > 0 ? z??i+"" : "Fizz" + z);
		}
	}
}

While googling around for inspiration on other possible angles to attack the problem I saw many so-called “good” and “simple” solutions, all of which looked far too complicated to me. So here’s what I think a good solution looks like.

using System;

static class FizzBuzz {
	static void Main() {
		for (int i = 1; i <= 100; i += 1) {
			var fizz = i % 3 == 0;
			var buzz = i % 5 == 0;

			if (fizz) {
				Console.Write("Fizz");
			}

			if (buzz) {
				Console.Write("Buzz");
			}

			if (!fizz && !buzz) {
				Console.Write(i);
			}

			Console.WriteLine();
		}
	}
}

It doesn't over-complicate things - there's no need to build up intermediate strings or to have nested branches - and it's quite clear from reading it what it does.

Now if I can just squeeze those last few characters out I can get to the top of the chart…

No Comments »

Leaving Google+

Posted on August 28, 2011 by [ICR] under Uncategorized

It's now come to the point where I can no longer, in good concious, continue to use Google+. Not while they continue to try and enforce their so-called "real name" policy.

There are many perfectly legitimate reasons why people would want to use different names. What's more, there are many legitimate reasons why people might want to present different names to different people. With circles, Google+ is half way to a technical solution for that – but they're hindered by policy.

I understand the desire for a stable identity. It breeds a more responsible community and it allows Google to build services around it (including, but by no means limited to advertising). But a stable identity does not necessitate a single real-world name.

Not that it's currently implemented as a real-world name. In their desperate attempts to stamp down on internet style pseudonyms they're banning people's legal names. Yes, someones first name is M3 and plenty of people do have only one name – not just celebrities – and yes, people do have a mix of non-english and english characters in their name.

As for the old "If it bothers you, you don't have to use it" line, try telling that to someone hiding from an abusive partner as their work colleagues try to get them to join in order to share the company BBQ pictures.

1 Comment »

MinValue/MaxValue Extension Method

Posted on May 10, 2011 by [ICR] under C#

Quite a lot of times it seems these little extension methods come in handy. They give the maximum or minimum value(s) from an IEnumerable, optionally applying a projection for the ordering.

They have better complexity characteristics than ordering and taking the first item, which requires time and space to order pairs of items that we aren’t interested in.

I could have used Aggregate instead of manually iterating over the enumerables. For Max/MinItem this would have made for much more readable code, but the implementations of Max/MinItems would have been either obscure or less performant, and I wanted to use a similar implementation for both.

I’ve only given the implementations that use a projection to int to pick the maximum or minimum value, but if needed it is trivial to implement them using other orderable data types.

It’s all pretty simple stuff – it’s just remembering to start with the first value that could possibly trip you up.

public static int MaxItem(this IEnumerable<int> list) {
    return MaxItem(list, x => x);
}

public static T MaxItem<T>(this IEnumerable<T> list, Func<T, int> selector) {
    var enumerator = list.GetEnumerator();

    if (!enumerator.MoveNext()) {
        // Return null/default on an empty list. Could choose to throw instead.
        return default(T);
    }

    T maxItem = enumerator.Current;
    int maxValue = selector(maxItem);

    while (enumerator.MoveNext()) {
        var item = enumerator.Current;
        var value = selector(item);

        if (value > maxValue) {
            maxValue = value;
            maxItem = item;
        }
    }

    return maxItem;
}
public static int MinItem(this IEnumerable list) {
    return MinItem(list, x => x);
}

public static T MinItem<T>(this IEnumerable<T> list, Func<T, int> selector) {
    var enumerator = list.GetEnumerator();

    if (!enumerator.MoveNext()) {
        // Return null/default on an empty list. Could choose to throw instead.
        return default(T);
    }

    T minItem = enumerator.Current;
    int minValue = selector(minItem);

    while (enumerator.MoveNext()) {
        var item = enumerator.Current;
        var value = selector(item);

        if (value < minValue) {
            minValue = value;
            minItem = item;
        }
    }

    return minItem;
}
public static IEnumerable MaxItems(this IEnumerable list) {
    return MaxItems(list, x => x);
}

public static IEnumerable<T> MaxItems<T>(this IEnumerable<T> list, Func<T, int> selector) {
    var enumerator = list.GetEnumerator();

    if (!enumerator.MoveNext()) {
        return Enumerable.Empty<T>();
    }

    var maxItem = enumerator.Current;
    List<T> maxItems = new List<T>() { maxItem };
    int maxValue = selector(maxItem);

    while (enumerator.MoveNext()) {
        var item = enumerator.Current;
        var value = selector(item);

        if (value > maxValue) {
            maxValue = value;
            maxItems = new List<T>() { item };
        } else if (value == maxValue) {
            maxItems.Add(item);
        }
    }

    return maxItems;
}
public static IEnumerable MinItems(this IEnumerable list) {
    return MinItems(list, x => x);
}

public static IEnumerable<T> MinItems<T>(this IEnumerable<T> list, Func<T, int> selector) {
    var enumerator = list.GetEnumerator();

    if (!enumerator.MoveNext()) {
        return Enumerable.Empty<T>();
    }

    var minItem = enumerator.Current;
    List<T> minItems = new List<T>() { minItem };
    int minValue = selector(minItem);

    while (enumerator.MoveNext()) {
        var item = enumerator.Current;
        var value = selector(item);

        if (value < maxValue) {
            minValue = value;
            minItems = new List<T>() { item };
        } else if (value == minValue) {
            minItems.Add(item);
        }
    }

    return minItems;
}
No Comments »

Why Javascript appears to hoist variables

Posted on February 2, 2011 by [ICR] under Uncategorized

JavaScript can appear to do odd things if you’re not familiar with it’s scoping rules. Take the following as an example.

var i = 0

function foo() {
    console.log(i)
    var i = 1
    console.log(i)
}

If you’ve not come across this specific example before, you might be surprised by the result:

undefined
1

Searching on the Internet, you can find plenty of explanations of this variable hoisting behaviour. Most liken it to the following example:

var i = 0

function foo() {
    var i
    console.log(i)
    i = 1
    console.log(i)
}

This helps describe what is happening, but it doesn’t explain why JavaScript does this.

To do that, we will need to dive into how scoping works.

Javascript, like most modern languages, is statically scoped. In contrast to dynamic scoping, this makes it a lot easier to reason about and optimise code. Before the code is run, scope analysis is performed. The code is split into different “scope blocks”. In JavaScript, each function creates a scope bock. In other languages, such as C or Java, scope blocks are also created by things like loops and branches.

Each variable declaration adds that variable name to it’s immediate scope block. In JavaScript, if multiple variable declarations with the same name occur in the same scope, they refer to the same variable within the same scope. In languages that don’t allow variable redeclaration, this would generate an error.

Whenever a variable is referenced, it is looked up in the current scope. If it is not found there, it recurses to the parent scope.

If we look at the first example again with our new found knowledge of how scoping works, we can see exactly why the first reference to “i” in “foo” is undefined.

Within a given scope, all variable references using the same variable name refer to the same variable.

But why is this such a problem? If you are given a program and told it is semantically correct (it compiles/is able to be executed), can you follow the code linearly as it would run and tell what it will do at any given point?

If you had the following fragment of a semantically correct c# program, you would be able to say that at line 4, it will print 10.

int i = 10;

void Foo() {
    Console.WriteLine(i);
...
}

But given this same semantically correct JavaScript fragment, you cannot say at line 4 what it will print without first reading the rest of the scope to see if it is referring to the local scope, or some parent scope.

var i = 10

function foo() {
    console.log(i)
...
}

Its fairly trivial in these small examples, but in larger programs it can cause hours of debugging if you’re not looking out for it.

Static scoping makes our code easier to reason about, but it can also lead to odd situations like these. Other languages avoid this problem by disallowing references before assignments.

Next time we will look into how javascript’s function scoping exacerbates this problem, and why it uses function scoping in the first place.

1 Comment »

Tabs vs. whitespace

Posted on January 11, 2011 by [ICR] under Programming

I was so very tempted to post a response to this question on tabs vs. spaces on SO today. (I did actually write one, but deleted it and voted to close for being too subjective) I thought it would be better to post my response here instead.

The question is which is better for formatting code indentation, tabs or spaces? My answer? Both.

The advantage of tabs is that it avoids another of the religious programming wars, how big should your tabs be? 4 characters? 8? 2? My personal preference is 4, but why should I get to decide if someone else reads the code better with 8?

The big disadvantage of tabs, which is the problem that spaces solve, is alignment with non whitespace. Take, for example, the following bit of code:

public void Foo() {
    DoSomething(a, b, c
                e, f, g);
}

In my editor, where the tab width is 4, everything lines up nicely. But what about my team mate who uses, *spits on the ground* 8?

public void Foo() {
    DoSomething(a, b, c
                        e, f, g);
}

The prettiness is starting to unwind. This is the big reason why many people advocate using spaces. It avoids this exact problem when people assume a certain tab width.

But there is a way of getting the best of both worlds. If indentation is done using tabs, but lining whitespace is done using spaces then people get to choose how much things are indented, but things still line up just fine.

public void Foo() {
 → DoSomething(a, b, c
 →             e, f, g);
}

The big problem with mixing whitespace like this is that it is often very difficult to maintain. Most people forget at least a few times. The easiest way to remember is to turn on the auto indent feature of your editor, then only ever use spaces when you need to indent more (assuming your editor doesn’t try to be too clever).

No Comments »

The problem of visibility controls in new media

Posted on August 14, 2010 by [ICR] under Twitter

With the rising popularity of new media such as Facebook and Twitter, companies have started using it as an extension of their customer service experience, responding directly to customer queries and complaints.

But often these aren’t in response to direct communication, but the result of the company earwigging in on conversations and comments people are making on their feeds. This unnerves some, seeing it as an invasion of their privacy, while others seem to dismiss their fears – they made the information public in the first place so should not be surprised when companies read it.

To me this highlights a complex and unsolved problem that we are currently facing with new media.

There are two factors to be considered: visibility and intent.

Visibility determines who can view your content. Different services provide different granularity; Facebook allows you to specify individual groups or people who can view each individual piece of content, while Twitter only provides the choice between visible by all and visible by a list of vetted followers.

The visibility/privacy policy demonstrates the differences in how the services are intended to be used.

Facebook is geared towards managing a complex social network of friends, sharing different sorts of media such as text, photos and video as well as experiences through their app framework. The fine granularity of the visibility controls is necessary to manage the complex friendships people build. The view of Facebook as a place to spend hours at a time means that the overhead of managing such granularity is subsumed by the time creating and consuming content.

Twitter has less of an emphasis on mutual friendships and complex social networks, instead focusing on a simple asynchronous model of followers. Twitter’s positioning as a service that you can update on the go, combined with its purposeful simplicity means more fine grained visibility controls would complicate the tweeting process.

Intent is the intended audience of content. A particular piece of content may be intended for everyone to view, or for only a specific group of people. The span of potential intended audiences is vast. You may intend for a comment about a surprise birthday event to be visible by everyone you know but the birthday boy/girl. A particularly rude joke may be perfectly fine for your friends, but you’d rather people at your work not see it.

These examples of intended audiences are catered for, with differing effectiveness, by visibility/privacy controls such as Facebook’s. There are, however, intended audiences that the current model of visibility controls cannot, and possibly will never be able to capture. Take, for example, the intended audience of most people’s tweets – people who are interested.

It is impossible to know ahead of time who will be interested in me as a tweeter, but if someone is interested I want them to be able to see my timeline. It is impossible to capture this intent with even the fine grain controls of Facebook, let alone Twitter’s. It can be approximated by making my tweets private and being liberal with who I allow to follow me, but this not only introduces overhead on my part, but also prevents people who are not already aware of me discovering the sorts of things I tweet and deciding if they are interested.

There is a gap between the models of visibility that can be expressed and the model of visibility that needs to be expressed to reach the intended audience. How we solve this problem, I don’t know, but we cannot simply disregard people’s feeling of their privacy being invaded simply because the current tools do not let them adequately express their intent.

No Comments »

cmd-notify: Push notifications for long tasks

Posted on July 27, 2010 by [ICR] under Programming, Python

I just started a new job, and as part of the induction we had to do a lot of complete builds, both for an application server and for a pretty substantial database. Needless to say, these took quite a long time.

I found myself wishing there was a way for the computer to notify me that it was done, so I could go elsewhere and do something interesting while it plodded along.

To this end, I wrote the following little python script, which uses notifo to send you a push message on your iPhone when a command has finished executing.

#!/usr/bin/env python

from notifo import Notifo
import os, sys

api_user = [API_USER]
api_key = [API_KEY]
notifo = Notifo(api_user, api_key)

def print_usage():
    sys.stderr.write('''Usage:
  %(command)s register <username>
or
  %(command)s <username> <command>
''' % { 'command': sys.argv[0] })

def register(username):
    result = notifo.subscribe_user(username)
    sys.stderr.write(result["response_message"] + '\n')

def notify(username):
    result = notifo.send_notification(username, 'Command \'%s\' has completed' % command)
    if result['status'] == 'error':
        sys.stderr.write('cmd-notify error: ' + result["response_message"] + '\n')

if len(sys.argv) == 3 and sys.argv[1] == 'register':
    username = sys.argv[2]
    register(username)
elif len(sys.argv) < 4:
    print_usage()
else:
    username = sys.argv[1]
    command = ' '.join(sys.argv[2:])
    os.system(command)
    notify(username)

First register your account to receive push notifications by running cmdnotify.py register <username>. Once you've accepted messages from the service, simply use cmdnotify.py <username> <command> (e.g. cmdnotify.py ajanuary build_db $DB_NAME). When the command has finished, you'll get a push notification letting you know it's done.

1 Comment »

Value Types vs. Reference Types vs. Passing by Value vs. Passing by Reference

Posted on May 26, 2010 by [ICR] under Programming

I’ve seen a lot of people getting confused about value/reference types, and passing by value/reference, so I thought I’d write a post to explain it.

These are two different phenomena with confusingly similar names. Value/reference types are to do with how values are copied when assigning from one variable to another. Passing by value/reference is to do with how values are copied when passing a variable as an argument to a method. You can pass a value type by value, or pass a value type by reference, and you can pass a reference type by value, or pass a reference type by reference The important thing to remember is that, whilst both references, reference types and passing by reference are two different sorts of references.

Let’s begin with value and reference types. The main distinction between value and reference types is that value types have deep copy semantics, whilst references have shallow copy semantics.

This means that when you assign a value type from one variable to another, the whole value is copied. There now exists two completely separate, though identical, objects. If you change the attributes of one object, it is only changed on that object – the other object is completely separate.

When a is assigned to b, the value of a is copied and assigned to b. When we change the Name property on b, it only changes on b because a refers to a completely different object.

Reference types, rather than containing a value, contain a reference to the value. The shallow copy semantics mean that, when a reference value is assigned from one variable to another, only the reference is copied. There still exists only one object, but now both variables point to the same object. If you change the attributes of the object through one variable, these changes can be seen by accessing the attribute through the other variable.

If, rather than changing an attribute of one of the variables containing a reference type, we change which object the variable refers to, this change is only reflected in that variable. It does not change the value the other variable refers to.

Assignment changes the value of the variable – in the case of a reference type this value is a reference to an object. Assigning a new reference changes only the value of the variable, that is it changes which object it refers to. The object it used to refer to, and any other variables that also refer to that object, remain unchanged.

Moving onto passing by value/reference. Passing by value creates a copy of the variable before passing it into the function – this would be the object in the case of a value type, and the reference to an object in the case of a reference type. Passing by reference holds onto a reference to the original variable being passed into the method – anything done with the argument is redirected to the variable passed in, whether it be changing an attribute of the object, changing the value or changing the reference.

For the purpose of the examples, let us suppose we have a method Foo which takes one argument, person. The body of Foo either changes the Name property on person, or changes the value of person. The third line of code in the example pictures donates which of these happens.

Firstly, we can pass value types by value. The value of the variable is copied (using the deep copy semantics described above) before being passed to the method. Changing attributes on the argument or changing the value of the argument has no effect on the variable being passed in as they are completely separate objects.

We can also pass reference types by value. The value of the variable – that is the reference – is copied (using the shallow copy semantics described above) before being passed to the method. Changing attributes on the argument are reflected in the variable being passed in, as both the variable and the argument refer to the same object.

Because a and person refer to the same object, changing the Name property through person can be seen by inspecting the Name property through a.

Changing which object the argument refers to, however, has no change on which object the variable refers to.

Changing the value of person – that is changing which object it refers to – does not change the value of a.

Now let’s suppose the arguments for Foo are passed by reference. When passing either value or reference types by reference, any operations (assignment, changing attributes etc.) on the argument are redirected to happen on the variable.

Changing the value of an argument passed by reference changes the value of the variable being passed in, whether the variable is a value or a reference type.

Usually languages default to pass by value. Some, such as C#, offering pass by reference as an explicit option.

Hopefully, despite value and value types being different, and the reference in reference types and pass by reference being of different sorts, this has made things a little clearer. If not, please ask questions in the comments and I shall endeavour to clarify, either in comments or by changing the post.

No Comments »

Replacing mew lists with iterators

Posted on May 7, 2010 by [ICR] under Mew

[Disclaimer: The following post is all speculation. None of it will necessarily get implemented]

Last time I talked about how to add iterators to mew. However, writing iterators in this way can be a bit of a pain. So how can we make this easier? Generators.

Generators allow us to write a function just as we normally would, but instead of having a single return point, we yield values. Yielding a value suspends the execution of the current function and returns the value. The next time the function is called it continues from where it left off. They are, technically, a specialised form of coroutines which always yield to the caller.

def one-to-five {
    yield 1;
    yield 2;
    yield 3;
    yield 4;
    yield 5;
};

print (one-to-five); # prints 1
print (one-to-five); # prints 2
print (one-to-five); # prints 3
print (one-to-five); # prints 4
print (one-to-five); # prints 5

In order to use this concept to help write iterators in mew, we need to change how the yield statement works. Instead of yielding a value, it returns the value and a continuation: the rest of the functions computation (including the environment) wrapped in a thunk. We are effectively returning an iterator cell.

Writing iterators is now a lot simpler and cleaner.

def to-iterator @{ xs :
    if { list.empty? xs } {
        .end
    } else {
        yield (head xs);
        recurse (tail xs);
    }
};

def naturals {
    def x 0;

    while { .true } {
        yield (+ x 1)
    }
};

We can also write some functions, such as map and filter, to help consume the iterators.

The problem is mew is meant to be small, yet we now have two different concepts of what a collection is: an eagerly evaluated list, or a lazily evaluated iterator. Can we re-write lists to be a special form of iterator?

In order to do so we need to replace lists with iterators as the built-in concept of collections. We can’t use our current form of iterators, built upon lists, if we’re basing lists on iterators.

Let us suppose we’ve now done that, and all the wrinkles have been ironed out. Because we kept the structure of our iterators close to the existing concept of cons lists, this shouldn’t be too difficult.

A list can now be represented as a series of yield statements.

[1 2 3 4 5] => { yield 1; yield 2; yield 3; yield 4; yield 5; }

This does have the disadvantage that lists now incur the overheads of continuations. It is possible, however, to detect and optimised a straight-forward series of yield statements and apply traditional list optimisations. Only the behaviour of the list needs to match that of iterators, not necessarily the compiler/interpreters implementation.

We’ve now replaced lists with the much more powerful and flexible concept of iterators. There are still many details to sort out: should there be a separation of iterables and iterators? how would current list operators work with iterators? what is the best way to implement continuations with minimal overheads? I do, however, think this is a promising and worthwhile road to go down.

No Comments »