This blog post is hopefully the first of many about a “new” old project that I’ve been passionate about for some time now.
New because I now have the tools to implement previous fantasies. Old because some of those who know me will appreciate that I’ve tried this project several times: first as a research project (98) within my first company (Magic E), then as Thumbjot (with the help of then-Rubyist and now Facebook designer, but mostly kindred spirit, Jason Cale) and later as ThumCrowd.
All of them failed, but for good reason.
Even so, I’m proud to say that Thumbjot was the first iPhone productivity app to get an Apple staff pick (the listing now expunged from their iTunes archives) and was also listed as the second most useful app by CBS News, second only to Facebook, and above Google. Wow! What lofty company.
Later – many years later – I tried to explain the cognitive underpinnings of Thumbjot to Google as a paradigm for “Augmented Cognition” that could form the goal of a platform like Google Glass. I will get to that in due course, most likely in a later post, but suffice it to say that I’m not working at Google on Glass, despite meeting with Mr Brin himself, who was already on the cusp of losing interest in what is possibly one of the most expensive hobby projects of all time (apart from, of course, various wars).
I still believe in a wearable device with – and this is key – a persistent interface with my consciousness (unlike watches, which to me remain an unconvincing computation and UX paradigm, despite their popularity; McDonald’s “milkshake” is also popular).
I think we are not that far from usable brain-machine interfaces, at least at the signal level. Of course, making sense of those signals is another matter and clearly subject to profound mystery – i.e. the “Language of Thought.” If I were lucky enough to have made gazillions from any of my inventions, then I would probably be building such things, or any project that is unlikely to be solvable in my lifetime.
Considering LOT as a mystery is deeply connected to my codename for the project – “Project Chomsky.”
Anyone who follows his (philosophical) ideas closely will know what I mean by mysteries. They will also know about Chomsky’s oft-repeated refrain that “discovery is the ability to be puzzled by simple things.” It is with this spirit that I remain deeply puzzled by many simple things, including why my computer interface is so dumb.
In essence, nothing much has changed with computer interfaces since Microsoft, God bless them (I guess), invented Windows. Well, we all know they didn’t. Supposedly it was PARC. That reminds me of how unexciting it was when I finally got the chance to pay homage to PARC by driving out to the hills of Palo Alto and sitting for a while in my car in the PARC parking lot.
It reminded me of the peculiar nature of culture, especially tech culture. I won’t bore you with cultural theory here, but those who talk about such things will enthuse about influencers (“taste makers”) and so on. It upsets me to see how much this is part of tech culture, where you would think (of course, naively) that technical folk were more inclined towards a Bayesian belief system than the follies of hyperbole and all its illogic.
This slight digression in my post is to offer a quick lament about our susceptibility to believing in “tech trends,” such as bots. I mention bots because conversational interfaces are truly important and related deeply to Project Chomsky, yet I almost cried when I saw how crude (lacking in any intelligence, even artificial) the intent definition method is for building a so-called “Skill” (app) for Amazon’s Echo (Alexa).
Bots make sense as an ecosystem play for the obvious candidates, but as a this-tech-is-cool “meme” they are rather dubious.
Playing for a moment with North Face’s “Intelligent” shopping bot reveals how useless some of these “conversational commerce” ideas can really be, even when supposedly driven by IBM Watson (which itself is a faux-meme for “massively intelligent”).
Like many bot interfaces, it soon becomes rather obvious that it’s nothing more than a dressed-up decision tree, a.k.a. a left-nav filter for e-commerce. The “convenience” might make sense inside of a chat interface (like Skype or FBM), but as a genuine substitute for product discovery… well.
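To make the “dressed-up decision tree” point concrete, here is a minimal Python sketch. It is not how the North Face bot or IBM Watson is actually built; the catalogue, attribute names and questions are all made up for illustration. The only thing it shows is that a fixed sequence of bot questions can produce exactly the same result as ticking boxes in a left-nav filter.

```python
# Hypothetical catalogue: products and attribute names are invented for illustration.
CATALOGUE = [
    {"name": "Alpine shell", "gender": "women", "activity": "skiing", "warmth": "light"},
    {"name": "Summit parka", "gender": "women", "activity": "skiing", "warmth": "heavy"},
    {"name": "Trail windbreaker", "gender": "men", "activity": "hiking", "warmth": "light"},
    {"name": "Glacier parka", "gender": "men", "activity": "skiing", "warmth": "heavy"},
]

# The "conversation": each question maps directly onto one filter attribute.
QUESTIONS = [
    ("gender", "Who is the jacket for?"),
    ("activity", "What will you be doing?"),
    ("warmth", "How warm should it be?"),
]


def run_bot(answers):
    """Walk the fixed question list, narrowing the catalogue one facet at a time."""
    remaining = CATALOGUE
    for attribute, _question in QUESTIONS:
        answer = answers.get(attribute)
        if answer is None:
            continue  # question left unanswered: that facet is simply not applied
        remaining = [p for p in remaining if p[attribute] == answer]
    return [p["name"] for p in remaining]


def left_nav_filter(**facets):
    """The same narrowing, expressed as ordinary e-commerce facet filtering."""
    return [
        p["name"]
        for p in CATALOGUE
        if all(p[key] == value for key, value in facets.items())
    ]


if __name__ == "__main__":
    print(run_bot({"gender": "women", "activity": "skiing"}))
    print(left_nav_filter(gender="women", activity="skiing"))
```

Both calls return the same list of products. The chat wrapper changes how the questions are asked, not what is being computed.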
Well… there is some hope for “conversational” commerce because how humans think about many subjects is very different to how they talk about them, which is different still to how they rely upon established taxonomies as proxies for decision making. A truly conversational UX ought to invoke the vernacular of how humans really speak about a subject, but this requires a lot more computationally cognitive smarts than most bot platforms currently offer.
By the way, when I couldn’t get Microsoft Cognitive Services to connect with my ridiculously annoying Microsoft Live SSO, it told me “Oops!” but I’m sure it neither meant it, nor knew what it meant. I think that 404, or 500, would have been more honest.
But this awkward fumbling through the North Face app (or even IBM Watson’s embarrassingly simple Pizza app) leads me nicely to the crux of this post. Yes … I know … already TL;DR, but at least it’s natural (to me).
There is no such thing as talking to a machine in so-called natural language if one assumes that by natural we mean as we are naturally accustomed to speaking to others.
It has been known for some time (actually postulated by Aristotle) that spoken communication between two human beings involves a crucial feedback loop known as the “Theory of Mind”, an idea that has been more recently validated and explained by the existence of mirror neurons.
A machine is, well, we don’t know, do we? As in we don’t really know how our minds perceive Siri, Alexa and other silicon voices. What is our “Theory of Machine?” Do we see them as “persons,” like in the film Her, or as something else?
There is still plenty of research to be done about the cognitive aspects of talking to machines, but one thing is perfectly obvious.
We often fail to use the right command or expression when talking to a machine and then end up trying to guess what we think the machine wants us to say. It is similar to Theory of Mind in that we know how to navigate this kind of ambiguity with other humans, yet it is weirdly difficult when talking to a machine. There are potential strategies, yet most bot interfaces seem to lack them.
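As one illustration of such a strategy, here is a minimal Python sketch of a “did you mean?” fallback. It is only an assumption about what a repair strategy could look like, not how Alexa, Siri or any real bot platform works. The command list and intent names are hypothetical, and plain string similarity (Python’s standard `difflib`) stands in for anything smarter.

```python
import difflib

# Hypothetical phrasings the machine understands, mapped to made-up intent names.
KNOWN_COMMANDS = {
    "play music": "PlayMusic",
    "set a timer": "SetTimer",
    "what is the weather": "GetWeather",
}


def respond(utterance, cutoff=0.5):
    """Return (kind, payload): run an intent, ask for clarification, or offer help."""
    text = utterance.lower().strip()

    # Exact match: the user guessed the phrasing the machine wanted.
    if text in KNOWN_COMMANDS:
        return ("run", KNOWN_COMMANDS[text])

    # Near miss: offer the closest known phrasings instead of a bare failure.
    suggestions = difflib.get_close_matches(
        text, list(KNOWN_COMMANDS), n=2, cutoff=cutoff
    )
    if suggestions:
        return ("clarify", suggestions)

    # Nothing close: say what the machine can do, so the user need not guess.
    return ("help", sorted(KNOWN_COMMANDS))


if __name__ == "__main__":
    print(respond("Play music"))
    print(respond("play some music"))
    print(respond("xyzzy"))
```

The point is the shape of the exchange rather than the matching technique: instead of leaving the human to guess the magic words, the machine exposes a little of what it can understand.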
But there is another more crucial role of “Theory of Machine” that I hope to exploit in Project Chomsky and that I hope to explain, along with the project’s goals, in my next post.
Until then: “Hey Alexa, post this blog.”