Tuesday, September 30, 2003

Metadata, Semiotics, and the Tower of Babel

Speaking of bubblets, looks like we're seeing the beginning of one around metadata. Joi Ito thinks (worries?) that metadata is so important it may be a major weapon for Microsoft. It's going to be so huge that we'll soon devote most of our storage space to metadata. In fact, it's going to be a semantic gold mine.

Rubbish. Unless you can repeal human nature.

As you can guess, a rant follows. A few clarifications before plunging in. First, I am not talking about simply using a standardized data structure to express information that can be collected as a side effect of tool use, e.g., story divisions, permalinks and other inherent structural information in a blog. These are very sensible features, long present in boring tools like Word, and aren't in any danger of being a Big New Thing. I am talking about things like GeoURL and FOAF that require incremental efforts, and even more about grand taxonomic metadata schemes that will overturn Google and/or result in the second coming in the form of the Semantic Web. Second, the canonical plain English rant on this topic has already been written by Cory Doctorow, two years ago. I recommend it; in fact, I will cite it as a gloss to this rant. But I have my own hobby horse to ride here. Third, for those who spied a certain word in the title and are considering drastic action: Take. Your. Finger. Off. The. Weapons. Release. Button. Now. I am not about to justify transcultural moral relativism or other silly-ass over-interpretations by French Persons.

That word is semiotics, of course. If you want the full French treatment of the topic (in English), you might start here. But I'll go with an engineer's gross simplification: Semiotics is the observation that words and other symbols are interpreted differently by different people, and that the disparity of interpretation is affected by the abstractness of the notion symbolized, and by cultural and other differences between the people. A matter fairly easily demonstrated: There is likely a decent global consensus, at least among engineers, on what is meant by '1 GHz Pentium III'. You can go into any hardware store in America, ask for a 6-32 x 2" machine screw, and get one without further discussion. But don't try it in a metric country. Moving to the abstract, here we spy part of a rational discussion between two experienced engineers regarding the word 'friend'. And here we have a 'friendly' cross-cultural dialog among statesmen regarding the notion 'allies.' QED. If you want more nuance, go off to the formal treatment, start off with 'signifier/signified', and surf away. Be careful. I'm informed that too much of the stuff can cause you to believe that you can't communicate with anyone about anything, that everything you think is predetermined by your culture and/or class, or cause you to put scare quotes around every other word. Try occasionally kicking a large rock, hard, while reading. It might help.

In spite of the risks, it's been shown possible to get productive use from this theory. For instance, Dina Mehta outlines its applications in design, and among other useful links, points to a more cut-down overview of semiotics, possibly of use to that profession.

Having spent some years working with unstructured text and hypertext databases, I'm willing to suggest that the core notion of semiotics is in fact a useful engineering maxim, a True Theory of how humans behave in the context of symbolic systems. Like the laws of thermodynamics in energy systems, semiotics proposes a hard limit to the efficiency of any situation involving externalized representations of human thought. You can process character strings or other computational representations as long as you want, but just as the map is not the territory, the symbol is not the thought of its author, nor the thought elicited in an eventual reader. Even if all the ambiguity inherent in messy languages like English were eliminated, this would remain.

Anyone who has built or been a heavy user of text databases or similar systems has run into this problem repeatedly. Fancy lexical or statistical processing does help, as does the integration of information like link patterns, but at the end there is always a significant and irreducible noise, which means one either has a certain amount of garbage in the output (from the view of the user), or is dropping some amount of useful information. Nor is there a way out by using an artificial set of symbols, e.g., 'controlled terms', taxonomies and the like. In the large, with a heterogeneous set of users, these perform no better than grinding up plain text. In fact, they create a secondary problem of inconsistency among the indexers employing the artificial symbol set, letting the ambiguity of language in the original documents back in at the rear door (Cory 2.4). And far from being neutral, any such artificial attempt to expunge the complexity of natural language does so by embodying a particular theory of importance, an intrinsic point of view, that will gain efficiency in one constrained setting at the price of being useless in others - and having no way to tell the difference. (Cory 2.5)
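For the engineers in the audience, the garbage-versus-dropped-information tradeoff is what retrieval folks measure as precision and recall. Here is a minimal sketch in Python. Everything in it is made up for illustration: the scores, the usefulness judgments, and the function name are hypothetical and come from no real system. It only shows that moving the cutoff trades one kind of error for the other.

```python
# Hypothetical data: (relevance score from some ranking function,
# whether a human reader actually found the document useful).
scored_docs = [
    (0.95, True), (0.90, True), (0.80, False), (0.70, True),
    (0.60, False), (0.50, True), (0.40, False), (0.30, False),
]


def precision_recall(docs, threshold):
    """Return (precision, recall) when every doc scoring >= threshold is returned."""
    returned = [useful for score, useful in docs if score >= threshold]
    total_useful = sum(1 for _, useful in docs if useful)
    hits = sum(returned)
    # Precision: share of the output that is not garbage.
    precision = hits / len(returned) if returned else 1.0
    # Recall: share of the useful material that was not dropped.
    recall = hits / total_useful if total_useful else 1.0
    return precision, recall


for threshold in (0.85, 0.65, 0.45):
    p, r = precision_recall(scored_docs, threshold)
    print(f"threshold={threshold:.2f} precision={p:.2f} recall={r:.2f}")
```

With this toy data, the strict cutoff returns no garbage but misses half the useful documents, while the loose cutoff finds all of them and lets garbage in. No setting of the threshold gets both numbers to 1.0, which is the point of the paragraph above.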

Now why should we suspect that taking character strings and wrapping them in XML or RDF is going to change all of this? The syntactic sugar is all wonderful, and indeed a better mousetrap from the POV of systems integration, but the real basis for the blue sky claims that we're approaching Semantic Web nirvana is bound up in the signifiers, the symbols, that are to be wrapped in that sugar. Is there some magic in angle brackets that was not found in LISP parentheses, that will repeal human nature and semiotics? I think not. Call it a taxonomy, a controlled vocabulary, a metadata dictionary, it's all the same thing: yet another language, either small and brittle, or large and ambiguous. Either way, just another layer on the Tower of Babel.

Coming soon: One place where the French and the Chicago school agree: economic reasons why the Semantic Web is a crock.
7:18:31 PM    


Welcome another biz/tech blogger

Umair Haque is blogging from the UK at Bubble Generation. Among others, he's got interesting posts on the impact of RIAA policies on developing economies and cultures and why ibankers shouldn't be giving advice on tech venture strategy. Oh, and he thinks there's a bubble in social software.
2:48:38 PM    


Another WMD threat to the civilized world

I suspect my wife's opinions on my tastes in hard liquor have just been confirmed. Wonder if they need any more weapons inspectors on this beat? (Hat tip to the Armed Liberal)
2:40:23 PM    


Oligopoly? Monopoly? Let's call the whole thing off...

Barry Ritholtz posts a rebuttal to Kevin Laws' most recent description of the real music monopolists. Kevin replies in the comments. I'm with him: Ritholtz' argument fails because music is not fungible. Maybe a casual listener might say (for instance), one bluegrass act is as good or bad as another, but no real fan would agree. In one of my favorite genres, no one's going to confuse the Chieftains with Clannad or the Afro-Celts, nor feel they're equivalent in setting a mood.

I do have my own issue with Kevin's post, where he equates monopoly/oligopoly to high margins. That needn't be the case, if the market in question is elastic, and the good sold is substitutable (as opposed to fungible). In that case, putting up prices to improve margins will drive down sales and send the customers away to another market. If the business has high fixed costs (think promotion), the net margin position won't improve, and the top line may fall. Sorta like what's actually happening....
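A back-of-envelope sketch of that argument, in Python. Every number here is hypothetical, and the constant-elasticity demand curve is my simplifying assumption, not anything taken from Kevin's post or from industry data. It only illustrates how a price rise in an elastic market with high fixed costs can shrink both the top line and the net margin.

```python
# All figures are made up for illustration.
BASE_PRICE = 15.0       # price per unit before the increase
BASE_UNITS = 1000.0     # units sold at the base price
ELASTICITY = 1.5        # > 1 means demand is elastic (assumed)
UNIT_COST = 2.0         # variable cost per unit
FIXED_COSTS = 8000.0    # promotion and other fixed costs


def units_sold(price):
    """Constant-elasticity demand: Q = Q0 * (P / P0) ** (-elasticity)."""
    return BASE_UNITS * (price / BASE_PRICE) ** (-ELASTICITY)


def results(price):
    """Return (revenue, net margin) at the given price."""
    units = units_sold(price)
    revenue = price * units
    profit = revenue - UNIT_COST * units - FIXED_COSTS
    return revenue, profit / revenue


for label, price in (("before", BASE_PRICE), ("after +20%", BASE_PRICE * 1.2)):
    revenue, margin = results(price)
    print(f"{label}: price={price:.2f} revenue={revenue:.0f} net margin={margin:.1%}")
```

Under these assumptions the higher price sells fewer units, revenue drops, and the fixed costs eat a larger share of what is left. Pick an elasticity below 1 and the conclusion reverses, which is why the elasticity of the market is the thing to argue about.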

Update: Kevin replies in e-mail:

To take in consideration as you post -- Bruce Springsteen's substitutability isn't huge. Mine (as a musician, anyway) is -- I have no fan base, but you might try me if I were cheap enough and somebody told you to try it out. So implied elasticity for established superstars is small, while for new artists it is quite large.  

Thus the goods are only highly substitutable outside the established end of the market...

My comment: I think we've gotten into a language problem around the context of substitutability. One music act is not the equivalent of another - though the degree may differ by artist and genre - but the entire media experience of music may well be substitutable - as entertainment - with another media option, e.g., DVD, online. Put up the prices, reduce the apparent value for money to the customers, and they will find another use for their dollars and time. If they feel they are being 'cheated' in some sense, compared to options they already know are available, they're just that much more likely to substitute.

Update 2: And Kevin sez:

Oh, yes, I agree with you there. Thus the conundrum the industry's gotten itself into -- by gouging (perhaps appropriately) on each individual basis, they've priced themselves out of the market as a form of entertainment in aggregate. The interesting point there is the extent to which it is against each individual's artists (or release's) interests to be the one to go lower in price. Oddly, only an industry with significant monopoly power could force prices down in aggregate ;-)

10:01:37 AM    

Welcome back, Stefan Smalla

German biz-blogger Stefan Smalla has finished his thesis, and is back from blogging hiatus. He's also launching a subsidiary page concentrated on annotated links as opposed to essays. All hail the soon-to-be-minted Diplom-Kaufmann!
9:46:23 AM