Spanish transcript

Many thanks to Sergio Ocaña Gálvez (sergio@istudio.es), who made this transcription.

THE PROBLEM

John Hebeler (0:02): The main problem is that our ability to generate information has far exceeded our ability to manage it. You could say we are drowning in our own wealth; that is what is happening, because we have all this information, all these access points, and there really is no effective way to help control it all except for the information you are able to store in your brain. And you can't memorize that much. So we have all this enormous potential, but we don't have the tools to control it.

~00:29: Web 3.0, a documentary by Kate Ray

~00:45: HE.NET Data Center, Fremont, California

David Weinberger (0:33): We have so much material to deal with. Individually, culturally. So much… that it would simply exceed the physical limits of any library. You know, if we used the Dewey system to classify all the content on the web, the trillion pages and subpages out there, we wouldn't be able to find anything; the system simply wouldn't work.

Footage of the HE.Net Data Center: 800 cabinets = 9,600 terabytes = 9.6 billion books = 1,690 U.S. Libraries of Congress

~01:13: Clay Shirky, writer and professor of New Media, New York University

Clay Shirky (1:09): The amount of information available to the average user is vastly greater than anything that has ever existed in human history. If I were going to start a new business tomorrow, I would start a news business designed not to give users a new stream of news, but to aggregate the existing news that really mattered to them.

~01:38: Nova Spivack, Semantic Web entrepreneur

Nova Spivack (1:32): Google grew in importance as the web spread across millions of pages. Soon we will be on a web with billions (in fact, we already are), billions and billions of pages, and soon trillions. Because every tweet is, in itself, a page. Every product in the world, everything you can name or address, will have a page. And that's trillions of things. Google can't scale to that.

~02:02: John Hebeler, BBN Technologies

Hebeler (1:59): You should be able to ask precisely enough to get a very specific result out of all the information out there, but you can't. You have to do all the integration in your head, you have to go through everything Google gives back, and you say, “Huh, I wonder how I could ask that, because this is right but this other part isn't… Now I have to go through it again; I got this, but it's not what I want.”

~02:20: Tim Berners-Lee, inventor of the World Wide Web

Tim Berners-Lee (2:18): And so that isn't really a search. I think people use the word search to mean this kind of leap into the void, crossing your fingers and hoping to land somewhere good.

~02:29: Chris Dixon, Hunch.com, CEO

Chris Dixon (2:25): When you're looking for a camera and you go to a website with ten thousand different models, you feel overwhelmed, and studies show that people are actually less likely to buy something when they're overwhelmed, and even less happy with what they bought afterward.

Weinberger (2:37): We have too many emails, so we start to tag them or label them; Gmail calls them labels. And we start applying labels. And we reach the point where we have hundreds of labels and we think, “Good grief, I'm going to have to label my labels.”

Hebeler (2:48): We have so many tweets and so much MySpace, and you think, “What if I started to put some order into all that flow of information?” And to do that, I need some structure.

~03:00: Alon Halevy, Google, Research Scientist

Alon Halevy (2:59): Clearly something needs to be done with more structured information.

Dixon (3:03): All the information may be out there, but if it isn't indexed properly, it's as if it weren't there, you know?

Shirky (3:09): That is, in many ways, the problem of the age. Content, as it turns out, is not king.

Weinberger (3:15): We will always be filtering the filters that filter our filters… that filter our filters.

Hebeler (3:20): How do I find the right file? How do I know that all those files belong there?

Spivack (3:24): How do I integrate the information?

Jason Shellen (3:25): How can I keep up with all these new sources of information?

Shirky (3:29): How do I filter content to give it more value than we currently get?

Hebeler (3:33): …and that is what the Semantic Web could eventually promise to do.


Web 3.0 Transcript – Now with links!

THE PROBLEM

John Hebeler (0:02): The core problem is, our ability to create information has far exceeded our ability to manage it. It’s kind of like we’re drowning in our richness, that’s kind of what’s happening, cause you have all this data, all these access points, and there’s really no way to really help you deal with it except for stuff you can pull into your human brain. And you can only pull in so much. So you’ve got this massive amount of potential, but there’s not any real tools to harness it.

David Weinberger (0:33): We have so much stuff that we have to deal with. Individually, as a culture. So much – that it just bursts the bounds of any physical library. You know if we had a Dewey Decimal System for everything on the web, the trillion pages and all the subpages and all that, we wouldn’t find a thing, that system simply can’t work.

Footage of HE.Net Data Center: 800 cabinets = 9,600 terabytes = 9.6 billion thick books = 1,690 Libraries of Congress.

Clay Shirky (1:09): The amount of media that’s available to the average user is a vastly much larger superset than anything that’s ever existed in human history. If I was going to start a news business tomorrow, I would start a news business designed to produce not one new bit of news, but instead to aggregate news for individuals in ways that mattered to them.

Nova Spivack (1:32): Google really was more important as the web was in millions of pages. Now we’re entering a web that’s going to be billions – well, it already is – that’s going to be billions and billions of pages, and soon trillions of pages. Because a tweet is actually, every individual item is a page. Every product in the world, everything you can name or address is going to have a page. And so that’s trillions of things. And Google doesn’t scale to that.

Hebeler (1:59): There should be enough information out there that you should be able to ask for something extraordinarily specific, but you can’t. You pretty much have to do all the integration in your own head, you’ve gotta come back and see all the stuff that comes back from Google, and say, “Oh, I wonder how I could ask that, cause this was kinda right but this was wrong… Oh, I see why it came back, came this out, that isn’t what I want though.”

Tim Berners-Lee (2:18): And so that’s not really a search, I think people use the word search to mean this sort of parachuting in, crossing your fingers, and hoping to land somewhere really good.

Chris Dixon (2:25): You know when you’re looking for a camera and you go to some place and there’s like ten thousand cameras and you’re overwhelmed, and sort of studies show that people are actually less likely to buy something when they’re overwhelmed by these things and less likely to actually be happy with what they buy afterward.

Weinberger (2:37): We have too many emails, so we start to tag them or label them, Gmail calls them labels. And we start to apply labels. And then we get, maybe we start to get hundreds of labels and we think, Oh jeez, now I gotta label my labels.

Hebeler (2:48): All the tweets and all the MySpace and you start to think, What if I could start to put things together in all that flow of information? And in order to do that, you need some structure.

Alon Halevy (2:59): It’s clear that something needs to be done with more structured data.

Dixon (3:03): Like all the information might be out there, it’s just if it’s indexed in a really inaccessible form, you know a lot of times it might as well not be out there, right?

Shirky (3:09): That is, in many ways, the problem of the age. Right, content, as it turns out, is not king.

Weinberger (3:15): We are always going to be filtering the filters that filter our filters. That filter our filters.

Hebeler (3:20): How do I find the right file? How do I know that all those files belong there?

Spivack (3:24): How do you integrate data?

Jason Shellen (3:25): How do I keep up with all these new sources of information?

Shirky (3:29): How do you filter things to create more value than you can currently get?

Hebeler (3:33): And that is what the Semantic Web could eventually promise to do.


Web 3.0 – Some useful distinctions and what is an ontology, anyway?

I made a short film on the story of the Semantic Web because I was fascinated by the philosophy it was based on and by the people who devoted themselves to it. We think of technology as something “other” than us, artificial as distinguished from natural, so I hoped to call attention to the fact that technology is built by people. John Hebeler’s understanding of the Semantic Web as being “all about relationships” and Clay Shirky’s objection to it because he doesn’t think we can “unambiguously describe the world” illustrate my belief that technology is less an exact science than the expression of a worldview.

But of course it’s more complicated than that. To tell a better story, I chose to leave a lot out (I can assure you that you wouldn’t have wanted to watch the hour-long more “nuanced” versions I started out with). I’m going into some of that here.

I want to complicate the objections raised by “the critics” of the Semantic Web. Essentially, the argument I presented was that while getting all the information on the web into standard formats might be a great idea in theory, in practice it can’t work because people don’t agree about the definitions of anything. In fact, the Semantic Web doesn’t require everyone to agree on the same ontologies. (In case you didn’t catch what an ontology is, think of it as a taxonomy that’s more specific about the relationships between things.) The idea is for everyone to build up their own ontologies, but to have some standard ways of connecting them together (so I can say the “Kate Ray” on Facebook and the “Kate Ray” on Twitter are the same person, and be able to bring together information from different sites).
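To make the “own ontologies, standard links” idea concrete, here is a minimal sketch in Python. It assumes that each site publishes its facts as simple (subject, predicate, object) triples, and that both sites share exactly one thing: an `owl:sameAs` link. The identifiers (`fb:kate_ray`, `tw:kateray`), the predicates (`fb:hometown`, `tw:bio`, and so on), and the values are hypothetical placeholders. A real system would use an RDF library rather than Python tuples.

```python
# Facts published by two independent sites, each with its own vocabulary.
# All identifiers, predicates and values are hypothetical placeholders.
facebook = {
    ("fb:kate_ray", "fb:name", "Kate Ray"),
    ("fb:kate_ray", "fb:hometown", "Example City"),
}
twitter = {
    ("tw:kateray", "tw:displayName", "Kate Ray"),
    ("tw:kateray", "tw:bio", "Example bio text"),
}
# The one shared piece: a standard link saying both IDs denote the same person.
links = {("fb:kate_ray", "owl:sameAs", "tw:kateray")}


def aliases(entity, links):
    """Return every identifier linked to `entity` by owl:sameAs, in either direction."""
    found = {entity}
    changed = True
    while changed:
        changed = False
        for s, p, o in links:
            if p != "owl:sameAs":
                continue
            if s in found and o not in found:
                found.add(o)
                changed = True
            elif o in found and s not in found:
                found.add(s)
                changed = True
    return found


def describe(entity, *graphs, links=frozenset()):
    """Collect (predicate, object) pairs about `entity` and all of its aliases."""
    ids = aliases(entity, links)
    return sorted((p, o) for g in graphs for s, p, o in g if s in ids)


for pred, obj in describe("fb:kate_ray", facebook, twitter, links=links):
    print(pred, "->", obj)
```

Neither site has to adopt the other's vocabulary. Without the link, a query about `fb:kate_ray` only sees the Facebook facts. With the link, it also sees the Twitter facts.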

Having lots of ontologies still doesn’t resolve everything. It actually gets into an even headier controversy about what it means for one thing to be “SameAs” another thing (see my post on ‘Ontology Alignment’), but it’s not indicative of a failure to acknowledge that people have different worldviews. The “neatest” of the Semantic Web academics I talked to (including Frank van Harmelen, whose interview didn’t end up in the film) may have leaned toward a more scientific, objective view of the world, but they weren’t suggesting that we eliminate diversity of thought. Even so, there are plenty of areas where people should agree about definitions. A Japanese factory working with an American factory to build airplanes should probably have some pretty damn specific agreements about what each part is and how it fits in with the other parts. Manufacturing and other enterprise-level areas, in fact, are currently hot applications of the Semantic Web.
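One reason “SameAs” is so contentious is that, in OWL, `owl:sameAs` means identity, so it is symmetric and transitive. If A is the same as B and B is the same as C, then A is the same as C. The sketch below groups identifiers with a union-find structure. `SameAsIndex` and all the identifiers are hypothetical. It shows how a single loose link, made only because two names match, merges two different people into one entity. It illustrates the logical consequence and does not model any particular tool.

```python
class SameAsIndex:
    """Union-find over identifiers; each set holds IDs treated as one entity."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        """Return the representative ID of x's set, registering x if unseen."""
        self.parent.setdefault(x, x)
        # Path halving keeps lookups cheap as sets grow.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def same_as(self, a, b):
        """Record a sameAs link; merging sets gives symmetry and transitivity."""
        self.parent[self.find(a)] = self.find(b)

    def entity(self, x):
        """All IDs currently considered the same thing as x."""
        root = self.find(x)
        return {y for y in list(self.parent) if self.find(y) == root}


idx = SameAsIndex()
idx.same_as("fb:kate_ray", "tw:kateray")      # a careful, correct link
idx.same_as("tw:kateray", "blog:kate-ray")    # another correct link
print(sorted(idx.entity("fb:kate_ray")))      # transitivity merges all three

# One sloppy link to a different (hypothetical) person who shares the name:
idx.same_as("blog:kate-ray", "directory:kate_ray_42")
print(len(idx.entity("fb:kate_ray")))         # two people now read as one entity
```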

An earlier version of the film also distinguished between the Semantic Web and Linked Open Data. The Semantic Web is closer to the original vision: all information rendered machine-understandable so that computers can reason across huge swaths of data and potentially discover things we can’t now. Linked Data is a toned-down version, or a first step: just get information free from the applications that are keeping it locked up, and figure out what to do with it later. Linked Data is actually what Tim Berners-Lee was talking about in the TED clip I showed, and in the year since his talk quite a lot of organizations have added their data, from The New York Times to the U.S. Government. Whatever the debate surrounding the usefulness of the Semantic Web as a whole, the response to the Linked Data project looks like a clear step toward better transparency.
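Here is a rough illustration of what “computers can reason across data” means at the smallest scale: a forward-chaining sketch in Python. It applies only two RDFS entailment rules until no new triples appear. The first (rdfs11) makes `rdfs:subClassOf` transitive. The second (rdfs9) says that an instance of a class is also an instance of its superclasses. The class and part names are hypothetical and echo the airplane-parts example. Real reasoners implement far richer rule sets and are much more efficient than this quadratic loop.

```python
SUBCLASS, TYPE = "rdfs:subClassOf", "rdf:type"


def infer(triples):
    """Apply two RDFS rules repeatedly until no new triples are produced."""
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            for s2, p2, o2 in facts:
                if o != s2 or p2 != SUBCLASS:
                    continue
                if p == SUBCLASS:
                    # rdfs11: A subClassOf B, B subClassOf C  =>  A subClassOf C
                    new.add((s, SUBCLASS, o2))
                elif p == TYPE:
                    # rdfs9: x type B, B subClassOf C  =>  x type C
                    new.add((s, TYPE, o2))
        if new <= facts:  # fixpoint: nothing new was derived
            return facts
        facts |= new


# Hypothetical shared parts vocabulary plus one stated fact about a part.
data = {
    ("ex:Wing", SUBCLASS, "ex:AirframePart"),
    ("ex:AirframePart", SUBCLASS, "ex:AircraftPart"),
    ("ex:part_1234", TYPE, "ex:Wing"),
}

for triple in sorted(infer(data) - data):
    print(triple)  # facts nobody stated explicitly
```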

I could continue this post in ever-finer detail (for an ever-diminishing audience), but instead will just repeat that my motivation for all this is discussion. Technology has important implications for how we think about the world, which is why I think these are the kinds of conversations we should be having. Thanks to everyone who commented, wrote posts in response, and sent me emails. I intend to keep up my side of the dialogue.

Hive Architecture

paradox of the crowd?

If I told you that a big group of people is better at doing things than one person alone, or even a smaller group of people, you’d probably agree with me. Science benefits from collaboration, as does carrying heavy objects. Kitchen wisdom says that ‘many hands make light work.’ And then there’s Wikipedia.

But if I told you that big groups of people are terrible at doing things, you’d probably still agree with me. Large organizations are inefficient, and mob mentality causes regular people to make stupid/appalling decisions. Our kitchen sage contradicts itself with the warning that ‘too many cooks spoil the broth.’ There was the recession.

Isn’t there a contradiction here? The web is brimming with discussions about the value or deficiencies of social production and about ‘the wisdom of the crowd’ versus ‘the stupidity of the crowd,’ as if it were one or the other, but the truth is that a group is neither wise nor stupid, powerful nor impotent. What’s important is just the observation that a group is not equal to the sum of its parts. Groups of smart people can do stupid things, a bunch of ignorant guessers can make better predictions than an expert, and, despite the fact that I live with three friends who keep their bedrooms fairly neat, our kitchen is more or less repulsive.

Part of the confusion surrounding ‘crowd wisdom’ comes from the fact that there are two principles at work here. First, Social Psych Rule #1: the variation in a person’s behavior across situations is much greater than the variation between people’s behavior in the same situation. In other words, we’re extremely susceptible to the influence of a situation, basing our behavior much more on the specifics of our environment than our snowflake-unique personalities. Every group has a particular social structure and environment, and that is what’s mostly going to determine whether we give it our all, shirk responsibility, or let our better judgment be swayed. Groups with different structures will do completely different things.

Second, there’s a bit of oft-overlooked math underlying crowdsourcing that was part of the original formulation of the wisdom of the crowd: collective error = average individual error – prediction diversity. Here each error is measured as the squared distance from the true answer, and prediction diversity is the spread (variance) of the guesses around the group’s average. Meaning, if you have a group of people with diverse opinions about something tangible (say, the number of jellybeans in a jar), their individual errors can cancel out when you average their answers, yielding one that is surprisingly accurate (better than 55 of 56 individual guesses). It all depends, though, on getting a random sampling of people with diverse opinions – which often isn’t the case with a self-selected group. Take a group of people with similar assumptions/biases and add the influence of a group’s environment on their behavior (the strong tendency to adjust opinions closer to the majority’s, for example) and you end up with a group that does astonishingly dumb things.
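The formula can be checked numerically. Its precise version is usually called the diversity prediction theorem (Scott Page). It states that the crowd’s squared error equals the average individual squared error minus the variance of the guesses. The guess lists below are made-up numbers, not data from the jellybean experiment. The second list shows what happens when guesses cluster around a shared bias: diversity is small, so little error cancels out.

```python
import math
from statistics import fmean


def crowd_errors(guesses, truth):
    """Return (collective error, average individual error, prediction diversity).

    All three are squared quantities; that is what makes the identity exact.
    """
    crowd = fmean(guesses)  # the group's answer: the average guess
    collective = (crowd - truth) ** 2
    individual = fmean((g - truth) ** 2 for g in guesses)
    diversity = fmean((g - crowd) ** 2 for g in guesses)
    return collective, individual, diversity


truth = 850  # hypothetical number of jellybeans in the jar

diverse = [600, 720, 800, 910, 1000, 1150]    # scattered; errors point both ways
biased = [980, 1000, 1010, 1020, 1040, 1050]  # similar guesses, shared overestimate

for name, guesses in [("diverse", diverse), ("biased", biased)]:
    c, i, d = crowd_errors(guesses, truth)
    assert math.isclose(c, i - d)  # collective = individual - diversity
    print(f"{name}: collective={c:.0f} individual={i:.0f} diversity={d:.0f}")
```

The identity holds for any list of guesses. Only diversity, not individual skill alone, determines how much error the averaging removes.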

The point? Groups aren’t necessarily smart or powerful, but they could be. It isn’t enough to get a bunch of people together in a room or on a website with a great vision of what they could accomplish. The group’s potential hinges on its structure.

architecting the hive

The structure of an online social platform plays a pretty significant role, then. Its aim is to facilitate the kind of group that will lead to the most efficient collaboration, the wisest knowledge, the broadest possible imagination. It must prevent the flood of simultaneous contributions from descending into counterproductive anarchy.

The platform consists of a two-part structure: its architecture and its user community (see Wikipedia sociology for the most thorough, obsessive mapping of an online community). The architecture is coded and therefore controllable; the social structure – while ultimately determined by users – is guided by the architecture, interface, and marketing of the site. Ideally, the designer of such a platform would be cognizant of the social repercussions of every element of the site. Ideally, he/she would make deliberate choices about each one of those elements to encourage specific group dynamics and cohesion. In practice, of course, we don’t know nearly enough about what goes on between a website and its users to make such fine-tuned predictions (especially any time a rebellious userbase decides to transform a site into something it wasn’t intended to be). Luckily, what’s unique to web services is that the code of a site can be changed in response to user behavior – a kind of symbiosis between technical and social structure. The immediate feedback in the form of changed behavior on the site provides the perfect opportunity to learn.

There’s a lot to learn. The Internet is still so young that things are exploding on top of it without us really knowing why. The slightest variation in group structure – say, introducing the one-way relationship of ‘follower’ in place of the two-way relationship of ‘friend’ – can have a huge impact on the nature of the group, and make or break a start-up. There are interesting experiments in group formation everywhere, and new trends, like incorporating gaming techniques into social network platforms, are rife with implications about human motivation. Academically, all of this is fascinating, but ultimately I’m interested in learning how to build more deliberately crafted platforms that will help us work together better.
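Underneath, the difference between ‘friend’ and ‘follower’ is a choice of data structure. A friendship is a symmetric edge that both people must accept. Following is a directed edge that one person can add alone. The minimal Python sketch below uses hypothetical class and method names and does not describe how any real site stores its graph. It makes the asymmetry visible: in the follower model, attention can pile up on a few accounts with no reciprocity at all.

```python
from collections import defaultdict


class FriendGraph:
    """Two-way ties: an edge exists only once both sides agree."""

    def __init__(self):
        self.pending = set()              # (requester, recipient) awaiting approval
        self.friends = defaultdict(set)

    def request(self, a, b):
        self.pending.add((a, b))

    def accept(self, b, a):
        """`b` accepts a pending request from `a`; the tie becomes mutual."""
        if (a, b) in self.pending:
            self.pending.discard((a, b))
            self.friends[a].add(b)
            self.friends[b].add(a)


class FollowerGraph:
    """One-way ties: anyone can follow anyone, with no approval step."""

    def __init__(self):
        self.following = defaultdict(set)
        self.followers = defaultdict(set)

    def follow(self, a, b):
        self.following[a].add(b)
        self.followers[b].add(a)


friend_net = FriendGraph()
friend_net.request("ana", "ben")
print(len(friend_net.friends["ben"]))   # no tie until ben accepts
friend_net.accept("ben", "ana")
print(friend_net.friends["ana"], friend_net.friends["ben"])  # now symmetric

follow_net = FollowerGraph()
for fan in ["ana", "ben", "cam", "dee"]:
    follow_net.follow(fan, "celebrity")
# Lopsided: many followers, zero reciprocal edges required.
print(len(follow_net.followers["celebrity"]), len(follow_net.following["celebrity"]))
```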

This blog is about that.