Monday, June 7, 2010

RDF vs. non-RDF for geodata in the BeAware project

One of the initial goals behind BeAware was to find a genuinely useful application for technologies and approaches from the Semantic Web stack (after three years of introducing semantics into projects that had no real need for it). In this post I will review the advantages of using RDF datasets from Linked Data compared with other forms of data representation.

Many Linked Data datasets are published today. Interestingly, for most of them RDF is not the original representation, and the two biggest geodata datasets (GeoNames and LinkedGeoData) are no exception. GeoNames is published daily as a set of CSV dumps that can easily be imported into relational databases, and LinkedGeoData is an RDF representation of OpenStreetMap data, which is available in various non-RDF formats. This raises the question: “What do we gain by converting data into RDF?” I won’t dwell on the theoretical side of this question; instead I will describe specific use cases that have become, or will become, available to BeAware thanks to RDF.
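To make the conversion step concrete, here is a minimal sketch of turning rows of a GeoNames dump into N-Triples. It assumes the column layout of the main GeoNames "geoname" table (the dumps are tab-delimited text, with geonameid, name, latitude, longitude and country code in columns 1, 2, 5, 6 and 9) and uses only a handful of columns. The choice of predicates (gn:name, gn:countryCode and the W3C WGS84 lat/long terms) is an assumption modeled on GeoNames’ own RDF, not a description of how GeoNames generates its RDF.

```python
import csv
import io

# Column positions in the GeoNames "geoname" dump (tab-delimited).
# Only a subset of the 19 columns is used in this sketch.
GEONAMEID, NAME, LATITUDE, LONGITUDE, COUNTRY_CODE = 0, 1, 4, 5, 8

# Assumed vocabularies for the generated triples.
GN = "http://www.geonames.org/ontology#"
WGS84 = "http://www.w3.org/2003/01/geo/wgs84_pos#"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"


def literal(value, datatype=None):
    """Build an N-Triples literal, escaping backslashes, quotes and line breaks."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    suffix = f"^^<{datatype}>" if datatype else ""
    return f'"{escaped}"{suffix}'


def row_to_ntriples(row):
    """Yield N-Triples lines describing one GeoNames feature."""
    subject = f"<http://sws.geonames.org/{row[GEONAMEID]}/>"
    yield f"{subject} <{GN}name> {literal(row[NAME])} ."
    yield f"{subject} <{GN}countryCode> {literal(row[COUNTRY_CODE])} ."
    yield f"{subject} <{WGS84}lat> {literal(row[LATITUDE], XSD_DECIMAL)} ."
    yield f"{subject} <{WGS84}long> {literal(row[LONGITUDE], XSD_DECIMAL)} ."


def convert(dump_text):
    """Convert the text of a GeoNames dump into a list of N-Triples lines."""
    reader = csv.reader(io.StringIO(dump_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    lines = []
    for row in reader:
        if not row:  # skip blank lines
            continue
        lines.extend(row_to_ntriples(row))
    return lines
```

Once the data is in this form, it can be loaded into any triple store and combined with other RDF sources, which is where the benefits discussed below come from.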

Let’s start with a short explanation of why we use two such different geodatasets in one project: GeoNames for cities (and higher-level entities) and LinkedGeoData for objects inside a city.

At first it seemed there was no need for GeoNames, since LinkedGeoData contains information not only about objects but also about cities, countries, etc. However, because OpenStreetMap focuses on content rather than context, cities (and higher-level entities even more so) are poorly annotated: there is neither the variety of localized names nor the complete hierarchy that GeoNames provides.

So we decided to use GeoNames for the first “dimension” of an event’s location: the city. But the question of whether to use the relational or the RDF form was still open. During one of the discussions in the GeoNames Google Group, Prateek Jain gave me a link to an article he had co-authored. It pointed me to a fairly obvious idea: how easily the transitive hierarchies of GeoNames entities could be processed by an inference engine.
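A minimal sketch of the inference in question, done by hand in plain Python rather than by a reasoner. It assumes each feature stores only a link to its direct parent (the GeoNames ontology has parent properties such as gn:parentFeature; treating such a link as transitive is the assumption here). The place names are hypothetical.

```python
from collections import defaultdict


def build_parent_index(pairs):
    """Index (child, parent) facts by child."""
    parents = defaultdict(set)
    for child, parent in pairs:
        parents[child].add(parent)
    return parents


def ancestors(node, parents):
    """Every node reachable by following parent links, i.e. what a reasoner
    would infer if the parent link were declared transitive."""
    seen = set()
    stack = list(parents.get(node, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def is_within(place, region, parents):
    """True if place is the region itself or lies anywhere below it."""
    return place == region or region in ancestors(place, parents)


# Hypothetical asserted facts: only direct parents are stored.
FACTS = [
    ("Exampleville", "Example Oblast"),
    ("Example Oblast", "Exampleland"),
    ("Exampleland", "Europe"),
]
```

Only direct parents are asserted, yet a question like “is this city in Europe?” is answered for a city three levels down. With RDF and an inference engine, this closure comes for free instead of being hand-coded or precomputed in relational tables.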

Another argument for the RDF representation of GeoNames was the possibility of integrating it with the geopolitical ontology. GeoNames currently doesn’t contain information about unions of countries such as the European Union (although this part of GeoNames is under development), but we plan to implement geofiltering by this kind of entity. The geopolitical ontology already contains Country Groups; we just need to map them to GeoNames countries.
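A minimal sketch of geofiltering by a country group, assuming the mapping from the geopolitical ontology’s Country Groups to GeoNames countries already exists and is keyed by ISO country codes. The COUNTRY_GROUPS dictionary, its key and the Event class are hypothetical, and the member list is a deliberately incomplete illustrative subset.

```python
from dataclasses import dataclass

# Hypothetical result of mapping a Country Group to GeoNames countries
# (ISO 3166-1 alpha-2 codes). Illustrative subset only, not the full list.
COUNTRY_GROUPS = {
    "EuropeanUnion": {"DE", "FR", "IT", "PL"},
}


@dataclass
class Event:
    title: str
    city: str
    country_code: str  # taken from the GeoNames record of the event's city


def filter_by_group(events, group):
    """Keep only events whose city lies in a member country of the group."""
    members = COUNTRY_GROUPS.get(group, set())
    return [event for event in events if event.country_code in members]
```

Keeping the group membership in a separate mapping means GeoNames data itself does not have to change when a country group is added.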

So we decided to use the RDF representation of GeoNames. Now let’s discuss the choice between LinkedGeoData and plain OpenStreetMap.

First, the LinkedGeoData ontology, which ties together all OpenStreetMap categories and properties, fits our interface for choosing a new place perfectly (it also lets us use an inference engine, for example to retrieve buildings of all types).
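Retrieving “buildings of all types” boils down to RDFS subclass reasoning: an instance of any subclass of Building is also a Building. The sketch below performs that inference by hand over in-memory lists; the class names and instances are hypothetical stand-ins, not actual LinkedGeoData ontology terms.

```python
from collections import defaultdict

# Hypothetical (subclass, superclass) pairs in the style of an OSM-derived ontology.
SUBCLASS_OF = [
    ("Church", "PlaceOfWorship"),
    ("PlaceOfWorship", "Building"),
    ("School", "Building"),
    ("Cafe", "Amenity"),
]

# Hypothetical (entity, asserted type) pairs.
INSTANCES = [
    ("St. Example Church", "Church"),
    ("School No. 1", "School"),
    ("Corner Cafe", "Cafe"),
    ("Town Hall", "Building"),
]


def subclasses(cls, subclass_of):
    """The class itself plus all of its direct and indirect subclasses."""
    children = defaultdict(set)
    for sub, sup in subclass_of:
        children[sup].add(sub)
    result, stack = {cls}, [cls]
    while stack:
        for sub in children[stack.pop()]:
            if sub not in result:
                result.add(sub)
                stack.append(sub)
    return result


def instances_of(cls, instances, subclass_of):
    """Entities whose asserted type is cls or any of its subclasses."""
    classes = subclasses(cls, subclass_of)
    return sorted(name for name, type_ in instances if type_ in classes)
```

For example, `instances_of("Building", INSTANCES, SUBCLASS_OF)` returns the church, the school and the town hall, even though only the town hall is typed directly as a Building.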
Second, since LinkedGeoData is oriented not toward displaying maps but toward executing geo queries, there is not only a full version but also a reduced one (LinkedGeoData Elements). LGD Elements omits uninteresting objects (highways, dangling elements, etc.) and has 25 times fewer triples. This suits us perfectly, since hosting the 3 billion triples of the full LinkedGeoData is very expensive. This point relates neither to RDF nor to semantics, and there may be reduced datasets for plain OpenStreetMap too (though I haven’t seen any). But it highlights the fact that Linked Data datasets can have purely technical advantages that go beyond semantics.

We have reviewed the advantages of using GeoNames and LinkedGeoData separately. But there should surely also be advantages to using these datasets together, since both are published as Linked Data. Unfortunately, there is currently no mapping between LinkedGeoData and GeoNames. Once it is created (it is on the LinkedGeoData developers’ to-do list), several useful use cases will open up for us. Here is one example.

Every GeoNames and LinkedGeoData entity has coordinates (latitude and longitude). As far as I understand, city coordinates carry no specific meaning (such as “the location of the city administration building”); they are simply some point inside the city. So these coordinates differ between GeoNames and LinkedGeoData. The difference is usually small, but large enough to cause the following problem. OpenStreetMap displays the city name near the city’s coordinates. But in the event location form (when creating or editing an event), choosing a city moves the map to the GeoNames city coordinates (because city search is GeoNames-driven), and the city name is not visible, at least at the default zoom level. With a GeoNames+LinkedGeoData mapping, we could use LinkedGeoData coordinates for GeoNames cities, which would solve this problem.
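Since the GeoNames+LinkedGeoData mapping does not exist yet, the following is only a hypothetical sketch of how it could be used: when centering the map on a chosen city, prefer the LinkedGeoData coordinates of the mapped entity and fall back to the GeoNames coordinates otherwise. The identifiers, the gn_to_lgd dictionary and the coordinates are made up; haversine_km is included just to measure how far apart the two points are.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def map_center(geonames_id, geonames_coords, gn_to_lgd, lgd_coords):
    """Prefer LinkedGeoData coordinates for a GeoNames city when a mapping
    exists; otherwise use the GeoNames coordinates."""
    lgd_id = gn_to_lgd.get(geonames_id)
    if lgd_id is not None and lgd_id in lgd_coords:
        return lgd_coords[lgd_id]
    return geonames_coords[geonames_id]
```

Because OpenStreetMap places the city label near its own city coordinates, centering on the LinkedGeoData point would keep the label in view at the default zoom, while unmapped cities behave exactly as they do now.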

To summarize, I can say with confidence that Linked Data is a real opportunity to extend your application’s functionality; you just need to find the right uses for the possibilities that open up when you move to RDF.
