One of the initial goals of creating BeAware was to find a useful application for technologies and approaches from the Semantic Web stack (against the background of three years of implanting semantics into projects that had no significant need for it). In this post I will review the advantages of using RDF datasets from Linked Data compared with other forms of data representation.
A lot of Linked Data datasets have been published by now. Interestingly, for most of them RDF is not the original representation form, and the two biggest geodata datasets (GeoNames and LinkedGeoData) are no exception. GeoNames is published daily as a set of CSV dumps, which can easily be imported into relational databases, and LinkedGeoData is an RDF representation of OpenStreetMap data, which is available in various non-RDF formats. So the question arises: “What benefit do we get from converting data into RDF?” I won’t dwell on the theoretical aspects of this question but will describe specific use cases that have become or will become available to BeAware thanks to RDF.
Let’s start with a short introduction to the idea of using such different geodatasets together in one project: GeoNames for cities (and the entities above them) and LinkedGeoData for objects inside a city.
At first it seemed there was no need for GeoNames, since LinkedGeoData contains information not only about objects but also about cities, countries, etc. But because OpenStreetMap is oriented toward content rather than context, cities (and, even more so, the entities above them) are described very poorly: there is neither the variety of localized names nor the full hierarchy that GeoNames offers.
So we decided to use GeoNames for the first “dimension” of an event location: the city. But the question of whether to use the relational or the RDF form was still open. During one of the discussions in the GeoNames Google group, Prakeek Jain gave me a link to an article he had co-authored. It led me to quite an obvious idea: how easily the transitive hierarchies of GeoNames entities could be processed by an inference engine.
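To make the idea of a transitive hierarchy concrete, here is a minimal Python sketch of what an inference engine does for us with such data. It is only an illustration: the `PARENT` table, the function names and the sample entities are my assumptions and are not taken from the real GeoNames dataset, where this work would be left to the inference engine rather than to hand-written code.

```python
# Hypothetical "direct parent" links, loosely modelled on a GeoNames-style
# hierarchy. The entries are illustrative, not real GeoNames records.
PARENT = {
    "Berlin": "Land Berlin",
    "Land Berlin": "Germany",
    "Munich": "Bavaria",
    "Bavaria": "Germany",
    "Germany": "Europe",
    "Paris": "Ile-de-France",
    "Ile-de-France": "France",
    "France": "Europe",
}


def ancestors(entity, parent=PARENT):
    """Return every entity reachable by following parent links upward."""
    result = []
    seen = set()
    current = parent.get(entity)
    # The `seen` set guards against accidental cycles in the data.
    while current is not None and current not in seen:
        seen.add(current)
        result.append(current)
        current = parent.get(current)
    return result


def located_in(entity, container, parent=PARENT):
    """True if `container` is a direct or indirect parent of `entity`."""
    return container in ancestors(entity, parent)


def cities_within(container, cities, parent=PARENT):
    """Keep only the cities that lie somewhere inside `container`."""
    return [c for c in cities if located_in(c, container, parent)]
```

With only direct links stored, `located_in("Munich", "Germany")` still holds because the relation is followed transitively. This is exactly the kind of question a relational dump would force us to answer with recursive queries.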
Another argument for the RDF representation of GeoNames was the possibility of integrating it with the geopolitical ontology. Currently GeoNames doesn’t contain information about unions of countries such as the European Union (although this side of GeoNames is being developed), but we are planning to implement geofiltering by this kind of entity. The geopolitical ontology already contains Country Groups; we just need to implement a mapping to GeoNames countries.
So we decided to use the RDF representation of GeoNames. Now let’s discuss the choice between LinkedGeoData and pure OpenStreetMap.
First of all, the LinkedGeoData ontology, which connects all OpenStreetMap categories and properties, suits our interface for choosing a new place very well (in addition, it allows us to use the inference engine, for example, to retrieve buildings of all types).
Secondly, since LinkedGeoData is oriented not toward displaying maps but toward executing geo queries, there is not only a full version but also a reduced one (LinkedGeoData Elements). LGD Elements doesn’t contain uninteresting objects (highways, hanging elements, etc.) and has 25 times fewer triples. It suits us perfectly, as hosting the 3 billion triples of the full LinkedGeoData is very expensive. This point relates neither to RDF nor to semantics, and maybe there are reduced datasets for pure OpenStreetMap too (but I haven’t seen any). Still, it highlights the fact that Linked Data datasets may have purely technical advantages that go beyond semantics.
We have reviewed the advantages of using GeoNames and LinkedGeoData separately. But there surely should be some advantages to using these datasets together (as they are published as Linked Data). Unfortunately, at the moment there is no mapping between LinkedGeoData and GeoNames. However, once it is created (it is on the LinkedGeoData developers’ to-do list), some useful use cases will open up for us. Here is an example.
Every GeoNames and LinkedGeoData entity has coordinates (latitude and longitude). As I understand it, a city’s coordinates carry no particular semantics (such as being the coordinates of the city administration building); they are just coordinates somewhere inside the city. These coordinates differ between GeoNames and LinkedGeoData. The difference is usually not very significant, but it is enough to cause the following problem. OpenStreetMap displays the city name next to the city’s coordinates. But in the form for choosing a new event location (when creating or editing an event), when you choose a city the map moves to the GeoNames coordinates of the city (since city search is driven by GeoNames), and the city name is not visible (at least at the default scale). If we had a GeoNames-to-LinkedGeoData mapping, we could use LinkedGeoData coordinates for GeoNames cities, which would solve this problem.
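A rough Python sketch of that fallback could look like the following. Everything here is hypothetical: the identifiers, the coordinates and the `GEONAMES_TO_LGD` table are made up, since the mapping itself does not exist yet.

```python
from typing import Dict, Optional, Tuple

Coordinates = Tuple[float, float]  # (latitude, longitude)

# All identifiers and coordinates below are invented for illustration.
GEONAMES_COORDS: Dict[str, Coordinates] = {
    "gn:city-1": (10.00, 20.00),
    "gn:city-2": (30.00, 40.00),
}
LGD_COORDS: Dict[str, Coordinates] = {
    "lgd:node-1": (10.02, 20.03),
}
# The mapping we are waiting for: GeoNames city -> LinkedGeoData node.
GEONAMES_TO_LGD: Dict[str, str] = {
    "gn:city-1": "lgd:node-1",
}


def map_center(geonames_id: str) -> Coordinates:
    """Pick the point the map should move to for a chosen GeoNames city.

    Prefer the LinkedGeoData coordinates (the point where OpenStreetMap
    draws the city name) and fall back to GeoNames when no mapping exists.
    """
    lgd_id: Optional[str] = GEONAMES_TO_LGD.get(geonames_id)
    if lgd_id is not None and lgd_id in LGD_COORDS:
        return LGD_COORDS[lgd_id]
    return GEONAMES_COORDS[geonames_id]
```

City search stays GeoNames-driven; only the point used to center the map changes, and unmapped cities behave exactly as they do today.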
To summarize, I can say for sure that using Linked Data is a real chance to extend your application’s functionality; you just need to find the right application for the opportunities that open up when you move to RDF.
Monday, June 7, 2010
Sunday, June 6, 2010
BeAware project current state
In this post I am going to write about the current state of the BeAware project, describe current problems and outline the nearest prospects. This post is meant to provoke discussion, so any comments are highly appreciated in our Google group.
Let’s begin with BeAware’s heart: the ontology.
I think BeAware’s ontology should satisfy the following criteria:
1. Be user-friendly. For example, this means we fix a strict set of properties for each event type.
2. One of the competitive advantages of semantic technologies is the ability to perform inference. That’s why I think the ontology should use it effectively (from the user’s perspective, of course). Right now the inference engine comes into play, for example, when a user searches for all entertaining events (regardless of how many hierarchy levels lie beneath) or for scientific conferences on all social sciences. Better than nothing… but this side of the ontology should obviously be improved.
3. The ontology should be a generalization of all types of events occurring all over the world.
I already said this in the previous post, but I want to repeat it (in my opinion, it is really important). Ontology development is a very difficult task that carries a lot of responsibility. It is difficult because we need to survey a huge number of events occurring all over the world and generalize them into a form convenient for the end user. It carries responsibility because as soon as the ontology is stabilized, a backward compatibility restriction will be introduced and we won’t be able to change the ontology significantly (events of various types will already have been created, and we will have to maintain them without changing their semantics).
Here is a list of open questions concerning the ontology structure:
1. Currently the “musical concert” event is a descendant of “entertaining event”. But, in my opinion, classical music concerts deserve “cultural event” as a parent. Under the current approach, where root event types (root in the visual sense: all events are descendants of the Event class) each refer to some sphere of life, it would be logical to create a “classical music concert” class and derive it from “cultural event”. But I think this will confuse users (classical music concerts will be created as plain “musical concert” and end up as an “entertaining event”). Maybe the whole structure of event types should be reworked?
2. The “Theater” event type has a “theater art type” property (there should be a more concise label…). Should we keep it this way, or would it be better to extract opera, ballet, etc. into subclasses of the “Theater” class? On the one hand, this will clutter the ontology; on the other hand, the current implementation doesn’t allow adding extra properties for, say, ballet (technically we can do it, but then we would break the first criterion).
3. Are there any standardized classifications of sciences and sports? The current “kind of science” (on “science event”) and “kind of sport” (on “sport event”) hierarchies should surely be reworked.
4. I have no idea where to put the “exhibition” event type. There are many types of exhibitions: pictures, dogs, IT (software, hardware, games), etc.
5. I am tempted to create a new root category called “IT”. So far I am successfully resisting… but a more serious question arises: by what criteria should root event types be created?
Let’s move on to the second topic of this post: an overview of how geodata is used in BeAware.
Every event is bound to a place, defined as:
Place = City (GeoNames) + [Object (LinkedGeoData)], where Object is an optional parameter.
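As a data structure, this definition could be sketched in Python roughly as follows. The class and field names are my own for this illustration, and the identifiers in the usage note are placeholders rather than real GeoNames or LinkedGeoData URIs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Place:
    """An event location: a mandatory city plus an optional object in it."""

    city: str                  # identifier of a GeoNames city
    obj: Optional[str] = None  # identifier of a LinkedGeoData object, if any

    def __post_init__(self) -> None:
        # The city is the only mandatory part of a place.
        if not self.city:
            raise ValueError("a place must be bound to a city")

    def describe(self) -> str:
        """Human-readable form, with the object first when it is present."""
        if self.obj is None:
            return self.city
        return f"{self.obj}, {self.city}"
```

For example, `Place("city-1")` is a valid place on its own, while `Place("city-1", "theater-7")` narrows it down to a particular object inside that city.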
Since events are bound to a GeoNames city, I would like to let users filter events by:
1. City
2. Country administrative division
3. Country
4. Union of countries
But if you look at BeAware’s advanced search, you will see geofiltering only by city or country. Why?
First, I will explain the absence of filtering by country administrative division. The initial implementation allowed filtering by the first three items of the list. But it made searching for the filter object very confusing: there could be a city, a country and a country administrative division with the same name. That’s why I decided to write the object type explicitly before its name.
But we can’t determine the type of the administrative division: state (USA), Land (Germany), prefecture (Japan), etc. Because of this uncertainty I decided to drop administrative division filtering for the time being. It looks like the only solution here is to use a general label such as “Country subdivision” (or could there be a better solution?).
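A small Python sketch of the generic-label approach is shown below. The `Type: Name` format, the `TYPE_LABELS` table and the function names are assumptions made for this example; the actual search box may render the type differently.

```python
# Generic type labels. "Country subdivision" stands in for state, Land,
# prefecture, etc., because the precise kind of division is not known.
TYPE_LABELS = {
    "city": "City",
    "country": "Country",
    "subdivision": "Country subdivision",
}

# Sample entities as (kind, name) pairs, chosen to share one name.
ENTITIES = [
    ("city", "Luxembourg"),
    ("country", "Luxembourg"),
    ("subdivision", "Luxembourg"),
    ("city", "Lisbon"),
]


def filter_label(kind, name):
    """Build the text shown in the filter: the object type, then its name."""
    return f"{TYPE_LABELS[kind]}: {name}"


def suggestions(query, entities=ENTITIES):
    """Return labelled suggestions whose name starts with the query."""
    q = query.lower()
    return sorted(
        filter_label(kind, name)
        for kind, name in entities
        if name.lower().startswith(q)
    )
```

Typing "lux" now yields three distinguishable entries instead of three identical ones, at the price of a less precise word for the subdivision.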
Now let’s discuss the fourth geofiltering method: filtering by unions of countries. Currently GeoNames doesn’t support this kind of geopolitical entity but, as far as I know, Marc (the head of GeoNames) is working on it. There is also another way to implement this feature: make a (very simple) mapping between GeoNames countries and geopolitical ontology countries, and filter by the country groups of the geopolitical ontology. I will find out more details and implement it later (I just haven’t had enough time yet).
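Once such a mapping exists, the filter itself is simple. The Python sketch below assumes the mapping has already been reduced to a table from group name to country codes. The `COUNTRY_GROUPS` table, the event records and the function name are all hypothetical and do not reflect the structure of the geopolitical ontology itself.

```python
# Hypothetical result of mapping country groups onto country codes.
COUNTRY_GROUPS = {
    "Benelux": {"BE", "NL", "LU"},
}

# Made-up events, each bound to the country of its city.
EVENTS = [
    {"title": "Event A", "country": "BE"},
    {"title": "Event B", "country": "FR"},
    {"title": "Event C", "country": "LU"},
]


def events_in_group(group, events=EVENTS, groups=COUNTRY_GROUPS):
    """Return the events whose country is a member of the given group."""
    members = groups.get(group, set())
    return [event for event in events if event["country"] in members]
```

An unknown group simply returns no events, so the filter degrades gracefully while the mapping is still incomplete.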
Now let’s talk about the second “dimension” of an event location: a particular object from LinkedGeoData. Why not let users filter events by an exact object in the city? It looks like a rather useful use case: you can get all plays at your favorite theater, all parties at a particular night club, etc. We haven’t implemented such filtering yet because it depends on the decision whether or not we will allow users to create custom objects (if they can’t find an appropriate one).
That’s all for today. Next time I am going to write about the advantages of using the RDF geo datasets (GeoNames RDF + LinkedGeoData) instead of relational GeoNames + OpenStreetMap.
BeAware project alpha-version has been launched
Several months ago, when searching for information about a conference, I visited a lot of “event catalogue” sites, but none of them appealed to me. The lack of a convenient search engine (with a large number of events this turns a site into a trashcan), poor geodata integration and other drawbacks pushed me to create a new platform for storing events. Besides, a key motivation was the potential for applying technologies from the Semantic Web stack (I had been looking for such a project for some time). Together with Alexander Efimov, I started working on the BeAware project.
In this post I am going to briefly describe BeAware’s points of interest without plunging into technical details.
Let’s save the index page for dessert and begin with the event creation page.
On the left side you can see the event tree, which is based on the ontology (the ontology is poorly worked out at the moment, but it still reflects the general idea). Each event type has its own set of properties (which is extended as you move down the hierarchy).
After selecting the event type, we can start filling in the event properties: basic properties and properties specific to the selected event type.
The most interesting part here is choosing the event location. A location is a combination of a city (GeoNames) and a concrete object inside it (LinkedGeoData). The city is the only mandatory part of the place.
You can either choose a previously labeled place or add a new one. Before choosing an object you should choose an object type to filter the objects (inference is used here, for example, to select all types of buildings).
Now you know what is meant by an “event” at BeAware. Let’s go to the main page and review the advanced search.
First, let’s choose an event type (the current dropdown implementation of the event tree is not very convenient… but we are working on it). As a result, additional optional filtering properties will be loaded (for example, for a scientific conference the “free/not free” and “science field” properties will be loaded). Both the event hierarchy and the hierarchies of property values are processed by the inference engine. That’s why a user who requests entertaining events will get not only “entertaining event” instances but also instances of all its descendants (festivals, carnivals, etc.). The same goes for a user interested in scientific conferences on all social sciences: no problem.
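The effect of that inference can be imitated in a few lines of plain Python. This is only a sketch of the behaviour, not of how the inference engine works internally. The `SUBTYPES` table is a guess at a small fragment of the event hierarchy, and the sample events are made up.

```python
# Assumed fragment of the event type hierarchy: type -> direct subtypes.
SUBTYPES = {
    "event": ["entertaining event", "science event"],
    "entertaining event": ["festival", "carnival", "musical concert"],
    "science event": ["scientific conference"],
}

# Made-up events, each tagged with its most specific type.
EVENTS = [
    {"title": "Event A", "type": "festival"},
    {"title": "Event B", "type": "scientific conference"},
    {"title": "Event C", "type": "entertaining event"},
]


def with_descendants(event_type, subtypes=SUBTYPES):
    """Return the type itself plus all of its direct and indirect subtypes."""
    found = set()
    stack = [event_type]
    while stack:
        current = stack.pop()
        if current in found:
            continue
        found.add(current)
        stack.extend(subtypes.get(current, []))
    return found


def search(event_type, events=EVENTS):
    """Find events of the requested type or of any of its descendants."""
    wanted = with_descendants(event_type)
    return [event for event in events if event["type"] in wanted]
```

Here `search("entertaining event")` returns both the festival and the plain entertaining event, regardless of how many levels the hierarchy has below the requested type.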
Then let’s look at the geo filter. At the moment you can filter events either by city or by country. In fact, this is only a small part of the functionality we could implement with the tools our service is based on (GeoNames, LinkedGeoData and the inference engine that is part of Virtuoso). I’m not going to go into details about geofiltering now, because the next post will cover this topic in full.
So we have set the search criteria; we just need to press the “Find events” button and get our results… although the current state of the database (to be precise, its emptiness) discredits the magic that occurs during search query execution.
We have looked through the core of the BeAware project; now let’s talk about the things that will be improved or implemented in the near future.
First of all, the event ontology will be extended. It’s a very difficult task that carries a lot of responsibility. It is difficult because we need to survey a huge number of events occurring all over the world and generalize them into a form convenient for the end user. It carries responsibility because as soon as the ontology is stabilized, a backward compatibility restriction will be introduced (meaning that we won’t be able to change the ontology significantly: there will be events of all types, and we will have to maintain them without changing their semantics).
Secondly, geofiltering is due to be improved. It’s just absurd to wield GeoNames + LinkedGeoData + an inference engine and get only filtering by cities and countries out of it. We will try to significantly improve this component.
Thirdly, we will implement subscriptions. The subscription interface will be similar to the advanced search interface. Subscriptions will relieve our users of the “pleasure” of daily searching and give them the opportunity to get notifications via email, RSS, etc.
Fourthly, RDFa markup will be added.
Fifthly, we will integrate MusicBrainz for musical concerts.
Sixthly, you may suggest your own ideas; we won’t leave them without proper attention :)
P. S. BeAware is looking for grants/investment. Please contact me for any information.