Wednesday, December 12, 2007

Semantic Web as Pragmatic Web

Note. Sorry for language mistakes. English is my second language.

I have published my article on Google Docs:

Vi-Fi (“Visible - Findable”) - how to describe and search sites with a standard and universal semantic-pragmatic tree: Semantic Web as Pragmatic Web

Abstract. This article presents a new approach to Web search that provides a powerful, narrowly focused marketing tool for small on-line businesses, as well as for anybody who wants to increase his visibility on the Web. The core of the approach is a surveyable (small enough, on the order of tens) standard universal system of attributes and their values, which describes both the requests of Web users and the content (offers) of sites in the same language and makes it easy to calculate the congruence between a query and a site. This system is based on pragmatics (the logic of the customer’s request) rather than on the usual (“pure”) ontology, and it is organised as a tree. Both web-masters and searchers browse this tree to describe their sites and queries respectively. The tree will keep changing until it takes its final form as the approach is implemented.

1. PROBLEM
2. APPROACH
3. STANDARD UNIVERSAL SEMANTIC-PRAGMATIC TREE
4. DESCRIBING QUERIES AND SITES
5. SEARCH AND METRICS
6. IMPLEMENTATION AND PROBLEMS
7. INSTEAD OF CONCLUSION

1. PROBLEM

View from the searcher’s side. Some things are easy to find on the Web, and others are very difficult to find. It is difficult even to tell whether they are on the Web at all. For example, just now it was very simple for me to find information about the Semantic Web: I searched Google and found a Wikipedia article with a lot of references, some of which led me to the W3C site, and so on. But it was simply impossible (at least for me, with my experience of Web search) to find anybody who had proposed or developed an approach similar to my own. It was even impossible to establish that nobody had elaborated these ideas. If I want to find a Toshiba Satellite 2060CDC notebook, that is a very simple problem to solve. But if I want to find a cheap, convenient, reliable notebook suited to my personal needs, I will have to spend at least several days studying the market, searching consumer forums and so on. Alternatively, I must buy something on the advice of friends, or simply something “famous”. I meet the same problem when I am going to travel. It is very simple to find some hotel in any city, but it is very difficult to find the “right” hotel for you. And of course it is almost impossible to find a more or less sizeable group of like-minded people who share your views, if those views are not very common. And so on. Put simply, on the Web I can find what the “Big Boys” want to sell me, rather than what I actually need.

View from the seller’s side. If I offer specific things to specific people, there is very little chance that I will find these people or that they will find me. If I have a specific area of expertise and want to sell it (or even offer it for free) to somebody, I have a chance (at best) of reaching only a very small fraction of my potential clients. The same is true of searching for friends, collaborators and so on. Yet in all these cases my offer will reach a lot of people who do not need it. In other words, if I am not a “Big Name”, I am invisible on the Web.

Of course, there have been many attempts to deal with these problems, such as eBay Auctions or Yahoo Questions, but none of them can solve the problem of satisfactorily matching offers and requests on the Web. For example, there may be hundreds or thousands of places more or less similar to what I need, but there are no tools for choosing the one or two most appropriate among them.


2. APPROACH

The roots of the problem described above are quite obvious: if you state what you want very briefly, you have little chance of being understood, especially when you want something special. If you minimise your query or the description of your site, you cannot find what you really need (but did not say), or you cannot be seen by those who really need what you offer (but, again, did not describe). Thus, to solve this problem, both searchers and web-masters must state their needs and offers, respectively, explicitly and in all the necessary detail. But this is only one condition, necessary but not sufficient. The second condition (“more sufficient”, but again not “completely sufficient”) is that they must use the same language.

The ultimate solution to our problem might be imagined as a model of the Semantic Web in which the user formulates a query in natural language, a semantic analyser extracts the meaning of this query, and a semantic search engine then analyses the content of sites to find those that fit the query. However, this idea can be realised today (and perhaps not only today, but in principle) only for very limited areas of application. I will not discuss here all the difficulties that arise when we attempt to create the Semantic Web, as they are well known. I want to point out just one of them, which attracts less attention from developers.

Indeed, the meaning of a request often cannot, in principle, be extracted from the text of the query. Somebody who asks for a “hotel in Florida” may simply not state explicitly that he needs a hotel for a holiday with his family. Thus, in many cases a special procedure is necessary to clarify the request. At the same time, not every hotelier states on his web site that his hotel is especially suited to leisure. Sometimes he omits this intentionally, being afraid of reducing the number of potential customers (for example, business travellers may want a hotel more specialised for their needs than his “leisure hotel”), but sometimes he simply does not think in the logic of his potential customers. In the latter case he may need a special procedure that helps him show on the site all the aspects of his hotel that may be crucial for his customers.

Now I can formulate the central idea of my approach. If both sites and queries are described in the same formal language, then we can make search more focused and make sites more visible. Two simple considerations are at the core of this approach. If I offer customers exactly what a thousand other sellers offer, my chance of being “findable” is one in a thousand at best. And if I formulate my need the way millions of others formulate theirs, I cannot expect to find the site that is right specifically for me. But if I describe my need in some detail, and in the same language that web-masters use to describe their sites, my chances of finding what I need increase, and so do the site owner’s chances of being found by me.

But every speaker of a natural language has his or her own experience, and as a result has his or her own “model of the world” and uses his or her own language, with his or her own meanings of words. Although the words themselves are common to all speakers of the language, their different meanings for different speakers (and listeners) make similar-sounding words, in fact, different ones. To say the same thing in other words, each person has his or her own system of concepts and, in this sense, his or her “own language”. Thus, the task is to replace (for the purpose of Web search) all these different individual “languages” with one common language. One way to solve this task is to introduce one standard ontology and to construct one artificial “language of concepts”, so that users are made to describe themselves within the framework of this artificial ontology with the help of its artificial language, rather than being allowed to use their own (“natural”) models of the world, systems of concepts and languages in the hope that the correct meaning can be extracted from these natural statements.

What kind of language do I mean? The simplest and most natural language of this kind (in the logic of a developer of such a system) is the language of attributes and values (systems of concepts). If we suppose, for simplicity, that all attributes have the same number of values, then the words of this language are matrices { x(i,j) }, where x(i,j) is the truth value of the predicate “the i-th attribute of the object has the j-th value”. To make our system of concepts rich enough to produce tens or hundreds of billions of different words, corresponding to different types of queries and sites, we must introduce a number of more or less independent attributes that describe both queries and sites.
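
A minimal sketch of such a matrix “word” might look like this in Python. The attribute names and values (purpose, price, location) and the helper `encode` are invented for illustration and do not come from the article; the sketch also assumes that each attribute takes exactly one value.

```python
# Hypothetical attributes and values, invented only for illustration.
ATTRIBUTES = {
    "purpose": ["business", "family holiday", "romantic trip"],
    "price": ["budget", "mid-range", "luxury"],
    "location": ["city centre", "beach", "countryside"],
}


def encode(description):
    """Return the matrix x, where x[i][j] is 1 if the i-th attribute
    has its j-th value in the description, and 0 otherwise."""
    matrix = []
    for name, values in ATTRIBUTES.items():
        chosen = description[name]
        if chosen not in values:
            raise ValueError(f"unknown value {chosen!r} for attribute {name!r}")
        matrix.append([1 if value == chosen else 0 for value in values])
    return matrix


# A query and a site would be encoded in exactly the same way.
query = {"purpose": "family holiday", "price": "budget", "location": "beach"}
word = encode(query)
for name, row in zip(ATTRIBUTES, word):
    print(name, row)
```

Each row of the matrix contains a single 1, so a searcher or a web-master only has to pick one value per attribute to produce a complete “word”.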

For example, if we introduce 10 independent attributes, each of which may take 1 of 10 different values, the total number of different types of queries/sites will be 10 to the 10th power, that is, 10 billion. Thus, it is enough for a searcher (web-master) to answer 10 simple questions to make his query (site) distinct from 10 billion other queries (sites).
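
The arithmetic can be checked with a few lines of Python. The function name `count_words` is hypothetical, and the sketch assumes that the attributes are fully independent and that each takes exactly one value.

```python
def count_words(values_per_attribute):
    """Number of distinct descriptions when each attribute takes exactly one value."""
    total = 1
    for n_values in values_per_attribute:
        total *= n_values
    return total


# Ten attributes with ten values each.
ten_ten = count_words([10] * 10)
print(ten_ten)  # 10000000000
```

The same function also covers attributes with unequal numbers of values, for example `count_words([3, 5, 7])`.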

Of course, this picture is an over-simplified model. The real set of attributes and their values is organised in a more complex way, as a hierarchy (tree), where the upper levels correspond to more general (abstract) concepts and the lower levels correspond to less general (more specific) concepts, which spell out the meanings of the upper ones. In other words, each complex attribute has its own tree-like structure, similar to the one we use when we browse a bookshop looking for books on a specific theme. But even allowing for the over-simplification of the example above, its “Ten-ten” estimate (ten attributes, each with ten values) gives the right idea of both the complexity of the descriptive language and the effort needed to use it. Since there is also no difficulty in working out standard, simple procedures for describing sites and forming queries (for example, using dialog wizards), the approach seems to be very practical.
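
Browsing such a hierarchy step by step, as a dialog wizard would, can be sketched in Python as follows. The tree fragment and the function names `options` and `leaf_paths` are assumptions made up for this illustration; they are not the tree proposed in the article.

```python
# Hypothetical fragment of a concept tree, invented only for illustration.
# Each key is a concept; its value is the subtree of more specific concepts.
TREE = {
    "Travel": {
        "Hotel": {
            "Leisure": {},
            "Business": {},
        },
        "Transport": {
            "Air": {},
            "Rail": {},
        },
    },
    "Goods": {
        "Notebook": {},
    },
}


def options(tree, path):
    """Return the sorted choices available after following `path` from the root."""
    node = tree
    for step in path:
        node = node[step]  # raises KeyError if the step is not in the tree
    return sorted(node)


def leaf_paths(tree, prefix=()):
    """Yield every root-to-leaf path, i.e. every fully specified concept."""
    for name, subtree in tree.items():
        path = prefix + (name,)
        if subtree:
            yield from leaf_paths(subtree, path)
        else:
            yield path


print(options(TREE, []))                   # choices at the top level
print(options(TREE, ["Travel", "Hotel"]))  # choices after two steps
print(len(list(leaf_paths(TREE))))         # number of fully specified concepts
```

A wizard would call `options` after every answer and stop when the list of choices is empty, so the user only ever sees a handful of alternatives at a time.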

And the last question (last in order but not in importance; indeed, it is VERY important) concerns the character of our attributes. To make the approach practical, our concepts must reflect the searcher’s pragmatics, i.e. what the searcher needs and what he wants to get from a site (to find on the Web). In other words, both queries and sites must be described in the logic of the searcher’s request. In particular, the web-master describes what a potential user may find valuable for himself in the site.

Below I provide examples of both the tree of concepts and the simple language for describing sites and queries.

To be continued
