Semantic Information Retrieval Using Fuzzy Computer Science

Essay added: 30-03-2016, 17:17

Abstract:

Several approaches have been introduced in the field of information retrieval. Although these approaches are effective, they sometimes fail to provide accurate information to the user. In this paper, an ontology-based approach to information retrieval is presented that uses a fuzzy set of documents for a specific domain. An algorithm for fuzzy-based classification of web documents is proposed to create a semantic index. The proposed algorithm differs from others in that it uses both the K-means clustering algorithm, to find semantically similar terms, and a domain ontology. The retrieved results are semantically relevant because they are limited to a particular threshold of the classified range.

Keywords: Information Retrieval, Ontology, Semantic search, Query Expansion, K-means clustering, Fuzzy Sets.

I. INTRODUCTION

With the rapid growth of information technology, information retrieval on the Internet is gaining importance day by day. The web comprises a huge amount of data, and search engines provide a way to navigate it and find relevant information. In practice, however, the World Wide Web has proven less efficient at returning relevant information for a user's query. Although search engines seem at first glance to provide accurate information, most of the time they are less effective at retrieving the most relevant information for the user's query. Therefore, effort should go into a methodology that produces high-quality results instead of just sorting a mass of documents, which contributes little to result quality. The biggest challenge in this process is finding the data most relevant to the user's interest. At present, the relevance of a web document is evaluated by matching each keyword of the query against information on the web [14]. To overcome this problem, emerging semantic approaches such as the Semantic Web [15] establish semantic relationships among documents. In the Semantic Web, information is stored in a conceptual hierarchy referred to as an ontology, developed in the Web Ontology Language (OWL). Similarity can therefore be evaluated using the semantics of the concepts in the ontology. Information in the Semantic Web is represented in RDF (Resource Description Framework), which gives each concept a meaning that the user can understand. There is a need to extract information more intelligently so that it meets the user's requirements to a greater extent. For this purpose, knowledge-base repositories such as ontologies are used. An ontology defines concepts and relationships that describe a document in domain-specific terms. Using an ontology for information retrieval therefore allows information to be extracted on the basis of semantic associations (links) rather than by keyword matching alone.

Fuzzy logic can be quite beneficial in complex situations where people have difficulty making decisions and find complex mathematical models hard to use. One such field is information retrieval, where an IR system finds it difficult to decide which information is accurate for the user.

A fuzzy set A in a universe of discourse U is characterized by a membership function $\mu_A(x)$ that takes values in the interval [0, 1] [16]. Fuzzy set theory was proposed by Zadeh [16] in 1965 and is also called 'approximate reasoning'. Each element of a fuzzy set has a membership value between 0 and 1, and the membership function defines the degree of membership of each element in the set. The membership function can be defined in two ways to represent a fuzzy set: 1. Discrete membership function. 2. Continuous membership function.
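The two kinds of membership function can be illustrated with a minimal Python sketch. The document names, the membership values and the triangular shape of the continuous function are assumptions chosen for illustration; the paper does not prescribe them.

```python
from typing import Dict


def discrete_membership(fuzzy_set: Dict[str, float], element: str) -> float:
    """Discrete case: look up the membership degree; absent elements get 0.0."""
    return fuzzy_set.get(element, 0.0)


def triangular_membership(x: float, a: float, b: float, c: float) -> float:
    """Continuous case: triangular function with feet a, c and peak b (a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical fuzzy set of documents "relevant to a query".
relevant_docs = {"doc1": 0.9, "doc2": 0.4, "doc3": 0.0}

print(discrete_membership(relevant_docs, "doc2"))   # 0.4
print(triangular_membership(5.0, 0.0, 5.0, 10.0))   # 1.0
print(triangular_membership(2.5, 0.0, 5.0, 10.0))   # 0.5
```

In both cases the returned value stays inside [0, 1], which is the only property the definition above requires.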

This paper presents an approach for finding relevant documents using semantic similarity, fuzzy concepts and a query expansion technique. Semantic similarity between concepts is determined by applying k-means clustering. The proposed technique is based on fuzzy searching and on extracting precise information from a user-defined knowledge base.

The rest of the paper is organized as follows: Section 2 reviews related work; Section 3 presents the proposed system, which consists of calculating semantic similarity, applying fuzzy concepts and query expansion, along with an example; finally, Section 4 concludes the paper.

II. RELATED WORK

So far, many approaches to information retrieval have been proposed by researchers. In the proposed approach, the interpretation, searching and collection of information are handled semantically on the basis of an ontology. S. Robertsone et al. defined an ontology-based information retrieval model in which semantic search over a large repository of documents is supported by a domain-specific knowledge base [2]. A. Kiryakov et al. produced an architecture for indexing, semantic annotation and extraction of documents with respect to semantic repositories [2]. Xing Jiang et al. proposed a user ontology, an ontology-based model that provides personalized information services on the Semantic Web. In it, the concepts, taxonomic relationships and non-taxonomic relationships of a given domain ontology are used to infer the user's interests [4]. A survey of existing research shows various applications to information retrieval, such as query expansion in a graph-based approach to multi-document summarization [5], a syntactically based technique of query reformulation [6], and automatic query formulation through a semi-supervised incremental algorithm [7]. Using concepts to find information is also an active area of research; ontology-based [8], semantic [9] and concept-based query expansion are some examples. Search can be improved by using query expansion techniques. There are two methods of query expansion: local analysis and global analysis [10]. In local analysis, a new query is generated on the basis of a few retrieved documents. In global analysis, new terms from external resources such as WordNet or a thesaurus are added to the original query.
There are many query expansion methods [11]. Although they help improve search quality, they have some problems; for example, relevance feedback is not appropriate in many cases. Finding relevant documents also becomes difficult when the query is ambiguous [12], and in such situations the user has no way to convey his interest to the system.
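Global analysis, as described above, can be sketched in a few lines of Python. The `THESAURUS` dictionary is a hypothetical stand-in for an external resource such as WordNet, and the function name `expand_query_global` is introduced here only for illustration.

```python
from typing import Dict, List

# Hypothetical thesaurus standing in for an external resource such as WordNet.
THESAURUS: Dict[str, List[str]] = {
    "car": ["automobile", "vehicle"],
    "price": ["cost"],
}


def expand_query_global(query: str, thesaurus: Dict[str, List[str]]) -> List[str]:
    """Append related terms from an external resource to the original query terms."""
    terms = query.lower().split()
    expanded = list(terms)
    for term in terms:
        for related in thesaurus.get(term, []):
            if related not in expanded:  # avoid duplicate terms
                expanded.append(related)
    return expanded


print(expand_query_global("car price", THESAURUS))
# ['car', 'price', 'automobile', 'vehicle', 'cost']
```

Local analysis would differ only in where the added terms come from: the top few documents retrieved for the original query rather than an external resource.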

There are different models for calculating semantic similarity between concepts in an ontology: lexical and syntactical, structural, information-theoretic and feature-based models. Among them, feature-based models have been reported to be the most effective similarity technique [13]. We are motivated by the feature-based model because it has been shown to be very close to human judgment.
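As a rough illustration of a feature-based measure, the sketch below uses Tversky's ratio model, which compares the features two concepts share with the features that distinguish them. The paper does not state which feature-based formula it adopts, so this choice, the weights `alpha` and `beta`, and the feature sets are all assumptions.

```python
from typing import Set


def feature_similarity(a: Set[str], b: Set[str],
                       alpha: float = 0.5, beta: float = 0.5) -> float:
    """Tversky ratio model: common features over common plus weighted distinct features."""
    common = len(a & b)
    only_a = len(a - b)
    only_b = len(b - a)
    denominator = common + alpha * only_a + beta * only_b
    return common / denominator if denominator else 0.0


# Hypothetical feature sets for three ontology concepts.
mba = {"degree", "postgraduate", "management"}
pgdm = {"diploma", "postgraduate", "management"}
btech = {"degree", "undergraduate", "engineering"}

print(feature_similarity(mba, pgdm))   # 2 / (2 + 0.5 + 0.5), about 0.667
print(feature_similarity(mba, btech))  # 1 / (1 + 1 + 1), about 0.333
```

With these illustrative features, MBA comes out closer to PGDM than to B.Tech, which matches the grouping one would expect intuitively.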

III. PROPOSED WORK

In this paper we propose a technique for returning fewer but more relevant results for a user query. The main objective of this approach is to avoid returning results that are not relevant from the user's point of view.

Overall Architecture

We propose an architecture (shown in Fig. 1) for semantic information retrieval based on document classification. In the proposed architecture, when a user enters a query through the user interface, query expansion is performed first. The expanded query is then used for information retrieval through the semantic IR module. The index manager receives the expanded query, and the corresponding results are retrieved through the semantic index. The semantic index is created with the help of the proposed Document Classification Algorithm, which is based on fuzzy sets, a domain ontology and the K-means clustering algorithm. Document classification is done by the document manager, which receives the web documents fetched by the web crawler (for a specific domain) and applies the Document Classification Algorithm.

[Figure: block diagram with the components User Interface, Query Expansion, Expanded Query, Semantic IR, Index Manager, Semantic Index, Ranking, Results, Doc. Manager, Document Classification Algorithm, K-means algorithm, Semantic terms, Web Crawler, Web, Documents and OKB.]

Fig. 1: Overall architecture of the proposed system

The steps performed in semantic IR process are given as follows:

1) Create an ontology or reuse an existing one.

2) The web crawler fetches web documents containing terms (classes) from the ontology.

3) Semantic terms are extracted by applying the k-means clustering algorithm to the classes found in the fetched documents and the ontology.

4) The documents consisting of semantic terms are sent to document manager.

5) The proposed algorithm is used for document classification and a semantic index is created consisting of fuzzy set of documents with their attributes.

6) If a user enters a query, the query is expanded and the results are retrieved within a fuzzy threshold through the semantic index with the help of index manager.
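Step 6 can be sketched as a lookup in the semantic index followed by a threshold filter. The index layout (class name mapped to documents and their membership values), the threshold value and the function name `retrieve` are assumptions made for this illustration only.

```python
from typing import Dict, List, Tuple

# Hypothetical semantic index: class -> {document id: fuzzy membership value}.
SEMANTIC_INDEX: Dict[str, Dict[str, float]] = {
    "management": {"d1": 0.8, "d2": 0.3},
    "technical": {"d2": 0.7, "d3": 0.5},
}


def retrieve(expanded_terms: List[str],
             index: Dict[str, Dict[str, float]],
             threshold: float) -> List[Tuple[str, float]]:
    """Return documents whose membership meets the threshold, best match first."""
    best: Dict[str, float] = {}
    for term in expanded_terms:
        for doc_id, mu in index.get(term, {}).items():
            if mu >= threshold:
                # Keep the highest membership seen for each document.
                best[doc_id] = max(mu, best.get(doc_id, 0.0))
    return sorted(best.items(), key=lambda item: (-item[1], item[0]))


print(retrieve(["management", "technical"], SEMANTIC_INDEX, 0.5))
# [('d1', 0.8), ('d2', 0.7), ('d3', 0.5)]
```

Raising the threshold shrinks the result list, which is how the approach trades the number of results for their relevance.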

Extraction of semantic terms:

The k-means clustering algorithm [17] is applied to the relevant classes found in the documents to find the terms that are semantically similar to those classes. Each cluster created by the algorithm consists of a set of semantically related terms. The algorithm follows two steps:

1) Initially, the centroid terms are chosen randomly.

2) Terms that are semantically similar to a centroid are assigned to the same cluster.

These steps are repeated until a stopping condition is met.

For example:

Initial classes chosen randomly: Management, Technical, AppliedScience etc.

Semantically similar terms: Management = {MBA, BBA, PGDM}

Technical = {B.Tech, M.Tech, MCA, BCA}

AppliedScience = {B.Sc, M.Sc}

[Figure: three clusters of terms (Cluster1, Cluster2, Cluster3). Management: MBA, BBA, PGDM, ...; Technical: B.Tech, BS, BE, ME, MS, M.Tech, PGDCA, MCA, BCA, ...; Applied Science: B.Sc, M.Sc, ...]

Fig. 2: Terms clustering
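The clustering of terms can be sketched with a small pure-Python k-means. The 2-D term vectors are hypothetical; a real system would derive a similarity measure from the ontology. The initial centroids are also fixed here so that the run is reproducible, whereas the procedure described above picks them randomly.

```python
from typing import Dict, List, Tuple

Vector = Tuple[float, float]

# Hypothetical 2-D feature vectors for the terms (assumed for illustration).
TERM_VECTORS: Dict[str, Vector] = {
    "MBA": (0.9, 0.1), "BBA": (0.8, 0.2), "PGDM": (0.85, 0.15),
    "B.Tech": (0.1, 0.9), "M.Tech": (0.15, 0.85), "MCA": (0.2, 0.8),
}


def squared_distance(p: Vector, q: Vector) -> float:
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def nearest(vec: Vector, centroids: List[Vector]) -> int:
    """Index of the centroid closest to vec."""
    return min(range(len(centroids)), key=lambda i: squared_distance(vec, centroids[i]))


def k_means(vectors: Dict[str, Vector], initial_centroids: List[Vector],
            max_iter: int = 100) -> List[List[str]]:
    centroids = list(initial_centroids)
    assignment: Dict[str, int] = {}
    for _ in range(max_iter):
        # Assignment step: each term joins its most similar centroid.
        new_assignment = {term: nearest(vec, centroids) for term, vec in vectors.items()}
        if new_assignment == assignment:
            break  # stopping condition: no term changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for i in range(len(centroids)):
            members = [vectors[t] for t, c in assignment.items() if c == i]
            if members:
                centroids[i] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    clusters: List[List[str]] = [[] for _ in centroids]
    for term, c in assignment.items():
        clusters[c].append(term)
    return clusters


start = [TERM_VECTORS["MBA"], TERM_VECTORS["B.Tech"]]
print(k_means(TERM_VECTORS, start))
# [['MBA', 'BBA', 'PGDM'], ['B.Tech', 'M.Tech', 'MCA']]
```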

The Algorithm for Document Classification:

Let S(D) be the set of web documents D, and let SK be the set of semantic keywords. $SK_j(D_i)$ is the $j$th term in the $i$th document.

Input: Set of documents fetched by the crawler

Output: Classified documents within a boundary

Step 1: A set of documents is fetched by the web crawler for the concepts obtained from the ontology.

Step 2: Compute SK by applying the K-means clustering algorithm to the ontology.

Step 3: Find the number of semantic terms present in each document, $|SK_j|$, where $SK_j$ is the set of semantic terms found in $D_i$.

Step 4: Find the percentage of keyword match $P_i(x)$ for each document $D_i$:

$P_i(x) = \frac{|SK_j|}{|SK|} \times 100$

Step 5: Compute the membership function $\mu_d(x)$ for each document, where $\mu_d : X \to [0, 1]$; the membership value $\mu_{d_i}(x)$ is determined by $P_i(x)$.

Step 6: Classify the documents by creating a fuzzy set of documents, setting a boundary on the fuzzy value through the membership function as,

0
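The classification steps can be sketched in Python as follows. Two points are assumptions rather than the paper's stated method: the membership value is taken as the match percentage scaled to [0, 1], and the boundary is modelled as a single lower threshold. The document contents and function names are hypothetical.

```python
from typing import Dict, Set


def match_percentage(doc_terms: Set[str], semantic_keywords: Set[str]) -> float:
    """Percentage of the semantic keywords SK that occur in the document."""
    if not semantic_keywords:
        return 0.0
    return len(doc_terms & semantic_keywords) / len(semantic_keywords) * 100


def membership(percentage: float) -> float:
    """Assumption: scale the percentage so the membership value lies in [0, 1]."""
    return percentage / 100


def classify(documents: Dict[str, Set[str]], semantic_keywords: Set[str],
             threshold: float) -> Dict[str, float]:
    """Build the fuzzy set of documents whose membership meets the threshold."""
    fuzzy_set: Dict[str, float] = {}
    for doc_id, terms in documents.items():
        mu = membership(match_percentage(terms, semantic_keywords))
        if mu >= threshold:
            fuzzy_set[doc_id] = mu
    return fuzzy_set


SK = {"MBA", "BBA", "PGDM", "MCA"}
DOCS = {
    "d1": {"MBA", "BBA", "PGDM", "fees"},
    "d2": {"MCA", "hostel"},
    "d3": {"sports"},
}

print(classify(DOCS, SK, 0.5))
# {'d1': 0.75}
```

Here d1 contains three of the four semantic keywords, so its match percentage is 75 and its membership value is 0.75; d2 and d3 fall below the assumed threshold of 0.5 and are left out of the fuzzy set.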
