Clustering Sentence-Level Text Using a Novel Fuzzy Relational
Clustering Algorithm
Abstract
In comparison with hard clustering methods, in which a pattern belongs to a
single cluster, fuzzy clustering algorithms allow patterns to belong to all clusters with
differing degrees of membership. This is important in domains such as sentence
clustering, since a sentence is likely to be related to more than one theme or topic present
within a document or set of documents. However, because most sentence similarity
measures do not represent sentences in a common metric space, conventional fuzzy
clustering approaches based on prototypes or mixtures of Gaussians are generally not
applicable to sentence clustering. This paper presents a novel fuzzy clustering algorithm
that operates on relational input data, i.e., data in the form of a square matrix of pairwise
similarities between data objects. The algorithm uses a graph representation of the data,
and operates in an Expectation-Maximization framework in which the graph centrality of
an object in the graph is interpreted as a likelihood. Results of applying the algorithm to
sentence clustering tasks demonstrate that the algorithm is capable of identifying
overlapping clusters of semantically related sentences, and that it is therefore of potential
use in a variety of text mining tasks. We also include results of applying the algorithm to
benchmark data sets in several other domains.
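The loop described above can be illustrated with a minimal sketch. It assumes PageRank as the centrality measure, edge weights scaled by the cluster memberships of both endpoints, and a standard mixture-model update; none of these details is given in the abstract, and the names `pagerank` and `fuzzy_relational_em` are hypothetical.

```python
import random


def pagerank(weights, damping=0.85, iters=50):
    """Power-iteration PageRank on a symmetric weighted graph (list of lists)."""
    n = len(weights)
    scores = [1.0 / n] * n
    out_sums = [sum(row) for row in weights]
    for _ in range(iters):
        new = []
        for i in range(n):
            incoming = sum(
                weights[j][i] * scores[j] / out_sums[j]
                for j in range(n)
                if out_sums[j] > 0
            )
            new.append((1 - damping) / n + damping * incoming)
        total = sum(new)
        scores = [s / total for s in new]
    return scores


def fuzzy_relational_em(sim, n_clusters, n_rounds=20, seed=0):
    """Return (memberships, mixing coefficients) for a square similarity matrix."""
    rng = random.Random(seed)
    n = len(sim)
    # Random initial memberships; each row is normalised to sum to 1.
    member = []
    for _ in range(n):
        row = [rng.random() + 1e-6 for _ in range(n_clusters)]
        total = sum(row)
        member.append([v / total for v in row])
    mixing = [1.0 / n_clusters] * n_clusters

    for _ in range(n_rounds):
        # Assumption: centrality is computed per cluster, with each edge
        # scaled by the memberships of both endpoints in that cluster.
        likelihood = []
        for c in range(n_clusters):
            scaled = [
                [sim[i][j] * member[i][c] * member[j][c] if i != j else 0.0
                 for j in range(n)]
                for i in range(n)
            ]
            likelihood.append(pagerank(scaled))
        # E-step: centrality plays the role of a likelihood.
        for i in range(n):
            row = [mixing[c] * likelihood[c][i] for c in range(n_clusters)]
            total = sum(row)
            member[i] = [v / total for v in row]
        # M-step: mixing coefficients are the average memberships.
        mixing = [sum(member[i][c] for i in range(n)) / n
                  for c in range(n_clusters)]
    return member, mixing


# Toy pairwise similarities: sentences 0-1 and 2-3 form two loose groups.
sim = [
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.9],
    [0.1, 0.1, 0.9, 1.0],
]
memberships, mixing = fuzzy_relational_em(sim, n_clusters=2)
for row in memberships:
    print([round(v, 3) for v in row])
```

Each row of `memberships` sums to 1, so a sentence can belong to several clusters at once, which is the property the abstract emphasizes.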
Building a Scalable Database-Driven Reverse Dictionary
Abstract
In this paper, we describe the design and implementation of a reverse dictionary.
Unlike a traditional forward dictionary, which maps from words to their definitions, a
reverse dictionary takes a user input phrase describing the desired concept, and returns a
set of candidate words that satisfy the input phrase. This work has significant application
not only for the general public, particularly those who work closely with words, but also
in the general field of conceptual search. We present a set of algorithms and the results of
a set of experiments showing the retrieval accuracy of our methods and the run-time
response performance of our implementation. Our experimental results show that
our approach can provide significant improvements in performance scale without
sacrificing the quality of the result. Our experiments comparing the quality of our
approach to that of currently available reverse dictionaries show that our approach can
provide significantly higher quality than either of the other currently available
implementations.
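The basic idea of mapping a phrase back to candidate words can be sketched with an inverted index over definitions. This is only an illustration under simple assumptions (a toy dictionary, whitespace tokenization, ranking by number of shared terms); it is not the set of algorithms the paper presents, and all names below are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy forward dictionary: word -> definition.
FORWARD = {
    "telescope": "an instrument for viewing distant objects",
    "microscope": "an instrument for viewing very small objects",
    "thermometer": "an instrument for measuring temperature",
}
STOP_WORDS = {"a", "an", "the", "for", "of", "to", "very"}


def tokenize(text):
    """Lower-case, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def build_index(forward):
    """Inverted index: definition term -> set of words defined with it."""
    index = defaultdict(set)
    for word, definition in forward.items():
        for term in tokenize(definition):
            index[term].add(word)
    return index


def reverse_lookup(phrase, index, top_n=3):
    """Rank candidate words by how many phrase terms their definitions share."""
    counts = defaultdict(int)
    for term in tokenize(phrase):
        for word in index.get(term, ()):
            counts[word] += 1
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_n]]


index = build_index(FORWARD)
print(reverse_lookup("instrument for measuring temperature", index))
# prints ['thermometer', 'microscope', 'telescope']
```

Building the index once and answering queries from it is what keeps lookup cost tied to the query terms rather than to the size of the dictionary.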
Privacy-Preserving Mining of Association Rules from Outsourced
Transaction Databases
Abstract
Spurred by developments such as cloud computing, there has been considerable
recent interest in the paradigm of data mining-as-a-service. A company (data owner)
lacking in expertise or computational resources can outsource its mining needs to a
third-party service provider (server). However, both the items and the association rules of the
outsourced database are considered private property of the corporation (data owner). To
protect corporate privacy, the data owner transforms its data and ships it to the server,
sends mining queries to the server, and recovers the true patterns from the extracted
patterns received from the server. In this paper, we study the problem of outsourcing the
association rule mining task within a corporate privacy-preserving framework. We
propose an attack model based on background knowledge and devise a scheme for
privacy-preserving outsourced mining. Our scheme ensures that each transformed item is
indistinguishable, with respect to the attacker's background knowledge, from at least k-1
other transformed items. Our comprehensive experiments on a very large and real
transaction database demonstrate that our techniques are effective, scalable, and protect
privacy.
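The k-1 indistinguishability guarantee can be illustrated with a small sketch. It assumes that the attacker's background knowledge is item support (frequency) and that the owner equalizes support within groups of at least k items by adding fake occurrences. The abstract does not spell out the mechanism, so this is one plausible reading, and the function names are hypothetical.

```python
from collections import Counter


def group_items_by_support(transactions, k):
    """Sort items by support and cut them into groups of at least k items."""
    support = Counter(item for t in transactions for item in set(t))
    ordered = sorted(support, key=lambda item: (-support[item], item))
    groups = [ordered[i:i + k] for i in range(0, len(ordered), k)]
    # Fold an undersized final group into the previous one.
    if len(groups) > 1 and len(groups[-1]) < k:
        last = groups.pop()
        groups[-1].extend(last)
    return support, groups


def padding_needed(support, groups):
    """Fake occurrences per item so that all items in a group share one support."""
    noise = {}
    for group in groups:
        target = max(support[item] for item in group)
        for item in group:
            noise[item] = target - support[item]
    return noise


transactions = [
    ["bread", "milk"],
    ["bread", "beer"],
    ["bread", "milk", "eggs"],
    ["milk", "eggs"],
    ["beer"],
    ["bread"],
]
support, groups = group_items_by_support(transactions, k=2)
print(groups)
print(padding_needed(support, groups))
```

After padding, every item in a group has the same observed support, so an attacker who knows only true frequencies cannot tell it apart from the other items in its group. If the database has fewer than k distinct items, the single group is smaller than k and the guarantee cannot hold.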
Information-Theoretic Outlier Detection for Large-Scale Categorical
Data
Abstract
Outlier detection can usually be considered as a pre-processing step for locating,
in a data set, those objects that do not conform to well-defined notions of expected
behavior. It is very important in data mining for discovering novel or rare events,
anomalies, vicious actions, exceptional phenomena, etc. We are investigating outlier
detection for categorical data sets. This problem is especially challenging because of the
difficulty of defining a meaningful similarity measure for categorical data. In this paper,
we propose a formal definition of outliers and an optimization model of outlier detection,
via a new concept of holoentropy that takes both entropy and total correlation into
consideration. Based on this model, we define a function for the outlier factor of an
object which is solely determined by the object itself and can be updated efficiently. We
propose two practical 1-parameter outlier detection methods, named ITB-SS and ITB-SP,
which require no user-defined parameters for deciding whether an object is an outlier.
Users need only provide the number of outliers they want to detect. Experimental results
show that ITB-SS and ITB-SP are more effective and efficient than mainstream methods
and can be used to deal with both large and high-dimensional data sets where existing
algorithms fail.
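To make the holoentropy idea concrete, the sketch below assumes the simplest combination: the unweighted sum of the joint entropy and the total correlation of the attributes. Since total correlation is the sum of the per-attribute entropies minus the joint entropy, that sum reduces to the sum of the per-attribute entropies. The paper's own definition may differ (for example by weighting attributes), and the function names here are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def holoentropy(rows):
    """Joint entropy plus total correlation of a categorical data set.

    rows is a list of equal-length tuples, one per object.
    """
    columns = list(zip(*rows))
    joint = entropy([tuple(r) for r in rows])
    marginals = sum(entropy(col) for col in columns)
    total_correlation = marginals - joint
    # joint + (marginals - joint) is simply the sum of attribute entropies.
    return joint + total_correlation


rows = [("a", "x"), ("a", "x"), ("b", "y"), ("b", "z")]
print(holoentropy(rows))
```

Because this quantity only needs per-attribute value counts, it can be maintained incrementally as objects are added or removed, which is consistent with the abstract's remark that the outlier factor can be updated efficiently.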
A System to Filter Unwanted Messages from OSN User Walls
Abstract
One fundamental issue in today's Online Social Networks (OSNs) is to give users
the ability to control the messages posted on their own private space, so that
unwanted content is not displayed. Up to now, OSNs have provided little support for this
requirement. To fill the gap, in this paper, we propose a system allowing OSN users to
have direct control over the messages posted on their walls. This is achieved through a
flexible rule-based system that allows users to customize the filtering criteria to be
applied to their walls, and a Machine Learning-based soft classifier automatically
labeling messages in support of content-based filtering.
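The two components described above, user-defined rules and a soft classifier, can be sketched as follows. The keyword-based scorer merely stands in for a trained Machine Learning classifier, and the labels, keyword lists, `FilterRule` class, and thresholds are all hypothetical; the abstract does not specify any of them.

```python
from dataclasses import dataclass

# Hypothetical keyword lists standing in for a trained soft classifier.
KEYWORDS = {
    "vulgar": {"damn", "crap"},
    "violence": {"kill", "attack"},
}


def soft_classify(message):
    """Return a membership score in [0, 1] for each content label."""
    words = [w.strip(".,!?") for w in message.lower().split()]
    if not words:
        return {label: 0.0 for label in KEYWORDS}
    return {
        label: sum(w in vocab for w in words) / len(words)
        for label, vocab in KEYWORDS.items()
    }


@dataclass
class FilterRule:
    """A wall owner's rule: block messages scoring at least threshold on label."""
    label: str
    threshold: float


def is_blocked(message, rules):
    """Apply the owner's rules to the soft labels of a message."""
    scores = soft_classify(message)
    return any(scores.get(rule.label, 0.0) >= rule.threshold for rule in rules)


wall_rules = [FilterRule("violence", 0.2), FilterRule("vulgar", 0.1)]
print(is_blocked("I will attack you", wall_rules))   # prints True
print(is_blocked("See you at lunch", wall_rules))    # prints False
```

Keeping the classifier's output soft (a score per label rather than a single class) is what lets each user choose how strict the filter on their own wall should be.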
