Information filtering system
An information filtering system is a system that removes redundant or unwanted information from an information stream using (semi-)automated or computerized methods before the stream is presented to a human user. Its main goals are to manage information overload and to increase the semantic signal-to-noise ratio. To do this, the user's profile is compared to some reference characteristics. These characteristics may originate from the information item (the content-based approach) or from the user's social environment (the collaborative filtering approach).
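As a rough illustration of the content-based approach, the sketch below compares a user profile with each item's text and keeps only the items that are similar enough. The bag-of-words representation, the cosine similarity measure, the `content_filter` name, and the 0.2 threshold are all assumptions chosen for the example; the article does not prescribe any particular representation.

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def content_filter(profile_terms, items, threshold=0.2):
    """Keep the titles of items whose text resembles the user's profile.

    profile_terms: words describing the user's interests (hypothetical profile).
    items: list of (title, text) pairs.
    threshold: assumed cut-off; a real system would tune it.
    """
    profile = Counter(profile_terms)
    kept = []
    for title, text in items:
        score = cosine(profile, Counter(text.lower().split()))
        if score >= threshold:
            kept.append(title)
    return kept

items = [
    ("telescope review", "new telescope mirror tested by astronomy club"),
    ("celebrity gossip", "actor seen at party with singer"),
]
print(content_filter(["astronomy", "telescope", "galaxy"], items))
```

The second item shares no terms with the profile, so it is filtered out of the stream.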
Whereas in information transmission signal processing filters are used against syntax-disrupting noise at the bit level, the methods employed in information filtering act at the semantic level.
The range of machine methods employed builds on the same principles as those for information extraction. A notable application can be found in the field of email spam filters. Thus, it is not only the information explosion that necessitates some form of filtering, but also inadvertently or maliciously introduced pseudo-information.
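To make the spam-filter application concrete, here is a minimal sketch of a naive Bayes text classifier, one common way of building such filters. The choice of naive Bayes, the function names `train` and `classify_spam`, the add-one smoothing, and the four toy messages are assumptions made for illustration; they do not come from the article, and the sketch assumes the training set contains at least one message of each class.

```python
import math
from collections import Counter

def train(messages):
    """messages: list of (text, spam_flag). Returns word counts and message counts per class."""
    counts = {True: Counter(), False: Counter()}
    totals = {True: 0, False: 0}
    for text, spam in messages:
        counts[spam].update(text.lower().split())
        totals[spam] += 1
    return counts, totals

def classify_spam(text, counts, totals):
    """Return True when the spam class has the higher log-probability."""
    vocab = set(counts[True]) | set(counts[False])
    scores = {}
    for label in (True, False):
        # Log prior plus log likelihood of each word, with add-one smoothing.
        score = math.log(totals[label] / sum(totals.values()))
        denom = sum(counts[label].values()) + len(vocab)
        for word in text.lower().split():
            score += math.log((counts[label][word] + 1) / denom)
        scores[label] = score
    return scores[True] > scores[False]

training = [
    ("win money now", True),
    ("cheap money offer", True),
    ("project meeting tomorrow", False),
    ("lunch meeting offer", False),
]
counts, totals = train(training)
print(classify_spam("win cheap money", counts, totals))
print(classify_spam("project meeting", counts, totals))
```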
On the presentation level, information filtering takes forms such as newsfeeds based on user preferences.
Recommender systems are active information filtering systems that attempt to present to the user information items (films, television, music, books, news, web pages) the user is interested in. These systems add information items to the information flowing towards the user, as opposed to removing information items from that flow. Recommender systems typically use collaborative filtering approaches or a combination of the collaborative filtering and content-based filtering approaches, although content-based recommender systems do exist.
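The collaborative side of a recommender can be sketched as follows: items the user has not yet seen are scored by how similar the people who liked them are to the user. The Jaccard similarity, the `recommend` function, and the sample users are hypothetical choices for this example rather than a description of any specific system.

```python
def jaccard(a: set, b: set) -> float:
    """Overlap between two sets of liked items, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend(user, likes, top_n=2):
    """Suggest items liked by users whose tastes overlap with `user`.

    likes: dict mapping a user name to the set of items that user liked.
    """
    mine = likes[user]
    scores = {}
    for other, theirs in likes.items():
        if other == user:
            continue
        sim = jaccard(mine, theirs)
        for item in theirs - mine:  # only items the user has not seen
            scores[item] = scores.get(item, 0.0) + sim
    ranked = sorted(scores, key=lambda i: (-scores[i], i))
    return [i for i in ranked if scores[i] > 0][:top_n]

likes = {
    "ana": {"film_a", "film_b", "book_x"},
    "ben": {"film_a", "film_b", "film_c"},
    "cy": {"book_y", "news_z"},
}
print(recommend("ana", likes))
```

Here "ben" shares two items with "ana", so his remaining item is added to her stream, while items from "cy", who shares nothing with her, are not.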
History
Before the advent of the Internet, there were already several methods of filtering information. One example is censorship, in which a government controls and restricts the flow of information, although in a democratic country such control is meant to serve the needs of its beneficiaries.
On the other hand, we can also speak of information filters when newspaper editors and journalists provide a service that selects the most valuable information for their clients: readers of books, magazines, and newspapers, radio listeners, and TV viewers. This filtering operation is also present in schools and universities, where information is selected according to academic criteria for the customers of this service, the students. With the advent of the Internet, anyone can publish whatever they wish at low cost. As a result, the amount of less useful information has increased considerably, and quality information is consequently scattered among it. In response to this problem, new filtering methods began to be devised so that the information required for each specific topic can be obtained easily and efficiently.
Operation
A filtering system of this kind consists of several tools that help people find the most valuable information, so that the limited time they can dedicate to reading, listening, or viewing is directed to the most interesting and valuable documents rather than the inconsequential ones. These filters are also used to organize and structure information in a correct and understandable way, as well as to group incoming mail messages. They are especially necessary for the results returned by Internet search engines. Filtering functions improve every day, making the retrieval of Web documents and messages more efficient.
Criterion
One of the criteria used in this step is whether the knowledge is harmful or not, that is, whether it allows a better understanding with or without the concept. In this case, the task of information filtering is to reduce or eliminate the harmful information with knowledge.
Learning System
A content learning system generally consists of three basic stages:
- First, a system that provides solutions to a defined set of tasks.
- Subsequently, assessment criteria that measure the performance of the previous stage in relation to the solutions of problems.
- Finally, an acquisition module whose output is the knowledge used by the solver system of the first stage.
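The three stages above can be sketched as a small loop. The word-weight representation, the perceptron-style update, and the names `solve`, `assess`, and `acquire` are assumptions made for this illustration; the article only describes the stages in general terms.

```python
def solve(weights, words):
    """Stage 1: the solver decides whether to keep an item (True) or drop it."""
    return sum(weights.get(w, 0.0) for w in words) >= 0.0

def assess(weights, examples):
    """Stage 2: compare the solver's decisions with known answers; return the mistakes."""
    return [(words, keep) for words, keep in examples if solve(weights, words) != keep]

def acquire(weights, mistakes, step=1.0):
    """Stage 3: turn mistakes into updated knowledge for the solver."""
    updated = dict(weights)
    for words, keep in mistakes:
        for w in words:
            # Raise weights of words in wrongly dropped items, lower them otherwise.
            updated[w] = updated.get(w, 0.0) + (step if keep else -step)
    return updated

examples = [
    (["cheap", "pills"], False),
    (["meeting", "agenda"], True),
    (["cheap", "flights", "agenda"], True),
]
weights = {}
for _ in range(10):  # bounded number of passes over the examples
    mistakes = assess(weights, examples)
    if not mistakes:
        break
    weights = acquire(weights, mistakes)
print(weights)
```

The knowledge produced by the acquisition step (the `weights` dictionary) is fed back into the solver on the next pass, which closes the loop between the third stage and the first.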
Future
Currently, the problem is not finding the best way to filter information, but enabling these systems to learn the information needs of users independently, so that they automate not only the filtering process but also the construction and adaptation of the filter. Fields such as statistics, machine learning, pattern recognition, and data mining form the basis for developing information filters that emerge and adapt based on experience. For the learning process to be carried out, part of the information has to be pre-filtered; that is, there must be positive and negative examples, called training data, which can be generated by experts or via feedback from ordinary users.
Error
As data is entered, the system incorporates new rules. If we assume that these rules can generalize beyond the training data, we then have to evaluate the system and measure its ability to correctly predict the categories of new information. This step is simplified by setting aside part of the training data in a separate series called "test data", which is used to measure the error rate. As a general rule, it is important to distinguish between types of errors (false positives and false negatives). For example, in the case of a content aggregator for children, allowing through unsuitable information that shows violence or pornography is far more serious than mistakenly discarding some appropriate information.
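A minimal sketch of this evaluation step is shown below. It assumes that a positive prediction means "unsuitable, block it", so that a false negative is an unsuitable item that gets through; the 25% test fraction and the 10:1 cost ratio are arbitrary illustrative values, and the function names are hypothetical.

```python
import random

def split(examples, test_fraction=0.25, seed=0):
    """Set aside part of the labelled examples as test data."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

def confusion(predictions, labels):
    """Count false positives and false negatives (True means 'block')."""
    fp = sum(1 for p, y in zip(predictions, labels) if p and not y)
    fn = sum(1 for p, y in zip(predictions, labels) if not p and y)
    return fp, fn

def weighted_error(predictions, labels, fn_cost=10.0, fp_cost=1.0):
    """Average cost per item when the two error types are weighted differently."""
    fp, fn = confusion(predictions, labels)
    return (fp_cost * fp + fn_cost * fn) / len(labels)

labels = [True, True, False, False]   # True = unsuitable for children
strict = [True, True, True, False]    # blocks one harmless item
lenient = [True, False, False, False] # lets one unsuitable item through
print(weighted_error(strict, labels), weighted_error(lenient, labels))
```

Both filters misclassify one item out of four, so their plain error rates are identical; only the weighted measure reflects that the lenient filter's mistake is the more serious one.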
To lower error rates and give these systems learning capabilities similar to those of humans, we need to develop systems that simulate human cognitive abilities, such as natural language understanding, capturing common-sense meaning, and other forms of advanced processing that get at the semantics of information.
Fields of use
Nowadays, there are numerous techniques for developing information filters, some of which reach error rates lower than 10% in various experiments. These techniques include decision trees, support vector machines, neural networks, Bayesian networks, linear discriminants, and logistic regression. At present, they are used in a variety of applications, not only on the Web but also in areas as varied as voice recognition, classification in telescopic astronomy, and evaluation of financial risk.