GroupLens Research
GroupLens Research is a research lab in the Department of Computer Science and Engineering at the University of Minnesota, Twin Cities, specializing in recommender systems, online communities, mobile and ubiquitous technologies, digital libraries, and local geographic information systems.
The GroupLens lab was one of the first to study automated recommender systems with the construction of the "GroupLens" recommender, a Usenet article recommendation engine, and MovieLens, a popular movie recommendation site used to study recommendation engines, tagging systems, and user interfaces. The lab has also gained notability for its members' work studying open content communities such as Wikipedia and Cyclopath, a computational "geowiki" currently being used in the Twin Cities to help plan the regional cycling system.
History
In 1992, John Riedl and Paul Resnick attended the CSCW conference together. After they heard keynote speaker Shumpei Kumon talk about his vision for an information economy, they began working on a collaborative filtering system for Usenet news. The system collected ratings from Usenet readers and used those ratings to predict how much other readers would like an article before they read it. This recommendation engine was one of the first automated collaborative filtering systems in which algorithms were used to automatically form predictions based on historical patterns of ratings. The overall system was called the "GroupLens" recommender, and the servers that collected the ratings and performed the computation were called the "Better Bit Bureau". (This name was later dropped after a request from the Better Business Bureau. "GroupLens" is now used as a name both for this recommender system and for the research lab at Minnesota.)
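The idea of predicting a reader's rating from historical patterns of other readers' ratings can be illustrated with a minimal sketch. It assumes a neighborhood-style approach in which each reader's deviation from their own average rating is weighted by Pearson correlation with the target reader. The function names, the reader and article identifiers, and the ratings below are all hypothetical and chosen only for illustration; this is not the lab's actual code.

```python
from math import sqrt


def mean(values):
    """Arithmetic mean of a non-empty iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


def pearson(a, b):
    """Pearson correlation over the items both users rated (0.0 if undefined)."""
    common = set(a) & set(b)
    if len(common) < 2:
        return 0.0
    mean_a = mean(a[i] for i in common)
    mean_b = mean(b[i] for i in common)
    num = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    den = sqrt(sum((a[i] - mean_a) ** 2 for i in common)) * sqrt(
        sum((b[i] - mean_b) ** 2 for i in common)
    )
    return num / den if den else 0.0


def predict(ratings, user, item):
    """Predict a rating as the user's own mean plus a correlation-weighted
    average of how far other raters of the item deviated from their means."""
    base = mean(ratings[user].values())
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or item not in theirs:
            continue
        weight = pearson(ratings[user], theirs)
        num += weight * (theirs[item] - mean(theirs.values()))
        den += abs(weight)
    # Fall back to the user's own mean when no correlated rater exists.
    return base + num / den if den else base


# Hypothetical readers and article ratings on a 1-5 scale.
ratings = {
    "ann": {"a1": 5, "a2": 1, "a3": 4},
    "bob": {"a1": 4, "a2": 2, "a3": 5, "a4": 5},
    "cat": {"a1": 1, "a2": 5, "a4": 1},
}

if __name__ == "__main__":
    print(round(predict(ratings, "ann", "a4"), 2))
```

In this toy data, "bob" tends to agree with "ann" and "cat" tends to disagree, so bob's high rating and cat's low rating of article "a4" both push the prediction for "ann" above her average.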
A feasibility test was done between MIT and the University of Minnesota, and a research paper was published, including the algorithm, the system design, and the results of the feasibility study, in the CSCW conference of 1994.
In 1995, Riedl and Resnick invited Joseph Konstan to join the
team. Together, they decided to create a higher-performance
implementation of the algorithms to support larger-scale deployments.
In summer 1995 the team gathered
Bradley Miller, David Maltz,
Jon Herlocker, and Mark Claypool for "Hack Week" to create
the new implementation, and to plan the next round of experiments.
In the Spring of 1996, the first workshop on collaborative filtering was put together by Resnick and Hal Varian at the University of California, Berkeley. There, researchers from projects around the US that were studying similar systems came together to share ideas and experience.
In the Summer of 1996, David Gardiner, a former Ph.D. student of Riedl's, introduced Riedl to Steven Snyder. Snyder had been one of the early employees at Microsoft, but had left Microsoft to come to Minnesota to do a Ph.D. in Psychology. He realized the commercial potential of collaborative filtering, and encouraged the team to found a company in April 1996. By June, Gardiner, Snyder, Miller, Riedl, and Konstan had incorporated their company, Net Perceptions, and by July they had their first round of funding, from the Hummer-Winblad venture capital company. Net Perceptions went on to be one of the leading companies in personalization during the Internet boom of the late 1990s, and stayed in business until 2004. Based on their experience, Riedl and Konstan wrote a book about the lessons learned from deploying recommenders in practice. Recommender systems have since become ubiquitous in the online world, with leading vendors such as Amazon and Netflix deploying highly sophisticated recommender systems. Netflix even offered a $1,000,000 prize for improvements in recommender technology.
Meanwhile, research continued at the University of Minnesota. When the EachMovie site closed in 1997, the researchers behind it generously released the anonymous rating data they had collected, for other researchers to use. The GroupLens Research team, led by Brent Dahlen and Jon Herlocker, used this data set to jumpstart a new movie recommendation site called MovieLens. They were able to get the first version of MovieLens running within a few months.
Since 1997, MovieLens has been a very visible research platform, featured in a detailed discussion in a New Yorker article by Malcolm Gladwell and in a full episode of ABC Nightline.
Between 1997 and 2002 the group continued its research on collaborative filtering, which became known in the community by the more general term of recommender systems. With Joe Konstan's expertise in user interfaces, the team began exploring interface issues in recommenders, such as explanations, and meta-recommendation systems.
In 2002, GroupLens expanded into social computing and online communities with the addition of Loren Terveen, who was known for his research on social recommender systems such as PHOAKS.
In order to broaden the set of research ideas and tools they used, Riedl, Konstan, and Terveen invited colleagues in social psychology (Robert Kraut and Sara Kiesler, of the Carnegie Mellon Human Computer Interaction Institute), and economic and social analysis (Paul Resnick and Yan Chen of the University of Michigan School of Information) to collaborate. The new, larger team adopted the name CommunityLab, and looked generally at the effects of technological interventions on the performance of online communities. For instance, some of their research explored technology for enriching conversation systems, while other research explored the personal, social, and economic motivations for user ratings.
In 2008 GroupLens launched Cyclopath, a
computational geowiki for bicyclists within a city.
Contributions
- The MovieLens recommender system: MovieLens is a non-commercial movie recommender system that has been running for over a decade now, with over 164,000 unique visitors to date, who have provided over 15 million movie ratings.
- MovieLens ratings datasets: In the early days of recommender systems, research was slowed down by the lack of publicly available datasets. In response to requests from other researchers, GroupLens released three datasets: the MovieLens 100,000 rating dataset, the MovieLens 1,000,000 rating dataset, and the MovieLens 10,000,000 rating dataset. These datasets became the standard datasets for recommender research, and have been used in over 300 papers by researchers around the world. The datasets are also used for teaching about recommender technology.
- MovieLens tagging dataset: GroupLens added tagging to MovieLens in 2006. Since then, users have provided over 85,000 applications of 14,000 unique tags to movies. The MovieLens 10,000,000 ratings dataset also includes a 100,000 tag applications dataset for researchers to use.
- Information leakage from recommender datasets: a paper in the information retrieval conference analyzed the privacy risks to users of having large recommender datasets released. The basic risk discovered is that an anonymized dataset might be combined with public information to identify a user. For instance, a user who has written about his preference for movies on online forums could be associated with a specific row in the MovieLens datasets. In some cases, these associations might leak information the user would prefer to keep private.
- Wikipedia research: The study of value and vandalism in Wikipedia published in 2007 described the concentration of contribution across Wikipedia editors. This paper was one of the first to focus on the length of time that a contribution survives within Wikipedia as a measure of its value. The paper also investigated the effects of vandalism on Wikipedia readers, by measuring the probability that a view of a page would capture that page in a vandalized state. GroupLens has also explored ways to help editors find pages to which they can effectively contribute, using a robot recommender. The group has also explored the evolution of the norms in Wikipedia that determine which articles are accepted or rejected, and the effect of changes in those norms on the Long Tail of Wikipedia articles. GroupLens has also explored the functioning of the informal peer review system within Wikipedia, finding that the decisions being made appear to be influenced inappropriately by ownership, and that experience does not seem to change editor performance very much. GroupLens researchers have also explored visualizations of the edit history of Wikipedia articles. In 2011, GroupLens researchers completed a scientific exploration of gender imbalance among Wikipedia's editors, finding a large gap between male and female editors.
- Shilling recommender systems: GroupLens has explored ways that users of recommender systems can attempt to inappropriately influence the recommendations given to other users. They call this behavior shilling, because of its relationship to the practice of hiring associates to pretend to be enthusiastic customers. They showed that some types of shilling are likely to be effective in practice. One concern about shilling is that the false predictions may change the reported opinions of later users, further corrupting the recommendations.
- Cyclopath: In 2008, GroupLens launched Cyclopath, a computational geowiki for local bicyclists. Cyclopath has since been used by hundreds of cyclists within the Twin Cities. More recently, Cyclopath has been adopted by the Twin Cities Metropolitan Council to help plan the regional cycling system.