Profiling practices
Profiling refers to the whole process of construction and application of profiles generated by computerized profiling technologies. What characterizes profiling technologies is the use of algorithms or other mathematical techniques that allow one to discover patterns or correlations in large quantities of data, aggregated in databases. When these patterns or correlations are used to identify or represent people, they can be called profiles. Unlike a discussion of profiling technologies or population profiling, the notion of profiling practices is not just about the construction of profiles, but also concerns the application of group profiles to individuals, e.g. in the case of credit scoring, price discrimination, or identification of security risks.
Profiling is not simply a matter of computerized pattern recognition; it enables refined price discrimination, targeted servicing, detection of fraud, and extensive social sorting. Real-time machine profiling constitutes the precondition for emerging socio-technical infrastructures envisioned by advocates of ambient intelligence, Autonomic Computing and ubiquitous computing.
One of the most challenging problems of the information society is dealing with increasing data overload. With the digitizing of all sorts of content, as well as the improvement and falling cost of recording technologies, the amount of available information has become enormous and is increasing exponentially. It has thus become important for companies, governments, and individuals to be able to discriminate information from noise, detecting those data that are useful or interesting. The development of profiling technologies must be seen against this background. These technologies are thought to efficiently collect and analyse data in order to find or test knowledge in the form of statistical patterns between data. This process is called Knowledge Discovery in Databases (KDD); it provides the profiler with sets of correlated data that are used as "profiles".
The profiling process
The technical process of profiling can be separated into several steps:
- Preliminary grounding: The profiling process starts with a specification of the applicable problem domain and the identification of the goals of analysis.
- Data collection: The target dataset or database for analysis is formed by selecting the relevant data in the light of existing domain knowledge and data understanding.
- Data preparation: The data are preprocessed to remove noise and reduce complexity by eliminating attributes.
- Data mining: The data are analysed with the algorithm or heuristics developed to suit the data, model and goals.
- Interpretation: The mined patterns are evaluated on their relevance and validity by specialists and/or professionals in the application domain (e.g. excluding spurious correlations).
- Application: The constructed profiles are applied, e.g. to categories of persons, to test and fine-tune the algorithms.
- Institutional decision: The institution decides what actions or policies to apply to groups or individuals whose data match a relevant profile.
Data collection, preparation and mining all belong to the phase in which the profile is under construction. However, profiling also refers to the application of profiles, meaning the usage of profiles for the identification or categorization of groups or individual persons. As can be seen in step six (application), the process is circular: there is a feedback loop between the construction and the application of profiles. The interpretation of profiles can lead to the iterative (possibly real-time) fine-tuning of specific previous steps in the profiling process. The application of profiles to people whose data were not used to construct the profile is based on data matching, which provides new data that allow for further adjustments. The process of profiling is therefore both dynamic and adaptive. A good illustration of this dynamic and adaptive nature is the Cross-Industry Standard Process for Data Mining (CRISP-DM).
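The circular character of the process can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the attribute names (age, monthly spend), the age cutoff and the function names are all hypothetical, and the "mining" step is reduced to computing two group averages.

```python
from statistics import mean

# Hypothetical toy records: (age, monthly_spend). All values are invented.
RAW = [(23, 40.0), (25, 55.0), (None, 30.0), (41, 210.0), (44, 190.0), (39, 230.0)]


def prepare(records):
    """Data preparation: drop records with missing attributes."""
    return [r for r in records if None not in r]


def mine(records, age_cutoff):
    """Data mining (toy version): average spend below and above the cutoff."""
    young = [spend for age, spend in records if age < age_cutoff]
    old = [spend for age, spend in records if age >= age_cutoff]
    return {"cutoff": age_cutoff, "young": mean(young), "old": mean(old)}


def apply_profile(profile, age):
    """Application: return the group value for a person whose data match."""
    return profile["young"] if age < profile["cutoff"] else profile["old"]


def run(records, batches, age_cutoff=30):
    """Construct a profile, then re-mine it each time new data arrive."""
    data = prepare(records)
    history = [mine(data, age_cutoff)]
    for batch in batches:
        # Feedback loop: data gathered while applying the profile are
        # added to the dataset and the profile is constructed again.
        data = data + prepare(batch)
        history.append(mine(data, age_cutoff))
    return history


history = run(RAW, [[(22, 70.0)], [(50, 100.0)]])
```

Each entry in `history` is the profile as it stood after one pass through the loop, so the list shows how the averages shift as newly matched people contribute their own data.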
Types of profiling practices
In order to clarify the nature of profiling technologies, some crucial distinctions have to be made between different types of profiling practices, apart from the distinction between the construction and the application of profiles. The main distinctions are those between top-down and bottom-up profiling (or supervised and unsupervised learning), and between individual and group profiles.
Supervised and unsupervised learning
Profiles can be classified according to the way they have been generated. On the one hand, profiles can be generated by testing a hypothesized correlation. This is called top-down profiling or supervised learning. It is similar to the methodology of traditional scientific research in that it starts with a hypothesis and consists of testing its validity. The result of this type of profiling is the verification or refutation of the hypothesis. One could also speak of deductive profiling. On the other hand, profiles can be generated by exploring a database, using the data mining process to detect patterns that were not previously hypothesized. In a way, this is a matter of generating hypotheses: finding correlations one did not expect or even think of. Once the patterns have been mined, they enter the loop described above and are tested with the use of new data. This is called bottom-up profiling or unsupervised learning.
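The contrast between the two routes can be made concrete with a small Python sketch. It is a simplification, not a learning algorithm: both routes are reduced to Pearson correlation on an invented table, and the attribute names, the 0.8 threshold and the function names are assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt

# Hypothetical toy table: one list of values per attribute, one row per person.
DATA = {
    "age": [20, 30, 40, 50, 60],
    "income": [21, 29, 41, 52, 58],
    "shoe_size": [42, 38, 44, 39, 41],
    "night_purchases": [9, 8, 5, 3, 1],
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def check_hypothesis(data, a, b, threshold=0.8):
    """Top-down: test one correlation that was hypothesized in advance."""
    return abs(pearson(data[a], data[b])) >= threshold


def explore(data, threshold=0.8):
    """Bottom-up: scan every attribute pair and report the strong ones."""
    return sorted(
        (a, b)
        for a, b in combinations(data, 2)
        if abs(pearson(data[a], data[b])) >= threshold
    )


confirmed = check_hypothesis(DATA, "age", "income")
discovered = explore(DATA)
```

In the top-down call the analyst names the pair to be tested; in the bottom-up call no pair is named, and the scan may surface pairs (here, age and night-time purchases) that nobody hypothesized. Note that `explore` only finds what its chosen measure can detect, in this case linear correlation.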
Two things are important with regard to this distinction. First, unsupervised learning algorithms seem to allow the construction of a new type of knowledge, not based on hypotheses developed by a researcher and not based on causal or motivational relations, but exclusively based on stochastic correlations. Second, unsupervised learning algorithms thus seem to allow for an inductive type of knowledge construction that does not require theoretical justification or causal explanation.
Some authors claim that if the application of profiles based on computerized stochastic pattern recognition 'works', i.e. allows for reliable predictions of future behaviours, the theoretical or causal explanation of these patterns no longer matters. However, the idea that 'blind' algorithms provide reliable information does not imply that the information is neutral. In the process of collecting and aggregating data into a database (the first three steps of the process of profile construction), translations are made from real-life events to machine-readable data. These data are then prepared and cleansed to allow for initial computability. Potential bias will have to be located at these points, as well as in the choice of algorithms that are developed. It is not possible to mine a database for all possible linear and non-linear correlations, meaning that the mathematical techniques developed to search for patterns will determine the patterns that can be found. In the case of machine profiling, potential bias is not informed by common-sense prejudice or what psychologists call stereotyping, but by the computer techniques employed in the initial steps of the process. These techniques are mostly invisible to those to whom profiles are applied (because their data match the relevant group profiles).
Individual and group profiles
Profiles must also be classified according to the kind of subject they refer to. This subject can either be an individual or a group of people. When a profile is constructed with the data of a single person, this is called individual profiling. This kind of profiling is used to discover the particular characteristics of a certain individual, to enable unique identification or the provision of personalized services. However, personalized servicing is most often also based on group profiling, which allows categorisation of a person as a certain type of person, based on the fact that her profile matches a profile that has been constructed on the basis of massive amounts of data about massive numbers of other people.
A group profile can refer to the result of data mining in data sets that refer to an existing community that considers itself as such, like a religious group, a tennis club, a university, a political party, etc. In that case it can describe previously unknown patterns of behaviour or other characteristics of such a group (community). A group profile can also refer to a category of people who do not form a community, but are found to share previously unknown patterns of behaviour or other characteristics. In that case the group profile describes specific behaviours or other characteristics of a category of people, for instance women with blue eyes and red hair, or adults with relatively short arms and legs. These categories may be found to correlate with health risks, earning capacity, mortality rates, credit risks, etc.
If an individual profile is applied to the individual that it was mined from, then that is direct individual profiling. If a group profile is applied to an individual whose data match the profile, then that is indirect individual profiling, because the profile was generated using data of other people. Similarly, if a group profile is applied to the group that it was mined from, then that is direct group profiling. However, insofar as the application of a group profile to a group implies the application of the group profile to individual members of the group, it makes sense to speak of indirect group profiling, especially if the group profile is non-distributive.
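Indirect individual profiling by data matching can be illustrated with a short Python sketch. Everything in it is hypothetical: the postal codes, the incomes, the use of a plain average as the "group profile", and the function names are assumptions chosen to keep the example small.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records used to construct the group profiles:
# (postal_code, yearly_income). All values are invented.
TRAINING = [
    ("1011", 30_000), ("1011", 34_000), ("1011", 92_000),
    ("2022", 48_000), ("2022", 52_000),
]


def build_group_profiles(records):
    """Construct one profile per postal code: the average income."""
    by_group = defaultdict(list)
    for code, income in records:
        by_group[code].append(income)
    return {code: mean(values) for code, values in by_group.items()}


def profile_individual(profiles, postal_code):
    """Indirect individual profiling: the person is assigned the value of
    the group their data match, or None if no group profile matches."""
    return profiles.get(postal_code)


profiles = build_group_profiles(TRAINING)

# A newcomer whose data were not used to build the profile.
newcomer_code, newcomer_actual_income = "1011", 31_000
assigned = profile_individual(profiles, newcomer_code)
error = assigned - newcomer_actual_income
```

The newcomer is treated according to an average that was computed from other people's data and that none of those people actually earns; the gap between the assigned and the actual value is the kind of inaccuracy that arises when a group-level figure is applied to an individual member.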
Distributive and non-distributive profiling
Group profiles can also be divided in terms of their distributive character. A group profile is distributive when its properties apply equally to all the members of its group: all bachelors are unmarried, or all persons with a specific gene have an 80% chance of contracting a specific disease. A profile is non-distributive when the profile does not necessarily apply to all the members of the group: the group of persons with a specific postal code has an average earning capacity of XX, or the category of persons with blue eyes has an average chance of 37% of contracting a specific disease. Note that in this case the chance of an individual having a particular earning capacity or contracting the specific disease will depend on other factors, e.g. sex, age, background of parents, previous health, education. It should be obvious that, apart from tautological profiles like that of bachelors, most group profiles generated by means of computer techniques are non-distributive. This has far-reaching implications for the accuracy of indirect individual profiling based on data matching with non-distributive group profiles. Quite apart from the fact that the application of accurate profiles may be unfair or cause undue stigmatisation, most group profiles will not be accurate when applied to individual members.
Application domains
Profiling technologies can be applied in a variety of different domains and for a variety of purposes. These profiling practices will all have different effects and raise different issues.
Knowledge about the behaviour and preferences of customers is of great interest to the commercial sector. On the basis of profiling technologies, companies can predict the behaviour of different types of customers. Marketing strategies can then be tailored to the people fitting these types. Examples of profiling practices in marketing are customer loyalty cards, customer relationship management in general, and personalized advertising.
http://epic.org/privacy/profiling/
https://www.datenschutzzentrum.de/guetesiegel/register.htm
https://www.datenschutzzentrum.de/guetesiegel/kurzgutachten/g041006/
In the financial sector, institutions use profiling technologies for fraud prevention and credit scoring. Banks want to minimise the risks in giving credit to their customers. On the basis of extensive group profiling, customers are assigned a scoring value that indicates their creditworthiness. Financial institutions like banks and insurance companies also use group profiling to detect fraud or money laundering. Databases with transactions are searched with algorithms to find behaviours that deviate from the standard, indicating potentially suspicious transactions.
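Searching for behaviour that "deviates from the standard" can be sketched, in its simplest form, as flagging values that lie far from the mean. The following Python example is a minimal sketch under that assumption; the transaction amounts, the threshold of two standard deviations and the function name are hypothetical, and real fraud detection relies on far richer models.

```python
from statistics import mean, stdev


def flag_deviations(amounts, threshold=2.0):
    """Return the indices of amounts lying more than `threshold`
    sample standard deviations away from the mean."""
    mu = mean(amounts)
    sigma = stdev(amounts)
    if sigma == 0:
        # All amounts are identical, so nothing deviates.
        return []
    return [i for i, a in enumerate(amounts) if abs(a - mu) / sigma > threshold]


# Hypothetical transaction amounts for one account.
transactions = [20, 25, 22, 19, 24, 21, 23, 500]
suspicious = flag_deviations(transactions)
```

A flagged transaction is only a candidate for further inspection: the rule says that the amount is unusual for this account, not that it is fraudulent.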
In the context of employment, profiles can be of use for tracking employees by monitoring their online behaviour, for the detection of fraud by them, and for the deployment of human resources by pooling and ranking their skills.
http://epic.org/privacy/workplace/.
Profiling can also be used to support people at work and in learning, through the design of adaptive hypermedia systems that personalise the interaction. For instance, this can be useful for supporting the management of attention.
In forensic science, the possibility exists of linking different databases of cases and suspects and mining these for common patterns. This could be used for solving existing cases or for the purpose of establishing risk profiles of potential suspects.
Risks and issues
Profiling technologies have raised a host of ethical, legal and other issues, including privacy, equality, due process, security and liability. Numerous authors have warned against the affordances of a new technological infrastructure that could emerge on the basis of semi-autonomic profiling technologies.
Privacy is one of the principal issues raised. Profiling technologies make possible a far-reaching monitoring of an individual's behaviour and preferences. Profiles may reveal personal or private information about individuals that they might not even be aware of themselves.
Profiling technologies are by their very nature discriminatory tools. They allow unparalleled kinds of social sorting and segmentation which could have unfair effects. The people who are profiled may have to pay higher prices, they could miss out on important offers or opportunities, and they may run increased risks because catering to their needs is less profitable. In most cases they will not be aware of this, since profiling practices are mostly invisible and the profiles themselves are often protected by intellectual property or trade secret. This poses a threat to the equality and solidarity of citizens. On a larger scale, it might cause the segmentation of society.
One of the problems underlying potential violations of privacy and non-discrimination is that the process of profiling is more often than not invisible to those who are being profiled. This makes it hard, if not impossible, to contest the application of a particular group profile. This disturbs principles of due process: if a person has no access to the information on the basis of which she is withheld benefits or attributed certain risks, she cannot contest the way she is being treated.
Profiles can be used against people when they end up in the hands of people who are not entitled to access or use them. An important issue related to these breaches of security is identity theft.
When the application of profiles causes harm, it has to be determined who is liable for this harm. Is the software programmer, the profiling service provider, or the profiled user to be held accountable? This issue of liability is especially complex when the application of profiles and the decisions based on them have also become automated, as in Autonomic Computing or ambient intelligence.
See also
- Profiling
- Forensic profiling
- Data mining
- Digital traces
- Identification (information)
- Identity (social science)
- Behavioral targeting
- Digital identity
- Privacy
- Labelling
- Stereotype
- User profile
- Demographics