Content analysis
Content analysis or textual analysis is a methodology in the social sciences for studying the content of communication. Earl Babbie defines it as "the study of recorded human communications, such as books, websites, paintings and laws."
According to Dr. Farooq Joubish, content analysis is considered a scholarly methodology in the humanities by which texts are studied as to authorship, authenticity, or meaning. This latter subject includes philology, hermeneutics, and semiotics.
Harold Lasswell formulated the core questions of content analysis: "Who says what, to whom, why, to what extent and with what effect?" Ole Holsti (1969) offers a broad definition of content analysis as "any technique for making inferences by objectively and systematically identifying specified characteristics of messages." Kimberly A. Neuendorf (2002, p. 10) offers a six-part definition of content analysis:
"Content analysis is a summarising, quantitative analysis of messages that relies on the scientific method (including attention to objectivity, intersubjectivity, a priori design, reliability, validity, generalisability, replicability, and hypothesis testing) and is not limited as to the types of variables that may be measured or the context in which the messages are created or presented."
Description
In 1931, Alfred R. Lindesmith developed a methodology for refuting existing hypotheses, which became known as a content analysis technique. It gained popularity in the 1960s through the work of Glaser, who described it as "The Constant Comparative Method of Qualitative Analysis" in an article published in 1964-65. Glaser and Strauss (1967) referred to their adaptation of it as "Grounded Theory."
The method of content analysis enables the researcher to include large amounts of textual information and systematically identify its properties, such as the frequencies of the most used keywords, and to locate the more important structures of its communication content, for instance with KWIC ("Key Word in Context") listings. Yet such amounts of textual information must be categorised for analysis, so that the analysis provides, in the end, a meaningful reading of the content under scrutiny. David Robertson (1976:73-75), for example, created a coding frame for comparing modes of party competition between British and American parties. It was developed further in 1979 by the Manifesto Research Group, which aimed at a comparative content-analytic approach to the policy positions of political parties.
Since the 1980s, content analysis has become an increasingly important tool in the measurement of success in public relations (notably media relations) programs and the assessment of media profiles. In these circumstances, content analysis is an element of media evaluation or media analysis. In analyses of this type, data from content analysis is usually combined with media data (circulation, readership, number of viewers and listeners, frequency of publication). It has also been used by futurists to identify trends. In 1982, John Naisbitt published his popular Megatrends, based on content analysis in the US media.
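As a rough illustration of how coded content might be combined with media data, the following Python sketch weights each article's coded mentions by the circulation of the outlet that carried it and totals the result by tone. The outlet names, circulation figures, tone scale and the `weighted_exposure` helper are all hypothetical; circulation weighting is only one of several ways such data can be combined.

```python
from dataclasses import dataclass

@dataclass
class CodedArticle:
    outlet: str    # publication that carried the article (hypothetical names below)
    mentions: int  # how often the organisation is mentioned, from content coding
    tone: int      # hypothetical tone code: -1 negative, 0 neutral, +1 positive

# Hypothetical media data: average daily circulation per outlet.
CIRCULATION = {"Daily A": 120_000, "Weekly B": 30_000}

def weighted_exposure(articles, circulation):
    """Total circulation-weighted mentions for each tone code."""
    totals = {-1: 0, 0: 0, 1: 0}
    for article in articles:
        totals[article.tone] += article.mentions * circulation[article.outlet]
    return totals

articles = [
    CodedArticle("Daily A", mentions=2, tone=1),
    CodedArticle("Weekly B", mentions=3, tone=-1),
    CodedArticle("Daily A", mentions=1, tone=0),
]
print(weighted_exposure(articles, CIRCULATION))
# {-1: 90000, 0: 120000, 1: 240000}
```

Weighting by circulation treats every printed copy as one opportunity to see a mention; readership or audience figures could be substituted in the same way.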
The creation of coding frames is intrinsically related to a creative approach to variables that exert an influence over textual content. In political analysis, these variables could be political scandals, the impact of public opinion polls, sudden events in external politics, inflation etc.
Mimetic Convergence, created by F. Lampreia Carvalho for the comparative analysis of electoral proclamations on free-to-air television, is an example of creative articulation of variables in content analysis. The methodology describes the construction of party identities during long-term party competitions on TV from a dynamic perspective, governed by the logic of the contingent. It aims to capture the contingent logic observed in electoral campaigns by focusing on the repetition and innovation of themes sustained in party broadcasts. According to the post-structuralist perspective from which electoral competition is analysed, party identities ('the real') cannot speak without mediation, because there is no natural centre fixing the meaning of a party structure; that meaning depends instead on ad hoc articulations. There is no empirical reality outside articulations of meaning. Reality is an outcome of power struggles that unify ideas of social structure as a result of contingent interventions. In Brazil, these contingent interventions have proven to be mimetic and convergent rather than divergent and polarised, and integral to the repetition of dichotomised worldviews.
Mimetic Convergence thus aims to show the process of fixation of meaning through discursive articulations that repeat, alter and subvert the political issues that come into play. For this reason, parties are not taken as the pure expression of conflicts for the representation of interests (of different classes, religions, or ethnic groups; see Lipset & Rokkan 1967, Lijphart 1984) but as attempts to recompose and re-articulate ideas of an absent totality around signifiers gaining positivity.
Every content analysis should start from a hypothesis. The hypothesis of Mimetic Convergence supports the Downsian interpretation that, in general, rational voters converge in the direction of uniform positions in most thematic dimensions. The hypothesis guiding the analysis of Mimetic Convergence between political parties' broadcasts is: "Public opinion polls on vote intention, published throughout campaigns on TV, will contribute to successive revisions of candidates' discourses." Candidates re-orient their arguments and thematic selections partly in response to the signals sent by voters. One must also consider the interference of other kinds of input in electoral propaganda, such as internal and external political crises and the arbitrary interference of private interests in the dispute. Moments of internal crisis in disputes between candidates might result from the exhaustion of a certain strategy, and such moments of exhaustion might in turn precipitate an inversion in the thematic flux.
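The repetition and innovation of themes, and the convergence between parties that this hypothesis predicts, could be tracked roughly as in the Python sketch below. It is not Carvalho's procedure: it assumes each party's broadcasts in each campaign wave have already been coded into a set of themes, and it uses the Jaccard overlap of two theme sets as a simple, hypothetical convergence indicator.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Share of themes two sets have in common (0.0 = disjoint, 1.0 = identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def repetition_innovation(previous: set[str], current: set[str]) -> tuple[set[str], set[str]]:
    """Split a broadcast's themes into those repeated from the previous wave and new ones."""
    return current & previous, current - previous

def convergence_series(party_a: list[set[str]], party_b: list[set[str]]) -> list[float]:
    """Theme overlap between two parties in each successive campaign wave."""
    return [jaccard(a, b) for a, b in zip(party_a, party_b)]

# Hypothetical coded themes for three campaign waves (e.g. between opinion polls).
party_a = [{"economy", "crime"}, {"economy", "health"}, {"economy", "health", "jobs"}]
party_b = [{"corruption", "education"}, {"economy", "education"}, {"economy", "health", "jobs"}]

print([round(x, 2) for x in convergence_series(party_a, party_b)])  # [0.0, 0.33, 1.0]
repeated, new = repetition_innovation(party_a[0], party_a[1])
print(sorted(repeated), sorted(new))  # ['economy'] ['health']
```

A rising series would be consistent with convergence, and a sudden drop could mark the kind of inversion in the thematic flux described above; interpreting either still depends on the qualitative reading of the broadcasts.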
As an evaluation approach, content analysis is considered by some to be quasi-evaluation because content analysis judgments need not be based on value statements if the research objective is aimed at presenting subjective experiences. Thus, they can be based on knowledge of everyday lived experiences. Such content analyses are not evaluations. On the other hand, when content analysis judgments are based on values, such studies are evaluations (Frisbie, 1986).
As argued above, only a sound scientific hypothesis can lead to the development of a methodology that allows empirical description, be it dynamic or static.
Content analysis is closely related to, if not overlapping with, other kinds of textual analysis; it is often included under the general rubric of “qualitative analysis” and used primarily in the social sciences. It is “a systematic, replicable technique for compressing many words of text into fewer content categories based on explicit rules of coding” (Stemler 2001). It often involves building and applying a “concept dictionary”, or fixed vocabulary of terms, on the basis of which words are extracted from the textual data for concordancing or statistical computation.
Uses of content analysis
Ole Holsti (1969) groups 15 uses of content analysis into three basic categories:
- make inferences about the antecedents of a communication
- describe and make inferences about characteristics of a communication
- make inferences about the effects of a communication.
He also places these uses into the context of the basic communication paradigm.
The following table shows fifteen uses of content analysis in terms of their general purpose, element of the communication paradigm to which they apply, and the general question they are intended to answer.
Uses of Content Analysis by Purpose, Communication Element, and Question
| Purpose | Element | Question | Use |
|---|---|---|---|
| Make inferences about the antecedents of communications | Source | Who? | |
| | Encoding process | Why? | |
| Describe & make inferences about the characteristics of communications | Channel | How? | |
| | Message | What? | |
| | Recipient | To whom? | |
| Make inferences about the consequences of communications | Decoding process | With what effect? | |
Note. Purpose, communication element, & question from Holsti (1969). Uses primarily from Berelson (1952) as adapted by Holsti (1969).
The process of a content analysis
According to Dr. Klaus Krippendorff (1980 and 2004), six questions must be addressed in every content analysis:
- Which data are analysed?
- How are they defined?
- What is the population from which they are drawn?
- What is the context relative to which the data are analysed?
- What are the boundaries of the analysis?
- What is the target of the inferences?
The assumption is that the words and phrases mentioned most often are those reflecting important concerns in every communication. Therefore, quantitative content analysis starts with word frequencies, space measurements (column centimeters/inches in the case of newspapers), time counts (for radio and television time) and keyword frequencies. However, content analysis extends far beyond plain word counts: with Keyword In Context routines, for example, words can be analysed in their specific context in order to disambiguate them. Synonyms and homonyms can be isolated in accordance with the linguistic properties of a language.
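A minimal Python sketch of the two steps just described: counting word frequencies after removing stop words, and producing Keyword In Context (KWIC) lines so that a coder can see, for instance, which occurrences of an ambiguous word such as "bank" refer to a financial institution. The tokenizer (lower-cased ASCII letters only), the stop-word list and the context width are simplifying assumptions, and the disambiguation itself is left to the human reader of the KWIC lines.

```python
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into word tokens (ASCII letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())

def word_frequencies(text: str, stopwords: frozenset[str] = frozenset()) -> Counter:
    """Count how often each word occurs, ignoring stop words."""
    return Counter(w for w in tokenize(text) if w not in stopwords)

def kwic(text: str, keyword: str, width: int = 3) -> list[str]:
    """Return every occurrence of `keyword` with `width` words of context on each side."""
    tokens = tokenize(text)
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{token}] {right}")
    return lines

text = "The bank raised rates. We sat on the river bank. The bank denied the report."
print(word_frequencies(text, frozenset({"the", "on", "we"})).most_common(1))  # [('bank', 3)]
for line in kwic(text, "bank", width=2):
    print(line)
# the [bank] raised rates
# the river [bank] the bank
# bank the [bank] denied the
```

The frequency count alone treats all three occurrences of "bank" alike; only the KWIC lines reveal that the second one means a riverside.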
Qualitatively, content analysis can involve any kind of analysis where communication content (speech, written text, interviews, images ...) is categorised and classified. In its beginnings, when newspapers were first analysed at the end of the 19th century, analysis was done manually by measuring the number of lines and amount of space given to a subject. With the rise of common computing facilities like PCs, computer-based methods of analysis have grown in popularity. Answers to open-ended questions, newspaper articles, political party manifestoes, medical records or systematic observations in experiments can all be subject to systematic analysis of textual data. When the contents of communication are available as machine-readable texts, the input is analysed for frequencies and coded into categories for building up inferences. Robert Philip Weber (1990) notes: "To make valid inferences from the text, it is important that the classification procedure be reliable in the sense of being consistent: Different people should code the same text in the same way" (p. 12). Validity, inter-coder reliability and intra-coder reliability have been the subject of intense methodological research for many years (see Krippendorff, 2004).
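Weber's consistency requirement is usually checked by having two or more coders code the same units and measuring their agreement. The Python sketch below computes simple percent agreement and Cohen's kappa, which corrects agreement for what two coders would reach by chance; the coder data are hypothetical, and other coefficients, such as Krippendorff's alpha, are also widely used.

```python
from collections import Counter

def percent_agreement(coder_a: list[str], coder_b: list[str]) -> float:
    """Proportion of units to which both coders assigned the same category."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must code the same, non-empty set of units")
    return sum(a == b for a, b in zip(coder_a, coder_b)) / len(coder_a)

def cohens_kappa(coder_a: list[str], coder_b: list[str]) -> float:
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    p_observed = percent_agreement(coder_a, coder_b)
    n = len(coder_a)
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement: probability that both coders pick the same category independently.
    p_chance = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_chance == 1.0:
        # Both coders used one identical category throughout; kappa is undefined (0/0),
        # and this sketch reports it as perfect agreement.
        return 1.0
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical tone codes assigned by two coders to the same ten articles.
coder_a = ["pos", "pos", "neg", "neu", "pos", "neg", "neg", "neu", "pos", "pos"]
coder_b = ["pos", "neg", "neg", "neu", "pos", "neg", "pos", "neu", "pos", "pos"]
print(percent_agreement(coder_a, coder_b))       # 0.8
print(round(cohens_kappa(coder_a, coder_b), 3))  # 0.677
```

Here the coders agree on 8 of 10 articles, but because "pos" dominates both codings, 38% agreement would be expected by chance alone, so kappa is noticeably lower than raw agreement.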
One more distinction is between the manifest content of communication and its latent meaning. "Manifest" describes what an author or speaker has definitely written or said, while latent meaning describes what an author intended to say or write. Normally, content analysis can only be applied to manifest content; that is, to the words, sentences, or texts themselves, rather than their meanings.
Dermot McKeone (1995) has highlighted the difference between prescriptive analysis and open analysis. In prescriptive analysis, the context is a closely defined set of communication parameters (e.g. specific messages, subject matter); open analysis identifies the dominant messages and subject matter within the text.
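The contrast between the two modes can be illustrated with a deliberately simple, keyword-based Python sketch; McKeone's distinction does not prescribe any particular algorithm, and the key messages, stop words and texts below are hypothetical. Prescriptive analysis checks texts against predefined key messages (written here in lower case), while open analysis lets the dominant terms emerge from the texts themselves.

```python
import re
from collections import Counter

def tokens(text: str) -> list[str]:
    """Lower-case word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def prescriptive(texts: list[str], key_messages: list[str]) -> dict[str, int]:
    """Number of texts containing each predefined key message as a whole phrase."""
    padded = [f" {' '.join(tokens(t))} " for t in texts]
    return {msg: sum(f" {msg} " in p for p in padded) for msg in key_messages}

def open_analysis(texts: list[str], stopwords: set[str], top_n: int = 3) -> list[tuple[str, int]]:
    """Most frequent non-stop-words across all texts, found without a predefined list."""
    counts = Counter(w for t in texts for w in tokens(t) if w not in stopwords)
    return counts.most_common(top_n)

texts = [
    "Our store offers low prices and fast delivery.",
    "Customers praise the fast delivery.",
    "Delivery was fast but the staff were rude.",
]
print(prescriptive(texts, ["low prices", "friendly staff"]))  # {'low prices': 1, 'friendly staff': 0}
print(open_analysis(texts, {"our", "and", "the", "was", "but", "were"}, top_n=2))
# [('fast', 3), ('delivery', 3)]
```

The prescriptive check reports that one intended message barely appears and the other not at all, while the open analysis surfaces fast delivery as the dominant subject, which the predefined list did not anticipate.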
A further step in analysis is the distinction between dictionary-based (quantitative) approaches and qualitative approaches. Dictionary-based approaches set up a list of categories derived from the frequency list of words and track the distribution of words and their respective categories over the texts. While methods in quantitative content analysis thus transform observations of found categories into quantitative statistical data, qualitative content analysis focuses more on intentionality and its implications.
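A minimal Python sketch of a dictionary-based approach: each category is defined by a hypothetical list of indicator words, every text is coded by counting hits per category, and the result is expressed as each category's share of a text's hits so that texts of different lengths can be compared. Real dictionaries are usually much larger, are often derived from word-frequency lists, and need validation; the categories and party texts here are invented for illustration.

```python
import re
from collections import Counter

# Hypothetical category dictionary: each category lists words taken to indicate it.
DICTIONARY = {
    "economy": {"tax", "taxes", "jobs", "inflation", "growth"},
    "security": {"crime", "police", "border", "defence"},
}

def code_text(text: str, dictionary: dict[str, set[str]]) -> Counter:
    """Count dictionary hits per category in one text."""
    counts = Counter()
    for word in re.findall(r"[a-z]+", text.lower()):
        for category, terms in dictionary.items():
            if word in terms:
                counts[category] += 1
    return counts

def category_distribution(texts: dict[str, str],
                          dictionary: dict[str, set[str]]) -> dict[str, dict[str, float]]:
    """Share of each text's dictionary hits that falls into each category."""
    table = {}
    for name, text in texts.items():
        counts = code_text(text, dictionary)
        total = sum(counts.values())
        table[name] = {c: counts[c] / total if total else 0.0 for c in dictionary}
    return table

manifestos = {
    "Party A": "Lower taxes create jobs and growth. Crime is falling.",
    "Party B": "More police on the border will reduce crime.",
}
print(category_distribution(manifestos, DICTIONARY))
# {'Party A': {'economy': 0.75, 'security': 0.25}, 'Party B': {'economy': 0.0, 'security': 1.0}}
```

Exact word matching is the main weakness of such a sketch: "crime" in "crime is falling" and "reduce crime" counts the same, which is where the qualitative reading of intentionality comes in.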
Reliability in content analysis
Dr. Kimberly A. Neuendorf (2002) suggests that when human coders are used in content analysis, reliability translates to intercoder reliability or "the amount of agreement or correspondence among two or more coders."
External links
- http://www.yoshikoder.org/downloads.html: an open source content analysis program
- http://academic.csuohio.edu/kneuendorf/content/ Web site of the Content Analysis Guidebook Online, provides some CATA software for free download, list of archives, bibliographies and other important sources
- http://www.impact-media.co.uk Contains a general introduction to media analysis and media profile measurement including an outline of the differences between open and prescriptive analysis
- http://www.andersonanalytics.com/reports/AATAT.pdf: History of content analysis software in psychology and applications of content analysis, text mining and text analytics in market research.
- http://courses.washington.edu/socw580/contentsoftware.shtml - Software for Content Analysis, a list of programs for analyzing text, images, video and audio; a resource used by the Advanced Research Methods & Design course in the School of Social Work at the University of Washington.