Machine-generated data
Machine-generated data is the generic term for information that is created automatically by a computer process, application, or other machine without human intervention. However, there is some disagreement about the breadth of the term. Curt Monash of Monash Research, who is generally credited with introducing the term, defines it as "data that was produced entirely by machines OR data that is more about observing humans than recording their choices." Meanwhile, Daniel Abadi, a computer science professor at Yale, proposes a narrower definition: "Machine-generated data is data that is generated as a result of a decision of an independent computational agent or a measurement of an event that is not caused by a human action." Despite the difference, both definitions exclude data entered manually by an end user. Machine-generated data spans all industry sectors, and humans increasingly generate it unknowingly.
Relevance of machine-generated data
Machine-generated data tends to be amorphous, and users typically never modify it. Machines often generate this data as a consistent response to an event that has already occurred. Because the event is historical, the data is less prone to later updates and modifications. Partly because of this quality, U.S. court systems consider machine-generated data highly reliable.
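Because each record describes an event that has already happened, one common way to model such data is as immutable values in an append-only store. The Python sketch below illustrates that idea under stated assumptions: it uses a simple in-memory store, and `MachineEvent` and `AppendOnlyLog` are hypothetical names, not part of any particular product.

```python
from dataclasses import dataclass, FrozenInstanceError
from datetime import datetime, timezone
from typing import Iterator, List


@dataclass(frozen=True)
class MachineEvent:
    """One machine-generated record (hypothetical schema); frozen, so fields cannot be reassigned."""
    timestamp: datetime
    source: str       # assumed: the device or process that emitted the event
    event_type: str
    value: float


class AppendOnlyLog:
    """Hypothetical in-memory store that supports appends and reads, but no updates or deletes."""

    def __init__(self) -> None:
        self._events: List[MachineEvent] = []

    def append(self, event: MachineEvent) -> int:
        """Add a record and return its position in the log."""
        self._events.append(event)
        return len(self._events) - 1

    def __len__(self) -> int:
        return len(self._events)

    def __iter__(self) -> Iterator[MachineEvent]:
        # Returns an iterator rather than the list itself, so callers cannot mutate the log.
        return iter(self._events)


if __name__ == "__main__":
    log = AppendOnlyLog()
    log.append(MachineEvent(datetime(2024, 1, 1, tzinfo=timezone.utc), "sensor-1", "temperature", 21.5))
    try:
        next(iter(log)).value = 99.0  # attempt to rewrite history
    except FrozenInstanceError:
        print("records are immutable")
```

Freezing the record type and exposing only `append` mirrors the property described above: once an event is recorded, it is read but not edited.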
Handling machine-generated data
In 2009, Gartner published a forecast that data would grow by 650% over the following five years. Most of this growth is a byproduct of machine-generated data.
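To put the forecast in annual terms, assume "grow by 650%" means ending at 7.5 times the starting volume (an interpretation, since the baseline is not stated here). The implied compound annual growth rate $r$ over five years is then:

```latex
(1 + r)^5 = 1 + 6.5 = 7.5
\quad\Longrightarrow\quad
r = 7.5^{1/5} - 1 \approx 0.496
```

That is roughly 50% more data each year. Reading the figure instead as growth to 6.5 times the starting volume gives $r = 6.5^{1/5} - 1 \approx 0.454$, or about 45% per year.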
Processing machine-generated data
Given the fairly static yet voluminous nature of machine-generated data, data owners rely on highly scalable tools to process and analyze the resulting datasets. Almost all machine-generated data is unstructured, but a common structure is then derived from it. Typically, these derived structures contain many data points or columns. With these data points, the main challenge lies in analyzing the data. Given high performance requirements and large data sizes, traditional database indexing and partitioning limit the size and history of the dataset that can be processed. Columnar databases offer an alternative approach, since a particular analysis accesses only particular "columns" of the dataset.
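Both steps described above (deriving a common structure from unstructured data, then accessing only the columns an analysis needs) can be sketched in a few lines of Python. This sketch assumes the input is web server log lines in the Common Log Format; real sources vary. The function names are hypothetical, and the in-memory "column store" is only a dict of lists, without the compression, on-disk layout, and vectorized execution that real columnar databases typically add.

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional

# Assumed input format (Common Log Format), e.g.:
# 127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line: str) -> Optional[Dict[str, object]]:
    """Derive a structured row from one unstructured log line; None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    row: Dict[str, object] = dict(match.groupdict())
    row["status"] = int(row["status"])
    # "-" means no body size was logged; treat it as zero bytes.
    row["size"] = 0 if row["size"] == "-" else int(row["size"])
    return row


def to_columns(lines: List[str]) -> Dict[str, List[object]]:
    """Pivot parsed rows into a column-oriented layout: one list per field."""
    columns: Dict[str, List[object]] = defaultdict(list)
    for line in lines:
        row = parse_line(line)
        if row is None:
            continue  # skip malformed lines instead of failing the whole batch
        for field, value in row.items():
            columns[field].append(value)
    return dict(columns)


def total_bytes(columns: Dict[str, List[object]]) -> int:
    """Answer one query by reading a single column; host, time, path, etc. are never touched."""
    return sum(columns["size"])


if __name__ == "__main__":
    raw = [
        '127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
        '10.0.0.2 - - [10/Oct/2000:13:56:01 -0700] "GET /missing HTTP/1.0" 404 -',
        'not a log line',
    ]
    cols = to_columns(raw)
    print(cols["status"], total_bytes(cols))
```

Storing each field contiguously is what lets a query such as "total bytes served" skip every other field; in a row-oriented layout, each full row would be read to extract one value.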
Examples of machine-generated data
- Web logs
- Call detail records
- Financial instrument trades
- Network event logs
- SIEM (security information and event management) logs
- Telemetry collected by the government