Digital data: Difference between revisions

imported>Slothwizard
mNo edit summary
 
imported>Kvng
m unpiped links using script
 
Line 1: Line 1:
{{Short description|Discrete, discontinuous representation of information}}  
{{Use dmy dates|date=January 2023}}
{{About|the concept in information theory and information systems|the electronics concept|Digital signal|other uses|Digital (disambiguation){{!}}Digital}}
{{Merge from|Data (computer science)|discuss=Talk:Digital data#Proposed merge of Data (computer science) into Digital data|date=March 2025}}
[[File:Radiocontrolledclock.jpg|thumb|[[Digital clock]].  The time shown by the digits on the face at any instant is digital data.  The actual precise time is analog data. ]]


'''Digital data''' or '''digital information''', in [[information theory]] and [[information systems]], is [[data]] or [[information]] represented as a string of [[Discrete mathematics|discrete]] symbols, each of which can take on one of only a finite number of values from some [[alphabet (formal languages)|alphabet]], such as letters or digits. An example is a [[text document]], which consists of a string of [[alphanumeric character]]s. The most common form of digital data in modern information systems is ''[[binary data]]'', which is represented by a string of [[binary digit]]s (bits) each of which can have one of two values, either 0 or 1.
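As a minimal sketch of the definition above, the following Python snippet treats a short text first as a string of symbols and then as a string of bits. The choice of UTF-8 as the encoding and the helper name <code>to_bits</code> are assumptions made for illustration only.

```python
def to_bits(text: str, encoding: str = "utf-8") -> str:
    """Return the bits of `text` as a string of '0' and '1' characters."""
    # Each encoded byte becomes eight binary digits, most significant first.
    return "".join(f"{byte:08b}" for byte in text.encode(encoding))


symbols = list("Hi")   # discrete symbols drawn from a finite alphabet
bits = to_bits("Hi")   # the same data expressed as binary digits
print(symbols, bits)
```

Every symbol maps to a fixed, finite pattern of bits, which is what distinguishes this representation from a continuously varying analog value.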


Digital data can be contrasted with ''analog data'', which is represented by a value from a [[continuous variable|continuous]] range of [[real number]]s. Analog data is transmitted by an [[analog signal]], which not only takes on continuous values but can vary continuously with time, a continuous [[real-valued function]] of time. An example is the air pressure variation in a [[sound wave]].  


Data requires [[Interpretation (logic)|interpretation]]  to become [[information]]. In modern (post-1960) computer systems, all data is digital.
 
The word ''digital'' comes from the same source as the words [[digit (anatomy)|digit]] and ''digitus'' (the [[Latin]] word for ''finger''), as fingers are often used for counting. Mathematician [[George Stibitz]] of [[Bell Telephone Laboratories]] used the word ''digital'' in reference to the fast electric pulses emitted by a device designed to aim and fire anti-aircraft guns in 1942.<ref>{{Cite book |last=Ceruzzi |first=Paul E |title=Computing: A Concise History |date=29 June 2012 |publisher=[[MIT Press]] |isbn=978-0-262-51767-6}}</ref> The term is most commonly used in [[computing]] and [[electronics]], especially where real-world information is converted to [[Binary numeral system|binary]] numeric form as in [[digital audio]] and [[digital photography]].


== Symbol to digital conversion ==
Line 15: Line 16:
Since symbols (for example, [[alphanumeric]] [[Character (computing)|characters]]) are not continuous, representing symbols digitally is rather simpler than conversion of continuous or analog information to digital. Instead of [[sampling (signal processing)|sampling]] and [[quantization (signal processing)|quantization]] as in [[analog-to-digital conversion]], such techniques as [[polling (computer science)|polling]] and [[Character encoding|encoding]] are used.


A symbol input device usually consists of a group of switches that are polled at regular intervals to see which switches are switched. Data will be lost if, within a single polling interval, two switches are pressed, or a switch is pressed, released, and pressed again. This polling can be done by a specialized processor in the device to prevent burdening the main [[CPU]].<ref>{{Cite book |last1=Heinrich |first1=Lutz J. |url=https://books.google.com/books?id=Uq4FCgAAQBAJ&dq=Digitale+Daten+lexikon&pg=PA198 |title=Wirtschaftsinformatik-Lexikon |last2=Heinzl |first2=Armin |last3=Roithmayr |first3=Friedrich |date=2014-08-29 |publisher=Walter de Gruyter GmbH & Co KG |isbn=978-3-486-81590-0 |language=de}}</ref> When a new symbol has been entered, the device typically sends an [[interrupt]], in a specialized format, so that the CPU can read it.
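The data loss described above can be illustrated with a small simulation. This is a sketch under simplifying assumptions: time is measured in arbitrary integer ticks, the switch starts released, and the function name <code>count_presses_seen</code> is hypothetical.

```python
def count_presses_seen(events, interval, duration):
    """Count the press transitions a poller observes.

    events: list of (time, is_pressed) state changes, sorted by time.
    The switch is sampled at t = 0, interval, 2 * interval, ... up to duration.
    """
    seen = 0
    last_sample = False   # state observed at the previous poll
    state = False         # true state of the switch
    i = 0
    t = 0
    while t <= duration:
        # Apply every state change that happened up to this poll.
        while i < len(events) and events[i][0] <= t:
            state = events[i][1]
            i += 1
        if state and not last_sample:
            seen += 1     # released -> pressed, as seen by the poller
        last_sample = state
        t += interval
    return seen


# Press, release and press again, all between the polls at t=10 and t=20.
events = [(11, True), (13, False), (15, True), (25, False)]
print(count_presses_seen(events, interval=10, duration=40))  # prints 1
print(count_presses_seen(events, interval=1, duration=40))   # prints 2
```

With the slower polling interval only one of the two presses is registered, because the intermediate release falls entirely between two samples.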


For devices with only a few switches (such as the buttons on a [[joystick]]), the status of each can be encoded as bits (usually 0 for released and 1 for pressed) in a single word. This is useful when combinations of key presses are meaningful, and is sometimes used for passing the status of modifier keys on a keyboard (such as shift and control). But it does not scale to support more keys than the number of bits in a single byte or word.
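A minimal sketch of this bit-per-switch encoding follows. The button names and bit positions are hypothetical; real devices define their own layout.

```python
from enum import IntFlag


class Buttons(IntFlag):
    # Hypothetical bit assignments: one bit per switch.
    FIRE = 0b0001
    JUMP = 0b0010
    SHIFT = 0b0100
    CONTROL = 0b1000


def encode(pressed):
    """Pack the pressed buttons into a single integer word (1 = pressed)."""
    word = 0
    for button in pressed:
        word |= button
    return int(word)


def is_pressed(word, button):
    """Test one button's bit in the status word."""
    return bool(word & button)


state = encode([Buttons.FIRE, Buttons.SHIFT])
print(f"{state:04b}", is_pressed(state, Buttons.SHIFT), is_pressed(state, Buttons.JUMP))
```

Because each switch owns one bit, any combination of simultaneous presses has a distinct value, but the number of switches is capped by the width of the word.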


Devices with many switches (such as a [[computer keyboard]]) usually arrange these switches in a scan matrix, with the individual switches on the intersections of x and y lines. When a switch is pressed, it connects the corresponding x and y lines together. Polling (often called scanning in this case) is done by activating each x line in sequence and detecting which y lines then have a [[digital signal|signal]], thus which keys are pressed. When the keyboard processor detects that a key has changed state, it sends a signal to the CPU indicating the scan code of the key and its new state. The symbol is then [[encoded]] or converted into a number based on the status of modifier keys and the desired [[character encoding]].
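The scanning procedure can be sketched in software as follows. This is an idealized model rather than firmware for any real keyboard: the set <code>closed</code> stands in for the electrical state of the matrix, the <code>SCAN_CODES</code> table is hypothetical, and effects such as switch bounce and ghosting are ignored.

```python
# Hypothetical mapping from matrix position (x, y) to scan code.
SCAN_CODES = {(0, 0): 0x1E, (1, 0): 0x30, (0, 1): 0x2A}


def scan_matrix(closed, n_rows, n_cols):
    """Return the set of (x, y) positions detected as pressed.

    closed: set of (x, y) intersections whose switch is currently closed.
    """
    pressed = set()
    for x in range(n_cols):          # activate one x line at a time
        for y in range(n_rows):      # read each y line
            if (x, y) in closed:     # signal present: the switch joins x and y
                pressed.add((x, y))
    return pressed


def key_events(previous, current):
    """List (scan_code, is_pressed) for keys that changed between two scans."""
    return [(SCAN_CODES[key], key in current) for key in sorted(previous ^ current)]


first = scan_matrix({(0, 0)}, n_rows=2, n_cols=2)
second = scan_matrix({(0, 0), (0, 1)}, n_rows=2, n_cols=2)
print(key_events(first, second))
```

Only keys whose state differs between consecutive scans generate an event, which matches the description of the keyboard processor reporting a scan code together with the key's new state.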


A custom [[Character encoding|encoding]] can be used for a specific application with no loss of data. However, using a standard encoding such as [[ASCII]] is problematic if a symbol such as 'ß' needs to be converted but is not in the standard.
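The problem with 'ß' can be reproduced with Python's built-in codecs. The sample word and the choice of UTF-8 as the alternative encoding are assumptions for illustration.

```python
text = "Straße"

try:
    text.encode("ascii")
except UnicodeEncodeError as err:
    # The exception records which characters could not be represented.
    print("ASCII cannot represent:", text[err.start:err.end])

lossy = text.encode("ascii", errors="replace")  # b'Stra?e': the symbol is lost
utf8 = text.encode("utf-8")                     # b'Stra\xc3\x9fe': two bytes for 'ß'
print(lossy, utf8, utf8.decode("utf-8") == text)
```

The strict ASCII conversion fails, the lossy conversion substitutes a placeholder, and an encoding whose repertoire includes the symbol round-trips without loss.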
Line 26: Line 27:


== States ==
[[File:3 states of data.jpg|thumb|The 3 states of data.]]
Digital data come in these three states: [[data at rest]], [[data in transit]], and [[data in use]].<ref>{{cite web|url=http://www.nortoninternetsecurity.cc/2011/03/data-loss-prevention.html |title=Data Loss Prevention &#124; Norton Internet Security |publisher=Nortoninternetsecurity.cc |date=2011-03-12 |accessdate=2012-12-26}}</ref><ref>{{Cite web |title=Data Protection: Data In transit vs. Data At Rest |url=https://www.digitalguardian.com/blog/data-protection-data-in-transit-vs-data-at-rest |access-date=2023-04-12 |website=Digital Guardian |language=en}}</ref> The [[CIA Triad|confidentiality, integrity, and availability]] have to be managed during the entire lifecycle from 'birth' to the destruction of the data.<ref>{{Cite web |title=The three states of information |url=https://www.ed.ac.uk/arts-humanities-soc-sci/about-us/information-security-and-governance/what-information-do-i-have-to-protect/the-three-states-of-information |access-date=21 February 2021 |website=The University of Edinburgh |language=en |archive-date=14 April 2021 |archive-url=https://web.archive.org/web/20210414042237/https://www.ed.ac.uk/arts-humanities-soc-sci/about-us/information-security-and-governance/what-information-do-i-have-to-protect/the-three-states-of-information |url-status=dead}}</ref>
 
===Data at rest===
[[File:DAR v DIU.JPG|thumb|Data at Rest vs Data in Use.]]
'''Data at rest''' in [[information technology]] means data that is housed physically on [[computer data storage]] in any digital form (e.g. [[cloud storage]], [[file hosting service]]s, [[database]]s, [[data warehouse]]s, [[spreadsheet]]s, archives, tapes, off-site or cloud backups, [[mobile device]]s etc.). Data at rest includes both structured and [[unstructured data]].<ref>{{Cite web|last=Pickell|first=Devin|title=Structured vs Unstructured Data – What's the Difference?|url=https://learn.g2.com/structured-vs-unstructured-data|accessdate=2020-11-17|website=learn.g2.com|language=en}}</ref> This type of data is subject to threats from hackers and other malicious actors, who may gain access to the data digitally or steal the physical storage media. To prevent this data from being accessed, modified or stolen, organizations will often employ security protection measures such as password protection, data encryption, or a combination of both. The security options used for this type of data are broadly referred to as '''data-at-rest protection''' ('''DARP''').<ref>{{cite web |url=https://www.webopedia.com/TERM/D/data_at_rest_protection.html?title= |title=Webopedia:Data at Rest|date=8 June 2007}}</ref>
 
Definitions include:
<blockquote>"...all data in computer storage while excluding data that is traversing a network or temporarily residing in computer memory to be read or updated."<ref name="techtarget1">{{cite web|url=http://searchstorage.techtarget.com/definition/data-at-rest |title=What is data at rest? - Definition from WhatIs.com |publisher=Searchstorage.techtarget.com |date=2012-12-22 |accessdate=2012-12-26}}</ref></blockquote>
 
<blockquote>"...all data in storage but excludes any data that frequently traverses the network or that which resides in temporary memory. Data at rest includes but is not limited to archived data, data which is not accessed or changed frequently, files stored on hard drives, USB thumb drives, files stored on backup tape and disks, and also files stored off-site or on a [[storage area network]] (SAN)."<ref>{{cite web|url=http://www.webopedia.com/TERM/D/data_at_rest.html |title=What is data at rest? - A Word Definition From the Webopedia Computer Dictionary |publisher=Webopedia.com |date= 8 June 2007|accessdate=2012-12-26}}</ref></blockquote>
 
It is generally accepted that archive data (i.e. data which never changes), regardless of its storage medium, is data at rest, and that active data subject to constant or frequent change is data in use. “Inactive data” could be taken to mean data which may change, but infrequently. The imprecise nature of terms such as “constant” and “frequent” means that some stored data cannot be comprehensively defined as either data at rest or data in use. These definitions could be taken to assume that data at rest is a superset of data in use; however, data in use, subject to frequent change, has distinct processing requirements from data at rest, whether completely static or subject to occasional change.
 
====Security====
Because of its nature, data at rest is of increasing concern to businesses, government agencies and other institutions.<ref name="techtarget1"/> Mobile devices are often subject to specific security protocols to protect data at rest from unauthorized access when lost or stolen,<ref>{{Cite web |date=12 October 2006 |title=06-EC-O-0008: Data-At-Rest (DAR) Protection |url=http://www.gordon.army.mil/nec/documents/BBP%20Data%20at%20Rest.pdf |archive-url=https://web.archive.org/web/20161222160014/http://www.gordon.army.mil/nec/documents/BBP%20Data%20at%20Rest.pdf |archive-date=22 December 2016 |website=Department of the Army |series=Information Assurance Best Business Practice (IA BBP)}}</ref> and there is an increasing recognition that database management systems and file servers should also be considered as at risk;<ref>{{cite web|url=http://www.gartner.com/research/spotlight/asset_63248_895.jsp |archive-url=https://web.archive.org/web/20040502025744/http://www3.gartner.com/research/spotlight/asset_63248_895.jsp |url-status=dead |archive-date=May 2, 2004 |title=IT Research, Magic Quadrants, Hype Cycles |publisher=Gartner |date= |accessdate=2012-12-26}}</ref> the longer data is left unused in storage, the more likely it is to be retrieved by unauthorized individuals outside the network.
 
[[Data encryption]], which prevents data visibility in the event of its unauthorized access or theft, is commonly used to protect data in motion and increasingly promoted for protecting data at rest.<ref>{{cite web|last=Inmon |first=Bill |url=http://www.information-management.com/issues/20050801/1033567-1.html |title=Encryption at Rest - Information Management Magazine Article |publisher=Information-management.com |date= August 2005|accessdate=2012-12-26}}</ref> The encryption of data at rest should only include strong encryption methods such as [[Advanced Encryption Standard|AES]] or [[RSA (algorithm)|RSA]]. Encrypted data should remain encrypted when access controls such as usernames and passwords fail. Applying encryption at multiple levels is recommended: [[cryptography]] can be implemented on the database housing the data and on the physical storage where the databases are stored. Data encryption keys should be updated on a regular basis and stored separately from the data. Encryption also enables [[crypto-shredding]] at the end of the data or hardware lifecycle. Periodic auditing of sensitive data should be part of policy and should occur on a regular schedule. Finally, only the minimum possible amount of sensitive data should be stored.<ref>{{cite web|url=https://www.owasp.org/index.php/Cryptographic_Storage_Cheat_Sheet |title=Cryptographic Storage Cheat Sheet |publisher=OWASP |date= |accessdate=2012-12-26}}</ref>
 
[[Tokenization (data security)|Tokenization]] is a non-mathematical approach to protecting data at rest that replaces sensitive data with non-sensitive substitutes, referred to as tokens, which have no extrinsic or exploitable meaning or value. This process does not alter the type or length of data, which means it can be processed by legacy systems such as databases that may be sensitive to data length and type. Tokens require significantly fewer computational resources to process and less storage space in databases than traditionally encrypted data. This is achieved by keeping specific data fully or partially visible for processing and analytics while sensitive information is kept hidden. Lower processing and storage requirements make tokenization an ideal method of securing data at rest in systems that manage large volumes of data.
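The idea can be sketched with a toy in-memory token vault. This is an illustrative assumption, not a description of any particular product: the class name <code>TokenVault</code>, the <code>keep_last</code> parameter and the sample card-like number are all hypothetical, and a real system would persist and protect the mapping.

```python
import secrets
import string


def _random_like(ch):
    """Return a random character of the same kind (digit or letter) as ch."""
    if ch.isdigit():
        return secrets.choice(string.digits)
    if ch.isalpha():
        return secrets.choice(string.ascii_letters)
    return ch  # separators are kept so the format is unchanged


class TokenVault:
    """Toy vault mapping random tokens back to the original values."""

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, value, keep_last=4):
        if value in self._value_to_token:
            return self._value_to_token[value]
        cut = max(len(value) - keep_last, 0)
        head, tail = value[:cut], value[cut:]   # tail stays visible
        if not any(ch.isalnum() for ch in head):
            raise ValueError("nothing to hide in the masked part")
        while True:
            token = "".join(_random_like(ch) for ch in head) + tail
            if token != value and token not in self._token_to_value:
                break
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token):
        return self._token_to_value[token]


vault = TokenVault()
token = vault.tokenize("4111-1111-1111-1234")
print(token, vault.detokenize(token))
```

The token has the same length and character types as the original, keeps the last four characters visible for processing, and can only be reversed by a lookup in the vault, since it is generated randomly rather than derived mathematically from the value.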
 
A further method of preventing unwanted access to data at rest is the use of data federation,<ref>{{cite web|url=http://www.ibm.com/developerworks/webservices/library/ws-soa-infoserv1 |title=Information service patterns, Part 1: Data federation pattern |publisher=Ibm.com |date= |accessdate=2012-12-26}}</ref> especially when data is distributed globally (e.g. in off-shore archives). An example of this would be a European organisation which stores its archived data off-site in the US. Under the terms of the [[USA PATRIOT Act]],<ref>{{cite web |url=http://www.fincen.gov/statutes_regs/patriot/index.html |title=USA Patriot Act |publisher=Fincen.gov |date=2002-01-01 |accessdate=2012-12-26 |url-status=dead |archiveurl=https://web.archive.org/web/20121228161833/http://www.fincen.gov/statutes_regs/patriot/index.html |archivedate=2012-12-28 }}</ref> the American authorities can demand access to all data physically stored within US boundaries, even if it includes personal information on European citizens with no connections to the US. Data encryption alone cannot be used to prevent this as the authorities have the right to demand decrypted information. A data federation policy which retains personal citizen information with no foreign connections within its country of origin (separate from information which is either not personal or is relevant to off-shore authorities) is one option to address this concern. However, data stored in foreign countries can be accessed using legislation in the [[CLOUD Act]].
 
===Data in use===
'''Data in use''' is an [[information technology]] term referring to active [[data]] which is stored in a non-persistent digital state or [[volatile memory]], typically in computer [[random-access memory]] (RAM), [[CPU cache]]s, or [[CPU register]]s.<ref name=":0a" />
 
''Data in use'' has also been taken to mean “active data” in the context of being in a database or being manipulated by an application. For example, some [[enterprise encryption gateway]] solutions for the cloud claim to encrypt data at rest, [[data in transit]] and [[data in use]].<ref>{{cite web|url=http://www.securityweek.com/ciphercloud-brings-encryption-microsoft-office-365 |title=CipherCloud Brings Encryption to Microsoft Office 365 |date= 18 July 2012|accessdate=2013-11-01}}</ref>
 
Some cloud [[software as a service]] (SaaS) providers refer to data in use as any data currently being processed by applications, as the CPU and memory are utilized.<ref name="GCN">{{cite web |url=http://gcn.com/Articles/2012/09/06/CipherCloud-enryption-for-cloud-apps.aspx |title=CipherCloud encrypts data across multiple cloud apps |publisher=Searchstorage.techtarget.com |date=2012-09-06 |accessdate=2013-11-08 |archive-date=2013-10-29 |archive-url=https://web.archive.org/web/20131029201159/http://gcn.com/articles/2012/09/06/ciphercloud-enryption-for-cloud-apps.aspx |url-status=dead }}</ref>
 
====Security====
Because of its nature, data in use is of increasing concern to businesses, government agencies and other institutions. Data in use, that is, the contents of memory, can include sensitive data such as digital certificates, encryption keys, intellectual property (software algorithms, design data), and [[personally identifiable information]]. Compromising data in use enables access to encrypted data at rest and data in motion. For example, someone with access to random-access memory can parse that memory to locate the encryption key for data at rest. Once they have obtained that encryption key, they can decrypt encrypted data at rest. Threats to data in use can come in the form of [[cold boot attack]]s, malicious hardware devices, [[rootkit]]s and bootkits.
 
Encryption, which prevents data visibility in the event of its unauthorized access or theft, is commonly used to protect data in motion and data at rest and is increasingly recognized as an optimal method for protecting data in use. There have been multiple projects to encrypt memory. Microsoft [[Xbox]] systems are designed to provide memory encryption and the company [[PrivateCore]] presently has a commercial software product, vCage, to provide attestation along with full memory encryption for x86 servers.<ref name="Government Computer News">[http://gcn.com/Articles/2014/03/12/data-in-use-encryption.aspx GCN, John Moore, March 12, 2014:"How to lock down data in use -- and in the cloud"]</ref> Several papers have been published highlighting the availability of security-enhanced x86 and ARM commodity processors.<ref name=":0a">M. Henson and S. Taylor [https://link.springer.com/chapter/10.1007%2F978-3-642-38980-1_19 "Beyond full disk encryption:protection on security-enhanced commodity processors"], "Proceedings of the 11th international conference on applied cryptography and network security", 2013</ref><ref>M. Henson and S. Taylor [http://dl.acm.org/citation.cfm?id=2566673, "Memory encryption: a survey of existing techniques"], "ACM Computing Surveys volume 46 issue 4", 2014</ref> In that work, an [[ARM Cortex-A8]] processor is used as the substrate on which a full memory encryption solution is built. Process segments (for example, stack, code or heap) can be encrypted individually or in composition. This work marks the first full memory encryption implementation on a mobile general-purpose commodity processor. The system provides both confidentiality and integrity protections of code and data which are encrypted everywhere outside the CPU boundary.
 
For x86 systems, AMD has a Secure Memory Encryption (SME) feature introduced in 2017 with [[Epyc]].<ref>{{cite web |title=Secure Memory Encryption (SME) - x86 |url=https://en.wikichip.org/wiki/x86/sme |website=WikiChip |language=en}}</ref> Intel has promised to deliver its Total Memory Encryption (TME) feature in an upcoming CPU.<ref>{{cite web |title=Total Memory Encryption (TME) - x86 |url=https://en.wikichip.org/wiki/x86/tme |website=WikiChip |language=en}}</ref><ref>{{cite news |last1=Salter |first1=Jim |title=Intel promises Full Memory Encryption in upcoming CPUs |url=https://arstechnica.com/gadgets/2020/02/intel-promises-full-memory-encryption-in-upcoming-cpus/ |work=Ars Technica |date=26 February 2020 |language=en-us}}</ref>
 
Operating system kernel patches such as [[TRESOR]] and Loop-Amnesia modify the operating system so that CPU registers can be used to store encryption keys and avoid holding encryption keys in RAM. While this approach is not general purpose and does not protect all data in use, it does protect against cold boot attacks.  Encryption keys are held inside the CPU rather than in RAM so that data at rest encryption keys are protected against attacks that might compromise encryption keys in memory.
 
Enclaves are protected regions of memory secured with encryption, so that enclave data is encrypted while in RAM but available as clear text inside the CPU and CPU cache. Intel Corporation has introduced the concept of “enclaves” as part of its [[Software Guard Extensions]]. Intel revealed an architecture combining software and CPU hardware in technical papers published in 2013.<ref name="Securosis Blog">{{cite web|url=https://securosis.com/blog/intel-software-guard-extensions-sgx-is-mighty-interesting  |title= Intel Software Guard Extensions (SGX) Is Mighty Interesting |publisher=Securosis |date=2013-07-15 | accessdate=2013-11-08}}</ref>
 
Several cryptographic tools, including [[secure multi-party computation]] and [[homomorphic encryption]], allow for the private computation of data on untrusted systems. Data in use could be operated upon while encrypted and never exposed to the system doing the processing.
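One of the simplest building blocks of secure multi-party computation is additive secret sharing, sketched below. This is a toy illustration under stated assumptions: the modulus, the party count and the function names <code>share</code> and <code>reconstruct</code> are chosen for the example, and the sketch omits everything a real protocol needs for communication and for resisting malicious parties.

```python
import secrets

MODULUS = 2**61 - 1  # arbitrary modulus chosen for illustration


def share(secret, n_parties):
    """Split `secret` into n random-looking shares that sum to it mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % MODULUS)
    return shares


def reconstruct(shares):
    """Combine all shares to recover the shared value."""
    return sum(shares) % MODULUS


a_shares = share(1200, 3)
b_shares = share(345, 3)

# Each party adds the two shares it holds; no party ever sees 1200 or 345.
sum_shares = [(a + b) % MODULUS for a, b in zip(a_shares, b_shares)]
print(reconstruct(sum_shares))
```

Each individual share is a uniformly random number, so a single party learns nothing about the inputs, yet the sum of the inputs is recovered once the result shares are combined.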
 
===Data in transit===
'''Data in transit''', also referred to as '''data in motion'''<ref>{{Cite web |url=https://cloudsecurityalliance.org/guidance/csaguide.v3.0.pdf |title=Data in motion and data in transit both used on cloudsecurityalliance.org |access-date=2016-04-18 |archive-url=https://web.archive.org/web/20160415105252/https://cloudsecurityalliance.org/guidance/csaguide.v3.0.pdf |archive-date=2016-04-15 |url-status=dead }}</ref> and '''data in flight''',<ref>{{Cite web|url=https://cacm.acm.org/magazines/2010/1/55738-data-in-flight/fulltext|title = Data in Flight &#124; January 2010 &#124; Communications of the ACM| date=January 2010 }}</ref> is data en route between source and destination, typically on a [[computer network]].
 
Data in transit can be separated into two categories: data that flows over a public or untrusted network such as the Internet, and data that flows within the confines of a private network such as a corporate or enterprise [[local area network]] (LAN).<ref>[http://www.sans.org/reading-room/analysts-program/encryption-Nov07 SANS White Paper on Encryption]</ref>[[File:Data types - en.svg|thumb|Various types of data which can be visualized through a computer device.]]
 
== In computing ==
Data within a computer, in most cases, [[Parallel communication|moves as parallel data]]. Data moving to or from a computer, in most cases, [[Serial communication|moves as serial data]]. Data sourced from an analog device, such as a temperature sensor, may be converted to digital using an [[analog-to-digital converter]]. Data representing [[quantities]], characters, or symbols on which operations are performed by a [[computer]] are [[Data storage|stored]] and [[Record (computer science)|recorded]] on [[magnetic tape data storage|magnetic]], [[optical storage|optical]], electronic, or mechanical recording media, and [[Data communication|transmitted]] in the form of digital electrical or optical signals.<ref>{{cite web|url=https://www.lexico.com/en/definition/data|title=Data|work=Lexico|access-date=14 January 2022|url-status=dead|archive-url=https://web.archive.org/web/20190623094330/https://www.lexico.com/en/definition/data |archive-date=2019-06-23 }}</ref> Data pass in and out of computers via [[peripheral device]]s.
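As a rough sketch of the conversion and serial transfer mentioned above, the following code quantizes a voltage to an integer code and then lists its bits in the order they might be sent over a serial line. The reference voltage, the 8-bit resolution, the most-significant-bit-first ordering and both function names are assumptions for illustration, not properties of any specific converter.

```python
def adc(voltage, v_ref=5.0, bits=8):
    """Quantize a voltage in [0, v_ref] to an unsigned integer code."""
    levels = 2 ** bits
    clamped = min(max(voltage, 0.0), v_ref)          # out-of-range inputs saturate
    return min(int(clamped / v_ref * levels), levels - 1)


def to_serial_bits(code, bits=8):
    """List the bits of `code`, most significant bit first."""
    return [(code >> i) & 1 for i in reversed(range(bits))]


code = adc(2.5)
print(code, to_serial_bits(code))
```

Inside the computer the eight bits of the code would typically travel together on parallel lines, whereas the list returned by <code>to_serial_bits</code> represents the same bits sent one after another.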
 
Physical [[computer memory]] elements consist of an address and a byte/word of data storage. Digital data are often stored in [[Relational database#RDBMS|relational databases]] (such as SQL databases) as [[table (database)|tables]], and can generally be represented as abstract key/value pairs. Data can be organized in many different types of [[data structure]]s, including arrays, [[Graph (abstract data type)|graphs]], and [[Object (computer science)|objects]]. Data structures can store data of many different [[data type|types]], including [[Floating-point arithmetic|numbers]], [[string (computer science)|strings]] and even other [[Recursive data type|data structures]].
 
=== Characteristics ===
[[Metadata]] helps translate data to information. Metadata is data about the data. Metadata may be implied, specified or given. 
 
Data relating to physical events or processes will have a temporal component. This temporal component may be implied. This is the case when a device such as a temperature logger receives data from a temperature [[sensor]]. When the temperature is received it is assumed that the data has a temporal reference of ''now''. So the device records the date, time and temperature together. When the data logger communicates temperatures, it must also report the date and time as metadata for each temperature reading.
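A minimal sketch of such a logger is shown below. The <code>Reading</code> type, the function names and the report format are hypothetical and only illustrate how the implied temporal reference becomes explicit metadata.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Reading:
    temperature_c: float
    recorded_at: datetime  # metadata: when the value was received


def record(temperature_c, now=None):
    """Attach the implied temporal reference 'now' to a raw sensor value."""
    if now is None:
        now = datetime.now(timezone.utc)
    return Reading(temperature_c, now)


def report(reading):
    """Format a reading together with its date and time."""
    return f"{reading.recorded_at.isoformat()} {reading.temperature_c:.1f} C"


fixed = datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc)
print(report(record(21.5, fixed)))
```

The sensor itself supplies only a number; the logger makes the time of receipt part of the stored record so that it can be reported alongside each temperature.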
 
Fundamentally, computers follow a sequence of instructions they are given in the form of data. A set of instructions to perform a given task (or tasks) is called a ''[[computer program|program]]''. A program is data in the form of coded instructions to control the operation of a computer or other machine.<ref>{{cite web|url=http://www.encyclopedia.com/topic/computer_program.aspx#2|title=Computer program|work=The Oxford pocket dictionary of current english|access-date=11 October 2012|url-status=live|archive-url=https://web.archive.org/web/20111128202415/http://www.encyclopedia.com/topic/computer_program.aspx#2|archive-date=28 November 2011}}</ref> In the nominal case, the program, as [[Execution (computing)|executed]] by the computer, will consist of [[machine code]]. The elements of [[computer data storage|storage]] manipulated by the program, but not actually executed by the [[central processing unit]] (CPU), are also data. At its most essential, a single datum is a [[Value (computer science)|value]] stored at a specific location. Therefore, it is possible for computer programs to operate on other computer programs, by manipulating their programmatic data.
 
To store data [[byte]]s in a file, they have to be [[Serialization|serialized]] in a [[file format]]. Typically, programs are stored in special file types, different from those used for other data. [[Executable file]]s contain programs; all other files are [[data file]]s. However, executable files may also contain data used by the program which is built into the program. In particular, some executable files have a [[data segment]], which nominally contains constants and initial values for variables, both of which can be considered data.
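Serialization into a fixed binary layout can be sketched with Python's standard <code>struct</code> module. The record layout used here (a little-endian 16-bit identifier followed by a 64-bit float) is a made-up file format for illustration.

```python
import struct

# Hypothetical record format: uint16 sensor id, float64 value, little-endian.
RECORD = struct.Struct("<Hd")


def serialize(records):
    """Turn (sensor_id, value) pairs into a flat byte string."""
    return b"".join(RECORD.pack(sensor_id, value) for sensor_id, value in records)


def deserialize(blob):
    """Recover the (sensor_id, value) pairs from the byte string."""
    return [RECORD.unpack_from(blob, offset)
            for offset in range(0, len(blob), RECORD.size)]


data = [(1, 21.5), (2, -3.25)]
blob = serialize(data)
print(len(blob), deserialize(blob) == data)
```

The resulting bytes could be written to a file as they are; a reader that knows the same format can reconstruct the original values.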
 
The line between program and data can become blurry. An [[interpreter (computing)|interpreter]], for example, is a program. The input data to an interpreter is itself a program, just not one expressed in native [[machine language]]. In many cases, the interpreted program will be a human-readable [[text file]], which is manipulated with a [[text editor]] program. [[Metaprogramming]] similarly involves programs manipulating other programs as data. Programs like [[compiler]]s, [[Linker (computing)|linker]]s, [[debugger]]s, program updaters, [[virus scanner]]s and such use other programs as their data.
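
The blurring can be seen in a short Python sketch in which a program is held as an ordinary text string and then handed to the interpreter. The variable names are illustrative only.

```python
# A program held as ordinary text data.
source = "result = sum(range(1, 11))"

# Translate the text into a code object, then run it in its own namespace.
code = compile(source, "<example>", "exec")
namespace = {}
exec(code, namespace)

print(namespace["result"])  # 55
```

Until <code>exec</code> is called, the string is manipulated exactly like any other text; afterwards it has acted as a program.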
 
For example, a [[user (computing)|user]] might first instruct the [[operating system]] to load a [[word processor]] program from one file, and then use the running program to open and edit a [[Document file format|document]] stored in another file. In this example, the document would be considered data. If the word processor also features a [[spell checker]], then the dictionary (word list) for the spell checker would also be considered data. The [[algorithm]]s used by the spell checker to suggest corrections would be either [[machine code]] data or text in some interpretable [[programming language]].
 
In an alternate usage, [[binary file]]s (which are not [[human-readable]]) are sometimes called ''data'' as distinguished from human-readable ''[[text file|text]]''.<ref>{{cite web|url=https://man.openbsd.org/file.1|title=file(1)|work=OpenBSD manual pages|date=24 December 2015|access-date=4 February 2018|url-status=live|archive-url=https://web.archive.org/web/20180205000843/https://man.openbsd.org/file.1|archive-date=5 February 2018}}</ref>
 
The total amount of digital data in 2007 was estimated to be 281 billion [[gigabyte]]s (281 [[exabyte]]s).<ref>{{cite news|author=Paul, Ryan|title=Study: amount of digital info > global storage capacity|url=https://arstechnica.com/news.ars/post/20080312-study-amount-of-digital-info-global-storage-capacity.html|date=12 March 2008|publisher=Ars Technica|access-date=13 March 2008|url-status=live|archive-url=https://web.archive.org/web/20080313111238/http://arstechnica.com/news.ars/post/20080312-study-amount-of-digital-info-global-storage-capacity.html|archive-date=13 March 2008}}</ref><ref>{{cite web|author=Gantz, John F.|title=The diverse and exploding digital universe|url=http://www.emc.com/leadership/digital-universe/expanding-digital-universe.htm|publisher=International Data Corporation via EMC|year=2008|access-date=12 March 2008|display-authors=etal |url-status=dead |archive-url=https://web.archive.org/web/20080311234210/http://www.emc.com/leadership/digital-universe/expanding-digital-universe.htm |archive-date=11 March 2008}}</ref>
 
=== Data keys and values, structures and persistence ===
Keys in data provide the context for values. Regardless of the structure of data, there is always a key component present. Keys in data and data-structures are essential for giving meaning to data values. Without a key that is directly or indirectly associated with a value, or collection of values in a structure, the values become meaningless and cease to be data. That is to say, there has to be a key component linked to a value component in order for it to be considered data.{{cn|date=August 2021}}
 
Data can be represented in computers in multiple ways, as per the following examples:
====RAM====
[[Random access memory]] (RAM) holds data that the CPU has direct access to. A CPU may only manipulate data within its [[processor register]]s or memory. This is in contrast to data storage, where the CPU must direct the transfer of data between the storage device (disk, tape, etc.) and memory. RAM is a linear array of contiguous locations that a processor may read or write by providing an address for the operation. The processor may operate on any location in memory at any time in any order. In RAM the smallest element of data is the binary [[bit]]. The capabilities and limitations of accessing RAM are processor-specific. In general, [[Computer data storage|main memory]] is arranged as an array of [[Memory address|locations]] beginning at address 0 ([[hexadecimal]] 0). Each location usually stores 8 or 32 bits, depending on the [[computer architecture]].
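
The addressing model can be sketched in Python with a <code>bytearray</code> standing in for a small memory of byte-wide locations. This is a simplified model under that assumption, not a description of how any real processor accesses RAM.

```python
# Sixteen byte-wide locations, addresses 0 to 15, all initially zero.
memory = bytearray(16)

memory[0x0A] = 0x7F   # write one byte at address 10
memory[0x03] = 0x01   # locations may be accessed in any order

value = memory[0x0A]              # read back by providing the address
bit0 = (memory[0x03] >> 0) & 1    # a single bit is reached by shifting and masking
```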
 
====Keys====
Data keys need not be a direct hardware address in memory. [[Indirection|Indirect]], abstract and logical key codes can be stored in association with values to form a [[data structure]]. Data structures have predetermined [[Offset (computer science)|offsets]] (or links or paths) from the start of the structure, in which data values are stored. Therefore, the data key consists of the key to the structure plus the offset (or links or paths) into the structure. When such a structure is repeated, storing variations of the data values and the data keys within the same repeating structure, the result can be considered to resemble a [[Table (information)|table]], in which each element of the repeating structure is considered to be a column and each repetition of the structure is considered as a row of the table. In such an organization of data, the data key is usually a value in one (or a composite of the values in several) of the columns.
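
The sketch below, in Python, shows a repeating structure with fixed offsets and two ways of keying into it: by row position plus offset, and by the value in one column. The layout and the field contents are invented for illustration.

```python
import struct

# Hypothetical fixed layout: 4-byte id at offset 0, 8-byte name at offset 4,
# 4-byte age at offset 12, giving 16 bytes per row.
ROW = struct.Struct("<I8sI")

rows = [(1, b"Ada", 36), (2, b"Grace", 45)]
table = b"".join(ROW.pack(*r) for r in rows)

def field(row_number, offset, fmt):
    """Address a value by the key to the structure plus an offset into it."""
    return struct.unpack_from(fmt, table, row_number * ROW.size + offset)[0]

age_of_second = field(1, 12, "<I")

# A logical key: map the value in the id column to the row that holds it.
by_id = {ROW.unpack_from(table, i * ROW.size)[0]: i for i in range(len(rows))}
name = field(by_id[2], 4, "<8s").rstrip(b"\0")
```

Here <code>by_id</code> plays the role of the column-valued key, while the row number and offset correspond to the structural address.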
 
====Organised recurring data structures====
The [[Table (information)|tabular]] view of repeating data structures is only one of many possibilities. Repeating data structures can be organised [[hierarchically]], such that nodes are linked to each other in a cascade of parent-child relationships. Values and potentially more complex data structures are linked to the nodes. Thus the nodal hierarchy provides the key for addressing the data structures associated with the nodes. This representation can be thought of as an [[inverted tree]]. The [[file system]]s of modern operating systems are a common example; [[XML]] is another.
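
A hierarchical key can be sketched in Python with nested dictionaries, where the path through the nodes addresses a value much as a file path does. The tree contents and the <code>lookup</code> helper are hypothetical.

```python
# Each dictionary is a node; its keys name the child nodes.
tree = {
    "home": {
        "alice": {"notes.txt": "buy milk"},
        "bob": {},
    },
    "etc": {"hosts": "127.0.0.1 localhost"},
}

def lookup(root, path):
    """Follow a slash-separated path from the root, one child per step."""
    node = root
    for name in path.strip("/").split("/"):
        node = node[name]   # raises KeyError if the path does not exist
    return node

note = lookup(tree, "/home/alice/notes.txt")
```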
 
====Sorted or ordered data====
Data has some inherent features when it is [[Collation|sorted on a key]]. All the values for subsets of the key appear together. When passing sequentially through the sorted data, the point at which the key, or a subset of the key, changes is referred to in data processing circles as a break, or a [[control break]]. Sorting particularly facilitates the aggregation of data values on subsets of a key.
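
As an illustration, the Python sketch below sorts records on a key and then totals the values for each key in a single sequential pass; <code>itertools.groupby</code> starts a new group wherever the key changes, which corresponds to the break. The sample data is invented.

```python
from itertools import groupby
from operator import itemgetter

sales = [("north", 10), ("south", 5), ("north", 7), ("south", 1), ("east", 4)]

# Sorting on the key brings all records with the same key together.
sales.sort(key=itemgetter(0))

# One sequential pass: a new group begins at each change of key.
totals = [(region, sum(amount for _, amount in group))
          for region, group in groupby(sales, key=itemgetter(0))]

print(totals)  # [('east', 4), ('north', 17), ('south', 6)]
```

Without the preceding sort, the same key could appear in several separate groups and the totals would be split.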
 
====Peripheral storage====
Until the advent of bulk [[non-volatile memory]] like [[Flash memory|flash]], persistent data storage was traditionally achieved by writing the data to [[Peripheral|external block devices like magnetic tape and disk drives]]. These devices typically seek to a location on the magnetic media and then read or write [[Block (data storage)|blocks of data]] of a predetermined size. In this case, the seek location on the media is the data key and the blocks are the data values. Early ''raw disk'' file systems or disc operating systems reserved [[Fragmentation (computing)|contiguous]] blocks on the disc drive for [[data file]]s. In those systems, a file could fill up, running out of data space before all the data had been written to it. Thus much unused data space was reserved unproductively to ensure adequate free space for each file. Later file systems introduced [[Partition type|partitions]]. They reserved blocks of disc data space for partitions and used the allocated blocks more economically, by dynamically assigning blocks of a partition to a file as needed. To achieve this, the file system had to keep track of which blocks were used or unused by data files in a catalog or file allocation table. Though this made better use of the disc data space, it resulted in fragmentation of files across the disc, and a concomitant performance overhead due to the additional seek time needed to read the data. Modern file systems reorganize fragmented files dynamically to optimize file access times. Further developments in file systems resulted in [[virtualization]] of disc drives, in which a logical drive can be defined as partitions from a number of physical drives.
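
Dynamic block assignment and the fragmentation it causes can be sketched with a toy model in Python. The block size, the first-free allocation policy and all names are assumptions for illustration and do not correspond to any particular file system.

```python
BLOCK_SIZE = 4          # bytes per block, kept tiny for illustration
disk = [None] * 8       # eight blocks; the block number is the data key
catalog = {}            # file name -> list of block numbers, in order

def write_file(name, data):
    """Store data in the first free blocks and record them in the catalog."""
    blocks = []
    for start in range(0, len(data), BLOCK_SIZE):
        free = disk.index(None)          # raises ValueError if the disk is full
        disk[free] = data[start:start + BLOCK_SIZE]
        blocks.append(free)
    catalog[name] = blocks

def read_file(name):
    """Reassemble a file by visiting its blocks in catalog order."""
    return b"".join(disk[b] for b in catalog[name])

def delete_file(name):
    """Mark the blocks of a file as unused."""
    for b in catalog.pop(name):
        disk[b] = None

write_file("a", b"AAAAAAAA")        # occupies blocks 0 and 1
write_file("b", b"BBBB")            # occupies block 2
delete_file("a")                    # frees blocks 0 and 1
write_file("c", b"CCCCCCCCCCCC")    # occupies blocks 0, 1 and 3
```

File <code>c</code> ends up split around file <code>b</code>, so reading it requires an extra seek past block 2.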
 
====Indexed data====
Retrieving a small subset of data from a much larger set may imply inefficiently searching through the data sequentially. '''[[Database index|Index]]es''' are a way to copy out keys and location addresses from data structures in files, tables and data sets, then organize them using [[Tree (data structure)|inverted tree structures]] to reduce the time taken to retrieve a subset of the original data. In order to do this, the key of the subset of data to be retrieved must be known before retrieval begins. The most popular indexes are the [[B-tree]] and the dynamic [[Hash function|hash]] key indexing methods. Indexing is overhead for filing and retrieving data. There are other ways of organizing indexes, e.g. sorting the keys and using a [[binary search algorithm]].
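
The sorted-key alternative mentioned above can be sketched in Python: the index copies out each key with the position of its record, keeps the pairs sorted, and is searched with a binary search from the standard <code>bisect</code> module. The records and the <code>find</code> helper are invented for illustration; this is not a B-tree or hash index.

```python
from bisect import bisect_left

# The underlying data, in arbitrary order.
records = [("pear", 3), ("apple", 7), ("fig", 1), ("kiwi", 9)]

# The index holds (key, position) pairs sorted by key.
index = sorted((key, pos) for pos, (key, _) in enumerate(records))
keys = [k for k, _ in index]

def find(key):
    """Binary-search the index, then fetch the record it points to."""
    i = bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return records[index[i][1]]
    return None

hit = find("kiwi")
miss = find("plum")
```

The index must be rebuilt or updated whenever records are added or removed, which is the overhead the text refers to.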
 
====Abstraction and indirection====
[[Object-oriented programming]] uses two basic concepts for understanding data and software:
# The taxonomic rank-structure of ''[[Class (programming)|classes]]'', which is an example of a hierarchical data structure; and
# at run time, the creation of references to in-memory data-structures of objects that have been [[Instance (computer science)|instantiated]] from a [[class library]].
It is only after instantiation that an object of a specified class exists. After an object's reference is cleared, the object also ceases to exist. The memory locations where the object's data was stored are [[Garbage collection (computer science)|garbage]] and are reclassified as unused memory available for reuse.
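
The object lifetime described above can be observed in Python with a weak reference, which does not keep its target alive. The class <code>Reading</code> is hypothetical, and the timing of reclamation is an implementation detail: the sketch assumes CPython, where an object is freed as soon as its last strong reference is cleared.

```python
import gc
import weakref

class Reading:                      # hypothetical class for illustration
    def __init__(self, value):
        self.value = value

obj = Reading(21.5)                 # instantiation: the object now exists
probe = weakref.ref(obj)            # observes the object without owning it
alive_before = probe() is not None

obj = None                          # clear the only strong reference
gc.collect()                        # not needed on CPython; requests collection elsewhere
alive_after = probe() is not None
```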

====Database data====
The advent of [[database]]s introduced a further [[layer of abstraction]] for persistent data storage. Databases use [[metadata]] and a [[structured query language]] protocol between [[Client–server model|client and server]] systems, communicating over a [[computer network]], and use a [[two phase commit]] logging system to ensure [[Database transaction|transactional]] completeness when saving data.
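
Transactional completeness can be illustrated with Python's standard <code>sqlite3</code> module. SQLite is an embedded database, so this sketch involves no client–server communication and no two-phase commit; it shows only that a failed transaction leaves no partial changes. The table and the <code>transfer</code> function are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES ('alice', 100), ('bob', 0)")
conn.commit()

def transfer(src, dst, amount):
    """Move funds in one transaction; either both updates apply or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer("alice", "bob", 30)       # succeeds
failed = transfer("alice", "bob", 500)  # violates the CHECK and is rolled back
balances = dict(conn.execute("SELECT name, balance FROM account"))
```

In the second call the credit to <code>bob</code> is applied first and then undone when the debit fails, so the stored balances reflect only the first transfer.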
 
====Parallel distributed data processing====
Modern scalable and high-performance data persistence technologies, such as [[Apache Hadoop]], rely on massively parallel distributed data processing across many commodity computers on a high bandwidth network. In such systems, the data is distributed across multiple computers and therefore any particular computer in the system must be represented in the key of the data, either directly, or indirectly. This enables the differentiation between two identical sets of data, each being processed on a different computer at the same time.
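
The role of the computer in the key can be sketched in plain Python without any distributed framework. The node names and partition contents are invented, and the example does not use the Hadoop API.

```python
from collections import Counter

# Hypothetical cluster: two nodes that happen to hold identical data.
partitions = {
    "node-1": ["a", "b", "a"],
    "node-2": ["a", "b", "a"],
}

# Each node counts its own partition; the node is part of the key,
# so the two identical sets of results remain distinguishable.
partial = {(node, word): n
           for node, words in partitions.items()
           for word, n in Counter(words).items()}

# Combining the results drops the node from the key.
totals = Counter()
for (node, word), n in partial.items():
    totals[word] += n
```

If the node were omitted from the key, the second node's partial counts would overwrite the first's instead of being added to them.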


== Properties of digital information ==
* [[Analog-to-digital converter]]
* [[Barker code]]
* [[Big data]]
* [[Binary number]]
* [[Comparison of analog and digital recording]]
* [[Computer data storage]]
* [[Data (computer science)]]
* [[Data]]
* [[Data dictionary]]
* [[Data modeling]]
* [[Data remanence]]
* [[Data stream]]
* [[Data set]]
* [[Database index]]
* [[Digital architecture]]
* [[Digital art]]
* [[Digital-to-analog converter]]
* [[Internet forum]]
* [[State (computer science)]]
* [[Tuple]]
{{div col end}}


* Tocci, R. 2006. Digital Systems: Principles and Applications (10th Edition). Prentice Hall. {{ISBN|0-13-172579-3}}


{{data}}
{{Digital systems}}
{{Authority control}}

{{DEFAULTSORT:Digital data}}
[[Category:Computer data]]
[[Category:Consumer electronics]]
[[Category:Digital media]]
[[Category:Digital systems]]
[[Category:Digital technology]]