
2.3 Extracting Information from Data

Programs can be used to process data, which allows users to discover information and create new knowledge.
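As a small illustration of a program turning raw data into information, the sketch below computes one fact (an average) and one pattern (a correlation) from a handful of records. The records, the variable names, and the `pearson` helper are all hypothetical and chosen only for this example.

```python
from math import sqrt

# Hypothetical records: (hours_studied, exam_score) for six students.
records = [(1, 52), (2, 58), (3, 65), (4, 70), (5, 74), (6, 83)]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


hours = [h for h, _ in records]
scores = [s for _, s in records]

average_score = mean(scores)   # a fact extracted from the data
r = pearson(hours, scores)     # a pattern: how the two variables move together

print(f"average score: {average_score:.1f}")
print(f"correlation:   {r:.2f}")
```

A coefficient close to 1 says only that the two columns rise together in this sample. It does not show that studying causes higher scores; other factors could explain the pattern, and further research would be needed to establish the nature of the relationship.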

Core Points

  1. Information is the collection of facts and patterns extracted from data.
  2. Data provide opportunities for identifying trends, making connections, and addressing problems.
  3. Digitally processed data may show a correlation between variables. A correlation found in data does not necessarily indicate that a causal relationship exists. Additional research is needed to understand the exact nature of the relationship.
  4. Often, a single source does not contain the data needed to draw a conclusion. It may be necessary to combine data from a variety of sources to formulate a conclusion.
  5. Metadata are data about data. For example, the piece of data may be an image, while the metadata may include the date of creation or the file size of the image.
  6. Changes and deletions made to metadata do not change the primary data.
  7. Metadata are used for finding, organizing, and managing information.
  8. Metadata can increase the effective use of data or data sets by providing additional information.
  9. Metadata allow data to be structured and organized.
  10. The ability to process data depends on the capabilities of the users and their tools.
  11. Data sets pose challenges regardless of size, such as:
    • the need to clean data
    • incomplete data
    • invalid data
    • the need to combine data sources
  12. Depending on how data were collected, they may not be uniform. For example, if users enter data into an open field, the way they choose to abbreviate, spell, or capitalize something may vary from user to user.
  13. Cleaning data is a process that makes the data uniform without changing their meaning (e.g., replacing all equivalent abbreviations, spellings, and capitalizations with the same word).
  14. Problems of bias are often created by the type or source of data being collected. Bias is not eliminated by simply collecting more data.
  15. The size of a data set affects the amount of information that can be extracted from it.
  16. Large data sets are difficult to process using a single computer and may require parallel systems.
  17. Scalability of systems is an important consideration when working with data sets, as the computational capacity of a system affects how data sets can be processed and stored.
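The cleaning process described in points 12 and 13 can be sketched as follows. This is a minimal example, assuming a hypothetical open text field where users typed a U.S. state; the `raw_entries` list and the `CANONICAL` lookup table are invented for illustration, and a real data set would need a much larger table.

```python
from collections import Counter

# Hypothetical raw entries typed by different users into an open "state" field.
raw_entries = ["CA", "ca", "Calif.", "California", " california ",
               "NY", "new york", "N.Y."]

# Hypothetical lookup table: each known variant maps to one canonical spelling.
CANONICAL = {
    "ca": "California",
    "calif": "California",
    "california": "California",
    "ny": "New York",
    "new york": "New York",
}


def clean(entry):
    """Return the canonical form of an entry, or None if it is not recognized."""
    # Normalize whitespace, capitalization, and periods before the lookup.
    key = entry.strip().lower().replace(".", "")
    return CANONICAL.get(key)


cleaned = [clean(e) for e in raw_entries]
counts = Counter(cleaned)

print(cleaned)
print(counts)
```

Before cleaning, a program counting entries would treat "CA", "Calif.", and "california" as three different values. After cleaning, equivalent entries share one spelling, so they can be counted together without their meaning having changed. Returning `None` for unrecognized input is one possible design choice: it flags invalid entries for review instead of guessing.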

Student Activities

  1. Describe what information can be extracted from data.
  2. Describe what information can be extracted from metadata.
  3. Identify the challenges associated with processing data.
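For the second activity, the sketch below shows one way metadata can be used to find and organize files, and why editing metadata leaves the primary data untouched. The `ImageFile` class, the file names, the dates, and the byte values are all hypothetical; real image formats store metadata in their own ways.

```python
from dataclasses import dataclass, field


@dataclass
class ImageFile:
    pixels: bytes                                  # the primary data
    metadata: dict = field(default_factory=dict)   # data about the data


# Hypothetical photo collection with made-up contents and metadata.
photos = [
    ImageFile(b"\x00\x01\x02", {"name": "beach.png", "created": "2023-07-04", "size_bytes": 3}),
    ImageFile(b"\x10\x11", {"name": "city.png", "created": "2022-01-15", "size_bytes": 2}),
    ImageFile(b"\x20\x21\x22\x23", {"name": "forest.png", "created": "2023-03-09", "size_bytes": 4}),
]

# Finding: select photos created in 2023 by looking only at the metadata.
from_2023 = [p.metadata["name"] for p in photos
             if p.metadata["created"].startswith("2023")]

# Organizing: sort by creation date (ISO-formatted dates sort correctly as strings).
by_date = [p.metadata["name"]
           for p in sorted(photos, key=lambda p: p.metadata["created"])]

# Changing or deleting metadata does not alter the primary data.
before = photos[0].pixels
photos[0].metadata["name"] = "vacation.png"
del photos[0].metadata["created"]
unchanged = photos[0].pixels == before

print(from_2023)
print(by_date)
print(unchanged)
```

Neither the search nor the sort ever reads `pixels`, which is the point: the metadata alone answers questions such as when a file was created or how large it is.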

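Key Vocabulary

  1. information
  2. pattern
  3. extract (to draw out, to distill)
  4. correlation (a mutual relationship)
  5. causal relationship (cause and effect)
  6. formulate (to devise, to plan)
  7. metadata (data about data)
  8. capability (ability)
  9. invalid (not valid)
  10. abbreviate (to shorten)
  11. equivalent (equal, having the same effect)
  12. parallel (running concurrently; connected in parallel)
  13. scalability (extensibility, the ability to scale)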
