Skip to contents

As Predictive Analytics Group’s (PAG) work becomes interwoven with other units, all parties need to learn more about each other’s lexicon and define a common lexicon to reduce confusion and allow efficient discussions.

Within PAG, we strive to define terms that are easy to understand and develop a lexicon that does not fall into the problem of technical jargon. Specifically, terms that obfuscate a concept make it hard to understand and avoid defining a term via mutual recursion; two concepts are defined in terms of each other, thus creating a loop. PAG does not shy away from using the appropriate terminology and works to educate others about the importance of using the terms correctly.

General Terms

  • Analysis: Detailed examination of an object’s elements or structure.
  • Research: Iterative, systematic investigation and study to establish facts and reach new conclusions.
  • Data: Collection of features (descriptors) about a group or event that includes dates, demographics, location, actions, and calculated values.
  • Information: Transformed data that assists in decision-making.
  • Model: Representation of an object used to explain the object and make predictions. Often, models are presented as an equation composed of features and constructed by an algorithm to represent an observation/outcome from a collection of samples.

Applicant Actions

  • Submit: Applied to MSU but not admitted.
  • Admit: Applied to MSU and admitted.
  • Deposit: Applied to MSU, admitted, and deposited.
  • Cancel: Applied to MSU, admitted, deposited, and cancelled deposit. The deposited applicant informs MSU that they are not enrolling in or attending classes at MSU before the first day of classes.

Note: The Predictive Analytics Group (PAG) within Enrollment and Academic Strategic Planning no longer uses the term melt to describe those cancelling their deposits to MSU. The term melt, while used by numerous institutions and offices, is ambiguous due to it being applied to those who withdraw during the quarter term of their first semester at an institution. For these reasons, PAG uses the term cancel for those who cancel their deposits and withdraw for those withdrawing on or after the first day of classes of their first term.

  • Withdraw: On or after the first day of class, an applicant who deposited and enrolled in classes at MSU informs MSU that they will no longer be a student at MSU. Depending on when the student withdraws from MSU, they may receive a refund.

Note: FERPA defines a student as: “Any individual with respect to whom the University maintains education records and has been in attendance at the University. Attendance is defined as enrollment or participation in a collegiate level, University-sponsored program or course, regardless of program level; full-time or part-time status; credit, degree, or certificate awarded; location; or mode of instruction. No student shall be required to waive their rights under FERPA as a condition of admission or for the receipt of any services or benefits.” Based on the FERPA definition, PAG views a student as one that “participat[es] in a collegiate level, University-sponsored program or course” and thus, cancellations on or after the first day of class is classified as a withdrawal.

Cohorts & Samples

  • Sample: An individual observation.
  • Cohort: Group of individual observations.
  • Training Set: A cohort (or subset) from a larger cohort used to train a model.
  • Test Set: A cohort (or subset) used to evaluate a model extracted from the larger cohort used to construct the model. This set is not included in the model training.
  • External Test Set: A new cohort, unrelated to training or test set, used to evaluate a model.
  • ____ of Interest: A cohort or subset of interest; abbrev _oI.

Modelling & Similarity

  • Descriptor: A measurement of a sample or a value derived from other measurements for a sample. Sometimes referred to as Feature.
  • Fingerprint: A collection of features (aka descriptors) used to determine the similarity between samples. The fingerprint can be applied to new samples for (dis)similarity calculations.
  • Prediction: Calculated outcome from a model that may or may not reflect reality.
  • Normalizing: Modifying a column of descriptors by subtracting the mean (average) from each value in the column, followed by dividing each value by the standard deviation. The resulting scaled and centered values are the number of standard deviations for the sample’s value from the mean of all values in the column. Sometimes referred to as Scaling & Centering.
  • Similarity: A numerical value indicating the amount of similarity (or dissimilarity) between a pair of samples. Depending on the similarity method used the returned value ranges between 0 and 1 (e.g., binary, Jaccard, and Tanimoto) or –1 and 1 (e.g., xxxx) or 0 and infinity (e.g., Euclidian and Manhattan distances).