
Standards Framework for Arkansas Data Science

78 Standards in this Framework

Standard Description
1.1.1 Identify the key stages of a data science project lifecycle.
1.1.2 Identify key roles and their responsibilities in a data science team (e.g., business stakeholders, define objectives; data engineers, build pipelines; data scientists, develop models; domain experts, provide expertise).
1.1.3 Define and create project goals and deliverables (e.g., problem statements, success metrics, expected outcomes, final reports, summary presentations).
1.1.4 Create and manage project timelines (e.g., milestones, deadlines, task dependencies, resource allocation).
1.1.5 Create a student portfolio including completed data science projects, reports, and other student-driven accomplishments.
1.2.1 Collaborate in team-based projects (e.g., team discussions, maintaining project logs, following protocols, code review, documentation).
1.2.2 Communicate technical findings to non-technical audiences (e.g., using data visualizations, presenting key insights, explaining complex concepts).
1.2.3 Make data-driven decisions and recommendations by proposing solutions and evaluating alternatives.
1.3.1 Identify ethical considerations in data collection, storage, and usage (e.g., data privacy, bias, transparency, consent).
1.3.2 Demonstrate responsible data handling practices (e.g., protecting sensitive information, citing data sources, maintaining data integrity).
1.3.3 Report results responsibly (e.g., addressing limitations, acknowledging uncertainties, preventing misinterpretation).
2.1.1 Differentiate between discrete and continuous probability distributions.
2.1.2 Calculate probabilities using discrete distributions (e.g., Uniform, Binomial, Poisson).
2.1.3 Calculate probabilities using continuous distributions (e.g., Uniform, Normal, Student's t, Exponential).
2.1.4 Apply Bayes’ Theorem to calculate posterior probabilities.
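
As an illustrative sketch (not part of the framework), the calculations in 2.1.1-2.1.4 could be carried out in Python with the SciPy library; the distribution parameters and probabilities below are made up for demonstration.

    from scipy import stats

    # Discrete: P(X = 3) for a Binomial(n=10, p=0.5) random variable
    p_binom = stats.binom.pmf(3, n=10, p=0.5)

    # Discrete: P(X <= 2) for a Poisson distribution with mean 4
    p_poisson = stats.poisson.cdf(2, mu=4)

    # Continuous: P(X <= 1.96) for a standard Normal distribution
    p_normal = stats.norm.cdf(1.96)

    # Bayes' Theorem: P(A|B) = P(B|A) * P(A) / P(B), with made-up rates
    p_a, p_b_given_a, p_b_given_not_a = 0.01, 0.95, 0.05
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
    p_a_given_b = p_b_given_a * p_a / p_b

    print(p_binom, p_poisson, p_normal, p_a_given_b)
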
2.2.1 Calculate p-values using a programming library and interpret the significance of the results.
2.2.2 Perform hypothesis testing.
2.2.3 Identify and explain Type I and Type II Errors (e.g., false positives, false negatives).
2.2.4 Calculate and interpret confidence intervals.
2.2.5 Design and analyze experiments to compare outcomes (e.g., identifying control/treatment groups, selecting sample sizes, determining variables, implementing A/B tests).
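
One possible way to exercise 2.2.1-2.2.4 in Python, assuming NumPy and SciPy are available; the control and treatment groups here are simulated, not real data.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    control = rng.normal(loc=50, scale=5, size=40)    # simulated control group
    treatment = rng.normal(loc=53, scale=5, size=40)  # simulated treatment group

    # Two-sample t-test (e.g., for an A/B comparison); a p-value below 0.05
    # would be significant at alpha = 0.05
    t_stat, p_value = stats.ttest_ind(treatment, control)

    # 95% confidence interval for the treatment-group mean (t distribution)
    mean = treatment.mean()
    sem = stats.sem(treatment)
    ci_low, ci_high = stats.t.interval(0.95, df=len(treatment) - 1, loc=mean, scale=sem)

    print(f"p-value: {p_value:.4f}, 95% CI: ({ci_low:.2f}, {ci_high:.2f})")
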
2.3.1 Perform basic matrix operations including addition, subtraction and scalar multiplication.
2.3.2 Calculate dot products and interpret their geometric meaning.
2.3.3 Apply matrix transformations to data sets.
2.3.4 Compute and interpret distances between vectors.
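
A brief NumPy sketch of the vector and matrix operations named in 2.3.1-2.3.4, using small example arrays chosen purely for illustration.

    import numpy as np

    A = np.array([[1, 2], [3, 4]])
    B = np.array([[5, 6], [7, 8]])

    print(A + B)        # matrix addition
    print(A - B)        # matrix subtraction
    print(2 * A)        # scalar multiplication

    u = np.array([1, 0])
    v = np.array([1, 1])
    dot = np.dot(u, v)  # dot product; cos(angle) = dot / (|u| * |v|)
    angle = np.arccos(dot / (np.linalg.norm(u) * np.linalg.norm(v)))

    # Matrix transformation: rotate the points in X by 90 degrees
    R = np.array([[0, -1], [1, 0]])
    X = np.array([[1, 0], [0, 1], [1, 1]])
    rotated = X @ R.T

    # Euclidean distance between two vectors
    dist = np.linalg.norm(u - v)
    print(dot, np.degrees(angle), dist)
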
3.1.1 Create and manipulate (e.g., sort, filter, aggregate, reshape, merge, extract, clean, transform, subset) one-dimensional data structures for computational analysis (e.g., lists, arrays, series).
3.1.2 Create and manipulate (e.g., transpose, join, slice, pivot, reshape) two-dimensional data structures for organizing structured datasets (e.g., matrices, DataFrames).
3.1.3 Utilize operations (e.g., arithmetic, aggregations, transformations) across data structures based on analytical needs.
3.1.4 Apply indexing methods to select and filter data based on position, labels, and conditions.
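
An illustrative pandas sketch of the structures and indexing methods in 3.1.1-3.1.4, using a tiny made-up gradebook.

    import pandas as pd

    # One-dimensional: a Series, filtered and sorted
    scores = pd.Series([88, 95, 70, 64], index=["a", "b", "c", "d"])
    passing = scores[scores >= 70].sort_values()

    # Two-dimensional: a DataFrame, reshaped with pivot
    df = pd.DataFrame({"student": ["a", "a", "b", "b"],
                       "subject": ["math", "science", "math", "science"],
                       "grade": [90, 85, 78, 92]})
    wide = df.pivot(index="student", columns="subject", values="grade")

    # Indexing by label, by position, and by condition
    print(wide.loc["a", "math"])    # label-based
    print(wide.iloc[0, 0])          # position-based
    print(df[df["grade"] > 80])     # boolean condition
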
3.2.1 Import data into a DataFrame from common spreadsheet formats (e.g., CSV, XLSX).
3.2.2 Import data into a DataFrame directly from a database (e.g., using the SQLAlchemy library).
3.2.3 Import data into a DataFrame using web scraping libraries (e.g., Beautiful Soup, Selenium).
3.2.4 Import data into a DataFrame leveraging API requests (e.g., Requests, urllib).
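
A hedged sketch of the import paths in 3.2.1-3.2.4; the file name, connection string, table name, and URL below are placeholders rather than real sources, so the snippet only runs against data you supply.

    import pandas as pd
    import requests
    from sqlalchemy import create_engine

    # From a spreadsheet format (placeholder file name)
    df_csv = pd.read_csv("students.csv")

    # Directly from a database via SQLAlchemy (placeholder connection string and table)
    engine = create_engine("sqlite:///school.db")
    df_sql = pd.read_sql("SELECT * FROM grades", engine)

    # From an API response (placeholder URL, assumed to return a JSON list of records)
    response = requests.get("https://example.com/api/records")
    df_api = pd.DataFrame(response.json())

    # Web scraping (e.g., Beautiful Soup) typically parses response.text into tags
    # before the extracted rows are loaded into a DataFrame.
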
3.3.1 Convert between data types as needed for analysis (e.g., strings to numeric values, dates to timestamps, categorical to numeric encoding).
3.3.2 Convert between structures as needed for analysis (e.g., lists to arrays, arrays to data frames).
3.3.3 Standardize and clean text data (e.g., remove whitespace, correct typos, standardize formats).
3.3.4 Identify and remove duplicate or irrelevant rows/records.
3.3.5 Restructure columns/fields for analysis (e.g., splitting, combining, renaming, removing irrelevant data).
3.3.6 Apply masking operations to filter and select data.
3.3.7 Handle missing and invalid data values using appropriate methods (e.g., removal, imputation, interpolation).
3.3.8 Identify and handle outliers using statistical methods.
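
An illustrative pandas sketch of the cleaning steps in 3.3.1-3.3.8, applied to a small fabricated table.

    import pandas as pd

    df = pd.DataFrame({"name": [" Ann ", "Bob", "Bob", "Cy"],
                       "age": ["14", "15", "15", None],
                       "score": [88.0, 91.0, 91.0, 250.0]})

    df["name"] = df["name"].str.strip().str.title()   # standardize text formats
    df["age"] = pd.to_numeric(df["age"])              # convert strings to numbers
    df = df.drop_duplicates()                         # remove duplicate rows
    df["age"] = df["age"].fillna(df["age"].median())  # impute missing values

    # Masking with a boolean condition: keep rows within 3 standard deviations
    z = (df["score"] - df["score"].mean()) / df["score"].std()
    df = df[z.abs() <= 3]
    print(df)
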
3.4.1 Examine data structures using preview and summary methods (e.g., head, info, shape, describe).
3.4.2 Create new data frames by merging or joining two data frames.
3.4.3 Sort and group records based on conditions and/or attributes.
3.4.4 Create functions to synthesize features from existing variables (e.g., mathematical operations, scaling, normalization).
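
One possible pandas sketch of the exploration, merging, grouping, and feature-engineering steps in 3.4.1-3.4.4, again on made-up data.

    import pandas as pd

    grades = pd.DataFrame({"student": ["a", "b", "c"], "grade": [90, 78, 85]})
    info = pd.DataFrame({"student": ["a", "b", "c"], "period": [1, 1, 2]})

    print(grades.head())        # preview rows
    print(grades.describe())    # summary statistics
    grades.info()               # dtypes and memory usage

    combined = grades.merge(info, on="student")       # join two DataFrames
    by_period = combined.groupby("period")["grade"].mean().sort_values()

    # Feature engineering: min-max scale the grade column into [0, 1]
    combined["grade_scaled"] = ((combined["grade"] - combined["grade"].min())
                                / (combined["grade"].max() - combined["grade"].min()))
    print(by_period, combined, sep="\n")
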
4.1.1 Generate histograms and density plots to display data distributions.
4.1.2 Create box plots and violin plots to show data spread and quartiles.
4.1.3 Construct Q-Q plots to assess data normality.
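
An illustrative Matplotlib/SciPy sketch of the distribution plots in 4.1.1-4.1.3, drawn from a simulated Normal sample.

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy import stats

    data = np.random.default_rng(1).normal(size=500)  # simulated sample

    fig, axes = plt.subplots(1, 3, figsize=(12, 3))
    axes[0].hist(data, bins=30)                       # histogram of the distribution
    axes[1].boxplot(data)                             # box plot: spread and quartiles
    stats.probplot(data, dist="norm", plot=axes[2])   # Q-Q plot against a Normal
    axes[0].set_title("Histogram")
    axes[1].set_title("Box plot")
    plt.tight_layout()
    plt.show()
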
4.2.1 Generate scatter plots and pair plots to show relationships between variables.
4.2.2 Generate correlation heatmaps to display feature relationships.
4.2.3 Plot decision boundaries to visualize data separations.
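
A minimal seaborn sketch of the relationship plots in 4.2.1-4.2.2 on simulated data; decision-boundary plots (4.2.3) additionally require a fitted classifier and are omitted here.

    import numpy as np
    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(2)
    df = pd.DataFrame({"x": rng.normal(size=200)})
    df["y"] = 2 * df["x"] + rng.normal(size=200)   # correlated with x by design
    df["z"] = rng.normal(size=200)                 # independent noise

    sns.pairplot(df)                               # scatter plots for every pair of variables
    plt.figure()
    sns.heatmap(df.corr(), annot=True, cmap="coolwarm")  # correlation heatmap
    plt.show()
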
4.3.1 Generate bar charts and line plots to compare categorical data.
4.3.2 Create heat maps to display confusion matrices and tabular comparisons.
4.3.3 Plot ROC curves and precision-recall curves to evaluate classifications.
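
An illustrative scikit-learn/Matplotlib sketch of the classification plots in 4.3.2-4.3.3, using made-up labels and scores.

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.metrics import confusion_matrix, roc_curve, ConfusionMatrixDisplay

    y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])                    # made-up true labels
    y_score = np.array([0.1, 0.4, 0.8, 0.7, 0.6, 0.3, 0.9, 0.5])   # made-up model scores
    y_pred = (y_score >= 0.5).astype(int)

    # Confusion matrix displayed as a heat map
    ConfusionMatrixDisplay(confusion_matrix(y_true, y_pred)).plot()

    # ROC curve: false positive rate vs. true positive rate across thresholds
    fpr, tpr, _ = roc_curve(y_true, y_score)
    plt.figure()
    plt.plot(fpr, tpr, marker="o")
    plt.xlabel("False positive rate")
    plt.ylabel("True positive rate")
    plt.show()
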
4.4.1 Generate line plots to show trends over time.
4.4.2 Create residual plots to analyze prediction errors.
4.4.3 Plot moving averages and trend lines.
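
A short pandas/Matplotlib sketch of the time-series plots in 4.4.1 and 4.4.3, using a simulated daily series; a residual plot (4.4.2) follows the same pattern with model errors on the y-axis.

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt

    dates = pd.date_range("2024-01-01", periods=120, freq="D")
    values = np.cumsum(np.random.default_rng(3).normal(size=120)) + 50
    ts = pd.Series(values, index=dates)               # simulated daily series

    ts.plot(label="daily value")                      # trend over time
    ts.rolling(window=7).mean().plot(label="7-day moving average")
    plt.legend()
    plt.show()
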
4.5.1 Draw conclusions by interpreting statistical measures (e.g., p-values, confidence intervals, hypothesis test results).
4.5.2 Evaluate model performance using appropriate metrics and visualizations (e.g., R-squared, confusion matrix, residual plots).
4.5.3 Identify patterns, trends, and relationships in data visualizations (e.g., correlation strength, outliers, clusters).
4.5.4 Draw actionable insights from analysis results.
5.1.1 Describe the key characteristics of Big Data (e.g., Volume, Velocity, Variety, Veracity).
5.1.2 Identify real-world applications of Big Data across industries (e.g., healthcare, finance, retail, social media).
5.1.3 Analyze case studies of successful and unsuccessful Big Data implementations across industries (e.g., recommendation systems, fraud detection, predictive maintenance).
5.1.4 Identify common Big Data platforms and tools (e.g., Hadoop for distributed storage, Spark for data processing, Tableau for visualization, MongoDB for unstructured data).
5.2.1 Describe how organizations store structured and unstructured data.
5.2.2 Compare different types of data storage systems (e.g., data warehouse, data lakes, databases).
6.1.1 Contrast supervised and unsupervised learning.
6.1.2 Differentiate between classification and regression problems.
6.1.3 Evaluate model performance using appropriate metrics (e.g., accuracy, precision/recall, mean squared error, R-squared).
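
An illustrative scikit-learn sketch of the evaluation metrics in 6.1.3, computed on made-up predictions.

    from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                                 mean_squared_error, r2_score)

    # Classification metrics on made-up labels and predictions
    y_true_cls = [0, 1, 1, 0, 1]
    y_pred_cls = [0, 1, 0, 0, 1]
    print(accuracy_score(y_true_cls, y_pred_cls))
    print(precision_score(y_true_cls, y_pred_cls), recall_score(y_true_cls, y_pred_cls))

    # Regression metrics on made-up targets and predictions
    y_true_reg = [3.0, 5.0, 7.0]
    y_pred_reg = [2.5, 5.5, 6.5]
    print(mean_squared_error(y_true_reg, y_pred_reg), r2_score(y_true_reg, y_pred_reg))
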
6.2.1 Perform linear regression for prediction problems.
6.2.2 Perform multiple regression for prediction problems.
6.2.3 Perform logistic regression for classification tasks.
6.2.4 Implement Naive Bayes Classification using probability concepts.
6.2.5 Perform k-means clustering using distance metrics.
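
One possible scikit-learn sketch of the models in 6.2.1-6.2.5, fit on synthetic data; the feature and target constructions are arbitrary and only meant to give each model something to learn.

    import numpy as np
    from sklearn.linear_model import LinearRegression, LogisticRegression
    from sklearn.naive_bayes import GaussianNB
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(4)
    X = rng.normal(size=(100, 2))                               # two synthetic features
    y_reg = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(size=100)    # continuous target
    y_cls = (X[:, 0] + X[:, 1] > 0).astype(int)                 # binary target

    reg = LinearRegression().fit(X, y_reg)        # (multiple) linear regression
    clf = LogisticRegression().fit(X, y_cls)      # logistic regression classifier
    nb = GaussianNB().fit(X, y_cls)               # Naive Bayes classifier
    km = KMeans(n_clusters=2, n_init=10).fit(X)   # k-means clustering (Euclidean distance)

    print(reg.coef_, clf.score(X, y_cls), nb.score(X, y_cls), km.cluster_centers_)
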
6.3.1 Apply standard methods to split data into training and testing sets.
6.3.2 Apply cross-validation techniques (e.g., k-fold, leave-one-out, stratified k-fold).
6.3.3 Identify and address overfitting/underfitting.
6.3.4 Select appropriate models based on data characteristics and problem requirements.
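
An illustrative scikit-learn sketch of the validation workflow in 6.3.1-6.3.3, using a synthetic classification dataset; comparing the training, test, and cross-validation scores is one simple way to spot overfitting.

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split, cross_val_score, StratifiedKFold

    X, y = make_classification(n_samples=200, n_features=5, random_state=0)

    # Hold-out split: fit on the training set, evaluate on the unseen test set
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    test_accuracy = model.score(X_test, y_test)

    # Stratified k-fold cross-validation: average accuracy across 5 folds
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    cv_scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)

    # A large gap between training and test/CV scores suggests overfitting
    print(model.score(X_train, y_train), test_accuracy, cv_scores.mean())
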