11 References
Agresti, A. (2010). Analysis of ordinal categorical data (2nd ed.). Wiley.
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). Standards for educational and psychological testing. American Educational Research Association.
Bechard, S., Karvonen, M., & Erickson, K. (2021). Opportunities and challenges of applying cognitive process dimensions to map-based learning and alternate assessment. Frontiers in Education, 6, Article 653693. https://doi.org/10.3389/feduc.2021.653693
Bradshaw, L. (2016). Diagnostic classification models. In A. A. Rupp & J. Leighton (Eds.), The handbook of cognition and assessment: Frameworks, methodologies, and applications (1st ed., pp. 297–327). John Wiley & Sons. https://doi.org/10.1002/9781118956588.ch13
Chapelle, C. A. (2021). Argument-based validation in testing and assessment. SAGE Publications. https://doi.org/10.4135/9781071878811
Cicchetti, D. V., & Feinstein, A. R. (1990). High agreement but low kappa: II. Resolving the paradoxes. Journal of Clinical Epidemiology, 43(6), 551–558. https://doi.org/10.1016/0895-4356(90)90159-M
Clark, A., Kobrin, J. L., Karvonen, M., & Hirt, A. (2023). Teacher use of diagnostic score reports for instructional decision-making in the subsequent academic year. Practical Assessment, Research, and Evaluation, 28(1), Article 6. https://doi.org/10.7275/pare.1255
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16(3), 297–334. https://doi.org/10.1007/BF02310555
Dynamic Learning Maps Consortium. (2022). 2021–2022 Technical Manual—Instructionally Embedded Model. University of Kansas, Accessible Teaching, Learning, and Assessment Systems.
Dynamic Learning Maps Consortium. (2023). 2022–2023 Technical Manual Update—Instructionally Embedded Model. University of Kansas, Accessible Teaching, Learning, and Assessment Systems.
Dynamic Learning Maps Consortium. (2024). 2023–2024 Technical Manual Update—Instructionally Embedded Model. University of Kansas, Accessible Teaching, Learning, and Assessment Systems.
Dynamic Learning Maps Consortium. (2025a). 2024–2025 Technical Manual Update—Science. University of Kansas, Accessible Teaching, Learning, and Assessment Systems.
Dynamic Learning Maps Consortium. (2025b). Accessibility Manual 2024–2025. University of Kansas, Accessible Teaching, Learning, and Assessment Systems.
Dynamic Learning Maps Consortium. (2025c). Educator Portal User Guide. University of Kansas, Accessible Teaching, Learning, and Assessment Systems.
Dynamic Learning Maps Consortium. (2025d). Test Administration Manual 2024–2025. University of Kansas, Accessible Teaching, Learning, and Assessment Systems.
Erickson, K., & Karvonen, M. (2025). Access to the general education curriculum for students who communicate using only non-symbolic modes [Research synopsis]. University of Kansas, Accessible Teaching, Learning, and Assessment Systems. https://dynamiclearningmaps.org/sites/default/files/documents/publication/Non-Symbolic_Language_Synopsis_Update_62025.pdf#page=1.25
Feinstein, A. R., & Cicchetti, D. V. (1990). High agreement but low kappa: I. The problems of two paradoxes. Journal of Clinical Epidemiology, 43(6), 543–549. https://doi.org/10.1016/0895-4356(90)90158-L
Henson, R., & Douglas, J. (2005). Test construction for cognitive diagnosis. Applied Psychological Measurement, 29(4), 262–277. https://doi.org/10.1177/0146621604272623
Henson, R., Templin, J. L., & Willse, J. T. (2009). Defining a family of cognitive diagnosis models using log-linear models with latent variables. Psychometrika, 74(2), 191–210. https://doi.org/10.1007/s11336-008-9089-5
Jodoin, M. G., & Gierl, M. J. (2001). Evaluating Type I error and power rates using an effect size measure with the logistic regression procedure for DIF detection. Applied Measurement in Education, 14(4), 329–349. https://doi.org/10.1207/S15324818AME1404_2
Johnson, M. S., & Sinharay, S. (2018). Measures of agreement to assess attribute-level classification accuracy and consistency for cognitive diagnostic assessments. Journal of Educational Measurement, 55(4), 635–664. https://doi.org/10.1111/jedm.12196
Kane, M. (2013). Validating the interpretations and uses of test scores. Journal of Educational Measurement, 50(1), 1–73. https://doi.org/10.1111/jedm.12000
Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33(1), 159–174. https://doi.org/10.2307/2529310
McGraw, K. O., & Wong, S. P. (1992). A common language effect size statistic. Psychological Bulletin, 111(2), 361–365. https://doi.org/10.1037/0033-2909.111.2.361
Nitsch, C. (2013). Dynamic Learning Maps: The Arc parent focus groups. The Arc. https://dynamiclearningmaps.org/sites/default/files/documents/publication/TheArcParentFocusGroups.pdf
O’Leary, S., Lund, M., Ytre-Hauge, T. J., Holm, S. R., Naess, K., Dalland, L. N., & McPhail, S. M. (2014). Pitfalls in the use of kappa when interpreting agreement between multiple raters in reliability studies. Physiotherapy, 100(1), 27–35. https://doi.org/10.1016/j.physio.2013.08.002
Pontius, R. G., Jr., & Millones, M. (2011). Death to kappa: Birth of quantity disagreement and allocation disagreement for accuracy assessment. International Journal of Remote Sensing, 32(15), 4407–4429. https://doi.org/10.1080/01431161.2011.552923
Templin, J., & Bradshaw, L. (2013). Measuring the reliability of diagnostic classification model examinee estimates. Journal of Classification, 30(2), 251–275. https://doi.org/10.1007/s00357-013-9129-4
Thompson, W. J., Clark, A. K., & Nash, B. (2019). Measuring the reliability of diagnostic mastery classifications at multiple levels of reporting. Applied Measurement in Education, 32(4), 298–309. https://doi.org/10.1080/08957347.2019.1660345
Thompson, W. J., Nash, B., Clark, A. K., & Hoover, J. C. (2023). Using simulated retests to estimate the reliability of diagnostic assessment systems. Journal of Educational Measurement, 60(3), 455–475. https://doi.org/10.1111/jedm.12359
Wine, M., & Hoffman, A. (2021). Rigorous item feedback. AleDev Research & Consulting. https://static1.squarespace.com/static/61bc9f534dff2416f1f63492/t/64552da742e14c6c663536b6/1683303847762/Wine%2C+M.+%26+Hoffman%2C+A.+%282021%29.+Rigorous+Item+Feedback.+AleDev+Research.pdf
Zumbo, B. D., & Thomas, D. R. (1997). A measure of effect size for a model-based approach for studying DIF [Working Paper]. University of Northern British Columbia, Edgeworth Laboratory for Quantitative Behavioral Science.