After reading this week's article on EDW, I started reflecting on origin stories. This one seemed to provide a helpful service of linking to many research publications that must have had some impact, but I suspect there is more to the story. The article seems to function better as the second chapter of a dissertation than as an origin story. Practitioners need to be more self-aware of their work and its origins to be effective advocates.
Can we help? It might be too early to know, but what do you think have been the major contributions to the formation of the analytic movement in education? How did we get here? Interested in all perspectives and influences.
Baker, S.J.D., Yacef, K. (2009) The State of Educational Data Mining in 2009: A Review and Future Visions: http://www.educationaldatamining.org/JEDM/images/articles/vol1/issue1/JEDMVol1Issue1_BakerYacef.pdf
I liked this paper. Nice easy intro to the area. One thing I didn't get was in section 2, where Baker discusses the differences between EDM and DM in general. He mentions that EDM has to account for multi-level hierarchy. Any ideas what he is talking about here? Is it prerequisite relationships between concepts, or is it the hierarchical element found in the classroom in terms of students and teachers?
Good callout. When I read that part I assumed it was the nested nature of school settings, with students measured by teachers, in classes, in schools, in districts, etc. Many have found hierarchical linear models helpful here. These complexities in the measurement model are probably somewhat unique to education and need some additional work beyond most business applications.
That said, I suspect it's also true that analytic applications haven't focused enough on the conceptual relationships amongst their measures and items. I've been reading about Rasch analytic techniques and suspect their application across assignments/courses/programs would strengthen our measures of learning outcomes.
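To make the "nested" point above concrete, here is a minimal sketch of one common first check on multi-level data: the intraclass correlation (ICC), which estimates how much of the variance in scores sits between classes rather than between students within a class. The scores, class names, and function name are made up for illustration, and the one-way ANOVA estimator used here assumes equal class sizes. A high ICC is one signal that students in the same class are not independent observations, which is the situation hierarchical linear models are meant to handle.

```python
from statistics import mean


def intraclass_correlation(groups):
    """ICC(1) from a one-way ANOVA, assuming every group has the same size."""
    sizes = {len(scores) for scores in groups.values()}
    if len(sizes) != 1:
        raise ValueError("this sketch assumes equal class sizes")
    k = sizes.pop()            # students per class
    n_groups = len(groups)     # number of classes
    grand_mean = mean(s for scores in groups.values() for s in scores)

    # Between-class mean square: how far class means sit from the grand mean.
    ms_between = k * sum(
        (mean(scores) - grand_mean) ** 2 for scores in groups.values()
    ) / (n_groups - 1)

    # Within-class mean square: how far students sit from their own class mean.
    ms_within = sum(
        (s - mean(scores)) ** 2 for scores in groups.values() for s in scores
    ) / (n_groups * (k - 1))

    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical exam scores for students nested in three classes.
scores_by_class = {
    "class_a": [82, 85, 88, 84],
    "class_b": [61, 65, 58, 64],
    "class_c": [73, 70, 75, 74],
}

icc = intraclass_correlation(scores_by_class)
print(f"ICC = {icc:.2f}")
```

With real data the next step would usually be a proper multi-level model rather than this single summary number.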
It has been proposed that educational data mining methods are often different from standard data mining methods, due to the need to explicitly account for (and the opportunities to exploit) the multi-level hierarchy and non-independence in educational data [Baker in press].
which I agree is confusing. So I searched for the forward reference (Baker, in press) and found the following:
Educational data mining methods often differ from methods from the broader data mining
literature, in explicitly exploiting the multiple levels of meaningful hierarchy in educational data.
Methods from the psychometrics literature are often integrated with methods from the machine
learning and data mining literatures to achieve this goal.
For example, in mining data about how students choose to use educational software, it may be
worthwhile to simultaneously consider data at the keystroke level, answer level, session level,
student level, classroom level, and school level. Issues of time, sequence, and context also play
important roles in the study of educational data.
(via a preprint of his chapter)
Ah, the joys of sorting out semantics in a multi-disciplinary field ;)
I developed an interactive PowerPoint on learning analytics that might be a place for you to start (a limited number of words per screen, covering only the very basics).
If you are interested, you should be able to access and download it from: https://landing.athabascau.ca/pg/file/tanyael/read/43701/learning-analytics-oer
I have a proposal for a discussion group but I'll wait for the right moment to share with all my colleagues.
Finally, perhaps personal replies (like "good to see you") could be kept personal, since we already have so many emails to read. Am I being too harsh? Hope not...
Hopefully more people will be able to access this one. An inaccessible open resource is not really much good.
The functionality may not all be there in this version, but I am fairly inept with computers so this is the best I can do. (Thank goodness for development teams!)
If saving it as a different file type would be helpful, let me know (and I'll try; apparently even openness is easier in principle than in practice).
I've been peeved when classmates have just written "agreed!" or "hooray!" or "you're right!" as a response because it clutters my mailbox...then again, if there isn't a like button, there is no other way of letting people know that they've struck a chord.
George Siemens wrote,
Definitely... all these acronyms without previous full explanation are difficult to guess if you are not initiated. Take for instance Philip Goldstein's Academic Analytics 2005 paper:
trying to develop some familiarity with language and concepts.
Maybe the glossary function could be used.
You can check my 'review' on http://shazz-lak.blogspot.com/2011/01/gist-of-first-reading.html
Digesting slowly still...
The ECAR article was written, it seems to me, by managers for managers. Sometimes those managers have come from a teaching background and know what teachers/instructors/faculty (hereafter "teachers") might want to know, but CIOs don't often come from academic backgrounds (at least in my experience). The data that is pulled and analyzed isn't always the same data, and where you come from and what you use it for will influence your conclusions.
Now as far as focusing on the learning, in my experience not all faculty view more learner information as a good thing (as a matter of fact most seem to even view meaningful technology enhancement of their courses as additional work that they don't want to undertake). Part of it is really a culture change issue for existing faculty. The management side (affectionately known as the bean counters) have used data and analytics of various sorts for much longer for their business decisions, so it doesn't really surprise me that we see more info geared towards that side of the house.
I am very new to this area and feel that I have lots to learn from all these highly-qualified experts here.
What I can say about analytics in education is that from the very first week it has been causing me to change my perspective and broaden my understanding of what's happening around me.
I was aware that sites such as Amazon, YouTube and Gmail were using the content of my online activities to make recommendations or to show me adverts that I would be likely to be interested in. What I wasn't aware of before taking this course was that I could use this kind of information in an educational setting. I've been asking myself this question:
Why don't we analyse the information on our students' activities to determine what they really are interested in and recommend them further readings, books, films or even courses etc. that they would be motivated to follow?
I am not a developer myself, but I know that we could develop tools that help learners find out their areas of interest in education and make correct choices about what courses to take (and as far as I can see some people are working on it).
I also find it very creative to make use of students' online activity information to develop early warning systems so that they would have better academic success. So, I find what Fritz is doing very useful.
Thanks for making me open another eye...
Good question, I think we should. Seems like these analytic approaches are broadly useful and could mature into a helpful educational resource. I suspect learners that struggle with a particular concept also share some common identifiable attributes, experiences, or behaviors. Many great teachers already have some ideas about these commonalities, but this expertise is too often underutilized because we don't have a systematic analytic environment to detect them or to support educational actions.
For example, imagine if a physics faculty member knew the textbook a student's ninth grade science teacher used to first explain the scientific method. Could that knowledge reveal something that would be relevant to his instructional strategy? Potentially, and it seems like analytics could help scale such tailoring so that patterns of historical data are used to predict potential misconceptions and to recommend targeted/personalized instructional support. We have a long way to go, but several analytic competitors (such as Amazon and Netflix) have already illustrated the potential.
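As a rough illustration of the kind of recommendation discussed above, here is a minimal sketch of a "learners who used this also used that" recommender. It is a toy co-occurrence count, not how Amazon or Netflix actually work, and the learner names, resource names, and function name are all hypothetical.

```python
from collections import Counter


def recommend(target, history, top_n=2):
    """Suggest resources used by learners whose history overlaps with `target`."""
    seen = history[target]
    scores = Counter()
    for learner, items in history.items():
        if learner == target:
            continue
        overlap = len(seen & items)
        if overlap == 0:
            continue  # no shared resources, so no evidence of shared interest
        # Weight each unseen resource by how similar this learner is to the target.
        for item in items - seen:
            scores[item] += overlap
    return [item for item, _ in scores.most_common(top_n)]


# Hypothetical record of which resources each learner has used.
history = {
    "ana": {"kinematics_video", "vectors_reading"},
    "ben": {"kinematics_video", "vectors_reading", "forces_quiz"},
    "cho": {"vectors_reading", "forces_quiz", "energy_simulation"},
    "dee": {"poetry_podcast"},
}

suggestions = recommend("ana", history)
print(suggestions)
```

A real system would need far more data, and some thought about whether "popular among similar learners" is the same thing as "useful for this learner".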
Here are my few thoughts and reflections from the readings and lecture.
1. In the article 'The State of Educational Data Mining in 2009', Baker identifies access to data as a major roadblock to research and activity in learning analytics. Baker's reference to the Pittsburgh Science of Learning Center (PSLC) was extremely helpful, as I have had difficulty accessing data for a project that I have been trying to initiate for some time.
2. Although it is great to know that I can test models on PSLC's data set, it would also be helpful to understand how other institutions have overcome issues of data access. Is anyone willing to share the data governance policies that their institutions use to regulate the use of data in analytic projects? Can John Fritz comment on data governance at UMBC?
3. "I like" . . . "What is the educational responsibility of knowing". I'm wondering what others' reactions to or interpretations of this statement are. My thoughts drifted to wondering whether some roadblocks to data access are related in part to resource management issues/decisions. For example, if a "Signals-type" technology identifies 20% of our students as at risk, do we have the resources to help these students? If we don't . . . do we want to know? I guess I've been reading too many 9/11 conspiracy theory stories on the internet.
4. John Fritz's distinction between 'prediction vs intervention' was interesting. I would think, though, that predictions (like those created by Signals) would be used in a similar fashion to the metrics provided at UMBC. They're all information that the students and faculty can look at to make decisions about next steps/actions. I don't see these two approaches as either/or.
5. In John Fritz's talk, he mentions that Macfadyen and Dawson concluded that "one size does not fit all" wrt analytics and interactive courses. John indicated that he has some issues with this and was going to explain them wrt the work being done at UMBC. I missed the explanation . . . did anyone else pick up on what the issues were?
Thanks for your post. Let me share a few comments (sorry for the length of this reply, but you raise good questions):
WRT . . .
#1 "Baker identifies access to data as a major roadblock to research and activity in learning analytics."
I couldn't agree more. But I would say that often the barrier is not policy, but mere ignorance of how to tunnel through or integrate systems, or even who the stewards are who can do so on a campus. In fact, they may not even realize the significance of the information they have found because it isn't connected.
For example, I recently "scored" a big win for our Dean of Undergraduate Education who'd been wanting final grade distribution reports for students who receive mid-term alerts through our Learning Resources Center (LRC). Each semester for the past 21 years, the LRC has asked faculty to identify freshman students in jeopardy of receiving a D or F if the semester were to end at mid-term. The LRC then contacts these students through our Freshmen Year Intervention (FYI). We know that 40 percent of FYI-alerted students go on to earn a final grade of C or higher, but there's never been a control group study because the dean didn't want to deny a perceived benefit of FYI to some students in order to create a control group.
Well, we recently added transfer students and renamed FYI the First Year Initiative. Since transfer students did not previously receive the FYI "treatment" we've been able to compare the un-treated transfers from previous years with current FYI transfers. While these are different populations, the results tend to confirm the 40 percent student success longitudinal data we've collected. But the key is that the person maintaining the student grades didn't know of the person who maintains the FYI alert data, which only recently was converted to an electronic process. I saw what they both had, pulled a few meetings together to see how we could correlate the data, and voilà. As an organization, we simply didn't know who had the "dots" of information, though connecting them was relatively trivial once we did.
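For readers wondering what a comparison like the one above might look like in practice, here is a minimal sketch of a two-proportion z-test using only the standard library. The counts are invented purely for illustration and are not UMBC's results; the function name is also hypothetical. Because the two groups come from different years rather than random assignment, a result like this would be suggestive rather than proof of a treatment effect.

```python
from math import erf, sqrt


def two_proportion_z(success_a, n_a, success_b, n_b):
    """Return (difference in rates, z statistic, two-sided p-value)."""
    p_a, p_b = success_a / n_a, success_b / n_b
    # Pooled rate under the null hypothesis that both groups share one rate.
    pooled = (success_a + success_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Standard normal CDF via the error function.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return p_a - p_b, z, 2 * (1 - cdf)


# Hypothetical counts: alerted students earning C or higher, with and without
# the intervention. These numbers are made up for the example.
diff, z, p_value = two_proportion_z(success_a=80, n_a=200, success_b=60, n_b=200)
print(f"difference = {diff:.2f}, z = {z:.2f}, p = {p_value:.3f}")
```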
#2 "Can John Fritz comment on data governance at UMBC?"
There is a recently formed data management council at UMBC, but they have not weighed in on what I'm doing, nor have they really issued any policies or guidelines.
To be honest, the biggest obstacle we faced with mining the Blackboard data was the perception that it would violate our Bb license. I pushed Bb on this and found that it was more that they wouldn't support you--and with good reason. When we first started querying our production db, we crashed it twice. We were able to bring it back up, but I can imagine some other schools (clients) might not be able to, and Bb didn't want to support that mess.
That's when we came up with the idea of cloning production and reporting against that. We ran it by Bb and they had no problem with this, and in fact they grew interested because they saw what we were trying to do. We mutually agreed that UMBC would keep doing this on our own, but that we'd keep in touch with each other. Several Bb staff were helpful, particularly George Kroner, Deb Everhart and Volker Kleinschmidt, who had formerly been the Bb sysadmin at Univ. of Illinois at Chicago. This was one of those cases where it worked out well for both of us: we could be more nimble than Bb and do what worked for us, but at key stages they were able to answer questions that helped us move forward. They also connected us with other schools who were interested in this, which was great.
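The "clone production and report against the clone" idea described above can be sketched in a few lines. This example uses SQLite's backup API from the Python standard library purely as a stand-in; it is not how UMBC or Blackboard did it, and the table and column names are hypothetical. The point is only that reporting queries run against a snapshot, so they cannot slow down or crash the live database.

```python
import sqlite3

# Stand-in for the live LMS database (hypothetical schema).
production = sqlite3.connect(":memory:")
production.execute(
    "CREATE TABLE activity (student TEXT, course TEXT, hits INTEGER)"
)
production.executemany(
    "INSERT INTO activity VALUES (?, ?, ?)",
    [("s1", "BIOL101", 40), ("s2", "BIOL101", 15), ("s1", "CHEM101", 12)],
)
production.commit()

# Take a snapshot into a separate reporting database.
reporting = sqlite3.connect(":memory:")
production.backup(reporting)

# Production keeps changing; the reporting copy is unaffected until the
# next snapshot is taken.
production.execute("INSERT INTO activity VALUES ('s3', 'BIOL101', 99)")
production.commit()

# Heavy reporting queries run only against the clone.
rows = reporting.execute(
    "SELECT course, SUM(hits) FROM activity GROUP BY course ORDER BY course"
).fetchall()
print(rows)
```

The trade-off is freshness: reports are only as current as the last clone.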
#3 "What is the educational responsibility of knowing?"
Actually, this question was first raised by John Campbell at Purdue and it was "What is an institution's ETHICAL obligation of knowing?" (my emphasis added). This was totally about student privacy and big brother concerns, which John was and is right to wonder about. Both his 2007 Educause Quarterly article and the Chronicle of Higher Ed article the following year report that some students have raised questions about being tracked without their knowledge.
This was a key point of departure for me in the intervention we developed in our Check My Activity (CMA) tool: I decided I'd rather run the risk of students under-responding to the data we present to them about their own activity than have them resent us for making predictions based on data they didn't know we were collecting--even if our data models were right. I later saw theoretical justification for this in the self-efficacy work of Bandura, and Tinto's own comments that students bear some responsibility for their own learning. I don't mean this as a critique of what Purdue has done at all. But in addition to not having the resources they did to design a "brute force" data mining solution that connected student activity data and academics/demographics, I simply benefited from the fact that they were first and could decide to put more onus on students to become engaged.
#4 "I don't see these two approaches as either/or."
Me either. See #3 above.
But I would just say this: I think it's a LOT easier to answer a request for help than it is to try to solicit one from someone who might not know they need it. The only substantive difference between us and Purdue is who initiates the conversation that logically ensues from what the data shows: In our case, I think it's the students. In Purdue's case, it's the institution (or the instructor as an agent of the institution). I've talked a lot about this with both John and Kim Arnold, his colleague who seems to be doing the follow up since John took over research computing. The one issue they've had is lack of consistency on the part of faculty for when to issue a Signals alert to students. How far does a student sink in a course before the prof issues an alert? And by definition, the prof is not able to see the totality of a student's success in all courses (like the student can) so there may be broader "success" issues that aren't triggering an alert. But that's just my assumption based on how (I think) Signals was designed. So, I could be wrong about that.
#5 "did anyone else pick up on what the issues [were with Macfadyen and Dawson's "one size doesn't fit all"]?"
Sorry, I ran out of time in the presentation, and it's a bit complicated, but spelled out in my article. Basically, the lack of consistent effective practice about faculty alerting that Purdue has discovered (#4 above) is what's driving my concern. Both Purdue and Macfadyen and Dawson rightly conclude that variety in faculty course design or "pedagogical intent" makes it difficult to design the highly interactive courses that they both conclude ARE predictive of student success in an LMS.
But I think shining light on how students learn--and that they actively do so in courses designed to engage them with each other as well as the prof--could have a transformative effect on instructor pedagogy. Or at least it should.
To help, I've considered showing students the overall Bb activity ranking of their courses we display on our CMA dashboard. This way, they could avoid being too proud or despondent of their high or low "activity" in a course that is not very active overall itself. In turn, I could imagine students going to profs of non-interactive courses to say "I learn better in courses that are more interactive because they generate data points I can compare myself against with peers."
My one concern with this is that some faculty might then think I'm ranking their courses on quality, which I don't have the authority to do. But since students could look up this information on their own now--through our publicly available Bb reports at www.umbc.edu/blackboard/reports --all I would be doing is making this easier to see in their private dashboard, which then informs student insight even more.
I think there is a symbiotic relationship between students and faculty around the grade book, and the context that activity-based, personal data mining reports can provide. Over time, my hope is that evidence-based approaches to facilitating student self-awareness might eventually influence faculty pedagogy and course design. As such, I think the LMS could be a lever for doing so, and eventually be AN essential article of intervention clothing that everyone wears, even if "one size doesn't fit all."
P.S. Unfortunately, Elsevier requires a subscription to view my article, but Educause Quarterly has a video demo of the CMA in its current issue on student retention.
Thank-you for such a thorough response to my posting/questions. Your comments, ideas and examples have given me a lot to think about.
I've read the section (5.4) in John Campbell's thesis relating to the "Obligation of Knowing" and I think that although he mentions privacy issues related to the use of predictive models, the overall emphasis of this section is the need to develop a clear understanding of the shared responsibilities the institution, faculty and students have in the learning process in order to help guide how analytics is introduced and implemented. This helps me to understand the distinction that you make between focussing on intervention over prediction. It's all about how we can best use these tools given our perspective on teaching and learning and the resources that we have available. I am now even more interested in how you have grounded your work in the self-efficacy work of Bandura!
The examples of success that you have shared have been great (Thanks for the link to your article). Like Maryland, our institution is very concerned about privacy issues and your article has got me thinking about ways that we can incorporate analytics that may be more acceptable/appropriate for our institution.
LMSs obviously hold great promise, but the weakness is being a closed system. Yet this is adding nothing new to the discussion. I do know in my own experience, however, both as a student and an instructor, that a mounting resentment grows with lack of options outside of a closed system, particularly given the limitations of most LMSs, even good ones. This is, to me, one more reason for supporting the notion of openness.
Reading the material did, however, make me wonder about the commoditization of data. If a student is in a constant state of monitoring their analytics, does it devalue the feedback in any way? Is the learning subjugated by the desire and compulsion to simply rack up whatever data happens to become the unit of currency? Is this a good thing? I don't necessarily have a lot of answers to these questions but they definitely began to entrench themselves in my head.
Lastly, I keep wondering about the limits of what sense machines can make of all the data. The human element seems to remain central to turning the data into information, and we can only keep up so much. There will always be more data than we as humans can possibly sort, analyze, and make meaning of. So we essentially become our own bottlenecks, but I am not sure that there is any way around that exactly.
Sorry for this long reply to your post, but you raise an understandable question and concern about the point of learning analytics.
Personally, I DO believe that activity is an indicator (not a cause) of engagement, but as I've presented my findings for nearly three years now, I'm often met with disbelief and skepticism. People just don't want to believe that learning is reduced to hits. Actually, that's not what I'm saying at all, but that's the cliff I often get pushed over, despite my best efforts to resist. To be clear, I'm not interested in how Bb makes good students, but I am interested in how good students use Bb, and what effect (if any) sharing this information with all students has on their awareness, motivation and ultimately their behavior (performance).
And yet, if I were to invoke your "well established truth" that attendance in a face-to-face course is predictive of success, nobody would even bat an eye. It's so self-evident it's beyond debate. Why then do people take such a dim view of this phenomenon in online learning? To be sure, some think I'm suggesting the instructor is irrelevant and that all students have to do is interact with a computer. Nothing could be further from the truth of my findings or my intent. Interaction is not hitting the submit button a hundred times. This is simply the record of having something to say--and the act of submitting it.
As I tried to imply in the title of my article, unlike a traditional F2F course, online learning creates "classroom walls that talk," by leaving a residue of engagement that can be observed, studied and reflected upon by all actors in a course: student, teacher and yes, even the administrators who make it possible for the two to meet. The traditional mystery, even sanctity, of the F2F classroom means that a lot goes on that is unobserved except by those who are in the room. I get it. But at a time when the effectiveness or efficiency of traditional learning is under increasing scrutiny, I don't think educational institutions can continue to say "Trust us, we know what we're doing, all will be fine." That may be, but when less than half of U.S. students who start college finish, it rings hollow. In fact, I think this dismal student success and completion rate is why open learning is gaining a foothold, which I applaud. It's time to think outside of the box.
To a certain point, I agree with Irwin DeVries' post before yours, about "the iterative relationship between tracking of data and the behaviour of the tracked." But I don't necessarily see this as a bad thing.
In our Check My Activity tool, students are the only ones who can see their overall activity in all courses. I suppose we could build an admin tool to look this up, but it would be a colossal waste of time, because there simply aren't enough UMBC administrators to act on what they may find in each student's record. That's the point of trying to put students in a position to be the first to know what their activity is, in the context of performance indicators by course peers, and putting the burden on them to interpret what it means for themselves and decide to act on it or ignore it. Some will, and some won't. But this is how I interpret John Campbell's ethical obligation of knowing: let's show them their own data, but then leave it up to them to act, which includes seeking or accepting help the institution can provide already.
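To illustrate the idea above of showing students their own activity "in the context of performance indicators by course peers", here is a minimal sketch. It is not the actual CMA implementation, which isn't described in detail here; the record layout, function name, and numbers are hypothetical. It simply averages peers' activity by the grade they earned and lets a student see where their own hit count falls.

```python
from collections import defaultdict
from statistics import mean


def activity_by_grade(records):
    """Average activity of course peers, grouped by the grade they earned."""
    hits = defaultdict(list)
    for rec in records:
        hits[rec["grade"]].append(rec["hits"])
    return {grade: mean(values) for grade, values in sorted(hits.items())}


# Hypothetical, anonymized peer records for one course.
peers = [
    {"grade": "A", "hits": 120}, {"grade": "A", "hits": 100},
    {"grade": "B", "hits": 80}, {"grade": "B", "hits": 70},
    {"grade": "D", "hits": 30}, {"grade": "D", "hits": 20},
]

summary = activity_by_grade(peers)
my_hits = 45  # the student's own activity, visible only to that student

# Which grade band's average activity is the student's own count closest to?
closest = min(summary, key=lambda grade: abs(summary[grade] - my_hits))
print(summary, closest)
```

Consistent with the post, nothing here predicts a grade or triggers an alert; it only gives the student a reference point to interpret for themselves.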
If they do, then my hope is that as faculty see how students can and do self-regulate their learning, there will be more incentive to design the types of interactive courses that encourage students to engage with the instructor, course concepts and course peers. Student-to-student interaction, particularly in forums like this or the blogging and commenting that are hallmarks of open learning environments, is where the engagement and activity shoot through the roof. All I want to do is give students a way to contextualize their own activity in this conversation, which I suppose some could say is just a data stream captured by a computer.
To me, the transformation of technology in education is the reflection it can create among students and teachers about their own behavior. Technology is just a tool, but it can be a powerful one when it creates an opportunity to rethink how to transfer the value of traditional learning into new environments. I think there really is something to Bandura's social cognitive theory of learning. We judge ourselves by what we think, feel, believe and know about ourselves compared with what we observe or believe to be true of others we wish to emulate. In this way, self-efficacy tools like the CMA might help students find themselves in the crowd. Maybe it seems like a black box, but that's the academic crowd-sourcing I'm trying to create. Learning analytics can be a lens or filter of critical self-reflection that may not be possible in other forms of learning that don't necessarily leave a data trail.
As a Software Engineer I like solutions that solve many problems at once (like TFT vs Cathode tube).
Having students act on their own data answers many questions positively:
1) The question of data privacy!
2) The question of large classes or MOOCs (facilitators just can't help everyone, even with great tools).
3) The question of student responsibility/empowerment (they have to be the ones to make an effort).
4) The question of tool choice (more or less integrated), since students will tend to opt for, and lobby their peers for, the ones that give them better visibility, control and motivation.
It certainly solves problems, and for the moment I do not see any problems that it raises.
Thanks a lot for your insights in that "long" reply, John.
I'm using Evernote as I go along in the week, and aggregating it all in one weekly post on my blog as I progress in this course.