My Machine Learning Learning Experience (Part 4): Exam and Final Project

May 7th, 2016 - May 13th, 2016

Finished exam and stupid project

My machine learning course at CUHK is finally coming to an end. The exam is today, and it's also my last exam at CUHK. Boy will I miss that. Just kidding, I'll never ever miss finals. I spent a few days making a cheat-sheet, and I think I have a basic understanding of everything that was taught, so I'm sure I will pass. Will I get a good grade? Probably not.

First of all, I studied different materials. I haven't gone to the lectures for months, as I really didn't understand what the prof was saying. And his lecture notes? I don't understand the notation, I don't understand the assumptions; in fact, I don't understand even a single slide of his notes. Therefore, I've been watching lecture videos from UC Berkeley's Machine Learning course (CS189) to catch up. The materials they covered are 80% the same, but the prof, Jonathan Shewchuk, did a much, much better job of teaching the concepts and showing the derivations of everything, and in more depth. However, he used different notation, and the way he derived methods like PCA and helped us visualize things like regression and hyperplanes is quite different from my prof's. For instance, the way Shewchuk taught the Bayesian Decision Rule is simply elegant and delightful. But when my prof did it (I was still attending lectures at the time), it was awful, and it was hard to understand how to look at things his way. Therefore, I would say I learned the same stuff, but in different ways, with different notation and different assumptions. (In the meantime, I have also been jotting down notes, which I will share in later posts.)

Second, there were people who stacked the deck. There were only five questions in the exam. I think I knew how to do more than half of them. Some of the questions were tied to what the prof did in lectures, and since I wasn't there, I could only guess what he did and wing it. I tried to be prepared, but there were no past papers or sample questions available, and the prof decided not to share them, on purpose. When I finished the exam and went back to the lab, some of the people there told me they had actually found the past papers (I don't know how) and copied all the answers onto a cheat-sheet, which was used by about ten people. And four of the questions this time were from last year's exam. That means those people were already 80% of the score ahead of us when we started the exam. A score this high can already secure a B+ or even higher. But technically, they did nothing wrong; they were just a bunch of assholes.

Though all my exams are over, there's still an individual project to be finished for the machine learning course. I finished it a few days later. I was trying to write something about it, but my heart wasn't in it anymore. In short, I had to write four classifiers and train them with a bunch of emails they gave us. And instead of doing something like ensemble learning, I was required to do prediction using ALL the classifiers INDIVIDUALLY, then the TAs would evaluate them INDIVIDUALLY based on how well they performed on a testing set with 500 emails, which would determine our project scores. After I had trained them and measured their accuracy using cross-validation, I submitted the damn thing and found something better to do.

Was I overfitting or underfitting? Don't know, don't care. Seriously, all I had was a training set, so I didn't think there was a way other than cross-validation. Days later, I thought of a painful way to find out: tune the hyper-parameters of every classifier, evaluate the performance at each setting, plot the TOC graph, and see at which settings we get the highest accuracy and at which we overfit or underfit. Should I try it out? Nah, this project is stupid.
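For what it's worth, the painful way can be sketched in a few lines. This is only a rough illustration, not my actual project: the data is synthetic, a tiny k-nearest-neighbours classifier stands in for the real classifiers, and its k stands in for whatever hyper-parameter you want to tune. The names make_data, knn_predict, cross_val_accuracy and sweep are made up for this sketch.

```python
import random


def make_data(n, seed):
    """Synthetic two-class 1-D data with label noise (a stand-in for email features)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1, 1)
        y = 1 if x > 0 else 0
        if rng.random() < 0.15:  # flip roughly 15% of the labels
            y = 1 - y
        data.append((x, y))
    return data


def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0


def accuracy(train, test, k):
    """Fraction of test points that a k-NN fitted on train labels correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)


def cross_val_accuracy(data, k, folds=5):
    """Average held-out accuracy over `folds` interleaved splits of the data."""
    scores = []
    for i in range(folds):
        held_out = data[i::folds]
        rest = [p for j, p in enumerate(data) if j % folds != i]
        scores.append(accuracy(rest, held_out, k))
    return sum(scores) / folds


def sweep(data, ks):
    """Return (k, training accuracy, cross-validation accuracy) for each k."""
    return [(k, accuracy(data, data, k), cross_val_accuracy(data, k)) for k in ks]


if __name__ == "__main__":
    data = make_data(200, seed=0)
    for k, train_acc, cv_acc in sweep(data, [1, 5, 15, 45, 135]):
        print(f"k={k:3d}  train={train_acc:.2f}  cv={cv_acc:.2f}")
```

Reading the two columns side by side is the whole trick. A setting where the training accuracy is far above the cross-validation accuracy (k=1 here, which memorizes the training set) points to overfitting, while a setting where both are low points to underfitting. Plotting both against the hyper-parameter gives the graph I had in mind.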

To wrap up, the course, CSCI3320, offered by CUHK, sucks balls. Yeah, there, I said it. I don't think it aroused students' interest in machine learning; if anything, it did the opposite. I'm also certain that how they grade students is either totally biased (underfits) or has high variance (overfits), because they try so hard to fit the random noise they make, which is so high that we cannot assume its mean is 0.

Kev

P.S. This project doesn't deserve to be posted on GitHub. If you really want it that bad, let me know and I'll send it to you.

Road to Kevolution
