This post is inspired by a recent post written by my friend Mustafa Saad. It is a great post, and I recommend that you read it and think about the topics and suggested projects he mentions. I don’t completely agree with him, but Mustafa is an experienced guy whose ideas and points of view should be taken into consideration.
In this post, I am mainly talking to the people who will join the CS department at the faculty. I joined the CS department in my 4th year, so I have some idea of how people there think. However, this is just my point of view, and it may be right or wrong.
So, you have successfully made it to your final year at college. I still believe that the 3rd year will always be the hardest and toughest of all the years. Most of you should have worked on some interesting topics during the previous years. Some of you may have become interested in Machine Learning; others may have liked Graphics, Compilers or Architecture (for real .. how could one like such Archi. stuff? It is a curse, bro!). Well, that’s great, actually. It is always better to find a specific field to focus on, either for your GP or as a career path after graduation.
I will mainly talk about Machine Learning based projects. Machine Learning is very trendy nowadays, and it is related to the CS and SC departments specifically. In general, several application areas fall under the umbrella of Machine Learning, such as Natural Language Processing (NLP), Automatic Speech Recognition (ASR) and Computer Vision (CV). These fields work on text, signal and image data respectively.
The main questions you need to ask yourself are: “Which field am I interested in? What type of data do I want to work on (text, speech signals, images, etc.)?”
After answering these questions, you can begin your research and survey on the topic you chose. Your target is to follow the blogs of the top universities’ research labs, which talk about their latest research and its applications.
For example, if you are interested in NLP, you need to identify the universities that are well known for their work on NLP. You will see that Berkeley http://www.berkeley.edu/ and British Columbia https://www.ubc.ca/ are popular universities in that field, so you go to their websites and look at their latest papers and their participation in top conferences, such as NIPS https://nips.cc/ and EMNLP http://www.emnlp2016.net/ .
Actually, in Graduation Projects (and maybe M.Sc. too), you don’t need to create something brand new. You have 2 options when you begin your journey.
- Applied Research Project
In such projects, you are not trying to create something that doesn’t exist yet. You only seek to learn and improve your programming skills by finding an existing project and either implementing it from scratch as a whole or focusing on a specific part that you find interesting. In such projects, you must have some existing resources to help you during your implementation phase:
- Papers, clear documents and useful blogs and links
- Open-source projects in the Git community. Your implementation should include some important techniques such as Object Oriented Programming (OOP), Data Structures (DS) and Algorithms.
- You will need to understand some of the mathematical and statistical content that may be included in the paper.
I prefer this kind of project because you learn and then implement what you learnt. Also, the college likes such projects, in which they can see an actual output.
- Research Projects
You have an existing solution for a problem, but you have some theoretical enhancements in mind. In such projects, expect to run several experiments and to search a lot to increase your knowledge and make sure of what you are doing. If you choose such projects, you must be ready to read a lot of theory, papers and some chapters from a reference book.
Also, it is highly preferable to already have a background in what you want to do. A team that wants to work on such projects must:
- Have a solid background in mathematics and statistics
- Like to read and search
- Expect to understand a lot and code less
To be honest, I don’t prefer such projects, because:
- The output isn’t guaranteed, and the college always expects to see output and won’t appreciate any effort without it
- Such a deep understanding of the theoretical background is very rare in people at your level
So, I consider this kind of project an unnecessary risk.
If you are going to work on either of these two categories of projects, you must have the following to help you finish the project:
- Powerful Machine: Machines with GPUs, which you will need to train your complex models. The machines could be either online or offline. But in all cases, you must make sure you have access to such machines
- Available Datasets: You need to make sure you have at least one dataset to run your experiments on. Avoid collecting the dataset on your own. You won’t have time to gather it, and a collected dataset also needs enough diversity to generalize well and cover your patterns
- It is highly preferable to work on a Linux-based operating system instead of Windows
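Before committing to a project, it is worth checking whether the machine you plan to use actually exposes a GPU. Here is a minimal sketch in Python, assuming you plan to train with PyTorch. The function name `gpu_available` is my own illustration, and the check falls back to `False` when PyTorch is not installed.

```python
def gpu_available():
    """Return True if PyTorch is installed and can see a CUDA-capable GPU."""
    try:
        import torch  # assumption: PyTorch is the library you plan to train with
    except ImportError:
        # Without PyTorch installed, this sketch cannot detect a GPU.
        return False
    return bool(torch.cuda.is_available())


if __name__ == "__main__":
    if gpu_available():
        print("GPU found: training on this machine is an option.")
    else:
        print("No usable GPU found: plan for an online machine instead.")
```

If this prints the second message on every machine your team owns, sort out access to an online machine before you register the project.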
Look .. there are some facts that I want to share with you so you know what to expect when you begin your project
- The basic Machine Learning techniques are considered old school. You won’t see a lot of projects nowadays that use classic algorithms and techniques, such as Naive Bayes and Hidden Markov Models. The research community is leaning towards Deep Learning (DL). Deep Learning needs powerful machines and large datasets and, fortunately, both are far more available than in the past. This supports the previous notes (available datasets and powerful machines)
- There are a lot of libraries that make life easier while working on projects. Commonly, the libraries are built for the Python, R and C++ programming languages. The libraries help you train and evaluate models easily, but they have a very bad side effect. The problem with these libraries is their abstraction. They work as a black box with several algorithms and techniques running in the backend. Trust me, you can train and produce output without even understanding 30% of what is going on! The most popular libraries are Keras, TensorFlow and PyTorch.
- Don’t expect to get a lot of support in the college from the TAs and Drs. It is your project and you MUST be the one who fully understands what you want to do. Just Help Yourself!
From my experience in mentoring and supervising teams after my graduation, I found that teams get stuck on a few recurring problems.
- They focus on the seminars and the grades without caring much about the actual output
- They don’t make use of the summer vacation, and their preparations aren’t organized
- They don’t divide the project into several small modules
- They don’t get to the point. They waste their time watching A to Z courses during the semesters
I will tell you something .. if you are really planning to work on Machine Learning based projects, you must be willing to spend part of your vacation on preparation and studying. If you enter the year without any prior knowledge or without finishing a beginner course in Machine Learning, then CHOOSE SOMETHING ELSE, or you will end up running some code without understanding what you are doing. At the very least, one member of the team should have some knowledge of the task so that he/she can lead the team.
So, now that you have reached this point, here are the steps that I think are a good start for anyone working on Machine Learning projects
- Use the summer vacation to enroll in a Machine Learning course and make sure to finish it before the beginning of the year and before registering the graduation project
- Find an interesting field of study that is closely related to ML such as NLP and ASR
- Search for some of its popular topics and the current research progress on them
- Gather the needed materials such as papers, useful links and books
- Find runnable complete/incomplete open-source projects and make sure that you can install and run them on your machine. Also, check the number of stars and forks.
- Run and produce output from the open-source projects
- Implement your own code. Use either Python or C++ to write or rewrite the code. For example, you may think of implementing a Neural Network from scratch, or more complex models such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs)
- If you have time, you may create a desktop application or a web service as an interface for your project
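
To give a feel for what "from scratch" means in the implementation step above, here is a minimal sketch of the smallest possible neural network: a single sigmoid neuron trained with full-batch gradient descent, in plain Python with no libraries. The toy dataset (logical AND), the function names and the hyperparameters are all my own assumptions for illustration. A real project would use a real dataset and more layers.

```python
import math


def sigmoid(z):
    """Squash a real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Forward pass: weighted sum of the inputs followed by a sigmoid."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def mean_loss(weights, bias, inputs, targets):
    """Average binary cross-entropy over the whole dataset."""
    total = 0.0
    for x, y in zip(inputs, targets):
        p = predict(weights, bias, x)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(inputs)


def train(inputs, targets, learning_rate=1.0, epochs=2000):
    """Full-batch gradient descent, starting from all-zero parameters."""
    weights = [0.0] * len(inputs[0])
    bias = 0.0
    n = len(inputs)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(inputs, targets):
            # For sigmoid + cross-entropy, the gradient w.r.t. the
            # pre-activation is simply (prediction - target).
            error = predict(weights, bias, x) - y
            for i, xi in enumerate(x):
                grad_w[i] += error * xi / n
            grad_b += error / n
        weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b
    return weights, bias


# Toy dataset (illustrative only): the logical AND of two binary inputs.
AND_INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
AND_TARGETS = [0, 0, 0, 1]

if __name__ == "__main__":
    w, b = train(AND_INPUTS, AND_TARGETS)
    for x in AND_INPUTS:
        print(x, round(predict(w, b, x), 3))
```

Writing even this much by hand forces you to understand the forward pass, the loss and the gradient update, which is exactly the understanding that the black-box libraries let you skip.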