Ricky J. Sethi
https://research.sethi.org/ricky/
rickys@sethi.org
First, let's declare a few provisional, and very informal, definitions for context: I'll call a field any broad subject of human inquiry, like the natural sciences, social sciences, history, etc. I'll then say a discipline is a more focused part of a field (in our case, computer science, physics, neurobiology, etc.). Within this discipline, there can be many sub-disciplines, or areas, which are specialized branches in that discipline (in our case, things like artificial intelligence, bioinformatics, robotics, etc.).1
Selecting a topic within a specific area can be done in a top-down or bottom-up manner. In your undergrad or MS, you might start by first coming up with a topic (or have a topic assigned to you) in a new area, or an area in which you don't have deep experience, and then you research references and approaches to support answering or developing that topic; this can be considered the top-down approach since you start with a topic in hand first.
Alternatively, in the PhD, you have to discover a particular topic of contribution in an area you know at a deep level; this can be thought of as the bottom-up approach. This paradigm often takes years: you start in the first couple of years of your PhD by taking classes and learning several areas within your discipline. You might start to focus on one or two areas in particular.
You might then, in the research phase, start to work with a research group in one of those areas and start to learn not only what's widely known but what's at the cutting edge of that area. You'll learn that particular area to a profound depth and get a good sense of the state-of-the-art in that area, as well. Once you've learned that area to that depth, you can then start seeing to which part of it you can make a novel contribution.
In order to learn an area in-depth, you might begin by doing a Literature Review of your chosen area by looking through conference proceedings, journals, online resources, books, etc., to learn about your area and review what's known therein. You can use Mendeley to store and markup papers and use resources like Google Scholar, PubMed, JSTOR, DBLP, Microsoft Academic, arXiv, etc. to find peer-reviewed, highly-cited relevant papers that interest you.
Sometimes, what you start looking for may lead you to something you had not intended to find but which you find more interesting. Reading and understanding tons of papers is the key to research, since these scientific papers will often be the fuel for your ideas. This is your chance to discover what the rest of the world is doing, and it is an especially important step because it will also give you a broad view of your discipline. One approach to finding new related papers is to use literature-mapping tools like Connected Papers, Litmaps, Citation Gecko, ResearchRabbit, and others, some of which seem to use the Microsoft Academic Graph.
The area review might also help motivate the need for selecting a topic in which you can solve some problem. This topic selection should actually end up with, or lead to, Research Questions that you might end up addressing in your research.
The position you take or the hypothesis you develop should come out of the data, rather than your gathering only the data that supports a pre-designed hypothesis. This is the difference between a scientific argument and a legal argument: one looks at the evidence to decide what position to take, and the other looks for evidence to support a position already taken. For the topic justification, you'd look for the latter; but for the research questions, you'd look for the former.
Thus, the references for the topic selection should be limited specifically to those that describe the problem or the limits of any supposed solutions, since your proposed solution will likely arise from your examination of existing attempts at solving those specific sub-problems and be encapsulated in your research questions.
The goal for the research questions portion would be to survey the current landscape of actual research in the discipline and, once you've mastered that, to see which part of it you could extend and support via rigorous research.
Read all the papers you've found (or as many as you can) and, after reading each one, enter it into an annotated bibliography: a bibliography that contains the full reference for each paper and a two-to-three-paragraph summary of it.
The following questions can be used to craft the summary of the annotated sources:
What was the purpose of this research?
What was the problem the researcher(s) addressed?
What are the research questions?
What method and design were used in this research?
What were the results of the research?
What are the implications for future research?
Pay special attention to writing personal notes like "This is a good paper because...", "I liked it because...", etc.
Feel free to add hashtags and notes to facilitate later searching and parsing.
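If you keep the annotated bibliography in a plain-text or scriptable form, hashtags make it easy to pull out related entries later. The sketch below is a minimal illustration, not a feature of Mendeley or any other tool: it assumes a hypothetical `BibEntry` record holding the reference, the summary, and your personal notes, and treats any `#word` in the summary or notes as a tag.

```python
import re
from dataclasses import dataclass, field

# Assumption: a hashtag is '#' followed by word characters or '-'.
HASHTAG = re.compile(r"#([\w-]+)")


@dataclass
class BibEntry:
    """One annotated-bibliography entry (a hypothetical structure)."""
    reference: str                 # full reference of the paper
    summary: str                   # two-to-three-paragraph summary
    notes: str = ""                # personal notes, e.g. "I liked it because..."
    tags: set[str] = field(init=False, default_factory=set)

    def __post_init__(self) -> None:
        # Gather hashtags from the summary and notes, lower-cased for matching.
        text = f"{self.summary}\n{self.notes}"
        self.tags = {tag.lower() for tag in HASHTAG.findall(text)}


def find_by_tag(entries: list[BibEntry], tag: str) -> list[BibEntry]:
    """Return the entries carrying `tag` (leading '#' optional, case-insensitive)."""
    wanted = tag.lstrip("#").lower()
    return [entry for entry in entries if wanted in entry.tags]
```

Lower-casing the tags means `#ToRead` and `#toread` are treated as the same label, so small inconsistencies in how you typed a tag months ago don't hide an entry.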
The annotated bibliography is the first step to a comprehensive literature review and provides an opportunity to identify gaps in existing scholarship that can be used for future research. An annotated bibliography requires critical reading and summarizing of the findings of collected sources.
This will also be valuable because it will form your bibliography when you get to writing the dissertation. You might also want to look for papers and dissertations that include an extensive literature review and survey of the state of the art. One caveat is to ensure the research papers were published in reputable, peer-reviewed venues and have a high number of citations for that area/discipline/sub-specialty; such work is usually considered state-of-the-art and a good candidate to build upon and extend.
Furthermore, this summary is important because it will make up the "Background Research" part of your thesis as well as any papers you'd like to publish.
Talk to your advisor about what you are reading. They should be able to point you towards other resources like Google Scholar, etc. If you find an interesting paper, then look at the author's web site and see what links are there. What new projects are they working on? What new papers may have been published since their previous work?
In addition, the bibliography should examine the current landscape so you can formulate research questions, rather than look to support a particular perspective. If a survey of the discipline overwhelmingly concludes that a certain approach or technology will singularly solve the problem, then that conclusion should come out of the survey, rather than the survey being restricted to just the few sources that support the thesis statement.
After your first term at university, you should have a rough idea of what you are interested in. During the second term, you should do the area review so that, by the end of your first year, you are ready, based on your survey of the field, to see what research you can extend by following the steps below.
You should try to choose an area that interests you and ideally find a topic that you have funding to research as it will allow you more time to devote to your thesis and dissertation. Once you've chosen this area, you need to think carefully about what your dissertation's contribution could be. In general, a dissertation's thesis is a new contribution to the discipline you are studying. You cannot simply duplicate work that has already been done but have to contribute to the body of knowledge to some (usually small) extent, either theoretically or by improving some application of the state-of-the-art.
One technique that can be helpful is to draw a table with authors on one axis, which areas/topics they research on another axis, and a third axis (or additional block of columns) for Future Work possibilities from the Future Work sections of papers as that often includes work the authors recognize is needed but might not have the time or resources to complete. This kind of approach will allow you to see what areas or topics haven't been addressed as yet.
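The authors/topics/Future Work table can also be kept as data, which makes the gap-spotting step mechanical. This is a minimal sketch under simplifying assumptions: the `Row` layout and function names are hypothetical, topic labels are free-form strings you assign yourself, and topics are matched exactly (so consistent labelling matters).

```python
from collections import defaultdict

# Each row is (author, topics they research, topics raised in their
# Future Work sections). Labels are your own; matching is exact.
Row = tuple[str, set[str], set[str]]


def find_gaps(rows: list[Row]) -> dict[str, set[str]]:
    """Map each future-work topic that no listed author researches
    to the set of authors whose papers suggested it."""
    researched: set[str] = set().union(*(topics for _, topics, _ in rows))
    gaps: dict[str, set[str]] = defaultdict(set)
    for author, _, future in rows:
        for topic in future - researched:
            gaps[topic].add(author)
    return dict(gaps)


def render(rows: list[Row]) -> str:
    """Render the authors-by-topics table as plain text ('x' = researches it)."""
    topics = sorted(set().union(*(topics for _, topics, _ in rows)))
    lines = ["author | " + " | ".join(topics)]
    for author, mine, _ in rows:
        marks = " | ".join("x" if t in mine else "-" for t in topics)
        lines.append(f"{author} | {marks}")
    return "\n".join(lines)
```

A topic suggested as Future Work by several authors but researched by none of them is a natural candidate to read about more closely.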
After you have researched your overall topic (read all relevant books, papers, reports, etc.), you should have an idea about what you plan to research. From an academic perspective, it can be useful to use the "decent and recent" rule; i.e., have you read the top 10 or 20 academic, peer-reviewed papers on the topic from the last 5 - 20 years?
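The "decent and recent" rule amounts to a filter followed by a ranking. The sketch below assumes citation count is a reasonable stand-in for "top" papers, which is only a rough proxy (older papers in the window accumulate more citations); the `Paper` record and the default window and cut-off values are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Paper:
    title: str
    year: int
    citations: int          # e.g. as reported by a scholarly search engine
    peer_reviewed: bool


def decent_and_recent(papers: list[Paper], current_year: int,
                      window_years: int = 20, top_k: int = 20) -> list[Paper]:
    """Peer-reviewed papers from the last `window_years` years,
    most-cited first, truncated to the `top_k` papers."""
    recent = [p for p in papers
              if p.peer_reviewed and 0 <= current_year - p.year <= window_years]
    return sorted(recent, key=lambda p: p.citations, reverse=True)[:top_k]
```

Narrowing `window_years` toward 5 favours recent work; widening it toward 20 lets foundational papers back into the list.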
It can be useful to draw up a draft Research Problem Statement at this stage. This is usually a (long) sentence that contains all the key elements of your research project, including its scope and methodology.
The scope should cover the breadth and depth as well as the quantitative assessment; the methodology should give specific approaches you will be utilizing. In addition, it should:
State the purpose of your study
Quantitative studies usually evaluate the efficacy, examine the details of a causal relationship, or measure the metrics of some area
Define what is being measured
Define the population group
Define the methodology to be employed
Be neutral (not assume the intervention being studied is effective or not)
Be answerable within the timeframe you have planned for the study
It is good practice to look at copies of previous theses, dissertations, academic research articles, etc. to see how their research statements are worded. You will usually find them in the Abstract or Introduction.
The Research Problem Statement documents an issue captured in the literature prompting the need for a solution. The research conducted serves as a solution or partial solution to the problem.
Your final research problem statement should take all of the above into consideration so you can start asking yourself open-ended "how" and "what" questions about your general topic.
When you have fully formulated the research problem statement, you can then use it in your abstract, introduction, research methodology, and conclusion chapters as a constant reminder of the overall focus of your research.
A Research Question should aim to advance the state of the art in a specific scientific area. Papers and dissertations can help you identify the state of the art; check their Future Work sections and other proposals for future research.
You might need to convert your Research Questions into specific Hypothesis Statements. A hypothesis is a particular view or standpoint that may be proposed, defended, or proved/falsified. It is simply a question styled as a statement, which you then set out to falsify.
Start with the "How" and "What/Why" questions from the Research Problem Statement and then evaluate your questions. After you've put a question or even a couple of questions down on paper, evaluate these questions to determine whether they would be effective research questions or whether they need more revising and refining. Often, the "what/why" questions you had will become more detailed, focused "how" or "to what extent" questions. In general, here are some considerations to keep in mind:
Is your research question clear? With so much research available on any given topic, research questions must be as clear as possible in order to be effective in helping the writer direct his or her research.
Is your research question focused? Research questions must be specific enough to be well covered in the space available.
Is your research question complex? Research questions should not be answerable with a simple "yes" or "no" or by easily found facts. They should, instead, require both research and analysis on your part, which is why they often begin with "How."
Caveats about Research Problem Statements and Research Questions
Make sure you are embarking upon a research project that proposes original ideas about a specialized topic and demands a high degree of methodological/scientific rigor, rather than just solving a business problem or creating a business product.
Research results from the PhD contribute new knowledge, making a difference to the theoretical context of the given field of study. Scientific value, the predominant measure for the PhD, implies advancing a theory's conceptual rigor or enhancing its potential for broad operationalization and testing. This means the scope of a project must be great enough to contribute to, extend, or facilitate the extension of theory.
Research requires a specific method and design to be accurately and reproducibly executed. You should draft the proposed method and design appropriate to conduct the study. Research must include precise procedures used; an explanation of the appropriateness of the methodology; rationale for the research design; and a description of the recruiting of subjects, organizations, etc. One key feature of effective studies is that the method and design must be outlined in enough detail for other researchers to replicate the study.
Once you have the specific procedures in place, you should convert your research questions into properly formed Hypothesis Couplets. These should be detailed with regard to the procedures, the methodology, and the metrics, as this is what your research will likely answer to help advance the state of human knowledge.
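As a hypothetical illustration (the study design and symbols are assumptions, not taken from any particular dissertation), suppose a study compares the mean accuracy of a proposed method, written mu_new, with the mean accuracy of a baseline, written mu_base, on the same benchmark. One common way to write such a couplet pairs a null hypothesis H_0, which the study sets out to falsify, with an alternative hypothesis H_1:

```latex
\begin{aligned}
H_0 &: \mu_{\text{new}} = \mu_{\text{base}}    && \text{(no difference in mean accuracy)} \\
H_1 &: \mu_{\text{new}} \neq \mu_{\text{base}} && \text{(the mean accuracies differ)}
\end{aligned}
```

Stating H_1 as two-sided keeps the couplet neutral about whether the proposed method helps or hurts, in line with the neutrality criterion for the problem statement; the metric, dataset, statistical test, and significance level would all need to be fixed in the procedures beforehand.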
In addition, you should create a so-called elevator pitch: a 1-2 sentence summary of your research problem that clearly delineates your specific contributions to that area/topic (as opposed to what was known beforehand).
In general, a research group is made up of a professor/senior scientist (the PI), a junior scientist (the postdoc), some grad students (PhD or MS), and possibly some undergrad students, as well. Most PIs are primarily interested in doing significant work, which usually means work publishable in high-profile, high-impact venues; this means that they would only be interested in bringing on students whose approach aligns with that goal.
Of course, everyone in the group will contribute to all aspects to varying degrees, but the main responsibility for the strategy or the tactics/implementation will be assigned to different members. The assumption is that everyone will be highly motivated and hard-working and will collaborate well, contributing significantly to all aspects of the work.
In general, though, the PI and postdoc(s) will have the major responsibility for the strategic parts. The postdoc(s) will straddle both the strategic and tactical parts (especially those related to the implementation). The postdoc(s) will usually work closely with the grad students who, in turn, will work with the undergrad students. In such a dynamic, the senior grad student (the PhD student) will have significant responsibility for both the implementation and the overall strategy. The junior grad student (the MS student) will have significant responsibility for both the implementation and the tutorials or survey papers. The goal is to do strong work that usually results in at least 1-3 significant joint publications, including tutorial or survey publications that the grad students can incorporate into their dissertations. The technical lead will usually have primary discretion in determining author order on all papers, as well.
There is quite a bit of elitism at all levels of academia. As such, if you're looking for admission into a program, it's a good idea to aim for as high a tier as you can; the same applies to the venues you might choose for publication, as well. In fact, there are often informal, and sometimes formal, rankings of venues. In computer science, you can use, as a rough guideline, rankings like the Australian CORE rankings, Google Scholar, Conference Ranks, Research.com, and other lists people maintain for ranking conference venues in sub-fields and areas like InfoVis, etc.
In general, publishing is very difficult. Getting a paper accepted is a tough, slow, and somewhat serendipitous process that usually involves multiple rejections from multiple venues along the way, as well as quite a bit of luck; it's even entirely possible that you might not be able to publish any papers in top venues at all.
In order to stack the odds in our favor, the overall research and publication strategy is set by the foremost experts in that area within the group. In practice, the technical lead will be either the PI or a postdoc. Together, the PI and postdoc(s) will set the general strategy and also, to various extents, be involved in the tactics/implementation.
As such, instead of putting all your (research) eggs in one basket and submitting to just a single conference, they'll usually establish a policy like a submission pipeline: submit to the most relevant top venues first and, depending upon the reviewers' feedback and the options for additional research work, move down to the next-tier conference, and so on.
In general, the higher the conference tier, the better the reviewers' feedback will usually be. A good strategy here might be to always address any and all errors but to take suggestions about format, presentation, etc., with a grain of salt... and to ignore them altogether if they come from the notorious third reviewer. The "third reviewer," sometimes referred to as the "second reviewer," refers to the (humorous in retrospect) situation where all reviewers but one give your paper good reviews.
A similar effect can occur in grant proposal review panels where one especially negative (or positive!) panelist can torpedo (or champion!) your entire proposal during the discussions, especially if they're particularly persuasive or intimidating. Regardless, anecdotes abound of people submitting the identical paper to different (top-level) conferences with drastically different results (almost always assuming there aren't any significant errors, of course!).
Everyone in the group would be contributing significantly to the research thrust/project. As such, if anyone is listed as a co-author on a paper, they have full ownership of the paper and it reflects a significant contribution to the research, regardless of the order, which sometimes indicates its degree, depending upon the field. Indeed, all group members will usually be co-authors on all papers which they work on while they're with the group, even if the papers are finally published after they leave the group. Of course, the contributions will likely be to different amounts and, in case of disagreement, the technical lead will typically make the final decision as they often have the most context about the project's development and will bear responsibility for the work's quality. However, this authority should be exercised with care, as generous and fair authorship practices are essential to good mentorship and successful research groups.
Regarding author order: normally, in a collaborative research effort, there will be some initial discussion about who has contributed to what extent, sometimes following approaches like the Contributor Roles Taxonomy (CRediT), which looks at categories like Conceptualization, Methodology, Writing, Software, Validation, Formal Analysis, Data Curation, Visualization, Review/Editing, Supervision, Project administration, and Funding acquisition.
The technical lead, who likely originated and/or led the project, might offer initial suggestions for author order, which can lead to discussion; and, in case of any disagreement that cannot be resolved through discussion, the technical lead will make the final decision on author order based on documented contributions and field conventions, as it will likely be their foundational ideas, framework, or implementation upon which the new research will be built.
Some of my favourite quotes about publishing are by Feynman and Captain Jean-Luc Picard, as shown below. What they show is that you have to go out of your way in papers or books to show not only how your ideas work and what evidence supports them but also how they don't work, what evidence runs counter to them, and what their limitations are.
All statements and claims you make should thus be supported by arguments and evidence as honestly as you can make them; this applies to every line that you put in every paper or every book to the best of your ability.
Thus, every sentence should be a properly formed argument, where its claim (or premise) should be fully supported by some evidence, either in the form of a reference to state-of-the-art research that directly provides that evidence or by your own research results or theoretical development which directly provides that evidence.
This is, of course, very hard to do; it's not something I did early on, but I endeavour to do so now. The earlier you get into this habit, the better, as the goal isn't just to push a paper through but to advance the state of human (and your own) knowledge.
Without further ado, here are the quotes:
Richard P. Feynman, Cargo Cult Science Speech, Caltech, 1974:
It's a kind of scientific integrity, a principle of scientific thought that corresponds to a kind of utter honesty--a kind of leaning over backwards. For example, if you're doing an experiment, you should report everything that you think might make it invalid--not only what you think is right about it: other causes that could possibly explain your results; and things you thought of that you've eliminated by some other experiment, and how they worked--to make sure the other fellow can tell they have been eliminated.
Details that could throw doubt on your interpretation must be given, if you know them. You must do the best you can--if you know anything at all wrong, or possibly wrong--to explain it. If you make a theory, for example, and advertise it, or put it out, then you must also put down all the facts that disagree with it, as well as those that agree with it. There is also a more subtle problem. When you have put a lot of ideas together to make an elaborate theory, you want to make sure, when explaining what it fits, that those things it fits are not just the things that gave you the idea for the theory; but that the finished theory makes something else come out right, in addition.
In summary, the idea is to try to give all of the information to help others to judge the value of your contribution; not just the information that leads to judgment in one particular direction or another.
...
The first principle is that you must not fool yourself, and you are the easiest person to fool.
I would like to add something that's not essential to the science, but something I kind of believe, which is that you should not fool the laymen when you're talking as a scientist... I'm talking about a specific, extra type of integrity that is not lying, but bending over backwards to show how you're maybe wrong, [an integrity] that you ought to have when acting as a scientist. And this is our responsibility as scientists, certainly to other scientists, and I think to laymen.
By honest I don't mean that you only tell what's true. But you make clear the entire situation. You make clear all the information that is required for somebody else who is intelligent to make up their mind.
Ronald D. Moore and Naren Shankar writing for Capt. Jean-Luc Picard in Star Trek: The Next Generation, "The First Duty":
The first duty of every Starfleet officer is to the truth; whether it's scientific truth, or historical truth, or personal truth. It is the guiding principle on which Starfleet is based.
So how does this help you write your papers and communicate your actual research? First and foremost, it means you do not copy and paste, or insert content generated directly by the likes of ChatGPT (or, really, by anyone but you, possibly in collaboration with others), into your paper. This is both for reasons of fundamental science and integrity, as discussed above, and out of a sense of self-preservation, as papers can sometimes be retracted and reputations destroyed even decades after they're published!
The best way to proceed is both the simplest and the most difficult: honestly and straightforwardly. Struggle with every sentence, wrestle with every figure, and do your best to meet both the silly and the important publishing constraints of space and content. You'll often end up spending hours getting a single sentence right, and that's a good thing because this is work of which you will then be rightfully proud, regardless of whether it gets published or not (of course, publication and recognition are always welcome and are usually also a great ego, and sometimes career, boost, as well).
Usually, you will be able to incorporate significant parts of all the papers on which you are a co-author into your dissertation since all co-authors have full ownership of the published work. This normally ends up helping accelerate the completion of the dissertation as the relationship between the dissertation and published papers is symbiotic, with each contributing to the other. In general, at traditional institutions, it usually takes about 3-5 years after finishing the coursework/comps portion to finish the dissertation (but, of course, there are no guarantees on either side and most people use the heuristic of publishing about 3 major papers to get something 3 people are willing to sign off on).
The final thesis defence might be intimidating but, usually, by that point it's just more of a (still daunting) formality. Once in a while, though, you might get a more combative experience and, in those instances, here's the best bit of advice for your defense:
In fact, even when you apply for a PhD program, one of the best tactics might be to find one or two professors with whom you'd want to work and find one or two papers they've written to which you can contribute (either in its theoretical development or in terms of its application context or domain), or extend, or otherwise build upon their paper(s). Once you have some solid ideas for doing so, especially in terms of some extension to their work, a new problem domain, or a novel use case, you should reach out to the professor with your ideas, concentrating on illustrating the depth to which you understand the ideas, motivations, and technical aspects of the paper.
If you start such a collaboration with a current professor, the admissions process can often be expedited greatly. Please do keep in mind that, from their perspective, it's likely that most professors are interested in how you might help them get grants or publications or how you might become a good colleague/collaborator down the line (and possibly bring them on for some research projects or grants then, as well).
In general, assistant professors are (usually) the most ambitious and beginning PhD students will probably get a lot of attention from them; associate professors also give a fair bit of attention to their students but, in general, senior professors will likely be busy planning their next vacation but might bring quite a bit of cachet with their reputation and connections. In general, productivity (and interest) decline with seniority (not always or in everything or for everyone, of course!):
Weekly progress reports are essential to your PhD progress (perhaps beyond, as well). Not only will a weekly report ensure that you're on track and making consistent progress over a long term, but it will be a record of the continuous progress you made in your thinking and your research and it will serve as a resource which you can likely incorporate directly into your papers and dissertation. Even minor progress in some weeks is still progress and will let you reflect upon the cumulative effect of the incremental work you've done, which is very helpful especially when you might feel discouraged or despondent.
It will also help crystallize your thinking by forcing you to think through your ideas and write down what you need to do or have done in a concrete fashion. As you take the time to organize your thoughts and ideas, you'll become more productive and deepen your understanding. It will also allow you to get feedback from your advisor, which can offer a valuable perspective both on work you've done and on directions you might want to explore. In fact, sending periodic updates to your entire PhD committee can be quite helpful, as well:
Finally, even if you didn't make much progress, it will be a record of all the hard work you're putting in (and, if you're not putting in the time, those scant updates can serve as a good flag to indicate you might want to ramp up your work-time or take a break and come back when you're able to devote more time and effort to it).
Your weekly reports should be written in markdown. Name each report something like weekly_update-LastName_FirstName-YYYY-MM-DD.md. It should include the following sections:
Previous Week's Plan: please summarize, in one or two sentences, the previous week's intended plan. This will help you identify whether or not you accomplished what you intended.
This Week's Tasks: please summarize, as a bulleted list in markdown, the tasks you finished this week. This can include things like:
Papers you read
Code you developed
Sections/Papers you wrote
Presentations you wrote/made
Difficulties you're facing or ones you overcame
New ideas you developed
People with whom you collaborated
Edits you incorporated/addressed: briefly summarize any notes/comments on papers from your advisor that you resolved
Next Week's Plan: again using a markdown bulleted list, please list goals/tasks for next week. As always, the more precise or measurable the goal is, the better it will be for you to track and evaluate.
You can also include a sub-section for longer-term goals/tasks, as well.
The report is usually about half a page to one page in length, nothing too onerous but detailed enough to be informative as a research record. It is usually helpful to also include the new papers you're planning to read at the end of the report. This will give your advisor a sense of where your research is heading, as well.
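If you'd rather not retype the skeleton each week, a few lines of Python can generate the file name and an empty report. This is a minimal sketch: the section headings mirror the structure described above, but their exact wording, the heading levels, and the trailing "Papers to Read" section are illustrative assumptions rather than a required template.

```python
from datetime import date

# Headings follow the report structure described above; wording is illustrative.
SECTIONS = [
    "Previous Week's Plan",
    "This Week's Tasks",
    "Edits Incorporated/Addressed",
    "Next Week's Plan",
    "Papers to Read",
]


def report_filename(last_name: str, first_name: str, day: date) -> str:
    """Return a name like weekly_update-LastName_FirstName-YYYY-MM-DD.md."""
    return f"weekly_update-{last_name}_{first_name}-{day.isoformat()}.md"


def report_skeleton(last_name: str, first_name: str, day: date) -> str:
    """Return an empty markdown report with one bulleted section per heading."""
    lines = [f"# Weekly Update: {first_name} {last_name}, {day.isoformat()}", ""]
    for heading in SECTIONS:
        lines += [f"## {heading}", "", "- ", ""]
    return "\n".join(lines)
```

For example, writing `report_skeleton("Doe", "Jane", date.today())` to the path returned by `report_filename("Doe", "Jane", date.today())` gives you a dated file ready to fill in; using ISO dates in the name also keeps the reports sorted chronologically in a directory listing.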
Separately, on Mendeley or another paper-tracking system, you should write 1-page summaries of all important papers you read for your topic. As you digest these papers, which are often highly mathematical, you'll likely find that it will be an iterative process where, over time and with multiple readings, your understanding of the paper will deepen and, as it does, you can add to your 1-page summary of the paper, as needed. I could never do this but you can also try Feynman's approach to papers where you re-derive every single part of the paper from scratch without relying on any of the references (!).
When you give a paper or chapter to your advisor, they'll usually make many comments and suggestions over many iterations of the paper:
In order to manage these versions, if you're not using a revision control system or the comment/revision tracking in tools like Overleaf, you should resubmit each new version with a changes file or with the changes highlighted so that the differences are clear to your advisor.
It will be especially important to resubmit papers/chapters with the changes highlighted/emphasized as your dissertation grows larger since it will otherwise be intractable to keep track of changes; in addition, your advisor is highly unlikely to re-read entire chapters in full for each revision. The end result would be that you'd get less than optimal feedback and mistakes or missed opportunities might sneak into your final product, which will be part of your publication record forever.
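If you aren't using revision control, Python's standard-library `difflib` can produce a simple changes file between two versions of a chapter. The function name and file names here are hypothetical; this is a minimal sketch of a line-level diff, not a replacement for proper version tracking.

```python
import difflib


def make_changes(old_text: str, new_text: str,
                 old_name: str = "old", new_name: str = "new",
                 context: int = 3) -> str:
    """Return a unified diff of two versions of a paper or chapter,
    showing `context` unchanged lines around each change."""
    diff = difflib.unified_diff(
        old_text.splitlines(), new_text.splitlines(),
        fromfile=old_name, tofile=new_name, n=context, lineterm="",
    )
    return "\n".join(diff)
```

You would read the two versions from disk (e.g. `chapter3-v1.tex` and `chapter3-v2.tex`, hypothetical names), pass their contents in, and save the result alongside the new version. Because the diff is line-based, writing one sentence per line in your source files makes the changes much easier to read; for LaTeX sources specifically, the `latexdiff` tool can instead produce a typeset document with additions and deletions highlighted.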
Next, map your schedule out on a calendar. Depending upon your university, you might have a specified calendar so please do not ask your advisor to accelerate or compress that schedule or sign off on your forms before you have completed all of the requirements. They are almost guaranteed to either say no or just simply ignore that request and tell you to refer back to these reference materials so you know that it's not some arbitrary delay or a personal issue (there's a whole other document for personal issues with your advisor or potential advisor but reading PhD Comics might give you some good insight in this regard).
By dint of necessity, a PhD requires you to find and follow a path yourself. Your advisor can only help you find the path (or perhaps learn how to find the path) but then it's entirely up to you to walk that path. As in the old Kung Fu television series, you will be ready to graduate only when you can do all of this by yourself; if you need your advisor to specify every next step, then you are not ready to leave (graduate).
Email Response Times
Regarding your advisor and your schedule, please make sure you respond to any emails from your advisor ASAP. In today's environment, you're expected to monitor your email quite frequently, and your advisor will likely expect a response within minutes and never more than about 12 to 24 hours later.
Source: http://research.sethi.org/ricky/book/
Caption: Research Question Genie:
If your research question is vague, the answer doesn't have to be a
category label or a number. When thinking about a good research
question, it helps to imagine you've found a mischievous genie who
will truthfully answer any question but will also try to make the
answer vague and confusing. So you have to pin the genie down with a
question so precise that the genie has no choice but to answer
clearly and unambiguously.
For example, if you asked a vague question like, "Is there a correlation between stock prices and time?", the genie might answer, "Yes, the price will change with time." Although correct, this answer is not very helpful. If, on the other hand, you formulate a precise, quantitative question like, "Will my IBM stock hit $50 by next Wednesday?", the genie has no choice but to commit to a specific prediction about the stock's price.
There is a well-known riddle that's often used to illustrate the evolution of a well-formed research question: the chicken crossing the road!
Original Formulation: Why did the chicken cross the road?
The original formulation is ambiguous and overly broad: it doesn't
indicate which chickens were involved, where they were located, what
the weather and other environmental conditions were, etc.
First Revision: How many chickens crossed Main St. last night at
3am and under what conditions?
This first iteration is a little better, and you can answer it
directly by gathering the data and doing an initial analysis: this
is exploratory data analysis (EDA), and questions like these can be
answered by simple statistical analysis.
Final Formulation: Which contextual and environmental factors
likely caused 30% of the chickens to cross Main St. from the desert
to the cornfield between 1am and 3am on a rainy night with depleted
food stores?
The final version is more precise and calls for quantitative
predictions (a probability or likelihood) of which factors motivated
chickens to cross: Was lack of food the most important factor? Was
the weather a factor? Would more have crossed in sunnier weather? Did
the nighttime make them more confident? These are only the factors,
or variables, we've identified so far, especially through our EDA.
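The EDA behind the first revision mostly comes down to counting and grouping. The sketch below assumes a hypothetical record format (`Observation` with `hour`, `weather`, and `crossed` fields) invented purely for illustration; it answers the first-revision question (how many crossed at a given hour?) and computes the per-condition crossing rates that hint at which factors belong in the final formulation.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Observation:
    """One observed chicken near Main St. (hypothetical record format)."""
    hour: int       # hour of the night, 0-23
    weather: str    # e.g., "rain" or "clear"
    crossed: bool   # did this chicken cross the road?


def crossings_at(observations, hour):
    """First revision: how many chickens crossed at the given hour?"""
    return sum(1 for o in observations if o.crossed and o.hour == hour)


def crossing_rate_by(observations, key):
    """Fraction of observed chickens that crossed, grouped by key(observation)."""
    seen = Counter(key(o) for o in observations)
    crossed = Counter(key(o) for o in observations if o.crossed)
    # Counter returns 0 for groups in which no chicken crossed.
    return {group: crossed[group] / total for group, total in seen.items()}


if __name__ == "__main__":
    # Hypothetical observations, for illustration only.
    obs = [
        Observation(3, "rain", True),
        Observation(3, "rain", False),
        Observation(2, "rain", True),
        Observation(3, "clear", False),
    ]
    print(crossings_at(obs, 3))
    print(crossing_rate_by(obs, lambda o: o.weather))
```

Grouping by a different `key` (hour, food supply, and so on) is how the EDA surfaces the candidate variables named in the final formulation.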
Caption: Drafting Machine Learning Research Questions:
The research question should be falsifiable or scientifically testable.
It should be formulated as one of the 8 common questions machine
learning can answer, usually with a category label or a number as its
answer. It should
also identify the predictor and response variables. In the end, you
should formulate a falsifiable hypothesis that can be tested with
quantitative data.
Business Problem
It helps to start by first stating the business problem.
E.g., What factors are related to employee churn? Can we predict
future terminations?
Initial Problem Statement
Then, convert that business problem into an initial problem
statement by making it precise and quantitative and something you
can use with a machine learning model.
E.g., Are an employee's age, length of service, and business unit
correlated with their terminated status?
In general, the initial problem statement should be precise and
quantitative and is often expressed as some independent
variables, the predictors, having a presumed correlation with
some dependent variable, the response variable or class
label.
Research Question
Once you have a precise, quantifiable question, you will have to
fine-tune it to match one or more of the 8 kinds of questions that
machine learning usually answers. Once you decide which of those
questions it fits, you'll formulate your final research
question.
E.g., How likely is it that someone under 50, with more than 20
years of service, working in the IT Department will be gone within a
year?
This would be an example of a regression task where you predict
a number, a probability, but you can also convert it to a
classification task where you predict a category label,
whether that person belongs in category A or category B.
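To make the predictor/response split and the regression-to-classification conversion concrete, here is a minimal sketch. The `Employee` fields, the `"IT"` unit string, and the 0.5 threshold are hypothetical choices; the probability is a simple frequency estimate (the fraction of matching employees who left) standing in for a trained machine learning model, not a recommended method.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Employee:
    # Predictors (independent variables)
    age: int
    years_of_service: int
    business_unit: str
    # Response (dependent variable): gone within a year?
    left_within_year: bool


def matches_question(e: Employee) -> bool:
    """The group named in the research question: under 50, >20 years, IT."""
    return e.age < 50 and e.years_of_service > 20 and e.business_unit == "IT"


def estimate_probability(employees: List[Employee]) -> Optional[float]:
    """Regression-style answer: a number (here, a probability).

    Frequency estimate standing in for a trained model.
    """
    group = [e for e in employees if matches_question(e)]
    if not group:
        return None  # no data for this group, so no estimate
    return sum(e.left_within_year for e in group) / len(group)


def to_label(p: float, threshold: float = 0.5) -> str:
    """Classification-style answer: map the probability to category A or B."""
    return "will leave" if p >= threshold else "will stay"
```

The threshold is the design choice that turns the regression question into a binary classification one; moving it trades false alarms against missed departures.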
Caption: Eight Common ML Approaches:
Pragmatically speaking, there are usually eight kinds of questions we
answer in most data analytic applications. Each of these eight questions
is usually addressed using a different kind of machine learning approach. These
eight common questions and their corresponding approaches are:
Find if there are any unusual data points: is this weird? → Anomaly Detection
How is this organized or structured? → Clustering and Dimensionality Reduction
Predict a number, like how much or how many? → Regression
Predict if something is A or B? → Binary Classification
Predict if this is A or B or C or ...? → Multi-Class Classification
Find the best way to make your next move; what should you do next? → Reinforcement Learning
Is the data super-complicated and hard to understand? → Deep Learning
Is accuracy really important? → Ensemble Methods
Other questions are also commonly asked, like "Is this the best?", which can be answered using Optimization or a similar approach.
We usually start by stating the business problem. E.g., What factors are related to employee churn? Can we predict future terminations? We then convert that business problem into a research problem statement by making it precise and quantitative and something we can use with a machine learning model. E.g., are an employee's age, length of service, and business unit correlated with their terminated status?
In general, the research problem statement should be precise and quantitative and is often expressed as some independent variable(s) (the predictor(s)) having a presumed correlation with some dependent variable (the response variable or class label). Once we have a precise, quantifiable research question, we can see if it matches one or more of these questions that machine learning usually answers.
Some good references for the PhD process, some of which were quoted/referenced above:
The importance of stupidity in scientific research by Dr. Martin A. Schwartz, Yale University
FAQs by Dr. Hal Daumé III of UMD
Advice for researchers and students compiled by Michael Ernst

My Ph.D. advisers expected weekly progress reports. I’m glad they did by Pijar Religia
Title of Thesis and Current Date:
Your Name and Email Address:
Committee Members and Advisor:
This should contain the following:
1 Page Summary.
Start with "I propose…". Then explain briefly what the research problem is, why a solution is needed, and what your approach is.
State your Research Problem Statement: describe in detail the problem you are trying to solve.
What is the Research Question you are trying to answer?
Why is this problem/question worth solving/asking? Why would others care about it?
How have other people in the past tried to solve/answer it?
What is your new approach to solving/answering this problem? Or, what specific improvements are you making on an existing solution?
How do you validate that the solution you came up with is a good solution?
How can you demonstrate that your solution works?
Intellectual Merit:
Explain what the contribution to computer science is.
Broader Impact:
Explain how it broadly impacts society, scientists, etc.
The introduction should, unsurprisingly, introduce your topic and might start with something like, "I propose to…," but say it in more detail than above. It should explain the rationale/problem in more detail and also outline the proposed solution that the subsequent chapters will present. In general, it should contain sections like:
Statement of the Problem
Purpose of the Study
Introduction to Theoretical Framework
Introduction to Research Methodology and Design
Research Questions, including the Hypotheses
Significance of the Study
At each step, you should use references to support your argument in careful detail.
Describe the field and explain any fundamental ideas or approaches that will serve as the background knowledge to understand your proposed contributions. It should have sections like:
Problem Space
Key Area 1
Key Area 2
Key Area 3...
Theoretical Framework and Approaches
The literature review would be a detailed review of literature for each of the key areas identified above. In general, you should provide a comprehensive literature review of each key research area by going over all the important and relevant papers and discussing what they do, in an effort to establish the current state-of-the-art so you can show how you extend it (it's essentially what would go into a survey paper). It might be helpful to:
Explain what other researchers have done in the discipline to try and solve the problem.
Draw a table comparing all the approaches with your approach.
Summarize in one succinct sentence/paragraph what the main difference between your approach and the others is.
Explain the work you have done so far towards your proposed approach (e.g. prototype implementations) that has led you to believe that your approach will work.
Explain the overall approach at a high level so your readers have a preview of what is to follow in the guts of the proposal below. Most importantly, it should include Assumptions, Limitations, and Key Findings/Results.
This will be easier to flesh out once you have the time-line figured out (see below).
Again, begin with "I propose to…"
Go into full detail on what your approach is.
Articulate what the concrete deliverables of the thesis are and how you plan to show that they are good, i.e., what the ultimate payoff is.
Include a timeline showing the overall goals and when they need to be accomplished. For example:
F = date to file all the paperwork needed to officially complete your dissertation
D = date of dissertation defense with your committee members
A = date to submit the final draft of your dissertation to your advisor
W = date to start writing your dissertation
I = date to implement, evaluate, and analyze feature Z_i of your thesis, etc.
In this timeline, I < W < A < D < F.
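If you keep your milestones in a script or spreadsheet export, a quick check that they respect I < W < A < D < F can catch an impossible plan early. In this sketch the function name `check_timeline` and the dictionary layout are hypothetical; it assumes "I" may hold one date per feature Z_i, all of which must fall before W.

```python
from datetime import date

# Milestone letters in the order they must occur: I < W < A < D < F.
ORDER = ["I", "W", "A", "D", "F"]


def check_timeline(milestones):
    """Return a list of ordering problems; an empty list means the plan is valid.

    `milestones` maps each letter to a datetime.date. "I" may instead map to a
    list of dates, one per feature Z_i; all of them must fall before W.
    """
    missing = [m for m in ORDER if m not in milestones]
    if missing:
        return ["missing milestone(s): " + ", ".join(missing)]

    impl = milestones["I"]
    if isinstance(impl, list):
        if not impl:
            return ["no implementation dates given for I"]
        impl = max(impl)  # the last feature must still finish before W

    dates = [impl] + [milestones[m] for m in ORDER[1:]]
    problems = []
    for i in range(len(ORDER) - 1):
        if not dates[i] < dates[i + 1]:
            problems.append(
                f"{ORDER[i]} ({dates[i]}) must come before "
                f"{ORDER[i + 1]} ({dates[i + 1]})"
            )
    return problems


if __name__ == "__main__":
    plan = {  # hypothetical dates, for illustration only
        "I": [date(2025, 1, 15), date(2025, 3, 1)],
        "W": date(2025, 4, 1),
        "A": date(2025, 9, 1),
        "D": date(2025, 11, 1),
        "F": date(2025, 12, 1),
    }
    print(check_timeline(plan) or "timeline OK")
```

This only checks ordering; whether the gaps between milestones are realistic is a conversation to have with your advisor.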
Describe how you might deploy the system or research for others to use. E.g., a web site with downloadable software, etc.
Describe any equipment or experimental setup you need to accomplish your thesis.
You can use any good format for references, like IEEE, or something like [last name of first author plus 2-digit year], e.g., [Jones23], or [Jones23a] if more than one cited paper by that author is from that year. But any consistent reference system will work equally well, depending upon the requirements of your department or your university.
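If you adopt the author-plus-year style and manage references by hand, generating the keys programmatically keeps the suffixes consistent. This is a sketch under one reading of the convention: the first cited paper for a given author and year keeps the plain key (Jones23) and later ones get a, b, c, and so on (Jones23a, Jones23b). The function name `citation_keys` is hypothetical.

```python
import string
from collections import defaultdict


def citation_keys(papers):
    """Build keys such as Jones23 and Jones23a from (last_name, year) pairs.

    Assumption: the first paper for a given author-year keeps the plain key,
    and later ones get suffixes a, b, c, ... in the order given.
    Supports up to 27 papers per author-year.
    """
    seen = defaultdict(int)  # how many times each base key has been used
    keys = []
    for last_name, year in papers:
        base = f"{last_name}{year % 100:02d}"  # ("Jones", 2023) -> "Jones23"
        n = seen[base]
        keys.append(base if n == 0 else base + string.ascii_lowercase[n - 1])
        seen[base] += 1
    return keys


if __name__ == "__main__":
    print(citation_keys([("Jones", 2023), ("Jones", 2023), ("Smith", 2009)]))
```

If your department expects every same-year paper to carry a suffix (Jones23a, Jones23b), change the `n == 0` branch accordingly; either way, consistency matters more than the specific rule.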
Add any additional components here.