Writing data science questions
What this guide covers
This guide briefly explores how to frame a problem by defining questions that could be answered from the data set for your coursework.
Why are questions important?
In data science projects, you consider how data might be used to solve a problem.
To turn data into something that addresses the business problem and achieves its goals, start by preparing a set of questions that will lead to relevant insights.
For example, if the goal were to increase revenue in an online fashion app, the questions to explore from the data could include:
- What revenue do we make now?
- Which channels/customer segments does that revenue come from?
- Which products generate the most/least revenue?
- How has revenue changed over time?
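As a rough illustration, each of the revenue questions above can be mapped to a simple aggregation over the data. The sketch below uses only the Python standard library; the order records, the field names (`month`, `channel`, `product`, `revenue`) and the helper `revenue_by` are all hypothetical and invented for this example, so treat it as one possible approach rather than a prescribed method.

```python
from collections import defaultdict

# Hypothetical order records: every field name and value is invented for illustration.
orders = [
    {"month": "2024-01", "channel": "app", "product": "jacket", "revenue": 120.0},
    {"month": "2024-01", "channel": "web", "product": "scarf", "revenue": 30.0},
    {"month": "2024-02", "channel": "app", "product": "jacket", "revenue": 150.0},
    {"month": "2024-02", "channel": "app", "product": "scarf", "revenue": 20.0},
    {"month": "2024-03", "channel": "web", "product": "boots", "revenue": 200.0},
]


def revenue_by(records, key):
    """Sum revenue for each distinct value of `key` (hypothetical helper)."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key]] += record["revenue"]
    return dict(totals)


# What revenue do we make now?
total_revenue = sum(order["revenue"] for order in orders)

# Which channels does that revenue come from?
by_channel = revenue_by(orders, "channel")

# Which products generate the most/least revenue?
by_product = revenue_by(orders, "product")
top_product = max(by_product, key=by_product.get)
bottom_product = min(by_product, key=by_product.get)

# How has revenue changed over time? Sort by month so the trend reads in order.
by_month = dict(sorted(revenue_by(orders, "month").items()))

print(total_revenue, by_channel, top_product, bottom_product, by_month)
```

Writing the question as a comment above each calculation keeps the analysis tied to the problem it is meant to address.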
Typical questions
Consider "Who, What, Where, When, Why and How" questions.
Some typical data science questions:
- How much or how many?
- Which category (classification) is this?
- What is the distribution?
- How does this compare to that?
- Is this weird? Are there any outliers?
- Which option should be taken?
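Several of the typical questions above correspond to simple summary calculations. The following is a minimal sketch using only the Python standard library; the readings and category labels are invented for illustration, and the 1.5 × IQR rule used to flag outliers is just one common convention, assumed here rather than required by the coursework.

```python
import statistics
from collections import Counter

# Hypothetical measurements and labels, invented for illustration.
readings = [12.0, 14.0, 13.0, 15.0, 14.0, 13.0, 48.0, 12.0]
categories = ["low", "low", "low", "medium", "low", "low", "high", "low"]

# How much or how many? Count the records in each category.
count_per_category = Counter(categories)

# What is the distribution? Summarise the centre and spread.
mean_reading = statistics.mean(readings)
median_reading = statistics.median(readings)
q1, _, q3 = statistics.quantiles(readings, n=4)  # quartile cut points

# Is this weird? Flag values outside 1.5 * IQR of the quartiles (assumed rule).
iqr = q3 - q1
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr
outliers = [value for value in readings if value < lower_bound or value > upper_bound]

# How does this compare to that? Compare the mean with and without the outliers.
typical = [value for value in readings if value not in outliers]
mean_without_outliers = statistics.mean(typical)

print(count_per_category, median_reading, outliers, mean_reading, mean_without_outliers)
```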
Questions for the COMP0035 coursework
The questions you write should:
- relate to the nature and content of your data set
- be framed in the context of the problem you defined
- be relevant to your target audience
For example, in previous years there was a London air quality data set that provided readings of three different pollutants, the date and time of each reading, and the location. Students asked questions relating to trends in these parameters for one or more areas of London. Asking questions about the cause of pollution, however, would not have been appropriate, since the data set did not contain such data. Some students using the data set chose to ask questions from the perspective of the general public living in London (their target audience); others focused on a more specific target audience, such as environmental policy-makers.
You are not assessed on your choice of statistical methods, since statistics is neither a course prerequisite nor covered in the course.
You will use the questions in the COMP0034 coursework. That is, you will try to answer them using visualisations (charts), a machine learning algorithm, or other features of a web app that uses your data set.
Further reading and examples
- How to ask questions data science can solve
- The data science process - Step 1: Frame the problem
- IBM Translate a business problem into an AI and data science solution
- What's your problem - framing data science questions the right way