Writing data science questions
This guide briefly explores how to frame a problem by defining questions that could be answered using data.
Why are questions important?
A data science project considers how data might be used to solve a problem.
To turn data into something that addresses the business problem and meets its goals, start by preparing a set of questions that will lead to relevant insights.
For example, if the goal were to increase revenue in an online fashion app, the questions to explore using the data could include:
- What revenue do we make now?
- Which channels/customer segments does that revenue come from?
- Which products generate the most/least revenue?
- How has revenue changed over time?
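As a rough illustration, each of the questions above can be mapped to a simple aggregation. The sketch below uses only the Python standard library. The order records, field names and the `revenue_by` helper are all hypothetical and invented for this example; for a real data set a library such as pandas would typically be used instead.

```python
from collections import defaultdict

# Hypothetical order records, invented purely for illustration.
orders = [
    {"month": "2024-01", "channel": "web", "product": "jacket", "revenue": 120.0},
    {"month": "2024-01", "channel": "mobile", "product": "scarf", "revenue": 20.0},
    {"month": "2024-02", "channel": "mobile", "product": "jacket", "revenue": 240.0},
    {"month": "2024-02", "channel": "web", "product": "boots", "revenue": 90.0},
]


def revenue_by(records, key):
    """Sum the revenue field for each distinct value of `key` (hypothetical helper)."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key]] += record["revenue"]
    return dict(totals)


# What revenue do we make now?
total_revenue = sum(order["revenue"] for order in orders)

# Which channels does that revenue come from?
by_channel = revenue_by(orders, "channel")

# Which products generate the most/least revenue?
by_product = revenue_by(orders, "product")
best_product = max(by_product, key=by_product.get)
worst_product = min(by_product, key=by_product.get)

# How has revenue changed over time?
by_month = revenue_by(orders, "month")

print(total_revenue, by_channel, best_product, worst_product, by_month)
```

Writing the question as a comment next to the calculation that answers it is one way to check that every question can actually be answered from the fields available.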
Typical questions
Consider "Who, What, Where, When, Why and How" questions.
Some typical data science questions:
- How much or how many?
- Which category (classification) is this?
- What is the distribution?
- How does this compare to that?
- Is this weird? Are there any outliers?
- Which option should be taken?
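Some of these question types correspond to simple summary calculations. The following is a minimal sketch covering three of them ("how many", "what is the distribution" and "is this weird"), using the Python standard library. The daily order counts are invented for illustration, and the 1.5 × IQR rule is just one common convention for flagging outliers, not the only one. Classification and "which option" questions usually need a model and are not shown.

```python
import statistics

# Hypothetical daily order counts, invented for illustration only.
daily_orders = [12, 15, 14, 13, 16, 15, 14, 60]

# How much or how many?
total_orders = sum(daily_orders)

# What is the distribution? Summarise with the median and the quartiles.
median = statistics.median(daily_orders)
q1, _, q3 = statistics.quantiles(daily_orders, n=4)

# Is this weird? Flag values more than 1.5 * IQR beyond the quartiles.
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in daily_orders if x < lower_fence or x > upper_fence]

print(total_orders, median, outliers)
```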
Questions in the COMP0035/COMP0034 coursework
You are not required to include questions in the coursework. Doing so could help you to think about what your app will need to do. It can be especially useful when designing data dashboards and visualisations (charts).
The questions you write should:
- relate to the nature and content of the data set
- be relevant to the target audience (the people who will use the app)
For example, a London air quality data set provided readings of three different pollutants, the date and time of the reading, and the location.
- Students identified questions relating to trends in these parameters for one or more areas of London.
- Asking questions about the cause of pollution would not have been appropriate since the data set did not contain such data.
- Some students chose to consider questions from the perspective of the general public living in London; others focused on a more specific audience such as environmental policy-makers.
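One way to check that a question relates to the content of the data set is to list the columns it would need and compare them with the columns available. The sketch below applies this to the air quality example. The column names, the candidate questions and the `answerable` helper are hypothetical; the example above only states that the data set held three pollutant readings, the date and time of each reading, and the location.

```python
# Hypothetical column names standing in for the fields described above.
available_columns = {"pollutant_1", "pollutant_2", "pollutant_3", "datetime", "location"}

# Each candidate question is paired with the columns needed to answer it.
candidate_questions = {
    "How has pollutant_1 changed over time in one area?": {"pollutant_1", "datetime", "location"},
    "Which area has the highest pollutant_2 readings?": {"pollutant_2", "location"},
    # Needs a column the data set does not contain.
    "What causes high pollution levels?": {"pollutant_1", "pollution_source"},
}


def answerable(required, available):
    """A question is answerable only if every required column is present."""
    return required <= available


supported = [q for q, cols in candidate_questions.items() if answerable(cols, available_columns)]
unsupported = [q for q in candidate_questions if q not in supported]

print("Supported:", supported)
print("Not supported by this data set:", unsupported)
```

The question about causes is rejected because it depends on a column the data set does not have, which mirrors the point made in the example above.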
Further reading
- How to ask questions data science can solve
- The data science process - Step 1: Frame the problem
- IBM: Translate a business problem into an AI and data science solution
- What's your problem - framing data science questions the right way
Read Minding the gap – visualising the impact of COVID-19 by Stuart Johnson
This activity gives practice in 'framing the problem' by identifying the questions that a data science project sets out to answer. Read the article and identify:
- Who is the target audience? i.e. who did the author intend to use the charts?
- What goal was the author trying to achieve by providing the charts?
- What questions can be answered with each of the charts?