The Indispensable Role of Data Science in Our Lives
Do you often order food or grocery delivery? With the widespread availability of the internet, more and more convenient services are filling our lives. Beyond the essential delivery services many rely on, there are also ride-hailing apps, business order systems, and more. These systems all depend on data science working behind the scenes to ensure you receive the services you need as quickly as possible. Take food delivery, for example. Every step, from placing your order to the restaurant accepting it and the delivery reaching your door, is meticulously planned. The preparation time at the restaurant and the delivery person's route are all calculated using complex data science algorithms to provide the best service quality!
BY Shao-Fang (Pam) Wang
What is data science used for?
In data science, one common framework categorizes analyses into three types: descriptive, diagnostic, and predictive. These analytical approaches empower us to comprehend past events, delve into the underlying reasons for specific occurrences, and predict future outcomes, respectively. So, how do data scientists utilize these analytical approaches to solve real-world problems?
Imagine you're designing a food delivery service app like Uber Eats or Foodpanda, with the goal of ensuring swift and fresh food delivery to customers. When a user places a food order in the app, the designated restaurant begins preparing the order. Upon completion, a delivery partner receives a notification to collect the food at the restaurant and deliver it to the customer.
However, a critical question arises: when should the food delivery system dispatch a delivery partner to pick up the order from the restaurant? Dispatching too early may result in the delivery partner waiting at the restaurant unnecessarily while the food is being prepared. Dispatching too late may compromise the food's freshness upon arrival and disappoint the customer. An inefficient system not only wastes the delivery partner's time, reducing the number of potential trips and affecting their earnings, but also impacts the number of orders a restaurant can accommodate and the customer's satisfaction with the delivered food.
To tackle these challenges, we employ the three data science methodologies. By employing a combination of descriptive, diagnostic, and predictive analyses, we can gain comprehensive insights into our delivery system's performance and strategically enhance our app to meet evolving demands.
Descriptive Analysis: What Happened?
Descriptive analysis involves summarizing historical data to understand past events and develop a comprehensive understanding of the business. In the context of food delivery services, analyzing historical trip data empowers a food delivery company to identify patterns and trends throughout the delivery process. For example, gathering information on the number of available delivery partners at different times, the volume of orders received, and any daily or weekly trends can provide valuable insights.
Additionally, it's essential to understand the time taken for each step of the food delivery process to create a timeline that illustrates the journey of an order. This includes analyzing the typical duration spent at various stages of a trip, spanning from dispatch to drop-off, as well as examining the variability in these durations.
Using a concrete example, let's delve into the time interval between an order being ready for pick-up and the actual pick-up. Ideally, we aim for this interval to be as short as possible: once the food is prepared, we want the delivery partner to promptly arrive for pick-up. However, to understand the reality of this interval, we need to analyze data from real-world orders. Employing summary statistics such as the mean and median helps us understand the central tendency. Additionally, examining the 25th and 75th percentiles allows us to identify shorter and longer intervals, providing insights into the variability across orders. Calculating the standard deviation further elucidates the spread of these durations.
Further visualizing these metrics' distributions through boxplots or histograms enhances our understanding of their variability. For example, the green boxplots and histograms depict the distribution of time intervals between an order being ready for pick-up and the actual pick-up time. Upon visual inspection, it exhibits a wider spread, with a pronounced long tail towards the right. We can also observe two distinct peaks in the delivery time distribution: one occurring in less than 10 minutes, and the other ranging between 20 to 40 minutes.
The orange boxplots and histograms, on the other hand, illustrate that the distribution for cook time is narrower, indicating less variability (mean = 14.78 minutes; median = 14.64 minutes; SD = 5.04 minutes; minimum = 1.40 minutes; maximum = 29.09 minutes). The proximity of the mean and median suggests a relatively symmetric distribution.
Based on this analysis, the time needed to prepare food appears reasonable and relatively stable, with a duration and fluctuation within acceptable limits. However, the time between food being ready and actual pick-up is concerning. It frequently exceeds 30 minutes, demonstrating significant variability ranging from very short to very long durations. This uncertainty diminishes the quality of our food delivery app. We may further inquire why there are instances when the time is less than 10 minutes, while in other cases it exceeds 30 minutes, assuming this is for delivery from the same restaurant by delivery partners from similar locations.
Diagnostic Analysis: Why Did It Happen?
Diagnostic analysis focuses on understanding the potential reasons or factors that contribute to observed phenomena. While conducting experiments or manipulating situations to collect data for establishing actual causal relationships may not always be feasible, various statistical methods, such as regressions,and correlation analysis, provide a means to utilize observed data in identifying robust relationships or associations between factors of interest. This yields actionable insights that hold the potential for enhancing business operations.
In our food-delivery example, our data revealed that the wait time between the delivery partner's arrival at the restaurant and the food being ready for pick-up is not optimal. A diagnostic analysis may uncover the factors contributing to delivery partners arriving either too early or too late, as well as factors affecting food preparation time.
Perhaps we observe a strong correlation between this time interval and weekends or weekdays, and whether the restaurant is located in the city center. Additionally, leveraging motion sensor data from cell phones, we might discover that delivery partners spend considerable time searching for parking. Together, this would suggest that most late-arriving delivery partners tend to be for weekend orders from city-center restaurants, potentially due to spending more time searching for parking. Conversely, we may find that delivery partners do not usually arrive late for other times and locations. Furthermore, food preparation time appears consistent between weekends and weekdays. Finally, suppose we observe that restaurants in the city center, with designated pick-up parking locations where drivers don’t need to search for parking, experience minimal time between arrival and picking up food.
Here is an example of one diagnostic analysis we can conduct using the same fabricated data from descriptive analysis. In this analysis, we break down the time between food being ready for pick-up and actual pick-up by weekdays and weekends. The data revealed that this duration is quite narrow and reasonable during weekdays, with a mean of 7.03 minutes and a median of 6.53 minutes. However, the uncertainty and undesirable long durations for this time interval appear to stem from weekend deliveries. The mean for weekends is 29.87 minutes, with a median of 28.71 minutes, exhibiting a wide spread from a minimum of 4.14 minutes to a maximum of 61.88 minutes. This provides evidence that one contributing factor to the variance of this time period is the distinction between weekday and weekend orders.
Overall, our analysis indicates a notable correlation between extended wait times for delivery partners and weekend orders from city-center restaurants, potentially attributable to increased parking search times.
The diagnostic analysis expands upon our descriptive findings and offers an explanation for late delivery partner arrivals relative to food readiness. Based on these findings, we can already formulate and implement potential solutions. For instance, we can encourage restaurants in the city center to provide curb-side pick-up service to avoid the need for parking. Additionally, the system could proactively dispatch the delivery partner ahead of time to account for anticipated delays in finding parking, or notify the partner about the likelihood of extended parking durations. These measures aim to shorten the duration between food readiness and collection, enhancing overall efficiency in the food delivery process.
Predictive Analysis: What Will Happen?
Predictive analysis involves forecasting future outcomes based on historical data using statistical models. The primary goal is to predict future events based on what we know about the past through various statistical analyses and machine learning techniques.
Following the example, predictive analysis can dynamically forecast when to dispatch delivery partners early. Our previous findings suggested that delivery partners will require additional time to find parking for orders from city-center restaurants on weekends. Therefore, we can develop a machine learning algorithm to learn from historical trip data and predict instances where the arrival and pick-up times exceed a certain threshold, prompting the need for early dispatch. We can further predict the time needed to find parking to determine how much earlier we need to dispatch delivery partners.
The model's inputs may encompass various factors such as the order time, restaurant location, predicted traffic and weather conditions, and significant events near the restaurant that could affect parking availability, such as concerts, sports events, and conferences. Additionally, we can enhance our predictions by determining the optimal lead time for dispatching the delivery partner in specific scenarios. For instance, if the model anticipates heavy traffic due to a nearby major event, it may suggest dispatching the delivery partner earlier to accommodate potential delays. Furthermore, the model can integrate historical data from past occurrences of major events to accurately estimate the duration of delays and thus predict the amount of lead time needed more accurately. By harnessing such predictive analytics, we can streamline the dispatching process and improve overall efficiency in food delivery operations.
✨Further Reading:If you would like to read the Chinese version, please refer to《生活中離不開的資料科學》、"The secret behind Big Data-Data Science"