You are a data engineer at a global shipping company, and you need to train a machine learning model on a 40 TB dataset. The model should predict which ships in each geographic region are likely to cause delivery delays on any given day. The dataset will be built from multiple attributes collected from various sources, including telemetry data. The telemetry data, which contains each ship's location in GeoJSON format, will be pulled from each ship and loaded into the system every hour. In addition to the predictive model, you need a dashboard that displays the number of ships and identifies which ones are likely to cause delays in each region. Given these requirements, including the need for native prediction capabilities and geospatial data processing, which storage solution is the most suitable for this use case?