
Answer-first summary for fast verification
Answer: BigQuery
BigQuery is the best fit because it is a fully managed, serverless data warehouse that scales to petabytes, so the 40 TB dataset is well within its range. It supports standard SQL and offers native machine learning through BigQuery ML, which lets you train and run predictive models directly with SQL queries. It also provides built-in geospatial analytics via the GEOGRAPHY data type and GIS functions, including parsing GeoJSON, so the hourly ship telemetry loads can be stored and analyzed in place. The alternatives fall short: Cloud Bigtable and Cloud Datastore lack native SQL analytics and built-in ML, and Cloud SQL for PostgreSQL is not designed for analytical workloads at this scale. Finally, BigQuery integrates directly with visualization tools such as Looker Studio (formerly Google Data Studio), making it straightforward to build an interactive dashboard showing ship counts and predicted delays per region.
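As a minimal sketch of how this would look in BigQuery, the queries below parse the GeoJSON telemetry, train a delay classifier with BigQuery ML, and produce per-region counts for a dashboard. All table, column, and model names are hypothetical; the GIS and ML statements use documented BigQuery syntax (ST_GEOGFROMGEOJSON, CREATE MODEL, ML.PREDICT).

```sql
-- Sketch only: dataset, table, column, and model names are hypothetical.

-- 1. Parse the hourly GeoJSON telemetry into a native GEOGRAPHY column.
CREATE OR REPLACE TABLE shipping.telemetry_geo AS
SELECT
  ship_id,
  report_time,
  ST_GEOGFROMGEOJSON(location_geojson) AS location
FROM shipping.telemetry_raw;

-- 2. Train a delay-prediction model with BigQuery ML (logistic regression).
CREATE OR REPLACE MODEL shipping.delay_model
OPTIONS (model_type = 'logistic_reg',
         input_label_cols = ['is_delayed']) AS
SELECT region, avg_speed, cargo_weight, is_delayed
FROM shipping.training_features;

-- 3. Score today's ships and aggregate per region for the dashboard.
SELECT
  region,
  COUNT(*) AS ships,
  COUNTIF(predicted_is_delayed) AS likely_delayed
FROM ML.PREDICT(MODEL shipping.delay_model,
                (SELECT * FROM shipping.todays_features))
GROUP BY region;
```

The final query's output can be connected directly to Looker Studio as a data source to render the per-region dashboard the scenario asks for.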
Author: LeetQuiz Editorial Team
As a data engineer working for a global shipping company, your task is to train a machine learning model using a vast dataset of 40 TB. This model aims to predict which ships in various geographic regions are likely to cause delivery delays on any given day. The dataset will be derived from multiple attributes collected from various sources, including telemetry data. Importantly, the telemetry data—which contains ships' locations in GeoJSON format—will be pulled from each ship and loaded into the system every hour. In addition to the predictive model, you need a dashboard that displays the number of ships and identifies which ones are likely to cause delays in each region. Given these requirements, including the need for native prediction capabilities and geospatial data processing, what would be the most suitable storage solution for this use case?
A. BigQuery
B. Cloud Bigtable
C. Cloud Datastore
D. Cloud SQL for PostgreSQL