Google Data Engineer Sample Questions:
1. Your company produces 20,000 files every hour. Each data file is formatted as a comma-separated values (CSV) file that is less than 4 KB. All files must be ingested on Google Cloud Platform before they can be processed. Your company site has a 200 ms latency to Google Cloud, and your Internet connection bandwidth is limited to 50 Mbps. You currently deploy a secure FTP (SFTP) server on a virtual machine in Google Compute Engine as the data ingestion point. A local SFTP client runs on a dedicated machine to transmit the CSV files as is. The goal is to make reports with data from the previous day available to the executives by 10:00 a.m. each day. This design is barely able to keep up with the current volume, even though the bandwidth utilization is rather low.
You are told that due to seasonality, your company expects the number of files to double for the next three months. Which two actions should you take? (Choose two.)
A) Redesign the data ingestion process to use gsutil tool to send the CSV files to a storage bucket in parallel.
B) Introduce data compression for each file to increase the rate of file transfer.
C) Transmit the TAR files instead, and disassemble the CSV files in the cloud upon receiving them.
D) Create an S3-compatible storage endpoint in your network, and use Google Cloud Storage Transfer Service to transfer on-premises data to the designated storage bucket.
E) Contact your internet service provider (ISP) to increase your maximum bandwidth to at least 100 Mbps.
F) Assemble 1,000 files into a tape archive (TAR) file.
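The figures in question 1 can be checked with a rough back-of-the-envelope calculation. The sketch below is only an illustration under stated assumptions: it treats every file as a full 4 KB, counts payload bytes only, and assumes each file transfer costs a fixed number of 200 ms exchanges (the `round_trips_per_file` parameter is hypothetical; a real SFTP session behaves differently in detail).

```python
import math

# Values taken from the question text.
FILES_PER_HOUR = 20_000
FILE_SIZE_BYTES = 4 * 1024      # upper bound: each CSV is under 4 KB
LATENCY_S = 0.200               # 200 ms to Google Cloud
BANDWIDTH_BPS = 50_000_000      # 50 Mbps link


def bandwidth_utilization(files_per_hour: int) -> float:
    """Fraction of the link used, counting payload bytes only (assumption)."""
    bits_per_hour = files_per_hour * FILE_SIZE_BYTES * 8
    return bits_per_hour / (BANDWIDTH_BPS * 3600)


def sequential_files_per_hour(round_trips_per_file: int = 1) -> float:
    """Files per hour when one stream waits on latency for every file.

    round_trips_per_file is a hypothetical modeling parameter.
    """
    return 3600 / (round_trips_per_file * LATENCY_S)


def parallel_streams_needed(files_per_hour: int, round_trips_per_file: int = 1) -> int:
    """Smallest number of independent streams that keeps up with the volume."""
    return math.ceil(files_per_hour / sequential_files_per_hour(round_trips_per_file))


def archives_per_hour(files_per_hour: int, files_per_archive: int = 1000) -> int:
    """Number of transfers per hour if files are bundled before sending."""
    return math.ceil(files_per_hour / files_per_archive)


if __name__ == "__main__":
    doubled = 2 * FILES_PER_HOUR
    print(f"link utilization now: {bandwidth_utilization(FILES_PER_HOUR):.2%}")
    print(f"link utilization doubled: {bandwidth_utilization(doubled):.2%}")
    print(f"one stream, one exchange per file: {sequential_files_per_hour():.0f} files/hour")
    print(f"streams needed for doubled volume: {parallel_streams_needed(doubled)}")
    print(f"1,000-file archives per hour (doubled): {archives_per_hour(doubled)}")
```

Under these assumptions the payload uses well under 1% of the 50 Mbps link even after doubling, while a single latency-bound stream tops out at about 18,000 files per hour. That is consistent with the question's remark that the design barely keeps up although bandwidth utilization is low.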
2. You have a data pipeline with a Cloud Dataflow job that aggregates and writes time series metrics to Cloud Bigtable. This data feeds a dashboard used by thousands of users across the organization. You need to support additional concurrent users and reduce the amount of time required to write the data. Which two actions should you take? (Choose two.)
A) Increase the number of nodes in the Cloud Bigtable cluster
B) Modify your Cloud Dataflow pipeline to use the Flatten transform before writing to Cloud Bigtable
C) Configure your Cloud Dataflow pipeline to use local execution
D) Modify your Cloud Dataflow pipeline to use the CoGroupByKey transform before writing to Cloud Bigtable
E) Increase the maximum number of Cloud Dataflow workers by setting maxNumWorkers in PipelineOptions
3. As your organization expands its usage of GCP, many teams have started to create their own projects. Projects are further multiplied to accommodate different stages of deployments and target audiences. Each project requires unique access control configurations. The central IT team needs to have access to all projects. Furthermore, data from Cloud Storage buckets and BigQuery datasets must be shared for use in other projects in an ad hoc way. You want to simplify access control management by minimizing the number of policies. Which two steps should you take? Choose 2 answers.
A) Only use service accounts when sharing data for Cloud Storage buckets and BigQuery datasets.
B) For each Cloud Storage bucket or BigQuery dataset, decide which projects need access.
C) Find all the active members who have access to these projects, and create a Cloud IAM policy to grant access to all these users.
D) Use Cloud Deployment Manager to automate access provision.
E) Create distinct groups for various teams, and specify groups in Cloud IAM policies.
F) Introduce resource hierarchy to leverage access control policy inheritance.
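Options E and F in question 3 mention groups and policy inheritance. The sketch below is a toy model, not the Cloud IAM API: the `Node` class, the resource names, and the group addresses are hypothetical. It only illustrates the documented idea that a resource's effective policy is the union of its own bindings and those inherited from its ancestors.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Node:
    """Toy stand-in for an organization, folder, or project (hypothetical)."""
    name: str
    parent: Optional["Node"] = None
    bindings: Dict[str, Set[str]] = field(default_factory=dict)  # role -> members

    def grant(self, role: str, member: str) -> None:
        """Attach a member (ideally a group) to a role on this node."""
        self.bindings.setdefault(role, set()).add(member)

    def effective_members(self, role: str) -> Set[str]:
        """Union of this node's bindings and everything inherited from ancestors."""
        members = set(self.bindings.get(role, set()))
        if self.parent is not None:
            members |= self.parent.effective_members(role)
        return members


# Hypothetical hierarchy: organization -> team folder -> per-stage projects.
org = Node("example-org")
team_a = Node("team-a", parent=org)
dev = Node("team-a-dev", parent=team_a)
prod = Node("team-a-prod", parent=team_a)

# One binding at the top gives central IT access to every project.
org.grant("roles/viewer", "group:central-it@example.com")
# One binding on the folder covers all of the team's projects.
team_a.grant("roles/editor", "group:team-a@example.com")

if __name__ == "__main__":
    for project in (dev, prod):
        print(project.name, sorted(project.effective_members("roles/viewer")))
```

In this model two bindings cover every project under the folder, and membership changes happen in the groups rather than in each project's policy.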
4. Which of these sources can you not load data into BigQuery from?
A) File upload
B) Google Cloud SQL
C) Google Drive
D) Google Cloud Storage
5. Which Google Cloud Platform service is an alternative to Hadoop with Hive?
A) Cloud Dataflow
B) Cloud Bigtable
C) BigQuery
D) Cloud Datastore
Solutions:
| Question # 1 Answer: A,C | Question # 2 Answer: A,E | Question # 3 Answer: E,F | Question # 4 Answer: B | Question # 5 Answer: C |
