September 27, 2024

[Nov 09, 2022] Professional-Data-Engineer Dumps Full Questions – Exam Study Guide [Q62-Q80]

Google Cloud Certified Free Certification Exam Material from PrepAwayPDF with 270 Questions

Candidates must develop practical skills in the exam topics to succeed. These objectives are highlighted below:

Design Data Processing Systems

  • Design Data Processing Solutions: This topic covers planning, the use of distributed systems, choice of infrastructure, hybrid Cloud & edge computing, and system availability & fault tolerance. You should also know the architecture options, including message queues, message brokers, service-oriented architecture, middleware, and serverless functions;
  • Migrate Data Processing & Data Warehousing: This section includes validating migrations, migration from on-premises to Cloud, and awareness of the current state & how to migrate designs to the future state;
  • Select the Relevant Storage Technologies: The considerations for this area include mapping storage systems to the business needs, data modeling, distributed systems, as well as tradeoffs involving transactions, throughput, and latency;
  • Design Data Pipelines: The focus for this subsection includes data visualization & publishing and batch & streaming data (Cloud Dataproc, Cloud Dataflow, Cloud Pub/Sub, the Hadoop ecosystem, Apache Spark, Apache Beam, and Apache Kafka). It also covers online versus batch prediction and job orchestration & automation.

 

Q62. Which Google Cloud Platform service is an alternative to Hadoop with Hive?

 
 
 
 

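For context, BigQuery is the managed service most often positioned as the alternative to Hive-style SQL-on-Hadoop. A minimal sketch of running a Hive-like aggregation with the google-cloud-bigquery Python client, against a public sample dataset:

from google.cloud import bigquery

# Run a Hive-style SQL aggregation as a fully managed BigQuery job.
client = bigquery.Client()
query = """
    SELECT word, SUM(word_count) AS total
    FROM `bigquery-public-data.samples.shakespeare`
    GROUP BY word
    ORDER BY total DESC
    LIMIT 10
"""
for row in client.query(query).result():
    print(row.word, row.total)
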
Q63. Your company built a TensorFlow neural-network model with a large number of neurons and layers. The model fits well for the training data. However, when tested against new data, it performs poorly. What method can you employ to address this?

 
 
 
 

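The scenario describes overfitting. A minimal Keras sketch of two common remedies, dropout and L2 regularization (layer sizes are placeholders):

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation="relu",
                          kernel_regularizer=tf.keras.regularizers.l2(1e-4)),
    tf.keras.layers.Dropout(0.5),  # randomly zeroes 50% of activations during training
    tf.keras.layers.Dense(1),
])
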
Q64. You are training a spam classifier. You notice that you are overfitting the training data. Which three actions can you take to resolve this problem? (Choose three.)

 
 
 
 
 
 

Q65. You are a head of BI at a large enterprise company with multiple business units that each have different priorities and budgets. You use on-demand pricing for BigQuery with a quota of 2K concurrent on-demand slots per project. Users at your organization sometimes don’t get slots to execute their queries, and you need to correct this. You’d like to avoid introducing new projects to your account.
What should you do?

 
 
 
 

Q66. Which role must be assigned to a service account used by the virtual machines in a Dataproc cluster so they can execute jobs?

 
 
 
 

Q67. You have several Spark jobs that run on a Cloud Dataproc cluster on a schedule. Some of the jobs run in sequence, and some of the jobs run concurrently. You need to automate this process. What should you do?

 
 
 
 

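Scheduled Dataproc jobs with mixed sequential and concurrent dependencies are a typical fit for Cloud Composer (managed Airflow). A sketch of a DAG expressing one job followed by two concurrent jobs; the project, cluster, bucket, and class names are hypothetical:

from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.dataproc import (
    DataprocSubmitJobOperator,
)

def spark_job(main_class):
    # Build a Dataproc Spark job spec for the given entry point.
    return {
        "reference": {"project_id": "my-project"},
        "placement": {"cluster_name": "my-cluster"},
        "spark_job": {"main_class": main_class,
                      "jar_file_uris": ["gs://my-bucket/jobs.jar"]},
    }

with DAG("spark_pipeline", start_date=datetime(2022, 11, 1),
         schedule_interval="@daily", catchup=False) as dag:
    extract = DataprocSubmitJobOperator(
        task_id="extract", job=spark_job("com.example.Extract"),
        region="us-central1", project_id="my-project")
    transform_a = DataprocSubmitJobOperator(
        task_id="transform_a", job=spark_job("com.example.TransformA"),
        region="us-central1", project_id="my-project")
    transform_b = DataprocSubmitJobOperator(
        task_id="transform_b", job=spark_job("com.example.TransformB"),
        region="us-central1", project_id="my-project")
    # extract runs first; the two transforms then run concurrently.
    extract >> [transform_a, transform_b]
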
Q68. Your weather app queries a database every 15 minutes to get the current temperature. The frontend is powered by Google App Engine and serves millions of users. How should you design the frontend to respond to a database failure?

 
 
 
 

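Whatever option set accompanies this question, the underlying pattern is graceful degradation: retry with backoff and fall back to the last cached reading rather than erroring out. A minimal sketch (function names hypothetical):

import time

_cached_temp = None  # last successfully fetched temperature

def get_temperature(fetch_from_db, retries=3):
    """Fetch the current temperature, falling back to the cached value."""
    global _cached_temp
    delay = 1
    for _ in range(retries):
        try:
            _cached_temp = fetch_from_db()
            return _cached_temp
        except ConnectionError:
            time.sleep(delay)
            delay *= 2  # exponential backoff between attempts
    return _cached_temp  # database unavailable: serve the stale reading
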
Q69. Which is not a valid reason for poor Cloud Bigtable performance?

 
 
 
 

Q70. Your software uses a simple JSON format for all messages. These messages are published to Google Cloud Pub/Sub, then processed with Google Cloud Dataflow to create a real-time dashboard for the CFO.
During testing, you notice that some messages are missing in the dashboard. You check the logs, and all messages are being published to Cloud Pub/Sub successfully. What should you do next?

 
 
 
 

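When messages reach Pub/Sub but not the dashboard, the Dataflow pipeline is the place to look. A sketch of a dead-letter pattern in the Beam Python SDK that routes unparseable messages to a side output for inspection instead of silently dropping them:

import json

import apache_beam as beam

class ParseJson(beam.DoFn):
    def process(self, element):
        try:
            yield json.loads(element.decode("utf-8"))
        except (ValueError, UnicodeDecodeError):
            # Route bad messages to a side output for inspection.
            yield beam.pvalue.TaggedOutput("dead_letter", element)

# Inside a pipeline, split the stream into parsed and dead-letter outputs:
# parsed, dead = (messages
#                 | beam.ParDo(ParseJson()).with_outputs("dead_letter",
#                                                        main="parsed"))
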
Q71. You work for a financial institution that lets customers register online. As new customers register, their user data is sent to Pub/Sub before being ingested into BigQuery. For security reasons, you decide to redact your customers’ government-issued identification numbers while allowing customer service representatives to view the original values when necessary. What should you do?

 
 
 
 

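This scenario points at Cloud DLP. A minimal sketch of de-identifying an ID-like infoType with the DLP Python client; for reversible redaction, a crypto-based transformation (e.g. format-preserving encryption with a KMS-wrapped key) would replace the simple one shown here. The project ID is hypothetical:

from google.cloud import dlp_v2

dlp = dlp_v2.DlpServiceClient()
response = dlp.deidentify_content(
    request={
        "parent": "projects/my-project",
        "inspect_config": {
            "info_types": [{"name": "US_SOCIAL_SECURITY_NUMBER"}]
        },
        "deidentify_config": {
            "info_type_transformations": {
                "transformations": [{
                    # Swap for a crypto transformation to allow re-identification.
                    "primitive_transformation": {
                        "replace_with_info_type_config": {}
                    }
                }]
            }
        },
        "item": {"value": "SSN: 123-45-6789"},
    }
)
print(response.item.value)  # e.g. "SSN: [US_SOCIAL_SECURITY_NUMBER]"
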
Q72. You want to use a BigQuery table as a data sink. In which writing mode(s) can you use BigQuery as a sink?

 
 
 
 

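For reference, the Beam/Dataflow BigQuery sink is usable from both batch and streaming pipelines. A minimal batch sketch with the Python SDK; the table spec and schema are hypothetical:

import apache_beam as beam

with beam.Pipeline() as p:
    (p
     | beam.Create([{"name": "alice", "score": 1}])
     | beam.io.WriteToBigQuery(
           "my-project:my_dataset.my_table",
           schema="name:STRING,score:INTEGER",
           write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
           create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED))
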
Q73. Case Study: 2 – MJTelco
Company Overview
MJTelco is a startup that plans to build networks in rapidly growing, underserved markets around the world. The company has patents for innovative optical communications hardware. Based on these patents, they can create many reliable, high-speed backbone links with inexpensive hardware.
Company Background
Founded by experienced telecom executives, MJTelco uses technologies originally developed to overcome communications challenges in space. Fundamental to their operation, they need to create a distributed data infrastructure that drives real-time analysis and incorporates machine learning to continuously optimize their topologies. Because their hardware is inexpensive, they plan to overdeploy the network allowing them to account for the impact of dynamic regional politics on location availability and cost. Their management and operations teams are situated all around the globe creating a many-to-many relationship between data consumers and providers in their system. After careful consideration, they decided public cloud is the perfect environment to support their needs.
Solution Concept
MJTelco is running a successful proof-of-concept (PoC) project in its labs. They have two primary needs:
Scale and harden their PoC to support significantly more data flows generated when they ramp to more than 50,000 installations.
Refine their machine-learning cycles to verify and improve the dynamic models they use to control topology definition.
MJTelco will also use three separate operating environments (development/test, staging, and production) to meet the needs of running experiments, deploying new features, and serving production customers.
Business Requirements
Scale up their production environment with minimal cost, instantiating resources when and where needed in an unpredictable, distributed telecom user community. Ensure security of their proprietary data to protect their leading-edge machine learning and analysis.
Provide reliable and timely access to data for analysis from distributed research workers. Maintain isolated environments that support rapid iteration of their machine-learning models without affecting their customers.
Technical Requirements
Ensure secure and efficient transport and storage of telemetry data. Rapidly scale instances to support between 10,000 and 100,000 data providers with multiple flows each.
Allow analysis and presentation against data tables tracking up to 2 years of data, storing approximately 100m records/day.
Support rapid iteration of monitoring infrastructure focused on awareness of data pipeline problems both in telemetry flows and in production learning cycles.
CEO Statement
Our business model relies on our patents, analytics and dynamic machine learning. Our inexpensive hardware is organized to be highly reliable, which gives us cost advantages. We need to quickly stabilize our large distributed data pipelines to meet our reliability and capacity commitments.
CTO Statement
Our public cloud services must operate as advertised. We need resources that scale and keep our data secure. We also need environments in which our data scientists can carefully study and quickly adapt our models. Because we rely on automation to process our data, we also need our development and test environments to work as we iterate.
CFO Statement
The project is too large for us to maintain the hardware and software required for the data and analysis.
Also, we cannot afford to staff an operations team to monitor so many data feeds, so we will rely on automation and infrastructure. Google Cloud’s machine learning will allow our quantitative researchers to work on our high-value problems instead of problems with our data pipelines.
MJTelco is building a custom interface to share data. They have these requirements:
They need to do aggregations over their petabyte-scale datasets, and they need to scan rows in specific time ranges with very fast response times (milliseconds). Which combination of Google Cloud Platform products should you recommend?

 
 
 
 

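The two requirements pull in different directions: petabyte-scale aggregation suggests BigQuery, while millisecond time-range row scans suggest Cloud Bigtable. A sketch of the Bigtable side, assuming hypothetical row keys of the form <link_id>#<timestamp> so a time range maps to a contiguous key range:

from google.cloud import bigtable

client = bigtable.Client(project="my-project")
table = client.instance("my-instance").table("telemetry")

# Scan one link's rows for a single day; contiguous keys keep this fast.
rows = table.read_rows(start_key=b"link42#20221109T00",
                       end_key=b"link42#20221109T24")
for row in rows:
    print(row.row_key)
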
Q74. You have enabled the free integration between Firebase Analytics and Google BigQuery. Firebase now automatically creates a new table daily in BigQuery in the format app_events_YYYYMMDD. You want to query all of the tables for the past 30 days in legacy SQL. What should you do?

 
 
 
 

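Legacy SQL addresses daily-sharded tables with the TABLE_DATE_RANGE() function. A sketch of querying the last 30 days of app_events_YYYYMMDD tables via the Python client; the dataset and column names are hypothetical:

from google.cloud import bigquery

client = bigquery.Client()
query = """
    SELECT event_name, COUNT(*) AS events
    FROM TABLE_DATE_RANGE([my_dataset.app_events_],
                          DATE_ADD(CURRENT_TIMESTAMP(), -30, 'DAY'),
                          CURRENT_TIMESTAMP())
    GROUP BY event_name
"""
job_config = bigquery.QueryJobConfig(use_legacy_sql=True)
for row in client.query(query, job_config=job_config).result():
    print(row.event_name, row.events)
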
Q75. Your company is loading comma-separated values (CSV) files into Google BigQuery. The import completes successfully; however, the imported data does not match the source file byte for byte. What is the most likely cause of this problem?

 
 
 
 

Q76. Your company is streaming real-time sensor data from their factory floor into Bigtable and they have noticed extremely poor performance. How should the row key be redesigned to improve Bigtable performance on queries that populate real-time dashboards?

 
 
 
 

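The usual culprit here is a timestamp-first row key, which funnels all writes to one node. A sketch of a key design that distributes writes across sensors while keeping each sensor's newest data first (field names hypothetical):

def make_row_key(sensor_id: str, ts_millis: int) -> bytes:
    # Lead with the sensor ID so concurrent writes spread across tablets;
    # a reversed timestamp sorts each sensor's newest cells first.
    reversed_ts = 2**63 - 1 - ts_millis
    return f"{sensor_id}#{reversed_ts:019d}".encode()

print(make_row_key("sensor-17", 1668000000000))
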
Q77. You’re training a model to predict housing prices based on an available dataset with real estate properties. Your plan is to train a fully connected neural net, and you’ve discovered that the dataset contains the latitude and longitude of the property. Real estate professionals have told you that the location of the property is highly influential on price, so you’d like to engineer a feature that incorporates this physical dependency.
What should you do?

 
 
 
 

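One classic way to encode this dependency is to bucketize latitude and longitude and cross the buckets, letting the model learn neighborhood-level effects. A sketch with TensorFlow feature columns; the boundary values are placeholders:

import tensorflow as tf

lat = tf.feature_column.numeric_column("latitude")
lon = tf.feature_column.numeric_column("longitude")
lat_buckets = tf.feature_column.bucketized_column(
    lat, boundaries=[32.0, 36.0, 40.0])
lon_buckets = tf.feature_column.bucketized_column(
    lon, boundaries=[-124.0, -120.0, -116.0])
# The cross gives every (lat bucket, lon bucket) cell its own feature.
location = tf.feature_column.crossed_column(
    [lat_buckets, lon_buckets], hash_bucket_size=1000)
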
Q78. You are analyzing the price of a company’s stock. Every 5 seconds, you need to compute a moving average of the past 30 seconds’ worth of data. You are reading data from Pub/Sub and using Cloud Dataflow to conduct the analysis. How should you set up your windowed pipeline?

 
 
 
 

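A 30-second window advancing every 5 seconds is a sliding window. A runnable Beam Python sketch with in-memory data standing in for the Pub/Sub source; symbols and prices are made up:

import apache_beam as beam
from apache_beam.transforms import window
from apache_beam.transforms.combiners import MeanCombineFn

with beam.Pipeline() as p:
    (p
     | beam.Create([("GOOG", 101.0, 0), ("GOOG", 103.0, 5), ("GOOG", 99.0, 12)])
     # Attach event timestamps; ReadFromPubSub would supply these in production.
     | beam.Map(lambda x: window.TimestampedValue((x[0], x[1]), x[2]))
     | beam.WindowInto(window.SlidingWindows(size=30, period=5))
     | beam.CombinePerKey(MeanCombineFn())  # moving average per symbol
     | beam.Map(print))
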
Q79. You need to choose a database to store time series CPU and memory usage for millions of computers. You need to store this data in one-second interval samples. Analysts will be performing real-time, ad hoc analytics against the database. You want to avoid being charged for every query executed and ensure that the schema design will allow for future growth of the dataset. Which database and data model should you choose?

 
 
 
 

Q80. Your globally distributed auction application allows users to bid on items. Occasionally, users place identical bids at nearly identical times, and different application servers process those bids. Each bid event contains the item, amount, user, and timestamp. You want to collate those bid events into a single location in real time to determine which user bid first. What should you do?

 
 
 
 

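Collating globally distributed events in real time typically means publishing them to Pub/Sub and resolving winners in Dataflow. A sketch of keeping the earliest-timestamped bid per item in Beam; field names and values are made up:

import apache_beam as beam

bids = [
    {"item": "vase", "user": "a", "amount": 10, "timestamp": 1668000000.120},
    {"item": "vase", "user": "b", "amount": 10, "timestamp": 1668000000.119},
]

with beam.Pipeline() as p:
    (p
     | beam.Create(bids)  # ReadFromPubSub in the real pipeline
     | beam.Map(lambda b: (b["item"], b))
     # min() is associative, so it combines correctly across workers.
     | beam.CombinePerKey(lambda bs: min(bs, key=lambda b: b["timestamp"]))
     | beam.Map(print))
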
Career Opportunities

Certified individuals can explore a variety of job opportunities. Some of the positions they can take up include Software Engineer, Cloud Architect, Data Engineer, Sales Engineer, Data Scientist, Cloud Developer, and Kubernetes Architect, among others. The average salary outlook for these roles is $128,500 per annum.

 

Dumps Brief Outline Of The Professional-Data-Engineer Exam: https://www.prepawaypdf.com/Google/Professional-Data-Engineer-practice-exam-dumps.html
