Best Way To Study For Databricks Databricks-Certified-Professional-Data-Engineer Exam Brilliant Databricks-Certified-Professional-Data-Engineer Exam Questions PDF
Updated Verified Pass Databricks-Certified-Professional-Data-Engineer Exam - Real Questions and Answers
Passing the Databricks Certified Professional Data Engineer exam is a significant achievement for any data engineer. It demonstrates that the candidate has a high level of expertise in working with Databricks and can design and manage complex data pipelines. The certification is also highly valued by employers and can lead to new career opportunities and higher salaries.
NEW QUESTION # 31
A data engineer has written the following query:
SELECT *
FROM json.`/path/to/json/file.json`;
The data engineer asks a colleague for help to convert this query for use in a Delta Live Tables (DLT)
pipeline. The query should create the first table in the DLT pipeline.
Which of the following describes the change the colleague needs to make to the query?
- A. They need to add the cloud_files(...) wrapper to the JSON file path
- B. They need to add a COMMENT line at the beginning of the query
- C. They need to add a live. prefix prior to json. in the FROM line
- D. They need to add a CREATE LIVE TABLE table_name AS line at the beginning of the query
- E. They need to add a CREATE DELTA LIVE TABLE table_name AS line at the beginning of the query
Answer: D
NEW QUESTION # 32
The data analyst team put together queries that identify out-of-stock items based on orders and replenishment, but when the queries are run together to produce the final output, they take a very long time. You were asked to find out why the queries are slow and to identify steps to improve performance. You noticed that all of the queries run sequentially on a SQL endpoint cluster. Which of the following steps can be taken to resolve the issue?
Here is the example query:
-- get order summary
create or replace table orders_summary
as
select product_id, sum(order_count) order_count
from
  (
    select product_id, order_count from orders_instore
    union all
    select product_id, order_count from orders_online
  )
group by product_id;

-- get supply summary
create or replace table supply_summary
as
select product_id, sum(supply_count) supply_count
from supply
group by product_id;

-- get on hand based on orders summary and supply summary
with stock_cte
as (
  select nvl(s.product_id, o.product_id) as product_id,
         nvl(supply_count, 0) - nvl(order_count, 0) as on_hand
  from supply_summary s
  full outer join orders_summary o
    on s.product_id = o.product_id
)
select *
from stock_cte
where on_hand = 0;
- A. Turn on the Auto Stop feature for the SQL endpoint.
- B. Turn on the Serverless feature for the SQL endpoint and change the Spot Instance Policy to "Reliability Optimized."
- C. Increase the cluster size of the SQL endpoint.
- D. Turn on the Serverless feature for the SQL endpoint.
- E. Increase the maximum bound of the SQL endpoint's scaling range.
Answer: C
Explanation:
The answer is to increase the cluster size of the SQL endpoint. The queries here run sequentially, and a single query cannot span more than one cluster, so adding more clusters will not make any one query faster. Increasing the cluster size gives each query additional compute in the warehouse and improves performance.
Note that the exam will not give additional context; you have to look for cue words and work out whether the queries run sequentially or concurrently. If the queries run sequentially, scale up (a larger cluster with more nodes). If the queries run concurrently (more users), scale out (more clusters).
On Azure, increasing the cluster size adds more worker nodes.
A SQL endpoint scales both horizontally (scale-out) and vertically (scale-up); you have to understand when to use which.
Scale-up -> Increase the size of the cluster from X-Small to Small, to Medium, to X-Large, and so on.
If you are trying to improve the performance of a single query, additional memory, nodes, and CPU in the cluster will improve performance.
Scale-out -> Add more clusters by changing the maximum number of clusters.
If you are trying to improve throughput, that is, to run as many queries as possible, additional clusters will improve performance.
NEW QUESTION # 33
What is the output of the function below when it is executed with input parameters 1, 3?
def check_input(x, y):
    if x < y:
        x = x + 1
        if x < y:
            x = x + 1
            if x < y:
                x = x + 1
    return x
check_input(1, 3)
- A. 0
- B. 3
- C. 1
- D. 2
- E. 3
Answer: B
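To see why the result is 3, it can help to trace the value of x step by step. The sketch below assumes the three checks are nested (the original indentation was lost); trace_check_input is a hypothetical helper added only for illustration and is not part of the exam question.

```python
def check_input(x, y):
    # Nested checks: each inner test only runs if the outer one passed.
    if x < y:
        x = x + 1
        if x < y:
            x = x + 1
            if x < y:
                x = x + 1
    return x


def trace_check_input(x, y):
    """Hypothetical helper: list every value x takes, at most three increments."""
    values = [x]
    for _ in range(3):
        if not x < y:
            break  # the first failed comparison ends the chain
        x = x + 1
        values.append(x)
    return values


print(check_input(1, 3))        # 3
print(trace_check_input(1, 3))  # [1, 2, 3]
```

The first two comparisons (1 < 3 and 2 < 3) succeed, the third (3 < 3) fails, so x stops at 3.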
NEW QUESTION # 34
Which method is used to solve for the coefficients b0, b1, ... bn in your linear regression model?
- A. Apriori Algorithm
- B. Ridge and Lasso
- C. Ordinary Least squares
- D. Integer programming
Answer: C
Explanation:
Y = b0 + b1x1 + b2x2 + ... + bnxn
In the linear model, the bi's represent the unknown parameters. The estimates for these unknown parameters
are chosen so that, on average, the model provides a reasonable estimate of a person's income based on age
and education. In other words, the fitted model should minimize the overall error between the linear model and
the actual observations. Ordinary Least Squares (OLS) is a common technique to estimate the parameters.
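As a minimal sketch of what OLS computes, the example below fits a model with a single predictor, Y = b0 + b1x, using the closed-form least-squares solution. The function name ols_fit and the age/income numbers are made up for illustration; with several predictors the same idea is usually solved with a linear algebra library instead.

```python
def ols_fit(xs, ys):
    """Return (b0, b1) minimizing the sum of squared errors for y = b0 + b1*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Made-up data: income (in thousands) rises by exactly 1 per year of age.
ages = [25, 35, 45, 55]
incomes = [30.0, 40.0, 50.0, 60.0]
b0, b1 = ols_fit(ages, incomes)
print(b0, b1)  # 5.0 1.0
```

Because the sample data lie exactly on a line, the fitted error is zero; with noisy data the same formulas give the line with the smallest total squared error.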
NEW QUESTION # 35
Which of the following Structured Streaming queries is performing a hop from a Bronze table to a Silver
table?
- A. (spark.read.load(rawSalesLocation)
      .writeStream
      .option("checkpointLocation", checkpointPath)
      .outputMode("append")
      .table("uncleanedSales")
  )
- B. (spark.table("sales")
      .agg(sum("sales"),
           sum("units"))
      .writeStream
      .option("checkpointLocation", checkpointPath)
      .outputMode("complete")
      .table("aggregatedSales")
  )
- C. (spark.table("sales")
      .groupBy("store")
      .agg(sum("sales"))
      .writeStream
      .option("checkpointLocation", checkpointPath)
      .outputMode("complete")
      .table("aggregatedSales")
  )
- D. (spark.readStream.load(rawSalesLocation)
      .writeStream
      .option("checkpointLocation", checkpointPath)
      .outputMode("append")
      .table("uncleanedSales")
  )
- E. (spark.table("sales")
      .withColumn("avgPrice", col("sales") / col("units"))
      .writeStream
      .option("checkpointLocation", checkpointPath)
      .outputMode("append")
      .table("cleanedSales")
  )
Answer: E
NEW QUESTION # 36
You would like to build a Spark streaming process to read from a Kafka queue and write to a Delta table every
15 minutes. What is the correct trigger option?
- A. trigger(process "15 minutes")
- B. trigger(15)
- C. trigger(processingTime = "15 Minutes")
- D. trigger(processingTime = 15)
- E. trigger("15 minutes")
Answer: C
Explanation:
The answer is trigger(processingTime = "15 Minutes")
Triggers:
* Unspecified
This is the default. It is equivalent to using processingTime="500ms".
* Fixed interval micro-batches: .trigger(processingTime="2 minutes")
The query is executed in micro-batches that are kicked off at the user-specified interval.
* One-time micro-batch: .trigger(once=True)
The query executes a single micro-batch to process all the available data and then stops on its own.
* Available-now micro-batches: .trigger(availableNow=True)
This is a newer feature and a better version of once=True. Databricks supports trigger(availableNow=True) in Databricks Runtime 10.2 and above for Delta Lake and Auto Loader sources. This functionality combines the batch processing approach of trigger once with the ability to configure batch size, resulting in multiple parallelized batches that give greater control for right-sizing batches and the resultant files.
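The fixed-interval trigger from the answer can be sketched in PySpark as follows. The function name and its three arguments are hypothetical; stream_df is assumed to be a streaming DataFrame that was already built with spark.readStream.format("kafka"), and the Delta format and append output mode are assumptions about the target table rather than part of the question.

```python
def start_kafka_to_delta(stream_df, checkpoint_path, target_table):
    """Start a streaming write that fires one micro-batch every 15 minutes.

    stream_df is assumed to be a streaming DataFrame (for example one read
    from Kafka); checkpoint_path and target_table are hypothetical names.
    """
    return (
        stream_df.writeStream
        .format("delta")                                # write to a Delta table
        .outputMode("append")                           # only add new rows
        .option("checkpointLocation", checkpoint_path)  # track stream progress
        .trigger(processingTime="15 minutes")           # fixed-interval micro-batches
        .toTable(target_table)                          # starts the streaming query
    )
```

Swapping the trigger line for .trigger(availableNow=True) would instead process everything currently available and then stop.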
NEW QUESTION # 37
Which of the following are stored in the control plane of Databricks Architecture?
- A. All Purpose Clusters
- B. Delta tables
- C. Job Clusters
- D. Databricks Filesystem
- E. Databricks Web Application
Answer: E
Explanation:
The answer is Databricks Web Application
Azure Databricks architecture overview - Azure Databricks | Microsoft Docs
Databricks operates most of its services out of a control plane and a data plane. Please note that serverless features like SQL Endpoint and DLT compute use shared compute in the control plane.
Control Plane: Stored in Databricks Cloud Account
* The control plane includes the backend services that Databricks manages in its own Azure account.
Notebook commands and many other workspace configurations are stored in the control plane and encrypted at rest.
Data Plane: Stored in Customer Cloud Account
* The data plane is managed by your Azure account and is where your data resides. This is also where data is processed. You can use Azure Databricks connectors so that your clusters can connect to external data sources outside of your Azure account to ingest data or for storage.
NEW QUESTION # 38
Which of the following commands can be used to query a delta table?
- A. %python
  spark.sql("select * from table_name")
- B. %sql
  Select * from table_name
- C. Both A & B
- D. %python
  execute.sql("select * from table")
- E. %python
  delta.sql("select * from table")
Answer: C
Explanation:
The answer is both options A and B
Options D and E are incorrect because there is no command in Spark called execute.sql or delta.sql
NEW QUESTION # 39
A newly joined data analyst has requested read-only access to tables. Assuming you are the owner/admin, which section of the Databricks platform facilitates granting SELECT access to the user?
- A. Admin console
- B. Data explorer
- C. User settings
- D. Azure Databricks control plane IAM
- E. Azure RBAC
Answer: B
Explanation:
The answer is Data explorer
https://docs.databricks.com/sql/user/data/index.html
Data explorer lets you easily explore and manage permissions on databases and tables. Users can view schema details, preview sample data, and see table details and properties. Administrators can view and change owners, and admins and data object owners can grant and revoke permissions.
To open data explorer, click Data in the sidebar.
NEW QUESTION # 40
The view updates represents an incremental batch of all newly ingested data to be inserted or updated in the customers table.
The following logic is used to process these records.
Which statement describes this implementation?
- A. The customers table is implemented as a Type 2 table; old values are overwritten and new customers are appended.
- B. The customers table is implemented as a Type 1 table; old values are overwritten by new values and no history is maintained.
- C. The customers table is implemented as a Type 0 table; all writes are append only with no changes to existing values.
- D. The customers table is implemented as a Type 2 table; old values are maintained but marked as no longer current and new values are inserted.
- E. The customers table is implemented as a Type 3 table; old values are maintained as a new column alongside the current value.
Answer: D
Explanation:
The logic uses the MERGE INTO command to merge new records from the view updates into the table customers. The MERGE INTO command takes two arguments: a target table and a source table or view. The command also specifies a condition to match records between the target and the source, and a set of actions to perform when there is a match or not. In this case, the condition is to match records by customer_id, which is the primary key of the customers table. The actions are to update the existing record in the target with the new values from the source, and set the current_flag to false to indicate that the record is no longer current; and to insert a new record in the target with the new values from the source, and set the current_flag to true to indicate that the record is current. This means that old values are maintained but marked as no longer current and new values are inserted, which is the definition of a Type 2 table. Verified References: [Databricks Certified Data Engineer Professional], under "Delta Lake" section; Databricks Documentation, under "Merge Into (Delta Lake on Databricks)" section.
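The Type 2 behavior described above can be illustrated with a small in-memory sketch. This is not the MERGE statement from the question and not a Delta Lake API; the function name, the row layout (customer_id, address, current_flag, effective_date, end_date) and the sample data are all assumptions made for illustration.

```python
from datetime import date


def apply_type2_updates(customers, updates, today):
    """Apply a batch of updates to an in-memory Type 2 table (list of dicts).

    Hypothetical schema: customer_id, address, current_flag,
    effective_date, end_date.
    """
    # Index the rows that are currently active, keyed by customer_id.
    current = {r["customer_id"]: r for r in customers if r["current_flag"]}
    for upd in updates:
        old = current.get(upd["customer_id"])
        if old is not None:
            if old["address"] == upd["address"]:
                continue  # nothing changed, keep the current row as is
            # Keep the old row, but mark it as no longer current.
            old["current_flag"] = False
            old["end_date"] = today
        # Insert the new values as the current row.
        new_row = {
            "customer_id": upd["customer_id"],
            "address": upd["address"],
            "current_flag": True,
            "effective_date": today,
            "end_date": None,
        }
        customers.append(new_row)
        current[upd["customer_id"]] = new_row
    return customers


rows = [{"customer_id": 1, "address": "Old St", "current_flag": True,
         "effective_date": date(2023, 1, 1), "end_date": None}]
batch = [{"customer_id": 1, "address": "New Ave"},
         {"customer_id": 2, "address": "Elm Rd"}]
result = apply_type2_updates(rows, batch, date(2024, 1, 1))
print(len(result))  # 3
```

Customer 1 ends up with two rows, the old one flagged as not current and the new one flagged as current, while customer 2 is simply inserted. A Type 1 table would instead overwrite the address in place and keep a single row per customer.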
NEW QUESTION # 41
In order to use Unity Catalog features, which of the following steps needs to be taken on managed/external tables in the Databricks workspace?
- A. Migrate/upgrade workspace objects (managed/external tables and views) to Unity Catalog
- B. Upgrade to DBR version 15.0
- C. Enable unity catalog feature in workspace settings
- D. Upgrade workspace to Unity catalog
- E. Copy data from workspace to unity catalog
Answer: A
Explanation:
Upgrade tables and views to Unity Catalog - Azure Databricks | Microsoft Docs
Managed table: Upgrade a managed table to Unity Catalog
External table: Upgrade an external table to Unity Catalog
NEW QUESTION # 42
How does Lakehouse replace the dependency on using Data lakes and Data warehouses in a Data and Analytics solution?
- A. Open, direct access to data stored in standard data formats.
- B. Support for end-to-end streaming and batch workloads
- C. All the above
- D. Supports BI and Machine learning workloads
- E. Supports ACID transactions.
Answer: C
Explanation:
Lakehouse combines the benefits of a data warehouse and a data lake:
Lakehouse = Data Lake + Data Warehouse
Each of the options above is a major benefit of a lakehouse.
NEW QUESTION # 43
Create a sales database using the DBFS location 'dbfs:/mnt/delta/databases/sales.db/'
- A. The sales database can only be created in Delta lake
- B. CREATE DATABASE sales USING LOCATION 'dbfs:/mnt/delta/databases/sales.db/'
- C. CREATE DATABASE sales FORMAT DELTA LOCATION 'dbfs:/mnt/delta/databases/sales.db/'
- D. CREATE DELTA DATABASE sales LOCATION 'dbfs:/mnt/delta/databases/sales.db/'
- E. CREATE DATABASE sales LOCATION 'dbfs:/mnt/delta/databases/sales.db/'
Answer: E
Explanation:
The answer is
CREATE DATABASE sales LOCATION 'dbfs:/mnt/delta/databases/sales.db/'
Note: with the introduction of Unity Catalog and the three-level namespace, SCHEMA and DATABASE are interchangeable.
NEW QUESTION # 44
A data architect has determined that a table of the following format is necessary:
Which of the following code blocks uses SQL DDL commands to create an empty Delta table in the above
format regardless of whether a table already exists with this name?
- A. CREATE OR REPLACE TABLE table_name ( id STRING, birthDate DATE, avgRating FLOAT )
- B. CREATE TABLE table_name AS
  SELECT id STRING, birthDate DATE, avgRating FLOAT
- C. CREATE OR REPLACE TABLE table_name AS
  SELECT id STRING, birthDate DATE, avgRating FLOAT USING DELTA
- D. CREATE TABLE IF NOT EXISTS table_name ( id STRING, birthDate DATE, avgRating FLOAT )
- E. CREATE OR REPLACE TABLE table_name
  WITH COLUMNS ( id STRING, birthDate DATE, avgRating FLOAT ) USING DELTA
Answer: A
NEW QUESTION # 45
A nightly job ingests data into a Delta Lake table using the following code:
The next step in the pipeline requires a function that returns an object that can be used to manipulate new records that have not yet been processed to the next table in the pipeline.
Which code snippet completes this function definition?
def new_records():
- A. return spark.readStream.load("bronze")
- B.

- C.

- D. return spark.read.option("readChangeFeed", "true").table ("bronze")
- E. return spark.readStream.table("bronze")
Answer: C
Explanation:
https://docs.databricks.com/en/delta/delta-change-data-feed.html
NEW QUESTION # 46
A data engineer wants to create a relational object by pulling data from two tables. The relational object must
be used by other data engineers in other sessions. In order to save on storage costs, the data engineer wants to
avoid copying and storing physical data.
Which of the following relational objects should the data engineer create?
- A. View
- B. Database
- C. Delta Table
- D. Spark SQL Table
- E. Temporary view
Answer: A
NEW QUESTION # 47
......
The Databricks Certified Professional Data Engineer certification is designed for data engineers who are responsible for building and maintaining data pipelines and data lakes on the Databricks platform. The exam covers a wide range of topics, including data engineering concepts, data modeling, data ingestion, data transformation, data processing, and data warehousing. It is designed to assess a candidate's ability to design, build, and maintain scalable and reliable data pipelines on the Databricks platform.
Updated PDF (New 2024) Actual Databricks Databricks-Certified-Professional-Data-Engineer Exam Questions: https://dumpstorrent.prep4surereview.com/Databricks-Certified-Professional-Data-Engineer-latest-braindumps.html
