Capital One Data Science Assessment 题目整理 – OA代写 – 面试代面 – OA help - csOAhelp|代码代写|面试OA助攻|面试代面|作业实验代写|考试高分代考

✅ Question 1: Data Collection and Aggregation

Prompt:

You are given access to the data containing information about taxi drivers and their rides, created by April 15th, 2023. When calculating any time features, consider April 15th, 2023 as today. The data is distributed across 6 different files:

`drivers.csv`:

driver_id (type: int) – Unique identifier of the driver
car_id (type: int)
age (type: int)
started_driving_year (type: int)
second_language (type: str) – If a driver doesn't have a second language, the value is "no"
rating (type: float)
net_worth_of_tips (type: float)
driver_class (type: str) – One of the following: ["A class", "B class"]

`rides_{i}.csv` (split into 4 files):

ride_id (type: int)
driver_id (type: int)
passenger_id (type: int)
date (type: str)
status (type: str) – One of the following: ["Rejected by the driver", "Cancelled by the passenger", "Success"]
car_clearness_upvote_given (type: bool)
politeness_upvote_given (type: bool)
communication_upvote_given (type: bool)
punctuality_upvote_given (type: bool)
complaint_given (type: bool)

`cars.csv`:

car_id (type: int)
model (type: str)
manufacture_year (type: int)
last_inspection_date (type: str)

Your task:

Retrieve the needed information from the data about each driver and store it in the collected.csv file.

Your goal is to obtain a table with the following columns. You may order rows and columns in any way you find comfortable to work with, tests are designed to be order-agnostic:

driver_id (int)
car_model (str)
car_manufacture_year (int)
days_since_inspection (int)
age (int)
experience (int)
second_language (str)
rating (float)
net_worth_of_tips (float)
number_of_upvotes (int)
driver_class (str)

✅ Question 2: Basic Analysis and CSV Output

Prompt:

Your tasks are as follows:

Calculate the average driver rating.

Compute the average of the rating column from drivers.csv
Store the result as:
- insight_type: "average_driver_rating"
- value: The calculated average (float)

Calculate the percentage of drivers with a second language.

Determine the percentage of drivers who have a second language (i.e., where second_language is not "no")
Store the result as:
- insight_type: "percentage_drivers_with_second_language"
- value: The calculated percentage

Calculate the ride success rate.

Combine the ride data from all rides_{i}.csv files into a single dataset
Calculate the percentage of rides that were successful (i.e., where status is "Success")
Store the result as:
- insight_type: "ride_success_rate"
- value: The calculated percentage

Output Requirements:

Save the results in a CSV file named analysis_results.csv
The CSV file should have two columns:
- insight_type
- value

✅ Question 3: Preprocessing for Classification

Prompt:

The dataset includes various attributes for each driver.

Your tasks:

a. Encode the car_model column using numerical encoding.
Encoding should start from 0 and encode values with consecutive integers.

Example of correct mapping:

"Nissan Altima" -> 3  
"Ford Fusion"   -> 1  
"Honda Accord"  -> 0  
"Hyundai Sonata"-> 2

The set of codes is (0, 1, 2, 3); it is a consecutive sequence from zero.

Example of incorrect mapping:

"Nissan Altima" -> 5  
"Ford Fusion"   -> 4  
"Honda Accord"  -> 2  
"Hyundai Sonata"-> 1

The set of codes is (1, 2, 4, 5); it is not a consecutive sequence starting from zero.

b. Normalize the net_worth_of_tips column using Standard Scaling.

c. Convert the driver_class column into numerical values:

Replace "A class" with 0
Replace "B class" with 1

✅ Question 4: Classification Model and Predictions

Prompt:

You are given the dataset that is a result of the previous question. To prevent being able to use the dataset for the answer to the previous question, random adjustments were applied, but the format and structure remain the same.

Using the cleaned dataset from the prior question, your goal is to train a classifier that is able to predict whether the driver is of "A class" (0) or of a "B class" (1).

This is a free-form task, so feel free to use any machine-learning model you want. You can also use any Python libraries you want.

The testing set from the previous task was split into validation and test sets. The test set does not contain the classes, and it is used to evaluate model performance on unseen data when questions are submitted. The validation set contains the classes and can be used to evaluate model performance during training.

Your task:

Predict classes of the drivers from test.csv with the lowest possible error.

As an error metric, use precision and recall, considering B class to be the positive class.

Your goal is to maximize recall, while keeping the precision on a relatively high level.

Once you are satisfied with the model’s performance on the validation set val.csv, submit the predicted classes of the drivers in the test set to the platform.

The predicted classes should be saved in a csv file with the name predictions.csv. The file should have the following format:

driver_class
0
1
1
0

我们长期稳定承接各大科技公司如Capital One、TikTok、Google、Amazon等的OA笔试代写服务，确保满分通过。如有需求，请随时联系我们。

We consistently provide professional online assessment services for major tech companies like TikTok, Google, and Amazon, guaranteeing perfect scores. Feel free to contact us if you're interested.

✅ Question 1: Data Collection and Aggregation

drivers.csv:

rides_{i}.csv (split into 4 files):

cars.csv:

Your task:

✅ Question 2: Basic Analysis and CSV Output

Calculate the average driver rating.

Calculate the percentage of drivers with a second language.

Calculate the ride success rate.

Output Requirements:

✅ Question 3: Preprocessing for Classification

Your tasks:

✅ Question 4: Classification Model and Predictions

Your task:

Leave a Reply Cancel reply

`drivers.csv`:

`rides_{i}.csv` (split into 4 files):

`cars.csv`: