Wayfair Data Analyst (DA) role: below is the take-home assessment, which has a 105-minute limit.
Wayfair Technical Analyst OA, limited to 105 minutes. Two of the questions are shown here.
Weather Department Statistics
There is a database containing temperature and humidity statistics by state for various countries. Write a query that returns a list of states with their associated country names, showing each state's average humidity and a weather type derived from its average temperature during November 2018. The average temperature is categorized into a weather type:
- COLD - If 0 ≤ average monthly temperature < 15.
- WARM - If 15 ≤ average monthly temperature < 30.
- HOT - If average monthly temperature ≥ 30.
The output should be formatted as: state.name | country.name | average_monthly_humidity | weather_type, ordered by average humidity descending and, in case of ties, by state name ascending. Average humidity should be displayed with four decimal places.
Database Schema
Tables:
- country
  - id: INTEGER, primary key, the country's id number.
  - name: STRING, the country's name.
- state
  - id: INTEGER, primary key, the state's id number.
  - name: STRING, the state's name.
  - country_id: INTEGER, the country's id number.
- state_weather_stats
  - state_id: INTEGER, foreign key referencing state.id.
  - record_date: DATE, the date when the stat was recorded.
  - temperature: INTEGER, recorded temperature value.
  - humidity: INTEGER, recorded humidity value.
Sample Output
- Alberta | Canada | 47.0000 | WARM
- Bheri | Nepal | 41.0000 | WARM
- British Columbia | Canada | 37.0000 | HOT
- Dhawalagiri | Nepal | 31.5000 | COLD
- Bagmati | Nepal | 20.5000 | COLD
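One way to express the query is sketched below through Python's built-in sqlite3 module so that it can be run end to end. The table and column names come from the schema above. The demo rows, the helper names weather_report and demo_connection, and the use of SQLite's printf for four-decimal formatting are assumptions made for illustration; the real assessment may target a different SQL dialect. The problem statement does not define a weather type for averages below 0, so this sketch lets them fall into COLD.

```python
import sqlite3

# Query sketch: November 2018 averages per state, labelled with a weather type.
QUERY = """
SELECT s.name AS state_name,
       c.name AS country_name,
       printf('%.4f', AVG(w.humidity)) AS average_monthly_humidity,
       CASE
           WHEN AVG(w.temperature) >= 30 THEN 'HOT'
           WHEN AVG(w.temperature) >= 15 THEN 'WARM'
           ELSE 'COLD'  -- assumption: averages below 0 are also treated as COLD
       END AS weather_type
FROM state AS s
JOIN country AS c ON c.id = s.country_id
JOIN state_weather_stats AS w ON w.state_id = s.id
WHERE w.record_date >= '2018-11-01' AND w.record_date < '2018-12-01'
GROUP BY s.id, s.name, c.name
ORDER BY AVG(w.humidity) DESC, s.name ASC
"""


def weather_report(conn):
    """Run the query and return rows of (state, country, humidity, type)."""
    return conn.execute(QUERY).fetchall()


def demo_connection():
    """Build an in-memory database with a few made-up rows (not the real data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE country (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE state (id INTEGER PRIMARY KEY, name TEXT, country_id INTEGER);
        -- SQLite has no native DATE type, so ISO-8601 strings are stored as TEXT.
        CREATE TABLE state_weather_stats (
            state_id INTEGER REFERENCES state(id),
            record_date TEXT, temperature INTEGER, humidity INTEGER);
        INSERT INTO country VALUES (1, 'Canada'), (2, 'Nepal');
        INSERT INTO state VALUES (1, 'Alberta', 1), (2, 'Bagmati', 2);
        INSERT INTO state_weather_stats VALUES
            (1, '2018-11-03', 20, 45), (1, '2018-11-20', 24, 49),
            (1, '2018-12-01', 90, 99),  -- outside November, must be ignored
            (2, '2018-11-10', 5, 20), (2, '2018-11-11', 9, 21);
    """)
    return conn


if __name__ == "__main__":
    for row in weather_report(demo_connection()):
        print(" | ".join(row))
```

Filtering with a half-open date range (from 2018-11-01 up to but not including 2018-12-01) avoids depending on dialect-specific month and year functions.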
Find Top Students
Overview
Given a dataset of marks scored by students in three different subjects (math, reading, and writing), the task is to analyze and process the data using a pandas DataFrame to identify top-performing students based on specific criteria.
Task Description
- Data Cleaning and Preprocessing
  - Remove Incomplete Data: Drop all students who have marks missing in two or more subjects.
  - Impute Missing Data: For students with marks missing in fewer than two subjects, fill in the missing marks with the median score for the respective subject.
- ID Cleaning
  - Standardize IDs: Clean the student_id column by removing all non-numeric characters to simplify identification.
- Calculations
  - Weighted Average: Calculate the weighted average of marks using the following weights:
    - Math: 50%
    - Reading: 20%
    - Writing: 30%
  - Sort and Filter: Order the students by their weighted average in descending order and keep only the students whose weighted average is above 70.
Implementation Steps
- Data Cleaning: Students with incomplete data (missing marks in two or more subjects) are excluded from further analysis.
- Imputation: Use the median of the available scores in each subject to fill the missing score for students who are missing only one mark.
- ID Transformation: Convert student_id entries to purely numeric form by stripping non-numeric characters.
- Score Calculation and Sorting: Compute the weighted average for each student and sort the DataFrame by these scores in descending order.
- Result Extraction: Extract and return the student_id of students whose weighted average score exceeds 70.
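The steps above can be sketched as a single pandas function. This is a minimal sketch, not a reference solution. The column names follow the example table in this post. Several details are assumptions, because the problem statement does not pin them down: missing marks may arrive either as NaN or as text such as "Missing", medians are computed after the incomplete rows are dropped, and cleaned IDs are returned as digit strings. The demo rows at the bottom are made up.

```python
import pandas as pd

SCORE_COLS = ["math_score", "reading_score", "writing_score"]
WEIGHTS = {"math_score": 0.5, "reading_score": 0.2, "writing_score": 0.3}


def topStudents(df):
    df = df.copy()

    # Assumption: anything non-numeric (e.g. the text "Missing") counts as missing.
    df[SCORE_COLS] = df[SCORE_COLS].apply(pd.to_numeric, errors="coerce")

    # Drop students who are missing marks in two or more subjects.
    df = df[df[SCORE_COLS].isna().sum(axis=1) < 2].copy()

    # Fill each remaining gap with that subject's median.
    # Assumption: the median is taken over the rows that survived the drop.
    df[SCORE_COLS] = df[SCORE_COLS].fillna(df[SCORE_COLS].median())

    # Keep only the digits of each student_id.
    df["student_id"] = (
        df["student_id"].astype(str).str.replace(r"\D", "", regex=True)
    )

    # Weighted average: 50% math, 20% reading, 30% writing.
    df["weighted_avg"] = sum(df[col] * w for col, w in WEIGHTS.items())

    # Sort by score, then keep the IDs of students strictly above 70.
    df = df.sort_values("weighted_avg", ascending=False)
    return df.loc[df["weighted_avg"] > 70, ["student_id"]].reset_index(drop=True)


if __name__ == "__main__":
    # Hypothetical rows for illustration only.
    demo = pd.DataFrame({
        "student_id": ["sde123as", "ml1256w", "as34erty", "x77y"],
        "math_score": [95, 50, 90, 80],
        "reading_score": [90, 70, "Missing", 85],
        "writing_score": ["Missing", 10, "Missing", 90],
    })
    print(topStudents(demo)["student_id"].tolist())  # ['77', '123']
```

Returning the IDs as strings keeps any leading zeros intact; converting them with astype(int) is a reasonable alternative if the grader expects integers.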
Example
Given the following DataFrame:
student_id | math_score | reading_score | writing_score |
---|---|---|---|
sde123as | 95 | 90 | Missing |
ml1256w | 50 | 70 | 10 |
as34erty | 90 | Missing | Missing |
After cleaning and processing, the DataFrame showing top students might look like:
student_id |
---|
123 |
Explanation
In the given example, only student_id sde123as (cleansed to 123) met all the criteria after imputation and computation of the weighted average. The student's final score surpassed the threshold of 70, qualifying them as a top student.
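The arithmetic behind the example can be checked with a few lines of pandas. The sketch below assumes the three rows shown are the entire dataset and represents "Missing" as NaN; the variable names are illustrative. Under that assumption the only observed writing score is 10, so the imputed writing mark for sde123as is 10 and the weighted average is 0.5 × 95 + 0.2 × 90 + 0.3 × 10 = 68.5. The stated result of 123 therefore seems to rely on medians taken from a fuller dataset than the excerpt shown here.

```python
import pandas as pd

NAN = float("nan")

# The three rows from the example table, with "Missing" stored as NaN.
df = pd.DataFrame({
    "student_id": ["sde123as", "ml1256w", "as34erty"],
    "math_score": [95.0, 50.0, 90.0],
    "reading_score": [90.0, 70.0, NAN],
    "writing_score": [NAN, 10.0, NAN],
})
score_cols = ["math_score", "reading_score", "writing_score"]

# as34erty is missing two subjects and is dropped.
kept = df[df[score_cols].isna().sum(axis=1) < 2].copy()

# Fill the remaining gap with the per-subject median of the kept rows.
kept[score_cols] = kept[score_cols].fillna(kept[score_cols].median())

# Weighted average: 50% math, 20% reading, 30% writing.
kept["weighted"] = (
    0.5 * kept["math_score"]
    + 0.2 * kept["reading_score"]
    + 0.3 * kept["writing_score"]
)
example_scores = dict(zip(kept["student_id"], kept["weighted"]))
print(example_scores)
```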
Function Description
- Function: topStudents(df)
- Parameters: A pandas DataFrame df containing the student marks.
- Returns: A pandas DataFrame listing the student IDs of the top students.
Constraints
- The DataFrame df will have at least 1 row and no more than 1000 rows.