[BBC] 120min OA 2025 start – 24 Apr (generic)

Classify Customer Feedback

Business Logic

Given a dataset of customers' feedback classified as Positive or Negative, build a deep neural network to classify feedback accurately. A review is categorized based on tone, words, length, and the style of writing by the customers.

Datasets

  • train.csv – data used to train the model
  • test.csv – data used to test predictions
  • submissions.csv – populate this file with the results
  • sample_submission.csv – sample reference of submission data file

Schema

Field            Type  Description
customer_review  str   customer review
feedback         int   feedback (0-Negative, 1-Positive)

Predictive Modeling and Model Evaluation

Build a neural network to classify customer feedback.
Experiment with different preprocessing methods, numbers of layers, types of layers, activation functions, and any other relevant parameters. Compile the model by specifying the loss function and optimizer. Ensure that the model is not overfitting.
Assess model performance on train.csv using the F1 score metric (the harmonic mean of precision and recall).
The model will be tested for robustness using a different dataset.
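
For reference, the F1 score on binary labels can be computed by hand as below. This is a minimal pure-Python sketch: the function name `f1_score_binary` and the sample labels are illustrative and not part of the assessment. In a notebook, a library routine such as scikit-learn's `f1_score` would normally be used instead.

```python
def f1_score_binary(y_true, y_pred, positive=1):
    """F1 for the positive class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up labels: 3 true positives, 1 false positive, 1 false negative.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]
score = f1_score_binary(y_true, y_pred)
```

Comparing this score on the training data with the score on a held-out validation split is one simple way to spot overfitting.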

Submission

For each record in the test set (test.csv), predict the value of the feedback variable.
Submit a CSV file with a header row and one row per test entry. The file (submissions.csv) should have exactly 2 columns:

  • customer_review
  • feedback (0/1)
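
The required file layout can be produced with Python's standard `csv` module. The sketch below is one possible approach, not a prescribed one: the helper name `write_submission` and the two sample reviews are made up, and real inputs would come from test.csv and the trained model.

```python
import csv
import io


def write_submission(reviews, predictions, out_file):
    """Write a header row plus one (customer_review, feedback) row per entry."""
    if len(reviews) != len(predictions):
        raise ValueError("need exactly one prediction per review")
    writer = csv.writer(out_file, lineterminator="\n")
    writer.writerow(["customer_review", "feedback"])
    for review, label in zip(reviews, predictions):
        # int() guards against labels arriving as floats or booleans.
        writer.writerow([review, int(label)])


# Illustrative data written to an in-memory buffer.
buffer = io.StringIO()
write_submission(["Great service, thanks!", "Slow and rude"], [1, 0], buffer)
submission_text = buffer.getvalue()
```

To write the actual file, pass `open("submissions.csv", "w", newline="", encoding="utf-8")` as `out_file`. The `csv` writer quotes reviews that contain commas, so the file keeps exactly 2 columns.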

Test Evaluation

The score will be automatically evaluated based on the value of the F1 metric for the submissions.csv file.
Get the best possible value of the metric during model development.
The reviewer might dive deeper into the Jupyter notebook to get more context.

Bag of Words

Preprocessing Output

Consider the following sentences:
Sentence 1: "Welcome to HackerRank Learning. Now start Learning"
Sentence 2: "Learning is very important"

What will the output of the bag of words be after preprocessing?

Sentence    Welcome  HackerRank  Learning  now  start  very  important
Sentence 1  1        1           2         1    1      0     0
Sentence 2  0        0           1         0    0      1     1
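
The expected counts can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the preprocessing is taken to be punctuation stripping, lowercasing, and removal of the stop words "to" and "is" (the source does not spell out its preprocessing steps), and the helper names are illustrative.

```python
import re
from collections import Counter

# Assumed stop-word list, just large enough for these two sentences.
STOP_WORDS = {"to", "is"}


def tokenize(sentence):
    """Lowercase, keep alphabetic tokens only, and drop stop words."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return [w for w in words if w not in STOP_WORDS]


def bag_of_words(sentences):
    """Return (vocabulary in first-seen order, one count vector per sentence)."""
    tokenized = [tokenize(s) for s in sentences]
    vocab = []
    for tokens in tokenized:
        for tok in tokens:
            if tok not in vocab:
                vocab.append(tok)
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([counts[tok] for tok in vocab])
    return vocab, vectors


sentences = [
    "Welcome to HackerRank Learning. Now start Learning",
    "Learning is very important",
]
vocab, vectors = bag_of_words(sentences)
```

Here `vocab` lists the seven remaining words in order of first appearance, and `vectors` holds one row of counts per sentence.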

Sequence Labeling with BiLSTM for Text Classification

Code Snippet

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Bidirectional, LSTM, Dense

# Define the model architecture
model = Sequential()
model.add(Embedding(input_dim=vocab_size, output_dim=embedding_dim, input_length=max_seq_length))
model.add(Bidirectional(LSTM(units=hidden_units, return_sequences=True)))
model.add(Dense(num_labels, activation='softmax'))

# Compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Train the model
model.fit(x_train, y_train, epochs=num_epochs, batch_size=batch_size)

Question

What is a potential error or limitation in the code snippet that may cause the model to fail to converge during training?

Answer: An incorrect loss function is chosen for the sequence labeling task.
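
The snippet does not show how `y_train` is encoded, so whether the loss is wrong depends on the label format. One common reading, stated here as an assumption, is that the labels are integer class indices per token. In that case Keras's `sparse_categorical_crossentropy` is the matching loss, while `categorical_crossentropy` expects one-hot vectors. The pure-Python sketch below (helper names are illustrative) shows that the two losses compute the same quantity and differ only in the label format they accept.

```python
import math


def categorical_crossentropy(one_hot, probs):
    """Cross-entropy when the label is a one-hot vector."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))


def sparse_categorical_crossentropy(class_index, probs):
    """Cross-entropy when the label is an integer class index."""
    return -math.log(probs[class_index])


# Softmax output for a single token over three hypothetical labels.
probs = [0.7, 0.2, 0.1]
loss_one_hot = categorical_crossentropy([0, 1, 0], probs)
loss_sparse = sparse_categorical_crossentropy(1, probs)
```

Both calls score the same prediction against class 1, so pairing each loss with the label encoding it expects is what matters.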

Non-Convergence

Scenario

A large language model is fine-tuned on a sentiment analysis task using instruction-based tuning, but the training loss does not converge. The instruction given to the model turns out to be insufficient: it tells the LLM very little about the task.

Actions to Improve Convergence

  • Increase the number of training examples for sentiment classification.
  • Validate the training data consistency and use only consistent training samples.
  • Design more informative instructions, with examples from the sentiment analysis task, for fine-tuning the LLM.
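
As an illustration of the last action, a more informative instruction might state the task, fix the answer format, and include a few labelled demonstrations. The template below is a hypothetical sketch: the wording, the function name `build_instruction_prompt`, and the sample reviews are assumptions, not taken from the assessment.

```python
def build_instruction_prompt(review, examples):
    """Assemble a task description, labelled demonstrations, and the query."""
    lines = [
        "Task: classify the sentiment of a customer review.",
        "Answer with exactly one word: Positive or Negative.",
        "",
    ]
    for text, label in examples:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    # The query comes last, leaving the label for the model to fill in.
    lines.append(f"Review: {review}")
    lines.append("Sentiment:")
    return "\n".join(lines)


# Made-up demonstrations, one per class.
examples = [
    ("The delivery was fast and the staff were friendly.", "Positive"),
    ("The product broke after two days.", "Negative"),
]
prompt = build_instruction_prompt("Support never answered my emails.", examples)
```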

Segmentation Model Training Setup

Scenario

A model is being prepared for an image segmentation task to classify pixels in an image into 2 categories (object and background). 2000 pairs of input and mask (label) images were collected for the training dataset.

Best Practice Training Setup

Answer: Training set: 1600 pairs, validation set: 200 pairs, test set: 200 pairs, steps per epoch: 100, batch size: 16, learning rate: 0.001, loss function: binary crossentropy.
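
The numbers in this answer are mutually consistent. An 80/10/10 split of 2000 pairs gives 1600/200/200, and 1600 training pairs at batch size 16 give exactly 100 steps per epoch, so each epoch covers the training set once. A quick arithmetic check (variable names are illustrative):

```python
total_pairs = 2000
train_pct, val_pct = 80, 10  # the remaining 10% is the test set
batch_size = 16

# Integer arithmetic avoids floating-point rounding in the split sizes.
train_pairs = total_pairs * train_pct // 100
val_pairs = total_pairs * val_pct // 100
test_pairs = total_pairs - train_pairs - val_pairs

# One epoch = one full pass over the training pairs.
steps_per_epoch = train_pairs // batch_size
```

Binary cross-entropy fits because each pixel is assigned to one of two classes, object or background.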
