Classify Customer Feedback
Business Logic
Given a dataset of customer feedback labeled as Positive or Negative, build a deep neural network to classify the feedback accurately. A review is categorized based on its tone, word choice, length, and the customer's writing style.
Datasets
- train.csv – data used to train the model
- test.csv – data used to test predictions
- submissions.csv – populate this file with the results
- sample_submission.csv – sample reference for the submission file format
Schema
Field | Type | Description |
---|---|---|
customer_review | str | text of the customer review |
feedback | int | feedback label (0 = Negative, 1 = Positive) |
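A minimal sketch of loading the data with pandas, assuming the files follow the schema above and sit in the working directory:

```python
import pandas as pd

# Assumed file names, following the dataset list above.
train_df = pd.read_csv("train.csv")   # columns: customer_review, feedback
test_df = pd.read_csv("test.csv")     # customer reviews whose feedback must be predicted

print(train_df["feedback"].value_counts())  # quick look at class balance
```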
Predictive Modeling and Model Evaluation
Build a neural network to classify customer feedback.
Experiment with different preprocessing methods, numbers of layers, types of layers, activation functions, and any other relevant parameters. Compile the model by specifying the loss function and optimizer. Ensure that the model is not overfitting.
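A possible starting-point sketch, assuming the reviews have already been tokenized into padded integer sequences; vocab_size, max_seq_length, and the layer sizes are placeholder assumptions, not prescribed values:

```python
import tensorflow as tf
from tensorflow.keras import layers, models, callbacks

vocab_size = 10000      # assumed vocabulary size
max_seq_length = 200    # assumed padded sequence length

model = models.Sequential([
    layers.Input(shape=(max_seq_length,)),
    layers.Embedding(input_dim=vocab_size, output_dim=64),
    layers.Bidirectional(layers.LSTM(64)),
    layers.Dropout(0.5),                    # regularization to limit overfitting
    layers.Dense(1, activation="sigmoid"),  # probability of Positive feedback
])

# The target is binary (0/1), so binary crossentropy is a natural choice.
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Early stopping on a validation split is one simple guard against overfitting.
early_stop = callbacks.EarlyStopping(monitor="val_loss", patience=2,
                                     restore_best_weights=True)
# model.fit(x_train, y_train, validation_split=0.2, epochs=10,
#           batch_size=32, callbacks=[early_stop])
```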
Assess model performance on train.csv using the F1 score metric.
The model will be tested for robustness using a different dataset.
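A quick sketch of computing the F1 score with scikit-learn; the label arrays here are illustrative stand-ins for a held-out split of train.csv and the model's thresholded predictions:

```python
from sklearn.metrics import f1_score

# y_val: true labels from a held-out split; y_pred: 0/1 model predictions.
# Both are illustrative values, not part of the provided data.
y_val = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 0]

print(f"F1 score: {f1_score(y_val, y_pred):.4f}")
```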
Submission
For each record in the test set (test.csv), predict the value of the feedback variable.
Submit a CSV file with a header row and one row per test entry. The file (submissions.csv) should have exactly 2 columns:
- customer_review
- feedback (0/1)
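A hedged sketch of writing submissions.csv with pandas, assuming test_df holds the test reviews and preds holds the 0/1 predictions in the same order (both are illustrative names):

```python
import pandas as pd

# Illustrative inputs; replace with the real test data and model predictions.
test_df = pd.DataFrame({"customer_review": ["great service", "never again"]})
preds = [1, 0]

submission = pd.DataFrame({
    "customer_review": test_df["customer_review"],
    "feedback": preds,
})
submission.to_csv("submissions.csv", index=False)  # header row, one row per test entry
```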
Test Evaluation
The score will be evaluated automatically based on the F1 metric computed for the submissions.csv file.
Aim for the best possible value of the metric during model development.
The reviewer might dive deeper into the Jupyter notebook to get more context.
Bag of Words
Preprocessing Output
Consider the following sentences:
Sentence 1: "Welcome to HackerRank Learning. Now start Learning"
Sentence 2: "Learning is very important"
What will the bag-of-words output be after preprocessing (stop words such as "to" and "is" are removed)?
Sentence | Welcome | HackerRank | Learning | now | start | very | important |
---|---|---|---|---|---|---|---|
Sentence1 | 1 | 1 | 2 | 1 | 1 | 0 | 0 |
Sentence2 | 0 | 0 | 1 | 0 | 0 | 1 | 1 |
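A small sketch reproducing these counts with scikit-learn's CountVectorizer; the explicit stop-word list is an assumption chosen to match the table, and the output columns are lowercased and alphabetically ordered rather than in the table's order:

```python
from sklearn.feature_extraction.text import CountVectorizer

sentences = [
    "Welcome to HackerRank Learning. Now start Learning",
    "Learning is very important",
]

# "to" and "is" are dropped explicitly to mirror the preprocessing assumed by the table.
vectorizer = CountVectorizer(stop_words=["to", "is"])
counts = vectorizer.fit_transform(sentences)

print(vectorizer.get_feature_names_out())  # vocabulary, lowercased and sorted
print(counts.toarray())                    # per-sentence word counts
```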
Sequence Labeling with BiLSTM for Text Classification
Code Snippet
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Bidirectional, LSTM, Dense
# Define the model architecture
model = Sequential()
model.add(Embedding(input_dim=vocab_size, output_dim=embedding_dim, input_length=max_seq_length))
model.add(Bidirectional(LSTM(units=hidden_units, return_sequences=True)))
model.add(Dense(num_labels, activation='softmax'))
# Compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# Train the model
model.fit(x_train, y_train, epochs=num_epochs, batch_size=batch_size)
Question
What is a potential error or limitation in the code snippet that may cause the model to fail to converge during training?
Answer: An incorrect loss function is chosen for the sequence labeling task.
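One hedged way to align the loss with the per-timestep outputs, building on the model defined in the snippet above and assuming the labels are integer class indices per token rather than one-hot vectors (one possible reading of the issue, not the only valid fix):

```python
# With return_sequences=True the model emits one prediction per timestep,
# so integer labels of shape (num_samples, max_seq_length) pair naturally
# with sparse categorical crossentropy.
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
```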
Non-Convergence
Scenario
A large language model is fine-tuned on a sentiment analysis task with instruction-based tuning, but the training loss does not converge. On inspection, the instructions provided to the model are insufficient and give the LLM little information about the task.
Actions to Improve Convergence
- Increase the number of training examples for sentiment classification.
- Validate the training data consistency and use only consistent training samples.
- Design more informative instructions, with examples from the sentiment analysis task, for fine-tuning the LLM (see the sketch after this list).
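A minimal sketch of a more informative instruction template with in-context examples; the wording, example reviews, and record format are illustrative assumptions, not a prescribed schema:

```python
# Hypothetical instruction template for building fine-tuning records.
INSTRUCTION = (
    "Classify the sentiment of the customer review as Positive or Negative.\n"
    "Example: 'The product arrived quickly and works great.' -> Positive\n"
    "Example: 'Terrible support, I want a refund.' -> Negative\n"
    "Review: {review}\n"
    "Sentiment:"
)

def build_example(review: str, label: str) -> dict:
    """Pair an instruction-formatted prompt with its target label."""
    return {"prompt": INSTRUCTION.format(review=review), "completion": f" {label}"}

print(build_example("Loved the fast delivery", "Positive"))
```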
Segmentation Model Training Setup
Scenario
A model is being prepared for an image segmentation task that classifies each pixel in an image into 2 categories (object and background). A total of 2000 input/mask (label) image pairs were collected for the dataset.
Best Practice Training Setup
Answer: Training set: 1600 pairs, validation set: 200 pairs, test set: 200 pairs; batch size: 16, steps per epoch: 100 (1600 training pairs / batch size of 16), learning rate: 0.001, loss function: binary crossentropy.
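A hedged Keras sketch of this setup; the tiny placeholder network and the assumed 128x128 image size are illustrative only (a real setup would typically use a U-Net-style model and tf.data pipelines of image/mask pairs):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

batch_size = 16
steps_per_epoch = 1600 // batch_size  # 100 steps per epoch over the 1600 training pairs

# Placeholder segmentation network with a per-pixel sigmoid output.
model = models.Sequential([
    layers.Input(shape=(128, 128, 3)),          # assumed image size
    layers.Conv2D(16, 3, padding="same", activation="relu"),
    layers.Conv2D(1, 1, activation="sigmoid"),  # object vs. background probability per pixel
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss="binary_crossentropy",
    metrics=["accuracy"],
)

# Assuming train_ds / val_ds yield (image, mask) batches:
# model.fit(train_ds, validation_data=val_ds,
#           steps_per_epoch=steps_per_epoch, epochs=20)
```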
