The Complete Guide to Switching Jobs to Meta!

Today csoahlp put together a very detailed guide to switching jobs to Meta.

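The interview ran 60 minutes in total. The interviewer was a white EM with many years of seniority.

5 minutes of intro, then 15 minutes of BQ (behavioral questions) about a conflict with other people at work. You need a concrete example; speaking in generalities will not do.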

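The remaining 40 minutes went to two problems, neither of which I had seen on LeetCode.

The first: write a function that decides whether two dates are exactly 1 month apart, within a month, or more than one month apart. You define the input/output yourself.

The second is a math problem. There are n switches controlling light bulbs, and all n switches start in the off state. Then there are n operations. The first operation switches every switch. The second switches every other switch (i.e. switches 1, 3, 5, ...). The third skips two switches each time (i.e. switches 1, 4, 7, ...), and so on.

The question is how many switches are on at the end.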
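Both problems can be sketched in a few lines of Python. This is only one reading of each, not the interviewer's expected answer. For the date problem the sketch assumes that "exactly one month" means the later date equals the earlier date shifted forward one calendar month, with the day clamped to the end of a shorter month (so Jan 30 and Jan 31 both map to the last day of February). For the switch problem it assumes 1-indexed switches and that pass k toggles switches 1, 1 + k, 1 + 2k, ..., as in the examples given. The function names are made up for illustration.

```python
import calendar
from datetime import date
from math import isqrt


def add_one_month(d: date) -> date:
    """Shift d forward one calendar month, clamping the day to the month's end."""
    year = d.year + d.month // 12
    month = d.month % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def month_gap(a: date, b: date) -> str:
    """Return 'exactly', 'within', or 'more' for the gap between two dates."""
    start, end = sorted((a, b))
    boundary = add_one_month(start)
    if end == boundary:
        return "exactly"
    return "within" if end < boundary else "more"


def switches_on(n: int) -> int:
    """Brute force: pass k toggles switches 1, 1 + k, 1 + 2k, ... (1-indexed)."""
    on = [False] * (n + 1)  # index 0 is unused
    for k in range(1, n + 1):
        for i in range(1, n + 1, k):
            on[i] = not on[i]
    return sum(on)


def switches_on_closed_form(n: int) -> int:
    """Closed form under the same assumptions.

    Switch i > 1 is toggled once per divisor of i - 1, so it ends up on
    only when i - 1 is a perfect square. Switch 1 is toggled in every
    pass, so it ends up on only when n is odd.
    """
    if n == 0:
        return 0
    return isqrt(n - 1) + n % 2


if __name__ == "__main__":
    print(month_gap(date(2020, 1, 31), date(2020, 2, 29)))
    print(switches_on(10), switches_on_closed_form(10))
```

The brute-force version is quadratic in the worst case and is there mainly to check the closed form. In an interview the divisor-counting argument in the docstring is likely the part worth explaining out loud.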

Real interview questions:

 

Meta Data Scientist Interview Questions

Meta interview questions generally fall into four main categories:

  • Product and business sense
  • Technical data analysis (SQL, pandas)
  • Statistics and probability
  • Modeling knowledge and applying data

The technical screen will generally consist of one product question and one data analysis question. Be sure to prepare for both in order to move on to the onsite.

Note: This process is the same for those seeking full-time jobs at Meta and for Meta data science internships.

Meta Case Study Interview Questions

Case study questions in Meta data science interviews focus heavily on product metrics and business cases.

1. Meta composer, the posting tool, drops from 3% posts per user last month to 2.5% posts per user today. How would you investigate what happened?

The question states the drop is from 3% a month ago to 2.5% today. The first thing we have to do is clarify the context around the problem before jumping to conclusions about metrics. Is today a weekday and the day one month ago a weekend, so users are posting less? Is there a special event or seasonality? Is this an ongoing downward trend or a one-time downward spike?

The second part is understanding the metric itself. What drove the decrease: was it the number of users that increased or the number of posts that decreased? The interviewer will likely ask you to jump into one or both of the metrics to discuss what could have caused the decrease.

2. A Meta Groups product manager decides to add threading to comments on group posts. Comments per user increase by 10%, but posts go down 2%. Why would that be? What metrics would prove your hypotheses?

Threading restructures the flow of comments so that, instead of responding to the post, users can now respond to individual comments beneath the post. What effect might this have on a push notification ecosystem?

 

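Answers to the Meta Case Study interview questions follow.

The Meta composer question

First, we need to look at the concrete data for Meta composer to understand exactly how it dropped. We should compare changes in key metrics such as user activity, time spent, and retention. We also need to find out whether other features of the app show a similar problem.

Next, we need to pin down the specific cause of the drop. Possible causes include a poor user experience, strong competition, or market saturation. To verify them, we need to collect more data and user feedback.

Finally, we need a strategy to address the problem. This might include improving the user experience, increasing marketing, or developing new features.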
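For the composer question, one way to make "was it posts or was it users?" concrete is to split the change in the ratio into a part explained by the numerator and a part explained by the denominator. The sketch below is an illustration only: the function name and all counts are hypothetical, and the 3 and 2.5 are treated as plain posts-per-user values.

```python
def decompose_ratio_change(posts_before, users_before, posts_after, users_after):
    """Split the change in posts-per-user into a posts effect and a users effect."""
    before = posts_before / users_before
    after = posts_after / users_after
    # Change if only the post count had moved and the user base had not.
    posts_effect = posts_after / users_before - before
    # Whatever is left is attributable to the user base changing.
    users_effect = after - posts_after / users_before
    return {
        "before": before,
        "after": after,
        "posts_effect": posts_effect,
        "users_effect": users_effect,
    }


if __name__ == "__main__":
    # Hypothetical scenario A: same users, fewer posts.
    print(decompose_ratio_change(300, 100, 250, 100))
    # Hypothetical scenario B: same posts, more users (e.g. many new, inactive accounts).
    print(decompose_ratio_change(300, 100, 300, 120))
```

Both scenarios produce the same headline drop from 3.0 to 2.5, but they point to very different follow-up investigations, which is why the decomposition comes before any hypothesis about user experience or competition.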

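The Meta Groups comment threading question

First, we need to understand the direct effect of comment threading on user behavior. Comments per user rose by 10%, which suggests engagement went up. Posts, however, fell by 2%, which may mean users now prefer to express their views in comments rather than publish new posts.

Second, we need to consider the effect of threading on other related metrics. For example, we should check whether the reply rate and the number of likes on comments changed as well. If both also increased, threading had a positive effect on user interaction.

Finally, we need to consider how to optimize the product to raise engagement and satisfaction further. This might include improving the interface and experience of comment threads or adding more interactive features.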

Write a SQL query to create a histogram of the number of comments per user in the month of January 2020. Assume bin buckets with class intervals of one.

SELECT comment_count, COUNT(*) AS frequency
FROM (
    SELECT u.id, COUNT(c.user_id) AS comment_count
    FROM users u
    LEFT JOIN comments c
        ON c.user_id = u.id
        AND c.created_at >= '2020-01-01'
        AND c.created_at < '2020-02-01'
    GROUP BY u.id
) AS per_user
GROUP BY comment_count
ORDER BY comment_count;

In this query, we're creating a histogram of the number of comments per user in the month of January 2020. The inner query counts each user's comments made in January 2020. The LEFT JOIN from users keeps users who made no comments that month, so they land in the 0 bucket, and the date filter sits in the ON clause so that it does not drop those users. The outer query then groups by comment_count and counts how many users fall into each value, which gives bin buckets with class intervals of one. The ORDER BY clause sorts the buckets from the lowest comment count to the highest.
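A quick way to sanity-check a comments-per-user histogram query is to run it against a tiny in-memory SQLite database. The sketch below assumes the histogram should count users per comment total, including users with zero January comments. The sample rows and the `comment_histogram` helper are hypothetical, and SQLite stores the timestamps as text, which may differ from the dialect used in the interview.

```python
import sqlite3

# Hypothetical sample data: (id, name) and (user_id, body, created_at).
SAMPLE_USERS = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
SAMPLE_COMMENTS = [
    (1, "x", "2020-01-03 10:00:00"),
    (1, "x", "2020-01-31 23:59:59"),
    (2, "x", "2020-01-10 08:00:00"),
    (2, "x", "2020-01-11 08:00:00"),
    (2, "x", "2020-02-01 00:00:00"),  # February, must not count
    (3, "x", "2020-01-20 12:00:00"),
    (4, "x", "2019-12-31 23:00:00"),  # December, so user 4 has zero in January
]

HISTOGRAM_SQL = """
SELECT comment_count, COUNT(*) AS frequency
FROM (
    SELECT u.id, COUNT(c.user_id) AS comment_count
    FROM users u
    LEFT JOIN comments c
        ON c.user_id = u.id
        AND c.created_at >= '2020-01-01'
        AND c.created_at < '2020-02-01'
    GROUP BY u.id
) AS per_user
GROUP BY comment_count
ORDER BY comment_count
"""


def comment_histogram(users, comments):
    """Load the rows into SQLite and return (comment_count, frequency) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
    conn.execute("CREATE TABLE comments (user_id INTEGER, body TEXT, created_at TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", users)
    conn.executemany("INSERT INTO comments VALUES (?, ?, ?)", comments)
    rows = conn.execute(HISTOGRAM_SQL).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for comment_count, frequency in comment_histogram(SAMPLE_USERS, SAMPLE_COMMENTS):
        print(comment_count, frequency)
```

Using a half-open date range (`>= '2020-01-01'` and `< '2020-02-01'`) avoids losing comments made late on January 31.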

 

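Answer

Meta is launching a new feature called "Mentions", an app designed specifically for celebrities on Meta to connect with their fans. How would you measure the health of the Mentions app?

We can start by breaking down the structure the interviewer is looking for. Whenever we face one of these open-ended product questions, it makes sense to frame the problem around a clear goal so that we do not bounce between different answers.

  1. Did you state the goal of the feature before defining metrics? What is the point of the Mentions feature?

  2. Was your answer structured, or did you tend to talk about random points?

  3. Were your metric definitions specific, or were they generic, as in "I would find out whether people use Mentions often"?

  4. How would Meta detect when users lie about the school they attended?

  5. If 70% of Meta users on iOS use Instagram, but only 35% of Meta users on Android use Instagram, how would you investigate the discrepancy?

  6. Engagement on Meta Newsfeed is down 10%. How would you find out why?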

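Meta SQL Interview Questions

SQL questions are the most frequently asked questions in data science interviews. See more practice problems in our guide "Meta SQL Interview Questions".

  1. Write a SQL query to create a histogram of the number of comments per user in the month of January 2020. Assume bin buckets with class intervals of one.

users table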

  • Columns:
    • id (INTEGER)
    • name (VARCHAR)
    • created_at (DATETIME)
    • neighborhood (INTEGER)
  • id (VARCHAR)
  • mail (VARCHAR)

comments table

  • Columns:
    • user_id (INTEGER)
    • body (VARCHAR)
    • created_at (DATETIME)

Since a histogram is just a display of frequencies, all we really need to do is get the total count of user comments in the month of January 2020 for each user, and then group by that count.

  2. In the table below, column action represents either ('post enter', 'post submit', 'post canceled') for when a user starts a post (enter), ends up canceling it (cancel), or ends up posting it (submit).

events table

  • Columns:
    • id (INTEGER)
    • user_id (INTEGER)
    • created_at (DATETIME)
    • action (VARCHAR)
    • url (VARCHAR)
    • platform (VARCHAR)

Write a query to get the post success rate for each day in the month of January 2020.

Let's see if we can clearly define the metrics we want to calculate before just jumping into the problem. We want the post success rate for each day in January 2020. To get that metric, we can assume post success rate can be defined as:

(total posts created) / (total posts entered)

Additionally, since the success rate must be broken down by day, we must make sure that a post that is entered must be completed on the same day.

Now that we have these requirements, it's time to calculate our metrics. We know we have to GROUP BY the date to get each day's posting success rate. We also have to break down how we can compute our two metrics of total posts entered and total posts actually created.

 

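Answer

First, let's analyze the problem. It gives two tables: a "user comments" table and an "events" table. They record, respectively, the comments users posted and the events of users creating or canceling posts on the platform.

The task is to compute the post success rate for each day in January 2020. The "post success rate" is the ratio of successfully created posts to attempted posts. So we need to count, for each day, the number of attempted posts and the number of successfully created posts.

To get these two counts, we filter the "events" table down to records from January 2020. Then we group the data by date and distinguish the actions (post enter, post submit, post canceled) to count each day's attempted and successful posts.

Next, we use SQL aggregate functions to compute the daily totals. COUNT can count rows per day, while SUM over a CASE expression lets us count attempted and successful posts separately within the same day.

Finally, we use these aggregates to compute the daily post success rate by dividing each day's successfully created posts by its attempted posts.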

SELECT DATE(created_at) AS post_date,
       1.0 * SUM(CASE WHEN action = 'post submit' THEN 1 ELSE 0 END)
           / SUM(CASE WHEN action = 'post enter' THEN 1 ELSE 0 END) AS success_rate
FROM events
WHERE created_at >= '2020-01-01' AND created_at < '2020-02-01'
GROUP BY DATE(created_at)
ORDER BY post_date;
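The daily success-rate calculation can be checked the same way on a toy events table. This is a minimal sketch, assuming the action values are spelled 'post enter', 'post submit', and 'post canceled' as in the query text, and assuming SQLite as the engine. The rows and the `daily_success_rate` helper are hypothetical. Multiplying by 1.0 forces real division, and the half-open date range keeps events from late on January 31.

```python
import sqlite3

# Hypothetical sample rows: (id, user_id, created_at, action, url, platform).
SAMPLE_EVENTS = [
    (1, 1, "2020-01-01 09:00:00", "post enter", "/p", "ios"),
    (2, 1, "2020-01-01 09:01:00", "post submit", "/p", "ios"),
    (3, 2, "2020-01-01 10:00:00", "post enter", "/p", "android"),
    (4, 2, "2020-01-01 10:02:00", "post canceled", "/p", "android"),
    (5, 3, "2020-01-01 11:00:00", "post enter", "/p", "web"),
    (6, 3, "2020-01-01 11:05:00", "post submit", "/p", "web"),
    (7, 4, "2020-01-01 12:00:00", "post enter", "/p", "web"),
    (8, 4, "2020-01-01 12:01:00", "post canceled", "/p", "web"),
    (9, 1, "2020-01-31 23:30:00", "post enter", "/p", "ios"),
    (10, 1, "2020-01-31 23:45:00", "post submit", "/p", "ios"),
    (11, 2, "2020-02-01 00:10:00", "post enter", "/p", "ios"),  # outside January
]

SUCCESS_RATE_SQL = """
SELECT DATE(created_at) AS post_date,
       1.0 * SUM(CASE WHEN action = 'post submit' THEN 1 ELSE 0 END)
           / SUM(CASE WHEN action = 'post enter' THEN 1 ELSE 0 END) AS success_rate
FROM events
WHERE created_at >= '2020-01-01' AND created_at < '2020-02-01'
GROUP BY DATE(created_at)
ORDER BY post_date
"""


def daily_success_rate(events):
    """Return (date, success_rate) rows for January 2020 from the given events."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (id INTEGER, user_id INTEGER, created_at TEXT, "
        "action TEXT, url TEXT, platform TEXT)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?, ?)", events)
    rows = conn.execute(SUCCESS_RATE_SQL).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for post_date, rate in daily_success_rate(SAMPLE_EVENTS):
        print(post_date, rate)
```

Note that this sketch does not enforce the "entered and completed on the same day" requirement per post. It only compares daily totals, which matches the simple definition (total posts created) / (total posts entered).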

 

  3. We want to build a naive recommender, and we're given two tables: one table called friends with user_id and friend_id columns representing each user's friends, and another table called page_likes with a user_id and a page_id representing the page each user liked.

friends table

  • Columns:
    • user_id (INTEGER)
    • friend_id (INTEGER)

page_likes table

  • Columns:
    • user_id (INTEGER)
    • page_id (INTEGER)

Write a SQL query to create a metric to recommend pages for each user based on recommendations from their friends' liked pages.

We can start by visualizing what kind of output we want from the query. Given that we have to create a metric for each user to recommend pages, we know we want something with a user_id and a page_id along with some sort of recommendation score.

How can we easily represent the scores of each user_id and page_id combo? One naive method would be to create a score by summing up the total likes by friends on each page that the user hasn't currently liked. The max value on our metric would be the most recommendable page.

The first thing we have to do then is to write a query to associate users to their friends' liked pages. We can do that easily with an initial join between the two tables.

Statistics and Probability Interview Questions

Statistics and probability questions assess your understanding of mathematical concepts and how they're used in data science interviews at Meta.

1. What do you think the distribution of time spent per day on Meta looks like? What metrics would you use to describe that distribution?

Having the vocabulary to describe a distribution is an important skill as a data scientist when it comes to communicating ideas to your peers. There are four important concepts with

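Answer:

We want to build a simple recommender, and we have two tables. One is called friends, with user ID and friend ID columns representing each user's friends. The other is called page_likes, with a user ID and a page ID representing the pages each user liked.

friends table

  • user_id: INTEGER
  • friend_id: INTEGER

page_likes table

  • user_id: INTEGER
  • page_id: INTEGER

We need to write a SQL query that recommends pages to each user based on the pages their friends liked.

First, we should be clear about what output we want from the query. Since we are creating a page recommendation metric for each user, we know we need a user ID, a page ID, and some kind of recommendation score.

How can we easily represent the score for each user ID and page ID combination? A simple approach is to add up the likes from friends on each page the user has not yet liked. The highest value of the metric marks the most recommendable page.

So the first step is a query that associates users with the pages their friends liked. An initial join between the two tables does this easily.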

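An example SQL query:

SELECT f.user_id, pl.page_id, COUNT(*) AS recommendation_score
FROM friends f
JOIN page_likes pl ON f.friend_id = pl.user_id
WHERE NOT EXISTS (
    SELECT 1 FROM page_likes pl2
    WHERE pl2.user_id = f.user_id AND pl2.page_id = pl.page_id
)
GROUP BY f.user_id, pl.page_id
ORDER BY f.user_id, recommendation_score DESC;

This query first joins each user's friends to the pages those friends liked. It then excludes pages the user already likes and, for each user and page combination, counts how many friends liked the page. Finally, it sorts each user's results by recommendation score in descending order. Only the friends and page_likes tables are needed, and since page_likes has no likes column, each row counts as one like.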
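The naive recommender can be tried on a handful of rows with SQLite. This is a sketch under assumptions: the score is the number of friends who liked a page, friendships are stored in both directions, and each (user_id, page_id) pair appears at most once in page_likes. The sample rows and the `recommend_pages` helper are hypothetical.

```python
import sqlite3

# Hypothetical sample data.
SAMPLE_FRIENDS = [(1, 2), (1, 3), (2, 1), (3, 1)]              # (user_id, friend_id)
SAMPLE_LIKES = [(1, 10), (2, 10), (2, 20), (3, 20), (3, 30)]   # (user_id, page_id)

RECOMMEND_SQL = """
SELECT f.user_id, pl.page_id, COUNT(*) AS recommendation_score
FROM friends f
JOIN page_likes pl ON f.friend_id = pl.user_id
WHERE NOT EXISTS (
    SELECT 1 FROM page_likes pl2
    WHERE pl2.user_id = f.user_id AND pl2.page_id = pl.page_id
)
GROUP BY f.user_id, pl.page_id
ORDER BY f.user_id, recommendation_score DESC, pl.page_id
"""


def recommend_pages(friends, likes):
    """Return (user_id, page_id, score) rows: pages liked by friends but not by the user."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE friends (user_id INTEGER, friend_id INTEGER)")
    conn.execute("CREATE TABLE page_likes (user_id INTEGER, page_id INTEGER)")
    conn.executemany("INSERT INTO friends VALUES (?, ?)", friends)
    conn.executemany("INSERT INTO page_likes VALUES (?, ?)", likes)
    rows = conn.execute(RECOMMEND_SQL).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for user_id, page_id, score in recommend_pages(SAMPLE_FRIENDS, SAMPLE_LIKES):
        print(user_id, page_id, score)
```

In the sample data, user 1 already likes page 10, so only pages 20 and 30 are candidates, and page 20 ranks first because two friends liked it. User 2 gets no recommendations because their only friend likes a page they already like.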

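Statistics and probability interview questions: in Meta data science interviews, statistics and probability questions assess your understanding of mathematical concepts and how they are used in data science.

What do you think the distribution of time spent per day on Meta looks like? What metrics would you use to describe that distribution? As a data scientist, having the vocabulary to describe a distribution is an important skill, because it is how you communicate ideas to your peers. Four concepts matter when describing a distribution: central tendency (mean, median, mode), spread (variance, standard deviation), skewness, and kurtosis. Together they describe a distribution's shape, location, and variability.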
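The four descriptors can be computed directly from a sample using only the Python standard library. The sketch below uses population (not sample-corrected) moments and reports excess kurtosis, which are choices made here for simplicity. The minutes-per-day values are made up purely for illustration and are not Meta data.

```python
import statistics


def describe(sample):
    """Summarize central tendency, spread, skewness, and excess kurtosis."""
    n = len(sample)
    mean = statistics.fmean(sample)
    std = statistics.pstdev(sample)  # population standard deviation
    # Standardized third and fourth central moments.
    skewness = sum((x - mean) ** 3 for x in sample) / (n * std ** 3)
    excess_kurtosis = sum((x - mean) ** 4 for x in sample) / (n * std ** 4) - 3
    return {
        "mean": mean,
        "median": statistics.median(sample),
        "std": std,
        "skewness": skewness,
        "excess_kurtosis": excess_kurtosis,
    }


if __name__ == "__main__":
    # Hypothetical minutes per day for eight users; one heavy user creates a long right tail.
    minutes_per_day = [5, 10, 10, 15, 20, 30, 45, 120]
    for name, value in describe(minutes_per_day).items():
        print(f"{name}: {value:.2f}")
```

With a long right tail the mean sits above the median and the skewness is positive, which is the pattern one would plausibly expect to discuss for time-spent data. For such data the median is usually a more robust summary than the mean.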
