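As the most technically demanding of the Data roles, Data Science requires mastering a wide range of skills in real depth, and the list of key topics to prepare before an interview is long enough to make anyone's head spin.
To help international students review more systematically and comprehensively before sending out applications, I recommend these 150 real interview questions.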
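Strengths: compiled by a Data Science expert on GitHub and written specifically for candidates in Data interviews.
All 150 questions come from real interviews, and every one comes with a solution approach and a reference answer, so they are highly useful as a reference.
Math, coding, resume questions: every type of question you will meet in a DS interview is here.
Put simply, it is a DS interview guide!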
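It is organized into 7 sections according to what real interviews focus on, and the questions in every section come from the latest DS interviews of 2022-2023.
Topics covered: Machine Learning, Deep Learning, Statistics, Probability, Python, SQL & Database, and resume-based questions.
The coverage is broad, so even candidates for other Data roles, such as Data Analyst or Data Developer, can use it for review!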
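Its answers are laid out as ordered bullet points!
In a DS interview, solving the problem is not enough: you also need to explain your thought process to the interviewer in clear, logical language, and following the order of the reference answers lets you do exactly that!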
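150+ DS technical questions from real interviews plus common behavioral questions, with detailed solution approaches and complete answers.
Includes 2 chapters discussing the high-frequency, harder Statistics and Probability questions in Data interviews, along with the common pitfalls these questions involve.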
150 Real Data Science Interview Questions Compiled by a Google Expert

Here we're using the famous John von Neumann solution. Let's say that we have a biased coin with probability of showing heads $p \neq 0.5$, and let $q = 1 - p$ be the probability of showing tails. Von Neumann's proposed solution is:
1) Consider two tosses as a single toss.
2) If we get HT, call it the new 'Heads', and if we get TH, call it the new 'Tails'. Ignore the outcomes HH or TT and repeat the above process until one of HT or TH is obtained.

Let's see the rationale behind this solution:

$$P(\text{new Heads}) = P(HT) = pq, \qquad P(\text{new Tails}) = P(TH) = qp, \qquad P(\text{neither}) = P(HH) + P(TT) = p^2 + q^2$$

Though the probabilities of new heads and new tails are equal, they don't add up to 1, since there is a chance that neither of them shows up. That's precisely why he proposed to keep tossing until we get either HT or TH:

$$P(\text{Heads}) = pq + (p^2+q^2)\,pq + (p^2+q^2)^2\,pq + \cdots = \frac{pq}{1-(p^2+q^2)} = \frac{pq}{2pq} = \frac{1}{2}$$

where $1 - (p^2+q^2) = (p+q)^2 - p^2 - q^2 = 2pq$. Very similarly, we can show that the probability of Tails is $1/2$.

Q13: According to hospital records, 75% of patients suffering from a disease die from that disease. Find the probability that 4 out of 6 randomly selected patients survive.
Answer: This has to be a binomial, since there are only 2 outcomes: death or life. Here $n = 6$ and $x = 4$, with $p = 0.25$ (probability of life) and $q = 0.75$ (probability of death). Using the probability mass function:

$$P(X = x) = \binom{n}{x} p^x q^{n-x}$$

Then:

$$P(4) = \binom{6}{4}(0.25)^4(0.75)^2 = \frac{135}{4096} \approx 0.033$$

Q14: Discuss some methods you would use to estimate the parameters of a probability distribution.
Answer:

Q15: You have 40 cards in four colors: 10 reds, 10 greens, 10 blues, and 10 yellows. Each color has a number from 1 to 10. When you pick two cards without replacement, what is the probability that the two cards are not the same color and not the same number?
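The coin, Q13, and Q15 answers can be checked numerically. The sketch below is illustrative Python, not code from the guide; the function names and the 0.8 coin bias are assumptions. The excerpt leaves Q15 unanswered, but counting gives it directly: after the first card, the second must avoid the 9 other cards of the same color and the 3 other cards with the same number, leaving 27 of the 39 remaining cards, so the probability is 27/39 = 9/13. The brute-force enumeration confirms that.

```python
import itertools
import random
from fractions import Fraction
from math import comb


def von_neumann_flip(biased_flip):
    """Fair flip from a biased coin: toss twice, HT -> 'H', TH -> 'T',
    discard HH and TT and try again."""
    while True:
        first, second = biased_flip(), biased_flip()
        if first != second:
            return first  # 'H' for HT, 'T' for TH


def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def different_color_and_number():
    """Exact probability, by enumerating every unordered pair of the 40 cards,
    that two cards drawn without replacement differ in color AND number."""
    deck = [(color, number)
            for color in ("red", "green", "blue", "yellow")
            for number in range(1, 11)]
    pairs = list(itertools.combinations(deck, 2))
    good = sum(1 for a, b in pairs if a[0] != b[0] and a[1] != b[1])
    return Fraction(good, len(pairs))


rng = random.Random(0)


def biased_flip():
    """Hypothetical biased coin that shows heads with probability 0.8."""
    return "H" if rng.random() < 0.8 else "T"


flips = [von_neumann_flip(biased_flip) for _ in range(100_000)]
print(flips.count("H") / len(flips))  # close to 0.5 despite the 0.8 bias
print(binomial_pmf(4, 6, 0.25))       # 0.032958984375
print(different_color_and_number())   # 9/13
```

The enumeration is only 780 pairs, so brute force is cheap here and serves as an independent check on the counting argument.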
Question categories based on what real interviews focus on

The questions are divided into seven categories:
1. Machine Learning Interview Questions & Answers for Data Scientists
2. Deep Learning Interview Questions & Answers for Data Scientists
3. Statistics Interview Questions & Answers for Data Scientists
4. Probability Interview Questions & Answers for Data Scientists
5. Python Interview Questions & Answers for Data Scientists
6. SQL & DB Interview Questions & Answers for Data Scientists
7. Resume Based Questions

Topics covered: Machine Learning, Deep Learning, Statistics, Probability, Python, SQL & Database, and resume-based questions. The coverage is broad enough that candidates for other Data roles, such as Data Analyst or Data Developer, can also use it for review; on this point alone, it beats the other interview guides on the market.
Clear answer structure

Q11: Why use a RIGHT JOIN when a LEFT JOIN can satisfy the requirement?
Answer: In MySQL, RIGHT JOIN and LEFT JOIN are used to retrieve data from multiple tables by joining them on a specified condition. Generally, LEFT JOIN is used more often than RIGHT JOIN because it returns all the rows from the left table and the matching rows from the right table, or NULL values if there is no match. In most cases, a LEFT JOIN is sufficient to retrieve all the data from the left table and the matching data from the right table. However, there may be situations where a RIGHT JOIN is more appropriate. Here are a few examples:
1. When the primary table is the right table: If the right table contains the primary data that needs to be retrieved and the left table contains supplementary data, a RIGHT JOIN can be used to retrieve all the data from the right table and the matching data from the left table.
2. When the query needs to be optimized: In some cases, a RIGHT JOIN may be more efficient than a LEFT JOIN because the database optimizer can choose the most efficient join order based on the query structure and the available indexes.
3. When using outer joins: If the query requires an outer join, a RIGHT JOIN may be used to return all the rows from the right table, including those with no matching rows in the left table.
It's important to note that while a RIGHT JOIN can provide additional functionality in certain cases, it may also make the query more complex and difficult to read. In most cases, LEFT JOIN is the preferred method for joining tables in MySQL.

Being able to express your thought process to the interviewer in clear, logical language is crucial, and this guide takes that fully into account: almost all of its answers are laid out as ordered bullet points, so in the interview you only need to follow the order of the reference answer to explain yourself clearly.
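To make the Q11 point concrete, here is a small sketch that uses Python's built-in sqlite3 module instead of MySQL (an assumption made so the example is self-contained; the customers/orders tables and their rows are made up). `A RIGHT JOIN B` returns the same rows as `B LEFT JOIN A`, which is why a LEFT JOIN with the tables swapped usually suffices. SQLite only supports RIGHT JOIN from version 3.39.0, so the comparison is skipped on older builds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 4, 15.0);
""")

# orders is the primary table, written on the LEFT: every order is kept,
# and order 12 (customer_id 4 has no customer) gets NULL, i.e. None, as name.
left_sql = """
SELECT o.id, c.name
FROM orders AS o LEFT JOIN customers AS c ON o.customer_id = c.id
ORDER BY o.id
"""
left_rows = conn.execute(left_sql).fetchall()

# The same query as a RIGHT JOIN: orders is now on the right side.
right_sql = """
SELECT o.id, c.name
FROM customers AS c RIGHT JOIN orders AS o ON o.customer_id = c.id
ORDER BY o.id
"""
if sqlite3.sqlite_version_info >= (3, 39, 0):  # RIGHT JOIN needs SQLite 3.39.0+
    right_rows = conn.execute(right_sql).fetchall()
    assert right_rows == left_rows

print(left_rows)  # [(10, 'Ann'), (11, 'Ann'), (12, None)]
conn.close()
```

Note that customers Bob and Cy, who have no orders, appear in neither result: both queries keep all rows from orders, not from customers.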
High-frequency and difficult questions + common interview pitfalls

Q2: Briefly explain A/B testing and its applications. What are some common pitfalls encountered in A/B testing?
A/B testing helps us determine whether a change in something causes a significant change in performance. In other words, you aim to statistically estimate the impact of a given change within your digital product (for example). You measure success and counter metrics on at least 1 treatment group vs. 1 control group (there can be more than 1 experiment (XP) group for multivariate tests).
Applications:
1. Consider a general store that sells bread packets but not butter for a year. If we want to check whether its bread sales depend on butter, suppose the store also starts selling butter and sales for the next year are observed. Now we can determine whether selling butter significantly increases, decreases, or doesn't affect the sale of bread.
2. While developing the landing page of a website, you create 2 different versions of the page. You define a criterion for success, e.g. conversion rate. Then define your hypotheses:
Null hypothesis ($H_0$): there is no difference between the performance of the 2 versions.
Alternative hypothesis ($H_1$): version A will perform better than B.
NOTE: You will have to split your traffic randomly (to avoid sample bias) between the 2 versions. The split doesn't have to be symmetric; you just need to set the minimum sample size for each version to avoid undersampling bias.
Now, if version A gives better results than version B, we still have to statistically prove that the results derived from our sample represent the entire population. One very common test used to do so is the 2-sample t-test, where we use the significance level (alpha) and the p-value to see which hypothesis holds. If p-value < alpha, $H_0$ is rejected.
Common pitfalls:
1. Wrong success metrics, inadequate to the business problem
2. Lack of a counter metric, as you might add friction to the product along with the positive impact
3. Sample mismatch: heterogeneous control and treatment groups, unequal variances
4. Underpowered test: too small a sample or the XP running too short
5. Not accounting for network effects (introduces bias into the measurement)

Two separate chapters are devoted to the high-frequency, harder Statistics and Probability questions in Data interviews, along with the common pitfalls these questions involve, which is very thoughtful.
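The landing-page example in Q2 can be sketched end to end with the Python standard library. This is an illustration under stated assumptions, not the guide's code: the conversion data are hypothetical, the sample-size function uses a common normal-approximation formula for two proportions, and the test computes Welch's two-sample t statistic (unequal variances, matching pitfall 3) but approximates its p-value with the standard normal, which is only reasonable for large groups. `scipy.stats.ttest_ind(a, b, equal_var=False)` would give the exact t-distribution p-value. Planning to detect a lift from 9% to 12% needs roughly 1,600 users per version, so the 2,000 per version used here is not underpowered.

```python
import math
from statistics import NormalDist, fmean, variance


def sample_size_per_group(p_control, p_treatment, alpha=0.05, power=0.8):
    """Minimum users per version to detect p_control -> p_treatment with a
    two-sided test at level alpha (normal approximation, two proportions)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    spread = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    return math.ceil((z_alpha + z_beta) ** 2 * spread / (p_treatment - p_control) ** 2)


def welch_t_test(a, b):
    """Welch's t statistic with a two-sided p-value from the normal
    approximation (large-sample shortcut, not the exact t distribution)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    t = (fmean(a) - fmean(b)) / se
    p_value = math.erfc(abs(t) / math.sqrt(2))  # P(|Z| >= |t|)
    # For the one-sided H1 "A better than B", use p_value / 2 when t > 0.
    return t, p_value


# Hypothetical, randomly split traffic: 1.0 = converted, 0.0 = did not convert.
version_a = [1.0] * 240 + [0.0] * 1760  # 12% conversion
version_b = [1.0] * 180 + [0.0] * 1820  # 9% conversion

n_needed = sample_size_per_group(0.09, 0.12)
t, p_value = welch_t_test(version_a, version_b)
alpha = 0.05
print(n_needed, round(t, 2), round(p_value, 4))
print("reject H0" if p_value < alpha else "fail to reject H0")  # reject H0
```

Computing the required sample size before the experiment, rather than stopping as soon as the p-value dips below alpha, is what guards against the underpowered-test pitfall.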