16/06/2016
21 Must-Know Data Science Interview Questions and Answers
http://www.kdnuggets.com/2016/02/21datascienceinterviewquestionsanswers.html?utm_content=bufferc54a6&utm_medium=social&utm_source=facebook…
Tags: Bootstrap sampling, Data Science, Interview questions, Kirk D. Borne, Precision, Recall, Regularization, Yann LeCun
KDnuggets Editors bring you the answers to 20 Questions to Detect Fake Data Scientists, including what is regularization, Data Scientists we admire, model validation, and more.
By Gregory Piatetsky, KDnuggets.
The recent KDnuggets post 20 Questions to Detect Fake Data Scientists has been very popular: it was the most viewed post in the month of January. However, these questions were lacking answers, so KDnuggets Editors got together and wrote the answers to these questions. I also added one more critical question, number 21, which was omitted from the 20 questions post. Here are the answers. Because of the length, this part gives the answers to the first 11 questions; the rest are in part 2.
Q1. Explain what regularization is and why it is useful.
Answer by Matthew Mayo. Regularization is the process of adding a tuning parameter to a model to induce smoothness in order to prevent overfitting (see also KDnuggets posts on Overfitting). This is most often done by adding a penalty term to the loss function: a constant multiple of a norm of the weight vector. The norm is often either L1 (Lasso) or L2 (ridge), but can in fact be any norm. The fitted model should then minimize the mean of the regularized loss function calculated on the training set. Xavier Amatriain presents a good comparison of L1 and L2 regularization here, for those interested.
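As a rough illustration of the idea, here is a minimal sketch of L2 (ridge) regularization for linear regression, assuming a mean squared error loss and synthetic data. The function name ridge_fit, the data, and the penalty strengths are all made up for this example and are not part of the original answer.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Minimize mean squared error + lam * ||w||_2^2 using the closed form."""
    n, d = X.shape
    # Setting the gradient to zero gives (X^T X / n + lam * I) w = X^T y / n
    A = X.T @ X / n + lam * np.eye(d)
    b = X.T @ y / n
    return np.linalg.solve(A, b)

# Synthetic data (hypothetical): 50 samples, 5 features, noisy linear target
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
true_w = np.array([3.0, -2.0, 0.0, 0.0, 1.0])
y = X @ true_w + rng.normal(scale=0.5, size=50)

# A larger penalty constant shrinks the weight vector toward zero
lams = (0.0, 0.1, 1.0, 10.0)
norms = [float(np.linalg.norm(ridge_fit(X, y, lam))) for lam in lams]
for lam, norm in zip(lams, norms):
    print(f"lam={lam:<5} ||w||_2 = {norm:.3f}")
```

The constant lam is the tuning parameter: at 0 this is ordinary least squares, and as it grows the fit trades training error for smaller weights. An L1 penalty would replace the squared L2 norm with the sum of absolute weights, which has no closed form and is usually solved iteratively.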
Fig 1: Lp ball. As the value of p decreases, the size of the corresponding Lp ball also decreases.
Q2. Which data scientists do you admire most? Which startups?
Answer by Gregory Piatetsky: This question does not have a correct answer, but here is my personal list of 12 Data Scientists I most admire, in no particular order.
Geoff Hinton, Yann LeCun, and Yoshua Bengio, for persevering with Neural Nets and starting the current Deep Learning revolution.
Demis Hassabis, for his amazing work on DeepMind, which achieved human or superhuman performance on Atari games and recently Go.
Jake Porway from DataKind and Rayid Ghani from U. Chicago/DSSG, for enabling data science contributions to social good.
DJ Patil, first US Chief Data Scientist, for using Data Science to make US government work better.
Kirk D. Borne, for his influence and leadership on social media.
Claudia Perlich, for brilliant work on the ad ecosystem and serving as a great KDD2014 chair.
Hilary Mason, for great work at Bitly and inspiring others as a Big Data Rock Star.
Usama Fayyad, for showing leadership and setting high goals for KDD and Data Science, which helped inspire me and many thousands of others to do their best.
Hadley Wickham, for his fantastic work on Data Science and Data Visualization in R, including dplyr, ggplot2, and RStudio.

There are many excellent startups in the Data Science area, but I will not list them here, to avoid a conflict of interest. Here is some of our previous coverage of startups.
Q3. How would you validate a model you created to generate a predictive model of a quantitative outcome variable using multiple regression?
Answer by Matthew Mayo. Proposed methods for model validation:
If the values predicted by the model are far outside of the response variable range, this would immediately indicate poor estimation or model inaccuracy.
If the values seem to be reasonable, examine the parameters; any of the following would indicate poor estimation or multicollinearity: signs opposite to expectations, unusually large or small values, or observed inconsistency when the model is fed new data.
Use the model for prediction by feeding it new data, and use the coefficient of determination (R squared) as a model validity measure.
Use data splitting to form a separate dataset for estimating model parameters, and another for validating predictions.
Use jackknife resampling if the dataset contains a small number of instances, and measure validity with R squared and mean squared error (MSE).
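Several of these checks can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the author's procedure: it uses synthetic data, an ordinary least squares fit via NumPy, a 150/50 hold-out split, and a leave-one-out loop as a jackknife-style check on a small subset. All function names, sizes, and coefficients are hypothetical.

```python
import numpy as np

def fit_ols(X, y):
    """Fit multiple regression with an intercept by least squares."""
    Xb = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return coef

def predict(coef, X):
    return coef[0] + X @ coef[1:]

def r_squared(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

# Synthetic data (hypothetical): 200 rows, 3 predictors, noisy linear outcome
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = 1.5 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.3, size=200)

# Data splitting: estimate parameters on one part, validate on the other
idx = rng.permutation(200)
train, test = idx[:150], idx[150:]
coef = fit_ols(X[train], y[train])
pred = predict(coef, X[test])
holdout_r2 = r_squared(y[test], pred)
holdout_mse = mse(y[test], pred)

# Range check: share of predictions outside the observed response range
lo, hi = y[train].min(), y[train].max()
frac_out_of_range = np.mean((pred < lo) | (pred > hi))

# Leave-one-out loop (jackknife-style) for a small dataset of 30 rows
Xs, ys = X[:30], y[:30]
loo_pred = np.empty(30)
for i in range(30):
    keep = np.arange(30) != i
    loo_pred[i] = predict(fit_ols(Xs[keep], ys[keep]), Xs[i:i + 1])[0]
loo_r2 = r_squared(ys, loo_pred)
loo_mse = mse(ys, loo_pred)

print("coefficients:", np.round(coef, 2))
print(f"hold-out R^2={holdout_r2:.3f}  MSE={holdout_mse:.3f}")
print(f"out-of-range share={frac_out_of_range:.2f}")
print(f"leave-one-out R^2={loo_r2:.3f}  MSE={loo_mse:.3f}")
```

Printing the coefficients supports the parameter check: signs or magnitudes that contradict expectations are a warning sign. The hold-out and leave-one-out scores are both computed on rows the model did not see during fitting, which is what makes them usable as validity measures.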