ashishtele

Open Source

Quick-Notes-for-ML-DS

# 🔥 Quick Notes for ML, DS, MLOps, LLMOps 🔥

This repository contains interview preparation notes from iNeuron, article links, and other resources.

## Important Concepts:

1. What is the difference between filter, wrapper, and embedded methods for feature selection? [Answer](https://sebastianraschka.com/faq/docs/feature_sele_categories.html)
2. 120 Questions. [Answer](https://towardsdatascience.com/120-data-scientist-interview-questions-and-answers-you-should-know-in-2021-b2faf7de8f3e)
3. Probability vs. Likelihood. [Answer](https://stats.stackexchange.com/questions/2641/what-is-the-difference-between-likelihood-and-probability#2647) My favorite: [StatQuest](https://www.youtube.com/watch?v=pYxNSUDSFH4)
4. Generative and discriminative. [Answer](https://stackoverflow.com/questions/879432/what-is-the-difference-between-a-generative-and-a-discriminative-algorithm)
5. ML concepts and code. [Answer](https://ml-cheatsheet.readthedocs.io/en/latest/linear_regression.html)
6. EM - Expectation-Maximization. [Answer](Expectation-Maximization)
7. Random Forest. [Answer](https://www.youtube.com/watch?v=J4Wdy0Wc_xQ)
8. Regression - Type of change. [Answer](https://web.stanford.edu/~mrosenfe/soc_meth_proj3/soc_180B_regression_whatchanges.htm)
9. Pearson vs Spearman vs Kendall: [Stackexchange](https://datascience.stackexchange.com/questions/64260/pearson-vs-spearman-vs-kendall)
10. Gain and Lift Charts. [listendata](https://www.listendata.com/2014/08/excel-template-gain-and-lift-charts.html)
11. Statistical Hypothesis tests in Python. [Jason](https://machinelearningmastery.com/statistical-hypothesis-tests-in-python-cheat-sheet/)
12. Machine learning system design. [Link](https://huyenchip.com/machine-learning-systems-design/toc.html)
13. A/B Testing. [Link](https://nancyyanyu.github.io/posts/17c5bb19/), [Link](https://www.youtube.com/watch?v=DUNk4GPZ9bw&ab_channel=DataInterview)
14. Product Questions. [Quora](https://www.quora.com/profile/Teng-Lu-1/answers)
15. Random Forest to Layman. [Quora](https://www.quora.com/How-does-randomization-in-a-random-forest-work)
16. ANOVA, ANCOVA etc. [Link](http://www.statsmakemecry.com/smmctheblog/stats-soup-anova-ancova-manova-mancova)
17. ML System Design Template [Link](https://www.mle-interviews.com/ml-design-template)
18. PM Technical Concepts [Link](https://divyacohen.medium.com/how-to-prepare-for-googles-product-management-technical-round-when-you-are-not-technical-474de3ee01b3)
19. Trustworthy Online Controlled Experiments [Link](https://www.amazon.com/gp/product/1108724264/ref=as_li_tl?ie=UTF8&tag=rdy-20&camp=1789&creative=9325&linkCode=as2&creativeASIN=1108724264&linkId=ec7f21541818587686159b0d44e4f63d)
20. Product Sense [Link](https://prodbee.com/index.html)
21. Tableau-style User Interface for visual exploration [Link](https://github.com/Kanaries/pygwalker)

## Useful blogs to refer to:

1. Martin Henze (Heads or Tails). [Blog](https://heads0rtai1s.github.io/2020/11/05/r-python-dplyr-pandas/)
2. Python Snippets. [Link](https://github.com/dushyantkhosla/python-snippets)
3. PandasVault. [Link](https://github.com/firmai/pandasvault#shift-columns-to-front)
4. Python Engineer. [Twitter](https://twitter.com/python_engineer)
5. Paired vs Unpaired data: [link](https://socratic.org/questions/what-is-a-paired-and-unpaired-t-test-what-are-the-differences)
6. Data-informed product building: [Link](https://medium.com/sequoia-capital/data-informed-product-building-1e509a5c4112)
7. Metric: [Link](https://productlessons.substack.com/p/what-to-do-when-your-metrics-dip), [Link](https://igotanoffer.com/blogs/product-manager/product-metric-interview-questions), [SQL](https://quip.com/2gwZArKuWk7W)
8. Intro to Linear Algebra: [Link](https://pabloinsente.github.io/intro-linear-algebra)
9. IMS data sources: [Link](https://csimarket.com/stocks/segments.php?code=RX)
10. Predictive model performance check: [ListenData](https://www.listendata.com/2015/01/model-performance-in-logistic-regression.html)
11. Case Study: [Link](https://hackingthecaseinterview.thinkific.com/pages/market-entry-case-interview)
12. Collection of cases: [Link](https://hackingthecaseinterview.thinkific.com/pages/articles), [GAME](https://hackernoon.com/metrics-game-framework-5e3dce1be8ac)
13. Gradient Boosting: [Link](https://www.youtube.com/watch?v=3CC4N4z3GJc&ab_channel=StatQuestwithJoshStarmer)
14. Federated learning: [Link](https://www.quora.com/What-is-federated-learning), [Link2](https://federated.withgoogle.com/)
15. MLOps: [Link](https://github.com/GokuMohandas/madewithml)
16. Mixed Effect Models: [Link](https://towardsdatascience.com/how-linear-mixed-model-works-350950a82911), [Link1](https://medium.com/analytics-vidhya/introduction-to-mixed-models-208f012aa865)
17. ML System feature store: [Link](https://medium.com/data-for-ai/comprehensive-and-comparative-list-of-feature-store-architectures-for-data-scientists-and-big-data-86ea8c4d853b)
18. Data Science Cheat Sheet: [Link](https://www.theinsaneapp.com/2020/12/machine-learning-and-data-science-cheat-sheets-pdf.html)
19. Things that can go wrong: [Link](https://towardsdatascience.com/51-things-that-can-go-wrong-in-a-real-world-ml-project-c36678065a75)
20. Transformers from scratch [Link](https://e2eml.school/transformers.html)
21. Dive into Deep Learning [Link](https://d2l.ai/chapter_preface/index.html)
22. DL Interview [Link](https://arxiv.org/ftp/arxiv/papers/2201/2201.00650.pdf)
23. DL Rules of Thumb [Link](https://jeffmacaluso.github.io/post/DeepLearningRulesOfThumb/)
24. ML Forecasting [Link](https://towardsdatascience.com/ml-time-series-forecasting-the-right-way-cbf3678845ff)
25. MLOps without much Ops [Link](https://towardsdatascience.com/mlops-without-much-ops-d17f502f76e8)
26. Rules of Machine Learning by Google [Link](https://developers.google.com/machine-learning/guides/rules-of-ml)
27. Product Management for AI [Link](https://www.oreilly.com/radar/product-management-for-ai/)
28. Feature Engineering and stacking [Link](https://www.kaggle.com/code/solegalli/feature-engineering-and-model-stacking/notebook)
29. Distilled AI [Link](https://aman.ai/cs229/)
30. Leetcode List [link](https://aman.ai/code/)
31. There is only one test [Link](https://towardsdatascience.com/data-scientists-need-to-know-just-one-statistical-test-3115b2ff26fd)
32. Engineering Practices for DS [Link](https://valohai.com/engineering-practices-ebook/)
33. MLE Flashcards [Link](https://github.com/b7leung/MLE-Flashcards)
34. Time Series Forecasting [Link](https://github.com/KishManani/DataTalksClub2022/blob/main/Feature%20engineering%20for%20time%20series%20forecasting%20DataTalksClub.pdf)
35. MLStack.Cafe [Link](https://www.mlstack.cafe/)
36. Agile data science [Link](https://towardsdatascience.com/my-best-tips-for-agile-data-science-research-b40365cc979d)
37. Matt Mochary Method [Link](https://docs.google.com/document/d/18FiJbYn53fTtPmphfdCKT2TMWH-8Y2L-MLqDk-MFV4s/preview?pru=AAABhJXMgQo*wpkvH9cihXuCqm_7HASBVw)
38. Nubank [Link](https://building.nubank.com.br/data/data-science-machine-learning/)

### ML System Design:

1. Framework [Link](https://leetcode.com/discuss/interview-question/system-design/566057/machine-learning-system-design-a-framework-for-the-interview-day)
2. Product minded ML design. [Link](https://www.youtube.com/watch?v=Hv54e-9XnZ0&ab_channel=AssociationforComputingMachinery%28ACM%29)
3. ML Design [Link](https://github.com/khangich/machine-learning-interview/blob/master/design.md)
4. MLE Book [Link](http://www.mlebook.com/wiki/doku.php)
5. ML System design [Link](https://becominghuman.ai/machine-learning-system-design-f2f4018f2f8)
6. Full stack deep learning [Link](https://fall2019.fullstackdeeplearning.com/)
7. Production Machine Learning Problems [Link](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/46178.pdf)
8. ML System Design Resources [Link](https://www.teamblind.com/post/Machine-learning-engineering-and-ML-systems-design-resources-master-list-gWY7ZUTT)
9. Metric Question [Link](https://medium.com/datainterview/principles-and-frameworks-of-product-metrics-youtube-case-study-ff63257a82d3)
10. Product Metrics [Link](https://medium.com/datainterview/principles-and-frameworks-of-product-metrics-youtube-case-study-ff63257a82d3)
11. ML Stack Template [Link](https://ml-ops.org/content/state-of-mlops)
12. Patrick Halina - ML Design [Link](http://patrickhalina.com/posts/ml-systems-design-interview-guide/)
13. ML Interview [Link](https://github.com/alirezadir/machine-learning-interview-enlightener)
14. ML Cheat Sheet [Link](https://sites.google.com/view/datascience-cheat-sheets/machine-learning_1)
15. ML Project Timelines [Link](https://docs.google.com/document/d/1D-M6nxeLnIaFufS-u2Ymp45AYB9eEmVHZno7G2F545U/edit)
16. ML in Production [Link](https://mlinproduction.com/)
17. MLOps Paper [Link](https://arxiv.org/ftp/arxiv/papers/2205/2205.02302.pdf)
18. MLOps Questions [Link](https://hashdork.com/top-mlops-interview-questions/)
19. System Design Videos [Link](https://www.youtube.com/c/ByteByteGo/videos)
20. Instacart MLOps [Link](https://tech.instacart.com/lessons-learned-the-journey-to-real-time-machine-learning-at-instacart-942f3a656af3)
21. ML Tests [Link](https://github.com/microsoft/recommenders/tree/main/tests)
22. Operationalizing Machine Learning [Link](https://arxiv.org/pdf/2209.09125.pdf)
23. Swirlai [Link](https://www.newsletter.swirlai.com/archive?sort=new)
24. Twitter Recommendation [Link](https://blog.twitter.com/engineering/en_us/topics/open-source/2023/twitter-recommendation-algorithm)
25. AskItRight [Link](https://github.com/AbdArdati/PDFQueryAI/tree/main)

## Foundation Models:

1. Stanford LLM [Link](https://stanford-cs324.github.io/winter2023/)
2. Better Product Search [Link](https://www.databricks.com/blog/enhancing-product-search-large-language-models-llms.html)
3. Natural Language Processing with Deep Learning [Link](https://web.stanford.edu/class/cs224n/index.html?utm_source=substack&utm_medium=email#schedule)
4. Chat2Vis [Link](https://github.com/frog-land/Chat2VIS_Streamlit/blob/main/classes.py)
5. RAG vs Fine-tuning [Link](https://arxiv.org/pdf/2401.08406.pdf)
6. Practical RAG [Link](https://huggingface.co/blog/hrishioa/retrieval-augmented-generation-1-basics)
7. LLM Zoomcamp [Link](https://github.com/DataTalksClub/llm-zoomcamp)
8. What We Learned from a Year of Building with LLMs [Link](https://www.oreilly.com/radar/what-we-learned-from-a-year-of-building-with-llms-part-i/) [Link](https://applied-llms.org/)
9. LongRAG [Link](https://arxiv.org/abs/2406.15319)
10. RouteLLM [Link](https://arxiv.org/pdf/2406.18665)
11. ColPali: Efficient Document Retrieval with Vision Language Models [Link](https://arxiv.org/abs/2407.01449)
12. Llama 3.1 [Link](https://scontent-lga3-2.xx.fbcdn.net/v/t39.2365-6/452387774_1036916434819166_4173978747091533306_n.pdf?_nc_cat=104&ccb=1-7&_nc_sid=3c67a6&_nc_ohc=t6egZJ8QdI4Q7kNvgFoyQP-&_nc_ht=scontent-lga3-2.xx&oh=00_AYApQoV3LKJxSbr5OyC__fGrjVkPa0Ck_zIgLSoN9bE_uw&oe=66A642CD)
13. 🧠 IncarnaMind [Link](https://github.com/junruxiong/IncarnaMind/tree/main)
14. Mastering LLMs [Link](https://hamel.dev/blog/posts/course/)
15. LongRAG [Link](https://arxiv.org/pdf/2406.15319)
16. Denser Retriever [Link](https://github.com/denser-org/denser-retriever)
17. RAG Techniques [Link](https://github.com/NirDiamant/RAG_Techniques)
18. PDF extract [Link](https://github.com/opendatalab/PDF-Extract-Kit)
19. crawl4ai [Link](https://github.com/unclecode/crawl4ai?tab=readme-ov-file)
20. NotebookLM [Link](https://notebooklm.google/)
21. Open Data QA [Link](https://github.com/GoogleCloudPlatform/Open_Data_QnA)
22. Contextual RAG [Link](https://colab.research.google.com/drive/1hUi7ECRU5fXUZ_9IHN5vMOZz7rVO0Qn-?usp=sharing)
23. Long-Context LLMs Meet RAG [Link](https://arxiv.org/pdf/2410.05983)
24. LLM Systems [Link](https://llmsystem.github.io/llmsystem2025spring/)

## Useful LinkedIn Posts:

Understand the business context first. Don't get over-excited about the tech and jump into coding too early.

When someone asks you for a model, always ask:

👉 why do you need it?
👉 what is your current solution (e.g. what is the baseline to beat)?
👉 who is going to use the predictions and how?
👉 what is the financial impact of the model’s downtime or mistakes?
👉 which metrics do we care about, and what does each one measure?

Once you have your answers, back them up with a solid exploratory data analysis, and when you're done, loop in the business team again. This is a critical moment, as your results will translate into one of 3 potential outcomes:

💡 “Really? This contradicts what I thought. Well, in this case, the ML model doesn’t make much sense anymore”. You are off the hook without a single line of code 🔴
💡 “Ah, interesting. I guess we’ll have to change requirements/scope then.” Course-correct before moving forward 🟠
💡 “This is what I expected. Let’s go ahead”. Greenlight 🟢

[Vin Vashishta:](https://www.linkedin.com/in/vineetvashishta/) “Next year, we can deliver $X in cost savings and revenue. Last year we delivered B projects resulting in C revenue and D cost savings. We plan to grow that by E%, requiring an F% increase in our total budget to execute.”
