Some Data Science/Machine Learning References

Additional Current Resources:
Machine Learning Resources: https://www.sethi.org/classes/pub/resources/ml-resources.html
Project Topics/Datasets: https://www.sethi.org/classes/pub/resources/project_topics_datasets.html

Videos and Courses

Data Science Videos and Courses

Free: Edureka Data Science Full Course: https://www.youtube.com/watch?v=-ETQ97mXXF0
Free: Udacity Introduction to Data Science Course: https://www.udacity.com/course/ud359
Udemy Machine Learning A - Z Course: https://www.udemy.com/machinelearning/learn/v4/overview
Free: Udacity Course on Deep Learning: https://classroom.udacity.com/courses/ud730/lessons/6370362152/concepts/63798118150923#
Coursera's JHU Specialization in Data Science: https://www.coursera.org/specializations/jhu-data-science
Data Science with R: https://www.youtube.com/watch?v=32o0DnuRjfg
Nice video series on Data Mining and Predictive Modeling: https://www.youtube.com/watch?v=G_0d3w0THCc&list=PLea0WJq13cnCS4LLMeUuZmTxqsqlhwUoe
Great FREE courses on all aspects of data science: https://www.datacamp.com/courses/all

Data Science

Data Science Essentials

The 5 questions data science answers: https://docs.microsoft.com/en-us/azure/machine-learning/studio/data-science-for-beginners-the-5-questions-data-science-answers
1. Is this A or B? → Classification
2. Is this weird? → Anomaly Detection
3. How much, or how many? → Regression
4. How is this organized? → Clustering and Dimensionality Reduction
5. What should I do next? → Reinforcement Learning
But also questions like, Is this the best? which might require optimization, etc.
Industry recommendations for academic data science programs: https://github.com/brohrer/academic_advisory
Cheat Sheets for AI, Neural Networks, Machine Learning, Deep Learning & Big Data: https://becominghuman.ai/cheat-sheets-for-ai-neural-networks-machine-learning-deep-learning-big-data-678c51b4b463

Data Science Projects/Exercises

Five data science projects to learn data science: https://www.analyticsvidhya.com/blog/2014/11/data-science-projects-learn/
Analyzing Election and Polling Data in R: https://www.datacamp.com/courses/analyzing-election-and-polling-data-in-r
Presidential Elections from a Data Science Perspective: https://medium.com/@roberto.reydecastro/presidential-elections-from-a-data-science-perspective-41c94c4e7a6
Election 2016 EDA: https://www.inertia7.com/projects/58
Kaggle Election Tweets Text Mining: https://www.kaggle.com/erikbruin/text-mining-the-clinton-and-trump-election-tweets

Data Science vs. Data Analysis

Difference between Computer Science and Data Science: https://onthe.io/learn/en/category/analytic/Computer-Science-vs-Data-Science-vs-Informatics

Computer Science emphasizes theoretical and empirical approaches to manipulating data via computation or algorithmic processes.
Informatics deals with a broader study of data and its manipulation in information processes and systems, including social and cognitive models.
Data Science tackles structured and unstructured data and uses both computational methods and cognitive or social methods, especially when visualizing complicated data analytics and business analytics.
Difference between a Data Scientist and Data Analyst: https://www.captechconsulting.com/blogs/data-scientist-vs-data-analyst

"In general, while Data Analysts tend to be more business focused, Data Scientists are often more mathematically focused."
For Data Analysts, a greater emphasis might be on the ability to examine a database's records and the overall behavior of its objects.
Also, Data Analysts should set clear measurement priorities; decide what to measure and how to measure it.
Finally, Data Analysts organize the data efficiently and customize reports and dashboards based on business rules and requirements to make informed business decisions.
Data Science vs Data Analysis: https://www.import.io/post/data-scientists-vs-data-analysts-why-the-distinction-matters/
Data Science vs Data Analysis where the latter is drawing conclusions from raw data: https://www.simplilearn.com/data-science-vs-big-data-vs-data-analytics-article
Difference between Business Intelligence (BI) and Data Analytics: http://www.kdnuggets.com/2016/08/big-data-key-terms-explained.html/2

"Business Intelligence (BI) is about reporting what happened and Analytics is about answering why. BI includes collecting, transforming and loading (ETL), performing analysis (running queries etc) and presenting (visualize) the results. Power BI, Pentaho are the well known BI tools."
You could think of BI as answering what happened, DA as answering why, and DS as predicting what will happen.
Difference between Business Intelligence and Data Science: https://www.datasciencegraduateprograms.com/data-science-versus-data-analytics-and-business-intelligence/

Business Intelligence provides retrospective reports to help businesses monitor the current state and historical business performance. Data Science uses past data to make future predictions.
Retargeting Ads, image trackers, cross site conversion tracking: https://geobid.com/cross-site-conversion-tracking/

Advertisers can embed image pixels, share data on the backend, etc.; see also the Digital Advertising Alliance (DAA). Use extensions like Ghostery to block these trackers.
- A Beginner's Guide to Retargeting Ads: https://blog.hubspot.com/marketing/retargeting-campaigns-beginner-guide
- What is retargeting?: https://www.adroll.com/learn-more/retargeting
- How does Facebook know what I looked at on Amazon? : https://www.quora.com/How-does-Facebook-know-what-I-looked-at-on-Amazon-More-generally-how-do-two-sites-share-cookie-data

Miscellaneous Data Science Links

Importance of Data Preparation: https://www.dezyre.com/article/why-data-preparation-is-an-important-part-of-data-science/242
What is Data Science: https://www.cloudera.com/content/dam/cloudera/Resources/PDF/What_is_Data_Science_OReilly.pdf
5 questions Data Science answers: http://bigdata.black/analytics-predictions/data-science/data-science-beginners-basic-questions/
IoT Definition: http://www.mckinsey.com/industries/high-tech/our-insights/the-internet-of-things
One definition might be that IoT encompasses any sensor that can be controlled and is connected to the internet.
"In what's called the Internet of Things, sensors and actuators embedded in physical objects, from roadways to pacemakers, are linked through wired and wireless networks, often using the same Internet Protocol (IP) that connects the Internet."
So you can think of it as: a combination of a physical object + a controller/sensor/actuator + Internet
Difference between Machine Learning and Statistical Learning: https://www.quora.com/What-is-the-difference-between-statistical-learning-and-machine-learning-1
Machine Learning (T. Mitchell): a program is said to learn if its performance at a task improves with experience (data).
Statistical learning uses data to construct probabilistic models for analysis; random variables are used to describe features, and probability distributions are used to describe the statistical regularity of the data.
Difference between data lakes and data warehouses: https://www.kdnuggets.com/2015/09/data-lake-vs-data-warehouse-key-differences.html

Data Science Resources

Data Science Competitions and Job Sites

Kaggle: https://www.kaggle.com/
HackerRank: https://www.hackerrank.com/
AngelList Startups: https://angel.co/
NUMERAI: https://numer.ai/
Topcoder: https://www.topcoder.com/
CrowdANALYTIX: https://www.crowdanalytix.com/
DrivenData: https://www.drivendata.org/

Data Science Interview Questions/Hints

Difference between Data Analyst and Data Scientist and 5 essential interview questions: https://datasciencecareeroptions.com/resources/articles/spot-a-fake-data-scientist/
21 Must-Know Machine Learning Interview Questions and Answers: https://elitedatascience.com/machine-learning-interview-questions-answers
41 Essential Machine Learning Interview Questions (with answers): https://www.springboard.com/blog/machine-learning-interview-questions/
40 Interview Questions asked at Startups in Machine Learning/Data Science: https://www.analyticsvidhya.com/blog/2016/09/40-interview-questions-asked-at-startups-in-machine-learning-data-science/
21 Must-Know Data Science Interview Questions and Answers: http://www.kdnuggets.com/2016/02/21-data-science-interview-questions-answers.html
Quora's What are some common Machine Learning interview questions? https://www.quora.com/What-are-some-common-Machine-Learning-interview-questions
30 Questions to test a data scientist on Natural Language Processing [Solution: Skilltest NLP]: https://www.analyticsvidhya.com/blog/2017/07/30-questions-test-data-scientist-natural-language-processing-solution-skilltest-nlp/
30 Questions to test a data scientist on Linear Regression [Solution: Skilltest Linear Regression]: https://www.analyticsvidhya.com/blog/2017/07/30-questions-to-test-a-data-scientist-on-linear-regression/

EDA and Visualization

Visualization

Choosing a good chart: http://extremepresentation.typepad.com/files/choosing-a-good-chart-09.pdf
Types of Data Visualization: http://guides.library.duke.edu/datavis/vis_types
Excellent website on basics of visualization, especially Practical tab: https://eagereyes.org/section/techniques
38 best tools for data visualization: http://www.creativebloq.com/design-tools/data-visualization-712402
18 Free Exploratory Data Analysis Tools: https://www.analyticsvidhya.com/blog/2016/09/18-free-exploratory-data-analysis-tools-for-people-who-dont-code-so-well/
Ultimate resource for understanding & creating data visualization: https://www.analyticsvidhya.com/blog/2015/05/data-visualization-resource/
Excellent tips from Harvard Business Review on brainstorming good data viz: https://hbr.org/webinar/2017/12/better-charts-in-a-couple-of-hours-sketching-to-win
- Start by sketching what you want to communicate and why
- Think through your context: establish the context first and don't worry yet about what data you have, etc.
- HBR's sequel to the video: https://hbr.org/webinar/2018/02/the-right-stuff-chart-types-and-visualization-best-and-worst-practices#comment-section
Best Business Intelligence tools (Tableau, Qlik, Microsoft Power BI, Spotfire, Alteryx, etc.): https://www.linkedin.com/pulse/5-reasons-why-power-bi-taking-over-tableau-best-tool-tacoronte
- Tableau Student Edition: https://www.tableau.com/academic/students
- Spotfire Student Edition: https://spotfire.tibco.com/better-world-donation-program
15 Python Libraries for Data Science: https://www.upwork.com/hiring/data/15-python-libraries-data-science/
5 best libraries for building data visualizations: https://www.fastcompany.com/3029760/the-five-best-libraries-for-building-data-vizualizations
- TOOL: ZingChart: a very easy to use JavaScript library for graphing: https://www.zingchart.com/
- TOOL: Chart.js: another very easy to use JavaScript library for graphing: https://www.sitepoint.com/introduction-chart-js-2-0-six-examples/
"Above All Else Show the Data": https://medium.com/@plotlygraphs/above-all-else-show-the-data-1b8bbf05c2ae
Misleading statistics and visualizations:
- Quartz' guide to worst charts and how to fix them: https://qz.com/580859/the-most-misleading-charts-of-2015-fixed/
- Misleading Statistics Examples: http://www.datapine.com/blog/misleading-statistics-and-data/
- Most Common Data Visualization Mistakes And How To Avoid Them: http://www.datapine.com/blog/common-data-visualization-mistakes/
What is a Decision Tree Diagram: https://www.lucidchart.com/pages/decision-tree
- TOOL: Silver Decisions, open source decision tree software: http://silverdecisions.pl/
Random Forests and Decision Trees: A Practical Guide to Tree Based Learning Algorithms: https://sadanand-singh.github.io/posts/treebasedmodels/
- Practical Tutorial on Random Forest and Parameter Tuning in R: https://www.hackerearth.com/practice/machine-learning/machine-learning-algorithms/tutorial-random-forest-parameter-tuning-r/tutorial/
- Tuning the parameters of your Random Forest model: https://www.analyticsvidhya.com/blog/2015/06/tuning-random-forest-model/
- Classification And Regression Trees (CART): https://towardsdatascience.com/what-is-a-decision-tree-22975f00f3e1
- Decision Tree (CART) Retail Case Study Example: http://ucanalytics.com/blogs/decision-tree-cart-retail-case-example-part-5/

Exploratory Data Analysis (EDA)

Step-by-Step Exploratory Data Analysis (EDA) using Python: https://www.analyticsvidhya.com/blog/2022/07/step-by-step-exploratory-data-analysis-eda-using-python/
EDA from Penn State's Applied Data Mining course: https://onlinecourses.science.psu.edu/stat857/node/4
Cheat Sheet for Exploratory Data Analysis in Python: https://www.analyticsvidhya.com/blog/2015/06/infographic-cheat-sheet-data-exploration-python/
A Comprehensive Guide to Data Exploration: https://www.analyticsvidhya.com/blog/2016/01/guide-data-exploration/
How to choose the right chart: http://bigdata.black/analytics-predictions/visual-analytics/how-to-choose-the-right-chart/
18 Free Exploratory Data Analysis Tools For People who don't code so well: https://www.analyticsvidhya.com/blog/2016/09/18-free-exploratory-data-analysis-tools-for-people-who-dont-code-so-well/
The Art of Story Telling in Data Science and how to create data stories: https://www.analyticsvidhya.com/blog/2017/10/art-story-telling-data-science/

Feature Engineering

General Feature Engineering

Categorical Variables

Pre-processing: Part of pre-processing might involve converting non-numeric, categorical features into numeric variables using methods like label encoding or one-hot encoding. Mean/median/mode imputation and K-nearest neighbors imputation are related pre-processing steps, but they fill in missing values rather than encode categories. Here are some links to some helpful tutorials for converting categorical features to numeric encodings:
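
As a rough illustration of the two encodings named above, here is a minimal pure-Python sketch. The function names and the `colors` column are hypothetical, and real projects would more likely rely on a library encoder; this only shows what each encoding produces.

```python
def label_encode(values):
    """Map each distinct category to an integer, using sorted category order."""
    categories = sorted(set(values))
    mapping = {cat: idx for idx, cat in enumerate(categories)}
    return [mapping[v] for v in values], mapping


def one_hot_encode(values):
    """Build one 0/1 column per distinct category, using sorted category order."""
    categories = sorted(set(values))
    rows = [[1 if v == cat else 0 for cat in categories] for v in values]
    return rows, categories


# Hypothetical categorical feature column.
colors = ["red", "green", "blue", "green"]

labels, mapping = label_encode(colors)
one_hot, columns = one_hot_encode(colors)

print(labels)    # integer code per row
print(columns)   # column order of the one-hot matrix
print(one_hot)   # one row of 0/1 indicators per input value
```

Label encoding imposes an artificial order on the categories, which is one reason one-hot encoding is often preferred for nominal features.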

String Value Variables

Converting non-categorical, general string variables to numeric features can be a bit more challenging than converting categorical variables, as there is no single "best" way to do it. The best approach will depend on the specific data you are working with and the task at hand, but here are some general strategies and then links to tutorials for that, as well:
- Tokenization: This involves breaking the string into individual words or tokens. You can then use techniques like word embedding to convert the tokens into numeric vectors.
- N-grams: This involves creating features based on n-grams, which are sequences of n consecutive words or tokens. For example, you could create features based on bigrams (sequences of two words) or trigrams (sequences of three words).
- Entity extraction: This involves identifying and extracting named entities from the text, such as people, places, and organizations. You can then use techniques like entity embedding to convert the entities into numeric vectors.
- Bag-of-words: This involves creating a feature for each unique word in the text. The value of each feature is the number of times the word appears in the document.
- TF-IDF: This is a variation of bag-of-words that weights the features based on their term frequency (TF) and inverse document frequency (IDF). TF is the number of times the word appears in the document, and IDF is a measure of how rare the word is across all documents.
- Other more abstract representations, such as distributed text representations and embeddings like Word2Vec and GloVe.
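
The tokenization, n-gram, bag-of-words, and TF-IDF strategies can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the two documents are made up, tokenization is a naive lowercase whitespace split, and IDF is the unsmoothed log(N / document frequency), whereas libraries typically use smoothed variants.

```python
import math
from collections import Counter


def tokenize(text):
    """Naive tokenizer: lowercase, then split on whitespace."""
    return text.lower().split()


def ngrams(tokens, n):
    """All runs of n consecutive tokens, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_words(tokens):
    """Count how often each token appears in one document."""
    return Counter(tokens)


def tf_idf(docs_tokens):
    """Per-document weights: raw term count times log(N / document frequency)."""
    n_docs = len(docs_tokens)
    doc_freq = Counter()
    for tokens in docs_tokens:
        doc_freq.update(set(tokens))  # count each word once per document
    weights = []
    for tokens in docs_tokens:
        counts = Counter(tokens)
        weights.append(
            {word: count * math.log(n_docs / doc_freq[word])
             for word, count in counts.items()}
        )
    return weights


# Hypothetical two-document corpus.
docs = ["the cat sat on the mat", "the dog sat"]
tokens = [tokenize(d) for d in docs]

bow = bag_of_words(tokens[0])
bigrams = ngrams(tokens[0], 2)
weights = tf_idf(tokens)

print(bow["the"], bigrams[0], weights[0]["the"], weights[0]["cat"])
```

Words that occur in every document (here "the" and "sat") get a weight of zero, while words unique to one document keep a positive weight.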

Machine Learning

Choosing Machine Learning Algorithms

3 Must-Ask Questions Before Choosing That Machine Learning Algorithm: https://medium.com/@ForecastThis/3-must-ask-questions-before-choosing-that-machine-learning-algorithm-c49bd61860ed#.iiebm2v95
Choosing between Machine Learning algorithms: http://www.datasciencecentral.com/profiles/blogs/want-to-know-how-to-choose-machine-learning-algorithm
10 essential ML algorithms: https://chatbotsmagazine.com/machine-learning-neural-networks-and-algorithms-5c0711eb8f9a
Simple guide to ML: https://10clouds.com/blog/machine-learning-startup/
Tutorial on Automated Machine Learning using MLBox: https://www.analyticsvidhya.com/blog/2017/07/mlbox-library-automated-machine-learning/
RapidMiner Studio Machine Learning/Data Science tools platform: https://rapidminer.com/educational-program/
Strengths and Weaknesses of ML Algorithms: https://elitedatascience.com/machine-learning-algorithms
ML and Optimization: Machine Learning = Representation + Evaluation + Optimization: https://medium.com/@devnag/machine-learning-representation-evaluation-optimization-fc7b26b38fdb
Most methods in machine learning are based on finding parameters that minimize some objective/loss/cost function. ML is mainly about generalization.
- Quora: How much of machine learning is actually just optimization?: https://www.quora.com/How-much-of-machine-learning-is-actually-just-optimization
Extracting statistical features from time series datasets:
- Time series classification based on statistical features: https://jwcn-eurasipjournals.springeropen.com/articles/10.1186/s13638-020-1661-4
- Basic Feature Engineering With Time Series Data in Python: https://machinelearningmastery.com/basic-feature-engineering-time-series-data-python/
- Application of Machine Learning for Tool Condition Monitoring in Turning: https://www.techscience.com/sv/v56n2/47132/html
What is the difference between a Generative and Discriminative Algorithm?: https://stackoverflow.com/questions/879432/what-is-the-difference-between-a-generative-and-discriminative-algorithm
The generative/discriminative distinction only applies to probabilistic methods. A generative model learns the joint probability distribution p(x,y), whereas a discriminative model learns the conditional probability distribution p(y|x).
- Generative vs. discriminative: https://stats.stackexchange.com/questions/12421/generative-vs-discriminative
- ML FAQ: http://stp.lingfil.uu.se/~santinim/ml/FAQs.pdf
- Generative vs. Discriminative; Bayesian vs. Frequentist: https://lingpipe-blog.com/2013/04/12/generative-vs-discriminative-bayesian-vs-frequentist/
  - Bayesian Data Analysis: Generative Models: https://www.youtube.com/watch?v=mAUwjSo5TJE
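
To make the p(x,y) versus p(y|x) distinction concrete, here is a small count-based sketch. The weather/play observations are invented for illustration; the point is only that the joint distribution can always be converted into the conditional, while the conditional alone does not recover p(x).

```python
from collections import Counter

# Hypothetical (x, y) observations: x = weather, y = whether a game was played.
data = [
    ("sunny", "yes"), ("sunny", "yes"), ("sunny", "no"), ("sunny", "yes"),
    ("rainy", "no"), ("rainy", "no"), ("rainy", "yes"), ("rainy", "no"),
]

n = len(data)
joint_counts = Counter(data)
x_counts = Counter(x for x, _ in data)

# Generative view: estimate the joint distribution p(x, y).
p_joint = {xy: count / n for xy, count in joint_counts.items()}

# Discriminative view: estimate only the conditional distribution p(y | x).
p_cond = {(x, y): count / x_counts[x] for (x, y), count in joint_counts.items()}

# The joint yields the conditional via p(y | x) = p(x, y) / p(x).
p_x = {x: count / n for x, count in x_counts.items()}
derived = {(x, y): p / p_x[x] for (x, y), p in p_joint.items()}

print(p_joint[("sunny", "yes")], p_cond[("sunny", "yes")], derived[("sunny", "yes")])
```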
Gradient Descent: Life is gradient descent: https://hackernoon.com/life-is-gradient-descent-880c60ac1be8
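Gradient descent is a local optimizer: it converges to a nearby minimum, which is the global minimum only when the objective is convex. The noise in stochastic gradient descent can help it escape shallow local minima, but it does not guarantee a global optimum. Approaches like simulated annealing are designed to search for a global minimum for different types of functions, though they offer no guarantee in finite time either.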
- EXCELLENT NOTES HERE: Stochastic Gradient Descent is quick and dirty approach: https://am207.github.io/2017/wiki/gradientdescent.html#stochastic-gradient-descent
  - AM 207 Course Material: https://am207.github.io/2017/material.html
- Optimization: https://frnsys.com/ai_notes/foundations/optimization.html
- What is an intuitive explanation of gradient descent?: https://www.quora.com/What-is-an-intuitive-explanation-of-gradient-descent
- Adventures in AI part 1: What is a gradient descent algorithm?: http://fizzylogic.nl/2017/05/26/adventures-in-ai-part-1-what-is-a-gradient-descent-algorithm/
- Hello, Gradient Descent: https://medium.com/ai-society/hello-gradient-descent-ef74434bdfa5
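
The local nature of gradient descent can be seen in a short sketch. The two test functions, the learning rates, and the step counts below are illustrative choices, not taken from any of the linked articles: on a convex function the method reaches the single minimum, while on a non-convex one the answer depends on the starting point.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=200):
    """Repeatedly step against the gradient, starting from x0."""
    x = x0
    for _ in range(steps):
        x -= learning_rate * grad(x)
    return x


def f_grad(x):
    # Gradient of the convex function f(x) = (x - 3)^2, whose only minimum is x = 3.
    return 2 * (x - 3)


def g(x):
    # Hypothetical non-convex function with two separate minima.
    return x**4 - 3 * x**2 + x


def g_grad(x):
    # Derivative of g.
    return 4 * x**3 - 6 * x + 1


convex_min = gradient_descent(f_grad, x0=0.0)
left = gradient_descent(g_grad, x0=-2.0, learning_rate=0.01, steps=2000)
right = gradient_descent(g_grad, x0=2.0, learning_rate=0.01, steps=2000)

# Both runs on g stop where the gradient vanishes, but at different minima;
# only the one started on the left reaches the lower (global) minimum.
print(convex_min, left, right, g(left) < g(right))
```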
Quora: What is the difference between a parametric model and a non-parametric model?: https://www.quora.com/What-is-the-difference-between-a-parametric-model-and-a-non-parametric-model
ML methods can be categorized as generative/discriminative, parametric/nonparametric, supervised/unsupervised, etc. Nonparametric just means that the number of parameters is not fixed in advance but depends on (and typically grows with) the data. Parametric methods are fit by optimizing a fixed set of parameters, whereas nonparametric methods might not be optimized in the same way.
- What is the difference between a parametric learning algorithm and a nonparametric learning algorithm?: https://sebastianraschka.com/faq/docs/parametric_vs_nonparametric.html
- What are real life examples of non-parametric statistical models?: https://stats.stackexchange.com/questions/230044/what-are-real-life-examples-of-non-parametric-statistical-models
- Machine Learning Thoughts; Parametric or Nonparametric Model: https://www.linkedin.com/pulse/machine-learning-thoughts-parametric-nonparametric-model-mokhtarian/
- Machine Learning Lesson of the Day Parametric vs. Non-Parametric Models: https://chemicalstatistician.wordpress.com/2014/01/14/machine-learning-lesson-of-the-day-parametric-vs-non-parametric-models/
- Parametric and Nonparametric Machine Learning Algorithms: https://machinelearningmastery.com/parametric-and-nonparametric-machine-learning-algorithms/

Machine Learning Tutorials/Guides

Metrics: Excellent Guide: F1 Score vs ROC AUC vs Accuracy vs PR AUC: Which Evaluation Metric Should You Choose?: https://neptune.ai/blog/f1-score-accuracy-roc-auc-pr-auc
Metrics: Micro, Macro & Weighted Averages of F1 Score, Clearly Explained: https://towardsdatascience.com/micro-macro-weighted-averages-of-f1-score-clearly-explained-b603420b292f
ROC curves and Area Under the Curve explained: http://www.dataschool.io/roc-curves-and-auc-explained/
Machine Learning is Fun! Part 4: Modern Face Recognition with Deep Learning: https://medium.com/@ageitgey/machine-learning-is-fun-part-4-modern-face-recognition-with-deep-learning-c3cffc121d78
Succinct guide to Model Evaluation for Binary Classification (Precision, Recall, Sensitivity, Specificity): http://www.ling.upenn.edu/courses/ling005/BinaryClassification.html
- Another nice summary of the Confusion Matrix metrics: http://www.dataschool.io/simple-guide-to-confusion-matrix-terminology/
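Accuracy Paradox: accuracy is the number of correct predictions made divided by the total number of predictions made, so on imbalanced data a model that always predicts the majority class can score high accuracy while being useless: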
- Beyond Accuracy: Precision and Recall: https://towardsdatascience.com/beyond-accuracy-precision-and-recall-3da06bea9f6c
- Why accuracy alone is a bad measure for classification tasks: https://tryolabs.com/blog/2013/03/25/why-accuracy-alone-bad-measure-classification-tasks-and-what-we-can-do-about-it/
- An Intuitive Explanation: https://www.linkedin.com/pulse/intuitive-explanation-precision-recall-accuracy-daniel-d-souza/
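
A small sketch of the accuracy paradox, assuming a made-up data set of 95 negatives and 5 positives and a degenerate classifier that always predicts the negative class. Precision and F1 are defined as 0 here when there are no positive predictions, which is a common convention rather than the only one.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary problem."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1, using 0.0 when a ratio is undefined."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical imbalanced data: 95 negatives, 5 positives.
y_true = [0] * 95 + [1] * 5
# A classifier that ignores its input and always predicts the majority class.
y_pred = [0] * 100

scores = classification_metrics(y_true, y_pred)
print(scores)
```

The classifier is right 95% of the time yet never finds a single positive case, which is why recall, precision, and F1 are reported alongside accuracy.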

kNN

Great overall technical guide: https://machinelearningmastery.com/k-nearest-neighbors-for-machine-learning/
kNN is lazy and does NOT learn a model (no parameters learned from data only hyperparameters tuned): https://medium.com/@adi.bronshtein/a-quick-introduction-to-k-nearest-neighbors-algorithm-62214cea29c7
Tuning the hyperparameter k: https://kevinzakka.github.io/2016/07/13/k-nearest-neighbor/
Discuss what learning happens in lazy learner kNN: https://www.kdnuggets.com/2017/09/rapidminer-k-nearest-neighbors-laziest-machine-learning-technique.html
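
To illustrate why kNN is called lazy, here is a minimal sketch in which "training" is just keeping the data and all the work happens at prediction time. The points, labels, and the choice of Euclidean distance with k = 3 are illustrative assumptions.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify query by majority vote among its k nearest training points."""
    # No model is fitted: distances to every stored point are computed on demand.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical 2-D training data with two well-separated classes.
train_points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
train_labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(train_points, train_labels, (2, 2)))
print(knn_predict(train_points, train_labels, (6, 5)))
```

The only thing to tune is the hyperparameter k (and the distance function); nothing is learned from the data ahead of time.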

Neural Network Links

MarI/O - Machine Learning for Video Games: Evolutionary Algorithms for Neural Networks: https://www.youtube.com/watch?v=qv6UVOQ0F44
Great video introduction to RNNs and LSTMs: https://www.youtube.com/watch?v=WCUNPb-5EYI
Understanding LSTM Networks: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
A Beginners Guide to Recurrent Networks and LSTMs: https://deeplearning4j.org/lstm.html
Exploring LSTMs by Edwin Chen: http://blog.echen.me/2017/05/30/exploring-lstms/
Essentials of Deep Learning : Introduction to Long Short Term Memory: https://www.analyticsvidhya.com/blog/2017/12/fundamentals-of-deep-learning-introduction-to-lstm/
DeepLizard Video on CNNs: https://www.youtube.com/watch?v=YRhxdVk_sIs
EXCELLENT: What is a receptive field in a convolutional neural network?: https://www.quora.com/What-is-a-receptive-field-in-a-convolutional-neural-network
... the input neurons are basically the pixel intensities of an input image and on the right is a one hidden neuron out of the many neurons in the first hidden layer. Each neuron will be connected to only a region of the input layer, that region in the input image is called the local receptive field for the hidden neuron. It's a little window on the input pixels. Receptive field, kernel and filter are sometimes used interchangeably, though strictly the receptive field is the input region while the kernel/filter is the set of weights applied to it.
More details and source pages
- EXCELLENT online book by Michael Nielsen on Neural Networks and source of above image: http://neuralnetworksanddeeplearning.com/chap6.html
  - Heuristics for choosing HyperParameters: http://neuralnetworksanddeeplearning.com/chap3.html#how_to_choose_a_neural_network's_hyper-parameters
    
    Hyperparameters are those that are not learned from the data and are set by the developer
    E.g., see https://machinelearningmastery.com/difference-between-a-parameter-and-a-hyperparameter/ and https://www.quora.com/What-are-hyperparameters-in-machine-learning
- A Beginner's Guide To Understanding Convolutional Neural Networks by Adit Deshpande: https://adeshpande3.github.io/A-Beginner's-Guide-To-Understanding-Convolutional-Neural-Networks/
  
  Great explanation of sliding windows, activation map, feature maps, etc., as well as that each layer's depth is proportional to the number of filters used for that layer, which could be hundreds: "Remember, this is just for one filter. This is just a filter that is going to detect lines that curve outward and to the right. We can have other filters for lines that curve to the left or for straight edges. The more filters, the greater the depth of the activation map, and the more information we have about the input volume." And classic CNN architecture would be like:
  
  Input → Conv → ReLU → Pool → ReLU → Conv → ReLU → Pool → Fully-Connected
  
  "We talked about what the filters in the first conv layer are designed to detect. They detect low level features such as edges and curves. As one would imagine, in order to predict whether an image is a type of object, we need the network to be able to recognize higher level features such as hands or paws or ears. So lets think about what the output of the network is after the first conv layer. It would be a 28 x 28 x 3 volume (assuming we use three 5 x 5 x 3 filters)."
  
  "So each layer of the input is basically describing the locations in the original image for where certain low level features appear. Now when you apply a set of filters on top of that (pass it through the 2nd conv layer), the output will be activations that represent higher level features. Types of these features could be semicircles (combination of a curve and straight edge) or squares (combination of several straight edges). As you go through the network and go through more conv layers, you get activation maps that represent more and more complex features. "
- Deep Learning Crash Course Part 2 by Sasank Chilamkurthy: https://chsasank.github.io/deep-learning-crash-course-2.html
  
  Great explanation of feature maps, pooling/sub-sampling, etc. "We can think of max-pooling as a way for the network to ask whether a given feature is found anywhere in a region of the image. It then throws away the exact positional information. "
- Understanding Convolutional Neural Networks for NLP: http://www.wildml.com/2015/11/understanding-convolutional-neural-networks-for-nlp/
  
  Great intuitive explanation of computer vision filters of edges, etc., and imaging of intermediate layers: "For example, in Image Classification a CNN may learn to detect edges from raw pixels in the first layer, then use the edges to detect simple shapes in the second layer, and then use these shapes to detect higher-level features, such as facial shapes in higher layers. The last layer is then a classifier that uses these high-level features."
- Understanding of Convolutional Neural Network - (CNN) Deep Learning: https://medium.com/@RaghavPrabhu/understanding-of-convolutional-neural-network-cnn-deep-learning-99760835f148
  
  Great listing of activation functions (like sigmoid, tanh, ReLU, etc.) that's combined with Convolution and also different pooling functions like max, average, sum, etc.
- Receptive Fields and Shared Parameters: http://cs231n.github.io/convolutional-networks/
- Receptive Field Arithmetic in CNNs by Dang Ha The Hien: https://medium.com/@nikasa1889/a-guide-to-receptive-field-arithmetic-for-convolutional-neural-networks-e0f514068807
- Single Layer Perceptron creates a linear hyperplane and cannot implement non linearly separable functions like XOR: http://computing.dcu.ie/~humphrys/Notes/Neural/single.neural.html
- 12-minute tutorial on perceptrons (and others!): https://appliedgo.net/perceptron/
  - 12-minute tutorial on REST, MapReduce, Flow-Based Programming, etc.: https://appliedgo.net//categories/tutorial/
- Different ANN Architectures: http://www.cs.stir.ac.uk/courses/ITNP4B/lectures/kms/2-Perceptrons.pdf
- 10 misconceptions about Neural Networks: http://www.turingfinance.com/misconceptions-about-neural-networks/
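
The 28 x 28 x 3 volume quoted above from the beginner's guide to CNNs, and the max-pooling idea quoted from the crash course, can both be checked with a little arithmetic. This sketch assumes a 32 x 32 input image, stride 1, and no padding (the quoted passage does not state the input size), and the 4 x 4 feature map is made up.

```python
def conv_output_size(input_size, filter_size, stride=1, padding=0):
    """Spatial size of a convolution output along one dimension."""
    return (input_size - filter_size + 2 * padding) // stride + 1


def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a 2-D list with even dimensions."""
    pooled = []
    for r in range(0, len(feature_map) - 1, 2):
        row = []
        for c in range(0, len(feature_map[0]) - 1, 2):
            window = [
                feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            ]
            # Keep only the strongest response; its exact position is discarded.
            row.append(max(window))
        pooled.append(row)
    return pooled


# Assumed 32x32 input with three 5x5 filters: one 28x28 activation map per filter.
side = conv_output_size(32, 5)
num_filters = 3
output_volume = (side, side, num_filters)

# Hypothetical 4x4 activation map.
feature_map = [
    [1, 0, 2, 3],
    [4, 6, 6, 8],
    [3, 1, 1, 0],
    [1, 2, 2, 4],
]
pooled = max_pool_2x2(feature_map)

print(output_volume, pooled)
```

The depth of the output volume equals the number of filters, which matches the point that more filters give a deeper activation map.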
Biological Basis of Convolutions/Neural Networks
- Neural Networks and Neuroscience-Inspired Computer Vision by David Cox: https://www.sciencedirect.com/science/article/pii/S0960982214010392
- Convolutional Neural Network- Inspired by the Brain: http://blog.arimaresearch.com/convolutional-neural-network-cnn/
- Overview and Applications of Artificial Neural Networks: https://medium.com/@xenonstack/overview-of-artificial-neural-networks-and-its-applications-2525c1addff7
- The Brain vs Deep Learning: http://timdettmers.com/2015/07/27/brain-vs-deep-learning-singularity/
- How brains differ from computers: http://www.explainthatstuff.com/introduction-to-neural-networks.html
Great overview of neural networks and their history and neural connection: https://gizmodo.com/youre-using-neural-networks-every-day-online-heres-h-1711616296
Neuromorphic Chips: https://en.wikichip.org/wiki/intel/loihi

Chris Eliasmith also discusses the design of analog (or digital) neuromorphic chips that simulate biological neurons and neural circuits; some artificial brains have up to 3.5 million neurons and 1 billion synapses.
https://www.humanbrainproject.eu/en/silicon-brains/
https://www.therecord.com/news-story/7763151-waterloo-prof-constructs-world-s-largest-simulation-of-a-human-brain/

Regression

Excellent introduction to Linear Regression Analysis and Correlation Coefficients: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2992018/
Logistic Regression intro: http://www.statisticssolutions.com/what-is-logistic-regression/

Ensemble Methods

5 Easy questions on Ensemble Modeling everyone should know: https://www.analyticsvidhya.com/blog/2015/09/questions-ensemble-modeling/
Bagging (Bootstrap Aggregating): multiple classifiers are each trained on a bootstrap sample of the data, and their results are combined using averaging or majority voting.
Boosting (AdaBoost, XGBoost, Gradient Tree Boost): provides sequential learning of the predictors. In AdaBoost, the first predictor is learned on the whole data set with equal weights on all samples, and each subsequent learner assigns a higher weight to misclassified samples before learning; gradient boosting methods instead fit each new learner to the errors of the current ensemble.
Stacking: uses multiple base classifiers for prediction; a meta-learner is then used to combine their predictions.
Ensemble Methods: https://www.toptal.com/machine-learning/ensemble-methods-machine-learning
Committees, Random Forests, bagging vs boosting, bias variance, etc.: http://www.cs.colostate.edu/~cs545/fall15/lib/exe/fetch.php?media=wiki:21_ensembles.pdf
Ensemble Learning to Improve Machine Learning Results: https://blog.statsbot.co/ensemble-learning-d1dcd548e936
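
The mechanics behind the bagging and boosting descriptions above can be sketched without any ML library. This is not a full ensemble implementation: it shows bootstrap sampling, majority voting, and a single AdaBoost reweighting round on made-up labels in {-1, +1}, and it assumes the weighted error lies strictly between 0 and 1.

```python
import math
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Bagging: draw len(data) items with replacement."""
    return rng.choices(data, k=len(data))


def majority_vote(predictions):
    """Combine several models' predictions for one sample."""
    return Counter(predictions).most_common(1)[0][0]


def adaboost_reweight(weights, y_true, y_pred):
    """One AdaBoost round: learner weight alpha and renormalized sample weights."""
    # Weighted error of the current learner (assumed to satisfy 0 < error < 1).
    error = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    alpha = 0.5 * math.log((1 - error) / error)
    # Misclassified samples are scaled up, correctly classified ones scaled down.
    updated = [
        w * math.exp(alpha if t != p else -alpha)
        for w, t, p in zip(weights, y_true, y_pred)
    ]
    total = sum(updated)
    return alpha, [w / total for w in updated]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
sample = bootstrap_sample(list(range(10)), rng)
vote = majority_vote(["spam", "ham", "spam"])

# Hypothetical labels; only the last sample is misclassified.
y_true = [1, 1, -1, -1, 1]
y_pred = [1, 1, -1, -1, -1]
alpha, new_weights = adaboost_reweight([0.2] * 5, y_true, y_pred)

print(sample, vote, alpha, new_weights)
```

After the update, the single misclassified sample carries half of the total weight, so the next learner is pushed to get it right.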

Probability and Statistics

Programming Languages for Data Science

Python Links

Useful python libraries for data science: https://github.com/rasbt/pattern_classification/blob/master/resources/python_data_libraries.md

R Links

Introduction to R Course: https://www.datacamp.com/courses/free-introduction-to-r
Try R Course from Code School: https://www.codeschool.com/courses/try-r
R Statistics Tutorial for Beginners: https://www.youtube.com/watch?v=qEJHYIa-EhI
Debugging, condition handling, and defensive programming in R: http://adv-r.had.co.nz/Exceptions-Debugging.html
Data Frames in R: https://www.datacamp.com/community/tutorials/15-easy-solutions-data-frame-problems-r#gs.null

General Reference Sites

Cybersecurity and Data Security

Verizon's annual Data Breach Investigations Report (DBIR): http://www.verizonenterprise.com/resources/reports/rp_DBIR_2016_Report_en_xg.pdf
Web applications are the number one attack vector across a number of industries, especially using JavaScript and SQL.

Artificial Intelligence Links

The History of Artificial Intelligence: http://courses.cs.washington.edu/courses/csep590/06au/projects/history-ai.pdf
Overview of AI Libraries in Java: http://www.baeldung.com/java-ai

CyberSecurity Links

InfoSec Labs: https://lab.infoseclearning.com/labs

Miscellaneous ML Notes

Covariance and correlation are similar concepts; the correlation between two variables is equal to their covariance divided by the product of their standard deviations, as explained at http://mccormickml.com/2014/07/22/mahalanobis-distance/
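
As a quick check of the relationship between the two quantities, here is a minimal sketch that computes the sample covariance and then the Pearson correlation as the covariance divided by the product of the two standard deviations. The height and weight values are made up.

```python
import math


def covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def correlation(xs, ys):
    """Pearson correlation: covariance over the product of standard deviations."""
    std_x = math.sqrt(covariance(xs, xs))  # variance is the covariance of x with itself
    std_y = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (std_x * std_y)


# Hypothetical measurements.
heights_m = [1.60, 1.70, 1.80, 1.90]
weights_kg = [55.0, 68.0, 74.0, 90.0]
heights_cm = [h * 100 for h in heights_m]

# Changing units rescales the covariance but leaves the correlation unchanged.
print(covariance(heights_m, weights_kg), covariance(heights_cm, weights_kg))
print(correlation(heights_m, weights_kg), correlation(heights_cm, weights_kg))
```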

We can use the Mahalanobis distance to find outliers in multivariate data. It measures how far a point is from a distribution (or the separation of two groups of objects) in units that account for the variance and covariance of the variables. Nice intuitive explanation here: https://www.theinformationlab.co.uk/2017/05/26/mahalanobis-distance/ The covariance matrix provides the covariance associated with the variables (covariance is used to capture the effect of two or more variables together).

It is primarily used in classification and clustering problems where there is a need to establish correlation between different groups/clusters of data. Euclidean distance only makes sense when all the dimensions have the same units (like meters), since it involves adding the squared value of them.

When you are dealing with probabilities, a lot of times the features have different units. For example: we might have a model for men and a model for women, where both models are based on their weight [kg] and height [m]. We also know the mean and covariance for each model. Now if we get a new measurement vector, an ordered set composed of weight and height, we have to decide if it's a man or a woman. We can use the Mahalanobis distance from the models of both men and women to decide which is closer, meaning which is more probable. The Mahalanobis distance transforms the random vector into a zero mean vector with an identity matrix for covariance. In that space, the Euclidean distance is safely applied.
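
The weight/height example can be sketched for the two-feature case, where the 2 x 2 covariance matrix is inverted by hand. The means and covariance matrices below are invented for illustration and are not real population statistics.

```python
import math


def mahalanobis_2d(point, mean, cov):
    """Mahalanobis distance of a 2-D point from a distribution (mean, 2x2 covariance)."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    # Inverse of a 2x2 matrix.
    inv = [[d / det, -b / det], [-c / det, a / det]]
    dx = point[0] - mean[0]
    dy = point[1] - mean[1]
    # Quadratic form (p - mean)^T * inverse(cov) * (p - mean).
    quad = dx * (inv[0][0] * dx + inv[0][1] * dy) + dy * (inv[1][0] * dx + inv[1][1] * dy)
    return math.sqrt(quad)


# Hypothetical models: (weight in kg, height in m).
men_mean, men_cov = (80.0, 1.78), [[100.0, 0.5], [0.5, 0.005]]
women_mean, women_cov = (65.0, 1.65), [[80.0, 0.4], [0.4, 0.004]]

measurement = (70.0, 1.70)
d_men = mahalanobis_2d(measurement, men_mean, men_cov)
d_women = mahalanobis_2d(measurement, women_mean, women_cov)

closest = "women" if d_women < d_men else "men"
print(d_men, d_women, closest)
```

With an identity covariance matrix the function reduces to the ordinary Euclidean distance, which is a handy sanity check.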

Linear Discriminant Analysis (LDA) is used to classify multiple classes and, like Principal Component Analysis (PCA), can also serve as a dimensionality reduction technique. For two classes, you can just use Logistic Regression. For each input variable, you need to calculate the mean value of that variable for each class as well as the variance of that variable calculated across all classes. "Predictions are made by calculating a discriminate value for each class and making a prediction for the class with the largest value." https://www.kdnuggets.com/2018/02/tour-top-10-algorithms-machine-learning-newbies.html

The motivation for dimensionality reduction follows from the Curse of Dimensionality: as we add more and more dimensions to our feature vector, we need more computational power and data to effectively train the model. If you add in more features, you need more data, as seen here: https://towardsdatascience.com/curse-of-dimensionality-2092410f3d27

Thus, the goal of LDA is to reduce the dimension of the feature vectors while preserving as much class-discriminating information as possible and maximizing class separability; discrimination here means coming up with a rule that accurately assigns a new measurement/vector to one of several classes. http://www.cs.uml.edu/~ycao/teaching/fall_2013/downloads/05_MC_2_LDA.pdf

The rule is a discriminant function, a linear equation of the X variables that provides the best separation between the categories of the Y variable. This checks to see if there are significant inter-group differences in terms of the X variables. It also identifies the X variables that contribute most to the inter-group separation. https://www.datascience.com/blog/predicting-customer-churn-with-a-discriminant-analysis
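
For a single input variable, the discriminant rule described in these notes can be sketched as follows. This assumes the common textbook form with one pooled variance shared by all classes, D_k(x) = x * mu_k / var - mu_k^2 / (2 * var) + ln(prior_k); the function names and the toy data are hypothetical.

```python
import math


def fit_lda_1d(xs, ys):
    """Estimate per-class means and priors plus one pooled variance."""
    classes = sorted(set(ys))
    n = len(xs)
    means, priors = {}, {}
    squared_dev = 0.0
    for k in classes:
        members = [x for x, y in zip(xs, ys) if y == k]
        means[k] = sum(members) / len(members)
        priors[k] = len(members) / n
        squared_dev += sum((x - means[k]) ** 2 for x in members)
    # Pooled variance: within-class squared deviations over (n - number of classes).
    variance = squared_dev / (n - len(classes))
    return means, priors, variance


def predict_lda_1d(x, means, priors, variance):
    """Return the class with the largest linear discriminant value."""
    def discriminant(k):
        return (x * means[k] / variance
                - means[k] ** 2 / (2 * variance)
                + math.log(priors[k]))
    return max(means, key=discriminant)


# Hypothetical one-feature training data with two classes.
xs = [1.0, 1.5, 2.0, 6.0, 6.5, 7.0]
ys = ["a", "a", "a", "b", "b", "b"]

means, priors, variance = fit_lda_1d(xs, ys)
print(means, priors, variance)
print(predict_lda_1d(2.5, means, priors, variance))
print(predict_lda_1d(5.0, means, priors, variance))
```

With equal priors and a shared variance, the decision boundary falls midway between the two class means (here at x = 4.0).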

Ricky J. Sethi, PhD <rickys@sethi.org>

Last updated: Friday, December 6 2024
(www.sethi.org/tutorials/references_data_science.shtml)