scikit-learn is a machine learning package for Python.


The stable version of scikit-learn

  • Simple and efficient tools for data mining and data analysis
  • Accessible to everybody, and reusable in various contexts
  • Built on NumPy, SciPy, and matplotlib
  • Open source, commercially usable - BSD license

Features

  • Classification: SVM, nearest neighbors, random forest
  • Regression: SVR, ridge regression, Lasso
  • Clustering: k-Means, spectral clustering, mean-shift
  • Dimensionality reduction: PCA, feature selection, non-negative matrix factorization
  • Model selection: grid search, cross validation, metrics
  • Preprocessing: normalization, feature extraction
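
As a rough illustration of how several of these features fit together, the sketch below chains preprocessing, dimensionality reduction, and classification, then scores the result with cross-validation. It assumes a recent scikit-learn release (the sklearn.model_selection module) and uses the bundled iris dataset; the choice of dataset, two PCA components, an RBF kernel, and five folds are illustrative assumptions, not recommendations from this page.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Small built-in dataset: 150 samples, 4 features, 3 classes.
X, y = load_iris(return_X_y=True)

# Preprocessing -> dimensionality reduction -> classification.
# The component count and kernel are arbitrary choices for this sketch.
model = make_pipeline(StandardScaler(), PCA(n_components=2), SVC(kernel="rbf"))

# Model selection: 5-fold cross-validation, one accuracy score per fold.
scores = cross_val_score(model, X, y, cv=5)
print("mean accuracy: %.3f" % scores.mean())
```

Wrapping the steps in a pipeline means the scaler and PCA are refit on each training fold, so no information from the held-out fold leaks into preprocessing.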

Documentation

Online documentation

Learning scikit-learn: Machine Learning in Python

Book by Raúl Garreta and Guillermo Moncecchi, published by Packt Publishing, November 2013.

IPython source code

Online Source Code

  • Chapter 1 - A Gentle Introduction to Machine Learning
  • Chapter 2 - Supervised Learning - Image Recognition with Support Vector Machines
  • Chapter 2 - Supervised Learning - Regression
  • Chapter 2 - Supervised Learning - Text Classification with Naive Bayes
  • Chapter 2 - Supervised Learning - Explaining Titanic Hypothesis with Decision Trees
  • Chapter 3 - Unsupervised Learning - Clustering Handwritten Digits
  • Chapter 3 - Unsupervised Learning - Principal Component Analysis
  • Chapter 4 - Advanced Features - Feature Engineering and Selection
  • Chapter 4 - Advanced Features - Model Selection