Blog

Blog Categories

   Random Forest is a powerful ensemble learning method that can be applied to various prediction tasks, in particular classification and regression. The method uses an ensemble of decision trees as a basis and therefore has all advantages of decision trees, such as high accuracy, easy usage, and no necessity of scaling data. Moreover, it also has a very important additional benefit, namely perseverance to overfitting (unlike...
The way other people think about one or another product or service has a big impact on our everyday process of making decisions. Earlier, people relied on the opinion of their friends, relatives, or products and services reposts, but the era of the Internet has made significant changes. Today opinions are collected from different people around the world via reviewing e-commerce sites as well as blogs and social nets. To transform gathered...
The random forest algorithm is the combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. It can be applied to different machine learning tasks, in particular, classification and regression. Random Forest uses an ensemble of decision trees as a basis and therefore has all advantages of decision trees, such as high accuracy,...
Curious about neural networks and deep learning? This post will inspire you to get started in deep learning. Why are we witnessing this kind of build up for neural networks? It is because of their amazing applications. Some of their applications include image classification, face recognition, pattern recognition, automatic machine translation, and so on. So, let’s get started now. Machine Learning is a field of computer science that...
The open-source project R is among the leading tools for data science and machine learning tasks. Given its open-source framework, there are continuous contributions, and package libraries with new features pop up frequently. Currently, the CRAN package repository features 12,525 available packages. This post takes a look at the most popular and useful packages that have set the standards for solving data manipulation, visualization, and...
GBM is a highly popular prediction model among data scientists or as top Kaggler Owen Zhang describes it: "My confession: I (over)use GBM. When in doubt, use GBM." GradientBoostingClassifier from sklearn is a popular and user friendly application of Gradient Boosting in Python (another nice and even faster tool is xgboost). Apart from setting up the feature space and fitting the model, parameter tuning is a crucial task in...
  Currently, Python and R are the dominating data science tools and Python will probably even be taking the lead (at least based on the latest KDNuggets survey ). When did the two open source players manage to become the leading platforms for analytics, data science, and machine learning, leaving behind established players such as Matlab or SAS? Here are some insights from Google Trends. Looking at the years 2009 - 2013 in the...