Blog

Random Forest is a powerful ensemble learning method that can be applied to various prediction tasks, in particular classification and regression. The method uses an ensemble of decision trees as a basis and therefore has all the advantages of decision trees, such as high accuracy, ease of use, and no need to scale the data. Moreover, it has a very important additional benefit, namely resistance to overfitting (unlike...
Over the past few years, tasks involving text and speech processing have become very popular. Among the research areas in Natural Language Processing and Machine Learning, sentiment analysis ranks especially high. Sentiment analysis makes it possible to identify and extract subjective information from source data using data analysis and visualization, ML classification models, and text mining. This helps...
Introduction: PostgreSQL is probably one of the most powerful open-source relational databases available today. Its functionality is on par with Oracle's and well ahead of MySQL's. If you build applications in Python, sooner or later you will need to work with databases. Luckily, Python has a wide range of packages that make it easy to connect to and use databases. In...
What other people think about a product or service has a big impact on our everyday decision-making. In the past, people relied on the opinions of friends and relatives, or on product and service reports, but the Internet era has changed this significantly. Today opinions are collected from people around the world via e-commerce review sites as well as blogs and social networks. To transform gathered...
Introduction: Exploratory data analysis (EDA) is an approach to data analysis that summarizes the main characteristics of the data. It can be performed using various methods, among which data visualization plays a major role. The idea of EDA is to recognize what the data can tell us beyond the formal modeling or hypothesis testing task. In other words, if we initially have no, or not enough, a priori ideas about...
In the modern world, the flow of information a person faces is daunting. This has led to a rather abrupt change in the basic principles of data perception, and visualization is becoming the main tool for presenting information. With the help of visualization, information is presented to the audience in a more accessible, clear, visual form. A properly chosen method of visualization makes it possible to structure large data arrays,...
The more carefully you process the data and go into detail, the more valuable information you can extract. Data visualization is an efficient and handy tool for gaining insights from data. Moreover, visualization tools can make the data far more understandable, colorful and pleasant to work with. As data changes every second, it is an urgent task to investigate it carefully and get insights as fast as...
What is Exploratory Data Analysis? Exploratory data analysis (EDA) is a powerful tool for a comprehensive study of the available information, providing answers to basic data analysis questions. What distinguishes it from traditional analysis based on testing a priori hypotheses is that EDA makes it possible to detect, by using various methods, all potential systematic correlations in the...
Companies have a growing demand to visualize their data with business intelligence tools. We compared salaries across 10 European countries using Glassdoor, which offers self-reported salary information by location and employer, giving us some key insights into the salaries of people with “Business Intelligence” in their job title. Switzerland with the highest salary for Business Intelligence: There...
We’re always on the lookout for interesting data science work, especially as it relates to the UK. Shoot us an e-mail with an outline of your idea, your analysis or a written-out blog post. Our only guideline: Choose a topic that’s related to data science and/or support your arguments with data. Contact us at info@datacareer.co.uk – we’re looking forward to your contribution!
In financial markets, tradable instruments and securities have unique identifiers. The identifiers are very useful, because you can make sure that you and your counterparty are talking about the same instrument while trading. The difficulty is that there isn't really a standard for all the various sorts of instruments or markets. Anyone working in the industry will recognize this issue, especially people working at larger institutions who...
The random forest algorithm is a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. It can be applied to different machine learning tasks, in particular classification and regression. Random Forest uses an ensemble of decision trees as a basis and therefore has all the advantages of decision trees, such as high accuracy,...