# Blog

- **Risky loan applicants data analysis case study** An imbalanced data analysis approach to understand the factors contributing to a loan default.
- **Predicting the misclassification cost incurred in air pressure system failure in heavy vehicles** In this post, I provide an in-depth analysis of the Air Pressure System (APS), which is a type of function used in heavy vehicles to assist braking and gear changing.
- **A classification approach to predicting air crash survival** In this post, I provide an in-depth analysis of an imbalanced data classification problem.
- **Employee flight risk modeling behavior** In this post, I provide an in-depth treatment of mixed variables for a classification task.
- **Scraping twitter data to visualize trending tweets in Kuala Lumpur** In this post, I've scraped Twitter data to search for some hashtags and visualized them.
- **To eat or not to eat! That's the question? Measuring the association between categorical variables** In this post, I provide an in-depth treatment of categorical nominal data to measure significance and association between the variables.
- **Learning a classifier from census data** A case study on adult income data to predict income level.
- **Predicting employment related factors in Malaysia- A regression analysis approach** A case study on determining the factors responsible for steady employment in Malaysia, based on data from the department of statistics.
- **Predicting rubber plantation yield- A regression analysis approach** A case study on predicting rubber yield per tonne based on rubber plantation data of Malaysia.
- **Basic assumptions to be taken care of when building a predictive model** Fundamental concepts in building a predictive model.
- **Data Transformations in R** Types of data transformations.
- **Sold! How do home features add up to its price tag?** Predicting the house price based on its features.
- **Learning from data science competitions- baby steps** What it takes to compete and excel in data science competitions.
- **Data Splitting** How to split data into training and testing sets.
- **Big or small-lets save them all- Visualizing Data** Making data-powered visualizations for the gapminder dataset.
- **Big or small-lets save them all- Making Data Management Decisions** Making data-powered decisions for the gapminder dataset.
- **Big or small-lets save them all-Exploratory Data Analysis** Exploratory data analysis of the gapminder dataset.
- **Batch Geo-coding in R** Learn how to geocode geographical locations in R.
- **To read multiple files from a directory and save to a data frame** Basic data manipulation techniques in R.
- **Installing Apache Spark on Windows 7 environment** Stepwise instructions on installing Apache Spark on a Windows environment.
- **Gini index to compute inequality or impurity in the data** Understanding the Gini index and its usage in determining data purity.
- **Assessing Clustering Tendency in R** Determining the possible number of clusters present in data.
- **Packages for data mining algorithms in R and Python** Packages for data mining in R.
- **How to create a dissimilarity matrix for mixed type dataset** What in the world is a dissimilarity matrix? Interested? Read on..
- **Hierarchical Clustering Methods implementation in R- A Case Study** Implementation of agglomerative hierarchical clustering in R.
- **How to solve the missing data problem?** Solving the missing data mystery!
- **How to split a data frame in R with over a million observations in above 50 variables?** Does data size matter in analysis?
- **Connect R to SQL Server 2014** Building a Data Pipeline.
- **Data Analysis with R Series- Part 1** Understanding the basics of data analysis.
- **Splitting a data frame into training and testing sets in R** Learning to split data.
- **Data preprocessing with R- part II** More concepts in data cleaning.
- **Data preprocessing with R** Concepts in data cleaning.
- **How to read CSV file into R** An example of reading a CSV file in R.