- Risky loan applicants data analysis case study: An imbalanced data analysis approach to understanding the factors that contribute to loan default.
- Predicting the misclassification cost incurred by air pressure system failures in heavy vehicles: In this post, I provide an in-depth analysis of the Air Pressure System (APS), which supplies pressurized air for functions in heavy vehicles such as braking and gear changing.
- A classification approach to predicting air crash survival: In this post, I provide an in-depth analysis of an imbalanced data classification problem.
- Modeling employee flight risk behavior: In this post, I provide an in-depth treatment of mixed-type variables for a classification task.
- Scraping Twitter data to visualize trending tweets in Kuala Lumpur: In this post, I scrape Twitter data to search for selected hashtags and visualize them.
- To eat or not to eat! That's the question? Measuring the association between categorical variables: In this post, I provide an in-depth treatment of nominal categorical data to measure the statistical significance and strength of association between the variables.
- Learning a classifier from census data: A case study on the Adult income dataset to predict income level.
- Predicting employment-related factors in Malaysia - A regression analysis approach: A case study on determining the factors responsible for steady employment in Malaysia, based on data from the Department of Statistics.
- Predicting rubber plantation yield - A regression analysis approach: A case study on predicting rubber yield per tonne from Malaysian rubber plantation data.
- Basic assumptions to take care of when building a predictive model: Fundamental concepts in building a predictive model.
- Data Transformations in R: Types of data transformations.
- Sold! How do home features add up to its price tag? Predicting a house's price from its features.
- Learning from data science competitions - baby steps: What it takes to compete and excel in data science competitions.
- Data Splitting: How to split data into training and testing sets.
- Big or small, let's save them all - Visualizing Data: Making data-powered visualizations for the Gapminder dataset.
- Big or small, let's save them all - Making Data Management Decisions: Making data-powered decisions for the Gapminder dataset.
- Big or small, let's save them all - Exploratory Data Analysis: Exploratory data analysis of the Gapminder dataset.
- Batch Geocoding in R: Learn how to geocode a batch of locations in R.
- Reading multiple files from a directory into a data frame: Basic data manipulation techniques in R.
- Installing Apache Spark on Windows 7: Step-by-step instructions for installing Apache Spark in a Windows environment.
- Gini index to compute inequality or impurity in data: Understanding the Gini index and its use in measuring data purity.
- Assessing Clustering Tendency in R: Determining the possible number of clusters present in data.
- Packages for data mining algorithms in R and Python: Packages for data mining in R.
- How to create a dissimilarity matrix for a mixed-type dataset: What in the world is a dissimilarity matrix? Interested? Read on...
- Implementing Hierarchical Clustering Methods in R - A Case Study: Implementation of agglomerative hierarchical clustering in R.
- How to solve the missing data problem? Solving the missing data mystery!
- How to split a data frame in R with over a million observations across more than 50 variables? Does data size matter in analysis?
- Connect R to SQL Server 2014: Building a Data Pipeline.
- Data Analysis with R Series - Part 1: Understanding the basics of data analysis.
- Splitting a data frame into training and testing sets in R: Learning to split data.
- Data preprocessing with R - Part II: More concepts in data cleaning.
- Data preprocessing with R: Concepts in data cleaning.
- How to read a CSV file into R: An example of reading a CSV file in R.