Data Analysis Machine Learning
Ah, data snooping. The sneaky little troublemaker.
Ah, data snooping. The sneaky little troublemaker that loves to mess with the integrity of your data science analyses. It's like that...
Charles Stoy
Dec 15, 2023 · 2 min read
Savoring the Numbers: Eliminating Discrepancies through Data Standardization
Numerical data is an essential part of our lives, from calculating our expenses to analyzing business trends. As we dig deeper into it, though, we run into one of the most challenging obstacles: differences in magnitude. Comparing data with different orders of magnitude is a common problem that can lead to biased results and conclusions.
Charles Stoy
Sep 10, 2023 · 1 min read
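As a rough illustration of the magnitude problem mentioned in the teaser above, here is a minimal sketch of z-score standardization using only the Python standard library. The `standardize` helper and the sample `incomes` and `ages` values are hypothetical and not taken from the post; the post itself may use a different method or library.

```python
from statistics import mean, pstdev


def standardize(values):
    """Return z-scores: (x - mean) / population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values are identical, so there is no spread to scale by.
        return [0.0 for _ in values]
    return [(x - mu) / sigma for x in values]


# Hypothetical sample data: two features on very different scales.
incomes = [30_000, 45_000, 60_000, 75_000]
ages = [25, 35, 45, 55]

z_incomes = standardize(incomes)
z_ages = standardize(ages)

# After standardization both features are centered on 0 with unit spread,
# so neither dominates a comparison just because its raw numbers are larger.
print([round(z, 3) for z in z_incomes])
print([round(z, 3) for z in z_ages])
```

In this made-up sample the two features are evenly spaced, so their z-scores come out identical even though the raw values differ by three orders of magnitude.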
Master the Art of Data Transformation: A Deep Dive into Python-Powered Data Standardization
Welcome to the mysterious world of data standardization. Imagine a world where data is not standardized and is just a vague, unexplainable mess! It's a terrifying thought, but fortunately, data standardization is here to save the day and bring order to the chaos. In this blog post, I will show you how to create a machine learning program in Python that standardizes data and transforms it into a beautifully crafted masterpiece.
Charles Stoy
Aug 23, 2023 · 2 min read
The Difference Between Binary Encoding and One-Hot Encoding for Categorical Variables
In the field of data science, the terms "binary encoding" and "one-hot encoding" often come up when dealing with categorical variables. These encoding techniques are used to convert categorical data into a numerical format that can be processed by machine learning algorithms. In this article, we will explore the differences between binary encoding and one-hot encoding, their impact on neural networks, and when to use each approach.
Charles Stoy
Aug 20, 2023 · 3 min read
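To make the contrast in the teaser above concrete, here is a minimal sketch of the two encodings in plain Python. The helper names, the `colors` sample, and the choice of a zero-based index over alphabetically sorted categories are all assumptions made for illustration; encoder libraries may number categories differently.

```python
def one_hot_encode(categories):
    """Map each category to a vector with a single 1 (one column per category)."""
    levels = sorted(set(categories))
    index = {level: i for i, level in enumerate(levels)}
    return [[1 if index[c] == i else 0 for i in range(len(levels))]
            for c in categories]


def binary_encode(categories):
    """Map each category to the bits of its integer index (about log2(n) columns)."""
    levels = sorted(set(categories))
    index = {level: i for i, level in enumerate(levels)}
    # Number of bits needed to represent the largest index (at least 1).
    width = max(1, (len(levels) - 1).bit_length())
    return [[(index[c] >> bit) & 1 for bit in reversed(range(width))]
            for c in categories]


# Hypothetical categorical feature with five distinct values.
colors = ["red", "green", "blue", "yellow", "purple"]

one_hot = one_hot_encode(colors)
binary = binary_encode(colors)

print(len(one_hot[0]), "columns for one-hot")
print(len(binary[0]), "columns for binary")
```

With five categories, one-hot needs five columns while binary needs three. The trade-off is that binary columns are shared between categories, so a single column no longer identifies one category on its own.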
Mastering Data Cleaning: Effective Strategies for Handling Outliers, Missing Data, and More
Data cleaning, also known as data cleansing or data scrubbing, is a crucial step in the data analysis process. It involves detecting and then correcting or removing errors, inconsistencies, and inaccuracies in datasets. Despite its importance, data cleaning can present numerous challenges. For the sake of simplicity, we will assume you are using a language like Python with libraries such as pandas, numpy, and sklearn, which are common for data cleaning and preprocessing.
Charles Stoy
Jul 30, 2023 · 5 min read
Data Cleaning: running data through the washer with code snippets
Data munging, also known as data wrangling or data cleaning, is an essential step in the data analysis process. It involves transforming and mapping raw data into a format that is more suitable for analysis. Here's why data munging is so crucial:
Charles Stoy
Jul 20, 2023 · 5 min read
Get Your Data in Shape: 3 Ways to Streamline Your Cleaning Process
Cleaning the data, including removing missing values, is an important step in data analysis and modeling. It helps to improve the performance and accuracy of the models by reducing noise and variability in the data.
Charles Stoy
Feb 3, 2023 · 3 min read
A/B Testing...is not testing for Bees
ML or Mid Life Crisis? Machine learning is a method of teaching computers to learn from data, without being explicitly programmed. It...
Charles Stoy
Jan 23, 2023 · 5 min read
It's Getting Convoluted Up In Here
ANN Are You Listening? Artificial Neural Networks (ANNs) are a type of machine learning model that are inspired by the structure and...
Charles Stoy
Jan 23, 2023 · 3 min read
ANN: You are so artificial
An artificial neural network (ANN) is a computational model inspired by the structure and function of the biological neural networks that...
Charles Stoy
Jan 9, 2023 · 10 min read
Clearly Reverend Bayes, you are naive
Naive Bayes classifiers are a family of simple "probabilistic classifiers" based on applying Bayes' theorem with strong (naive)...
Charles Stoy
Jan 9, 2023 · 3 min read
Munging data for fun and food
I already had lunch. Data munging is the process of cleaning and transforming raw data into a form that is more suitable for analysis. It...
Charles Stoy
Jan 9, 2023 · 3 min read
Tabeling a Pivot: Pivot Tables and Financial Analysis
Pivot tables are a useful tool for bookkeeping because they allow you to quickly and easily summarize large amounts of data in a way that...
Charles Stoy
Jan 8, 2023 · 2 min read
My logistics have regressed!
What is Logistic Regression? Logistic regression is a type of statistical model that is used to predict the probability of an event...
Charles Stoy
Jan 8, 2023 · 2 min read
Small businesses have dynamic systems?
System dynamics is a method of analysis that can be useful for small businesses because it allows them to understand the complex...
Charles Stoy
Jan 3, 2023 · 2 min read