katie haske.

data analyst.



just a lil assortment of projects i've done over the years.

🕵️‍♀️Detecting suspicious account activity
Salto

After mining a Salto dataset, I built a system for detecting users who share their account with multiple people. The system will later be applied to a brand recommendation engine.

💻Analyzing extent of cyber attack
Cyberguerre

With Numerama's Cyberguerre, I processed and analysed a leaked dataset to explore the extent of a cyber attack that affected approximately 500,000 people.

📊Google Analytics Customer Revenue Preprocessing

This project aimed to take Google Analytics Customer Revenue data and clean it up by removing missing values and engineering new features. The cleaned data was then visualised to explore what it contained.

💸Predicting user conversion rate

This project used various classification models (such as logistic regression and decision trees) to predict user conversion rate.

🛶Planning a vacation with Kayak

Starting with a list of the top 35 cities to visit in France, this project used APIs and web scraping to determine which cities and hotels would be best to travel to in the coming week, based on forecasted weather conditions. The collected data was then stored in an AWS S3 bucket and loaded into an RDS database.

📰Fake news detection

In this project, I used machine learning techniques (linear regression, decision tree classification, and gradient boosting) to detect fake news.

🎧YouTube playlog EDA

After taking YouTube playlog data from an S3 bucket or extracting it using a YouTube API, I loaded it into a PySpark dataframe, cleaned it, and then saved the processed dataframe back to S3 and Redshift.

💉COVID-19 World Vaccination Progress

This project answers the questions presented in task one of the Kaggle project COVID-19 World Vaccination Progress.

🚗Uber pickups

Using unsupervised machine learning techniques and data provided by Uber, this project aimed to recommend which hot-zones in major cities drivers should wait in so that they can pick up riders within 5-7 minutes of a ride request.
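
To give a rough idea of how hot-zones can be found without labels, here is a minimal sketch of k-means clustering on pickup coordinates. It is an illustration only, not the project's actual code: the choice of k-means, the `kmeans` function, and the toy `pickups` coordinates are all assumptions made for this example.

```python
# Hypothetical pickup locations as (latitude, longitude) pairs.
pickups = [
    (48.85, 2.35), (45.76, 4.84),
    (48.86, 2.34), (45.75, 4.83),
    (48.87, 2.36), (45.77, 4.85),
]

def sq_dist(p, q):
    """Squared Euclidean distance between two 2D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def kmeans(points, k, iters=20):
    """Plain Lloyd's algorithm; the first k points seed the centers."""
    centers = list(points[:k])
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        # An empty cluster keeps its previous center.
        centers = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
            if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters

centers, clusters = kmeans(pickups, k=2)
for center, members in zip(centers, clusters):
    print(f"hot-zone near {center[0]:.2f}, {center[1]:.2f}: {len(members)} pickups")
```

Each center is a candidate hot-zone, and the cluster size hints at how busy it is. A real version would also need a sensible way to pick k and to handle time of day.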

📈Boosting with XGBoost

This project uses various classifiers (random forest, SVM) to predict fraudulent bank activity.

🎬Movie Recommendation System using Collaborative Filtering

In this project, I built a movie recommendation system from scratch that uses the past ratings made by a user to predict new titles that they will likely enjoy.
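
For a feel of the idea, here is a minimal sketch of user-based collaborative filtering in plain Python. It is an illustration under assumptions, not the project's actual code: the toy `ratings` data, the cosine similarity measure, and the `cosine`, `predict`, and `recommend` helpers are all made up for this example.

```python
from math import sqrt

# Hypothetical toy data: user -> {title: rating on a 1-5 scale}.
ratings = {
    "ana": {"Alien": 5, "Heat": 4, "Up": 1},
    "ben": {"Alien": 4, "Heat": 5, "Up": 2, "Ran": 5, "Coco": 2},
    "cy": {"Alien": 1, "Up": 5, "Coco": 4},
}

def cosine(a, b):
    """Cosine similarity computed over the titles both users rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[t] * b[t] for t in shared)
    norm_a = sqrt(sum(a[t] ** 2 for t in shared))
    norm_b = sqrt(sum(b[t] ** 2 for t in shared))
    return dot / (norm_a * norm_b)

def predict(user, title, ratings):
    """Similarity-weighted average of other users' ratings for a title."""
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or title not in theirs:
            continue
        sim = cosine(ratings[user], theirs)
        num += sim * theirs[title]
        den += sim
    return num / den if den else None

def recommend(user, ratings, k=2):
    """Return up to k unseen titles, highest predicted rating first."""
    seen = set(ratings[user])
    candidates = {t for r in ratings.values() for t in r} - seen
    scored = [(predict(user, t, ratings), t) for t in candidates]
    scored = [(s, t) for s, t in scored if s is not None]
    return [t for s, t in sorted(scored, reverse=True)[:k]]

print(recommend("ana", ratings))
```

Users whose past ratings look like yours get more say in the prediction, so "ana" is steered toward what "ben" liked rather than what "cy" liked.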

🐕🐇Three question quiz

Test your knowledge about my pets with this interactive quiz!