Applied Data Science M.S. student at Syracuse University with a background in defense finance and a focus on machine learning, deep learning, and NLP. I apply the full data science lifecycle to every problem, from data preparation and modeling through evaluation and communicating results.
About me
"The ultimate purpose of analytics is to communicate findings to the concerned who might use these insights to formulate policy or strategy."
-- Murtaza Haider, Getting Started with Data Science
I'm a Program Cost Analyst at HII (Huntington Ingalls Industries) in Syracuse, NY, managing financial performance across a ~$90M defense program portfolio. I'm completing my Master of Science in Applied Data Science at Syracuse University (expected May 2026), where my projects have spanned RF signal intelligence, financial deep learning, and natural language processing.
My background is in corporate finance, including pricing analysis at Carrier and Leidos, and my goal is to combine that expertise with data science to support better-informed decisions in the defense industry. My projects span several fields: during the program I explored many directions before deciding to stay in defense and apply data science methods where I work today.
I approach every project with transparency and ethical rigor: documenting limitations, acknowledging AI assistance in code reviews, and understanding the real-world stakes of data misuse. Data science is a practice of iteration, self-critique, and curiosity -- and I plan to keep growing with it.
Skills & Tools
Projects
Machine Learning · Inspired by work in defense / RF hardware
Investigated the challenges of classifying radio frequency (RF) signals from a complex, real-world dataset of I/Q (in-phase and quadrature) samples. The dataset required extensive preprocessing: parsing multi-valued string cells into usable complex arrays with a custom parse_iq_cell() function built on ast.literal_eval(), normalizing each signal to unit power, and engineering features in both the time domain and the frequency domain, with frequency-domain features derived from Welch power spectral density (PSD) estimates. Signal classes were organized by ITU frequency band designations and refined into VHF sub-bands (FM Broadcast, Marine VHF, Airband Communications).
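The parsing and normalization steps can be illustrated with a minimal, dependency-free Python sketch. It is not the project's actual parse_iq_cell(): it assumes each cell holds a Python-style list of (I, Q) pairs or complex literals, and normalize_unit_power is a hypothetical helper name. The Welch PSD step (for example, scipy.signal.welch) is left out to keep the sketch self-contained.

```python
import ast
import math
from typing import List


def parse_iq_cell(cell: str) -> List[complex]:
    """Illustrative parser (assumed cell format): turn a string such as
    "[(0.5, -0.5), (1.0, 0.0)]" or "[1-2j, 3j]" into complex samples I + jQ.

    ast.literal_eval only evaluates Python literals, so unlike eval() it
    will not execute arbitrary code found in the data file.
    """
    values = ast.literal_eval(cell)
    samples = []
    for v in values:
        if isinstance(v, (tuple, list)) and len(v) == 2:
            i, q = v  # in-phase and quadrature components
            samples.append(complex(i, q))
        else:
            samples.append(complex(v))  # bare number or complex literal
    return samples


def normalize_unit_power(samples: List[complex]) -> List[complex]:
    """Scale samples so the mean power, mean(|x|^2), equals 1."""
    power = sum(abs(s) ** 2 for s in samples) / len(samples)
    if power == 0:
        return list(samples)  # all-zero signal: nothing to scale
    scale = 1 / math.sqrt(power)
    return [s * scale for s in samples]


iq = normalize_unit_power(parse_iq_cell("[(3, 4), (0, 0)]"))
print(sum(abs(s) ** 2 for s in iq) / len(iq))  # mean power after scaling, approximately 1.0
```

Normalizing to unit power removes differences in received signal strength, so downstream features reflect the shape of each signal rather than how loud it was.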
The model used a two-stage Random Forest pipeline: a binary "gate" classifier first separated FM Broadcast signals from all others, and a multi-class classifier then performed finer sub-band classification. Class-weighted training addressed dataset imbalance, stratified 5-fold cross-validation preserved class proportions in every fold, and calibrated probability outputs made the models' confidence scores more trustworthy. Feature importance scores showed which signal attributes drove classification decisions.
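The routing logic of a gate-then-classify pipeline can be sketched as below. This is an illustrative Python sketch rather than the project code: both models are abstracted as plain callables so the example runs without dependencies, and two_stage_predict, toy_gate, toy_sub_band, and the 0.5 threshold are all hypothetical. In a scikit-learn implementation, the gate might be a RandomForestClassifier(class_weight="balanced") wrapped in CalibratedClassifierCV, with its predict_proba output for the FM class playing the role of gate_prob.

```python
from typing import Callable, Sequence

FM_BROADCAST = "FM Broadcast"


def two_stage_predict(
    features: Sequence[float],
    gate_prob: Callable[[Sequence[float]], float],
    sub_band_model: Callable[[Sequence[float]], str],
    threshold: float = 0.5,
) -> str:
    """Route one feature vector through a binary gate, then a multi-class model.

    gate_prob returns an estimated P(FM Broadcast | features); only signals the
    gate rejects are passed to sub_band_model, which picks among the remaining
    sub-bands (e.g. Marine VHF, Airband Communications).
    """
    if gate_prob(features) >= threshold:
        return FM_BROADCAST
    return sub_band_model(features)


# Toy stand-ins for trained models (purely illustrative rules on made-up features)
def toy_gate(f: Sequence[float]) -> float:
    return 0.9 if f[0] > 100 else 0.1


def toy_sub_band(f: Sequence[float]) -> str:
    return "Marine VHF" if f[1] > 0 else "Airband Communications"


print(two_stage_predict([120.0, 0.0], toy_gate, toy_sub_band))  # FM Broadcast
print(two_stage_predict([50.0, 1.0], toy_gate, toy_sub_band))   # Marine VHF
```

One appeal of this design is that the second model never spends capacity on the FM-versus-everything split, and calibrated gate probabilities make the threshold a meaningful knob to tune.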
Deep Learning · AAPL historical price forecasting
Applied Long Short-Term Memory (LSTM) neural networks to predict Apple Inc. (AAPL) stock closing prices using historical time-series data sourced from Kaggle. Data was preprocessed with MinMaxScaler normalization and structured into sequential 60-day lookback windows to capture temporal dependencies -- requiring careful datetime parsing and chronological sorting to preserve time-phased data patterns.
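The date sorting, scaling, and windowing steps can be sketched without any dependencies. This is an assumption-laden illustration rather than the project's code: the date formats, the helper names, and the 80/20 chronological split are hypothetical, and min_max_scale stands in for scikit-learn's MinMaxScaler. Fitting the scaling bounds on the training slice only mirrors calling MinMaxScaler.fit on training data, which keeps test-period prices from leaking into preprocessing.

```python
from datetime import datetime
from typing import List, Sequence, Tuple


def sort_chronologically(rows: List[Tuple[str, float]], fmt: str = "%Y-%m-%d") -> List[Tuple[str, float]]:
    """Sort (date_string, close) rows by parsed date rather than by string order."""
    return sorted(rows, key=lambda r: datetime.strptime(r[0], fmt))


def min_max_scale(values: Sequence[float], lo: float, hi: float) -> List[float]:
    """Map values into [0, 1] using bounds (lo, hi) fitted on training data only."""
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


def make_windows(series: Sequence[float], lookback: int = 60) -> Tuple[List[List[float]], List[float]]:
    """Each input is the previous `lookback` values; the target is the value that follows."""
    X, y = [], []
    for t in range(lookback, len(series)):
        X.append(list(series[t - lookback:t]))
        y.append(series[t])
    return X, y


# String order would put "12/31/2019" after "01/02/2020"; parsing dates fixes that.
rows = [("01/02/2020", 2.0), ("12/31/2019", 1.0)]
ordered = sort_chronologically(rows, fmt="%m/%d/%Y")

# Toy series standing in for closing prices; chronological 80/20 split (hypothetical).
closes = [float(i) for i in range(100)]
train = closes[:80]
lo, hi = min(train), max(train)
X_train, y_train = make_windows(min_max_scale(train, lo, hi), lookback=60)
print(len(X_train), len(X_train[0]))  # 20 60
```

Test data would be scaled with the same lo and hi; a common approach is to prepend the last 60 training values to the test slice so the first test prediction has a full lookback window.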
A two-layer LSTM architecture with Dropout regularization was implemented in TensorFlow/Keras and trained with early stopping and learning rate reduction callbacks to curb overfitting and stabilize training. The model was evaluated on held-out test data using RMSE, MAE, R-squared, and MAPE. Visualizations included training/validation loss curves, actual-vs-predicted price overlays, error distribution histograms, and scatter plots, translating a complex sequence model into charts accessible to any stakeholder.
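The four evaluation metrics are simple enough to write out directly, which makes their definitions explicit. This plain-Python sketch is illustrative (the function name and toy numbers are made up); in practice, scikit-learn's mean_squared_error, mean_absolute_error, and r2_score would typically be used instead. The training callbacks would most likely correspond to Keras's EarlyStopping and ReduceLROnPlateau, though that is an assumption.

```python
import math
from typing import Dict, Sequence


def regression_metrics(actual: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """RMSE, MAE, R-squared, and MAPE (in percent) for equal-length sequences.

    Compute these on prices in original units (e.g. after inverting any
    min-max scaling); MAPE assumes no actual value is zero.
    """
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    ss_res = sum(e * e for e in errors)  # residual sum of squares
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        "r2": 1 - ss_res / ss_tot if ss_tot else float("nan"),
        "mape": 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n,
    }


m = regression_metrics([100.0, 110.0, 120.0], [102.0, 108.0, 123.0])
print(round(m["r2"], 3))  # 0.915
```

RMSE and MAE are in dollars, so they only make sense after undoing the scaling; MAPE and R-squared are unitless, which makes them easier to compare across stocks or time periods.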
Natural Language Processing · Rotten Tomatoes / Kaggle corpus
Built and compared text classification models for five-level sentiment analysis using the Kaggle Movie Reviews dataset (a subset of the Rotten Tomatoes corpus). Preprocessing involved NLTK tokenization, stopword removal, and spaCy lemmatization consolidated into a single reusable pipeline. Feature engineering progressed through several configurations: a 150-word unigram bag-of-words baseline, a 1,000-word expansion, and a combined feature set adding bigrams, POS tag counts, and VADER sentiment scores.
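The bag-of-words configurations can be illustrated with NLTK-style feature dictionaries. This sketch is hypothetical rather than the project's code: it assumes documents are already tokenized, stopword-filtered, and lemmatized, and the helper names and contains(...)/bigram(...) keys are illustrative. The 150- and 1,000-word settings correspond to the size argument; POS tag counts and VADER scores (e.g. from nltk.sentiment.vader.SentimentIntensityAnalyzer) could be added as further keys in the same dictionary but are omitted to keep the example dependency-free.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def build_vocabulary(docs: Sequence[List[str]], size: int = 150) -> List[str]:
    """Most frequent `size` tokens across the (already preprocessed) corpus."""
    counts = Counter(tok for doc in docs for tok in doc)
    return [word for word, _ in counts.most_common(size)]


def build_bigram_vocabulary(docs: Sequence[List[str]], size: int = 50) -> List[Tuple[str, str]]:
    """Most frequent adjacent token pairs."""
    counts = Counter(bg for doc in docs for bg in zip(doc, doc[1:]))
    return [bg for bg, _ in counts.most_common(size)]


def document_features(doc: List[str], vocab: Sequence[str],
                      bigram_vocab: Sequence[Tuple[str, str]] = ()) -> Dict[str, bool]:
    """Binary presence features in the dict format NLTK classifiers accept."""
    tokens = set(doc)
    feats = {f"contains({w})": (w in tokens) for w in vocab}
    pairs = set(zip(doc, doc[1:]))
    for a, b in bigram_vocab:
        feats[f"bigram({a} {b})"] = (a, b) in pairs
    return feats


docs = [["good", "movie"], ["bad", "movie"], ["good", "plot"]]
vocab = build_vocabulary(docs, size=2)
print(document_features(["bad", "movie"], vocab))
# {'contains(good)': False, 'contains(movie)': True}
```

Growing the vocabulary from 150 to 1,000 words only changes the size argument, which makes the feature configurations easy to compare under the same evaluation loop.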
Three experimental conditions were evaluated with 5-fold cross-validation using an NLTK Naive Bayes classifier, plus an advanced condition using Logistic Regression on the combined feature set. Precision, recall, and F-measure were reported across all folds, highlighting the trade-offs between vocabulary size, feature richness, and classifier complexity.
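A manual cross-validation loop with per-label scoring, common in NLTK workflows, can be sketched as below. This is an illustrative, dependency-free sketch, not the project code: k_fold_splits and per_label_prf are hypothetical names, the folds are shuffled but not stratified, and labels are assumed to be sortable values such as integers for the five sentiment levels.

```python
import random
from typing import Dict, Hashable, Iterator, List, Sequence, Tuple


def k_fold_splits(n_items: int, k: int = 5, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Shuffle indices once, deal them into k folds, and yield (train, test) index lists."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


def per_label_prf(gold: Sequence[Hashable],
                  predicted: Sequence[Hashable]) -> Dict[Hashable, Tuple[float, float, float]]:
    """Precision, recall, and F1 for each label, using one-vs-rest counts."""
    scores = {}
    for label in sorted(set(gold) | set(predicted)):
        tp = sum(1 for g, p in zip(gold, predicted) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, predicted) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, predicted) if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[label] = (precision, recall, f1)
    return scores


print(per_label_prf([0, 0, 1, 1, 2], [0, 1, 1, 1, 2])[0])  # (1.0, 0.5, 0.6666666666666666)
```

Per-fold scores from per_label_prf could then be averaged across the five folds to summarize each experimental condition.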
Program Learning Outcomes
My projects and coursework demonstrate achievement across all six learning outcomes of the Syracuse Applied Data Science program.
Resume
Python · R · SQL · Machine Learning · Deep Learning · NLP · Regression Modeling
Data & Visualization: Excel (Pivot Tables, Power Query, Macros, XLOOKUP) · Power BI · Tableau · Google Analytics
Enterprise Systems: OneStream · SAP · Salesforce · Costpoint
Domain Expertise: Financial Analysis · Variance Analysis · Pricing Analysis · Defense Program Finance · Statistical Inference
Contact