Computational Analysis of Reference Networks Across Philosophical Texts

As a research assistant at Tulane University, I conducted what is currently the largest computational analysis of historical texts in philosophy, analyzing 2,245 texts spanning 550 BCE to 1940 CE to investigate how philosophical influence has spread across time. We created custom Python scripts to identify and track over 294,970 references between authors, and classified these references with a transformer-based natural language processing model to investigate how major figures influenced different areas of philosophy.
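As a rough illustration of the classification step, the sketch below uses an off-the-shelf Hugging Face zero-shot pipeline with made-up category labels; it is an assumption-laden stand-in, not the fine-tuned model or label set used in the study.

```python
# Illustrative sketch only: classify a reference sentence into a broad area
# of philosophy with a zero-shot transformer. The model name and the label
# set are assumptions, not the configuration used in the actual project.
from transformers import pipeline

classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")

labels = ["ethics", "metaphysics", "epistemology", "political philosophy"]

sentence = "As Aristotle argues in the Nicomachean Ethics, virtue is a habit."
result = classifier(sentence, candidate_labels=labels)

# result["labels"] is sorted by score; the top label is the predicted area
print(result["labels"][0], round(result["scores"][0], 3))
```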

We represented our data as a graph, treating authors as nodes and their in-text references to other authors as edges, which allowed us to apply graph theory algorithms to the network's structure. We informed our analysis with a deep investigation into primary and secondary sources in philosophy, and created a publicly available 3D interactive tool to visualize our results.
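A minimal sketch of this representation, assuming NetworkX and a few toy reference counts rather than the real extracted reference table:

```python
# Toy example of the author-reference graph: nodes are authors, directed
# edges are in-text references weighted by how often they occur.
import networkx as nx

references = [
    ("Aquinas", "Aristotle", 412),   # toy counts, not real data
    ("Kant", "Hume", 87),
    ("Hume", "Locke", 64),
]

G = nx.DiGraph()
for source, target, count in references:
    G.add_edge(source, target, weight=count)

# Standard graph-theory measures then apply directly, e.g. PageRank as a
# rough proxy for an author's influence within the network.
influence = nx.pagerank(G, weight="weight")
print(sorted(influence.items(), key=lambda kv: -kv[1]))
```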

This research was conducted under the guidance of Aron Culotta, a scholar in natural language processing and social network analysis. Based on our literature review, it is currently the largest computational analysis of philosophical texts ever conducted. Our work is undergoing peer review at the Oxford journal Digital Scholarship in the Humanities and is available as a preprint on arXiv.

Machine Learning Analysis of 40,000 Chess Games

Python · Scikit-learn · Pandas · NumPy · Matplotlib · Seaborn · Machine Learning · KNN

I analyzed over 40,000 online chess games to explore the relationships between player ratings, outcomes, openings, and other variables. After conducting a detailed statistical and visual analysis using Python, Pandas, Matplotlib, and Seaborn, I designed a customized K-Nearest Neighbors (KNN) model to predict a player's rating based on their opening move, game outcome, and opponent skill level, achieving a median error within 1.5% of a player's true rating.
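A simplified sketch of the prediction setup, assuming scikit-learn and a hypothetical games file; the column names, file name, and hyperparameters are illustrative, not the actual dataset schema or final configuration:

```python
# Sketch of a KNN rating predictor: opening, outcome, and opponent rating in,
# player rating out. Column names and hyperparameters are illustrative.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

games = pd.read_csv("games.csv")  # hypothetical file of online games
X = games[["opening", "outcome", "opponent_rating"]]
y = games["player_rating"]

preprocess = ColumnTransformer([
    ("categorical", OneHotEncoder(handle_unknown="ignore"), ["opening", "outcome"]),
    ("numeric", StandardScaler(), ["opponent_rating"]),
])

model = Pipeline([("prep", preprocess),
                  ("knn", KNeighborsRegressor(n_neighbors=25))])

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model.fit(X_train, y_train)

# Relative error between predicted and true ratings on held-out games
errors = (model.predict(X_test) - y_test).abs() / y_test
print("median relative error:", errors.median())
```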

In addition, I built a Generative Pretrained Transformer to play chess by repeatedly predicting the next move in a game's move sequence. I tested its performance against Stockfish, widely regarded as the strongest chess engine, to illustrate the potential, limitations, and current approaches to applying attention-based AI architectures to domains currently dominated by traditional deep learning systems.
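A rough sketch of such an evaluation loop, assuming the python-chess library, a local Stockfish binary at a hypothetical path, and a placeholder predict_next_move function standing in for the trained transformer:

```python
# Sketch of pitting a move-predicting model against Stockfish via python-chess.
# predict_next_move is a hypothetical stand-in for the trained transformer,
# and the Stockfish path is an assumption about the local installation.
import chess
import chess.engine

def predict_next_move(board: chess.Board) -> chess.Move:
    """Placeholder: a real model would generate the next move from the game's
    move sequence; here we simply return the first legal move."""
    return next(iter(board.legal_moves))

engine = chess.engine.SimpleEngine.popen_uci("/usr/local/bin/stockfish")
board = chess.Board()

while not board.is_game_over():
    if board.turn == chess.WHITE:          # the transformer plays White
        board.push(predict_next_move(board))
    else:                                  # Stockfish plays Black
        result = engine.play(board, chess.engine.Limit(time=0.1))
        board.push(result.move)

print(board.result())
engine.quit()
```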

Using Deep Learning to Predict B-Factors of Amino Acids via Protein Sequences

Python · PyTorch · Transformers · BERT · Deep Learning · Bioinformatics · Pandas · NumPy

Working with two peers at Tulane University, I helped develop a transformer-based model to predict protein B-factors from amino acid sequences. B-factors measure how much each part of a protein deviates from its average position: when a protein is modeled in AlphaFold, for example, each amino acid is displayed at its average position, while its B-factor indicates how much that amino acid moves around it. Our final model was trained on 60,000 protein sequences and achieved a Pearson correlation coefficient of 0.82, on par with state-of-the-art performance.

My main role in the project involved systematically testing architectures and setups, including linear models, RNNs, LSTMs, and Transformers, to inform the design of our final model. Drawing on my experience in NLP, I found that BERT embeddings allowed every model to make far more accurate predictions, an insight that greatly improved our final model.
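A minimal sketch of the embedding idea, assuming the publicly available Rostlab/prot_bert checkpoint as the protein language model and a toy regression head; the actual architecture, checkpoint, and training setup were more involved:

```python
# Sketch: per-residue embeddings from a protein BERT model feed a small
# regression head that predicts one B-factor per amino acid. The checkpoint
# name and the head are illustrative assumptions, not our final design.
import torch
from torch import nn
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("Rostlab/prot_bert", do_lower_case=False)
bert = BertModel.from_pretrained("Rostlab/prot_bert")

class BFactorHead(nn.Module):
    def __init__(self, hidden_size: int):
        super().__init__()
        self.regressor = nn.Linear(hidden_size, 1)

    def forward(self, embeddings: torch.Tensor) -> torch.Tensor:
        # embeddings: (batch, sequence_length, hidden_size) -> one value per residue
        return self.regressor(embeddings).squeeze(-1)

sequence = "M K T A Y I A K Q R"          # ProtBert expects space-separated residues
inputs = tokenizer(sequence, return_tensors="pt")

with torch.no_grad():
    embeddings = bert(**inputs).last_hidden_state

head = BFactorHead(hidden_size=bert.config.hidden_size)
predicted_b_factors = head(embeddings)
print(predicted_b_factors.shape)  # (1, number_of_tokens)
```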