Projects

Identifying "At-Risk" Entities

Developed a feature for the Elastic Security product that uses mathematical models such as time decay, the Riemann zeta function, and Bayes factors to highlight the most "at-risk" hosts in an organization, based on the activity seen on those hosts. Being able to identify risky entities is a crucial part of any SIEM (Security Information and Event Management) product because it gives security analysts a useful starting point for triage. Read more here.
Technology used: Painless (similar to Java), Elasticsearch, Kibana
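
As a rough illustration of how time decay and zeta-style weighting can combine into a single host score, here is a minimal Python sketch. It is not the Elastic Security implementation: the function names, the 24-hour half-life, the exponent p = 1.5, and the cap of 1000 alerts are all assumptions chosen for the example, and Bayes factors are left out entirely.

```python
def weight_norm(max_alerts, p):
    # Partial sum of the series 1/i^p; it approaches the Riemann zeta
    # function zeta(p) as max_alerts grows (assumed normalizer).
    return sum(1.0 / i ** p for i in range(1, max_alerts + 1))


def entity_risk(alerts, now, half_life_hours=24.0, p=1.5, max_alerts=1000):
    """Return a 0-100 risk score for one host.

    alerts: iterable of (score, seen_at) pairs, with score in 0-100 and
    seen_at in hours on the same clock as `now` (hypothetical input shape).
    """
    # Time decay: an alert loses half of its weight every half_life_hours.
    decayed = sorted(
        (
            score * 0.5 ** (max(now - seen_at, 0.0) / half_life_hours)
            for score, seen_at in alerts
        ),
        reverse=True,
    )[:max_alerts]
    # Zeta-style weighting: the i-th strongest alert contributes score / i^p,
    # so many weak alerts cannot outweigh a few strong ones.
    weighted = sum(s / i ** p for i, s in enumerate(decayed, start=1))
    return weighted / weight_norm(max_alerts, p)


if __name__ == "__main__":
    recent = [(90, 47.0), (70, 46.0), (40, 40.0)]
    print(round(entity_risk(recent, now=48.0), 2))
```

Dividing by the partial sum keeps the score bounded at 100, which is reached only when every one of the capped alerts is fresh and has the maximum score.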

Identifying Network Beaconing

Led the development of a framework to detect beaconing malware by applying mathematical models such as signal autocorrelation on network data. The framework effectively identifies beaconing indicators for attacks such as NOBELIUM and popular C2 frameworks like Koadic, Empire, and Metasploit. Read more here.
Technology used: Painless (similar to Java), Elasticsearch, Kibana
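
To make the autocorrelation idea concrete, here is a small Python sketch that is not the framework's actual code. It assumes connections from one host to one destination have already been bucketed into counts per fixed time window; the function names and the toy series are hypothetical.

```python
def autocorrelation(series, lag):
    # Normalized autocorrelation of a series at a given positive lag.
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    if var == 0:
        return 0.0  # a flat series carries no periodic signal
    cov = sum(
        (series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag)
    )
    return cov / var


def strongest_period(series, max_lag):
    # Return (lag, score) for the lag with the highest autocorrelation.
    best = max(range(1, max_lag + 1), key=lambda k: autocorrelation(series, k))
    return best, autocorrelation(series, best)


if __name__ == "__main__":
    # Toy traffic: one connection every 5th time bucket, silence otherwise.
    counts = [1 if i % 5 == 0 else 0 for i in range(100)]
    print(strongest_period(counts, max_lag=20))
```

A score close to 1 at some lag suggests a regular call-back interval of that many buckets. Real beacons often add jitter, so a practical detector would need some tolerance around the peak rather than an exact match.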

Model Monitoring Pipeline

Designed a pipeline to visually evaluate the efficacy of candidate production machine learning models and to monitor their performance after deployment. Before every model release, the pipeline processes metadata from more than 300 million Windows and macOS binaries to generate a set of dashboards used to evaluate the new model for production. After deployment, model performance is monitored in near real time, and alerts are generated if performance deteriorates.
Technology used: Python, AWS Batch, Elasticsearch, Kibana

ProblemChild: Detecting Living-off-the-Land Attacks

Developed a hybrid framework using supervised machine learning and anomaly detection to identify rare parent-child relationships in Windows process event data, which are an indicator of a class of cybersecurity attacks called living-off-the-land attacks. Read more here.
Technology used: Supervised Machine Learning, Anomaly Detection, Elasticsearch, Kibana
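
The "rare parent-child relationship" signal can be illustrated with a simple frequency count. The sketch below is only that: it is not ProblemChild itself, it omits the supervised model, and the function name, the 5% threshold, and the sample events are assumptions made for the example.

```python
from collections import Counter


def rare_pairs(events, threshold=0.05):
    """Return (parent, child) pairs whose share of all events is below threshold.

    events: list of (parent_process, child_process) tuples (hypothetical shape).
    """
    counts = Counter(events)
    total = sum(counts.values())
    # A pair is "rare" when it accounts for a small fraction of all
    # observed process launches.
    return sorted(pair for pair, c in counts.items() if c / total < threshold)


if __name__ == "__main__":
    events = [("explorer.exe", "chrome.exe")] * 30 + [
        ("winword.exe", "powershell.exe")
    ]
    print(rare_pairs(events))
```

Rarity alone produces many benign hits, which is why combining it with a supervised classifier, as the project describes, is a sensible design.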

Malicious URL Detection

Scoped and developed a fast, lightweight Random Forest model for malicious URL detection using a combination of static lexical features and n-grams derived from URL strings. The model resulted in a 22% increase in detections coming from FireEye's URL Detection Engine. Read more here.
Technology used: Python, NLTK, Scikit-Learn, AWS (several services)
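
For readers unfamiliar with lexical URL features, the following Python sketch shows the general kind of inputs such a model might consume. The specific features, the function names, and the trigram size are assumptions for illustration and are not the feature set used in the project. Training a classifier on them (for example with scikit-learn's RandomForestClassifier) is left out to keep the example dependency-free.

```python
from collections import Counter
from urllib.parse import urlparse


def lexical_features(url):
    # Static features computed from the URL string alone, with no network access.
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots_in_host": host.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "num_path_segments": len([s for s in parsed.path.split("/") if s]),
        "has_at_symbol": "@" in url,
    }


def char_ngrams(text, n=3):
    # Count overlapping character n-grams; strings shorter than n yield none.
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


if __name__ == "__main__":
    url = "http://login.example.com/a/b?id=42"
    print(lexical_features(url))
    print(char_ngrams(url).most_common(3))
```

Because every feature comes from the string itself, scoring stays fast and works even when the page behind the URL is unreachable.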

Clustering to Identify Phishing Campaigns

Built a clustering algorithm using Transfer Learning and Approximate Nearest Neighbor (Annoy) clustering on screenshots of webpages to identify and detect large phishing campaigns. A pre-trained Deep CNN model was used to learn features from the webpages, and Annoy was used for approximate clustering since the algorithm was expected to deal with ~9 million URLs per day.
Technology used: Python, Keras (TensorFlow backend), Annoy, AWS (several services)

Predicting User Churn in Online Health Communities

Used an NLP and ensemble learning approach to categorize ~3 million posts, contributed by ~50,000 users in an online health community, into different types of social support. These annotated posts were then used to create state trajectories that capture the change in users' posting activity across several months. Finally, the trajectories were used as inputs to a Bayesian model to predict the probability of a user churning from the community. Read more here.
Technology used: Python, NLTK, Word2Vec, Scikit-Learn

Address

San Jose, CA
United States of America