Hello, I'm Long Luu, a Senior Data Scientist at AstraZeneca. My work contributes to the emerging field of Digital Health, which applies state-of-the-art digital technology and artificial intelligence to revolutionize healthcare and medicine. My main line of research is remote patient monitoring, which enables continuous tracking of patients' health throughout the day from the comfort of their homes (e.g., automatic monitoring of physical activity).

Before joining AstraZeneca, I conducted research at several academic institutions. During my doctoral and postdoctoral years, I studied human decision-making and visual perception using computer-controlled behavioral experiments, statistical analysis, computational modeling and machine learning. Before that, my research focused mostly on computer vision, image processing and biomedical imaging.

If you would like to learn more about my professional experience and education, feel free to check out my LinkedIn page.

Skills

Programming Languages

Since my first exposure to coding in high school, I have always been enchanted by the beauty of communicating with computers through programming languages. I even learned some languages as archaic as dinosaurs (Assembly, Pascal, VHDL, Verilog). Early in my career, I mostly used MATLAB and C++ to develop computer vision software and run computer-controlled behavioral experiments. More recently, I mainly code in Python, R, SQL and Spark for data science and machine learning applications.

Data Analytics

For most of my career, I have worked extensively on drawing insights from complex data using advanced analytics tools. I'm proficient in manipulating all kinds of data (structured and unstructured) and creating insightful visualizations with data analytics tools (pandas, scipy, numpy, matplotlib, plotly and seaborn in Python; tidyverse and ggplot2 in R). You can check out my COVID-19 visualization project. I also have experience wrangling large amounts of data (up to 20 TB) on cloud platforms using tools such as AWS Athena, AWS Glue, SQL and Linux. As a trained scientist, I have considerable experience performing advanced statistical analyses and hypothesis testing (e.g., A/B testing) to draw deep, actionable insights from complex data (you can check out this postdoctoral project).

Machine Learning & Artificial Intelligence

I have extensive experience building machine learning models to make predictions and gain insights from data. During my PhD at UPenn, I did research with traditional machine learning methods: Bayesian modeling, linear/logistic regression, LDA, SVM, tree-based models, PCA, k-means, kNN, Gaussian mixture models and t-SNE. Later, when I started working at Columbia University and AstraZeneca, I transitioned to more recent neural network models and worked with state-of-the-art architectures such as CNN, RNN-LSTM, TCN and GAN. I mostly use popular Python packages to train machine learning models (scikit-learn, tensorflow, keras, pytorch, transformers).

Natural Language Processing

My first exposure to NLP was a project in which I predicted a user's emotion (happy/sad) from their Twitter posts using a bag-of-words approach and traditional machine learning methods such as Naive Bayes and SVM. Later, I learned more about NLP preprocessing and feature extraction (tokenization, stemming, lemmatization, TF-IDF) and applied them, together with machine learning and t-SNE, to gain insights from job descriptions. More recently, I have been exploring large language models (LLMs) and Transformers, and have applied them to clinical text for important tasks such as clinical entity recognition and question answering. For NLP, I mostly use popular Python libraries such as nltk, spaCy and transformers.

Computer Vision & Medical Imaging

Early in my career, my research focused on traditional computer vision methods. I worked on a project to develop computer vision software that accurately tracks changes in objects' properties over time. In a similar vein, I developed and validated medical imaging software to measure blood flow and blood oxygenation, which are strong diagnostic markers for several diseases (e.g., diabetes, as illustrated in my publications). Later, when I shifted my focus from computer vision to human vision, I worked on an interesting project comparing the two vision systems. I also did a project to detect cancer in medical scan images using various neural network models (CNN, ResNet, Inception) and transfer learning. Another interesting project used StyleGAN (a generative neural network) to generate synthetic images and FaceNet to analyze those images for behavioral experiments.

Time Series Analysis

At AstraZeneca, I have spent a lot of time extracting actionable insights from sensor data (e.g., accelerometer and acoustic signals). Several of my projects involve deriving physical activity measures (e.g., step count, walking distance) from wearable devices using both signal processing algorithms and state-of-the-art neural networks (CNN, LSTM, WaveNet), with a focus on clinical use. I have also developed, validated and deployed machine learning algorithms (XGBoost, YAMNet) to detect coughs from acoustic signals.

Web Development

I mostly work on the back end, developing APIs that a front-end app or website can query for machine learning model predictions (using, for example, Python's FastAPI). That said, I also enjoy front-end work and building beautiful websites. In fact, I designed and built the site you're reading right now from scratch using HTML, CSS, Bootstrap and JavaScript (code is here if you're interested).

Software Development

I'm a strong advocate of software development best practices such as Agile and DevOps principles. In my daily workflow, I use tools and practices such as Jira, Kanban, Scrum and sprints. On the coding side, I use Git (Bitbucket, GitHub) for version control, VS Code as my IDE, pytest for unit testing, Jenkins and GitHub Actions for CI/CD, Docker and Kubernetes for deployment, and popular cloud and cluster computing tools such as AWS (SageMaker, Athena, Glue, S3), GCP and Slurm.

Projects

Hobbies

I love reading books! Fun fact: I translated two popular psychology books from English into Vietnamese, and both were officially published ("Chatter" by Ethan Kross and "Beyond Order" by Jordan Peterson, in case you're curious).

I have been practicing mindfulness meditation since college, and it was a profound, transformative turning point in my life. At some point, it stopped being just a "skill" and became a way of being, as essential to my life as breathing and eating.

I love playing musical instruments. I started out with the guitar and really enjoyed it. One day, I heard a beautiful flute piece in a movie and decided to learn the flute. It turned out to be the best experience I have had with a musical instrument (despite a lot of struggle and frustration along the way).

I love spending time with my family, especially playing with my little daughter. We have so much fun doing everything together (even silly things)!