Open to full-time opportunities · Fairfax, VA

Turning messy data
into decisions that ship.

Data & Database Analyst with 2+ years building production ETL pipelines, enforcing data quality, and modeling relational and analytical schemas. MS in Data Analytics Engineering from George Mason University (GPA 3.90).

2+ · Years in industry
500K+ · Records pipelined
3.90 · GMU MS GPA
5 · Shipped projects
01 · About

Who I am

Data work that actually holds up in production, not just in slides.

I'm Rushikesh, a data / database analyst based in Fairfax, VA. I spent my first two years at Cognizant Technology Solutions writing SQL, stored procedures, and ETL workflows in BigQuery on GCP — the kind of work where a broken pipeline at 2 a.m. is your problem to find and fix.

At George Mason University I finished my MS in Data Analytics Engineering in December 2025 (GPA 3.90). My graduate work focused on source-to-target mapping, data quality enforcement, and relational data modeling across heterogeneous sources — climate APIs, cybersecurity feeds, public health records.

I care about the boring parts: schema validation, metadata docs, null handling, referential integrity. That's where reliability comes from. I work equally well on the analysis side — SQL, Python, Tableau, Power BI — when the question is "what does this data mean?" instead of "how do we move it?"

Currently open to full-time roles in data analytics, data engineering, or analytics engineering in the US.

Location: Fairfax, VA · Open to relocation
Work Authorization: US · No sponsorship needed
Best at: SQL · ETL · Data quality · Modeling
Also comfortable with: Python · GCP/AWS · Tableau · Airflow
02 · Experience

Where I've worked

Programmer Analyst / Data Analyst
Cognizant Technology Solutions · Pune, India
Feb 2022 – Nov 2023
  • Designed and optimized SQL queries, stored procedures, and ETL workflows in BigQuery on GCP, improving data delivery time by 25% and cutting compute cost by 18%.
  • Performed data profiling, source-to-target data mapping, and pre/post-load data quality checks across reporting pipelines.
  • Implemented data validation rules — constraint enforcement, null handling, referential integrity — to keep reporting datasets clean across systems.
  • Automated deployment and testing with Docker, Airflow, and GitHub Actions CI/CD, reducing deployment errors by 20%.
  • Documented data definitions, transformation logic, and metadata standards for governance and team knowledge sharing.
Data Engineer — Climate Data Integration Platform
Academic + Deployed · Fairfax, VA
Aug 2025 – Dec 2025
  • Profiled and mapped multi-source datasets (APIs, relational databases, CSVs) into a unified relational schema supporting 500K+ records across 165 countries.
  • Built fact and dimension data models reducing query latency by 70%.
  • Enforced data quality across 4 validation dimensions — schema match, value ranges, unit consistency, null handling — improving pipeline reliability by 85%.
  • Authored data dictionaries, pipeline docs, and data flow diagrams to support governance and maintainability.
  • Containerized pipelines with Docker and deployed via CI/CD with GitHub Actions for reproducible runs.
03 · Projects

Selected work

Each of these is either deployed live or has a public repo. Code, demos, and metrics linked below.

NLP-SQL Transformer

Transformer-based NLP model (BART fine-tune) that converts natural-language questions into valid SQL over multiple uploaded CSVs. Reached 45.6% exact-match accuracy on the Spider benchmark. Deployed as a live Streamlit app on Hugging Face Spaces.

Python · BART · PyTorch · Streamlit · HuggingFace
Forest Analytics · ClimateGPT

Data pipeline over 500K+ environmental records integrated with the ClimateGPT API for automated climate insights. Star-schema data models on forest loss, CO₂ emissions, and land cover; query latency down 70%. 165-country coverage.

Python · SQL · Docker · GitHub Actions · ClimateGPT
KEV Vulnerability Prediction

XGBoost model predicting which CVEs will become Known Exploited Vulnerabilities. Ingested CVE / CVSS / EPSS / CPE feeds into a normalized schema, handled class imbalance with SMOTE and threshold calibration (with PCA for dimensionality reduction), and reached AUC 0.996. Served via REST API.

Python · XGBoost · PostgreSQL · REST API · SMOTE
Public Health Analytics — Drug Overdose

Profiled and cleaned 20 years of overdose records using AWS Glue DataBrew; mapped multiple source schemas into a unified analytical model. SQL analytics + Tableau dashboards surfacing demographic risk (notably fentanyl's disproportionate impact on ages 25–34).

Python · SQL · AWS Glue · PostgreSQL · Tableau
FIFA 22 Player Performance Analysis

Statistical + ML analysis of FIFA 22 player data across top football leagues. Interactive dashboard for player ratings, positional performance, and league comparisons. Increased stakeholder insight accessibility by 20%.

Python · R · Pandas · Data Viz
Infrastructure Analytics & Optimization

Condition modeling on 7,500+ structural assets using XGBoost and Random Forest. Applied Chi-Square, ANOVA, and NPV analysis to surface relationships that inform maintenance investment decisions. R² = 0.45 on forecasted ratings.

Python · XGBoost · Random Forest · ANOVA · SQL
04 · Skills

Tooling

What I reach for first — grouped by where it fits in the stack.

Databases & SQL
PostgreSQL · MySQL · BigQuery · Stored Procedures · Triggers · Star Schema · Normalized Modeling
Data Engineering & ETL
Data Profiling · Source-to-Target Mapping · Data Quality Rules · Metadata Mgmt · Airflow · Spark
Programming
Python · Bash / Shell · R · Java · UNIX / Linux
Cloud & Platforms
GCP (BigQuery) · AWS (S3 / EC2 / Glue) · Databricks · Snowflake
DevOps & Automation
Docker · Kubernetes · GitHub Actions · Git · Terraform
Visualization & BI
Power BI · Tableau · Streamlit
ML & Analytics
XGBoost · Random Forest · PCA / SMOTE · NLP (BART) · ANOVA / Chi-Square
Governance & Docs
Data Dictionaries · Data Flow Diagrams · Runbooks · Agile
05 · Education

Training

George Mason University
M.S. Data Analytics Engineering
Fairfax, VA · Dec 2025 · GPA 3.90

Coursework: Big Data, Machine Learning, Predictive Analytics, NLP with Deep Learning, Data Engineering, Database Design.

Savitribai Phule Pune University
B.E. Mechanical Engineering
Pune, India · May 2021 · GPA 3.52

Undergraduate research: Hybrid CFRP/GFRP composite analysis under thermal and bending loads using MATLAB, ANSYS, and SolidWorks.

Let's talk.

Open to full-time data analyst, data engineer, and analytics engineer roles. Fastest reply is email.