Knowledge Domains | Data Engineering & Platform Systems across cloud, hybrid, and regulated data environments
Technical Quality | Focused on production-grade pipelines: from raw ingestion -> trusted datasets -> analytics-ready products
Engineering Practice | Strong emphasis on reliability, reproducibility, auditability, and systems that age well
Broad Curriculum | Experience spanning ETL/ELT, distributed processing, cloud/HPC, data quality, and ML-adjacent systems
I would like to know more...
Hello, and welcome to my profile. My name is Eduardo - grab a cup of coffee and allow me to introduce myself.
I build data platforms and production-grade data systems where messy real-world inputs are transformed into reliable, auditable, and analysis-ready products.
In practical terms, my work lives somewhere between:
- Data ingestion - where reality arrives poorly formatted and with opinions.
- Transformation layers - where Python, SQL, Spark, and modeling discipline try to restore civilization.
- Data quality - because a pipeline that runs and a pipeline that is correct are not the same animal.
- Delivery - where analytics, BI, ML, and internal users need datasets that are stable enough to trust and boring enough to maintain.
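The gap between "a pipeline that runs" and "a pipeline that is correct" can be made concrete with an explicit validation gate. A minimal sketch, assuming a hypothetical batch shape and check names (none of this is a real schema from my projects):

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str


def validate_batch(rows: list[dict]) -> list[CheckResult]:
    """Run explicit correctness checks on an ingested batch.

    Column names and checks here are illustrative placeholders.
    """
    results = []

    # Completeness: the batch must not be silently empty.
    results.append(CheckResult("non_empty", len(rows) > 0, f"{len(rows)} rows"))

    # Required fields: every row needs a primary key.
    missing_id = sum(1 for r in rows if not r.get("patient_id"))
    results.append(CheckResult(
        "required_id", missing_id == 0, f"{missing_id} rows missing patient_id"))

    # Uniqueness: primary keys must not collide.
    ids = [r["patient_id"] for r in rows if r.get("patient_id")]
    dupes = len(ids) - len(set(ids))
    results.append(CheckResult("unique_id", dupes == 0, f"{dupes} duplicate ids"))

    return results


batch = [{"patient_id": "a1"}, {"patient_id": "a1"}, {"patient_id": None}]
report = validate_batch(batch)
failed = [c.name for c in report if not c.passed]
# Any failing check blocks promotion to the trusted layer,
# instead of letting bad data flow downstream.
```

The point of the sketch is the contract, not the checks themselves: a batch that cannot name which check it failed is a batch you cannot debug at 3 a.m.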
I specialize in Python, SQL, PySpark/Spark, cloud and hybrid data infrastructure, ETL/ELT pipelines, data modeling, validation frameworks, and reproducible platform workflows. I have worked across environments involving AWS, GCP, Azure, HPC clusters, Docker/Kubernetes, CI/CD, and large heterogeneous datasets that rarely introduce themselves politely.
My engineering bias is simple: build systems that are clear, testable, observable, and maintainable after the original excitement has left the room.
This usually translates to:
- Production data pipelines with strong reproducibility, monitoring, and failure handling
- Scalable batch and distributed processing for high-volume analytical workloads
- Data modeling layers supporting analytics, BI, reporting, and ML workflows
- Validation and governance practices for environments where correctness is not decorative
- Cloud, hybrid, and HPC workflows designed to survive both scale and human memory
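"Failure handling" and reproducibility above often reduce to one pattern: idempotent writes keyed by batch content, so a blind retry after a crash can never duplicate data. A minimal sketch, with a plain dict standing in for any key-addressable sink (object-store prefix, staging table, etc.):

```python
import hashlib
import json


def batch_key(batch: list[dict]) -> str:
    """Deterministic content hash: the same batch always maps to the same key."""
    canonical = json.dumps(batch, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()


def write_idempotent(store: dict, batch: list[dict]) -> bool:
    """Write a batch keyed by its content hash.

    Returns True if the batch was written, False if it had already landed,
    so retrying after a suspected failure is always safe.
    """
    key = batch_key(batch)
    if key in store:
        return False  # already landed; the retry is a no-op
    store[key] = batch
    return True


sink: dict = {}
payload = [{"id": 1, "value": 42}]
first = write_idempotent(sink, payload)   # normal run
retry = write_idempotent(sink, payload)   # retry after a suspected failure
# first is True, retry is False, and the sink holds exactly one copy.
```

The same idea scales from this toy to partitioned Parquet landings: derive the key from content, check before writing, and retries become boring.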
My background includes complex healthcare and enterprise data environments, but my current professional identity is straightforward: Senior Data Engineer / Data Platform Engineer. Machine Learning is still part of the toolbox, but the main job is now the plumbing, contracts, orchestration, and reliability that make downstream intelligence possible.
I value clean design, explicit trade-offs, and systems that are understandable by humans - not just machines with suspicious confidence.
Ethics, reproducibility, and long-term sustainability are not optional; they are part of the job.
Availability | Currently open to remote, hybrid, relocation-friendly, and long-term Data Engineering / Data Platform roles. See contact details. Relocation and onboarding take planning - good systems (and good moves) benefit from doing things properly.
2025 | Committer | Awarded Apache Spark Committer Status | The Apache Software Foundation (ASF) | Finland & Brazil
2022 | Senior Transition | Senior Data Engineer | Turku Biosciences & Brazilian Ministry of Health | Finland & Brazil
2020 | Outreach | Award-Winning COVID-19 Outreach Campaign | Göttingen University Medical School | Germany
2017 | Patent | LAG3-Targeting Cancer Therapy | Current owner: Bristol Myers Squibb | USA
2016 | Industry Transition | Data Engineer | Dana-Farber Cancer Institute | USA
2013 | Research | Computational Biology Researcher | RWTH University Medical School | Germany
I would like to know more...
name: "Eduardo Gusmao"
role: "Senior Data Engineer | Data Platform Engineer"
contact: "Recife, Brazil | eduardogade@gmail.com | github.com/eggduzao | linkedin.com/in/eduardogade"
languages: "English fluent | Portuguese native | Spanish B2 | German A2 | Finnish A1"
education: "2x PhD in Biomedical Informatics and Data Engineering / Computational Life Sciences; BSc + MSc in Computer Science"
summary: "Data Platform Engineer with 8+ years designing scalable data platforms, distributed systems, and production-grade pipelines across healthcare, life sciences, and enterprise environments. Strong Python, SQL, PySpark/Spark, AWS, Docker/Kubernetes, CI/CD, data quality, and analytics/BI platform experience."
professional_engagements:
current_role:
company: "Turku Biosciences / Brazilian Ministry of Health"
title: "Senior Data Engineer"
location: "Finland / Brazil"
date: "Sep 2022 - Present"
scope:
- "Lead a national-scale precision-medicine data platform integrating genomic, phenotypic, and clinical EHR data for 65,000+ individuals."
- "Build scalable Python, SQL, PySpark/Spark, Databricks-adjacent, HPC/SLURM, and API-driven ingestion and transformation workflows."
- "Deliver regulated ingestion, validation, governance, PII-compliant processing, observability, idempotency, and data quality controls."
- "Support analytics, BI, and ML workloads through reusable integration layers, backend data services, and optimized Parquet-based processing."
development_environment:
infrastructure: "AWS | Azure | GCP | HPC/SLURM | Docker | Kubernetes | Terraform | GitHub Actions"
languages: "Python | SQL | PySpark | Bash/Shell | Scala | Java | C/C++ | YAML | HCL"
data_stack: "PySpark | Spark | Pandas | Polars | NumPy | BigQuery | PostgreSQL | Parquet | JSON | dbt | dimensional modeling"
platform_engineering: "ETL/ELT | ingestion frameworks | transformation layers | platform APIs | CI/CD | observability | validation | data quality"
ml_ai_stack: "ML pipelines | MLOps | feature engineering | LLM APIs | embeddings | RAG | Hugging Face"
collaboration: "Agile/Scrum | stakeholder enablement | analytics teams | data scientists | engineers | product | infrastructure | security"
I would like to know more...
name: "Eduardo Gusmao"
role: "Senior Data Engineer | Data Platform Engineer | Cloud Data Engineer"
location: "Recife, Brazil"
contact: "eduardogade@gmail.com | github.com/eggduzao | linkedin.com/in/eduardogade"
languages: "English fluent | Portuguese native | Spanish B2 | German A2 | Finnish A1"
summary: "Data Platform Engineer with 8+ years of experience designing and operating scalable data platforms, distributed data systems, and production-grade pipelines across healthcare, life sciences, and enterprise environments. Strong expertise in Python and SQL, with hands-on experience in PySpark/Spark, Kafka-style event-driven workflows, Airflow, AWS, Docker/Kubernetes, Terraform, and CI/CD to build reliable data infrastructure and platform services."
core_expertise:
- "Data Engineering"
- "Data Platform Engineering"
- "Cloud and Hybrid Data Infrastructure"
- "Distributed Data Systems"
- "ETL/ELT Pipelines"
- "Data Modeling and Analytics Engineering"
- "Data Quality and Governance"
- "Healthcare and Life Sciences Data"
- "Machine Learning Data Pipelines"
- "Production Reliability and Observability"
career_profile:
- "8+ years building scalable data platforms and distributed data systems"
- "Production-grade ingestion, transformation, validation, observability, and internal tooling"
- "Strong Python, SQL, PySpark/Spark, cloud, CI/CD, Docker/Kubernetes, and data quality background"
- "Experience supporting analytics, BI, ML workflows, and mission-critical data products"
- "Comfortable translating complex stakeholder requirements into maintainable platform capabilities"
professional_engagements:
current:
company: "Turku Biosciences / Brazilian Ministry of Health"
title: "Senior Data Engineer"
location: "Finland / Brazil"
date: "Sep 2022 - Present"
scope:
- "Lead the design and delivery of a national-scale data platform for precision medicine, integrating multi-modal genomic, phenotypic, and clinical EHR data for 65,000+ individuals."
- "Build scalable Python-based pipelines, distributed systems, PySpark/Spark workflows, Databricks-adjacent processing, HPC/SLURM execution, and API-driven ingestion workflows."
- "Architect production-grade data platform services for regulated ingestion, validation, transformation, governance, PII-compliant processing, reproducible workflows, and quality/reliability controls."
- "Develop high-performance processing and modeling layers using Python, SQL, PySpark, partitioning strategies, Parquet formats, and distributed query tuning."
- "Design reusable integration layers and backend data services connecting heterogeneous clinical, genomic, and ERP/SAP data sources."
- "Enable event-driven workflows, orchestration patterns, and batch/streaming-adjacent pipelines supporting analytics, BI, and ML systems."
- "Collaborate with product, analytics, engineering, and infrastructure stakeholders to deliver platform capabilities, CI/CD, Docker/Kubernetes workloads, observability, logging, alerting, and performance tuning."
outcomes:
- "Integrated precision-medicine datasets for 65,000+ individuals."
- "Improved pipeline efficiency by approximately 25%."
- "Reduced storage costs by approximately 80%."
- "Enabled more than 40% faster data delivery for downstream analytics, BI, and ML systems."
previous_mid:
company: "Göttingen General Hospital"
title: "Data Engineer II"
location: "Germany"
date: "Mar 2019 - Sep 2022"
scope:
- "Designed and implemented scalable data platform services using Python and SQL on cloud and hybrid environments."
- "Enabled reliable ingestion, transformation, and low-latency access for downstream analytics, BI, and application workloads."
- "Developed and optimized high-performance ETL/ELT pipelines with Python and PySpark, leveraging distributed processing, batch workflows, and orchestration patterns."
- "Refactored legacy systems into modular, production-grade platform services with CI/CD, automated testing, monitoring/logging, idempotency, retries, and robust error handling."
- "Built reusable data processing frameworks and integration layers for large-scale heterogeneous datasets."
- "Applied data modeling, validation, lifecycle standards, and governance across 12 cross-functional teams in a distributed environment."
outcomes:
- "Improved data availability and system responsiveness by approximately 33%."
- "Improved reliability, maintainability, and operational efficiency by approximately 50-60%."
- "Supported consistent data lifecycle practices across 12 cross-functional teams."
previous_old:
company: "Dana-Farber Cancer Institute"
title: "Data Engineer I"
location: "USA"
date: "Jan 2016 - Mar 2019"
scope:
- "Developed cloud-native data platform services supporting large-scale drug discovery."
- "Built Python-based ETL/ELT pipelines and API-driven integration layers for heterogeneous biomedical, operational, and financial datasets."
- "Implemented end-to-end data processing pipelines using Python, SQL, and PySpark on Apache Spark distributed systems."
- "Enabled scalable ingestion, transformation, validation, and batch workflows for analytics and ML-driven applications."
- "Collaborated with product, analytics, and research stakeholders to define KPIs and translate requirements into data models, backend data logic, and reusable platform components."
- "Contributed to production-grade data engineering practices including Git version control, validation checks, documentation, maintainable system design, reliability, and reproducibility."
outcomes:
- "Improved data accessibility and reduced operational costs by more than 25%."
- "Supported analytics and ML-driven applications through reusable data platform components."
- "Established reliable, reproducible lifecycle standards for heterogeneous biomedical and operational data."
education:
phd_biomedical_informatics:
degree: "Ph.D. in Biomedical Informatics"
institution: "Harvard Medical School"
location: "Boston / Cambridge, USA"
date: "2013 - 2017"
phd_computational_life_sciences:
degree: "Ph.D. Dr. rer. nat. in Data Engineering and Computational Life Sciences"
institution: "RWTH Aachen University"
location: "Aachen, Germany"
date: "2011 - 2015"
bachelor_master_computer_science:
degree: "B.Sc. and M.Sc. in Computer Science"
institution: "Federal University of Pernambuco"
location: "Recife, Brazil"
date: "2008 - 2011"
technical_strengths:
programming:
primary: ["Python", "SQL", "PySpark", "Spark SQL", "Bash/Shell"]
secondary: ["Scala", "Java", "C/C++", "YAML", "HCL"]
capabilities: ["REST APIs", "Async programming", "Data serialization", "Production-grade software engineering", "Parquet", "JSON"]
data_platform_engineering:
capabilities: ["Scalable data platforms", "Distributed data systems", "Internal data tooling", "Reusable ingestion frameworks", "Transformation layers", "Platform APIs", "Developer-facing abstractions", "Self-service data capabilities", "Analytics enablement", "ML workflow support", "BI workload support"]
distributed_data_systems:
tools: ["PySpark", "Apache Spark", "Pandas", "Polars", "NumPy"]
capabilities: ["Large-scale processing", "Distributed compute", "Performance tuning", "Partitioning", "Query optimization", "Resource efficiency", "Batch pipelines", "Streaming-adjacent pipelines", "Kafka", "Spark Streaming patterns"]
cloud_hybrid_infrastructure:
cloud: ["AWS", "Azure", "GCP"]
infrastructure: ["HPC/SLURM", "Docker", "Kubernetes", "Terraform", "GitHub Actions", "GitLab CI"]
capabilities: ["Cloud-native data infrastructure", "Hybrid data infrastructure", "Infrastructure-aware engineering", "Containerized workloads", "Deployment environments", "Scalable platform operations"]
hadoop_on_prem_ecosystems:
technologies: ["HDFS", "YARN", "Hive", "Kerberos"]
capabilities: ["Distributed storage patterns", "Distributed compute patterns", "Legacy-to-modern platform evolution", "Secure access-controlled data environments"]
data_modeling_tooling:
capabilities: ["Dimensional modeling", "Semantic modeling", "Schema design", "Metadata management", "Transformation layers", "Data contracts", "Lineage", "Modeling standards", "Analytics enablement", "Platform consistency"]
tools: ["dbt", "BigQuery", "PostgreSQL", "Parquet", "JSON"]
software_engineering_devops_reliability:
tools: ["Git", "GitHub", "GitHub Actions", "GitLab CI", "Docker", "Kubernetes", "Terraform"]
practices: ["CI/CD pipelines", "Automated testing", "Deployment automation", "Monitoring", "Logging", "Alerting", "Observability", "Incident response", "Idempotency", "Retries", "SLA/SLO thinking", "Fault-tolerant design"]
machine_learning_data_pipelines:
capabilities: ["ML pipelines", "MLOps", "Feature engineering", "Data preparation", "Personalization workflows", "AI-enabled data workflows", "Production-oriented ML data support"]
ai_llm: ["LLM APIs", "Embedding pipelines", "RAG", "Hugging Face"]
data_security_governance_quality:
capabilities: ["Data privacy", "PII-aware processing", "Compliance-aware pipelines", "Access control", "Validation strategies", "Auditability", "Data quality checks", "Governance practices", "Secure data lifecycle management", "Reliability controls", "Consistency checks"]
processes_collaboration:
practices: ["High-ownership engineering mindset", "Agile/Scrum", "Cross-functional collaboration", "Stakeholder enablement", "Requirements translation", "Technical documentation", "Platform capability delivery"]
collaborators: ["Analysts", "Data scientists", "Engineers", "Product teams", "Infrastructure teams", "Security teams"]
development_environment:
hardware: ["Apple Silicon", "ARM", "Intel", "NVIDIA GPU environments", "HPC clusters"]
operating_systems: ["macOS", "Ubuntu", "Debian", "Fedora", "Windows"]
infrastructure:
cloud_computing: ["AWS", "Azure", "GCP"]
hpc: ["SLURM", "OpenPBS", "Distributed compute environments"]
containers: ["Docker", "Kubernetes", "Singularity"]
infrastructure_as_code: ["Terraform", "HCL", "Cloud deployment"]
languages:
data_engineering: ["Python", "SQL", "PySpark", "Spark SQL", "Bash/Shell"]
systems_and_general: ["C/C++", "Java", "Scala"]
markup_and_config: ["YAML", "Markdown", "LaTeX", "HTML/CSS", "HCL"]
data_stack:
distributed_processing: ["Apache Spark", "PySpark", "Spark SQL", "Pandas", "Polars", "NumPy"]
storage_formats: ["Parquet", "JSON", "CSV", "HDF5"]
databases_and_warehouses: ["BigQuery", "PostgreSQL", "MongoDB", "DynamoDB", "Relational databases", "NoSQL databases"]
modeling_and_quality: ["Dimensional modeling", "Semantic modeling", "Schema design", "Data contracts", "Lineage", "Validation checks", "Data quality checks", "dbt"]
ml_ai_stack:
frameworks_and_tools: ["PyTorch", "TensorFlow", "Keras", "Scikit-Learn", "Hugging Face", "NLTK"]
workflows: ["ML pipelines", "MLOps", "Feature engineering", "Embedding pipelines", "RAG", "LLM APIs"]
systems_tooling:
version_control: ["Git", "GitHub"]
packaging_and_environments: ["pip", "poetry", "micromamba", "mamba", "conda", "npm"]
ci_cd: ["GitHub Actions", "GitLab CI"]
observability: ["Logging", "Monitoring", "Alerting", "Observability", "Prometheus", "Grafana"]
github_positioning:
short_pitch: "I build reliable data platforms, distributed pipelines, and production-ready data systems for analytics, BI, ML, and healthcare/life-sciences workloads."
engineering_style:
- "Clean, maintainable, typed Python"
- "Data quality and reliability first"
- "Production-aware platform design"
- "Reproducible workflows"
- "Strong documentation"
- "Pragmatic cloud and hybrid infrastructure"
LinkedIn | https://www.linkedin.com/in/eduardogade
Location | Recife, Brazil | Remote-friendly
Status | Open to Data Engineering roles
I would like to know more...
LinkedIn: https://www.linkedin.com/in/eduardogade/
Website & Blog: https://www.gusmaolab.org
One-Page Resume: https://www.gusmaolab.org/Gusmao-EG-CV.pdf
Stack Overflow: https://stackoverflow.com/users/32223943/eduardo-gusmao
Medium: https://medium.com/@eduardogade
Preferred contact: Email | LinkedIn
Response time: 1-2 business days
Open to remote, hybrid, or relocation
See [availability & engagement details](#availability)
Placeholder.
I would like to know more...
Placeholder.
Placeholder.
I would like to know more...
Placeholder.
Placeholder.
I would like to know more...
Placeholder.
Placeholder.
Placeholder.
I would like to know more...
Placeholder.
Machine & Deep Learning | Repository | Publication
Variational Inference | Repository | Publication
Precision Medicine | Repository | Publication
Regulatory Genomics | Repository | Publication
I would like to know more...
Global age-sex-specific all-cause mortality and life expectancy estimates for 204 countries and territories and 660 subnational locations, 1950-2023: a demographic analysis for the Global Burden of Disease Study 2023
The Lancet · Oct 18, 2025
Contributions:
- Responsible for orchestrating the LATAM branch, with 45+ PIs and 200+ researchers.
- Horizontal meetings for data and experience sharing proved highly successful, with ~380% higher per-capita efficiency than the next most efficient branch.
- Resolved pharmacological conflicts of interest through cross-deployment and a blind-genotype/blind-phenotype strategy, which showed 17% higher accuracy than North America (first in COI per capita) and 5% higher than Asia (second in COI per capita).
Cell Reports · Nov 26, 2024
Contributions:
- The tool Bloom improved the analysis by offering multiple views into the regulatory spatial configuration, resulting in ~50% wet-lab equipment cost reduction and resolving a stalled case.
- Provided personal guidance on architecture and Hi-C methodology, saving 15% of overall lab time.
- Overall, this was the first non-trivial non-intermediary-distance (>1Gbp) lncRNA interference in a region unknown to be a regulatory enhancer.
Global, regional, and national burden of diabetes from 1990 to 2021, with projections of prevalence to 2050: a systematic analysis for the Global Burden of Disease Study 2021
The Lancet · Jul 15, 2023
Contributions:
- Responsible for orchestrating a team of 3 Brazilian PIs and 5 independent investigators.
- Used Scrum coupled with CRISP-DM, delivering net gains (profitability converted back) through network revenue savings and wet/dry-lab material cost reduction.
- Developed Fabric (Phenoteka module), a national-scale genotype/phenotype QC pipeline used across 20+ institutes.
The New England Journal of Medicine · Oct 11, 2022
Contributions:
- Developed Blacksmith, which, coupled with Bloom, improved operating margin by over 15%.
- Freed at least 15 engineering hours per week by coupling Blacksmith with Apollo.
- To lower our carbon footprint, we adopted a trademarked database 'bit-brushing' methodology (currently owned by Databricks Inc.).
Molecular Systems Biology · Jun 24, 2021
Contributions:
- Analyzed spatial chromatin biology and RNA-seq data to identify, for the first time, HMGB1 as a 'rheostat' factor.
- Reduced cloud compute costs by 40% using Apollo's mathematical features and Bloom to analyze chromatin conformation.
- Following this project's results, we earned ESG compliance through rigorous waste management and safety handling.
Redundant and specific roles of cohesin STAG subunits in chromatin looping and transcriptional control
Genome Research · Apr 6, 2020
Contributions:
- Analyzed the most omics assays in a single project: ChIP-seq, degron-X, RNA-seq, Hi-C, STORM, DNase-seq, ATAC-seq, MS/MS, and MS-based microscopy.
- Developed Musique, shortening development cycles by ~9 weeks.
- Musique saved 300 GPU-hours per month by applying simple heuristics that generalize to any dataset.
Spatial chromosome folding and active transcription drive DNA fragility and formation of oncogenic MLL translocations
Molecular Cell · Jul 25, 2019
Contributions:
- Patented a technique for BLISS-seq data processing, earning ~25% extra funds for the laboratory.
- Lowered wet-lab costs by ~30% (estimated for this project) using dry-lab tools, achieving reproducible and insightful results on MLL fusions.
- Created the triple-correlation method, translating category theory into a real-world phenomenon.
HMGB2 loss upon senescence entry disrupts genomic organisation and induces CTCF clustering across cell types
Molecular Cell · May 17, 2018
Contributions:
- Developed Bloom and Apollo, which reduced processing time by at least 3 months.
- Applied a highly agile methodology with short, multi-cycle iterations, leading to novel discoveries and decreasing overall time-to-delivery.
- Reduced local infrastructure storage footprint by ~100 TB with Bloom and Apollo.
Nature · Jan 23, 2017
Contributions:
- Devised bioinformatics pipelines with collaborators and created the Gaussian-as-DPMM clustering method, increasing speed by at least ~100x.
- The clustering identified 3+ unique subtypes never previously reported.
- Created a deep regulatory network, notably involving SHKBP1, ERBB3, and TGFBR2, which captured 98% of the variability in cancer mortality information.
Nature Methods · Feb 22, 2016
Contributions:
- Landmark study comparing 12+ footprinting methods; it was featured on the cover of Nature Methods.
- Without any wet-lab experiment, we identified the limits of sequencing technologies and proposed results exceeding known methods by ~5% AUPR.
- Our method, Olympus (published in 2023), offers a ~7x more complete analysis of regulatory genomics than any other tool.
Nucleic Acids Research · Oct 17, 2015
Contributions:
- First use of Faun, a motif enrichment analysis that uses hypergeometric distributions to assess the sensitivity and specificity of TF occupancy in a given genomic region.
- Proposed the use of Cytoscape, reducing meeting preparation time by ~25%.
- Proposed using fewer histone modification assays by recreating chromatin states in silico, reducing project costs by ~30%.
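The hypergeometric enrichment idea behind this kind of motif analysis can be sketched in a few lines. The counts below are toy numbers for illustration, not values from the paper, and this is a generic tail test, not Faun's actual implementation:

```python
from math import comb


def hypergeom_pvalue(k: int, K: int, n: int, N: int) -> float:
    """P(X >= k): probability of seeing at least k motif hits among n
    sampled regions, when K of the N genome-wide regions carry the motif.
    This is the upper-tail hypergeometric enrichment test.
    """
    return sum(
        comb(K, i) * comb(N - K, n - i) / comb(N, n)
        for i in range(k, min(K, n) + 1)
    )


# Toy example: 8 of 20 sampled regions hit the motif, versus a genome-wide
# background of 50 motif-carrying regions out of 1000. Expected hits under
# the null are 20 * 50 / 1000 = 1, so 8 hits is strong enrichment.
p = hypergeom_pvalue(k=8, K=50, n=20, N=1000)
```

A tiny p-value here says the observed TF occupancy in the region is far denser than background, which is exactly the enrichment signal such tools report.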
Placeholder.
I would like to know more...
Placeholder.
Placeholder.
I would like to know more...
Placeholder.
Placeholder.
I would like to know more...
Placeholder.
2018 | PhD | Biomedical Informatics | Harvard Medical School
2015 | PhD | Life Sciences | RWTH Aachen University
2011 | MSc | Machine & Deep Learning | Federal University of Pernambuco
2010 | BSc | Computer Science | Federal University of Pernambuco
I would like to know more...
Placeholder.
Placeholder.
I would like to know more...
Placeholder.
Flagship: 🏳️⚧️ | 🏳️🌈 | 🇺🇳
Placeholder.
I would like to know more...
Placeholder.
Placeholder.
I would like to know more...
Placeholder.
Placeholder.
I would like to know more...
Placeholder.




