Metapub Citations

Research citing metapub in PubMed journals, posters, and beyond.



Emerging trends in multiple sclerosis research

Authors: Gloria Dalla Costa, Giancarlo Comi

DOI: 10.1016/j.msard.2022.104124

Methods

The list of PubMed IDs (PMIDs) of articles published from 2000 onwards on the research topic ‘Multiple Sclerosis’ was downloaded from PubMed, and for each PMID the abstract was retrieved using the metapub library for Python.
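The abstract-retrieval step in this excerpt might look roughly like the sketch below. It assumes metapub's `PubMedFetcher.article_by_pmid` call and the `abstract` attribute on the returned article. The helper name `fetch_abstracts` and the optional `fetcher` argument are illustrative and do not come from the paper.

```python
from typing import Dict, Iterable, Optional


def fetch_abstracts(pmids: Iterable[str], fetcher=None) -> Dict[str, Optional[str]]:
    """Map each PMID to its abstract (None when the record has no abstract)."""
    if fetcher is None:
        # Imported lazily so the function can also be run with a stand-in fetcher.
        from metapub import PubMedFetcher
        fetcher = PubMedFetcher()

    abstracts: Dict[str, Optional[str]] = {}
    for pmid in pmids:
        article = fetcher.article_by_pmid(pmid)
        abstracts[str(pmid)] = article.abstract
    return abstracts
```

Passing a fetcher explicitly keeps the loop testable offline. With no argument, the function builds a real `PubMedFetcher`, which queries NCBI over the network.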


Uremic toxicity: gaining novel insights through AI-driven literature review

Authors: Hanjie Zhang, Peter Kotanko

Journal: Nephrology Dialysis Transplantation

Year: 2024

DOI: 10.1093/ndt/gfae069.657

Method

First, we collected on PubMed all abstracts related to the topic of “uremic toxins” through Metapub, a Python library designed to facilitate fetching metadata from PubMed. Second, we set up a RAG system that comprises 2 steps. In a retrieval step, the questions on topic (“uremic toxins”) and the documents (=all collected abstracts and manuscripts) are encoded into vectors (i.e., high-dimensional numerical representations). Similarity measures are used to find the best matches between documents and the questions on topic. Second, in the augmented generation step, the LLM (e.g., ChatGPT) uses these best matches of documents to generate a coherent and informed response.
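To make the retrieval step concrete, here is a toy sketch of "encode, then rank by similarity". It is not the authors' system. As an assumption for illustration, the learned encoder is replaced with a plain bag-of-words count vector and the similarity measure is cosine similarity. All function names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy encoder: word-count vector (a stand-in for a learned embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, documents: List[str], k: int = 2) -> List[Tuple[float, str]]:
    """Return the k documents most similar to the question, best first."""
    q_vec = embed(question)
    scored = [(cosine(q_vec, embed(doc)), doc) for doc in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


def build_prompt(question: str, documents: List[str], k: int = 2) -> str:
    """Assemble the text handed to the LLM in the augmented generation step."""
    context = "\n".join(doc for _, doc in retrieve(question, documents, k))
    return f"Context:\n{context}\n\nQuestion: {question}"
```

In a real pipeline, `embed` would call an embedding model and `build_prompt` would feed an LLM. The ranking logic in `retrieve` stays the same.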


Loop Catalog: a comprehensive HiChIP database of human and mouse samples

Authors: J Reyna, K Fetter, R Ignacio et al.

DOI: 10.1101/2024.04.26.591349

Curating HiChIP and ChIP-seq Samples

To identify a comprehensive list of publicly-released HiChIP datasets, we developed a pipeline that scans NCBI’s Gene Expression Omnibus (GEO) database (Barrett et al., 2013) for studies performing HiChIP experiments. To extract information on these studies the BioPython.Entrez (Buchmann & Holmes, 2019) and metapub.convert (https://pypi.org/project/metapub/) packages were used. Raw sequencing data associated to these studies was then identified from the SRA database using the pysradb Python package (https://github.com/saketkc/pysradb) and the results were manually examined to extract HiChIP samples. ChIP-seq samples corresponding to these studies were also extracted if there was a record of them within the same GEO ID as the HiChIP sample.


Adera2.0: A Drug Repurposing Workflow for Neuroimmunological Investigations Using Neural Networks

Authors: Marzena Lazarczyk et al.

DOI: 10.3390/molecules27196453

"The first phase of the workflow (phase I) covers the aim of building a database of the JSON format containing parsed PDFs (Figure 1). This phase consists of five steps. The first step’s objective is to fetch the PubMed IDs related to the search query. This is accomplished by using the PubMed fetcher function available through the Metapub python library. This step uses the input query to search for recent PubMed articles that match the query terms. After that, in the second step, the workflow fetches the abstracts and keywords of the retrieved PubMed IDs. This is achieved through the use of the python library Keybert. The third step in this phase involves downloading the identified PDFs; this is done using the fetch_PDFs library."
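The first step in this excerpt (query in, PubMed IDs out) could be sketched as below. It assumes metapub's `PubMedFetcher.pmids_for_query` method and its `retmax` keyword. The wrapper name `search_pmids` and the `max_results` default are hypothetical and are not taken from the Adera2.0 code.

```python
from typing import List


def search_pmids(query: str, max_results: int = 50, fetcher=None) -> List[str]:
    """Return up to max_results PubMed IDs matching the query, as strings."""
    if fetcher is None:
        # Imported lazily so a stand-in fetcher can be supplied for offline use.
        from metapub import PubMedFetcher
        fetcher = PubMedFetcher()

    pmids = fetcher.pmids_for_query(query, retmax=max_results)
    return [str(pmid) for pmid in pmids]
```

The returned IDs can then be passed to the later steps (abstract and keyword retrieval, PDF download) described in the excerpt.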


Navigating the Multiverse: A Hitchhiker’s Guide to Selecting Harmonisation Methods for Multimodal Biomedical Data

Authors: Murali Aadhitya Magateshvaren Saras, Mithun K. Mitra, Sonika Tyagi

DOI: 10.1101/2024.03.21.24304655

Based on existing literature reviews published over the past decade, a general outline was followed to select articles that mentioned multimodal learning techniques. An extensive search of various ML methods focused on biomedical data was initially gathered using the metapub (https://github.com/metapub/metapub) python module based on the following keywords...



TREASURE: Text Mining Algorithm Based on Affinity Analysis and Set Intersection to Find the Action of Tuberculosis Drugs against Other Pathogens

Authors: Pradeepa Sampath et al.

Journal: Applied Sciences

Year: 2021

DOI: 10.3390/app11156834

4.1. Data Preprocessing

Around eight drugs are analyzed with this model. For this purpose, abstracts from each document are collected, as they provide the accurate and necessary information about the paper. The PubMed abstracts have a unique ID called PMID. The metapub library in python gets these IDs as input and extracts their corresponding abstracts. The number of document abstracts collected for each drug from PubMed is given in Table 1.


Trends in Technology Usage for Parkinson’s Disease Assessment: A Systematic Review

Authors: Ranadeep Deb, Ganapati Bhat, Sizhe An, Holly Shill, Umit Y. Ogras

Year: 2021

DOI: 10.1101/2021.02.01.21250939

DATA COLLECTION

The methodologies used for downloading the data from the four online databases are also different. The documents were exported in comma-separated values (CSV) format from IEEE Xplore, tab-delimited format from MDPI and BibTeX format from Science Direct. For PubMed Central, we used a Python-based API, Metapub [104], for an automated search. The information extracted from all of the databases was accumulated and stored together in a .CSV file.

Citation given as:

N. Most. (1999) metapub . PyPI. https://pypi.org/project/metapub/, accessed March 7, 2019.
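The accumulation step in the Data Collection excerpt above (records from several databases stored together in one CSV file) might be sketched as follows. The column names, the helper names, and the title-based de-duplication are assumptions made for illustration. The excerpt does not say how records were merged.

```python
import csv
import io
from typing import Dict, Iterable, List

# Hypothetical common schema; the paper does not list its columns.
FIELDS = ["source", "title", "year"]


def merge_records(*sources: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Combine records from several databases, keeping the first copy of each title."""
    seen = set()
    merged: List[Dict[str, str]] = []
    for records in sources:
        for record in records:
            key = record["title"].strip().lower()
            if key in seen:
                continue  # assumed de-duplication rule, not stated in the excerpt
            seen.add(key)
            merged.append({field: record.get(field, "") for field in FIELDS})
    return merged


def write_csv(records: List[Dict[str, str]], handle) -> None:
    """Write the merged records to an open text handle as CSV with a header row."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)
```

For example, `write_csv(merge_records(ieee_rows, pubmed_rows), open("papers.csv", "w", newline=""))` would produce a single file, where `ieee_rows` and `pubmed_rows` are lists of dictionaries parsed from each export.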


Promoting Fairness in Classification of Quality of Medical Evidence

Authors: Simon Šuster, Timothy Baldwin, Karin Verspoor

DOI: 10.18653/v1/2023.bionlp-1.39

Data

We collect a large dataset of clinical trial abstracts from studies for which manual RoB annotations exist in CDSR, similarly to Marshall et al. (2015a). Starting with the PubMed identifiers for the studies included in CDSR, we then searched for abstracts using the metapub package (5) obtaining a total of around 24,000 abstracts.