Assessing the impact of Open Research Information Infrastructures using NLP driven full-text Scientometrics: A case study of the LXCat open-access platform

This paper proposes a domain-agnostic, NLP-driven scientometric framework that uses full-text analysis to quantify the scholarly impact of open research information infrastructures beyond traditional citations, using the LXCat platform as a case study to reveal fine-grained patterns of data usage and research evolution.

Original authors: Kalp Pandya, Khushi Shah, Nirmal Shah, Nakshi Shah, Bhaskar Chaudhury

Published 2026-02-10

This is an AI-generated explanation of the paper below. It is not written or endorsed by the authors. For technical accuracy, refer to the original paper.

The "Library of Ingredients" for Plasma Science: A Simple Breakdown

Imagine you are a world-class chef trying to invent a brand-new type of sustainable fuel. To do this, you don't just need a recipe; you need a massive, perfectly organized warehouse filled with thousands of specific ingredients—the exact saltiness of a certain sea salt, the precise acidity of a specific lemon, and the exact smoke point of a rare oil.

If that warehouse is messy, or if the ingredients are scattered across a thousand different websites, you’ll spend all your time searching and none of your time cooking.

In the world of science, specifically in a field called Low Temperature Plasma (LTP) research, scientists face this exact problem. They need "ingredients" (data like electron collision rates and gas properties) to run their "recipes" (computer simulations and experiments).

LXCat is that massive, community-run warehouse. It’s a digital platform where scientists share the essential "ingredients" needed to understand plasma, which is used in everything from microchip manufacturing to medical treatments.


The Problem: The "Citation" Illusion

Usually, when people want to know if a library or a warehouse is useful, they look at citations. It’s like saying, "This cookbook is important because 100 people mentioned it in their own books."

But there’s a flaw: Just because someone mentions a cookbook doesn't mean they actually used the recipes inside. They might just be acknowledging it exists. To truly know if LXCat is changing the way science is done, the researchers in this paper realized they needed to look deeper. They didn't just want to see if people mentioned LXCat; they wanted to see if people were actually using the ingredients inside it.

The Solution: The "Digital Detective" (NLP)

To solve this, the researchers built a Digital Detective using Natural Language Processing (NLP).

Think of this detective as a super-fast reader that can scan 400 scientific papers in seconds. Instead of just looking for the name "LXCat," the detective looks for "clues" in the text:

  1. The Ingredients: Which specific gases (like Nitrogen or Oxygen) are scientists actually talking about?
  2. The Tools: Are they using specific digital "blenders" (software such as BOLSIG+) to process the data?
  3. The Specific Shelves: Which databases within the LXCat warehouse are pulled off the shelves most often?
  4. The Global Map: Which countries are the most active "chefs" in this kitchen?
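
As a rough illustration of how this kind of clue-hunting could work, the sketch below counts dictionary terms in a paper's text. It is not the authors' pipeline: the term lists in `CLUES`, the function name `find_clues`, and the plain keyword matching are all assumptions made for this example, and a real system would likely rely on more robust NLP techniques.

```python
import re
from collections import Counter

# Hypothetical clue lists for illustration; not taken from the paper.
CLUES = {
    "gas": ["nitrogen", "oxygen", "argon"],
    "tool": ["BOLSIG+"],
}


def find_clues(text):
    """Count case-insensitive matches of each clue term, grouped by category."""
    counts = {category: Counter() for category in CLUES}
    for category, terms in CLUES.items():
        for term in terms:
            # Lookarounds replace \b so that terms ending in "+" still match.
            pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
            hits = re.findall(pattern, text, flags=re.IGNORECASE)
            if hits:
                counts[category][term] = len(hits)
    return counts


# Invented sample sentence standing in for a paper's full text.
sample = (
    "Cross sections for nitrogen and oxygen were processed with BOLSIG+; "
    "nitrogen dominated."
)
result = find_clues(sample)
print(result["gas"].most_common())
print(result["tool"].most_common())
```

Running the same function over every paper in a collection and summing the counters would give the kind of "most-used ingredient" ranking described above.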

What They Discovered

By letting the Digital Detective scan the literature, they found things a simple citation count would have missed:

  • The "Staple" Ingredients: They found that Nitrogen and Oxygen are the "salt and pepper" of this field—used constantly.
  • The "Power Tools": They saw a strong connection between the data in LXCat and the software tools used to process it, suggesting that LXCat isn't just a storage unit; it’s a working part of the "machinery" of science.
  • The Growing Kitchen: They saw that while the US has long been a leader, the "cooking" is becoming a global event, with more and more countries joining the community.
  • New Flavors: They used "Topic Modeling" (a way of grouping ideas) to see that LXCat isn't just used for one thing. It’s being used for everything from cleaning the environment to powering spacecraft.
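
One simple way to look for a link like the "Power Tools" connection is to count how often a database and a tool are mentioned in the same paper. The sketch below does this on made-up records. The placeholder names (`DatabaseA`, `ToolX`, and so on), the record layout, and the function `cooccurrence` are assumptions for illustration, not results or methods from the paper.

```python
from collections import Counter
from itertools import product

# Made-up example records: which databases and tools each paper mentions.
# All names are placeholders, not real findings.
papers = [
    {"databases": {"DatabaseA"}, "tools": {"ToolX"}},
    {"databases": {"DatabaseA", "DatabaseB"}, "tools": {"ToolX"}},
    {"databases": {"DatabaseB"}, "tools": {"ToolY"}},
]


def cooccurrence(records):
    """Count how many papers mention each (database, tool) pair together."""
    pairs = Counter()
    for record in records:
        # Every database in a paper is paired with every tool in that paper.
        for pair in product(sorted(record["databases"]), sorted(record["tools"])):
            pairs[pair] += 1
    return pairs


pair_counts = cooccurrence(papers)
for (database, tool), n in pair_counts.most_common():
    print(f"{database} + {tool}: {n} paper(s)")
```

Pairs with high counts point to data and software that tend to be used together, which is a signal a plain citation count cannot provide.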

Why This Matters

This paper isn't just about plasma; it's a blueprint for a new way to measure impact.

The researchers have created a "measuring tape" that can be used for any scientific resource. Whether it’s a database of human genes or a map of the stars, we can now use this NLP "detective" to see if these resources are actually fueling real progress or if they are just sitting on the shelf gathering digital dust.

In short: They moved from counting "likes" to measuring "actual usage," showing that LXCat is an essential pantry that helps keep the kitchen of plasma science running.
