A New Modeling to Feature Selection Based on the Fuzzy Rough Set Theory in Normal and Optimistic States on Hybrid Information Systems

Imagine you are trying to teach a robot how to diagnose a patient. You hand the robot a massive medical file containing thousands of details: blood pressure, cholesterol levels, the color of their eyes, their favorite ice cream flavor, the brand of shoes they wear, and whether they have a fever.

The Problem: Too Much Noise
The robot gets confused. It doesn't know that "favorite ice cream" has nothing to do with a fever, but "blood pressure" does. This is a typical Big Data problem: the data is huge, fast, and messy. If you feed the robot all this information, it takes forever to learn, and it often makes mistakes because it's distracted by the irrelevant details (like the ice cream).

This is where Feature Selection comes in. It's like a detective sifting through a pile of clues to find the one or two that actually solve the case, throwing away the rest.

The Old Way: The "Intersection" Trap
For a long time, scientists used a method called Fuzzy Rough Set Theory to find these clues. Think of this method as trying to find the "common ground" between two people.

Old Method: To see if two patients are similar, the old method looked at every single attribute they shared. If Patient A and Patient B both had high blood pressure AND liked vanilla ice cream AND wore red shoes, they were considered "similar."
The Flaw: In a world with thousands of attributes, this is like trying to find two people who share every single detail in the universe. It's nearly impossible. The math gets so heavy and slow that the computer chokes. Also, if there's a tiny bit of "noise" (a typo in the data), the whole calculation breaks down, making the robot confused.

The New Solution: FSbuHD (The "Distance" Detective)
The authors of this paper, Safarpour, Alavi, and their team, invented a new model called FSbuHD. Instead of looking for common ground (intersection), they decided to measure distance.

Here is the analogy:
Imagine you are in a crowded room with people speaking different languages (some speak English, some French, some use sign language, some use emojis). This is a Hybrid Information System.

The Old Way: You tried to find the perfect match by checking if everyone spoke the exact same words.
The New Way (FSbuHD): You simply measure how far apart two people are standing.
- If two people are standing right next to each other, they are "similar."
- If they are on opposite sides of the room, they are "different."
- The magic of FSbuHD is that it has a special ruler that can measure distance between any type of person, regardless of whether they are speaking, signing, or using emojis. It converts all these different "languages" into a single distance number.

How It Works: The Two Modes
The model works in two "moods" or states, depending on how strict the detective wants to be:

Normal State: The detective is cautious. They only group people together if they are very close to each other.
Optimistic State: The detective is hopeful. They group people together even if they are a little further apart, just in case they are related.

The Optimization: The Black Hole
Once they have measured the distances, they have a huge puzzle: "Which specific clues (features) should we keep to make the robot smart, without keeping the junk?"
To solve this, they used a Black Hole Algorithm.

The Analogy: Imagine a swarm of stars (potential solutions) floating in space. The "Black Hole" is the best solution found so far, and the other stars are pulled toward it. If a star drifts too close, it gets "swallowed" and a fresh star appears at a random spot, which keeps the search from getting stuck in one corner of space. If a star ever finds a better position than the Black Hole, it becomes the new Black Hole. The stars keep moving and adjusting until the search settles on a good orbit: a small, useful set of features.

The Results
The team tested this new detective (FSbuHD) on eight different datasets from the UCI repository (a giant library of real-world data, like heart disease records and credit card applications).

They compared it to other famous detectives (algorithms).
The Verdict: FSbuHD kept fewer features, which made the downstream work lighter, and the robot (the classifier) was as accurate as before or more accurate. It was like finding the needle in the haystack without burning the whole barn down.

In Summary
This paper is about a smarter, faster way to clean up messy data. Instead of getting stuck trying to find perfect matches in a chaotic world, the new method measures how "far apart" things are. It handles mixed-up data types (numbers, words, yes/no) with a single distance measure and uses a cosmic "Black Hole" search to look for a small, high-quality set of clues for making decisions. It's a useful upgrade for anyone trying to make sense of the data explosion we live in today.

Here is a detailed technical summary of the paper "A New Modeling to Feature Selection Based on the Fuzzy Rough Set Theory in Normal and Optimistic States on Hybrid Information Systems."

1. Problem Statement

The paper addresses critical challenges in Feature Selection (FS) for Big Data and Hybrid Information Systems (HIS). HIS contain diverse attribute types (real-valued, categorical, set-valued, boolean, and linguistic variables). Traditional feature selection methods based on Fuzzy Rough Set (FRS) theory face two major limitations in high-dimensional spaces:

Computational Inefficiency: Calculating fuzzy equivalence relations via intersection operations on high-dimensional data is time-consuming and memory-intensive.
Noise Sensitivity: The intersection-based approach often leads to noisy data and inaccurate membership degrees, degrading the discrimination capability of the similarity relations. This results in suboptimal feature subsets that fail to preserve the essential characteristics of the data.

2. Methodology: The FSbuHD Model

The authors propose a novel feature selection model named FSbuHD (Feature Selection based on Hybrid Distance). The methodology transforms the feature selection problem into a constrained optimization problem solved via meta-heuristics.

A. Hybrid Distance Measure

Instead of relying on traditional intersection operations to define similarity, the model calculates the Hybrid Distance (HD) between objects. This distance metric handles mixed attribute types by defining specific distance functions for each type:

Boolean: Simple equality check (0 or 1).
Real-valued: Normalized Euclidean distance using standard deviation.
Set-valued: Based on the intersection cardinality of sets.
Linguistic Variables: Converted into Trapezoidal Fuzzy Numbers, defuzzified using the Centroid Method, and then treated as real-valued distances.
The total HD is the square root of the sum of squared distances across all attributes.
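
The per-type distances above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the summary does not give the exact formulas, so the Jaccard-style set distance, the division by the attribute's standard deviation, and all function and parameter names (`hybrid_distance`, `kinds`, `stds`) are assumptions made for the example. The centroid formula for a trapezoidal fuzzy number $(a, b, c, d)$ is the standard one.

```python
import math

def trapezoid_centroid(a, b, c, d):
    """x-coordinate of the centroid of a trapezoidal fuzzy number (a <= b <= c <= d)."""
    denom = 3.0 * (d + c - a - b)
    if denom == 0:  # degenerate case: all four points coincide (a crisp number)
        return float(a)
    return (d * d + c * c + c * d - a * a - b * b - a * b) / denom

def boolean_distance(u, v):
    # Equality check: 0 if the values agree, 1 otherwise.
    return 0.0 if u == v else 1.0

def real_distance(u, v, std):
    # Assumption: absolute difference scaled by the attribute's standard deviation.
    return abs(u - v) / std if std > 0 else 0.0

def set_distance(u, v):
    # Assumption: 1 - |u & v| / |u | v| (a Jaccard-style distance built on the
    # intersection cardinality; the paper's exact form may differ).
    union = u | v
    return 0.0 if not union else 1.0 - len(u & v) / len(union)

def hybrid_distance(x, y, kinds, stds):
    """Square root of the sum of squared per-attribute distances."""
    total = 0.0
    for k, kind in enumerate(kinds):
        if kind == "bool":
            d = boolean_distance(x[k], y[k])
        elif kind == "real":
            d = real_distance(x[k], y[k], stds[k])
        elif kind == "set":
            d = set_distance(x[k], y[k])
        elif kind == "ling":
            # Linguistic value given as a trapezoid; defuzzify, then treat as real.
            d = real_distance(trapezoid_centroid(*x[k]),
                              trapezoid_centroid(*y[k]), stds[k])
        else:
            raise ValueError(f"unknown attribute kind: {kind}")
        total += d * d
    return math.sqrt(total)

# Two hypothetical objects with one attribute of each kind.
kinds = ["bool", "real", "set", "ling"]
stds = [None, 2.0, None, 1.0]  # only used for "real" and "ling" attributes
x = (True, 1.0, {"a", "b"}, (0, 1, 2, 3))
y = (False, 3.0, {"b", "c"}, (2, 3, 4, 5))
hd_xy = hybrid_distance(x, y, kinds, stds)
```

Keeping one small function per attribute type makes it easy to swap in the paper's exact definitions once they are known, without touching the aggregation step.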

B. Fuzzy Equivalence Relations

Using the Hybrid Distance, the model constructs a Gaussian Kernel to generate a fuzzy similarity relation ($R_G$):
$R_G(x_i, x_j) = \exp\left(-\frac{HD(x_i, x_j)^2}{2\sigma^2}\right)$
The authors prove that this relation satisfies $T_p$-transitivity (using the product t-norm), making it a valid fuzzy equivalence relation.
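
As a small illustration of the kernel formula, the sketch below turns a matrix of pairwise hybrid distances into the relation matrix $R_G$. The function name `gaussian_relation` and the example distances are made up for the illustration; only the formula itself comes from the text.

```python
import math

def gaussian_relation(dist, sigma):
    """Map a square matrix of pairwise distances to Gaussian-kernel similarities."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return [[math.exp(-(d * d) / (2.0 * sigma * sigma)) for d in row]
            for row in dist]

# Hypothetical pairwise hybrid distances between three objects.
dist = [
    [0.0, 0.5, 2.0],
    [0.5, 0.0, 1.5],
    [2.0, 1.5, 0.0],
]
R = gaussian_relation(dist, sigma=1.0)
```

Because the distance matrix is symmetric with a zero diagonal, the resulting relation is symmetric and reflexive (every object has similarity 1 with itself), and a larger $\sigma$ makes distant objects look more similar.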

C. Optimization Formulation

The feature selection problem is reformulated as a binary optimization problem:

Objective: Minimize the number of selected features ($\sum_k \chi_k$).
Constraints: For any pair of objects $(x_i, x_j)$ belonging to different decision classes, the similarity after feature reduction must remain below a threshold $\delta$ (to ensure classes remain distinguishable).
- Constraint: $\exp\left(-\frac{\sum_k \chi_k\, d_k(x_i, x_j)^2}{2\sigma^2}\right) \leq \delta$, where $d_k$ is the distance on attribute $k$.
Decision Variables: $\chi_k \in \{0, 1\}$, where 1 indicates the feature is selected.
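
To make the formulation concrete, here is a minimal sketch that checks the constraint for a candidate mask $\chi$ and, for a toy problem only, enumerates all masks to find the smallest feasible one. The helper names (`pairwise_sq_dist`, `is_feasible`, `smallest_feasible_subset`) and the data are hypothetical, and the exhaustive search is a stand-in for illustration; it is exponential in the number of features, which is why the paper uses a meta-heuristic instead.

```python
import math
from itertools import combinations, product

def pairwise_sq_dist(X):
    """Squared per-attribute distances d_k(x_i, x_j)^2 for every pair i < j.
    Assumes attributes are already real-valued and normalized."""
    return {(i, j): [(a - b) ** 2 for a, b in zip(X[i], X[j])]
            for i, j in combinations(range(len(X)), 2)}

def is_feasible(chi, sq_dist, labels, sigma, delta):
    """True if every pair from different classes has similarity <= delta
    when only the features with chi_k = 1 are used."""
    for (i, j), d2 in sq_dist.items():
        if labels[i] == labels[j]:
            continue  # the constraint only applies across decision classes
        s = sum(c * d for c, d in zip(chi, d2))
        if math.exp(-s / (2.0 * sigma ** 2)) > delta:
            return False
    return True

def smallest_feasible_subset(sq_dist, labels, n_features, sigma, delta):
    """Brute-force search over all 2^n masks (toy sizes only)."""
    feasible = [list(chi) for chi in product([0, 1], repeat=n_features)
                if is_feasible(chi, sq_dist, labels, sigma, delta)]
    return min(feasible, key=sum) if feasible else None

# Toy data: feature 0 separates the classes, feature 1 barely, feature 2 not at all.
X = [[0.0, 0.0, 1.0], [0.1, 1.0, 1.0], [2.0, 0.5, 1.0]]
labels = [0, 0, 1]
sq = pairwise_sq_dist(X)
best = smallest_feasible_subset(sq, labels, n_features=3, sigma=1.0, delta=0.5)
```

In this toy case only masks that include feature 0 keep the two classes apart, so the search returns the single-feature mask `[1, 0, 0]`.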

D. Two Operational States

The model operates in two modes based on the decision-maker's preference regarding uncertainty:

Normal State: Uses the Fuzzy Lower Approximation ($\underline{R}$), representing a conservative approach where objects must certainly belong to a class.
Optimistic State: Uses the Fuzzy Upper Approximation ($\overline{R}$), representing a more permissive approach where objects possibly belong to a class.
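
The difference between the two states can be illustrated with the textbook fuzzy rough approximations. The sketch below assumes the common min/max forms, $\underline{R}A(x) = \min_y \max(1 - R(x, y), A(y))$ and $\overline{R}A(x) = \max_y \min(R(x, y), A(y))$; the paper may use other implicators or t-norms, and the relation matrix and class membership here are invented for the example.

```python
def lower_approximation(R, member):
    """Degree to which each object certainly belongs to the class (Normal state)."""
    n = len(R)
    return [min(max(1.0 - R[x][y], member[y]) for y in range(n))
            for x in range(n)]

def upper_approximation(R, member):
    """Degree to which each object possibly belongs to the class (Optimistic state)."""
    n = len(R)
    return [max(min(R[x][y], member[y]) for y in range(n))
            for x in range(n)]

# Hypothetical similarity relation over three objects and a crisp class {x0, x1}.
R = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.3],
    [0.1, 0.3, 1.0],
]
member = [1.0, 1.0, 0.0]
low = lower_approximation(R, member)   # roughly [0.9, 0.7, 0.0]
up = upper_approximation(R, member)    # [1.0, 1.0, 0.3]
```

Object x1 sits closer to the outsider x2 than x0 does, so its certain membership drops further, while x2 still gets a small possible membership in the optimistic view.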

E. Solution Algorithm

The resulting NP-hard optimization problem is solved using the Black Hole (BH) meta-heuristic algorithm, which simulates the gravitational pull of black holes to evolve a population of candidate solutions toward the optimal feature subset.
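
A rough sketch of how a Black Hole search could be applied to binary feature masks follows. It uses the basic scheme of the algorithm (stars move toward the best star, and a star that crosses the event horizon is replaced by a random one), but the binarization by a 0.5 threshold, the penalty-style toy fitness, and every name and default value in the code are assumptions for illustration, not details taken from the paper.

```python
import math
import random

def to_mask(position):
    # Assumption: a continuous position in [0, 1]^n is binarized at 0.5.
    return [1 if p > 0.5 else 0 for p in position]

def black_hole_search(fitness, n_features, n_stars=20, n_iter=100, seed=0):
    """Minimize fitness(mask) over binary masks with a basic Black Hole scheme."""
    rng = random.Random(seed)

    def new_star():
        return [rng.random() for _ in range(n_features)]

    stars = [new_star() for _ in range(n_stars)]
    scores = [fitness(to_mask(s)) for s in stars]
    best_mask, best_score = None, math.inf

    for _ in range(n_iter):
        # The black hole is the star with the best (lowest) score so far.
        bh = min(range(n_stars), key=scores.__getitem__)
        if scores[bh] < best_score:
            best_mask, best_score = to_mask(stars[bh]), scores[bh]
        total = sum(scores)
        radius = scores[bh] / total if total > 0 else 0.0  # event horizon
        for i in range(n_stars):
            if i == bh:
                continue
            # Pull the star a random fraction of the way toward the black hole.
            stars[i] = [x + rng.random() * (b - x)
                        for x, b in zip(stars[i], stars[bh])]
            # A star inside the event horizon is swallowed and replaced.
            if math.dist(stars[i], stars[bh]) < radius:
                stars[i] = new_star()
            scores[i] = fitness(to_mask(stars[i]))

    bh = min(range(n_stars), key=scores.__getitem__)
    if scores[bh] < best_score:
        best_mask, best_score = to_mask(stars[bh]), scores[bh]
    return best_mask, best_score

def toy_fitness(mask):
    # Hypothetical objective: count selected features, with a large penalty
    # when feature 0 (the only informative one in this toy) is dropped.
    return sum(mask) + (10 if mask[0] == 0 else 0)

mask, score = black_hole_search(toy_fitness, n_features=5)
```

In a real run the penalty term would come from the separability constraint rather than a hard-coded rule, and, as with any meta-heuristic, the returned mask is a good candidate rather than a guaranteed optimum.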

3. Key Contributions

Novel Similarity Definition: Replaces the computationally expensive and noise-prone intersection of fuzzy relations with a Hybrid Distance-based Gaussian Kernel, effectively handling mixed data types without discretization.
Optimization Framework: Reformulates feature selection as a constrained optimization problem, allowing the use of meta-heuristics that search the solution space globally instead of committing to greedy, locally optimal choices (although a meta-heuristic does not guarantee a global optimum).
Dual-Mode Operation: Introduces "Normal" and "Optimistic" states, providing flexibility for different risk tolerances in decision-making systems.
Handling Linguistic Variables: Provides a robust mechanism for integrating linguistic variables (via trapezoidal fuzzy numbers and defuzzification) into the distance metric.

4. Experimental Results

The model was evaluated on 8 datasets from the UCI Machine Learning Repository (including Hybrid, Numerical, and Binary/Multi-class datasets).

Feature Reduction: FSbuHD consistently selected fewer features than competing algorithms (FARNeM, WARA, CfsSubsetEval, RSFSAID).
- Example: On the wpbc dataset, FSbuHD (Optimistic) selected only 5 features compared to 12-13 by other methods.
- Example: On the australian dataset, it selected 4 features (Optimistic) vs. 6-14 by others.
Classification Performance: The selected subsets were tested using Linear SVM, KNN, and Complex Tree classifiers.
- Accuracy: FSbuHD achieved accuracy comparable to or higher than the original full dataset and other feature selection methods.
- Robustness: In terms of Precision, Recall, and Matthews Correlation Coefficient (MCC), FSbuHD frequently outperformed or matched the best results among the compared algorithms, particularly in the Optimistic state.
Efficiency: The reduction in dimensionality significantly lowered the computational load for subsequent classification tasks without sacrificing predictive power.

5. Significance

This research offers a significant advancement in Big Data preprocessing and Hybrid Information Systems. By moving away from rigid intersection-based fuzzy rough sets to a distance-based optimization framework, the FSbuHD model:

Mitigates the Curse of Dimensionality: Effectively reduces high-dimensional data while preserving class separability.
Enhances Interpretability: Unlike feature extraction (e.g., PCA), feature selection retains original features, maintaining physical interpretability.
Adaptability: The ability to switch between "Normal" and "Optimistic" states allows the model to be tailored to specific application requirements (e.g., medical diagnosis vs. financial risk).
Scalability: The use of meta-heuristics makes the approach viable for complex, large-scale datasets where exact solutions are intractable.

In conclusion, the paper demonstrates that FSbuHD is a superior, efficient, and flexible method for feature selection in complex, heterogeneous data environments.
