Understanding the 73.8% Score: The SysWisdom.AI Wisdom Formula Applied to Civic Data

Aaron

Data quality plays a crucial role in civic technology, especially for election data and voter information. One challenge in this field is keeping datasets accurate, consistent, and meaningful across diverse geographic regions. A recent example comes from the SysWisdom.AI Data Quality API, which evaluates datasets using a framework called the Wisdom Formula. This framework measures data quality across three dimensions: completeness, consistency, and validity. A dataset of presidential election projections scored 73.8%, a number that might seem low at first glance but reflects a realistic and honest assessment of the data’s nature.

This post explores what the 73.8% score means, why it matters, and how the Wisdom Formula applies to civic data validation. We will also discuss the implications for data contributors and users who rely on this information for decision-making and analysis.

The Wisdom Formula and Its Three Dimensions

The Wisdom Formula is an AI data-scoring validation framework designed to assess datasets on three key dimensions:

Completeness: Measures whether all expected data points are present.
Consistency: Checks if the data follows logical rules and does not contain contradictions.
Validity: Evaluates whether the data values fall within reasonable and expected ranges.
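As one illustration of a dimension score, here is a minimal sketch of a completeness check. It assumes completeness means the share of non-empty cells in a CSV file; the function name `completeness`, the sample column names, and the deliberately blank cell are hypothetical and not taken from the SysWisdom.AI code.

```python
import csv
import io


def completeness(csv_text):
    """Return the percentage of non-empty data cells (header row excluded).

    Assumption: a cell counts as present when it holds any non-blank text.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if not header:
        return 0.0
    total = 0
    filled = 0
    for row in reader:
        if not row:
            continue  # csv.reader yields an empty list for blank lines
        for i in range(len(header)):
            total += 1
            # A short row counts as missing its trailing cells.
            if i < len(row) and row[i].strip():
                filled += 1
    return 100.0 * filled / total if total else 0.0


# Hypothetical two-row sample; the second row is missing its ballot count.
sample = "county,state,projected_ballots\nHarris,TX,1700000\nGlacier,MT,\n"
print(round(completeness(sample), 1))
```

Five of the six sample cells are filled, so the sketch reports a completeness below 100% for this sample.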

Each dimension receives a score, and these combine to form an overall quality score. The goal is to ensure datasets are “Mostly Wise” before they are accepted or merged into larger projects.
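The post does not state how the three dimension scores are combined, so the following is only a sketch that assumes a weighted average. The function name `overall_score` and the equal default weights are hypothetical. With the dimension scores reported for this dataset (100, 100, and 25), equal weights give 75.0 rather than 73.8, which suggests the actual formula uses different weights or unrounded dimension scores.

```python
def overall_score(completeness, consistency, validity,
                  weights=(1 / 3, 1 / 3, 1 / 3)):
    """Combine three dimension scores (0-100) into a weighted average.

    The equal weights are an assumption for illustration only.
    """
    scores = (completeness, consistency, validity)
    if any(not 0 <= s <= 100 for s in scores):
        raise ValueError("each dimension score must be between 0 and 100")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(s * w for s, w in zip(scores, weights)) / total_weight


equal_weight_result = overall_score(100, 100, 25)
print(round(equal_weight_result, 1))
```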

Breaking Down the 73.8% Score

The dataset in question, `prediction_pres_data.csv`, contains 38 rows and 11 columns representing projected vote totals for various counties across the United States. Here is how the scores break down:

Completeness: 100%

Every expected data point is present. No missing values exist in the dataset.

Consistency: 100%

The data follows all logical rules. There are no contradictions or formatting errors.

Validity: 25%

This low score is due to the detection of statistical outliers in vote projections. For example, Harris County, Texas, projects 1.7 million ballots, while Glacier County, Montana, projects only 5,370. This roughly 300-fold difference (1,700,000 / 5,370 ≈ 317) triggers outlier flags in five columns.

Despite the low validity score, these outliers are not errors. They reflect real geographic diversity in population size and voter turnout across counties. Large urban counties naturally have higher vote totals than small rural counties. By flagging these differences rather than suppressing them, the Wisdom Formula avoids artificially inflating the score.
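The source does not say which outlier test the validity check uses. As a rough illustration, here is a sketch using the common 1.5 × IQR rule, which is an assumption and not necessarily the rule applied by the SysWisdom.AI API. Only the Harris County and Glacier County figures come from this post; the other ballot counts are made-up placeholders, and `iqr_outliers` is a hypothetical helper.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low = q1 - k * iqr
    high = q3 + k * iqr
    return [v for v in values if v < low or v > high]


projected_ballots = [
    5_370,       # Glacier County, Montana (figure from the post)
    12_000, 18_500, 25_000, 40_000, 61_000, 95_000,  # placeholder values
    1_700_000,   # Harris County, Texas (figure from the post)
]

flagged = iqr_outliers(projected_ballots)
ratio = 1_700_000 / 5_370

print(flagged)
print(round(ratio))
```

In this sample only the Harris County figure is flagged, even though it is a legitimate value. That is the situation the validity score describes: a statistical flag, not a data error.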

Figure: Map illustrating the voter projections across counties in the United States.

Why Validity Is Not Always About Error

Many data validation systems treat outliers as errors or anomalies to be corrected or removed. In civic data, however, outliers can represent genuine differences in population and voting behavior. The Wisdom Formula recognizes this by scoring validity honestly rather than inflating it to hide these natural variations.

This approach has several benefits:

Transparency: Users understand the true nature of the data, including its geographic diversity.
Trust: Honest scores build confidence that the data has not been manipulated to appear better than it is.
Quality Control: Contributors who accidentally introduce errors that reduce completeness or consistency will see the overall score drop, and a score below the gate threshold triggers an automatic block in GitHub Actions.

The 70% Threshold and Its Role in Data Quality Control

The SysWisdom.AI framework uses a 70% threshold as a gatekeeper in its GitHub Actions workflow. This means:

Datasets scoring below 70% are considered not “Mostly Wise” and cannot be merged.
Contributors who accidentally delete data or introduce inconsistent values can push the score below this threshold, which prevents flawed data from entering the system. The current dataset has only a 3.8-point margin above the gate.
The current dataset’s score of 73.8% passes this gate, reflecting that it meets the minimum quality standards despite the natural outliers.

This threshold balances strict quality control with realistic acceptance of geographic diversity in civic data.
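A minimal sketch of how such a gate could be expressed, assuming the overall score has already been computed. The names `passes_gate` and `gate_exit_code` are hypothetical and this is not the project's actual workflow code. Because the post says scores below 70% are blocked, a score of exactly 70% is treated as passing here.

```python
THRESHOLD = 70.0  # gate value stated in the post


def passes_gate(score, threshold=THRESHOLD):
    """Return True when the score meets or exceeds the threshold."""
    return score >= threshold


def gate_exit_code(score, threshold=THRESHOLD):
    """Map a score to a process exit code: 0 to allow, 1 to block."""
    return 0 if passes_gate(score, threshold) else 1


print(gate_exit_code(73.8))  # the dataset discussed in this post
print(gate_exit_code(69.9))  # a hypothetical score just under the gate
```

A CI script could pass the returned value to `sys.exit`, since GitHub Actions marks a step as failed when its command exits with a non-zero code.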

Practical Implications for Civic Data Projects

Understanding the 73.8% score and the Wisdom Formula’s approach has important implications for anyone working with civic data:

Data Contributors should focus on maintaining completeness and consistency. They should also recognize that some outliers are valid and not errors.
Data Users should interpret validity scores carefully. A low validity score does not always mean the data is flawed; it may reflect real-world diversity.
Project Managers can use the 70% threshold to enforce quality gates, ensuring only datasets that meet minimum standards are merged.

This framework encourages a culture of honesty and transparency in civic data projects, which is essential for public trust and effective decision-making.

Examples of Geographic Diversity Impacting Validity

To illustrate why validity scores can be low despite good data, consider these examples:

Harris County, Texas: A large urban county with a population exceeding 4 million. Its projection of about 1.7 million ballots is in line with its size.
Glacier County, Montana: A rural county with a population under 15,000. It projects 5,370 ballots.

The roughly 300× difference in vote totals is expected and reflects real demographic differences. The Wisdom Formula flags it as a statistical outlier but does not treat it as an error.

How the Wisdom Formula Supports Honest Data Quality Assessment

The Wisdom Formula’s design supports honest data quality assessment by:

Avoiding inflated scores that hide real data characteristics.
Providing clear feedback on which dimensions need attention.
Enabling automated quality gates that prevent accidental data corruption.
Encouraging contributors to understand the nature of their data rather than just chasing high scores.

This approach is especially valuable in civic data, where accuracy and transparency are critical.

Final Thoughts on Civic Data Validation and the 73.8% Score

The 73.8% score in the SysWisdom.AI civic data validation framework reflects a realistic and honest assessment of a dataset’s quality. It shows that the data is complete and consistent but contains valid geographic outliers that reduce the validity score. This honest scoring helps maintain transparency and trust in civic data projects.

For anyone working with election or voter data, understanding this score and the Wisdom Formula’s approach is essential. It encourages a balanced view of data quality that respects real-world diversity while enforcing strict controls against errors.

The next step for contributors and users is to embrace this framework, use the feedback to improve data quality where possible, and appreciate the value of honest, transparent scoring in civic data validation.

Resources

Open GitHub repo with a working Wisdom Formula: https://github.com/sysWisdom/myvoterwisdom
Front-end view of My Voter Wisdom: https://myvoter.syswisdom.ai
Google Colab direct access to the data: https://colab.research.google.com/github/sysWisdom/myvoterwisdom

Written By:

Aaron McCormack

Founder / CTO SysWisdom.AI

#datascience #AI #AIQuality #Quality