Software & Tools

⚙️ Software, Tools & Resources

Committed to open science: building computational tools, fine-tuned models, multilingual databases, and specialized corpora for the research community. All resources are freely available for academic use.

160K+ HuggingFace Downloads

30+ Open-Source Tools

15+ Corpora & Databases

28+ Languages Supported

11+ Discourse Languages

🤖 Fine-tuned Large Language Models

🧠

Personality Detection LLMs ⬇ 160K+ downloads

Fine-tuned language models for detecting Big Five personality traits from text. State-of-the-art accuracy on multiple benchmarks.

PyTorch · Transformers · BERT/RoBERTa · Psychology · HR · Social Media

🔗 HuggingFace Hub →

💚

Mental Health Detection Models

Specialized LLMs for detecting mental health indicators from text and social media posts. Designed for clinical and research applications.

Deep Learning · Text Classification · Clinical · Research

🔗 HuggingFace Hub →

📝

IELTS & L2 Essay Grading Models ⬇ 120K+ downloads

Automated multi-dimensional essay scoring system for English language learners, covering grammar, coherence, vocabulary, and task response.

Fine-tuned LLMs · Multi-dim Scoring · Education · IELTS · L2

🔗 HuggingFace Hub →

✍️

English Writing Diagnostic LLMs

Generative models for providing detailed, multi-aspect feedback on English writing. Built for IELTS preparation and ESL education contexts.

Generative LLMs · Educational NLP · IELTS · ESL

🔧 Discourse Analysis Tools

🔄

Automatic Discourse Dependency Converter

Converts discourse dependency structures from various corpus formats. Supports cross-framework conversion between RST, PDTB, and dependency representations.

Python · NLP Libraries · Discourse Parsing · Multilingual
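As an illustration of what an RST-to-dependency conversion can look like, here is a minimal sketch of the common nucleus-percolation approach: the head of a span is the head of its leftmost nucleus, and every other child attaches to that head. It is not the converter's actual code. The `RSTNode` class, the `head_edu` and `to_dependencies` names, and the convention that EDUs are numbered from 1 with 0 as an artificial root are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RSTNode:
    """Node of a simplified RST tree (hypothetical structure for this sketch)."""
    edu: Optional[int] = None   # EDU index (1-based) for leaves, None for internal nodes
    nuclearity: str = "N"       # "N" (nucleus) or "S" (satellite) relative to the parent
    relation: str = "span"      # relation label attached to this child
    children: list["RSTNode"] = field(default_factory=list)


def head_edu(node: RSTNode) -> int:
    """Follow the leftmost nucleus down to a leaf and return its EDU index.

    Assumes every internal node has at least one nucleus child.
    """
    while node.edu is None:
        node = next(c for c in node.children if c.nuclearity == "N")
    return node.edu


def to_dependencies(root: RSTNode) -> list[tuple[int, int, str]]:
    """Return (head, dependent, relation) arcs; the root EDU is attached to 0."""
    arcs = [(0, head_edu(root), "ROOT")]
    stack = [root]
    while stack:
        node = stack.pop()
        if node.edu is not None:
            continue  # leaves contribute no further arcs
        head = head_edu(node)
        for child in node.children:
            child_head = head_edu(child)
            if child_head != head:
                # Satellites (and non-leftmost nuclei) depend on the span's head.
                arcs.append((head, child_head, child.relation))
            stack.append(child)
    return sorted(arcs, key=lambda arc: arc[1])


# Toy tree: [[EDU1 <- EDU2 (elaboration)] <- EDU3 (cause)]
tree = RSTNode(children=[
    RSTNode(nuclearity="N", children=[
        RSTNode(edu=1, nuclearity="N"),
        RSTNode(edu=2, nuclearity="S", relation="elaboration"),
    ]),
    RSTNode(edu=3, nuclearity="S", relation="cause"),
])
print(to_dependencies(tree))
# [(0, 1, 'ROOT'), (1, 2, 'elaboration'), (1, 3, 'cause')]
```

Multinuclear relations are handled here by treating the leftmost nucleus as the head, which is one of several conventions in the literature; a real converter would also need readers for the individual corpus formats.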

📐

Discourse Complexity Analyzer

Measures discourse complexity at both syntactic and discourse levels. Used for automated text complexity assessment and readability analysis.

Python · Statistical Modeling · Text Complexity
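One widely used complexity measure that applies at both the syntactic and the discourse level is mean dependency distance: the average number of positions between a head and its dependent. The sketch below is only an example of that kind of measure; the page does not say which metrics the analyzer computes. The function name and the `(head, dependent)` arc format, with 0 marking the artificial root, are assumptions.

```python
def mean_dependency_distance(arcs: list[tuple[int, int]]) -> float:
    """Average |head - dependent| over all arcs, ignoring root arcs (head == 0).

    Units are words for syntactic trees or EDUs for discourse trees.
    """
    distances = [abs(head - dep) for head, dep in arcs if head != 0]
    return sum(distances) / len(distances) if distances else 0.0


# Four units: unit 1 is the root; 2 and 3 depend on 1; 4 depends on 3.
arcs = [(0, 1), (1, 2), (1, 3), (3, 4)]
print(round(mean_dependency_distance(arcs), 3))  # 1.333
```

Longer average distances are usually read as a sign of greater processing load, so the same function can be applied to sentence-level and discourse-level arcs to compare the two.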

🏷️

Unified PDTB & RST Annotation Tool

Mining tool for extracting and aligning discourse relations from Penn Discourse TreeBank (PDTB) and Rhetorical Structure Theory (RST) corpora.

Python · XML Processing · Discourse Relations

🕸️

Discourse Network Visualization Toolkit

Interactive visualization for discourse networks and structural relationships. Generates network graphs and structural analyses of discourse.

Python · D3.js · NetworkX · Visual Analysis

📏 Computational Linguistics Metrics

👁️

Attention-Aware Computational Measures Toolkit · Published in Cognition & Linguistics

Advanced toolkit for computing attention-aware semantic relevance metrics. Predicts human reading behavior and cognitive processing across multiple languages.

Python · PyTorch · Transformers · Eye-tracking · Reading · Cognition

🌍

Enhanced Computational Measures Toolkit

Comprehensive suite of enhanced computational measures for multilingual analysis, covering semantic similarity, information-theoretic metrics, and surprisal.

Python · Advanced NLP · Cross-linguistic Analysis
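For readers unfamiliar with the measures named above, the standard textbook definitions can be sketched in a few lines: surprisal is the negative log probability of an item, entropy is the expected surprisal of a distribution, and cosine similarity compares two embedding vectors. This is not the toolkit's code. The function names are assumptions, and in practice the probabilities and vectors would come from a language model rather than being typed in by hand.

```python
import math
from typing import Sequence


def surprisal(prob: float) -> float:
    """Surprisal in bits: log2(1 / p). Rare events carry more information."""
    if not 0.0 < prob <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return math.log2(1.0 / prob)


def entropy(dist: Sequence[float]) -> float:
    """Shannon entropy in bits: the expected surprisal of a distribution."""
    return sum(p * surprisal(p) for p in dist if p > 0.0)


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


print(surprisal(0.25))                    # 2.0
print(entropy([0.5, 0.5]))                # 1.0
print(cosine_similarity([1, 0], [0, 1]))  # 0.0
```

A word predicted with probability 0.25 therefore carries 2 bits of surprisal, and a fair two-way choice has 1 bit of entropy.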

🚀 Integrated Analysis Platforms

🔬

LLM-based Automatic Linguistic Analysis Software

End-to-end synthesis software for automatic linguistic analysis using large language models. Provides a comprehensive pipeline from data ingestion to analysis output.

LLMs · Multi-modal · API Integration · Research Pipeline

🌐

AI Agent for Literary & Scholarly Translation

AI translation agent for ZH-EN and EN-ZH literary and scholarly texts. Optimized for domain-specific vocabulary and stylistic nuance preservation.

LLMs · Agentic AI · Translation · DH

📖

Synthesis Pipeline for Philological Research

Automatic linguistic annotation pipeline applicable to philological research. Generates multi-layer annotations for historical and literary texts.

Python · LLMs · Philology · DH

📊 Databases

📅

Historical Discourse Connectives Database

Comprehensive database of discourse connective frequencies across multiple languages spanning 190 years (1820–2010).

Multiple Languages · 190-year Span · Historical Linguistics

🧪

Historical Psychosemantic Dimensions Database

Norms of historical psychosemantic dimensions in English for studying cognitive and semantic change over time.

English · Cognitive Linguistics · Semantic Change

🎵

Multilingual Onomatopoeia Sentiment Database

Sentiment properties of onomatopoeia across 28 languages for cross-linguistic sentiment and sound symbolism research.

28 Languages · Sentiment Analysis · Sound Symbolism

📚 Specialized Corpora

🔗

Multilingual Discourse Dependency Corpus

Balanced corpus with discourse dependency annotations across 11 languages for cross-linguistic discourse research.

11 Languages · Discourse Dependencies

🇨🇳

Chinese Textual "Run-on" Sentences Corpus (CCTRS) · ACL 2022

Specialized corpus with multi-layer annotations for Chinese run-on sentences, covering syntactic, semantic, and discourse layers.

Multi-layer Annotations · Chinese · Discourse

➖

English Hyphenated Compounds Corpus

Comprehensive corpus of English hyphenated compound words for morphological analysis and compound processing studies.

English · Morphology · Compounds

🏗️ Technical Stack

Languages

Python · R · LaTeX · Linux Shell · JavaScript · HTML

Frameworks

PyTorch · Transformers · Hugging Face · TensorFlow · scikit-learn · spaCy

Visualization

D3.js · Matplotlib · ggplot2 · Plotly · NetworkX

Data

SQL · NoSQL · Graph Databases · Pandas · Corpus Tools

Experimentation

Eye-tracker · EEG · fMRI · E-Prime · Online Experiments

Statistics

GAMM · Bayesian · Mixed-effects · Time-series · Causal Inference

🔗 Find My Work Online

💻 GitHub @fivehills · 🤗 Hugging Face @KevSun · 📂 OSF Open Science · 📊 Google Scholar Citations

"All software and datasets follow open science principles and are freely available for academic research."
Interested in collaboration, custom model fine-tuning, or dataset access? Feel free to reach out.

Kevin (Kun) Sun
