Software & Tools

βš™οΈ Software, Tools & Resources

Committed to open science: building computational tools, fine-tuned models, multilingual databases, and specialized corpora for the research community. All resources are freely available for academic use.

160K+ Hugging Face Downloads
30+ Open-Source Tools
15+ Corpora & Databases
28+ Languages Supported
11+ Discourse Languages
🤖 Fine-tuned Large Language Models
🧠
Personality Detection LLMs ⬇ 160K+ downloads
Fine-tuned language models for detecting Big Five personality traits from text. State-of-the-art accuracy on multiple benchmarks.
PyTorch · Transformers · BERT/RoBERTa · Psychology · HR · Social Media

🔗 Hugging Face Hub →

💚
Mental Health Detection Models
Specialized LLMs for detecting mental health indicators from text and social media posts. Designed for clinical and research applications.
Deep Learning · Text Classification · Clinical · Research

🔗 Hugging Face Hub →

📝
IELTS & L2 Essay Grading Models ⬇ 120K+ downloads
Automated multi-dimensional essay scoring system for English language learners, covering grammar, coherence, vocabulary, and task response.
Fine-tuned LLMs · Multi-dimensional Scoring · Education · IELTS · L2

🔗 Hugging Face Hub →

✍️
English Writing Diagnostic LLMs
Generative models for providing detailed, multi-aspect feedback on English writing. Built for IELTS preparation and ESL education contexts.
Generative LLMs · Educational NLP · IELTS · ESL
🔧 Discourse Analysis Tools
🔄
Automatic Discourse Dependency Converter
Converts discourse dependency structures from various corpus formats. Supports cross-framework conversion between RST, PDTB, and dependency representations.
Python · NLP Libraries · Discourse Parsing · Multilingual
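The converter's own code is not shown on this page. As an illustration only, the sketch below applies one common head-nucleus rule for turning a binarized RST tree into discourse dependencies: each subtree is headed by the head EDU of its leftmost nucleus, and every other child's head attaches to that head with the child's relation label. `RSTNode`, `head_edu`, and `rst_to_dependencies` are hypothetical names, not the tool's API.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class RSTNode:
    """Hypothetical RST tree node; leaves carry a 1-based EDU index."""
    nuclearity: str                  # "N" (nucleus) or "S" (satellite); the root uses "N"
    relation: str = "span"           # relation label attached to this node
    edu: Optional[int] = None        # set only on leaf nodes
    children: List["RSTNode"] = field(default_factory=list)


def head_edu(node: RSTNode) -> int:
    """A leaf heads itself; an internal node is headed by its leftmost nucleus."""
    if node.edu is not None:
        return node.edu
    nucleus = next(c for c in node.children if c.nuclearity == "N")
    return head_edu(nucleus)


def rst_to_dependencies(root: RSTNode) -> List[Tuple[int, int, str]]:
    """Return (head, dependent, relation) arcs; the tree's head EDU attaches to 0."""
    arcs = [(0, head_edu(root), "ROOT")]

    def visit(node: RSTNode) -> None:
        if node.edu is not None:
            return
        head = head_edu(node)
        for child in node.children:
            child_head = head_edu(child)
            if child_head != head:   # non-head children depend on the node's head
                arcs.append((head, child_head, child.relation))
            visit(child)

    visit(root)
    return sorted(arcs, key=lambda arc: arc[1])


# Toy tree: EDU2 elaborates EDU1, and EDU3 is a cause satellite of that span.
tree = RSTNode("N", children=[
    RSTNode("N", children=[RSTNode("N", edu=1),
                           RSTNode("S", "elaboration", edu=2)]),
    RSTNode("S", "cause", edu=3),
])
print(rst_to_dependencies(tree))
# [(0, 1, 'ROOT'), (1, 2, 'elaboration'), (1, 3, 'cause')]
```

Under this rule, later nuclei of a multinuclear relation depend on the first nucleus; other conventions exist, and a converter may choose differently.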
📐
Discourse Complexity Analyzer
Measures discourse complexity at both syntactic and discourse levels. Used for automated text complexity assessment and readability analysis.
Python · Statistical Modeling · Text Complexity
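The analyzer's exact measures are not listed here. Two structural measures commonly computed over dependency trees, and equally applicable to discourse dependencies between EDUs, are mean dependency distance and tree depth. The sketch below shows how each could be computed; the function names and the (head, dependent) arc format with 0 as the root are assumptions for illustration.

```python
from typing import Dict, List, Tuple

Arc = Tuple[int, int]  # (head EDU, dependent EDU); head 0 marks the artificial root


def mean_dependency_distance(arcs: List[Arc]) -> float:
    """Average linear distance |head - dependent| over all non-root arcs."""
    distances = [abs(head - dep) for head, dep in arcs if head != 0]
    return sum(distances) / len(distances) if distances else 0.0


def tree_depth(arcs: List[Arc]) -> int:
    """Number of arcs on the longest path from the root to any EDU."""
    parent: Dict[int, int] = {dep: head for head, dep in arcs}

    def depth(node: int) -> int:
        return 0 if node == 0 else 1 + depth(parent[node])

    return max(depth(dep) for dep in parent)


arcs = [(0, 1), (1, 2), (2, 3), (1, 4)]
print(mean_dependency_distance(arcs))  # (1 + 1 + 3) / 3
print(tree_depth(arcs))                # 3: root -> 1 -> 2 -> 3
```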
🏷️
Unified PDTB & RST Annotation Tool
Mining tool for extracting and aligning discourse relations from Penn Discourse TreeBank and Rhetorical Structure Theory corpora.
Python · XML Processing · Discourse Relations
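How the tool aligns the two frameworks is not described here. One simple strategy, sketched below under the assumption that each relation is reduced to the character span covering its arguments, pairs every PDTB relation with the RST relation whose span overlaps it most, provided the Jaccard overlap reaches a threshold. The function names, span representation, and the 0.5 threshold are all hypothetical.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int]          # half-open character offsets [start, end)
Relation = Tuple[str, Span]     # (label, span covering the relation's arguments)


def jaccard(a: Span, b: Span) -> float:
    """Length of the spans' intersection divided by the length of their union."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union else 0.0


def align(pdtb: List[Relation], rst: List[Relation],
          threshold: float = 0.5) -> List[Tuple[str, Optional[str]]]:
    """Pair each PDTB relation with its best-overlapping RST relation, or None."""
    pairs = []
    for p_label, p_span in pdtb:
        best_label, best_score = None, 0.0
        for r_label, r_span in rst:
            score = jaccard(p_span, r_span)
            if score >= threshold and score > best_score:
                best_label, best_score = r_label, score
        pairs.append((p_label, best_label))
    return pairs


pdtb = [("Contingency.Cause", (0, 40)), ("Expansion.Conjunction", (100, 140))]
rst = [("cause", (0, 42)), ("elaboration", (50, 90))]
print(align(pdtb, rst))
# [('Contingency.Cause', 'cause'), ('Expansion.Conjunction', None)]
```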
🕸️
Discourse Network Visualization Toolkit
Interactive visualization for discourse networks and structural relationships. Generates network graphs and structural analyses of discourse.
Python · D3.js · NetworkX · Visual Analysis
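The toolkit's internals are not shown here. As a minimal, standard-library sketch of the general idea, discourse relations stored as hypothetical (source, target, relation) triples can be converted into the `nodes`/`links` structure that D3 force-directed layouts conventionally consume, with each node's degree attached as a simple structural statistic. `to_node_link` is an illustrative name, not part of the toolkit.

```python
import json
from collections import Counter
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (source unit, target unit, relation label)


def to_node_link(triples: List[Triple]) -> Dict[str, list]:
    """Build a D3-style {"nodes": [...], "links": [...]} dict with node degrees."""
    degree = Counter()
    for source, target, _ in triples:
        degree[source] += 1
        degree[target] += 1
    return {
        "nodes": [{"id": n, "degree": degree[n]} for n in sorted(degree)],
        "links": [{"source": s, "target": t, "relation": r} for s, t, r in triples],
    }


graph = to_node_link([
    ("e1", "e2", "elaboration"),
    ("e1", "e3", "cause"),
    ("e3", "e4", "contrast"),
])
print(json.dumps(graph, indent=2))
```

On the JavaScript side, such a JSON file is typically bound with `d3.forceLink(links).id(d => d.id)` so that link endpoints resolve to node ids.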
📏 Computational Linguistics Metrics
👁️
Attention-Aware Computational Measures Toolkit · Published in Cognition & Linguistics
Advanced toolkit for computing attention-aware semantic relevance metrics, which predict human reading behavior and cognitive processing across multiple languages.
Python · PyTorch · Transformers · Eye-tracking · Reading · Cognition
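The published attention-aware formula is not reproduced on this page, so the sketch below only illustrates the general idea behind semantic relevance measures of this kind: a word's relevance is a weighted average of its cosine similarity to the preceding context words. The 1/distance weights are a hypothetical stand-in for attention weights, and the toy 2-d vectors replace real embeddings.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def weighted_relevance(vectors: List[Sequence[float]], i: int, window: int = 3) -> float:
    """Weighted mean cosine similarity of word i to up to `window` preceding words.

    The 1/distance weights are a hypothetical stand-in for attention weights."""
    total, weight_sum = 0.0, 0.0
    for distance in range(1, window + 1):
        j = i - distance
        if j < 0:
            break
        weight = 1.0 / distance
        total += weight * cosine(vectors[i], vectors[j])
        weight_sum += weight
    return total / weight_sum if weight_sum else 0.0


# Toy 2-d "embeddings" for a three-word context.
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
print(weighted_relevance(vectors, 2))  # (1.0 * 0.0 + 0.5 * 1.0) / 1.5
```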
🌍
Enhanced Computational Measures Toolkit
Comprehensive suite of enhanced computational measures for multilingual analysis, covering semantic similarity and information-theoretic metrics such as surprisal.
Python · Advanced NLP · Cross-linguistic Analysis
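Surprisal is the negative log probability of a word given its context, -log2 P(word | context), measured in bits. Which language models the toolkit uses to estimate these probabilities is not stated here; the minimal sketch below substitutes an add-one-smoothed bigram model so the definition can be checked by hand. The function name and toy corpus are illustrative only.

```python
import math
from collections import Counter
from typing import List


def bigram_surprisal(corpus: List[List[str]], sentence: List[str]) -> List[float]:
    """Per-word surprisal, -log2 P(w_i | w_{i-1}), under an add-one bigram model."""
    history_counts: Counter = Counter()   # how often each token precedes another
    bigram_counts: Counter = Counter()
    vocab = set(sentence)                 # possible next words (excludes "<s>")
    for sent in corpus:
        tokens = ["<s>"] + sent
        vocab.update(sent)
        history_counts.update(tokens[:-1])
        bigram_counts.update(zip(tokens[:-1], tokens[1:]))
    v = len(vocab)
    tokens = ["<s>"] + sentence
    return [
        -math.log2((bigram_counts[(prev, word)] + 1) / (history_counts[prev] + v))
        for prev, word in zip(tokens[:-1], tokens[1:])
    ]


corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
print(bigram_surprisal(corpus, ["the", "cat", "sat"]))
# approx. [1.0, 1.585, 1.322] bits
```

Swapping in a neural language model changes only how P(word | context) is estimated; the surprisal transform itself stays the same.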
🚀 Integrated Analysis Platforms
🔬
LLM-based Automatic Linguistic Analysis Software
End-to-end synthesis software for automatic linguistic analysis using large language models. Provides a comprehensive pipeline from data ingestion to analysis output.
LLMs · Multi-modal · API Integration · Research Pipeline
🌐
AI Agent for Literary & Scholarly Translation
AI translation agent for ZH-EN and EN-ZH literary and scholarly texts, optimized for domain-specific vocabulary and for preserving stylistic nuance.
LLMs · Agentic AI · Translation · DH
📖
Synthesis Pipeline for Philological Research
Automatic linguistic annotation pipeline applicable to philological research. Generates multi-layer annotations for historical and literary texts.
Python · LLMs · Philology · DH

📊 Databases
📅
Historical Discourse Connectives Database
Comprehensive database of discourse connective frequencies across multiple languages, spanning 190 years (1820–2010).
Multiple Languages · 190-year Span · Historical Linguistics
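Because corpus size usually differs from decade to decade, raw connective counts are not directly comparable across a 190-year span. A common remedy, sketched below with made-up counts and a hypothetical function name (the database's actual normalization is not described here), is to express each decade's count per million tokens.

```python
from typing import Dict


def per_million(counts: Dict[int, int], totals: Dict[int, int]) -> Dict[int, float]:
    """Normalize raw counts to occurrences per million tokens for each decade.

    Decades missing from `counts` are treated as zero occurrences."""
    return {
        decade: counts.get(decade, 0) * 1_000_000 / totals[decade]
        for decade in sorted(totals)
    }


# Made-up figures for one connective: raw counts and total corpus tokens per decade.
raw = {1820: 150, 2000: 900}
tokens = {1820: 2_000_000, 1910: 5_000_000, 2000: 30_000_000}
print(per_million(raw, tokens))  # {1820: 75.0, 1910: 0.0, 2000: 30.0}
```

In this toy example the 2000s decade has six times as many raw hits as the 1820s but a lower rate per million, which is the distortion the normalization removes.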
🧪
Historical Psychosemantic Dimensions Database
Norms of historical psychosemantic dimensions in English for studying cognitive and semantic change over time.
English · Cognitive Linguistics · Semantic Change
🎡
Multilingual Onomatopoeia Sentiment Database
Sentiment properties of onomatopoeia across 28 languages for cross-linguistic sentiment and sound symbolism research.
28 Languages · Sentiment Analysis · Sound Symbolism
📚 Specialized Corpora
🔗
Multilingual Discourse Dependency Corpus
Balanced corpus with discourse dependency annotations across 11 languages for cross-linguistic discourse research.
11 Languages · Discourse Dependencies
🇨🇳
Chinese Textual "Run-on" Sentences Corpus (CCTRS) · ACL 2022
Specialized corpus with multi-layer annotations for Chinese run-on sentences, covering syntactic, semantic, and discourse layers.
Multi-layer Annotations · Chinese Discourse
➖
English Hyphenated Compounds Corpus
Comprehensive corpus of English hyphenated compound words for morphological analysis and compound processing studies.
English · Morphology · Compounds

πŸ—οΈ Technical Stack
Languages
Python · R · LaTeX · Linux Shell · JavaScript · HTML
Frameworks
PyTorch · Transformers · Hugging Face · TensorFlow · scikit-learn · spaCy
Visualization
D3.js · Matplotlib · ggplot2 · Plotly · NetworkX
Data
SQL · NoSQL · Graph Databases · pandas · Corpus Tools
Experimentation
Eye-tracker · EEG · fMRI · E-Prime · Online Experiments
Statistics
GAMM · Bayesian · Mixed-effects · Time-series · Causal Inference

🔗 Find My Work Online
"All software and datasets follow open science principles and are freely available for academic research."
Interested in collaboration, custom model fine-tuning, or dataset access? Feel free to reach out.