Software & Tools
Software, Tools & Resources
Committed to open science: building computational tools, fine-tuned models, multilingual databases, and specialized corpora for the research community. All resources are freely available for academic use.
160K+
HuggingFace Downloads
30+
Open-Source Tools
15+
Corpora & Databases
28+
Languages Supported
11+
Discourse Languages
Fine-tuned Large Language Models
Personality Detection LLMs (160K+ downloads)
Fine-tuned language models for detecting Big Five personality traits from text. State-of-the-art accuracy on multiple benchmarks.
HuggingFace Hub
Mental Health Detection Models
Specialized LLMs for detecting mental health indicators from text and social media posts. Designed for clinical and research applications.
HuggingFace Hub
IELTS & L2 Essay Grading Models (120K+ downloads)
Automated multi-dimensional essay scoring system for English language learners, covering grammar, coherence, vocabulary, and task response.
HuggingFace Hub
English Writing Diagnostic LLMs
Generative models for providing detailed, multi-aspect feedback on English writing. Built for IELTS preparation and ESL education contexts.
Discourse Analysis Tools
Automatic Discourse Dependency Converter
Converts discourse dependency structures from various corpora formats. Supports cross-framework conversion between RST, PDTB, and dependency representations.
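As an illustration of what an RST-to-dependency conversion can involve, the sketch below applies a common nuclearity-based heuristic: the head of a subtree is the head of its nucleus (the left child for multinuclear relations), and the other child's head attaches to it. The tuple-based tree encoding and the function name `rst_to_dependencies` are assumptions made for this example; they are not the converter's actual input format or API.

```python
def rst_to_dependencies(tree):
    """Convert a binary RST tree into (head, dependent, relation) arcs.

    Hypothetical encoding: a leaf is an integer EDU index; an internal
    node is (relation, nuclearity, left, right) with nuclearity in
    {"NS", "SN", "NN"}.
    """
    arcs = []

    def head(node):
        if isinstance(node, int):
            return node  # an EDU is its own head
        relation, nuclearity, left, right = node
        left_head, right_head = head(left), head(right)
        if nuclearity == "SN":
            # Right child is the nucleus, so the left head depends on it.
            arcs.append((right_head, left_head, relation))
            return right_head
        # "NS" and "NN": treat the left child as the head.
        arcs.append((left_head, right_head, relation))
        return left_head

    root = head(tree)
    return root, sorted(arcs, key=lambda arc: arc[1])


# Example: EDU 1 concedes to EDU 2, and EDU 3 elaborates on that pair.
example = ("Elaboration", "NS", ("Concession", "SN", 1, 2), 3)
root, arcs = rst_to_dependencies(example)
```

Here EDU 2 becomes the root and both other EDUs attach to it. PDTB-style relations, which link argument spans rather than tree constituents, would need a separate mapping that this sketch does not attempt.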
Discourse Complexity Analyzer
Measures discourse complexity at both syntactic and discourse levels. Used for automated text complexity assessment and readability analysis.
Unified PDTB & RST Annotation Tool
Mining tool for extracting and aligning discourse relations from Penn Discourse TreeBank and Rhetorical Structure Theory corpora.
Discourse Network Visualization Toolkit
Interactive visualization for discourse networks and structural relationships. Generates network graphs and structural analyses of discourse.
Computational Linguistics Metrics
Attention-Aware Computational Measures Toolkit (published in Cognition & Linguistics)
Advanced toolkit for computing attention-aware semantic relevance metrics. Predicts human reading behavior and cognitive processing across multiple languages.
Enhanced Computational Measures Toolkit
Comprehensive suite of enhanced computational measures for multilingual analysis, covering semantic similarity, information-theoretic metrics, and surprisal.
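For readers unfamiliar with surprisal, the following is a minimal sketch of the quantity itself: the negative log probability of a word given its context, here estimated with an add-alpha smoothed bigram model. The function name `bigram_surprisal` and the smoothing choice are assumptions for illustration only; the toolkit's own measures are not reproduced here and may rely on neural language models instead.

```python
import math
from collections import Counter


def bigram_surprisal(corpus_tokens, sentence_tokens, alpha=1.0):
    """Return (word, surprisal in bits) for each word after the first,
    using an add-alpha smoothed bigram model estimated from corpus_tokens."""
    vocab_size = len(set(corpus_tokens) | set(sentence_tokens))
    # Count how often each token occurs as the first element of a bigram.
    context_counts = Counter(corpus_tokens[:-1])
    bigram_counts = Counter(zip(corpus_tokens, corpus_tokens[1:]))

    results = []
    for prev, word in zip(sentence_tokens, sentence_tokens[1:]):
        prob = (bigram_counts[(prev, word)] + alpha) / (
            context_counts[prev] + alpha * vocab_size
        )
        results.append((word, -math.log2(prob)))
    return results


corpus = "the cat sat on the mat".split()
print(bigram_surprisal(corpus, ["the", "cat", "sat"]))
```

Less predictable continuations receive higher surprisal, which is the property that makes the measure useful as a predictor of reading behavior.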
Integrated Analysis Platforms
LLM-based Automatic Linguistic Analysis Software
End-to-end synthesis software for automatic linguistic analysis using large language models. Provides a comprehensive pipeline from data ingestion to analysis output.
AI Agent for Literary & Scholarly Translation
AI translation agent for ZH-EN and EN-ZH literary and scholarly texts. Optimized for domain-specific vocabulary and stylistic nuance preservation.
Synthesis Pipeline for Philological Research
Automatic linguistic annotation pipeline applicable to philological research. Generates multi-layer annotations for historical and literary texts.
Databases
Historical Discourse Connectives Database
Comprehensive database of discourse connective frequencies across multiple languages spanning 190 years (1820–2010).
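A frequency table of this kind is typically normalized by corpus size so that years with different amounts of text are comparable. The sketch below shows one way to compute per-million-word rates for single-word connectives; the function name `connective_rates`, the input layout, and the simple regex tokenizer are assumptions for illustration and do not describe how the database was actually built.

```python
import re
from collections import Counter


def connective_rates(texts_by_year, connectives, per=1_000_000):
    """Return {year: {connective: occurrences per `per` tokens}}.

    Handles single-word connectives only; multi-word connectives such as
    "on the other hand" would need phrase matching.
    """
    rates = {}
    for year, text in texts_by_year.items():
        tokens = re.findall(r"[a-z']+", text.lower())
        counts = Counter(tokens)
        total = len(tokens)
        rates[year] = {
            c: (counts[c] * per / total if total else 0.0) for c in connectives
        }
    return rates


# Toy input; real use would pass full yearly sub-corpora.
sample = {
    1820: "it rained however we went out",
    2010: "it rained but we went out but late",
}
print(connective_rates(sample, ["however", "but"]))
```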
Historical Psychosemantic Dimensions Database
Norms of historical psychosemantic dimensions in English for studying cognitive and semantic change over time.
Multilingual Onomatopoeia Sentiment Database
Sentiment properties of onomatopoeia across 28 languages for cross-linguistic sentiment and sound symbolism research.
Specialized Corpora
Multilingual Discourse Dependency Corpus
Balanced corpus with discourse dependency annotations across 11 languages for cross-linguistic discourse research.
Chinese Textual "Run-on" Sentences Corpus (CCTRS), ACL 2022
Specialized corpus with multi-layer annotations for Chinese run-on sentences, covering syntactic, semantic, and discourse layers.
English Hyphenated Compounds Corpus
Comprehensive corpus of English hyphenated compound words for morphological analysis and compound processing studies.
Technical Stack
Languages
Python · R · PyTorch · LaTeX · Linux Shell · JavaScript · HTML
Frameworks
Transformers · Hugging Face · TensorFlow · scikit-learn · spaCy
Visualization
D3.js · Matplotlib · ggplot2 · Plotly · NetworkX
Data
SQL · NoSQL · Graph Databases · Pandas · Corpus Tools
Experimentation
Eye-tracker · EEG · fMRI · E-Prime · Online Experiments
Statistics
GAMM · Bayesian · Mixed-effects · Time-series · Causal Inference
Find My Work Online
"All software and datasets follow open science principles and are freely available for academic research."
Interested in collaboration, custom model fine-tuning, or dataset access? Feel free to reach out.