Software & Tools
I am committed to open science and have developed numerous computational tools, databases, and resources for the research community. All software and datasets are freely available for academic use.
🛠️ Software & Computational Tools
Discourse Analysis Tools
- Automatic Discourse Dependency Converter
A tool for converting annotations from various discourse corpus formats into discourse dependency structures
🔧 Technologies: Python, NLP Libraries
📊 Use Case: Discourse parsing and analysis across multiple languages
- Discourse Complexity Analyzer
Computational tool for measuring text complexity at the syntactic and discourse levels
🔧 Technologies: Python, Statistical Modeling
📊 Use Case: Automated assessment of text complexity
- Unified PDTB and RST Annotation Tool
Mining tool for Penn Discourse TreeBank (PDTB) and Rhetorical Structure Theory (RST) corpora
🔧 Technologies: Python, XML Processing
📊 Use Case: Discourse relation annotation and extraction
- Discourse Network Visualization Toolkit
Interactive visualization tool for discourse networks and relationships
🔧 Technologies: Python, D3.js, NetworkX
📊 Use Case: Visual analysis of discourse structures
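To illustrate what a discourse dependency conversion involves, here is a minimal Python sketch. It is not the converter listed above: the `Node` class, the `head_edu` and `to_dependencies` functions, and the toy tree are all hypothetical names introduced for this example. It assumes a simplified RST-style tree in which every internal node has at least one nucleus child, and it applies a common nuclearity-based head rule: the head of a span is the head of its leftmost nucleus, and every other child attaches to that head.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """A node of a simplified RST tree (hypothetical structure for this sketch)."""
    nuclearity: str                 # "N" (nucleus) or "S" (satellite)
    relation: Optional[str] = None  # relation this node holds to its sibling(s)
    edu: Optional[int] = None       # EDU index for leaves, None for internal nodes
    children: List["Node"] = field(default_factory=list)


def head_edu(node: Node) -> int:
    """Follow leftmost nucleus children down to a leaf and return its EDU index."""
    while node.edu is None:
        # Assumes a well-formed tree: every internal node has a nucleus child.
        node = [c for c in node.children if c.nuclearity == "N"][0]
    return node.edu


def to_dependencies(root: Node) -> List[Tuple[int, int, str]]:
    """Return (head, dependent, relation) arcs; the root EDU attaches to 0."""
    arcs = [(0, head_edu(root), "ROOT")]
    stack = [root]
    while stack:
        node = stack.pop()
        if node.edu is not None:
            continue  # leaves have no children to attach
        parent_head = head_edu(node)
        for child in node.children:
            child_head = head_edu(child)
            if child_head != parent_head:
                # Non-head children depend on the head EDU of the span.
                arcs.append((parent_head, child_head, child.relation or "unknown"))
            stack.append(child)
    return sorted(arcs, key=lambda arc: arc[1])


# Toy tree: EDU 1 is elaborated by a contrast between EDU 2 and EDU 3.
tree = Node("N", children=[
    Node("N", edu=1),
    Node("S", relation="Elaboration", children=[
        Node("N", relation="Contrast", edu=2),
        Node("N", relation="Contrast", edu=3),
    ]),
])

arcs = to_dependencies(tree)
print(arcs)  # [(0, 1, 'ROOT'), (1, 2, 'Elaboration'), (2, 3, 'Contrast')]
```

Real corpora differ in how they encode nuclearity and relation labels, so a practical converter would need a format-specific reader in front of a step like this.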
Computational Linguistics Metrics
- Attention-Aware Computational Measures Toolkit
Advanced toolkit for computing attention-aware metrics for multiple languages
🔧 Technologies: Python, PyTorch, Transformers
📊 Use Case: Predicting human reading behavior and cognitive processing
🏆 Featured Research: Published in Cognition and Linguistics
- Enhanced Computational Measures Toolkit
Comprehensive suite of enhanced computational measures for multilingual analysis
🔧 Technologies: Python, Advanced NLP
📊 Use Case: Cross-linguistic computational analysis
Fine-tuned Large Language Models
- Personality Detection LLMs
Fine-tuned language models for detecting personality traits from text
🔧 Technologies: PyTorch, Transformers, BERT/RoBERTa
📊 Impact: 60K+ downloads in 2 months
🔗 Available: Hugging Face Hub
- Mental Health Detection Models
Specialized LLMs for detecting mental health indicators from text and social media
🔧 Technologies: Deep Learning, Text Classification
📊 Use Case: Clinical and research applications
- IELTS & L2 Essay Grading Models
Automated essay scoring system for English language learners
🔧 Technologies: Fine-tuned LLMs, Multi-dimensional Scoring
📊 Use Case: Educational assessment and feedback
- English Writing Diagnostic LLMs
Generative models for providing detailed feedback on English writing
🔧 Technologies: GPT-based Models, Educational NLP
📊 Use Case: IELTS preparation and ESL education
Integrated Analysis Platforms
- LLM-based Automatic Linguistic Analysis Software
Integrated software for automatic linguistic analysis using large language models
🔧 Technologies: LLMs, Multi-modal Analysis, API Integration
📊 Use Case: End-to-end linguistic research pipeline
📊 Databases & Corpora
Historical Language Databases
- Historical Discourse Connectives Database
Comprehensive database of discourse connective frequencies across multiple languages (1820-2010)
🌍 Coverage: Multiple languages, 190-year span
📊 Applications: Historical linguistics, language evolution studies
- Historical Psychosemantic Dimensions Database
Norms database for historical psychosemantic dimensions in English
📊 Applications: Cognitive linguistics, semantic change analysis
- Multilingual Onomatopoeia Sentiment Database
Database of the sentiment properties of onomatopoeia across 28 languages
🌍 Coverage: 28 languages
📊 Applications: Cross-linguistic sentiment analysis
Specialized Corpora
- English Hyphenated Compounds Corpus
Comprehensive corpus of English hyphenated compound words
📊 Applications: Morphological analysis, compound processing studies
- Multilingual Discourse Dependency Corpus
Balanced corpus with discourse dependency annotations across 11 languages
🌍 Coverage: 11 languages
📊 Applications: Cross-linguistic discourse analysis
- Chinese Textual "Run-on" Sentences Corpus (CCTRS)
Specialized corpus of Chinese run-on sentences with multi-layer annotations
🔧 Features: Multi-layer linguistic annotations
📊 Applications: Chinese syntax and discourse research
📖 Published: ACL Conference 2022
🎯 Technical Specifications
Programming Languages & Technologies
- Advanced: Python, R, PyTorch, LaTeX, Linux Shell
- Intermediate: JavaScript, HTML, CSS
- Frameworks: Transformers, Hugging Face, TensorFlow, scikit-learn
- Databases: SQL, NoSQL, Graph databases
- Visualization: D3.js, Matplotlib, ggplot2, Plotly
Research Impact
- 🔬 Open Source Contributions: 10+ major software packages
- 💾 Datasets Created: 15+ specialized corpora and databases
- 🤖 ML Models Released: 5+ fine-tuned LLMs with 60K+ downloads
- 🌐 Multi-language Support: Tools supporting 28+ languages
- 📈 Community Usage: Tools used by researchers worldwide
🔗 Access & Collaboration
Research Collaboration: I'm always interested in collaborating on computational linguistics projects and sharing resources with the research community. Please feel free to reach out for:
- Access to specialized datasets and corpora
- Collaboration on tool development
- Custom model fine-tuning for specific research needs
- Technical consultation on computational linguistics projects
Find My Work Online
- 🐙 GitHub: @fivehills
- 🤗 Hugging Face: @KevSun
- 📊 OSF: Open Science Framework
- 📈 Google Scholar: Citation Profile
All software and datasets are developed following open science principles and are freely available for academic research. If you use any of these resources in your research, please cite the corresponding publications.