Software & Tools

I am committed to open science and have developed numerous computational tools, databases, and resources for the research community. All software and datasets are freely available for academic use.

🛠️ Software & Computational Tools

Discourse Analysis Tools

  • Automatic Discourse Dependency Converter
    Converts discourse annotations from various corpus formats into discourse dependency structures
    🔧 Technologies: Python, NLP Libraries
    📊 Use Case: Discourse parsing and analysis across multiple languages
  • Discourse Complexity Analyzer
    Computational tool for measuring text complexity at both the syntactic and discourse levels
    🔧 Technologies: Python, Statistical Modeling
    📊 Use Case: Automated assessment of text complexity
  • Unified PDTB and RST Annotation Tool
    Tool for mining discourse relations from Penn Discourse TreeBank (PDTB) and Rhetorical Structure Theory (RST) corpora
    🔧 Technologies: Python, XML Processing
    📊 Use Case: Discourse relation annotation and extraction
  • Discourse Network Visualization Toolkit
    Interactive visualization tool for discourse networks and relationships
    🔧 Technologies: Python, D3.js, NetworkX
    📊 Use Case: Visual analysis of discourse structures
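To illustrate what converting a discourse tree into dependencies can involve, the sketch below implements one head-assignment scheme used in the discourse parsing literature. Each subtree takes the head EDU of its leftmost nucleus, and every other child attaches to that head. It then computes mean dependency distance, one simple complexity measure. The tree format (`Node`) and all function names are hypothetical. This is a sketch under those assumptions, not the implementation of the Converter or the Analyzer above.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """A node in a simplified RST-style tree (hypothetical format, for illustration only)."""
    nuclearity: str                  # "N" (nucleus) or "S" (satellite)
    relation: str = "span"           # label of the relation attaching this node to its parent
    edu: Optional[int] = None        # EDU index if this node is a leaf, else None
    children: List[Node] = field(default_factory=list)


def head(node: Node) -> int:
    """Head EDU of a subtree: a leaf's own index, else the head of its leftmost nucleus."""
    if node.edu is not None:
        return node.edu
    # Assumes every internal node has at least one nucleus child.
    nucleus = next(c for c in node.children if c.nuclearity == "N")
    return head(nucleus)


def to_dependencies(root: Node) -> List[Tuple[int, int, str]]:
    """Return sorted (dependent EDU, head EDU, relation) arcs; the root EDU gets head 0."""
    arcs = [(head(root), 0, "ROOT")]

    def visit(node: Node) -> None:
        h = head(node)
        for child in node.children:
            ch = head(child)
            # The leftmost nucleus shares the node's head; all other children attach to it.
            if ch != h:
                arcs.append((ch, h, child.relation))
            visit(child)

    visit(root)
    return sorted(arcs)


def mean_dependency_distance(arcs: List[Tuple[int, int, str]]) -> float:
    """Average linear distance |dependent - head| over all non-root arcs."""
    distances = [abs(dep - hd) for dep, hd, _ in arcs if hd != 0]
    return sum(distances) / len(distances)


# Toy example: EDU 2 elaborates EDU 1, and EDU 3 is a cause satellite of the span [1-2].
example = Node("N", children=[
    Node("N", children=[Node("N", edu=1), Node("S", "elaboration", edu=2)]),
    Node("S", "cause", edu=3),
])
arcs = to_dependencies(example)
print(arcs)                            # [(1, 0, 'ROOT'), (2, 1, 'elaboration'), (3, 1, 'cause')]
print(mean_dependency_distance(arcs))  # 1.5
```

Attaching non-head children to the head of their parent keeps the output a tree over EDUs, which is why a single linear-distance average is well defined.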

Computational Linguistics Metrics

  • Attention-Aware Computational Measures Toolkit
    Toolkit for computing attention-aware computational measures across multiple languages
    🔧 Technologies: Python, PyTorch, Transformers
    📊 Use Case: Predicting human reading behavior and cognitive processing
    🏆 Featured Research: Published in Cognition and Linguistics
  • Enhanced Computational Measures Toolkit
    Comprehensive suite of enhanced computational measures for multilingual analysis
    🔧 Technologies: Python, Advanced NLP
    📊 Use Case: Cross-linguistic computational analysis

Fine-tuned Large Language Models

  • Personality Detection LLMs
    Fine-tuned language models for detecting personality traits from text
    🔧 Technologies: PyTorch, Transformers, BERT/RoBERTa
    📊 Impact: 60K+ downloads in 2 months
    🔗 Available: Hugging Face Hub
  • Mental Health Detection Models
    Specialized LLMs for detecting mental health indicators in text, including social media posts
    🔧 Technologies: Deep Learning, Text Classification
    📊 Use Case: Clinical and research applications
  • IELTS & L2 Essay Grading Models
    Automated essay scoring system for English language learners
    🔧 Technologies: Fine-tuned LLMs, Multi-dimensional Scoring
    📊 Use Case: Educational assessment and feedback
  • English Writing Diagnostic LLMs
    Generative models for providing detailed feedback on English writing
    🔧 Technologies: GPT-based Models, Educational NLP
    📊 Use Case: IELTS preparation and ESL education

Integrated Analysis Platforms

  • LLM-based Automatic Linguistic Analysis Software
    Integrated software that uses large language models to automate linguistic analysis
    🔧 Technologies: LLMs, Multi-modal Analysis, API Integration
    📊 Use Case: End-to-end linguistic research pipeline

📊 Databases & Corpora

Historical Language Databases

  • Historical Discourse Connectives Database
    Comprehensive database of discourse connective frequencies across multiple languages (1820-2010)
    🌍 Coverage: Multiple languages, 190-year span
    📊 Applications: Historical linguistics, language evolution studies
  • Historical Psychosemantic Dimensions Database
    Database of historical norms for psychosemantic dimensions in English
    📊 Applications: Cognitive linguistics, semantic change analysis
  • Multilingual Onomatopoeia Sentiment Database
    Database of the sentiment properties of onomatopoeia across 28 languages
    🌍 Coverage: 28 languages
    📊 Applications: Cross-linguistic sentiment analysis
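Historical frequency resources such as the connectives database are usually compared across periods after normalizing raw counts by corpus size, because the amount of available text differs from decade to decade. The sketch below shows per-million-word normalization. The function name and the toy counts are hypothetical and are not values from the database.

```python
from __future__ import annotations


def per_million(counts: dict[int, int], corpus_sizes: dict[int, int]) -> dict[int, float]:
    """Normalize raw counts to occurrences per million words for each time slice."""
    # Multiply before dividing so that exact ratios stay exact in floating point.
    return {
        period: counts.get(period, 0) * 1_000_000 / size
        for period, size in corpus_sizes.items()
    }


# Hypothetical toy figures for a single connective (NOT taken from the database).
counts = {1820: 150, 1830: 240}
sizes = {1820: 2_000_000, 1830: 3_000_000}
print(per_million(counts, sizes))  # {1820: 75.0, 1830: 80.0}
```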

Specialized Corpora

  • English Hyphenated Compounds Corpus
    Comprehensive corpus of English hyphenated compound words
    📊 Applications: Morphological analysis, compound processing studies
  • Multilingual Discourse Dependency Corpus
    Balanced corpus with discourse dependency annotations across 11 languages
    🌍 Coverage: 11 languages
    📊 Applications: Cross-linguistic discourse analysis
  • Chinese Textual "Run-on" Sentences Corpus (CCTRS)
    Specialized corpus of Chinese run-on sentences
    🔧 Features: Multi-layer linguistic annotations
    📊 Applications: Chinese syntax and discourse research
    📖 Published: ACL 2022

🎯 Technical Specifications

Programming Languages & Technologies

  • Advanced: Python, R, PyTorch, LaTeX, Linux Shell
  • Intermediate: JavaScript, HTML, CSS
  • Frameworks: Hugging Face Transformers, TensorFlow, scikit-learn
  • Databases: SQL, NoSQL, Graph databases
  • Visualization: D3.js, Matplotlib, ggplot2, Plotly

Research Impact

  • 🔬 Open Source Contributions: 10+ major software packages
  • 💾 Datasets Created: 15+ specialized corpora and databases
  • 🤖 ML Models Released: 5+ fine-tuned LLMs with 60K+ downloads
  • 🌐 Multi-language Support: Tools supporting 28+ languages
  • 📈 Community Usage: Tools used by researchers worldwide

🔗 Access & Collaboration

Research Collaboration: I'm always interested in collaborating on computational linguistics projects and sharing resources with the research community. Please feel free to reach out for:

  • Access to specialized datasets and corpora
  • Collaboration on tool development
  • Custom model fine-tuning for specific research needs
  • Technical consultation on computational linguistics projects


All software and datasets are developed following open science principles and are freely available for academic research. If you use any of these resources in your research, please cite the corresponding publications.