QuanSyn

QuanSyn: A Python package for quantitative syntax analysis.

Description

QuanSyn is a Python package for Quantitative Linguistics. It provides functionality to quantify linguistic structures and explore language patterns.

The package consists of three main modules:

depval.py: indicators of dependency structures and valency structures.
lawfitter.py: a small fitter for several laws of quantitative linguistics (QL).
lingnet.py: a module for constructing complex networks.

Installation

You can install QuanSyn via pip:

pip install quansyn

QuanSyn also requires nltk and conllu:

pip install nltk conllu

Quick Start

Here's a simple example of how to use QuanSyn:

1. depval

from quansyn.depval import DepValAnalyzer
data = open(r'your_treebank.conllu', encoding='utf-8')
dv = DepValAnalyzer(data)

# dependency distance distribution
dv.dd_distribution()
# mean dependency distance of a specific word class
dv.mdd(pos='NOUN')
# mean dependency distance of a specific dependency relation
dv.mdd(dependency='nsubj')
# proportion of dependency distance
dv.pdd()
# tree width and tree depth
dv.tree()
# tree width distribution and tree depth distribution
dv.tree_distribution()

# mean valency
dv.mean_valency()
# valency distribution
dv.valency_distribution()
# probabilistic valency pattern
dv.pvp()

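Or compute the features in one call:

# assumption: getDepValFeatures is exported by quansyn.depval
from quansyn.depval import getDepValFeatures
dv = getDepValFeatures(data)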
print(dv)
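
To make the indicator concrete, here is a minimal sketch of how mean dependency distance can be computed from CoNLL-U text using only the standard library. It is not QuanSyn's implementation: the helper names `read_tokens` and `mean_dependency_distance` are hypothetical, and applying the `pos` and `dependency` filters to the dependent word (rather than the governor) is an assumption.

```python
# Minimal sketch, not QuanSyn's code: mean dependency distance (MDD)
# computed as the mean of |head position - dependent position|.

SAMPLE = """\
1\tThe\tthe\tDET\t_\t_\t2\tdet\t_\t_
2\tdog\tdog\tNOUN\t_\t_\t3\tnsubj\t_\t_
3\tchased\tchase\tVERB\t_\t_\t0\troot\t_\t_
4\tthe\tthe\tDET\t_\t_\t5\tdet\t_\t_
5\tcat\tcat\tNOUN\t_\t_\t3\tobj\t_\t_
"""


def read_tokens(conllu_text):
    """Yield (id, upos, head, deprel) for each regular token line."""
    for line in conllu_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        # Skip multiword ranges (e.g. "1-2") and empty nodes (e.g. "1.1").
        if not cols[0].isdigit() or not cols[6].isdigit():
            continue
        yield int(cols[0]), cols[3], int(cols[6]), cols[7]


def mean_dependency_distance(conllu_text, pos=None, dependency=None):
    """Average distance between dependents and their heads (root excluded)."""
    distances = [
        abs(head - tok_id)
        for tok_id, upos, head, deprel in read_tokens(conllu_text)
        if head != 0  # the root has no governor, so no distance
        and (pos is None or upos == pos)
        and (dependency is None or deprel == dependency)
    ]
    return sum(distances) / len(distances) if distances else 0.0


print(mean_dependency_distance(SAMPLE))                      # all relations
print(mean_dependency_distance(SAMPLE, pos="NOUN"))          # noun dependents
print(mean_dependency_distance(SAMPLE, dependency="nsubj"))  # subjects only
```

Because token ids and head indices are both sentence-internal, the distance can be computed line by line without splitting the file into sentences first.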

2. lawfitter

from quansyn.lawfitter import fit   
# general form: results = fit(data, model, variant)
results = fit([[1, 2, 3, 4, 5, 6], [3, 4, 2, 6, 8, 15]], 'zipf')
print(results)
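
For readers who want to see what fitting a Zipf-like law involves, the sketch below fits y = c * x^(-a) by ordinary least squares on log-log axes. This is a common textbook approach and is only an illustration; it is not necessarily the method `fit` uses internally, and the function name `fit_power_law_loglog` and the sample frequencies are hypothetical.

```python
import math


def fit_power_law_loglog(x, y):
    """Least-squares fit of y = c * x**(-a) on log-log axes.

    Returns (a, c, r_squared), where r_squared is measured in log space.
    """
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mean_x, mean_y = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((v - (intercept + slope * u)) ** 2 for u, v in zip(lx, ly))
    ss_tot = sum((v - mean_y) ** 2 for v in ly)
    r_squared = 1 - ss_res / ss_tot if ss_tot else 1.0
    return -slope, math.exp(intercept), r_squared


# Illustrative rank-frequency data that follows 120 / rank exactly.
ranks = [1, 2, 3, 4, 5, 6]
freqs = [120, 60, 40, 30, 24, 20]
a, c, r2 = fit_power_law_loglog(ranks, freqs)
print(a, c, r2)
```

Log-log regression is simple but gives every point equal weight in log space, so it may disagree with fitters that use nonlinear least squares or maximum likelihood.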

3. lingnet

from quansyn.lingnet import conllu2edge
import networkx as nx   
# build a network from a CoNLL-U file
data = open(r'your_treebank.conllu', encoding='utf-8')
edges = conllu2edge(data, mode='dependency')
# or build a co-occurrence network
# edges = conllu2edge(data, mode='adjacency')
G = nx.Graph()
G.add_edges_from(edges)

# estimate the degree exponent (import fitPowerLaw from quansyn first)
degree = [d for _, d in G.degree()]
degree_exponents = fitPowerLaw(degree)
print(degree_exponents)
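
The difference between the two modes can be illustrated with a small standard-library sketch. It assumes that 'dependency' links each word to its syntactic head and that 'adjacency' links each word to the next word in the sentence; the exact edge format returned by `conllu2edge` is not documented here, so `sentences`, `build_edges` and `degrees` are hypothetical stand-ins, not QuanSyn functions.

```python
from collections import Counter

SAMPLE = """\
1\tThe\tthe\tDET\t_\t_\t2\tdet\t_\t_
2\tdog\tdog\tNOUN\t_\t_\t3\tnsubj\t_\t_
3\tchased\tchase\tVERB\t_\t_\t0\troot\t_\t_
4\tthe\tthe\tDET\t_\t_\t5\tdet\t_\t_
5\tcat\tcat\tNOUN\t_\t_\t3\tobj\t_\t_

"""


def sentences(conllu_text):
    """Yield each sentence as a list of (id, lowercased form, head)."""
    sent = []
    for line in conllu_text.splitlines():
        if not line.strip():  # a blank line ends the current sentence
            if sent:
                yield sent
                sent = []
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        if cols[0].isdigit() and cols[6].isdigit():
            sent.append((int(cols[0]), cols[1].lower(), int(cols[6])))
    if sent:
        yield sent


def build_edges(conllu_text, mode="dependency"):
    """Return (word, word) pairs for a syntactic or co-occurrence network."""
    edges = []
    for sent in sentences(conllu_text):
        if mode == "dependency":
            forms = {tok_id: form for tok_id, form, _ in sent}
            # (head word, dependent word); the root (head 0) has no edge
            edges += [(forms[head], form)
                      for _, form, head in sent if head in forms]
        elif mode == "adjacency":
            words = [form for _, form, _ in sent]
            edges += list(zip(words, words[1:]))  # neighbouring words
        else:
            raise ValueError(f"unknown mode: {mode!r}")
    return edges


def degrees(edges):
    """Node degrees in the undirected simple graph built from the edges."""
    unique = {frozenset(e) for e in edges if e[0] != e[1]}
    return Counter(node for edge in unique for node in edge)


print(build_edges(SAMPLE, mode="dependency"))
print(degrees(build_edges(SAMPLE, mode="adjacency")))
```

Unlike `nx.Graph`, this `degrees` helper ignores self-loops, so the counts can differ for sentences that repeat a word in adjacent positions.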

Documentation

For more detailed information, please refer to the video (in Chinese).

Features

Dependency distance distribution
Mean dependency distance of a specific word class
Mean dependency distance of a specific dependency relation
Proportion of dependency distance
Tree width and tree depth
Tree width distribution and tree depth distribution
Mean valency
Valency distribution
Probabilistic valency pattern
Law fitter
Complex network construction
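
As an illustration of the tree-based features, the following sketch computes the depth and width of a single dependency tree from its list of head indices. Conventions vary between studies; here depth is the number of levels (root at level 1) and width is the largest number of nodes on any level, which is an assumption and may differ from what `dv.tree()` reports. The function name is hypothetical.

```python
from collections import defaultdict


def tree_depth_and_width(heads):
    """Return (depth, width) of a dependency tree.

    heads[i] is the 1-based index of the head of token i + 1; 0 marks the
    root. The input is assumed to be a well-formed tree (no cycles).
    """
    children = defaultdict(list)
    for tok_id, head in enumerate(heads, start=1):
        children[head].append(tok_id)

    level = children[0]  # start from the root node(s)
    depth = width = 0
    while level:
        depth += 1
        width = max(width, len(level))
        # move one level down: collect the dependents of every node
        level = [child for node in level for child in children[node]]
    return depth, width


# "The dog chased the cat": chased is the root, dog and cat depend on it.
print(tree_depth_and_width([2, 3, 0, 5, 3]))
```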

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contact

GitHub: @YuhuYang
Email: yangmufy@163.com

Citing

If this project has been helpful to you, please give it a star and cite our article. We would be very grateful.

@article{Yang_2022,
doi = {10.1209/0295-5075/ac8bf2},
url = {https://dx.doi.org/10.1209/0295-5075/ac8bf2},
year = {2022},
month = {sep},
publisher = {EDP Sciences, IOP Publishing and Società Italiana di Fisica},
volume = {139},
number = {6},
pages = {61002},
author = {Mu Yang and Haitao Liu},
title = {The role of syntax in the formation of scale-free language networks},
journal = {Europhysics Letters},
abstract = {The overall structure of a network is determined by its micro features, which are different in both syntactic and non-syntactic networks. However, the fact that most language networks are small-world and scale-free raises the question: does syntax play a role in forming the scale-free feature? To answer this question, we build syntactic networks and co-occurrence networks to compare the generation mechanisms of nodes, and to investigate whether syntactic and non-syntactic factors have distinct roles. The results show that frequency is the foundation of the scale-free feature, while syntax is beneficial to enhance this feature. This research introduces a microscopic approach, which may shed light on the scale-free feature of language networks.}
}
