QuanSyn:A python package for quantitative syntax analysis.
QuanSyn
is a Python package for Quantitative Linguistics. It provides functionality to quantify linguistic structures and explore language patterns.
This package is consisted of three main parts:
depval.py
: some indicators about dependency structures and valency structures.lawfitter.py
: a small fitter for some laws in QL.lingnet.py
: a module for complex network construction.
You can install QuanSyn
via pip:
pip install quansyn
nltk
and conllu
are required.
pip install nltk conllu
Here's a simple example of how to use QuanSyn
:
from quansyn.depval import DepValAnalyzer
data = open(r'your_treebank.conllu',encoding='utf-8')
dv = DependencyAnalyzer(data)
# dependency distance distribution
dv.dd_distribution()
# mean dependency distance of specific wordclasses
dv.mdd(pos='NOUN')
# mean dependency distance of specific dependency relations
dv.mdd(dependency='nsubj')
# proportion of dependency distance
dv.pdd()
# tree width and tree depth
dv.tree()
# tree width distirbution and tree depth distribution
dv.tree_distribution()
# mean valency
dv.mean_valency()
# valency distribution
dv.valency_distribution()
# probalistic valency pattern
dv.pvp()
or:
dv = getDepValFeatures(data)
print(dv)
from quansyn.lawfitter import fit
#results = fit(data,model,variant)
results = fit([[1,2,3,4,5,6],[3,4,2,6,8,15]],'zipf')
print(resluts)
from quansyn.lingnet import conllu2edge
import networkx as nx
# use a conllu file to construction a network
data = open(r'your_treebank.conllu',encoding='utf-8')
edges = conllu2edge(data,mode='dependency')
# or to construct a co-occurance network
#edges = conllu2edge(data,mode='adjacency')
G = nx.Graph()
G.add_edges_from(edges)
# to estimate the degree exponents
degree =[i[1] for i in G.degree()]
degree_exponents = fitPowerLaw(degree)
print(degree_exponents)
For more detailed information, please refer to the video (in Chinese).
- Dependency distance distribution
- Mean dependency distance of specific wordclasses
- Mean dependency distance of specific dependency relations
- Proportion of dependency distance
- Tree width and tree depth
- Tree width distribution and tree depth distribution
- Mean valency
- Valency distribution
- Probabilistic valency pattern
- Law fitter
- Complex network construction
This project is licensed under the MIT License - see the LICENSE file for details.
- GitHub: @YuhuYang
- Email: yangmufy@163.com
If our project has been helpful to you, please give it a star and cite our articles. We would be very grateful.
@article{Yang_2022,
doi = {10.1209/0295-5075/ac8bf2},
url = {https://dx.doi.org/10.1209/0295-5075/ac8bf2},
year = {2022},
month = {sep},
publisher = {EDP Sciences, IOP Publishing and Società Italiana di Fisica},
volume = {139},
number = {6},
pages = {61002},
author = {Mu Yang and Haitao Liu},
title = {The role of syntax in the formation of scale-free language networks},
journal = {Europhysics Letters},
abstract = {The overall structure of a network is determined by its micro features, which are different in both syntactic and non-syntactic networks. However, the fact that most language networks are small-world and scale-free raises the question: does syntax play a role in forming the scale-free feature? To answer this question, we build syntactic networks and co-occurrence networks to compare the generation mechanisms of nodes, and to investigate whether syntactic and non-syntactic factors have distinct roles. The results show that frequency is the foundation of the scale-free feature, while syntax is beneficial to enhance this feature. This research introduces a microscopic approach, which may shed light on the scale-free feature of language networks.}
}