Custom analysis for packages which are tools #3657

Open
mit-mit opened this issue May 26, 2020 · 11 comments

@mit-mit
Member

mit-mit commented May 26, 2020

Some of our analysis guidelines make no sense for packages which are tools (e.g. stagehand). For example, here's an issue with a tool losing points for not having an example file: dart-archive/stagehand#638 (comment)

@isoos
Collaborator

isoos commented May 26, 2020

I think the "Usage" section of package:stagehand's README contains content similar to a typical package's Example tab: https://pub.dev/packages/stagehand#usage
Similarly, it has an Installing section that is a subset of our Installing tab.

Maybe we should do a top-level analysis of the README, and if a section has a recognized title (e.g. installing, setup, usage, example use), use that instead of requiring a separate file?
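The title check could look roughly like this minimal Dart sketch. The keyword set and the name `isRecognizedTitle` are hypothetical, and it assumes the section titles have already been extracted as plain strings. It matches whole words rather than substrings, so a title such as "Counterexamples" does not count as "example".

```dart
/// Hypothetical keyword set; the real list would still need to be agreed on.
const recognizedTitleWords = {
  'installing',
  'setup',
  'usage',
  'example',
  'examples',
};

/// Returns true if any whole word of [title] is a recognized keyword.
bool isRecognizedTitle(String title) {
  // Split the lower-cased title into runs of letters, e.g. "Example use"
  // becomes ["example", "use"].
  final words = RegExp(r'[a-z]+')
      .allMatches(title.toLowerCase())
      .map((m) => m.group(0)!);
  return words.any(recognizedTitleWords.contains);
}
```

With this, `isRecognizedTitle('Example use')` is true while `isRecognizedTitle('Getting started')` is false.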

@jonasfj added this to the Backlog milestone on Jun 2, 2020
@ramyak-mehra
Contributor

Hey, I would like to work on this. Could you point me to where I could get started?

@isoos
Collaborator

isoos commented Mar 15, 2021

@ramyak-mehra: we are using package:markdown to parse the .md file content. It produces an AST, and it would be nice to process that AST to extract the hierarchical section structure of a document. From that, we could build not only a table of contents but also this Usage extraction.

@ramyak-mehra
Contributor

@isoos So far I have come up with something like this (not refined)

import 'package:markdown/markdown.dart';

// Section titles that could stand in for a separate example file.
const _recognizedTitles = ['installing', 'setup', 'usage', 'example'];

void main() {
  final markdown = ''; // README content goes here.
  final lines = markdown.replaceAll('\r\n', '\n').split('\n');
  final htmlLines =
      HtmlRenderer().render(Document().parseLines(lines)).split('\n');
  print(_extract(htmlLines));
}

// Returning `true` inside a `forEach` callback only exits the callback,
// not the enclosing function, so use `any`, which stops at the first match.
bool _extract(List<String> htmlLines) => htmlLines.any(_checkIfTitle);

// Case-insensitive substring check against the recognized titles.
bool _checkIfTitle(String content) {
  final lower = content.toLowerCase();
  return _recognizedTitles.any(lower.contains);
}

We can use this here.
We should also check whether the matched title is actually a heading, probably using a regex.
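A minimal sketch of that regex check in Dart, assuming it runs on the raw README lines; `parseAtxHeading` and `HeadingMatch` are hypothetical names. It only recognizes ATX headings (`## Usage`), misses setext headings (a title underlined with `===`), and would misfire on `#` lines inside fenced code blocks, which are the usual limits of regex-based heading detection.

```dart
/// Level and text of a detected heading.
class HeadingMatch {
  final int level;
  final String text;
  HeadingMatch(this.level, this.text);
}

// Up to 3 leading spaces, 1-6 '#' characters, at least one space or tab,
// then the title text. Optional closing '#' sequences are not stripped.
final _atxHeading = RegExp(r'^ {0,3}(#{1,6})[ \t]+(.+?)[ \t]*$');

/// Returns the heading level and text if [line] is an ATX heading, else null.
HeadingMatch? parseAtxHeading(String line) {
  final m = _atxHeading.firstMatch(line);
  if (m == null) return null;
  return HeadingMatch(m.group(1)!.length, m.group(2)!);
}
```

A line such as `#hashtag` (no space after the `#`) is not treated as a heading.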

@isoos
Collaborator

isoos commented Mar 16, 2021

@ramyak-mehra: Code like this may be fine for processing large amounts of plain text, but in general we try to recognise the structure from the parsed syntax tree. One example of such processing is the current changelog updater code:
https://github.com/dart-lang/pub-dev/blob/master/app/lib/shared/markdown.dart#L322-L358

We would like to see generic processing similar to that, which would extract the hierarchical structure of the markdown (in typed classes) and then decide what content to extract based on that structure.

@ramyak-mehra
Contributor

@isoos If I understand correctly, we should have some kind of iterable or list, in the hierarchical order of the markdown, whose elements are typed classes (e.g. separate classes for headings, paragraphs, etc.), and make the decision from that?

@isoos
Collaborator

isoos commented Mar 16, 2021

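@ramyak-mehra: I'm thinking more of a tree, like:

import 'package:markdown/markdown.dart' as markdown;

class Section {
  final int level; // level of the title, e.g. 2 for <h2>
  final markdown.Node titleNode;
  final List<markdown.Node> contentNodes;
  List<Section> children;

  Section(this.level, this.titleNode, this.contentNodes,
      [List<Section>? children])
      : children = children ?? [];
}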

Maybe add further methods to extract the text content of titleNode, and to format contentNodes (plus, optionally, children) as HTML.
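A rough sketch of the text-extraction method, assuming toy `MdText`/`MdElement` classes that stand in for package:markdown's text and element nodes so it runs without dependencies (these names are hypothetical, not the library's API). Titles need a recursive walk because inline markup, as in `## Using *stagehand*`, nests elements inside the heading.

```dart
/// Hypothetical stand-ins for parsed markdown nodes.
abstract class MdNode {}

class MdText extends MdNode {
  final String text;
  MdText(this.text);
}

class MdElement extends MdNode {
  final String tag; // e.g. 'h2', 'em', 'p'
  final List<MdNode> children;
  MdElement(this.tag, this.children);
}

/// Concatenates all text below [node], ignoring the markup around it.
String textOf(MdNode node) {
  if (node is MdText) return node.text;
  if (node is MdElement) return node.children.map(textOf).join();
  return '';
}

class Section {
  final int level;
  final MdNode titleNode;
  final List<MdNode> contentNodes;
  final List<Section> children = [];
  Section(this.level, this.titleNode, this.contentNodes);

  /// Plain-text title, e.g. "Using stagehand" for `## Using *stagehand*`.
  String get titleText => textOf(titleNode).trim();
}
```

Formatting `contentNodes` to HTML would follow the same recursive pattern, emitting tags instead of dropping them.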

@ramyak-mehra
Contributor

@isoos I was doing something like this GitHub gist. It's probably not the best approach. I also found this node visitor, but I was not sure if it's the right way to go; I explored it a bit but could not fully understand it.

@isoos
Collaborator

isoos commented Mar 18, 2021

@ramyak-mehra: at a quick look, I think this code is at a very early stage and possibly won't handle a use case like this:

## section-2

Content of section-2.

#### section-4

Content of section-4.

Which should result in the structure of:

Section(level: 2, titleNode: <... /*section-2*/ ...>, contentNodes: <...>, children: [
  Section(level: 4, titleNode: <... /*section-4*/ ...>, contentNodes: <...>),
]);

As you can see, the level is not the depth of the tree node but the level of the section title (e.g. h2 in HTML is level: 2). Also, the sections should contain their logical content embedded...
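One way to build that tree from the flat list of top-level nodes is a stack of open sections, sketched below in Dart under simplifying assumptions: `Block` is a hypothetical stand-in for a parsed block node (a tag such as `h2` or `p` plus its text), and a synthetic level-0 root collects anything before the first heading. A heading closes every open section whose level is greater than or equal to its own and becomes a child of the remaining top one, so the `####` above nests directly under the `##`.

```dart
/// Hypothetical stand-in for a top-level parsed node.
class Block {
  final String tag; // 'h1'..'h6', 'p', 'pre', ...
  final String text;
  Block(this.tag, this.text);

  /// 1-6 for headings, null for any other block.
  int? get headingLevel {
    final m = RegExp(r'^h([1-6])$').firstMatch(tag);
    return m == null ? null : int.parse(m.group(1)!);
  }
}

class Section {
  final int level; // level of the title (h2 -> 2), not the tree depth
  final Block? title; // null only for the synthetic root
  final List<Block> contentNodes = [];
  final List<Section> children = [];
  Section(this.level, this.title);
}

/// Groups [blocks] into a section tree rooted at a level-0 section.
Section buildSections(List<Block> blocks) {
  final root = Section(0, null);
  final stack = <Section>[root];
  for (final block in blocks) {
    final level = block.headingLevel;
    if (level == null) {
      // Non-heading content belongs to the innermost open section.
      stack.last.contentNodes.add(block);
      continue;
    }
    // Close sections at the same or a deeper level; root (0) never pops.
    while (stack.last.level >= level) {
      stack.removeLast();
    }
    final section = Section(level, block);
    stack.last.children.add(section);
    stack.add(section);
  }
  return root;
}
```

With this rule, an `h1` followed by `h2`, `h3`, `h2` yields the two `h2` sections as siblings under the `h1`, with the `h3` nested in the first one.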

@ramyak-mehra
Contributor

It was just a starting point for me. I have one question: for an h1 section, should only its h2 sections be children, or should h2, h3, h4 ... h6 all be direct children?

@ramyak-mehra
Contributor

@isoos I wrote this script to build sections from parsed markdown: gist
It breaks when content appears before any heading; how should that case be handled? Also, what would the next steps be?
Analyse titleNodes for specific keywords? If so, which keywords?
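Once such a tree exists, a possible next step is a document-order search over section titles, sketched here in Dart. It assumes content before the first heading is kept as the content of a synthetic level-0 root with an empty title, so the preamble no longer breaks construction and is simply skipped by the title search. The keyword set and `findSection` are hypothetical; if no section matches, the analysis could fall back to requiring an example file.

```dart
class Section {
  final int level; // 0 for the synthetic root
  final String title; // '' for the root
  final List<String> content;
  final List<Section> children;
  Section(this.level, this.title, this.content, [this.children = const []]);
}

/// Hypothetical keywords; which titles to accept is still an open question.
const usageKeywords = {'usage', 'example', 'examples'};

/// Pre-order (document-order) search for the first section whose title
/// contains one of [keywords] as a whole word. The root itself is skipped.
Section? findSection(Section root, Set<String> keywords) {
  for (final child in root.children) {
    final words = RegExp(r'[a-z]+')
        .allMatches(child.title.toLowerCase())
        .map((m) => m.group(0)!);
    if (words.any(keywords.contains)) return child;
    final nested = findSection(child, keywords);
    if (nested != null) return nested;
  }
  return null;
}
```

`findSection(root, usageKeywords)` returns the first matching section or `null`; that section's `content` together with its `children` is the text that could stand in for the Example tab.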


4 participants