Node values (text) broken by sub-nodes #38

Open
helmersl opened this issue Jan 21, 2019 · 3 comments

@helmersl

Hi,

I'm using xml2json to parse publications and have the following problem:

If the text in the abstract node contains XML tags, the text following these sub-nodes is simply dropped from the JSON output. For example, this XML:

     <Abstract>
          <AbstractText>As photosynthetic prokaryotes, cyanobacteria can directly convert CO<sub>2</sub> to organic compounds and grow rapidly using sunlight as the sole source of energy. The direct biosynthesis of chemicals from CO<sub>2</sub> and sunlight in cyanobacteria is therefore theoretically more attractive than using glucose as carbon source in heterotrophic bacteria. To date, more than 20 different target chemicals have been synthesized from CO<sub>2</sub> in cyanobacteria. However, the yield and productivity of the constructed strains is about 100-fold lower than what can be obtained using heterotrophic bacteria, and only a few products reached the gram level. The main bottleneck in optimizing cyanobacterial cell factories is the relative complexity of the metabolism of photoautotrophic bacteria. In heterotrophic bacteria, energy metabolism is integrated with the carbon metabolism, so that glucose can provide both energy and carbon for the synthesis of target chemicals. By contrast, the energy and carbon metabolism of cyanobacteria are separated. First, solar energy is converted into chemical energy and reducing power via the light reactions of photosynthesis. Subsequently, CO<sub>2</sub> is reduced to organic compounds using this chemical energy and reducing power. Finally, the reduced CO<sub>2</sub> provides the carbon source and chemical energy for the synthesis of target chemicals and cell growth. Consequently, the unique nature of the cyanobacterial energy and carbon metabolism determines the specific metabolic engineering strategies required for these organisms. In this chapter, we will describe the specific characteristics of cyanobacteria regarding their metabolism of carbon and energy, summarize and analyze the specific strategies for the production of chemicals in cyanobacteria, and propose metabolic engineering strategies which may be most suitable for cyanobacteria.</AbstractText>
        </Abstract>

is converted to JSON as:

('Abstract',
                                          OrderedDict([('AbstractText',
                                                        OrderedDict([('$',
                                                                      'As photosynthetic prokaryotes, cyanobacteria can directly convert CO'),
                                                                     ('sub',
                                                                      [OrderedDict([('$',
                                                                                     2)]),
                                                                       OrderedDict([('$',
                                                                                     2)]),
                                                                       OrderedDict([('$',
                                                                                     2)]),
                                                                       OrderedDict([('$',
                                                                                     2)]),
                                                                       OrderedDict([('$',
                                                                                     2)])])]))]))

The text following each <sub> tag is lost because of the sub-tags.

Is there a way to fix this problem?

Thanks!
Lea

@sanand0
Owner

sanand0 commented Feb 10, 2019

@helmersl What would you like the output to be? Based on that, I can suggest if an alternate convention might help.

However, we may have a bigger problem. The XML <AbstractText>head<sub>text</sub>tail</AbstractText> has the following parts:

  1. tree.tag == 'AbstractText'
  2. tree.text == 'head'
  3. tree.getchildren()[0].tag == 'sub'
  4. tree.getchildren()[0].text == 'text'
  5. tree.getchildren()[0].tail == 'tail'

The last bit -- the "tail" -- is not converted to JSON in any of the conventions I know of. If we need to preserve that, we'll need to research a bit.
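To make the text/tail split concrete, here is a minimal sketch using the standard library's `xml.etree.ElementTree` (an assumption; the same attributes exist on lxml elements). It indexes the child with `tree[0]` rather than `getchildren()`, since `getchildren()` was removed from the standard library's `ElementTree` in Python 3.9. It also shows that `itertext()` can flatten the node to plain text if the markup itself does not need to be preserved.

```python
import xml.etree.ElementTree as ET

tree = ET.fromstring('<AbstractText>head<sub>text</sub>tail</AbstractText>')
child = tree[0]  # first child element, equivalent to tree.getchildren()[0]

# The five parts listed above, in the same order.
parts = (tree.tag, tree.text, child.tag, child.text, child.tail)
print(parts)

# itertext() yields text and tail strings in document order,
# so joining them recovers the full text with the tags stripped.
flat = ''.join(tree.itertext())
print(flat)
```

Flattening this way loses the `<sub>` markup, so it only helps when the goal is readable text rather than a faithful round trip.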

But for now, what would you like the output to be? Let's use that as a starting point, perhaps?

@mikessut

My use case is very similar. Probably the most desirable result would be to add a special JSON key (like '$') for the tail.

For my use case, I want to be able to recover the XML in its original form (for example, using the badgerfish.etree method).
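A rough sketch of what such a tail key could look like, written against the standard library's `xml.etree.ElementTree` rather than the library's own code. The key name `$tail` and the helpers `to_dict` and `from_dict` are hypothetical and not part of any existing convention. Attributes are ignored, and children are grouped by tag, so the relative order of differently named siblings would not survive.

```python
import xml.etree.ElementTree as ET

TAIL_KEY = '$tail'  # hypothetical key, not part of BadgerFish or any library


def to_dict(elem):
    """Convert an element to a BadgerFish-like dict that also records the tail."""
    node = {}
    if elem.text:
        node['$'] = elem.text
    if elem.tail:
        node[TAIL_KEY] = elem.tail
    for child in elem:
        # Group children by tag, always as a list, to keep the sketch simple.
        node.setdefault(child.tag, []).append(to_dict(child))
    return node


def from_dict(tag, node):
    """Rebuild an element from the dict produced by to_dict."""
    elem = ET.Element(tag)
    elem.text = node.get('$')
    elem.tail = node.get(TAIL_KEY)
    for key, children in node.items():
        if key in ('$', TAIL_KEY):
            continue
        for child in children:
            elem.append(from_dict(key, child))
    return elem


xml = '<AbstractText>convert CO<sub>2</sub> to organic compounds</AbstractText>'
root = ET.fromstring(xml)
data = {root.tag: to_dict(root)}
rebuilt = from_dict('AbstractText', data['AbstractText'])
roundtrip = ET.tostring(rebuilt, encoding='unicode')
print(data)
print(roundtrip)
```

With the tail stored on the child node, this single-tag example can be serialized back to the original string. A full solution would still need to handle attributes and mixed sibling order.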

@xgodon

xgodon commented Apr 1, 2019

I have the same problem. Maybe a list alternating text and objects could provide a good solution. For example,

     <text>this is a <b>nice</b> project</text>

would become

     text: ['this is a', {b: 'nice'}, 'project']
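The alternating-list idea can be sketched in a few lines with the standard library's `xml.etree.ElementTree`. The helper name `mixed_content` is hypothetical, and the input is assumed to be `<text>this is a <b>nice</b> project</text>`. Unlike the hand-written output above, this sketch keeps the whitespace around the child element, which is needed to rebuild the original text.

```python
import xml.etree.ElementTree as ET


def mixed_content(elem):
    """Return the element's content as a list alternating strings and {tag: content} dicts.

    An element containing only text collapses to that plain string.
    """
    parts = []
    if elem.text:
        parts.append(elem.text)
    for child in elem:
        parts.append({child.tag: mixed_content(child)})
        if child.tail:
            # The tail is the text between this child's end tag and the next sibling.
            parts.append(child.tail)
    if len(parts) == 1 and isinstance(parts[0], str):
        return parts[0]
    return parts


root = ET.fromstring('<text>this is a <b>nice</b> project</text>')
result = {root.tag: mixed_content(root)}
print(result)
```

Because the list preserves document order, this shape also copes with differently named siblings, at the cost of making child lookup by tag less direct.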

Repository owner deleted a comment from javadev May 15, 2023