[Feature] POC: openbb-store OBBject Extension For Data/Python Object Interchange #6509

Draft: wants to merge 43 commits into base: develop

deeleeramone
Contributor

@deeleeramone deeleeramone commented Jun 18, 2024

This is a WIP and POC, feedback is welcome.

The goal is to facilitate data and Python object interchange, particularly over networks, within the OpenBB Platform ecosystem.

Example Use-Case:

  • Run a script to collect data for a ticker hitting a number of endpoints.
  • Add each item, raw OBBject response or a filtered subset, to the Store.
  • Export the collection as a file to share, use later, or transport across a network.

Simple Use-Case:

  • Store lists of symbols to be used as function inputs, e.g., a watchlist.

Below is pasted from the README.md file.

OBBject Store Extension

openbb-store is an OBBject extension for storing and retrieving OBBjects, Data, DataFrames, dictionaries, lists, and strings.

Each entry is stored as a compressed pickle, with SHA1 signature, using the LZMA module with the "xz" algorithm set to maximum compression.
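
As a rough illustration of the storage format described above, the sketch below pickles an object, compresses it with xz at the highest standard preset, and pairs it with a SHA-1 digest that is checked before unpickling. This is not the extension's code: the function names pack_entry and unpack_entry, and the (digest, blob) layout, are assumptions made for illustration only.

```python
import hashlib
import lzma
import pickle


def pack_entry(obj):
    """Pickle, xz-compress, and sign an object. Returns (sha1_hex, blob)."""
    raw = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    # preset=9 is the highest standard xz compression level.
    blob = lzma.compress(raw, format=lzma.FORMAT_XZ, preset=9)
    return hashlib.sha1(blob).hexdigest(), blob


def unpack_entry(signature, blob):
    """Verify the digest first, then decompress and unpickle."""
    if hashlib.sha1(blob).hexdigest() != signature:
        raise ValueError("Signature mismatch: archive was modified or corrupted.")
    return pickle.loads(lzma.decompress(blob, format=lzma.FORMAT_XZ))


signature, blob = pack_entry({"symbol": "NVDA", "close": [14.3, 14.7]})
restored = unpack_entry(signature, blob)
```

A plain digest like this only detects accidental corruption. Because unpickling can execute arbitrary code, archives should only be loaded from trusted sources.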

Installation

Install this extension by navigating into the directory and entering:

pip install -e .

Then, rebuild the Python interface:

python -c "import openbb;openbb.build()"

Store Class

Within the OpenBB Platform, the extension acts as a Global class with methods to add, retrieve, and save groups of data objects to memory or file in a transportable and compressed format.

When used standalone, the user_data_directory property (preference) should be set to the desired read/write directory
upon initialization. Alternatively, specify the complete path to the file in the filename parameter of the IO methods.

Usage

Every output from the OpenBB Platform Python interface will have the store attribute.

Supported Data Types

The following is a list of supported data objects:

  • OBBject
  • Data (generic OpenBB Data class)
  • DataFrame
  • List
  • Dictionary
  • String

The contents of any object being added must be serializable.
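
Since entries are stored as pickles, one way to read "serializable" is "can be pickled". The helper below is a minimal sketch under that assumption; is_serializable is a hypothetical name and is not part of the extension.

```python
import pickle
import threading


def is_serializable(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, TypeError, AttributeError):
        return False
    return True


ok = is_serializable({"symbols": ["NVDA", "AMD"], "weights": [0.6, 0.4]})
bad = is_serializable({"lock": threading.Lock()})  # locks cannot be pickled
```

Running a check like this before adding an object surfaces problems at the point of insertion rather than at export time.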

Add Data

from openbb import obb

data = obb.equity.price.historical("NVDA", provider="yfinance", start_date="2023-01-01", end_date="2023-12-31")
data.store.add_store(data=data, name="nvda2023")

A confirmation will display unless the "verbose" property is set to False.

"Data store 'nvda2023' added successfully."

Additional data can be added to the collection, and then exported as a single package.

data = obb.equity.fundamental.metrics("NVDA", provider="yfinance")
data.store.add_store(data=data.to_df().set_index("symbol").T, name="nvdaMetrics", description="Key Valuation Metrics for NVDA.")

"Data store 'nvdaMetrics' added successfully."

Directory Of Objects

An inventory of stored objects is displayed with the 'directory' property.

data.store.directory
{'nvda2023': {'description': None,
  'data_class': 'OBBject',
  'schema_preview': "{'length': 250, 'fields_set': ['open', 'high', 'low', 'close', 'volume', 'split_..."},
 'nvdaMetrics': {'description': 'Key Valuation Metrics for NVDA.',
  'data_class': 'DataFrame',
  'schema_preview': "{'length': 34, 'width': 1, 'columns': Index(['NVDA'], dtype='object', name='symb..."}}

Schemas

Metadata related to the schema are stored independently of the actual data store.
Schemas are retrieved with the get_schema method, using the assigned 'name' as the key.

Example DataFrame schema:

data.store.get_schema("nvdaMetrics")
{'length': 34,
 'width': 1,
 'columns': Index(['NVDA'], dtype='object', name='symbol'),
 'index': Index(['market_cap', 'pe_ratio', 'forward_pe', 'peg_ratio', 'peg_ratio_ttm',
        'enterprise_to_ebitda', 'earnings_growth', 'earnings_growth_quarterly',
        'revenue_per_share', 'revenue_growth', 'enterprise_to_revenue',
        'quick_ratio', 'current_ratio', 'debt_to_equity', 'gross_margin',
        'operating_margin', 'ebitda_margin', 'profit_margin',
        'return_on_assets', 'return_on_equity', 'dividend_yield',
        'dividend_yield_5y_avg', 'payout_ratio', 'book_value', 'price_to_book',
        'enterprise_value', 'overall_risk', 'audit_risk', 'board_risk',
        'compensation_risk', 'shareholder_rights_risk', 'beta',
        'price_return_1y', 'currency'],
       dtype='object'),
 'types_map': symbol
 NVDA    object
 dtype: object}

Example Pydantic model schema:

data.store.get_schema("nvda2023")
{'length': 250,
 'fields_set': ['open',
  'high',
  'low',
  'close',
  'volume',
  'split_ratio',
  'dividend'],
 'data_model': {'additionalProperties': True,
  'description': 'Yahoo Finance Equity Historical Price Data.',
  'properties': {'date': {'anyOf': [{'format': 'date', 'type': 'string'},
     {'format': 'date-time', 'type': 'string'}],
    'description': 'The date of the data.',
    'title': 'Date'},
   'open': {'description': 'The open price.',
    'title': 'Open',
    'type': 'number'},
   'high': {'description': 'The high price.',
    'title': 'High',
    'type': 'number'},
   'low': {'description': 'The low price.', 'title': 'Low', 'type': 'number'},
   'close': {'description': 'The close price.',
    'title': 'Close',
    'type': 'number'},
   'volume': {'anyOf': [{'type': 'number'},
     {'type': 'integer'},
     {'type': 'null'}],
    'default': None,
    'description': 'The trading volume.',
    'title': 'Volume'},
   'vwap': {'anyOf': [{'type': 'number'}, {'type': 'null'}],
    'default': None,
    'description': 'Volume Weighted Average Price over the period.',
    'title': 'Vwap'},
   'split_ratio': {'anyOf': [{'type': 'number'}, {'type': 'null'}],
    'default': None,
    'description': 'Ratio of the equity split, if a split occurred.',
    'title': 'Split Ratio'},
   'dividend': {'anyOf': [{'type': 'number'}, {'type': 'null'}],
    'default': None,
    'description': 'Dividend amount (split-adjusted), if a dividend was paid.',
    'title': 'Dividend'}},
  'required': ['date', 'open', 'high', 'low', 'close'],
  'title': 'YFinanceEquityHistoricalData',
  'type': 'object'},
 'created_at': '2024-06-18 13:08:44.778360',
 'uid': '06671e94-d271-7d4f-8000-43094acbb703'}

Restore Data

Restore data from the Store extension by using the get_store method. The archive is validated against a signature before opening.

data.store.get_store("nvdaMetrics")
index                      NVDA
market_cap                 3335037648896.0
pe_ratio                   79.28655
forward_pe                 37.661114
peg_ratio                  1.04
peg_ratio_ttm              1.5532
enterprise_to_ebitda       67.277
earnings_growth            6.5
earnings_growth_quarterly  6.284
revenue_per_share          3.234
revenue_growth             2.621
enterprise_to_revenue      41.556
quick_ratio                2.877
current_ratio              3.529
debt_to_equity             22.866
gross_margin               0.75286
operating_margin           0.64925003
ebitda_margin              0.61768
profit_margin              0.53398
return_on_assets           0.49103
return_on_equity           1.15658
dividend_yield             0.00029999999
dividend_yield_5y_avg      0.0012
payout_ratio               0.0094
book_value                 1.998
price_to_book              67.85786
enterprise_value           3315066994688
overall_risk               7.0
audit_risk                 7.0
board_risk                 10.0
compensation_risk          1.0
shareholder_rights_risk    6.0
beta                       1.694
price_return_1y            2.149727
currency                   USD

When the stored object is an instance of OBBject, the element to retrieve can be isolated with the element parameter.
By default, it is "dataframe". When set as "OBBject", the object is restored in its original form.

data.store.get_store("nvda2023", element="OBBject")
OBBject

id: 06671e94-d271-7d4f-8000-43094acbb703
results: [{'date': datetime.date(2023, 1, 3), 'open': 14.85099983215332, 'high': 14...
provider: yfinance
warnings: None
chart: None
extra: {'metadata': {'arguments': {'provider_choices': {'provider': 'yfinance'}, 's...

Exporting/Importing

Any item(s) loaded into the extension can be exported to file as a ".xz" archive.
A list of "names" isolates specific objects for writing to disk. Without supplying names,
all entries are exported.

data.store.save_store_to_file(filename="nvda")

Importing works the same way, and a list of "names" can also be included to load only the desired elements.

data.store.load_store_from_file(filename="nvda")
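
The optional list of names on export and import can be pictured with the standard library alone. The sketch below is an assumption about the general approach, not the extension's implementation; save_entries and load_entries are hypothetical helpers, and the archive here is a single pickled dictionary inside an xz file.

```python
import lzma
import pickle
import tempfile
from pathlib import Path


def save_entries(entries, path, names=None):
    """Write the selected entries (all when names is None) to one .xz file."""
    selected = entries if names is None else {n: entries[n] for n in names}
    with lzma.open(path, "wb", preset=9) as f:
        pickle.dump(selected, f)


def load_entries(path, names=None):
    """Read the archive and return only the requested entries."""
    with lzma.open(path, "rb") as f:
        stored = pickle.load(f)
    return stored if names is None else {n: stored[n] for n in names}


entries = {"watchlist": ["NVDA", "AMD"], "note": "semis", "weights": {"NVDA": 0.6}}
with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "demo.xz"
    # Export two of the three entries, then read back one or all of them.
    save_entries(entries, archive, names=["watchlist", "weights"])
    subset = load_entries(archive, names=["watchlist"])
    everything = load_entries(archive)
```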

The default path can be overridden by including the complete path, beginning with "/", in the filename.
Do not include the file extension with the name.
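
The filename rule above can be sketched as a small path-resolution helper. This is only an illustration of the described behavior: resolve_store_path is a hypothetical function, the directories are made-up examples, and the real extension may resolve paths differently.

```python
from pathlib import PurePosixPath


def resolve_store_path(filename, default_dir):
    """Return the archive path for filename, appending the .xz extension."""
    path = PurePosixPath(filename)
    if not path.is_absolute():
        # Relative names are placed under the default directory.
        path = PurePosixPath(default_dir) / path
    return path.with_name(path.name + ".xz")


default = resolve_store_path("nvda", "/home/user/OpenBBUserData/stores")
override = resolve_store_path("/tmp/exports/nvda", "/home/user/OpenBBUserData/stores")

print(default)   # /home/user/OpenBBUserData/stores/nvda.xz
print(override)  # /tmp/exports/nvda.xz
```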

@deeleeramone deeleeramone added enhancement Enhancement platform OpenBB Platform v4 PRs for v4 labels Jun 18, 2024
@deeleeramone deeleeramone marked this pull request as draft June 18, 2024 20:47
@piiq
Contributor

piiq commented Jun 19, 2024

Hey

First of all this might be the best feature PR description in the history of this repo. Thank you for that.

After reading this description I would like to clarify a few things:

  1. The examples here show how you add data to the store of an obbject that you got in response after using a command. When restoring old stores, where do I get the obbject from?
  2. What is the core differentiator or value proposition of using the store versus aggregating .to_json() of the results into a dictionary and then saving it as a json file or saving the results into separate sheets of an excel workbook?

@deeleeramone
Contributor Author

1. The examples here show how you add data to the store of an obbject that you got in response after using a command. When restoring old stores, where do I get the obbject from?

It can come from memory or file. Only files that have been exported can be loaded back in. When an OBBject is stored, the entire class is pickled; it is restored by validating against the original signature and then calling OBBject.model_validate(restored_obbject).

2. What is the core differentiator or value proposition of using the store versus aggregating  .to_json() of the results into a dictionary and then saving it as a json file or saving the results into separate sheets of an excel workbook?

A major difference from aggregating .to_json() output is that this uses bytes, not strings, as the buffer/IO. Additionally, all of the logic is already applied, so one-liners are all that is required to dump/load collections.

The ability to 'bundle' various objects together as a single export, and maintain the state of an OBBject - with chart and methods etc - is another difference. LLMs could be fed context through curated stores, which can support function calling.
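
The difference in what survives a round trip can be seen with a single record. This is a generic standard-library sketch, not code from the extension; it uses a date value like the one in the historical price results above.

```python
import json
import pickle
from datetime import date

record = {"date": date(2023, 1, 3), "open": 14.85}

# JSON has no date type, so the value must be converted to text on the way out.
from_json = json.loads(json.dumps(record, default=str))

# Pickle round-trips the original Python object, methods included.
from_pickle = pickle.loads(pickle.dumps(record))

print(type(from_json["date"]).__name__)    # str
print(type(from_pickle["date"]).__name__)  # date
print(from_pickle["date"].year)            # 2023
```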

It is not an equivalent to saving the results into an excel workbook, but with Python in Excel, you could unpack the compressed store and access all the original Python objects.

Additionally, schemas for non-OBBject objects will be generated, with a map of {field:type} and dimensions.
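
For a list of dictionaries, a schema of that kind might look like the sketch below. The function describe_records and the exact keys it returns are assumptions for illustration; the extension's generated schemas may carry different fields.

```python
def describe_records(records):
    """Summarize a list of dicts: row count, field count, and a {field: type} map."""
    types_map = {}
    for row in records:
        for field, value in row.items():
            # Keep the first type seen for each field.
            types_map.setdefault(field, type(value).__name__)
    return {
        "length": len(records),
        "width": len(types_map),
        "types_map": types_map,
    }


schema = describe_records(
    [
        {"metric": "pe_ratio", "value": 79.28655},
        {"metric": "forward_pe", "value": 37.661114},
    ]
)
# schema == {"length": 2, "width": 2, "types_map": {"metric": "str", "value": "float"}}
```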

The essence is to be a gateway between deployed applications and the Platform.

@piiq
Contributor

piiq commented Jun 19, 2024

It can come from memory or file. Only files that have been exported can be loaded back in. When an OBBject is stored, the entire class is pickled; it is restored by validating against the original signature and then calling OBBject.model_validate(restored_obbject).

Can you show a command sequence, starting from launching Python through loading a store?

I understood your rationale for 2. In a nutshell, it's storing binary data vs. text.

@deeleeramone
Contributor Author

deeleeramone commented Jun 19, 2024

Can you show a command sequence, starting from launching Python through loading a store?

Yes, in the screenshot below, I have 2 environments. One is my regular OpenBB dev environment, the other is a brand new one with only openbb-core, openbb-fmp, openbb-store installed as packages. The environment on the right does not have the provider interface or any routers, it is just the bare packages.

On the left, I have assembled the three financial statements as a single archive, and then exported it to my OpenBBUserData folder.

On the right, I have loaded the file using the Store class directly - which also makes it operate as a local variable instead of global - and then unpacked the balance sheet item and applied the to_df() method.

from openbb import obb

balance_data = obb.equity.fundamental.balance("NVDA", provider="fmp", period="quarter")

# Assign the store to a variable to save keystrokes.
store = balance_data.store

store.add_store(data=balance_data, name="balance", description="NVDA Quarterly Balance Sheet Statements")
cash_data = obb.equity.fundamental.cash("NVDA", provider="fmp", period="quarter")
store.add_store(data=cash_data, name="cash", description="NVDA Quarterly Cash Flow Statements")
income_data = obb.equity.fundamental.income("NVDA", provider="fmp", period="quarter")
store.add_store(data=income_data, name="income", description="NVDA Quarterly Income Statements")
store.save_store_to_file("nvda_financials")

Then on the importing side:

from openbb_store.store import Store

store = Store()
# Use the full path to the file in standalone mode.
store.load_store_from_file("/Users/danglewood/NewOpenBBUserData/stores/nvda_financials")
balance_data = store.get_store("balance", element='OBBject')

Screenshot 2024-06-19 at 1 50 52 PM

@piiq
Contributor

piiq commented Jun 20, 2024

Then on the importing side:

This is very helpful, thank you.

Please consider allowing creation of a store without the need to pre-initialize an empty instance. Like using either store = Store(file="my/file/path") or a classmethod like store = Store.from_file(path="my/file/path")
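
The two construction styles suggested here could look roughly like this. MiniStore is a hypothetical stand-in written for illustration, not the Store class from this PR; its file and path parameters mirror the names used in the suggestion, and its file format (one pickled dictionary in an xz file) is an assumption.

```python
import lzma
import pickle
import tempfile
from pathlib import Path


class MiniStore:
    """Hypothetical stand-in for a store; holds named entries in a dict."""

    def __init__(self, file=None):
        self.entries = {}
        if file is not None:
            self.load(file)

    @classmethod
    def from_file(cls, path):
        """Alternate constructor: build a store directly from an archive."""
        return cls(file=path)

    def load(self, path):
        with lzma.open(path, "rb") as f:
            self.entries.update(pickle.load(f))

    def save(self, path):
        with lzma.open(path, "wb") as f:
            pickle.dump(self.entries, f)


with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "demo.xz"
    original = MiniStore()
    original.entries["watchlist"] = ["NVDA", "AMD"]
    original.save(archive)

    # Both forms avoid creating an empty instance and loading in a second step.
    via_init = MiniStore(file=archive)
    via_classmethod = MiniStore.from_file(archive)
```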

@deeleeramone
Contributor Author

Please consider allowing creation of a store without the need to pre-initialize an empty instance. Like using either store = Store(file="my/file/path") or a classmethod like store = Store.from_file(path="my/file/path")

Like so?

Screenshot 2024-06-20 at 10 18 13 PM

Contributor

@hjoaquim hjoaquim left a comment

I'm not sure if this should live under the Platform repo (vs another repo like openbb-forecast vs personal repo); but if we agree it's the right place, everything here looks good to me.

@IgorWounds
Contributor

I think that this PR should be in its own separate repo, as its own package. CC: @piiq

@deeleeramone
Contributor Author

I think that this PR should be in its own separate repo, as its own package. CC: @piiq

I have a couple more design considerations before I'd call it "ready". I can move it to another repo when I cross that milestone, and will leave this open for reference in the meantime.
