secfi is a free Python library that simplifies access to SEC (U.S. Securities and Exchange Commission) filings and performs basic web scraping of the retrieved documents.
- Installation
- Features
  1. `getCiks`: Full and up-to-date ticker/CIK securities DataFrame (10k+ tickers)
  2. `getFils`: Get a DataFrame with all SEC filings for a specific ticker
  3. `scrapLatest`: Retrieves plain text content of the latest SEC filing of a specified form type for a given company ticker
  4. `scrap`: Scrapes the raw text content of a given URL
  5. `secForms`: Provides a list of all available SEC form types
  6. `chunkText`: Splits a long text into evenly distributed chunks with overlap
- Notes
- License
You can try it in a free Colab notebook:
## Installation
pip install secfi
## Features
### 1. getCiks
Fetches a DataFrame of all company tickers and their corresponding Central Index Keys (CIKs).
import secfi
ciks = secfi.getCiks()
print(ciks.head())
Returns: A DataFrame with columns:
- `cik_str`: The raw CIK string.
- `title`: The company name.
- `cik`: The CIK padded to 10 digits (for SEC queries).
| ticker | cik_str | title | cik |
|--------|----------|-----------------------------|------------|
| NVDA | 1045810 | NVIDIA CORP | 0001045810 |
| AAPL | 320193 | Apple Inc. | 0000320193 |
| MSFT | 789019 | MICROSOFT CORP | 0000789019 |
| AMZN | 1018724 | AMAZON COM INC | 0001018724 |
| GOOGL | 1652044 | Alphabet Inc. | 0001652044 |
| ... | ... | ... | ... |
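
As the table shows, the `cik` column is the raw `cik_str` value left-padded with zeros to 10 digits. A minimal sketch of that padding is below; the helper name `pad_cik` is hypothetical and is not part of secfi.

```python
def pad_cik(cik_str) -> str:
    """Zero-pad a raw CIK to the 10-digit form shown in the `cik` column."""
    digits = str(cik_str).strip()
    if not digits.isdigit():
        raise ValueError(f"CIK must contain only digits, got {cik_str!r}")
    if len(digits) > 10:
        raise ValueError(f"CIK has more than 10 digits: {cik_str!r}")
    return digits.zfill(10)


print(pad_cik(320193))     # 0000320193
print(pad_cik("1045810"))  # 0001045810
```

This is handy when you only have the short CIK and need the padded form for an SEC query.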
### 2. getFils
Fetches recent filings for a specific company by its ticker.
import secfi
filings = secfi.getFils("AAPL")
print(filings.head())
Parameters:
- `ticker` (str): The company's ticker symbol.
Returns: A DataFrame like:
| filingDate | reportDate | form | filmNumber | size | isXBRL | url |
|------------|------------|---------|------------|---------|--------|--------------|
| 2024-11-01 | 2024-09-30 | 10-Q | 241416538 | 9185722 | 1 | sec.gov/... |
| 2024-08-02 | 2024-06-30 | 10-Q | 241168331 | 8114974 | 1 | sec.gov/... |
| 2024-05-01 | 2024-03-31 | 10-Q | 24899170 | 7428154 | 1 | sec.gov/... |
| 2024-04-11 | 2024-05-22 | DEF 14A | 24836785 | 8289378 | 1 | sec.gov/... |
| 2024-02-02 | 2023-12-31 | 10-K | 24588330 | 12110804| 1 | sec.gov/... |
| 2023-10-27 | 2023-09-30 | 10-Q | 231351529 | 7894342 | 1 | sec.gov/... |
| ... | ... | ... | ... | ... | ... | ... |
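
A common next step is to pick the most recent filing of one form type. The sketch below does this on plain dictionaries copied from a few rows of the table above, so it runs without network access; the helper `latest_of_form` is hypothetical. With the real DataFrame, and assuming the `form` column shown above, a boolean filter such as `filings[filings["form"] == "10-K"]` should select the same rows.

```python
from datetime import date

# Sample rows copied from the table above (only the columns used here).
rows = [
    {"filingDate": "2024-11-01", "form": "10-Q"},
    {"filingDate": "2024-08-02", "form": "10-Q"},
    {"filingDate": "2024-04-11", "form": "DEF 14A"},
    {"filingDate": "2024-02-02", "form": "10-K"},
]


def latest_of_form(rows, form):
    """Return the most recently filed row of the given form type, or None."""
    matches = [r for r in rows if r["form"] == form]
    if not matches:
        return None
    return max(matches, key=lambda r: date.fromisoformat(r["filingDate"]))


print(latest_of_form(rows, "10-Q")["filingDate"])  # 2024-11-01
print(latest_of_form(rows, "8-K"))                 # None
```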
### 3. scrapLatest
Retrieves the textual content of the latest SEC filing of a specific form type for a given ticker.
The SEC provides 165 different types of forms. The complete list is in the `sec_forms.csv` file (see `secForms`). Commonly used forms include:
- 10-K: Annual report that provides a comprehensive overview of the company's business and financial condition.
- 10-Q: Quarterly report that includes unaudited financial statements and provides a continuing view of the company's financial position.
- 8-K: Report used to announce major events that shareholders should know about (e.g., acquisitions, leadership changes).
- S-1: Registration statement for companies planning to go public with an initial public offering (IPO).
- S-3: Registration statement for secondary offerings or resales of securities.
- DEF 14A: Proxy statement used for shareholder meetings, including executive compensation and voting matters.
- 4: Statement of changes in beneficial ownership (insider trading disclosures).
- 3: Initial statement of beneficial ownership of securities (insider ownership).
- 6-K: Report submitted by foreign private issuers to disclose information provided to their home country's regulators. It covers quarterly or event-specific updates, serving a role similar to the 10-Q for U.S. companies.
- 13D: Filing by anyone acquiring more than 5% of a company's shares, detailing their intentions.
- 20-F: Annual report for foreign private issuers, equivalent to the 10-K for U.S. companies.
- 40-F: Annual report filed by certain Canadian companies under the U.S.-Canada Multijurisdictional Disclosure System.
- F-1: Registration statement for foreign companies planning an initial public offering (IPO) in the U.S.
- F-3: Registration statement for foreign companies conducting secondary offerings in the U.S.
- F-4: Registration statement for mergers, acquisitions, or business combinations involving foreign companies.
- CB: Filing required for tender offers, rights offerings, or business combinations involving foreign private issuers.
- 13F: Quarterly report by institutional investment managers disclosing equity holdings; it also applies to some foreign firms.
- 11-K: Annual report for employee stock purchase, savings, and similar plans.
- SD: Specialized disclosure report, often related to conflict minerals; it also applies to foreign private issuers with U.S. reporting obligations.
import secfi
secfi.scrapLatest("NVDA", "10-Q")
When calling the `scrapLatest("NVDA", "10-Q")` function, the returned dictionary might look like this:
{
    'filingDate': '2024-11-27',
    'reportDate': '2024-11-25',
    'form': '4',
    'filmNumber': '',
    'size': 4872,
    'isXBRL': 0,
    'url': 'https://www.sec.gov/Archives/edgar/data/0001045810/000104581024000318/xslF345X05/wk-form4_1732744744.xml',
    'acceptanceDateTime': '2024-11-27T16:59:12.000Z',
    'text': 'STATESSECURITIES AND EXCHANGE COMMISSIONWashington, D.C.\nFor the quarterly period ended October, 2024 OR TRANSITION REPORT PURSUANT TO SECTION 13 OR 15(d) OF THE SECURITIES EXCHANGE ACT OF 1934Commission File Number: 0-23985 NVIDIA CORPORATION(Exact name of registrant as specified in its charter) Delaware94-3177549(State or other jurisdiction of(I.R.S. Employerincorporation or organization)Identification No.)2788 San Tomas Expressway, Santa Clara, California95051(Address\xa0of principal executive offices)(Zip Code)(408) 486-2000 ....'
}
Parameters:
- `ticker` (str): The company's ticker symbol.
- `form` (str): The form type to retrieve (e.g., "10-K", "8-K").
Returns: dict, a dictionary containing details of the filing, including:
- `filingDate` (str): The date the filing was submitted.
- `reportDate` (str): The reporting period date.
- `form` (str): The type of SEC form (e.g., '10-K', '10-Q').
- `filmNumber` (str): The film number associated with the filing.
- `size` (int): The size of the filing in bytes.
- `isXBRL` (int): Whether the filing is in XBRL format (1 for yes, 0 for no).
- `url` (str): The URL of the filing.
- `text` (str): The text content of the filing if it was found and successfully scraped; otherwise, an empty string.
If the specified form is not found for the given ticker, an empty dictionary is returned.
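
Because a missing form yields an empty dictionary and a failed scrape yields an empty `text`, callers may want a fallback, for example trying 20-F when a foreign issuer has no 10-K. The sketch below shows one way to do that. `first_available` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for `secfi.scrapLatest` so the example runs offline.

```python
def first_available(fetch, ticker, forms):
    """Try each form type in order and return the first usable result.

    `fetch` is any callable shaped like fetch(ticker, form) -> dict,
    for example secfi.scrapLatest. Results that are empty dictionaries
    or that carry an empty "text" value are skipped.
    """
    for form in forms:
        result = fetch(ticker, form)
        if result and result.get("text"):
            return result
    return {}


# Offline stand-in for secfi.scrapLatest (hypothetical data).
def fake_fetch(ticker, form):
    if form == "20-F":
        return {"form": "20-F", "text": "annual report text"}
    return {}  # documented behaviour when the form is not found


result = first_available(fake_fetch, "XYZ", ["10-K", "20-F"])
print(result["form"])  # 20-F
```

Passing the fetch function as an argument keeps the fallback logic testable without touching the SEC website.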
### 4. scrap
Scrapes the textual content of a given URL.
import secfi
content = secfi.scrap("https://example.com")
print(content[:500])  # Preview the first 500 characters
Parameters:
- `url` (str): The URL to scrape.
- `timeout` (int): Timeout for the HTTP request (default is 15 seconds).
Returns: The cleaned text content of the URL or an error message if the request fails.
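
How secfi cleans the page is not described here. As a rough illustration of what turning HTML into cleaned text can involve, the sketch below uses only the standard library's `html.parser` to keep visible text and drop `<script>` and `<style>` contents. The names `VisibleText` and `html_to_text` are hypothetical, and this is not secfi's implementation.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped elements.
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


sample = "<html><style>p{color:red}</style><body><p>Form 10-Q</p><p>NVIDIA CORP</p></body></html>"
print(html_to_text(sample))  # Form 10-Q NVIDIA CORP
```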
### 5. secForms
Fetches a DataFrame of SEC forms and their details from the `sec_forms.csv` file located in the `info` directory.
import secfi
sec_forms = secfi.secForms()
print(sec_forms.head())
Returns: A DataFrame with columns:
- `Number`: The unique identifier for the form.
- `Description`: A brief description of the form.
- `Last Updated`: The last updated date of the form.
- `SEC Number`: The SEC-assigned identifier for the form.
- `Topic(s)`: Relevant topics associated with the form.
- `link`: A direct URL to the PDF version of the form.
| Number | Description | Last Updated | SEC Number | Topic(s) | link |
|--------|-------------|--------------|------------|----------|------|
| 1 | Application for registration or exemption from... | Feb. 1999 | SEC1935 | Self-Regulatory Organizations | |
| 1-A | Regulation A Offering Statement (PDF) | Sept. 2021 | SEC486 | Securities Act of 1933, Small Businesses | |
| 1-E | Notification under Regulation E (PDF) | Aug. 2001 | SEC1807 | Investment Company Act of 1940, Small Busin... | |
| ... | ... | ... | ... | ... | ... |
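
To illustrate how a table with these columns can be searched by topic, the sketch below parses a small CSV with the standard library. The CSV text is transcribed from the three rows shown above, including their truncated cells, and the assumption that `sec_forms.csv` uses exactly these header names is mine. `forms_by_topic` is a hypothetical helper.

```python
import csv
import io

# Three rows transcribed from the table above; truncated cells are kept as shown.
SAMPLE_CSV = """Number,Description,Last Updated,SEC Number,Topic(s),link
1,Application for registration or exemption from...,Feb. 1999,SEC1935,Self-Regulatory Organizations,
1-A,Regulation A Offering Statement (PDF),Sept. 2021,SEC486,"Securities Act of 1933, Small Businesses",
1-E,Notification under Regulation E (PDF),Aug. 2001,SEC1807,"Investment Company Act of 1940, Small Busin...",
"""


def forms_by_topic(csv_text, keyword):
    """Return the form numbers whose Topic(s) cell contains the keyword."""
    reader = csv.DictReader(io.StringIO(csv_text))
    needle = keyword.lower()
    return [row["Number"] for row in reader if needle in row["Topic(s)"].lower()]


print(forms_by_topic(SAMPLE_CSV, "small"))                   # ['1-A', '1-E']
print(forms_by_topic(SAMPLE_CSV, "Securities Act of 1933"))  # ['1-A']
```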
### 6. chunkText
Splits a long text into overlapping chunks of at most a specified length. The text is distributed evenly across the chunks, and the last chunk is appended to the previous one.
Parameters:
- `text` (str): The input text to split.
- `max_length` (int, optional): The maximum length of each chunk. Defaults to 10,000.
- `overlap` (int, optional): The number of overlapping characters between consecutive chunks. Defaults to 300.
Returns: dict, a dictionary containing the following keys:
- `total_chars` (int): The total number of characters in the input text.
- `max_length_config` (int): The adjusted maximum length for each chunk after recalculation.
- `total_chunks` (int): The total number of chunks generated.
- `chunks` (list): A list of text chunks.
import secfi
text = """
Se cierra Armani, el taco no, hace la personal y ahi se va, se va\n
Se viene Martínez para el gol y va el tercero y va el tercero\n
Y va el tercero y gol de River gol de River goooool
"""
res = secfi.chunkText(text, max_length=120, overlap=20)
chunks = res["chunks"]
print(f"Original text chars: {res['total_chars']}")
print(f"Total chunks: {res['total_chunks']}")
for i, chunk in enumerate(chunks):
    print(f"Chunk number: {i+1}\n{chunk}")
Output:
Original text chars: 190
Total chunks: 2
Chunk number: 1
Se cierra Armani, el taco no, hace la personal y ahi se va, se va Se viene Martínez para el gol y va
Chunk number: 2
nez para el gol y va el tercero y va el tercero Y va el tercero y gol de River gol de River goooool ol de River goooool
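
For readers curious how evenly sized, overlapping chunks can be computed, here is a simplified sketch. It is not secfi's implementation and does not reproduce the tail handling seen in the output above. It finds the fewest chunks that fit within `max_length`, then shrinks the chunk size so the chunks come out about equal. The function name `chunk_text` is hypothetical; the returned keys mirror the ones documented above.

```python
import string


def chunk_text(text, max_length=10_000, overlap=300):
    """Split text into roughly equal chunks that share `overlap` characters."""
    if not 0 <= overlap < max_length:
        raise ValueError("overlap must satisfy 0 <= overlap < max_length")
    total = len(text)
    if total <= max_length:
        size = max_length
        chunks = [text] if text else []
    else:
        # Fewest chunks that can cover the text given the overlap (ceiling division).
        n = -(-(total - overlap) // (max_length - overlap))
        # Shrink the chunk size so the n chunks are about the same length.
        size = -(-(total + (n - 1) * overlap) // n)
        step = size - overlap
        chunks = [text[i * step : i * step + size] for i in range(n)]
    return {
        "total_chars": total,
        "max_length_config": size,
        "total_chunks": len(chunks),
        "chunks": chunks,
    }


sample = string.ascii_lowercase + "0123"  # 30 characters
res = chunk_text(sample, max_length=12, overlap=2)
print(res["chunks"])  # ['abcdefghijkl', 'klmnopqrstuv', 'uvwxyz0123']
```

Each chunk starts `size - overlap` characters after the previous one, so consecutive chunks share exactly `overlap` characters.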
## Notes
- The library uses a custom User-Agent to comply with SEC API requirements.
- Ensure that requests to the SEC website respect their usage policies and rate limits.
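
If you make your own requests alongside secfi, a small throttle plus an explicit User-Agent is one way to stay polite. The sketch below uses only the standard library and sends nothing over the network. The class `Throttle`, the helper `build_request`, the rate of 5 requests per second, and the contact string are all placeholders of mine; check the SEC's current fair-access guidance for the actual limit and User-Agent format.

```python
import time
import urllib.request


class Throttle:
    """Enforce a minimum interval between successive calls to wait()."""

    def __init__(self, max_per_second, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._last = None

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def build_request(url, user_agent):
    """Build (but do not send) a request carrying an explicit User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


throttle = Throttle(max_per_second=5)  # placeholder rate, not an official limit
req = build_request("https://www.sec.gov/", "Example Name contact@example.com")
throttle.wait()  # call this before each urllib.request.urlopen(req)
print(req.get_header("User-agent"))  # Example Name contact@example.com
```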
## License
This project is open source and available under the MIT License.