opengraph-extractor
Extract Open Graph tags from an HTML document and return them in a simple JSON data structure. Specifically, look for the page's canonical site, title, URL, summary, and image.
Twitter Cards
This uses the Twitter Cards markup. It's taken from tags that look like <meta name="twitter:site" content="@minorthoughts">.
- twitter:site
- twitter:title
- twitter:url
- twitter:description
- twitter:image
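As a rough illustration of how these tags can be read, here is a minimal sketch that uses only Python's standard-library html.parser. It is not taken from extract.py; the names TwitterCardParser and extract_twitter_cards are hypothetical, and keeping the first value when a tag is repeated is an assumption.

```python
from html.parser import HTMLParser

# The five Twitter Card names this README lists.
TWITTER_NAMES = {
    "twitter:site",
    "twitter:title",
    "twitter:url",
    "twitter:description",
    "twitter:image",
}


class TwitterCardParser(HTMLParser):
    """Hypothetical helper: collects <meta name="twitter:..."> values."""

    def __init__(self):
        super().__init__()
        self.cards = {}

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <meta ... />.
        if tag != "meta":
            return
        attr = dict(attrs)
        name = attr.get("name")
        content = attr.get("content")
        if name in TWITTER_NAMES and content is not None:
            # Assumption: keep the first value if a tag appears twice.
            self.cards.setdefault(name, content)


def extract_twitter_cards(html):
    """Return a dict of the Twitter Card tags found in an HTML string."""
    parser = TwitterCardParser()
    parser.feed(html)
    return parser.cards


sample = '<head><meta name="twitter:site" content="@minorthoughts"></head>'
print(extract_twitter_cards(sample))  # {'twitter:site': '@minorthoughts'}
```

Matching against a fixed set of names means a misspelled tag in the page is silently ignored rather than reported.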
Facebook Open Graph
This uses the Open Graph protocol, created by Facebook. It's taken from tags that look like <meta property="og:site_name" content="Minor Thoughts"/>.
- og:site_name
- og:title
- og:url
- og:description
- og:image
- og:image:url
- og:image:secure_url
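Because three of these properties can carry an image, a reader of the tags has to decide which one wins. The sketch below shows one possible approach with the standard-library html.parser. OpenGraphParser and pick_image are hypothetical names, the example.com URL is a placeholder, and the lookup order (the order of the list above) is an assumption, not necessarily what extract.py does.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Hypothetical helper: collects <meta property="og:..."> values."""

    def __init__(self):
        super().__init__()
        self.props = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        prop = attr.get("property") or ""
        content = attr.get("content")
        if prop.startswith("og:") and content is not None:
            # Assumption: keep the first value if a property is repeated.
            self.props.setdefault(prop, content)


# Assumed preference order, following the list in this README.
IMAGE_KEYS = ("og:image", "og:image:url", "og:image:secure_url")


def pick_image(props):
    """Return the first non-empty image property, or None."""
    for key in IMAGE_KEYS:
        if props.get(key):
            return props[key]
    return None


sample = (
    '<meta property="og:site_name" content="Minor Thoughts"/>'
    '<meta property="og:image:secure_url" content="https://example.com/a.png"/>'
)
parser = OpenGraphParser()
parser.feed(sample)
print(parser.props["og:site_name"])  # Minor Thoughts
print(pick_image(parser.props))      # https://example.com/a.png
```

Note that Open Graph tags use the property attribute, while Twitter Cards use name, so the two cannot share a single attribute lookup.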
Google+ / Schema.org
This uses the Schema.org Article schema. It's taken from tags that look like <meta itemprop="name" content="New Prosecutors Are Reopening Old Cases Against Police Officers : Minor Thoughts"/>.
- publisher
- name
- headline
- description
- image
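To show how itemprop values might end up in the JSON structure, here is a minimal sketch using only the standard library. Everything about the output shape is an assumption: the keys site, title, summary, and image are borrowed from the description at the top of this README, the preference for headline over name is a guess, and ItempropParser and summarize are hypothetical names rather than functions from extract.py.

```python
import json
from html.parser import HTMLParser

# The itemprop values this README lists.
ITEMPROPS = {"publisher", "name", "headline", "description", "image"}


class ItempropParser(HTMLParser):
    """Hypothetical helper: collects <meta itemprop="..."> values."""

    def __init__(self):
        super().__init__()
        self.props = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        prop = attr.get("itemprop")
        content = attr.get("content")
        if prop in ITEMPROPS and content is not None:
            self.props.setdefault(prop, content)


def summarize(html):
    """Map itemprop values onto an assumed output shape."""
    parser = ItempropParser()
    parser.feed(html)
    p = parser.props
    return {
        "site": p.get("publisher"),
        # Assumption: prefer headline, fall back to name.
        "title": p.get("headline") or p.get("name"),
        "summary": p.get("description"),
        "image": p.get("image"),
    }


sample = (
    '<meta itemprop="name" content="New Prosecutors Are Reopening '
    'Old Cases Against Police Officers : Minor Thoughts"/>'
)
print(json.dumps(summarize(sample), indent=2))
```

Fields that are missing from the page come back as None, which json.dumps writes as null.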