Extract open graph tags from an HTML document and return them in a simple JSON data structure.

Joe Martin 239957af1a Ignore pew configuration		2022-05-10 16:22:54 -07:00
.gitignore	Ignore pew configuration	2022-05-10 16:22:54 -07:00
extract.py	Start downloading the HTML directly	2022-05-10 16:21:15 -07:00
LICENSE	Initial commit	2022-05-10 17:11:04 +00:00
README.md	Describe the tags that are being parsed out	2022-05-10 13:40:16 -07:00
requirements.txt	Start downloading the HTML directly	2022-05-10 16:21:15 -07:00

opengraph-extractor

Extract open graph tags from an HTML document and return them in a simple JSON data structure. Specifically, look for the canonical site, title, url, summary, and image.
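As a rough sketch of what that "simple JSON data structure" could look like, the snippet below serializes the five fields named above. The key names (`site`, `title`, `url`, `summary`, `image`) and the `to_json` helper are assumptions made for illustration; this README does not specify the exact output keys that extract.py produces.

```python
import json

# Hypothetical key names: the README lists the five fields,
# but not the exact keys used in the JSON output.
FIELDS = ("site", "title", "url", "summary", "image")


def to_json(values):
    """Serialize the five fields, using null for anything that was not found."""
    record = {field: values.get(field) for field in FIELDS}
    return json.dumps(record, indent=2)


if __name__ == "__main__":
    print(to_json({"site": "Minor Thoughts", "title": "Example post"}))
```

Emitting every key, even when its value is missing, keeps the structure predictable for whatever consumes it.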

Twitter Cards

This uses the Twitter Cards markup. It's taken from tags that look like <meta name="twitter:site" content="@minorthoughts">.

twitter:site
twitter:title
twitter:url
twitter:description
twitter:image
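A minimal sketch of how the Twitter Cards tags listed above might be collected, using only the standard library's `html.parser`. This is not necessarily how extract.py does it; the `TwitterCardParser` class and `extract_twitter_tags` function are hypothetical names, and keeping the first occurrence of a repeated tag is an assumption.

```python
from html.parser import HTMLParser

TWITTER_TAGS = {
    "twitter:site",
    "twitter:title",
    "twitter:url",
    "twitter:description",
    "twitter:image",
}


class TwitterCardParser(HTMLParser):
    """Collects <meta name="twitter:..." content="..."> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.tags = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name in TWITTER_TAGS and content is not None:
            # Assumption: if a tag is repeated, the first occurrence wins.
            self.tags.setdefault(name, content)


def extract_twitter_tags(html):
    parser = TwitterCardParser()
    parser.feed(html)
    return parser.tags
```

For example, `extract_twitter_tags('<meta name="twitter:site" content="@minorthoughts">')` returns a dictionary with a single `twitter:site` entry.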

Facebook Open Graph

This uses the Open Graph protocol, created by Facebook. It's taken from tags that look like <meta property="og:site_name" content="Minor Thoughts"/>.

og:site_name
og:title
og:url
og:description
og:image
og:image:url
og:image:secure_url
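The Open Graph tags differ from the Twitter ones in that they use the `property` attribute rather than `name`, and there are three candidate tags for the image. The sketch below shows one way to handle both points with the standard library. The names `OpenGraphParser`, `extract_og_tags`, and `pick_image` are hypothetical, and the preference order among the three image tags is an assumption, not something this README states.

```python
from html.parser import HTMLParser

OG_TAGS = {
    "og:site_name",
    "og:title",
    "og:url",
    "og:description",
    "og:image",
    "og:image:url",
    "og:image:secure_url",
}

# Assumed preference order: the HTTPS variant first, then the plain image tags.
IMAGE_PREFERENCE = ("og:image:secure_url", "og:image:url", "og:image")


class OpenGraphParser(HTMLParser):
    """Collects <meta property="og:..." content="..."> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.tags = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        prop = attributes.get("property")
        content = attributes.get("content")
        if prop in OG_TAGS and content is not None:
            self.tags.setdefault(prop, content)


def extract_og_tags(html):
    parser = OpenGraphParser()
    parser.feed(html)
    return parser.tags


def pick_image(tags):
    """Return the first image URL found, following IMAGE_PREFERENCE, or None."""
    for key in IMAGE_PREFERENCE:
        if key in tags:
            return tags[key]
    return None
```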

Google+ / Schema.org

This uses the Article schema. It's taken from tags that look like <meta itemprop="name" content="New Prosecutors Are Reopening Old Cases Against Police Officers : Minor Thoughts"/>.

publisher
name
headline
description
image
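The Schema.org properties above are read from the `itemprop` attribute. Here is a small sketch along the same lines, again using only the standard library. `ItempropParser` and `extract_itemprops` are hypothetical names, and the sketch only looks at `<meta>` tags of the form shown above; it does not attempt to handle microdata nested in other elements.

```python
from html.parser import HTMLParser

ARTICLE_PROPS = {"publisher", "name", "headline", "description", "image"}


class ItempropParser(HTMLParser):
    """Collects <meta itemprop="..." content="..."> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.props = {}

    def handle_starttag(self, tag, attrs):
        # Only <meta> tags are considered in this sketch.
        if tag != "meta":
            return
        attributes = dict(attrs)
        prop = attributes.get("itemprop")
        content = attributes.get("content")
        if prop in ARTICLE_PROPS and content is not None:
            self.props.setdefault(prop, content)


def extract_itemprops(html):
    parser = ItempropParser()
    parser.feed(html)
    return parser.props
```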