The Reddit Meme Graph with Neo4j

On Saturday night, after not enough drinks, I came across these tweets by @LeFloatingGhost.

This definitely looks like a meme graph. We can do that too.

Recorded Session

If you want to see me struggle to get this going live, watch my session here.

If you want to see an interactive version of this post, check it out at the Graph Gist Collection.

Find us some memes

There is this really nice CSV from Reddit of the top memes around:

https://github.com/umbrae/reddit-top-2.5-million/blob/master/data/memes.csv

We want to grab the raw URL: https://raw.githubusercontent.com/umbrae/reddit-top-2.5-million/master/data/memes.csv

And grab an empty Neo4j Sandbox from http://neo4jsandbox.com.

What’s the data like?

Check CSV

WITH 'https://raw.githubusercontent.com/umbrae/reddit-top-2.5-million/master/data/memes.csv' as url
LOAD CSV WITH HEADERS FROM url AS row
RETURN count(*);

╒══════════╕
│"count(*)"│
╞══════════╡
│"1000"    │
└──────────┘

WITH 'https://raw.githubusercontent.com/umbrae/reddit-top-2.5-million/master/data/memes.csv' as url
LOAD CSV WITH HEADERS FROM url AS row
RETURN row limit 3;

╒════════════════════════════════════════════════════════════════════════════════════════════════════╕
│"row"                                                                                               │
╞════════════════════════════════════════════════════════════════════════════════════════════════════╡
│{"over_18":"False","name":"t3_1edsw9","permalink":"http://www.reddit.com/r/memes/comments/1edsw9/can│
│_we_please_start_a_crazy_amy_meme_for_amy_of/","url":"http://www.quickmeme.com/meme/3uer85/","domain│
│":"quickmeme.com","distinguished":null,"score":"1831","downs":"1010","link_flair_css_class":null,"su│
│breddit_id":"t5_2qjpg","thumbnail":"http://b.thumbs.redditmedia.com/qpz4enS1CCFIs8Ys.jpg","id":"1eds│
│w9","author_flair_css_class":null,"link_flair_text":null,"selftext":null,"ups":"2841","num_comments"│
│:"120","edited":"False","title":"Can We Please Start a Crazy Amy Meme For Amy of Amy's Baking Compan│
│y?","created_utc":"1368627364.0","is_self":"False"}                                                 │
├────────────────────────────────────────────────────────────────────────────────────────────────────┤
...

Load them memes

WITH 'https://raw.githubusercontent.com/umbrae/reddit-top-2.5-million/master/data/memes.csv' as url
LOAD CSV WITH HEADERS FROM url AS row
WITH row LIMIT 10000
CREATE (m:Meme) SET m=row // we take it all into Meme nodes

Added 100 labels, created 100 nodes, set 1700 properties, statement completed in 120 ms.

Get some memes

MATCH (m:Meme) return m limit 25;

MATCH (m:Meme) return m.id, m.title limit 5;

╒════════╤════════════════════════════════════════════════════════════════════════════════╕
│"m.id"  │"m.title"                                                                       │
╞════════╪════════════════════════════════════════════════════════════════════════════════╡
│"1edsw9"│"Can We Please Start a Crazy Amy Meme For Amy of Amy's Baking Company?"         │
├────────┼────────────────────────────────────────────────────────────────────────────────┤
│"1ihc34"│"Given the competitive nature of redditors, I assume you all feel the same way."│
├────────┼────────────────────────────────────────────────────────────────────────────────┤
│"1gmt99"│"This man left this woman..."                                                   │
├────────┼────────────────────────────────────────────────────────────────────────────────┤
│"1ds9y4"│"How to cure bad breath..."                                                     │
├────────┼────────────────────────────────────────────────────────────────────────────────┤

But we want the words!

Let’s grab the first meme and get going.

Split the text into words.

MATCH (m:Meme) WITH m limit 1
RETURN split(m.title, " ") as words;

["Can","We","Please","Start","a","Crazy","Amy","Meme","For","Amy","of","Amy's","Baking","Company?"]

CAN YOU HEAR ME?

MATCH (m:Meme) WITH m limit 1
RETURN split(toUpper(m.title), " ") as words;

["CAN","WE","PLEASE","START","A","CRAZY","AMY","MEME","FOR","AMY","OF","AMY'S","BAKING","COMPANY?"]

Remove Punctuation

Create an array of punctuation characters by splitting a string on the empty string.

return split(",!?'.","") as chars;

[",","!","?","'","."]

And replace each of the characters with nothing ('')

with "a?b.c,d" as word
return word, reduce(s=word, c IN split(",!?'.","") | replace(s,c,'')) as no_chars;

╒═════════╤══════════╕
│"word"   │"no_chars"│
╞═════════╪══════════╡
│"a?b.c,d"│"abcd"    │
└─────────┴──────────┘
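For readers who want to check the reduce step outside the database, here is a minimal Python sketch of the same idea. It is not part of the original post; the names PUNCTUATION, strip_punctuation and title_to_words are made up for illustration, and it assumes the same five punctuation characters as the Cypher above.

```python
from functools import reduce

# Same characters as split(",!?'.", "") in the Cypher example
PUNCTUATION = ",!?'."


def strip_punctuation(text):
    # Start from the full text and remove one punctuation character per step,
    # mirroring reduce(s=word, c IN chars | replace(s, c, ''))
    return reduce(lambda s, c: s.replace(c, ""), PUNCTUATION, text)


def title_to_words(title):
    # Upper-case, strip punctuation, then split on single spaces
    return strip_punctuation(title.upper()).split(" ")


if __name__ == "__main__":
    print(strip_punctuation("a?b.c,d"))
    print(title_to_words("Can We Please Start a Crazy Amy Meme?"))
```

The accumulator plays the role of s in the Cypher reduce: each step takes the string produced by the previous step and removes one more character from it.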

We got us some nice words

MATCH (m:Meme) WITH m limit 1
// lets split the text into words
RETURN split(reduce(s=toUpper(m.title), c IN split(",!?'.","") | replace(s,c,'')), " ") as words;

╒═════════════════════════════════════════════════════════════════════════════════════════════════╕
│"words"                                                                                          │
╞═════════════════════════════════════════════════════════════════════════════════════════════════╡
│["CAN","WE","PLEASE","START","A","CRAZY","AMY","MEME","FOR","AMY","OF","AMYS","BAKING","COMPANY"]│
└─────────────────────────────────────────────────────────────────────────────────────────────────┘

Enough words, where are the nodes?

Let’s create some word nodes

(merge does get-or-create)

MATCH (m:Meme) WITH m limit 1
WITH split(reduce(s=toUpper(m.title), c IN split(",!?'.","") | replace(s,c,'')), " ") as words, m
MERGE (a:Word {text:words[0]})
MERGE (b:Word {text:words[1]});

Our first two words

MATCH (n:Word) RETURN n;

Unwind the ra(n)ge

But we want every word in the array, so let’s unwind a range of indexes.

MATCH (m:Meme) WITH m limit 1
WITH split(reduce(s=toUpper(m.title), c IN split(",!?'.","") | replace(s,c,'')), " ") as words, m
// turn the range into rows of idx
UNWIND range(0,size(words)-2) as idx
MERGE (a:Word {text:words[idx]})
MERGE (b:Word {text:words[idx+1]});

MATCH (n:Word) RETURN n;
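The index arithmetic is easy to get wrong, so here is a small Python sketch of what the UNWIND produces. It is an illustration only, not code from the original post, and adjacent_pairs is a hypothetical helper name. Cypher's range(0, n) includes both ends, so range(0, size(words)-2) yields every index that still has a following word.

```python
def adjacent_pairs(words):
    # Cypher's range(0, size(words)-2) is inclusive on both ends, which
    # corresponds to Python's range(len(words) - 1): every index idx for
    # which words[idx + 1] still exists.
    return [(words[idx], words[idx + 1]) for idx in range(len(words) - 1)]


if __name__ == "__main__":
    for a, b in adjacent_pairs(["CAN", "WE", "PLEASE", "START"]):
        print(a, "->", b)
```

One consequence worth noting: a one-word title gives an empty range, so it produces no rows and therefore no Word nodes.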

No Limits

MATCH (m:Meme) WITH m // no limits
WITH split(reduce(s=toUpper(m.title), c IN split(",!?'.","") | replace(s,c,'')), " ") as words, m
// turn the range into rows of idx
UNWIND range(0,size(words)-2) as idx
MERGE (a:Word {text:words[idx]})
MERGE (b:Word {text:words[idx+1]});

MATCH (n:Word) RETURN count(*);

Chain up the memes

Connect the words via :NEXT and store the meme ids on each relationship in an ids property.

For the first word (idx = 0), let’s also connect the Meme node to the first Word.

MATCH (m:Meme) WITH m
WITH split(reduce(s=toUpper(m.title), c IN split(",!?'.","") | replace(s,c,'')), " ") as words, m
// turn the range into rows of idx
UNWIND range(0,size(words)-2) as idx
MERGE (a:Word {text:words[idx]})
MERGE (b:Word {text:words[idx+1]})
// Connect the words via :NEXT and store the meme-ids on each rel in an `ids` property
MERGE (a)-[rel:NEXT]->(b)
// to later recreate the meme along the next chain
SET rel.ids = coalesce(rel.ids,[]) + [m.id]
// connect the first word to the meme itself
WITH * WHERE idx = 0
MERGE (m)-[:FIRST]->(a);

Set 546 properties, created 614 relationships, statement completed in 65 ms.
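To make the resulting data structure concrete, here is a rough in-memory model of the same graph in Python. It is a sketch under assumptions, not the author's code: build_chain, the dictionary layout and the sample ids "m1" and "m2" are all hypothetical. Each (word, next word) pair stands for one :NEXT relationship and its list of meme ids stands for rel.ids.

```python
from collections import defaultdict


def build_chain(memes):
    """memes maps a meme id to its list of cleaned, upper-cased words."""
    next_ids = defaultdict(list)  # (word, next_word) -> meme ids, like rel.ids
    first = {}                    # meme id -> first word, like (m)-[:FIRST]->(a)
    for meme_id, words in memes.items():
        for idx in range(len(words) - 1):
            # MERGE keeps one relationship per word pair; every row appends its id
            next_ids[(words[idx], words[idx + 1])].append(meme_id)
            if idx == 0:
                first[meme_id] = words[0]
    return dict(next_ids), first


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the CSV
    sample = {"m1": ["I", "LIKE", "CATS"], "m2": ["I", "LIKE", "DOGS"]}
    next_ids, first = build_chain(sample)
    print(next_ids)
    print(first)
```

Two memes that share a word pair share one entry, and that entry carries both ids. That shared list is what lets a single meme be picked out of the tangle later.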

Yay done!

MATCH (m:Meme)-[:FIRST]->(w:Word)-[:NEXT]->(w2:Word)
RETURN * LIMIT 33;

Which words appear most often?

MATCH (w:Word) WHERE size(w.text) > 4
RETURN w.text, size( (w)--() ) as relCount
ORDER BY relCount DESC LIMIT 10;

╒══════════════════╤══════════╕
│"w"               │"relCount"│
╞══════════════════╪══════════╡
│{"text":"AFTER"}  │"56"      │
├──────────────────┼──────────┤
│{"text":"REDDIT"} │"34"      │
├──────────────────┼──────────┤
│{"text":"ABOUT"}  │"33"      │
├──────────────────┼──────────┤
│{"text":"TODAY"}  │"33"      │
├──────────────────┼──────────┤
│{"text":"SCUMBAG"}│"32"      │
├──────────────────┼──────────┤
│{"text":"EVERY"}  │"31"      │
├──────────────────┼──────────┤
│{"text":"FIRST"}  │"30"      │
├──────────────────┼──────────┤
│{"text":"ALWAYS"} │"28"      │
├──────────────────┼──────────┤
│{"text":"FRIEND"} │"27"      │
├──────────────────┼──────────┤
│{"text":"THOUGHT"}│"24"      │
└──────────────────┴──────────┘

Now let’s find our memes again

// first meme
MATCH (m:Meme) WITH m limit 1
// from the :FIRST :Word follow the :NEXT chain
MATCH path = (m)-[:FIRST]->(w)-[rels:NEXT*..15]->()
// let's follow the chain of words starting
// from the meme, where all relationships contain the meme-id
WHERE ALL(r in rels WHERE m.id IN r.ids)
RETURN *;

Show meme by id

We can also look up a meme from the CSV list by its id,

e.g. id '1kc9p2': 'As stupid as memes are they can actually make valid points'

MATCH (m:Meme) WHERE m.id = '1kc9p2'
MATCH path = (m)-[:FIRST]->(w)-[rels:NEXT*..15]->()
WHERE ALL(r in rels WHERE m.id IN r.ids)
RETURN *;
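The filter on the relationship ids is what keeps the walk on one meme's own words. As a rough illustration (not from the original post), the Python sketch below follows the chain over a plain dictionary model. The function rebuild_title, the dictionary layout and the sample ids are hypothetical. Unlike the Cypher query, which returns every matching path, this sketch returns only the first walk it finds, so a title that repeats a word can come back in the wrong order.

```python
def rebuild_title(meme_id, first, next_ids, max_hops=15):
    """Follow NEXT pairs whose id list contains meme_id, starting at the first word.

    first:    meme id -> first word
    next_ids: (word, next_word) -> list of meme ids
    """
    words = [first[meme_id]]
    used = set()  # each pair is followed at most once
    for _ in range(max_hops):
        candidates = [
            pair for pair, ids in next_ids.items()
            if pair[0] == words[-1] and meme_id in ids and pair not in used
        ]
        if not candidates:
            break
        pair = candidates[0]
        used.add(pair)
        words.append(pair[1])
    return words


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the CSV
    next_ids = {
        ("I", "LIKE"): ["m1", "m2"],
        ("LIKE", "CATS"): ["m1"],
        ("LIKE", "DOGS"): ["m2"],
    }
    first = {"m1": "I", "m2": "I"}
    print(" ".join(rebuild_title("m2", first, next_ids)))
```

The max_hops default of 15 mirrors the upper bound in the NEXT*..15 pattern.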

Done. Enjoy!

PS: If you want to connect your own stuff, grab a Neo4j Sandbox or use Neo4j on your machine.

If you have questions, ask me, Michael, on Twitter or on Slack.
