Saturday night, after not enough drinks, I came across these tweets by @LeFloatingGhost.

This definitely looks like a meme graph. We can do that too.

Recorded Session
If you want to see me struggle to get this going live, watch my session here.

If you want to see an interactive version of this post, check it out at the Graph Gist Collection.

Find us some memes

There is this really nice CSV from Reddit of the top memes around:
https://github.com/umbrae/reddit-top-2.5-million/blob/master/data/memes.csv
We want to grab the raw URL: https://raw.githubusercontent.com/umbrae/reddit-top-2.5-million/master/data/memes.csv
And grab an empty Neo4j Sandbox from http://neo4jsandbox.com .
What's the data like? Check the CSV:

WITH 'https://raw.githubusercontent.com/umbrae/reddit-top-2.5-million/master/data/memes.csv' as url
LOAD CSV WITH HEADERS FROM url AS row
RETURN count(*);

╒══════════╕
│"count(*)"│
╞══════════╡
│"1000"    │
└──────────┘

WITH 'https://raw.githubusercontent.com/umbrae/reddit-top-2.5-million/master/data/memes.csv' as url
LOAD CSV WITH HEADERS FROM url AS row
RETURN row limit 3;

{"over_18":"False","name":"t3_1edsw9","permalink":"http://www.reddit.com/r/memes/comments/1edsw9/can_we_please_start_a_crazy_amy_meme_for_amy_of/","url":"http://www.quickmeme.com/meme/3uer85/","domain":"quickmeme.com","distinguished":null,"score":"1831","downs":"1010","link_flair_css_class":null,"subreddit_id":"t5_2qjpg","thumbnail":"http://b.thumbs.redditmedia.com/qpz4enS1CCFIs8Ys.jpg","id":"1edsw9","author_flair_css_class":null,"link_flair_text":null,"selftext":null,"ups":"2841","num_comments":"120","edited":"False","title":"Can We Please Start a Crazy Amy Meme For Amy of Amy's Baking Company?","created_utc":"1368627364.0","is_self":"False"}
...
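LOAD CSV WITH HEADERS gives Cypher each row as a map keyed by the header line. Every value arrives as a string (note "score":"1831"), and empty fields show up as null. The sketch below mimics that behaviour offline with Python's csv.DictReader; it is plain Python, not something you run in Neo4j. Two parts of it are assumptions: the SAMPLE string is a hypothetical excerpt built from the row shown above, and mapping empty fields to None is modelled on the null values in that output.

```python
import csv
import io
from typing import Dict, List, Optional


def load_csv_with_headers(text: str) -> List[Dict[str, Optional[str]]]:
    """Parse CSV text into one dict per row, keyed by the header line.

    Values stay strings (as with LOAD CSV); empty fields become None,
    mirroring the null values in the Cypher output.
    """
    reader = csv.DictReader(io.StringIO(text))
    return [
        {key: (value if value != "" else None) for key, value in row.items()}
        for row in reader
    ]


# Hypothetical excerpt containing a few of the memes.csv columns.
SAMPLE = (
    "id,title,score,distinguished\n"
    "1edsw9,\"Can We Please Start a Crazy Amy Meme For Amy of Amy's Baking Company?\",1831,\n"
)

rows = load_csv_with_headers(SAMPLE)
print(len(rows))                  # 1
print(rows[0]["score"])           # 1831 (a string, not a number)
print(rows[0]["distinguished"])   # None
```

Because everything is a string, numeric work in Cypher needs a conversion such as toInteger(row.score).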
Load them memes

WITH 'https://raw.githubusercontent.com/umbrae/reddit-top-2.5-million/master/data/memes.csv' as url
LOAD CSV WITH HEADERS FROM url AS row
WITH row LIMIT 10000
CREATE (m:Meme) SET m=row // we take it all into Meme nodes

Added 100 labels, created 100 nodes, set 1700 properties, statement completed in 120 ms.
Get some memes

MATCH (m:Meme) return m limit 25;

MATCH (m:Meme) return m.id, m.title limit 5;

╒════════╤════════════════════════════════════════════════════════════════════════════════╕
│"m.id"  │"m.title"                                                                       │
╞════════╪════════════════════════════════════════════════════════════════════════════════╡
│"1edsw9"│"Can We Please Start a Crazy Amy Meme For Amy of Amy's Baking Company?"         │
├────────┼────────────────────────────────────────────────────────────────────────────────┤
│"1ihc34"│"Given the competitive nature of redditors, I assume you all feel the same way."│
├────────┼────────────────────────────────────────────────────────────────────────────────┤
│"1gmt99"│"This man left this woman..."                                                   │
├────────┼────────────────────────────────────────────────────────────────────────────────┤
│"1ds9y4"│"How to cure bad breath..."                                                     │
├────────┼────────────────────────────────────────────────────────────────────────────────┤
...
But we want the words! Let's grab the first meme and get going.
Split the text into words.

MATCH (m:Meme) WITH m limit 1
RETURN split(m.title, " ") as words;

["Can","We","Please","Start","a","Crazy","Amy","Meme","For","Amy","of","Amy's","Baking","Company?"]

CAN YOU HEAR ME?

MATCH (m:Meme) WITH m limit 1
RETURN split(toUpper(m.title), " ") as words;

["CAN","WE","PLEASE","START","A","CRAZY","AMY","MEME","FOR","AMY","OF","AMY'S","BAKING","COMPANY?"]

Remove Punctuation

Create an array of punctuation characters by splitting on the empty string:
return split(",!?'.","") as chars;

[",","!","?","'","."]

And replace each of the characters with nothing (''):

with "a?b.c,d" as word
return word, reduce(s=word, c IN split(",!?'.","") | replace(s,c,'')) as no_chars;

╒═════════╤══════════╕
│"word"   │"no_chars"│
╞═════════╪══════════╡
│"a?b.c,d"│"abcd"    │
└─────────┴──────────┘
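The uppercase-and-split step and the punctuation fold above combine into one normalisation. Here is a plain-Python sketch of the same logic for trying it offline; it is not part of the original Cypher walkthrough. The function names strip_punctuation and title_to_words are hypothetical, and Python's str.upper is assumed to behave like toUpper for these ASCII titles.

```python
from functools import reduce
from typing import List

PUNCTUATION = ",!?'."


def strip_punctuation(text: str) -> str:
    """Delete every character of PUNCTUATION from text.

    Same fold as reduce(s = text, c IN split(",!?'.", "") | replace(s, c, '')):
    the accumulator starts as the text and each step removes one character.
    """
    # list(PUNCTUATION) plays the role of split(",!?'.", "")
    return reduce(lambda s, c: s.replace(c, ""), list(PUNCTUATION), text)


def title_to_words(title: str) -> List[str]:
    """Uppercase the title, strip punctuation, then split on single spaces."""
    return strip_punctuation(title.upper()).split(" ")


print(strip_punctuation("a?b.c,d"))  # abcd
print(title_to_words("Can We Please Start a Crazy Amy Meme For Amy of Amy's Baking Company?"))
```

In Python, str.translate with a deletion table would be the more idiomatic way to drop characters; reduce is used here only to mirror the Cypher expression. One thing the sketch makes visible: a token made only of these characters (such as "...") turns into an empty string. In the Cypher version, that empty string would become a Word node with text "".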
We got us some nice words:

MATCH (m:Meme) WITH m limit 1
// lets split the text into words
RETURN split(reduce(s=toUpper(m.title), c IN split(",!?'.","") | replace(s,c,'')), " ") as words;

["CAN","WE","PLEASE","START","A","CRAZY","AMY","MEME","FOR","AMY","OF","AMYS","BAKING","COMPANY"]

Enough words, where are the nodes? Let's create some word nodes (MERGE does get-or-create):
MATCH (m:Meme) WITH m limit 1
WITH split(reduce(s=toUpper(m.title), c IN split(",!?'.","") | replace(s,c,'')), " ") as words, m
MERGE (a:Word {text:words[0]})
MERGE (b:Word {text:words[1]});

Our first two words:

MATCH (n:Word) RETURN n;
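MERGE looks for the whole pattern and creates it only if nothing matches, so re-running the query above leaves the same two Word nodes in place. The sketch below models that get-or-create effect in plain Python, with a dict standing in for the graph. The helper merge_word and the dict-based store are hypothetical illustrations of the semantics, not how Neo4j implements MERGE.

```python
from typing import Dict, List


def merge_word(store: Dict[str, dict], text: str) -> dict:
    """Get-or-create: return the node stored under text, creating it first if missing."""
    if text not in store:
        store[text] = {"label": "Word", "text": text}
    return store[text]


words: List[str] = ["CAN", "WE", "PLEASE"]
store: Dict[str, dict] = {}

# Run the "query" twice: the second pass finds both nodes and creates nothing.
for _ in range(2):
    merge_word(store, words[0])
    merge_word(store, words[1])

print(sorted(store))  # ['CAN', 'WE']
```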
Unwind the ra(n)ge
But we want all the words in the array, so let's unwind a range of indexes.
MATCH (m:Meme) WITH m limit 1
WITH split(reduce(s=toUpper(m.title), c IN split(",!?'.","") | replace(s,c,'')), " ") as words, m
UNWIND range(0,size(words)-2) as idx // turn the range into rows of idx
MERGE (a:Word {text:words[idx]})
MERGE (b:Word {text:words[idx+1]});

MATCH (n:Word) RETURN n;
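The UNWIND turns each index into a row that touches one consecutive word pair. Here is a plain-Python sketch of the same indexing, where the name word_pairs is hypothetical. Note the off-by-one translation: Cypher's range(0, size(words)-2) includes its end value, whereas Python's range(0, n-1) excludes its end, so both yield idx = 0 .. n-2. The word list is the first ten words of the first title.

```python
from typing import List, Set, Tuple


def word_pairs(words: List[str]) -> List[Tuple[str, str]]:
    """Consecutive (words[idx], words[idx + 1]) pairs for idx = 0 .. len(words) - 2."""
    return [(words[idx], words[idx + 1]) for idx in range(0, len(words) - 1)]


words = ["CAN", "WE", "PLEASE", "START", "A", "CRAZY", "AMY", "MEME", "FOR", "AMY"]
pairs = word_pairs(words)

# Each pair MERGEs two Word nodes; a set shows which distinct nodes result
# ("AMY" appears twice but yields one node).
nodes: Set[str] = {w for pair in pairs for w in pair}
print(len(pairs), len(nodes))  # 9 9
```

In this sketch, a one-word title produces no pairs and therefore no Word nodes.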
No Limits

MATCH (m:Meme) WITH m // no limits
WITH split(reduce(s=toUpper(m.title), c IN split(",!?'.","") | replace(s,c,'')), " ") as words, m
UNWIND range(0,size(words)-2) as idx // turn the range into rows of idx
MERGE (a:Word {text:words[idx]})
MERGE (b:Word {text:words[idx+1]});
MATCH (n:Word) RETURN count(*);
Chain up the memes

Connect the words via :NEXT and store the meme ids on each relationship in an ids property.
And for the first word (idx = 0), let's also connect the Meme node to the first Word.
MATCH (m:Meme) WITH m
WITH split(reduce(s=toUpper(m.title), c IN split(",!?'.","") | replace(s,c,'')), " ") as words, m
UNWIND range(0,size(words)-2) as idx // turn the range into rows of idx
MERGE (a:Word {text:words[idx]})
MERGE (b:Word {text:words[idx+1]})
// Connect the words via :NEXT and store the meme-ids on each rel in an `ids` property
MERGE (a)-[rel:NEXT]->(b)
SET rel.ids = coalesce(rel.ids,[]) + [m.id] // to later recreate the meme along the next chain
// connect the first word to the meme itself
WITH * WHERE idx = 0
MERGE (m)-[:FIRST]->(a);

Set 546 properties, created 614 relationships, statement completed in 65 ms.
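This query creates one :NEXT relationship per distinct word pair and appends every meme id that uses that pair. Here is a plain-Python model of the resulting structure. The function build_chains, the dict layout and the two example memes ("m1", "m2") are hypothetical, chosen only to show two titles sharing a pair.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def build_chains(memes: Dict[str, List[str]]) -> Tuple[Dict[Pair, List[str]], Dict[str, str]]:
    """Model the :NEXT / :FIRST step with dicts.

    next_ids maps (word, next_word) to the meme ids using that pair, like
    rel.ids = coalesce(rel.ids, []) + [m.id]; first maps meme id -> first word.
    """
    next_ids: Dict[Pair, List[str]] = defaultdict(list)
    first: Dict[str, str] = {}
    for meme_id, words in memes.items():
        for idx in range(len(words) - 1):
            a, b = words[idx], words[idx + 1]
            next_ids[(a, b)].append(meme_id)  # one entry per pair, ids accumulate
            if idx == 0:
                first[meme_id] = a  # (m)-[:FIRST]->(a)
    return dict(next_ids), first


memes = {"m1": ["I", "LIKE", "CATS"], "m2": ["I", "LIKE", "DOGS"]}
next_ids, first = build_chains(memes)
print(next_ids[("I", "LIKE")])  # ['m1', 'm2']
print(first)                    # {'m1': 'I', 'm2': 'I'}
```

In the sketch and in the query alike, a one-word title gets no FIRST link. In the Cypher case this is because UNWIND over an empty range produces no rows.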
Yay, done!

MATCH (m:Meme)-[:FIRST]->(w:Word)-[:NEXT]->(w2:Word) RETURN * LIMIT 33;
Which words appear most often? Here that is measured by how many relationships each word node longer than four characters has:

MATCH (w:Word) WHERE size(w.text) > 4
RETURN w, size( (w)--() ) as relCount
ORDER BY relCount DESC LIMIT 10;
╒══════════════════╤══════════╕
│"w"               │"relCount"│
╞══════════════════╪══════════╡
│{"text":"AFTER"}  │"56"      │
├──────────────────┼──────────┤
│{"text":"REDDIT"} │"34"      │
├──────────────────┼──────────┤
│{"text":"ABOUT"}  │"33"      │
├──────────────────┼──────────┤
│{"text":"TODAY"}  │"33"      │
├──────────────────┼──────────┤
│{"text":"SCUMBAG"}│"32"      │
├──────────────────┼──────────┤
│{"text":"EVERY"}  │"31"      │
├──────────────────┼──────────┤
│{"text":"FIRST"}  │"30"      │
├──────────────────┼──────────┤
│{"text":"ALWAYS"} │"28"      │
├──────────────────┼──────────┤
│{"text":"FRIEND"} │"27"      │
├──────────────────┼──────────┤
│{"text":"THOUGHT"}│"24"      │
└──────────────────┴──────────┘
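Because each :NEXT relationship is MERGEd once per word pair, relCount counts relationships rather than raw occurrences. It includes incoming and outgoing :NEXT links plus :FIRST links from memes. Here is a plain-Python sketch of that ranking. The helpers word_degrees and top_words and the example relationships are hypothetical, not taken from the real data set.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def word_degrees(next_pairs: Iterable[Tuple[str, str]], first_words: Iterable[str]) -> Counter:
    """Relationships per Word node: each :NEXT touches two words, each :FIRST ends at one."""
    degree: Counter = Counter()
    for a, b in next_pairs:
        degree[a] += 1
        degree[b] += 1
    for w in first_words:
        degree[w] += 1
    return degree


def top_words(degree: Counter, longer_than: int = 4, limit: int = 10) -> List[Tuple[str, int]]:
    """Like WHERE size(w.text) > 4 ... ORDER BY relCount DESC LIMIT 10."""
    candidates = [(w, n) for w, n in degree.items() if len(w) > longer_than]
    candidates.sort(key=lambda item: item[1], reverse=True)
    return candidates[:limit]


# Hypothetical relationships for illustration only.
next_pairs = [("SCUMBAG", "STEVE"), ("SCUMBAG", "BRIAN"), ("GOOD", "GUY"), ("STEVE", "BORROWS")]
first_words = ["SCUMBAG", "SCUMBAG", "GOOD"]
print(top_words(word_degrees(next_pairs, first_words), limit=2))
# [('SCUMBAG', 4), ('STEVE', 2)]
```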
Now let's find our memes again:

// first meme
MATCH (m:Meme) WITH m limit 1
// from the :FIRST :Word follow the :NEXT chain
MATCH path = (m)-[:FIRST]->(w)-[rels:NEXT*..15]->()
// let's follow the chain of words starting
// from the meme, where all relationships contain the meme-id
WHERE ALL(r in rels WHERE m.id IN r.ids)
RETURN *;
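The variable-length pattern *..15 matches every chain of 1 to 15 :NEXT hops whose relationships all carry the meme id, so RETURN * yields each prefix as its own path. Cypher never reuses the same relationship within one matched path. The longest path reads the title back, although a repeated word can branch. Below is a plain-Python sketch of that enumeration; the function meme_paths and the example data are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def meme_paths(first_word: str, next_ids: Dict[Edge, List[str]],
               meme_id: str, max_hops: int = 15) -> List[List[str]]:
    """All word paths of 1..max_hops hops from first_word where every hop
    carries meme_id; an edge is used at most once per path."""
    results: List[List[str]] = []

    def walk(path: List[str], used: Set[Edge]) -> None:
        if len(path) > 1:
            results.append(list(path))  # every prefix is a match, like *..15
        if len(path) - 1 == max_hops:
            return
        for (a, b), ids in next_ids.items():
            if a == path[-1] and meme_id in ids and (a, b) not in used:
                used.add((a, b))
                path.append(b)
                walk(path, used)
                path.pop()
                used.remove((a, b))

    walk([first_word], set())
    return results


next_ids = {("I", "LIKE"): ["m1", "m2"], ("LIKE", "CATS"): ["m1"], ("LIKE", "DOGS"): ["m2"]}
paths = meme_paths("I", next_ids, "m1")
print(paths)                          # [['I', 'LIKE'], ['I', 'LIKE', 'CATS']]
print(" ".join(max(paths, key=len)))  # I LIKE CATS
```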
Show meme by id
We can also look up any meme from the CSV list by its id, e.g. id '1kc9p2', 'As stupid as memes are they can actually make valid points':
MATCH (m:Meme) WHERE m.id = '1kc9p2'
MATCH path = (m)-[:FIRST]->(w)-[rels:NEXT*..15]->()
WHERE ALL(r in rels WHERE m.id IN r.ids)
RETURN *;
Done. Enjoy!
PS: If you want to connect your own stuff, grab a Neo4j Sandbox or use Neo4j on your machine.
If you have questions, ask me, Michael, on Twitter or on Slack