Film dataset


This dataset covers most films and actors in Wikipedia, plus a set of films from IMDB. The film graph and actor graph are extracted from DBpedia 2014 and YAGO 3.0.2 and are represented as RDF triples, each consisting of a source entity, a target entity, and a labeled edge. A separate film JSON document was crawled from IMDB through the OMDB API.

In this dataset, the file named "mix graph" presents films and actors together. To present them separately as well, the files named "film graph" and "actor graph" list every film or actor as the subject of its triples. A "-" prefix on a predicate marks a reversed edge: the subject shown in the file is the target of the original triple. Note that there may be some duplication in the film graph and actor graph.
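
The "-" convention can be undone when loading the files. The following is a minimal sketch, assuming each line holds three whitespace-separated fields (subject, predicate, object) as in the sample data. The names Triple, parse_line, and load_unique are illustrative and not part of the dataset, and the two input lines are made-up examples in the same style as the samples.

```python
from typing import Iterable, NamedTuple, Set


class Triple(NamedTuple):
    source: str
    predicate: str
    target: str


def parse_line(line: str) -> Triple:
    """Parse one line and restore the original edge direction."""
    # Assumption: fields are separated by whitespace. Splitting at most
    # twice keeps any spaces inside the object field intact.
    subject, predicate, obj = line.strip().split(None, 2)
    if predicate.startswith("-"):
        # "-<p>" means the file's subject is the target of <p>.
        return Triple(obj, predicate[1:], subject)
    return Triple(subject, predicate, obj)


def load_unique(lines: Iterable[str]) -> Set[Triple]:
    """Collect triples, skipping blank lines and dropping duplicates."""
    return {parse_line(line) for line in lines if line.strip()}


# Illustrative lines: the same edge seen from the film and the actor side.
film_lines = ["<Forrest_Gump>\t<starring>\t<Tom_Hanks>"]
actor_lines = ["<Tom_Hanks>\t-<starring>\t<Forrest_Gump>"]
merged = load_unique(film_lines + actor_lines)
print(len(merged))  # 1
```

Storing normalized triples in a set is one simple way to handle the duplication noted above; whether the "mix graph" file was built this way is not stated in this description.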

Complete Data Statistics:

RDF graph of DBpedia:
                        Film       Actor         Mix
Source nodes          85,667      86,465     488,140
Labeled edges            173         852         634
Target nodes         330,250     649,130     514,269
Triples            1,718,482   1,909,674   3,183,420

RDF graph of YAGO:
                        Film       Actor         Mix
Source nodes          35,819      27,874      83,841
Labeled edges             14          37          36
Target nodes          54,269      88,294      98,490
Triples              520,520     522,800     899,288

JSON Document of IMDB:
Rows                  96,781
Attributes                20
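
Counts like those in the RDF tables above could be recomputed from a graph file roughly as follows. This is a sketch under stated assumptions: "source nodes", "labeled edges", and "target nodes" are taken to mean distinct subjects, distinct predicates, and distinct objects, and a "-" prefixed predicate is counted as its own label. The function name graph_stats is hypothetical, and the exact counting rules behind the published numbers are not given in this description.

```python
from typing import Dict, Iterable


def graph_stats(lines: Iterable[str]) -> Dict[str, int]:
    """Count distinct subjects, predicates, objects, and total triples."""
    sources, predicates, targets = set(), set(), set()
    triples = 0
    for line in lines:
        parts = line.strip().split(None, 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        subject, predicate, obj = parts
        sources.add(subject)
        predicates.add(predicate)
        targets.add(obj)
        triples += 1
    return {
        "source_nodes": len(sources),
        "labeled_edges": len(predicates),
        "target_nodes": len(targets),
        "triples": triples,
    }


sample = [
    "<Forrest_Gump>  <director>  <Robert_Zemeckis>",
    "<Forrest_Gump>  <starring>  <Tom_Hanks>",
    "<Forrest_Gump>  <starring>  <Sally_Field>",
]
print(graph_stats(sample))
# {'source_nodes': 1, 'labeled_edges': 2, 'target_nodes': 3, 'triples': 3}
```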

Sample Data:

RDF graph of DBpedia:

<Forrest_Gump>  -<notableWork>  <Eric_Roth>
<Forrest_Gump>  -<child>    <Forrest_Gump_(character)>
<Forrest_Gump>  <budget>    "5.5E7"^^<>
<Forrest_Gump>  <cinematography>    <Don_Burgess_(cinematographer)>
<Forrest_Gump>  <country>   <United_States>
<Forrest_Gump>  <director>  <Robert_Zemeckis>
<Forrest_Gump>  <distributor>   <Paramount_Pictures>
<Forrest_Gump>  <editing>   <Arthur_Schmidt_(film_editor)>
<Forrest_Gump>  <gross> "6.77387716E8"^^<>
<Forrest_Gump>  <musicComposer> <Alan_Silvestri>
<Forrest_Gump>  <producer>  <Charles_Newirth>
<Forrest_Gump>  <producer>  <Steve_Starkey>
<Forrest_Gump>  <runtime>   "8520.0"^^<>
<Forrest_Gump>  <starring>  <Gary_Sinise>
<Forrest_Gump>  <starring>  <Mykelti_Williamson>
<Forrest_Gump>  <starring>  <Robin_Wright>
<Forrest_Gump>  <starring>  <Sally_Field>
<Forrest_Gump>  <starring>  <Tom_Hanks>
<Forrest_Gump>  <writer>    <Eric_Roth>
<Forrest_Gump>  <Work/runtime>  "142.0"^^<>
<Forrest_Gump>  <subject>   <Category:1990s_comedy-drama_films>
<Forrest_Gump>  <subject>   <Category:1994_films>
<Forrest_Gump>  <subject>   <Category:American_comedy-drama_films>
<Tom_Hanks> -<guest>    <100_(30_Rock)>
<Tom_Hanks> -<producer> <A_Hologram_for_the_King_(film)>
<Tom_Hanks> -<producer> <Cast_Away>
<Tom_Hanks> -<starring> <A_Hologram_for_the_King_(film)>
<Tom_Hanks> -<starring> <A_League_of_Their_Own>
<Tom_Hanks> -<spouse>   <Rita_Wilson>
<Tom_Hanks> -<voice>    <Sheriff_Woody>
<Tom_Hanks> <activeYearsStartYear>  "1978"^^<>
<Tom_Hanks> <birthDate> "1956-07-09"^^<>
<Tom_Hanks> <birthPlace>    <Concord,_California>
<Tom_Hanks> <birthYear> "1956"^^<>
<Tom_Hanks> <child> <Colin_Hanks>
<Tom_Hanks> <education> <California_State_University,_Sacramento>
<Tom_Hanks> <occupation>    <Tom_Hanks__1>
<Tom_Hanks> <spouse>    <Rita_Wilson>
<Tom_Hanks> <subject>   <Category:1956_births>
<Tom_Hanks> <subject>   <Category:20th-century_American_male_actors>
film.yago.graph :
<Forrest_Gump>  -<wroteMusicFor>    <Alan_Silvestri>
<Forrest_Gump>  -<created>  <Eric_Roth>
<Forrest_Gump>  -<created>  <Robert_Zemeckis>
<Forrest_Gump>  -<actedIn>  <Tom_Hanks>
<Forrest_Gump>  -<actedIn>  <Sally_Field>
<Forrest_Gump>  -<actedIn>  <Mykelti_Williamson>
<Forrest_Gump>  -<actedIn>  <Sam_Anderson>
<Forrest_Gump>  -<actedIn>  <Gary_Sinise>
<Forrest_Gump>  -<actedIn>  <Robin_Wright>
<Forrest_Gump>  <isLocatedIn>   <United_States>
<Forrest_Gump>  -<edited>   <Arthur_Schmidt_(film_editor)>
<Forrest_Gump>  -<directed> <Robert_Zemeckis>
<Forrest_Gump>  rdf:type    <wikicat_1990s_comedy_films>
<Forrest_Gump>  rdf:type    <wikicat_1994_films>
<Tom_Hanks> <actedIn>   <Toy_Story_3>
<Tom_Hanks> <actedIn>   <Toy_Story_2>
<Tom_Hanks> <wasBornIn> <Concord,_California>
<Tom_Hanks> <actedIn>   <The_Da_Vinci_Code_(film)>
<Tom_Hanks> <actedIn>   <Apollo_13_(movie)>
<Tom_Hanks> <actedIn>   <Sleepless_in_Seattle>
<Tom_Hanks> <actedIn>   <Catch_Me_If_You_Can>
<Tom_Hanks> <actedIn>   <Every_Time_We_Say_Goodbye_(film)>
<Tom_Hanks> <hasWonPrize>   <Saturn_Award>
<Tom_Hanks> <hasWonPrize>   <Golden_Globe_Award>
<Tom_Hanks> rdf:type    <wikicat_Writers_from_Los_Angeles,_California>
<Tom_Hanks> rdf:type    <wikicat_People_from_California>
<Tom_Hanks> rdf:type    <wikicat_Living_people>
<Tom_Hanks> rdf:type    <wikicat_Actors>
JSON Document of IMDB:
    "Title":"Forrest Gump",
    "Released":"06 Jul 1994",
    "Runtime":"142 min",
    "Genre":"Comedy, Drama",
    "Director":"Robert Zemeckis",
    "Writer":"Winston Groom (novel), Eric Roth (screenplay)",
    "Actors":"Tom Hanks, Rebecca Williams, Sally Field, Michael Conner Humphreys",
    "Plot":"Forrest Gump, while not intelligent, has accidentally been present at many historic moments, but his true love, Jenny Curran, eludes him.",
    "Awards":"Won 6 Oscars. Another 39 wins & 65 nominations.",
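
Several JSON attributes are stored as display strings ("142 min", comma-separated names), so they usually need light parsing before they can be queried. The sketch below assumes each record is a JSON object with the keys shown in the sample; how records are laid out in the actual file is not stated here, and the helper names are illustrative.

```python
import json
import re
from typing import Any, Dict, List, Optional


def parse_runtime_minutes(value: str) -> Optional[int]:
    """Turn a string such as '142 min' into 142; return None if absent."""
    match = re.match(r"(\d+)\s*min", value)
    return int(match.group(1)) if match else None


def split_names(value: str) -> List[str]:
    """Split a comma-separated field into trimmed, non-empty items."""
    return [name.strip() for name in value.split(",") if name.strip()]


def normalize(record: Dict[str, Any]) -> Dict[str, Any]:
    """Keep a few fields of one record in a query-friendly form."""
    return {
        "title": record.get("Title"),
        "runtime_min": parse_runtime_minutes(record.get("Runtime", "")),
        "genres": split_names(record.get("Genre", "")),
        "actors": split_names(record.get("Actors", "")),
    }


# Fields copied from the sample above, wrapped in braces to form an object.
raw = (
    '{"Title": "Forrest Gump", "Runtime": "142 min", '
    '"Genre": "Comedy, Drama", '
    '"Actors": "Tom Hanks, Rebecca Williams, Sally Field, '
    'Michael Conner Humphreys"}'
)
film = normalize(json.loads(raw))
print(film["runtime_min"])  # 142
print(film["genres"])       # ['Comedy', 'Drama']
```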

Example Queries:

Query 01 Find all slots of films that have won an Academy Award.
Query 02 Find all actors who were born in the United States and have acted in comedy films.
Query 03 Find all films that were released before 2000 and include a Chinese actor.
Query 04 Find all actors who were born before 1980 and have acted in films with an imdbRating of at least 8.0.
Query 05 Find all actors who were born and died in different countries and have starred in films with an imdbRating of at least 8.0.
Query 06 Find all directors who have directed films whose actors come from more than two countries.
Query 07 Find all directors who have directed at least two Academy Award-winning films with an imdbRating of at least 8.0.
Query 08 Compute the average age of the actors in a film with an imdbRating of at least 8.0.
Query 09 Compute the average imdbRating of films starring a winner of the Academy Award for Best Actor.
Query 10 Map each film in the graph to the corresponding film in the JSON document using their features, since matches can be ambiguous.
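
As an illustration of Query 10, one possible approach is to match on a normalized title and then use the runtime to break ties. This is only a sketch, not the intended solution: it assumes the DBpedia <runtime> literal is in seconds (consistent with the sample, where 8520 s equals the 142 min in the JSON record), and the function names, the tolerance, and the second same-title record are hypothetical.

```python
import re
from typing import Any, Dict, List, Optional


def title_key(entity: str) -> str:
    """'<Apollo_13_(movie)>' -> 'apollo 13'."""
    name = entity.strip("<>")
    name = re.sub(r"_\([^)]*\)$", "", name)  # drop a trailing "(...)" suffix
    return name.replace("_", " ").lower()


def literal_value(literal: str) -> float:
    """Extract the number from a typed literal such as '"8520.0"^^<>'."""
    return float(literal.split('"')[1])


def match_film(
    entity: str,
    runtime_literal: str,
    records: List[Dict[str, Any]],
    tolerance_min: float = 2.0,
) -> Optional[Dict[str, Any]]:
    """Return the first record with the same title and a close runtime."""
    key = title_key(entity)
    graph_minutes = literal_value(runtime_literal) / 60.0  # assumed seconds
    for record in records:
        if record["Title"].lower() != key:
            continue
        found = re.match(r"(\d+)", record.get("Runtime", ""))
        if found and abs(int(found.group(1)) - graph_minutes) <= tolerance_min:
            return record
    return None


records = [
    # Hypothetical same-title record, added only to show the ambiguity.
    {"Title": "Forrest Gump", "Runtime": "90 min"},
    {"Title": "Forrest Gump", "Runtime": "142 min"},
]
hit = match_film("<Forrest_Gump>", '"8520.0"^^<>', records)
print(hit["Runtime"])  # 142 min
```

Other features that appear in both sources, such as the director or the starring actors, could be compared in the same way when runtimes are missing or too close to separate candidates.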

