
step4_triple_to_number.py 1.2 kB

# encoding=utf-8
'''
Step 4: transform the triples, representing each entity, type and predicate by its id.
'''


def load_ids(path):
    # Each line of an id file holds a name and its id, separated by a tab.
    ids = {}
    with open(path, 'r') as f:
        for line in f:
            dub = line.rstrip('\n').split('\t')
            ids[dub[0]] = dub[1]
    return ids


eid = load_ids('entity id file here')
tid = load_ids('type id file here')
pid = load_ids('predicate id file here')
print("%d %d %d" % (len(eid), len(tid), len(pid)))

with open('input triple file here', 'r') as f, open('output triple file here', 'w') as rt:
    # i is the 1-based line number, printed when a triple cannot be encoded.
    for i, line in enumerate(f, 1):
        # Drop the last two characters (trailing character plus newline) before splitting.
        tri = line[:-2].split('\t')
        try:
            if tri[1] == '<type>':
                # Types without an id are recorded as -1.
                if tri[2] not in tid:
                    tid[tri[2]] = '-1'
                obj = tid[tri[2]]
            elif tri[2][0] == '"':
                # Literal objects (starting with a double quote) are written as -1.
                obj = '-1'
            else:
                obj = eid[tri[2]]
            rt.write("%s\t%s\t%s\n" % (eid[tri[0]], pid[tri[1]], obj))
        except KeyError:
            # An entity or predicate has no id: report the line and its number.
            print(line)
            print(i)
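
To see what the script does to a single triple, the sketch below applies the same three cases (type object, literal object, entity object) to in-memory data. The function name encode_triple, the sample names such as <Alice> and <livesIn>, and the ids are all hypothetical and chosen only for illustration. It also assumes that every input line ends with one trailing character followed by a newline, which is what the line[:-2] slice in the script suggests; the real input format may differ.

```python
def encode_triple(line, eid, tid, pid):
    """Return the id-encoded triple, or None when an entity or predicate id is missing."""
    # Assumption: the line ends with one trailing character plus '\n', as line[:-2] implies.
    s, p, o = line[:-2].split('\t')
    try:
        if p == '<type>':
            obj = tid.get(o, '-1')   # a type without an id becomes -1
        elif o.startswith('"'):
            obj = '-1'               # a literal object becomes -1
        else:
            obj = eid[o]             # an entity object must have an id
        return "%s\t%s\t%s" % (eid[s], pid[p], obj)
    except KeyError:
        return None


# Hypothetical id tables and triples, for illustration only.
eid = {'<Alice>': '0', '<Paris>': '1'}
tid = {'<Person>': '0'}
pid = {'<type>': '0', '<livesIn>': '1', '<name>': '2'}

lines = [
    '<Alice>\t<type>\t<Person>.\n',    # known type
    '<Alice>\t<livesIn>\t<Paris>.\n',  # entity object
    '<Alice>\t<name>\t"Alice".\n',     # literal object
    '<Alice>\t<type>\t<Robot>.\n',     # type with no id
    '<Bob>\t<livesIn>\t<Paris>.\n',    # subject with no id
]

encoded = [encode_triple(l, eid, tid, pid) for l in lines]
for row in encoded:
    print(repr(row))
```

Unlike the script, this sketch does not add missing types to tid; it looks them up with a default of -1, which yields the same output line. Triples that the script would report with print(line) and print(i) come back here as None.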

The GAnswer system is a natural language QA system developed by the Data Management Lab at the Institute of Computer Science & Technology, Peking University, led by Prof. Zou Lei. GAnswer is able to translate natural language questions to query graphs containing semant