
step7_get_predicate_fragment.py 1.0 kB

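# encoding=utf-8
# Build predicate fragments: for each triple, emit the subject's type ids,
# the predicate id, and the object's type ids (or "literal").

# Map each entity id to the set of its type ids.
en2t = {}
with open('input entity fragment', 'r') as f:
    for line in f:
        dou = line.rstrip('\n').split('\t')
        # Treat '|' and '#' as the same separator; field 4 holds the type ids.
        types = dou[1].replace('|', '#').split('#')[4]
        typeset = types.split(',')
        en2t[dou[0]] = set()
        for t in typeset:
            # Keep non-empty ids shorter than 6 characters and skip the '-1' placeholder.
            if 0 < len(t) < 6 and t != '-1':
                en2t[dou[0]].add(t)

sen = set()   # distinct output lines
lisen = {}    # predicate id -> subject ids seen with a literal object
for i in range(408261):  # iterate every predicate
    lisen['%d' % i] = set()

with open('triple file represented by ids here', 'r') as f:
    # enumerate advances the progress counter (the original never incremented i).
    for i, line in enumerate(f, 1):
        if i % 100000 == 0:
            print(i)
        tri = line.rstrip('\n').split('\t')
        if tri[0] != '-1':
            pre = '[' + ','.join(en2t[tri[0]]) + ']'
        else:
            pre = '[]'
        if tri[2] != '-1':
            # Entity object: subject types, predicate, object types.
            pos = '[' + ','.join(en2t[tri[2]]) + ']\n'
            frag = pre + '\t' + tri[1] + '\t' + pos
            sen.add(frag)
        else:
            # Literal object: remember the subject id under this predicate.
            lisen[tri[1]].add(tri[0])

# One "literal" line per predicate, listing the collected subject ids.
for k in lisen.keys():
    frag = '[' + ','.join(lisen[k]) + ']\t' + k + '\tliteral\n'
    sen.add(frag)

with open('output predicate fragment file', 'w') as f:
    for item in sen:
        f.write(item)
print(len(sen))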
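
The line formats the script reads and writes are easier to see on a small in-memory example. The sketch below is not part of the repository. It restates the same logic as two hypothetical helper functions (`parse_entity_types` and `build_predicate_fragments`) and runs them on invented sample lines. The entity line layout (`a|b|c|d|<types>`) is an assumption inferred only from the script's `split('#')[4]` indexing, and the `sorted` call is an addition that makes the output order of ids deterministic.

```python
def parse_entity_types(entity_lines):
    """Map entity id -> set of type ids, mirroring the script's field indexing."""
    en2t = {}
    for line in entity_lines:
        dou = line.rstrip('\n').split('\t')
        # Assumption: once '|' and '#' are unified, field 4 is the type list.
        types = dou[1].replace('|', '#').split('#')[4]
        en2t[dou[0]] = {t for t in types.split(',')
                        if 0 < len(t) < 6 and t != '-1'}
    return en2t


def fmt(ids):
    """Render a collection of ids as '[a,b,c]' (sorted here for determinism)."""
    return '[' + ','.join(sorted(ids)) + ']'


def build_predicate_fragments(en2t, triple_lines, num_predicates):
    """Return the set of predicate fragment lines for the given triples."""
    fragments = set()
    literal_subjects = {str(i): set() for i in range(num_predicates)}
    for line in triple_lines:
        tri = line.rstrip('\n').split('\t')
        if tri[2] != '-1':
            # Entity object: emit subject types, predicate id, object types.
            pre = fmt(en2t[tri[0]]) if tri[0] != '-1' else '[]'
            fragments.add(pre + '\t' + tri[1] + '\t' + fmt(en2t[tri[2]]) + '\n')
        else:
            # Literal object: collect the subject id under this predicate.
            literal_subjects[tri[1]].add(tri[0])
    for pred, subjects in literal_subjects.items():
        fragments.add(fmt(subjects) + '\t' + pred + '\tliteral\n')
    return fragments


# Invented sample data; a, b, c, d stand in for fields the script ignores.
entity_lines = [
    '0\ta|b|c|d|10,11\n',
    '1\ta|b|c|d|20,-1,1234567\n',   # '-1' and the 7-character id are filtered out
]
triple_lines = [
    '0\t0\t1\n',     # entity 0 --predicate 0--> entity 1
    '1\t1\t-1\n',    # entity 1 --predicate 1--> literal
    '-1\t0\t1\n',    # unknown subject --predicate 0--> entity 1
]

en2t = parse_entity_types(entity_lines)
result = build_predicate_fragments(en2t, triple_lines, num_predicates=2)
for frag in sorted(result):
    print(frag, end='')
```

Keeping the logic in functions that take iterables of lines, rather than file names, lets the same code run on open file handles or on test lists like the ones above.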

The GAnswer system is a natural language QA system developed by the Institute of Computer Science & Technology Data Management Lab, Peking University, led by Prof. Zou Lei. GAnswer is able to translate natural language questions to query graphs containing semant