All times are UTC + 1 hour

Posted: Thu Dec 26, 2013 11:00 am

Joined: Sun Sep 06, 2009 12:08 pm
Posts: 597
Location: Ørsta Norway
In the stats from my web host I see (for the first time) that Google is systematically collecting tiles from my Panotour panos.
Are they decoding the XML files? Can they possibly put the tiles back together?
Do I have to protect the panos from Google?
Anybody?

leifs


Attachment: google.jpg

Posted: Thu Dec 26, 2013 3:49 pm

Joined: Mon Jul 19, 2010 8:53 pm
Posts: 2114

They collect them because they found a link to them. That's web crawling, as most search engines do; nothing out of the ordinary.
If you want to stop it, you can place a robots.txt file in the top-level directory of your web server.
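
As a rough illustration of the robots.txt suggestion, the sketch below uses Python's standard urllib.robotparser to check what a given rule set would permit. The rules, the example.com host and the index.html page are assumptions made for illustration; the tile path is modelled on the request that appears later in this thread. The rules block only the crawler that identifies itself as Googlebot-Image from one tile folder and leave everything else open. Keep in mind that robots.txt is advisory: it only affects crawlers that choose to honour it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep the image crawler out of the tile folder,
# allow every other crawler everywhere.
ROBOTS_TXT = """\
User-agent: Googlebot-Image
Disallow: /panotour/lokeberget/loekebergetdata/

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# example.com is a placeholder host; index.html is an assumed page name.
TILE_URL = ("http://example.com/panotour/lokeberget/loekebergetdata/"
            "lokeberget_17_11_201_23/3/4/11_0.jpg")
PAGE_URL = "http://example.com/panotour/lokeberget/index.html"

image_bot_tile = parser.can_fetch("Googlebot-Image", TILE_URL)
image_bot_page = parser.can_fetch("Googlebot-Image", PAGE_URL)
web_bot_tile = parser.can_fetch("Googlebot", TILE_URL)

print("Googlebot-Image may fetch tile:", image_bot_tile)
print("Googlebot-Image may fetch page:", image_bot_page)
print("Googlebot may fetch tile:", web_bot_tile)
```

Python's parser does plain prefix matching on paths, so the rule above covers everything below the tile folder.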


Posted: Thu Dec 26, 2013 4:33 pm

Joined: Sun Sep 06, 2009 12:08 pm
Posts: 597
Location: Ørsta Norway
HansKeesom wrote:
They collect them because they found a link to them. That's web crawling, as most search engines do; nothing out of the ordinary.
If you want to stop it, you can place a robots.txt file in the top-level directory of your web server.


There are no direct links to the tiles. They are JPEGs, yes, but their names are generated from the XML files by the krpano viewer.
robots.txt is for denying access to a directory. I don't mind the bots indexing my thumbnails etc., but I have not seen them systematically grab the tiles before.
For now, while I think it over, I've denied the two IP addresses (Google image bots) access to my site using .htaccess. Google and the others can index my site as before.

As seen below, quite a few robots visit the site. This is for December so far.

leifs


Attachment: spiders.jpg

Posted: Thu Dec 26, 2013 4:38 pm

Joined: Mon Jul 19, 2010 8:53 pm
Posts: 2114

So isn't the conclusion then that they are ignoring your robots.txt and getting the names from the XML files that are referred to in the HTML files?


Posted: Thu Dec 26, 2013 5:02 pm

Joined: Sun Sep 06, 2009 12:08 pm
Posts: 597
Location: Ørsta Norway
HansKeesom wrote:
So isn't the conclusion then that they are ignoring your robots.txt and getting the names from the XML files that are referred to in the HTML files?


They are grabbing the XML files, but there are no explicit filenames to grab there; the filenames are generated from the XML. This is how the names of the thousands of JPEGs are given in virtualtour.XML:

<level tiledimagewidth="898" tiledimageheight="898">
<front url="froystadtua_sphere_64/0/0/%v_%u.jpg"/>
<right url="froystadtua_sphere_64/1/0/%v_%u.jpg"/>
<back url="froystadtua_sphere_64/2/0/%v_%u.jpg"/>
<left url="froystadtua_sphere_64/3/0/%v_%u.jpg"/>
<up url="froystadtua_sphere_64/4/0/%v_%u.jpg"/>
<down url="froystadtua_sphere_64/5/0/%v_%u.jpg"/>
</level>

%v and %u are counters that increase from zero to a maximum that is found in, or calculated from, values elsewhere in the XML.
To me it looks like Google Images has reverse-engineered the way krpano makes tiles and intends to download all the tiles, perhaps to put them together later and so recover the original image. They have cooperated with the NSA on other issues, so this is probably a piece of cake with that kind of resources.
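
To make the template mechanism concrete, here is a short sketch of how any client that has read the XML could expand one url pattern into actual tile paths. The function name expand_tile_urls, the zero-based counting (taken from the description above) and the 512-pixel tile size are assumptions; the XML excerpt does not state the tile size, and this is not krpano's actual code.

```python
import math


def expand_tile_urls(template, image_size, tile_size, base_index=0):
    """Expand a %v/%u url pattern into the tile paths of one cube face.

    image_size is the edge length of the face at this level (898 in the
    excerpt); tile_size is an assumed tile edge length.
    """
    per_side = math.ceil(image_size / tile_size)
    indices = range(base_index, base_index + per_side)
    return [
        template.replace("%v", str(v)).replace("%u", str(u))
        for v in indices   # %v counter
        for u in indices   # %u counter
    ]


FRONT = "froystadtua_sphere_64/0/0/%v_%u.jpg"
front_tiles = expand_tile_urls(FRONT, image_size=898, tile_size=512)
for url in front_tiles:
    print(url)
```

Under these assumptions the 898-pixel level needs a 2 x 2 grid per face, so the pattern for the front face expands to four paths. Nothing here needs special knowledge: the pattern plus the level dimensions are enough to enumerate names.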

leifs


Posted: Thu Dec 26, 2013 5:08 pm

Joined: Mon Jul 19, 2010 8:53 pm
Posts: 2114

The code you give is the code as read by your web server from disk. The web server translates it into explicit code and writes that to the client.


Posted: Thu Dec 26, 2013 5:26 pm

Joined: Sun Sep 06, 2009 12:08 pm
Posts: 597
Location: Ørsta Norway
HansKeesom wrote:
The code you give is the code as read by your web server from disk. The web server translates it into explicit code and writes that to the client.


Sure? When I look at the source of the HTML I got from the server, there is no filename.jpg in it. Where is it written on the client, so that the Google image bot can extract JPEG filenames from it?

leifs


Posted: Thu Dec 26, 2013 5:51 pm

Joined: Mon Jul 19, 2010 8:53 pm
Posts: 2114

In that case your client must interpret the JavaScript code into real names, otherwise it cannot download them. The robot works like a client: it interprets the JavaScript code.

What I described earlier was server-side interpretation; this is client-side.


Posted: Thu Dec 26, 2013 8:02 pm

Joined: Sun Sep 06, 2009 12:08 pm
Posts: 597
Location: Ørsta Norway
HansKeesom wrote:
The robot works like a client: it interprets the JavaScript code.
What I described earlier was server-side interpretation; this is client-side.


In Panotour the HTML loads virtualtour.js (the krpano HTML5 viewer), which then requests the appropriate JPEGs and displays them.
If Google is to get the JPEG names it has to sniff them in some way, because they are not in any file on the client as far as I can see.
If that is the case, Google has made special software for this purpose. Then encrypting the XML files with the krpano protect tool will not help either.

If Googlebot-Image only opened the HTML, it would grab just the few level 1 JPEGs.
The bot is zooming too! It has grabbed level 4:
"GET /panotour/lokeberget/loekebergetdata/lokeberget_17_11_201_23/3/4/11_0.jpg HTTP/1.1"

I will keep a close eye on the logs for some time.
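
For watching the logs, here is a small sketch that pulls the cube face, zoom level and tile counters out of a request line like the one above. It assumes the path layout .../face/level/v_u.jpg implied by the XML url patterns earlier in the thread; the regular expression and the function name parse_tile_request are illustrative only.

```python
import re

# Matches the quoted request part of an access-log line and captures the
# last three path components: face folder, level folder, "<v>_<u>.jpg".
TILE_REQUEST = re.compile(
    r'"GET \S*/(?P<face>\d+)/(?P<level>\d+)/(?P<v>\d+)_(?P<u>\d+)\.jpg HTTP/[\d.]+"'
)


def parse_tile_request(log_line):
    """Return face, level, v and u as ints, or None if it is not a tile request."""
    match = TILE_REQUEST.search(log_line)
    if match is None:
        return None
    return {key: int(value) for key, value in match.groupdict().items()}


LOG_LINE = ('"GET /panotour/lokeberget/loekebergetdata/'
            'lokeberget_17_11_201_23/3/4/11_0.jpg HTTP/1.1"')
tile = parse_tile_request(LOG_LINE)
print(tile)
```

Counting the distinct levels per crawler over a few days would show whether a bot really walks every zoom level or only the ones a folder listing exposes.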

leifs


Posted: Wed Jan 15, 2014 12:04 pm

Joined: Tue Apr 09, 2013 10:59 am
Posts: 2286
Location: France
The solution will be to encrypt the images, as krpano can do with its own algorithm. But that is unavailable for the moment.

_________________
Benjamin
http://www.kolor.com


Posted: Wed Jan 15, 2014 12:45 pm

Joined: Sun Sep 06, 2009 12:08 pm
Posts: 597
Location: Ørsta Norway
I posted the issue on the krpano forum and Klaus pointed me to the problem:
Directory listing was allowed (by mistake).
Google had done what it always does: traverse all available folders, including the tile folders.
I have turned listing off, and I guess that will keep Google at a distance for now.
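
One way to confirm that listing is really off is to request a tile folder URL and look at what comes back. The sketch below covers only the last step: a heuristic that assumes Apache-style auto-generated index pages, whose title starts with "Index of". The function name and the sample HTML are made up for illustration. On Apache, listing is typically switched off with Options -Indexes in the server configuration or in .htaccess.

```python
import re

# Apache's auto-generated index pages carry a title such as
# "<title>Index of /some/folder</title>".
_INDEX_TITLE = re.compile(r"<title>\s*Index of\s", re.IGNORECASE)


def looks_like_directory_listing(html):
    """Heuristic: True if the page looks like an auto-generated folder index."""
    return _INDEX_TITLE.search(html) is not None


# Invented sample responses, standing in for what the server would return.
listing_page = ("<html><head><title>Index of /panotour/tiles</title></head>"
                "<body>...</body></html>")
normal_page = ("<html><head><title>Virtual tour</title></head>"
               "<body>...</body></html>")

print(looks_like_directory_listing(listing_page))
print(looks_like_directory_listing(normal_page))
```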

leifs


Posted: Wed Jan 15, 2014 12:50 pm

Joined: Thu Dec 12, 2013 3:53 pm
Posts: 39
To answer your question about whether Google could put it together:

Technically they probably could make a photosynth out of it if they integrated krpano XML into Photosynth. In the future they may offer tools to post existing 360s to Photosynth. It would be a "nice" way to build the volume of Photosynth content.

I do not think it's happening now, though.

John


Posted: Wed Jan 15, 2014 2:38 pm

Joined: Sun Sep 06, 2009 12:08 pm
Posts: 597
Location: Ørsta Norway
John360 wrote:
I do not think it's happening now, though.
John


After the NSA issues you never know!

leifs :)

