Browser data eviction criteria hit: All data effectively lost

tautologistics · February 1, 2020, 5:50pm

OS: OSX 10.15.2 (19C57)
Browser: Firefox 72.0.2 (64-bit)
Extension: 1.5.3

I had commented on this thread but it appears to be dead at this point: [SOLVED] "requested database object could not be found"

After hitting the 2GB limit in Firefox, I can’t view any of my categories, tags, or bookmarked pages and I constantly get the warning that data may be lost soon.

At this point I just want to get a list of urls and the categories/tags that they were attached to. Backing up to GDrive is useless because Google compartmentalizes app-stored data so it is not accessible to me. Backing up locally requires the external helper app, but it does not work.

How can I just get a simple JSON or XML export of urls and tags?

BlackForestBoi · February 1, 2020, 6:30pm

Sorry I missed your response in the other thread.
So there are 2 things you can do now:

1. Make a copy of the SQLite and .files file, just in case, because we may soon be able to recover those by other means. We are working on an API that would allow you to query the data more effectively from outside of the extension.
2. You can indeed try to purge the images. I’ve tried it with my extension and it worked, but that was in Chrome and not with a potentially locked database. Go to “about:addons” > “settings” > “debug addons” > “Inspect” on the Memex item > “console”, then type and confirm:
await storageMan.backend.dexieInstance.table('pages').toCollection().modify({ screenshot: null })

This process can take a while, or it will fail immediately because of the storage blockers. In that case we would need to wait a bit until we have the API out for another try. When it succeeds, the console will after a while print a number, which is the number of pages it processed.

Sorry about all those troubles. You won’t believe how hard it is to make Memex work offline-first in the browser.
It’s been very challenging, because browsers sometimes follow the standards only partially (as you’ve experienced now) or not at all. Some low-level database implementations even differ from operating system to operating system. So in essence the browser/operating system combination creates a whole spectrum of different setups we need to take into account. On top of that, unlike with any other development flow, we cannot roll out updates selectively to some users to test, but always need to roll them out to everyone, and we have to wait 3-5 days on every update we push because of the Chrome store’s review policy. Anyhow, I feel for you. This is a super shitty situation for you and I hope we can resolve it soon.

tautologistics · February 13, 2020, 6:13pm

For anyone else that is inclined, the shortcut for inspecting/debugging this extension in FF is:

about:devtools-toolbox?type=extension&id=info%40worldbrain.io

I ran the update and it appears to have cleared thumbnails. Also ran a count of pages and got 12,009 back. Still getting the missing object error and storage limit warnings but will try a browser restart later today.

BlackForestBoi · February 15, 2020, 7:12pm

Thanks @tautologistics for the updates and the shortcut to get to the console!

Hope it is solved with the restart. Keep us posted

tautologistics · February 27, 2020, 4:16pm

Tried a bunch of things… browser restart, OS restart, and some addon debugging but nothing good. Still have a bloated set of data (2GB) and hourly warnings of data loss.

tautologistics · February 28, 2020, 11:54am

Finally got the local backup helper to work (sort of)… now, I have a 102KB images file and 2.8MB content file but the addon indicated 468MB of exportable data. It appears to be an incremental backup based on some previous backup state/timestamp (which never was successful) and I have not found any way to reset the backup state or initiate a complete and full backup.

So even now I don’t have a complete JSON file that I can parse and transform to some other format.

BlackForestBoi · March 5, 2020, 9:02am

Ok great, that’s a start!

To reset the backup, change the folder via the Extension UI and you’ll be asked whether you want to restart.
If for some reason this does not work, reset your initial backup time: go back to the debug screen > Storage > local Storage > moz… and delete the entry LastBackup. Then delete the backups you already made, reload the tab with the backup screen, and restart the backup.

Let me know if all worked!

tautologistics · March 26, 2020, 12:37pm

@BlackForestBoi

Had a little free time and wrote something to export the data I care about (see below). This dumps all pages (url, title, and tags) that are contained in a list, e.g.:

{
  "1571755589344": {
    "name": "Ideas",
    "id": 1571755589344,
    "pages": [
      {
        "url": "http://www.altx.com/manifestos/rozztox.html",
        "title": "The ROZZ-TOX Manifesto",
        "tags": ["philosophy", "80s", "art"]
      }
    ]
  }
}

It’s unfortunate that it had to come to this because I think the concept is great and the search works well, however the idea that users own and control their data is quite a stretch at this point. Even though data is stored locally, if it can’t easily and reliably be extracted then the data’s locality is meaningless.

Charging for features, having a smooth import process, and claiming “Memex is offline first. You have full control over your data.” all while direct access and export of data is at best a work in progress leaves a bad taste in one’s mouth and gives the impression, whether intentional or not, of trying to retain users by locking their data in.

I sincerely wish you well and success with the hope that the company’s stated goals and values eventually manifest themselves in the product you are building.

// about:devtools-toolbox?type=extension&id=info%40worldbrain.io

await Promise.all([
  storageMan.backend.dexieInstance.customLists.toArray(),
  storageMan.backend.dexieInstance.pageListEntries.toArray(),
  storageMan.backend.dexieInstance.tags.toArray(),
]).then(async ([lists, listPages, tags]) => {
  console.log('Data fetched... building export object');

  // One export entry per list, keyed by list id
  let result = lists.reduce(
    (result, list) => {
      result[list.id] = { name: list.name, id: list.id, pages: [] };
      return result;
    },
    {}
  );

  // Index of tag names, keyed by page url
  let tagIdx = tags.reduce(
    (result, tagEntry) => {
      let tagList = result[tagEntry.url] = result[tagEntry.url] || [];
      tagList.push(tagEntry.name);
      return result;
    },
    {}
  );

  // Wait for every page lookup before returning; forEach with an async
  // callback would return the result before any page had been added
  await Promise.all(listPages.map(async list_url => {
    let url = list_url.pageUrl;
    let page = await storageMan.backend.dexieInstance.pages.where('url').equals(url).first();
    let list = result[list_url.listId];
    if (!page || !list) return; // skip entries whose page or list is missing
    list.pages.push({ url: page.fullUrl, title: page.fullTitle, tags: tagIdx[url] || [] });
  }));

  return result;
}).then(console.log).catch(console.error);
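
If a flat list of urls with their tags and lists is more useful than the per-list object, the export can be reshaped afterwards. This is only a minimal sketch: it assumes the export object has exactly the shape shown in the example above, `flattenExport` is a hypothetical helper name (not part of Memex), and the second list in the sample data is made up for illustration.

```javascript
// Hypothetical helper (not part of Memex): turns the per-list export object
// into one row per url, collecting the names of all lists that contain it.
function flattenExport(exportObj) {
  const byUrl = new Map();
  for (const list of Object.values(exportObj)) {
    for (const page of list.pages) {
      let row = byUrl.get(page.url);
      if (!row) {
        // First time this url is seen: copy its title and tags
        row = { url: page.url, title: page.title, tags: [...page.tags], lists: [] };
        byUrl.set(page.url, row);
      }
      if (!row.lists.includes(list.name)) row.lists.push(list.name);
    }
  }
  return [...byUrl.values()];
}

// Sample input: the first list is the example from this post,
// the second list ("Reading") is invented to show a page in two lists.
const sample = {
  "1571755589344": {
    name: "Ideas",
    id: 1571755589344,
    pages: [
      {
        url: "http://www.altx.com/manifestos/rozztox.html",
        title: "The ROZZ-TOX Manifesto",
        tags: ["philosophy", "80s", "art"],
      },
    ],
  },
  "1571755589999": {
    name: "Reading",
    id: 1571755589999,
    pages: [
      {
        url: "http://www.altx.com/manifestos/rozztox.html",
        title: "The ROZZ-TOX Manifesto",
        tags: ["philosophy", "80s", "art"],
      },
    ],
  },
};

const rows = flattenExport(sample);
console.log(JSON.stringify(rows, null, 2));
```

In the extension console, the object logged by the export script would be passed to the helper in place of `sample`. `JSON.stringify` gives text that can be copied out and saved as a file.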

BlackForestBoi · March 26, 2020, 3:55pm

@tautologistics

Wow awesome initiative Chris.

I know things are not as good as they could be in terms of data access. For us it’s really a resource issue at this point: we can’t build all those features while trying to get a sustainable business up and running that does not take venture capital. It’s more a matter of priorities than a lack of desire to make all of that smooth. Although it could definitely be better, I don’t feel we are violating our values here. After all, you own your data if it’s all on your computer, and the fact that you could write a script like that in a few hours speaks for that too. I get your issue though: it’s not user-friendly or finished at all.

The good thing is that we are about to release StorexHub, which is essentially an offline-first, Zapier-like API, also for Memex. You can actually already work with it. We are just polishing up the documentation so that people can get started with it.
We built it because we know there will be many use cases for how people want to export, import and analyse Memex data or use different technologies (like IPFS), and we won’t ever be able to cover them all. So we built this interface where developers can do it themselves without our central control.

The little script you built would make a great app in there to help people export their data.